On Location: Chicago Tribune
The Tribune can't afford to miss a deadline, so its IT department proposed an impressive server-consolidation project that promises to cut costs and improve uptime.
August 13, 2004
"It's very simple: We don't accept missed deadlines--or missed budgets--ever, for any reason," says Darko Dejanovic, CTO of the Chicago Tribune. "That may seem harsh, but it's the nature of this business."
That mentality is also driving the Tribune's server consolidation, a comprehensive initiative that includes elements of disaster recovery, mainframe migration, server clustering and storage-area networking. The project is designed to keep the newspaper's computer systems available at peak performance in any situation, at a lower cost than the mainframe environment it replaces. Ironically, a software glitch that delayed a Monday edition by nearly six hours struck just as the Tribune was moving one of its critical applications to the consolidated server environment, demonstrating, unpleasantly, the need for high availability.
But the project isn't just about technology. The endeavor reflects a cultural shift of people and processes that has been taking place at the newspaper for several years (see "Tribune's New Goals Require Culture Shock"). Although many companies talk about aligning IT with business units, the Tribune's IT group is indeed making itself over to operate like a news organization. That means getting the server-consolidation project done on time, no matter how much overtime it takes.
The Problem
In September 2002, Dejanovic and his team were preparing their annual budget requests, and it wasn't a pretty picture. The newspaper's IBM System/390 mainframes were running out of steam and would have to be upgraded or replaced with Unix servers. The Tribune also had a few Sun Microsystems 3500 servers for graphics and manufacturing applications, but these, too, were running out of capacity.
The server limitations not only constrained IT capacity but also held up related projects, including a SAN rollout and an upgrade of circulation and advertising applications. Most critical for a deadline-obsessed organization, the Tribune's disaster-recovery setup wasn't sufficient. Although the newspaper has two large buildings--Tribune Tower and Freedom Center, about two miles away--almost all critical operations were concentrated in the Tribune Tower data center. A single incident there, such as an anthrax scare, could have been catastrophic.
Mainframe migration. Sun server upgrade. SAN implementation. Application upgrade. Disaster recovery. The Tribune had so many ongoing independent projects and subprojects that it was hard to keep track of them. The schedules and budgets for these separate projects were staggered over several years. Yet many were dependent on one another.
After wrestling with technologies, schedules and budgets for weeks, Dejanovic and his team came up with a radical idea: Consolidate all server-upgrade projects into one with a disaster-recovery element, then accelerate the schedules and budgets to complete the project in a year.
The Proposal
The Tribune's server and network design teams came up with an innovative architecture that solved several problems simultaneously. First, they would migrate the newspaper's most critical business apps--advertising, circulation, customer information and editorial production--off the mainframes and onto two Sun Fire 15K servers from Sun Microsystems. The new project would accelerate the mainframe-migration project already in progress and standardize the newspaper's critical applications on the Sun platform.
Second, the teams designed a server and network architecture that would geographically separate the two 15Ks, placing one in Tribune Tower and one in Freedom Center. The two servers would back each other up, letting the newspaper switch all critical operations over to one data center if a problem occurred in the other.
Third--and this is the innovative part--the teams proposed to link the two servers over dark fiber that AT&T had already laid under Chicago. By leasing about 20 Gbps of dark-fiber capacity, the Tribune could connect servers two miles apart as if they sat in the same room, making it possible to create a single Sun server cluster spanning Tribune Tower and Freedom Center. The proposed link was one of the first to extend a Sun cluster beyond a few hundred feet.
By buying new Unix servers and clustering them across two buildings, the Tribune's consolidation project solved several of the newspaper's problems at once. The new servers provided the long-planned upgrade for the aging Sun boxes and a path off the mainframe. The clustered environment ensured enough performance and capacity to support the new SAN as well as new applications for circulation and advertising. And because both servers would carry production workloads simultaneously, the new environment would form an active-active disaster-recovery setup, eliminating the need for costly passive backup systems that sit idle until a disaster occurs.
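The active-active idea can be illustrated with a toy model. The Python sketch below is hypothetical: the `Site` class, the `fail_over` function and the split of applications between the two buildings are assumptions for illustration, not the Tribune's actual Sun cluster configuration. It shows only the core idea that both sites carry live work and that a healthy site absorbs its peer's applications when the peer goes down.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A data center hosting one node of the cluster (hypothetical model)."""
    name: str
    healthy: bool = True
    apps: set[str] = field(default_factory=set)


def fail_over(sites: list[Site]) -> dict[str, str]:
    """Map every application to the site that should run it.

    Active-active: each healthy site keeps running its own applications.
    Applications on an unhealthy site move to the first healthy peer, so no
    standby hardware sits idle waiting for a disaster.
    """
    healthy = [s for s in sites if s.healthy]
    if not healthy:
        raise RuntimeError("no healthy site available")
    placement: dict[str, str] = {}
    for site in sites:
        target = site if site.healthy else healthy[0]
        for app in sorted(site.apps):
            placement[app] = target.name
    return placement


# Hypothetical split of applications between the two buildings.
tower = Site("Tribune Tower", apps={"editorial", "circulation"})
freedom = Site("Freedom Center", apps={"advertising", "customer information"})
normal = fail_over([tower, freedom])
freedom.healthy = False  # simulate losing the Freedom Center data center
after_outage = fail_over([tower, freedom])
print(normal)
print(after_outage)
```

In normal operation each building runs its own share of the work; after the simulated outage, every application maps to Tribune Tower.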
"From a project-management perspective, it just made sense," says Scott Tafelski, director of technical development. "We were eliminating time and cost from the equation."
Before the proposal could fly, however, the IT organization had to clear one huge hurdle: persuading upper management to spend money in 2003 and 2004 that had originally been budgeted for projects in 2005 and 2006.
"The first time we posed it, the answer was no, no, no," Dejanovic recalls. "It was a significant expenditure, and there was a lot of pushback." Dejanovic would not reveal the cost of the project but says it was "well into seven figures."
Rather than develop an IT-only business case and then try to prove it to the bean counters, the IT organization brought in members of the finance department to help model the ROI. The team analyzed the cost of doing each IT project individually over several years, based purely on the cost of the equipment, software and maintenance agreements. Those costs were weighed against the expenditures required for the proposed one-year server-consolidation project. The team determined it would be about 30 percent cheaper to do everything in one fell swoop. And completing the project in one year instead of three would save the newspaper about 130 percent on hardware and software maintenance agreements, Dejanovic says.
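The finance team's comparison boils down to simple arithmetic: total the equipment, software and maintenance costs of the separate multiyear projects, then compare that baseline with the cost of the consolidated one-year project. The sketch below assumes that framing; `percent_savings` is a hypothetical helper, and the dollar figures in the example are made up, since the Tribune did not disclose its costs.

```python
def percent_savings(separate_costs: list[float], consolidated_cost: float) -> float:
    """Percent saved by doing everything as one project instead of several.

    separate_costs: equipment + software + maintenance cost of each
    stand-alone project over its own multiyear schedule.
    consolidated_cost: the same cost categories for the one-year project.
    """
    baseline = sum(separate_costs)
    if baseline <= 0:
        raise ValueError("baseline cost must be positive")
    return 100.0 * (baseline - consolidated_cost) / baseline


# Made-up figures in millions of dollars, one per stand-alone project
# (mainframe migration, Sun upgrade, SAN, application upgrade, disaster recovery).
separate = [4.0, 2.5, 1.5, 1.0, 1.0]
consolidated = 7.0
print(f"{percent_savings(separate, consolidated):.0f}% cheaper")
```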
Dejanovic's boss, Richard Malone, senior vice president and general manager of Tribune Co. and a man known for being tough with a budget, went to bat for the proposal. Within a single planning cycle, the IT team had the top-level approvals it needed to begin. There was no need for an RFP because Sun was already the Tribune's Unix vendor, and AT&T was the only service provider offering the dark fiber. AT&T recommended Nortel as the switch vendor for the fiber connection, and representatives from the three companies met with Tribune staff for an intense design and negotiation session. After the initial deals were drawn up, the Tribune team made it clear that it would pay no more: If the vendors didn't meet the budget and deadlines, they would be spending their own nickel to make up the difference. The Tribune also left it up to the vendors to work out the integration of servers, switches and dark fiber.
"We had all of them in one conference room," says Milind Dere, supervisor of client systems and one of the project's managers. "We said, 'OK, you guys are the experts, but we need this project to work right the first time, and if you can't work out the connections right now, we are not going ahead with it.' "
The Project Plan
The server-consolidation project crosses many technology and organizational boundaries. On the technology side, it has required a transition from OS/390 to Solaris servers, from a single data center to two data centers, and from a single host environment to multiple servers clustered over dark fiber. In addition, the server-consolidation team had to coordinate its efforts with other project teams, including those implementing the EMC SAN and two new application environments.
On the organizational side, the Tribune's IT department has instituted a project-management methodology that establishes best practices for all IT projects and involves the "business owners" from the beginning. Under Tafelski's leadership, the department has, in fact, become a convert to the Project Management Institute, the cross-industry organization that sets project-management standards, methodologies and guidelines. About half of the IT staffers who manage projects at the newspaper have passed the PMI certification test, and Tafelski's goal is to make certification a requirement for every project manager. PMI maintains a set of guidelines called the Project Management Body of Knowledge (PMBOK), which the American National Standards Institute has designated as a standard.
"Basically, we took PMBOK and we built our own methodology around it, and that is what our folks are required to follow for any project," Tafelski says. "We embed it into the planning phase, the initiation phase, the execution phase and the closing phase."
The project-management structure ensures that all involved parties--IT groups, business managers and vendors--are given a stake in the project and are held to a strict schedule. Budgets are generally shared between IT and the business groups, and business people often have as many deliverables as IT people. If an individual fails to meet a deadline in one step of the plan, no matter how small, Tafelski and his colleagues can spot it immediately and respond to it. "Interim deliverables are the best predictors of undeliverables," Tafelski says.
Uptime is considered a crucial element in every aspect of the Tribune's business, from the development of editorial content to home delivery. "If circulation is offline, the call center is affected. If advertising is offline, that impacts revenue," says Pete Mashek, director of production systems. "Editorial, manufacturing, distribution--there just isn't any part of a daily newspaper that can afford downtime."
The need for uptime made it easy for the server-consolidation team to gain the support of the functional business groups, but it also presented a problem: how to move all critical applications to the new server environment without affecting system availability.
Working with the business groups, the team came up with a project schedule that migrated applications to the Sun servers one at a time, in order of their sensitivity to downtime. The advertising department's data warehouse, essentially a large database of customers and prospects, was the first to go, because it's a static application that doesn't need to be available every minute. The editorial system, which is in play at all hours and is crucial to content development and production, was among the last apps scheduled to move to the new servers.
With funding complete and a detailed project plan in place, the team was ready to begin the server-consolidation project by the fall of 2003. The first step was to link the two Nortel OPTera switches via AT&T's dark fiber--but if you've ever dealt with a service provider to provision a circuit, you know it seldom goes according to plan. The Tribune's IT team first had to talk AT&T into delivering the required circuits within six months, lightning speed for provisioning dark fiber. Then AT&T wanted the Trib to accept liability for maintenance of the circuits, which run through a tunnel under the Chicago River. If the tunnel collapsed during routine maintenance of the fiber-optic cable, the Tribune would have to foot the bill.
"We told them, 'no way,' " Dejanovic says. AT&T assumed liability for the circuits.
Once AT&T and Nortel established the two redundant circuits, the Tribune team was ready to take delivery of the servers, as well as the SAN equipment that would be installed at the same time. And the vendors did deliver--in the middle of the Christmas holidays. Rather than lose a couple of weeks, the Trib roused some people to head over to the loading dock.
It's this commitment to deadlines that characterizes the entire project. For each application ported to the new server environment, there was only one available time window, a three-hour period on Sunday mornings designated for systems maintenance. That meant that, for each application, the IT project team gave up a Sunday for cutover and testing.
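The migration order described earlier (least downtime-sensitive applications first, one Sunday maintenance window per application) can be sketched as a simple scheduling routine. The `App` class, the numeric sensitivity scores and the assumption that cutovers fall on consecutive Sundays are hypothetical simplifications; in practice the Tribune's cutovers were spread over several months.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class App:
    """An application awaiting cutover (hypothetical model)."""
    name: str
    sensitivity: int  # hypothetical score: higher means less tolerant of downtime


def next_sunday(start: date) -> date:
    """Return the first Sunday on or after `start`."""
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    return start + timedelta(days=(6 - start.weekday()) % 7)


def cutover_plan(apps: list[App], start: date) -> list[tuple[date, str]]:
    """Give each app its own Sunday window, least downtime-sensitive first.

    Assumes (hypothetically) that cutovers use consecutive Sundays; ties in
    sensitivity are broken alphabetically so the plan is deterministic.
    """
    first = next_sunday(start)
    ordered = sorted(apps, key=lambda a: (a.sensitivity, a.name))
    return [(first + timedelta(weeks=i), app.name) for i, app in enumerate(ordered)]


# Hypothetical scores: the static ad data warehouse tolerates downtime best,
# the round-the-clock editorial system least.
apps = [App("editorial", 9), App("ad data warehouse", 1), App("circulation", 6)]
plan = cutover_plan(apps, date(2004, 5, 1))
for day, name in plan:
    print(day.isoformat(), name)
```

Sorting on a downtime-sensitivity key keeps the riskiest cutovers for last, after the team has rehearsed the procedure on applications that can tolerate an outage.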
Once the circuits were established and the server cluster was tested, it was time to move the applications to the new environment. The first application, the ad department's data warehouse, made the transition in May without a hitch. But when the editorial application was migrated last month, the software glitch set production back by more than five hours and wreaked havoc on a Monday edition.
"We tested it beforehand and found no problems," Mashek says, "and the application actually ran fine until we got to the point where we do the last-minute editorial changes. It was an intermittent problem, which made it even harder to detect."
The software vendor, CCI, eventually located the code error and provided a work-around that let the Tribune complete production of the Monday edition, albeit nearly six hours late and 24 pages short. Once the paper was out, the IT team worked with the vendor to fix the problem permanently, and Tuesday's paper was delivered without delay. The problem won't recur, Mashek says, because the other applications moving to the 15Ks don't require custom code.
Following Up
At large newspapers, a team of copy editors, fact checkers and proofreaders ensures the quality and veracity of edited stories. The Tribune's IT QA (quality assurance) team performs a similar function, conducting training, auditing and quality checking.
The QA team joined the project team before implementation began to ensure that the appropriate Tribune staffers received training on the new Sun servers. It also made sure that the new systems would comply with the Tribune's password policies and with the Sarbanes-Oxley Act, which could affect the processes for storing data on both the servers and the SAN.
During implementation, QA works with the project team to ensure that all possible testing scenarios are identified and completed and that the servers, applications and dark-fiber connections are completely operational before cutover. QA also helps audit project management so that all steps are completed on budget and on deadline.
Once the project is completed, QA will help monitor and test the disaster-recovery system during quarterly reviews. The QA team also will participate in future audits of changes to the server environment, as well as ongoing testing of the servers and the disaster-recovery system.
The Future
If the server-consolidation project continues according to plan, the final applications will be migrated to the new server environment in the fall. For Tribune IT, much work remains, including full deployment of the SAN and the implementation of a new suite of advertising applications, but the server portion will be complete.
Meantime, the server-clustering architecture and disaster-recovery plan may soon grow beyond Chicago. The Tribune Co., which acquired Times Mirror Co. in 2000, also operates eight other major daily newspapers across the United States, including the Los Angeles Times, Baltimore Sun and Long Island's Newsday. Eventually, Dejanovic says, these other newspapers could make use of the Tribune's server architecture, and the papers could use one another's data centers for backup in a nationwide disaster-recovery network.
"We actually had a crew from the L.A. Times here in June, and we showed them the solution and they liked it a lot," Dejanovic says. "I don't see any reason why, in several years, the whole company won't have a similar project working across the board."
Tim Wilson is Network Computing's editor, business technology. His background includes four years as an IT industry analyst and more than 14 years as a journalist specializing in networking technology.
Our sixth "On Location" documentary-style case study takes us up close and personal with the Chicago Tribune, the seventh-largest daily newspaper in the United States. The Tribune is getting a complete redesign, though not of the way the newspaper looks. The IT organization is combining projects for mainframe migration, Unix server upgrade and disaster recovery--initiatives originally scheduled to take several years--into a single server-consolidation project that will be completed in just over one year. We'll look at the process of building the new server environment, as well as the unusual high-speed connection that enabled the Tribune to extend a Sun server cluster across two miles. Just as important, we'll examine how a new IT culture and project-management environment helped the Tribune complete the project on time and within budget.
Tribune's New Goals Require Culture Shock
When Darko Dejanovic became CTO of the Chicago Tribune, he inherited an IT organization with a culture that was, to put it kindly, a bit less driven than the news business it supports.
"Before, it wasn't unusual for an IT project to run late or for systems to be unavailable," Dejanovic says. A test conducted in his first few months indicated that the Tribune's IT systems had about a 60 percent reliability rate. "About 80 percent of our time was spent in maintenance--finding problems and fixing them. In a business where deadlines are critical, it didn't make sense."
After evaluating the situation, Dejanovic and his team began to re-engineer the Tribune's IT environment: not just the systems and technologies, but the culture of the department itself. Like a newspaper editor, Dejanovic made it clear that downtime, missed deadlines and high error rates would no longer be tolerated. "And if you don't like it, there's the door," he says.
Over the past several years, many of the Tribune's IT workers have found that door, voluntarily or otherwise. Almost 70 percent of the newspaper's IT staff has turned over since Dejanovic came on board, through attrition, retirement and outright dismissals. The new crop of Tribune IT staff works long hours and meets hard deadlines, just like the reporters who write the stories and the press operators who print the paper.
"I won't kid you, it's not always easy," says Pete Mashek, director of production systems. "But when the paper goes out on the trucks on time, we take a lot of pride in that."
In addition to making the culture shift, the newspaper has revamped its servers and other systems to improve reliability. And the effort is paying off. In the past two years, the Tribune's IT systems have received a 100 percent reliability rating, and time spent on maintenance has dropped to less than 30 percent. Missed deadlines are not just rare, they are unacceptable.
Dejanovic's boss, Tribune Co. senior vice president and general manager Richard Malone, says it took some new blood to get the IT department moving in the right direction.
"The credit goes to the entire organization, but it started with Darko," Malone says. "We brought Darko here because we thought he could motivate people, he had a clear vision of the future and he was technically savvy. But most importantly, he had appreciation for the user. It wasn't about bits and bytes. I think he has infused the whole organization with those concepts."
Darko Dejanovic
VP and CTO, Chicago Tribune
VP and CTO, Tribune Co.
At Work: Responsible for managing all aspects of IT operations for the Chicago Tribune and the Tribune Co., its corporate parent
At Home: 34 years old. Married, one child. No hobbies: "I work too much."
Alma Mater: Northwestern University, MBA from Kellogg Graduate School of Management
HOW HE GOT HERE:
2002 to present: Vice President, Chief Technology Officer, Tribune Co.
1999 to present: Vice President, Chief Technology Officer, Chicago Tribune
1997 to 1999: Vice President/Technology, Sun-Sentinel Co., Fort Lauderdale, Fla.
MOUTHING OFF:
What I say to people who resist moving off the mainframe: "You ain't got a clue."
Worst moment of downtime in my career: Any downtime is the worst--a few minutes can seem like hours.
Most off-the-wall complaint ever made by a user: " 'Your IT department sucks, and there's nothing you can do about it.' We proved that one wrong."
If I only had a bigger IT budget, I would: "Be forced to cut it down."
Greatest business challenge: "Integrating business processes across the company."
The most misunderstood aspect of my job: "How much it takes to run a successful 24/7/365 operation. There is a lot that goes on behind the scenes."
If I had the server-consolidation project to do over again, I would: "Do it sooner."
My next career: "CTO."
When I retire, I will: "Teach and travel."
Richard Malone
Senior Vice President and General Manager, Tribune Co.
At Work: Responsible for managing many Tribune Co. operations, including IT, manufacturing, distribution, marketing, finance and emerging businesses
At Home: 48 years old. Married, three children. Hobbies include coaching children's sports
Alma Mater: Northwestern University, MBA from Kellogg Graduate School
HOW HE GOT HERE:
2003 to present: Senior Vice President, General Manager, Tribune Co.
2002 to 2003: Vice President of Operations, Tribune Co.
MOUTHING OFF:
What I say to people who think that disaster recovery is a luxury: "It only takes one time to see otherwise."
The most misunderstood aspect of my job: "Explaining to people who don't work in the business that I'm not a reporter."
Why the Chicago Tribune is better than any other newspaper: "We strive to serve customers in our marketplace in a way that no other company can. And we do it, because we have the best customer insights that we can possibly get, and we have great efficiency of operations."
Greatest business challenge: "The proliferation of choices that consumers have for news and information, and the challenge to continuously innovate to meet the needs of our consumers."
I love technology when: "It delivers real value to the business."
I hate technology when: "It takes priority over other business concerns."
My next career: "Volunteer work."
When I retire, I will: "Go fishing."