Don't Swap One Management Problem For Another
It's all about defining and adhering to change management. Utility computing is useful and cool, but before committing to a system, I'd have a real heart-to-heart with the software vendor about potential failures and recovery steps, and I'd want demos of both.
November 16, 2009
When I ran Network Computing's lab at Syracuse University, the network and servers were critical production systems that had to be kept running. If something broke, we had to fix it or get tech support to fix it. This was a product testing lab with a highly dynamic environment, and we were often our own worst enemy in terms of system stability. We did what we could to keep things efficient, like making server images and restoring them after testing. But no matter what we did, 20% of our time was spent on installs and another 10% to 20% on maintenance. I considered adding more automation, but that seemed likely to just shift the management burden from hardware to software.
One of my goals was to never open a rack. Every time someone opened a door, the chance of knocking something loose loomed, and more important, cabling runs became a mess of orphaned cables, mocking us whenever we strung new cable. You know what I mean. The utility computing systems that Randy George talks about in Next-Generation Data Center: Delivered [magazine registration required] and the LiquidIQ system Joe Hernick tests in A True Data Center In A Box would have been just the ticket. A fully racked and cabled system where I could dynamically provision servers, networking, and I/O with a multiterabyte SAN would have significantly reduced test-bed setup time. Sure, we cut provisioning time from days to hours with server imaging software and a flat, consistent network platform, but too often we had to spend time in our noisy, cold data center, running cables and accessing consoles.
To make a system like Liquid's work, you need orchestration software that automates all of the fiddly tasks involved in provisioning a new server. That software is really multiple integrated systems providing server management, application deployment, runbook automation, configuration management, and network and storage management.
Orchestration systems use templates that administrators select and customize. Press the "go" button, and within minutes your servers are deployed, booted, and waiting for you to log in. That works great in a homogeneous environment, but few data centers are homogeneous, so typically some part of the orchestration ends up being manual.
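To make that concrete, here's a minimal sketch in Python of what template-driven provisioning might look like. The template fields, the supported-hardware list, and the provision() function are all invented for illustration; they aren't any vendor's actual API.

# Hypothetical sketch of template-driven provisioning. Names and fields are
# illustrative only, not a real orchestration product's interface.

SUPPORTED_HARDWARE = {"blade-gen2", "blade-gen3"}  # the homogeneous pool the templates cover

WEB_TEMPLATE = {
    "name": "web-server",
    "cpu_cores": 4,
    "memory_gb": 16,
    "os_image": "rhel5-base",
    "vlan": 110,
    "san_lun_gb": 200,
}

def provision(template, hardware_type):
    """Run the orchestration steps, or flag the request for manual handling."""
    if hardware_type not in SUPPORTED_HARDWARE:
        # Heterogeneous gear falls outside the templates -- someone still has
        # to rack, cable, and configure it by hand.
        return {"status": "manual", "reason": f"{hardware_type} not in template catalog"}

    steps = [
        ("allocate blade", hardware_type),
        ("attach SAN LUN", template["san_lun_gb"]),
        ("configure VLAN", template["vlan"]),
        ("deploy OS image", template["os_image"]),
        ("register in CMDB", template["name"]),
    ]
    for action, detail in steps:
        print(f"{action}: {detail}")  # each step would call a management subsystem
    return {"status": "provisioned", "name": template["name"]}

print(provision(WEB_TEMPLATE, "blade-gen3"))
print(provision(WEB_TEMPLATE, "legacy-sparc"))

The second call is the one that matters: anything outside the homogeneous pool drops out of the automation and back onto a person.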
Given that reality, my fear is that orchestration systems simply shift the management burden from servers to the orchestration software. Take the case of enterprise network management systems. They're expensive and need a small army of experienced administrators to integrate the components and maintain the fragile system. Orchestration systems could suffer similarly. When working properly, they're great. When they crumble, they could cost you as much downtime as any hardware failure. An orchestration software failure could cascade through all of the integrated management subsystems and possibly corrupt the configuration management data. We had to rebuild our server inventory more than once because of corruption. At the very least, a failure can leave you in a state where you can't or don't want to make changes outside the orchestration system. Making such changes can make recovery more difficult.
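One way to catch that kind of drift or corruption early is to periodically reconcile what the orchestration system's configuration records claim against what's actually running. The record formats and the find_drift() helper below are made up for illustration; a real system would pull live state from its management agents rather than a hard-coded dictionary.

# Illustrative sketch: compare the orchestration system's configuration records
# against live server state to spot out-of-band changes or corrupted entries.
# All data here is invented for the example.

cmdb_records = {
    "web01": {"os_image": "rhel5-base", "vlan": 110, "san_lun_gb": 200},
    "db01":  {"os_image": "rhel5-db",   "vlan": 120, "san_lun_gb": 500},
}

live_state = {
    "web01": {"os_image": "rhel5-base", "vlan": 115, "san_lun_gb": 200},  # VLAN changed by hand
    "db01":  {"os_image": "rhel5-db",   "vlan": 120, "san_lun_gb": 500},
}

def find_drift(cmdb, live):
    """Return, per server, the attributes where the records and reality disagree."""
    drift = {}
    for server, expected in cmdb.items():
        actual = live.get(server)
        if actual is None:
            drift[server] = "missing from live inventory"
            continue
        diffs = {k: (v, actual.get(k)) for k, v in expected.items() if actual.get(k) != v}
        if diffs:
            drift[server] = diffs
    return drift

print(find_drift(cmdb_records, live_state))
# {'web01': {'vlan': (110, 115)}}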
If you must make changes, then have a back-out plan in case something goes wrong. It's all about defining and adhering to change management. Utility computing is useful and cool, but before committing to a system, I'd have a real heart-to-heart with the software vendor about potential failures and recovery steps, and I'd want demos of both. And be sure to talk to peers from companies using the product in production to get their take.
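In code terms, a back-out plan is just the inverse of each change, decided and recorded before the change is applied. A minimal sketch, with placeholder change and verification functions and an assumed VLAN move as the example change:

# Minimal sketch of a change with a pre-defined back-out step. The change and
# verification functions are placeholders for whatever the real work would be.

def apply_vlan_change(server, new_vlan):
    print(f"moving {server} to VLAN {new_vlan}")
    return True  # pretend the change itself succeeded

def verify(server):
    print(f"verifying {server} still answers on its service port")
    return False  # pretend verification failed, forcing a back-out

def change_with_backout(server, old_vlan, new_vlan):
    """Apply a change only with a back-out step defined, and use it on failure."""
    backout = lambda: apply_vlan_change(server, old_vlan)  # the recorded back-out plan
    if not apply_vlan_change(server, new_vlan) or not verify(server):
        print("change failed verification -- backing out")
        backout()
        return False
    return True

change_with_backout("web01", old_vlan=110, new_vlan=115)

The detail that matters is that the back-out step exists before the change runs, not improvised after something breaks.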