Beware The Latest Infrastructure Tools
Proceed with caution when deploying new tools to avoid costly outages.
November 15, 2016
We computer geeks and software developers tend to be suckers for newfangled infrastructure tools, which we can’t help trying out like shiny new toys. However, I would caution IT professionals first to do no harm, because every component has bugs of some kind.
In my experience, the best network operators won’t deploy a component until they know how it breaks. Remember that technical infrastructure tools are almost always demonstrated in an ideal environment that’s tuned for maximum performance. You should expect a tool to work equally well only if your environment reproduces the same inputs and use case, which it generally does not.
At my company, Kentik, we made this very mistake of falling for a trendy new tool last year. At first, it seemed like a cool service discovery system for network monitoring that would neatly tie various infrastructure components together for us. Unfortunately, this service discovery tool was not designed to work at the scale at which we were using it, so it started causing micro-outages.
As a result, our operations team wasted up to 10 hours each week troubleshooting these outages until we finally gave up and migrated off the tool. Luckily, this problem only cost us time and resources and didn’t cause outages that were visible to our users. Such outages can damage a company, especially a startup.
I’ve witnessed far too many startups that were forced to make emergency migrations because their tools caused network bottlenecks or complete outages. I know of firms that went through three-day meltdowns due to multiple instabilities buried deep within their layers of infrastructure. In the worst cases, I’ve seen startups that implemented untested storage systems which failed miserably, causing a loss of critical infrastructure metadata and even some customer data. Such a betrayal of customer trust can destroy a young business; it’s just not worth the gamble.
Questions to ask
Simplicity really matters when it comes to assembling infrastructure. In terms of the core components that you use for storage, load balancing, and service discovery, you should only choose tools that won’t cause problems on their own. After all, you have enough problems to worry about with the rest of your infrastructure and application stack.
When it comes to adopting a new infrastructure tool, it’s worthwhile to create a system of checks and balances within your organization to vet the decision. Get the technical leadership team together to answer the following series of questions. Only in this way can your team decide if implementing a new technology is really worth the risk.
Upside questions:
How mature is this tool, and is it actively in use for the type of application you’re running?
How much time, money, and effort can the tool save you if it works properly?
Is this a tool or component that you would eventually have to write yourself because your other/current options are so painful?
Downside questions:
How big a risk are you willing to take in terms of lost time and money, and possibly service outages or lost customers?
Do you have proof that the tool will remain stable in a variety of situations?
Will you have to contribute to the development of the component or tool? If so, can you afford the staff time to turn the component into something workable at a different scale?
Can you find people who have documented failure modes? If not, can you invest enough time to figure out the fragilities and recovery paths on your own?
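One way to keep the answers honest is to write them down in a form the whole team can review. The sketch below is only an illustration of that idea, not a process we use at Kentik: the `ToolReview` record, its field names, and the `open_concerns` helper are all hypothetical, and the example tool name is made up. It records the answers to the upside and downside questions and lists whichever downside questions are still unanswered.

```python
from dataclasses import dataclass


@dataclass
class ToolReview:
    """Answers collected by the technical leadership team for one candidate tool.

    Every field name here is an assumption made for illustration.
    """
    name: str
    # Upside questions
    proven_for_our_workload: bool         # mature and in use for our type of application?
    hours_saved_per_week: float           # estimated savings if the tool works properly
    would_otherwise_build_in_house: bool  # would we eventually have to write this ourselves?
    # Downside questions
    stable_under_varied_conditions: bool  # do we have proof of stability in varied situations?
    failure_modes_documented: bool        # has anyone documented how it breaks?
    can_staff_hardening_work: bool        # can we afford staff time to make it work at our scale?


def open_concerns(review: ToolReview) -> list[str]:
    """Return the questions the team has not yet answered with a 'yes'."""
    concerns = []
    if not review.proven_for_our_workload:
        concerns.append("not proven for our type of application")
    if not review.stable_under_varied_conditions:
        concerns.append("no proof of stability in a variety of situations")
    if not review.failure_modes_documented:
        concerns.append("failure modes are undocumented")
    if not review.can_staff_hardening_work:
        concerns.append("cannot afford the staff time to harden the tool")
    return concerns


# Hypothetical example: a promising tool that nobody has run at our scale.
review = ToolReview(
    name="hypothetical-discovery-tool",
    proven_for_our_workload=False,
    hours_saved_per_week=6.0,
    would_otherwise_build_in_house=True,
    stable_under_varied_conditions=False,
    failure_modes_documented=True,
    can_staff_hardening_work=True,
)

for concern in open_concerns(review):
    print(f"{review.name}: {concern}")
```

The helper deliberately stops short of a yes-or-no verdict. A list of open concerns gives the team something concrete to investigate, while the decision itself still depends on how much risk the business can tolerate.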
Given all of these variables, there are still certain situations where it makes sense to adopt an experimental tool. One is when you’ve got a problem that could jeopardize your whole business model due to high costs or low network availability. In such cases, a new tool may make sense if the component can provide the rapid scale or economics that you need to survive.
Another valid reason for adoption could involve a problem that’s key to your customer or user experience, but one that you cannot solve in-house. When you’re caught in such a bind, it can be cheaper to join the developer community for an emerging tool rather than building it yourself from scratch.
Lastly, once your infrastructure and team have grown in size and maturity, I would suggest creating an infrastructure architecture review board. This should be a panel of your smartest technical minds who will ask the right questions and clearly think through all of the potential outcomes.
An “arch board” should weigh in on technical issues, much like the legal department that gives legal advice to the CEO. Having a formal review process will ensure that your decisions are based on specific details about the system and component architectures before you adopt any new tools. Taking these advance precautions will save you from a lot of pain and anguish, and it just may save your bacon.
Avi has decades of experience as a leading technologist and executive in networking. He was with Akamai for more than a decade, as VP network infrastructure and then chief network scientist. Prior to that, Avi started Philadelphia's first ISP (netaxs) in 1992, later running the network at AboveNet and serving as CTO for ServerCentral.