Crunching The Virtualization Numbers
September 16, 2010
Network engineers are doing a good job of handling virtualization in their environments. Yes, you heard that right. There are some performance problems, but the adjustment from physical to virtual servers has not resulted in a large uptick in the key Layer 2 KPIs that we use to quantify LAN performance. That means that the cries from application teams about the LAN under-performing now that servers and their corresponding applications are virtualized are not based in fact. So, let's look at some facts.
Start with a simple question: are virtualized environments more prone to network problems than traditional ones? I wanted to know, so I took some time to do a deep dive into some of our analytics and identify whether physical servers were immune from some of the more prominent issues we see in data center LANs. My gut told me that the LANs with virtualized servers would fare worse than those with physical servers. I thought that the added complexity of the network connectivity for virtualized servers would lead to misconfiguration, oversubscription and, ultimately, more dropped packets.
I was wrong.
So far this year, we have analyzed 4781 interfaces that connect to single server hosts. These interfaces are directly attached to a server and ranged from 10Mbps (yep) to 10Gbps. On 3.11 percent (149) of these interfaces, we recorded dropped packets (discards). The overwhelming majority were output discards, representing more traffic than can be put onto the wire, a resulting exhaustion of buffer space, and ultimately a dropped packet. These drops averaged 424,739 per host. That's a really high number.
I then cut out the top five percent of interfaces, and the total dropped from 63,286,205 to only 727,268. This points to a small group of interfaces dropping a significant amount of traffic, but it reduces the overall drops per host to only 5,157 packets. That's an improvement, although you definitely don't want eight interfaces showing over 60,000,000 drops in your network. The physical servers, save for a few, appear to be doing pretty well, and even when they do have dropped packets due to discards, the average amount is not huge. (They still need to be corrected, though.)
How did the virtual servers do in this comparison? Well, of the 2540 interfaces that connected to multi-guest servers--meaning that we see multiple servers on a single interface--79 had output discards. That also comes to 3.11 percent. No kidding. It is exactly the same percentage as the physical servers. That isn't the whole story, though.
You might assume, as I did, that because we have multiple hosts on a single interface we would see a larger amount of loss when it occurred, but that is not correct either. On those 79 links, we have a total of 30,518,646 output discards. Removing the top five percent of interfaces, as we did with the physical servers, leaves only 376,321 total drops. That number is proportionally similar to the physical servers on a per-interface basis, but the big difference is that these 74 interfaces are actually hosting 463 servers. That drops our per-server discards to only 812.
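For what it's worth, both per-host figures fall straight out of integer division on the trimmed totals. A quick sketch; note the 141-interface denominator for the physical group is my inference (149 dropping interfaces minus the eight worst offenders), not a number stated in the data:

```python
# Reproducing the trimmed per-host/per-server averages with integer division.
# Physical: 727,268 trimmed drops over an assumed 141 remaining interfaces.
# Virtual: 376,321 trimmed drops over the 463 servers on 74 interfaces.
physical_per_host = 727_268 // 141
virtual_per_server = 376_321 // 463
print(physical_per_host, virtual_per_server)  # 5157 812
```

Normalizing by hosted servers rather than by interfaces is what flips the comparison in the virtual group's favor.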
Even if you look only at the top five percent of interfaces, you get approximately the same amount of discards per interface, with the major difference being the number of VMs served by that top five percent in the virtualized group. Input discards are also an issue on these server interfaces. However, we believe that input discards are tied to other factors, such as shared buffer architectures. We will be looking at them over the next week to let you know whether the data backs up that hypothesis. In the meantime, let's track down these bad interfaces, virtual or not, and fix them.
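Tracking them down starts with per-interface output-discard counters (polled from SNMP's ifOutDiscards, for example). A minimal sketch of the sort of triage described above, assuming you already have a mapping of interface names to discard counts; the function name and the five percent cutoff mirror this analysis but are otherwise my own:

```python
def worst_offenders(discards, top_pct=0.05):
    """Return interfaces with nonzero output discards, worst first,
    plus the subset that makes up the top `top_pct` of that list."""
    dropping = sorted(
        ((name, count) for name, count in discards.items() if count > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    cut = round(len(dropping) * top_pct)
    return dropping, dropping[:cut]

# Hypothetical counters: three of five interfaces show discards; with a
# 40 percent cutoff, the single worst offender is flagged for repair.
counters = {"ge-0/0/1": 0, "ge-0/0/2": 60_000, "ge-0/0/3": 150,
            "ge-0/0/4": 0, "ge-0/0/5": 900}
dropping, worst = worst_offenders(counters, top_pct=0.4)
```

Separating the handful of heavy droppers from the long tail is exactly what made both the physical and virtual averages look reasonable above.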