Linux Virtually Ready for the Data Center

New start-up efforts and maturing commercial and open-source projects promise to make virtualization more powerful and ubiquitous for Linux and the x86 platform. (Originally published in IT Architect)

April 1, 2005

By now, the virtualization vision has been so hyped that its concepts have been burned into our collective consciousness. Soon to be gone are the days when running a new application meant buying a new server, sticking a bunch of storage on it, and loading up the new application. The new order calls for flexible, virtualized network processing and storage that can self-allocate to meet the needs of any workload without so much as a click of a mouse.

As applications are divorced from their hardware and the resources of the data center are sliced and diced in any way necessary, the theory goes that utilization will rise, processing will speed up, and the data center will become the nimble, cost-effective business driver everyone had hoped for. In the perfect data center, each application runs on a virtual machine (VM) sized perfectly for it. If the application's resource needs grow, the VM simply grows with it.

HAIL LINUX

As Linux becomes the heir apparent to proprietary Unix offerings, it's these virtualization capabilities that will be the measure of whether or not Linux is fit for the data center throne. But just how fit is Linux for the crown, and who's driving its progress?

Three different approaches to Linux virtualization have taken the limelight. The first and most well-known is VMware's, which calls for full virtualization and allows for such unique capabilities as running Windows and Linux on the same server. The Xen open-source project takes an approach called paravirtualization, in which the kernel is modified so that the OS knows it's running virtualized. While you won't be running Windows alongside Linux with Xen just yet, you'll see some performance advantages over the VMware approach.

Finally, there's the approach taken by start-up Virtual Iron with its VFe product. VFe lets you take a collection of generic x86 servers and allocate anywhere from a fraction of a CPU to 16 CPUs all running a single OS image. By contrast, Xen and VMware only chop up the resources of a single system. However, VFe requires kernel modifications and a high-performance system interconnect between the servers under its care.

In each of these approaches, the idea is to give up a little--either in performance or system resources or both--to gain a lot in terms of resource utilization flexibility. In that regard, the management capabilities of these products are every bit as important as their virtualization capabilities. The ability to recognize increasing workload and either move processes to faster systems or increase the resources available to those processes is perhaps more important than whether a virtualization technique exacts a 5 percent performance penalty or a 15 percent one. That said, making the wrong choice in virtualization can cost dearly in terms of performance, so it's worth taking a look at the compromises required to make each virtualization approach happen.

VIRTUAL 101

The central job of any OS is to manage the hardware on which it runs. That boils down to scheduling CPU time, arbitrating memory allocation, and managing I/O to such physical devices as displays, storage, and networking adaptors. Once CPU scheduling, memory management, and I/O are virtualized, OSs become divorced from the hardware. They instead run on a VM as presented by the VM monitor (a term that harkens back to IBM mainframes of the 1970s). The OS becomes a guest on the physical hardware and no longer manages the hardware itself.

While CPU scheduling is certainly a topic that concerns virtualization systems, it's the approaches to memory management and I/O arbitration that affect VM performance most.

The easiest way to attack the I/O problem is to create a virtual Ethernet or SCSI driver for the guest OS to use. A VM monitor then runs the real driver, and a client/server relationship is developed between the virtual driver seen by the OS and the real driver. While the concept is straightforward, an equally straightforward implementation can be a performance killer--particularly when I/O devices run at multigigabit speeds. Let's take the example of data coming into the system from a SCSI device. The VM monitor gets the data, figures out where it goes, then copies it to the appropriate guest OS's virtual SCSI driver memory space. The guest OS figures out what application process actually needs the data, then copies the data into the application's memory space.
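To make that concrete, here's a toy C program that walks through the same double-copy path. Everything in it--buffer names, sizes, the functions themselves--is invented for illustration; it's a sketch of the data flow, not any vendor's driver code.

```c
/* Toy illustration of the double-copy I/O path described above.
 * All names and structures are invented; no real hypervisor code here. */
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 512

/* Buffer owned by the VM monitor's real SCSI driver. */
static char device_buffer[BLOCK_SIZE];
/* Buffer belonging to the guest OS's virtual SCSI driver. */
static char guest_driver_buffer[BLOCK_SIZE];
/* Buffer belonging to the application inside the guest. */
static char app_buffer[BLOCK_SIZE];

/* Copy 1: the VM monitor hands the data to the guest's virtual driver. */
static void vmm_deliver_to_guest(void) {
    memcpy(guest_driver_buffer, device_buffer, BLOCK_SIZE);
}

/* Copy 2: the guest OS hands the data to the application that wants it. */
static void guest_deliver_to_app(void) {
    memcpy(app_buffer, guest_driver_buffer, BLOCK_SIZE);
}

int main(void) {
    /* Pretend a SCSI read just completed into the real driver's buffer. */
    memset(device_buffer, 'A', BLOCK_SIZE);

    vmm_deliver_to_guest();   /* copy #1: monitor -> guest driver */
    guest_deliver_to_app();   /* copy #2: guest driver -> application */

    printf("application sees %d bytes, copied twice on the way in\n",
           (int)sizeof(app_buffer));
    return 0;
}
```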

All that copying greatly slows a system's performance. Devices such as routers and switches are fast because they go to great lengths to limit the number of times data is copied from one part of memory to another. Increasingly, server OSs are following in this direction. Ideally, once the driver copies data into memory, it's never copied again. Instead, pointers to and ownership of that data are simply moved around. But in a VM environment, even that can be computationally expensive because the VM monitor, the host OS, and the application all have to be involved in deciding what's done with the data and which process should eventually own the memory space where the data sits. If VM environments don't find a way around this three-step process, performance will suffer.
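A minimal sketch of that ownership-passing idea, again with invented names rather than any real driver's descriptor format:

```c
/* Sketch of moving buffer ownership instead of copying data. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum owner { OWNER_DRIVER, OWNER_GUEST, OWNER_APP };

struct buffer_desc {
    char *data;        /* the data itself never moves */
    size_t len;
    enum owner owner;  /* only this field changes hands */
};

/* Hand the buffer to the next stage by flipping the owner field. */
static void transfer(struct buffer_desc *d, enum owner next) {
    d->owner = next;   /* no memcpy: just a change of ownership */
}

int main(void) {
    struct buffer_desc d;
    d.data = malloc(512);
    memset(d.data, 'A', 512);
    d.len = 512;
    d.owner = OWNER_DRIVER;        /* the driver filled the buffer */

    transfer(&d, OWNER_GUEST);     /* monitor hands it to the guest OS */
    transfer(&d, OWNER_APP);       /* guest OS hands it to the application */

    printf("buffer at %p reached the application without being copied\n",
           (void *)d.data);
    free(d.data);
    return 0;
}
```

Production drivers typically do this with shared rings of descriptors rather than a single struct, but the principle is the same: the pointer moves, the data doesn't.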

Even more important is memory management. Modern OSs have used virtual memory management for years. The idea is that the OS presents an application with a virtual memory image that's mapped into physical memory. With virtual memory, an application can have access to more memory than the system actually has: because the memory map is virtual, slower storage such as a hard drive can back the parts of that map that don't fit in physical memory.

To facilitate memory management, memory is divided into pages. It's the job of the OS to keep frequently used pages in real memory, while allowing the rest to stay on a disk drive. To make this work efficiently, the OS needs the system hardware to help out. The CPU keeps a table of memory pages allocated to the running process. If that process tries to access a page outside of physical memory, the CPU lets the OS know, and the OS fetches the page from disk and loads it into physical memory.
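The mechanics are easier to see in miniature. The toy program below fakes a page table in plain C: a handful of real frames back a larger virtual space, and any access to a non-resident page triggers a "fault handler" that fetches it from a pretend disk. The structures and the trivial eviction policy are invented for illustration only.

```c
/* Toy page table: a few physical frames back a larger virtual space;
 * "disk" is just another array. Illustration only. */
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE   4096
#define VIRT_PAGES  16      /* pages the process thinks it has */
#define PHYS_FRAMES 4       /* frames the machine really has */

static char disk[VIRT_PAGES][PAGE_SIZE];      /* backing store */
static char phys[PHYS_FRAMES][PAGE_SIZE];     /* real memory */
static int  frame_of[VIRT_PAGES];             /* -1 = not resident */
static int  next_victim;                      /* trivial FIFO eviction */

/* "Page fault handler": load the page from disk, evicting if needed. */
static int page_fault(int vpage) {
    int frame = next_victim;
    next_victim = (next_victim + 1) % PHYS_FRAMES;

    /* Evict whatever page currently owns this frame. */
    for (int v = 0; v < VIRT_PAGES; v++) {
        if (frame_of[v] == frame) {
            memcpy(disk[v], phys[frame], PAGE_SIZE);  /* write back */
            frame_of[v] = -1;
        }
    }
    memcpy(phys[frame], disk[vpage], PAGE_SIZE);      /* fetch */
    frame_of[vpage] = frame;
    return frame;
}

/* Every access checks the table; a miss costs a "disk" round trip. */
static char read_byte(int vpage, int offset) {
    int frame = frame_of[vpage];
    if (frame < 0) {
        printf("page fault on virtual page %d\n", vpage);
        frame = page_fault(vpage);
    }
    return phys[frame][offset];
}

int main(void) {
    memset(frame_of, -1, sizeof(frame_of));
    for (int v = 0; v < VIRT_PAGES; v++)
        memset(disk[v], 'a' + v, PAGE_SIZE);

    /* Touch more pages than there are frames to force faults. */
    for (int v = 0; v < 6; v++)
        printf("page %d holds '%c'\n", v, read_byte(v, 0));
    return 0;
}
```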

In order to present a VM environment to a guest OS, this notion must be extended further. Another memory page table can be created for each guest OS, and the VM monitor can handle new allocations and instances where pages need to be fetched from disk. These straightforward implementations--ones that require either lots of memory-to-memory copying of data or lots of invoking of the VM monitor to fix the memory map--will slow down a server; a simplified sketch of this second layer of translation appears after the NUMA discussion below.

The situation can be even more complicated for Non-Uniform Memory Access (NUMA) machines. With these systems, memory access time isn't uniform for all CPUs or physical memory. AMD's Opteron processor was designed so that multi-CPU systems based on the Opteron would be NUMA machines. The CPUs each manage a piece of system memory and can access each other's memory over a HyperTransport link. By comparison, Intel currently employs a single off-CPU memory controller chip through which all CPUs communicate. That single memory controller often becomes a bottleneck, even in four- and eight-way systems.

AMD's approach, while presenting less of a bottleneck, also makes the job of memory management somewhat harder for an OS or VM monitor. More care has to be taken to ensure that the most frequently accessed data isn't just in main memory, but in memory directly attached to the CPU running the process.

Because HyperTransport is very fast and AMD's design allows for only a single hop to all physical memory, application performance is typically improved by the AMD approach. However, when NUMA systems grow, hop counts increase, interconnects become slower, and memory management becomes critical. Depending on the goals of the virtualization environment, all this must be taken into account.
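Pulling those two threads together--the extra page table the monitor keeps for each guest, and the preference for node-local memory on a NUMA box--here's a toy sketch of how a monitor might map a guest's "physical" frames onto machine frames. The numbers and the policy are invented; real monitors are far more sophisticated.

```c
/* Toy sketch: the monitor keeps a second layer of translation (guest
 * "physical" frames -> machine frames) and, on a NUMA box, prefers
 * frames on the node where the guest's CPU lives. Illustration only. */
#include <stdio.h>

#define NODES           2
#define FRAMES_PER_NODE 8
#define GUEST_FRAMES    4

static int free_frames[NODES] = { FRAMES_PER_NODE, FRAMES_PER_NODE };

/* Pick a machine frame, trying the guest's home node first. */
static int pick_frame(int home_node) {
    for (int pass = 0; pass < NODES; pass++) {
        int node = (home_node + pass) % NODES;   /* local first, then remote */
        if (free_frames[node] > 0) {
            free_frames[node]--;
            /* encode a frame number as node * FRAMES_PER_NODE + index */
            return node * FRAMES_PER_NODE + free_frames[node];
        }
    }
    return -1;                                   /* no memory anywhere */
}

int main(void) {
    int vmm_map[GUEST_FRAMES];   /* guest-physical -> machine frame */
    int home_node = 1;           /* this guest's virtual CPU runs on node 1 */

    for (int g = 0; g < GUEST_FRAMES; g++) {
        vmm_map[g] = pick_frame(home_node);
        printf("guest frame %d -> machine frame %d (node %d)\n",
               g, vmm_map[g], vmm_map[g] / FRAMES_PER_NODE);
    }
    return 0;
}
```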

VMWARE

As mentioned, VMware provides full virtualization, with all its benefits and drawbacks. I/O and memory management are fully virtualized, giving VMware the ability to run a wide variety of guest OSs. On the downside, this also presents some challenges for VMware--mostly centering on performance. That performance hit can be significant, particularly for I/O-intensive applications or ones that rely heavily on the OS for memory management (such as applications that frequently create and kill off new processes).

Even so, in VMware's primary data-center use case, its performance hit may not be such a big deal. If the goal is to create an environment where you can do away with a bunch of old, slow servers that never see high utilization in the first place, then VMware is definitely attractive. It gives you the ability to move those applications and their OSs onto a consolidated platform seemingly unmolested. And because it doesn't require kernel source-code modifications like the Xen and Virtual Iron approaches, VMware argues that you should need less testing before deployment.

However, this is more hype than reality. In VMware's case, the run-time environment for a guest OS has to be modified. That's because VMware must trap out certain x86 instructions and involve itself with operations such as memory allocation. Since VMware doesn't modify the kernel prior to compilation, it must do so at run time. So if your goal is to run an x86 environment identical to that on the server you want to replace, it's just not going to happen--at least not with the current crop of x86 processors. That means even with VMware, predeployment testing is required.
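The idea of diverting privileged operations at run time can be sketched in miniature. The toy "translator" below scans a pretend instruction stream and sends anything privileged to the monitor instead of executing it directly. It's a conceptual illustration only--VMware's actual translation operates on real x86 code and is far more involved.

```c
/* Toy version of rewriting a guest's instruction stream at run time:
 * privileged operations are diverted to the monitor. Conceptual sketch,
 * not VMware's actual binary translator. */
#include <stdio.h>

enum op { OP_ADD, OP_LOAD, OP_SET_PAGE_TABLE, OP_HALT };

static const char *names[] = { "add", "load", "set_page_table", "halt" };

/* What the monitor does when the guest tries a privileged operation. */
static void monitor_emulate(enum op o) {
    printf("monitor: emulating privileged '%s' on the guest's behalf\n",
           names[o]);
}

/* Scan a block of guest "code" before letting it run, and divert
 * anything the guest must not execute directly. */
static void translate_and_run(const enum op *code, int n) {
    for (int i = 0; i < n; i++) {
        switch (code[i]) {
        case OP_SET_PAGE_TABLE:           /* privileged: hand to monitor */
            monitor_emulate(code[i]);
            break;
        case OP_HALT:
            printf("guest: halt\n");
            return;
        default:                          /* safe: run directly */
            printf("guest: executing '%s' natively\n", names[code[i]]);
        }
    }
}

int main(void) {
    enum op guest_code[] = { OP_ADD, OP_SET_PAGE_TABLE, OP_LOAD, OP_HALT };
    translate_and_run(guest_code, 4);
    return 0;
}
```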

VMware's latest enhancement supports four-way Symmetric Multiprocessing (SMP) servers. Announced last fall, SMP support allows a single VM to span all four processors on a server, or have those processors be divided up as needed. (Previously, guest OSs could only access a single CPU.) This capability allows VMware to manage larger, more business-critical applications. Indeed, VMware's best asset in the long run may be its management layer. The company has had a while to work with enterprise data center customers, and the management interface to the VMware environment shows a level of sophistication and maturity lacking in other offerings.

The ability to migrate VMs between physical servers is well-developed. In the data center, it's the management software that will likely maintain VMware's seat at the table. That's because both Intel and AMD are planning hardware advances that will make VMware-style virtualization simpler to implement.

XEN

Xen also has the goal of allowing potentially dozens of OSs to run on a single server. While primarily developed for supporting Linux, Xen isn't architected in such a way as to make it Linux-specific. Other Unix implementations can use Xen, and there's a Windows XP port for Xen under way as well.

As mentioned, in contrast to VMware, Xen relies on the modification and cooperation of the guest OS in reaching its virtualization goal. In particular, guest OSs get a partial look at real memory, time, and I/O, and in doing so avoid a good bit of what slows down full virtualization schemes. Because the kernel modifications are in memory and I/O management, applications need no modification to run on a Xen-aware OS. VMware spokespeople point out that because the kernel is modified, it stands to reason that applications should be recertified for the Xen kernel. However, as we've seen, at least with Xen such certification is possible. Because VMware does its modifications at runtime, there's still an equally good chance that application anomalies will crop up--but testing for that is more difficult.
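By contrast with the run-time rewriting sketched earlier, a paravirtualized kernel asks the monitor for privileged work explicitly, so nothing needs to be caught on the fly. The sketch below shows the shape of such a call; the names are invented, and a real hypercall is a trap into a more privileged layer rather than an ordinary function call.

```c
/* Sketch of the paravirtualized approach: the guest kernel knows it is
 * running under a monitor and requests privileged work explicitly.
 * Invented names; not Xen's actual interface. */
#include <stdio.h>

/* The monitor's entry point. A real hypercall would be a trap into a
 * more privileged ring, not an ordinary function call. */
static int hypervisor_update_mapping(int vpage, int gframe) {
    printf("hypervisor: guest asked to map page %d -> frame %d\n",
           vpage, gframe);
    return 0;  /* the monitor validates and applies the change */
}

/* Modified guest kernel code: no privileged instruction is issued,
 * so nothing needs to be trapped or rewritten at run time. */
static void guest_kernel_map_page(int vpage, int gframe) {
    hypervisor_update_mapping(vpage, gframe);
}

int main(void) {
    guest_kernel_map_page(3, 5);
    return 0;
}
```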

The kernel modification argument will soon be rendered moot. In February, the Linux kernel development team announced that Xen modifications would become part of the standard Linux 2.6 kernel distribution. This should alleviate much of the concern about running production applications on Xen, as long as they run on the latest versions of the 2.6 kernel. Furthermore, Red Hat and Novell have both promised support for Xen, as have hardware vendors such as Intel, HP, and IBM. While applications that require the Linux 2.4 kernel may still be a concern, it appears that Xen-style virtualization will be part of Linux going forward.

While Xen clearly avoids some of the performance bottlenecks of VMware, there's still work to be done to make Xen virtualization right for the enterprise. Novell, for instance, will support Xen on its SuSE Linux Professional product by the time you read this, but its enterprise SuSE Linux product won't see Xen support until roughly this time next year. In that time, Novell expects to develop an extensive set of management tools for virtualization. It also intends to add Common Information Model (CIM) support to Xen so that any CIM-aware management console can manage a SuSE Linux VM.

Finally, Xen 3.0, due out in the third quarter of this year, will support SMP guest OSs. The current version, 2.0, already supports SMP host hardware.

VIRTUAL IRON

While VMware and Xen are busy carving up single servers for many guest OSs, Virtual Iron aims to move beyond the single-server environment into what we might normally think of as a clustered environment. With Virtual Iron's VFe 1.0, a single OS image can run on a fraction of a processor, as is typically done with VMware and Xen. At the other extreme, it can also run on as many as 16 CPUs simultaneously. To make this magic happen with commodity hardware, Virtual Iron requires that a Topspin Communications InfiniBand switch connect all systems participating in the virtual system.

In so doing, VFe essentially creates an extreme version of a NUMA machine. It also makes it easier to virtualize networking and storage subsystems. That's because all I/O is done through the InfiniBand interface. Network and SAN access is handled through the Topspin switch. This use of InfiniBand makes Virtual Iron's approach the closest to the "virtualized everything" model we typically imagine. On the other hand, it's also a model that comes with some unique challenges.

VFe will have to manage a memory pool that's highly non-uniform. For example, it's possible that applications may need to access data in memory on systems across the InfiniBand switch. The trick is to properly allocate memory so that such instances are rare and so that the VFe VM monitor doesn't spend a lot of time rearranging processes for optimal performance. Many Ph.D.'s in computer science have yet to be earned on optimizing multiprocessor, non-uniform memory access.
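One toy example of the kind of heuristic involved: count where accesses to a page come from, and migrate the page when most of them arrive over the interconnect. The sketch is purely illustrative and says nothing about how VFe actually decides.

```c
/* Toy page-migration heuristic of the sort a distributed VM monitor
 * might use. Entirely invented; not Virtual Iron's actual policy. */
#include <stdio.h>

#define MIGRATE_THRESHOLD 4   /* remote accesses before we move a page */

struct page_home {
    int home_host;            /* host whose local memory holds the page */
    int remote_hits;          /* accesses that crossed the interconnect */
};

static void access_page(struct page_home *p, int from_host) {
    if (from_host == p->home_host) {
        p->remote_hits = 0;                   /* cheap local access */
        return;
    }
    if (++p->remote_hits >= MIGRATE_THRESHOLD) {
        printf("migrating page from host %d to host %d\n",
               p->home_host, from_host);
        p->home_host = from_host;             /* expensive, but done rarely */
        p->remote_hits = 0;
    }
}

int main(void) {
    struct page_home p = { 0, 0 };
    for (int i = 0; i < 6; i++)
        access_page(&p, 1);   /* host 1 keeps touching a page on host 0 */
    return 0;
}
```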

While the cluster approach makes VFe's job of memory management a good deal more complex, it has some advantages. The most obvious is the ability to scale up past the resources of any single server. In this way, Virtual Iron solves a scalability problem that neither VMware nor Xen does. Those who need this kind of scalability can often justify a sizable performance hit to get it. However, VFe is still new, so just what sort of performance hit it exacts is difficult to say. One thing is certain, however: It'll depend on the application being run.

VFe's single OS image also simplifies some problems that are commonly presented by compute clusters. In a standard cluster, each host runs its own OS, so typically there's a cluster-aware file system that ensures the integrity of data for each member of the cluster. Companies such as Veritas have long been providers of cluster file systems and management software. While there's nothing inherently bad about cluster-aware software, it's just another element that's needed for cluster computing. Many of the cluster-management functions, such as CPU allocation and adding and removing compute resources, are also handled by the VFe environment. VFe was still in beta testing as of press time.

Finally, both Intel and AMD are working to make virtualization easier on x86 CPUs. Intel's two efforts, code-named Silvervale and Vanderpool, have since been combined under the Vanderpool name and will show up on desktop chips by the end of this year, with server and laptop chips to follow in 2006. The presence of the technology will allow fully virtualized systems such as VMware's to perform better, but it'll also make some of the company's patents unnecessary. AMD has similar enhancements in the works for 2006 under a project code-named Pacifica.

Editor-in-Chief Art Wittmann can be reached at [email protected].

Risk Assessment: Linux Virtualization

While virtualization technology is very mature in mainframes and proprietary Unix servers, the Intel architecture and Linux are just now supplying standardized mechanisms to make virtualization commonplace.

If you need Linux virtualization today, VMware offers an excellent solution. If you want better performance or don't want to pay for VMware, Red Hat and Novell will soon offer virtualization as part of their enterprise offerings.

No technology promises to change the way business is done in the data center more than virtualization. It was vital to mainframes, and it'll be vital to the commodity Linux market. If Virtual Iron can make its approach appealing, it'll be game over for the proprietary Unix market.

This is proven technology that's making its way to the x86 architecture and Linux. As one would expect, risk increases with a newer technology, but VMware is a proven commodity and Xen has the benefit of 20,000 or so eyeballs studying it.
