Virtual SANs: Scale Out's Next Iteration
Shared, block storage SANs have been the exclusive domain of big, centralized storage arrays, but distributed systems are about to crash the party.
March 4, 2013
Storage system design often seems as if it's stuck in the Big Iron era of mainframes, a consolidated world of enormous, expensive machines managed by a high priesthood of experts. Yet as computing has become virtualized and democratized, with distributed systems knitted into self-service clouds where every developer can create his own dedicated sandbox, mainframes have had to adapt.
Storage systems are following along, as scale-out designs using distributed object and file systems have become a popular technology for providing large, scalable storage pools over an Ethernet backbone. The concept was commercialized over a decade ago by pioneers like EqualLogic, Isilon and Spinnaker, which were acquired by Dell, EMC and NetApp, respectively.
Yet useful as they are (and I have long recommended scale-out systems as a more efficient and cost-effective alternative to traditional large, centralized storage frames for shared storage), they've lacked all-purpose adaptability because they can't provide the block-level storage required by databases and DB-backed applications.
That's changing as the concepts underlying distributed, networked file systems collide with virtual servers and cloud stacks. This fusion could ultimately threaten the hegemony of traditional SANs over mission-critical, transaction-oriented, back-office applications.
One of the first to marry a scale-out design with SAN versatility was Coraid, which emerged from a Linux community project to develop a native Ethernet storage protocol, ATA over Ethernet (AoE), that operates at Layer 2 and thus eliminates the IP overhead plaguing iSCSI. Unlike FCoE, AoE, which as the name implies encapsulates the ATA command set directly within Ethernet frames, was designed with simplicity in mind. It is much more efficient than FCoE, which shoehorns the Fibre Channel stack into a physical layer it wasn't originally designed for.
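For the curious, here's a rough Python sketch of what that framing looks like, following the field layout in the published AoE specification: the request travels as a bare Ethernet frame carrying AoE's registered EtherType, with no IP or TCP headers attached. The MAC addresses and tag value are placeholders, and the real initiator is of course an in-kernel driver, not user-space Python.

```python
import struct

AOE_ETHERTYPE = 0x88A2   # registered EtherType for ATA over Ethernet
AOE_VERSION   = 1

def aoe_header(shelf, slot, command, tag, flags=0, error=0):
    """Pack the common AoE header that follows the Ethernet header.

    Fields per the published AoE spec: 4-bit version + 4-bit flags,
    error byte, 16-bit major (shelf), 8-bit minor (slot), command byte
    (0 = issue ATA command, 1 = query config) and a 32-bit tag used
    to match responses to requests.
    """
    ver_flags = (AOE_VERSION << 4) | (flags & 0x0F)
    return struct.pack("!BBHBBI", ver_flags, error, shelf, slot, command, tag)

def ethernet_frame(dst_mac, src_mac, payload):
    """Wrap an AoE payload directly in an Ethernet frame -- no IP/TCP layers."""
    return dst_mac + src_mac + struct.pack("!H", AOE_ETHERTYPE) + payload

# Example: a query-config request addressed to shelf 1, slot 2 (placeholder MACs)
frame = ethernet_frame(b"\xff" * 6, b"\x02\x00\x00\x00\x00\x01",
                       aoe_header(shelf=1, slot=2, command=1, tag=0x1234))
print(len(frame), "bytes on the wire, no IP overhead")
```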
Coraid, which has been shipping its EtherDrive storage nodes for several years, long struggled to gain much visibility outside niches in academia, government/military and hosting providers, but appears to be catching on as fully virtualized cloud infrastructure gradually seeps from early adopters to mainstream enterprise IT. In retrospect, it looks like a case of a company (and technology) ahead of its time, coupled with plenty of competitor-incited FUD about a scary new storage protocol.
Coraid's scale-out virtual SAN comes in two pieces. The building blocks are its EtherDrive storage nodes, typical scale-out appliances with 16 to 36 drive bays holding either mechanical disks or SSDs. They use AoE to present raw storage volumes as LUNs to any system running an AoE stack. (Drivers are available for Linux, Windows, OS X, Solaris, VMware and OpenBSD.) But in Coraid's first instantiation, each host was responsible for attaching to LUNs on different storage bricks; in other words, volumes couldn't natively span nodes. Instead, the host had to set up its own RAID stripes across multiple EtherDrives. It was still a nicely distributed system, in that volumes could withstand both multidisk and multinode failures, since the storage nodes were RAID protected internally and the host's RAID set spanned several of them, but it wasn't a fully virtualized SAN.
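To make the host-side arrangement concrete, the toy function below shows only the striping arithmetic: how a logical block lands on one of several node-level LUNs. It's a hypothetical illustration, not Coraid code; a real host would use a redundant RAID level across the bricks, typically via the Linux md or LVM layers on top of the AoE block devices, rather than application code.

```python
def locate_block(logical_block, num_luns, stripe_size_blocks):
    """Map a logical block in a host-assembled stripe set to
    (lun_index, block_within_lun).  Plain striping math for illustration;
    redundancy (mirroring or parity across LUNs) is layered separately."""
    stripe_index   = logical_block // stripe_size_blocks
    offset_in_unit = logical_block %  stripe_size_blocks
    lun_index      = stripe_index % num_luns
    stripe_row     = stripe_index // num_luns
    return lun_index, stripe_row * stripe_size_blocks + offset_in_unit

# Four EtherDrive LUNs, 128-block stripe units:
for lb in (0, 127, 128, 512):
    print(lb, "->", locate_block(lb, num_luns=4, stripe_size_blocks=128))
```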
Coraid eliminated this shortcoming a couple of years ago with the VSX SAN virtualization appliance, a device that creates so-called macro LUNs that stripe across multiple nodes. It includes the features de rigueur for any enterprise SAN, including synchronous mirroring, asynchronous remote replication, cloning, snapshots and thin provisioning. The final piece of the puzzle came when the company introduced management and automation software, EtherCloud, which simplifies deployment to the point of allowing self-service storage provisioning for virtualized applications and can manage petabytes of pooled capacity.
But Coraid hews to the appliance model of storage deployment, in which storage resources are still treated as infrastructure distinct from servers. It's an architecture soundly rejected by cloud-native operators such as Google and Facebook, along with many Hadoop implementations. In massively scaled infrastructure, it makes no sense to leave a server's own drive bays and processing headroom idle, when storage could be just another pooled resource like CPU cycles or memory, only to attach those servers to an independent storage system.
In an era of distributed file systems like Ceph, GlusterFS, HDFS and Lustre, running cloud servers with DAS no longer means the capacity is dedicated to a single host and application. The only problem is that cloud stacks still support only file and object stores (with the caveat that Ceph does offer block storage, though thus far only to guests with the appropriate kernel module installed or to QEMU VMs running on KVM).
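As an aside, the same librbd library that QEMU links against also has Python bindings, which makes it easy to see what Ceph's block layer looks like from user space. The sketch below assumes the python-rados and python-rbd packages are installed and a cluster is reachable via /etc/ceph/ceph.conf; the pool and volume names are placeholders.

```python
import rados
import rbd

# Connect to the cluster and open the (assumed) 'rbd' pool
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("rbd")

# Create a 10 GiB block image and do a raw write -- no file system involved
rbd.RBD().create(ioctx, "demo-volume", 10 * 1024**3)
image = rbd.Image(ioctx, "demo-volume")
image.write(b"hello, block storage", 0)
print(image.size())   # 10737418240

image.close()
ioctx.close()
cluster.shutdown()
```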
Enter ScaleIO, a stealthy startup that hopes to do for storage what OpenStack and CloudStack have done for computing. SANs are hard to manage and harder still to scale, so founder and CEO Boaz Palgi says ScaleIO set about building a SAN without the fabric and dedicated hardware: local disks in commodity servers are stitched together with software that provides all the features storage pros expect. That means high availability and performance, shared block volumes and distributed file systems, plus richer capabilities like snapshots, thin provisioning, disk and node redundancy, self-healing (replace a failed node and it's automatically reincorporated into the storage grid) and even performance QoS, a topic we explored in depth in Network Computing's February digital issue (registration required).
The idea is simple, says Palgi: turn local disk into a SAN, accessible by any server in the data center, that scales to thousands of systems and where adding capacity is as easy as connecting another server to the ScaleIO "hive mind." ScaleIO SANs can also mix and match solid state and mechanical disks in a couple of ways. The simplest is building separate HDD and SSD volumes and binding applications to whichever is most appropriate. Alternatively, Palgi says, the software can use SSDs as a cache in front of HDD-backed SANs.
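Palgi doesn't detail the internals, but the general shape of such a system is easy to picture: a deterministic placement function spreads each volume's chunks, with redundant copies, across whichever servers contribute disks of the requested tier. The sketch below is a hypothetical illustration of that idea (the server names, tiers and hash scheme are all invented), not ScaleIO's actual algorithm.

```python
import hashlib

def place_chunk(volume_id, chunk_index, servers, tier, copies=2):
    """Hypothetical placement for a software-defined block pool: pick the
    servers contributing disks of the requested tier ('ssd' or 'hdd'),
    hash the (volume, chunk) pair onto that list, and keep `copies`
    replicas on distinct machines."""
    pool = [name for name, t in servers if t == tier]
    key = "{}:{}".format(volume_id, chunk_index).encode()
    start = int.from_bytes(hashlib.sha1(key).digest()[:8], "big") % len(pool)
    return [pool[(start + i) % len(pool)] for i in range(copies)]

# Eight commodity hosts, two of which contribute flash
servers = [("server-01", "ssd"), ("server-02", "ssd")] + \
          [("server-{:02d}".format(n), "hdd") for n in range(3, 9)]

print(place_chunk("vol-accounting", 0, servers, tier="hdd"))
print(place_chunk("vol-oltp", 0, servers, tier="ssd"))
```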
It's such a logical concept that one wonders why it's taken so long. Two words: it's hard. Palgi points out that even AWS, the world's premier cloud system, maintains a distinction between its object (S3) and block (EBS) storage services. And he says building cloud-like block storage is particularly difficult, noting that although all cloud providers offer some sort of object store, "almost no one offers an alternative to EBS." Access to ScaleIO SANs is through a kernel driver, with ports to Linux and ESX currently available and Windows to come. Palgi says the company has even been working with Calxeda to provide support on ARM servers.
IT architectural trends are nothing if not cyclical, oscillating between epochs of extreme centralization and hyper-distribution. Storage, as witnessed by EMC's enormous and sustained success, has been living through an era of consolidation, but one that's come at a high price: the cost of buying and managing huge storage systems hasn't fallen nearly as fast as the price of the disks and flash chips inside them.
Cloud infrastructure is likely the catalyst that swings the architectural pendulum back toward distributed storage systems, where server storage bays don't sit empty, adding capacity isn't a moonshot project, and server admins needn't supplicate before storage gurus every time they need to spin up a new application.