Grid Computing's Promises And Perils
What do fad diets and grid computing have in common? They both make enticing claims and promise dramatic results.
February 12, 2004
What do fad diets and grid computing have in common? They both make enticing claims and promise dramatic results. Yes, you can lose weight while you sleep! Yes, you can boost your company's computing power while simultaneously cutting costs! Get something for nothing! To any sensible person, these grand claims don't ring true.
But before you toss grid computing into the same trash bin as fat-burning pills, consider this: Hewitt Associates (www.hewitt.com), a global human resources outsourcer, cut the cost of running a key business application by 90 percent when it switched from a mainframe to a grid. And that's not all: The application now runs faster and more reliably than ever.
"The grid worked out truly better than I expected," says Dan Kaberon, director of computer resource management at Hewitt.
Does this mean you should dial 1-800-GRID right away? Not necessarily. The truth is grid computing does offer significant benefits, but only to organizations that meet particular criteria. For instance, not every application will benefit from the parallel computing offered by a grid. Other factors to consider include security, resource management, and even good old departmental politics.
Read on to find out just what you need to know to make sensible choices about this bleeding-edge technology. We'll start with some background, and then delve into the pros and cons of the grid. We'll also get feedback from real-world grid users. By the time you're through, you'll know if you're ready for grid computing, or if grid computing is ready for you.

WHAT'S A GRID?
Grid computing hooks up geographically or departmentally separate machines to create a "virtual" supercomputer. This virtual machine appears as a single pool of computing resources and possesses the computational muscle to do jobs that individual machines can't. Thus, a computer scientist modeling a gene sequence on a grid in California may not know that the application is using computers in Chicago and Pittsburgh to get its results. All the researcher knows is that a specific amount of compute power is available for the process.
Grid deployments usually follow either a research-oriented or enterprise-oriented track, reflecting the different needs and goals of the two major consumers of grid technology.
Research-oriented grids are cobbled together by universities and labs, often using open-source software and government funding. Research grids connect computing resources and scientific instruments so that participants can share the enhanced computational power and collaborate on experiments. An excellent example is the TeraGrid (www.teragrid.org), which links the computing facilities of five academic and research labs via a 40Gbit/sec optical pipe. The TeraGrid project estimates that it will have up to 20 teraflops of computing power at its disposal by 2004. (A teraflop equals one trillion floating-point operations per second.)
Research grids are complex undertakings. Participants must tinker with communication protocols and software stacks to get different OSs and applications on speaking terms. They must also negotiate service agreements, resource sharing, security frameworks, and so on. Research grids are also expensive: The National Science Foundation has already sunk almost $100 million into the TeraGrid project.
A growing subclass of research grids takes a less expensive approach to distributed computing. Rather than link up a handful of powerful supercomputers, they use thousands of desktop PCs to collectively generate massive amounts of computing power. Sometimes referred to as "scavenging" or "Internet computing," these grids take immense calculations and break them down into tiny components that can be run piecemeal and in parallel on thousands of computers.
A case in point is Purdue University. Using grid-enabling software from United Devices (www.ud.com), Purdue can now squeeze 6.5 teraflops of computing power out of 2,300 campus PCs, says Paul Kirchoff, vice president of marketing at United Devices. Kirchoff compares that to the (seemingly) measly 1.3 teraflops of the campus' resident supercomputer.
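Simple division shows why the scavenging approach scales: 6.5 teraflops spread across 2,300 machines works out to roughly 2.8 gigaflops per PC, a contribution each desktop barely notices.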
Many of these scavenger grids are now enlisting volunteers to "donate" CPU time to a particular cause, such as cancer research or the study of climate change (see "Doing Your Bit(s)"). Volunteers download special software onto their home PCs. When the PC slips into idle mode, the software calls back to a central server for chunks of data to nibble on, and then sends back the finished calculation.
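To make the mechanics concrete, here's a minimal sketch of how such a volunteer client might behave. The server URL, endpoints, and JSON shapes are invented for illustration; real agents from vendors such as United Devices use their own proprietary protocols.

```python
# Minimal sketch of a volunteer "scavenger" client (hypothetical protocol).
# Assumes a coordinating server at GRID_SERVER that hands out work units
# and accepts results; the URL and JSON shapes are illustrative only.
import json
import time
import urllib.request

GRID_SERVER = "http://grid.example.com"  # hypothetical coordinator

def fetch_work_unit():
    """Ask the server for a chunk of data to process."""
    with urllib.request.urlopen(f"{GRID_SERVER}/work") as resp:
        return json.load(resp)  # e.g. {"id": 42, "numbers": [...]}

def compute(unit):
    """Stand-in for the real calculation (e.g., a protein-folding step)."""
    return sum(n * n for n in unit["numbers"])

def submit_result(unit_id, result):
    """Send the finished calculation back to the server."""
    body = json.dumps({"id": unit_id, "result": result}).encode()
    req = urllib.request.Request(f"{GRID_SERVER}/result", data=body,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

while True:
    unit = fetch_work_unit()
    submit_result(unit["id"], compute(unit))
    time.sleep(1)  # real agents also verify that the PC is actually idle
```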
ENTERPRISE GRIDS

Enterprise grids also create virtual supercomputers by tying together distributed resources to appear as one pool of computing power. However, while enterprise grids may be geographically or departmentally separated, they're logically located within the same walls (for example, behind the same firewall). And unlike research grids, the additional capacity that's generated is only available internally to the organization.
To deploy a grid, enterprises can go with an open-source resource like Globus Alliance's (www.globus.org) Globus Toolkit, a set of software and services that enables grids and grid applications, or choose among a growing marketplace of commercial vendors offering hardware, software, and services.
IBM, HP, and Sun Microsystems are the biggest players in the nascent grid market. These companies are "grid-enabling" their hardware to support the commercial and open-source software that turns heterogeneous machines into a grid. These big guns also offer software packages and services to help enterprises get a grid off the ground and manage it once it's running.
The major vendors are also partnering with smaller software providers that make grid middleware. Companies such as United Devices, Avaki, Platform Computing (www.platform.com), and DataSynapse make software agents that reside on different kinds of hardware, including desktops, workstations, servers, compute clusters, and mainframes. Other vendors, such as Entropia and Parabon Computation (www.parabon.com), provide software for grid-enabling PCs only.
These middleware agents communicate with a master server. They report the OS type, available memory and processor speed, and current levels of available compute power. The agents also provide security functions such as authentication, usually via digital certificates.

The master server breaks application data into chunks and parcels them out to available machines. The server tracks the progress of each chunk so that if a machine suddenly becomes unavailable or goes offline, it can redistribute the data to another computer. Scheduling and resource management functions ensure that grid applications don't rob agent machines of the power to perform their primary tasks. Vendors say users often perform their day-to-day tasks unaware that their PC is working behind the scenes on an additional application.
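The bookkeeping the master server performs can be sketched in a few lines. The Python below is illustrative only, with hypothetical names; commercial schedulers are far more sophisticated, but the core cycle of splitting, assigning, and reassigning chunks looks something like this:

```python
# Illustrative sketch of a master server's bookkeeping: split a job into
# chunks, hand each to an available agent, and reassign chunks whose agent
# goes offline. All names are hypothetical, not any vendor's actual API.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Chunk:
    chunk_id: int
    data: list
    assigned_to: Optional[str] = None  # agent currently working on it
    done: bool = False

def split_job(data: list, chunk_size: int) -> List[Chunk]:
    """Break application data into independent chunks."""
    pieces = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [Chunk(n, piece) for n, piece in enumerate(pieces)]

def schedule(chunks: List[Chunk], idle_agents: List[str]) -> None:
    """Hand unfinished, unassigned chunks to idle agents."""
    idle = list(idle_agents)
    for chunk in chunks:
        if not chunk.done and chunk.assigned_to is None and idle:
            chunk.assigned_to = idle.pop()

def handle_agent_offline(chunks: List[Chunk], agent_id: str) -> None:
    """An agent vanished: free its chunks so schedule() can reassign them."""
    for chunk in chunks:
        if chunk.assigned_to == agent_id and not chunk.done:
            chunk.assigned_to = None
```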
At this stage, enterprise grids aren't linked to other grids outside the organization, though proponents foresee a day when business partners and suppliers may link individual grids to enhance collaboration and integration.
GRID BENEFITS
Though grid computing was born in research labs, vendors and enterprise customers are adopting the technology as well. They see grid computing as a way to take disparate OSs and hardware platforms and turn them into a single, virtualized entity that is greater than the sum of its parts.
Grid promoters say there are two business benefits to virtualizing heterogeneous resources through a grid: savings and speed.

Grid computing "saves money on both capital investment and operating costs," says Sara Murphy, marketing manager for grid computing at HP. It delivers this seemingly magical combination by fully using the compute resources of every component in the grid.
Experts say most computers are grossly underutilized. "PCs and Windows servers are about 5 percent utilized; Unix servers are 15 percent utilized," says Dan Powers, vice president of grid strategy at IBM. "Even IBM mainframes are only utilized about 65 percent of the time."
A grid solution "virtualizes all these untapped resources to look like one big computer," says Powers. This windfall in capacity reduces the need to buy new hardware, while at the same time squeezing more Return on Investment (ROI) from the equipment on hand.
The second benefit is speed. By pooling processing power, a grid wrings more performance from an organization's machines, so applications run faster and deliver results more quickly.
"Speed is an enormous benefit," says United Devices' Kirchoff. "If you can do ten times the amount you used to in the same period ... that affects time to market, development cycles, and quality assurance processes."Promoters say the increased speed and computational power also gives customers the much-sought-after quality of agil-ity. "Companies need to deploy resources quickly to respond to new market demands," says Murphy. "It's easier to be agile when you can tap into resources over a grid."
As mentioned earlier, Hewitt's Kaberon is a grid believer. He oversaw a 10-month project at the human resources outsourcing firm to move a pension calculation application from an IBM mainframe to a grid.
Hewitt customers use the application to calculate pension benefits based on a set of variables, such as retirement age. Kaberon says the CPU-intensive application gets as many as 110,000 calculation requests a week.
Because customers use the calculation via a Web site, Kaberon's project risked failing in public. But so far, Kaberon has only met with success. In addition to cutting the cost of running the application by 90 percent, he's also improved performance and reliability. "Everything we do on the grid comes up the same or substantially faster than on the mainframe alone," he says.
Kaberon's grid is a collection of dual-processor IBM blade servers linked to the mainframe. "The mainframe deals with a general set of employee benefit issues, but when the application does a pension calculation, it calls the grid to do that," says Kaberon.

Kaberon admits that the blade servers are dedicated to the pension calculation application alone, but he says certain features make those blades a grid and not just a cluster. First, all the servers are running grid middleware from DataSynapse.
Says Kaberon: "All the protocols we're using are grid protocols. They are different from cluster approaches in that we can easily add many different machines to this configuration. We could throw 100 blades into the grid without reconfiguring substantially. There's no cluster you can do that with."
He also says the company is working on a second application to run over the grid. "We may need to increase the capacity on the grid, but we'll manage it as a single environment," he says.
AVOIDING GRIDLOCK
The benefits of grid computing are real, but grids are still a specialized technology. Grid computing is most suitable for businesses already using high-performance computing, such as financial services firms, pharmaceutical companies, and large-scale designers and manufacturers in, for example, the auto and aerospace industries.

"We don't see it moving outside of its high-performance computing area for a number of years," says Carl Clauch, a vice president at Gartner's research and advisory service. He says the business value isn't there yet to justify a grid solution for functions such as tracking inventory or updating bank accounts.
One reason is the applications themselves, which must be able to take advantage of compute parallelism: the calculations must be divisible into small pieces that can be operated on simultaneously across different processors. If the operation depends on one chunk of data being processed before the operation as a whole can continue, it may not be a job suitable for grid computing.
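A toy example makes the distinction clear. In the Python sketch below, squaring a list of numbers is perfectly divisible, while a running product is not, because each step needs the previous step's result (a local multiprocessing pool stands in for grid nodes):

```python
# Toy illustration of compute parallelism. Squaring a million numbers is
# divisible: each value is independent, so chunks can run simultaneously
# (here on local cores; on a grid, across machines). A running product
# is not: step n needs step n-1's result before it can begin.
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    values = range(1_000_000)

    # Grid-friendly: no chunk depends on another.
    with Pool() as pool:
        squares = pool.map(square, values, chunksize=10_000)

    # Grid-unfriendly: each step consumes the previous step's output.
    running = 1
    for v in range(1, 20):
        running *= v  # must finish before the next iteration can start
```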
In addition, says Clauch, "the many parallel parts of the task must not need a significant amount of interaction, or the general-purpose network connecting the grid together will be the barrier."
However, he notes: "The high-performance computing community has been working for decades to increase parallelism so that the fraction [of tasks] suitable for grid is reasonably large."
Hewitt's Kaberon says converting his pension calculation application from a mainframe to a grid was relatively smooth. "The core application that did all the numeric processing didn't get changed at all," he notes. However, some changes were made in the way the application communicated with a database, and programs had to be written to interface with Web software.

That said, porting the application to a grid was still a lengthy process that involved teams from IBM, DataSynapse, and Hewitt. "IBM got our code in January 2003, and we had our first clients in production on the grid in September 2003," says Kaberon. "The whole project was all-new technology."
Another inhibitor to grid deployment is management and accounting. "As people share resources, they want to account for them effectively," says HP's Murphy. This is especially true if a grid solution spans multiple departments; each department needs assurances that it's getting as much from the grid as it's contributing.
Within accounting, software licensing in particular needs close attention. How should an application vendor charge for an application that may be scattered across hundreds or thousands of devices and used only intermittently? Clauch says software licensing models are still adapting to a grid environment "to be able to dynamically calculate charges based on a very volatile and changing mix of hardware and usage."
This is important because licensing costs can actually cancel out any potential savings from a grid. "Using desktops [in a grid] gives you an incredible amount of horsepower that becomes arbitrarily close to free," says Kaberon. "But it doesn't take into consideration whether you have to license a significant multiple of [an application]. You could spend more on software than on hardware. It's a danger point that people need to explore."
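To put purely hypothetical numbers on that danger: a 1,000-desktop scavenger grid might cost almost nothing in new hardware, but if the application is licensed per machine at, say, $500 a seat, grid-enabling it adds $500,000 in software fees, possibly more than a dedicated cluster would have cost in the first place.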
Other concerns include security. At the top of the list is the threat to intellectual property. "High-performance computing traditionally runs on nodes that are hard to access and tightly controlled," says United Devices' Kirchoff. "But when you connect up non-dedicated resources like your salesperson's PC, you have to worry. You cannot have data flying all over the place that is vulnerable."

Most vendors' products, as well as Globus Alliance's Toolkit, rely on digital certificates to authenticate the devices on a grid. Many solutions also utilize the security frameworks developed for Web services to authenticate agents and encrypt data in transit.
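As an illustration of the certificate mechanism (not any particular vendor's implementation), here's how an agent and master server might authenticate each other using Python's standard ssl module; the file names and host are placeholders:

```python
# Sketch of certificate-based mutual authentication between a grid agent
# and its master server, using Python's standard ssl module. File paths
# and the hostname are placeholders; this shows the mechanism only.
import socket
import ssl

# Server side: demand a certificate from every connecting agent.
server_ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
server_ctx.load_cert_chain("server-cert.pem", "server-key.pem")
server_ctx.load_verify_locations("grid-ca.pem")  # CA that signed agent certs
server_ctx.verify_mode = ssl.CERT_REQUIRED       # reject unauthenticated agents

# Agent side: present our certificate and verify the server's.
agent_ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH,
                                       cafile="grid-ca.pem")
agent_ctx.load_cert_chain("agent-cert.pem", "agent-key.pem")

with socket.create_connection(("master.example.com", 8443)) as raw:
    with agent_ctx.wrap_socket(raw, server_hostname="master.example.com") as tls:
        tls.sendall(b"REGISTER agent-007")  # traffic is now encrypted in transit
```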
Securing the server that manages the computers in a grid is also important. Many times these servers are publicly addressable because they send and receive data from geographically distant agents. This exposes them to unauthorized intrusion and even Denial of Service (DoS) attacks. Experts recommend that enterprises remove all unnecessary services and keep a watchful eye on such machines.
Last but not least is the human factor. Department managers or other local system "owners" may be unhappy about having their computing resources poured into a communal pool that everyone can dip into. "There's a tendency for people to hug their own resources and not want to share," says Murphy. "There has to be an advocate in the organization to encourage sharing."
GRID STANDARDS AND WEB SERVICES
Two forces are leading the development of grid standards: the Globus Alliance, which oversees the Globus Toolkit; and the Global Grid Forum (GGF, www.gridforum.org), which is creating a set of open standards for grid technologies and applications. The GGF includes academics, researchers, and small and big technology companies. The GGF's major efforts include the Open Grid Services Architecture (OGSA) and the Open Grid Services Infrastructure (OGSI). Globus and the GGF have cooperated such that Globus Toolkit 3.0 includes a reference implementation of the OGSA/OGSI standards.

The emerging grid infrastructure also incorporates Web services standards to facilitate communication among heterogeneous resources. Grid proponents expect that Web services mechanisms will become the interface for grid computing. Vendors such as United Devices and DataSynapse already use Web services protocols in their products, including Simple Object Access Protocol (SOAP) for communication and Web Services Description Language (WSDL) to describe services.
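For a flavor of what that Web services plumbing looks like in practice, here's a sketch of a SOAP request a grid client might send. The service name, namespace, and endpoint are invented for illustration; real products define their own WSDL-described interfaces.

```python
# Hedged example of a SOAP call between grid components. The SubmitJob
# operation, namespace, and endpoint are hypothetical, not a real product's.
import urllib.request

envelope = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <SubmitJob xmlns="http://example.com/grid">
      <JobName>pension-calc</JobName>
      <Priority>normal</Priority>
    </SubmitJob>
  </soap:Body>
</soap:Envelope>"""

req = urllib.request.Request(
    "http://grid.example.com/services/scheduler",  # hypothetical endpoint
    data=envelope.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8",
             "SOAPAction": "http://example.com/grid/SubmitJob"})
response = urllib.request.urlopen(req)  # the WSDL would describe this service
```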
GRID IRON VS. BIG IRON
While grid computing may be the next evolution in computer science, don't expect supercomputers to go the way of the dinosaur.
"Grids are not a panacea for every form of high-performance computing," admits United Devices' Kirchoff. "Big iron is not going to go away." That's because there are some calculations that simply can't be performed on anything but a supercomputer. Weather simulation, high-energy physics experiments, and other intense calculations require the dedicated muscle and communication infrastructure of a supercomputer.
Even so, the benefits of grid computing can't be ignored. As the technology matures, as standards coalesce, and as new applications are discovered, grid computing may become as indispensable to the enterprise as the Internet.

As vendors pump up the volume on their grid computing marketing campaigns, you may be tempted to tune it out. Of course, it's always good practice not to believe the hype, but in this case wise networkers shouldn't ignore it either.
Andrew Conry-Murray, contributing editor, can be reached at [email protected].