Most Of Our Benchmarks Are Broken

Howard Marks

December 20, 2011

For years, we in the storage industry have relied on a fairly small set of benchmarks to measure the relative performance of storage systems under different conditions. As storage systems have incorporated new technologies--including data reduction and flash memory as cache or an automated tier--our existing portfolio of synthetic benchmarks is starting to report results that aren't directly comparable to the performance this new generation of storage systems will deliver in the real world.

The most commonly used storage benchmark is IOmeter, originally developed by Intel and, since 2001, an open source project on SourceForge. IOmeter can perform random and sequential I/O operations of various sizes, reporting the IOPS, throughput and latency of the system under test. IOmeter has the virtues of being free and easy to use. As a result, we've developed IOmeter access specifications that mix I/O requests of various sizes with random and sequential access to mimic file, Web and database servers.
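To make that concrete, here is a minimal Python sketch--not IOmeter itself, and not any published profile--of what such an access specification boils down to: a weighted mix of request sizes and random versus sequential operations against a test file. The mix, the percentages and the crude IOPS count are all illustrative assumptions.

```python
import os
import random
import time

# Hypothetical "file server"-style access specification:
# (io_size_bytes, weight, random_access, is_read)
ACCESS_SPEC = [
    (4 * 1024,  60, True,  True),   # 60%: 4 KB random reads
    (8 * 1024,  30, True,  False),  # 30%: 8 KB random writes
    (64 * 1024, 10, False, True),   # 10%: 64 KB sequential reads
]

def run_workload(path, duration_s=10):
    specs = [(size, rnd, rd) for size, _, rnd, rd in ACCESS_SPEC]
    weights = [w for _, w, _, _ in ACCESS_SPEC]
    file_size = os.path.getsize(path)
    seq_offset, ops = 0, 0
    deadline = time.monotonic() + duration_s
    with open(path, "r+b") as f:
        while time.monotonic() < deadline:
            size, random_access, is_read = random.choices(specs, weights=weights)[0]
            if random_access:
                offset = random.randrange(0, file_size - size)
            else:
                offset = seq_offset
                seq_offset = (seq_offset + size) % (file_size - size)
            f.seek(offset)
            if is_read:
                f.read(size)
            else:
                f.write(b"\x5a" * size)  # constant fill pattern; see the caveat below
            ops += 1
    print(f"{ops / duration_s:.0f} IOPS (buffered I/O, illustration only)")
```

Note that this toy uses buffered file I/O, so it mostly exercises the host's page cache; a real tool issues direct, unbuffered IO with many outstanding requests.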

After years of hearing application vendors tell us that the impact of storage system cache should be minimal, we adjusted our test suite to measure actual disk performance, minimizing the impact of the storage system's RAM cache. Since RAM caches, even today, are just a few gigabytes, simply running the benchmark across a data set, or volume, at least several times the size of the cache would ensure we weren't mistaking a fast cache for a fast storage system.
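The arithmetic behind that rule of thumb is simple: when requests are spread uniformly over the volume, the steady-state cache hit rate can't exceed the ratio of cache size to data set size. A two-line sketch, with illustrative sizes:

```python
# Illustrative sizes: a 4-Gbyte RAM cache in front of a 32-Gbyte test volume.
cache_gb, dataset_gb = 4, 32
print(f"max hit rate under uniform access: {cache_gb / dataset_gb:.1%}")  # 12.5%
```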

Once we start testing storage systems that use flash as a cache or an automated storage tier, the system will no longer provide consistent performance across the test data set. Instead, when running real applications, some portions of the data, like indexes, will be "hot" and served from flash, while other portions of the data set, like transaction logs or sales order line item records, will be accessed only once or twice. These cooler data items will be served from disk.

The problem is that when IOmeter does random IO, its requests are spread evenly across the volume being tested. Unlike real applications, IOmeter doesn't create hot spots. As a result, IOmeter results won't show as significant a performance boost from the addition of flash as real-world applications will see.
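A quick simulation shows how big the gap can be. The sketch below runs a uniform request stream and a skewed one--90% of requests aimed at 10% of the blocks, an illustrative assumption rather than a measured workload--through an LRU cache sized at one-tenth of the volume:

```python
import random
from collections import OrderedDict

BLOCKS = 100_000                 # volume size, in blocks
CACHE_BLOCKS = BLOCKS // 10      # flash cache holds 10% of the volume
REQUESTS = 500_000

def hit_rate(offsets):
    """Fraction of requests served from a simple LRU cache."""
    cache, hits = OrderedDict(), 0
    for block in offsets:
        if block in cache:
            hits += 1
            cache.move_to_end(block)
        else:
            cache[block] = None
            if len(cache) > CACHE_BLOCKS:
                cache.popitem(last=False)   # evict least recently used
    return hits / REQUESTS

def uniform():
    # IOmeter-style: every block equally likely
    for _ in range(REQUESTS):
        yield random.randrange(BLOCKS)

def skewed(hot_fraction=0.10, hot_weight=0.90):
    # Hypothetical hot spot: 90% of requests go to 10% of the blocks
    hot = int(BLOCKS * hot_fraction)
    for _ in range(REQUESTS):
        if random.random() < hot_weight:
            yield random.randrange(hot)
        else:
            yield hot + random.randrange(BLOCKS - hot)

print(f"uniform stream hit rate: {hit_rate(uniform()):.1%}")  # near 10%
print(f"skewed stream hit rate:  {hit_rate(skewed()):.1%}")   # far higher
```

The uniform stream's hit rate lands near the cache-to-volume ratio, while the skewed stream is served largely from cache--exactly the benefit a flash tier delivers to real applications and IOmeter never shows.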

To get meaningful results from a hybrid storage system, our benchmarks need to access the storage the way real-world applications do. Benchmarks like TPC-C and SPECsfs are based on IO traces from real-world users and applications, so they create hot and cold areas in their test data. This means their results should correlate more closely to real-world performance than IOmeter's do. The problem is that these benchmarks are expensive to acquire and to run, so vendors tend to report results only for specially tuned high-end storage systems that use large numbers of small disk drives and other configuration options that are rare in the real world.

If that weren't depressing enough, even the most sophisticated benchmarks write the same, or random, data to create their entire data set. While disk drives, and most SSDs, perform the same regardless of the data you write to them, the same can't be said about storage systems that include data reduction technology such as compression or data deduplication. If we test a storage system that does inline deduplication--like the new generation of all-solid-state systems from Pure Storage, Nimbus Data or SolidFire--and use a benchmark that writes a constant data pattern all the time, the system will end up storing a 100-Gbyte test file in just a few megabytes of memory, eliminating pretty much all IO to the back-end disk drives, or flash, and delivering literally unreal performance numbers.
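You can watch that collapse happen in a few lines of Python. This sketch counts distinct 4-Kbyte blocks by hash, which is roughly what an inline-dedupe array actually has to store; the block and file sizes are illustrative:

```python
import hashlib
import os

BLOCK = 4096
SIZE = 100 * 1024 * 1024  # a "100 MB" test file, scaled down for the demo

def unique_blocks(data):
    """Count distinct 4 KB blocks -- what inline dedupe ends up storing."""
    return len({hashlib.sha1(data[i:i + BLOCK]).digest()
                for i in range(0, len(data), BLOCK)})

constant = b"\x00" * SIZE      # what a constant-pattern benchmark writes
realistic = os.urandom(SIZE)   # stand-in for unique, real-world data

print(f"constant pattern: {unique_blocks(constant)} unique blocks")   # 1
print(f"unique data:      {unique_blocks(realistic)} unique blocks")  # 25,600
```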

Our friendly competitors at Demartek recently posted hex dumps of the data files created by several popular benchmarks, so you can see firsthand how bad the problem is.

Creating a benchmark that stores realistic data in realistic locations is a major undertaking. The benchmark would have to read data from a repository of some kind and write it to the system under test. To generate enough traffic to make an enterprise storage array with 500 Gbytes of flash breathe hard, we'll need several servers working in concert, reading their source data from a storage system that can deliver sequential reads at least as fast as the system under test can absorb the benchmark's writes. I sure hope someone comes up with a good one soon.
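One plausible building block for such a benchmark, sketched here under stated assumptions: pre-generate a pool of unique, incompressible blocks and sample the write stream from it. The pool size then becomes a knob that sets the best-case dedupe ratio the system under test can achieve, instead of the benchmark accidentally handing it an unlimited one. The target path, pool size and block size below are all hypothetical.

```python
import os
import random

BLOCK = 4096
POOL_BLOCKS = 25_000   # ~100 MB of unique source material (illustrative)

# The expensive part described above: real source data has to come from
# somewhere. Here we fake a repository with random bytes generated up front.
pool = [os.urandom(BLOCK) for _ in range(POOL_BLOCKS)]

def benchmark_stream(total_bytes):
    """Yield write payloads sampled from the pool. Writing more than the
    pool holds reuses blocks, so the achievable dedupe ratio is roughly
    total_bytes / (POOL_BLOCKS * BLOCK) -- a chosen knob, not an accident."""
    for _ in range(total_bytes // BLOCK):
        yield random.choice(pool)

with open("/tmp/testfile", "wb") as f:   # hypothetical system under test
    for payload in benchmark_stream(1 * 1024 * 1024 * 1024):  # write 1 GB
        f.write(payload)
```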

Disclaimer: SolidFire is a client of DeepStorage.net, and Tom from Nimbus Data let me sit in his Lamborghini at SNW. DeepStorage.net and Demartek provide similar services, so I hate giving them a plug.

About the Author

Howard Marks

Network Computing Blogger

Howard Marks is founder and chief scientist at DeepStorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M., concentrating on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage systems, networks, management systems and Internet strategies at organizations including American Express, J.P. Morgan, Borden Foods, U.S. Tobacco, BBDO Worldwide, Foxwoods Resort Casino and the State University of New York at Purchase. The testing at DeepStorage Labs is informed by that real-world experience.

He has been a frequent contributor to Network Computing and InformationWeek since 1999 and a speaker at industry conferences including Comnet, PC Expo, Interop and Microsoft's TechEd since 1990. He is the author of Networking Windows and co-author of Windows NT Unleashed (Sams).

He is co-host, with Ray Lucchesi, of the monthly Greybeards on Storage podcast, where the voices of experience discuss the latest issues in the storage world with industry leaders. You can find the podcast at http://www.deepstorage.net/NEW/GBoS
