The Costs of Long Term Storage
I thought I could put together a spreadsheet to calculate the TCO of various storage techniques for a set of data. The more I thought about it the more I realized that even calculating a first approximation of the TCO to store data over the long term involved too many factors for one simple spreadsheet.
June 15, 2009
As I've been thinking about the archive problem, and writing this now-endless series of blog posts, I thought I could put together a spreadsheet to calculate the TCO of various storage techniques for a set of data. The more I thought about it, the more I realized that even calculating a first approximation of the TCO to store data over the long term was never going to get done by deadline. Instead, I've decided to wimp out and keep it general, examining the cost elements and leaving actual calculations for later.
First, let's take the NAS with retention enforcement model, like a NetApp filer with SnapLock. While a storage manager planning to use this type of solution probably takes into account the obvious items, like a second NAS at another location, replication software, data center floor space at $300/sq ft., 15% maintenance a year and power, they may be in for a rude surprise in year 5-7 when their NAS vendor announces their NAS is end of life.
Now they have to migrate all their data to a new storage system, or, if they're in an industry like Pharma or Wall Street where government regulations require their data to be maintained in a non-modifiable, undeletable system, they have to pay a very expensive someone to migrate the data and submit all sorts of documentation that it wasn't modified in the migration process. Now, as a consultant, I like professional services engagements, but you may have better things to do with your money.
So if you're looking at the NAS model, figure on 2-3 migrations and new systems over 20 years. The good news is that each time, the new system should cost about half as much as the one it replaces and, even better, the maintenance will go down too.
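Just to show how those pieces interact, here's a back-of-the-envelope sketch in Python. The purchase price, migration services fee and refresh interval are placeholders I made up for illustration, not quotes from any vendor; the point is only to see how the halving hardware price plays against maintenance and migration charges over 20 years.

```python
# Rough 20-year sketch of the NAS model (all dollar figures are assumptions).
initial_price = 500_000      # assumed price of the first NAS pair
maintenance_rate = 0.15      # 15% of the current system price per year
refresh_interval = 7         # assumed years between forced refreshes
migration_services = 75_000  # assumed cost per validated compliance migration
years = 20

total = 0.0
price = initial_price
for year in range(years):
    if year == 0:
        total += price                       # buy the first system
    elif year % refresh_interval == 0:
        price /= 2                           # replacement costs about half as much
        total += price + migration_services  # new system plus the migration project
    total += price * maintenance_rate        # maintenance every year, for simplicity

print(f"20-year hardware + maintenance + migrations: ${total:,.0f}")
```

With a 7-year refresh you get two migrations in 20 years, which lands in the 2-3 range above; shorten the refresh interval and the migration line item starts to dominate.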
RAIN systems like Hitachi's HCAP, Permabit or NEC's Hydrastor make the inevitable hardware replacement significantly less painful. Five years from now, when your vendor stops supporting today's nodes that use 1 TB drives, you can add new nodes with 8 TB or 16 TB drives to your cluster/grid and tell the system you want to remove the old ones. After a day or 20, the system will have moved all the data to the new nodes and sent you a message that the old ones are ready for the dump.
The compression, deduplication and drive spin-down will reduce the floor space and power requirements but not eliminate them. Plus, you'll still need two clusters in different locations to really protect your data.
Storage-on-the-shelf systems using tape, Blu-ray discs or ProStor's RDX technology, which is the only removable hard disk system to have any real engineering behind it, can greatly reduce the storage cost since storage on the shelf doesn't take power or generate heat. In addition, media vendors haven't figured out how to make us pay maintenance on tapes on the shelf.
A high density deduped storage system will store about 75 TB/sq ft when you account for the fact that only about 35% of a typical data center is actually used for rack space. Aisles, UPSes, PDUs and the like take up the rest. So space alone is a quarter to a buck a TB a month depending on density.
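As a sanity check on that range, here's the arithmetic, assuming the 75 TB/sq ft figure is the effective density after the 35% utilization haircut and that the $300/sq ft is an annual, fully loaded space cost (both assumptions on my part):

```python
# Floor-space cost per TB for deduped disk in the data center.
effective_density_tb_per_sqft = 75   # effective density after the 35% utilization factor
space_cost_per_sqft_year = 300       # assumed annual, fully loaded $/sq ft

cost_per_tb_month = space_cost_per_sqft_year / effective_density_tb_per_sqft / 12
print(f"Space cost: ${cost_per_tb_month:.2f}/TB/month")
```

That works out to about $0.33/TB/month, the low end of the quarter-to-a-buck range; lower density or pricier floor space pushes it toward the high end.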
When some of the vendors that make disk-based archiving systems, and have been commenting on this series (for which I thank them), calculate TCO for tape systems, they assume archive tapes are treated like backup tapes, with the courier from Recall or Iron Mountain coming every day and taking the only copy of the tapes to the warehouse. This is of course a VERY labor-intensive process and makes accessing the archive data difficult, so they get to figure in courier costs, lots of operator costs and lost productivity costs when the users wait for tapes to come back from the warehouse.
When I talk about removable media archives, I'm thinking of a substantially different model that writes data to two pieces of media in the primary location and replicates to a second system that writes to a third. The warehouse or salt mine is a fourth copy, and if you're paranoid enough to make it, you're the type to be spooling off to tape from your Centera for the salt mine too.
In the primary location the most active data is still on spinning disks, with MAID and dedupe please, a second tier with at least one copy in the robotic library, and the deep archive on high-density shelving down the hall from the data center, NOT miles away. The archive software and/or B&L's Vertices tracks all the media in and out to reduce the misplaced media problems.
That media on the shelf can achieve data densities over 200 TB/sq ft in $35/sq ft office space as opposed to $300/sq ft data center space. Now you will need an operator to mount media, but fetching from down the hall 2-3 times a day isn't a full-time job, and since it's down the hall the productivity cost is much less than recalling boxes from the warehouse.
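The same arithmetic shows how wide the gap is, again assuming both $/sq ft figures are annual, fully loaded costs:

```python
# Floor-space cost per TB: shelf media in office space vs. deduped disk in the data center.
shelf_density_tb_per_sqft = 200   # high-density shelving
shelf_space_cost = 35             # assumed $/sq ft/year office space
disk_density_tb_per_sqft = 75     # effective deduped disk density (from above)
disk_space_cost = 300             # assumed $/sq ft/year data center space

shelf = shelf_space_cost / shelf_density_tb_per_sqft / 12
disk = disk_space_cost / disk_density_tb_per_sqft / 12
print(f"Shelf: ${shelf:.3f}/TB/month   Disk: ${disk:.2f}/TB/month")
```

Call it a penny and a half per TB per month for shelf space versus a third of a dollar for data center floor, before you even count power and maintenance.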
The simplest model to calculate is cloud archiving. Get a quote for a fixed rate per GB/month from your cloud vendor of choice, pay the extra to have them store your data in multiple data centers -- after all, you'd do that if you were storing it yourself -- and pay for a big fat internet connection and you're in business. Clearly the numbers start to add up over 20 years, but boy the headache factor is small. Details will have to wait till I have time to do the full calculation.
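Pending that full calculation, a quick sketch shows why the numbers add up. The rate and archive size below are assumptions for illustration, not anybody's actual pricing:

```python
# 20-year cloud archiving bill at a flat per-GB rate (all figures assumed).
rate_per_gb_month = 0.15    # assumed $/GB/month including multi-data-center copies
archive_gb = 100 * 1000     # assumed 100 TB archive, held flat for simplicity
years = 20

total = rate_per_gb_month * archive_gb * 12 * years
print(f"20-year cloud bill for a flat 100 TB archive: ${total:,.0f}")
```

Even with the archive never growing, that's several million dollars over 20 years; the trade is dollars for not having to own, migrate or babysit anything.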