The Difference between Block and File Data and Why It Matters
Block storage arrays are designed to enable applications like Oracle or Microsoft SQL Server to directly access the storage drive itself without an OS inserting itself and adding a layer of management.
March 2, 2023
Whatever data you’re looking for, it’s probably in the cloud. More than 70% of companies have migrated at least some workloads onto the public cloud, according to Gartner, and that number will only increase over time. This underscores how critical the cloud is in today's workplace—but first, you have to move your data there. Oracle reports that 83% of data migration projects either fail or exceed their budgets and schedules.
It’s clear there is a common struggle to do data migration right, and not knowing the details of the data you’re moving is a big reason why. To date, the majority of the data shifted to the cloud has been file data. But there is another important type of data that is more complex and under-discussed in cloud migration: block data. In this post, we’re going to introduce the concept of block data, show how it differs from file data, and explain why knowing what kind of data you have is key to moving it properly.
What is block data, and how is it different from file data?
Block data refers to your mission-critical databases and applications: structured data that is owned directly by applications rather than managed by an operating system. Block storage arrays are designed to let applications like Oracle or Microsoft SQL Server access the storage drive directly, without an OS inserting itself and adding a layer of management. As a result, block data is generally faster to work with than file data, because the application reads and writes the datastore without that extra layer in the way.
In contrast, file data is unstructured data that is given structure by an OS to make it easier to manage. With file data, everyone can use the OS to see where data resides, so users don't have to know exactly where data is stored. At the same time, the OS has to reserve memory to read off a drive, organize it, and show it to a user. Because this uses CPU cycles and memory, file data requires more computing resources than block data—essentially trading performance to deliver greater manageability.
Here's how to visualize this difference: Imagine block data is a massive amount of freight that needs to be shipped, and file data is a mailbag full of individual letters. To get the freight to the right location, all the boxes are packaged together to ensure they all arrive at the same warehouse. The shipper simply looks at the address, knows the right destination, and moves the freight together. Migrating the file data mailbag is a different process. A carrier needs to sort through individual pieces, read the zip codes and addresses, and send each item where it needs to go. Both freight shipping and postal mail are important—but you wouldn’t ask a mailman to transport a shipping container, nor would you hire a logistics company to deliver your holiday cards.
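For readers who think in code, here is a minimal Python sketch of the two access models described above. It is an illustration only: a temporary file stands in for a raw volume (so the OS is still involved in this demo), and the 4 KB block size and the names read_block, read_file, and demo are assumptions made for this example rather than part of any product.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size in bytes; real devices vary


def read_block(volume_path, block_number, block_size=BLOCK_SIZE):
    """Block-style access: the caller works out the byte offset itself."""
    with open(volume_path, "rb") as volume:
        volume.seek(block_number * block_size)
        return volume.read(block_size)


def read_file(directory, name):
    """File-style access: the caller gives a name and the OS finds the bytes."""
    with open(os.path.join(directory, name), "rb") as f:
        return f.read()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # A plain file stands in for a raw volume made of 8 blocks.
        # Block n is filled with the byte value n so it is easy to recognize.
        volume_path = os.path.join(tmp, "volume.img")
        with open(volume_path, "wb") as volume:
            for n in range(8):
                volume.write(bytes([n]) * BLOCK_SIZE)

        # An ordinary named file, like one letter in the mailbag.
        with open(os.path.join(tmp, "letter.txt"), "wb") as f:
            f.write(b"hello")

        block = read_block(volume_path, 3)      # addressed by position
        letter = read_file(tmp, "letter.txt")   # addressed by name
        return block, letter


if __name__ == "__main__":
    block, letter = demo()
    print(len(block), block[0], letter)
```

The point of the sketch is the addressing: read_block needs to know where the data lives, while read_file only needs to know what it is called.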
Why is block data important to cloud migration?
Migrating data, simply put, is moving data from old infrastructure to new. And just as freight and postal mail should be delivered differently, migrating block data requires a different solution than migrating file data. Otherwise, your business could become one of the 83% whose data migration projects fail or run over budget and schedule.
Block data migration needs to be fast and efficient. It is mission-critical because block data deals with databases and applications that are essential to your business operations. Ideally, your data migration needs to move that block data with zero downtime, so your business applications stay online. At the same time, this represents an opportunity to expedite your cloud migrations. By moving databases and applications together as a block, the applications and databases remain coordinated, and you reduce business disruption.
How should we think about block data in our data migration strategy?
Now that you have a grasp of what block data is and how it needs to be migrated, here are three key questions to ask as you choose your data migration solution:
What kind of data do I need to migrate: block, file, or both?
When most cloud migration vendors talk about data, they assume it's all file data that requires an OS to structure it. But as you now know, not all data is created equal. By examining how your data is accessed on your existing infrastructure, you can know what you're moving and how best to do so. This will result in a smoother and faster data migration.
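One practical way to examine how data is accessed is to look at the protocol each volume or share uses. The Python sketch below assumes a simple inventory of (name, protocol, size in GB) entries. The inventory, the function names, and the protocol lists are illustrative assumptions and are not exhaustive: iSCSI, Fibre Channel, FCoE, and NVMe-oF are treated as block protocols, and NFS, SMB, and CIFS as file protocols.

```python
# Protocol lists are a starting point, not a complete catalog.
BLOCK_PROTOCOLS = {"iscsi", "fc", "fcoe", "nvme-of"}
FILE_PROTOCOLS = {"nfs", "smb", "cifs"}


def classify(protocol):
    """Return 'block', 'file', or 'unknown' for an access protocol name."""
    p = protocol.strip().lower()
    if p in BLOCK_PROTOCOLS:
        return "block"
    if p in FILE_PROTOCOLS:
        return "file"
    return "unknown"


def summarize(inventory):
    """Total the capacity (in GB) per data type.

    inventory: iterable of (name, protocol, size_gb) tuples.
    """
    totals = {"block": 0, "file": 0, "unknown": 0}
    for _name, protocol, size_gb in inventory:
        totals[classify(protocol)] += size_gb
    return totals


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    inventory = [
        ("erp-db", "iSCSI", 2000),
        ("home-dirs", "NFS", 800),
        ("archive", "S3", 100),
    ]
    print(summarize(inventory))
```

Anything that lands in the unknown bucket is worth a closer look before you pick a migration approach.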
What solution should I use to move my data?
Once you know whether you need to move block data or file data, you're better equipped to choose the right data migration solution. The established vendors in block data include HPE, IBM, Dell Technologies, and Pure Storage; in the file data market, you're usually looking at NetApp and NAS (network-attached storage) vendors, many of whom also do block data storage (like IBM and HPE). There are also unified arrays that handle both block and file in their environments. Regardless of whom you choose, be sure your vendor knows the type of data you need to move and how to handle it with care. Another word of caution: Don't put off planning how you'll move your data to your new storage environment until after you purchase your storage. Planning upfront for your migration ensures you have a successful strategy that accounts for the nuances of your environment.
Once I determine whether my data is block or file, are there other data type questions to consider?
Yes. Even within the categories of block and file data, not all data is the same. Block data is likely the lifeblood of your organization, so you need to examine details of its location, type, and go-forward strategy. It's also not uncommon to have your block data virtualized by a vendor like VMware, which is the leader in this market. When migrating virtualized block data from one environment to another, it can be tempting to use free tools from your vendor, but this process will be very slow if you have a lot of data. This is because such tools typically migrate only one or two VMs at a time. Be sure to consider the amount of block data you need to move, the percentage that is virtualized, and your timeframe before choosing your migration strategy.
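To see why the number of VMs moved at once matters, here is a rough back-of-the-envelope estimate in Python. It is a simplification under stated assumptions: every VM is the same size, each migration stream sustains the same throughput no matter how many run in parallel, and VMs are copied in batches. The function name and all the numbers are hypothetical.

```python
import math


def estimate_migration_hours(vm_count, avg_vm_gb, gb_per_hour_per_stream,
                             concurrent_streams):
    """Rough wall-clock hours when VMs are copied in fixed-size batches."""
    if concurrent_streams < 1 or gb_per_hour_per_stream <= 0:
        raise ValueError("need at least one stream and a positive throughput")
    # Number of batches needed to get through every VM.
    batches = math.ceil(vm_count / concurrent_streams)
    # Every VM in a batch takes the same time under the assumptions above.
    hours_per_batch = avg_vm_gb / gb_per_hour_per_stream
    return batches * hours_per_batch


if __name__ == "__main__":
    # Made-up environment: 200 VMs of 500 GB each, 250 GB/hour per stream.
    for streams in (2, 20):
        hours = estimate_migration_hours(200, 500, 250, streams)
        print(f"{streams} concurrent VMs: about {hours:.0f} hours")
```

With these made-up numbers, moving two VMs at a time takes about 200 hours, while moving 20 at a time takes about 20. Real throughput rarely scales this cleanly, so treat the result as a planning aid rather than a prediction.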
Now that you grasp the difference between block data and file data, you’re ready to have a smarter conversation about data migration.
Mark Greenlaw is Vice President of Market Strategy at Cirrus Data.