Data Overload: Keep It Simple
XML allows many programs to speak one language. But we pay for this flexibility in complexity and verbosity.
November 7, 2003
Take into account the fact that we're transmitting and storing more types of content in a digital format, and it's obvious that IT folks have a lot to think (and worry) about. Much of this new information is coming in large chunks--voice and video aren't terse media. A picture is worth a thousand words, or upward of 1 MB of data storage.
There are some instances when data chunks are necessarily large. But in other cases, could smaller be better? Are we adding to the information overload without deriving equivalent benefit?
Some of you may remember the old joke about becoming an experienced software developer. A beginner could write a program that said, "Hello, World!" in just a couple of lines of code. An expert, though, could use arcane Unix system calls to make a 300-line behemoth that would say ... "Hello, World!" Newbie developers everywhere longed to be able to write the behemoth because it was so much cooler.
Not Out of Room Yet
The good news is, hard drives are growing faster than we can put data on them. But there are some snags. First, networks must handle these new types and amounts of load. Second, people need to make sense of all this extra information--they need metadata that describes and defines content, for networks, applications and end users.

The W3C has been working on ways to make the World Wide Web more semantically useful, and XML and its brethren have come a long way in helping us do just that. XML is a wonderful, flexible solution that allows many different types of programs to speak a common language. But we pay a price for this flexibility in complexity and verbosity. (Some of my colleagues think we use XML too often. See BuzzCut, page 30, for Don MacVittie's take.)
Network Computing's editors occasionally get into e-mail rants about technology problems, and while these rants don't usually make it into the magazine, they can be eye-openers. A recent discussion centered on the ways that XML can be used--and the well-known fact that because it's wordy, it can be inefficient when stored or transported across networks. The IETF's IDMEF (Intrusion Detection Message Exchange Format) came up as a specific example: Does information that could easily be parsed from plain-text syslog files really need to be marked up in the wordier XML format? Are we working too hard to make simple things complex?
Intrusion-detection systems generate lots of events very quickly, so the speed at which these events can be created and transmitted is key. Hence, spending extra CPU cycles or lots of superfluous bytes generating information is unintuitive at best.
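To put a rough number on that overhead, here is a minimal Python sketch that renders one made-up event two ways and compares the byte counts. Everything in it is an assumption for illustration: the event fields, the key=value line layout and the XML element names are hypothetical, and the XML is deliberately not the real IDMEF schema, which is considerably richer.

```python
import xml.etree.ElementTree as ET

# Hypothetical event fields, invented for this comparison (not from a real IDS).
event = {
    "time": "2003-11-07T12:00:00Z",
    "host": "ids01",
    "source": "192.0.2.10",
    "target": "192.0.2.20",
    "signature": "port-scan",
}


def as_text_line(ev):
    """Render the event as a single syslog-style line of key=value pairs."""
    return (
        f'{ev["time"]} {ev["host"]} ids: '
        f'src={ev["source"]} dst={ev["target"]} sig={ev["signature"]}'
    )


def as_xml(ev):
    """Render the same event as XML using illustrative, non-IDMEF element names."""
    root = ET.Element("Alert")
    for name, value in ev.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


# Compare the encoded size of each representation of the same event.
text_bytes = len(as_text_line(event).encode("utf-8"))
xml_bytes = len(as_xml(event).encode("utf-8"))
print(f"text line: {text_bytes} bytes, XML: {xml_bytes} bytes")
```

In this toy case the markup costs more bytes than the data it wraps. What the XML version buys in exchange is that any standard parser can read it back without a custom format description, which is exactly the trade-off in question.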
In a recent column, I discussed a law that shows how perceived complexity can be increased by abstractions--a take on Occam's Razor. XML can be part of the solution to this large data problem we have. But there are times when it might not be the best tool for the job: If you're trying to drive a small nail, you don't need a big hammer.
- Mike Lee