Crash Course: XSLT and XPath in Your Organization

To make the most of company data, you must keep up with the changes in its flow--among users, applications and systems. An understanding of XSLT and XPath is essential to

July 13, 2006

The lifeblood of any enterprise IT environment is the data that moves among users, applications and external systems. As the environment changes, user needs and application demands force us to adjust the nature of these data flows.

The Extensible Stylesheet Language recommendations from the World Wide Web Consortium (W3C) describe two useful technologies that offer a mechanism to adapt these data flows to both our needs and the needs of our users: XSL Transformations provide a language to transform XML documents, and XPath supplies a tool to identify particular elements in an XML document. Combining XSLT and XPath with a programming language that implements them is an affordable way for small businesses to deploy cutting-edge technologies.

The XSLT/XPath combo offers an excellent solution to a specific need common to almost every IT organization: maximize the value of data by ensuring that it is always structured so users and applications can derive the most use from it. XSLT 1.0 and XPath 1.0 have been available since November 1999, and the W3C is in the process of revising both standards extensively. In fact, as this article was being written, both XSLT 2.0 and XPath 2.0 were at the Candidate Recommendation stage.

While your development staff may choose to adopt XSLT and XPath, there is another way these technologies will be encountered: XSLT and XPath underpin many enterprise applications that process XML data structures.

Oracle and Sun have both adopted XSLT in their products. Oracle's XML Developer's Kit Version 10g supports XSLT 1.0, as well as the Candidate Recommendation version of XSLT 2.0. Sun has made XSLT an integral part of its J2EE technologies. The Sun JAXP 1.3 API includes an XSLT implementation derived from the Apache Software Foundation's Xalan-J project. Many other companies have made XSLT implementations available, including Microsoft, with its XML Core Services, used by Internet Explorer 6 and .Net Framework. The popular Xalan XSLT processor has its roots in IBM's LotusXSL. Although LotusXSL has been released into the open-source community, IBM continues to play a significant role both in the development of Xalan and in the creation of XSL recommendations within the W3C.

The Basics

The stylesheet is the basic component of XSLT; it defines the mapping from an XML document to a destination document, which could be in XML, HTML or another text format. This mapping is often referred to as a "transformation." The word "stylesheet" is a bit of a misnomer, as it does not relate to "style"; rather, a stylesheet provides a set of instructions to the XSLT processor about how to convert the source document's data into the destination document's structure.

XML tags within the stylesheet define the translations that build the destination document from the desired parts of the source document. XSLT contains familiar programming constructs such as "if" and "choose," which allow the developer to place conditions on what appears in the output document. XSLT also contains a powerful template processing mechanism that is triggered when a described pattern is found in the source document. For example, in a retail environment, for a source document that lists items sold, you could create templates for each item type to ensure they are processed in the correct manner.

XPath plays an important supporting role to XSLT. XSLT uses XPath expressions within its processing tags to identify elements from the source document. XPath is like a road that leads to a particular file on your hard drive, but instead of locating a file, XPath is used to identify a particular element within the source document.
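To make the retail example concrete, here is a minimal sketch of what such a stylesheet might look like. The element and attribute names (sales, item, type, receipt, line) are hypothetical and not taken from any real system. The stylesheet is wrapped in a short Python script only so it can be checked for well-formedness; Python's standard library does not include an XSLT processor, so the script lists the template patterns rather than running the transformation.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical XSLT 1.0 stylesheet: one template per item type in a sales document.
STYLESHEET = """\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Matches the root element and hands each item to whichever template fits it. -->
  <xsl:template match="/sales">
    <receipt>
      <xsl:apply-templates select="item"/>
    </receipt>
  </xsl:template>

  <!-- Triggered only for items whose type attribute is 'grocery'. -->
  <xsl:template match="item[@type='grocery']">
    <line taxable="no"><xsl:value-of select="name"/></line>
  </xsl:template>

  <!-- Triggered only for items whose type attribute is 'clothing'. -->
  <xsl:template match="item[@type='clothing']">
    <line taxable="yes"><xsl:value-of select="name"/></line>
  </xsl:template>

</xsl:stylesheet>
"""


def template_patterns(stylesheet_text):
    """Return the XPath match pattern of every xsl:template, in document order."""
    root = ET.fromstring(stylesheet_text)
    return [t.get("match") for t in root.iter("{%s}template" % XSL_NS)]


if __name__ == "__main__":
    for pattern in template_patterns(STYLESHEET):
        print(pattern)
```

Each match attribute is an XPath pattern, which is where XPath's supporting role shows up: the processor, not the stylesheet author, decides which template fires for each item.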

XPath expressions can become fairly complicated; for example, one could say, "Find for me in the document the first account for the customer whose name attribute is John Doe." XPath expressions can also be used outside of XSLT stylesheets to search XML documents in many programming languages.
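The "John Doe" query can be sketched outside of XSLT with Python's standard library, whose ElementTree module implements a limited subset of XPath. The document layout below (customers, customer, account, and the id attribute) is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical customer data; the element and attribute names are illustrative.
DOCUMENT = """\
<customers>
  <customer name="Jane Roe">
    <account id="A-100"/>
  </customer>
  <customer name="John Doe">
    <account id="B-200"/>
    <account id="B-201"/>
  </customer>
</customers>
"""


def first_account_id(xml_text, customer_name):
    """Return the id of the first account of the named customer, or None."""
    root = ET.fromstring(xml_text)
    # [@name='...'] filters customers by attribute; [1] keeps the first account child.
    # Simplification: a name containing a single quote would break this expression.
    account = root.find("./customer[@name='%s']/account[1]" % customer_name)
    return None if account is None else account.get("id")


if __name__ == "__main__":
    print(first_account_id(DOCUMENT, "John Doe"))  # prints B-200
```

A full XPath 1.0 implementation accepts richer expressions than ElementTree does, but the shape of the query is the same: a path through the hierarchy, narrowed by predicates in square brackets.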

Practical Advice

There are several ways XSLT and XPath could be used in an IT environment. The most common and simplest use of XSLT is to map one XML document to another. You may find that you'd like to import data into a new application, but your current XML-structured message is unsuitable. Rather than changing the structure of the existing message, a stylesheet in conjunction with an XSLT processor could be used to create a message with a new structure. XSLT could also be useful when an external client provides XML data to you; in that situation, stylesheets could be created to map the elements from the client's data into the standard your applications recognize.

XSLT also has a role to play in the presentation layer, as stylesheets can be used to construct HTML from data contained in an XML document. In this fashion, XSLT can be used to dynamically generate HTML from any data stored in XML in your system. Different user groups could have different stylesheets displaying data in a form most useful to them. Adjusting the appearance of the data becomes as simple as modifying a stylesheet.

If XML is not being used pervasively in your organization, you can still leverage these technologies by building your own adapter using tools such as Java or Perl to map the existing data structure to XML. XSLT 2.0 also provides the capability of navigating non-XML data using a new instruction, analyze-string (see "The Future of XSLT").
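An adapter of this kind can be quite small. The sketch below uses Python rather than Java or Perl and assumes, purely for illustration, that the existing data is comma-separated text whose column headers are already valid XML element names; the orders and order element names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="orders", row_tag="order"):
    """Map each CSV row to one XML element with a child element per column."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            # Assumes every column header is a valid XML element name.
            ET.SubElement(record, column).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "id,customer,total\n1,John Doe,19.99\n2,Jane Roe,5.00\n"
    print(csv_to_xml(sample))
```

Once the data is in XML, an XSLT processor can take over, so the adapter itself can stay free of any mapping logic.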

When your company implements new software and you need to migrate user data to a new structure as part of the update process, XSLT can provide a mechanism to automate that migration--assuming the data is stored as XML. Using an update application, you can apply a stylesheet designed to update user data, as well as any XML configuration files, to the new version.

As mentioned earlier, XPath does have uses independent of XSLT. XPath can be used to search an XML document for a particular element that matches a given set of conditions. This capability is invaluable when dealing with the complex hierarchical structures commonly present in XML data.

Decisions, Decisions ...

There are key questions to consider when deciding whether to put certain processing instructions in the stylesheet. For example, are you using a lot of flow control statements, such as "if" and "when," within your code rather than making full use of the template mechanism? Are you relying on extensions to XSLT to complete tasks within the stylesheet? If you answer yes to these questions, you may be forcing the stylesheets to perform too much processing.

Many programmers fall into the trap of creating stylesheets that process documents in a sequential manner. Developers may find themselves falling back on this usage rather than becoming proficient with XSLT's template mechanism. The most powerful way of using XSLT is as a pattern-matching language, with templates applied based upon found patterns in the source document. It is well worth the time to learn to program in this fashion, as it allows XSLT to solve a much broader class of problems.

Ultimately, the stylesheets that drive an XSLT process are code and should be treated in the same manner as you'd treat a Java or C# code base. If you treat stylesheets as little more than easily deployed configuration files that drive your data transformation process, you may be burned by poorly designed and untested code.

Along those lines, you should also consider the fact that debugging an XSLT stylesheet is more complex than debugging other types of code. You'll often be forced to debug without the aid of the watches and breakpoints provided by a full-featured IDE (integrated development environment).

One tool that will be useful to environments using the JUnit or NUnit test frameworks is the XMLUnit extension. XMLUnit allows you to compare XML output with what was expected in an automated manner, as part of the build process. Also, testing will be much easier if your input and output XML documents each have a well-defined DTD (Document Type Definition) or XML Schema.
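The idea behind this kind of automated comparison can be sketched without XMLUnit. The following is not XMLUnit's API; it is a standard-library Python analogy (Python 3.8 or later) that canonicalizes both documents so that attribute order and indentation do not cause false failures. The function name same_xml is made up for this sketch.

```python
import xml.etree.ElementTree as ET


def same_xml(actual, expected):
    """Compare two XML strings after canonicalization (C14N 2.0)."""
    # strip_text=True trims whitespace around text content, so differences
    # in indentation between the two documents are ignored.
    return (ET.canonicalize(actual, strip_text=True)
            == ET.canonicalize(expected, strip_text=True))


if __name__ == "__main__":
    produced = '<order id="1" status="new"><total>5.00</total></order>'
    wanted = '<order status="new" id="1">\n  <total>5.00</total>\n</order>'
    print(same_xml(produced, wanted))  # prints True
```

A check like this can run inside whatever unit-test framework the build already uses, with the transformation's output as the first argument and a stored reference document as the second.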

Since XSLT is based on an open Web standard, you won't be locked into using one platform or programming language. Any developed stylesheet should be portable to a wide variety of platforms. XSLT processors exist for numerous languages, including C++, Java, Perl and .Net.

Under the Hood

As mentioned earlier, your development staff is not the only route by which these two technologies will enter your environment. Many applications used to implement emerging technologies rely on XSLT to process XML-structured data. When your organization purchases these applications, you'll also be adopting XSLT.

As part of an SOA (service-oriented architecture), an organization might want to deploy an Enterprise Service Bus to integrate applications across its environment. It's common to see XSLT as part of an ESB implementation, whereby an XSLT processor allows existing applications to communicate via Web services or other means. XSLT also allows the adoption of standards encouraged by SOA by serving as an adapter to existing applications.

Another technology that relies heavily on XSLT is Security Assertion Markup Language (SAML), which enables an organization to communicate about the security characteristics of a particular entity to other organizations through an XML-based structure. By sharing this information, organizations can extend their trust relationships to other organizations.

XSLT is not a silver bullet; it doesn't solve every problem perfectly, but no technology does. With the emergence of XML as the primary method of structuring data within the enterprise, a growing number of the technologies and applications you bring into your environment will rely on XSLT and XPath. If you take a balanced approach to using XSLT and XPath, they are sure to become valuable additions to your IT toolbox.



Edward Hand is an independent software consultant in Madison, Wis. He has more than 15 years' experience as an IT analyst, developer and project manager, and has worked in numerous industries, including defense and finance. Write to him at [email protected].

The Future of XSLT and XPath

XSLT and XPath have been undergoing major changes. Candidate Recommendations, both at Version 2.0, have been made for these technologies; progression to an official recommendation will follow after another round of reviews, at which point any remaining issues must be resolved. The process length will vary based on the amount of feedback and the number of issues encountered.

In its current form, XSLT 2.0 features more robust error-handling, which provides descriptive error messages for a wider variety of error conditions, thus potentially shortening the time needed to debug problems with stylesheets. XSLT 2.0 also deals with a broader range of data types, including sequences, and handles exceptions generated from violating their use.

With XSLT 2.0, a transformation defined in a stylesheet can create multiple output trees, allowing a user to parse a single source document and create multiple resultant documents. This could be used to create multiple Web pages from a single data source, for example.

A powerful analyze-string instruction has been added that allows a string to be examined using a regular expression engine. This will allow a stylesheet to examine and make decisions based upon non-XML-structured sections of an input document. This instruction uses standard regular expressions to look for particular patterns within an input string. This feature could broaden the use of XSLT to many other document types.
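The behavior described here, splitting a string into the parts that match a pattern and the parts that do not, can be illustrated in Python. This is an analogy built on Python's re module, not XSLT itself, and Python's regular expression dialect is not identical to the one XSLT 2.0 uses. The function name analyze_string and the sample text are made up for this sketch.

```python
import re


def analyze_string(text, pattern):
    """Split text into (is_match, substring) pairs, in order, skipping empty pieces."""
    pieces = []
    position = 0
    for match in re.finditer(pattern, text):
        if match.end() == match.start():
            continue  # ignore zero-length matches
        if match.start() > position:
            # Text between the previous match and this one did not match.
            pieces.append((False, text[position:match.start()]))
        pieces.append((True, match.group(0)))
        position = match.end()
    if position < len(text):
        pieces.append((False, text[position:]))
    return pieces


if __name__ == "__main__":
    for is_match, piece in analyze_string("Shipped 2006-07-13, billed 2006-08-01.",
                                          r"\d{4}-\d{2}-\d{2}"):
        print(is_match, repr(piece))
```

In a stylesheet, the matching and non-matching pieces would each be handed to their own block of instructions, which is what lets free-form text be turned into structured output.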

Many other, minor changes will increase XSLT's usability. For example, 2.0 will include instructions that address the formatting of date and time, making the use of these common data types much easier.

To aid backward compatibility, a stylesheet can tell the XSLT processor which version of the XSLT recommendation it uses. If this value is specified to be 1.0, the processor could operate in a backward-compatibility mode. If you've invested considerable time and energy into your XSLT code base, this ability will enable you to transition your transformations to the new recommendation on your schedule. However, note that it is up to XSLT 2.0 software providers whether they implement this backward-compatibility option.

XPath 2.0, designed to work hand-in-hand with XSLT 2.0, also has a number of new features, such as support for a large number of data types. Perhaps the most important new data type is the sequence. Using sequences and related functions, XPath 2.0 can process ordered lists of data in a previously impossible manner.

Much like XSLT 2.0, XPath 2.0 offers a backward-compatibility mode to ensure that expressions built with version 1.0 will work.

If you'd like to experiment with XSLT 2.0 and XPath 2.0, an open-source processor is available at saxon.sourceforge.net. The editor of the W3C's XSLT 2.0 specification, Dr. Michael Kay, created this processor.
