December 16, 2004

The Business Case for XSLT 2.0



by Kurt Cagle

In my previous posting (Imports and Tunnelling in XSLT2) I started down a path that I've been planning on covering for a while: presenting a solid business case for migrating to XSLT2. When I first encountered XSLT, after an initial period of attempting to understand the paradigm, I found myself both impressed and disappointed. XSLT is an often underrated technology, in great part because it doesn't fit cleanly into the Algol-based model that is most commonly used today (C, C++, C#, Java, etc.).



I consider XSLT something of a jujitsu language - it is most effective when used sparingly, letting the XML itself do the heavy lifting with the XSLT providing just enough of the pivotal support to do incredible things. That's why it has quietly become the silent partner on any number of different platforms as XML becomes more pervasive on those platforms. It is used within any number of Java and .NET applications, just another piece of the infrastructure, though one that does a disproportionate amount of the real work in applications that are increasingly driven by dynamic GUIs and web services.



Yet what disappointed me about XSLT, especially the more that I had a chance to play with it, was the fact that it was deliberately shackled by its conception as an XML-to-HTML translator. You couldn't manipulate non-XML text with it, you could only do a fairly limited number of string manipulations (in a language that was, fundamentally, parsing text), you couldn't create intermediate nodes for processing, and things that should have been fundamental - the ability to create an indexed for-loop, for instance - necessitated some very ugly recursion that added considerable complexity to the language without a lot of reward.



I wasn't the only one who found this to be the case, by the way. Indeed, many developers have come to XSLT for its potential capabilities but found themselves so bogged down with the verbosity and complexity of XPath manipulations that they would soon beg to find some other, easier solution. This has, in turn, created something of a backlash to the language, and more than a few projects built around XSLT have consequently become management nightmares, because few developers wanted to develop the expertise to debug seemingly incomprehensible stylesheets, especially given that it fell into the "declarative ghetto" where salaries were often lower than for procedural programmers because of the bias to see XML expertise (and consequently XSLT) as being simply an extension of HTML expertise.



This motivated me to follow the development of the "next generation" of XSLT, with the hope that it might prove an improvement over what currently existed. XSLT 1.0 was not so much broken as incomplete, though there were some fundamental changes that needed to be made to the data model in order to accommodate those additions. Thus began an arduous trek following the development of XSLT 2.0.



By the time that XSLT 1.0 came out, James Clark, the iconoclastic genius who created XSLT in the first place, had shifted his attention away from transformations and into schemas, eventually laying the groundwork for Relax NG. Meanwhile, Michael Kay, the author of the authoritative XSLT books for Wrox and the creator of the Saxon XSLT processor, took over as editor of the XSLT specification, working in conjunction with people such as Jeni Tennison and Dimitre Novatchev to establish both a set of extensions to XSLT 1.0 under the EXSLT.org banner and ultimately a proposed XSLT 1.1 Working Draft.



However, a number of realizations about the depth of the problem with the data model (and consequently of XPath, which relies heavily upon this model) forced the withdrawal of the XSLT 1.1 Working Draft from the W3C and the formal establishment of an XSLT 2.0 working group. The goal of this group was simple -- to do XSLT right, to fix some of the biggest problems of XSLT that came from being based upon certain intrinsic assumptions, and to revise XPath so that it would be robust enough to handle a much wider range of problems.



Not Your Father's XSLT


The language that is emerging bears a number of broad similarities to XSLT 1.0, but underneath it is a considerably more sophisticated vehicle. Perhaps the biggest change is the introduction of sequences. A sequence is a linear list - a set of XML nodes, numbers, text strings, and other items that can be ordered and otherwise manipulated. In XSLT 1.0 (and more specifically within XPath 1.0), you could only work with node lists, and even though such lists could hold text content (what are called text nodes), these were still containers for content rather than the content itself. By generalizing the node-set into a sequence, several things became possible:




  • Sequences could hold references to multiple distinct XML trees, something which was included as a function in XSLT 1.0 (the document() function) but not in XPath 1.0.

  • You could create temporary trees of XML from other XML operations, using this intermediate XML as a way to perform further transformations. Most XSLT 1.0 processors had offered an extension for doing this (typically a node-set() function), but the implementations varied and the underlying data model was incompatible with it.

  • Sequences made it much easier to eliminate duplicates and perform other logical operations on XML data, such as grouping (something that can be fiendishly difficult with XSLT 1.0).

  • Sequences made it possible to create iterative loops (something analogous to for(i=0;i!=n;i++){do something with i;}). The to operator in XPath 2.0 lets you create constructs such as (1 to 5), which generates the sequence (1,2,3,4,5).

  • Sequences also lie at the heart of another critical requirement for XSLT - the ability to parse strings into constituent pieces.
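As a minimal sketch of what the last two points look like in practice (the item element here is invented for the example), the XPath 2.0 to operator replaces the recursive named templates that an indexed loop required in XSLT 1.0:

```xml
<!-- XSLT 2.0: an indexed loop over the sequence (1,2,3,4,5),
     no recursion required -->
<xsl:for-each select="1 to 5">
  <item index="{.}"/>
</xsl:for-each>
```

The XSLT 1.0 equivalent would be a named template calling itself with an incremented parameter - several times as much code for the same effect.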


Once this foundational step was laid, the next stage in the process was to build up from that, adding new capabilities while still trying to retain as much of the power of the old standard as possible. This came about through the introduction of other innovations into the XSLT and XPath standards:



  • Regular Expressions. Regexes (as they are often called) provide a powerful tool for both searching and manipulating patterns within text. XPath 2 now incorporates a regular expression flavor modeled on Perl's (with some additions), making it possible to convert text files into sequences and from there into XML documents. This key capability makes XSLT a particularly noteworthy candidate for code generation and compilation, something that will be discussed later.


  • User Defined XPath Functions. The XPath Working Group established a formal mechanism for binding external functions into XPath, providing a clean, consistent means to build new XPath functions that could be written in C++, C#, JavaScript, Java, Perl ... and XSLT. This dramatically reduces the amount of code necessary to invoke XSLT named templates (often by an order of magnitude or more) and also makes it possible to migrate XSLT from a Java-based system to a C#-based one without needing to change any XSLT - you'd just rewrite the external functions but keep the same function signatures.


  • Conditional Expressions. With XPath 2.0, you can now write if/then/else and for expressions within XPath, making it possible to express much richer logic in the language. Not only does this significantly reduce the verbosity of stylesheets, it also makes it possible to solve problems that were typically not even expressible in XPath 1.0 - such as adding taxes and discounts to item costs in an invoice before computing a total.


  • Date/Time Manipulation. Date and time manipulation was something of a nightmare in XSLT 1.0, yet because of the importance of such information in transformations, there was a thriving industry in building work-arounds. Now such capability, including finding the difference between two dates or times, is built into the language.


  • Complex Grouping. The data model in XSLT 1.0 made it very difficult to handle certain kinds of groupings, such as mapping the relatively flat structure of HTML to the group and section model of XSL-FO or DocBook. With sequences and regular expressions, generating such groups is now possible, especially in conjunction with certain additional XSLT 2.0 elements.


  • Multiple Outputs. XSLT 1.0 was asymmetric -- it was possible to pass in multiple XML documents through parameters, but it was not possible to produce more than one formal output. That's changed with XSLT 2.0. Now you can write transformations that will generate any number of XML or text output formats, either to be saved to local storage or to be sent to external web addresses, depending upon security restrictions.


  • Type Awareness. Perhaps one of the most controversial aspects of XSLT 2.0 and XPath 2.0 is the introduction of schema-aware transformations, which are capable of validating and manipulating typed XML content from external XML objects. This is not an intrinsic part of the specification, however, so it is less likely that all XSLT 2.0 processors will be schema-aware.
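To make several of these additions concrete, here is a sketch of a few of them working together - a user-defined function, a conditional expression, regex-based text parsing, grouping, and multiple outputs. The invoice vocabulary, the my: namespace, and the data.csv file are all hypothetical; the xsl: elements and functions (xsl:function, xsl:analyze-string, xsl:for-each-group, xsl:result-document, unparsed-text()) are from the XSLT 2.0 drafts.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="http://example.com/my">

  <!-- User-defined function: apply a discount if one is present -->
  <xsl:function name="my:net">
    <xsl:param name="price"/>
    <xsl:param name="discount"/>
    <!-- if/then/else is now an expression, usable anywhere XPath is -->
    <xsl:sequence select="if ($discount gt 0)
                          then $price * (1 - $discount)
                          else $price"/>
  </xsl:function>

  <xsl:template match="/invoice">
    <!-- Arithmetic over adjusted values, not expressible in XPath 1.0 -->
    <total>
      <xsl:value-of
          select="sum(for $i in item return my:net($i/@price, $i/@discount))"/>
    </total>

    <!-- Regular expressions: carve a raw text file into elements -->
    <xsl:analyze-string select="unparsed-text('data.csv')" regex=",">
      <xsl:non-matching-substring>
        <field><xsl:value-of select="."/></field>
      </xsl:non-matching-substring>
    </xsl:analyze-string>

    <!-- Grouping plus multiple outputs: one result document per category -->
    <xsl:for-each-group select="item" group-by="@category">
      <xsl:result-document href="{current-grouping-key()}.xml">
        <group key="{current-grouping-key()}">
          <xsl:copy-of select="current-group()"/>
        </group>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Every one of these steps - the function, the text parsing, the grouping, the secondary outputs - would require either vendor extensions or pages of recursive templates in XSLT 1.0.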


This combination of features fills most of the holes left from the XSLT 1.0 implementation and makes it possible to start thinking about XSLT sitting not just along the periphery of your operation, but right in the middle handling the processing of business logic.



XSLT For Businessmen


In a typical business, you buy or implement business software in response to your changing business needs. Much of this software is anticipatory in nature - the designers of the applications attempt to model ahead of time the scenarios that are most likely to occur in your business, and then build the business logic for these scenarios into the code itself.



Anticipatory design has a number of side-effects, few of them positive. For starters, the applicability of the software becomes a measure of the degree to which the application designers were successfully able to model the business processes that occur. When the modelling is close, the application integrates well into the work flow of the company. When the modelling isn't so close, the company is all too often forced to adapt to the workflow of the software, which introduces inefficiencies.



Moreover, over time, a company's business requirements change as the business itself changes. However, the software has almost certainly been written by someone who is no longer writing that particular piece of software -- the best case scenario is that they are working on some other part, and consequently have to stop what they're doing to change the code. The worst case scenario is that your lead programmer is in India (unless you are in India, in which case your lead programmer is in London), has long since left the company, and likely didn't document the code terribly well. Thus, over time, the software decays, until it is no longer useful to the company, forcing another massive expenditure into the hole in your business called IT Expenditures.



Finally, many such solutions are intimately tied not just to a particular operating system but a particular machine, and should something happen to that machine, your company could be left with a major problem.



All of these situations point out the limitations of anticipatory design, but such design is still the most prevalent because it 1) keeps software vendors in business, 2) keeps consultants in business, and 3) ultimately forces hardware upgrades, keeping hardware vendors in business. Of course, unless your business is specifically dedicated to keeping these three groups in business, such design often becomes a hidden tax on computer usage, a constant drain on expenditures that becomes very easy to accept as unavoidable. However, that cost really isn't as necessary as it may seem.



One of the great benefits of XML is the fact that its use tends to encourage adaptive rather than anticipatory design. With adaptive design, the business logic of a company can be readily encoded in an easy-to-manipulate bundle of information which can work across any platform. Your code can generate your user interfaces in response to changes in data requirements, passing that information into transformations that can readily encode the business logic. Moreover, even the transformations themselves can be transformed, can be designed to change as business parameters change. In short, such systems adapt to the changing requirements of the business.



XSLT 1.0 was an interesting first step in this process, but all of the points mentioned above - the complexity of the language, the verbosity of the code, and the often counterintuitive techniques necessary to handle frequent operations - made it less than ideal for this particular role. However, XSLT 2.0 is considerably simpler to follow, write, and maintain, can more reliably integrate with external processes and objects, and is able to handle multiple possible input and output forms at once.



As tools such as XForms (or some equivalent XML-centric forms technology) become more prevalent, this also means that interface tools (and not necessarily just "web tools") will increasingly generate and send XML content directly rather than the more limited name/value pairs of HTTP (in essence what SOAP does via the agency of web services), and in general XSLT is a better tool than the DOM for manipulating and extracting information from XML sources ... if that extracted information is itself in XML format. In that respect, the DOM can be thought of as a binding mechanism that connects XML with other object representations (that is, other programming language data structures).



This use of XSLT within XML milieus is an important concept, with very broad implications. XSLT is not sexy. There are no marketing teams out there putting out multimillion-dollar ad campaigns featuring well-coifed executives staring raptly at XSLT code on their laptops. Instead, XSLT is an infrastructure sort of thing, found deep within (a surprising number of) applications, increasingly taking over the task of document and object conversions that for years had been the domain of heavily sequestered filter writers. The application I'm writing this on right now, an HTML editor which I wrote, uses XSLT to convert between an editor component's internal representation and one of several XML formats -- including DocBook, XHTML, Microsoft Word 2003's XML format and others. Yet without knowing that, you'd never even be aware of how critical that technology is, because it exists so quietly.



Code Building Code


XSLT 2.0 will likely become much more pervasive, because its domain of applicability is so much broader and because much of the design of the second version of the language is deliberately built around the manipulation of any textually represented object -- including other programming languages. Most programming languages have a very well-defined structure, independent of the commands themselves -- packages, classes, constructors, destructors, methods, properties, event handlers -- in most cases there are relatively few variations on these basic entities, in great part because programming languages are process descriptions (at least imperative languages are).



XML in turn is a good language for the description of descriptions, and as a consequence, it can very readily incorporate larger functional blocks of code in a descriptive matrix. Once in that format, generating code in other languages becomes much easier using a language such as XSLT2, especially with the addition of regular-expression-based tokenizing. On the flip side, XSLT2 is also remarkably good at the inverse process -- parsing program language blocks and converting them into an XML representation. In short, XSLT2 could find itself becoming what amounts to a universal compiler/decompiler, at first into intermediate forms such as Java or C#, and then with increasing frequency, directly into bytecode or intermediate language (IL) generators (this is especially significant for .NET, which already maps many of its languages into a common IL format).
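As a toy illustration of the code-generation direction - the class/property vocabulary here is invented purely for the example - a transformation can emit source text directly by declaring a text output method:

```xml
<!-- Sketch: generating a Java class skeleton from a hypothetical
     XML description, emitted as plain text via method="text" -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="class">
    <xsl:text>public class </xsl:text><xsl:value-of select="@name"/>
    <xsl:text> {&#10;</xsl:text>
    <xsl:for-each select="property">
      <xsl:text>    private </xsl:text>
      <xsl:value-of select="@type"/><xsl:text> </xsl:text>
      <xsl:value-of select="@name"/><xsl:text>;&#10;</xsl:text>
    </xsl:for-each>
    <xsl:text>}&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Feeding this something like &lt;class name="Invoice"&gt;&lt;property type="double" name="total"/&gt;&lt;/class&gt; would yield a compilable Java stub; the inverse direction (source text to XML) is where the regular-expression tokenizing comes in.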



From an application standpoint, this raises the real possibility that this next generation of XSLT2 could in fact not only handle the processing of business logic, but actually generate just-in-time compiled code to more quickly execute such logic, and could route it to the locations where such JIT code would be needed. User interfaces could be built on the fly, something especially critical in business applications where a significant portion of the programming work (and hence cost) that takes place is oriented to developing the screens whereby people interact with the data. The combination of XSLT2 and web services can also abstract connections into SQL and LDAP repositories, meaning both that such data sources become more interchangeable and that the differences in accessing something as different as a relational database and an LDAP directory become irrelevant.



Finally, XSLT2 simplifies the way that information moves within your enterprise, ironically by moving away from what had been the cornerstone of programming in the 1990s - the object-oriented-programming paradigm. One of the difficulties that has emerged from OOP principles has been in determining the decomposition of a problem space into distinct classes. Programmers new to OOP (and in all too many cases not so new) have a tendency to want to model everything as a class, and as a consequence their application code begins to resemble the Harry Potter series -- full of wonder and magic, but with entirely too many pages for what is, fundamentally, a children's story. The problem with this is that each class has to be written and tested, not only in isolation but also in tandem, and a seemingly trivial change in a base class can have profound consequences for other classes built upon it.



XSLT 2.0, on the other hand, shifts the approach taken from building up this complex zoo of class critters and pushes it back towards an approach which is coming back into vogue with the advent of Linux: streams and pipes. A stream can be thought of as data moving between two processes, whether those processes be a file reader, a web server, a program, a database, a web client, or the interface to some piece of hardware. A pipe on the other hand, is the conduit which carries the stream. The OOP revolution placed a huge amount of significance on the node where pipes met, and tended to relegate the pipes and streams to secondary status, at best.



However, XML and the web are changing this. One effect of XML web services is to envision programs as the transmission of streams of data to distinct end-points, URLs, without necessarily caring about what happens within those end-points. An object orientation gives many more points of access into an object, but typically at the cost of dealing with that object's specific data restrictions. With a web service, I can send a stream of information to a URL, and the process at that end will (if it is well designed) determine either that it is valid and usable (there are processes designed to work with that stream at that node), that it is valid but not immediately usable (it is sent off to a different process which will attempt to rectify it into something of significance to the first process), or that it is invalid (whereupon notification of this lack of validity is sent back to the sender).



XSLT2 can handle all three potential conditions (though the case where the data is not well-formed XML gets a little complicated). Well-formed XML has an associated namespace, and this namespace can actually be used by the XSLT itself to determine the proper handling of XML content, perhaps with such an XSLT passing the parameters acting upon the transformation into part of a SOAP message and then routing that message to the appropriate final transformation. In a purely asynchronous model (one where each node can act both as a transmitter of XML and a receiver of XML under separate processes), the routing XSLT does not have to be the XSLT that handles the final processing of the associated data -- or the one that communicates back to the client. While this model doesn't quite work in the fairly strongly synchronous browser model that most people connected to the web currently use, contemporary web clients are in fact shifting to an asynchronous model where it will work just fine.
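A sketch of that namespace-driven dispatching might look like the following - the namespace URIs, the mode names, and the templates they would invoke are all hypothetical:

```xml
<!-- Sketch: routing an incoming document by the namespace of its
     root element; unrecognized vocabularies produce an error reply -->
<xsl:template match="/*">
  <xsl:choose>
    <xsl:when test="namespace-uri() = 'http://example.com/orders'">
      <xsl:apply-templates select="." mode="orders"/>
    </xsl:when>
    <xsl:when test="namespace-uri() = 'http://example.com/invoices'">
      <xsl:apply-templates select="." mode="invoices"/>
    </xsl:when>
    <xsl:otherwise>
      <error>Unrecognized vocabulary:
        <xsl:value-of select="namespace-uri()"/></error>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```

The templates behind each mode could process the data locally or simply wrap it in a SOAP envelope and forward it, which is what makes the asynchronous routing model described above workable.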



An Open Letter


If XSLT2 is such a superior technology, why has it raised the ire of a few of the largest software vendors (most notably Microsoft)? This is in fact a case of the question providing its own answer. XSLT 2.0 provides an answer to many (perhaps most) of the complaints of XSLT 1.0, including the most damning ... that it is too conceptually difficult and verbose for programmers to learn. XSLT2 is more functionally oriented than XSLT1, making it easier for programmers more comfortable with languages such as Java or C++ to use.



XSLT2 also binds to external classes much more transparently, making it much easier to communicate with external processes within the environment, regardless of what that environment is (or what kind of box that environment is running on). It doesn't require an expensive suite of tools, compilers, and libraries of objects to work with it, and it is fundamentally oriented to manipulating XML data (though not exclusively) without the strong-typing limitations that come with Algol based languages.



XSLT is also considerably more secure, based upon what I'd call the potency argument. Most binary objects contain not only their own state but also the mechanisms by which that state gets expressed to the outside world. In essence, these objects are potent - they have the ability to create side-effects that may not be obvious from their interfaces, as they have to have a certain level of interaction with the system in order to function. The level of trust that these objects require of the system simply in order to operate is too high, forcing the creation of human trust authorities.



With XSLT, on the other hand, the streams of XML information coming in are not themselves potent. They provide a description of state, but are reliant upon the XSLT resident within the server to handle the manipulation of that state, and correspondingly to specifically provide exceptions for handling things outside of the boundaries of safe behavior. It is consequently incumbent upon the maintainer of the system to choose the interpreters of that data, rather than placing the security demands upon the (non-technical) users of the applications.



Given all this, XSLT2-enabled systems could serve to significantly erode the requirements for complex components that sit at the heart of most operating systems, a potential boon to open source systems such as Linux but one that could dramatically impede the ability to effectively sell subsequent implementations of Windows (or so at least one line of reasoning goes that I've seen bandied about). It makes code generation a just-in-time process and so effectively blurs the distinction between data and code, a distinction that Microsoft still makes even as it defines its own XML user interface language (XAML, which requires a healthy dose of C# "code-behind" in order to do more than trivial applications).



Microsoft has chosen to include XQuery 1.0 (a data-binding language that builds on XPath) but not XSLT 2.0 in Longhorn, citing everything from lack of customer interest to complexity in implementation to insufficient maturity on the part of the specification. They have even gone so far as to try to develop an alternative language, C Omega, which is supposed to provide a C# oriented approach to manipulating XML.



I've played with C Omega some - it is a reasonably good way to avoid some of the tedium of working with the W3C DOM, and it is certainly possible to use it for some of the same purposes that you'd use XSLT2, though it lacks the powerful recursive templating capability that I think gives XSLT most of its power. Microsoft's approach presupposes that the appetite for XQuery will be strong enough that they can essentially build a hybrid language around it, though after having written two books on XQuery that have between them garnered less than their production costs, I'm much less inclined to agree, especially as XPath2/XSLT2 becomes much more functionally oriented.



At the last Sells Brothers XML Conference (which I would heartily recommend, by the way) I gave a talk on Saxon.NET, an open source project in which M. David Peterson has converted Michael Kay's superb Saxon 8.0 XSLT 2 implementation over to .NET, with Dr. Kay's approval. I'm using it now for a content management system, and it has performed far better than I had even hoped. At any rate, when the Microsoft representatives at the conference later asked the crowd whether they would rather see work on XQuery or on XSLT2, the number of people (in many cases customers of Microsoft) who wanted to see a new XSLT outnumbered those who wanted XQuery by a considerable margin.



While I strongly support Mr. Peterson's efforts, I would also like to make a plea to Microsoft to reconsider its stance on this. I believe that the demand for a more powerful version of XSLT is out there, and that it is being driven by application developers who are building applications for Windows as well as elsewhere. It will become the de facto standard within your competitors' business productivity suites, web clients, home entertainment applications and operating systems, because if you choose not to develop such a processor, others will provide .NET versions that will be used in place of your own offerings. You will have already done most of the hard work in implementing it, as the major portion of the changes in XSLT 2 is due to the revision of XPath 2.0, which you are already developing to support XQuery.



To business decision makers reading this, chances are good that you will never actually have to sit and look at a screen of XSLT 2. However, as with XML six years ago, XSLT 2 is a technology that will likely end up shouldering much of the day to day processing within your organizations over the course of the next five years -- it is a natural complement to XML, which has, like kudzu, pretty much taken over the data infrastructure of most applications it comes in contact with.



There's another factor that comes into play here from a business perspective. In 1993-4, an independent consultant could earn $250 an hour writing HTML content. Today, HTML is considered a basic skill that every graphic designer needs to learn as part of doing the job, and HTML generation is mostly handled via automation mechanisms. XSLT serves many of the same purposes that tools such as PHP, ASP.NET, Perl, and JSP serve today, but as the world continues its adoption of XML as the standard for conveying structured content, XSLT is becoming something of a lingua franca - a common tongue - that developers in this space are learning, and they are finding that, intelligently applied, that skill transfers cleanly between a Linux box running Apache with PHP and a Windows box running IIS and ASP.NET.



XSLT 2 is not a new language - it is XSLT cleaned up to handle the areas it should have been capable of handling before, with much less verbosity, more integration, more power and a considerably easier development and maintenance path. This means that the learning curve for those developers going from XSLT to XSLT 2 will be much less extreme than having to learn another language in toto. This in turn means that within a space of a couple of years, if not less, XSLT2 will likely be just another core skill that a developer should have, yet one that helps them write platform and language neutral code for dealing with all of the XML that is even now circulating through your business. With skilled programmers in any area once again beginning to demand a premium, the coming ubiquity of XSLT2 skills should help keep your labor costs down not just in your web development department, but throughout your organization.



/EndTag


There's a tendency of late for writers in the XML space to want to play down some of the more recent standards from the W3C, a tendency to go "ho-hum, it's just the sequel". I think this attitude can blind people to what is actually happening. XSLT 1.0 was a uniquely different and powerful solution, and after having worked with it on an almost daily basis for nearly the last decade my respect for its innovation has only grown. However, even four years ago I felt that it wasn't powerful enough, and the amount of customization on the part of XSLT processor vendors over the last several years to me is testament to that. XSLT 2.0 is not profoundly different from XSLT 1.0 - it's in fact almost completely backwards compatible.



It does, however, rectify the shortcomings that emerged from the first iteration of the language, and does so in a way that makes it an astonishingly powerful language. History is full of such standards, such as SQL, where it took one or two iterations to handle the inevitable discovery process that is part of any great human endeavor. This tool, XSLT, is already becoming one of the core work-horses in most contemporary applications, even though it was never originally conceived to do much of what it is now called upon to do. To move forward to a version improved by half a decade of insight and exploration is not only logical, it's good business.
