December 22, 2004

Migrating Metaphorical Web

I have started using a new blog software provider for The Metaphorical Web, incorporating it under M. David Peterson's UnderstandingX*L sites. Please join me there. I will be redirecting the Metaphorical Web URL to that address by the first of the year.

-- Kurt Cagle

December 21, 2004

Take 2: Chaos and XML

When I wrote the last post, I didn't realize at the time that I was running about a 102 F temperature, and would end up spending the next three days in bed with alternating bouts of chills and sweating, talking in my sleep - it was not one of the more pleasant weekends I've ever spent. While the underlying concept of measuring complexity is sound, I erred on the definition of entropy. Entropy is not a measure of the number of states in a given system configuration, but rather a measure of the change in the number of states in a configuration over time - in a non-self-organizing system, the potential number of states increases, and the system consequently becomes more disorganized (or, more properly, its usable energy converts into heat).

The number of available states itself is called the multiplicity of a system. Multiplicity isn't, of course, the only measure of complexity, but it's a pretty useful one. Think of the traditional model of chaos - the conversion of laminar flow to turbulence. Typically the first state is simple laminar flow - each water molecule flows in a straight line. As the flow moves faster, interactions with the surrounding media become more prominent, causing first a split into two streams, then four, then eight, until eventually there are thousands of such substreams and the water becomes turbulent (this is the typical model described by the Lorenz equations, by the way).
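This period-doubling cascade - one stream, then two, then four, then eight, then chaos - can be sketched with the textbook logistic map (an illustrative model of the route to chaos, not anything specific to fluid flow):

```python
# The period-doubling route to chaos, via the logistic map x -> r*x*(1-x).
def attractor_size(r, warmup=2000, sample=512, tol=1e-4):
    """Count the distinct long-run states of the logistic map at growth rate r."""
    x = 0.5
    for _ in range(warmup):          # let transients die out
        x = r * x * (1 - x)
    seen = []
    for _ in range(sample):
        x = r * x * (1 - x)
        if not any(abs(x - s) < tol for s in seen):
            seen.append(x)
    return len(seen)

# One state (laminar), then 2, then 4, then effectively turbulent:
for r in (2.9, 3.2, 3.5, 3.9):
    print(r, attractor_size(r))
```

At low growth rates the system settles into one state; raising the rate splits the attractor into 2, then 4 states, and by r = 3.9 the orbit never repeats - the same doubling pattern as the laminar-to-turbulent transition.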

I have to wonder, though, whether there is in fact some underlying connection between language (whether human-based or artificial, as in the case of XML) and chaos theory, via this same modelling. Specifically, as the multiplicity of a document schema increases, so does the turbulence induced by that schema. I'm not sure what the interpretation of that turbulence might be, though informational noise would be an obvious candidate. A simple schema is unambiguous - the specific meaning of a given aspect (element content or attribute value) is clearly defined, within its own constraints. However, as the number of aspects increases, so does their potential for coupling: two properties may both be dependent upon a third (potentially undeclared) property, or two elements may in fact describe the same property in different, and potentially conflicting, ways. Ambiguity is a form of informational noise.

Keeping this short (for me) tonight. I will be migrating in the near future to a new website, though it should still be accessible via the existing address. Until next time ...

December 19, 2004

XML and Entropy

Lately, I've been spending a lot of time reading outside my usual diet of programming books, in great part because I find that inspiration often strikes when you can think about different endeavors and how problems were solved in them. One particular book that has set me to thinking is Minds, Machines, and the Multiverse: The Quest for the Quantum Computer by Julian Brown, an intriguing discussion both of quantum computers and of the multiverse interpretation of quantum probabilities.

In one section, Brown discusses the role of thermodynamics in information theory, and more specifically, the role of entropy. Entropy is one of those concepts that has gained, over the years, an almost mystical aura, as the basis for all of Murphy's Laws, but in point of fact it is actually a pretty simple concept to understand - and it has definite applications to one of the central problems that I see with how business people utilize XML technologies.

Entropy, in its information theory form, is a measure of the total number of states that a given system can be in. For instance, think about two bytes ... these can hold up to 65,536 possible states. Typically entropy is measured as a logarithm (in the case of information, the logarithm base 2, or log2), so the total entropy of that system of bits would be log2(65,536), or 16, which is, not coincidentally, the number of bits in two bytes.
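A quick Python sketch of that arithmetic: state counts multiply across independent parts, so the base-2 logs - the bits - add:

```python
import math

# Entropy in bits is the base-2 log of the state count.  For independent
# parts the state counts multiply, so the bit counts add -- which is why
# two 8-bit bytes give log2(256 * 256) = 8 + 8 = 16 bits.
two_bytes = 256 * 256
print(two_bytes)                         # 65536
print(math.log2(two_bytes))              # 16.0
print(math.log2(256) + math.log2(256))   # 16.0
```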

One of the challenges faced by IT professionals working with XML is figuring out which tools work best for the scope of XML you're going to be working with. DOM or SAX manipulation typically does not handle complex XML well; XQuery is perhaps better for slightly more complex XML but lacks the recursive templating structure that works best for documents. The difficulty comes in determining at what point an XML resource shifts from one area of complexity to another.

Every XML document has a schema that describes the structure. If you make a few basic assumptions, you can in fact get an idea about the number of states that the schema allows:

  • multiplicities of an identical structure count only once if the upper limit of such multiplicities is "unbounded";
  • PCDATA content is immaterial, whether as text or as attributes, though an attribute with multiple NMTokens will be treated as having one state for each enumeration;
  • alternatives within the schema will each be treated as separate trees for determining the total count of states;
  • if a given element can contain another element of the same name, then the count stops at that second element, avoiding infinite recursion.

By this measure, a schema with no variability would have an entropy of log2(1) = 0, a schema with one element of variability would have an entropy of log2(2) = 1, two elements of variability would be log2(3) = 1.585, and so forth. By this measure, most business documents (invoices and so forth) would likely have entropies in the neighborhood of 0 to 4, XML processes (such as an XSLT transformation) might be in the neighborhood of 10-12, and literary documents might have entropies in the neighborhood of 15-20. Keep in mind that these are logarithmic values - an entropy of 20 would correspond to 2 to the 20th states, or roughly 1,000,000 possible schema instances.
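Those entropy figures are easy to check in a few lines of Python (the state counts are the illustrative values from the paragraph above):

```python
import math

# Rough schema-entropy measure: log2 of the number of distinct instance
# shapes a schema permits.  The state counts below are illustrative.
def schema_entropy(state_count):
    return math.log2(state_count)

print(schema_entropy(1))            # 0.0  -- no variability
print(schema_entropy(2))            # 1.0  -- one element of variability
print(round(schema_entropy(3), 3))  # 1.585
print(2 ** 20)                      # 1048576 -- ~1,000,000 instances at 20 bits
```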

Entropy is important because it can better clarify the domain in which it is best to work with a given document. XQuery, I think, provides a good case in point here. XQuery supports XPath, and so it has some of the advantages that XSLT has, but it's not really all that useful for dealing with documents -- converting a DocBook document into WordML or vice versa would be impossible in XQuery -- while for many business schemas with comparatively low entropies, XSLT is definitely overkill.

Lately, it's become somewhat fashionable, especially among the business set, to deprecate XSLT in favor of XQuery for enterprise-level applications. I see nothing intrinsically wrong with this. XSLT is not always easy to work with, requires a different way of looking at a set of problems, and is probably not worth the effort or overhead for many types of transformations. That does not mean that XSLT does not have a very important place in the ecosystem, however, and it's my own personal opinion that thinking you have to choose either/or will limit you when you do have to deal with high-entropy documents, as is the case with most document management applications.

December 16, 2004

The Business Case for XSLT 2.0

by Kurt Cagle

In my previous posting (Imports and Tunnelling in XSLT2) I started down a path that I've been planning on covering for a while: presenting a solid business case for migrating to XSLT2. When I first encountered XSLT, after an initial period of attempting to understand the paradigm, I found myself both impressed and disappointed. XSLT is an often underrated technology, in great part because it doesn't fit cleanly into the Algol-based model that is most commonly used today (C, C++, C#, Java, etc.).

I consider XSLT something of a jujitsu language - it is most effective when used sparingly, letting the XML itself do the heavy lifting with the XSLT providing just enough of the pivotal support to do incredible things. That's why it has quietly become the silent partner on any number of different platforms as XML becomes more pervasive on those platforms. It is used within any number of Java and .NET applications, just another piece of the infrastructure, though one that does a disproportionate amount of the real work in applications that are increasingly driven by dynamic GUIs and web services.

Yet what disappointed me about XSLT, especially the more that I had a chance to play with it, was the fact that it was deliberately shackled by its conception as an XML translator to HTML. You couldn't manipulate non-XML text with it, you could only do a fairly limited number of string manipulations (in a language that was, fundamentally, parsing text), you couldn't create intermediate nodes for processing, and things that should have been fundamental - the ability to create an indexed for-loop, for instance - necessitated some very ugly recursion that added considerable complexity to the language without a lot of reward.

I wasn't the only one who found this to be the case, by the way. Indeed, many developers have come to XSLT for its potential capabilities but found themselves so bogged down in the verbosity and complexity of XPath manipulations that they would soon beg for some other, easier solution. This has, in turn, created something of a backlash against the language, and more than a few projects built around XSLT have consequently become management nightmares, because few developers wanted to develop the expertise to debug seemingly incomprehensible stylesheets - especially given that XSLT fell into the "declarative ghetto," where salaries were often lower than for procedural programmers because of the bias toward seeing XML expertise (and consequently XSLT) as simply an extension of HTML expertise.

This motivated me to follow the development of the "next generation" of XSLT, with the hope that it might prove an improvement over what currently existed. XSLT 1.0 was not so much broken as incomplete, though there were some fundamental changes that needed to be made to the data model in order to accommodate the additions. Thus began an arduous trek following the development of XSLT 2.0.

By the time that XSLT 1.0 came out, James Clark, the iconoclastic genius who created XSLT in the first place, had shifted his attention away from transformations and toward schemas, eventually laying the groundwork for Relax NG. Meanwhile, Michael Kay, the author of the authoritative XSLT books for Wrox and the creator of the Saxon XSLT processor, took over the editorship of the XSLT working group, working in conjunction with people such as Jeni Tennison and Dimitre Novatchev to establish both a set of extensions to XSLT 1.0 and ultimately a proposed XSLT 1.1 Working Draft by mid-2002.

However, a number of realizations about the depth of the problems with the data model (and consequently with XPath, which relies heavily upon this model) forced the withdrawal of the XSLT 1.1 Working Draft from the W3C and the formal establishment of an XSLT 2.0 working group. The goal of this group was simple -- to do XSLT right: to fix some of the biggest problems of XSLT that came from its being based upon certain intrinsic assumptions, and to revise XPath so that it would be robust enough to handle a much wider range of problems.

Not Your Father's XSLT

The language that is emerging bears a number of broad similarities with XSLT 1.0, but underneath it is a considerably more sophisticated vehicle. Perhaps the biggest change has come in the introduction of sequences. A sequence is a linear list - a set of XML objects, numbers, text strings, and other objects that can be ordered and otherwise manipulated. In XSLT 1.0 (and more specifically within XPath 1.0), you could only work with node lists, and even though such lists could hold text content (what are called text nodes) these were still containers for content rather than the content itself. By generalizing the node-set into a sequence, several things became possible:

  • Sequences could hold references to multiple distinct XML trees, something which was available as a function in XSLT 1.0 (the document() function) but not in XPath 1.0.

  • You could create temporary trees of XML from other XML operations, using this intermediate XML as a way to perform further transformations. Most XSLT implementations had provided an ad hoc way of doing this (a node-set() extension function, typically), but the implementations varied and the underlying data model was incompatible with them.

  • Sequences made it much easier to eliminate duplicates and perform other logical operations on XML data, such as grouping (something that can be fiendishly difficult with XSLT 1.0).

  • Sequences made it possible to create iterative loops (something analogous to for(i=0;i!=n;i++){do something with i;}). The to operator in XPath 2.0 lets you create constructs such as (1 to 5), which generates the sequence (1,2,3,4,5).

  • Sequences also lay at the heart of another critical requirement for XSLT - the ability to parse strings into constituent pieces.
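As a rough analogy (Python lists rather than actual XPath syntax), the range and duplicate-elimination operations above look like this - distinct-values() is the real XPath 2.0 function name, mirrored here in plain Python:

```python
# Python analogy for XPath 2.0 sequence operations; the real syntax is
# XPath, e.g. (1 to 5) -- this just mirrors the semantics.
seq = list(range(1, 6))          # XPath: (1 to 5)
print(seq)                       # [1, 2, 3, 4, 5]

# distinct-values(): eliminate duplicates while preserving order
def distinct_values(items):
    seen, out = set(), []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out

print(distinct_values(["a", "b", "a", "c"]))   # ['a', 'b', 'c']
```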

Once this foundational step was laid, the next stage in the process was to build up from that, adding new capabilities while still trying to retain as much of the power of the old standard as possible. This came about through the introduction of other innovations into the XSLT and XPath standards:

  • Regular Expressions. Regexes (as they are often called) provide a powerful tool for both searching and manipulating patterns within text. XPath 2 now incorporates a Perl-style regular expression syntax (with some additions), making it possible to convert text files into sequences and from there into XML documents. This key capability makes XSLT a particularly noteworthy candidate for handling compiling, something that will be discussed later.

  • User Defined XPath Functions. The XPath Working Group established a formal mechanism for binding external functions into XPath, providing a clean, consistent means to build new XPath functions that could be written in C++, C#, Javascript, Java, Perl ... and XSLT. This dramatically reduces the amount of code necessary to invoke XSLT named templates (often by an order of magnitude or more) and also makes it possible to migrate XSLT from a Java-based system to a C#-based one without needing to change any XSLT - you'd just rewrite the external functions but keep the same function signatures.

  • Conditional Expressions. With XPath 2.0, you can now write if/then/else and for() expressions within XPath, making it possible to build much richer logic into the language. Not only does this reduce the verbosity of the language significantly, it also makes it possible to express what was previously not even possible in XPath alone - such as adding taxes and discounts into item costs in an invoice before finding a total.

  • Date/Time Manipulation. Date and time manipulation was something of a nightmare in XSLT 1.0, yet because of the importance of such information in transformations, there was a thriving industry in building work-arounds. Now such capability, including finding the difference between two dates or times, is built into the language.

  • Complex Grouping. The data model in XSLT 1.0 made it very difficult to handle certain kinds of groupings, such as mapping the relatively flat structure of HTML to the group and section model of XSL-FO or DocBook. With sequences and regular expressions, generating such groups is now possible, especially in conjunction with certain additional XSLT 2.0 elements.

  • Multiple Outputs. XSLT 1.0 was asymmetric -- it was possible to pass in multiple XML documents through parameters, but it was not possible to produce more than one formal output. That has changed with XSLT 2.0. Now you can write transformations that will generate any number of XML or text output formats, either saved to local storage or sent to external web addresses, depending upon security restrictions.

  • Type Awareness. Perhaps the most controversial aspect of XSLT 2.0 and XPath 2.0 is the introduction of schema-aware transformations, which are capable of validating and manipulating typed XML content from external XML objects. This is an optional part of the specification, however, so not all XSLT 2.0 processors will be schema-aware.
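To make the invoice example concrete: in XPath 2.0 the total might read something like sum(for $i in //item return $i/price * (1 + $i/tax)) - the element names there are hypothetical. The same computation, sketched in Python with made-up figures:

```python
# Hypothetical invoice lines: (unit price, tax rate) -- illustrative data only.
items = [(100.00, 0.08), (40.00, 0.08), (9.50, 0.0)]

# In XPath 2.0 this might be a single expression, e.g.
#   sum(for $i in //item return $i/price * (1 + $i/tax))
# (element names hypothetical).  Here is the equivalent in plain Python:
total = sum(price * (1 + tax) for price, tax in items)
print(round(total, 2))   # 160.7
```

In XSLT 1.0 the same running total required a recursive named template; in XPath 2.0 it collapses to one expression.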

This combination of features fills most of the holes left from the XSLT 1.0 implementation and makes it possible to start thinking about XSLT sitting not just along the periphery of your operation, but right in the middle handling the processing of business logic.

XSLT For Businessmen

In a typical business, you buy or implement business software in response to your changing business needs. Much of this software is anticipatory in nature - the designers of the applications attempt to model ahead of time the scenarios that are most likely to occur in your business, and then build the business logic for these scenarios into the code itself.

Anticipatory design has a number of side-effects, few of them positive. For starters, the applicability of the software becomes a measure of the degree to which the application designers were successfully able to model the business processes that occur. When the modelling is close, the application integrates well into the work flow of the company. When the modelling isn't so close, the company is all too often forced to adapt to the workflow of the software, which introduces inefficiencies.

Moreover, over time, a company's business requirements change as the business itself changes. However, the software has almost certainly been written by someone who is no longer writing that particular piece of software -- the best case scenario is that they are working on some other part, and consequently have to stop what they're doing to change the code. The worst case scenario is that your lead programmer is in India (unless you are in India, in which case your lead programmer is in London), has long since left the company, and likely didn't document the code terribly well. Thus, over time, the software decays, until it is no longer useful to the company, forcing another massive expenditure into the hole in your business called IT Expenditures.

Finally, many such solutions are intimately tied not just to a particular operating system but a particular machine, and should something happen to that machine, your company could be left with a major problem.

All of these situations point out the limitations of anticipatory design, but such design is still the most prevalent because it 1) keeps software vendors in business, 2) keeps consultants in business, and 3) ultimately forces hardware upgrades, keeping hardware vendors in business. Of course, unless your business is specifically dedicated to keeping these three groups in business, such design often becomes a hidden tax on computer usage, a constant drain on expenditures that becomes very easy to accept as unavoidable. However, that cost really isn't as necessary as it may seem.

One of the great benefits of XML is the fact that its use tends to encourage adaptive rather than anticipatory design. With adaptive design, the business logic of a company can be readily encoded in an easy-to-manipulate bundle of information which can work across any platform. Your code can generate your user interfaces in response to changes in data requirements, passing that information into transformations that can readily encode the business logic. Moreover, even the transformations themselves can be transformed, and can be designed to change as business parameters change. In short, such systems adapt to the changing requirements of the business.

XSLT 1.0 was an interesting first step in this process, but all of the points mentioned above - the complexity of the language, the verbosity of the code, and the often counterintuitive techniques necessary to handle frequent operations - made it less than ideal for this particular process. However, XSLT 2.0 is considerably simpler to follow, write, and maintain, can more reliably integrate with external processes and objects, and is able to handle multiple possible input and output forms at once.

As tools such as XForms (or some equivalent XML-centric forms technology) become more prevalent, interface tools (and not necessarily just "web tools") will increasingly generate and send XML content directly, rather than the more limited name/value pairs of HTTP (in essence what SOAP does via the agency of web services), and in general XSLT is a better tool than the DOM for manipulating and extracting information from XML sources ... if that extracted information is itself in XML format. In that respect, the DOM can be thought of as a binding mechanism that connects XML with other object representations (that is, other programming languages' data structures).

This use of XSLT within XML milieus is an important concept, with very broad implications. XSLT is not sexy. There are no marketing teams out there putting out multimillion dollar ad campaigns featuring well-coifed executives staring raptly at XSLT code on their laptops. Instead, XSLT is an infrastructure sort of thing, found deep within (a surprising number of) applications, increasingly taking over the task of document and object conversions that for years had been the domain of heavily sequestered filter writers. The application I'm writing this on right now, an HTML editor which I wrote, uses XSLT to convert between an editor component's internal representation and one of several XML formats -- including DocBook, XHTML, Microsoft Word 2003's XML format, and others. Yet without knowing that, you'd never even be aware of how critical that technology is, because it does its work so quietly.

Code Building Code

XSLT 2.0 will likely become much more pervasive, because its domain of applicability is so much broader and because much of the design of the second version of the language is deliberately built around the manipulation of any textually represented object -- including other programming languages. Most programming languages have a very well-defined structure, independent of the commands themselves -- packages, classes, constructors, destructors, methods, properties, event handlers -- and in most cases there are relatively few variations off of these basic entities, in great part because programming languages are process descriptions (at least the imperative ones are).

XML in turn is a good language for the description of descriptions, and as a consequence, it can very readily incorporate larger functional blocks of code in a descriptive matrix. Once in that format, generating code in other languages becomes much easier using a language such as XSLT2, especially with the addition of regular-expression tokenizing parsers. On the flip side, XSLT2 is also remarkably good at the inverse process -- parsing program language blocks and converting them into an XML representation. In short, XSLT2 could find itself becoming what amounts to a universal compiler/decompiler, at first into intermediate forms such as Java or C#, and then with increasing frequency directly into bytecode or intermediate language (IL) generators (this is especially significant for .NET, which already maps many of its languages into a common IL format).
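To make the tokenizing-parser idea concrete, here is a toy sketch in Python rather than XSLT2 - the token classes and the XML vocabulary are invented for illustration - showing regular expressions breaking program text into tokens that are then wrapped in an XML representation:

```python
import re
from xml.etree import ElementTree as ET

# Toy tokenizer: regexes classify pieces of program text.  The token
# classes and the <code> vocabulary are invented for illustration.
TOKEN_RE = re.compile(r"(?P<number>\d+)|(?P<name>[A-Za-z_]\w*)|(?P<op>[+\-*/=])")

def to_xml(source):
    """Convert a line of program text into a flat XML token stream."""
    root = ET.Element("code")
    for match in TOKEN_RE.finditer(source):
        token = ET.SubElement(root, match.lastgroup)  # element named by token class
        token.text = match.group()
    return ET.tostring(root, encoding="unicode")

print(to_xml("total = price * 2"))
```

Once the code is in XML form, an ordinary transformation can regenerate it in another language - the inverse direction of the same pipeline.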

From an application standpoint, this raises the real possibility that this next generation of XSLT2 could in fact not only handle the processing of business logic, but actually generate just-in-time compiled code to more quickly execute such logic, and could perforce route it to the locations where such JIT code would be needed. User interfaces could be built on the fly, something especially critical in business applications where a significant portion of the programming work (and hence cost) that takes place is oriented to developing the screens whereby people interact with the data. The combination of XSLT2 and web services can also abstract connections into SQL and LDAP repositories, meaning both that such data sources become more interchangeable and that the way of accessing something as different as a relational database and an LDAP directory becomes irrelevant.

Finally, XSLT2 simplifies the way that information moves within your enterprise, ironically by moving away from what had been the cornerstone of programming in the 1990s - the object-oriented-programming paradigm. One of the difficulties that has emerged from OOP principles has been in determining the decomposition of a problem space into distinct classes. Programmers new to OOP (and in all too many cases not so new) have a tendency to want to model everything as a class, and as a consequence their application code begins to resemble the Harry Potter series -- full of wonder and magic, but with entirely too many pages for what is, fundamentally, a children's story. The problem with this is that each class has to be written and tested, not only in isolation but also in tandem, and a seemingly trivial change in a base class can have profound consequences for other classes built upon it.

XSLT 2.0, on the other hand, shifts the approach taken from building up this complex zoo of class critters and pushes it back towards an approach which is coming back into vogue with the advent of Linux: streams and pipes. A stream can be thought of as data moving between two processes, whether those processes be a file reader, a web server, a program, a database, a web client, or the interface to some piece of hardware. A pipe on the other hand, is the conduit which carries the stream. The OOP revolution placed a huge amount of significance on the node where pipes met, and tended to relegate the pipes and streams to secondary status, at best.

However, XML and the web are changing this. One effect of XML web services is to envision programs as transmissions of streams of data to distinct end-points, URLs, without necessarily caring about what happens within those end-points. An object orientation gives many more points of access into an object, but typically at the cost of dealing with that object's specific data restrictions. With a web service, I can send a stream of information to a URL, and the process at that end will determine (if it is well designed) either that it is valid and usable (there are processes designed to work with that stream at that node), that it is valid but not immediately usable (it is sent off to a different process which will attempt to rectify it into something of significance to the first process), or that it is invalid (whereupon notification of this lack of validity is sent back to the sender).

XSLT2 can handle all three potential conditions (though the case where the data is not well-formed XML gets a little complicated). Well-formed XML has an associated namespace, and this namespace can actually be used by the XSLT itself to determine the proper handling of XML content, perhaps by passing the parameters acting upon the transformation into part of a SOAP message and then routing that message to the appropriate final transformation. In a purely asynchronous model (one where each node can act both as a transmitter of XML and a receiver of XML under separate processes), the routing XSLT does not have to be the XSLT that handles the final processing of the associated data -- or the one that communicates back to the client. While this model doesn't quite work in the fairly strongly synchronous browser model that most people connected to the web currently use, contemporary web clients are in fact shifting to an asynchronous model where it will work just fine.
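That routing behavior can be sketched minimally in Python - the namespaces and handlers here are invented, and a real system would dispatch to XSLT transformations rather than functions:

```python
from xml.etree import ElementTree as ET

# Hypothetical namespace -> pipeline table; a real router would dispatch
# to XSLT transformations rather than Python functions.
HANDLERS = {
    "urn:example:invoice": lambda doc: "invoice pipeline",
    "urn:example:docbook": lambda doc: "document pipeline",
}

def route(xml_text):
    """Dispatch a well-formed XML document by its root element's namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree encodes the namespace as "{uri}localname" in the tag
    ns = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""
    handler = HANDLERS.get(ns)
    if handler is None:
        return "invalid: no handler for namespace " + repr(ns)
    return handler(root)

print(route('<inv:invoice xmlns:inv="urn:example:invoice"/>'))  # invoice pipeline
```

The router never inspects the document body; the namespace alone tells it which downstream transformation should receive the stream.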

An Open Letter

If XSLT2 is such a superior technology, why has it raised the ire of a few of the largest software vendors (most notably Microsoft)? This is in fact a case of the question providing its own answer. XSLT 2.0 provides an answer to many (perhaps most) of the complaints of XSLT 1.0, including the most damning ... that it is too conceptually difficult and verbose for programmers to learn. XSLT2 is more functionally oriented than XSLT1, making it easier for programmers more comfortable with languages such as Java or C++ to use.

XSLT2 also binds to external classes much more transparently, making it much easier to communicate with external processes within the environment, regardless of what that environment is (or what kind of box that environment is running on). It doesn't require an expensive suite of tools, compilers, and libraries of objects to work with it, and it is fundamentally oriented to manipulating XML data (though not exclusively) without the strong-typing limitations that come with Algol based languages.

XSLT is also considerably more secure, based upon what I'd call the potency argument. Most binary objects contain not only their own state but also the mechanisms by which that state gets expressed to the outside world. In essence, these objects are potent - they have the ability to create side-effects that may not be obvious from their interfaces, as they have to have a certain level of interaction with the system in order to be able to function. In effect, the level of trust that these objects require of the system simply in order to operate is too high, forcing the creation of human trust authorities.

With XSLT, on the other hand, the streams of XML information coming in are pre-potent. They provide a description of state, but are reliant upon the XSLT that is resident within the server to handle the manipulation of that state, and correspondingly to specifically provide exceptions for handling things outside of the boundaries of safe behavior. It is consequently incumbent upon the maintainer of the system to choose the interpreters of that data, rather than placing the security demands upon the (non-technical) users of the applications.

Given all this, XSLT2-enabled systems could serve to significantly erode the requirements for the complex components that sit at the heart of most operating systems, a potential boon to open source systems such as Linux but one that could dramatically impede the ability to effectively sell subsequent implementations of Windows (or so at least one line of reasoning goes that I've seen bandied about). It makes code generation a just-in-time process and so effectively blurs the distinction between data and code, a distinction that Microsoft still makes even as it defines its own XML user interface language (XAML, which requires a healthy dose of C# "code-behind" in order to do more than trivial applications).

Microsoft has chosen to include XQuery 1.0 (a data-binding language that builds on XPath) but not XSLT 2.0 in Longhorn, citing everything from lack of customer interest to complexity in implementation to insufficient maturity on the part of the specification. They have even gone so far as to try to develop an alternative language, C Omega, which is supposed to provide a C# oriented approach to manipulating XML.

I've played with C Omega some - it is a reasonably good way to avoid some of the tedium of working with the W3C DOM, and it is certainly possible to use it for some of the same purposes that you'd use XSLT2 for, though it lacks the powerful recursive templating capability that I think gives XSLT most of its power. It also presupposes that the appetite for XQuery will be strong enough that Microsoft can essentially build a hybrid language around it; after having written two books on XQuery that have between them garnered less than their production costs, I'm much less inclined to agree, especially as XPath2/XSLT2 becomes much more functionally oriented.

At the last Sells Brothers XML Conference (which I would heartily recommend, by the way) I gave a talk on Saxon.NET, an open source project in which M. David Peterson has converted Michael Kay's superb Saxon 8.0 XSLT 2 implementation over to .NET, with Dr. Kay's approval. I'm using it now for a content management system, and it has performed far better than I had even hoped. At any rate, when the Microsoft representatives at the conference later asked the crowd whether they would rather see work on XQuery or on XSLT2, the number of people (in many cases customers of Microsoft) who wanted to see a new XSLT outnumbered those who wanted XQuery by a considerable margin.

While I strongly support Mr. Peterson's efforts, I would also like to make a plea to Microsoft to reconsider your stance on this. I believe that the demand for a more powerful version of XSLT is out there, and that it is being driven by application developers who are building applications for Windows as well as elsewhere. It will become the de facto standard within your competitors' business productivity suites, web clients, home entertainment applications and operating systems, because if you choose not to develop such a processor, others will provide .NET versions that will be used in place of your own offerings. You will already have done most of the hard work in implementing it, as the major portion of the changes in XSLT 2 is due to the revision of XPath 2.0, which you are already developing to support XQuery.

To business decision makers reading this: chances are good that you will never actually have to sit and look at a screen of XSLT 2. However, as with XML six years ago, XSLT 2 is a technology that will likely end up shouldering much of the day-to-day processing within your organization over the course of the next five years -- it is a natural complement to XML, which has, like kudzu, pretty much taken over the data infrastructure of most applications it comes in contact with.

There's another factor that comes into play here from a business perspective. In 1993-4, an independent consultant could earn $250 an hour writing HTML content. Today, HTML is considered a basic skill that every graphic designer needs to learn as part of doing his or her job, and HTML generation is mostly handled via automation mechanisms. XSLT serves many of the same purposes that tools such as PHP, ASP.NET, Perl, and JSP serve today, but as the world continues its adoption of XML as the standard for conveying structured content, XSLT is becoming something of a lingua franca -- a common tongue -- that developers in this space are learning. They are finding in the process that the skill, intelligently applied, transfers cleanly between a Linux box running Apache with PHP and a Windows box running IIS with ASP.NET.

XSLT 2 is not a new language - it is XSLT cleaned up to handle the areas it should have been capable of handling before, with much less verbosity, more integration, more power and a considerably easier development and maintenance path. This means that the learning curve for those developers going from XSLT to XSLT 2 will be much less extreme than having to learn another language in toto. This in turn means that within a space of a couple of years, if not less, XSLT2 will likely be just another core skill that a developer should have, yet one that helps them write platform and language neutral code for dealing with all of the XML that is even now circulating through your business. With skilled programmers in any area once again beginning to demand a premium, the coming ubiquity of XSLT2 skills should help keep your labor costs down not just in your web development department, but throughout your organization.
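To give one concrete taste of that reduced verbosity: grouping, which in XSLT 1.0 required the famously cryptic Muenchian key technique, becomes a single element in XSLT 2.0. The element and attribute names in this sketch are hypothetical:

```xml
<!-- group a flat list of <employee dept="..."/> elements by department -->
<xsl:template match="employees">
    <xsl:for-each-group select="employee" group-by="@dept">
        <department name="{current-grouping-key()}">
            <xsl:copy-of select="current-group()"/>
        </department>
    </xsl:for-each-group>
</xsl:template>
```

The equivalent XSLT 1.0 stylesheet would need an xsl:key declaration and a generate-id() comparison to accomplish the same thing.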


There's a tendency of late for writers in the XML space to play down some of the more recent standards from the W3C -- a tendency to go "ho-hum, it's just the sequel". I think this attitude can blind people to what is actually happening. XSLT 1.0 was a uniquely different and powerful solution, and after having worked with it on an almost daily basis for years, my respect for its innovation has only grown. However, even four years ago I felt that it wasn't powerful enough, and the amount of customization on the part of XSLT processor vendors over the last several years is testament to that. XSLT 2.0 is not profoundly different from XSLT 1.0 -- it is in fact almost completely backwards compatible.

It does, however, rectify the shortcomings that emerged from the first iteration of the language, and does so in a way that makes it an astonishingly powerful language. History is full of standards, SQL among them, where it took one or two iterations to handle the inevitable discovery process that is part of any great human endeavor. This tool, XSLT, is already becoming one of the core workhorses in most contemporary applications, even though it was never originally conceived to do much of what it is now called upon to do. To move forward to a version improved by half a decade of insight and exploration is not only logical, it's good business.

December 12, 2004

Imports and Tunnels in XSLT2

Every so often, I discover that just when I think I've wrung everything I can out of XSLT, it is still possible to do something subtle that changes my whole mindset about the language. To me, one of the hallmarks of greatness in a language is that very ability to surprise -- to make even a somewhat jaded coder sit up and say "Hey, that's cool!".

I faced a challenge of that sort recently in an application I was working on. The issue I was struggling with was that there are aspects of XSLT2 which are not quite polymorphic. Polymorphism, for those of you who might have slept through that particular aspect of OOP training, is the idea that two distinct objects can each expose a particular method which, when invoked, produces similar but not necessarily identical actions.

The canonical example of this comes with the Shape class, with the method draw(). If I have two child classes called Rect and Circle, each of which inherit the draw() interface from Shape, then I could create instances of these classes:

rect = new Rect(); rect.draw();

circle = new Circle(); circle.draw();

Each of these objects will perform a draw operation relevant to its given class (drawing a rectangle or a circle, as appropriate). By creating such a set of conventions (and the notion of a consistent interface gained via inheritance), you can work with a wide range of similar objects through the same interface without having to write special type handlers for each object.

The three pillars of OOP -- inheritance, encapsulation, and polymorphism -- can exist within XSLT, but only somewhat uneasily. Inheritance, for instance, can be gained by creating two stylesheets: the first containing a dominant set of templates, the second containing "replacement" templates that effectively override the first if they have an equal or higher precedence match. The problem with this is that there is no real way of saying "utilize the replacement template, then invoke the original template" -- the analog of the super() function in languages such as Java, which invokes the parent's version of a method from within the child's override.

XSLT 2.0 provides features to ameliorate this somewhat. The first involves import precedence, a subtle area that can cause a lot of grief for developers: the importing stylesheet's templates take precedence over those of the stylesheets it imports, which makes it easy to "subclass" templates but provides no obvious mechanism for invoking the replaced templates.

This precedence rule has the immediate effect of giving the illusion that imported templates don't work. That's not quite the case, however. Such templates can be invoked with the <xsl:apply-imports/> element. This causes the processor to consider only the templates in the imported stylesheet when applying the rules, and it is this functionality that gives you "super()"-like behavior.

For instance, suppose that you have a stylesheet (in the file accounting.xsl) that takes the value of an item and, if it is a number, renders it in green when non-negative and red when negative:

<xsl:template match="example">
    <xsl:choose>
        <xsl:when test="number(.) = number(.)">
            <xsl:choose>
                <xsl:when test="number(.) ge 0">
                    <span style="color:green"><xsl:value-of select="."/></span>
                </xsl:when>
                <xsl:otherwise>
                    <span style="color:red"><xsl:value-of select="."/></span>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="."/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

A second template for "example" in a calling document could handle placing this content into a table cell, using the <xsl:apply-imports/> element, as follows:

<xsl:import href="accounting.xsl"/>
<xsl:template match="example">
    <tr><td><xsl:apply-imports/></td></tr>
</xsl:template>
<xsl:template match="examples">
    <table><xsl:apply-templates/></table>
</xsl:template>

If the source XML document had the form:

<examples>
    <example>125</example>
    <example>-150</example>
    <example>Twelve</example>
</examples>

Then this will render as:

<table>
    <tr><td><span style="color:green">125</span></td></tr>
    <tr><td><span style="color:red">-150</span></td></tr>
    <tr><td>Twelve</td></tr>
</table>

Significantly, if no import document is specified, then this will fall back to the built-in rules, processing the child elements and text of the current context much as <xsl:apply-templates/> would. In other words, without the imported stylesheet, this will render as:

<table>
    <tr><td>125</td></tr>
    <tr><td>-150</td></tr>
    <tr><td>Twelve</td></tr>
</table>

This fairly subtle change will have an enormous impact upon the way that you develop stylesheets, especially in conjunction with a number of the other features that are now in play, including the next bit of subtlety - tunnelling parameters.

Digging Those Tunnels

You have probably noticed that while it's possible to talk about applying OOP techniques to XSLT, such concepts do not necessarily translate cleanly, one to one, to how XSLT itself works. One instance of this comes from the problems inherent in dealing with globals.

Global variables are bad in the way that cookies and sweets are bad ... one or two, taken sparingly, can improve your code without too many ill effects, but go overboard and you'll be seeing the results show up in bloated code (or at least a bloated waistline).

Normally, in OOP code, you define variables that are appropriate for a given class, and any methods defined within that class have those variables available to them. You can also create public properties that appear as "public" variables available from outside the class, though in most languages these are giving way to getter/setter methods that expose specific properties while providing a certain degree of insulation around the underlying variables. Additionally, variables defined in this manner may also be made available to any derived class of the base class, through the protected keyword or some variation of it.

In XSLT, things work a little differently. You can define a global variable or parameter within the <xsl:stylesheet> element of a transformation, in which case the variable is available anywhere within the transformation. However, if you import a global variable or parameter from an external stylesheet, it will be overridden by a variable of the same name in the importing document. In some cases, this may be desirable, but in others the imported stylesheet may have defined these variables to manage some internal state, at which point you're feeding gibberish into your transformations (and usually very hard to debug gibberish at that, as the code remains syntactically correct).

You could, of course, not use globals in the called routines (a good practice anyway) and instead define the parameters only at the likely entry point of the called stylesheet, then pass them down from template to template. Unfortunately, this solution tends to be rather verbose, especially when dozens of such "internal variables" are needed, and since most of the intervening templates don't themselves require the parameters, the technique is really only worthwhile if you absolutely need that data at some point.
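A sketch of what that verbosity looks like, with hypothetical element names: every intermediate template has to declare and re-send a parameter it never itself uses.

```xml
<!-- without tunnelling, "chapter" must relay $username even though it never uses it -->
<xsl:template match="chapter">
    <xsl:param name="username"/>
    <xsl:apply-templates select="section">
        <xsl:with-param name="username" select="$username"/>
    </xsl:apply-templates>
</xsl:template>

<xsl:template match="section">
    <xsl:param name="username"/>
    <p>Prepared for <xsl:value-of select="$username"/></p>
</xsl:template>
```

Multiply that relay code by a few dozen parameters and a deep template hierarchy, and the problem becomes obvious.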

In XSLT2, a different approach was taken. A new attribute, tunnel, is placed on parameters in matched templates (i.e., within <xsl:template match="foo"> elements), with tunnel="yes" indicating that the parameter should participate in tunnelling and tunnel="no" (the default) indicating that it should not. The corresponding <xsl:apply-templates> invocation in turn requires that tunnel="yes" be added to its <xsl:with-param> element in order for that parameter to tunnel.

So what exactly is this tunnelling? In essence, a tunnel means that if a template calls another template that does not have the parameter declared (and consequently does not use it), and that template in turn calls a third template that does declare the tunnelled parameter, then the third (or fourth, etc.) template acts as if it had been called with the parameters from the first template. This way, the only templates that receive the parameters are the ones that actually use them, and you are able to keep your code lean and mean in the process.

As an illustration, here is a "tunnelled" hello-world type transformation:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:template match="/">
        <xsl:apply-templates select="foo">
            <xsl:with-param name="username" select="'Aleria'" tunnel="yes"/>
        </xsl:apply-templates>
    </xsl:template>

    <xsl:template match="foo">
        <title>Hello, World!</title>
        <xsl:apply-templates select="bar"/>
    </xsl:template>

    <xsl:template match="bar">
        <xsl:param name="username" tunnel="yes"/>
        <h1>Hello, <xsl:value-of select="$username"/></h1>
    </xsl:template>

</xsl:stylesheet>

With the assumed input being:

<foo>
    <bar/>
</foo>

In this particular case, the parameter "username" is passed from the root template to the "foo" template. However, foo has no need of the parameter, so it doesn't declare it. The "bar" template, on the other hand, does need the parameter, so it includes a parameter declaration with the critical tunnel="yes". Without this, the XSLT processor would work on the assumption that the parameter was simply not passed down from the "foo" template, and as a consequence $username would be blank.

This form of parameterization can vastly simplify writing imported stylesheets. Typically such sheets provide a single point of entry for subsequent transformations (especially if the templates use the mode attribute), and rather than defining global parameters that could be clobbered, the entry-point template can define the parameters within its body as being tunnelled, making them available to any member of that set of modal templates.
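As a sketch of that pattern (the mode and parameter names here are hypothetical), the entry-point template establishes the tunnelled parameter, and only the modal templates that actually care about it ever declare it:

```xml
<!-- single point of entry for an imported "render" stylesheet -->
<xsl:template match="document" mode="render">
    <xsl:apply-templates select="*" mode="render">
        <xsl:with-param name="theme" select="'plain'" tunnel="yes"/>
    </xsl:apply-templates>
</xsl:template>

<!-- intermediate modal templates need never mention $theme; this one uses it -->
<xsl:template match="para" mode="render">
    <xsl:param name="theme" tunnel="yes"/>
    <p class="{$theme}"><xsl:apply-templates mode="render"/></p>
</xsl:template>
```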



Thus, in addition to making your code less verbose (and hence easier to both write and maintain) tunnelling can additionally serve to make your imported stylesheets more modular and classlike. In essence, this turns an imported stylesheet into something that has a number of the characteristics of classes within a more formal hierarchical language such as Java or C++, while at the same time retaining the templating architecture that makes XSLT such a powerful language in the first place.

December 4, 2004

Internet Time and the Long Now

I wish to thank Edd Dumbill for syndicating the Metaphorical Web on his superb blog aggregator site. It's an honor and a privilege, and I hope to be able to produce commentary in keeping with the luminaries on this board.

I've been pondering for a while the state of the XML world. XML has now been in existence as a standard for nearly seven years. XSLT and XPath for six, SVG for four and change. By the scale of Internet time (how retrograde that expression seems now) these technologies are no longer bright and shiny new, but are actually approaching middle-aged. Heck, Microsoft would have gone through two world changing marketing campaigns by now, and would be gearing up on the third. So why does it seem that XML is still just getting started?

Oh, you could single out all the usual suspects, of course: the W3C moving at a snail's pace, vendor lock-in keeping these open standards from reaching their true potential, the lack of marketing to push standards into the forefront of computing, and so forth (all of which are true to a certain extent). However, over the last year or so another idea has occurred to me, one that is perhaps heretical in the world of nanosecond processors, yet something that has nagged and tugged at my consciousness for a while.

Daniel Hillis, the founder of Thinking Machines and an Imagineer for Disney for a number of years, has of late spent a great deal of time concentrating on another project of his, The Clock of the Long Now. This particular clock is unusual in that it is intended to exist for 10,000 years. To put that into perspective, 10,000 years ago Europe was beginning to repopulate after the last of the continent covering glaciers had retreated, and the dominant language was an early Mesopotamian ur-language which would in time branch into the Indo-European languages.

Thus, to build such a clock, you have to design it to be simple, to be self-maintaining (who knows how many centuries may pass between people even knowing of it) and to be intuitively obvious regardless of what culture finds it. In many ways, the Clock is the ultimate exercise in user interface design. It is also, in its own way, a remarkable statement about the nature of time itself and man's interaction with it.

So what does such a clock have to do with XML? In any period of intense innovation, a pattern tends to emerge, regardless of the technology in question. Most of this should be familiar as a variant of the adoption curve:

  1. An innovation is created, usually a variant of some existing technology but with a twist that moves the technology in a completely unexpected direction.
  2. "Hobbyists" emerge, the earliest adopters who work with the technology while it is still in rough stages. They often end up becoming the evangelists for that technology later on.
  3. The technology attracts the attention of early investors, who see it as a way to start companies that exploit the technology. Such investors seldom come from the hobbyists, but they are keyed in enough to know where the hobbyists are most excited.
  4. The technology goes into its hype phase, where marketers begin promoting it for everything from improving efficiency to curing the common cold. This phase usually involves the appearance of semi-nude women seductively wrapped around the piece of hardware, working at a computer with the software, or otherwise doing improper things with a technological implementation.
  5. The backlash occurs as people realize that it does not in fact cure the common cold, may actually decrease efficiency in certain ways, and tends to be a big turnoff to attractive young women at bars (who are usually after the early investors, not the schleppy users).
  6. Meanwhile, those people who utilize the technology for a living figure out whether the technology actually does meet a specific need of theirs, and in the process will usually provide feedback to strengthen the technology in question.
  7. If the technology manages to get past this stage into adoption, its development shifts from paradigm shifting innovation to more stable improvements, especially if the same technology inspires other implementations/variations. This competitive phase usually lasts until one particular implementation manages to gain a 90/10 mindshare. This phase also usually sees the emergence of some body of standards that define interconnectivity of that particular technology (for instance, standard gauge sizes in railroads, standard socket types, HTML). These standards usually reflect the implementation of the dominant player.
  8. The technology is then able to sit with comparatively minor changes for a significant period of time -- years or even decades -- but because they are based upon these standards, the dominant player also has the most to lose and the least to gain in changing those standards to reflect shifts in new technologies. Standards that were based upon a given implementation consequently freeze that implementation, until eventually the standard is out of sync with additional innovations.
  9. The dominant standard-bearer can suddenly find its position upended very quickly -- within a couple of years -- and find that it is now in a position with an aging infrastructure and a massive installed base. It either abandons that base and moves on, putting itself into a much more vulnerable position, or it sticks to that base even as the base itself moves on to new technology. In either case, the technology may remain for a while longer in a kind of white dwarf state, slowly cooling to a brown cinder. Periodically, the technology may be resurrected for a revival (think of all of the coin-op video games which now exist as software emulations) but such technology holds entertainment value only (restoring an 1890s railroad train engine, not because it is even remotely competitive, but because it has sentimental value).

Given this birth to death cycle for most technologies, why does XML feel so different? My suspicion is that XML is different -- it is a technology for the Long Now. XML of course has its roots in SGML, a technology that was relatively obscure for most people outside of corporate librarians and academics but that can in turn be traced back to the work of Charles Goldfarb in the 1960s. This means that it in fact predates another "long-now" technology: the Unix/C duality. SGML was not intended as a means to gain temporary market share; indeed, the high cost of creating SGML implementations meant that it was really commercially viable only for the largest of organizations.

SGML is not an implementation-driven technology -- it was from the first a vehicle that required consensus, because its focus was at the heart of document communication, which means that from the start it was intended to model the way that people think, not what is necessarily the best way for machines to store artificially rigid class constructions. Because it was a meta-language, SGML was by its very nature intended to be implementation independent, long before Sun's "Write Once, Run Anywhere" slogan came into play.

In addition to this, SGML is declarative. Its operative verb is BE, not DO. In consequence, it was able to provide a level of abstraction that bypassed the 1000-class monstrosities that emerged over the course of the next three decades. That abstraction lives on with XML, to the extent that it is dramatically affecting both the theory and practice of coding.

XML, in turn, has been a refinement of this notion of working with abstractions through an abstract interface, with the underlying assumption that so long as the expected behavior that's agreed upon is met, that specific implementation is irrelevant. One effect of this has been the increasing dominance of XML abstraction layers that in turn push the imperative code -- the C++ and Java and Perl of the world -- down the processing stack, away from the business logic and closer to providing a common low-level substrate. XUL, SVG, XAML, XFaces, XForms, etc. all provide manifestations of this principle to some degree. Create a binding layer such as XBL, and you can hide the API even more, with the consequence that you can increasingly reproduce sophisticated sets of object code with XML representations. The imperative code never goes away completely (nor should it) but it becomes much less significant in the scheme of things.

As a consequence of this, while certainly there have been companies that have ridden some aspect of XML through the technology business cycle described above, I think that XML as a technology is acting very much like mitochondria during the Cambrian era -- providing a mechanism that serves as a substrate for a whole new kind of life (or in this case a whole new kind of programming paradigm). The problem, from the standpoint of those in what had been the dominant paradigm (the framework-based OOP system), is that XML provides few of the features that traditionally make money for vendors: lock-in of file formats, opacity of data structures, variations in access formats that provide advantages to one or another vendor, arcane API implementations, reliance upon a specific language, and so forth.

Web services make it possible to do things that would have been impossible otherwise (the dominant use of which seems to be less about providing unique data feeds and more about performing transformations between varying schemas). XML-based GUIs (and even full application sets) are now becoming the standard way of building human faces to applications, cutting down on the reliance on specific language toolkits. XML is even being used for discovery of non-XML APIs, something which usually indicates a transient phase (if you can discover non-XML APIs, you are invoking an abstraction layer within the interface, which in turn makes it easier to decouple the implementations from an imperative basis).
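As a sketch of that schema-to-schema role (all of the vocabulary here is hypothetical), mapping one partner's purchase order format onto another's can take only a few templates:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <!-- map a hypothetical <purchaseOrder> vocabulary onto a partner's <order> vocabulary -->
    <xsl:template match="purchaseOrder">
        <order date="{@orderDate}">
            <xsl:apply-templates select="item"/>
        </order>
    </xsl:template>
    <xsl:template match="item">
        <line sku="{@partNum}" qty="{quantity}"/>
    </xsl:template>
</xsl:stylesheet>
```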

Microsoft has gone through several core changes of technology (and more marketing "initiatives") in the last 30 years, yet curiously few of them have had staying power beyond about five years. This is not a disparagement of their products (which, for all my grumpiness about Microsoft, I will concede are very usable and professional), but I think it is more indicative of the commercial software vendor market (I could replace Microsoft with Sun or Hewlett-Packard in the above statement and be equally correct). Their definition of Internet Time gets conflated with their definition of Business Time, where next quarter is more important than next year, and five years out is an eternity.

Ultimately, though, Internet Time to me is something that isn't measured in nanoseconds and months between deployment cycles. If, as I do, you believe that XML is becoming the neural circuitry of the Internet, then I have to wonder if perhaps the real Internet Time is measured in decades, and maybe even centuries.

Indeed, when Danny Hillis finally gets around to inscribing the operating instructions of the Long Now Clock on its stainless steel base, perhaps, just perhaps, he should write it ... in XML.

November 30, 2004

Tech (non-)Savvy?

This morning as I was pulling into Starbucks for my pre-work latte, listening to The Beat on Seattle's KUOW (an NPR station), the topic of web security, Internet Explorer and Firefox came up. Beyond the moderator (Steve Schoerr, I believe) there were three other panelists - a web security consultant, a writer for the Seattle Times, and an analyst from Jupiter research.

The discussion was, for the most part, non-technical; at the level of defining what a virus or worm was and why they were so problematic, the kind of discussion you'd expect in a radio forum with a wide, but not necessarily technically savvy listenership. Even given that, I think there were several points to be taken from the show beyond what was mentioned directly.

Security is Killing Microsoft. Almost from the outset, these people worked upon the assumption that Microsoft's products are fundamentally insecure, and that these insecurities are likely to force you to frequently reformat your hard drives, lose critical information on your computers, or have your credit card information stolen. Long-time readers know that I'm not generally a big Microsoft proponent, but the kind of public trashing that Microsoft received may very well do more to erode its market share than any hundred bad eWeek articles about the company. The idea that Microsoft is fundamentally insecure (which I don't agree with - properly managed, Microsoft is not really any less secure than properly managed Linux distributions) has now become established folk wisdom. This is devastating for Redmond, and if anything it necessitates that they not only start putting some serious PR dollars into stopping that hemorrhaging of public trust but that they also seriously re-examine what their role vis-a-vis the Internet should be.

Firefox as Browser. If the radio show hurt Microsoft, it served as an hour-long infomercial on the benefits of not only Firefox but Mozilla and Open Source in general, and I have no doubt that there was an extraordinarily high concentration of downloads coming from the Puget Sound afterward. Firefox has momentum in a way that I've not really seen any technology have in the last five years, and most people who try Firefox are not going back to Internet Explorer. I argued in my previous post that while it's still too early for enterprises to be developing applications around it, there is enough solidity within the platform that enterprise developers and IT managers are beginning to examine it for fast-tracking into less mission-critical areas.

Tech (non-)Savvy. Working at the forefront of the Internet revolution, it is often easy to forget that most people out there know how to use the technology but have little to no clue about exactly how it operates. Most of the callers were of the "my computer has lately become slower than it was, is there a virus at work?" variety, though it was rather heartening to hear that many of these same people did make the switch to Firefox and found immediate improvements -- though whether this is due to anything more than switching from one already full cache (and tens of thousands of ghostly temp files that will now likely never be removed from their computers) to an empty cache is an open question.

This attitude of technology as (unstated) magic is worrisome because it underlines for me how easy it is to slip into an attitude where you assume that the computer is doing valid and legitimate things at all times (or, on the other hand, that there is some evil force of terrorists out there determined to turn these machines into mindless zombies [aren't they already?]). I'd like to teach people to think of their computers as organisms within a complex ecosystem, which like most such organisms attract viruses both benign and malicious (benign viruses being programs that provide useful benefits with no adverse side effects -- a category which can become remarkably slippery if you look at it too closely). However, the fear is that this particular metaphor only reinforces the notion that computers are self-contained (and self-aware) entities. I'm increasingly convinced that in today's culture, computer training should be a staple in elementary, secondary and collegiate education.

Whither SVG? Thither!!

The blogosphere has become, for me, an awful temptation, as there are any number of articles posted that I badly want to refute. Perhaps the most recent came from a writer for whom I actually have a great deal of respect: InfoWorld commentator Jon Udell. His article entitled Whatever Happened to SVG? has set the various SVG lists buzzing, as it questioned whether SVG had disappeared off the face of the planet -- one of those articles that makes you really despair about the state of things.

His contention, that SVG was supposed to be the technology that would change the world and then failed to do any such thing, is accurate as far as it goes. In 2000, SVG was wracked with contention because of strong disagreement between the principals, especially Microsoft, Macromedia and Adobe; the SVG 1.2 specification lists neither Microsoft nor Macromedia among the working group members. Patent issues surfaced the next year, causing the W3C to debate the viability of Reasonable And Non-Discriminatory (RAND) licensing, which would have let companies charge royalties for the use of standards. This came to a head in August and September of 2001, when the W3C response servers were overloaded with thousands of comments from the web community, including most of the major luminaries in the field, saying almost universally that RAND was a bad way to go. That process in turn made all W3C standards royalty-free only (though this is an area where second attempts are being made to push RAND back into the organization).

Since then SVG has been subject to other challenges ... it is an incredibly complex specification to implement, and to date there have been only a few successful implementations, most notably the one produced by Adobe, though the last year has seen the emergence of nearly complete SVG 1.0/1.1 viewers from companies in the wireless space, SVG static editors, and a couple of nearly complete SVG dynamic editors. A year ago at this time, few applications exported to or imported from SVG; now Microsoft (Visio), Macromedia (Flash), Adobe (Illustrator) and Corel (Draw) all support SVG as either an input or output vector (and in some cases both), and most vector editors, open source and proprietary alike, provide SVG as an alternative format (in some cases as the primary format).

This last point may seem trivial, but in practice it's anything but. SVG differs from nearly all other graphical drawing formats out there in several key ways - it has an XML structure that can be readily parsed and has a verifiable schema, it can reference other distributed graphic and metadata content, it is non-proprietary and as such does not run the risk of another GIF debacle, it can work on any platform that supports an SVG viewer, and it can maintain external namespaces within its body. Even before you get into the scripting features of SVG, these points are enough to make it an attractive vehicle for complex documents over the web.

SVG has been "almost there" for a while, and it's easy enough after a bit to begin to think that SVG will never take off. I'll be the first person to say that I don't think it will take off either ... if, by take-off, you imply explosive, rocket-like growth. Rather, it is creeping in on little cat feet (to paraphrase Carl Sandburg), showing up in places you'd never expect to find it: graphs showing disease vectors or population shifts, icons in an operating system, drop-down boxes on a web page, a block of text in a pre-press application. SVG doesn't have a multi-million dollar marketing budget, and the Rolling Stones won't be there to play when it's used as part of an operating system. It doesn't run ads on TV, doesn't take out multi-page spreads pointing out where it is in the supply chain.

In short, I fully anticipate that SVG will end up being the VW Beetle of the graphics/multimedia world -- a little comical perhaps, not taken all that seriously, certainly not for the class conscious (at least in its 1960s incarnation), but ultimately both ubiquitous and easy enough that a grade-school student can create simple graphics with it without needing to know anything about programming beyond a few simple rules, and without needing to buy expensive editing or animation programs. Okay, SVG'll also have an air-cooled engine (you need to know exactly how far you can extend a metaphor... ;-)

It seems to me that HTML went through this stage in 1994, when the pundits who caught the wave early enough were already beginning to write articles about how HTML was becoming old hat and would be dead soon enough in the face of pressure from languages such as Visual Basic (and later Java). Java on the desktop is now seen as a bad joke, and Visual Basic, while still widely used, is itself rapidly going the way of the dodo ... helped along by the less than spectacular adoption of Visual Basic.NET. Meanwhile, HTML is still being written by hand by those grade schoolers. Will SVG take the same arc? I'm counting on it.

November 28, 2004

Firefox: Why Microsoft Should Be Worried

I inadvertently posted this earlier while it was still under development. Sorry for the confusion.

Firefox turned 1.0 last week, and in the process managed to hit 1,000,000 downloads in one day. Put that into perspective - Firefox is in the neighborhood of 5 MB, which means that the servers had something in the neighborhood of 5 terabytes of data streaming over their pipes. From a pure networking standpoint, that's pretty amazing, not to mention the indication about how major a release Firefox has become.

A few Netcraft statistics are perhaps just as revealing. Netcraft measures browser usage on the web, and according to its data, Firefox has managed to capture about 4% of the browser share just since October, when the 1.0 Preview Release came out. Given that the movement of any given browser usually tends to be in the neighborhood of tenths of a point from month to month, this jump was phenomenal. Firefox and Mozilla combined now occupy roughly 7% of the browser market, most of it at the expense of Microsoft's Internet Explorer, which dropped below 90% of the market for the first time in several years.

I've seen a number of articles on the web asking whether Microsoft should be worried about the rise of Firefox, especially given its own market share of around 90%. After all, ASP.NET increasingly shifts the generation of web pages to the server, regardless of which browser is aimed at it - a server-centric philosophy consistent with the stance the company took after moving away from a rich client model in the late 1990s.

Personally, I would contend that Microsoft does need to worry ... a great deal. Internet Explorer is more than just a browser ... it is a critical piece of infrastructure that is used in any number of applications, including applications that don't necessarily talk to the web. The ability to create dynamic interfaces is not something to take lightly: such systems are far easier to update, more readily customizable than precompiled binaries, and often simpler to write applications around. Given that a significant proportion of such applications are written not for the home user but for the enterprise, Internet Explorer may in fact anchor Windows in businesses even more than Microsoft Office.

Given that, Firefox represents a significant threat to Microsoft. I have been working with Firefox and XUL for roughly three months now, building a number of tools for a content management system including a customized WYSIWYG XML editor and a versioning system monitor. With some work, I've managed to make these tools work across Windows, the Macintosh and Linux, using a combination of Mozilla's XUL, Javascript, and XSLT. The editor, as one example, overrides the Firefox menu, making it possible for me to actually piggyback on top of a user's version of Firefox to get the functionality that I need.

I am writing this blog using the editor I built on the Firefox XUL library and API, with the editor actually running as a tabbed pane within the browser itself. While it is certainly possible (and I'll discuss in more detail how it can be done) to set up such core functions as cut, copy and paste, text searching, undo and redo, and so forth through Javascript code, by building on top of the web browser itself I was able to get all of this for free, leaving me more time to implement functionality specific to the company's requirements. Perhaps the closest analogy I can offer to the power of this: imagine having access to the source code for Internet Explorer, being able to change its interface using XML and Javascript, and then running it on any platform without complex recompilation. The XAML model comes closest, but XAML is still at least two years out, and it's unlikely that you'll actually get a chance to manipulate (or even see) the source code for the XAML rendering engine.

Yet for all this, perhaps the most intriguing aspect of Firefox is its ability to integrate multiple extensions. It's worth considering that most of Firefox is in fact an extension of some sort - some extensions are just bundled more tightly with the original package. Third party extensions exist to do everything from translate or speak selected text to showing the weather for the next few days. Some, such as the Web Developer's Toolkit, can actually work very nicely with an editor to show the dimensions and paths of images, boundaries of tables and divs, and activating and deactivating Javascript and Java components on the fly. These extensions can in fact be utilized in conjunction with your own applications as well -- I use a number of them with the editing suite I've developed, again letting me concentrate on the relevant business logic on my end rather than trying to reimplement everything from scratch. This capability will also increase considerably by mid-next year, when SVG and XForms are integrated into the mix - making it possible to generate rich, intelligent forms and interactive multimedia using SVG, XBL bindings and data-aware form components.

The Anatomy of Mozilla

I've been rather blithely throwing around terms and acronyms here that may be familiar to the XUL developers among you but may otherwise be somewhat mysterious to the rest of you. Consequently, digging into the innards of Mozilla may both end up explaining some of this and giving you a better understanding of what exactly applications such as Firefox can do.

Conceptually, Mozilla (and by extension Firefox and Thunderbird) can be broken down two ways: Gecko and SeaMonkey. Gecko is a set of core objects, written primarily in C++, that handle the detailed rendering and memory management of web-based applications. Gecko is perhaps the oldest part of the Mozilla project, started from scratch to draw web pages better than the older Netscape browsers did. You can thank Gecko for Firefox's surprisingly fast rendering. Gecko serves as the interface between the application and the native graphical rendering system (such as GDI on Windows or XFree86 on Linux and Unix based systems), freeing developers from having to access this layer directly.

Gecko, however, is a largely invisible layer from the application developer standpoint. If you're writing an application, you are much more likely to be interfacing with it through SeaMonkey (you can probably begin to detect the direction of the Mozilla Foundation's code name strategy at work here). SeaMonkey provides the code interface layer that makes it possible for us ordinary mortals to write applications, and even to take over the Mozilla browser in order to create our own. SeaMonkey exposes an XML language called the XML User-interface Language (or XUL) that provides a set of building blocks that control various components - textboxes, formatting boxes, status bars, menus, lists, trees, and so forth, along with abstractions for creating key bindings, event observers and referential commands. This set is fairly rich (there are more than one hundred such tags), but it can also be extended with the HTML element set (useful for creating formatted markup within applications) and will further be augmented with the SVG tag-set by March 2005, and XForms by early 2006.
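To give a flavor of what this looks like, here is a minimal, hypothetical XUL window sketch - the title, id, and labels are my own invention, not taken from any shipping application:

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch: a trivial XUL window using a few of the building
     blocks described above - a vertical box, a label, a textbox, and a
     button wired to a bit of inline Javascript. -->
<window title="Hello XUL"
        xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
  <vbox flex="1">
    <label value="Enter your name:"/>
    <textbox id="nameBox"/>
    <button label="Greet"
            oncommand="alert('Hello, ' + document.getElementById('nameBox').value);"/>
  </vbox>
</window>
```

The point to notice is that the entire interface is declarative markup; behavior gets attached through event attributes or external scripts, just as with HTML.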

It is possible to put together applications with nothing but XUL, but they are generally trivial applications at best. As with any other application framework, the structural elements usually need to be bound together with some form of procedural code. SeaMonkey borrowed a page from Internet Explorer here (as well as .NET) - rather than building one language inextricably into the interface, SeaMonkey breaks the process up into two distinct technologies: XPCOM and XPConnect. XPCOM performs the same role for Mozilla that COM does for pre-.NET Windows applications - it queries and binds object interfaces and makes them available for other code to utilize. This cuts down on the requirement of maintaining a static API, and provides a vehicle for writing binary extensions as XPCOM objects. While the two layers are not identical, there is enough similarity between XPCOM and COM that an ActiveX container for Mozilla should soon be supported, making it possible for Firefox applications to run ActiveX controls while providing a layer of security that prevents them from being the threat they've become under Internet Explorer.

To get around coding a specific language to SeaMonkey, XPCOM is designed to be accessed through XPConnect, a binding layer that maps XPCOM to a specific language's interfaces. Currently the primary such language is Javascript 1.5, though plans are in the works to incorporate Javascript 2.0 once that language goes through its final development phase and is approved by the ECMA (a body, incidentally, that has quietly become the de facto holder in trust of programming languages in general). I've covered some of the features of Javascript 1.5 before, including the use of setters and getters, robust regular expression support, the use of constants, and multiple try-catch statement support. However, bindings for other languages, including Python and Perl, are available, and a much more complete Java binder is also under development. Because of the open nature of Mozilla, I would not be at all surprised to see a C# implementation in the near future as well.
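As a quick refresher on one of those Javascript 1.5 features, the following sketch shows getters and setters in action; the account object and its validation rule are my own invented example, shown in the object-literal form:

```javascript
// Invented example: validation logic hidden behind ordinary property
// syntax, courtesy of getter/setter support.
var account = {
  _balance: 0,
  get balance() { return this._balance; },
  set balance(value) {
    if (value < 0) {
      throw new Error("balance cannot be negative");
    }
    this._balance = value;
  }
};

account.balance = 250;        // invokes the setter
console.log(account.balance); // invokes the getter: 250

try {
  account.balance = -10;      // the setter rejects the value
} catch (e) {
  console.log(e.message);     // "balance cannot be negative"
}
```

Callers read and write `account.balance` as a plain property; the guard logic runs invisibly, which is exactly what makes getters and setters attractive for interface code.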

The list of XPCOM objects is quite impressive. A partial list includes the following functionality:
  • Core Functionality (See below)
  • Accessibility Components
  • Address Book Support
  • Clipboard and Selection
  • Content and Layout Managers
  • Cookies
  • HTML and XML DOM Support
  • HTML Editors
  • File and Stream Interfaces
  • Graphics Creation and Manipulation
  • Interprocess Communication (IPC)
  • LDAP
  • Localization
  • Mail Support
  • Network Support (Sockets, et al)
  • News Support
  • Preferences Objects
  • Security
  • Web Browser control
  • Web Services (SOAP/WSDL based)
  • Window Management
  • XML Support (Schema, XSLT, XPath)
  • XUL
The Core functionality provides a number of useful data structures (including dictionaries, arrays, property bags and enumerations) and language type support, along with threading libraries (and pools), timers, event resources, and exception management. While some of these are not necessarily that useful in Javascript, they do have definite utility in other languages such as C++. The graphics library includes interfaces for actually drawing on surfaces within the various objects, though accessing these services can be a little convoluted. The mail, LDAP and news support point out a subtle but important fact about Firefox and Thunderbird - they are simply two applications that sit on the same API - meaning that you could in fact build integrated mail services directly into Firefox if you wanted to.

XPCOM exposes these services and objects via a contract ID, something analogous to the classid used by Microsoft tools. The following, for instance, illustrates how you could create a new local File object:
var file = Components.classes["@mozilla.org/file/local;1"].createInstance(Components.interfaces.nsILocalFile);

The first part of the expression, Components.classes["@mozilla.org/file/local;1"], creates a reference to the local file class defined by the contract ID "@mozilla.org/file/local;1". This is a class reference, not an instance reference (it points to a particular class definition, rather than one specific instance of the class). The createInstance() function in turn creates an instance of this object, using the Components.interfaces.nsILocalFile interface to expose that particular interface on the instance. A given object may conceivably have more than one interface; this code makes it easier (and more computationally efficient) to get at the specific interface's properties. Once this object is retrieved, you can use its properties and methods in exactly the same manner you would in any other language.

The final piece of the SeaMonkey language is the XML Binding Language (or XBL). This XML-based language provides a transformation mechanism that will take user-defined tags written in XUL files and convert them into an internal XUL representation, complete with properties, methods, and event hooks. XBL provides a way of creating more sophisticated elements, and is in fact used within XUL itself for the definition of things such as tab-browsers, which combine tab boxes and browsers into a single component.

A very simple XBL file, one that builds a box with OK and Cancel buttons, might look something like this:

XUL (example.xul):

<?xml version="1.0"?>
<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>
<?xml-stylesheet href="chrome://example/skin/example.css" type="text/css"?>

<window xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
  <box class="okcancelbuttons"/>
</window>

CSS (example.css):

box.okcancelbuttons {
  -moz-binding: url('chrome://example/skin/example.xml#okcancel');
}

XBL (example.xml):

<?xml version="1.0"?>
<bindings xmlns="http://www.mozilla.org/xbl"
          xmlns:xul="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
  <binding id="okcancel">
    <content>
      <xul:button label="OK"/>
      <xul:button label="Cancel"/>
    </content>
  </binding>
</bindings>
The XUL file creates a reference to a CSS file, which in turn uses CSS selector and rule syntax to define the binding between a given class (okcancelbuttons) and an XBL file and its associated "okcancel" binding. Real XBL can become much more complex than this, of course, but that is a topic for a different article.

As expected, SeaMonkey also handles the bindings between CSS and the XUL applications, with XUL heavily utilizing CSS not just for simple "styling" but for the actual creation of complex components through XBL. The CSS support that exists as a consequence is VERY impressive, including certain features that have been floating for a while, such as support for multiple flow columns.

The final piece of any XUL application is the use of overlays. An overlay is a XUL file that changes the XUL (and associated scripts) of a given application. By overwriting or extending (as a form of inheritance) you can do such things as create overlays on Firefox or Thunderbird itself. I do this myself to override the default load and save menu items and replace them with my own, making it possible for me to save to a custom XML schema and load from that schema later.
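By way of illustration, a skeletal overlay might look like the following - the menu item id and the command function here are hypothetical stand-ins, not Firefox's actual identifiers:

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch: when this overlay is merged into a host window,
     any element whose id matches an element in that window replaces
     (or extends) the original. -->
<overlay id="customEditorOverlay"
         xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
  <!-- Replaces the host window's menu item bearing this id -->
  <menuitem id="menu-save" label="Save as Custom XML"
            oncommand="saveAsCustomSchema();"/>
</overlay>
```

The id-matching mechanism is what makes overlays feel like a form of inheritance: you override only the pieces you name, and everything else in the host application remains untouched.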

Firefox is an example of all of these principles in action, by the way. If you have Firefox on your system (along with the Java SDK), copy the browser.jar file, located in the chrome directory of your Firefox distribution, to somewhere outside of the Firefox application folder. You can then use the jar extractor from the SDK to convert it into a directory:

jar xvf browser.jar

This will create a folder called content, which in turn holds the various XUL, XML, and CSS files for the Firefox browser. It is worth spending the time to look at these closely. One of the things you should realize pretty quickly is that almost all of Firefox is contained within these XUL files, not in some form of C++ application, and that a significant portion of the coding for Firefox is handled by Javascript.

Firefox is remarkable to me in that it is one of a new breed of applications, built around an XML interface and scripting yet fully capable of handling some of the most serious challenges that any "formal" application written in C++ or even Java can handle. It is also eminently accessible, in a way that a lot of other applications aren't. It is this model, as much as any widgets or features of the Firefox application, that is the real story here.


I'm looking at the November 22, 2004 issue of eWeek on my desk at an article entitled Browser wars back on. I think that sums it up pretty well. Firefox is not just a shot across the bow to Microsoft -- stealing 5% of market share is much like taking out the yard-arm on your ship with that cannon-shot. A major portion of Microsoft's control over that market share has come from the fact that you could access other components from within it, turning the Internet Explorer shell into a general purpose shell for hosting any kind of application. No other browser out there has really managed to pull it off and still be able to maintain its quality as a browser.

Firefox opens up that possibility. It's not that much of a stretch to envision OpenOffice creating wrappers around its UNO component layer to make it work within Firefox, and as the editor application I've written illustrates, you can go a long way toward building commercially viable enterprise-level applications using just the core components from Firefox. The addition of SVG support and XForms provides another point of attack against both PowerPoint and InfoPath, and it's not hard to envision data-access tools appearing in the next year (perhaps powered by XQuery?) that will give Access a run for its money. Such applications could run on multiple platforms with little or no modification, would be a menu item away from normal browsing, and could easily run in one browser tab while you maintain your mail in a second tab and surf the web in a third.

None of this will happen overnight, of course, but it's easy enough to see the general trendline. Already it is prompting Microsoft to come back with a number of new extensions and innovations in its own browser, though in most cases these still rely upon the existing ActiveX architecture. The biggest danger Microsoft faces here comes from its tendency to pick and choose which standards it complies with; a truly standards-compliant development system is likely to be far more politically attractive than one that is closed and proprietary, especially where it counts -- not in the big enterprise settings, where the adoption of any new technology usually takes place only after that technology has become very settled, but in the spare bedrooms and coffeehouses and garage workstations of the individual developers who are learning (and in many cases developing) the technology of the future. For them, Firefox and the Mozilla Application Suite represent a huge step forward, one that will have reverberations for the next decade and beyond.

This was sent to me by a friend, and I thought it would help put into perspective just exactly how far we have come in fifty years. No word yet on the laptop model.

November 18, 2004


Most of us, the people for whom the Metaphorical Web makes any sense whatsoever, are geeks. Not so long ago such a term was used with the greatest of derision, an open reference to carnival sideshow geeks - people who were bizarre or did bizarre things. I was a geek, the high school kid that would sit in the computer room (and former storage closet) writing programs on the Apple IIe there while everyone else was at lunch or chilling out during recess. I was the kid that would inevitably be picked at the tail end of whatever sports team match we were mandated to play, because everyone knew I had neither skill nor interest in team sports.

Somewhere along the line, being nerdy gained its fifteen minutes of fame, in large part because one of the most stereotypically nerdy people ever had also managed to become the wealthiest man in the world, and for a while, being the odd man out was in. Then the Technical Nuclear Winter hit, and the football team captains and the marketing types were once again in the driver's seat, laughing at how these stupid tech guys had been taken with stock options that were as empty as vaporware and SCO legal threats.

For the younger kids (and even for many of us who are beginning to see gray in our beards) the viciousness of this turnaround was stunning - going from being able to afford expensive houses to being considered one of the lucky ones if you had parents who could put you up in their spare bedroom proved a powerful blow to a lot of people, shaking them up and making them realize how truly random such wealth could be -- and how duplicitous other people could be, if they thought that they could use greed to control you.

I've been through this cycle a couple of times before - I entered into the programming marketplace in the mid-1980s, when career counselors were advising people not to go into computers, there was no future in it, and again in the multimedia bust of 1994, before the Internet really began to heat up. The tech field's like that ... sometimes you ride the big waves, and sometimes it's better just to take your board out of the water and spend the time waxing and repairing it, because the surf is about as flat as it can be.

Curiously enough, though, the real innovations that occur in the field don't occur when the economy is red-hot and the potential for making money is strongest - in fact that's usually the worst time. Your judgement becomes clouded because you're not asking the question "does this solve a real need?" but rather "can this make me rich?". No, the real breakthroughs come when you're sitting at home, chatting with friends via IRC (IM for you newbies) or email lists, playing with ideas or flaming away the dross, putting together something just to see if it can be done. There's a new toolkit I want to play with, there's an idea I saw that I think could work here as well, we need to figure out where the holes are in this specification, because they're causing real interoperability problems.

A surprising number of programmers are also musicians, though not necessarily world class ones. Part of the music/programming association, I suspect, has to do with the analytical nature of music, but a bigger part is that a musician is in his or her own way also a technician, someone who is interested less in the money than in making their tools do something really cool. The process of innovation has reminded me more than once of an extended jam session, continuous improvisation off a theme. Any musician knows that not all such jam sessions produce great music -- often what they produce is just noise, and the musicians just shake their heads and agree to meet again next week. Sometimes, though, everything clicks, everyone finds themselves in the groove, and before you know it the music ends in the wee hours of the morning with the participants exhausted but happy (and I'm deliberately avoiding another obvious metaphor here).

Lately I find that the meaningful work that I'm doing is not coming from the 9 to 5 grind, the thing that keeps the roof over my family's heads. It comes despite it, in the interstices, through the improvisational conversations that we all seem to be engaged in. There are some profound things shaping up in the software field right now ... things that have occurred not because a CEO somewhere had a grand initiative to add another billion dollars to the bottom line but because, in coffeehouses and pizza parlors and IRC chats and e-mails, people have been playing the music of innovation and inspiration, of trying to build something because it needs to be built, profit be damned.

The cycle is shifting yet again, the momentum building, the ideas exchanged at two in the morning at Starbucks going on to make the next BIG THING. No doubt the former jocks and marketing types will begin to circle soon, sensing the potential for making money off these things that they did not create and were not a part of. I suspect they may discover that smart people can be fooled once, but that smart people by definition also learn very, very quickly. Yet I also pity these people, the ones that once reviled us by calling us geeks, for I suspect that deep down they have no music in their souls, that they will never know the real joy of creation.

No code today, though I promise some tasty morsels soon. Until then ... enjoy!