November 29, 2007

CDF: The common format you've never heard of

Quick! Do you use the Compound Document Format?! You know, CDF … surely you use CDF, right?

Chances are pretty good that you have no idea what I’m talking about. Everyone knows Microsoft’s Word document format and Adobe’s PDF, and chances are pretty good that if you’re reading this on XML.com you’ve heard of ODF and OOXML, especially after the fairly rancorous discussions about ISO status for these two formats. Yet CDF, hmmmm … that’s a rough one. Didn’t it belong to Corel, once upon a time?

Okay, now, how about this one … do you work with (or even just read) XHTML? Probably, if you’re involved in XML work, your HTML conforms to a great degree to the formal XHTML standard. Good enough. How about CSS 2.1? Sure, who doesn’t? Okay, here’s a biggy - how about XMLHttpRequest? You do AJAX work? Good for you. XForms - well, that one’s a little less prominent, but yeah, it’s beginning to appear. SVG? Hmmm … again, kind of touch and go, but even after a few hard years SVG’s by no means dead yet. The occasional XSLT - Google’s doing some nice work in getting their JavaScript-based AJAXSLT up and running, which means that those few browsers that don’t support XSLT (and XPath) natively should be able to support it via an AJAX layer. Oh, and perhaps throw in XSL-FO for good measure, as it continues its quiet but relentless march into becoming a mainstream format.

So, do you work with CDF? You betcha. The Compound Document Format was set up as a way of tying together, at a minimum, all of the technologies described above into a single cohesive whole. Put another way, it’s a fancy name for the core suite of W3C document standards working in concert, although it does place some fairly minor requirements on usage in order to provide a consistent standard.

CDF was in the news recently with the implosion of the Open Document Foundation, which was originally established to endorse ODF but in its death throes briefly highlighted CDF as perhaps a better format for documents than either OOXML or ODF. This is admittedly one of those areas where you may be justified in looking at XHTML especially and going “huh?” How can that be a full document format? It’s used for web pages, after all - you wouldn’t want to use it to mark up a full book, would you?

Document formats are a lot like religions - people are ready to defend them to the death if need be, yet at the same time it becomes easy to dismiss certain religions that don’t even seem to be religions at all (such as my personal favorite, the rather philosophical Tao). Could you mock up a brochure in XHTML and CSS? Actually, it turns out that it’s surprisingly easy to do just that - especially if you throw a little SVG into the mix and allow the possibility of embedding XHTML within SVG (for all those odd little bits of rotation and other special effects).
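
As a rough illustration of that last point, SVG’s foreignObject element is the usual way to host XHTML inside an SVG drawing. The Python sketch below builds a small rotated caption using only the standard library. The function name, the sizes, and the caption text are all made up for the example, and how well a given browser renders the result is a separate question.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize with a default SVG namespace and an "h" prefix instead of ns0/ns1.
ET.register_namespace("", SVG_NS)
ET.register_namespace("h", XHTML_NS)


def rotated_caption(text, angle=-15):
    """Return an SVG document whose foreignObject holds an XHTML paragraph."""
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": "300", "height": "200"})
    # SVG supplies the rotation; the text itself stays ordinary XHTML + CSS.
    group = ET.SubElement(svg, f"{{{SVG_NS}}}g",
                          {"transform": f"rotate({angle} 150 100)"})
    holder = ET.SubElement(group, f"{{{SVG_NS}}}foreignObject",
                           {"x": "50", "y": "70", "width": "200", "height": "60"})
    body = ET.SubElement(holder, f"{{{XHTML_NS}}}body")
    para = ET.SubElement(body, f"{{{XHTML_NS}}}p", {"style": "font-size: 14pt"})
    para.text = text
    return ET.tostring(svg, encoding="unicode")


markup = rotated_caption("Spring brochure")
print(markup)
```

The design point is the division of labor: the drawing handles geometry, while the paragraph remains plain XHTML that CSS can style as usual.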

How about linking between blocks of physically disconnected content? That’s what hyperlinks are for, after all. CSS gives you at least a dozen different unit references, and lets you control borders, margins, padding, kerning, image placement, color management, and so forth. There are pieces I wish could work better - I’d love to have the capability of defining a color in SVG and then referencing it within my XHTML document via CSS; full support of CDF will likely allow that. If you jump just a little beyond the current CSS 2.1 spec you even have some fairly decent support for columns of text, not to mention tabular structures and even VoiceML support. Moreover, consider XInclude support, something that really, really needs to be a part of every browser implementation (though it’s fairly trivial to write AJAX classes that let you create similar bindings).
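
To give a feel for what XInclude-style binding involves, here is a minimal sketch in Python rather than browser script. It leans on the standard library’s xml.etree.ElementInclude module. The PARTS dictionary, the file names, and the in-memory loader are stand-ins invented for the example; a browser version would presumably fetch each part with XMLHttpRequest instead. The markup is left un-namespaced to keep the example short.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical document parts, held in memory so the example is self-contained.
PARTS = {
    "chapter1.xml": "<section><h2>Chapter 1</h2><p>It begins.</p></section>",
    "chapter2.xml": "<section><h2>Chapter 2</h2><p>It continues.</p></section>",
}

BOOK = """<body xmlns:xi="http://www.w3.org/2001/XInclude">
  <h1>A Short Book</h1>
  <xi:include href="chapter1.xml"/>
  <xi:include href="chapter2.xml"/>
</body>"""


def load_part(href, parse, encoding=None):
    """Loader callback: return a parsed element (or raw text) for an href."""
    source = PARTS[href]
    return ET.fromstring(source) if parse == "xml" else source


book = ET.fromstring(BOOK)
# Each xi:include child is replaced in place by the element the loader returns.
ElementInclude.include(book, loader=load_part)
titles = [heading.text for heading in book.iter("h2")]
print(titles)
```

After the call, the book element contains both chapters as ordinary children, which is the “bound together” effect in miniature.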

These are all disparate documents, not a single “document” akin to Word or the ODF Writer format. However, even that’s not quite true. The effort of the CDF working group has been to essentially standardize on the way that web documents can be “bound together” into what appears to be a cohesive “whole”.

Part of this is accomplished through the use of a standard called the Web Integration Compound Document (or WICD). This standard provides a number of both new features and clarifications:

  • an extended definition for the HTML object,
  • the integration of Scalable Vector Graphics (SVG) with XHTML and other documents,
  • establishes the and elements,
  • establishes the characteristics of hyperlinks and focus across the boundaries of namespaces,
  • defines the nature of focus across multiple embedded documents,
  • introduces support for animation and synchronization, building on the older SMIL and SVG standards, and
  • includes support for SVG fonts.

Two additional WICD standards - WICD Mobile and WICD Full - extend these with a few other features, most notably ECMAScript 3rd Edition (4th edition is currently still under development), the XMLHttpRequest object to support most AJAX applications, XHTML 1.1 and the CSS 2.1 specification. Additionally, the CDF working group defines two “modes” of operation - Compound Documents by Reference (CDR), in which content internal to the document is provided via reference links, and Compound Documents by Inclusion (CDI), in which internal content is rendered into the containing document directly in a different namespace.
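
The difference between the two modes is easiest to see side by side. In the hedged Python sketch below, the same XHTML page carries a chart once by reference (an object element pointing at a hypothetical chart.svg) and once by inclusion (the SVG markup inline, in its own namespace). The helper names namespaces_used and referenced_parts are invented for the example and are not part of any CDF specification.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Compound Document by Reference: the SVG lives in a separate resource.
BY_REFERENCE = f"""<html xmlns="{XHTML_NS}"><body>
  <object data="chart.svg" type="image/svg+xml">Sales chart</object>
</body></html>"""

# Compound Document by Inclusion: the SVG is embedded in its own namespace.
BY_INCLUSION = f"""<html xmlns="{XHTML_NS}"><body>
  <svg xmlns="{SVG_NS}" width="100" height="50">
    <rect width="100" height="50" fill="steelblue"/>
  </svg>
</body></html>"""


def namespaces_used(markup):
    """Return the set of namespace URIs used by elements in the markup."""
    root = ET.fromstring(markup)
    # ElementTree spells qualified tags as "{namespace}localname".
    return {el.tag[1:].split("}")[0] for el in root.iter()
            if el.tag.startswith("{")}


def referenced_parts(markup):
    """Return the data URLs of XHTML object elements (by-reference children)."""
    root = ET.fromstring(markup)
    return [obj.get("data") for obj in root.iter(f"{{{XHTML_NS}}}object")]


print(namespaces_used(BY_REFERENCE), referenced_parts(BY_REFERENCE))
print(namespaces_used(BY_INCLUSION), referenced_parts(BY_INCLUSION))
```

The by-reference page uses only the XHTML namespace and lists one external part, while the by-inclusion page mixes two namespaces and references nothing outside itself.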

Admittedly, I think it can be argued that the W3C effort could do with a few fewer acronyms (and perhaps a bit more of a PR effort) but overall, what is happening with CDF is a very critical - and welcome - evolution of HTML. The web needs to be more than just static web pages - this has been demonstrated by the continuing strength of the Web 2.0 meme, the idea that documents should be able to talk back to the server and interact with it at higher levels than simple links for refreshing content. Commercial vendors would love to fill that space, and by doing so regain for themselves control over the underlying technologies that make up the web. Yet the message coming out of CDF is both simple and profound: the web grew up on the strength of seemingly simple technologies - HTML, CSS, JavaScript. These technologies are still around; they’re just maturing as people come up with new ideas about what the web can do, the pieces slowly unfolding as we progress further into the realm of the Internet as operating system.

Already, much of CDR has been implemented in the more forward-looking browsers. Opera 9.5 has rather extensive support for most of CDF core and Firefox 3.0 is moving in that direction (though the biggest area of weakness is in SVG animation support). JustSystems, a company that has a huge presence in Japan but is only now just (sorry) beginning to make an impact outside of that country, has been working towards a CDF platform for a number of years, and has one of the more expressive (and impressive) displays of how compound documents COULD work (I will no doubt be writing more about JustSystems over time). Both Sony and Nokia have WICD implementations working (as prototypes) on certain of their mobile phone chipsets, with similar announcements from Abbra Vidualize and BitFlash, both makers of mobile graphical chipsets, while Sun is partnering with OpenWave to create a formal WICD implementation in line with JSR 290: Java Language & XML User Interface Markup Integration.

This is no guarantee, of course, that WICD will catch on, or even become a household name (even in geek households), but unlike other technologies, most of what’s involved already exists, and is proven millions of times a day. Already, the number of HTML documents that exist dwarfs (by a few orders of magnitude) the total number of Microsoft Word documents. As editing increasingly moves onto the web, it’s safe to say that the document of choice will be neither ODF nor OOXML, both of which gain their power on the basis of supporting legacy word processing systems. Instead, what seems to be emerging from the W3C is something that is not an office suite because it didn’t evolve from one, but that nonetheless is capable of most if not all of the same functions that office suite documents perform.

Moreover, XHTML by itself is NOT the only targeted compound document (indeed, the specification is rather clear that it is intended to be extended to other formats, from XSL-FO to DocBook to the aforementioned ODF or OOXML). Once you realize that, what becomes clear is that the ability to integrate content can itself be standardized, and as with many other W3C formats, this move to the metadata level may very well provide the necessary differential to make the technology succeed in even the most competitive of milieus.

So, don’t worry - if you’re not using CDF yet … you will be.

Kurt Cagle is an author and chief architect of Metaphorical Web (http://www.metaphoricalweb.org). He lives in Victoria, BC, where he is making it his mission to visit every Starbucks on Vancouver Island.
