In this third installment of our eight-part series on XSLT, we'll tackle how XSLT processing works and the kinds of processors used for that job. We'll also describe how an XSLT processor internalizes and represents input documents, how a result tree is created, and how XSLT output occurs. All in all, this gets to the very heart of how XSLT is done.
XSLT processing begins when a special piece of software, the XSLT processor, reads its input documents for interpretation and processing. First, the processor reads a stylesheet document that states the rules and designates targets for further processing and output. Along the way, it works with the following kinds of input, which guide its subsequent behavior:
- <xsl:output> elements describe the format to which the XSLT processor should adhere when creating output
- <xsl:template> elements specify how various parts of the XML input document should be transformed or processed
- Input document: identifies the input document against which the templates should be applied, and from which output should be created
All of these pieces come together in the command that invokes the XSLT processor itself. For example, here's a command-line call to the open source Xalan XSLT processor:
java org.apache.xalan.xslt.Process -in test.xml -xsl test.xsl -out test.html
Using java as the first term in this command invokes the Java runtime environment. The string org.apache.xalan.xslt.Process is the fully qualified name of the Java class (Process, in the package org.apache.xalan.xslt) whose main method runs the processor; loading it also brings in the supporting objects and methods it needs to do its job. -in identifies the next argument, test.xml, as the input file to be processed; -xsl identifies the next argument, test.xsl, as the XSL stylesheet to apply to that input file; and -out identifies the final argument, test.html, as the name (and type) of the output file that the processor will create. Syntax details vary from processor to processor, but most accept the same kinds of arguments.
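The same three pieces (stylesheet, input, output) can also be supplied from a program. The following is a minimal sketch, not Xalan's own command-line code: it assumes the standard JAXP API (javax.xml.transform) that ships with Java, and the inline stylesheet and input strings are hypothetical stand-ins for test.xsl and test.xml.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TransformDemo {
    // Hypothetical stand-in for test.xsl: one output declaration, one template.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"/greeting\">"
      + "<p><xsl:value-of select=\".\"/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical stand-in for test.xml.
    static final String XML = "<greeting>Hello, XSLT</greeting>";

    // Plays the role of the -xsl, -in and -out arguments, using strings instead of files.
    static String transform(String xsl, String xml) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // The stylesheet is read and parsed first...
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(xsl)));
        // ...then the input document is read and transformed into the output.
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(XSL, XML));
    }
}
```

To work with files instead, StreamSource and StreamResult also accept java.io.File arguments. Which processor actually runs depends on the TransformerFactory implementation found on the class path.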
To do its job, the XSLT processor must read in and parse the XML input file that provides the focus for transformation, and do likewise for the XSL stylesheet file that defines the transformations to be applied to the input. This usually occurs in reverse order (stylesheet first, input second) simply because some processor usage scenarios involve applying the same stylesheet to more than one input document. Here's a high-level overview of this process:
- Parsing the Stylesheet. To begin this activity, the processor reads the stylesheet. As it reads, the processor recognizes elements in the stylesheet and creates tokens or other compact and arbitrary placeholders to represent its structure and instructions. It also captures input data (element contents and parameter values) and associates them with specific stylesheet elements for later use. Xalan, for example, creates a sparse table structure to represent stylesheet data so it can easily navigate around its structure and quickly access related content and parameter values.
- Parsing Input. Next, the processor reads all the text in the input XML document and builds a tree representation of that source markup. The root of this tree is a root node that stands for the document as a whole; the outermost document-level element that every well-formed XML document must have is a child of that root node, and each element nested inside it becomes a branch beneath it. As parsing continues, the processor creates a node for each element, attribute, run of text, comment, and processing instruction it encounters and attaches that node to its parent. When it finishes parsing the entire input document, the result is a complete tree that not only captures all of the document's contents but also represents its structure completely and precisely.
- Applying Transformations. Transformation is an iterative process that repeats a recipe something like this:
- Look for nodes to process by examining the processing context (which essentially represents the processor's position in the document tree). This starts out at the root of the input document tree and changes in response to XSLT elements encountered in the stylesheet and nodes encountered in the input. Essentially this works like a finite-state machine where input comes from the input tree and instructions from the combination of processing context and stylesheet table data. Next steps assume one or more nodes are present in that context.
- Check to see if the next node in the context has any matching <xsl:template> elements. If so, go to the next step; if not, the XSLT processor falls back on its built-in template rules. These rules let a stylesheet supply templates only for the nodes it wishes to handle, and they instruct the processor to keep moving on to further nodes whenever it encounters a node with no matching template. The built-in rules make sure that the root node and all element nodes are processed (even if that only means passing over them to reach their children), that the text of text and attribute nodes is copied to the output, that comments and processing instructions (PIs) are ignored, and that namespace nodes produce no output of their own.
- If one or more templates do match, the processor picks one by import precedence and then by priority. By default, more specific patterns receive higher priority (they identify fewer nodes, but do so more precisely), so the most specific template usually applies.
- Change the context in response to whatever action applies (usually, this means advancing to the next node in the input document tree).
- Return to the first item in this list until there are no more nodes to process.
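The matching steps above can be observed with a small experiment. This is a sketch under stated assumptions, not a description of any processor's internals: it uses the standard JAXP API, and the stylesheet and input document are hypothetical. The list and title elements have no template of their own, so the built-in rules walk through them and copy the title text; one item matches two templates, and the more specific pattern wins; the comment is ignored.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TemplateMatchDemo {
    // Hypothetical stylesheet with two templates that can both match an item.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
        // General rule: matches every item element.
      + "<xsl:template match=\"item\">[<xsl:value-of select=\".\"/>]</xsl:template>"
        // More specific rule: matches only items that carry a special attribute.
      + "<xsl:template match=\"item[@special]\">*<xsl:value-of select=\".\"/>*</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical input: list and title have no matching template.
    static final String XML =
        "<list><title>T</title><item>a</item>"
      + "<item special=\"yes\">b</item><!-- note --></list>";

    static String run() throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(XML)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(run()); // prints T[a]*b*
    }
}
```

In the result, T comes from the built-in rule for text nodes, [a] from the general item template, and *b* from the more specific template, while the comment contributes nothing.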
Combined with XSLT's various functions and transformations, this creates an environment where processing is relatively fast and where substantial output transformations are relatively easy to achieve.
The open source Xalan and Saxon XSLT processors are by no means the only such processors available, but they are among the most popular. Using Xalan means installing a Java runtime environment and downloading the right Java packages, all of which are explored and explained at the Xalan-Java home page. Saxon comes in two flavors: Saxon-B, which implements basic XSLT 2.0 and XQuery, and Saxon-SA, which adds support for XML Schema to Saxon-B's capabilities. Some versions of Saxon require JAXP 1.3 or 1.4 as well as Java. Visit the Saxon home page for instructions on how and what to download, and how to install this processor.
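One practical consequence of parsing the stylesheet first is that the parsed form can be kept and applied to several input documents without being read again. The sketch below assumes the standard JAXP Templates interface; the counting stylesheet and the two input documents are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StylesheetReuseDemo {
    // Hypothetical stylesheet: reports how many item elements the input contains.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\"><xsl:value-of select=\"count(//item)\"/></xsl:template>"
      + "</xsl:stylesheet>";

    static List<String> transformAll(String xsl, List<String> inputs)
            throws TransformerException {
        // Parse the stylesheet once into a reusable Templates object.
        Templates compiled = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(xsl)));
        List<String> results = new ArrayList<>();
        for (String xml : inputs) {
            // Each input gets a fresh Transformer built from the same parsed stylesheet.
            Transformer transformer = compiled.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            results.add(out.toString());
        }
        return results;
    }

    public static void main(String[] args) throws TransformerException {
        // Hypothetical inputs with two items and one item respectively.
        List<String> inputs = Arrays.asList(
            "<l><item/><item/></l>",
            "<l><item/></l>");
        System.out.println(transformAll(XSL, inputs));
    }
}
```

How much time this saves depends on the processor and on the size of the stylesheet, so treat it as a pattern to measure rather than a guaranteed speedup.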
With one of these tools installed on your desktop, you can begin your own experiments with processing XSLT. It's the best way to learn and an essential way to make use of this great technology.
About the author
Ed Tittel is a full-time writer and trainer whose interests include XML and development topics along with IT Certification and information security topics. E-mail Ed with comments, questions, or suggested topics or tools for review.