At one time, programmers who wanted to process XML in Java had to hunt for one or more libraries from Sun, the Apache Software Foundation, or some other source and add them to their Java programming environment. Beginning with Java 1.4, however, most XML processing tasks can be handled with classes in the standard edition, and Java 1.5 adds many new tools. If you are looking for an excuse to move your Java development to 1.5, the built-in XML capabilities may be all you need.
The basic division in XML processing is between approaches that give continued access to the entire document at once and those that take a single "streaming" pass through the document. Java provides the Java API for XML Processing (JAXP), which includes standardized APIs for both types of processing. JAXP 1.3, the latest version, was finalized in September 2004 and is implemented in the Java 1.5 Standard Edition release.
JAXP is designed to be an implementation-independent, portable API, meaning that any vendor's parser toolkit that meets the API specification can be plugged into a program that is written to the API. In practice, you will probably find the parser that comes with the Java Standard Edition to be satisfactory.
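To make the plug-in idea concrete, here is a minimal sketch. A program asks the abstract factory classes in javax.xml.parsers for a parser and never names a vendor class directly. The class name JaxpFactories and its two helper methods are made up for this illustration; the implementation names printed depend on your Java installation.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical demo class: reports which parser implementations JAXP selected.
class JaxpFactories {

    // Class name of the DOM factory implementation found at run time.
    static String domFactoryName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Class name of the SAX factory implementation found at run time.
    static String saxFactoryName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The code above refers only to the abstract JAXP factories,
        // so a different vendor's parser can be substituted without changes here.
        System.out.println("DOM factory: " + domFactoryName());
        System.out.println("SAX factory: " + saxFactoryName());
    }
}
```

One way to substitute another parser is to set the javax.xml.parsers.DocumentBuilderFactory or javax.xml.parsers.SAXParserFactory system property to the vendor's factory class name before calling newInstance().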
Most people easily grasp the Document Object Model or DOM since it presents a hierarchical or "tree" structure like that used in HTML and XHTML documents. In this approach, the entire document is read into memory where each node can be examined and manipulated. The key Java package for DOM manipulation is org.w3c.dom, where the "org.w3c" reflects the fact that its interfaces follow the World Wide Web Consortium (W3C) DOM standard.
The key interface within the org.w3c.dom package is Node. In a DOM representation, every part of an XML document is an object that implements the Node interface or one of its subinterfaces. For example, there are interfaces representing elements, attributes, text and comments. A DOM consists of a collection of Java objects linked together so that they represent the entire contents of the source XML document. The Javadoc for the Node interface has a table showing the values of nodeName, nodeValue and attributes for each of these interfaces. You should become familiar with this table before writing Java code to use the DOM.
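The following sketch shows how the different node types appear to a program. It parses a small document from a string and walks the tree, using getNodeType() to tell elements, text and comments apart. The class name DomWalk, its methods, and the sample "tip" document are invented for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: prints one line per node in a DOM tree.
class DomWalk {

    // Parses XML held in a string into a DOM Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    static void indent(int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    // Appends a description of the node, its attributes and its children.
    static void describe(Node node, int depth, StringBuilder out) {
        indent(depth, out);
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                out.append("element ").append(node.getNodeName());
                break;
            case Node.TEXT_NODE:
                out.append("text \"").append(node.getNodeValue()).append('"');
                break;
            case Node.COMMENT_NODE:
                out.append("comment");
                break;
            default:
                out.append("other node type ").append(node.getNodeType());
        }
        out.append('\n');

        // Attributes are not child nodes; getAttributes() is null for non-elements.
        NamedNodeMap attributes = node.getAttributes();
        if (attributes != null) {
            for (int i = 0; i < attributes.getLength(); i++) {
                Node attr = attributes.item(i);
                indent(depth + 1, out);
                out.append("attribute ").append(attr.getNodeName())
                   .append('=').append(attr.getNodeValue()).append('\n');
            }
        }

        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            describe(children.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document for illustration.
        Document doc = parse("<tip id=\"7\">DOM<!-- note --></tip>");
        StringBuilder out = new StringBuilder();
        describe(doc.getDocumentElement(), 0, out);
        System.out.print(out);
    }
}
```

Note that attributes are reached through getAttributes() rather than getChildNodes(), which is one of the details the Node table in the Javadoc makes clear.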
The advantages of the DOM programming model are that every element can be located, manipulated and changed very efficiently. The disadvantages of working with a DOM are the processing time and memory required. Even if you only need access to a single element in the XML document, you have to parse the entire document into memory. As documents get larger, DOM processing becomes less practical.
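As a rough illustration of both sides of that trade-off, the sketch below loads a whole document and then locates and changes elements with two method calls. The class DomEdit, its methods, and the "tips" sample document with its "title" elements are assumptions made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: finds and edits elements in an in-memory DOM.
class DomEdit {

    // The entire document is parsed into memory, even if only one element is needed.
    static Document load(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Replaces the text of every <title> element and returns how many were changed.
    static int retitle(Document doc, String newTitle) {
        NodeList titles = doc.getElementsByTagName("title");
        for (int i = 0; i < titles.getLength(); i++) {
            titles.item(i).setTextContent(newTitle);
        }
        return titles.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document for illustration.
        Document doc = load(
            "<tips><tip><title>Old</title></tip><tip><title>Older</title></tip></tips>");
        int changed = retitle(doc, "New");
        System.out.println("Changed " + changed + " title elements");
        System.out.println(doc.getElementsByTagName("title").item(0).getTextContent());
    }
}
```

Once the tree is in memory, any element can be revisited or changed in any order. The cost is that load() must finish reading the whole document before the first getElementsByTagName() call can run.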
Programming with a streaming parser is not as obvious, but it provides many advantages. A streaming parser takes a single pass through the document and deals with one piece of it at a time. It is up to the programmer to decide which information to keep and which to ignore. The streaming API in JAXP is SAX, the Simple API for XML. SAX was developed for Java outside the W3C standardization effort, in a project led by David Megginson. The key package in the Java library for SAX programming is org.xml.sax.
As a SAX parser works its way through an XML document, it generates events that report what it encounters, such as the start of an element, a run of character data, or the end of an element. Programming for SAX processing consists of creating the classes and methods to handle these events and extract the desired information.
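A small event handler might look like the sketch below. It extends org.xml.sax.helpers.DefaultHandler, keeps only the text found inside "title" elements, and ignores everything else. The class TitleCollector, its collect() method, and the sample document are invented for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo handler: collects the text of every <title> element.
class TitleCollector extends DefaultHandler {

    private final List<String> titles = new ArrayList<String>();
    private StringBuilder current; // non-null only while inside a <title>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(qName)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks, so append rather than assign.
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName) && current != null) {
            titles.add(current.toString());
            current = null;
        }
    }

    // Runs a single streaming pass over the XML and returns the collected titles.
    static List<String> collect(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TitleCollector handler = new TitleCollector();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document for illustration.
        System.out.println(collect(
            "<tips><tip><title>DOM</title><body>x</body></tip>"
            + "<tip><title>SAX</title></tip></tips>"));
    }
}
```

The handler never holds more than one title's text at a time, which is why this style scales to documents too large to load as a DOM. The price is that the program must track its own position in the document, here with the current field.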
The Java Standard Edition provides other XML-related APIs to make working with XML in Java easier and faster. All of these tools build on the JAXP foundation. However, discussion of these advanced tools will have to wait for another tips column.
About the author
Bill Brogden is a computer consultant who enjoys exploring new technologies. He has written study guides for Java certifications and several books on using XML with Java. You can reach Bill at email@example.com.