
The XPath Toolkit in Java 5

William Brogden discusses the XML Path Language (XPath) and two different ways to put it to use in Java 5.

What is XPath Anyway?

The XML Path Language, or XPath, defines a syntax for locating items in an XML document. It was originally defined for use with XSL Transformations (XSLT), and most readers will have encountered it in that context. Java programmers recognized that XPath expressions could be very useful on their own, and with the release of Java 1.5, XPath arrived in the standard toolkit as the javax.xml.xpath package.

Getting an Instance of XPath

As with many other JAXP APIs, you get an instance of a working class by starting with a factory. Although this seems cumbersome, the architecture provides flexibility and allows for future expansion. In the following example, the parameter passed to the newInstance method says that we want to build XPath objects that work with the default W3C DOM object model, the only one supported in Java 1.5.

XPathFactory factory = XPathFactory.newInstance( XPathFactory.DEFAULT_OBJECT_MODEL_URI );
XPath xpath = factory.newXPath();

Once you have an XPath object, there are two ways to put it to work. You can have it evaluate an expression or you can have it compile the expression to create an instance of XPathExpression that incorporates the expression logic and can be used repeatedly.
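Both routes can be seen side by side in the following minimal sketch. It assumes a tiny inline XML string in place of a real file, and the class and method names (EvaluateVersusCompile, parse, run) are hypothetical; the DocumentBuilderFactory and javax.xml.xpath calls are the standard API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EvaluateVersusCompile {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    static String[] run() throws Exception {
        Document doc = parse("<app><name>demo</name></app>");
        XPathFactory factory =
                XPathFactory.newInstance(XPathFactory.DEFAULT_OBJECT_MODEL_URI);
        XPath xpath = factory.newXPath();

        // Way 1: hand the expression string straight to evaluate.
        String direct = xpath.evaluate("/app/name", doc);

        // Way 2: compile once into an XPathExpression, then evaluate it.
        XPathExpression compiled = xpath.compile("/app/name");
        String viaCompiled = compiled.evaluate(doc);

        return new String[] { direct, viaCompiled };
    }

    public static void main(String[] args) throws Exception {
        String[] results = run();
        System.out.println(results[0] + " / " + results[1]); // prints "demo / demo"
    }
}
```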

A Simple XPath Example

The first XML example I am going to use is the web.xml file for the example servlets in Tomcat 5.5.9. In the following statement, doc is a reference to the org.w3c.dom.Document parsed from that web.xml file.

System.out.println( xpath.evaluate("/web-app/filter", doc  ) ); 

Execution of that line produces the following output:
         Servlet Mapped Filter              

That is the text content extracted from the following section of the web.xml document. Note that the evaluation preserved all of the text content of all of the nodes contained in the first "filter" element found.

        <filter-name>Servlet Mapped Filter</filter-name>

It is important to note that only the first node satisfying the expression contributed to the output. When this evaluate method converts the matching node-set to a String, it follows the XPath string() rules and returns the full text content of the first node in document order. Contrast that simple XPath statement with the number of org.w3c.dom.Node method calls that would be required to extract that text from six separate elements, and you begin to see the attraction of working with XPath.
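The first-node behavior is easy to confirm with a small experiment. The sketch below is a hypothetical stand-in for web.xml with two made-up filters; evaluating the path as a String yields only the first filter's text (its descendants' text run together), while count() shows that both elements matched.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class FirstNodeOnly {

    // Hypothetical two-filter document; the names are made up.
    static final String XML = "<web-app>"
            + "<filter><filter-name>First</filter-name><filter-class>a.A</filter-class></filter>"
            + "<filter><filter-name>Second</filter-name><filter-class>b.B</filter-class></filter>"
            + "</web-app>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Both filters match, but the String conversion uses only the first one,
    // concatenating the text of all of its descendants.
    static String firstFilterText() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/web-app/filter", parse(XML));
    }

    // count() confirms that the path really matches both elements.
    static String filterCount() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("count(/web-app/filter)", parse(XML));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstFilterText()); // prints "Firsta.A"
        System.out.println(filterCount());     // prints "2"
    }
}
```

When pointing this at a real web.xml file, one assumption worth checking: the sketch leaves DocumentBuilderFactory at its default of not being namespace-aware. If the descriptor declares a default namespace, as Servlet 2.4 descriptors do, a namespace-aware parse would make unprefixed names such as /web-app/filter select nothing.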

Evaluation for Different Content Types

There are four different XPath methods named "evaluate": two are declared as returning a java.lang.String and two as returning a java.lang.Object reference. Therefore, when you use one of the Object-returning versions, you must cast the result to the expected type. The type those methods return is selected by passing one of the constants defined in the XPathConstants class.
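Two of the four overloads take an org.xml.sax.InputSource instead of a DOM node, so the XPath implementation parses the XML itself. The sketch below exercises one String-returning and one Object-returning InputSource overload; the inline document and class name are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class InputSourceEvaluate {

    // Hypothetical document used only for illustration.
    static final String XML = "<web-app>"
            + "<filter><filter-name>F1</filter-name></filter>"
            + "<filter><filter-name>F2</filter-name></filter>"
            + "</web-app>";

    // evaluate(String, InputSource) returns a String, so no cast is needed.
    static String firstName() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/web-app/filter/filter-name",
                new InputSource(new StringReader(XML)));
    }

    // evaluate(String, InputSource, QName) returns Object, so cast to the
    // type that matches the XPathConstants value requested.
    static Double filterCount() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Double) xpath.evaluate("count(/web-app/filter)",
                new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstName());   // prints "F1"
        System.out.println(filterCount()); // prints "2.0"
    }
}
```

Each call builds a fresh InputSource because the underlying reader is consumed by parsing. If you need several expressions against the same document, parsing it once into a Document and using the other two overloads avoids re-parsing.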

Using the NODESET constant, we can get all five "filter" elements in the example web.xml document with the following statement.

NodeList nl = (NodeList)xpath.evaluate("/web-app/filter", doc, XPathConstants.NODESET );

The returned object implements the org.w3c.dom.NodeList interface. Note that although "NodeList" sounds like it should implement the java.util.List interface, it does not. The XPathConstants values and the corresponding Java reference types returned can be summarized as follows:

Constant                               Returned Java type
XPathConstants.BOOLEAN                 java.lang.Boolean
XPathConstants.NUMBER                  java.lang.Double
XPathConstants.STRING                  java.lang.String
XPathConstants.NODE                    org.w3c.dom.Node
XPathConstants.NODESET                 org.w3c.dom.NodeList
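The table can be exercised in one go. This sketch evaluates expressions with each of the five constants against a hypothetical, cut-down server.xml-like document and casts each result to the type listed above; it also walks the NodeList with getLength() and item(i), since it is not a java.util.List.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ReturnTypes {

    // Hypothetical, cut-down server.xml-like document.
    static final String XML = "<Server port=\"8005\">"
            + "<Service name=\"Catalina\"/><Service name=\"Other\"/>"
            + "</Server>";

    static String run() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        Boolean hasCatalina = (Boolean) xpath.evaluate(
                "/Server/Service[@name='Catalina']", doc, XPathConstants.BOOLEAN);
        Double port = (Double) xpath.evaluate("/Server/@port", doc, XPathConstants.NUMBER);
        String second = (String) xpath.evaluate(
                "/Server/Service[2]/@name", doc, XPathConstants.STRING);
        Node first = (Node) xpath.evaluate("/Server/Service", doc, XPathConstants.NODE);
        NodeList all = (NodeList) xpath.evaluate("/Server/Service", doc, XPathConstants.NODESET);

        // NodeList is not a java.util.List: walk it with getLength() and item(i).
        List<String> names = new ArrayList<>();
        for (int i = 0; i < all.getLength(); i++) {
            names.add(((Element) all.item(i)).getAttribute("name"));
        }
        return hasCatalina + " " + port + " " + second + " "
                + ((Element) first).getAttribute("name") + " " + names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints "true 8005.0 Other Catalina [Catalina, Other]"
    }
}
```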

Now for a more complex example. The XML document source will be the server.xml file that Tomcat uses to define the service to be created and the connectors that will be exposed. Here are the pertinent XML elements. The real file is much larger.

<Server port="8005" shutdown="SHUTDOWN">
 <Service name="Catalina">
   <Connector port="80" maxHttpHeaderSize="8192"
               maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
               enableLookups="false" redirectPort="8443" acceptCount="100"
               connectionTimeout="20000" disableUploadTimeout="true" />
    <!-- many details left out here -->           

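The following code, where doc is an org.w3c.dom.Document containing server.xml, locates the Service element whose "name" attribute equals "Catalina", selects the first Connector element inside it, and tests the attribute named "enableLookups". Because XPathConstants.BOOLEAN applies the XPath boolean() conversion, a bare attribute path would yield true whenever the attribute exists, even when its value is "false"; comparing the value inside the expression makes the returned Boolean reflect the attribute's text.

Boolean flag = (Boolean)xpath.evaluate( 
  "/Server/Service[@name='Catalina']/Connector[1]/@enableLookups = 'true'",
  doc, XPathConstants.BOOLEAN );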
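To see XPath's boolean conversion rule at work, the sketch below evaluates the same attribute path three ways against a cut-down copy of the server.xml excerpt. The inline XML and class name are illustrative assumptions. The bare path converts to true simply because the attribute exists, even though its value is "false".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class BooleanConversion {

    // Cut-down copy of the server.xml excerpt (illustrative only).
    static final String XML = "<Server port=\"8005\" shutdown=\"SHUTDOWN\">"
            + "<Service name=\"Catalina\">"
            + "<Connector port=\"80\" enableLookups=\"false\"/>"
            + "</Service></Server>";

    static boolean[] run() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        String attr = "/Server/Service[@name='Catalina']/Connector[1]/@enableLookups";

        // boolean() of a node-set: true if the set is non-empty, whatever the value.
        Boolean exists = (Boolean) xpath.evaluate(attr, doc, XPathConstants.BOOLEAN);

        // Comparing inside the expression tests the attribute's value instead.
        Boolean isTrue = (Boolean) xpath.evaluate(attr + " = 'true'", doc, XPathConstants.BOOLEAN);

        // Alternatively, fetch the string value and convert it on the Java side.
        boolean parsed = Boolean.parseBoolean(xpath.evaluate(attr, doc));

        return new boolean[] { exists, isTrue, parsed };
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "true false false"
    }
}
```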

Note that although the examples I have been using start with an org.w3c.dom.Document object, the evaluate method can apply an expression to any node in a document; a relative expression is then resolved against that node.
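A minimal sketch of evaluating against a node other than the Document: first select the Connector elements, then apply the relative expression "@port" to each one. The two-connector document and the class name are hypothetical. An absolute expression such as "/Server" would still start from the document root even when the context is a nested node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContextNodeEvaluation {

    // Hypothetical document with two connectors.
    static final String XML = "<Server><Service name=\"Catalina\">"
            + "<Connector port=\"80\"/><Connector port=\"8009\"/>"
            + "</Service></Server>";

    static List<String> ports() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList connectors = (NodeList) xpath.evaluate(
                "/Server/Service/Connector", doc, XPathConstants.NODESET);

        List<String> ports = new ArrayList<>();
        for (int i = 0; i < connectors.getLength(); i++) {
            Node connector = connectors.item(i);
            // "@port" is relative, so it is resolved against this Connector element.
            ports.add(xpath.evaluate("@port", connector));
        }
        return ports;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(ports()); // prints "[80, 8009]"
    }
}
```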

Using XPathExpression Instances

Instead of calling the XPath evaluate method as in the first examples, you can build an XPathExpression instance that contains the expression and use it repeatedly. For example, we could reproduce the output from the first example with the following:

  XPathExpression xpe = xpath.compile("/web-app/filter");
  System.out.println( xpe.evaluate( doc ) );

The intent of the XPathExpression class is to let the programmer define a suite of search expressions once and reuse them, which saves a bit of programming complexity and avoids compiling the same expression string on every call.
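One way such a suite might look is sketched below: a map of named expressions compiled once in the constructor, then applied to any number of documents. The class name, the map keys, and the sample documents are hypothetical choices for illustration, not part of the javax.xml.xpath API.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ExpressionSuite {

    // Named expressions, compiled once when the suite is built.
    private final Map<String, XPathExpression> suite = new LinkedHashMap<>();

    ExpressionSuite() throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        suite.put("firstFilter", xpath.compile("/web-app/filter/filter-name"));
        suite.put("filterCount", xpath.compile("count(/web-app/filter)"));
    }

    // Applies every compiled expression to one document.
    Map<String, String> apply(Document doc) throws XPathExpressionException {
        Map<String, String> results = new LinkedHashMap<>();
        for (Map.Entry<String, XPathExpression> entry : suite.entrySet()) {
            results.put(entry.getKey(), entry.getValue().evaluate(doc));
        }
        return results;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        ExpressionSuite s = new ExpressionSuite();
        // Prints "{firstFilter=A, filterCount=2}" then "{firstFilter=C, filterCount=1}".
        System.out.println(s.apply(parse("<web-app><filter><filter-name>A</filter-name></filter>"
                + "<filter><filter-name>B</filter-name></filter></web-app>")));
        System.out.println(s.apply(parse(
                "<web-app><filter><filter-name>C</filter-name></filter></web-app>")));
    }
}
```

The current javax.xml.xpath JavaDocs describe both XPath objects and XPathExpression instances as not thread-safe, so a multithreaded program would typically give each thread its own suite rather than share one.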

Performance of the XPath Toolkit

Surely nobody would expect XPath, which is built on top of the standard JAXP classes, to be faster than those classes. To measure the performance penalty for using XPath, I timed the creation of an XPathExpression instance and its subsequent evaluation to get a NodeList of the "filter-name" nodes in a web.xml file. Given an existing instance of XPath, the Java statements required are:

XPathExpression xpe = xpath.compile("/web-app/filter/filter-name");
NodeList nl = (NodeList) xpe.evaluate( doc,  XPathConstants.NODESET );

Using the methods in the org.w3c.dom package, you would accomplish the same thing by first getting a NodeList containing the "filter" elements:

NodeList nlOne = doc.getElementsByTagName("filter");

This is followed by a loop that gets the "filter-name" children of each element as a second NodeList:

for( int j = 0 ; j < nlOne.getLength(); j++ ){
  Element fE = (Element)nlOne.item( j ) ;
  // the <filter-name> children of this <filter> element
  NodeList nlTwo = fE.getElementsByTagName("filter-name");
}
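Putting the two fragments side by side, the sketch below runs both approaches over the same small document and checks that they find the same names. The document, class, and method names are hypothetical, and the single System.nanoTime measurement in main includes class loading and JIT warm-up, so its numbers are a rough indication only, not a reproduction of the timings reported here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CompareApproaches {

    // Hypothetical two-filter document.
    static final String XML = "<web-app>"
            + "<filter><filter-name>Filter A</filter-name></filter>"
            + "<filter><filter-name>Filter B</filter-name></filter>"
            + "</web-app>";

    // XPath version: compile, then evaluate to a NODESET.
    static List<String> viaXPath(XPath xpath, Document doc) throws XPathExpressionException {
        XPathExpression xpe = xpath.compile("/web-app/filter/filter-name");
        NodeList nl = (NodeList) xpe.evaluate(doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nl.getLength(); i++) {
            names.add(nl.item(i).getTextContent());
        }
        return names;
    }

    // DOM version: the nested getElementsByTagName loop shown above.
    static List<String> viaDom(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList nlOne = doc.getElementsByTagName("filter");
        for (int j = 0; j < nlOne.getLength(); j++) {
            Element fE = (Element) nlOne.item(j);
            NodeList nlTwo = fE.getElementsByTagName("filter-name");
            for (int k = 0; k < nlTwo.getLength(); k++) {
                names.add(nlTwo.item(k).getTextContent());
            }
        }
        return names;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        XPath xpath = XPathFactory.newInstance().newXPath();

        // One cold run each: treat the timings as rough indications only.
        long t0 = System.nanoTime();
        List<String> a = viaXPath(xpath, doc);
        long t1 = System.nanoTime();
        List<String> b = viaDom(doc);
        long t2 = System.nanoTime();

        System.out.printf("XPath %.3f ms, DOM %.3f ms, same names: %b%n",
                (t1 - t0) / 1e6, (t2 - t1) / 1e6, a.equals(b));
    }
}
```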

The timing results on my 1.4 GHz AMD Athlon CPU can be summarized as follows:

Creating an instance of XPathExpression                  0.3 millisec
Using XPathExpression to get a NodeList                  7.5 millisec
Using getElementsByTagName to find <filter-name> nodes   0.3 millisec


The other performance indicator of interest is the amount of memory used, so I measured the memory consumed by creating 1,000 instances of XPathExpression. This turned out to be very small, approximately 500 bytes per instance.

Apparently the convenience and flexibility of using XPath come with a considerable execution speed penalty, roughly 25 times the time of the getElementsByTagName approach in these measurements. However, for many applications, programmers will be glad to accept a speed penalty in exchange for simplicity and flexibility. I think we can all be glad that XPath is now a part of the Java standard library.


