JavaScript XPath Implementation – A Short Way To Rome

Here's the situation: The W3C DOM XPath Specification, although formally correct, appears cumbersome and awkward when implemented in JavaScript.

Exposing an XPath expression's evaluation result through five distinct properties and methods, of which only one is ever valid at a time, is hardly an ideal approach. This is particularly true in JavaScript, where a function call or variable can hold a result of any type.

One of JavaScript's more distinctive features is that it is a dynamically typed (often loosely called "untyped") language. Given this, segregating return values into type-specific members like "booleanValue", "numberValue" etc. is unnecessary and redundant.

Moreover, iterator-style accessors like "iterateNext()" or "snapshotItem()" are inconvenient constructs that are not customary in JavaScript. Accessing collections by means of Array-like objects is a considerably more convenient and less ponderous approach.
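To make the contrast concrete, here is a minimal sketch of the glue code the standard interface requires before a result can be used like an ordinary JavaScript value. In a browser, the input would come from a call such as document.evaluate(expression, contextNode, null, XPathResult.ANY_TYPE, null). The helper name toNativeValue is made up for illustration and is not part of any specification or of the library described here; the numeric constants are those of the standard XPathResult interface.

```javascript
// Numeric values of the standard XPathResult type constants.
const NUMBER_TYPE = 1;
const STRING_TYPE = 2;
const BOOLEAN_TYPE = 3;
const UNORDERED_NODE_ITERATOR_TYPE = 4;
const ORDERED_NODE_ITERATOR_TYPE = 5;
const UNORDERED_NODE_SNAPSHOT_TYPE = 6;
const ORDERED_NODE_SNAPSHOT_TYPE = 7;
const ANY_UNORDERED_NODE_TYPE = 8;
const FIRST_ORDERED_NODE_TYPE = 9;

// Hypothetical helper: convert an XPathResult-like object into a plain
// JavaScript value (number, string, boolean, or an array of nodes).
function toNativeValue(result) {
  switch (result.resultType) {
    case NUMBER_TYPE:
      return result.numberValue;
    case STRING_TYPE:
      return result.stringValue;
    case BOOLEAN_TYPE:
      return result.booleanValue;
    case UNORDERED_NODE_ITERATOR_TYPE:
    case ORDERED_NODE_ITERATOR_TYPE: {
      // Iterator results are drained with iterateNext() until it returns null.
      const nodes = [];
      for (let node = result.iterateNext(); node; node = result.iterateNext()) {
        nodes.push(node);
      }
      return nodes;
    }
    case UNORDERED_NODE_SNAPSHOT_TYPE:
    case ORDERED_NODE_SNAPSHOT_TYPE: {
      // Snapshot results are read by index via snapshotItem().
      const nodes = [];
      for (let i = 0; i < result.snapshotLength; i++) {
        nodes.push(result.snapshotItem(i));
      }
      return nodes;
    }
    case ANY_UNORDERED_NODE_TYPE:
    case FIRST_ORDERED_NODE_TYPE:
      // Single-node results become an array of zero or one node.
      return result.singleNodeValue ? [result.singleNodeValue] : [];
    default:
      throw new TypeError("Unexpected XPathResult type: " + result.resultType);
  }
}
```

Every caller of the standard interface ends up writing some variant of this switch; the pattern suggested in this article would move it out of user code.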

Astonishingly, the W3C's JavaScript binding so far mirrors the W3C DOM XPath Specification exactly as it is… "1:1". It does not take the further step of translating the specification's formal, language-independent intention into a well-designed JavaScript construct.

Well, many roads lead to Rome. And, sure, implementing it this way is one possible way… However, I'd like to suggest a different pattern, describing a significantly shorter way.

The pattern described below reduces the overhead of using XPath to the essentials. It is efficient and convenient, in the spirit of the KISS principle. At the same time it retains the full range of XPath functionality.

I suggest that current browsers adopt the following pattern, so that it becomes a standard client-side JavaScript construct recommended by the W3C.

The following XPath implementation pattern is a suggestion, to be brought to life by anyone interested in driving the idea forward.

I am very much interested in getting your feedback on my XPath JavaScript implementation pattern after you have used it with your HTML pages and XML documents. Your ideas and suggestions for improvement are very welcome. I am looking forward to a lively discussion.

Currently, my website is static and does not allow online comments. If you want to comment on my suggestion, please fill in the contact form.

How does it work?

By using my suggested XPath implementation pattern, retrieving an XPath evaluation result becomes as easy as just creating an XPath object and calling its evaluate() method:

    var x = new XPath();
    var result = x.evaluate(expression);

Or, even simpler:

var result = document.xpath(expression);

The evaluation function usually returns an array of nodes. But it may also return a boolean, number or string value, depending on the return value of the XPath expression you have provided.
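Because the return type depends on the expression, calling code may want to branch on it. The following is a small sketch of how that could look; describeResult is a hypothetical helper, and it assumes that node results arrive as a true Array (so that Array.isArray() applies) rather than as some other Array-like object.

```javascript
// Hypothetical consumer: report what kind of value an evaluation returned.
// Assumes node results are real arrays; scalars are boolean, number or string.
function describeResult(result) {
  if (Array.isArray(result)) {
    return "node list with " + result.length + " item(s)";
  }
  return typeof result + " value: " + String(result);
}
```

For instance, an expression such as count(//td) yields a number in XPath, so the second branch would be taken, while //td yields a node set and would take the first.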

Here's a more elaborate version, using a context node:

    var contextNode = document.getElementById("SomeNode");
    var x = new XPath();
    var result = x.evaluate(expression, contextNode);

Or, in a more object-oriented fashion:

    var contextNode = document.getElementById("SomeNode");
    var result = contextNode.xpath(expression);

Want to use XML and namespaces? The following code demonstrates the potential of utilizing XML with namespaces. An anonymous object is used to provide a namespace map to the evaluate() call. Its property names define the prefixes and the corresponding property values contain the full namespace IRIs:

    var xmlDoc = createDocument();
    var namespaces = {"xsd" : "", "xsi" : ""};
    var x = new XPath(xmlDoc, namespaces);
    var result = x.evaluate(expression);

Or, using a context node:

    var xmlDoc = createDocument();
    var contextNode = xmlDoc.getElementById("SomeNode");
    var namespaces = {"xsd" : "", "xsi" : ""};
    var result = contextNode.xpath(expression, namespaces);

To provide a namespace IRI for a document's default namespace, use the ":" prefix:

    var namespaces = { ":" : "", "xsd" : "", "xsi" : ""};
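The standard document.evaluate() call expects a namespace resolver rather than a plain map. It accepts a function that receives a prefix and returns the namespace IRI or null. As a rough sketch, a map like the one above could be adapted as follows. The helper name makeNamespaceResolver and the urn:example:… IRIs in the usage note are placeholders of my own, and treating an empty prefix as a lookup of the ":" key is only one possible reading of the default-namespace convention described here.

```javascript
// Hypothetical helper: build a resolver function from a prefix-to-IRI map.
function makeNamespaceResolver(namespaces) {
  const map = namespaces || {};
  return function resolve(prefix) {
    // Assumption: an empty or missing prefix is looked up under the ":" key.
    const key = prefix ? prefix : ":";
    // Unknown prefixes resolve to null, as a standard resolver would report.
    return Object.prototype.hasOwnProperty.call(map, key) ? map[key] : null;
  };
}
```

With a map such as { ":" : "urn:example:default", "xsd" : "urn:example:xsd" }, the resolver returns "urn:example:xsd" for the prefix "xsd" and null for any prefix that is not listed.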

That's easy, isn't it?

I have written a proof-of-concept implementation to demonstrate the advantages of my concept clearly. It runs on all modern browsers, including Internet Explorer. The implementation provides the complete, fully working construct. You can use it in your documents right away to start working with XPath.

Give it a try!

How to use this implementation

First, link to the XPath library file.

Loading this file with your document will automatically add

Next, …

All three functions, the XPath constructor, the XPath.evaluate() member function and the xpath() document element member function, take any of the following arguments in any order:

| Name | Type | Description | Default |
|---|---|---|---|
| expression | string | XPath expression to be evaluated against the context node. | An empty string |
| doc | DOM Document object | Document object of either a HTML or XML document. | Current HTML document object |
| context | DOM Element object | Element object within the current XPath object's document property. | Root element object of current XPath object's document property. |
| namespaces | Anonymous mapping object | Object consisting of name/value pairs. Namespace prefixes are stored as property names and corresponding namespace IRIs are stored as corresponding property string values. | null |
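Accepting the arguments in any order works because each of the four has a distinguishable type. The following sketch shows one way such a dispatch could be written; it is not taken from the library itself, and sortArguments is a hypothetical name. It relies on the standard nodeType values 9 (document) and 1 (element).

```javascript
const ELEMENT_NODE = 1;  // same value as Node.ELEMENT_NODE
const DOCUMENT_NODE = 9; // same value as Node.DOCUMENT_NODE

// Hypothetical helper: assign positional arguments to named slots by type.
function sortArguments(args) {
  const named = {};
  for (const arg of args) {
    if (typeof arg === "string") {
      named.expression = arg;
    } else if (arg && arg.nodeType === DOCUMENT_NODE) {
      named.doc = arg;
    } else if (arg && arg.nodeType === ELEMENT_NODE) {
      named.context = arg;
    } else if (arg && typeof arg === "object") {
      // Any other object is taken to be the namespace map.
      named.namespaces = arg;
    }
  }
  return named;
}
```

Slots that are not supplied stay undefined, so a caller can fall back to the defaults listed in the table.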

If you are using the XPath constructor, any arguments you pass are stored as member fields of the XPath object. The same is true for arguments passed to the XPath.evaluate() member function. That way you can call evaluate() several times without having to provide the same arguments over and over again.

Keeping this in mind, the following sequence is valid if you want to execute an XPath evaluation twice:

    var x = new XPath();
    var result1 = x.evaluate(expression);
    var result2 = x.evaluate(); // returns same result as first call, because "expression" is cached
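The caching behaviour can be sketched independently of any browser. The class below is an illustration only, not the library's code: CachingEvaluator and its backend parameter are hypothetical, the backend stands in for whatever actually evaluates the expression, and for brevity the arguments are positional rather than accepted in any order.

```javascript
// Hypothetical illustration of arguments being remembered between calls.
class CachingEvaluator {
  constructor(backend) {
    this.backend = backend; // function (expression, context) => result
    this.expression = "";   // default: an empty string
    this.context = null;
  }

  evaluate(expression, context) {
    // Only overwrite a stored field when a new value is actually supplied.
    if (expression !== undefined) this.expression = expression;
    if (context !== undefined) this.context = context;
    return this.backend(this.expression, this.context);
  }
}
```

Calling evaluate("//td") and then evaluate() passes "//td" to the backend both times, which mirrors the sequence shown above.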

Internet Explorer peculiarities

Being a proof of concept written in JavaScript rather than in native browser code, my implementation has to stay within the limits imposed by current browser implementations.

Internet Explorer currently does not support XPath on HTML documents. It only supports XPath on XML documents. To still be able to apply XPath evaluation calls to HTML documents I had to circumvent this constraint.

The solution I came up with was to clone HTML documents into XML documents and to evaluate XPath expressions on those XML documents instead of the original HTML documents. When the evaluate() function is called, each node in the evaluation result is first looked up in the original HTML document before being returned.

This solution certainly incurs a performance penalty. You should also keep in mind that the document you want to evaluate must not be altered after the document property of the XPath object has been set. If you have altered the document after assigning it to the XPath object, you need to pass the document object to the XPath object again with the next evaluate() call so that the document cache is rebuilt:

    var x = new XPath(); // implicitly caches HTML document
    var contextNode = document.getElementById("SomeNode");
    var result = x.evaluate(expression, contextNode);

    // alter the document (insertBefore() takes the new node and a reference node)
    contextNode.parentNode.insertBefore(document.createElement("br"), contextNode);

    // after altering a HTML document, Internet Explorer requires the document
    // to be re-assigned in order for the document cache to be updated:
    result = x.evaluate(expression, contextNode, document);

Document caching only occurs in Internet Explorer, and only when XPath is applied to HTML documents therein.

Microsoft has made a number of breaking changes to the Internet Explorer DOM since Internet Explorer 7. I have successfully tested my current XPath implementation with Internet Explorer 8 and Internet Explorer 9.

When a HTML document is cloned into an XML document, element names and attribute names become case-sensitive. In Internet Explorer's JavaScript implementation, HTML element names are represented in uppercase characters, whereas attribute names are represented in lowercase characters. So when searching for elements and attributes in HTML documents using XPath with Internet Explorer, remember to use the correct capitalization.

Test drive

I have created a test page so you can experiment with XPath expressions. It basically consists of a table with three rows of three columns, each containing a hyperlink. Click on any of these hyperlinks to set the XPath context node property to the corresponding table cell.

At the bottom, the test page also provides a block element displaying the source of the document. So you can navigate through the source in order to enter appropriate XPath expressions into the input field at the top of the page.

The result of each XPath expression evaluation is displayed right below the table. The test page basically calls the toString() method of every node returned and outputs the result. If a scalar value is returned, its toString() method is called and the resulting string is displayed.

If you enter an invalid XPath expression, a JavaScript alert() dialog is displayed.

The test page source might appear, well, a "little" more stuffed than necessary. I wanted it to be comfortable and look pretty, so I ended up adding CSS and some tiny convenience JavaScript event handlers here and there… :-)

Implementation notes

Because the XPath implementation in Internet Explorer differs significantly from other browsers' XPath implementations, I decided to create two distinct XPath JavaScript implementation files, one for Internet Explorer and a second for all other modern browsers. Both files share the same functional interface. If you use the link to the XPath library file given above in your documents, it will automatically load the correct XPath implementation file into your browser.
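How the library file chooses between the two implementations is not described here. One conceivable approach is feature detection on the standard document.evaluate() method, which Internet Explorer does not provide. The sketch below shows only that idea: the function name and both file names are invented for illustration, and the actual loader may work differently.

```javascript
// Hypothetical loader decision: pick an implementation file by feature
// detection. Both file names are placeholders, not the library's real files.
function pickImplementationFile(doc) {
  return typeof doc.evaluate === "function"
    ? "xpath-standard.js" // browsers with native document.evaluate()
    : "xpath-ie.js";      // fallback for Internet Explorer
}
```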

XPath is no longer available in Internet Explorer as of Internet Explorer 10.

Axel Dahmen Soft- and Hardware-Engineering
03/27/2010 12:23