Introduction to the TreeWalker object of DOM

The TreeWalker object is a powerful DOM2 object that lets you easily filter through and create custom collections out of nodes in the document. Ok, this is sounding geeky already, but for geeky jobs that require parsing the document tree, it doesn't hurt at all to get familiar with this object. While scripting you may have come across the need to retrieve all elements in a webpage with a specific CSS class name, or for an XML document, elements that carry a particular attribute value. The TreeWalker object makes light work of such tasks. In this tutorial, I'll provide an introductory look at the TreeWalker object, a DOM2 feature supported in Firefox and Opera 8+, though not in IE6 or IE7 (as of beta 3).

Before I continue, note that there is a cousin to the TreeWalker object called NodeIterator, which I'll cover in a future tutorial.

document.createTreeWalker() method

The TreeWalker object can come off as mysterious and complicated to some, but it really is just realized through a single method: document.createTreeWalker(). This method and the 4 parameters it accepts simplify tasks that would otherwise take many times as much conventional code, such as filtering all nodes in the document that are of a certain element type and carry a particular attribute. But before we get to all that, here's the basic syntax of document.createTreeWalker():

document.createTreeWalker(root, nodesToShow, filter, entityExpandBol)

Time to break down the 4 parameters:

root: The root node at which to begin traversing the document tree.
nodesToShow: The type of nodes that should be visited by TreeWalker, given as a NodeFilter constant.
filter (or null): Reference to a custom function (NodeFilter object) that further filters the nodes returned. Enter null for none.
entityExpandBol: Boolean parameter specifying whether entity references should be expanded.

For the second parameter (nodesToShow), the valid constant values are:

NodeFilter constants
NodeFilter.SHOW_ALL	NodeFilter.SHOW_ENTITY_REFERENCE	NodeFilter.SHOW_DOCUMENT_TYPE
NodeFilter.SHOW_ELEMENT	NodeFilter.SHOW_ENTITY	NodeFilter.SHOW_DOCUMENT_FRAGMENT
NodeFilter.SHOW_ATTRIBUTE	NodeFilter.SHOW_PROCESSING_INSTRUCTION	NodeFilter.SHOW_NOTATION
NodeFilter.SHOW_TEXT	NodeFilter.SHOW_COMMENT
NodeFilter.SHOW_CDATA_SECTION	NodeFilter.SHOW_DOCUMENT

While there are 13 different NodeFilter constants to let you limit the type of nodes returned by TreeWalker, you will probably work with just a few of them most of the time. NodeFilter.SHOW_ELEMENT, for example, returns all element nodes.
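
The constants are numeric bit flags, so more than one node type can be requested by combining them with a bitwise OR. Here is a rough sketch of the idea. The local SHOW object and the isShown() helper are names I made up for illustration (they are not part of the DOM); the three numeric values are copied from the DOM specification only so the snippet also runs where NodeFilter isn't defined.

```javascript
// Local copy of three NodeFilter flag values (an assumption made so the
// sketch runs outside a browser; in a page, use NodeFilter.SHOW_* directly).
var SHOW = {
  ELEMENT: 0x1, // NodeFilter.SHOW_ELEMENT
  TEXT: 0x4,    // NodeFilter.SHOW_TEXT
  COMMENT: 0x80 // NodeFilter.SHOW_COMMENT
}

// Hypothetical helper: would a node of this nodeType be visited?
// Each flag is bit (nodeType - 1), e.g. nodeType 3 (text) maps to 0x4.
function isShown(nodesToShow, nodeType) {
  return (nodesToShow & (1 << (nodeType - 1))) !== 0
}

// Visit both element and text nodes, but not comments
var elementsAndText = SHOW.ELEMENT | SHOW.TEXT

isShown(elementsAndText, 1) // element node: true
isShown(elementsAndText, 3) // text node: true
isShown(elementsAndText, 8) // comment node: false
```

In a page, the combined value simply goes in as the second parameter, for example document.createTreeWalker(rootnode, NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT, null, false).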

Ok, so you're dying to see a demonstration of document.createTreeWalker(), a very rudimentary one to start:

<div id="contentarea">
<p>Some <span>text</span></p>
<b>Bold text</b>
</div>

<script type="text/javascript">

var rootnode=document.getElementById("contentarea")
var walker=document.createTreeWalker(rootnode, NodeFilter.SHOW_ELEMENT, null, false)

</script>

In this example, I set the root node from which TreeWalker begins traversing to the container with ID "contentarea". The second parameter specifies that TreeWalker should only crawl element nodes (versus text nodes, comment nodes etc) within the container. The third parameter, set to null, means no additional filtering should be done (not yet!). The 4th parameter concerns whether entity references should be expanded, and is set to false. With all the parameters in place, "walker" now references all elements (P, SPAN, and B) within the DIV, along with the DIV itself.
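
Since not every browser supports document.createTreeWalker(), it may be worth guarding the call. The following is only a sketch under that assumption: createElementWalker() is a hypothetical helper name of my own, and it passes 1, the numeric value of NodeFilter.SHOW_ELEMENT, so that it doesn't depend on the NodeFilter object being present.

```javascript
// Hypothetical helper: returns a TreeWalker over the elements inside root,
// or null when the given document object has no createTreeWalker() method.
function createElementWalker(doc, root) {
  if (!doc || typeof doc.createTreeWalker != "function" || !root) return null
  // 1 is the numeric value of NodeFilter.SHOW_ELEMENT
  return doc.createTreeWalker(root, 1, null, false)
}

// Only touch the page when one exists, so the sketch is harmless elsewhere
if (typeof document != "undefined") {
  var walker = createElementWalker(document, document.getElementById("contentarea"))
  if (walker) alert(walker.currentNode.tagName) // the root node itself
}
```

When the helper returns null, a script can fall back to whatever conventional DOM code it would have used otherwise.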

TreeWalker traversal methods

Having created a filtered list of nodes using document.createTreeWalker(), you can then process these filtered nodes using TreeWalker's traversal methods:

TreeWalker traversal methods
Method	Description
firstChild()	Travels to and returns the first child of the current node.
lastChild()	Travels to and returns the last child of the current node.
nextNode()	Travels to and returns the next node within the filtered collection of nodes.
nextSibling()	Travels to and returns the next sibling of the current node.
parentNode()	Travels to and returns the current node's parent node.
previousNode()	Travels to and returns the previous node within the filtered collection of nodes.
previousSibling()	Travels to and returns the previous sibling of the current node.

TreeWalker traversal properties
Property	Description
currentNode	Returns the current position (node) of TreeWalker. Read/write, allowing you to explicitly set the current position of TreeWalker to a particular node within the nodes returned.

Don't confuse the above methods with the standard DOM element properties/methods; the ones above work exclusively within the TreeWalker object to let you navigate the filtered nodes.

Using the same example as above, let's see how to use the traversal methods to walk through the returned nodes:

<div id="contentarea">
<p>Some <span>text</span></p>
<b>Bold text</b>
</div>

<script type="text/javascript">

var rootnode=document.getElementById("contentarea")
var walker=document.createTreeWalker(rootnode, NodeFilter.SHOW_ELEMENT, null, false)

//Alert the starting node TreeWalker currently points to (root node)
alert(walker.currentNode.tagName) //alerts DIV (with id=contentarea)

//Step through and alert all remaining nodes in the filtered collection
while (walker.nextNode())
alert(walker.currentNode.tagName) //alerts P, SPAN, and B.

//Go back to the first child node of the collection and alert it
walker.currentNode=rootnode //reset TreeWalker pointer to point to root node
alert(walker.firstChild().tagName) //alerts P

</script>

As you use the traversal methods to step through the nodes, TreeWalker not only returns the node in question, but travels to it. This is why after stepping through the nodes using:

while (walker.nextNode())
//code here

I reset TreeWalker's position back to its root node before trying to retrieve the firstChild of the filtered collection:

walker.currentNode=rootnode //reset TreeWalker pointer to point to root node

This is necessary because the while loop leaves TreeWalker's pointer at the very last node of the collection (the B element), which has no element children and therefore no firstChild. And even if it did, that would be the B element's firstChild, not the firstChild of the entire filtered collection!
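
One way to avoid forgetting the reset is to wrap the loop in a small function that restores the pointer itself. This is just a sketch: collectNodes() is a hypothetical helper of my own, not a TreeWalker method, and the page portion assumes the "contentarea" markup from the example above.

```javascript
// Hypothetical helper: gather every remaining node of a TreeWalker into an
// array, then put the pointer back where it started.
function collectNodes(walker) {
  var start = walker.currentNode // remember the starting position
  var nodes = []
  while (walker.nextNode()) nodes.push(walker.currentNode)
  walker.currentNode = start // undo the travelling done by nextNode()
  return nodes
}

// Only runs in a page that contains the "contentarea" container
if (typeof document != "undefined") {
  var rootnode = document.getElementById("contentarea")
  if (rootnode) {
    var walker = document.createTreeWalker(rootnode, NodeFilter.SHOW_ELEMENT, null, false)
    var found = collectNodes(walker) // P, SPAN and B for the markup above
    alert(found.length + " elements, pointer back at " + walker.currentNode.tagName)
  }
}
```

Because the function saves and restores currentNode, a call to walker.firstChild() right afterwards starts from the root node again.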

Ok, another example of traversal in TreeWalker to solidify our understanding of it:

<p id="essay">George<span> loves </span> <b>JavaScript!</b></p>

<script type="text/javascript">

var rootnode=document.getElementById("essay")
var walker=document.createTreeWalker(rootnode, NodeFilter.SHOW_TEXT, null, false)

walker.firstChild() //Walk to first child node (the text "George")
var paratext=walker.currentNode.nodeValue
while (walker.nextSibling()){ //Step through each sibling of "George"
paratext+=walker.currentNode.nodeValue
}

alert(paratext) //alerts "George loves JavaScript!"

</script>

In this example, I traverse all text nodes of the root container to get its entire textual content.
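
The loop above relies on firstChild() and nextSibling(). As an alternative sketch, the same text could be gathered with nextNode(), which visits each filtered node in document order. collectText() is a hypothetical helper name, and the page portion assumes the "essay" paragraph from the example above.

```javascript
// Hypothetical helper: concatenate the nodeValue of every node the walker
// visits after its current position. Intended for a SHOW_TEXT walker.
function collectText(walker) {
  var text = ""
  while (walker.nextNode()) text += walker.currentNode.nodeValue
  return text
}

// Only runs in a page that contains the "essay" paragraph
if (typeof document != "undefined") {
  var essay = document.getElementById("essay")
  if (essay) {
    var textwalker = document.createTreeWalker(essay, NodeFilter.SHOW_TEXT, null, false)
    alert(collectText(textwalker)) // the paragraph's text, pieced together
  }
}
```

The walker starts on the root node (the P element), so the first call to nextNode() moves it to the first text node.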

You're free to use standard DOM element properties/methods on top of the TreeWalker traversal methods, though the returned information reflects that node's relationship to the entire document, not just the filtered results. An example should drive this point home:

<ul id="mylist">
<li>List 1</li>
<li>List 2</li>
<li>List 3</li>
</ul>

<script type="text/javascript">

var rootnode=document.getElementById("mylist")
var walker=document.createTreeWalker(rootnode, NodeFilter.SHOW_ELEMENT, null, false)

alert(walker.currentNode.childNodes.length) //alerts 7 (includes text nodes)
alert(walker.currentNode.getElementsByTagName("*").length) //alerts 3

</script>

In this example I'm using TreeWalker to select only the element nodes within the UL element. The line of interest here is:

alert(walker.currentNode.childNodes.length) //alerts 7 (includes text nodes)

You may have expected 3 to be alerted; after all, there are only 3 elements within the UL list. However, "childNodes" is a DOM property, not TreeWalker's, and returns information about a node oblivious to any filtering that TreeWalker may have done! That's why 7 is returned, the total number of nodes, including text nodes, that the UL contains. The same concept applies to DOM methods that you may invoke on top of a node returned by TreeWalker.

Having learned to navigate the nodes filtered by document.createTreeWalker(), it's time to see how to refine the filtering process itself. Recall that the 3rd parameter of document.createTreeWalker() accepts an optional reference to a filtering function. Let's look at that next.
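
To count children as TreeWalker sees them, the counting has to be done with TreeWalker's own traversal methods. Here is a minimal sketch of that; countFilteredChildren() is a hypothetical helper of my own, and the page portion assumes the "mylist" markup from the example above.

```javascript
// Hypothetical helper: count the children of the walker's current node as
// the walker sees them, i.e. after filtering.
function countFilteredChildren(walker) {
  var parent = walker.currentNode
  var count = 0
  if (walker.firstChild()) {
    count = 1
    while (walker.nextSibling()) count++
  }
  walker.currentNode = parent // restore the pointer
  return count
}

// Only runs in a page that contains the "mylist" list
if (typeof document != "undefined") {
  var mylist = document.getElementById("mylist")
  if (mylist) {
    var listwalker = document.createTreeWalker(mylist, NodeFilter.SHOW_ELEMENT, null, false)
    alert(countFilteredChildren(listwalker)) // counts only the LI elements
    alert(mylist.childNodes.length) // DOM property: text nodes are counted too
  }
}
```

The first alert reflects the filtered view, while the second goes through the plain DOM property and so ignores the filter.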
