Introduction to the TreeWalker object of DOM
The TreeWalker object is a powerful DOM2 object that lets you easily filter through the nodes in the document and create custom collections out of them. Ok, this is sounding geeky already, but for geeky jobs that require parsing the document tree, it doesn't hurt at all to get familiar with this object. While scripting you may have come across the need to retrieve all elements in a webpage with a specific CSS class name, or for an XML document, elements that carry a particular attribute value. The TreeWalker object makes light work of such tasks. In this tutorial, I'll provide an introductory look at the TreeWalker object of DOM2, which is supported in Firefox and Opera 8+, though not in IE6 or IE7 (as of beta 3).
Before I continue, note that there is a cousin to the TreeWalker object called NodeIterator, which I'll cover in a future tutorial.
The TreeWalker object can come off as mysterious and complicated to some, but it really is just realized through a single method: document.createTreeWalker(). This method and the 4 parameters it accepts simplify what might otherwise take many times the code to, say, filter all nodes in the document that are of a certain element type and carry a particular attribute. But before we get to all that, here's a basic description of document.createTreeWalker():
document.createTreeWalker(root, nodesToShow, filter, entityExpandBol)
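Time to break down the 4 parameters:
1) root: The root node at which TreeWalker begins traversing.
2) nodesToShow: A NodeFilter constant that specifies which types of nodes TreeWalker should consider.
3) filter: An optional reference to a filtering function that refines the results further. Set it to null for no additional filtering.
4) entityExpandBol: A Boolean that specifies whether entity references should be expanded.
For 2), the valid constant values are:
NodeFilter.SHOW_ALL
NodeFilter.SHOW_ELEMENT
NodeFilter.SHOW_ATTRIBUTE
NodeFilter.SHOW_TEXT
NodeFilter.SHOW_CDATA_SECTION
NodeFilter.SHOW_ENTITY_REFERENCE
NodeFilter.SHOW_ENTITY
NodeFilter.SHOW_PROCESSING_INSTRUCTION
NodeFilter.SHOW_COMMENT
NodeFilter.SHOW_DOCUMENT
NodeFilter.SHOW_DOCUMENT_TYPE
NodeFilter.SHOW_DOCUMENT_FRAGMENT
NodeFilter.SHOW_NOTATION
While there are 13 different NodeFilter constants to let you limit the type of nodes returned by TreeWalker, you probably will just be working with a few of them most of the time. NodeFilter.SHOW_ELEMENT, for example, limits the results to element nodes.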
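Since each NodeFilter.SHOW_* constant is a single bit flag, the constants can also be combined with a bitwise OR to request several node types at once. Here's a minimal sketch of that idea. The fallback table of numeric values and the includesType() helper are my own additions for illustration, so the code also runs where NodeFilter isn't defined:

```javascript
// Numeric values of a few NodeFilter.SHOW_* constants as defined by DOM2 Traversal.
// The literal table is only a fallback for environments without a NodeFilter object.
const SHOW = typeof NodeFilter !== "undefined" ? NodeFilter : {
  SHOW_ALL: 0xFFFFFFFF,
  SHOW_ELEMENT: 0x1,
  SHOW_TEXT: 0x4,
  SHOW_COMMENT: 0x80
};

// Each constant is a single bit, so several node types can be requested at once
const elementsAndText = SHOW.SHOW_ELEMENT | SHOW.SHOW_TEXT;

// includesType is a hypothetical helper (not part of the DOM): it reports
// whether a nodesToShow value covers the given node type flag
function includesType(nodesToShow, flag) {
  return (nodesToShow & flag) !== 0;
}
```

The combined value can then be passed as the second parameter of document.createTreeWalker() in place of a single constant.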
Ok, so you're dying to see a demonstration of document.createTreeWalker(), a very rudimentary one to start:
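What follows is a minimal sketch of such a demonstration rather than a definitive listing. The markup in the comment, the createElementWalker() helper and the numeric fallback for NodeFilter.SHOW_ELEMENT are assumptions of mine; only document.createTreeWalker() and its 4 parameters come from the description above:

```javascript
// Hypothetical markup assumed by this sketch:
// <div id="contentarea"><p>Some <span>welcome</span> text</p><b>Bold text</b></div>

// Fall back to the numeric value of SHOW_ELEMENT where NodeFilter is not defined
const SHOW_ELEMENT = typeof NodeFilter !== "undefined" ? NodeFilter.SHOW_ELEMENT : 1;

// createElementWalker is a hypothetical helper name, not part of the DOM
function createElementWalker(doc, rootnode) {
  // arguments: root, nodesToShow, filter, entityExpandBol
  return doc.createTreeWalker(rootnode, SHOW_ELEMENT, null, false);
}

// Only runs in a browser page that actually contains the container
if (typeof document !== "undefined" && document.getElementById("contentarea")) {
  const rootnode = document.getElementById("contentarea");
  const walker = createElementWalker(document, rootnode);
  alert(walker.currentNode.tagName); // the walker starts out positioned on its root node
}
```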
In this example, I set the root node at which TreeWalker begins traversing to the container with ID "contentarea". The second parameter specifies that TreeWalker should only crawl element nodes (versus text nodes, comment nodes etc) within the container. The third parameter, set to null, means no additional filtering should be done (not yet!). The 4th parameter concerns whether entity references should be expanded, and is set to false. With all the parameters in place, "walker" now references all elements (P, SPAN, and B) within the DIV, along with the DIV itself.
TreeWalker traversal methods
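Having created a filtered list of nodes using document.createTreeWalker(), you can then process these filtered nodes using TreeWalker's traversal methods:
firstChild(): Travels to and returns the first child of the current node.
lastChild(): Travels to and returns the last child of the current node.
nextNode(): Travels to and returns the next node within the filtered collection of nodes.
nextSibling(): Travels to and returns the next sibling of the current node.
parentNode(): Travels to and returns the current node's parent node.
previousNode(): Travels to and returns the previous node within the filtered collection of nodes.
previousSibling(): Travels to and returns the previous sibling of the current node.
Each method returns null, and leaves TreeWalker where it is, when there is no such node within the filtered results. The currentNode property references the node TreeWalker is currently at, and can also be assigned to.
Using the same example as above, let's see how to use the traversal methods to walk through the returned nodes: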
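Here's a rough sketch of such a walk-through, not a definitive listing. The markup in the comment, the collectTagNames() helper and the numeric fallback for NodeFilter.SHOW_ELEMENT are assumptions made for illustration:

```javascript
// Hypothetical markup assumed by this sketch:
// <div id="contentarea"><p>Some <span>welcome</span> text</p><b>Bold text</b></div>

// Fall back to the numeric value of SHOW_ELEMENT where NodeFilter is not defined
const SHOW_ELEMENT = typeof NodeFilter !== "undefined" ? NodeFilter.SHOW_ELEMENT : 1;

// collectTagNames is a hypothetical helper: it steps through every node after
// the walker's current position and returns the tag names in document order
function collectTagNames(walker) {
  const names = [];
  while (walker.nextNode()) {
    // nextNode() has already moved the walker, so currentNode is the new node
    names.push(walker.currentNode.tagName);
  }
  return names;
}

// Only runs in a browser page that actually contains the container
if (typeof document !== "undefined" && document.getElementById("contentarea")) {
  const rootnode = document.getElementById("contentarea");
  const walker = document.createTreeWalker(rootnode, SHOW_ELEMENT, null, false);
  alert(collectTagNames(walker).join(", ")); // tag names of the elements inside the DIV
  walker.currentNode = rootnode; // reset TreeWalker pointer to point to root node
  alert(walker.firstChild().tagName); // first element child of the root node
}
```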
As you use the traversal methods to step through the nodes, TreeWalker not only returns the node in question, but travels to it. This is why after stepping through the nodes using:
while (walker.nextNode()) //code here
I reset TreeWalker's position back to its root node before trying to retrieve the firstChild of the filtered collection:
walker.currentNode=rootnode //reset TreeWalker pointer to point to root node
This is necessary since, prior to that point, the while loop has left TreeWalker's pointer on the very last node of the collection (the B element), which has no firstChild. And even if it did, that would be the B element's firstChild, not the firstChild of the entire filtered collection!
Ok, another example of traversal in TreeWalker to solidify our understanding of it:
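The sketch below shows one way such a traversal might look. The getTextContent() helper, the "contentarea" container it is called on and the numeric fallback for NodeFilter.SHOW_TEXT are assumptions of mine, not a definitive implementation:

```javascript
// Fall back to the numeric value of SHOW_TEXT where NodeFilter is not defined
const SHOW_TEXT = typeof NodeFilter !== "undefined" ? NodeFilter.SHOW_TEXT : 4;

// getTextContent is a hypothetical helper: it walks only the text nodes
// beneath rootnode and joins their values into one string
function getTextContent(doc, rootnode) {
  const walker = doc.createTreeWalker(rootnode, SHOW_TEXT, null, false);
  let text = "";
  while (walker.nextNode()) {
    text += walker.currentNode.nodeValue; // nodeValue holds a text node's string
  }
  return text;
}

// Only runs in a browser page that actually contains the container
if (typeof document !== "undefined" && document.getElementById("contentarea")) {
  alert(getTextContent(document, document.getElementById("contentarea")));
}
```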
In this example, I traverse all text nodes of the root container to get its entire textual content.
You're free to use standard DOM element properties and methods on top of the TreeWalker traversal methods, though the information they return reflects that node's relationship to the entire document, not just the filtered results. An example should drive this point home:
In this example I'm using TreeWalker to collect all elements within a UL element. The line of interest here is:
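As a rough sketch of the difference, the code below compares the DOM's own child count with the count TreeWalker sees. The markup in the comment, the countFilteredChildren() helper and the numeric fallback for NodeFilter.SHOW_ELEMENT are assumptions made for illustration:

```javascript
// Hypothetical markup assumed by this sketch (note the line breaks between tags):
// <ul id="mylist">
// <li>List 1</li>
// <li>List 2</li>
// <li>List 3</li>
// </ul>

// Fall back to the numeric value of SHOW_ELEMENT where NodeFilter is not defined
const SHOW_ELEMENT = typeof NodeFilter !== "undefined" ? NodeFilter.SHOW_ELEMENT : 1;

// countFilteredChildren is a hypothetical helper: it counts the children of the
// walker's current node as the walker sees them, then restores the position
function countFilteredChildren(walker) {
  const start = walker.currentNode;
  let count = 0;
  for (let node = walker.firstChild(); node; node = walker.nextSibling()) {
    count++;
  }
  walker.currentNode = start; // the loop moved the walker, so put it back
  return count;
}

// Only runs in a browser page that actually contains the list
if (typeof document !== "undefined" && document.getElementById("mylist")) {
  const rootnode = document.getElementById("mylist");
  const walker = document.createTreeWalker(rootnode, SHOW_ELEMENT, null, false);
  alert(walker.currentNode.childNodes.length); // alerts 7 (includes text nodes)
  alert(countFilteredChildren(walker)); // alerts 3 (element children only)
}
```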
alert(walker.currentNode.childNodes.length) //alerts 7 (includes text nodes)
You may have expected 3 to be alerted; after all, there are only 3 elements within the UL list. However, "childNodes" is a DOM property, not TreeWalker's, and returns information about a node oblivious to any filtering that TreeWalker may have done! That's why 7 is returned: the total number of nodes, including text nodes, that the UL contains. The same concept applies to DOM methods that you may invoke on a node returned by TreeWalker.
Having learned to navigate the nodes filtered by document.createTreeWalker(), it's time to see how to refine the filtering process itself. Recall that the 3rd parameter of document.createTreeWalker() accepts an optional reference to a filtering function. Let's look at that next.