6.5 Walking Over The Entire Tree

Once you have a tree, another common task is to traverse it. Document instances have a method called createTreeWalker(root, whatToShow, filter, entityRefExpansion) that returns an instance of the TreeWalker class.

A TreeWalker instance traverses the subtree rooted at the root node. The currentNode attribute holds the node that the traversal has reached, and calling the nextNode() or previousNode() method moves it forward or backward. The methods parentNode(), firstChild(), lastChild(), nextSibling(), and previousSibling() move the walker to the corresponding node relative to the current node and return it.

whatToShow is a bitmask with a bit set for each type of node that you want to see in the traversal. The constants are available as attributes of the NodeFilter class and can be combined with the bitwise OR operator. A value of 0 filters out all nodes, NodeFilter.SHOW_ALL traverses every node, and constants such as NodeFilter.SHOW_ELEMENT and NodeFilter.SHOW_TEXT select individual types of node.
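As a rough illustration of how such a bitmask behaves, the sketch below combines two constants and tests individual node types against the result. It uses the NodeFilter class shipped in the standard library's xml.dom.NodeFilter module; the is_shown() helper is hypothetical and only mirrors the DOM Level 2 convention that the bit for a node type n is 1 << (n - 1). A real TreeWalker performs this check internally.

```python
from xml.dom import Node
from xml.dom.NodeFilter import NodeFilter

# Show elements and text nodes, nothing else.
what_to_show = NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT


def is_shown(node_type, mask):
    """Hypothetical helper: true if the mask has the bit for node_type set.

    Assumes the DOM Level 2 layout, where the bit for node type n
    is 1 << (n - 1).
    """
    return bool(mask & (1 << (node_type - 1)))


print(is_shown(Node.TEXT_NODE, what_to_show))     # True
print(is_shown(Node.COMMENT_NODE, what_to_show))  # False
```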

filter is a function that is called with every traversed node and returns NodeFilter.FILTER_ACCEPT or NodeFilter.FILTER_REJECT to accept or reject that node. Pass None as the filter to accept all nodes.
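A filter might look something like the sketch below. The function name accept_para_only and the sample document are made up for illustration. Following the description above, the filter is written as a plain function; the DOM Level 2 specification defines a filter as an object with an acceptNode(node) method, so check which form your implementation expects. The function is called directly here, on nodes parsed with xml.dom.minidom, just to show what it returns.

```python
from xml.dom import minidom, Node
from xml.dom.NodeFilter import NodeFilter


def accept_para_only(node):
    """Hypothetical filter: accept <para> elements, reject everything else."""
    if node.nodeType == Node.ELEMENT_NODE and node.tagName == "para":
        return NodeFilter.FILTER_ACCEPT
    return NodeFilter.FILTER_REJECT


# A small made-up document to exercise the filter.
doc = minidom.parseString("<doc><title>T</title><para>P</para></doc>")
title, para = doc.documentElement.childNodes

print(accept_para_only(title) == NodeFilter.FILTER_REJECT)  # True
print(accept_para_only(para) == NodeFilter.FILTER_ACCEPT)   # True
```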

Here's an example that traverses the entire tree and prints the tag name of every element.

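from xml.dom.NodeFilter import NodeFilter

# doc is assumed to be an already-parsed Document instance.
walker = doc.createTreeWalker(doc.documentElement,
                              NodeFilter.SHOW_ELEMENT, None, 0)

# currentNode starts at the root; nextNode() returns None
# once the traversal has run out of nodes.
while True:
    print(walker.currentNode.tagName)
    if walker.nextNode() is None:
        break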
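Not every DOM implementation provides createTreeWalker(); the standard library's xml.dom.minidom, for one, does not. As a rough substitute under that assumption, the sketch below visits the same elements in document order with a recursive generator. This is a different technique from a TreeWalker, not an implementation of one: it only moves forward and supports no filter. The name iter_elements and the sample document are made up for illustration.

```python
from xml.dom import minidom, Node


def iter_elements(node):
    """Yield node (if it is an element) and all descendant elements,
    in document order."""
    if node.nodeType == Node.ELEMENT_NODE:
        yield node
    for child in node.childNodes:
        for elem in iter_elements(child):
            yield elem


# A small made-up document to walk over.
doc = minidom.parseString(
    "<book><title>T</title><chapter><para>x</para></chapter></book>")

tags = [elem.tagName for elem in iter_elements(doc.documentElement)]
for tag in tags:
    print(tag)
```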