6.1 Getting A DOM Tree

The easiest way to get a DOM tree is to have it built for you. PyXML offers two alternative implementations of the DOM: xml.dom.minidom and 4DOM. The xml.dom.minidom module is included in Python 2. It is a minimal implementation, which means it does not provide all of the interfaces and operations required by the DOM standard. 4DOM, part of the 4Suite set of XML tools (http://www.4suite.org), is a complete implementation of DOM Level 2 Core, so the examples in this section use 4DOM.
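For comparison, the minidom module mentioned above can build a tree from a string in a single call. The sketch below assumes Python 3's standard library (not PyXML or 4DOM) and uses xml.dom.minidom.parseString(). The document text is an invented example.

```python
from xml.dom import minidom

# parseString() parses XML held in a string and returns a Document node.
doc = minidom.parseString("<greeting lang='en'>Hello</greeting>")

root = doc.documentElement          # the top-level <greeting> element
print(root.tagName)                 # → greeting
print(root.getAttribute("lang"))    # → en
print(root.firstChild.data)         # → Hello (the element's text node)
```

Because minidom only implements part of the DOM, code written against it may need changes before it runs on a full implementation such as 4DOM, and vice versa.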

The xml.dom.ext.reader package contains a number of classes that build a DOM tree from various input sources. One of its modules, Sax2, contains a Reader class that builds a DOM tree from a series of SAX2 events. Reader instances provide a fromStream() method that constructs a DOM tree from an input stream. The input can be either a file-like object or a string; a string is assumed to be a URL and is opened with the urllib module.

import sys
from xml.dom.ext.reader import Sax2

# create Reader object
reader = Sax2.Reader()

# parse the document
doc = reader.fromStream(sys.stdin)

fromStream() returns the Document node at the root of the DOM tree constructed from the input XML document.
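The input handling described for fromStream() can be imitated with the standard library. The sketch below is not 4DOM's code: from_stream() is a hypothetical helper, and it assumes Python 3, using urllib.request.urlopen() for string URLs and xml.dom.minidom.parse() to build the tree.

```python
import io
import urllib.request
from xml.dom import minidom


def from_stream(source):
    """Hypothetical helper mimicking the described Reader.fromStream() behaviour.

    A string is treated as a URL and opened with urllib; anything else is
    assumed to be a file-like object that is already open for reading.
    """
    if isinstance(source, str):
        with urllib.request.urlopen(source) as stream:
            return minidom.parse(stream)
    return minidom.parse(source)


# Usage: parse from an in-memory file-like object.
doc = from_stream(io.BytesIO(b"<catalog><item/></catalog>"))
print(doc.nodeType == doc.DOCUMENT_NODE)   # → True
print(doc.documentElement.tagName)         # → catalog
```

As with fromStream(), the returned object is the Document node, so the root element is reached through its documentElement attribute. Passing a string such as a file: or http: URL takes the urllib branch instead.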