The easiest way to get a DOM tree is to have it built for you. PyXML offers two alternative implementations of the DOM, xml.dom.minidom and 4DOM. xml.dom.minidom is included in Python 2. It is a minimal implementation, which means it does not provide all interfaces and operations required by the DOM standard. 4DOM, part of the 4Suite set of XML tools (http://www.4suite.org), is a complete implementation of DOM Level 2 Core, so we will use that in the examples.
The xml.dom.ext.reader package contains a number of classes that build a DOM tree from various input sources. One of the modules in this package is named Sax2; it contains a Reader class that builds a DOM tree from a series of SAX2 events. Reader instances provide a fromStream() method that constructs a DOM tree from an input stream. The input can be either a file-like object or a string. A string is assumed to be a URL and is opened with the urllib module.
    import sys
    from xml.dom.ext.reader import Sax2

    # create Reader object
    reader = Sax2.Reader()

    # parse the document
    doc = reader.fromStream(sys.stdin)
fromStream() returns the root of a DOM tree constructed from the input XML document.
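For readers who do not have PyXML and 4DOM installed, the same idea can be sketched with xml.dom.minidom from the standard library. This is a stand-in for the Sax2 Reader rather than the same API: minidom.parse() is used in place of fromStream(), and the sample document and the name xml_bytes are assumptions made up for illustration.

```python
import io
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
xml_bytes = b"<catalog><book id='1'>Python</book><book id='2'>XML</book></catalog>"

# minidom.parse() accepts a filename or a file-like object;
# here an in-memory byte stream stands in for sys.stdin.
doc = minidom.parse(io.BytesIO(xml_bytes))

# The object returned is the Document node at the top of the tree.
root = doc.documentElement
print(root.tagName)

# Walk the child elements and print each id attribute and text content.
for book in root.getElementsByTagName("book"):
    print(book.getAttribute("id"), book.firstChild.data)
```

The returned object is the Document node; its documentElement attribute gives the outermost element of the parsed document.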