5.4 Enabling Namespace Processing

SAX2 supports XML namespaces. When namespace processing is active, the parser calls startElementNS() and endElementNS() instead of startElement() and endElement(). The default for this setting varies from parser to parser, so you should always set it explicitly (unless your handler supports both namespace-aware and namespace-unaware processing).

For example, the FindIssue content handler described in the previous section doesn't implement the namespace-aware methods, so we should request that namespace processing be deactivated before beginning to parse XML:

from xml.sax import make_parser
from xml.sax.handler import feature_namespaces

# Create a parser
parser = make_parser()

# Disable namespace processing
parser.setFeature(feature_namespaces, 0)

The second argument to setFeature() is the desired state of the feature, most commonly a Boolean. You would call parser.setFeature(feature_namespaces, 1) to enable namespace processing.
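
As a quick check, the setting can be read back with getFeature(), which the SAX2 reader interface provides alongside setFeature(). The following is a minimal sketch, assuming the default parser returned by make_parser() allows this feature to be changed; the variable names are illustrative only.

```python
from xml.sax import make_parser
from xml.sax.handler import feature_namespaces

parser = make_parser()

# Enable namespace processing, then read the setting back.
parser.setFeature(feature_namespaces, True)
namespaces_enabled = bool(parser.getFeature(feature_namespaces))

# Switch it off again, as a namespace-unaware handler requires.
parser.setFeature(feature_namespaces, False)
namespaces_disabled = not parser.getFeature(feature_namespaces)
```

Features have to be set before parsing starts; a parser may refuse to change them while a parse is in progress.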

Namespaces in XML work by first declaring a namespace prefix that maps to a given URI, which is fixed by the specification that defines the vocabulary, and then using that prefix to mark elements and attributes that come from that vocabulary. For example, the XLink specification says that its namespace URI is "http://www.w3.org/1999/xlink". The following XML snippet includes some XLink attributes:

<root xmlns:xlink="http://www.w3.org/1999/xlink">
  <elem xlink:href="http://www.python.org" />
</root>

The xmlns:xlink attribute on the root element declares that the prefix "xlink" maps to the given URI. The elem element therefore has one attribute named href that comes from the XLink namespace. Namespace-aware methods work with (URI, name) tuples instead of plain element and attribute names; instead of "xlink:href", they would receive ('http://www.w3.org/1999/xlink', 'href').

Note that the actual value of the prefix is immaterial, and software shouldn't make assumptions about it. The XML document would have exactly the same meaning if the root element said xmlns:pref1="http://..." and the attribute name was given as pref1:href.
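
The point about prefixes can be checked by parsing the same document twice with different prefixes and comparing the attribute names the handler receives. This is only a sketch: AttrNameCollector and collect_attr_names() are hypothetical helpers written for this illustration, and it assumes the default parser from make_parser() supports namespace processing.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, feature_namespaces

XLINK_URI = "http://www.w3.org/1999/xlink"


class AttrNameCollector(ContentHandler):
    """Hypothetical handler: records the (URI, name) key of every attribute."""

    def __init__(self):
        super().__init__()
        self.attr_names = []

    def startElementNS(self, name, qname, attrs):
        # With namespace processing on, attribute keys are (URI, name) tuples.
        self.attr_names.extend(attrs.keys())


def collect_attr_names(xml_bytes):
    """Parse xml_bytes with namespaces enabled and return the attribute keys."""
    parser = make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = AttrNameCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.attr_names


# The same document, once with the prefix "xlink" and once with "pref1".
doc_xlink = (b'<root xmlns:xlink="http://www.w3.org/1999/xlink">'
             b'<elem xlink:href="http://www.python.org" /></root>')
doc_pref1 = (b'<root xmlns:pref1="http://www.w3.org/1999/xlink">'
             b'<elem pref1:href="http://www.python.org" /></root>')

names_xlink = collect_attr_names(doc_xlink)
names_pref1 = collect_attr_names(doc_pref1)
```

Both lists should contain the same single (URI, 'href') tuple, because the handler sees the namespace URI rather than the prefix.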

If namespace processing is turned on, you would have to write startElementNS() and endElementNS() methods that looked like this:

    def startElementNS(self, name, qname, attrs):
        # name is a (uri, localname) tuple
        uri, localname = name
        ...

    def endElementNS(self, name, qname):
        uri, localname = name
        ...

The first argument is a 2-tuple containing the URI and the name of the element within that namespace. qname is a string containing the original qualified name of the element, such as "xlink:a", and attrs is a dictionary-like object holding the attributes. Its keys will be (URI, attribute_name) pairs. If no namespace is specified for an element or attribute, the URI will be given as None.
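
To make the shape of these arguments concrete, here is a small sketch of a handler that simply records what it is given. NameRecorder is a hypothetical class, and the un-namespaced plain attribute is an addition to the earlier snippet, included only to show the None case. Some parsers may pass None for qname, so the sketch does not rely on it.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, feature_namespaces


class NameRecorder(ContentHandler):
    """Hypothetical handler: records element and attribute names as tuples."""

    def __init__(self):
        super().__init__()
        self.started = []
        self.ended = []
        self.attributes = {}

    def startElementNS(self, name, qname, attrs):
        uri, localname = name          # name is a (URI, localname) 2-tuple
        self.started.append((uri, localname))
        for attr_name, value in attrs.items():
            # attr_name is itself a (URI, attribute_name) pair
            self.attributes[attr_name] = value

    def endElementNS(self, name, qname):
        self.ended.append(name)


# "plain" is an illustrative attribute that belongs to no namespace.
document = b"""<root xmlns:xlink="http://www.w3.org/1999/xlink">
  <elem xlink:href="http://www.python.org" plain="yes" />
</root>"""

parser = make_parser()
parser.setFeature(feature_namespaces, True)
recorder = NameRecorder()
parser.setContentHandler(recorder)
parser.parse(io.BytesIO(document))
```

Neither root nor elem is in a namespace here, so both are reported with a URI of None; only the href attribute carries the XLink URI.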