Well-formedness just says that all tags nest properly and that every opening tag is matched by a closing tag. It says nothing about the order of elements or about which elements can be contained inside other elements.
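As a rough illustration, a well-formedness check can be sketched with the standard library's xml.etree.ElementTree, which raises ParseError when tags do not nest or match. The helper name is_well_formed and the two sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical samples: the first nests properly, the second closes
# <book> before the <chapter> inside it has been closed.
good = "<book><index/><chapter/><preface/></book>"
bad = "<book><chapter></book></chapter>"

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

Note that the first sample passes even though its parts are in an odd order; the parser only checks nesting and matching.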
The following XML, apparently representing a book, is well-formed but it doesn't match the structure expected for a book:
    <book>
      <index> ... </index>
      <chapter> ... </chapter>
      <chapter> ... </chapter>
      <abstract> ... </abstract>
      <chapter> ... </chapter>
      <preface> ... </preface>
    </book>
Prefaces don't come at the end of books, the index doesn't belong at the front, and the abstract doesn't belong in the middle. Well-formedness alone doesn't provide any way of enforcing that order. You could write a Python program that took an XML file like this and checked whether all the parts are in order, but then someone wanting to understand what documents are legal would have to read your program.
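Such a checking program might look like the minimal sketch below. The ordering in EXPECTED_ORDER is an assumption chosen for illustration, not a rule taken from any real book format, and the function name parts_in_order is likewise invented.

```python
import xml.etree.ElementTree as ET

# Assumed ordering of the parts of a book (hypothetical, for illustration).
EXPECTED_ORDER = ["abstract", "preface", "chapter", "appendix", "index"]


def parts_in_order(xml_text):
    """Return True if the children of the root never step backwards
    in EXPECTED_ORDER and are all known part names."""
    root = ET.fromstring(xml_text)
    rank = {name: i for i, name in enumerate(EXPECTED_ORDER)}
    last = -1
    for child in root:
        position = rank.get(child.tag)
        if position is None or position < last:
            return False
        last = position
    return True


# Same shape as the jumbled book above, with empty elements.
jumbled = ("<book><index/><chapter/><chapter/><abstract/>"
           "<chapter/><preface/></book>")
ordered = ("<book><abstract/><preface/><chapter/><chapter/>"
           "<index/></book>")

print(parts_in_order(jumbled))
print(parts_in_order(ordered))
```

The drawback described above is visible here: the rules live in a list and a loop, so a reader has to work through the code to learn what a legal document looks like.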
Document Type Definitions, or DTDs for short, are a more concise
way of enforcing ordering and nesting rules. A DTD declares the
element names that are allowed, and how elements can be nested inside
each other. To take an example from HTML, the LI element,
representing an entry in a list, can only occur inside certain
elements which represent lists, such as OL or UL.
The DTD also specifies the attributes that can be provided for each
element, the default value for each attribute, and whether the
attribute can be omitted. A validating parser can take a
document and a DTD, and check whether the document is legal according
to the DTD's rules. (The PyXML package includes a validating parser
called xmlproc.)
DTDs are therefore an example of a schema language, a language for specifying a set of legal XML documents. Other applications want even stricter control over which documents are legal, and there are therefore stricter schema languages. XML Schema provides a type system and a number of basic types, so you can say that the value of an attribute must be a number or a date. RELAX NG is another schema language that provides more power and flexibility than XML Schema, but is simpler to read and implement.
Note that it's quite possible to get useful work done without using any schema language at all. You might decide that just writing well-formed XML and checking it with a Python program is all you need. There's no reason to drag in a schema language if it won't be useful.
Let's return to DTDs. A DTD lists the supported elements, the order in which elements must occur, and the possible attributes for each element. Here's a fragment from an imaginary DTD for writing books:
    <!ELEMENT book (abstract?, preface, chapter*, appendix?)>
    <!ELEMENT abstract ...>
    <!ELEMENT chapter ...>
    <!ATTLIST chapter
        id    ID    #REQUIRED
        title CDATA #IMPLIED>
The first declaration defines the book element, and specifies the
elements that can occur inside it and the order in which the
subelements must be provided. DTDs borrow from regular expression
notation in order to express how elements can be repeated; "?" means
an element must occur 0 or 1 times, "*" is 0 or more times, and "+"
means the element must occur 1 or more times. For example, the above
declarations imply that the abstract and appendix elements are
optional inside a book element. Exactly one preface element has to be
present, and it can be followed by any number of chapter elements;
having no chapters at all would be legal.
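The kinship with regular expressions can be made concrete. The sketch below hand-translates the content model (abstract?, preface, chapter*, appendix?) into a Python regular expression over the sequence of child element names. This is only an illustration of the notation, not how a validating parser works internally; BOOK_MODEL and book_children_ok are names invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of (abstract?, preface, chapter*, appendix?).
# Each child tag name is followed by one space so names cannot run together.
BOOK_MODEL = re.compile(r"(abstract )?(preface )(chapter )*(appendix )?")


def book_children_ok(xml_text):
    """Return True if the root is <book> and its direct children
    match the content model above."""
    root = ET.fromstring(xml_text)
    sequence = "".join(child.tag + " " for child in root)
    return root.tag == "book" and BOOK_MODEL.fullmatch(sequence) is not None


print(book_children_ok("<book><preface/></book>"))             # True
print(book_children_ok("<book><chapter/><preface/></book>"))   # False
print(book_children_ok("<book><abstract/></book>"))            # False
```

The first call succeeds because only the preface is mandatory; the other two fail because a chapter precedes the preface and because the preface is missing.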
The ATTLIST declaration specifies attributes for the chapter
element. Chapters can have two attributes, id and title. title
contains character data (CDATA) and is optional (that's what
"#IMPLIED" means, for obscure historical reasons). id must contain
an ID value, a name that is unique within the document, and it is
required.
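To show what these attribute rules amount to, here is a hand-written sketch of the checks for the chapter element. It is an approximation under stated assumptions: it verifies that id is present and not repeated among chapters and that no undeclared attribute appears, but it does not check that the id value is a syntactically valid XML name. The function name chapter_attribute_errors and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes declared for <chapter> in the ATTLIST above.
DECLARED = {"id", "title"}


def chapter_attribute_errors(xml_text):
    """Return a list of messages describing attribute problems
    found on <chapter> elements."""
    root = ET.fromstring(xml_text)
    errors = []
    seen_ids = set()
    for chapter in root.iter("chapter"):
        # Any attribute not declared in the ATTLIST is an error.
        for name in sorted(set(chapter.attrib) - DECLARED):
            errors.append(f"undeclared attribute {name!r} on chapter")
        chapter_id = chapter.get("id")
        if chapter_id is None:
            # id is #REQUIRED, so leaving it out is an error.
            errors.append("chapter is missing required attribute 'id'")
        elif chapter_id in seen_ids:
            # ID values must not repeat.
            errors.append(f"duplicate id {chapter_id!r}")
        else:
            seen_ids.add(chapter_id)
        # title is #IMPLIED, so its absence is not an error.
    return errors


sample = ('<book><chapter id="c1" title="Intro"/>'
          '<chapter title="No id"/><chapter id="c1"/></book>')
for message in chapter_attribute_errors(sample):
    print(message)
```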
A validating parser could take this DTD and a sample document, and report whether the document is valid according to the rules of the DTD. A document is valid if all the elements occur in the right order and in the right number of repetitions, and if each element carries only the attributes declared for it, including every required one.