A legal XML document must, as a minimum, be well-formed: each opening tag must have a corresponding closing tag, and tags must nest properly. For example, <b><i>text</b></i> is not well-formed because the i element should be enclosed inside the b element, but instead the closing </b> tag is encountered first. This example can be made well-formed by swapping the order of the closing tags, resulting in <b><i>text</i></b>.
If you've ever written HTML by hand, you may have acquired the habit of being a bit sloppy about nesting. Strictly speaking, HTML has the same rules about nesting tags as XML, but most Web browsers are very forgiving of errors in HTML. This is convenient for HTML authors, but it makes it difficult to write programs that parse HTML, because those programs have to cope with all sorts of malformed input.
The authors of the XML specification didn't want XML to fall into the same trap, because it would make XML processing software much harder to write. Therefore, all XML parsers have to be strict and must report an error if their input isn't well-formed. The Expat parser includes an executable program named xmlwf that parses the contents of files and reports any well-formedness violations; it's very handy for checking XML data that's been output from a program or written by hand.
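The same kind of check can be sketched from Python using the standard library's xml.parsers.expat module, which wraps Expat. This is only a minimal illustration, not a replacement for xmlwf: the helper name check_well_formed is made up for this example, and the exact wording of the error message depends on the Expat version in use.

```python
import xml.parsers.expat


def check_well_formed(text):
    """Return None if text is well-formed XML, otherwise an error message.

    check_well_formed is a hypothetical helper written for this example.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument tells Expat this is the final chunk of input.
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError as err:
        # The message includes the error type and its line and column.
        return str(err)
    return None


if __name__ == "__main__":
    for sample in ("<b><i>text</b></i>", "<b><i>text</i></b>"):
        problem = check_well_formed(sample)
        if problem is None:
            print(sample, "is well-formed")
        else:
            print(sample, "is not well-formed:", problem)
```

The improperly nested sample is rejected with an ExpatError, while the corrected version parses without complaint. Note that the parser stops at the first violation it finds, so a document with several errors has to be fixed and rechecked one error at a time.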