1 Introduction to XML

XML, the eXtensible Markup Language, is a simplified dialect of SGML, the Standard Generalized Markup Language. XML is intended to be reasonably simple to implement and use, and is already being used to specify markup languages for various new standards: MathML for expressing mathematical equations, the Synchronized Multimedia Integration Language (SMIL) for multimedia presentations, and so forth.

SGML and XML represent a document by tagging the document's various components with their function or meaning. For example, a book contains several parts: it has a title, one or more authors, the text of the book, perhaps a preface or an index, and so forth. A markup language for writing books would therefore have elements indicating what the contents of the preface are, what the title is, and so forth. This logical structure should not be confused with the physical details of how the document is actually printed on paper. The index might be printed with narrow margins and in a smaller font than the rest of the book, but markup usually isn't (or shouldn't be, anyway) concerned with such details. Instead, other software translates the markup into a typeset format and handles the presentation details.
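To make the book example concrete, here is a minimal sketch of what such markup might look like and how a program could read it. The element names (book, title, author, preface, body, index) are invented for illustration and do not come from any real standard, and the use of Python's standard-library xml.etree.ElementTree module is an assumption made only for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical book markup: these element names are made up for this
# example. Note that nothing here says how the book should be printed.
BOOK_XML = """\
<book>
  <title>An Example Book</title>
  <author>First Author</author>
  <author>Second Author</author>
  <preface>Why this book exists.</preface>
  <body>The text of the book.</body>
  <index>Entries would go here.</index>
</book>
"""


def describe_book(xml_text):
    """Return the title and the list of authors found in the markup."""
    root = ET.fromstring(xml_text)          # root is the <book> element
    title = root.findtext("title")          # text of the first <title> child
    authors = [a.text for a in root.findall("author")]
    return title, authors


title, authors = describe_book(BOOK_XML)
print(title)
print(authors)
```

Because the markup records only what each part is, the same document could be handed to a typesetting program, an indexer, or a script like this one without any changes.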

This section provides a brief overview of XML and a few related standards. It is far from complete, because a complete treatment would require a full-length book rather than a short HOWTO. For a completely accurate (if rather dry) description, there is no better source than the original W3C Recommendations; you can find links to them in section 1.4, "Related Links". If you already know what XML is, you can skip the rest of this section.

Later sections of this HOWTO assume that you're familiar with XML terminology. Most sections will use XML terms such as element and attribute. Section  does not require that you have experience with any of the various Java SAX implementations.


Subsections