1.1 Elements, Attributes and Entities

A markup language specified using XML looks a lot like HTML: a document consists of a single root element, which contains sub-elements, which can in turn have further sub-elements inside them. Elements are indicated by tags in the text, and tags are always enclosed in angle brackets < >. An element can either contain content or be empty.

An element can contain content between opening and closing tags, as in <name>Euryale</name>, which is a name element containing the data "Euryale". This content may be text data, other XML elements, or a mixture of both.
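
As a rough illustration of the three kinds of content, the following sketch parses the name element above with Python's standard xml.etree.ElementTree module. The line element and its text are invented for this example and are not part of any particular markup language.

```python
import xml.etree.ElementTree as ET

# An element whose content is plain text data.
name = ET.fromstring("<name>Euryale</name>")
print(name.tag, name.text)  # name Euryale

# Mixed content: text interleaved with a sub-element.
# The <line> element is a hypothetical wrapper used only for illustration.
line = ET.fromstring("<line>Sister of <name>Medusa</name> and Stheno</line>")
child = line[0]

print(repr(line.text))   # text before the first sub-element
print(repr(child.text))  # text inside the sub-element
print(repr(child.tail))  # text after the sub-element's closing tag
```

ElementTree stores the text that follows a sub-element in that sub-element's tail attribute, which is one way (among several) that XML libraries model mixed content.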

Elements can also be empty, containing nothing, in which case they are written as a single tag that ends with a slash. For example, <stop/> is an empty stop element. Unlike HTML, XML element names are case-sensitive; stop and Stop are two different element names.
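
A minimal sketch of both points, again assuming Python's standard xml.etree.ElementTree parser: an empty element has no text and no children, and an opening tag closed by a differently cased tag is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

# An empty element: no text content and no sub-elements.
stop = ET.fromstring("<stop/>")
print(stop.tag, stop.text, len(stop))  # stop None 0

# Names are case-sensitive, so </Stop> does not close <stop>.
try:
    ET.fromstring("<stop></Stop>")
    mismatch_accepted = True
except ET.ParseError:
    mismatch_accepted = False

print(mismatch_accepted)  # False
```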

Opening and empty tags can also contain attributes, which specify values associated with an element. For example, in the XML text <name lang='greek'>Herakles</name>, the name element has a lang attribute which has a value of "greek". In <name lang='latin'>Hercules</name>, the attribute's value is "latin".
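
The two examples above can be read back programmatically. This sketch uses Python's standard xml.etree.ElementTree module; the attribute name region is hypothetical and is looked up only to show what happens when an attribute is absent.

```python
import xml.etree.ElementTree as ET

greek = ET.fromstring("<name lang='greek'>Herakles</name>")
latin = ET.fromstring("<name lang='latin'>Hercules</name>")

# Attributes are exposed as a mapping from attribute name to string value.
print(greek.attrib)                   # {'lang': 'greek'}
print(latin.get("lang"), latin.text)  # latin Hercules

# Asking for an attribute the element does not have returns None.
# "region" is an invented attribute name.
print(greek.get("region"))            # None
```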

XML also provides entities as a shorthand for a particular character or a longer string. Entity references always begin with "&" and end with ";". For example, a particular Unicode character can be written as &#4660; using its character code in decimal, or as &#x1234; using hexadecimal. It's also possible to define your own entities, making &title; expand to "The Odyssey", for example. Because "&" starts every reference, a literal "&" character in XML content must be written as &amp;, and a literal "<" must likewise be written as &lt;.
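
The following sketch shows how a parser resolves these references, assuming Python's standard xml.etree.ElementTree module. The element names c, t and book, the phrase used with &amp;, and the DOCTYPE declaration that defines title are all invented for this example; an internal DTD subset is one common way to define an entity, not the only one.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal forms refer to the same Unicode character.
dec = ET.fromstring("<c>&#4660;</c>").text
hexa = ET.fromstring("<c>&#x1234;</c>").text
print(dec == hexa, hex(ord(dec)))  # True 0x1234

# &amp; is read back as a literal "&", and re-escaped when serialized.
t = ET.fromstring("<t>Scylla &amp; Charybdis</t>")
print(t.text)                           # Scylla & Charybdis
print(ET.tostring(t, encoding="unicode"))  # <t>Scylla &amp; Charybdis</t>

# A user-defined entity, declared in an internal DTD subset (illustrative).
doc = (
    '<!DOCTYPE book [<!ENTITY title "The Odyssey">]>'
    "<book>&title;</book>"
)
book = ET.fromstring(doc)
print(book.text)  # The Odyssey
```

Expanding entities from untrusted input can be risky, so real applications often restrict or disable custom entity definitions.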