Unicode, the character set used by XML documents, defines more than 95,000 characters from the world's writing systems. XML predefines entities for five characters that have special meaning in markup: &lt; (<), &gt; (>), &amp; (&), &apos; (') and &quot; ("). Any other character can be typed directly if the document's encoding supports it, or written as a character reference, which gives the character's Unicode code point as a number. The number can be written in decimal (&#960;) or hexadecimal (&#x3C0;).
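To see both reference forms side by side, the following Python sketch builds decimal and hexadecimal references for π with a small hypothetical helper, `char_ref`. It then lets the standard-library `xml.etree.ElementTree` parser expand them together with the predefined entities. Python is an assumption here; the article itself shows only XML.

```python
import xml.etree.ElementTree as ET

def char_ref(ch: str, hexadecimal: bool = False) -> str:
    """Return the XML character reference for a single character (hypothetical helper)."""
    code_point = ord(ch)
    return f"&#x{code_point:X};" if hexadecimal else f"&#{code_point};"

# Predefined entities in the attribute and content, plus pi written both ways.
doc = f'<p a="&quot;&apos;">&lt;b&gt; &amp; {char_ref("π")} {char_ref("π", True)}</p>'
root = ET.fromstring(doc.encode("ascii"))  # the markup itself is pure ASCII

print(char_ref("π"), char_ref("π", True))  # &#960; &#x3C0;
print(root.text)                           # <b> & π π
print(root.get("a"))                       # "'
```

Both references resolve to the same character, so the choice between decimal and hexadecimal is only a matter of readability.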

Character references are a convenient way to insert characters such as µ or π into an XML document. They are recognized in element content and attribute values, and also in attribute default values and entity replacement text declared in a DTD. They are not recognized inside comments or CDATA sections, where they remain literal text. Element and attribute names can be written in the characters of many scripts, such as Chinese, Greek or Russian, but those characters must be typed directly: character references cannot be used in names.
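One way to check where references are expanded is to parse a tiny document with Python's `xml.dom.minidom`, which keeps comments in the tree. In this sketch the element `r` and the attribute `title` are made-up names. The same reference is expanded in the attribute value and in the content, but left untouched in the comment.

```python
from xml.dom import minidom

doc = b'<r title="&#x3C0;">&#x3C0;<!--&#x3C0;--></r>'
r = minidom.parseString(doc).documentElement

# Collect the text and comment children separately.
text = "".join(n.data for n in r.childNodes if n.nodeType == n.TEXT_NODE)
comment = next(n.data for n in r.childNodes if n.nodeType == n.COMMENT_NODE)

print(r.getAttribute("title"))  # π        attribute value: reference expanded
print(text)                     # π        element content: reference expanded
print(comment)                  # &#x3C0;  comment: kept exactly as typed
```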

Example 1 below is well-formed and valid. Example 2 is rejected by the parser because it uses character references inside element names, both in the DTD declarations and in the tags.

Example 1: valid use of a non-ASCII element name

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
    <!ELEMENT root (π)>
    <!ELEMENT π ANY>
]>
<root>
    <π>Value of Pi is approximately 22/7</π>
</root>
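Example 1 can be checked with Python's `xml.etree.ElementTree`. This is a minimal sketch: ElementTree is a non-validating parser, so it confirms that the document is well-formed and that the element name π is accepted, but it does not check the content against the DTD declarations.

```python
import xml.etree.ElementTree as ET

valid = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
    <!ELEMENT root (π)>
    <!ELEMENT π ANY>
]>
<root>
    <π>Value of Pi is approximately 22/7</π>
</root>"""

# Encode to bytes so the parser reads the UTF-8 declaration consistently.
root = ET.fromstring(valid.encode("utf-8"))
child = root[0]
print(child.tag, "->", child.text)  # π -> Value of Pi is approximately 22/7
```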

Example 2: character references in element names (not well-formed)

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
    <!ELEMENT root (&#x3C0;)>
    <!ELEMENT &#x3C0; ANY>
]>
<root>
    <&#x3C0;>Value of Pi is approximately 22/7</&#x3C0;>
</root>
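The difference between the two examples can be reproduced with a small hypothetical helper, `is_well_formed`, built on ElementTree. It assumes that any `ParseError` means the document is not well-formed; the exact error message depends on the underlying expat version.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_bytes: bytes) -> bool:
    """Return True if the parser accepts the document, False on a syntax error."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        print("rejected:", err)
        return False
    return True

# Pi typed directly in the element name: accepted.
print(is_well_formed("<π>22/7</π>".encode("utf-8")))
# The same pi written as a character reference in the name: rejected.
print(is_well_formed(b"<&#x3C0;>22/7</&#x3C0;>"))
# A character reference in element content is fine.
print(is_well_formed(b"<pi>&#x3C0; is about 22/7</pi>"))
```

The first and third calls print `True`; the second prints the parser's error and then `False`.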

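When a document needs many character references, it is more readable to give them names. The references can be declared once as general entities, typically collected in a separate DTD file with the .ent extension, and that file is pulled into the document's DTD through a parameter entity. The document can then use entity names such as &phi; instead of numeric references.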
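Writing such a file by hand means looking up each code point. As an illustration only, the hypothetical function below generates the declarations from a mapping of entity names to characters, using hexadecimal references.

```python
def entity_declarations(chars: dict[str, str]) -> str:
    """Build one <!ENTITY name "&#xHEX;"> line per (name, character) pair."""
    return "\n".join(
        f'<!ENTITY {name} "&#x{ord(ch):X};">' for name, ch in chars.items()
    )

print(entity_declarations({"phi": "φ", "alpha": "α"}))
# <!ENTITY phi "&#x3C6;">
# <!ENTITY alpha "&#x3B1;">
```

The output can be saved as a `.ent` file. Straight ASCII quotes are used around each value, since XML does not accept typographic quotes as delimiters.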

Example of entities used in an XML document

math.ent

<!ENTITY phi "&#x3C6;">
<!ENTITY alpha "&#x3B1;">

MyDocument.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
    <!ENTITY % math SYSTEM "math.ent">
    %math;
]>
<root>
    <line>The value of &phi; (the golden ratio) is approximately 1.618</line>
    <line>Let &alpha; be the circumference of a circle</line>
</root>
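Whether math.ent is actually loaded depends on the parser. Many parsers, including Python's ElementTree, do not fetch external entities by default. The sketch below uses the lower-level `xml.parsers.expat` module and enables parameter-entity parsing. It then answers the request for math.ent from an in-memory dictionary; `FILES` is a hypothetical stand-in for reading the file from disk, and the document text is a shortened version of the example.

```python
import xml.parsers.expat as expat

# Hypothetical in-memory stand-in for the file system: system identifier -> bytes.
FILES = {
    "math.ent": b'<!ENTITY phi "&#x3C6;">\n<!ENTITY alpha "&#x3B1;">\n',
}

DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
    <!ENTITY % math SYSTEM "math.ent">
    %math;
]>
<root>
    <line>&phi; and &alpha; are Greek letters</line>
</root>"""

def parse_lines(doc: bytes, files: dict[str, bytes]) -> list[str]:
    """Parse doc, reading external parameter entities from `files`; return <line> texts."""
    parser = expat.ParserCreate()
    # Ask expat to read external parameter entities such as %math;.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def external_ref(context, base, system_id, public_id):
        # A child parser shares the DTD, so declarations in math.ent become visible.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(files[system_id], True)
        return 1  # non-zero tells expat the entity was handled

    lines: list[str] = []
    buf: list[str] = []

    def start(name, attrs):
        buf.clear()

    def end(name):
        if name == "line":
            lines.append("".join(buf))

    parser.ExternalEntityRefHandler = external_ref
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = buf.append
    parser.Parse(doc, True)
    return lines

print(parse_lines(DOC, FILES))  # ['φ and α are Greek letters']
```

Because the entity values in math.ent are character references, they are already converted to φ and α when the declarations are read, so the document sees plain characters.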
