Internationalization means building an application so that it works across all regions and languages. An internationalized document presents data such as dates in the format suited to the end user's region and language. No engineering changes are required to adapt an internationalized document to a new locale. It supports multiple character encoding standards and can easily be tailored to a local user.

XML internationalization

An XML document begins with the XML declaration, which gives the XML version, the encoding of the document, and whether the document is standalone (that is, whether it depends on external markup declarations such as an external DTD). An encoding is the binary representation of the document's characters. The first few bytes of the document, called the byte order mark (BOM) when present, tell the parser the encoding type of the XML document. This helps the parser read the contents of the document.
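As a rough illustration of how the leading bytes can reveal the encoding, here is a minimal Python sketch. The helper name sniff_bom is hypothetical and not part of any XML library; a real parser also inspects the encoding declaration when no BOM is present.

```python
import codecs

# Known byte order marks. The UTF-32 marks come first because the
# UTF-32LE mark starts with the same two bytes as the UTF-16LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    # Python's "utf-16" codec writes a BOM in the machine's native byte order.
    print(sniff_bom("<a/>".encode("utf-16")))
    # Plain ASCII bytes carry no BOM.
    print(sniff_bom(b"<a/>"))
```

A result of None does not mean the encoding is unknown; it only means the decision falls to the encoding declaration or to the UTF-8 default.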

Syntax for XML internationalization

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

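An XML document should declare its encoding. The declaration may be omitted only when the document is encoded in UTF-8 or UTF-16, or when the encoding is supplied by other means such as a transport protocol. If a file is encoded in one format, the end user has to decode it in the same format; otherwise meaningless characters appear in the document, making it difficult to read.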
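The effect of decoding with the wrong format can be reproduced in a few lines of Python. This is only a sketch with a made-up sample string; it is not tied to any particular XML parser.

```python
text = "café"  # sample text containing one non-ASCII character

encoded = text.encode("utf-8")        # bytes as the author saved them
wrong = encoded.decode("iso-8859-1")  # reader assumes a different encoding
right = encoded.decode("utf-8")       # reader uses the same encoding

print(wrong)  # the accented letter turns into two unrelated characters
print(right)  # the original text is recovered
```

The bytes on disk are identical in both cases; only the reader's assumption about the encoding differs.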

XML processors must support the UTF-8 and UTF-16 encodings, where UTF stands for UCS Transformation Format and UCS is the Universal Character Set. In these encodings, each character is represented using one or more 8-bit or 16-bit code units respectively. UTF-8 is the default encoding of XML documents. Other encodings used for XML around the world include ISO-8859-1, Windows-1252, and EBCDIC.
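The 8 and 16 in the names refer to the size of one code unit, and a single character may need more than one unit. The following Python sketch shows this for a few sample characters; the function name encoded_sizes is hypothetical.

```python
def encoded_sizes(ch: str):
    """Return (bytes in UTF-8, bytes in UTF-16) for a single character."""
    # "utf-16-le" is used so that no byte order mark is counted.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


if __name__ == "__main__":
    for ch in ["A", "é", "€", "😀"]:
        utf8_len, utf16_len = encoded_sizes(ch)
        print(f"U+{ord(ch):04X}: {utf8_len} bytes in UTF-8, {utf16_len} bytes in UTF-16")
```

For mostly ASCII text UTF-8 tends to be the more compact of the two, which is one reason it is a common choice for XML.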

Parsed external entities that are stored in an encoding other than UTF-8 or UTF-16 must begin with a text declaration. The text declaration is placed at the start of the external entity itself. Text declarations are similar to the XML declaration, except that the standalone declaration is not included. If the encoding of the external parsed entity is KOI8-R, the text declaration is as given below:

Syntax for XML internationalization and text declarations 

<?xml version="1.0" encoding="KOI8-R"?>
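To see a declared encoding being honored, the sketch below parses a small KOI8-R document with Python's standard xml.etree.ElementTree module. It is an approximation: the declaration sits on a standalone document rather than on an external parsed entity, the sample content is invented, and it assumes the bundled parser can map KOI8-R through Python's codecs, which should hold for single-byte encodings in CPython.

```python
import xml.etree.ElementTree as ET

# Sample document; the declaration names the encoding of the bytes below.
xml_text = '<?xml version="1.0" encoding="KOI8-R"?><greeting>Привет</greeting>'

# Store the document as KOI8-R bytes, as it would be saved on disk.
xml_bytes = xml_text.encode("koi8-r")

# The parser reads the declaration and decodes the content accordingly.
root = ET.fromstring(xml_bytes)
print(root.text)
```

Passing bytes rather than a str matters here, because the parser only consults the declaration when it has to decode the input itself.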

 
