Character encoding: Guidelines for indicating the character set
Web developers should always specify the character set in their pages:
Specify the character set for web pages.
Guideline R-pd.16.1
- First, determine whether the character set can be indicated in the HTTP headers (the charset parameter of the Content-Type header). The server's default headers must not conflict with the intended character set.
- Next, the source code of each page must contain a meta element that specifies the character set. Place this element as high as possible in the <head> section of the document; the HTML standard requires the declaration to appear within the first 1024 bytes of the page.
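The two steps above can be checked with a small script. The sketch below is a hypothetical Python helper: the names header_charset, meta_charset and charsets_agree are illustrative and not part of any tool mentioned in this manual. It assumes the server sends the header Content-Type: text/html; charset=utf-8, and it checks that the meta element declares the same character set within the first 1024 bytes. For brevity it only recognises the <meta charset="..."> form.

```python
import re
from email.message import Message
from typing import Optional

# Hypothetical page: the meta charset element is the first element in <head>.
PAGE = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Example</title>
</head>
<body><p>Café, 5 €</p></body>
</html>"""

# Assumed server response header; its charset parameter matches the meta element.
CONTENT_TYPE = "text/html; charset=utf-8"

# Simplified pattern: only recognises <meta charset="...">, not the older
# <meta http-equiv="Content-Type" content="..."> form.
META_CHARSET = re.compile(rb"""<meta\s+charset=["']?([A-Za-z0-9_-]+)""", re.IGNORECASE)


def header_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, lowercased."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(body: bytes) -> Optional[str]:
    """Return the charset declared by <meta charset> in the first 1024 bytes.

    The HTML standard requires the declaration to fit within those bytes.
    """
    match = META_CHARSET.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def charsets_agree(content_type: str, body: bytes) -> bool:
    """True when the HTTP header and the meta element declare the same charset."""
    declared = header_charset(content_type)
    return declared is not None and declared == meta_charset(body)


body = PAGE.encode("utf-8")
print(header_charset(CONTENT_TYPE), meta_charset(body), charsets_agree(CONTENT_TYPE, body))
# utf-8 utf-8 True
```

A mismatch, for example a server default of charset=iso-8859-1 combined with a UTF-8 meta element, makes charsets_agree return False, which is exactly the conflict the first step warns against.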
Which character set is suitable?
The UTF-8 character set
UTF-8, one of the encodings of the Unicode character set, can represent every Unicode character. It therefore has the most extensive repertoire, covering Western and Eastern scripts as well as symbols.
Advantages of UTF-8
- UTF-8 is an official international standard (part of the Unicode Standard and ISO/IEC 10646) and can encode every Unicode character.
- A UTF-8 encoded document can easily contain multiple scripts.
- Modern operating systems and programs, including web browsers, support the UTF-8 character set. With internationalisation and wide compatibility in mind, most new standards use UTF-8, and most word processing and web publishing programs can read and save UTF-8 encoded documents.
- The first 128 characters of UTF-8 (code points 0 to 127, the ASCII range) are encoded exactly as in ISO-8859-1. A browser that does not support UTF-8 and falls back to ISO-8859-1 will therefore still display text in a Western script legibly; only characters outside that range, such as accented letters, will appear garbled.
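The compatibility point above can be demonstrated with Python's built-in codecs. This minimal sketch, using invented sample strings, shows that the ASCII range encodes identically in both encodings, what an accented letter looks like when UTF-8 bytes are misread as ISO-8859-1, and that a string mixing scripts can be stored in UTF-8 but not in ISO-8859-1.

```python
# Every code point below 128 (the ASCII range) encodes to the same single byte
# in UTF-8 and in ISO-8859-1, so plain English text is identical in both.
ascii_range = "".join(chr(i) for i in range(128))
assert ascii_range.encode("utf-8") == ascii_range.encode("iso-8859-1")

# Outside that range the encodings differ: "é" is two bytes in UTF-8, one in ISO-8859-1.
print("é".encode("utf-8"))       # b'\xc3\xa9'
print("é".encode("iso-8859-1"))  # b'\xe9'

# A browser that reads UTF-8 bytes as ISO-8859-1 shows the ASCII letters
# correctly but garbles the accented one.
print("café".encode("utf-8").decode("iso-8859-1"))  # cafÃ©

# UTF-8 can mix scripts in one document; ISO-8859-1 cannot encode them.
mixed = "Hello, Привет, 你好"
utf8_mixed = mixed.encode("utf-8")
try:
    mixed.encode("iso-8859-1")
except UnicodeEncodeError as err:
    # err.start is the index of the first character ISO-8859-1 cannot represent.
    print("ISO-8859-1 cannot encode:", err.object[err.start])  # П
```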
Specify the UTF-8 character set.
The ISO-8859-1 (Latin-1) character set
This character set lacks certain special characters, such as curly quotation marks (“ ”) and the euro sign (€). Page content in other languages may also cause problems. Turkish, for example, uses letters such as ğ, ş and the dotless ı, which are not included in the ISO-8859-1 character set.
Such characters can be represented by character references (for example &#8364; or &euro; for the euro sign). However, if content comes from a database, it is more practical to specify a character set with a larger repertoire, such as UTF-8.
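The character-reference workaround can be illustrated with Python's standard library. In this sketch, which uses an invented sample string, the built-in xmlcharrefreplace error handler turns every character that ISO-8859-1 lacks into a numeric character reference, and html.unescape shows how a browser would read the reference back.

```python
import html

# Hypothetical page text with characters that ISO-8859-1 lacks:
# the Turkish capital İ (U+0130) and the euro sign (U+20AC).
text = "İstanbul: 5 €"

# Plain ISO-8859-1 encoding fails on those characters.
try:
    text.encode("iso-8859-1")
except UnicodeEncodeError:
    print("not representable in ISO-8859-1")

# The "xmlcharrefreplace" error handler writes each missing character as a
# numeric character reference.
latin1_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")
print(latin1_bytes)  # b'&#304;stanbul: 5 &#8364;'

# Decoding the references, as a browser would, restores the original text.
assert html.unescape(latin1_bytes.decode("iso-8859-1")) == text

# With UTF-8 no references are needed: every character is encoded directly.
utf8_bytes = text.encode("utf-8")
assert utf8_bytes.decode("utf-8") == text
```

For a handful of hand-written characters this is workable, but applying it to every string drawn from a database is extra work that UTF-8 makes unnecessary.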
Guideline R-pd.16.2
