Character encoding: Guidelines for indicating the character set

Web developers should always specify the character set in their pages:

Specify the character set for web pages.

Guideline R-pd.16.1
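
As a minimal sketch of what specifying the character set can look like in practice, the Python snippet below builds a small HTML page that declares its encoding in a meta element and pairs it with a matching HTTP Content-Type header. The function name build_page and the sample text are hypothetical and chosen only for illustration; the guideline itself does not prescribe a particular mechanism.

```python
from html import escape


def build_page(body_text, charset="utf-8"):
    """Return (headers, body_bytes) for a page that declares its character set."""
    page = (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        # Declare the character set inside the document itself.
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">\n'
        "<title>Example</title>\n"
        "</head>\n"
        f"<body><p>{escape(body_text)}</p></body>\n"
        "</html>\n"
    )
    # Declare the same character set in the HTTP response header.
    headers = {"Content-Type": f"text/html; charset={charset}"}
    # Encode the bytes with the character set that was declared.
    return headers, page.encode(charset)


headers, body = build_page("Price: 10 €")
print(headers["Content-Type"])
```

The point of the sketch is that the declared character set and the encoding actually used to produce the bytes come from the same value, so they cannot drift apart.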

Which character set is suitable?

The UTF-8 character set

The UTF-8 character set, an encoding of the Unicode standard, has the most extensive repertoire and covers the characters of most other character sets (Western and Eastern scripts, as well as symbols).

Advantages of UTF-8

  • The UTF-8 character set is an official international standard with the widest available character repertoire.
  • A UTF-8 encoded document can easily support multiple scripts.
  • Modern operating systems and programs, including web browsers, support the UTF-8 character set. With internationalisation and wide compatibility in mind, most new standards use the UTF-8 character set. Most word processing and web publishing programs can read and save UTF-8 encoded documents.
  • The first 128 characters of the UTF-8 character set (the ASCII range) are the same as those of ISO-8859-1. If a browser does not support the UTF-8 character set, it can display the page in ISO-8859-1 encoding; a text written in a Western script will then still be readable, with the exception of characters outside that range, such as accented letters and special symbols.
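
The compatibility point in the last item can be checked with a short Python sketch. It compares the two encodings over the shared ASCII range (code points 0 to 127) and then simulates a UTF-8 page being read as ISO-8859-1. The sample string is an assumption made for illustration.

```python
# The ASCII range (code points 0 to 127) is encoded identically in both encodings.
ascii_text = "".join(chr(code) for code in range(128))
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("iso-8859-1")
print(same_bytes)

# A UTF-8 page that is wrongly decoded as ISO-8859-1 keeps its ASCII letters,
# but each non-ASCII character turns into two or more unrelated characters.
original = "Café for €5"
misread = original.encode("utf-8").decode("iso-8859-1")
print(len(original), len(misread))
```

The misread text is longer than the original because each multi-byte UTF-8 sequence is interpreted as several single-byte ISO-8859-1 characters, while the plain letters survive unchanged.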

Specify the UTF-8 character set.

The ISO-8859-1 (Latin-1) character set

This character set lacks a number of special characters, such as curved quotes and the euro symbol. Page content in a different script may likewise cause problems. For example, Turkish uses several letters (such as ğ, ı and ş) that are not included in the ISO-8859-1 character set.
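
To make the gaps concrete, the following sketch lists which characters of a string cannot be encoded in ISO-8859-1. The helper name missing_from_latin1 and the sample strings are hypothetical and serve only as an illustration.

```python
def missing_from_latin1(text):
    """Return the characters in text that ISO-8859-1 cannot encode."""
    missing = []
    for char in text:
        try:
            char.encode("iso-8859-1")
        except UnicodeEncodeError:
            missing.append(char)
    return missing


# Curved quotes and the euro symbol are outside ISO-8859-1.
print(missing_from_latin1("“Quote” costs 5 €"))

# Some Turkish letters are outside ISO-8859-1, while ç is inside it.
print(missing_from_latin1("ağaç"))
```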

These occasional missing characters can be represented by character references. However, if content comes from a database, it is more efficient to specify a character set with a larger character repertoire, such as the UTF-8 character set.
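
As a rough illustration of the two options, the sketch below encodes the same text once as ISO-8859-1 with numeric character references and once as UTF-8. It relies on Python's built-in xmlcharrefreplace error handler; the sample text is an assumption.

```python
import html

text = "Price: 5 € for “İstanbul” café"

# Option 1: ISO-8859-1, with each missing character replaced by a numeric reference.
latin1_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")
with_references = latin1_bytes.decode("iso-8859-1")
print(with_references)
# Price: 5 &#8364; for &#8220;&#304;stanbul&#8221; café

# Resolving the references, as a browser would, restores the original text.
restored = html.unescape(with_references)

# Option 2: UTF-8 encodes every character directly, with no substitution step.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")
```

With ISO-8859-1, every piece of content has to pass through the substitution step before it is output; with UTF-8 the text can be written as it is stored.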

Guideline R-pd.16.2


Web Guidelines version 1.3, November 2007.