Character encoding is the mechanism that operates behind the scenes of virtually every digital document. It tells a computer which characters (letters, numbers, punctuation marks, et cetera) a document contains, by translating bytes into characters and vice versa.
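As a minimal sketch of this translation between bytes and characters, the Python snippet below encodes a short text and decodes it again. The sample text and the choice of UTF-8 and ISO-8859-1 are assumptions made purely for illustration.

```python
# Illustrative sample text (an assumption, not taken from the guideline).
text = "café"

# Characters -> bytes: the same text yields different bytes per encoding.
utf8_bytes = text.encode("utf-8")         # "é" becomes two bytes
latin1_bytes = text.encode("iso-8859-1")  # "é" becomes one byte

# Bytes -> characters: decoding with the matching encoding restores the text.
roundtrip = utf8_bytes.decode("utf-8")

# Decoding with a different encoding than the one used to write the bytes
# produces other characters than the author intended.
garbled = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes, latin1_bytes, roundtrip, garbled)
```

The last line of the sketch shows why a document has to state which encoding it uses: the bytes alone do not say how they should be read.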

Character encoding concerns what a character is according to the computer, not how it is displayed to the computer user.
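The distinction between identity and display can be illustrated with a small Python sketch. The two characters chosen here are an assumption for the sake of the example: in most fonts they look nearly the same, yet the computer treats them as different characters.

```python
import unicodedata

latin_a = "A"          # U+0041
cyrillic_a = "\u0410"  # U+0410, usually drawn almost identically to "A"

# Print the code point and the official Unicode name of each character.
for ch in (latin_a, cyrillic_a):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Similar appearance, but not the same character to the computer.
same_character = latin_a == cyrillic_a
print(same_character)
```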

With respect to language, character encoding has to do with the writing system, rather than with vocabulary, grammar or pronunciation. An English text can be written in Japanese characters (albeit with difficulty) and a Japanese text can be written in Western characters. The text stays in the original language, but the writing system is different. See also the chapter Languages.

With a view to internationalisation and support for different languages and writing systems, a number of rules have been set up for specifying characters in HTML 4.01 web pages. Web developers can specify a character set and use character references so that characters are displayed correctly on web pages. They should always specify a character set for their pages; beyond that, they can use character references at their discretion.
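The interplay between a declared character set and character references can be sketched in Python as follows. The helper name build_page, the sample text and the choice of ISO-8859-1 are hypothetical and serve only as illustration; the meta element shown is the usual HTML 4.01 way of declaring a character set within the document itself.

```python
import html

def build_page(body: str, charset: str = "utf-8") -> bytes:
    """Hypothetical helper: build a tiny HTML 4.01 page that declares its
    character set and is encoded in that same character set."""
    meta = f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'
    page = (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" '
        '"http://www.w3.org/TR/html4/strict.dtd">\n'
        f"<html><head>{meta}<title>Example</title></head>"
        f"<body><p>{body}</p></body></html>"
    )
    # Characters the chosen character set cannot represent are written as
    # numeric character references instead of raising an error.
    return page.encode(charset, errors="xmlcharrefreplace")

# "é" exists in ISO-8859-1 and is stored as a single byte; the euro sign
# does not, so it is emitted as a numeric character reference.
page = build_page("café €2", charset="iso-8859-1")
print(page)

# Reading character references back into characters.
print(html.unescape("&eacute;"), html.unescape("&#8364;"))
```

In this sketch the character set is always declared, while character references appear only where the declared set falls short, which mirrors the division of roles described above.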

Web Guidelines version 1.3, November 2007.