Character encoding is the mechanism that operates behind the scenes of virtually every digital document. It tells a computer which characters (letters, numbers, punctuation marks, et cetera) a document contains by defining how bytes are translated into characters and vice versa.
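The translation between bytes and characters can be illustrated with a minimal Python sketch. The sample word and the two encodings (UTF-8 and ISO-8859-1, called `latin-1` in Python) are chosen here purely for illustration.

```python
text = "café"

# Encoding: characters -> bytes. The same text yields different bytes
# depending on the character encoding that is used.
utf8_bytes = text.encode("utf-8")      # 5 bytes: "é" takes two bytes
latin1_bytes = text.encode("latin-1")  # 4 bytes: "é" takes one byte

# Decoding: bytes -> characters, using the matching encoding.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("latin-1") == text

# Decoding with the wrong encoding silently produces the wrong characters.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # prints "cafÃ©"
```

The last step shows why a computer needs to be told which encoding applies: the bytes alone do not say which characters they stand for.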
Character encoding concerns what a character is to the computer, not how it is displayed to the computer user.
With respect to language, character encoding has to do with the writing system, rather than vocabulary, grammar or pronunciation. An English text can be written in Japanese characters (albeit with difficulty) and a Japanese text can be written in Western characters. The text stays in the original language, but the writing system is different. See also the chapter Languages.
With a view to internationalisation and support for different languages and writing systems, HTML 4.01 sets out a number of rules for specifying characters in web pages. Web developers can indicate a character set and use character references so that characters are displayed correctly on web pages. Web developers should always specify a character set for their pages. Apart from that, they can use character references at their discretion.
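As an illustration of character references, the following Python sketch shows that decimal, hexadecimal and named references all denote the same character, and how non-ASCII characters can be replaced by numeric references. The helper name `to_numeric_references` is hypothetical; `html.unescape` and the `xmlcharrefreplace` error handler are part of the Python standard library.

```python
import html

# Three character references for the same character, "é" (U+00E9).
assert html.unescape("&#233;") == "é"     # decimal reference
assert html.unescape("&#xE9;") == "é"     # hexadecimal reference
assert html.unescape("&eacute;") == "é"   # named reference

def to_numeric_references(text):
    """Replace every non-ASCII character with a decimal character reference.

    Hypothetical helper, for illustration only.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

fragment = to_numeric_references("Crème brûlée")
print(fragment)  # prints "Cr&#232;me br&#251;l&#233;e"
```

A fragment written this way contains only ASCII bytes, so it is displayed correctly even when the page's character set does not include the characters themselves.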
Character encoding and the Web
- Why specifying a character set is important
- Methods for indicating the character set
- Guidelines for indicating the character set
Web developers should always specify the character set in their pages. They are advised to use the UTF-8 character set.
- Character encoding and fonts
- Special characters, forms and databases
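The advice to specify UTF-8 can be sketched as follows, assuming a page that is generated by a script. The `meta` element is the HTML 4.01 syntax for declaring a character set; the helper `build_page` and the sample content are hypothetical. The essential point is that the declared character set must match the encoding in which the bytes are actually written.

```python
# Character set declaration as it appears in the head of an HTML 4.01 page.
META_CHARSET = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'

def build_page(title, body):
    """Return a small HTML 4.01 page (as text) that declares UTF-8.

    Hypothetical helper, for illustration only.
    """
    return (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" '
        '"http://www.w3.org/TR/html4/strict.dtd">\n'
        "<html>\n<head>\n"
        f"{META_CHARSET}\n"
        f"<title>{title}</title>\n"
        "</head>\n<body>\n"
        f"<p>{body}</p>\n"
        "</body>\n</html>\n"
    )

page_text = build_page("Menu", "Crème brûlée, 5 €")

# The bytes sent to the browser must really be UTF-8, as declared.
page_bytes = page_text.encode("utf-8")

# A web server can state the same character set in the HTTP header.
http_header = "Content-Type: text/html; charset=utf-8"

# A browser that honours the declaration recovers the original text.
assert page_bytes.decode("utf-8") == page_text
```

Declaring one character set while saving the file in another leads to the kind of garbled characters that the declaration is meant to prevent.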