Character encoding: Methods for indicating the character set
The W3C HTML 4.01 specification prescribes that browsers should not make any assumptions concerning the character set when reading web pages. The character set of pages should be communicated to browsers.
This can be done by several mechanisms, of which the two most important ones are discussed here. In order of priority:
- An HTTP header from the web server indicates the set before sending a page to the browser.
- A meta element in the HTML document can tell the browser which character set should be used.
Indicating the character set by means of an HTTP Content-type header
The HTTP Content-type header can be used to indicate a character set for the document to be sent by the server.
The HTTP Content-type header
Content-type: text/html; charset=utf-8
Servers can be configured to send all documents of a particular type with a specific Content-type header. For Apache web servers this can be adjusted in the configuration and/or .htaccess file.
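As a rough illustration of such a configuration, the fragment below uses two standard Apache directives, AddDefaultCharset and AddCharset (the latter requires mod_mime). The choice of utf-8 and of the .html extension is an assumption made for this example. Whether the directives are allowed in a .htaccess file depends on the AllowOverride setting of the server.

```apache
# Append "charset=utf-8" to responses of type text/plain and text/html
# that do not already declare a character set.
AddDefaultCharset utf-8

# Alternatively, tie a character set to a file extension (mod_mime).
AddCharset utf-8 .html
```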
Also specify the character set by means of HTTP headers, if possible.
Guideline R-pd.16.3
However, a number of administrators tend to leave the character set out of the standard headers sent by their servers. This may be due to a lack of familiarity with this technology. Another explanation is that they do not want to restrict web developers to a single character set for their pages, since the HTTP header overrides whatever the page itself declares.
Therefore it is important that web developers at least include a meta element that defines the character set in the HTML source code of their pages.
Many server-side scripting languages, including PHP, can generate HTTP headers themselves, independently of the server settings.
PHP for creating a Content-type header
<?php header("Content-type: text/html; charset=utf-8"); ?>
ASP.Net for creating a Content-type header
<% Response.Charset="utf-8" %>
Indicating the character set by means of a meta element
Likewise, web developers can apply a meta element in the HTML source code to indicate the character set of a page.
The meta element for specification of a character set (HTML)
<meta http-equiv="Content-type" content="text/html; charset=utf-8">
The use of a meta element is secondary to the use of HTTP headers. When both are used, the browser gives priority to what the HTTP header prescribes. Nevertheless, including a <meta> tag on each page has its advantages.
Advantages of using the meta element
- Many web servers do not specify the character set of web pages at all. In that case, the <meta> tag in the page is the only indication of which character set is used.
- If a page is loaded from the computer's hard drive instead of from a web server, there are no HTTP headers. In that case, a meta element can tell the browser which character set has been used.
Use (at least) the meta element to specify the character set and place this element as high as possible in the head section of the markup.
Guideline R-pd.16.4
Place the meta element as high as possible in the source code
Originally, the Content-type header was the only way to specify a character set. Assigning different character sets to individual pages required changes to the server settings. Because this process was time-consuming, the decision was made to use the meta element for this purpose as well. It enables web developers to indicate the character set of a page within the page itself.
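But how can the browser know how to read the file if the relevant information comes from the file itself? This is possible because HTML markup is written in the 128 characters of the basic US-ASCII set. Nearly every character set in common use for web pages contains this set and encodes it in the same way, so the browser can read the markup before it knows the actual character set.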
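The point about US-ASCII can be checked with a small Python sketch. It encodes a meta element under a handful of encodings and compares the resulting bytes. The list of encodings is an illustrative selection rather than a complete one, and the function names are made up for this example.

```python
# A meta element declaring utf-8; it contains only US-ASCII characters.
META = '<meta http-equiv="Content-type" content="text/html; charset=utf-8">'

# Illustrative selection of ASCII-compatible encodings (an assumption, not a complete list).
ENCODINGS = ["ascii", "utf-8", "iso-8859-1", "iso-8859-15", "windows-1252"]


def encoded_forms(text, encodings):
    """Return a mapping of encoding name to the bytes of `text` in that encoding."""
    return {name: text.encode(name) for name in encodings}


def same_bytes_everywhere(text, encodings):
    """Return True if `text` produces identical bytes in every listed encoding."""
    return len(set(encoded_forms(text, encodings).values())) == 1


if __name__ == "__main__":
    # The markup itself is byte-for-byte identical in all listed encodings.
    print(same_bytes_everywhere(META, ENCODINGS))
    # Text outside US-ASCII is where the encodings start to differ.
    print(same_bytes_everywhere("caf\u00e9", ["utf-8", "iso-8859-1"]))
```

The first check succeeds because every character of the element lies within US-ASCII. The second fails because "é" is stored as two bytes in utf-8 and as one byte in iso-8859-1, which is exactly why the browser needs to be told the character set.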
It may happen that the browser does not infer from the Content-type header which character set to use, but subsequently encounters the applicable meta element in the file. If that happens, the browser will stop displaying the page and reread the file in the correct encoding. Web developers can prevent unwanted visual effects by placing the meta element as high as possible in the <head></head> section in the page source code. See also Page structure.
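The order of priority described on this page can be sketched in Python as follows. This is a simplified model, not how any particular browser is implemented. The function name choose_charset, the scan_limit of 1024 bytes and the windows-1252 fallback are assumptions made for the example. A real browser may also re-read the page when it finds the meta element late, as described above, whereas this sketch simply stops looking.

```python
import re

# Matches the charset parameter in a Content-type header value.
CHARSET_IN_HEADER = re.compile(r'charset\s*=\s*"?([\w.:-]+)', re.IGNORECASE)

# Matches a charset declaration inside a meta element in the raw bytes.
META_CHARSET = re.compile(rb'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)


def choose_charset(content_type_header, body, fallback="windows-1252", scan_limit=1024):
    """Return (charset, source) for a page, giving the HTTP header priority.

    content_type_header: value of the Content-type header, or None.
    body: the raw bytes of the HTML document.
    fallback and scan_limit are hypothetical defaults for this sketch.
    """
    # 1. The HTTP header takes precedence when it names a character set.
    if content_type_header:
        match = CHARSET_IN_HEADER.search(content_type_header)
        if match:
            return match.group(1).lower(), "http-header"

    # 2. Otherwise look for a meta element near the top of the file.
    match = META_CHARSET.search(body[:scan_limit])
    if match:
        return match.group(1).decode("ascii").lower(), "meta"

    # 3. Nothing found: fall back to a default.
    return fallback, "fallback"
```

Called with the header "text/html; charset=ISO-8859-1" and a page whose meta element declares utf-8, the function reports iso-8859-1 from the header. With the header "text/html" it reports utf-8 from the meta element. If the meta element lies beyond the first scan_limit bytes, the sketch falls back to the default, which illustrates why the element belongs as high as possible in the head section.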
