HTML: The Definitive Guide

Chapter 3
Anatomy of an HTML Document

3.4 Document Content

Nearly everything else you put into your HTML document that isn't a tag is by definition content, and the majority of that, in most HTML documents, is text. Like tags, document content is encoded using a specific character set, the ISO-8859-1 Latin character set, by default. This character set is a superset of conventional ASCII, adding the necessary characters to support the Western European languages. If your keyboard does not allow you to directly enter the characters you need, you can use character entities to insert the desired characters.

Advice Versus Control

Perhaps the hardest rule to remember when marking up an HTML document is that all the tags you insert regarding text display and formatting are only advice for the browser: they do not explicitly control how the browser will display the document. In fact, the browser can choose to ignore all of your tags and do what it pleases with the document content. What's worse, the user (of all people!) has control over text-display characteristics of his or her own browser.

Get used to this lack of control. The best way to use HTML markup to control the appearance of your documents is to concentrate on the content of the document, not on its final appearance. If you find yourself worrying excessively about spacing, alignment, text breaks, and character positioning, you'll surely end up with ulcers. You will have gone beyond the intent of HTML. If you focus on delivering information to users in an attractive manner, using the tags to advise the browser as to how best to display that information, you are using HTML effectively, and your documents will render well on a wide range of browsers.

Character Entities

Besides common text, HTML gives you a way to display special text characters you might not normally be able to include in your source document or which have other purposes in HTML. A good example is the less-than or opening bracket (<) symbol. In HTML, it normally signifies the start of a tag, so if you insert it simply as part of your text, the browser will get confused and probably misinterpret your document.

In HTML, the ampersand character instructs the browser to insert a special character, formally known as a character entity. For example, the sequence &lt; inserts that pesky less-than symbol into the rendered text. Similarly, &gt; inserts the greater-than symbol, and &amp; inserts an ampersand. There can be no spaces between the ampersand, the entity name, and the required trailing semicolon. (Semicolons aren't special characters; you don't need an ampersand sequence to display a semicolon.)
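The named entities described above can be sketched in a short fragment. The sample sentences here are invented for illustration and are not taken from any particular document.

```html
<!-- Sample text invented for illustration -->
<p>If a &lt; b and b &lt; c, then c &gt; a.</p>
<p>Salt &amp; pepper</p>
<p>Type &amp;lt; in your source to display a less-than symbol.</p>
```

A browser that honors these entities should display the first paragraph as "If a < b and b < c, then c > a." In the third paragraph, &amp;lt; displays as the literal text &lt;, because only the leading ampersand is replaced.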

You also may replace the entity name after the ampersand with a pound sign (#) followed by a decimal value between 0 and 255 corresponding to the character's position in the character set. Hence, the sequence &#60; does the same thing as &lt; and represents the less-than symbol. In fact, you could replace every ordinary character in an HTML document with its ampersand sequence, such as &#65; for a capital "A" or &#97; for its lowercase version, but that would be silly. A complete listing of all characters, their names, and numerical equivalents can be found in Appendix E, Character Entities.
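As a minimal sketch of the numeric form, the fragment below pairs a few decimal references with their named or literal equivalents. The accented "e" (position 233) and the copyright symbol (position 169) are examples chosen here from ISO-8859-1; they do not come from the text above.

```html
<!-- Each paragraph shows a numeric reference next to its equivalent -->
<p>&#60; and &lt; both display a less-than symbol.</p>
<p>&#65;&#97; displays the same as Aa.</p>
<p>caf&#233; and caf&eacute; both end in an e with an acute accent.</p>
<p>&#169; and &copy; both display the copyright symbol.</p>
```

The numeric form can be handy when you remember a character's position but not its name, at the cost of a less readable source document.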

Keep in mind that not all special characters can be rendered by all browsers. Some browsers just ignore many of the special characters; with others, the characters aren't available in the character sets on a specific platform. Be sure to test your documents on a range of browsers before electing to use some of the more obscure character entities.



Comments

Comments are another type of textual content that appears in the source HTML document but is not rendered by the user's browser. Comments fall between the special <!-- and --> markup elements. Browsers ignore the text between the comment character sequences.

Here's a sample comment:

<!-- This is a comment -->
<!-- This is a 
multiple line comment
that ends on this line -->

Leave a space after the initial <!-- and before the final -->; otherwise, you can put nearly anything inside the comment. The biggest exception to this rule is that the HTML standard doesn't let you nest comments.[2]

[2] Netscape does let you nest comments, but the practice is tricky; you cannot always predict how other browsers will react to nested comments.
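The nesting problem can be sketched as follows. This is an illustration of what to avoid, and the exact result is an assumption: many browsers end a comment at the first closing sequence they meet, but others may behave differently.

```html
<!-- Avoid this: the outer comment is meant to hide everything below
<!-- an inner comment -->
but a browser may display this line and the stray closing sequence. -->

<!-- Safer: write two separate comments -->
<!-- one after the other -->
```

Keeping each comment self-contained sidesteps the question of how a given browser treats nested delimiters.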

As we mentioned earlier, Internet Explorer also lets you place comments within a special <comment> tag. Everything between the <comment> and </comment> tags is ignored by Internet Explorer, but all other browsers will display the comment to the user. Because of this undesirable behavior, we do not recommend using the <comment> tag for comments. Instead, always use the <!-- and --> sequences to delimit comments.

Besides the obvious use of comments for HTML source documentation, many World Wide Web servers use comments to take advantage of features specific to the document server software. These servers scan the document for specific character sequences within conventional HTML comments and then perform some action based upon the commands embedded in the comments. The action might be as simple as including text from another file (known as a server-side include) or as complex as executing other commands on the server to dynamically generate the document contents.
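A server-side include might look like the sketch below. It assumes an NCSA- or Apache-style server with server-side includes enabled for the document; the file /footer.html is a hypothetical name used only for illustration. Note that these directives place the # immediately after the opening <!-- so the server can recognize them.

```html
<html>
<head>
<title>Server-side include sketch</title>
</head>
<body>
<p>Page text written by the author.</p>
<!-- Assumed: the server replaces the next comment with the contents of /footer.html -->
<!--#include virtual="/footer.html" -->
<!-- Assumed: the server replaces the directive below with the file's modification date -->
<p>Last modified: <!--#echo var="LAST_MODIFIED" --></p>
</body>
</html>
```

On a server that does not process these directives, the browser receives them unchanged and ignores them as ordinary comments, so the page still displays, just without the included content.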
