Guidelines for HTML Standards

Manish Langa

Project Leader

Introduction

This article concludes our introduction to HTML with a presentation of some valuable guidelines for working with HTML documents and code that will help maximize their maintainability and reusability. Of central importance is the need to understand HTML and its role in Web applications, to plan ahead for maintainable and reusable code, and to adopt a consistent policy on coding style.

Coding Style Guidelines

Consistency is absolutely a prerequisite for maximizing maintainability and reusability. These general guidelines for coding style can form the basis of a set of standards that will help ensure that all developers in a project—or, better, in all projects across an organization—write code consistently.

Use well-formed HTML.
Pick good names and ID values.
Indent consistently.
Limit line length.
Standardize character case.
Use comments judiciously.

Use Well-formed HTML

Although Web browsers are generally forgiving and can ignore many mistakes, rendering most HTML as the document author intended, it is still a good idea to use well-formed HTML code, for a number of reasons.

Well-formed markup code is a concept that has gained importance with increased implementation of XML. While browsers did not, in general, enforce HTML language rules very closely, XML parsers do. Code is considered well formed when it is structured according to the rules for XML 1.0. These rules relate to character case, tags, nesting, and attribute values.

In general, when most browsers encounter an unrecognized or extraneous tag, they ignore it. However, different browsers might render such markup in different, and unpredictable, ways. In addition, future versions of browsers might adhere to standards more closely than do current versions. Finally, code that includes such elements can be harder to read and understand, making maintenance more difficult.

Lowercase names: Element and attribute names should be in all lowercase. In versions through 4.01, HTML is not case-sensitive. However, XML is case-sensitive, so a start tag and its end tag must match exactly, and the XHTML 1.0 recommendation defines all element and attribute names in lowercase. So, to ensure that code keeps working and to maximize reusability, plan for this from the start.
Closing tags: All nonempty elements must have corresponding closing tags. Empty elements, those previously written as a single tag such as <br> and <hr>, must be followed immediately by a corresponding closing tag, or the tag must end with "/>". For example, <br></br> and <br /> are both well-formed.
Nested elements: All nested elements must be properly nested, for example:
<p><b>Some text</b></p>
Note that the <b> tag and its corresponding closing tag, </b>, are both nested inside the <p> and </p> tags.
If elements overlap, then they are not properly nested, as illustrated in the following code:
<p><b>Some text</p></b>
While many browsers have accepted overlapping elements and given the expected results, they have always been, strictly speaking, illegal in HTML, and future versions of browsers might not support them.

Attribute values: Attribute values, even numeric ones, should be quoted, for example, width="100" rather than width=100.
Code validation: Another step toward improving HTML code is to validate it against a formal published grammar and to declare that grammar at the beginning of the HTML document. For example, the following line declares the public HTML 3.2 Final grammar:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
A list of formal published grammars is available from the W3C at http://validator.w3.org/sgml-lib/catalog. The W3C also has a public HTML validation service at http://validator.w3.org/.
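
Taken together, the well-formedness rules and the grammar declaration might look like the following in practice. This is a minimal sketch that assumes XHTML 1.0 Strict as the target grammar; the title, the text, and the image file name logo.png are placeholders chosen for illustration.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Well-formed example</title>
  </head>
  <body>
    <!-- Element and attribute names are lowercase -->
    <p>
      <!-- The b element opens and closes inside the p element -->
      <b>Some text</b>
      <!-- Empty elements end with " />" -->
      <br />
      <!-- Attribute values are quoted, including numeric ones -->
      <img src="logo.png" alt="Logo" width="120" height="40" />
    </p>
    <hr />
  </body>
</html>
```

Submitting a page like this to the W3C validation service is one way to confirm that it matches the grammar it declares.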

Assign Meaningful Names and ID Values

Use a consistent scheme for assigning the values of name and ID properties. They should be as short as reasonably possible without giving up descriptive power. Also, use mixed-case property values to help readability (see Listing 2). In that code snippet, the check box names express not only the purpose of each element but also its type.

Listing 2: Example of Good Element Names

Member?

Admin?

Owner?
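
Markup for check boxes like these might be written as in the following sketch. The names chkMember, chkAdmin, and chkOwner are assumptions made for illustration, using the "chk" prefix suggested in the Hungarian Notation sidebar.

```html
<!-- Hypothetical names: "chk" marks a check box, the rest states its purpose -->
<input type="checkbox" name="chkMember" id="chkMember" /> Member?
<input type="checkbox" name="chkAdmin" id="chkAdmin" /> Admin?
<input type="checkbox" name="chkOwner" id="chkOwner" /> Owner?
```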

HTML primarily refers to elements by their name property, while DHTML and client-side scripts use the ID property. Although IDs must be unique within a document, in general there is no reason not to use the same value for an element's name and ID properties. Using the same value for these properties can reduce the confusion that might arise when mixing HTML and client-side scripting.
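
As a rough illustration, the sketch below gives a text box the same value for name and ID. The form submits the value under the name, and the script finds the element by its ID. The /register URL, the btnSave button, and the body of NameValid are assumptions made for this example.

```html
<form action="/register" method="post">
  <p>
    <!-- Same value for name (sent with the form) and id (used by script) -->
    <input type="text" name="txtName" id="txtName" />
    <input type="submit" name="btnSave" id="btnSave" value="Save"
        onclick="return NameValid();" />
  </p>
</form>

<script type="text/javascript">
  // Looks the text box up by its id and rejects an empty value
  function NameValid() {
    var txtName = document.getElementById("txtName");
    return txtName.value !== "";
  }
</script>
```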

Indent Consistently

Use indentation consistently to enhance the readability of the code. When elements carry over more than one line of code, indent the contents of elements between the start tag and the end tag. This will make it easy to see where the element begins and ends. Also, use indentation to align code at attribute names (see Listing 3).

It is a good idea to use no more than two to four spaces for each level of indentation, so as not to use up all the available line length in indentation. If possible, set up the development tool to convert tabs to spaces so that the indentation will be the same when the source is viewed in different editors or as printed output.

Listing 3: Indent Code Consistently

To log into the system, enter your user
    name and password in the text boxes. Then
    click the "Login" button.
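
A slightly fuller sketch of the same idea follows, using two spaces per level and aligning attributes under one another. The div wrapper and the names divLogin and txtUserName are hypothetical and serve only to show the layout.

```html
<div id="divLogin">
  <p>
    To log into the system, enter your user
    name and password in the text boxes. Then
    click the "Login" button.
  </p>
  <input type="text"
         name="txtUserName"
         id="txtUserName" />
</div>
```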

Limit Line Length

Break up lines when they run too long. It is much easier to read and understand code when you can see the entire line at once. When lines of code are so long that the reader must scroll right and left to read them, it requires much more cognitive effort to understand what the code is doing. Alternatively, in some applications, long lines might wrap to the next line at the nearest word break. In either case, source code is much easier to read and understand if the developer takes explicit control of line length.

HTML is not sensitive to line breaks, so the developer can break lines at will between keywords for readability. For example, Listing 4 illustrates a code snippet in which two elements have word-wrapped to the next line because they were too long for the editor window.

Listing 4: HTML Source Code with Uncontrolled Line Breaks

<input type="text" name="txtName" language=
"JavaScript" onclick="return NameValid();">

<input type="text" name="txtAddress"
language="JavaScript" onclick="return AddrValid();">

Compare this with Listing 5, where the developer took explicit control of line length. Here the code is much easier to read because the developer used line breaks and indenting to visually organize the source code.

Listing 5: HTML Source Code with Explicit Line Breaks

<input type="text"
    name="txtName"
    language="JavaScript"
    onclick="return NameValid();">

<input type="text"
    name="txtAddress"
    language="JavaScript"
    onclick="return AddrValid();">

Keep the limitations of printed output in mind as well. Lines longer than 80 characters will often wrap in printed output without consideration for word breaks, making source code very difficult to read.

Standardize Character Case

Source code is easier to read if the developer has applied a consistent set of rules for the use of character case—for example, the use of lower case exclusively for HTML tags. When scanning source code, the reader can unconsciously apply a visual filter, focusing attention on the HTML keywords.

The approach taken in code that appears in this article is to use all lowercase letters for HTML tags and the names of their attributes, while using mixed case and a modified form of Hungarian Notation for some attribute values (see the sidebar entitled "Hungarian Notation").

Hungarian Notation

Hungarian Notation is a convention for naming identifiers that adds a prefix to the name to provide information about the type and scope of the identifier. Dr. Charles Simonyi, a Microsoft Chief Architect at the time, introduced Hungarian Notation in the early 1980s. Long an internal Microsoft standard, the convention has been widely adopted in variant forms outside of Microsoft as well.

As an example of a simplified Hungarian Notation scheme, variables that contain a string could be prefixed with the character s, and a variable with global scope could be indicated with a g prefix. In this case, then, the variables sTemp and gsName in source code would be immediately identifiable as string variables with local and global scope, respectively.

In general, HTML is not a typed language, and Hungarian Notation plays a more important role in other types of Web development. However, in some cases it can add to readability. For example, the names or IDs of form elements are likely candidates for a modified form of Hungarian Notation. The prefix "btn" or "cmd" might be used for an input button. Text boxes might be prefixed with "txt," and check boxes might be prefixed with "chk" or "cb."
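
The sketch below shows how these prefixes might appear together in a page. The variables sTemp and gsName follow the sidebar's example, while the function ShowGreeting and the element names txtGreeting and btnGreet are assumptions made for illustration.

```html
<script type="text/javascript">
  // g = global scope, s = string
  var gsName = "World";

  function ShowGreeting() {
    // s = string, local scope
    var sTemp = "Hello, " + gsName;
    document.getElementById("txtGreeting").value = sTemp;
  }
</script>

<!-- txt = text box, btn = button -->
<input type="text" name="txtGreeting" id="txtGreeting" />
<input type="button" name="btnGreet" id="btnGreet" value="Greet"
    onclick="ShowGreeting();" />
```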

Use Comments Judiciously

Good comments can be invaluable for understanding and maintaining code. However, the unique nature of HTML introduces a trade-off between the value of thorough comments and the efficiency of the Web application.

The Web server reads in the HTML code and sends it as a stream of text over the network to the browser. Only after arriving at the client does the browser parse and interpret the HTML code, displaying the visible elements and ignoring the comments. The obvious implication is that the comments add nothing to the document as the browser displays it, yet they add to the processing overhead on both the server and client computers, and they increase the amount of data transferred. With almost 50 percent comments, Listing 6 illustrates what is probably excessively commented code.

Listing 6: Heavily Commented HTML Code
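
A heavily commented page fragment might look something like the following sketch, in which half of the lines are comments that mostly restate the markup. The form, the terse field names t1, t2, and b1, and the /login URL are illustrative assumptions.

```html
<!-- Begin login form -->
<!-- This form posts the user name and password to the server -->
<form action="/login" method="post">
  <!-- Paragraph that holds the form fields -->
  <p>
    <!-- Text box for the user name -->
    <input type="text" name="t1" />
    <!-- Text box for the password -->
    <input type="password" name="t2" />
    <!-- Button that submits the form -->
    <input type="submit" name="b1" value="Login" />
  </p>
</form>
<!-- End login form -->
```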

The trick is to find an appropriate level of commenting that balances these two issues. It is a good idea to comment the major logical flow and document sections to help readers quickly gain an overview of the code. Also comment dependencies and assumptions. Consistently following the other design and coding guidelines as suggested in this article—especially the ones related to naming and metadata—will help create self-documenting code.

Listing 7 illustrates how fewer comment lines and more descriptive element names can combine to provide effective documentation with a lot less overhead.

Listing 7: Lightly Commented HTML Code
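
A lightly commented fragment might look like this sketch: a single comment states the section's purpose and its one assumption, and descriptive names do the rest. The field names and the /login URL are hypothetical.

```html
<!-- Login form: assumes the server handles POST requests to /login -->
<form action="/login" method="post">
  <p>
    <input type="text" name="txtUserName" id="txtUserName" />
    <input type="password" name="txtPassword" id="txtPassword" />
    <input type="submit" name="btnLogin" id="btnLogin" value="Login" />
  </p>
</form>
```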

Checklist

Use Well-formed HTML

Avoid style attributes in HTML.

All nonempty elements must have corresponding closing tags.

Use lowercase names.

All nested elements must be properly nested, for example:

<p><b>Some text</b></p>

Attribute values, even numeric ones, should be quoted.

Pick Good Names and ID Values

Use a consistent scheme for assigning the value of name and ID properties.

IDs must be unique within a document.

Indent Consistently

Use indentation consistently to enhance the readability of the code

Standardize Character Case

Hungarian Notation is a convention for naming identifiers that adds a prefix to the name to provide information about the type and scope of the identifier, e.g., txt for a text box.

Use Comments Judiciously
