XHTML explained

This tutorial was written and contributed by Dev Articles.

I’m sure you’ve heard people mumbling about XHTML and how it combines HTML and XML to create the "next generation" of HTML. In this article, I’m going to give you a quick rundown of what XHTML is, its benefits, how it can be used, and what it looks like.

I will assume that you are familiar with Hypertext Markup Language (HTML), which is the fundamental building block of the web. You should also know what XML and DTDs (Document Type Definitions) are. I will build on this foundation by comparing and contrasting the current version of HTML (version 4.0) with the latest version of XHTML, version 1.0.

XHTML Explained

XHTML is HTML "reformulated" to conform to the current Extensible Markup Language (XML) standard, version 1.0. Imagine taking the best parts from the HTML language and mixing them with all of the great aspects of XML… then you’re coming close to imagining the power and flexibility of XHTML.

XHTML has a much stricter syntax than HTML, however. To be fully valid, an XHTML document must follow these rules:

All tags must be closed

With normal HTML documents, some browsers will still render the contents of a <table> even if you don’t close the table with a </table> tag. This allows developers to become lazy and forgetful. The tags within an XHTML document must always be nested correctly and closed properly.

Take the following HTML 4.0-compliant table:

<table width="100%">
<tr>
<td>
<p><b>Welcome to my page
</td>
</tr>
</table>
<hr>

you can see straight away that the <p>, <b>, and <hr> tags aren’t closed. This is a big no-no for XHTML documents and will raise a parser error, because all tags must be closed (yes, even the <p> tag).

The XHTML 1.0 compliant version of the table shown above looks like this:

<table width="100%">
<tr>
<td>
<p><b>Welcome to my page</b></p>
</td>
</tr>
</table>
<hr />

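Notice that the <p>, <b>, and <hr> tags are now closed. Empty elements such as <hr>, which have no content, are closed by adding a forward slash before the final bracket, like this: <hr />. The space before the slash is not required by XML, but it helps older HTML browsers read the tag correctly.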
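To see what "raise a parser error" means in practice, you can feed both versions of the table to an XML parser. The sketch below assumes Python 3 and its standard-library xml.etree.ElementTree module; it checks only well-formedness (every tag closed and properly nested), not validity against the XHTML DTD, and the helper name is_well_formed is hypothetical. Each snippet is wrapped in a <div> because an XML document must have a single root element.

```python
import xml.etree.ElementTree as ET

# The HTML 4.0 table: <p>, <b> and <hr> are never closed.
HTML4_SNIPPET = """<table width="100%">
<tr>
<td>
<p><b>Welcome to my page
</td>
</tr>
</table>
<hr>"""

# The XHTML 1.0 version: every element is closed and properly nested.
XHTML_SNIPPET = """<table width="100%">
<tr>
<td>
<p><b>Welcome to my page</b></p>
</td>
</tr>
</table>
<hr />"""


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML (hypothetical helper).

    The fragment is wrapped in a <div> because an XML document
    must have exactly one root element.
    """
    try:
        ET.fromstring("<div>" + fragment + "</div>")
        return True
    except ET.ParseError:
        return False


print(is_well_formed(HTML4_SNIPPET))  # False: <p> and <b> are still open at </td>
print(is_well_formed(XHTML_SNIPPET))  # True
```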

Attributes must contain quoted values

Attribute values, such as the one in <p align="center">, must always be enclosed in quotes; XHTML no longer lets you leave them unquoted. Double quotes are the usual convention, although XML also accepts single quotes. Attributes that are normally written without a value (known as "minimized" attributes), such as

<option selected>1</option>

must also be written out in full, using the attribute's own name as its quoted value:

<option selected="selected">1</option>

All element and attribute names must also be lowercase, because XML is case-sensitive and the XHTML DTD defines them in lowercase.
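A similar quick check, again assuming Python 3's xml.etree.ElementTree and a hypothetical parses helper, shows which attribute forms an XML parser rejects. A parser cannot enforce the lowercase rule by itself (that comes from the XHTML DTD), but it does treat <P> and </p> as different, mismatched names.

```python
import xml.etree.ElementTree as ET


def parses(fragment: str) -> bool:
    """Return True if the fragment is well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Each fragment is paired with whether an XML parser should accept it.
cases = {
    '<p align=center>Hi</p>': False,                 # unquoted attribute value
    '<p align="center">Hi</p>': True,                # quoted value
    '<option selected>1</option>': False,            # minimized attribute
    '<option selected="selected">1</option>': True,  # written out in full
    '<P>Hi</p>': False,                              # names are case-sensitive
}

for fragment, expected in cases.items():
    result = parses(fragment)
    print(f"{fragment!r}: {'ok' if result else 'parse error'}")
    assert result == expected
```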

Be careful with special characters

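Because XHTML documents are processed by XML parsers, be careful with comments and inline code. An XML parser is allowed to silently discard the contents of a comment like this:

<!-- This is a comment -->

so the old HTML trick of hiding inline style sheets and JavaScript inside comments may not work, and characters such as < and & inside an inline script or style block are treated as markup. The safest approach is to store style sheets and scripts in separate .css and .js files respectively, and reference them like this: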

<link rel="stylesheet" type="text/css" href="mystyle.css" />

for style sheets, or:

<script type="text/javascript" src="mystuff.js"></script>

for JavaScript files.
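The sketch below illustrates the two risks of inline code when a document is handled as XML. It uses Python 3's xml.etree.ElementTree as a stand-in for an XML-based user agent, which is an assumption: real browsers differ in the details. ElementTree's default parser drops comments, so a style sheet hidden inside one disappears, and a bare < in an inline script stops the document from parsing at all. The go() call in the script is a made-up placeholder.

```python
import xml.etree.ElementTree as ET

# A style sheet "hidden" inside a comment, the way older HTML pages did it.
HIDDEN_STYLE = """<style type="text/css"><!--
p { color: red; }
--></style>"""

# ElementTree's default parser skips comments entirely,
# so the style rules vanish from the parsed tree.
style = ET.fromstring(HIDDEN_STYLE)
print(repr(style.text))  # None

# An inline script that uses "<" is not even well-formed XML.
INLINE_SCRIPT = '<script type="text/javascript">if (a < b) { go(); }</script>'
try:
    ET.fromstring(INLINE_SCRIPT)
except ET.ParseError as err:
    print("parse error:", err)
```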

If you're using special characters such as <, > and & in attribute values or text, replace them with their corresponding entity references: "&lt;", "&gt;" and "&amp;" respectively.
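Python's standard library can do this escaping for you. In the sketch below, which assumes Python 3 and uses a made-up URL, xml.sax.saxutils.escape replaces &, < and > with their entity references, quoteattr additionally wraps the value in quotes, and parsing the result with xml.etree.ElementTree turns the entity back into a plain character.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw = "search.php?q=fish&chips"  # hypothetical URL containing a bare "&"

# A bare "&" inside an attribute value is not well-formed XML.
try:
    ET.fromstring('<a href="' + raw + '">Search</a>')
except ET.ParseError as err:
    print("parse error:", err)

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
print(escape(raw))  # search.php?q=fish&amp;chips

# quoteattr() escapes the value and adds the surrounding quotes.
link = ET.fromstring("<a href=" + quoteattr(raw) + ">Search</a>")

# The parser turns the entity back into a plain "&".
print(link.get("href"))  # search.php?q=fish&chips
```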

Last but not least, use the "id" attribute rather than "name" to identify elements. XHTML 1.0 formally deprecates "name" on the a, applet, form, frame, iframe, img and map elements, and it is slated for removal in later versions of XHTML. Form controls such as <input> and <select> are the exception: they still need "name" so that their values are submitted with the form.
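If you are converting existing pages, a small script can flag the elements that still rely on "name". The sketch below is a hypothetical helper, not a standard tool: it assumes Python 3, parses the markup with xml.etree.ElementTree, and checks the elements whose "name" attribute XHTML 1.0 deprecates (a, applet, form, frame, iframe, img and map). Form controls such as <input> are deliberately left alone, and a namespace prefix like {http://www.w3.org/1999/xhtml} is stripped from tag names before comparing.

```python
import xml.etree.ElementTree as ET
from typing import List

# Elements whose "name" attribute XHTML 1.0 deprecates in favour of "id".
DEPRECATED_NAME = {"a", "applet", "form", "frame", "iframe", "img", "map"}


def deprecated_name_uses(xhtml: str) -> List[str]:
    """Return the local tag names that still carry a deprecated "name"."""
    root = ET.fromstring(xhtml)
    found = []
    for el in root.iter():
        # Strip a namespace prefix such as "{http://www.w3.org/1999/xhtml}".
        local = el.tag.rsplit("}", 1)[-1]
        if local in DEPRECATED_NAME and "name" in el.attrib:
            found.append(local)
    return found


page = """<div>
  <a name="top">Top</a>
  <img id="logo" src="logo.gif" alt="Logo" />
  <form id="search" action="search.php">
    <input type="text" name="q" />
  </form>
</div>"""

print(deprecated_name_uses(page))  # ['a']
```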