Internetworking 3.1

WORKSHOP

Extensible HyperText Markup Language (XHTML)
Pawan Vora, pawan@nextag.com
NexTag.com

Many of you have probably heard that HTML 4.01 (http://www.w3.org/TR/html4/) is the last HTML standard coming out of the W3C (World Wide Web Consortium) and that future standards will focus on XML. But you might also have heard that Extensible HyperText Markup Language (XHTML) 1.0 became a W3C Recommendation on January 26, 2000 (see http://www.w3.org/TR/xhtml1/). So what's the difference between HTML and XHTML, and why this new standard?

To answer those questions, let's consider what's happening in the Web development world.

  1. Cross-browser cross-version compatibility problems
    I am not suggesting that browser companies shouldn't innovate and offer Web designers new and better ways to develop Web pages. However, designing pages that work across browsers and browser versions is painful. Every time a company releases a new browser or a new version, we have to find out what new tags it introduced, how we can use those tags to enhance the user experience, and what "tweaks" we will need to make so that visitors using other browsers, or older versions, still have a pleasant experience.

    With XHTML, the W3C is offering a way to accommodate such extensions to the markup language, allowing old and new markup to coexist without requiring users to upgrade their browsers. New additions to the language will no longer break previous versions, because each new set of tags carries its own identification. All you need to do is tell the browser, through what is called a "namespace", which vocabulary the new tags belong to, so that it can interpret them accordingly. I will show you how this works a little later.

  2. Increasing penetration of hand-held devices and Internet appliances
    I know you don't like browsers that require 5-6 MB downloads. But consider for a minute why that's the case. Here's a hint: if you forget the closing </table> tag, Internet Explorer still renders the table correctly, but Netscape Navigator doesn't. So there's your answer. To make browsers forgive such markup errors, programmers have to add error-recovery code so that the page still renders correctly. Now imagine squeezing such bloated programs onto small hand-held devices and Internet appliances; the hardware and software requirements would be prohibitive. It is also quite likely that, by necessity, these new Internet devices will be less forgiving of illegal syntax. Wouldn't it be nice if we could send a smaller subset of "cleaner" HTML code to such devices and have it correctly interpreted by their microbrowsers?

To summarize, with XHTML the W3C is trying to achieve the following:

  • Allow extensibility in markup languages.
  • Ensure cleaner markup.
  • Make it possible to support a wide variety of devices.

How is XHTML different from HTML?

XHTML is essentially HTML 4 rewritten as an application of XML 1.0, with some useful XML features added. Current versions of HTML, by contrast, are defined as applications of SGML (Standard Generalized Markup Language). Recasting HTML as an XML application helps achieve both goals: extensibility and "cleaner" markup. If you are not familiar with XML, I'd suggest first reading "Extensible Markup Language: Why the Fuss?"

So let's see how we can extend existing HTML code with new markup designed specifically for mathematical expressions (MathML). Here's an example from the W3C specification:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>A Math Example</title>
  </head>
  <body>
    <p>The following is MathML markup:</p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <apply> <log/>
        <logbase>
          <cn> 3 </cn>
        </logbase>
        <ci> x </ci>
      </apply>
    </math>
  </body>
</html>

In the example above, the markup between <math> and </math> is MathML, which is not part of the HTML specification. It can nevertheless be used in an XHTML document because its namespace is declared with the xmlns (XML namespace) attribute. That declaration tells an XHTML-compliant browser that the tags within <math> and </math> belong to a different vocabulary from the rest of the document.

Let's take a simpler example to better understand the significance of namespaces. In an HTML document, the <title> tag denotes the title of the Web page. Suppose that in the same document we also wanted a <title> tag for a book title. How would a browser know that these two <title> tags mean different things? We would enclose the book's <title> tag in an element with its own namespace for books, for example a made-up one such as <book xmlns="http://www.w3.org/1998/Book/BookML">. That way the browser can tell the two <title> tags apart. That's it! That's what namespaces are for: they give the browser the context it needs to avoid conflicts caused by the same tag name being used for different purposes.
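To see a namespace-aware parser keep the two <title> tags apart, here is a small sketch using Python's standard xml.etree.ElementTree module. Python is just my choice for illustration; XHTML doesn't require it. The BookML namespace is the made-up one from the paragraph above, and the page and book titles are invented sample data. ElementTree reports each tag as {namespace}name, so the two titles end up with different names.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
BOOKML = "http://www.w3.org/1998/Book/BookML"  # made-up namespace from the text

doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>My Reading List</title></head>
  <body>
    <book xmlns="http://www.w3.org/1998/Book/BookML">
      <title>Moby-Dick</title>
    </book>
  </body>
</html>"""

root = ET.fromstring(doc)

# Every element name is qualified by its namespace: {namespace-uri}local-name.
titles = [(elem.tag, elem.text) for elem in root.iter() if elem.tag.endswith("}title")]
for tag, text in titles:
    print(tag, "->", text)
# {http://www.w3.org/1999/xhtml}title -> My Reading List
# {http://www.w3.org/1998/Book/BookML}title -> Moby-Dick

# A prefix map lets us ask for either title specifically.
page_title = root.find(".//h:title", {"h": XHTML}).text
book_title = root.find(".//bk:title", {"bk": BOOKML}).text
print(page_title)  # My Reading List
print(book_title)  # Moby-Dick
```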

Writing XHTML: Discipline is good!

If you're like me, you'd like to start writing XHTML as soon as possible. So here goes. First, a warning: because XHTML is an application of XML, you will have to be very disciplined about how you write your markup. Gone are the days when you could forget an end tag and count on a friendly browser to interpret the page correctly anyway.

Here are the syntactical rules that XHTML documents will need to follow:

1. All the element and attribute names must be in lower case.
Because XML is case-sensitive, XHTML, as an application of XML, is also case-sensitive, and the XHTML DTDs define all element and attribute names in lower case. Therefore, all XHTML tags and attributes must be written in lower case.

Incorrect:
<HR NOSHADE SIZE="3">
Correct:
<hr noshade="noshade" size="3" />

This is not good news for the many HTML authors who write tags in upper case to improve readability. It is also a problem for WYSIWYG HTML editors that generate uppercase tags.
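Note that XML by itself does not forbid uppercase names; it only treats <P> and <p> as different tags. Lowercase is required because the XHTML DTDs define their names that way, so a plain XML parser won't object to <P> unless the end tag's case differs. Here's a sketch of a hypothetical lint, uppercase_names(), written in Python with the standard ElementTree module, that reports element and attribute names that aren't lowercase:

```python
import xml.etree.ElementTree as ET

def local_name(name):
    """Strip a '{namespace}' prefix, which may legitimately contain capitals."""
    return name.rsplit("}", 1)[-1]

def uppercase_names(markup):
    """Hypothetical lint: list element and attribute names that aren't lowercase."""
    root = ET.fromstring(markup)
    problems = []
    for elem in root.iter():  # document order
        tag = local_name(elem.tag)
        if tag != tag.lower():
            problems.append(tag)
        for attr in elem.attrib:
            name = local_name(attr)
            if name != name.lower():
                problems.append(name)
    return problems

print(uppercase_names('<P CLASS="note">Hi</P>'))  # ['P', 'CLASS']
print(uppercase_names('<p class="note">Hi</p>'))  # []

# XML is case-sensitive: <P> ... </p> is a mismatched pair, not just bad style.
try:
    ET.fromstring("<P>Hi</p>")
    mixed_case_ok = True
except ET.ParseError:
    mixed_case_ok = False
print(mixed_case_ok)  # False
```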

2. All attribute values must be quoted
This is true even if the attribute value is a number. XML accepts either double quotes (") or single quotes ('), but double quotes are the common convention.

Incorrect:
<hr noshade="noshade" size=3 />
Correct:
<hr noshade="noshade" size="3" />

3. All non-empty elements must have end tags (for empty elements, see rule 4 below)

Incorrect:
<li>My Item 1
Correct:
<li>My Item 1</li>

4. Empty elements such as <hr> must have a trailing slash
Empty elements are those that are complete by themselves and have no content. For example, <hr> is an empty element, but <h1> is not; a heading (content) goes between <h1> and </h1>. Other empty elements include <br> and <img src="...">.

Incorrect:
<hr noshade="noshade" size="3">
Correct:
<hr noshade="noshade" size="3" />

Notice that I've put a space before the trailing slash. Although XHTML (and XML) doesn't require it, the space is important for backward compatibility with existing HTML browsers. I'll explain why a little later.
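As it happens, Python's standard ElementTree serializer follows the same convention when it writes empty elements: it emits a space before the trailing slash. This minimal sketch (Python and ElementTree are my choice for illustration) also shows that an <hr> without the slash breaks well-formedness once it sits inside another element.

```python
import xml.etree.ElementTree as ET

# Parse an empty element written with the trailing slash but no space.
hr = ET.fromstring('<hr noshade="noshade" size="3"/>')

# ElementTree serializes empty elements with a space before the slash.
hr_text = ET.tostring(hr, encoding="unicode")
br_text = ET.tostring(ET.Element("br"), encoding="unicode")
print(hr_text)  # <hr noshade="noshade" size="3" />
print(br_text)  # <br />

# Without the slash, <hr> is never closed, so </div> becomes a mismatched tag.
try:
    ET.fromstring('<div><hr size="3"></div>')
    closed_ok = True
except ET.ParseError:
    closed_ok = False
print(closed_ok)  # False
```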

5. All elements must nest correctly and not overlap.
This is based on the XML requirement that the document be well-formed.

Incorrect:
<b>Are you <i>there?</b></i>
Correct:
<b>Are you <i>there?</i></b>

6. Minimized attributes must be stated as an attribute-value pair.
In HTML, boolean attributes such as nowrap and checked can be used by themselves; their presence means the value is "true." In XHTML, however, every attribute must have a value, so a boolean attribute takes its own name as its value.

Incorrect:
<td nowrap>
Correct:
<td nowrap="nowrap">
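A quick way to check your own markup against rules 2, 3, 5, and 6 is to hand it to an XML parser, which rejects anything that isn't well-formed. The sketch below uses Python's standard ElementTree module and a hypothetical helper, is_well_formed(). The incorrect/correct pairs are adapted from the examples above, wrapped in a parent element where needed. Keep in mind that this checks well-formedness only, not validity against an XHTML DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if markup parses as XML, False on a syntax error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

# (rule, incorrect, correct), adapted from the examples above.
PAIRS = [
    ("2: quote attribute values", '<hr size=3 />', '<hr size="3" />'),
    ("3: close every tag", "<ul><li>My Item 1</ul>", "<ul><li>My Item 1</li></ul>"),
    ("5: nest, don't overlap",
     "<p><b>Are you <i>there?</b></i></p>",
     "<p><b>Are you <i>there?</i></b></p>"),
    ("6: no minimized attributes", "<td nowrap>x</td>", '<td nowrap="nowrap">x</td>'),
]

for rule, bad, good in PAIRS:
    # Expect False for the incorrect form and True for the correct one.
    print(rule, is_well_formed(bad), is_well_formed(good))

# XML also accepts single-quoted attribute values.
print(is_well_formed("<hr size='3' />"))  # True
```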

7. Include a DOCTYPE declaration in every document
If you have not been including a DOCTYPE declaration in your HTML documents, you're not alone, and most browsers do not require it. A DOCTYPE declaration tells the browser which DTD (Document Type Definition) the document uses; a DTD describes the set of elements, attributes, and rules that a document follows. Because browsers simply assumed that every Web page was HTML, the DOCTYPE declaration was, in practice, redundant. With XHTML, however, you must specify the DTD that your document uses. XHTML 1.0 offers three DTDs: Strict, Transitional, and Frameset.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">

Use the Strict declaration when you do all of your formatting with Cascading Style Sheets (CSS), that is, when you aren't relying on <font> tags or on <table> layouts to control the presentation of your page.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "DTD/xhtml1-transitional.dtd">

Use the Transitional declaration when you are not doing all of your formatting with CSS. Most of us aren't, because we don't want users with non-CSS browsers to have a compromised experience.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "DTD/xhtml1-frameset.dtd">

Use the Frameset declaration when your page uses frames, that is, for documents built around a <frameset>.
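The three flavors are distinguished by the public identifier in the DOCTYPE. As a sketch, assuming Python's standard xml.dom.minidom module (which records the DOCTYPE without fetching the DTD), a hypothetical xhtml_flavor() helper could report which one a page declares:

```python
from xml.dom import minidom

# Public identifiers of the three XHTML 1.0 DTDs.
FLAVORS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "transitional",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "frameset",
}

def xhtml_flavor(markup):
    """Hypothetical helper: return the declared XHTML 1.0 flavor, or None."""
    doc = minidom.parseString(markup)
    if doc.doctype is None:
        return None  # no DOCTYPE declaration at all
    return FLAVORS.get(doc.doctype.publicId)

page = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"DTD/xhtml1-strict.dtd">\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">'
        '<head><title>Test</title></head><body></body></html>')

print(xhtml_flavor(page))  # strict
print(xhtml_flavor('<html xmlns="http://www.w3.org/1999/xhtml"/>'))  # None
```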

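8. Wrap embedded scripts and style declarations in CDATA sections
In XHTML, the content of the script and style elements is parsed as ordinary character data, so a < or & inside your code would be treated as the start of markup. To prevent that, surround an embedded script (or style sheet) with an XML CDATA section, as shown below, or move it to an external file:

<script type="text/javascript">
<![CDATA[
function calcTotal() {
  // ...
}
]]>
</script>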
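To see why the CDATA section matters, here is a sketch in Python using the standard ElementTree module. The script line is a made-up example containing < and &. Without the CDATA section the parser rejects it; with it, the code comes back unchanged as the element's text.

```python
import xml.etree.ElementTree as ET

def parses(markup):
    """Return True if markup is well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

code = "if (a < b && c) { go(); }"  # made-up script containing < and &

raw = '<script type="text/javascript">' + code + '</script>'
wrapped = '<script type="text/javascript"><![CDATA[' + code + ']]></script>'

print(parses(raw))  # False: "<" and "&" are read as the start of markup
script_text = ET.fromstring(wrapped).text
print(script_text == code)  # True: the CDATA content comes back verbatim
```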

Making XHTML Compatible with Existing Browsers

Although most of the current crop of browsers do not support XHTML, that doesn't mean we can't start writing XHTML-style markup now. If you take some care in how you write HTML today, you'll be closer to XHTML compatibility when browsers do start supporting it and requiring XHTML from Web authors.

Some of these compatibility guidelines are listed below:

  • Make all tags lowercase.

  • Make sure that you nest all your tags correctly as explained in the previous section.

  • Include a space before the trailing slash of empty elements.
    So write <hr /> instead of <hr/>. If you've been writing HTML for some time, you know that browsers simply ignore tags and attributes they don't "understand." When a browser sees <hr/>, with no space between "hr" and the trailing slash, it may not recognize the tag and ignore it. Putting a space before the trailing slash solves that problem.

  • Do not change <p> to <p />. Instead, use <p>...</p>.
    Although many of us use the paragraph tag <p> as an empty presentation element (to add an extra line of space above or below a paragraph of text), it is actually a non-empty element that marks the beginning of a new paragraph. Therefore, it must have a closing </p> tag. So be careful not to replace all your <p> tags with <p />.

  • Avoid line breaks and multiple whitespace characters within attribute values.
    Browsers handle extra spaces and line breaks in attribute values inconsistently, so it's safer not to put any spaces or line breaks inside them.

    Incorrect:
    <a name="    section6  ">
    Correct:
    <a name="section6">

  • For anchors, use both the "name" and "id" attributes.

    XHTML 1.0 deprecates the name attribute on anchors in favor of the id attribute for identifying elements. Existing browsers, however, still look for name when following a link to a fragment such as #section6. So when creating an anchor, give it both attributes with the same value.

    Incorrect:
    <a name="section6">
    Correct:
    <a name="section6" id="section6">

    Expect this shift to affect JavaScript code that relies on the name attribute to identify elements in the Document Object Model.

  • Give boolean attributes explicit values.
    So instead of writing <td nowrap>, write <td nowrap="nowrap">. And instead of writing <input type="checkbox" checked>, write <input type="checkbox" checked="checked" />.
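Two of these guidelines are easy to check mechanically. The following is a rough sketch of a hypothetical compat_warnings() helper, using Python regular expressions, that flags empty-element tags with no space before the trailing slash and paragraphs written as <p />. Regular expressions are only an approximation here; they can be fooled by comments, scripts, or attribute values containing < or >.

```python
import re

# An empty-element tag whose "/>" is not preceded by whitespace, e.g. <br/>.
NO_SPACE = re.compile(r"<[a-zA-Z][^<>]*?(?<!\s)/>")
# A paragraph written as an empty element, e.g. <p /> or <p/>.
EMPTY_P = re.compile(r"<p\s*/>")

def compat_warnings(markup):
    """Hypothetical helper: flag two of the compatibility problems listed above."""
    warnings = []
    for match in NO_SPACE.finditer(markup):
        warnings.append("no space before '/>': " + match.group(0))
    for match in EMPTY_P.finditer(markup):
        warnings.append("use <p>...</p> instead of " + match.group(0))
    return warnings

for warning in compat_warnings('<hr /><br/><p />'):
    print(warning)
# no space before '/>': <br/>
# use <p>...</p> instead of <p />
```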

Testing Correctness of XHTML Syntax

Undoubtedly, we will make mistakes when creating XHTML-compatible documents, and figuring out where we went wrong in our syntax isn't the most satisfying job in the world. Until HTML editors catch up, consider using HTML Tidy, developed by Dave Raggett. It's available as a free download at http://www.w3.org/People/Raggett/tidy/
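If you just want to know where your markup stops being well-formed, any XML parser can tell you. Here's a minimal sketch in Python (my choice for illustration; the helper first_error() is hypothetical) that reports the line and column of the first syntax error. Unlike Tidy, it doesn't fix anything or check the document against the XHTML DTD.

```python
import xml.etree.ElementTree as ET

def first_error(markup):
    """Hypothetical helper: (line, column, message) of the first XML error, or None."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None

page = """<ul>
  <li>One</li>
  <li>Two
</ul>"""

problem = first_error(page)
print(problem)  # reports line 4, where </ul> doesn't match the open <li>
print(first_error("<ul><li>One</li></ul>"))  # None
```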

Just to get used to writing XHTML, I tried to follow XHTML conventions on this page. If you're interested in how different its markup looks, view the source of this document and you'll get the idea. For the record, I ran the HTML Tidy program built into Allaire HomeSite 4.5.1 to make sure I did it right. Interestingly, it reported 5 errors, all of them caused by my not including the summary attribute in my table tag.

And here's the reason it gave:

The table summary attribute should be used to describe the table structure. It is very helpful for people using non-visual browsers. The scope and headers attributes for table cells are useful for specifying which headers apply to each table cell, enabling non-visual browsers to provide a meaningful context for each cell.

Very cool!

What else is new with XHTML?

One important extension to the basic HTML form elements is called FML, or XForms. I may explain it in an upcoming workshop. In the meantime, if you are interested in learning more, visit the W3C page on XForms at http://www.w3.org/MarkUp/Forms/.


If you have any questions or suggestions about these workshops, feel free to send me an e-mail at pvora@mindspring.com.


© Internet Technical Group
Last update: April 30, 2000
URL: http://www.sandia.gov/itg/newsletter/mar00/workshop_xhtml.html
hosted by Sandia National Labs

Disclaimer: Neither Sandia Corporation, the United States Government, nor any agency thereof, nor any of their employees makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately-owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by Sandia Corporation, the United States Government, or any agency thereof. The views and opinions expressed herein do not necessarily state or reflect those of Sandia Corporation, the United States Government or any agency thereof.