May 13, 2002
Combining XML and XHTML
Last Friday, I demonstrated how the CSS display property can be used to force HTML elements to be rendered as block-level or inline elements. In the example, I used a series of nested SPAN elements and forced them to be displayed as block elements via the HTML class attribute.
More than one astute reader pointed out to me that this approach, while legal, is discouraged by the W3C:
CSS gives so much power to the "class" attribute, that authors could conceivably design their own "document language" based on elements with almost no associated presentation (such as DIV and SPAN in HTML) and assigning style information through the "class" attribute. Authors should avoid this practice since the structural elements of a document language often have recognized and accepted meanings and author-defined classes may not. -- CSS2 Specification 5.8.3
It’s probably best to avoid using CSS classes to override the semantic structure of markup, especially where long-established document types like HTML are concerned. There is, however, another option: embedding XML directly into HTML.
The ability to combine non-HTML elements with XHTML is limited to a handful of browsers, specifically IE5.x, Mozilla and Netscape 6.2. Hopefully, as browsers continue to evolve, we’ll see more powerful implementations that allow web developers to do crazy things like render SVG and MathML directly in HTML pages, without a plug-in. For now, we’re limited to what the browser can do: styling custom XML elements with CSS.
First, ya gotta start with an XML document. scottandrew.com is valid XHTML 1.0 Strict, which means it is also valid XML. This enables me to take advantage of XML namespaces to combine my XHTML with other non-HTML element types.
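Because XHTML has to be well-formed XML, any XML parser can read it, while tag-soup HTML (say, an unclosed br tag) is rejected. Here is a minimal sketch using Python’s standard-library ElementTree as a stand-in XML processor; the helper name and both sample snippets are made up for illustration:

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Hypothetical snippets: XHTML-style empty element vs. HTML-style unclosed tag.
xhtml_snippet = '<p xmlns="http://www.w3.org/1999/xhtml">Thou must restore.<br/></p>'
html_snippet = '<p>Thou must restore.<br></p>'

print(is_well_formed(xhtml_snippet))  # True
print(is_well_formed(html_snippet))   # False: </p> closes an open <br>
```

Well-formedness is only the first step; it’s what lets a namespace-aware processor make sense of the document at all.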
View the source of scottandrew.com, and you’ll see this:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
The xmlns attribute binds the default namespace for this element to the XHTML namespace URI, http://www.w3.org/1999/xhtml. This tells any interested processor (like a browser) that the HTML element and all the elements it contains belong to that namespace, unless they declare a namespace of their own.
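To see what that means in practice, here is a small sketch, again with Python’s ElementTree playing the interested processor. It parses a cut-down page (only the HTML start tag comes from the real site; the HEAD and BODY content is invented) and lists each element’s expanded name, which ElementTree writes as {namespace-URI}local-name:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Cut-down, hypothetical page; only the html start tag mirrors scottandrew.com.
page = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>scottandrew.com</title></head>
<body><p>Hello</p></body>
</html>"""

root = ET.fromstring(page)

# iter() walks the root element and all of its descendants in document order.
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)
```

Only the HTML element carries an xmlns attribute, yet every element prints as {http://www.w3.org/1999/xhtml}something.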
Now, let’s look at the sample XML from Thursday’s post:
<quote>
<character>PROSPERO:</character>
<text>
For you, most wicked sir, whom to call brother<br/>
Would even infect my mouth, I do forgive<br/>
Thy rankest fault, — all of them; and require<br/>
My dukedom of thee, which perforce, I know,<br/>
Thou must restore.
</text>
</quote>
If I were to insert this directly into my XHTML page, the browser would not recognize the QUOTE, CHARACTER and TEXT elements and would display their contents without special formatting; browsers typically ignore elements that aren’t part of HTML.
So, let’s declare a namespace for these XML elements by placing an xmlns attribute in the root QUOTE element. This namespace will apply to the QUOTE element and to all elements contained within it:
<quote xmlns="http://www.scottandrew.com">
This particular namespace value is arbitrary. There are guidelines for declaring namespaces, but the bottom line is that a namespace name should be a unique and persistent URI. (Check out how the W3C’s XHTML namespace URI points to a placeholder page.)
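Nesting the QUOTE fragment (trimmed to one line of verse) inside an XHTML page shows how the inner declaration takes over. This is a sketch of what a namespace-aware XML parser (Python’s ElementTree) reports, not of browser rendering, and the split_tag helper is hypothetical. One consequence worth noticing: a default namespace applies to every unprefixed descendant, so the br inside TEXT lands in the custom namespace too, not in XHTML.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SA_NS = "http://www.scottandrew.com"

page = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<quote xmlns="http://www.scottandrew.com">
<character>PROSPERO:</character>
<text>Thou must restore.<br/></text>
</quote>
</body>
</html>"""

def split_tag(tag: str) -> tuple[str, str]:
    """Split ElementTree's '{uri}local' notation into (uri, local)."""
    uri, _, local = tag[1:].partition("}")
    return uri, local

root = ET.fromstring(page)
resolved = [split_tag(el.tag) for el in root.iter()]
for uri, local in resolved:
    print(f"{local:<10} {uri}")
```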
Once the namespace is declared, I can apply CSS to the non-HTML elements. Simply declare each element as either block or inline as appropriate. Not all browsers can handle this mix of elements; the following example should only work in Netscape 6.2 or Mozilla (IE5 and Opera users should see unstyled text):
For you, most wicked sir, whom to call brother
Would even infect my mouth, I do forgive
Thy rankest fault, — all of them; and require
My dukedom of thee, which perforce, I know,
Thou must restore.
Here’s the CSS used in the above example:
/* Browsers have no built-in styles for these elements,
   so each one gets an explicit display value. */
quote {
  display: block;
  background-color: #eeeeee;
  border: 1px outset #cccccc;
  margin: 10px 50px;
  color: #990000;
  font-size: 12px;
  font-family: "Courier New", sans-serif;
  padding: 10px;
}
character {
  display: inline;
  font-weight: bold;
}
text {
  display: block;
}
Admittedly, an XHTML document that uses additional XML namespaces does not strictly conform to the XHTML spec, so the example above makes my markup invalid. In addition, IE5.x (dunno about 6.0) doesn’t acknowledge additional namespaces declared on child elements inside the default namespace. To be compatible with both IE5.x and Mozilla, I need to move my additional namespace declaration up into the HTML element.
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:sa="http://www.scottandrew.com" …
Declaring all namespaces in the root element of a document is fairly common practice. Notice, however, that I’ve associated a prefix, sa, with my custom namespace. To use elements bound to my namespace, I must prefix each element name with sa and a joining colon:
<sa:quote>
<sa:character>PROSPERO:</sa:character>
<sa:text>
For you, most wicked sir, whom to call brother<br/>
Would even infect my mouth, I do forgive<br/>
Thy rankest fault, — all of them; and require<br/>
My dukedom of thee, which perforce, I know,<br/>
Thou must restore.
</sa:text>
</sa:quote>
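Parsing the prefixed version shows two things: sa:quote resolves to the same expanded name as a QUOTE element under xmlns="http://www.scottandrew.com", since the prefix is only shorthand for the URI, and the unprefixed br now stays in the XHTML namespace. A sketch with Python’s ElementTree (the quote is trimmed to one line); the dictionary passed to find() maps a prefix of the caller’s choosing, here a hypothetical x, to the URI, and it need not match the document’s own sa prefix:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SA_NS = "http://www.scottandrew.com"

page = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:sa="http://www.scottandrew.com">
<body>
<sa:quote>
<sa:character>PROSPERO:</sa:character>
<sa:text>Thou must restore.<br/></sa:text>
</sa:quote>
</body>
</html>"""

root = ET.fromstring(page)
tags = [el.tag for el in root.iter()]

# Look the quote up by namespace URI; the "x" prefix exists only in this query.
ns = {"x": SA_NS}
quote = root.find(".//x:quote", ns)
character = quote.find("x:character", ns)
print(character.text)  # PROSPERO:
print(tags[-1])        # {http://www.w3.org/1999/xhtml}br
```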
The namespace prefix is particularly handy for avoiding name collisions between elements that come from different vocabularies but share a name. For example, I may want to use a custom TITLE element to contain the title of a movie. A namespace prefix would prevent my TITLE element from conflicting with the HTML TITLE element.
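The TITLE case can be sketched the same way. The page text and the movie title below are invented for illustration; what matters is that ElementTree keeps the two TITLE elements apart because their expanded names differ:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SA_NS = "http://www.scottandrew.com"

# Hypothetical page mixing the HTML TITLE with a custom sa:title for a movie.
page = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:sa="http://www.scottandrew.com">
<head><title>scottandrew.com</title></head>
<body><p>Tonight: <sa:title>Forbidden Planet</sa:title></p></body>
</html>"""

root = ET.fromstring(page)

# Same local name, different namespaces: each query matches only its own TITLE.
html_titles = [el.text for el in root.iter("{%s}title" % XHTML_NS)]
movie_titles = [el.text for el in root.iter("{%s}title" % SA_NS)]
print(html_titles)   # ['scottandrew.com']
print(movie_titles)  # ['Forbidden Planet']
```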
Now I have to go back to my CSS and add the prefix to the element names there. In a selector, a bare colon introduces a pseudo-class (as in a:hover), so the colon has to be escaped with a backslash:
sa\:quote {
  display: block;
  background-color: #eeeeee;
  border: 1px outset #cccccc;
  margin: 10px 50px;
  color: #990000;
  font-size: 12px;
  font-family: "Courier New", sans-serif;
  padding: 10px;
}
…etc.
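Since every prefixed selector needs the same backslash before the colon, the escaped selectors can be generated instead of typed by hand. A hypothetical helper in Python (css_type_selector and the rule table are illustrations, not part of the original stylesheet), covering only the display declarations from the unprefixed stylesheet earlier in this post:

```python
def css_type_selector(qname: str) -> str:
    """Turn a prefixed name like 'sa:quote' into a CSS type selector."""
    # Backslash-escape the colon so CSS reads it as part of the element name.
    return qname.replace(":", "\\:")

# Hypothetical table of display values for the prefixed elements.
display = {"sa:quote": "block", "sa:character": "inline", "sa:text": "block"}

css = "\n".join(
    f"{css_type_selector(name)} {{ display: {value}; }}"
    for name, value in display.items()
)
print(css)
```

This prints one rule per element, starting with sa\:quote { display: block; }.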
Unfortunately, the W3C validator doesn’t like this setup either. If validation is still important at this point, I might consider changing my document type from XHTML to pure XML and declaring both the XHTML and custom namespaces in whatever root element I choose. That’s a little far to go for a Web page, so for now I’ll be satisfied with embedding XML fragments in my XHTML, rather than the other way around.
With this approach, I can create a custom set of elements with meaningful structure, without having to resort to CSS classes to override the established meaning and functionality of DIV and SPAN.