Grauw’s blog

XHTML validation

June 17th, 2004

Mike Davidson writes about invalidation. He introduces a ‘this site does not validate’ image on his site, which has totally worthless code in it. The basic line of thought is that, when browsers start to actually care about such things, you will notice immediately instead of having tiny bits of your site break down one by one... or something. Well then, I thought, let’s put this on


Look what your stupid code did to my site!!

Er, look at it with Mozilla (Firefox) or Opera please ^_^. You’re looking at the regular version without that piece of code right now. The same would apply to the MSX Assembly Page, by the way. In case you are *cough* too lazy *cough* to try this, what happens is that you get a big orange error message screen, and none of the content shows.

The thing is, these sites are made using XHTML 1.1. ‘So what’, you or Mike Davidson might say, ‘my site is made using XHTML as well’. Indeed it is, but the page’s content is still sent using a MIME type of text/html, while the actual MIME type for XHTML is application/xhtml+xml. This is an obvious choice, as Internet Explorer does not understand the latter type at all and offers to download the web page instead. No harm done; XHTML is specifically designed to make this possible.

However, web browsers send information in their request headers about which content types they accept. These sites of mine check whether the XHTML MIME type is accepted (that is, supported), and if so present their content using application/xhtml+xml instead of text/html. Currently, both Mozilla and Opera support this (don’t know about Safari). When they display the site, they use their XML parser instead of their HTML parser.
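To make the check described above a bit more concrete, here is a minimal sketch in Python (this is not the code my sites actually run; the function names accepts_xhtml and choose_content_type are made up for illustration). It assumes the raw value of the browser's Accept header is available as a string, and it only looks for an explicit application/xhtml+xml entry with a quality value above zero.

```python
def accepts_xhtml(accept_header: str) -> bool:
    """Return True if the Accept header explicitly lists application/xhtml+xml with q > 0."""
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if media_type != "application/xhtml+xml":
            continue
        quality = 1.0  # q defaults to 1 when the parameter is absent
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unparsable q value as "not accepted"
        return quality > 0
    return False


def choose_content_type(accept_header: str) -> str:
    """Pick the MIME type to send (hypothetical helper for this example)."""
    if accepts_xhtml(accept_header):
        return "application/xhtml+xml"
    return "text/html"
```

Note that a wildcard such as */* is deliberately not counted as support here, since a browser that only sends the wildcard has not said it can actually handle XHTML. A site that varies its response like this should also send a Vary: Accept header so that caches keep the two versions apart.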

With this comes the strict XML checking, which breaks on Mike’s carefully crafted faulty code. Massively.

Now one might say, ‘then why bother with XHTML’, but despite this there are several advantages of XHTML over HTML... I’ll name a few. First of all, the point of XHTML is that it is compatible with XML and can be processed by standard tools. Anyone with a scripting language that has an XML library available can easily syndicate all content from my site, and if at some point in the future I wanted to extract my blog entries into Word documents I could (theoretically ;p) use XML tools such as XSLT. If my page isn’t well-formed XML, none of those tools will work with it either.
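As a rough illustration of the ‘standard tools’ point, here is a small Python sketch that pulls the entry titles out of an XHTML page with nothing but the standard library’s XML parser. The sample page and the assumption that each entry title is an h2 element are invented for this example; a real site would have its own structure.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical page: the structure (one h2 per entry) is an assumption.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example blog</title></head>
  <body>
    <h2>First entry</h2>
    <p>Some text.</p>
    <h2>Second entry</h2>
    <p>More text.</p>
  </body>
</html>"""


def entry_titles(xhtml: str) -> list:
    """Return the text of every h2 element in the XHTML namespace."""
    root = ET.fromstring(xhtml)  # raises ET.ParseError if the page is not well-formed
    return [h.text or "" for h in root.iter(f"{{{XHTML_NS}}}h2")]


print(entry_titles(page))  # ['First entry', 'Second entry']
```

The same call fails with a parse error on a page that is not well-formed, which is exactly why sloppy markup locks you out of these tools.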

Another reason for using XHTML is that you can easily embed other content defined in XML, such as formulas beautifully rendered using MathML (don’t use IE for that page). Or use XSLT, but this time client-side as a stylesheet. And finally, I’m using XHTML because I am just a sucker for standards :). (Yes, I like Unicode too.)

Actually, this built-in checking is also quite useful to make sure I immediately notice any broken code and can fix it :). However, remember that this is only XML well-formedness checking. It checks for stuff like proper nesting of elements, quoting of attribute values, and escaping of & and < characters. It does not check, however, whether an img tag has an alt attribute. Though now that I mention it... that is expressed in the DTD. Browsers use non-validating XML parsers, so they check that the document is well-formed but do not validate it against the DTD. Ah well.
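The difference is easy to see with any non-validating XML parser. The sketch below uses Python’s standard library as a stand-in for the browser’s parser (an assumption for illustration; browsers obviously use their own), and the helper name is_well_formed and the sample snippets are made up.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML; no DTD validation is performed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


good = '<p>Fish &amp; chips <img src="a.png" /></p>'  # well-formed, even though img lacks alt
misnested = "<p><em>oops</p></em>"                     # elements closed in the wrong order
unquoted = "<p class=intro>hi</p>"                     # attribute value without quotes
bare_amp = "<p>Fish & chips</p>"                       # unescaped ampersand

for sample in (good, misnested, unquoted, bare_amp):
    print(is_well_formed(sample), sample)
```

Only the first snippet passes, and it passes despite the missing alt attribute. Catching that kind of error still takes a validating tool such as the W3C validator.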

Wanting to validate, however, means a lot of extra trouble you have to go through, as discussed several times earlier (forgot link). For the MSX Assembly Page I mentioned earlier, XHTML is a breeze – the content is static, and I can simply tell the other maintainers to write valid code and test in Firefox or else I’ll be angry.

Difficulties start to emerge when you lose control over the content. This happens, for example, with user comments that can have styling applied to them, which I am working on for this site. I suddenly need to make sure that those style tags are opened and closed correctly, that characters are escaped correctly, and that all the other rules are met. And then there is the choice: will I force my visitors to write correctly styled comments, or will I try to make the best of it if they make any errors? The first is obviously the easiest to implement, and the question is how much of a bother it will really be; a comment with broken markup would probably not look the way it was supposed to anyway. ‘And if it doesn’t suit you, then just don’t style’ ;p.
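One possible middle road between the two choices, sketched in Python under a few loud assumptions: the comment system itself is still being written, so this is not its actual code, and the tag whitelist, the wrapping div and the function name render_comment are all hypothetical. The idea is to accept a comment as markup only if it is well-formed and uses nothing but a few allowed tags, and otherwise to fall back to showing it as escaped plain text, so the page as a whole stays well-formed either way.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

ALLOWED_TAGS = {"em", "strong", "code"}  # hypothetical whitelist of style tags


def render_comment(comment: str) -> str:
    """Return a well-formed XHTML fragment for a user comment."""
    fallback = f"<div>{escape(comment)}</div>"  # plain text with &, < and > escaped
    if "<?" in comment or "<!" in comment:
        return fallback  # no processing instructions, comments, CDATA or doctypes
    wrapped = f"<div>{comment}</div>"
    try:
        root = ET.fromstring(wrapped)
    except ET.ParseError:
        return fallback  # not well-formed: show it literally instead of rejecting it
    for element in root.iter():
        if element is root:
            continue
        if element.tag not in ALLOWED_TAGS or element.attrib:
            return fallback  # unknown tag or any attribute: treat as plain text
    return wrapped
```

For example, render_comment("This is <em>nice</em>") keeps the emphasis, while render_comment("This is <em>broken") comes out as literal text. This is a sketch of the well-formedness side only and should not be mistaken for a complete security filter.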

Another thing is JavaScript. I am absolutely not against using it as a means to an end... When used with care, it can really do pretty cool and useful stuff, such as the comment preview system on Mike’s site. Unfortunately, document.write is an absolute no-no in XML. So this nice comment preview system... ain’t gonna happen. The way it is written right now at least. XML definitely increases the complexity of some scripts, depending on what you want to do, and much more so than the switch from HTML to XHTML markup does. Also, there are very few pre-made scripts available which work in a strict XML environment, so you do need to know how to program JavaScript, or know someone who does.
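What an XML-friendly script does instead of writing out strings of markup is build the nodes one by one through the DOM. The sketch below shows that style in Python with the standard library’s minidom rather than in browser JavaScript (a swap made purely for illustration); the browser DOM offers methods with the same names, createElement, createTextNode and appendChild. The function build_preview, the ‘preview’ class and the choice of elements are hypothetical and have nothing to do with how Mike’s preview is actually written.

```python
from xml.dom.minidom import getDOMImplementation


def build_preview(author: str, text: str) -> str:
    """Build a small comment preview as DOM nodes and serialize it."""
    doc = getDOMImplementation().createDocument(None, "div", None)
    preview = doc.documentElement
    preview.setAttribute("class", "preview")

    heading = doc.createElement("strong")
    heading.appendChild(doc.createTextNode(author))  # text nodes need no manual escaping

    body = doc.createElement("p")
    body.appendChild(doc.createTextNode(text))

    preview.appendChild(heading)
    preview.appendChild(body)
    return preview.toxml()


print(build_preview("Grauw", "1 < 2 & 3"))
```

Because the text goes in as text nodes, characters such as < and & are escaped on serialization and the result cannot end up mis-nested, which is the property that string-pasting approaches lack.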

In any case, XHTML is really nice, but XML and in particular validation don’t make life a perfect and smooth experience. They make it more difficult in the sense that you simply must follow the rules more strictly. But the way I see it, it really is just a matter of making my web site code ‘better’ (I am referring to both the HTML and the PHP scripting). It is basically the difference between ‘it works’ and ‘it works correctly’. Not producing valid code is IMHO a bit of lazy programming. In ‘real’ programming (think C++, C#, ASM, etc.) there are exact rules too which have to be followed to the letter or else it won’t work, so why should I ignore those rules when it comes to web design? Hence, I don’t (or try not to).

If you don’t agree with that, fine, that is a choice you make. I can understand that not everyone is used to the strict regime of programming, and that somewhat looser rules are appealing to many. It also makes creating web pages easier to learn (though at the same time teaching bad habits). What would already make me personally very happy is having a valid doctype present on your site, preferably XHTML markup or at least HTML Strict, and a careful eye to keep your code structured, not use tables for layout unless really really necessary, and keep things more or less validating using the occasional W3C validator check (alt IS important for disabled people). And the occasional misstep – I won’t miss a night’s sleep for it.

(semi)Trackback: ::[ Empty Spaces ]:: ›› Validation and Webstandards