The sad saga of XHTML:
what happens when markup geeks
HTML’s early days
•Tim Berners-Lee: great humanitarian, LOUSY document analyst/content modeler.
•HTML was supposedly designed for journal articles in physics.
•Based on your experience modeling articles... was it EVER gonna work for that?
•Early HTML was extremely crude markup.
•Crude in “structure.”
•Crude in appearance (as implemented in web browsers).
•Practically nonexistent interactivity. Documents just sat there; you couldn’t DO anything on the web except read documents and (every once in a while) click
Result: tag soup!
•“Tag soup”: a markup geek’s disrespectful term for lousy markup
•“Tag abuse”: a markup geek’s disrespectful term for using a tag for some reason other than its structural appropriateness
•Early HTML pages: lots and LOTS of both!
•“Who put <font> tags in my nice neat structural markup?!”
•“What the heck are you doing with table markup?! Stop that!”
•“OMG VALIDATE YOUR HTML, WILL YOU?” “Validation? What’s that?”
•Absolutely terrible for accessibility
•Even tag-abusing HTML won’t make pretty web pages. Or interactive ones.
•partly due to browsers needing to consume nearly completely incompatible, bizarre, or just plain WRONG markup
•partly due to some browser implementors (MICROSOFT) trying to take over the brand-new Web
So the W3C said “Stop.”
•And the W3C said “Use the CSS which we have made for thee to separate structure from presentation.”
•And there was much (well, some) rejoicing!
•And the W3C said “Make not tag soup, but use the stricter XHTML syntax, and validate thy documents.”
•And web designers said “NOPE.”
•(Adoption of XHTML on the web was essentially zero, except among those who were already starting from XML.)
•And after a long time, the W3C said “... okay, FINE. Here’s HTML5, then. You want to be sloppy with tags?
•Human beings are very bad at:
•Checking their work
•This means that human beings are VERY BAD AT MAKING XML.
•We have a whole course on this at SLIS for a reason! It’s pretty hard for most people to learn on their own!
•Demanding XML from most human beings is a loser’s game!!!!!!!!!!!!!
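To make “bad at making XML” concrete: a strict XML parser refuses a document at the first mistake, while a lenient HTML tokenizer shrugs and keeps going. The sketch below is a minimal illustration using only Python’s standard library; the sample markup strings are made up, and `html.parser` is a simple tokenizer, not a browser-grade HTML5 parser with full error recovery.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Made-up "tag soup": unclosed <p>s, a bare <br>, and misnested <b>/<i>.
TAG_SOUP = "<div><p>Hello<br>world<p><b><i>oops</b></i></div>"

# The same content as well-formed, XHTML-style markup.
WELL_FORMED = "<div><p>Hello<br/>world</p><p><b><i>oops</i></b></p></div>"


def is_well_formed_xml(markup: str) -> bool:
    """Return True only if a strict XML parser accepts the whole string."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class StartTagCollector(HTMLParser):
    """Lenient tokenizer: records start tags and never rejects the input."""

    def __init__(self) -> None:
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def lenient_start_tags(markup: str) -> list:
    """Return the start tags an HTML tokenizer sees, mistakes and all."""
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


if __name__ == "__main__":
    print(is_well_formed_xml(TAG_SOUP))     # False: one misnested tag sinks it
    print(is_well_formed_xml(WELL_FORMED))  # True
    print(lenient_start_tags(TAG_SOUP))     # the HTML side just keeps reading
```

The asymmetry is the whole story: one misnested tag is fatal to XML, and humans produce misnested tags all the time.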
Why am I telling you this?
•Because Very Smart People keep making the Very Stupid Mistake of demanding XML from human beings.
•National Science Digital Library: OAI-PMH
•Many, many server-based software packages
•Library supply chain/Impelsys: ONIX from indie/self-publishers (http://
•This never, ever works out well!
•GO YE AND DO NOT DO LIKEWISE.
Postel’s law: a better way
•Be conservative in what you do, be liberal in what you accept from others.
•Often reworded as “Be conservative in what you send, be liberal in what you accept.”
•If you want XML:
•Get the data in a way the other people are comfortable with.
•Plan on having to clean it up. (Automate that as best you can!)
•Turn it into XML yourself.
•Seems like a hassle because it is... but it’s the ONLY THING THAT ALWAYS WORKS.
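Here is what “get it in a comfortable format, clean it up, turn it into XML yourself” might look like, as a minimal Python sketch under stated assumptions: the contributor sends a spreadsheet exported as CSV, and the column names (`title`, `author`, `isbn`), element names, and sample rows are all hypothetical. The cleanup is deliberately simple (trim whitespace, strip ISBN punctuation, skip blank rows) and does not validate ISBN check digits.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical spreadsheet export: stray whitespace, a raw ampersand,
# angle brackets, inconsistent ISBN punctuation, and a blank row.
RAW_CSV = """title,author,isbn
  Cats & Dogs ,Jane Doe,978-0-00-000000-2
,,
Tables <Not> for Layout,  J. Smith ,978 0 00 000001 9
"""

FIELDS = ("title", "author", "isbn")  # hypothetical column names


def clean_row(row):
    """Trim every field, keep only ISBN digits/X, and drop all-empty rows."""
    cleaned = {field: (row.get(field) or "").strip() for field in FIELDS}
    if not any(cleaned.values()):
        return None  # blank spreadsheet row: nothing to convert
    cleaned["isbn"] = "".join(
        ch for ch in cleaned["isbn"] if ch.isdigit() or ch in "Xx"
    ).upper()
    return cleaned


def rows_to_xml(csv_text: str) -> str:
    """Build the XML ourselves; ElementTree escapes &, <, and > for us."""
    root = ET.Element("records")
    for row in csv.DictReader(io.StringIO(csv_text)):
        cleaned = clean_row(row)
        if cleaned is None:
            continue
        record = ET.SubElement(root, "record")
        for field in FIELDS:
            ET.SubElement(record, field).text = cleaned[field]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(rows_to_xml(RAW_CSV))
```

Nobody had to type an angle bracket or remember to escape the ampersand; because a program generates the markup, the output is well-formed every time, which is the point of doing the conversion yourself.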
This presentation is available under a Creative Commons Attribution 4.0 International license.