Four formats wrestle with each other for web glory. By their acronyms we shall know them: HTML, XML, JSON, RDF. Sometimes they clash, and sometimes they merge, forming weird and wonderful hybrids. Is there any way for them to work together? I will talk about the problems of mixing models and describe how we are using these formats together in legislation.gov.uk.
1. Collisions, Chimera and Consonance in Web Content
Jeni Tennison
Sunday, 5 February 12
I suggested talking about microdata & RDFa, or about my work on legislation.gov.uk, and got the
reply "yes, all of that!" It was kinda hard to see how to bring them together, so I've had to go
large-scale...
2. what is the web? hypermedia = HTML
http://www.flickr.com/photos/believekevin/6490737589 from believekevin
In the beginning, the web was about hypertext, and shortly afterwards hypermedia: individual
pages of simple content whose revolutionary power lay not in a well-thought-out,
semantic document structure, but in the fact that they contained links.
3. what is the web? structured documents = XML
http://www.flickr.com/photos/marcus_hansson/87885327 from Marcus Hansson
People with SGML experience thought that the web could provide even more value if it was
not limited to a single, not particularly meaningful, language. This led to the birth and
energetic childhood of XML, one spent doing everything it could possibly do and more.
4. what is the web? (meta)data = RDF
http://www.flickr.com/photos/proimos/6033969880 from Alex E. Proimos
Around the same time, others had the notion that the web was not just for providing
documents, but for providing metadata about those documents, and data about things like
people and traffic and buildings, which gave us RDF and another stack of technologies.
5. what is the web? applications = JSON
Meanwhile computers got faster and web sites became about providing valuable services to
their users rather than access to either documents or data. The focus of web sites turned to
interaction, and to applications. Concise, application-specific messages, easy to use with
JavaScript, meant JSON.
6. four formats, different answers
HTML, JSON, XML, RDF
So we have ended up with four formats with which you can deliver content on the web, each
arising from a different view of what the web is.
7. each format has strengths and weaknesses
- HTML: lingua franca; hard to get wrong
- JSON: application-native data; concise
- XML: single source format; flexible
- RDF: web-native data; graph model
Each format has advantages, and so each looks at the others' advantages jealously:
HTML's ubiquity
XML's flexibility and ease of parsing
RDF's reach into the real world
JSON's practicality
One result is ghettoisation: "you should not exist! you have no point! I am all that's needed!"
Another result is self-doubt: "what am I here for? what should I be?"
8. I wanna be like you ... or you should be more like me
Another result is merged technologies: ones that seek to gain the benefits of two or more
formats.
"If we make RDF more like HTML, perhaps people will use it"
"If you turned that crappy JSON into XML, perhaps I might use it"
9. hybrid technologies: chimera
core formats: HTML, JSON, XML, RDF
hybrids: microdata, XHTML, RDFa, JSON-LD, RDF/XML
These hybrid technologies are chimera, constructed from constituent parts of two or more
technologies.
How people judge chimera depends on their background and experience with the
technologies that have been merged.
10. looks a bit stupid but it's cute underneath
11. you can put lipstick on a pig but it's still a pig
12. serendipity something new and wonderful
Sometimes, of course, you might get something wonderful and new in its own right.
Like XSLT! :)
13. chimera are usually ugly foolish or impossible fantasies
The original Chimera was a monster made from a lion, goat and snake.
The term now means a foolish or impossible fantasy.
The trouble with chimera is that when you dress up one format as another, the result seldom has
the advantages of either. To pick the worst offender, RDF/XML is a horrible way to express
RDF, because URLs aren't native in XML, and a horrible pattern for XML because its variability
makes it difficult to process with XML tools.
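The variability point can be sketched in a few lines of Python, using only the standard library. The two documents and the example.org URL are invented for illustration. Both are valid RDF/XML for the same single triple, yet a query written against the XML structure of one finds nothing in the other.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Serialisation A: the property is written as a child element.
DOC_A = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>Example</dc:title>
  </rdf:Description>
</rdf:RDF>
""".strip()

# Serialisation B: the same triple, written as a property attribute.
DOC_B = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://example.org/doc" dc:title="Example"/>
</rdf:RDF>
""".strip()


def titles_by_element(xml_text):
    """Collect dc:title values that appear as child elements."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{{{DC_NS}}}title")]


def titles_by_attribute(xml_text):
    """Collect dc:title values that appear as attributes."""
    root = ET.fromstring(xml_text)
    key = f"{{{DC_NS}}}title"
    return [el.attrib[key] for el in root.iter() if key in el.attrib]


if __name__ == "__main__":
    print(titles_by_element(DOC_A), titles_by_element(DOC_B))
    print(titles_by_attribute(DOC_A), titles_by_attribute(DOC_B))
```

An XML tool has to anticipate every permitted spelling of the triple, which is what makes the pattern awkward to process without a real RDF parser.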
14. are chimera the only approach?
Are these hybrid technologies the only way of gaining the advantages that the different core
technologies offer?
15. being different is fine if you can work together
Or should we think of these four technologies as being like the members of the A-Team? (I'm
not going to say which I think is who, except RDF is obviously Murdock.)
What does that mean?
- recognise and appreciate their respective strengths and weaknesses; don't try to make one
do what another can do better
- also understand their similarities: a common language, a common goal
16. legislation.gov.uk access and interaction
The public legislation.gov.uk site is built on an XML stack: MarkLogic database, Orbeon pipelines & XSLT,
producing HTML or XHTML.
We are now working on an editorial site to enable experts to help the government team get and keep
legislation up to date. New requirements:
- flexibility in expressing & querying data about relationships between parts of legislation:
we need RDF
- dynamic and interactive site that supports a task: we need JSON
But we don't need chimera: we need JSON designed for JSON, and RDF as RDF, and XML as
XML.
17. leaves and branches named with URLs
What enables them to work together well is what the web really is: URLs that name and
address resources.
URLs enable hand-off. When XML structures are named with URLs, JSON and RDF can point to
document content stored in XML. They provide a common reference point, a common
language.
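As a rough sketch of that hand-off, the Python below keeps content in XML, a task message in JSON and a relationship as an RDF-style triple, with a shared URL as the only glue between them. Everything here is an assumption made for illustration: the example.org URLs, the `uri` attribute, the `amends` property and the `resolve` helper are hypothetical, not the actual legislation.gov.uk design.

```python
import json
import xml.etree.ElementTree as ET

BASE = "http://example.org/act/2010/15"  # hypothetical URL scheme
SECTION_1 = BASE + "/section/1"
SECTION_2 = BASE + "/section/2"

# XML holds the document content; each structure is named with a URL.
XML_STORE = ET.fromstring(f"""
<act uri="{BASE}">
  <section uri="{SECTION_1}"><title>First section</title></section>
  <section uri="{SECTION_2}"><title>Second section</title></section>
</act>
""".strip())

# RDF-style triples (subject, predicate, object) describe relationships.
TRIPLES = [
    (SECTION_2, "http://example.org/vocab#amends", SECTION_1),
]

# An application-specific JSON message points at the same URL.
TASK_JSON = json.dumps({"task": "review", "target": SECTION_1})


def resolve(url):
    """Return the XML element named by this URL, or None."""
    for el in XML_STORE.iter():
        if el.get("uri") == url:
            return el
    return None


def amended_by(url):
    """Return the subjects of triples whose object is this URL."""
    return [s for (s, _p, o) in TRIPLES if o == url]


if __name__ == "__main__":
    task = json.loads(TASK_JSON)
    section = resolve(task["target"])
    print(section.find("title").text, amended_by(task["target"]))
```

None of the three formats has to embed or imitate another; each only has to agree on the URL.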
18. consonance through URLs: weak, flexible links
HTML, JSON, XML, RDF, joined by URLs
URLs that address structures within formats help those formats to be used together. They can
be used for their strengths, without being compromised.
19. common micro-syntaxes: consonance
- languages
- data types
- URLs
- link relations
- content types
URLs are one example of a common language or micro-syntax, used within the core
technologies.
The formats have problems working together when these common languages are not really
common.
- URLs in HTML != IRIs used in XML or RDF
- datatypes in HTML != those defined in XML Schema != those used in RDF (particularly
date/times)
- link relations in HTML != those used in Atom != those used in RDFa
These mismatches cause friction, and the most gnarly problems in dealing with microdata
and RDFa differences are caused by them. But then, no team is perfect.
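The URL/IRI mismatch can be illustrated with a small Python sketch. It assumes an invented example.org address, and the `to_url` helper is hypothetical and far simpler than the full IRI-to-URI mapping. RDF compares identifiers as plain strings, so an IRI containing a non-ASCII character and its percent-encoded URL form count as different names unless something normalises them first.

```python
from urllib.parse import quote

# The same address, written as an IRI and as a percent-encoded URL.
IRI_FORM = "http://example.org/caf\u00e9"
URL_FORM = "http://example.org/caf%C3%A9"


def to_url(identifier):
    """Percent-encode non-ASCII characters as UTF-8 (simplified mapping).

    '%' is kept as a safe character so identifiers that are already
    encoded are not encoded twice.
    """
    return quote(identifier, safe=":/?#=&%")


def same_name_as_strings(a, b):
    """Plain string comparison, with no normalisation."""
    return a == b


def same_name_normalised(a, b):
    """Comparison after mapping both identifiers to URL form."""
    return to_url(a) == to_url(b)


if __name__ == "__main__":
    print(same_name_as_strings(IRI_FORM, URL_FORM))
    print(same_name_normalised(IRI_FORM, URL_FORM))
```

Whichever side does the normalising, both sides need to agree on it, which is the sense in which the common language has to be truly common.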
20. closing thoughts
A strong theme of this conference is reflecting on the role of XML on the web.
XML had an over-achieving youth, where it thought it could do everything, and the realisation
that it can't is perhaps a little painful.
We are right to reflect on where we are, and what we want to become.
21. the web is varied complex, dynamic, beautiful
A monoculture web would not survive. The web thrives because it is a diverse ecosystem,
hosting 800lb gorillas and tiny mice with long long tails.
22. so much beneath the crust core qualities != surface qualities
The web is also more than what you see, and it's a mistake to think that only the outwardly
visible parts matter. Without the structures below the crust, it would implode.
Assess XML's role in that context.
23. what changes make sense? chimera or consonance
http://www.flickr.com/photos/randyread/1007678907 from Randy Read
Another theme here is XML's relationship with other technologies, the use of XML
technologies with non-XML formats and how XML might change in the future.
We should be asking:
- are these chimera? are they beautiful new things, or pigs in lipstick?
- do these changes make XML better at what it does, or just not as bad at doing what
something else already does better?
- does this help XML work better in concert with other technologies?
XML will not improve by trying to be someone else, but by working better in the team of web
technologies: by doing its job well, and by communicating well with the others.