Web-scale IA using Linked Open Data


A preview of the talk I'll be giving at the 2014 IA Summit in San Diego. An introduction to the web of data, and how the BBC and other organisations create products which remix original content with third-party data.

Published in: Technology

  1. 1. @MikeAtherton | #IAS14 WEB-SCALE IA USING LINKED OPEN DATA Mike Atherton RedUXD
  2. 2. @MikeAtherton | #IAS14 Tim Berners-Lee 1989
  3. 3. @MikeAtherton | #IAS14 Defining standards • Use a common format for publishing documents (HTML) • Use a common system of addresses to identify and locate documents (URL) • Establish a method of contextual linking between documents (HREF hyperlink)
  4. 4. @MikeAtherton | #IAS14 Actual feedback on Tim Berners-Lee’s proposal
  6. 6. @MikeAtherton | #IAS14 What wonderful things we wrote for people!
  7. 7. @MikeAtherton | #IAS14 As humans we can extract meaning and context from documents automatically. Spot the difference.
  8. 8. @MikeAtherton | #IAS14 The context of keywords doesn’t travel with them. Tag: “Apple”
  9. 9. @MikeAtherton | #IAS14 We can pick out the important things and relationships just by reading. For humans, the distinction between documents and data is subtle.
  10. 10. @MikeAtherton | #IAS14 By defining real-world things, we can teach computers the relationships between those things. Computers need to be told which things our documents contain.
  11. 11. @MikeAtherton | #IAS14 If a computer knows what ‘Mount Everest’ is and what ‘tall’ means, it can do the legwork for us. “How tall is Mount Everest?”
  12. 12. @MikeAtherton | #IAS14 By understanding terms and linking to data services, computers can even find out things they don’t know. “Where can I get a beer?”
  13. 13. @MikeAtherton | #IAS14 Actual queries from Facebook’s Graph Search tool. Cross-referencing data points gives new insight. http://actualfacebookgraphsearches.tumblr.com/
  14. 14. @MikeAtherton | #IAS14 TED conference 2009
  15. 15. @MikeAtherton | #IAS14 Use web addresses to represent real-world things Tim Berners-Lee Rule #1 of data publishing
  16. 16. @MikeAtherton | #IAS14 Return useful data about each resource, in a standard format. Tim Berners-Lee Rule #2 of data publishing
  17. 17. @MikeAtherton | #IAS14 Include links to other data, so people can discover more things. Tim Berners-Lee Rule #3 of data publishing
  18. 18. @MikeAtherton | #IAS14 Linked data • Use web addresses to represent real-world things. • Return useful data about each resource, in a standard format. • Include links to other data, so people can discover more things. Data sources combined create more insight than studying them separately.
  19. 19. @MikeAtherton | #IAS14 Researchers attempting to discover new drugs to treat Alzheimer’s Disease. “Which proteins are involved in signal transduction AND are related to pyramidal neurons?” Web search: 223,000 results, 0 answers. Linked healthcare data query: 32 results, 32 answers.
  20. 20. @MikeAtherton | #IAS14 Linked data helps us make sense of information. We can even create entirely new value propositions from remixing existing content. data.gov.uk. Newspaper: hyper-local news publishing. Land Registry Price Paid: historical property data. Voter power: local constituency data.
  21. 21. A MINIMUM VIABLE PRESENTATION FOR IA SUMMIT 2014 @MikeAtherton | #IAS14 CONTENT MODELS AT WEB-SCALE Where next for your content model?
  22. 22. THEME PARKS CONTENT MODEL Entities: Location, Resort, Park, Hotel, Weenie, Land, Meal, Restaurant, Attraction, Character, Creator, Work. Relationships: locatedIn, hasWeenie, ParentResort, locatedIn, locatedIn, hasEvent, features, adaptationOf, adaptationOf, hasAttraction, CreatedBy, appearsIn, hasPark, contains.
  23. 23. @MikeAtherton | #IAS14 Use http web addresses to represent real-world things. But ideally, those addresses should offer robot-readable data. http://disneyland.disney.go.com/attractions/disneyland/haunted-mansion/ http://www.geonames.org/ontology#locatedIn http://en.wikipedia.org/wiki/New_Orleans_Square The Haunted Mansion (is) located in New Orleans Square
  24. 24. @MikeAtherton | #IAS14 The Resource Description Framework is the web’s lingua franca for data integration. RDF lets different data sources play nice together. <subject> <predicate> <object> <Charles Dickens> <is the author of> <Great Expectations>
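The subject, predicate, object shape on this slide can be tried out without any RDF tooling. What follows is only a minimal sketch in plain Python: the `Triple` type, the `match` helper and the human-readable labels are illustrative assumptions rather than part of any RDF library, and the extra "Oliver Twist" statement is added purely so the pattern match returns more than one result.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical, human-readable labels stand in for real URIs here.
GRAPH = [
    Triple("Charles Dickens", "is the author of", "Great Expectations"),
    Triple("Charles Dickens", "is the author of", "Oliver Twist"),
    Triple("Great Expectations", "is a", "Book"),
]


def match(graph: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    # "What did Charles Dickens write?"
    for t in match(GRAPH, subject="Charles Dickens", predicate="is the author of"):
        print(t.obj)
```

Leaving any position as a wildcard is the same idea that SPARQL variables build on later in the deck.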
  25. 25. @MikeAtherton | #IAS14 RDF can be written in different ways. RDF is an abstract syntax, so all these ‘serialisations’ are equivalent.
 N-Triples: <http://dbpedia.org/resource/Charles_Dickens> <http://dbpedia.org/ontology/author> <http://dbpedia.org/resource/Great_Expectations> . <http://dbpedia.org/resource/Charles_Dickens> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Person> .
 RDF/XML: <?xml version="1.0" encoding="utf-8"?> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <dbpedia-owl:Person xmlns:dbpedia-owl="http://dbpedia.org/ontology/" rdf:about="http://dbpedia.org/resource/Charles_Dickens"> <dbpedia-owl:author rdf:resource="http://dbpedia.org/resource/Great_Expectations"/> </dbpedia-owl:Person> </rdf:RDF>
 Turtle: @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . <http://dbpedia.org/resource/Charles_Dickens> a <http://dbpedia.org/ontology/Person> ; <http://dbpedia.org/ontology/author> <http://dbpedia.org/resource/Great_Expectations> .
  26. 26. @MikeAtherton | #IAS14 Your data BBC New York Times
  27. 27. @MikeAtherton | #IAS14 Wikipedia is great for defining individual concepts. But only for humans! What if we had a common way to define a concept for a robot?
  28. 28. @MikeAtherton | #IAS14 DBpedia is Wikipedia for robots. It turns Wikipedia content into machine-readable linked data.
  29. 29. @MikeAtherton | #IAS14 Ok computer… • When did Disneyland first open? • What is its official homepage? • Who operates the park? • When is it open? • What’s its theme?
  30. 30. @MikeAtherton | #IAS14 MusicBrainz is the open music encyclopedia. It crowdsources music metadata for use by humans and robots.
  31. 31. @MikeAtherton | #IAS14 By saying our concept is the ‘same as’ an accepted identifier, we all speak the same language. Shared identifiers act as intermediaries. http://dbpedia.org/resource/China
  32. 32. A MINIMUM VIABLE PRESENTATION FOR IA SUMMIT 2014 @MikeAtherton | #IAS14 BBC MUSIC How BBC Music used linked data to get more people listening to the radio.
  33. 33. @MikeAtherton | #IAS14 How can I find out which one I should listen to? 10 national BBC radio stations.
  34. 34. @MikeAtherton | #IAS14 A continuously-updated record of every song played on-air. The BBC radio playout system was a data goldmine.
  35. 35. @MikeAtherton | #IAS14 Linked data builds a composite picture of the world. The sources combine to create a new and useful product. Which song is playing on the radio now? Who is this artist? What other stuff has this artist done? What TV or radio clips of this artist do we have?
  36. 36. @MikeAtherton | #IAS14
  37. 37. @MikeAtherton | #IAS14 By exposing common identifiers, your website becomes its own API. Maintaining identifiers in your URIs makes playing with your stuff easier. http://www.bbc.co.uk/music/artists/4d2956d1-a3f7-44bb-9a41-67563e1a0c94 http://musicbrainz.org/artist/4d2956d1-a3f7-44bb-9a41-67563e1a0c94
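Because both addresses on this slide end in the same MusicBrainz identifier, hopping from one site to the other is little more than string handling. The sketch below assumes the two URL patterns shown on the slide still hold; `extract_mbid` and `bbc_to_musicbrainz` are hypothetical helper names, not part of any BBC or MusicBrainz API.

```python
import re
from typing import Optional

# MusicBrainz identifiers are UUIDs: 8-4-4-4-12 hexadecimal characters.
MBID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def extract_mbid(url: str) -> Optional[str]:
    """Pull the shared identifier out of a URL, or return None if absent."""
    found = MBID_PATTERN.search(url.lower())
    return found.group(0) if found else None


def bbc_to_musicbrainz(bbc_url: str) -> Optional[str]:
    """Rebuild the MusicBrainz artist address from a BBC Music artist address.

    The target pattern is copied from the slide and is an assumption here.
    """
    mbid = extract_mbid(bbc_url)
    if mbid is None:
        return None
    return f"http://musicbrainz.org/artist/{mbid}"


if __name__ == "__main__":
    print(bbc_to_musicbrainz(
        "http://www.bbc.co.uk/music/artists/4d2956d1-a3f7-44bb-9a41-67563e1a0c94"
    ))
```

No lookup table is needed: the identifier carried in the URI is the join key.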
  38. 38. @MikeAtherton | #IAS14 Linked data lets you use the web as a content management system. Creating an artist profile on MusicBrainz automatically creates a BBC Music artist page. freshonthenet.co.uk/musicbrainz/
  39. 39. @MikeAtherton | #IAS14 Filling in the blanks • The BBC knows when it's played a record by Tom Waits • MusicBrainz knows all the records Tom Waits ever released • DBpedia knows Tom Waits is from San Diego • DBpedia knows Blink-182 are also from San Diego • The BBC knows when it's played a record by Blink-182
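The chain of facts on this slide amounts to a join across sources that share identifiers. As a rough sketch only: the two in-memory structures below stand in for the BBC playout data and DBpedia, hold nothing beyond the facts stated on the slide, and use hypothetical names throughout.

```python
from typing import Dict, List, Set

# Stand-in for the BBC playout record: artists the BBC has played.
BBC_PLAYED: Set[str] = {"Tom Waits", "Blink-182"}

# Stand-in for DBpedia: where each artist is from, as stated on the slide.
DBPEDIA_HOMETOWN: Dict[str, str] = {
    "Tom Waits": "San Diego",
    "Blink-182": "San Diego",
}


def played_artists_from_same_place(artist: str,
                                   hometowns: Dict[str, str],
                                   played: Set[str]) -> List[str]:
    """Other artists from the same place that have also been played on air."""
    place = hometowns.get(artist)
    if place is None:
        return []
    return sorted(
        other for other, town in hometowns.items()
        if town == place and other != artist and other in played
    )


if __name__ == "__main__":
    print(played_artists_from_same_place("Tom Waits", DBPEDIA_HOMETOWN, BBC_PLAYED))
```

Neither source alone can answer "who else from this artist's home town have we played?"; the answer only appears once the two are combined.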
  40. 40. @MikeAtherton | #IAS14 The ‘SPARQL Protocol and RDF Query Language’ lets us query linked data as easily as a local database. If you want linked data magic, try SPARQL. SQL: Centralised relational queries SPARQL: Distributed graph queries
  41. 41. @MikeAtherton | #IAS14 SPARQL query: ‘Who wrote Great Expectations?’
 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
 PREFIX dbpprop: <http://dbpedia.org/property/>
 SELECT ?s ?title ?author WHERE {
   ?s rdf:type dbpedia-owl:Book .
   ?s dbpedia-owl:author ?author_uri .
   ?author_uri dbpedia-owl:birthName ?author .
   ?s dbpprop:name ?title .
   FILTER (REGEX(STR(?title), "Great Expectations", "i"))
 }
 PREFIX lines: the places where the terms we’ll use are defined.
 SELECT: bring back the stuff that matches what I’m about to say…
 WHERE: …which is the birth name of anything said to be the author of a book…
 FILTER: …but only if that book is titled ‘Great Expectations’
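A SPARQL query like the one on this slide is usually sent to an endpoint as an ordinary HTTP request, with the query text in a `query` parameter. The sketch below only builds the request URL and makes no network call. Assumptions are flagged in the comments: that DBpedia's public endpoint lives at `http://dbpedia.org/sparql`, and that it accepts a `format` parameter, which is a convention of that endpoint rather than part of the SPARQL protocol.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: the public DBpedia SPARQL endpoint address.
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"

# The query text from the slide, unchanged apart from layout.
QUERY = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT ?s ?title ?author WHERE {
  ?s rdf:type dbpedia-owl:Book .
  ?s dbpedia-owl:author ?author_uri .
  ?author_uri dbpedia-owl:birthName ?author .
  ?s dbpprop:name ?title .
  FILTER (REGEX(STR(?title), "Great Expectations", "i"))
}"""


def build_request_url(endpoint: str, query: str,
                      result_format: str = "application/sparql-results+json") -> str:
    """Encode the query as a GET request URL.

    'query' is the standard parameter name; 'format' is an assumed,
    endpoint-specific extra for choosing the result serialisation.
    """
    params = urlencode({"query": query, "format": result_format})
    return f"{endpoint}?{params}"


if __name__ == "__main__":
    print(build_request_url(DBPEDIA_ENDPOINT, QUERY))
```

The printed URL can be pasted into a browser or fetched with any HTTP client; whether this particular query still returns rows depends on the current state of the DBpedia data.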
  42. 42. @MikeAtherton | #IAS14 The ontology defines how everything hangs together. The various sources that make up BBC Wildlife Finder are structured according to the Wildlife Ontology. http://dbpedia.org/resource/Giant_panda http://www.bbc.co.uk/programmes/p00k3nx http://www.bbc.co.uk/news/world-asia-china-24784767 http://worldwildlife.org/species/giant-panda http://www.iucnredlist.org/details/712/0 http://www.bbc.co.uk/ontologies/wildlife/2010-11-04.shtml
  43. 43. A MINIMUM VIABLE PRESENTATION FOR IA SUMMIT 2014 @MikeAtherton | #IAS14 ONTOLOGIES Creating, publishing, and consuming the words that help us mean what we say.
  44. 44. @MikeAtherton | #IAS14 BBC Wildlife Ontology http://www.bbc.co.uk/ontologies/wildlife/2010-11-04.shtml
  45. 45. @MikeAtherton | #IAS14 BBC News attempted to model news coverage to better represent how events are related. Ontologies start with a high-level understanding of the IA.
  46. 46. @MikeAtherton | #IAS14 The chronological chain of events and the graph of supporting coverage are modelled to aid understanding. News updates connect in sequence to a storyline.
  47. 47. @MikeAtherton | #IAS14 BBC News give journalists the tools to tag stories with web-scale identifiers as they write. Articles and storylines are tagged with people, places, and subjects.
  48. 48. @MikeAtherton | #IAS14 News Storylines model http://purl.org/ontology/storyline
  49. 49. @MikeAtherton | #IAS14 Detail from the News Storyline ontology http://purl.org/ontology/storyline
  50. 50. @MikeAtherton | #IAS14 The New York Times offers identifiers for people, places, and subjects. Used by NYT to aggregate news stories around themed topic pages.
  51. 51. @MikeAtherton | #IAS14 Mashing up sources of data can yield playful or surprisingly useful results. Linked open data weaves tales of the unexpected.
  52. 52. @MikeAtherton | #IAS14 Earn your stars! • Make your stuff available under an open license • Make it available as structured data • Use non-proprietary formats • Use URIs to identify things • Link your data to other data to provide context
  53. 53. @MikeAtherton | #IAS14 Linked Open Data cloud 2007
  54. 54. @MikeAtherton | #IAS14 Linked Open Data cloud 2011
  55. 55. A MINIMUM VIABLE PRESENTATION FOR IA SUMMIT 2014 @MikeAtherton | #IAS14 FIRST STEPS WITH LINKED DATA This all sounds awesome! Now what?
  56. 56. @MikeAtherton | #IAS14 1. Mark up content with RDFa • RDFa is RDF embedded into HTML code to state our subject, predicate, and object. • Typically:
 <subject>: The page we’re adding the markup to
 <predicate>: The verb, as defined by an external vocabulary
 <object>: The external URI we’re expressing a relationship to
  57. 57. @MikeAtherton | #IAS14 RDFa in action
 <div class="vote2013-council-meta" resource="http://www.bbc.co.uk/news/politics/councils/[GSSID]">
 <div vocab="http://iptc.org/std/rNews/2011-10-07#" rel="about" resource="http://www.bbc.co.uk/things/[GUID]#id">
 <div vocab="http://www.w3.org/2002/07/owl#" rel="sameAs" resource="http://opendatacommunities.org/id/[COUNCIL-TYPE]/[COUNCIL-NAME]"></div>
 <div vocab="http://www.bbc.co.uk/ontologies/politics#" rel="governsGSS" resource="http://statistics.data.gov.uk/id/statistical-geography/[GSSID]"></div>
 </div>
 </div>
 Define which city council this page is about. Define what we mean by ‘about’ using the rNews vocabulary. State that the city council we’re talking about is the same one referenced at Open Data Communities. Using our own ontology, state that this council governs a region identified on data.gov.uk. Thanks to @r4isstatic for this example!
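To see what a robot gets out of markup like this, the relationships can be pulled back out with Python's standard `html.parser`. This is a deliberately simplified sketch and not a conformant RDFa processor: it ignores subject resolution and vocabulary inheritance, and simply joins each element's own `vocab` and `rel` into a predicate URI. The class and function names are hypothetical, and the bracketed placeholders are kept exactly as on the slide.

```python
from html.parser import HTMLParser
from typing import List, Tuple

SAMPLE = """
<div class="vote2013-council-meta" resource="http://www.bbc.co.uk/news/politics/councils/[GSSID]">
  <div vocab="http://iptc.org/std/rNews/2011-10-07#" rel="about" resource="http://www.bbc.co.uk/things/[GUID]#id">
    <div vocab="http://www.w3.org/2002/07/owl#" rel="sameAs" resource="http://opendatacommunities.org/id/[COUNCIL-TYPE]/[COUNCIL-NAME]"></div>
    <div vocab="http://www.bbc.co.uk/ontologies/politics#" rel="governsGSS" resource="http://statistics.data.gov.uk/id/statistical-geography/[GSSID]"></div>
  </div>
</div>
"""


class RelResourceExtractor(HTMLParser):
    """Collect (predicate URI, object URI) pairs from rel/resource attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "rel" in attributes and "resource" in attributes:
            # Simplification: only the element's own vocab attribute is used.
            predicate = attributes.get("vocab", "") + attributes["rel"]
            self.links.append((predicate, attributes["resource"]))


def extract_links(html: str) -> List[Tuple[str, str]]:
    parser = RelResourceExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for predicate, target in extract_links(SAMPLE):
        print(predicate, "->", target)
```

The outer `div` carries no `rel`, so it yields no pair; it only names the thing the page is about.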
  58. 58. @MikeAtherton | #IAS14 2. Publish an ontology • Ontologies describe your content model in detail, defining the vocabulary for things, types of thing, and types of relationship: • Classes: ‘person’, ‘book’, ‘wine’ • Properties: ‘age’, ‘ISBN’, ‘hasDistillery’ • Individuals: ‘Charles Dickens’, ‘Great Expectations’, ‘Laphroaig’
  59. 59. @MikeAtherton | #IAS14 Ontologies are guidebooks to help us explore and understand subjects. Many ontologies are published and available for reuse, or to build upon. Schema.org: general purpose vocabulary. FOAF: person-to-person relationships. rNews: news story publishing.
  60. 60. @MikeAtherton | #IAS14 3. Make your CMS work harder • Content management systems mostly suck for this stuff, but some - like Drupal and Umbraco - have growing support for publishing RDF (and other semantic formats). • These systems even have some SPARQL support allowing linked data to be added to your own page views. • Most showcase linked data projects aren’t using an off-the-shelf CMS, but things are improving with more semantically-friendly CMSs like Webnodes and Ximdex.
  61. 61. @MikeAtherton | #IAS14 <rdf:Description rdf:about="/nature/species/Giant_Panda"> <foaf:primaryTopic rdf:resource="/nature/species/Giant_Panda#species"/> <rdfs:seeAlso rdf:resource="/nature/species"/> </rdf:Description> <wo:Species rdf:about="/nature/life/Giant_Panda#species"> <rdfs:label>Giant panda</rdfs:label> <wo:name rdf:resource="http://www.bbc.co.uk/nature/species/Giant_Panda#name"/> <foaf:depiction rdf:resource="http://ichef.bbci.co.uk/naturelibrary/images/ic/640x360/g/gi/giant_panda/giant_panda_1.jpg"/> <dc:description>The giant panda is a rare, endangered and elusive <a href="http://www.bbc.co.uk/nature/life/Bear">bear</a>, making the videos below of a newborn baby giant panda and the remarkable courtship scene filmed in the wild unique. Giant pandas are famous for their love of bamboo, a diet so nutritionally poor that the pandas have to consume up to 20kg each day. The extra digit on the panda's hand helps them to tear the bamboo and their gut is covered with a thick layer of mucus to protect against splinters. Habitat loss is the greatest cause of the giant panda's decline, and today their range is restricted to six separate mountain ranges in western <a href="http://www.bbc.co.uk/nature/places/China">China</a>. <br/> <br/> <b>Did you know?</b><br/> A giant panda is born pink, hairless, blind and 1/900th the size of its mother. <br/></dc:description> <owl:sameAs rdf:resource="http://dbpedia.org/resource/Giant_Panda"/> Human version: http://www.bbc.co.uk/nature/life/Giant_Panda Robot version: http://www.bbc.co.uk/nature/life/Giant_Panda.rdf
  62. 62. @MikeAtherton | #IAS14 4. Be a pirate! • Use your model to audit the content you have ready to go. • Find the gaps - the concepts which are important to the subject domain, but which you don’t have content for. • Sail the high seas in search of third-party content or data. • Enrich your content with third-party data, then pay it forward by publishing linked data back out to the web.
  63. 63. @MikeAtherton | #IAS14 Islands of treasure await the brave adventurer.
  64. 64. @MikeAtherton | #IAS14 5. IAs should code <CONTROVERSY KLAXON!> • Unexpected ideas can come from throwing different sources of data together. • A little coding knowledge goes a long way toward building rough prototypes which prove concepts. Right now, more designer-friendly tools don’t exist. • Python and Rails are popular choices among IAs who don’t mind a little hacking. • If UX designers are encouraged to use native web tools, shouldn’t we also?
  65. 65. @MikeAtherton | #IAS14 ‘Wildlife Near You’ was an experiment in bootstrapping an entire content-rich product from no original content whatsoever. DBpedia: animal descriptions. Freebase: species taxonomy. Geonames: location data. BBC Wildlife Finder: video clips. Flickr API: tagged photos.
  66. 66. @MikeAtherton | #IAS14 All the world’s a stage. What stories will we tell?
  67. 67. @MikeAtherton | #IAS14 Consider how your offering benefits the web as a whole. Stitch your content into the fabric of the web.
  68. 68. A MINIMUM VIABLE PRESENTATION FOR IA SUMMIT 2014 @MikeAtherton | #IAS14 THE PATH AHEAD Where next for the web, and for information architecture?
  69. 69. @MikeAtherton | #IAS14 It’s been an amazing ride, but the best is yet to come.
  70. 70. @MikeAtherton | #IAS14 The web was designed to break down barriers.
  71. 71. @MikeAtherton | #IAS14 The web was designed to build bridges of understanding.
  72. 72. @MikeAtherton | #IAS14 https://www.flickr.com/photos/raveland Time to let the robot army do the heavy lifting.
  73. 73. @MikeAtherton | #IAS14 Time to tool up and face the challenges that lie ahead.
  74. 74. @MikeAtherton | #IAS14 ‘Designers should code’. But that’s not code…
  75. 75. @MikeAtherton | #IAS14 That’s code! With zombies.
  76. 76. @MikeAtherton | #IAS14 Today the tools are rough and ready, as once they were for HTML.
  77. 77. @MikeAtherton | #IAS14 The Linked Data and Information Architecture communities have much to discuss.
  78. 78. @MikeAtherton | #IAS14 Information Architecture must continue to evolve, learn from others, and expand its range of influence.
  79. 79. @MikeAtherton | #IAS14 Ready to play?
  80. 80. Thanks for listening. This presentation is now available at http://slideshare.net/reduxd To find out more about getting started with Linked Data, visit EUCLID: http://euclid-project.eu/ Dedicated to the coalition of the willing: Silver Oliver @silveroliver, Michael Smethurst @fantasticlife, Paul Rissen @r4isstatic, Tom Scott @derivadow, Leigh Dodds @ldodds, Chris Sizemore @onpause, and the London and Reading Linked Data Meetup groups. Interested in content modelling? http://www.slideshare.net/reduxd/beyond-the-polar-bear