WTF is the Semantic Web and Linked Data


Talk given at UT ISchool on Nov 17, 2011

  1. 1. WTF is the Semantic Web and Linked Data. Juan F. Sequeda, Department of Computer Science, University of Texas at Austin. Nov 17, 2011
  2. 2. Semantic Web? Linked Data? WTF?
  3. 3. WTF is the Semantic Web?
  4. 4. WTF is the Semantic Web?
  5. 5. Internet != Web
  6. 6. What is the Web? “… the Web, is a system of interlinked hypertext documents accessed via the Internet. With a web browser, one can view web pages that may contain text, images […] and navigate between them via hyperlinks”
  7. 7. Current Web = internet + links + docs
  8. 8. History of the Web
     • Created by Tim Berners-Lee at CERN in 1989
     • Mosaic browser in 1993
     • W3C created in 1994
     • Exponential growth mid 90s
     • Amazon, eBay – 1995
     • Search engines – Google 1998
     • Dot-com boom 1997 – 2001
     • Web 2.0 – blogs, Facebook, Twitter, etc.
  9. 9. What is the problem?
  11. 11. What is the problem?
     • The web is full of documents
     • We aren’t always interested in documents
       – We are interested in THINGS
       – These THINGS might be in documents
     • We can read an HTML document rendered in a browser and find what we are searching for
       – This is hard for computers.
       – Computers have to guess (even though they are pretty good at it)
  12. 12. The Web of Documents [diagram labels: Search, Search Engine, Crawler]
  13. 13. The Web is a Data Shredder [diagram labels: Structured Data, Unstructured Data] (Thanks, Martin Hepp)
  14. 14. What would we like?
     • Make it easy for computers/software to find THINGS
     Do you SEARCH or do you FIND?
  15. 15. Search for Football Players who went to the University of Texas at Austin and played for the Dallas Cowboys as Cornerback
  16. 16. Why can’t we just FIND it…
  17. 17. Guess how I FOUND out?
  18. 18. On a Semantic Web
     • Besides publishing documents on the web
       – which computers can’t understand easily
     • Let’s publish on the web something that computers can understand: DATA
  19. 19. The Semantic Web is a web of data. The current web is a web of documents.
  20. 20. But wait… doesn’t the web already have data?
  21. 21. Current Data on the Web
     • Relational Databases
     • APIs
     • XML
     • CSV
     • XLS
     • …
     • Can’t computers and applications already consume that data on the web?
  22. 22. Yes! But it is all in different formats and data models!
  23. 23. This makes it hard to integrate data
  24. 24. The data in different data sources aren’t linked
  25. 25. For example, how do I know that the Juan Sequeda in Facebook is the same as the Juan Sequeda in Twitter?
  26. 26. Or if I create a mashup from different services, I have to learn different APIs and I get different formats of data back
  27. 27. Data is Siloed
  28. 28. Wouldn’t it be great if we had a standard way of publishing data on the Web?
  29. 29. We have a standardized way of publishing documents on the web, right? HTML
  30. 30. Then why can’t we have a standard way of publishing data on the Web?
  31. 31. Good question! And the answer is YES. There is! RDF
  32. 32. Resource Description Framework (RDF)
     • Data Model = a way to model data
       – e.g. Relational databases use the relational data model
     • RDF is a graph data model
  33. 33. Key Value vs Graph
     • Key Values
       – firstName → Juan
       – lastName → Sequeda
       – livesIn → Austin
       – knows → Stephane Corlosquet
     • But what are these key/values describing?
       – ME!
  34. 34. RDF is a Graph
     • Let’s group the Key/Values together
       – <JuanSequeda> <firstName> “Juan”
       – <JuanSequeda> <lastName> “Sequeda”
       – <JuanSequeda> <livesIn> “Austin”
       – <JuanSequeda> <knows> <StephaneCorlosquet>
       – ..
       – <StephaneCorlosquet> <firstName> “Stephane”
       – <StephaneCorlosquet> <lastName> “Corlosquet”
       – <StephaneCorlosquet> <livesIn> “Boston”
  35. 35. RDF is a Graph [callouts on the same triples: the first term is the identifier for the “group”; the second and third terms are the Key/Value]
     • Let’s group the Key/Values together
       – <JuanSequeda> <firstName> “Juan”
       – <JuanSequeda> <lastName> “Sequeda”
       – <JuanSequeda> <livesIn> “Austin”
       – <JuanSequeda> <knows> <StephaneCorlosquet>
       – ..
       – <StephaneCorlosquet> <firstName> “Stephane”
       – <StephaneCorlosquet> <lastName> “Corlosquet”
       – <StephaneCorlosquet> <livesIn> “Boston”
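The grouping on the two slides above can be sketched in a few lines of Python. This is only an illustration, assuming triples are stored as plain tuples; the helper names `describe` and `follow` are hypothetical and not part of any RDF library.

```python
# Hypothetical sketch: the slide's triples as (subject, predicate, object) tuples.
triples = [
    ("JuanSequeda", "firstName", "Juan"),
    ("JuanSequeda", "lastName", "Sequeda"),
    ("JuanSequeda", "livesIn", "Austin"),
    ("JuanSequeda", "knows", "StephaneCorlosquet"),
    ("StephaneCorlosquet", "firstName", "Stephane"),
    ("StephaneCorlosquet", "lastName", "Corlosquet"),
    ("StephaneCorlosquet", "livesIn", "Boston"),
]


def describe(subject, graph):
    """Collect the key/values grouped under one subject identifier."""
    return {p: o for s, p, o in graph if s == subject}


def follow(subject, predicate, graph):
    """Follow an edge: objects reached from `subject` via `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]


# Because <knows> points at another identifier, we can hop across the graph.
friend = follow("JuanSequeda", "knows", triples)[0]
friend_city = describe(friend, triples)["livesIn"]
print(friend, "lives in", friend_city)
```

The point of the sketch is the last three lines: a flat key/value list cannot answer "where does the person Juan knows live?", while a graph of triples can, because an object can itself be a subject.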
  36. 36. RDF can be serialized in different ways
     • RDF/XML
     • RDFa (RDF in HTML)
     • N3
     • Turtle
     • JSON
  37. 37. RDFa
  38. 38. RDF/XML
  39. 39. RDF/N-triples
  40. 40. RDF/Turtle
  41. 41. So does that mean that I have to publish my data in RDF now?
  42. 42. You don’t have to… but we would like you to → Rich Snippets …
  43. 43. An example
  44. 44. Document on the Web
  45. 45. Databases back up documents
     THINGS have PROPERTIES: A Book has a Title, an author, …
     This is a THING: A book titled “Programming the Semantic Web” by Toby Segaran, …

     Isbn              | Title                        | Author       | PublisherID | ReleasedData
     978-0-596-15381-6 | Programming the Semantic Web | Toby Segaran | 1           | July 2009
     …                 | …                            | …            | …           | …

     PublisherID | PublisherName
     1           | O’Reilly Media
     …           | …
  46. 46. Let’s represent the data in RDF

     Isbn              | Title                        | Author       | PublisherID | ReleasedData
     978-0-596-15381-6 | Programming the Semantic Web | Toby Segaran | 1           | July 2009

     PublisherID | PublisherName
     1           | O’Reilly Media

     [graph of the same data]
       – book title “Programming the Semantic Web”
       – book author “Toby Segaran”
       – book isbn “978-0-596-15381-6”
       – book publisher Publisher
       – Publisher name “O’Reilly”
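The table-to-graph step on the slide above can be sketched as follows. This is a minimal illustration, not the speaker's tooling: the identifiers `isbn…` and `publisher…` are placeholder names made up here (real Linked Data would use URIs), and the function name `rows_to_triples` is hypothetical.

```python
# Rows copied from the slide's two tables.
books = [
    {
        "Isbn": "978-0-596-15381-6",
        "Title": "Programming the Semantic Web",
        "Author": "Toby Segaran",
        "PublisherID": 1,
    }
]
publishers = [{"PublisherID": 1, "PublisherName": "O'Reilly Media"}]


def rows_to_triples(books, publishers):
    """Turn each row into a node and each column into an edge."""
    triples = []
    for pub in publishers:
        pub_node = f"publisher{pub['PublisherID']}"  # placeholder identifier
        triples.append((pub_node, "name", pub["PublisherName"]))
    for book in books:
        book_node = "isbn" + book["Isbn"]  # placeholder identifier
        triples.append((book_node, "title", book["Title"]))
        triples.append((book_node, "author", book["Author"]))
        triples.append((book_node, "isbn", book["Isbn"]))
        # The foreign key becomes an edge to another node, not a bare number.
        triples.append((book_node, "publisher", f"publisher{book['PublisherID']}"))
    return triples


book_triples = rows_to_triples(books, publishers)
for triple in book_triples:
    print(triple)
```

The design choice worth noticing is the last `append`: the join column `PublisherID` turns into a link between two nodes, which is what lets the data be connected to other graphs later.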
  47. 47. Remember that we are on the web. Everything on the web is identified by a URI.
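  48. 48. And now let’s link the data to other data
     [graph]
       – <http://…/isbn978> title “Programming the Semantic Web”
       – <http://…/isbn978> author “Toby Segaran”
       – <http://…/isbn978> isbn “978-0-596-15381-6”
       – <http://…/isbn978> publisher <http://…/publisher1>
       – <http://…/publisher1> name “O’Reilly”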
  49. 49. And now consider the data from Revyu.com
     [graph]
       – <http://…/isbn978> hasReview <http://…/review1>
       – <http://…/review1> description “Awesome Book”
       – <http://…/review1> reviewer <http://…/reviewer>
       – <http://…/reviewer> name “Juan Sequeda”
  50. 50. Let’s start to link data
     [graph: the Revyu.com review graph and the book graph, joined by one link]
       – <http://…/isbn978> (Revyu.com) owl:sameAs <http://…/isbn978> (book data)
       – Revyu.com side: hasReview, description “Awesome Book”, hasReviewer, name “Juan Sequeda”
       – Book side: title “Programming the Semantic Web”, author “Toby Segaran”, isbn “978-0-596-15381-6”, publisher <http://…/publisher1>, name “O’Reilly”
  51. 51. Juan Sequeda publishes data too
     [graph]
       – <http://juanse> name “Juan Sequeda”
       – <http://juanse> livesIn (value not shown)
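  52. 52. Let’s link more data
     [graph: the Revyu.com review graph and Juan Sequeda’s own data, joined by one link]
       – <http://…/reviewer> sameAs <http://juanse>
       – Revyu.com side: <http://…/isbn978> hasReview <http://…/review1>, description “Awesome Book”, hasReviewer <http://…/reviewer>, name “Juan Sequeda”
       – Juan Sequeda’s side: <http://juanse> name “Juan Sequeda”, livesIn (value not shown)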
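  53. 53. And more
     [graph: all three data sets linked together]
       – Revyu.com: <http://…/isbn978> hasReview <http://…/review1>; <http://…/review1> description “Awesome Book”; <http://…/review1> hasReviewer <http://…/reviewer>; <http://…/reviewer> name “Juan Sequeda”
       – Book data: <http://…/isbn978> title “Programming the Semantic Web”, author “Toby Segaran”, isbn “978-0-596-15381-6”, publisher <http://…/publisher1>; <http://…/publisher1> name “O’Reilly”
       – Juan Sequeda: <http://juanse> name “Juan Sequeda”, livesIn (value not shown)
       – Links: <http://…/isbn978> (Revyu.com) owl:sameAs <http://…/isbn978> (book data); <http://…/reviewer> owl:sameAs <http://juanse>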
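What the owl:sameAs links buy us can be sketched in Python: once two identifiers are declared the same, facts stated about either one can be read together. The sketch below is an assumption-laden illustration, not a real reasoner. The prefixed names (`revyu:`, `book:`, `juanse:`) are hypothetical stand-ins for the elided URIs on the slides, the value `Austin` for livesIn is assumed, and literals are treated like identifiers for simplicity.

```python
def merge_same_as(triples):
    """Rewrite every identifier to one representative per owl:sameAs group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Pick the alphabetically smaller name so the result is deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    for s, p, o in triples:
        if p == "owl:sameAs":
            union(s, o)
    return {(find(s), p, find(o)) for s, p, o in triples if p != "owl:sameAs"}


# Hypothetical identifiers standing in for the three data sources.
linked = [
    ("revyu:isbn978", "hasReview", "revyu:review1"),
    ("revyu:review1", "description", "Awesome Book"),
    ("revyu:review1", "hasReviewer", "revyu:reviewer"),
    ("revyu:reviewer", "name", "Juan Sequeda"),
    ("revyu:reviewer", "owl:sameAs", "juanse:me"),
    ("juanse:me", "livesIn", "Austin"),  # assumed value
    ("revyu:isbn978", "owl:sameAs", "book:isbn978"),
    ("book:isbn978", "title", "Programming the Semantic Web"),
]

merged = merge_same_as(linked)
for triple in sorted(merged):
    print(triple)
```

After merging, the reviewer of the review and the person who lives in Austin are the same node, so a single query can cross from the review site into Juan's own data.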
  54. 54. Data on the Web that is in RDF and is linked to other RDF data is LINKED DATA
  55. 55. Linked Data Principles
     1. Use URIs as names for things
     2. Use HTTP URIs so that people can look up (dereference) those names.
     3. When someone looks up a URI, provide useful information.
     4. Include links to other URIs so that they can discover more things.
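Principles 2 and 3 amount to an ordinary HTTP GET on the thing's URI, usually with an Accept header asking for an RDF serialization instead of HTML. A minimal sketch with Python's standard library follows. It only builds the request and does not send it; the DBpedia URI and the `text/turtle` media type are illustrative choices, and the helper name `build_lookup_request` is hypothetical.

```python
from urllib.request import Request


def build_lookup_request(uri, media_type="text/turtle"):
    """Build (but do not send) a GET that asks for RDF rather than HTML."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


# Illustrative URI; any HTTP URI that names a thing would do.
request = build_lookup_request("http://dbpedia.org/resource/Austin,_Texas")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
# Sending it would be urllib.request.urlopen(request), which needs network access.
```

Whether a given server honors the Accept header depends on the publisher; the sketch only shows the shape of the lookup.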
  56. 56. Linked Data makes the web appear as ONE GIANT HUGE GLOBAL DATABASE!
  57. 57. I can query a database with SQL. Is there a way to query Linked Data with a query language?
  58. 58. Yes! There is actually a standardized language for that: SPARQL
  59. 59. FIND all the reviews on the book “Programming the Semantic Web” by people who live in Austin
  60. 60. SPARQL
       SELECT ?review ?comment
       WHERE {
         isbn:978 ex:hasReview ?review .
         ?review ex:description ?comment .
         ?review ex:hasReviewer ?person .
         ?person ex:lives dbpedia:Austin .
       }
  61. 61. [The SPARQL query from slide 60, shown matching against the linked graph from slide 53: book, review, reviewer, and Juan Sequeda’s own data joined by sameAs links]
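How a query like this matches a graph can be sketched in Python. The sketch below is a toy matcher for a list of triple patterns, where terms starting with `?` are variables; it is not a SPARQL engine. The data is hypothetical (a second reviewer in Boston is invented so that the filter has something to exclude), and the names `match_pattern` and `run_query` are made up for this example.

```python
def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def run_query(patterns, graph):
    """Join the patterns one by one, keeping only consistent bindings."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical data, already merged across sources.
graph = [
    ("isbn:978", "ex:hasReview", "ex:review1"),
    ("ex:review1", "ex:description", "Awesome Book"),
    ("ex:review1", "ex:hasReviewer", "ex:juan"),
    ("ex:juan", "ex:lives", "dbpedia:Austin"),
    ("isbn:978", "ex:hasReview", "ex:review2"),
    ("ex:review2", "ex:description", "Good intro"),
    ("ex:review2", "ex:hasReviewer", "ex:stephane"),
    ("ex:stephane", "ex:lives", "dbpedia:Boston"),
]

# The WHERE clause from the slide, one tuple per triple pattern.
where = [
    ("isbn:978", "ex:hasReview", "?review"),
    ("?review", "ex:description", "?comment"),
    ("?review", "ex:hasReviewer", "?person"),
    ("?person", "ex:lives", "dbpedia:Austin"),
]

rows = [(b["?review"], b["?comment"]) for b in run_query(where, graph)]
print(rows)
```

Each pattern narrows the set of candidate bindings, so the last pattern drops the Boston reviewer and only the Austin review survives.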
  62. 62. OWL
     • Here is where the real semantics shows up
     • Web Ontology Language
     • Define schema/vocabulary
     • Classes, Properties, Inheritance, etc.
     • Subclasses, Subproperties
     • …
     • You can get more complicated with rules…
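  63. 63. [diagram with an OWL layer above an RDF layer]
     Prefixes (URIs lost in the transcript): auth, dexa, dc, sw, swrc, owl, rdf, rdfs
     OWL layer:
       – swrc:InProceedings rdfs:subClassOf swrc:Publication
       – other classes and properties shown: foaf:Person, dc:creator
     RDF layer:
       – dexa:TirmiziSM08 dc:title “Translating SQL Applications to the Semantic Web”
       – dexa:TirmiziSM08 dc:creator auth:Juan_Sequeda, auth:Daniel_P._Miranker, auth:Syed_Hamid_Tirmizi
       – auth:Juan_Sequeda owl:sameAs sw:juan-f-sequeda
       – auth:Daniel_P._Miranker owl:sameAs sw:daniel-miranker
       – auth:Syed_Hamid_Tirmizi owl:sameAs sw:syed-tirmizi
       – rdf:type edges connect the RDF instances to the classes in the OWL layer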
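The simplest piece of what a schema adds is subclass inheritance: if something is an InProceedings and InProceedings is a subclass of Publication, it is also a Publication. The Python sketch below illustrates only that one rule, not OWL reasoning in general. The `rdf:type` triple for dexa:TirmiziSM08 is an assumption read off the diagram, and the function name `infer_types` is hypothetical.

```python
def infer_types(triples):
    """Add rdf:type facts implied by rdfs:subClassOf (transitively)."""
    subclass = {(s, o) for s, p, o in triples if p == "rdfs:subClassOf"}

    # Transitive closure: if A is a subclass of B and B of C, then A of C.
    changed = True
    while changed:
        changed = False
        for a, b in list(subclass):
            for c, d in list(subclass):
                if b == c and (a, d) not in subclass:
                    subclass.add((a, d))
                    changed = True

    inferred = set(triples)
    for s, p, o in triples:
        if p == "rdf:type":
            for sub, sup in subclass:
                if sub == o:
                    inferred.add((s, "rdf:type", sup))
    return inferred


schema_and_data = [
    ("swrc:InProceedings", "rdfs:subClassOf", "swrc:Publication"),
    # Assumed from the diagram: the paper is typed as an InProceedings.
    ("dexa:TirmiziSM08", "rdf:type", "swrc:InProceedings"),
]

inferred = infer_types(schema_and_data)
print(("dexa:TirmiziSM08", "rdf:type", "swrc:Publication") in inferred)
```

A query for all publications would now find the paper even though nobody stated that fact directly, which is the kind of "real semantics" the OWL slide refers to.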
  65. 65. This looks cool, but let’s be realistic. What is the incentive to publish Linked Data?
  66. 66. What was your incentive to publish an HTML page in 1990?
  67. 67. 1) Share data in documents
          2) Because your neighbor was doing it
          … later on …
          3) Marketing, Advertising, …, SEO
  68. 68. So why should we publish Linked Data in 2011?
  69. 69. 1) Share data as data
          2) Because your neighbor is doing it
          …
          3) Marketing, Advertising, SEO ++
  70. 70. Linked Data Publishers
     • UK Government
     • US Government
     • BBC
     • Open Calais – Thomson Reuters
     • Freebase/Google
     • NY Times
     • Best Buy
     • Sears
     • Kmart
     • CNET
     • DBpedia
     • O’Reilly Media
     • …
  71. 71. May 2007
  72. 72. Oct 2007
  73. 73. Nov 2007
  74. 74. Feb 2008
  75. 75. Mar 2008
  76. 76. Sept 2008
  77. 77. Mar 2009 (1)
  78. 78. Mar 2009 (2)
  79. 79. July 2009
  80. 80. September 2010
  81. 81. September 2011. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.
  83. 83. What is the Web
     • Web of Documents → HTML
     • Web of Data → RDF
     • Global Unique IDs → HTTP URIs
     • Schema/Ontologies → OWL
     • Query RDF → SPARQL
  84. 84. Now what can we do with this data?
  85. 85. Generic Applications
  86. 86. Linked Data Browsers
  87. 87. Linked Data Browsers
     • Not actually separate browsers. Run inside of HTML browsers
     • View the data that is returned after looking up a URI in tabular form
     • User can navigate between data sources by following RDF Links
     • (IMO) No usability
  88. 88. Linked Data Browsers
     • Tabulator
     • OpenLink Data Explorer
     • Zitgist
     • Marbles
     • Explorator
     • Disco
     • LinkSailor
  89. 89. Linked Data (Semantic Web) Search Engines
  90. 90. Linked Data (Semantic Web) Search Engines
     • Just like conventional search engines (Google, Bing, Yahoo), they crawl RDF documents and follow RDF links.
       – Current search engines don’t crawl data, unless it’s RDFa
     • Human-focused search
       – Falcons – Keyword
       – SWSE – Keyword
       – VisiNav – Complex Queries
     • Machine-focused search
       – Sindice – data instances
       – Swoogle – ontologies
       – Watson – ontologies
       – Uberblic – curated integrated data instances
  91. 91. (Semantic) SEO ++
     • Markup your HTML with RDFa
     • Use standard vocabularies (ontologies)
       – Google Vocabulary
       – Good Relations
       – Dublin Core
     • Google and Yahoo will crawl this data and use it for better rendering
  92. 92. On-the-fly Mashups
  94. 94. Domain Specific Applications
  95. 95. Domain Specific Applications
     • Government
     • Music
     • DBpedia Mobile
     • Life Science
       – LinkedLifeData
     • Sports
       – BBC World Cup
  96. 96. Faceted Browsers
  98. 98. Query your data
  99. 99. Find all the locations of all the original paintings of Modigliani
  100. 100. Select all proteins that are linked to a curated interaction from the literature and to inflammatory response
  102. 102. Links to other Data Sources
  103. 103. Linked Data is Data Integration [diagram labels: SPARQL Query, Diamond, Ultrawrap (three instances), Specify, Morphster, Morphbank]
  104. 104. Example 1 (Specify – DBpedia)
     • Get full name and guid from taxon with id axon51807#thing
     • AND find any subjects it may have “skos:subject”
  105. 105. Result Example 1
     • Note that Australia comes from a different data source (
  106. 106. Example 2 (Specify – Morphbank)
     • Get full name and guid from taxon with id axon42947#thing
     • AND the rank and kingdom from Morphbank
  107. 107. Result Example 2
     • Note that full name and guid come from Specify axon42947
     • AND rank and kingdom come from Morphbank ata/taxa398354
  108. 108. Quotes
     • “The killer app for Semantic Technology is YOUR life (online)” - Tom Gruber
     • “A little semantics goes a long way” - Jim Hendler
     • “Knowledge is Power” - Jim Hendler
     • “Occupy Your Data” - Tim Finin
     • “Linked Data is the (Semantic) Web done right” - Tim Berners-Lee
     • “The novel part of the Semantic Web is not the Semantics, but the Web” - Frank van Harmelen
     • “RAW DATA NOW” - Tim Berners-Lee
  109. 109. QUESTIONS?