Introduction to the Semantic Web and Linked Data



This was presented to the San Francisco chapter of DAMA International on June 9, 2010 at SAP in Palo Alto, California.



  1. Introduction to the Semantic Web and Linking Data. Eric Axel Franzon, Vice President, Semantic Universe / Wilshire Conferences
  2. About Me • Professional • Wilshire Conferences • Semantic Universe • W3C • Guidewire Group • Coach / Consultant / Trainer • Geek
  3. Today we will talk about: • Semantic Technologies • Semantic Web & Web 3.0 • Linked Data – Linked Open Data – Linked Enterprise Data • Use cases • That harmonica on the first slide
  4. Semantic Technologies Semantic Web
  5. Web Technologies World Wide Web
  6. Semantic Web = Web 3.0 = Web of Data
  7. www.geekandpoke.com
  8. What is the Web of Data not? • A software package • Something that will ever “be complete” • A replacement for the current Web • A pipe dream • A silver bullet
  9. It’s also not… • HAL 9000
  10. It’s also not… • Skynet
  11. What is the Web of Data? • A Web-scale architecture • A metadata technology • A layer of meaning on the existing Web • In use TODAY!
  12. Web of Data
  13. Q: What does Linked Data have to do with the Semantic Web?
  14. Web 1.0 – Linking Documents
  15. Web 1.0
  16. Web 1.0 “I see: characters + formatting + images” -- my computer
  17. Web 1.0 – Linking Documents • Web 2.0 – Linking People
  18. Web 2.0
  19. Web 2.0 “I see: characters + formatting + images” -- my computer
  20. Web 1.0 – Linking Documents • Web 2.0 – Linking People • Web 3.0 – Linking Data
  21. Web 3.0 – Linking Data (diagram labels: Title, Publisher, Format, Author, Price, Cover)
  22. Web 3.0 – Linking Data (diagram labels: Title, Publisher, Format, Author, Price, Cover) “I see: things + relationships. This information is about a book.”
  23. Semantic Technologies (diagram labels: Semantic Web, Linked Open Data)
  24. Linking Open Data Project, May 2007
  25. March 2009
  26. Data from these trusted sources is available for you to use in your applications TODAY. Data you can LINK to. And not just data…
  27. Semantic data is not only machine READABLE. It is machine UNDERSTANDABLE!
  28. Disambiguation
  29. Disambiguation: mole, n.
  30. But…
  31. Metadata: Doctorow’s criticisms and the LOD/LED response • “People lie”: allow users to choose a social trust model • “People are lazy”: automate where possible and encourage authoring where needed • “People are stupid”: automate where possible, check where possible • “Mission Impossible: know thyself”: allow multiple sources of metadata • “Schemas aren’t neutral”: allow multiple schemas • “Metrics influence results”: allow multiple metrics • “There’s more than one way to describe something”: allow multiple descriptions
  32. LOD/LED is flexible
  33. How does LOD/LED work? 1. By uniquely identifying THINGS 2. By uniquely identifying RELATIONSHIPS 3. By using TRIPLES
  34. How does LOD/LED work? 1. By uniquely identifying THINGS. So, what’s a THING?
  35. A THING is anything that can be uniquely identified by a URI or a literal (string): • Me: http://twitter.com/ericaxel • My postal code: http://www.city-data.com/zips/90043.html • The White House: Lat 38.89859, Long -77.035971 • L.A. County’s sales tax rate: 9.750% • http://ericfranzon.com/operator.jpg
  36. This is a collection of THINGS, the table t_people (Name, City, State, Post code): (David, Fredericksburg, VA, 22408); (Eric, Culver City, CA, 90230)
  37. Trees and Tables. As a table, t_people (Name, City, State, Post code): (David, Fredericksburg, VA, 22408); (Eric, Culver City, CA, 90230). As a tree: people → David (City: Fredericksburg, State: VA, Post code: 22408); people → Eric (City: Culver City, State: CA, Post code: 90230)
  38. Trees and Tables – Problem 1: Adding partial data to tables leads to sparseness. Table t_people gains a flag column: (David, Fredericksburg, VA, 22408, 1); (Eric, Culver City, CA, 90230, empty). Tree: people → David (flag: 1, City: Fredericksburg, State: VA, Post code: 22408); people → Eric (City: Culver City, State: CA, Post code: 90230)
  39. Trees and Tables – Problem 2: Common data leads to duplication (lots of it!). Table t_people: (David, Culver City, CA, 90230); (Eric, Culver City, CA, 90230). Tree: people → David (City: Culver City, State: CA, Post code: 90230); people → Eric (City: Culver City, State: CA, Post code: 90230)
  40. Graphs: people → David (flag: 1), Eric. David and Eric each link via City, State and Post code to the same shared nodes: Culver City, CA, 90230
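The move from tables and trees to a graph can be sketched in a few lines of Python. This is a minimal illustration under assumed names: the dictionary keys, the predicate strings and the use of a Python set for the edges are choices made for the example, and the slide's "people" root node is left out. It counts the empty cell and the repeated city in the table form, then shows that the graph form holds each value only once.

```python
# Rows: every row carries every column. A partial attribute ("flag") leaves an
# empty cell, and a shared value ("Culver City") is copied into each row.
rows = [
    {"name": "David", "city": "Culver City", "state": "CA", "post_code": "90230", "flag": 1},
    {"name": "Eric", "city": "Culver City", "state": "CA", "post_code": "90230", "flag": None},
]

# Graph: a set of (subject, predicate, object) edges. "Culver City" is one node
# that several edges point at; "flag" is just one extra edge on David.
edges = {
    ("David", "City", "Culver City"),
    ("David", "State", "CA"),
    ("David", "Post code", "90230"),
    ("David", "flag", "1"),
    ("Eric", "City", "Culver City"),
    ("Eric", "State", "CA"),
    ("Eric", "Post code", "90230"),
}

empty_cells = sum(value is None for row in rows for value in row.values())
city_copies = sum(row["city"] == "Culver City" for row in rows)
nodes = {s for s, _, _ in edges} | {o for _, _, o in edges}

print("empty cells in the table:", empty_cells)               # 1
print("copies of 'Culver City' in the table:", city_copies)   # 2
print("nodes in the graph:", sorted(nodes))  # ['1', '90230', 'CA', 'Culver City', 'David', 'Eric']
```

Adding a new attribute to the graph means adding edges for the people it applies to; no other record changes shape.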
  41. How does LOD/LED work? 1. By uniquely identifying THINGS 2. By uniquely identifying RELATIONSHIPS. Who’s your daddy?
  42. Is Father of: mailto:ericaxel@yahoo.com. In OWL: <owl:ObjectProperty rdf:ID="isFather"> <rdfs:domain rdf:resource="#Person"/> <rdfs:range rdf:resource="#Person"/> </owl:ObjectProperty>
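The rdfs:domain and rdfs:range lines in the OWL definition are what let software draw conclusions from a relationship. Below is a minimal sketch, assuming plain strings for URIs and two hypothetical example.org people. From a single isFather triple it applies the RDFS domain and range rules and infers that both ends are of type #Person. These declarations drive inference, not validation: nothing in this sketch rejects data.

```python
# Sketch: what rdfs:domain and rdfs:range on isFather let a reasoner infer.
# Property and class names mirror the OWL snippet; both example.org people
# are hypothetical.
RDF_TYPE = "rdf:type"

schema = {"isFather": {"domain": "#Person", "range": "#Person"}}

data = {("http://example.org/people#parent", "isFather", "http://example.org/people#child")}

def infer_types(triples, schema):
    """Apply the RDFS domain/range rules:
    (s p o) and (p rdfs:domain C)  =>  (s rdf:type C)
    (s p o) and (p rdfs:range C)   =>  (o rdf:type C)
    """
    inferred = set()
    for s, p, o in triples:
        rules = schema.get(p, {})
        if "domain" in rules:
            inferred.add((s, RDF_TYPE, rules["domain"]))
        if "range" in rules:
            inferred.add((o, RDF_TYPE, rules["range"]))
    return inferred

for triple in sorted(infer_types(data, schema)):
    print(triple)
```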
  43. How does LOD/LED work? 1. By uniquely identifying THINGS 2. By uniquely identifying RELATIONSHIPS 3. By using TRIPLES. What’s a triple?
  44. Triples? It’s Elementary! (School): “book has title.” The relationship (“has”) is the predicate. That is a triple!
  45. Triples? It’s Elementary! “This book has a title.” “Eric wrote this Web page.” “This article is about moles.” “I like blues.” “I like B.L.U.E.S.” “This image can be used non-commercially.” “My email address is ericaxel@yahoo.com.”
  46. Triples (diagram): Book – Has Title – “Title”; Eric – Created – Webpage; Image – Has License – CC Non-Commercial. Subjects: Book, Eric, Image. Predicates: Has Title, Created, Has License. Objects: “Title”, Webpage, CC Non-Commercial.
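The triples in the diagram map directly onto a very small data structure. As a sketch, assuming plain strings where a real system would use URIs, each triple becomes a (subject, predicate, object) tuple, and a lookup is a pattern in which any position can be left open. The match helper is hypothetical, written only for this illustration.

```python
# Sketch: the three triples from the diagram as (subject, predicate, object) tuples.
# Plain strings stand in for the URIs a real Linked Data system would use.
triples = [
    ("Book", "Has Title", '"Title"'),
    ("Eric", "Created", "Webpage"),
    ("Image", "Has License", "CC Non-Commercial"),
]

def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

print(match(triples, s="Eric"))         # [('Eric', 'Created', 'Webpage')]
print(match(triples, p="Has License"))  # [('Image', 'Has License', 'CC Non-Commercial')]
```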
  47. Book (diagram: a Book node linked to Author, Title, ISBN and Publisher)
  48. The Trouble with Triples
  49. Cytoscape.org
  50. Our Data are Multiplying. Review of the Review
  51. Trends in data growth • Vast amounts of digital data are being produced daily – Wal-Mart produces 1 million transactions every hour; its databases are estimated at > 2.5 petabytes • The US National Archives creates > 10 million digital assets annually
  52. Data Inflation (in bytes) • Megabyte (MB) = $2^{20}$ • Gigabyte (GB) = $2^{30}$ • Terabyte (TB) = $2^{40}$ • Petabyte (PB) = $2^{50}$, i.e. 1,024 TB ≈ 1,000 TB • Exabyte (EB) = $2^{60}$ ≈ 1,000 PB • Zettabyte (ZB) = $2^{70}$ ≈ 1,000 EB • Yottabyte (YB) = $2^{80}$ ≈ 1,000 ZB
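The "≈ 1,000" in each step hides a small factor that compounds. This sketch assumes the slide's binary definitions (MB = 2^20 bytes and so on up the ladder) and shows that each unit is exactly 1,024 of the previous one, so by the yottabyte the binary unit is roughly 21% larger than the decimal 10^24 bytes.

```python
# Sketch: the binary (power-of-two) unit ladder from the slide.
UNITS = ["MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def unit_bytes(unit):
    """Bytes in one unit, using the binary definitions (MB = 2**20, GB = 2**30, ...)."""
    return 2 ** (20 + 10 * UNITS.index(unit))

for unit in UNITS:
    exponent = 20 + 10 * UNITS.index(unit)
    print(f"1 {unit} = 2**{exponent} = {unit_bytes(unit):,} bytes")

# Each step up is 2**10 = 1,024, which is why "1,000" is only approximate.
print(unit_bytes("PB") // unit_bytes("TB"))  # 1024
# The gap compounds: a binary YB is about 21% more than 10**24 bytes.
print(f"{unit_bytes('YB') / 10**24:.3f}")  # 1.209
```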
  53. Acceleration • Decoding the human genome involves analyzing 3 billion base pairs • What took 10 years to process in 2003 takes a week today
  54. A brand-new professional has emerged ... the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. -- The Economist, “Data, data everywhere”, Feb 27th 2010
  55. When we come back…
  56. S – T – R – E – T – C – H Break!
  57. Linked Data is like a harmonica • It’s easy to play
  58. Facebook • Unique visitors*: 540,000,000 • Page views: 570,000,000,000 (* per month). Source: Google, “The 1000 most-visited sites on the web”
  59. Facebook
  60. Facebook
  61. FOAF: Friend-Of-A-Friend http://www.foaf-project.org/
  62. FOAF-a-Matic http://www.ldodds.com/foaf/foaf-a-matic
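A FOAF file is ordinary RDF that uses the FOAF vocabulary. The sketch below shows the kind of description a FOAF generator produces, written as N-Triples. The FOAF and RDF namespace URIs are the published ones; the people, their example.org URIs and the mailbox are hypothetical, and the helper does no literal escaping, so names must not contain quotes or backslashes.

```python
# Sketch: a minimal FOAF profile written as N-Triples.
# People, URIs and mailbox are hypothetical; the namespace URIs are real.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def foaf_profile(me, name, mbox, friends):
    """Build N-Triples lines describing one person and the people they know."""
    lines = [
        f"<{me}> <{RDF_TYPE}> <{FOAF}Person> .",
        f'<{me}> <{FOAF}name> "{name}" .',
        f"<{me}> <{FOAF}mbox> <mailto:{mbox}> .",
    ]
    for friend_uri, friend_name in friends:
        lines.append(f"<{me}> <{FOAF}knows> <{friend_uri}> .")
        lines.append(f"<{friend_uri}> <{RDF_TYPE}> <{FOAF}Person> .")
        lines.append(f'<{friend_uri}> <{FOAF}name> "{friend_name}" .')
    return "\n".join(lines)

print(foaf_profile(
    "http://example.org/alice#me",
    "Alice Example",
    "alice@example.org",
    [("http://example.org/bob#me", "Bob Example")],
))
```

A tool like semantictweet.com could emit one such knows triple per followed account; that mapping is an assumption about how such a generator might work, not a description of its internals.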
  63. semantictweet.com
  64. semantictweet.com
  65. semantictweet.com can create four FOAF files: • Friends (who I follow) • Followers • All • Just Me
  66. Linked Data is like a harmonica • It’s easy to play • It’s a “real” instrument
  67. The Technologies of RDBMS • Data • Schemas • Query Language
  68. RDBMS Data: table t_people (Name, City, State, Post code): (David, Fredericksburg, VA, 22408); (Eric, Culver City, CA, 90230)
  69. RDBMS Schema
  70. RDBMS Query Language: SQL. SELECT isbn, title, price, price * 0.06 AS sales_tax FROM Book WHERE price > 100.00 ORDER BY title;
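For comparison with SPARQL later in the deck, the SQL query can be run as-is against an in-memory SQLite database. The Book table layout and its three rows are hypothetical; only the SELECT statement comes from the slide. The WHERE clause drops the cheap book, and ORDER BY title sorts the remaining two alphabetically.

```python
# Sketch: running the slide's SQL against an in-memory SQLite database.
# The Book table layout and its rows are hypothetical; only the query is from the slide.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (isbn TEXT PRIMARY KEY, title TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Book VALUES (?, ?, ?)",
    [
        ("isbn-0001", "Zebra Handbook", 150.00),
        ("isbn-0002", "Cheap Paperback", 9.99),
        ("isbn-0003", "Atlas of Everything", 200.00),
    ],
)

rows = conn.execute(
    "SELECT isbn, title, price, price * 0.06 AS sales_tax "
    "FROM Book WHERE price > 100.00 ORDER BY title;"
).fetchall()

for row in rows:
    print(row)
```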
  71. The Technologies of LOD/LED • Data • Schemas • Query Language
  72. The Data Language: Resource Description Framework (RDF)
  73. RDF Triples (Subject | Predicate | Object) • http://plushbeautybar.com | dc:creator | http://www.ericaxel.com/foaf.rdf • http://www.geonames.org/maps/google_34.021_-118.396.html | dc:location | N 34° 1' 16'' W 118° 23' 47'' • http://twitter.com/ericaxel | foaf:knows | “Brian Sletten”
  74. RDF Triple Components (Subject | Predicate | Object) • http://plushbeautybar.com | dc:creator | http://www.ericaxel.com/foaf.rdf • http://www.geonames.org/maps/google_34.021_-118.396.html | dc:location | N 34° 1' 16'' W 118° 23' 47'' • http://twitter.com/ericaxel | foaf:knows | “Brian Sletten” or http://twitter.com/bsletten • The subject is a URI, the predicate is a URI, and the object is a URI or a string literal
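The component rule on this slide can be written down as types. In this sketch the URI, Literal and Triple classes are illustrative stand-ins, not any RDF library's API, and the rule is the slide's simplified one (full RDF also allows blank nodes as subjects and objects, which are left out here). Both forms of the foaf:knows object from the slide are accepted, while a literal in the subject position is rejected.

```python
# Sketch of the slide's rule for triple components: subject and predicate are
# URIs; the object may be a URI or a string literal. These classes are
# illustrative stand-ins, not a real RDF library's types.
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class URI:
    value: str

@dataclass(frozen=True)
class Literal:
    value: str

@dataclass(frozen=True)
class Triple:
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]

    def __post_init__(self):
        if not isinstance(self.subject, URI):
            raise TypeError("subject must be a URI")
        if not isinstance(self.predicate, URI):
            raise TypeError("predicate must be a URI")
        if not isinstance(self.obj, (URI, Literal)):
            raise TypeError("object must be a URI or a literal")

FOAF_KNOWS = URI("http://xmlns.com/foaf/0.1/knows")
eric = URI("http://twitter.com/ericaxel")

# Both forms of the object shown on the slide are accepted.
t1 = Triple(eric, FOAF_KNOWS, Literal("Brian Sletten"))
t2 = Triple(eric, FOAF_KNOWS, URI("http://twitter.com/bsletten"))

# A literal in the subject position is rejected.
try:
    Triple(Literal("Brian Sletten"), FOAF_KNOWS, eric)
except TypeError as err:
    print("rejected:", err)
```

The URI form of the object is usually the more useful one: other triples can then say more about that person, which a bare string cannot support.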
  75. “RDF is good for distributing data across the Web and pretending it’s in one place.” -- Dean Allemang, TopQuadrant
  76. Just so you know… There are many ways of representing RDF: • RDF/XML • N-Triples • N3 • Turtle • JSON • RDFa. Each serialization has pros and cons, but they are all used to connect THINGS and RELATIONSHIPS into TRIPLES
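To see that serializations differ only in notation, the sketch below writes one pair of triples from the earlier slides in two of the listed formats: N-Triples, which spells out every URI on one line per triple, and Turtle, which declares prefixes and shortens predicate URIs to prefix:name. The Dublin Core and FOAF namespace URIs are the real ones; the helper functions are hypothetical and handle only URIs and plain literals without escaping, so treat this as an illustration rather than a conforming serializer.

```python
# Sketch: one set of triples written in two RDF serializations.
# Only URIs and plain string literals are handled; literal escaping is omitted.
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"

triples = [
    ("http://plushbeautybar.com", DC + "creator", ("uri", "http://www.ericaxel.com/foaf.rdf")),
    ("http://twitter.com/ericaxel", FOAF + "knows", ("literal", "Brian Sletten")),
]

def term(obj):
    kind, value = obj
    return f"<{value}>" if kind == "uri" else f'"{value}"'

def to_ntriples(triples):
    # N-Triples: one complete triple per line, every URI written out in full.
    return "\n".join(f"<{s}> <{p}> {term(o)} ." for s, p, o in triples)

def to_turtle(triples, prefixes):
    # Turtle: the same triples, with predicate URIs shortened to prefix:name.
    def shorten(uri):
        for prefix, ns in prefixes.items():
            if uri.startswith(ns):
                return f"{prefix}:{uri[len(ns):]}"
        return f"<{uri}>"
    header = [f"@prefix {p}: <{ns}> ." for p, ns in prefixes.items()]
    body = [f"<{s}> {shorten(p)} {term(o)} ." for s, p, o in triples]
    return "\n".join(header + [""] + body)

print(to_ntriples(triples))
print()
print(to_turtle(triples, {"dc": DC, "foaf": FOAF}))
```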
  77. The Schemata: Linked Data schemas consist of your RDF relationships (predicates) + relationship descriptions
  78. LOD/LED Schemata. Data: a table (id, First Name, Last Name) with the row (1, Tony, Shaw). Initial schema: hasID → 1, hasFirstName → Tony, hasLastName → Shaw. Schema relationship description: hasSurname owl:sameAs hasLastName (OWL's dedicated term for equating two properties is owl:equivalentProperty)
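A relationship description such as "hasSurname means the same as hasLastName" lets data published under either name answer the same question. The sketch below copies triples across each declared equivalence in both directions, in a single pass (it does not chain equivalences transitively). The property names follow the slide; the person/1 and person/2 identifiers and the second record are hypothetical.

```python
# Sketch: treating hasSurname and hasLastName as equivalent properties.
# Property names follow the slide; the identifiers and person/2 are hypothetical.
data = {
    ("person/1", "hasFirstName", "Tony"),
    ("person/1", "hasLastName", "Shaw"),
    ("person/2", "hasSurname", "Example"),
}

equivalent = {("hasSurname", "hasLastName")}

def expand(triples, equivalent_pairs):
    """Copy every triple across each declared equivalence, in both directions."""
    result = set(triples)
    for a, b in equivalent_pairs:
        for s, p, o in triples:
            if p == a:
                result.add((s, b, o))
            elif p == b:
                result.add((s, a, o))
    return result

last_names = sorted(o for s, p, o in expand(data, equivalent) if p == "hasLastName")
print(last_names)  # ['Example', 'Shaw']
```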
  79. Choosing Relationships • Reuse popular vocabularies – FOAF (Friend-of-a-friend) – Dublin Core (library/publisher metadata) – SIOC (Semantically-Interlinked Online Communities) • ...or make up your own!
  80. RDF Triples (Subject | Predicate | Object) • http://plushbeautybar.com | dc:creator | http://www.ericaxel.com/foaf.rdf • http://www.geonames.org/maps/google_34.021_-118.396.html | dc:location | N 34° 1' 16'' W 118° 23' 47'' • http://twitter.com/ericaxel | foaf:knows | “David Wood”
  81. Relationship Descriptions 1. Resource Description Framework Schema (RDFS): simple, hierarchical classes 2. Simple Knowledge Organization System (SKOS): port taxonomies to the Semantic Web 3. Web Ontology Language (OWL): complex logical relationships
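The "simple, hierarchical classes" of RDFS already support useful inference: a member of a subclass is also a member of every class above it. This sketch follows rdfs:subClassOf links breadth-first, allows more than one parent per class, and guards against cycles. The class names (Paperback, Book, CreativeWork, Product) are hypothetical.

```python
# Sketch: inferring every class an item belongs to from rdfs:subClassOf links.
# The class hierarchy is hypothetical; a class may have several parents.
from collections import deque

sub_class_of = {
    "Paperback": ["Book"],
    "Book": ["CreativeWork", "Product"],
}

def all_types(cls, hierarchy):
    """Return cls plus every class reachable via subClassOf (breadth-first, cycle-safe)."""
    seen = [cls]
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in hierarchy.get(current, []):
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen

print(all_types("Paperback", sub_class_of))  # ['Paperback', 'Book', 'CreativeWork', 'Product']
```

SKOS broader/narrower links and OWL axioms build on the same idea of following declared relationships, with different and, in OWL's case, much richer semantics.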
  82. Combine vocabularies and descriptions
  83. LOD/LED Schemata • Put as much work into creating your LED schema as you put into creating your relational schemas • ... maybe even a bit more (due to links between your data and others’).
  84. New York Times – SKOS
  85. New York Times – SKOS
  86. New York Times – SKOS: SKOS STUFF
  87. The query language: SPARQL (SPARQL Protocol And RDF Query Language)
  88. SPARQL Example #1: FOAF (some people that Eric Franzon knows) PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name FROM <http://ericaxel.com/eric.rdf> WHERE { ?knower foaf:knows ?known . ?known foaf:name ?name . }
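Under the hood, Example #1's WHERE clause is a basic graph pattern: each triple pattern is matched against the data, and partial solutions are joined on shared variables (here ?known links the two patterns). The sketch below implements that matching for in-memory tuples. The data and the ex: identifiers are hypothetical; only the FOAF namespace URI is real. As in SPARQL without DISTINCT, duplicate solutions would be kept.

```python
# Sketch: evaluating Example #1's basic graph pattern over in-memory triples.
# Terms starting with "?" are variables. The data is hypothetical.
FOAF = "http://xmlns.com/foaf/0.1/"

graph = {
    ("ex:eric", FOAF + "knows", "ex:alice"),
    ("ex:eric", FOAF + "knows", "ex:bob"),
    ("ex:alice", FOAF + "name", "Alice"),
    ("ex:bob", FOAF + "name", "Bob"),
}

def match_pattern(pattern, binding, graph):
    """Yield each extension of `binding` under which `pattern` matches a triple."""
    for triple in graph:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break  # constant term does not match
        else:
            yield new

def evaluate(patterns, graph):
    """Join the patterns left to right, like a simple SPARQL engine."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match_pattern(pattern, b, graph)]
    return bindings

where = [
    ("?knower", FOAF + "knows", "?known"),
    ("?known", FOAF + "name", "?name"),
]
names = sorted(b["?name"] for b in evaluate(where, graph))
print(names)  # ['Alice', 'Bob']
```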
  89. SPARQL Example #1
  90. Example #1 - Results
  91. SPARQL Example #2: Querying two FOAF profiles PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> SELECT ?name FROM NAMED <http://ericaxel.com/eric.rdf> FROM NAMED <http://zepheira.com/team/dave/dave.rdf> WHERE { GRAPH <http://ericaxel.com/eric.rdf> { ?x rdf:type foaf:Person . ?x foaf:name ?name . } . GRAPH <http://zepheira.com/team/dave/dave.rdf> { ?y rdf:type foaf:Person . ?y foaf:name ?name . } . }
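In Example #2 the same ?name variable appears inside both GRAPH blocks, so the query returns only names that are declared as a foaf:Person in both profiles. The sketch below reproduces that join with two hypothetical in-memory graphs standing in for eric.rdf and dave.rdf, using prefixed strings instead of full URIs. It uses a set intersection, so it ignores duplicate rows that SPARQL would keep without DISTINCT.

```python
# Sketch: the join behind Example #2. Graph names and contents are hypothetical.
RDF_TYPE = "rdf:type"
FOAF_PERSON = "foaf:Person"
FOAF_NAME = "foaf:name"

named_graphs = {
    "http://example.org/eric.rdf": {
        ("_:a", RDF_TYPE, FOAF_PERSON), ("_:a", FOAF_NAME, "Alice"),
        ("_:b", RDF_TYPE, FOAF_PERSON), ("_:b", FOAF_NAME, "Bob"),
    },
    "http://example.org/dave.rdf": {
        ("_:c", RDF_TYPE, FOAF_PERSON), ("_:c", FOAF_NAME, "Bob"),
        ("_:d", RDF_TYPE, FOAF_PERSON), ("_:d", FOAF_NAME, "Carol"),
    },
}

def person_names(graph):
    """?x rdf:type foaf:Person . ?x foaf:name ?name, evaluated inside one graph."""
    people = {s for s, p, o in graph if p == RDF_TYPE and o == FOAF_PERSON}
    return {o for s, p, o in graph if p == FOAF_NAME and s in people}

g1 = named_graphs["http://example.org/eric.rdf"]
g2 = named_graphs["http://example.org/dave.rdf"]
print(sorted(person_names(g1) & person_names(g2)))  # ['Bob']
```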
  92. Example #2 - Results
  93. SPARQL Example #3: Bart Simpson's chalkboard gags (DBpedia) PREFIX skos: <http://www.w3.org/2004/02/skos/core#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX dbpedia2: <http://dbpedia.org/property/> SELECT ?episode ?chalkboard_gag WHERE { ?episode skos:subject ?season . ?season rdfs:label ?season_title . ?episode dbpedia2:blackboard ?chalkboard_gag . FILTER (regex(?season_title, "The Simpsons episodes, season")) . } ORDER BY ?season
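Example #3 adds two steps after pattern matching: FILTER keeps only solutions whose season title matches a regular expression, and ORDER BY sorts what remains. This sketch applies both to already-matched rows. The rows are hypothetical placeholders, not real DBpedia data. Python's re.search mirrors SPARQL regex() in matching anywhere in the string unless the pattern is anchored, and sorting on the season identifier's string value is a simplification of SPARQL's ordering rules.

```python
# Sketch: Example #3's FILTER and ORDER BY over already-matched rows.
# The rows are hypothetical placeholders, not DBpedia data.
import re

rows = [
    {"episode": "ep:3", "season": "cat:S2",
     "season_title": "The Simpsons episodes, season 2", "gag": "gag text C"},
    {"episode": "ep:1", "season": "cat:S1",
     "season_title": "The Simpsons episodes, season 1", "gag": "gag text A"},
    {"episode": "ep:9", "season": "cat:Other",
     "season_title": "Some other category", "gag": "gag text Z"},
]

# FILTER (regex(?season_title, "The Simpsons episodes, season"))
kept = [r for r in rows if re.search("The Simpsons episodes, season", r["season_title"])]
# ORDER BY ?season
kept.sort(key=lambda r: r["season"])

print([(r["episode"], r["gag"]) for r in kept])  # [('ep:1', 'gag text A'), ('ep:3', 'gag text C')]
```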
  94. Example #3 - Results
  95. http://www.milinkito.com/swf/bart.php
  96. Are *real* companies using Linked Data?
  97. Easy to play; takes work to master.
  98. …and many more!
  99. E-Commerce: a vocabulary to describe products, services, and other e-commerce terms.
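The GoodRelations vocabulary named on the following slides describes offers as ordinary triples. On a shop's Web page these are usually embedded as RDFa; the sketch below shows the same kind of statements as N-Triples instead, for a single offer with a price. The gr:, rdf: and xsd: namespace URIs are the published ones, and the class and property names are GoodRelations terms; the shop URIs, product name and price are hypothetical.

```python
# Sketch: a product offer described with GoodRelations terms, as N-Triples.
# Shop URIs, product name and price are hypothetical.
GR = "http://purl.org/goodrelations/v1#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"

def offer_triples(offer, price_spec, name, price, currency):
    """Describe one offer to sell something, with a unit price specification."""
    return [
        f"<{offer}> <{RDF_TYPE}> <{GR}Offering> .",
        f'<{offer}> <{GR}name> "{name}" .',
        f"<{offer}> <{GR}hasBusinessFunction> <{GR}Sell> .",
        f"<{offer}> <{GR}hasPriceSpecification> <{price_spec}> .",
        f"<{price_spec}> <{RDF_TYPE}> <{GR}UnitPriceSpecification> .",
        f'<{price_spec}> <{GR}hasCurrency> "{currency}" .',
        f'<{price_spec}> <{GR}hasCurrencyValue> "{price:.2f}"^^<{XSD_FLOAT}> .',
    ]

for line in offer_triples(
    "http://shop.example.org/offers/42",
    "http://shop.example.org/offers/42#price",
    "Example Lip Balm",
    12.5,
    "USD",
):
    print(line)
```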
  100. Who is using GoodRelations? • 1,100+ Best Buy stores
  101. Phase 2: ~640,000 “next-gen” product detail pages
  102. 21 Open Box Products listed at this store!
  103. Who is using GoodRelations?
  104. With RDFa + GoodRelations, but no additional SEO work, PlushBeautyBar.com was indexed by Google within one week.
  105. Semantic (Web) Technologies (diagram labels: Linked Enterprise Data, RDBMS, CRM, Calendars, Semantic Web, Linked Open Data)
  106. MIXING private and public data? Absolutely! And it’s really useful to do so!
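Because Linked Data names things with shared URIs, mixing private and public data can be as simple as a set union of triples, after which one query can walk from a private fact to a public one. This sketch is loosely in the spirit of the iConcertCal example that follows (concerts for artists in my own music library). All identifiers and facts are hypothetical, including the example.org artist URI and the me: and ex: prefixes.

```python
# Sketch: merging a private graph with a public one through a shared artist URI.
# All identifiers and facts are hypothetical.
ARTIST = "http://example.org/artist/Example_Band"

private_library = {  # e.g. what is in my own music collection
    ("me:library", "me:contains", "me:track17"),
    ("me:track17", "me:performedBy", ARTIST),
}

public_data = {  # e.g. published concert listings
    ("ex:concert1", "ex:performer", ARTIST),
    ("ex:concert1", "ex:city", "San Francisco"),
}

merged = private_library | public_data

# Concerts by artists whose tracks I own.
my_artists = {o for s, p, o in merged if p == "me:performedBy"}
concerts = sorted(
    (s, city)
    for s, p, o in merged if p == "ex:performer" and o in my_artists
    for s2, p2, city in merged if s2 == s and p2 == "ex:city"
)
print(concerts)  # [('ex:concert1', 'San Francisco')]
```

Neither dataset had to change its structure for the merge; the shared URI does all the linking.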
  107. Example: iConcertCal
  108. Public + Private Data: iConcertCal
  109. Public + Private Data: iConcertCal
  110. Example: Siri
  111. Siri.com: Siri is a Virtual Assistant. I ask it to do things for me. It does, by mixing data, by disambiguating, and by reasoning.
  117. Example: BBC
  118. • Largest broadcasting corporation in the world • 8 national TV channels • 10 national radio stations • 40 local radio stations • An extensive website, bbc.co.uk
  119. • Broadcasts 1,000–1,500 programs per day • Publishes information in several formats: audio, video, text • Needed to relate information across media for both users and third-party developers
  120. • Approach: create a Web presence for each broadcast, artist, and species (and other biological ranks), habitat and adaptation that the BBC has an interest in.
  121. "Creating web identifiers for every item the BBC has an interest in, and considering those as aggregations of BBC content about that item, allows us to enable very rich cross-domain user journeys." -- Yves Raimond
  122. • BBC Music is underpinned by the MusicBrainz music database and Wikipedia. • “BBC Music takes the approach that the Web itself is its content management system. [BBC] editors directly contribute to MusicBrainz and Wikipedia.”
  123. BBC • Wildlife Finder links existing LOD data with BBC content to make pages about each species, habitat and adaptation • Wildlife programmes (clips and episodes) are identified by tagging the clip or episode with the appropriate DBpedia URI.
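The Wildlife Finder approach of tagging each clip with a DBpedia URI, and treating each URI as an aggregation of content about that item, amounts to grouping content by identifier. This is a minimal sketch of that grouping, not the BBC's implementation. The clip IDs are hypothetical; the URIs follow DBpedia's http://dbpedia.org/resource/<Name> pattern.

```python
# Sketch: one "page" per DBpedia URI, built by grouping tagged clips.
# Clip IDs are hypothetical.
from collections import defaultdict

DBPEDIA = "http://dbpedia.org/resource/"

clip_tags = [
    ("clip-001", DBPEDIA + "Lion"),
    ("clip-002", DBPEDIA + "Cheetah"),
    ("clip-003", DBPEDIA + "Lion"),
]

def aggregate(tags):
    """Group content by the identifier it is about."""
    pages = defaultdict(list)
    for clip, uri in tags:
        pages[uri].append(clip)
    return dict(pages)

for uri, clips in sorted(aggregate(clip_tags).items()):
    print(uri, clips)
```

Because the identifiers are public DBpedia URIs, the same grouping key also links each page to whatever other datasets say about that species.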
  124. "The RDF representations of these web identifiers allow developers to use our data to build applications." -- Yves Raimond
  125. A few final thoughts
  126. A little bit can be very powerful!
  127. RDFS, RDF, RDFa, OWL, triple, Web 3.0 = Semantic Web, SPARQL, Linked Data, SKOS
  128. RDFS, Dublin Core, OWL DL, OWL Full, NLP, RDF, RDFa, triplestore, PURLs, OWL, OWL 2, triple, OWL Lite, vocabulary, microdata, ontology, folksonomy, subject, predicate, object, entity extraction, Web 3.0 = Semantic Web, SPARQL, microformats, REST, GRDDL, taxonomy, URI, Artificial Intelligence, cloud computing, open world reasoning, LOD, LED, reasoning engine, Linked Data, data portability, SKOS
  129. Further Reading …and more to come!
  130. THANK YOU! Questions? Operators are standing by. EricAxel@yahoo.com
  131. Semantic Technology Conference, www.Semantic-Conference.com, June 21-25, 2010 • Semantic Universe, Free Informational Resource, www.SemanticUniverse.com
  132. Resources • http://geekandpoke.typepad.com/ • http://richard.cyganiak.de/2007/10/lod/ • http://iconcertcal.com • http://siri.com • http://data.nytimes.com • http://freedigitalphotos.com • http://aldobucchi.com • http://www.milinkito.com/swf/bart.php
  133. Resources • http://www.flickr.com/photos/kellyhogaboom/4369774518/ • http://www.flickr.com/photos/zenera/56677048/ • http://www.flickr.com/photos/97964364@N00/59780745/ • http://www.flickr.com/photos/starwarsblog/793008715/ • http://www.flickr.com/photos/peterpearson/871254091/ • http://www.flickr.com/photos/birdfarm/60946474/ • http://www.flickr.com/photos/entropy1138/173847148/ • http://www.flickr.com/photos/wainwright/351684037/ • http://data.nytimes.com/50891932523096258603.rdf
