Semantic Web and Linked Open Data

Marine Biological Laboratory, Woods Hole MA
Library and Informatics Group

Published in: Education

  1. The Semantic Web and Linked Open Data. Pete DeVries, TaxonConcept.org, http://www.taxonconcept.org/, Department of Entomology, University of Wisconsin - Madison
  2. What is the Semantic Web and how does it Work? Let's look at the traditional way: a Taxon table and a Location table. This data structure is really only interpretable within the context of this specific database.
  3. Data Islands. The result is a set of database islands that contain a lot of redundant data, each independently curated. Each effort benefits little from the other efforts.
  4. Data Sets often Overlap. What they don’t have is a common set of field names or IDs.
  5. Each Data Set has its own “Vocabulary”: different fields, different names for the same fields, the same names for different fields, and different ways of interpreting those fields. These nuances in meaning are often understood only by the designers of each individual data set. Consider how differently people interpret the meaning of what seem to be the same terms.
  6. Where the Semantic Web Helps: Tim Berners-Lee’s 4 Rules
     1. Use URIs* as names for things.
     2. Use HTTP URIs so that people can look up those names.
     3. When someone looks up a URI, provide useful information.
     4. Include links to other URIs so that they can discover more things.
     *URI = Uniform Resource Identifier. Source: http://www.w3.org/DesignIssues/LinkedData.html
  7. Use URIs as Names for Things? Instead of “Door County” use http://sws.geonames.org/5250768/
  8. For humans, this URI dereferences to a human-interpretable web page.
  9. For machines, this URI dereferences to a machine-interpretable file, shown here as N-Triples.
  10. Why Would Anyone Think this Made Sense? Now each of these different databases is using an ID with a shared meaning, a meaning that can be determined by dereferencing the URI. All the data sets that use this vocabulary are now connectable. All the data sets that are linked to this URI are now also linked to each other.
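
The human and machine views of one URI described above are usually selected with HTTP content negotiation. The following is a minimal sketch of that idea: it builds two requests for the Door County URI from the slides that differ only in their `Accept` header. The helper name `build_request` is made up for illustration, and the sketch assumes the server honors the `Accept` header; which media types a given server actually returns is not stated in the slides.

```python
import urllib.request

# URI for Door County, taken from the slides.
DOOR_COUNTY = "http://sws.geonames.org/5250768/"


def build_request(uri, accept):
    """Build (but do not send) a GET request asking for one representation."""
    return urllib.request.Request(uri, headers={"Accept": accept})


# A browser-like request for the human-readable page.
human = build_request(DOOR_COUNTY, "text/html")
# A client asking for RDF; assumes the server supports content negotiation.
machine = build_request(DOOR_COUNTY, "application/rdf+xml")

for request in (human, machine):
    print(request.full_url, "Accept:", request.get_header("Accept"))

# To actually dereference the URI (requires network access):
# with urllib.request.urlopen(machine) as response:
#     print(response.read().decode("utf-8"))
```

The same name serves both audiences; only the requested representation changes.
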
  11. Life Sciences Example. Two databases hold county records. One uses “La Crosse County”; the other lists “La Crosse” for La Crosse County, Wisconsin. You want to link and merge those records so that it is clear that you mean a particular species was observed in a particular county.
  12. Normalize the Meaning between Data Sources. Use this shared vocabulary to integrate these two data sources. Use that shared vocabulary to find and link to other relevant data.
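
As a rough sketch of the normalization step, the two county labels can be mapped to the single GeoNames URI that the slides use for La Crosse County, after which records from both sources group under one key. The lookup table, the function names, and the two sample records are invented for illustration; a real mapping would need curation, since a bare “La Crosse” could also refer to the city.

```python
# GeoNames URI for La Crosse County, WI, as used in the slides.
LA_CROSSE_COUNTY = "http://sws.geonames.org/5258961/"

# Hand-curated lookup (hypothetical): local labels -> shared URI.
COUNTY_URIS = {
    "la crosse county": LA_CROSSE_COUNTY,
    "la crosse": LA_CROSSE_COUNTY,
}


def county_uri(label):
    """Normalize case and whitespace, then look up the shared URI."""
    key = " ".join(label.lower().split())
    return COUNTY_URIS.get(key)


def merge_by_county(*sources):
    """Group species names from several record lists by county URI."""
    merged = {}
    for source in sources:
        for record in source:
            uri = county_uri(record["county"])
            merged.setdefault(uri, []).append(record["species"])
    return merged


# Made-up sample records standing in for the two databases.
db_a = [{"species": "Ochlerotatus triseriatus", "county": "La Crosse County"}]
db_b = [{"species": "Boloria selene", "county": "La Crosse"}]

merged = merge_by_county(db_a, db_b)
for uri, species in merged.items():
    print(uri, species)
```

Once both sources carry the URI instead of a free-text label, the merge no longer depends on how each database spelled the county.
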
  13. As More Data Sets Adopt these Principles. The individual data sets are no longer islands, but one interconnected knowledge base.
  14. Other Benefits. Reduced duplication of effort and a better separation of concerns. It would be more efficient for me to simply link to a bibliographic reference URI on a site that specializes in that than to create my own bibliographic database. Similarly, it would be more efficient for the bibliographic database to link to a URI in a nomenclatural database than to curate that aspect separately. When represented as URIs in a Semantic Web database or “triple store”, information can be encoded more efficiently (~32 bytes per statement), enabling usable knowledge bases that scale to billions of “facts”.
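
To make the ~32 bytes per statement figure concrete, here is a back-of-the-envelope calculation. It takes the slide's figure at face value and assumes it covers the full cost of a stored statement; real stores add indexes and literal storage, which this sketch ignores. The function name is made up for illustration.

```python
# Approximate storage per statement, as quoted on the slide.
BYTES_PER_STATEMENT = 32


def store_size_gb(statements):
    """Estimated raw size in decimal gigabytes (1 GB = 10**9 bytes)."""
    return statements * BYTES_PER_STATEMENT / 1e9


one_billion = 1_000_000_000
print(store_size_gb(one_billion), "GB for one billion statements")
```

Under these assumptions a billion statements need on the order of tens of gigabytes, which is why knowledge bases of this size are practical on ordinary server hardware.
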
  15. Example: The Linked Open Data Cloud. Over 55 billion triples and rising.
  16. What is Linked Open Data?
     1. Data representation using open standards.
     2. Use of hyperlinks to make it work on the global web.
  17. Wikipedia Images linked to my Species Concepts. TaxonConcept <=> DBpedia <=> WikiCommons Images. Virtuoso OpenSource and Microsoft Pivot (some images are too large to display).
  18. How do I Mark up my Data? Your data set can continue to exist in its current relational database form, but you need to expose it to the Semantic Web in a different form. The goal is to make structured data accessible and discoverable via hyperlinks. It also includes the use of hyperlinks to denote properties/predicates that have well-defined semantics. These semantics are what ontologies and vocabularies deliver with more fidelity than what's available in a typical RDBMS. Thus, the Semantic Web isn't a destination; it is the effect of publishing data in line with a set of principles as outlined in TimBL's meme.
  19. Knowledge as Triples. Statements are represented in a triple structure: Subject ➜ Predicate ➜ Object
     • An English text version of a triple might look like:
     • Ochlerotatus triseriatus expected in La Crosse County, WI
  20. Machine Processable Version. “Ochlerotatus triseriatus is expected in La Crosse County, WI” is now represented as the following triple*:
     http://lod.taxonconcept.org/ses/iuCXz#Species
     http://lod.taxonconcept.org/ontology/txn.owl#isExpectedIn
     http://sws.geonames.org/5258961/
     *Not meant for human consumption
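
The triple above can be written out as a single N-Triples line, where each URI is wrapped in angle brackets and the statement ends with a period. This is a minimal sketch that handles only URI terms (no literals or blank nodes); the function name `to_ntriples` is made up for illustration.

```python
# The three URIs from the slide: subject, predicate, object.
SUBJECT = "http://lod.taxonconcept.org/ses/iuCXz#Species"
PREDICATE = "http://lod.taxonconcept.org/ontology/txn.owl#isExpectedIn"
OBJECT = "http://sws.geonames.org/5258961/"


def to_ntriples(subject, predicate, obj):
    """Serialize one all-URI triple as a line of N-Triples."""
    return f"<{subject}> <{predicate}> <{obj}> ."


line = to_ntriples(SUBJECT, PREDICATE, OBJECT)
print(line)
```

Because every statement is one self-contained line, N-Triples files are easy to generate from a script and easy to split or concatenate.
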
  21. Expressing RDF. RDF = Resource Description Framework. Ways to express RDF (serialization formats):
     • RDF/XML: http://www.w3.org/TR/REC-rdf-syntax/
     • Notation 3 (N3): http://www.w3.org/DesignIssues/Notation3.html
     • Subsets of N3: Turtle (Terse RDF Triple Language) and N-Triples
  22. The Same Triple in Different Formats: RDF/XML (.rdf), N3 (.n3), Turtle (.ttl). You might find one of these forms easier to create. There are various tools that will allow you to convert from one form to another. If you need RDF/XML but can create N3, author in N3 and then convert those files to RDF/XML.
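
As an illustration of how the formats differ, the sketch below writes the “expected in” triple from the earlier slide in Turtle, which lets a declared prefix shorten long URIs. It is a hand-rolled sketch for URI-only triples, not a general converter; in practice one of the conversion tools mentioned above would be used. The prefix name `txn` and the helper names are chosen for illustration.

```python
# The triple from the slides: subject, predicate, object.
TRIPLE = (
    "http://lod.taxonconcept.org/ses/iuCXz#Species",
    "http://lod.taxonconcept.org/ontology/txn.owl#isExpectedIn",
    "http://sws.geonames.org/5258961/",
)

# Prefix chosen for illustration; the namespace is from the slide.
PREFIXES = {"txn": "http://lod.taxonconcept.org/ontology/txn.owl#"}


def abbreviate(uri, prefixes):
    """Return prefix:local if a namespace matches, else the full <uri>."""
    for prefix, namespace in prefixes.items():
        local = uri[len(namespace):]
        # Only abbreviate simple alphanumeric local names.
        if uri.startswith(namespace) and local.isalnum():
            return f"{prefix}:{local}"
    return f"<{uri}>"


def to_turtle(triple, prefixes):
    """Serialize one all-URI triple as a small Turtle document."""
    header = [f"@prefix {p}: <{ns}> ." for p, ns in prefixes.items()]
    subject, predicate, obj = (abbreviate(term, prefixes) for term in triple)
    return "\n".join(header + ["", f"{subject} {predicate} {obj} ."])


turtle = to_turtle(TRIPLE, PREFIXES)
print(turtle)
```

The statement is the same in every serialization; Turtle simply trades some parsing simplicity for shorter, more readable files.
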
  23. How do I tell the Semantic Web about my Data? PingtheSemanticWeb (http://pingthesemanticweb.com/) and Semantic Sitemaps (http://sw.deri.org/2007/07/sitemapextension/).
  24. PingtheSemanticWeb.com. Enter the URL for your RDF documents.
  25. Semantic SiteMaps. http://site.example.com/sitemap.xml or http://site.example.com/sitemap.xml.gz. Refer to the sitemap.xml file in your site's robots.txt file.
  26. How can I Find other Potentially Useful Data Sets? CKAN, the Comprehensive Knowledge Archive Network: http://ckan.net/
  27. Ask the LOD Cloud. Enter a term or name, like “Quercus alba”, to see what entities contain that term or name.
  28. LOD Cloud Query Result
  29. How can I set up my own Knowledge Base? Virtuoso Open-Source Edition: http://virtuoso.openlinksw.com/
  30. How can I Query a Knowledge Base? SPARQL: http://en.wikipedia.org/wiki/SPARQL and http://www.w3.org/TR/rdf-sparql-query/. Query using the web interface, or query using your own script or web application. Example: “Describe those occurrences of the species concept Boloria selene”.
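
Querying from your own script usually means sending the query text to a SPARQL endpoint over HTTP as a `query` parameter. The sketch below builds such a request for a `DESCRIBE` query. The endpoint URL is a placeholder, not a real service, and because the slides do not give a URI for Boloria selene, the example describes the Ochlerotatus triseriatus species concept URI that appears earlier in the deck. The function name and the requested `text/turtle` result format are also assumptions.

```python
import urllib.parse
import urllib.request

# Placeholder endpoint (hypothetical); substitute your knowledge base's URL.
ENDPOINT = "http://example.org/sparql"

# Species concept URI from an earlier slide (Ochlerotatus triseriatus).
SPECIES = "http://lod.taxonconcept.org/ses/iuCXz#Species"


def describe_request(endpoint, resource_uri):
    """Build (but do not send) a GET request carrying a DESCRIBE query."""
    query = f"DESCRIBE <{resource_uri}>"
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "text/turtle"},  # assumed result format
    )


request = describe_request(ENDPOINT, SPECIES)
print(request.full_url)

# To run the query against a real endpoint (requires network access):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```
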
  31. iSPARQL Query Example: Web Interface
  32. iSPARQL Query Result
  33. More Elaborate SPARQL Query. Query for those mammals that are “expected in” Wisconsin.
     • Use the OPTIONAL keyword for those attributes that may not exist.
     • The query lists those attributes that should be returned.
     The result set will be fed through Microsoft Pivot for browsing.
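
The slide's actual query is not reproduced in the transcript, so the following is only a sketch of the pattern it describes: a required “expected in” match plus OPTIONAL clauses for attributes that may be missing. It differs from the slide in two labeled ways: it uses the La Crosse County URI (the only place URI given in the deck) rather than Wisconsin, and it omits the mammal restriction because no class URI for mammals appears in the source. The choice of `rdfs:label` and `foaf:depiction` as optional attributes and the function name are assumptions.

```python
# Place URI from the slides (La Crosse County, WI).
LA_CROSSE_COUNTY = "http://sws.geonames.org/5258961/"

# Attributes that may not exist for every species (assumed choices).
OPTIONAL_PROPERTIES = {
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
    "image": "http://xmlns.com/foaf/0.1/depiction",
}


def expected_in_query(place_uri, optional_properties):
    """Build a SELECT query: species expected in a place, plus optional attributes."""
    variables = " ".join(f"?{name}" for name in optional_properties)
    lines = [
        "PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#>",
        f"SELECT ?species {variables}",
        "WHERE {",
        f"  ?species txn:isExpectedIn <{place_uri}> .",
    ]
    # Each OPTIONAL block keeps the row even when the attribute is absent.
    for name, prop in optional_properties.items():
        lines.append(f"  OPTIONAL {{ ?species <{prop}> ?{name} . }}")
    lines.append("}")
    return "\n".join(lines)


query = expected_in_query(LA_CROSSE_COUNTY, OPTIONAL_PROPERTIES)
print(query)
```

Without OPTIONAL, a species lacking an image or label would drop out of the result set entirely, which is rarely what a browsing interface wants.
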
  34. Result View. Live query of the LOD Cloud data set.
  35. Efforts to Align Vocabularies: http://labs.mondeca.com/dataset/lov/index.html
  36. Early EoL LOD
  37. Early EoL LOD
  38. Early EoL LOD Knowledge Base View
  39. What does the Future hold for the Semantic Web and Linked Open Data?
     • Improvements in the quantity and quality of LOD data sets
     • Improved alignment of vocabularies
     • Improvements in SPARQL and quadstores
     • Human and machine interpretable views merged in RDFa
     • Better visualization and analysis tools
  40. Other Resources
     • Linked Open Data: http://linkeddata.org/
     • W3C.org: http://esw.w3.org/Main_Page
     • public-lod email list: http://lists.w3.org/Archives/Public/public-lod/
     • TaxonConcept.org: http://www.taxonconcept.org/
     • TaxonConcept.org Examples: http://bit.ly/bundles/pjdlinkeddata/
     • SlideShare Talks: Evolution Towards Web 3.0: The Semantic Web, http://www.slideshare.net/LeeFeigenbaum/evolution-towards-web-30-the-semantic-web
  41. Recommendations
     • Try using and experimenting with existing vocabularies before creating your own.
     • Although these technologies allow you to run queries that you might not have anticipated, thinking about use cases will provide some guidance on the best way to mark up your data.
     • Start with simple models and representations, and add complexity as you gain experience.
     • You may not want or be able to expose all your data to the LOD Cloud, but exposing the metadata in commonly used vocabularies will make your data more “findable”.
     • Some vocabularies* are still under development and discussion, but in many cases you can modify your SQL-to-RDF export to accommodate changes.
     *For instance, it is not clear to me what the “best” vocabulary for representing publications is.
  42. Acknowledgments. Kingsley Idehen (http://www.openlinksw.com/blog/~kidehen/), David “Paddy” Patterson (mbl.edu), Anne Thessen (mbl.edu), Dmitry Mozzherin (mbl.edu), Han Wang (rpi.edu), Patrick Leary (eol.org)