Linked Data (1st Linked Data Meetup Malmö)


Published on

  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Linked Data (1st Linked Data Meetup Malmö)

  1. 1. Linked Data -Evolving the Web into a Global Data SpaceAnja Jentzsch - @anjeveHasso-Plattner-Institute, Potsdam, Germany1st Linked Data Meetup @ Foo Café, Malmö2103/05/23
  2. 2. Outline1. Foundations of Linked Data• What is the vision and goal?2. The Web of Linked Data• What data is out there?• What is being done with the data?3. How to publish and consume Linked Data?
  3. 3. Architecture of the classic WebB CHTMLHTMLWebBrowsersSearchEngineshyper-linksAHTML• Single global document space• Small set of simple standards• HTML as document format• HTTP URLs as globally unique IDs• Retrieval mechanism: Hyperlinks to connecteverything
  4. 4. Web 2.0 APIs and MashupsNo single global dataspaceShortcomings1. APIs have proprietary interfaces2. Mashups are based on a fixed set ofdata sources3. No hyperlinks between data itemswithin different APIsWebAPIAMashupWebAPIBWebAPICWebAPID
  5. 5. Web APIs slice the Web into Walled GardensImage: Bob Jagensdorf,, CC-BY
  6. 6. Extend the Web with a single global dataspace1. by using RDF to publish structured data on the Web2. by setting links between data items within different data sourcesLinked DataB CRDFRDFLinksA D ERDFLinksRDFLinksRDFLinksRDFRDFRDFRDFRDF RDFRDFRDFRDF
  7. 7. Linked Data PrinciplesSet of best practices for publishing structured data on the Web inaccordance with the general architecture of the Web.1. Use URIs as names for things.2. Use HTTP URIs so that people can look up those names.3. When someone looks up a URI, provide useful RDF information.4. Include RDF statements that link to other URIs so that they can discoverrelated things.Tim Berners-Lee,, 2006
  8. 8. The RDF Data ModelAnja Jentzschdbpedia:Berlinfoaf:namefoaf:based_nearfoaf:Personrdf:typens:anja
  9. 9. Data Items are identified with HTTP URIsns:anja = = Jentzsch
  10. 10. Resolving URIs over the Webdp:Cities_in_Germany3.499.879dp:populationskos:subjectdbpedia:Berlinfoaf:namefoaf:based_nearfoaf:Personrdf:typens:anjaAnja Jentzsch
  11. 11. Dereferencing URIs over the Webdbpedia:Hamburgdbpedia:Muenchenskos:subjectskos:subjectdp:Cities_in_Germany3.499.879dp:populationskos:subjectdbpedia:Berlinfoaf:namefoaf:based_nearfoaf:Personrdf:typens:anjaAnja Jentzsch
  12. 12. RDF Representation Formats• RDF/XML<rdf:RDF xmlns:rdf=""xmlns:foaf=""><foaf:Person rdf:about=""><foaf:name>Anja Jentzsch</foaf:name></foaf:Person>• RDF N-Triples<> <> <> .<> <> „AnjaJentzsch“ .• RDF N3
  13. 13. RDF Representation Formats<> <> <> .• Subject Predicate Object• In the end it‘s all triples!foaf:Personrdf:typens:anja
  14. 14. Properties of the Web of Linked Data• Global, distributed dataspace build on a simple set of standards• RDF, URIs, HTTP• Entities are connected by links• creating a global data graph that spans data sources and• enables the discovery of new data sources• Provides for data-coexistence• Everyone can publish data to the Web of Linked Data• Everyone can express their personal view on things• Everybody can use the vocabularies/schemas that they like
  15. 15. W3C Linking Open Data Project• Grassroots community effort to• publish existing open license datasets as Linked Data on the Web• interlink things between different data sources
  16. 16. LOD Data Sets on the Web: May 2007• 12 data sets• Over 500 million RDF triples• Around 120,000 RDF links between data sources
  17. 17. LOD Data Sets on the Web: November 2007• 28 data sets
  18. 18. LOD Data Sets on the Web: September 2008• 45 data sets• Over 2 billion RDF triples
  19. 19. LOD Data Sets on the Web: July 2009• 95 data sets• Over 6.5 billion RDF triples
  20. 20. LOD Data Sets on the Web: September 2010• 203 data sets• Over 24,7 billion RDF triples• Over 436 million RDF links between data sources
  21. 21. LOD Data Sets on the Web: September 2011• 295 data sets• Over 31 billion RDF triples• Over 504 million RDF links between data sources
  22. 22. The Growth in Numbers0350000000002007 2008 2009 2010 2011Year Data Sets Triples Growth2007 12 500,000,0002008 45 2,000,000,000 300%2009 95 6,726,000,000 236%2010 203 26,930,509,703 300%2011 295 31,634,213,770 33%
  23. 23. LOD Data Set statistics as of 09/2011LOD Cloud Data Catalog on the Data Hub• statistics•
  24. 24. DBpedia –The Hubon the Web of Data• DBpedia is a joint project with the following goals• extracting structured information from Wikipedia• publish this information under an open license on the Web• setting links to other data sources• Partners• Freie Universität Berlin (Germany)• Universität Leipzig (Germany)• OpenLink Software (UK)• neofonie (Germany)
  25. 25. Extracting structured data from Wikipedia
  26. 26. Extracting structured data from Wikipediadbpedia:Berlin rdf:type dbpedia-owl:City ,dbpedia-owl:PopulatedPlace ,dbpedia-owl:Place ;rdfs:label "Berlin"@en , "Berlino"@it ;dbpedia-owl:population 3499879 ;wgs84:lat 52.500557 ;wgs84:long 13.398889 .!dbpedia:SoundCloud dbpedia-owl:location dbpedia:Berlin .• access to DBpedia data:• dumps• SPARQL endpoint• Linked Data interface
  27. 27. The DBpedia Data Set• Information on more than 3.77 million “things”• 764,000 persons• 192,000 organisations• 573,000 places• 112,000 music albums• 72,000 movies• 202,000 species• overall more than 1 billion RDF triples• title and abstract in 111 different languages• 8,000,000 links to images• 24,400,000 links to external web pages• 27,200,000 links to other Linked Data sets
  28. 28. DBpedia Mappings• since March 2010 collaborative editing of• DBpedia ontology• mappings from Wikipedia infoboxes and tables to DBpedia ontology• curated in a public wiki with instant validation methods•• multi-langual mappings to the DBpedia ontology:• ar, bg, bn, ca, cs, de, el, en, es, et, eu, fr, ga, hi, hr, hu, it, ja, ko, nl, pl, pt, ru, sl,tr• allows for a significant increase of the extracted data’s quality• each domain has its experts
  29. 29. DBpedia Use Cases1. Improvement of Wikipedia search2. Data source for applications and mashups3. Text analysis and annotation4. Hub for the growing Web of Data
  30. 30. DBpedia Mobile• displays Wikipedia data on map• aggregates different data sources
  31. 31. Faceted Wikipedia Search• faceted browsing and free text search
  32. 32.
  33. 33. Uptake in the Government Domain• The EU is pushing Linked Data (LOD2, LATC, EuroStat)• W3C eGovernment Interest Group
  34. 34. Uptake in the Libraries Community• Institutions publishing Linked Data• Library of Congress (subject headings)• German National Library (PND dataset and subject headings)• Swedish National Library (Libris - catalog)• Hungarian National Library (OPAC and Digital Library)• Europeana project just released data about 4 million artifacts• W3C Library Linked Data Incubator Group• OKFN Working Group on Bibliographic Data• Goals:• Integrate library catalogs on global scale• Interconnect resources between repositories (by topic, by location, byhistorical period, by ...)
  35. 35. Uptake in Life Sciences• W3C Linking Open Drug Data Effort• Bio2RDF Project• Allen Brain Atlas• Goal: Smoothly integrate internal and external data in a pay-as-you-go-fashion.
  36. 36. Uptake in the Media Industry• Publish data as RDF/XML or RDFa• Goal: Drive traffic to websites viasearch engines
  37. 37. Linked Data Applications
  38. 38. Linked Data Browsers• Tabulator Browser (MIT, USA)• Marbles (FU Berlin, DE)• OpenLink RDF Browser (OpenLink, UK)• Zitgist RDF Browser (Zitgist, USA)• Humboldt (HP Labs, UK)• Disco Hyperdata Browser (FU Berlin, DE)• Fenfire (DERI, Irland)
  39. 39. Web of Data Search Engines• (DERI, Ireland)• Falcons (IWS, China)• Swoogle (UMBC, USA)• VisiNav (DERI, Ireland)• Watson (Open University, UK)
  40. 40.• jointly proposed vocabularies for embedding data into HTML pages (Microdata)• available since June 2011
  41. 41. • What does Linked Data mean to startups?Economic Opportunities
  42. 42. Huge Reservoir of Free Content• can be used to provide background facts about• places,• people,• topics• …Background Facts
  43. 43. Lower Data Integration CostsThe overall data integration effort is splitbetween the data publisher, the data consumerand third parties.• Data Publisher– publishes data as RDF– sets identity links– reuses terms or publishes mappings• Third Parties– set identity links pointing at your data– publish mappings to the Web• Data Consumer– has to do the rest– using record linkage and schema matchingtechniques
  44. 44. Options for Search Companies• new vertical search engines•, FalconS, …• Yahoo and Google are not sleeping• improving text search with background knowledge
  45. 45. Options for Data Integration Intermediaries• business model• crawl, integrate and clean Web data• sell clean data to subscribers• existing Linked Data integration intermediaries
  46. 46. Is your data 5 star?Tim Berners-Lee,, 2010Make your stuff available on the Web (whatever format) underan open license.Make it available as structured data (e.g., Excel instead of imagescan of a table) so that it can be reused.Use non-proprietary, open formats (e.g., CSV instead of Excel).Use URIs to identify things, so that people can point at your stuffand serve RDF from it.Link your data to other data to provide context.★★ ★★ ★ ★★ ★ ★ ★★ ★ ★ ★ ★
  47. 47. How to publish Linked Data1. Tasks: Make data available as RDF via HTTP2. Set RDF links pointing at other data sources3. Make your data self-descriptive4. Reuse common vocabulariesTom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space
  48. 48. Linked Data Publishing Patterns
  49. 49. Make Data available as RDF via HTTP•Ready to use tools (examples)• D2R Server• provides for mapping relationaldatabases into RDF and forserving them as Linked Data• Pubby• Linked Data Frontend forSPARQL Endpoints• More tools•
  50. 50. Set RDF links to other data sources• Examples of RDF links<> owl:sameAs <> .<> foaf:topic_interest<> .<> owl:sameAs <> .
  51. 51. How to generate RDF links?• Pattern-based approaches• Exploit naming conventions within URIs (for instance ISBNs, ISINs, …)• Similarity-based approaches• Compare items within different data sources using various similarity metrics• Ready to use tools (Examples)• Silk Link Discovery Framework• provides a declarative language for specifying link conditionswhich may combine different similarity metrics• Silk Single Machine, Silk MapReduce• More tools•
  52. 52. Make your Data Self-Descriptive• Increase the usefulness of your data and ease data integration• Aspects of self-descriptiveness• Enable clients to retrieve the schema• Reuse terms from common vocabularies• Publish schema mappings for proprietary terms• Provide provenance metadata• Provide licensing metadata• Provide data-set-level metadata using voiD• Refer to additional access methods using voiD
  53. 53. Enable Clients to retrieve the SchemaClients can resolve the URIs that identifyvocabulary terms in order to get their RDFS orOWL definitions.<>rdf:type owl:Class ;rdfs:label "Person";rdfs:subClassOf <> ;rdfs:subClassOf <> .<>foaf:name "Richard Cyganiak" ;rdf:type <> .RDFS or OWL definitionSome data on the WebResolve unknown term
  54. 54. ReuseTerms from CommonVocabularies• CommonVocabularies• Friend-of-a-Friend for describing people and their social network• SIOC for describing forums and blogs• SKOS for representing topic taxonomies• Organization Ontology for describing the structure of organizations• GoodRelations provides terms for describing products and business entities• Music Ontology for describing artists, albums, and performances• ReviewVocabulary provides terms for representing reviews• Common sources of identifiers (URIs) for real world objects• LinkedGeoData and Geonames locations• GeneID and UniProt life science identifiers
  55. 55. How to consume Linked Data
  56. 56. Integrated Linked Data ConsumptionTools• LDIF – Linked Data Integration Framework• provides for data translation and identity resolution•• ALOE - Assisted Linked Data Consumption framework• provides for data translation and identity resolution•• Information Workbench• provides for visualization and application development•
  57. 57. 62Finding Linked Data sets• Search engines• find data sets based on keywords• Data catalogs / directories• explore data sets and faceted search• Data Marketplaces• explore and consume data sets
  58. 58. 63The Data Hub• Manually curated list of (>3.500) data sets, at least 356 Linked Data Sets• Various metadata for each data set
  59. 59. 64
  60. 60. Conclusion• The Web of Data is growing fast• Linked Data provides a standardized data access interface• Linked Data allows for the development of a variety of tools to integrate,enhance and and view the data• Web search is evolving into query answering• Search engines will increasingly rely on structured data from the Web• Next step: Linked Data within enterprises• alternative to data warehouses and EAI middleware• advantages: schema-less data model, pay-as-you go data integration
  61. 61. Thanks!References:• Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space• Linking Open Data Project Wiki anja@anjeve.deTwitter: @anjeve