EDF2012: The Web of Data and its Five Stars


  1. 1. The Web of Data and its Five Stars Richard Cyganiak, DERI, NUI Galway @cygri 6 June 2012 Realising and Exploiting the EU data cloud European Data Forum, Copenhagen, Denmark
  2. 2. Generating insight from data •  Today, data is abundant •  New middlemen find new ways of getting data to the end user •  Supply and demand for data higher than ever •  Analysts' problem is no longer a lack of relevant data, but: •  Understanding data •  Assessing applicability •  Getting it into the right form for use •  Similar problems inside and outside of the firewall
  3. 3. From the Web to the Web of Data
  4. 4. Tim Berners-Lee’s 5-star plan for an open web of data ★ Make data available on the Web under an open license ★★ Make it available as structured data ★★★ Use a non-proprietary format ★★★★ Use URIs to identify things ★★★★★ Link your data to other people’s data to provide context
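
As a rough illustration of the plan above, the five criteria can be treated as a cumulative checklist. This is a minimal sketch that assumes each star requires all the earlier ones; the `Dataset` class, its field names, and the three sample datasets are hypothetical and not part of the talk.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset."""
    open_license: bool            # 1 star: on the Web under an open license
    structured: bool              # 2 stars: available as structured data
    non_proprietary_format: bool  # 3 stars: non-proprietary format
    uses_uris: bool               # 4 stars: URIs identify things
    links_to_other_data: bool     # 5 stars: links to other people's data


def star_rating(ds: Dataset) -> int:
    """Count stars, stopping at the first unmet criterion (assumed cumulative)."""
    criteria = [
        ds.open_license,
        ds.structured,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# Made-up examples: an openly licensed PDF, a CSV file, and linked RDF.
pdf_report = Dataset(True, False, False, False, False)
csv_file = Dataset(True, True, True, False, False)
linked_rdf = Dataset(True, True, True, True, True)

if __name__ == "__main__":
    for name, ds in [("PDF", pdf_report), ("CSV", csv_file), ("RDF", linked_rdf)]:
        print(name, "*" * star_rating(ds))
```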
  5. 5. The 0th star •  Data catalog with good metadata •  Make your data findable
  6. 6. Data on the Web, Open License ★
  7. 7. Open Data
  8. 8. Government data catalogs
  9. 9. Open vs. Closed Data used to be closed by default. In the future, it will be open by default.
  10. 10. Is open data just for governments?
  11. 11. Good reasons against opening data •  Privacy •  Competitive advantage •  Producing data and charging for it as business model •  Can't get a license from upstream
  12. 12. Business models Scott Brinker, http://www.chiefmartec.com/2010/01/7-business-models-for-linked-data.html
  13. 13. Data licenses http://opendefinition.org/licenses/
  14. 14. Structured Data ★★
  15. 15. Enabling re-use •  Delivering data to end users in different forms •  Combining data with other data •  3rd party analysis of data
  16. 16. Formats in government data •  Good for re-use: MS Excel, CSV, XML, JSON, Microdata •  Not so good for re-use: Pure websites, MS Word •  Bad for re-use: PDF •  Really bad for re-use: Only charts/maps without numbers
  17. 17. Symptom: Screenscraping
  18. 18. Non-Proprietary Formats ★★★
  19. 19. Specialist formats •  Specialist tools often have specialist formats •  Few people have the tools •  Expensive •  Difficult to re-use •  (Geospatial tools, statistics packages, etc.)
  20. 20. Non-proprietary formats, open standards •  CSV (dead simple) •  XML •  JSON •  RDF (good for 4+5 stars) •  OGC web services •  OAI-ORE web services
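
One practical benefit of the open formats listed above is that standard tooling can move data between them. The sketch below converts CSV to JSON using only the Python standard library; the sample rows and the `csv_to_json` helper are made up for illustration.

```python
import csv
import io
import json

# Made-up sample data; any CSV with a header row would work the same way.
SAMPLE_CSV = "id,name,population\n1,Exampleville,1200\n2,Sampleton,860\n"


def csv_to_json(text: str) -> str:
    """Parse CSV text with a header row and return it as a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    print(csv_to_json(SAMPLE_CSV))
```

Note that CSV carries no type information, so every value comes out as a string; a richer format has to be told which columns are numbers.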
  21. 21. Use URIs as Identifiers ★★★★
  22. 22. http://www.bbc.co.uk/music/artists/79239441-bfd5-4981-a70c-55c3f15c1287
  23. 23. http://data.ordnancesurvey.co.uk/id/postcodeunit/HA99HD
  24. 24. http://opencorporates.com/companies/us_vt/F013910
  25. 25. Turning local identifiers into URIs–Why? •  Make them globally unique •  Clarify authority •  Make them resolvable •  Make them linkable http://data.ordnancesurvey.co.uk/id/7000000000017765
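
Turning a local identifier into a URI can be as simple as prefixing it with a namespace that the publisher controls. This is a minimal sketch, not the scheme used by any of the sites shown; the base URI `http://data.example.org/id/` and the `mint_uri` helper are assumptions for illustration.

```python
from urllib.parse import quote

# Hypothetical namespace; a real publisher would use a domain it controls,
# which is what makes the authority behind the identifier clear.
BASE = "http://data.example.org/id/"


def mint_uri(kind: str, local_id: str, base: str = BASE) -> str:
    """Build a globally unique HTTP URI from a local identifier.

    Both path segments are percent-encoded so that spaces or slashes in a
    local identifier cannot break the URI structure.
    """
    return f"{base}{quote(kind, safe='')}/{quote(local_id, safe='')}"


if __name__ == "__main__":
    print(mint_uri("postcodeunit", "HA99HD"))
    print(mint_uri("postcodeunit", "HA9 9HD"))
```

The URI only becomes resolvable once a server at that address actually answers requests for it.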
  26. 26. The schema level By using URIs, connections that existed only in people's minds can be put explicitly into the data model.
  27. 27. Include Links to Other Data ★★★★★
  28. 28. Hyperlinks are the soul of the Web. The Web of Data is no different.
  29. 29. Data links Central Contractor Registration (CCR) Geonames
  30. 30. Linked Data Principles 1.  Use URIs to name things (not only documents, but also people, locations, concepts, etc.) 2.  To enable agents (human users and machine agents alike) to look up those names, use HTTP URIs 3.  When someone looks up a URI, provide useful information (structured data in RDF, SPARQL). 4.  Include links to other URIs allowing agents to discover more things http://www.w3.org/DesignIssues/LinkedData.html
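
The principles above can be sketched in a few lines: name a thing with an HTTP URI, prepare a lookup that asks for RDF, and state a link to another dataset's URI. All `example.org` URIs and the helper names are hypothetical; `owl:sameAs` is the standard OWL property commonly used for identity links. The request is only constructed here, not sent.

```python
from urllib.request import Request

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical URIs for the same place in two different datasets.
OUR_PLACE = "http://data.example.org/id/place/42"
THEIR_PLACE = "http://other.example.org/places/copenhagen"


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one statement whose three terms are all URIs as N-Triples."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def rdf_lookup(uri: str) -> Request:
    """Prepare an HTTP lookup that asks the server for RDF in Turtle syntax."""
    return Request(uri, headers={"Accept": "text/turtle"})


if __name__ == "__main__":
    # Principle 4: a link from our data into someone else's.
    print(ntriple(OUR_PLACE, OWL_SAME_AS, THEIR_PLACE))
    # Principles 2 and 3: an agent looks the name up and asks for RDF.
    req = rdf_lookup(OUR_PLACE)
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```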
  31. 31. Summary •  In the future, data will be open by default, unless there is good reason not to •  Emergence of a web of data •  “Five-star plan” for getting there, dataset by dataset •  2 stars: re-usable data! •  3 stars: open standards! •  4+5 stars: connect the silos!
  32. 32. Thank You! richard@cyganiak.de @cygri
