EDF2012: The Web of Data and its Five Stars

Presentation Transcript

  • The Web of Data and its Five Stars Richard Cyganiak, DERI, NUI Galway @cygri 6 June 2012 Realising and Exploiting the EU data cloud European Data Forum, Copenhagen, Denmark
  • Generating insight from data •  Today, data is abundant •  New middlemen find new ways of getting data to the end user •  Supply and demand for data higher than ever •  The analyst's problem is no longer a lack of relevant data, but: •  Understanding data •  Assessing applicability •  Getting it into the right form for use •  Similar problems inside and outside of the firewall
  • From the Web to the Web of Data
  • Tim Berners-Lee’s 5-star plan for an open web of data ★ Make data available on the Web under an open license ★★ Make it available as structured data ★★★ Use a non-proprietary format ★★★★ Use URIs to identify things ★★★★★ Link your data to other people’s data to provide context
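The five stars read as a ladder: each star builds on the ones before it. A minimal sketch of that reading follows, assuming a dataset can be described by five yes/no answers. The `Dataset` class, its field names, and `star_rating` are hypothetical names invented for this illustration and are not part of the talk.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a dataset, one flag per star."""
    on_web_open_license: bool      # 1 star
    structured: bool               # 2 stars
    non_proprietary_format: bool   # 3 stars
    uses_uris: bool                # 4 stars
    links_to_other_data: bool      # 5 stars


def star_rating(ds: Dataset) -> int:
    """Count stars in order, stopping at the first unmet criterion.

    Assumption: stars are cumulative, so a dataset cannot earn
    star N without also meeting stars 1 to N-1.
    """
    criteria = [
        ds.on_web_open_license,
        ds.structured,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# An openly licensed spreadsheet in a proprietary format stops at two stars,
# even if it happens to contain URIs and links.
spreadsheet = Dataset(True, True, False, True, True)
print(star_rating(spreadsheet))
```

Treating the stars as cumulative is a design choice of this sketch; it matches the summary's framing of progress "dataset by dataset".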
  • The 0th star •  Data catalog with good metadata •  Make your data findable
  • Data on the Web, Open License ★
  • Open Data
  • Government data catalogs
  • Open vs. Closed •  Data used to be closed by default. •  In the future, it will be open by default.
  • Is open data just for governments?
  • Good reasons against opening data •  Privacy •  Competitive advantage •  Producing data and charging for it as business model •  Can't get license from upstream
  • Business models Scott Brinker, http://www.chiefmartec.com/2010/01/7-business-models-for-linked-data.html
  • Data licenses http://opendefinition.org/licenses/
  • Structured Data ★★
  • Enabling re-use •  Delivering data to end users in different forms •  Combining data with other data •  3rd party analysis of data
  • Formats in government data •  Good for re-use: MS Excel, CSV, XML, JSON, Microdata •  Not so good for re-use: Pure websites, MS Word •  Bad for re-use: PDF •  Really bad for re-use: Only charts/maps without numbers
  • Symptom: Screenscraping
  • Non-Proprietary Formats ★★★
  • Specialist formats •  Specialist tools often have specialist formats •  Few people have the tools •  Expensive •  Difficult to re-use •  (Geospatial tools, statistics packages, etc.)
  • Non-proprietary formats, open standards •  CSV (dead simple) •  XML •  JSON •  RDF (good for 4+5 stars) •  OGC web services •  OAI-ORE web services
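One practical consequence of open formats such as CSV and JSON is that general-purpose tooling can read and convert them without specialist software. The sketch below uses only the Python standard library; the sample rows and the `csv_to_records` helper are made up for illustration.

```python
import csv
import io
import json

# Made-up sample data standing in for a published CSV file.
CSV_TEXT = "postcode,population\nHA9 9HD,120\nSW1A 1AA,45\n"


def csv_to_records(text: str) -> list:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


records = csv_to_records(CSV_TEXT)

# Re-serialise the same records as JSON for a different consumer.
json_text = json.dumps(records, indent=2)
print(json_text)
```

Note that `csv.DictReader` yields every value as a string; deciding that `population` is a number is left to the consumer, which is one reason richer formats can be worth the extra effort.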
  • Use URIs as Identifiers ★★★★
  • http://www.bbc.co.uk/music/artists/79239441-bfd5-4981-a70c-55c3f15c1287
  • http://data.ordnancesurvey.co.uk/id/postcodeunit/HA99HD
  • http://opencorporates.com/companies/us_vt/F013910
  • Turning local identifiers into URIs–Why? •  Make them globally unique •  Clarify authority •  Make them resolvable •  Make them linkable http://data.ordnancesurvey.co.uk/id/7000000000017765
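Turning a local identifier into a URI can be as simple as prefixing it with a namespace that the publisher controls. The following is a minimal sketch under that assumption. The base `http://data.example.org/id/` and the `mint_uri` helper are hypothetical, and the whitespace-stripping rule is an illustrative choice, not a documented convention of any of the sites shown on the slides.

```python
from urllib.parse import quote

# Hypothetical namespace; a real publisher would use a domain it controls,
# which is what clarifies authority over the identifiers.
BASE = "http://data.example.org/id/"


def mint_uri(kind: str, local_id: str, base: str = BASE) -> str:
    """Build a globally unique HTTP URI from a local identifier.

    Whitespace inside the identifier is removed and the remaining
    characters are percent-encoded so the result is a valid URI path.
    """
    compact = "".join(local_id.split())
    return f"{base}{quote(kind, safe='')}/{quote(compact, safe='')}"


print(mint_uri("postcodeunit", "HA9 9HD"))
print(mint_uri("company", "F013910"))
```

Because the result is an HTTP URI, it can later be made resolvable and linkable without changing the identifier itself.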
  • The schema level By using URIs, connections that existed only in people's minds can be put explicitly into the data model.
  • Include Links to Other Data ★★★★★
  • Hyperlinks are the soul of the Web. The Web of Data is no different.
  • Data links Central Contractor Registration (CCR) Geonames
  • Linked Data Principles 1.  Use URIs to name things (not only documents, but also people, locations, concepts, etc.) 2.  To enable agents (human users and machine agents alike) to look up those names, use HTTP URIs 3.  When someone looks up a URI, provide useful information (structured data in RDF, SPARQL). 4.  Include links to other URIs allowing agents to discover more things http://www.w3.org/DesignIssues/LinkedData.html
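The fourth principle, links that let agents discover more things, can be illustrated without any network access by treating data as subject, predicate, object triples and following every object that is itself a URI. This is a minimal in-memory sketch; all URIs are hypothetical `example.org` names, and `discover` is an illustrative helper rather than a function of any RDF library.

```python
from collections import deque

# Hypothetical triples: (subject, predicate, object).
COMPANY = "http://data.example.org/id/company/1"
PLACE = "http://geo.example.org/place/copenhagen"
COUNTRY = "http://geo.example.org/country/dk"

TRIPLES = [
    (COMPANY, "http://vocab.example.org/name", "Example Corp"),  # literal value
    (COMPANY, "http://vocab.example.org/locatedIn", PLACE),      # link to other data
    (PLACE, "http://vocab.example.org/partOf", COUNTRY),         # link to other data
]


def discover(start: str, triples: list) -> set:
    """Return every URI reachable from `start` by following links.

    Breadth-first traversal; objects that do not look like HTTP URIs
    are treated as literals and not followed.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for subject, _predicate, obj in triples:
            if subject == current and obj.startswith("http://") and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen


for uri in sorted(discover(COMPANY, TRIPLES)):
    print(uri)
```

On the real Web of Data each step would be an HTTP lookup of the URI instead of a scan over a local list, which is why the second and third principles matter.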
  • Summary •  In the future, data will be open by default, unless there is good reason not to •  Emergence of a web of data •  “Five-star plan” for getting there, dataset by dataset •  2 stars: re-usable data! •  3 stars: open standards! •  4+5 stars: connect the silos!
  • Thank You! richard@cyganiak.de @cygri