Finding Data Sets

  1. Finding Data Sets
      Anja Jentzsch, Freie Universität Berlin, 17 April 2012
      Tutorial: Practical Cross-Dataset Queries on the Web of Data, WWW2012, Lyon, France
  2. Different motivations
      • Finding data sets
        • Look for resources to link a data set to
        • Find a data set with relevant data to consume / integrate
      • Finding vocabularies
        • Find vocabularies to use to model data sets
        • Find vocabularies to map your existing schema to
  3. Different tool types
      • Search engines
        • Find data sets based on keywords
      • Data catalogs / directories
        • Explore data sets via faceted search
      • Data marketplaces
        • Explore and consume data sets
  4. Linked Data Search Engines
      • Resource descriptions are published as RDF documents
      • RDF search engines index these RDF documents
      • The process is similar to that of search engines for HTML documents
  5. http://sindice.com
  7. http://sig.ma
  9. http://swoogle.umbc.edu
  10. http://kmi-web05.open.ac.uk/WatsonWUI/
  11. http://factforge.net
  13. Suitability of search engines
      • Look for resources to link a data set to
        • Good
      • Find a data set with relevant data to consume
        • Maybe good: depends on how the query is expressed
      • Find vocabularies to use to model data sets
        • Not good: everything is indexed, too much noise
  14. Data catalogs
      • Several governments and institutions are opening their data catalogs
      • http://datacatalogs.org provides a manually curated index of 226 data catalogs
  15. http://datacatalogs.org
  17. The Data Hub
      • Manually curated list of more than 3,500 data sets, at least 326 of them Linked Data sets
      • Various metadata for each data set
      • Other views over (part of) its content
        • Semantic CKAN (http://semantic.ckan.net)
        • LATC Data Source Inventory
        • LOD Cloud
        • State of the LOD Cloud
  18. http://thedatahub.org
  20. http://dsi.lod-cloud.net
  21. http://lod-cloud.net
  22. http://lod-cloud.net/state/
  24. Data Marketplaces
      • “Services that make it easy to find data from a range of secondary data sources, then consume or acquire the data in a usable and unified format. Several of these services are trying to create marketplaces for data, envisioning that data providers can offer their data sets for sale to data seekers.” (http://datamarket.com)
  25. Kasabi
      • Data domain
        • All purpose, incl. DBpedia, GeoNames, BBC Linked Data, …
      • Data population
        • Public data sets
        • User-submitted data sets
      • Data size
        • 186 data sets
      • Data model
        • RDF
  26. http://kasabi.com
  27. Freebase
      • Developed by Metaweb (USA), acquired by Google
      • Free for 100K read API calls per day (10K write calls); paid for higher volumes
      • Data access
        • REST API
        • Linked Data endpoint (http://rdf.freebase.com)
        • Triple uploader / RDF dumps
      • Data tools
        • Web-based: schema editor, review queue, viewers, …
        • GridWorks (Google Refine)
          • Exploring, cleaning, and transforming tabular data
          • Mapping data to the Freebase schema & RDF export (3rd-party extension)
  28. http://www.freebase.com
  30. Linked Open Vocabularies (LOV)
      • Initiative similar to the LOD Cloud but focused on vocabularies
      • 250+ vocabularies
  31. http://labs.mondeca.com/dataset/lov/
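
Slide 4 describes Linked Data search engines as indexing RDF documents in much the same way web search engines index HTML. As a toy illustration of that indexing step only (production engines add crawling, robust RDF parsing, and ranking), the Python sketch below parses a simplified subset of N-Triples and builds an inverted index from words in literal values to the resources they describe. The regular expression, the helper names `parse_ntriples`, `build_index` and `search`, and the `example.org` sample data are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Simplified N-Triples line (assumption): IRI subject and predicate; the object is
# an IRI or a "literal" with an optional @lang tag or ^^<datatype>.
# Blank nodes and escaped quotes are deliberately not handled.
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+'
    r'(?:<([^>]*)>|"([^"]*)"(?:@[\w-]+|\^\^<[^>]*>)?)\s*\.$'
)

# Hypothetical sample data using example.org IRIs.
SAMPLE = """
<http://example.org/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en .
<http://example.org/Berlin> <http://example.org/country> <http://example.org/Germany> .
<http://example.org/Lyon> <http://www.w3.org/2000/01/rdf-schema#label> "Lyon" .
<http://example.org/Lyon> <http://www.w3.org/2000/01/rdf-schema#comment> "City in France" .
"""


def parse_ntriples(text):
    """Yield (subject, predicate, object, is_literal) for each line the pattern accepts."""
    for line in text.splitlines():
        match = TRIPLE.match(line.strip())
        if match is None:
            continue  # skip blank lines and anything outside the simplified subset
        subject, predicate, obj_iri, obj_literal = match.groups()
        if obj_iri is not None:
            yield subject, predicate, obj_iri, False
        else:
            yield subject, predicate, obj_literal, True


def tokenize(text):
    """Lower-case word tokens, as a keyword search engine might extract them."""
    return re.findall(r"\w+", text.lower())


def build_index(triples):
    """Inverted index: keyword -> set of subjects whose literal values contain it."""
    index = defaultdict(set)
    for subject, _predicate, obj, is_literal in triples:
        if is_literal:
            for token in tokenize(obj):
                index[token].add(subject)
    return index


def search(index, query):
    """Return subjects matching every keyword in the query (AND semantics)."""
    postings = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*postings) if postings else set()


if __name__ == "__main__":
    index = build_index(parse_ntriples(SAMPLE))
    print(search(index, "lyon"))         # {'http://example.org/Lyon'}
    print(search(index, "city france"))  # {'http://example.org/Lyon'}
```

Keyword queries use AND semantics here, so `search(index, "city france")` returns only resources whose literals contain both words; ranking and partial matches are left out to keep the sketch short.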
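
The Data Hub (slide 17) was a CKAN instance, and CKAN exposes data set search through its Action API. The Python sketch below assumes the CKAN API v3 endpoint `/api/3/action/package_search` with its `q`, `fq` and `rows` parameters and the usual `{"success": ..., "result": {"results": [...]}}` response shape. The helper names and the use of a tag such as `lod` as a filter are illustrative assumptions, and the host shown on the slide may no longer serve this API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the CKAN instance from the slide; it may no longer be online.
BASE_URL = "http://thedatahub.org"


def package_search_url(base_url, query, tag=None, rows=10):
    """Build a CKAN Action API (v3) package_search URL."""
    params = {"q": query, "rows": rows}
    if tag is not None:
        params["fq"] = f"tags:{tag}"  # Solr filter query restricting results to one tag
    return f"{base_url}/api/3/action/package_search?{urlencode(params)}"


def summarize(response):
    """Turn a package_search JSON response into (name, title, sorted tag names) tuples."""
    if not response.get("success"):
        raise ValueError("CKAN reported an unsuccessful API call")
    return [
        (pkg["name"], pkg.get("title", ""), sorted(tag["name"] for tag in pkg.get("tags", [])))
        for pkg in response["result"]["results"]
    ]


def search_datasets(base_url, query, tag=None, rows=10):
    """Query a CKAN instance over HTTP and summarize the matching data sets."""
    with urlopen(package_search_url(base_url, query, tag, rows), timeout=30) as resp:
        return summarize(json.load(resp))
```

For example, `search_datasets(BASE_URL, "music", tag="lod")` would return `(name, title, tags)` tuples for matching data sets if the endpoint is reachable. URL construction and response parsing are kept as separate functions so they can be checked without network access.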
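
Catalogs such as LOV (slide 30) serve the "Finding vocabularies" motivation from slide 2. A simple way to see which vocabularies an existing data set already uses, and therefore which ones to look up or map to, is to count the namespaces of its predicates and classes. This Python sketch assumes the triples are already available as `(subject, predicate, object)` string tuples and uses the common heuristic that a term's namespace is everything up to its last `#` or `/`; the helper names and sample IRIs are hypothetical.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def namespace(iri):
    """Heuristic namespace: the IRI up to and including its last '#', else its last '/'."""
    cut = iri.rfind("#")
    if cut == -1:
        cut = iri.rfind("/")
    return iri[: cut + 1]


def vocabulary_usage(triples):
    """Count how often each namespace is used for predicates and for rdf:type classes."""
    counts = Counter()
    for _subject, predicate, obj in triples:
        counts[namespace(predicate)] += 1
        if predicate == RDF_TYPE:
            counts[namespace(obj)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical triples describing one resource.
    triples = [
        ("http://example.org/alice", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
        ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
        ("http://example.org/alice", "http://purl.org/dc/terms/created", "2012"),
    ]
    # The FOAF namespace is counted twice here: once as a class, once as a property.
    for ns, count in vocabulary_usage(triples).most_common():
        print(count, ns)
```

The namespaces found this way can then be looked up in a vocabulary catalog such as LOV. The split heuristic does not suit vocabularies whose term IRIs follow neither the hash nor the slash pattern, so its output is a starting point rather than a definitive list.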
