Finding Data Sets
Anja Jentzsch, Freie Universität Berlin
17 April 2012
Tutorial: Practical...
  1. Finding Data Sets
     Anja Jentzsch, Freie Universität Berlin
     17 April 2012
     Tutorial: Practical Cross-Dataset Queries on the Web of Data
     WWW2012, Lyon, France
  2. Different motivations
     • Finding data sets
       • Look for resources to link a data set to
       • Find a data set with relevant data to consume / integrate
     • Finding vocabularies
       • Find vocabularies to use to model data sets
       • Find vocabularies to map your existing schema to
  3. Different tool types
     • Search engines
       • find data sets based on keywords
     • Data catalogs / directories
       • explore data sets and faceted search
     • Data Marketplaces
       • explore and consume data sets
  4. Linked Data Search Engines
     • The descriptions of resources are published as RDF documents
     • RDF search engines index these RDF documents
     • The process is similar to that of search engines for HTML documents
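The indexing step described on slide 4 can be illustrated with a toy keyword index over RDF documents. This is only a minimal sketch, not how Sindice or any other engine mentioned in this tutorial actually works: the document URLs, the N-Triples content, and the names `build_index` and `search` are all hypothetical, and tokenisation is a plain regular expression rather than a real RDF parser.

```python
import re
from collections import defaultdict

# Hypothetical mini-corpus: document URL -> RDF content in N-Triples syntax.
DOCUMENTS = {
    "http://example.org/berlin.nt": (
        '<http://example.org/Berlin> '
        '<http://www.w3.org/2000/01/rdf-schema#label> "Berlin" .\n'
        '<http://example.org/Berlin> '
        '<http://example.org/country> <http://example.org/Germany> .'
    ),
    "http://example.org/lyon.nt": (
        '<http://example.org/Lyon> '
        '<http://www.w3.org/2000/01/rdf-schema#label> "Lyon" .'
    ),
}

# Assumption: a token is any run of ASCII letters or digits.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def build_index(documents):
    """Map each lowercased token to the set of document URLs containing it."""
    index = defaultdict(set)
    for url, content in documents.items():
        for token in TOKEN.findall(content):
            index[token.lower()].add(url)
    return index


def search(index, query):
    """Return the sorted URLs of documents containing every query keyword."""
    keywords = [t.lower() for t in TOKEN.findall(query)]
    if not keywords:
        return []
    hits = set.intersection(*(index.get(k, set()) for k in keywords))
    return sorted(hits)


INDEX = build_index(DOCUMENTS)
print(search(INDEX, "berlin"))
```

Because every token of every document is indexed, including fragments of property URIs such as `label`, generic keywords match almost everything. That is the noise problem raised for vocabulary search on slide 13.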
  5. http://sindice.com
  6. http://sindice.com
  7. http://sig.ma
  8. http://sig.ma
  9. http://swoogle.umbc.edu
  10. http://kmi-web05.open.ac.uk/WatsonWUI/
  11. http://factforge.net
  12. http://factforge.net
  13. Suitability
      • Look for resources to link a data set to
        • Good
      • Find a data set with relevant data to consume
        • Maybe good: depends on how the query is expressed
      • Find vocabularies to use to model data sets
        • Not good: everything is indexed, too much noise
  14. Data catalogs
      • Several governments and institutions are opening their catalogs
      • http://datacatalogs.org provides a manually curated index of 226 data catalogs
  15. http://datacatalogs.org
  17. The Data Hub
      • Manually curated list of more than 3,500 data sets, at least 326 of them Linked Data sets
      • Various metadata for each data set
      • Other views over (part of) its content
        • Semantic CKAN (http://semantic.ckan.net)
        • LATC Data Source Inventory
        • LOD Cloud
        • State of the LOD Cloud
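Per-data-set metadata is what makes the faceted search mentioned on slide 3 possible. The sketch below shows the idea on a handful of invented records. The record layout, the tag and format values, and the helpers `facet_counts` and `filter_by` are assumptions made for illustration. They do not reflect the actual Data Hub schema or API.

```python
from collections import Counter

# Hypothetical catalog records; names, tags and formats are illustrative only.
DATASETS = [
    {"name": "dataset-a", "tags": ["lod", "geography"], "format": "RDF"},
    {"name": "dataset-b", "tags": ["government"], "format": "CSV"},
    {"name": "dataset-c", "tags": ["lod", "media"], "format": "RDF"},
]


def facet_counts(datasets, field):
    """Count how many data sets carry each value of a metadata field."""
    counts = Counter()
    for ds in datasets:
        values = ds[field]
        if isinstance(values, str):
            values = [values]  # treat single-valued fields like one-item lists
        counts.update(values)
    return counts


def filter_by(datasets, tag=None, fmt=None):
    """Return the names of data sets matching all of the given facet values."""
    result = []
    for ds in datasets:
        if tag is not None and tag not in ds["tags"]:
            continue
        if fmt is not None and ds["format"] != fmt:
            continue
        result.append(ds["name"])
    return result


print(facet_counts(DATASETS, "tags"))
print(filter_by(DATASETS, tag="lod", fmt="RDF"))
```

A catalog front end would typically show the facet counts next to each value and narrow the result list as the user selects values, which is what `filter_by` does for two facets here.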
  18. http://thedatahub.org
  20. http://dsi.lod-cloud.net
  21. http://lod-cloud.net
  22. http://lod-cloud.net/state/
  23. http://lod-cloud.net/state
  24. Data Marketplaces
      • “Services that make it easy to find data from a range of secondary data sources, then consume or acquire the data in a usable and unified format. Several of these services are trying to create marketplaces for data, envisioning that data providers can offer their data sets for sale to data seekers.” (http://datamarket.com)
  25. Kasabi
      • Data domain
        • All purpose, incl. DBpedia, GeoNames, BBC Linked Data, …
      • Data population
        • Public datasets
        • User submitted datasets
      • Data size
        • 186 data sets
      • Data model
        • RDF
  26. http://kasabi.com
  27. Freebase
      • Metaweb (USA), now Google
      • Free for 100K read API calls per day (10K write), paid for higher volumes
      • Data access
        • REST API
        • Linked Data endpoint (http://rdf.freebase.com)
        • Triple uploader / RDF dumps
      • Data tools
        • Web based: schema editor, review queue, viewers, …
        • GridWorks (Google Refine)
          • Exploring, data cleaning, transformation of tabular data
          • Map data to Freebase schema & RDF export (3rd party extension)
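The free-tier limits on slide 27 (100K read and 10K write calls per day) suggest keeping a client-side tally of calls. The following is a hedged sketch of such a tally. The `DailyQuota` class and its method names are hypothetical and are not part of any Freebase client library. Only the two limits come from the slide.

```python
from dataclasses import dataclass

READ_LIMIT = 100_000   # free read calls per day, as stated on the slide
WRITE_LIMIT = 10_000   # free write calls per day, as stated on the slide


@dataclass
class DailyQuota:
    """Hypothetical per-day tally of API calls made by one client."""
    reads: int = 0
    writes: int = 0

    def record(self, kind, calls=1):
        """Add calls of the given kind; return True while within the free tier."""
        if kind == "read":
            self.reads += calls
        elif kind == "write":
            self.writes += calls
        else:
            raise ValueError(f"unknown call kind: {kind!r}")
        return self.within_free_tier()

    def within_free_tier(self):
        return self.reads <= READ_LIMIT and self.writes <= WRITE_LIMIT

    def remaining(self):
        """Free calls left today, never reported below zero."""
        return {
            "read": max(READ_LIMIT - self.reads, 0),
            "write": max(WRITE_LIMIT - self.writes, 0),
        }


quota = DailyQuota()
quota.record("read", 250)
print(quota.remaining())
```

A real client would also need to reset the tally at the provider's day boundary, which is not specified on the slide and is therefore left out.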
  28. http://www.freebase.com
  30. Linked Open Vocabularies (LOV)
      • Initiative similar to the LOD Cloud but focused on vocabularies
      • 250+ vocabularies
  31. http://labs.mondeca.com/dataset/lov/