Open Data - Principles and Techniques

  1. Open Data - Principles and Techniques. VU Web Engineering / TU Wien, May 15th 2014. Bernhard Haslhofer
  2. About me • Data Scientist @ AIT - Austrian Institute of Technology • Previously – Lecturer & Researcher @ Cornell University, NY, USA – Univ. Ass @ University of Vienna – …
  3. About me • Research Interests – Web-based information systems • Structured Web Data • Knowledge Graphs • Data quality issues • … – Large-scale data analytics • Machine learning • Network analysis • Information retrieval
  4. My plan for today… • Open Data – Principles and Examples • Technique #1: Linked (Open) Data • Technique #2: Microdata • Open Data Activities in Austria • Questions / Discussion
  5. Open Data – Principles “Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.” Open Data Handbook, 2012, Open Knowledge Foundation, http://opendatahandbook.org/
  6. P#1: Availability and Access Data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. Data must also be available in a convenient and modifiable form. http://opendefinition.org/
  7. P#2: Reuse and Redistribution Data must be provided under terms that permit reuse and redistribution, including the intermixing with other datasets. http://opendefinition.org/
  8. P#3: Universal Participation Everyone must be able to use, reuse and redistribute (no discrimination). No ‘non-commercial’ restrictions. http://opendefinition.org/
  9. Questions • Do the open data principles sound familiar (to CS students / software engineers)? • Any known “open data” examples?
  10. Open Data Licensing
  11. Public Domain Dedication
  12. Open Data Movement (Source: http://www.flickr.com/photos/jamescridland/613445810/sizes/l/in/photo)
  13. Open Government Data
  14.
  15. “Decades ago, the US Government made both weather data and the GPS System freely available. Since that time, American entrepreneurs and innovators have utilised these resources to create navigation systems, location-based applications, …”
  16.
  17. Open Government Data
  18.
  19. Open Government Data: Developers, Entrepreneurs, Startups, Apps / Services
  20. (Open) Data Journalism
  21. (Open) Data Journalism
  22. (Open) Data Journalism http://datajournalismhandbook.org/
  23. Open Data in Science
  24. Open Data in Science / Open Access
  25. How can we publish and access structured data on the Web?
  26. My plan for today… • Open Data – Principles and Examples • Technique #1: Linked (Open) Data • Technique #2: Microdata • Open Data Activities in Austria • Questions / Discussion
  27. Linked Data “A method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried” [Bizer, Heath, Berners-Lee 2009]
  28. Linked Open Data: Open Data + Linked Data = Linked Open Data
  29. Why Linked Data?
  30. Why Linked Data?
  31. Why Linked Data?
  32. Web Architecture
  33. Web Architecture • A set of simple standards – Uniform global addressing (URI) – Uniform document encoding (HTML) – Uniform transportation (HTTP) • Hyperlinks connecting documents • Works pretty well for accessing and exchanging documents

  34. How can we publish and access structured data on the Web?
  35. Web Services and Web APIs Source: http://www.blogperfume.com/new-27-circular-social-media-icons-in-3-sizes/
  36. Web Services and Web APIs • Each Web API has a proprietary interface • Data sources must be known in advance • Information entities (papers, authors, subjects, etc.) are often not linked
  37. Social Networking Sites as Walled Gardens by David Simonds
  38. Linked Data Vision • Publish and link structured data on the Web • Create a single globally connected data space based on the Web Architecture
  39. Web of Linked Data • A set of simple standards – Uniform global addressing (URI) – Uniform data model (RDF) – Uniform transportation (HTTP) • RDF links connecting entities • Forms a global data space and facilitates accessing and exchanging data

  40. What is Linked Data? • A method to build a Web of Data • Architectural style, set of standards
  41. Linking Open Data Project • A W3C community project that aims to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting links between data items from different sources
  42. ~$ curl -I -H "Accept: text/turtle" "http://dbpedia.org/resource/The_Shining_(film)"
      ~$ curl -H "Accept: text/turtle" "http://dbpedia.org/data/The_Shining_(film).ttl"
      ~$ sudo apt-get install raptor    # Linux
      ~$ brew install raptor            # Mac OSX
      ~$ rapper "http://dbpedia.org/resource/The_Shining_(film)"
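The curl calls above rely on HTTP content negotiation: the client states in the Accept header which representation of a resource it wants. A minimal Python sketch of the same request, using only the standard library, could look as follows. The helper name `rdf_request` is hypothetical, and the request is only built, not sent, so the sketch runs without network access.

```python
from urllib.request import Request


def rdf_request(resource_uri, media_type="text/turtle", method="GET"):
    """Build (but do not send) a request asking for an RDF representation.

    `rdf_request` is a hypothetical helper for this sketch, not part of any library.
    """
    return Request(resource_uri, headers={"Accept": media_type}, method=method)


# Same idea as `curl -I -H "Accept: text/turtle" ...`: a HEAD request with an Accept header.
req = rdf_request("http://dbpedia.org/resource/The_Shining_(film)", method="HEAD")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would actually send it; whether the server redirects or answers directly depends on the server configuration.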
  43. LINKED DATA TECHNOLOGIES
  44. RDF • A data model for representing data on the Web • Several statements (triples) form a graph
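To make "several triples form a graph" concrete, here is a small sketch in plain Python that stores triples as (subject, predicate, object) tuples and indexes them by subject. The three statements are illustrative and hand-written for this sketch, not retrieved from DBpedia; a real application would more likely use an RDF library.

```python
from collections import defaultdict

DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Illustrative statements (assumed for this sketch, not fetched from DBpedia).
triples = [
    (DBR + "The_Shining_(film)", RDF_TYPE, DBO + "Film"),
    (DBR + "The_Shining_(film)", DBO + "director", DBR + "Stanley_Kubrick"),
    (DBR + "Stanley_Kubrick", RDF_TYPE, DBO + "Person"),
]

# Each triple is a labelled edge: subject --predicate--> object.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))


def neighbours(node):
    """Return the objects directly reachable from `node`."""
    return [obj for _, obj in graph.get(node, [])]


for target in neighbours(DBR + "The_Shining_(film)"):
    print(target)
```

Because subjects and objects share the same URI space, the object of one statement can be the subject of another, which is what turns a list of statements into a graph.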
  45. RDF/XML, N3, Turtle, etc. • Data formats for RDF resource representations • Used to transfer RDF data between apps
  46. RDFS • A language for describing the syntax and semantics of schemas/vocabularies in a machine-understandable way • Example: http://dbpedia.org/ontology/Film rdfs:subClassOf http://dbpedia.org/ontology/Work
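One practical consequence of an rdfs:subClassOf statement is that a resource typed as a Film can also be treated as a Work. The following sketch shows that inference over hand-written triples in plain Python. The function names are hypothetical and the instance statement is an assumption made for illustration.

```python
DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

triples = [
    # Schema statement from the slide: Film is a subclass of Work.
    (DBO + "Film", RDFS_SUBCLASS_OF, DBO + "Work"),
    # Instance statement (assumed for illustration).
    (DBR + "The_Shining_(film)", RDF_TYPE, DBO + "Film"),
]


def superclasses(cls, statements):
    """Collect all direct and indirect superclasses of `cls`."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in statements:
            if s == current and p == RDFS_SUBCLASS_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def inferred_types(resource, statements):
    """Return the asserted types of `resource` plus their superclasses."""
    direct = {o for s, p, o in statements if s == resource and p == RDF_TYPE}
    result = set(direct)
    for cls in direct:
        result |= superclasses(cls, statements)
    return result


print(sorted(inferred_types(DBR + "The_Shining_(film)", triples)))
```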
  47. OWL • A more expressive (formal) language for defining the syntax and semantics of schemas/vocabularies • Addresses RDFS shortcomings but adds considerable complexity
  48. SKOS • A language for describing controlled vocabularies (taxonomies, thesauri, classification schemes)
  49. SPARQL • A query language and protocol for accessing RDF data on the Web
      SELECT DISTINCT ?x WHERE {
        ?x dcterms:subject <http://dbpedia.org/resource/Category:1980s_horror_films> .
      }
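SPARQL is a protocol as well as a query language: a query can be sent to an endpoint as the `query` parameter of an HTTP GET request. The sketch below builds such a request URL with the Python standard library. The endpoint address and the explicit `dcterms` prefix declaration are assumptions added here; the slide's query relies on the endpoint predefining that prefix. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumed endpoint address for this sketch.
ENDPOINT = "http://dbpedia.org/sparql"

# The slide's query, with the dcterms prefix spelled out (an addition for this sketch).
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?x WHERE {
  ?x dcterms:subject <http://dbpedia.org/resource/Category:1980s_horror_films> .
}
"""


def sparql_get_url(endpoint, query):
    """Encode a query as the `query` parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

Fetching `url` with an Accept header such as `application/sparql-results+json` would ask the endpoint for JSON results, subject to what the endpoint supports.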
  50. Database Systems Analogy... Columns: Purpose | Relational Database Management Systems (RDBMS) | Linked Data Technologies. Rows to fill in: Query, Schema Definition Language, Data Representation, Identifiers. ?
  51. Database Systems Analogy...
      Purpose | Relational Database Management Systems (RDBMS) | Linked Data Technologies
      Query | SQL | SPARQL
      Schema Definition Language | SQL DDL | RDFS / OWL
      Data Representation | Relational Model / Tables | RDF / Graph
      Identifiers | Primary Keys (numeric sequences) | URI
  52. DBpedia Query Demo
      SELECT ?person (COUNT(DISTINCT ?spouse) AS ?spouses) WHERE {
        ?person a yago:AmericanFilmActors .
        ?person dbpprop:spouse ?spouse .
      }
      GROUP BY ?person
      ORDER BY DESC(?spouses) LIMIT 100
  53. LINKED DATA EXAMPLES
  54.
  55.
  56. Google Knowledge Graph • Enables search for things (people, places) that Google knows about • Rooted in public sources such as Freebase, Wikipedia, CIA World Factbook, etc. – augmented to 500M objects, 3.5B facts and relationships • Next generation search (semantic index)
  57.
  58.
  59. My plan for today… • Open Data – Principles and Examples • Technique #1: Linked (Open) Data • Technique #2: Microdata • Open Data Activities in Austria • Questions / Discussion
  60. Rich Snippets / Microdata
  61. Microdata (HTML5) • An HTML5 specification used to nest structured data within existing content on Web pages • Search engines and browsers can extract and process Microdata and provide a richer browsing experience for users
  62. Microdata Example
      <div itemscope itemtype="http://schema.org/Person">
        <span itemprop="name">Bernhard Haslhofer</span>,
        <span itemprop="nickname">behas</span>.
        <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
          <span itemprop="streetAddress">301 College Avenue</span>
          <span itemprop="addressLocality">Ithaca</span>
          <span itemprop="addressCountry">United States</span>
        </div>
      </div>
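To illustrate how a consumer such as a search engine might read this markup, the following sketch extracts items and their properties with Python's standard `html.parser`. The `MicrodataParser` class is a hypothetical, heavily simplified reader and not a conforming Microdata implementation: it assumes every element has a closing tag (no void elements such as `<meta>`), takes property values from text content only, and ignores `itemref` and `itemid`.

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Simplified Microdata reader (hypothetical helper for this sketch)."""

    def __init__(self):
        super().__init__()
        self.root_items = []    # top-level items found in the document
        self._item_stack = []   # (item, depth at which its element opened)
        self._depth = 0         # current element nesting depth
        self._pending = None    # (item, property name, depth) awaiting text

    def handle_starttag(self, tag, attrs):
        self._depth += 1
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if prop and self._item_stack:
                # A nested item is the value of a property of its parent item.
                self._item_stack[-1][0]["properties"][prop] = item
            else:
                self.root_items.append(item)
            self._item_stack.append((item, self._depth))
        elif prop and self._item_stack:
            self._pending = (self._item_stack[-1][0], prop, self._depth)

    def handle_data(self, data):
        if self._pending and data.strip():
            item, prop, _ = self._pending
            item["properties"][prop] = data.strip()

    def handle_endtag(self, tag):
        if self._pending and self._pending[2] == self._depth:
            self._pending = None
        if self._item_stack and self._item_stack[-1][1] == self._depth:
            self._item_stack.pop()
        self._depth -= 1


html_doc = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Bernhard Haslhofer</span>,
  <span itemprop="nickname">behas</span>.
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">301 College Avenue</span>
    <span itemprop="addressLocality">Ithaca</span>
    <span itemprop="addressCountry">United States</span>
  </div>
</div>
"""

parser = MicrodataParser()
parser.feed(html_doc)
person = parser.root_items[0]
print(person["type"], person["properties"]["name"])
```

The nested PostalAddress item ends up as the value of the person's `address` property, mirroring how `itemscope` on an element with `itemprop` nests one item inside another.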
  63. Schema.org
  64. schema.org / Microdata example
      <h1>Pirates of the Caribbean: On Stranger Tides (2011)</h1>
      Jack Sparrow and Barbossa embark on a quest to find the elusive fountain of youth, only to discover that Blackbeard and his daughter are after it too.
      Director: Rob Marshall
      Writers: Ted Elliott, Terry Rossio, and 7 more credits
      Stars: Johnny Depp, Penelope Cruz, Ian McShane
      8/10 stars from 200 users. Reviews: 50.
  65. schema.org / Microdata example
  66. schema.org • Defines – a number of types (e.g., person), organized in an inheritance hierarchy – a number of properties (e.g., name) • Extension mechanisms to extend the schemas • OWL representation: http://schema.org/docs/schemaorg.owl • http://schema.rdfs.org/index.html
  67. Open Graph Protocol
  68.
  69. My plan for today… • Open Data – Principles and Examples • Technique #1: Linked (Open) Data • Technique #2: Microdata • Open Data Activities in Austria • Questions / Discussion
  70.
  71. Open Government Data
  72. Open Government Data
  73.
  74. Open Government Data Apps
  75. My plan for today… • Open Data – The idea • Implementation #1: Linked Open Data • Implementation #2: Machine-readable HTML tags • Open Data Activities in Austria • Questions / Discussion
  76. Readings • Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool. • Jason Ronallo: HTML5 Microdata and Schema.org, http://journal.code4lib.org/articles/6400