Open Data
Principles and Techniques
VU Web Engineering / TU Wien
May 15th 2014
Bernhard Haslhofer
About me
• Data Scientist @ AIT - Austrian Institute of Technology
• Previously
– Lecturer & Researcher @ Cornell University, NY, USA
– University Assistant @ University of Vienna
– …
About me
• Research Interests
– Web-based information systems
• Structured Web Data
• Knowledge Graphs
• Data quality issues
• …
– Large-scale data analytics
• Machine learning
• Network analysis
• Information retrieval
My plan for today…
• Open Data – Principles and Examples
• Technique #1: Linked (Open) Data
• Technique #2: Microdata
• Open Data Activities in Austria
• Questions / Discussion
Open Data – Principles
“Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.”
Open Data Handbook, 2012, Open Knowledge Foundation
http://opendatahandbook.org/
P#1: Availability and Access
Data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the Internet.
Data must also be available in a convenient and modifiable form.
http://opendefinition.org/
P#2: Reuse and Redistribution
Data must be provided under terms that permit reuse and redistribution, including intermixing with other datasets.
http://opendefinition.org/
P#3: Universal Participation
Everyone must be able to use, reuse and redistribute the data (no discrimination).
No ‘non-commercial’ restrictions.
http://opendefinition.org/
Questions
• Do the open data principles sound familiar (to CS students / software engineers)?
• Any known “open data” examples?
Open Data Licensing
Public Domain Dedication
Open Data Movement
Source: http://www.flickr.com/photos/jamescridland/613445810/sizes/l/in/photo
Open Government Data
“Decades ago, the US Government made both weather data and the GPS System freely available. Since that time, American entrepreneurs and innovators have utilised these resources to create navigation systems, location-based applications, …”
Open Government Data
Open Government Data
(Figure: Developers, Entrepreneurs, Startups, Apps / Services, (Open) Data Journalism)
(Open) Data Journalism
http://datajournalismhandbook.org/
Open Data in Science
Open Data in Science / Open Access
How can we publish and access structured data on the Web?
My plan for today…
• Open Data – Principles and Examples
• Technique #1: Linked (Open) Data
• Technique #2: Microdata
• Open Data Activities in Austria
• Questions / Discussion
Linked Data
“A method of publishing structured data so that it can be interlinked and become more useful.
It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers.
This enables data from different sources to be connected and queried.”
[Bizer, Heath, Berners-Lee 2009]
Linked Open Data
Open Data + Linked Data = Linked Open Data
Why Linked Data?
Web Architecture
• A set of simple standards
– Uniform global addressing (URI)
– Uniform document encoding (HTML)
– Uniform transportation (HTTP)
• Hyperlinks connecting documents
• Works pretty well for accessing and exchanging documents

How can we publish and access structured data on the Web?
Web Services and Web APIs
Source: http://www.blogperfume.com/new-27-circular-social-media-icons-in-3-sizes/
Web Services and Web APIs
• Each Web API has a proprietary interface
• Data sources must be known in advance
• Information entities (papers, authors, subjects, etc.) are often not linked
Social Networking Sites as Walled Gardens by David Simonds
Linked Data Vision
• Publish and link structured data on the Web
• Create a single, globally connected data space based on the Web Architecture
Web of Linked Data
• A set of simple standards
– Uniform global addressing (URI)
– Uniform data model (RDF)
– Uniform transportation (HTTP)
• RDF links connecting entities
• Forms a global data space and facilitates accessing and exchanging data

What is Linked Data?
• A method to build a Web of Data
• Architectural style, set of standards
Linking Open Data Project
• A W3C community project that aims to extend the Web with a data commons by publishing various open datasets as RDF on the Web and by setting links between data items from different sources
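Links between data items from different sources are typically expressed with `owl:sameAs`. The following is a minimal, library-free sketch under stated assumptions: triples are plain Python tuples, `http://example.org/films/the-shining` and its `releaseYear` property are a hypothetical second data source, and `owl:sameAs` is treated as an equivalence relation (symmetric and transitive), so that descriptions from both sources can be merged.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def same_as_groups(triples):
    """Group URIs connected (directly or indirectly) by owl:sameAs, using union-find."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps the trees shallow
            uri = parent[uri]
        return uri

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            parent[find(s)] = find(o)

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())

def merged_description(uri, triples):
    """(predicate, object) pairs stated about `uri` or any URI it is owl:sameAs-linked to."""
    aliases = next((g for g in same_as_groups(triples) if uri in g), {uri})
    return {(p, o) for s, p, o in triples if s in aliases and p != OWL_SAME_AS}

DBPEDIA_URI = "http://dbpedia.org/resource/The_Shining_(film)"
OTHER_URI = "http://example.org/films/the-shining"  # hypothetical second data source

triples = [
    (DBPEDIA_URI, "http://dbpedia.org/ontology/director",
     "http://dbpedia.org/resource/Stanley_Kubrick"),
    (OTHER_URI, "http://example.org/vocab/releaseYear", "1980"),  # hypothetical vocabulary
    (OTHER_URI, OWL_SAME_AS, DBPEDIA_URI),
]

for predicate, obj in sorted(merged_description(DBPEDIA_URI, triples)):
    print(predicate, obj)
```

Asking for either URI returns the same merged description, which is the practical payoff of linking: a client that only knows one source still finds what the other says.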
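# Quote the URLs: parentheses are special characters in the shell
# Request Turtle; -I prints only the response headers (e.g. a 303 redirect to the data document)
~$ curl -I -H "Accept: text/turtle" "http://dbpedia.org/resource/The_Shining_(film)"
# Fetch the data document itself
~$ curl -H "Accept: text/turtle" "http://dbpedia.org/data/The_Shining_(film).ttl"
# Install the Raptor RDF utilities, which provide the rapper command
~$ sudo apt-get install raptor2-utils   # Linux (Debian/Ubuntu)
~$ brew install raptor                  # Mac OS X
~$ rapper "http://dbpedia.org/resource/The_Shining_(film)"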
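The same content negotiation can be done from a program. A minimal sketch using only Python's standard library: it sends the `Accept: text/turtle` header used in the `curl` calls, and `urlopen` follows redirects such as `303 See Other` automatically while keeping that header. The helper names are hypothetical, and DBpedia's current behaviour (for example, redirecting to HTTPS) may differ from what it did in 2014.

```python
import urllib.request

RESOURCE_URI = "http://dbpedia.org/resource/The_Shining_(film)"

def turtle_request(uri, head_only=False):
    """Build a request that asks the server for a Turtle representation of `uri`."""
    return urllib.request.Request(
        uri,
        headers={"Accept": "text/turtle"},       # content negotiation, as with curl -H
        method="HEAD" if head_only else "GET",   # HEAD corresponds to curl -I
    )

def fetch_turtle(uri, timeout=10):
    """Dereference `uri` and return (final URL after redirects, Turtle text).

    Requires network access; the redirect target is whatever the server chooses.
    """
    with urllib.request.urlopen(turtle_request(uri), timeout=timeout) as response:
        return response.geturl(), response.read().decode("utf-8")
```

With network access, `fetch_turtle(RESOURCE_URI)` should return the URL of the data document the server redirected to, together with its Turtle content.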
LINKED DATA TECHNOLOGIES
RDF
• A data model for representing data on the Web
• Statements are (subject, predicate, object) triples; several triples form a graph
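To make "several triples form a graph" concrete: each triple is a labeled edge from subject to object, and triples that share a resource connect into one graph. A minimal sketch with plain Python tuples and no RDF library; the three statements are illustrative and not copied from DBpedia.

```python
from collections import defaultdict

DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Illustrative (subject, predicate, object) statements
triples = {
    (DBR + "The_Shining_(film)", RDF_TYPE, DBO + "Film"),
    (DBR + "The_Shining_(film)", DBO + "director", DBR + "Stanley_Kubrick"),
    (DBR + "Stanley_Kubrick", RDF_TYPE, DBO + "Person"),
}

def build_graph(statements):
    """Index triples as a directed, labeled graph: subject -> [(predicate, object)]."""
    graph = defaultdict(list)
    for s, p, o in statements:
        graph[s].append((p, o))
    return graph

def neighbours(graph, node):
    """Objects directly reachable from `node`, whatever the predicate (edge label)."""
    return {o for _, o in graph.get(node, [])}

graph = build_graph(triples)
print(sorted(neighbours(graph, DBR + "The_Shining_(film)")))
```

Because `Stanley_Kubrick` appears as the object of one triple and the subject of another, the film's description leads directly into the director's.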
RDF/XML, N3, Turtle, etc.
• Data formats (serializations) for RDF resource representations
• Used to transfer RDF data between applications
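N-Triples, the simplest of these formats, writes one triple per line. A minimal sketch of a serializer, with an explicit simplification: strings starting with `http://` or `https://` are treated as IRIs and everything else as a plain string literal, with no language tags or datatypes. A library such as rdflib covers the full grammar and the other formats.

```python
def term(value):
    """Render one RDF term in N-Triples syntax.

    Simplification: strings starting with http:// or https:// are treated as IRIs,
    everything else as a plain string literal (no language tags or datatypes).
    """
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = (
        value.replace("\\", "\\\\")   # escape backslashes first
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'

def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in sorted(triples))

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
film = "http://dbpedia.org/resource/The_Shining_(film)"
print(to_ntriples([(film, RDFS_LABEL, "The Shining")]), end="")
# <http://dbpedia.org/resource/The_Shining_(film)> <http://www.w3.org/2000/01/rdf-schema#label> "The Shining" .
```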
RDFS
• A language for describing the syntax and semantics of schemas/vocabularies in a machine-understandable way
• Example: http://dbpedia.org/ontology/Film rdfs:subClassOf http://dbpedia.org/ontology/Work
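"Machine-understandable" means a program can derive facts that were never stated explicitly. A minimal sketch, assuming prefixed names as plain strings and no RDF library; `ex:HorrorFilm` is a hypothetical class. It applies the two RDFS rules for `rdfs:subClassOf`: subclass relations are transitive, and a resource typed with a class is also typed with all of that class's superclasses (rules rdfs11 and rdfs9 in the RDF 1.1 Semantics).

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

def superclasses(cls, triples):
    """Every class reachable from `cls` through rdfs:subClassOf (transitive closure)."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)      # `found` also guards against subclass cycles
                todo.append(o)
    return found

def inferred_types(resource, triples):
    """Asserted rdf:type values plus all types entailed via the subclass hierarchy."""
    asserted = {o for s, p, o in triples if s == resource and p == RDF_TYPE}
    result = set(asserted)
    for cls in asserted:
        result |= superclasses(cls, triples)
    return result

triples = [
    ("dbo:Film", SUBCLASS_OF, "dbo:Work"),         # the statement from the slide
    ("ex:HorrorFilm", SUBCLASS_OF, "dbo:Film"),    # hypothetical subclass
    ("dbr:The_Shining_(film)", RDF_TYPE, "ex:HorrorFilm"),
]
print(sorted(inferred_types("dbr:The_Shining_(film)", triples)))
# ['dbo:Film', 'dbo:Work', 'ex:HorrorFilm']
```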
OWL
• A more expressive (formal) language for defining the syntax and semantics of schemas/vocabularies
• Addresses shortcomings of RDFS but introduces considerable complexity
SKOS
• A language for describing controlled vocabularies (taxonomies, thesauri, classification schemes)
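SPARQL
• A query language and protocol for accessing RDF data on the Web
PREFIX dcterms: <http://purl.org/dc/terms/>
# All resources filed under the DBpedia category "1980s horror films"
SELECT DISTINCT ?x
WHERE {
  ?x dcterms:subject <http://dbpedia.org/resource/Category:1980s_horror_films> .
}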
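A SPARQL `WHERE` clause is a basic graph pattern. Evaluating it means finding variable bindings that turn every triple pattern into a triple present in the data. The sketch below is a toy matcher, not a SPARQL engine (no FILTER, OPTIONAL or prefix handling). The `ex:` resources are hypothetical, and their category memberships are invented for illustration.

```python
def match(pattern, triples, binding):
    """Yield every extension of `binding` under which `pattern` matches a triple."""
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):                       # variable
                if candidate.setdefault(term, value) != value:
                    break                                  # already bound to something else
            elif term != value:                            # constant that does not match
                break
        else:
            yield candidate

def select(patterns, triples):
    """Evaluate a basic graph pattern: the conjunction (join) of all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b for old in bindings for b in match(pattern, triples, old)]
    return bindings

CATEGORY = "http://dbpedia.org/resource/Category:1980s_horror_films"
toy_data = [
    ("ex:film_a", "dcterms:subject", CATEGORY),
    ("ex:film_b", "dcterms:subject", "ex:other_category"),
    ("ex:film_c", "dcterms:subject", CATEGORY),
    ("ex:film_c", "ex:director", "ex:person_1"),
]

# SELECT DISTINCT ?x WHERE { ?x dcterms:subject <...Category:1980s_horror_films> . }
distinct_x = sorted({b["?x"] for b in select([("?x", "dcterms:subject", CATEGORY)], toy_data)})
print(distinct_x)  # ['ex:film_a', 'ex:film_c']
```

Adding a second pattern that shares `?x`, such as `("?x", "ex:director", "?d")`, acts as a join, just as two triple patterns sharing a variable do in a real query.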
Database Systems Analogy...
Which Linked Data technology corresponds to each RDBMS concept: query language, schema definition language, data representation, identifiers?
Database Systems Analogy...
Purpose                     | Relational Database Management Systems (RDBMS) | Linked Data Technologies
Query                       | SQL                                            | SPARQL
Schema Definition Language  | SQL DDL                                        | RDFS / OWL
Data Representation         | Relational Model / Tables                      | RDF / Graph
Identifiers                 | Primary Keys (numeric sequences)               | URI
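The Identifiers row marks the biggest difference between the two worlds. A primary key is unique only within one table of one database, while a URI is globally unique and can be dereferenced. A minimal sketch of turning a row into triples, loosely inspired by (but not conforming to) the W3C Direct Mapping of relational data to RDF. The `http://example.org/` namespace, the `film` table and its columns are hypothetical.

```python
BASE = "http://example.org/"  # hypothetical namespace under the publisher's control

def row_to_triples(table, key_column, row):
    """Turn one relational row into RDF triples.

    The primary key is embedded in a URI, so the row gets a globally unique
    identifier; every other non-NULL column becomes a predicate/object pair.
    """
    subject = f"{BASE}{table}/{row[key_column]}"
    return [
        (subject, f"{BASE}vocab/{column}", value)
        for column, value in row.items()
        if column != key_column and value is not None
    ]

row = {"id": 42, "title": "The Shining", "year": 1980, "sequel_of": None}
for triple in row_to_triples("film", "id", row):
    print(triple)
```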
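DBpedia Query Demo
PREFIX yago: <http://dbpedia.org/class/yago/>
PREFIX dbpprop: <http://dbpedia.org/property/>
# American film actors ranked by their number of distinct spouses
SELECT ?person (COUNT(DISTINCT ?spouse) AS ?spouses)
WHERE {
  ?person a yago:AmericanFilmActors .
  ?person dbpprop:spouse ?spouse .
}
GROUP BY ?person
ORDER BY DESC(?spouses)
LIMIT 100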
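SPARQL is also a protocol, so the demo query can be sent from any HTTP client. A minimal sketch with Python's standard library, assuming DBpedia's public endpoint at http://dbpedia.org/sparql (as in 2014) and the SPARQL 1.1 Protocol's query-via-GET form. The helper names are hypothetical. The sample passed to `rows` is made up to show the JSON results format and is not real DBpedia output. Current DBpedia releases may no longer contain the `yago:AmericanFilmActors` class.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """PREFIX yago: <http://dbpedia.org/class/yago/>
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT ?person (COUNT(DISTINCT ?spouse) AS ?spouses)
WHERE { ?person a yago:AmericanFilmActors . ?person dbpprop:spouse ?spouse . }
GROUP BY ?person ORDER BY DESC(?spouses) LIMIT 100"""

def query_request(endpoint, query):
    """SPARQL protocol 'query via GET': the URL-encoded query goes in the `query` parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})

def rows(results_json):
    """Flatten a SPARQL JSON result document into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]

# Made-up sample in the SPARQL JSON results format (not real DBpedia data).
SAMPLE = json.dumps({
    "head": {"vars": ["person", "spouses"]},
    "results": {"bindings": [{
        "person": {"type": "uri", "value": "http://example.org/actor_1"},
        "spouses": {"type": "literal", "value": "3",
                    "datatype": "http://www.w3.org/2001/XMLSchema#integer"},
    }]},
})
print(rows(SAMPLE))  # [{'person': 'http://example.org/actor_1', 'spouses': '3'}]
```

With network access, `urllib.request.urlopen(query_request(ENDPOINT, QUERY))` should return a document in the shape that `rows` expects.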
LINKED DATA EXAMPLES
Google Knowledge Graph
• Enables search for things (people, places) that Google knows about
• Rooted in public sources such as Freebase, Wikipedia, the CIA World Factbook, etc.
– augmented to 500M objects and 3.5B facts and relationships
• Next-generation search (semantic index)
My plan for today…
• Open Data – Principles and Examples
• Technique #1: Linked (Open) Data
• Technique #2: Microdata
• Open Data Activities in Austria
• Questions / Discussion
Rich Snippets / Microdata
Microdata (HTML5)
• An HTML5 specification used to nest structured data within existing content on Web pages
• Search engines and browsers can extract and process Microdata and provide a richer browsing experience for users
Microdata Example
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Bernhard Haslhofer</span>,
  <span itemprop="nickname">behas</span>.
  <div itemprop="address"
       itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">301 College Avenue</span>
    <span itemprop="addressLocality">Ithaca</span>
    <span itemprop="addressCountry">United States</span>
  </div>
</div>
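The way a search engine or browser extracts this markup can be sketched with Python's built-in `html.parser`. This is a simplified, hypothetical extractor and not an implementation of the full HTML microdata algorithm. Property values are the element's whitespace-normalized text. `itemref`, `itemid` and the `content`/`href`/`src` value rules are ignored, and tags are assumed to be properly nested.

```python
from html.parser import HTMLParser

class MicrodataParser(HTMLParser):
    """Collect microdata items as nested dicts: {"type": ..., "properties": {name: [values]}}."""

    VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items (not the value of another item's property)
        self.stack = []   # one frame per open element

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return  # void elements have no end tag and carry no text value here
        attrs = dict(attrs)
        item = None
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
        self.stack.append({"tag": tag, "item": item,
                           "prop": attrs.get("itemprop"), "text": []})

    def handle_data(self, data):
        for frame in self.stack:      # text counts towards every enclosing element
            frame["text"].append(data)

    def handle_endtag(self, tag):
        if all(frame["tag"] != tag for frame in self.stack):
            return  # stray end tag
        while True:                   # also closes children whose end tag was omitted
            frame = self.stack.pop()
            self._close(frame)
            if frame["tag"] == tag:
                break

    def _close(self, frame):
        if frame["item"] is not None:
            value = frame["item"]     # nested item becomes the property value
        else:
            value = " ".join("".join(frame["text"]).split())
        owner = next((f["item"] for f in reversed(self.stack)
                      if f["item"] is not None), None)
        if frame["prop"] and owner is not None:
            for name in frame["prop"].split():   # itemprop may hold several names
                owner["properties"].setdefault(name, []).append(value)
        elif frame["item"] is not None:
            self.items.append(frame["item"])

HTML = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Bernhard Haslhofer</span>,
  <span itemprop="nickname">behas</span>.
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">301 College Avenue</span>
    <span itemprop="addressLocality">Ithaca</span>
    <span itemprop="addressCountry">United States</span>
  </div>
</div>
"""

parser = MicrodataParser()
parser.feed(HTML)
parser.close()
person = parser.items[0]
print(person["properties"]["name"])  # ['Bernhard Haslhofer']
print(person["properties"]["address"][0]["properties"]["addressLocality"])  # ['Ithaca']
```

The nested `PostalAddress` item becomes the value of the person's `address` property. This nesting is what the `itemprop` + `itemscope` combination on the inner `div` expresses.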
Schema.org
schema.org / Microdata example
<h1>Pirates of the Caribbean: On Stranger Tides (2011)</h1>
Jack Sparrow and Barbossa embark on a quest to find the elusive fountain
of youth, only to discover that Blackbeard and his daughter are after it too.

Director: Rob Marshall
Writers: Ted Elliott, Terry Rossio, and 7 more credits
Stars: Johnny Depp, Penelope Cruz, Ian McShane
8/10 stars from 200 users. Reviews: 50.
schema.org / Microdata example
schema.org
• Defines
– a number of types (e.g., Person), organized in an inheritance hierarchy
– a number of properties (e.g., name)
• Extension mechanisms for extending the schemas
• OWL representation: http://schema.org/docs/schemaorg.owl
• http://schema.rdfs.org/index.html
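The inheritance hierarchy lets a `Movie` use generic properties such as `name` alongside movie-specific ones such as `director`. A minimal sketch over a small, hand-picked subset of schema.org. The real vocabulary is far larger, and some types have more than one supertype, which this sketch ignores. The helper names are hypothetical.

```python
# Hand-picked subset of schema.org: type -> (supertype, properties declared on that type).
SCHEMA_SUBSET = {
    "Thing": (None, {"name", "description", "url"}),
    "CreativeWork": ("Thing", {"author", "aggregateRating"}),
    "Movie": ("CreativeWork", {"director", "actor"}),
    "Person": ("Thing", {"address", "birthDate"}),
}

def ancestors(type_name):
    """The type itself followed by its supertypes up to Thing (single inheritance assumed)."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = SCHEMA_SUBSET[type_name][0]
    return chain

def allowed_properties(type_name):
    """Properties usable on `type_name`: its own plus all inherited ones."""
    return set().union(*(SCHEMA_SUBSET[t][1] for t in ancestors(type_name)))

print(ancestors("Movie"))  # ['Movie', 'CreativeWork', 'Thing']
print(sorted(allowed_properties("Person")))
# ['address', 'birthDate', 'description', 'name', 'url']
```

In the movie example, the rating line would map to `aggregateRating`, which a `Movie` inherits from `CreativeWork`.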
Open Graph Protocol
My plan for today…
• Open Data – Principles and Examples
• Technique #1: Linked (Open) Data
• Technique #2: Microdata
• Open Data Activities in Austria
• Questions / Discussion
Open Government Data
Open Government Data
Open Government Data Apps
My plan for today…
• Open Data – The idea
• Implementation #1: Linked Open Data
• Implementation #2: Machine-readable HTML tags
• Open Data Activities in Austria
• Questions / Discussion
Readings
• Tom Heath and Christian Bizer (2011). Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool.
• Jason Ronallo. HTML5 Microdata and Schema.org. http://journal.code4lib.org/articles/6400