Open Data
- Principles and Techniques -
VU Web Engineering / TU Wien
May 15th 2014
- Bernhard Haslhofer -
About me
• Data Scientist @ AIT - Austrian Institute of Technology
• Previously
– Lecturer & Researcher @ Cornell University, NY, USA
– University Assistant @ University of Vienna
– …
About me
• Research Interests
– Web-based information systems
• Structured Web Data
• Knowledge Graphs
• Data quality issues
• …
– Large-scale data analytics
• Machine learning
• Network analysis
• Information retrieval
My plan for today…
• Open Data – Principles and Examples
• Technique #1: Linked (Open) Data
• Technique #2: Microdata
• Open Data Activities in Austria
• Questions / Discussion
Open Data – Principles
“Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.”
Open Data Handbook, 2012, Open Knowledge Foundation
http://opendatahandbook.org/
P#1: Availability and Access
Data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet.
Data must also be available in a convenient and modifiable form.
Source: http://opendefinition.org/
P#2: Reuse and Redistribution
Data must be provided under terms that permit reuse and redistribution, including the intermixing with other datasets.
Source: http://opendefinition.org/
P#3: Universal Participation
Everyone must be able to use, reuse and redistribute the data: no discrimination against fields of endeavour, persons or groups.
‘Non-commercial’ restrictions are therefore not allowed.
Source: http://opendefinition.org/
Questions
• Do the open data principles sound familiar (to CS students / software engineers)?
• Any known “open data” examples?
“Decades ago, the US Government made both weather data and the GPS System freely available. Since that time, American entrepreneurs and innovators have utilised these resources to create navigation systems, location-based applications, …”
Linked Data
“A method of publishing structured data so that it can be interlinked and become more useful.
It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers.
This enables data from different sources to be connected and queried.”
[Bizer, Heath, Berners-Lee 2009]
Web Architecture
• A set of simple standards
– Uniform global addressing (URI)
– Uniform document encoding (HTML)
– Uniform transportation (HTTP)
• Hyperlinks connecting documents
• Works pretty well for accessing and exchanging documents
Web Services and Web APIs
• Each Web API has a proprietary interface
• Data sources must be known in advance
• Information entities (papers, authors, subjects, etc.) are often not linked
Linked Data Vision
• Publish and link structured data on the Web
• Create a single globally connected data space based on the Web Architecture
Web of Linked Data
• A set of simple standards
– Uniform global addressing (URI)
– Uniform data model (RDF)
– Uniform transportation (HTTP)
• RDF links connecting entities
• Forms a global data space and facilitates accessing and exchanging data
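One way to picture the three standards working together: a client names an entity by URI, asks for it over HTTP, and requests an RDF representation through content negotiation. The following is a minimal sketch, assuming the server honours an Accept header of text/turtle. The Vienna URI and the helper name build_rdf_request are illustrative choices only, and the request is built but deliberately not sent.

```python
from urllib.request import Request


def build_rdf_request(uri, media_type="text/turtle"):
    """Build (but do not send) an HTTP GET request that asks for RDF.

    The Accept header tells the server which representation the client
    prefers; text/turtle is an assumed default for this sketch.
    """
    return Request(uri, headers={"Accept": media_type})


# Illustrative entity URI; any dereferenceable Linked Data URI would do.
request = build_rdf_request("http://dbpedia.org/resource/Vienna")

print(request.get_method())          # HTTP method used for the lookup
print(request.full_url)              # the URI doubles as the address
print(request.get_header("Accept"))  # requested data format
```

Passing the request to urllib.request.urlopen would perform the actual lookup; that step is left out so the sketch runs without network access.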
What is Linked Data?
• A method to build a Web of Data
• Architectural style, set of standards
Linking Open Data Project
• A W3C community project with the goal to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting links between data items from different sources
RDF
• A data model for representing data on the Web
• Several statements (triples) form a graph
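To illustrate how statements form a graph, here is a minimal sketch that stores triples as plain (subject, predicate, object) tuples and indexes them by subject. The three example statements are illustrative assumptions for this sketch, not data taken from the slides, and a real application would more likely use an RDF library.

```python
from collections import defaultdict

DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each statement is one (subject, predicate, object) triple.
# These three statements are illustrative assumptions.
triples = [
    (DBR + "The_Shining_(film)", RDF_TYPE, DBO + "Film"),
    (DBR + "The_Shining_(film)", DBO + "director", DBR + "Stanley_Kubrick"),
    (DBR + "Stanley_Kubrick", RDF_TYPE, DBO + "Person"),
]


def build_graph(statements):
    """Index triples by subject: node -> list of (edge label, target node)."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return graph


graph = build_graph(triples)

# Follow the outgoing edges of one node.
for predicate, obj in graph[DBR + "The_Shining_(film)"]:
    print(predicate, "->", obj)
```

Because the object of one triple (Stanley_Kubrick) is the subject of another, the statements connect into a graph rather than staying isolated records.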
RDF/XML, N3, Turtle, etc.
• Data formats for RDF resource representations
• Used to transfer RDF data between apps
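As a rough illustration of what a serialization does, the sketch below writes triples in N-Triples, a simple line-based RDF format (one of the formats covered by the "etc." above). It assumes that any object starting with http:// or https:// is a URI and everything else is a plain string literal; datatypes, language tags and blank nodes are left out, and the example data is hypothetical.

```python
def escape_literal(text):
    """Escape characters that must not appear raw in an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    lines = []
    for subject, predicate, obj in triples:
        # Simplifying assumption: http(s) strings are URIs, the rest literals.
        if obj.startswith(("http://", "https://")):
            obj_term = f"<{obj}>"
        else:
            obj_term = f'"{escape_literal(obj)}"'
        lines.append(f"<{subject}> <{predicate}> {obj_term} .")
    return "\n".join(lines)


# Hypothetical example data for the sketch.
example = [
    ("http://dbpedia.org/resource/Vienna",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Vienna"),
    ("http://dbpedia.org/resource/Vienna",
     "http://dbpedia.org/ontology/country",
     "http://dbpedia.org/resource/Austria"),
]

print(to_ntriples(example))
```

The same triples could equally be written as RDF/XML or Turtle; the data model stays the same and only the exchange syntax changes.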
RDFS
• A language for describing the syntax and semantics of schemas/vocabularies in a machine-understandable way
Example: http://dbpedia.org/ontology/Film rdfs:subClassOf http://dbpedia.org/ontology/Work
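What "machine-understandable" buys in practice can be sketched with the Film/Work example: once a program knows that Film is a subclass of Work, it can conclude that every film is also a work. The code below is a minimal sketch of that inference, assuming abbreviated names (dbo:, dbr:) and one hypothetical typed resource; it is not how an RDFS reasoner is actually implemented.

```python
def superclasses(cls, subclass_of):
    """Return all direct and indirect superclasses of cls."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in found:  # also guards against cycles
                found.add(parent)
                stack.append(parent)
    return found


def is_instance_of(resource, cls, types, subclass_of):
    """True if resource has type cls directly or via rdfs:subClassOf."""
    return any(
        direct == cls or cls in superclasses(direct, subclass_of)
        for direct in types.get(resource, ())
    )


# Schema statement from the slide: dbo:Film rdfs:subClassOf dbo:Work
subclass_of = {"dbo:Film": {"dbo:Work"}}

# Hypothetical instance data: one resource typed as a film.
types = {"dbr:The_Shining_(film)": {"dbo:Film"}}

print(is_instance_of("dbr:The_Shining_(film)", "dbo:Work", types, subclass_of))
```

The data never states that the film is a Work; that fact follows from the schema alone.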
OWL
• A more expressive (formal) language for defining the syntax and semantics of schemas/vocabularies
• Solves RDFS shortcomings but adds considerable complexity
SKOS
• A language for describing controlled vocabularies (taxonomies, thesauri, classification schemes)
SPARQL
• A query language and protocol for accessing RDF data on the Web
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?x
WHERE {
  ?x dcterms:subject <http://dbpedia.org/resource/Category:1980s_horror_films> .
}
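SPARQL is a protocol as well as a query language: a query can be sent to an endpoint as an ordinary HTTP GET request with the query text in a `query` parameter. The sketch below only builds such a URL and does not send it. It assumes DBpedia's public endpoint at http://dbpedia.org/sparql; the function name is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

QUERY = """PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?x
WHERE {
  ?x dcterms:subject <http://dbpedia.org/resource/Category:1980s_horror_films> .
}"""


def build_sparql_get_url(endpoint, query):
    """Encode a SPARQL query as a GET URL with a 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Assumed endpoint address for this sketch.
url = build_sparql_get_url("http://dbpedia.org/sparql", QUERY)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

Fetching the URL with an Accept header such as application/sparql-results+json would ask the endpoint for the result bindings in JSON; that step is omitted so the sketch runs offline.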
Database Systems Analogy...
Purpose                    | Relational Database Management Systems (RDBMS) | Linked Data Technologies
Query                      | SQL                                            | SPARQL
Schema Definition Language | SQL DDL                                        | RDFS / OWL
Data Representation        | Relational Model / Tables                      | RDF / Graph
Identifiers                | Primary Keys (numeric sequences)               | URI
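DBpedia Query Demo
PREFIX yago: <http://dbpedia.org/class/yago/>
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT ?person (COUNT(DISTINCT ?spouse) AS ?spouses)
WHERE {
  ?person a yago:AmericanFilmActors .
  ?person dbpprop:spouse ?spouse .
}
GROUP BY ?person
ORDER BY DESC(?spouses)
LIMIT 100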
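What the demo query computes can be sketched in plain Python: count the distinct spouses for each person, sort by that count in descending order, and keep the first 100 rows. The (person, spouse) pairs below are hypothetical stand-ins for the bindings the triple patterns would match, not real DBpedia results.

```python
from collections import defaultdict

# Hypothetical (?person, ?spouse) bindings; note the duplicate row.
rows = [
    ("ex:ActorA", "ex:Spouse1"),
    ("ex:ActorA", "ex:Spouse2"),
    ("ex:ActorA", "ex:Spouse1"),
    ("ex:ActorB", "ex:Spouse3"),
]


def count_distinct_spouses(bindings, limit=100):
    """Group by person, count distinct spouses, sort descending, apply limit."""
    spouses_by_person = defaultdict(set)
    for person, spouse in bindings:
        spouses_by_person[person].add(spouse)  # the set gives DISTINCT
    ranked = sorted(
        spouses_by_person.items(),
        key=lambda item: len(item[1]),
        reverse=True,
    )
    return [(person, len(spouses)) for person, spouses in ranked[:limit]]


print(count_distinct_spouses(rows))
```

The duplicate row for ex:ActorA is counted once, which is the effect of DISTINCT inside COUNT.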
Google Knowledge Graph
• Enables search for things (people, places) that Google knows about
• Rooted in public sources such as Freebase, Wikipedia, CIA World Factbook, etc.
– augmented to 500M objects, 3.5B facts and relationships
• Next generation search (semantic index)
Microdata (HTML5)
• An HTML5 specification used to nest structured data within existing content on Web pages
• Search engines and browsers can extract and process Microdata and provide a richer browsing experience for users
schema.org / Microdata example
<h1>Pirates of the Caribbean: On Stranger Tides (2011)</h1>
Jack Sparrow and Barbossa embark on a quest to find the elusive fountain of youth, only to discover that Blackbeard and his daughter are after it too.
Director: Rob Marshall
Writers: Ted Elliott, Terry Rossio, and 7 more credits
Stars: Johnny Depp, Penelope Cruz, Ian McShane
8/10 stars from 200 users. Reviews: 50.
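To make the idea of extractable Microdata concrete, the sketch below annotates part of the movie text with the itemscope, itemtype and itemprop attributes and then reads the values back with Python's standard HTML parser. The markup is an assumed, simplified annotation (flat properties with text values, no nested items such as a Person for the director), and the parser class is a hypothetical helper, not a complete Microdata implementation.

```python
from html.parser import HTMLParser

# Assumed, simplified Microdata markup for the movie example.
PAGE = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  Director: <span itemprop="director">Rob Marshall</span>
  Stars: <span itemprop="actor">Johnny Depp</span>,
         <span itemprop="actor">Penelope Cruz</span>
</div>
"""


class FlatMicrodataParser(HTMLParser):
    """Collect the item type and the text of flat itemprop elements."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current_prop = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "itemscope" in attributes:
            self.item_type = attributes.get("itemtype")
        if "itemprop" in attributes:
            self._current_prop = attributes["itemprop"]
            self._buffer = []

    def handle_data(self, data):
        if self._current_prop is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Flat markup only: the next end tag closes the current property.
        if self._current_prop is not None:
            value = "".join(self._buffer).strip()
            self.properties.setdefault(self._current_prop, []).append(value)
            self._current_prop = None


parser = FlatMicrodataParser()
parser.feed(PAGE)
print(parser.item_type)
print(parser.properties)
```

The visible text of the page is unchanged by the annotations; only the attributes tell a crawler which string is the name, the director, or an actor.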
schema.org
• Defines
– a number of types (e.g., Person), organized in an inheritance hierarchy
– a number of properties (e.g., name)
• Extension mechanisms to extend the schemas
• OWL representation: http://schema.org/docs/schemaorg.owl
• http://schema.rdfs.org/index.html
My plan for today…
• Open Data – The idea
• Implementation #1: Linked Open Data
• Implementation #2: Machine-readable HTML tags
• Open Data Activities in Austria
• Questions / Discussion
Readings
• Tom Heath and Christian Bizer (2011). Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool.
• Jason Ronallo. HTML5 Microdata and Schema.org. http://journal.code4lib.org/articles/6400