Linking the Open Data? by Petko Valtchev

Slides presented at Open Data Exchange 2013, April 6 2013, Montreal, Canada. ODX13.com. Sponsored by Trudat.co

  • “ The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. ”
  • A way of publishing data on the web that encourages reuse, reduces redundancy, maximises inter-connectedness, and enables network effects. There are many ways to introduce Linked Open Data. Links provide context, and context is important for proper processing.
  • Global identifier: URI. Access to data: via HTTP. Data model: RDF (a graph).
  • A graph of resources. Vertices and edges are typed by terms provided in vocabularies; vocabularies are published in an open and distributed fashion, and they can be mixed at will. Moreover, the vocabulary terms are themselves resources (identified via URIs). As with XML namespaces, shortcuts (prefixes) are used to avoid cluttering the code with long URLs. FOAF is a vocabulary (schema) for representing people much the way LinkedIn sees them. DBpedia is an RDF version of Wikipedia: pages are translated into structured data.
  • But haven't we been putting linked data on the web for years, in CSV, relational databases, XML, etc.? Well, yes, but these approaches are not so easy to integrate. Web 2.0 mashups work against a fixed set of data sources, whereas Linked Data applications operate on top of an unbounded, global data space.
  • TODO: Microformats schema.org
  • Let us dream a little bit... Once there are RDFa annotations on a good number of pages, some more or less interesting questions can be answered directly from an RDFa-aware Web browser. Example: now I am Ted again, and I want to know where my colleagues from Trudat live. A more useful question could be: I want to invite my colleagues for dinner and therefore need to know their dietary restrictions. Instead of phoning them one by one or maintaining a local database of colleagues and friends, I trust their own RDFa-enabled personal web pages.
  • First of all, you do not need to provide any particular semantics for your data to be linked to other available data. Think of an example: remember how open data could support social justice in the US? The guy took the census data of an American city (say, Atlanta). He focused on a particular area and distinguished between houses inhabited by black people and houses inhabited by white people. He also took the water supply data, i.e., which houses were connected to the water lines. By superposing the datasets, he discovered that ~83% of the unconnected houses were inhabited by black people! This was a proof of discrimination and a judge (district)... Well, what he did was match addresses across both datasets: he basically compared strings. This is what it is all about, and you know strings may not always match perfectly :-( In a LOD format, a URI (URL) would be assigned to each individual address, so that there is a unique way of identifying an entity (resource). The processing would have been simpler and more reliable: finding paths in the graph using a dedicated query language, SPARQL. But the question is: DO the governments WANT us to have that much INSIGHT and at such a low PRICE?
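
The address-matching point in the last note can be illustrated with a small sketch. Every record, address, and URI below is invented for illustration (none of it comes from the study mentioned above), and the `join` helper is a hypothetical stand-in for whatever tooling was actually used. The sketch only shows why joining two datasets on free-text address strings can silently drop rows, while joining on a shared identifier does not.

```python
# Hypothetical records: all addresses, URIs, and values are made up.
census = [
    {"address": "12 Oak Street", "uri": "http://example.org/addr/1", "group": "A"},
    {"address": "7 Pine Ave.", "uri": "http://example.org/addr/2", "group": "B"},
    {"address": "3 Elm Road", "uri": "http://example.org/addr/3", "group": "A"},
]
water = [
    {"address": "12 Oak St.", "uri": "http://example.org/addr/1", "connected": False},
    {"address": "7 Pine Ave.", "uri": "http://example.org/addr/2", "connected": True},
    {"address": "3 elm road", "uri": "http://example.org/addr/3", "connected": False},
]


def join(left, right, key):
    """Inner join of two lists of dicts on the given key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Joining on the raw address string loses rows whose spelling differs.
by_string = join(census, water, "address")

# Joining on a shared URI keeps every house that appears in both datasets.
by_uri = join(census, water, "uri")

# Houses not connected to the water lines, found via the URI join.
unconnected = [row for row in by_uri if not row["connected"]]

print(len(by_string), len(by_uri), len(unconnected))
```

In practice string matching can be improved with normalisation or fuzzy comparison, but a stable identifier per address removes the guesswork altogether.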

Linking the Open Data? by Petko Valtchev Presentation Transcript

  • 1. Linking the Open Data? Petko Valtchev (Assoc. Prof., Dept. of CS, UQAM) ODX’13 Montreal, April 6th
  • 2. Why Link The Data
    “I want you to put your data on the Web.” Sir T. Berners-Lee (TED’07)
    • Original Web (1990s): network of linked documents
    • Web of Data (2000s): network of interlinked data items
    • Linked Open Data: publish data on the Web with max. reuse and inter-connections, min. redundancy, network effect
    Data is really useful whenever it is shared and combined with other data.
  • 3. Linking Data?
    • But how should one produce such data?
      1. Global identification: a URL should point to any data item.
      2. Reachability via HTTP: accessing the URL should retrieve the data item.
      3. Linked structure: outgoing links (typed!) in the data should point to additional data with URLs.
      http://www.w3.org/DesignIssues/LinkedData.html
    • THE language: Resource Description Framework (RDF)
      • Benefits: links provide context
  • 4. A Graph? (figure; edges as recoverable from its labels)
    • pd:tedstr rdf:type foaf:Person
    • pd:tedstr foaf:name "Ted Strauss"
    • pd:tedstr foaf:based_near dbpedia:Montreal
    • dbpedia:Montreal dpprop:population 3,407,963
  • 5. A Graph? (figure; adds to slide 4)
    • dbpedia:Montreal dbpedia-owl:country dbpedia:Canada
  • 6. A Graph? Global? (figure; adds to slide 5)
    • pd:tedstr foaf:knows pd:linguo
    • pd:linguo rdf:type foaf:Person
    • pd:linguo foaf:name "Linkun Guo"
    • pd:linguo foaf:based_near dbpedia:Beijing
    • dbpedia:Beijing dpprop:population 20,693,000
  • 7. A Graph? Global? Giant? (figure; adds to slide 6)
    • dbpedia:Toronto dbpedia-owl:country dbpedia:Canada
    • dbpedia:Quebec dbpedia-owl:country dbpedia:Canada
  • 8. How is it Open?
    • “If you want to start interlinking data then you can only do that if the data is licensed in a way that allows such interlinking.” Rufus Pollock
    • But why is Open Data on the Web not ‘linked’?
      • CSV, XML, RDBs: no easy integration
      • Web 2.0 mashups? Data sources fixed
    • Linked Open Data (LOD) cloud: global data space
  • 9. The LOD cloud family picture, Sept. 2011
  • 10. What for?
    • Linking Open Drug Data (LODD), since 2008
      • Publish/interlink publicly available data about drugs
      • Provide answers to non-trivial questions on the LODD
    • For physicians
      • Which are the equivalent drugs for a given condition?
      • What drugs are currently under clinical trial?
    • For patients
      • What alternatives exist to a given drug?
      • What are the contraindications for a drug?
  • 11. Supplemental Slides Petko Valtchev (Assoc. Prof., Dept. of CS, UQAM) ODX’13 Montreal, April 6th
  • 12. Main Entry Points into the LOD cloud
    • DBpedia: a large multi-domain dataset containing data extracted from Wikipedia; it contains about 3.77M concepts and 400+M facts, with abstracts in 11 different languages.
    • YAGO: a precise knowledge base with 1.7M entities and 15M facts derived from Wikipedia and WordNet.
    • FOAF (Friend Of A Friend): describes people, the links between them, and the things they create and do.
    • GoodRelations: a vocabulary for eCommerce, enabling web sites to publish details of their products and services in a machine-readable way.
    • GeoNames: provides RDF descriptions of more than 6.5M geographical features worldwide.
  • 13. Cross-Media Cultural Heritage Management with LOD
    • Simon is a maths student visiting Montreal. He is fond of reading, cinema, music and history. His friends recommended the flourishing Mile End district, where many cafés serve espresso and European pastry.
    • Once settled in a bar, he opens his iPad to see what is exciting about the surroundings. Knowing his preferences, the mobile app suggests an excerpt from a novel by the local "enfant du quartier", Mordecai Richler, called "The Apprenticeship of Duddy Kravitz". The excerpt describes the life of the Jewish community on two of the area's principal streets, St. Urbain St. and "The Main" St., in the 1930s.
    • Once finished, Simon feels intrigued and accepts the suggestion to go for a short walk looking for remains from that period. While sipping his coffee, Simon checks the author's biography and finds he has written another book, "Barney's Version".
    • After he skims a summary, the app suggests looking at the eponymous film directed by Richard J. Lewis. While watching a trailer, he notices the youthful red-haired actress playing the first wife of the main character, and after querying the app's knowledge base he learns that she is Rachelle Lefevre, who was born in Montreal.
    • Before walking out, he checks the availability of a copy of "Barney's Version" and discovers that he can find one in the local municipal library.
    • On the go, the system plays "I'm Your Man", a song by Leonard Cohen, another literary celebrity from Montreal.
  • 14. The Semantic Annotations: RDFa
    • RDFa serializes RDF through HTML attributes
      • similar to microformats
      • @resource, @property, @href, @typeof, @rel, etc.
  • 15. Cool applications of semantic annotations
    • Semantic query answering:
      • Where do my colleagues live?
        • Possible answers from their own web pages (via Trudat HP): dbpedia:Montreal, dbpedia:Laval, dbpedia:Toronto
      • What are their dietary restrictions?
  • 16. Practical take on OD vs LOD
    • OD for social justice in US (say Atlanta)?
      • Dataset 1: census data
        • Focus on a particular area, with houses distinguished as inhabited by black people vs white people
      • Dataset 2: water supply data, houses connected to water lines or not
    • By superposing datasets 1 and 2, the analysis uncovered discrimination
      • ~83% of the unconnected houses were inhabited by black people!
    • How was it done (a guess)
      • addresses matched by comparing them as strings :-(
    • LOD format: simpler and more reliable processing
      • finding paths in the graph
  • 17. Data about the Data
    • Reasoning about the dataset:
      • Metadata: e.g. Dublin Core vocabulary
    • Notion of provenance
      • The problem of trust: everybody could publish everything
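
The "finding paths in the graph" idea from slide 16 can be sketched on the small graph of slides 4 to 7. This is only an illustration in plain Python, not SPARQL and not an RDF library: the triples are transcribed from the slide labels (the edge directions are an assumption read off those labels), prefixed names are kept as plain strings, and the helper names `objects` and `find_path` are hypothetical.

```python
from collections import deque

# (subject, predicate, object) triples read off the graph slides.
# Literals such as names and populations are stored as plain strings too.
TRIPLES = [
    ("pd:tedstr", "rdf:type", "foaf:Person"),
    ("pd:tedstr", "foaf:name", "Ted Strauss"),
    ("pd:tedstr", "foaf:based_near", "dbpedia:Montreal"),
    ("pd:tedstr", "foaf:knows", "pd:linguo"),
    ("pd:linguo", "rdf:type", "foaf:Person"),
    ("pd:linguo", "foaf:name", "Linkun Guo"),
    ("pd:linguo", "foaf:based_near", "dbpedia:Beijing"),
    ("dbpedia:Montreal", "dbpedia-owl:country", "dbpedia:Canada"),
    ("dbpedia:Montreal", "dpprop:population", "3,407,963"),
    ("dbpedia:Beijing", "dpprop:population", "20,693,000"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def find_path(start, goal):
    """Breadth-first search along subject -> object edges.

    Returns the list of triples on a shortest path, or None if the goal
    cannot be reached from the start node.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for s, p, o in TRIPLES:
            if s == node and o not in seen:
                seen.add(o)
                queue.append((o, path + [(s, p, o)]))
    return None


# "Where do the people Ted knows live?" answered by following two edges.
homes = [place
         for friend in objects("pd:tedstr", "foaf:knows")
         for place in objects(friend, "foaf:based_near")]

path_to_canada = find_path("pd:tedstr", "dbpedia:Canada")
print(homes)
print(path_to_canada)
```

A real deployment would express the same questions as SPARQL queries over an RDF store; the breadth-first search here just makes the traversal explicit.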