Slides prepared with Clément Levallois for the tutorial held at the Meertens Institute. The presentation explains why Linked Data is needed to make data machine-readable. The hands-on part focuses on annotating a profile page with RDFa.
Is Linked Data something for me?
Christophe Guéret, Clément Levallois
eHumanities group meeting, November 22, 2012
Get ready!
Goals for today:
- Learn about Linked Data
- See whether it is interesting for your activities
Hands-on tutorial
- Make groups, one per table
- Each group picks a famous person of its choice
- Grab the material at http://bit.ly/ehg_tutorial or from a USB stick
Big data, but how to get it?
We can't always gather all the information manually.
What if we could?
If all data were “readable”, connections between datasets could be made, and we would simply know more than we do today. “Linked Data” is an attempt to make that happen.
Why is it so hard?
Machines cannot read the text and extract data from it. What is the name of that person?
Ouch!
You just faced the same problem as machines: you can't read the document and extract the data. Linked Data is a solution to this problem.
Note: in the following we take the example of data “buried” in webpages (HTML documents), but the same logic applies to other kinds of documents (CSV files, databases, your collection of pictures…).
What we will do...
- Take the webpage of a researcher (one page per group!)
- Explain why the data in this page is “buried”
- Solve the issue by adding some Linked Data sweetness to the webpage
- Show what we gained: now we can connect the researchers!
In what sense is the name of this researcher buried in this web page?
There is no way for software reading this page to tell: Is there a name on this page? If so, what is it? What does this name represent? What does it relate to?
But wait, my web browser can read HTML pages, so why can't it figure out the name of the researcher? Because the HTML code says how to display the page, but nothing about what the content means!
Two roads from there…
- We could design software that understands English. This is the approach of natural language processing, statistics, etc.
- We can add extra code that tells the software directly what the data means. This is the Linked Data approach! This extra code in HTML pages is called “RDFa”.
Wait! What is that “foaf:name”?
It is a term from a vocabulary: foaf:name comes from the FOAF vocabulary and is used to annotate the name of a person.
Key concept! Vocabulary = a set of unambiguous, consensual terms used to annotate pages with data. Vocabularies are:
- an agreement between data publishers and consumers
- generally focused on particular topics
Hands-on: annotate with foaf:name
Add the “foaf:name” annotation to the three templates.
Step 1: declare the FOAF vocabulary
<html xmlns:foaf="http://xmlns.com/foaf/0.1/">
Step 2: annotate the data
<span property="foaf:name">William Smith</span>
Template 2 does not display the name, so we use a meta element instead:
<meta property="foaf:name" content="William Smith"/>
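To make the idea of “extra code that tells the software what the data means” concrete, here is a minimal Python sketch (standard library only) of how a program could pick up the foaf:name values from the two patterns above. It is a toy, not the RDFa Parser used in this tutorial: the names FoafNameExtractor and extract_foaf_names are hypothetical, it only looks for property="foaf:name", it ignores the xmlns:foaf declaration, and it assumes the annotated element contains plain text.

```python
from html.parser import HTMLParser


class FoafNameExtractor(HTMLParser):
    """Toy extractor for property="foaf:name" annotations (not a full RDFa processor).

    Hypothetical helper: it ignores prefix declarations such as xmlns:foaf and
    assumes the annotated element contains plain text only.
    """

    def __init__(self):
        super().__init__()
        self.names = []
        self._buffer = None  # None means "not inside an annotated element"

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("property") != "foaf:name":
            return
        if "content" in a:
            # <meta property="foaf:name" content="..."/>: the value is in the attribute
            self.names.append(a["content"])
        else:
            # <span property="foaf:name">...</span>: the value is the element text
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._buffer is not None:
            self.names.append("".join(self._buffer).strip())
            self._buffer = None


def extract_foaf_names(html):
    parser = FoafNameExtractor()
    parser.feed(html)
    parser.close()
    return parser.names


print(extract_foaf_names('<span property="foaf:name">William Smith</span>'))
# ['William Smith']
print(extract_foaf_names('<meta property="foaf:name" content="William Smith"/>'))
# ['William Smith']
print(extract_foaf_names('<h1>William Smith</h1>'))
# [] : without the annotation, the name stays "buried"
```

The last call shows the point of the exercise: the same visible text without an annotation gives a program nothing to work with.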
Hands-on: extract annotations
Use the RDFa extractor at http://bit.ly/RDFaParser to get the annotations from the three templates.
Command-line tool:
java -jar RDFaParser-0.0.6.jar template1.html
java -jar RDFaParser-0.0.6.jar template2.html
java -jar RDFaParser-0.0.6.jar template3.html
All three return the same result: nothing!
How to choose a vocabulary?
Vocabulary => consensus. Therefore it is better to:
- avoid obscure vocabularies nobody knows
- focus on well-organised and well-maintained vocabularies
Why did we use FOAF? It is specialised for personal profiles and widely accepted, has W3C support, and is recommended for use by EU members: http://joinup.ec.europa.eu/asset/core_person/description
What vocabularies are available?
Many are well established: FOAF, SIOC, Dublin Core, BIBO, …
Creating vocabularies is doable, but beware that:
- new vocabularies won't necessarily gain adoption
- the vocabulary needs to be maintained
- it needs to be hosted on the Web
A vocabulary can borrow terms from other vocabularies.
How to use a vocabulary?
- Look at the documentation, e.g. http://xmlns.com/foaf/spec/
- Map your concepts to terms from the vocabulary:
Naam (name) → foaf:name
Voornaam (first name) → foaf:firstName
Achternaam (last name) → foaf:lastName
Werklocatie (work location) → foaf:based_near
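The mapping step above can be written down as a plain lookup table. This Python sketch is only an illustration: FIELD_TO_FOAF, record_to_annotations and the unmapped "Kamer" (room) field are hypothetical, and fields without a FOAF term are skipped rather than guessed.

```python
# Local (Dutch) field names mapped to FOAF terms, as on the slide.
FIELD_TO_FOAF = {
    "Naam": "foaf:name",               # name
    "Voornaam": "foaf:firstName",      # first name
    "Achternaam": "foaf:lastName",     # last name
    "Werklocatie": "foaf:based_near",  # work location
}


def record_to_annotations(record):
    """Turn a record keyed by local field names into (FOAF term, value) pairs.

    Fields without a FOAF mapping are skipped instead of being guessed.
    """
    return [(FIELD_TO_FOAF[field], value)
            for field, value in record.items()
            if field in FIELD_TO_FOAF]


profile = {
    "Naam": "William Smith",
    "Voornaam": "William",
    "Achternaam": "Smith",
    "Werklocatie": "Durham",
    "Kamer": "2.14",  # hypothetical field with no FOAF term
}
for term, value in record_to_annotations(profile):
    print(term, value)
```

In FOAF, foaf:based_near is meant to point to a place (a geo:SpatialThing) rather than to a string such as "Durham", which is exactly the ambiguity the Durham exercise deals with.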
Triples and subjects
Remember, we created this annotation:
foaf:name "William Smith"
But which entity has “William Smith” as its name?
<template1.html> foaf:name "William Smith"
Meaning: this document has “William Smith” as its name.
This is a “triple”, made of a subject, a predicate and an object:
Subject = <template1.html>
Predicate = foaf:name
Object = "William Smith"
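A triple is small enough to model directly. The Python sketch below (hypothetical helper names) keeps a triple as a (subject, predicate, object) tuple and prints it in N-Triples, the line-based W3C RDF syntax, after expanding the foaf: prefix to http://xmlns.com/foaf/0.1/. Because N-Triples needs absolute IRIs, the page is given the assumed address http://example.org/template1.html instead of the relative <template1.html>.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def expand(term):
    """Expand a 'foaf:' prefixed name into a full IRI; other IRIs are returned unchanged."""
    if term.startswith("foaf:"):
        return FOAF + term[len("foaf:"):]
    return term


def literal(text):
    """Quote a plain string literal, escaping the characters N-Triples requires."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triple, object_is_literal):
    """Serialize one (subject, predicate, object) tuple as an N-Triples line."""
    subject, predicate, obj = triple
    o = literal(obj) if object_is_literal else f"<{expand(obj)}>"
    return f"<{expand(subject)}> <{expand(predicate)}> {o} ."


# Subject, predicate and object of the annotation on the slide
# (the page URL is an assumption, see the lead-in).
triple = ("http://example.org/template1.html", "foaf:name", "William Smith")
print(to_ntriples(triple, object_is_literal=True))
# <http://example.org/template1.html> <http://xmlns.com/foaf/0.1/name> "William Smith" .
```

The object_is_literal flag reflects the distinction the slides rely on: "William Smith" is a plain value, while a link (such as foaf:knows) has another resource as its object.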
We did not declare a subject
The annotation says that this is the foaf:name, but it does not define a subject, so the page name is used by default.
Why does this matter?
- Subjects can be used as objects to create links, e.g. with foaf:knows.
- A common subject is needed to group annotations, e.g. foaf:name “William Smith” and foaf:based_near Durham.
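Grouping by subject is what lets a program follow a link from one resource to the data about another. In this Python sketch the triples, including the John Doe profile, are hypothetical examples, the helper names are made up, and objects are plain strings with no IRI/literal distinction.

```python
from collections import defaultdict

# Hypothetical triples: foaf: terms are kept as prefixed names for readability.
triples = [
    ("http://example.org/william_smith", "foaf:name", "William Smith"),
    ("http://example.org/william_smith", "foaf:based_near", "Durham"),
    ("http://example.org/william_smith", "foaf:knows", "http://example.org/john_doe"),
    ("http://example.org/john_doe", "foaf:name", "John Doe"),
]


def group_by_subject(triples):
    """Collect every (predicate, object) pair that shares a subject."""
    groups = defaultdict(list)
    for subject, predicate, obj in triples:
        groups[subject].append((predicate, obj))
    return dict(groups)


def names_of_acquaintances(triples, person):
    """Follow foaf:knows from `person`, then read each target's foaf:name.

    This only works because the object of foaf:knows is also used as a subject.
    """
    groups = group_by_subject(triples)
    names = []
    for predicate, obj in groups.get(person, []):
        if predicate == "foaf:knows":
            names += [value for p, value in groups.get(obj, []) if p == "foaf:name"]
    return names


print(names_of_acquaintances(triples, "http://example.org/william_smith"))
# ['John Doe']
```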
Picking a resource
Resources need to be stable, Web-accessible and re-used. Consensus again, for example:
Amsterdam: http://dbpedia.org/resource/Amsterdam
Tim Berners-Lee (TBL): http://www.w3.org/People/Berners-Lee/card#i
Local paths such as <C:/MyDirectory/templateX.html> are not valid Web-based resources, so we need to change that.
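A rough first filter for the “Web-accessible” requirement can be automated. This Python sketch (hypothetical helper is_web_resource) only checks that an identifier is an absolute http(s) URI with a host; whether it is stable and re-used is a matter of consensus that no code can check.

```python
from urllib.parse import urlparse


def is_web_resource(uri):
    """True if `uri` is an absolute http(s) URI with a host name.

    It does not check that the URI resolves, is stable, or is widely re-used.
    """
    parsed = urlparse(uri)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


for candidate in [
    "http://dbpedia.org/resource/Amsterdam",
    "http://www.w3.org/People/Berners-Lee/card#i",
    "C:/MyDirectory/templateX.html",  # local file path: not on the Web
    "template1.html",                 # relative name: depends on where the page lives
]:
    print(candidate, is_web_resource(candidate))
```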
Hands-on: set the subject
Step 1: decide on a resource for the person, e.g.
http://example.org/william_smith
http://myurl.com/john_doe
Step 2: add the resource with an “about” attribute in the same span as the foaf:name. Example:
You had: <span property="foaf:name">
It becomes: <span about="http://example.org/william_smith" property="foaf:name">
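The effect of the about attribute can be shown with a small toy extractor: when about is present it becomes the subject, otherwise the page address is used, as on the default-subject slide. This is a Python sketch with hypothetical names; unlike real RDFa it does not inherit about from enclosing elements, and the page URL http://example.org/template1.html is an assumption.

```python
from html.parser import HTMLParser


class NameTripleExtractor(HTMLParser):
    """Toy extractor producing (subject, "foaf:name", value) triples."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.triples = []
        self._subject = None
        self._buffer = None  # None means "not inside an annotated element"

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("property") != "foaf:name":
            return
        # The about attribute names the subject; without it, the page is the subject.
        subject = a.get("about", self.page_url)
        if "content" in a:
            self.triples.append((subject, "foaf:name", a["content"]))
        else:
            self._subject = subject
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._buffer is not None:
            self.triples.append((self._subject, "foaf:name", "".join(self._buffer).strip()))
            self._buffer = None


def extract_name_triples(html, page_url):
    parser = NameTripleExtractor(page_url)
    parser.feed(html)
    parser.close()
    return parser.triples


page = "http://example.org/template1.html"  # assumed address of the template
print(extract_name_triples('<span property="foaf:name">William Smith</span>', page))
# [('http://example.org/template1.html', 'foaf:name', 'William Smith')]
print(extract_name_triples(
    '<span about="http://example.org/william_smith" property="foaf:name">William Smith</span>', page))
# [('http://example.org/william_smith', 'foaf:name', 'William Smith')]
```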
5-star Linked Data
Rules (see http://5stardata.info/):
- resources are valid URIs
- machine-readable data is associated with the resource
- the data contains links to other resources
Example: http://dbpedia.org/resource/Amsterdam
Great! We're done now!
We added this structured piece of data to all the templates:
<http://example.org/william_smith> foaf:name "William Smith"
This data can be extracted by software. We can build an application that fetches people's names, but there are still no links between them :-/
One of the new code files
All the annotated templates have their file names suffixed with “_with_name_and_subject”.
Creating links
Links are used to connect two resources. Example: William Smith knows Tim Berners-Lee:
<http://example.org/william_smith> foaf:knows <http://www.w3.org/People/Berners-Lee/card#i>
Two uses:
- create (social) networks by connecting resources
- disambiguate text by pointing to the exact resource
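Once links point to shared URIs, profiles published on different pages can be combined. This Python sketch (hypothetical helpers; the John Doe profile and his links are made-up example data) builds a foaf:knows network and looks for acquaintances that two people have in common, which only works because both pages use the same URI for Tim Berners-Lee.

```python
from collections import defaultdict

TBL = "http://www.w3.org/People/Berners-Lee/card#i"
WILLIAM = "http://example.org/william_smith"
JOHN = "http://example.org/john_doe"  # hypothetical second profile

# Triples that could have been extracted from two different pages.
triples = [
    (WILLIAM, "foaf:knows", TBL),
    (WILLIAM, "foaf:knows", JOHN),
    (JOHN, "foaf:knows", TBL),
]


def knows_graph(triples):
    """Adjacency map: person URI -> set of URIs that person foaf:knows."""
    graph = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "foaf:knows":
            graph[subject].add(obj)
    return graph


def common_acquaintances(graph, a, b):
    """URIs that both `a` and `b` declare they know (matched by exact URI)."""
    return graph.get(a, set()) & graph.get(b, set())


graph = knows_graph(triples)
print(common_acquaintances(graph, WILLIAM, JOHN))
# {'http://www.w3.org/People/Berners-Lee/card#i'}
```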
Hands-on: getting social
Step 1: ask 3 other groups in this workshop for their subject (remember, a subject is set with: <span about="http://example.org/william_smith" property="foaf:name">)
Step 2: use the 3 subjects you got to annotate the links. Example:
I know <span rel="foaf:knows" resource="http://example.org/john_doe">John Doe</span>, and <span rel="foaf:knows" resource="http://myUrl.com/nchomsky">Noam Chomsky</span>, and also <span rel="foaf:knows" resource="http://ehumanities.knaw.nl/sally_wyatt">Sally Wyatt</span>
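The rel/resource pattern can be read back in a similar way. In this Python sketch (hypothetical names; a toy, not an RDFa processor) the subject of each foaf:knows link is taken from the nearest enclosing about attribute, or the page if there is none. As in RDFa, this means the spans in the example need an about on themselves or on an enclosing element if the link should attach to the person rather than to the page. The sketch skips other RDFa rules such as chaining and prefix declarations.

```python
from html.parser import HTMLParser

# Elements that never have an end tag in HTML, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class KnowsExtractor(HTMLParser):
    """Toy extractor for <... rel="foaf:knows" resource="..."> links."""

    def __init__(self, page_url):
        super().__init__()
        self.triples = []
        self._subjects = [page_url]  # stack of subjects; the page is the default

    def _process(self, attrs):
        a = dict(attrs)
        subject = a.get("about", self._subjects[-1])
        if a.get("rel") == "foaf:knows" and "resource" in a:
            self.triples.append((subject, "foaf:knows", a["resource"]))
        return subject

    def handle_starttag(self, tag, attrs):
        subject = self._process(attrs)
        if tag not in VOID_ELEMENTS:
            self._subjects.append(subject)  # children inherit this subject

    def handle_startendtag(self, tag, attrs):
        self._process(attrs)  # self-closing element: no children, nothing to push

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and len(self._subjects) > 1:
            self._subjects.pop()


def extract_knows(html, page_url):
    parser = KnowsExtractor(page_url)
    parser.feed(html)
    parser.close()
    return parser.triples


html = ('<p about="http://example.org/william_smith">I know '
        '<span rel="foaf:knows" resource="http://example.org/john_doe">John Doe</span>, and '
        '<span rel="foaf:knows" resource="http://myUrl.com/nchomsky">Noam Chomsky</span></p>')
for triple in extract_knows(html, "http://example.org/template1.html"):
    print(triple)
```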
Remember, there are two Durhams
One is in the US, one in the UK, and they are of similar importance. Which one is the “Durham” on the profile?
http://sws.geonames.org/4464368
http://sws.geonames.org/2650628
Hands-on: disambiguate Durham
Annotate “Durham” with a link to the exact resource.
Step 1: decide which Durham to use
Step 2: annotate Durham with the link
<span rel="foaf:based_near" about="http://example.org/william_smith" resource="http://sws.geonames.org/4464368">Durham</span>
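Pointing to a GeoNames URI instead of the string "Durham" is what keeps two different places apart. This Python sketch groups people by where they are based; the helper name and the second person (jane_roe) are hypothetical, and which of the two GeoNames URIs each person uses is made up for illustration.

```python
from collections import defaultdict

WILLIAM = "http://example.org/william_smith"
JANE = "http://example.org/jane_roe"  # hypothetical second profile


def people_by_place(triples):
    """Map each foaf:based_near value to the set of people based near it."""
    places = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "foaf:based_near":
            places[obj].add(subject)
    return dict(places)


# With plain strings, both profiles just say "Durham" and look identical.
as_text = [(WILLIAM, "foaf:based_near", "Durham"),
           (JANE, "foaf:based_near", "Durham")]

# With links, each profile points to one specific Durham.
as_links = [(WILLIAM, "foaf:based_near", "http://sws.geonames.org/4464368"),
            (JANE, "foaf:based_near", "http://sws.geonames.org/2650628")]

print(len(people_by_place(as_text)))   # 1 -> the two Durhams are merged
print(len(people_by_place(as_links)))  # 2 -> the two Durhams stay apart
```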
Hands-on: extract annotations
Use the RDFa extractor at http://bit.ly/RDFaParser to get the annotations from the three templates.
Command-line tool:
java -jar RDFaParser-0.0.6.jar template1.html
java -jar RDFaParser-0.0.6.jar template2.html
java -jar RDFaParser-0.0.6.jar template3.html
All three return the same result!