31st ADLUG ANNUAL MEETING 2012 
Sala Brunelleschi of the OPA – CESVOT - Firenze 
19 – 21 September 2012 
Linking Linked Data 
Andrea Gazzarini 
Software Architect 
Copyright 2009-2010 @CULT. All rights reserved
Agenda 
Goals 
Information Retrieval 
Triple store 
Proof of concept 
Q&A 
Goals 
1) Combine two different technologies to improve the (user) search 
experience by decoupling the “search” perspective from the “view” perspective. 
2) Provide fast, full-featured full-text search that can scale to billions 
of records, with typical search features such as faceting, stemming, 
autocompletion and so on... 
3) Provide a system that can benefit from the extensibility of 
Linked Data 
Le avventure di Pinocchio 
This is a record extracted from the recordset we will use during 
this presentation. 
000 00694nam a2200241 i 4500 
008 971205s1997 it j 000 0 ita c 
020 a 880921191X 
082 1 a 853.8 
100 1 a Collodi, Carlo. 
245 13 a Le avventure di Pinocchio / 
c C. Collodi ; illustrazioni di Attilio Mussino. 
260 a Firenze : 
b Giunti, 
c 1997. 
440 0 a Collana favolosa / [Giunti] 
521 a Letteratura per ragazzi 
700 1 a Mussino, Attilio. 
Information Retrieval (1/2) 
For our purposes we will (simplistically) define an Information Retrieval (IR) system as 
a full-text search framework that indexes textual data and manipulates it 
to enable search features that matter to end users, such as: 
» Relevance computation and boosting 
» Autocompletion 
» Faceting 
» Stemming 
» Did you mean? 
» Search by phoneme (i.e. Sounds Like) 
» More like this 
» ...and many many others... 
But there's a price to pay for that... 
Inverted index 
In computer science, an inverted index (also referred to as postings file or 
inverted file) is an index data structure storing a mapping from content, such 
as words or numbers, to its locations in a database file, or in a document or a 
set of documents. The purpose of an inverted index is to allow fast full text 
searches, at a cost of increased processing when a document is added to the 
database. The inverted file may be the database file itself, rather than its 
index. It is the most popular data structure used in document retrieval systems 
http://en.wikipedia.org/wiki/Inverted_index 
An inverted index is an optimized structure that allows fast searches, but it is 
meant to be immutable: if you need to change something in 
your data, you need to rebuild your index. 
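As a minimal sketch of the idea (not the data structure of any specific search engine), an inverted index can be modelled as a map from each word to the documents that contain it. The two-record collection and the function name `build_inverted_index` are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercased word to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


# Hypothetical two-record collection, for illustration only.
documents = {
    1: "Le avventure di Pinocchio",
    2: "Pinocchio illustrazioni di Attilio Mussino",
}

index = build_inverted_index(documents)
print(index["pinocchio"])  # [1, 2]
print(index["avventure"])  # [1]
```

A lookup is a single dictionary access, which is why searches are fast; in this sketch a change to a document means calling `build_inverted_index` again.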
Semantic destruction (1/3) 
A search engine doesn't care how much accuracy you put in, or how 
much time you spent, cataloguing a bibliographic resource... once 
indexed, it will lose any semantic meaning! 
(Figure: the text “...ipsum dolor sit amet, consectetur adipiscing...” broken up into scattered single letters.) 
Semantic destruction (2/3) 
Step               Resulting tokens 
(input)            The adventures of Pinocchio 
Tokenization       The adventures of Pinocchio 
Stopwords          adventures Pinocchio 
Lowercase          adventures pinocchio 
Stemming (light)   adventure pinocchio 
Phoneme (!)        ATFN PNX 
These are the only tokens that will be indexed! 
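The chain above can be sketched in a few lines. This is only an illustration under stated assumptions: the stopword list and the naive "strip a trailing s" stemmer are invented for this example, and the phonetic step is left out because the slide does not say which algorithm produces "ATFN PNX".

```python
STOPWORDS = {"the", "of"}  # assumed stopword list, for illustration only


def tokenize(text):
    """Split the text on whitespace."""
    return text.split()


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def lowercase(tokens):
    return [t.lower() for t in tokens]


def light_stem(tokens):
    # Naive "light" stemmer (an assumption): strip a single trailing "s".
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


def analyze(text):
    """Apply the steps in the same order as on the slide."""
    tokens = tokenize(text)
    tokens = remove_stopwords(tokens)
    tokens = lowercase(tokens)
    return light_stem(tokens)


print(analyze("The adventures of Pinocchio"))  # ['adventure', 'pinocchio']
```

Only the final tokens reach the index; the original title string, and the fact that it was a title, are gone.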
Semantic destruction (3/3) 
ATFN PNX 
KRL KLT 
Triple store (1/2) 
A triplestore is a purpose-built database for the storage and retrieval of triples, 
a triple being a data entity composed of subject-predicate-object, like "Bob 
is 35" or "Bob knows Fred". 
http://en.wikipedia.org/wiki/Triplestore 
Subject   Predicate      Object 
book      hasTitle       The adventures of Pinocchio 
book      hasAuthor      Collodi, Carlo 
book      hasPublisher   Giunti 
In other words, it behaves much more like a database and has basically nothing to do 
with an inverted index. 
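To make the contrast concrete, here is a toy in-memory triple store. It is a sketch only: the class `TripleStore` and its `match` method are hypothetical names, and a real triple store would be queried with SPARQL rather than with this pattern-matching helper.

```python
class TripleStore:
    """A toy in-memory triple store: a set of (subject, predicate, object) tuples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
store.add("book", "hasTitle", "The adventures of Pinocchio")
store.add("book", "hasAuthor", "Collodi, Carlo")
store.add("book", "hasPublisher", "Giunti")

print(store.match(predicate="hasAuthor"))
# [('book', 'hasAuthor', 'Collodi, Carlo')]
```

Unlike the tokens of an inverted index, every value here keeps its meaning: the store still knows that "Collodi, Carlo" is the author of the book.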
Triple store (2/2) 
Using a triple store you can have 
1) a standard query language (SPARQL) to query the store; 
2) a standard format for exchanging data (RDF); 
3) a store where you are free to change your data in real time, 
without any kind of reindexing operation. 
But, most importantly, you cannot have 
any of the search features we described in the previous slides: for 
some of them it is practically impossible (e.g. faceting), for others 
(e.g. autocompletion) the problem is mainly the response time. 
Proof of Concept 
Our system combines the previously described technologies, 
trying to keep all the advantages and minimize the disadvantages. 
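(Diagram: the system accepts input as MARC (binary), MARC XML, RDF / XML, N3, Turtle or N-Triples; the “Search” side is backed by the Information Retrieval component and the “View” side by the triple store.) 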
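The decoupling of “search” from “view” can be sketched as follows. This is a toy model, not the actual system: both components are plain dictionaries, and the names `search_index`, `triple_store`, `search` and `view` are assumptions. The search side returns only resource URIs, and the view side loads what to display from the store.

```python
# Toy stand-ins: a word -> URIs index for "search", a URI -> properties map for "view".
BOOK = "http://www.cbt.trentinocultura.net/biblio/000002577949"

search_index = {
    "avventure": [BOOK],
    "pinocchio": [BOOK],
}

triple_store = {
    BOOK: {
        "dcterms:title": "Le avventure di Pinocchio",
        "dcterms:issued": "1997",
    },
}


def search(term):
    """Search side: return only the URIs of matching records."""
    return search_index.get(term.lower(), [])


def view(uri):
    """View side: load the displayable description from the triple store."""
    return triple_store.get(uri, {})


results = [view(uri) for uri in search("Pinocchio")]
print(results[0]["dcterms:title"])  # Le avventure di Pinocchio
```

Because the index holds only what is needed to find a record, what is shown for that record can change without touching the index.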
Concretely... 
Le avventure di Pinocchio (MARC) 
000 00694nam a2200241 i 4500 
008 971205s1997 it j 000 0 ita c 
020 a 880921191X 
082 1 a 853.8 
100 1 a Collodi, Carlo. 
245 13 a Le avventure di Pinocchio / 
c C. Collodi ; illustrazioni di Attilio Mussino. 
260 a Firenze : 
b Giunti, 
c 1997. 
440 0 a Collana favolosa / [Giunti] 
521 a Letteratura per ragazzi 
700 1 a Mussino, Attilio. 
Le avventure di Pinocchio (RDF / XML) 
The book... 
<bibo:Book rdf:about="http://www.cbt.trentinocultura.net/biblio/000002577949"> 
  <dcterms:identifier>000002577949</dcterms:identifier> 
  <bibo:isbn10>880921191X</bibo:isbn10> 
  <dcterms:shortTitle>Le avventure di Pinocchio</dcterms:shortTitle> 
  <dcterms:title> 
    Le avventure di Pinocchio / C. Collodi ; illustrazioni di Attilio Mussino 
  </dcterms:title> 
  <dc:creator rdf:resource="http://www.cbt.trentinocultura.net/person/collodi_carlo"/> 
  <dcterms:language>ita</dcterms:language> 
  <dcterms:audience rdf:resource="http://www.cbt.trentinocultura.net/subject/opera_per_bambini"/> 
  <dcterms:isPartOf rdf:resource="http://www.cbt.trentinocultura.net/biblio/2378129373323" /> 
  <dcterms:extent>186 p.</dcterms:extent> 
  <isbd:hasPlaceOfPublicationProductionDistribution> 
    Firenze 
  </isbd:hasPlaceOfPublicationProductionDistribution> 
  <dcterms:issued>1997</dcterms:issued> 
  <dcterms:publisher rdf:resource="http://www.cbt.trentinocultura.net/organisations/giunti"/> 
</bibo:Book> 
...the author... 
<foaf:Person rdf:about="http://www.cbt.trentinocultura.net/person/collodi_carlo"> 
  <foaf:name>Collodi, Carlo</foaf:name> 
</foaf:Person> 
...and the publisher 
<foaf:Organization rdf:about="http://www.cbt.trentinocultura.net/organisations/giunti"> 
  <foaf:name>Giunti</foaf:name> 
</foaf:Organization> 
Step 1: transform MARC into RDF 
As a first step we need to transform the MARC records into their corresponding RDF 
representation. 
This presentation does not focus on this advanced topic; we will just index ten 
MARC records to demonstrate the capabilities of the system. 
We chose the RDF / XML format for expressing the resulting triples. This 
will be the input data of the system. 
MARC 21 → RDF / XML 
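The presentation does not describe the actual conversion, so the following is only a rough sketch of what such a step might look like. The mapping table `FIELD_TO_PREDICATE`, the function `marc_to_triples` and the punctuation clean-up are all assumptions; the predicates and the record identifier are borrowed from the RDF / XML slide.

```python
BASE = "http://www.cbt.trentinocultura.net/biblio/"

# Assumed, deliberately tiny mapping from (MARC tag, subfield) to a predicate.
FIELD_TO_PREDICATE = {
    ("020", "a"): "bibo:isbn10",
    ("245", "a"): "dcterms:shortTitle",
    ("260", "c"): "dcterms:issued",
}


def marc_to_triples(record_id, fields):
    """Turn {(tag, subfield): value} into (subject, predicate, object) triples."""
    subject = BASE + record_id
    triples = []
    for key, value in fields.items():
        predicate = FIELD_TO_PREDICATE.get(key)
        if predicate is not None:
            # Drop trailing ISBD punctuation such as " /" or ".".
            triples.append((subject, predicate, value.rstrip(" /.,:")))
    return triples


record = {
    ("020", "a"): "880921191X",
    ("245", "a"): "Le avventure di Pinocchio /",
    ("260", "c"): "1997.",
    ("521", "a"): "Letteratura per ragazzi",  # unmapped, so it is skipped
}

triples = marc_to_triples("000002577949", record)
for triple in triples:
    print(triple)
```

A real conversion would also have to mint URIs for authors, publishers and subjects, which this sketch does not attempt.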
Step 2: submit RDF data 
The RDF data created in the previous step needs to be submitted to the 
system. 
RDF / XML 
Step 3: make a search... 
Autocompletion 
Faceting 
Step 4: more publisher data... 
It would be great if my users could see 
additional data in the search results. 
For example, I could ask publishers for data 
(logo, homepage and so on)... for them 
it could be a kind of advertisement, while for my users it would be 
additional information displayed in my catalog. 
But 
1) I don't want that data to be part of my search index; 
2) I don't want to include that data in my bibliographic database; 
3) I don't want to reindex my data when some publisher information changes; 
4) I would like to manage and improve that data without affecting searches. 
Step 6: Our sample publisher 
Before... 
<foaf:Organization rdf:about="http://www.cbt.trentinocultura.net/organisations/giunti"> 
  <foaf:name>Giunti</foaf:name> 
</foaf:Organization> 
...and after 
<foaf:Organization rdf:about="http://www.cbt.trentinocultura.net/organisations/giunti"> 
  <foaf:name>Giunti</foaf:name> 
  <foaf:logo rdf:resource="http://www.giunti.it/custom/src/@css/images/logo_Giunti.jpg"/> 
  <rdfs:comment>Fondata nel pieno delle battaglie risorgimentali...</rdfs:comment> 
  <foaf:mbox rdf:resource="mailto:contactsus@domain.it"/> 
  <foaf:homepage rdf:resource="http://www.giunti.it"/> 
</foaf:Organization> 
As you can see, we added a logo, a brief description of the publisher, a mailbox and a 
homepage. We got the data directly from the publisher's website. 
This data is submitted to the search system again, but without rebuilding the search index. 
As a consequence, changes made to the publishers are immediately available. 
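The point that publisher data can change without a reindex can be illustrated with a toy model. This is a sketch under assumptions: both components are plain dictionaries and `enrich` is a hypothetical helper, not part of the system presented here.

```python
GIUNTI = "http://www.cbt.trentinocultura.net/organisations/giunti"

# Built once at indexing time; treated as immutable in this sketch.
search_index = {"giunti": [GIUNTI]}

# Mutable description store, keyed by resource URI.
triple_store = {GIUNTI: {"foaf:name": "Giunti"}}


def enrich(uri, properties):
    """Add or overwrite properties of a resource; the search index is not touched."""
    triple_store.setdefault(uri, {}).update(properties)


index_before = dict(search_index)
enrich(GIUNTI, {"foaf:homepage": "http://www.giunti.it"})

print(triple_store[GIUNTI]["foaf:homepage"])  # http://www.giunti.it
print(search_index == index_before)           # True
```

The next time a search result for this publisher is displayed, the view picks up the new homepage, while the index is exactly what it was before.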
Step 7: see additional data... 
Step 7 bis: another publisher... 
Step 8: still more (linked) data... (1/3) 
Great! My users were enthusiastic!! 
So I'd like more... and not only publishers... 
but what else? 
Sir, I think it would be very useful to 
show author information beside each record. 
Yes, it definitely would, but you have no idea how much 
work it took to insert all the publisher data, and I don't 
want to do the same for authors... too much work! 
If I remember well, your system is 
using Linked Data, isn't it? 
Yes 
So in this case the right question is not “How can I do it? I have no data”, 
but “What kind of data would I like to show?” 
??? 
Step 8: still more (linked) data...(2/3) 
There are a lot of authoritative RDF endpoints that expose their data free of charge; 
the main advantage is that you can link this information to your system and you 
don't have to worry about its maintenance: it's not your data! See 
http://viaf.org or http://dbpedia.org 
By linking to those resources, you get data in a standardized way, because the sources 
share one or more (accepted) ontologies for describing authors, subjects, 
things and so on... 
So for the example above we need to gather additional information about people 
(authors), and fortunately there's an ontology called Friend of a Friend (FOAF) that 
fits our needs exactly. This ontology is widely used by RDF sources describing persons 
(such as VIAF and DBpedia). 
In our example, instead of copying all the information about Carlo Collodi, the author 
of “The adventures of Pinocchio”, into our triple store (as we did for 
publishers), we will simply link our internal representation to the same resource 
as defined in DBpedia. 
Step 8: still more (linked) data...(3/3) 
Step 9: Our sample author 
Before... 
<foaf:Person rdf:about="http://www.cbt.trentinocultura.net/person/collodi_carlo"> 
  <foaf:name>Collodi, Carlo</foaf:name> 
</foaf:Person> 
...and after 
<foaf:Person rdf:about="http://www.cbt.trentinocultura.net/person/collodi_carlo"> 
  <foaf:name>Collodi, Carlo</foaf:name> 
  <owl:sameAs rdf:resource="http://dbpedia.org/resource/Carlo_Collodi"/> 
</foaf:Person> 
As you can see, we didn't add any information, just a “link” expressed with the owl:sameAs predicate. 
The URI (http://dbpedia.org/resource/Carlo_Collodi) points to a web resource describing 
Carlo Collodi, so we can gather that data and, for example, display it to the end user. 
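As a small illustration of how such links might be discovered, the sketch below reads an RDF / XML document and collects the owl:sameAs targets of each described resource. It assumes the author is described as a foaf:Person wrapped in an rdf:RDF root with the usual namespace declarations; the function name `same_as_links` is hypothetical, and fetching the linked DBpedia resource is left out.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"

RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://www.cbt.trentinocultura.net/person/collodi_carlo">
    <foaf:name>Collodi, Carlo</foaf:name>
    <owl:sameAs rdf:resource="http://dbpedia.org/resource/Carlo_Collodi"/>
  </foaf:Person>
</rdf:RDF>"""


def same_as_links(rdf_xml):
    """Map each described resource URI to the list of its owl:sameAs targets."""
    links = {}
    root = ET.fromstring(rdf_xml)
    for resource in root:
        about = resource.get(f"{{{RDF_NS}}}about")
        targets = [
            el.get(f"{{{RDF_NS}}}resource")
            for el in resource.findall(f"{{{OWL_NS}}}sameAs")
        ]
        if about and targets:
            links[about] = targets
    return links


print(same_as_links(RDF_XML))
```

The view layer could then retrieve each target URI and merge the remote description with the local one at display time, leaving the search index untouched.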
Step 10: again the same search... 
Step 10 bis: another author... 
Step 11: still more data??? yes! 
Wow!! And now? 
Is there some other content I could “link”? 
Yes sir, subjects for example... are you using subjects 
coming from the “Nuovo Soggettario”? 
Yes 
So in this case you can link those subjects directly 
to the concepts of the thesaurus, thereby providing 
end users with information like scope notes, 
history notes, term relationships and so on... 
And, as another example, for places you can link to “Geonames” 
resources, which provide RDF descriptions of cities and countries. 
Step 12: Linking the “Nuovo Soggettario“ 
Step 13: Linking Firenze with Geonames 
Thank You!

A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Linking library data for fast search and rich display

  • 1. 31st ADLUG ANNUAL MEETING 2012 Sala Brunelleschi of the OPA – CESVOT - Firenze 19 – 21 September 2012 Linking Linked Data Andrea Gazzarini Software Architect Copyright 2009-2010 @CULT. All rights reserved
  • 2. Agenda: Goals, Information Retrieval, Triple store, Proof of concept, Q&A
  • 3. Agenda: Goals, Information Retrieval, Triple store, Proof of concept, Q&A
  • 4. Goals
      1) Combine two different technologies to improve the user search experience by decoupling the “search” perspective from the “view” perspective.
      2) Provide a fast, full-featured full-text search that can scale to billions of records and offers typical search features such as faceting, stemming and autocompletion.
      3) Provide a system that can benefit from the extensibility of Linked Data.
  • 5. Le avventure di Pinocchio This is a record extracted from the record set we will use during this presentation.
      000 00694nam a2200241 i 4500
      008 971205s1997 it j 000 0 ita c
      020 a 880921191X
      082 1 a 853.8
      100 1 a Collodi, Carlo.
      245 13 a Le avventure di Pinocchio /
          c C. Collodi ; illustrazioni di Attilio Mussino.
      260 a Firenze :
          b Giunti,
          c 1997.
      440 0 a Collana favolosa / [Giunti]
      521 a Letteratura per ragazzi
      700 1 a Mussino, Attilio.
  • 6. Agenda: Goals, Information Retrieval, Triple store, Proof of concept, Q&A
  • 7. Information Retrieval (1/2) For our purposes we will (simplistically) define an Information Retrieval (IR) system as a full-text search framework that indexes textual data and manipulates it to enable search features that matter to end users, such as:
      » Relevance computation and boosting
      » Autocompletion
      » Faceting
      » Stemming
      » Did you mean?
      » Search by phoneme (i.e. Sounds Like)
      » More like this
      » ...and many, many others...
      But there's a price to pay for that...
  • 8. Inverted index “In computer science, an inverted index (also referred to as postings file or inverted file) is an index data structure storing a mapping from content, such as words or numbers, to its locations in a database file, or in a document or a set of documents. The purpose of an inverted index is to allow fast full text searches, at a cost of increased processing when a document is added to the database. The inverted file may be the database file itself, rather than its index. It is the most popular data structure used in document retrieval systems.” (http://en.wikipedia.org/wiki/Inverted_index) An inverted index is an optimized structure that allows fast searches, but it is supposed to be immutable: if you need to change something in your data, you need to rebuild your index.
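The mapping from words to documents described on this slide can be sketched in a few lines of Python. This is only an illustration, not the engine used in the presentation: the function names are made up for the example, and the second record ("rec2") is hypothetical sample data added so that the index holds more than one document.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of the documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# "rec1" uses the title from the slides; "rec2" is a hypothetical second record.
documents = {
    "rec1": "Le avventure di Pinocchio",
    "rec2": "Le avventure di Tom Sawyer",
}
index = build_inverted_index(documents)
hits = search(index, "avventure pinocchio")
```

A lookup touches only the postings of the query words, which is why searches are fast. The cost is paid in `build_inverted_index`, which has to run again whenever the indexed text changes.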
  • 9. Semantic destruction (1/3) A search engine doesn't care how accurate you were or how much time you spent cataloguing a bibliographic resource... once indexed, it will lose any semantic meaning! [Illustration: sample text reduced to scattered letters.]
  • 10. Semantic destruction (2/3)
      Input: The adventures of Pinocchio
      Tokenization: The | adventures | of | Pinocchio
      Stopwords: adventures | Pinocchio
      Lowercase: adventures | pinocchio
      Stemming (light): adventure | pinocchio
      Phoneme (!): ATFN | PNX
      These are the only tokens that will be indexed!
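The analysis chain on this slide can be approximated as a sequence of small functions. The sketch below is a toy version under stated assumptions: the stopword list is a tiny hand-picked set, the "light" stemmer only strips a trailing "s", and the phonetic step is left out because reproducing codes such as ATFN would require a real phonetic algorithm.

```python
# Assumed, deliberately tiny English stopword list.
STOPWORDS = {"the", "of", "a", "an", "and"}


def tokenize(text):
    """Split the text on whitespace."""
    return text.split()


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def lowercase(tokens):
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def light_stem(tokens):
    """Toy stemmer: strip one trailing 's' from words longer than three letters."""
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def analyze(text):
    """Run the chain in the order shown on the slide (without the phonetic step)."""
    tokens = tokenize(text)
    tokens = remove_stopwords(tokens)
    tokens = lowercase(tokens)
    return light_stem(tokens)


tokens = analyze("The adventures of Pinocchio")
```

Only the output of `analyze` reaches the index. The original string, and the fact that it was a title, are gone at that point.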
  • 11. Semantic destruction (3/3) ATFN PNX KRL KLT
  • 12. Agenda: Goals, Information Retrieval, Triple store, Proof of concept, Q&A
  • 13. Triple store (1/2) “A triplestore is a purpose-built database for the storage and retrieval of triples, a triple being a data entity composed of subject-predicate-object, like "Bob is 35" or "Bob knows Fred".” (http://en.wikipedia.org/wiki/Triplestore)
      Subject | Predicate    | Object
      book    | hasTitle     | The adventures of Pinocchio
      book    | hasAuthor    | Collodi, Carlo
      book    | hasPublisher | Giunti
      A triple store works like a database and has essentially nothing in common with an inverted index.
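To make the subject-predicate-object model concrete, here is a minimal in-memory sketch in Python. The `TripleStore` class and its `match` method are hypothetical names for this example. A real triple store would be queried through SPARQL rather than a Python method.

```python
class TripleStore:
    """Toy in-memory store of (subject, predicate, object) tuples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return the triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
store.add("book", "hasTitle", "The adventures of Pinocchio")
store.add("book", "hasAuthor", "Collodi, Carlo")
store.add("book", "hasPublisher", "Giunti")

authors = store.match(predicate="hasAuthor")
```

Every statement keeps its predicate, so the store still knows that "Collodi, Carlo" is an author and not just a string that occurs somewhere in the record.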
  • 14. Triple store (2/2) Using a triple store you get:
      1) a standard query language (SPARQL) to query the store;
      2) a standard format for exchanging data (RDF);
      3) a storage where you are free to change your data in real time without any kind of reindex operation.
      Most importantly, though, you get none of the search features described in the previous slides: some are practically impossible (e.g. faceting), while for others (e.g. autocompletion) the problem is mainly response time.
  • 15. Agenda: Goals, Information Retrieval, Triple store, Proof of concept, Q&A
  • 16. Proof of Concept Our system combines the two technologies described above, trying to get all of their advantages while minimizing their disadvantages. [Diagram labels: MARC (binary), MARC XML, RDF / XML, N3, Turtle, N-Triples; Search: Information Retrieval; View: Triple store]
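The split between a "search" side and a "view" side might look like the following Python sketch. It is not the architecture of the actual system, which the slides do not detail. The dictionaries stand in for the search index and the triple store, the URIs are shortened forms of the ones used later in the deck, and the `search` and `view` function names are assumptions.

```python
# Search side: a word -> record URI index, built once from the titles.
search_index = {
    "avventure": {"biblio/000002577949"},
    "pinocchio": {"biblio/000002577949"},
}

# View side: (predicate, value) pairs grouped by subject, standing in for a triple store.
view_store = {
    "biblio/000002577949": [
        ("dcterms:title", "Le avventure di Pinocchio"),
        ("dc:creator", "person/collodi_carlo"),
        ("dcterms:publisher", "organisations/giunti"),
    ],
    "person/collodi_carlo": [("foaf:name", "Collodi, Carlo")],
    "organisations/giunti": [("foaf:name", "Giunti")],
}


def search(term):
    """Search side: return the URIs of the records matching a single term."""
    return sorted(search_index.get(term.lower(), set()))


def view(uri):
    """View side: build a display record, resolving links to other stored resources."""
    record = {}
    for predicate, value in view_store.get(uri, []):
        if value in view_store:
            linked = dict(view_store[value])
            record[predicate] = linked.get("foaf:name", value)
        else:
            record[predicate] = value
    return record


results = [view(uri) for uri in search("Pinocchio")]
```

The index answers "which records match?" and returns identifiers only. Everything shown to the user comes from the store, which can change without touching the index.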
  • 17. Concretely...
  • 18. Le avventure di Pinocchio (MARC)
      000 00694nam a2200241 i 4500
      008 971205s1997 it j 000 0 ita c
      020 a 880921191X
      082 1 a 853.8
      100 1 a Collodi, Carlo.
      245 13 a Le avventure di Pinocchio /
          c C. Collodi ; illustrazioni di Attilio Mussino.
      260 a Firenze :
          b Giunti,
          c 1997.
      440 0 a Collana favolosa / [Giunti]
      521 a Letteratura per ragazzi
      700 1 a Mussino, Attilio.
  • 19. Le avventure di Pinocchio (RDF / XML)
      The book...
      <bibo:Book rdf:about="http://www.cbt.trentinocultura.net/biblio/000002577949">
        <dcterms:identifier>000002577949</dcterms:identifier>
        <bibo:isbn10>880921191X</bibo:isbn10>
        <dcterms:shortTitle>Le avventure di Pinocchio</dcterms:shortTitle>
        <dcterms:title>Le avventure di Pinocchio / C. Collodi ; illustrazioni di Attilio Mussino</dcterms:title>
        <dc:creator rdf:resource="http://www.cbt.trentinocultura.net/person/collodi_carlo"/>
        <dcterms:language>ita</dcterms:language>
        <dcterms:audience rdf:resource="http://www.cbt.trentinocultura.net/subject/opera_per_bambini"/>
        <dcterms:isPartOf rdf:resource="http://www.cbt.trentinocultura.net/biblio/2378129373323"/>
        <dcterms:extent>186 p.</dcterms:extent>
        <isbd:hasPlaceOfPublicationProductionDistribution>Firenze</isbd:hasPlaceOfPublicationProductionDistribution>
        <dcterms:issued>1997</dcterms:issued>
        <dcterms:publisher rdf:resource="http://www.cbt.trentinocultura.net/organisations/giunti"/>
      </bibo:Book>
      ...the author...
      <foaf:Person rdf:about="http://www.cbt.trentinocultura.net/person/collodi_carlo">
        <foaf:name>Collodi, Carlo</foaf:name>
      </foaf:Person>
      ...and the publisher
      <foaf:Organization rdf:about="http://www.cbt.trentinocultura.net/organisations/giunti">
        <foaf:name>Giunti</foaf:name>
      </foaf:Organization>
  • 20. Step 1: transform MARC into RDF As a first step we need to transform MARC records into their corresponding RDF representation. This presentation does not focus on that advanced topic; we will index just ten MARC records to demonstrate the capabilities of the system. We chose the RDF / XML format to express the resulting triples. This will be the input data of the system. [Diagram: MARC 21 to RDF / XML]
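The presentation leaves the MARC-to-RDF conversion out of scope, so the following Python sketch is only a guess at the general shape of such a step. The field-to-predicate mapping is an assumption inferred from the sample record and its RDF / XML rendering, and the `marc_to_triples` and `clean` helpers are hypothetical.

```python
# A few (tag, subfield) pairs from the sample record in the slides.
RECORD = {
    ("020", "a"): "880921191X",
    ("245", "a"): "Le avventure di Pinocchio /",
    ("260", "a"): "Firenze :",
    ("260", "c"): "1997.",
}

# Assumed mapping from MARC subfields to the predicates seen in the RDF / XML slide.
MAPPING = {
    ("020", "a"): "bibo:isbn10",
    ("245", "a"): "dcterms:shortTitle",
    ("260", "a"): "isbd:hasPlaceOfPublicationProductionDistribution",
    ("260", "c"): "dcterms:issued",
}


def clean(value):
    """Strip the trailing ISBD punctuation that MARC subfields usually carry."""
    return value.rstrip(" /:;,.")


def marc_to_triples(subject, record):
    """Turn the mapped subfields of one record into (subject, predicate, object) triples."""
    return [
        (subject, MAPPING[key], clean(value))
        for key, value in record.items()
        if key in MAPPING
    ]


BOOK = "http://www.cbt.trentinocultura.net/biblio/000002577949"
triples = marc_to_triples(BOOK, RECORD)
```

A real conversion also has to mint URIs for authors, publishers and subjects, and that is where most of the effort usually goes.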
  • 21. Step 2: submit RDF data The RDF data (RDF / XML) created in the previous step needs to be submitted to the system.
  • 22. Step 3: make a search... [Screenshot labels: Autocompletion, Faceting]
  • 23. Step 4: more publisher data... It would be great if my users could see additional data in the search results. For example, I could ask publishers for data (logo, homepage and so on)... for them it could be a kind of advertisement, while for my users it would be additional information displayed in my catalog. But:
      1) I don't want that data to be part of my search index;
      2) I don't want to include that data in my bibliographic database;
      3) I don't want to reindex my data when some publisher information changes;
      4) I would like to manage and improve that data without affecting searches.
  • 24. Step 6: Our sample publisher
      Before...
      <foaf:Organization rdf:about="http://www.cbt.trentinocultura.net/organisations/giunti">
        <foaf:name>Giunti</foaf:name>
      </foaf:Organization>
      ...and after
      <foaf:Organization rdf:about="http://www.cbt.trentinocultura.net/organisations/giunti">
        <foaf:name>Giunti</foaf:name>
        <foaf:logo rdf:resource="http://www.giunti.it/custom/src/@css/images/logo_Giunti.jpg"/>
        <rdfs:comment>Fondata nel pieno delle battaglie risorgimentali...</rdfs:comment>
        <foaf:mbox rdf:resource="mailto:contactsus@domain.it"/>
        <foaf:homepage rdf:resource="http://www.giunti.it"/>
      </foaf:Organization>
      As you can see, we added a logo, a brief description of the publisher, a mailbox and a homepage. We got the data directly from the publisher's website. This data will be submitted to the search system again, but without rebuilding the search index. As a consequence, changes made to the publishers are immediately available.
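The claim that publisher data can change without a reindex can be illustrated with a small Python sketch. It assumes, as a simplification, that the search index is built from titles only and that publisher descriptions live in a separate set of triples. The function names are hypothetical.

```python
def build_search_index(records):
    """Build a word -> record URI index from the titles only."""
    index = {}
    for uri, title in records.items():
        for word in title.lower().split():
            index.setdefault(word, set()).add(uri)
    return index


GIUNTI = "http://www.cbt.trentinocultura.net/organisations/giunti"
records = {
    "http://www.cbt.trentinocultura.net/biblio/000002577949": "Le avventure di Pinocchio",
}

search_index = build_search_index(records)
index_before = {word: set(uris) for word, uris in search_index.items()}

# "Before": the publisher is described by its name only.
triples = {(GIUNTI, "foaf:name", "Giunti")}

# "After": enrich the publisher description; the search index is never rebuilt.
triples |= {
    (GIUNTI, "foaf:homepage", "http://www.giunti.it"),
    (GIUNTI, "foaf:mbox", "mailto:contactsus@domain.it"),
}


def describe(subject):
    """Collect every predicate/object pair stored for a subject."""
    return {p: o for s, p, o in triples if s == subject}


publisher = describe(GIUNTI)
```

After the update, `search_index` is identical to `index_before`, while `describe(GIUNTI)` already returns the new homepage and mailbox.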
  • 25. Step 7: see additional data...
  • 26. Step 7 bis: another publisher...
  • 27. Step 8: still more (linked) data... (1/3)
      “Great! My users were enthusiastic!! So I'd like more... and not only publishers... but what else?”
      “Sir, I think it would be very useful to show author information beside each record.”
      “Yes, it definitely would, but you have no idea how much work it took to insert all the publisher data, and I don't want to do the same for authors... too much work!”
      “If I remember well, your system is using Linked Data, isn't it?”
      “Yes.”
      “So in this case the right question is not ‘How can I do it? I have no data’, but ‘What kind of data would I like to show?’”
      “???”
  • 28. Step 8: still more (linked) data... (2/3) There are a lot of authoritative RDF endpoints that expose their data free of charge; the main advantage is that you can link this information to your system without having to worry about its maintenance: it's not your data! See http://viaf.org or http://dbpedia.org. By linking those resources, you get data in a standardized way, because the sources share one or more (accepted) ontologies for describing authors, subjects, things and so on. For the example above we need to gather additional information about people (authors), and fortunately there is an ontology called Friend of a Friend (FOAF) that fits our needs exactly. This ontology is widely used by RDF sources that describe persons (like VIAF and DBpedia). In our example, instead of copying all information about Carlo Collodi, the author of “The adventures of Pinocchio”, into our triple store (as we did for publishers), we will simply link our internal representation to the same resource as defined in DBpedia.
  • 29. Step 8: still more (linked) data... (3/3)
  • 30. Step 9: Our sample author
      Before...
      <foaf:Person rdf:about="http://www.cbt.trentinocultura.net/person/collodi_carlo">
        <foaf:name>Collodi, Carlo</foaf:name>
      </foaf:Person>
      ...and after
      <foaf:Person rdf:about="http://www.cbt.trentinocultura.net/person/collodi_carlo">
        <foaf:name>Collodi, Carlo</foaf:name>
        <owl:sameAs rdf:resource="http://dbpedia.org/resource/Carlo_Collodi"/>
      </foaf:Person>
      As you can see, we didn't add any information, just a “link” expressed with the sameAs predicate. The URI (http://dbpedia.org/resource/Carlo_Collodi) points to a web resource describing Carlo Collodi, so we can gather that data and, for example, display it to the end user.
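Following an owl:sameAs link at display time could work roughly as in the Python sketch below. To stay runnable offline it does not contact DBpedia: `remote_data` is a stand-in for whatever the remote endpoint would return, its single rdfs:label triple is placeholder data, and the `describe` function and its `fetch` parameter are hypothetical.

```python
LOCAL = "http://www.cbt.trentinocultura.net/person/collodi_carlo"
REMOTE = "http://dbpedia.org/resource/Carlo_Collodi"

# Local triple store content after the "link" has been added.
local_triples = [
    (LOCAL, "foaf:name", "Collodi, Carlo"),
    (LOCAL, "owl:sameAs", REMOTE),
]

# Stand-in for the remote endpoint; a real system would dereference the URI.
remote_data = {
    REMOTE: [(REMOTE, "rdfs:label", "Carlo Collodi")],
}


def describe(uri, fetch):
    """Merge local statements with those of any resource linked via owl:sameAs."""
    description = {}
    for subject, predicate, obj in local_triples:
        if subject != uri:
            continue
        if predicate == "owl:sameAs":
            for _, remote_predicate, remote_obj in fetch(obj):
                # Local values win when the same predicate exists on both sides.
                description.setdefault(remote_predicate, remote_obj)
        else:
            description[predicate] = obj
    return description


author = describe(LOCAL, lambda uri: remote_data.get(uri, []))
```

Passing `fetch` as a parameter keeps the linking logic independent of how the remote data is retrieved, whether through an HTTP request, a SPARQL endpoint or a local cache.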
  • 31. Step 10: again the same search...
  • 32. Step 10 bis: another author...
  • 33. Step 11: still more data??? yes!
      “Wow!! And now? Is there some other content I could ‘link’?”
      “Yes sir, subjects for example... are you using subjects coming from the ‘Nuovo Soggettario’?”
      “Yes.”
      “So in this case you can link those subjects directly to the concepts of the thesaurus, giving end users information like scope notes, history notes, term relationships and so on. And, as another example, for places you can link ‘Geonames’ resources, which provide RDF descriptions of cities and countries.”
  • 34. Step 12: Linking the “Nuovo Soggettario”
  • 35. Step 13: Linking Firenze with Geonames
  • 36. Agenda: Goals, Information Retrieval, Triple store, Proof of concept, Q&A
  • 37. 31st ADLUG ANNUAL MEETING 2012 Sala Brunelleschi of the OPA – Firenze 19 – 21 September 2012 Linking Linked Data Thank You!
