Search challenges for 
collections of book records 
Roberto Cornacchia 
ECIR 2014 – Industry day 
Amsterdam, 16 April 2014 
> design > publish > search!
2 
Outline 
● COMSODE (EU-FP7) 
– Publication platform for Linked Open Data 
● Spinque 
– Search modelling 
● A use-case from Digital Humanities 
– link, clean, search 
● A step further 
– Rank. Everything. Always. 
– Query-time resolution of data conflicts
3 
Unlocking the value of L(O)D... is a hot topic
In the public sector
In industry
Source: Open Data 500 by The GovLab
In science
Source: Bradley Allen, SlideShare
4 
The COMSODE project has received funding from the Seventh Framework Programme of the European Union under grant agreement number 611358.
COMSODE 
Unlock LOD value 
by improving publication 
www.comsode.eu
5 
Spinque 
● Spin-off of CWI Amsterdam (2009) 
● Develops domain-tailored search technology 
– Applied to: 
● IP, multimedia, cultural heritage, child-friendly, ... 
– Search by Strategy 
● visual modelling of search processes 
– Rank. Everything. Always. 
● integrated support for all-round probabilistic search 
● Work in progress in COMSODE 
– Search Linked Data
6 
A use case in Digital Humanities 
● "Can We Rank Scholarly Book Publishers? 
A Bibliometric Experiment with the Field of History" 
(Zuccala et al., Journal of the American Society for Information Science and Technology, 2014) 
● Goal: indicate publisher prestige quantitatively
– using bibliographic citations from journal articles to books
● Dataset: Elsevier Scopus journal citations 
– Granted via the 2012 Elsevier Bibliometrics Research Program 
– 5.6M citations, 3M from journals to books 
– History & literature 
– Periods 1996-2000 and 2007-2011
7 
Elsevier Scopus dataset 
citing_eid,cited_eid,source_title,source_id,article_pubyear,authors,article_title,volume,page_start,doctype 
4702,232311,"American Antiquity",40554,1996,"Graybill D. (6603866252);Michaelsen J. (7003483600);Neff H. (7005907495);Larson D. (7402633779);Ambos E. (14048059100)","Risk, climatic variability, and the study of southwestern prehistory: An evolutionary perspective",61,217,re 
4702,1333725,"American Antiquity",40554,1997,"Raab L. (6601955075);Larson D. (7402633779)","Medieval climatic anomaly and punctuated cultural evolution in coastal Southern California",62,319,ar 
4702,7613691,"American Antiquity",40554,1997,"Colten R. (8363369400);Arnold J. (8754215200);Pletka S. (25221793700)","Contexts of cultural change in insular California",62,300,ar 
4702,30302643,"Quarternary Science Reviews",26239,1996,"Stuiver M. (7007003882);Reimer P. (7103071876);Taylor R. (26030669400)","Development and extension of the calibration of the radiocarbon time scale: Archaeological applications",15,655,ar 
4702,30317536,"Canadian Journal of Earth Sciences",22031,1996,"Dyke A. (7003706220);McNeely R. (7004891098);Hooper J. (7102438470)","Marine reservoir corrections for bowhead whale radiocarbon age determinations",33,1628,ar 
4702,30739323,"Journal of Coastal Research",27374,1997,"Mason O. (7004241927);Hopkins D. (7202255075);Plug L. (7801522080)","Chronology and paleoclimate of storm-induced erosion and episodic dune growth across Cape Espenberg spit, Alaska, U.S.A.",13,770,ar 
7154,2287569,"American Sociological Review",16929,1997,"Goodwin J. (7402339411)","The libidinal constitution of a high-risk social movement: Affectual ties and solidarity in the Huk rebellion, 1946 to 1954",62,53,re 
7154,30495855,"Sociological Theory",18110,1996,"Emirbayer M. (23110549400)","Useful Durkheim",14,109,ar 
9412,9986565,"British Journal for the Philosophy of Science",19977,1997,"Eliasmith C. (6603720957);Thagard P. (6701846211)","Waves, Particles, and Explanatory Coherence",48,1,ar 
9412,30006171,Gastroenterology,28330,1996,"Hamlet A. (6701690210);Dalenbäck J. (7003418017);Fändriks L. (7005233384);Olbe L. (7006954993)","A mechanism by which Helicobacter pylori infection of the antrum contributes to the development of duodenal ulcer",110,1386,ar
[Figure: CSV files are converted to RDF. Example: the article "Power and community: The archaeology of slavery at the Hermitage plantation" (Thomas B., 1998, American Antiquity, a history journal) cites the book "Mississippian Political Economy" (Muller J., 1997).]
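The CSV-to-RDF step can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the project: it reads only the first two columns of the sample, the `scopus:` prefix on identifiers is an assumption, and the predicate name `scopus:cites` is taken from the query on the next slide.

```python
import csv
import io

# Two rows from the sample above, trimmed to the first five columns.
SAMPLE = '''citing_eid,cited_eid,source_title,source_id,article_pubyear
4702,232311,"American Antiquity",40554,1996
7154,30495855,"Sociological Theory",18110,1996
'''

def csv_to_triples(text):
    """Yield one (subject, predicate, object) 'cites' triple per CSV row."""
    for row in csv.DictReader(io.StringIO(text)):
        citing = "scopus:" + row["citing_eid"]   # assumed identifier scheme
        cited = "scopus:" + row["cited_eid"]     # assumed identifier scheme
        yield (citing, "scopus:cites", cited)

triples = list(csv_to_triples(SAMPLE))
for triple in triples:
    print(triple)
```

The remaining columns (title, year, authors) would become further triples about the same publication in the same way.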
9 
Warm-up 
● Load RDF data
– (subject, predicate, object)
    subject       predicate   object
    publication1  cites       publication2
    publication1  cites       publication3
    publication3  publisher   publisher5
● Most cited publications
    SELECT ?publication (COUNT(*) AS ?nCitations)
    WHERE { [] scopus:cites ?publication }    # predicate traversal
    GROUP BY ?publication                     # aggregation
    ORDER BY DESC(?nCitations)

    publication   nCitations
    publication3  288
    publication5  223
    publication2  124
● No problem with SPARQL or SQL
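For readers without a SPARQL endpoint at hand, the same two steps (predicate traversal, then aggregation) can be sketched in plain Python over the toy triple table from this slide. The predicate is written as `cites` here, standing in for `scopus:cites`, and the counts are those of the toy table, not of the real dataset.

```python
from collections import Counter

# Toy triple table from the slide.
TRIPLES = [
    ("publication1", "cites", "publication2"),
    ("publication1", "cites", "publication3"),
    ("publication3", "publisher", "publisher5"),
]

def most_cited(triples):
    """Return (publication, nCitations) pairs, most cited first."""
    # Predicate traversal: keep only the objects of 'cites' triples.
    cited = (o for (_s, p, o) in triples if p == "cites")
    # Aggregation: count occurrences and sort by descending count.
    return Counter(cited).most_common()

ranking = most_cited(TRIPLES)
print(ranking)
```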
11 
Warm-up .. visually 
"Search by Strategy" 
[Figure: a Search by Strategy diagram. Data flow: Elsevier data source, predicate traversal, aggregation. Buttons: Deploy REST API, Deploy search engine.]
12 
Back to the original goal: 
rank publishers 
[Figure: Elsevier – Scopus (closed data): journal articles, cited books, "cited" publishers.]
13 
Back to the original goal:
rank publishers
● Open Data Node
– Links books
● Search
– Uses links
– On-the-fly matching?
[Figure: journal articles and cited books from Elsevier – Scopus (closed data) are linked via sameAs to cited books in OCLC – WorldCat (open data), which are aggregated into "cited" publishers. Button: Deploy search engine.]
14 
Surprise.. 
– University Press,Cambridge [England] 
– University Press,Cambridge [etc.] 
– University Press,"Cambridge, Mass.," 
– University Press,"Cambridge, N.E." 
– University Press,"Cambridge, U.K." 
– University Press,"Cambridge, UK" 
– University Press,Cambridge [U.K.] 
– University Press [etc.],Cambridge 
– University Press [etc.],"Cambridge [Eng., etc.]"
– University Press [etc.],Cambridge [etc.]
– "University press [etc., etc.]","Cambridge,"
– University Pressf ats collnutz,Cambridge
– University Press of Cambridge,"Boston, Mass."
– University Press of Cambridge,"[Cambridge, Mass.]"
– Univ. of Cambridge,Cambridge 
– Univ. P.,Cambridge 
– Univ. Pr,Cambridge 
– Univ. Pr.,Cambridge 
– Univ.Pr.,Cambridge 
– Univ. Pr.,Cambridge [u.a.] 
– Univ. Pr.,"Cambridge, U.S.A." 
– Univ. Pr.,Cambridge [usw.] 
2588 variations (just for "Cambridge University Press").
Probably only 2 or 3 distinct entities in there.
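A rule-based normalisation pass shows why such a list is hard to collapse. The sketch below is only an illustration: the abbreviation table and the filler patterns are assumptions made up for this example, not rules used in the project. It merges the easy variants and keeps "Cambridge, Mass." apart, but many spellings in the list above (for example "U.K." versus "UK") would still need fuzzy matching.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; a real one would be curated from the data.
ABBREVIATIONS = {"univ": "university", "pr": "press", "p": "press"}
# Bracketed fillers meaning "and others" ([etc.], [u.a.], [usw.]).
FILLER = re.compile(r"\[(?:etc\.|u\.a\.|usw\.)\]", re.IGNORECASE)

def normalise(text):
    """Lower-case, drop fillers and punctuation, expand known abbreviations."""
    text = FILLER.sub(" ", text)
    words = re.findall(r"[a-z]+", text.lower())
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

# (publisher, place) pairs copied from the list above.
VARIANTS = [
    ("University Press [etc.]", "Cambridge [etc.]"),
    ("Univ. Pr.", "Cambridge"),
    ("Univ.Pr.", "Cambridge"),
    ("Univ. P.", "Cambridge"),
    ("Univ. Pr.", "Cambridge [u.a.]"),
    ("University Press", "Cambridge, Mass.,"),
]

groups = defaultdict(list)
for publisher, place in VARIANTS:
    groups[(normalise(publisher), normalise(place))].append((publisher, place))

for key, members in groups.items():
    print(key, len(members))
```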
16 
De-duplicate publishers 
● Open Data Node
– Links duplicates
● Search
– Uses links
– On-the-fly matching?
[Figure: journal articles and cited books from Elsevier – Scopus (closed data) are linked via sameAs to cited books in OCLC – WorldCat (open data) and aggregated into "cited" publishers; duplicate publishers are also linked to each other via sameAs. Button: Deploy search engine.]
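Once duplicate publishers are connected by sameAs links, the search side can treat each connected group as one entity and sum its citations. A minimal sketch using a union-find structure follows; the links and citation counts are invented for illustration, and the deck does not say how links are resolved in practice.

```python
from collections import Counter

def find(parent, x):
    """Return the representative of x's cluster, with path compression."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def cluster_counts(same_as, citations):
    """Sum per-publisher citation counts over clusters defined by sameAs links."""
    parent = {}
    for a, b in same_as:
        parent[find(parent, a)] = find(parent, b)   # union the two clusters
    totals = Counter()
    for publisher, n in citations.items():
        totals[find(parent, publisher)] += n
    return totals

# Hypothetical sameAs links and citation counts, for illustration only.
SAME_AS = [
    ("Univ. Pr., Cambridge", "University Press, Cambridge [England]"),
    ("Univ. P., Cambridge", "Univ. Pr., Cambridge"),
]
CITATIONS = {
    "Univ. Pr., Cambridge": 10,
    "University Press, Cambridge [England]": 25,
    "Univ. P., Cambridge": 5,
    "University Press, Cambridge, Mass.": 7,
}

totals = cluster_counts(SAME_AS, CITATIONS)
print(totals.most_common())
```

The three linked spellings end up in one cluster, while the unlinked "Cambridge, Mass." entry stays separate.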
17 
Is the DH researcher happy? 
● Yes. All very nice... 
– ...but...? 
● Data are not 100% clean yet. 
● Can we rank publishers of books about “women in war”? 
The initial database problem 
needs to deal with uncertainty
18 
Uncertainty from ranking
[Figure: journal articles and cited books from Elsevier – Scopus (closed data) and OCLC – WorldCat (open data), aggregated into "cited" publishers. The cited books are ranked by the query about "women in war" and then combined with the triples below through joins and aggregations.]
    subject  predicate  object
    book1    sameAs     book9
    book7    publisher  publisher3
    book9    publisher  publisher5

    ranked cited books (subject): book1, book1, book2
20 
Uncertainty from ranking
[Figure: journal articles and cited books from Elsevier – Scopus (closed data) and OCLC – WorldCat (open data), aggregated into "cited" publishers. The cited books are ranked by the query about "women in war", so each result carries a probability; the joins and aggregations that follow become probabilistic joins and probabilistic aggregations. Button: Deploy search engine.]
    subject  predicate  object
    book1    sameAs     book9
    book7    publisher  publisher3
    book9    publisher  publisher5

    ranked cited books
    subject  prob
    book1    0.7
    book1    0.5
    book2    0.4
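What "probabilistic joins" and "probabilistic aggregations" could mean for the toy tables on this slide is sketched below. The rules used are assumptions chosen for illustration: joined tuples multiply their probabilities, and grouped tuples are combined as 1 - prod(1 - p), both under an independence assumption. The triples carry no probability on the slide, so 1.0 is assumed for them. This is not presented as the exact operator set of Spinque's engine.

```python
from collections import defaultdict

QUERY = "women in war"
# Ranked cited books with probabilities, as on the slide.
RANKED = [(QUERY, "book1", 0.7), (QUERY, "book1", 0.5), (QUERY, "book2", 0.4)]
# Triples from the slide, split by predicate; probability 1.0 is an assumption.
SAME_AS = [("book1", "book9", 1.0)]
PUBLISHER = [("book7", "publisher3", 1.0), ("book9", "publisher5", 1.0)]

def join(left, right):
    """Join on left.object == right.subject; probabilities multiply."""
    return [(s, o2, p * q)
            for (s, o, p) in left
            for (s2, o2, q) in right
            if o == s2]

def aggregate(rows):
    """Group by object and combine evidence as 1 - prod(1 - p)."""
    miss = defaultdict(lambda: 1.0)
    for _s, o, p in rows:
        miss[o] *= 1.0 - p
    return {o: 1.0 - m for o, m in miss.items()}

linked = join(RANKED, SAME_AS)           # query -> book9, via sameAs
by_publisher = join(linked, PUBLISHER)   # query -> publisher5
scores = aggregate(by_publisher)
print(scores)
```

With these assumptions only publisher5 receives a score, because book2 has no sameAs link and book7 is not among the ranked books.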
25 
More uncertainty from...
● Ranking
● Fuzzy matching
● Priors in data
In fact...
[Figure: journal articles and cited books from Elsevier – Scopus (closed data) and OCLC – WorldCat (open data), aggregated into "cited" publishers; the labels above mark where uncertainty enters the data flow.]
26 
Rank. Everything. Always. 
● Unstructured search: uncertainty is a first-class citizen
● Structured search: let's switch from "facts" to "evidence"
– Forcing uncertain evidence into "facts" risks corrupting data and search results
● Static data normalisation is good when it comes with high confidence
● Otherwise, evidence can be used at query time, depending on the context
– Strategy blocks contain code for a probabilistic database
● Based on Probabilistic Relational Algebra 
(Fuhr 1990, Rölleke et al. 2008) 
● Let's just call it "search", finally.
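The split between static normalisation and query-time evidence can be illustrated with a small sketch. The threshold of 0.95 and the candidate links with their confidences are invented for this example; the deck does not give concrete values.

```python
def split_links(candidates, threshold=0.95):
    """Separate candidate sameAs links into facts and evidence.

    Links at or above the (assumed) threshold are materialised as facts;
    the rest keep their probability for use at query time.
    """
    facts = [(a, b) for a, b, p in candidates if p >= threshold]
    evidence = [(a, b, p) for a, b, p in candidates if p < threshold]
    return facts, evidence

# Hypothetical candidate links with made-up confidences.
CANDIDATES = [
    ("Univ. Pr., Cambridge", "University Press, Cambridge", 0.99),
    ("University Press, Cambridge", "University Press, Cambridge, Mass.", 0.30),
]

facts, evidence = split_links(CANDIDATES)
print(facts)
print(evidence)
```

The low-confidence link is not discarded: it stays available as weighted evidence, so a query can decide in context how much to trust it.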
27 
Summary 
● The use case shown 
– benefits from LOD 
● data and results can be expanded / improved 
– benefits from Search by Strategy 
● probabilistic modelling of search scenarios 
● On-going effort in the COMSODE context 
– Open Data Node: good quality LOD 
– Search by Strategy: exploit uncertainty 
● Currently
– improving RDF support (e.g. vocabularies, inference)
– improving query-time resolution of data conflicts
Thank you 
www.spinque.com 
www.comsode.eu 
www.youropendata.eu

08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 

Search challenges for collections of book records

  • 1. Search challenges for collections of book records Roberto Cornacchia ECIR 2014 – Industry day Amsterdam, 16 April 2014 > design > publish > search!
  • 2. 2 Outline ● COMSODE (EU-FP7) – Publication platform for Linked Open Data ● Spinque – Search modelling ● A use-case from Digital Humanities – link, clean, search ● A step further – Rank. Everything. Always. – Query-time resolution of data conflicts
  • 3. 3 Unlocking the value of L(O)D... In the public sector In industry Source: Open Data 500 by The GovLab ...is a hot topic In science Source: Bradley Allen, SlideShare
  • 4. 4 The COMSODE project has received funding from the Seventh Framework Programme of the European Union in the grant agreement number 611358. COMSODE Unlock LOD value by improving publication www.comsode.eu
  • 5. 5 Spinque ● Spin-off of CWI Amsterdam (2009) ● Develops domain-tailored search technology – Applied to: ● IP, multimedia, cultural heritage, child-friendly, ... – Search by Strategy ● visual modelling of search processes – Rank. Everything. Always. ● integrated support for all-round probabilistic search ● Work in progress in COMSODE – Search Linked Data
  • 6. 6 A use case in Digital Humanities ● "Can We Rank Scholarly Book Publishers? A Bibliometric Experiment with the Field of History" (Zuccala et al., Journal of the American Society for Information Science and Technology, 2014) ● Goal: indicate publisher prestige quantitatively – bibliographic citations to books from journal articles. ● Dataset: Elsevier Scopus journal citations – Granted via the 2012 Elsevier Bibliometrics Research Program – 5.6M citations, 3M from journals to books – History & literature – Periods 1996-2000 and 2007-2011
  • 7. 7 Elsevier Scopus dataset
    citing_eid,cited_eid,source_title,source_id,article_pubyear,authors,article_title,volume,page_start,doctype
    4702,232311,"American Antiquity",40554,1996,"Graybill D. (6603866252);Michaelsen J. (7003483600);Neff H. (7005907495);Larson D. (7402633779);Ambos E. (14048059100)","Risk, climatic variability, and the study of southwestern prehistory: An evolutionary perspective",61,217,re
    4702,1333725,"American Antiquity",40554,1997,"Raab L. (6601955075);Larson D. (7402633779)","Medieval climatic anomaly and punctuated cultural evolution in coastal Southern California",62,319,ar
    4702,7613691,"American Antiquity",40554,1997,"Colten R. (8363369400);Arnold J. (8754215200);Pletka S. (25221793700)","Contexts of cultural change in insular California",62,300,ar
    4702,30302643,"Quarternary Science Reviews",26239,1996,"Stuiver M. (7007003882);Reimer P. (7103071876);Taylor R. (26030669400)","Development and extension of the calibration of the radiocarbon time scale: Archaeological applications",15,655,ar
    4702,30317536,"Canadian Journal of Earth Sciences",22031,1996,"Dyke A. (7003706220);McNeely R. (7004891098);Hooper J. (7102438470)","Marine reservoir corrections for bowhead whale radiocarbon age determinations",33,1628,ar
    4702,30739323,"Journal of Coastal Research",27374,1997,"Mason O. (7004241927);Hopkins D. (7202255075);Plug L. (7801522080)","Chronology and paleoclimate of storm-induced erosion and episodic dune growth across Cape Espenberg spit, Alaska, U.S.A.",13,770,ar
    7154,2287569,"American Sociological Review",16929,1997,"Goodwin J. (7402339411)","The libidinal constitution of a high-risk social movement: Affectual ties and solidarity in the Huk rebellion, 1946 to 1954",62,53,re
    7154,30495855,"Sociological Theory",18110,1996,"Emirbayer M. (23110549400)","Useful Durkheim",14,109,ar
    9412,9986565,"British Journal for the Philosophy of Science",19977,1997,"Eliasmith C. (6603720957);Thagard P. (6701846211)","Waves, Particles, and Explanatory Coherence",48,1,ar
    9412,30006171,Gastroenterology,28330,1996,"Hamlet A. (6701690210);Dalenbäck J. (7003418017);Fändriks L. (7005233384);Olbe L. (7006954993)","A mechanism by which Helicobacter pylori infection of the antrum contributes to the development of duodenal ulcer",110,1386,ar
    [Diagram: CSV files converted to RDF. Article "Power and community: The archaeology of slavery at the hermitage plantation" (Thomas B., American Antiquity; journal, history) cites book "MISSISSIPPIAN POLITICAL ECONOMY" (Muller J.). Years shown: 1998, 1997.]
  • 8. 8 Warm-up ● Load RDF data – (subject, predicate, object)
      publication1 cites publication2
      publication1 cites publication3
      publication3 publisher publisher5
    ● Most cited publications
      SELECT ?publication (COUNT(*) AS ?nCitations)
      WHERE {[] scopus:cites ?publication}
      GROUP BY ?publication
      ORDER BY DESC(?nCitations)
    ● Result (publication, nCitations)
      publication3 288
      publication5 223
      publication2 124
    ● No problem with SPARQL or SQL
  • 9. 9 Warm-up (same content as slide 8, with two annotations on the query) Predicate traversal Aggregation
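
The warm-up query combines a predicate traversal (follow every "cites" edge) with an aggregation (count per cited publication). A minimal Python sketch of the same two steps over in-memory triples is given here for illustration only. The toy triples, including publication4, and the function name most_cited are assumptions, not part of the talk or of the Scopus data.

```python
from collections import Counter

# Toy triples in the (subject, predicate, object) shape shown on the slide.
# publication4 is a hypothetical extra record added so the counts differ.
triples = [
    ("publication1", "cites", "publication2"),
    ("publication1", "cites", "publication3"),
    ("publication4", "cites", "publication3"),
    ("publication3", "publisher", "publisher5"),
]


def most_cited(triples):
    """Return (publication, nCitations) pairs, most cited first."""
    # Predicate traversal: keep only the objects of "cites" edges.
    cited = (obj for _, pred, obj in triples if pred == "cites")
    # Aggregation: count how often each publication is cited.
    return Counter(cited).most_common()


ranking = most_cited(triples)
for publication, n_citations in ranking:
    print(publication, n_citations)
```

As on the slide, nothing here needs more than a GROUP BY; the harder cases start once the data has to be linked and cleaned.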
  • 10. 10 Warm-up .. visually "Search by Strategy"
  • 11. 11 Warm-up .. visually "Search by Strategy" Elsevier data source Predicate traversal Aggregation Deploy REST API Data flow Deploy search engine
  • 12. 12 Back to the original goal: rank publishers Elsevier – Scopus (closed data) journal articles cited books “cited” publishers
  • 13. 13 Back to the original goal: rank publishers "cited" publishers journal articles cited books cited books aggregated "cited" publishers sameAs Elsevier – Scopus (closed data) OCLC - WorldCat (open data) “cited” publishers ● Open Data Node – Links books ● Search – Uses links – On-the-fly matching? Deploy search engine
  • 14. 14 Surprise.. – University Press,Cambridge [England] – University Press,Cambridge [etc.] – University Press,"Cambridge, Mass.," – University Press,"Cambridge, N.E." – University Press,"Cambridge, U.K." – University Press,"Cambridge, UK" – University Press,Cambridge [U.K.] – University Press [etc.],Cambridge – University Press [etc.],"Cambridge [Eng., etc.]" – University Press [etc.],Cambridge [etc.] – "University press [etc., etc.]","Cambridge," – University Pressf ats collnutz,Cambridge – University Press of Cambridge,"Boston, Mass." – University Press of Cambridge,"[Cambridge, Mass.]" – Univ. of Cambridge,Cambridge – Univ. P.,Cambridge – Univ. Pr,Cambridge – Univ. Pr.,Cambridge – Univ.Pr.,Cambridge – Univ. Pr.,Cambridge [u.a.] – Univ. Pr.,"Cambridge, U.S.A." – Univ. Pr.,Cambridge [usw.] 2588 variations (just for "Cambridge University Press"). Probably only 2 or 3 distinct entities in there.
  • 15. 15 De-duplicate publishers "cited" publishers Elsevier – Scopus (closed data) journal articles cited books cited books aggregated "cited" publishers sameAs OCLC - WorldCat (open data)
  • 16. 16 De-duplicate publishers ● Open Data Node – Links duplicates ● Search – Uses links – On-the-fly matching? "cited" publishers Elsevier – Scopus (closed data) journal articles cited books cited books aggregated "cited" publishers sameAs OCLC - WorldCat (open data) Deploy search engine sameAs
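
To make the de-duplication step concrete, the following is a rough Python sketch of one way to propose sameAs candidates among publisher name variants like those on slide 14. It is not the method used by Open Data Node or Spinque. The abbreviation table, the noise-word list and the function name normalise_name are assumptions chosen for this illustration.

```python
import re
from collections import defaultdict

# Hypothetical expansion table and noise words, chosen for this example only.
ABBREVIATIONS = {"univ": "university", "pr": "press", "p": "press"}
NOISE = {"etc"}


def normalise_name(name):
    """Lower-case, drop punctuation, expand abbreviations, remove noise words."""
    tokens = re.findall(r"[a-z]+", name.lower())
    expanded = [ABBREVIATIONS.get(token, token) for token in tokens]
    return " ".join(token for token in expanded if token not in NOISE)


# A few (publisher, place) pairs taken from the slide.
records = [
    ("University Press", "Cambridge [England]"),
    ("University Press [etc.]", "Cambridge"),
    ("Univ. Pr.", "Cambridge [u.a.]"),
    ("Univ.Pr.", "Cambridge"),
    ("Univ. P.", "Cambridge"),
    ("Univ. of Cambridge", "Cambridge"),
]

# Records sharing a normalised name become candidates for a sameAs link.
groups = defaultdict(list)
for name, place in records:
    groups[normalise_name(name)].append((name, place))

for key, members in groups.items():
    print(key, len(members))
```

The place column is deliberately left untouched: it is the evidence that separates, for example, Cambridge in England from Cambridge, Mass., so a grouping like this yields candidates rather than certain matches.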
  • 17. 17 Is the DH researcher happy? ● Yes. All very nice... – ...but...? ● Data are not 100% clean yet. ● Can we rank publishers of books about “women in war”? The initial database problem needs to deal with uncertainty
  • 18. 18 Uncertainty from ranking "cited" publishers journal articles cited books aggregated "cited" publishers Elsevier – Scopus (closed data) OCLC - WorldCat (open data) subject predicate object book1 sameAs book9 book7 publisher publisher3 book9 publisher publisher5 ranked cited books subject book1 book1 book2 about "women in war" joins aggregations
  • 19. 19 Uncertainty from ranking "cited" publishers journal articles cited books aggregated "cited" publishers Elsevier – Scopus (closed data) OCLC - WorldCat (open data) subject predicate object book1 sameAs book9 book7 publisher publisher3 book9 publisher publisher5 ranked cited books prob 0.7 0.5 0.4 subject book1 book1 book2 about "women in war" joins aggregations
  • 20. 20 Uncertainty from ranking "cited" publishers journal articles cited books aggregated "cited" publishers Elsevier – Scopus (closed data) OCLC - WorldCat (open data) Deploy search engine subject predicate object book1 sameAs book9 book7 publisher publisher3 book9 publisher publisher5 ranked cited books prob 0.7 0.5 0.4 subject book1 book1 book2 about "women in war" probabilistic joins probabilistic aggregations
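
The idea of probabilistic joins and aggregations can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Spinque's implementation: tuples are treated as independent events, a join multiplies probabilities, and an aggregation combines duplicates as a disjunction (1 minus the product of the complements). The book scores, the 0.9 confidence on the sameAs link, the publisher of book2 and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical relevance scores for books about "women in war".
ranked_books = [("book1", 0.7), ("book7", 0.5), ("book2", 0.4)]
# Hypothetical link and attribute tables: (key, value, probability).
same_as = [("book1", "book9", 0.9)]
publisher_of = [
    ("book7", "publisher3", 1.0),
    ("book9", "publisher5", 1.0),
    ("book2", "publisher3", 1.0),
]


def prob_join(left, right):
    """Join (key, p) with (key, value, q); independent events multiply."""
    return [
        (value, p * q)
        for key, p in left
        for right_key, value, q in right
        if key == right_key
    ]


def prob_aggregate(pairs):
    """Merge duplicate keys as a disjunction of independent events."""
    miss = defaultdict(lambda: 1.0)
    for key, p in pairs:
        miss[key] *= 1.0 - p
    return {key: 1.0 - m for key, m in miss.items()}


# Follow sameAs links so ranked books also reach their linked counterparts.
linked = ranked_books + prob_join(ranked_books, same_as)
publishers = prob_aggregate(prob_join(linked, publisher_of))
ranking = sorted(publishers.items(), key=lambda item: item[1], reverse=True)
print(ranking)
```

In this toy run publisher3 is reached through two moderately relevant books and publisher5 through one relevant book behind an uncertain link, so both the relevance scores and the link confidence shape the final order.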
  • 21. 21 More uncertainty from... "cited" publishers journal articles cited books aggregated "cited" publishers Elsevier – Scopus (closed data) OCLC - WorldCat (open data) cited books
  • 22. 22 More uncertainty from... "cited" publishers journal articles cited books aggregated "cited" publishers Elsevier – Scopus (closed data) OCLC - WorldCat (open data) cited books Ranking
  • 23. 23 More uncertainty from... "cited" publishers journal articles cited books aggregated "cited" publishers Elsevier – Scopus (closed data) OCLC - WorldCat (open data) cited books Ranking Fuzzy matching
  • 24. 24 More uncertainty from... "cited" publishers journal articles cited books aggregated "cited" publishers Elsevier – Scopus (closed data) OCLC - WorldCat (open data) cited books Priors in data Ranking Fuzzy matching
  • 25. 25 More uncertainty from... "cited" publishers journal articles cited books aggregated "cited" publishers Elsevier – Scopus (closed data) OCLC - WorldCat (open data) cited books Priors in data Ranking Fuzzy matching In fact...
  • 26. 26 Rank. Everything. Always. ● Unstructured search: uncertainty is a first-class citizen ● Structured search: let's switch from "facts" to "evidence" – Forcing uncertain evidence into “facts” risks corrupting data and search results ● Static data normalisation is good when it comes with high confidence ● Otherwise, evidence can be used at query-time, depending on the context – Strategy blocks contain code for a probabilistic DB ● Based on Probabilistic Relational Algebra (Fuhr 1990, Rölleke et al. 2008) ● Let's just call it "search", finally.
  • 27. 27 Summary ● The use case shown – benefits from LOD ● data and results can be expanded / improved – benefits from Search by Strategy ● probabilistic modelling of search scenarios ● On-going effort in the COMSODE context – Open Data Node: good quality LOD – Search by Strategy: exploit uncertainty ● Currently – improving RDF support (e.g. vocabularies, inference) – improving query-time resolution of data conflicts
  • 28. Thank you www.spinque.com www.comsode.eu www.youropendata.eu