Integration of research literature and data (InFoLiS)

Katarina Boland¹
Philipp Zumstein²
¹ GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
² Mannheim University Library, Mannheim, Germany
CNI 2015 Spring Membership Meeting
April 14th, 2015

The InFoLiS project:
Integration of research data and publications
InFoLiS I: 05/2011 - 05/2013
InFoLiS II: 08/2014 - 08/2016
InFoLiS is funded by the DFG (SU 647/2-1)
Introduction

Figure: Catalogue (publications: SSOAR (GESIS), Primo (UB MA), ...) and DataCatalogue (research data: da|ra (GESIS), ...), each answering queries with responses; links connect publications and research data.
InFoLiS Project Goals

1 Part I: Generation of Links
2 Part II: How can you reuse it?
Outline

Part 1: Generation of Links

Recommendation¹:
Creator (Publication Date): Title. Publication Agent. Identifier
Creator (Publication Date): Title. Version. Publication Agent. Type of Resource. Identifier.
→ Extraction based on these patterns?
¹ see http://auffinden-zitieren-dokumentieren.de/zitieren/empfohlene-datenzitation/
Citation of Research Data
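To see what extraction based on these patterns could look like, here is a minimal Python sketch for the first (shorter) form only. It assumes a four-digit year and that neither the title nor the publication agent contains a period followed by a space; the function name parse_citation and the example citation are hypothetical.

```python
import re
from typing import Optional

# Minimal citation form from the recommendation:
#   Creator (Publication Date): Title. Publication Agent. Identifier
# Assumptions: the date is a four-digit year, and neither the title nor the
# publication agent contains ". " (a period followed by a space).
CITATION = re.compile(
    r"^(?P<creator>.+?) \((?P<date>\d{4})\): "
    r"(?P<title>.+?)\. (?P<agent>.+?)\. (?P<identifier>\S+?)\.?$"
)


def parse_citation(text: str) -> Optional[dict]:
    """Return the citation elements as a dict, or None if the text does not fit."""
    match = CITATION.match(text.strip())
    return match.groupdict() if match else None


# Hypothetical example; these values do not refer to a real dataset.
print(parse_citation(
    "Doe, Jane (2012): Example Survey 2010. Example Data Archive. doi:10.1234/example"
))
# A reference written as running text does not fit the pattern.
print(parse_citation("data from the SOEP of the years 1990 and 2003 are used"))
```

The running-text example returns None: references embedded in prose do not follow the recommended form, so parsing formal citations alone is not enough. The extended form (with Version and Type of Resource) would also need its own pattern.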

presentation and discussion of the empirical findings. For this purpose, data
from the Socio-Economic Panel (SOEP) of the years 1990 and 2003 are used
and for both periods, the impact factors are estimated using linear regression
models.
Pattern: data from the <title> of the years <year> are used
References to Datasets

Table 1: Population forecast for Germany depending on age cohorts -
proportion in percent.
Data base: 10th Population Forecast of the Federal Statistical Office, variant 5.
Pattern: (Data base: <number> <title> of the <publication agent>, variant <variant>)

Consulted were furthermore ...
Pattern: Consulted were furthermore <title1>, <title2>, <title3>, ..., <titleN>.

Table 3: Sample of the surveys conducted in the years 2003 and 2004 as well
as size of the sample, with valid data from both surveys
(Source: Ditton et al. 2005a)
Pattern: (Source: <citation of descriptive publication>)

...are hard to detect!
See also:
Green, Toby (2009). We Need Publishing Standards for Datasets and Data Tables. OECD Publishing White Paper. doi:10.1787/603233448430
Altman, Micah and Gary King (2007). A Proposed Standard for the Scholarly Citation of Quantitative Data. In: D-Lib Magazine 13(3). URL: http://www.dlib.org/dlib/march07/altman/03altman.html

Automatic Identification of References
Why not simply search for study titles in publications?

References
“ALLBUS/GGSS 1996 (Allgemeine Bevölkerungsumfrage der Sozialwissenschaften/German General Social Survey 1996)”
“ALLBUS 96”

References
“Youth 2010”
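These examples suggest why a plain string search is not enough. A minimal sketch, assuming a hypothetical list of known titles: exact matching misses the abbreviated form ALLBUS 96, while a short, generic title such as Youth 2010 can match text that is not a study reference at all.

```python
from __future__ import annotations

# Hypothetical list of known study titles in their long, official form.
KNOWN_TITLES = [
    "ALLBUS/GGSS 1996 (Allgemeine Bevölkerungsumfrage der "
    "Sozialwissenschaften/German General Social Survey 1996)",
    "Youth 2010",
]


def naive_search(text: str, titles: list[str]) -> list[str]:
    """Return every known title that occurs verbatim in the text."""
    return [title for title in titles if title in text]


# The abbreviated citation is missed ...
print(naive_search("We use data from ALLBUS 96.", KNOWN_TITLES))
# ... while a generic phrase yields a hit that is not a study reference.
print(naive_search("Youth 2010 was a turbulent year for the club.", KNOWN_TITLES))
```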

How do humans recognize study references?
Source: Estimations based on SOEP, wave 2002.
Source: Estimations based on xyz, wave 2002.
General idea
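The general idea can be sketched in a few lines of Python: the known title (here SOEP) is replaced by a slot, the surrounding words become a pattern, and the pattern then finds other titles (here xyz) in the same position. This is a simplified illustration, not the algorithm of Boland et al. (2012); the three-word context window and the names induce_pattern and apply_pattern are assumptions.

```python
from __future__ import annotations

import re


def induce_pattern(sentence: str, seed: str, window: int = 3) -> re.Pattern:
    """Turn a sentence that mentions a known study (the seed) into a pattern.

    The seed is replaced by a named slot; up to `window` words before it and
    the first token after it are kept as literal context.
    """
    start = sentence.index(seed)  # raises ValueError if the seed is absent
    left_words = sentence[:start].split()[-window:]
    right_words = sentence[start + len(seed):].split()[:1]
    prefix = r"\s+".join(re.escape(word) for word in left_words)
    if prefix:
        prefix += r"\s+"
    suffix = r"\s*" + re.escape(right_words[0]) if right_words else r"$"
    return re.compile(prefix + r"(?P<title>.+?)" + suffix)


def apply_pattern(pattern: re.Pattern, text: str) -> list[str]:
    """Return every candidate study title the pattern finds in the text."""
    return [match.group("title") for match in pattern.finditer(text)]


pattern = induce_pattern("Source: Estimations based on SOEP, wave 2002.", "SOEP")
print(pattern.pattern)
print(apply_pattern(pattern, "Source: Estimations based on xyz, wave 2002."))
```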

Algorithm

For details, see:
Katarina Boland, Dominique Ritze, Kai Eckert & Brigitte Mathiak (2012). Identifying References to Datasets in Publications. In: Proceedings of the Second International Conference on Theory and Practice of Digital Libraries (TPDL), Lecture Notes in Computer Science Volume 7489, pp. 150-161. Berlin: Springer. doi:10.1007/978-3-642-33290-6_17
Reference Extraction

Mapping to Datasets in da|ra

Strategies: 1) greedy; 2) exact; 3) best
Mapping to Datasets in da|ra: granularity of registration vs. citation
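The slide names three mapping strategies without defining them. The sketch below shows one possible reading, which is our assumption rather than the project's definition: exact keeps datasets whose title equals the extracted title, greedy keeps every dataset whose title contains it, and best keeps the single most similar title according to difflib's SequenceMatcher. The catalogue entries are hypothetical, not real da|ra records.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Hypothetical catalogue entries (registered title -> dataset identifier).
CATALOGUE = {
    "ALLBUS 1996": "dataset:allbus-1996",
    "ALLBUS 1998": "dataset:allbus-1998",
    "ALLBUS 2000": "dataset:allbus-2000",
    "ALLBUScompact 2000": "dataset:allbuscompact-2000",
}


def map_exact(title: str, catalogue: dict[str, str]) -> list[str]:
    """Datasets whose registered title equals the extracted title."""
    return [ds for name, ds in catalogue.items() if name.lower() == title.lower()]


def map_greedy(title: str, catalogue: dict[str, str]) -> list[str]:
    """Every dataset whose registered title contains the extracted title."""
    return [ds for name, ds in catalogue.items() if title.lower() in name.lower()]


def map_best(title: str, catalogue: dict[str, str]) -> list[str]:
    """The single dataset whose registered title is most similar."""
    def similarity(name: str) -> float:
        return SequenceMatcher(None, title.lower(), name.lower()).ratio()
    return [catalogue[max(catalogue, key=similarity)]]


print(map_exact("ALLBUS", CATALOGUE))
print(map_greedy("ALLBUS", CATALOGUE))
print(map_best("ALLBUS 96", CATALOGUE))
```

Under these definitions, greedy maps a citation of ALLBUS to all four hypothetical entries while exact finds none, which illustrates the mismatch between the granularity of registration and of citation.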

Figure: ALLBUS datasets registered in da|ra, among them ALLBUS 1996, ALLBUS 1998, ALLBUS 2000, ALLBUS 2000 CAPI/PAPI, ALLBUScompact 2000 (CAPI/PAPI, CAPI), ALLBUS - Cumulation 1980-2006, ALLBUS - Cumulation 1980-2008, ALLBUScompact - Cumulation 1980-2010, ALLBUScompact, ...
→ use ontology
Mapping to Datasets in da|ra
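An ontology makes such relations between registered datasets explicit. A minimal sketch, assuming hypothetical part-of relations loosely derived from the dataset names above (the actual structure in da|ra may differ): a citation of a general study can be mapped to all of its registered parts, and a specific dataset can be related to the more general studies it belongs to.

```python
from __future__ import annotations

# Hypothetical part-of relations (child -> parent), loosely based on the
# ALLBUS dataset names on the slide; the real registration may differ.
PARENT = {
    "ALLBUS 1996": "ALLBUS",
    "ALLBUS 1998": "ALLBUS",
    "ALLBUS 2000": "ALLBUS",
    "ALLBUS 2000 CAPI/PAPI": "ALLBUS 2000",
    "ALLBUScompact 2000": "ALLBUS 2000",
    "ALLBUS - Cumulation 1980-2008": "ALLBUS",
}


def ancestors(node: str) -> list[str]:
    """Walk up: the more general studies a dataset belongs to."""
    result = []
    while node in PARENT:
        node = PARENT[node]
        result.append(node)
    return result


def descendants(node: str) -> list[str]:
    """Walk down: all registered parts of a (general) study, depth first."""
    result = []
    for child, parent in PARENT.items():
        if parent == node:
            result.append(child)
            result.extend(descendants(child))
    return result


def candidates_for_citation(cited: str) -> list[str]:
    """Datasets a citation may refer to: the cited node plus its registered parts."""
    return [cited] + descendants(cited)


print(ancestors("ALLBUScompact 2000"))
print(candidates_for_citation("ALLBUS 2000"))
```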

Vocabulary: e.g. DDI-RDF Discovery Vocabulary²
² Thomas Bosch, Richard Cyganiak, Arofan Gregory, Joachim Wackerow (2013): DDI-RDF Discovery Vocabulary: A Metadata Vocabulary for Documenting Research and Survey Data. In: Proceedings of the 6th Linked Data on the Web (LDOW) Workshop at the 22nd International World Wide Web Conference (WWW). CEUR Workshop Proceedings, pp. 46-55
Ontology: Approach

Links

Example: da|ra
Example: SSOAR
Integration of Links into Information Systems

Thank you for your attention!
katarina.boland@gesis.org
Next part: How can you reuse it?

Internals, Data Structure, Technology

(Internal) Data structure
Document
Pattern
Execution of Algorithm
Study Title
Study URI
• Which studies are found in a document?
• How was a pattern derived?
• Which other study titles are found with the new configuration of the algorithm?
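One way to model these entities and answer the three questions is sketched below with Python dataclasses. The field names, the relations between the classes, and the identifiers in the usage part are assumptions for illustration; the slide shows only the entity names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    uri: str


@dataclass
class Execution:
    """One run of the algorithm with a given configuration (hypothetical fields)."""
    algorithm: str
    configuration: dict
    documents: list[Document] = field(default_factory=list)
    # Excluded from repr/compare to avoid recursion through back-references.
    found_titles: list[StudyTitle] = field(default_factory=list, repr=False, compare=False)


@dataclass
class Pattern:
    regex: str
    derived_by: Execution = field(repr=False, compare=False)  # how was it derived?


@dataclass
class StudyTitle:
    text: str
    document: Document
    pattern: Pattern
    study_uri: Optional[str] = None  # set once the title is linked to a study


def studies_in(document: Document, executions: list[Execution]) -> set[str]:
    """Which studies are found in a document?"""
    return {t.study_uri for e in executions for t in e.found_titles
            if t.document == document and t.study_uri is not None}


def new_titles(old: Execution, new: Execution) -> set[str]:
    """Which other study titles are found with the new configuration?"""
    return {t.text for t in new.found_titles} - {t.text for t in old.found_titles}


# Usage with hypothetical identifiers.
doc = Document("doc:1")
run1 = Execution("pattern-based extraction", {"seeds": ["SOEP"]}, [doc])
pattern = Pattern(r"based on (?P<title>.+?),", derived_by=run1)
run1.found_titles.append(StudyTitle("SOEP", doc, pattern, "study:soep"))
run2 = Execution("pattern-based extraction", {"seeds": ["SOEP", "ALLBUS"]}, [doc])
run2.found_titles += [
    StudyTitle("SOEP", doc, pattern, "study:soep"),
    StudyTitle("ALLBUS 96", doc, pattern, "study:allbus-1996"),
]
print(studies_in(doc, [run1, run2]))
print(new_titles(run1, run2))
print(pattern.derived_by.configuration)
```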

RESTful API (web services)
• GET, POST, PUT, DELETE, PATCH resources
• Search, perform algorithms, upload files
• Open for integration into other workflows, e.g. in
  • resource discovery systems
  • research data catalogues
  • digital repositories
• Possible to orchestrate over a web interface for individual use
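The following sketch shows how a client might address such a service with Python's standard library. The requests are only built, not sent; the base URL, the resource names (executions) and the payload fields are hypothetical, since the slide names only the HTTP methods and the kinds of operations.

```python
import json
from typing import Optional
from urllib.request import Request

# Hypothetical base URL; the slides do not name the service endpoints.
BASE_URL = "http://localhost:8080/api"


def build_request(method: str, resource: str, payload: Optional[dict] = None) -> Request:
    """Build (but do not send) an HTTP request for a resource of the web service."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(f"{BASE_URL}/{resource}", data=data, headers=headers, method=method)


# Hypothetical resource names and payload fields.
start = build_request("POST", "executions",
                      {"algorithm": "extract-study-titles", "inputFiles": ["files/1"]})
status = build_request("GET", "executions/1")
update = build_request("PATCH", "executions/1", {"status": "cancelled"})
cleanup = build_request("DELETE", "executions/1")

# Sending requires a running service, e.g.:
# with urllib.request.urlopen(start) as response:
#     print(json.load(response))
```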

Lookup services
DB (links)
• lookup service: publication URI → study URI(s)
• reverse lookup service: study URI → publication URI
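Both lookup services can be answered from the same set of stored links by keeping one index per direction. A minimal sketch with hypothetical URIs; the function names lookup and reverse_lookup are assumptions.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical link records (publication URI, study URI) from the links DB.
LINKS = [
    ("pub:1", "study:allbus-2010"),
    ("pub:1", "study:soep"),
    ("pub:2", "study:allbus-2010"),
]


def build_indexes(links):
    """Build one index per direction from the same link records."""
    forward, reverse = defaultdict(set), defaultdict(set)
    for publication, study in links:
        forward[publication].add(study)
        reverse[study].add(publication)
    return forward, reverse


FORWARD, REVERSE = build_indexes(LINKS)


def lookup(publication_uri: str) -> list[str]:
    """Publication URI -> study URIs (sorted, empty if unknown)."""
    return sorted(FORWARD.get(publication_uri, ()))


def reverse_lookup(study_uri: str) -> list[str]:
    """Study URI -> publication URIs (sorted, empty if unknown)."""
    return sorted(REVERSE.get(study_uri, ()))


print(lookup("pub:1"))
print(reverse_lookup("study:allbus-2010"))
```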

Extraction of study URIs from a PDF
pdf (fulltext) → pdf2txt → txt (fulltext) → extract study titles (using the DB (patterns)) → study titles → linking → study URI
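The pipeline can be sketched as a chain of small functions. This sketch starts from already extracted text (the pdf2txt step is not reproduced), and both the patterns and the title-to-URI table are hypothetical stand-ins for the databases in the diagram.

```python
from __future__ import annotations

import re

# Stand-in for the pattern DB (hypothetical patterns with a "title" slot).
PATTERNS = [
    re.compile(r"based on (?P<title>[^,.]+?), wave"),
    re.compile(r"data from the (?P<title>[^,.]+?) of the years"),
]
# Stand-in for the linking step: known titles mapped to hypothetical study URIs.
STUDY_URIS = {"SOEP": "study:soep", "ALLBUS": "study:allbus"}


def extract_study_titles(text: str, patterns: list[re.Pattern]) -> list[str]:
    """Apply every pattern and collect distinct titles in order of discovery."""
    titles: list[str] = []
    for pattern in patterns:
        for match in pattern.finditer(text):
            title = match.group("title").strip()
            if title not in titles:
                titles.append(title)
    return titles


def link(titles: list[str], uris: dict[str, str]) -> dict[str, str]:
    """Keep only titles that can be linked to a study URI."""
    return {title: uris[title] for title in titles if title in uris}


def study_uris_from_text(text: str) -> dict[str, str]:
    return link(extract_study_titles(text, PATTERNS), STUDY_URIS)


SAMPLE = ("Source: Estimations based on SOEP, wave 2002. "
          "Further, data from the ALLBUS of the years 1996 and 2000 are used. "
          "Estimations based on xyz, wave 3.")
print(extract_study_titles(SAMPLE, PATTERNS))
print(study_uris_from_text(SAMPLE))
```

The title xyz is extracted but not linked, because it has no entry in the (hypothetical) URI table.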

Recognizing patterns
pdfs (fulltext) + seed → pattern recognizer → DB (pattern)
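Starting from a seed and a set of full texts, pattern recognition can be sketched as a bootstrapping loop that alternates between learning patterns from known titles and applying them to find new ones. This is a simplified illustration under our own assumptions (three words of left context, one token of right context, no filtering of unreliable patterns), not the project's implementation; the example texts are made up.

```python
from __future__ import annotations

import re


def contexts(text: str, title: str, window: int = 3):
    """Yield (left words, right token) for every occurrence of a known title."""
    for match in re.finditer(re.escape(title), text):
        left = text[:match.start()].split()[-window:]
        right = text[match.end():].split()[:1]
        if left and right:
            yield tuple(left), right[0]


def to_regex(left: tuple[str, ...], right: str) -> re.Pattern:
    """Turn a context into a regex with a short title slot between both parts."""
    prefix = r"\s+".join(re.escape(word) for word in left)
    return re.compile(prefix + r"\s+(?P<title>[^,.;]{1,40}?)\s*" + re.escape(right))


def bootstrap(texts: list[str], seed: str, iterations: int = 2):
    """Alternate between learning patterns and finding new titles."""
    titles = {seed}
    patterns: set = set()
    for _ in range(iterations):
        # 1) learn patterns from the contexts of all titles known so far
        for text in texts:
            for title in list(titles):
                patterns.update(contexts(text, title))
        # 2) apply all patterns to find (possibly new) titles
        for text in texts:
            for left, right in patterns:
                for match in to_regex(left, right).finditer(text):
                    titles.add(match.group("title").strip())
    return titles, patterns


TEXTS = [
    "Source: Estimations based on SOEP, wave 2002.",
    "Source: Estimations based on ALLBUS, wave 1996.",
    "We use data from the ALLBUS of the years 1996 and 2000.",
    "We use data from the EVS of the years 1990 and 1999.",
]
titles, patterns = bootstrap(TEXTS, seed="SOEP")
print(sorted(titles))
```

Starting from the seed SOEP, the first pass learns the "Estimations based on ... ," pattern and finds ALLBUS; the second pass learns a new pattern from the ALLBUS context and finds EVS. Without some scoring of patterns, such a loop can quickly pick up wrong titles.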

Integration of publications and research data

Quoting the Horizon Report 2014
“Visionary leadership for research data management
models is also required to determine how to best
incorporate data connections into library catalogs” (NMC
Horizon Report 2014 - Library Edition, p. 7)

Current situation: Several steps needed
• Common situation today:
  • Search online catalogue
  • Evaluate search results
  • Find the full text of the relevant source
  • Read the publication
  • Spot the research data
• Moreover, the reverse information is often missing completely:
  • Which publications are built on some specific research data?

Client side
Load additional data into the catalogue view (e.g. via Ajax)
• enriched view, links
• up-to-date data
• embed data in the web presentation
Server side
Add additional data to your catalogue database (e.g. Primo enrichment process)
• enriched view, links, search, sort, filter
• time-lagged because of the update mechanism
• Do the data fit into the existing infrastructure? (fields, tables, database)
Two Approaches
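For the server-side approach, enrichment amounts to merging the link data into each catalogue record before indexing, so that the new fields can be used for search, sorting and filtering. A minimal sketch, assuming records are plain dictionaries; the field names research_data and research_data_count and the identifiers are hypothetical.

```python
from __future__ import annotations

# Hypothetical catalogue records and link data; field names are assumptions.
RECORDS = [
    {"id": "pub:1", "title": "Publication A"},
    {"id": "pub:2", "title": "Publication B"},
]
LINKS = {"pub:1": ["study:allbus-2010", "study:soep"]}


def enrich(records: list[dict], links: dict[str, list[str]]) -> list[dict]:
    """Return copies of the records with added research-data fields."""
    enriched = []
    for record in records:
        studies = links.get(record["id"], [])
        enriched.append({**record,
                         "research_data": list(studies),
                         "research_data_count": len(studies)})
    return enriched


for record in enrich(RECORDS, LINKS):
    print(record["id"], record["research_data_count"])
```

Because enrichment runs as a batch step, the added fields are only as current as the last update run, which matches the time lag noted for the server-side approach.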

Integration as links
• Link from the catalogue entry to the corresponding research data

Integration as popup
Cited research data: 2
• ALLBUS 2010 (used in 512 publications)
• part of ALLBUS (used in 13,456 publications)
• own research data (used in 1 publication)

Integration in search/sort
Cited data sets 4
Cited data sets 1
Sort by data citation

Integration in search/filter
Research data available
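Once each record carries the number of cited datasets, sorting by data citation and filtering on "Research data available" are simple operations. A sketch, assuming a hypothetical field cited_datasets on plain dictionary records.

```python
from __future__ import annotations

# Hypothetical records; cited_datasets counts the linked research data.
RECORDS = [
    {"id": "pub:1", "title": "Publication A", "cited_datasets": 4},
    {"id": "pub:2", "title": "Publication B", "cited_datasets": 0},
    {"id": "pub:3", "title": "Publication C", "cited_datasets": 1},
]


def sort_by_data_citation(records: list[dict]) -> list[dict]:
    """Records citing the most datasets first (stable for ties)."""
    return sorted(records, key=lambda r: r["cited_datasets"], reverse=True)


def filter_research_data_available(records: list[dict]) -> list[dict]:
    """Keep only records that cite at least one dataset."""
    return [r for r in records if r["cited_datasets"] > 0]


print([r["id"] for r in sort_by_data_citation(RECORDS)])
print([r["id"] for r in filter_research_data_available(RECORDS)])
```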

Enrich your research data catalogue
Cited in: Ritze, D., Paulheim, H., & Eckert, K. (2013). Evaluation Measures for Ontology Matchers in Supervised Matching Scenarios. In The Semantic Web – ISWC 2013 (pp. 392–407).
Tags from Publication: Supervised Ontology Matching, Evaluation, Recall, Precision, F-Measure, Precision@N-Curves, ROC-Curves, Precision-Recall-Curves

Current Goals of the Project
1. Expansion to other disciplines and languages
2. Linked data based infrastructure
3. Improve the reusability of generated links

Dissemination
• Our web services will be open to everyone
• Project webpage
  • http://infolis.github.io/
  • background information, slides, publications, news
• Additionally, our code is open source
  • https://github.com/infolis
  • you can install/try out everything locally
  • development of the code

Questions, Discussions, Feedback
• Questions?
• Discussions
• Give us feedback
  • Small online survey: http://t1p.de/infolis
    (http://wiki.bib.uni-mannheim.de/limesurvey/index.php?sid=55594)
