2015-06-10 InFoLiS II – Making data citations a reality
InFoLiS II
Making data citations a reality
Stockholm, 2015-06-10
Dominique Ritze, Konstantin Baierer
Infolink: Detecting dataset patterns
1) Search for study term in text
2) Deduce pattern from context of term
3) Apply pattern to learn new study term
4) GOTO 1)
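The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the Infolink implementation: the function names, the choice of "one preceding token" as context, and the assumed shape of a study name (`TERM`) are all hypothetical. The sample strings are the ones shown on the Reference Extraction slide.

```python
import re

# Assumption: a study name starts with a capital letter (toy heuristic).
TERM = r"([A-Z][\w-]+)"

TEXTS = [
    "(Datenbasis: ALLBUS, SOEP, ZUMA-Standarddemografie, 1976–2002)",
    "(Datenbasis: SOEP, Jugendliche der Befragungsjahre 2000 bis 2003)",
    "(Datenbasis: ALLBUS, Eurobarometer 2007)",
]

def learn_patterns(texts, terms):
    """Steps 1 and 2: find known terms and turn their left context into patterns."""
    patterns = set()
    for text in texts:
        for term in terms:
            for match in re.finditer(re.escape(term), text):
                # Context = the token (plus whitespace) right before the term.
                context = re.search(r"\S+\s*$", text[:match.start()])
                if context:
                    patterns.add(re.escape(context.group()) + TERM)
    return patterns

def apply_patterns(texts, patterns):
    """Step 3: apply the learned patterns to collect candidate study terms."""
    found = set()
    for text in texts:
        for pattern in patterns:
            found.update(re.findall(pattern, text))
    return found

def bootstrap(texts, seeds, max_rounds=10):
    """Step 4: repeat until no new term is learned (or max_rounds is hit)."""
    terms = set(seeds)
    for _ in range(max_rounds):
        new_terms = apply_patterns(texts, learn_patterns(texts, terms)) - terms
        if not new_terms:
            break
        terms |= new_terms
    return terms

if __name__ == "__main__":
    print(sorted(bootstrap(TEXTS, {"ALLBUS"})))
```

Starting from the single seed "ALLBUS", the sketch learns further names from shared contexts. It also learns weak patterns such as "whatever follows a known term and a comma", which is one reason the choice of seeds and the reliability of patterns matter.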
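Reference Extraction
Example contexts (German; "Datenbasis" = "data basis"):
(Datenbasis: ALLBUS, SOEP, ZUMA-Standarddemografie, 1976–2002)
(Datenbasis: SOEP, Jugendliche der Befragungsjahre 2000 bis 2003) [adolescents of the survey years 2000 to 2003]
(Datenbasis: ALLBUS, Eurobarometer 2007)
Learned pattern: .*(Datenbasis: ,.*)
Output: study1, study2, passed on to Link Generation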
Infrastructure
Internal API: Text Extraction, Pattern Learning, Reference Extraction, Link Generation, File Storage
Public API: JSON-LD ↔ RDF, REST API, Simple HTTP API, Resource Storage
Clients: Command Line, Indexing System, Linked Data Agent, Browser Plugin, API Playground
Access: HTTP, serialized as JSON, JSON-LD, Turtle (native) or RDF/XML
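One way a client could ask for a particular serialization is HTTP content negotiation. The sketch below only builds the request and sends nothing. That the public API selects the format through the `Accept` header is an assumption, and the URL and the function name `build_request` are hypothetical. The media types are the registered ones for each format.

```python
from urllib.request import Request

# Registered media types for the serializations named on the slide.
MEDIA_TYPES = {
    "json": "application/json",
    "json-ld": "application/ld+json",
    "turtle": "text/turtle",
    "rdf/xml": "application/rdf+xml",
}

def build_request(url, fmt):
    """Build (but do not send) a GET request asking for one serialization."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unknown format: {fmt!r}")
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]})

if __name__ == "__main__":
    # Hypothetical resource URL, used for illustration only.
    request = build_request("http://example.org/resource/1", "turtle")
    print(request.get_method(), request.full_url, request.get_header("Accept"))
```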
Integration
Transformation: link1, link2 → compatible format
Integration into target systems
Diagram labels: OAI-PMH, Primo Enrichments, DDI, DC
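As an illustration of the transformation step, the following sketch turns one publication-to-dataset link into a small Dublin Core record of the kind an OAI-PMH endpoint can serve. Mapping the link to `dc:relation` is an assumption made for this example, and the function name `link_to_oai_dc` is hypothetical. The namespaces are the standard Dublin Core and `oai_dc` ones.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC)
ET.register_namespace("oai_dc", OAI_DC)

def link_to_oai_dc(publication_urn, dataset_doi):
    """Serialize one link as an oai_dc record (assumed mapping: dc:relation)."""
    record = ET.Element(f"{{{OAI_DC}}}dc")
    ET.SubElement(record, f"{{{DC}}}identifier").text = publication_urn
    ET.SubElement(record, f"{{{DC}}}relation").text = "https://doi.org/" + dataset_doi
    return ET.tostring(record, encoding="unicode")

if __name__ == "__main__":
    print(link_to_oai_dc("urn:nbn:de:0168-ssoar-206773", "10.5684/soep.v27.2"))
```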
Link Generation
Reference linking: study1, study2 → link1, link2
Example link:
|urn:nbn:de:0168-ssoar-206773|Publication|URN|Sozio-oekonomisches Panel (SOEP)|SOEP|10.5684/soep.v27.2|Study|DOI|0.8|LitStudy automatic
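A pipe-delimited link record like the one on this slide can be read into a typed structure as sketched below. The slide does not label the columns, so every field name here is a guess made for illustration. Only the values come from the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    # All field names are assumptions; the slide shows values only.
    source_id: str
    source_type: str
    source_id_scheme: str
    target_name: str
    reference_term: str
    target_id: str
    target_type: str
    target_id_scheme: str
    confidence: float
    method: str

RECORD = (
    "|urn:nbn:de:0168-ssoar-206773|Publication|URN|"
    "Sozio-oekonomisches Panel (SOEP)|SOEP|10.5684/soep.v27.2|"
    "Study|DOI|0.8|LitStudy automatic"
)

def parse_link(line):
    """Split one record on '|' and convert the confidence to a float."""
    fields = line.strip().strip("|").split("|")
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    *head, confidence, method = fields
    return Link(*head, float(confidence), method)

if __name__ == "__main__":
    print(parse_link(RECORD))
```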
Challenge – Granularity
Example: which data does “ALLBUS 2000” refer to?
Solution:
– Build on DDI to describe parts and sets of research data
– Use contextual information for resolving
– Keep provenance
Challenge – Provenance
Example link:
|urn:nbn:de:0168-ssoar-206773|Publication|URN|Sozio-oekonomisches Panel (SOEP)|SOEP|10.5684/soep.v27.2|Study|DOI|0.8|LitStudy automatic
Solution:
– Retain configuration of algorithms (esp. learning and matching)
– Immutable Linked Data resources with resolvable URIs
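The two points above can be combined in a small sketch: an immutable record that stores the algorithm configuration and gets a URI derived from its content, so the same configuration always yields the same resource. The class name, the field names and the `http://example.org/` base URI are hypothetical, and content hashing is one possible design, not necessarily the one used by the project.

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class Execution:
    uri: str
    algorithm: str
    config_json: str  # configuration kept as canonical JSON text

def record_execution(algorithm, config, base_uri="http://example.org/execution/"):
    """Create an immutable record whose URI depends only on its content."""
    config_json = json.dumps(config, sort_keys=True)
    digest = hashlib.sha256(f"{algorithm}\n{config_json}".encode("utf-8")).hexdigest()
    return Execution(base_uri + digest[:16], algorithm, config_json)

if __name__ == "__main__":
    run = record_execution("pattern-learning", {"seeds": ["ALLBUS"], "threshold": 0.8})
    print(run.uri)
    print(run.config_json)
```

A link can then point to the URI of the run that produced it, which keeps the 0.8 confidence value in the example traceable to a specific configuration.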
Challenge – Broaden scope
● Support more languages and fields
– English language
– Economic Research
● Subtle differences
– Punctuation
– Capitalization
– Footnotes / Endnotes
– Finding the right seeds
● Solution:
– Tweak algorithms and configurations
– Communicate results and repeat
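Some of the subtle differences listed above (punctuation, capitalization) can be reduced by normalizing text before patterns are learned or applied. The sketch below is one hypothetical normalization step and does not handle footnotes or endnotes. The function name and the exact set of replaced characters are assumptions.

```python
import re
import unicodedata

def normalize(text):
    """Unify dashes, quotation marks, whitespace and case before matching."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"[\u2010-\u2015]", "-", text)   # hyphen and dash variants
    text = re.sub(r"[“”„«»]", '"', text)           # typographic quotation marks
    text = re.sub(r"\s+", " ", text).strip()       # collapse whitespace
    return text.casefold()                         # case-insensitive comparison

if __name__ == "__main__":
    print(normalize("„ALLBUS  2000“"))
    print(normalize("Allbus 1976–2002"))
```

Lowercasing everything is a trade-off: it makes "ALLBUS" and "Allbus" comparable, but it also removes capitalization as a signal for spotting study names, so a real pipeline might normalize only the context and not the candidate term.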
Challenge – Integration
● Integration into other systems
● Keeping data up to date
● Provenance
Next steps (Q3 / Q4 2015)
● Stress test the API http://infolis.gesis.org
● Integrate full set of ICPSR documents
● Integrate data from partner institutions and companies
● Develop reliability-based bootstrapping algorithm further
● Test workflows with research prototype http://www.bib.uni-mannheim.de/vufind/
● Develop browser plugins/JS libraries to make use of Infolink
Thank you for your attention!
Questions?
Keep in touch:
http://infolis.github.io
All software is open source:
http://github.com/infolis