(Big) Bibliographic Data
UB Leipzig & SLUB Dresden
ScaDS project meeting, 12.6.2015
Leander Seige, Felix Lohmeier, Ralf Talkenberger
“The library of the
21st century
is a data hub.”
quoted from an internal strategic paper of
Leipzig University Library, 2015
simple bibliographic metadata
<metadata>
title
author
isbn
publisher
year
…
<resource>
books
serials
newspapers
articles
...
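A minimal sketch of how such a simple record could be modelled in code. The field names come from the slide; the class name `BibRecord`, the types, and the `resource_type` default are assumptions made for illustration, not the schema used by finc or d:swarm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BibRecord:
    """One bibliographic record; field names follow the slide, types are assumed."""
    title: str
    author: Optional[str] = None
    isbn: Optional[str] = None
    publisher: Optional[str] = None
    year: Optional[int] = None
    resource_type: str = "book"  # e.g. book, serial, newspaper, article


# Hypothetical example record; fields that are unknown simply stay None.
record = BibRecord(title="Example Title", author="Doe, Jane", year=2015)
print(record)
```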
<resource> book
● printed books in the library’s shelves
● bought ebooks
● licensed ebooks
● pay-per-use ebooks
● free content
● ebooks to be bought by the library (patron driven acquisition = pda)
● even printed books to be bought by the library (pda too)
<resource> journals
● printed journals in the library’s shelves
● many more licensed electronic journals
○ full text accessible via web interfaces
● do we have article metadata?
● yes: licensed journal articles: tens of millions per library
<metadata> accessibility information
● where is a resource? (physical or on the net)
● who is allowed to access this content? (students? faculty? everyone?)
● is it available off-campus?
● did we buy it or is it just licensed?
● may the user copy or print it?
● is the library allowed to store the electronic file?
● may we grant access from wifi connections?
● ...or any combination of these...
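The questions above can be read as flags on each resource that combine into one access decision. The following is only a sketch under that assumption: `AccessPolicy`, its field names, and `can_access` are hypothetical and do not describe how any library system actually stores licence terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical per-resource access flags derived from the questions on the slide."""
    allowed_groups: frozenset   # e.g. {"students", "faculty"}; empty set means everyone
    off_campus: bool            # available off-campus?
    owned: bool                 # True = bought, False = licensed only
    may_copy_or_print: bool
    may_archive_file: bool      # may the library store the electronic file?
    wifi_allowed: bool


def can_access(policy, group, on_campus, via_wifi):
    """Combine the flags: every applicable restriction must be satisfied."""
    if policy.allowed_groups and group not in policy.allowed_groups:
        return False
    if not on_campus and not policy.off_campus:
        return False
    if via_wifi and not policy.wifi_allowed:
        return False
    return True


# A licensed ebook for students only, on campus only, no wifi access.
licensed_ebook = AccessPolicy(
    allowed_groups=frozenset({"students"}),
    off_campus=False,
    owned=False,
    may_copy_or_print=True,
    may_archive_file=False,
    wifi_allowed=False,
)
print(can_access(licensed_ebook, "students", on_campus=True, via_wifi=False))
```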
<metadata> knowledge bases
● librarians built large knowledge bases to describe resources
● in German-speaking countries: GND (Gemeinsame Normdatei) of the
Deutsche Nationalbibliothek (German National Library) http://www.dnb.de/EN/gnd
● international: http://viaf.org
● provide DBpedia links to explore the linked data cloud and to enrich
library data
<metadata> knowledge bases
● GND (and other national authority files via VIAF)
○ describe Persons, Corporate bodies, Conferences and Events,
Geographic Information, Topics, Works and relationships
between them
○ form a generic knowledge base, independent from any specific
domain
○ provide links to other knowledge bases (DBpedia, GeoNames...)
resource discovery
● traditional “OPACs” provided access to traditional library resources like
printed books; users had to use proprietary, vendor-driven portals to
access electronic resources
● today, printed materials represent only a small part of library resources
● in contrast: resource discovery systems aim to integrate all
resources of a library and present them in one single search
interface
Cooperation
● UBL and SLUB joined forces in March 2015
● Goals:
a. Exchange of metadata after processing
b. Develop common workflows to avoid “double work”
→ integrate existing tools finc & d:swarm
finc Community
● maintains a large search engine infrastructure
● developed and hosted at Leipzig University Library
● based on Apache Solr and VuFind
● rugged metadata management system,
processing millions of data records each day
● integrates more than 50 data sources
https://finc.info
finc Community
● provides more than 15 university libraries with
resource discovery systems
● offers great potential to design and implement user oriented
functions on real world systems, serving thousands of library
users in Saxony and beyond, every day
● employs the aggregated index at Leipzig University Library
aggregated index at Leipzig University Library
● 10% physical items
● 90% electronic content on the net
aggregated index at
Leipzig University Library
● 12 million traditional data records (growing)
● 80 million electronic article data records (growing)
● each record contains 20 data fields
→ about 1.8 billion triples
(if you triplify it, without any enrichment data)
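The triple count follows from the record counts and the field count on the slide. As a rough sketch, assuming that "triplifying" means emitting one (subject, predicate, object) statement per field of a flat record (the function `triplify` and the sample record are illustrative only):

```python
def triplify(record_id, record):
    """Turn a flat record (field -> value) into (subject, predicate, object) triples."""
    return [(record_id, field, value) for field, value in record.items()]


# Figures taken from the slide.
traditional_records = 12_000_000
article_records = 80_000_000
fields_per_record = 20

estimated_triples = (traditional_records + article_records) * fields_per_record
print(f"{estimated_triples:,}")  # 1,840,000,000, i.e. roughly 1.8 billion

# Hypothetical two-field record, just to show the shape of the output.
example = triplify("rec-1", {"title": "Example Title", "year": "2015"})
print(example)
```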
Data processing today
● distributed data storage
○ 2 Solr instances in Leipzig
(~12 million + ~80 million records)
○ 2 Solr instances in Dresden
(~2 million + ~2 million records)
● constraint: each data source is
handled separately
→ difficult to build up relations
and deep data integration
d:swarm
● yet another tool…?
a. property graph database
b. gui for library staff
Tools: finc and d:swarm
● focus
○ finc: data normalization
○ d:swarm: data integration and enrichment
● technology
○ finc: script-based transformations (Python, Go, Elasticsearch)
○ d:swarm: encapsulates Metafacture (open source toolchain for metadata
transformation); property graph (Neo4j)
● status
○ finc: works fine with ~100 million records (less than one day)
○ d:swarm: scalability issues (~4 million records in less than one day)
integrating finc with d:swarm
● enhance data processing regarding
○ authority data linking (NLP)
○ fuzzy deduplication
○ classification
○ relate bibliographic data to places, topics, abstract terms
○ publish machine readable data (linked data)
● create user interfaces to enable system librarians to control metadata
processing
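To make "fuzzy deduplication" concrete, here is a small sketch that compares normalized titles with Python's standard-library `difflib.SequenceMatcher`. This is a stand-in technique chosen for illustration, not the method used in finc or d:swarm; the threshold of 0.9, the function names, and the sample records are assumptions.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def is_probable_duplicate(a, b, threshold=0.9):
    """Treat two records as duplicates if titles are very similar and years match."""
    ratio = SequenceMatcher(None, normalize(a["title"]), normalize(b["title"])).ratio()
    return ratio >= threshold and a.get("year") == b.get("year")


# Hypothetical records: r1 and r2 differ only in case and a trailing period.
r1 = {"title": "Introduction to Information Retrieval", "year": 2008}
r2 = {"title": "Introduction to information retrieval.", "year": 2008}
r3 = {"title": "Modern Information Retrieval", "year": 2008}

print(is_probable_duplicate(r1, r2))
print(is_probable_duplicate(r1, r3))
```

Pairwise comparison like this is quadratic in the number of records, so at the scale described here it would need some form of blocking or indexing first.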
Tomorrow: common workflows
● All data flows through both tools (finc + d:swarm)
● Deduplication (duplicates are easier to recognize in a graph database)
● FRBRization (aggregate different physical and formal versions of a
work)
● Knowledge graph makes enrichment (authorities, altmetrics data,
usage data, …) and analytics easier
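FRBRization, as described above, groups the different versions of a work under one entry. A minimal sketch, assuming a work can be identified by normalized author and title (the `work_key` heuristic, the function names, and the sample records are hypothetical; real FRBRization uses richer matching):

```python
from collections import defaultdict


def work_key(record):
    """Hypothetical work key: normalized author plus normalized title."""
    return (record["author"].strip().lower(), record["title"].strip().lower())


def frbrize(records):
    """Group records describing different versions of the same work."""
    works = defaultdict(list)
    for record in records:
        works[work_key(record)].append(record)
    return dict(works)


records = [
    {"author": "Doe, Jane", "title": "Example Work", "format": "print", "year": 2010},
    {"author": "Doe, Jane", "title": "Example Work", "format": "ebook", "year": 2012},
    {"author": "Roe, Richard", "title": "Another Work", "format": "print", "year": 2014},
]

works = frbrize(records)
for key, versions in works.items():
    print(key, [v["format"] for v in versions])
```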
Scalability issues
● current implementation of property graph is too slow
● test results with 64GB RAM, SSD, 16 cores
○ 1.2 million records (flat format): 10 hours for complete workflow
(ingest, transformation, export)
○ more complex formats (MARC21) produce up to 5x as many statements
● single Neo4j instance, storage and memory issues
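To put the test result in perspective, the measured throughput can be extrapolated to the size of the aggregated index in Leipzig. The calculation below assumes that processing time scales linearly with the number of records, which is an optimistic simplification and not a measured result.

```python
# Measured on the test machine (from the slide).
records_tested = 1_200_000
hours_measured = 10

# Size of the aggregated index at Leipzig University Library (from the slides).
full_index = 12_000_000 + 80_000_000

records_per_second = records_tested / (hours_measured * 3600)
projected_hours = full_index / records_tested * hours_measured  # linear-scaling assumption
projected_days = projected_hours / 24

print(f"throughput: {records_per_second:.1f} records/s")
print(f"projected: {projected_hours:.0f} hours (about {projected_days:.0f} days)")
```

Under that assumption a full run would take roughly a month on a single instance, which is why parallelization or a different graph store is being considered.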
d:swarm architecture
Possible solutions?
● “throw more hardware at it”
● Another graphDB, parallelization?
○ ArangoDB: https://www.arangodb.com
○ Apache Giraph: http://giraph.apache.org
○ Blazegraph: http://blazegraph.com (Wikidata’s choice)
● Gradoop?!

ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 

(Big) bibliographic data @ ScaDS project meeting - 2015-06-12

  • 1. (Big) Bibliographic Data UB Leipzig & SLUB Dresden ScaDS project meeting, 12.6.2015 Leander Seige, Felix Lohmeier, Ralf Talkenberger
  • 2. “The library of the 21st century is a data hub.” quoted from an internal strategic paper of Leipzig University Library, 2015
  • 4. <resource> book ● printed books on the library’s shelves ● bought ebooks ● licensed ebooks ● pay-per-use ebooks ● free content ● ebooks to be bought by the library (patron driven acquisition = pda) ● even printed books to be bought by the library (pda too)
  • 5. <resource> journals ● printed journals on the library’s shelves ● many more licensed electronic journals ○ full text accessible via web interfaces ● do we have article metadata? ● yes: licensed journal articles: 10s of millions per library
  • 6. <metadata> accessibility information ● where is a resource? (physical or on the net) ● who is allowed to access this content? (students? faculty? everyone?) ● is it available off-campus? ● did we buy it or is it just licensed? ● may the user copy or print it? ● is the library allowed to store the electronic file? ● may we grant access from wifi connections? ● ...or any combination of these...
  • 7. <metadata> knowledge bases ● librarians built large knowledge bases to describe resources ● in German-speaking countries: GND (Gemeinsame Normdatei) of the Deutsche Nationalbibliothek http://www.dnb.de/EN/gnd ● international: http://viaf.org ● provide dbpedia-links to explore the linked data cloud and to enrich library data
  • 8. <metadata> knowledge bases ● GND (and other national authority files via VIAF) ○ describe Persons, Corporate bodies, Conferences and Events, Geographic Information, Topics, Works and relationships between them ○ form a generic knowledge base, independent of any specific domain ○ provide links to other knowledge bases (dbpedia, geonames...)
  • 9. resource discovery ● traditional “OPACs” provided access to traditional library resources like printed books; users had to use proprietary vendor-driven portals to access electronic resources ● today, printed materials represent only a small part of library resources ● in contrast: resource discovery systems aim to integrate all resources of a library and present them in one single search interface
  • 10. Cooperation ● UBL and SLUB joined forces in March 2015 ● Goals: a. Exchange of metadata after processing b. Develop common workflows to avoid “double work” → integrate existing tools finc & d:swarm
  • 11. finc Community ● maintains a large search engine infrastructure ● developed and hosted at Leipzig University Library ● based on Apache Solr and VuFind ● rugged metadata management system, processing millions of data records each day ● integrates more than 50 data sources https://finc.info
  • 12. finc Community ● provides more than 15 university libraries with resource discovery systems ● offers great potential to design and implement user oriented functions on real world systems, serving thousands of library users in Saxony and beyond, every day ● employs the aggregated index at Leipzig University Library https://finc.info
  • 13. aggregated index at Leipzig University Library: 10% physical items, 90% electronic content on the net
  • 14. aggregated index at Leipzig University Library ● 12 million traditional data records (growing) ● 80 million electronic article data records (growing) ● each record contains 20 data fields → 1.8 billion triples (if you triplify it), without any enrichment data
  • 15. Data processing today ● distributed data storage ○ 2 Solr instances in Leipzig (~12 million + ~80 million records) ○ 2 Solr instances in Dresden (~2 million + ~2 million records) ● constraint: each data source is handled separately → difficult to build up relations and deep data integration
  • 16. d:swarm ● yet another tool…? a. property graph database b. gui for library staff
  • 17. Tools: finc vs. d:swarm ● focus: data normalization (finc); data integration and enrichment (d:swarm) ● technology: script-based transformations with python, go, ElasticSearch (finc); encapsulates metafacture, an open source toolchain for metadata transformation, plus a property graph in Neo4j (d:swarm) ● status: works fine with ~100 million records in less than one day (finc); scalability issues, ~4 million records in less than one day (d:swarm)
  • 18. integrating finc with d:swarm ● enhance data processing regarding ○ authority data linking (NLP) ○ fuzzy deduplication ○ classification ○ relate bibliographic data to places, topics, abstract terms ○ publish machine readable data (linked data) ● create user interfaces to enable system librarians to control metadata processing
  • 19. Tomorrow: common workflows ● All data flows through both tools (finc + d:swarm) ● Deduplication (duplicates are easier to recognize in a graph DB) ● FRBRization (aggregate different physical and formal versions of a work) ● Knowledge graph makes enrichment (authorities, altmetrics data, usage data, …) and analytics easier
  • 20. Scalability issues ● current implementation of the property graph is too slow ● test results with 64 GB RAM, SSD, 16 cores ○ 1.2 million records (flat format): 10 hours for the complete workflow (ingest, transformation, export) ○ more complex formats (MARC21) produce up to 5x as many statements ● single Neo4j instance, storage and memory issues
  • 22. Possible solutions? ● “throw hardware at it” ● Another graph DB, parallelization? ○ ArangoDB: https://www.arangodb.com ○ Apache Giraph: http://giraph.apache.org ○ Blazegraph: http://blazegraph.com (Wikidata’s choice) ● Gradoop?!
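
To put the figures from slides 14 and 20 side by side, here is a rough back-of-the-envelope sketch in Python. It reproduces the triple count for the aggregated index and extrapolates the measured workflow duration to the full index. The linear scaling is an assumption made for illustration only; the slides do not say how the runtime grows with record count, and the function names are hypothetical.

```python
# Back-of-the-envelope estimates based on figures quoted in the slides.
# Assumption (not from the slides): runtime scales linearly with record count.

TRADITIONAL_RECORDS = 12_000_000  # slide 14: traditional data records
ARTICLE_RECORDS = 80_000_000      # slide 14: electronic article records
FIELDS_PER_RECORD = 20            # slide 14: data fields per record

TEST_RECORDS = 1_200_000          # slide 20: flat-format test set
TEST_HOURS = 10                   # slide 20: ingest, transformation, export


def triple_count(records: int, fields_per_record: int) -> int:
    """One triple per (record, field) pair, without enrichment data."""
    return records * fields_per_record


def hours_for(records: int, test_records: int, test_hours: float) -> float:
    """Linearly extrapolate the measured workflow duration."""
    return records / test_records * test_hours


total_records = TRADITIONAL_RECORDS + ARTICLE_RECORDS
triples = triple_count(total_records, FIELDS_PER_RECORD)
full_index_hours = hours_for(total_records, TEST_RECORDS, TEST_HOURS)

print(f"records: {total_records:,}")
print(f"triples: {triples:,}")
print(f"estimated hours: {full_index_hours:,.0f} (~{full_index_hours / 24:.0f} days)")
```

Under that assumption, the 92 million records correspond to about 1.84 billion triples, and a single pass at the measured rate would take on the order of a month, which is why the slides look at parallelization and other graph engines.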