Validating 126 million MARC records
DATeCH 2019, Brussels, 2019-05-10.
Péter Király
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
Card catalog at Gent University Library, photo: Pieter Morlion, 2010 CC-BY 4.0
https://commons.wikimedia.org/wiki/File:Boekentoren_2010PM_1179_21H9015.JPG
http://bit.ly/qa-datech2019
Part I. A short introduction to MARC
❏ MAchine Readable Cataloging
❏ format and semantic specification
❏ comes from the age of punch cards – information compression was required
❏ invented in the 1960s
❏ we love to criticise it: “MARC must die”*, “Stockholm syndrome of MARC”**
❏ “There are only two kinds of people who believe themselves able to read a
MARC record without referring to a stack of manuals: a handful of our top
catalogers and those on serious drugs.”
* Roy Tennant http://lj.libraryjournal.com/2002/10/ljarchives/marc-must-die/
** Niklas Lindström at ELAG 2019 https://twitter.com/cm_harlow/status/1126068414928293888
a (pretty printed) example
LDR 01136cnm a2200253ui 4500
001 002032820
005 20150224114135.0
008 031117s2003 gw 000 0 ger d
020 $a3805909810
100 1 $avon Staudinger, Julius,$d1836-1902$0(viaf)14846766
245 10$aJ. von Staudingers Kommentar zum ... /$cJ. von Staudinger.
250 $aNeubearb. 2003$bvon Jörn Eckert
260 $aBerlin :$bSellier-de Gruyter,$c2003.
300 $a534 p. ;.
500 $aCiteertitel: BGB.
500 $aBandtitel: Staudinger BGB.
700 1 $aEckert, Jörn
852 4 $xRE$bRE55$cRBIB$jRBIB.BUR 011 DE 021$p000000800147
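A minimal Python sketch of reading one line of the pretty-printed form above. It assumes a hypothetical layout: the tag, a space, exactly two indicator characters (a blank indicator is a space), then subfields each introduced by $ and a one-character code. The names ControlField, DataField and parse_line are illustrative and not part of the qa-metadata-marc tool.

```python
from dataclasses import dataclass, field


@dataclass
class ControlField:
    tag: str
    value: str


@dataclass
class DataField:
    tag: str
    ind1: str  # a blank (undefined) indicator is a space
    ind2: str
    subfields: list = field(default_factory=list)  # (code, value) pairs in order


def parse_line(line):
    """Parse one pretty-printed line (hypothetical layout, see above)."""
    tag = line[:3]
    # The leader and control fields 001-009 have neither indicators nor subfields.
    if tag == "LDR" or tag.startswith("00"):
        return ControlField(tag, line[4:])
    ind1, ind2 = line[4], line[5]
    # Every subfield starts with '$' plus a one-character code. A literal '$'
    # inside a value would break this simple split.
    chunks = line[6:].split("$")
    return DataField(tag, ind1, ind2, [(c[0], c[1:]) for c in chunks if c])


title = parse_line("245 10$aJ. von Staudingers Kommentar zum ... /$cJ. von Staudinger.")
print(title.ind1, title.ind2, title.subfields)
```

Keeping subfields as an ordered list of pairs (rather than a dict) preserves repeated subfields, which matters later for repetition checks.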
looks like rocket science...
Apollo 11 (moon landing) source code
https://github.com/chrislgarry/Apollo-11
positional fields - 008
‘801003s1958 ja 000 0 jpn ‘
0 1 2 3
0123456789012345678901234567890123456789
aaaaaabccccddddeeefffgh All materials
IIIIjkLLLLmnopqr Books
ijklmnOOOpqrs Continuing Resources
iijklmNNNNNNOOp Music
IIIIjjklmnOO Maps
Iiijklmn Visual Materials
ijkl Computer Files
i Mixed Materials
lower case = distinct units
upper case = repeatable units
 = undefined position
the material-specific positions (18–34) depend on the record type (calculated from Leader values)
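The layout switch can be sketched in Python. The common positions follow the MARC 21 Bibliographic 008 definition (00–05 date entered on file, 06 type of date, 07–10 date 1, 11–14 date 2, 15–17 place, 35–37 language, 38 modified record, 39 cataloging source); the mapping from Leader/06 and Leader/07 to a material type follows the MARC 21 rules as commonly documented. The function names are hypothetical, and the official documentation should be checked before relying on this.

```python
# Positions common to all material types (start inclusive, end exclusive).
COMMON_008 = {
    "date entered on file": (0, 6),
    "type of date/publication status": (6, 7),
    "date 1": (7, 11),
    "date 2": (11, 15),
    "place of publication": (15, 18),
    "language": (35, 38),
    "modified record": (38, 39),
    "cataloging source": (39, 40),
}


def split_008(value):
    """Return the common elements and the raw material-specific block (18-34)."""
    if len(value) != 40:
        raise ValueError(f"008 must be 40 characters long, got {len(value)}")
    common = {name: value[start:end] for name, (start, end) in COMMON_008.items()}
    return common, value[18:35]


def material_type(leader):
    """Pick the 008/18-34 layout from Leader/06 (type) and Leader/07 (level)."""
    rec_type, level = leader[6], leader[7]
    if rec_type in "at" and level in "acdm":
        return "Books"
    if rec_type == "a" and level in "bis":
        return "Continuing Resources"
    if rec_type == "m":
        return "Computer Files"
    if rec_type in "ef":
        return "Maps"
    if rec_type in "cdij":
        return "Music"
    if rec_type in "gkor":
        return "Visual Materials"
    if rec_type == "p":
        return "Mixed Materials"
    return None  # no layout defined: worth reporting as an issue
```

With the example leader shown earlier (01136cnm a2200253ui 4500), position 06 holds n, which this sketch does not map to any material type, so material_type returns None; that is the kind of case a validator would want to report.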
datafields (repeatable/non-repeatable): Indicator1, Indicator2, Subfield1, ... , Subfieldn
indicators: always 1 char long, a dictionary term
❏ code
❏ value
❏ free text
❏ dictionary term
❏ fixed format (e.g. yymmdd)
❏ fixed format + dictionary terms (d7i2)
❏ fixed positions + dictionary terms
❏ repeatable/non-repeatable
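Checking a value against its declared kind can be sketched as follows. The kind names ("free text", "dictionary term", "yymmdd") and the function check_value are hypothetical; only three of the value types listed above are covered.

```python
import re
from datetime import datetime


def check_value(value, kind, terms=frozenset()):
    """Return True if value fits its declared kind (hypothetical kind names)."""
    if kind == "free text":
        return value.strip() != ""
    if kind == "dictionary term":
        return value in terms
    if kind == "yymmdd":
        # The regex pins the length; strptime then rejects impossible dates.
        if not re.fullmatch(r"\d{6}", value):
            return False
        try:
            datetime.strptime(value, "%y%m%d")
        except ValueError:
            return False
        return True
    raise ValueError(f"unknown value kind: {kind}")


print(check_value("801003", "yymmdd"), check_value("801332", "yymmdd"))  # True False
```

The regex guard is needed because strptime alone accepts one-digit months and days, so a shorter string could otherwise slip through.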
versions
❏ changes of the standard
❏ no versioning
❏ new, deleted and changed elements every year
❏ localized versions
❏ introducing new fields
❏ overwriting existing fields
❏ mixing localized versions
❏ records carry no indication of which localization they follow
❏ 50+ localizations (international, national, consortial)
size – number of data elements implemented
                  MARC 21  versions  total
control fields          7                7
control subfields     211              211
data fields           215        68    283
indicators            175         8    183
subfields            2259       344   2603
total                                 3287
data model: Java classes (qa-metadata-marc.jar)
export: Avram JSON, a machine readable standard
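A sketch of consuming such a machine readable definition. The JSON fragment below is hand-written and heavily trimmed (field 245 only, first indicator and two subfields); its keys (fields, repeatable, indicator1/indicator2 with codes, subfields) are an assumption modeled on the Avram specification, which should be consulted for the authoritative structure. load_rules and indicator_codes are hypothetical helpers.

```python
import json

# Hand-written, trimmed fragment; key names are an assumption (indicator2 omitted).
AVRAM_FRAGMENT = """
{
  "title": "MARC 21 Format for Bibliographic Data (fragment)",
  "fields": {
    "245": {
      "tag": "245",
      "label": "Title Statement",
      "repeatable": false,
      "indicator1": {
        "label": "Title added entry",
        "codes": {"0": {"label": "No added entry"}, "1": {"label": "Added entry"}}
      },
      "subfields": {
        "a": {"code": "a", "label": "Title", "repeatable": false},
        "c": {"code": "c", "label": "Statement of responsibility, etc.", "repeatable": false}
      }
    }
  }
}
"""


def indicator_codes(indicator):
    """Allowed codes, or None when the schema carries no definition."""
    return None if not indicator else set(indicator.get("codes", {}))


def load_rules(text):
    """Reduce an Avram-style schema to lookup tables a validator can use."""
    schema = json.loads(text)
    rules = {}
    for tag, spec in schema["fields"].items():
        rules[tag] = {
            # Treat a missing "repeatable" flag as repeatable (assumption).
            "repeatable": spec.get("repeatable", True),
            "ind1": indicator_codes(spec.get("indicator1")),
            "ind2": indicator_codes(spec.get("indicator2")),
            "subfields": {code: sub.get("repeatable", True)
                          for code, sub in spec.get("subfields", {}).items()},
        }
    return rules
```

Returning None for a missing indicator definition keeps "not described here" distinct from "no value allowed", so a validator can skip the check instead of flagging every value.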
Remember heroines!
Margaret Hamilton
https://qz.com/726338/
Henriette D. Avram
smithsonianmag.com
Part II.
record validation
and quality assessment
Boekentoren UGent - de belvedère, photo: Michiel Hendryckx, 2013, CC-BY-SA 3.0
https://commons.wikimedia.org/wiki/File:Boekentoren_ugent_belvedere_675.jpg
quality assessment workflow
1. ingest
2. measure records
3. aggregate
4. report
5. evaluate with experts (feedback loop: improve records)
1. ingest data
Bavarian union catalogue (bay) – 27.3 million records
Baden-Württemberg union catalogue (bzb) – 23.1 m
Columbia (col) – 6.0 m
Heritage of the Printed Book Database, CERL (cer) – 6.7 m
German National Bibliography (dnb) – 16.7 m
Gent (gen) – 1.8 m
Harvard (har) – 13.7 m
Library of Congress (loc) – 10.1 m
Michigan (mic) – 1.3 m
Finnish National Bibliography (nfi) – 1.0 m
Répertoire International des Sources Musicales (ris) – 1.3 m
San Francisco Public Library (sfp) – 0.9 m
Stanford (sta) – 9.4 m
Szeged (szt) – 1.2 m
TIB Hannover (tib) – 3.5 m
Toronto Public Library (tor) – 2.5 m
union catalogues – national libraries – university libraries – public libraries
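The per-catalogue counts add up to the figure in the title; a quick check in Python, using the rounded counts listed above:

```python
# Record counts in millions, as listed on the slide (rounded to 0.1 m).
RECORDS_MILLIONS = {
    "bay": 27.3, "bzb": 23.1, "col": 6.0, "cer": 6.7, "dnb": 16.7, "gen": 1.8,
    "har": 13.7, "loc": 10.1, "mic": 1.3, "nfi": 1.0, "ris": 1.3, "sfp": 0.9,
    "sta": 9.4, "szt": 1.2, "tib": 3.5, "tor": 2.5,
}
total = sum(RECORDS_MILLIONS.values())
print(f"{len(RECORDS_MILLIONS)} catalogues, {total:.1f} million records")
# 16 catalogues, 126.5 million records
```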
2. measure records
$ ./validator [options] [file]
001999999 852 undefined subfield L
https://www.loc.gov/...
002000005 035 undefined subfield 9
https://www.loc.gov/...
002000005 852 undefined subfield L
https://www.loc.gov/...
002000005 852 undefined subfield L
https://www.loc.gov/...
002000008 035 undefined subfield 9
https://www.loc.gov/…
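Before aggregation the validator output has to be parsed. A minimal sketch, assuming one issue per line in the form record ID, tag, message, optionally followed by the documentation URL (lines holding only a URL are skipped). parse_issues and count_issues are hypothetical names, and the real tool's output format may differ.

```python
from collections import Counter, defaultdict


def parse_issues(lines):
    """Group validator output lines by record ID.

    Assumed layout: '<record id> <tag> <message> [<url>]'; lines that hold
    only a URL are skipped.
    """
    by_record = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 3 or parts[0].startswith("http"):
            continue
        if parts[-1].startswith("http"):
            parts = parts[:-1]  # drop the documentation link
        record_id, tag = parts[0], parts[1]
        by_record[record_id].append((tag, " ".join(parts[2:])))
    return by_record


def count_issues(by_record):
    """How often each (tag, message) pair occurs across all records."""
    return Counter(issue for issues in by_record.values() for issue in issues)
```

On the five issue lines above this yields three records, with ("852", "undefined subfield L") counted three times and ("035", "undefined subfield 9") twice.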
3. aggregating results – records with issues (%)
       all  filtered
bay  100.0      18.8
bzb  100.0      76.1
cer    2.8       2.8
col   90.4      66.0
dnb   13.9       0.2
gen   40.8      27.3
har  100.0      97.3
loc   30.5      29.3
mic   80.8      67.5
nfi   62.1      58.1
ris   99.7      57.1
sfp   82.7      60.4
sta   92.7      92.5
szt   30.8      30.6
tib  100.0     100.0
tor  100.0      74.2
Filtered = issues excluding the undocumented tags and subfields
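The two columns can be computed from per-record issue lists. A sketch, assuming each record carries a list of issue codes such as F3 (undefined field) or S7 (undefined subfield), and that "undocumented tags and subfields" corresponds to exactly those two codes; both assumptions, like the function name share_with_issues, are illustrative.

```python
# Assumption: "undocumented tags and subfields" = F3 (undefined field)
# and S7 (undefined subfield).
UNDOCUMENTED = {"F3", "S7"}


def share_with_issues(records, undocumented=UNDOCUMENTED):
    """records maps record ID -> list of issue codes (empty list = no issues).

    Returns (all, filtered): percentages of records with any issue, and with
    at least one issue that is not about undocumented elements.
    """
    total = len(records)
    if total == 0:
        return 0.0, 0.0
    with_any = sum(1 for codes in records.values() if codes)
    with_documented = sum(
        1 for codes in records.values() if any(c not in undocumented for c in codes)
    )
    return 100.0 * with_any / total, 100.0 * with_documented / total


sample = {"r1": ["S7"], "r2": ["I1", "F3"], "r3": [], "r4": ["F2"]}
print(share_with_issues(sample))  # (75.0, 50.0)
```

A record whose only issues are undocumented elements (r1) drops out of the filtered share, which is why a catalogue such as bay can move from 100.0 to 18.8.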
issue types
issues on record level
❏ R1 ambiguous linkage
❏ R2 invalid linkage
❏ R3 type error
control field issues
❏ C1 invalid code
❏ C2 invalid value
field issues
❏ F1 missing reference subfield (880$6)
❏ F2 non-repeatable field
❏ F3 undefined field
indicator issues
❏ I1 invalid value
❏ I2 non-empty value
❏ I3 obsolete value
subfield issues
❏ S1 classification
❏ S2 invalid ISBN
❏ S3 invalid ISSN
❏ S4 invalid length
❏ S5 invalid value
❏ S6 repetition
❏ S7 undefined subfield
❏ S8 not well-formatted value
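Several of these checks are driven directly by the field definitions. A sketch covering only F2, F3, I1, S6 and S7, with a hand-written rule table trimmed to field 245. Reading S6 as "a non-repeatable subfield is repeated" is an assumption, and the rule layout and the validate function are hypothetical rather than the tool's API.

```python
from collections import Counter

# Hypothetical rule table, trimmed to field 245 (code -> repeatable? for subfields).
RULES = {
    "245": {
        "repeatable": False,
        "ind1": {"0", "1"},
        "ind2": set("0123456789"),
        "subfields": {"a": False, "b": False, "c": False},
    },
}


def validate(fields, rules=RULES):
    """fields: list of (tag, ind1, ind2, [(code, value), ...]) tuples.

    Returns (issue code, tag, detail) tuples for F2, F3, I1, S6 and S7 only.
    """
    issues = []
    for tag, count in Counter(f[0] for f in fields).items():
        if tag in rules and count > 1 and not rules[tag]["repeatable"]:
            issues.append(("F2", tag, f"non-repeatable field occurs {count} times"))
    for tag, ind1, ind2, subfields in fields:
        rule = rules.get(tag)
        if rule is None:
            issues.append(("F3", tag, "undefined field"))
            continue
        for name, value in (("ind1", ind1), ("ind2", ind2)):
            if value not in rule[name]:
                issues.append(("I1", tag, f"invalid {name} value {value!r}"))
        for code, count in Counter(code for code, _ in subfields).items():
            if code not in rule["subfields"]:
                issues.append(("S7", tag, f"undefined subfield {code}"))
            elif count > 1 and not rule["subfields"][code]:
                issues.append(("S6", tag, f"non-repeatable subfield {code} repeated"))
    return issues
```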
number of subfields in catalogues
     total    1%   10%
bay    854   144    51
bzb    522   144    65
cer    169    65    39
col   1862   196    59
dnb    575   186    97
gen    955   122    47
har   2024   154    49
loc   1156   128    40
mic   1233   138    37
nfi    811   145    54
ris    138    88    52
sfp   1046   125    37
sta   2997   225    64
szt   1210    74    42
tib     46    41    35
tor   1733   163    46
The tool has 2600+ subfield definitions.
total: total number of subfields; 1%: subfields available in at least 1% of the records; 10%: subfields available in at least 10% of the records.
Top fields (not in the table): fields available in at least 50% of the records: 13–25; 80%: 4–18; 90%: 0–16
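The total / 1% / 10% columns can be derived from per-record subfield occurrence. A sketch, assuming each record has been reduced to the set of tag$code keys it contains (e.g. 245$a); subfield_coverage and summarize are hypothetical names.

```python
from collections import Counter


def subfield_coverage(records):
    """Count in how many records each key (e.g. '245$a') occurs at least once."""
    counts = Counter()
    n = 0
    for keys in records:
        n += 1
        counts.update(set(keys))  # a key counts once per record
    return n, counts


def summarize(records):
    """Return (total distinct keys, keys in >= 1% of records, keys in >= 10%)."""
    n, counts = subfield_coverage(records)
    # Integer comparisons avoid floating-point surprises at the thresholds.
    at_1 = sum(1 for c in counts.values() if c * 100 >= n)
    at_10 = sum(1 for c in counts.values() if c * 10 >= n)
    return len(counts), at_1, at_10
```

For 20 records where 245$a appears everywhere, 020$a in two records and 500$a in one, summarize returns (3, 3, 2): all three keys pass the 1% threshold, but 500$a (5%) misses the 10% one.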
completeness by field groups
summary of errors
K-means clustering
Spark (Scala)
increasing the number of clusters decreases the distance from the centroids, but after a point this gain is not so big (“elbow effect”), in theory
big number of low quality records
small clusters with ‘in between’ quality records
the acceptable average
clusters with good quality records
Thompson and Traill (2017) http://journal.code4lib.org/articles/12828
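The elbow idea can be illustrated without Spark. This sketch swaps in a plain Python implementation of Lloyd's k-means on one-dimensional data, assuming each record has been reduced to a single hypothetical quality score; it is not the Spark (Scala) code used in the project. The within-cluster sum of squares (WCSS) is returned for each k so the elbow can be read off.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's k-means on one-dimensional data (e.g. a per-record quality score).

    Returns (centroids, within-cluster sum of squares).
    """
    data = sorted(values)
    # Deterministic initialisation: spread the starting centroids over the data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its old centroid.
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    wcss = sum(min((v - c) ** 2 for c in centroids) for v in data)
    return centroids, wcss


def elbow(values, max_k):
    """WCSS for k = 1 .. max_k; look for the point where the drop flattens."""
    return [kmeans_1d(values, k)[1] for k in range(1, max_k + 1)]
```

On well-separated scores, such as three tight groups around 0.12, 0.52 and 0.92, WCSS drops sharply up to k = 3 and only marginally afterwards.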
4. report (web UI)
Finding problems with facets
Vandenhoeck und Ruprecht
Vandenhoeck & Ruprecht
Vandenhoeck u. Ruprecht
Vandenhoeck
Vandenhoek & Ruprecht
Vandenhoek und Ruprecht
Bandenhoed und Ruprecht
Vandenhoeck et Ruprecht
Vandenhoeck & Reprecht
Vandenhoed und Ruprecht
V&R unipress
V&R Unipress
V & R Unipress
V & R unipress
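One way to surface such variants automatically is to normalize the facet values and group near-duplicates. The sketch below uses a crude normalization (lower-casing and unifying &, und, u. and et) plus string similarity from Python's difflib; the cutoff of 0.85 is an arbitrary assumption, and this heuristic is not claimed to be what the web UI does.

```python
import difflib
import re

# Connectors treated as equivalent (assumption for this example).
CONNECTOR = re.compile(r"\s*(&|\bund\b|\bu\.|\bet\b)\s*")


def normalize(name):
    """Crude key: lower-case, unify '&', 'und', 'u.' and 'et', collapse spaces."""
    return " ".join(CONNECTOR.sub(" und ", name.lower()).split())


def group_variants(values, cutoff=0.85):
    """Group facet values whose normalized forms are similar (difflib ratio)."""
    groups = {}
    for value in values:
        key = normalize(value)
        match = difflib.get_close_matches(key, list(groups), n=1, cutoff=cutoff)
        groups.setdefault(match[0] if match else key, []).append(value)
    return groups


VARIANTS = [
    "Vandenhoeck und Ruprecht", "Vandenhoeck & Ruprecht", "Vandenhoeck u. Ruprecht",
    "Vandenhoeck", "Vandenhoek & Ruprecht", "Vandenhoek und Ruprecht",
    "Bandenhoed und Ruprecht", "Vandenhoeck et Ruprecht", "Vandenhoeck & Reprecht",
    "Vandenhoed und Ruprecht", "V&R unipress", "V&R Unipress", "V & R Unipress",
    "V & R unipress",
]
for key, members in group_variants(VARIANTS).items():
    print(len(members), key)
```

On the variants listed above this yields three groups: nine spellings of Vandenhoeck und Ruprecht, the bare Vandenhoeck, and the four V&R unipress forms. Whether "Vandenhoeck" alone belongs with the first group is a decision for cataloguing experts rather than for a similarity threshold.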
est. 1735
cataloging frontline
intensive backward cataloging – maybe importing?
backward cataloging is still intensive, the tendency continues
peak is > 13K
2000-07-10, the “golden day”: 95K new records
forward cataloging
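A spike like the “golden day” shows up when records are counted per cataloging date. A sketch, assuming the date is taken from 008/00-05 (date entered on file, yymmdd) and that a pivot year resolves the missing century (yy below 50 read as 20yy); the pivot, the function names and the choice of source field are assumptions, not necessarily how these charts were produced. Values that are not valid dates are skipped, since a real catalogue will contain some.

```python
from collections import Counter
from datetime import date


def entry_date(field_008, pivot=50):
    """Read 008/00-05 (yymmdd). The century is not recorded, so a pivot is
    assumed: yy < pivot -> 20yy, otherwise 19yy (hypothetical choice)."""
    yy, mm, dd = int(field_008[0:2]), int(field_008[2:4]), int(field_008[4:6])
    return date(2000 + yy if yy < pivot else 1900 + yy, mm, dd)


def busiest_day(fields_008):
    """Return (date, number of records) for the day with most new records."""
    counts = Counter()
    for value in fields_008:
        try:
            counts[entry_date(value)] += 1
        except ValueError:
            continue  # not a valid yymmdd date: skip (a validator would report it)
    return counts.most_common(1)[0] if counts else None
```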
everything else
… at least regarding this project
code & docs: https://github.com/pkiraly/metadata-qa-marc
Web UI source code: https://github.com/pkiraly/metadata-qa-marc-web
Avram Specification (Jakob Voß):
http://format.gbv.de/schema/avram/specification
https://twitter.com/kiru
peter.kiraly@gwdg.de

More Related Content

Similar to Validating 126 million MARC records (DATeCH 2019)

2013 DataCite Summer Meeting - Introducing DataCite services (Jan Brase - Dat...
2013 DataCite Summer Meeting - Introducing DataCite services (Jan Brase - Dat...2013 DataCite Summer Meeting - Introducing DataCite services (Jan Brase - Dat...
2013 DataCite Summer Meeting - Introducing DataCite services (Jan Brase - Dat...
datacite
 
MARC 21 Training at Daffodil International University
MARC 21 Training at Daffodil International UniversityMARC 21 Training at Daffodil International University
MARC 21 Training at Daffodil International University
Nur Ahammad
 
Odp
OdpOdp
SC4 Workshop 2: Soren Auer BDE project Overview
SC4 Workshop 2: Soren Auer BDE project OverviewSC4 Workshop 2: Soren Auer BDE project Overview
SC4 Workshop 2: Soren Auer BDE project Overview
BigData_Europe
 
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
GigaScience, BGI Hong Kong
 
Computer Interface for Electroluminescence (EL)
Computer Interface for Electroluminescence (EL)Computer Interface for Electroluminescence (EL)
Computer Interface for Electroluminescence (EL)
Editor IJCATR
 
Hydraulic Calculator Manual
Hydraulic Calculator ManualHydraulic Calculator Manual
Hydraulic Calculator Manual
Francis Mitchell
 
Change Tracking in Knowledge Organization Systems with skos-history
Change Tracking in Knowledge Organization Systems with skos-historyChange Tracking in Knowledge Organization Systems with skos-history
Change Tracking in Knowledge Organization Systems with skos-history
Joachim Neubert
 
New ways to communicate in science: perspectives from biodiversity research
New ways to communicate in science: perspectives from biodiversity researchNew ways to communicate in science: perspectives from biodiversity research
New ways to communicate in science: perspectives from biodiversity research
Vince Smith
 
Named Entity Recognition, Concept Normalization and Clinical Coding: Overview...
Named Entity Recognition, Concept Normalization and Clinical Coding: Overview...Named Entity Recognition, Concept Normalization and Clinical Coding: Overview...
Named Entity Recognition, Concept Normalization and Clinical Coding: Overview...
Martin Krallinger
 
MARC
MARCMARC
MARC
MARCMARC
ACS San Francisco 2010 CINF Talk
ACS San Francisco 2010 CINF TalkACS San Francisco 2010 CINF Talk
ACS San Francisco 2010 CINF Talk
Markus Sitzmann
 
PGDay.Amsterdam 2018 - Bruce Momjian - Will postgres live forever
PGDay.Amsterdam 2018 - Bruce Momjian - Will postgres live foreverPGDay.Amsterdam 2018 - Bruce Momjian - Will postgres live forever
PGDay.Amsterdam 2018 - Bruce Momjian - Will postgres live forever
PGDay.Amsterdam
 
A First Step Towards Stream Reasoning at FIS 2008
A First Step Towards Stream Reasoning at FIS 2008A First Step Towards Stream Reasoning at FIS 2008
A First Step Towards Stream Reasoning at FIS 2008
Emanuele Della Valle
 
Coco co-desing and co-verification of masked software implementations on cp us
Coco   co-desing and co-verification of masked software implementations on cp usCoco   co-desing and co-verification of masked software implementations on cp us
Coco co-desing and co-verification of masked software implementations on cp us
RISC-V International
 
Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Ski...
Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Ski...Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Ski...
Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Ski...
Sease
 
Exploring the Great Olympian Graph
Exploring the Great Olympian GraphExploring the Great Olympian Graph
Exploring the Great Olympian Graph
Neo4j
 
Smart Container Overview UN/CEFACT Forum 2020
Smart Container Overview UN/CEFACT Forum 2020Smart Container Overview UN/CEFACT Forum 2020
Smart Container Overview UN/CEFACT Forum 2020
Jaco Voorspuij
 
CERN IT Monitoring
CERN IT Monitoring CERN IT Monitoring
CERN IT Monitoring
Tim Bell
 

Similar to Validating 126 million MARC records (DATeCH 2019) (20)

2013 DataCite Summer Meeting - Introducing DataCite services (Jan Brase - Dat...
2013 DataCite Summer Meeting - Introducing DataCite services (Jan Brase - Dat...2013 DataCite Summer Meeting - Introducing DataCite services (Jan Brase - Dat...
2013 DataCite Summer Meeting - Introducing DataCite services (Jan Brase - Dat...
 
MARC 21 Training at Daffodil International University
MARC 21 Training at Daffodil International UniversityMARC 21 Training at Daffodil International University
MARC 21 Training at Daffodil International University
 
Odp
OdpOdp
Odp
 
SC4 Workshop 2: Soren Auer BDE project Overview
SC4 Workshop 2: Soren Auer BDE project OverviewSC4 Workshop 2: Soren Auer BDE project Overview
SC4 Workshop 2: Soren Auer BDE project Overview
 
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
 
Computer Interface for Electroluminescence (EL)
Computer Interface for Electroluminescence (EL)Computer Interface for Electroluminescence (EL)
Computer Interface for Electroluminescence (EL)
 
Hydraulic Calculator Manual
Hydraulic Calculator ManualHydraulic Calculator Manual
Hydraulic Calculator Manual
 
Change Tracking in Knowledge Organization Systems with skos-history
Change Tracking in Knowledge Organization Systems with skos-historyChange Tracking in Knowledge Organization Systems with skos-history
Change Tracking in Knowledge Organization Systems with skos-history
 
New ways to communicate in science: perspectives from biodiversity research
New ways to communicate in science: perspectives from biodiversity researchNew ways to communicate in science: perspectives from biodiversity research
New ways to communicate in science: perspectives from biodiversity research
 
Named Entity Recognition, Concept Normalization and Clinical Coding: Overview...
Named Entity Recognition, Concept Normalization and Clinical Coding: Overview...Named Entity Recognition, Concept Normalization and Clinical Coding: Overview...
Named Entity Recognition, Concept Normalization and Clinical Coding: Overview...
 
MARC
MARCMARC
MARC
 
MARC
MARCMARC
MARC
 
ACS San Francisco 2010 CINF Talk
ACS San Francisco 2010 CINF TalkACS San Francisco 2010 CINF Talk
ACS San Francisco 2010 CINF Talk
 
PGDay.Amsterdam 2018 - Bruce Momjian - Will postgres live forever
PGDay.Amsterdam 2018 - Bruce Momjian - Will postgres live foreverPGDay.Amsterdam 2018 - Bruce Momjian - Will postgres live forever
PGDay.Amsterdam 2018 - Bruce Momjian - Will postgres live forever
 
A First Step Towards Stream Reasoning at FIS 2008
A First Step Towards Stream Reasoning at FIS 2008A First Step Towards Stream Reasoning at FIS 2008
A First Step Towards Stream Reasoning at FIS 2008
 
Coco co-desing and co-verification of masked software implementations on cp us
Coco   co-desing and co-verification of masked software implementations on cp usCoco   co-desing and co-verification of masked software implementations on cp us
Coco co-desing and co-verification of masked software implementations on cp us
 
Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Ski...
Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Ski...Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Ski...
Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Ski...
 
Exploring the Great Olympian Graph
Exploring the Great Olympian GraphExploring the Great Olympian Graph
Exploring the Great Olympian Graph
 
Smart Container Overview UN/CEFACT Forum 2020
Smart Container Overview UN/CEFACT Forum 2020Smart Container Overview UN/CEFACT Forum 2020
Smart Container Overview UN/CEFACT Forum 2020
 
CERN IT Monitoring
CERN IT Monitoring CERN IT Monitoring
CERN IT Monitoring
 

More from Péter Király

Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Péter Király
 
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
Péter Király
 
Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)
Péter Király
 
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Péter Király
 
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Péter Király
 
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Péter Király
 
Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)
Péter Király
 
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
Péter Király
 
FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)
Péter Király
 
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
Péter Király
 
Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...
Péter Király
 
Measuring cultural heritage metadata quality (Semantics 2017)
Measuring cultural heritage metadata quality (Semantics 2017)Measuring cultural heritage metadata quality (Semantics 2017)
Measuring cultural heritage metadata quality (Semantics 2017)
Péter Király
 
Measuring Metadata Quality in Europeana (ADOCHS 2017)
Measuring Metadata Quality in Europeana (ADOCHS 2017)Measuring Metadata Quality in Europeana (ADOCHS 2017)
Measuring Metadata Quality in Europeana (ADOCHS 2017)
Péter Király
 
Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)
Péter Király
 
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Péter Király
 
Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)
Péter Király
 
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Péter Király
 
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
Péter Király
 
SHACL shortly (ELAG 2018)
SHACL shortly (ELAG 2018)SHACL shortly (ELAG 2018)
SHACL shortly (ELAG 2018)
Péter Király
 
Measuring Metadata Quality (ELAG, 2018)
Measuring Metadata Quality (ELAG, 2018)Measuring Metadata Quality (ELAG, 2018)
Measuring Metadata Quality (ELAG, 2018)
Péter Király
 

More from Péter Király (20)

Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
 
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
 
Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)
 
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
 
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
 
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
 
Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)
 
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
 
FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)
 
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
 
Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...
 
Measuring cultural heritage metadata quality (Semantics 2017)
Measuring cultural heritage metadata quality (Semantics 2017)Measuring cultural heritage metadata quality (Semantics 2017)
Measuring cultural heritage metadata quality (Semantics 2017)
 
Measuring Metadata Quality in Europeana (ADOCHS 2017)
Measuring Metadata Quality in Europeana (ADOCHS 2017)Measuring Metadata Quality in Europeana (ADOCHS 2017)
Measuring Metadata Quality in Europeana (ADOCHS 2017)
 
Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)
 
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
 
Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)
 
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
 
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
 
SHACL shortly (ELAG 2018)
SHACL shortly (ELAG 2018)SHACL shortly (ELAG 2018)
SHACL shortly (ELAG 2018)
 
Measuring Metadata Quality (ELAG, 2018)
Measuring Metadata Quality (ELAG, 2018)Measuring Metadata Quality (ELAG, 2018)
Measuring Metadata Quality (ELAG, 2018)
 

Recently uploaded

一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
eoxhsaa
 
一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理
keesa2
 
Sid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.pptSid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.ppt
ArshadAyub49
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
dataschool1
 
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdfOverview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
nhutnguyen355078
 
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
oaxefes
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
ywqeos
 
06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus
Timothy Spann
 
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdfNamma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
22ad0301
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
Alireza Kamrani
 
8 things to know before you start to code in 2024
8 things to know before you start to code in 20248 things to know before you start to code in 2024
8 things to know before you start to code in 2024
ArianaRamos54
 
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
Rebecca Bilbro
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Marlon Dumas
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
Márton Kodok
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
Vietnam Cotton & Spinning Association
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
hqfek
 
一比一原版雷丁大学毕业证(UoR毕业证书)学历如何办理
一比一原版雷丁大学毕业证(UoR毕业证书)学历如何办理一比一原版雷丁大学毕业证(UoR毕业证书)学历如何办理
一比一原版雷丁大学毕业证(UoR毕业证书)学历如何办理
mbawufebxi
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理

Validating 126 million MARC records (DATeCH 2019)

  • 1. Validating 126 million MARC records
    DATeCH 2019, Brussels, 2019-05-10.
    Péter Király, Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
    Card catalog at Gent University Library, photo: Pieter Morlion, 2010, CC-BY 4.0 https://commons.wikimedia.org/wiki/File:Boekentoren_2010PM_1179_21H9015.JPG
    http://bit.ly/qa-datech2019
  • 2. part I. short introduction to MARC
    ❏ MAchine Readable Cataloging
    ❏ format and semantic specification
    ❏ comes from the age of punch cards – information compression required
    ❏ invented in the early 1960s
    ❏ people love to criticise it: “MARC must die”*, “Stockholm syndrome of MARC”**
    ❏ “There are only two kinds of people who believe themselves able to read a MARC record without referring to a stack of manuals: a handful of our top catalogers and those on serious drugs.”
    * Roy Tennant http://lj.libraryjournal.com/2002/10/ljarchives/marc-must-die/
    ** Niklas Lindström at ELAG 2019 https://twitter.com/cm_harlow/status/1126068414928293888
  • 3. a (pretty printed) example
    LDR 01136cnm a2200253ui 4500
    001 002032820
    005 20150224114135.0
    008 031117s2003 gw 000 0 ger d
    020 $a3805909810
    100 1 $avon Staudinger, Julius,$d1836-1902$0(viaf)14846766
    245 10$aJ. von Staudingers Kommentar zum ... /$cJ. von Staudinger.
    250 $aNeubearb. 2003$bvon Jörn Eckert
    260 $aBerlin :$bSellier-de Gruyter,$c2003.
    300 $a534 p. ;.
    500 $aCiteertitel: BGB.
    500 $aBandtitel: Staudinger BGB.
    700 1 $aEckert, Jörn
    852 4 $xRE$bRE55$cRBIB$jRBIB.BUR 011 DE 021$p000000800147
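To make the display format above concrete, here is a minimal Python sketch that splits one line into tag, indicators and subfields. It assumes the layout of this pretty-printed example (a three-character tag, a space, up to two indicator characters, then `$`-prefixed subfields, with collapsed blank indicators padded back) and that `$` never occurs inside the data; binary MARC files use a control character as the subfield delimiter instead. `Field` and `parse_line` are hypothetical names, not the API of the tool presented here.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    tag: str
    value: str = ""         # content of LDR and control fields 001-009
    indicators: str = "  "  # two indicator characters of a data field
    subfields: list = field(default_factory=list)  # (code, value) pairs


def parse_line(line: str) -> Field:
    """Parse one line of the '$'-delimited display used in the example."""
    tag, rest = line[:3], line[4:]
    if tag == "LDR" or tag < "010":
        # Leader and control fields have no indicators or subfields
        return Field(tag=tag, value=rest)
    # the text before the first '$' is the indicator area; the display
    # collapses blank indicators, so pad it back to two characters
    head, _, body = rest.partition("$")
    indicators = head.ljust(2)[:2]
    # each remaining chunk is a one-character code followed by its value
    subfields = [(chunk[0], chunk[1:]) for chunk in body.split("$") if chunk]
    return Field(tag=tag, indicators=indicators, subfields=subfields)


record = [parse_line(line) for line in (
    "001 002032820",
    "020 $a3805909810",
    "100 1 $avon Staudinger, Julius,$d1836-1902$0(viaf)14846766",
)]
```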
  • 4. looks like rocket science...
    Apollo 11 (moon landing) source code https://github.com/chrislgarry/Apollo-11
  • 5. positional fields - 008
    '801003s1958 ja 000 0 jpn '
    0         1         2         3
    0123456789012345678901234567890123456789
    aaaaaabccccddddeeefffgh  All materials
    IIIIjkLLLLmnopqr  Books
    ijklmnOOOpqrs  Continuing Resources
    iijklmNNNNNNOOp  Music
    IIIIjjklmnOO  Maps
    Iiijklmn  Visual Materials
    ijkl  Computer Files
    i  Mixed Materials
    lower case = distinct units, upper case = repeatable units, space = undefined position
    the positions depend on the record type (calculated from Leader values)
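Because the layout depends on the record type, a validator first has to derive that type from the Leader. The sketch below follows a simplified reading of the MARC 21 Bibliographic rules (Leader/06 type of record, Leader/07 bibliographic level select the layout of 008/18-34) and slices the positions shared by all material types (00-17 and 35-39). `material_type`, `common_008` and the key names are hypothetical, not the tool's own.

```python
# Which 008/18-34 layout applies is decided by Leader/06 (type of record)
# and Leader/07 (bibliographic level); simplified MARC 21 Bibliographic rules.
def material_type(leader: str):
    type_of_record, level = leader[6], leader[7]
    if type_of_record == "a" and level in ("b", "i", "s"):
        return "Continuing Resources"
    if type_of_record in ("a", "t"):
        return "Books"
    if type_of_record == "m":
        return "Computer Files"
    if type_of_record in ("e", "f"):
        return "Maps"
    if type_of_record in ("c", "d", "i", "j"):
        return "Music"
    if type_of_record in ("g", "k", "o", "r"):
        return "Visual Materials"
    if type_of_record == "p":
        return "Mixed Materials"
    return None  # code not covered by the rules above


# Positions 00-17 and 35-39 have the same meaning for all material types
COMMON_008 = {
    "dateEntered": (0, 6),      # yymmdd
    "typeOfDate": (6, 7),
    "date1": (7, 11),
    "date2": (11, 15),
    "place": (15, 18),
    "language": (35, 38),
    "modifiedRecord": (38, 39),
    "catalogingSource": (39, 40),
}


def common_008(value: str) -> dict:
    if len(value) != 40:
        raise ValueError(f"008 should be 40 characters long, got {len(value)}")
    return {name: value[start:end] for name, (start, end) in COMMON_008.items()}
```

For the Leader of the example record on slide 3 (`01136cnm a2200253ui 4500`), position 06 is `n`, which is not among these codes, so `material_type` returns `None` for it.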
  • 6. datafields
    data field: repeatable/non-repeatable
    Indicator1, Indicator2: always 1 char long, dictionary term
    Subfield1, ..., SubfieldN:
    ❏ code
    ❏ value
      ❏ free text
      ❏ dictionary term
      ❏ fixed format (e.g. yymmdd)
      ❏ fixed format + dictionary terms (d7i2)
      ❏ fixed positions + dictionary terms
    ❏ repeatable/non-repeatable
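The properties on this slide (repeatability, indicator values, subfield codes) already suffice to detect several of the issue types listed later: non-repeatable field, invalid indicator value, undefined subfield. A minimal sketch with hypothetical `FieldDef`/`SubfieldDef` classes and a hypothetical `check_field` function; `TITLE_245` is an abbreviated illustration of field 245, not a complete definition and not taken from the tool's Java classes.

```python
from dataclasses import dataclass, field


@dataclass
class SubfieldDef:
    repeatable: bool


@dataclass
class FieldDef:
    repeatable: bool
    ind1: str  # allowed values of indicator 1 (' ' stands for blank)
    ind2: str  # allowed values of indicator 2
    subfields: dict = field(default_factory=dict)  # code -> SubfieldDef


def check_field(defn, indicators, subfields, occurrences=1):
    """Return issue messages for one data field; subfields are (code, value) pairs."""
    issues = []
    if occurrences > 1 and not defn.repeatable:
        issues.append("non-repeatable field")
    for pos, (value, allowed) in enumerate(zip(indicators, (defn.ind1, defn.ind2)), 1):
        if value not in allowed:
            issues.append(f"indicator {pos}: invalid value {value!r}")
    seen = set()
    for code, _ in subfields:
        sdef = defn.subfields.get(code)
        if sdef is None:
            issues.append(f"undefined subfield {code}")
        elif code in seen and not sdef.repeatable:
            issues.append(f"non-repeatable subfield {code} repeated")
        seen.add(code)
    return issues


# Abbreviated, illustrative definition of 245 (Title Statement)
TITLE_245 = FieldDef(
    repeatable=False, ind1="01", ind2="0123456789",
    subfields={"a": SubfieldDef(False), "b": SubfieldDef(False),
               "c": SubfieldDef(False), "n": SubfieldDef(True), "p": SubfieldDef(True)},
)
```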
  • 7. versions
    ❏ changes of the standard
      ❏ no versioning
      ❏ new, deleted and changed elements every year
    ❏ localized versions
      ❏ introducing new fields
      ❏ overwriting existing fields
      ❏ mixing localized versions
      ❏ no indication of which localization a record follows
      ❏ 50+ localizations (international, national, consortial)
  • 8. size – number of data elements implemented
                           MARC 21   versions   total
    control fields               7          –       7
    control subfields          211          –     211
    data fields                215         68     283
    indicators                 175          8     183
    subfields                 2259        344    2603
    total                                        3287
    Java classes (qa-metadata-marc.jar) → export → Avram JSON data model (machine readable standard)
  • 10. Part II. record validation and quality assessment
    Boekentoren UGent - de belvedère, photo: Michiel Hendryckx, 2013, CC-BY-SA 3.0 https://commons.wikimedia.org/wiki/File:Boekentoren_ugent_belvedere_675.jpg
  • 11. quality assessment workflow
    1. ingest
    2. measure records
    3. aggregate
    4. report
    5. evaluate with experts (feedback loop: improve records)
  • 12. 1. ingest data (number of records, m = million)
    Bavarian union catalogue (bay) – 27.3 m
    Baden-Württemberg union catalogue (bzb) – 23.1 m
    Columbia (col) – 6.0 m
    Heritage of the Printed Book Database, CERL (cer) – 6.7 m
    German National Bibliography (dnb) – 16.7 m
    Gent (gen) – 1.8 m
    Harvard (har) – 13.7 m
    Library of Congress (loc) – 10.1 m
    Michigan (mic) – 1.3 m
    Finnish National Bibliography (nfi) – 1.0 m
    Répertoire International des Sources Musicales (ris) – 1.3 m
    San Francisco Public Library (sfp) – 0.9 m
    Stanford (sta) – 9.4 m
    Szeged (szt) – 1.2 m
    TIB Hannover (tib) – 3.5 m
    Toronto Public Library (tor) – 2.5 m
    catalogue types: union catalogues, national libraries, university libraries, public libraries
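As a quick arithmetic check, the per-catalogue counts above add up to about 126.5 million, consistent with the "126 million" in the title (the slide rounds each figure to 0.1 million), and the two largest union catalogues alone hold roughly 40% of all records. The dictionary simply restates the slide's numbers.

```python
# Record counts in millions, as listed on the slide
SIZES_M = {
    "bay": 27.3, "bzb": 23.1, "col": 6.0, "cer": 6.7, "dnb": 16.7,
    "gen": 1.8, "har": 13.7, "loc": 10.1, "mic": 1.3, "nfi": 1.0,
    "ris": 1.3, "sfp": 0.9, "sta": 9.4, "szt": 1.2, "tib": 3.5, "tor": 2.5,
}

total = round(sum(SIZES_M.values()), 1)
top_two = sorted(SIZES_M, key=SIZES_M.get, reverse=True)[:2]
top_share = sum(SIZES_M[code] for code in top_two) / total
print(f"{total} million records, {' + '.join(top_two)} = {top_share:.0%}")
# prints: 126.5 million records, bay + bzb = 40%
```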
  • 13. 2. measure records
    $ ./validator [options] [file]
    001999999 852 undefined subfield L https://www.loc.gov/...
    002000005 035 undefined subfield 9 https://www.loc.gov/...
    002000005 852 undefined subfield L https://www.loc.gov/...
    002000005 852 undefined subfield L https://www.loc.gov/...
    002000008 035 undefined subfield 9 https://www.loc.gov/…
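The output above lists one issue per line: record identifier, tag, message and a link to the documentation. Assuming whitespace-separated columns (the real output format may differ; check the tool's documentation), a minimal sketch of turning such lines into counts, the input of the aggregation step, could look like this. `parse_issue` and `summarize` are hypothetical helpers.

```python
from collections import Counter


def parse_issue(line: str):
    """Split '<record id> <tag> <message ...> [<url>]' on whitespace."""
    record_id, tag, *rest = line.split()
    url = rest.pop() if rest and rest[-1].startswith("http") else None
    return record_id, tag, " ".join(rest), url


def summarize(lines):
    """Count issues per (tag, message) and the number of distinct records affected."""
    issues = [parse_issue(line) for line in lines if line.strip()]
    per_type = Counter((tag, message) for _, tag, message, _ in issues)
    records = {record_id for record_id, *_ in issues}
    return per_type, len(records)


SAMPLE = """\
001999999 852 undefined subfield L https://www.loc.gov/...
002000005 035 undefined subfield 9 https://www.loc.gov/...
002000005 852 undefined subfield L https://www.loc.gov/...
002000005 852 undefined subfield L https://www.loc.gov/...
002000008 035 undefined subfield 9 https://www.loc.gov/...
"""

per_type, n_records = summarize(SAMPLE.splitlines())
print(per_type.most_common(), n_records)
```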
  • 14. 3. aggregating results – records with issues (%)
    catalogue     all   filtered
    bay         100.0       18.8
    bzb         100.0       76.1
    cer           2.8        2.8
    col          90.4       66.0
    dnb          13.9        0.2
    gen          40.8       27.3
    har         100.0       97.3
    loc          30.5       29.3
    mic          80.8       67.5
    nfi          62.1       58.1
    ris          99.7       57.1
    sfp          82.7       60.4
    sta          92.7       92.5
    szt          30.8       30.6
    tib         100.0      100.0
    tor         100.0       74.2
    filtered = issues excluding the undocumented tags and subfields
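The two columns differ only in which issues count. A minimal sketch of computing both percentages, assuming (hypothetically) that "undocumented tags and subfields" corresponds to issues whose message starts with "undefined field" or "undefined subfield", as in the validator output on the previous slide; `issue_shares` and the sample data are made up for illustration.

```python
# Hypothetical message prefixes standing for "undocumented tags and subfields"
UNDOCUMENTED = ("undefined field", "undefined subfield")


def issue_shares(issues_per_record, n_records):
    """Percentage of records with any issue, and with any issue left after filtering.

    issues_per_record maps a record id to its issue messages; records without
    issues may be omitted, which is why n_records is passed separately.
    """
    with_any = sum(1 for msgs in issues_per_record.values() if msgs)
    with_filtered = sum(
        1 for msgs in issues_per_record.values()
        if any(not msg.startswith(UNDOCUMENTED) for msg in msgs)
    )
    return 100 * with_any / n_records, 100 * with_filtered / n_records


example = {
    "r1": ["undefined subfield L"],                  # only undocumented issues
    "r2": ["undefined subfield 9", "invalid ISBN"],  # one documented issue remains
    "r3": ["non-repeatable field"],
}
print(issue_shares(example, n_records=4))  # (75.0, 50.0)
```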
  • 15. issue types
    issues on record level
    ❏ R1 ambiguous linkage
    ❏ R2 invalid linkage
    ❏ R3 type error
    control field issues
    ❏ C1 invalid code
    ❏ C2 invalid value
    field issues
    ❏ F1 missing reference subfield (880$6)
    ❏ F2 non-repeatable field
    ❏ F3 undefined field
    indicator issues
    ❏ I1 invalid value
    ❏ I2 non-empty value
    ❏ I3 obsolete value
    subfield issues
    ❏ S1 classification
    ❏ S2 invalid ISBN
    ❏ S3 invalid ISSN
    ❏ S4 invalid length
    ❏ S5 invalid value
    ❏ S6 repetition
    ❏ S7 undefined subfield
    ❏ S8 not well-formatted value
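Two of the subfield checks, S2 (invalid ISBN) and S3 (invalid ISSN), come down to check-digit arithmetic. A minimal sketch of the standard ISBN-10, ISBN-13 and ISSN check-digit rules; the function names are hypothetical and the tool may normalise its input differently. Applied to the 020 $a of the example record on slide 3, `3805909810` passes.

```python
def _normalize(value: str) -> str:
    # keep only digits and 'X' (hyphens and spaces are dropped); qualifiers such
    # as "(pbk.)" should be cut off first, since any digit or X in them would stay
    return "".join(ch for ch in value.upper() if ch in "0123456789X")


def is_valid_isbn(value: str) -> bool:
    s = _normalize(value)
    if len(s) == 10 and s[:9].isdigit() and s[9] in "0123456789X":
        # ISBN-10: weights 10..1, the sum must be divisible by 11 (X = 10)
        total = sum((10 - i) * int(ch) for i, ch in enumerate(s[:9]))
        total += 10 if s[9] == "X" else int(s[9])
        return total % 11 == 0
    if len(s) == 13 and s.isdigit():
        # ISBN-13: alternating weights 1 and 3, the sum must be divisible by 10
        total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(s))
        return total % 10 == 0
    return False


def is_valid_issn(value: str) -> bool:
    s = _normalize(value)
    if len(s) != 8 or not s[:7].isdigit() or s[7] not in "0123456789X":
        return False
    # weights 8..2 on the first seven digits; check digit = (11 - sum mod 11) mod 11
    total = sum((8 - i) * int(ch) for i, ch in enumerate(s[:7]))
    check = (11 - total % 11) % 11
    return s[7] == ("X" if check == 10 else str(check))


print(is_valid_isbn("3805909810"))  # True: the 020 $a of the example record
```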
  • 16. number of subfields in catalogues
    catalogue   total    1%   10%
    bay           854   144    51
    bzb           522   144    65
    crl           169    65    39
    col          1862   196    59
    dnb           575   186    97
    gnt           955   122    47
    har          2024   154    49
    loc          1156   128    40
    mic          1233   138    37
    nfi           811   145    54
    ris           138    88    52
    sfp          1046   125    37
    sta          2997   225    64
    szt          1210    74    42
    tib            46    41    35
    tor          1733   163    46
    The tool has 2600+ subfield definitions. total: total number of subfields; 1%: subfields available in at least 1% of the records; 10%: subfields available in at least 10% of the records.
    Top fields (not in the table) – 50%: 13-25 fields, 80%: 4-18 fields, 90%: 0-16 fields
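The 1% and 10% columns count subfields by the share of records they occur in. A minimal sketch of that computation, assuming each record has already been reduced to a collection of `tag$code` keys (a hypothetical representation) and that a key repeated within one record counts once for that record; `coverage_counts` and the 100-record sample are made up.

```python
from collections import Counter


def coverage_counts(records, thresholds=(0.01, 0.10)):
    """Number of distinct subfields, then how many occur in >= each share of records.

    `records` is an iterable of collections of keys such as '245$a'; a key that
    repeats within one record is counted once for that record.
    """
    records = list(records)
    if not records:
        return [0] * (len(thresholds) + 1)
    doc_freq = Counter(key for record in records for key in set(record))
    n = len(records)
    return [len(doc_freq)] + [
        sum(1 for count in doc_freq.values() if count / n >= t) for t in thresholds
    ]


# Made-up example: 100 records
RECORDS = ([{"245$a", "020$a"}] * 50 + [{"245$a"}] * 45
           + [{"245$a", "852$L"}] * 4 + [{"245$a", "852$L", "500$a"}])
print(coverage_counts(RECORDS))  # [4, 4, 2]
```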
  • 19. K-means clustering
    ❏ Spark (Scala)
    ❏ in theory: increasing the number of clusters decreases the distance from the centroids, but after a point this gain is not so big (“elbow effect”)
    ❏ the clusters: a big number of low quality records; small clusters with ‘in between’ quality records; the acceptable average; clusters with good quality records
    Thompson and Traill (2017) http://journal.code4lib.org/articles/12828
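The elbow idea can be illustrated without Spark. The following is a minimal pure-Python stand-in, not the Spark/Scala code used in the project: Lloyd's k-means with a deterministic farthest-point initialisation, reporting the within-cluster sum of squared distances (WCSS) for increasing k. `POINTS` is made-up 2-D data with three well separated groups; in practice each point would be a vector of per-record quality measures.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding: start with the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, within-cluster sum of squares)."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # move each centroid to the mean of its cluster (keep it if the cluster is empty)
        updated = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    wcss = sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)
    return centroids, wcss


# Made-up 2-D data: three well separated groups of four points each
POINTS = [(x + dx, y + dy) for x, y in [(0, 0), (10, 0), (0, 10)]
          for dx, dy in [(0, 0), (1, 0), (0, 1), (1, 1)]]

for k in range(1, 6):
    print(k, round(kmeans(POINTS, k)[1], 2))
```

On this toy data the WCSS drops sharply up to k = 3 and only slightly afterwards, which is the "elbow"; with real quality measures the bend is often much less pronounced.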
  • 20. 4. report (web UI)
  • 24. Finding problems with facets
    Vandenhoeck und Ruprecht
    Vandenhoeck & Ruprecht
    Vandenhoeck u. Ruprecht
    Vandenhoeck
    Vandenhoek & Ruprecht
    Vandenhoek und Ruprecht
    Bandenhoed und Ruprecht
    Vandenhoeck et Ruprecht
    Vandenhoeck & Reprecht
    Vandenhoed und Ruprecht
    V&R unipress
    V&R Unipress
    V & R Unipress
    V & R unipress
    (Vandenhoeck & Ruprecht, est. 1735)
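A facet lists every distinct value with its count, which is how variants like these become visible. As a sketch of a possible next step, grouping them, here is a hypothetical normalisation (connectives `&`, `u.`, `et`, `and` mapped to `und`; case and punctuation ignored) followed by a fuzzy comparison with Python's `difflib` to flag remaining near-duplicates such as typos. This is an illustration, not the method used by the tool; `facet_key`, `group_variants`, `similar_keys` and the 0.85 cutoff are assumptions.

```python
import difflib
import re
from collections import Counter, defaultdict

# Hypothetical normalisation: the connectives '&', 'u.', 'et' and 'and' all
# become 'und'; case and punctuation are ignored
CONNECTIVES = {"&": "und", "u.": "und", "et": "und", "and": "und"}


def facet_key(name: str) -> str:
    tokens = name.lower().replace("&", " & ").split()
    tokens = [CONNECTIVES.get(t, t) for t in tokens]
    return " ".join(re.sub(r"[^\w ]", "", " ".join(tokens)).split())


def group_variants(values):
    """Map each normalised key to a Counter of the raw values that produce it."""
    groups = defaultdict(Counter)
    for value in values:
        groups[facet_key(value)][value] += 1
    return groups


def similar_keys(keys, cutoff=0.85):
    """Keys that survive normalisation but are still close, e.g. typos."""
    keys = sorted(keys)
    return {key: difflib.get_close_matches(key, [k for k in keys if k != key], n=3, cutoff=cutoff)
            for key in keys}


VARIANTS = ["Vandenhoeck und Ruprecht", "Vandenhoeck & Ruprecht", "Vandenhoeck u. Ruprecht",
            "Vandenhoeck et Ruprecht", "Vandenhoek & Ruprecht", "V&R unipress",
            "V&R Unipress", "V & R Unipress", "V & R unipress"]
groups = group_variants(VARIANTS)
for key, raw in groups.items():
    print(key, dict(raw))
print(similar_keys(groups))
```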
  • 25. cataloging frontline
    ❏ intensive backward cataloging (maybe importing?)
    ❏ backward cataloging is still intensive, the tendency continues
    ❏ peak is > 13K
    ❏ 2000-07-10, the “golden day”: 95K new records
    ❏ forward cataloging
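Counts per day like the "golden day" above could be derived from the "date entered on file" in 008/00-05 (yymmdd). A minimal sketch, assuming that this position (rather than 005 or another date) is what the chart is based on, which the slide does not say, and that two-digit years below 50 mean 20yy; `date_entered` and `records_per_day` are hypothetical helpers and the sample values are made up.

```python
from collections import Counter
from datetime import date


def date_entered(field008: str, pivot: int = 50) -> date:
    """Read 008/00-05 (date entered on file, yymmdd) as a date.

    MARC stores only two year digits; years below `pivot` are read as 20yy and
    the rest as 19yy, which is an assumption that has to fit the catalogue.
    """
    yy, mm, dd = int(field008[0:2]), int(field008[2:4]), int(field008[4:6])
    return date(2000 + yy if yy < pivot else 1900 + yy, mm, dd)


def records_per_day(fields008):
    """Count records per entry date, skipping values that are not valid dates."""
    per_day = Counter()
    for value in fields008:
        try:
            per_day[date_entered(value)] += 1
        except ValueError:  # e.g. '000000', blanks or impossible dates
            continue
    return per_day


per_day = records_per_day(["000710s2000", "000710s1999", "031117s2003", "000000"])
print(per_day.most_common(1))  # [(datetime.date(2000, 7, 10), 2)]
```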
  • 26. everything else … at least regarding this project
    code & docs: https://github.com/pkiraly/metadata-qa-marc
    Web UI source code: https://github.com/pkiraly/metadata-qa-marc-web
    Avram Specification (Jakob Voß): http://format.gbv.de/schema/avram/specification
    https://twitter.com/kiru
    peter.kiraly@gwdg.de