Requirements of DARIAH community for a Dataverse repository (SSHOC 2020) – Péter Király
This document summarizes a presentation about developing a Dataverse repository for the SSHOC project. The SSHOC project aims to create an open science cloud for the social sciences and humanities. As part of this, it is developing a Dataverse repository to allow researchers to share and publish datasets according to FAIR principles. The presentation provides an overview of the SSHOC project, introduces Dataverse as a repository software, demonstrates its functionality, and discusses requirements for developing a domain-specific Dataverse for several research communities.
Validating 126 million MARC records (DATeCH 2019) – Péter Király
This document summarizes Péter Király's presentation on validating 126 million MARC records. It discusses ingesting MARC records from various libraries, measuring the records for issues using a validator, and aggregating the results. Common issues found include invalid field codes, values and subfields. The results are analyzed and reported through a web interface to help identify problems and improve record quality.
Measuring Metadata Quality (doctoral defense 2019) – Péter Király
This document summarizes Péter Király's presentation on measuring metadata quality. The presentation discusses the importance of metadata quality and proposes using structural metrics like completeness, multilinguality, and issue detection to approximate overall metadata quality. It presents a framework for flexible and scalable metadata quality assessment that cultural heritage institutions could implement to measure metadata, generate reports, and improve records. The framework is being used to evaluate metadata in Europeana and could help address challenges of assessing metadata quality at large scales with limited resources.
Empirical evaluation of library catalogues (SWIB 2019) – Péter Király
This document summarizes a presentation on empirically evaluating library catalogues using MARC records. It describes ingesting MARC records, measuring them for quality issues, aggregating the results, and working with experts to improve record quality. The tool can validate records, analyze completeness, classifications, authorities, and more. It produces reports on issues and provides links to explore problematic fields and values.
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020) – Péter Király
The document discusses the Dataverse installation in Göttingen, Germany. It is run by the Göttingen eResearch Alliance which includes the University of Göttingen and several research institutes. The Dataverse, called Göttingen Research Online (GRO.data), serves as a general repository for research data from across the Göttingen campus. It has been customized to the local IT infrastructure and collaborates with other data initiatives both within Göttingen and beyond. Future plans include further integration with other services and assessing data quality.
Incubating Göttingen Cultural Analytics Alliance (SUB 2021) – Péter Király
This document proposes forming the Göttingen Cultural Analytics Alliance, a collaboration between several Göttingen institutions to analyze digitized cultural heritage data using computational methods. It discusses potential areas for collaboration, including developing new metadata services, conducting joint research projects, improving education offerings, and coordinating open source software development. Establishing this network could help connect experts, pursue funding opportunities, and strengthen partnerships with other cultural heritage organizations internationally.
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021) – Péter Király
This document summarizes a presentation about developing a quality assessment tool for MARC21 catalogues. The tool allows users to:
1) Ingest MARC21 records
2) Measure the records against definitions in the MARC21 standard
3) Aggregate results and generate reports
4) Evaluate results with cataloging experts to improve record quality
The presentation demonstrates the tool and its ability to identify issues in fields, provide definitions, search terms, and eventually link terms to controlled vocabularies through cooperation with other projects.
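The measurement step above can be sketched as a rule lookup against the MARC21 standard. In this minimal sketch the `MARC21_SUBFIELDS` table is a tiny hand-written excerpt invented for illustration; the actual tool derives its rules from the full MARC21 field and subfield definitions.

```python
# Minimal sketch of rule-based MARC21 validation.
# MARC21_SUBFIELDS is a tiny hand-written excerpt of the standard,
# listing the subfield codes defined for a few fields.
MARC21_SUBFIELDS = {
    "020": {"a", "c", "q", "z"},           # ISBN
    "245": {"a", "b", "c", "h", "n", "p"}  # Title statement
}

def validate_record(record):
    """Return a list of issues found in one parsed record.

    `record` maps a field tag to a list of (subfield code, value) pairs.
    """
    issues = []
    for tag, subfields in record.items():
        if tag not in MARC21_SUBFIELDS:
            issues.append(f"undefined field: {tag}")
            continue
        for code, value in subfields:
            if code not in MARC21_SUBFIELDS[tag]:
                issues.append(f"undefined subfield: {tag}${code}")
            if not value.strip():
                issues.append(f"empty value: {tag}${code}")
    return issues

record = {
    "245": [("a", "Metadata quality:"), ("x", "oops")],  # $x undefined for 245
    "999": [("a", "local field")],                       # tag not in MARC21
}
print(validate_record(record))
```

Aggregating these per-record issue lists over a whole catalogue yields the kind of summary reports the presentation describes.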
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021) – Péter Király
1) The document discusses data quality management and introduces metrics for assessing metadata quality. It provides examples of structural issues found in metadata records and outlines a proposed framework for measuring metadata quality.
2) A key hypothesis is that measuring structural elements can approximate metadata record quality. An organizational proposal suggests forming a metadata quality committee, and a technical proposal is to create a generic tool to measure metadata quality across different schemas.
3) The document demonstrates metadata quality dashboards and encourages cooperation on related open source projects and research on measuring metadata quality.
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022) – Péter Király
The document describes a Metadata Quality Assessment Framework (MQAF) API that can validate JSON, XML, CSV, and MARC data against SHACL-like constraints. The MQAF API implements a subset of SHACL tests to validate data elements, including tests for data types, lengths, patterns, logical rules and more. It provides a Java API and configuration files to define validation rules for different data formats and schemas in an abstracted way.
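The constraint types listed above can be illustrated with a schema-agnostic checker. The rule vocabulary here (minCount, pattern, maxLength) mirrors SHACL's property constraints, but the field names and the configuration format are invented for this example and do not reproduce the actual MQAF syntax.

```python
import re

# Illustrative, schema-agnostic constraint checker in the spirit of
# SHACL-like validation. The rule set below is invented for this
# example; it is not the MQAF configuration format.
RULES = {
    "title": {"minCount": 1, "maxLength": 250},
    "year":  {"pattern": r"^\d{4}$"},
    "id":    {"minCount": 1, "pattern": r"^urn:"},
}

def check(record):
    """Validate a flat record (dict of field -> list of values)."""
    problems = []
    for field, rule in RULES.items():
        values = record.get(field, [])
        if len(values) < rule.get("minCount", 0):
            problems.append(f"{field}: too few values")
        for v in values:
            if "maxLength" in rule and len(v) > rule["maxLength"]:
                problems.append(f"{field}: value too long")
            if "pattern" in rule and not re.match(rule["pattern"], v):
                problems.append(f"{field}: '{v}' does not match pattern")
    return problems

print(check({"title": ["A title"], "year": ["199X"]}))
```

Because the rules live in data rather than code, the same checker applies to any flat record regardless of the source format (JSON, XML, CSV), which is the abstraction the MQAF API provides.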
FRBR: a book history perspective (Bibliodata WG 2022) – Péter Király
This document discusses applying FRBR and related bibliographic models from a book history perspective. It identifies several issues with modeling complex bibliographic relationships, including identifier problems when the same work is referenced in different ways, cardinality issues when modeling ownership relationships, and granularity issues in modeling different levels of bibliographic information. It also discusses technological challenges in applying these models, such as adapting them for different metadata schemas and handling uncertainties. The document suggests greater involvement in standardization, publishing research data in linked open data formats, and sharing data between researchers and heritage organizations to help address these issues.
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022) – Péter Király
This document summarizes a presentation about sustainable research data management using Dataverse. It discusses the Göttingen eResearch Alliance which manages Göttingen Research Online (GRO), including the GRO.data repository for publishing research data. GRO.data uses Dataverse to provide an open repository for research institutions in Göttingen. The presentation provides information on using GRO.data, testing sessions held to try the system, collaborations with other Dataverse groups, contributions to the Dataverse community, and plans for an eResearch Lab and improving metadata quality assessment.
Understanding, extracting and enhancing catalogue data (CE Book history works...) – Péter Király
This document discusses understanding, extracting, and enhancing catalogue data from library records. It describes analyzing publication date and place information, normalizing place names and dates, and linking place names to geographic coordinates. Tables show results of applying these techniques to records from the Austrian National Library, Hungarian Academy of Sciences, and Polish National Library. The document provides references to related analyses, code repositories, and contact information.
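The date normalization step can be sketched as follows. The patterns shown are a simplified, invented subset of what catalogue date strings (e.g. MARC 260$c) actually contain; real records include many more conventions for uncertain and hand-press era dates.

```python
import re

def normalize_date(raw):
    """Extract a four-digit year from a catalogue date string,
    handling common punctuation and placeholder digits such as
    '18--' (century known, year unknown)."""
    raw = raw.strip(" .,[]")
    m = re.search(r"(\d{4})", raw)
    if m:
        return int(m.group(1))
    m = re.search(r"(\d{2})(?:uu|--)", raw)  # e.g. '18--' or '19uu'
    if m:
        return int(m.group(1)) * 100         # century only
    return None                              # e.g. 's.a.' (sine anno)

examples = ["1999.", "[1867]", "cop. 1925", "18--", "s.a."]
print([normalize_date(e) for e in examples])
```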
Measuring cultural heritage metadata quality (Semantics 2017) – Péter Király
This document discusses measuring the quality of cultural heritage metadata. It proposes a generic "Metadata Quality Assurance Framework" tool to measure metadata quality across different schemas. The tool would measure completeness, availability, and other dimensions using structural analysis and by mapping metadata elements to discovery functions. It would provide customizable and scalable quality reports to help data curators improve metadata. The document outlines technical requirements and modules for an open source tool to systematically measure metadata at the record and aggregate level.
Measuring Metadata Quality in Europeana (ADOCHS 2017) – Péter Király
This document discusses measuring metadata quality for records in Europeana. It proposes establishing a Europeana Data Quality Committee and developing a "Metadata Quality Assurance Framework" tool to measure metadata quality across Europeana's large collection. Key metrics would include completeness, field cardinality, uniqueness, multilinguality and conformance to requirements. The tool would provide customizable quality measurements, reports, and recommendations to help improve metadata quality.
Measuring library catalogs (ADOCHS 2017) – Péter Király
This document discusses measuring library catalogs and record validation. It begins with an introduction to MARC format and examples of MARC records. It then covers validating individual records and generating summaries of validation errors. Other validation options and viewing/filtering records are described. Methods for calculating completeness, clustering records, indexing with Solr, and finding problems with facets are also summarized. The document concludes with discussions of using MARC data in digital humanities, reproducibility, available catalogs to measure, and future work.
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018) – Péter Király
This document discusses evaluating data quality in Europeana by developing metrics for multilinguality. It identifies processes that contribute to multilinguality in metadata and proposes dimensions like completeness, consistency, conformity and accessibility to quantify multilinguality. Results of applying these metrics to Europeana data are presented, including the number of languages, language-tagged literals and their distribution. A demo of the analysis is also provided. Future work includes embedding the metrics into Europeana's workflow and further evaluation.
Researching metadata quality (ORKG 2018) – Péter Király
This document discusses metadata quality and metrics for evaluating metadata. It defines metadata as structured information that describes something else. Metadata quality is described as fulfillment of specifications and goals. General metrics for metadata quality include completeness, accuracy, consistency, objectiveness, appropriateness, and correctness. For linked data, additional dimensions and metrics are proposed such as accessibility, intrinsic qualities, contextual relevance, and representational properties. Good metrics are said to be clear, realistic, measurable, discriminating, and universal. The document discusses using RDFUnit, SHACL and ShEx for evaluating linked data and using clustering algorithms like K-means to analyze metadata quality.
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018) – Péter Király
This document discusses metadata quality in cultural heritage institutions. It provides examples of common metadata issues such as inconsistent date formats, non-informative titles, multilinguality problems, and copy-and-paste cataloging. It also discusses metrics for measuring metadata quality, such as completeness, accuracy, and consistency. Additionally, it proposes using a "Metadata Quality Assurance Framework" tool to measure metadata quality at large scale, generate reports for data curators, and help improve metadata quality over time.
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018) – Péter Király
The document discusses measuring metadata quality in Europeana. It proposes using metrics like completeness to assess metadata records on a scale from good to bad. It suggests developing a Metadata Quality Assessment Framework tool to measure structural elements and functional requirements to approximate metadata quality. The tool would generate reports and be adaptable, scalable, and open source. It would involve ingesting metadata via APIs, analyzing it using Hadoop and Spark, and presenting results through a web interface.
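Completeness itself is a simple ratio. A minimal version, assuming a flat record and a fixed field list (the fields below are a made-up stand-in for a real schema such as EDM):

```python
# Minimal completeness metric: the share of schema fields that have
# at least one non-empty value in a record. SCHEMA_FIELDS is an
# invented stand-in for a real schema's field list.
SCHEMA_FIELDS = ["title", "creator", "date", "subject", "rights"]

def completeness(record):
    filled = sum(
        1 for f in SCHEMA_FIELDS
        if any(v.strip() for v in record.get(f, []))
    )
    return filled / len(SCHEMA_FIELDS)

record = {"title": ["Mona Lisa"], "creator": ["Leonardo"], "date": [""]}
print(completeness(record))  # 2 of 5 fields filled -> 0.4
```

The framework refines this basic ratio by weighting fields according to which user tasks (search, identify, display) they support.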
This document discusses measuring library catalogs and introduces the MARC (MAchine-Readable Cataloging) format. It provides an example MARC record and explains the different positional fields in the Leader and 008 fields. It also covers MARC data fields, different MARC versions, and addressing MARC elements using MARCspec. The second part discusses validating MARC records, including validating individual records, getting a summary of errors, and specifying the MARC version and output format. It also covers processing a subset of records and fixing ALEPHSEQ placeholders.
This document provides an introduction to the Shapes Constraint Language (SHACL) for validating RDF data against a set of rules defined in a shape graph. It demonstrates how to use SHACL to validate data using the shaclvalidate.sh tool and explains common SHACL constraints for defining minimum and maximum counts, expected data types, allowed values, and more. Core SHACL concepts covered include shapes, properties, constraints, and validating RDF data.
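A small shape of the kind such a shape graph contains, combining the constraints mentioned above (the `ex:` namespace and property names are invented for illustration):

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .

ex:BookShape a sh:NodeShape ;
    sh:targetClass ex:Book ;          # applies to every ex:Book in the data
    sh:property [
        sh:path ex:title ;
        sh:minCount 1 ;               # title is mandatory
        sh:maxCount 1 ;               # and must not repeat
        sh:datatype xsd:string ;
    ] ;
    sh:property [
        sh:path ex:year ;
        sh:datatype xsd:gYear ;       # allowed value type
    ] .
```

Running a validator such as shaclvalidate.sh against a data graph with this shape graph produces a validation report listing every node that violates a constraint.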
Measuring Metadata Quality (ELAG, 2018) – Péter Király
This document discusses measuring metadata quality by analyzing structural elements of metadata records. It proposes using generic metrics like completeness, uniqueness and data type conformance to approximate a record's quality. Measuring requirements driven by specific user tasks and discovery scenarios is also suggested. The goals are to improve metadata, ensure reliable system functions, and propagate best practices. A batch processing API and further human analysis are presented as next steps.
Measuring completeness as metadata quality metric in Europeana (DH 2017) – Péter Király
This document summarizes a presentation about measuring metadata completeness as a quality metric for records in Europeana, a digital library of over 53 million items from European cultural heritage institutions. The presentation proposes developing a Metadata Quality Assurance Framework tool to measure completeness at the overall, collection, and record level based on structural elements and support of functional requirements. Metrics would help identify records needing improvement and support improving metadata quality in Europeana.
Nothing is created, nothing is lost, everything changes (ELAG, 2017) – Péter Király
This document discusses measuring and visualizing data quality in Europeana. It proposes establishing a Europeana Data Quality Committee to analyze metadata quality, develop metrics and problem definitions. Metrics could measure completeness, availability, licensing and other dimensions. Problems like duplicate titles and descriptions are defined. A flexible tool is proposed to measure metadata quality across schemas through APIs and reporting. The results would help improve metadata and documentation.
Towards an extensible measurement of metadata quality (DATeCH 2017) – Péter Király
This document discusses measuring metadata quality by analyzing structural elements of metadata records. It proposes that by measuring properties like field cardinality, uniqueness, multilinguality, and presence of non-informative values, the quality of metadata records can be predicted. The document outlines various metrics that could be measured at the record, collection, and overall dataset level. It also describes how measurements could be aggregated and visualized to identify outliers and opportunities for improvement.
Stiller & Király, Multilinguality of Metadata – Péter Király
1. The document discusses measuring the multilingual degree of metadata in Europeana, a platform for cultural heritage materials.
2. It proposes a "multilingual score" to quantify the multilinguality of metadata based on factors like number of languages, language tags, and literals per language.
3. It describes implementing systems to automatically calculate multilingual scores from Europeana metadata and visualize the results.
Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana's... – Péter Király
1. The document discusses measuring the multilingual degree of metadata in Europeana, a platform providing access to over 54 million digital cultural heritage objects from over 50 languages.
2. It presents a multilingual score for metadata based on factors like presence of language tags, number of languages per field, and links to multilingual vocabularies.
3. The score is implemented by processing Europeana metadata using techniques like Apache Spark and visualized through APIs and tools to analyze the distribution of languages and identify areas for improvement.
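A toy version of such a score for a single field, rewarding language tags and the number of distinct languages. The weighting and normalization here are invented for illustration; the published score differs in detail and is computed at scale with Apache Spark.

```python
# Toy multilingual score for one metadata field. Values are given as
# (language_tag_or_None, literal) pairs. The formula below is an
# invented simplification of the published score.
def multilingual_score(values):
    if not values:
        return 0.0
    tagged = [lang for lang, _ in values if lang]
    if not tagged:
        return 0.0                    # no language tags at all
    tag_ratio = len(tagged) / len(values)
    languages = len(set(tagged))
    return tag_ratio * languages      # more languages -> higher score

field = [("en", "painting"), ("de", "Gemälde"), (None, "object")]
print(multilingual_score(field))
```

An untagged literal lowers the score, and translating a value into an additional language raises it, which matches the behaviours the metric is meant to reward.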
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)Péter Király
1) The document discusses data quality management and introduces metrics for assessing metadata quality. It provides examples of structural issues found in metadata records and outlines a proposed framework for measuring metadata quality.
2) A key hypothesis is that measuring structural elements can approximate metadata record quality. An organizational proposal suggests forming a metadata quality committee, and a technical proposal is to create a generic tool to measure metadata quality across different schemas.
3) The document demonstrates metadata quality dashboards and encourages cooperation on related open source projects and research on measuring metadata quality.
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)Péter Király
The document describes a Metadata Quality Assessment Framework (MQAF) API that can validate JSON, XML, CSV, and MARC data against SHACL-like constraints. The MQAF API implements a subset of SHACL tests to validate data elements, including tests for data types, lengths, patterns, logical rules and more. It provides a Java API and configuration files to define validation rules for different data formats and schemas in an abstracted way.
FRBR a book history perspective (Bibliodata WG 2022)Péter Király
This document discusses applying FRBR and related bibliographic models from a book history perspective. It identifies several issues with modeling complex bibliographic relationships, including identifier problems when the same work is referenced in different ways, cardinality issues when modeling ownership relationships, and granularity issues in modeling different levels of bibliographic information. It also discusses technological challenges in applying these models, such as adapting them for different metadata schemas and handling uncertainties. The document suggests greater involvement in standardization, publishing research data in linked open data formats, and sharing data between researchers and heritage organizations to help address these issues.
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)Péter Király
This document summarizes a presentation about sustainable research data management using Dataverse. It discusses the Göttingen eResearch Alliance which manages Göttingen Research Online (GRO), including the GRO.data repository for publishing research data. GRO.data uses Dataverse to provide an open repository for research institutions in Göttingen. The presentation provides information on using GRO.data, testing sessions held to try the system, collaborations with other Dataverse groups, contributions to the Dataverse community, and plans for an eResearch Lab and improving metadata quality assessment.
Understanding, extracting and enhancing catalogue data (CE Book history works...Péter Király
This document discusses understanding, extracting, and enhancing catalogue data from library records. It describes analyzing publication date and place information, normalizing place names and dates, and linking place names to geographic coordinates. Tables show results of applying these techniques to records from the Austrian National Library, Hungarian Academy of Sciences, and Polish National Library. The document provides references to related analyses, code repositories, and contact information.
Measuring cultural heritage metadata quality (Semantics 2017)Péter Király
This document discusses measuring the quality of cultural heritage metadata. It proposes a generic "Metadata Quality Assurance Framework" tool to measure metadata quality across different schemas. The tool would measure completeness, availability, and other dimensions using structural analysis and by mapping metadata elements to discovery functions. It would provide customizable and scalable quality reports to help data curators improve metadata. The document outlines technical requirements and modules for an open source tool to systematically measure metadata at the record and aggregate level.
Measuring Metadata Quality in Europeana (ADOCHS 2017)Péter Király
This document discusses measuring metadata quality for records in Europeana. It proposes establishing a Europeana Data Quality Committee and developing a "Metadata Quality Assurance Framework" tool to measure metadata quality across Europeana's large collection. Key metrics would include completeness, field cardinality, uniqueness, multilinguality and conformance to requirements. The tool would provide customizable quality measurements, reports, and recommendations to help improve metadata quality.
Measuring library catalogs (ADOCHS 2017)Péter Király
This document discusses measuring library catalogs and record validation. It begins with an introduction to MARC format and examples of MARC records. It then covers validating individual records and generating summaries of validation errors. Other validation options and viewing/filtering records are described. Methods for calculating completeness, clustering records, indexing with Solr, and finding problems with facets are also summarized. The document concludes with discussions of using MARC data in digital humanities, reproducibility, available catalogs to measure, and future work.
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Péter Király
This document discusses evaluating data quality in Europeana by developing metrics for multilinguality. It identifies processes that contribute to multilinguality in metadata and proposes dimensions like completeness, consistency, conformity and accessibility to quantify multilinguality. Results of applying these metrics to Europeana data are presented, including the number of languages, language-tagged literals and their distribution. A demo of the analysis is also provided. Future work includes embedding the metrics into Europeana's workflow and further evaluation.
Researching metadata quality (ORKG 2018)Péter Király
This document discusses metadata quality and metrics for evaluating metadata. It defines metadata as structured information that describes something else. Metadata quality is described as fulfillment of specifications and goals. General metrics for metadata quality include completeness, accuracy, consistency, objectiveness, appropriateness, and correctness. For linked data, additional dimensions and metrics are proposed such as accessibility, intrinsic qualities, contextual relevance, and representational properties. Good metrics are said to be clear, realistic, measurable, discriminating, and universal. The document discusses using RDFUnit, SHACL and ShEx for evaluating linked data and using clustering algorithms like K-means to analyze metadata quality.
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)Péter Király
This document discusses metadata quality in cultural heritage institutions. It provides examples of common metadata issues such as inconsistent date formats, non-informative titles, multilinguality problems, and copy-and-paste cataloging. It also discusses metrics for measuring metadata quality, such as completeness, accuracy, and consistency. Additionally, it proposes using a "Metadata Quality Assurance Framework" tool to measure metadata quality at large scale, generate reports for data curators, and help improve metadata quality over time.
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)Péter Király
The document discusses measuring metadata quality in Europeana. It proposes using metrics like completeness to assess metadata records on a scale from good to bad. It suggests developing a Metadata Quality Assessment Framework tool to measure structural elements and functional requirements to approximate metadata quality. The tool would generate reports and be adaptable, scalable, and open source. It would involve ingesting metadata via APIs, analyzing it using Hadoop and Spark, and presenting results through a web interface.
This document discusses measuring library catalogs and introduces MARC (MAchine Readable Catalog) format. It provides an example MARC record and explains the different positional fields in the Leader and 008 fields. It also covers MARC data fields, different MARC versions, and addressing MARC elements using MARCspec. The second part discusses validating MARC records, including validating individual records, getting a summary of errors, and specifying the MARC version and output format. It also covers processing a subset of records and fixing ALEPHSEQ placeholders.
This document provides an introduction to the Shapes Constraint Language (SHACL) for validating RDF data against a set of rules defined in a shape graph. It demonstrates how to use SHACL to validate data using the shaclvalidate.sh tool and explains common SHACL constraints for defining minimum and maximum counts, expected data types, allowed values, and more. Core SHACL concepts covered include shapes, properties, constraints, and validating RDF data.
Measuring Metadata Quality (ELAG, 2018)Péter Király
This document discusses measuring metadata quality by analyzing structural elements of metadata records. It proposes using generic metrics like completeness, uniqueness and data type conformance to approximate a record's quality. Measuring requirements driven by specific user tasks and discovery scenarios is also suggested. The goals are to improve metadata, ensure reliable system functions, and propagate best practices. A batch processing API and further human analysis are presented as next steps.
Measuring completeness as metadata quality metric in Europeana (DH 2017)Péter Király
This document summarizes a presentation about measuring metadata completeness as a quality metric for records in Europeana, a digital library of over 53 million items from European cultural heritage institutions. The presentation proposes developing a Metadata Quality Assurance Framework tool to measure completeness at the overall, collection, and record level based on structural elements and support of functional requirements. Metrics would help identify records needing improvement and support improving metadata quality in Europeana.
Nothing is created, nothing is lost, everything changes (ELAG, 2017)Péter Király
This document discusses measuring and visualizing data quality in Europeana. It proposes establishing a Europeana Data Quality Committee to analyze metadata quality, develop metrics and problem definitions. Metrics could measure completeness, availability, licensing and other dimensions. Problems like duplicate titles and descriptions are defined. A flexible tool is proposed to measure metadata quality across schemas through APIs and reporting. The results would help improve metadata and documentation.
Towards an extensible measurement of metadata quality (DATeCH 2017)Péter Király
This document discusses measuring metadata quality by analyzing structural elements of metadata records. It proposes that by measuring properties like field cardinality, uniqueness, multilinguality, and presence of non-informative values, the quality of metadata records can be predicted. The document outlines various metrics that could be measured at the record, collection, and overall dataset level. It also describes how measurements could be aggregated and visualized to identify outliers and opportunities for improvement.
Stiller & Király, Multilinguality of MetadataPéter Király
1. The document discusses measuring the multilingual degree of metadata in Europeana, a platform for cultural heritage materials.
2. It proposes a "multilingual score" to quantify the multilinguality of metadata based on factors like number of languages, language tags, and literals per language.
3. It describes implementing systems to automatically calculate multilingual scores from Europeana metadata and visualize the results.
Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana‘s...Péter Király
1. The document discusses measuring the multilingual degree of metadata in Europeana, a platform providing access to over 54 million digital cultural heritage objects in over 50 languages.
2. It presents a multilingual score for metadata based on factors like presence of language tags, number of languages per field, and links to multilingual vocabularies.
3. The score is implemented by processing Europeana metadata using techniques like Apache Spark and visualized through APIs and tools to analyze the distribution of languages and identify areas for improvement.
Magyar irodalom idegen nyelven (BTK ITI 2021)
1. Hungarian Literature in Foreign Languages [Magyar irodalom idegen nyelven]:
a data-centric analysis of Tibor Demeter and Mrs. Tibor Demeter's Bibliographia Hungarica
Péter Király¹ – András Kiséry²
¹Göttingen eResearch Alliance, ²The City College of New York
BTK Institute for Literary Studies (Irodalomtudományi Intézet),
seminar series "Digital Methods in Support of Literary Studies"
21 September 2021
https://bit.ly/patterns-iti2021
8. UNESCO: Index Translationum (1978–?)
In book form from 1932, digitized since 1978: a bibliography of books published in translation
http://www.unesco.org/xtrans/
9. Research background: world-systems theory in translation studies
Heilbron, Johan. 1999. "Towards a Sociology of Translation: Book Translations as a Cultural World-System." European Journal of Social Theory 2 (4): 429-444.
See also:
Heilbron, Johan, and Gisèle Sapiro. 2016. "Translation: Economic and Sociological Perspectives." In Victor Ginsburgh and Shlomo Weber (eds), The Palgrave Handbook of Economics and Language. Basingstoke, UK: Palgrave Macmillan, pp. 373-402.
van Es, Nicky, and Johan Heilbron. 2015. "Fiction from the Periphery: How Dutch Writers Enter the Field of English-Language Literature." Cultural Sociology 9: 296-319.
10. Three novelists between the two world-systems: an example
20. How do we build a database from the digitized bibliography?
21. The bibliography as research data: precedents and parallels
❏ DARIAH Bibliographical Data working group
❏ the literary studies institutes of the Czech and Polish Academies of Sciences (Vojtěch Malínek, Tomasz Umerle)
❏ University of Helsinki: Computational History Group (Mikko Tolonen)
❏ Róbert Péter (University of Szeged): AVOBMAT
❏ the "Collections as Data" movement (Thomas Padilla)
   ❏ its European counterpart: the Heritage Data Reuse Charter (see Czifra-Tóth–Romary, 2020)
❏ library reference models
   ❏ IFLA FRBR (1997)
   ❏ IFLA Library Reference Model (2018)
   ❏ on its philological implications see Káldos (2021)
22. Data structure (adatszerkezet)
Fields: id, kotet (volume), szerzo (author), magyar_cim (Hungarian title), nyelv (language), idegen_cim (foreign title), fordito (translator), megjelenes (publication), megjegyzes (note)

Example record 6613:
  kotet: 4. Jakab István−József Jolán
  szerzo: ILLYÉS Gyula
  magyar_cim: A puszták népe
  nyelv: lengyel (Polish)
  idegen_cim: LUD PUSZTY
  fordito: Tłumaczyła z węgierskiego Ella Maria Sperling. Wyjątki z piesni kuruckich i wiersza Petöfiego tłumaczył Aleksander Rymkiewicz. Poezje ludowe tłumaczyła Camilla Mondral.
  megjelenes: Warszawa, 1954. Czytelnik 8° 278(1) str.

Example record 17300:
  kotet: 9. Petőfi (A majtényi síkon − Az volt a nagy, nagy munka)
  szerzo: PETŐFI Sándor
  magyar_cim: A négyökrös szekér
  nyelv: lengyel (Polish)
  idegen_cim: NOC BYŁA JASNA (Részlet)
  fordito: Tłumaczył Aleksander Rymkiewicz
  megjelenes: Warszawa, 1954. Czytelnik
  megjegyzes: "Lud Puszty" str. 30.
23. Data cleaning (adattisztítás)
(the same two example records as on slide 22, now with derived fields)
❏ hely (place) and egységesített hely (normalized place), derived from megjelenes
❏ év (year) and normalizált év (normalized year), derived from megjelenes
❏ egységesítés (normalization)
❏ bibliográfiai tétel típusa (type of the bibliographic item)
24. Data cleaning (adattisztítás), continued
(the same records and derived fields as on slide 23, plus one more)
❏ befoglaló mű (containing work)
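A minimal sketch of the kind of cleaning step shown above, assuming the raw `megjelenes` field holds place, year and publisher together (the function names and regular expression are illustrative, not from the project's scripts):

```python
import re
from typing import Optional

def normalize_year(megjelenes: str) -> Optional[int]:
    """Extract a four-digit publication year from the raw 'megjelenes' field."""
    match = re.search(r"\b(1[89]\d\d)\b", megjelenes)
    return int(match.group(1)) if match else None

def extract_place(megjelenes: str) -> Optional[str]:
    """Take the text before the first comma as the raw place of publication."""
    head = megjelenes.split(",", 1)[0].strip()
    return head or None

raw = "Warszawa, 1954. Czytelnik 8° 278(1) str."
print(extract_place(raw), normalize_year(raw))  # Warszawa 1954
```

The extracted raw place would then still need authority control (the "egységesített hely" step) against a gazetteer.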
25. Data enrichment (adatgazdagítás)
(the same records and derived fields as on slide 24)
External sources used for enrichment: Geonames.org (via its API) and Wikidata (via SPARQL)
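The Wikidata leg of the enrichment could look like this sketch: the endpoint URL and the identifiers (P31 "instance of", Q5 "human", P569 "date of birth") are real Wikidata ones, but the query itself is an illustrative assumption, not the project's actual script:

```python
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"  # public SPARQL endpoint

def author_query(name: str) -> str:
    """Build a SPARQL query that looks up a person by their Hungarian label."""
    return f"""
    SELECT ?person ?personLabel ?birth WHERE {{
      ?person wdt:P31 wd:Q5 ;                    # instance of: human
              rdfs:label "{name}"@hu .
      OPTIONAL {{ ?person wdt:P569 ?birth . }}   # date of birth, if present
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en,hu". }}
    }}
    """

query = author_query("Illyés Gyula")
# the query string can then be sent to WIKIDATA_ENDPOINT
```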
26. Data model: part-whole relations (isPartOf)
Entity types: W = Work (mű), E = Expression (kifejezési forma), M = Manifestation (megjelenési forma)
Example (JÓKAI Mór):
❏ W "Hatvan perc" isPartOf W "A kőszívű ember fiai"; W "Pusztai párbaj" isPartOf W "Sárga rózsa"
❏ E "SOIXANTE MINUTES. Traduit par Jean Hankiss" and E "DUEL DANS LA PUSZTA. Traduit par Jean Hankiss" are the French expressions of these works
❏ M "SOIXANTE MINUTES." and M "DUEL DANS LA PUSZTA." (type 1 manifestations) are isPartOf the type 2 manifestation, i.e. the containing manifestation (befoglaló manifesztáció) M "Anthologie de la Prose Hongroise" (Paris, 1938.)
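The slide's part-whole relations can be expressed with a minimal data structure; this sketch uses hypothetical class and field names, not the project's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entity:
    """A bibliographic entity: Work (W), Expression (E) or Manifestation (M)."""
    kind: str
    title: str
    is_part_of: Optional["Entity"] = None  # the part-whole (isPartOf) relation

# a Work that is an excerpt of a larger Work
novel   = Entity("W", "JÓKAI Mór: A kőszívű ember fiai")
excerpt = Entity("W", "JÓKAI Mór: Hatvan perc", is_part_of=novel)

# the excerpt's French Expression, and its Manifestation inside the anthology
expression = Entity("E", "SOIXANTE MINUTES. Traduit par Jean Hankiss")
anthology  = Entity("M", "Anthologie de la Prose Hongroise (Paris, 1938.)")
manifest   = Entity("M", "SOIXANTE MINUTES.", is_part_of=anthology)
```

Keeping isPartOf as a link between entities of the same type mirrors the two relation chains on the slide: work-to-work and manifestation-to-containing-manifestation.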
27. Types of bibliographic items (items / authors)
1  monographic works: 4003 / 494
2  collections (poetry volumes, short story volumes, periodicals): 1560 / 1240
2.1  single-author volumes: 901 / 140
2.2  multi-author volumes (within this: collections of Hungarian material only; collections drawn from several source-language literatures): 643 / 1196
2.3  collected volumes not yet categorized: 181 / 39
3  works published in collections (2.1, 2.2, 2.3): 61541 / 1240
4  works published in periodicals: 2798 / 431
5  records not yet processed or identified: 1560 / 325
28. False Hungarian authors
(name in the bibliography → resolution)
❏ Barényi Olga → Olga Barényi, a Czech writer
❏ Békessy János → the original name of Hans Habe, a Swiss-American writer of Hungarian origin
❏ Horváth Ödön → Ödön von Horváth
❏ Petry Anna → Ann Petry, an African-American writer
❏ Polnay Péter → Peter de Polnay, an English author of Hungarian origin
38. Translations published in collections, 1948-1978 (chart):
27161 items in total, of which 8967 are in Russian or in the other languages of the Soviet Union.
39. The share of Russian versus the languages of the other Soviet republics within the different types of bibliographic items (publications from 1948-1978).
(Chart panels: "contents of sub-units: anthologies, periodicals, collected volumes" and "standalone volumes".)
40. A total of 3163 monographs and book-length collections translated from Hungarian appear in the bibliography for this period. 43% of them are attributed to authors represented by at least 30 books; these authors are shown by name in the chart.
41. Of the 23510 items, 23% are by Petőfi, and 35% are works by authors present with fewer than 100 texts. (chart)
48. 1958
Párizs (Paris): DÉRY Tibor, FÜST Milán, JÓZSEF Attila, MÁRAI Sándor, MOLNÁR Ferenc, PASSUTH László, SZATHMÁRI Sándor (in Esperanto)
Moszkva (Moscow): ADY Endre, BARABÁS Tibor, HELTAI Jenő, ILLÉS Béla, JÓKAI Mór, JÓZSEF Attila, MOLNÁR Ferenc, MÓRICZ Zsigmond, NÉMETH László, PETŐFI Sándor, TABI László, URBÁN Ernő
50. Methodological notes
❏ MySQL → CSV file
❏ R ↔ OpenRefine
❏ visualization: R, Google Charts
❏ data enrichment: Wikidata SPARQL, Geonames API
❏ manual data checking: library catalogues, bibliographies
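The MySQL → CSV → analysis step can be sketched as follows (the presentation used R; this Python stand-in and the inline sample rows are illustrative only, with field names taken from the data-structure slide):

```python
import csv
from collections import Counter
from io import StringIO

# A two-row stand-in for the exported bibliography CSV; in practice this
# would be the dump produced from the MySQL database.
SAMPLE = """id,szerzo,nyelv,megjelenes
6613,ILLYÉS Gyula,lengyel,"Warszawa, 1954."
17300,PETŐFI Sándor,lengyel,"Warszawa, 1954."
"""

def items_per_language(fp) -> Counter:
    """Count bibliography items per target language ('nyelv' field)."""
    return Counter(row["nyelv"] for row in csv.DictReader(fp))

counts = items_per_language(StringIO(SAMPLE))
print(counts["lengyel"])  # 2
```

Aggregations like this feed the per-language charts shown earlier; the same CSV can be loaded into R or round-tripped through OpenRefine for cleaning.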
51. Plans and contact
❏ further data cleaning: translators, works, genres
❏ an English-language article
András Kiséry @ akisery@ccny.cuny.edu,
w https://www.ccny.cuny.edu/profiles/andras-kisery
Péter Király @ pkiraly@gwdg.de,
w https://pkiraly.github.io/about, @kiru
data: https://doi.org/10.25625/5JFAMK
scripts: https://github.com/pkiraly/patterns-of-translations
slides: https://bit.ly/patterns-iti2021