EVA/Minerva 2016
Integration and Retrieval of
Heterogeneous Archival Metadata
CONNECTING
COLLECTIONS
Kepa J. Rodriguez – Archives Yad Vashem
09/11/2016
Outline
● Data integration in the first phase of the project
● Our current integration approach
● Retrieval of data using controlled vocabularies
● Development of the EHRI controlled vocabularies
Data integration in the first phase of the project
● Holding institutions delivered data in very different formats:
  ● XML, text files, CSV, JSON, etc.
● Ingestion into the portal was done case by case
  ● We interpreted each institution's data model and mapped it to our model
  ● Sometimes without help from the institution
  ● Lots of data entered by hand
● Process not sustainable; it cannot be repeated
● No automatic updates are possible
  ● If an institution updates content, data has to be updated by hand
● Other problems: infrastructure, persistent identifiers, etc.
Proposal for the second phase of the project
● Data conversion
● Data publication and synchronization
● Data ingestion
Data conversion
● Conversion tool: different data formats into EAD
  ● XML, JSON, CSV...
● Generic transformation
  ● Useful for a significant number of institutions
● Reusable functions, such as mappings of specific fields of an institution's export format into EAD
● Utilities to configure specific transformations
● Validation of the output:
  ● Machine validation: XML validation protocols
    ● Schematron, RELAX NG (RNG)
  ● Human validation: HTML preview including mark-up for validation errors
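The conversion step above can be illustrated with a minimal Python sketch. It assumes a hypothetical CSV export with columns `ref_code`, `title` and `extent`; the column names, the `FIELD_MAP` mapping and the helper functions are illustrative only and are not taken from the EHRI conversion tool. The `missing_required` check is a very small stand-in for machine validation, not a replacement for Schematron or RELAX NG.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping from one institution's CSV columns to EAD <did> elements.
FIELD_MAP = {
    "ref_code": "unitid",
    "title": "unittitle",
    "extent": "physdesc",
}


def row_to_archdesc(row, field_map=FIELD_MAP, level="file"):
    """Build a minimal <archdesc> element from one CSV row."""
    archdesc = ET.Element("archdesc", {"level": level})
    did = ET.SubElement(archdesc, "did")
    for column, ead_tag in field_map.items():
        value = (row.get(column) or "").strip()
        if value:  # skip empty cells instead of emitting empty elements
            ET.SubElement(did, ead_tag).text = value
    return archdesc


def missing_required(archdesc, required=("unitid", "unittitle")):
    """List required <did> children that are absent (stand-in for validation)."""
    did = archdesc.find("did")
    return [tag for tag in required if did.find(tag) is None]


sample_csv = (
    "ref_code,title,extent\n"
    "M.49.E,Testimonies of Holocaust Survivors,6845 files\n"
)
rows = list(csv.DictReader(io.StringIO(sample_csv)))
element = row_to_archdesc(rows[0], level="subgrp")
print(ET.tostring(element, encoding="unicode"))
print(missing_required(element))
```

Keeping the mapping in a plain dictionary is one way to make it reusable: a second institution with a similar export would only need a different `FIELD_MAP`.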
EAD File sample (1)
<archdesc level="subgrp">
<did>
<unitid>M.49.E</unitid>
<unittitle encodinganalog="3.1.2">Testimonies of Holocaust Survivors collected by the
Central Jewish Historical Commission in Poland, 1944-1947</unittitle>
<physdesc encodinganalog="3.1.5">6845 files</physdesc>
<langmaterial>
<language langcode="deu" encodinganalog="3.4.3">German</language>
<language langcode="pol" encodinganalog="3.4.3">Polish</language>
<language langcode="yid" encodinganalog="3.4.3">Yiddish</language>
</langmaterial>
<repository>
<corpname>ארכיון יד ושם / Yad Vashem Archives</corpname>
</repository>
</did>
<scopecontent encodinganalog="3.3.1">
<p>The collection consists of approximately 7,200 testimonies collected by the
Centralna Żydowska Komisja Historyczna (Central Jewish Historical Committee) in
Poland during its active years, 1944-1947.
…..
as well as testimonies from survivors who fought in partisan units and survivors who
were in hiding.</p>
</scopecontent>
…....
EAD File sample (2)
…...
<originalsloc encodinganalog="3.5.1">
<p>ZYDOWSKI INSTYTUT HISTORYCZNY - ZIH, WARSZAWA, POLAND</p>
</originalsloc>
…...
<controlaccess>
<geogname>Poland</geogname>
<geogname>Warsaw</geogname>
</controlaccess>
<controlaccess>
<subject>Persecution of Jews</subject>
<subject>Testimonies, Biographies</subject>
<subject>Holocaust survivors</subject>
</controlaccess>
<controlaccess>
<corpname>Centralna Żydowska Komisja Historyczna</corpname>
</controlaccess>
</archdesc>
Data publication and synchronization
● We plan to use two data publication protocols:
  ● OAI-PMH: one of the first protocols for publication of data
    ● Publication of data in different formats: Dublin Core (default), EAD, etc.
    ● OAI-PMH servers are not easy for small archives to implement and maintain
    ● But we want to implement a client for institutions that already use it
  ● ResourceSync: a new protocol
    ● Based on Sitemaps
    ● Data can be published on the web page of the institution
    ● Higher security
    ● Uses sitemaps to expose changes and updates
    ● Only modified and new data will be transferred to the portal
● Both are standard protocols of the Open Archives Initiative
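As a rough illustration of how a sitemap-based change list lets the portal transfer only new and modified data, the following Python sketch parses a small ResourceSync-style change list and keeps the entries changed after the last synchronization. The URLs, dates and the `changes_since` helper are invented for the example. It is simplified in that it assumes the change time is carried in `<lastmod>` and that all timestamps are UTC in the same format, so they can be compared as strings.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# Illustrative change list; locations and timestamps are made up.
CHANGE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2016-11-01T00:00:00Z"/>
  <url>
    <loc>http://archive.example.org/ead/a.xml</loc>
    <lastmod>2016-11-02T09:00:00Z</lastmod>
    <rs:md change="created"/>
  </url>
  <url>
    <loc>http://archive.example.org/ead/b.xml</loc>
    <lastmod>2016-11-08T10:00:00Z</lastmod>
    <rs:md change="updated"/>
  </url>
  <url>
    <loc>http://archive.example.org/ead/c.xml</loc>
    <lastmod>2016-11-09T12:30:00Z</lastmod>
    <rs:md change="deleted"/>
  </url>
</urlset>"""


def changes_since(change_list_xml, last_sync):
    """Return (location, change type) pairs modified after last_sync."""
    root = ET.fromstring(change_list_xml)
    changes = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS)
        md = url.find("rs:md", NS)
        change = md.get("change") if md is not None else None
        # Same-format UTC timestamps sort correctly as plain strings.
        if lastmod > last_sync:
            changes.append((loc, change))
    return changes


for loc, change in changes_since(CHANGE_LIST, "2016-11-05T00:00:00Z"):
    print(change, loc)
```

A harvester built along these lines would fetch only the listed locations and treat `deleted` entries as removals, instead of re-importing the whole collection.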
Data ingestion
● After data is ingested into the portal, it will receive a permanent URL:
  ● Formal protocol is in progress
  ● Necessary to publish our data in the Linked Open Data cloud
● Updates: data will be overwritten
  ● But the portal keeps the user-generated data
● But... is it enough for the user just to have all information in a single infrastructure?
Data retrieval
● The user needs to be able to retrieve information related to selected topics, places, people, organizations, creators...
  ● Regardless of which institution holds it
  ● Regardless of the language in which the metadata is written
EHRI controlled vocabularies
● EHRI Thesaurus
  ● Concepts: hierarchy of concepts formalized in SKOS
  ● A first set translated into 10 languages
  ● Made by historians and content specialists
● Authority lists:
  ● Named entities or instances of the concepts
  ● Proposed by historians and specialists: not really useful for indexing and retrieval of data
  ● During import many were added by hand to address the needs of the real data
  ● Domain-specific authorities: Ghettos, Camps, Administrative Districts
● Vocabularies created for applications in the portal:
  ● Two research guides
  ● Linked to the EHRI Thesaurus
Problems of the first approach of the project
● A vocabulary built with knowledge about the Shoah can be helpful to represent the history, but not necessarily the documentation:
  ● The compilation of an encyclopedia and the implementation of an engine for cataloguing and retrieval are two very different things and require different strategies and kinds of expertise.
● The vocabularies should be able to retrieve the real existing data:
  ● Vocabularies should be able to describe the data, not only the content, e.g. types of documents, physical format of the data...
  ● A strategy to extend the datasets when new data brings new needs has to be implemented.
The reality of the data
● Different institutions use different systems to assign keywords (or no system)
● Keywords can have different relevance in different systems
  ● In a National Archive “holocaust” can be a relevant keyword, but it is not relevant for the EHRI portal.
● The same keyword can have different meanings in different knowledge bases
  ● e.g. “labor” in one set of imported data corresponds to “forced labor”, in another set to “trade unions”
● Relevant information is often given as free text:
  ● Natural Language Processing is necessary to extract this information, but within the project we can do this only at an experimental level.
EHRI's data driven approach (1)
● Extraction of access points from the EAD files during import
<controlaccess>
<geogname>Poland</geogname>
<geogname>Warsaw</geogname>
</controlaccess>
<controlaccess>
<subject>Persecution of Jews</subject>
<subject>Testimonies, Biographies</subject>
<subject>Holocaust survivors</subject>
</controlaccess>
<controlaccess>
<corpname>Centralna Żydowska Komisja Historyczna</corpname>
</controlaccess>
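A minimal sketch of this extraction step in Python, assuming EAD without an XML namespace (as in the samples shown here). The function name `extract_access_points` and the list of tags it collects are assumptions made for the illustration, not the portal's actual import code.

```python
import xml.etree.ElementTree as ET

# Access point elements to collect; persname is included as an assumption.
ACCESS_POINT_TAGS = ("geogname", "subject", "corpname", "persname")

SAMPLE = """<archdesc level="subgrp">
  <did>
    <unitid>M.49.E</unitid>
    <repository><corpname>Yad Vashem Archives</corpname></repository>
  </did>
  <controlaccess>
    <geogname>Poland</geogname>
    <geogname>Warsaw</geogname>
  </controlaccess>
  <controlaccess>
    <subject>Persecution of Jews</subject>
    <subject>Testimonies, Biographies</subject>
    <subject>Holocaust survivors</subject>
  </controlaccess>
  <controlaccess>
    <corpname>Centralna Żydowska Komisja Historyczna</corpname>
  </controlaccess>
</archdesc>"""


def extract_access_points(archdesc_xml):
    """Group the text of access point elements by tag name."""
    root = ET.fromstring(archdesc_xml)
    points = {}
    # Only direct children of <controlaccess> count, so the repository's
    # <corpname> inside <did> is not treated as an access point.
    for block in root.iter("controlaccess"):
        for child in block:
            if child.tag in ACCESS_POINT_TAGS and child.text:
                points.setdefault(child.tag, []).append(child.text.strip())
    return points


for tag, values in extract_access_points(SAMPLE).items():
    print(tag, values)
```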
EHRI's data driven approach (2)
● Persons, corporate bodies:
  ● Check whether we have corresponding authority files
  ● If we have: link the description unit with the corresponding authority file
  ● If we don't have: create a new authority file
  ● Priority of EHRI: creators of archival collections
● Places:
  ● Link the places with the geographical database GeoNames
  ● Problematic for historical places; some of them will be added as extra vocabulary.
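The "link if it exists, otherwise create" logic for persons and corporate bodies can be sketched as follows. This is only an illustration: the `AuthorityIndex` class, the `auth-000001` identifier scheme and the matching rule (case- and accent-insensitive string equality) are assumptions, and real matching of names would need considerably more than this.

```python
import unicodedata


def name_key(name):
    """Case- and accent-insensitive key for comparing names."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


class AuthorityIndex:
    """In-memory stand-in for a store of authority files."""

    def __init__(self):
        self._by_key = {}
        self._next_id = 1

    def link_or_create(self, name):
        """Return (authority_id, created) for the given name."""
        key = name_key(name)
        if key in self._by_key:
            return self._by_key[key], False  # existing authority: just link
        authority_id = f"auth-{self._next_id:06d}"  # hypothetical id scheme
        self._next_id += 1
        self._by_key[key] = authority_id
        return authority_id, True


index = AuthorityIndex()
print(index.link_or_create("Centralna Żydowska Komisja Historyczna"))
print(index.link_or_create("CENTRALNA ZYDOWSKA KOMISJA HISTORYCZNA"))
```

The second call finds the authority created by the first one, because both spellings reduce to the same key.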
EHRI's data driven approach (3)
● Concepts/terms: the most complicated case
● Archives used very different strategies for concepts:
  ● Some institutions compose terms using different rules (or no rule)
    ● Subject: “Jews--Persecution--France” (data of USHMM)
  ● EHRI has an atomic approach
    ● Subject: “Persecution of Jews”
    ● Place: “France”
● Steps to process concepts/terms:
  ● Terms are normalized and de-duplicated
  ● If there are equivalent terms in the thesaurus, we establish a link
  ● If there are no equivalent terms, the concept goes to further analysis
  ● If necessary, a board of experts will consider accommodating a new concept in our concept hierarchy.
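The processing steps above can be sketched in Python as follows. The tiny `THESAURUS` dictionary and its `concept-…` identifiers are hypothetical placeholders, and normalization is reduced to case folding and whitespace cleanup. Note that `split_compound` only separates the parts of a composed heading; turning “Jews” and “Persecution” into the atomic concept “Persecution of Jews” still needs institution-specific rules that are not shown here.

```python
# Hypothetical thesaurus lookup: normalized label -> concept identifier.
THESAURUS = {
    "persecution of jews": "concept-001",
    "holocaust survivors": "concept-002",
}


def normalize(term):
    """Case-fold and collapse whitespace so variants compare equal."""
    return " ".join(term.casefold().split())


def split_compound(term, separator="--"):
    """Split a composed heading such as 'Jews--Persecution--France'."""
    return [part.strip() for part in term.split(separator) if part.strip()]


def process_terms(raw_terms, thesaurus=THESAURUS):
    """De-duplicate terms, link known ones, and collect the rest."""
    seen = set()
    linked = {}
    unresolved = []  # candidates for further analysis by experts
    for raw in raw_terms:
        key = normalize(raw)
        if not key or key in seen:
            continue
        seen.add(key)
        if key in thesaurus:
            linked[raw.strip()] = thesaurus[key]
        else:
            unresolved.append(raw.strip())
    return linked, unresolved


terms = ["Persecution of Jews", "persecution  of jews ", "Trade unions"]
print(process_terms(terms))
print(split_compound("Jews--Persecution--France"))
```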
Ghettos and Concentration Camps
● We are evaluating whether to start a WikiData project for ghettos and concentration camps
● Strategy:
  ● Extract information from the current thesaurus and alternative sources
    ● Encyclopedic knowledge
    ● Data from project partners
  ● Integration of all this data in the WikiData platform
  ● Enrichment with help of the community
    ● Multilingual labels and non-controversial information
● Finally, the data in WikiData and in the portal should be synchronized
NIOD Institute for War, Holocaust and Genocide Studies (NL)
CEGESOMA Centre for Historical Research and Documentation on War and Contemporary Society (BE)
Jewish Museum in Prague (CZ)
Center for Holocaust Studies at the Institute for Contemporary History in Munich (DE)
YAD VASHEM The Holocaust Martyrs’ and Heroes’ Remembrance Authority (IL)
United States Holocaust Memorial Museum (USA)
Bundesarchiv (DE)
The Wiener Library Institute for the Study of the Holocaust & Genocide (UK)
Holocaust Documentation Centre (SK)
Polish Center for Holocaust Research (PL)
The Jewish Museum of Greece (GR)
Jewish Historical Institute (PL)
King’s College London (UK)
Ontotext AD (BG)
Elie Wiesel National Institute for the Study of Holocaust in Romania (RO)
DANS Data Archiving and Networked Services (NL)
Shoah Memorial, Museum, Center for Contemporary Jewish Documentation (FR)
ITS International Tracing Service (DE)
Hungarian Jewish Archives (HU)
INRIA Institute for Research in Computer Science and Automation (FR)
Vilna Gaon State Jewish Museum (LT)
VWI Vienna Wiesenthal Institute for Holocaust Studies (AT)
Foundation Jewish Contemporary Documentation Center (IT)

The basics of sentences session 5pptx.pptx
heathfieldcps1
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
GeoBlogs
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
Nguyen Thanh Tu Collection
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
BhavyaRajput3
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 

Recently uploaded (20)

Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdfAdversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 

F2 kepa rodriguez_ehri_integration_retrieva_minerva_2016

  • 1. EVA/Minerva 2016 Integration and Retrieval of Heterogeneous Archival Metadata CONNECTING COLLECTIONS Kepa J. Rodriguez – Archives Yad Vashem 09/11/2016
  • 2. Outline ● Data integration in the first phase of the project ● Our current integration approach ● Retrieval of data using controlled vocabularies ● Development of the EHRI controlled vocabularies
  • 3. Data integration in the first phase of the project ● Holding institutions delivered data in very different formats: ● XML, text files, CSV, JSON, etc. ● Ingestion into the portal was done case by case ● We interpreted each institution's data model and mapped it to our model ● Sometimes without help from the institution ● A lot of data was entered by hand ● The process is not sustainable; it cannot be repeated ● No automatic updates are possible ● If an institution updates content, the data has to be updated by hand ● Other problems: infrastructure, persistent identifiers, etc.
  • 4. Proposal for the second phase of the project ● Data conversion ● Data publication and synchronization ● Data ingestion
  • 5. Data conversion ● Conversion tool: different data formats into EAD: ● XML, JSON, CSV... ● Generic transformation ● Useful for a significant number of institutions ● Reusable functions, such as mappings from specific fields of an institution's export format into EAD ● Utilities to configure specific transformations ● Validation of the output: ● Machine validation: XML validation protocols ● Schematron, RNG ● Human validation: HTML preview including mark-up for validation errors
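The conversion step on this slide can be sketched roughly as follows. This is a minimal illustration, not the EHRI conversion tool: the column names in `FIELD_MAP`, the sample CSV, and the `validate` helper are all assumptions made for the example, and `validate` is only a crude stand-in for real Schematron or RNG validation.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping from one institution's export columns to EAD <did> elements.
FIELD_MAP = {
    "ref_code": "unitid",
    "title": "unittitle",
    "extent": "physdesc",
}


def row_to_ead(row, level="collection"):
    """Build a minimal <archdesc> element from one exported record."""
    archdesc = ET.Element("archdesc", {"level": level})
    did = ET.SubElement(archdesc, "did")
    for column, tag in FIELD_MAP.items():
        value = (row.get(column) or "").strip()
        if value:
            ET.SubElement(did, tag).text = value
    return archdesc


def validate(archdesc):
    """Crude stand-in for Schematron/RNG: report missing mandatory fields."""
    errors = []
    for tag in ("unitid", "unittitle"):
        if archdesc.find(f"did/{tag}") is None:
            errors.append(f"missing <{tag}>")
    return errors


# Illustrative export, loosely based on the sample record shown in the slides.
SAMPLE_CSV = (
    "ref_code,title,extent\n"
    "M.49.E,Testimonies of Holocaust Survivors,6845 files\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
ead = row_to_ead(rows[0], level="subgrp")
print(ET.tostring(ead, encoding="unicode"))
print(validate(ead))
```

Keeping the field mapping in a plain dictionary is one way to make the mapping reusable across institutions, with only the dictionary changing per export format.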
  • 6. EAD File sample (1) <archdesc level="subgrp"> <did> <unitid>M.49.E</unitid> <unittitle encodinganalog="3.1.2">Testimonies of Holocaust Survivors collected by the Central Jewish Historical Commission in Poland, 1944-1947</unittitle> <physdesc encodinganalog="3.1.5">6845 files</physdesc> <langmaterial> <language langcode="deu" encodinganalog="3.4.3">German</language> <language langcode="pol" encodinganalog="3.4.3">Polish</language> <language langcode="yid" encodinganalog="3.4.3">Yiddish</language> </langmaterial> <repository> <corpname>ארכיון יד ושם / Yad Vashem Archives</corpname> </repository> </did> <scopecontent encodinganalog="3.3.1"> <p>The collection consists of approximately 7,200 testimonies collected by the Centralna Żydowska Komisja Historyczna (Central Jewish Historical Committee) in Poland during its active years, 1944-1947. ….. as well as testimonies from survivors who fought in partisan units and survivors who were in hiding.</p> </scopecontent> …....
  • 7. EAD File sample (2) …... <originalsloc encodinganalog="3.5.1"> <p>ZYDOWSKI INSTYTUT HISTORYCZNY - ZIH, WARSZAWA, POLAND</p> </originalsloc> …... <controlaccess> <geogname>Poland</geogname> <geogname>Warsaw</geogname> </controlaccess> <controlaccess> <subject>Persecution of Jews</subject> <subject>Testimonies, Biographies</subject> <subject>Holocaust survivors</subject> </controlaccess> <controlaccess> <corpname>Centralna Żydowska Komisja Historyczna</corpname> </controlaccess> </archdesc>
  • 8. Data publication and synchronization ● We plan to use two data publication protocols: ● OAI-PMH: one of the first protocols for publication of data ● Publication of data in different formats: Dublin Core (default), EAD, etc. ● OAI-PMH servers are not easy for small archives to implement and maintain ● But we want to implement a client for institutions that already use it ● ResourceSync: a new protocol ● Based on Sitemaps ● Data can be published on the web page of the institution ● Higher security ● Uses sitemaps to expose changes and updates ● Only modified and new data will be transferred to the portal ● Both are standard protocols of the Open Archives Initiative
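The "only modified and new data" idea can be illustrated with a small sketch that reads a plain Sitemap and keeps the entries changed since the last synchronization. It assumes a simple `<urlset>` with `<lastmod>` timestamps in UTC; the URLs and dates are invented for the example, and a real ResourceSync client would also handle the protocol's own change-list metadata, which is not modelled here.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace of the Sitemap protocol that ResourceSync builds on.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Invented sitemap, as an institution might publish it on its own web site.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://archive.example.org/ead/a.xml</loc>
    <lastmod>2016-10-01T00:00:00Z</lastmod>
  </url>
  <url>
    <loc>https://archive.example.org/ead/b.xml</loc>
    <lastmod>2016-11-05T12:30:00Z</lastmod>
  </url>
</urlset>"""


def parse_time(text):
    """Parse the UTC timestamp format used in the sample above."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def changed_since(sitemap_xml, last_sync):
    """Return the URLs whose lastmod is later than the previous sync."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if loc and lastmod and parse_time(lastmod) > last_sync:
            urls.append(loc)
    return urls


last_sync = datetime(2016, 10, 15, tzinfo=timezone.utc)
print(changed_since(SAMPLE_SITEMAP, last_sync))
```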
  • 9. Data ingestion ● After data is ingested into the portal, it will receive a permanent URL: ● Formal protocol is in progress ● Necessary to publish our data in the Linked Open Data cloud ● Updates: data will be overwritten ● But the portal keeps the user generated data ● But... is it enough for the user just to have all information in a single infrastructure?
  • 10. Data retrieval ● The user needs to be able to retrieve information related to selected topics, places, people, organizations, creators... ● Regardless of which institution holds it ● Regardless of the language in which the metadata is written
  • 11. EHRI controlled vocabularies ● EHRI Thesaurus ● Concepts: hierarchy of concepts formalized in SKOS ● A first set translated into 10 languages ● Made by historians and content specialists ● Authority lists: ● Named entities or instances of the concepts ● Proposed by historians and specialists: not really useful for indexing and retrieval of data ● During import many were added by hand to address the needs of the real data ● Domain-specific authorities: Ghettos, Camps, Administrative Districts ● Vocabularies created for applications in the portal: ● Two research guides ● Linked to the EHRI Thesaurus
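To illustrate why a multilingual concept vocabulary helps retrieval across institutions and languages, here is a small sketch. The concept identifiers, the labels, and the description units are all invented for the example and are not taken from the EHRI Thesaurus; the structure only mimics the SKOS idea of one concept carrying preferred labels in several languages.

```python
# Invented concepts with per-language preferred labels (SKOS-like structure).
PREF_LABELS = {
    "concept:ghettos": {"en": "Ghettos", "de": "Ghettos", "pl": "Getta"},
    "concept:camps": {"en": "Camps", "de": "Lager", "pl": "Obozy"},
}

# Invented description units: (identifier, metadata language, linked concepts).
DESCRIPTIONS = [
    ("unit-a", "pl", {"concept:ghettos"}),
    ("unit-b", "de", {"concept:camps"}),
    ("unit-c", "en", {"concept:ghettos", "concept:camps"}),
]


def build_label_index(pref_labels):
    """Map every label, in every language, to its concept identifier."""
    index = {}
    for concept_id, labels in pref_labels.items():
        for label in labels.values():
            index[label.casefold()] = concept_id
    return index


LABEL_INDEX = build_label_index(PREF_LABELS)


def search(query):
    """Find units linked to the queried concept, whatever their language."""
    concept_id = LABEL_INDEX.get(query.strip().casefold())
    if concept_id is None:
        return []
    return [uid for uid, _lang, concepts in DESCRIPTIONS if concept_id in concepts]


print(search("Getta"))
print(search("lager"))
```

Because the units are linked to concept identifiers rather than to strings, a query in Polish also finds units described in English or German.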
  • 12. Problems of the first approach of the project ● A vocabulary built with knowledge about the Shoah can be helpful to represent the history, but not necessarily the documentation: ● The compilation of an encyclopedia and the implementation of an engine for cataloguing and retrieval are two very different things and require different strategies and kinds of expertise. ● The vocabularies should be able to retrieve the real existing data: ● Vocabularies should be able to describe the data, not only the content, e.g. types of documents, physical format of the data... ● A strategy to extend the datasets when new data brings new needs has to be implemented.
  • 13. The reality of the data ● Different institutions use different systems to assign keywords (or no system) ● Keywords can have different relevance in different systems ● In a National Archive “holocaust” can be a relevant keyword, but it is not relevant for the EHRI portal. ● The same keyword can have different meanings in different knowledge bases ● e.g. “labor” in one set of imported data corresponds to “forced labor”, in another set to “trade unions” ● Relevant information is often given as free text: ● Natural Language Processing is necessary to extract this information, but in the project we can do this only at an experimental level.
  • 14. EHRI's data driven approach (1) ● Extraction of access points from the EAD files during import <controlaccess> <geogname>Poland</geogname> <geogname>Warsaw</geogname> </controlaccess> <controlaccess> <subject>Persecution of Jews</subject> <subject>Testimonies, Biographies</subject> <subject>Holocaust survivors</subject> </controlaccess> <controlaccess> <corpname>Centralna Żydowska Komisja Historyczna</corpname> </controlaccess>
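A minimal sketch of this extraction step, using only the Python standard library, might look as follows. The wrapping `<archdesc>` element and the set of tags in `ACCESS_POINT_TAGS` are assumptions for the example; the access points themselves are the ones from the slide.

```python
import xml.etree.ElementTree as ET

# The <controlaccess> blocks from the slide, wrapped in a root element.
SAMPLE_EAD = """<archdesc level="subgrp">
  <controlaccess>
    <geogname>Poland</geogname>
    <geogname>Warsaw</geogname>
  </controlaccess>
  <controlaccess>
    <subject>Persecution of Jews</subject>
    <subject>Testimonies, Biographies</subject>
    <subject>Holocaust survivors</subject>
  </controlaccess>
  <controlaccess>
    <corpname>Centralna Żydowska Komisja Historyczna</corpname>
  </controlaccess>
</archdesc>"""

# Element names treated as access points in this sketch.
ACCESS_POINT_TAGS = ("geogname", "subject", "corpname", "persname")


def extract_access_points(ead_xml):
    """Collect access points from all <controlaccess> blocks, grouped by type."""
    root = ET.fromstring(ead_xml)
    points = {}
    for control in root.iter("controlaccess"):
        for child in control:
            if child.tag in ACCESS_POINT_TAGS and child.text:
                points.setdefault(child.tag, []).append(child.text.strip())
    return points


access_points = extract_access_points(SAMPLE_EAD)
for kind, values in access_points.items():
    print(kind, values)
```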
  • 15. EHRI's data driven approach (2) ● Persons, corporate bodies: ● Check whether we have corresponding authority files ● If we have: link the description unit with the corresponding authority file ● If we don't have: create a new authority file ● Priority of EHRI: creators of archival collections ● Places: ● Link the places with the geographical database GeoNames ● Problematic for historical places; some of them will be added as extra vocabulary.
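The "link if it exists, otherwise create" rule for persons and corporate bodies can be sketched as below. The matching strategy (case folding and stripping combining diacritics) and the `auth-…` identifier scheme are assumptions made for this illustration, not a description of how the portal matches names.

```python
import unicodedata


def normalize_name(name):
    """Case-fold, strip combining diacritics and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


class AuthorityIndex:
    """In-memory stand-in for a store of authority files."""

    def __init__(self):
        self._by_key = {}
        self._next_id = 1

    def link_or_create(self, name):
        """Return (authority_id, created) for the given access point."""
        key = normalize_name(name)
        if key in self._by_key:
            # An authority file already exists: link to it.
            return self._by_key[key], False
        # No authority file yet: create one (hypothetical id scheme).
        authority_id = f"auth-{self._next_id:06d}"
        self._next_id += 1
        self._by_key[key] = authority_id
        return authority_id, True


index = AuthorityIndex()
print(index.link_or_create("Centralna Żydowska Komisja Historyczna"))
print(index.link_or_create("centralna zydowska  komisja historyczna"))
```

Exact matching on a normalized key is deliberately conservative; spelling variants or translated names would still end up as separate authority files and need review.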
  • 16. EHRI's data driven approach (3) ● Concepts/terms: the most complicated case ● Archives used very different strategies for concepts: ● Some institutions compose terms using different rules (or no rules) ● Subject: “Jews--Persecution--France” (data of USHMM) ● EHRI has an atomic approach ● Subject: “Persecution of Jews” ● Place: “France” ● Steps to process concepts/terms: ● Terms are normalized and de-duplicated ● If there are equivalent terms in the thesaurus, we establish a link ● If there are no equivalent terms, the concept goes to further analysis ● If necessary, a board of experts will consider accommodating a new concept in our concept hierarchy.
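The processing steps on this slide can be sketched as a small pipeline. Everything in the lookup tables below (`KNOWN_PLACES`, `REWRITE_RULES`, `THESAURUS` and the `concept:…` identifiers) is hypothetical and exists only to make the example run; the real mapping rules would have to come from the data and from the experts.

```python
# Hypothetical lookup tables for the sketch.
KNOWN_PLACES = {"france", "poland"}
REWRITE_RULES = {("jews", "persecution"): "persecution of jews"}
THESAURUS = {
    "persecution of jews": "concept:persecution-of-jews",
    "holocaust survivors": "concept:holocaust-survivors",
}


def normalize(term):
    """Case-fold and collapse whitespace so duplicates compare equal."""
    return " ".join(term.casefold().split())


def decompose(heading):
    """Split a pre-coordinated heading into topical parts and places."""
    parts = [normalize(p) for p in heading.split("--") if p.strip()]
    places = [p for p in parts if p in KNOWN_PLACES]
    topical = tuple(p for p in parts if p not in KNOWN_PLACES)
    return topical, places


def process(headings):
    """Return (linked concept ids, places, terms needing further analysis)."""
    linked, places, review = set(), set(), set()  # sets de-duplicate
    for heading in headings:
        topical, found_places = decompose(heading)
        places.update(found_places)
        if not topical:
            continue
        label = REWRITE_RULES.get(topical, " ".join(topical))
        if label in THESAURUS:
            linked.add(THESAURUS[label])   # equivalent term: establish a link
        else:
            review.add(label)              # no equivalent: further analysis
    return sorted(linked), sorted(places), sorted(review)


result = process([
    "Jews--Persecution--France",
    "Persecution of Jews",
    "persecution of  jews",
    "Forced labor",
])
print(result)
```

Terms that end up in the review list correspond to the concepts that, per the slide, go to further analysis and possibly to the board of experts.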
  • 17. Ghettos and Concentration Camps ● We are evaluating whether to start a Wikidata project for ghettos and concentration camps ● Strategy: ● Extract information from the current thesaurus and alternative sources ● Encyclopedic knowledge ● Data from project partners ● Integration of all this data in the Wikidata platform ● Enrichment with help of the community ● Multilingual labels and non-controversial information ● Finally the data in Wikidata and in the portal should be synchronized
  • 18. NIOD Institute for War, Holocaust and Genocide Studies (NL)   CEGESOMA Centre for Historical Research and Documentation on War and Contemporary Society (BE)   Jewish Museum in Prague (CZ)   Center for Holocaust Studies at the Institute for Contemporary History in Munich (DE)   YAD VASHEM The Holocaust Martyrs’ and Heroes’ Remembrance Authority (IL) United States Holocaust Memorial Museum (USA) Bundesarchiv (DE)   The Wiener Library Institute for the Study of the Holocaust & Genocide (UK) Holocaust Documentation Centre (SK) Polish Center for Holocaust Research (PL)   The Jewish Museum of Greece (GR) Jewish Historical Institute (PL) King’s College London (UK)   Ontotext AD (BG)   Elie Wiesel National Institute for the Study of Holocaust in Romania (RO)   DANS Data Archiving and Networked Services (NL)   Shoah Memorial, Museum, Center for Contemporary Jewish Documentation (FR)   ITS International Tracing Service (DE)   Hungarian Jewish Archives (HU)   INRIA Institute for Research in Computer Science and Automation (FR)   Vilna Gaon State Jewish Museum (LT)   VWI Vienna Wiesenthal Institute for Holocaust Studies (AT) Foundation Jewish Contemporary Documentation Center (IT) CONNECTING KNOWLEDGE
  • 19. CONNECTING COLLECTIONS Integration and Retrieval of Heterogeneous Archival Metadata 09/11/2016