www.eudat.eu | EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
Building new knowledge from distributed scientific corpus
HERBADROP & EUROPEANA: two concrete case studies for exploring big archival data
2nd Computational Archival Science (CAS) workshop
Boston, USA, December 2017
Pascal Dugénie, Daan Broeder, Nuno Freire
Digital Infrastructures for Research
Opportunities for preserving valuable scientific heritage
[Diagram. Labels: massively distributed collections (natural heritage, cultural heritage); high-speed network infrastructures; Collaborative Data Infrastructure (CDI); Trusted Digital Repositories (TDR), ISO 16363, ISO 14721 (OAIS); LONG-TERM PRESERVATION: monitoring, data storage, persistent ID, metadata, data curation and policies; HPC infrastructures; BIG DATA analysis tools; sharing distributed corpora; extraction of text in images; knowledge building; visibility of data]
Building new knowledge from distributed scientific corpus: HERBADROP & EUROPEANA, two concrete case studies for exploring big archival data
EUDAT: A truly pan-European Infrastructure
EUDAT offers common data services to both research communities and individuals through a large network of European organisations.
EUDAT aims to enable European researchers from any discipline to preserve, find, access, and process data in a trusted environment, as part of a Collaborative Data Infrastructure.
European infrastructures
Technology Providers
Research Communities
B2 Service Suite
https://www.eudat.eu/services
Covering both access and deposit, from informal data sharing to long-term archiving, and addressing identification, discoverability and computability of both long-tail and big data, EUDAT services seek to address the full lifecycle of research data.
Building solutions with the communities
• Common Language Resources and Technology Infrastructure (CLARIN)
• European Network for Earth System Modelling (ENES)
• Distributed infrastructure for life-science information (ELIXIR)
• European Plate Observing System (EPOS) - Solid Earth sciences Research Infrastructure
• Integrated Carbon Observation System (ICOS) to quantify & understand greenhouse gas balance
• Long-Term Ecosystem Research (LTER) in Europe
EUDAT services are designed, built and implemented together with user communities.
Challenges and problems to be solved
• Digitized images
  • physical copies are fragile
  • digital copies must be preserved
• Exploitation of digital copies
  • descriptive metadata and classification are complex
  • images contain a lot of information that should be extracted and made available
Herbadrop rationale
• Millions of specimens in herbaria all over the world
• Global trend towards industrial-scale digitization
• Data are difficult to handle, even for medium-sized institutes
• The same challenges are being faced by hundreds of herbaria in Europe
• It makes sense to work together to develop a solution
tiff: 180MB zip: 80MB jpg: 1MB
Total: 161MB
Herbadrop in Europe
MEISE, BE
Herbadrop objectives
1. PRESERVATION: long-term preservation of herbarium specimen images
2. INFORMATION EXTRACTION: extracting information from images using Optical Character Recognition (OCR) and basic image analysis techniques
3. KNOWLEDGE BUILDING: deep learning using OCR results, and access for the whole community for crowdsourcing
Current scope: objectives 1 and 2. Perspectives: objective 3.
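The information-extraction objective can be illustrated with a post-OCR step that turns raw label text into structured fields. This is a minimal sketch, assuming plain OCR text per specimen label is already available; the "leg."/"coll." convention, the year pattern, and the names `extract_label_fields` and `LabelFields` are hypothetical choices for this example, not part of the Herbadrop pipeline.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns: "leg." or "coll." conventionally precedes the
# collector's name on herbarium labels, and the first plausible four-digit
# year is taken as the collection year. Real labels need richer rules.
COLLECTOR_RE = re.compile(r"\b(?:leg|coll)\.\s*([^\n,;]+)", re.IGNORECASE)
YEAR_RE = re.compile(r"\b(1[5-9]\d\d|20\d\d)\b")


@dataclass
class LabelFields:
    collector: Optional[str]
    year: Optional[int]


def extract_label_fields(ocr_text: str) -> LabelFields:
    """Extract a collector name and a year from the OCR text of one label."""
    collector = COLLECTOR_RE.search(ocr_text)
    year = YEAR_RE.search(ocr_text)
    return LabelFields(
        collector=collector.group(1).strip() if collector else None,
        year=int(year.group(1)) if year else None,
    )


if __name__ == "__main__":
    label = "Flora of Belgium\nQuercus robur L.\nleg. J. Dupont, 12 May 1887"
    print(extract_label_fields(label))
```

Fields that cannot be found stay `None` rather than being guessed, which keeps poor OCR from silently producing wrong metadata.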
HERBADROP/EUDAT Workflows
Stages: STORAGE, TRANSFER, OCR, ACCESS, MONITORING, ARCHIVING, CERTIFICATION
• Transferring images using the B2SAFE service
• Controlling data quality (file format and integrity)
• Performing OCR analysis of the images using HPC
• Ingesting OCR results into a full-text indexing engine
• Ingesting images and metadata for long-term archiving
• Surveying bit-stream integrity and data quality
• Monitoring data and process status
• Producing regular statistical reports (reports, statistics)
• Harvesting and indexing metadata
• Offering open access to the full-text engine, images and metadata
CERTIFICATION: Implementing a DSA-based certification, including an appropriate SLA
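The two data-quality steps above (checking file format and integrity at ingest, then surveying bit-stream integrity over time) can be sketched as follows. This is a minimal sketch, assuming the files sit on a local path and a manifest of SHA-256 checksums was recorded at ingest; it does not use B2SAFE, and the function names are hypothetical. Formats are identified by their leading magic bytes rather than by file extension.

```python
import hashlib
from pathlib import Path
from typing import Dict, List

# Leading "magic" bytes of the three formats named on the slides.
# TIFF may be little-endian ("II*\0") or big-endian ("MM\0*").
MAGIC_NUMBERS = {
    b"II*\x00": "tiff",
    b"MM\x00*": "tiff",
    b"\xff\xd8\xff": "jpeg",
    b"PK\x03\x04": "zip",
}


def detect_format(path: Path) -> str:
    """Identify a file's format from its first bytes, not its extension."""
    with path.open("rb") as fh:
        head = fh.read(8)
    for magic, fmt in MAGIC_NUMBERS.items():
        if head.startswith(magic):
            return fmt
    return "unknown"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large TIFFs never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def survey(manifest: Dict[str, str], root: Path) -> List[str]:
    """List files that are missing or whose checksum differs from the manifest."""
    return sorted(
        name
        for name, expected in manifest.items()
        if not (root / name).is_file() or sha256_of(root / name) != expected
    )
```

Re-running `survey` periodically against the ingest manifest is one simple way to detect silent corruption; an empty list means every file still matches its recorded checksum.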
Europeana: European Cultural Heritage on the Web
The main goal of Europeana is to provide access to cultural heritage and encourage people to engage with culture.
• And the main access point is the Web!
• Promoting the research use of heritage data resources is in its early stages of development
CC BY-SA
The Challenges (1/2)
The Generic Challenge
How to facilitate the re-use of Cultural Heritage language resources for research purposes by exploiting the existing and emerging European research infrastructures:
• How can the resources be discovered?
• How can the resources be shared in practical ways for researchers?
• How can advanced computation be applied to these Cultural Heritage datasets?
• How can the resources and datasets be cited and referenced in research?
• How can Cultural Heritage institutions re-use the outcomes of research?
The Challenges (2/2)
The Specific Challenges of the Pilot
• Identifying requirements for technical interoperability between the two infrastructures
• Creating best-practice guidelines for the publication and citation of cultural heritage data
• Facilitating collaborative work between researchers, with a focus on:
  • Humanities
  • Social Sciences
  • Computer science
Europeana Newspapers Corpus
The pilot aims to expose the full text aggregated in the Europeana Newspapers project.
This corpus contains over 11 million pages of full text from historic newspapers, mainly from the 19th century, aggregated from national and research libraries across Europe.
The pilot aims to expose and improve the text for more data-driven usage, based on EUDAT data services.
EUDAT service uptake
The Europeana Newspapers pilot relies on the following EUDAT services:
• Research data storage and sharing (B2SHARE): to undertake the enrichment of the datasets and, more generally, to expose them for re-use by other academics, particularly those outside the digital humanities
• Persistent Identification Service (B2HANDLE): persistent identification of the main objects of the full-text corpus, namely the newspaper titles and individual issues
• Multi-disciplinary joint metadata catalogue (B2FIND): so that scientists will be able to
  • obtain the full corpus for machine processing
  • select just a portion of the corpus, benefitting from the enrichment of article-level annotations with named entities and topics
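B2HANDLE identifiers are Handles, so a newspaper title or issue PID can in principle be resolved through the public Handle.net proxy's JSON REST interface. The sketch below assumes that interface (`https://hdl.handle.net/api/handles/<handle>`, where a `responseCode` of 1 signals that the Handle was found) and only parses a response body, leaving the HTTP request to the reader. The prefix `21.T99999` and the `<title>.<issue-date>` suffix convention are hypothetical, not the pilot's actual identifiers.

```python
import json
from typing import Optional
from urllib.parse import quote

# JSON REST interface of the public Handle.net proxy. B2HANDLE PIDs are
# Handles, so they resolve through the global Handle System like any other.
HANDLE_API = "https://hdl.handle.net/api/handles/"


def issue_handle(prefix: str, title_id: str, issue_date: str) -> str:
    """Hypothetical naming convention: one suffix per title, extended per issue."""
    return f"{prefix}/{title_id}.{issue_date}"


def resolver_url(handle: str) -> str:
    """REST URL that returns the JSON record of a Handle."""
    return HANDLE_API + quote(handle, safe="/")


def target_url(response_body: str) -> Optional[str]:
    """Return the first URL-typed value of a resolver response, if any."""
    record = json.loads(response_body)
    if record.get("responseCode") != 1:  # 1 = handle found
        return None
    for value in record.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    return None


if __name__ == "__main__":
    pid = issue_handle("21.T99999", "title-0001", "1887-05-12")
    print(resolver_url(pid))
```

Keeping the parsing separate from the network call makes the logic easy to test and lets the same code handle responses fetched by any HTTP client.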
Conclusions & Perspectives
Conclusions
• General conclusions:
  • The EUDAT services were applied successfully
  • Heritage research data brought new requirements to EUDAT
• HERBADROP:
  • Applying EUDAT's computational capabilities is revealing new challenges:
    • How to address poor-quality OCR
    • The amount of data is large and may become a limitation for accurate and exhaustive analysis
• EUROPEANA:
  • Learned about the requirements of research usage
  • Some of these requirements may have an impact on its data providers
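One common way to triage poor-quality OCR is a lexicon check: score each page by the share of its word tokens found in a reference word list, and flag pages that score low for re-processing or manual review. A minimal sketch, assuming a lowercase lexicon suited to the language of the text is available; the 0.6 threshold and the function names are illustrative assumptions, not values from either project.

```python
import re
from typing import Dict, List, Set

# Runs of letters in any script; digits and punctuation are ignored.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def lexicon_ratio(text: str, lexicon: Set[str]) -> float:
    """Share of word tokens found in a lowercase lexicon (0.0 when none)."""
    tokens = [token.lower() for token in TOKEN_RE.findall(text)]
    if not tokens:
        return 0.0
    return sum(token in lexicon for token in tokens) / len(tokens)


def flag_poor_ocr(pages: Dict[str, str], lexicon: Set[str], threshold: float = 0.6) -> List[str]:
    """Ids of pages whose lexicon ratio falls below the (illustrative) threshold."""
    return sorted(
        page_id
        for page_id, text in pages.items()
        if lexicon_ratio(text, lexicon) < threshold
    )
```

For example, with a lexicon of `{"the", "king", "of", "france"}`, the text "Tlie Kiug of Frauce" scores 0.25 and would be flagged. The heuristic is cheap enough to run over very large corpora, but it penalises rare words, names and historical spellings, so the threshold needs calibrating per collection.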
HERBADROP and EUROPEANA:
Some perspectives for data services
• Improving discoverability of heritage research data resources
  • Full-text based
  • Metadata based
• Additional heritage-specific metadata support in EUDAT
  • Data format support and semantics
  • Semantic annotations
• Computational processing for heritage use cases:
  • OCR
  • Image analysis tools
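Full-text and annotation-based discoverability can be combined in one inverted index: each word and each semantic annotation (named entity or topic) maps to the set of article ids that contain it, and a query intersects those sets. This also covers selecting just a portion of a corpus by its article-level annotations. A minimal in-memory sketch, assuming articles arrive with annotations already attached; the `Article` and `CorpusIndex` names are hypothetical, and a production service would use a dedicated search engine instead.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set

TOKEN_RE = re.compile(r"[^\W\d_]+")


@dataclass
class Article:
    article_id: str
    text: str
    entities: Set[str] = field(default_factory=set)  # named entities
    topics: Set[str] = field(default_factory=set)


class CorpusIndex:
    """In-memory inverted index over full text and semantic annotations."""

    def __init__(self) -> None:
        self.terms: Dict[str, Set[str]] = defaultdict(set)
        self.annotations: Dict[str, Set[str]] = defaultdict(set)

    def add(self, article: Article) -> None:
        for token in TOKEN_RE.findall(article.text.lower()):
            self.terms[token].add(article.article_id)
        for label in article.entities | article.topics:
            self.annotations[label].add(article.article_id)

    def search(self, words: Iterable[str] = (), annotations: Iterable[str] = ()) -> Set[str]:
        """Ids of articles containing every word and carrying every annotation."""
        postings = [self.terms.get(w.lower(), set()) for w in words]
        postings += [self.annotations.get(a, set()) for a in annotations]
        return set.intersection(*postings) if postings else set()


if __name__ == "__main__":
    index = CorpusIndex()
    index.add(Article("a1", "The harvest in Flanders was poor", {"Flanders"}, {"agriculture"}))
    index.add(Article("a2", "Harvest festival in Paris", {"Paris"}, {"culture"}))
    print(index.search(["harvest"], ["agriculture"]))
```

Here a full-text query for "harvest" matches both articles, while adding the topic annotation "agriculture" narrows the result to `a1`.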
For additional information
http://www.eudat.eu/
Nuno Freire,
Europeana DSI/INESC-ID
nuno.freire@europeana.eu
http://www.europeana.eu/

The Europeana Community: Semantics and Cultural Heritage DataThe Europeana Community: Semantics and Cultural Heritage Data
The Europeana Community: Semantics and Cultural Heritage Data
 
Metadata Aggregation: Assessing the Application of IIIF and Sitemaps within C...
Metadata Aggregation: Assessing the Application of IIIF and Sitemaps within C...Metadata Aggregation: Assessing the Application of IIIF and Sitemaps within C...
Metadata Aggregation: Assessing the Application of IIIF and Sitemaps within C...
 
IIIF at europeana, IIIF conference, Vatican, 2017
IIIF at europeana, IIIF conference, Vatican, 2017IIIF at europeana, IIIF conference, Vatican, 2017
IIIF at europeana, IIIF conference, Vatican, 2017
 
New approaches for data acquisition at europeana iiif, sitemaps and schema.o...
New approaches for data acquisition at europeana  iiif, sitemaps and schema.o...New approaches for data acquisition at europeana  iiif, sitemaps and schema.o...
New approaches for data acquisition at europeana iiif, sitemaps and schema.o...
 
Use Cases From Digital Humanities for Library Linked Data
Use Cases From Digital Humanities for Library Linked DataUse Cases From Digital Humanities for Library Linked Data
Use Cases From Digital Humanities for Library Linked Data
 

Recently uploaded

DataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptxDataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptx
Kanchana Weerasinghe
 
the unexpected potential of Dijkstra's Algorithm
the unexpected potential of Dijkstra's Algorithmthe unexpected potential of Dijkstra's Algorithm
the unexpected potential of Dijkstra's Algorithm
huseindihon
 
Universidad de Valladolid degree offer diploma Transcript
Universidad de Valladolid  degree offer diploma TranscriptUniversidad de Valladolid  degree offer diploma Transcript
Universidad de Valladolid degree offer diploma Transcript
taqyea
 
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
gargnatasha985
 
MUMBAI MONTHLY RAINFALL CAPSTONE PROJECT
MUMBAI MONTHLY RAINFALL CAPSTONE PROJECTMUMBAI MONTHLY RAINFALL CAPSTONE PROJECT
MUMBAI MONTHLY RAINFALL CAPSTONE PROJECT
GaneshGanesh399816
 
Seamlessly Pay Online, Pay In Stores or Send Money
Seamlessly Pay Online, Pay In Stores or Send MoneySeamlessly Pay Online, Pay In Stores or Send Money
Seamlessly Pay Online, Pay In Stores or Send Money
gargtinna79
 
Maruti Wagon R on road price in Faridabad - CarDekho
Maruti Wagon R on road price in Faridabad - CarDekhoMaruti Wagon R on road price in Faridabad - CarDekho
Maruti Wagon R on road price in Faridabad - CarDekho
kamli sharma#S10
 
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
saadkhan1485265
 
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy DsouzaOpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata
 
ISBP 821 - UCP 600 - ed).pdf banking standards
ISBP 821 - UCP 600 - ed).pdf banking standardsISBP 821 - UCP 600 - ed).pdf banking standards
ISBP 821 - UCP 600 - ed).pdf banking standards
DevanshuAnada1
 
Fine-Tuning of Small/Medium LLMs for Business QA on Structured Data
Fine-Tuning of Small/Medium LLMs for Business QA on Structured DataFine-Tuning of Small/Medium LLMs for Business QA on Structured Data
Fine-Tuning of Small/Medium LLMs for Business QA on Structured Data
kevig
 
all about the data science process, covering the steps present in almost ever...
all about the data science process, covering the steps present in almost ever...all about the data science process, covering the steps present in almost ever...
all about the data science process, covering the steps present in almost ever...
palaniappancse
 
Introduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdfIntroduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdf
kihus38
 
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
tanupasswan6
 
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
birajmohan012
 
DU degree offer diploma Transcript
DU degree offer diploma TranscriptDU degree offer diploma Transcript
DU degree offer diploma Transcript
uapta
 
transgenders community data in india by govt
transgenders community data in india by govttransgenders community data in india by govt
transgenders community data in india by govt
palanisamyiiiier
 
Amul goes international: Desi dairy giant to launch fresh ...
Amul goes international: Desi dairy giant to launch fresh ...Amul goes international: Desi dairy giant to launch fresh ...
Amul goes international: Desi dairy giant to launch fresh ...
chetankumar9855
 
Harendra Singh, AI Strategy and Consulting Portfolio
Harendra Singh, AI Strategy and Consulting PortfolioHarendra Singh, AI Strategy and Consulting Portfolio
Harendra Singh, AI Strategy and Consulting Portfolio
harendmgr
 
Nipissing University degree offer Nipissing diploma Transcript
Nipissing University degree offer Nipissing diploma TranscriptNipissing University degree offer Nipissing diploma Transcript
Nipissing University degree offer Nipissing diploma Transcript
zyqedad
 

Recently uploaded (20)

DataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptxDataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptx
 
the unexpected potential of Dijkstra's Algorithm
the unexpected potential of Dijkstra's Algorithmthe unexpected potential of Dijkstra's Algorithm
the unexpected potential of Dijkstra's Algorithm
 
Universidad de Valladolid degree offer diploma Transcript
Universidad de Valladolid  degree offer diploma TranscriptUniversidad de Valladolid  degree offer diploma Transcript
Universidad de Valladolid degree offer diploma Transcript
 
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
 
MUMBAI MONTHLY RAINFALL CAPSTONE PROJECT
MUMBAI MONTHLY RAINFALL CAPSTONE PROJECTMUMBAI MONTHLY RAINFALL CAPSTONE PROJECT
MUMBAI MONTHLY RAINFALL CAPSTONE PROJECT
 
Seamlessly Pay Online, Pay In Stores or Send Money
Seamlessly Pay Online, Pay In Stores or Send MoneySeamlessly Pay Online, Pay In Stores or Send Money
Seamlessly Pay Online, Pay In Stores or Send Money
 
Maruti Wagon R on road price in Faridabad - CarDekho
Maruti Wagon R on road price in Faridabad - CarDekhoMaruti Wagon R on road price in Faridabad - CarDekho
Maruti Wagon R on road price in Faridabad - CarDekho
 
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
 
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy DsouzaOpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
 
ISBP 821 - UCP 600 - ed).pdf banking standards
ISBP 821 - UCP 600 - ed).pdf banking standardsISBP 821 - UCP 600 - ed).pdf banking standards
ISBP 821 - UCP 600 - ed).pdf banking standards
 
Fine-Tuning of Small/Medium LLMs for Business QA on Structured Data
Fine-Tuning of Small/Medium LLMs for Business QA on Structured DataFine-Tuning of Small/Medium LLMs for Business QA on Structured Data
Fine-Tuning of Small/Medium LLMs for Business QA on Structured Data
 
all about the data science process, covering the steps present in almost ever...
all about the data science process, covering the steps present in almost ever...all about the data science process, covering the steps present in almost ever...
all about the data science process, covering the steps present in almost ever...
 
Introduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdfIntroduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdf
 
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
 
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
 
DU degree offer diploma Transcript
DU degree offer diploma TranscriptDU degree offer diploma Transcript
DU degree offer diploma Transcript
 
transgenders community data in india by govt
transgenders community data in india by govttransgenders community data in india by govt
transgenders community data in india by govt
 
Amul goes international: Desi dairy giant to launch fresh ...
Amul goes international: Desi dairy giant to launch fresh ...Amul goes international: Desi dairy giant to launch fresh ...
Amul goes international: Desi dairy giant to launch fresh ...
 
Harendra Singh, AI Strategy and Consulting Portfolio
Harendra Singh, AI Strategy and Consulting PortfolioHarendra Singh, AI Strategy and Consulting Portfolio
Harendra Singh, AI Strategy and Consulting Portfolio
 
Nipissing University degree offer Nipissing diploma Transcript
Nipissing University degree offer Nipissing diploma TranscriptNipissing University degree offer Nipissing diploma Transcript
Nipissing University degree offer Nipissing diploma Transcript
 

Building new knowledge from distributed scientific corpus: HERBADROP & EUROPEANA, two concrete case studies for exploring big archival data

  • 1. www.eudat.eu. EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065. Building new knowledge from distributed scientific corpus. HERBADROP & EUROPEANA: two concrete case studies for exploring big archival data. 2nd Computational Archival Science (CAS) workshop, Boston, USA, December 2017. Pascal Dugénie, Daan Broeder, Nuno Freire.
  • 2. Massively distributed collections. Digital Infrastructures for Research: opportunities for preserving valuable scientific heritage. Collaborative Data Infrastructure (CDI); Trusted Digital Repositories (TDR), ISO 16363, ISO 14721 (OAIS); high-speed network infrastructures; long-term preservation; monitoring; data storage; persistent ID; metadata; data curation and policies; natural heritage; cultural heritage; HPC infrastructures; big data analysis tools; sharing distributed corpora; extraction of text in images; knowledge building; visibility of data.
  • 4. EUDAT: A truly pan-European Infrastructure. EUDAT offers common data services to both research communities and individuals through a large network of European organisations. EUDAT aims to enable European researchers from any discipline to preserve, find, access, and process data in a trusted environment, as part of a Collaborative Data Infrastructure. It involves European infrastructures, technology providers and research communities.
  • 5. B2 Service Suite (https://www.eudat.eu/services). Covering both access and deposit, from informal data sharing to long-term archiving, and addressing the identification, discoverability and computability of both long-tail and big data, EUDAT services seek to address the full lifecycle of research data.
  • 6. Building solutions with the communities. EUDAT services are designed, built and implemented together with user communities: Common Language Resources and Technology Infrastructure (CLARIN); European Network for Earth System Modelling (ENES); Distributed infrastructure for life-science information (ELIXIR); European Plate Observing System (EPOS), the solid Earth sciences research infrastructure; Integrated Carbon Observation System (ICOS), to quantify and understand the greenhouse gas balance; Long-Term Ecosystem Research (LTER) in Europe.
  • 8. Challenges and problems to be solved. Digitized images: physical copies are fragile, so the digital copy must be preserved. Exploitation of digital copies: descriptive metadata and classification are complex, and images contain a lot of information that should be extracted and made available.
  • 9. Herbadrop rationale: millions of specimens in herbaria all over the world; a global trend towards industrial digitization; data that is difficult to handle even for medium-size institutes; the same challenges faced by hundreds of herbaria in Europe; it makes sense to work together to develop a solution. (Figure: tiff: 180MB; zip: 80MB; jpg: 1MB; Total: 161MB.)
  • 11. Herbadrop objectives: (1) Preservation: long-term preservation of herbarium specimen images. (2) Information extraction: extracting information from images using Optical Character Recognition (OCR) and basic image analysis techniques. (3) Knowledge building: deep learning using OCR results, with access for the whole community for crowdsourcing. (Slide labels: current scope; perspectives.)
  • 12. HERBADROP/EUDAT workflows. Stages: transfer, storage, OCR, archiving, monitoring, access, certification. Activities: transferring images using the B2SAFE service; controlling data quality (file format and integrity); performing OCR analysis using HPC; ingesting OCR results into a full-text indexing engine; ingesting images and metadata for long-term archiving; surveying bit-stream integrity and data quality; monitoring data and process status (status reports, statistics); producing regular statistical reports; harvesting and indexing metadata; offering open access to the full-text engine, images and metadata; implementing a DSA-based (Data Seal of Approval) certification, including an appropriate service level agreement (SLA).
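The integrity and file-format controls in this workflow can be illustrated with a minimal fixity-checking sketch. It assumes the images are plain files under one directory tree and that a SHA-256 manifest is recorded at ingest time. The function names and manifest layout are hypothetical: this is not the B2SAFE interface, which the slides do not detail, only the general checksum-comparison technique.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Leading "magic" bytes used as a cheap file-format check (hypothetical policy).
MAGIC = {
    b"II*\x00": "tiff",  # little-endian TIFF header
    b"MM\x00*": "tiff",  # big-endian TIFF header
    b"\xff\xd8\xff": "jpeg",
}


def detect_format(path: Path) -> str | None:
    """Return 'tiff' or 'jpeg' if the file starts with a known signature, else None."""
    with path.open("rb") as handle:
        head = handle.read(4)
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return None


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the current contents of root with a manifest recorded earlier."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "corrupted": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

Reading in fixed-size chunks keeps memory use flat even for master TIFF files of the size quoted on slide 9; a periodic `audit` run is one way to produce the kind of regular integrity report the workflow lists.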
  • 14. Europeana: European Cultural Heritage on the Web. The main goal of Europeana is to provide access to cultural heritage and to encourage people to engage with culture, and its main access point is the Web. Promoting the research use of heritage data resources is still at an early stage of development. (CC BY-SA)
  • 15. The Challenges (1/2). The generic challenge: how to facilitate the re-use of cultural heritage language resources for research purposes by exploiting the existing and emerging European research infrastructures. How can the resources be discovered? How can they be shared with researchers in practical ways? How can advanced computation be applied to these cultural heritage datasets? How can the resources and datasets be cited and referenced in research? How can cultural heritage institutions re-use the outcomes of research?
  • 16. The Challenges (2/2). The specific challenges of the pilot: to identify requirements for technical interoperability between the two infrastructures; to create best-practice guidelines for the publication and citation of cultural heritage data; to facilitate collaborative work between researchers, with a focus on the humanities, the social sciences and computer science.
  • 17. Europeana Newspapers Corpus. The pilot aims to expose the full text aggregated in the Europeana Newspapers project. This corpus contains over 11 million pages of full text from historic newspapers, mainly from the 19th century, aggregated from national and research libraries across Europe. Building on EUDAT data services, the pilot also aims to improve the text for more data-driven usage.
  • 18. EUDAT service uptake. The Europeana Newspapers pilot relies on the following EUDAT services. Research data storage and sharing (B2SHARE): to undertake the enrichment of the datasets and, more generally, to expose them for re-use by other academics, particularly those outside the digital humanities. Persistent Identification Service (B2HANDLE): persistent identification of the main objects of the full-text corpus, namely the newspaper titles and individual issues. Multi-disciplinary joint metadata catalogue (B2FIND): so that scientists can either obtain the full corpus for machine processing or select just a portion of it, benefiting from the enrichment of article-level annotations with named entities and topics.
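Selecting a portion of the corpus through article-level annotations amounts to filtering articles on their named entities and topics. The in-memory sketch below shows only that selection logic. The `Article` record, its fields and the example identifiers are hypothetical; the slides do not describe how B2FIND or B2SHARE expose the annotations, so this does not model any EUDAT interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Article:
    """One newspaper article with its enrichment annotations (hypothetical model)."""
    issue_id: str  # identifier of the issue the article belongs to
    text: str
    entities: set[str] = field(default_factory=set)  # named entities, e.g. places, people
    topics: set[str] = field(default_factory=set)


def select_subset(
    articles: list[Article],
    entities: set[str] | None = None,
    topics: set[str] | None = None,
) -> list[Article]:
    """Keep articles matching at least one requested entity AND at least one requested topic.

    A criterion passed as None is not applied, so calling with no criteria
    returns the full corpus.
    """
    selected = []
    for article in articles:
        if entities is not None and not (article.entities & entities):
            continue
        if topics is not None and not (article.topics & topics):
            continue
        selected.append(article)
    return selected


def issues_of(articles: list[Article]) -> list[str]:
    """Distinct issue identifiers covered by a selection, in first-seen order."""
    return list(dict.fromkeys(a.issue_id for a in articles))
```

Mapping a selection back to its issues with `issues_of` mirrors the slide's point that titles and issues are the persistently identified units, so a subset could in principle be cited through the identifiers of the issues it covers.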
  • 19. Conclusions & Perspectives
  • 20. Conclusions. General conclusions: the EUDAT services were applied successfully, and heritage research data brought new requirements to EUDAT. HERBADROP: applying EUDAT's computational capabilities is revealing new challenges, namely how to deal with poor-quality OCR, and the fact that the large amount of data may limit accurate and exhaustive analysis. EUROPEANA: the pilot taught Europeana about the requirements of research usage, some of which may have an impact on its data providers.
  • 21. HERBADROP and EUROPEANA: some perspectives for data services. Improving the discoverability of heritage research data resources, both full-text based and metadata based. Additional heritage-specific metadata support in EUDAT: data format support and semantics; semantic annotations. Computational processing for heritage use cases: OCR; image analysis tools.
  • 22. For additional information: http://www.eudat.eu/. Nuno Freire, Europeana DSI/INESC-ID, nuno.freire@europeana.eu, http://www.europeana.eu/