Accessing the original observation
data captured during plant
exploration missions for collecting
crop diversity
Bioversity International, Via dei Tre Denari 472/a, Maccarese, Rome, Italy
Hannes Gaisberger, Massimo Buonaiuto, Federico Mattei, Andrea De Pirro,
Valentina Barbiero, Simone Mori, Imke Thormann, Tom Hazekamp, Elizabeth
Arnaud
Agenda
• Part 1: Safeguarding the original paper
documents by scanning and digitizing the data
– Hannes Gaisberger
• Part 2: Creation of a public repository of full
scanned documents enabling access to the
full text – Massimo Buonaiuto
Bioversity supported germplasm
collecting missions
• Since 1974, Bioversity International has
supported more than 550 germplasm collecting
missions yielding 225,875 samples and
covering 4,300 species from 137 countries
• Samples were sent to several genebanks
worldwide for safety duplication, conservation
and potential distribution
• Other CGIAR centers organized various
collecting missions for their mandate crops
Original observation data is essential for:
• Identifying duplicates between
collections and gaps in diversity –
value for genebank curators and
collecting actions
• Tracking original sample & country
of origin in pedigrees – value for
Breeders and Benefit Sharing
Scanning of field notebooks and
related documents
• Collectors recorded key sample information
(passport data) and other observation data in field
books
Original observation: a treasure for
genebanks and breeders
•Genus and Species
•Collecting Number
•Site Information: Admin boundaries,
Latitude, Longitude and Elevation
•Collecting Source and Sample Status
The collecting form contains the
botanical classification along with
location details, environment,
cultural practices, disease and
pest presence and symptoms,
and traditional uses
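
The passport fields listed on this slide can be pictured as a simple record type. The following is only a minimal sketch: the class name, field names and sample values are hypothetical and are not taken from the project's database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PassportRecord:
    """One collected sample, using the fields named on the slide.

    All field names are illustrative assumptions, not the project's schema.
    """
    genus: str
    species: str
    collecting_number: str
    admin_boundary: Optional[str] = None
    latitude: Optional[float] = None      # decimal degrees
    longitude: Optional[float] = None     # decimal degrees
    elevation_m: Optional[float] = None   # metres above sea level
    collecting_source: Optional[str] = None
    sample_status: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        return [name for name, value in vars(self).items() if value is None]


# Made-up example values, for illustration only.
record = PassportRecord(
    genus="Oryza",
    species="sativa",
    collecting_number="EX-001",
    latitude=18.0,
    longitude=102.6,
)
print(record.missing_fields())
```

Listing the empty fields per record is one way to see which passport data still has to be completed from other sources.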
Identification and quality-checking
in databases
• Different publicly available genebank inventories are
checked in order to track corresponding samples and
complete missing passport data
Integration of quality passport
data
• Data extracted from field books and databases is
integrated into a sample-level database of collecting
missions
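
The two steps above (checking genebank inventories, then integrating everything at sample level) might be sketched as follows. This is an illustration under stated assumptions: the matching key (genus, species, collecting number), the dictionary keys and the rule that field-book values take precedence are all hypothetical, not the project's documented method.

```python
from typing import Dict, Iterable, List, Tuple


def match_key(record: Dict[str, str]) -> Tuple[str, str, str]:
    """Normalised key used to pair a field-book record with an inventory record."""
    return (
        record["genus"].strip().lower(),
        record["species"].strip().lower(),
        record["collecting_number"].strip().upper(),
    )


def integrate(field_book: Iterable[Dict[str, str]],
              inventory: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Fill blanks in field-book records from matching inventory records."""
    inventory_by_key = {match_key(r): r for r in inventory}
    integrated = []
    for rec in field_book:
        merged: Dict[str, object] = dict(rec)
        match = inventory_by_key.get(match_key(rec))
        if match is not None:
            for name, value in match.items():
                # Assumption: values transcribed from the field book win;
                # the inventory only fills fields that are empty or absent.
                if merged.get(name) in (None, ""):
                    merged[name] = value
        merged["linked_to_accession"] = match is not None
        integrated.append(merged)
    return integrated


# Made-up records, for illustration only.
field_book = [
    {"genus": "Oryza", "species": "sativa", "collecting_number": "ex-001", "country": ""},
    {"genus": "Zea", "species": "mays", "collecting_number": "EX-002", "country": "PER"},
]
inventory = [
    {"genus": "oryza", "species": "sativa", "collecting_number": "EX-001",
     "country": "LAO", "accession_number": "ACC-9"},
]
result = integrate(field_book, inventory)
```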
Results in figures
• To date, the quality of 101,171
passport data records from 375
collecting missions has been
improved through data extracted from
scanned documentation
• 56,454 of these collected samples are
linked to genebank accessions in 51
institutes worldwide
Priority crops / use group    Number of collected samples
Forages                        44056
Rice                           25022
Maize                          16484
Beans                          10976
Wheat                           7507
Cowpea                          7473
Potato                          7146
Pearl millet                    6662
Barley                          4429
Groundnut                       2928
Finger millet                   2850
Chickpea                        1467
Banana                          1326
Pigeon pea                       999
Others                         86550
Total                         225875
• A total of 43,637 scanned pages are saved as 1,063 PDF
files and stored in an online repository alongside the 26,000
other files scanned by CGIAR centers and partners
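
As a quick arithmetic check of the figures on this slide, the per-crop counts can be summed and compared with the reported total. The numbers are copied from the table; only the variable names are introduced here.

```python
# Sample counts per priority crop / use group, as listed on the slide.
samples_by_crop = {
    "Forages": 44056,
    "Rice": 25022,
    "Maize": 16484,
    "Beans": 10976,
    "Wheat": 7507,
    "Cowpea": 7473,
    "Potato": 7146,
    "Pearl millet": 6662,
    "Barley": 4429,
    "Groundnut": 2928,
    "Finger millet": 2850,
    "Chickpea": 1467,
    "Banana": 1326,
    "Pigeon pea": 999,
    "Others": 86550,
}
reported_total = 225875

computed_total = sum(samples_by_crop.values())
print(computed_total == reported_total)  # True: the rows add up to the total

# Average number of scanned pages per PDF file.
pages_per_file = 43637 / 1063
print(round(pages_per_file))  # 41
```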
Publishing the data and attached
information
• End of 2010: deadline for finishing the work
on Bioversity-supported missions
• Make the full text available in the online
repository and publish the
collecting mission database
• Visualization: Map sites where
diversity was collected (after
georeferencing with BioGeomancer)
• Various projects, such as Genesys, address gap analysis and diversity
analysis; partners are encouraged to perform the same work and share the full
text and data – links to CWR information, museum herbaria information,
literature
Public access to the scanned collecting
missions documents
A Repository that presently contains 27,000 Collecting
Missions Files from CGIAR Centers and partners:
• Agricultural Research Centre (ARC) of Lao People’s
Democratic Republic
• AfricaRice
• International Institute of Tropical Agriculture (IITA)
• Bioversity International
• International Rice Research Institute (IRRI)
Typology of the documents produced by
Collectors
1) Mission Reports
2) Summary Forms
3) Sample lists
4) Collecting Forms
5) Accession Vouchers
6) Newsletters
7) Factsheets
8) Distribution lists
9) Field Books
Documents Types Hierarchy
Analysis of Metadata (1/5)
Analysis of Metadata (2/5)
Analysis of Metadata (3/5)
Analysis of Metadata – Darwin Core for
Germplasm (4/5)
Analysis of Metadata (5/5)
Darwin Core Germplasm metadata
+
Collecting Missions metadata
=
Metadata for Collecting Missions Documents
How users will access the Repository
Alfresco DMS
Typo3 CMS
Import of 27,000 PDF Files
The PDF files are imported in 3 phases:
1. Conversion of institutional metadata into Darwin
Core Germplasm metadata
2. Association of metadata with all PDF files, using
heterogeneous sources (databases, Excel files,
filenames, etc.)
3. Batch upload of all PDF files, each together with
its associated metadata file in the DC-Germplasm
standard.
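
The three phases above could be sketched roughly as below. Everything specific here is an assumption made for illustration: the source column names, the target term names (standing in for Darwin Core Germplasm terms, which the slides do not list), matching PDFs to rows by filename stem, and writing a JSON sidecar file in place of the actual batch upload.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical mapping from institutional column names to target term names.
FIELD_MAP = {
    "COLLNUMB": "recordNumber",
    "GENUS": "genus",
    "SPECIES": "specificEpithet",
}


def convert(row: Dict[str, str]) -> Dict[str, str]:
    """Phase 1: rename institutional fields; unmapped fields are dropped."""
    return {FIELD_MAP[name]: value for name, value in row.items() if name in FIELD_MAP}


def associate(pdfs: Iterable[Path],
              rows_by_stem: Dict[str, Dict[str, str]]) -> Iterator[Tuple[Path, Dict[str, str]]]:
    """Phase 2: pair each PDF with its metadata row, here via the filename stem."""
    for pdf in pdfs:
        row = rows_by_stem.get(pdf.stem)
        if row is not None:
            yield pdf, convert(row)


def write_sidecars(pairs: Iterable[Tuple[Path, Dict[str, str]]]) -> List[Path]:
    """Phase 3 stand-in: write one metadata file next to each PDF."""
    written = []
    for pdf, metadata in pairs:
        sidecar = pdf.with_name(pdf.stem + ".metadata.json")
        sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
        written.append(sidecar)
    return written
```

Calling `write_sidecars(associate(pdf_paths, rows_by_stem))` would leave one metadata file beside each matched PDF. PDFs without a matching row are skipped silently in this sketch; a real import would probably log them for manual review.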
Public Search Mask (1/3)
Public Search Mask (2/3)
Public Search Mask (3/3)
How users will manage and publish
documents
• Simple Workflow to
publish into the
Repository:
1. Upload the file in private
user Home Space
2. Edit metadata
3. Approve the document for
public repository with a
click
... and the file will be public
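
The three-step workflow can be pictured as a small state model. This is a sketch only, not the repository's implementation: the class and method names are hypothetical, and requiring metadata before approval is an assumption the slides do not state.

```python
from enum import Enum


class State(Enum):
    PRIVATE = "private"  # step 1: file sits in the user's private home space
    PUBLIC = "public"    # step 3: approved for the public repository


class Document:
    def __init__(self, filename: str) -> None:
        self.filename = filename
        self.metadata: dict = {}
        self.state = State.PRIVATE

    def edit_metadata(self, **fields: str) -> None:
        """Step 2: add or overwrite metadata fields."""
        self.metadata.update(fields)

    def approve(self) -> None:
        """Step 3: publish. Requiring metadata first is an assumption."""
        if not self.metadata:
            raise ValueError("edit metadata before approving")
        self.state = State.PUBLIC


doc = Document("mission_report.pdf")    # step 1: upload
doc.edit_metadata(genus="Oryza")        # step 2: edit metadata
doc.approve()                           # step 3: approve with one call
print(doc.state.value)                  # public
```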
Summary
• Improved quality of passport data for about 100,000
collected samples from 137 countries
• 56,454 of these collected samples are linked to
genebank accessions in 51 institutes worldwide
• Collected 27,000 documents, classified into 9 document
types, with metadata
• Metadata extracted and parsed using Darwin
Core Germplasm standards
Open questions and challenges
- Interaction with Open Archives standards and the
Protocol for Metadata Harvesting (OAI-PMH)
- Integration with Crop Terminizer, University of
Manchester
- Web analytics for detailed monitoring of downloads
(referrers, visits, etc.) and web marketing
- CMIS protocol used to interact with content
management systems
- Metadata validation with crop scientists, collectors
http://www.central-repository.cgiar.org/
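
For the first open question, a harvesting request under the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) would look roughly like the sketch below. The `verb`, `metadataPrefix`, `from` and `set` arguments and the `oai_dc` prefix come from the OAI-PMH 2.0 specification; the endpoint URL is a placeholder, since the slides do not say whether or where the repository exposes such an interface.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date   # e.g. "2010-01-01"
    if set_spec:
        params["set"] = set_spec     # optional selective harvesting
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, not the repository's real address.
url = list_records_url("https://repository.example.org/oai", from_date="2010-01-01")
print(url)
```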
Guidelines for collecting samples
- Being revised and will be published in a new
section of the Crop genebank knowledge
base
- Adding guidelines for illustrating with photos that
support the tentative taxonomy, captured data and
GPS
THANK YOU!
