Presentation DETA@SPNHC2019: Software for fast high quality transcription of digitized (herbarium) specimens
DETA is built for easy, fast and fully controlled transcription of information from forms, herbarium sheets and entomology collections. With DETA, millions of herbarium sheets can be transcribed efficiently, at consistent quality and low cost. The DETA application lets anywhere from 1 to 200 or more people work simultaneously, across different geographical locations, organisations or departments.
1.
Software for Fast High Quality Transcription of
Digitized (Herbarium) Specimens
Frank Veldhuizen
www.alembo.nl
The Netherlands and Suriname
“Making the Case for Natural History Collections”
As more effort and resources are spent on digitizing
collections and making them available to an ever
expanding audience we feel it becomes more and more
important to explain what museum collections are, how
we preserve them, and most importantly why we have
these collections and why they matter.
2.
Challenges in Transcribing
Large Collections
- Collections usually contain millions of
herbarium sheets
- Digitizing the collections is a big effort
- Transcribing the digitized collection is
an enormous effort
- In duration
- In human effort
- In consistent quality
- Costs are significant
3.
Data Entry and Transcription Application
- SaaS application built in Angular: browser-based input,
extremely light on computer resources and user-friendly
- High-speed transcribing: > 60 sheets per hour
- Multi-level quality control
- Uses existing look-up tables
- Resulting in high-quality input: > 99% correct
- Output in all types of modern formats
- CSV
- XLS
- DBA
5.
Executed Projects
Naturalis The Netherlands
- 3.000.000 sheets transcribed
- Start in September 2013
- Finish in May 2015
- Transcription of:
- Full Taxon
- collector info: collector, number, date
- Location info: location, country,
coordinates
- 60 transcriber staff at Alembo
- Quality control and project management by
Picturae and Naturalis
Oslo / Trondheim
- 450.000 sheets transcribed
- Start in 2016
- Finish in 2017
- Transcription of:
- Full Taxon Genus and Species
- collector info: collector, date
- Location info: location, country,
coordinates
- 30 transcriber staff at Alembo
7.
Current Projects
The Smithsonian Institution
– 1.000.000 scans (two projects)
– 700.000 covers to be transcribed
– Transcription of:
• Full Taxon
• collector info: collector, number, date
• Location info: location, country,
coordinates
– Duration 2-3 years
– 15 transcriber staff at Alembo
Royal Botanic Garden Sydney, Australia
– 700.000 sheets
– Start May 2019
– Transcription of:
• Full Taxon Genus and Species
• collector info: collector, date
• Location info: location, country, coordinates
– Duration 2 years
– 15 transcriber staff at Alembo
8.
Workflow with Multi-Level, Two-Step Quality
Control
[Workflow diagram. Step labels: Transcribing; Quality Control (Internal Workflow and Control, Transcribers); Quality Control (First Independent Control); Quality Control; Database. Flow labels between steps: Accepted Batches, Rejected Batches with Feedback, Approved Batches.]
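The batch flow named on this slide (transcribing, quality control, accepted and rejected batches, feedback, approved batches) can be pictured as a small state machine. The sketch below is only an illustration, not DETA's actual implementation: the class and stage names are hypothetical, and it assumes two control levels (internal, then independent) and that a rejected batch goes back to transcription together with its feedback.

```python
from enum import Enum, auto


class Stage(Enum):
    TRANSCRIBING = auto()
    INTERNAL_QC = auto()
    INDEPENDENT_QC = auto()
    APPROVED = auto()


# Assumed order in which an accepted batch moves forward.
NEXT_STAGE = {
    Stage.INTERNAL_QC: Stage.INDEPENDENT_QC,
    Stage.INDEPENDENT_QC: Stage.APPROVED,
}


class Batch:
    """Hypothetical batch of transcribed sheets moving through control."""

    def __init__(self, batch_id):
        self.batch_id = batch_id
        self.stage = Stage.TRANSCRIBING
        self.feedback = []  # notes returned with rejected batches

    def submit(self):
        """Transcribers hand a finished batch to internal quality control."""
        if self.stage is not Stage.TRANSCRIBING:
            raise ValueError("only a batch in transcription can be submitted")
        self.stage = Stage.INTERNAL_QC

    def accept(self):
        """A controller accepts the batch; it moves to the next level."""
        if self.stage not in NEXT_STAGE:
            raise ValueError("batch is not under quality control")
        self.stage = NEXT_STAGE[self.stage]

    def reject(self, note):
        """A controller rejects the batch; it returns with feedback."""
        if self.stage not in NEXT_STAGE:
            raise ValueError("batch is not under quality control")
        self.feedback.append(note)
        self.stage = Stage.TRANSCRIBING


# Example: one rejection at internal control, then approval.
batch = Batch("B-001")
batch.submit()
batch.reject("collector date missing on several sheets")
batch.submit()
batch.accept()  # passes internal control
batch.accept()  # passes independent control; batch is now approved
```

Keeping the feedback on the batch itself means transcribers see why it was returned when they pick it up again.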
9.
When a high level of quality input is of
the essence
DETA provides awesome quality monitoring tools:
- Multiple levels of control can be implemented
- This allows for control by independent parties and at multiple organisational levels
- A specific (and random) sample size can be taken
- This allows the control percentage to be increased or decreased based on
delivered quality
- In practice, a higher percentage of the transcribed herbarium sheets is
checked at the start, and the level of control can be reduced as
transcription progresses.
- Control per input field is possible
- This makes it possible to focus on the most important fields
- Quality can be monitored per person
- This allows for specific training in case of consistent errors
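The sampling idea on this slide (check a random share of each batch, then raise or lower that share depending on delivered quality) could look roughly like the sketch below. It is an illustration under stated assumptions, not DETA's actual logic: the function names, the 5-point step and the 5% floor are hypothetical, and the 1% error target is taken only from the "> 99% correct" figure quoted earlier in the presentation.

```python
import math
import random


def draw_sample(sheet_ids, control_percent, rng):
    """Pick a random subset of a batch for checking (at least one sheet)."""
    if not sheet_ids:
        return []
    size = max(1, math.ceil(len(sheet_ids) * control_percent / 100))
    return rng.sample(sheet_ids, size)


def error_rate(sample, has_error):
    """Share of sampled sheets for which the controller found an error."""
    if not sample:
        return 0.0
    return sum(1 for sheet in sample if has_error(sheet)) / len(sample)


def next_control_percent(current, observed_error_rate,
                         target_error_rate=0.01, step=5,
                         lowest=5, highest=100):
    """Tighten control after a poor batch, relax it after a good one.

    All thresholds here are hypothetical defaults for illustration.
    """
    if observed_error_rate > target_error_rate:
        return min(highest, current + step)
    return max(lowest, current - step)


# Example: check 20% of a 200-sheet batch, then adjust the percentage.
rng = random.Random(2019)            # fixed seed so the run is repeatable
batch = list(range(1, 201))          # hypothetical sheet identifiers
sample = draw_sample(batch, 20, rng)
flagged = set(sample[:1])            # pretend the controller flagged one sheet
rate = error_rate(sample, lambda sheet: sheet in flagged)
control_percent = next_control_percent(20, rate)
```

The same error tally could be kept per input field or per transcriber to support the field-level and person-level monitoring listed above.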
11.
When to utilize DETA and/or Alembo
Data Entry and Transcription Application
- Transcribing large collections
- In a predictable time period
- When high quality is required
- Easy to use, low cost
Alembo
- When advice is welcome
- Within a defined period
- Professional transcribers
- High quality
DETA Licence
- Commercial application
- Continuous developments
- Implementation fee
- Licence fee per user per month
--- Processing ---
- Assign batches (batches can also be assigned to users automatically if this has been set up in the user settings)
- Process a batch as a normal Operator
- Check a batch as a Controller
- Export a batch