SlideShare a Scribd company logo
1 of 22
An evaluation of taxonomic name finding & next steps in Biodiversity Heritage Library (BHL) developments  Chris Freeland Technical Director, BHL Director of Bioinformatics,  Missouri Botanical Garden
Goals of BHL ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],http://www.biodiversitylibrary.org
BHL Institutions ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Now Online Only 290 million to go! See you in 2048!
Scanning Operations ,[object Object],[object Object],[object Object],[object Object],[object Object],Locations of BHL/IA Scanning Centers
Complexities of distributed, mass scanning from NYBG from Smithsonian
Open Access Data The snakes of Australia ; an illustrated and descriptive catalogue of all the known species. By Gerard Krefft...   Publisher: Sydney,T. Richards, Government Printer,1869.  PDF OCR XML JP2
Name Finding via  TaxonFinder
Raw Image Converted to text via OCR Name finding via TaxonFinder Extract names Submit to NameBank SOAP response Name Finding in action with Taxonomic Intelligence…
Name Finding Stats to date * ,[object Object],[object Object],[object Object],[object Object],*19 October 2008
 
 
APIs & Data Sharing ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Name Finding Evaluation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],See Poster in hall
Characteristics of sample = 86.91% 2610 Total Number of Unique Names 3003 Total Number of Names 7.7 Average Number of Names per Page 446.8 Average Number of Words per Page 392 Number of Pages
OCR error rate  for names only Top OCR errors Of the 3,003 names, 1,056 were incorrectly transcribed by OCR. e->o 14 c->e 7 h->ii 13 i->l 6 h->l 12 u->n 5 u->ii 11 u->I 4 r->i 10 e->c 3 l->i 9 Omit Space 2 n->v 8 Insert Space 1 35.16%
Performances of algorithms TaxonFinder FAT Excluding names with OCR errors Including names with OCR errors 28.20% 40.32% Precision 23.34% 36.62% Recall 25.77% 38.47% F-score 32.25% 43.77% Precision 17.21% 25.82% Recall 24.73% 34.80% F-score
Considerations ,[object Object],[object Object],[object Object],[object Object],[object Object]
Recommendations ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Up next: BHL Article Repository ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
And if that wasn’t enough… ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Contact ,[object Object],[object Object],[object Object],[object Object],[object Object]

More Related Content

What's hot

2013 DataCite Summer Meeting - FundRef cooperation with CrossRef (Chuck Koshe...
2013 DataCite Summer Meeting - FundRef cooperation with CrossRef (Chuck Koshe...2013 DataCite Summer Meeting - FundRef cooperation with CrossRef (Chuck Koshe...
2013 DataCite Summer Meeting - FundRef cooperation with CrossRef (Chuck Koshe...datacite
 
Creating Incentives
Creating IncentivesCreating Incentives
Creating Incentivesdatacite
 
Lines of Communication: Effectively Advocating Open Access Repositories
Lines of Communication: Effectively Advocating Open Access RepositoriesLines of Communication: Effectively Advocating Open Access Repositories
Lines of Communication: Effectively Advocating Open Access RepositoriesGaz Johnson
 
NGB Documenation System SESTO (4 February 2004)
NGB Documenation System SESTO (4 February 2004)NGB Documenation System SESTO (4 February 2004)
NGB Documenation System SESTO (4 February 2004)Dag Endresen
 
Consuming Linked Data by Machines - WWW2010
Consuming Linked Data by Machines - WWW2010Consuming Linked Data by Machines - WWW2010
Consuming Linked Data by Machines - WWW2010Juan Sequeda
 
Data Journalism - Cleaning Data
Data Journalism - Cleaning DataData Journalism - Cleaning Data
Data Journalism - Cleaning DataBahareh Heravi
 
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Ross Mounce
 
DataCite at APE 2011
DataCite at APE 2011DataCite at APE 2011
DataCite at APE 2011datacite
 
Museum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themMuseum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themRoss Mounce
 
Modern Tools & Rationales for 21st Century Research
Modern Tools & Rationales  for 21st Century ResearchModern Tools & Rationales  for 21st Century Research
Modern Tools & Rationales for 21st Century ResearchRoss Mounce
 
SciDataCon 2014 TDM Workshop Intro Slides
SciDataCon 2014 TDM Workshop Intro SlidesSciDataCon 2014 TDM Workshop Intro Slides
SciDataCon 2014 TDM Workshop Intro SlidesJenny Molloy
 
Open Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | FutureOpen Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | FutureRoss Mounce
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod GmodJun Zhao
 
Research Objects Tutorial (TPDL)
Research Objects Tutorial (TPDL)Research Objects Tutorial (TPDL)
Research Objects Tutorial (TPDL)dgarijo
 
Research Objects in Scientific Publications
Research Objects in Scientific PublicationsResearch Objects in Scientific Publications
Research Objects in Scientific Publicationsdgarijo
 
Triplificating and linking XBRL financial data
Triplificating and linking XBRL financial dataTriplificating and linking XBRL financial data
Triplificating and linking XBRL financial dataRoberto García
 

What's hot (20)

2013 DataCite Summer Meeting - FundRef cooperation with CrossRef (Chuck Koshe...
2013 DataCite Summer Meeting - FundRef cooperation with CrossRef (Chuck Koshe...2013 DataCite Summer Meeting - FundRef cooperation with CrossRef (Chuck Koshe...
2013 DataCite Summer Meeting - FundRef cooperation with CrossRef (Chuck Koshe...
 
Creating Incentives
Creating IncentivesCreating Incentives
Creating Incentives
 
Lines of Communication: Effectively Advocating Open Access Repositories
Lines of Communication: Effectively Advocating Open Access RepositoriesLines of Communication: Effectively Advocating Open Access Repositories
Lines of Communication: Effectively Advocating Open Access Repositories
 
NGB Documenation System SESTO (4 February 2004)
NGB Documenation System SESTO (4 February 2004)NGB Documenation System SESTO (4 February 2004)
NGB Documenation System SESTO (4 February 2004)
 
Consuming Linked Data by Machines - WWW2010
Consuming Linked Data by Machines - WWW2010Consuming Linked Data by Machines - WWW2010
Consuming Linked Data by Machines - WWW2010
 
Data Journalism - Cleaning Data
Data Journalism - Cleaning DataData Journalism - Cleaning Data
Data Journalism - Cleaning Data
 
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
 
DataCite at APE 2011
DataCite at APE 2011DataCite at APE 2011
DataCite at APE 2011
 
Museum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themMuseum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on them
 
Modern Tools & Rationales for 21st Century Research
Modern Tools & Rationales  for 21st Century ResearchModern Tools & Rationales  for 21st Century Research
Modern Tools & Rationales for 21st Century Research
 
Friday talk 11.02.2011
Friday talk 11.02.2011Friday talk 11.02.2011
Friday talk 11.02.2011
 
SciDataCon 2014 TDM Workshop Intro Slides
SciDataCon 2014 TDM Workshop Intro SlidesSciDataCon 2014 TDM Workshop Intro Slides
SciDataCon 2014 TDM Workshop Intro Slides
 
Data analytics courses
Data analytics coursesData analytics courses
Data analytics courses
 
Open Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | FutureOpen Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | Future
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
 
Materials informatics
Materials informaticsMaterials informatics
Materials informatics
 
Research Objects Tutorial (TPDL)
Research Objects Tutorial (TPDL)Research Objects Tutorial (TPDL)
Research Objects Tutorial (TPDL)
 
Research Objects in Scientific Publications
Research Objects in Scientific PublicationsResearch Objects in Scientific Publications
Research Objects in Scientific Publications
 
Triplificating and linking XBRL financial data
Triplificating and linking XBRL financial dataTriplificating and linking XBRL financial data
Triplificating and linking XBRL financial data
 
Sql can be cool again
Sql can be cool againSql can be cool again
Sql can be cool again
 

Similar to An evaluation of taxonomic name finding & next steps in Biodiversity Heritage Library (BHL) developments

BHL Technologies: Review for BHL-Australia
BHL Technologies: Review for BHL-AustraliaBHL Technologies: Review for BHL-Australia
BHL Technologies: Review for BHL-AustraliaChris Freeland
 
Digitization and enhancement of biodiversity literature through OCR, scientif...
Digitization and enhancement of biodiversity literature through OCR, scientif...Digitization and enhancement of biodiversity literature through OCR, scientif...
Digitization and enhancement of biodiversity literature through OCR, scientif...Chris Freeland
 
OCLC Research @ U of Calgary: New directions for metadata workflows across li...
OCLC Research @ U of Calgary: New directions for metadata workflows across li...OCLC Research @ U of Calgary: New directions for metadata workflows across li...
OCLC Research @ U of Calgary: New directions for metadata workflows across li...OCLC Research
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeEdward Baker
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeVince Smith
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
Riding the wave - Paradigm shifts in information access
Riding the wave - Paradigm shifts in information accessRiding the wave - Paradigm shifts in information access
Riding the wave - Paradigm shifts in information accessdatacite
 
BHL Technical Director's Report, Mar. 2014
BHL Technical Director's Report, Mar. 2014BHL Technical Director's Report, Mar. 2014
BHL Technical Director's Report, Mar. 2014William Ulate
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the partsCarole Goble
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Webebiquity
 
Global Library of Life: The Biodiversity Heritage Library
Global Library of Life: The Biodiversity Heritage LibraryGlobal Library of Life: The Biodiversity Heritage Library
Global Library of Life: The Biodiversity Heritage LibraryMartin Kalfatovic
 
Next Generation Technical Services May 2009 Calhoun
Next Generation Technical Services May 2009 CalhounNext Generation Technical Services May 2009 Calhoun
Next Generation Technical Services May 2009 CalhounKaren S Calhoun
 
Current metadata landscape in the library world (Getaneh Alemu)
Current metadata landscape in the library world (Getaneh Alemu)Current metadata landscape in the library world (Getaneh Alemu)
Current metadata landscape in the library world (Getaneh Alemu)Getaneh Alemu
 
Scratchpads introductory presentation 45mins
Scratchpads introductory presentation   45minsScratchpads introductory presentation   45mins
Scratchpads introductory presentation 45minsDimitrios Koureas
 
Adding Meaning To Your Data
Adding Meaning To Your DataAdding Meaning To Your Data
Adding Meaning To Your DataDuncan Hull
 
Training daypresentation
Training daypresentationTraining daypresentation
Training daypresentationAmy Fry
 
OhioLINK ERM Forum: The Front End
OhioLINK ERM Forum: The Front EndOhioLINK ERM Forum: The Front End
OhioLINK ERM Forum: The Front EndAmy Fry
 
Digital Libraries for Science: Botanicus and the Biodiversity Heritage Library
Digital Libraries for Science: Botanicus and the Biodiversity Heritage LibraryDigital Libraries for Science: Botanicus and the Biodiversity Heritage Library
Digital Libraries for Science: Botanicus and the Biodiversity Heritage LibraryChris Freeland
 

Similar to An evaluation of taxonomic name finding & next steps in Biodiversity Heritage Library (BHL) developments (20)

BHL Technologies: Review for BHL-Australia
BHL Technologies: Review for BHL-AustraliaBHL Technologies: Review for BHL-Australia
BHL Technologies: Review for BHL-Australia
 
Digitization and enhancement of biodiversity literature through OCR, scientif...
Digitization and enhancement of biodiversity literature through OCR, scientif...Digitization and enhancement of biodiversity literature through OCR, scientif...
Digitization and enhancement of biodiversity literature through OCR, scientif...
 
OCLC Research @ U of Calgary: New directions for metadata workflows across li...
OCLC Research @ U of Calgary: New directions for metadata workflows across li...OCLC Research @ U of Calgary: New directions for metadata workflows across li...
OCLC Research @ U of Calgary: New directions for metadata workflows across li...
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
Riding the wave - Paradigm shifts in information access
Riding the wave - Paradigm shifts in information accessRiding the wave - Paradigm shifts in information access
Riding the wave - Paradigm shifts in information access
 
BHL Technical Director's Report, Mar. 2014
BHL Technical Director's Report, Mar. 2014BHL Technical Director's Report, Mar. 2014
BHL Technical Director's Report, Mar. 2014
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Web
 
Global Library of Life: The Biodiversity Heritage Library
Global Library of Life: The Biodiversity Heritage LibraryGlobal Library of Life: The Biodiversity Heritage Library
Global Library of Life: The Biodiversity Heritage Library
 
Next Generation Technical Services May 2009 Calhoun
Next Generation Technical Services May 2009 CalhounNext Generation Technical Services May 2009 Calhoun
Next Generation Technical Services May 2009 Calhoun
 
TIDSR
TIDSRTIDSR
TIDSR
 
Current metadata landscape in the library world (Getaneh Alemu)
Current metadata landscape in the library world (Getaneh Alemu)Current metadata landscape in the library world (Getaneh Alemu)
Current metadata landscape in the library world (Getaneh Alemu)
 
Scratchpads introductory presentation 45mins
Scratchpads introductory presentation   45minsScratchpads introductory presentation   45mins
Scratchpads introductory presentation 45mins
 
Adding Meaning To Your Data
Adding Meaning To Your DataAdding Meaning To Your Data
Adding Meaning To Your Data
 
Training daypresentation
Training daypresentationTraining daypresentation
Training daypresentation
 
OhioLINK ERM Forum: The Front End
OhioLINK ERM Forum: The Front EndOhioLINK ERM Forum: The Front End
OhioLINK ERM Forum: The Front End
 
Digital Libraries for Science: Botanicus and the Biodiversity Heritage Library
Digital Libraries for Science: Botanicus and the Biodiversity Heritage LibraryDigital Libraries for Science: Botanicus and the Biodiversity Heritage Library
Digital Libraries for Science: Botanicus and the Biodiversity Heritage Library
 
FAIRy Stories
FAIRy StoriesFAIRy Stories
FAIRy Stories
 

More from Chris Freeland

From Eames & Young to Pruitt-Igoe
From Eames & Young to Pruitt-IgoeFrom Eames & Young to Pruitt-Igoe
From Eames & Young to Pruitt-IgoeChris Freeland
 
Documenting the Now: Supporting Scholarly Use & Preservation of Social Media ...
Documenting the Now: Supporting Scholarly Use & Preservation of Social Media ...Documenting the Now: Supporting Scholarly Use & Preservation of Social Media ...
Documenting the Now: Supporting Scholarly Use & Preservation of Social Media ...Chris Freeland
 
Building the Missouri Hub for DPLA
Building the Missouri Hub for DPLABuilding the Missouri Hub for DPLA
Building the Missouri Hub for DPLAChris Freeland
 
Documenting Ferguson: Building a community digital repository
Documenting Ferguson: Building a community digital repositoryDocumenting Ferguson: Building a community digital repository
Documenting Ferguson: Building a community digital repositoryChris Freeland
 
Newman Numismatic Portal Overview - Mar 2015
Newman Numismatic Portal Overview - Mar 2015Newman Numismatic Portal Overview - Mar 2015
Newman Numismatic Portal Overview - Mar 2015Chris Freeland
 
Establishing the Missouri Hub: A Service Hub for DPLA
Establishing the Missouri Hub: A Service Hub for DPLAEstablishing the Missouri Hub: A Service Hub for DPLA
Establishing the Missouri Hub: A Service Hub for DPLAChris Freeland
 
Organizing a DPLA Service Hub in Missouri
Organizing a DPLA Service Hub in MissouriOrganizing a DPLA Service Hub in Missouri
Organizing a DPLA Service Hub in MissouriChris Freeland
 
Pilots & Partnerships: University Academic Computing and University Libraries...
Pilots & Partnerships: University Academic Computing and University Libraries...Pilots & Partnerships: University Academic Computing and University Libraries...
Pilots & Partnerships: University Academic Computing and University Libraries...Chris Freeland
 
Built Works Registry: Geocoding Biodiversity Heritage Library
Built Works Registry: Geocoding Biodiversity Heritage LibraryBuilt Works Registry: Geocoding Biodiversity Heritage Library
Built Works Registry: Geocoding Biodiversity Heritage LibraryChris Freeland
 
A Digitization Primer for Botanical and Horticultural Librarians
A Digitization Primer for Botanical and Horticultural LibrariansA Digitization Primer for Botanical and Horticultural Librarians
A Digitization Primer for Botanical and Horticultural LibrariansChris Freeland
 
Mainstreaming Digital Imaging: Missouri Botanical Garden Archives
Mainstreaming Digital Imaging: Missouri Botanical Garden Archives Mainstreaming Digital Imaging: Missouri Botanical Garden Archives
Mainstreaming Digital Imaging: Missouri Botanical Garden Archives Chris Freeland
 
MBG Rare Book Digitization Project (2003)
MBG Rare Book Digitization Project (2003)MBG Rare Book Digitization Project (2003)
MBG Rare Book Digitization Project (2003)Chris Freeland
 
BHL: Your 24hr Library
BHL: Your 24hr LibraryBHL: Your 24hr Library
BHL: Your 24hr LibraryChris Freeland
 
Seeding links from Wikipedia to BHL (2008 - 2012)
Seeding links from Wikipedia to BHL (2008 - 2012)Seeding links from Wikipedia to BHL (2008 - 2012)
Seeding links from Wikipedia to BHL (2008 - 2012)Chris Freeland
 
BHL: Assigning DOIs & Other Identifiers to Legacy Literature
BHL: Assigning DOIs & Other Identifiers to Legacy LiteratureBHL: Assigning DOIs & Other Identifiers to Legacy Literature
BHL: Assigning DOIs & Other Identifiers to Legacy LiteratureChris Freeland
 
Life & Literature Future Framing for BHL
Life & Literature Future Framing for BHLLife & Literature Future Framing for BHL
Life & Literature Future Framing for BHLChris Freeland
 
Approaches to preserving digitized taxonomic data
Approaches to preserving digitized taxonomic dataApproaches to preserving digitized taxonomic data
Approaches to preserving digitized taxonomic dataChris Freeland
 
Scribbles & Scraps: Darwin’s Library & Annotated Literature
Scribbles & Scraps: Darwin’s Library & Annotated LiteratureScribbles & Scraps: Darwin’s Library & Annotated Literature
Scribbles & Scraps: Darwin’s Library & Annotated LiteratureChris Freeland
 
Plant Name Services Using Tropicos
Plant Name Services Using TropicosPlant Name Services Using Tropicos
Plant Name Services Using TropicosChris Freeland
 

More from Chris Freeland (20)

From Eames & Young to Pruitt-Igoe
From Eames & Young to Pruitt-IgoeFrom Eames & Young to Pruitt-Igoe
From Eames & Young to Pruitt-Igoe
 
Documenting the Now: Supporting Scholarly Use & Preservation of Social Media ...
Documenting the Now: Supporting Scholarly Use & Preservation of Social Media ...Documenting the Now: Supporting Scholarly Use & Preservation of Social Media ...
Documenting the Now: Supporting Scholarly Use & Preservation of Social Media ...
 
Building the Missouri Hub for DPLA
Building the Missouri Hub for DPLABuilding the Missouri Hub for DPLA
Building the Missouri Hub for DPLA
 
Documenting Ferguson: Building a community digital repository
Documenting Ferguson: Building a community digital repositoryDocumenting Ferguson: Building a community digital repository
Documenting Ferguson: Building a community digital repository
 
Newman Numismatic Portal Overview - Mar 2015
Newman Numismatic Portal Overview - Mar 2015Newman Numismatic Portal Overview - Mar 2015
Newman Numismatic Portal Overview - Mar 2015
 
Establishing the Missouri Hub: A Service Hub for DPLA
Establishing the Missouri Hub: A Service Hub for DPLAEstablishing the Missouri Hub: A Service Hub for DPLA
Establishing the Missouri Hub: A Service Hub for DPLA
 
Organizing a DPLA Service Hub in Missouri
Organizing a DPLA Service Hub in MissouriOrganizing a DPLA Service Hub in Missouri
Organizing a DPLA Service Hub in Missouri
 
Pilots & Partnerships: University Academic Computing and University Libraries...
Pilots & Partnerships: University Academic Computing and University Libraries...Pilots & Partnerships: University Academic Computing and University Libraries...
Pilots & Partnerships: University Academic Computing and University Libraries...
 
Built Works Registry: Geocoding Biodiversity Heritage Library
Built Works Registry: Geocoding Biodiversity Heritage LibraryBuilt Works Registry: Geocoding Biodiversity Heritage Library
Built Works Registry: Geocoding Biodiversity Heritage Library
 
A Digitization Primer for Botanical and Horticultural Librarians
A Digitization Primer for Botanical and Horticultural LibrariansA Digitization Primer for Botanical and Horticultural Librarians
A Digitization Primer for Botanical and Horticultural Librarians
 
Mainstreaming Digital Imaging: Missouri Botanical Garden Archives
Mainstreaming Digital Imaging: Missouri Botanical Garden Archives Mainstreaming Digital Imaging: Missouri Botanical Garden Archives
Mainstreaming Digital Imaging: Missouri Botanical Garden Archives
 
MBG Rare Book Digitization Project (2003)
MBG Rare Book Digitization Project (2003)MBG Rare Book Digitization Project (2003)
MBG Rare Book Digitization Project (2003)
 
BHL: Your 24hr Library
BHL: Your 24hr LibraryBHL: Your 24hr Library
BHL: Your 24hr Library
 
Seeding links from Wikipedia to BHL (2008 - 2012)
Seeding links from Wikipedia to BHL (2008 - 2012)Seeding links from Wikipedia to BHL (2008 - 2012)
Seeding links from Wikipedia to BHL (2008 - 2012)
 
BHL: Assigning DOIs & Other Identifiers to Legacy Literature
BHL: Assigning DOIs & Other Identifiers to Legacy LiteratureBHL: Assigning DOIs & Other Identifiers to Legacy Literature
BHL: Assigning DOIs & Other Identifiers to Legacy Literature
 
Global BHL Activities
Global BHL ActivitiesGlobal BHL Activities
Global BHL Activities
 
Life & Literature Future Framing for BHL
Life & Literature Future Framing for BHLLife & Literature Future Framing for BHL
Life & Literature Future Framing for BHL
 
Approaches to preserving digitized taxonomic data
Approaches to preserving digitized taxonomic dataApproaches to preserving digitized taxonomic data
Approaches to preserving digitized taxonomic data
 
Scribbles & Scraps: Darwin’s Library & Annotated Literature
Scribbles & Scraps: Darwin’s Library & Annotated LiteratureScribbles & Scraps: Darwin’s Library & Annotated Literature
Scribbles & Scraps: Darwin’s Library & Annotated Literature
 
Plant Name Services Using Tropicos
Plant Name Services Using TropicosPlant Name Services Using Tropicos
Plant Name Services Using Tropicos
 

Recently uploaded

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

An evaluation of taxonomic name finding & next steps in Biodiversity Heritage Library (BHL) developments

  • 1. An evaluation of taxonomic name finding & next steps in Biodiversity Heritage Library (BHL) developments Chris Freeland Technical Director, BHL Director of Bioinformatics, Missouri Botanical Garden
  • 2.
  • 3.
  • 4.
  • 5.
  • 6. Complexities of distributed, mass scanning from NYBG from Smithsonian
  • 7. Open Access Data The snakes of Australia ; an illustrated and descriptive catalogue of all the known species. By Gerard Krefft... Publisher: Sydney,T. Richards, Government Printer,1869. PDF OCR XML JP2
  • 8. Name Finding via TaxonFinder
  • 9. Raw Image Converted to text via OCR Name finding via TaxonFinder Extract names Submit to NameBank SOAP response Name Finding in action with Taxonomic Intelligence…
  • 10.
  • 11.  
  • 12.  
  • 13.
  • 14.
  • 15. Characteristics of sample = 86.91% 2610 Total Number of Unique Names 3003 Total Number of Names 7.7 Average Number of Names per Page 446.8 Average Number of Words per Page 392 Number of Pages
  • 16. OCR error rate for names only Top OCR errors Of the 3,003 names, 1,056 were incorrectly transcribed by OCR. e->o 14 c->e 7 h->ii 13 i->l 6 h->l 12 u->n 5 u->ii 11 u->I 4 r->i 10 e->c 3 l->i 9 Omit Space 2 n->v 8 Insert Space 1 35.16%
  • 17. Performances of algorithms TaxonFinder FAT Excluding names with OCR errors Including names with OCR errors 28.20% 40.32% Precision 23.34% 36.62% Recall 25.77% 38.47% F-score 32.25% 43.77% Precision 17.21% 25.82% Recall 24.73% 34.80% F-score
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.