current

  BHL
  ubio.org
future
process

identify -> disambiguate / reconcile -> ID lookup
process

identify -> disambiguate / reconcile -> ID lookup


Mature & scalable, well defined and standardized
process

identify -> disambiguate / reconcile -> ID lookup


           in progress, needs API & standard
process

identify -> disambiguate / reconcile -> ID lookup


           GNI has API, needs standards
current response
<entity>
  <nameString>Abietineae</nameString>
  <namebankID>8401003</namebankID>
  <weblinks>
    <website>
      <title>Tropicos</title>
      <link>http://mobot.mobot.org/W3T/Search/vast.html</link>
      <logo>http://names.ubio.org/tools/image/tropicos.png</logo>
      <links>
        <link nameString="Abietineae Eichler">http://mobot.mobot.org/cgi-bin/search_vast?onda=N50205444</link>
      </links>
    </website>
  </weblinks>
</entity>
issues
the TF API is doing jobs it shouldn’t do

NameBank is a large but outdated dataset

“taxonfinder” has no idea what a NameBank ID actually is; it only knows strings

current code is completely dependent on www.ubio.org and is not scalable
why change?
scaling - we can run 10,000 taxonfinding processes using any algorithm
that supports the standard, which means super fast indexing of BHL

future-proofing for devs - any new namefinding tool can take advantage
of the API and doesn’t need a webservice or API of its own

future-proofing for BHL - any new namefinding tool can be added with
one parameter
(&client=taxonfinder | &client=neti)

reliability - the existing TF API goes down when Rod runs a
screen-scraping tool on ubio.org.
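The one-parameter switch between namefinding tools described above could be sketched as a small registry keyed by the `client` value. This is only an illustration: the registry, the function names, and the toy regex finder are assumptions made for the example, and the toy finder is not TaxonFinder or NetiNeti.

```python
import re

# Hypothetical registry: maps a client name (the value of &client=...)
# to a function that takes text and returns (string, start, end) tuples.
FINDERS = {}

def register(client_name, finder):
    """Make a namefinding function available under a client name."""
    FINDERS[client_name] = finder

def find_names(text, client):
    """Dispatch to whichever finder was registered for this client."""
    if client not in FINDERS:
        raise ValueError("unknown client: " + client)
    return FINDERS[client](text)

def toy_finder(text):
    """Toy stand-in, not a real algorithm: matches 'Genus species' shapes."""
    pattern = r"\b[A-Z][a-z]+ [a-z]+\b"
    return [(m.group(0), m.start(), m.end()) for m in re.finditer(pattern, text)]

# Adding a new tool is one registration; callers only change the parameter.
register("toy", toy_finder)
print(find_names("Mus musculus is a mouse", client="toy"))
```

The caller never needs to know how a given tool works, only which name it was registered under.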
new API spec
API specs
Request
input (string)
type (text, url)
format (xml = default, json)
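A request built from the parameters above might look like the sketch below. The base URL is a placeholder (the slides give no endpoint), and sending the parameters as a GET query string is an assumption suggested by the `&client=` notation used earlier.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the slides do not name a URL for the new API.
BASE_URL = "http://example.org/namefinder"

def build_request_url(input_value, input_type="text", response_format="xml", client=None):
    """Return a URL carrying the input, type, and format parameters."""
    if input_type not in ("text", "url"):
        raise ValueError("type must be 'text' or 'url'")
    if response_format not in ("xml", "json"):
        raise ValueError("format must be 'xml' or 'json'")
    params = {"input": input_value, "type": input_type, "format": response_format}
    if client is not None:
        # Optional tool selector, as in &client=taxonfinder | &client=neti
        params["client"] = client
    return BASE_URL + "?" + urlencode(params)

print(build_request_url("Mus musculus", client="taxonfinder"))
```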
Response
XML Response
A response example that corresponds to the XML schema:
<names xmlns="http://globalnames.org/namefinder" xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <name>
    <verbatim>T. rotundata</verbatim>
    <dwc:scientificName>Tillandsia rotundata</dwc:scientificName>
    <!--   0-100   -->
    <score>100</score>
    <offset start="4550" end="4573" />
  </name>
</names>
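A consumer of this response could read it with a namespace-aware parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name, the dictionary keys, and the `gn` prefix are choices made for this example, not part of the spec.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the response example; "gn" is a local alias
# for the default namespace and is chosen only for this sketch.
NS = {
    "gn": "http://globalnames.org/namefinder",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
}

SAMPLE = """<names xmlns="http://globalnames.org/namefinder" xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <name>
    <verbatim>T. rotundata</verbatim>
    <dwc:scientificName>Tillandsia rotundata</dwc:scientificName>
    <score>100</score>
    <offset start="4550" end="4573" />
  </name>
</names>"""

def parse_names(xml_text):
    """Return one dict per <name> element with strings, score, and offsets."""
    root = ET.fromstring(xml_text)
    results = []
    for name in root.findall("gn:name", NS):
        offset = name.find("gn:offset", NS)
        results.append({
            "verbatim": name.findtext("gn:verbatim", namespaces=NS),
            "scientific_name": name.findtext("dwc:scientificName", namespaces=NS),
            "score": int(name.findtext("gn:score", namespaces=NS)),
            "start": int(offset.get("start")),
            "end": int(offset.get("end")),
        })
    return results

for found in parse_names(SAMPLE):
    print(found)
```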
New API
you give us text, we give you strings and offsets. This is the limit of
what a “namefinding” tool can and should do

separately, you also need IDs: NameBank, EOL, Tropicos, gn*, GBIF...

once you know Mus musculus is EOL ID “9872332”, you don’t need to look
that up again. If a book on mice has 40,000 instances of Mus musculus, you
need to know where they are, but you don’t need the NameBank ID 40,000 times
(this is a scaling problem).



Where do we get these? GNI has 19.3m names & IDs.
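The split between finding names and looking up IDs can be sketched as follows: keep every offset, but call the ID service once per distinct name. The lookup function here is a stub standing in for whatever ID source is used; its table holds only the Mus musculus ID quoted above, and all function names are assumptions.

```python
from collections import defaultdict

def index_occurrences(found_names):
    """Group (name, start, end) results by name, keeping every offset."""
    by_name = defaultdict(list)
    for name, start, end in found_names:
        by_name[name].append((start, end))
    return dict(by_name)

def resolve_ids(by_name, lookup):
    """Call the ID lookup once per distinct name, not once per occurrence."""
    return {name: lookup(name) for name in by_name}

# Stub lookup standing in for a real ID service (an assumption).
KNOWN_IDS = {"Mus musculus": "9872332"}
lookup_calls = []

def stub_lookup(name):
    lookup_calls.append(name)
    return KNOWN_IDS.get(name)

found = [
    ("Mus musculus", 10, 22),
    ("Rattus rattus", 40, 53),
    ("Mus musculus", 90, 102),
]
occurrences = index_occurrences(found)
ids = resolve_ids(occurrences, stub_lookup)
print(occurrences)
print(ids)
print(len(lookup_calls), "lookups for", len(found), "occurrences")
```

With 40,000 occurrences of one name, the offsets list grows to 40,000 entries while the lookup still runs once.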
issues

misspellings etc. need to be “reconciled”

this definitely isn’t the job of a name finding tool
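As a rough picture of what a separate reconciliation step does, the sketch below maps a misspelled string to the closest entry in a list of known names. It uses generic string similarity from Python's standard `difflib` purely as a stand-in; the slides do not say which matching method a reconciliation service would use, and the function name and cutoff are assumptions.

```python
import difflib

def reconcile(name, known_names, cutoff=0.8):
    """Return the closest known name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(name, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Tiny illustrative list; a real service would draw on a large names index.
known = ["Mus musculus", "Tillandsia rotundata"]
print(reconcile("Mus musclus", known))
print(reconcile("Homo sapiens", known))
```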
next?
we could make a tool that hacks together IDs and names...
... but that’s not dev time well spent

we could participate in a process to check off the latter two categories
of the name finding -> ID resolution process
... yes we can

Let’s make a spec, build some APIs.

silver lining - we can start now

Scaling Namefinding
