Global Names Recognition and
Discovery (GNRD)
• High-throughput, queue-based « skin » on
multiple processes of scientific name-finding
engines
– NetiNeti: Python, machine-learning-based
– TaxonFinder: Perl, dictionary-based
• Inputs: any file, URL, free-form text
– Uses Docsplit gem (Tesseract OCR as needed)
– Can send gzip request
• Outputs: JSON/XML
– Scientific names & their character offsets
– OCR text
– Resolved names
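To make "scientific names & their character offsets" concrete, here is a toy dictionary-based finder in Python. It is only a sketch of the general idea, not TaxonFinder's or NetiNeti's actual algorithm; the tiny lexicon and the `offsetStart`/`offsetEnd` key names are assumptions made for illustration, not GNRD's documented output.

```python
import re

# Hypothetical mini-dictionary; a real engine would use a far larger lexicon.
KNOWN_NAMES = {"Ursus arctos", "Ursus", "Carex abbreviata"}


def find_names(text, dictionary=KNOWN_NAMES):
    """Return dictionary names found in text, each with its character offsets."""
    hits = []
    # Try longer names first so "Ursus arctos" wins over the bare genus "Ursus".
    for name in sorted(dictionary, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            start, end = m.start(), m.end()
            # Skip matches nested inside a longer name that was already recorded.
            if any(h["offsetStart"] <= start and end <= h["offsetEnd"] for h in hits):
                continue
            hits.append({"scientificName": name, "offsetStart": start, "offsetEnd": end})
    return sorted(hits, key=lambda h: h["offsetStart"])


text = "The brown bear (Ursus arctos) shares the genus Ursus with the polar bear."
names = find_names(text)
for hit in names:
    # Slicing the text with the offsets recovers the name exactly.
    print(hit, text[hit["offsetStart"]:hit["offsetEnd"]])
```

Offsets let a client highlight or link each name in the original (or OCR'd) text without searching for it again.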
GNRD Clients & Applications
15,000 OCR’d articles, 1868 - 2002
All with DOIs
158,000 unique scientific names
92,000 vernaculars
20,000 entities
No Consistency in Search APIs
{
"totalResults": 152,
"startIndex": 1,
"itemsPerPage": 30,
"results": [
{
"id": 14349,
"title": "Ursus",
"link": "http://eol.org/14349?action=overview&controller=taxa",
"content": "Ursus Linnaeus, 1758; Ursus; Ursus (genus); Ursus (genus) Linnaeus, 1758; Ursus Arctos Bruinosus"
},
{ ... },
],
"first": "http://eol.org/api/search/Ursus.json?page=1",
"self": "http://eol.org/api/search/Ursus.json?page=1",
"next": "http://eol.org/api/search/Ursus.json?page=2",
"last": "http://eol.org/api/search/Ursus.json?page=6"
}
EOL (response above): http://eol.org/api/search/1.0.json?q=Ursus
GBIF (response below): http://api.gbif.org/name_usage/search?q=Ursus
{
offset: 0,
limit: 20,
endOfRecords: false,
count: 77,
results: [
{
datasetTitle: "English Wikipedia Species Pages",
parent: "Ursidae",
kingdom: "Animalia",
phylum: "Chordata",
clazz: "Mammalia",
order: "Carnivora",
family: "Ursidae",
genus: "Ursus",
scientificName: "Ursus",
canonicalName: "Ursus",
authorship: "",
nameType: "WELLFORMED",
rank: "GENUS",
…
Use Darwin Core Terms
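The mismatch between the two responses above can be shown by writing the adapter a client needs today. The Python sketch below maps both envelopes onto one shape and renames record keys to Darwin Core terms. The shared envelope keys (`offset`, `limit`, `count`, `records`), the reading of EOL's `startIndex` as 1-based, and the mapping of EOL's `id`/`title` to `taxonID`/`scientificName` are assumptions for illustration; the sample data is abbreviated from the slide.

```python
# GBIF keys that differ from the Darwin Core term names.
GBIF_TO_DWC = {"clazz": "class", "rank": "taxonRank"}


def normalize_eol(resp):
    """Map the EOL-style envelope onto the shared (hypothetical) shape."""
    return {
        "offset": resp["startIndex"] - 1,  # assumption: startIndex is 1-based
        "limit": resp["itemsPerPage"],
        "count": resp["totalResults"],
        "records": [
            {"taxonID": r["id"], "scientificName": r["title"]}
            for r in resp["results"]
        ],
    }


def normalize_gbif(resp):
    """Map the GBIF-style envelope onto the same shape, renaming keys to DwC terms."""
    return {
        "offset": resp["offset"],
        "limit": resp["limit"],
        "count": resp["count"],
        "records": [
            {GBIF_TO_DWC.get(k, k): v for k, v in r.items()}
            for r in resp["results"]
        ],
    }


eol_sample = {
    "totalResults": 152, "startIndex": 1, "itemsPerPage": 30,
    "results": [{"id": 14349, "title": "Ursus"}],
}
gbif_sample = {
    "offset": 0, "limit": 20, "endOfRecords": False, "count": 77,
    "results": [{"clazz": "Mammalia", "genus": "Ursus",
                 "scientificName": "Ursus", "rank": "GENUS"}],
}

eol = normalize_eol(eol_sample)
gbif = normalize_gbif(gbif_sample)
print(eol)
print(gbif)
```

If both services used Darwin Core terms as response keys, the per-provider adapters would shrink to almost nothing.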
OpenURL
• Created in the late 1990s by a Flemish librarian
• e.g., v0.1:
http://resolver.example.edu/cgi?genre=book&isbn=0836218310&title=The+Far+Side+Gallery+3
• But no specification for response structure!!!
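A short Python sketch of how a v0.1-style OpenURL like the one above is assembled from key/value pairs. The resolver host is the placeholder from the slide, and `build_openurl` is a hypothetical helper, not part of any OpenURL library.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

RESOLVER = "http://resolver.example.edu/cgi"  # placeholder host from the slide


def build_openurl(resolver, **fields):
    """Append the citation fields to the resolver base URL as a query string."""
    return resolver + "?" + urlencode(fields)


url = build_openurl(
    RESOLVER,
    genre="book",
    isbn="0836218310",
    title="The Far Side Gallery 3",
)
print(url)

# A resolver can recover the same fields from the query string.
query = parse_qs(urlsplit(url).query)
print(query["title"][0])
```

The request side is fully described by the key/value pairs; what comes back is left to each resolver, which is the gap the slide points out.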
bibJSON
{
"title": "Open Bibliography for Science, Technology and Medicine",
"author":[
{"name": "Richard Jones"},
{"name": "Mark MacGillivray"},
{"name": "Peter Murray-Rust"},
{"name": "Jim Pitman"},
{"name": "Peter Sefton"},
{"name": "Ben O'Steen"},
{"name": "William Waites"}
],
"type": "article",
"year": "2011",
"journal": {"name": "Journal of Cheminformatics"},
"link": [{"url":"http://www.jcheminf.com/content/3/1/47"}],
"identifier": [{"type":"doi","id":"10.1186/1758-2946-3-47"}]
}
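Because a record like the one above has predictable keys, client code stays small. A minimal Python sketch, using an abbreviated copy of the record (two of the seven authors); the helper names are hypothetical.

```python
import json

# Abbreviated copy of the bibJSON record shown above.
record = json.loads("""
{
  "title": "Open Bibliography for Science, Technology and Medicine",
  "author": [{"name": "Richard Jones"}, {"name": "Ben O'Steen"}],
  "type": "article",
  "year": "2011",
  "journal": {"name": "Journal of Cheminformatics"},
  "identifier": [{"type": "doi", "id": "10.1186/1758-2946-3-47"}]
}
""")


def identifiers(rec, id_type):
    """Return every identifier of the given type, e.g. 'doi'."""
    return [i["id"] for i in rec.get("identifier", []) if i.get("type") == id_type]


def author_names(rec):
    """Return the author names in the order they are listed."""
    return [a["name"] for a in rec.get("author", [])]


dois = identifiers(record, "doi")
authors = author_names(record)
print(dois, authors)
```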
Recommendation
• Use DwC terms as query parameters for « find », or ‘q’ for « search »
• Use DwC terms as keys in JSON responses
Current:
http://www.antweb.org/description.do?name=claripes%20orbiculatopunctatus&genus=camponotus&rank=species&project=worldants
With DwC terms:
http://www.antweb.org/description.do?specificEpithet=claripes&infraspecificEpithet=orbiculatopunctatus&genus=camponotus&taxonRank=species&project=worldants
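The rewrite from the first query string to the second can be sketched in Python. `to_dwc_params` is a hypothetical helper, and treating the first word of `name` as the specific epithet and the second as the infraspecific epithet is an assumption drawn from this one example; it is not AntWeb's code.

```python
from urllib.parse import parse_qsl, urlencode


def to_dwc_params(query):
    """Rewrite a 'name'/'rank' style query string using Darwin Core term names."""
    params = dict(parse_qsl(query))  # parse_qsl also decodes %20 to a space
    out = {}
    epithets = params.pop("name", "").split()
    if epithets:
        out["specificEpithet"] = epithets[0]
    if len(epithets) > 1:
        out["infraspecificEpithet"] = epithets[1]
    for key, value in params.items():
        # 'rank' becomes the DwC term 'taxonRank'; other keys pass through unchanged.
        out["taxonRank" if key == "rank" else key] = value
    return urlencode(out)


old = "name=claripes%20orbiculatopunctatus&genus=camponotus&rank=species&project=worldants"
new = to_dwc_params(old)
print(new)
```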
Canadensys:
Vascular Plants of Canada
(VASCAN)
Luc Brouillet, Peter Desmet, et al.
http://data.canadensys.net/vascan
http://data.canadensys.net/vascan/name/Carex%20abbreviata
http://data.canadensys.net/vascan/taxon/26512
http://doi.org/10.3897/phytokeys.25.3100
http://creativecommons.org/publicdomain/zero/1.0/
Suggestions for AntCat
• Run literature through GNRD
• Simplify web presence with concentration on
search as the entry point
– index all available content
– Present « pages » as declaration of relationships
• Use Darwin Core terms in « find » and
« search » services
• Produce a Darwin Core Archive (DwC-A) with a CC0 waiver and a data paper;
publish to GBIF and make accessible to Global Names (GN)

GlobalNames - Canadensys - Shorthouse
