
Making Information More Accessible


This talk, given at the ORAU Data Analytics Roundtable meeting in July 2015, gives an overview of how useful natural language processing is for scientific information. The example used is finding scientific names in text using NetiNeti.

Published in: Data & Analytics


  1. Making Textual Information More Accessible. Holly Miller, Florida Institute of Technology
  2. About me: • Biochemist • Curious about Information • Librarian • Informatician/Project Director/Library Director • Asst. Dean, Scholarly Content & Faculty Engagement
  3. DOMO. Every minute: • Facebook users share nearly 2.5 million pieces of content. • Twitter users tweet nearly 300,000 times. • Email users send over 200 million messages.
  4. Scientific literature doubles every nine years (via Nature News blog). Image via Flickr, Jean-Etienne Minh-Duy Poirrier
  5. 50 million articles published by 2009. Jinha (2010) Learned Publishing 23:258.
  6. Articles in the last 5 years: Cancer: 694,372. Climate change: 88,565. Species extinction: 8,453. Median # of articles read in a year: 264*. [Bar chart, logarithmic axis: number of articles for Cancer, Climate Change, Species Extinction, and # of articles read/year] *Nature News (2014) Scientists may be reaching a peak in reading habits
  7. Too much information
  8. Example: Species Identification. Names offer a logical way to search for and index content
  9. Names are one of biology’s Controlled Vocabularies
  10. How to do it? In the past… Georges Louis Leclerc, comte de Buffon, Histoire naturelle : générale et particulière (Oiseaux), 1799-1808
  11. FindIT - Scientific Name Recognition Algorithm
  12. The OCR Problem: Epitonium foliaceicostwm Orbigny. Wrinkled-ribbed Wentletrap. Southeast Florida to the Lesser Antilles.
  13. Phyllodesmium acanthorhinum. Source: http://ab.co/1ByZcIb Photographer: Robert Bolland
  14. Machine Learning for Species Identification. Reptilia and Batrachia (1885-1902) by Albert C.L.G. Günther
  15. NetiNeti: Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing. "The fluorescent sea slug Phyllodesmium acanthorhinum is more than just a pretty collection of colors: the creature bridged the gap for scientists trying to understand the relationship between sea slugs that feed on hydroids and those that dine on corals." Source: http://ab.co/1ByZcIb Photographer: Robert Bolland. Akella et al. BMC Bioinformatics 2012, 13:211 http://www.biomedcentral.com/1471-2105/13/211
  16. Named Entity Recognition (NER): to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations…
  17. How does NetiNeti work? Named Entity Recognition (NER). "The fluorescent sea slug Phyllodesmium acanthorhinum is more than just a pretty collection of colors:" [word labels shown on slide: adjective, noun, unknown]
  18. How does NetiNeti work? • Text is tokenized (broken into chunks) • Prefiltering step • Probability that token is a name is calculated (structure and context) • Training (positive and negative examples) • Features (letter combinations, # of vowels, part of speech). "The fluorescent sea slug Phyllodesmium acanthorhinum is more than just a pretty collection of colors:" [labels shown on slide: name, not a name]
  19. How well does NetiNeti work?
  20. http://gnrd.globalnames.org/
  21. Connecting Biodiversity Literature to EOL
  22. Questions? The language of birds : London: Saunders and Otley, 1837. biodiversitylibrary.org/page/47512020 via Flickr. Thank You!
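
The steps listed on the "How does NetiNeti work?" slide (tokenize, prefilter, compute a probability from features, train on positive and negative examples) can be illustrated with a toy example. This is a minimal sketch and not NetiNeti's actual code: the Naive Bayes model, the feature definitions, the `LATIN_ENDINGS` list, and the tiny training set are all assumptions made for illustration, and context features such as part of speech are left out.

```python
import math
import re
from collections import Counter, defaultdict

VOWELS = set("aeiou")
# Assumed list of common Latin word endings (illustrative, not exhaustive).
LATIN_ENDINGS = ("us", "um", "a", "is", "ae", "ii", "i")

def tokenize(text):
    """Break text into alphabetic word tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z]+", text)

def candidates(tokens):
    """Prefilter: keep adjacent (Capitalized, lowercase) pairs shaped like a binomial."""
    return [
        (first, second)
        for first, second in zip(tokens, tokens[1:])
        if first[0].isupper() and first[1:].islower()
        and second.islower() and len(second) > 2
    ]

def features(pair):
    """Boolean structural features: letter endings, word length, share of vowels."""
    genus, species = pair
    vowel_share = sum(c in VOWELS for c in species) / len(species)
    return {
        "genus_latin_end": genus.lower().endswith(LATIN_ENDINGS),
        "species_latin_end": species.endswith(LATIN_ENDINGS),
        "long_genus": len(genus) > 5,
        "vowel_rich": vowel_share > 0.45,
    }

class NaiveBayes:
    """Tiny Naive Bayes over boolean features with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # label -> (feature, value) counts

    def train(self, examples):
        for pair, label in examples:
            self.label_counts[label] += 1
            for item in features(pair).items():
                self.feature_counts[label][item] += 1

    def prob_name(self, pair):
        """Return P(label is True | features) by normalizing per-label scores."""
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / total)
            for item in features(pair).items():
                # Each feature is boolean, so add-one smoothing divides by count + 2.
                score += math.log((self.feature_counts[label][item] + 1) / (count + 2))
            log_scores[label] = score
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights.get(True, 0.0) / sum(weights.values())

def find_names(text, model, threshold=0.5):
    """Run the whole pipeline and return candidate pairs scored above the threshold."""
    pairs = candidates(tokenize(text))
    return [" ".join(pair) for pair in pairs if model.prob_name(pair) > threshold]

# Hypothetical training set: True = scientific name, False = ordinary word pair.
TRAINING = [
    (("Epitonium", "angulatum"), True),
    (("Quercus", "alba"), True),
    (("Canis", "lupus"), True),
    (("Escherichia", "coli"), True),
    (("The", "creature"), False),
    (("Every", "minute"), False),
    (("Names", "offer"), False),
    (("Scientific", "literature"), False),
    (("Machine", "learning"), False),
]

SENTENCE = ("The fluorescent sea slug Phyllodesmium acanthorhinum is more "
            "than just a pretty collection of colors:")

model = NaiveBayes()
model.train(TRAINING)
print(find_names(SENTENCE, model))
```

In this sketch the prefilter passes both "The fluorescent" and "Phyllodesmium acanthorhinum", and it is the trained model that separates them. A real system would need far more training examples and richer features than the handful used here.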

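
The "OCR Problem" slide shows a name, Epitonium foliaceicostwm, in which a scanner appears to have misread one letter. One common way to cope with such errors is approximate matching against a list of known names. The sketch below uses Python's standard `difflib` module; the `KNOWN_NAMES` list, the corrected spelling "foliaceicostum", and the 0.9 cutoff are assumptions for illustration, and the talk does not say which method was actually used.

```python
import difflib

# Hypothetical reference list; a real one would come from a taxonomic name index.
KNOWN_NAMES = [
    "Epitonium foliaceicostum",  # assumed correct spelling of the slide's example
    "Epitonium angulatum",
    "Phyllodesmium acanthorhinum",
]

def correct_name(ocr_name, known_names=KNOWN_NAMES, cutoff=0.9):
    """Return the closest known name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(ocr_name, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(correct_name("Epitonium foliaceicostwm"))    # OCR-damaged name from the slide
print(correct_name("Wrinkled-ribbed Wentletrap"))  # common name, not in the list
```

A high cutoff keeps unrelated strings from being "corrected" into names, at the cost of missing names with several OCR errors.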