Expanding Access to Biodiversity Literature. Mining Biodiversity.

Introduction to the Mining Biodiversity project and progress report, presented at the CBHL Annual Meeting at the Cleveland Botanical Garden on May 26, 2016. Also a request for feedback on a semantic search user interface that employs query expansion using the Term Inventory.

Published in: Science

  1. Expanding Access to Biodiversity Literature
     Presented by: William Ulate, Center for Biodiversity Informatics, MBG
     May 26, 2016
  2. Biodiversity Heritage Library
     • a consortium of botanical and natural history libraries
     • stores digitised legacy literature on biodiversity
     • currently holds 180,000 volumes = millions of pages (PDFs and OCR-generated text)
     • open-access
     5/31/2016 Mining Biodiversity
  3. http://miningbiodiversity.org
  4. http://diggingintodata.org/awards/2013
  5. The partners
     Social Media Lab
  6. Thanks to the sponsors:
  7. Mining Biodiversity
     • Transform BHL into a next-generation social digital library
     • A multi-disciplinary approach
       – Text Mining
       – Machine learning
       – History of Science
       – Environmental History & Studies
       – Library and Information Science
       – Social Media
  8. What have we done so far?
     Social Media, Visualisation, Semantic Metadata
  9. Making Biodiversity Digital Objects More Social and Shareable
     Follow us on Twitter: @SMLabTO
  10. We also partnered with Altmetric to better understand who shares BHL content across various social media platforms, and why.
  11. “My Tweeps” app (mytweeps.com)
     Helping BHL (and other organizations) to get daily insights about their Twitter followers (or Tweeps) and what they are interested in. We call it a "reverse" Twitter because instead of seeing tweets from people whom you follow, the app shows you tweets from people who follow you.
  12. MyTweeps.com
  13. How are these tools in BHL being used?
  14. What are we doing?
     Social Media, Visualisation, Semantic Metadata
  15. Current features
     • supports keyword-based search
     • species names annotated and linked to the Encyclopedia of Life
     • integrates automatic taxonomic name finding tools (uBio Taxonfinder / GNRDS) [1]
     • data access through export functionalities and Web services
     [1] Global Names Recognition and Discovery tools and Services (GNRDS). See http://gnrd.globalnames.org/
  16. Keyword-based search and Browsing
  17. Advanced search (also keyword-based)
  18. Enhancements to BHL
  19. What’s wrong with keyword-based search: Polysemy
     • Ambiguity!
     Boxwood
       – historic place in Alabama?
       – North American term for plants in the Buxaceae family?
       – Box container?
       – Boxwood for other English-speaking countries?
  20. What’s wrong with keyword-based search: Polysemy
     • Ambiguity!
     California bay
       – hardwood tree?
       – location?
  21. Semantic metadata generation
     • Entity types
       – species
       – location
       – habitat
       – anatomical parts
       – qualities
       – persons
       – temporal expressions
     • Association types
       – observation
       – habitation
       – nutrition
       – trait
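The entity and association types on this slide could be represented as typed annotations over page text. The following is a minimal sketch under stated assumptions: the class names, fields, helper function, and example sentences are hypothetical and are not taken from the project's actual annotation schema. Only the type vocabularies come from the slide.

```python
from dataclasses import dataclass

# Type vocabularies as listed on the slide (singular forms).
ENTITY_TYPES = {"species", "location", "habitat", "anatomical part",
                "quality", "person", "temporal expression"}
ASSOCIATION_TYPES = {"observation", "habitation", "nutrition", "trait"}


@dataclass(frozen=True)
class Entity:
    """Hypothetical record for one annotated mention."""
    text: str   # surface string as it appears in the text
    type: str   # one of ENTITY_TYPES
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention

    def __post_init__(self):
        if self.type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.type}")


@dataclass(frozen=True)
class Association:
    """Hypothetical record linking two or more entities."""
    type: str         # one of ASSOCIATION_TYPES
    arguments: tuple  # the Entity objects taking part

    def __post_init__(self):
        if self.type not in ASSOCIATION_TYPES:
            raise ValueError(f"unknown association type: {self.type}")


def annotate(sentence, mention, entity_type):
    """Build an Entity for the first occurrence of `mention` in `sentence`."""
    start = sentence.index(mention)
    return Entity(mention, entity_type, start, start + len(mention))


# Invented example sentences: the same string gets two different types.
tree = annotate("The California bay is a hardwood tree.", "California bay", "species")
place = annotate("We sailed into California bay at dawn.", "California bay", "location")
habitat = annotate("It grows in coastal forest.", "coastal forest", "habitat")
habitation = Association("habitation", (tree, habitat))

# Filtering on the type keeps only the botanical sense of the phrase.
species_mentions = [e for e in (tree, place, habitat) if e.type == "species"]
```

Because each mention carries a type, a search for "California bay" can be restricted to the species sense, which a plain keyword match cannot do.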
  22. Examples of semantic metadata (annotations)
     • Observation
     • Habitation
  23. Examples of semantic metadata (annotations)
     • Nutrition
     • Trait
  24. How does semantic information help?
     • Word sense disambiguation
     California bay (hardwood tree): SPECIES
     California bay (location): LOCATION
  25. What’s wrong with keyword-based search: Synonymy
     Campanula portenschlagiana Schult.
       – Campanula portenschlagiana Schult.
       – Campanula affinis Rchb. ex Nyman
       – Campanula muralis Port. ex A. DC.
  26. What’s wrong with keyword-based search: Synonymy
     Clematis L.
       – Clematis L.
       – Clematopsis Bojer ex Hutch.
       – Atragene L.
       – Archiclematis Tamura
  27. How does semantic information help?
     • Query expansion
     Campanula portenschlagiana Schult.
       – Campanula portenschlagiana Schult.
       – Campanula affinis Rchb. ex Nyman
       – Campanula muralis Port. ex A. DC.
  28. Term Inventory
     • compilation of species names (flowering plants, mammals, birds)
     • acts as a thesaurus, as each name is linked to its synonyms as well as other semantically related names
     • “semantic relatedness”: defined in terms of a contextual similarity measure, computed over the entire BHL corpus
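The slide does not say which contextual similarity measure was used. As one common possibility, the sketch below compares names by the cosine similarity of their co-occurrence count vectors. The function names, the window size, the underscore-joined name tokens, and the toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def context_vector(name, sentences, window=2):
    """Count the tokens seen within `window` positions of each mention of `name`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == name:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo)
                              if j != i)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Toy corpus: each species name is assumed to be pre-joined into one token.
sentences = [
    ["Campanula_muralis", "grows", "on", "old", "walls"],
    ["Campanula_portenschlagiana", "grows", "on", "shaded", "walls"],
    ["Salix_cinerea", "lines", "wet", "riverbanks"],
]

muralis = context_vector("Campanula_muralis", sentences)
portenschlagiana = context_vector("Campanula_portenschlagiana", sentences)
salix = context_vector("Salix_cinerea", sentences)

similar = cosine(muralis, portenschlagiana)  # shared contexts, close to 1
unrelated = cosine(muralis, salix)           # no shared contexts, 0
```

Names that appear in similar contexts score high and would be linked in the inventory as semantically related; over a corpus the size of BHL, a real implementation would need sparse or learned vectors rather than in-memory counters.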
  29. Sources we leveraged
     • Names
       – Encyclopedia of Life (EOL)
       – Catalogue of Life
       – Global Biodiversity Information Facility (GBIF)
     • Images
       – Encyclopedia of Life (EOL)
  30. Experiments
     • Training data: all English texts from the BHL
       – about 26 million pages with a size of 49GB
     • Evaluation data: synonymous terms from the Catalogue of Life
     • Select 500 scientific names and their synonyms from the CoL
     • Results at top-20

     Category   Class      #terms in CoL   #terms in BHL   Avg. synonyms in CoL
     Birds      Aves       1140            818             2.28
     Mammals    Mammalia   1131            726             2.26
     Plants     Plantae    1141            826             2.28

     Category   Pre@20    Re@20
     Birds      69.41%    63%
     Mammals    62.12%    53.84%
     Plants     56.17%    21.43%
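The slide reports Pre@20 and Re@20 without defining them. The sketch below uses the textbook definitions of precision and recall at a cutoff k, which may differ from the exact protocol behind the figures above. The function name and the example ranking are hypothetical.

```python
def precision_recall_at_k(ranked, gold, k=20):
    """Textbook precision@k and recall@k for one query.

    ranked: candidate names ordered from most to least similar
    gold:   set of true synonyms (for example, taken from the Catalogue of Life)
    """
    hits = sum(1 for name in ranked[:k] if name in gold)
    precision = hits / k if k else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Invented ranking for the query "Campanula portenschlagiana".
ranked = ["Campanula muralis", "Salix cinerea",
          "Campanula affinis", "Ribes vulgare"]
gold = {"Campanula muralis", "Campanula affinis"}

at_2 = precision_recall_at_k(ranked, gold, k=2)
at_4 = precision_recall_at_k(ranked, gold, k=4)
```

With these definitions, a name with only two or three gold synonyms can never reach high precision at k=20, so the reported percentages were presumably computed or averaged in some other way.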
  31. Application to Query Expansion
     • an interface for searching BHL documents using a species name as a query
     • query is automatically expanded by retrieving synonyms/semantically related names from the term inventory
     • documents mentioning all of the names in the expanded query are returned
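The expansion step described on this slide can be sketched as a thesaurus lookup followed by a document filter. This is an illustration only, not the project's implementation: the function names, the `require_all` flag, and the example documents are hypothetical. The slide says documents mentioning all of the names are returned, so that is the default here, and the looser any-name reading is available for comparison.

```python
def expand_query(name, inventory):
    """Return the query name followed by every name the inventory links to it."""
    return [name] + [n for n in inventory.get(name, []) if n != name]


def search(name, inventory, documents, require_all=True):
    """Return ids of documents matching the expanded query.

    require_all=True follows the slide's wording (every name must appear);
    require_all=False returns documents mentioning any one of the names.
    """
    terms = [t.lower() for t in expand_query(name, inventory)]
    match = all if require_all else any
    return [doc_id for doc_id, text in documents.items()
            if match(term in text.lower() for term in terms)]


# Thesaurus entry built from the synonyms shown on the Synonymy slide.
inventory = {
    "Campanula portenschlagiana": ["Campanula muralis", "Campanula affinis"],
}

# Invented documents standing in for BHL pages.
documents = {
    "doc1": "Campanula muralis thrives on old walls.",
    "doc2": ("Notes on Campanula portenschlagiana, formerly "
             "Campanula muralis or Campanula affinis."),
    "doc3": "Salix cinerea near the river.",
}

strict = search("Campanula portenschlagiana", inventory, documents)
loose = search("Campanula portenschlagiana", inventory, documents,
               require_all=False)
```

In this toy data the strict reading returns only the page that lists every name, while the any-name reading also finds the page that uses the older synonym alone, which a plain keyword search for the accepted name would miss.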
  32. https://www.youtube.com/watch?v=lF2ManWhljM
     http://nactem10.mib.man.ac.uk/va/MiBio/Search/queryExpansion.html?prot=thumb
  33. http://goo.gl/forms/3mO5fWd7Y4
  34. Some Magnoliopsida species (common) names

     CHOICE 1                    CHOICE 2                    CHOICE 3
     Phaseolus multiflorus       Garden pea                  Argemone alba
     Citrus nobilis              Sweetheart                  Arabis perfoliata
     Spergularia marina          Aster pauciflorus           Mimosa
     Canavalia ensiformis        Physic nut                  Mung bean
     Chrysanthemum inodorum      Guilandina bonducella       Tilia parvifolia
     Fraxinus pubescens          Arabidopsis thaliana        Pulsatilla vulgaris
     Symphoricarpos orbiculatus  Turritis glabra             Medick
     Sorbus domestica            Lespedeza reticulata        Hypericum galioides
     Haematoxylon campechianum   Scaevola lobelia            Alliaria petiolata
     Kerria japonica             Clematis indivisa           Erythrina glauca
     Petasites officinalis       Ptychotis ajowan            Aster multiflorus
     Salix cinerea               Ribes vulgare               Sword bean

  35. More Magnoliopsida species (common) names

     CHOICE 1                    CHOICE 2                    CHOICE 3
     Peucedanum ostruthium       Cynoglossum sylvaticum      Allamanda schottii
     Windsor bean                Common haricot              Ranunculus pusillus
     Sambucus canadensis         Field bean                  Dwarf bean
     Gaillardia bicolor          Ipomoea nil                 Monniera monniera
     Arnica alpina               Indian laburnum             Eastern redbud
     Calocarpum mammosum         Ribes americanum            Lactuca alpina
     Polygonum maritimum         Erica mediterranea          Paronychia canadensis
     Imperatoria ostruthium      Rubus lasiocarpus           Melochia corchorifolia
     Valerianella olitoria       Sonchus oleraceus           Vicia hirsuta
     Mountain ebony              Carduus lanceolatus         Salix rubra
     Ledum groenlandicum         Sida abutilon               Tecoma radicans
     Gilia coronopifolia         Corydalis canadensis        Lacinaria spicata

  36. And more Magnoliopsida species (common) names

     CHOICE 1                    CHOICE 2                    CHOICE 3
     Pyrus cydonia               Barbados pride              Prenanthes alba
     Clianthus dampieri          White lupin                 Yellow pea
     Geum intermedium            Pyrus melanocarpa           Erigeron canadensis
     Pyrola uniflora             Japanese pagoda tree        Epilobium hirsutum
     Ampelopsis engelmanni       Soybean                     Salix pentandra
     Solanum nodiflorum          Exogonium purga             Lathyrus montanus
     Ribes floridum              Impatiens biflora           Stellaria media
     Orobus tuberosus            Cassia marilandica          Cnicus discolor
     Medicago maculata           Melilotus indica            Apium nodiflorum
     Glycine soja                Balsam of tolu              Juglans laciniosa
     Stellaria longifolia        Salix arctica               Purging cassia
     Echinospermum lappula       Umbrella tree               Potentilla pumila
  37. Thank you
     William Ulate, Missouri Botanical Garden, william.ulate@mobot.org
     Riza Batista-Navarro, NaCTeM, University of Manchester, riza.batista@manchester.ac.uk
     Photo: W.Ulate. Corcovado National Park, Costa Rica. 2013
     This project was made possible in part by [LG-00-14-04-0032-14]
