Botanists and annotations: use cases and their relevance for the larger scientific community

This presentation was given by Trish Rose-Sandler and William Ulate at the iAnnotate conference in San Francisco, June 7th 2018

Published in: Science

  1. 1. Botanists and Annotations: use cases and their relevance for the larger scientific community William Ulate Trish Rose-Sandler Center for Biodiversity Informatics Missouri Botanical Garden Jun. 2018
  2. 2. Where do we come from? Why are we here? I Annotate Conference, Berlin (2016): The uptake of web annotation could be sufficiently moved forward by tackling three key issues: 1) interoperability, 2) domain use cases, 3) user-centered design
  3. 3. Darwin Virtual Library (2011) Charles Darwin’s Library is a digital edition and virtual reconstruction of the surviving books owned by Charles Darwin. In 1908, Charles Darwin’s son, Francis, transferred what he called the ‘Darwin Library’ to the Botany School at Cambridge University. ‘The chief interest of the Darwin books lies in the pencil notes scribbled on their pages, or written on scraps of paper and pinned to the last page.’ – Francis Darwin. Darwin read to gather evidence, to explore and define the research possibilities of his evolutionary ideas, and to gauge reactions to his own publications. This digital reconstruction of the Darwin Library delivers the ability to retrace and reduplicate Darwin’s reading of a wealth of materials. https://www.biodiversitylibrary.org/collection/darwinlibrary
  4. 4. https://www.biodiversitylibrary.org/page/34074923
  5. 5. https://biodiversitylibrary.org/page/34074986
  6. 6. https://did3.jiscinvolve.org/wp/projects/mining-biodiversity/
  7. 7. Mining Biodiversity • Transform BHL into a next-generation social digital library • A multi-disciplinary approach – Text Mining – Machine learning – History of Science – Environmental History & Studies – Library and Information Science – Social Media This project was made possible in part by the Institute of Museum and Library Services [LG-00-14-04-0032-14]. http://miningbiodiversity.com
  8. 8. Enhancements to BHL
  9. 9. What’s wrong with keyword-based search: Polysemy • Ambiguity! Boxwood: historic place in Alabama? North American term for plants in the Buxaceae family? California bay hardwood tree? Location?
  10. 10. What’s wrong with keyword-based search: Synonymy Campanula portenschlagiana Schult. Campanula portenschlagiana Schult. Campanula affinis Rchb. ex Nyman Campanula muralis Port ex. A. DC.
  11. 11. Semantic metadata generation • Entity types – species – location – habitat – anatomical parts – qualities – persons – temporal expressions
  12. 12. Semantic metadata generation • Entity types – species – location – habitat – anatomical parts – qualities – persons – temporal expressions • Association types – observation – habitation – nutrition – trait
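The entity and association types listed on this slide could be held in a small data model along the following lines. This is a minimal sketch, not the project's actual schema: the class names `Entity` and `Association`, the character-offset fields, the helper `find_entity`, and the example sentence are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Type vocabularies copied from the slide.
ENTITY_TYPES = {
    "species", "location", "habitat", "anatomical part",
    "quality", "person", "temporal expression",
}
ASSOCIATION_TYPES = {"observation", "habitation", "nutrition", "trait"}


@dataclass(frozen=True)
class Entity:
    """A typed span of text (hypothetical structure)."""
    text: str
    entity_type: str
    start: int  # character offset where the span begins
    end: int    # character offset just past the span

    def __post_init__(self):
        if self.entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.entity_type}")


@dataclass(frozen=True)
class Association:
    """A typed link between two entities (hypothetical structure)."""
    association_type: str
    source: Entity
    target: Entity

    def __post_init__(self):
        if self.association_type not in ASSOCIATION_TYPES:
            raise ValueError(f"unknown association type: {self.association_type}")


def find_entity(sentence: str, text: str, entity_type: str) -> Entity:
    """Locate the first occurrence of `text` and wrap it as an Entity."""
    start = sentence.index(text)
    return Entity(text, entity_type, start, start + len(text))


# Invented example sentence, used only to exercise the model.
SENTENCE = "Leptinotarsa decemlineata feeds on Solanum tuberosum."
beetle = find_entity(SENTENCE, "Leptinotarsa decemlineata", "species")
plant = find_entity(SENTENCE, "Solanum tuberosum", "species")
feeding = Association("nutrition", beetle, plant)
```

Validating the type names at construction time keeps misspelled categories out of the search index, which matters once annotations are used as facets.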
  13. 13. Examples of semantic metadata (annotations) • Observation • Habitation
  14. 14. Examples of semantic metadata (annotations) • Nutrition • Trait
  15. 15. Text mining-based approach Seed documents Unlabelled documents Learn semantics Annotator/Curator Validate Feedback Annotate Search index Store Annotate
  16. 16. Validation interface
  17. 17. Enhanced document viewing Page in PDF/image format OCR-corrected text with colour-coded annotations
  18. 18. Text Annotation Use Cases Annotator Use Case: I am a contributing participant, adding or curating annotations in the Biodiversity Digital Library. Searcher Use Case: I am a user of the Biodiversity Digital Library, searching for content that is indexed by annotations. Admin Use Case
  19. 19. Annotator Use Case • Add an annotation by selecting text • Conveniently select an appropriate annotation (autocomplete, dropdown menu) • “Cross out” an annotation (e.g. a homonym) and toggle showing it. • Modify which text is selected and/or change the annotation term associated with my own or a pre-existing annotation. • Confirm or agree with an existing annotation. • Show a measure of certainty on an annotation, either a count of how many people agree, or just “Confirmed” versus “Still in need of review” • Easily browse existing annotations in a document (using the tab or next button) • Browse annotations filtered by their status (confirmed, crossed out, review) • Find documents by annotation status. • Find documents that interest me (combine the solution above with search or filter by other document metadata: keyword, title, author, etc.)
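The status values named in this use case (confirmed, crossed out, review) could be derived from agreement counts roughly as follows. This is a sketch under stated assumptions: the function names, the dictionary keys, and the threshold of two confirmations are hypothetical and do not come from the slides.

```python
def annotation_status(confirmations: int, crossed_out: bool, threshold: int = 2) -> str:
    """Map agreement data to one of the three statuses on the slide.

    `threshold` is an assumed cut-off for calling an annotation confirmed.
    """
    if crossed_out:
        return "crossed out"
    return "confirmed" if confirmations >= threshold else "review"


def filter_by_status(annotations: list[dict], status: str) -> list[dict]:
    """Return only the annotations whose derived status matches."""
    return [
        a for a in annotations
        if annotation_status(a["confirmations"], a["crossed_out"]) == status
    ]


# Invented sample annotations.
SAMPLE = [
    {"term": "Buxaceae", "confirmations": 3, "crossed_out": False},
    {"term": "Boxwood (place)", "confirmations": 0, "crossed_out": True},
    {"term": "California bay", "confirmations": 1, "crossed_out": False},
]
needs_review = [a["term"] for a in filter_by_status(SAMPLE, "review")]
```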
  20. 20. Searcher Use Case • Discover annotation terms to search by (autocomplete, drop-down menu, browsable tree of terms) • Navigate to locations in documents from my search (search results show truncated text found and a link to the location of the annotated text) • Download search results (several columns: annotation term; the chunk of text containing the annotation; URL to the location of the annotated text) • Search for documents containing combinations of terms • Search for combinations of terms in proximity to each other in the text. • Search for facts based on semantic combinations or relative positions of terms (e.g. “Leptinotarsa” “feed on” ?) • Retrieve search results for associated terms. Asking for water bodies should return rivers, bays, lakes, seas, etc. Asking for butterflies should return all the Lepidoptera species.
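The last item, where asking for water bodies also returns rivers, bays, lakes and seas, amounts to expanding a term through a tree of narrower terms. A minimal sketch, assuming a hypothetical `TERM_TREE` mapping each term to its narrower terms (the tree contents beyond the slide's water body example are invented):

```python
# Hypothetical term tree: each key maps to its narrower terms.
TERM_TREE = {
    "water body": ["river", "bay", "lake", "sea"],
    "lake": ["crater lake"],
}


def expand_term(term: str, tree: dict[str, list[str]]) -> list[str]:
    """Return the term plus all narrower terms reachable from it, sorted."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in the tree
        seen.add(current)
        stack.extend(tree.get(current, []))
    return sorted(seen)


water_terms = expand_term("water body", TERM_TREE)
```

A search for "water body" would then be run as a search for any term in `water_terms`.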
  21. 21. Test your hypothesis with real Use Cases
  22. 22. Enhanced searching of content • Faceted search • Automatically generated questions • Time-sensitive search
  23. 23. Search by facets Opisthoproctus soleatus reported between 1840 and 1950 filtered by Habitat, Morphology and Reproduction annotations. • Taxonomy (73) • Geography (18) • Habitat (61) • Traits (57) - Morphology (20) - Feeding (35) - Reproduction (10) • Publication (73) - Journal (21) - Author (63) - Collection (10)
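The facet panel on this slide pairs a date-range filter with per-facet counts. The sketch below shows one way such counts and filters could be computed; the document records, field names, and function names are invented for illustration and do not reflect the actual search index.

```python
from collections import Counter

# Invented records: publication year plus the annotation facets each document carries.
DOCS = [
    {"id": "a", "year": 1850, "facets": {"Habitat", "Morphology", "Reproduction"}},
    {"id": "b", "year": 1900, "facets": {"Habitat", "Feeding"}},
    {"id": "c", "year": 1975, "facets": {"Habitat", "Morphology", "Reproduction"}},
    {"id": "d", "year": 1940, "facets": {"Habitat", "Morphology", "Reproduction", "Geography"}},
]


def in_range(docs, first_year, last_year):
    """Keep documents published within the inclusive year range."""
    return [d for d in docs if first_year <= d["year"] <= last_year]


def facet_counts(docs):
    """Count how many documents carry each facet (the numbers shown in the panel)."""
    counts = Counter()
    for d in docs:
        counts.update(d["facets"])
    return counts


def filter_by_facets(docs, required):
    """Keep documents annotated with every required facet."""
    return [d for d in docs if required <= d["facets"]]


period = in_range(DOCS, 1840, 1950)
counts = facet_counts(period)
hits = [d["id"] for d in filter_by_facets(period, {"Habitat", "Morphology", "Reproduction"})]
```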
  24. 24. Automatically generated questions Opisthoproctus soleatus reported between 1840 and 1950 filtered by Habitat, Morphology and Reproduction annotations. Respondent feedback: there is no strong sentiment on whether this functionality is definitely useful. This is very relevant to their work (50%). I can see how it can be useful but not currently (50%). Ask a question: - Which species taxa are related to Opisthoproctus soleatus? - In which geographical locations can I find Opisthoproctus soleatus? - What other species are co-located with Opisthoproctus soleatus? - In which environments does Opisthoproctus soleatus live? - What other species are in the same habitat as Opisthoproctus soleatus? - What are the characteristics of Opisthoproctus soleatus? - What other species share the same characteristics as Opisthoproctus soleatus?
  25. 25. Searching by subject-verb-object “Leptinotarsa feeds on ?” reported between 1840 and 1950. …they can see how the graph-based visualization of results can be useful but not for their current purposes…
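A query such as “Leptinotarsa feeds on ?” can be read as pattern matching over stored subject-verb-object facts, with the question mark as a wildcard. The following is a minimal sketch of that idea; the `Fact` tuple, the sample facts, and their years are invented, and the real system's storage and matching are not described in the slides.

```python
from typing import NamedTuple, Optional


class Fact(NamedTuple):
    subject: str
    verb: str
    obj: str
    year: int  # year of the publication reporting the fact


# Invented facts for illustration only.
FACTS = [
    Fact("Leptinotarsa", "feeds on", "Solanum", 1875),
    Fact("Leptinotarsa", "feeds on", "potato leaves", 1960),
    Fact("Leptinotarsa", "observed in", "Colorado", 1880),
    Fact("Pieris", "feeds on", "Brassica", 1901),
]


def find_facts(facts, subject: Optional[str] = None, verb: Optional[str] = None,
               obj: Optional[str] = None, years=(1840, 1950)):
    """Match facts against a pattern; None plays the role of the '?' wildcard."""
    first, last = years
    return [
        f for f in facts
        if (subject is None or f.subject == subject)
        and (verb is None or f.verb == verb)
        and (obj is None or f.obj == obj)
        and first <= f.year <= last
    ]


answers = [f.obj for f in find_facts(FACTS, subject="Leptinotarsa", verb="feeds on")]
```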
  26. 26. Searching for directly associated concepts I’m looking for Taxa/Geographic locations/Habitats/Traits directly associated with Eltanin reported between 1840 and 1950. This is very relevant to what they are doing (50%). It might be useful but not for their current purposes (40%). There is no strong indication of whether this feature is definitely wanted by our respondents.
  27. 27. Searching for indirectly associated concepts I’m looking for Taxa indirectly associated with tarsier via Geographic locations reported between 1840 and 1950. They can see its benefits but not to what they are currently doing (50%). It will be definitely useful (26%).
  28. 28. Use Cases 1. Finding the original description (taxonomic research). 2. Finding host plants, for example (ecological research). 3. Finding illustrations and plates. 4. Finding taxon name usage instances (taxonomic treatment, nomenclatural act). 5. Capturing spelling variants (orthographic variants). 6. Marking errors on versions of OCR/transcribed text. 7. Exposing semantic metadata (as a SPARQL endpoint). 8. Accessing search functionality through APIs. 9. Allowing users to highlight in text (keywords). 10. Allowing users to annotate concepts if incorrectly recognized or missed.
  29. 29. Application to Query Expansion • an interface for searching documents using a species name as a query • query is automatically expanded by retrieving synonyms/semantically related names from the term inventory • documents mentioning all of the names in the expanded query are returned
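The expansion step described on this slide can be sketched as a thesaurus lookup followed by a search over the expanded list of names. The names below are taken from the synonymy slide, and treating them as mutual synonyms is an assumption; the function names and sample documents are invented. The sketch returns documents mentioning any of the expanded names, which is one reading of the slide; its wording could also mean documents mentioning every name.

```python
# Hypothetical term inventory: each name maps to its related names.
INVENTORY = {
    "Campanula portenschlagiana": ["Campanula affinis", "Campanula muralis"],
}

# Invented documents.
DOCUMENTS = {
    "doc1": "Notes on Campanula muralis growing on old walls.",
    "doc2": "A flora mentioning Campanula portenschlagiana in Dalmatia.",
    "doc3": "Observations on Buxus sempervirens.",
}


def expand_query(name: str, inventory: dict[str, list[str]]) -> list[str]:
    """Return the queried name followed by its related names, without repeats."""
    names = [name]
    for related in inventory.get(name, []):
        if related not in names:
            names.append(related)
    return names


def search(name: str, inventory: dict[str, list[str]], documents: dict[str, str]) -> list[str]:
    """Return ids of documents mentioning any name in the expanded query."""
    names = [n.lower() for n in expand_query(name, inventory)]
    return [
        doc_id for doc_id, text in documents.items()
        if any(n in text.lower() for n in names)
    ]


results = search("Campanula portenschlagiana", INVENTORY, DOCUMENTS)
```

Without expansion, a search for the accepted name would miss `doc1`, which only uses a synonym.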
  30. 30. Term Inventory • compilation of species names (flowering plants, mammals, birds) • acts as a thesaurus, as each name is linked to its synonyms as well as other semantically related names • “semantic relatedness”: defined in terms of a contextual similarity measure, computed over the entire Digital Library corpus
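The slide does not say which contextual similarity measure was used. As one common stand-in, the sketch below builds a bag of neighbouring words for each term and compares terms by cosine similarity. The window size, the restriction to single-word terms, and the three-sentence corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def context_vector(term: str, sentences: list[str], window: int = 2) -> Counter:
    """Count the words appearing within `window` tokens of each occurrence of `term`."""
    vec = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok == term:
                lo = max(0, i - window)
                vec.update(tokens[lo:i] + tokens[i + 1:i + 1 + window])
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Invented toy corpus.
CORPUS = [
    "the pea grows in garden soil",
    "the bean grows in garden soil",
    "the lake freezes in winter",
]
pea, bean, lake = (context_vector(t, CORPUS) for t in ("pea", "bean", "lake"))
sim_pea_bean = cosine(pea, bean)
sim_pea_lake = cosine(pea, lake)
```

Terms that occur in similar contexts score higher, so in this toy corpus "bean" comes out closer to "pea" than "lake" does.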
  31. 31. Magnoliopsida species (common) names CHOICE 1 CHOICE 2 CHOICE 3 Phaseolus multiflorus Garden pea Argemone alba Citrus nobilis Sweetheart Arabis perfoliata Spergularia marina Aster pauciflorus Mimosa Canavalia ensiformis Physic nut Mung bean Chrysanthemum inodorum Guilandina bonducella Tilia parvifolia Fraxinus pubescens Arabidopsis thaliana Pulsatilla vulgaris Symphoricarpos orbiculatus Turritis glabra Medick Sorbus domestica Lespedeza reticulata Hypericum galioides Haematoxylon campechianum Scaevola lobelia Alliaria petiolata
  32. 32. Real Use Cases “Collected by who? Zambia 1934..... Stuck again!! @KewDC” Dr. Sandra Knapp (@SandyKnapp) Mar. 11, 2016
  33. 33. Real Use Cases
  34. 34. Real Use Cases May 25, 2016: On the etymology of the word "elephant" and the origins of the word "tamarind", the "Indian date". Sketches of the natural history of Ceylon : with narratives and anecdotes illustrative of the habits and instincts of the mammalia, birds, reptiles, fishes, insects, etc. including a monograph of the elephant https://twitter.com/WUlate/status/734805482536198144
  35. 35. Disqus • Annotation functionality was made available as a trial within the portal from December of 2015 through June 2016 as part of the IMLS-funded Mining Biodiversity project. • A social commenting tool that allowed users to add comments to individual pages in a book and follow users and discussions about those books. • The following tasks were carried out: 1. Created Requirements document to outline the commenting tool needs and how Disqus achieved them. 2. Coordinated with Disqus development staff to determine how best to implement Disqus to meet those needs. 3. Tool built and implemented in Portal. 4. Extensive testing of the feature before launching the tool. 5. Developed User Tutorials and Outreach Content to announce the feature to the public and provide training for its use.
  36. 36. Disqus • In 6 months, 188 individual annotations were received and stored in the Disqus repository. • The tool was discontinued within BHL for three reasons: as a proprietary tool it would not have served well as a long-term scalable solution, customizations to the tool were limited, and annotations were stored on Disqus servers rather than BHL servers. • The trial demonstrated a desire from users to actively engage in the annotation process within a digital library interface. • Citizen scientists and librarians were among the most active profiles in generating annotations.
  37. 37. Why Botanists? • The International Plant Names Index (IPNI) is a database of the names and associated basic bibliographical details of seed plants, ferns and lycophytes. • Its goal is to eliminate the need for repeated reference to primary sources for basic bibliographic information about plant names. • The data are freely available and are gradually being standardized and checked. • IPNI is a dynamic resource, depending on direct contributions by all members of the botanical community. http://www.ipni.org
  38. 38. Why Botanists? Botanico-Periodicum-Huntianum (1968) Worldwide bibliography of periodicals • 12,000 titles (45 languages) • title abbreviations • cross-referenced to other published abbreviations and complete titles • details of volumation and duration • and other basic bibliographic data. BPH-2 (2004) Periodicals with Botanical Content Second edition of B-P-H Alphabetical title list (1665 – 2002) 33,000 titles from around the world Agriculture, Agronomy, Bacteriology, Biology, Biotechnology, Botanical Bibliography and History, Conservation, Ecology, Environmental Science, Floriculture, Forestry, Fruit growing, Genetics and Plant breeding, Geography, Horticulture, Hydrobiology and Limnology, Immunology and Toxicology, Medical Mycology, Microbiology and Microscopy, Molecular biology, Palaeontology, Pharmacology and Pharmacognosy, Plant pathology and Vegetable crops, etc. B-P-H/Supplementum (1991) • 25,000 title entries arranged by title • key to entries in both volumes. • Citation abbreviations for all titles • improved cross-referencing • expanded thesaurus of title words and their abbreviation equivalents • included periodicals dealing with biotechnology, molecular biology, environmental studies and conservation.
  39. 39. Landscape Review New Media Consortium • Horizon Report Library Edition Few examples of adoption within Libraries Except for: • Australia’s Trove and • Europeana Sounds Project Lack of Available tools? No
  40. 40. Consumers As Creators Planning Grant From: May 2018 To: Apr 2019 #ConsumersAsCreators
  41. 41. Purpose Analyze Web annotation needs of the botanical community and develop a prototype of how those needs may be met within a digital library platform
  42. 42. Results from this project will be useful to the following audiences: • Librarians looking to improve their virtual library by enabling users to add value to their content. • Botanists who want to enhance the corpus of their digital library collection by augmenting knowledge through the annotations provided. • Developers who want to choose a tool to enable annotations in their online solutions, particularly within digital library platforms.
  43. 43. Deliverables: • Needs Analysis Report with a prioritized list of annotation needs for users of a botanical virtual library. • Feasibility Study evaluating four existing open source annotation tools based on their potential to address the needs identified in the Analysis Report. • Proof of concept prototype installed within a virtual library to demonstrate the functional capacity of one of the evaluated tools. • Outcomes Assessment with next-step recommendations to propose a full-scale project adopting an annotation tool as part of a virtual library.
  44. 44. Needs analysis report Using a case research approach: • Interview 10 users of a botanical virtual library from 5 separate institutions • Answers will be analyzed and classified by user type, purpose and function
  45. 45. Feasibility study Four existing annotation tools will be thoroughly evaluated against the needs analysis in order to develop a feasibility study for how they could satisfy botanists’ needs digilib
  46. 46. Proof of concept prototype RERUM will be integrated within a digital library platform as proof-of-concept
  47. 47. Outcomes assessment and next steps • Identify requisites, best practices, and further developments for research project • Identify appropriate partners
  48. 48. Interested in joining us? Contact: Trish Rose-Sandler trish.rose-sandler@mobot.org William Ulate william.ulate@mobot.org
