Biotracker: Melding Human and Machine Capabilities to Document the World's Living Organisms


  • Christmas bird count; Bioblitzes; amphibian declines
  • The United Nations has declared 2010 the International Year of Biodiversity in recognition of the importance of biological diversity and the looming biodiversity crisis. Biological diversity provides ecosystem services critical to our planet. As much as 90% of the needs of the world’s poorest people depend directly on biodiversity for food, fuel, medicine, etc. [1]. Each species represents a volume in a “living library,” because each has evolved solutions to nature’s challenges, solutions that can benefit human society. For example, the genomics revolution and half of our synthetic drugs were made possible by understanding the characteristics of particular species [2]. Yet the current rate of species loss is 100 to 1,000 times higher than estimated historical extinction rates, and it is increasing with climate change [2]. Recent assessments indicate, for example, that nearly 25% of mammals and one-third of amphibians are endangered or threatened [3].
Scientists alone cannot end the biodiversity crisis. Progress in the conservation and sustainable use of biodiversity will depend on the interface of science with both policy and the public. This is not only because the public must appreciate and understand biodiversity in order to be motivated to conserve it, but also because of the scale of the task: there are nearly 2 million known species, and potentially millions more are still undocumented. Without help, professional biologists will be unable to describe many of these species before they disappear from the planet, especially those in biodiversity-rich but economically poorer countries [4].
Public participation can address the biodiversity crisis in several areas. One area is assembling existing knowledge on the 1.9 million species known to science. Doing so can accelerate the pace of research and new species description by making information that currently sits in libraries or in local databases, inaccessible to most of the world’s scientists, freely available, searchable, and reusable.
Addressing this need is the primary mission of the Encyclopedia of Life (EOL), an international project headquartered at the Smithsonian’s National Museum of Natural History. In addition to mash-ups of existing scientific databases, we are combining a crowdsourcing approach with expert review to build a high-quality central clearinghouse for species information.
  • Most citizen science: is driven by scientists; is analyzed by scientists; works best for charismatic species; works best for simple observations or classification; requires training, so sustained engagement is desirable. A Bioblitz is a 24-hour inventory of species in a particular location.
  • So, EOL’s approach is rather different from that of many other sites. EOL is a giant mashup that creates pages, which are then available for curators (mostly credentialed scientists) to assess and rate, or for anybody to comment on or tag.
  • Research questions: How can we motivate users to continue to play when we are dealing with imperfect data that will sometimes provide two “correct” answers? What useful data for algorithm refinement can a game of tagging the least-similar image provide? How can data provided by novice users be employed to enhance the work of experts? Implications: improving machine vision algorithms based on human performance; minimizing the number of data sets that must be hand-verified by scientists and experts; providing insight into which image factors lead to the best human identification of leaves; providing information on the extent to which other game motivation techniques [1] work for scientific identification games.

    1. Melding human and machine capabilities to document the world’s living organisms. University of Maryland TMSP series, March 7, 2011
    2. Project team: Arijit Biswas (CS, doctoral student); Anne Bowser (iSchool, master’s student); Jen Hammock (EOL); Derek Hansen (iSchool); David Jacobs (CS, UMIACS); Darcy Lewis (iSchool, doctoral student); Cyndy Parr (EOL); Jenny Preece (iSchool); Dana Rotman (iSchool, doctoral student); Erin Stewart (iSchool, master’s student); Eric (CS, undergraduate student)
    3. What we will talk about: • Research aims • Encyclopedia of Life (EOL) • Scientists, citizen scientists, enthusiasts • Identifying leaves: – Machine vision approach – Odd Leaf Out – Field Mission Games • Questions and Discussion
    4. BioTracker system architecture [diagram]. Components: Community Portal (profiles, groups, and species pages; images, identifications, maps, threaded discussion); mobile devices with the BioTracker app (camera, Internet connection; match recommendations, Q&A component, Biotracks map); computational tools (image database, shape descriptors, image segmentation algorithm, image recognition algorithm, inference system). Participants: enthusiasts (photos, biocaching, and commentary) and scientists (identification and upload). Flows shown include information collection, clarification questions, image uploads, answers, user input, possible new species, and accuracy estimates.
    5. First research question: • What are the most effective strategies for motivating enthusiasts and experts to voluntarily contribute and collaborate?
    6. The biodiversity crisis
    7. The biodiversity crisis: global collapse of commercial fisheries by 2053
    8. A crisis in science
    9. Citizen science: Audubon Christmas Bird Count; North American Butterfly Association Fourth of July Count (photo credits: Cornell Univ.; Mary Keim)
    10. Powerful citizen science data
    11. More species, less training: Geocaching, Bioblitzes
    12. The Encyclopedia of Life: imagine an electronic page for each species of organism on Earth.
    13. EOL is a content curation community. Content providers: databases, journals, LifeDesks, public contributions. Community activities: curating, commenting, tagging.
    14. EOL statistics • 100+ partner databases; 700 curators, thousands of contributors, 46,000 members • 2.8 million pages; 500,000 pages with Creative Commons content • Over 2 million data objects; over 1 million pages with links to research literature • Traffic in the past year: 1.7 million unique users, 6.2 million page views
    15. Scientists and volunteers: “Scientists often have an aversion to what nonscientists say about science” (Salk, 1986). Collaboration is based on several factors: • Shared vocabulary, practices, and meanings • Mutual recognition of knowledge, competency, and prestige • Motivation to collaborate
    16. Motivations for participation. Participation in social activities stems from personal and collective reasons: egoism, collectivism, altruism, and principalism (Batson, Ahmad, & Tsang, 2002)
    17. Pilot study – scientists’ motivational factors [chart: scores from 0 to 5 for egoism, collectivism, altruism, and principalism, by faculty/research position: senior, junior, other]
    18. Pilot study – volunteers’ motivational factors [chart: scores from 0 to 5 for egoism, collectivism, altruism, and principalism, by years of experience: 1-3, 4-5]
    19. Second research question: • How can a socially intelligent system be used to direct human effort and expertise to the most valuable collection and classification tasks?
    20. Mobile devices for plant species ID • Build new digital collections • Image-based search to assist in identification • Make this available on mobile devices • Use this platform to build user communities • Collaboration with dozens of people at Columbia University, the Smithsonian NMNH, and UMD
    21. New images • For botanists: digitize 90,000+ type specimens at the Smithsonian • For EOL and people using mobile devices: the highest-quality images of live specimens • For machines: images that capture leaf diversity
    22. Computer vision for species ID: use a photo to search a data set of known species. The goal is to assist the user, not to make identification fully automatic. Step 1: Take a photo of a leaf on a plain background.
    23. Step 2: Automatic segmentation and stem removal. Segmentation relies on the value and saturation of pixels, the EM algorithm, and domain knowledge.
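The slide names the ingredients of segmentation (pixel value and saturation, the EM algorithm, domain knowledge) but not how they fit together. The sketch below is a minimal illustration under stated assumptions, not the project's implementation. It fits a two-component Gaussian mixture with diagonal covariances to each pixel's (saturation, value) pair using EM. The "domain knowledge" is reduced to one assumed rule: against a plain background, the leaf is the more saturated cluster. The function names and the initialization heuristic are hypothetical, and stem removal is not shown.

```python
import numpy as np


def rgb_to_sv(rgb):
    """Per-pixel HSV saturation and value for an RGB float image in [0, 1]."""
    mx = rgb.max(axis=-1)
    mn = rgb.min(axis=-1)
    safe_mx = np.where(mx > 0, mx, 1.0)  # avoid dividing by zero on black pixels
    saturation = np.where(mx > 0, (mx - mn) / safe_mx, 0.0)
    return saturation, mx  # HSV value is the channel maximum


def em_two_gaussians(x, n_iter=50, eps=1e-6):
    """Fit a 2-component, diagonal-covariance Gaussian mixture to x (N x D) with EM."""
    n = x.shape[0]
    m = n // 10 + 1
    # Hypothetical initialisation: seed one component from the least saturated
    # pixels and the other from the most saturated ones.
    order = np.argsort(x[:, 0])
    means = np.stack([x[order[:m]].mean(axis=0), x[order[-m:]].mean(axis=0)])
    variances = np.tile(x.var(axis=0) + eps, (2, 1))
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: log-likelihood of every pixel under each component.
        diff2 = (x[:, None, :] - means[None, :, :]) ** 2
        log_p = (np.log(weights)[None, :]
                 - 0.5 * np.log(2 * np.pi * variances).sum(axis=1)[None, :]
                 - 0.5 * (diff2 / variances[None, :, :]).sum(axis=2))
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means and variances.
        nk = resp.sum(axis=0) + eps
        weights = nk / nk.sum()
        means = (resp.T @ x) / nk[:, None]
        variances = np.maximum((resp.T @ x ** 2) / nk[:, None] - means ** 2, 0.0) + eps
    return resp, means


def segment_leaf(rgb):
    """Boolean leaf mask for a leaf photographed on a plain background."""
    s, v = rgb_to_sv(rgb)
    x = np.stack([s.ravel(), v.ravel()], axis=1)
    resp, means = em_two_gaussians(x)
    # Assumed domain knowledge: the leaf is the more saturated cluster.
    leaf = int(np.argmax(means[:, 0]))
    return (resp[:, leaf] > 0.5).reshape(s.shape)
```

On a synthetic 20×20 image with a green square on a light gray background, `segment_leaf` returns a mask covering exactly the square. Real photos would need the additional domain rules the slide alludes to, such as removing the stem.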
    24. Must handle diversity of shapes (examples: Ipomoea lacunosa, Humulus japonicus)
    25. Step 3: Build shape descriptors • Inner-distance shape context • Multiscale histograms of curvature
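The inner-distance shape context is too involved to sketch briefly, so the example below covers only the idea behind the second descriptor: describe a leaf outline by how its bending is distributed at several scales. This is a simplified stand-in, and the curvature measure used in the project may differ. Here, "curvature" at scale k is approximated by the angle at each contour point between the chords to the points k steps behind and ahead. One normalized histogram of these angles is built per scale, and the histograms are concatenated. The scales, bin count, and function names are hypothetical.

```python
import numpy as np


def turning_angles(contour, k):
    """Angle at each contour point between chords to the points k steps behind and ahead.

    Values lie in [0, pi]: pi means the contour is locally straight at scale k,
    smaller values mean a sharper bend.
    """
    a = np.roll(contour, k, axis=0) - contour
    b = np.roll(contour, -k, axis=0) - contour
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    cos = np.sum(a * b, axis=1) / np.maximum(norms, 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))


def multiscale_curvature_histogram(contour, scales=(2, 4, 8, 12), bins=10):
    """Concatenate one normalised histogram of turning angles per scale."""
    contour = np.asarray(contour, dtype=float)
    features = []
    for k in scales:
        hist, _ = np.histogram(turning_angles(contour, k), bins=bins, range=(0.0, np.pi))
        features.append(hist / hist.sum())
    return np.concatenate(features)
```

Because it uses only angles, the descriptor does not change when the outline is translated, rotated, or uniformly scaled. This matters for photos taken at arbitrary distances. Small scales capture fine detail such as serrated margins, while large scales capture overall lobing.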
    26. Step 4: Search the data set
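The slides do not say how the search is performed. A minimal sketch follows, assuming each gallery image has a normalized histogram descriptor and that images are compared with the chi-squared distance. This distance is a common choice for histograms, but the slides do not confirm it. The search returns a ranked shortlist of distinct species rather than a single answer, which matches the stated goal of assisting the user rather than identifying automatically. The function names and the `top_k` parameter are hypothetical.

```python
import numpy as np


def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two non-negative histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))


def rank_species(query, gallery, labels, top_k=5):
    """Return up to top_k distinct species, ordered by their closest gallery image."""
    dists = np.array([chi2_distance(query, g) for g in gallery])
    ranked = []
    for idx in np.argsort(dists, kind="stable"):
        # Several gallery images may share a species; keep only its best match.
        if labels[idx] not in ranked:
            ranked.append(labels[idx])
        if len(ranked) == top_k:
            break
    return ranked
```

For example, with gallery descriptors `[[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0, 1]]` labeled `["A", "A", "B", "C"]`, the query `[0.8, 0.2, 0]` with `top_k=2` returns `["A", "B"]`.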
    27. 27. System accuracy
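The accuracy chart on this slide did not survive extraction, so no figures are reproduced here. Because the system returns a shortlist of species, one common way to report accuracy is top-k accuracy: the fraction of test queries whose true species appears among the first k candidates. Whether the original chart used this measure is an assumption. The function name below is hypothetical.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of queries whose true species is among the first k ranked candidates."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need one ranked list per query")
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_predictions, true_labels))
    return hits / len(true_labels)
```

Plotting this value for k = 1, 2, 3, … shows how long a shortlist a user must scan before the correct species typically appears.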
    28. Incorporating games into the Biotracker platform: using games to direct human effort and computational resources toward species identification and classification • Data validation games • Field data collection games
    29. Odd Leaf Out: using computer games for data validation and algorithm refinement
    30. Odd Leaf Out research questions • What will make this game more fun? • What motivates users to play when the data is imperfect? • How can the game assist with algorithm improvement?
    31. Odd Leaf Out next steps • Continue user testing • Analyze game play logs and surveys: – Preferred version – Which aspects give the most accurate data – Whether the game provides useful feedback to the LeafSnap algorithm • Place the game on Mechanical Turk for additional data
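The speaker notes ask how data from novice players could reduce the number of image sets scientists must hand-verify. A minimal sketch follows, assuming each round shows several leaf images and the dataset labels mark one as belonging to a different species (the intended odd one out). Under this assumption, rounds where players consistently pick a different image, or cannot agree, are routed to experts. The thresholds, return labels, and function name are hypothetical and not part of the game as described.

```python
from collections import Counter


def review_round(picks, intended_odd, min_votes=5, agreement=0.6):
    """Summarise player picks for one round of an odd-one-out game.

    picks: ids of the images that players chose as least similar.
    intended_odd: id of the image the dataset labels mark as the odd one out.
    Returns "need_more_votes", "consistent", or "flag_for_expert".
    """
    if len(picks) < min_votes:
        return "need_more_votes"
    top_image, top_count = Counter(picks).most_common(1)[0]
    if top_image == intended_odd and top_count / len(picks) >= agreement:
        return "consistent"  # players agree with the labels; no expert check needed
    # Players disagree with the labels or with each other: a scientist should look.
    return "flag_for_expert"
```

Rounds flagged this way are also candidates for the "two correct answers" case raised in the notes, and they could serve as feedback for the identification algorithm.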
    32. Biotracker field missions. Inspirations: Geocaching, Letterboxing, BioBlitz, SFZero, Project Noah, and the smartphone as a data collection tool. Biotracker missions: developing mobile-social games that motivate citizens to collect and validate useful scientific data
    33. Biotracker field missions • Low-fidelity prototypes • Field testing at UMD • Next steps: prototyping and user testing
    34. Questions and Discussion