Biodiversity Informatics: An Interdisciplinary Challenge
Presented at “Impacto de la Informática en el Conocimiento de la Biodiversidad: Actualidad y Futuro” (The Impact of Informatics on Biodiversity Knowledge: Present and Future) at Universidad Nacional de Colombia on August 12, 2011.

  • Government staff, scientists, researchers and land managers spend too much time looking for data and getting it into a usable shape. It is too difficult for data gatherers to make their data available in a useful format.
  • BIEN: Botanical Information and Ecology Network. NCEAS: National Center for Ecological Analysis and Synthesis.
  • Transcript

    • 1. P. Bryan Heidorn
      University of Arizona and JRS Biodiversity Foundation
      8 August 2011
      Impacto de la informática en el conocimiento de la biodiversidad: actualidad y futuro (The impact of informatics on biodiversity knowledge: present and future)
      Universidad Nacional de Colombia and Instituto de Ciencias Naturales, Bogotá
      Biodiversity Informatics: An Interdisciplinary Challenge
      NAIROBI 15th to 17th September 2010
    • 2. University of Arizona
    • 3. Biodiversity Informatics
      The development and use of information technology-based sociotechnical systems to document, understand and protect biological diversity particularly at the organismal level.
    • 4. Main Themes
      Cyberinfrastructure enabled science
      Greater reuse of data
      Mobilization of analog data
      Data integration
      Distributed collaborative research
      Citizen science
      High volume and high computation
    • 5. Cyberinfrastructure Vision
      “The anticipated growth in both the production and repurposing of digital data raises complex issues not only of scale and heterogeneity, but also of stewardship, curation and long-term access.”
      NSF Cyberinfrastructure Vision for 21st Century Discovery, Chapter 3
    • 6. Recognition of need for data curation
      “Recommendation 6: The NSF, working in partnership with collection managers and the community at large, should act to develop and mature the career path for data scientists and to ensure that the research enterprise includes a sufficient number of high-quality data scientists.”
      Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century, Recommendations
    • 7. Interagency Working Group on Digital Data
      Recognition of the importance of Information
      Recognition of the need for education
      New work roles within traditional institutions
    • 8. Dark data is the data that we know is/was there but we can’t see it.
      Hubble Space Telescope composite image "ring" of dark matter in the galaxy cluster Cl 0024+17
    • 9. Does NSF’s Data Follow the Power Law?
      I do not know, but if $1 = X bytes…
      Heidorn, P. Bryan (2008). Shedding Light on the Dark Data in the Long Tail of Science. Library Trends 57(2), Fall 2008 (Institutional Repositories: Current State and Future, edited by Sarah Shreeves and Melissa Cragin).
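To make the slide's question concrete, the sketch below assumes, purely hypothetically, that the data volume of the project ranked k is proportional to 1/k^s (a Zipf-type power law). It then computes what share of all data sits outside the largest projects, that is, in the long tail. The function name, exponent and project counts are illustrative assumptions, not NSF figures.

```python
def long_tail_share(n_projects: int, s: float, head_fraction: float) -> float:
    """Share of total data volume held outside the largest `head_fraction` of projects.

    Hypothetical model: the project of rank k holds volume proportional to 1 / k**s.
    """
    if n_projects < 1 or not 0.0 < head_fraction <= 1.0:
        raise ValueError("need n_projects >= 1 and 0 < head_fraction <= 1")
    sizes = [1.0 / rank ** s for rank in range(1, n_projects + 1)]
    total = sum(sizes)
    head_count = max(1, int(n_projects * head_fraction))
    head = sum(sizes[:head_count])
    return (total - head) / total


if __name__ == "__main__":
    # With s = 1 and 10,000 projects, roughly a quarter of all data lies
    # outside the top 10% of projects.
    print(f"{long_tail_share(10_000, 1.0, 0.10):.2f}")  # prints 0.24
```

A larger exponent concentrates data in the head. An exponent near zero spreads it evenly, so under that assumption most data would sit in the many small projects that are hardest to curate.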
    • 10. The Future is all about Data
      How do we get it?
      How do we analyze it?
      How do we disseminate it (maps, charts, tables...)?
      How do we keep it?
      Provenance, Storage, Weeding
      How do we make it sustainable?
    • 11. Data Repurposing
      From: To Stand the Test of Time: Long-term Stewardship of Digital Data Sets in Science and Engineering. Sept. 26–27, 2006, Arlington, VA
    • 12. Where is your data now?
      Is it doing good or is it sleeping or dead?
    • 13. Cyberinfrastructure Needs
    • 14. The iPlant Collaborative Cyberinfrastructure to Support the Challenges of Modern Biology
      Society for Experimental Biology, Glasgow, UK
      July 3rd, 2011
      Dan Stanzione
      Co-PI and Cyberinfrastructure Lead, iPlant Collaborative
      Deputy Director, Texas Advanced Computing Center
    • 15. What is iPlant?
      iPlant’s mission is to build the CI to support plant biology’s Grand Challenge solutions
      Grand Challenges were not defined in advance, but identified through engagement with the community
      A virtual organization with Grand Challenge teams relying on national cyberinfrastructure
      Long-term focus on sustainable food supply, climate change, biofuels, ecological stability, etc.
      Hundreds of participants globally… Working group members at >50 US institutions, USDA, DOE, etc.
    • 16. Brief History
      Funding by NSF – February 1, 2008
      iPlant Kickoff Conference at CSHL – April 2008 (~200 participants)
      Grand Challenge Workshops – Sept–Dec 2008
      CI workshop – Jan 2009
      Grand Challenge White Paper Review – March 2009
      Project Recommendations – March 2009
      Project Kickoffs – May 2009 and August 2009
      Start of software development – September 2009
      First prototypes to public – April 2010
      First release with user-driven tool integration – July 2011
    • iPlant’s Central Challenge
      To define what it means to build a lasting, community-driven cyberinfrastructure for the Grand Challenges of Plant Science, to get community buy-in for this vision, and to execute this vision.
    • 25. Steve Goff, PI
      U of Arizona
      Dan Stanzione, coPI
      Texas Advanced Computing Center
      National Science Board
      Update on Award Progress: DBI-0735191
      Directorate for Biological Sciences
      July 2011
    • 26. What iPlant Offers
    • 27. Grand Challenges in Plant Science
      To understand how DNA blueprints produce a plant’s characteristic traits and functions and to predict how traits change in response to complex environments
      Requires ability to collect, query, interpret, and model high-throughput, genome-scale data sets
      Tree of Life
      To understand evolutionary relationships among green plants
      Requires ability to create, display, and query information in very large phylogenetic trees
    • 28. iPlant Progress
      Science Planning (Year 1)
      Community engagement
      Grand Challenge selection
      Cyberinfrastructure Design (Year 2)
      Requirements generation
      Technology evaluations
    • 29. iPlant Progress
      Release of CI deliverables (Year 3)
      iPlant Discovery Environments and Tools
      iPlant Genotype to Phenotype Tools
      Processing and integration of high throughput data
      Modeling and visualization of phenotypic expression
      iPlant Tree of Life Tools
      Assembly, Reconciliation and Viewing
      Taxonomic Name Resolution Service
      My-Plant social networking site
      DNA Subway Tool for genome annotation / analysis
    • 30. Taxonomic Name Resolution Service
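Name resolution means mapping a possibly misspelled or inconsistently formatted name onto an accepted name. The sketch below illustrates that idea with the standard-library `difflib` fuzzy matcher and a small hypothetical list of names. It is not how TNRS itself is implemented. The name list, the `resolve` function and the cutoff are all assumptions.

```python
import difflib
from typing import Optional

# Hypothetical reference list standing in for an authoritative name source.
ACCEPTED_NAMES = ["Agave sisalana", "Agave americana", "Coffea arabica", "Rubia tinctorum"]


def normalize(name: str) -> str:
    """Collapse whitespace and apply 'Genus epithet' capitalization."""
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def resolve(name: str, cutoff: float = 0.85) -> Optional[str]:
    """Return the closest accepted name, or None if nothing is similar enough."""
    candidate = normalize(name)
    if candidate in ACCEPTED_NAMES:
        return candidate
    matches = difflib.get_close_matches(candidate, ACCEPTED_NAMES, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For example, `resolve("agave  SISALANA")` is fixed by normalization alone, while `resolve("Agave sislana")` needs the fuzzy match. A real service would also handle authorities, synonyms and infraspecific ranks, which this sketch ignores.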
    • 31. Biodiversity: Development of new knowledge and tools to use knowledge
      Progress on digitization of the world’s billion+ museum specimens
      Distribution of digitized products through global networks (e.g. the Global Biodiversity Information Facility).
      Digitization of hundreds of millions of pages of natural history text (begun with the Biodiversity Heritage Library)
      Large online stores of information on species such as the Encyclopedia of Life
    • 32. The Biodiversity Heritage Library has 34 million pages now
      Long Citation Half-life
      Critical use for Taxonomy
      Ecology and Environmental History
      Naming for genomics and metagenomics
      Palaeontology, or, A systematic summary of extinct animals and their geological relations / by Richard Owen. Publication info: Edinburgh: A. and C. Black, 1860.
    • 33. The Rubiaceae of Colombia, by Paul C. Standley. Chicago: Field Museum of Natural History, 1930.
    • 34. Mobilizing Data Locked on Paper
      Fine-Grained Semantic Markup of Descriptive Data for Knowledge Applications in Biodiversity Domains Hong Cui (Principal Investigator)
      The University of Arizona was awarded a grant to develop and evaluate algorithms and software that help computers read and “understand” taxonomic descriptions of plants, animals, and other living or fossil organisms. The major functions of the software include (1) annotating large sets of text descriptions in a machine-readable way to support various knowledge applications, including producing character matrices and identification keys for various taxon groups.
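As a toy illustration of turning descriptive text into a character matrix, the sketch below assumes a rigidly formatted description ("Organ state, state; Organ state.") and splits it into organ/state pairs. Real taxonomic prose is far less regular, and this is not the algorithm developed in the project. The function names and the `?` placeholder for missing data are assumptions.

```python
from typing import Dict, List, Tuple


def parse_description(text: str) -> Dict[str, List[str]]:
    """Split 'Organ state, state; Organ state.' into {organ: [states]}."""
    result: Dict[str, List[str]] = {}
    for clause in text.strip().rstrip(".").split(";"):
        clause = clause.strip()
        if not clause:
            continue
        organ, _, rest = clause.partition(" ")  # first word is taken as the organ
        states = [s.strip() for s in rest.split(",") if s.strip()]
        result.setdefault(organ.lower(), []).extend(states)
    return result


def character_matrix(descriptions: Dict[str, str]) -> Tuple[List[str], Dict[str, Dict[str, str]]]:
    """Build a taxon-by-organ matrix; '?' marks characters a description does not mention."""
    parsed = {taxon: parse_description(d) for taxon, d in descriptions.items()}
    organs = sorted({organ for p in parsed.values() for organ in p})
    matrix = {
        taxon: {organ: ", ".join(p.get(organ, ["?"])) for organ in organs}
        for taxon, p in parsed.items()
    }
    return organs, matrix
```

Missing cells are marked explicitly rather than omitted, so the matrix stays rectangular, which identification-key software generally expects.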
    • 35. Semantic Markup System
      Training Thursday for students
    • 36. The Problem
      It is difficult to find what is already known
      Clone specimens may be stored in different museums around the world
      DNA analysis may be conducted on one but not the other
      Micrographs may be in a database
      Taxonomic treatments or revisions may exist
    • 37. Biological Science Collections (BiSciCol) Tracker
      Diagram: records of Agave sisalana linked across institutions: S1 at the Nairobi National Museum (KNM), S2 at the Muséum national d'histoire naturelle (MNHN), S3 in the living collection of the Missouri Botanical Garden (MBG), and a gene sequence.
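The tracking problem above can be modeled as a graph. Its nodes are object identifiers and its edges record that two objects are related (for example, a tissue sample derived from a specimen). The sketch below uses hypothetical identifiers modeled on the diagram's labels and untyped links. A production tracker would presumably use globally unique identifiers and typed relationships. The link structure here is assumed, not read from the slide.

```python
from collections import defaultdict, deque
from typing import Dict, Set


class RelationTracker:
    """Toy tracker: undirected links between object identifiers across institutions."""

    def __init__(self) -> None:
        self.links: Dict[str, Set[str]] = defaultdict(set)

    def link(self, a: str, b: str) -> None:
        """Record that objects a and b are related (in both directions)."""
        self.links[a].add(b)
        self.links[b].add(a)

    def related(self, start: str) -> Set[str]:
        """Breadth-first search for every object reachable from `start`."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in self.links[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        seen.discard(start)
        return seen


if __name__ == "__main__":
    tracker = RelationTracker()
    tracker.link("KNM:S1", "MBG:S3")   # hypothetical: specimen and living plant of the same clone
    tracker.link("MNHN:S2", "MBG:S3")
    tracker.link("KNM:S1", "SEQ:1")    # hypothetical gene sequence derived from S1
    print(sorted(tracker.related("MNHN:S2")))
```

Starting from the MNHN record, the search reaches the gene sequence even though no direct link exists. This is the "DNA analysis may be conducted on one but not the other" situation the previous slide describes.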
    • 38. BiSciCol Tracker
    • 39. BiSciCol Design
    • 40. NSF: Advanced Digitization of Biological Collections
      iDigBio: The National Resource for Advancing Digitization of Biological Collections
    • 41. Organization
      National Hub (~$7.5M)
      Title: A Collections Digitization Framework for the 21st Century
      PI: Lawrence Page, University of Florida
      Thematic Hubs (~$2M each)
      Title: InvertNet–An Integrative Platform for Research on Environmental Change, Species Discovery and Identification
      PI: Christopher Dietrich, University of Illinois, Urbana-Champaign
      Title: Plants, Herbivores and Parasitoids: A Model System for the Study of Tri-Trophic Associations
      PI: Randall T. Schuh, American Museum of Natural History
      Title: North American Lichens and Bryophytes: Sensitive Indicators of Environmental Quality and Change
      PI: Corinna Gries, University of Wisconsin, Madison
    • 42. Virtual Organization and Collaboration
      VOSS: Next Steps in Articulating Success Factors for Distributed Collaborations. Gary Olson (Principal Investigator) Judith Olson (Co-Principal Investigator)
      Theory of Remote Collaboration. Evaluation: a prototype online Collaboration Success Wizard will be developed so that those engaged in or planning a collaboration can assess their strengths and weaknesses.
    • 43. Example of Virtual Community in NanoTechnology
    • 44. Three of the pioneers behind novel light-scattering techniques to detect certain early-stage cancers joined an outside expert on biophotonics in a call-in program to discuss new research results presented in the Aug. 1, 2007, edition of Clinical Cancer Research. Richard McCourt (right), of NSF's Directorate for Biological Sciences, was the moderator. Credit: National Science Foundation
    • 45. Features of Virtual Organization
      Common Goals
      Geographic dispersal
      Distributed strengths and capabilities
      Need for multimedia collaboration
      Non-residents to be treated as insiders
      Document sharing, video and voice, workflow integration.
    • 46. Interdisciplinary and high volume data
      Cyberinfrastructure and the Dimensions in Biodiversity: Planning for Success. Madison, WI, Oct 13–15, 2010. Corinna Gries (Principal Investigator), Matthew Jones (Co-Principal Investigator), David Vieglais (Co-Principal Investigator)
      Need to make order-of-magnitude improvements in the rate of biodiversity study with no increase in funding.
      Development of cyberinfrastructure (CI) supporting integrative research in biodiversity sciences.
    • 47. Cloud Computing
      Data-Intensive Science Workshops, to be held Sept. 19 to 20, 2010, Seattle, WA; and Mar 20 to 21, 2011, Washington DC
      Needed for most modeling with large data sets including climate models
      Needed for phylogenetic analysis
    • 48. Occurrence Data Sharing
      SilverLining: A highly scalable cloud-based platform for data distribution and user collaboration. David Vieglais (Principal Investigator) Eileen Lacey (Co-Principal Investigator)
      Potential for leveraging a cloud-based Platform as a Service (PaaS) for data publication to address myriad challenges currently faced by existing distributed data service architectures such as Distributed Generic Information Retrieval (DiGIR) and TDWG Access Protocol for Information Retrieval (TAPIR). Specific goals are to 1) simplify and reduce the ongoing cost of publishing data, 2) improve data quality at the source, 3) provide scalable, effective access to published data, 4) stimulate innovation by creating a simple, highly scalable platform for new applications for data interaction, and 5) develop a suite of reference applications demonstrating capacities of the new architecture.
    • 49. Agile Science
      Disaster: RAPID: Gulf Coast Oil Spill Biodiversity Tracker: A Volunteer-based Observation Network. Steven Kelling (Principal Investigator)
      RAPID: Enhancement of Fishnet2 for Disaster Impact Assessment Henry Bart (Principal Investigator)
    • 50.
    • 51.
    • 52. New Validation Models
      Filtered Push: Continuous Quality Control for Distributed Collections and Other Species-Occurrence Data. James Macklin (Principal Investigator) Bertram Ludaescher (Co-Principal Investigator)
      A networked solution to enable annotation of distributed biological collection data and the sharing of assertions about its quality or usability.
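One way to picture this kind of annotation sharing is a publish/subscribe network. Each collection registers a filter describing the records it holds, and an annotation is pushed only to collections whose filter matches. The sketch below is a hypothetical in-memory model of that idea, matching on collector name and number (a common way to spot duplicate specimens). The class names, fields and values are assumptions, not the Filtered Push project's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Annotation:
    collector: str
    collector_number: str
    assertion: str  # e.g. a new determination or a corrected locality


@dataclass
class Network:
    filters: Dict[str, Callable[[Annotation], bool]] = field(default_factory=dict)
    inboxes: Dict[str, List[Annotation]] = field(default_factory=dict)

    def subscribe(self, node: str, interest: Callable[[Annotation], bool]) -> None:
        """Register a collection and the filter describing which annotations it wants."""
        self.filters[node] = interest
        self.inboxes.setdefault(node, [])

    def push(self, annotation: Annotation) -> List[str]:
        """Deliver the annotation to every node whose filter matches; return those nodes."""
        delivered = []
        for node, interest in self.filters.items():
            if interest(annotation):
                self.inboxes[node].append(annotation)
                delivered.append(node)
        return delivered


if __name__ == "__main__":
    net = Network()
    # Hypothetical: two collections hold duplicates of collector "J. Doe" number 1234.
    net.subscribe("MNHN", lambda a: a.collector == "J. Doe" and a.collector_number == "1234")
    net.subscribe("MBG", lambda a: a.collector == "J. Doe")
    net.subscribe("KNM", lambda a: False)
    print(net.push(Annotation("J. Doe", "1234", "Determination: Agave sisalana")))
```

Filtering at delivery time means each collection sees only annotations about records it holds. It does not have to poll every other database for changes.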
    • 53. Improved collection management
      Collaborative Biodiversity Collections Computing. James Beach (Principal Investigator)
    • 54. Map of Life
      An infrastructure for integrating and advancing global species distribution knowledge
      Co-PIs: Walter Jetz (Yale)
      Rob Guralnick (CU Boulder)
    • 55. Advancing species distribution knowledge
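      Figure: knowledge gap between the spatial grain (scale) of environmental data and of species distribution data. Environmental layers: GTOPO30 (1996), SRTM V4 (2009), GLC 2000 (2003), GlobCover (2009), IMAGE 2.2 land-cover futures (2001). Species distribution data: regional models (WWF, 2006), expert maps (2005–09), atlas data and surveys. Sources: Hurlbert and Jetz (PNAS 2007); Jetz et al. (Conservation Biology 2008).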
    • 56. Overcoming the “Wallacean shortfall”
      The “Wallacean shortfall”, i.e., the geographic bias and coarseness of our species distribution knowledge, is a (the?) major impediment for biodiversity science and for our understanding of global change impacts on biodiversity
      Narrowing the knowledge gap:
      Data mobilization (Museums, NGOs, GBIF)
      Focused sampling
      Model-based data integration
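The coarseness side of the shortfall depends on grain. The same records can look like complete coverage at a coarse grid resolution and like sparse coverage at a fine one. The sketch below bins hypothetical occurrence points into latitude/longitude cells of a chosen size. The coordinates are invented for illustration.

```python
import math
from typing import Iterable, Set, Tuple


def occupied_cells(points: Iterable[Tuple[float, float]], grain_deg: float) -> Set[Tuple[int, int]]:
    """Return the grid cells (row, col) of size grain_deg that contain at least one point."""
    if grain_deg <= 0:
        raise ValueError("grain_deg must be positive")
    return {(math.floor(lat / grain_deg), math.floor(lon / grain_deg)) for lat, lon in points}


if __name__ == "__main__":
    # Hypothetical occurrence records (lat, lon) near Bogotá.
    records = [(4.61, -74.08), (4.75, -74.05), (4.95, -74.20)]
    for grain in (1.0, 0.1):
        print(grain, len(occupied_cells(records, grain)))
```

At a 1° grain the three records fall in a single cell, which appears "known". At 0.1° they occupy three cells, and most of the surrounding fine-grained cells have no records at all.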
    • 57. Map of Life
      ‘Map of Life’ aims to build on and complement the spatial biodiversity aspects of these and other efforts. By addressing key storage, query, visualization and modeling challenges common to all, and by providing mapping and data integration services, the platform is expected to empower region- and taxon-specific efforts, freeing their resources for investment in core competencies, including quality control or specific user-community needs.
    • 58. Map of Life
      An online workbench and knowledgebase to dynamically document, annotate, integrate, validate, advance, and analyze the disparate sources of global biodiversity distribution knowledge.
    • 59.
    • 60. Display: a spatially explicit wiki
      Jetz, McPherson & Guralnick. in review
    • 61. Cougar
    • 62. Modeling Software Support
      Development of a Data Assimilation Capability Towards Ecological Forecasting in a Data-Rich Era. Yiqi Luo (Principal Investigator) S Lakshmivarahan (Co-Principal Investigator)
      A powerful eco-informatics tool that assimilates data from sensor networks and generates data products useful for policy making on resource management and climate change mitigation: the Ecological Platform for Assimilation of Data (EcoPAD) for data assimilation and forecasting in ecology. EcoPAD will include (1) core computational algorithms (e.g., ecological models) specifically designed to solve ecological problems, (2) a variety of optimization techniques for data assimilation, (3) various databases that will feed into EcoPAD, and (4) diverse functions of EcoPAD.
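Data assimilation combines model forecasts with observations, weighting each by its uncertainty. The sketch below runs a scalar Kalman filter on a hypothetical one-pool carbon model. This is one classic instance of the optimization techniques for data assimilation mentioned above, not necessarily what EcoPAD uses. The model form and all parameter values are assumptions chosen for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def kalman_assimilate(
    observations: Sequence[Optional[float]],
    a: float = 0.9,     # hypothetical fraction of the pool retained per step
    u: float = 10.0,    # hypothetical constant input per step
    q: float = 4.0,     # model (process) error variance
    r: float = 25.0,    # observation error variance
    x0: float = 50.0,   # initial state estimate
    p0: float = 100.0,  # initial state variance
) -> List[Tuple[float, float]]:
    """Scalar Kalman filter for the toy model x[t+1] = a * x[t] + u.

    Returns the (estimate, variance) pair after each step. A None observation
    means no data at that step, so only the forecast is kept.
    """
    x, p = x0, p0
    history = []
    for y in observations:
        # Forecast: propagate the state and its uncertainty through the model.
        x = a * x + u
        p = a * a * p + q
        # Analysis: pull the forecast toward the observation, weighted by uncertainty.
        if y is not None:
            k = p / (p + r)  # Kalman gain, between 0 and 1
            x = x + k * (y - x)
            p = (1.0 - k) * p
        history.append((x, p))
    return history


if __name__ == "__main__":
    for estimate, variance in kalman_assimilate([60.0, None, 75.0]):
        print(f"{estimate:.2f} +/- {variance ** 0.5:.2f}")
```

When observations are precise (small `r`), the estimate follows the data. When they are noisy, the estimate stays close to the model forecast, and each assimilated observation shrinks the variance.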
    • 63. Formalizing Location Data
      Improving GEOLocate to Better Serve Biodiversity Informatics Henry Bart (Principal Investigator) Nelson Rios (Co-Principal Investigator)
      A software tool for assigning latitude and longitude coordinates to text descriptions of the locations where scientific collections were made (georeferencing).
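A common georeferencing case is a locality given as an offset from a named place, such as "10 km N of Bogotá". The sketch below parses that single pattern and computes the destination point on a spherical Earth. The regular expression, the one-entry gazetteer (with approximate coordinates) and the function name are hypothetical. GEOLocate's own parsing rules and uncertainty estimates are not reproduced here.

```python
import math
import re
from typing import Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

# Hypothetical one-entry gazetteer; coordinates are approximate.
GAZETTEER = {"Bogotá": (4.711, -74.072)}

BEARINGS = {"N": 0, "NE": 45, "E": 90, "SE": 135, "S": 180, "SW": 225, "W": 270, "NW": 315}

PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(km|mi)\s+(N|NE|E|SE|S|SW|W|NW)\s+of\s+(.+?)\s*$")


def georeference(locality: str) -> Optional[Tuple[float, float]]:
    """Turn 'distance unit direction of Place' into (lat, lon) in decimal degrees."""
    m = PATTERN.match(locality)
    if m is None:
        return None
    distance, unit, direction, place = m.groups()
    if place not in GAZETTEER:
        return None
    km = float(distance) * (1.609344 if unit == "mi" else 1.0)
    lat1, lon1 = map(math.radians, GAZETTEER[place])
    theta = math.radians(BEARINGS[direction])
    d = km / EARTH_RADIUS_KM  # angular distance in radians
    # Destination point on a sphere from start point, bearing and angular distance.
    lat2 = math.asin(math.sin(lat1) * math.cos(d) + math.cos(lat1) * math.sin(d) * math.cos(theta))
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(d) * math.cos(lat1),
        math.cos(d) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lat2), math.degrees(lon2)
```

For instance, `georeference("10 km N of Bogotá")` moves about 0.09° north with longitude unchanged. Localities that do not fit the pattern, or that name places missing from the gazetteer, return `None` and would need manual review.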
    • 64. Collaborative Georeferencing
    • 65. Grant Making: about $2M/yr
      Animal Tracking in South Africa
      Specimen Digitization in Ghana
      Social Value of Conservation in Peru
      Species Pages and Biodiversity Education in Costa Rica
      Niche Modeling in Brazil
      Travel Grants
      Lake Victoria Data Library Project in Tanzania, Uganda and Kenya
      Flora de Colombia en Línea
      JRS Biodiversity Foundation
    • 66. The Future is Collaboration and Data Sharing
      To bring the best data to the major problems and opportunities of our time and the future