NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an era of data-intensive science

Scientific discovery and innovation in an era of data-intensive science
William (Bill) Michener, Professor and Director of e-Science Initiatives for University Libraries, University of New Mexico; DataONE Principal Investigator

The scope and nature of biological, environmental and earth sciences research are evolving rapidly in response to environmental challenges such as global climate change, invasive species and emergent diseases. Scientific studies are increasingly focusing on long-term, broad-scale, and complex questions that require massive amounts of diverse data collected by remote sensing platforms and embedded environmental sensor networks; collaborative, interdisciplinary science teams; and new tools that promote scientific data preservation, discovery, and innovation. This talk describes the challenges facing scientists as they transition into this new era of data-intensive science, presents current solutions, and lays out a roadmap to a future in which new information technologies significantly increase the pace of scientific discovery and innovation.

Published in: Education

  1. Data Observation Network for Earth (DataONE): Supporting Scientific Data Preservation, Discovery, and Innovation. Bill Michener, Professor and DataONE Project Director, University of New Mexico. 24 September 2012, National Information Standards Organization.
  3. Research and Data Life Cycle Integration. [Diagram: the research cycle (ideas, proposal writing, research, publication) shown alongside the data life cycle (plan, collect, assure, describe, preserve, discover, integrate, analyze).]
  4. Three Key Challenges. [Diagram: the data life cycle (plan, collect, assure, describe, preserve, discover, integrate, analyze), annotated with "Innovation".]
  5. Challenge 1: Data Preservation and Planning
  6. The Long Tail of Orphan Data. “Most of the bytes are at the high end, but most of the datasets are at the low end” (Jim Gray). [Chart: volume against rank frequency of datatype, with labels for specialized repositories (e.g. GenBank, PDB) and orphan data (B. Heidorn).]
  7. Planning: Metadata standard? Data repository?
  8. DataONE and the DMPTool Support Data Preservation. Three major components for a flexible, scalable, sustainable network:
      • Member Nodes: diverse institutions; serve local community; provide resources for managing their data; retain copies of data
      • Coordinating Nodes: retain complete metadata catalog; indexing for search; network-wide services; ensure content availability (preservation); replication services
      • Investigator Toolkit
  9. Dryad (>3,000 data products): coordinated submission of articles and underlying data; handshaking with specialized repositories; promotion of reuse and incentives for deposit.
  10. Knowledge Network for Biocomplexity (20,000+ data packages).
      • Data types: ecological; environmental; demographic; social/legal/economic
      • Contributors: individual investigators; field stations and networks; government agencies; non-profit partnerships; synthesis centers
      • [Chart: data sizes, as percentages by size class in MB (<1, 1-10, 10-200, >200).]
  11. ✔ Check for best practices ✔ Create metadata ✔ Connect to ONEShare. Data & Metadata (EML).
  12. Data Management Planning Tool
  15. Challenge 2: Data Discovery
  16. Data Silos
  17. The DataONE Federation
  18. Member Node Functional Tiers:
      • Tier 1: Read only, public content: ping(), getLogRecords(), getCapabilities(), get(), getSystemMetadata(), getChecksum(), listObjects(), synchronizationFailed()
      • Tier 2: Read only, with access control: isAuthorized(), setAccessPolicy()
      • Tier 3: Read/Write using client tools: create(), update(), delete()
      • Tier 4: Able to operate as a replication target: replicate(), getReplica()
      http://mule1.dataone.org/ArchitectureDocs-current/apis/MN_APIs.html
  19. ORNL DAAC as a DataONE Member Node. [Diagram labels: NASA collectors; DAAC Users (UWG); Investigator Toolkit; DataONE Users.]
  25. Approach 1: Ontology-based discovery.
      • Concepts acquire context: biomass as Material or biomass as Energy
      • Super-classes may have different properties
      • Figure labels: search results; additional search terms
      • Steps: (1) NCBO ontology repository instance; (2) populated with ontologies (e.g., the NASA-JPL Semantic Web for Earth and Environmental Terminology); (3) queried ontologies and returned results using REST services
  26. Approach 2: Enrich MN Metadata.

                                     DAAC     DRYAD       KNB
      Number of Documents             978     1,729    24,249
      Total Number of Keywords      7,294     8,266   254,525
      Average Keywords/Document      7.46      4.78     10.49

      [Bar chart comparing DAAC, DRYAD, and KNB on a scale of 0 to 12.]
      Actual keywords: [1] field investigation; [2] analysis; [3] land cover; [4] computational model; [5] reflectance; [6] vegetative cover; [7] biomass; [8] primary production; [9] steel measuring tape; [10] weigh balance; [11] precipitation amount; [12] canopy characteristics; [13] leaf characteristics; [14] water vapor; [15] quadrat sample frame; [16] rain gauge; [17] surface air temperature; [18] air temperature; [19] meteorological station; [20] human observer; [21] vegetation index; [22] soil core device; [23] plant characteristics; [24] surface wind; [25] albedo
      Suggested keywords: 1. canopy characteristics; 2. field investigation; 3. vegetation index; 4. leaf characteristics; 5. Satellite; 6. land cover; 7. leaf area meter; 8. Reflectance; 9. steel measuring tape; 10. vegetative cover; 11. plant characteristics; 12. albedo
  27. Challenge 3: Innovation. The Fourth Paradigm: (1) observational and experimental; (2) theoretical research; (3) computer simulations of natural phenomena; (4) data-intensive research (new tools, techniques, and ways of working).
  28. “Data Intensive Science” and the “80:20 Rule”. [Diagram, adapted from CENR-OSTP, with the axis label "Increasing process knowledge, decreasing spatial coverage" and the tiers: intensive science sites and experiments; extensive science sites; volunteer & education networks; remote sensing.]
  29. Public Participation in Scientific Research Conference: 4-5 August 2012 in Portland, Oregon USA, prior to the Ecological Society of America meeting (6-10 Aug.): http://www.birds.cornell.edu/citscitoolkit/conference/2012
  30. Investigator Toolkit Support. [Diagram: the data life cycle (plan, collect, assure, describe, preserve, discover, integrate, analyze) with the tool labels DMP-Tool and Kepler.]
  31. Exploration, Visualization, and Analysis. Diverse bird observations and environmental data from 300,000 locations in the US, integrated and analyzed using High Performance Computing Resources.
      • Inputs: Land Cover; Meteorology; MODIS – Remote sensing data
      • Spatio-Temporal Exploratory Model identifies factors affecting patterns of migration
      • Model results: Occurrence of Indigo Bunting (2008), shown for Jan, Apr, Jun, Sep, Dec
      • Examine patterns of migration
      • Infer how climate change may affect bird migration
  32. Taverna, MyExperiment
  33. Provenance Browser
  34. DataONE: Supporting Scientific Data Preservation, Discovery, and Innovation. [Logo panels: Current Member Nodes; Coming Soon; Current Tools; Tools Coming Soon. Text label: Queensland University of Technology.]
  35. Deployment Targets – Y5. Timeline header: 2009, 2010, 2011, 2012, 2013, 2014 (Y1 to Y5).

      Metadata Objects     100k (130k)   400k    1M
      Datasets             90k (120k)    180k    360k
      Uptime               99.0 (100)    99.9    99.9
      Metadata Schemas     8 (4)         8       8
      Member Nodes         10 (8)        20      40
      MN Countries         3 (2)         5       10
      Coordinating Nodes   3 (3)         4       5
      CN Countries         1 (1)         1       2
      ITK Tools            8 (4)         10      12
  36. Community Engagement
  37. User Assessments. [Timeline, Year 1 to Year 5: Scientists: BL; Scientists: FU; Library Policies: BL; Library Policies: FU; Librarians: BL; Librarians: FU; Policy Makers: BL; Policy Makers: FU; Educators: BL; Educators: FU.]
  38. Community Engagement
  39. Best Practices and Software Tools
  40. June 3-21, 2013, University of New Mexico
  41. Internships: 2009 – 4 interns; 2010 – 4 interns; 2011 – 8 interns; 2012 – 6 interns. https://notebooks.dataone.org/summer2012/
  42. DataONE: Supporting Scientific Data Preservation, Discovery, and Innovation
  43. DataONE.org
  44. DataONE Team and Sponsors.
      • Amber Budden, Roger Dahl, Rebecca Koskela, Bill Michener, Robert Nahf, Skye Roseboom, Mark Servilla
      • Dave Vieglais
      • Suzie Allard, Nick Dexter, Kimberly Douglass, Carol Tenopir, Robert Waltz, Bruce Wilson
      • John Cobb, Bob Cook, Ranjeet Devarakonda, Giri Palanismy, Line Pouchard
      • Patricia Cruse, John Kunze
      • Sky Bristol, Mike Frame, Richard Huffine, Viv Hutchison, Jeff Morisette, Jake Weltzin, Lisa Zolly
      • Stephanie Hampton, Chris Jones, Matt Jones, Ben Leinfelder, Andrew Pippin
      • Paul Allen, Rick Bonney, Steve Kelling
      • Ryan Scherle, Todd Vision
      • Randy Butler
      • Ewa Deelman
      • Deborah McGuinness
      • Jeff Horsburgh
      • Robert Sandusky
      • Bertram Ludaescher
      • Peter Honeyman
      • Cliff Duke
      • Carole Goble
      • Donald Hobern
      • David DeRoure
      Sponsor logo text: LEON LEVY FOUNDATION
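
The Member Node functional tiers on slide 18 can be read as a lookup table. The sketch below is a minimal Python illustration, not DataONE code: the method names are transcribed from the slide, while the helper names `methods_for_tier` and `minimum_tier`, and the assumption that each tier also includes the methods of the tiers below it, are hypothetical choices made for this example.

```python
# Tier-to-method table transcribed from slide 18.
# ASSUMPTION: tiers are treated as cumulative (a Tier 3 node also offers the
# Tier 1 and Tier 2 methods). The slide itself does not state this.
MN_TIER_METHODS = {
    1: ["ping", "getLogRecords", "getCapabilities", "get",
        "getSystemMetadata", "getChecksum", "listObjects",
        "synchronizationFailed"],
    2: ["isAuthorized", "setAccessPolicy"],
    3: ["create", "update", "delete"],
    4: ["replicate", "getReplica"],
}


def methods_for_tier(tier):
    """Hypothetical helper: list every method a node at `tier` would offer."""
    if tier not in MN_TIER_METHODS:
        raise ValueError(f"unknown tier: {tier}")
    methods = []
    for level in range(1, tier + 1):
        methods.extend(MN_TIER_METHODS[level])
    return methods


def minimum_tier(method):
    """Hypothetical helper: lowest tier whose slide entry names `method`."""
    for level in sorted(MN_TIER_METHODS):
        if method in MN_TIER_METHODS[level]:
            return level
    raise KeyError(method)


if __name__ == "__main__":
    print(methods_for_tier(2))
    print(minimum_tier("replicate"))
```

Under these assumptions, a client could use such a table to decide whether a given node is expected to accept writes (Tier 3 or above) or to act as a replication target (Tier 4).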