The DataONE mission/vision is to “enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it.” DataONE is based on three precepts. 1. We are leveraging existing infrastructure, such as the hundreds of existing data centers and repositories and the myriad software tools. 2. We are focusing our efforts on developing new infrastructure that better enables interoperability across data centers and between scientific tools and data resources. [The new cyberinfrastructure being created by DataONE is illustrated on a future slide.] 3. We recognize that the largest challenges are sociocultural in nature, so we focus significant attention on engaging and supporting the broader community of stakeholders (e.g., scientists, students, librarians).
DataONE is a federated data network built to improve access to Earth science data and to support science by engaging the relevant science, data, and policy communities; facilitating easy, secure, and persistent storage of data; and disseminating integrated, user-friendly tools for data discovery, analysis, visualization, and decision-making. There are three principal components:
• Member Nodes, which include a diverse array of data centers and repositories associated with national and international agencies, research networks, universities, libraries, etc.
• Coordinating Nodes, which support data replication across Member Nodes (i.e., data centers) as well as network-wide services such as 24/7 access to the metadata held at the Coordinating Nodes, indexing, and rapid search and discovery.
• An Investigator Toolkit, which includes tools that are widely used by scientists. The tools are coupled with DataONE resources so that it is possible, for example, to access data at Member Nodes seamlessly and transparently through the tool of your choice.
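The following is a minimal Python sketch of how the work is divided between these components. It is a toy model under stated assumptions, not the DataONE API. The class and method names (`MemberNode`, `CoordinatingNode`, `publish`, `search`, `resolve`) and the simple replica-count policy are hypothetical. In this model, Member Nodes hold the data objects. The Coordinating Node keeps the complete metadata catalog, answers searches against that catalog, and places extra copies of each object on other Member Nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """A data center or repository that stores data objects (hypothetical model)."""
    node_id: str
    objects: dict[str, bytes] = field(default_factory=dict)

    def store(self, pid: str, data: bytes) -> None:
        self.objects[pid] = data

    def get(self, pid: str) -> bytes | None:
        return self.objects.get(pid)


@dataclass
class CoordinatingNode:
    """Keeps the complete metadata catalog and coordinates replication (hypothetical model)."""
    member_nodes: dict[str, MemberNode] = field(default_factory=dict)
    catalog: dict[str, dict] = field(default_factory=dict)         # pid -> metadata record
    locations: dict[str, list[str]] = field(default_factory=dict)  # pid -> node_ids holding a copy

    def register(self, node: MemberNode) -> None:
        self.member_nodes[node.node_id] = node

    def publish(self, origin_id: str, pid: str, data: bytes,
                metadata: dict, replicas: int = 2) -> None:
        # The originating Member Node keeps its own copy of the object.
        self.member_nodes[origin_id].store(pid, data)
        holders = [origin_id]
        # Copy the object to other Member Nodes until the (assumed) replica target is met.
        for node_id, node in self.member_nodes.items():
            if len(holders) >= replicas:
                break
            if node_id != origin_id:
                node.store(pid, data)
                holders.append(node_id)
        # Only the Coordinating Node holds the network-wide metadata catalog.
        self.catalog[pid] = metadata
        self.locations[pid] = holders

    def search(self, keyword: str) -> list[str]:
        # Discovery runs against the metadata catalog, not against the data itself.
        kw = keyword.lower()
        return [pid for pid, md in self.catalog.items()
                if any(kw in str(value).lower() for value in md.values())]

    def resolve(self, pid: str) -> bytes | None:
        # Return the first copy still available, so data survive the loss of one Member Node.
        for node_id in self.locations.get(pid, []):
            data = self.member_nodes[node_id].get(pid)
            if data is not None:
                return data
        return None
```

For example, you could register three Member Nodes (`mn_a`, `mn_b`, `mn_c`) and then call `publish("mn_a", "doi:example/1", b"counts", {"title": "Bird counts 2011"})`. The object is stored on `mn_a` and replicated to `mn_b`. A `search("bird")` finds it through the catalog. `resolve` still returns the data if the copy on `mn_a` is lost. The identifier and metadata values here are made up for illustration.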
Other development activities during years 3-5 will focus on expanding the suite of tools that are available through the Investigator Toolkit. New tool additions will be identified and prioritized by the DataONE Users Group.
Other development activities during years 2-5 will focus on expanding the suite of tools that are available through the Investigator Toolkit. New tool additions will be identified and prioritized by the DataONE Users Group.
This final slide illustrates the initial DataONE partners, which have been involved for over three years, since the proposal was conceived. The DataONE Users Group now includes significantly more partners, and we expect it to grow exponentially over the next five years.
The DataONE team is growing!
The Scientific Exploration, Visualization and Analysis (EVA) Working Group is an example of a scientific use case. By working through a comprehensive case study, this working group was able to provide specific guidance on the challenges faced when conducting data-intensive science. These challenges were communicated to, and met by, the DataONE core CI team and developers.
Science requires multiple cooperating extreme-scale CI components (a lesson learned from the EVA/eBird pilot):
• The EVA pilot collaborated with TeraGrid (now XSEDE) to use HPC and “schlep” data as part of the workflow.
• 50K CPU-core hours (SUs) were used last year (supporting SOTB 2011).
• 3M hours are allocated this year. The Cornell CLO team has optimized the code for a 3-10X speedup and loosened the data transfer bottleneck, so we will under-run the allocation.
• Plan for runs of 500 species (3 years of data). Currently: 70 per week for the 2011 campaign.
• HPC use has grown 10X two years in a row, and data volumes are increasing as well.
Conclusion: success breeds scale.
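Here is a rough arithmetic check of the under-run remark. It rests on an assumption of ours that the slide does not state: the 3M-hour allocation was sized for the unoptimized code. Under that assumption, a speedup factor $s$ divides the hours actually consumed:

```latex
H_{\text{used}} = \frac{H_{\text{alloc}}}{s}, \qquad
H_{\text{alloc}} = 3 \times 10^{6}\ \text{SU}, \quad 3 \le s \le 10
\;\Longrightarrow\;
3 \times 10^{5}\ \text{SU} \le H_{\text{used}} \le 1 \times 10^{6}\ \text{SU}
```

If the assumption holds, the pilot would use between one tenth and one third of its allocation.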
DataOne - Suzie Allard - RDAP12
DataONE
Research Data Access & Preservation
21 March 2012
Suzie Allard, Ph.D.
University of Tennessee
DataONE vision and approach
Enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it.
1. Build on existing cyberinfrastructure
2. Create new cyberinfrastructure
3. Support communities of practice
DataONE Cyberinfrastructure
Three major components for a flexible, scalable, sustainable network
Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
• retain copies of data
Coordinating Nodes
• retain complete metadata catalog
• indexing for search
• network-wide services
• ensure content availability (preservation)
• replication services
Investigator Toolkit
Training in all elements of the data life cycle:
Plan • Collect • Assure • Describe • Preserve • Discover • Integrate • Analyze
[Figure: data life cycle diagram, with Kepler labeled]
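The DataONE data life cycle runs Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze, and then returns to Plan. The cycle can be modeled as a small cyclic enumeration. This Python sketch is only an illustration. The names `Stage`, `next_stage`, and `walk` are hypothetical and are not part of any DataONE tool.

```python
from __future__ import annotations

from enum import Enum


class Stage(Enum):
    """Stages of the data life cycle, in cycle order (hypothetical model)."""
    PLAN = 1
    COLLECT = 2
    ASSURE = 3
    DESCRIBE = 4
    PRESERVE = 5
    DISCOVER = 6
    INTEGRATE = 7
    ANALYZE = 8

    def next_stage(self) -> Stage:
        # Members iterate in definition order; wrap from ANALYZE back to PLAN.
        members = list(Stage)
        return members[(members.index(self) + 1) % len(members)]


def walk(start: Stage, n: int) -> list[Stage]:
    """Return n consecutive stages beginning at start, wrapping around the cycle."""
    stages = []
    current = start
    for _ in range(n):
        stages.append(current)
        current = current.next_stage()
    return stages
```

Making the cycle wrap reflects the slide's framing: analysis of existing data feeds the planning of the next study.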
DataONE Education and Training
Summer Internships
Training at Conferences and Workshops
• Supercomputing 2011
• DataONE Implementation Workshop: Publishing data as a Member Node
• Ecological Society of America (ESA)
• American Geophysical Union (AGU)
Educational Modules
Graduate-level course
• Summer Institute for Environmental Informatics
Environmental Information Management (EIM) Institute
Graduate students in biology, geology, ecology, or other environmental sciences, environmental engineering, geography, or science librarianship
Conceptual and practical hands-on training to effectively design, manage, analyze, visualize, and preserve data and information:
• Managing data files
• Creating databases and web portals
• Data analysis and visualization
• Techniques for managing, analyzing, and visualizing geospatial data
DataONE Team and Sponsors
• Amber Budden, Roger Dahl, Rebecca Koskela, Bill Michener, Robert Nahf, Mark Servilla
• Dave Vieglais
• Suzie Allard, Carol Tenopir, Maribeth Manoff, Kimberley Douglass, Robert Waltz
• Bruce Wilson, Giri Palanisamy, John Cobb, Bob Cook, Line Pouchard
• Patricia Cruse, John Kunze
• Sky Bristol, Mike Frame, Richard Huffine, Viv Hutchison, Jeff Morisette, Jake Weltzin, Lisa Zolly
• Chris Jones, Stephanie Hampton, Matt Jones
• Paul Allen, Rick Bonney, Steve Kelling
• Ryan Scherle, Todd Vision
• Randy Butler
• Ewa Deelman
• Peter Honeyman
• Jeff Horsburgh
• Robert Sandusky
• Bertram Ludaescher
• Peter Buneman
• Cliff Duke
• Carole Goble
• Donald Hobern
• David DeRoure
LEON LEVY FOUNDATION
A Science Use Case
Diverse bird observations and environmental data from 300,000 locations in the US, integrated and analyzed using High Performance Computing resources
Environmental inputs: Land Cover • Meteorology • MODIS remote sensing data
Spatio-Temporal Exploratory Model identifies factors affecting patterns of bird migration
[Figure: Model results, occurrence of Indigo Bunting (2008); panels for Jan, Apr, Jun, Sep, Dec]
• Examine patterns of migration
• Infer how climate change may affect bird migration