Jeff Sale and Diane Baxter
San Diego Supercomputer Center, University of California, San Diego
Data: Evidence to unravel the mysteries of our Universe
Data provide answers to our children’s single most persistent question:
How do you know that?
Data Come From Every Field . . .
Life Sciences
Astronomy
Physics
Modeling and Simulation
Data Management and Mining
GAMESS
Geosciences
And are shared around the world.
Open Science Grid: Physics-driven Grid infrastructure
NEES: Earthquake Engineering Grid
SDSC
PRAGMA: Pacific Rim Grid Middleware Consortium
TeraGrid: National Research Resource Grid
GEON: Geosciences Grid
BIRN: Biomedical Informatics Grid
Life Sciences
Users
Portals, Domain Specific APIs provide access to data
Middleware federates data across disciplinary vocabularies
Disciplinary Databases
Scales: Organisms, Organs, Cells, Organelles, Biopolymers, Atoms
Disciplines: Anatomy, Physiology, Cell Biology, Proteomics, Genomics, Medicinal Chemistry
How much data are we producing*?
1 Low Resolution Photo = 100 KiloBytes
1 novel = 1 MegaByte
iPod Shuffle (up to 120 songs) = 512 MegaBytes
1 DVD = 9.4 GigaBytes
Printed materials in the Library of Congress = 10 TeraBytes
1 human brain at the micron level = 1 PetaByte
SDSC HPSS tape archive = 25 PetaBytes and growing
All worldwide information in one year = 2 ExaBytes
* Rough/average estimates
Kilo = 10^3, Mega = 10^6, Giga = 10^9, Tera = 10^12, Peta = 10^15, Exa = 10^18
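To get a feel for these scales, the sizes above can be compared directly using the decimal prefixes. The following is a minimal sketch, not part of the original slides: the names `PREFIX_EXPONENT`, `to_bytes`, and `ratio` are hypothetical helpers introduced only for illustration, and the inputs are the slide's own rough estimates.

```python
# Decimal prefixes as listed on the slide: Kilo = 10^3 ... Exa = 10^18.
PREFIX_EXPONENT = {"Kilo": 3, "Mega": 6, "Giga": 9, "Tera": 12, "Peta": 15, "Exa": 18}


def to_bytes(value, prefix):
    """Convert a value given with a decimal prefix (e.g. 10, "Tera") to bytes."""
    return value * 10 ** PREFIX_EXPONENT[prefix]


def ratio(value_a, prefix_a, value_b, prefix_b):
    """Return how many copies of quantity B fit into quantity A."""
    return to_bytes(value_a, prefix_a) / to_bytes(value_b, prefix_b)


# All figures below are the slide's rough/average estimates.
novels_per_library = ratio(10, "Tera", 1, "Mega")    # Library of Congress vs. one novel
libraries_per_archive = ratio(25, "Peta", 10, "Tera")  # SDSC HPSS archive vs. Library of Congress
archives_per_year = ratio(2, "Exa", 25, "Peta")      # one year of worldwide data vs. SDSC archive

print(f"Novels per Library of Congress: {novels_per_library:,.0f}")
print(f"Libraries of Congress per SDSC archive: {libraries_per_archive:,.0f}")
print(f"SDSC archives per year of worldwide data: {archives_per_year:,.0f}")
```

Under these estimates, the printed materials in the Library of Congress correspond to about ten million novels, and the SDSC tape archive to about 2,500 Libraries of Congress.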
Computational tools are essential to comprehend that much data!
Integrate vast data collections from a wide variety of collection points
Visualize empirical results
Create mathematical models based on complex, interconnected data
Extend beyond data to:
Ask “what if” questions
Evaluate alternate hypotheses
Visualize vast, complex data collections
Manipulate multiple variables
Why is Data Literacy So Essential?
Data = the foundation of science
Data shared can solve problems
Data can bridge and connect fields, ideas and people
Computation, the “third leg” of research, depends upon data
From atomic interaction data that form a molecular dynamics model
To light emission data from long-dead stars that explain the origins of the Universe
Teaching about data means teaching the “language and currency” of science.