Simulation Informatics

A high-level vision for a new project on how to use MapReduce computers for simulating physical phenomena

Published in: Technology

  1. SIMULATION INFORMATICS
     A data-driven methodology for synthesis from simulation results
     David F. Gleich, John von Neumann Postdoctoral Fellow, Sandia National Laboratories, Livermore, CA
     David Gleich (Sandia) 5/5/2011 1/18
  2. A bit more background on me
     Before
       Undergrad: Harvey Mudd College (Joint CS/Math)
       Internships: Yahoo! and Microsoft
       Graduate: Stanford (Computational and Mathematical Engineering)
         Thesis: Models and Algorithms for PageRank Sensitivity
       Internships: Intel, Joint Library of Congress/Stanford, Microsoft
       Post-doc 1: Univ. of British Columbia
     Now
       Post-doc 2: 2010 John von Neumann fellow @ Sandia
     In August
       Tenure-track position at Purdue’s CS department
  3. It’s time to ask: What can science learn from Google?
     -- Wired Magazine (2008)
     http://www.wired.com/science/discoveries/magazine/16-07/pb_theory
  4. A tale of two computers …
     Cielo (#10 in top 500): separate storage system. This platform is awesome for simulating physical systems. Cost ~$50 million.
     SNL/CA Nebula Cluster: 2TB/core storage. This platform is awesome for working with data. Cost $150k.
  5. Simulation: The Third Pillar of Science
     21st Century Science in a nutshell
       Experiments are not practical / feasible.
       Simulate things instead.
     But do we trust the simulations? We’re trying!
       Model Fidelity
       Verification & Validation (V&V)
       Uncertainty Quantification (UQ)
     Insight and confidence require multiple runs.
  6. The problem: Simulation ain’t cheap!
     We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
     --Wired (again)
     21.1st Century Science in a nutshell?
       Simulations are too expensive.
       Let data provide a surrogate.
     The Database
       Input parameters (s) -> Time history of simulation (f)
       s1 -> f1
       s2 -> f2
       sk -> fk
  7. Our CSRF project: Simulation Informatics
     To enable analysts and engineers to hypothesize from inexpensive data computations instead of expensive HPC computations.
     The People
       Myself
       Paul G. Constantine (2009 von Neumann)
       Stefan Domino
       Jeremy Templeton
       Joe Ruthruff
  8. The idea and vision: Store the runs!
     Supercomputer: Each multi-day HPC simulation generates gigabytes of data.
     MapReduce cluster: A MapReduce cluster can hold hundreds or thousands of old simulations …
     Engineer: … enabling engineers to query and analyze months of simulation data for statistical studies and uncertainty quantification.
  9. MapReduce
     Originated at Google for indexing web pages and computing PageRank.
     The idea: Bring the computations to the data.
       Express algorithms in data-local operations.
       Implement one type of communication: shuffle.
       Shuffle moves all data with the same key to the same reducer.
     Data scalable
       [Diagram: maps (M) -> shuffle -> reduces (R)]
     Fault tolerance by design
       Input stored in triplicate
       Map output persisted to disk before shuffle
       Reduce input/output on disk
  10. Mesh point variance in MapReduce
      Three simulation runs for three time steps each.
      [Figure: Run 1, Run 2, Run 3, each shown at T=1, T=2, T=3]
  11. Mesh point variance in MapReduce
      Three simulation runs for three time steps each.
      [Figure: Run 1, Run 2, Run 3, each shown at T=1, T=2, T=3, feeding mappers (M) and reducers (R)]
      1. Each mapper outputs the mesh points with the same key.
      2. Shuffle moves all values from the same mesh point to the same reducer.
      3. Reducers just compute a numerical variance.
      Bring the computations to the data!
  12. Hadoop???
      Hadoop: an open-source MapReduce system supported by Yahoo!
      Hadoop was the name of Doug Cutting’s son’s stuffed elephant.
      Photo credit: New York Times (2009)
      http://www.nytimes.com/2009/03/17/technology/business-computing/17cloud.html
  13. MapReduce and Surrogate Models
      A surrogate model is a function that reproduces the output of a simulation and predicts its output at new parameter values.
      [Figure: sampled outputs f1, f2, f5 with the surrogate drawn through them]
      The Database (on the MapReduce cluster): s1 -> f1, s2 -> f2, sk -> fk
        -> Extraction ->
      The Surrogate (just one machine)
        -> Interpolation ->
      New Samples (on the MapReduce cluster): sa -> fa, sb -> fb, sc -> fc
  14. MapReduce + Simulations
      Thermal Race problem (Joe Ruthruff and Jeremy Templeton)
      Heating an unclassified NW model to simulate a fire
      Goal: make sure the weak link fails before the strong link
      Objective: determine where parameters are interesting
      Parameters: material properties
      1000 sample parameter study
        30 min per run
        700GB of raw data
        ~64GB of data for MapReduce
      Our results
        Extraction: 30 min
        Interpolation: 5 min
  15. Data driven Green’s functions
      The simulation varies most in the corners of the parameter space.
      These analyses will help understand which parameters matter, and where to continue running the simulations.
  16. What can science learn from Google?
      The importance of data.
      For more about this, see a talk and paper by Peter Norvig:
        “Internet-Scale Data Analysis” http://www.stanford.edu/group/mmds/slides2010/Norvig.pdf
        Halevy et al. “The unreasonable effectiveness of data.” IEEE Intelligent Systems, 2009.
  17. The future?
      MapReduce computers embedded within supercomputers as the parallel file system and data analysis platform. The best of both worlds.
      Our work: Laying the foundation for useful surrogate models that will make data-based simulation a compelling addition to the simulation revolution.
      Any questions?
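
The three steps on the "Mesh point variance in MapReduce" slides (map, shuffle, reduce) can be sketched in plain Python. This is a minimal single-process illustration, not Hadoop code and not the project's implementation. The data layout (`runs[run][time_step]` as a list of per-mesh-point values), the function names, the choice of the mesh point index alone as the key, and the use of population variance are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import pvariance


def mapper(values):
    """Map step: emit (mesh point index, value) for one stored time step.

    Assumption: the key is the mesh point index only, so values from every
    run and every time step at that mesh point end up together.
    """
    for mesh_point, value in enumerate(values):
        yield mesh_point, value


def shuffle(pairs):
    """Shuffle step: move all values with the same key to the same group."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(mesh_point, values):
    """Reduce step: compute a numerical variance for one mesh point.

    Assumption: population variance; the slides do not say which estimator.
    """
    return mesh_point, pvariance(values)


def mesh_point_variance(runs):
    """Run the map, shuffle, and reduce steps over all runs and time steps."""
    pairs = [
        pair
        for run in runs
        for time_step_values in run
        for pair in mapper(time_step_values)
    ]
    grouped = shuffle(pairs)
    return dict(reducer(key, values) for key, values in grouped.items())
```

Calling `mesh_point_variance(runs)` on a nested list of three runs with three time steps each returns a dictionary from mesh point index to variance. On a real cluster the mappers would run where each run's data is stored, and only the shuffle would move data between machines.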
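
The surrogate model idea (a function that reproduces stored simulation outputs and predicts output at new parameter values) can also be illustrated with a small sketch. The slides do not say which interpolation method the project uses, so this example assumes the simplest possible case: a single scalar parameter `s` and piecewise linear interpolation between the two nearest stored time histories. The names `build_surrogate` and `database` are hypothetical.

```python
from bisect import bisect_right


def build_surrogate(database):
    """Build a surrogate from a database of stored runs.

    Assumption: `database` maps a scalar parameter s to a time history f,
    given as a list of floats of equal length for every run.
    """
    samples = sorted(database.items())
    params = [s for s, _ in samples]
    histories = [f for _, f in samples]

    def surrogate(s_new):
        # This sketch interpolates only; it refuses to extrapolate.
        if not params[0] <= s_new <= params[-1]:
            raise ValueError("parameter lies outside the sampled range")
        # Find the stored runs that bracket s_new.
        hi = min(bisect_right(params, s_new), len(params) - 1)
        lo = max(hi - 1, 0)
        if params[hi] == params[lo]:
            return list(histories[lo])
        # Blend the two bracketing time histories point by point.
        w = (s_new - params[lo]) / (params[hi] - params[lo])
        return [(1 - w) * a + w * b for a, b in zip(histories[lo], histories[hi])]

    return surrogate
```

At a stored parameter value the surrogate returns the stored time history unchanged, and between two stored values it returns a weighted blend of their histories. A parameter study with several material properties would need a multivariate interpolant in place of this one-dimensional rule.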