Simulation Informatics

A high-level vision for a new project on how to use MapReduce computers to simulate physical phenomena.

Transcript

  • 1. SIMULATION INFORMATICS: a data-driven methodology for synthesis from simulation results. David F. Gleich, John von Neumann Postdoctoral Fellow, Sandia National Laboratories, Livermore, CA. 5/5/2011.
  • 2. A bit more background on me. Before: undergrad at Harvey Mudd College (joint CS/math); internships at Yahoo! and Microsoft. Graduate: Stanford (Computational and Mathematical Engineering); thesis: Models and Algorithms for PageRank Sensitivity; internships at Intel, a joint Library of Congress/Stanford project, and Microsoft. Post-doc 1: Univ. of British Columbia. Now, post-doc 2: 2010 John von Neumann fellow @ Sandia. In August: tenure-track position in Purdue’s CS department.
  • 3. It’s time to ask: What can science learn from Google? -- Wired Magazine (2008), http://www.wired.com/science/discoveries/magazine/16-07/pb_theory
  • 4. A tale of two computers …
    Cielo (#10 in the Top 500): separate storage system; this platform is awesome for simulating physical systems; cost ~$50 million.
    SNL/CA Nebula Cluster: 2 TB/core of storage; this platform is awesome for working with data; cost ~$150k.
  • 5. Simulation: The Third Pillar of Science. 21st-century science in a nutshell: experiments are not practical or feasible, so simulate things instead. But do we trust the simulations? We’re trying! Model fidelity, verification & validation (V&V), uncertainty quantification (UQ). Insight and confidence require multiple runs.
  • 6. The problem: simulation ain’t cheap! “We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.” --Wired (again). 21.1st-century science in a nutshell? Simulations are too expensive; let data provide a surrogate. The Database: input parameters s map to the time history of the simulation f, i.e., s1 -> f1, s2 -> f2, …, sk -> fk.
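To make the database above concrete, here is a minimal sketch of one possible on-disk layout for the runs, with every scalar output keyed by (run, mesh point, time step). The flat tab-separated format and the helper below are illustrative assumptions, not the project’s actual schema.

```python
# Hypothetical flat layout for the simulation database (an assumption
# for illustration): one tab-separated record per scalar output,
#   run_id <TAB> mesh_point_id <TAB> time_step <TAB> field_value
import csv

def write_run(path, run_id, time_history):
    """Append one run's time history to the database file.

    time_history maps (mesh_point_id, time_step) -> field value.
    """
    with open(path, "a", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        for (point, step), value in sorted(time_history.items()):
            writer.writerow([run_id, point, step, value])

# Example: store run "s1" with two mesh points over two time steps.
write_run("runs.tsv", "s1", {(0, 1): 3.2, (0, 2): 3.5,
                             (1, 1): 7.1, (1, 2): 6.9})
```

Records in this form shuffle naturally by any component of the key, which is exactly what the mesh-point variance example on slide 11 needs.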
  • 7. Our CSRF project: Simulation Informatics. Goal: to enable analysts and engineers to hypothesize from inexpensive data computations instead of expensive HPC computations. The people: myself, Paul G. Constantine (2009 von Neumann), Stefan Domino, Jeremy Templeton, and Joe Ruthruff.
  • 8. The idea and vision: store the runs! Supercomputer -> MapReduce cluster -> engineer. Each multi-day HPC simulation generates gigabytes of data. A MapReduce cluster can hold hundreds or thousands of old simulations, enabling engineers to query and analyze months of simulation data for statistical studies and uncertainty quantification.
  • 9. MapReduce. Originated at Google for indexing web pages and computing PageRank. The idea: bring the computations to the data. Express algorithms in data-local operations, and implement only one type of communication, the shuffle, which moves all data with the same key to the same reducer. Data-scalable, with fault tolerance by design: input is stored in triplicate, reduce input/output sits on disk, and map output is persisted to disk before the shuffle. [Diagram: mappers feed a shuffle, which feeds reducers.]
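To make the map/shuffle/reduce pattern concrete, here is a minimal single-machine sketch in Python (an illustration, not code from the slides): the “shuffle” is nothing more than grouping the mappers’ (key, value) pairs by key before handing each group to a reducer.

```python
from itertools import groupby
from operator import itemgetter

def map_reduce(records, mapper, reducer):
    """Toy in-memory MapReduce: map, shuffle (group by key), reduce."""
    # Map: each record may emit any number of (key, value) pairs.
    pairs = [kv for rec in records for kv in mapper(rec)]
    # Shuffle: bring every value with the same key together.
    pairs.sort(key=itemgetter(0))
    # Reduce: one call per key, over all of that key's values.
    return {key: reducer(key, [v for _, v in group])
            for key, group in groupby(pairs, key=itemgetter(0))}

# Example: word counting, the classic MapReduce exercise.
docs = ["bring the computations to the data", "the data"]
counts = map_reduce(docs,
                    mapper=lambda doc: [(w, 1) for w in doc.split()],
                    reducer=lambda key, vals: sum(vals))
print(counts["the"])  # -> 3
```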
  • 10. Mesh point variance in MapReduce. Three simulation runs for three time steps each. [Diagram: Run 1, Run 2, and Run 3, each with fields at T=1, T=2, T=3.]
  • 11. Mesh point variance in MapReduce. Three simulation runs for three time steps each. 1. Each mapper outputs the mesh points with the same key. 2. The shuffle moves all values from the same mesh point to the same reducer. 3. Reducers just compute a numerical variance. Bring the computations to the data!
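A sketch of how this might look as a pair of Hadoop-streaming-style Python scripts, assuming the flat record format sketched after slide 6 (the file names and fields are assumptions): the mapper re-keys every value by mesh point, and each reducer then sees one mesh point’s values as a contiguous run of its sorted input.

```python
#!/usr/bin/env python
# mapper.py: read "run_id <TAB> mesh_point <TAB> time_step <TAB> value"
# records from stdin and re-key each value by mesh point, so the
# shuffle delivers all of one mesh point's values to one reducer.
import sys

for line in sys.stdin:
    _run, point, _step, value = line.rstrip("\n").split("\t")
    print(f"{point}\t{value}")
```

```python
#!/usr/bin/env python
# reducer.py: input arrives sorted by key (mesh point); keep a running
# count, sum, and sum of squares per key and emit a variance.
# (ss/n - mean^2 is fine for a sketch; a production reducer might
# prefer Welford's numerically stabler update.)
import sys

def emit(point, n, s, ss):
    mean = s / n
    print(f"{point}\t{ss / n - mean * mean}")

current, n, s, ss = None, 0, 0.0, 0.0
for line in sys.stdin:
    point, value = line.rstrip("\n").split("\t")
    if point != current:
        if current is not None:
            emit(current, n, s, ss)
        current, n, s, ss = point, 0, 0.0, 0.0
    v = float(value)
    n, s, ss = n + 1, s + v, ss + v * v
if current is not None:
    emit(current, n, s, ss)
```

The pair can be tested locally without a cluster: cat runs.tsv | python mapper.py | sort | python reducer.py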
  • 12. Hadoop??? Hadoop: an open-source MapReduce system supported by Yahoo!. Hadoop was the name of Doug Cutting’s son’s stuffed elephant. Photo credit: New York Times (2009), http://www.nytimes.com/2009/03/17/technology/business-computing/17cloud.html
  • 13. MapReduce and surrogate models. A surrogate model is a function that reproduces the output of a simulation and predicts its output at new parameter values. The database (s1 -> f1, s2 -> f2, …, sk -> fk) lives on the MapReduce cluster; an extraction step pulls out what the surrogate needs; the surrogate itself fits on just one machine; and interpolation evaluates it at new samples (sa -> fa, sb -> fb, sc -> fc) back on the MapReduce cluster.
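As one concrete (assumed) instance of the interpolation step, the surrogate could be as simple as a radial-basis-function interpolant through the stored (s_i, f_i) pairs; the slide does not pin down a scheme, so the Gaussian-kernel choice below is purely illustrative.

```python
import numpy as np

def rbf_surrogate(S, F, eps=1.0):
    """Fit a Gaussian RBF interpolant through the samples (S[i], F[i]).

    S: (k, d) array of input parameters; F: (k,) array of outputs.
    Returns a callable predicting f at new parameter values.
    """
    def kernel(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-eps * d2)

    weights = np.linalg.solve(kernel(S, S), F)  # interpolation weights

    def predict(s_new):
        return kernel(np.atleast_2d(s_new), S) @ weights

    return predict

# Example: five stored runs of a 1-parameter "simulation", then a
# cheap prediction at the new parameter value s = 0.3.
S = np.linspace(0.0, 1.0, 5)[:, None]
F = np.sin(2 * np.pi * S[:, 0])
surrogate = rbf_surrogate(S, F)
print(surrogate([0.3]))
```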
  • 14. MapReduce + simulations. Thermal race problem (Joe Ruthruff and Jeremy Templeton): heating an unclassified NW model to simulate a fire. Goal: make sure the weak link fails before the strong link. Objective: determine where parameters are interesting. Parameters: material properties. A 1000-sample parameter study at 30 min per run produced 700 GB of raw data, ~64 GB of which went to MapReduce. Our results: extraction, 30 min; interpolation, 5 min.
  • 15. Data-driven Green’s functions. The simulation varies most in the corners of the parameter space. These analyses will help us understand which parameters matter and where to continue running the simulations.
  • 16. What can science learn from Google? The importance of data. For more about this, see a talk and paper by Peter Norvig: “Internet-Scale Data Analysis,” http://www.stanford.edu/group/mmds/slides2010/Norvig.pdf, and Halevy et al., “The Unreasonable Effectiveness of Data,” IEEE Intelligent Systems, 2009.
  • 17. The future? MapReduce computers embedded within supercomputers as the parallel file system and data analysis platform: the best of both worlds. Our work lays the foundation for useful surrogate models that will make data-based simulation a compelling addition to the simulation revolution. Any questions?
