Computing Just What You Need: Online Data Analysis and Reduction at Extreme Scales

Talk given at Euro-Par 2017 on August 31 in Santiago de Compostela.


  1. Computing Just What You Need: Online Data Analysis and Reduction at Extreme Scales. Ian Foster, Argonne & U.Chicago. August 31, 2017, Euro-Par, Santiago de Compostela. https://www.researchgate.net/publication/317703782
  2. What I won’t talk about: globus.org. 5 major services; 13 national labs use Globus; 300 PB transferred; 10,000 active endpoints; 50 Bn files processed; 70,000 registered users; 99.5% uptime; 65+ institutional subscribers; 1 PB largest single transfer to date; 3 months longest continuously managed transfer; 300+ federated campus identities; 12,000 active users/year
  3. Shameless plugs
  4. Three messages: Dramatic changes in HPC system geography … are driving new application structures … resulting in exciting new computer science challenges
  5. Geography: (Part of) what determines how long it takes to get from A to B
  6. Geography: (Part of) what determines how long it takes to get from A to B. The memory hierarchy plays a big role in computing geography
  7. Geography: (Part of) what determines how long it takes to get from A to B • Computing geography is changing rapidly • Despite continued exponential growth in many technologies • Different rates mean that resources are getting farther away. (Chart: relative performance improvement, ~1980-2000, CPU high, disk low; Patterson, CACM, 2004)
  8. A. C. Bauer et al., EuroVis 2016. Titan supercomputer
  9. (Chart: compute grows from 10 to 180 (×18), while I/O grows only from 0.3 to 1 (×3).)
  13. Exascale climate goal: ensembles of 1 km models at 15 simulated years per 24 hours. Full state once per model day → 260 TB every 16 seconds → 1.4 EB/day
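A quick back-of-envelope check of this slide's data-rate arithmetic, as a short Python sketch (all numbers are taken from the slide):

    SECONDS_PER_DAY = 24 * 60 * 60        # 86,400 s
    snapshot_tb = 260                     # TB per full-state dump (from the slide)
    dump_interval_s = 16                  # one dump every 16 seconds (from the slide)
    tb_per_day = snapshot_tb * SECONDS_PER_DAY / dump_interval_s
    print(f"{tb_per_day:,.0f} TB/day = {tb_per_day / 1e6:.1f} EB/day")   # 1,404,000 TB/day ≈ 1.4 EB/day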
  14. Model selection in deep learning: evaluate 1M alternative models, each with 100M parameters → 10^14 parameter values. https://de.mathworks.com/company/newsletters/articles/cancer-diagnostics-with-deep-learning-and-photonic-time-stretch.html
  15. Real-time analysis and experimental steering • Current experimental protocols typically process and validate data only after an experiment has completed, which can lead to undetected errors and prevents online steering • We built an autonomous stream processing system that allows data streamed from beamline computers to be processed in real time on a remote supercomputer, with a control feedback loop used to make decisions during experimentation • The system has been tested in a real-world setting at the TXM beamline (32-ID@APS) while performing a cement wetting experiment (2 experiments, each with 8 hours of data acquisition time). (Figures: sustained projections/second vs. circular buffer size and reconstruction frequency; image quality (similarity score) vs. number of streamed projections; reconstructed image sequence.) Tekin Bicer et al., eScience 2017
  16. Other examples • Materials science • Billion-atom atomistic simulations with femtosecond time steps • Simulations may run for simulated seconds • Want to study vibrational responses at 10s of femtoseconds • Fusion science • Full-device simulations may generate 100 PB • Need to reduce 1000:1 for effective output • Eventual goal is real-time response during fusion experiments
  17. HPC applications: Synopsis. (Diagram: applications arranged by single vs. multiple programs and offline vs. online analysis: simulation + analysis, multiple simulations, multiple simulations + analyses, and many-task applications, which may be reliable or unreliable, loosely or tightly coupled, static or dynamic.) New challenges: efficient logistics! • “Amateurs talk strategy while professionals study logistics” – Robert Barrow • “The line between disorder and order lies in logistics...” – Sun Tzu
  18. The need for online data analysis and reduction. Traditional approach: simulate, output, analyze. Write simulation output to secondary storage; read back for analysis. Decimate in time when the simulation output rate exceeds the output rate of the computer. Online: y = F(x). Offline: a = A(y), b = B(y), …
  19. The need for online data analysis and reduction. Traditional approach: simulate, output, analyze. Write simulation output to secondary storage; read back for analysis. Decimate in time when the simulation output rate exceeds the output rate of the computer. Online: y = F(x). Offline: a = A(y), b = B(y), … New approach: online data analysis & reduction. Co-optimize simulation, analysis, and reduction for performance and information output. Substitute CPU cycles for I/O, via data (de)compression and/or online data analysis. (a) Online: a = A(F(x)), b = B(F(x)), … (b) Online: r = R(F(x)); Offline: a = A'(r), b = B'(r), or a = A(U(r)), b = B(U(r)) [R = reduce, U = un-reduce]
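To make the structure of these variants concrete, here is a minimal Python sketch. F, A, B, R, and U are hypothetical stand-ins (a toy simulation, two toy analyses, a crude subsampling reduction, and its approximate inverse); only the dataflow pattern matters.

    import numpy as np

    def F(x):                      # stand-in "simulation"
        return np.sin(x)

    def A(y):                      # stand-in analysis a
        return float(y.mean())

    def B(y):                      # stand-in analysis b
        return float(y.max())

    def R(y, k=8):                 # crude reduction: keep every k-th value
        return y[::k], k

    def U(reduced):                # approximate reconstruction ("un-reduce")
        y_sub, k = reduced
        return np.repeat(y_sub, k)

    x = np.linspace(0.0, 10.0, 1_000_000)

    # Traditional: persist y = F(x), analyze offline.
    y = F(x)
    a1, b1 = A(y), B(y)

    # Online (a): run the analyses in situ; the full y is never written out.
    a2, b2 = A(F(x)), B(F(x))

    # Online (b): reduce in situ, store only r, reconstruct approximately offline.
    r = R(F(x))
    y_hat = U(r)
    a3, b3 = A(y_hat), B(y_hat)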
  20. But reduction comes with challenges • Handling high entropy • Performance: no benefit otherwise • Not only errors in the variable itself: E ≡ f − f̂ • Must also consider the impact on derived quantities: E ≡ g(f(x, t)) − g(f̂(x, t)) S. Klasky
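A minimal Python illustration of why derived quantities matter, using rounding as a stand-in for lossy reduction and a finite-difference derivative as the derived quantity g (an illustration only, not CODAR's actual methodology):

    import numpy as np

    x = np.linspace(0.0, 2.0 * np.pi, 10_000)
    f = np.sin(5.0 * x)
    f_hat = np.round(f, 2)                   # stand-in lossy reduction: max pointwise error 0.005

    def g(field):
        return np.gradient(field, x)         # derived quantity: finite-difference derivative

    err_variable = np.max(np.abs(f - f_hat))
    err_derived = np.max(np.abs(g(f) - g(f_hat)))
    print(f"error in f: {err_variable:.1e}   error in g(f): {err_derived:.1e}")
    # Differentiation amplifies the high-frequency error introduced by the reduction,
    # so the derived-quantity error is orders of magnitude larger than the pointwise error.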
  21. Data reduction challenges. Key research challenge: how to manage the impact of errors on derived quantities? Where did it go??? S. Klasky
  22. CODAR: Center for Online Data Analysis and Reduction. A U.S. Department of Energy Exascale Computing Project codesign center. (Diagram: CODAR connects applications, data services, and exascale platforms.)
  23. Infrastructure – Matthew Wolf (Lead) • Cheetah: Bryce Allen, Kshitij Mehta, Tahsin Kurc, Li Tang • Savannah: Justin Wozniak, Manish Parashar, Philip Davis • Chimbuko: Abid Malik, Line Pouchard Data Reduction – Franck Cappello (Lead) • Multilevel: Mark Ainsworth, Ozan Tugluk, Jong Choi • Z-checker: Julie Bessac, Sheng Di Data Analysis – Shinjae Yoo (Lead) • Blobs: Tom Peterka, Hanqi Guo • Hierarchical: Stefan Wild, Wendy Di • Functional: George Ostrouchov • Visual Analytics: Klaus Mueller, Wei Xu Management – Ian Foster (Lead) • Scott Klasky • Kerstin Kleese van Dam • Todd Munson (Project Management)
  24. Cross-cutting research questions: What are the best data analysis and reduction algorithms for different application classes, in terms of speed, accuracy, and resource requirements? How can we implement those algorithms to achieve scalability and performance portability? What are the tradeoffs in data analysis accuracy, resource needs, and overall application performance between using various data reduction methods to reduce file size prior to offline data reconstruction and analysis vs. performing more online data analysis? How do these tradeoffs vary with hardware and software choices? How do we effectively orchestrate online data analysis and reduction to reduce associated overheads? How can hardware and software help with orchestration?
  25. Prototypical CODAR data analysis and reduction pipeline. (Diagram: a running simulation feeds the CODAR data API and runtime; CODAR data monitoring, data analysis (multivariate statistics, feature analysis, outlier detection), and data reduction (application-aware transforms, encodings, error calculation, refinement hints) operate on the stream; reduced output and reconstruction info pass through the I/O system to offline data analysis; the whole pipeline is informed by simulation knowledge: application, models, numerics, performance optimization, …)
  26. Overarching data reduction challenges • Understanding the science requires massive data reduction • How do we reduce • The time spent in reducing the data to knowledge? • The amount of data moved on the HPC platform? • The amount of data read from the storage system? • The amount of data stored in memory, on the storage system, and moved over the WAN? • Without removing the knowledge • Requires deep dives into application post-processing routines and simulations • Goal is to create both (a) co-design infrastructure and (b) reduction and analysis routines • General: e.g., reduce N bytes to M bytes, M << N • Motif-specific: e.g., finite difference mesh vs. particles vs. finite elements • Application-specific: e.g., reduced physics allows us to understand deltas
  27. HPC floating point compression • Current interest is with lossy algorithms, some of which use preprocessing • Lossless may achieve up to ~3x reduction • Compress each variable separately: ISABELA, SZ, ZFP, linear auditing, SVD, adaptive gradient methods • Compress several variables simultaneously: PCA, tensor decomposition, …
  28. Lossy compression with SZ. No existing compressor can reduce hard-to-compress datasets by more than a factor of 2. Objective 1: reduce hard-to-compress datasets by one order of magnitude. Objective 2: add user-required error controls (error bound, shape of error distribution, spectral behavior of error function, etc.). What we need to compress (bit map of 128 floating point numbers): random noise. (Example datasets: NCAR atmosphere simulation output (1.5 TB); WRF hurricane simulation output; Advanced Photon Source mouse brain data.) Franck Cappello
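SZ itself combines prediction, quantization, and lossless coding; the toy Python sketch below uses only uniform scalar quantization, but it shows how a user-specified absolute error bound can be guaranteed by construction:

    import numpy as np

    def quantize(data, abs_err):
        # Uniform scalar quantization with |x - x_hat| <= abs_err by construction.
        # A real compressor would then encode the small integer codes losslessly;
        # this toy skips that stage.
        step = 2.0 * abs_err
        return np.round(data / step).astype(np.int64), step

    def dequantize(codes, step):
        return codes * step

    data = np.random.randn(1_000_000)
    codes, step = quantize(data, abs_err=1e-3)
    recon = dequantize(codes, step)
    print("max pointwise error:", np.max(np.abs(data - recon)))   # <= 1e-3, the requested bound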
  29. Lossy compression: atmospheric simulation. (Figure: compression results on atmospheric simulation data, including the latest SZ.) Franck Cappello
  30. Characterizing compression error: error distribution, spectral behavior, Laplacian (derivatives), autocorrelation of errors, respect of error bounds, error propagation. (Figures: error amplitude vs. frequency; maximum and average compression error per variable for SZ and ZFP.) Franck Cappello
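Several of these diagnostics are straightforward to compute once the original and decompressed fields are both available; the sketch below applies a few of them to a synthetic field with synthetic errors (it is not tied to any particular compressor or tool):

    import numpy as np

    f = np.sin(np.linspace(0, 20 * np.pi, 100_000))                 # "original" field
    f_hat = f + np.random.uniform(-1e-3, 1e-3, size=f.shape)        # stand-in for the decompressed field
    err = f - f_hat

    max_err = np.max(np.abs(err))                                   # respect of the error bound
    mean_err, std_err = err.mean(), err.std()                       # error distribution moments
    acf1 = np.corrcoef(err[:-1], err[1:])[0, 1]                     # lag-1 autocorrelation of errors
    spectrum = np.abs(np.fft.rfft(err))                             # spectral behavior of the error
    print(max_err, mean_err, std_err, acf1, spectrum[:3])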
  31. Z-checker: Analysis of data reduction error • Community tool to enable comprehensive assessment of lossy data reduction error: • Collection of data quality criteria from applications • Community repository for datasets, reduction quality requirements, compression performance • Modular design enables contributed analysis modules (C and R) and format readers (ADIOS, HDF5, etc.) • Off-line/on-line parallel statistical, spectral, point-wise distortion analysis with static & dynamic visualization Franck Cappello, Julie Bessac, Sheng Di
  32. Science-driven decompositions • Information-theoretically derived methods like SZ, ISABELA, ZFP make for good generic capabilities • If scientists can provide additional details on how to determine features of interest, we can use those to drive further optimizations. E.g., if they can select: • Regions of high gradient • Regions near turbulent flow • Particles with velocities > two standard deviations • How can scientists help define features?
  33. Multilevel compression techniques. A hierarchical reduction scheme produces multiple levels of partial decompression of the data, so that users can work with reduced representations that require minimal storage whilst achieving the user-specified tolerance. (Figure: compression vs. user-specified tolerance.) Results for turbulence dataset: extremely large, inherently non-smooth, resistant to compression. Mark Ainsworth
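The sketch below illustrates the multilevel idea with a simple Haar-style hierarchy (a coarse approximation plus one detail band per level). It is not the multilevel algorithm referenced on the slide; it only shows how partial decompression trades storage for accuracy.

    import numpy as np

    def decompose(f, levels):
        # Split f into a coarse approximation plus one detail band per level.
        approx, details = f.astype(float), []
        for _ in range(levels):
            coarse = 0.5 * (approx[0::2] + approx[1::2])
            details.append(0.5 * (approx[0::2] - approx[1::2]))
            approx = coarse
        return approx, details

    def reconstruct(approx, details, keep_levels):
        # Rebuild using the coarse approximation plus only the `keep_levels`
        # coarsest detail bands; finer bands are treated as zero.
        out = approx.copy()
        for i, detail in enumerate(reversed(details)):       # coarsest band first
            d = detail if i < keep_levels else np.zeros_like(detail)
            finer = np.empty(2 * out.size)
            finer[0::2] = out + d
            finer[1::2] = out - d
            out = finer
        return out

    f = np.sin(np.linspace(0, 8 * np.pi, 1024)) + 0.05 * np.random.randn(1024)
    approx, details = decompose(f, levels=5)
    for k in range(6):
        err = np.max(np.abs(f - reconstruct(approx, details, keep_levels=k)))
        print(f"{k} detail bands used -> max error {err:.4f}")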
  34. Manifold learning for change detection and adaptive sampling. Low-dimensional manifold projection of different states of MD trajectories • A single molecular dynamics trajectory can generate 32 PB • Use online data analysis to detect relevant or significant events • Project MD trajectories over time into a two-dimensional manifold space (dimensionality reduction) • Change detection in manifold space is more robust than in the original full coordinate space, as it removes local vibrational noise • Apply an adaptive sampling strategy based on accumulated changes in trajectories Shinjae Yoo
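A toy version of the change-detection idea, with PCA standing in for the manifold projection and a synthetic two-state trajectory standing in for MD data; the sizes, the injected event, and the threshold are all illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    n_frames, n_coords = 2000, 300                  # frames x flattened atomic coordinates (toy sizes)

    # Synthetic "trajectory": from frame 1000 on, ten coordinates shift to a new basin.
    shift = np.zeros(n_coords)
    shift[:10] = 4.0
    traj = rng.normal(size=(n_frames, n_coords))
    traj[1000:] += shift

    # Project frames into a 2-D space (PCA via SVD on the centered data).
    centered = traj - traj.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[:2].T                      # shape (n_frames, 2)

    # Change detection: unusually large jumps in the projected space flag candidate
    # events, which would trigger denser (adaptive) sampling around those frames.
    step = np.linalg.norm(np.diff(proj, axis=0), axis=1)
    threshold = step.mean() + 5.0 * step.std()
    print("candidate change points near frames:", np.where(step > threshold)[0])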
  35. Tracking blobs in XGC fusion simulations: a method to extract, track, and visualize blobs in large-scale 5D gyrokinetic Tokamak simulations. Blobs, regions of high turbulence, can run along the edge wall down toward the divertor and damage it. Blob extraction and tracking enables the exploration and analysis of high-energy blobs across timesteps. Our new visualizations will help scientists understand the behavior of blob dynamics in greater detail than previously possible. Research details: • Access data with high-performance ADIOS I/O • Precondition the input data with robust PCA • Detect blobs as local extrema with topology analysis • Track blobs over time with the combinatorial feature flow field method. (Figures: critical points extracted with topology analysis; data preconditioning with robust PCA; tracking graph that visualizes the dynamics of blobs (birth, merge, split, and death) over time.) Hanqi Guo, Tom Peterka
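The real workflow combines robust PCA preconditioning, topological critical-point detection, and combinatorial feature flow fields; the Python sketch below only illustrates the detect-then-track structure on a synthetic 2-D field (local maxima above a threshold, linked across timesteps by nearest neighbor):

    import numpy as np
    from scipy.ndimage import maximum_filter

    def detect_blobs(field, threshold):
        # Local maxima of a 2-D field above a threshold, as (row, col) indices.
        peaks = (field == maximum_filter(field, size=3)) & (field > threshold)
        return np.argwhere(peaks)

    def track(blobs_t0, blobs_t1, max_dist=5.0):
        # Link each blob at t0 to its nearest blob at t1, if it is close enough.
        links = []
        for i, p in enumerate(blobs_t0):
            d = np.linalg.norm(blobs_t1 - p, axis=1)
            j = int(np.argmin(d))
            if d[j] <= max_dist:
                links.append((i, j))
        return links

    # Two synthetic timesteps with two Gaussian "blobs" drifting slightly.
    y, x = np.mgrid[0:128, 0:128]

    def blob(cx, cy):
        return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / 50.0)

    field_t0 = blob(40, 40) + blob(90, 70)
    field_t1 = blob(43, 41) + blob(92, 72)
    print(track(detect_blobs(field_t0, 0.5), detect_blobs(field_t1, 0.5)))   # [(0, 0), (1, 1)]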
  36. Reduction for visualization. “An extreme scale simulation … calculates temperature and density over 1000 time steps. For both variables, a scientist would like to visualize 10 isosurface values and X, Y, and Z cut planes for 10 locations in each dimension. One hundred different camera positions are also selected, in a hemisphere above the dataset pointing towards the data set. We will run the in situ image acquisition for every time step. These parameters will produce: 2 variables x 1000 time steps x (10 isosurface values + 3 x 10 cut planes) x 100 camera positions x 3 images (depth, float, lighting) = 2.4 x 10^7 images.” J. Ahrens et al., SC’14. 10^3 time steps x 10^15 B state per time step = 10^18 B; 2.4 x 10^7 images x 1 MB/image (megapixel, 4 B) = 2.4 x 10^13 B
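Reproducing the image-count arithmetic quoted on the slide, and comparing the resulting image archive with the raw state it replaces:

    variables, timesteps = 2, 1000
    views = 10 + 3 * 10                    # 10 isosurface values + 10 cut planes along each of X, Y, Z
    cameras, images_per_view = 100, 3      # 100 camera positions; depth, float, lighting images
    n_images = variables * timesteps * views * cameras * images_per_view
    print(f"{n_images:.1e} images")        # 2.4e+07

    raw_state = timesteps * 10**15         # 10^3 steps x 10^15 B of state per step = 10^18 B
    image_bytes = n_images * 10**6         # at ~1 MB per image
    print(f"raw state {raw_state:.1e} B vs. rendered images {image_bytes:.1e} B")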
  37. Fusion whole device model: XGC coupled to GENE through an interpolator, generating 100+ PB. Data rates: PB/day on Titan today, 10+ PB/day in the future; 10 TB/day on Titan today, 100+ TB/day in the future. Multiple analyses each read 10-100 PB per analysis. http://bit.ly/2fcyznK
  38. Fusion whole device model (continued). (Diagram: XGC and GENE coupled through an interpolator, each followed by a reduction step; XGC and GENE visualizations and outputs feed a comparative visualization; data is staged across NVRAM, PFS, and tape.) http://bit.ly/2fcyznK
  39. Fusion whole device model: integrates multiple technologies • ADIOS staging (DataSpaces) for coupling • Sirius (ADIOS + Ceph) for storage • ZFP, SZ, Dogstar for reduction • VTK-m services for visualization • TAU for instrumenting the code • Cheetah + Savanna to test the different configurations (same node, different node, hybrid combination) to determine where to place the different services • Flexpath for staged write from XGC to storage • Ceph + ADIOS to manage the storage hierarchy • Swift for workflow automation. (Diagram: the XGC/GENE pipeline of the previous slide, instrumented with TAU and a performance visualization; Cheetah + Savanna drive codesign experiments.)
  40. Co-design experiment architecture. Savannah: Swift workflows coupled with ADIOS; multi-node workflow components (science app, reduce, analysis, Z-Checker) communicate application data over ADIOS. Cheetah: experiment configuration and dispatch, job launch, user monitoring and control of multiple pipeline instances, and storage of experiment metadata. A CODAR campaign definition drives the experiments; Chimbuko captures co-design performance data, alongside other co-design output (e.g., Z-Checker).
  41. Transformation layer • Designed for data conversions, compression, and transformation • zlib, bzip2, szip, ISOBAR, ALACRITY, FastBit • Can transform local data on each processor • Transparent for users: user code reads/writes the original untransformed data • Applications: compressed output, automatically indexed data, local data reorganization, data reduction • Released in ADIOS 1.6 in 2013 with compression transformations. (Diagram: the user application writes variables through the ADIOS data transform layer, where write/read transform plugins convert between regular and transformed variables before the I/O transport layer stores them in a BP file, staging area, etc.)
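A minimal, generic illustration of the transform-layer idea (the application reads and writes plain arrays while compression happens transparently underneath). This sketch uses zlib on NumPy buffers and is not the ADIOS transform plugin API:

    import zlib
    import numpy as np

    class CompressedStore:
        # Stand-in for an I/O layer with a transparent compression transform.
        def __init__(self):
            self._blobs = {}

        def write(self, name, array):
            payload = zlib.compress(array.tobytes(), 6)
            self._blobs[name] = (payload, array.dtype.str, array.shape)

        def read(self, name):
            payload, dtype, shape = self._blobs[name]
            return np.frombuffer(zlib.decompress(payload), dtype=dtype).reshape(shape)

    store = CompressedStore()
    a = np.zeros((1024, 1024))                   # highly compressible example data
    store.write("variable_A", a)
    assert np.array_equal(a, store.read("variable_A"))   # user code sees the original, untransformed data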
  42. Codesign questions to be addressed • How can we couple multiple codes? Files, staging on the same node, different nodes, synchronous, asynchronous? • How can we test different placement strategies for memory and performance optimization? • What are the best reduction technologies to allow us to capture all relevant information during a simulation, e.g., performance vs. accuracy? • How can we create visualization services that work on the different architectures and use the data models in the codes? • How do we manage data across storage hierarchies?
  43. CODAR summary • Infrastructure development and deployment • Enable rapid composition of application and “data services” (data reduction methods, data analysis methods, etc.) • Support CODAR-developed and other data services • Method development: new reduction & analysis routines • Motif-specific: e.g., finite difference mesh vs. particles vs. finite elements • Application-specific: e.g., reduced physics to understand deltas • Application engagement • Understand data analysis and reduction requirements • Integrate, deploy, evaluate impact https://codarcode.github.io codar-info@cels.anl.gov
  44. Dramatic changes in HPC system geography … are driving new application structures … resulting in exciting new computer science challenges. Thanks to the US Department of Energy and the CODAR team
