Toward Real-Time Analysis of Large Data Volumes for Diffraction Studies by Martin Kunz, LBNL
Talk at the EarthCube End-User Domain Workshop for Rock Deformation and Mineral Physics Research.

By Martin Kunz, Lawrence Berkeley National Laboratory

Speaker Notes

  • I would like to start off by giving a brief, slightly personal historic perspective on the application of X-rays in mineral physics research. X-rays have been applied in the Earth sciences on a routine basis for about 50 years, so this story pretty much parallels my life. In the 1960s and 1970s, when I was just learning how to spell X-ray, the first automated diffractometers replaced fully manual film techniques. The brightness of the X-ray sources available in those days limited a powder or single-crystal data collection to days or weeks.
  • This changed most dramatically with the advent of dedicated light sources, in particular high-energy third-generation sources such as the ESRF in Grenoble, home of the first dedicated mineral physics beamline, ID30. By then I had managed to spell X-ray and was fortunate enough to be involved in the early days of that beamline. The brilliance of the ID30 undulator enabled experiments through a diamond anvil cell to be performed in a matter of seconds. However, each data point required the physical transport of a 1 x 1 ft image plate to the one and only image-plate reader on the floor, plus a read-out time of about 45 minutes. Sadly, the tremendous increase in brightness and flux of the X-ray sources could therefore be exploited only in a limited way.
  • Another twenty years later (the age-appropriate number of light sources meanwhile no longer fits on my birthday cake) we hail the advent of ultra-fast, ultra-low-noise direct-detection X-ray detectors such as the Perkin-Elmer or Pilatus, which in principle allow data-point rates of up to 30 Hz. This makes very large data rates possible. However, our ability to deal with these data is still largely at the level of high-end desktops and serial workflow software. The opportunity offered by the combination of ever brighter light sources and fast detectors, i.e. applying big-data methods to mineral physics research, therefore cannot be fully harnessed.
  • The way out of this bottleneck is to automate and parallelize the analysis workflow using, at least for the time being, massively parallel supercomputers. This is the approach we are presently taking at the Advanced Light Source in collaboration with the National Energy Research Scientific Computing Center (NERSC).
  • Let me quickly give you three examples of the order of magnitude of the data rates we have to deal with. First, intense X-rays and fast detectors, coupled with programmable temperature and pressure changes, allow a much denser coverage of the P-V-T surface and thus a much better description of the thermo-elastic properties of Earth materials and their phase transitions (see the equation-of-state sketch below).
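
    As a concrete illustration of what such a dense P-V-T data set is fitted to: the talk does not name a particular equation of state, but a common choice in mineral physics is the third-order Birch-Murnaghan isothermal equation of state, here with a simple thermal-pressure term added as an assumption:

    ```latex
    % Third-order Birch-Murnaghan EOS with a simple thermal-pressure term.
    % V_0, K_0, K_0' are the ambient volume, bulk modulus, and its pressure
    % derivative; alpha is the thermal expansion coefficient.
    P(V,T) = \frac{3}{2} K_0
             \left[ \left(\frac{V_0}{V}\right)^{7/3} - \left(\frac{V_0}{V}\right)^{5/3} \right]
             \left\{ 1 + \frac{3}{4}\left(K_0' - 4\right)
             \left[ \left(\frac{V_0}{V}\right)^{2/3} - 1 \right] \right\}
             + \alpha K_0 \left( T - T_0 \right)
    ```

    Denser coverage of the P-V-T surface constrains all of these parameters (and their cross-derivatives) far better than a handful of isothermal compression points can.
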
  • Second, mineral physics experiments involving very high temperatures and pressures invariably force us to deal with large spatial and temporal gradients of pressure, temperature and chemical composition. High spatial or temporal resolution is therefore needed to explore these inhomogeneities. Fast detectors and bright X-rays allow us to collect spatially and/or temporally highly resolved maps of our samples.
  • Third, going beyond diffraction, various flavors of tomographic techniques now allow us to create three-dimensional images of samples in and ex situ, if needed even with chemical or phase selectivity. Such experiments …
  • This solution works fairly well with medium-sized datasets of up to 10,000 frames. With larger data volumes and/or tricky data, the analysis can take much longer than the data collection itself, even on a 48-CPU cluster.

Presentation Transcript

  • Towards real-time analysis of large data volumes for synchrotron experiments. Martin Kunz, Nobumichi Tamura, Advanced Light Source, Lawrence Berkeley National Lab.
  • Acknowledgements: Jack Deslippe, David Skinner (NERSC); Abdelilah Essiari, Craig E. Tull (LBNL-CRD); Eli Dart (ESnet); Dula Parkinson (LBNL-ALS).
  • X-rays and Earth sciences: the story of a moving bottleneck. 1960s / 1970s: X-ray source → X-ray detectors → data analysis → publication. [Photo: Henry Levy with a Picker 5-circle diffractometer and a PDP-5.]
  • 1980s / 1990s: X-ray source → X-ray detectors → data analysis → publication. [Photo: 1995, "MD Storm" image-plate reader; read-out time 45 minutes.]
  • 2000s / 2010s: X-ray source → X-ray detectors → data analysis → publication.
  • Future: X-ray source → X-ray detectors → data analysis → publication, with interactive access to supercomputers at the data-analysis stage.
  • Examples of mineral-physics-related experiments with high data rates. 1) In situ powder diffraction with automated P-T stepping: ALS BL 12.2.2 with Perkin Elmer detector (~0 read-out delay); image source: http://www.ltp-oldenburg.de. Data rate on the order of 1000s of frames per day, i.e. 10s of GB/day (a back-of-envelope check follows below).
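
    A quick sanity check on that figure. The frame size is an assumption, not stated in the talk: a Perkin-Elmer flat-panel detector delivers 2048 × 2048 pixels at 16 bits per pixel, i.e. roughly 8 MB per frame, so

    ```latex
    % Assumed frame size: 2048 \times 2048 \text{ px} \times 2 \text{ B} \approx 8 \text{ MB}
    5000\ \tfrac{\text{frames}}{\text{day}} \times 8\ \tfrac{\text{MB}}{\text{frame}}
      \approx 40\ \tfrac{\text{GB}}{\text{day}},
    ```

    consistent with the quoted 10s of GB/day.
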
  • 2) Micro-diffraction: phase, orientation and strain mapping at high spatial resolution. Micro-diffraction set-up at ALS beamline 12.3.2 with Pilatus-1M detector. [Left: distribution of Re3N (black) and Re (blue) grown in a laser-heated DAC. Right: relative orientation of Re3N grains. Source: Friedrich et al. (2010), PRL 105, 085504.] Data rate on the order of 10,000s of frames per day, i.e. 100s of GB/day.
  • 3) Tomography: 3D mapping of geo-materials. Tomography set-up at ALS beamline 8.3.2 (X-rays → scintillator). [Left: supercritical CO2 penetrating sandstone on ALS BL 8.3.2, courtesy J. Ajo-Franklin. Right: distribution of Fe-alloy melt prepared at 64 GPa, measured at SSRL; Shi et al. (2013), Nature Geoscience, DOI: 10.1038/NGEO1956.] Data rate on the order of 100,000s of frames per day, i.e. TBs/day.
  • How do we tackle this at the ALS? 1) Not-quite-real-time: a local cluster for micro-diffraction analysis. 24 dual-socket AMD Opteron 248 2.2 GHz nodes (48 CPUs); 48 GB aggregate memory; 14 TB shared disk storage; Gigabit Ethernet interconnect; 212 GFLOPS (theoretical peak).
  • Step 1) The user manually tunes the analysis parameters on a few 'typical' patterns.
  • The analysis parameters are then written into an instruction file.
  • Step 2) A parsing script is launched: it reads the instruction file and distributes the data files across the available CPUs, writes the batch files that manage the individual CPUs, and launches the analysis software on each node (a minimal sketch of such a script follows below).
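
    The talk does not show the script itself; the following is a minimal sketch of the workflow it describes. File names and the `analyze_frames` executable are hypothetical placeholders.

    ```python
    #!/usr/bin/env python
    """Sketch of the parse-and-launch step: split the frame list across the
    CPUs, write one batch file per CPU, and launch the analysis on each."""
    import glob
    import subprocess

    N_CPUS = 48                           # size of the local Opteron cluster
    INSTRUCTION_FILE = "analysis.params"  # parameters tuned on 'typical' patterns
    FRAMES = sorted(glob.glob("data/*.tif"))

    # Round-robin the frames onto the available CPUs.
    chunks = [FRAMES[i::N_CPUS] for i in range(N_CPUS)]

    procs = []
    for cpu, chunk in enumerate(chunks):
        if not chunk:
            continue
        # Each batch file tells one CPU which frames to process.
        batch = f"batch_{cpu:02d}.txt"
        with open(batch, "w") as fh:
            fh.write("\n".join(chunk))
        # Launch the (hypothetical) analysis executable on this chunk.
        procs.append(subprocess.Popen(
            ["analyze_frames", "--params", INSTRUCTION_FILE, "--frames", batch]))

    # Wait for all nodes to finish before merging results into a single file.
    for p in procs:
        p.wait()
    ```
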
  • Step 3) The results are written into a single file which can be viewed, further analyzed and published. Relative lattice orientation gives the domain structure (the full color range from blue to red corresponds to a 4-degree rotation); average intensity gives the high-resolution fine structure of the grain.
  • 2) Real time: a collaboration with the National Energy Research Scientific Computing Center (NERSC), in development. Step 1) Data are sent directly to NERSC for analysis and storage during data collection. The data are packaged as follows: after every n images a 'trigger file' is deposited in a directory that is monitored; a SPADE web app wraps the data (512 files at a time) in HDF5 (hierarchical data format) and ships them to NERSC over a Gigabit line (to be upgraded to a 10G line); at NERSC the data are received by a SPADE instance, which places them in a target folder and on tape and sends back an acknowledgment. (A sketch of the trigger-and-package step follows below.)
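
    A minimal sketch of the beamline-side trigger-and-package step. SPADE's actual interface is not shown in the talk, so the handoff below is only a placeholder comment; directory names, the TIFF frame format, and the poll interval are assumptions.

    ```python
    #!/usr/bin/env python
    """Watch for trigger files, bundle frames into HDF5, hand off to transfer."""
    import glob
    import os
    import time

    import h5py            # HDF5 ("hierarchical data format") bindings
    import numpy as np
    from PIL import Image  # frame reader; TIFF frames are an assumption

    WATCH_DIR = "spool"    # where the detector deposits frames and triggers
    SENT_DIR = "sent"      # packaged frames are moved here
    BUNDLE_SIZE = 512      # files per HDF5 container, as stated in the talk

    os.makedirs(SENT_DIR, exist_ok=True)

    def package(frames, out_path):
        """Wrap one batch of frame files into a single HDF5 container."""
        with h5py.File(out_path, "w") as h5:
            for f in frames:
                h5.create_dataset(os.path.basename(f),
                                  data=np.asarray(Image.open(f)),
                                  compression="gzip")

    bundle_no = 0
    while True:
        triggers = glob.glob(os.path.join(WATCH_DIR, "*.trigger"))
        if triggers:  # a trigger file signals that n new images are complete
            frames = sorted(glob.glob(os.path.join(WATCH_DIR, "*.tif")))
            while len(frames) >= BUNDLE_SIZE:
                batch, frames = frames[:BUNDLE_SIZE], frames[BUNDLE_SIZE:]
                package(batch, f"bundle_{bundle_no:05d}.h5")
                bundle_no += 1
                # Handoff to SPADE would happen here, e.g. by dropping the
                # bundle into the transfer service's outbound directory.
                for f in batch:
                    os.rename(f, os.path.join(SENT_DIR, os.path.basename(f)))
            for t in triggers:
                os.remove(t)
        time.sleep(5)  # simple polling; the real system monitors the directory
    ```
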
  • Status: the transfer pipeline is up and running; transfer control is web-based.
  • Step 2) Analysis parameters are set up with a web app (under development).
  • Jobs are launched manually by the user via the same web page. Test runs indicate an analysis time of the order of the data collection time, so the analysis can in principle run synchronously with data collection.
  • Step 3) Analysis jobs are executed on Carver (under development). Carver is an IBM iDataPlex cluster: 1202 nodes with a total of 9984 processor cores; 106 Tflop/s peak performance; the largest allocated parallel job is 512 cores. (A hedged job-submission sketch follows below.)
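
    For orientation, this is roughly what launching such a job looks like on a PBS-style batch system like the one Carver used. This is an illustrative sketch, not the project's actual submission code: the queue name, resource numbers, and the `analyze_frames` executable are assumptions.

    ```python
    #!/usr/bin/env python
    """Write a PBS batch script and submit it with qsub."""
    import subprocess

    PBS_SCRIPT = """#PBS -q regular
    #PBS -l nodes=16:ppn=8
    #PBS -l walltime=02:00:00
    #PBS -N microdiffraction
    cd $PBS_O_WORKDIR
    # mpirun spreads the (hypothetical) analysis code over 128 cores.
    mpirun -np 128 ./analyze_frames --params analysis.params --frames bundle_00000.h5
    """

    with open("job.pbs", "w") as fh:
        fh.write(PBS_SCRIPT)

    # qsub prints the job id on stdout; the web app would track it from here.
    job_id = subprocess.check_output(["qsub", "job.pbs"]).decode().strip()
    print("submitted", job_id)
    ```
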
  • Summary:
    - Data analysis is the new bottleneck limiting progress in many areas of experimental mineral physics.
    - Real-time analysis with immediate feedback is increasingly important in experimental mineral physics.
    - These challenges cannot always be met with traditional desktop machines: software has to be automated and parallelized, and collaboration with supercomputing centers is becoming important for experimental scientists too (at least for a few more iterations of Moore's law).
    - Data analysis on supercomputers, remotely controlled through web applications, is a very promising avenue and allows big-data methods to enter mineral physics.
    - Future developments may (must?) move away from supercomputers toward highly parallel local machines (GPUs) and/or cloud computing.