Talk at the EarthCube End-User Domain Workshop for Rock Deformation and Mineral Physics Research.
By Martin Kunz, Lawrence Berkeley National Laboratory
Welcome & Workshop Objectives: Introduction to COMPRES by Jay Bass, University of Illinois at Urbana-Champaign (EarthCube)
Talk at the EarthCube End-User Domain Workshop for Rock Deformation and Mineral Physics Research.
By Jay Bass, University of Illinois at Urbana-Champaign
The Pacific Research Platform: A Regional-Scale Big Data Analytics Cyberinfrastructure, by Larry Smarr
National Ocean Exploration Forum 2017
Ocean Exploration in a Sea of Data
Calit2’s Qualcomm Institute
University of California, San Diego
October 21, 2017
The Pacific Research Platform (PRP) aims to achieve transparent and rapid data access among collaborating scientists at multiple institutions through an integrated implementation of data-focused networking that extends the university campus Science DMZ model to a regional, national, and, eventually, a global scale.
PRP researchers are routinely achieving high-performance end-to-end networking from their labs to their collaborators’ labs and data centers, traversing multiple, heterogeneous Science DMZs and wide-area networks connecting multiple campus gateways, enabling researchers across the partnership to transfer data over dedicated optical lightpaths at speeds from 10Gb/s to 100Gb/s.
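As a sanity check on those line rates, ideal transfer time is simply dataset size over bandwidth. The helper below is illustrative and not from the talk; it ignores protocol overhead and disk limits.

```python
def transfer_time_seconds(size_bytes: float, rate_gbps: float) -> float:
    """Ideal transfer time at line rate, ignoring protocol and disk overhead."""
    bits = size_bytes * 8
    return bits / (rate_gbps * 1e9)

# A 1 TB dataset over a 100 Gb/s lightpath takes 80 seconds at line rate;
# the same transfer at 10 Gb/s takes ten times as long.
print(transfer_time_seconds(1e12, 100))  # 80.0
print(transfer_time_seconds(1e12, 10))   # 800.0
```

In practice, sustained disk-to-disk rates fall below these ideals, which is why the PRP pairs the lightpaths with tuned data transfer nodes.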
Cyberinfrastructure to Support Ocean Observatories, by Larry Smarr
05.03.18
Invited Talk to the Ocean Studies Board
National Research Council
Title: Cyberinfrastructure to Support Ocean Observatories
University of California San Diego
CENIC: Pacific Wave and PRP Update: Big News for Big Data, by Larry Smarr
The document discusses the Pacific Wave exchange and Pacific Research Platform (PRP). It provides an overview of Pacific Wave, including its history and connectivity across the Pacific and western US. It then discusses how the PRP will build on infrastructure projects to create a high-speed "big data freeway" for science across California universities. This will allow researchers to more easily share and analyze large datasets for projects in areas like climate modeling, cancer genomics, astronomy and particle physics. Details are provided on specific science applications and datasets that will benefit from the enhanced connectivity of the PRP.
Creating a Big Data Machine Learning Platform in California, by Larry Smarr
Big Data Tech Forum: Big Data Enabling Technologies and Applications
San Diego Chinese American Science and Engineering Association (SDCASEA)
Sanford Consortium
La Jolla, CA
December 2, 2017
Using the Pacific Research Platform for Earth Sciences Big Data, by Larry Smarr
Grand Challenge Lecture
Big Data and the Earth Sciences: Grand Challenges Workshop
Calit2’s Qualcomm Institute
University of California, San Diego
May 31, 2017
Opening Keynote Lecture
15th Annual ON*VECTOR International Photonics Workshop
Calit2’s Qualcomm Institute
University of California, San Diego
February 29, 2016
GeoCENS Source Talk: Results from an Atlantic Rainforest Micrometeorology Sen..., by Cybera Inc.
Rob Fatland gave this presentation to the GeoCENS SSC Workshop on the current efforts, projects, and tools towards advancing environmental science in Banff, AB, September 23, 2010.
Reusable Software and Open Data To Optimize Agriculture, by David LeBauer
Abstract:
Humans need a secure and sustainable food supply, and science can help. We have an opportunity to transform agriculture by combining knowledge of organisms and ecosystems to engineer ecosystems that sustainably produce food, fuel, and other services. The challenge is that the information we have is difficult to combine: measurements, theories, and laws are scattered across publications, notebooks, software, and human brains. We homogenize, encode, and automate the synthesis of data and mechanistic understanding in a way that links understanding at different scales and across domains. This allows extrapolation, prediction, and assessment. Reusable components allow automated construction of new knowledge that can be used to assess, predict, and optimize agro-ecosystems.
Developing reusable software and open-access databases is hard, and examples will illustrate how we use the Predictive Ecosystem Analyzer (PEcAn, pecanproject.org), the Biofuel Ecophysiological Traits and Yields database (BETYdb, betydb.org), and ecophysiological crop models to predict crop yield, decide which crops to plant, and identify which traits can be selected for the next generation of data-driven crop improvement. A next step is to automate the use of sensors mounted on robots, drones, and tractors to assess plants in the field. The TERRA Reference Phenotyping Platform (TERRA-Ref, terraref.github.io) will provide an open-access database and computing platform on which researchers can use and develop tools that use sensor data to assess and manage agricultural and other terrestrial ecosystems.
TERRA-Ref will adopt existing standards and develop modular software components and common interfaces, in collaboration with researchers from iPlant, NEON, AgMIP, USDA, rOpenSci, ARPA-E, and many scientists and industry partners. Our goal is to advance science by enabling efficient use, reuse, exchange, and creation of knowledge.
---
Invited talk for the "Informatics for Reproducibility in Earth and Environmental Science Research" session at the American Geophysical Union Fall Meeting, Dec 17 2015.
In this video from the HPC User Forum at Argonne, Dr. Brett Bode from NCSA presents: Research on Blue Waters.
"Blue Waters is one of the most powerful supercomputers in the world and is one of the fastest supercomputers on a university campus. Scientists and engineers across the country use the computing and data power of Blue Waters to tackle a wide range of challenging problems, from predicting the behavior of complex biological systems to simulating the evolution of the cosmos."
Watch the video: https://wp.me/p3RLHQ-kYx
Learn more: http://www.ncsa.illinois.edu/enabling/bluewaters
and
http://hpcuserforum.com
ADASS XXV: LSST DM - Building the Data System for the Era of Petascale Optica..., by Mario Juric
The document discusses the Large Synoptic Survey Telescope (LSST) data management system. It describes how LSST will image the entire visible sky every few nights over 10 years, generating 5 petabytes of data per year. It outlines the LSST data system, which will process and archive the data, producing catalogs and other data products that will be accessible to scientists. The ultimate goal is to transform the sky into a fully searchable database for astronomical research.
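To put 5 petabytes per year in perspective, the implied sustained ingest rate is roughly 160 MB/s around the clock. This is a rough back-of-the-envelope calculation, not a figure from the talk.

```python
# Average ingest rate implied by 5 PB/year (using 1 PB = 1e15 bytes).
PETABYTE = 1e15
SECONDS_PER_YEAR = 365.25 * 24 * 3600

rate_bytes_per_s = 5 * PETABYTE / SECONDS_PER_YEAR
print(f"{rate_bytes_per_s / 1e6:.0f} MB/s")  # ~158 MB/s sustained, every day for 10 years
```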
Applying Photonics to User Needs: The Application Challenge, by Larry Smarr
05.02.28
Invited Talk to the 4th Annual On*VECTOR International Photonics Workshop
Sponsored by NTT Network Innovation Laboratories
Title: Applying Photonics to User Needs: The Application Challenge
University of California, San Diego
AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound ..., by Mario Juric
The document discusses large sky surveys and how they are transforming astronomy into a software-driven field. It focuses on the Large Synoptic Survey Telescope (LSST) project, which will be the largest sky survey to date. Some key points:
- LSST will image the entire visible sky every few nights for 10 years, collecting enormous amounts of data on billions of objects.
- Processing and analyzing this data poses major computational challenges and requires new techniques for extracting science from massive catalogs and datasets.
- LSST aims to deliver real-time alerts of changing/transient objects, yearly source catalogs with positions/measurements of billions of objects, and deep co-added images.
LSST Solar System Science: MOPS Status, the Science, and Your Questions, by Mario Juric
1. The presentation summarized the status and plans for LSST's Moving Object Processing System (MOPS) and the expected science outcomes from LSST's solar system surveys.
2. MOPS development is ongoing, with plans to validate the system on data from the Zwicky Transient Facility in 2018 and integrate it into LSST's data processing system starting in 2019.
3. LSST is expected to discover over 600,000 new solar system objects and obtain high-precision light curves for millions of known objects over its 10-year mission.
The document provides an overview of the Pacific Research Platform (PRP) and discusses its role in connecting researchers across institutions and enabling new applications. It summarizes the PRP's key components like Science DMZs, Data Transfer Nodes (FIONAs), and use of Kubernetes for container management. Several examples are given of how the PRP facilitates high-performance distributed data analysis, access to remote supercomputers, and sensor networks coupled to real-time computing. Upcoming work on machine learning applications and expanding the PRP internationally is also outlined.
Peering The Pacific Research Platform With The Great Plains Network, by Larry Smarr
The Pacific Research Platform (PRP) connects research institutions across the western United States with high-speed networks to enable data-intensive science collaborations. Key points:
- The PRP connects 15 campuses across California and links to the Great Plains Network, allowing researchers to access remote supercomputers, share large datasets, and collaborate on projects like analyzing data from the Large Hadron Collider.
- The PRP utilizes Science DMZ architectures with dedicated data transfer nodes called FIONAs to achieve high-speed transfer of large files. Kubernetes is used to manage distributed storage and computing resources.
- Early applications include distributed climate modeling, wildfire science, plankton imaging, and cancer genomics.
LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks and High Resolution Visualizations, by Larry Smarr
05.02.04
Invited Talk to the NASA Jet Propulsion Laboratory
Title: LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks and High Resolution Visualizations
Pasadena, CA
This document discusses several projects related to connecting research institutions through high-speed networks:
1) The Pacific Research Platform connects campuses in California through a "big data superhighway" funded by NSF from 2015-2020.
2) CHASE-CI adds machine learning capabilities for researchers across 10 campuses in California using NSF-funded GPU resources.
3) A pilot project is using CENIC and Internet2 to connect regional research networks on a national scale, funded by NSF from 2018-2019.
Looking Back, Looking Forward: NSF CI Funding 1985-2025, by Larry Smarr
This document provides an overview of the development of national research platforms (NRPs) from 1985 to the present, with a focus on the Pacific Research Platform (PRP). It describes the evolution of the PRP from early NSF-funded supercomputing centers to today's distributed cyberinfrastructure utilizing optical networking, containers, Kubernetes, and distributed storage. The PRP now connects over 15 universities across the US and internationally to enable data-intensive science and machine learning applications across multiple domains. Going forward, the document discusses plans to further integrate regional networks and partner with new NSF-funded initiatives to develop the next generation of NRPs through 2025.
The Pacific Research Platform Enables Distributed Big-Data Machine-Learning, by Larry Smarr
The Pacific Research Platform enables distributed big data machine learning by connecting scientific instruments, sensors, and supercomputers across California and the United States with high-speed optical networks. Key components include FIONA data transfer nodes that allow fast disk-to-disk transfers near the theoretical maximum, Kubernetes to orchestrate distributed computing resources, and the Nautilus hypercluster which aggregates thousands of CPU cores and GPUs into a unified platform. This infrastructure has accelerated many scientific workflows and supported cutting-edge research in fields such as astronomy, oceanography, climate science, and particle physics.
Predictive analysis aims to forecast future events or outcomes based on available data. This type of analysis can help anticipate needs or problems before they occur. By analyzing patterns in existing data, predictive models try to identify risks and opportunities that may impact business decisions going forward.
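The core mechanic behind such predictive models is a scoring function whose parameters were fit to historical patterns and are then applied to new records. The toy sketch below is entirely illustrative: the churn-risk features and weights are invented, not taken from any source above.

```python
import math

# Hypothetical churn-risk model: weights assumed to have been fit offline
# on historical customer data (values here are made up for illustration).
WEIGHTS = {"days_since_last_order": 0.04, "support_tickets": 0.5, "bias": -2.0}

def churn_risk(days_since_last_order: int, support_tickets: int) -> float:
    """Logistic score in [0, 1]; higher means greater predicted churn risk."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["days_since_last_order"] * days_since_last_order
         + WEIGHTS["support_tickets"] * support_tickets)
    return 1 / (1 + math.exp(-z))

print(churn_risk(5, 0) < 0.5)    # recently active, no tickets: low risk -> True
print(churn_risk(90, 4) > 0.5)   # long inactive, many tickets: high risk -> True
```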
Fast Data: Achieving Real-Time Data Analysis Across the Financial Data Continuum, by VoltDB
In this webinar Marc Firenze, CTO of Eagle Investment Systems, and VoltDB will discuss the latest market and data-management trends and the growing need for real-time data consistency. He will also address an in-memory database architecture for high-performance, scale-out applications that doesn't sacrifice data guarantees, and how the combination of streaming analytics and a fast operational data store represents the future of next-generation data management services.
Streaming data analysis in real time is becoming the fastest and most efficient way to obtain useful knowledge from what is happening now, allowing organizations to react quickly when problems appear or to detect new trends that help improve their performance. Evolving data streams are contributing to the growth of data created over the last few years: we now create as much data every two days as was created from the dawn of time up to 2003. Evolving data stream methods are becoming a low-cost, green methodology for real-time online prediction and analysis. We discuss the current and future trends of mining evolving data streams, and the challenges the field will have to overcome in the coming years.
Non-interactive big-data analysis prohibits experimentation and can interrupt the analyst's train of thought, but analyzing and drawing insights in real time is no easy task, with jobs often taking minutes or hours to complete. What if you want to put an interactive interface in front of that data that allows iterative insights? What if you need that interactive experience to be sub-second?
Traditional SQL and most MPP/NoSQL databases cannot run complex calculations over large data in a performant manner. Popular distributed systems such as Hadoop or Spark can execute the jobs, but their job overhead prohibits sub-second response times. Learn how an in-memory computing framework enabled us to perform complex analysis jobs on massive numbers of data points with sub-second response times, allowing us to plug it into a simple drag-and-drop Web 2.0 interface.
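One common way to get sub-second interactivity, sketched here in plain Python with an in-memory dict standing in for the computing framework described above, is to maintain pre-aggregated rollups so each dashboard query becomes a lookup rather than a scan:

```python
from collections import defaultdict

# In-memory rollup: (region, product) -> running sales total.
rollup = defaultdict(float)

def ingest(event: dict) -> None:
    """Update the aggregate as each record arrives, amortizing the scan cost."""
    rollup[(event["region"], event["product"])] += event["amount"]

def query(region: str, product: str) -> float:
    """Dashboard query: an O(1) lookup instead of a full-table scan."""
    return rollup[(region, product)]

for e in [{"region": "west", "product": "a", "amount": 10.0},
          {"region": "west", "product": "a", "amount": 2.5},
          {"region": "east", "product": "a", "amount": 7.0}]:
    ingest(e)

print(query("west", "a"))  # 12.5
```

The trade-off is that the set of answerable queries must be chosen when the rollups are designed; real frameworks recompute or extend the aggregates as needs change.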
Presentation by Dr. Peter Bruce, Statistics.com. Presented on April 27, 2012 at the MRA Spring Research Symposium hosted by the Mid-Atlantic Chapter of the Marketing Research Association.
This document discusses real-time big data analytics from deployment to production. It covers:
1) Distilling raw data like log files and sensor streams into structured data using Hadoop for analytics.
2) Developing predictive models using techniques like decision trees, clustering, and ensembles on structured data.
3) Deploying models for real-time scoring via SQL, code, or PMML on either batch lookup tables or streaming data factors.
4) Scoring billions of predictions daily for applications like determining why customers buy products and attributing marketing channels.
5) Regularly refreshing models to incorporate new data and outcomes using techniques like exploratory analysis and time-to-event modeling.
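The deployment step above (scoring via code rather than SQL or PMML) amounts to a pure function mapped over each incoming record. The model, features, and scores in this sketch are invented for illustration only:

```python
# A decision-tree-like rule as exported from a hypothetical offline training run.
def score(record: dict) -> float:
    """Return a purchase-propensity score for one incoming event."""
    if record["visits_last_week"] >= 3:
        return 0.8 if record["cart_items"] > 0 else 0.6
    return 0.3 if record["cart_items"] > 0 else 0.1

# Streaming deployment: apply the scorer to events as they arrive.
stream = [
    {"user": "u1", "visits_last_week": 5, "cart_items": 2},
    {"user": "u2", "visits_last_week": 1, "cart_items": 0},
]
scores = {e["user"]: score(e) for e in stream}
print(scores)  # {'u1': 0.8, 'u2': 0.1}
```

Because the scorer is a side-effect-free function of one record, the same code can run in a batch lookup-table build or inside a stream processor.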
This document provides an overview of big data and real-time analytics, defining big data as high volume, high velocity, and high variety data that requires new technologies and techniques to capture, manage and process. It discusses the importance of big data, key technologies like Hadoop, use cases across various industries, and challenges in working with large and complex data sets. The presentation also reviews major players in big data technologies and analytics.
The document discusses predictive analytics and forecasting. It defines predictive analytics as producing predictive scores for each customer or organizational element, while forecasting provides aggregate estimates such as total sales. Prediction involves classifying outcomes like customer retention, while forecasting captures trends and seasonality. Predictive modeling creates statistical models of future behavior by collecting and analyzing data to predict outcomes. Common predictive algorithms include logistic regression, decision trees, naive Bayes, and clustering.
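That distinction can be made concrete: forecasting summarizes a series into one aggregate number, while prediction assigns a score to each individual. The numbers and scoring rule below are synthetic, purely to illustrate the contrast:

```python
# Forecasting: one aggregate estimate from a series, here a naive
# trend-adjusted moving average over recent monthly totals.
monthly_sales = [100, 110, 120, 130]
forecast_next = (sum(monthly_sales[-3:]) / 3
                 + (monthly_sales[-1] - monthly_sales[-2]))

# Prediction: one score per customer, from that customer's own features.
def retention_score(months_active: int, complaints: int) -> float:
    """Toy per-customer score clamped to [0, 1]; higher means likelier to stay."""
    return max(0.0, min(1.0, 0.5 + 0.05 * months_active - 0.2 * complaints))

print(forecast_next)           # 130.0 -- a single aggregate number
print(retention_score(12, 0))  # 1.0  -- loyal customer
print(retention_score(2, 3))   # 0.0  -- new customer with many complaints
```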
1. Real-time analytics of social networks can help companies detect new business opportunities by understanding customer needs and reactions in real-time.
2. MOA and SAMOA are frameworks for analyzing massive online and distributed data streams. MOA deals with evolving data streams using online learning algorithms. SAMOA provides a programming model for distributed, real-time machine learning on data streams.
3. Both tools allow companies to gain insights from social network and other real-time data to understand customers and react to opportunities.
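The core idea behind MOA-style online learning is that the model is updated from one labelled example at a time and never needs to revisit the history. The minimal online perceptron below illustrates that property; it is a generic sketch, not MOA's or SAMOA's actual API.

```python
# Online perceptron: learns incrementally from a stream of labelled examples.
weights = [0.0, 0.0]
bias = 0.0

def predict(x):
    return 1 if weights[0] * x[0] + weights[1] * x[1] + bias > 0 else 0

def learn(x, label, lr=0.1):
    """Single-pass update: only the current example is needed, never the past."""
    global bias
    error = label - predict(x)  # zero when the prediction was already correct
    weights[0] += lr * error * x[0]
    weights[1] += lr * error * x[1]
    bias += lr * error

# A tiny stream of (features, label) pairs; with real data this loop runs
# indefinitely, adapting as the stream evolves.
stream = [((1.0, 1.0), 1), ((0.0, 0.0), 0), ((1.0, 0.9), 1), ((0.1, 0.0), 0)]
for _ in range(5):              # replay the small stream a few times
    for x, label in stream:
        learn(x, label)

print(predict((1.0, 1.0)), predict((0.0, 0.0)))  # 1 0
```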
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis... (Spark Summit)
Redis accelerates Apache Spark execution by 45 times when used as a shared distributed in-memory datastore for Spark in analyses like time-series data range queries. With the Redis module for machine learning, redis-ml, implementations of spark-ml models gain a new real-time serving layer that offloads processing of models directly into Redis, allows multiple applications to reuse the same models, and speeds up classification and execution of these models by 13x. Join this session to learn more about the Redis Labs connector for Apache Spark that enhances production implementations of real-time big data processing.
This document provides an introduction to predictive analytics. It defines analytics and predictive analytics, comparing their purposes and differences. Analytics uses past data to understand trends while predictive analytics anticipates the future. Business intelligence involves using data to support decision making and aims to provide historical, current and predictive views of business. As technologies advanced, business intelligence evolved from being organized under IT to potentially being aligned under strategy management. Effective communication between business and analytics professionals is important for organizations to benefit from predictive analytics. The business case for predictive analytics includes enabling strategic planning, competitive analysis, and improving business processes to work smarter.
Real Time Analytics: Algorithms and Systems, by Arun Kejariwal
In this tutorial, an in-depth overview of the streaming analytics landscape -- applications, algorithms, and platforms -- is presented. We walk through how the field has evolved over the last decade and then discuss the current challenges: the impact of the other three Vs, viz. Volume, Variety, and Veracity, on Big Data streaming analytics.
Big Data Real Time Analytics - A Facebook Case Study, by Nati Shalom
Building Your Own Facebook Real Time Analytics System with Cassandra and GigaSpaces.
Facebook's real time analytics system is a good reference for those looking to build their real time analytics system for big data.
The first part covers the lessons from Facebook's experience and the reason they chose HBase over Cassandra.
In the second part of the session, we learn how to build our own real-time analytics system using the new version of Cassandra and GigaSpaces Cloudify: achieving better performance, gaining real business insights and analytics from our big data, and making deployment and scaling significantly simpler.
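A common thread in these architectures is folding events into pre-aggregated, per-window counters at write time so that dashboard reads are cheap. A minimal in-memory sketch of that counter model follows; it is illustrative only (Facebook's system does the increments in HBase, and the key and window choices here are hypothetical):

```python
from collections import defaultdict

class RealtimeCounters:
    """Sketch of the counter-centric model used by real-time analytics
    systems: each event is folded into a (key, time-window) counter as it
    arrives, so dashboards read pre-aggregated cells instead of scanning
    raw event logs."""

    def __init__(self, window_secs=60):
        self.window = window_secs
        self.counts = defaultdict(int)  # (key, window_start) -> count

    def _bucket(self, event_time):
        return event_time - (event_time % self.window)

    def record(self, key, event_time):
        self.counts[(key, self._bucket(event_time))] += 1

    def read(self, key, event_time):
        return self.counts.get((key, self._bucket(event_time)), 0)

c = RealtimeCounters()
for t in (10, 20, 59, 61):          # three events in window 0, one in window 60
    c.record("url:/article/42", t)
print(c.read("url:/article/42", 30))   # 3
print(c.read("url:/article/42", 70))   # 1
```

In a distributed store the increment must be atomic, which is exactly why counter support in HBase or Cassandra matters for this workload.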
This presentation introduces big data and explains how to generate actionable insights using analytics techniques. The deck explains general steps involved in a typical analytics project and provides a brief overview of the most commonly used predictive analytics methods and their business applications.
Vijay Adamapure is a Data Science Enthusiast with extensive experience in the field of data mining, predictive modeling and machine learning. He has worked on numerous analytics projects ranging from healthcare, business analytics, renewable energy to IoT.
Vijay presented these slides during the Internet of Everything Meetup event 'Predictive Analytics - An Overview' that took place on Jan. 9, 2015 in Mumbai. To join the Meetup group, register here: http://bit.ly/1A7T0A1
This document provides an overview of predictive analytics, including its evolution, definition, process, tools and techniques. It discusses how predictive analytics is being used across various industries to optimize outcomes, increase revenue and reduce costs. Specific use cases are outlined, such as using IoT sensor data and predictive models to improve risk calculations for auto insurance, optimize energy usage in buildings, enhance customer recommendations, and optimize policy interventions. Business cases focus on how companies in various sectors leverage customer data and predictive analytics to increase digital marketing effectiveness, revenues, and customer loyalty. Overall, the document examines current and emerging applications of predictive analytics across different domains.
This is a reduced PDF version of the hardcover book available at http://www.lulu.com/shop/jeffrey-strickland/predictive-analytics-using-r/hardcover/product-22000910.html, at a 40% discount. It will soon be available on Amazon.
Enabling Real Time Analysis & Decision Making - A Paradigm Shift for Experime...PyData
By Kerstin Kleese van Dam
PyData New York City 2017
New instrument technologies are enabling a new generation of in-situ and in-operando experiments, with extremely fine spatial and temporal resolution, that allow researchers to observe physics, chemistry and biology as they happen. These new methodologies go hand in hand with an exponential growth in data volumes and rates: petabyte-scale data collections and terabyte-per-second streams. At the same time, scientists are pushing for a paradigm shift. Now that they can observe processes in intricate detail, they want to analyze, interpret and control those processes. Given the multitude of voluminous, heterogeneous data streams involved in every single experiment, novel real-time, data-driven analysis and decision-support approaches are needed to realize this vision. This talk will discuss state-of-the-art streaming analysis for experimental facilities, its challenges and early successes. It will present where commercial technologies can be leveraged and how many of the novel approaches differ from commonly available solutions.
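One building block for this kind of real-time, data-driven analysis is one-pass statistics that never buffer the stream. A small sketch using Welford's online algorithm (a generic illustration of streaming analysis, not code from the talk):

```python
class OnlineStats:
    """Welford's one-pass algorithm: mean and variance of a data stream
    updated per sample, so a monitor can flag drift or anomalies without
    storing the stream itself."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; zero until at least two samples have arrived.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

s = OnlineStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    s.update(x)
print(round(s.mean, 3))      # 5.0
print(round(s.variance, 3))  # 4.571
```

The per-sample cost is constant, so the same update loop works whether the detector delivers kilobytes or terabytes per second.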
Toward a Global Interactive Earth Observing CyberinfrastructureLarry Smarr
The document discusses the need for a new generation of cyberinfrastructure to support interactive global earth observation. It outlines several prototyping projects that are building examples of systems enabling real-time control of remote instruments, remote data access and analysis. These projects are driving the development of an emerging cyber-architecture using web and grid services to link distributed data repositories and simulations.
Big Fast Data in High-Energy Particle PhysicsAndrew Lowe
Experiments at CERN (the European Organization for Nuclear Research) generate colossal amounts of data. Physicists must sift through about 30 petabytes of data produced annually in their search for new particles and interesting physics. The tidal wave of data produced by the Large Hadron Collider (LHC) at CERN places an unprecedented challenge for experiments' data acquisition systems, and it is the need to select rare physics processes with high efficiency while rejecting high-rate background processes that drives the architectural decisions and technology choices. Although filtering and managing large data sets is of course not exclusive to particle physics, the approach that has been taken is somewhat unique. In this talk, I describe the typical journey taken by data from the readout electronics of one experiment to the results of a physics analysis.
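The staged selection described above can be caricatured as a cascade of increasingly expensive, increasingly selective cuts: each level rejects most of the rate so the next, slower level stays affordable. A toy sketch follows; the thresholds, fields and event model are invented for illustration and are not real trigger menus:

```python
import random

def level1(event):
    # Fast, hardware-style cut: keep events with enough total energy.
    return event["energy"] > 20.0

def hlt(event):
    # Slower, more selective software cut applied only to Level-1 survivors.
    return event["energy"] > 50.0 and event["n_tracks"] >= 2

def trigger(events):
    """Cascade filter in the spirit of an LHC trigger chain: cheap cuts
    first, expensive cuts only on what survives."""
    return [e for e in events if level1(e) and hlt(e)]

rng = random.Random(0)
events = [{"energy": rng.uniform(0.0, 100.0), "n_tracks": rng.randint(0, 5)}
          for _ in range(1000)]
kept = trigger(events)
print(len(kept), "of", len(events), "events kept")
```

In the real experiments the same shape holds, but Level-1 runs in custom electronics at the bunch-crossing rate and the high-level trigger on a compute farm.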
CERN is a global scientific research organization located in Geneva, Switzerland that operates the largest particle physics laboratory in the world. It was founded in 1954 and has over 10,000 scientists from over 100 countries working on experiments to study the fundamental constituents of matter and the forces that act between them. CERN generates enormous amounts of data from experiments like the Large Hadron Collider, with over 15 petabytes of new data generated each year that is distributed to computing centers around the world for analysis. Solving the mysteries of the universe through these experiments requires advanced computing technologies and global collaboration to process and make sense of the massive volumes of data being collected.
The Pacific Research Platform Two Years InLarry Smarr
This document provides an overview of the Pacific Research Platform (PRP) after two years of operation. It describes several science drivers that are using the PRP, including biomedical research on cancer genomics and microbiomes, earth sciences like earthquake modeling, and astronomy. It highlights how the PRP is connecting sites like UC San Diego, UC Santa Cruz, UC Berkeley to share and analyze large datasets using high-speed networks. The PRP is expanding to support new areas like deep learning, cultural heritage projects, and connecting additional UC campuses through network upgrades.
The document discusses how computation can accelerate the generation of new knowledge by enabling large-scale collaborative research and extracting insights from vast amounts of data. It provides examples from astronomy, physics simulations, and biomedical research where computation has allowed more data and researchers to be incorporated, advancing various fields more quickly over time. Computation allows for data sharing, analysis, and hypothesis generation at scales not previously possible.
The document discusses the evolving landscape of semantic technologies and their applications to scientific domains like eScience. It introduces the Tetherless World Constellation, a research group applying semantic web techniques. Examples are given of projects applying semantics to areas like virtual observatories and provenance capture. The value of semantic technologies is discussed for integration, discovery, and validation of scientific data and models. Modular ontologies and semantically-enabled frameworks are presented as important directions for reuse and collaboration.
How HPC and large-scale data analytics are transforming experimental scienceinside-BigData.com
In this deck from DataTech19, Debbie Bard from NERSC presents: Supercomputing and the scientist: How HPC and large-scale data analytics are transforming experimental science.
"Debbie Bard leads the Data Science Engagement Group at NERSC. NERSC is the mission supercomputing center for the US Department of Energy, and supports over 7,000 scientists and 700 projects with supercomputing needs. A native of the UK, her career spans research in particle physics, cosmology and computing on both sides of the Atlantic. She obtained her PhD at Edinburgh University, and has worked at Imperial College London as well as the Stanford Linear Accelerator Center (SLAC) in the USA, before joining the Data Department at NERSC, where she focuses on data-intensive computing and research, including supercomputing for experimental science and machine learning at scale."
Watch the video: https://wp.me/p3RLHQ-kLV
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Science and Cyberinfrastructure in the Data-Dominated EraLarry Smarr
10.02.22
Invited talk
Symposium #1610, How Computational Science Is Tackling the Grand Challenges Facing Science and Society
Title: Science and Cyberinfrastructure in the Data-Dominated Era
San Diego, CA
A National Big Data Cyberinfrastructure Supporting Computational Biomedical R...Larry Smarr
Invited Presentation
Symposium on Computational Biology and Bioinformatics:
Remembering John Wooley
National Institutes of Health
Bethesda, MD
July 29, 2016
The Transformation of Systems Biology Into A Large Data ScienceRobert Grossman
Systems biology is becoming a data-intensive science due to the exponential growth of genomic and biological data. Large projects now produce petabytes of data that require new computational infrastructure to store, manage, and analyze. Cloud computing provides elastic resources that can scale to support the increasing data needs of systems biology. Case studies show how clouds are used for large-scale data integration and analysis, running combinatorial analysis over genomic marks, and enabling reanalysis of biological data through elastic virtual machines. The Open Cloud Consortium is working to provide open cloud resources for biological and biomedical research through testbeds and proposed bioclouds.
Opportunities for X-Ray science in future computing architecturesIan Foster
The world of computing continues to evolve rapidly. In just the past 10 years, we have seen the emergence of petascale supercomputing, cloud computing that provides on-demand computing and storage with considerable economies of scale, software-as-a-service methods that permit outsourcing of complex processes, and grid computing that enables federation of resources across institutional boundaries. These trends show no signs of slowing down: the next 10 years will surely see exascale, new cloud offerings, and terabit networks. In this talk I review several of these developments and discuss their potential implications for X-ray science and X-ray facilities.
The document provides an overview of plant genome sequence assembly, including:
1) A brief history of sequencing technologies and their improvements over time, from Sanger sequencing to newer technologies producing longer reads.
2) Key steps in a sequencing project including read processing, filtering, and corrections before assembly into contigs and scaffolds using appropriate software.
3) Factors to consider for experimental design and assembly optimization such as sequencing depth, library types, and software choices depending on the genome and data characteristics.
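The assembly step in point 2 can be illustrated with a toy greedy assembler that repeatedly merges the pair of reads with the longest suffix-prefix overlap; this is a classic teaching example, not production practice, and it ignores sequencing errors, repeats and reverse complements that real assemblers (overlap-graph or de Bruijn-graph based) must handle:

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b (>= min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Toy greedy assembler: merge the best-overlapping pair of reads
    until no overlap of at least min_len remains."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:          # no remaining overlaps: stop with several contigs
            break
        merged = reads[i] + reads[j][k:]
        reads = [r for n, r in enumerate(reads) if n not in (i, j)] + [merged]
    return reads

# Four overlapping reads collapse into one contig.
print(greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]))
# ['ATTAGACCTGCCGGAATAC']
```

Sequencing depth matters here in an obvious way: with too few reads the overlaps disappear and the assembly fragments into many contigs.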
Dr. Frank Würthwein from the University of California, San Diego presents at the International Supercomputing Conference on Big Data, 2013, US. Until recently, the large CERN experiments, ATLAS and CMS, owned and controlled the computing infrastructure they operated on in the US, and accessed data only when it was locally available on the hardware they operated. However, Würthwein explains, with data-taking rates set to increase dramatically by the end of LS1 in 2015, the current operational model is no longer viable for satisfying peak processing needs. Instead, he argues, large-scale processing centers need to be created dynamically to cope with spikes in demand. To this end, Würthwein and colleagues carried out a successful proof-of-concept study, in which the Gordon supercomputer at the San Diego Supercomputer Center was dynamically and seamlessly integrated into the CMS production system to process a 125-terabyte data set.
Plenary talk at the international Synchrotron Radiation Instrumentation conference in Taiwan, on work with great colleagues Ben Blaiszik, Ryan Chard, Logan Ward, and others.
Rapidly growing data volumes at light sources demand increasingly automated data collection, distribution, and analysis processes, in order to enable new scientific discoveries while not overwhelming finite human capabilities. I present here three projects that use cloud-hosted data automation and enrichment services, institutional computing resources, and high-performance computing facilities to provide cost-effective, scalable, and reliable implementations of such processes. In the first, Globus cloud-hosted data automation services are used to implement data capture, distribution, and analysis workflows for Advanced Photon Source and Advanced Light Source beamlines, leveraging institutional storage and computing. In the second, such services are combined with cloud-hosted data indexing and institutional storage to create a collaborative data publication, indexing, and discovery service, the Materials Data Facility (MDF), built to support a host of informatics applications in materials science. The third integrates components of the previous two projects with machine learning capabilities provided by the Data and Learning Hub for science (DLHub) to enable on-demand access to machine learning models from light source data capture and analysis workflows, and provides simplified interfaces to train new models on data from sources such as MDF on leadership-scale computing resources. I draw conclusions about best practices for building next-generation data automation systems for future light sources.
Computational Training and Data Literacy for Domain ScientistsJoshua Bloom
This document discusses training domain scientists in computational and data skills. It notes the increasing amount of data in fields like astronomy and challenges of traditional approaches. It advocates teaching skills like statistics, machine learning, and programming. Examples are given of bootcamps, seminars and degree programs in these areas at UC Berkeley taught by CS and statistics faculty. Challenges discussed include fitting such training into formal curricula and ensuring participation from underrepresented groups. The creation of collaborative spaces is proposed to better connect domain scientists with methodological experts to help scientists address the growing role of data in their fields.
In this video from the DDN User Group Meeting at SC14, Steve Simms from Indiana University presents: Indiana University's Data Capacitor II.
"The High Performance File Systems unit of UITS Research Technologies operates two separate high-speed file systems for temporary storage of research data. Both use the open source Lustre parallel distributed file system running on a version of the Linux operating system: Data Capacitor II (DC2) is a larger, faster replacement for the former Data Capacitor, which was decommissioned January 7, 2014. Like its predecessor, DC2 is a large-capacity, high-throughput, high-bandwidth Lustre-based file system serving all IU campuses. It is mounted on the Big Red II, Karst, Quarry, and Mason research computing systems."
Similar to Toward Real-Time Analysis of Large Data Volumes for Diffraction Studies by Martin Kunz, LBNL (20)
EarthCube Community Webinar held Tuesday, Dec. 9th at 11:00 PST/2:00 EST for a virtual kick-off of the new 'Demonstration Phase' of EarthCube, including statements from your Leadership Council members and an update from NSF Program Officer, Eva Zanzerkia.
Engagement Team monthly meeting 10.10.2014EarthCube
The document outlines the agenda and priorities for an EarthCube Demonstration Governance Engagement Team meeting in October 2014. The agenda includes an introduction, announcing a team representative to the Leadership Council, developing internal leadership, reviewing priorities and logistical functions, and discussing future meeting schedules. Key priorities and deliverables for the team are to develop an outreach and communications plan to engage the EarthCube community and stakeholders through compiling science use cases. Housekeeping, meeting leadership, point of contact roles, work management, and collaboration with other groups are listed as important logistical functions for the team.
The document summarizes the agenda and priorities for an October meeting of the Science Standing Committee. The agenda includes an introduction, announcing committee representatives, developing internal leadership, and reviewing priorities and logistical functions. The committee's year 1 intended outcome is to support work to complete the year 1 deliverable of developing science use cases. Their priorities are housekeeping tasks like assigning a meeting lead and point of contact for the oversight office.
This document summarizes an EarthCube meeting to discuss funded demonstration projects and governance. It outlines the agenda, including introductions from new project teams and a discussion of the role of funded projects. Key points include that the Test Governance project will coordinate the demonstration governance process and report outcomes to NSF. Both the Technology & Architecture Committee and Science Committee outlined initial steps, including forming subcommittees to analyze use cases and gaps. The meeting concluded with a discussion of how funded projects can best work with standing committees through formal work plans, representatives, and regular communication.
Technology and Architecture Committee meeting slides 10.06.14EarthCube
The October meeting agenda of the EarthCube Technology and Architecture Standing Committee included:
1) Welcome and introductions
2) Announcement of new committee representatives
3) Discussion of the committee's internal leadership structure and responsibilities, including coordinating with other groups, monitoring working groups, and sponsoring new working groups.
4) Review of timelines for upcoming milestones and deliverables and discussion of future meeting schedules.
EarthCube Governance Intro for Solar Terrestrial End-user WorkshopEarthCube
Presentation by the EarthCube Test Enterprise Governance project for the Solar Terrestrial Research End-User Workshop, Newark, New Jersey, August 14, 2014.
AHM 2014: The CSDMS Standard Names, Cross-Domain Naming Conventions for Descr...EarthCube
The document discusses the CSDMS Standard Names, which provide unambiguous naming conventions for describing process models, data sets, and their associated variables. The standard names aim to avoid ambiguity and domain-specific terminology. They support naming quantities, processes, mathematical operations, assumptions, and more. Developing and applying standard names helps different models to automatically match variables and understand each other.
AHM 2014: Addressing Data and Heterogeneity, Semantic Building Blocks & CI Pe...EarthCube
This panel will address data heterogeneity issues in EarthCube from the perspective of semantic building blocks and cyberinfrastructure. The panel, convened by Gary Berg-Cross of SOCoP, will feature co-conveners Pascal Hitzler of Wright State University, Kerstin Lehnert of LDEO, Columbia University, and Peter Wiebe of Woods Hole Oceanographic Institution. Additional panelists will include Scott Peckham of University of Colorado Boulder, Anthony Aufdenkampe of Stroud Water Research Center, Tim Finin of University of Maryland Baltimore County, and Krzysztof Janowicz of University of California Santa Barbara.
AHM 2014: Revisiting Governance Model, Preparing for Next StepsEarthCube
The document lists several potential priorities for EarthCube including developing an emergent architecture, identifying and promoting success stories, providing guidelines for shared services, developing common end user training, benchmarking progress against scientific needs, creating a prototype to demonstrate connectivity and functionality, documenting scientific workflows, and coordinating projects. Additional options mentioned are scoping and articulating a vision, identifying collaborations, documenting use cases, engaging academia in education, improving data management plans and data discovery, establishing light governance led by scientists, tying different design efforts together, determining funding mechanisms, adopting standards, enabling participation from diverse fields, and engaging stakeholders.
AHM 2014: Integrated Data Management System for Critical Zone ObservatoriesEarthCube
Presentation by Anthony Aufdenkampe during the Addressing Data Heterogeneity, Semantic Building Blocks & CI Perspective session on Day 2, June 25, at the EarthCube All-Hands Meeting
The document discusses the CSDMS Standard Names, which are naming conventions developed by the Community Surface Dynamics Modeling System (CSDMS) modeling framework to facilitate automatic coupling of models and data sets from different contributors. The naming conventions follow an object-oriented approach where each standard variable name is composed of an object name and quantity name joined by double underscores. This allows framework software to retrieve numerical values for variables based on their standardized names. The naming conventions were designed according to criteria such as avoiding ambiguity, using widely understood terminology, and supporting mathematical operations and assumptions. They address challenges of automatic semantic mediation when coupling diverse resources that use different naming systems.
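Since each standard name is just an object part and a quantity part joined by a double underscore, composing and parsing names is mechanical, which is what lets framework software match variables automatically. A minimal sketch (illustrative helper functions, not the CSDMS reference tooling):

```python
def compose_standard_name(object_part, quantity_part):
    """Join an object name and a quantity name with the double-underscore
    separator used by CSDMS Standard Names."""
    return f"{object_part}__{quantity_part}"

def parse_standard_name(name):
    """Split a CSDMS-style standard name back into its two parts.
    Single underscores inside each part are ordinary word separators;
    only the double underscore delimits object from quantity."""
    object_part, quantity_part = name.split("__")
    return {"object": object_part, "quantity": quantity_part}

name = compose_standard_name("atmosphere_bottom_air", "temperature")
print(name)                       # atmosphere_bottom_air__temperature
print(parse_standard_name(name))  # {'object': 'atmosphere_bottom_air', 'quantity': 'temperature'}
```

Two models that both expose `atmosphere_bottom_air__temperature` can then be coupled by exact string match, with no per-pair translation table.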
The document discusses a watershed modeling system called BCube that aims to decrease the effort of watershed initialization by brokering various global geospatial and environmental data required for watershed modeling. BCube allows researchers to focus on scientific research by providing a single access point to the different data formats and sources for elevation, soils, land use, weather, and other data needed to set up and run watershed models. The document provides an overview of the types of data BCube can broker and the workflow where a scientist requests data for a watershed area and BCube returns the available options to choose from.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help mitigate climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability, which can then be measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
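At its core, the retrieval step of any RAG system ranks documents by similarity to the query and hands the top hits to the generator for response synthesis. A self-contained sketch follows, with a toy bag-of-words "embedding" standing in for a real embedding model; the documents, scoring and function names are invented for illustration:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'. In a production RAG stack this would
    be a call to a served text-embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """The retrieval step of RAG: rank documents by similarity to the
    query and return the top-k to be placed in the generator's prompt."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "vector databases store embeddings for similarity search",
    "bentoml serves machine learning models in the cloud",
    "retrieval augmented generation grounds llm answers in documents",
]
print(retrieve("how do llm answers get grounded in retrieved documents", docs, k=1))
```

Swapping the toy `embed` for a fine-tuned embedding model, and the list scan for a vector database, is exactly where the retrieval-performance challenges in the talk arise.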
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren't traditionally found in software curriculums, so many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company's observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to ops, infra and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
HCL Notes and Domino License Cost Reduction in the World of DLAU
Toward Real-Time Analysis of Large Data Volumes for Diffraction Studies by Martin Kunz, LBNL
1. Towards real-time analysis of large data volumes for synchrotron experiments
Martin Kunz, Nobumichi Tamura
Advanced Light Source, Lawrence Berkeley National Lab
2. Towards real-time analysis of large data volumes for synchrotron experiments
Acknowledgements
- Jack Deslippe, David Skinner (NERSC)
- Abdelilah Essiari, Craig E. Tull (LBNL-CRD)
- Eli Dart (ESnet)
- Dula Parkinson (LBNL – ALS)
3. Towards real-time analysis of large data volumes for synchrotron experiments
X-rays and Earth sciences: the story of a moving bottleneck
1960s / 1970s: X-ray source -> X-ray detectors -> data analysis -> publication
Photo: Henry Levy with the Picker 5-circle diffractometer and a PDP-5
4. Towards real-time analysis of large data volumes for synchrotron experiments
X-rays and Earth sciences: the story of a moving bottleneck
1980s / 1990s: X-ray source -> X-ray detectors -> data analysis -> publication
1995: “MD Storm”, readout time: 45 minutes
5. Towards real-time analysis of large data volumes for synchrotron experiments
X-rays and Earth sciences: the story of a moving bottleneck
2000s / 2010s: X-ray source -> X-ray detectors -> data analysis -> publication
6. Towards real-time analysis of large data volumes for synchrotron experiments
X-rays and Earth sciences: the story of a moving bottleneck
Future: X-ray source -> X-ray detectors -> interactive access to supercomputers -> data analysis -> publication
7. Towards real-time analysis of large data volumes for synchrotron experiments
Examples of mineral-physics-related experiments with high data rates:
1) In situ powder diffraction with automated P-T stepping:
ALS BL 12.2.2 with Perkin Elmer detector (~0 read-out delay)
http://www.ltp-oldenburg.de
Data rate on the order of 1000s of frames per day (i.e. 10s of GB/day)
8. Towards real-time analysis of large data volumes for synchrotron experiments
Examples of mineral-physics-related experiments with high data rates:
2) Micro-diffraction: phase/orientation/strain mapping at high spatial resolution
Micro-diffraction set-up at ALS beamline 12.3.2 with Pilatus-1M detector.
Left: Distribution of Re3N (black) and Re (blue) grown in a laser-heated DAC.
Right: Relative orientation of Re3N grains.
Source: Friedrich et al. (2010), PRL (105), 085504.
Data rate on the order of 10,000s of frames per day (i.e. 100s of GB/day)
9. Towards real-time analysis of large data volumes for synchrotron experiments
Examples of mineral-physics-related experiments with high data rates:
3) Tomography: 3D mapping of geo-materials
Tomography set-up at ALS beamline 8.3.2 (X-rays -> scintillator)
Supercritical CO2 penetrating sandstone on ALS BL 8.3.2 (courtesy J. Ajo-Franklin)
Distribution of Fe-alloy melt prepared at 64 GPa, measured at SSRL. Shi et al. (2013), Nature Geoscience, DOI: 10.1038/NGEO1956
Data rate on the order of 100,000s of frames per day (i.e. TBs/day)
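The three data-rate estimates above follow from simple arithmetic once a frame size is assumed. The ~8 MB/frame figure below is an illustrative assumption (a 2048 x 2048 16-bit image), not a number from the talk:

```python
def daily_volume_gb(frames_per_day, frame_mb=8.0):
    """Back-of-envelope daily data volume, assuming ~8 MB per frame
    (a hypothetical 2048 x 2048 16-bit image; real frame sizes vary)."""
    return frames_per_day * frame_mb / 1024.0

# thousands of frames/day -> tens of GB/day
print(daily_volume_gb(5_000))    # ~39 GB/day
# tens of thousands of frames/day -> hundreds of GB/day
print(daily_volume_gb(50_000))   # ~391 GB/day
# hundreds of thousands of frames/day -> TBs/day
print(daily_volume_gb(500_000))  # ~3900 GB/day, i.e. ~3.8 TB/day
```

With a larger or smaller assumed frame size the absolute numbers shift, but the three regimes quoted in the slides remain an order of magnitude apart.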
10. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
1) Not-quite-real-time - local cluster for micro-diffraction analysis
- 24 dual-socket AMD Opteron 248 2.2 GHz nodes (48 CPUs)
- 48 GB aggregate memory
- 14 TB shared disk storage
- Gigabit Ethernet interconnect
- 212 GFLOPS (theoretical peak)
11. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
1) Not-quite-real-time - local cluster for micro-diffraction analysis
1) User tunes parameters manually on some ‘typical’ patterns
12. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
1) Not-quite-real-time - local cluster for micro-diffraction analysis
2) Analysis parameters are written into an instruction file
13. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
1) Not-quite-real-time - local cluster for micro-diffraction analysis
2) Analysis parameters are written into an instruction file
14. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
1) Not-quite-real-time - local cluster for micro-diffraction analysis
3) Launch parsing script:
-> reads the instruction file and distributes the data files across the available CPUs
-> writes batch files which manage the individual CPUs
-> launches the analysis software on each node
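The partitioning step of such a parsing script can be sketched as follows. The file-naming scheme, the batch-file format and the chunking policy are illustrative assumptions, not the actual ALS cluster scripts:

```python
import math
from pathlib import Path

def chunk_frames(frames, n_cpus):
    """Split the list of data files into near-equal batches, one per CPU."""
    size = math.ceil(len(frames) / n_cpus)
    return [frames[i:i + size] for i in range(0, len(frames), size)]

def write_batch_files(frames, n_cpus, outdir):
    """Write one batch file per CPU; each batch file lists the frames
    that CPU's analysis job should process (hypothetical format)."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, batch in enumerate(chunk_frames(frames, n_cpus)):
        p = out / f"batch_{i:02d}.txt"
        p.write_text("\n".join(batch))
        paths.append(p)
    # a launcher would then start one analysis job per batch file
    return paths
```

The essential design point from the slide is that the per-frame analysis is embarrassingly parallel, so a static split of the frame list across nodes is enough; no inter-node communication is needed during the run.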
15. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
1) Not-quite-real-time - local cluster for micro-diffraction analysis
4) Results are written to a single file which can be viewed, further analyzed and published:
Relative lattice orientation: gives the domain structure. The total color range from blue to red corresponds to 4 degrees of rotation.
Average intensity: gives the high-resolution fine structure of the grain.
16. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
2) Real time - collaboration with the National Energy Research Scientific Computing Center (NERSC) (in development)
1) Data are sent directly to NERSC for analysis and storage during data collection
Data are packaged:
- After every n images, a ‘trigger file’ is deposited in a directory which is monitored by NERSC.
- A SPADE web app wraps the data (512 files at a time) in HDF5 (Hierarchical Data Format) and ships them to NERSC via a Gigabit line (to be upgraded to a 10G line).
- At NERSC, the data are received by a SPADE instance, which places them in the target folder and on tape and sends an acknowledgment.
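The batching logic behind the trigger files can be sketched as below. This is a minimal stand-in that covers only the grouping policy; the SPADE transfer, the HDF5 wrapping and the tape archiving are omitted. The batch size of 512 is the figure quoted above:

```python
def ready_batches(image_files, batch_size=512):
    """Group the images collected so far into full batches of `batch_size`.
    Each full batch corresponds to one trigger: SPADE would wrap it in HDF5
    and ship it to NERSC. The remainder waits for the next trigger."""
    n_full = len(image_files) // batch_size
    batches = [image_files[i * batch_size:(i + 1) * batch_size]
               for i in range(n_full)]
    leftover = image_files[n_full * batch_size:]
    return batches, leftover

# e.g. 1100 images collected so far -> two full shipments, 76 images waiting
```

Shipping in fixed-size batches rather than per frame amortizes the per-transfer overhead (HDF5 wrapping, network handshake, tape placement) over many files, which matters at rates of 100,000s of frames per day.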
17. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
2) Real time - collaboration with the National Energy Research Scientific Computing Center (NERSC) (in development)
1) Data are sent directly to NERSC for analysis and storage during data collection: up and running
Transfer control is web-based
18. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
2) Real time - collaboration with the National Energy Research Scientific Computing Center (NERSC) (in development)
1) Data are sent directly to NERSC for analysis and storage during data collection: up and running
Transfer control is web-based
19. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
2) Real time - collaboration with the National Energy Research Scientific Computing Center (NERSC) (in development)
1) Data are sent directly to NERSC for analysis and storage during data collection: up and running
Transfer control is web-based
20. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
2) Real time - collaboration with the National Energy Research Scientific Computing Center (NERSC) (in development)
2) Analysis parameters are set up with a web app - under development
21. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
2) Real time - collaboration with the National Energy Research Scientific Computing Center (NERSC) (in development)
2) Analysis parameters are set up with a web app - under development
Jobs are launched manually by the user via the same web page.
Test runs indicate an analysis time on the order of the data collection time; the analysis can in principle run synchronously with data collection.
22. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
2) Real time - collaboration with the National Energy Research Scientific Computing Center (NERSC) (in development)
3) Analysis jobs are executed on Carver - under development
Carver is an IBM iDataPlex cluster:
- 1202 nodes with a total of 9984 processor cores
- 106 Tflop/s peak performance
- largest allocated parallel job is 512 cores
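For scale, the quoted Carver figures can be compared with those of the local Opteron cluster from the earlier slide. The arithmetic below simply divides the peak numbers quoted in the talk; it illustrates relative scale, not measured performance:

```python
# Peak figures as quoted in the slides
carver_cores, carver_gflops = 9984, 106_000   # 106 Tflop/s total
local_cores, local_gflops = 48, 212           # local Opteron cluster

per_core_carver = carver_gflops / carver_cores   # ~10.6 GFLOP/s per core
per_core_local = local_gflops / local_cores      # ~4.4 GFLOP/s per core
aggregate_ratio = carver_gflops / local_gflops   # ~500x the local cluster's peak
```

Even a modest 512-core allocation on Carver thus offers roughly an order of magnitude more cores, each a few times faster, than the entire local micro-diffraction cluster.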
23. Towards real-time analysis of large data volumes for synchrotron experiments
Summary:
- Data analysis is the new bottleneck limiting progress in many aspects of experimental mineral physics.
- Real-time analysis with immediate feedback is increasingly important in experimental mineral physics.
- These challenges cannot always be met with traditional desktop machines: software has to be automated and parallelized, and collaborations with supercomputing centers are becoming important for experimental scientists as well (at least for a few more iterations of Moore’s law).
- Data analysis on supercomputers, remotely controlled through web applications, is a very promising avenue, allowing big-data methods to enter mineral physics.
- Future developments may (must?) evolve away from supercomputers toward highly parallelized (GPU-based) local computers and/or cloud computing.
Editor's Notes
I would like to start off by giving a brief, slightly personalized historic perspective on the application of X-rays in mineral physics research:
X-rays have been applied in the Earth sciences on a routine basis for about 50 years; this story thus pretty much parallels my life. In the sixties and seventies, when I was just learning how to spell “X-ray”, the first automated diffractometers replaced fully manual film techniques. The brightness of the X-rays available in those days limited a powder or single-crystal data collection to days and weeks.
This changed most dramatically with the advent of dedicated light sources, in particular high-energy third-generation sources such as the ESRF in Grenoble, home of the first dedicated mineral physics beamline, ID30. I had meanwhile managed to learn to spell “X-ray” and thus was fortunate enough to be involved in the early days of said dedicated beamline. The brilliance of the ID30 undulator enabled experiments through a diamond anvil cell to be performed in a matter of seconds. However, each data point required the physical transport of a 1 x 1 ft image plate to the one and only IP reader on the floor, plus a read-out time of about 45 minutes. Sadly, the tremendous increase in brightness and flux of the X-ray sources could thus only be utilized in a limited way.
Another twenty years later - the age-appropriate number of light sources meanwhile doesn’t fit on my birthday cake anymore - we hail the advent of ultra-fast and ultra-low-noise direct-detection X-ray detectors such as the Perkin Elmer or the Pilatus, which in principle allow data-point rates of up to 30 Hz. This makes very large data rates possible. However, our ability to deal with these data is largely still at the level of high-end desktops and serial workflow software. The opportunity given to us by the combination of ever brighter light sources and fast detectors, i.e. to apply big-data methods to mineral physics research, can therefore not be fully harnessed.
The way out of this bottleneck is to automate and parallelize the analysis workflow using - at least for the time being - massively parallel supercomputers. This is the approach we are presently taking at the Advanced Light Source in collaboration with the National Energy Research Scientific Computing Center.
Let me quickly give you three examples of the orders of magnitude of the data rates we have to deal with:
Intense X-rays and fast detectors, coupled with programmable T and P changes, allow a much denser coverage of the P-V-T surface and thus a much better description of the thermo-elastic properties of Earth materials and their phase transitions.
Mineral physics experiments involving very high temperatures and pressures invariably force us to deal with large spatial and temporal gradients of pressure, temperature and chemical composition. High spatial or temporal resolution is therefore needed to explore these inhomogeneities. Fast detectors and bright X-rays thus allow us to collect spatially and/or temporally highly resolved maps of our sample.
Going beyond diffraction, various flavors of tomographic techniques now allow us to create three-dimensional images of samples in and ex situ, if needed even with chemical or phase selectivity.
This solution works fairly well with medium-sized datasets of up to 10,000 frames. With larger data volumes and/or tricky data, the analysis can take much longer than the data collection, even on a 48-CPU cluster.