BIG DATA SOLUTION FOR
CTBT MONITORING:
CEA-IDC JOINT GLOBAL
CROSS CORRELATION
PROJECT
15 May 2014 CEA | 21 June 2012
International Data Centre 25 October 2010
Presenters
Dmitry Bobrov 1), Randy Bell 1), Nicolas Brachet 2), Pierre Gaillard 2),
Jocelyn Guilbert 2), Ivan Kitov 3), Mikhail Rozhkov 1)
1) International Data Centre, CTBTO
2) Commissariat à l'Énergie Atomique
3) Institute for Dynamics of Geospheres
Scientia potestas est (knowledge is power)
Informatio potestas est (information is power)
Cross-correlation
Tremendous seismic data
growth dictates:
Repeating seismicity: the IDC view
Dozens to hundreds of events from the same Earth cell.
But how can we populate aseismic areas with quality master events?
IMS seismic network
Blue circles – primary arrays, blue triangles – primary 3-C stations.
Yellow circles – auxiliary arrays, yellow triangles – auxiliary 3-C stations. Red
stars – underground nuclear explosions.
Primary network includes 25 arrays
Global Cross Correlation Grid +
Aftershock Sequence Processing
What is the Grid?
• The Grid is a set of loci of hypothetical master events.
• A master is a set of waveform templates linking array stations to the locus.
• Spacing between masters: ~140 km.
• P-wave templates from three to ten IMS primary arrays per master.
• At least three IMS stations are needed to create an REB event.
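To give a feel for the size of such a grid, here is a minimal sketch that places loci roughly 140 km apart over the whole globe. The latitude-band layout, the function name build_grid and the mean Earth radius are assumptions made for illustration; the slides do not say how the actual grid nodes are laid out.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value for this sketch)


def build_grid(spacing_km=140.0):
    """Return a list of (lat, lon) loci in degrees, roughly spacing_km apart.

    Hypothetical layout: equal-height latitude bands, each filled with
    evenly spaced nodes along its parallel.
    """
    # Number of latitude bands needed to cover pole to pole.
    band_count = round(math.pi * EARTH_RADIUS_KM / spacing_km)
    loci = []
    for band in range(band_count):
        lat = -90.0 + (band + 0.5) * 180.0 / band_count
        # Length of the parallel at this latitude.
        ring_km = 2.0 * math.pi * EARTH_RADIUS_KM * math.cos(math.radians(lat))
        node_count = max(1, round(ring_km / spacing_km))
        for k in range(node_count):
            loci.append((lat, -180.0 + k * 360.0 / node_count))
    return loci


if __name__ == "__main__":
    grid = build_grid()
    print(f"{len(grid)} hypothetical master loci at ~140 km spacing")
```

With three to ten templates per master, the number of templates is a small multiple of the number of loci, which is why the grid size matters for the processing load.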
Templates needed:
Real waveforms – for seismic areas
Grand masters – for adjacent territories
Synthetic waveforms – for aseismic areas
Building Masters:
The IDC database comprises hundreds of thousands of seismic events.
Building a comprehensive master event database would require:
1. Cross-correlating each event with every other event (low-cost effort).
2. Cross-correlating each event with the 10-year event history of the IDC
database (extremely high-cost effort).
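The gap between the two efforts above can be illustrated with a rough count of correlation operations. This is only a sketch under stated assumptions: the event count of 450,000 is the REB figure quoted later in this deck, step 2 is read as sliding each template over ten years of continuous data, and the 40 samples per second rate, the single channel and the neglected template length are hypothetical simplifications.

```python
def pair_count(event_count):
    """Number of distinct event pairs for an each-by-each comparison."""
    return event_count * (event_count - 1) // 2


def sliding_positions(years, samples_per_second):
    """Window positions when one template slides over a continuous record.

    Template length is ignored, so this slightly overestimates the count.
    """
    seconds = years * 365.25 * 86400
    return int(seconds * samples_per_second)


if __name__ == "__main__":
    events = 450_000  # REB event count quoted in this deck
    rate = 40         # assumed sampling rate, samples per second

    step1 = pair_count(events)
    step2 = events * sliding_positions(10, rate)
    print(f"step 1: {step1:.2e} pairwise correlations")
    print(f"step 2: {step2:.2e} window correlations per channel")
```

Even on one channel, the continuous search is several orders of magnitude larger than the pairwise comparison, which matches the low-cost versus extremely high-cost contrast on the slide.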
Template dimensionality
reduction is crucial
• A repeating seismicity map showed that one point on the grid may
correspond to dozens or even hundreds of templates. An effective
dimensionality reduction technique must be applied to the clusters of
such events to pick a limited number of master events for each cluster.
• These techniques must also be applied to the sets of synthetic events
generated for the aseismic areas.
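One simple way to picture the reduction is a greedy selection: a template becomes a new master only if it does not correlate well with any master already chosen. The sketch below is an illustrative stand-in, not the technique used by the authors; the function names and the 0.8 threshold are assumptions.

```python
import math


def cc(a, b):
    """Zero-lag normalized cross-correlation of two equal-length waveforms."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0


def pick_masters(templates, threshold=0.8):
    """Return indices of templates kept as masters (greedy selection)."""
    masters = []
    for index, waveform in enumerate(templates):
        covered = any(cc(waveform, templates[m]) >= threshold for m in masters)
        if not covered:
            masters.append(index)
    return masters


if __name__ == "__main__":
    cluster = [
        [0.0, 1.0, 0.0, -1.0],   # reference waveform
        [0.0, 2.0, 0.0, -2.0],   # same shape, larger amplitude
        [1.0, 0.0, -1.0, 0.0],   # different shape
    ]
    print(pick_masters(cluster))
```

The result depends on the order of the templates and on the threshold, so a production scheme would likely need a more careful criterion.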
BIG DATA
Solution
needed
Data is everything
Data centers (IDC, NDCs) collect, process, analyze, produce data 24 hours a day, 7
days a week
Data is the cornerstone: full of information and a source of knowledge
Data sets are :
+ Large and growing Volume
+ Complex and heterogeneous Variety
+ Continuous stream and real time Velocity
+ Sometimes imprecise Veracity
= Big Data 4V
A (big) technological problem
Intrinsic mismatch between data and IT (Information Technology):
Data volume increases 100x in 10 years
I/O bandwidth improves ~3x in 10 years
It is difficult to process all the data with traditional applications within a
tolerable elapsed time
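The two ten-year figures above can be turned into yearly rates to show how quickly the mismatch builds up. This is a small worked calculation assuming steady compound growth, which is a simplification; the function name is illustrative.

```python
def annual_factor(total_factor, years):
    """Yearly growth factor that compounds to total_factor over the period."""
    return total_factor ** (1.0 / years)


if __name__ == "__main__":
    data_per_year = annual_factor(100.0, 10)  # data volume: 100x in 10 years
    io_per_year = annual_factor(3.0, 10)      # I/O bandwidth: ~3x in 10 years
    gap = 100.0 / 3.0                         # relative gap after the decade

    print(f"data volume grows by a factor of {data_per_year:.2f} per year")
    print(f"I/O bandwidth grows by a factor of {io_per_year:.2f} per year")
    print(f"after 10 years the data outgrow the bandwidth by about {gap:.0f}x")
```

Under this assumption the time needed to read the full archive once grows by roughly 40% every year, which is the practical meaning of the mismatch.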
What is Big Data
DataScale
The question:
How can we bring a very practical solution to the challenge raised by the
exponential growth of the volume of data to be processed?
DataScale project
Consortium of 9 partners, from large research
laboratories (CEA/DAM, IPGP) to SMEs,
and large companies (Bull)
A two-year project, started in September 2013
Supported by the French government
Selected and funded by
the "Investments for the Future" program
DataScale objective
Design efficient Big Data solutions, suited to real use cases
Technological Solutions
High-Performance Computing
HPC already deals with data sets
from large-scale simulation of physical
phenomena
Enrich / Extend HPC solutions with
specific Big Data technological building
blocks
Building blocks
Efficient data processing (Distributed Mining of Data)
- Distribute, parallelize and deploy the application on the HPC platform
Efficient data management (Mining of Distributed Data)
- Define a hierarchy of data storage (data life cycle, reuse process)
NoSQL DataBase Management System (DBMS) with data mining technologies
- Handle very large data volumes and different types of data
TGCC
Mka3D
CEA Use Cases
A data-driven project
- Evaluation of the relevance of the technological solutions by implementing
demonstrators.
- 3 areas, 4 real-world applications at real scale:
Area | Application | Description
Cluster management (CEA/DSSI) | Monitoring and enhancement of HPC platform | Analysis of HPC log journals with data mining techniques (detection and correlation of failure patterns)
Social Media Monitoring (Linkfluence) | Measuring and reporting daily web activities (companies, user, topic, …) | Analysis of millions of conversations and images (100 countries and 50 languages) through social accounts (e.g. Twitter, Facebook, Google+)
Seismology (IPGP) | Tomography of Europe | Seismic noise correlation of 200 European stations (5 years of records)
Seismology (CEA/DASE) | Event detection | Massive correlation between continuous data stream and event template (Master Event algorithm)
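The event-detection use case, correlating an event template against a continuous data stream, can be sketched as a sliding normalized cross-correlation. This is a minimal single-channel illustration, not the Master Event algorithm itself; the function names and the 0.7 threshold are assumptions, and array beamforming and multi-station association are left out.

```python
import math


def sliding_cc(stream, template):
    """Normalized cross-correlation of template with every stream window."""
    n = len(template)
    mean_t = sum(template) / n
    dt = [x - mean_t for x in template]
    energy_t = math.sqrt(sum(x * x for x in dt))
    trace = []
    for start in range(len(stream) - n + 1):
        window = stream[start:start + n]
        mean_w = sum(window) / n
        dw = [x - mean_w for x in window]
        energy_w = math.sqrt(sum(x * x for x in dw))
        den = energy_t * energy_w
        # A flat window has zero energy; report no correlation there.
        trace.append(sum(a * b for a, b in zip(dt, dw)) / den if den else 0.0)
    return trace


def detect(stream, template, threshold=0.7):
    """Return start indices where the correlation reaches the threshold."""
    return [i for i, c in enumerate(sliding_cc(stream, template)) if c >= threshold]


if __name__ == "__main__":
    template = [1.0, -2.0, 1.0]
    stream = [0.0, 0.0, 0.0, 2.0, -4.0, 2.0, 0.0, 0.0, 0.0]
    print(detect(stream, template))
```

Because the coefficient is normalized, a weak repeat of the master waveform scores as high as a strong one, which is what makes the approach attractive for small events. The direct loop costs template length times stream length, so a real deployment would rely on FFT-based or distributed implementations.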
CEA-PTS Collaboration
Unique data analysis to revise the seismicity:
- of the last 10 years
- at global scale, with a globally distributed network of seismic stations
The IDC high-quality dataset is a natural candidate for an extensive
cross-correlation study:
- continuous seismic data from the primary IMS stations since 2000,
- 450,000 seismic events in the REB,
- tens of millions of raw detections.
Collaboration with IDC teams to:
- enhance the Master Event algorithm (use of station 3CP, association,
synthetic master event, subspacing)
- test and deploy the application on the secure and powerful HPC
infrastructure of the CEA.
Roadmap
Date | Phase | Activities
Sep. 2013 | Kick-Off |
Oct. 2013 | Design | Specification: workflow and NoSQL database
Mar. 2014 | Development | NoSQL DBMS (Armadillo); algorithm enhancement; workflow integration
Sep. 2014 | Test | Deployment; run at reduced scale (3 years, regional network); result analysis
Apr. 2015 | Demonstration | Run at full scale (10 years, global network); result analysis
Aug. 2015 | Assessment | Reflection on integrating the new components into the operational chain
DATASCALE Partners
The DataScale project partners are :
ActiveEon
Armadillo
Bull
CEA (DASE)
CEA (LIST)
CEA (DSSI)
INRIA
IPGP
Linkfluence
CONCLUSION
We are:
Facing a BIG challenge.
Preparing a decisive turn toward a new data management
infrastructure.
Not alone, surrounded by extremely valuable partners.
New approach to nuclear monitoring
Thank you for
your attention!
