The document discusses EDF's use of a data lake and data lab to optimize operations and safety at their nuclear power plants. It describes how EDF is building a Hadoop-based data lake called ESPADON to store sensor and operational data from their 59 nuclear plants. They are developing data science algorithms to analyze the data from the whole fleet to improve maintenance and operations. EDF is also creating a data lab team and architecture to develop analytics and quantify the value of these initiatives.
How to go from zero to data lakes in days - ADB202 - New York AWS Summit | Amazon Web Services
AWS provides the most comprehensive, secure, scalable, and cost-effective portfolio of services for building and managing data lakes. Now with AWS Lake Formation, you can build a secure data lake in days. In this session, learn how Lake Formation makes it simple to discover, catalog, clean, and load your data into a new data lake. Discover how you can easily secure access to that data and analyze it with services like Amazon Athena, Amazon Redshift, and Amazon EMR. Hear about Alcon’s data lake journey to the AWS Cloud and the challenges it overcame for a successful and productive data lake implementation.
Simple icons to assist with technical diagrams covering basic physical and software components of Hadoop architectures.
Download the Visio and Omnigraffle stencils, EPS and HiRes PNGs here: http://bit.ly/17mQJ9k
Have you ever wondered what the relative differences are between two of the more popular open source, in-memory data stores and caches? In this session, we will describe those differences and, more importantly, provide live demonstrations of the key capabilities that could have a major impact on your architectural Java application designs.
One of the most important factors in an organization’s success is its ability to extract actionable information from its data. However, the exponential growth of available data has put numerous operational pressures on IT and storage administrators to effectively ingest, transfer, process, store, back up, and archive that data. AWS offers numerous data transfer and storage services and solutions that can scale with your data growth and help meet security and compliance requirements. Attend this session to learn how to use AWS storage services to manage the entire lifecycle of your data, from ingestion to archive.
What is Talend | Talend Tutorial for Beginners | Talend Online Training | Edu... | Edureka!
( Talend Training: https://www.edureka.co/talend-for-big-data )
This Edureka video on What Is Talend gives you complete insight into what Talend actually is, its various products, and how it is used in the industry.
This video covers the following topics:
1. What Is Talend?
2. Evolution Of Talend
3. Talend Products
4. Use Cases
5. Demo
This slide deck helps you understand how to use the WEKA tool for association rule mining. It gives a brief overview of how to prepare a dataset for use in WEKA and how to visualize the results.
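The support/confidence arithmetic behind the association rules WEKA mines (e.g., with its Apriori implementation) can be illustrated in a few lines. The transactions and item names below are invented for illustration; WEKA would read such data from an ARFF file, but the math is the same:

```python
# Toy market-basket data (invented for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    return support(antecedent | consequent) / support(antecedent)

# Rule {bread} -> {milk}: milk appears in 2 of the 3 baskets containing bread.
print(round(confidence({"bread"}, {"milk"}), 3))  # -> 0.667
```

A miner such as Apriori simply enumerates candidate itemsets and keeps the rules whose support and confidence clear user-chosen thresholds.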
A School ERP System, or enterprise resource planning system, is used to manage and plan the assets and resources of an educational institute.
VISIT: https://www.edujournal.com/school-erp-system/
Cloud Presentation and OpenStack case studies -- Harvard University | Barton George
The presentation walks through the forces affecting IT in higher education today, the value of a cloud brokerage model and case studies of OpenStack-based clouds in higher education. Presented at the Harvard University IT summit.
Amazon QuickSight is a fast, cloud-powered business intelligence (BI) service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data. In this session, we demonstrate how you can point Amazon QuickSight at AWS data stores, flat files, or other third-party data sources and begin visualizing your data in minutes. We also introduce you to SPICE, the Super-fast, Parallel, In-memory Calculation Engine in Amazon QuickSight, which performs advanced calculations and renders visualizations rapidly without requiring any additional infrastructure, SQL programming, or dimensional modeling, so you can seamlessly scale to hundreds of thousands of users and petabytes of data. Lastly, you will see how Amazon QuickSight provides smart visualizations and graphs optimized for your different data types, ensuring the most suitable visualization for your analysis, and how to share these visualization stories using the built-in collaboration tools.
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv | Amazon Web Services
The world is producing an ever increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
Using Apache Hadoop and related technologies as a data warehouse has been an area of interest since the early days of Hadoop. In recent years Hive has made great strides towards enabling data warehousing by expanding its SQL coverage, adding transactions, and enabling sub-second queries with LLAP. But data warehousing requires more than a fully powered SQL engine. Security, governance, data movement, workload management, monitoring, and user tools are required as well. These functions are being addressed by other Apache projects such as Ranger, Atlas, Falcon, Ambari, and Zeppelin. This talk will examine how these projects can be assembled to build a data warehousing solution. It will also discuss features and performance work going on in Hive and the other projects that will enable more data warehousing use cases. These include data ingestion using merge, support for OLAP cubing queries via Hive’s integration with Druid, expanded SQL coverage, replication of data between data warehouses, advanced access control options, data discovery, and user tools to manage, monitor, and query the warehouse.
Hadoop Infrastructure @Uber Past, Present and Future | DataWorks Summit
Uber’s mission is to provide transportation as reliable as running water, and data plays a critical role in fulfilling that mission. At Uber, Hadoop is central to the data infrastructure. We will talk about the journey of Hadoop at Uber and our future plans for scaling to billions of trips. We will cover Uber’s most distinctive use cases, how Hadoop and the ecosystem we built helped us on this journey, how we scaled from 10 to 2,000 nodes, and how we plan to scale to tens of thousands of nodes in the future. We will share our mistakes, learnings, and wins, and describe how we process billions of events per day. We will discuss the unique challenges and real-world use cases involved in co-locating Uber’s service architecture with batch workloads (e.g., data pipelines, machine learning, and analytical workloads). Uber has made many improvements to the Hadoop ecosystem and has solved some of its problems in ways not tried before. This presentation will help the audience use Uber as an example and encourage them to enhance the ecosystem, growing the community around these projects and benefiting the whole big data space. The audience is anybody working on big data who wants to understand how to scale Hadoop and its ecosystem to tens of thousands of nodes. The talk will help them understand the Hadoop ecosystem and how to use it efficiently, and will introduce some of the technologies the Uber team is building in the big data space.
As organizations pursue Big Data initiatives to capture new opportunities for data-driven insights, data governance has become table stakes, both from the perspective of external regulatory compliance and for business value extraction internally within an enterprise. This session will introduce Apache Atlas, a project that was incubated by Hortonworks along with a group of industry leaders across several verticals, including financial services, healthcare, pharma, oil and gas, retail, and insurance, to help address data governance and metadata needs with an open, extensible platform governed under the aegis of the Apache Software Foundation. Apache Atlas empowers organizations to harvest metadata across the data ecosystem and to govern and curate data lakes by applying consistent data classification with a centralized metadata catalog.
In this talk, we will present the underpinnings of the architecture of Apache Atlas and conclude with a tour of its governance capabilities, showcasing various features for open metadata modeling, data classification, and visualizing cross-component lineage and impact. We will also demo how Apache Atlas delivers a complete view of data movement across several analytic engines, such as Apache Hive, Apache Storm, and Apache Kafka, along with capabilities to effectively classify and discover datasets.
A brief presentation for an internship project at BEL on data visualization using Seaborn and Matplotlib.
Some sensitive information has been redacted.
A data lake is a flat data store that collects data in its original form, without the need to enforce a predefined schema. Instead, new schemas or views are created “on demand”, providing a far more agile and flexible architecture while enabling new types of analytical insights. AWS provides many of the building blocks required to help organizations implement a data lake. In this session, we introduce key concepts for a data lake and present aspects related to its implementation. We discuss critical success factors and pitfalls to avoid, as well as operational aspects such as security, governance, search, indexing, and metadata management. We also provide insight into how AWS enables a data lake architecture. Attendees get practical tips and recommendations to get started with their data lake implementations on AWS.
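The “schema on demand” idea can be sketched in a few lines: records land in the lake as raw text exactly as they arrived, and a view imposes structure only at read time. The field names and records below are invented for illustration:

```python
import json

# Raw records stored exactly as they arrived -- no schema enforced on write.
raw_lake = [
    '{"ts": "2021-01-01", "sensor": "t1", "temp_c": 21.5}',
    '{"ts": "2021-01-02", "sensor": "t1", "humidity": 0.4}',
    '{"ts": "2021-01-03", "sensor": "t2", "temp_c": 19.0}',
]

def temperature_view(lines):
    """A schema applied on read: project only the fields this analysis needs."""
    for line in lines:
        rec = json.loads(line)
        if "temp_c" in rec:  # records that don't fit this view are skipped
            yield (rec["ts"], rec["sensor"], rec["temp_c"])

rows = list(temperature_view(raw_lake))
print(rows)
```

A different analysis can define a different view (say, over the humidity fields) against the same untouched raw data, which is what makes the architecture flexible.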
Testing a Big Data application is more about verifying its data processing than testing individual features. It demands a high level of testing skill because the processing is very fast.
Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018 | Amazon Web Services
Gain in-depth knowledge and best practices for migrating commercial data warehouses to Amazon Redshift using AWS Database Migration Service (AWS DMS) and AWS Schema Conversion Tool (AWS SCT). We use an example based on an Oracle data warehouse, and we discuss approaches to migrate it to Amazon Redshift. We also discuss some of the common challenges, limitations, and workarounds, as well as the option of using AWS Snowball to migrate very large data warehouses to Amazon Redshift.
Building the Enterprise Data Lake: A look at architecture | Mark Madsen
The topic is building an Enterprise Data Lake, discussing high-level data and technology architecture. We will describe the architecture of a data warehouse, how a data lake needs to differ, and show a high-level functional and data architecture for a data lake. This webinar will cover:
Why dumping data into Hadoop and letting users get it out doesn't work
The difference between a Hadoop application and a Data Lake
Why new ideas about data architecture are a key element
An Enterprise Data Lake reference architecture to frame what must be built
10 Amazing Things To Do With a Hadoop-Based Data Lake | VMware Tanzu
Greg Chase, Director, Product Marketing, Big Data, presents 10 Amazing Things to do With A Hadoop-based Data Lake at the Strata Conference + Hadoop World 2014 in NYC.
Logistics: transport in commerce | Thomas Malice
Logistics: transport in commerce.
The evolution of transport in international commerce.
A comparison of the different means of international transport.
The use of transport in commercial exchanges.
Case studies: Kiala, TNT Express, DPD, DHL, FNAC
Creative Capital, Information & Communication Technologies, & Economic Growth... | Regional Science Academy
Presentation by Amit Batabyal, Rochester Institute of Technology
Advanced Brainstorm Carrefour (ABC): ‘Smart People in Smart Cities’
Matej Bel University, Banská Bystrica, Slovakia (August, 2016)
How HPC and large-scale data analytics are transforming experimental science | inside-BigData.com
In this deck from DataTech19, Debbie Bard from NERSC presents: Supercomputing and the scientist: How HPC and large-scale data analytics are transforming experimental science.
"Debbie Bard leads the Data Science Engagement Group at NERSC. NERSC is the mission supercomputing center for the US Department of Energy, and supports over 7000 scientists and 700 projects with supercomputing needs. A native of the UK, her career spans research in particle physics, cosmology, and computing on both sides of the Atlantic. She obtained her PhD at Edinburgh University, and worked at Imperial College London as well as the Stanford Linear Accelerator Center (SLAC) in the USA before joining the Data Department at NERSC, where she focuses on data-intensive computing and research, including supercomputing for experimental science and machine learning at scale."
Watch the video: https://wp.me/p3RLHQ-kLV
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Opening Keynote Lecture
15th Annual ON*VECTOR International Photonics Workshop
Calit2’s Qualcomm Institute
University of California, San Diego
February 29, 2016
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...) | Confluent
The Oak Ridge Leadership Computing Facility (OLCF) in the National Center for Computational Sciences (NCCS) division at Oak Ridge National Laboratory (ORNL) houses world-class high-performance computing (HPC) resources and has a history of operating top-ranked supercomputers on the TOP500 list, including the world's current fastest, Summit, an IBM AC922 machine with a peak of 200 petaFLOPS. With the exascale era rapidly approaching, the need for a robust and scalable big data platform for operations data is more important than ever. In the past, when a new HPC resource was added to the facility, pipelines from data sources spanned multiple data sinks, which oftentimes resulted in data silos, slow operational data onboarding, and non-scalable data pipelines for batch processing. Using Apache Kafka as the message bus of the division's new big data platform has allowed for easier decoupling of scalable data pipelines, faster data onboarding, and stream processing, with the goal of continuously improving insight into the HPC resources and their supporting systems. This talk will focus on the NCCS division's transition to Apache Kafka over the past few years to enhance the OLCF's current capabilities and prepare for Frontier, OLCF's future exascale system, including the development and deployment of a full big data platform in a Kubernetes environment from both technical and cultural perspectives. This talk will also cover the mission of the OLCF, the operational data insights related to high-performance computing that the organization strives for, and several use cases that exist in production today.
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political... | Larry Smarr
10.10.11
Presentation by Larry Smarr to the NSF Campus Bridging Workshop
Title: A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political, and Economic
Anaheim, CA
Accelerators at ORNL - Application Readiness, Early Science, and Industry Impact | inside-BigData.com
In this deck from the 2014 HPC User Forum in Seattle, John A. Turner from Oak Ridge National Laboratory presents: Accelerators at ORNL - Application Readiness, Early Science, and Industry Impact.
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch... | Flink Forward
DTW (Dynamic Time Warping) is a well-known method for finding patterns within a time series. It can find a pattern even when the data are distorted, and can be used to detect sales trends, machine-signal defects in industry, patterns in electrocardiograms in medicine, DNA…
Most implementations are very slow, but a very efficient open source implementation (best paper, SIGKDD 2012) exists in C. It can easily be ported to other languages, such as Java, so that it can then be used in Flink.
We present the slight modifications we made so that it can be used with Flink at even greater scale to return the top-k best matches on past or streaming data.
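The core of DTW is a simple dynamic program over all monotone alignments of the two sequences. A minimal, unoptimized O(n·m) sketch (the SIGKDD 2012 implementation adds heavy pruning and early abandoning on top of this recurrence):

```python
def dtw_distance(a, b):
    """Minimum-cost alignment of two sequences, allowing local stretching."""
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j] = best cost of aligning a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # a[i-1] repeats
                                 D[i][j - 1],      # b[j-1] repeats
                                 D[i - 1][j - 1])  # one-to-one match
    return D[n][m]

# The same shape shifted in time aligns perfectly, unlike Euclidean distance.
print(dtw_distance([0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0]))  # -> 0.0
```

Because warping lets one point match several points in the other series, the shifted copies above get distance 0, which is exactly what makes DTW robust to distortion.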
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech... | Databricks
In this session, you will learn how CERN easily applied end-to-end deep learning and analytics pipelines on Apache Spark at scale for High Energy Physics using BigDL and Analytics Zoo open source software running on Intel Xeon-based distributed clusters.
Technical details and development learnings will be shared using an example of topology classification to improve real-time event selection at the Large Hadron Collider experiments. The classifier has demonstrated very good performance figures for efficiency, while also reducing the false positive rate compared to the existing methods. It could be used as a filter to improve the online event selection infrastructure of the LHC experiments, where one could benefit from a more flexible and inclusive selection strategy while reducing the amount of downstream resources wasted in processing false positives.
This is part of CERN’s research on applying Deep Learning and Analytics using open source and industry standard technologies as an alternative to the existing customized rule based methods. We show how we could quickly build and implement distributed deep learning solutions and data pipelines at scale on Apache Spark using Analytics Zoo and BigDL, which are open source frameworks unifying Analytics and AI on Spark with easy to use APIs and development interfaces seamlessly integrated with Big Data Platforms.
4 TeraGrid Sites Have Focal Points:
- SDSC – The Data Place: large-scale and high-performance data analysis/handling; every cluster node is directly attached to the SAN
- NCSA – The Compute Place: large-scale, large-FLOPS computation
- Argonne – The Viz Place: scalable viz walls
- Caltech – The Applications Place: data and FLOPS for applications, especially some of the GriPhyN apps
Specific machine configurations reflect this.
Blue Waters and Resource Management - Now and in the Futureinside-BigData.com
In this presentation from Moabcon 2013, Bill Kramer from NCSA presents: Blue Waters and Resource Management - Now and in the Future.
Watch the video of this presentation: http://insidehpc.com/?p=36343
This is a presentation by Prof. Anne Elster at the International Workshop on Open Source Supercomputing held in conjunction with the 2017 ISC High Performance Computing Conference.
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facilityinside-BigData.com
In this deck from the Swiss HPC Conference, Mark Wilkinson presents: 40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility.
"DiRAC is the integrated supercomputing facility for theoretical modeling and HPC-based research in particle physics, and astrophysics, cosmology, and nuclear physics, all areas in which the UK is world-leading. DiRAC provides a variety of compute resources, matching machine architecture to the algorithm design and requirements of the research problems to be solved. As a single federated Facility, DiRAC allows more effective and efficient use of computing resources, supporting the delivery of the science programs across the STFC research communities. It provides a common training and consultation framework and, crucially, provides critical mass and a coordinating structure for both small- and large-scale cross-discipline science projects, the technical support needed to run and develop a distributed HPC service, and a pool of expertise to support knowledge transfer and industrial partnership projects. The on-going development and sharing of best-practice for the delivery of productive, national HPC services with DiRAC enables STFC researchers to produce world-leading science across the entire STFC science theory program."
Watch the video: https://wp.me/p3RLHQ-k94
Learn more: https://dirac.ac.uk/
and
http://hpcadvisorycouncil.com/events/2019/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Many organizations currently process various types of data in different formats. Most often this data is free-form; as the number of consumers of this data grows, it is imperative that this free-flowing data adhere to a schema. It helps data consumers know what type of data to expect, and shields them from immediate impact if an upstream source changes its format. A uniform schema representation also gives the data pipeline an easy way to integrate and support various systems that use different data formats.
SchemaRegistry is a central repository for storing and evolving schemas. It provides an API and tooling to help developers and users register a schema and consume it without being impacted when the schema changes. Users can tag different schemas and versions, register for notifications of schema changes by version, and more.
In this talk, we will go through the need for a schema registry and schema evolution and showcase the integration with Apache NiFi, Apache Kafka, Apache Storm.
There is increasing need for large-scale recommendation systems. Typical solutions rely on periodically retrained batch algorithms, but for massive amounts of data, training a new model could take hours. This is a problem when the model needs to be more up-to-date. For example, when recommending TV programs while they are being transmitted the model should take into consideration users who watch a program at that time.
The promise of online recommendation systems is fast adaptation to changes, but online machine learning from streams is commonly believed to be more restricted, and hence less accurate, than batch-trained models. Combining batch and online learning could lead to a quickly adapting recommendation system with increased accuracy. However, designing a scalable data system that unites batch and online recommendation algorithms is a challenging task. In this talk we present our experiences in creating such a recommendation engine with Apache Flink and Apache Spark.
Deep learning is not just hype: it outperforms state-of-the-art ML algorithms, one by one. In this talk we show how deep learning can be used to detect anomalies on IoT sensor data streams at high speed using DeepLearning4J on top of different big data engines such as Apache Spark and Apache Flink. Key to this talk is the absence of any large training corpus, since we are using unsupervised machine learning, a domain that current DL research treats step-motherly. As we see in this demo, LSTM networks can learn very complex system behavior, in this case data coming from a physical model simulating bearing vibration data. One drawback of deep learning is that it normally requires a very large labeled training data set. This is particularly interesting because we can show how unsupervised machine learning can be used in conjunction with deep learning: no labeled data set is necessary. We are able to detect anomalies and predict breaking bearings with tenfold confidence. All examples and all code will be made publicly available and open source; only open source components are used.
QE automation for large systems is a great step forward in increasing system reliability. In the big data world, multiple components have to come together to provide end users with business outcomes. This means that QE automation scenarios need to be detailed around actual use cases that cut across components. The system tests can generate large amounts of data on a recurring basis, and verifying it is a tedious job. Given the multiple levels of indirection, false positives outnumber actual defects and are generally wasteful.
At Hortonworks, we designed and implemented Mool, an automated log analysis system using statistical data science and ML. The current work in progress has a batch data pipeline followed by an ensemble ML pipeline that feeds into the recommendation engine. The system identifies the root cause of test failures by correlating failing test cases with current and historical error records across multiple components. It works in unsupervised mode, with no perfect model, stable build, or source-code version to refer to. In addition, the system provides limited recommendations to file new or reopen past tickets, and compares run profiles with past runs.
Improving business performance is never easy! The Natixis Pack is like Rugby. Working together is key to scrum success. Our data journey would undoubtedly have been so much more difficult if we had not made the move together.
This session is the story of how ‘The Natixis Pack’ has driven change in its current IT architecture so that legacy systems can leverage some of the many components in Hortonworks Data Platform in order to improve the performance of business applications. During this session, you will hear:
• How and why the business and IT requirements originated
• How we leverage the platform to fulfill security and production requirements
• How we organize a community to:
o Guard all the players, no one gets left on the ground!
o Use the platform appropriately (not every problem is eligible for Big Data, and standard databases are not dead)
• What are the most usable, the most interesting and the most promising technologies in the Apache Hadoop community
We will finish the story of a successful rugby team with insight into the special skills needed from each player to win the match!
DETAILS
This session is part business, part technical. We will talk about infrastructure, security and project management as well as the industrial usage of Hive, HBase, Kafka, and Spark within an industrial Corporate and Investment Bank environment, framed by regulatory constraints.
HBase has established itself as the backend for many operational and interactive use cases, powering well-known services that support millions of users and thousands of concurrent requests. In terms of features, HBase has come a long way, offering advanced options such as multi-level caching on- and off-heap, pluggable request handling, fast recovery options such as region replicas, table snapshots for data governance, tunable write-ahead logging, and so on. This talk is based on the research for the upcoming second release of the speaker's HBase book, combined with practical experience from medium to large HBase projects around the world. You will learn how to plan for HBase, starting with the selection of matching use cases, through determining the number of servers needed, leading into performance tuning options. There is no reason to be afraid of using HBase, but knowing its basic premises and technical choices will make using it much more successful. You will also learn about many of the new features of HBase up to version 1.3, and where they are applicable.
There has been an explosion of data digitising our physical world – from cameras, environmental sensors and embedded devices, right down to the phones in our pockets. Which means that, now, companies have new ways to transform their businesses – both operationally, and through their products and services – by leveraging this data and applying fresh analytical techniques to make sense of it. But are they ready? The answer is “no” in most cases.
In this session, we’ll be discussing the challenges facing companies trying to embrace the Analytics of Things, and how Teradata has helped customers work through and turn those challenges to their advantage.
In this talk, we present a new distribution of Hadoop, Hops, that can scale the Hadoop Filesystem (HDFS) by 16x, from 70K ops/s to 1.2 million ops/s on Spotify's industrial Hadoop workload. Hops is an open-source distribution of Apache Hadoop that supports distributed metadata for HDFS (HopsFS) and for the ResourceManager in Apache YARN. HopsFS is the first production-grade distributed hierarchical filesystem to store its metadata normalized in an in-memory, shared-nothing database. For YARN, we will discuss optimizations that enable 2x throughput increases for the Capacity scheduler, enabling scalability to clusters with more than 20K nodes. We will discuss the journey of how we reached this milestone, including some of the challenges involved in efficiently and safely mapping hierarchical filesystem metadata state and operations onto a shared-nothing, in-memory database. We will also discuss the key database features needed for extreme scaling, such as multi-partition transactions, partition-pruned index scans, distribution-aware transactions, and the streaming changelog API. Hops (www.hops.io) is Apache-licensed open source and supports a pluggable database backend for distributed metadata, although it currently only supports MySQL Cluster as a backend. Hops opens up new directions for Hadoop when metadata is available for tinkering in a mature relational database.
In high-risk manufacturing industries, regulatory bodies stipulate continuous monitoring and documentation of critical product attributes and process parameters. On the other hand, sensor data coming from production processes can be used to gain deeper insights into optimization potentials. By establishing a central production data lake based on Hadoop and using Talend Data Fabric as a basis for a unified architecture, the German pharmaceutical company HERMES Arzneimittel was able to cater to compliance requirements as well as unlock new business opportunities, enabling use cases like predictive maintenance, predictive quality assurance or open world analytics. Learn how the Talend Data Fabric enabled HERMES Arzneimittel to become data-driven and transform Big Data projects from challenging, hard to maintain hand-coding jobs to repeatable, future-proof integration designs.
Talend Data Fabric combines Talend products into a common set of powerful, easy-to-use tools for any integration style: real-time or batch, big data or master data management, on-premises or in the cloud.
While you could be tempted to assume data is already safe in a single Hadoop cluster, in practice you have to plan for more. Questions like "What happens if the entire datacenter fails?" or "How do I recover into a consistent state of data, so that applications can continue to run?" are not at all trivial to answer for Hadoop. Did you know that HDFS snapshots do not treat open files as immutable? Or that HBase snapshots are executed asynchronously across servers and therefore cannot guarantee atomicity for cross-region updates (which includes tables)? There is no unified and coherent data backup strategy, nor is there tooling available for many of the included components to build such a strategy. The Hadoop distributions largely avoid this topic, as most customers are still in the single-use-case or PoC phase, where data governance as far as backup and disaster recovery (BDR) is concerned is not (yet) important. This talk first introduces you to the overarching issue and difficulties of backup and data safety, looking at each of the many components in Hadoop, including HDFS, HBase, YARN, Oozie, the management components, and so on, and finally shows you a viable approach using built-in tools. You will also learn not to take this topic lightheartedly and what is needed to implement and guarantee continuous operation of Hadoop cluster-based solutions.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
A look at the key trends across hardware, cloud, and open source: exploring how these areas are likely to mature and develop over the short and long term, and considering how organisations can position themselves to adapt and thrive.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I was wondering, as an "infrastructure container Kubernetes guy", how does this fancy AI technology get managed from an infrastructure operations view? Is it possible to apply our lovely cloud-native principles as well? What benefits could the two technologies bring to each other?
Let me take these questions and guide you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and what could be beneficial or limiting for your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet
1. A Data Lake and a Data Lab to Optimize
Operations and Safety Within a Nuclear Fleet
Hadoop Summit 2016, San José, June 30th
Marie-Luce PICARD, EDF R&D – marie-luce.picard@edf.fr
Jean-Marc RANGOD, EDF-DPNT
Christophe SALPERWYCK, EDF R&D
Special thanks to Raphaël QUERCIA EDF-DTG, Carole MAI and Amandine PIERROT EDF R&D
2. 2
Outline
1. A FEW WORDS ABOUT EDF
2. CONTEXT AND OBJECTIVES
3. A DATA LAKE FOR A NUCLEAR FLEET
4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS
5. A DATA LAB IN PROGRESS
6. AS A CONCLUSION
Brice Richard - Flickr
KC Tan Photography - Flickr
3. 3
Outline
1. A FEW WORDS ABOUT EDF
2. CONTEXT AND OBJECTIVES
3. A DATA LAKE FOR A NUCLEAR FLEET
4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS
5. A DATA LAB IN PROGRESS
6. AS A CONCLUSION
Brice Richard - Flickr
KC Tan Photography - Flickr
4. 4
ELECTRICITY GENERATION
623.5 TWH
All electricity-related activities
Generation
Transmission & Distribution
Trading and Sales & Marketing
Energy services
Key figures*
€72.9 billion in sales
38.5 million customers
158,161 employees worldwide
84.7% of generation does not emit CO2
2014 INVESTMENTS
€4.5 BILLION
EDF: A GLOBAL LEADER IN ELECTRICITY
*as of 2015
EDF :
AN EFFICIENT,
RESPONSIBLE
ELECTRICITY COMPANY
AND THE CHAMPION
OF LOW-CARBON
GROWTH
5. WORLD’S LEADING OPERATOR, EXCELLENT
PERFORMANCE IN FRANCE
72.9 GW installed capacity, 54% of the Group’s net generation
capacity
477.7 TWh generated, 77% of the Group’s output
58 reactors operated in France,
15 in the UK
3 EPR under construction:
— 1 in Flamanville (France)
— 2 in Taishan (China)
2 EPR in project phase
OSART safety audit
17 best practices identified by IAEA
France
Best generation performance for six years
UK
World record for safety in the workplace
China
Strengthened cooperation agreement with CNNC
NUCLEAR
EDF 2015 I P.5
8. EDF LAB PARIS-SACLAY
Scientific partnerships with actors of Paris-Saclay
8 research departments
Exceptional buildings
4 outstanding test halls
Unique equipment, innovative communication tools
Diverse areas of expertise
1500 work stations
Plenty of collaborative spaces
9. 9
Main Big Data related challenges for EDF
Power Generation
Process monitoring and condition-based maintenance
from sensors
Power generation forecasting for renewables
Energy management
Load forecasting
Balancing and optimizing generation and consumption
(using smart metering information, including
renewables)
Electrical networks
Smart Grid operations (local)
Condition-based maintenance
Customers and sales
New services to customers using smart-metering data
Smart Homes, Smart Building, Smart Cities management
related to energy
10. 10
Outline
1. A FEW WORDS ABOUT EDF
2. CONTEXT AND OBJECTIVES
3. A DATA LAKE FOR A NUCLEAR FLEET
4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS
5. A DATA LAB IN PROGRESS
6. AS A CONCLUSION
Brice Richard - Flickr
KC Tan Photography - Flickr
11. 11
Operations and maintenance of the nuclear fleet
The maintenance policy of EDF's generation fleet is optimized to ensure the reliability and safety of equipment and systems while strengthening our competitiveness:
Have better diagnosis, improved performance and availability
Make better use of data and documents, so far stored in data silos
More globally, the IT teams and projects aim to:
Strengthen the performance of operations and maintenance through a global fleet approach
Simplify the Industrial Information System architecture
Improve and develop the way we use our data
Accumulate and archive data over time
… while reducing costs
12. 12
Voluminous and heterogeneous data …. stored in data silos
Source : Wikipedia
One DB per nuclear site, gathering data from sensors. Use of data historians.
Focus on data:
High volume:
data is stored up to 40-60 years (lifetime of the plant)
SCADA data can be sampled every 20 to 40 ms (but mainly a few
seconds)
Around 10,000 sensors per plant
Variety:
Data is heterogeneous
Time series, images, documents
Various data sources
The current systems (historians) do not allow many concurrent accesses, and their SLAs are quite poor
14. 14
Outline
1. A FEW WORDS ABOUT EDF
2. CONTEXT AND OBJECTIVES
3. A DATA LAKE FOR A NUCLEAR FLEET
4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS
5. A DATA LAB IN PROGRESS
6. AS A CONCLUSION
Brice Richard - Flickr
KC Tan Photography - Flickr
16. 16
Zoom on data
4 generations of plants, but high level of normalization of data and sensors (for
example, use of trigrams for identification of elementary systems)
Two main types of sensors: ANA (analog measurements) and TOR (binary state events)
Time series
Volume
For the POC, 10 plants, 2 years: about 20 billion points
Target (59 plants): 15 TB of data (all plants, whole lifecycle)
Metric (global)  Date  Value  Quality
BU2ABP177MT- 2015-04-30T22:05:00.000Z 156.6 Good/M
BU2ABP177MT- 2015-04-30T22:06:00.000Z 156.4 Good/M
BU2ABP177MT- 2015-04-30T22:07:00.000Z 156.2 Good/M
BU2ABP177MT- 2015-04-30T22:08:00.000Z 156.0 Good
BU2ABP177MT- 2015-04-30T22:09:00.000Z 156.2 Good/M
BU2ABP177MT- 2015-04-30T22:10:00.000Z 156.4 Good/M
BU2ABP177MT- 2015-04-30T22:12:00.000Z 156.7 Good/M
BU2ABP177MT- 2015-04-30T22:14:00.000Z 157.1 Good
BU2ABP177MT- 2015-04-30T22:15:00.000Z 157.3 Good
BU2ABP177MT- 2015-04-30T22:16:00.000Z 157.5 Good
BU2ABP177MT- 2015-04-30T22:19:00.000Z 157.3 Good/M
BU2ABP177MT- 2015-04-30T22:20:00.000Z 157.1 Good/M
BU2ABP177MT- 2015-04-30T22:21:00.000Z 157.3 Good/M
BU2ABP177MT- 2015-04-30T22:22:00.000Z 157.1 Good/M
BU2ABP177MT- 2015-04-30T22:24:00.000Z 156.9 Good/M
BU2ABP177MT- 2015-04-30T22:27:00.000Z 157.1 Good/M
BU2ABP177MT- 2015-04-30T22:28:00.000Z 157.3 Good/M
BU2ABP177MT- 2015-04-30T22:29:00.000Z 157.5 Good/M
BU2ABP177MT- 2015-04-30T22:30:00.000Z 157.7 Good/M
17. 17
Data model
Use of HBASE and PHOENIX
Distributed key/value store
Allows model updates (evolving normalization requirements, new indicators, new plants)
Phoenix for SQL compliance + BI tools
Tables
3 tables: DDT, ANA, TOR
Rowkey: <sensorid, timestamp> (queries mainly consider one or several sensors over a period of time)
Sequential storage; split into HFiles and HRegions according to the plant unit
ANA table:
Key: m (concat(metriqueid, timestamp))
ColumnFamily: 0
Column v -> H_ValeurANA (Phoenix type: Float)
Column q -> H_QualitéANA (Char(10))
Column n -> H_NiveauxANA (Varchar(10))
TOR table:
Key: m (concat(metriqueid, timestamp))
ColumnFamily: 0
Column v -> H_ValeurTOR (Varchar(10))
Column q -> H_QualiteTOR (Char(10))
Column n -> H_NiveauxTOR (Varchar(10))
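The &lt;sensorid, timestamp&gt; rowkey above can be sketched in a few lines of Python; the fixed sensor-id width, the helper name, and the binary layout are illustrative assumptions, not EDF's actual encoding.

```python
import struct
from datetime import datetime, timezone

SENSOR_ID_WIDTH = 16  # assumed fixed width so that keys sort correctly

def make_rowkey(sensor_id: str, ts: datetime) -> bytes:
    """Build an HBase rowkey as <sensorid, timestamp>.

    Padding the sensor id to a fixed width keeps all points of one sensor
    contiguous, and the big-endian epoch-millis suffix keeps them in time
    order -- exactly what range scans over "one sensor, one period" need.
    """
    sid = sensor_id.ljust(SENSOR_ID_WIDTH).encode("ascii")
    millis = int(ts.timestamp() * 1000)
    return sid + struct.pack(">q", millis)  # 8-byte big-endian timestamp

key = make_rowkey("BU2ABP177MT-", datetime(2015, 4, 30, 22, 5, tzinfo=timezone.utc))
```

With this layout, a query for one sensor over one month becomes a single contiguous scan between two such keys.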
18. 18
Validation and performance evaluation
POC validation
Upload of historical data; queries / analyses
Existing functions: viz, reports, services
Data injection: SCADA for the whole fleet,
integration of other sources of data
Results
6 weeks (estimated) needed to upload historical data
from 59 plants
Queries for validating the model :
Use of JMeter to simulate load
With or without insertion workload
~ < 1 second for drawing a curve for a selected month
Integration of an existing GUI for viz (completed within a few days)
Validation of specific calculation within reports
ODBC link for specific e-monitoring application
Integration of various sources of (structured) data into
the data lake
‘Real-time’ insertion of data (micro-batch):
Up to 2M points / s
Very low latency between insertion and availability (< 10s)
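The micro-batch insertion path above can be sketched as a simple batching loop; the batch size, the point layout, and the upsert callback (which would wrap a Phoenix UPSERT in practice) are illustrative assumptions, not EDF's code:

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Point = Tuple[str, int, float]  # (sensor_id, epoch_ms, value) -- assumed layout

def micro_batches(points: Iterable[Point], size: int) -> Iterator[List[Point]]:
    """Group an unbounded point stream into fixed-size micro-batches."""
    batch: List[Point] = []
    for p in points:
        batch.append(p)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the tail so no point waits indefinitely
        yield batch

def ingest(points: Iterable[Point],
           upsert: Callable[[List[Point]], None],
           size: int = 10_000) -> int:
    """Send each micro-batch to the store; returns the number of points written."""
    written = 0
    for batch in micro_batches(points, size):
        upsert(batch)
        written += len(batch)
    return written
```

Batching amortizes the per-request cost, which is how throughputs in the millions of points per second become reachable while keeping insertion-to-availability latency low.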
SELECT
MIN(v), MAX(v),
FIRST_VALUE(v) WITHIN GROUP (ORDER BY ts ASC),
LAST_VALUE(v) WITHIN GROUP (ORDER BY ts ASC),
TO_CHAR(ts, 'dd') as day,
TO_CHAR(ts, 'HH') as hour,
TO_CHAR(ts, 'mm') as minute,
count(*) as cnt
FROM
ORLI_ANA
WHERE
m = ? AND
ts > CURRENT_TIME() - 1 AND -- last 24 h
ts < CURRENT_TIME()
GROUP BY
day, hour, minute
Phoenix query (ANA)
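As a sanity check, the per-minute rollup that this query computes (min, max, first, last, and count per bucket) can be reproduced in plain Python on a small sample; this is an illustrative re-implementation for validation, not part of the EDF pipeline.

```python
from collections import OrderedDict

def minute_rollup(rows):
    """rows: iterable of (epoch_ms, value), assumed time-ordered as in a table scan.

    Returns {minute_bucket_ms: (min, max, first, last, count)} -- the same
    aggregates the Phoenix query groups by day/hour/minute.
    """
    out = OrderedDict()
    for ts, v in rows:
        bucket = ts - ts % 60_000  # truncate the timestamp to the minute
        if bucket not in out:
            out[bucket] = (v, v, v, v, 1)
        else:
            mn, mx, first, _, cnt = out[bucket]
            out[bucket] = (min(mn, v), max(mx, v), first, v, cnt + 1)
    return out
```

Comparing such a reference rollup against the query output is one cheap way to validate the data model after a bulk upload.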
19. 19
Outline
1. A FEW WORDS ABOUT EDF
2. CONTEXT AND OBJECTIVES
3. A DATA LAKE FOR A NUCLEAR FLEET
4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS
5. A DATA LAB IN PROGRESS
6. AS A CONCLUSION
Brice Richard - Flickr
KC Tan Photography - Flickr
20. 20
Added value of data science algorithms on heterogeneous data:
Operations and maintenance can be better optimized through data analytics run on
data coming from the whole fleet
Active and reactive power are indicators of constraints on alternators: they affect their wear
• ~ 50 plants
• 20 years of data
• 10 min interval data
• Phoenix queries allow selecting plants and time periods
• Compute and show reactive power per day or per hour of the
day
• More detailed analysis
• Fleet level analysis
• Interactive queries
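The per-hour-of-day view of reactive power described above can be sketched as a small aggregation over 10-minute samples; the sample layout and function name are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

def mean_by_hour_of_day(samples):
    """samples: iterable of (epoch_s, reactive_power) at ~10-min intervals.

    Returns {hour_of_day: mean value}, i.e. a daily profile of the
    constraint indicator, computable per plant or over the whole fleet.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, q in samples:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        sums[hour] += q
        counts[hour] += 1
    return {h: sums[h] / counts[h] for h in sums}
```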
21. 21
Added value of data science algorithms on heterogeneous data:
Operations and maintenance can be better optimized through data analytics run on
data coming from the whole fleet
Monitoring and control of contractual agreements when network frequency
varies (plants have to contribute to the global balance)
• Pattern matching
• Response time for different plants
• Different levels of analysis : by plant, by
generation, global
• Generic approach implemented for any
kind of patterns
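A generic pattern-matching pass of the kind described above can be sketched as a sliding-window distance scan returning the top-K best matches; the Euclidean distance metric and the helper name are illustrative choices, not EDF's implementation.

```python
import heapq
import math

def top_k_matches(series, pattern, k=3):
    """Slide `pattern` over `series` and return the k best (distance, offset)
    pairs, smallest Euclidean distance first.

    The offset of a match can then be compared across plants, e.g. to
    measure each plant's response time to a frequency event.
    """
    m = len(pattern)
    scored = []
    for i in range(len(series) - m + 1):
        window = series[i:i + m]
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, pattern)))
        scored.append((d, i))
    return heapq.nsmallest(k, scored)
```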
22. 22
Added value of data science algorithms on heterogeneous data
Prediction of plant cooling according to the quality of the water entering the plants
• Correlations?
• According to the plants
• Use of GAM models
• Integration of two internal sources +
external data
• Better understanding
• // Work in progress //
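A GAM fits the target as a sum of smooth functions of each predictor. As a crude stand-in for illustrating that additive structure (plain per-feature polynomial bases and least squares instead of penalized splines; the actual study would use proper GAM tooling), one might write:

```python
import numpy as np

def fit_additive_model(X, y, degree=3):
    """Fit y ≈ intercept + sum_j f_j(x_j), each f_j a degree-`degree`
    polynomial: a rough, unpenalized approximation of a GAM.
    Returns a predict(X_new) function."""
    n, p = X.shape

    def design(M):
        cols = [np.ones((M.shape[0], 1))]
        for j in range(p):
            for d in range(1, degree + 1):
                cols.append(M[:, j:j + 1] ** d)
        return np.hstack(cols)

    coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return lambda X_new: design(X_new) @ coef
```

The additive decomposition is what makes GAM-style models attractive here: the fitted per-predictor curves show how each water-quality variable relates to cooling performance.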
23. 23
Integration of data science and visualization: architecture
Hadoop cluster ↔ REST web service (VM) ↔ browser
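A minimal sketch of the REST layer sitting between the browser and the cluster, using only the standard library and a hypothetical in-memory back end. In the real architecture the service running on the VM would forward queries to Phoenix on the Hadoop cluster; the route name and fake data here are assumptions.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory stand-in for the Phoenix/Hadoop back end.
FAKE_RESULTS = {"ORLI_ANA": [{"day": "01", "hour": "12", "cnt": 6}]}

class QueryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /query/ORLI_ANA -> JSON rows for that table
        table = self.path.rsplit("/", 1)[-1]
        body = json.dumps(FAKE_RESULTS.get(table, [])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

def serve(port=0):
    """Start the service on 127.0.0.1 (port 0 = ephemeral) in a thread."""
    server = HTTPServer(("127.0.0.1", port), QueryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Serving JSON over HTTP keeps the browser-side visualization decoupled from whatever query engine runs behind the service.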
24. 24
Integration of data science: a global approach
Pre-processing: data quality, sampling, synchronization, …
Selection and queries: threshold, pattern matching, period of time, …
Analysis and data science: reporting, exploratory analysis (distribution …), modelling, …
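The three stages above can be composed as a simple function pipeline. The structure is the point here; the stage names mirror the slide, while the implementations are placeholders invented for illustration.

```python
def run_pipeline(series, stages):
    """Pass a time series through pre-processing, selection and analysis
    stages in order; each stage is a function series -> series/result."""
    result = series
    for stage in stages:
        result = stage(result)
    return result

# Placeholder stages mirroring the slide's three blocks.
def drop_missing(series):          # pre-processing: data quality
    return [(t, v) for t, v in series if v is not None]

def above_threshold(limit):        # selection and queries: threshold
    return lambda series: [(t, v) for t, v in series if v > limit]

def summary(series):               # analysis: basic reporting
    values = [v for _, v in series]
    return {"count": len(values), "mean": sum(values) / len(values)}
```

Keeping each stage a plain function makes it easy to swap, say, threshold selection for pattern matching without touching the rest of the chain.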
25. 25
Outline
1. A FEW WORDS ABOUT EDF
2. CONTEXT AND OBJECTIVES
3. A DATA LAKE FOR A NUCLEAR FLEET
4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS
5. A DATA LAB IN PROGRESS
6. AS A CONCLUSION
26. 26
A Data Lab in progress: a team, an approach …
… and some questions
Objectives:
Bring value from data analytics
Issues:
Skills and organization (between entities)
Architecture:
Operational Hadoop cluster and loads (use of a multitenant enterprise cluster)
Other loads (data science)
Data preparation within Hadoop + an edge machine for data science (Spark, R, Python)
How to quantify value
Development costs and maintenance
How to industrialize
Source: Xebia
27. 27
Outline
1. A FEW WORDS ABOUT EDF
2. CONTEXT AND OBJECTIVES
3. A DATA LAKE FOR A NUCLEAR FLEET
4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS
5. A DATA LAB IN PROGRESS
6. AS A CONCLUSION
28. 28
Takeaways
A Data Lake for our nuclear fleet
In progress: industrialization and decommissioning of Historian applications
Significant reduction of licensing costs
A Data Lab under construction
POCs showing the added value of data science algorithms, e.g. predictive maintenance
In the context of fleet renovation for plant life extension (major overhaul program): operations & maintenance, generation costs optimization
Remaining issues: skills, organization, technical architecture, quantifying value
Perspectives and technical issues:
Data lakes and labs for other fleets (thermal plants, hydro, renewables)
Scalable time-series analytics (synchronization, missing data …)
Handling heterogeneous data (textual, images, graphs …)
IoT platform
29. References
A proof of concept with Hadoop: storage and analytics of electrical time-series. Marie-Luce Picard, Bruno Jacquin, Hadoop Summit 2012, California, USA, June 2012: http://www.slideshare.net/Hadoop_Summit/proof-of-concent-with-hadoop
Massive Smart Meter Data Storage and Processing on top of Hadoop. Leeley D. P. dos Santos, Alzennyr G. da Silva, Bruno Jacquin, Marie-Luce Picard, David Worms, Charles Bernard. Workshop Big Data 2012, VLDB Conference (Very Large Data Bases), Istanbul, Turkey, 2012: http://www.cse.buffalo.edu/faculty/tkosar/bigdata2012/program.php
Searching time-series with Hadoop in an electric power company. Alice Bérard, Georges Hébrail, BigMine Workshop, KDD 2013, Chicago, August 2013: http://bigdata-mining.org/
Real-time energy data-analytics with Storm. Rémy Saissy, Marie-Luce Picard, Charles Bernard, Bruno Jacquin, Simon Maby, Benoît Grossin, Hadoop Summit 2014, California, USA, June 2014: http://fr.slideshare.net/Hadoop_Summit/t-525p212picard
Computing Data Quality Indicators on Big Data Stream Using a CEP. Wenlu Yang, Alzennyr Gomes Da Silva, Marie-Luce Picard, IEEE Xplore - IWCIM 2015, Prague, November 2015.
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Network. Guillaume Germaine, Thomas Vial, Hadoop Summit Europe 2016, Dublin: http://www.slideshare.net/HadoopSummit/exploring-titan-and-spark-graphx-for-analyzing-timevarying-electrical-networks
Editor's Notes
Nuclear energy supplies competitive, carbon-free electricity that we generate in the best possible safety conditions.
In 2014, the International Atomic Energy Agency conducted an audit on how nuclear safety is integrated into the organisation and processes of our central departments: the IAEA found no departure from its standards and identified 17 best practices.
→ In France, we achieved our best performance in six years thanks to our management of scheduled shutdowns: the average length of extensions was halved. Wintertime fleet availability topped 90%. Our annual output was up 3% (415.9 TWh).
• The principle of the “Grand Carénage” maintenance programme was approved. The programme involves renovating the French nuclear fleet over a 10-year period in order to extend its operating life beyond 40 years if all conditions are met. The investment is put at €55 billion for the entire fleet.
• The Flamanville EPR worksite is continuing; it is the first nuclear plant to be built in France in 15 years.
→ In the UK, output was good (56.3 TWh) despite the unscheduled shutdown of two plants. EDF Energy established a world record for safety in the workplace (0.98 accidents requiring more than one day of lost time per million hours worked by employees and subcontractors).
• The Hinkley Point C project to build two EPR in Somerset took a major step forward: in October, the European Commission approved the main terms of the agreements concluded with the British government.
→ In China, through partnerships, we are taking good advantage of the expertise we have acquired in the design, construction, operation and maintenance of our nuclear fleet.
• Construction of two 1,750 MW EPR in Taishan (EDF 30% in partnership with CGN) is ongoing.
• We signed an agreement to strengthen cooperation in engineering, operation and maintenance with CNNC, China’s largest state-owned nuclear company.