
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet


  1. 1. A Data Lake and a Data Lab to Optimize Operations and Safety Within a Nuclear Fleet. Hadoop Summit 2016, San José, June 30th. Marie-Luce PICARD, EDF R&D (marie-luce.picard@edf.fr); Jean-Marc RANGOD, EDF-DPNT; Christophe SALPERWYCK, EDF R&D. Special thanks to Raphaël QUERCIA (EDF-DTG), Carole MAI and Amandine PIERROT (EDF R&D).
  2. 2. Outline: 1. A FEW WORDS ABOUT EDF; 2. CONTEXT AND OBJECTIVES; 3. A DATA LAKE FOR A NUCLEAR FLEET; 4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS; 5. A DATA LAB IN PROGRESS; 6. AS A CONCLUSION. Photo credits: Brice Richard (Flickr), KC Tan Photography (Flickr).
  4. 4. EDF: a global leader in electricity. An efficient, responsible electricity company and the champion of low-carbon growth.
       • All electricity-related activities: generation; transmission and distribution; trading, sales and marketing; energy services
       • Electricity generation: 623.5 TWh
       • 2014 investments: €4.5 billion
       • Key figures (as of 2015): €72.9 billion in sales; 38.5 million customers; 158,161 employees worldwide; 84.7% of generation does not emit CO2
  5. 5. Nuclear: world's leading operator, excellent performance in France (EDF, 2015)
       • 72.9 GW installed capacity, 54% of the Group's net generation capacity
       • 477.7 TWh generated, 77% of the Group's output
       • 58 reactors operated in France, 15 in the UK
       • 3 EPR under construction: 1 in Flamanville (France), 2 in Taishan (China)
       • 2 EPR in project phase
       • OSART safety audit: 17 best practices identified by IAEA
       • France: best generation performance for six years
       • UK: world record for safety in the workplace
       • China: strengthened cooperation agreement with CNNC
  6. 6. R&D KEY FIGURES
  7. 7. EDF Lab Paris-Saclay: scientific partnerships with actors of Paris-Saclay; 8 research departments; 4 exceptional buildings; 1 outstanding test hall; unique equipment and innovative communication tools; diverse areas of expertise; 1500 work stations; plenty of collaborative spaces.
  8. 8. Main Big Data related challenges for EDF
       • Power generation: process monitoring and condition-based maintenance from sensors; power generation forecasting for renewables
       • Energy management: load forecasting; balancing and optimizing generation and consumption (using smart metering information, including renewables)
       • Electrical networks: Smart Grid operations (local); condition-based maintenance
       • Customers and sales: new services to customers using smart-metering data; Smart Homes, Smart Buildings and Smart Cities management related to energy
  10. 10. Operations and maintenance of the nuclear fleet
       • The maintenance policy of the EDF generation fleet is optimized to ensure the reliability and safety of equipment and systems while strengthening our competitiveness:
         • Achieve better diagnosis, improved performance and availability
         • Make better use of data and documents, so far stored in data silos
       • More globally, the IT teams and projects aim to:
         • Strengthen the performance of operations and maintenance through a global fleet approach
         • Simplify the Industrial Information System architecture
         • Improve and develop the way we use our data
         • Accumulate and archive data through time
         … while reducing costs
  11. 11. Voluminous and heterogeneous data … stored in data silos
       • One DB per nuclear site, gathering data from sensors. Use of Data Historians. (Image source: Wikipedia)
       • High volume:
         • Data is stored for up to 40-60 years (lifetime of the plant)
         • SCADA data can be sampled every 20 to 40 ms (but mostly every few seconds)
         • Around 10,000 sensors per plant
       • Variety:
         • Data is heterogeneous: time series, images, documents
         • Various data sources
       • The current systems (historians) do not allow many concurrent accesses, and their SLAs are poor
  12. 12. A Data Lake for the nuclear fleet. ESPADON: the Data Lake for the nuclear fleet. Before: one DB per nuclear site, gathering data from sensors, using Data Historians (image source: Wikipedia). Photo: © M. Caraveo, Hadoop cluster, NOE data center.
  14. 14. A data lake for the nuclear fleet: big picture. [Diagram] Data sources: Historian/SCADA, files (chemical information), files (dosimetry), … Storage: Hadoop cluster, ESPADON Data Lake. Uses: e-monitoring application, visualization, interactive queries and reporting, web service, reports. Photo: © M. Caraveo, Hadoop cluster, NOE data center.
  15. 15. Zoom on data
       • 4 generations of plants, but a high level of normalization of data and sensors (for example, use of trigrams to identify elementary systems)
       • Two main types of sensors: ANA (for analog) and TOR (for state events)
       • Time series
       • Volume:
         • For the POC, 10 plants over 2 years: about 20 billion points
         • Target (59 plants): 15 TB of data (all plants, whole lifecycle)

       Metric, global   Date                       Value   Quality
       BU2ABP177MT-     2015-04-30T22:05:00.000Z   156.6   Good/M
       BU2ABP177MT-     2015-04-30T22:06:00.000Z   156.4   Good/M
       BU2ABP177MT-     2015-04-30T22:07:00.000Z   156.2   Good/M
       BU2ABP177MT-     2015-04-30T22:08:00.000Z   156.0   Good
       BU2ABP177MT-     2015-04-30T22:09:00.000Z   156.2   Good/M
       BU2ABP177MT-     2015-04-30T22:10:00.000Z   156.4   Good/M
       BU2ABP177MT-     2015-04-30T22:12:00.000Z   156.7   Good/M
       BU2ABP177MT-     2015-04-30T22:14:00.000Z   157.1   Good
       BU2ABP177MT-     2015-04-30T22:15:00.000Z   157.3   Good
       BU2ABP177MT-     2015-04-30T22:16:00.000Z   157.5   Good
       BU2ABP177MT-     2015-04-30T22:19:00.000Z   157.3   Good/M
       BU2ABP177MT-     2015-04-30T22:20:00.000Z   157.1   Good/M
       BU2ABP177MT-     2015-04-30T22:21:00.000Z   157.3   Good/M
       BU2ABP177MT-     2015-04-30T22:22:00.000Z   157.1   Good/M
       BU2ABP177MT-     2015-04-30T22:24:00.000Z   156.9   Good/M
       BU2ABP177MT-     2015-04-30T22:27:00.000Z   157.1   Good/M
       BU2ABP177MT-     2015-04-30T22:28:00.000Z   157.3   Good/M
       BU2ABP177MT-     2015-04-30T22:29:00.000Z   157.5   Good/M
       BU2ABP177MT-     2015-04-30T22:30:00.000Z   157.7   Good/M
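The sample records on this slide have four whitespace-separated fields (metric, ISO timestamp, value, quality), and some minutes are missing. As a minimal sketch, assuming that layout and a nominal one-minute step (the step is an assumption, not stated on the slide), the rows could be parsed and the gaps listed as follows. The function names `parse_row` and `find_gaps` are hypothetical.

```python
from datetime import datetime, timedelta

# A few rows copied from the slide's sample table.
SAMPLE = """\
BU2ABP177MT- 2015-04-30T22:09:00.000Z 156.2 Good/M
BU2ABP177MT- 2015-04-30T22:10:00.000Z 156.4 Good/M
BU2ABP177MT- 2015-04-30T22:12:00.000Z 156.7 Good/M
BU2ABP177MT- 2015-04-30T22:14:00.000Z 157.1 Good
BU2ABP177MT- 2015-04-30T22:15:00.000Z 157.3 Good
"""


def parse_row(line):
    """Split one record into (metric, timestamp, value, quality)."""
    metric, ts, value, quality = line.split()
    when = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return metric, when, float(value), quality


def find_gaps(rows, step=timedelta(minutes=1)):
    """Return (previous, current) timestamp pairs that are more than `step` apart.

    The one-minute default is an assumption made for this illustration.
    """
    gaps = []
    for (_, prev, _, _), (_, cur, _, _) in zip(rows, rows[1:]):
        if cur - prev > step:
            gaps.append((prev, cur))
    return gaps


if __name__ == "__main__":
    rows = [parse_row(line) for line in SAMPLE.splitlines()]
    for prev, cur in find_gaps(rows):
        print("gap between", prev.time(), "and", cur.time())
```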
  16. 16. Data model
       • Use of HBase and Phoenix
         • Distributed key/value store
         • Allows the model to be updated (evolving normalization requirements, new indicators, … new plants)
         • Phoenix for SQL compliance and BI tools
       • Tables
         • 3 tables: DDT, ANA, TOR
         • Row key: <sensorid, timestamp> (queries mainly consider one or several sensors over a period of time)
         • Sequential storage; split into HFiles and HRegions by plant unit

       ANA table
       Key                                 Column family   Column   Value          Phoenix type
       m (concat(metriqueid, timestamp))   0               v        H_ValeurANA    Float
                                                           q        H_QualitéANA   Char(10)
                                                           n        H_NiveauxANA   Varchar(10)

       TOR table
       Key                                 Column family   Column   Value          Phoenix type
       m (concat(metriqueid, timestamp))   0               v        H_ValeurTOR    Varchar(10)
                                                           q        H_QualiteTOR   Char(10)
                                                           n        H_NiveauxTOR   Varchar(10)
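To illustrate why a <sensorid, timestamp> row key suits queries on one sensor over a period of time, here is a minimal sketch in plain Python. The byte layout is an assumption chosen for the illustration (ASCII sensor id, a zero-byte separator, then a big-endian 64-bit timestamp in milliseconds); it is not Phoenix's actual serialization, and the slides do not describe the encoding used in ESPADON. The second sensor id and the helper names `row_key` and `scan` are hypothetical.

```python
import struct
from bisect import bisect_left


def row_key(sensor_id: str, ts_ms: int) -> bytes:
    """Build a key whose byte order matches (sensor, time) order.

    Illustrative layout only: ASCII id + zero byte + unsigned big-endian timestamp.
    """
    return sensor_id.encode("ascii") + b"\x00" + struct.pack(">Q", ts_ms)


def scan(sorted_keys, sensor_id, start_ms, end_ms):
    """Return keys of one sensor with start_ms <= timestamp < end_ms.

    Because keys are sorted, the result is one contiguous slice,
    which mimics a range scan over a sorted key/value store.
    """
    lo = bisect_left(sorted_keys, row_key(sensor_id, start_ms))
    hi = bisect_left(sorted_keys, row_key(sensor_id, end_ms))
    return sorted_keys[lo:hi]


if __name__ == "__main__":
    sensors = ("BU2ABP177MT-", "BU2ABP178MT-")  # second id is made up
    keys = sorted(row_key(s, t) for s in sensors for t in (1000, 2000, 3000, 70000))
    for key in scan(keys, "BU2ABP177MT-", 2000, 70000):
        print(key)
```

With this layout all points of a sensor are stored next to each other in time order, so a query for a period of time reads a single contiguous range.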
  17. 17. Validation and performance evaluation
       • POC validation
         • Upload of historical data; queries and analyses
         • Existing functions: visualization, reports, services
         • Data injection: SCADA for the whole fleet, integration of other sources of data
       • Results
         • 6 weeks (estimated) needed to upload historical data from 59 plants
         • Queries for validating the model:
           • Use of JMeter to simulate load
           • With or without insertion workload
           • Under about 1 second to draw a curve for a selected month
         • Integration of an existing GUI for visualization (done within a few days)
         • Validation of specific calculations within reports
         • ODBC link for a specific e-monitoring application
         • Integration of various sources of (structured) data into the data lake
         • 'Real-time' insertion of data (micro-batch):
           • Up to 2M points/s
           • Very low latency between insertion and availability (< 10 s)

       Phoenix query (ANA):

       SELECT MIN(v), MAX(v),
              FIRST_VALUE(v) WITHIN GROUP (ORDER BY ts ASC),
              LAST_VALUE(v) WITHIN GROUP (ORDER BY ts ASC),
              TO_CHAR(ts, 'dd') AS day,
              TO_CHAR(ts, 'HH') AS hour,
              TO_CHAR(ts, 'mm') AS minute,
              COUNT(*) AS cnt
       FROM ORLI_ANA
       WHERE m = ?
         AND ts > current_time() - 1   // last 24h
         AND ts < current_time()
       GROUP BY day, hour, minute
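For readers less familiar with Phoenix, the aggregation performed by the query above (minimum, maximum, first value, last value and count per day/hour/minute bucket) can be sketched in plain Python. This is only an illustration of the logic on in-memory data, not the code used in the POC; the function name `aggregate_per_minute` and the sample values are hypothetical.

```python
from datetime import datetime


def aggregate_per_minute(points):
    """Group (timestamp, value) pairs by (day, hour, minute).

    For each bucket, keep min, max, first and last value (in time order)
    and the number of points, like the columns returned by the query.
    """
    buckets = {}
    for ts, v in sorted(points):  # time order, so first/last are well defined
        key = (ts.day, ts.hour, ts.minute)
        b = buckets.get(key)
        if b is None:
            buckets[key] = {"min": v, "max": v, "first": v, "last": v, "cnt": 1}
        else:
            b["min"] = min(b["min"], v)
            b["max"] = max(b["max"], v)
            b["last"] = v
            b["cnt"] += 1
    return buckets


if __name__ == "__main__":
    # Made-up sub-minute samples for illustration.
    pts = [
        (datetime(2015, 4, 30, 22, 5, 0), 156.6),
        (datetime(2015, 4, 30, 22, 5, 40), 156.4),
        (datetime(2015, 4, 30, 22, 5, 20), 156.9),
        (datetime(2015, 4, 30, 22, 6, 10), 156.4),
    ]
    for key, stats in aggregate_per_minute(pts).items():
        print(key, stats)
```

Reducing each minute to four values plus a count is a common way to draw a curve over a long period without transferring every raw point.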
  19. 19. Added value of data science algorithms on heterogeneous data: operations and maintenance can be better optimized through data analytics run on data coming from the whole fleet
       • Active and reactive power are indicators of the constraints on alternators, which affect their wear
       • ~50 plants
       • 20 years of data
       • 10-minute interval data
       • Phoenix queries make it possible to select plants and periods of time
       • Compute and show reactive power per day or per hour of the day
       • More detailed analysis
       • Fleet-level analysis
       • Interactive queries
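The "reactive power per hour of the day" computation mentioned above could look like the following minimal sketch. The slide does not say which statistic is shown; the mean is an assumption, as are the function name `mean_by_hour_of_day` and the synthetic 10-minute samples.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def mean_by_hour_of_day(samples):
    """Average (timestamp, reactive_power) samples per hour of the day (0-23).

    The choice of the mean as the statistic is an assumption for this sketch.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, q in samples:
        sums[ts.hour] += q
        counts[ts.hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}


if __name__ == "__main__":
    # Two days of synthetic 10-minute data; the value simply equals the hour.
    start = datetime(2015, 4, 29)
    samples = []
    for i in range(2 * 144):
        ts = start + timedelta(minutes=10 * i)
        samples.append((ts, float(ts.hour)))
    profile = mean_by_hour_of_day(samples)
    print(profile[13])
```

The same grouping with `ts.date()` as the key would give a per-day profile instead.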
  20. 20. Added value of data science algorithms on heterogeneous data: operations and maintenance can be better optimized through data analytics run on data coming from the whole fleet
       • Monitoring and control of contractual agreements when the network frequency varies (plants have to contribute to the global balance)
       • Pattern matching
       • Response time for different plants
       • Different levels of analysis: by plant, by generation, global
       • Generic approach implemented for any kind of pattern
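The slide does not detail the pattern-matching method. As a deliberately simple stand-in, the sketch below uses a threshold rule: it looks for moments where the network frequency drops below a nominal value minus a margin, then measures how long a plant takes to raise its power by a given step. The nominal frequency, the margin, the power step, the function name `response_times` and the sample data are all assumptions made for illustration.

```python
def response_times(freq, power, f_nominal=50.0, f_drop=0.05, p_step=10.0):
    """Detect frequency drops and measure a plant's response delay.

    freq and power are lists of (t_seconds, value) sampled at the same instants.
    An event starts when frequency falls below f_nominal - f_drop.
    The response delay is the time until power has risen by at least p_step
    above its level at the start of the event (None if it never does).
    All thresholds are illustrative assumptions.
    """
    results = []
    in_event = False
    for i, ((t, f), (_, p)) in enumerate(zip(freq, power)):
        below = f < f_nominal - f_drop
        if below and not in_event:
            in_event = True
            delay = None
            for (t2, _), (_, p2) in zip(freq[i:], power[i:]):
                if p2 - p >= p_step:
                    delay = t2 - t
                    break
            results.append((t, delay))
        elif not below:
            in_event = False
    return results


if __name__ == "__main__":
    # Made-up series: frequency in Hz, power in MW, one sample per second.
    freq = [(0, 50.0), (1, 50.0), (2, 49.9), (3, 49.9), (4, 49.9), (5, 50.0)]
    power = [(0, 900.0), (1, 900.0), (2, 900.0), (3, 905.0), (4, 912.0), (5, 912.0)]
    print(response_times(freq, power))
```

Running the same function per plant and then grouping the delays by plant or by plant generation would give the different levels of analysis listed above.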
  21. 21. Added value of data science algorithms on heterogeneous data
       • Prediction of plant cooling as a function of the quality of the incoming water
       • Correlations? Depending on the plant
       • Use of generalized additive models (GAM)
       • Integration of two internal sources plus external data
       • Better understanding
       • Work in progress
  22. 22. Integration of data science and visualization: architecture. [Diagram] Hadoop cluster; REST web service (VM); browser.
  23. 23. Integration of data science: a global approach
       • Pre-processing: data quality, sampling, synchronization, …
       • Selection and queries: threshold, pattern matching, period of time, …
       • Analysis and data science: reporting, exploratory analysis (distribution, …), modelling, …
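The sampling and synchronization steps of the pre-processing stage can be illustrated with a minimal sketch that puts an irregular series onto a regular time grid by carrying the last observation forward. This fill rule is one common choice and an assumption here; the slides do not say which method is used. The function name `resample_locf` is hypothetical, and the three input points are taken from the sample data shown earlier in the deck.

```python
from datetime import datetime, timedelta


def resample_locf(points, start, end, step):
    """Resample time-sorted (timestamp, value) points onto a regular grid.

    Each grid instant gets the most recent value at or before it
    (last observation carried forward), or None if there is none yet.
    """
    out = []
    i = 0
    last = None
    t = start
    while t <= end:
        # Consume every point up to and including the current grid instant.
        while i < len(points) and points[i][0] <= t:
            last = points[i][1]
            i += 1
        out.append((t, last))
        t += step
    return out


if __name__ == "__main__":
    pts = [
        (datetime(2015, 4, 30, 22, 10), 156.4),
        (datetime(2015, 4, 30, 22, 12), 156.7),
        (datetime(2015, 4, 30, 22, 14), 157.1),
    ]
    grid = resample_locf(pts, pts[0][0], pts[-1][0], timedelta(minutes=1))
    for t, v in grid:
        print(t.time(), v)
```

Resampling several sensors onto the same grid aligns their timestamps, which is what makes them directly comparable in later selection and modelling steps.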
  25. 25. A Data Lab in progress: a team, an approach … and some questions (image source: Xebia)
       • Objective: bring value from data analytics
       • Issues:
         • Skills and organization (between entities)
         • Architecture:
           • Operational Hadoop cluster and its loads (use of a multitenant enterprise cluster)
           • Other loads (data science)
           • Data preparation within Hadoop, plus an edge machine for data science (Spark, R, Python)
         • How to quantify value
         • Development and maintenance costs
         • How to industrialize
  27. 27. Takeaways
       • A Data Lake for our nuclear fleet
         • In progress: industrialization and decommissioning of Historian applications
         • Large reduction in licensing costs
       • A Data Lab under construction
         • POCs showing the added value of data science algorithms: predictive maintenance
         • In the context of fleet renovation for plant life extension (major overhaul program): optimization of operations and maintenance and of generation costs
         • Remaining issues: skills, organization, technical architecture, quantifying value
       • Perspectives and technical issues:
         • Data lakes and labs for other fleets (thermal plants, hydro, renewables)
         • Scalable time-series analytics (synchronization, missing data, …)
         • Handling heterogeneous data (text, images, graphs, …)
         • IoT platform