Real-time energy data analytics with Storm
Hadoop Summit 2014, San José, June 3rd
Rémy Saissy - Simon Maby, Octo Technolog...
  • Transcript of "Real-time Energy Data Analytics with Storm"

    1. Real-time energy data analytics with Storm. Hadoop Summit 2014, San José, June 3rd. Rémy Saissy and Simon Maby, Octo Technology; Marie-Luce Picard, Bruno Jacquin, Charles Bernard and Benoît Grossin, EDF R&D.
    2. Outline: 1. Context; 2. Objectives: using a CEP for real-time analytics; 3. POC on Storm: detailed architecture and results; 4. Conclusions; 5. References. Photo credits: Brice Richard (Flickr), KC Tan Photography (Flickr).
    3. Outline (repeat of slide 2).
    4. EDF Group: a global leader in electricity. €72.7 billion in sales; 39.3 million customers; 159,740 employees worldwide; 84.7% of generation does not emit CO2. (Chart: net production capacity.)
    5. EDF R&D: missions and key figures. €520 million budget in 2012: 70% of activity to support the performance of Group businesses, 30% of activity to anticipate and prepare for the future. 500 major projects ongoing. 7 international centres: 3 in France and 4 abroad (Germany, United Kingdom, Poland, China), plus 1 USA-based team (technology/innovation survey and prospective). 2,100 employees, including 370 PhDs, 150 PhD students and 200 researchers teaching at universities and advanced engineering schools. 15 departments (expertise, partnerships and project management). 14 joint research laboratories. Partnering with 4 venture capital funds in the field of clean technologies. Missions: consolidate a carbon-free energy mix; anticipate the electricity of tomorrow; develop a flexible range of low carbon energy.
    6. OCTO ID numbers. IT consulting company. Employees: 209, of whom 174 are consultants, architects, experts or coaches mastering technology, methodology, and knowledge of your business needs and challenges. Turnover: 24.1 million worldwide (2013). 16 years of feedback. Purely organic growth (20% annually). Strong corporate culture and values. International locations. Our experienced team: 27% junior, 33% senior, 40% experienced ("confirmés"). « We want to reproduce wherever possible what made us successful: a vision of IT, strong values and sharp skills. »
    7. What do we do? We use technology and creativity to turn your ideas into reality. IT consulting and expertise: it is the product of an ambitious business vision turned reality thanks to a pragmatic use of technology. Design of innovative applications: we are committed to fostering the fruition of your ideas and needs, making them concrete so that you can start benefitting from them in just a few weeks. You can trust us with the implementation of your software products from start to finish. We can also help you to design better innovative applications.
    8. Electricity industry business and data management. The development of Smart Grids will lead to the creation, collection and use of an unprecedented amount of data for utilities. This brings opportunities for: better optimization of the system; improving the value for customers, based on a deep exploitation of consumption data. The whole sector is evolving: "smart" data is everywhere. Utilities become digital: physical systems come with digital ones (at all levels: transportation, distribution, production and sales), and the system becomes more complex (demand response, distributed generation …). Today: 2 index readings a year. Tomorrow: a daily measurement = +20,000 %; one measurement every ½ hour = +900,000 %.
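
The growth figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes a 365-day year and two index readings per meter per year as the baseline; under those assumptions the exact increases come out at roughly +18,150 % and +875,900 %, which the slide appears to round to +20,000 % and +900,000 %.

```python
# Readings per meter per year under each collection regime.
# Assumption: a 365-day year; the slide's own figures are rounded.
READINGS_PER_YEAR = {
    "two index readings a year": 2,
    "one reading a day": 365,
    "one reading every half hour": 48 * 365,
}


def percent_increase(new: float, old: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


baseline = READINGS_PER_YEAR["two index readings a year"]
increases = {
    name: percent_increase(count, baseline)
    for name, count in READINGS_PER_YEAR.items()
}

for name, pct in increases.items():
    print(f"{name}: {pct:,.0f} %")
```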
    9. Outline (repeat of slide 2).
    10. POC on Storm: objectives. Evaluate Storm capabilities for various real-time analytical processing needs: on time series; simple or complex analytics (build KPIs, or run adaptive machine learning algorithms); merging data in motion and data at rest; with real-time business intelligence constraints (not so extreme). Gain a deeper understanding of how Storm works (concepts) and be able to compare it with other classical CEP tools.
    11. POC Storm: functional picture. Input: smart metering data stream (data in motion); customer data, static or dynamic pricing, weather forecasts (data at rest). Storm: http://storm-project.net/. Output: simple aggregations, e.g. national curve; complex aggregations, e.g. curves aggregated by tariff; analytics, e.g. scoring (for each meter); forecasts, e.g. D+1 forecasts expressed in Wh and in € (adaptive models).
    12. POC Storm: functional picture (same diagram as slide 11), with two callouts: 1. Zoom on data; 2. Zoom on analytics.
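
To make the two aggregation outputs of the functional picture concrete, here is a minimal sketch in plain Python (not Storm code). It folds a stream of readings into a national curve and into one curve per tariff by looking each meter up in a static customer table. The table contents, meter ids, tariff names and the reading layout `(meter_id, slot, wh)` are all hypothetical.

```python
from collections import defaultdict

# Data at rest: hypothetical customer table mapping meter id -> tariff.
CUSTOMER_TARIFF = {"m1": "base", "m2": "peak/off-peak", "m3": "base"}


def aggregate(readings):
    """Fold (meter_id, half_hour_slot, wh) readings into two views:
    the national curve (sum per slot) and one curve per tariff."""
    national = defaultdict(float)
    by_tariff = defaultdict(lambda: defaultdict(float))
    for meter_id, slot, wh in readings:
        # Simple aggregation: everything goes into the national curve.
        national[slot] += wh
        # Complex aggregation: join the reading with data at rest.
        tariff = CUSTOMER_TARIFF.get(meter_id, "unknown")
        by_tariff[tariff][slot] += wh
    return dict(national), {t: dict(c) for t, c in by_tariff.items()}


# Data in motion: a tiny made-up stream of readings.
stream = [("m1", 0, 100.0), ("m2", 0, 250.0), ("m3", 0, 50.0), ("m1", 1, 120.0)]
national, by_tariff = aggregate(stream)
print(national)
print(by_tariff)
```

In a Storm topology the per-tariff sums would typically live in separate tasks, with the stream partitioned by tariff; the single-process dictionary here only illustrates the join and the two groupings.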
    13. POC Storm: Zoom on data. Use of simulated data (load curves). The simulator TURBO-COURBOGEN © aims to generate individual, volatile load curves on a massive scale. The simulated aggregated curve should be close to the real aggregate. Non-parametric and efficient: Java code on a 2 GHz CPU (Xeon E5405); 360000 tuples/s/CPU (18X real-time); see [5]. Pipeline: real individual data → machine learning process → Markov generative model → simulation.
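
The slide only names a Markov generative model; the real simulator is described in [5]. As a rough illustration of the general idea, and not of the authors' algorithm, the sketch below draws one day of 48 half-hourly values from a hand-written three-state Markov chain. The states, consumption levels and transition probabilities are invented for the example, whereas the slide indicates that the actual model is learned from real individual data.

```python
import random

# Hypothetical consumption level emitted in each state (arbitrary units).
LEVEL = {"low": 0.1, "medium": 0.6, "high": 5.0}

# Hypothetical transition probabilities: state -> [(next_state, probability)].
TRANSITIONS = {
    "low": [("low", 0.80), ("medium", 0.15), ("high", 0.05)],
    "medium": [("low", 0.30), ("medium", 0.50), ("high", 0.20)],
    "high": [("low", 0.10), ("medium", 0.30), ("high", 0.60)],
}


def simulate_day(rng, start="low", steps=48):
    """Walk the chain for `steps` half-hour slots and return the load curve."""
    state, curve = start, []
    for _ in range(steps):
        curve.append(LEVEL[state])
        states, weights = zip(*TRANSITIONS[state])
        state = rng.choices(states, weights=weights, k=1)[0]
    return curve


# A seeded generator keeps the simulated curve reproducible.
curve = simulate_day(random.Random(42))
print(curve)
```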
    14. POC Storm: Zoom on analytics. Individual scores based on the SAX transformation (see the FROST library presentation, a lightning talk during Hadoop Summit Europe 2013 [3]). Forecasts based on GAM models (Generalized Additive Models), using the mgcv R package (S. Wood), applied to electricity demand forecasting [6].
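
For readers unfamiliar with SAX (Symbolic Aggregate approXimation), the following is a minimal textbook-style sketch: z-normalize the series, average it over equal-length segments (PAA), then map each segment mean to a letter using breakpoints that split a standard normal distribution into equiprobable regions. It is not the FROST implementation, and the alphabet size of 4 and the function name `sax` are choices made for this example.

```python
import statistics

# Breakpoints cutting N(0, 1) into 4 equiprobable regions (its quartiles).
BREAKPOINTS = (-0.6745, 0.0, 0.6745)
ALPHABET = "abcd"


def sax(series, n_segments):
    """Return the SAX word of `series` using `n_segments` letters."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")
    # Step 1: z-normalization (a constant series maps to all zeros).
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    z = [0.0 if std == 0 else (x - mean) / std for x in series]
    # Step 2: piecewise aggregate approximation (mean of each segment).
    size = len(z) // n_segments
    paa = [statistics.fmean(z[i * size:(i + 1) * size]) for i in range(n_segments)]
    # Step 3: the letter index is the number of breakpoints below the value.
    return "".join(ALPHABET[sum(v > b for b in BREAKPOINTS)] for v in paa)


print(sax([0, 0, 0, 0, 10, 10, 10, 10], 4))
```

Two load curves can then be compared or scored through their short symbolic words rather than through the raw 48-point series.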
    15. Outline (repeat of slide 2).
    16. General development context. Storm: many concepts to understand (learning curve); easy to get started with; easy to test and deploy (Storm client). Setting up a cluster: HDP 2.1 cluster; 11 nodes; easy to install with Ambari 1.5. Task force: Storm newbies; statistics, development and architecture skills; 30 days × 2 people.
    17. Use Case: Next Day Forecasting
    18. Turbo-CourboGen© emits tuples as fast as it can. Sample tuple: 209544,4268,282240,0.596,0.579,0.322,0.115,0.098,0.052,0.053,0.019,0.055,0.051,0.008,0.054,0.02,0.059,0.06,0.555,0.614,0.56,0.651,0.631,1.529,4.103,14.937,11.796,13.857,9.309,8.511,6.58,13.06,16.016,11.236,9.304,15.057,5.188,0.682,0.284,0.925,0.181,0.268,0.264,0.525,0.221,0.197,0.215,0.174,0.132,0.118,0.132 (Chart: load curve over one day; x-axis in half-hourly steps from 0:30 to 0:00, y-axis from 0 to 18000.)
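
The sample tuple has 51 comma-separated fields: three leading integers followed by 48 decimals, which matches one value per half-hour slot on the chart axis (0:30 to 0:00). A small parsing sketch follows. The slide does not say what the three leading integers mean, so they are kept as an opaque `header`; the names `LoadCurveTuple`, `parse` and `slot_label` are invented for this example.

```python
from typing import NamedTuple

SAMPLE = (
    "209544,4268,282240,0.596,0.579,0.322,0.115,0.098,0.052,0.053,0.019,"
    "0.055,0.051,0.008,0.054,0.02,0.059,0.06,0.555,0.614,0.56,0.651,"
    "0.631,1.529,4.103,14.937,11.796,13.857,9.309,8.511,6.58,13.06,16.016,"
    "11.236,9.304,15.057,5.188,0.682,0.284,0.925,0.181,0.268,0.264,"
    "0.525,0.221,0.197,0.215,0.174,0.132,0.118,0.132"
)


class LoadCurveTuple(NamedTuple):
    header: tuple  # three leading integers; meaning not stated on the slide
    values: tuple  # 48 half-hourly readings


def parse(line: str) -> LoadCurveTuple:
    """Split one comma-separated tuple into its header and 48 readings."""
    fields = [f.strip() for f in line.split(",")]
    header = tuple(int(f) for f in fields[:3])
    values = tuple(float(f) for f in fields[3:])
    if len(values) != 48:
        raise ValueError(f"expected 48 half-hourly values, got {len(values)}")
    return LoadCurveTuple(header, values)


def slot_label(index: int) -> str:
    """Axis label of slot `index` (0-based): slot 0 is 0:30, slot 47 is 0:00."""
    minutes = (index + 1) * 30 % (24 * 60)
    return f"{minutes // 60}:{minutes % 60:02d}"


parsed = parse(SAMPLE)
peak_index = max(range(48), key=lambda i: parsed.values[i])
peak_label = slot_label(peak_index)
print(parsed.header, peak_label, parsed.values[peak_index])
```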
    19. R computation within a CEP. Pros: reuse of existing scripts; skills available within the organization; parallelization abstraction thanks to Storm; ability to load new models from R&D on the fly. But: difficult to instantiate; difficult to debug; slow (potential bottleneck).
    20. Performance Metrics. Load test run distribution: tuples processed per minute. 10 workers; batch size of 2000; low parallelism hint (<10).
    21. Performance Metrics. Load test run distribution: tuples processed per minute. 10 workers; batch size of 2000; medium parallelism hint (~100).
    22. Performance Metrics. Load test run distribution: tuples processed per minute. 20 workers; batch size of 5000; high parallelism hint (~400).
    23. Performance Metrics. Tuples processed over time. 20 workers; batch size of 5000; high parallelism hint (~400).
    24. If we had to start over
    25. Conclusion. We had fun. Behavior within the whole Information System: resource sharing with the rest of the stack (Storm-on-YARN, capacity scheduler). Lack of security: wire encryption; user role management (Kerberos?). Reliability: transactional; failover. DevOps.
    26. Conclusion. Finally, Storm is used in operational conditions for supervising the communication network associated with smart meters [7]: it processes 8 million events every day; KPIs must be built on the fly for managing the system and ensuring QoS; Trident is used (mini-batch, idempotency); Storm is used with other components (HBase, Kafka …).
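
The pairing of mini-batches with idempotency can be illustrated without any Storm dependency. The sketch below is loosely modelled on the transactional-state idea described in the Trident documentation: each stored value remembers the id of the last batch applied to it, so a replayed batch does not get counted twice. It assumes that a replayed batch carries the same transaction id and the same events, and that batches are applied in order. The class and method names are invented for this example and are not Storm API.

```python
from collections import Counter


class TransactionalCounter:
    """Per-key event counter whose batch updates are idempotent."""

    def __init__(self):
        # key -> (count, id of the last batch applied to this key)
        self._state = {}

    def apply_batch(self, txid, events):
        """Add one mini-batch of event keys; replays of `txid` are ignored."""
        for key, n in Counter(events).items():
            count, last_txid = self._state.get(key, (0, None))
            if last_txid == txid:
                continue  # this batch was already applied to this key
            self._state[key] = (count + n, txid)

    def get(self, key):
        return self._state.get(key, (0, None))[0]


kpi = TransactionalCounter()
kpi.apply_batch(1, ["ok", "ok", "timeout"])
kpi.apply_batch(1, ["ok", "ok", "timeout"])  # replay after a failure
kpi.apply_batch(2, ["ok"])
print(kpi.get("ok"), kpi.get("timeout"))
```

Storing the batch id next to the value is what lets a KPI survive replays: the update can be retried safely because applying it a second time changes nothing.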
    27. References
        [1] A proof of concept with Hadoop: storage and analytics of electrical time-series. Marie-Luce Picard, Bruno Jacquin. Hadoop Summit 2012, California, USA, 2012. Slides: http://www.slideshare.net/Hadoop_Summit/proof-of-concent-with-hadoop Video: http://www.youtube.com/watch?v=mjzblMBvt3Q&feature=plcp
        [2] Massive Smart Meter Data Storage and Processing on top of Hadoop. Leeley D. P. dos Santos, Alzennyr G. da Silva, Bruno Jacquin, Marie-Luce Picard, David Worms, Charles Bernard. Workshop Big Data 2012, VLDB (Very Large Data Bases) Conference, Istanbul, Turkey, 2012. http://www.cse.buffalo.edu/faculty/tkosar/bigdata2012/program.php
        [3] Smart Metering x Hadoop x Frost: A Smart Elephant Enabling Massive Time Series Analysis. Benoît Grossin, Marie-Luce Picard. Hadoop Summit Europe 2013, Amsterdam, March 2013. http://hadoopsummit.org/amsterdam/
        [4] Searching time-series with Hadoop in an electric power company. Alice Bérard, Georges Hébrail. BigMine Workshop, KDD2013, Chicago, August 2013. http://bigdata-mining.org/
        [5] Realistic and very fast simulation of individual electricity consumption. Alexis Bondu. IEEE Transactions on Smart Grid, 2014, to be published.
        [6] Short-term electricity load forecasting with Generalized Additive Models. Amandine Pierrot, Yannig Goude. Proceedings of ISAP Power, pp. 593-600, 2011.
        [7] Retour d’expérience du client eRDF. Supervision Linky [Customer feedback from eRDF: Linky supervision]. Olivier Pellegrino, Richard Tagliazucchi. RedHat Forum, Paris, June 2014.
    28. Special thanks to: EDF R&D: Alexis Bondu, Yannig Goude; OCTO Technology: Cyrille Mailley.
