Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Customer Sucess Story: Big Data in EDP

1,599 views

Published on

This presentation reports the customer success story of EDP and its project called PREDIS.
BY Mário Guerreiro, Enterprise Architect @EDP

Published in: Technology

Customer Sucess Story: Big Data in EDP

  1. 1. BigData in EDP Mário Guerreiro February, 23th 2017
  2. 2. EDP Inovação 2 Agenda 1. Introduction to EDP 2. Motivation 3. Project PREDIS – Real time Load and Generation disaggregated forecast 4. EDP Future IT Architecture 5. Conclusions
  3. 3. EDP Inovação EDP Group - from a local electricity incumbent to a global energy player with strong presence in Europe, Brazil and considerable investments in USA UK USA Canada Portugal Brazil Angola Spain Italy France Belgium Poland Romania China 中国 # Present in the Electric Sector in Dow Jones Sustainability Indexes #3 World wind energy company #1 Europe hydro project (+3,5 GW under development) #1 Portugal industrial group 260 Employees 3 422 Installed Capacity (MW) 9 330 Net Generation (GWh) 100% Generation from renewable sources USA/ Canada 2 635 Employees 2 831 651 Electricity Customers 1 874 Installed Capacity (MW) 8 043 Net Generation (GWh) 100% Generation from renewable sources 24 544 Electricity Distribution (GWh) Brazil 7252 Employees 6 053 509 Electricity Customers 271 576 Gas Customers 10 992 Installed Capacity (MW) 34 364 Net Generation (GWh) 51% Generation from renewable sources 46 508 Electricity Distribution (GWh) 7 138 Gas Distribution (GWh) Portugal 34 Employees 363 Installed Capacity (MW) 705 Net Generation (GWh) 100% Generation from renewable sources France/ Belgium 14 Employees Italy 21 Employees United Kingdom 51 Employees 475 Installed Capacity (MW) 621 Net Generation (GWh) 100% Generation from renewable sources Poland/ Romania 2 038 Employees 1 015 543 Electricity Customers 787 869 Gas Customers 6 087 Installed Capacity (MW) 15 331 Net Generation (GWh) 37% Generation from renewable s. 9 517 Electricity Distribution (GWh) 48 447 Gas Distribution (GWh) Spain Mexico
  4. 4. EDP Inovação 4 EDP Distribuição and EDP Inovação – facts and figures 245.000 Km Percent of the electricity distribution network owned in mainland Portugal Distribution network approximate length 6 Million Approximate number of customers served EDP Distribuição is the EDP Group's company operating in the regulated distribution and supply businesses in Portugal. EDP's distribution activity is regulated by the Portuguese energy regulator ERSE (Entidade Reguladora dos Serviços Energéticos) which defines the tariffs, parameters and prices for electricity and other services in Portugal. EDP Inovação is the innovation arm of EDP Group, promoting value-adding innovation within the Group by leading the adoption of new technological evolutions and practices. Open innovation approach Client- focused Solutions Smarter Grids Cleaner Energy Data Leap 5 strategic innovation areas Entrepreneurship & Venture Capital ecosystem EDP Starter and EDP Ventures Storage
  5. 5. EDP Inovação 5 Agenda 1. Introduction to EDP 2. Motivation 3. Project PREDIS – Real time Load and Generation disaggregated forecast 4. EDP Future IT Architecture 5. Conclusions
  6. 6. EDP Inovação Smart Grid The energy sector transformation is adding new challenges to the Distribution System Operator (DSO), demanding new strategies for the Distribution Power Grid, that is becoming progressively more intelligent Quality of Service Operational Efficiency Historical Challenges New Challenges Advanced Metering Infrastructure Network automation & sensors Energy efficiency and new business models Electric vehicles Renewables and Distributed Generation 6
  7. 7. EDP Inovação 7 To address those new challenges we need to increase the visibility over the LV network, reducing the existing gap when compared with HV and MV networks. HV: 9.000 km 412 HV/MV Substation HV/MV Station VHV/HV HV network Distribution Network Secondary Substation MV/LV MV network LV network Retailer/ Consumer/ Producer 140.000 km LV Lines 6.000.000 Users MV: 74.000 km MV/LV: 66.000 Network Assets Level of Monitoring and Automation HANLANWAN EDP Box The ability to collect information from different sources (internal and external, structured and unstructured) that are mostly scattered, has a huge potential to improve the utility operational activities
  8. 8. EDP Inovação 8 Preparing for the data deluge: in 2013 EDP start to address big data and advanced analytics, due to a operational issue and upon benchmarking a conventional database with Hadoop Load Curve Profiling + Aggregation Technology Time Notes Current architecture Oracle Around 8h 4 Million points SQL with Big Data Hive, Impala 1 to 4h Inadequate Customized programming without Big Data Java Around 5min One machine (multi-core) Customized Programing with Big Data Spark <5 min Multi machines (PCs) with Big Data higher resilience and parallelization National Energy Consumption (with load curves) by voltage level* System Nodes [#] Cores [#] RAM [GB] Cluster Readings [10^6] Volume [MB] Processing Time [h:min:sec] BO (Oracle) 4 96 202 Local 12 x 6 72 3:45:00 Hadoop 21 42 157 Virtual / Cloud 96 x 6 576 00:09:37 *This Proof of Concept was done in the cloud payed with a credit card with a cost around $30. Main conclusions • The Hadoop cluster is by nature resilient and coped with nodes failure • The processing times can be greatly reduced over traditional architecture • There is a high need for customization • The choice of the tool within the Hadoop ecosystem depends highly on the type of calculations/use-case to be made
  9. 9. EDP Inovação 9 Agenda 1. Introduction to EDP 2. Motivation 3. Project PREDIS – Real time Load and Generation disaggregated forecast 4. EDP Future IT Architecture 5. Conclusions
  10. 10. EDP Inovação 10 With the results obtained a project called PREDIS was set-up to obtain the load and generation forecast at an disaggregated level and in near real time mode (with 15 minutes refreshment) PREDIS requirements • Inputs from different data sources from EDP Distribuição (GIS, SCADA, Oracle, SAP) • Development of a adequate machine-cluster to perform all the computation • Information integration on a data model to support the forecast • Develop analytic processes to compute the information in adequate elapsed time Energy Balance Revenue Assurance Dispatch Center PREDIS SGL EI Server TC TLP EB Estimate Grid Planning Fraud SIT BI- Scada Power On SysGrid PREDIS Project Goals:  Electrical Load Forecast for the next 72 hours  Disaggregated Renewable energy sources Forecast (Wind, Solar) for the next 72 hours  Manage the aprox. 6 million points asset universe (Substations, Distribution Transformers, LV clients, etc)  Forecast update every 15 minutes  Incorporate dynamic grid topology
  11. 11. EDP Inovação Review of existing load forecast models We defined some steps to find an adequate model that allowed us to forecast the load with “good enough” accuracy 11 Test the model over national Load Improve the model with additional Explanatory variables Define models for different times of year
  12. 12. EDP Inovação 12 After choosing the model we identified a set of explanatory variables and tested the model over National Demand Explanatory variables: • Year, month, day • Day of week • Public holiday • Season (Spring, Summer, Autumn, Winter) • Daylight save time (TRUE, FALSE) • Time of year • Time of day (48 1/2 hour intervals) • Temperature From NOAA website Dataset: • Half-hourly electricity measurements • National demand (mainland Portugal) • From 2006 to 2011 – Data for calibration • From 2012 to 2014 – Data for test High temperature ~ demand peak (2013 - 4th highest heat wave since 1981) Low temperature ~ demand peak (2012 European cold wave due Siberian High)
  13. 13. EDP Inovação 13 In order to increase the model’s accuracy and looking at the major residuals, we started a trial and error process to identify the main causes that would decrease the model errors August Christmas and New Year period Public holiday on Sunday Gong storm -500 500 0,65 0,7 0,75 0,8 0,85 0,9 0,95 1 Iteraction Variable 1 24h lagged load 2 temp. combined w. time of day 3 48h lagged load 4 day of week 5 public holidays 6 intra-day effect dependent on the day type 7 day of the year 8 24h lagged temp. + min and max temp. of last 24h 9 days offs before Christmas and Carnival Devianceexplained Iteration Thehigherthebetter Features added/combined Model Accuracy
  14. 14. EDP Inovação 14 But there were still some issues with the forecast. After special days like Christmas the model shouldn’t use the load values of the previous day to forecast This lead to a new approach of using a weighted majority algorithm
  15. 15. EDP Inovação 15 In this approach we had several algorithms that were trained to certain conditions and the model automatically choose the one that minimized the error for each period Iteration Variable 1 General-purpose model 2 General-purpose model reviewed 3 Weekends' model 4 August's model 5 Public holidays' model 6 Spring and Summer's model 7 Autumn and Winter's mode 8 Christmas and New Year's model 9 Carnival's model 10 Easter's model Jan Autumn Dec Carnival period Easter period AugustSpring Christmas and New Year periodWeekends Other public holidays Oneyear 2 2,1 2,2 2,3 2,4 2,5 2,6 2,7 Thelowerthebetter MAPE(%) Iteration Models added/combined We now have a working algorithm with ~2% error for an aggregated national load.
  16. 16. EDP Inovação Meanwhile we also implemented a R wind generation forecast model based on wind velocity + air pressure and also the energy supplied by the wind farm Forecast D+1 - Forecast - Actual Forecast D+2 - Forecast - Actual Forecast D+3 - Forecast - Actual 7% 8% 12% NMAE Normalized mean absolute error Test conditions: • 9 months calibration data + 1 month validation data • Hourly generation measurements and forecasts of wind velocity@10m and pressure@MSL (72h time horizon, 3h intervals)
  17. 17. EDP Inovação 17 New challenges on load forecasting emerge when we decrease the voltage level (substations and distribution transformers) August August Christmas and New year Christmas and New year Network reconfigurations? Done so far:  Implemented 2 Big Data Clusters (Cloudera Hadoop)  Developed an architecture for the Project  Developed a Load forecast model with ~2% MAPE for national load  Developed a Wind forecast model with ~10% error Next Steps:  Improve existing models  Incorporate network configurations on the forecast module (state estimation, network status)  Cluster different types of load by voltage level, load tipification etc.  Wind farms state estimation  Photovoltaic model definition and implementation  Collect data from the source systems in a continuous way
  18. 18. EDP Inovação 18 Agenda 1. Introduction to EDP 2. Motivation 3. Project PREDIS – Real time Load and Generation disaggregated forecast 4. EDP Future IT Architecture 5. Conclusions
  19. 19. EDP Inovação 19 The PREDIS project and other use-cases revealed a series of limitations that currently exist in the IT systems • How can we expand analytics knowledge in business areas? • How can we achieve massive data extractions without impacting the performance of existing operational systems? • How can we avoid a proliferation of interfaces each one with a specific function? • How can we interpret the data that exists in current IT systems? • How to “democratize” the access to data so that multi source analytics can be developed? • How to “stream” the data needed to address some of the use-cases identified ? These requirements were important to develop a new approach of IT systems and lead to a specific analysis of the current Analytics architecture
  20. 20. EDP Inovação 20 Traditionally a utility has analytic solutions based on a silo oriented BI architecture that isn’t prepared to deal with high volumes of data with structured and unstructured information InformationUsage Integration Software application Operation Software Application … Software Application Sap Application … SAP Application Software Application BW  Redundancy of information  Lack of connectivity between the different information “silos”  Little or non-existing related information at a disaggregated level This was the vision of the IT architecture till 2010. Meanwhile IT world has changed, but the business needs are still the same. It is necessary to have a Strategic, Tactic and Operational vision. Analytic level Operational Level SAP ExtractorsETL / Active Data Guard / Golden Gate
  21. 21. EDP Inovação Usage 21 Information Integration Application Operation Software Application … Software Application SAP Application … SAP Application Software Application External Sources 3 Nowadays operational systems create more data every day in the 3V’s that characterize Big Data (Volume, Variety, Velocity). A conventional infrastructure cannot handle operational activities and advanced analytics in due time. Big Data comes as an option that allows data ingestion and advanced analytics of high volumes of data oriented to one of the 3V’s (Volume, Variety, Velocity), keeping operational systems with their normal activities. SAP Extractors ETL / Golden Gate Interaction BW“DataLake” MDU GR MDU GA MDU GE Analytic Level Opoerational Level
  22. 22. EDP Inovação The new analytic-oriented architecture is based on a Data Lake and on UDMs with the information from the different IT/OT systems fed with CDC (Change Data Capture) interfaces leveraging new analytics 22 Catalogue DataWarehouse CorpODS BW OT Events Engine Virtual data warehouse Real-time Dashboard DataGovernance Business Solutions Comercial Analytics Performace MgmtAsset Analytics Asset Performance Distributed Load Forecast Energy Balance Fraud Predictive Maintenance Data Sources External SourcesSAPSAP Non-SAP BW Réplicas Data Lake (“big data”) MDU D (Gestão de ativos, Gestão comercial, Gestão de energia, Gestão da rede) HDFS DM DM DM DM HDFS HDFS HDFS Reporting Alerts Dashboards Discovery Advanced Analytics Geo Analytics Data exploration tools API’s Business functionalities DataGovernance OT Data Exploration Data Access Data Processing + Data Repository Data Ingestion
  23. 23. EDP Inovação The new analytic-oriented architecture is based on a Data Lake and on UDMs with the information from the different IT/OT systems fed with CDC (Change Data Capture) interfaces leveraging new analytics 23 Catalogue DataWarehouse CorpODS BW OT Events Engine Virtual data warehouse Real-time Dashboard DataGovernance Business Solutions Commercial Analytics Performace MgmtAsset Analytics Asset Performance Distributed Load Forecast Energy Balance Fraud Predictive Maintenance Data Sources External SourcesSAPSAP Non-SAP BW Replicas Data Lake (“big data”) Unified Data Model (Asset Mgmt, Commercial Mgmt, Energy Mgmt, Grid Mgmt) HDFS DM DM DM DM HDFS HDFS HDFS Reporting Alerts Dashboards Discovery Advanced Analytics Geo Analytics Data exploration tools API’s
  24. 24. EDP Inovação Big Data Platform/Cluster HDFS (Storage) Hadoop Distributed File System Hbase Columnar Store Mahout Machine Learning Hive SQL Query IMPALA/ SPARK In-memory Map Reduce/YARN (Resource Management) Distributed Processing Framework Web app API – data access and data modeling Externalaccess todatadownload System A System B Files PREDIS Forecast Model 1 implemented on R Model 2 implemented on R SIT EDM (SGL) Ei-Server Rede Activa SCADA-BI External data sources Dataextractandloading IPMA SGL SIT New Model implemented on R Deploy Resultsofnew modelsdeployed Sqoop Kafka PREDIS instantiated in the new architecture Files CDC
  25. 25. EDP Inovação 25 We are now building the data lake infrastructure whilst we acquire knowledge in this new type of architecture supported in two Hadoop clusters: an Enterprise Grade and a Low Cost as “sand box” Purpose: Internal enterprise level cluster for projects support  Data confidentiality guaranteed  Hardware quality (Enterprise grade)  Prepared to scale horizontally  Cloudera Hadoop and R  7 nodes/servers (dimensioned for PREDIS project) ENTERPRISE GRADE DEVELOPMENT CLUSTER Purpose: Internal test and development cluster assembly  Big Data platform knowledge development  Low cost platform (desktop PCs)  Cloudera Hadoop and R  48 nodes/servers LOW COST CLUSTER Purpose: Enterprise DataLake cluster  Oracle Big Data Appliance  Seamless integration with Exadata  Prepared to scale horizontally  Cloudera Hadoop Enterprise  6 nodes/servers with superior characteristics and hardware optimized ENTERPRISE PRODUCTION DATALAKE INFRASTRUCTURE
  26. 26. EDP Inovação 26 Findings & Conclusions • Big Data Analytics is a continuous learning process and a cultural change. To overcome the lack of knowledge in this subject a Advanced Analytics and Machine Learning training in R is being lectured in EDP. Additionally a SAS Miner training is scheduled for the 2nd trimester of 2017 • Access to overloaded source systems’ data can be difficult. CDC (change data capture) extractors seem the best way to extract data from the source systems and have the data available near real time to analytics users • The development of a Data Lake will decrease the number of interfaces between operational systems • The Unified Data Model, where information is organized and cataloged, allows a unique vision of all available data (and can support the “single source of truth”). But we need Data Governance! • Support of a experienced IT partner for several of the activities involved is essential to avoid major pitfalls and to help detail architecture “sweet spots” for each use-case
  27. 27. EDP Inovação Obrigado!

×