As advanced sensor technologies are becoming widely deployed in the energy industry, the availability of higher-frequency data results in both analytical benefits and computational costs. To an energy forecaster or data scientist, some of these benefits might include enhanced predictive performance from forecasting models as well as improved pattern recognition in energy consumption across building types, economic sectors, and geographies. To a utility or electricity service provider, these benefits might include significantly deeper insights into their diverse customer base. However, these advantages can come with a high computational price tag. With Spark 2.0, User-Defined Functions can be applied across grouped SparkDataFrames in the SparkR API to solve the multivariate optimization and model selection problems typically required for fitting site-level models. This recently added feature of Spark 2.0 on Databricks has allowed DNV GL to efficiently fit predictive models that relate weather, electricity, water, and gas consumption across virtually any number of buildings.
2. About me
• Data Scientist & Technical Consultant for DNV GL’s Policy Advisory and Research Group.
• Background in Econometrics, Forecasting, Machine Learning, and Optimization.
• Working with Big Data for 3+ years
3. Agenda
• Introduction to DNV GL
• Energy Data Science using Spark
– Data Scales and the DGP
– Application 1 – Princeton Scorekeeping Method (PRISM)
– Application 2 – Hourly Predictive Modelling with Distributed Energy Resources
• Next Steps with Spark and Databricks
7. Metering Data: Historical measured quantities of electricity usage for a site or meter over a particular time period.
- An analogue origin requiring a physical reading of the meter on a specific cycle.
- Typically used by utility companies to bill customers for their usage.
- Advanced metering technologies and machine learning now allow for millisecond readings and disaggregation down to the end-use / appliance level.
Weather Data:
- Actual Weather: Records of temperature, humidity, cloud cover, solar irradiance, etc.
- Typical Weather: 30-year / 10-year averages that define “normal” weather conditions
Data Generating Process
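Weather and metering series are typically joined through degree-day transforms before modelling. A minimal sketch of computing daily heating and cooling degree days from hourly temperature readings; the reference temperatures (18 °C / 24 °C) are illustrative defaults, not values from the talk:

```python
# Heating/cooling degree days summarize how far outside a comfort band the
# weather was: HDD accumulates when the day is colder than the heating
# reference, CDD when it is warmer than the cooling reference. Hourly
# readings are averaged to a daily mean first.

def degree_days(hourly_temps, heat_ref=18.0, cool_ref=24.0):
    """hourly_temps: 24 readings for one day -> (HDD, CDD) for that day."""
    daily_mean = sum(hourly_temps) / len(hourly_temps)
    hdd = max(0.0, heat_ref - daily_mean)
    cdd = max(0.0, daily_mean - cool_ref)
    return hdd, cdd

cold_day = [5.0] * 24
hot_day = [30.0] * 24
print(degree_days(cold_day))  # (13.0, 0.0)
print(degree_days(hot_day))   # (0.0, 6.0)
```

These daily transforms become the regressors in the site-level models discussed in the applications below.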
8. Electricity Distribution Grid
[Diagram: power flows from Generation through Transmission and Distribution to the Consumer. The Bulk System includes utility-scale wind farms and photovoltaics (Aggregated Utility Scale, 2–50 MW) and Bulk Storage (> 50 MW); the Distribution System includes Utility Scale (100 kW–2 MW) and Distributed Scale (25 kW–100 kW) assets serving Residential and Commercial & Industrial consumers.]
11. The embarrassingly parallel ‘Primary Modeling Unit’:
I. Temporal: Sub-hourly, hourly, daily, monthly, annual
II. Cross-sectional: Clusters/segments, geography, system hierarchy
III. Hybrid: Structure- and year-specific
Databricks: Rapid deployment and development of existing analytics pipelines
Spark 2.0: SparkR allows for UDFs and partition-based model learning
- gapply, dapply, spark.lapply
Spark 2.1: Enables installing third-party packages on workers using spark.addFile
- SPARK-7159: Multiclass Logistic Regression in the DataFrame-based API
Analytical Solution
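The partition-based model learning above can be sketched without a cluster: split the data by Primary Modeling Unit (here, a site ID) and fit an independent model per partition. This plain-Python stand-in only illustrates the pattern that SparkR's `gapply` distributes; in SparkR the per-group fit runs on the workers, and the data below is synthetic:

```python
# Embarrassingly parallel pattern: group rows by site, fit one model per
# group, collect the coefficients. Each group's fit is independent, which
# is exactly why gapply can farm the groups out to Spark workers.
from collections import defaultdict

def fit_linear(xs, ys):
    """Closed-form ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b  # (intercept, slope)

def fit_per_site(rows):
    """rows: (site_id, temperature, load) tuples -> {site_id: (a, b)}."""
    parts = defaultdict(list)
    for site, temp, load in rows:
        parts[site].append((temp, load))
    return {site: fit_linear([t for t, _ in obs], [l for _, l in obs])
            for site, obs in parts.items()}

rows = [("A", 10, 30), ("A", 20, 50), ("A", 30, 70),
        ("B", 10, 15), ("B", 20, 15), ("B", 30, 15)]
models = fit_per_site(rows)  # site A is weather-sensitive, site B is flat
```

In SparkR the equivalent would be `gapply(df, "site_id", fitFunc, schema)`, with `fitFunc` running an R model fit on each site's rows.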
13. PRISM Algorithm
- Decomposes energy usage into its weather-driven and baseload components.
- Site-level modelling that combines both full and reduced-form models.
- Grid search over possible heating and cooling reference temperatures.
- Rich development history based on fundamental structural engineering principles.
- Origin: Miriam Goldberg's dissertation, "A Geometrical Approach to Non-differentiable Regression Models as Related to Methods for Assessing Residential Energy Conservation."
29. Spark 2.0 / 2.1 has allowed DNV GL’s existing expertise and code base to scale.
Databricks has provided an environment that accommodated existing codebases and facilitated rapid new development.
- Analytical contexts, prediction goals, and model selection processes define the Primary Modeling Unit (PMU) in any Energy Data Science application.
- The distributed computing framework must be able to scale with the appropriate Primary Modeling Unit for any Energy Data Science application.
Take Home Message
30. Modeling Additional Fuels
- Natural Gas (Therms)
- Water (Liters / Gallons)
- Hybrid (British Thermal Units)
Climate Change Simulations
- DNV GL’s BayTown System Dynamics Model
Electricity Grid Optimization with Distributed Energy Resource Assets
The Future!
Resource: http://spark.apache.org/releases/spark-release-2-1-0.html
Extension of Spark 2.1 to other energy-related analytics
Example: Climate Change Simulations in the San Francisco Bay Area (DNV GL’s BayTown System Dynamics Model).
Additional Fuels / Dependent Variables to Model:
Natural Gas (Therms)
Water (Liters / Gallons)
Hybrid (British Thermal Units)
Grid Optimization with Distributed Energy Resource Assets