As advanced sensor technologies are becoming widely deployed in the energy industry, the availability of higher-frequency data results in both analytical benefits and computational costs. To an energy forecaster or data scientist, some of these benefits might include enhanced predictive performance from forecasting models as well as improved pattern recognition in energy consumption across building types, economic sectors, and geographies. To a utility or electricity service provider, these benefits might include significantly deeper insights into their diverse customer base. However, these advantages can come with a high computational price tag. With Spark 2.0, User-Defined Functions can be applied across grouped SparkDataFrames in the SparkR API to solve the multivariate optimization and model selection problems typically required for fitting site-level models. This recently added feature of Spark 2.0 on Databricks has allowed DNV GL to efficiently fit predictive models that relate weather, electricity, water, and gas consumption across virtually any number of buildings.
2. About me
• Data Scientist & Technical Consultant for DNV GL’s Policy Advisory and Research Group.
• Background in Econometrics, Forecasting, Machine Learning, and Optimization.
• Working with Big Data for 3+ years
3. Agenda
• Introduction to DNV GL
• Energy Data Science using Spark
– Data Scales and the DGP
– Application 1 – Princeton Scorekeeping Method (PRISM)
– Application 2 – Hourly Predictive Modelling with Distributed Energy Resources
• Next Steps with Spark and Databricks
7. Metering Data: Historical measured quantities of electricity usage for a site or meter over a particular time period.
- An analogue origin requiring a physical reading of the meter on a specific cycle.
- Typically used by utility companies to bill customers for their usage.
- Advanced metering technologies and machine learning now allow for millisecond readings and disaggregation down to the end-use / appliance level.
Weather Data:
- Actual Weather: Records of temperature, humidity, cloud cover, solar irradiance, etc.
- Typical Weather: 30-year / 10-year averages that define “normal” weather conditions
Data Generating Process
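Weather and metering series are typically joined through degree-day transforms before modelling. A minimal sketch of computing daily heating and cooling degree days from hourly temperature readings; the reference temperatures (18 °C / 24 °C) are illustrative defaults, not values from the talk:

```python
# Heating/cooling degree days summarize how far outside a comfort band the
# weather was: HDD accumulates when the day is colder than the heating
# reference, CDD when it is warmer than the cooling reference. Hourly
# readings are averaged to a daily mean first.

def degree_days(hourly_temps, heat_ref=18.0, cool_ref=24.0):
    """hourly_temps: 24 readings for one day -> (HDD, CDD) for that day."""
    daily_mean = sum(hourly_temps) / len(hourly_temps)
    hdd = max(0.0, heat_ref - daily_mean)
    cdd = max(0.0, daily_mean - cool_ref)
    return hdd, cdd

cold_day = [5.0] * 24
hot_day = [30.0] * 24
print(degree_days(cold_day))  # (13.0, 0.0)
print(degree_days(hot_day))   # (0.0, 6.0)
```

These daily transforms become the regressors in the site-level models discussed in the applications below.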
8. Electricity Distribution Grid
[Diagram: power flows from Generation through Transmission and Distribution to the Consumer. The Bulk System includes utility-scale wind farms and photovoltaics (Aggregated Utility Scale, 2–50 MW) and Bulk Storage (> 50 MW); the Distribution System includes Utility Scale (100 kW–2 MW) and Distributed Scale (25 kW–100 kW) assets serving Residential and Commercial & Industrial consumers.]
11. The embarrassingly parallel ‘Primary Modeling Unit’:
I. Temporal: Sub-hourly, hourly, daily, monthly, annual
II. Cross-sectional: Clusters/segments, geography, system hierarchy
III. Hybrid: Structure- and year-specific
Databricks: Rapid deployment and development of existing analytics pipelines
Spark 2.0: SparkR allows for UDFs and partition-based model learning
- gapply, dapply, spark.lapply
Spark 2.1: Enables installing third-party packages on workers using spark.addFile
- SPARK-7159: Multiclass Logistic Regression in the DataFrame-based API
Analytical Solution
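The partition-based model learning above can be sketched without a cluster: split the data by Primary Modeling Unit (here, a site ID) and fit an independent model per partition. This plain-Python stand-in only illustrates the pattern that SparkR's `gapply` distributes; in SparkR the per-group fit runs on the workers, and the data below is synthetic:

```python
# Embarrassingly parallel pattern: group rows by site, fit one model per
# group, collect the coefficients. Each group's fit is independent, which
# is exactly why gapply can farm the groups out to Spark workers.
from collections import defaultdict

def fit_linear(xs, ys):
    """Closed-form ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b  # (intercept, slope)

def fit_per_site(rows):
    """rows: (site_id, temperature, load) tuples -> {site_id: (a, b)}."""
    parts = defaultdict(list)
    for site, temp, load in rows:
        parts[site].append((temp, load))
    return {site: fit_linear([t for t, _ in obs], [l for _, l in obs])
            for site, obs in parts.items()}

rows = [("A", 10, 30), ("A", 20, 50), ("A", 30, 70),
        ("B", 10, 15), ("B", 20, 15), ("B", 30, 15)]
models = fit_per_site(rows)  # site A is weather-sensitive, site B is flat
```

In SparkR the equivalent would be `gapply(df, "site_id", fitFunc, schema)`, with `fitFunc` running an R model fit on each site's rows.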
13. PRISM Algorithm
- Decomposes energy usage into its weather-driven and baseload components.
- Site-level modelling that combines both full and reduced-form models.
- Grid search over possible heating and cooling reference temperatures.
- Rich development history based on fundamental structural engineering principles.
- Origin: Miriam Goldberg's dissertation, "A Geometrical Approach to Non-differentiable Regression Models as Related to Methods for Assessing Residential Energy Conservation."
29. Spark 2.0 / 2.1 has allowed DNV GL’s existing expertise and code base to scale.
Databricks has provided an environment that accommodated existing codebases and facilitated rapid new development.
- Analytical contexts, prediction goals, and model selection processes define the Primary Modeling Unit (PMU) in any Energy Data Science application.
- The distributed computing framework must be able to scale with the appropriate Primary Modeling Unit for any Energy Data Science application.
Take Home Message
30. Modeling Additional Fuels
- Natural Gas (Therms)
- Water (Liters / Gallons)
- Hybrid (British Thermal Units)
Climate Change Simulations
- DNV GL’s BayTown System Dynamics Model
Electricity Grid Optimization with Distributed Energy Resource Assets
The Future!
Resource: http://spark.apache.org/releases/spark-release-2-1-0.html
Extension of Spark 2.1 to other energy-related analytics
Example: Climate Change Simulations in the San Francisco Bay Area (DNV GL’s BayTown System Dynamics Model).
Additional Fuels / Dependent Variables to Model:
Natural Gas (Therms)
Water (Liters / Gallons)
Hybrid (British Thermal Units)
Grid Optimization with Distributed Energy Resource Assets