LF Energy OpenEEmeter measures the energy impacts of demand-side interventions in buildings. OpenEEmeter 4.0 provides enhanced performance of the daily model with dramatically reduced seasonal and weekend/weekday bias, along with increased computational efficiency.
This webinar explores how OpenEEmeter 4.0:
- Reduces seasonal bias in the daily model by 84%
- Reduces weekend/weekday bias in the daily model by 95%
- Runs up to 100x faster with monthly data and 2-10x faster with daily data

Along with this release, the OpenEEmeter community is also publishing a detailed 4.0 model specification and the results of thorough testing conducted across the residential and commercial sectors and gas and electric fuels.
Speakers:
- Adam Scheer, Vice President of Applied Data Science, Recurve
- Travis Sikes, Data Science Manager, Recurve
- Jason Chulock, Lead Engineer, Recurve
2. Antitrust Policy Notice
Linux Foundation meetings involve participation by industry competitors, and it is the
intention of the Linux Foundation to conduct all of its activities in accordance with
applicable antitrust and competition laws. It is therefore extremely important that
attendees adhere to meeting agendas, and be aware of, and not participate in, any
activities that are prohibited under applicable US state, federal or foreign antitrust and
competition laws.
Examples of types of actions that are prohibited at Linux Foundation meetings and in
connection with Linux Foundation activities are described in the Linux Foundation
Antitrust Policy available at linuxfoundation.org/antitrust-policy. If you have
questions about these matters, please contact your company counsel, or if you are a
member of the Linux Foundation, feel free to contact Andrew Updegrove of the firm of
Gesmer Updegrove LLP, which provides legal counsel to the Linux Foundation.
3. Agenda
● Purpose and Brief History
● Methods Review
● Issues and Key Results
● Methods Advancements (The How)
○ Accuracy
○ Speed
● API Improvements
5. Purpose
● Establish "weights and measures" for demand-side programs
● Enable our industry to compete at scale, including against supply-side options
● Remove measurement barriers to integrated programs
7. OpenEEmeter Timeline
2012/2013: "CalTRACK" methods development initiated to calibrate building software tools
2016: OpenEEmeter 1.0 - Monthly Methods, Daily Methods, OpenEEmeter
2017: OpenEEmeter 3.0 - Daily Improvements, Hourly Methods
2019: OpenEEmeter joined LF Energy as an open source project
2024: OpenEEmeter 4.0 - New Daily Model, Vastly Improved API
8. Times They Are A-Changin'
The energy, utility, and demand-side program industries are changing fast.
We need measurement capabilities that enable, not inhibit, modern programs.
17. Computational Efficiency
OpenEEmeter 3.0 = Sloooowww
● Exhaustive grid search
● 1,891 models for every meter
● 20-60 seconds per meter
OpenEEmeter 4.0 = Efficient
● Can replicate OpenEEmeter 3.0 at ~0.5 seconds per meter (~100x faster)
25. Identified Issues: Ordinary Least Squares
Solution: Adaptive, robust loss function to down-weight outliers
[Chart: loss response vs. standard deviations from mean]
27. Identified Issues: Computational Efficiency
Grid search is slow compared to global optimization
[Figure: coarse grid search, 9 evaluations, vs. optimization, 9 evaluations]
28. Identified Issues: Computational Efficiency
Grid search is slow compared to global optimization
[Figure: fine grid search, 25 evaluations (1,891 models created), vs. optimization, 9 evaluations (1.5 models created)]
29. Identified Issues: Computational Efficiency
Solution: Use optimization to find balance points
Secret ingredient #1: Balance Point Optimization
● Initial guesses: balance points at 10% and 90% of the data
● Use the DIRECT global optimization method
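The balance point search described above can be sketched with SciPy's DIRECT implementation (`scipy.optimize.direct`, available in SciPy 1.8+). This is an illustrative toy, not the OpenEEmeter code: synthetic heating-only daily data, a single balance point, and an ordinary least squares slope fit inside the objective.

```python
import numpy as np
from scipy.optimize import direct

rng = np.random.default_rng(0)
temps = rng.uniform(-5, 35, 365)                   # synthetic daily outdoor temps (degC)
true_bp = 15.0
usage = 10 + 0.8 * np.maximum(true_bp - temps, 0)  # heating-only meter with slope 0.8
usage += rng.normal(0, 0.5, temps.size)            # measurement noise

def sse_at_balance_point(x):
    """Sum of squared residuals of an OLS fit at candidate balance point x[0]."""
    hdd = np.maximum(x[0] - temps, 0)              # heating degree days at this BP
    X = np.column_stack([np.ones_like(hdd), hdd])
    beta, *_ = np.linalg.lstsq(X, usage, rcond=None)
    resid = usage - X @ beta
    return float(resid @ resid)

# search within the 10th-90th percentile range of observed temperatures
lo, hi = np.quantile(temps, [0.1, 0.9])
res = direct(sse_at_balance_point, bounds=[(lo, hi)])
print(round(float(res.x[0]), 1))  # recovered balance point, near the true 15 degC
```

DIRECT evaluates the objective only where needed, rather than at every point on a fixed grid, which is where the evaluation savings on the previous slides come from.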
30. Identified Issues: Computational Efficiency
Solution: Use Elastic Net to fit only one model and penalize coefficients
Ordinary Least Squares goal
● Minimize residuals
Elastic Net goal
● Minimize residuals + coefficient magnitudes
31. Identified Issues: Computational Efficiency
Secret ingredient #2: Elastic Net, which fits a single penalized model in place of many candidate models
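The OLS-vs-Elastic-Net distinction can be demonstrated with scikit-learn. This is an illustrative toy: the synthetic features stand in for terms like heating and cooling degree days, and `alpha` is an arbitrary choice, not OpenEEmeter's tuning.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))      # 4 candidate features; only the first 2 matter
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0, 0.2, 200)

ols = LinearRegression().fit(X, y)       # minimizes residuals only
enet = ElasticNet(alpha=0.1).fit(X, y)   # minimizes residuals + L1/L2 coefficient penalty

# the penalty shrinks coefficients, driving irrelevant ones toward (or to) zero,
# so one penalized fit can stand in for a search over many candidate models
print(np.round(ols.coef_, 3))
print(np.round(enet.coef_, 3))
```

The practical consequence is that a single Elastic Net fit performs the variable selection that would otherwise require fitting and comparing many separate OLS models.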
33. How do we choose to split or not?
Strive for optimal fitting using test error
[Figure: spectrum from no splitting to all possible splits]
34. Experimental Design Considerations
Cannot assess test error from the reporting period
Bad assumption: reporting period = baseline period
Wouldn't it be nice if we could exclusively use baseline data and achieve predictive testing?
We need some tools!
35. Experimental Design Considerations
Can assess predictive error using cross validation
Why do we need CV?
● Best model parameters
Where? → baseline period!
Goal? → predictive testing
[Figure: average performance of all folds]
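The fold-averaging idea can be sketched with a 10-fold split of synthetic baseline data. This is a toy, not the OpenEEmeter pipeline: a single hypothetical heating-degree-day feature at a fixed balance point.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
temps = rng.uniform(-5, 35, 365)                     # one baseline year of daily temps
usage = 10 + 0.8 * np.maximum(15 - temps, 0) + rng.normal(0, 0.5, 365)
X = np.maximum(15 - temps, 0).reshape(-1, 1)         # HDD feature at a fixed balance point

# 10-fold CV entirely within the baseline period: each fold is held out in turn,
# the model is fit on the remaining folds, and predictive RMSE is averaged
rmses = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], usage[train_idx])
    resid = usage[test_idx] - model.predict(X[test_idx])
    rmses.append(np.sqrt(np.mean(resid**2)))
print(round(float(np.mean(rmses)), 2))  # average out-of-fold RMSE, a predictive error estimate
```

Because every fold is held out exactly once, the averaged RMSE estimates predictive error without touching any reporting-period data.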
36. Final Model
Cross validation would be too slow for 1M buildings
Cross validation
● Useful in development
● Untenable in the final product → computational time
Can we approximate CV?
● Yes! A selection criterion
● Selection Criterion = SSE + penalty
● But it's meant to reduce a master model
37. Final Model
Create a selection criterion function to select splits
What is the best penalization for model complexity?
● Selection Criterion = SSE + penalty (modified Bayesian Information Criterion)
What are the best parameters?
● Based on 10-fold cross validation RMSE (predictive)
● 6,000 meters (4,000 residential gas, 1,000 residential electric, 1,000 commercial electric)
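A BIC-style selection criterion of the "SSE + penalty" shape can be sketched as follows. The function, the `penalty_multiplier` knob, and the numbers are hypothetical illustrations, not OpenEEmeter's exact modified-BIC formula.

```python
import numpy as np

def bic_like_criterion(sse, n, k, penalty_multiplier=1.0):
    """BIC-style criterion: goodness of fit plus a complexity penalty.

    sse: sum of squared errors of the candidate model
    n:   number of observations
    k:   number of fitted parameters (grows with each added split)
    penalty_multiplier: tunes how harshly extra complexity is penalized
    """
    return n * np.log(sse / n) + penalty_multiplier * k * np.log(n)

# a split is kept only if its fit improvement outweighs the added parameters
n = 365
no_split = bic_like_criterion(sse=120.0, n=n, k=3)
with_split = bic_like_criterion(sse=110.0, n=n, k=6)  # better fit, more parameters
print(with_split < no_split)  # keep the split only if this is True
```

This is the approximation to cross validation from the previous slide: evaluating the criterion costs one pass over the data, instead of refitting the model on every fold.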
41. API Improvements
Inspired by Sklearn's simplicity
● Sklearn manages many complex models with a simple interface
● We should do the same

Clustering API:
cluster_algo = [
    cluster.MiniBatchKMeans(),
    cluster.AgglomerativeClustering(),
    cluster.Birch(),
    cluster.DBSCAN(),
]
for algo in cluster_algo:
    res = algo.fit_predict(X)

Regression API:
regres_algo = [
    linear_model.LinearRegression(),
    linear_model.ElasticNet(),
    linear_model.BayesianRidge(),
    linear_model.RANSACRegressor(),
]
for algo in regres_algo:
    algo.fit(X, y)
    res = algo.predict(X_new)

Completely different models, but almost the same API
42. API Improvements
Goal is ease of use
OpenEEmeter 3.0:
● Most steps copied from tutorials (user feedback)
● Different processes for daily and hourly modeling
● User sets options in function calls
● Intermediate information passed between function calls
OpenEEmeter 4.0:
● Simple function calls: initialize/fit/predict
● Same function calls regardless of model
● Sensible defaults
● Intermediate information kept within the class
44. Simplified Hourly Model

OpenEEmeter 3.0:
# create a design matrix for occupancy and segmentation
preliminary_design_matrix = create_caltrack_hourly_preliminary_design_matrix(
    baseline_meter_data, temperature_data, degc
)
# build 12 monthly models - each step from now on operates on each segment
segmentation = segment_time_series(
    preliminary_design_matrix.index, "three_month_weighted"
)
# assign an occupancy status to each hour of the week (0-167)
occupancy_lookup = estimate_hour_of_week_occupancy(
    preliminary_design_matrix, segmentation=segmentation
)
# assign temperatures to bins
(
    occupied_temperature_bins,
    unoccupied_temperature_bins,
) = fit_temperature_bins(
    preliminary_design_matrix,
    segmentation=segmentation,
    occupancy_lookup=occupancy_lookup,
)
# build a design matrix for each monthly segment
segmented_design_matrices = create_caltrack_hourly_segmented_design_matrices(
    preliminary_design_matrix,
    segmentation,
    occupancy_lookup,
    occupied_temperature_bins,
    unoccupied_temperature_bins,
)
# build a CalTRACK hourly model
baseline_model = fit_caltrack_hourly_model(
    segmented_design_matrices,
    occupancy_lookup,
    occupied_temperature_bins,
    unoccupied_temperature_bins,
)
# compute metered savings for the year of the reporting period we've selected
result, error_bands = metered_savings(
    baseline_model,
    reporting_meter_data,
    temperature_data,
    with_disaggregated=True,
    degc=degc,
)

OpenEEmeter 4.0:
baseline_data = HourlyBaselineData(baseline_df)
reporting_data = HourlyReportingData(reporting_df)
model = HourlyModel(settings=None).fit(baseline_data)
result = model.predict(reporting_data)
45. Data Class
Tracks disqualification and formats data for the Model class
● Tracks all data sufficiency
● Unique for each model type
● Must be run to pass to Model (can be bypassed in the model)
● Formats data for the Model class
● Violations are propagated to the Model class

baseline_data = BaselineData(baseline_df)
baseline_data.disqualification
baseline_data.warnings

Disqualification:
{
    'qualified_name': 'eemeter.sufficiency_criteria.too_many_days_with_missing_data',
    'description': 'Too many days in data have missing meter data or temperature data.',
    'data': {'n_valid_days': 251, 'n_days_total': 365}
}

Warnings:
{
    'qualified_name': 'eemeter.sufficiency_criteria.missing_high_frequency_meter_data',
    'description': 'More than 50% of the high frequency Meter data is missing.',
    'data': [Timestamp('2020-02-29 00:00:00+0000', tz='UTC')]
}
47. Conclusion
Model
● 84% less seasonal bias
● 95% less weekday/weekend bias
● Daily model is 2-10x faster
● Billing model is 100x faster
● Hyperparameters are broadly applicable
pip install eemeter
48. Conclusion
API
● Standard calls for all models (fit/predict)
● Data class
○ Formats data for models
○ Checks sufficiency
○ Provides disqualification reasons
pip install eemeter
49. People
Technical Steering Committee:
● Adam Scheer, Recurve
● McGee Young, WattCarbon
● Phil Ngo, Recurve
● Travis Sikes, Recurve
● Steve Suffian, WattCarbon
Key Contributors:
● Armin Aligholian, Recurve
● Jason Chulock, Recurve
● Joydeep Nag, Recurve
● Ethan Goldman, Resilient Edge
● Matt Fawcett, Carbon Co-op
● James Fenna, Carbon Co-op
50. Ongoing Work: Hourly Model!
● 10x faster
● Huge improvement for solar PV customers
● More flexible
● Data class
[Chart: error improvement (%) vs. percent daily cloudiness for solar PV customers]
Join the working group!
https://www.caltrack.org/technical-working-group.html
53. Why is there Seasonal Bias?
Seasonal error profiles show sharp features around the balance point temperature.
Space heating is initiated at warmer outside temperatures in winter.
54. Identified Issues: Ordinary Least Squares
Solution: Adaptive, robust loss function to down-weight outliers
[Chart: loss response vs. standard deviations from mean]
55. Identified Issues: Computational Efficiency
Solution: Only fit components once
Secret ingredient #3: Reuse component fits
● ~40-50 possible combinations of components
● Save component fits and reuse them
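The reuse idea above can be sketched with simple memoization. The component names and the stand-in fit function are hypothetical; OpenEEmeter's actual component set and fitting logic differ.

```python
from functools import lru_cache

# hypothetical component fit: each component (e.g. a heating slope, a cooling
# slope, a weekend term) is fit once and reused across every model combination
# that includes it, instead of being refit per combination
@lru_cache(maxsize=None)
def fit_component(name: str, balance_point: float) -> tuple:
    print(f"fitting {name} @ {balance_point}")  # shows each fit runs only once
    return (name, balance_point, 0.8)           # stand-in for fitted coefficients

combinations = [
    ("heating",),
    ("heating", "cooling"),
    ("heating", "cooling", "weekend"),
]
for combo in combinations:
    fitted = [fit_component(c, 15.0) for c in combo]

# "heating" appears in all three combinations but is fit only once;
# repeat requests are served from the cache
print(fit_component.cache_info())
```

With ~40-50 combinations sharing a much smaller pool of components, caching the component fits means the expensive work scales with the number of components rather than the number of combinations.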