SlideShare a Scribd company logo
1 of 18
Download to read offline
Spark for Behavior
Analysis Research
Ling Jin, Sam Borgeson, Anna Spurlock, Annika Todd
Doris Lee, Alex Sim, John Wu
Lawrence Berkeley National Lab
www.lbl.gov
Adam Smith’s Most Famous Books?
Behavioral Analysis Research 3
One-minute Behavioral Economics
Simpler to analyze and predict Harder to predict -- Need
massive data and computers
Behavioral Analysis Research 4
How Are Blackjack and Electric
Power Grid Similar?
Bust when demand is larger
than supply Bust when over 21
Behavioral Analysis Research 5
• Residential electricity records: 100,000 households,
hourly electricity usage
• A region where electricity usage peaks in the
summer time
• Randomized controlled trial of new rate structures -
households are randomly placed in different groups
• Control: using existing fixed rate for electricity
• Active: households opted in to Time-of-Use Pricing
• Passive: households defaulted in to Time-of-Use Pricing
• Data collected hourly over three years, one pre-
rate, two after (labeled T-1, T, T+1)
6
Reducing Peak Demands Through
Pricing
Behavioral Analysis Research
Goal Method Policy Implication
Better baseline models of energy
use
Gradient tree boosting Better program evaluation
Define relevant household
characteristics:
Classify and segment
households using easily
accessible data
• Define representative load
shapes
Adaptive K-means clustering
• Estimate household-specific
cooling change points (AC set
point)
Piecewise linear regression,
bootstrapping
• Characterize customers into
“Lifestyle Groups”
Blend behavioral theory with
machine learning techniques
• Define relevant household
energy characteristics
Simple feature algorithms (e.g.,
mean, min, max, peak usage;
variance; entropy, etc.)
7
Research Examples
Behavioral Analysis Research
Baselines are Critical for Measuring
Changes
Behavioral Analysis Research 8
Current Standard: randomized control group
Predictions from new Baseline Technique
Active group Passive group
9
Group PT PT +1 MT −PT MT+1 −PT+1
Control 1.960 1.957 0.069 -0.020
Passive 1.849 1.897 -0.027 -0.080
Active 1.860 1.903 -0.164 -0.164
Observation:the Active group reduced their uses of electricity consistently over the
two yeas of study during the peak-demand hours
P: predicted
M: measured
Behavioral Analysis Research
Goal Method Policy Implication
Better baseline models of energy
use
Gradient tree boosting Better program evaluation
Define relevant household
characteristics:
Classify and segment
households using easily
accessible data
• Define representative load
shapes
Adaptive K-means clustering
• Estimate household-specific
cooling change points (AC set
point)
Piecewise linear regression,
bootstrapping
• Characterize customers into
“Lifestyle Groups”
Blend behavioral theory with
machine learning techniques
• Define relevant household
energy characteristics
Simple feature algorithms (e.g.,
mean, min, max, peak usage;
variance; entropy, etc.)
10
Research Examples
Behavioral Analysis Research
Daily Load Shape: Definition
Objective / Definition
• 24-hour electricity usage
pattern
• To capture hour variations:
don’t care about constant
usage
• Want to use load shape to
study mechanism that might
affect electricity usage,
therefore concentrate on
discretionary demand
Samples
• A load shape is the
pattern created by 24
hours of demand
data.
• Shapes are produced
by patterns of
occupancy,
equipment
ownership, and
behavior
5
Defining Load Shapes
Flat/unoccupied Solar Morn/work/eve
Out for lunch? AC dominated?
Active at night?Behavioral Analysis Research 11
Clustering Process
Behavioral Analysis Research 12
Load data
preprocessing
Load shape
clustering
Post clustering
processing
Smart meter customer-
day load shape data
(24 hourly observations
per customer-day)
Subsample of
100,000
customer-day
load shapes
De-min*
Normalize
Eliminate load shapes
with missing data or
extremely low power
demand (below 0.2kW)
Adaptive k-means
algorithm
(parameter θ chosen)
Hierarchical
merging
Iterative truncation
algorithm*
(parameter V chosen)
Definition of final
dictionary of load shapes
Assign full dataset of
customer daily load
shapes to the closest
dictionary load shape
* Innovations we introduce
Components of Clustering Process
• Data cleaning and normalization
• Remove households with very low demand (<0.2kW)
• Normalization: compute hourly usage to hourly contribution to daily usage
• Clustering
• Centroid-based methods: Kmeans, Adaptive Kmeans
• Hierarchical clustering: distance metric, linkage
• Density-based clustering: DBSCAN
• Model-based clustering: GMM
• Judging cluster quality
• Compactness: MIA, VRSE
• Distinctness: SMI
• Combined: DBI, Silhouette
13
Clustering Quality Index
14
Clustering Quality
15
More compact
More distinct
Puzzling??
Best choice?
Kmeans Produces More Balanced Clusters
16
Sizes of clusters as fractions of total number of samples
Behavioral Analysis Research
Summary
Ø Examined a good number of clustering methods for identifying
daily usage profiles
Ø Short observation:
– Centroid-based method (Kmeans) works the best on this set of data
Ø Longer observations:
– There are too many choices and no (really) clear winner
– Centroid-based methods produce more compact clusters
– Centroid-based methods produce more balanced cluster
– DBSCAN declare many (up to 90%) data points as background
– Our observation is different from previous published results (Chicco
2012)
Ø Future work
– Maybe a different clustering quality metric would work better?
– Should examine alternative profile generation methods, other than
clustering
Behavioral Analysis Research 17
Thank You.
John Wu
Email: John.Wu@nersc.gov

More Related Content

What's hot

Presented by Ahmed Abdulhakim Al-Absi - Scaling map reduce applications acro...
Presented by Ahmed Abdulhakim Al-Absi -  Scaling map reduce applications acro...Presented by Ahmed Abdulhakim Al-Absi -  Scaling map reduce applications acro...
Presented by Ahmed Abdulhakim Al-Absi - Scaling map reduce applications acro...Absi Ahmed
 
Slide 1
Slide 1Slide 1
Slide 1butest
 
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataThe Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataRobert Grossman
 
Energy-aware Task Scheduling using Ant-colony Optimization in cloud
Energy-aware Task Scheduling using Ant-colony Optimization in cloudEnergy-aware Task Scheduling using Ant-colony Optimization in cloud
Energy-aware Task Scheduling using Ant-colony Optimization in cloudLinda J
 
Handling High Energy Physics Data using Cloud Computing
Handling High Energy Physics Data using Cloud ComputingHandling High Energy Physics Data using Cloud Computing
Handling High Energy Physics Data using Cloud ComputingAbhishek Dey
 
Ahmed Absi slides bigbwa
Ahmed Absi slides  bigbwaAhmed Absi slides  bigbwa
Ahmed Absi slides bigbwaAbsi Ahmed
 
Efficient processing of Rank-aware queries in Map/Reduce
Efficient processing of Rank-aware queries in Map/ReduceEfficient processing of Rank-aware queries in Map/Reduce
Efficient processing of Rank-aware queries in Map/ReduceSpiros Oikonomakis
 
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...Otávio Carvalho
 
Task Scheduling using Tabu Search algorithm in Cloud Computing Environment us...
Task Scheduling using Tabu Search algorithm in Cloud Computing Environment us...Task Scheduling using Tabu Search algorithm in Cloud Computing Environment us...
Task Scheduling using Tabu Search algorithm in Cloud Computing Environment us...AzarulIkhwan
 
Earth Science Platform
Earth Science PlatformEarth Science Platform
Earth Science PlatformTed Habermann
 
Graph computation
Graph computationGraph computation
Graph computationSigmoid
 
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirShare and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirSpark Summit
 
Introduction to machine learning with GPUs
Introduction to machine learning with GPUsIntroduction to machine learning with GPUs
Introduction to machine learning with GPUsCarol McDonald
 
The Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceThe Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceRobert Grossman
 
Modeling and Optimization of Resource Allocation in Cloud [PhD Thesis Progres...
Modeling and Optimization of Resource Allocation in Cloud [PhD Thesis Progres...Modeling and Optimization of Resource Allocation in Cloud [PhD Thesis Progres...
Modeling and Optimization of Resource Allocation in Cloud [PhD Thesis Progres...AtakanAral
 
Modeling and Optimization of Resource Allocation in Cloud [PhD Thesis Proposal]
Modeling and Optimization of Resource Allocation in Cloud [PhD Thesis Proposal]Modeling and Optimization of Resource Allocation in Cloud [PhD Thesis Proposal]
Modeling and Optimization of Resource Allocation in Cloud [PhD Thesis Proposal]AtakanAral
 
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Spark Summit
 

What's hot (19)

Data automation 101
Data automation 101Data automation 101
Data automation 101
 
Presented by Ahmed Abdulhakim Al-Absi - Scaling map reduce applications acro...
Presented by Ahmed Abdulhakim Al-Absi -  Scaling map reduce applications acro...Presented by Ahmed Abdulhakim Al-Absi -  Scaling map reduce applications acro...
Presented by Ahmed Abdulhakim Al-Absi - Scaling map reduce applications acro...
 
Slide 1
Slide 1Slide 1
Slide 1
 
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataThe Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
 
Energy-aware Task Scheduling using Ant-colony Optimization in cloud
Energy-aware Task Scheduling using Ant-colony Optimization in cloudEnergy-aware Task Scheduling using Ant-colony Optimization in cloud
Energy-aware Task Scheduling using Ant-colony Optimization in cloud
 
Handling High Energy Physics Data using Cloud Computing
Handling High Energy Physics Data using Cloud ComputingHandling High Energy Physics Data using Cloud Computing
Handling High Energy Physics Data using Cloud Computing
 
Ahmed Absi slides bigbwa
Ahmed Absi slides  bigbwaAhmed Absi slides  bigbwa
Ahmed Absi slides bigbwa
 
Efficient processing of Rank-aware queries in Map/Reduce
Efficient processing of Rank-aware queries in Map/ReduceEfficient processing of Rank-aware queries in Map/Reduce
Efficient processing of Rank-aware queries in Map/Reduce
 
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
 
Task Scheduling using Tabu Search algorithm in Cloud Computing Environment us...
Task Scheduling using Tabu Search algorithm in Cloud Computing Environment us...Task Scheduling using Tabu Search algorithm in Cloud Computing Environment us...
Task Scheduling using Tabu Search algorithm in Cloud Computing Environment us...
 
Earth Science Platform
Earth Science PlatformEarth Science Platform
Earth Science Platform
 
Graph computation
Graph computationGraph computation
Graph computation
 
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirShare and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
 
STDCS
STDCSSTDCS
STDCS
 
Introduction to machine learning with GPUs
Introduction to machine learning with GPUsIntroduction to machine learning with GPUs
Introduction to machine learning with GPUs
 
The Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceThe Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of Science
 
Modeling and Optimization of Resource Allocation in Cloud [PhD Thesis Progres...
Modeling and Optimization of Resource Allocation in Cloud [PhD Thesis Progres...Modeling and Optimization of Resource Allocation in Cloud [PhD Thesis Progres...
Modeling and Optimization of Resource Allocation in Cloud [PhD Thesis Progres...
 
Modeling and Optimization of Resource Allocation in Cloud [PhD Thesis Proposal]
Modeling and Optimization of Resource Allocation in Cloud [PhD Thesis Proposal]Modeling and Optimization of Resource Allocation in Cloud [PhD Thesis Proposal]
Modeling and Optimization of Resource Allocation in Cloud [PhD Thesis Proposal]
 
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
 

Viewers also liked

Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Spark Summit
 
Tuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache SparkTuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache SparkDatabricks
 
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Spark Summit
 
Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon OuelletteTime Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon OuelletteSpark Summit
 
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion StoicaRISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion StoicaSpark Summit
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...Spark Summit
 
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...Spark Summit
 
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Spark Summit
 
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...Spark Summit
 
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...Spark Summit
 
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...Spark Summit
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...Spark Summit
 
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...Spark Summit
 
Bulletproof Jobs: Patterns for Large-Scale Spark Processing: Spark Summit Eas...
Bulletproof Jobs: Patterns for Large-Scale Spark Processing: Spark Summit Eas...Bulletproof Jobs: Patterns for Large-Scale Spark Processing: Spark Summit Eas...
Bulletproof Jobs: Patterns for Large-Scale Spark Processing: Spark Summit Eas...Spark Summit
 
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...Spark Summit
 
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...Spark Summit
 
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Spark Summit
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Spark Summit
 

Viewers also liked (20)

Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
 
Tuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache SparkTuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache Spark
 
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
 
Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon OuelletteTime Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
 
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion StoicaRISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...
 
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
 
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...
 
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
 
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
 
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
 
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
 
Bulletproof Jobs: Patterns for Large-Scale Spark Processing: Spark Summit Eas...
Bulletproof Jobs: Patterns for Large-Scale Spark Processing: Spark Summit Eas...Bulletproof Jobs: Patterns for Large-Scale Spark Processing: Spark Summit Eas...
Bulletproof Jobs: Patterns for Large-Scale Spark Processing: Spark Summit Eas...
 
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...
 
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
 
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
 

Similar to Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

Machine Learning / AI Use Case for Energy
Machine Learning / AI Use Case for EnergyMachine Learning / AI Use Case for Energy
Machine Learning / AI Use Case for EnergyJustin Hayward
 
Demand side management
Demand side managementDemand side management
Demand side managementRohit Garg
 
Energy audit by Qazi Arsalan Hamid-Dy Manager Technical KESC
Energy audit by Qazi Arsalan Hamid-Dy Manager Technical KESCEnergy audit by Qazi Arsalan Hamid-Dy Manager Technical KESC
Energy audit by Qazi Arsalan Hamid-Dy Manager Technical KESCQazi Arsalan Hamid
 
Participatory engagement of stakeholders with energy models
Participatory engagement of stakeholders with energy modelsParticipatory engagement of stakeholders with energy models
Participatory engagement of stakeholders with energy modelsIEA-ETSAP
 
Smart Residential Energy Management System Using Machine Learning.pptx
Smart Residential Energy Management System Using Machine Learning.pptxSmart Residential Energy Management System Using Machine Learning.pptx
Smart Residential Energy Management System Using Machine Learning.pptx083AbdulJavvad
 
Advance Data Mining - Analysis and forecasting of power factor for optimum el...
Advance Data Mining - Analysis and forecasting of power factor for optimum el...Advance Data Mining - Analysis and forecasting of power factor for optimum el...
Advance Data Mining - Analysis and forecasting of power factor for optimum el...Shrikant Samarth
 
Modelling loads and responses in project SGEM (Smart Grids and Energy Markets)
Modelling loads and responses in project SGEM (Smart Grids and Energy Markets)Modelling loads and responses in project SGEM (Smart Grids and Energy Markets)
Modelling loads and responses in project SGEM (Smart Grids and Energy Markets)IEA DSM Implementing Agreement (IA)
 
Energy efficiency in wireless sensor network(ce 16 aniket choudhury)
Energy efficiency in wireless sensor network(ce 16 aniket choudhury)Energy efficiency in wireless sensor network(ce 16 aniket choudhury)
Energy efficiency in wireless sensor network(ce 16 aniket choudhury)अनिकेत चौधरी
 
Integrating renewables and enabling flexibility of households and buildings
Integrating renewables and enabling flexibility of households and buildingsIntegrating renewables and enabling flexibility of households and buildings
Integrating renewables and enabling flexibility of households and buildingsLeonardo ENERGY
 
Intelligent load management system
Intelligent load management systemIntelligent load management system
Intelligent load management systemVineela Reddy
 
Greg Thomson - Community Microgrid - (17 oct 2014)
Greg Thomson - Community Microgrid - (17 oct 2014)Greg Thomson - Community Microgrid - (17 oct 2014)
Greg Thomson - Community Microgrid - (17 oct 2014)annphancock
 
e-Research & the art of linking Astrophysics to Deforestation
e-Research & the art of linking Astrophysics to Deforestatione-Research & the art of linking Astrophysics to Deforestation
e-Research & the art of linking Astrophysics to DeforestationDavid Wallom
 
Net Energy Metering and Smart Meters
Net Energy Metering and Smart MetersNet Energy Metering and Smart Meters
Net Energy Metering and Smart MetersMark Valen
 
AI BASED PPT FOR PROJCTS USEFUL FOR EDITING
AI BASED PPT FOR PROJCTS USEFUL FOR EDITINGAI BASED PPT FOR PROJCTS USEFUL FOR EDITING
AI BASED PPT FOR PROJCTS USEFUL FOR EDITINGLokesh147875
 
Case Study: Operational Energy Reduction through Data Analysis & Virtual Benc...
Case Study: Operational Energy Reduction through Data Analysis & Virtual Benc...Case Study: Operational Energy Reduction through Data Analysis & Virtual Benc...
Case Study: Operational Energy Reduction through Data Analysis & Virtual Benc...EMEX
 
enCOMPASS at GIoT 2017
enCOMPASS at GIoT 2017enCOMPASS at GIoT 2017
enCOMPASS at GIoT 2017encompassH2020
 
10766012 ranalitics
10766012 ranalitics10766012 ranalitics
10766012 ranaliticsJason Chen
 
IRJET- An Energy Conservation Scheme based on Tariff Moderation
IRJET- An Energy Conservation Scheme based on Tariff ModerationIRJET- An Energy Conservation Scheme based on Tariff Moderation
IRJET- An Energy Conservation Scheme based on Tariff ModerationIRJET Journal
 
Demand Response Analysis in Buildings and Energy Systems Integration
Demand Response Analysis in Buildings and Energy Systems IntegrationDemand Response Analysis in Buildings and Energy Systems Integration
Demand Response Analysis in Buildings and Energy Systems IntegrationSaif Salih
 
Hydro-Thermal Scheduling: Using Soft Computing Technique Approch
Hydro-Thermal Scheduling: Using Soft Computing Technique ApprochHydro-Thermal Scheduling: Using Soft Computing Technique Approch
Hydro-Thermal Scheduling: Using Soft Computing Technique ApprochIOSR Journals
 

Similar to Spark for Behavioral Analytics Research: Spark Summit East talk by John W u (20)

Machine Learning / AI Use Case for Energy
Machine Learning / AI Use Case for EnergyMachine Learning / AI Use Case for Energy
Machine Learning / AI Use Case for Energy
 
Demand side management
Demand side managementDemand side management
Demand side management
 
Energy audit by Qazi Arsalan Hamid-Dy Manager Technical KESC
Energy audit by Qazi Arsalan Hamid-Dy Manager Technical KESCEnergy audit by Qazi Arsalan Hamid-Dy Manager Technical KESC
Energy audit by Qazi Arsalan Hamid-Dy Manager Technical KESC
 
Participatory engagement of stakeholders with energy models
Participatory engagement of stakeholders with energy modelsParticipatory engagement of stakeholders with energy models
Participatory engagement of stakeholders with energy models
 
Smart Residential Energy Management System Using Machine Learning.pptx
Smart Residential Energy Management System Using Machine Learning.pptxSmart Residential Energy Management System Using Machine Learning.pptx
Smart Residential Energy Management System Using Machine Learning.pptx
 
Advance Data Mining - Analysis and forecasting of power factor for optimum el...
Advance Data Mining - Analysis and forecasting of power factor for optimum el...Advance Data Mining - Analysis and forecasting of power factor for optimum el...
Advance Data Mining - Analysis and forecasting of power factor for optimum el...
 
Modelling loads and responses in project SGEM (Smart Grids and Energy Markets)
Modelling loads and responses in project SGEM (Smart Grids and Energy Markets)Modelling loads and responses in project SGEM (Smart Grids and Energy Markets)
Modelling loads and responses in project SGEM (Smart Grids and Energy Markets)
 
Energy efficiency in wireless sensor network(ce 16 aniket choudhury)
Energy efficiency in wireless sensor network(ce 16 aniket choudhury)Energy efficiency in wireless sensor network(ce 16 aniket choudhury)
Energy efficiency in wireless sensor network(ce 16 aniket choudhury)
 
Integrating renewables and enabling flexibility of households and buildings
Integrating renewables and enabling flexibility of households and buildingsIntegrating renewables and enabling flexibility of households and buildings
Integrating renewables and enabling flexibility of households and buildings
 
Intelligent load management system
Intelligent load management systemIntelligent load management system
Intelligent load management system
 
Greg Thomson - Community Microgrid - (17 oct 2014)
Greg Thomson - Community Microgrid - (17 oct 2014)Greg Thomson - Community Microgrid - (17 oct 2014)
Greg Thomson - Community Microgrid - (17 oct 2014)
 
e-Research & the art of linking Astrophysics to Deforestation
e-Research & the art of linking Astrophysics to Deforestatione-Research & the art of linking Astrophysics to Deforestation
e-Research & the art of linking Astrophysics to Deforestation
 
Net Energy Metering and Smart Meters
Net Energy Metering and Smart MetersNet Energy Metering and Smart Meters
Net Energy Metering and Smart Meters
 
AI BASED PPT FOR PROJCTS USEFUL FOR EDITING
AI BASED PPT FOR PROJCTS USEFUL FOR EDITINGAI BASED PPT FOR PROJCTS USEFUL FOR EDITING
AI BASED PPT FOR PROJCTS USEFUL FOR EDITING
 
Case Study: Operational Energy Reduction through Data Analysis & Virtual Benc...
Case Study: Operational Energy Reduction through Data Analysis & Virtual Benc...Case Study: Operational Energy Reduction through Data Analysis & Virtual Benc...
Case Study: Operational Energy Reduction through Data Analysis & Virtual Benc...
 
enCOMPASS at GIoT 2017
enCOMPASS at GIoT 2017enCOMPASS at GIoT 2017
enCOMPASS at GIoT 2017
 
10766012 ranalitics
10766012 ranalitics10766012 ranalitics
10766012 ranalitics
 
IRJET- An Energy Conservation Scheme based on Tariff Moderation
IRJET- An Energy Conservation Scheme based on Tariff ModerationIRJET- An Energy Conservation Scheme based on Tariff Moderation
IRJET- An Energy Conservation Scheme based on Tariff Moderation
 
Demand Response Analysis in Buildings and Energy Systems Integration
Demand Response Analysis in Buildings and Energy Systems IntegrationDemand Response Analysis in Buildings and Energy Systems Integration
Demand Response Analysis in Buildings and Energy Systems Integration
 
Hydro-Thermal Scheduling: Using Soft Computing Technique Approch
Hydro-Thermal Scheduling: Using Soft Computing Technique ApprochHydro-Thermal Scheduling: Using Soft Computing Technique Approch
Hydro-Thermal Scheduling: Using Soft Computing Technique Approch
 

More from Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Recently uploaded

Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 

Recently uploaded (20)

Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 

Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

  • 1. Spark for Behavior Analysis Research Ling Jin, Sam Borgeson, Anna Spurlock, Annika Todd Doris Lee, Alex Sim, John Wu Lawrence Berkeley National Lab
  • 3. Adam Smith’s Most Famous Books? Behavioral Analysis Research 3
  • 4. One-minute Behavioral Economics Simpler to analyze and predict Harder to predict -- Need massive data and computers Behavioral Analysis Research 4
  • 5. How Are Blackjack and Electric Power Grid Similar? Bust when demand is larger than supply Bust when over 21 Behavioral Analysis Research 5
  • 6. • Residential electricity records: 100,000 households, hourly electricity usage • A region where electricity usage peaks in the summer time • Randomized controlled trial of new rate structures - households are randomly placed in different groups • Control: using existing fixed rate for electricity • Active: households opted in to Time-of-Use Pricing • Passive: households defaulted in to Time-of-Use Pricing • Data collected hourly over three years, one pre- rate, two after (labeled T-1, T, T+1) 6 Reducing Peak Demands Through Pricing Behavioral Analysis Research
  • 7. Goal Method Policy Implication Better baseline models of energy use Gradient tree boosting Better program evaluation Define relevant household characteristics: Classify and segment households using easily accessible data • Define representative load shapes Adaptive K-means clustering • Estimate household-specific cooling change points (AC set point) Piecewise linear regression, bootstrapping • Characterize customers into “Lifestyle Groups” Blend behavioral theory with machine learning techniques • Define relevant household energy characteristics Simple feature algorithms (e.g., mean, min, max, peak usage; variance; entropy, etc.) 7 Research Examples Behavioral Analysis Research
  • 8. Baselines are Critical for Measuring Changes Behavioral Analysis Research 8 Current Standard: randomized control group
  • 9. Predictions from new Baseline Technique Active group Passive group 9 Group PT PT +1 MT −PT MT+1 −PT+1 Control 1.960 1.957 0.069 -0.020 Passive 1.849 1.897 -0.027 -0.080 Active 1.860 1.903 -0.164 -0.164 Observation:the Active group reduced their uses of electricity consistently over the two yeas of study during the peak-demand hours P: predicted M: measured Behavioral Analysis Research
  • 10. Goal Method Policy Implication Better baseline models of energy use Gradient tree boosting Better program evaluation Define relevant household characteristics: Classify and segment households using easily accessible data • Define representative load shapes Adaptive K-means clustering • Estimate household-specific cooling change points (AC set point) Piecewise linear regression, bootstrapping • Characterize customers into “Lifestyle Groups” Blend behavioral theory with machine learning techniques • Define relevant household energy characteristics Simple feature algorithms (e.g., mean, min, max, peak usage; variance; entropy, etc.) 10 Research Examples Behavioral Analysis Research
  • 11. Daily Load Shape: Definition Objective / Definition • 24-hour electricity usage pattern • To capture hour variations: don’t care about constant usage • Want to use load shape to study mechanism that might affect electricity usage, therefore concentrate on discretionary demand Samples • A load shape is the pattern created by 24 hours of demand data. • Shapes are produced by patterns of occupancy, equipment ownership, and behavior 5 Defining Load Shapes Flat/unoccupied Solar Morn/work/eve Out for lunch? AC dominated? Active at night?Behavioral Analysis Research 11
  • 12. Clustering Process Behavioral Analysis Research 12 Load data preprocessing Load shape clustering Post clustering processing Smart meter customer- day load shape data (24 hourly observations per customer-day) Subsample of 100,000 customer-day load shapes De-min* Normalize Eliminate load shapes with missing data or extremely low power demand (below 0.2kW) Adaptive k-means algorithm (parameter θ chosen) Hierarchical merging Iterative truncation algorithm* (parameter V chosen) Definition of final dictionary of load shapes Assign full dataset of customer daily load shapes to the closest dictionary load shape * Innovations we introduce
  • 13. Components of Clustering Process • Data cleaning and normalization • Remove households with very low demand (<0.2kW) • Normalization: compute hourly usage to hourly contribution to daily usage • Clustering • Centroid-based methods: Kmeans, Adaptive Kmeans • Hierarchical clustering: distance metric, linkage • Density-based clustering: DBSCAN • Model-based clustering: GMM • Judging cluster quality • Compactness: MIA, VRSE • Distinctness: SMI • Combined: DBI, Silhouette 13
  • 15. Clustering Quality 15 More compact More distinct Puzzling?? Best choice?
  • 16. Kmeans Produces More Balanced Clusters 16 Sizes of clusters as fractions of total number of samples Behavioral Analysis Research
  • 17. Summary Ø Examined a good number of clustering methods for identifying daily usage profiles Ø Short observation: – Centroid-based method (Kmeans) works the best on this set of data Ø Longer observations: – There are too many choices and no (really) clear winner – Centroid-based methods produce more compact clusters – Centroid-based methods produce more balanced cluster – DBSCAN declare many (up to 90%) data points as background – Our observation is different from previous published results (Chicco 2012) Ø Future work – Maybe a different clustering quality metric would work better? – Should examine alternative profile generation methods, other than clustering Behavioral Analysis Research 17
  • 18. Thank You. John Wu Email: John.Wu@nersc.gov