Analysis of hourly electricity consumption to 
characterise a large amount of customers using 
clustering techniques 
X.Cipriano, G.Mor
CASE STUDY: Mallorca
• 8,536 residential users with smart meters (half-hourly electricity consumption).
• The utility is called “El Gas” and is located in the town of Sòller, on the island of Mallorca.
• 12 months of half-hourly data are available.
OBJECTIVES 
• To identify the relevant indices or parameters of hourly consumption that define the users’ electricity behaviour.
• To characterise the main groups of dwellings according to the patterns of their hourly energy load profiles (clustering).
• To support decision-making on schemes and awareness campaigns, and to model the energy behaviour of a large number of dwellings.
AVAILABLE DATA
– Monthly consumption
– Hourly consumption
– Results of other services
– Weather data (using the closest meteorological station available)
– Contracted tariff
– Contracted power
– Location
– Year of construction
– Type of construction (flat, attached, detached, …)
– No. of rooms, area, no. of occupants, occupancy patterns, type of domestic appliances, type of HVAC systems, thermostat temperatures, glazed area of the dwelling, first/second residence, …
[Slide labels: Possible / Impossible / Data we really have]
STEPS OF WORK
1. Data collection and pre-processing of half-hourly data
2. Selection of relevant indicators/parameters
3. Calculation of degree of weather dependency (for all users)
4. Segmentation of groups of dwellings (clustering with SOM + k-means)
5. Load shape curves visualization and characterisation for each cluster
6. Automated saving tips for each user according to the clustering results
DATA PRE-PROCESSING 
PRE-DEFINITION OF INDICATORS 
Aim: to highlight the load shapes that occur most frequently each month.
• Energy consumption indicators: related to hourly consumption in different periods of the day
• User behaviour indicators: related to when consumption occurs and how intense it is
• Weather dependency indicators
• Complementary indicators: related to other existing data
DATA PRE-PROCESSING 
ENERGY CONSUMPTION INDICATORS (E.I.) 
For each month of data, we defined a typical day:
• Emonth
• Edaily: daily average of the month (kWh/day)
• En: hourly average at night [01:00 – 05:59] (kWh/h)
We defined 7 periods of consumption per day.
No significant differences between weekdays and weekends were found.
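As a rough illustration of the two indicators defined above, the following Python sketch computes Edaily and En from half-hourly readings. The function name, the `(timestamp, kWh)` input layout, and the convention that each timestamp marks the start of its half-hour interval are assumptions made for this example and do not come from the slides.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def typical_day_indicators(readings):
    """Return (Edaily in kWh/day, En in kWh/h) for one month of readings.

    readings: iterable of (datetime, kWh) pairs, one per half-hour interval.
    Assumption: the timestamp marks the START of the interval.
    """
    daily_total = defaultdict(float)
    night_total = defaultdict(float)
    for ts, kwh in readings:
        day = ts.date()
        daily_total[day] += kwh
        # Night period from the slide: 01:00 to 05:59, i.e. 5 full hours.
        if 1 <= ts.hour < 6:
            night_total[day] += kwh
    n_days = len(daily_total)
    e_daily = sum(daily_total.values()) / n_days
    e_night = sum(night_total.values()) / (n_days * 5)
    return e_daily, e_night


# Two days of a constant 0.5 kWh per half hour (hypothetical data).
start = datetime(2014, 1, 1)
flat_profile = [(start + timedelta(minutes=30 * i), 0.5) for i in range(96)]
e_daily, e_night = typical_day_indicators(flat_profile)
```

The same accumulation pattern would extend to the other six periods once their time windows are known; only the night window is given in the transcript.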
DATA PRE-PROCESSING
USER’S BEHAVIOUR INDICATORS
Monthly calculation of:
• Ct: Contracted tariff
• P: Contracted power
• Ndb10: Number of days below the 10th percentile of consumption
• Htowd: Daily occupied hours on weekdays (number of hours with consumption > residual consumption)
• Htowe: Daily occupied hours on weekends
• Pmawe, Pmawd: Period of the day with maximum consumption, weekends/weekdays
• Pmiwe, Pmiwd: Period of the day with minimum consumption, weekends/weekdays
• Dma: Day with maximum daily consumption
• Dmi: Day with minimum daily consumption
• Hma: Time of maximum hourly consumption
• Nhma: Number of hours with the maximum hourly consumption
• Hmi: Time of minimum hourly consumption
• Nhmi: Number of hours with the minimum hourly consumption
• Hp1: First time when P > 1 kW
• Hp2: First time when P > 2 kW
• Hp3: First time when P reaches its maximum
PRE-PROCESSING OF DATA 
TIME PERIOD OF CALCULATION 
Indicators have been calculated for each dwelling over monthly, seasonal, and yearly periods:
• MONTHLY: Mean value of the hourly average (or daily aggregate) of the daily periods during the month.
• SEASONAL: Monthly average for each season.
SUMMER: June-Sept., WINTER: Dec.-March, AUTUMN (mid-season): Apr.-May, Oct.-Nov.
• YEARLY: Monthly average for the whole year
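The aggregation above can be sketched in a few lines of Python. The month-to-season mapping follows the slide; the function names and the `{month: value}` input layout are assumptions made for this illustration.

```python
# Month-to-season mapping taken from the slide (month numbers 1..12).
SEASON_BY_MONTH = {
    6: "summer", 7: "summer", 8: "summer", 9: "summer",
    12: "winter", 1: "winter", 2: "winter", 3: "winter",
    4: "midseason", 5: "midseason", 10: "midseason", 11: "midseason",
}


def seasonal_means(monthly_values):
    """Average a monthly indicator per season.

    monthly_values: dict {month number: monthly value of one indicator}.
    """
    grouped = {}
    for month, value in monthly_values.items():
        grouped.setdefault(SEASON_BY_MONTH[month], []).append(value)
    return {season: sum(vals) / len(vals) for season, vals in grouped.items()}


def yearly_mean(monthly_values):
    """Average of the monthly values over the whole year."""
    return sum(monthly_values.values()) / len(monthly_values)


# Hypothetical monthly values of one indicator for one dwelling.
example = {1: 10, 2: 20, 3: 20, 12: 30, 6: 1, 7: 1, 8: 1, 9: 1,
           4: 2, 5: 2, 10: 4, 11: 4}
by_season = seasonal_means(example)
```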
SELECTION OF RELEVANT INDICATORS 
SELECTION OF USER’S BEHAVIOUR INDICATORS (UBI) 
Procedure for selecting the relevant indicators:
1. We performed a PCA of the UBIs to reduce the number of parameters. Five principal components (PCs) were obtained.
2. k-means clustering (R software) was applied to these 5 PCs, calculated for all users. Clustering quality was poor (silhouette index = 0.3-0.32, DBI = 0.6-0.65).
3. Results improved when a SOM was trained first and k-means was applied to the SOM prototypes: the SI increased to 0.45-0.5, with a DBI around 0.7.
• We found it difficult to interpret the physical meaning of the PCA results when trying to characterise the groups obtained.
• We found that the periods of maximum and minimum consumption (Pmawe, Pmawd, Pmiwe, Pmiwd) were the most influential indicators in the PCA.
SELECTION OF RELEVANT INDICATORS 
SELECTION OF USER’S BEHAVIOUR INDICATORS (UBI) 
New group of indicators (Qualitative Indicators):
• We defined p1, p2, p3, p4, p5, p6, p7 as the ranking of the daily periods by average hourly consumption over the month: p1 is the period with the highest consumption and p7 the period with the lowest.
• The Energy Indicators can then be dropped from the clustering, because the ranking already encodes them. For example, if p1 = 3, Enn is the largest value (kWh) compared with the rest of the E.I. (Ed, Ea, El, Em, En, Ee).
• New indicators are added: SD and the percentiles of consumption.
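The ranking described above can be illustrated with a short Python sketch. The numeric period codes (1 to 7) and the tie-breaking rule (lower code first) are assumptions for this example; the slides do not state how ties are handled.

```python
def rank_periods(period_means):
    """Rank daily periods by average hourly consumption.

    period_means: dict {period code: mean kWh/h over the month}.
    Returns {"p1": code of highest period, ..., "p7": code of lowest}.
    Ties are broken by the lower period code (an assumption of this sketch).
    """
    ordered = sorted(period_means, key=lambda code: (-period_means[code], code))
    return {f"p{rank + 1}": code for rank, code in enumerate(ordered)}


# Hypothetical monthly means for the seven periods of one dwelling.
means = {1: 0.2, 2: 0.5, 3: 0.9, 4: 0.4, 5: 0.7, 6: 0.3, 7: 0.1}
ranking = rank_periods(means)
```

Here `ranking["p1"]` is the code of the period with the highest consumption, which is the sense in which the ranking replaces the raw energy indicators.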
SELECTION OF RELEVANT INDICATORS 
SELECTION OF USER’S BEHAVIOUR INDICATORS (UBI) 
We start with 13 indicators: 7 qualitative indicators, 5 percentile indicators (5perc, 25perc, median, 75perc, 90perc), and the hourly SD.
• Missing values are eliminated.
• Min-max normalization is performed to scale the values to a predetermined range (0 to 1).
• The correlation matrix is calculated; a correlation is considered high if it is > 0.8.
We finally select 6 indicators for clustering: p1, p2, p6, p7, SD, and Perc25.
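A minimal Python sketch of the normalization and correlation screening follows. The slide does not say which indicator of a highly correlated pair is discarded, so the greedy rule used here (keep an indicator only if it is not highly correlated with any indicator already kept) is an assumption, as are all function names.

```python
import math


def min_max(values):
    """Scale a list of numbers to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, threshold=0.8):
    """Greedy filter: keep a column only if |r| <= threshold with all kept ones.

    columns: dict {indicator name: list of values}, scanned in insertion order.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical indicator columns for four dwellings.
raw = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.5], "c": [4, 1, 3, 2]}
scaled = {name: min_max(vals) for name, vals in raw.items()}
selected = drop_correlated(scaled)
```

Because the filter depends on scan order, a different ordering of the indicators can keep a different member of each correlated pair.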
SEGMENTATION OF DWELLINGS
CLUSTERING PROCEDURE
[Flowchart. Elements: U.B.I. indicators (p1, p2, p6, p7, SD, Perc25); SOM; prototypes; K-means clustering; decision “Are cohesion and dissimilitude assured?” (YES/NO); calculation of complementary indicators; second decision “Are cohesion and dissimilitude assured?” (YES/NO); baseload curves and comfort parameters of clusters]
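The two-stage idea (train a SOM, then run k-means on the SOM prototypes and give each dwelling the cluster of its best-matching prototype) can be sketched in plain Python as below. This is not the authors' R implementation: the grid size, learning-rate and neighbourhood schedules, the farthest-point k-means initialisation, and all names are assumptions chosen to keep the example small and deterministic.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vectors, x):
    """Index of the vector closest to x (first one wins on ties)."""
    return min(range(len(vectors)), key=lambda i: sq_dist(vectors[i], x))


def train_som(data, rows=3, cols=3, epochs=30, seed=0):
    """Train a small rectangular SOM and return its prototype vectors."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    protos = [list(rng.choice(data)) for _ in grid]
    total = epochs * len(data)
    sigma0 = max(rows, cols) / 2.0
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):
            frac = step / total
            lr = 0.5 * (1.0 - frac) + 0.01        # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.3   # decaying neighbourhood width
            bmu = nearest(protos, x)
            for i, pos in enumerate(grid):
                g = math.exp(-sq_dist(pos, grid[bmu]) / (2.0 * sigma ** 2))
                protos[i] = [w + lr * g * (xi - w) for w, xi in zip(protos[i], x)]
            step += 1
    return protos


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [nearest(centers, p) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def cluster_dwellings(data, k, **som_args):
    """SOM followed by k-means on the prototypes; returns one label per row."""
    protos = train_som(data, **som_args)
    proto_labels, _ = kmeans(protos, k)
    return [proto_labels[nearest(protos, x)] for x in data]
```

Clustering the prototypes rather than the raw rows means k-means works on a small, smoothed summary of the data, which is one plausible reason for the quality improvement reported in the slides.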
SEGMENTATION OF DWELLINGS 
CLUSTERING PROCEDURE 
Are cohesion and dissimilitude assured?
Silhouette Index: a graphical representation of how well each object lies within its cluster (Peter J. Rousseeuw, 1987).
s(i) is a quantity between -1 and 1; a value near 1 indicates that the sample is assigned to the right cluster.
Davies-Bouldin Index (David L. Davies and Donald W. Bouldin, 1979): an internal evaluation scheme, in which the quality of the clustering is validated using quantities and features inherent to the dataset.
A lower DBI value means a better clustering.
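For reference, both indices can be computed directly from their standard definitions. The sketch below uses Euclidean distance and assumes at least two clusters; the function names are illustrative and are not taken from the slides.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group(points, labels):
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def silhouette_index(points, labels):
    """Mean silhouette s(i) over all points (requires >= 2 clusters)."""
    clusters = group(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def davies_bouldin_index(points, labels):
    """Davies-Bouldin index; assumes >= 2 clusters with distinct centroids."""
    clusters = group(points, labels)
    cent = {k: [sum(c) / len(pts) for c in zip(*pts)] for k, pts in clusters.items()}
    scatter = {k: sum(dist(p, cent[k]) for p in pts) / len(pts)
               for k, pts in clusters.items()}
    worst = [max((scatter[i] + scatter[j]) / dist(cent[i], cent[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(worst)


# Two tight, well-separated groups (hypothetical points).
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good_si = silhouette_index(pts, [0, 0, 1, 1])
bad_si = silhouette_index(pts, [0, 1, 0, 1])
good_dbi = davies_bouldin_index(pts, [0, 0, 1, 1])
```

A good partition gives a silhouette close to 1 and a small DBI, while a poor one pushes the silhouette towards negative values.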
SEGMENTATION OF DWELLINGS 
CLUSTERING RESULTS: WINTER 
[Figure: winter clustering map, clusters labelled 1 to 10]
SEGMENTATION OF DWELLINGS 
CLUSTERING RESULTS: SUMMER 
[Figure: summer clustering map, clusters labelled 1 to 8]
SEGMENTATION OF DWELLINGS 
CLUSTERING RESULTS: MIDSEASON 
[Figure: mid-season clustering map, clusters labelled 1 to 9]
SEGMENTATION OF DWELLINGS 
CLUSTERING RESULTS: SUMMER 
Energy Indicators 
C4 has the highest electricity consumption in all periods, as well as the highest daily and monthly consumption (700 kWh/month). It is followed by C1 (450 kWh/month) and C8 (250 kWh/month). C2, C5 and C7 have similar consumption (around 200 kWh/month), and C6 has almost no consumption.
Except for C4, the clusters show small variations in all indicators, which means that the mean/median value is fairly representative of the group.
The median hourly SD is around 0.5 kWh (C1), 0.25 (C2, C3, C5, C7), 0 (C6), 0.8 (C4) and 0.3 (C8), following the same pattern as the energy indicators.
SEGMENTATION OF DWELLINGS 
CLUSTERING RESULTS: SUMMER 
Energy Indicators (mean values) 
C8 (green): users have their main consumption between afternoon and dinner (maximum in Ed), which accounts for 18% of daily consumption. The minimum is at night (En) and in the evening (Ee), with 7% and 11% of daily consumption.
C4 (red): shows differences of around 2.5 kWh between the hourly consumption percentiles over the month. It has the highest standby consumption (Perc25), although also with large differences over the month. The main consumption is at noon and lunch time (Enn, El).
SEGMENTATION OF DWELLINGS 
CLUSTERING RESULTS: SUMMER 
Subgroups of dwellings within clusters (mean values) 
• Clusters 1, 2, 3, 8 have their main consumption between dinner (5) and evening (6).
• Clusters 4, 5, 6, 7 have their main consumption between noon (2) and lunch (3).
• The lowest consumption is at night in all clusters, except C7 and C6, which are centred on noon-lunch (3) and on no consumption (0) respectively.
SEGMENTATION OF DWELLINGS 
CLUSTERING RESULTS: SUMMER 
Complementary Indicators (relations between daily mean consumptions):
• aW1 = Mean / Max: relates the average day to the day with maximum consumption.
• aW2 = Min / Mean: relates the day with minimum consumption to the average day.
• aW3 = Mean (Weekends) / Mean (Weekdays)
Only C6 shows less consumption on weekends. It has low daily/monthly consumption, concentrated on weekdays.
The days with maximum and minimum consumption are quite similar to the average day for all clusters, except C6, which has many “peaks”.
Relations between hourly mean consumptions of the month:
• aD7 = Mean (Night period) / Mean
• aD8 = Mean (Lunch period) / Mean
• aD9 = Mean (Dinner period) / Mean
Most clusters have a mean night-time standby of around 50-55% of the total, except C7 (80%) and C6 (30%, but many dwellings with no consumption).
Lunch time has no impact, except in C4 (20% higher) and C6 (55% lower).
C1, C2, C3 and C8 increase their consumption at dinner time (30-40% higher).
[Figure annotation: Not well predicted, or not relevant]
COMPLEMENTARY INDICATORS 
WEATHER DEPENDENCY INDICATORS 
Daily total energy (electricity) use was plotted against the outside daily mean temperature. We use change-point models, which can capture the non-linear relation between heating and cooling energy use and outside temperature.
We selected the five-parameter change-point model (ASHRAE, 2001).
We do not split the data into weekdays and weekends, because the consumption is similar.
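As an illustration, a five-parameter change-point model is commonly written as a constant base load plus a heating term below one change point and a cooling term above another: E(T) = base + heat_slope · max(T_heat − T, 0) + cool_slope · max(T − T_cool, 0). The Python sketch below evaluates this form; the parameter names and the example values are assumptions and are not taken from the slides.

```python
def five_parameter_model(temp, base, heat_slope, heat_cp, cool_slope, cool_cp):
    """Daily energy use predicted by a five-parameter change-point model.

    temp       : outside daily mean temperature (deg C)
    base       : weather-independent daily use (kWh/day)
    heat_slope : extra kWh/day per deg C below heat_cp (>= 0)
    heat_cp    : heating change-point temperature (deg C)
    cool_slope : extra kWh/day per deg C above cool_cp (>= 0)
    cool_cp    : cooling change-point temperature (deg C), with heat_cp <= cool_cp
    """
    heating = heat_slope * max(heat_cp - temp, 0.0)
    cooling = cool_slope * max(temp - cool_cp, 0.0)
    return base + heating + cooling


# Hypothetical parameters for one dwelling.
params = dict(base=10.0, heat_slope=0.5, heat_cp=15.0, cool_slope=0.8, cool_cp=23.0)
cold_day = five_parameter_model(10.0, **params)
mild_day = five_parameter_model(20.0, **params)
hot_day = five_parameter_model(25.0, **params)
```

Between the two change points the model is flat, which is what lets it separate weather-independent use from heating and cooling use.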
COMPLEMENTARY INDICATORS 
WEATHER DEPENDENCY INDICATORS 
Best fit
To detect the best-fit lines automatically, we used “segmented” (Vito M. R. Muggeo, 2010), an R package that provides estimates of the slopes and of the possibly multiple breakpoints.
It is an iterative procedure (Muggeo, 2003) that needs starting values only for the breakpoint parameters.
PROCEDURE
1. Preliminary treatment: to avoid “noise”, a preliminary linear regression is fitted to the data. We performed two simple linear regressions, one for T > 23 ºC and one for T < 15 ºC.
2. If the magnitude of the slope exceeds 0.5 kWh/ºC for heating or cooling, the user is accepted for the change-point model; otherwise, the user is considered for the simplified monthly model (DD signature).
3. DD signature: the simplified model applied when the daily model is not clear. If the slope is > 1 and R² > 0.6, the user is considered monthly weather dependent; otherwise the user is not weather dependent.
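Steps 1 and 2 of the procedure can be sketched in Python as follows. The temperature limits (23 ºC, 15 ºC) and the 0.5 kWh/ºC threshold come from the slide; reading the threshold as a positive slope for cooling and a negative slope for heating is an interpretation, and the function names and return labels are hypothetical.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x, or None if it cannot be estimated."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def screen_user(temps, energy, hot=23.0, cold=15.0, min_slope=0.5):
    """Decide which weather-dependency model a user is sent to.

    temps, energy: daily mean temperature (deg C) and daily use (kWh/day).
    Returns "change_point" or "dd_signature" (labels chosen for this sketch).
    """
    hot_days = [(t, e) for t, e in zip(temps, energy) if t > hot]
    cold_days = [(t, e) for t, e in zip(temps, energy) if t < cold]
    cooling = ols_slope([t for t, _ in hot_days], [e for _, e in hot_days])
    heating = ols_slope([t for t, _ in cold_days], [e for _, e in cold_days])
    # Assumed sign convention: use rises with T when cooling, falls when heating.
    cooling_ok = cooling is not None and cooling > min_slope
    heating_ok = heating is not None and heating < -min_slope
    return "change_point" if cooling_ok or heating_ok else "dd_signature"


# Hypothetical daily data for two users.
temps = [10, 12, 14, 20, 24, 26, 28]
weather_user = screen_user(temps, [15, 13, 11, 8, 9, 11, 13])
flat_user = screen_user(temps, [8, 8, 8, 8, 8, 8, 8])
```

Users sent to the change-point branch would then be fitted with a segmented regression, and the rest with the monthly DD signature described in step 3.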
SEGMENTATION OF DWELLINGS 
CLUSTERING RESULTS: WEATHER DEPENDENCY 
Are these results in accordance with reality?
Summer: the daily model is only relevant in C4 and C1 (40%). Of those dwellings, only 16% are daily cooling dependent and 25% monthly dependent (41% in total).
Winter: C8 and C5 increase to 27.5% daily and 27.5% monthly heating dependent (55% in total). For those users R² is around 0.55-0.75 for summer/winter. They represent 19% and 20% of total users respectively.
C6 in summer and C3 in winter show almost no weather dependency; for the other clusters it is around 20-25% of users.
[Figure panels: Summer, Winter]
SEGMENTATION OF DWELLINGS 
CLUSTERING RESULTS: WEATHER DEPENDENCY 
Complementary Indicators (mean values) 
Base load is calculated as the aggregated monthly consumption for non-air-conditioned days/months.
Power is the hourly maximum power over the period supplied by the utility.
• The highest values of Power and Baseload, found in C1 and C4, are consistent with the other results.
• Neither the three postal codes nor the type of tariff can be predicted from the clustering, because both are similarly distributed across the clusters.
• The only notable finding is that in C4 the 2.1DHA tariff (P > 10 kW, time-discriminated from 23h to 13h) accounts for 30%. This can partially explain their higher consumption in summer.
[Figure panel: Summer]
Conclusions 
• Relevant indices of usual behaviour have been identified.
• The clustering results are good and give a better understanding of the behaviour of the different user groups.
• Preliminary results on weather dependency, and on its “prediction” from the relevant indicators, have been obtained (improvement needed).
• Decision-making schemes can be implemented.
• Further development is needed to improve the clustering and to better integrate the energy indicators with other information.
BeeData Analytics 
BeeData Analytics offers new products and business intelligence for energy 
distribution and commercialisation companies. 
BeeData Analytics is a high performance scalable system 
that permits the storage and analysis of non-homogeneous 
data of any type (energy-related or other): 
- Scalable distributed data storage. 
Big Data applications supported by Apache Hadoop 
software framework, HBase and Hive. 
- Configurable communication API. 
This permits data to be acquired and transmitted bi-directionally using a standard protocol (RESTful web API).
- Advanced system for data analytics. 
Coupled to the principal open data mining tools and 
libraries (R, Pandas, Python).

Presentation of EMPOWERING project in the last Workshop of the IEA Annex 58

  • 1. Analysis of hourly electricity consumption to characterise a large amount of customers using clustering techniques X.Cipriano, G.Mor
  • 2. • 8.536 residential users with smart meters (half hourly electricity consumption) • The utility is called “El Gas” and is located at Mallorca Island in a city called Sòller. • 12 months of half hourly data are available. CASE STUDY Mallorca
  • 3. OBJECTIVES • To identify the relevant indexes or parameters related to hourly consumption that can define the users’ electricity behaviour • To characterize the main groups of dwellings according to their patterns of hourly energy load profiles (clustering). • To support in decision- making regarding schemes and awareness campaigns, as well as in modelling the energy behaviour of huge amount of dwellings
  • 4. – Monthly consumption – Hourly consumption – Results of other services – Weather data (Using the closest meteorological station available) – Contracted Tariff – Contracted power – Location – Year of construction – Type of construction (flat, attached, detached,…) – Nº rooms ,area, nº occupants, occupancy patterns, type of domestic appliances, type of HVAC systems, thermostats temperatures, glazed area of the dwelling, first/second residence,… Possible Impossible DATA WE REALLY HAVE AVAILABLE DATA
  • 5. 1. Data collection and pre-processing of half hourly data 2. Selection of relevant indicators/parameters 3. Calculation of degree of weather dependency (for all users) 4. Segmentation of groups of dwellings (clustering with SOM+K-means) 5. Load shape curves visualization and characterisation for each cluster 6. Automated saving tips for each user according to the clustering results STEPS OF WORK
  • 6. DATA PRE-PROCESSING PRE-DEFINITION OF INDICATORS To highlight load shapes, which occur more frequently every month • The energy consumption indicators: related to hourly consumption in different periods of the day • The user behaviour indicators: related to the time and intensity when consumption is done • The weather dependency indicators • Complementary indicators: related to other existing data
  • 7. DATA PRE-PROCESSING ENERGY CONSUMPTION INDICATORS (E.I.) For each month of data, we defined a typical day: Emonth: Edaily: Daily average of the month (kWh/day) En: (hourly average by night) [01:00 – 05:59] (kWh/h) We defined 7 periods of consumption per day Ed No significant differences between weekdays and weekends were obtained
  • 8. USER’S BEHAVIOUR INDICATORS •Ct: Contracted tariff •P: Contracted power •Ndb10: Num. days below the 10th quartile consumption •Htowd: Daily occupied hours per weekday (Num. hours consump. > residual consump.) •Htowe :Daily occupied hours per weekend •Pmawe , Pmawd: Period of the day with maximum consumption weekends/weekdays •Pmiwe, Pmiwd: Period of the day with minimum consumption weekends/weekdays •Dma: Day with maximum daily consumption •Dmi: Day with minimum daily consumption •Hma: Time with maximum hourly consumption •Nhma: Number of hours with the maximum hourly consumption •Hmi: Time with minimum hourly consumption •Nhmi: Number of hours with the minimum hourly consumption •Hp1: First time when P > 1kW •Hp2: First time when P > 2kW •Hp3: First time when P reaches maximum Monthly calculation of: DATA PRE-PROCESSING
  • 9. PRE-PROCESSING OF DATA TIME PERIOD OF CALCULATION Indicators have been calculated for each dwelling for monthly, seasonally, and yearly period: • MONTHLY: Mean value of hourly average (or daily aggregated) of daily periods during the month. • SEASONALLY: Monthly average for each season. SUMMER: June-Sept., WINTER: Dec.-March, AUTUM: Apr.-May, Oct.-Nov. • YEARLY: Monthly average for the whole year
  • 10. SELECTION OF RELEVANT INDICATORS SELECTION OF USER’S BEHAVIOUR INDICATORS (UBI) Procedure for selecting those relevant indicators : 1. We performed PCA of UBIs in order to decrease the number of parameters. 5 PC were obtained. 2. k-means clustering technique (R software) is applied over these 5 PC calculated for all of users and bad results of quality of clustering were obtained (silhouette index = 0.3-0.32, and DBI=0.6-0.65). 3. Improvement with previous SOM treatment followed by k-means clustering of SOM prototype's . Increase of SI (0.45-0.5), and DBI around 0.7. • We identify difficulties in understanding the physical meaning of results of PCA when trying to characterize the obtained groups. • We detected that maximum and minimum period of consumption (Pmawe , Pmawd, Pmiwe, Pmiwd) were the most influencing indicators in PCA .
  • 11. SELECTION OF RELEVANT INDICATORS SELECTION OF USER’S BEHAVIOUR INDICATORS (UBI) New group of indicators (Qualitative Indicators): • We defined p1, p2, p2, p3, p4, p5, p6, p7, as the ranking of average hourly consumption of the day over the month. P1 defines the highest period of consumption, and p7 the lowest. • We can reject the Energy Indicators as relevant for clustering. If p1 = 3, Enn is the biggest value (kWh) regarding the rest of E.I. (Ed, Ea, El, Em, En, Ee ) • New indicators are added: SD and the percentiles of consumption Ed
  • 12. SELECTION OF RELEVANT INDICATORS SELECTION OF USER’S BEHAVIOUR INDICATORS (UBI) We start with 13 indicators: 7 qualitative indicators, 5 percentile indicators (5perc, 25perc, median, 75perc, 90perc), and hourly SD. Correlation matrix calculation Eliminate missing values min–max normalization is performed to scale the values within a predetermined range (0 to 1). high correlation if (correl. >0.8) We finally select 6 indicators: p1, p2, p6, p7, SD, and Perc25 for clustering
  • 13. SEGMENTATION OF DWELLINGS Calculation of complementary indicators: U.B.I. indicators: p1, p2, p6, p7, SD, Perc25. prototypes K-means clustering Are cohesion and dissimilitude assured? SOM NO YES YES Baseload curves and comfort parameters of clusters NO Are cohesion and dissimilitude assured? CLUSTERING PROCEDURE
  • 14. SEGMENTATION OF DWELLINGS CLUSTERING PROCEDURE Silhouette Index: is graphical representation of how well each object lies within its cluster (Peter J. Rousseeuw in 1986). Are cohesion and dissimilitude assured? s(i) = quantity between -1 and 1. a value near 1 indicates that the sample is affected to the right cluster. Davies-Bouldin Index: (David L. Davies and Donald W. Bouldin in 1979) is an internal evaluation scheme, where the validation of how well the clustering has been done is made using quantities and features inherent to the dataset. D(i) = a lower value will mean that the clustering is better
  • 15. SEGMENTATION OF DWELLINGS CLUSTERING RESULTS: WINTER 9 1 6 7 5 4 2 3 8 10
  • 16. SEGMENTATION OF DWELLINGS CLUSTERING RESULTS: SUMMER 7 1 8 5 6 4 3 2
  • 17. SEGMENTATION OF DWELLINGS CLUSTERING RESULTS: MIDSEASON 1 9 7 8 5 6 4 3 2
  • 18. SEGMENTATION OF DWELLINGS CLUSTERING RESULTS: SUMMER Energy Indicators C4 reflects the biggest electricity consumption in all periods, as well as in the daily and monthly consumption (700 kWh/month). It is followed by C1 (450 kWh/m.). C8 represents the 3rd (250kWh/month). C2, C5, C7 have similar consumption (around 200 kWh/month), and C6 with almost no consumption. Except C4, the rest of cluster have small variations in all indicators, that means that mean/median value is rather representative of the group. Hourly SD median value is around 0.5kWh (C1), 0.25 (C2, C3, C5, C7), 0 (C6), 0.8 (C4), 0.3 (C8), foowing the same shape than the energy indicators.
  • 19. SEGMENTATION OF DWELLINGS CLUSTERING RESULTS: SUMMER Energy Indicators (mean values) C8 (green): Users have their main consumption between afternoon-dinner (max in Ed). Is the 18% of daily consumption. The min. is by night (En) and evening (Ee) with 7 and 11% of daily consumption C4 (red): shows differences around 2.5 kWh within the hourly consumption percentiles over month. It has the greatest standby consumption (Perc25) although with also big differences over the month. The main consumption is at noon and lunch time (Enn, El)
  • 20. SEGMENTATION OF DWELLINGS CLUSTERING RESULTS: SUMMER Subgroups of dwellings within clusters (mean values) • Clusters 1,2,3,8 have their main consumption between dinner (5) and evening (6). • Clusters 4,5,6,7 have their main consumption between noon (2) and lunch (3) • Lowest consumption by night in all clusters, except C7 and C6 which are focused at noon-lunch (3) and none (0) consumption respectively.
  • 21. SEGMENTATION OF DWELLINGS CLUSTERING RESULTS: SUMMER Complementary Indicators (relations between daily mean consumption): aW1: Mean / Max: day with maximum/average. aW2: Min / Mean: day with minimum consumption/average. aW3: Mean (Weekends) / Mean (Weekdays) Only C6 shows less consumption on weekends. It has low daily/month consumption and is focused on week days. The days with max.-min consumption are quite similar to the average day for all clusters, except C6 which has many “peaks”. Relations between hourly mean consumptions of the month aD7: Mean (Night period) / Mean, aD8: Mean (Lunch period) / Mean, aD9: Mean (Dinner period) / Mean: Most of clusters have mean standby by night around 50-55% of total, except C7 (80%) and C6 (30%, but many with no consumption) There is no impact of lunch time, except in C4 (20% higher), and C6 (55% lower). C1, C2, C3, C8 increase their consumption when having dinner (30-40% higher). . Not well predicted, or not relevant
  • 22. COMPLEMENTARY INDICATORS WEATHER DEPENDENCY INDICATORS Daily total energy (electricity) use was plotted against the outside daily mean temperature. We use change-point models, which can capture the non-linear relation between heating and cooling energy use and outside temperature. We selected the five-parameter change-point model (ASHRAE, 2001). We do not separate weekdays and weekends, because the consumption is similar.
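As an illustration of the five-parameter form (a base load, a heating slope below one change point and a cooling slope above another), the sketch below fits E = b0 + bh·max(Th − T, 0) + bc·max(T − Tc, 0) by a plain grid search over the two change points. This is a swapped-in technique chosen for brevity, not the fitting procedure used in the study; the function name, the grid step and the returned keys are assumptions.

```python
import numpy as np


def fit_five_parameter(temp, energy, step=0.5):
    """Grid-search fit of E = b0 + bh*max(Th - T, 0) + bc*max(T - Tc, 0).

    temp: daily mean outside temperatures (deg C).
    energy: daily electricity use (kWh), same length as temp.
    step: spacing of the candidate change points (an assumption).
    """
    temp = np.asarray(temp, dtype=float)
    energy = np.asarray(energy, dtype=float)
    candidates = np.arange(temp.min() + step, temp.max(), step)
    best = None
    for th in candidates:
        # The cooling change point may not lie below the heating one.
        for tc in candidates[candidates >= th]:
            X = np.column_stack([
                np.ones_like(temp),          # base load
                np.maximum(th - temp, 0.0),  # heating term
                np.maximum(temp - tc, 0.0),  # cooling term
            ])
            coef, *_ = np.linalg.lstsq(X, energy, rcond=None)
            sse = float(np.sum((energy - X @ coef) ** 2))
            if best is None or sse < best["sse"]:
                best = {
                    "base": float(coef[0]),
                    "heat_slope": float(coef[1]),
                    "cool_slope": float(coef[2]),
                    "t_heat": float(th),
                    "t_cool": float(tc),
                    "sse": sse,
                }
    return best
```

The grid search is quadratic in the number of candidate change points, which is acceptable for a single user but slow for thousands of users. An iterative breakpoint estimator is the more practical choice at that scale.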
  • 23. COMPLEMENTARY INDICATORS WEATHER DEPENDENCY INDICATORS Best fit: we implemented an automated parameter search to detect the best-fit lines using “segmented” (Vito M. R. Muggeo, 2010), an R package that provides estimates of the slopes and of the possibly multiple breakpoints. It is an iterative procedure (Muggeo, 2003) that needs starting values only for the breakpoint parameters. PROCEDURE 1. Preliminary treatment: to avoid “noise”, a preliminary linear regression is fitted to the data. We performed two simple linear regressions, one for T>23ºC and one for T<15ºC. 2. If the slope exceeds ±0.5 kWh/ºC for heating/cooling, the user is accepted for running the change-point model; otherwise, the user is considered for the simplified monthly model (DD signature). 3. DD signature: the simplified model applied when the daily model is not clear enough. If the slope is >1 and R2>0.6, the user is considered monthly weather dependent; otherwise the user is not weather dependent.
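The three-step screening above can be sketched in Python as follows. The thresholds (23ºC, 15ºC, 0.5 kWh/ºC, slope >1, R2>0.6) come from the slide; everything else is an assumption for illustration: the function names, the reading of "±0.5" as a cooling slope above +0.5 or a heating slope below −0.5, and the choice of monthly degree days as the regressor of the DD signature.

```python
def ols(x, y):
    """Return (slope, intercept, r2) of a simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0:
        return 0.0, my, 0.0
    slope = sxy / sxx
    r2 = 0.0 if syy == 0 else sxy ** 2 / (sxx * syy)
    return slope, my - slope * mx, r2


def classify_user(daily_temp, daily_kwh, monthly_dd, monthly_kwh):
    """Return 'daily', 'monthly' or 'none' for one user."""
    # Step 1: separate regressions on hot (T > 23) and cold (T < 15) days.
    hot = [(t, e) for t, e in zip(daily_temp, daily_kwh) if t > 23.0]
    cold = [(t, e) for t, e in zip(daily_temp, daily_kwh) if t < 15.0]
    cool_slope = ols(*zip(*hot))[0] if len(hot) >= 2 else 0.0
    heat_slope = ols(*zip(*cold))[0] if len(cold) >= 2 else 0.0

    # Step 2: a steep enough slope sends the user to the change-point model.
    # Heating use rises as temperature falls, hence the negative threshold.
    if cool_slope > 0.5 or heat_slope < -0.5:
        return "daily"

    # Step 3: simplified monthly model (DD signature, assumed regressor).
    dd_slope, _, dd_r2 = ols(monthly_dd, monthly_kwh)
    if dd_slope > 1.0 and dd_r2 > 0.6:
        return "monthly"
    return "none"
```

Users classified as "daily" would then go on to the change-point fit, while "monthly" and "none" users stop here.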
  • 24. SEGMENTATION OF DWELLINGS CLUSTERING RESULTS: WEATHER DEPENDENCY Are these results in accordance with reality? Summer: the daily model is only relevant in C4 and C1 (40%). Of those dwellings, only 16% are daily cooling dependent and 25% monthly dependent (41% in total). Winter: C8 and C5 increase to 27.5% each for daily and monthly heating dependency (55% in total). For those users R2 is around 0.55-0.75 for summer/winter. They represent 19% and 20% of total users, respectively. C6 in summer and C3 in winter have almost no weather-dependent users; for the other clusters the share is around 20-25% of users. [Figure panels: Summer, Winter]
  • 25. SEGMENTATION OF DWELLINGS CLUSTERING RESULTS: WEATHER DEPENDENCY Complementary Indicators (mean values) Base load is calculated as the aggregated monthly consumption for non-air-conditioned days/months. Power is the hourly maximum power over the period supplied by the utility. • The highest values of Power and Base load, found in C1 and C4, are consistent with the other results. • Neither the three postal codes nor the tariff types can be predicted by the clustering, because both are similarly distributed over the clusters. • The only remarkable point is that in C4 the 2.1DHA tariff (P>10 kW, with time discrimination from 23h to 13h) accounts for 30%. This can partially explain their higher consumption in summer. [Figure panel: Summer]
  • 26. Conclusions • Relevant indicators of usual behaviour have been identified. • The clustering gives good results, allowing a better understanding of the behaviour of the different user groups. • Preliminary results on weather dependency, and on its “prediction” from the relevant indicators, have been obtained (improvement needed). • Decision-making schemes can be implemented. • Further development is needed to improve the clustering and to better integrate the energy indicators with other information.
  • 27. BeeData Analytics BeeData Analytics offers new products and business intelligence for energy distribution and commercialisation companies. BeeData Analytics is a high-performance, scalable system that permits the storage and analysis of non-homogeneous data of any type (energy-related or other): - Scalable distributed data storage. Big Data applications supported by the Apache Hadoop software framework, HBase and Hive. - Configurable communication API. This permits data to be acquired and transmitted bi-directionally using a standard protocol (RESTful web API). - Advanced system for data analytics. Coupled to the principal open data mining tools and libraries (R, Pandas, Python).