Distributed Clustering for Smart Grids
Pedro Rodrigues, João Gama




                                  University of Porto, Portugal




                             Project KDUS (PTDC/EIA-EIA/98355/2008)
4 September 2011
NGDM '11
Smart Grids
    Smart Grids: monitoring information on top of the electrical
    grid
           Internet-like communications layer
              A shift in the way in which power grids are operated
           Intelligent monitoring in real time
              Interactive with consumers and markets
              Optimized to make the best use of resources and equipment
              Predictive rather than reactive
              Distributed across geographical and organizational boundaries




Smart Grids and Data Mining
    A smart grid forms a network (possibly decomposable) of distributed
    sources of high-speed data streams.
           The dynamics of the data are unknown:
           the topology of the network changes over time,
           the number of meters tends to increase, and
           the context in which each meter operates evolves over time.
    Several data mining tasks are involved: prediction, cluster (profiling)
    analysis, event and anomaly detection, correlation analysis, etc.

    All these characteristics constitute real challenges and opportunities for
    applied research in distributed data mining.

    The requirements of near real-time analysis for multiple time horizons
    and multiple space aggregations make these analyses an even harder
    research challenge.

Outline


    Rationale


    Clustering distributed data streams


     Local-to-Global Clustering of data sources




Rationale                                       Sensor Networks



    Sensors are usually small, low-cost devices capable of sensing some
    attribute and of communicating with other sensors.
    Sensor networks can include thousands of sensors, each one being
    capable of measuring, analysing and transmitting a stream of data.




    Resources are scarce, which reduces the possibilities for heavy
    computation, and sensors operate under limited bandwidth.

Rationale            Comprehension of Ubiquitous Data Streams



    Comprehension
    Extract information about global interaction between sources by
    looking at the data they produce.


    When no other information is available, usual knowledge discovery
    approaches are based on unsupervised techniques (e.g. clustering).


    However, two different stream clustering problems exist:
           clustering streaming data points (e.g. meter readings)
           clustering streaming data sources (e.g. meters)



Rationale           Comprehension by Clustering Data Points



    Information about dense regions of the sensor data space.




    [Figure: three example clusters labelled Cluster A, Cluster B and Cluster C]

Rationale         Comprehension by Clustering Data Sources



    Information about groups of sensors that behave similarly over time.




    [Figure: three example clusters labelled Cluster A, Cluster B and Cluster C]

    Possible scenario
    Sensors collecting electricity demand data from different homes,
    exploring similar consumption patterns.

DGClust                                    Setting and Objective



    Setting
    Sensors in a wide network produce streams of heterogeneously
    distributed data (each sensor produces a univariate stream of data)




    [Figure: three example clusters labelled Cluster A, Cluster B and Cluster C]

    Objective
    To keep a clustering of the observations that are created by
    aggregating each node's data as a feature in a centralized stream.

DGClust                    Problems and Research Question



    Problems
    high-speed data streams        excessive storage and processing
    widely spread network          heavy communication
    centralized clustering         high dimensionality
    dynamic data                   outdated models


    Research Question
    Do local discretization and representative clustering improve
    validity and reduce communication and computation loads when applied
    to distributed sensor data streams?



DGClust                                   Methodology : Local Step



    DGClust – Distributed Grid Clustering (Local Step)
    Each sensor keeps an online ordinal discretization of its data.
                      Partition Incremental Discretization
    [Figure: current state of a sensor; labels shown: low, D]

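The local step can be sketched as follows. This is a minimal illustration, assuming a simplified two-layer scheme in the spirit of Partition Incremental Discretization: a fine equal-width first layer is updated with every reading, and a coarser second layer of ordinal states is derived from it by equal frequency. The class name `OnlineDiscretizer`, the value range and the bin counts are hypothetical choices, not details taken from the talk.

```python
from bisect import bisect_right


class OnlineDiscretizer:
    """Simplified two-layer online discretizer (illustrative, not the original PiD)."""

    def __init__(self, low, high, fine_bins=100, n_states=5):
        self.low, self.high = low, high
        self.width = (high - low) / fine_bins
        self.counts = [0] * fine_bins      # layer 1: equal-width counters
        self.n_states = n_states           # layer 2: number of ordinal states
        self.total = 0

    def update(self, x):
        """Count one reading in layer 1 (values outside the range are clamped)."""
        i = int((x - self.low) / self.width)
        i = min(max(i, 0), len(self.counts) - 1)
        self.counts[i] += 1
        self.total += 1

    def cut_points(self):
        """Layer 2: equal-frequency cut points derived from the layer-1 counts."""
        cuts, seen, target = [], 0, self.total / self.n_states
        for i, c in enumerate(self.counts):
            seen += c
            while len(cuts) < self.n_states - 1 and seen >= target * (len(cuts) + 1):
                cuts.append(self.low + (i + 1) * self.width)
        return cuts

    def state(self, x):
        """Ordinal state (0 .. n_states - 1) of a reading."""
        return bisect_right(self.cut_points(), x)
```

Only the first layer is touched per reading, so the per-reading cost stays constant; the second layer is recomputed on demand from the counters.
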
DGClust                         Methodology : Aggregating Step



    DGClust – Distributed Grid Clustering (Aggregating Step)
    The central server gathers the global state of the network.
    Sensors whose state has not changed since the last communication do
    not transmit to the server.

    [Figure: global state assembled from per-sensor states; labels shown: low, high, A, B, D]

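A minimal sketch of the aggregating step, assuming each sensor remembers the last state it sent and stays silent while that state is unchanged. The `Sensor` and `Server` classes and the message format are hypothetical; the state labels are borrowed from the figure only as examples.

```python
class Sensor:
    """Holds the last state sent to the server (illustrative helper)."""

    def __init__(self, sensor_id):
        self.sensor_id = sensor_id
        self.last_sent = None

    def report(self, state):
        """Return a message only if the state changed since the last transmission."""
        if state == self.last_sent:
            return None                      # nothing to transmit
        self.last_sent = state
        return (self.sensor_id, state)


class Server:
    """Keeps the latest known state of every sensor."""

    def __init__(self, sensor_ids):
        self.states = {sid: None for sid in sensor_ids}
        self.messages = 0                    # number of messages received

    def receive(self, message):
        if message is not None:
            sid, state = message
            self.states[sid] = state
            self.messages += 1

    def global_state(self):
        """Global state: one local state per sensor, in a fixed sensor order."""
        return tuple(self.states[sid] for sid in sorted(self.states))
```

With three sensors reporting over three rounds in which only one sensor changes state per later round, the server receives five messages instead of nine.
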
DGClust                      Methodology : Representative Step



    DGClust – Distributed Grid Clustering (Representative Step)
    Server keeps a small list of the most frequent global states.
                     Space-Saving Frequent Items Monitoring
    [Figure: list of the most frequent global states with their counts (#): 523, 334, 89, ...]

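The representative step names Space-Saving as the frequent-items monitor. The sketch below follows the usual textbook formulation of that algorithm with a fixed number of counters; the class and method names are illustrative, and items are assumed to be hashable global states (for example tuples of per-sensor states).

```python
class SpaceSaving:
    """Space-Saving frequent-items summary with at most `capacity` counters."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}   # item -> estimated count
        self.errors = {}   # item -> maximum overestimation of that count

    def update(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the item with the smallest count; the newcomer inherits it.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            self.errors.pop(victim)
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def top(self, m):
        """The m items with the largest estimated counts."""
        return sorted(self.counts.items(), key=lambda kv: -kv[1])[:m]
```

Memory stays bounded by `capacity` regardless of how many distinct global states appear, which is what lets the server keep only a small list.
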
DGClust                            Methodology : Clustering Step



    DGClust – Distributed Grid Clustering (Clustering Step)
    Server applies partitional clustering to the most frequent states.
             Furthest Point Clustering + Online Adaptive K-Means




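A minimal sketch of the clustering step, assuming each frequent global state is encoded as a tuple of numeric cell indices. Furthest-point selection provides the initial centers and a standard sequential k-means update then adapts them as new states arrive; the exact adaptive variant used in DGClust is not described on the slide, so the update rule here is an assumption.

```python
import math


def furthest_point_centers(points, k):
    """Greedy furthest-point seeding: each new center is the point
    furthest from the centers chosen so far."""
    centers = [points[0]]                       # arbitrary first center
    while len(centers) < min(k, len(points)):
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def online_kmeans_update(centers, counts, point):
    """Sequential k-means step: move the nearest center towards the point."""
    j = min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))
    counts[j] += 1
    centers[j] = tuple(c + (x - c) / counts[j] for c, x in zip(centers[j], point))
    return j
```

`counts` holds one integer per center (how many points it has absorbed), so each update is a running mean and needs no stored history.
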
DGClust   Example (k=5) Varying Resources




DGClust                                                Main Findings



     Quality of results does not depend on the number of sensors.


     Communication reduction is constant with any number of sensors (as
     long as a direct link with the server exists).




     higher discretization granularity     higher clustering quality,
                                           lower communication reduction

     higher number of sensors              more clustering updates

L2GClust                                      Setting and Objective



    Setting
    Sensors in a wide network produce streams of heterogeneously
    distributed data (each sensor produces a univariate stream of data)




    [Figure: three example clusters labelled Cluster A, Cluster B and Cluster C]

    Objective
    To keep, at each node, a clustering of the entire network of sensors.

L2GClust                             Methodology : Local Sketch



    Each sensor keeps a sketch of its most recent data.

    [Figure: a sensor's sketch of its recent data; value shown: 10.2]

    The common approach to focusing on recent data is the sliding window.
    Even within the sliding window, the most recent data point is usually
    more important than the oldest one, which is about to be discarded.


    In ubiquitous streaming data sources, such as sensor networks,
    resources like memory and processing power are scarce.
    Sometimes there is not even enough memory to store all the data
    points inside the window.
                        Memoryless α-fading average

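A minimal sketch of a memoryless α-fading average, assuming the common formulation in which a faded sum and a faded count are updated recursively and their ratio is the sketch value. The class name and the default `alpha` are illustrative.

```python
class FadingAverage:
    """Memoryless alpha-fading average: keeps two numbers, not a window."""

    def __init__(self, alpha=0.9):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.weighted_sum = 0.0    # S_t = x_t + alpha * S_{t-1}
        self.weighted_count = 0.0  # N_t = 1   + alpha * N_{t-1}

    def update(self, x):
        """Fold one reading into the sketch and return the current average."""
        self.weighted_sum = x + self.alpha * self.weighted_sum
        self.weighted_count = 1.0 + self.alpha * self.weighted_count
        return self.value

    @property
    def value(self):
        """Current fading average, or None before the first reading."""
        if self.weighted_count == 0.0:
            return None
        return self.weighted_sum / self.weighted_count
```

Smaller values of `alpha` forget old readings faster; `alpha = 1.0` reduces to the plain running mean.
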
L2GClust                  Example : Local Clustering

    [Figure: network of 14 sensors with current values 1, 10, 2, 100, 10, 11,
    99, 95, 5, 10, 10, 3, 12 and 2]
    Centroids {6.9, 98.0}

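The centroids in this example can be checked with a few lines of code. The sketch below runs plain Lloyd iterations on the 14 node values shown in the figure, with k = 2 and the minimum and maximum as starting centers; the function name and the initialization are assumptions made for illustration.

```python
def kmeans_1d(values, centers, iterations=20):
    """Plain Lloyd iterations on scalar values, starting from `centers`."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


readings = [1, 10, 2, 100, 10, 11, 99, 95, 5, 10, 10, 3, 12, 2]
centers = kmeans_1d(readings, centers=[min(readings), max(readings)])
print([round(c, 1) for c in centers])  # [6.9, 98.0]
```

The eleven low values average 76 / 11 ≈ 6.9 and the three high values average 294 / 3 = 98.0, matching the centroids on the slide.
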
L2GClust                       Methodology : Local Clustering




    Each node keeps an estimate of the global clustering. This estimate is
    computed by clustering the centroids of its direct neighbors’
    estimates of the global clustering.


                         Furthest Point Clustering


    In essence, each node builds an ensemble of the clusterings held by
    its direct neighbors.


    Instead of broadcasting the sketch of its own data, each node
    broadcasts its estimate of the global clustering.

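A minimal sketch of how one node might combine its neighbors' estimates, assuming scalar centroids. The pooled centroids are reduced to k representatives by furthest-point selection, and each group is then averaged; the averaging step, the function name and the input values in the usage note are assumptions for illustration, not details from the talk.

```python
def merge_estimates(own, neighbor_estimates, k):
    """Cluster the pooled centroids down to k representatives.

    Furthest-point selection picks k seeds; every pooled centroid is then
    assigned to its nearest seed and each non-empty group is averaged.
    """
    pooled = list(own) + [c for est in neighbor_estimates for c in est]
    seeds = [pooled[0]]                          # arbitrary first seed
    while len(seeds) < min(k, len(pooled)):
        seeds.append(max(pooled, key=lambda c: min(abs(c - s) for s in seeds)))
    groups = [[] for _ in seeds]
    for c in pooled:
        nearest = min(range(len(seeds)), key=lambda i: abs(c - seeds[i]))
        groups[nearest].append(c)
    return sorted(sum(g) / len(g) for g in groups if g)
```

For a node holding `[5.0, 95.0]` whose neighbors report `[7.0, 97.0]` and `[3.0, 99.0]`, calling `merge_estimates` with `k=2` pools six centroids and returns two. The problem size depends on the number of neighbors and clusters, not on the size of the network.
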
L2GClust                           Example : Local Clustering

    [Figure: the sensor network with per-node values (88.07, 87.37, 88.06,
    4.19, 2.80, 3.74, 1.21, 3.58, 2.41, 3.50, 88.06, 88.03, 86.31, 88.12).
    Centroids {6.9, 98.0}. Estimates shown at three nodes: {7.71, 97.1},
    {10.59, 97.38} and {5.10, 95.00}. The final frame shows a single
    estimate {10.36, 97.1}.]

L2GClust                                       Evaluation Summary



     Comparison was performed with the same strategy executed at a central
     server with access to all data.
     Measured outcomes were the agreement between a node's clustering
     estimate and the centralized clustering, averaged over all nodes.
     Kappa statistic                            cluster sanity
     Proportion of agreement                    cluster validity
                       K=(P(A)-P(e))/(1-P(e))
     State-of-the-art Simulator
     Each sensor in the simulation (Visual Sense) generates a Gaussian
     stream with mean from one of the predefined Gaussian clusters.
     Evaluated parameters were number of clusters, network size, and
     cluster overlap.
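The two measures can be sketched as follows. The slide does not say how agreement between two clusterings is counted, so this example assumes a pairwise view: every pair of sensors is a binary decision ("same cluster" or "different clusters"), P(A) is the proportion of pairs on which the two clusterings agree, and P(e) is the agreement expected by chance from the marginal rates. The function name is hypothetical.

```python
from itertools import combinations


def pair_agreement(labels_a, labels_b):
    """Return (P(A), kappa) over all pairs of sensors.

    Each pair is a binary decision: 'same cluster' or 'different clusters'.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    same_a = [labels_a[i] == labels_a[j] for i, j in pairs]
    same_b = [labels_b[i] == labels_b[j] for i, j in pairs]
    n = len(pairs)
    p_agree = sum(a == b for a, b in zip(same_a, same_b)) / n
    pa, pb = sum(same_a) / n, sum(same_b) / n
    p_chance = pa * pb + (1 - pa) * (1 - pb)      # P(e)
    if p_chance == 1.0:
        return p_agree, 1.0                       # degenerate: both clusterings trivial
    return p_agree, (p_agree - p_chance) / (1 - p_chance)
```

Working on pairs makes the result independent of how clusters are named, so a node's estimate and the centralized clustering can be compared without first matching labels.
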
L2GClust                                                      Results




      Average proportion of agreement converges (with small fluctuations).

L2GClust                                                       Results




           Sanity was confirmed with Kappa statistic always above 0.58.

L2GClust                                                   Results




           Real data from electricity demand sensors showed
           the ability to improve as more examples are seen.

L2GClust                                                 Main Properties




    Local sketch yields:
           memoryless storage of summaries;
           a straightforward adaptation to most recent data;
           a reduction of the system's sensitivity to uncertainty;


    Local clustering with direct neighbors yields:
           no forwarding of information (reduced communication);
           low dimensionality of the clustering problem;
           sensitive information better preserved.
    Future Work
           Evaluate L2GClust on smart grid sensor networks.

Thank you!




              30




NGDM '11

More Related Content

What's hot

IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
 
International Journal for Research in Applied Science & Engineering
International Journal for Research in Applied Science & EngineeringInternational Journal for Research in Applied Science & Engineering
International Journal for Research in Applied Science & Engineeringpriyanka singh
 
Commutative approach for securing digital media
Commutative approach for securing digital mediaCommutative approach for securing digital media
Commutative approach for securing digital mediaijctet
 
Comparison of SVD & Pseudo Random Sequence based methods of Image Watermarking
Comparison of SVD & Pseudo Random Sequence based methods of Image WatermarkingComparison of SVD & Pseudo Random Sequence based methods of Image Watermarking
Comparison of SVD & Pseudo Random Sequence based methods of Image Watermarkingijsrd.com
 
Contribution of Non-Scrambled Chroma Information in Privacy-Protected Face Im...
Contribution of Non-Scrambled Chroma Information in Privacy-Protected Face Im...Contribution of Non-Scrambled Chroma Information in Privacy-Protected Face Im...
Contribution of Non-Scrambled Chroma Information in Privacy-Protected Face Im...Wesley De Neve
 
DIGITAL WATERMARKING TECHNIQUE BASED ON MULTI-RESOLUTION CURVELET TRANSFORM
DIGITAL WATERMARKING TECHNIQUE BASED ON MULTI-RESOLUTION CURVELET TRANSFORMDIGITAL WATERMARKING TECHNIQUE BASED ON MULTI-RESOLUTION CURVELET TRANSFORM
DIGITAL WATERMARKING TECHNIQUE BASED ON MULTI-RESOLUTION CURVELET TRANSFORMijfcstjournal
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Scienceresearchinventy
 

What's hot (12)

IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
14 vikram kumar_150-159
14 vikram kumar_150-15914 vikram kumar_150-159
14 vikram kumar_150-159
 
International Journal for Research in Applied Science & Engineering
International Journal for Research in Applied Science & EngineeringInternational Journal for Research in Applied Science & Engineering
International Journal for Research in Applied Science & Engineering
 
Commutative approach for securing digital media
Commutative approach for securing digital mediaCommutative approach for securing digital media
Commutative approach for securing digital media
 
Comparison of SVD & Pseudo Random Sequence based methods of Image Watermarking
Comparison of SVD & Pseudo Random Sequence based methods of Image WatermarkingComparison of SVD & Pseudo Random Sequence based methods of Image Watermarking
Comparison of SVD & Pseudo Random Sequence based methods of Image Watermarking
 
Contribution of Non-Scrambled Chroma Information in Privacy-Protected Face Im...
Contribution of Non-Scrambled Chroma Information in Privacy-Protected Face Im...Contribution of Non-Scrambled Chroma Information in Privacy-Protected Face Im...
Contribution of Non-Scrambled Chroma Information in Privacy-Protected Face Im...
 
270 273
270 273270 273
270 273
 
SimWare and the new LSA study group on SISO
SimWare and the new LSA study group on SISOSimWare and the new LSA study group on SISO
SimWare and the new LSA study group on SISO
 
[IJET V2I4P2] Authors:Damanbir Singh, Guneet Kaur
[IJET V2I4P2] Authors:Damanbir Singh, Guneet Kaur[IJET V2I4P2] Authors:Damanbir Singh, Guneet Kaur
[IJET V2I4P2] Authors:Damanbir Singh, Guneet Kaur
 
Distributedsystems 100912185813-phpapp01
Distributedsystems 100912185813-phpapp01Distributedsystems 100912185813-phpapp01
Distributedsystems 100912185813-phpapp01
 
DIGITAL WATERMARKING TECHNIQUE BASED ON MULTI-RESOLUTION CURVELET TRANSFORM
DIGITAL WATERMARKING TECHNIQUE BASED ON MULTI-RESOLUTION CURVELET TRANSFORMDIGITAL WATERMARKING TECHNIQUE BASED ON MULTI-RESOLUTION CURVELET TRANSFORM
DIGITAL WATERMARKING TECHNIQUE BASED ON MULTI-RESOLUTION CURVELET TRANSFORM
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
 

Similar to Distributed Clustering for Smart Grids

Energy consumption mitigation routing protocols for large wsn's final
Energy consumption mitigation  routing protocols for large wsn's finalEnergy consumption mitigation  routing protocols for large wsn's final
Energy consumption mitigation routing protocols for large wsn's finalsumavaidya90
 
Energy consumption mitigation__routing_protocols_for_large_wsn's_final
Energy consumption mitigation__routing_protocols_for_large_wsn's_finalEnergy consumption mitigation__routing_protocols_for_large_wsn's_final
Energy consumption mitigation__routing_protocols_for_large_wsn's_finalGr Patel
 
Tree Based Collaboration For Target Tracking
Tree Based Collaboration For Target TrackingTree Based Collaboration For Target Tracking
Tree Based Collaboration For Target TrackingChuka Okoye
 
Fault tolerant energy aware data dissemination protocol in WSN
Fault tolerant energy aware data dissemination protocol in WSNFault tolerant energy aware data dissemination protocol in WSN
Fault tolerant energy aware data dissemination protocol in WSNPrajwal Panchmahalkar
 
Characterization of directed diffusion protocol in wireless sensor network
Characterization of directed diffusion protocol in wireless sensor networkCharacterization of directed diffusion protocol in wireless sensor network
Characterization of directed diffusion protocol in wireless sensor networkijwmn
 
A seminar report on data aggregation in wireless sensor networks
A seminar report on data aggregation in wireless sensor networksA seminar report on data aggregation in wireless sensor networks
A seminar report on data aggregation in wireless sensor networkspraveen369
 
RETHINKING THE EXPRESSIVE POWER OF GNNS VIA GRAPH BICONNECTIVITY.pptx
RETHINKING THE EXPRESSIVE POWER OF GNNS VIA GRAPH BICONNECTIVITY.pptxRETHINKING THE EXPRESSIVE POWER OF GNNS VIA GRAPH BICONNECTIVITY.pptx
RETHINKING THE EXPRESSIVE POWER OF GNNS VIA GRAPH BICONNECTIVITY.pptxssuser2624f71
 
Fault tolerance in wireless sensor networks by Constrained Delaunay Triangula...
Fault tolerance in wireless sensor networks by Constrained Delaunay Triangula...Fault tolerance in wireless sensor networks by Constrained Delaunay Triangula...
Fault tolerance in wireless sensor networks by Constrained Delaunay Triangula...Sigma web solutions pvt. ltd.
 
Sequentail Max Search (SMS) resouce allocation algorithm
Sequentail Max Search (SMS) resouce allocation algorithm Sequentail Max Search (SMS) resouce allocation algorithm
Sequentail Max Search (SMS) resouce allocation algorithm amal algedir
 
SPAR 2015 - Civil Maps Presentation by Sravan Puttagunta
SPAR 2015 - Civil Maps Presentation by Sravan PuttaguntaSPAR 2015 - Civil Maps Presentation by Sravan Puttagunta
SPAR 2015 - Civil Maps Presentation by Sravan PuttaguntaSravan Puttagunta
 
Energy consumption mitigation routing protocols for large wsn's
Energy consumption mitigation  routing protocols for large wsn'sEnergy consumption mitigation  routing protocols for large wsn's
Energy consumption mitigation routing protocols for large wsn'sSpandan Spandy
 
Analysis of GPSR and its Relevant Attacks in Wireless Sensor Networks
Analysis of GPSR and its Relevant Attacks in Wireless Sensor NetworksAnalysis of GPSR and its Relevant Attacks in Wireless Sensor Networks
Analysis of GPSR and its Relevant Attacks in Wireless Sensor NetworksIDES Editor
 
6 intelligent-placement-of-datacenters
6 intelligent-placement-of-datacenters6 intelligent-placement-of-datacenters
6 intelligent-placement-of-datacenterszafargilani
 
Sensor Protocols for Information via Negotiation (SPIN)
Sensor Protocols for Information via Negotiation (SPIN)Sensor Protocols for Information via Negotiation (SPIN)
Sensor Protocols for Information via Negotiation (SPIN)rajivagarwal23dei
 
Modelling D2D Communications in Cellular Access Networks via Coupled Processors
Modelling D2D Communications in Cellular Access Networks via Coupled ProcessorsModelling D2D Communications in Cellular Access Networks via Coupled Processors
Modelling D2D Communications in Cellular Access Networks via Coupled ProcessorsInstitute of Information Systems (HES-SO)
 
Using Distributed Node-RED to build fog/edge applications
Using Distributed Node-RED to build fog/edge applicationsUsing Distributed Node-RED to build fog/edge applications
Using Distributed Node-RED to build fog/edge applicationsNam Giang
 

Similar to Distributed Clustering for Smart Grids (20)

Energy consumption mitigation routing protocols for large wsn's final
Energy consumption mitigation  routing protocols for large wsn's finalEnergy consumption mitigation  routing protocols for large wsn's final
Energy consumption mitigation routing protocols for large wsn's final
 
Energy consumption mitigation__routing_protocols_for_large_wsn's_final
Energy consumption mitigation__routing_protocols_for_large_wsn's_finalEnergy consumption mitigation__routing_protocols_for_large_wsn's_final
Energy consumption mitigation__routing_protocols_for_large_wsn's_final
 
Tree Based Collaboration For Target Tracking
Tree Based Collaboration For Target TrackingTree Based Collaboration For Target Tracking
Tree Based Collaboration For Target Tracking
 
Fault tolerant energy aware data dissemination protocol in WSN
Fault tolerant energy aware data dissemination protocol in WSNFault tolerant energy aware data dissemination protocol in WSN
Fault tolerant energy aware data dissemination protocol in WSN
 
Characterization of directed diffusion protocol in wireless sensor network
Characterization of directed diffusion protocol in wireless sensor networkCharacterization of directed diffusion protocol in wireless sensor network
Characterization of directed diffusion protocol in wireless sensor network
 
wcn.pptx
wcn.pptxwcn.pptx
wcn.pptx
 
A seminar report on data aggregation in wireless sensor networks
A seminar report on data aggregation in wireless sensor networksA seminar report on data aggregation in wireless sensor networks
A seminar report on data aggregation in wireless sensor networks
 
RETHINKING THE EXPRESSIVE POWER OF GNNS VIA GRAPH BICONNECTIVITY.pptx
RETHINKING THE EXPRESSIVE POWER OF GNNS VIA GRAPH BICONNECTIVITY.pptxRETHINKING THE EXPRESSIVE POWER OF GNNS VIA GRAPH BICONNECTIVITY.pptx
RETHINKING THE EXPRESSIVE POWER OF GNNS VIA GRAPH BICONNECTIVITY.pptx
 
Fault tolerance in wireless sensor networks by Constrained Delaunay Triangula...
Fault tolerance in wireless sensor networks by Constrained Delaunay Triangula...Fault tolerance in wireless sensor networks by Constrained Delaunay Triangula...
Fault tolerance in wireless sensor networks by Constrained Delaunay Triangula...
 
Sequentail Max Search (SMS) resouce allocation algorithm
Sequentail Max Search (SMS) resouce allocation algorithm Sequentail Max Search (SMS) resouce allocation algorithm
Sequentail Max Search (SMS) resouce allocation algorithm
 
SPAR 2015 - Civil Maps Presentation by Sravan Puttagunta
SPAR 2015 - Civil Maps Presentation by Sravan PuttaguntaSPAR 2015 - Civil Maps Presentation by Sravan Puttagunta
SPAR 2015 - Civil Maps Presentation by Sravan Puttagunta
 
Energy consumption mitigation routing protocols for large wsn's
Energy consumption mitigation  routing protocols for large wsn'sEnergy consumption mitigation  routing protocols for large wsn's
Energy consumption mitigation routing protocols for large wsn's
 
Analysis of GPSR and its Relevant Attacks in Wireless Sensor Networks
Analysis of GPSR and its Relevant Attacks in Wireless Sensor NetworksAnalysis of GPSR and its Relevant Attacks in Wireless Sensor Networks
Analysis of GPSR and its Relevant Attacks in Wireless Sensor Networks
 
6 intelligent-placement-of-datacenters
6 intelligent-placement-of-datacenters6 intelligent-placement-of-datacenters
6 intelligent-placement-of-datacenters
 
358 365
358 365358 365
358 365
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
Sensor Protocols for Information via Negotiation (SPIN)
Sensor Protocols for Information via Negotiation (SPIN)Sensor Protocols for Information via Negotiation (SPIN)
Sensor Protocols for Information via Negotiation (SPIN)
 
550 537-546
550 537-546550 537-546
550 537-546
 
Modelling D2D Communications in Cellular Access Networks via Coupled Processors
Modelling D2D Communications in Cellular Access Networks via Coupled ProcessorsModelling D2D Communications in Cellular Access Networks via Coupled Processors
Modelling D2D Communications in Cellular Access Networks via Coupled Processors
 
Using Distributed Node-RED to build fog/edge applications
Using Distributed Node-RED to build fog/edge applicationsUsing Distributed Node-RED to build fog/edge applications
Using Distributed Node-RED to build fog/edge applications
 

Distributed Clustering for Smart Grids

  • 1. Distributed Clustering for Smart Grids. Pedro Rodrigues, João Gama, University of Porto, Portugal. Project KDUS (PTDC/EIA-EIA/98355/2008). 4 September 2011, NGDM '11.
  • 2. Smart Grids. Smart Grids: monitoring information on top of the electrical grid. Internet-like communications layer: a shift in the way in which power grids are operated. Intelligent monitoring in real time: interactive with consumers and markets; optimized to make the best use of resources and equipment; predictive rather than reactive; distributed across geographical and organizational boundaries.
  • 3. Smart Grids and Data Mining. A smart grid forms a network (possibly decomposable) of distributed sources of high-speed data streams. The dynamics of the data are unknown: the topology of the network changes over time, the number of meters tends to increase, and the context in which each meter acts evolves over time. Several data mining tasks are involved: prediction, cluster (profiling) analysis, event and anomaly detection, correlation analysis, etc. All these characteristics constitute real challenges and opportunities for applied research in distributed data mining. The requirements of near real-time analysis for multiple time horizons and multiple space aggregations make these analyses an even harder research challenge.
  • 4. Outline: Rationale; Clustering distributed data streams; Local-to-Global Clustering of data sources.
  • 5. Rationale: Sensor Networks. Sensors are usually small, low-cost devices capable of sensing some attribute and of communicating with other sensors. Sensor networks can include thousands of sensors, each one capable of measuring, analysing and transmitting a stream of data. Resources are scarce, which reduces the possibilities for heavy computation, while operating under limited bandwidth.
  • 6. Rationale: Comprehension of Ubiquitous Data Streams. Comprehension: extract information about global interaction between sources by looking at the data they produce. When no other information is available, usual knowledge discovery approaches are based on unsupervised techniques (e.g. clustering). However, two different stream clustering problems exist: clustering streaming data points (e.g. meter readings) and clustering streaming data sources (e.g. meters).
  • 7. Rationale: Comprehension by Clustering Data Points. Information about dense regions of the sensor data space. (Figure: Cluster A, Cluster B, Cluster C.)
  • 8. Rationale: Comprehension by Clustering Data Sources. Information about groups of sensors that behave similarly over time. Possible scenario: sensors collecting electricity demand data from different homes, exploring similar consumption patterns. (Figure: Cluster A, Cluster B, Cluster C.)
  • 9. DGClust Setting and Objective. Setting: sensors in a wide network produce streams of heterogeneously distributed data (each sensor produces a univariate stream of data). Objective: to keep a clustering of the observations that are created by aggregating each node's data as a feature in a centralized stream. (Figure: Cluster A, Cluster B, Cluster C.)
  • 10. DGClust Problems and Research Question. Problems: high-speed data streams (excessive storage and processing); widely spread network (heavy communication); centralized clustering (high dimensionality); dynamic data (outdated models). Research question: do local discretization and representative clustering improve validity, communication and computation loads when applied to distributed sensor data streams?
  • 11. DGClust Methodology: Local Step. DGClust, Distributed Grid Clustering (Local Step). Each sensor keeps an online ordinal discretization of its data. (Figure labels: Partition Incremental Discretization; Current State.)
  • 12. DGClust Methodology: Aggregating Step. DGClust, Distributed Grid Clustering (Aggregating Step). The central server gathers the global state of the network. Sensors whose state has not changed since the last communication do not transmit to the server. (Figure: grid of sensor states labelled low/high and A to D.)
  • 13. DGClust Methodology: Representative Step. DGClust, Distributed Grid Clustering (Representative Step). The server keeps a small list of the most frequent global states, using Space-Saving frequent items monitoring. (Figure: table of global states with their counts: 523, 334, 89, ...)
  • 14. DGClust Methodology: Clustering Step. DGClust, Distributed Grid Clustering (Clustering Step). The server applies partitional clustering to the most frequent states: Furthest Point Clustering + Online Adaptive K-Means.
  • 15. DGClust Example (k=5): Varying Resources.
  • 16. DGClust Main Findings. Quality of results does not depend on the number of sensors. Communication reduction is constant with any number of sensors (as long as a direct link with the server exists). (Diagram labels: higher clustering quality; higher discretization granularity; lower communication reduction; higher number of sensors; more clustering updates.)
  • 17. L2GClust Setting and Objective. Setting: sensors in a wide network produce streams of heterogeneously distributed data (each sensor produces a univariate stream of data). Objective: to keep, at each node, a clustering of the entire network of sensors. (Figure: Cluster A, Cluster B, Cluster C.)
  • 18. L2GClust Methodology: Local Sketch. Each sensor keeps a sketch of its most recent data. The common approach to focusing on recent data is the sliding window. Even within the sliding window, the most recent data point is usually more important than the oldest one, which is about to be discarded. In ubiquitous streaming data sources, such as sensor networks, resources like memory and processing power are scarce. Sometimes there is not even enough memory to store all the data points inside the window. Solution: a memoryless α-fading average. (Figure value: 10.2.)
  • 19. L2GClust Example: Local Clustering. (Figure: network of sensors with values 1, 10, 2, 100, 10, 11, 99, 95, 5, 10, 10, 3, 12, 2.)
  • 20. L2GClust Example: Local Clustering. Centroids {6.9, 98.0}. (Figure: network of sensors with values 1, 10, 2, 100, 10, 11, 99, 95, 5, 10, 10, 3, 12, 2.)
  • 21. L2GClust Methodology: Local Clustering. Instead of broadcasting the sketch of its own data, each node broadcasts its estimate of the global clustering. This estimate is computed by clustering the centroids of direct neighbors' estimates of the global clustering (Furthest Point Clustering). Basically, each node performs an ensemble of clusterings from its direct neighbors.
  • 22. L2GClust Example: Local Clustering. Centroids {6.9, 98.0}. (Figure: neighboring centroid sets {7.71, 97.1}, {10.59, 97.38} and {5.10, 95.00}, with numeric annotations.)
  • 23. L2GClust Example: Local Clustering. Centroids {6.9, 98.0}. (Figure: neighboring centroid sets {7.71, 97.1}, {10.59, 97.38} and {5.10, 95.00}, with numeric annotations.)
  • 24. L2GClust Example: Local Clustering. Centroids {6.9, 98.0}. (Figure: centroid set {10.36, 97.1}, with numeric annotations.)
  • 25. L2GClust Evaluation Summary. Comparison was performed with the same strategy executed at a central server with access to all data. Measured outcomes were the agreement between a node's clustering estimate and the centralized clustering, averaged over all nodes: proportion of agreement (cluster validity) and Kappa statistic (cluster sanity), K = (P(A) - P(e)) / (1 - P(e)). State-of-the-art simulator: each sensor in the simulation (Visual Sense) generates a Gaussian stream with mean from one of the predefined Gaussian clusters. Evaluated parameters were number of clusters, network size, and cluster overlap.
  • 26. L2GClust Results. Average proportion of agreement converges (with small fluctuations).
  • 27. L2GClust Results. Sanity was confirmed with Kappa statistic always above 0.58.
  • 28. L2GClust Results. Real data from electricity demand sensors showed ability to improve with examples.
  • 29. L2GClust Main Properties. Local sketch yields: memoryless storage of summaries; a straightforward adaptation to most recent data; a reduction of the system's sensitivity to uncertainty. Local clustering with direct neighbors yields: no forwarding of information (reduced communication); low dimensionality of the clustering problem; sensitive information better preserved. Future work: evaluate L2GClust on smart grid sensor networks.
  • 30. Thank you!
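
Slides 11 and 12 describe each sensor keeping a discretized state and staying silent while that state is unchanged. A minimal Python sketch of that idea follows. It swaps in a fixed equal-width grid for the Partition Incremental Discretization named on the slide, and the class name `LocalSensor`, its `low`/`high`/`bins` parameters and the `observe` method are all hypothetical.

```python
class LocalSensor:
    """Hypothetical sensor: equal-width grid, reports only state changes."""

    def __init__(self, low, high, bins):
        if high <= low or bins < 1:
            raise ValueError("need high > low and bins >= 1")
        self.low = low
        self.high = high
        self.bins = bins
        self.width = (high - low) / bins
        self.state = None  # index of the current grid cell

    def observe(self, x):
        """Return the new cell index if the state changed, else None."""
        # Clip to the assumed range so out-of-range readings map to edge cells.
        clipped = min(max(x, self.low), self.high)
        cell = min(int((clipped - self.low) / self.width), self.bins - 1)
        if cell != self.state:
            self.state = cell
            return cell  # this is what would be sent to the server
        return None  # unchanged state: nothing is transmitted
```

With this design the communication cost depends on how often a reading crosses a cell boundary, not on the raw sampling rate.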
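
Slide 13 has the server keep a small list of the most frequent global states using Space-Saving. The sketch below follows the usual description of that algorithm (a fixed number of counters, with the minimum counter recycled for unseen items). The class and method names are illustrative assumptions, not taken from the slides, and global states are assumed to be tuples of per-sensor labels.

```python
class SpaceSaving:
    """Approximate frequent-items counter with a fixed number of slots."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.counts = {}  # monitored item -> estimated count
        self.errors = {}  # monitored item -> maximum overestimation

    def offer(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Replace the least frequent monitored item; the newcomer
            # inherits its count, which bounds the estimation error.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def top(self, k):
        """Return the k monitored items with the highest estimated counts."""
        return sorted(self.counts.items(), key=lambda kv: -kv[1])[:k]
```

The linear scan for the minimum keeps the sketch short; a production version would typically use a structure that finds the minimum counter in constant time.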
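
Slides 14 and 21 both name Furthest Point Clustering. A minimal sketch of the standard greedy version is given here: start from an arbitrary point, then repeatedly add the point farthest from the centers chosen so far. The function name, the choice of the first point as the initial center, and the use of Euclidean distance are assumptions. The Online Adaptive K-Means refinement from slide 14 is not covered.

```python
import math


def furthest_point_centers(points, k):
    """Greedy furthest-point selection of up to k centers (assumed Euclidean)."""
    if not points:
        raise ValueError("points must not be empty")
    centers = [points[0]]  # arbitrary first center: here, the first point
    # Distance from every point to its nearest chosen center.
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return centers
```

Each point is then assigned to its nearest center; in DGClust the inputs would be the frequent global states, and in L2GClust the centroids received from direct neighbors.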
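
Slide 18 names a memoryless α-fading average but does not give its formula. One common formulation, assumed here, keeps a faded sum S = x + αS and a faded count N = 1 + αN and reports S/N, so only two numbers are stored per sensor. The class name `FadingAverage` and the default `alpha` are hypothetical.

```python
class FadingAverage:
    """Assumed alpha-fading average: two floats of state, no window buffer."""

    def __init__(self, alpha=0.9):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.faded_sum = 0.0
        self.faded_count = 0.0

    def update(self, x):
        """Fold in a new reading and return the current faded average."""
        # Older readings are down-weighted by alpha at every step.
        self.faded_sum = x + self.alpha * self.faded_sum
        self.faded_count = 1.0 + self.alpha * self.faded_count
        return self.faded_sum / self.faded_count
```

With `alpha=1.0` this reduces to the ordinary running mean; smaller values weight recent readings more heavily, which matches the slide's point that the newest data matters most.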
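
The Kappa statistic on slide 25, K = (P(A) - P(e)) / (1 - P(e)), can be computed as in the sketch below. It assumes the two clusterings are given as equal-length label sequences whose cluster labels have already been aligned, and that P(e) is the chance agreement derived from each sequence's label frequencies; the slides do not spell out either detail, and the function name is illustrative.

```python
from collections import Counter


def kappa(labels_a, labels_b):
    """Kappa agreement between two aligned label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # P(A): observed proportion of agreement.
    p_a = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # P(e): agreement expected by chance from the marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    # Undefined when p_e == 1 (both sequences use a single, identical label).
    return (p_a - p_e) / (1 - p_e)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, which is why the slides use it as a sanity check alongside the raw proportion of agreement.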