SlideShare a Scribd company logo
1 of 26
Download to read offline
SSDBM, 20-22/7/2011, Portland OR
Hans-Peter Kriegel, Peer Kröger, Irene Ntoutsi, Arthur Zimek
Ludwig-Maximilians-Universität (LMU)
Munich,Germany
www.dbs.ifi.lmu.de
 Motivation
 Subspace clustering for static data (PreDeCon)
 Subspace clustering for dynamic data (incPreDeCon)
 Experiments
 Conclusions & Outlook
Density Based Subspace Clustering over Dynamic Data 2
 High dimensionality
 We tend to store more and more details regarding our applications
 Dynamic nature
 We tend to keep track of the population evolution over time
 Due to the advances in hardware/ software, we have nowadays
the ability to store all these data
 Applications: Telco, Banks, Retail industry, WWW ...
Density Based Subspace Clustering over Dynamic Data 3
The clustering problem becomes even
harder under these characteristics!
 Data evolution  cluster evolution
 How can we maintain online the clusters as new data arrive over time?
 Different lines of research, corresponding to different
application requirements:
 Incremental methods
▪ Appropriate for data arriving at a low rate
▪ Require access to the raw data for the re-arrangement of the clustering, but produces
lossless results (e.g. incDBSCAN, incOPTICS)
 Stream methods
▪ Appropriate for data arriving at a rapid rate
▪ No random access to the raw data, work upon summaries, thus produce lossy results
(e.g. CluStream, DenStream)
Density Based Subspace Clustering over Dynamic Data 4
1. The “curse of dimensionality”
 (Dmax_d – Dmin_d) / Dmin_d converges to zero with
increasing dimensionality d
▪ Dmin_d: distance to the nearest neighbor in d dimensions
▪ Dmax_d: distance to the farthest neighbor in d dimensions
2. Different features may be relevant for different
clusters
5
 Different solutions have been proposed:
 Feature selection methods (e.g. PCA): fail in 2., because they are global
 Subspace clustering methods: search for both clusters and subspaces
where these clusters exist
Density Based Subspace Clustering over Dynamic Data
 We chose:
 Incremental clustering to deal with the dynamic nature of the data
 Subspace clustering to deal with the high dimensionality of the data
 We work with the (static) algorithm PreDeCon:
 a subspace clustering algorithm
 relies on a density based model  updates are expected to cause only
limited local changes
 We propose an incremental version that maintains density
based subspace clusters as new data arrive over time
▪ Allows both points and dimensions associated to a cluster to evolve
▪ Deals with both single and batch updates
▪ Can serve as a framework for monitoring changes in dynamic
environments
Density Based Subspace Clustering over Dynamic Data 6
 Motivation
 Subspace clustering for static data (PreDeCon)
 Subspace clustering for dynamic data (incPreDeCon)
 Experiments
 Conclusions & Outlook
Density Based Subspace Clustering over Dynamic Data 7
 Extends DBSCAN to high dimensional spaces by incorporating
the notion of dimension preferences in the distance function
 For each point p, it defines its subspace preference vector:
 VARi is the variance along dimension j in Nε(p):
Density Based Subspace Clustering over Dynamic Data 8
p
Ai
Aj
δ, κ (κ>>1) are input parameters
 Preference weighted distance function:
 Preference weighted ε-neighborhood:
Density Based Subspace Clustering over Dynamic Data 9
simple
ε-neighborhood
preference weighted
ε-neighborhood

 Preference weighted core points:
 Direct density reachability, reachability and
connectivity are defined based on preference
weighted core points
 A subspace preference cluster is a maximal
density connected set of points associated
with a certain subspace preference vector.
Density Based Subspace Clustering over Dynamic Data 10
1 2
λ=1
μ=6
 Motivation
 Subspace clustering for static data (PreDeCon)
 Subspace clustering for dynamic data (incPreDeCon)
 Experiments
 Conclusions & Outlook
Density Based Subspace Clustering over Dynamic Data 11
 Observation: A subspace preference
cluster is uniquely determined by one
of its preference weighted core points.
 Idea: Check whether the insertion of a
new point p affects the core member
property of the points in the dataset
12Density Based Subspace Clustering over Dynamic Data
 3 interesting changes might occur w.r.t. core property:
 a non-core point might turn into core  new density connections
 a core point might turn into non-core  demolished density
connections
 a core point might remain core, but under different dimension
preferences  both cases
p
λ=1
μ=6
 The insertion of p, directly affects the points q
in its ε-neighborhood.
 Nε(q) is affected because p is now a member of it
13
The core property
of q might be
affected
Density Based Subspace Clustering over Dynamic Data
p
q

v
 
λ=1
μ=6
 non-core  core  core  non-core
 core  core, under
different preferences
 Effect in core property:
 The change in the core property of q, might cause changes to
points that are preference weighted reachable from q:
 if q: corenon-core, any density
connectivity relying on q is
destroyed
 if q: non-corecore, some new
density connectivity might arise
 If q core  core but under different
dimension preferences, both might occur
 Affected points:
14Density Based Subspace Clustering over Dynamic Data
λ=1
μ=6
 Naïve solution: cluster AFFECTEDD(p) from scratch
 But, any changes in AFFECTEDD(p) are initiated by points in
Nε(p)
 No need to consider all points in Nε(p), just those with affected
core member property (AFFECTEDCORE)
 If a point q’ is an affected core point, we consider as seeds points
for its update any core point q in its preferred neighborhood.
 Apply the expand procedure of PreDeCon using the points in the
UPDSEED
15Density Based Subspace Clustering over Dynamic Data
 Idea: Don’t treat insertions separately, rather
insert the whole batch and update the
clustering based on the whole batch
 Is more efficient since some computations might
take place only once:
 The preference vector computation for each point
 The set of affected core points
 The affected points
 Efficient for cases where the updates are related
to each other
 e.g. they might correspond to patients suffering from a
specific disease
Density Based Subspace Clustering over Dynamic Data 17
 Motivation
 Subspace clustering for static data (PreDeCon)
 Subspace clustering for dynamic data (incPreDeCon)
 Experiments
 Conclusions & Outlook
Density Based Subspace Clustering over Dynamic Data 18
 We compared incPreDeCon to PreDeCon
 We evaluated # range queries required by each algorithm
 For each dataset,
 Perform 100 random inserts
 Compute the range queries for PreDeCon
 Compute the range queries for incPreDeCon
Density Based Subspace Clustering over Dynamic Data 19
 Varying the population  Varying the dimensions
Density Based Subspace Clustering over Dynamic Data 20
 Random updates  “Local” updates
Density Based Subspace Clustering over Dynamic Data 21
 Concentration of different metabolites in the blood of
newborns in Bavaria
Density Based Subspace Clustering over Dynamic Data 22
 Motivation
 Subspace clustering for static data (PreDeCon)
 Subspace clustering for dynamic data (incPreDeCon)
 Experiments
 Conclusions & Outlook
Density Based Subspace Clustering over Dynamic Data 23
 We proposed a density based subspace clustering algorithm for
dynamic data
 It allows both points and dimensions associated to a cluster to evolve
over time
 It deals with single and batch updates
 It can serve as a framework for monitoring changes in dynamic
environments
Density Based Subspace Clustering over Dynamic Data 24
Partitioning
methods
Inc/ Stream
clustering
• Single-pass k-Means
• STREAM k-Means
• CluStream
• k-Means
• k-Medoids
• DenStream
• incDBSCAN
• incOPTICS
• DBSCAN
•OPTICS
• STING • DStream
Density-based
methods
Grid-based
methods
25
Static
clustering
Subspace
clustering
•PROCLUS
•PreDeCon
•CLIQUE
Inc/ Stream
Subspace
clustering
•HPStream
•incPreDeCon
•DUCStream
Density Based Subspace Clustering over Dynamic Data
Density Based Subspace Clustering over Dynamic Data 26
Thank you
for your attention!
The speaker‘s attendance at this conference was sponsored by
the Alexander von Humboldt Foundation
http://www.humboldt-foundation.de
Density Based Subspace Clustering over Dynamic Data 27

More Related Content

What's hot

A Survey on Balancing the Network Load Using Geographic Hash Tables
A Survey on Balancing the Network Load Using Geographic Hash TablesA Survey on Balancing the Network Load Using Geographic Hash Tables
A Survey on Balancing the Network Load Using Geographic Hash TablesIOSR Journals
 
A Parallel Algorithm Template for Updating Single-Source Shortest Paths in La...
A Parallel Algorithm Template for Updating Single-Source Shortest Paths in La...A Parallel Algorithm Template for Updating Single-Source Shortest Paths in La...
A Parallel Algorithm Template for Updating Single-Source Shortest Paths in La...Subhajit Sahu
 
A Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelA Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelJenny Liu
 
A FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKS
A FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKSA FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKS
A FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKScsandit
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Cryptographic Cloud Storage with Hadoop Implementation
Cryptographic Cloud Storage with Hadoop ImplementationCryptographic Cloud Storage with Hadoop Implementation
Cryptographic Cloud Storage with Hadoop ImplementationIOSR Journals
 
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...Jinwon Lee
 
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Performance and cost evaluation of an...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Performance and cost evaluation of an...IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Performance and cost evaluation of an...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Performance and cost evaluation of an...IEEEGLOBALSOFTSTUDENTPROJECTS
 
REDUCING FREQUENCY OF GROUP REKEYING OPERATION
REDUCING FREQUENCY OF GROUP REKEYING OPERATIONREDUCING FREQUENCY OF GROUP REKEYING OPERATION
REDUCING FREQUENCY OF GROUP REKEYING OPERATIONcsandit
 
Selection of Energy Efficient Clustering Algorithm upon Cluster Head Failure ...
Selection of Energy Efficient Clustering Algorithm upon Cluster Head Failure ...Selection of Energy Efficient Clustering Algorithm upon Cluster Head Failure ...
Selection of Energy Efficient Clustering Algorithm upon Cluster Head Failure ...ijsrd.com
 
A Framework for Design of Partially Replicated Distributed Database Systems w...
A Framework for Design of Partially Replicated Distributed Database Systems w...A Framework for Design of Partially Replicated Distributed Database Systems w...
A Framework for Design of Partially Replicated Distributed Database Systems w...ijdms
 
O N T HE D ISTRIBUTION OF T HE M AXIMAL C LIQUE S IZE F OR T HE V ERTICES IN ...
O N T HE D ISTRIBUTION OF T HE M AXIMAL C LIQUE S IZE F OR T HE V ERTICES IN ...O N T HE D ISTRIBUTION OF T HE M AXIMAL C LIQUE S IZE F OR T HE V ERTICES IN ...
O N T HE D ISTRIBUTION OF T HE M AXIMAL C LIQUE S IZE F OR T HE V ERTICES IN ...csandit
 
QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...
QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...
QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...Papitha Velumani
 
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...cscpconf
 
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures Intel® Software
 
Refining the Estimation of the Available Bandwidth in Inter-Cloud Links for T...
Refining the Estimation of the Available Bandwidth in Inter-Cloud Links for T...Refining the Estimation of the Available Bandwidth in Inter-Cloud Links for T...
Refining the Estimation of the Available Bandwidth in Inter-Cloud Links for T...Thiago Genez
 
Benefit based data caching in ad hoc networks (synopsis)
Benefit based data caching in ad hoc networks (synopsis)Benefit based data caching in ad hoc networks (synopsis)
Benefit based data caching in ad hoc networks (synopsis)Mumbai Academisc
 

What's hot (18)

A Survey on Balancing the Network Load Using Geographic Hash Tables
A Survey on Balancing the Network Load Using Geographic Hash TablesA Survey on Balancing the Network Load Using Geographic Hash Tables
A Survey on Balancing the Network Load Using Geographic Hash Tables
 
A Parallel Algorithm Template for Updating Single-Source Shortest Paths in La...
A Parallel Algorithm Template for Updating Single-Source Shortest Paths in La...A Parallel Algorithm Template for Updating Single-Source Shortest Paths in La...
A Parallel Algorithm Template for Updating Single-Source Shortest Paths in La...
 
A Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelA Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in Parallel
 
A FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKS
A FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKSA FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKS
A FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKS
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Cryptographic Cloud Storage with Hadoop Implementation
Cryptographic Cloud Storage with Hadoop ImplementationCryptographic Cloud Storage with Hadoop Implementation
Cryptographic Cloud Storage with Hadoop Implementation
 
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
 
Todtree
TodtreeTodtree
Todtree
 
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Performance and cost evaluation of an...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Performance and cost evaluation of an...IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Performance and cost evaluation of an...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Performance and cost evaluation of an...
 
REDUCING FREQUENCY OF GROUP REKEYING OPERATION
REDUCING FREQUENCY OF GROUP REKEYING OPERATIONREDUCING FREQUENCY OF GROUP REKEYING OPERATION
REDUCING FREQUENCY OF GROUP REKEYING OPERATION
 
Selection of Energy Efficient Clustering Algorithm upon Cluster Head Failure ...
Selection of Energy Efficient Clustering Algorithm upon Cluster Head Failure ...Selection of Energy Efficient Clustering Algorithm upon Cluster Head Failure ...
Selection of Energy Efficient Clustering Algorithm upon Cluster Head Failure ...
 
A Framework for Design of Partially Replicated Distributed Database Systems w...
A Framework for Design of Partially Replicated Distributed Database Systems w...A Framework for Design of Partially Replicated Distributed Database Systems w...
A Framework for Design of Partially Replicated Distributed Database Systems w...
 
O N T HE D ISTRIBUTION OF T HE M AXIMAL C LIQUE S IZE F OR T HE V ERTICES IN ...
O N T HE D ISTRIBUTION OF T HE M AXIMAL C LIQUE S IZE F OR T HE V ERTICES IN ...O N T HE D ISTRIBUTION OF T HE M AXIMAL C LIQUE S IZE F OR T HE V ERTICES IN ...
O N T HE D ISTRIBUTION OF T HE M AXIMAL C LIQUE S IZE F OR T HE V ERTICES IN ...
 
QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...
QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...
QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...
 
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
 
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
 
Refining the Estimation of the Available Bandwidth in Inter-Cloud Links for T...
Refining the Estimation of the Available Bandwidth in Inter-Cloud Links for T...Refining the Estimation of the Available Bandwidth in Inter-Cloud Links for T...
Refining the Estimation of the Available Bandwidth in Inter-Cloud Links for T...
 
Benefit based data caching in ad hoc networks (synopsis)
Benefit based data caching in ad hoc networks (synopsis)Benefit based data caching in ad hoc networks (synopsis)
Benefit based data caching in ad hoc networks (synopsis)
 

Similar to Density Based Subspace Clustering Over Dynamic Data

3.5 model based clustering
3.5 model based clustering3.5 model based clustering
3.5 model based clusteringKrish_ver2
 
Large Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewLarge Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewVahid Mirjalili
 
Energy efficient communication techniques for wireless micro sensor networks
Energy efficient communication techniques for wireless micro sensor networksEnergy efficient communication techniques for wireless micro sensor networks
Energy efficient communication techniques for wireless micro sensor networksPushpita Biswas
 
Clustering Algorithms for Data Stream
Clustering Algorithms for Data StreamClustering Algorithms for Data Stream
Clustering Algorithms for Data StreamIRJET Journal
 
Lecture 17: Supervised Learning Recap
Lecture 17: Supervised Learning RecapLecture 17: Supervised Learning Recap
Lecture 17: Supervised Learning Recapbutest
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsYONG ZHENG
 
IEEE Datamining 2016 Title and Abstract
IEEE  Datamining 2016 Title and AbstractIEEE  Datamining 2016 Title and Abstract
IEEE Datamining 2016 Title and Abstracttsysglobalsolutions
 
A fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataA fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataAlexander Decker
 
A TALE of DATA PATTERN DISCOVERY IN PARALLEL
A TALE of DATA PATTERN DISCOVERY IN PARALLELA TALE of DATA PATTERN DISCOVERY IN PARALLEL
A TALE of DATA PATTERN DISCOVERY IN PARALLELJenny Liu
 
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data ModelClustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data ModelWaqas Tariq
 
NETWORK-AWARE DATA PREFETCHING OPTIMIZATION OF COMPUTATIONS IN A HETEROGENEOU...
NETWORK-AWARE DATA PREFETCHING OPTIMIZATION OF COMPUTATIONS IN A HETEROGENEOU...NETWORK-AWARE DATA PREFETCHING OPTIMIZATION OF COMPUTATIONS IN A HETEROGENEOU...
NETWORK-AWARE DATA PREFETCHING OPTIMIZATION OF COMPUTATIONS IN A HETEROGENEOU...IJCNCJournal
 
Managing Multidimensional Historical
Managing Multidimensional HistoricalManaging Multidimensional Historical
Managing Multidimensional HistoricalArul Suju
 
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
 ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATANexgen Technology
 
Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)Fatimakhan325
 
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...csandit
 
Big data Clustering Algorithms And Strategies
Big data Clustering Algorithms And StrategiesBig data Clustering Algorithms And Strategies
Big data Clustering Algorithms And StrategiesFarzad Nozarian
 
Parallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive IndexingParallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive IndexingIRJET Journal
 
Network Flow Pattern Extraction by Clustering Eugine Kang
Network Flow Pattern Extraction by Clustering Eugine KangNetwork Flow Pattern Extraction by Clustering Eugine Kang
Network Flow Pattern Extraction by Clustering Eugine KangEugine Kang
 

Similar to Density Based Subspace Clustering Over Dynamic Data (20)

3.5 model based clustering
3.5 model based clustering3.5 model based clustering
3.5 model based clustering
 
Large Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewLarge Scale Data Clustering: an overview
Large Scale Data Clustering: an overview
 
Energy efficient communication techniques for wireless micro sensor networks
Energy efficient communication techniques for wireless micro sensor networksEnergy efficient communication techniques for wireless micro sensor networks
Energy efficient communication techniques for wireless micro sensor networks
 
Clustering Algorithms for Data Stream
Clustering Algorithms for Data StreamClustering Algorithms for Data Stream
Clustering Algorithms for Data Stream
 
CNN
CNNCNN
CNN
 
Lecture 17: Supervised Learning Recap
Lecture 17: Supervised Learning RecapLecture 17: Supervised Learning Recap
Lecture 17: Supervised Learning Recap
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender Systems
 
IEEE Datamining 2016 Title and Abstract
IEEE  Datamining 2016 Title and AbstractIEEE  Datamining 2016 Title and Abstract
IEEE Datamining 2016 Title and Abstract
 
A fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataA fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming data
 
A TALE of DATA PATTERN DISCOVERY IN PARALLEL
A TALE of DATA PATTERN DISCOVERY IN PARALLELA TALE of DATA PATTERN DISCOVERY IN PARALLEL
A TALE of DATA PATTERN DISCOVERY IN PARALLEL
 
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data ModelClustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model
 
NETWORK-AWARE DATA PREFETCHING OPTIMIZATION OF COMPUTATIONS IN A HETEROGENEOU...
NETWORK-AWARE DATA PREFETCHING OPTIMIZATION OF COMPUTATIONS IN A HETEROGENEOU...NETWORK-AWARE DATA PREFETCHING OPTIMIZATION OF COMPUTATIONS IN A HETEROGENEOU...
NETWORK-AWARE DATA PREFETCHING OPTIMIZATION OF COMPUTATIONS IN A HETEROGENEOU...
 
Managing Multidimensional Historical
Managing Multidimensional HistoricalManaging Multidimensional Historical
Managing Multidimensional Historical
 
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
 ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
 
Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)
 
CLOUD BIOINFORMATICS Part1
 CLOUD BIOINFORMATICS Part1 CLOUD BIOINFORMATICS Part1
CLOUD BIOINFORMATICS Part1
 
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
 
Big data Clustering Algorithms And Strategies
Big data Clustering Algorithms And StrategiesBig data Clustering Algorithms And Strategies
Big data Clustering Algorithms And Strategies
 
Parallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive IndexingParallel KNN for Big Data using Adaptive Indexing
Parallel KNN for Big Data using Adaptive Indexing
 
Network Flow Pattern Extraction by Clustering Eugine Kang
Network Flow Pattern Extraction by Clustering Eugine KangNetwork Flow Pattern Extraction by Clustering Eugine Kang
Network Flow Pattern Extraction by Clustering Eugine Kang
 

More from Eirini Ntoutsi

AAISI AI Colloquium 30/3/2021: Bias in AI systems
AAISI AI Colloquium 30/3/2021: Bias in AI systemsAAISI AI Colloquium 30/3/2021: Bias in AI systems
AAISI AI Colloquium 30/3/2021: Bias in AI systemsEirini Ntoutsi
 
Fairness-aware learning: From single models to sequential ensemble learning a...
Fairness-aware learning: From single models to sequential ensemble learning a...Fairness-aware learning: From single models to sequential ensemble learning a...
Fairness-aware learning: From single models to sequential ensemble learning a...Eirini Ntoutsi
 
Bias in AI-systems: A multi-step approach
Bias in AI-systems: A multi-step approachBias in AI-systems: A multi-step approach
Bias in AI-systems: A multi-step approachEirini Ntoutsi
 
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...Eirini Ntoutsi
 
A Machine Learning Primer,
A Machine Learning Primer,A Machine Learning Primer,
A Machine Learning Primer,Eirini Ntoutsi
 
(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...
(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...
(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...Eirini Ntoutsi
 
Redundancies in Data and their E ect on the Evaluation of Recommendation Syst...
Redundancies in Data and their Eect on the Evaluation of Recommendation Syst...Redundancies in Data and their Eect on the Evaluation of Recommendation Syst...
Redundancies in Data and their E ect on the Evaluation of Recommendation Syst...Eirini Ntoutsi
 
Density-based Projected Clustering over High Dimensional Data Streams
Density-based Projected Clustering over High Dimensional Data StreamsDensity-based Projected Clustering over High Dimensional Data Streams
Density-based Projected Clustering over High Dimensional Data StreamsEirini Ntoutsi
 
Summarizing Cluster Evolution in Dynamic Environments
Summarizing Cluster Evolution in Dynamic EnvironmentsSummarizing Cluster Evolution in Dynamic Environments
Summarizing Cluster Evolution in Dynamic EnvironmentsEirini Ntoutsi
 
Cluster formation over huge volatile robotic data
Cluster formation over huge volatile robotic data Cluster formation over huge volatile robotic data
Cluster formation over huge volatile robotic data Eirini Ntoutsi
 
Discovering and Monitoring Product Features and the Opinions on them with OPI...
Discovering and Monitoring Product Features and the Opinions on them with OPI...Discovering and Monitoring Product Features and the Opinions on them with OPI...
Discovering and Monitoring Product Features and the Opinions on them with OPI...Eirini Ntoutsi
 

More from Eirini Ntoutsi (12)

AAISI AI Colloquium 30/3/2021: Bias in AI systems
AAISI AI Colloquium 30/3/2021: Bias in AI systemsAAISI AI Colloquium 30/3/2021: Bias in AI systems
AAISI AI Colloquium 30/3/2021: Bias in AI systems
 
Fairness-aware learning: From single models to sequential ensemble learning a...
Fairness-aware learning: From single models to sequential ensemble learning a...Fairness-aware learning: From single models to sequential ensemble learning a...
Fairness-aware learning: From single models to sequential ensemble learning a...
 
Bias in AI-systems: A multi-step approach
Bias in AI-systems: A multi-step approachBias in AI-systems: A multi-step approach
Bias in AI-systems: A multi-step approach
 
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
 
A Machine Learning Primer,
A Machine Learning Primer,A Machine Learning Primer,
A Machine Learning Primer,
 
(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...
(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...
(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...
 
Redundancies in Data and their E ect on the Evaluation of Recommendation Syst...
Redundancies in Data and their Eect on the Evaluation of Recommendation Syst...Redundancies in Data and their Eect on the Evaluation of Recommendation Syst...
Redundancies in Data and their E ect on the Evaluation of Recommendation Syst...
 
Density-based Projected Clustering over High Dimensional Data Streams
Density-based Projected Clustering over High Dimensional Data StreamsDensity-based Projected Clustering over High Dimensional Data Streams
Density-based Projected Clustering over High Dimensional Data Streams
 
Summarizing Cluster Evolution in Dynamic Environments
Summarizing Cluster Evolution in Dynamic EnvironmentsSummarizing Cluster Evolution in Dynamic Environments
Summarizing Cluster Evolution in Dynamic Environments
 
Cluster formation over huge volatile robotic data
Cluster formation over huge volatile robotic data Cluster formation over huge volatile robotic data
Cluster formation over huge volatile robotic data
 
Discovering and Monitoring Product Features and the Opinions on them with OPI...
Discovering and Monitoring Product Features and the Opinions on them with OPI...Discovering and Monitoring Product Features and the Opinions on them with OPI...
Discovering and Monitoring Product Features and the Opinions on them with OPI...
 
NeeMo@OpenCoffeeXX
NeeMo@OpenCoffeeXXNeeMo@OpenCoffeeXX
NeeMo@OpenCoffeeXX
 

Recently uploaded

Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxShobhayan Kirtania
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 

Recently uploaded (20)

Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 

Density Based Subspace Clustering Over Dynamic Data

  • 1. SSDBM, 20-22/7/2011, Portland OR Hans-Peter Kriegel, Peer Kröger, Irene Ntoutsi, Arthur Zimek Ludwig-Maximilians-Universität (LMU) Munich,Germany www.dbs.ifi.lmu.de
  • 2.  Motivation  Subspace clustering for static data (PreDeCon)  Subspace clustering for dynamic data (incPreDeCon)  Experiments  Conclusions & Outlook Density Based Subspace Clustering over Dynamic Data 2
  • 3.  High dimensionality  We tend to store more and more details regarding our applications  Dynamic nature  We tend to keep track of the population evolution over time  Due to the advances in hardware/ software, we have nowadays the ability to store all these data  Applications: Telco, Banks, Retail industry, WWW ... Density Based Subspace Clustering over Dynamic Data 3 The clustering problem becomes even harder under these characteristics!
  • 4.  Data evolution  cluster evolution  How can we maintain online the clusters as new data arrive over time?  Different lines of research, corresponding to different application requirements:  Incremental methods ▪ Appropriate for data arriving at a low rate ▪ Require access to the raw data for the re-arrangement of the clustering, but produces lossless results (e.g. incDBSCAN, incOPTICS)  Stream methods ▪ Appropriate for data arriving at a rapid rate ▪ No random access to the raw data, work upon summaries, thus produce lossy results (e.g. CluStream, DenStream) Density Based Subspace Clustering over Dynamic Data 4
  • 5. 1. The “curse of dimensionality”  (Dmax_d – Dmin_d) / Dmin_d converges to zero with increasing dimensionality d ▪ Dmin_d: distance to the nearest neighbor in d dimensions ▪ Dmax_d: distance to the farthest neighbor in d dimensions 2. Different features may be relevant for different clusters 5  Different solutions have been proposed:  Feature selection methods (e.g. PCA): fail in 2., because they are global  Subspace clustering methods: search for both clusters and subspaces where these clusters exist Density Based Subspace Clustering over Dynamic Data
  • 6.  We chose:  Incremental clustering to deal with the dynamic nature of the data  Subspace clustering to deal with the high dimensionality of the data  We work with the (static) algorithm PreDeCon:  a subspace clustering algorithm  relies on a density based model  updates are expected to cause only limited local changes  We propose an incremental version that maintains density based subspace clusters as new data arrive over time ▪ Allows both points and dimensions associated to a cluster to evolve ▪ Deals with both single and batch updates ▪ Can serve as a framework for monitoring changes in dynamic environments Density Based Subspace Clustering over Dynamic Data 6
  • 7.  Motivation  Subspace clustering for static data (PreDeCon)  Subspace clustering for dynamic data (incPreDeCon)  Experiments  Conclusions & Outlook Density Based Subspace Clustering over Dynamic Data 7
  • 8.  Extends DBSCAN to high dimensional spaces by incorporating the notion of dimension preferences in the distance function  For each point p, it defines its subspace preference vector:  VARi is the variance along dimension j in Nε(p): Density Based Subspace Clustering over Dynamic Data 8 p Ai Aj δ, κ (κ>>1) are input parameters
  • 9.  Preference weighted distance function:  Preference weighted ε-neighborhood: Density Based Subspace Clustering over Dynamic Data 9 simple ε-neighborhood preference weighted ε-neighborhood 
  • 10.  Preference weighted core points:  Direct density reachability, reachability and connectivity are defined based on preference weighted core points  A subspace preference cluster is a maximal density connected set of points associated with a certain subspace preference vector. Density Based Subspace Clustering over Dynamic Data 10 1 2 λ=1 μ=6
  • 11.  Motivation  Subspace clustering for static data (PreDeCon)  Subspace clustering for dynamic data (incPreDeCon)  Experiments  Conclusions & Outlook Density Based Subspace Clustering over Dynamic Data 11
  • 12.  Observation: A subspace preference cluster is uniquely determined by one of its preference weighted core points.  Idea: Check whether the insertion of a new point p affects the core member property of the points in the dataset 12Density Based Subspace Clustering over Dynamic Data  3 interesting changes might occur w.r.t. core property:  a non-core point might turn into core  new density connections  a core point might turn into non-core  demolished density connections  a core point might remain core, but under different dimension preferences  both cases p λ=1 μ=6
  • 13.  The insertion of p, directly affects the points q in its ε-neighborhood.  Nε(q) is affected because p is now a member of it 13 The core property of q might be affected Density Based Subspace Clustering over Dynamic Data p q  v   λ=1 μ=6  non-core  core  core  non-core  core  core, under different preferences  Effect in core property:
  • 14.  The change in the core property of q, might cause changes to points that are preference weighted reachable from q:  if q: corenon-core, any density connectivity relying on q is destroyed  if q: non-corecore, some new density connectivity might arise  If q core  core but under different dimension preferences, both might occur  Affected points: 14Density Based Subspace Clustering over Dynamic Data λ=1 μ=6
  • 15.  Naïve solution: cluster AFFECTEDD(p) from scratch  But, any changes in AFFECTEDD(p) are initiated by points in Nε(p)  No need to consider all points in Nε(p), just those with affected core member property (AFFECTEDCORE)  If a point q’ is an affected core point, we consider as seeds points for its update any core point q in its preferred neighborhood.  Apply the expand procedure of PreDeCon using the points in the UPDSEED 15Density Based Subspace Clustering over Dynamic Data
  • 16.  Idea: Don’t treat insertions separately, rather insert the whole batch and update the clustering based on the whole batch  Is more efficient since some computations might take place only once:  The preference vector computation for each point  The set of affected core points  The affected points  Efficient for cases where the updates are related to each other  e.g. they might correspond to patients suffering from a specific disease Density Based Subspace Clustering over Dynamic Data 17
  • 17.  Motivation  Subspace clustering for static data (PreDeCon)  Subspace clustering for dynamic data (incPreDeCon)  Experiments  Conclusions & Outlook Density Based Subspace Clustering over Dynamic Data 18
  • 18.  We compared incPreDeCon to PreDeCon  We evaluated # range queries required by each algorithm  For each dataset,  Perform 100 random inserts  Compute the range queries for PreDeCon  Compute the range queries for incPreDeCon Density Based Subspace Clustering over Dynamic Data 19
  • 19.  Varying the population  Varying the dimensions Density Based Subspace Clustering over Dynamic Data 20
  • 20.  Random updates  “Local” updates Density Based Subspace Clustering over Dynamic Data 21
  • 21.  Concentration of different metabolites in the blood of newborns in Bavaria Density Based Subspace Clustering over Dynamic Data 22
  • 22.  Motivation  Subspace clustering for static data (PreDeCon)  Subspace clustering for dynamic data (incPreDeCon)  Experiments  Conclusions & Outlook Density Based Subspace Clustering over Dynamic Data 23
  • 23.  We proposed a density based subspace clustering algorithm for dynamic data  It allows both points and dimensions associated to a cluster to evolve over time  It deals with single and batch updates  It can serve as a framework for monitoring changes in dynamic environments Density Based Subspace Clustering over Dynamic Data 24
  • 24. Partitioning methods Inc/ Stream clustering • Single-pass k-Means • STREAM k-Means • CluStream • k-Means • k-Medoids • DenStream • incDBSCAN • incOPTICS • DBSCAN •OPTICS • STING • DStream Density-based methods Grid-based methods 25 Static clustering Subspace clustering •PROCLUS •PreDeCon •CLIQUE Inc/ Stream Subspace clustering •HPStream •incPreDeCon •DUCStream Density Based Subspace Clustering over Dynamic Data
  • 25. Density Based Subspace Clustering over Dynamic Data 26 Thank you for your attention!
  • 26. The speaker‘s attendance at this conference was sponsored by the Alexander von Humboldt Foundation http://www.humboldt-foundation.de Density Based Subspace Clustering over Dynamic Data 27