SlideShare a Scribd company logo
1 of 2
Download to read offline
Density-based Projected Clustering over High Dimensional Data Streams
Information Engineering and Computer
Science Department University of Trento,
Italy
Institute for Informatics,
Ludwig-Maximilians-Universität,
München, Germany
EIRINI NTOUTSI1, ARTHUR ZIMEK1, THEMIS PALPANAS2, PEER KRÖGER1, HANS-PETER KRIEGEL1
1Institute for Informatics, Ludwig-Maximilians-Universität (LMU) München, Germany
2Information Engineering and Computer Science Department (DISI), University of Trento, Italy
HIGH DIMENSIONAL & DYNAMIC DATA:
• High dimensionality
- overlapping / irrelevant/ locally relevant attributes
• Dynamic/ Stream data
- only 1 look at the data (upon their arrival) / volatile data
! Clustering upon such kind of data is very challenging!!
- both members and dimensions might evolve over time
 Most existing approaches deal with 1 aspect of the problem
 HPStream assumes a constant #clusters over time
OUR SOLUTION (HDDSTREAM):
1st density-based projected clustering algorithm for
clustering high dimensional data streams:
• High dimensionality  projected clustering
• Stream data  online summarize– offline cluster
• Density based clustering  no assumption on
#clusters/ invariant to outliers/ arbitrary shapes
 Quality improvement
 Bounded memory
 # summaries adapts to the underlying population
PROJECTED MICROCLUSTERS:
For points C={p1, …, pn} arriving at t1, …, tn, the summary
models both content and dimension preferences of C at t.
mc ≡mc(C,t) = <CF1(t), CF2(t), W(t)>
- CF1(t),/ CF2(t): temporal weighted linear/square sum of the points
- W(t): sum of points weights
Φ(mc) = <φ1, φ2, .., φd>
- j is a preferred dimension if: VARj(mc) ≤ δ
CLUSTERS & OUTLIERS
• Core projected microclusters (CORE-PMC):
(1) radiusΦ(mc) ≤ ε (radius criterion)
(2) W(t) ≥ μ (density criterion)
(3) PDIM(mc) ≤ π (dimensionality criterion)
The role of clusters and outliers in a stream often exchange:
• Potential core PMC (PCORE-PMC):
(1), (2) relax the density criterion W(t) ≥ β*μ, (3)
• Outlier MC (O-MC) :
(1), (2) relax density criterionW(t) ≥ β*μ, (3) relax dim. criterion
 QUALITY IMPROVEMENT  CHANGE DETECTION & MONITORING
NetworkIntrusiondataset
ForestsCoverTypesdataset
Content summary
Dimension preference vector
Density-based Projected Clustering over High Dimensional Data Streams
Information Engineering and Computer
Science Department University of Trento,
Italy
Institute for Informatics,
Ludwig-Maximilians-Universität,
München, Germany
EIRINI NTOUTSI1, ARTHUR ZIMEK1, THEMIS PALPANAS2, PEER KRÖGER1, HANS-PETER KRIEGEL1
1Institute for Informatics, Ludwig-Maximilians-Universität (LMU) München, Germany
2Information Engineering and Computer Science Department (DISI), University of Trento, Italy
HIGH DIMENSIONAL & DYNAMIC DATA:
• High dimensionality
- overlapping / irrelevant/ locally relevant attributes
• Dynamic/ Stream data
- only 1 look at the data (upon their arrival) / volatile data
! Clustering upon such kind of data is very challenging!!
- both members and dimensions might evolve over time
 Most existing approaches deal with 1 aspect of the problem
 HPStream assumes a constant #clusters over time
OUR SOLUTION (HDDSTREAM):
1st density-based projected clustering algorithm for
clustering high dimensional data streams:
• High dimensionality  projected clustering
• Stream data  online summarize– offline cluster
• Density based clustering  no assumption on
#clusters/ invariant to outliers/ arbitrary shapes
 Quality improvement
 Bounded memory
 # summaries adapts to the underlying population
PROJECTED MICROCLUSTERS:
For points C={p1, …, pn} arriving at t1, …, tn, the summary
models both content and dimension preferences of C at t.
mc ≡mc(C,t) = <CF1(t), CF2(t), W(t)>
- CF1(t),/ CF2(t): temporal weighted linear/square sum of the points
- W(t): sum of points weights
Φ(mc) = <φ1, φ2, .., φd>
- j is a preferred dimension if: VARj(mc) ≤ δ
CLUSTERS & OUTLIERS
• Core projected microclusters (CORE-PMC):
(1) radiusΦ(mc) ≤ ε (radius criterion)
(2) W(t) ≥ μ (density criterion)
(3) PDIM(mc) ≤ π (dimensionality criterion)
The role of clusters and outliers in a stream often exchange:
• Potential core PMC (PCORE-PMC):
(1), (2) relax the density criterion W(t) ≥ β*μ, (3)
• Outlier MC (O-MC) :
(1), (2) relax density criterionW(t) ≥ β*μ, (3) relax dim. criterion
0
10
20
30
40
50
60
70
80
90
100
200 500 1000 1500 2000 3000
a
v
g
p
u
rity
(%
)
window size
HPStream HDDStream
 QUALITY IMPROVEMENT  CHANGE DETECTION & MONITORING
NetworkIntrusiondataset
Forests Cover Types dataset
Content summary
Dimension preference vector

More Related Content

What's hot

Survey on clustering based color image segmentation and novel approaches to f...
Survey on clustering based color image segmentation and novel approaches to f...Survey on clustering based color image segmentation and novel approaches to f...
Survey on clustering based color image segmentation and novel approaches to f...
eSAT Journals
 

What's hot (20)

FUAT – A Fuzzy Clustering Analysis Tool
FUAT – A Fuzzy Clustering Analysis ToolFUAT – A Fuzzy Clustering Analysis Tool
FUAT – A Fuzzy Clustering Analysis Tool
 
Self-organizing map
Self-organizing mapSelf-organizing map
Self-organizing map
 
NODE FAILURE TIME AND COVERAGE LOSS TIME ANALYSIS FOR MAXIMUM STABILITY VS MI...
NODE FAILURE TIME AND COVERAGE LOSS TIME ANALYSIS FOR MAXIMUM STABILITY VS MI...NODE FAILURE TIME AND COVERAGE LOSS TIME ANALYSIS FOR MAXIMUM STABILITY VS MI...
NODE FAILURE TIME AND COVERAGE LOSS TIME ANALYSIS FOR MAXIMUM STABILITY VS MI...
 
Survey on clustering based color image segmentation
Survey on clustering based color image segmentationSurvey on clustering based color image segmentation
Survey on clustering based color image segmentation
 
Survey on clustering based color image segmentation and novel approaches to f...
Survey on clustering based color image segmentation and novel approaches to f...Survey on clustering based color image segmentation and novel approaches to f...
Survey on clustering based color image segmentation and novel approaches to f...
 
Fuzzy c-means
Fuzzy c-meansFuzzy c-means
Fuzzy c-means
 
A Transmission Range Based Clustering Algorithm for Topology Control Manet
A Transmission Range Based Clustering Algorithm for Topology Control ManetA Transmission Range Based Clustering Algorithm for Topology Control Manet
A Transmission Range Based Clustering Algorithm for Topology Control Manet
 
OPTIMIZED TASK ALLOCATION IN SENSOR NETWORKS
OPTIMIZED TASK ALLOCATION IN SENSOR NETWORKSOPTIMIZED TASK ALLOCATION IN SENSOR NETWORKS
OPTIMIZED TASK ALLOCATION IN SENSOR NETWORKS
 
Paper id 26201482
Paper id 26201482Paper id 26201482
Paper id 26201482
 
Characterization of Random Carrier Sense Multiple Access wireless Networks
Characterization of Random Carrier Sense Multiple Access wireless NetworksCharacterization of Random Carrier Sense Multiple Access wireless Networks
Characterization of Random Carrier Sense Multiple Access wireless Networks
 
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
 
An Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data FragmentsAn Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data Fragments
 
Optimum Reliability Analysis of Mobile Adhoc Networks using Universal Generat...
Optimum Reliability Analysis of Mobile Adhoc Networks using Universal Generat...Optimum Reliability Analysis of Mobile Adhoc Networks using Universal Generat...
Optimum Reliability Analysis of Mobile Adhoc Networks using Universal Generat...
 
Cray HPC + D + A = HPDA
Cray HPC + D + A = HPDACray HPC + D + A = HPDA
Cray HPC + D + A = HPDA
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONSYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNet
 
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
 
Comparative analysis of congestion
Comparative analysis of congestionComparative analysis of congestion
Comparative analysis of congestion
 
An Efficient Frame Embedding Using Haar Wavelet Coefficients And Orthogonal C...
An Efficient Frame Embedding Using Haar Wavelet Coefficients And Orthogonal C...An Efficient Frame Embedding Using Haar Wavelet Coefficients And Orthogonal C...
An Efficient Frame Embedding Using Haar Wavelet Coefficients And Orthogonal C...
 

Similar to Density-based Projected Clustering over High Dimensional Data Streams

Adaptive blind multiuser detection under impulsive noise using principal comp...
Adaptive blind multiuser detection under impulsive noise using principal comp...Adaptive blind multiuser detection under impulsive noise using principal comp...
Adaptive blind multiuser detection under impulsive noise using principal comp...
csandit
 
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
csandit
 
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
cscpconf
 
slides-defense-jie
slides-defense-jieslides-defense-jie
slides-defense-jie
jie ren
 
"An adaptive modular approach to the mining of sensor network ...
"An adaptive modular approach to the mining of sensor network ..."An adaptive modular approach to the mining of sensor network ...
"An adaptive modular approach to the mining of sensor network ...
butest
 
Jörg Stelzer
Jörg StelzerJörg Stelzer
Jörg Stelzer
butest
 
System Monitoring
System MonitoringSystem Monitoring
System Monitoring
butest
 
A density based micro aggregation technique for privacy-preserving data mining
A density based micro aggregation technique for privacy-preserving data miningA density based micro aggregation technique for privacy-preserving data mining
A density based micro aggregation technique for privacy-preserving data mining
eSAT Publishing House
 

Similar to Density-based Projected Clustering over High Dimensional Data Streams (20)

Adaptive blind multiuser detection under impulsive noise using principal comp...
Adaptive blind multiuser detection under impulsive noise using principal comp...Adaptive blind multiuser detection under impulsive noise using principal comp...
Adaptive blind multiuser detection under impulsive noise using principal comp...
 
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
 
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
ADAPTIVE BLIND MULTIUSER DETECTION UNDER IMPULSIVE NOISE USING PRINCIPAL COMP...
 
slides-defense-jie
slides-defense-jieslides-defense-jie
slides-defense-jie
 
"An adaptive modular approach to the mining of sensor network ...
"An adaptive modular approach to the mining of sensor network ..."An adaptive modular approach to the mining of sensor network ...
"An adaptive modular approach to the mining of sensor network ...
 
Expert estimation from Multimodal Features
Expert estimation from Multimodal FeaturesExpert estimation from Multimodal Features
Expert estimation from Multimodal Features
 
Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)
 
Jörg Stelzer
Jörg StelzerJörg Stelzer
Jörg Stelzer
 
011_20160321_Topological_data_analysis_of_contagion_map
011_20160321_Topological_data_analysis_of_contagion_map011_20160321_Topological_data_analysis_of_contagion_map
011_20160321_Topological_data_analysis_of_contagion_map
 
Clustering
ClusteringClustering
Clustering
 
Dsss final
Dsss finalDsss final
Dsss final
 
Digital communication unit II
Digital communication unit IIDigital communication unit II
Digital communication unit II
 
Tensor-based recommendation system
Tensor-based recommendation systemTensor-based recommendation system
Tensor-based recommendation system
 
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
 
Tdm fdm
Tdm fdmTdm fdm
Tdm fdm
 
Lecture 3 image sampling and quantization
Lecture 3 image sampling and quantizationLecture 3 image sampling and quantization
Lecture 3 image sampling and quantization
 
System Monitoring
System MonitoringSystem Monitoring
System Monitoring
 
How to Accelerate Molecular Simulations with Data? by Žofia Trsťanová, Machin...
How to Accelerate Molecular Simulations with Data? by Žofia Trsťanová, Machin...How to Accelerate Molecular Simulations with Data? by Žofia Trsťanová, Machin...
How to Accelerate Molecular Simulations with Data? by Žofia Trsťanová, Machin...
 
A density based micro aggregation technique for privacy-preserving data mining
A density based micro aggregation technique for privacy-preserving data miningA density based micro aggregation technique for privacy-preserving data mining
A density based micro aggregation technique for privacy-preserving data mining
 
A density based micro aggregation technique for privacy-preserving data mining
A density based micro aggregation technique for privacy-preserving data miningA density based micro aggregation technique for privacy-preserving data mining
A density based micro aggregation technique for privacy-preserving data mining
 

More from Eirini Ntoutsi

More from Eirini Ntoutsi (12)

AAISI AI Colloquium 30/3/2021: Bias in AI systems
AAISI AI Colloquium 30/3/2021: Bias in AI systemsAAISI AI Colloquium 30/3/2021: Bias in AI systems
AAISI AI Colloquium 30/3/2021: Bias in AI systems
 
Fairness-aware learning: From single models to sequential ensemble learning a...
Fairness-aware learning: From single models to sequential ensemble learning a...Fairness-aware learning: From single models to sequential ensemble learning a...
Fairness-aware learning: From single models to sequential ensemble learning a...
 
Bias in AI-systems: A multi-step approach
Bias in AI-systems: A multi-step approachBias in AI-systems: A multi-step approach
Bias in AI-systems: A multi-step approach
 
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
 
A Machine Learning Primer,
A Machine Learning Primer,A Machine Learning Primer,
A Machine Learning Primer,
 
(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...
(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...
(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...
 
Redundancies in Data and their E ect on the Evaluation of Recommendation Syst...
Redundancies in Data and their Eect on the Evaluation of Recommendation Syst...Redundancies in Data and their Eect on the Evaluation of Recommendation Syst...
Redundancies in Data and their E ect on the Evaluation of Recommendation Syst...
 
Density Based Subspace Clustering Over Dynamic Data
Density Based Subspace Clustering Over Dynamic DataDensity Based Subspace Clustering Over Dynamic Data
Density Based Subspace Clustering Over Dynamic Data
 
Summarizing Cluster Evolution in Dynamic Environments
Summarizing Cluster Evolution in Dynamic EnvironmentsSummarizing Cluster Evolution in Dynamic Environments
Summarizing Cluster Evolution in Dynamic Environments
 
Cluster formation over huge volatile robotic data
Cluster formation over huge volatile robotic data Cluster formation over huge volatile robotic data
Cluster formation over huge volatile robotic data
 
Discovering and Monitoring Product Features and the Opinions on them with OPI...
Discovering and Monitoring Product Features and the Opinions on them with OPI...Discovering and Monitoring Product Features and the Opinions on them with OPI...
Discovering and Monitoring Product Features and the Opinions on them with OPI...
 
NeeMo@OpenCoffeeXX
NeeMo@OpenCoffeeXXNeeMo@OpenCoffeeXX
NeeMo@OpenCoffeeXX
 

Recently uploaded

1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
MateoGardella
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
Chris Hunter
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
heathfieldcps1
 

Recently uploaded (20)

Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 

Density-based Projected Clustering over High Dimensional Data Streams

  • 1. Density-based Projected Clustering over High Dimensional Data Streams Information Engineering and Computer Science Department University of Trento, Italy Institute for Informatics, Ludwig-Maximilians-Universität, München, Germany EIRINI NTOUTSI1, ARTHUR ZIMEK1, THEMIS PALPANAS2, PEER KRÖGER1, HANS-PETER KRIEGEL1 1Institute for Informatics, Ludwig-Maximilians-Universität (LMU) München, Germany 2Information Engineering and Computer Science Department (DISI), University of Trento, Italy HIGH DIMENSIONAL & DYNAMIC DATA: • High dimensionality - overlapping / irrelevant/ locally relevant attributes • Dynamic/ Stream data - only 1 look at the data (upon their arrival) / volatile data ! Clustering upon such kind of data is very challenging!! - both members and dimensions might evolve over time  Most existing approaches deal with 1 aspect of the problem  HPStream assumes a constant #clusters over time OUR SOLUTION (HDDSTREAM): 1st density-based projected clustering algorithm for clustering high dimensional data streams: • High dimensionality  projected clustering • Stream data  online summarize– offline cluster • Density based clustering  no assumption on #clusters/ invariant to outliers/ arbitrary shapes  Quality improvement  Bounded memory  # summaries adapts to the underlying population PROJECTED MICROCLUSTERS: For points C={p1, …, pn} arriving at t1, …, tn, the summary models both content and dimension preferences of C at t. mc ≡mc(C,t) = <CF1(t), CF2(t), W(t)> - CF1(t),/ CF2(t): temporal weighted linear/square sum of the points - W(t): sum of points weights Φ(mc) = <φ1, φ2, .., φd> - j is a preferred dimension if: VARj(mc) ≤ δ CLUSTERS & OUTLIERS • Core projected microclusters (CORE-PMC): (1) radiusΦ(mc) ≤ ε (radius criterion) (2) W(t) ≥ μ (density criterion) (3) PDIM(mc) ≤ π (dimensionality criterion) The role of clusters and outliers in a stream often exchange: • Potential core PMC (PCORE-PMC): (1), (2) relax the density criterion W(t) ≥ β*μ, (3) • Outlier MC (O-MC) : (1), (2) relax density criterionW(t) ≥ β*μ, (3) relax dim. criterion  QUALITY IMPROVEMENT  CHANGE DETECTION & MONITORING NetworkIntrusiondataset ForestsCoverTypesdataset Content summary Dimension preference vector
  • 2. Density-based Projected Clustering over High Dimensional Data Streams Information Engineering and Computer Science Department University of Trento, Italy Institute for Informatics, Ludwig-Maximilians-Universität, München, Germany EIRINI NTOUTSI1, ARTHUR ZIMEK1, THEMIS PALPANAS2, PEER KRÖGER1, HANS-PETER KRIEGEL1 1Institute for Informatics, Ludwig-Maximilians-Universität (LMU) München, Germany 2Information Engineering and Computer Science Department (DISI), University of Trento, Italy HIGH DIMENSIONAL & DYNAMIC DATA: • High dimensionality - overlapping / irrelevant/ locally relevant attributes • Dynamic/ Stream data - only 1 look at the data (upon their arrival) / volatile data ! Clustering upon such kind of data is very challenging!! - both members and dimensions might evolve over time  Most existing approaches deal with 1 aspect of the problem  HPStream assumes a constant #clusters over time OUR SOLUTION (HDDSTREAM): 1st density-based projected clustering algorithm for clustering high dimensional data streams: • High dimensionality  projected clustering • Stream data  online summarize– offline cluster • Density based clustering  no assumption on #clusters/ invariant to outliers/ arbitrary shapes  Quality improvement  Bounded memory  # summaries adapts to the underlying population PROJECTED MICROCLUSTERS: For points C={p1, …, pn} arriving at t1, …, tn, the summary models both content and dimension preferences of C at t. mc ≡mc(C,t) = <CF1(t), CF2(t), W(t)> - CF1(t),/ CF2(t): temporal weighted linear/square sum of the points - W(t): sum of points weights Φ(mc) = <φ1, φ2, .., φd> - j is a preferred dimension if: VARj(mc) ≤ δ CLUSTERS & OUTLIERS • Core projected microclusters (CORE-PMC): (1) radiusΦ(mc) ≤ ε (radius criterion) (2) W(t) ≥ μ (density criterion) (3) PDIM(mc) ≤ π (dimensionality criterion) The role of clusters and outliers in a stream often exchange: • Potential core PMC (PCORE-PMC): (1), (2) relax the density criterion W(t) ≥ β*μ, (3) • Outlier MC (O-MC) : (1), (2) relax density criterionW(t) ≥ β*μ, (3) relax dim. criterion 0 10 20 30 40 50 60 70 80 90 100 200 500 1000 1500 2000 3000 a v g p u rity (% ) window size HPStream HDDStream  QUALITY IMPROVEMENT  CHANGE DETECTION & MONITORING NetworkIntrusiondataset Forests Cover Types dataset Content summary Dimension preference vector