Hierarchical Clustering Techniques
CS306 Presentation
Presented By:
Md Syed Ahamad
Yanshul Sharma
Outline and Reference
▪ Outline
– Introduction
– Types and example
– Selected research papers
– Experiments on some datasets
▪ Reference
– Introduction to the Hierarchical Clustering, Online Edition ©2009 Cambridge UP.
– Elio Masciari, Giuseppe Mazzeo and Carlo Zaniolo: A New, Fast and Accurate Algorithm for Hierarchical Clustering on Euclidean Distances. PAKDD (2) 2013: 111-122.
– Steinbach, M., Karypis, G., Kumar, V., “A Comparison of Document Clustering Techniques,” University of Minnesota.
Introduction
▪ Hierarchical clustering – organizes the given data into a hierarchical structure.
– It is structured and more informative than flat clustering.
– Deterministic, but less efficient than flat clustering.
– Important when one of the potential problems of flat clustering is a concern.
▪ Most flat clustering techniques are concerned with efficiency.
▪ Types
– Agglomerative clustering – bottom up
– Divisive clustering – top down
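The bottom-up (agglomerative) idea can be illustrated with a minimal sketch. It assumes 1-D points and single-linkage distance, neither of which comes from the slides; the function name `agglomerative` and its parameters are hypothetical.

```python
def agglomerative(points, target_clusters):
    """Bottom-up clustering of 1-D points (single linkage is an assumed choice)."""
    # Every point starts as its own cluster.
    clusters = [[p] for p in sorted(points)]
    merges = []  # record of merged pairs, i.e. the hierarchy from the bottom up
    while len(clusters) > target_clusters:
        best = None
        # Find the pair of clusters with the smallest single-linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


clusters, merges = agglomerative([1, 2, 9, 10, 25], 2)
print(clusters)
```

A divisive (top-down) method runs in the opposite direction: it starts from one cluster holding all points and splits it recursively.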
Hierarchical clustering types
[ Src: http://www.saedsayad.com/images/Clustering_h1.png ]
Example
[ Src: http://tangibleauditoryinterfaces.de/wp-content/uploads/2010/04/durcheinander-cluster-chart.png ]
Selected papers
▪ The selected paper proposes a new algorithm called CLUBS.
▪ CLUBS – Clustering Using Binary Splitting.
– Faster than existing algorithms.
– More accurate, robust and impervious to noise.
– Works in a completely unsupervised fashion.
– Also works for density-based clustering.
– It can be used to refine the results of other algorithms.
▪ The popular k-means algorithm has problems with the repeatability of its results.
– CLUBS overcomes this problem.
Elio Masciari, Giuseppe Mazzeo and Carlo Zaniolo: A New, Fast and Accurate Algorithm for Hierarchical Clustering on Euclidean Distances. PAKDD (2) 2013: 111-122.
Approach
▪ CLUBS has two phases
– Divisive – the original data set is split recursively into mini-clusters through binary splitting.
▪ The resulting splits may be non-optimal.
– Agglomerative – the final mini-clusters are recursively combined into the final result.
▪ It corrects (backtracks) earlier non-optimal splits.
▪ The algorithm exploits SSQ (sum of squares) to minimize the cost of the split operation.
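How SSQ can drive a binary split is sketched below for 1-D data. This is an illustration under assumptions, not the paper's implementation: SSQ is taken as the sum of squared distances from the mean, every cut position is tried exhaustively, and the helper names `ssq` and `best_binary_split` are hypothetical.

```python
def ssq(values):
    """Sum of squared distances of 1-D values from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_binary_split(values):
    """Return (position, reduction in SSQ) for the cut that reduces SSQ the most.

    Values <= position go to the low part, the rest to the high part.
    Assumes distinct values; duplicates would need extra care at the cut.
    """
    ordered = sorted(values)
    total = ssq(ordered)
    best_pos, best_gain = None, 0.0
    for cut in range(1, len(ordered)):
        low, high = ordered[:cut], ordered[cut:]
        gain = total - (ssq(low) + ssq(high))
        if gain > best_gain:
            best_pos, best_gain = ordered[cut - 1], gain
    return best_pos, best_gain


position, gain = best_binary_split([1, 2, 3, 11, 12, 13])
print(position, gain)
```

A large reduction in SSQ means that the two halves are each much tighter than the original cluster, which is why the reduction is a natural criterion for accepting or rejecting a split.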
Algorithm
▪ Phase 1:
▪ Definition 1 – binary partition BP.
– $D$ – a $d$-dimensional data distribution (a multi-dimensional array of integers).
– $N$ – the number of non-zero entries of $D$.
– $\rho_i$ – a range $[l..u]$ on the $i$-th dimension of $D$, with $1 \le l \le u \le n$ and $1 \le i \le d$; $size(\rho_i) = ub(\rho_i) - lb(\rho_i) + 1 = u - l + 1$.
– A block $b$ (of $D$) is a $d$-tuple $\{\rho_1, \dots, \rho_d\}$, with $vol(b) = size(\rho_1) \times \dots \times size(\rho_d)$.
– A point $x = (x_1, \dots, x_d)$ is chosen with $lb(\rho_i) \le x_i \le ub(\rho_i)$.
– $x$ divides the range $\rho_i$ of $b$ into $\rho_i^{low} = [lb(\rho_i)..x]$ and $\rho_i^{high} = [(x+1)..ub(\rho_i)]$, thus partitioning $b$ into $b^{low} = \{\rho_1, \dots, \rho_i^{low}, \dots, \rho_d\}$ and $b^{high} = \{\rho_1, \dots, \rho_i^{high}, \dots, \rho_d\}$.
– $(b^{low}, b^{high})$ – the binary split; $i$ – the splitting dimension; $x$ – the splitting position.
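Definition 1 can be made concrete with a small sketch. It represents a block as a list of inclusive integer ranges `(lb, ub)`, one per dimension. The names `split_block` and `volume` are hypothetical, and the check `x < ub` is an added assumption so that the high part is never empty.

```python
def volume(block):
    """vol(b): product of the sizes (ub - lb + 1) of all ranges of the block."""
    vol = 1
    for lb, ub in block:
        vol *= ub - lb + 1
    return vol


def split_block(block, dim, x):
    """Binary split of `block` on dimension `dim` at position `x`.

    The low part keeps [lb..x] and the high part keeps [(x+1)..ub] on that
    dimension; all other ranges are copied unchanged.
    """
    lb, ub = block[dim]
    if not lb <= x < ub:
        raise ValueError("x must satisfy lb <= x < ub so that both parts are non-empty")
    low, high = list(block), list(block)
    low[dim] = (lb, x)
    high[dim] = (x + 1, ub)
    return low, high


block = [(1, 10), (1, 4)]            # a 2-dimensional block of volume 40
low, high = split_block(block, 0, 3)
print(low, high, volume(low) + volume(high))
```

The two volumes always add up to the volume of the original block, because the split partitions one range and leaves the others untouched.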
Algorithm
▪ Definition 2 – stopping condition of BP
– $C_s$ – a cluster; $S = (S_1, \dots, S_d) = \sum_{p \in C_s} p$ is a vector, where $p$ is a point. The centre of $C_s$ is $C_{s0} = S/N$, and $Q_i = \sum_{p \in C_s} p_i \times p_i$.
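These sums are useful because they are enough to compute SSQ without revisiting the points. The derivation below is a sketch under two assumptions that the slide does not state explicitly: SSQ is the sum of squared distances of the points from the centre, and $N$ here is the number of points in $C_s$.

```latex
\begin{aligned}
SSQ(C_s) &= \sum_{p \in C_s} \sum_{i=1}^{d} \left( p_i - \frac{S_i}{N} \right)^2 \\
         &= \sum_{i=1}^{d} \left( \sum_{p \in C_s} p_i^2
            - 2\,\frac{S_i}{N} \sum_{p \in C_s} p_i
            + N\,\frac{S_i^2}{N^2} \right) \\
         &= \sum_{i=1}^{d} \left( Q_i - \frac{S_i^2}{N} \right)
\end{aligned}
```

Since $S$, $Q$ and $N$ of a merged or split cluster follow from those of its parts by addition or subtraction, the change in SSQ for a candidate split or merge can be evaluated cheaply.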
Algorithm
– Binary splitting stops when avgSSQ > deltSSQ, which yields n’ mini-clusters, where avgSSQ = SSQ0/n and deltSSQ is the overall reduction of SSQ.
▪ Phase 2:
– The n’ mini-clusters are merged by repeatedly choosing the best pair (greedy approach).
– Merging continues until the increase in SSQ is greater than avgdeltSSQ.
– This gives the final result.
▪ Complexity – O(n · d · l · s)
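The greedy merging of Phase 2 can be sketched as follows. This is a simplified illustration, not the paper's implementation: the best pair is found by brute force, SSQ is recomputed from the points each time, and the parameter `threshold` stands in for avgdeltSSQ. The names `ssq` and `greedy_merge` are hypothetical.

```python
def ssq(cluster):
    """Sum of squared distances of d-dimensional points from their centroid."""
    n, d = len(cluster), len(cluster[0])
    centre = [sum(p[i] for p in cluster) / n for i in range(d)]
    return sum((p[i] - centre[i]) ** 2 for p in cluster for i in range(d))


def greedy_merge(clusters, threshold):
    """Merge the pair with the smallest SSQ increase until it exceeds `threshold`."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                increase = (ssq(clusters[i] + clusters[j])
                            - ssq(clusters[i]) - ssq(clusters[j]))
                if best is None or increase < best[0]:
                    best = (increase, i, j)
        increase, i, j = best
        if increase > threshold:
            break  # even the cheapest merge costs too much, so stop
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


mini_clusters = [[(0, 0), (0, 1)], [(1, 0), (1, 1)], [(10, 10), (10, 11)]]
print(greedy_merge(mini_clusters, threshold=5.0))
```

In this toy input the two nearby mini-clusters are merged (SSQ increase of 1.0), while the distant one stays separate because merging it would raise SSQ far above the threshold.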
Example
Algorithm
Experiment
– Dataset 1 – 42 patients divided into 3 groups (RM, HN, PM); 98 differentially expressed genes were selected and analysed.
– Dataset 2 – samples extracted from human breast cancer cells, consisting of four cell groups, were analysed.
Ek = error calculated at 10 clusters.
ε = probability that two similar data points belong to the same cluster.
Qk = average percentage of points in the k-neighborhood of a generic point that belong to the same class as that point.
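One possible reading of the Qk measure is sketched below. It assumes that the k-neighborhood means the k nearest neighbours under Euclidean distance and that k is smaller than the number of points; the function name `q_k` is hypothetical, and the data is a toy example, not one of the datasets above.

```python
def q_k(points, labels, k):
    """Average percentage of each point's k nearest neighbours sharing its label."""
    total = 0.0
    for idx, p in enumerate(points):
        others = [j for j in range(len(points)) if j != idx]
        # Squared Euclidean distance is enough for ranking the neighbours.
        others.sort(key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])))
        same = sum(1 for j in others[:k] if labels[j] == labels[idx])
        total += same / k
    return 100.0 * total / len(points)


points = [(0,), (1,), (2,), (10,), (11,), (12,)]
labels = ["a", "a", "a", "b", "b", "b"]
print(q_k(points, labels, k=2))
```

A value close to 100 means that points of the same class lie close together, which is the situation in which a clustering algorithm has a fair chance of recovering the classes.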
Thank You
