SlideShare a Scribd company logo
1 of 16
Assignment 4: Market Segmentation
Kompella Kaushik
Adusumelli Jaideep
Sushaanth Srirangapathi
Preprocessing of data:
Attribute transformation:
Attribute Transformation
EDU_1 If(EDU==1,1,0)
EDU_2 If(EDU==2,1,0)
EDU_3 If(EDU==3,1,0)
EDU_4 If(EDU==4,1,0)
EDU_5 If(EDU==5,1,0)
EDU_6 If(EDU==6,1,0)
EDU_7 If(EDU==7,1,0)
EDU_8 If(EDU==8,1,0)
EDU_9 If(EDU==9,1,0)
MT_English If(MT==3,1,0)
MY_Gujarati If(MT==4,1,0)
MT_Hindi If(MT==5,1,0)
MT_Kannada If(MT==6,1,0)
MT_Konkani If(MT==8,1,0)
MT_Malyalam If(MT==9,1,0)
MT_Marathi If(MT==10,1,0)
MT_Punjabi If(MT==12,1,0)
MT_Rajasthani If(MT==13,1,0)
MT_Sindhi If(MT==14,1,0)
MT_Tamil If(MT==15,1,0)
MT_Telugu If(MT==16,1,0)
MT_Other If(MT==19,1,0)
MT_Urdu If(MT==17,1,0)
MT_Not_Specifie
d
If(MT==0,1,0)
Maxpurchase max(BrCd24,BrCd272,BrCd286,BrCd352,BrCd481,BrCd5,BrCd55,BrCd55And144)
CS_New if(CS==0||CS==2,0,1)
Maxprop
max([PropCat 10],[PropCat 11],[PropCat 13],[PropCat 12],[PropCat 14],[PropCat 15],[PropCat
5],[PropCat 6],[PropCat 7],[PropCat 8],[PropCat 9])
Maxpropnum
if(MaxProp==[PropCat],5,if(MaxProp==[PropCat],6,if(MaxProp==[PropCat],7,if(MaxProp==[PropC
at 8],8,if(MaxProp==[PropCat],9,if(MaxProp==[PropCat10],10,if(MaxProp==[PropCat
11],11,if(MaxProp==[PropCat 12],12,if(MaxProp==[PropCat 13],13,if(MaxProp==[PropCat
14],14,if(MaxProp==[PropCat 15],15,0)))))))))))
“Maxpurchase” attribute was created to explain the Brand Loyalty.
“EDU_X”, and “MT_XXXXX” attribute were created to transform the attributes “mother Tongue” and
“Education” to binary.
1. Use k-means clustering to identify clusters of households based on
a. The variables that describe purchase behavior (including brand loyalty). How do you evaluate brand
loyalty?
[Variables: #brands, brand runs, total volume, #transactions, value, avg. price, share to other brands, (brand
loyalty)].
The variables used by us are:
Number of Brands, Brand Runs, Total Volume, Number of Transactions, Value, Average Price, Maximum
Purchase, Others999.
The attribute “Others999” was used to depict the share to other Brands.
Brand Loyalty:
The attribute “Maxpurchase” was created to depict brand Loyalty. “Maxpurchase” is the maximum value of
the Brand columns (not including Others999) for a given row and is an indicator of predilection of a household
to a given brand. Here, we have not included the individual brand purchase percentages to cover for the fact
that we are considering brand loyalty and not brand loyalty to a given brand. Among the above attributes
which were used in this process – “Others999” depicts the lack of loyalty, hence this also should be used for
calculating Brand Loyalty. Number of Brands in the purchase indicates to some extent the lack of brand loyalty
and we consider the temporal nature of brand loyalty by including Brand Runs, the consecutive buying runs
for a given set of brands.
Given below are the performances for different clusters with k values ranging from 2 to 5, generated using K-
means Clustering technique.
For k=2:
Performance Matrix:
Avg. within centroid distance:-0.180
Avg. within centroid distance_cluster_0: -0.187
Avg. within centroid distance_cluster_1: -0.169
Davies Bouldin:-1.062
Size of Clusters:(373,227)
Inter Cluster Distance
First Second Inter Cluster Distance
1.0 2.0 0.731
The below graph shows the dependence of the attributes on the differences between the clusters. From the
graph, it is clear that most of the difference is contributed by others999 and maxpurchase attributes.
For k=3:
Performance Matrix:
Avg. within centroid distance:-0.140
Avg. within centroid distance_cluster_0: -0.142
Avg. within centroid distance_cluster_1: -0.160
Avg. within centroid distance_cluster_2: -0.119
Davies Bouldin:-1.255
Size of Clusters:(197, 204,199)
Inter Cluster Distance
First Second Inter Cluster Distance
2.0 3.0 0.500
1.0 2.0 0.701
1.0 3.0 0.868
The below graph shows the dependence of the attributes on the differences between the clusters. From the
graph, it is clear that most of the difference is contributed by others999 and maxpurchase attributes.
For k=4:
Performance Matrix:
Avg. within centroid distance:-0.127
Avg. within centroid distance_cluster_0: -0.200
Avg. within centroid distance_cluster_1: -0.116
Avg. within centroid distance_cluster_2: -0.125
Avg. within centroid distance_cluster_3: -0.121
Davies Bouldin:-1.345
Size of Clusters:(51, 191, 181,177)
Inter Cluster Distance
First Second Inter Cluster Distance
1.0 2.0 0.794
1.0 3.0 0.539
1.0 4.0 0.583
2.0 3.0 0.478
2.0 4.0 0.880
3.0 4.0 0.753
The below graph shows the dependence of the attributes on the differences between the clusters. From the
graph, it is clear that most of the difference is contributed by others999 and maxpurchase attributes.
For k=5:
Performance Matrix:
Avg. within centroid distance:-0.110
Avg. within centroid distance_cluster_0: -0.080
Avg. within centroid distance_cluster_1: -0.116
Avg. within centroid distance_cluster_2: -0.093
Avg. within centroid distance_cluster_3: -0.113
Avg. within centroid distance_cluster_4:-0.217
Davies Bouldin:-1.251
Size of Clusters:(135, 145,100, 180, 40)
Inter Cluster Distance
First Second Inter Cluster Distance
1.0 2.0 0.515
1.0 3.0 0.499
1.0 4.0 0.619
1.0 5.0 0.539
2.0 3.0 0.996
2.0 4.0 0.502
2.0 5.0 0.615
3.0 4.0 1.061
3.0 5.0 0.757
4.0 5.0 0.849
The below graph shows the dependence of the attributes on the differences between the clusters. From the
graph, it is clear that most of the difference is contributed by Value, Total Volume, others999 and maxpurchase
attributes.
Choosingthe bestvalue ofK:
Choosing the best value of K for the given purpose has both objective and subjective criteria. Objective criteria
include consistent group sizes, large inter cluster distance and small values of intra cluster distance. Davies
Bouldin index is another criterion used to measure the overall clustering quality. Table above shows these
metrics for K=2, 3, 4 and 5. Subjective criterion includes the clusters that lend themselves to good explanation.
Based on good group sizes and decent values of intra-cluster distance and davies bouldin index, we select K=3
as our best model.
b. The variables that describe basis-for-purchase.
[Variables: pur_vol_no_promo, pur_vol_promo_6, pur_vol_other, all price categories, selling propositions]
[Note – would you use all selling_propositions? Explore the data.]
The variables used by us are:
Maxprop, Maxpropno, Pr Cat_1, Pr Cat_2, Pr Cat_3, Pr Cat_4, Purvolnopromo%, purvolotherpromo%,
purvolpromo6%.
All Proposition Categories (1-15) compiled as maxProp which has all maximum proposition value and
maxPropNo which has the corresponding Proposition number.
(Proposition categories 10-13 have distributions tending to zeros, but we kept them as there are at least
some points having valid data and the total data points are only 600)
For k=2:
We can notice from the graph below that most of the differences between these two clusters lie with Pr Cat 2,
Pr Cat3 and maxPropNo.
For k=3:
We can notice from the graph below that most of the differences between these three clusters lie with Pr Cat
1, Pr Cat 2, Pr Cat3, maxProp and maxPropNo.
For k=4:
We can notice from the graph below that most of the differences between these four clusters lie with Pr Cat 1,
Pr Cat 2, Pr Cat3 and maxProp.
For k=5:
We can notice from the graph below that most of the differences between these five clusters lie with Pr Cat 1
and maxProp.
As explained in question 1 (a), we chose the best model with good group sizes and decent metrics. Along with
good explainability K=5 turns out to be a good model for the given attributes, with –
Davies Bouldin Index: -1.08
Avg within Centroid distance: -0.168
Avg Inter cluster distance: 0.9959
Size of clusters: (80, 191, 55, 101, 173)
c. The variables that describe both purchase behavior and basis of purchase.
Note: How should k be chosen? Think about how the clusters would be used. It is likely that the marketing
efforts would support 2-5 different promotional approaches.
Note: How should the percentages of total purchases comprised by various brands be treated? Isn’t a
customer who buys all brand A just as loyal as a customer who buys all brand B? What will be the effect on
any distance measure of using the brand share variables as is?
The variables used by us are:
No.of.Trans, Value, BrandRuns, No.of.Brands, Max Brand Purchase, Others999, All Price Categories (1-4),
Pur_vol_no_promo, Pur_vol_promo_6, Pur_vol_other, All Proposition Categories (1-15) [in the form of
maxProp and maxPropNo]
For k=2:
We can notice from the graph below that most of the differences between these two clusters lie with No. of
Brands, Others999, PrCat1,Pr Cat2, Pr Cat3 and maxProp.
For k=3:
We can notice from the graph below that most of the differences between these three clusters lie with No. of
Brands, Others999, PrCat1,Pr Cat2, maxBrandPurchase and maxProp.
For k=4:
We can notice from the graph below that most of the differences between these four clusters lie with Brand
Runs, Others999, PrCat1, maxBrandPurchase and maxProp.
For k=5:
We can notice from the graph below that most of the differences between these five clusters lie with Avg.
Price, Others999, PrCat1, maxBrandPurchase.
Similar to question 1(a), we have selected k=4 as our best model with –
Davies Bouldin index: -1.526
Avg within Centroid Distance: -0.415
Avg Inter-Cluster Distance: 1.128
Size of clusters: (241, 139, 139, 81)
d. Try k-medoids, kernel k-means, agglomerative clustering, and DBSCAN clustering. You do not need to try
all techniques on all combinations in (a)-(c) above; you may pick one set of variables in (a) thru (c) that you
feel may be best suited for addressing the segmentation need. How do different parameter values for the
different techniques affect the clusters obtained? Are the clusters obtained from the different procedures
similar? What might be some reasons for differences in clusters obtained using different procedures? Which
would you pick as your 'best' and why?
For K-Medeoids:
We have tried the various methods as mentioned in the question, the first being k-mediods. We have taken
purchase behavior and basis of purchase variables for all these methods.
For k=2:
We can notice from the graph below that most of the differences between these two clusters lie with Total
Volume, Value, PurchaseVolNoPromo %, Others999, PrCat2 and maxProp.
For k=3:
We can notice from the graph below that most of the differences between these three clusters lie with Avg.
Price, Others999, PrCat2 and maxProp.
For k=4:
We can notice from the graph below that most of the differences between these four clusters lie with Avg. Price,
Others999 and maxProp.
For k=5:
We can notice from the graph below that most of the differences between these five clusters lie with Brand
Runs, Value, Others999 and maxProp.
Again, from the explanation in Question 1(a), the best K turns out to be K=5 for which we get good cluster
sizes, decent cluster metrics and for the clusters.
For K-Means Kernel:
For Kernel k-Means, we tried for different values of K from 2 to 5 and obtained their corresponding clusters.
Kernel K-means just like K-means and K-mediods clustering algorithms are driven by the value of “k”. A
change in the “k” indicates a change in the number of clusters. So the cluster density varies by the K values.
But the clusters we obtained were overlapping onto each other. And we couldn’t find completely distinct
clusters.
For Agglomerative Clustering:
For the Agglomerative Clustering, we used the flattening operator along with the operator to extract the clusters.
Following summarizes the parameters used –
 Mode – Complete Linkage
 Measure Types – Numerical Measures
 Numerical Measure – Euclidean Distance
 Number of clusters (in the flattening cluster operator) – 2 to 5
We did not get good cluster models using this method. This suggests that agglomerative clustering is not a
suitable clustering for this data.
For DBSCAN:
We ran several iterations of DBSCAN, with different combinations of values for parameters. But it seemed to
have no impact on the output.
We have tried with different values of epsilon, min. points and the distance measures, but we see that most of
the data points are going into one cluster and the other clusters have only around 1 or at the maximum 10 data
points.
The reason for poor performance of DBSCAN could be because of the high dimensionality of the data. The
DBSCAN clustering is not suitable for clustering on this data.
Hence, we could not extract any useful information regarding the model performance from Agglomerative and
DBSCAN and k-mean kernel. Between k-medoids and kernel k- means, we would choose k-mediods since it
has better formed clusters.
So, comparing all the best models in each case in question 1 with k-medoids clustering, we chose K-means,
since it gave a better clustering output when compared to that of the k-medoids.
2. (a) Selectwhat you think isthe 'best' segmentation - explainwhyyou think this is the ‘best’.You can also decide on
multiple segmentations,basedondifferent criteria-- for example,basedon purchase behavior,or basisfor
purchase,....( think about how differentclustersmay be useful0.
Based on our analysis using 6 clustering models and multiple iterations including optimizing the parameters in
rapid miner, our best model is a k-means clustering model with k = 4. For this model, the intra cluster distance
is small meaning that the data points in the cluster are similar and the inter cluster distance between the clusters
is good and the size of the clusters is also good. Hence we concluded that this was our best model.
Households in cluster 0 buy the highest number of brands, have the highest brand runs for a great amount of
volume. Most of the purchases for these households seemto be purchases not on promotion. These
households have a low brand loyalty towards our selected brands but have a relatively high brand loyalty for
the brands in others999.Their major purchases comprise of the popular soaps.
Households in cluster 1 buy the least number of brands, have a moderate brand runs for the least amount of
volume. Most of the purchases for these households seemto be purchases not on promotion. These
households have the least brand loyalty for our selected brands and have the highest brand loyalty towards
the brands in others999. Their major purchases include premium soaps.
Households in cluster 2 buy a decent number of brands, have a low number of brand runs but for a great
amount of volume. Most of the purchases for these households seem to be purchases not on promotion.
These households have a high brand loyalty towards our selected brands and a low brand loyalty for all the
other brands. They mostly buy popular soaps.
Households in cluster 3 buy moderately low number of brands, have the least number of brand runs for
almost the same amount of volume in comparison to other clusters. Most of the purchases for these
households seem to be purchases not on promotion. These households seemto have the highest brand
loyalty for the selected brands and are least loyal towards the others and their maximum purchases for
economic/carbolic soaps.
(b) Comment on the characteristics (demographic, brand loyalty and basis-for-purchase) of these clusters.
(This information would be used to guide the development of advertising and promotional campaigns.)
We “joined” the demographics data to the already existing cluster data to see how the demographics changes in
each of the cluster. Since each of the demographic attribute cannot be grouped in the similar manner, we used
sum, average, median and mode functions to get values for these attributes.
The following is the table showing the different clusters and their corresponding demographic data –
Cluster 0 –
 Households are more likely to be aged above 45 years,
 Have a median affluence index of 18,
 With a median average price of 0.227,
 With children below the age of 14,
 With television or broadcast TV cable,
 Speaking Marati, Gujarati and Assameese,
 Females belonging to a Socio Economic class ‘C’,
 With low level of education (10-12 years),
 And with a house size of around 4 people.
These households have a low brand loyalty towards our selected brands but have a relatively high brand
loyalty for the brands in others999.
Most of the purchases for these households seem to be purchases not on promotion with their least
purchases on promo code other than 6. Their major purchases comprise of the popular soaps.
Cluster 1 –
 Households are more likely to be aged above 45 years,
 Have a median affluence index of 15,
 With a median average price of 0.273,
 With most of the children below the age of 14 and a few house with none or not specified,
 With television or broadcast TV cable,
 Speaking Marati majorly,
 Females belonging to a Socio Economic class ‘A’,
 With low level of education (10-12 years),
 And with a house size of around 3 people.
These households have the least brand loyalty for our selected brands and have the highest brand loyalty
towards the brands in others999.
Most of the purchases for these households seem to be purchases not on promotion with their least
purchases on promo code other than 6. Their major purchases comprise of the premium soaps.
Cluster 2 –
 Households are more likely to be aged above 45 years,
 Have a median affluence index of 15,
 With a median average price of 0.194,
 With most of the children below the age of 14 and a few houses with none or not specified,
 With television or broadcast TV cable,
 Speaking Marati and Gujarati,
 Females belonging to a Socio Economic class ‘A’,
 With low level of education (10-12 years),
 And with a house size of around 4 people.
These households have a high brand loyalty towards our selected brands and a low brand loyalty for all the
other brands.
Most of the purchases for these households seem to be purchases not on promotion with their least
purchases on promo code 6. Their major purchases comprise of the popular soaps.
Cluster 3 –
 Households are more likely to be aged above 45 years,
 Have a median affluence index of 10,
 With a median average price of 0.048,
 With most of the children below the age of 14 and a few house with none or not specified,
 With television or broadcast TV cable,
 Speaking Marati and Assameese,
 Females belonging to a Socio Economic class ‘D/E’,
 And with a house size of around 4 people.
These households seem to have the highest brand loyalty for the selected brands and are least loyal towards
the others999.
Most of the purchases for these households seem to be purchases not on promotion with their least
purchases on promo code 6. Their major purchases comprise of the economic/carbolic soaps.
3. For the best segmentation,obtaina descriptionofthe clusters.You may base this on attributes describingthe
clusters(not restrictedto attributesusedfor clustering).Youshouldalso builda decisiontree to helpdescribe the
clusters– how effective isthe tree in identifyingthe differentclusters?Doesthe tree helpinexplaining/interpreting
the differentclusters?(explainwhy/whynot).(Youmay use a decisiontree to helpchoose the ‘best’clustering).
We ran the decision tree with Gini index criterian and the decision tree could identify the clusters distinctly, and
we are also able to interpret the relationship of the clusters with the other variables like: demography,
promotional activities, prices of the product, etc using the decision tree.
Given below is the confusion matrix of the decision tree:
The decision tree appears to be a clearer version as to which attributes determine the forming of which
clusters. It is evident from the decision tree that the variables appearing in the top part of the tree are the
variables that have a maximum influence in forming the clusters. The variables appearing in the top part of the
tree are:
PR Cat_2
MaxProp
Pr Cat_1
Maxpropno
We tried different parameters for the decision tree, but the top nodes remained almost consistent.

More Related Content

What's hot

Μαθηματικά Α΄ Γυμνασίου (58 διαγωνίσματα)
Μαθηματικά Α΄ Γυμνασίου (58 διαγωνίσματα)Μαθηματικά Α΄ Γυμνασίου (58 διαγωνίσματα)
Μαθηματικά Α΄ Γυμνασίου (58 διαγωνίσματα)Kats961
 
επαναληπτικο διαγωνισμα φυσικης β' γυμνασιου
επαναληπτικο διαγωνισμα φυσικης β' γυμνασιουεπαναληπτικο διαγωνισμα φυσικης β' γυμνασιου
επαναληπτικο διαγωνισμα φυσικης β' γυμνασιουΚΑΤΕΡΙΝΑ ΑΡΩΝΗ
 
απολυτήριες εξετάσεις στη βιολογία γ γυμνασίου
απολυτήριες εξετάσεις στη βιολογία γ γυμνασίουαπολυτήριες εξετάσεις στη βιολογία γ γυμνασίου
απολυτήριες εξετάσεις στη βιολογία γ γυμνασίουXristos Koutras
 
Διαγώνισμα 10- 2ο κεφάλαιο ΕΠΑ.Λ Γ Λυκείου
Διαγώνισμα 10- 2ο κεφάλαιο ΕΠΑ.Λ Γ ΛυκείουΔιαγώνισμα 10- 2ο κεφάλαιο ΕΠΑ.Λ Γ Λυκείου
Διαγώνισμα 10- 2ο κεφάλαιο ΕΠΑ.Λ Γ ΛυκείουΜάκης Χατζόπουλος
 
Διαγνωστικό τεστ από την Α΄ στη Β΄ λυκείου
Διαγνωστικό τεστ από την Α΄ στη Β΄ λυκείουΔιαγνωστικό τεστ από την Α΄ στη Β΄ λυκείου
Διαγνωστικό τεστ από την Α΄ στη Β΄ λυκείουΜάκης Χατζόπουλος
 
A A03 Dynameis Ii
A A03 Dynameis IiA A03 Dynameis Ii
A A03 Dynameis IiA Z
 
Διαγώνισμα άλγεβρας Α' λυκείου εξισώσεις - ανισώσεις.pdf
Διαγώνισμα άλγεβρας Α' λυκείου εξισώσεις - ανισώσεις.pdfΔιαγώνισμα άλγεβρας Α' λυκείου εξισώσεις - ανισώσεις.pdf
Διαγώνισμα άλγεβρας Α' λυκείου εξισώσεις - ανισώσεις.pdfelmit2
 
(νεο) ιδιότητες συναρτήσεων προτεινόμενες ασκήσεις
(νεο) ιδιότητες συναρτήσεων προτεινόμενες ασκήσεις(νεο) ιδιότητες συναρτήσεων προτεινόμενες ασκήσεις
(νεο) ιδιότητες συναρτήσεων προτεινόμενες ασκήσειςChristos Loizos
 
Διαγώνισμα β τριμήνου Α γυμνασίου
Διαγώνισμα β τριμήνου Α γυμνασίου Διαγώνισμα β τριμήνου Α γυμνασίου
Διαγώνισμα β τριμήνου Α γυμνασίου peinirtzis
 
Διαγώνισμα Μαθηματικά Κατεύθυνσης Β Λυκείου (Ευθεία - Κύκλος)
Διαγώνισμα Μαθηματικά Κατεύθυνσης Β Λυκείου (Ευθεία - Κύκλος)Διαγώνισμα Μαθηματικά Κατεύθυνσης Β Λυκείου (Ευθεία - Κύκλος)
Διαγώνισμα Μαθηματικά Κατεύθυνσης Β Λυκείου (Ευθεία - Κύκλος)Michael Magkos
 
Δύο διαγωνίσματα προσομοίωσης από Ιδιωτικά Σχολεία Αθήνας
Δύο διαγωνίσματα προσομοίωσης από Ιδιωτικά Σχολεία ΑθήναςΔύο διαγωνίσματα προσομοίωσης από Ιδιωτικά Σχολεία Αθήνας
Δύο διαγωνίσματα προσομοίωσης από Ιδιωτικά Σχολεία ΑθήναςΜάκης Χατζόπουλος
 
Σχέδιο μαθήματος παραγράφου 1.5: Εσωτερικό γινόμενο διανυσμάτων
Σχέδιο μαθήματος παραγράφου 1.5: Εσωτερικό γινόμενο διανυσμάτωνΣχέδιο μαθήματος παραγράφου 1.5: Εσωτερικό γινόμενο διανυσμάτων
Σχέδιο μαθήματος παραγράφου 1.5: Εσωτερικό γινόμενο διανυσμάτωνΜάκης Χατζόπουλος
 
PCA (Principal Component Analysis)
PCA (Principal Component Analysis)PCA (Principal Component Analysis)
PCA (Principal Component Analysis)Luis Serrano
 
Θεματα εξετασεων Βιολογιας Γυμνασιου (Ταξεις Α, Β, Γ)
Θεματα εξετασεων Βιολογιας Γυμνασιου (Ταξεις Α, Β, Γ)Θεματα εξετασεων Βιολογιας Γυμνασιου (Ταξεις Α, Β, Γ)
Θεματα εξετασεων Βιολογιας Γυμνασιου (Ταξεις Α, Β, Γ)Christos Gotzaridis
 

What's hot (20)

Μαθηματικά Α΄ Γυμνασίου (58 διαγωνίσματα)
Μαθηματικά Α΄ Γυμνασίου (58 διαγωνίσματα)Μαθηματικά Α΄ Γυμνασίου (58 διαγωνίσματα)
Μαθηματικά Α΄ Γυμνασίου (58 διαγωνίσματα)
 
επαναληπτικο διαγωνισμα φυσικης β' γυμνασιου
επαναληπτικο διαγωνισμα φυσικης β' γυμνασιουεπαναληπτικο διαγωνισμα φυσικης β' γυμνασιου
επαναληπτικο διαγωνισμα φυσικης β' γυμνασιου
 
απολυτήριες εξετάσεις στη βιολογία γ γυμνασίου
απολυτήριες εξετάσεις στη βιολογία γ γυμνασίουαπολυτήριες εξετάσεις στη βιολογία γ γυμνασίου
απολυτήριες εξετάσεις στη βιολογία γ γυμνασίου
 
Sampling Distributions and Estimators
Sampling Distributions and EstimatorsSampling Distributions and Estimators
Sampling Distributions and Estimators
 
Διαγώνισμα 10- 2ο κεφάλαιο ΕΠΑ.Λ Γ Λυκείου
Διαγώνισμα 10- 2ο κεφάλαιο ΕΠΑ.Λ Γ ΛυκείουΔιαγώνισμα 10- 2ο κεφάλαιο ΕΠΑ.Λ Γ Λυκείου
Διαγώνισμα 10- 2ο κεφάλαιο ΕΠΑ.Λ Γ Λυκείου
 
104 ερωτήσεις θεωρίας
104 ερωτήσεις θεωρίας104 ερωτήσεις θεωρίας
104 ερωτήσεις θεωρίας
 
Διαγνωστικό τεστ από την Α΄ στη Β΄ λυκείου
Διαγνωστικό τεστ από την Α΄ στη Β΄ λυκείουΔιαγνωστικό τεστ από την Α΄ στη Β΄ λυκείου
Διαγνωστικό τεστ από την Α΄ στη Β΄ λυκείου
 
A A03 Dynameis Ii
A A03 Dynameis IiA A03 Dynameis Ii
A A03 Dynameis Ii
 
Διαγώνισμα άλγεβρας Α' λυκείου εξισώσεις - ανισώσεις.pdf
Διαγώνισμα άλγεβρας Α' λυκείου εξισώσεις - ανισώσεις.pdfΔιαγώνισμα άλγεβρας Α' λυκείου εξισώσεις - ανισώσεις.pdf
Διαγώνισμα άλγεβρας Α' λυκείου εξισώσεις - ανισώσεις.pdf
 
Epanalipsi g gymnasiou
Epanalipsi g gymnasiouEpanalipsi g gymnasiou
Epanalipsi g gymnasiou
 
(νεο) ιδιότητες συναρτήσεων προτεινόμενες ασκήσεις
(νεο) ιδιότητες συναρτήσεων προτεινόμενες ασκήσεις(νεο) ιδιότητες συναρτήσεων προτεινόμενες ασκήσεις
(νεο) ιδιότητες συναρτήσεων προτεινόμενες ασκήσεις
 
Διαγώνισμα β τριμήνου Α γυμνασίου
Διαγώνισμα β τριμήνου Α γυμνασίου Διαγώνισμα β τριμήνου Α γυμνασίου
Διαγώνισμα β τριμήνου Α γυμνασίου
 
Διαγώνισμα Μαθηματικά Κατεύθυνσης Β Λυκείου (Ευθεία - Κύκλος)
Διαγώνισμα Μαθηματικά Κατεύθυνσης Β Λυκείου (Ευθεία - Κύκλος)Διαγώνισμα Μαθηματικά Κατεύθυνσης Β Λυκείου (Ευθεία - Κύκλος)
Διαγώνισμα Μαθηματικά Κατεύθυνσης Β Λυκείου (Ευθεία - Κύκλος)
 
Δύο διαγωνίσματα προσομοίωσης από Ιδιωτικά Σχολεία Αθήνας
Δύο διαγωνίσματα προσομοίωσης από Ιδιωτικά Σχολεία ΑθήναςΔύο διαγωνίσματα προσομοίωσης από Ιδιωτικά Σχολεία Αθήνας
Δύο διαγωνίσματα προσομοίωσης από Ιδιωτικά Σχολεία Αθήνας
 
ΑΕΠΠ Ασκήσεις
ΑΕΠΠ ΑσκήσειςΑΕΠΠ Ασκήσεις
ΑΕΠΠ Ασκήσεις
 
Σημειώσεις Β΄ Γυμνασίου σε word
Σημειώσεις Β΄ Γυμνασίου σε wordΣημειώσεις Β΄ Γυμνασίου σε word
Σημειώσεις Β΄ Γυμνασίου σε word
 
Άλγεβρα Β Γυμνασίου
Άλγεβρα Β Γυμνασίου Άλγεβρα Β Γυμνασίου
Άλγεβρα Β Γυμνασίου
 
Σχέδιο μαθήματος παραγράφου 1.5: Εσωτερικό γινόμενο διανυσμάτων
Σχέδιο μαθήματος παραγράφου 1.5: Εσωτερικό γινόμενο διανυσμάτωνΣχέδιο μαθήματος παραγράφου 1.5: Εσωτερικό γινόμενο διανυσμάτων
Σχέδιο μαθήματος παραγράφου 1.5: Εσωτερικό γινόμενο διανυσμάτων
 
PCA (Principal Component Analysis)
PCA (Principal Component Analysis)PCA (Principal Component Analysis)
PCA (Principal Component Analysis)
 
Θεματα εξετασεων Βιολογιας Γυμνασιου (Ταξεις Α, Β, Γ)
Θεματα εξετασεων Βιολογιας Γυμνασιου (Ταξεις Α, Β, Γ)Θεματα εξετασεων Βιολογιας Γυμνασιου (Ταξεις Α, Β, Γ)
Θεματα εξετασεων Βιολογιας Γυμνασιου (Ταξεις Α, Β, Γ)
 

Similar to Clustering Techniques Review

Machine Learning Clustering
Machine Learning ClusteringMachine Learning Clustering
Machine Learning ClusteringRupak Roy
 
What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to ...
What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to ...What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to ...
What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to ...Smarten Augmented Analytics
 
Digital electronics k map comparators and their function
Digital electronics k map comparators and their functionDigital electronics k map comparators and their function
Digital electronics k map comparators and their functionkumarankit06875
 
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Simplilearn
 
Measures of dispersion qt pgdm 1st trisemester
Measures of dispersion qt pgdm 1st trisemester Measures of dispersion qt pgdm 1st trisemester
Measures of dispersion qt pgdm 1st trisemester Karan Kukreja
 
Reinforcement learning Research experiments OpenAI
Reinforcement learning Research experiments OpenAIReinforcement learning Research experiments OpenAI
Reinforcement learning Research experiments OpenAIRaouf KESKES
 
Garge, Nikhil et. al. 2005. Reproducible Clusters from Microarray Research: ...
 Garge, Nikhil et. al. 2005. Reproducible Clusters from Microarray Research: ... Garge, Nikhil et. al. 2005. Reproducible Clusters from Microarray Research: ...
Garge, Nikhil et. al. 2005. Reproducible Clusters from Microarray Research: ...Gota Morota
 
Inferring the Socioeconomic Status of Social Media Users based on Behaviour a...
Inferring the Socioeconomic Status of Social Media Users based on Behaviour a...Inferring the Socioeconomic Status of Social Media Users based on Behaviour a...
Inferring the Socioeconomic Status of Social Media Users based on Behaviour a...Vasileios Lampos
 
CLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxCLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxShwetapadmaBabu1
 
Hierarchical clustering
Hierarchical clusteringHierarchical clustering
Hierarchical clusteringishmecse13
 
ITB Term Paper - 10BM60066
ITB Term Paper - 10BM60066ITB Term Paper - 10BM60066
ITB Term Paper - 10BM60066rahulsm27
 
PATTEM JAGADESH_21mt0269_research proposal presentation.pptx
PATTEM JAGADESH_21mt0269_research proposal presentation.pptxPATTEM JAGADESH_21mt0269_research proposal presentation.pptx
PATTEM JAGADESH_21mt0269_research proposal presentation.pptxPATTEMJAGADESH
 
Fuzzy c means_realestate_application
Fuzzy c means_realestate_applicationFuzzy c means_realestate_application
Fuzzy c means_realestate_applicationCemal Ardil
 
Finding the best K- Knn
Finding the best K- KnnFinding the best K- Knn
Finding the best K- Knnstanleysayanka
 
Clustering, k-means clustering
Clustering, k-means clusteringClustering, k-means clustering
Clustering, k-means clusteringMegha Sharma
 

Similar to Clustering Techniques Review (20)

Machine Learning Clustering
Machine Learning ClusteringMachine Learning Clustering
Machine Learning Clustering
 
What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to ...
What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to ...What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to ...
What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to ...
 
Lec13 Clustering.pptx
Lec13 Clustering.pptxLec13 Clustering.pptx
Lec13 Clustering.pptx
 
FLIPKART SAMSUNG
FLIPKART SAMSUNGFLIPKART SAMSUNG
FLIPKART SAMSUNG
 
Digital electronics k map comparators and their function
Digital electronics k map comparators and their functionDigital electronics k map comparators and their function
Digital electronics k map comparators and their function
 
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
 
Measures of dispersion qt pgdm 1st trisemester
Measures of dispersion qt pgdm 1st trisemester Measures of dispersion qt pgdm 1st trisemester
Measures of dispersion qt pgdm 1st trisemester
 
Reinforcement learning Research experiments OpenAI
Reinforcement learning Research experiments OpenAIReinforcement learning Research experiments OpenAI
Reinforcement learning Research experiments OpenAI
 
Clustering
ClusteringClustering
Clustering
 
Garge, Nikhil et. al. 2005. Reproducible Clusters from Microarray Research: ...
 Garge, Nikhil et. al. 2005. Reproducible Clusters from Microarray Research: ... Garge, Nikhil et. al. 2005. Reproducible Clusters from Microarray Research: ...
Garge, Nikhil et. al. 2005. Reproducible Clusters from Microarray Research: ...
 
Clusters (4).pptx
Clusters (4).pptxClusters (4).pptx
Clusters (4).pptx
 
Inferring the Socioeconomic Status of Social Media Users based on Behaviour a...
Inferring the Socioeconomic Status of Social Media Users based on Behaviour a...Inferring the Socioeconomic Status of Social Media Users based on Behaviour a...
Inferring the Socioeconomic Status of Social Media Users based on Behaviour a...
 
CLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxCLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptx
 
Xgboost
XgboostXgboost
Xgboost
 
Hierarchical clustering
Hierarchical clusteringHierarchical clustering
Hierarchical clustering
 
ITB Term Paper - 10BM60066
ITB Term Paper - 10BM60066ITB Term Paper - 10BM60066
ITB Term Paper - 10BM60066
 
PATTEM JAGADESH_21mt0269_research proposal presentation.pptx
PATTEM JAGADESH_21mt0269_research proposal presentation.pptxPATTEM JAGADESH_21mt0269_research proposal presentation.pptx
PATTEM JAGADESH_21mt0269_research proposal presentation.pptx
 
Fuzzy c means_realestate_application
Fuzzy c means_realestate_applicationFuzzy c means_realestate_application
Fuzzy c means_realestate_application
 
Finding the best K- Knn
Finding the best K- KnnFinding the best K- Knn
Finding the best K- Knn
 
Clustering, k-means clustering
Clustering, k-means clusteringClustering, k-means clustering
Clustering, k-means clustering
 

Recently uploaded

PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schscnajjemba
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxParas Gupta
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxVivek487417
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........EfruzAsilolu
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样wsppdmt
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATIONLakpaYanziSherpa
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制vexqp
 

Recently uploaded (20)

PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 

Clustering Techniques Review

  • 1. Assignment 4: Market Segmentation Kompella Kaushik Adusumelli Jaideep Sushaanth Srirangapathi
  • 2. Preprocessing of data: Attribute transformation: Attribute Transformation EDU_1 If(EDU==1,1,0) EDU_2 If(EDU==2,1,0) EDU_3 If(EDU==3,1,0) EDU_4 If(EDU==4,1,0) EDU_5 If(EDU==5,1,0) EDU_6 If(EDU==6,1,0) EDU_7 If(EDU==7,1,0) EDU_8 If(EDU==8,1,0) EDU_9 If(EDU==9,1,0) MT_English If(MT==3,1,0) MY_Gujarati If(MT==4,1,0) MT_Hindi If(MT==5,1,0) MT_Kannada If(MT==6,1,0) MT_Konkani If(MT==8,1,0) MT_Malyalam If(MT==9,1,0) MT_Marathi If(MT==10,1,0) MT_Punjabi If(MT==12,1,0) MT_Rajasthani If(MT==13,1,0) MT_Sindhi If(MT==14,1,0) MT_Tamil If(MT==15,1,0) MT_Telugu If(MT==16,1,0) MT_Other If(MT==19,1,0) MT_Urdu If(MT==17,1,0) MT_Not_Specifie d If(MT==0,1,0) Maxpurchase max(BrCd24,BrCd272,BrCd286,BrCd352,BrCd481,BrCd5,BrCd55,BrCd55And144) CS_New if(CS==0||CS==2,0,1) Maxprop max([PropCat 10],[PropCat 11],[PropCat 13],[PropCat 12],[PropCat 14],[PropCat 15],[PropCat 5],[PropCat 6],[PropCat 7],[PropCat 8],[PropCat 9]) Maxpropnum if(MaxProp==[PropCat],5,if(MaxProp==[PropCat],6,if(MaxProp==[PropCat],7,if(MaxProp==[PropC at 8],8,if(MaxProp==[PropCat],9,if(MaxProp==[PropCat10],10,if(MaxProp==[PropCat 11],11,if(MaxProp==[PropCat 12],12,if(MaxProp==[PropCat 13],13,if(MaxProp==[PropCat 14],14,if(MaxProp==[PropCat 15],15,0))))))))))) “Maxpurchase” attribute was created to explain the Brand Loyalty. “EDU_X”, and “MT_XXXXX” attribute were created to transform the attributes “mother Tongue” and “Education” to binary.
  • 3. 1. Use k-means clustering to identify clusters of households based on a. The variables that describe purchase behavior (including brand loyalty). How do you evaluate brand loyalty? [Variables: #brands, brand runs, total volume, #transactions, value, avg. price, share to other brands, (brand loyalty)]. The variables used by us are: Number of Brands, Brand Runs, Total Volume, Number of Transactions, Value, Average Price, Maximum Purchase, Others999. The attribute “Others999” was used to depict the share to other Brands. Brand Loyalty: The attribute “Maxpurchase” was created to depict brand Loyalty. “Maxpurchase” is the maximum value of the Brand columns (not including Others999) for a given row and is an indicator of predilection of a household to a given brand. Here, we have not included the individual brand purchase percentages to cover for the fact that we are considering brand loyalty and not brand loyalty to a given brand. Among the above attributes which were used in this process – “Others999” depicts the lack of loyalty, hence this also should be used for calculating Brand Loyalty. Number of Brands in the purchase indicates to some extent the lack of brand loyalty and we consider the temporal nature of brand loyalty by including Brand Runs, the consecutive buying runs for a given set of brands. Given below are the performances for different clusters with k values ranging from 2 to 5, generated using K- means Clustering technique. For k=2: Performance Matrix: Avg. within centroid distance:-0.180 Avg. within centroid distance_cluster_0: -0.187 Avg. within centroid distance_cluster_1: -0.169 Davies Bouldin:-1.062 Size of Clusters:(373,227) Inter Cluster Distance First Second Inter Cluster Distance 1.0 2.0 0.731 The below graph shows the dependence of the attributes on the differences between the clusters. From the graph, it is clear that most of the difference is contributed by others999 and maxpurchase attributes.
  • 4. For k=3: Performance Matrix: Avg. within centroid distance:-0.140 Avg. within centroid distance_cluster_0: -0.142 Avg. within centroid distance_cluster_1: -0.160 Avg. within centroid distance_cluster_2: -0.119 Davies Bouldin:-1.255 Size of Clusters:(197, 204,199) Inter Cluster Distance First Second Inter Cluster Distance 2.0 3.0 0.500 1.0 2.0 0.701 1.0 3.0 0.868 The below graph shows the dependence of the attributes on the differences between the clusters. From the graph, it is clear that most of the difference is contributed by others999 and maxpurchase attributes.
  • 5. For k=4: Performance Matrix: Avg. within centroid distance:-0.127 Avg. within centroid distance_cluster_0: -0.200 Avg. within centroid distance_cluster_1: -0.116 Avg. within centroid distance_cluster_2: -0.125 Avg. within centroid distance_cluster_3: -0.121 Davies Bouldin:-1.345 Size of Clusters:(51, 191, 181,177) Inter Cluster Distance First Second Inter Cluster Distance 1.0 2.0 0.794 1.0 3.0 0.539 1.0 4.0 0.583 2.0 3.0 0.478 2.0 4.0 0.880 3.0 4.0 0.753 The below graph shows the dependence of the attributes on the differences between the clusters. From the graph, it is clear that most of the difference is contributed by others999 and maxpurchase attributes. For k=5: Performance Matrix: Avg. within centroid distance:-0.110 Avg. within centroid distance_cluster_0: -0.080 Avg. within centroid distance_cluster_1: -0.116 Avg. within centroid distance_cluster_2: -0.093 Avg. within centroid distance_cluster_3: -0.113 Avg. within centroid distance_cluster_4:-0.217 Davies Bouldin:-1.251 Size of Clusters:(135, 145,100, 180, 40)
  • 6. Inter Cluster Distance First Second Inter Cluster Distance 1.0 2.0 0.515 1.0 3.0 0.499 1.0 4.0 0.619 1.0 5.0 0.539 2.0 3.0 0.996 2.0 4.0 0.502 2.0 5.0 0.615 3.0 4.0 1.061 3.0 5.0 0.757 4.0 5.0 0.849 The below graph shows the dependence of the attributes on the differences between the clusters. From the graph, it is clear that most of the difference is contributed by Value, Total Volume, others999 and maxpurchase attributes. Choosingthe bestvalue ofK: Choosing the best value of K for the given purpose has both objective and subjective criteria. Objective criteria include consistent group sizes, large inter cluster distance and small values of intra cluster distance. Davies Bouldin index is another criterion used to measure the overall clustering quality. Table above shows these metrics for K=2, 3, 4 and 5. Subjective criterion includes the clusters that lend themselves to good explanation. Based on good group sizes and decent values of intra-cluster distance and davies bouldin index, we select K=3 as our best model. b. The variables that describe basis-for-purchase. [Variables: pur_vol_no_promo, pur_vol_promo_6, pur_vol_other, all price categories, selling propositions] [Note – would you use all selling_propositions? Explore the data.] The variables used by us are:
  • 7. Maxprop, Maxpropno, Pr Cat_1, Pr Cat_2, Pr Cat_3, Pr Cat_4, Purvolnopromo%, purvolotherpromo%, purvolpromo6%. All Proposition Categories (1-15) compiled as maxProp which has all maximum proposition value and maxPropNo which has the corresponding Proposition number. (Proposition categories 10-13 have distributions tending to zeros, but we kept them as there are at least some points having valid data and the total data points are only 600) For k=2: We can notice from the graph below that most of the differences between these two clusters lie with Pr Cat 2, Pr Cat3 and maxPropNo. For k=3: We can notice from the graph below that most of the differences between these three clusters lie with Pr Cat 1, Pr Cat 2, Pr Cat3, maxProp and maxPropNo. For k=4: We can notice from the graph below that most of the differences between these four clusters lie with Pr Cat 1, Pr Cat 2, Pr Cat3 and maxProp.
  • 8. For k=5: We can notice from the graph below that most of the differences between these five clusters lie with Pr Cat 1 and maxProp. As explained in question 1 (a), we chose the best model with good group sizes and decent metrics. Along with good explainability K=5 turns out to be a good model for the given attributes, with – Davies Bouldin Index: -1.08 Avg within Centroid distance: -0.168 Avg Inter cluster distance: 0.9959 Size of clusters: (80, 191, 55, 101, 173) c. The variables that describe both purchase behavior and basis of purchase. Note: How should k be chosen? Think about how the clusters would be used. It is likely that the marketing efforts would support 2-5 different promotional approaches. Note: How should the percentages of total purchases comprised by various brands be treated? Isn’t a customer who buys all brand A just as loyal as a customer who buys all brand B? What will be the effect on any distance measure of using the brand share variables as is? The variables used by us are:
  • 9. No.of.Trans, Value, BrandRuns, No.of.Brands, Max Brand Purchase, Others999, All Price Categories (1-4), Pur_vol_no_promo, Pur_vol_promo_6, Pur_vol_other, All Proposition Categories (1-15) [in the form of maxProp and maxPropNo] For k=2: We can notice from the graph below that most of the differences between these two clusters lie with No. of Brands, Others999, PrCat1,Pr Cat2, Pr Cat3 and maxProp. For k=3: We can notice from the graph below that most of the differences between these three clusters lie with No. of Brands, Others999, PrCat1,Pr Cat2, maxBrandPurchase and maxProp. For k=4: We can notice from the graph below that most of the differences between these four clusters lie with Brand Runs, Others999, PrCat1, maxBrandPurchase and maxProp.
  • 10. For k=5: We can notice from the graph below that most of the differences between these five clusters lie with Avg. Price, Others999, PrCat1, maxBrandPurchase. Similar to question 1(a), we have selected k=4 as our best model with – Davies Bouldin index: -1.526 Avg within Centroid Distance: -0.415 Avg Inter-Cluster Distance: 1.128 Size of clusters: (241, 139, 139, 81) d. Try k-medoids, kernel k-means, agglomerative clustering, and DBSCAN clustering. You do not need to try all techniques on all combinations in (a)-(c) above; you may pick one set of variables in (a) thru (c) that you feel may be best suited for addressing the segmentation need. How do different parameter values for the different techniques affect the clusters obtained? Are the clusters obtained from the different procedures similar? What might be some reasons for differences in clusters obtained using different procedures? Which would you pick as your 'best' and why? For K-Medeoids: We have tried the various methods as mentioned in the question, the first being k-mediods. We have taken purchase behavior and basis of purchase variables for all these methods. For k=2: We can notice from the graph below that most of the differences between these two clusters lie with Total Volume, Value, PurchaseVolNoPromo %, Others999, PrCat2 and maxProp.
  • 11. For k=3: We can notice from the graph below that most of the differences between these three clusters lie with Avg. Price, Others999, PrCat2 and maxProp. For k=4: We can notice from the graph below that most of the differences between these four clusters lie with Avg. Price, Others999 and maxProp. For k=5: We can notice from the graph below that most of the differences between these five clusters lie with Brand Runs, Value, Others999 and maxProp.
  • 12. Again, from the explanation in Question 1(a), the best K turns out to be K=5 for which we get good cluster sizes, decent cluster metrics and for the clusters. For K-Means Kernel: For Kernel k-Means, we tried for different values of K from 2 to 5 and obtained their corresponding clusters. Kernel K-means just like K-means and K-mediods clustering algorithms are driven by the value of “k”. A change in the “k” indicates a change in the number of clusters. So the cluster density varies by the K values. But the clusters we obtained were overlapping onto each other. And we couldn’t find completely distinct clusters. For Agglomerative Clustering: For the Agglomerative Clustering, we used the flattening operator along with the operator to extract the clusters. Following summarizes the parameters used –  Mode – Complete Linkage  Measure Types – Numerical Measures  Numerical Measure – Euclidean Distance  Number of clusters (in the flattening cluster operator) – 2 to 5 We did not get good cluster models using this method. This suggests that agglomerative clustering is not a suitable clustering for this data. For DBSCAN: We ran several iterations of DBSCAN, with different combinations of values for parameters. But it seemed to have no impact on the output. We have tried with different values of epsilon, min. points and the distance measures, but we see that most of the data points are going into one cluster and the other clusters have only around 1 or at the maximum 10 data points. The reason for poor performance of DBSCAN could be because of the high dimensionality of the data. The DBSCAN clustering is not suitable for clustering on this data. Hence, we could not extract any useful information regarding the model performance from Agglomerative and DBSCAN and k-mean kernel. Between k-medoids and kernel k- means, we would choose k-mediods since it has better formed clusters. So, comparing all the best models in each case in question 1 with k-medoids clustering, we chose K-means, since it gave a better clustering output when compared to that of the k-medoids.
  • 13. 2. (a) Selectwhat you think isthe 'best' segmentation - explainwhyyou think this is the ‘best’.You can also decide on multiple segmentations,basedondifferent criteria-- for example,basedon purchase behavior,or basisfor purchase,....( think about how differentclustersmay be useful0. Based on our analysis using 6 clustering models and multiple iterations including optimizing the parameters in rapid miner, our best model is a k-means clustering model with k = 4. For this model, the intra cluster distance is small meaning that the data points in the cluster are similar and the inter cluster distance between the clusters is good and the size of the clusters is also good. Hence we concluded that this was our best model. Households in cluster 0 buy the highest number of brands, have the highest brand runs for a great amount of volume. Most of the purchases for these households seemto be purchases not on promotion. These households have a low brand loyalty towards our selected brands but have a relatively high brand loyalty for the brands in others999.Their major purchases comprise of the popular soaps. Households in cluster 1 buy the least number of brands, have a moderate brand runs for the least amount of volume. Most of the purchases for these households seemto be purchases not on promotion. These households have the least brand loyalty for our selected brands and have the highest brand loyalty towards the brands in others999. Their major purchases include premium soaps. Households in cluster 2 buy a decent number of brands, have a low number of brand runs but for a great amount of volume. Most of the purchases for these households seem to be purchases not on promotion. These households have a high brand loyalty towards our selected brands and a low brand loyalty for all the other brands. They mostly buy popular soaps. Households in cluster 3 buy moderately low number of brands, have the least number of brand runs for almost the same amount of volume in comparison to other clusters. Most of the purchases for these households seem to be purchases not on promotion. These households seemto have the highest brand loyalty for the selected brands and are least loyal towards the others and their maximum purchases for economic/carbolic soaps. (b) Comment on the characteristics (demographic, brand loyalty and basis-for-purchase) of these clusters. (This information would be used to guide the development of advertising and promotional campaigns.) We “joined” the demographics data to the already existing cluster data to see how the demographics changes in each of the cluster. Since each of the demographic attribute cannot be grouped in the similar manner, we used sum, average, median and mode functions to get values for these attributes.
  • 14. The following is the table showing the different clusters and their corresponding demographic data – Cluster 0 –  Households are more likely to be aged above 45 years,  Have a median affluence index of 18,  With a median average price of 0.227,  With children below the age of 14,  With television or broadcast TV cable,  Speaking Marati, Gujarati and Assameese,  Females belonging to a Socio Economic class ‘C’,  With low level of education (10-12 years),  And with a house size of around 4 people. These households have a low brand loyalty towards our selected brands but have a relatively high brand loyalty for the brands in others999. Most of the purchases for these households seem to be purchases not on promotion with their least purchases on promo code other than 6. Their major purchases comprise of the popular soaps. Cluster 1 –  Households are more likely to be aged above 45 years,  Have a median affluence index of 15,  With a median average price of 0.273,  With most of the children below the age of 14 and a few house with none or not specified,  With television or broadcast TV cable,  Speaking Marati majorly,  Females belonging to a Socio Economic class ‘A’,  With low level of education (10-12 years),  And with a house size of around 3 people. These households have the least brand loyalty for our selected brands and have the highest brand loyalty towards the brands in others999. Most of the purchases for these households seem to be purchases not on promotion with their least purchases on promo code other than 6. Their major purchases comprise of the premium soaps. Cluster 2 –  Households are more likely to be aged above 45 years,  Have a median affluence index of 15,  With a median average price of 0.194,  With most of the children below the age of 14 and a few houses with none or not specified,  With television or broadcast TV cable,  Speaking Marati and Gujarati,  Females belonging to a Socio Economic class ‘A’,  With low level of education (10-12 years),  And with a house size of around 4 people. These households have a high brand loyalty towards our selected brands and a low brand loyalty for all the other brands.
  • 15. Most of the purchases for these households seem to be purchases not on promotion with their least purchases on promo code 6. Their major purchases comprise of the popular soaps. Cluster 3 –  Households are more likely to be aged above 45 years,  Have a median affluence index of 10,  With a median average price of 0.048,  With most of the children below the age of 14 and a few house with none or not specified,  With television or broadcast TV cable,  Speaking Marati and Assameese,  Females belonging to a Socio Economic class ‘D/E’,  And with a house size of around 4 people. These households seem to have the highest brand loyalty for the selected brands and are least loyal towards the others999. Most of the purchases for these households seem to be purchases not on promotion with their least purchases on promo code 6. Their major purchases comprise of the economic/carbolic soaps. 3. For the best segmentation,obtaina descriptionofthe clusters.You may base this on attributes describingthe clusters(not restrictedto attributesusedfor clustering).Youshouldalso builda decisiontree to helpdescribe the clusters– how effective isthe tree in identifyingthe differentclusters?Doesthe tree helpinexplaining/interpreting the differentclusters?(explainwhy/whynot).(Youmay use a decisiontree to helpchoose the ‘best’clustering). We ran the decision tree with Gini index criterian and the decision tree could identify the clusters distinctly, and we are also able to interpret the relationship of the clusters with the other variables like: demography, promotional activities, prices of the product, etc using the decision tree. Given below is the confusion matrix of the decision tree: The decision tree appears to be a clearer version as to which attributes determine the forming of which clusters. It is evident from the decision tree that the variables appearing in the top part of the tree are the variables that have a maximum influence in forming the clusters. The variables appearing in the top part of the tree are: PR Cat_2 MaxProp Pr Cat_1
  • 16. Maxpropno We tried different parameters for the decision tree, but the top nodes remained almost consistent.