Cluster Analysis
Kohonen Networks
Kohonen Networks – Self Organizing Map
• Kohonen networks were introduced in 1982 by the Finnish researcher Teuvo Kohonen
• Although applied initially to image and sound analyses, Kohonen networks are
nevertheless an effective mechanism for clustering analysis
• Kohonen networks represent a type of self-organizing map (SOM), which itself
represents a special class of neural networks
• The goal of SOMs is to convert a complex high-dimensional input signal into a
simpler low-dimensional discrete map, which makes them well suited to cluster
analysis, where underlying hidden patterns among records and fields are sought
• SOMs structure the output nodes into clusters of nodes, where nodes in closer
proximity are more similar to each other than to other nodes that are farther apart
• Some researchers have shown that SOMs represent a nonlinear generalization of
principal components analysis, another dimension-reduction technique
• SOMs are based on competitive learning, where the output nodes compete among
themselves to be the winning node (or neuron), the only node to be activated by a
particular input observation
Clustering Task
• A typical SOM architecture is shown in the following figure
• The input layer is shown at the bottom of the figure, with one input node for each
field. Just as with neural networks, these input nodes perform no processing
themselves, but simply pass the field input values along downstream
Prerequisites of Clustering
• Like neural networks, SOMs are feedforward and completely connected
• Feedforward networks do not allow looping or cycling.
• Completely connected means that every node in a given layer is connected to every
node in the next layer, although not to other nodes in the same layer
• Like neural networks, each connection between nodes has a weight associated with
it, which at initialization is assigned randomly to a value between 0 and 1.
Adjusting these weights is the key to the learning mechanism in both
neural networks and SOMs
• Variable values need to be normalized or standardized, just as for neural networks,
so that certain variables do not overwhelm others in the learning algorithm
(a normalization sketch follows this list)
• Unlike most neural networks, SOMs have no hidden layer
• The data from the input layer is passed along directly to the output layer
• The output layer is represented in the form of a lattice, usually in one or two
dimensions, and typically in the shape of a rectangle, although other shapes, such as
hexagons, may be used
• The output layer shown in the earlier figure is a 3 × 3 square
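As a concrete illustration of the normalization prerequisite, here is a minimal min–max sketch that scales every field to [0, 1]; the raw age and income values are made up for the example:

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column of X to [0, 1] so that no single variable
    dominates the distance computations during learning."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0   # guard against constant columns
    return (X - col_min) / col_range

# Hypothetical raw (age, income) records
raw = np.array([[62, 95000], [58, 12000], [25, 110000], [22, 15000]])
print(min_max_normalize(raw))
```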
Working of SOM
• For a given record (instance), a particular field value is forwarded from a particular
input node to every node in the output layer
• For example, suppose that the normalized age and income values for the first record
in the data set are 0.69 and 0.88, respectively.
• The 0.69 value would enter the SOM through the input node associated with age,
and this node would pass this value of 0.69 to every node in the output layer
• Similarly, the 0.88 value would be distributed through the income input node to
every node in the output layer
• These values, together with the weights assigned to each of the connections, would
determine the values of a scoring function (such as Euclidean distance) for each
output node
• The output node with the “best” outcome from the scoring function would then
be designated as the winning node
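This forward pass and competition can be sketched in a few lines; the 3 × 3 output layer and the random initial weights below are illustrative, with the input (0.69, 0.88) taken from the text:

```python
import numpy as np

def competition(x, weights):
    """Score input x against every output node's weight vector using
    Euclidean distance; the node with the smallest distance wins."""
    distances = np.linalg.norm(weights - x, axis=1)
    winner = int(np.argmin(distances))
    return winner, distances

rng = np.random.default_rng(0)
weights = rng.random((9, 2))        # 3 x 3 lattice = 9 nodes, 2 input fields
x = np.array([0.69, 0.88])          # normalized age and income for record 1
winner, distances = competition(x, weights)
print("winning node:", winner, "distance:", round(distances[winner], 3))
```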
Working of SOM
• SOMs exhibit three characteristic processes:
1. Competition
• As mentioned above, the output nodes compete with each other to produce the best
value for a particular scoring function, most commonly the Euclidean distance
• In this case, the output node that has the smallest Euclidean distance between the
field inputs and the connection weights would be declared the winner
2. Cooperation.
• The winning node therefore becomes the center of a neighborhood of excited neurons.
This emulates the behavior of human neurons, which are sensitive to the output of other
neurons in their immediate neighborhood.
• In SOMs, all the nodes in this neighborhood share in the “excitement” or “reward”
earned by the winning node, that of adaptation. Thus, even though the nodes in the
output layer are not connected directly, they tend to share common features, due to
this neighborliness parameter (see the neighborhood sketch after this list)
3. Adaptation.
• The nodes in the neighborhood of the winning node participate in adaptation, that is,
learning. The weights of these nodes are adjusted so as to further improve the score
function
• In other words, these nodes will thereby have an increased chance of winning the
competition once again, for a similar set of field values
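A minimal sketch of the cooperation step follows. A square (Chebyshev-distance) neighborhood on the output lattice is assumed here; the exact lattice distance varies between implementations:

```python
def neighborhood(winner_rc, grid_shape, R):
    """Return grid coordinates of every output node whose Chebyshev
    distance on the lattice from the winning node is at most R."""
    rows, cols = grid_shape
    wr, wc = winner_rc
    return [(r, c)
            for r in range(rows)
            for c in range(cols)
            if max(abs(r - wr), abs(c - wc)) <= R]

# Winner in the corner of a 3 x 3 lattice, neighborhood size R = 1:
print(neighborhood((0, 0), (3, 3), R=1))   # [(0,0), (0,1), (1,0), (1,1)]
```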
Kohonen Networks
• Kohonen networks are SOMs that exhibit Kohonen learning
• Suppose that we consider the set of m field values for the nth record to be an input
vector xn = (xn1, xn2, … , xnm), and the current set of m weights for a particular
output node j to be a weight vector wj = (w1j, w2j, … , wmj)
• In Kohonen learning, the nodes in the neighborhood of the winning node adjust
their weights using a linear combination of the input vector and the current weight
vector: wij,new = wij,current + 𝜂(xni − wij,current), that is,
wij,new = (1 − 𝜂)·wij,current + 𝜂·xni, where 𝜂 (0 < 𝜂 ≤ 1) is the learning rate
• Kohonen indicates that the learning rate 𝜂 should be a decreasing function of
training epochs (runs through the data set) and that a linearly or geometrically
decreasing 𝜂 is satisfactory for most purposes
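The update rule is short enough to verify directly; this sketch uses the node-1 weights and first input record from the worked example that follows:

```python
import numpy as np

def kohonen_update(w, x, eta):
    """Kohonen learning rule for one node: nudge the weight vector w a
    fraction eta of the way toward the input x. Equivalent to the linear
    combination (1 - eta) * w + eta * x."""
    return w + eta * (x - w)

w = np.array([0.9, 0.8])   # current weights for the winning node
x = np.array([0.8, 0.8])   # input vector
print(kohonen_update(w, x, eta=0.5))   # -> [0.85, 0.8]
```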
Algorithm for Kohonen Networks
• The algorithm for Kohonen networks is shown below:
For each input vector x, perform the following steps:
• Competition: For each output node j, calculate the value D(wj, xn) of the
scoring function. For example, for Euclidean distance,
D(wj, xn) = √Σi (wij − xni)². Find the winning node J that minimizes
D(wj, xn) over all output nodes
• Cooperation: Identify all output nodes j within the topological neighborhood of
J defined by the neighborhood size R.
• Adaptation: Adjust the weights of all nodes in the neighborhood:
wij,new = wij,current + 𝜂(xni − wij,current).
Adjust the learning rate and neighborhood size, as needed
• Stop when the termination criteria are met
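Putting the three steps together, a minimal NumPy implementation of the algorithm might look as follows. The grid shape, linear learning-rate decay, fixed R, and termination after a fixed number of epochs are assumptions, not prescriptions from the text:

```python
import numpy as np

def train_som(X, grid_shape=(3, 3), eta=0.3, R=1, epochs=100, seed=0):
    """Sketch of a Kohonen SOM: competition, cooperation, adaptation.
    X is an (n_records, m) array of normalized field values."""
    rows, cols = grid_shape
    m = X.shape[1]
    rng = np.random.default_rng(seed)
    W = rng.random((rows, cols, m))              # random weights in [0, 1]

    for epoch in range(epochs):
        eta_t = eta * (1 - epoch / epochs)       # linearly decaying learning rate
        for x in X:
            # Competition: node with the smallest Euclidean distance wins
            d = np.linalg.norm(W - x, axis=2)
            wr, wc = np.unravel_index(np.argmin(d), d.shape)
            # Cooperation + adaptation: winner and its neighbors move toward x
            for r in range(rows):
                for c in range(cols):
                    if max(abs(r - wr), abs(c - wc)) <= R:
                        W[r, c] += eta_t * (x - W[r, c])
    return W

def assign_clusters(X, W):
    """Score each record against the trained map; return its winning node."""
    flat = W.reshape(-1, W.shape[2])
    return [int(np.argmin(np.linalg.norm(flat - x, axis=1))) for x in X]

# Usage: W = train_som(X) for a normalized array X, then
# assign_clusters(X, W) labels each record with its winning output node.
```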
A Worked Example
• Suppose that we have a data set with two attributes, age and income, which have
already been normalized, and suppose that we would like to use a 2 × 2 Kohonen
network to uncover hidden clusters in the data set. The Network is shown below:
A Worked Example
• With such a small network, we set the neighborhood size to be R = 0, so that only
the winning node will be awarded the opportunity to adjust its weight
• Also, we set the learning rate 𝜂 to be 0.5
• Assume that the weights have been randomly initialized as follows (these values are
reconstructed from the distances used later in the example): node 1, (w11, w21) =
(0.9, 0.8); node 2, (w12, w22) = (0.9, 0.2); node 3, (w13, w23) = (0.1, 0.8);
node 4, (w14, w24) = (0.1, 0.2)
• For the first input vector, x1 = (0.8, 0.8), we perform the following competition,
cooperation, and adaptation sequence
• Competition step: Compute the Euclidean distances between the input and the
weights for each of the four output nodes:
D(w1, x1) = √((0.9 − 0.8)² + (0.8 − 0.8)²) = 0.10, D(w2, x1) ≈ 0.61,
D(w3, x1) = 0.70, and D(w4, x1) ≈ 0.92
• Node 1 is the winner because its distance is smallest: the input vector (0.8,
0.8) is most similar to its weight vector (0.9, 0.8)
• In other words, we may expect node 1 to uncover a cluster of older, high-income
persons
A Worked Example
• Cooperation. In this simple example, we have set the neighborhood size R = 0 so that
the level of cooperation among output nodes is nil! Therefore, only the winning
node, node 1, will be rewarded with a weight adjustment. (We omit this step in the
remainder of the example.)
• Adaptation. For the winning node, node 1, the weights are adjusted as follows: For j
= 1 (node 1), n = 1 (the first record), and learning rate 𝜂 = 0.5, for each of the fields
this becomes
• For age: w11,new = w11,current + 𝜂(x11 − w11,current) = 0.9 + 0.5(0.8 − 0.9) = 0.85
• For income: w21,new = w21,current + 𝜂(x12 − w21,current) = 0.8 + 0.5(0.8 − 0.8) = 0.80
• Note the type of adjustment that takes place. The weights are nudged in the
direction of the fields’ values of the input record. That is, w11, the weight on the age
connection for the winning node, was originally 0.9, but was adjusted in the
direction of the normalized value for age in the first record, 0.8. As the learning rate
𝜂 = 0.5, this adjustment is half (0.5) of the distance between the current weight and
the field value
• This adjustment will help node 1 to become even more proficient at capturing the
records of older, high-income persons
A Worked Example
• Let's apply this process to three more input records, x2 = (0.8, 0.1), x3 = (0.2, 0.9),
and x4 = (0.1, 0.1), with the weight updates from each record carrying forward to
the next
• For input 2, the distances are computed as 0.70, 0.14, 0.99, and 0.71, so output
node 2 is the winner; its updated weights are 0.85 and 0.15
• For input 3, the distances are computed as 0.66, 0.99, 0.14, and 0.71, so output
node 3 is the winner; its updated weights are 0.15 and 0.85
• For input 4, the distances are computed as 1.03, 0.75, 0.75, and 0.10, so output
node 4 is the winner; its updated weights are 0.10 and 0.15
• The consolidated outcome is as follows
Cluster   Associated with   Description
1         Node 1            Older person with high income
2         Node 2            Older person with low income
3         Node 3            Younger person with high income
4         Node 4            Younger person with low income
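The whole four-record sequence can be replayed in code. This sketch uses the initial weights reconstructed above and reproduces the distances and updated weights quoted in the example:

```python
import numpy as np

# Initial (age, income) weight vectors for the four output nodes
W = np.array([[0.9, 0.8],    # node 1
              [0.9, 0.2],    # node 2
              [0.1, 0.8],    # node 3
              [0.1, 0.2]])   # node 4
eta = 0.5                    # learning rate
records = np.array([[0.8, 0.8], [0.8, 0.1], [0.2, 0.9], [0.1, 0.1]])

for n, x in enumerate(records, start=1):
    d = np.linalg.norm(W - x, axis=1)      # competition
    j = int(np.argmin(d))                  # winning node (R = 0: no neighbors)
    W[j] += eta * (x - W[j])               # adaptation of the winner only
    print(f"record {n}: distances {np.round(d, 2)}, "
          f"winner node {j + 1}, new weights {np.round(W[j], 2)}")
```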
A Bigger Example
• Let’s apply the Kohonen Net to the Churn dataset
• Recall that the data set contains 20 variables worth of information about 3333
customers, along with an indication of whether that customer churned (left the
company) or not
• A total of nine variables/attributes are used for clustering the customers – two
categorical (International Plan and VoiceMail Plan) and seven continuous (account
length, voice mail messages, day minutes, evening minutes, night minutes,
international minutes, and customer service calls)
• Z-score standardization was used to standardize the continuous variables
• A 3 × 3 Kohonen network architecture was adopted
• The learning parameters were set as follows (see the sketch after this list):
• For the first 20 cycles (passes through the data set), the neighborhood size was
set at R = 2, and the learning rate was set to decay linearly starting at 𝜂 = 0.3
• Then, for the next 150 cycles, the neighborhood size was reset to R = 1, while
the learning rate was allowed to decay linearly from 𝜂 = 0.3 to 𝜂 = 0
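The two-phase schedule can be written down directly. This is a sketch: the text does not state the endpoint of the first phase's decay, so a linear decay toward 0 over the 20 cycles is assumed before 𝜂 resets to 0.3 for the second phase:

```python
def schedule(cycle):
    """Neighborhood size R and learning rate eta for a given training
    cycle, following the two-phase setup described above."""
    if cycle < 20:                                   # first 20 cycles
        return 2, 0.3 * (1 - cycle / 20)             # R = 2, eta decays from 0.3
    # next 150 cycles: R = 1, eta decays linearly from 0.3 to 0
    return 1, max(0.0, 0.3 * (1 - (cycle - 20) / 150))

# R and eta at a few checkpoints across the 170 cycles
print([(c, *schedule(c)) for c in (0, 10, 20, 95, 169)])
```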
A Bigger Example
• The net architecture is given below :
A Bigger Example
• As it turned out, the Kohonen algorithm used only six of the nine available output
nodes, as shown in the following figure with output nodes 01, 11, and 21 being
pruned
A Bigger Example - Interpretation
• Can we develop the Cluster profiles?
• Consider International Plan = Yes vs. International Plan = No
• International Plan = Yes customers reside exclusively in Nodes 12 and 22
A Bigger Example - Interpretation
• Can we develop the Cluster profiles?
• Consider VoiceMail Plan = Yes vs. VoiceMail Plan = No
• The three clusters along the bottom row (i.e., Cluster 00, Cluster 10, and Cluster 20)
contain only non-adopters of the VoiceMail Plan
• Clusters 02 and 12 contain only adopters of the VoiceMail Plan
• Cluster 22 contains mostly non-adopters along with some adopters of the VoiceMail Plan
A Bigger Example - Interpretation
• Recall that because of the neighborliness parameter, clusters that are closer
together should be more similar than clusters that are farther apart.
• Note that all International Plan adopters reside in contiguous (neighboring) clusters,
as do all non-adopters. The same holds for the VoiceMail Plan figure, except that
Cluster 22 contains a mixture
• We see that Cluster 12 represents a special subset of customers: those who have
adopted both the International Plan and the VoiceMail Plan. Even though this
subset represents only 2.4% of the customers, it is a well-defined one, and the
Kohonen network uncovered it
• Next we interpret the clusters according to the continuous predictors
A Bigger Example - Interpretation
• The following figure provides information about how the values of all the variables
are distributed among the clusters, with one column per cluster and one row per
variable
A Bigger Example - Interpretation
• The darker rows indicate the more important variables, that is, the variables that
proved more useful for discriminating among the clusters
• Consider Account Length_Z. Cluster 00 contains customers who tend to have been
with the company for a long time, that is, their account lengths tend to be on the
large side. Contrast this with Cluster 20, whose customers tend to be fairly new
• For the continuous variables, the data analyst should report the means for each
variable, for each cluster, along with an assessment of whether the difference in
means across the clusters is significant
• It is important that the means reported to the client appear on the original
(untransformed) scale, and not on the Z scale or min–max scale, so that the client
may better understand the clusters
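The back-transformation from the z scale is a one-liner. A minimal sketch follows; the overall mean and standard deviation shown are hypothetical placeholders, not values from the case study:

```python
import numpy as np

def destandardize(z_means, original_mean, original_sd):
    """Convert cluster means from the z scale back to the original scale
    via x = z * sd + mean, so the client sees untransformed units."""
    return np.asarray(z_means) * original_sd + original_mean

# Hypothetical z-scale cluster means for one variable (illustrative only)
z_cluster_means = [1.0, 0.1, -1.0]
print(destandardize(z_cluster_means, original_mean=100.0, original_sd=40.0))
```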
A Bigger Example - Interpretation
• The following table provides these means, along with the results of an analysis of
variance for assessing whether the difference in means across clusters is significant
A Bigger Example - Interpretation
• Each row contains the information for one numerical variable, with one analysis of
variance for each row. Each cell contains the cluster mean, standard deviation,
standard error (standard deviation∕√cluster count), and cluster count
• The degrees of freedom are df1 = k - 1 = 6 - 1 = 5 and df2 = N - k = 3333 - 6 = 3327
• The F-test statistic is the value of F = MSTR∕MSE for the analysis of variance for that
particular variable, and the Importance statistic is simply 1 − p-value, where
p-value = P(F > F-test statistic); a computational sketch follows this list
• Account length and the number of voice mail messages emerge as the two most important
numerical variables for discriminating among clusters
• Next, it is seen graphically that the account length for Cluster 00 is greater than that
of Cluster 20, which is supported by the statistics in the table above: a mean
account length of 141.508 days for Cluster 00 versus 61.707 days for Cluster 20
• Also, tiny Cluster 12 has the highest mean number of voice mail messages (31.662),
with Cluster 02 also having a large amount (29.229)
• Finally, note that the neighborliness of Kohonen clusters tends to make neighboring
clusters similar. It would have been surprising, for example, to find a cluster with a
141.508 mean account length right next to a cluster with a 61.707 mean account
length. In fact, this did not happen
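As a sketch of how the F statistic and the Importance measure could be computed for one variable, here is a one-way ANOVA using scipy.stats.f_oneway; the three cluster groups below are synthetic, generated only to make the example runnable:

```python
import numpy as np
from scipy import stats

def anova_importance(groups):
    """One-way ANOVA across cluster groups for a single variable.
    Returns F = MSTR/MSE, p-value = P(F > F_stat), and Importance = 1 - p."""
    f_stat, p_value = stats.f_oneway(*groups)
    return f_stat, p_value, 1 - p_value

# Synthetic account-length values for three hypothetical clusters
rng = np.random.default_rng(1)
groups = [rng.normal(141.5, 40, 100),   # long-tenured, like Cluster 00
          rng.normal(100.0, 40, 100),
          rng.normal(61.7, 40, 100)]    # newest, like Cluster 20
print(anova_importance(groups))
```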
A Bigger Example – Cluster Profiles
• Cluster 00: Loyal Non-Adopters. Belonging to neither the VoiceMail Plan nor the
International Plan, customers in large Cluster 00 have nevertheless been with the
company the longest, with by far the largest mean account length, which may be
related to the largest number of calls to customer service. This cluster exhibits the
lowest average minutes usage for day minutes and international minutes, and the
second lowest evening minutes and night minutes
• Cluster 02: Voice Mail Users. This large cluster contains members of the VoiceMail
Plan, with therefore a high mean number of voice mail messages, and no members
of the International Plan. Otherwise, the cluster tends toward the middle of the
pack for the other variables
• Cluster 10: Average Customers. Customers in this medium-sized cluster belong to
neither the Voice Mail Plan nor the International Plan. Except for the second-
largest mean number of calls to customer service, this cluster otherwise tends
toward the average values for the other variables
• Cluster 12: Power Customers. This smallest cluster contains customers who belong
to both the VoiceMail Plan and the International Plan. These sophisticated
customers also lead the pack in usage minutes across three categories and are in
second place in the other category. They also have the fewest average calls to
customer service. The company should keep a watchful eye on this cluster, as they
may represent a highly profitable group.
A Bigger Example – Cluster Profiles
• Cluster 20: Newbie Non-Adopters. Belonging to neither the VoiceMail
Plan nor the International Plan, customers in large Cluster 20 represent the
company’s newest customers, on average, with easily the shortest mean account
length. These customers set the pace with the highest mean night minutes
usage
• Cluster 22: International Plan Users. This small cluster contains members
of the International Plan and only a few members of the VoiceMail Plan.
The number of calls to customer service is second lowest, which may
mean that they need a minimum of hand-holding. Besides the lowest mean
night minutes usage, this cluster tends toward average values for the other
variables.
• Cluster profiles may of themselves be of actionable benefit to companies and
researchers; for example, they may suggest market segmentation strategies in an
era of shrinking budgets
• Rather than targeting the entire customer base for a mass mailing, for example,
perhaps only the most profitable customers may be targeted
• Another strategy is to identify those customers whose potential loss would be of
greater harm to the company, such as the customers in Cluster 12 above
• Finally, customer clusters could be identified that exhibit behavior predictive of
churning; intervention with these customers could save them for the company