1. CLUSTERING: A NEURAL NETWORK APPROACH
INDIAN INSTITUTE OF INFORMATION
TECHNOLOGY KALYANI (WB)
ANUJ KUMAR PATHAK
(CSE)
SHIV PRATAP
2. OUTLINE
• INTRODUCTION
• TYPES OF CLUSTERING
• COMPETITIVE LEARNING
• KOHONEN NETWORK (SOM) (COMPETITIVE LEARNING BASED CLUSTERING)
• K-MEANS CLUSTERING OR C-MEANS
• HIERARCHICAL CLUSTERING
• CONCLUSION
3. INTRODUCTION TO ANN
AN ANN IS AN INFORMATION-PROCESSING MODEL THAT WORKS LIKE THE HUMAN
NERVOUS SYSTEM AND CONSISTS OF A NUMBER OF ARTIFICIAL
NEURONS (PERCEPTRONS).
THE BASIC BUILDING BLOCK OF AN ARTIFICIAL NEURAL NETWORK IS THE
NEURON, WHICH IS A MATHEMATICAL MODEL/FUNCTION.
4. NEURAL NETWORK CLUSTERING
• NEURAL NETWORK CLUSTERING IS AN
UNSUPERVISED LEARNING TASK.
• BASICALLY, CLUSTERING IS THE
ORGANIZATION OF UNLABELLED
DATA (E.G. TWEETS, PICTURES, VIDEOS)
INTO SIMILAR GROUPS CALLED
CLUSTERS.
• A CLUSTER IS A COLLECTION OF DATA
ITEMS THAT ARE SIMILAR TO EACH
OTHER AND DISSIMILAR TO THE DATA
ITEMS OF OTHER CLUSTERS.
5. COMPETITIVE LEARNING
• COMPETITIVE LEARNING IS USEFUL FOR CLUSTERING INPUT PATTERNS INTO A
DISCRETE SET OF OUTPUT CLUSTERS.
• A COMPETITIVE LEARNING NETWORK CAN BE BUILT USING A TWO-LAYER (J-K)
NEURAL NETWORK.
• IN A COMPETITIVE LEARNING NETWORK, THE INPUT AND OUTPUT LAYERS ARE
FULLY CONNECTED.
• THE OUTPUT LAYER IS CALLED THE COMPETITIVE LAYER, WHEREIN LATERAL
CONNECTIONS ARE USED TO PERFORM LATERAL INHIBITION.
6. Continue..
Competitive learning is usually derived by minimizing the mean squared error (MSE)
$E = \frac{1}{N}\sum_{p=1}^{N} E_p$, where $E_p = \sum_{k} \mu_{kp}\,\lVert x_p - c_k \rVert^2$,
where $N$ is the size of the pattern set and $\mu_{kp}$ is the connection weight assigned to
prototype $c_k$ with respect to input $x_p$: $\mu_{kp} = 1$ when $c_k$ is the closest prototype
to $x_p$, and $0$ otherwise.
Assuming the weights are obtained by this nearest-prototype condition,
$E_p = \min_k \lVert x_p - c_k \rVert^2$,
which is the squared Euclidean distance between the input $x_p$ and its closest
prototype $c_k$ (see the sketch below).
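A minimal NumPy sketch of this winner-take-all scheme, assuming an online update in which only the winning prototype moves toward the input; the function names, the learning rate, and the update rule shown here are illustrative, not from the slides:

```python
import numpy as np

def competitive_step(prototypes, x, lr=0.1):
    # Winner-take-all: only the closest prototype (mu_kp = 1) is updated.
    k = np.argmin(np.sum((prototypes - x) ** 2, axis=1))
    prototypes[k] += lr * (x - prototypes[k])  # pull the winner toward x
    return k

def quantization_error(prototypes, data):
    # E = (1/N) * sum_p min_k ||x_p - c_k||^2, the objective from the slide.
    d2 = np.sum((data[:, None, :] - prototypes[None, :, :]) ** 2, axis=2)
    return d2.min(axis=1).mean()
```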
7. KOHONEN NETWORK (SOM)
• SOM IS VERY USEFUL FOR VECTOR QUANTIZATION (VQ), CLUSTERING ANALYSIS, FEATURE
EXTRACTION AND DATA VISUALIZATION. IT IS A COMPETITIVE LEARNING BASED MODEL/ALGORITHM.
• THE GOAL OF SOM IS TO TRANSFORM A HIGHER-DIMENSIONAL INPUT SPACE INTO A 1-D OR 2-D
DISCRETE MAP IN A TOPOLOGY-PRESERVING FASHION.
• SOM CONSISTS OF NODES (NEURONS); EACH NODE HAS A WEIGHT VECTOR OF THE SAME
DIMENSION AS THE INPUT DATA VECTORS AND A POSITION IN THE MAP SPACE (A HEXAGONAL OR
RECTANGULAR GRID), AND TOGETHER THE NODES FORM A FEEDFORWARD-LIKE NETWORK.
• THIS MAKES SOM USEFUL FOR VISUALIZING LOW-DIMENSIONAL VIEWS OF HIGH-DIMENSIONAL
DATA.
8. ALGORITHMIC STEPS:
• 1. RANDOMIZE THE NODE WEIGHT VECTORS IN THE MAP.
• 2. RANDOMLY PICK AN INPUT VECTOR D(T).
• 3. TRAVERSE EACH NODE IN THE MAP:
• I. USE THE EUCLIDEAN DISTANCE TO MEASURE THE
SIMILARITY BETWEEN THE INPUT VECTOR AND THE
MAP NODE'S WEIGHT VECTOR.
• II. TRACK THE NODE THAT PRODUCES THE SMALLEST
DISTANCE (THIS NODE IS KNOWN AS THE BEST
MATCHING UNIT, OR BMU).
• III. UPDATE THE NODES IN THE NEIGHBOURHOOD OF
THE BMU, PULLING THEM TOWARDS THE INPUT VECTOR.
• 4. REPEAT FROM STEP 2, SHRINKING THE LEARNING RATE
AND NEIGHBOURHOOD OVER TIME (SEE THE SKETCH BELOW).
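A minimal NumPy sketch of these steps, assuming a rectangular grid, a Gaussian neighbourhood, and exponentially decaying learning rate and radius; all function and parameter names and defaults are illustrative assumptions:

```python
import numpy as np

def train_som(data, grid_h=10, grid_w=10, n_iter=1000, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    # Step 1: randomize the node weight vectors in the map.
    weights = rng.random((grid_h, grid_w, dim))
    # Each node's (row, col) position on the grid, for neighbourhood distances.
    grid = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                                indexing="ij"), axis=-1)
    for t in range(n_iter):
        # Step 4: shrink the learning rate and neighbourhood radius over time.
        lr = lr0 * np.exp(-t / n_iter)
        sigma = sigma0 * np.exp(-t / n_iter)
        # Step 2: randomly pick an input vector d(t).
        d = data[rng.integers(len(data))]
        # Steps 3.i-ii: Euclidean distance to every node; the BMU minimizes it.
        dists = np.linalg.norm(weights - d, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Step 3.iii: update nodes near the BMU, weighted by a Gaussian kernel.
        grid_dist2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_dist2 / (2 * sigma ** 2))[..., None]
        weights += lr * h * (d - weights)
    return weights
```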
9. TYPES OF CLUSTERING
THERE ARE THREE TYPES OF CLUSTERING:
1:- PARTITIONAL CLUSTERING (DYNAMIC FORM)
2:- HIERARCHICAL CLUSTERING (STATIC FORM)
3:- DENSITY-BASED CLUSTERING
10. PARTITIONAL CLUSTERING
• IT IS SIMPLY A DIVISION OF THE SET OF DATA OBJECTS INTO NON-OVERLAPPING
CLUSTERS SUCH THAT EACH OBJECT IS IN EXACTLY ONE SUBSET.
• IT IS DYNAMIC CLUSTERING: POINTS CAN MOVE FROM ONE CLUSTER TO ANOTHER.
• KNOWLEDGE OF CLUSTER SHAPE OR SIZE CAN BE INCORPORATED INTO THE
DISTANCE MEASURE.
• IT IS SUSCEPTIBLE TO LOCAL MINIMA OF ITS OBJECTIVE FUNCTION.
• THE NUMBER OF CLUSTERS MUST BE PREDEFINED.
• EXAMPLE: K-MEANS (CENTROID) CLUSTERING.
11. K-MEANS OR CENTROID CLUSTERING
• K-MEANS CLUSTERING IS A SPECIAL CASE OF SOM (A SOM WHOSE
NEIGHBOURHOOD SHRINKS TO THE WINNING NODE ALONE).
• K-MEANS CLUSTERING IS AN UNSUPERVISED LEARNING ALGORITHM THAT
TRIES TO CLUSTER DATA BASED ON SIMILARITY.
• WE SPECIFY THE NUMBER OF CLUSTERS WE WANT THE DATA TO BE GROUPED
INTO.
• K-MEANS CLUSTERING COMES IN TWO MODES:
1:- BATCH MODE: THE WHOLE TRAINING DATA SET IS FIXED; ALL POINTS ARE
ASSIGNED BY THE NEAREST-NEIGHBOUR RULE BEFORE THE CENTROIDS ARE UPDATED.
2:- INCREMENTAL MODE: THE TRAINING DATA SET CAN GROW; CENTROIDS ARE
UPDATED ONE SAMPLE AT A TIME (SEE THE SKETCH BELOW).
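A minimal sketch of the incremental mode, assuming one centroid update per incoming sample with a running-mean step size; the function name and signature are hypothetical:

```python
import numpy as np

def online_kmeans_update(centroids, counts, x):
    # Nearest-neighbour rule: assign the new sample to its closest centroid.
    k = np.argmin(np.linalg.norm(centroids - x, axis=1))
    counts[k] += 1
    # Move that centroid toward x; 1/counts[k] keeps it the running mean.
    centroids[k] += (x - centroids[k]) / counts[k]
    return k
```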
12. K-MEANS CLUSTERING ALGORITHM
STEP 1:- SPECIFY THE NUMBER OF CLUSTERS YOU WANT.
STEP 2:- RANDOMLY ASSIGN EACH OBSERVATION TO A CLUSTER, AND FIND THE
CENTROID OF EACH CLUSTER.
STEP 3:- REPEAT STEPS 4 AND 5 UNTIL NO VARIATION IS FOUND (THE ASSIGNMENTS
STOP CHANGING).
STEP 4:- REASSIGN EACH DATA POINT TO THE CLUSTER WHOSE CENTROID IS CLOSEST.
STEP 5:- CALCULATE THE NEW CENTROID OF EACH CLUSTER (TAKING THE MEAN OF ALL
DATA POINTS IN THE CLUSTER) (SEE THE SKETCH BELOW).
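A minimal NumPy sketch of these steps, assuming centroids are seeded from k random points (a common alternative to the slide's random-assignment initialization); names and defaults are illustrative:

```python
import numpy as np

def kmeans(data, k=3, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 1-2: choose k and initialize centroids (here: k random points).
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 4: reassign each point to the cluster with the closest centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 5: recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 3: stop when no variation is found (centroids stop moving).
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```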
13. To select the number of clusters we use the elbow method: run k-means for a range of k, plot the within-cluster sum of squared errors (SSE) against k, and pick the k at the "elbow", where adding more clusters stops reducing the error substantially (see the sketch below).
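A short sketch of the elbow computation using scikit-learn, whose `inertia_` attribute is the within-cluster SSE; the data here is a random placeholder:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((300, 2))  # placeholder data; substitute your own

# Fit k-means for a range of k and record the within-cluster SSE (inertia).
sse = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
       for k in range(1, 11)}
for k, v in sse.items():
    print(k, round(v, 2))  # look for the 'elbow' where the drop flattens out
```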
14. HIERARCHICAL CLUSTERING
• IT IS ALSO KNOWN AS 'NESTED
CLUSTERING', AS IT ALLOWS CLUSTERS
TO EXIST WITHIN BIGGER CLUSTERS,
FORMING A TREE (DENDROGRAM).
• IT CONSISTS OF A SEQUENCE OF
PARTITIONS IN A HIERARCHICAL
STRUCTURE.
• IT IS STATIC (A POINT COMMITTED TO
ONE CLUSTER DOES NOT MOVE).
15. TYPES OF HIERARCHICAL CLUSTERING
• 1:- AGGLOMERATIVE CLUSTERING
CLUSTERS ARE MERGED PAIRWISE ON THE BASIS OF SOME SIMILARITY PROPERTY;
THIS IS ALSO CALLED THE 'BOTTOM-UP' APPROACH.
• 2:- DIVISIVE CLUSTERING
ALL OBSERVATIONS START IN ONE CLUSTER, AND SPLITS ARE PERFORMED
RECURSIVELY AS ONE MOVES DOWN THE HIERARCHY;
THIS IS A 'TOP-DOWN' APPROACH.
16. AGGLOMERATIVE CLUSTERING ALGORITHM
STEP 1: START BY ASSIGNING EACH ITEM TO ITS OWN CLUSTER, SO THAT IF YOU HAVE
N ITEMS YOU START WITH N SINGLETON CLUSTERS.
STEP 2: FIND THE CLOSEST (MOST SIMILAR) PAIR OF CLUSTERS AND MERGE
THEM INTO A SINGLE CLUSTER, SO THAT YOU NOW HAVE ONE CLUSTER LESS.
STEP 3: COMPUTE DISTANCES (SIMILARITIES) BETWEEN THE NEW CLUSTER AND
EACH OF THE OLD CLUSTERS.
STEP 4: REPEAT STEPS 2 AND 3 UNTIL ALL ITEMS ARE CLUSTERED INTO A
SINGLE CLUSTER OF SIZE N.
THE CHOICE OF DISTANCE IN STEP 3 IS WHAT DISTINGUISHES THE DIFFERENT TYPES OF
AGGLOMERATIVE CLUSTERING (SEE THE SKETCH BELOW).
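A naive sketch of steps 1-4, assuming pairwise cluster distances are simply recomputed on every pass (which subsumes step 3); the `linkage` argument previews the next slide, and all names here are illustrative:

```python
import numpy as np

def agglomerative(data, linkage="single"):
    # Step 1: every item starts in its own singleton cluster.
    clusters = [[i] for i in range(len(data))]
    merges = []

    def cluster_dist(a, b):
        # All pairwise point distances between the two clusters.
        d = np.linalg.norm(data[a][:, None, :] - data[b][None, :, :], axis=2)
        if linkage == "single":
            return d.min()      # closest pair of points
        if linkage == "complete":
            return d.max()      # farthest pair of points
        return d.mean()         # average linkage

    while len(clusters) > 1:
        # Step 2: find and merge the closest pair of clusters.
        pairs = [(cluster_dist(np.array(clusters[i]), np.array(clusters[j])), i, j)
                 for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        # Steps 3-4: distances to the merged cluster are recomputed next pass,
        # and the loop repeats until one cluster of size N remains.
    return merges
```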
17. CONTINUE…
THE 3RD STEP CAN BE DONE VIA SINGLE LINKAGE, COMPLETE LINKAGE OR
AVERAGE LINKAGE:
SINGLE LINKAGE: CALCULATES THE INTER-CLUSTER DISTANCE USING THE
CLOSEST DATA POINTS IN THE TWO CLUSTERS.
COMPLETE LINKAGE: CALCULATES THE INTER-CLUSTER DISTANCE USING THE
FARTHEST DATA POINTS IN THE TWO CLUSTERS.
AVERAGE LINKAGE: EQUAL TO THE AVERAGE DISTANCE FROM ANY MEMBER
OF ONE CLUSTER TO ANY MEMBER OF THE OTHER CLUSTER (SEE THE SKETCH BELOW).
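For comparison, the same three linkage criteria via SciPy's hierarchical-clustering routines; the data here is a random placeholder:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.random((20, 2))  # placeholder data; substitute your own

# Build the merge history under each linkage criterion, then cut the
# resulting tree (dendrogram) into 3 flat clusters.
for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)
    labels = fcluster(Z, t=3, criterion="maxclust")
    print(method, labels)
```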
18. COMPARISON BETWEEN K-MEANS AND HIERARCHICAL CLUSTERING
HIERARCHICAL CLUSTERING
• TIME COMPLEXITY IS O(N²)
• THE NUMBER OF CLUSTERS NEED NOT BE
SPECIFIED IN ADVANCE
• NO LOCAL-MINIMUM /
INITIALIZATION PROBLEM
• NO PRIOR KNOWLEDGE OF CLUSTER SHAPE
OR SIZE IS REQUIRED
K-MEANS CLUSTERING
• TIME COMPLEXITY IS O(N) PER ITERATION
• TENDS TO PRODUCE TIGHTER CLUSTERS
• THE NUMBER OF CLUSTERS MUST BE
SPECIFIED IN ADVANCE
• SUFFERS FROM THE LOCAL-MINIMUM /
INITIALIZATION PROBLEM
• PRIOR KNOWLEDGE OF CLUSTER SHAPE AND
SIZE IS REQUIRED