1. Graph-based cluster labeling using
Growing Hierarchal SOM
Prepared by:
Mahmoud Rafeek Alfarra
College Of Science & Technology
m.farra@cst.ps
Ayman Shehda Ghabayen
College Of Science & Technology
a.ghabayen@cst.ps
The second International conference of Applied Science & natural
2. Outline
Labeling, What and why?
Graph based Representation
Growing Hierarchal SOM
Extraction of cluster labels
3. Labeling, What and why?
Cluster labeling: the process of selecting
descriptive labels (keywords) for the clusters
produced by a clustering algorithm.
4. Labeling, What and why?
Cluster labeling is an increasingly important
task because:
1. Document collections keep growing larger.
2. It helps with processing news,
email threads, blogs, reviews, and
search results.
5. Labeling, What and why?
[Figure: overview of the approach. A documents collection goes through a preprocessing step and is represented with the DIG model; the clustering process plus labeling then yields labeled clusters. The clustering panels are titled "SOM" and "Hierarchal Growing SOM" and show neuron graphs G0, G1, ..., Gs at layers 0, 1 and 2.]
7. Graph based Representation
Captures the salient features of the data.
DIG Model: a directed graph.
A document is represented as a vector of sentences.
Phrase indexing information is stored in the graph
nodes themselves in the form of document tables.
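[Figure: DIG fragment in which edges e0, e1 and e2 connect the node "river" to the nodes "rafting", "adventures" and "fishing", together with a document table stored in a node.]
Document Table
Doc   TF        ET
1     {0,0,3}   e0: S1(1), S2(2), S3(1)
2     {0,0,2}   e0: S2(1); e2: S1(2)
3     {0,0,1}   e1: S4(1)
Notation: in S1(2), 1 is the sentence number and 2 is the position of the term.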
8. Graph based Representation
Example
Document 1
River rafting
Mild river rafting
River rafting trips
Document 2
Wild river adventures
River rafting vacation plan
Document 3
Fishing trips
Fishing vacation plan
Booking fishing trips
River fishing
[Figure: the graph grows as documents are added. After Document 1 its nodes are mild, river, rafting, trips; Document 2 adds wild, adventures, vacation, plan; Document 3 adds booking, fishing.]
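The incremental construction in this example can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name build_dig and the data layout are hypothetical, terms are simply lower-cased and split on whitespace, and each edge entry records the position of the first term of the phrase, which is an assumption.

```python
from collections import defaultdict


def build_dig(documents):
    """Build a small DIG-like structure (hypothetical layout).

    documents: dict mapping doc id -> list of sentences (strings).
    Returns (nodes, edges): nodes is a set of terms; edges maps an ordered
    term pair (u, v) to a list of (doc_id, sentence_no, position) entries,
    where position is the 1-based position of u in the sentence (assumed).
    """
    nodes = set()
    edges = defaultdict(list)
    for doc_id, sentences in documents.items():
        for s_no, sentence in enumerate(sentences, start=1):
            terms = sentence.lower().split()
            nodes.update(terms)
            # Each pair of consecutive terms becomes a directed edge (a phrase).
            for pos in range(len(terms) - 1):
                edge = (terms[pos], terms[pos + 1])
                edges[edge].append((doc_id, s_no, pos + 1))
    return nodes, dict(edges)


docs = {
    1: ["River rafting", "Mild river rafting", "River rafting trips"],
    2: ["Wild river adventures", "River rafting vacation plan"],
    3: ["Fishing trips", "Fishing vacation plan",
        "Booking fishing trips", "River fishing"],
}
nodes, edges = build_dig(docs)
for entry in edges[("river", "rafting")]:
    print(entry)
```

Under these assumptions, the entries stored for the edge river -> rafting list every sentence in which that phrase occurs, which is the kind of information the document tables keep per node.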
10. Growing Hierarchal SOM
Determining the winning node
[Figure: a neuron graph Gs (one of the n nodes in the SOM) with vertices v1 to v7 and edges e0 to e5, next to an input document graph Gi with vertices v1, v2, v5, v6, v7 and edges e0, e1, e3, e5.]
Matching measure: PhrasesSignificance(Gi, Gs) / length(Gi)
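A minimal sketch of winner selection follows. The slide only names the measure (phrases significance of Gi and Gs over the length of Gi), so two points here are assumptions: phrase significance is taken to be the number of edges the two graphs share, and the length of Gi is taken to be its number of edges. The function names are hypothetical.

```python
def phrase_significance(gi_edges, gs_edges):
    """Assumed measure: number of phrases (edges) shared by Gi and Gs."""
    return len(set(gi_edges) & set(gs_edges))


def find_winner(gi_edges, neuron_graphs):
    """Return (index, score) of the neuron graph that best matches Gi."""
    length = len(set(gi_edges))  # assumed: length of Gi = its edge count
    if length == 0:
        raise ValueError("input document graph has no edges")
    scores = [phrase_significance(gi_edges, gs) / length
              for gs in neuron_graphs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


gi = {("river", "rafting"), ("rafting", "trips"), ("mild", "river")}
neurons = [
    {("river", "fishing"), ("fishing", "trips")},
    {("river", "rafting"), ("rafting", "trips"), ("wild", "river")},
]
winner, score = find_winner(gi, neurons)
print(winner, round(score, 3))
```

In this toy input the second neuron graph shares two of the three phrases of Gi, so it is selected as the winner.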
11. Growing Hierarchal SOM
Neuron updating in the graph domain
[Figure: two graphs, G1 and G2, over the nodes A, B, C, D together with X, Y and E and the edges e0 to e5, illustrating how a neuron graph is updated.]
We update a graph by increasing its matching phrases (edges)
rather than its terms (nodes), because the effect is stronger.
Adding a matching phrase can also be seen as adding an ordered pair
of nodes.
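One way to picture this update is the sketch below. It is not the authors' update rule: the function update_neuron and its rate parameter are hypothetical, and the choice to copy a fixed fraction of the input phrases that the neuron lacks is an assumption made only to show edges, rather than isolated nodes, being added.

```python
import math


def update_neuron(neuron_edges, input_edges, rate=0.5):
    """Return a new edge set for the neuron, moved towards the input graph.

    A fraction `rate` (hypothetical parameter) of the input phrases missing
    from the neuron is added; sorting keeps the choice deterministic.
    """
    missing = sorted(set(input_edges) - set(neuron_edges))
    k = math.ceil(rate * len(missing))
    # Each added phrase is an ordered pair of nodes, so its terms come along.
    return set(neuron_edges) | set(missing[:k])


neuron = {("a", "b"), ("b", "c")}
doc_graph = {("a", "b"), ("b", "d"), ("d", "c"), ("x", "a")}
updated = update_neuron(neuron, doc_graph, rate=0.5)
print(len(neuron & doc_graph), "->", len(updated & doc_graph))
```

After the update the neuron shares more phrases with the input graph than before, which is the intended effect of increasing the matching phrases.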
13. Extracting labels of clusters
To extract the keywords, we build a table
for each cluster with the following columns:
Term   TF-Locations {T, L, B, b}   No. of matching phrases (MP)   Weight
Weight = (f1*T + f2*L + f3*B+ f4*b) * 0.4 + MP * 0.6
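The weight formula can be applied per term and the highest-weighted terms taken as labels, as in the sketch below. The 0.4 and 0.6 coefficients come from the slide; the values of f1 to f4, the sample counts and the helper names are made up for illustration, since the slide does not give them.

```python
def term_weight(tf, mp, factors=(0.4, 0.3, 0.2, 0.1)):
    """Weight = (f1*T + f2*L + f3*B + f4*b) * 0.4 + MP * 0.6.

    tf: the four location counts (T, L, B, b) of a term.
    mp: number of matching phrases the term takes part in.
    factors: f1..f4; these default values are hypothetical.
    """
    t, l, big_b, small_b = tf
    f1, f2, f3, f4 = factors
    location_score = f1 * t + f2 * l + f3 * big_b + f4 * small_b
    return location_score * 0.4 + mp * 0.6


def top_labels(table, k=2):
    """Return the k terms with the highest weight in a cluster table."""
    ranked = sorted(table, key=lambda term: term_weight(*table[term]),
                    reverse=True)
    return ranked[:k]


# Made-up cluster table: term -> ((T, L, B, b), MP)
cluster_table = {
    "river": ((1, 0, 2, 5), 4),
    "rafting": ((0, 1, 1, 3), 3),
    "plan": ((0, 0, 0, 2), 0),
}
print(top_labels(cluster_table, k=2))
```

Because MP carries the larger coefficient, terms that take part in many matching phrases tend to rank above terms that are merely frequent.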