An Weighting Dissimilarity Function of CLARANS for Clustering Spatial Data Education in Java

ICHWANUL MUSLIM KARO KARO, S.KOM

What’s special about spatial data mining
The 1854 Asiatic Cholera in London

Data in spatial data mining
• Non-spatial information
◦ The same as data in traditional data mining
◦ Categorical, numeric, Boolean, etc.
◦ E.g., postal code, city name, number of natural-disaster victims
• Spatial information
• Spatial attributes
◦ Neighborhood
◦ Location: longitude, latitude

Representation of spatial data
Raster: gridded space
Vector: point, line, polygon

Relationships on Data in Spatial Data Mining
Relationships on non-spatial data
• Explicit
• Arithmetic, ranking(ordering), etc.
Relationships on Spatial Data
• Many are implicit
• Relationship Categories
◦ Set-oriented: union, intersection, and membership, etc
◦ Topological: meet, within, overlap, etc
◦ Directional: North, NE, left, above, behind, etc
◦ Metric: e.g., Euclidean: distance, area, perimeter
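To make the metric category concrete, the following is a minimal Python sketch of the three quantities named above: Euclidean distance, area, and perimeter. It assumes polygons are given as ordered lists of planar (x, y) vertices; the function names are illustrative only, and longitude/latitude pairs would first need projecting to a planar coordinate system.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two planar (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def perimeter(polygon):
    """Sum of the edge lengths of a polygon given as ordered vertices."""
    n = len(polygon)
    return sum(euclidean(polygon[i], polygon[(i + 1) % n]) for i in range(n))

def area(polygon):
    """Area by the shoelace formula (simple, non-self-intersecting polygon)."""
    n = len(polygon)
    twice_signed = sum(
        polygon[i][0] * polygon[(i + 1) % n][1]
        - polygon[(i + 1) % n][0] * polygon[i][1]
        for i in range(n)
    )
    return abs(twice_signed) / 2.0

# Hypothetical example polygon: the unit square.
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
```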

Approaches to handling spatial attributes
• MV-approximation
• Calculating the exact separation distance
• IR approximation
• PDF (polygon dissimilarity function)

Polygon Dissimilarity Function
Given $P = \{P_1, P_2, \dots, P_n\}$, where $P$ is a set of polygons.
The non-spatial attributes of a polygon are all attributes that are independent of its spatial properties, e.g., average income or the number of houses damaged by a natural disaster.
The spatial attributes of a polygon are divided into two categories: intrinsic and extrinsic.
Intrinsic attributes describe the geometric characteristics of the polygon, such as location, shape, and area.
Extrinsic attributes cover the various spatial objects that may exist inside the polygon. There are three kinds of spatial object: point, line, and area.

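PDF between two polygons
$D_{PDF}(P_i, P_j) = w_{ns}\, d_{ns}(P_i, P_j) + w_s\, d_s(P_i, P_j)$
$w_{ns} + w_s = 1$
where ns denotes the non-spatial attributes and s the spatial attributes.
The distance between non-spatial attributes can be computed with the Euclidean or Manhattan distance.
Distance between spatial attributes:
$d_s(P_i, P_j) = w_{ins}\, d_{ins}(P_i, P_j) + w_{eks}\, d_{eks}(P_i, P_j)$
$w_{ins} + w_{eks} = 1$
where ins denotes the intrinsic attributes and eks the extrinsic attributes.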
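The two weighted sums above can be sketched in Python as follows. This is an illustration under assumptions, not the author's implementation: the function names are hypothetical, the component distances are assumed to be already normalised to a common scale, and each complementary weight is derived from the constraint that the pair sums to 1.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two non-spatial attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    """Manhattan distance between two non-spatial attribute vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def spatial_distance(d_ins, d_eks, w_ins):
    """d_s = w_ins * d_ins + w_eks * d_eks, with w_eks = 1 - w_ins."""
    if not 0.0 <= w_ins <= 1.0:
        raise ValueError("w_ins must lie in [0, 1]")
    return w_ins * d_ins + (1.0 - w_ins) * d_eks

def polygon_dissimilarity(d_ns, d_s, w_s):
    """D_PDF = w_ns * d_ns + w_s * d_s, with w_ns = 1 - w_s."""
    if not 0.0 <= w_s <= 1.0:
        raise ValueError("w_s must lie in [0, 1]")
    return (1.0 - w_s) * d_ns + w_s * d_s

# Hypothetical, already-normalised inputs for two polygons.
d_ns = euclidean((0.2, 0.4), (0.5, 0.8))                 # non-spatial part
d_s = spatial_distance(d_ins=0.2, d_eks=0.6, w_ins=0.5)  # spatial part
d_pdf = polygon_dissimilarity(d_ns, d_s, w_s=0.3)
```

Passing only one weight per pair keeps the sum-to-one constraint from being violated by accident.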

Design Process
• Start
• Data
• Attribute separation
• Handling spatial attributes; handling non-spatial attributes
• Similarity
• Clustering
• Evaluation
• End

Modified Dissimilarity
$D_{a,b} = w_{ns}\, d_{ns}(P_a, P_b) + w_s\, d_s(P_a, P_b)$
$D_{PDF}(P_i, P_j) = w_{ns}\, d_{ns}(P_i, P_j) + w_s\, d_s(P_i, P_j)$
Without considering the intrinsic and extrinsic attributes.

CLARANS
CLARANS algorithm
1. Input parameters numlocal and maxneighbor. Initialize i to 1, and mincost to a large number.
2. Set current to an arbitrary node in $G_{n,k}$.
3. Set j to 1.
4. Consider a random neighbor S of current, and based on Equation (5), calculate the cost differential of the two nodes.
5. If S has a lower cost, set current to S, and go to Step 3.
6. Otherwise, increment j by 1. If j ≤ maxneighbor, go to Step 4.
7. Otherwise, when j > maxneighbor, compare the cost of current with mincost. If the former is less than mincost, set mincost to the cost of current and set bestnode to current.
8. Increment i by 1. If i > numlocal, output bestnode and halt. Otherwise, go to Step 2.
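The steps above can be sketched in Python as shown here. This is a simplified illustration under stated assumptions: a node is a list of k medoid indices, a neighbor differs from it in exactly one medoid, and the cost of a node is the total dissimilarity of every object to its nearest medoid. For brevity the sketch recomputes the full cost of each neighbor instead of the cost differential. The names `total_cost`, `clarans`, and the `seed` parameter are hypothetical, and `dist` stands for any precomputed dissimilarity matrix, such as one built from the weighted function.

```python
import random

def total_cost(dist, medoids):
    """Total dissimilarity of every object to its nearest medoid."""
    return sum(min(dist[o][m] for m in medoids) for o in range(len(dist)))

def clarans(dist, k, numlocal, maxneighbor, seed=0):
    n = len(dist)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of objects")
    rng = random.Random(seed)
    best_node, min_cost = None, float("inf")            # Step 1
    for _ in range(numlocal):                           # Steps 1 and 8: i loop
        current = rng.sample(range(n), k)               # Step 2
        current_cost = total_cost(dist, current)
        j = 1                                           # Step 3
        while j <= maxneighbor:
            # Step 4: random neighbor, swapping one medoid for a non-medoid
            non_medoids = [o for o in range(n) if o not in current]
            neighbor = list(current)
            neighbor[rng.randrange(k)] = rng.choice(non_medoids)
            neighbor_cost = total_cost(dist, neighbor)
            if neighbor_cost < current_cost:            # Step 5
                current, current_cost = neighbor, neighbor_cost
                j = 1
            else:                                       # Step 6
                j += 1
        if current_cost < min_cost:                     # Step 7
            best_node, min_cost = current, current_cost
    return sorted(best_node), min_cost                  # Step 8

# Hypothetical toy data: six points on a line forming two obvious groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, cost = clarans(dist, k=2, numlocal=5, maxneighbor=200)
```

A larger maxneighbor makes each local search more thorough at the price of running time, while numlocal controls how many independent restarts are tried.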

Data
Ratio of students to classes on Java Island

[Figure: Comparison of silhouette index. x-axis: weight (0 to 1); y-axis: silhouette index (0 to 0.8); series: K=2, K=3]
[Figure: Comparison of running time. x-axis: weight (0 to 1); y-axis: time (s) (0 to 180); series: K=2, K=3]
Spatial weight   Silhouette index   Running time (s)
0.1              0.6909             128.828
0.2              0.5168             129.218
0.3              0.5940             129.028
0.4              0.5227             128.525
0.5              0.2783             130.221
0.6              0.4123             130.597
0.7              0.3740             131.458
0.8              0.4874             130.918
0.9              0.5321             135.227
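The slides do not say how the silhouette index reported above was computed. As a point of reference, here is a minimal Python sketch of the standard mean silhouette width, taken over a precomputed dissimilarity matrix. The function name and the toy data are hypothetical, and giving singleton clusters a silhouette of 0 is a common convention assumed here.

```python
def silhouette_index(dist, labels):
    """Mean silhouette width over all objects (values lie in [-1, 1])."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # assumed convention: s(i) = 0 for a singleton cluster
        # a: mean dissimilarity to the other members of the same cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean dissimilarity to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)

# Hypothetical toy data: two well-separated groups on a line.
toy_points = [0, 1, 10, 11]
toy_dist = [[abs(p - q) for q in toy_points] for p in toy_points]
toy_score = silhouette_index(toy_dist, [0, 0, 1, 1])
```

Values close to 1 indicate compact, well-separated clusters, which is how the silhouette column in the table can be read.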

[Figure: Influence of the number of clusters on computational time. x-axis: number of clusters (0 to 4.5); y-axis: running time (s) (0 to 250,000)]

Conclusion
The CLARANS algorithm can be combined with a weighted dissimilarity function to cluster spatial data. The weighted dissimilarity function is more efficient than a traditional dissimilarity function at handling spatial relationships. Increasing the number of clusters increases the computational time but does not make the clustering more efficient. The spatial weight strongly influences the result of spatial data clustering.
