5. Data in Spatial Data Mining
Non-spatial information
• Same as the data in traditional data mining
• Categorical, numeric, Boolean, etc.
• E.g., postal code, city name, number of victims of a natural disaster
Spatial information
• Spatial attributes
• Neighborhood
• Location: longitude, latitude
7. Relationships on Data in Spatial Data Mining
Relationships on non-spatial data
• Explicit
• Arithmetic, ranking (ordering), etc.
Relationships on Spatial Data
• Many are implicit
• Relationship Categories
◦ Set-oriented: union, intersection, membership, etc.
◦ Topological: meet, within, overlap, etc.
◦ Directional: north, NE, left, above, behind, etc.
◦ Metric (e.g., Euclidean): distance, area, perimeter
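The metric relationships listed above can be illustrated with a small sketch. It assumes planar (x, y) coordinates and simple, non-self-intersecting polygons; raw longitude/latitude values would first need to be projected. The function names are illustrative only and do not come from the slides.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def polygon_area(vertices):
    """Area of a simple polygon using the shoelace formula."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def polygon_perimeter(vertices):
    """Sum of the edge lengths, including the closing edge."""
    n = len(vertices)
    return sum(
        euclidean_distance(vertices[i], vertices[(i + 1) % n]) for i in range(n)
    )

# A 4 x 3 rectangle in planar units (assumed example data).
rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(polygon_area(rectangle), polygon_perimeter(rectangle))
print(euclidean_distance((0, 0), (4, 3)))
```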
8. Approaches to Handling Spatial Attributes
• MV-approximation
• Calculating the exact separation distance
• IR-approximation
• PDF (Polygon Dissimilarity Function)
9. Polygon Dissimilarity Function
Given $P = \{P_1, P_2, \dots, P_n\}$, where $P$ is a set of polygons.
The non-spatial attributes of a polygon are all attributes that are independent
of the polygon's spatial properties, e.g., average income or the number of houses
damaged by a natural disaster.
The spatial attributes of a polygon are divided into two categories: intrinsic and extrinsic.
Intrinsic attributes describe the polygon's geometric characteristics, such as
location, shape, and area.
Extrinsic attributes cover the spatial objects that may exist inside the
polygon. There are three kinds of spatial object: point, line, and area.
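10. PDF between Two Polygons
$$D_{PDF}(P_i, P_j) = w_{ns}\, d_{ns}(P_i, P_j) + w_s\, d_s(P_i, P_j), \qquad w_{ns} + w_s = 1$$
The distance between non-spatial attributes, $d_{ns}$, can be computed with the Euclidean or Manhattan distance.
The distance between spatial attributes, $d_s$, combines an intrinsic term and an extrinsic term (subscripts ins and eks):
$$d_s(P_i, P_j) = w_{ins}\, d_{ins}(P_i, P_j) + w_{eks}\, d_{eks}(P_i, P_j), \qquad w_{ins} + w_{eks} = 1$$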
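A minimal sketch of the weighted combination follows. The slides do not say how the intrinsic and extrinsic distances are computed, so they are passed in as precomputed numbers here. The function names, the default weights of 0.5, and the assumption that all attributes are already normalized to a comparable scale are illustrative choices, not part of the source.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def spatial_distance(d_ins, d_eks, w_ins=0.5):
    """d_s = w_ins * d_ins + w_eks * d_eks, with w_ins + w_eks = 1."""
    if not 0.0 <= w_ins <= 1.0:
        raise ValueError("w_ins must lie in [0, 1]")
    w_eks = 1.0 - w_ins
    return w_ins * d_ins + w_eks * d_eks

def pdf_distance(nonspatial_i, nonspatial_j, d_ins, d_eks, w_ns=0.5, w_ins=0.5):
    """D_PDF = w_ns * d_ns + w_s * d_s, with w_ns + w_s = 1.

    d_ins and d_eks are assumed to be precomputed intrinsic and extrinsic
    distances between the two polygons.
    """
    if not 0.0 <= w_ns <= 1.0:
        raise ValueError("w_ns must lie in [0, 1]")
    w_s = 1.0 - w_ns
    d_ns = euclidean(nonspatial_i, nonspatial_j)  # Manhattan would also fit
    d_s = spatial_distance(d_ins, d_eks, w_ins)
    return w_ns * d_ns + w_s * d_s

# Two polygons with normalized non-spatial attributes (assumed example data).
d = pdf_distance((0.2, 0.4), (0.5, 0.8), d_ins=0.2, d_eks=0.6)
print(d)
```

Setting w_ns to 1 reduces the function to a purely non-spatial distance, which makes it easy to compare against a traditional dissimilarity measure.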
13. CLARANS
CLARANS Algorithm
Step 1. Input parameters numlocal and maxneighbor. Initialize i to 1, and
mincost to a large number.
Step 2. Set current to an arbitrary node in $G_{n,k}$.
Step 3. Set j to 1.
Step 4. Consider a random neighbor S of current, and based on Equation (5) of the
original CLARANS paper, calculate the cost differential of the two nodes.
Step 5. If S has a lower cost, set current to S, and go to Step 3.
Step 6. Otherwise, increment j by 1. If j ≤ maxneighbor, go to Step 4.
Step 7. Otherwise, when j > maxneighbor, compare the cost of current with
mincost. If the former is less than mincost, set mincost to the cost of
current and set bestnode to current.
Step 8. Increment i by 1. If i > numlocal, output bestnode and halt.
Otherwise, go to Step 2.
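The loop structure above can be sketched as follows. This is a simplified illustration, not the reference implementation: a node is taken to be a set of k medoids, a neighbor differs from it in exactly one medoid, and the full clustering cost is recomputed for each neighbor instead of the cheaper cost differential. The function names, the `dist` callback, and the `seed` parameter are assumptions made for the example, and k must be smaller than n.

```python
import random

def total_cost(medoids, n, dist):
    """Sum over all objects of the distance to the nearest medoid."""
    return sum(min(dist(obj, m) for m in medoids) for obj in range(n))

def clarans(n, k, dist, numlocal=2, maxneighbor=20, seed=0):
    """Return (sorted medoid indices, cost) for n objects and k clusters."""
    rng = random.Random(seed)
    mincost = float("inf")                      # Step 1
    bestnode = None
    for _ in range(numlocal):                   # Steps 1 and 8: i = 1..numlocal
        current = rng.sample(range(n), k)       # Step 2: arbitrary node
        current_cost = total_cost(current, n, dist)
        j = 1                                   # Step 3
        while j <= maxneighbor:
            # Step 4: random neighbor, swapping one medoid for a non-medoid.
            neighbor = current.copy()
            non_medoids = [o for o in range(n) if o not in current]
            neighbor[rng.randrange(k)] = rng.choice(non_medoids)
            neighbor_cost = total_cost(neighbor, n, dist)
            if neighbor_cost < current_cost:    # Step 5: move and reset j
                current, current_cost = neighbor, neighbor_cost
                j = 1
            else:                               # Step 6: try another neighbor
                j += 1
        if current_cost < mincost:              # Step 7: keep best local optimum
            mincost, bestnode = current_cost, current
    return sorted(bestnode), mincost

# Six points on a line forming two obvious groups (assumed example data).
points = [0, 1, 2, 10, 11, 12]
medoids, cost = clarans(
    len(points), 2, lambda a, b: abs(points[a] - points[b]),
    numlocal=3, maxneighbor=200,
)
print(medoids, cost)
```

Because `dist` is passed in as a callback, a weighted polygon dissimilarity can be plugged in without changing the search loop.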
17. Conclusion
The CLARANS algorithm can be combined with a weighted dissimilarity function to solve spatial data
clustering. A weighted dissimilarity function handles spatial relationships more effectively than a
traditional dissimilarity function. Increasing the number of clusters increases computation
time but does not improve the efficiency of the clustering. The spatial weighting strongly influences the result of spatial data clustering.