An Weighting Dissimilarity Function of CLARANS for
Clustering Spatial Data Education in Java
ICHWANUL MUSLIM KARO KARO, S.KOM
What is Spatial Data?
What’s special about spatial data mining?
The 1854 Asiatic Cholera in London
Data in Spatial Data Mining
Non-spatial information
• Same as the data in traditional data mining
• Categorical, numeric, Boolean, etc.
• E.g. postal code, city name, number of natural disaster victims
Spatial information
Spatial attributes
• Neighborhood
• Location: longitude, latitude
Representation of spatial data
Raster: gridded space
Vector: point, line, polygon
Relationships on Data in Spatial Data Mining
Relationships on non-spatial data
• Explicit
• Arithmetic, ranking(ordering), etc.
Relationships on Spatial Data
• Many are implicit
• Relationship Categories
◦ Set-oriented: union, intersection, and membership, etc
◦ Topological: meet, within, overlap, etc
◦ Directional: North, NE, left, above, behind, etc
◦ Metric: e.g., Euclidean: distance, area, perimeter
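As a small illustration of the metric relationships listed above, the following Python sketch computes the perimeter and area of a simple polygon given as a list of (x, y) vertices. The function names and the vertex-list representation are assumptions made for this example and do not come from the slides. The area uses the standard shoelace formula and assumes planar coordinates rather than raw longitude/latitude.

```python
import math


def polygon_perimeter(vertices):
    """Sum of Euclidean edge lengths of a closed polygon."""
    n = len(vertices)
    return sum(
        math.dist(vertices[i], vertices[(i + 1) % n])
        for i in range(n)
    )


def polygon_area(vertices):
    """Area of a simple (non self-intersecting) polygon, shoelace formula."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    # abs() makes the result independent of vertex orientation.
    return abs(twice_area) / 2.0


# Hypothetical example polygon: a unit square.
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
square_perimeter = polygon_perimeter(unit_square)
square_area = polygon_area(unit_square)
```

Quantities like these are examples of what the slides later call intrinsic attributes of a polygon.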
Approaches to Handling Spatial Attributes
• MV-approximation
• Calculating the exact separation distance
• IR-approximation
• PDF (polygon dissimilarity function)
Polygon Dissimilarity Function
Given $P = \{P_1, P_2, \ldots, P_n\}$, where $P$ is a set of polygons.
The non-spatial attributes of a polygon are all attributes that are independent of its spatial properties, e.g. average income or the number of houses damaged by a natural disaster.
The spatial attributes of a polygon fall into two categories, intrinsic and extrinsic.
Intrinsic attributes describe the geometric characteristics of the polygon, such as location, shape, and area.
Extrinsic attributes cover the spatial objects that may exist inside the polygon. There are three kinds of spatial object: point, line, and area.
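PDF between two polygons
$$D_{PDF}(P_i, P_j) = w_{ns}\, d_{ns}(P_i, P_j) + w_s\, d_s(P_i, P_j)$$
$$w_{ns} + w_s = 1$$
The distance between non-spatial attributes, $d_{ns}$, can be computed with the Euclidean or Manhattan distance.
Distance between spatial attributes
$$d_s(P_i, P_j) = w_{ins}\, d_{ins}(P_i, P_j) + w_{eks}\, d_{eks}(P_i, P_j)$$
$$w_{ins} + w_{eks} = 1$$
Here the subscripts ns and s denote the non-spatial and spatial parts, and ins and eks denote the intrinsic and extrinsic attributes.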
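The weighted combination above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the function names are hypothetical, and only one weight of each pair is passed because the other follows from the sum-to-one constraint. The component distances are assumed to be already on comparable scales (for example normalized to [0, 1]); the slides do not state how this is done.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan distance between two equal-length numeric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def _check_weight(w):
    if not 0.0 <= w <= 1.0:
        raise ValueError("weight must lie in [0, 1]")


def spatial_distance(d_ins, d_eks, w_ins):
    """d_s = w_ins * d_ins + w_eks * d_eks, with w_eks = 1 - w_ins."""
    _check_weight(w_ins)
    return w_ins * d_ins + (1.0 - w_ins) * d_eks


def pdf_distance(d_ns, d_s, w_s):
    """D_PDF = w_ns * d_ns + w_s * d_s, with w_ns = 1 - w_s."""
    _check_weight(w_s)
    return (1.0 - w_s) * d_ns + w_s * d_s


# Hypothetical component distances for two polygons P_i and P_j.
d_ns = euclidean((0.2, 0.4), (0.5, 0.8))   # non-spatial attributes
d_s = spatial_distance(d_ins=0.3, d_eks=0.7, w_ins=0.5)
d_pdf = pdf_distance(d_ns, d_s, w_s=0.25)
```

Passing a single weight per level keeps the constraint satisfied by construction, so a sweep over the spatial weight only needs to vary `w_s`.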
Design Process
Start → Data → Attribute separation → Handling spatial attributes / Handling non-spatial attributes → Similarity → Clustering → Evaluation → End
Modified Dissimilarity
$$D_{a,b} = w_{ns}\, d_{ns}(P_a, P_b) + w_s\, d_s(P_a, P_b)$$
$$D_{PDF} = w_{ns}\, d_{ns}(P_i, P_j) + w_s\, d_s(P_i, P_j)$$
The intrinsic and extrinsic attributes are not considered.
CLARANS
CLARANS Algorithm
1. Input parameters numlocal and maxneighbor. Initialize i to 1, and mincost to a large number.
2. Set current to an arbitrary node in $G_{n,k}$.
3. Set j to 1.
4. Consider a random neighbor S of current, and based on Equation (5), calculate the cost differential of the two nodes.
5. If S has a lower cost, set current to S, and go to Step 3.
6. Otherwise, increment j by 1. If j ≤ maxneighbor, go to Step 4.
7. Otherwise, when j > maxneighbor, compare the cost of current with mincost. If the former is less than mincost, set mincost to the cost of current and set bestnode to current.
8. Increment i by 1. If i > numlocal, output bestnode and halt. Otherwise, go to Step 2.
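The steps above can be sketched in Python over a precomputed dissimilarity matrix, which is where a weighted dissimilarity function would plug in. This is a simplified sketch under stated assumptions: a node is a list of k medoid indices, a neighbor differs from it in exactly one medoid, and the full cost of each neighbor is recomputed instead of the cost differential. The names `total_cost` and `clarans` are chosen for this example.

```python
import random


def total_cost(medoids, dist):
    """Sum over all objects of the dissimilarity to the nearest medoid."""
    return sum(min(dist[p][m] for m in medoids) for p in range(len(dist)))


def clarans(dist, k, numlocal=2, maxneighbor=50, seed=0):
    """Return (sorted medoid indices, cost) found by randomized search."""
    n = len(dist)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of objects")
    rng = random.Random(seed)
    mincost = float("inf")                      # step 1
    bestnode = None
    for _ in range(numlocal):                   # steps 1 and 8: loop on i
        current = rng.sample(range(n), k)       # step 2: arbitrary node
        current_cost = total_cost(current, dist)
        j = 1                                   # step 3
        while j <= maxneighbor:
            # Step 4: random neighbor, swapping one medoid for a non-medoid.
            neighbor = list(current)
            non_medoids = [p for p in range(n) if p not in current]
            neighbor[rng.randrange(k)] = rng.choice(non_medoids)
            neighbor_cost = total_cost(neighbor, dist)
            if neighbor_cost < current_cost:    # step 5: move and reset j
                current, current_cost = neighbor, neighbor_cost
                j = 1
            else:                               # step 6: try another neighbor
                j += 1
        if current_cost < mincost:              # step 7: keep best local minimum
            mincost, bestnode = current_cost, current
    return sorted(bestnode), mincost


# Hypothetical one-dimensional data with two obvious groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, cost = clarans(dist, k=2, numlocal=3, maxneighbor=200, seed=1)
```

Because the search is randomized, small values of maxneighbor can stop at a node that still has a better neighbor. Recomputing the full cost is easier to read but slower than the cost differential used in the algorithm.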
Data
Student-to-class ratio data for Java Island
[Figure: Comparison of silhouette index. x-axis: weight; y-axis: silhouette index; series: K=2, K=3]
[Figure: Comparison of running time. x-axis: weight; y-axis: time (s); series: K=2, K=3]
Spatial weight   Silhouette index   Running time (s)
0.1              0.6909             128.828
0.2              0.5168             129.218
0.3              0.5940             129.028
0.4              0.5227             128.525
0.5              0.2783             130.221
0.6              0.4123             130.597
0.7              0.3740             131.458
0.8              0.4874             130.918
0.9              0.5321             135.227
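For reference, a silhouette index such as the one reported above can be computed directly from a dissimilarity matrix and a cluster labelling. The sketch below follows the standard definition (mean of (b - a) / max(a, b) over all objects) and is not taken from the slides. The function name is an assumption, and objects in singleton clusters are given a silhouette of 0 by convention.

```python
def silhouette_index(dist, labels):
    """Mean silhouette over all objects, given pairwise dissimilarities."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton cluster contributes 0
        # a: mean dissimilarity to the other members of the same cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean dissimilarity to any other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(labels)


# Hypothetical data: two well-separated groups on a line.
points = [0, 1, 10, 11]
dist = [[abs(a - b) for b in points] for a in points]
good = silhouette_index(dist, [0, 0, 1, 1])
bad = silhouette_index(dist, [0, 1, 0, 1])
```

Values close to 1 indicate compact, well-separated clusters, and negative values indicate objects that sit closer to another cluster than to their own.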
[Figure: Influence of the number of clusters on computational time. x-axis: number of clusters; y-axis: running time (s)]
Conclusion
The CLARANS algorithm can be combined with a weighted dissimilarity function to cluster spatial data. The weighted dissimilarity function is more efficient than a traditional dissimilarity function at handling spatial relationships. Increasing the number of clusters increases the computational time but does not improve the efficiency of the clustering. The spatial weight strongly influences the result of spatial data clustering.
