Simply Shapes

This is a simple memo…
Classical shape analysis methods
•    Circularity:
The degree of circularity measures how closely a polygon resembles a circle, where 1 is a perfect circle and 0.492 is an isosceles triangle.

    C = \frac{4 \pi s}{p^{2}}

s: object area
p: object perimeter

•    Irregularity:
A measure of the irregularity of a solid, calculated from its perimeter and the perimeter of the surrounding circle. A circle has the minimum irregularity, corresponding to the value 1. A square has the maximum irregularity, with a value of 1.402.

    I = \frac{p_c}{p}

p_c: perimeter of the surrounding circle

•    Quadrature:
The degree of quadrature of a solid, where 1 is a square and 0.800 is an isosceles triangle.

    Q = \frac{p}{4 \sqrt{s}}

•    Elongation:
The degree of ellipticity of a solid, where a circle and a square are the least elliptic shapes.

    E = \frac{D}{d}

D: maximum diameter within the object
d: minimum diameter perpendicular to D
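
As a rough illustration of the circularity and quadrature formulas above, the following Python sketch computes both indices for a polygon given as a list of vertices. The function names and the unit-square example are illustrative assumptions and do not come from the memo. The area is taken from the shoelace formula, and the quadrature is computed as Q = p / (4 sqrt(s)).

```python
import math

def area_and_perimeter(vertices):
    """Return (area, perimeter) of a closed polygon given as (x, y) pairs."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0  # shoelace term
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2.0, perimeter

def circularity(vertices):
    """C = 4 * pi * s / p^2, with s the area and p the perimeter."""
    s, p = area_and_perimeter(vertices)
    return 4.0 * math.pi * s / p ** 2

def quadrature(vertices):
    """Q = p / (4 * sqrt(s)), which equals 1 for a square."""
    s, p = area_and_perimeter(vertices)
    return p / (4.0 * math.sqrt(s))

# Hypothetical example shape: a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
c_square = circularity(square)
q_square = quadrature(square)
print(c_square, q_square)
```

For the unit square the area is 1 and the perimeter is 4, so the circularity is pi / 4 and the quadrature is 1.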
The Workflow of Morphometric Analysis for Shape
1. Original shape (polygon)
2. Fourier transform
3. Inverse Fourier transform
4. Approximate shape (polygon)
5. Procrustes analysis
6. Distance matrix
7. Test the number of clusters
8. Clustering by PAM
9. Assign class information to each object
10. Visualize in geo-space
Fourier descriptors of closed polygons
The Fourier transform can represent any periodic function as an infinite sum of trigonometric functions, whose coefficients are termed Fourier descriptors. Because the outline of a closed polygon becomes a periodic function when it is decomposed into its X and Y coordinates, the method can be applied to polygons.
[Figure: polygon org_58 plotted as org_58[,1] against org_58[,2], with vertices t(x_i, y_i); the "X axis" and "Y axis" panels plot each coordinate against t.]

X axis:

    f(x) = \frac{1}{2} + \sum_{n=1}^{\infty} \left( a_n \cos\frac{2 \pi n t}{L} + b_n \sin\frac{2 \pi n t}{L} \right)

Y axis:

    g(y) = \frac{1}{2} + \sum_{n=1}^{\infty} \left( c_n \cos\frac{2 \pi n t}{L} + d_n \sin\frac{2 \pi n t}{L} \right)
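
The decomposition into X and Y series can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the procedure used for the figures: the vertices are assumed to be evenly spaced along the outline, the parameter t is taken to be the vertex index, the period L is the number of vertices, and the constant term is taken as the mean coordinate (c_x, c_y). The function name fourier_descriptors and the ellipse example are hypothetical.

```python
import numpy as np

def fourier_descriptors(points, n_harmonics):
    """Return (a, b, c, d, cx, cy) for a closed polygon.

    a, b are the cosine/sine coefficients of x(t); c, d those of y(t).
    Assumes t = vertex index and period L = number of vertices.
    """
    pts = np.asarray(points, dtype=float)
    n_pts = len(pts)
    t = np.arange(n_pts)
    n = np.arange(1, n_harmonics + 1)[:, None]   # harmonic numbers, shape (H, 1)
    angle = 2.0 * np.pi * n * t / n_pts          # shape (H, N)
    cos_t, sin_t = np.cos(angle), np.sin(angle)
    x, y = pts[:, 0], pts[:, 1]
    a = 2.0 / n_pts * (cos_t @ x)
    b = 2.0 / n_pts * (sin_t @ x)
    c = 2.0 / n_pts * (cos_t @ y)
    d = 2.0 / n_pts * (sin_t @ y)
    return a, b, c, d, x.mean(), y.mean()

# Hypothetical example: 16 points on an ellipse centred at (3, 1).
theta = 2.0 * np.pi * np.arange(16) / 16
ellipse = np.column_stack([3.0 + 2.0 * np.cos(theta), 1.0 + np.sin(theta)])
a, b, c, d, cx, cy = fourier_descriptors(ellipse, n_harmonics=2)
print(np.round(a, 6), np.round(d, 6), cx, cy)
```

For this ellipse only the first harmonic is non-zero: a_1 is 2, d_1 is 1, and the centre (c_x, c_y) is (3, 1).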
Original Shape
Simplifying with approximate Shape
Increasing the number of harmonics and the number of approximation points brings the approximate shape closer to the original shape.
Inverse Fourier Transform
The original polygons can be approximately reconstructed. To reconstruct an original shape, the number of points must be specified, and the points are placed at a constant angular interval around a circle.

[Figure: left panel "Approximate with 10 points" (axes x and y, from -1.0 to 1.0); right panel org_58 plotted as org_58[,1] against org_58[,2], with legend Original Shape, First Approximate Ellipse, Approximate Shape, and reconstructed points t(x'_j, y'_j).]

    x'_j = \sum_{i=2}^{H} \left\{ a_i \cos\left( \frac{j \cdot 2\pi \cdot i}{L} \right) + b_i \sin\left( \frac{j \cdot 2\pi \cdot i}{L} \right) \right\} + c_x

    y'_j = \sum_{i=2}^{H} \left\{ c_i \cos\left( \frac{j \cdot 2\pi \cdot i}{L} \right) + d_i \sin\left( \frac{j \cdot 2\pi \cdot i}{L} \right) \right\} + c_y
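
The reconstruction formulas can be tried out with a short Python sketch using NumPy. It is only an illustration under stated assumptions: the coefficient arrays are supplied by the caller, the period L is set equal to the requested number of points so that the points sit at a constant angle apart, and the sum runs over every supplied harmonic starting from the first one, which differs from the lower limit i = 2 printed in the formulas. The function name inverse_fourier and the coefficient values are hypothetical.

```python
import numpy as np

def inverse_fourier(a, b, c, d, cx, cy, n_points):
    """Rebuild n_points outline points (x'_j, y'_j) from Fourier coefficients.

    a, b: cosine/sine coefficients for x; c, d: the same for y.
    cx, cy: constant offsets. Assumes L = n_points.
    """
    a, b, c, d = (np.asarray(v, dtype=float) for v in (a, b, c, d))
    harmonics = np.arange(1, len(a) + 1)[:, None]        # shape (H, 1)
    j = np.arange(n_points)                              # point index, shape (M,)
    angle = 2.0 * np.pi * harmonics * j / n_points       # shape (H, M)
    x = a @ np.cos(angle) + b @ np.sin(angle) + cx
    y = c @ np.cos(angle) + d @ np.sin(angle) + cy
    return np.column_stack([x, y])

# Hypothetical coefficients: one harmonic describing an ellipse centred at (3, 1).
approx = inverse_fourier(a=[2.0], b=[0.0], c=[0.0], d=[1.0],
                         cx=3.0, cy=1.0, n_points=4)
print(np.round(approx, 6))
```

With four points the sketch returns the four extreme points of the ellipse: (5, 1), (3, 2), (1, 1) and (3, 0). Passing more coefficients and a larger n_points gives a closer approximation.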
Procrustes Analysis
The aim is to bring two shapes to a similar placement and size by minimizing a measure of shape difference, called the Procrustes distance, between the objects. To conduct this analysis, both shapes must have the same number of control points.

Calculate the root mean square distance for uniform scaling:

    s = \sqrt{ \frac{ \sum_{i=1}^{n} \left[ (x_i - \bar{x})^2 + (y_i - \bar{y})^2 \right] }{ n } }

Translate and scale uniformly:

    (u_i, w_i) = \left( \frac{x_i - \bar{x}}{s}, \frac{y_i - \bar{y}}{s} \right)

Find the optimum angle of rotation θ that minimizes the sum of the squared distances between corresponding points, where (x_i, y_i) now denotes the corresponding point of the other shape:

    \theta = \tan^{-1} \left( \frac{ \sum_{i=1}^{n} (u_i y_i - w_i x_i) }{ \sum_{i=1}^{n} (u_i x_i + w_i y_i) } \right)

Then the optimum coordinates are assigned by the following formula:

    (\eta_i, \nu_i) = ( \cos\theta \, u_i - \sin\theta \, w_i, \; \sin\theta \, u_i + \cos\theta \, w_i )

The dissimilarity between the two shapes is measured as the squared distance:

    d = \sum_{i=1}^{n} \left[ (\eta_i - x_i)^2 + (\nu_i - y_i)^2 \right]
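
The steps of this slide can be put together in a small Python sketch with NumPy. It is a sketch under stated assumptions rather than the code behind the results shown: both shapes are centred and scaled to unit root mean square size before the rotation is fitted, the angle is computed with arctan2 instead of a plain inverse tangent so that the quadrant is resolved, and the denominator is taken as the sum of u_i x_i + w_i y_i. The function names and the rectangle example are hypothetical.

```python
import numpy as np

def normalize(shape):
    """Translate to the centroid and scale by the root mean square distance s."""
    pts = np.asarray(shape, dtype=float)
    centered = pts - pts.mean(axis=0)
    s = np.sqrt((centered ** 2).sum() / len(pts))
    return centered / s

def procrustes_distance(shape, reference):
    """Return (d, theta): squared distance after optimal rotation, and the angle."""
    uw = normalize(shape)
    xy = normalize(reference)
    u, w = uw[:, 0], uw[:, 1]
    x, y = xy[:, 0], xy[:, 1]
    # Optimal rotation angle of the shape onto the reference.
    theta = np.arctan2(np.sum(u * y - w * x), np.sum(u * x + w * y))
    eta = np.cos(theta) * u - np.sin(theta) * w
    nu = np.sin(theta) * u + np.cos(theta) * w
    d = np.sum((eta - x) ** 2 + (nu - y) ** 2)
    return d, theta

# Hypothetical example: a rectangle and a rotated, scaled, shifted copy of it.
reference = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])
phi = -np.pi / 6                                   # rotate the copy by -30 degrees
rot = np.array([[np.cos(phi), -np.sin(phi)],
                [np.sin(phi),  np.cos(phi)]])
shape = 3.0 * reference @ rot.T + np.array([5.0, -2.0])
d, theta = procrustes_distance(shape, reference)
print(d, theta)
```

Because the second shape is an exact similarity transform of the first, the distance is zero up to rounding and the recovered angle is +30 degrees (pi / 6), the rotation that undoes the one applied.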
Procrustes Analysis
[Figure: "Procrustes errors" plotted on Dimension 1 against Dimension 2 (sum of squares: 1.758e-06), next to the two compared polygons, org_58 (org_58[,1] against org_58[,2]) and org_2570 (org_2570[,1] against org_2570[,2]).]
Partitioning Around Medoids (PAM)
Partitioning Around Medoids (PAM) is a clustering algorithm that, like k-means, attempts to minimize the squared error. In contrast to k-means, PAM chooses existing data points as cluster centers, termed medoids, and it is more robust to noise and outliers than k-means.

    \operatorname{argmin} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - m_i \rVert

where m_i is the medoid of S_i.

[Figure: silhouette plot of pam(x = tokyo.dist^2, k = 5); n = 4373, 5 clusters C_j; x axis: silhouette width s_i, from -0.2 to 1.0. Average silhouette width: 0.48.]

j : n_j  | ave_{i in C_j} s_i
1 : 1388 | 0.62
2 : 740  | 0.41
3 : 1070 | 0.44
4 : 693  | 0.41
5 : 482  | 0.35

$classinfo (output of PAM clustering)

      size  max_diss   av_diss   diameter   separation
[1,]  1388   65.804    18.27153   193.8786  0.2096066
[2,]   740  239.5017   29.9133    463.227   0.1864726
[3,]  1070  200.8129   31.75182   429.5183  0.2096066
[4,]   693  737.1965   30.68781  1044.5552  0.1864726
[5,]   482  460.6608   46.2136    803.3625  0.3181256
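
To make the idea of medoids concrete, here is a compact PAM-style sketch in Python with NumPy. It is an illustrative assumption, not the routine that produced the output above: it works on a precomputed dissimilarity matrix, picks initial medoids greedily (a simple build step), and then accepts any swap of a medoid with a non-medoid that lowers the total dissimilarity. The function name pam_sketch and the six-point example are hypothetical; the example squares the distances, as the pam call in the silhouette plot title does.

```python
import numpy as np

def pam_sketch(dist, k):
    """Return (medoid indices, labels, total cost) for a dissimilarity matrix."""
    dist = np.asarray(dist, dtype=float)
    n = len(dist)

    def cost(meds):
        # Sum of each point's dissimilarity to its nearest medoid.
        return dist[:, meds].min(axis=1).sum()

    # Build step: start from the most central point, then add the candidate
    # that reduces the total dissimilarity the most.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        nearest = dist[:, medoids].min(axis=1)
        gains = np.maximum(nearest[:, None] - dist, 0.0).sum(axis=0)
        gains[medoids] = -1.0                      # never pick a medoid twice
        medoids.append(int(np.argmax(gains)))

    # Swap step: accept improving medoid/non-medoid swaps until none remain.
    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = h
                trial_cost = cost(trial)
                if trial_cost < best - 1e-12:
                    medoids, best, improved = trial, trial_cost, True

    labels = dist[:, medoids].argmin(axis=1)
    return np.array(medoids), labels, best

# Hypothetical example: six points on a line forming two obvious groups.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
dist = (points[:, None] - points[None, :]) ** 2    # squared distances
medoids, labels, total = pam_sketch(dist, k=2)
print(medoids, labels, total)
```

In this example the medoids end up at the middle point of each group (indices 1 and 4), and every medoid is one of the original data points, which is the property that distinguishes PAM from k-means.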
Silhouette Width - Test the number of clusters -
[Figure: example data grouped into four clusters A, B, C and D (k = 4).]

For each datum i, first calculate the average dissimilarity to the data within the same cluster:

    a(i) = \frac{1}{n_{k_i}} \sum_{a_j \in K_i} (a_i - a_j)^2

where a_i is datum i, K_i is its cluster, and n_{k_i} is the number of data in K_i.

Then calculate the lowest average dissimilarity from datum i to the data b_j of any other cluster K_j:

    b(i) = \min_{K_j \neq K_i} \left( \frac{1}{n_{k_j}} \sum_{b_j \in K_j} (a_i - b_j)^2 \right)

The index of clustering efficiency at datum i is calculated as the silhouette width:

    S(i) = \frac{b(i) - a(i)}{\max\{ a(i), b(i) \}}, \quad -1 \le S(i) \le 1

The index of clustering efficiency for each cluster k is the average silhouette width:

    S_k = \frac{1}{n_k} \sum_{i \in K_k} S(i), \quad -1 \le S_k \le 1
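
The silhouette width can be computed directly from a dissimilarity matrix and a vector of cluster labels, as in the following Python sketch with NumPy. It is a minimal sketch under stated assumptions: it uses the standard definition S(i) = (b(i) - a(i)) / max{a(i), b(i)}, it averages a(i) over the other members of the cluster (excluding datum i itself), and it assigns 0 to data in single-member clusters. The function name silhouette_widths and the example data are hypothetical.

```python
import numpy as np

def silhouette_widths(dist, labels):
    """Return the silhouette width S(i) of every datum."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    widths = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False                      # exclude the datum itself
        if not same.any():
            continue                         # single-member cluster: S(i) = 0
        a = dist[i, same].mean()             # average within-cluster dissimilarity
        b = min(dist[i, labels == k].mean()  # lowest average to another cluster
                for k in np.unique(labels) if k != labels[i])
        denom = max(a, b)
        widths[i] = 0.0 if denom == 0 else (b - a) / denom
    return widths

# Hypothetical example: two well separated groups on a line, squared distances.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
dist = (points[:, None] - points[None, :]) ** 2
labels = np.array([0, 0, 0, 1, 1, 1])
widths = silhouette_widths(dist, labels)
print(np.round(widths, 3), round(widths.mean(), 3))
```

Averaging the returned widths over one cluster gives S_k, and averaging over all data gives the overall average silhouette width used to compare different numbers of clusters.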
Average Silhouette Width
The highest average silhouette width occurs at 5 clusters.

[Figure: "Average Silhouette Width", averaged with PAM from 2 to 50 clusters (N = 50); y axis: res$sil, from 0.40 to 0.48; x axis: Index, from 0 to 50.]

    S(i) = \begin{cases}
        1 - \frac{a(i)}{b(i)} & \text{if } a(i) < b(i) \\
        0 & \text{if } a(i) = b(i) \\
        \frac{b(i)}{a(i)} - 1 & \text{if } a(i) > b(i)
    \end{cases}

The average silhouette width suggests that the number of clusters is 5.
Clustering by PAM
Silhouette Width