SlideShare a Scribd company logo
1 of 65
Download to read offline
Analyzing formal features of archaeological
    artefacts through the application of
             Spectral Clustering
                                 Diego Jiménez-Badillo
                                    diego_jimenez@inah.gob.mx
                            National Institute of Anthropology and History
                                              México City




Salvador Ruíz Correa                                        Omar Méndoza-Montoya
src@cimat.mx                                                omendoz@cimat.mx
Centro de Investigaciones en Matemáticas                    Centro de Investigaciones en Matemáticas
Guanajuato, Mexico                                          Guanajuato, Mexico
Introduction
 This paper is part of a broader effort to introduce the
  archaeological community to a range of computer tools
  and mathematical algorithms for analyzing archaeological
  collections. These include:
1. Application of clustering techniques for unsupervised
   learning.
2. Acquisition and analysis of 3D digital models.

3. Application of computer vision algorithms for automatic
   recognition of artefacts.
4. Automatic classification of shape features.
This presentation

 Here, we will focus on a new methodology for the analysis
  of archaeological masks based on a quantitative procedure
  called Spectral Clustering.

 This technique has not been applied before in archaeology
  despite its proven performance for partitioning a collection
  of artifacts into meaningful groups.
The Mask Collection from the Great Temple of Tenochtitlan

Study case
The idea for this
    project came from
    the need to classify
    similarities in 162
    stone masks found
    in the remains of
    the Sacred Precinct
    of Tenochtitlan, the
    main ceremonial
    Aztec complex,
    located in Mexico
    City.

The schematic features
of these objects set
them apart from other
more “naturalistic”
style artifacts.

Their appearance has
attracted the attention
of many specialists and
during the last three
decades these masks
have been the subject
of intense debate for
two main reasons:
These masks are interesting for several reasons:


 First, 220
  figurines and
  162 masks were
  located in 14
  Aztec offerings
  dating from
  1390 to 1469
  A.C., yet they
  do not show
  typical “Aztec
  features”.
Indeed, their
appearance
resembles artefacts
from Teotihuacan
and from the
southern State of
Guerrero,
particularly from the
Mezcala region,
which is hundreds
of kilometers away
from the ancient
Tenochtitlan.
 Second, it is not clear how many Guerrero/Mezcala styles
  exist:

  Some specialists believe there are at least five1 different
  traditions while others recognize only four2, and another
  group of researchers sees only two3 (Serra Puche 1975).



  1Covarrubias 1948, 1961; Olmedo and Gonzalez 1986; Gonzalez and Olmedo 1990
  2 Gay, 1967
  3 Serra Puche 1975
More objective methods are needed to answer
questions such as:

 How many styles were developed in the
  Guerrero/Mezcala regions?
 How many of these styles coexisted?

 Were some styles contemporary with the Aztecs?

 Were some of these masks manufactured by the Aztecs?

 Which specific styles are represented among the 162
  masks found in the Sacred Precinct of Tenochtitlan?
.

Clustering basics
The application of clustering in archaeology


  One of the most important applications of clustering in
  archaeology is “unsupervised learning”, that is the
  discovery of artifact groups based exclusively on the
  analysis of characteristics observed in the artifacts
  themselves.

  Once the collection has been segmented into several
  groups, it would become easier to distinguish potential
  classes.
Clustering basics

  Given a set of data-points, any clustering algorithm seeks
  to separate those points into a finite quantity of groups
  (i.e., clusters). It applies an objective similarity function to
  weigh how close (similar) or distant (dissimilar) the
  original data are among themselves.

  Items assigned to the same group must be highly similar.
  At the same time, items belonging to different groups
  must be highly dissimilar.

  The quality of the algorithm is judged by how successful it
  is to accomplish: a) greater homogeneity within a group,
  and b) greater heterogeneity between groups.
Two clustering approaches are very popular:

1. Component linkage algorithms (i.e. single and total
   linkage) are based on thresholding pairwise distances
   and are best suited for discovering complex elongated
   clusters, but are very sensitive to noise in the data.

2. K-means algorithms, on the other hand, are very
   robust to noise but are best suited for rounded
   linearly separable clusters.
Boundary shapes
 Some years ago, Olmedo
  and Gonzalez (1986)
  proposed a classification
  based on a component-
  linkage algorithm
  (numerical taxonomy).
 The forms of faces, eyes,
  noses, eyebrows, etc. were
  codified categorically.
 This produced an input of
  23 shape attributes for
  each one of 162 masks.



Olmedo and Gonzalez. 1986. Presencia del Estilo Mezcala en el Templo Mayor. INAH, México .
Eyebrows shapes   Noise shapes
Olmedo and González results

 This lead to the identification of 40 groups, of which:


    13 groups include only 2 masks
    13 groups include only 3 masks
    6 groups include 4 masks
    20 masks were not included in any group
.

Spectral Clustering
SPECTRAL CLUSTERING
Spectral Clustering is a state-of-the-art technique for
exploratory analysis.

It is derived from Spectral Graph Theory, a branch of
mathematics that studies the properties of graphs in relation
to the eigenvalues and eigenvectors of an especial type of
matrix called Laplacian.

In contrast to other techniques, spectral clustering can be
applied to different kinds of data, including categorical data.
In many situations, categorical data is often the only source of
information for archaeological unsupervised learning.
Therefore we believe it could bring great benefits in our field.
Here, we won’t dive too deep into mathematical theory.
Instead, we follow a very simple example to make the
logic of this clustering technique easier to understand.
Mathematical proofs can be found in the literature
referenced in the paper, especially Luxburg (2007).

Spectral Clustering is more efficient than linkage and k-
means algorithms. It finds elongated clusters and is more
robust to noise than linkage algorithms.
Spectral clustering logic

Spectral clustering
seeks to identify
groups by analyzing
no the exact
location of the data
points (like single,
total linkage and k-
means), but the
connectedness
between them.
 Spectral clustering relies on a graph representation of the
  data set.
 In this graph each mask is represented as a vertex. Then,
  we calculate a measure of similarity (affinity) between all
  the masks. This is done in two steps:
 First, calculate the so-called Hamming distance, which
  measures the percentage of non-shared attributes
  between two artefacts. Obviously, this measure is very
  useful to work with categorical data. The formula is:
 Second, calculate the affinity between all the masks by
  applying the Gaussian function to the Hamming distance:




  σ = Threshold to control the desired level of similarity

 If two objects are very different the result of the equation
  is negative and close to or equal to zero. On the contrary,
  if two objects are very similar the result of the equation is
  near or equal to 1.
 The next step is to
  draw a link
  between two
  vertices (i.e. masks)
  if the similarity
  between them is
  positive or larger
  than a certain
  threshold
  controlled by the
  parameter sigma.
Next, we weight each
edge of the graph
with its
corresponding
similarity score.
This step produces a
matrix in which the
affinity (i.e. similarity)
of all objects with all
others is recorded.
 Once this is done, the
  clustering problem can be
  reformulated as finding an
  optimal cutting of the
  graph. This means finding a
  partition of the graph such
  that the edges linking
  objects of the same group
  have very high weights (i.e.
  are quite similar), while the
  edges between groups
  have low weighs (i.e. are
  very dissimilar).
THE CUTTING PROBLEM

  Finding an optimal cutting of the graph is what
  mathematicians call an NP-hard problem. This means
  that it cannot be found in real time. Therefore, we need
  to find an approximate solution.

  One type of relaxation is based on analyzing the
  eigenestructure of the Laplacian matrix associated to
  the similarity graph. Mathematical proofs can be found
  in the extensive literature on the subject.
LAPLACIAN MATRIX
The so-called Laplacian matrix is a transformation of the
similarity matrix that shows more clearly the structure of the
dataset. To build a Laplacian Matrix we need two ingredients:

   a. A matrix D containing information of the connectivity
       of the similarity graph. This is called the Degree
       matrix or Diagonal Matrix.

   b. The similarity matrix S produced in step 2.

The simple Laplacian matrix satisfies the following equation:

                          L=D-S
EIGENSTRUCTURE

  As you know “eigen” is a prefix that means “innate”,
  “own”, and “characteristic”. Thus, studying the
  eigenstructure of matrices serves the purpose of
  revealing the intrinsic nature of the data contained in
  those matrices.

  Looking at the eigenstructure we can identify the best
  possible cuts for the graph and therefore the best
  clustering partition.

  The eigenstructure is given by eigenvalues and
  eigenvectors.
EIGENVALUES AND EIGENVECTORS

  Given a square symmetric matrix S, we say that λ
  (lambda) is an eigenvalue of S if there exists a non-zero
  vector x such that:

                   Sx = λx     equation (2).

  In equation (2), x represents the eigenvector associated
  to the eigenvalue λ and both constitute an eigenpair for
  the matrix S.
GRAPH SPECTRUM

 Each eigenvector has its associated eigenvalue. So there
 are as many eigenvectors as eigenvalues. The spectrum of
 the graph is precisely the set of all eigenvalues that satisfy
 equation (2). Such property is invariant with respect to
 the orientation of the data set.

 Calculating eigenpairs for a large matrix is a daunting task.
 Archaeologists do not need to worry about how to do it,
 as many computer math-libraries provide appropriate
 tools.

 As part of this project, we have implemented a tool-box
 to perform spectral clustering.
A very basic example
 We try to partition a data set
  consisting of 9 objects. We
  first built a 9 x 9 square
  table, in which both rows
  and columns enumerate
  each one of the 9 objects of
  this example. Then, we
  calculate the affinity
  between the objects by
  applying equation 1 and
  input the resulting values
  into the table.
If we codify affinity values with shades of color, then the
affinity matrix will have bolder colors in the cells with high
affinity and light or no color in those with low affinity.
 The interesting thing to
  notice is that columns 1,
  2, and 3 of the affinity
  matrix look exactly the
  same.
 An obvious conclusion is
  that objects belonging to
  the same class (cluster)
  will have similar affinity
  vectors, which will be
  quite different of the
  affinity vectors of other
  classes.
In this picture the
                          affinity matrix shows
                          clearly 2 clusters in
                          the data.




In this picture the
affinity matrix shows
4 clusters in the data.
Therefore, by finding the
characteristic vectors in the
matrix we would be able to
identify the clusters in a
collection.
In our example, the three
dominant vectors would be
the ones illustrated here.
These are known in
mathematical terms as the
eigenvectors of the matrix.


There are other vectors, but
these tell us nothing about
the clustering structure of
the data.
Finally, if we use those
vectors as axes of a new
coordinate system and
map the original data-
points into such space,
then we would visually
appreciate the 3
clusters in a more easy
way.
Of course, not all data sets are as clearly structured as the
example. In most cases, it is necessary to perform the
eigenstructure analysis using complex Linear Algebra
algorithms.
Details of those techniques can be found in the extensive
literature on the subject, especially in Alpert, Kahng, and Yao
(1999), Shi and Malik (2000), Ng, Jordan and Weiss (2001),
Melia and Shi (2001), Zelnik-Manor and Prona (2004), Bach
and Jordan (2006, 2008), Azranand and Ghahramani
(2006b), Yan et al. (2009).
Fortunately, archaeologists do not need to implement these
methods, as we have produced a computer program that
performs Spectral Clustering automatically.
Previous clustering

Results from component linkage
COMPONENT LINKAGE FAILURES

 Dataset of 162 stone masks
 40 clusters (too many!)
 10 well defined clusters (too few)
 6 clusters not very well defined
 Sparse clusters: 13 clusters have only 2 elements.
 20 masks are not clustered (i.e. 20 clusters have only 1
  element).
Component Linkage: Cluster 2
Component Linkage : Cluster 4
Component Linkage: Cluster 10
Component Linkage: Cluster 37 (acceptable)
Again, Cluster 37 (as it should be)
Component Linkage: Cluster 39
If spectral clustering is efficient and robust,
we should find better-defined clusters
Application; K = 25; sigma = 1.0

Results from spectral clustering
Cluster 3
Cluster 11
Cluster 23
Cluster 14
Cluster 22
Cluster 18
Cluster 20
Cluster 4
Cluster 25
FUTURE WORK

 Spectral clustering relies on two parameters

1.     k (desired number of clusters)

2.     σ (similarity threshold)

    Our next goal is to apply another algorithm to “learn”
     the value of k directly from the dataset.
 Software demonstration
Conclusions
 Applying Spectral Clustering to the Mezcala collection
  have produced encouraging results. We were able to
  partition the mask collection into 23 well-defined groups,
  which is a better result than the 40 clusters obtained with
  Numerical taxonomy.
 We illustrated the 23 groups of Mezcala masks. The
  reader would notice the great performance of the
  Spectral Clustering algorithm, especially by comparing
  clusters 3, 11, 12, 14, 16, 18, 25, and some others. Such
  groups are defined by highly similar masks. Cluster 12, for
  example, contains masks with triangular faces made in
  highly polished stone. In contrast, cluster 3 contains
  square masks, most of which with perforated eyes and
  less polished material that the ones in group 12.
 Furthermore, each one of the groups identified with
  Spectral Clustering is clearly different from the rest, which
  allows us trusting the partition. Only 2 masks (labeled
  here as “clusters” 7 and 17) were left un-clustered, which
  represents a better result than the one obtained by
  numerical taxonomy in which 20 masks were isolated.
 Therefore, we believe that Spectral Clustering may have a
  future role in archaeology, especially as a first step in
  analyzing shape features of complex collections.
THANKS FOR YOUR ATTENTION

More Related Content

Viewers also liked

Spectral clustering Tutorial
Spectral clustering TutorialSpectral clustering Tutorial
Spectral clustering TutorialZitao Liu
 
247 deview 2013 이미지 분석 - 민재식
247 deview 2013 이미지 분석 - 민재식247 deview 2013 이미지 분석 - 민재식
247 deview 2013 이미지 분석 - 민재식NAVER D2
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...MLconf
 
[2B4]Live Broadcasting 추천시스템
[2B4]Live Broadcasting 추천시스템  [2B4]Live Broadcasting 추천시스템
[2B4]Live Broadcasting 추천시스템 NAVER D2
 
머신 러닝 입문 #1-머신러닝 소개와 kNN 소개
머신 러닝 입문 #1-머신러닝 소개와 kNN 소개머신 러닝 입문 #1-머신러닝 소개와 kNN 소개
머신 러닝 입문 #1-머신러닝 소개와 kNN 소개Terry Cho
 
[메조미디어] 2014년 연령별 타겟 분석 50대 2014.07
[메조미디어] 2014년 연령별 타겟 분석 50대 2014.07[메조미디어] 2014년 연령별 타겟 분석 50대 2014.07
[메조미디어] 2014년 연령별 타겟 분석 50대 2014.07MezzoMedia
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyChris Johnson
 
기계학습 / 딥러닝이란 무엇인가
기계학습 / 딥러닝이란 무엇인가기계학습 / 딥러닝이란 무엇인가
기계학습 / 딥러닝이란 무엇인가Yongha Kim
 

Viewers also liked (8)

Spectral clustering Tutorial
Spectral clustering TutorialSpectral clustering Tutorial
Spectral clustering Tutorial
 
247 deview 2013 이미지 분석 - 민재식
247 deview 2013 이미지 분석 - 민재식247 deview 2013 이미지 분석 - 민재식
247 deview 2013 이미지 분석 - 민재식
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
 
[2B4]Live Broadcasting 추천시스템
[2B4]Live Broadcasting 추천시스템  [2B4]Live Broadcasting 추천시스템
[2B4]Live Broadcasting 추천시스템
 
머신 러닝 입문 #1-머신러닝 소개와 kNN 소개
머신 러닝 입문 #1-머신러닝 소개와 kNN 소개머신 러닝 입문 #1-머신러닝 소개와 kNN 소개
머신 러닝 입문 #1-머신러닝 소개와 kNN 소개
 
[메조미디어] 2014년 연령별 타겟 분석 50대 2014.07
[메조미디어] 2014년 연령별 타겟 분석 50대 2014.07[메조미디어] 2014년 연령별 타겟 분석 50대 2014.07
[메조미디어] 2014년 연령별 타겟 분석 50대 2014.07
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and Spotify
 
기계학습 / 딥러닝이란 무엇인가
기계학습 / 딥러닝이란 무엇인가기계학습 / 딥러닝이란 무엇인가
기계학습 / 딥러닝이란 무엇인가
 

Similar to [DCSB] Dr Diego Jiménez-Badillo (INAH, Mexico), "Classifying Formal Features of Archaeological Artefacts through the Application of Spectral Clustering".

Data Science - Part VII - Cluster Analysis
Data Science - Part VII -  Cluster AnalysisData Science - Part VII -  Cluster Analysis
Data Science - Part VII - Cluster AnalysisDerek Kane
 
Clustering techniques final
Clustering techniques finalClustering techniques final
Clustering techniques finalBenard Maina
 
Extending the knowledge level of cognitive architectures with Conceptual Spac...
Extending the knowledge level of cognitive architectures with Conceptual Spac...Extending the knowledge level of cognitive architectures with Conceptual Spac...
Extending the knowledge level of cognitive architectures with Conceptual Spac...Antonio Lieto
 
Exploration of Imputation Methods for Missingness in Image Segmentation
Exploration of Imputation Methods for Missingness in Image SegmentationExploration of Imputation Methods for Missingness in Image Segmentation
Exploration of Imputation Methods for Missingness in Image SegmentationChristopher Peter Makris
 
4 image segmentation through clustering
4 image segmentation through clustering4 image segmentation through clustering
4 image segmentation through clusteringIAEME Publication
 
4 image segmentation through clustering
4 image segmentation through clustering4 image segmentation through clustering
4 image segmentation through clusteringprjpublications
 
A comprehensive survey of contemporary
A comprehensive survey of contemporaryA comprehensive survey of contemporary
A comprehensive survey of contemporaryprjpublications
 
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms ijcseit
 
Literature Survey On Clustering Techniques
Literature Survey On Clustering TechniquesLiterature Survey On Clustering Techniques
Literature Survey On Clustering TechniquesIOSR Journals
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 

Similar to [DCSB] Dr Diego Jiménez-Badillo (INAH, Mexico), "Classifying Formal Features of Archaeological Artefacts through the Application of Spectral Clustering". (20)

Data Science - Part VII - Cluster Analysis
Data Science - Part VII -  Cluster AnalysisData Science - Part VII -  Cluster Analysis
Data Science - Part VII - Cluster Analysis
 
Clustering techniques final
Clustering techniques finalClustering techniques final
Clustering techniques final
 
Extending the knowledge level of cognitive architectures with Conceptual Spac...
Extending the knowledge level of cognitive architectures with Conceptual Spac...Extending the knowledge level of cognitive architectures with Conceptual Spac...
Extending the knowledge level of cognitive architectures with Conceptual Spac...
 
Exploration of Imputation Methods for Missingness in Image Segmentation
Exploration of Imputation Methods for Missingness in Image SegmentationExploration of Imputation Methods for Missingness in Image Segmentation
Exploration of Imputation Methods for Missingness in Image Segmentation
 
4 image segmentation through clustering
4 image segmentation through clustering4 image segmentation through clustering
4 image segmentation through clustering
 
4 image segmentation through clustering
4 image segmentation through clustering4 image segmentation through clustering
4 image segmentation through clustering
 
I1803026164
I1803026164I1803026164
I1803026164
 
Clustering.pdf
Clustering.pdfClustering.pdf
Clustering.pdf
 
A comprehensive survey of contemporary
A comprehensive survey of contemporaryA comprehensive survey of contemporary
A comprehensive survey of contemporary
 
cluster.pptx
cluster.pptxcluster.pptx
cluster.pptx
 
PggLas12
PggLas12PggLas12
PggLas12
 
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
 
Project
ProjectProject
Project
 
EWIC talk - 07 June, 2018
EWIC talk - 07 June, 2018EWIC talk - 07 June, 2018
EWIC talk - 07 June, 2018
 
ICED 2013 A
ICED 2013 AICED 2013 A
ICED 2013 A
 
[PPT]
[PPT][PPT]
[PPT]
 
A0310112
A0310112A0310112
A0310112
 
Literature Survey On Clustering Techniques
Literature Survey On Clustering TechniquesLiterature Survey On Clustering Techniques
Literature Survey On Clustering Techniques
 
Data analysis05 clustering
Data analysis05 clusteringData analysis05 clustering
Data analysis05 clustering
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 

More from Digital Classicist Seminar Berlin

[DCSB] Aline Deicke (Digital Academy Mainz) From E19 to MATCH and MERGE. Mapp...
[DCSB] Aline Deicke (Digital Academy Mainz) From E19 to MATCH and MERGE. Mapp...[DCSB] Aline Deicke (Digital Academy Mainz) From E19 to MATCH and MERGE. Mapp...
[DCSB] Aline Deicke (Digital Academy Mainz) From E19 to MATCH and MERGE. Mapp...Digital Classicist Seminar Berlin
 
[DCSB] Wolfgang Schmidle et al. (DAI) chronOntology: A time gazetteer with pr...
[DCSB] Wolfgang Schmidle et al. (DAI) chronOntology: A time gazetteer with pr...[DCSB] Wolfgang Schmidle et al. (DAI) chronOntology: A time gazetteer with pr...
[DCSB] Wolfgang Schmidle et al. (DAI) chronOntology: A time gazetteer with pr...Digital Classicist Seminar Berlin
 
[DCSB] Chiara Palladino & Tariq Youssef (Leipzig) iAligner: a tool for syntax...
[DCSB] Chiara Palladino & Tariq Youssef (Leipzig) iAligner: a tool for syntax...[DCSB] Chiara Palladino & Tariq Youssef (Leipzig) iAligner: a tool for syntax...
[DCSB] Chiara Palladino & Tariq Youssef (Leipzig) iAligner: a tool for syntax...Digital Classicist Seminar Berlin
 
[DCSB] Katherine Crawford (Southampton) In the Footsteps of the Gods: network...
[DCSB] Katherine Crawford (Southampton) In the Footsteps of the Gods: network...[DCSB] Katherine Crawford (Southampton) In the Footsteps of the Gods: network...
[DCSB] Katherine Crawford (Southampton) In the Footsteps of the Gods: network...Digital Classicist Seminar Berlin
 
[DCSB] Nathan Gibson (Vanderbilt) Toward a Cyberinfrastructure for Syriac Lit...
[DCSB] Nathan Gibson (Vanderbilt) Toward a Cyberinfrastructure for Syriac Lit...[DCSB] Nathan Gibson (Vanderbilt) Toward a Cyberinfrastructure for Syriac Lit...
[DCSB] Nathan Gibson (Vanderbilt) Toward a Cyberinfrastructure for Syriac Lit...Digital Classicist Seminar Berlin
 
[DCSB] Duncan Keenan-Jones (Glasgow) Digital Experimental Archaeology: Hero o...
[DCSB] Duncan Keenan-Jones (Glasgow) Digital Experimental Archaeology: Hero o...[DCSB] Duncan Keenan-Jones (Glasgow) Digital Experimental Archaeology: Hero o...
[DCSB] Duncan Keenan-Jones (Glasgow) Digital Experimental Archaeology: Hero o...Digital Classicist Seminar Berlin
 
[DCSB] Undine Lieberwirth & Axel Gering (TOPOI) 3D GIS in archaeology – a mic...
[DCSB] Undine Lieberwirth & Axel Gering (TOPOI) 3D GIS in archaeology – a mic...[DCSB] Undine Lieberwirth & Axel Gering (TOPOI) 3D GIS in archaeology – a mic...
[DCSB] Undine Lieberwirth & Axel Gering (TOPOI) 3D GIS in archaeology – a mic...Digital Classicist Seminar Berlin
 
[DCSB] Christian Prager (Bonn) Of Codes, Glyphs and Kings: Tasks, Limits and ...
[DCSB] Christian Prager (Bonn) Of Codes, Glyphs and Kings: Tasks, Limits and ...[DCSB] Christian Prager (Bonn) Of Codes, Glyphs and Kings: Tasks, Limits and ...
[DCSB] Christian Prager (Bonn) Of Codes, Glyphs and Kings: Tasks, Limits and ...Digital Classicist Seminar Berlin
 
[DCSB] Silvia Polla (Topoi) Between Demography and Consumption: Digital and Q...
[DCSB] Silvia Polla (Topoi) Between Demography and Consumption: Digital and Q...[DCSB] Silvia Polla (Topoi) Between Demography and Consumption: Digital and Q...
[DCSB] Silvia Polla (Topoi) Between Demography and Consumption: Digital and Q...Digital Classicist Seminar Berlin
 
[DCSB] Pau de Soto (University of Southampton), “Network Analysis to Understa...
[DCSB] Pau de Soto (University of Southampton), “Network Analysis to Understa...[DCSB] Pau de Soto (University of Southampton), “Network Analysis to Understa...
[DCSB] Pau de Soto (University of Southampton), “Network Analysis to Understa...Digital Classicist Seminar Berlin
 
[DCSB] Torsten Roeder (Julius Maximilian University of Würzburg) and Yury Arz...
[DCSB] Torsten Roeder (Julius Maximilian University of Würzburg) and Yury Arz...[DCSB] Torsten Roeder (Julius Maximilian University of Würzburg) and Yury Arz...
[DCSB] Torsten Roeder (Julius Maximilian University of Würzburg) and Yury Arz...Digital Classicist Seminar Berlin
 
[DCSB] Christian Fron (University of Stuttgart), “Beyond the visual. The acou...
[DCSB] Christian Fron (University of Stuttgart), “Beyond the visual. The acou...[DCSB] Christian Fron (University of Stuttgart), “Beyond the visual. The acou...
[DCSB] Christian Fron (University of Stuttgart), “Beyond the visual. The acou...Digital Classicist Seminar Berlin
 
[DCSB] Silke Vanbeselaere (KU Leuven), “Love Thy (Theban) Neighbours, or how ...
[DCSB] Silke Vanbeselaere (KU Leuven), “Love Thy (Theban) Neighbours, or how ...[DCSB] Silke Vanbeselaere (KU Leuven), “Love Thy (Theban) Neighbours, or how ...
[DCSB] Silke Vanbeselaere (KU Leuven), “Love Thy (Theban) Neighbours, or how ...Digital Classicist Seminar Berlin
 
[DCSB] Jorit Wintjes (University of Würzburg), “Diekplous! – understanding an...
[DCSB] Jorit Wintjes (University of Würzburg), “Diekplous! – understanding an...[DCSB] Jorit Wintjes (University of Würzburg), “Diekplous! – understanding an...
[DCSB] Jorit Wintjes (University of Würzburg), “Diekplous! – understanding an...Digital Classicist Seminar Berlin
 
[DCSB] Gregory Crane (University of Leipzig): "Digital Philology, World Liter...
[DCSB] Gregory Crane (University of Leipzig): "Digital Philology, World Liter...[DCSB] Gregory Crane (University of Leipzig): "Digital Philology, World Liter...
[DCSB] Gregory Crane (University of Leipzig): "Digital Philology, World Liter...Digital Classicist Seminar Berlin
 
[DCSB] Chris Forstall, Lavinia Galli Milić (University of Geneva): "Thematic ...
[DCSB] Chris Forstall, Lavinia Galli Milić (University of Geneva): "Thematic ...[DCSB] Chris Forstall, Lavinia Galli Milić (University of Geneva): "Thematic ...
[DCSB] Chris Forstall, Lavinia Galli Milić (University of Geneva): "Thematic ...Digital Classicist Seminar Berlin
 
Kathryn Piquette (U of Cologne), "The Herculaneum Papyri and Greek Magical Te...
Kathryn Piquette (U of Cologne), "The Herculaneum Papyri and Greek Magical Te...Kathryn Piquette (U of Cologne), "The Herculaneum Papyri and Greek Magical Te...
Kathryn Piquette (U of Cologne), "The Herculaneum Papyri and Greek Magical Te...Digital Classicist Seminar Berlin
 
[DCSB] Gabriel Bodard / Faith Lawrence (KCL), "Standards for Networking Ancie...
[DCSB] Gabriel Bodard / Faith Lawrence (KCL), "Standards for Networking Ancie...[DCSB] Gabriel Bodard / Faith Lawrence (KCL), "Standards for Networking Ancie...
[DCSB] Gabriel Bodard / Faith Lawrence (KCL), "Standards for Networking Ancie...Digital Classicist Seminar Berlin
 
[DCSB] Tom Brughmans (U of Konstanz), "Roman bazaar or market economy? Explai...
[DCSB] Tom Brughmans (U of Konstanz), "Roman bazaar or market economy? Explai...[DCSB] Tom Brughmans (U of Konstanz), "Roman bazaar or market economy? Explai...
[DCSB] Tom Brughmans (U of Konstanz), "Roman bazaar or market economy? Explai...Digital Classicist Seminar Berlin
 
[DCSB] Yannick Anné and Toon Van Hal (U of Leuven), "Creating a Dynamic Gramm...
[DCSB] Yannick Anné and Toon Van Hal (U of Leuven), "Creating a Dynamic Gramm...[DCSB] Yannick Anné and Toon Van Hal (U of Leuven), "Creating a Dynamic Gramm...
[DCSB] Yannick Anné and Toon Van Hal (U of Leuven), "Creating a Dynamic Gramm...Digital Classicist Seminar Berlin
 

More from Digital Classicist Seminar Berlin (20)

[DCSB] Aline Deicke (Digital Academy Mainz) From E19 to MATCH and MERGE. Mapp...
[DCSB] Aline Deicke (Digital Academy Mainz) From E19 to MATCH and MERGE. Mapp...[DCSB] Aline Deicke (Digital Academy Mainz) From E19 to MATCH and MERGE. Mapp...
[DCSB] Aline Deicke (Digital Academy Mainz) From E19 to MATCH and MERGE. Mapp...
 
[DCSB] Wolfgang Schmidle et al. (DAI) chronOntology: A time gazetteer with pr...
[DCSB] Wolfgang Schmidle et al. (DAI) chronOntology: A time gazetteer with pr...[DCSB] Wolfgang Schmidle et al. (DAI) chronOntology: A time gazetteer with pr...
[DCSB] Wolfgang Schmidle et al. (DAI) chronOntology: A time gazetteer with pr...
 
[DCSB] Chiara Palladino & Tariq Youssef (Leipzig) iAligner: a tool for syntax...
[DCSB] Chiara Palladino & Tariq Youssef (Leipzig) iAligner: a tool for syntax...[DCSB] Chiara Palladino & Tariq Youssef (Leipzig) iAligner: a tool for syntax...
[DCSB] Chiara Palladino & Tariq Youssef (Leipzig) iAligner: a tool for syntax...
 
[DCSB] Katherine Crawford (Southampton) In the Footsteps of the Gods: network...
[DCSB] Katherine Crawford (Southampton) In the Footsteps of the Gods: network...[DCSB] Katherine Crawford (Southampton) In the Footsteps of the Gods: network...
[DCSB] Katherine Crawford (Southampton) In the Footsteps of the Gods: network...
 
[DCSB] Nathan Gibson (Vanderbilt) Toward a Cyberinfrastructure for Syriac Lit...
[DCSB] Nathan Gibson (Vanderbilt) Toward a Cyberinfrastructure for Syriac Lit...[DCSB] Nathan Gibson (Vanderbilt) Toward a Cyberinfrastructure for Syriac Lit...
[DCSB] Nathan Gibson (Vanderbilt) Toward a Cyberinfrastructure for Syriac Lit...
 
[DCSB] Duncan Keenan-Jones (Glasgow) Digital Experimental Archaeology: Hero o...
[DCSB] Duncan Keenan-Jones (Glasgow) Digital Experimental Archaeology: Hero o...[DCSB] Duncan Keenan-Jones (Glasgow) Digital Experimental Archaeology: Hero o...
[DCSB] Duncan Keenan-Jones (Glasgow) Digital Experimental Archaeology: Hero o...
 
[DCSB] Undine Lieberwirth & Axel Gering (TOPOI) 3D GIS in archaeology – a mic...
[DCSB] Undine Lieberwirth & Axel Gering (TOPOI) 3D GIS in archaeology – a mic...[DCSB] Undine Lieberwirth & Axel Gering (TOPOI) 3D GIS in archaeology – a mic...
[DCSB] Undine Lieberwirth & Axel Gering (TOPOI) 3D GIS in archaeology – a mic...
 
[DCSB] Christian Prager (Bonn) Of Codes, Glyphs and Kings: Tasks, Limits and ...
[DCSB] Christian Prager (Bonn) Of Codes, Glyphs and Kings: Tasks, Limits and ...[DCSB] Christian Prager (Bonn) Of Codes, Glyphs and Kings: Tasks, Limits and ...
[DCSB] Christian Prager (Bonn) Of Codes, Glyphs and Kings: Tasks, Limits and ...
 
[DCSB] Silvia Polla (Topoi) Between Demography and Consumption: Digital and Q...
[DCSB] Silvia Polla (Topoi) Between Demography and Consumption: Digital and Q...[DCSB] Silvia Polla (Topoi) Between Demography and Consumption: Digital and Q...
[DCSB] Silvia Polla (Topoi) Between Demography and Consumption: Digital and Q...
 
[DCSB] Pau de Soto (University of Southampton), “Network Analysis to Understa...
[DCSB] Pau de Soto (University of Southampton), “Network Analysis to Understa...[DCSB] Pau de Soto (University of Southampton), “Network Analysis to Understa...
[DCSB] Pau de Soto (University of Southampton), “Network Analysis to Understa...
 
[DCSB] Torsten Roeder (Julius Maximilian University of Würzburg) and Yury Arz...
[DCSB] Torsten Roeder (Julius Maximilian University of Würzburg) and Yury Arz...[DCSB] Torsten Roeder (Julius Maximilian University of Würzburg) and Yury Arz...
[DCSB] Torsten Roeder (Julius Maximilian University of Würzburg) and Yury Arz...
 
[DCSB] Christian Fron (University of Stuttgart), “Beyond the visual. The acou...
[DCSB] Christian Fron (University of Stuttgart), “Beyond the visual. The acou...[DCSB] Christian Fron (University of Stuttgart), “Beyond the visual. The acou...
[DCSB] Christian Fron (University of Stuttgart), “Beyond the visual. The acou...
 
[DCSB] Silke Vanbeselaere (KU Leuven), “Love Thy (Theban) Neighbours, or how ...
[DCSB] Silke Vanbeselaere (KU Leuven), “Love Thy (Theban) Neighbours, or how ...[DCSB] Silke Vanbeselaere (KU Leuven), “Love Thy (Theban) Neighbours, or how ...
[DCSB] Silke Vanbeselaere (KU Leuven), “Love Thy (Theban) Neighbours, or how ...
 
[DCSB] Jorit Wintjes (University of Würzburg), “Diekplous! – understanding an...
[DCSB] Jorit Wintjes (University of Würzburg), “Diekplous! – understanding an...[DCSB] Jorit Wintjes (University of Würzburg), “Diekplous! – understanding an...
[DCSB] Jorit Wintjes (University of Würzburg), “Diekplous! – understanding an...
 
[DCSB] Gregory Crane (University of Leipzig): "Digital Philology, World Liter...
[DCSB] Gregory Crane (University of Leipzig): "Digital Philology, World Liter...[DCSB] Gregory Crane (University of Leipzig): "Digital Philology, World Liter...
[DCSB] Gregory Crane (University of Leipzig): "Digital Philology, World Liter...
 
[DCSB] Chris Forstall, Lavinia Galli Milić (University of Geneva): "Thematic ...
[DCSB] Chris Forstall, Lavinia Galli Milić (University of Geneva): "Thematic ...[DCSB] Chris Forstall, Lavinia Galli Milić (University of Geneva): "Thematic ...
[DCSB] Chris Forstall, Lavinia Galli Milić (University of Geneva): "Thematic ...
 
Kathryn Piquette (U of Cologne), "The Herculaneum Papyri and Greek Magical Te...
Kathryn Piquette (U of Cologne), "The Herculaneum Papyri and Greek Magical Te...Kathryn Piquette (U of Cologne), "The Herculaneum Papyri and Greek Magical Te...
Kathryn Piquette (U of Cologne), "The Herculaneum Papyri and Greek Magical Te...
 
[DCSB] Gabriel Bodard / Faith Lawrence (KCL), "Standards for Networking Ancie...
[DCSB] Gabriel Bodard / Faith Lawrence (KCL), "Standards for Networking Ancie...[DCSB] Gabriel Bodard / Faith Lawrence (KCL), "Standards for Networking Ancie...
[DCSB] Gabriel Bodard / Faith Lawrence (KCL), "Standards for Networking Ancie...
 
[DCSB] Tom Brughmans (U of Konstanz), "Roman bazaar or market economy? Explai...
[DCSB] Tom Brughmans (U of Konstanz), "Roman bazaar or market economy? Explai...[DCSB] Tom Brughmans (U of Konstanz), "Roman bazaar or market economy? Explai...
[DCSB] Tom Brughmans (U of Konstanz), "Roman bazaar or market economy? Explai...
 
[DCSB] Yannick Anné and Toon Van Hal (U of Leuven), "Creating a Dynamic Gramm...
[DCSB] Yannick Anné and Toon Van Hal (U of Leuven), "Creating a Dynamic Gramm...[DCSB] Yannick Anné and Toon Van Hal (U of Leuven), "Creating a Dynamic Gramm...
[DCSB] Yannick Anné and Toon Van Hal (U of Leuven), "Creating a Dynamic Gramm...
 

[DCSB] Dr Diego Jiménez-Badillo (INAH, Mexico), "Classifying Formal Features of Archaeological Artefacts through the Application of Spectral Clustering".

  • 1. Analyzing formal features of archaeological artefacts through the application of Spectral Clustering Diego Jiménez-Badillo diego_jimenez@inah.gob.mx National Institute of Anthropology and History México City Salvador Ruíz Correa Omar Méndoza-Montoya src@cimat.mx omendoz@cimat.mx Centro de Investigaciones en Matemáticas Centro de Investigaciones en Matemáticas Guanajuato, Mexico Guanajuato, Mexico
  • 2. Introduction  This paper is part of a broader effort to introduce the archaeological community to a range of computer tools and mathematical algorithms for analyzing archaeological collections. These include: 1. Application of clustering techniques for unsupervised learning. 2. Acquisition and analysis of 3D digital models. 3. Application of computer vision algorithms for automatic recognition of artefacts. 4. Automatic classification of shape features.
  • 3. This presentation  Here, we will focus on a new methodology for the analysis of archaeological masks based on a quantitative procedure called Spectral Clustering.  This technique has not been applied before in archaeology despite its proven performance for partitioning a collection of artifacts into meaningful groups.
  • 4. The Mask Collection from the Great Temple of Tenochtitlan Study case
  • 5. The idea for this project came from the need to classify similarities in 162 stone masks found in the remains of the Sacred Precinct of Tenochtitlan, the main ceremonial Aztec complex, located in Mexico City. 
  • 6.
  • 7.
  • 8. The schematic features of these objects set them apart from other more “naturalistic” style artifacts. Their appearance has attracted the attention of many specialists and during the last three decades these masks have been the subject of intense debate for two main reasons:
  • 9. These masks are interesting for several reasons:  First, 220 figurines and 162 masks were located in 14 Aztec offerings dating from 1390 to 1469 A.C., yet they do not show typical “Aztec features”.
  • 10. Indeed, their appearance resembles artefacts from Teotihuacan and from the southern State of Guerrero, particularly from the Mezcala region, which is hundreds of kilometers away from the ancient Tenochtitlan.
  • 11.  Second, it is not clear how many Guerrero/Mezcala styles exist: Some specialists believe there are at least five1 different traditions while others recognize only four2, and another group of researchers sees only two3 (Serra Puche 1975). 1Covarrubias 1948, 1961; Olmedo and Gonzalez 1986; Gonzalez and Olmedo 1990 2 Gay, 1967 3 Serra Puche 1975
  • 12. More objective methods are needed to answer questions such as:  How many styles were developed in the Guerrero/Mezcala regions?  How many of these styles coexisted?  Were some styles contemporary with the Aztecs?  Were some of these masks manufactured by the Aztecs?  Which specific styles are represented among the 162 masks found in the Sacred Precinct of Tenochtitlan?
  • 14. The application of clustering in archaeology One of the most important applications of clustering in archaeology is “unsupervised learning”, that is the discovery of artifact groups based exclusively on the analysis of characteristics observed in the artifacts themselves. Once the collection has been segmented into several groups, it would become easier to distinguish potential classes.
  • 15. Clustering basics Given a set of data-points, any clustering algorithm seeks to separate those points into a finite quantity of groups (i.e., clusters). It applies an objective similarity function to weigh how close (similar) or distant (dissimilar) the original data are among themselves. Items assigned to the same group must be highly similar. At the same time, items belonging to different groups must be highly dissimilar. The quality of the algorithm is judged by how successful it is to accomplish: a) greater homogeneity within a group, and b) greater heterogeneity between groups.
  • 16. Two clustering approaches are very popular: 1. Component linkage algorithms (i.e. single and total linkage) are based on thresholding pairwise distances and are best suited for discovering complex elongated clusters, but are very sensitive to noise in the data. 2. K-means algorithms, on the other hand, are very robust to noise but are best suited for rounded linearly separable clusters.
  • 17. Boundary shapes  Some years ago, Olmedo and Gonzalez (1986) proposed a classification based on a component- linkage algorithm (numerical taxonomy).  The forms of faces, eyes, noses, eyebrows, etc. were codified categorically.  This produced an input of 23 shape attributes for each one of 162 masks. Olmedo and Gonzalez. 1986. Presencia del Estilo Mezcala en el Templo Mayor. INAH, México .
  • 18. Eyebrows shapes Noise shapes
  • 19. Olmedo and González results  This lead to the identification of 40 groups, of which:  13 groups include only 2 masks  13 groups include only 3 masks  6 groups include 4 masks  20 masks were not included in any group
  • 21. SPECTRAL CLUSTERING Spectral Clustering is a state-of-the-art technique for exploratory analysis. It is derived from Spectral Graph Theory, a branch of mathematics that studies the properties of graphs in relation to the eigenvalues and eigenvectors of an especial type of matrix called Laplacian. In contrast to other techniques, spectral clustering can be applied to different kinds of data, including categorical data. In many situations, categorical data is often the only source of information for archaeological unsupervised learning. Therefore we believe it could bring great benefits in our field.
  • 22. Here, we won’t dive too deep into mathematical theory. Instead, we follow a very simple example to make the logic of this clustering technique easier to understand. Mathematical proofs can be found in the literature referenced in the paper, especially Luxburg (2007). Spectral Clustering is more efficient than linkage and k- means algorithms. It finds elongated clusters and is more robust to noise than linkage algorithms.
  • 23. Spectral clustering logic Spectral clustering seeks to identify groups by analyzing no the exact location of the data points (like single, total linkage and k- means), but the connectedness between them.
  • 24.  Spectral clustering relies on a graph representation of the data set.  In this graph each mask is represented as a vertex. Then, we calculate a measure of similarity (affinity) between all the masks. This is done in two steps:  First, calculate the so-called Hamming distance, which measures the percentage of non-shared attributes between two artefacts. Obviously, this measure is very useful to work with categorical data. The formula is:
  • 25.  Second, calculate the affinity between all the masks by applying the Gaussian function to the Hamming distance: σ = Threshold to control the desired level of similarity  If two objects are very different the result of the equation is negative and close to or equal to zero. On the contrary, if two objects are very similar the result of the equation is near or equal to 1.
  • 26.  The next step is to draw a link between two vertices (i.e. masks) if the similarity between them is positive or larger than a certain threshold controlled by the parameter sigma.
  • 27. Next, we weight each edge of the graph with its corresponding similarity score. This step produces a matrix in which the affinity (i.e. similarity) of all objects with all others is recorded.
  • 28.  Once this is done, the clustering problem can be reformulated as finding an optimal cutting of the graph. This means finding a partition of the graph such that the edges linking objects of the same group have very high weights (i.e. are quite similar), while the edges between groups have low weighs (i.e. are very dissimilar).
  • 29.
  • 30. THE CUTTING PROBLEM Finding an optimal cutting of the graph is what mathematicians call an NP-hard problem. This means that it cannot be found in real time. Therefore, we need to find an approximate solution. One type of relaxation is based on analyzing the eigenestructure of the Laplacian matrix associated to the similarity graph. Mathematical proofs can be found in the extensive literature on the subject.
  • 31. LAPLACIAN MATRIX The so-called Laplacian matrix is a transformation of the similarity matrix that shows more clearly the structure of the dataset. To build a Laplacian Matrix we need two ingredients: a. A matrix D containing information of the connectivity of the similarity graph. This is called the Degree matrix or Diagonal Matrix. b. The similarity matrix S produced in step 2. The simple Laplacian matrix satisfies the following equation: L=D-S
  • 32. EIGENSTRUCTURE As you know “eigen” is a prefix that means “innate”, “own”, and “characteristic”. Thus, studying the eigenstructure of matrices serves the purpose of revealing the intrinsic nature of the data contained in those matrices. Looking at the eigenstructure we can identify the best possible cuts for the graph and therefore the best clustering partition. The eigenstructure is given by eigenvalues and eigenvectors.
  • 33. EIGENVALUES AND EIGENVECTORS Given a square symmetric matrix S, we say that λ (lambda) is an eigenvalue of S if there exists a non-zero vector x such that: Sx = λx equation (2). In equation (2), x represents the eigenvector associated to the eigenvalue λ and both constitute an eigenpair for the matrix S.
  • 34. GRAPH SPECTRUM Each eigenvector has its associated eigenvalue. So there are as many eigenvectors as eigenvalues. The spectrum of the graph is precisely the set of all eigenvalues that satisfy equation (2). Such property is invariant with respect to the orientation of the data set. Calculating eigenpairs for a large matrix is a daunting task. Archaeologists do not need to worry about how to do it, as many computer math-libraries provide appropriate tools. As part of this project, we have implemented a tool-box to perform spectral clustering.
  • 35. A very basic example  We try to partition a data set consisting of 9 objects. We first built a 9 x 9 square table, in which both rows and columns enumerate each one of the 9 objects of this example. Then, we calculate the affinity between the objects by applying equation 1 and input the resulting values into the table.
  • 36. If we codify affinity values with shades of color, then the affinity matrix will have bolder colors in the cells with high affinity and light or no color in those with low affinity.
  • 37.  The interesting thing to notice is that columns 1, 2, and 3 of the affinity matrix look exactly the same.  An obvious conclusion is that objects belonging to the same class (cluster) will have similar affinity vectors, which will be quite different of the affinity vectors of other classes.
  • 38. In this picture the affinity matrix shows clearly 2 clusters in the data. In this picture the affinity matrix shows 4 clusters in the data.
  • 39. Therefore, by finding the characteristic vectors in the matrix we would be able to identify the clusters in a collection. In our example, the three dominant vectors would be the ones illustrated here. These are known in mathematical terms as the eigenvectors of the matrix. There are other vectors, but these tell us nothing about the clustering structure of the data.
  • 40. Finally, if we use those vectors as axes of a new coordinate system and map the original data- points into such space, then we would visually appreciate the 3 clusters in a more easy way.
  • 41. Of course, not all data sets are as clearly structured as the example. In most cases, it is necessary to perform the eigenstructure analysis using complex Linear Algebra algorithms. Details of those techniques can be found in the extensive literature on the subject, especially in Alpert, Kahng, and Yao (1999), Shi and Malik (2000), Ng, Jordan and Weiss (2001), Melia and Shi (2001), Zelnik-Manor and Prona (2004), Bach and Jordan (2006, 2008), Azranand and Ghahramani (2006b), Yan et al. (2009). Fortunately, archaeologists do not need to implement these methods, as we have produced a computer program that performs Spectral Clustering automatically.
  • 42. Previous clustering Results from component linkage
  • 43. COMPONENT LINKAGE FAILURES  Dataset of 162 stone masks  40 clusters (too many!)  10 well defined clusters (too few)  6 clusters not very well defined  Sparse clusters: 13 clusters have only 2 elements.  20 masks are not clustered (i.e. 20 clusters have only 1 element).
  • 45. Component Linkage : Cluster 4
  • 47. Component Linkage: Cluster 37 (acceptable)
  • 48. Again, Cluster 37 (as it should be)
  • 50. If spectral clustering is efficient and robust, we should find better-defined clusters
  • 51. Application; K = 25; sigma = 1.0 Results from spectral clustering
  • 61. FUTURE WORK  Spectral clustering relies on two parameters 1. k (desired number of clusters) 2. σ (similarity threshold)  Our next goal is to apply another algorithm to “learn” the value of k directly from the dataset.
  • 63. Conclusions  Applying Spectral Clustering to the Mezcala collection have produced encouraging results. We were able to partition the mask collection into 23 well-defined groups, which is a better result than the 40 clusters obtained with Numerical taxonomy.  We illustrated the 23 groups of Mezcala masks. The reader would notice the great performance of the Spectral Clustering algorithm, especially by comparing clusters 3, 11, 12, 14, 16, 18, 25, and some others. Such groups are defined by highly similar masks. Cluster 12, for example, contains masks with triangular faces made in highly polished stone. In contrast, cluster 3 contains square masks, most of which with perforated eyes and less polished material that the ones in group 12.
  • 64.  Furthermore, each one of the groups identified with Spectral Clustering is clearly different from the rest, which allows us trusting the partition. Only 2 masks (labeled here as “clusters” 7 and 17) were left un-clustered, which represents a better result than the one obtained by numerical taxonomy in which 20 masks were isolated.  Therefore, we believe that Spectral Clustering may have a future role in archaeology, especially as a first step in analyzing shape features of complex collections.
  • 65. THANKS FOR YOUR ATTENTION