Un-Dusting the Foundations of
Compositional Analysis Approaches of
Ceramic Archaeological Data
Elisavet Charalambous
NARNIA ESR08, EuroCy Innovations
Department of Electrical and Computer Engineering,
University of Cyprus
liza.charalambous@eurocyinnovations.com
charalambous.elisavet@ucy.ac.cy
The Classification Problem
Classification of archaeological ceramics deals with isolating ceramic groups of similar chemical profiles.
• Given a number of artifacts of known fabric/category, identify to which of a set of categories an uncategorized artifact belongs.
• This is an instance of supervised learning, which assumes that a training set of correctly identified observations is available.
• Classification algorithms therefore require a set with known labels.
• The labels of uncategorized observations are determined by processing the already classified material.
Compositional Data Analysis
• Compositional data do not vary independently: the parts of a composition are constrained to a constant sum (e.g. 100%).
• Concentration-based approaches to data analysis can lead to faulty conclusions.
• Compositional data lie in the constrained simplex space.
• Correlation analysis and the Euclidean distance are not mathematically meaningful concepts in this context.
• XY plots of raw or log-transformed data should be used only with care, in an exploratory data analysis (EDA) sense.
Be cautious of the belief that good data will speak for themselves.
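The point about the Euclidean distance can be made concrete with a minimal sketch, assuming strictly positive parts (zeros would have to be handled first). The Aitchison distance is computed here as the Euclidean distance between centred log-ratio (clr) coordinates, so it does not depend on the unit of measurement or on the closing constant, whereas the plain Euclidean distance does. The helper names and the two compositions are hypothetical, chosen only for illustration.

```python
import numpy as np

def closure(x, total=100.0):
    """Rescale each composition (row) so that its parts sum to `total`."""
    x = np.asarray(x, dtype=float)
    return total * x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centred log-ratio: log of each part minus the mean log of its row."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def aitchison_distance(x, y):
    """Aitchison distance: Euclidean distance between clr coordinates."""
    return float(np.linalg.norm(clr(x) - clr(y)))

# Two hypothetical 4-part compositions in wt% (illustrative values only)
a = np.array([60.0, 20.0, 15.0, 5.0])
b = np.array([55.0, 25.0, 12.0, 8.0])

# The same two samples expressed on another scale (wt% -> ppm)
a_ppm, b_ppm = a * 1e4, b * 1e4

# The Euclidean distance depends on the unit of measurement ...
print(np.linalg.norm(a - b), np.linalg.norm(a_ppm - b_ppm))
# ... the Aitchison distance does not (up to floating-point rounding)
print(aitchison_distance(a, b), aitchison_distance(a_ppm, b_ppm))
```

Because clr subtracts the mean log of each row, any common rescaling of a sample cancels out, which is why re-closing or changing units leaves the Aitchison distance unchanged.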
Hypothesis Formulation
• Aim: develop an experimental design for classifying data from different datasets spanning several periods.
• The proposed design is tested by deploying three well-known classification algorithms.
• Test 1: Null Hypothesis
The classification algorithms k-Nearest Neighbour (k-NN), C4.5 (based on decision trees) and Learning Vector Quantization (LVQ networks) perform equally well when analyzing ceramic compositional data.
• Test 2: Alternative Hypothesis
The pairwise performance of the algorithms is not equal, and one algorithm outperforms the others.
Classification Algorithms
k-Nearest Neighbour
Learning Vector Quantization
C4.5
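The slides do not give implementation details for the three algorithms. As a minimal sketch of the first one only, assuming clr-transformed compositions, simple majority voting and a hypothetical k = 3, k-NN with the Aitchison distance could look like this (the function name and the toy training data are illustrative, not taken from the study):

```python
from collections import Counter

import numpy as np

def clr(x):
    """Centred log-ratio transform (assumes strictly positive parts)."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def knn_predict(train_x, train_y, query_x, k=3):
    """Assign each query composition the majority label among its k nearest
    training compositions, measured with the Aitchison distance."""
    train_z = clr(train_x)
    preds = []
    for z in clr(query_x):
        # Euclidean distance in clr coordinates = Aitchison distance
        d = np.linalg.norm(train_z - z, axis=1)
        nearest = np.argsort(d, kind="stable")[:k]
        votes = Counter(train_y[i] for i in nearest)
        # Ties go to the label encountered first, i.e. the one with the closest member
        preds.append(votes.most_common(1)[0][0])
    return preds

# Hypothetical training compositions with two fabric labels
train_x = np.array([[60, 20, 15, 5], [58, 22, 15, 5], [30, 40, 20, 10], [28, 42, 20, 10]])
train_y = ["A", "A", "B", "B"]
print(knn_predict(train_x, train_y, [[59, 21, 15, 5], [29, 41, 20, 10]]))  # ['A', 'B']
```

Working in clr coordinates lets an off-the-shelf Euclidean nearest-neighbour search respect the compositional geometry; C4.5 and LVQ would need their own treatment and are not sketched here.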
The Dataset
• 177 samples of utilitarian pottery found in Cyprus, analyzed by energy-dispersive X-ray fluorescence (ED-XRF)
• Dated to the Philia phase and to the Early and Middle Bronze Ages
• Categorized into 36 classes, including samples classified as outliers by the expert
[Figure: Histogram of Class Distribution, by Class Label]
Experiment Overview
The nature of the problem imposes constraints that lead to the following practices:
• The Aitchison distance is used whenever classification takes place in Euclidean (real) space
• Bootstrap resampling to generate statistics
• Fine-tuning of each algorithm's parameters
• Evaluation of classification results based on classification accuracy and the Jaccard index (an external cluster validity index)
• Significance testing at the 0.05 level:
  • 5x2 cross-validation paired t-test
  • Combined 5x2 cross-validation F test
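The slides do not say how the bootstrap was organised. One common scheme, assumed here, trains on a bootstrap sample drawn with replacement and scores on the out-of-bag samples, then summarises the scores as mean, maximum and minimum. The function name and the placeholder evaluator are hypothetical; in practice `evaluate` would fit one of the classifiers and return its accuracy or Jaccard index.

```python
import numpy as np

def bootstrap_scores(n_samples, evaluate, n_boot=100, seed=0):
    """Repeatedly train on a bootstrap sample and score on the out-of-bag samples.

    `evaluate(train_idx, test_idx)` is any user-supplied function that fits a
    classifier on `train_idx` and returns a score (e.g. accuracy) on `test_idx`.
    """
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_boot):
        # Draw n_samples indices with replacement
        train_idx = rng.integers(0, n_samples, size=n_samples)
        # Out-of-bag samples: those never drawn in this replicate
        oob = np.ones(n_samples, dtype=bool)
        oob[train_idx] = False
        test_idx = np.flatnonzero(oob)
        if test_idx.size == 0:
            continue  # no samples left to test on in this replicate
        scores.append(evaluate(train_idx, test_idx))
    scores = np.asarray(scores, dtype=float)
    return {"mean": scores.mean(), "max": scores.max(), "min": scores.min(), "n": scores.size}

# Placeholder evaluator: returns the out-of-bag fraction instead of a real accuracy
stats = bootstrap_scores(177, lambda tr, te: te.size / 177, n_boot=200)
print(stats)  # mean should be near (1 - 1/177)**177, roughly 0.37
```

With 177 samples, roughly a third of the data is left out in each replicate, which gives a held-out set without discarding data permanently.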
Null Hypothesis
        k-NN     C4.5     LVQ
k-NN    –        Accept   Reject
C4.5    Accept   –        Reject
LVQ     Reject   Reject   –
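The Accept/Reject entries come from pairwise significance tests. A minimal sketch of the two tests named in the overview, assuming their standard formulations (Dietterich's 5x2cv paired t-test with 5 degrees of freedom and Alpaydin's combined 5x2cv F test with 10 and 5 degrees of freedom) and that the per-fold differences in error rate between two classifiers are already available. The function name and the example differences are hypothetical; critical values for the 0.05 level are hard-coded to keep the sketch dependency-free.

```python
import numpy as np

# Critical values for the 0.05 significance level used in the slides:
# two-sided Student t with 5 df, and F with (10, 5) df
T_CRIT = 2.5706
F_CRIT = 4.7351

def five_by_two_cv_tests(diffs):
    """5x2cv paired t-test and combined 5x2cv F test.

    diffs[i][j] is the difference in error rate between two classifiers on
    fold j (j = 0, 1) of replication i (i = 0..4) of 2-fold cross-validation.
    """
    p = np.asarray(diffs, dtype=float)
    if p.shape != (5, 2):
        raise ValueError("expected a 5x2 array of differences")
    p_bar = p.mean(axis=1)                         # mean difference per replication
    s2 = ((p - p_bar[:, None]) ** 2).sum(axis=1)   # variance estimate per replication
    t = p[0, 0] / np.sqrt(s2.mean())               # t statistic, 5 degrees of freedom
    f = (p ** 2).sum() / (2.0 * s2.sum())          # F statistic, (10, 5) degrees of freedom
    return {"t": float(t), "reject_t": bool(abs(t) > T_CRIT),
            "F": float(f), "reject_F": bool(f > F_CRIT)}

# Hypothetical differences where one classifier is consistently better
print(five_by_two_cv_tests([[0.10, 0.12], [0.11, 0.09], [0.13, 0.10],
                            [0.12, 0.11], [0.10, 0.12]]))  # both tests reject H0 here
```

The t statistic uses only the first fold's difference in the numerator, while the F test pools all ten differences; the F variant is generally regarded as the more robust of the two, which is one reason to report both.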
Results of the experiment
[Figure: Classification Accuracy (%) of k-NN, C4.5 and LVQ]
Algorithm   Classification Accuracy (%)   Jaccard Index (%)
            Mean    Max     Min           Mean    Max     Min
k-NN        72.1    79.4    64.2          56.7    70.1    42.7
C4.5        68.5    77.2    61.7          49.1    63.7    38
LVQ         55.8    65.2    46.2          30.3    38.8    21
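The slides describe the Jaccard index only as an external cluster validity index. Assuming the usual pair-counting definition, it compares the expert labelling with the predicted labelling over all pairs of samples: J = a / (a + b + c), where a counts pairs grouped together in both, b pairs together only in the reference, and c pairs together only in the prediction. The function name and toy labels are hypothetical; multiplying by 100 gives the percentages reported above.

```python
from itertools import combinations

def jaccard_index(true_labels, pred_labels):
    """Pair-counting Jaccard index between a reference and a predicted labelling."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("labellings must have the same length")
    a = b = c = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            a += 1          # together in both labellings
        elif same_true:
            b += 1          # together only in the reference labelling
        elif same_pred:
            c += 1          # together only in the predicted labelling
    # Convention: if no pair is grouped together in either labelling, they agree fully
    return a / (a + b + c) if (a + b + c) else 1.0

true = ["A", "A", "B", "B"]
pred = ["A", "A", "B", "A"]
print(jaccard_index(true, pred))  # 0.25
```

In this toy case three of four labels are correct (75% accuracy) while the index is 25%, which illustrates how a pair-based index can sit well below accuracy. For 177 samples there are 15,576 pairs, so the quadratic loop remains cheap.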
Classification Beyond Labeling
• Analysis of misclassification patterns may reveal relationships between classes
• The results show the following patterns:
• Samples of class M1.II, when misclassified, were allocated to class M1.III. The same holds between classes M1.IV and M1.VIII, M1.VIII and M1.XII, and Ph.I and M1.I
Confusion Matrix
          M1.I   M1.II   M1.III   M1.IV
M1.I      6      0       0        0
M1.II     0      6       2        0
M1.III    0      2       5        0
M1.IV     0      0       0        9
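The pattern reported for M1.II and M1.III can be read off the matrix mechanically. A minimal sketch using the four-class excerpt above: the matrix is symmetrised so that confusions in either direction are counted together (the slides do not state whether rows are true or predicted classes, and this count does not depend on it). The helper name is hypothetical.

```python
import numpy as np

labels = ["M1.I", "M1.II", "M1.III", "M1.IV"]
# Four-class excerpt of the confusion matrix from the slide
cm = np.array([
    [6, 0, 0, 0],
    [0, 6, 2, 0],
    [0, 2, 5, 0],
    [0, 0, 0, 9],
])

def confused_pairs(cm, labels):
    """List class pairs with misclassifications in either direction, most frequent first."""
    sym = cm + cm.T  # count confusions regardless of direction
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if sym[i, j] > 0:
                pairs.append((labels[i], labels[j], int(sym[i, j])))
    return sorted(pairs, key=lambda p: p[2], reverse=True)

print(confused_pairs(cm, labels))  # [('M1.II', 'M1.III', 4)]
print(np.trace(cm) / cm.sum())     # correctly labelled share of this excerpt: 26 / 30
```

Applied to the full 36-class matrix, the same listing would surface the other pairs named on the slide, such as M1.IV and M1.VIII.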
What if there is more?
Trace elements may contribute more characteristically to determining the fingerprint of a deposit.
• The experiment was also run separately on the trace elements and on the main elements, which leads us to hypothesize that trace elements may contain additional information worth considering.
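Running the experiment on main and trace elements separately amounts to working with two subcompositions. A minimal sketch, assuming all parts are expressed in the same unit and using a hypothetical six-part sample and column split (the actual element list is not given in the slides): each subset is re-closed to a constant sum, which preserves the ratios between its own parts and therefore leaves its clr coordinates, and any Aitchison distances computed from them, unchanged.

```python
import numpy as np

def subcomposition(x, columns, total=100.0):
    """Keep only the selected parts and re-close them to a constant sum."""
    sub = np.asarray(x, dtype=float)[..., columns]
    return total * sub / sub.sum(axis=-1, keepdims=True)

def clr(x):
    """Centred log-ratio transform (assumes strictly positive parts)."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

# Hypothetical 6-part composition, all parts in the same unit; the split
# into "main" and "trace" columns is illustrative only
sample = np.array([[55.0, 18.0, 9.0, 7.0, 0.04, 0.02]])
main_cols, trace_cols = [0, 1, 2, 3], [4, 5]

main = subcomposition(sample, main_cols)
trace = subcomposition(sample, trace_cols)

# Re-closing preserves ratios within the subset, so its clr coordinates are
# the same as those computed directly from the selected raw parts
print(np.allclose(clr(main), clr(sample[:, main_cols])))    # True
print(np.allclose(clr(trace), clr(sample[:, trace_cols])))  # True
```

Each re-closed subset can then be passed to the same classifiers, so the main-element and trace-element results are computed under the same distance.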
Classification using the Main Elements
Algorithm   Classification Accuracy (%)   Jaccard Index (%)
            Mean    Max     Min           Mean    Max     Min
k-NN        73.2    79.5    67.6          52.9    64.8    40.5
C4.5        67.8    75.1    62.7          43.2    56.8    32.8
LVQ         57.3    67      48.8          29.3    36.6    20.9
Classification using just the Trace Elements
Algorithm   Classification Accuracy (%)   Jaccard Index (%)
            Mean    Max     Min           Mean    Max     Min
k-NN        66.6    75      58.2          47.8    57.7    35.7
C4.5        64.5    71.2    55.8          46.1    58.1    35.6
LVQ         59.1    66.2    51.4          40.2    52.2    26.1
Last words...
• Data transformations should be used only when fully understood and in line with the analysis objective
• Analysis results may be misleading when data are incomplete; exploratory analysis before any other analysis is therefore crucial for gaining insight into the problem
• Machine learning techniques and statistical analysis can be very useful if used appropriately:
  • It is necessary to consider the assumptions each method imposes
  • It is important to maintain consistency
  • Ensure there are no conflicting constraints
Thank you for your attention!
Comments and questions are welcome!