1. Un-Dusting the Foundations of
Compositional Analysis Approaches of
Ceramic Archaeological Data
Elisavet Charalambous
NARNIA ESR08, EuroCy Innovations
Department of Electrical and Computer Engineering,
University of Cyprus
liza.charalambous@eurocyinnovations.com
charalambous.elisavet@ucy.ac.cy
2. The Classification Problem
Classification of archaeological ceramics deals with the
isolation of ceramic groups with similar chemical profiles.
Given a number of artifacts of known fabric/category,
identify to which of a set of categories an uncategorized
artifact belongs.
This is an instance of supervised learning, which assumes that a
training set of correctly identified observations is available.
Algorithms that perform classification require the
provision of a set with known labels.
The labels of uncategorized observations are determined by
processing the already classified material.
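As a minimal sketch of this supervised setting, the following assigns an uncategorized observation the majority label among its k nearest labelled neighbours. The function name `knn_predict`, the toy feature vectors and the labels "A" and "B" are hypothetical. The plain Euclidean default suits only this toy data; the `distance` argument is meant to be swapped for a measure appropriate to the data at hand.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3, distance=math.dist):
    """Return the majority label among the k training points closest to x."""
    # Rank the labelled observations by their distance to the new observation.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], x))
    # Let the k closest labelled observations vote.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: two well-separated groups.
train_X = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9)]
train_y = ["A", "A", "B", "B"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))
print(knn_predict(train_X, train_y, (5.0, 5.0)))
```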
3. Compositional Data Analysis
Compositional data do not vary independently.
Concentration-based approaches to data analysis
can lead to faulty conclusions.
Compositional data lie in the
constrained simplex space.
Correlation analysis and the Euclidean distance
are not mathematically meaningful concepts in
this context.
XY plots for raw or log-transformed data
should be used only with care, in an
exploratory data analysis (EDA) sense.
… cautious with the belief that good data will speak for
themselves
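One common way to respect the simplex constraint is the centred log-ratio (clr) transform: the Euclidean distance between clr coordinates is the Aitchison distance. The sketch below assumes strictly positive parts and uses hypothetical three-part compositions; the slides do not state which log-ratio transform was actually applied, so the function names `clr` and `aitchison_distance` are illustrative only.

```python
import math


def clr(parts):
    """Centred log-ratio transform of a composition with strictly positive parts."""
    if any(p <= 0 for p in parts):
        raise ValueError("clr needs strictly positive parts; zeros must be handled first")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log removes any common scale factor.
    return [v - mean_log for v in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr coordinates of two compositions."""
    return math.dist(clr(x), clr(y))


# Hypothetical compositions, not taken from the dataset.
a = [1.0, 2.0, 7.0]
b = [10.0, 20.0, 70.0]  # same proportions as a, different total
c = [2.0, 2.0, 6.0]

print(aitchison_distance(a, b))  # approximately 0: only the ratios matter
print(math.dist(a, b))           # raw Euclidean distance is far from 0
print(aitchison_distance(a, c))
```

The first two printed values illustrate the point of the slide: two samples with identical proportions are far apart in raw concentrations but coincide once only the ratios between parts are compared.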
4. Hypothesis Formulation
The development of an experimental design for the
classification of data from different datasets that span
several periods.
The proposed design is tested with the deployment of
three well-known classification algorithms.
Test 1: Null Hypothesis
The classification algorithms k-Nearest Neighbour, C4.5 (based on Decision
Trees) and Learning Vector Quantization (LVQ networks) perform equally
when analyzing ceramic compositional data.
Test 2: Alternative Hypothesis
The pairwise performance of the algorithms is not equal, and one algorithm
outperforms the others.
6. The Dataset
177 samples of utilitarian pottery
found in Cyprus, analyzed with ED-XRF
Dated to the Philia phase as well as the Early and
Middle Bronze Ages
Categorized into 36 classes, including samples
classified as outliers by the expert
[Figure: Histogram of Class Distribution, by Class Label]
7. Experiment Overview
The nature of the problem imposes constraints which lead to
the deployment of the following practices:
The Aitchison distance is used when classification takes
place in the Euclidean (real) space
Resampling with bootstrapping for the generation of
statistics
Fine-tuning of the parameters of each algorithm
Evaluation of the classification results based on the classification
accuracy and the Jaccard Index (an external cluster validity
index)
Significance testing at the 0.05 level:
5x2 cross-validation paired t-test
Combined 5x2 cross-validation F test
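The two significance tests named above can be sketched as follows, assuming their usual formulation: five replications of 2-fold cross-validation give ten differences in error rate between two algorithms, from which a t statistic (5 degrees of freedom) and a combined F statistic (10 and 5 degrees of freedom) are computed. The function name `five_by_two_cv_stats` and the example differences are hypothetical; the critical values are the tabulated ones for the 0.05 level.

```python
import math

T_CRIT_5DF = 2.571   # two-sided 0.05 critical value, Student t with 5 d.o.f.
F_CRIT_10_5 = 4.74   # upper 0.05 critical value, F with (10, 5) d.o.f.


def five_by_two_cv_stats(diffs):
    """Return (t, f) from 5 pairs of per-fold error-rate differences.

    diffs[i] = (p1, p2): difference between the two algorithms' error rates
    on fold 1 and fold 2 of replication i.
    """
    if len(diffs) != 5 or any(len(d) != 2 for d in diffs):
        raise ValueError("expected 5 replications with 2 folds each")
    # Per-replication variance estimate of the difference.
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    var_sum = sum(variances)
    if var_sum == 0:
        raise ValueError("all replications have zero variance; statistics undefined")
    # Paired t statistic: first difference over the pooled standard deviation.
    t = diffs[0][0] / math.sqrt(var_sum / 5)
    # Combined F statistic: uses all ten squared differences.
    f = sum(p1 ** 2 + p2 ** 2 for p1, p2 in diffs) / (2 * var_sum)
    return t, f


# Hypothetical error-rate differences, not results from the experiment.
example = [(0.10, 0.20), (0.20, 0.10), (0.10, 0.20), (0.20, 0.10), (0.10, 0.20)]
t_stat, f_stat = five_by_two_cv_stats(example)
print("t test rejects equal performance:", abs(t_stat) > T_CRIT_5DF)
print("F test rejects equal performance:", f_stat > F_CRIT_10_5)
```

The t statistic depends on a single one of the ten differences in its numerator, whereas the F statistic pools all of them, which is one reason to report both.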
8. Null Hypothesis
        k-NN     C4.5     LVQ
k-NN    -        Accept   Reject
C4.5    Accept   -        Reject
LVQ     Reject   Reject   -
Results of the experiment
[Figure: Classification Accuracy (%) by algorithm: KNN, C4.5, LVQ]
        Classification Accuracy (%)    Jaccard Index (%)
        Mean    Max     Min            Mean    Max     Min
k-NN    72.1    79.4    64.2           56.7    70.1    42.7
C4.5    68.5    77.2    61.7           49.1    63.7    38
LVQ     55.8    65.2    46.2           30.3    38.8    21
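For readers unfamiliar with the Jaccard Index as an external validity index, a minimal sketch follows. It assumes the pair-counting form, in which every pair of samples is checked for being grouped together in the reference labelling, in the predicted labelling, or in both; the slides do not spell out the exact variant used. The function name `pair_jaccard` and the toy labels are hypothetical.

```python
from itertools import combinations


def pair_jaccard(true_labels, pred_labels):
    """Pair-counting Jaccard index between two labellings of the same samples."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("both labellings must cover the same samples")
    both = only_true = only_pred = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            both += 1        # pair grouped together in both labellings
        elif same_true:
            only_true += 1   # together in the reference only
        elif same_pred:
            only_pred += 1   # together in the prediction only
    denom = both + only_true + only_pred
    # Convention assumed here: no co-grouped pairs at all counts as full agreement.
    return both / denom if denom else 1.0


# Hypothetical labels for four samples.
reference = ["a", "a", "b", "b"]
predicted = ["a", "a", "b", "c"]
print(pair_jaccard(reference, predicted))
```

Unlike accuracy, this index ignores pairs that are separated in both labellings, so it does not reward an algorithm merely for keeping unrelated samples apart.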
9. Classification Beyond Labeling
Analysis of misclassification patterns may reveal
relationships between classes.
Observations in the results have shown the following
pattern:
Elements of class M1.II, if misclassified, would be allocated
to class M1.III. The same holds between classes M1.IV and
M1.VIII, M1.VIII and M1.XII, and Ph.I and M1.I.
Confusion Matrix
          M1.I    M1.II   M1.III  M1.IV
M1.I      6       0       0       0
M1.II     0       6       2       0
M1.III    0       2       5       0
M1.IV     0       0       0       9
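Such patterns can be read off a confusion matrix mechanically by listing its non-zero off-diagonal cells. The sketch below uses the four-class excerpt shown above; the function name `confused_pairs` is hypothetical, and the excerpt covers only 4 of the 36 classes, so it says nothing about overall accuracy.

```python
labels = ["M1.I", "M1.II", "M1.III", "M1.IV"]
matrix = [
    [6, 0, 0, 0],
    [0, 6, 2, 0],
    [0, 2, 5, 0],
    [0, 0, 0, 9],
]


def confused_pairs(labels, matrix):
    """List (row class, column class, count) for every non-zero off-diagonal cell."""
    return [
        (labels[r], labels[c], count)
        for r, row in enumerate(matrix)
        for c, count in enumerate(row)
        if r != c and count > 0
    ]


for row_class, col_class, count in confused_pairs(labels, matrix):
    print(f"{row_class} <-> {col_class}: {count}")
```

For this excerpt the only off-diagonal counts lie between M1.II and M1.III, in both directions, which matches the pattern described above.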
10. What if there is more?
Trace elements may contribute more characteristically to
determining the fingerprint of a deposit.
The discussed experiment was also run for trace
and main elements separately, allowing us to hypothesize that
trace elements might contain additional information which we
could consider.
Classification using the Main Elements
        Classification Accuracy (%)    Jaccard Index (%)
        Mean    Max     Min            Mean    Max     Min
KNN     73.2    79.5    67.6           52.9    64.8    40.5
C4.5    67.8    75.1    62.7           43.2    56.8    32.8
LVQ     57.3    67      48.8           29.3    36.6    20.9
Classification using just the Trace Elements
Classification Accuracy (%)    Jaccard Index (%)
Mean    Max     Min            Mean    Max     Min
66.6    75      58.2           47.8    57.7    35.7
64.5    71.2    55.8           46.1    58.1    35.6
59.1    66.2    51.4           40.2    52.2    26.1
11. Last words....
Data transformation should be used only when fully
understood and in line with the analysis objective.
Analysis results may be misleading due to incomplete
data; therefore, exploratory analysis prior to any other
analysis is crucial for gaining insight into the problem.
Machine learning techniques and statistical analysis
can be very useful if used appropriately:
It is necessary to consider the assumptions each method
imposes
It is important to maintain consistency
Ensure there are no conflicting constraints
12. Thank you for your attention!
Comments and Questions are Welcome!