Undusting the foundations of compositional analysis approaches of ceramic archaeological data_reviewed

Un-Dusting the Foundations of
Compositional Analysis Approaches of
Ceramic Archaeological Data
Elisavet Charalambous
NARNIA ESR08, EuroCy Innovations
Department of Electrical and Computer Engineering,
University of Cyprus
liza.charalambous@eurocyinnovations.com
charalambous.elisavet@ucy.ac.cy

The Classification Problem
Classification of archaeological ceramics deals with the
isolation of ceramic groups with similar chemical profiles.
• Given a number of artifacts of known fabric/category,
identify to which of a set of categories an uncategorized
artifact belongs.
• This is an instance of supervised learning, which assumes that a
training set of correctly identified observations is available.
• Algorithms that perform classification require a
set of observations with known labels.
• The labels of uncategorized observations are determined by
processing the already classified material.

Compositional Data Analysis
• Compositional data do not vary independently
(the parts sum to a constant)
• Concentration-based approaches to data analysis
can lead to faulty conclusions.
• Compositional data lie in the
constrained simplex space
• Correlation analysis and the Euclidean distance
are not mathematically meaningful concepts in
this context
• XY plots of raw or log-transformed data
should be used only with care, in an
exploratory data analysis (EDA) sense
… cautious with the belief that good data will speak for
themselves
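
To make the simplex constraint concrete, the following minimal sketch shows the centred log-ratio (clr) transform and the Aitchison distance built on it. It is an illustration only, not the code used in the study: the function names and the three-part compositions are hypothetical, and all parts are assumed to be strictly positive (zeros would need separate treatment).

```python
import math

def closure(parts, total=1.0):
    """Rescale positive parts so that they sum to `total` (a point on the simplex)."""
    s = sum(parts)
    return [total * p / s for p in parts]

def clr(parts):
    """Centred log-ratio transform: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr-transformed compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

# Hypothetical three-part compositions (illustrative values, not from the dataset).
x = closure([10.0, 30.0, 60.0])
y = closure([20.0, 30.0, 50.0])

# Re-expressing x in other units (a constant rescaling) leaves the Aitchison
# distance unchanged; a Euclidean distance on the raw values would change.
x_scaled = [1e4 * v for v in x]
print(aitchison_distance(x, y))
print(aitchison_distance(x_scaled, y))
```

The two printed distances agree up to floating-point error, because the clr transform removes any common scale factor.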

Hypothesis Formulation
• The development of an experimental design for the
classification of data from different datasets that span
several periods.
• The proposed design is tested by deploying
three well-known classification algorithms.
• Test 1: Null Hypothesis
The classification algorithms k-Nearest Neighbour, C4.5 (based on decision
trees) and Learning Vector Quantization (LVQ networks) perform equally well
when analyzing ceramic compositional data.
• Test 2: Alternative Hypothesis
The pairwise performance of the algorithms is not equal, and one algorithm
outperforms the others.

Classification Algorithms
k-Nearest Neighbour
Learning Vector Quantization
C4.5
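
As a rough illustration of the first algorithm, the sketch below classifies a composition by majority vote among its k nearest training samples, measuring nearness with the Aitchison distance. Everything here is an assumption made for illustration: the helper names, the value k = 3 and the toy training set are hypothetical and do not come from the study.

```python
import math
from collections import Counter

def clr(parts):
    """Centred log-ratio transform (parts must be strictly positive)."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(x, y):
    """Euclidean distance between clr-transformed compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

def knn_predict(train, query, k=3):
    """train is a list of (composition, label) pairs.
    Returns the most frequent label among the k nearest training samples."""
    nearest = sorted(train, key=lambda item: aitchison_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy training set with two fabric groups "A" and "B".
train = [
    ([0.60, 0.30, 0.10], "A"),
    ([0.58, 0.31, 0.11], "A"),
    ([0.62, 0.28, 0.10], "A"),
    ([0.20, 0.30, 0.50], "B"),
    ([0.22, 0.28, 0.50], "B"),
    ([0.18, 0.32, 0.50], "B"),
]

print(knn_predict(train, [0.59, 0.30, 0.11], k=3))
```

With an even k or more than two classes, ties are possible; this sketch resolves them by whichever tied label was encountered first, which a real experiment would want to handle more deliberately.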

The Dataset
• 177 samples of utilitarian pottery
found in Cyprus, analyzed by ED-XRF
• Dated to the Philia phase as well as the Early and
Middle Bronze Ages
• Categorized into 36 classes, including samples
classified as outliers by the expert
[Figure: Histogram of Class Distribution; axis label: Class Label]

Experiment Overview
The nature of the problem imposes constraints that lead to
the following practices:
• The Aitchison distance is used when classification takes
place in the Euclidean (real) space
• Resampling with bootstrapping for the generation of
statistics
• Fine-tuning of the parameters of each algorithm
• Evaluation of classification results based on
classification accuracy and the Jaccard Index (an external cluster validity
index)
• Significance testing at the 0.05 level
  • 5x2 cross-validation paired t-test
  • Combined 5x2 cross-validation F test
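
The two significance tests can be sketched as follows, using the standard formulations by Dietterich (paired t-test) and Alpaydin (combined F test). This is a hedged illustration: the function name and the error-rate differences are hypothetical, and the critical values are the usual table values for the 0.05 level rather than anything reported in the slides.

```python
import math

T_CRIT_5DF = 2.571   # two-sided 0.05 critical value, Student t with 5 d.o.f.
F_CRIT_10_5 = 4.74   # 0.05 critical value, F distribution with (10, 5) d.o.f.

def five_by_two_cv_stats(diffs):
    """diffs holds 5 replications x 2 folds of error-rate differences
    between two classifiers. Returns the pair (t, F)."""
    if len(diffs) != 5 or any(len(pair) != 2 for pair in diffs):
        raise ValueError("expected 5 replications of 2 folds each")
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    # Paired t-test: the numerator is the difference from the first fold
    # of the first replication only.
    t = diffs[0][0] / math.sqrt(sum(variances) / 5)
    # Combined F test: the numerator pools all ten squared differences.
    f = sum(p ** 2 for pair in diffs for p in pair) / (2 * sum(variances))
    return t, f

# Hypothetical error-rate differences between two classifiers.
diffs = [(0.10, 0.12), (0.08, 0.11), (0.09, 0.13), (0.12, 0.10), (0.11, 0.09)]
t_stat, f_stat = five_by_two_cv_stats(diffs)

print("t =", t_stat, "reject" if abs(t_stat) > T_CRIT_5DF else "accept")
print("F =", f_stat, "reject" if f_stat > F_CRIT_10_5 else "accept")
```

The sketch does not guard against all variances being zero, which would cause a division by zero. The F test is less sensitive to which fold happens to come first, because it uses all ten differences instead of a single one.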

Results of the experiment
Null Hypothesis
        k-NN     C4.5     LVQ
k-NN    -        Accept   Reject
C4.5    Accept   -        Reject
LVQ     Reject   Reject   -
[Figure: results by algorithm; x-axis: Algorithms (KNN, C4.5, LVQ); y-axis: %]
        Classification Accuracy (%)    Jaccard Index (%)
        Mean   Max    Min              Mean   Max    Min
k-NN    72.1   79.4   64.2             56.7   70.1   42.7
C4.5    68.5   77.2   61.7             49.1   63.7   38
LVQ     55.8   65.2   46.2             30.3   38.8   21
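
For readers unfamiliar with the two evaluation measures, the sketch below computes classification accuracy and a pair-counting Jaccard index from true and predicted labels. The slides do not state which formulation of the Jaccard index was used, so the pair-counting definition a / (a + b + c), common for external cluster validity, is an assumption here; the function names and labels are hypothetical.

```python
from itertools import combinations

def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the true label."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)

def jaccard_index(true_labels, predicted_labels):
    """Pair-counting Jaccard index a / (a + b + c), where over all sample pairs
    a = same class in both labelings,
    b = same class in the true labeling only,
    c = same class in the predicted labeling only."""
    a = b = c = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = predicted_labels[i] == predicted_labels[j]
        if same_true and same_pred:
            a += 1
        elif same_true:
            b += 1
        elif same_pred:
            c += 1
    total = a + b + c
    return a / total if total else 1.0

# Hypothetical labels for five samples (not taken from the dataset).
truth = ["A", "A", "B", "B", "C"]
predicted = ["A", "A", "B", "C", "C"]

print(accuracy(truth, predicted))
print(jaccard_index(truth, predicted))
```

Because the Jaccard index ignores pairs that are separated in both labelings, it is typically lower than accuracy when there are many classes, which is consistent with the gap between the two measures in the table.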

Classification Beyond Labeling
• Analysis of misclassification patterns may reveal
relationships between classes
• Inspection of the results showed the following
pattern:
• Elements of class M1.II, when misclassified, tend to be allocated
to class M1.III. The same holds between classes M1.IV and
M1.VIII, M1.VIII and M1.XII, and Ph.I and M1.I
Confusion Matrix
         M1.I   M1.II   M1.III   M1.IV
M1.I     6      0       0        0
M1.II    0      6       2        0
M1.III   0      2       5        0
M1.IV    0      0       0        9
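
The off-diagonal entries of a confusion matrix are what reveal such relationships. The sketch below extracts them from the four-class matrix shown on this slide. The helper names are hypothetical, and since the slide does not say whether rows are true or predicted classes, treating rows as true classes is an assumption (it makes no difference for this symmetric excerpt).

```python
# Four-class confusion matrix as shown on the slide.
labels = ["M1.I", "M1.II", "M1.III", "M1.IV"]
confusion = [
    [6, 0, 0, 0],
    [0, 6, 2, 0],
    [0, 2, 5, 0],
    [0, 0, 0, 9],
]

def confused_pairs(labels, matrix):
    """List (row class, column class, count) for every non-zero off-diagonal cell."""
    pairs = []
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if i != j and count > 0:
                pairs.append((labels[i], labels[j], count))
    return pairs

def overall_accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

for row_class, col_class, count in confused_pairs(labels, confusion):
    print(row_class, "->", col_class, ":", count)
print(overall_accuracy(confusion))
```

The only confusions in this matrix are between M1.II and M1.III, in both directions. The accuracy printed here refers to these four classes only, not to the full 36-class experiment.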

What if there is more?
Trace elements may contribute more characteristically to
determining the fingerprint of a deposit.
• The experiment discussed above was also run for trace
and main elements separately, allowing us to hypothesize that
trace elements might contain additional information that we
could consider.
Classification using the Main Elements
        Classification Accuracy (%)    Jaccard Index (%)
        Mean   Max    Min              Mean   Max    Min
k-NN    73.2   79.5   67.6             52.9   64.8   40.5
C4.5    67.8   75.1   62.7             43.2   56.8   32.8
LVQ     57.3   67     48.8             29.3   36.6   20.9
Classification using just the Trace Elements
        Classification Accuracy (%)    Jaccard Index (%)
        Mean   Max    Min              Mean   Max    Min
k-NN    66.6   75     58.2             47.8   57.7   35.7
C4.5    64.5   71.2   55.8             46.1   58.1   35.6
LVQ     59.1   66.2   51.4             40.2   52.2   26.1

Last words....
• Data transformations should be used only when fully
understood and in line with the analysis objective.
• Analysis results may be misleading due to incomplete
data; exploratory analysis prior to any other
analysis is therefore crucial for gaining insight into the problem.
• Machine learning techniques and statistical analysis
can be very useful if used appropriately:
  • It is necessary to consider the assumptions each method
  imposes
  • It is important to maintain consistency
  • Ensure there are no conflicting constraints

Thank you for your attention!
Comments and Questions are Welcome!
