This study compared two methods for sexing skulls - Method 1 used descriptions only while Method 2 used illustrations and descriptions. Two observers sexed 50 skulls of unknown sex twice using each method. There was no significant difference in average intraobserver agreement between the two methods. However, interobserver agreement was higher for most traits in Method 1 compared to Method 2. The results indicate that while the methods have similar reliability, Method 1 may be less subjective. Overall, subjectivity remained high for both methods.
Dissertation (final)
DATE: 28th March 2011
MATRIC NO: 0707609d
COURSE: Anatomy Level 4H
ASSIGNMENT: Research Project
SUPERVISOR: Dr Stuart McDonald
TITLE: "Sexing the Cranium"
Abstract
Skull sexing methods incorporating diagrams and descriptions are favoured by current
forensic anthropologists over the older description-only methods. This is due to
standardisation of the newer methods, resulting in a reported decrease in subjectivity and
need for experienced observers. However, no studies have compared illustrated and non-
illustrated methods in terms of reliability and subjectivity and so this formed the aim of the
study, with the null hypothesis being that there was no difference between the two Methods.
Fifty skulls of unknown sex were sexed independently by an experienced and inexperienced
observer twice using Method 1, an older description-only method, and twice using Method 2,
a newer illustrated and descriptive method. Sex classifications assigned to each cranial trait
by each observer were then analysed to determine intra-observer and inter-observer
agreement. There was no significant difference between the average intra-observer
agreements in Method 1 and 2 as well as between the average inter-observer agreement in
both Methods. However, amongst the eight traits that were shared by both Methods, all
except one had a higher inter-observer agreement in Method 1. Additionally, the illustrated
traits did seem to promote higher inter-observer agreement than the non-illustrated traits in
Method 2. The results indicate that Method 1 and 2 are equally reliable, and that diagrams do
seem to reduce subjectivity. The findings also imply that Method 1 is less subjective than
Method 2. Nevertheless, subjectivity was high in both Methods and the effect of observer
inexperience was evident in classifications made by the inexperienced observer. Ambiguous
descriptions and the lack of diagrams were some of the general issues raised. A new sexing
method resolving all the issues and achieving methodological standardisation would result in
one that is low in subjectivity and high in reproducibility, which could then be compared to
Method 1 in a future study.
Acknowledgements
I owe sincere and earnest thankfulness to my project supervisor, Dr Stuart McDonald, for his
endless support and wise words; without him this dissertation would not have been possible. I
would also like to express my gratitude to the Glasgow University Anatomy Department for
providing and entrusting me with the skull specimens. Last but most definitely not least,
thanks go to my mum, dad and sister for being great morale boosters, for their support,
constant interest in my work and for their love.
Introduction
Male or female sexual characteristics are visible in soft tissue from the onset of puberty, and
this sexual dimorphism is also reflected skeletally (Keen, 1950; Albert et al., 2007). This has
long been taken advantage of for the purpose of forensic sex identification, and one of the
most dimorphic skeletal components in humans is the skull (Gapert et al., 2009). It may be
due to this that literature on forensic identification of the skull is plentiful (Krogman, 1955).
Several different methods can be adopted for sexing the skull. Authors express different
preferences for metric or non-metric methods of sex assignment. Metric procedures consist of
using measuring tools to quantify the size of a cranial feature, or distances between cranial
landmarks. The values obtained are then most frequently put into a pre-constructed equation,
namely a discriminant function equation, which calculates a value indicative of the estimated
sex (Veyre-Goulet et al., 2008). On the other hand, experimenters who adopt non-metric
methods depend on visual assessment of the perceived masculinity or femininity of individual
cranial traits which usually concern the shape, presence, absence or degree of expression of a
trait (Gualdi-Russo et al., 1999; Williams & Rogers, 2006).
In 1955, Krogman proposed a non-metric method in which a suite of 14 cranial traits was
compiled and the masculine and feminine appearance of each was described, with respect to
bone shape or degree of trait expression (Krogman, 1955). As the procedure was not explicitly
stated, it is assumed that each of the 14 traits is to be observed on a given skull and assigned a
male or female status according to the descriptions. The sex that is assigned to more than half
of the cranial traits can then be taken as the overall sex of the skull.
The scoring of traits using descriptions and diagrams proved successful in the studies of
Walker (2008) and Walrath et al. (2004). In Walker's study, the intraobserver agreement level
was 99.5% and there was no significant difference in the sex assignment between the 20 observers
of differing levels of experience. Walrath et al. (2004), who only tested interobserver
agreement, achieved gamma statistics from 0.73 up to 0.915 (on a scale from 0-1.0),
excluding their only non-significant result for the frontal and parietal eminences. Walrath et
al. (2004) witnessed lower interobserver agreements where cranial traits were not sufficiently
described and illustrated and this clearly seems to be the source of the problem of subjectivity
inherent in non-metric methods.
Although the success of scoring using diagrams to illustrate trait descriptions has been
demonstrated for the purpose of cranial sex estimation, no studies have made a direct
comparison between this method and those methods which only use dichotomous
male/female descriptions. Confirmation of the superiority of illustrated trait descriptions and
scoring as a method for reliable and objective cranial sexing would allow this method to be
widely adopted by future studies.
The aim of this study is to investigate whether methods incorporating scoring using diagrams
coupled with descriptions result in a higher agreement between and within observers of
differing experiences compared with description-only methods. This will be done by two
observers of differing levels of experience carrying out the sexing methods described in
studies of Walrath et al. (2004) and Krogman (1955) and comparing the results of
interobserver and intraobserver testing. The null hypothesis is that these two non-metric
methods produce intraobserver and interobserver agreements which are not statistically
significantly different from each other.
Materials and Methods
Fifty skulls formed the sample group of this study. They were excavated from a Glasgow
cemetery some time before 1915 due to building work occurring on the site, and belong to the
early 19th century. The skulls are housed in the University of Glasgow's Anatomy
Department and upon arrival the sex and race of each skull were unknown, although it is
most probable that the individuals are of Scottish ancestry.
Method 1
The first method adopted was that of Krogman (1955) in which 12 cranial traits were
observed on each of the 50 skulls and a sex assigned to each trait. Krogman's descriptions
formed the basis of our judgements as to whether a trait was that of a male or female, and are
summarised in Table 1. A trait was classified as unascribed if it seemed to express
neither or both of the morphologies expected of a male or female skull. Mandibles and teeth were
observed by Krogman (1955) but they were disregarded in this study as all mandibles were
absent and there were too few teeth present in each skull, if any, on which to base a reliable
judgement.
Trait | Male | Female
General size | Large (endocranial volume 200cc more) | Small
Architecture | Rugged | Smooth
Supraorbital ridges | Medium to large | Small to medium
Mastoid processes | Medium to large | Small to medium
Occipital area | Muscle lines and protuberance marked | Muscle lines and protuberance not marked
Frontal eminences | Small | Large
Parietal eminences | Small | Large
Orbits | Squared, lower, relatively smaller, with rounded margins | Rounded, higher, relatively larger, with sharp margins
Forehead | Steeper, less rounded | Rounded, full, infantile
Cheek bones | Heavier, more laterally arched | Lighter, more compressed
Palate | Larger, broader, tends to U-shape | Small, tends to parabola
Occipital condyles | Large | Small
Table 1: Traits in the human skull displaying sexual dimorphism and a description of how
each would appear in the male and female (from Krogman, 1955)
All the classifications were noted in a recording sheet, one for each skull (Figure 1). For each
skull the number of male, female and unascribed classifications was determined, and the
ascription that amounted to at least 7 out of the 12 traits was the overall sex assigned. If
neither the number of male nor female ascriptions reached this total, then the skull in question
was classified as unascribed. Each of the two observers carried out this procedure on the
whole sample group twice whilst blind to each other's decisions and to their own previous
classifications for each skull. The interobserver and intraobserver agreements for each trait
were analysed and tested for statistical significance.
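The overall ascription rule described above can be expressed as a short sketch. The function name, the 'M'/'F'/'U' codes and the example skull are illustrative assumptions and do not come from the recording sheet; only the threshold of 7 out of 12 traits is taken from the method as described.

```python
from collections import Counter

N_TRAITS = 12   # traits scored per skull in Method 1
MAJORITY = 7    # "at least 7 out of the 12 traits"


def classify_method1(trait_calls):
    """Return 'M', 'F' or 'U' (unascribed) for one skull.

    trait_calls: list of 12 per-trait calls, each 'M', 'F' or 'U'.
    """
    if len(trait_calls) != N_TRAITS:
        raise ValueError("expected one call per trait")
    counts = Counter(trait_calls)
    for sex in ("M", "F"):
        if counts[sex] >= MAJORITY:
            return sex
    # Neither sex reached the threshold, so the skull stays unascribed.
    return "U"


# Hypothetical skull: 7 male, 3 female and 2 unascribed trait calls.
example = ["M"] * 7 + ["F"] * 3 + ["U"] * 2
print(classify_method1(example))
```

Under this rule a skull with six male and six female trait calls would remain unascribed, since neither count reaches 7.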
FEATURE COMMENT SEX INDICATED
General size
Architecture
Supraorbital ridges
Mastoid processes
Occipital area
Frontal eminences
Parietal eminences
Orbits
Forehead
Cheek bones
Palate
Occipital condyles
Sex ascribed:
Fig 1: the scoring sheet used for Method 1
Method 2
The second method adopted was described by Walrath et al. (2004). In this, 10 features were
observed on the same 50 skulls by the same two observers independently, and the criteria list
can be seen in Figure 2. Each feature was compared against its respective series of images
and/or descriptions and assigned a score accordingly, either -2, -1, 0, +1 or +2, each being
indicative of the level of sexualisation ranging from "hyperfeminine" to "hypermasculine".
The scoring sheet used for this method is shown in Figure 3. The resultant set of scores of a
given skull, together with their respective weight values, can then be used to calculate an IS
score, which essentially is the overall sexualisation of the skull. The formula is given below:
$$IS = \frac{\sum (\text{score} \times \text{weight})}{\sum \text{weight}}$$
[Figure 1 scoring sheet, tally fields: No. of male features; No. of female features; No. of unascribed features]
A positive IS score is indicative of a male cranium whilst a negative score is indicative of a
female cranium. The sex was regarded as inconclusive if the yielded value was between -0.2
and +0.2. Each observer also performed the scoring procedure on the 50 skulls twice, blind to
the other observer's ascriptions and to their own previous ascriptions. As with the first
method, the intraobserver and interobserver agreement was calculated for each trait and
tested for statistical significance.
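As a worked illustration of the IS calculation, the sketch below applies the formula using the trait weights printed on the Method 2 scoring sheet (Figure 3), which sum to 22. The function names, the sex codes and the example scores are hypothetical. Treating values of exactly -0.2 or +0.2 as inconclusive is an assumption, since the text does not state how the boundaries were handled.

```python
# Trait weights as listed on the Method 2 scoring sheet (they sum to 22).
WEIGHTS = {
    "Glabella": 3,
    "Mastoid process": 3,
    "Nuchal plane": 3,
    "Zygomatic process of the temporal": 3,
    "Superciliary arches": 2,
    "Frontal & parietal eminences": 2,
    "External occipital protuberance": 2,
    "Zygomatics": 2,
    "Frontal profile": 1,
    "Orbital form": 1,
}


def is_score(scores):
    """Weighted mean of the trait scores (each between -2 and +2)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("every trait must be scored exactly once")
    weighted_sum = sum(scores[trait] * WEIGHTS[trait] for trait in WEIGHTS)
    return weighted_sum / sum(WEIGHTS.values())


def classify_method2(value, cutoff=0.2):
    """Map an IS score to 'M', 'F' or 'IN' (inconclusive)."""
    # Assumption: values exactly at the cutoff count as inconclusive.
    if value > cutoff:
        return "M"
    if value < -cutoff:
        return "F"
    return "IN"


# Hypothetical skull: glabella scored +2, mastoid process +1, all others 0.
example = {trait: 0 for trait in WEIGHTS}
example["Glabella"] = 2
example["Mastoid process"] = 1
value = is_score(example)  # (2*3 + 1*3) / 22
print(round(value, 3), classify_method2(value))
```

Because the heavily weighted traits dominate the numerator, two moderately masculine scores on weight-3 traits are enough in this example to push an otherwise neutral skull past the +0.2 cutoff.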
Fig 2: The suite of cranial traits used in sex classification, and their varying degrees of
sexualisation (from Walrath et al., 2004). Note the typographical error: "Masculine" should
score +1.
Skull:
IS score = Σ(score x weight) / Σweight = ......../22 = ........
SEX =
Fig 3: the scoring sheet used for Method 2
To gain more insight into the classifications that resulted from the two different methods, the
distribution of sex classifications in each method was compared. The classifications made by
the inexperienced observer, observer 1, were compared with those of the experienced
observer, observer 2, to give some idea of the approach inexperienced observers take to sex
classification. Observer 1 is an undergraduate student with very little osteological training
whereas observer 2 has several years of osteological experience.
Differences between agreements were tested statistically for significance using the 2-sample
t-test. The 2-sample t-test was applied to test the differences between the average
intraobserver agreements and between the average interobserver agreements of the traits in
Method 1 and 2. Equal variance was not assumed.
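The unequal-variance (Welch) form of the 2-sample t-test can be sketched as follows, using the per-trait interobserver agreements reported in the Results (Tables 6 and 7) as input. This is an illustrative reconstruction rather than the software actually used in the study, which is not named. The function returns only the t statistic and the Welch-Satterthwaite degrees of freedom; obtaining a p-value additionally requires the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error contributed by each sample (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Per-trait interobserver agreement (%) from Tables 6 and 7.
method1 = [68, 62, 62, 52, 48, 48, 46, 44, 40, 38, 36, 34]
method2 = [60, 52, 48, 44, 42, 40, 38, 26, 22, 14]

t, df = welch_t(method1, method2)
print(round(t, 2), round(df, 1))
```

With these inputs t comes out at roughly 1.74 on about 17 degrees of freedom, which appears consistent with the non-significant p=0.101 reported for the interobserver comparison.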
Trait (weight) | Score | Score x weight
Glabella (3) | |
Mastoid process (3) | |
Nuchal plane (3) | |
Zygomatic process of the temporal (3) | |
Superciliary arches (2) | |
Frontal & parietal eminences (2) | |
External occipital protuberance (2) | |
Zygomatics (2) | |
Frontal profile (1) | |
Orbital form (1) | |
TOTAL= | |
Results
Intraobserver agreement
In method 1, observer 1's intraobserver agreement ranged from 62-92% whereas that of
observer 2 ranged from 56-78% (Table 2), giving observer 1 a significantly higher average
intraobserver agreement (p<0.05). In method 2, the intraobserver agreements varied from
52-82% for observer 1 and 60-78% for observer 2 (Table 3) but the averages were not
significantly different (p=0.262).
Observer 1 had a higher average agreement in Method 1 than Method 2 whereas
the opposite was true for observer 2, who obtained a slightly higher agreement in Method 2
(Tables 2 and 3). However, neither of those differences was statistically significant (p=0.481 for
observer 1 and p=0.487 for observer 2).
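The agreement percentages reported in this section are simple agreement rates. A minimal sketch of how such a rate could be computed for one trait is given below, assuming each round is stored as a list of per-skull calls. The function name, the codes and the five example skulls are hypothetical. For interobserver agreement the text does not specify how the two rounds of each observer were combined, so the sketch only compares two lists of calls.

```python
def percent_agreement(calls_a, calls_b):
    """Percentage of skulls given the same call in both lists."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(a == b for a, b in zip(calls_a, calls_b))
    return 100.0 * matches / len(calls_a)


# Hypothetical calls for one trait on five skulls, rounds 1 and 2.
round1 = ["M", "F", "F", "U", "M"]
round2 = ["M", "F", "M", "U", "F"]
print(percent_agreement(round1, round2))  # 3 of the 5 calls match
```

With a sample of 50 skulls, each matching call contributes 2 percentage points, which fits the fact that every per-trait agreement reported in the tables is an even number.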
Intraobserver agreement in Method 1
Trait | Observer 1 (JD) | Observer 2 (SMc)
General size | 64% | 56%
Architecture | 72% | 70%
Supraorbital ridges | 92% | 78%
Mastoid processes | 76% | 64%
Occipital area | 86% | 70%
Frontal eminences | 90% | 62%
Parietal eminences | 84% | 68%
Orbits | 74% | 76%
Forehead | 82% | 64%
Cheek bones | 66% | 68%
Palate | 80% | 78%
Occipital condyles | 62% | 68%
AVERAGE | 77.33% | 68.5%
Table 2: Summary of the intraobserver agreement rate of each observer for each trait in
Method 1.
Intraobserver agreement in Method 2
Trait | Observer 1 (JD) | Observer 2 (SMc)
Glabella* | 80% | 70%
Mastoid process* | 82% | 62%
Nuchal plane | 82% | 74%
Zygomatic process of the temporal | 70% | 70%
Superciliary arches | 80% | 78%
Frontal and parietal eminences | 72% | 76%
External occipital protuberance* | 74% | 76%
Zygomatics | 72% | 60%
Frontal profile | 80% | 70%
Orbital form* | 52% | 68%
AVERAGE | 74.4% | 70.4%
Table 3: Summary of the intraobserver agreement rate of each observer for each trait in
Method 2. The traits marked with an asterisk (highlighted in red in the original) are those
that were accompanied by diagrams.
With regard to individual traits, general size, occipital condyles and cheekbones had the
lowest combined agreement within observers in Method 1. Conversely, the three traits with
the highest agreement were the supraorbital ridges, occipital area and palate (Table 4). In
Method 2 the lowest agreement within observers was upon the orbital form, zygomatic bone
and zygomatic process of the temporal. The highest combined agreement, on the other hand,
was seen in the superciliary arches, nuchal plane and external occipital protuberance. The
traits that were illustrated in Method 2 are ranked in 3rd, 4th, 5th and 7th place (Table 5).
Intraobserver agreement rank in Method 1
Trait | Observer 1 | Observer 2 | Combined rank
Supraorbital ridges | 1 | 1 | 1
Occipital area | 3 | 4 | 2
Palate | 6 | 1 | 2
Frontal eminences | 2 | 7 | 3
Parietal eminences | 4 | 5 | 3
Orbits | 8 | 3 | 4
Forehead | 5 | 6 | 4
Mastoid processes | 7 | 6 | 5
Architecture | 9 | 4 | 5
Cheek bones | 10 | 5 | 6
Occipital condyles | 12 | 5 | 7
General size | 11 | 8 | 8
Table 4: Ranking of the intraobserver agreements in Method 1. Intraobserver agreement is
ranked according to the intraobserver agreements listed in Table 2. Those ranks were
combined and the lowest values were assigned the highest combined ranks.
Intraobserver agreement rank in Method 2
Trait | Observer 1 | Observer 2 | Combined rank
Superciliary arches | 2 | 1 | 1
Nuchal plane | 1 | 3 | 2
External occipital protuberance* | 3 | 2 | 3
Frontal and parietal eminences | 4 | 2 | 4
Glabella* | 2 | 4 | 4
Frontal profile | 2 | 4 | 4
Mastoid process* | 1 | 6 | 5
Zygomatic process of the temporal | 5 | 4 | 6
Zygomatics | 4 | 7 | 7
Orbital form* | 6 | 5 | 7
Table 5: Ranking of the intraobserver agreements in Method 2. The traits marked with an
asterisk (highlighted in red in the original) are those that were accompanied by diagrams.
Intraobserver agreement is ranked according to the intraobserver agreements listed in Table 3.
Those ranks were combined and the lowest values were assigned the highest combined ranks.
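The captions do not spell out exactly how the two observers' ranks were combined. Summing the two ranks for each trait and then ranking the sums, with tied sums sharing a rank and no ranks skipped, reproduces the combined column of Table 5, so that procedure is assumed in the sketch below. The function name is hypothetical.

```python
def combined_ranks(ranks_obs1, ranks_obs2):
    """Sum each trait's two ranks, then dense-rank the sums (lowest sum = 1)."""
    sums = {trait: ranks_obs1[trait] + ranks_obs2[trait] for trait in ranks_obs1}
    # Tied sums share a rank and no rank numbers are skipped.
    position = {s: i + 1 for i, s in enumerate(sorted(set(sums.values())))}
    return {trait: position[s] for trait, s in sums.items()}


# Observer ranks as listed in Table 5 (Method 2).
obs1 = {
    "Superciliary arches": 2, "Nuchal plane": 1,
    "External occipital protuberance": 3, "Frontal and parietal eminences": 4,
    "Glabella": 2, "Frontal profile": 2, "Mastoid process": 1,
    "Zygomatic process of the temporal": 5, "Zygomatics": 4, "Orbital form": 6,
}
obs2 = {
    "Superciliary arches": 1, "Nuchal plane": 3,
    "External occipital protuberance": 2, "Frontal and parietal eminences": 2,
    "Glabella": 4, "Frontal profile": 4, "Mastoid process": 6,
    "Zygomatic process of the temporal": 4, "Zygomatics": 7, "Orbital form": 5,
}

combined = combined_ranks(obs1, obs2)
for trait, rank in sorted(combined.items(), key=lambda item: item[1]):
    print(rank, trait)
```

The same procedure applied to the observer ranks in Table 4 also yields that table's combined column.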
Interobserver agreement
The average interobserver agreement in Method 1 (48.17%) was almost 10 percentage points higher
than that of Method 2 (38.6%), but this difference was found to be non-significant (p=0.101). In
Method 1, there was the most correspondence in the classifications of the supraorbital ridges,
occipital area and orbits, yielding 68%, 62% and 62% interobserver agreement, respectively.
The remaining nine traits achieved between 34-52% agreement with the forehead, general
size and palate performing the worst (Table 6).
Table 6: Ranked interobserver agreements for each trait in Method 1 and the average
interobserver agreement.
The interobserver agreement in Method 2 ranged from 14%-60% with the top three ranking
traits being the superciliary arches, glabella and external occipital protuberance. The three
traits that the observers agreed upon the least in this method were the orbital form, zygomatic
bone size and nuchal plane. The traits that were illustrated in Method 2 are ranked at 2nd, 3rd,
5th and 10th (Table 7).
Trait | Interobserver agreement in Method 1 | Rank
Supraorbital ridges | 68% | 1
Occipital area | 62% | 2
Orbits | 62% | 2
Parietal eminences | 52% | 3
Mastoid processes | 48% | 4
Occipital condyles | 48% | 4
Architecture | 46% | 5
Frontal eminences | 44% | 6
Cheek bones | 40% | 7
Palate | 38% | 8
General size | 36% | 9
Forehead | 34% | 10
AVERAGE | 48.17% |
Trait | Interobserver agreement in Method 2 | Rank
Superciliary arches | 60% | 1
Glabella* | 52% | 2
External occipital protuberance* | 48% | 3
Frontal profile | 44% | 4
Mastoid process* | 42% | 5
Frontal and parietal eminences | 40% | 6
Zygomatic process of the temporal | 38% | 7
Nuchal plane | 26% | 8
Zygomatics | 22% | 9
Orbital form* | 14% | 10
AVERAGE | 38.6% |
Table 7: Ranked interobserver agreements for each trait in Method 2 and the average
interobserver agreement. The traits marked with an asterisk (highlighted in red in the
original) are those that were accompanied by diagrams.
The traits that are shared by both Method 1 and 2 are the orbits, zygomatics, frontal profile,
frontal and parietal eminences, mastoid processes, supraorbital ridges, external occipital
protuberance and nuchal plane. It was noticed that interobserver agreement was consistently
higher in Method 1 than Method 2 for all the shared traits with the exception of the frontal
profile.
Overall sex classifications
Observer 1 consistently classified a much larger proportion of the fifty skulls as female, and
this is true for both methods (Table 8). Conversely, the proportions of male and female
classifications assigned by observer 2 were more even. Observer 2 also assigned a
consistently larger proportion as indeterminate compared to observer 1 and this is also true
for both methods. It was also noticed in both methods that both observers were more likely to
assign a female classification in the second round compared to the first round. Lastly, Method
2 generally yielded more classifications from observers 1 and 2, going by the lower
indeterminate ascriptions.
Number of male (M), female (F) and indeterminate (IN) ascriptions
 | Observer 1, Round 1 | Observer 1, Round 2 | Observer 2, Round 1 | Observer 2, Round 2
Method 1 | M: 15, F: 29, IN: 6 | M: 13, F: 31, IN: 6 | M: 16, F: 15, IN: 19 | M: 17, F: 17, IN: 16
Method 2 | M: 12, F: 36, IN: 2 | M: 7, F: 40, IN: 3 | M: 23, F: 19, IN: 8 | M: 20, F: 23, IN: 7
Table 8: Overall ascriptions made by observer 1 and 2 over two rounds of classifications in
Method 1 and 2.
At this point it can be confirmed that the null hypothesis cannot be rejected. There is no
significant difference between the average intraobserver agreements of Method 1 and 2, and no
significant difference between the average interobserver agreements of Method 1 and 2.
Discussion
The main finding in this study is that Method 1 and 2 are not significantly different in terms
of their reproducibility or subjectivity. In both methods, reproducibility was fairly high but
subjectivity was also high. Traits with diagrams in Method 2, with the exception of the orbits,
did seem to generally increase agreement between observers compared to the traits without
diagrams in Method 2, which is concordant with the findings reported by Walrath et al.
(2004). This indicates a reduction in subjectivity due to diagrams but this did not aid in
producing an average agreement rate surpassing that of Method 1.
The main ethos of Method 2, that of Walrath et al. (2004), was to promote standardisation
through illustrations and clear descriptions (Walker, 2008; Hefner, 2009). The diagrams
would illustrate the written descriptions with the purpose of decreasing scope for
misinterpretation. The series of descriptions also meant that degrees of trait sexualisation
were more defined than the dichotomous male/female descriptions in Method 1. This,
theoretically, reduces the ambiguity of the trait morphology expected of a male or female
skull. The effect of illustrated and carefully-defined descriptions would be a reduced need for
any observer to have an extensive background in osteology and sexing skulls (Gualdi-Russo
et al., 1999).
There may be a number of reasons why illustrations and more detailed descriptions did not
make Method 2 a more reliable sexing method. Only four of the ten traits were illustrated in
Method 2: the glabella, mastoid process, external occipital protuberance and orbital form.
This left the observer with more than half of the traits with no diagram and still a choice of
five categories to choose from (hyperfeminine, feminine, indeterminate, masculine and
hypermasculine), which had only slight distinctions from one to the next (Figure 2).
Seemingly overlapping descriptions did not help. Despite demonstrating a continuum of trait
sexualisation, the difference between, for example, "medium" and "moderate" for the frontal
and parietal eminences was ambiguous. These descriptions were not explained by a diagram
and so one can therefore imagine how subjective impressions would arise from this.
Another limitation of Method 2 is the coupling of two aspects of the same trait together, such
as the shape of the orbits with the thickness of the orbital margin. When the shape of the
orbits of a skull would match the more masculine descriptions, the thickness of the margin
would frequently match the more feminine descriptions. This was also a problem in the study
of Walrath et al. (2004) and the point was made that although orbital form was illustrated,
lower interobserver agreement was due to having to take two aspects into consideration. It
surfaced that the two observers in the current study approached this problem differently;
observer 1 assigned a midpoint score, which was frequently 0, whereas observer 2 frequently
assigned +1 as the score.
The fact that observers adopted different solutions for the same problem is concordant with
the finding that the intraobserver agreements were consistently higher than the interobserver
agreements. It indicates that although both observers had a fairly consistent set of standards
for their observations, those standards differed. The layout of Method 2 thus did not succeed
well in achieving standardisation and "calibrating" observers of differing osteological
backgrounds. Additionally, although the interobserver and intraobserver agreements yielded in
this study were not as successful as those in other sexing studies (Walrath et al., 2004;
Williams & Rogers, 2006; Walker, 2008), the phenomenon of intraobserver agreement
exceeding interobserver agreement is found in those studies.
Method 1, that of Krogman (1955), was not without its drawbacks either. A limitation of this
method is the presence of dichotomous descriptions for traits which are not expressed
dichotomously. For example, orbits are described as "squared" for males and
"rounded" for females. Several of the skulls concerned in this study showed orbits which were
neither circular nor squared whereas intermediate orbital forms were taken into consideration
by Method 2 (Walrath et al., 2004). Similarly, occipital condyles were not always "large" or
"small", but more frequently a medium size.
Those descriptions that did take into account medium trait sizes, such as "medium to large"
and "small to medium" for the mastoid processes, induced a dilemma of which ascription to
allocate if the trait in question was medium sized. Even when a sectioning volume was given
(200cc) for general size to define the boundary between a "large" and "small" cranium, it was
difficult to visualise a 200cc volume without involving additional measuring material.
Having said the above, when comparing the proportion of observer agreement for the traits
that were shared by Method 1 and 2, interobserver agreement for the traits in Method 1 was
higher in every case except the frontal profile. Therefore, upon closer inspection, dichotomous
male/female descriptions in Method 1 seemed to decrease subjectivity and leave less room for
observer disagreement, despite their disadvantages. This implies that if Methods 1 and 2
classified sex using the exact same suite of cranial traits, Method 1 could, in fact, prove to be
the less subjective sexing method.
After having considered the advantages and limitations specific to each method, a suggestion
for methodological improvement would be to complement dichotomous descriptions with
diagrams. It is unlikely to be of any benefit if the traits from both methods were amalgamated
into one suite of traits, as Walrath et al. (2004) found that an increase in the number of traits
does not necessarily increase observer agreement, but that the quality of these traits is more
important.
Another general issue expressed by observers was with the lack of clarity as to which aspect
of certain traits the descriptions were referring. In Method 1 this applied to the cheek bones
where it was unclear what was meant by the terms "heavier" and "lighter". The less experienced
observer 1 took this to be the thickness of the zygomatic bone whereas the more experienced
observer 2 concentrated on the vertical height. This trait was also of interest in Method 2
which referred to the height of the zygomatics. Height, again, could refer to the vertical
height of the bone or height position of the zygomatics in relation to the skull, and this has
not been clarified by other studies utilising the zygomatics to sex the skull (Ferembach et al.,
1980).
There was also observer discordance in interpreting the descriptions for the palate in Method
1. Krogman (1955) described the male palate as "u-shaped" and the female palate as
"parabolic"; observer 1 looked for a parabolic shape in the coronal plane whereas observer 2
looked for the parabolic shape in the transverse plane. Williams and Rogers (2006) also
observed palate shape but did not define the plane in which palate shape was observed.
However, Ferembach et al. (1980) use the words "rounded" and "elliptical" when referring to
male and female dental arch shape, respectively. It is then postulated that "dental arch" is
interchangeable with "palate", which implies that observer 2 was right in observing palatal
shape in the transverse plane.
In method 2, a similar problem was found with the glabella. Frequently in the more
masculine skulls, the supraorbital ridges were more prominent than the glabella, and so from
the lateral view of the skull, the glabella was occluded. This raised the question of
whether the diagrams illustrating glabella expression were referring to the midline of the
glabella or the lateral view of the skull. In the study of Walker (2008), the glabella and
supraorbital ridges were taken as one trait and the descriptions instructed the observer to look
at the skull from the lateral side and compare it with the series of diagrams. Even when the
glabella and supraorbital ridges were taken together, they were deemed the best sex indicator
in that study and so this indicates that looking at the profile of the glabella and supraorbital
ridges from the lateral view is sufficient.
Finally, whilst the occipital condyles were described as "large" or "small" in Method 1, the
occurrence of very long but narrow occipital condyles seemed to be rather frequent.
According to the results of one study (Wescott & Moore-Jansen, 2001), occipital condyle width
was the least reliable sex predictor within the occipital bone, which implies that perhaps
descriptions for the occipital condyles could concentrate on the length rather than overall size.
The aforementioned shortcomings of specific trait descriptions could be resolved through
increasing precision and decreasing ambiguity of the descriptions. Walker (2008) yielded
very high intraobserver and interobserver agreements between 20 observers, 14 of whom
were undergraduate students. This indicates high reproducibility and low subjectivity
regardless of experience, and this could be due to the nature of the descriptions used in that
study. The detailed descriptions instructed how the observer should orient the skull, what to
palpate, and provided extra information such as pointing out that the volume of the mastoid
process was of interest, not the length.
Another novel idea in the array of descriptions was to provide means for comparison, such as
comparing a hyperfeminine orbital margin thickness to the "edge of a dull knife" and a
hypermasculine orbital margin curvature to a "pencil" (Walker, 2008). Walker also enabled
the observer to judge the size of a trait in relation to other cranial features, and an example of
this is seen where a small, hyperfeminine mastoid process is said to project "only a small
distance below the inferior margins of the external auditory meatus and the digastric groove".
Taking all of the limitations and suggestions for improvement into consideration, an ideal
methodological format would be where traits are dichotomously classified and more than one
aspect of the same trait would be separated into two distinct features. Each trait would also be
illustrated and complemented with descriptions of maximal amounts of clear detail using
comparison points, as seen in Walker (2008). This would result in a decrease in subjectivity
and an increase in reproducibility of sex classifications (Bruzek & Murail, 2006). Illustrating
descriptions such as "rugged" and "smooth" with regard to cranial architecture would be a
challenge as architecture is judged primarily through palpation rather than visual impression.
Therefore tactile information could be supplied where architecture could be compared to a
marble surface or a paving slab, for example.
Supraorbital ridges, glabella, external occipital protuberance and nuchal planes would all be
used to sex crania as they achieved higher combined observer agreements in this study. The
mastoid processes, size and architecture would also prove useful in sex estimation, despite
producing lower observer agreements, as these traits were deemed the most accurate sex
predictors and achieved the lowest intraobserver error in one study (Williams & Rogers, 2006).
A future study could involve comparing a method of the above format to Method 1, put
together by Krogman (1955), with regard to their reproducibility and subjectivity. If the
sample group of crania could be of known sex, then the classification accuracy of each
method could also be known and compared.
As well as the reliability and subjectivity of the different formats of Methods 1 and 2, it was
of interest to compare the difference in approaches taken by the two observers to these
methods. By doing so, we could gain more of an insight into how the amount of observer
experience affects classification behaviour, as observer 1 had very little experience in sex
classification whereas observer 2 had several years of experience. The results showed that the
inexperienced observer (observer 1) was more likely to ascribe a sex to a skull than the
experienced observer (observer 2), who allocated an indeterminate sex ascription more often.
Within observer 1's ascriptions, there was a heavy female bias whereas observer 2 assigned a
much more equal spread of male and female classifications.
This phenomenon of the inexperienced observer assigning more sex classifications, and
biased classifications, than the experienced observer is also seen in another study (Duric et
al., 2005). The classifications of the inexperienced observer in that study were heavily male
biased and the proportion of indeterminate classifications assigned was much less than that of
the experienced observer. Gualdi-Russo et al. (1999) also found increased discordance in
classifications of observers with different levels of experience when traits were difficult to
identify or classify due to inadequate definitions of the traits.
A way of overcoming this problem would be to remind inexperienced observers that it is
better to allocate an indeterminate classification than to proceed to erroneously classify it
(Konigsberg et al., 2009). The question remains as to whether training should be provided by
the experienced observer to the inexperienced observer prior to performing the
classifications, due to the risk of bias. Walker (2008) talks about the inherent sex
classification bias of an observer who has gained several years of experience within a specific
population, as cranial sexualisation is expressed differently across different populations. One
observer training another would train them to sex a specific cohort of skulls, and if the skulls
of interest were of a different race, the classification results would be skewed.
Gualdi-Russo et al. (1999) put their observers through a training procedure before proceeding
through the experiment, for standardisation purposes. Perhaps a standard, brief training
session free of bias should be devised so that all observers are clear on trait definitions and
the aspects to which the descriptions are referring. What the training should not do
is use sample skulls to demonstrate the definition of a large or small glabella, for example, as
that would lead to biased perceptions amongst the observers.
All things considered, the main conclusion of this study is that unambiguous, detailed trait
descriptions and ease of classification are of paramount importance to the consistency and
concurrence of sex classifications within and between observers. Inexperienced observers
should be discouraged from making sex classifications when in doubt and the key to a
successful sex classification method is achieving as much standardisation as possible, within
the method and within the observers themselves.
Walrath DE, Turner P, Bruzek J. 2004. Reliability Test of the Visual Assessment of Cranial
Traits for Sex Determination. Am J Phys Anthropol 125:132-137.
Wescott DJ, Moore-Jansen PH. 2001. Metric variation in the human occipital bone: forensic
anthropological applications. J Forensic Sci 46:1159-1163.
Williams BA, Rogers TL. 2006. Evaluating the Accuracy and Precision of Cranial
Morphological Traits for Sex Determination. J Forensic Sci 51:729-735.