John Blake
University of Aizu, Japan
Inter-annotator agreement:
By hook or by crook
www.orau.org
Overview
• Background
• Case study
– Annotation of scientific research abstracts
– Strategic decision points
• Findings
– Methodological improvements
– Statistical smoke and rhetorical mirrors
• Conclusions
Subjectivity in annotation
• POS tagging, phonetic transcription, etc.: basic annotation guidelines
• Annotation guidelines with discussion of boundary cases
• Speaker intuition, e.g. discourse annotation, pragmatics, etc.
Problem:
Vagueness and ambiguity in natural languages
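Manning (2011): $97.3^{21} / 100^{21} = 0.973^{21} \approx 56.28\%$, i.e. with 97.3% per-token tagging accuracy, only about 56% of 21-token sentences would be tagged entirely correctly (assuming errors are independent).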
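The Manning (2011) figure can be reproduced with a short Python sketch. It assumes, as the calculation does, that tagging errors are independent across tokens and that the sentence is 21 tokens long; real tagging errors tend to cluster, so this is a simplification.

```python
def sentence_accuracy(token_accuracy: float, tokens_per_sentence: int) -> float:
    """Probability that every token in a sentence is tagged correctly,
    assuming each token is tagged correctly independently of the others."""
    if not 0.0 <= token_accuracy <= 1.0:
        raise ValueError("token_accuracy must be a proportion between 0 and 1")
    return token_accuracy ** tokens_per_sentence


# 97.3% per-token accuracy over a 21-token sentence
p = sentence_accuracy(0.973, 21)
print(f"{p:.2%}")  # prints 56.28%
```

Increasing the sentence length shows how quickly whole-sentence accuracy falls even when per-token accuracy looks high.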
Automated and manual annotation compared
Feature | Automated annotation | Manual annotation
Subjective agent | Software developer | Annotator
Subjective stage | Prior to annotation | During annotation
Replicability | (Near) perfect | Variable
Initial set-up cost | High (if new software) | Low
Ongoing cost | (Near) zero | High
Scalable | Yes | No
Dependent condition | Availability of training set | Availability of annotators (contingent on time/money)
Speed | (Near) instantaneous | Variable
Factors considered | Endogenous | Endogenous and exogenous
Strength | Grammatical parsing | Semantic parsing
Inter-annotator agreement
Crucial issue: Are the annotations correct?
We are interested in validity
• Ability to discriminate without error by placing item into appropriate category
But there is no “Ground truth”
• Linguistic categories are determined by human judgement
→ Implication: we cannot measure correctness directly.
So we measure reliability, e.g. reproducibility:
• Intra-annotator reliability
• Inter-annotator reliability
i.e. whether human coders/annotators consistently make the same decisions.
→ Assumption 1: lack of reliability rules out validity (text/training issues)
→ Assumption 2: high reliability implies validity
Terminology credit (Artstein & Poesio, 2008)
Idea adapted from Boleda & Evert (2009): https://clseslli09.files.wordpress.com/2009/07/02_iaa-slides2.pdf/
Simple example 1
(sentences abbreviated for readability)
Sentence | Coder 1 | Coder 2 | Agreement
We address the problem of …… recognition | I | P | ✗
Our aim is to …recognize [x] from [y]. | P | P | ✓
[A] is set up as prior information, and its pose is determined by three parameters, which are [j,k and l]. | M | M | ✓
An efficient local gradient-based method is proposed to …, which is combined into … framework to estimate [V and W] by iterative evolution | P | R | ✗
It is shown that the local gradient-based method can evaluate accurately and efficiently [V and W]. | R | R | ✓
Observed agreement between Coders 1 and 2 is 3/5 = 60%.
IAA measures: Kappa coefficient
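Inter-annotator agreement was 60% in the previous example, but the chance agreement figure is 20% (1 in 5, if each of the five move tags is assumed equally likely). Agreement measures must be corrected for chance agreement (Carletta, 1996).
Kappa coefficient (Cohen, 1960, for two coders; Fleiss, 1971, for more than two)
e.g. Corrected measure:
$$\kappa = \frac{P(A) - P(E)}{1 - P(E)}$$
where P(A) is the observed agreement and P(E) is the agreement expected by chance.
κ = 1: perfect agreement; κ = 0: agreement no better than chance; κ < 0 (minimum −1): agreement worse than chance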
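A minimal Python sketch of the corrected measure, applied to the five sentences of Simple example 1 (labels transcribed from that table). It estimates chance agreement in two ways: assuming all five move tags are equally likely, which is one reading of the 20% figure, and, as in Cohen's (1960) kappa, from each coder's own label proportions. The function names are illustrative, not from any library.

```python
from collections import Counter
from typing import Sequence


def observed_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """P(A): proportion of items given the same label by both coders."""
    if len(a) != len(b) or not a:
        raise ValueError("coders must label the same, non-empty set of items")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def chance_corrected(p_a: float, p_e: float) -> float:
    """Generic chance correction (P(A) - P(E)) / (1 - P(E))."""
    return (p_a - p_e) / (1 - p_e)


def uniform_kappa(a: Sequence[str], b: Sequence[str], n_categories: int) -> float:
    """Chance agreement taken as 1/k, i.e. every category equally likely."""
    return chance_corrected(observed_agreement(a, b), 1 / n_categories)


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa: chance agreement from each coder's own label proportions."""
    p_a = observed_agreement(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return chance_corrected(p_a, p_e)


# Labels from Simple example 1 (I, P, M, R = move tags)
coder1 = ["I", "P", "M", "P", "R"]
coder2 = ["P", "P", "M", "R", "R"]

print(f"P(A)          = {observed_agreement(coder1, coder2):.2f}")  # 0.60
print(f"uniform kappa = {uniform_kappa(coder1, coder2, 5):.2f}")    # 0.50
print(f"Cohen's kappa = {cohen_kappa(coder1, coder2):.2f}")         # 0.44
```

With these labels Cohen's chance agreement is 0.28 rather than 0.20, so the two versions give 0.50 and 0.44: the chance model used is itself a detail worth reporting. NLTK's `nltk.metrics.agreement.AnnotationTask` class provides tested implementations of kappa, alpha and related coefficients.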
Interpretation of Kappa
• Landis and Koch (1977): 0.61–0.80 substantial; 0.81–1.00 almost perfect
• Krippendorff (1980): 0.67–0.79 tentative; 0.80+ good
• Green (1997): 0.40–0.74 fair/good; 0.75+ high
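The Landis and Koch (1977) bands can be encoded as a simple lookup, sketched below in Python (the lower bands, from "poor" to "moderate", are the remaining categories of the same published scale). A κ of 0.5, for example, is "moderate" on this scale but falls below the 0.67 tentative threshold of Krippendorff (1980), which is one reason to report the coefficient itself and not only its verbal label.

```python
# Upper bound of each Landis and Koch (1977) band, in ascending order
LANDIS_KOCH_BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]


def landis_koch_label(kappa: float) -> str:
    """Return the Landis and Koch (1977) descriptive label for a kappa value."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0.0:
        return "poor"
    for upper_bound, label in LANDIS_KOCH_BANDS:
        if kappa <= upper_bound:
            return label
    return LANDIS_KOCH_BANDS[-1][1]  # not reached: kappa <= 1.0 here


print(landis_koch_label(0.5))  # moderate
```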
IAA measures: Sophisticated
Typical measures used in computational linguistics are built into NLP toolkits such as NLTK and GATE.
Rather than measuring agreement alone, we can measure both agreement and disagreement, e.g. using MASI (Measuring Agreement on Set-valued Items) and/or Jaccard distance. Both MASI (Passonneau, 2006) and Jaccard distance make use of the union and intersection of sets.
The Jaccard distance formula (Jaccard, 1908, cited in Dunn & Everitt, 2004) is:
$$d_J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|}$$
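Both distances can be sketched in a few lines of Python for set-valued labels, e.g. a sentence that an annotator tags with more than one move. The MASI weights (1 for identical sets, 2/3 when one set contains the other, 1/3 for other overlap, 0 for disjoint sets) follow Passonneau (2006); the example sets are hypothetical. NLTK's `nltk.metrics.distance` module provides `jaccard_distance` and `masi_distance` functions for production use.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|: 0 for identical sets, 1 for disjoint sets."""
    union = a | b
    if not union:  # two empty sets: treat as identical
        return 0.0
    return 1 - len(a & b) / len(union)


def masi_distance(a: set, b: set) -> float:
    """MASI distance: Jaccard similarity weighted by a monotonicity factor."""
    union = a | b
    if not union:
        return 0.0
    intersection = a & b
    if a == b:
        m = 1.0        # identical sets
    elif a <= b or b <= a:
        m = 2 / 3      # one set subsumes the other
    elif intersection:
        m = 1 / 3      # overlapping, neither subsumes the other
    else:
        m = 0.0        # disjoint sets
    return 1 - (len(intersection) / len(union)) * m


# Hypothetical set-valued move labels for one sentence
coder1 = {"Purpose", "Method"}
coder2 = {"Method"}
print(jaccard_distance(coder1, coder2))         # 0.5
print(round(masi_distance(coder1, coder2), 3))  # 0.667
```

MASI penalizes partial matches more heavily than Jaccard distance, so the two measures can rank the same annotations differently.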
Case study overview
• Moves in scientific research abstracts
• Scientific disciplines
• Core corpus specifications
• Example abstract
• Tagset
• Strategic decision points (tag #IAA extraction)
NB: By convention this far-from-linear study is
presented in a linear fashion when in fact there
were numerous forks, dead-ends and iterations.
Moves in scientific research abstracts
Move definition
“a discoursal or rhetorical unit that performs a coherent
communicative function in a written or spoken discourse”.
(Swales, 2004, p.228)
Move sequences
Example (very short) abstract
5-move code: Introduction, Purpose, Method, Results, Discussion
Scientific disciplines
Science
• Fundamental
– Empirical
  – Natural
    – Physical: Materials science
    – Life: Botany
  – Social: Linguistics
– Theoretical
  – Formal: Information theory
• Applied
– Engineering: Evolutionary computation; Knowledge & data engineering; Image processing; Wireless communications; Electronic engineering
– Healthcare: Medical
Core 1000 corpus specifications
No. | Code | Journal name | # abstracts | # words
1 | EC | Transactions on Evolutionary Computation | 100 | 17,433
2 | KDE | Transactions on Knowledge and Data Engineering | 100 | 18,407
3 | IP | Transactions on Image Processing | 100 | 16,859
4 | IT | Transactions on Information Theory | 100 | 15,982
5 | WC | Transactions on Wireless Communications | 100 | 15,971
6 | Mat | Advanced Materials | 100 | 6,078
7 | Bot | The Plant Cell | 100 | 19,981
8 | Ling | Applied Linguistics; Journal of Communication; Journal of Cognitive Neuroscience | 100 | 13,587
9 | Eng | Transactions on Industrial Electronics | 100 | 14,569
10 | Med | British Medical Journal | 100 | 29,437
Total | | | 1000 | 168,304
The first 100 abstracts of research articles published from January 2012 in top-tier journals.
Standard abstract (IT)
We study the detection error probability associated with a balanced binary relay tree, where the leaves of the tree correspond to N identical and independent sensors. The root of the tree represents a fusion center that makes the overall detection decision. Each of the other nodes in the tree is a relay node that combines two binary messages to form a single output binary message. Only the leaves are sensors. In this way, the information from the sensors is aggregated into the fusion center via the relay nodes. In this context, we describe the evolution of the Type I and Type II error probabilities of the binary data as it propagates from the leaves toward the root. Tight upper and lower bounds for the total error probability at the fusion center as functions of N are derived. These characterize how fast the total error probability converges to 0 with respect to N, even if the individual sensors have error probabilities that converge to 1/2.
[IT 120616]
Tagset
Manual annotation using UAM Corpus Tool 2.X and 3.X (O'Donnell, 2015)
This layer of annotation is for rhetorical moves.
There are 5 choices of moves and 6 choices of submoves.
In short, each ontological unit is assigned to one of 9 choices.
The “uncertain” tag is designed as a temporary label.
#IAA theme extraction
Strategic decision points
• A research log was kept, organized by theme using hashtags, e.g. #meth, #stats, #IAA
• 142 notes relating to #IAA, written between 2012 and 2017, were identified.
• The findings presented are drawn from the notes that are most important and most generalizable to other projects.
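Extracting notes by theme from a hashtag-organized research log can be sketched in Python. The log format (one dated note per line) and the example notes are hypothetical; the slides do not describe how the log was stored.

```python
import re

# Hypothetical log format: one note per line, starting with an ISO date.
HASHTAG = re.compile(r"#(\w+)")


def notes_with_tag(lines: list[str], tag: str) -> list[str]:
    """Return the notes that carry the given hashtag (case-insensitive)."""
    wanted = tag.lstrip("#").lower()
    return [
        line for line in lines
        if wanted in {t.lower() for t in HASHTAG.findall(line)}
    ]


log = [
    "2012-05-14 Report kappa as well as raw agreement? #IAA #stats",
    "2013-02-01 Add boundary-case examples to booklet #meth",
    "2016-11-30 Annotator B drifting on Purpose tags #iaa",
]
for note in notes_with_tag(log, "#IAA"):
    print(note)
```

Matching case-insensitively guards against inconsistent tagging (#IAA vs #iaa), which is common in a log kept over several years.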
Findings overview:
Three types of strategic decisions affecting IAA
1. Methodological decisions
2. Statistical decisions
3. Rhetorical decisions
Findings (1)
Methodological choices to enhance IAA
A. Ontological unit
B. Tagset size
C. Tag clarity of demarcation
D. Catch-all tags
E. Detailed coding booklet
F. Pre-selection, training and testing
G. Easy-to-use tools
H. Monitoring, feedback and regular meetings
I. Pilot studies and small trials
Finding 1a: Ontological unit
Fixed ontological units (i.e. what you code), e.g. each word or each sentence, simplify the calculation of IAA and increase IAA, since the boundaries of each unit are identical for all annotators.
Variable ontological units give researchers additional choices in how to calculate (manipulate?) IAA: identical, subsumed or cross-over spans. Should agreement be calculated by character (including white space?), by letter, by word, or by some other unit?
Example: comparing “I love you.” with “I love him.” (8 letters, 3 words and 11 characters each), the agreement ratio is 5/8 = 0.625 by letter, 2/3 ≈ 0.667 by word and 8/11 ≈ 0.727 by character.
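The effect of the unit choice can be sketched in Python by comparing the two example strings position by position at three granularities. This toy comparison assumes both strings have the same number of units at each level; real variable-length spans would first need alignment.

```python
def positional_agreement(a, b) -> float:
    """Share of positions at which two equal-length sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


s1, s2 = "I love you.", "I love him."

by_char = positional_agreement(s1, s2)  # every character, incl. spaces and "."
by_letter = positional_agreement(
    [c for c in s1 if c.isalpha()], [c for c in s2 if c.isalpha()]
)
by_word = positional_agreement(s1.rstrip(".").split(), s2.rstrip(".").split())

print(f"letters {by_letter:.3f}, words {by_word:.3f}, characters {by_char:.3f}")
# letters 0.625, words 0.667, characters 0.727
```

The same pair of annotations yields three different agreement ratios, so the unit of comparison should always be reported.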
Finding 1b: Tagset size
The more tags, the less agreement
Rissanen (1989, as cited in Archer, 2012, n.p.) points out the
“mystery of vanishing reliability”
i.e. the statistical unreliability of annotation that is too detailed.
Obvious with hindsight, but researchers tend to develop tags
that will inform their research rather than result in higher IAA.
1 tag = total agreement (but probably no reason to code)
10 tags = less agreement
100 tags = much less agreement
1000 tags = almost no chance of high IAA
Finding 1c:
Tagset clarity of demarcation
Pilot studies of possible tags and tagsets
Pilot study:
Tagged 100 abstracts using IMRD move and CARS move tags
Difficulty:
1. prevalence of method in IMRD positions
2. demarcation of boundary cases → created a standard operating procedure (SOP), codified in the coding booklet
Final selection:
Dropped both sets of tags and selected the Hyland (2004, p.67) IPMPC tagset
Finding 1d: Catch-all tags
Tag | Description
Fuzzy | Used when an item is difficult to assign to a tag in the existing tagset
Multiple | Used when more than one tag applies
Portmanteau | Used when an item transcends two tag domains
Problematic | Used when it is impossible to assign a tag
Archer (2012, n.p.) describes these four tag types, all of which increase IAA by providing easy-to-code options for boundary cases.
My “uncertain” tag is a catch-all. Calculating IAA with “uncertain” included results in higher IAA.
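The effect of a catch-all tag can be illustrated with a Python sketch that computes observed agreement with the “uncertain” items included and with them excluded. The labels are invented toy data, not results from this study, and excluding items is only one of several possible treatments.

```python
CATCH_ALL = "uncertain"


def observed_agreement(pairs: list[tuple[str, str]]) -> float:
    """Proportion of (coder 1, coder 2) label pairs that match."""
    if not pairs:
        raise ValueError("no items to compare")
    return sum(a == b for a, b in pairs) / len(pairs)


def agreement_with_and_without(labels1, labels2, catch_all=CATCH_ALL):
    """Agreement over all items, and over items where neither coder used the catch-all."""
    pairs = list(zip(labels1, labels2))
    kept = [(a, b) for a, b in pairs if catch_all not in (a, b)]
    return observed_agreement(pairs), observed_agreement(kept)


# Hypothetical move labels for six sentences (I, P, M, Pr, C tags)
coder1 = ["P", "uncertain", "M", "uncertain", "Pr", "P"]
coder2 = ["P", "uncertain", "M", "uncertain", "C", "M"]
with_catch_all, without_catch_all = agreement_with_and_without(coder1, coder2)
print(f"{with_catch_all:.2f} vs {without_catch_all:.2f}")  # 0.67 vs 0.50
```

Matching “uncertain” labels count as agreements, so the first figure is inflated by exactly the items that were hardest to code.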
Finding 1e:
Annotation (coding) booklet
Standard operating procedure
• Guidelines, Rules, Examples, Borderline cases disambiguated
Finding 1f:
Training course and test
Course based on annotation booklet
• Face-to-face and/or online
Test based on annotation booklet
• Serialist tests
• Holistic tests
Qualification cut-off points
• e.g. 90% or above: can start annotating
• e.g. 61–89%: needs additional training
• e.g. 60% or below: discontinue training
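The qualification cut-offs can be expressed as a small Python function. The 90% and 60% thresholds are the slide's example values; treating every score in between as "needs additional training" is an assumption about how the bands fit together.

```python
def qualification_decision(score_percent: float,
                           start_threshold: float = 90.0,
                           discontinue_threshold: float = 60.0) -> str:
    """Map a training-test score (in %) to the next step for a trainee annotator."""
    if not 0.0 <= score_percent <= 100.0:
        raise ValueError("score must be a percentage between 0 and 100")
    if score_percent >= start_threshold:
        return "start annotating"
    if score_percent > discontinue_threshold:
        return "additional training"
    return "discontinue training"


for score in (95, 75, 60):
    print(score, qualification_decision(score))
```

Keeping the thresholds as parameters makes it easy to report, and later justify, the exact benchmark applied.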
Finding 1g:
Easy-to-use annotation tools
• Tool and instructions!
• UAM Corpus Tool – help forum in Spanish
• Wrote project-specific instruction booklet for annotators
Finding 1h: monitoring,
feedback and regular meetings
I believe these three practices led to greater retention of annotators and higher accuracy.
• More monitoring in initial stages (real-time is possible in GATE)
– to identify problems early
• Constructive actionable feedback
– to retain annotator and increase accuracy
• Regular meetings
– annotators who cancelled meetings tended to have a
problem (either with annotation or in their life).
I helped with annotation issues.
Finding 1i: Pilot studies
Various pilot studies and small-scale trials.
These enable the researcher to discover issues and proactively avert potential problems.
• 136 abstracts: SFL annotation of process, participant and circumstance
• 136 abstracts: SFL annotation of sub-categories of circumstance
• 10 abstracts: multimethod
• 500 abstracts: lexicogrammatical
• 40 abstracts: specialist vs linguist IMRaD annotation
• 100 abstracts: tagset selection (CARS vs IMRaD)
• 3 people: development of coding booklet
• 10 abstracts: examples vs coding booklet
• 2 people: development of training course
• 500 abstracts: rhetorical moves using coding booklet (by self)
• 1000 abstracts: rhetorical moves using coding booklet (by self & annotators)
• 2500 abstracts: rhetorical moves using coding booklet (by annotators)
Findings (2)
Statistical choices to enhance IAA
A. Cherry-picking the population-to-sample size ratio
B. Random vs systematic sampling
C. Dealing with outlier annotators
• Omit them [+ justify?]; replace with the mean [?]
D. Sample selection:
• early vs later coding
• pre-discussion vs post-discussion
E. Granularity (see next slide)
• Reducing granularity by merging units: fewer categories, higher agreement
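Random and systematic subsampling for an IAA check can be sketched in Python. The function names and the 1000-item ID list are hypothetical; whichever method is used, fixing the seed or start offset before annotation and reporting it is one way to show that the subsample was not cherry-picked.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def random_subsample(items: Sequence[T], n: int, seed: Optional[int] = None) -> list[T]:
    """Simple random sample of n items; a fixed seed makes it reproducible."""
    rng = random.Random(seed)
    return rng.sample(list(items), n)


def systematic_subsample(items: Sequence[T], n: int, start: int = 0) -> list[T]:
    """Every k-th item, where k = len(items) // n, beginning at index `start`."""
    if not 0 < n <= len(items):
        raise ValueError("n must be between 1 and len(items)")
    step = len(items) // n
    if not 0 <= start < step:
        raise ValueError("start must lie within the first interval")
    return [items[start + i * step] for i in range(n)]


abstract_ids = list(range(1, 1001))  # hypothetical IDs for a 1000-abstract corpus
print(systematic_subsample(abstract_ids, 100)[:5])       # [1, 11, 21, 31, 41]
print(len(random_subsample(abstract_ids, 100, seed=42)))  # 100
```

Systematic sampling spreads the subsample evenly through the corpus, but if the corpus is ordered (e.g. by journal or date) that order should be stated alongside the step size.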
Finding 2e: Granularity
Measures of IAA increase greatly as granularity decreases.
[Figure: scale running from lower IAA to higher IAA]
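Merging categories can be sketched in Python: the same annotations are scored at submove level and again after collapsing each submove to its parent move. The labels and the "Move.submove" naming are hypothetical, chosen only to show the mechanism.

```python
def observed_agreement(a: list[str], b: list[str]) -> float:
    """Proportion of items given the same label by both coders."""
    if len(a) != len(b) or not a:
        raise ValueError("coders must label the same, non-empty set of items")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def to_move(label: str) -> str:
    """Collapse a hypothetical 'Move.submove' label to its move, e.g. 'M.data' -> 'M'."""
    return label.split(".", 1)[0]


# Hypothetical submove-level labels for six sentences
coder1 = ["I.background", "P.aim", "M.data", "M.procedure", "Pr.result", "C.implication"]
coder2 = ["I.gap", "P.aim", "M.procedure", "M.procedure", "Pr.result", "Pr.result"]

fine = observed_agreement(coder1, coder2)
coarse = observed_agreement([to_move(x) for x in coder1], [to_move(x) for x in coder2])
print(f"submove level: {fine:.2f}, move level: {coarse:.2f}")
# submove level: 0.50, move level: 0.83
```

Disagreements within a move disappear once submoves are merged, so only cross-move disagreements remain.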
Findings (3)
Rhetorical choices to enhance IAA
Claim high IAA with no further details
+ gold standard with no further details and/or
+ provide a simple ratio or percentage and/or
+ provide details of sample size
Rely on vagueness and ambiguity to let readers infer either a higher IAA than was actually found or that IAA is genuinely high.
Conclusion
High IAA may be due to
• sound or cogent methodological choices;
but it could also be due to manipulating the
• statistical smoke
(i.e. selecting parameters leading to higher IAA)
and
• rhetorical mirrors.
(i.e. using vagueness/ambiguity to imply that IAA is high)
Most publications in applied linguistics do not provide sufficient detail.
Best practice suggestions
• Annotate using tags one level finer than required.
• Create annotation booklet with clear rules,
examples and discussion of boundary cases.
• Develop, trial and require all annotators to
complete a training course.
• Set a benchmark standard.
• Monitor and provide constructive actionable
feedback to annotators.
• Report IAA in sufficient detail to convince
skeptical readers.
Beware of the
skeleton in the cupboard
• Researchers aim to
portray their work as
sound or cogent.
• Actual IAA may differ
from reported IAA
• Be wary of statistical
smoke and
rhetorical mirrors
Any questions, suggestions or
comments?
John Blake
jblake@u-aizu.ac.jp

Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
National Information Standards Organization (NISO)
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
Academy of Science of South Africa
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
Israel Genealogy Research Association
 
How to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 InventoryHow to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 Inventory
Celine George
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
Nicholas Montgomery
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
RitikBhardwaj56
 
Cognitive Development Adolescence Psychology
Cognitive Development Adolescence PsychologyCognitive Development Adolescence Psychology
Cognitive Development Adolescence Psychology
paigestewart1632
 

Recently uploaded (20)

BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
 
A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
 
Community pharmacy- Social and preventive pharmacy UNIT 5
Community pharmacy- Social and preventive pharmacy UNIT 5Community pharmacy- Social and preventive pharmacy UNIT 5
Community pharmacy- Social and preventive pharmacy UNIT 5
 
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPLAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
 
World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
 
How to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 InventoryHow to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 Inventory
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
 
Cognitive Development Adolescence Psychology
Cognitive Development Adolescence PsychologyCognitive Development Adolescence Psychology
Cognitive Development Adolescence Psychology
 

Interannotator Agreement

  • 1. Inter-annotator agreement: By hook or by crook
    John Blake, University of Aizu, Japan
    www.orau.org
  • 2. Overview
    • Background
    • Case study
      – Annotation of scientific research abstracts
      – Strategic decision points
    • Findings
      – Methodological improvements
      – Statistical smoke and rhetorical mirrors
    • Conclusions
  • 3. Subjectivity in annotation
    • POS tagging, phonetic transcription, etc.
    • Annotation guidelines with discussion of boundary cases
    • Basic annotation guidelines
    • Speaker intuition, e.g. discourse annotation, pragmatics, etc.
    Problem: vagueness and ambiguity in natural languages
    Manning (2011): (97.3/100)^21 = 0.973^21 ≈ 56.28%
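Reading the garbled figure on this slide as (97.3/100)^21, the arithmetic can be checked in a few lines of Python. This is a sketch under assumptions the slide does not state: that 97.3% is a per-token tagging accuracy, that 21 is a sentence length in tokens, and that errors occur independently, so a sentence is tagged entirely correctly only if every one of its tokens is.

```python
# Assumed inputs: per-token accuracy and sentence length read off the slide's formula.
token_accuracy = 0.973
sentence_length = 21

# Under the independence assumption, every token must be correct
# for the whole sentence to be correct.
sentence_accuracy = token_accuracy ** sentence_length

print(f"{sentence_accuracy:.2%}")  # 56.28%
```

Even a very high per-item agreement can therefore shrink considerably once it is aggregated over larger units.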
  • 4. Automated and manual annotation compared
    Criterion           | Automated annotation         | Manual annotation
    Subjective agent    | Software developer           | Annotator
    Subjective stage    | Prior to annotation          | During annotation
    Replicability       | (Near) perfect               | Variable
    Initial set-up cost | High (if new software)       | Low
    Ongoing cost        | (Near) zero                  | High
    Scalable            | Yes                          | No
    Dependent condition | Availability of training set | Availability of annotators (contingent on time/money)
    Speed               | (Near) instantaneous         | Variable
    Factors considered  | Endogenous                   | Endogenous and exogenous
    Strength            | Grammatical parsing          | Semantic parsing
  • 5. Inter-annotator agreement
    Crucial issue: are the annotations correct?
    We are interested in validity
    • The ability to discriminate without error by placing each item into the appropriate category
    But there is no "ground truth"
    • Linguistic categories are determined by human judgement
    → Implication: we cannot measure correctness directly
    So we measure reliability, e.g. reproducibility:
    • Intra-annotator reliability
    • Inter-annotator reliability
    i.e. whether human coders/annotators consistently make the same decisions
    → Assumption 1: lack of reliability rules out validity (text/training issues)
    → Assumption 2: high reliability implies validity
    Terminology credit: Artstein & Poesio (2008)
    Idea adapted from Boldea & Evert (2009): https://clseslli09.files.wordpress.com/2009/07/02_iaa-slides2.pdf/
  • 6. Simple example 1 (sentences abbreviated for readability)
    1. "We address the problem of …… recognition"  Coder 1: I, Coder 2: P (disagree)
    2. "Our aim is to …recognize [x] from [y]."  Coder 1: P, Coder 2: P (agree)
    3. "[A] is set up as prior information, and its pose is determined by three parameters, which are [j,k and l]."  Coder 1: M, Coder 2: M (agree)
    4. "An efficient local gradient-based method is proposed to …, which is combined into … framework to estimate [V and W] by iterative evolution"  Coder 1: P, Coder 2: R (disagree)
    5. "It is shown that the local gradient-based method can evaluate accurately and efficiently [V and W]."  Coder 1: R, Coder 2: R (agree)
    Observed agreement between coders 1 and 2: 3/5 = 60%
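The observed agreement in Simple example 1 is simply the share of sentences that both coders gave the same tag. A minimal Python sketch, with the tags transcribed from the example; the function name `observed_agreement` is illustrative and not taken from any library.

```python
# Move tags from Simple example 1 (I = Introduction, P = Purpose, M = Method, R = Results).
coder1 = ["I", "P", "M", "P", "R"]
coder2 = ["P", "P", "M", "R", "R"]


def observed_agreement(a, b):
    """Proportion of items to which both coders assigned the same tag."""
    if len(a) != len(b):
        raise ValueError("both coders must tag the same items")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


print(observed_agreement(coder1, coder2))  # 0.6
```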
  • 7. IAA measures: Kappa coefficient
    Observed inter-annotator agreement in the previous example is 60%, but the chance agreement figure is 20%.
    Agreement measures must be corrected for chance agreement (Carletta, 1996).
    Kappa coefficient (Cohen, 1960, for two coders; Fleiss, 1971, for two or more), a chance-corrected measure:
      κ = (P(A) − P(E)) / (1 − P(E))
    where P(A) is the observed agreement and P(E) the agreement expected by chance.
    κ = 1: perfect agreement; κ = 0: agreement no better than chance; κ < 0 (down to −1): agreement worse than chance
    Interpretation of kappa
    • Landis and Koch (1977): 0.61–0.80 substantial; 0.81–1.00 almost perfect
    • Krippendorff (1980): 0.67–0.79 tentative; 0.80+ good
    • Green (1997): 0.40–0.74 fair/good; 0.75+ high
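Applying the kappa formula to Simple example 1 shows how much the choice of chance model matters. The sketch below assumes that the slide's 20% comes from five move tags used with equal probability (1/5). Cohen's (1960) own estimate of P(E) instead multiplies the two coders' observed tag proportions. Function names are illustrative.

```python
from collections import Counter

coder1 = ["I", "P", "M", "P", "R"]
coder2 = ["P", "P", "M", "R", "R"]


def kappa(p_a, p_e):
    """Chance-corrected agreement: (P(A) - P(E)) / (1 - P(E))."""
    return (p_a - p_e) / (1 - p_e)


def cohen_expected(a, b):
    """P(E) from each coder's own tag distribution, as in Cohen's kappa."""
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    # Counter returns 0 for tags one coder never used.
    return sum(ca[t] * cb[t] for t in ca.keys() | cb.keys()) / (n * n)


p_a = sum(x == y for x, y in zip(coder1, coder2)) / len(coder1)  # 0.6

# Assumed chance model behind the slide's 20%: five tags, equally likely.
k_uniform = kappa(p_a, 1 / 5)

# Chance agreement estimated from the coders' actual tag frequencies.
p_e_cohen = cohen_expected(coder1, coder2)
k_cohen = kappa(p_a, p_e_cohen)

print(round(k_uniform, 2), round(p_e_cohen, 2), round(k_cohen, 2))  # 0.5 0.28 0.44
```

The same 60% raw agreement therefore yields κ = 0.50 or κ ≈ 0.44 depending on the chance model. For real data, `sklearn.metrics.cohen_kappa_score(coder1, coder2)` implements Cohen's marginal-based version and should reproduce the second figure.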
  • 8. IAA measures: Sophisticated
    e.g. typical measures used in computational linguistics and built into NLP toolkits such as NLTK and GATE.
    Rather than measuring agreement alone, we can measure both agreement and disagreement, e.g. using Measuring Agreement on Set-valued Items (MASI) and/or Jaccard distance.
    Both MASI (Passonneau, 2006) and Jaccard distance make use of the union and intersection of sets.
    The Jaccard distance (Jaccard, 1908, cited in Dunn & Everitt, 2004) is:
      d_J(A, B) = 1 − |A ∩ B| / |A ∪ B|
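For set-valued annotations (for example, a sentence given more than one move tag), both distances compare what two coders' tag sets share with what they jointly contain. The following is a standalone Python sketch. The MASI weighting follows Passonneau (2006): 1 for identical sets, 2/3 when one set contains the other, 1/3 for other overlapping sets, and 0 for disjoint sets. NLTK offers functions with the same names in `nltk.metrics.distance`, but the code below does not depend on it.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|; 0 means identical sets, 1 means disjoint sets."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1 - len(a & b) / len(union)


def masi_distance(a: set, b: set) -> float:
    """1 - (Jaccard similarity * monotonicity weight M)."""
    union = a | b
    if not union:
        return 0.0
    inter = a & b
    if a == b:
        m = 1.0
    elif a <= b or b <= a:  # one set subsumes the other
        m = 2 / 3
    elif inter:  # partial overlap
        m = 1 / 3
    else:  # disjoint
        m = 0.0
    return 1 - (len(inter) / len(union)) * m


# One coder tags a sentence as Purpose + Method, the other as Method only.
print(jaccard_distance({"P", "M"}, {"M"}), round(masi_distance({"P", "M"}, {"M"}), 3))  # 0.5 0.667
```

MASI penalizes partial overlap more heavily than Jaccard, so the same pair of tag sets looks less similar under MASI.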
  • 9. Case study overview
    • Moves in scientific research abstracts
    • Scientific disciplines
    • Core corpus specifications
    • Example abstract
    • Tagset
    • Strategic decision points (#IAA tag extraction)
    NB: By convention, this far-from-linear study is presented in a linear fashion, when in fact there were numerous forks, dead ends and iterations.
  • 10. Moves in scientific research abstracts
    Move definition: "a discoursal or rhetorical unit that performs a coherent communicative function in a written or spoken discourse" (Swales, 2004, p. 228)
    Move sequences
    Example (very short) abstract
    5-move code: Introduction, Purpose, Method, Results, Discussion
  • 11. Scientific disciplines
    Science
    – Fundamental
      – Empirical
        – Natural
          – Physical: Materials science
          – Life: Botany
        – Social: Linguistics
      – Theoretical
        – Formal: Information theory
    – Applied
      – Engineering: Evolutionary computation; Knowledge & data engineering; Image processing; Wireless computing; Electronic engineering
      – Healthcare: Medical
  • 12. Core 1000 corpus specifications
    No. | Code | Journal name                                      | Abstracts |   Words
    1   | EC   | Transactions on Evolutionary Computation          |       100 |  17,433
    2   | KDE  | Transactions on Knowledge and Data Engineering    |       100 |  18,407
    3   | IP   | Transactions on Image Processing                  |       100 |  16,859
    4   | IT   | Transactions on Information Theory                |       100 |  15,982
    5   | WC   | Transactions on Wireless Communications           |       100 |  15,971
    6   | Mat  | Advanced Materials                                |       100 |   6,078
    7   | Bot  | The Plant Cell                                    |       100 |  19,981
    8   | Ling | App. Ling.; Journal of Comm.; J. of Cog. Neurosc. |       100 |  13,587
    9   | Eng  | Transactions on Industrial Electronics            |       100 |  14,569
    10  | Med  | British Medical Journal                           |       100 |  29,437
        |      | Total                                             |      1000 | 162,232
    The first 100 abstracts of research articles published from January 2012 in each of these top-tier journals.
  • 13. Standard abstract (IT)
    We study the detection error probability associated with a balanced binary relay tree, where the leaves of the tree correspond to N identical and independent sensors. The root of the tree represents a fusion center that makes the overall detection decision. Each of the other nodes in the tree is a relay node that combines two binary messages to form a single output binary message. Only the leaves are sensors. In this way, the information from the sensors is aggregated into the fusion center via the relay nodes. In this context, we describe the evolution of the Type I and Type II error probabilities of the binary data as it propagates from the leaves toward the root. Tight upper and lower bounds for the total error probability at the fusion center as functions of N are derived. These characterize how fast the total error probability converges to 0 with respect to N, even if the individual sensors have error probabilities that converge to 1/2. [IT 120616]
  • 14. Tagset
    Manual annotation using UAM Corpus Tool 2.X and 3.X (O'Donnell, 2015)
    This layer of annotation is for rhetorical moves. There are 5 choices of moves and 6 choices of submoves. In short, each ontological unit is assigned to one of 9 choices. The "uncertain" tag is designed as a temporary label.
  • 15. #IAA theme extraction: Strategic decision points
    • A research log was kept using themes, e.g. #meth, #stats, #IAA
    • 142 notes relating to #IAA, written between 2012 and 2017, were identified.
    • The findings presented are the notes that are most important and most generalizable to other projects.
  • 16. Findings overview: Three types of strategic decisions affecting IAA
    1. Methodological decisions
    2. Statistical decisions
    3. Rhetorical decisions
  • 17. Findings (1) Methodological choices to enhance IAA
    A. Ontological unit
    B. Tagset size
    C. Tag clarity of demarcation
    D. Catch-all tags
    E. Detailed coding booklet
    F. Pre-selection, training and testing
    G. Easy-to-use tools
    H. Monitoring, feedback and regular meetings
    I. Pilot studies and small trials
  • 18. Finding 1a: Ontological unit
    Fixed ontological units (i.e. what you code, e.g. each word or each sentence) simplify the calculation of IAA and increase IAA, since the boundaries of each unit are identical for all annotators.
    Variable ontological units give researchers additional choices in how to calculate (manipulate?) IAA: identical, subsumed or cross-over spans.
    Which unit should agreement be calculated on: characters (including white space?), letters, words, or something else?
      I love you.   8 letters, 3 words, 11 characters
      I love him.
      Agreement ratio: 5/8 = 0.625 (letters), 2/3 ≈ 0.67 (words), 8/11 ≈ 0.73 (characters)
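The three agreement ratios depend entirely on the unit chosen. The sketch below assumes units are compared position by position (the slide does not say how alignment was done): letters exclude spaces and punctuation, words are stripped of punctuation, and characters include both. The helper names are hypothetical.

```python
import string


def positional_agreement(a, b):
    """Share of aligned positions holding identical units (sequences must be equal length)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same number of units")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def letters(s):
    return [c for c in s if c.isalpha()]


def words(s):
    return [w.strip(string.punctuation) for w in s.split()]


def characters(s):
    return list(s)  # includes white space and punctuation


s1, s2 = "I love you.", "I love him."
for name, unit in [("letters", letters), ("words", words), ("characters", characters)]:
    print(name, round(positional_agreement(unit(s1), unit(s2)), 3))
# letters 0.625
# words 0.667
# characters 0.727
```

The same pair of sentences yields three different IAA figures, which is why the unit of calculation should be reported.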
  • 19. Finding 1b: Tagset size
    The more tags, the less agreement.
    Rissanen (1989, as cited in Archer, 2012, n.p.) points out the "mystery of vanishing reliability", i.e. the statistical unreliability of annotation that is too detailed.
    Obvious in hindsight, but researchers tend to develop tags that will inform their research rather than tags that result in higher IAA.
    • 1 tag = total agreement (but probably no reason to code)
    • 10 tags = less agreement
    • 100 tags = much less agreement
    • 1000 tags = almost no chance of high IAA
  • 20. Finding 1c: Tagset clarity of demarcation
    Pilot studies of possible tags and tagsets
    Pilot study: tagged 100 abstracts using IMRD move tags and CARS move tags
    Difficulties:
      1. Prevalence of method in IMRD positions
      2. Demarcation of boundary cases → created an SOP, codified in the coding booklet
    Final selection: dropped both sets of tags and selected Hyland's (2004, p. 67) IPMPC tagset
  • 21. Finding 1d: Catch-all tags
    Tag          Description
    Fuzzy        Used when an item is difficult to assign to a tag in the existing tagset
    Multiple     Used when more than one tag applies
    Portmanteau  Used when an item transcends two tag domains
    Problematic  Used when it is impossible to assign a tag
    Archer (2012, n.p.) describes four tag types, all of which increase IAA by providing easy-to-code options for boundary cases.
    My "uncertain" tag is a catch-all. Calculating IAA with "uncertain" included results in higher IAA.
  • 22. Finding 1e: Annotation (coding) booklet
    Standard operating procedure
    • Guidelines, rules, examples, and disambiguation of borderline cases
  • 23. Finding 1f: Training course and test
    Course based on the annotation booklet
    • Face-to-face and/or online
    Test based on the annotation booklet
    • Serialist tests
    • Holistic tests
    Qualification cut-off points, e.g.
    • 90% or above: can start annotating
    • 61–89%: needs additional training
    • 60% or below: discontinue training
  • 24. Finding 1g: Easy-to-use annotation tools
    • Tool and instructions!
    • UAM Corpus Tool: help forum in Spanish
    • Wrote a project-specific instruction booklet for annotators
  • 25. Finding 1h: Monitoring, feedback and regular meetings
    I believe these three aspects led to greater retention of annotators and higher accuracy.
    • More monitoring in the initial stages (real-time monitoring is possible in GATE), to identify problems early
    • Constructive, actionable feedback, to retain annotators and increase accuracy
    • Regular meetings: annotators who cancelled meetings tended to have a problem (either with the annotation or in their life). I helped with annotation issues.
  • 26. Finding 1i: Pilot studies
    Various pilot studies and small-scale trials enable the researcher to discover issues and proactively avert potential problems:
    • 136 abstracts: SFL annotation of process, participant and circumstance
    • 136 abstracts: SFL annotation of subcategories of circumstance
    • 10 abstracts: multimethod
    • 500 abstracts: lexicogrammatical
    • 40 abstracts: specialist vs. linguist IMRaD annotation
    • 100 abstracts: tagset selection (CARS vs. IMRaD)
    • 3 people: development of coding booklet
    • 10 abstracts: examples vs. coding booklet
    • 2 people: development of training course
    • 500 abstracts: rhetorical moves using coding booklet, by self
    • 1000 abstracts: rhetorical moves using coding booklet, by self and annotators
    • 2500 abstracts: rhetorical moves using coding booklet, by annotators
  • 27. Findings (2) Statistical choices to enhance IAA
    A. Cherry-picking the population-to-sample size ratio
    B. Random vs. systematic
    C. Dealing with outliers (annotators)
       • Omit [+ justify?]; replace with mean [?]
    D. Sample selection:
       • early vs. later coding
       • pre-discussion vs. post-discussion
    E. Granularity (see next slide)
       • Reducing granularity by merging units: fewer categories, higher agreement
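Option C (omitting an outlier annotator) can move a headline figure a long way. The sketch below uses invented tags for ten sentences from three hypothetical annotators (A1 to A3). These are not data from this study. It averages observed agreement over all annotator pairs, with and without A3.

```python
from itertools import combinations


def observed_agreement(a, b):
    """Proportion of items given the same tag by two annotators."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_pairwise_agreement(annotations):
    """Average observed agreement over every pair of annotators."""
    pairs = list(combinations(annotations.values(), 2))
    return sum(observed_agreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical move tags (IPMRD) for ten sentences; illustrative only.
annotations = {
    "A1": list("IPPMMMRRRD"),
    "A2": list("IPPMMMRRDD"),
    "A3": list("PPMMRRRDDD"),  # the "outlier"
}

everyone = mean_pairwise_agreement(annotations)
without_a3 = mean_pairwise_agreement({k: v for k, v in annotations.items() if k != "A3"})
print(round(everyone, 2), round(without_a3, 2))  # 0.6 0.9
```

Dropping one annotator raises the reported figure from 0.6 to 0.9 without any change to the annotation itself, which is why exclusions need to be reported and justified.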
  • 28. Finding 2e: Granularity
    Measures of IAA increase greatly as granularity decreases.
    [Figure: scale from lower IAA to higher IAA]
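Merging categories raises agreement because any disagreement within a merged group disappears. The following minimal sketch uses invented fine-grained labels of the form move.submove (for example `M.data`). These labels are hypothetical and are not the project's actual submove tags.

```python
def observed_agreement(a, b):
    """Proportion of items given the same tag by both coders."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def coarsen(tags):
    """Merge hypothetical submove tags such as 'M.data' into their parent move 'M'."""
    return [t.split(".")[0] for t in tags]


# Hypothetical fine-grained annotations of six sentences.
coder1 = ["I", "P", "M.data", "M.procedure", "R.finding", "R.finding"]
coder2 = ["I", "P", "M.procedure", "M.procedure", "R.finding", "R.evaluation"]

fine = observed_agreement(coder1, coder2)
coarse = observed_agreement(coarsen(coder1), coarsen(coder2))
print(round(fine, 2), round(coarse, 2))  # 0.67 1.0
```

Reporting which level of granularity an IAA figure refers to lets readers judge how much of the agreement comes from merging.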
  • 29. Findings (3) Rhetorical choices to enhance IAA
    Claim high IAA with no further details
      + gold standard with no further details and/or
      + provide a simple ratio or percentage and/or
      + provide details of sample size
    Rely on vagueness and ambiguity to allow the reader to infer a higher IAA than was found, or that IAA is actually high.
  • 30. Conclusion
    High IAA may be due to
    • sound or cogent methodological choices; but it could also be due to manipulating
    • statistical smoke (i.e. selecting parameters that lead to higher IAA) and
    • rhetorical mirrors (i.e. using vagueness/ambiguity to imply that IAA is high).
    In most publications in applied linguistics, sufficient detail is not provided.
  • 31. Best practice suggestions
    • Annotate using tags one level more finely.
    • Create an annotation booklet with clear rules, examples and discussion of boundary cases.
    • Develop and trial a training course, and require all annotators to complete it.
    • Set a benchmark standard.
    • Monitor annotators and provide them with constructive, actionable feedback.
    • Report IAA in sufficient detail to convince skeptical readers.
  • 32. Beware of the skeleton in the cupboard
    • Researchers aim to portray their work as sound or cogent.
    • Actual IAA may differ from reported IAA.
    • Be wary of statistical smoke and rhetorical mirrors.
  • 33. Any questions, suggestions or comments?
    John Blake
    jblake@u-aizu.ac.jp