+ 
Automated Evaluation of Crowdsourced 
Annotations in the Cultural Heritage Domain 
Archana Nottamkandath, Jasper Oosterman, Davide Ceolin and Wan Fokkink 
VU University Amsterdam and TU Delft, The Netherlands 
1
+ 
Overview
• Project Overview
• Use case
• Research Questions
• Experiment
• Results
• Conclusion
2
+ 
Context
• COMMIT Project
– ICT project in the Netherlands
– Subprojects: SEALINCMedia and Data2semantics
• Socially Enriched Access to Linked Cultural Media (SEALINCMedia)
– Collaboration with cultural heritage institutions to enrich their collections and make them more accessible
3
+ 
Use case
• Cultural heritage (CH) institutions have large collections that are poorly annotated (Rijksmuseum Amsterdam: over 1 million items)
• Lack of sufficient resources: knowledge, cost, labor
• Solution: crowdsourcing
4
+ 
Crowdsourcing Annotation Tasks
[Diagram: an annotator from the crowd provides annotations (e.g. "Roses", "Garden", "Car") for an artefact (a painting or object); the provided annotations then undergo evaluation.]
5
+ 
Annotation evaluation
• Manual evaluation is not feasible
– Institutions have large collections (Rijksmuseum: over 1 million items)
– The crowd provides a large number of annotations
– Manual evaluation costs time and money
– Museums have limited resources
6
+ 
Need for automated algorithms
• There is therefore a need to develop algorithms that automatically evaluate annotations with good accuracy
7
+ 
Previous approach
• Building user profiles and tracking user reputation based on semantic similarity
• Tracking provenance information for users
• Realization: a lot of data is provided, and meaningful information can be derived from it
• Current approach: can we determine the quality of information based on features?
8
+ 
Research questions
• Can we evaluate annotations based on properties of the annotator and the annotation?
• Can we predict the reputation of an annotator based on annotator properties?
[Example: the tag "Roses" (no typo, noun, in WordNet) provided by an annotator (age 25, male, arts degree)]
9
+ 
Relevant features
• Features of annotation
– Annotator
– Quality score
– Length
– Specificity…
• Features of annotator
– Age
– Gender
– Education
– Tagging experience…
10
+ 
Semantic Representation
• Open Annotation model to represent annotations
• FOAF to represent annotator properties
[Diagram: RDF graph with the nodes Annotation, Target, Tag, User, Reviewer, Review and Review value. Properties shown: oac:hasBody, oac:hasTarget, oac:annotator, oac:annotates, rdf:type (oac:annotation, foaf:person), foaf:age, foaf:gender and ex:length. One arrow is labelled "Used to estimate".]
11
+ 
Experiment
Steve.museum dataset
• We performed our evaluations on the Steve.museum dataset
• Online dataset of images and annotations

Statistic                          Value
Provided tags                      45,733
Unique tags                        13,949
Tags evaluated as useful           39,931 (87%)
Tags evaluated as not-useful       5,802 (13%)
Annotators (total / registered)    1,218 / 488 (40%)
12
+ 
Steve.museum annotation evaluation
• The annotations in the Steve.museum project were evaluated into multiple categories; we classified each evaluation as either useful or not-useful
• Evaluation categories shown: Usefulness-useful, Judgement-positive, Judgement-negative, Problematic-foreign, Problematic-typo, …, Usefulness-not useful
13
+ 
Identify relevant annotation properties
• Manually selected properties (F_man)
– is_adjective, is_english, in_wordnet
• List of all possible properties (F_all)
– F_man + [created_day/hour, length, specificity, nrwords, frequency]
• Apply a feature selection algorithm to F_all to choose properties (F_ml)
– Feature selection algorithm from the WEKA toolkit
– WEKA is a collection of machine learning algorithms for data mining tasks
– http://www.weka.net.nz/
14
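A few of the annotation properties listed on this slide can be computed directly from the tag text. The sketch below is only an illustration in Python, not the authors' WEKA-based pipeline: the function names are hypothetical, and the `vocabulary` set is an assumed stand-in for a real WordNet lookup.

```python
from collections import Counter


def tag_frequencies(all_tags):
    """Count how often each (case-insensitive) tag occurs in the dataset."""
    return Counter(tag.strip().lower() for tag in all_tags)


def annotation_features(tag, frequencies, vocabulary):
    """Compute a subset of the F_all properties for one tag.

    `vocabulary` is a hypothetical set of known words that stands in
    for a WordNet lookup; it is an assumption of this sketch.
    """
    normalized = tag.strip().lower()
    return {
        "length": len(normalized),            # number of characters
        "nrwords": len(normalized.split()),   # number of words
        "frequency": frequencies[normalized], # occurrences in the dataset
        "in_wordnet": normalized in vocabulary,
    }


if __name__ == "__main__":
    tags = ["Roses", "garden", "roses", "red car"]
    freq = tag_frequencies(tags)
    print(annotation_features("Roses", freq, {"roses", "garden"}))
```

Properties such as specificity or is_adjective need linguistic resources and are left out here.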
+ 
Build train and test data
• Split the Steve.museum dataset annotations into a train set and a test set
• Train set: features and goal (quality)
• Test set: only the features
• Fairness: the train set had 1000 useful and 1000 not-useful annotations

Train data
Tag      Feature 1   Feature 2   Feature n   Quality
Rose     f1          f2          fn          Useful
House    f11         f12         f1n         Not-useful

Test data
Tag      Feature 1   Feature 2   Feature n
Lily     f1          f2          fn
Sky      f11         f12         f1n
15
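The balanced split described on this slide could look roughly like the following Python sketch. It assumes each annotation is a `(features, label)` pair with the labels "useful" and "not-useful"; the function name, the random sampling and the seed are assumptions of this sketch, since the slides do not say how the 1000 examples per class were chosen.

```python
import random


def balanced_split(annotations, per_class=1000, seed=0):
    """Draw `per_class` training examples per label; the rest is test data.

    Returns (train, test_features, test_labels). The test labels are
    held back separately so that predictions can be scored afterwards.
    """
    rng = random.Random(seed)
    by_label = {"useful": [], "not-useful": []}
    for index, (_, label) in enumerate(annotations):
        by_label[label].append(index)

    train_indices = set()
    for label, indices in by_label.items():
        if len(indices) < per_class:
            raise ValueError(f"not enough '{label}' annotations")
        train_indices.update(rng.sample(indices, per_class))

    train = [annotations[i] for i in sorted(train_indices)]
    test = [pair for i, pair in enumerate(annotations)
            if i not in train_indices]
    test_features = [features for features, _ in test]
    test_labels = [label for _, label in test]
    return train, test_features, test_labels
```

Balancing the train set this way keeps the majority class (87% useful) from dominating training, while the test set keeps the original skew.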
+ 
Machine learning
• Apply machine learning techniques
– Learning: learn about features and goal from the training set
– Predictions: apply what was learned from the training set to the test set
• Used an SVM with the default PolyKernel in WEKA to predict the quality of annotations
– Commonly used, fast and resistant to over-fitting
16
+ 
Results
• The method predicts useful tags well, but not not-useful tags

Feature set   Class        Recall   Precision   F-measure
F_man         Useful       0.90     0.90        0.90
F_man         Not useful   0.20     0.21        0.20
F_all         Useful       0.75     0.91        0.83
F_all         Not useful   0.42     0.18        0.25
F_ml          Useful       0.20     0.98        0.34
F_ml          Not useful   0.96     0.13        0.23
17
+ 
Identify relevant features of annotator
• Are these features helpful to
– Determine annotation quality?
– Predict annotator reputation?
[Example annotator: age 25, male, arts degree]
18
+ 
Building annotator reputation
• Probabilistic logic called Subjective Logic
• Annotator opinion = (belief, disbelief, uncertainty)
• (p, n) = (positive, negative) evaluations
• Belief = p/(p+n+2), Uncertainty = 2/(p+n+2)
• The expectation value (E) is the reputation
• E = belief + apriori * uncertainty
• Apriori = 0.5
19
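The reputation formula on this slide can be written out as a short Python sketch. The belief, uncertainty and expectation terms follow the slide; the disbelief term n/(p+n+2) is not on the slide and is added here as the usual subjective-logic complement so that the three components sum to 1. The function name is hypothetical.

```python
def reputation(positive, negative, apriori=0.5):
    """Return (belief, disbelief, uncertainty, expectation) for an annotator.

    `positive` and `negative` are the counts of positive and negative
    evaluations of the annotator's tags.
    """
    if positive < 0 or negative < 0:
        raise ValueError("evaluation counts must be non-negative")
    total = positive + negative + 2
    belief = positive / total
    disbelief = negative / total   # assumption: standard complement term
    uncertainty = 2 / total
    expectation = belief + apriori * uncertainty
    return belief, disbelief, uncertainty, expectation


if __name__ == "__main__":
    # 8 positive and 0 negative evaluations
    print(reputation(8, 0))
```

With no evaluations at all, uncertainty is 1 and the reputation equals the a priori value 0.5; as evidence accumulates, uncertainty shrinks and the reputation approaches p/(p+n).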
+ 
Identify relevant annotator properties
• Manually identified properties
– F_man = [community, age, education, experience, gender, tagging experience…]
• List of all properties
– F_all = F_man + [vocabulary_size, vocab_diversity, is_anonymous, # annotations in WordNet]
• Feature selection algorithm on F_all
– F_ml_a for annotation
– F_ml_u for annotator
20
+ 
Results
• Trained an SVM on the annotator features to make predictions

Feature set   Class        Recall   Precision   F-measure
F_man         Useful       0.29     0.90        0.44
F_man         Not useful   0.73     0.11        0.20
F_all         Useful       0.69     0.91        0.78
F_all         Not useful   0.43     0.15        0.22
F_ml_a        Useful       0.55     0.91        0.68
F_ml_a        Not useful   0.53     0.13        0.21
21
+ 
Results
• Used regression to predict reputation values based on features of registered annotators
• Since annotator reputation is highly skewed (90% > 0.7), we could not predict reputation successfully

Feature set   Correlation   RMS error   Mean abs. error   Rel. abs. error
F_man         -0.02         0.15        0.10              97.8%
F_all         0.22          0.13        0.09              95.1%
F_ml_u        0.29          0.13        0.09              90.4%
22
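For readers unfamiliar with the error measures in this table, the Python sketch below shows one common way to compute them. It assumes that the relative absolute error is measured against a baseline that always predicts the mean of the actual values; whether this matches WEKA's exact computation is an assumption, and the function name is hypothetical.

```python
import math


def regression_errors(actual, predicted):
    """Correlation, RMS error, mean absolute error and relative absolute error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n

    abs_errors = [abs(p - a) for a, p in zip(actual, predicted)]
    mae = sum(abs_errors) / n
    rmse = math.sqrt(sum(e * e for e in abs_errors) / n)

    # Baseline: total absolute error of always predicting the mean.
    baseline = sum(abs(a - mean_a) for a in actual)
    if baseline == 0:
        raise ValueError("actual values are all identical")
    rae_percent = 100.0 * sum(abs_errors) / baseline

    # Pearson correlation; undefined (NaN) if either side is constant.
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a > 0 and var_p > 0:
        corr = cov / math.sqrt(var_a * var_p)
    else:
        corr = float("nan")

    return {"corr": corr, "rmse": rmse, "mae": mae, "rae_percent": rae_percent}
```

Under this definition, a relative absolute error close to 100%, as in the table, means the model is barely better than predicting the mean reputation for every annotator.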
+ 
Evaluation
• Possible reasons why the method was not successful at predicting not-useful annotations:
– They are a minority (13% of the whole dataset)
– More in-depth analysis of features is needed to identify not-useful annotations
– Studies on different datasets are required
23
+ 
Relevance
• Our experiments help to show that features of the annotator and the annotation correlate with the quality of annotations
• With a small set of features (F_ml), 98% of the tags predicted as useful and 13% of the tags predicted as not useful were correct (precision)
• Helps to identify which features are relevant to certain tasks
24
+ 
Conclusions
• Machine learning techniques help to predict useful evaluations but not not-useful ones
• Devised a model
– Using SVM to predict annotation evaluation and annotator reputation
– Using regression to predict annotator reputation
25
+ 
Future work
• Need to extract more in-depth information from both annotation and annotator
• Need to build the reputation of the annotator per topic
• Apply the model to different use cases
26
+ Thank you 
a.nottamkandath@vu.nl 
27



Editor's Notes

  1. An annotation describes a certain aspect of the artefact. An annotator provides the annotation. The annotation process is crowdsourced on the Web. The provided annotations need to be evaluated.
  2. Quality is subjective.
  3. More about WEKA.
  4. Add example image here.
  5. High precision, low recall: a conservative algorithm; whatever it identifies is correct. Low precision, high recall: a liberal algorithm with many false positives. Precision = TP/(TP+FP). Recall = TP/(TP+FN).
  6. The expected value is the long-run average value of repetitions of the experiment it represents.
  7. High precision, low recall: a conservative algorithm; whatever it identifies is correct. Low precision, high recall: a liberal algorithm with many false positives. Precision = TP/(TP+FP). Recall = TP/(TP+FN). F-measure = 2·p·r/(p+r), the harmonic mean of precision (p) and recall (r).
  8. RMS error: the sample standard deviation of the differences between predicted and actual values. There are not many bad examples to learn from; the reputation values are very high (around 0.7).
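The precision, recall and F-measure definitions in the notes can be sketched in Python as follows. The function name and the label strings are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def classification_scores(actual, predicted, positive):
    """Precision, recall and F-measure for the class named `positive`."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive and truth == positive:
            tp += 1          # correctly predicted as positive
        elif guess == positive:
            fp += 1          # predicted positive, actually another class
        elif truth == positive:
            fn += 1          # missed positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    actual = ["useful"] * 4 + ["not-useful"] * 2
    predicted = ["useful", "useful", "useful", "not-useful",
                 "useful", "not-useful"]
    print(classification_scores(actual, predicted, "useful"))
    print(classification_scores(actual, predicted, "not-useful"))
```

Scoring each class separately, as the results tables do, matters here because the skewed class distribution lets the majority class hide poor performance on not-useful tags.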