+ 
Automated Evaluation of Crowdsourced 
Annotations in the Cultural Heritage Domain 
Archana Nottamkandath, Jasper Oosterman, Davide Ceolin and Wan Fokkink 
VU University Amsterdam and TU Delft, The Netherlands 
1
+ 
Overview 
 Project Overview 
 Use case 
 Research Questions 
 Experiment 
 Results 
 Conclusion 
2
+ 
Context 
• COMMIT Project 
– ICT project in the Netherlands
– Subprojects: SEALINCMedia and Data2semantics 
• Socially Enriched Access to Linked Cultural Media 
(SEALINCMedia) 
– Collaboration with cultural heritage institutions to enrich their 
collections and make them more accessible 
3
+ 
Use case
• Cultural heritage (CH) institutions have large collections that are poorly annotated (Rijksmuseum Amsterdam: over 1 million items)
• Lack of sufficient resources: knowledge, cost, labor
• Solution: crowdsourcing
4
+ 
Crowdsourcing Annotation Tasks
[Figure: an annotator from the crowd provides annotations (e.g. "Roses", "Garden", "Car") for an artefact (painting or object); the provided annotations then go through evaluation.]
5
+ 
Annotation evaluation
• Manual evaluation is not feasible
– Institutions have large collections (Rijksmuseum: over 1 million items)
– The crowd provides a large number of annotations
– Manual evaluation costs time and money
– Museums have limited resources
6
+ 
Need for automated algorithms 
 Thus there is a need to develop algorithms to automatically evaluate 
annotations with good accuracy 
7
+ 
Previous approach
• Building user profiles and tracking user reputation based on semantic similarity
• Tracking provenance information for users
• Realized: a lot of data is provided, and meaningful information can be derived from it
• Current approach: can we determine the quality of information based on features?
8
+ 
Research questions
• Can we evaluate annotations based on properties of the annotator and the annotation?
• Can we predict the reputation of an annotator based on annotator properties?
[Figure: example tag "Roses" with annotation properties (no typo, noun, in WordNet) and annotator properties (age: 25, male, arts degree)]
9
+ 
Relevant features
• Features of annotation
– Annotator
– Quality score
– Length
– Specificity…
• Features of annotator
– Age
– Gender
– Education
– Tagging experience…
10
+ 
Semantic Representation
• Open Annotation model to represent annotations
• FOAF to represent annotator properties
[Figure: RDF graph. Nodes: Annotation, Target, Tag, User, Reviewer, Review, Review value. Properties shown: oac:hasBody, oac:hasTarget, oac:annotator, oac:annotates, rdf:type (oac:annotation, foaf:person), foaf:age, foaf:gender, ex:length. Part of the graph is labelled "Used to estimate".]
11
+ 
Experiment
Steve.museum dataset
• We performed our evaluations on the Steve.museum dataset
• Online dataset of images and annotations

Statistic                              Value
Provided tags                          45,733
Unique tags                            13,949
Tags evaluated as useful               39,931 (87%)
Tags evaluated as not-useful           5,802 (13%)
Annotators / registered annotators     1,218 / 488 (40%)
12
+ 
Steve.museum annotation evaluation
• The annotations in the Steve.museum project were evaluated into multiple categories; we classified each evaluation as either useful or not-useful
[Figure: evaluation categories (Usefulness-useful, Judgement-positive, Judgement-negative, Problematic-foreign, Problematic-typo, …, Usefulness-not useful) grouped into useful and not-useful]
13
+ 
Identify relevant annotation properties
• Manually selected properties (F_man)
– is_adjective, is_english, in_wordnet
• List of all possible properties (F_all)
– F_man + [created_day/hour, length, specificity, nrwords, frequency]
• Apply a feature selection algorithm to F_all to choose properties (F_ml)
– Feature selection algorithm from the WEKA toolkit
– WEKA is a collection of machine learning algorithms for data mining tasks
– http://www.weka.net.nz/
14
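A minimal sketch of how a few of the F_all properties could be computed for a single tag. The slides do not define the properties, so the definitions here are assumptions: `length` is the character count, `nrwords` the number of whitespace-separated words, and `frequency` the number of times the same tag string occurs in the dataset. The function name `tag_features` and the tiny `WORDLIST` that stands in for a WordNet lookup are hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for a WordNet lookup; the study used WordNet itself.
WORDLIST = {"rose", "roses", "garden", "car", "sky"}


def tag_features(tag, tag_counts):
    """Compute a few illustrative annotation features for one tag."""
    text = tag.strip().lower()
    words = text.split()
    return {
        "length": len(text),            # number of characters (assumed definition)
        "nrwords": len(words),          # number of whitespace-separated words
        "frequency": tag_counts[text],  # occurrences of this tag in the dataset
        # True only if every word of the tag is in the word list
        "in_wordnet": bool(words) and all(w in WORDLIST for w in words),
    }


tags = ["Roses", "garden", "roses", "red car"]
counts = Counter(t.strip().lower() for t in tags)
features = [tag_features(t, counts) for t in tags]
```

Each resulting dictionary is one row of the feature table that the classifier is trained on.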
+ 
Build train and test data
• Split the Steve.museum annotations into a test set and a train set
• Train set: features and goal (quality)
• Test set: only the features
• Fairness: the train set had 1,000 useful and 1,000 not-useful annotations

Train data
Tag     Feature 1   Feature 2   Feature n   Quality
Rose    f1          f2          fn          Useful
House   f11         f12         f1n         Not-useful

Test data
Tag     Feature 1   Feature 2   Feature n
Lily    f1          f2          fn
Sky     f11         f12         f1n
15
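The balanced train set described above can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the helper name `balanced_split`, the dictionary layout of an annotation, and the choice to use all remaining annotations as the test set are hypothetical. The slides only state that the train set held 1,000 annotations of each class.

```python
import random


def balanced_split(annotations, per_class, seed=0):
    """Return (train, test); train holds `per_class` items of each label."""
    rng = random.Random(seed)
    by_label = {"useful": [], "not-useful": []}
    for item in annotations:
        by_label[item["quality"]].append(item)

    train = []
    for items in by_label.values():
        if len(items) < per_class:
            raise ValueError("not enough annotations in one class")
        train.extend(rng.sample(items, per_class))

    train_ids = {id(item) for item in train}
    # Test rows keep only the tag and features: the label is what we predict.
    test = [
        {k: v for k, v in item.items() if k != "quality"}
        for item in annotations
        if id(item) not in train_ids
    ]
    return train, test


# Toy data: 30 useful and 10 not-useful annotations.
data = [
    {"tag": f"t{i}", "length": i, "quality": "useful" if i % 4 else "not-useful"}
    for i in range(40)
]
train, test = balanced_split(data, per_class=5)
```

Sampling the same number of items per class keeps the classifier from simply learning the majority label, which matters here because only 13% of the tags were evaluated as not-useful.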
+ 
Machine learning
• Apply machine learning techniques
– Learning: learn the relation between features and goal from the training set
– Prediction: apply what was learned from the training set to the test set
• Used an SVM with the default polynomial kernel (PolyKernel) in WEKA to predict the quality of annotations
– Commonly used, fast and resistant to over-fitting
16
+ 
Results
• The method predicts useful tags well, but not not-useful tags

Feature set   Class        Recall   Precision   F-measure
F_man         Useful       0.90     0.90        0.90
F_man         Not useful   0.20     0.21        0.20
F_all         Useful       0.75     0.91        0.83
F_all         Not useful   0.42     0.18        0.25
F_ml          Useful       0.20     0.98        0.34
F_ml          Not useful   0.96     0.13        0.23
17
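The three columns of the table follow the standard definitions that are also given in the editor's notes: Precision = TP/(TP+FP), Recall = TP/(TP+FN), and F-measure as the harmonic mean of the two. A small sketch, with a hypothetical function name and made-up counts that are not taken from the experiment:

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Made-up counts for illustration only.
p, r, f = precision_recall_f(tp=90, fp=10, fn=10)
```

Because the F-measure is a harmonic mean, it stays low whenever either precision or recall is low, which is why the not-useful rows score poorly even when their recall is high.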
+ 
Identify relevant features of annotator
• Are these features helpful to
– Determine annotation quality?
– Predict annotator reputation?
[Figure: example annotator profile (age: 25, male, arts degree)]
18
+ 
Building annotator reputation
• Uses a probabilistic logic called Subjective Logic
• Annotator opinion = (belief, disbelief, uncertainty)
– (p, n) = (positive, negative) evaluations
– Belief = p / (p + n + 2)
– Uncertainty = 2 / (p + n + 2)
• The expectation value (E) is the reputation
– E = belief + apriori * uncertainty
– Apriori = 0.5
19
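The reputation formulas on this slide can be written out as a short function. This is a sketch with a hypothetical function name. The disbelief term n / (p + n + 2) does not appear on the slide; it is filled in here from the Subjective Logic requirement that belief, disbelief and uncertainty sum to 1.

```python
def reputation(p, n, apriori=0.5):
    """Return (belief, disbelief, uncertainty, expectation) for an annotator.

    p and n are the counts of positive and negative evaluations.
    """
    total = p + n + 2
    belief = p / total
    disbelief = n / total      # not on the slide; follows from b + d + u = 1
    uncertainty = 2 / total
    expectation = belief + apriori * uncertainty
    return belief, disbelief, uncertainty, expectation


new_annotator = reputation(0, 0)   # no evidence yet
good_annotator = reputation(8, 0)  # eight positive evaluations
```

With no evaluations the uncertainty is 1 and the reputation equals the a priori value 0.5; as evidence accumulates, the uncertainty shrinks and the reputation moves toward the observed share of positive evaluations.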
+ 
Identify relevant annotator properties 
 Manually identified properties 
 F_man = [Community, age, education, experience, gender, tagging 
experience…] 
 List of all properties 
 F_all = F_man + [vocabulary_size, vocab_diversity, is_anonymous, # 
annotations in wordnet] 
 Feature selection algorithm on F_all 
 F_ml_a for annotation 
 F_ml_u for annotator 
20
+ 
Results
• Trained on features using SVM to make predictions

Feature set   Class        Recall   Precision   F-measure
F_man         Useful       0.29     0.90        0.44
F_man         Not useful   0.73     0.11        0.20
F_all         Useful       0.69     0.91        0.78
F_all         Not useful   0.43     0.15        0.22
F_ml_a        Useful       0.55     0.91        0.68
F_ml_a        Not useful   0.53     0.13        0.21
21
+ 
Results
• Used regression to predict reputation values based on the features of registered annotators
• Since annotator reputation is highly skewed (90% > 0.7), we could not predict reputation successfully

Feature set   Corr.   RMS error   Mean abs. error   Rel. abs. error
F_man         -0.02   0.15        0.10              97.8%
F_all          0.22   0.13        0.09              95.1%
F_ml_u         0.29   0.13        0.09              90.4%
22
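The four error columns can be computed as in the sketch below. The function name and the sample values are hypothetical. The relative absolute error is taken here as the total absolute error divided by the error of always predicting the mean of the actual values; WEKA's exact baseline may differ slightly, so treat that definition as an assumption.

```python
import math


def regression_errors(actual, predicted):
    """Return correlation, RMS error, mean abs. error and rel. abs. error (%)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]

    mae = sum(abs_err) / n
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)

    # Error of a baseline that always predicts the mean of the actual values.
    baseline = sum(abs(a - mean_a) for a in actual)
    rae = 100 * sum(abs_err) / baseline if baseline else float("inf")

    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    corr = cov / math.sqrt(var_a * var_p) if var_a and var_p else 0.0

    return {"corr": corr, "rmse": rmse, "mae": mae, "rae": rae}


# Made-up reputation values for illustration only.
errors = regression_errors(
    actual=[0.5, 1.0, 0.75, 0.25],
    predicted=[0.5, 0.75, 0.75, 0.5],
)
```

Under this definition, a relative absolute error close to 100%, as in the table, means the model is barely better than predicting the average reputation for every annotator.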
+ 
Evaluation
• Possible reasons why the method was not successful at predicting not-useful annotations:
– They are a minority (13% of the whole dataset)
– The features need more in-depth analysis to identify not-useful annotations
– The method needs to be studied on different datasets
23
+ 
Relevance
• Our experiments indicate a correlation between the features of annotators and annotations and the quality of the annotations
• With a small set of features (F_ml), 98% of the annotations predicted as useful and 13% of those predicted as not useful were correct (precision)
• Helps to identify which features are relevant to certain tasks
24
+ 
Conclusions 
 Machine learning techniques help to predict useful 
evaluations but not not-useful ones 
 Devised a model 
 using SVM to predict annotation evaluation and annotator 
reputation 
 Using regression to predict annotator reputation 
25
+ 
Future work 
 Need to extract more in-depth information from both 
annotation and annotator 
 Need to build reputation of the annotator per topic 
 Apply the model on different use cases 
26
+ Thank you 
a.nottamkandath@vu.nl 
27


Editor's Notes

  1. An annotation describes a certain aspect of the artefact. An annotator provides annotations. The annotation process is crowdsourced on the Web. The provided annotations need to be evaluated.
  2. Quality is subjective
  3. More about WEKA
  4. Add example image here
  5. High precision, low recall: a conservative algorithm; whatever it identifies is mostly correct. Low precision, high recall: a liberal algorithm with many false positives. Precision = TP/(TP+FP), Recall = TP/(TP+FN)
  6. The expected value is the long-run average value over repetitions of the experiment it represents
  7. High precision, low recall: a conservative algorithm; whatever it identifies is mostly correct. Low precision, high recall: a liberal algorithm with many false positives. Precision = TP/(TP+FP), Recall = TP/(TP+FN), F-measure = 2·p·r/(p+r), the harmonic mean of precision (p) and recall (r)
  8. RMS error: the sample standard deviation of the differences between predicted and actual values. There are not many bad examples to learn from; reputation values are very high (around 0.7)