The Cultural Heritage domain has opened up to contributions from users on the web. These contributions are mainly tags that describe certain aspects of a cultural heritage object. With a wide range of users on the web, it becomes important to determine the quality of user-contributed content before it is published online. However, manually evaluating the quality of these user-generated contributions is resource-intensive for Cultural Heritage institutions. In this talk, I will describe methods that can semi-automatically predict the quality of tags. These methods address three research questions: How can we trust an online contributor? How can we assess the quality of the annotation process? How can we trust the contributed data?
1. Trusting user-contributed data in the
Cultural Heritage Domain
Archana Nottamkandath
(Work done with Davide Ceolin & Wan Fokkink)
VU University Amsterdam
COMMIT/SEALINC
1
2. Context
• COMMIT/SEALINC project
• Museums have collections that can be
annotated with (external) user-contributed
information to improve search through the
collections
Tulips
Butterfly
Portrait
3. Can we directly trust the user-provided
content?
4. Can we trust the user-provided
content directly? – Apparently not!
Stella is Gay
www.apartmentvermeer.com
7. Evaluation costs resources
• Requires expensive manual labor
• Costs a lot of time
• Requires adherence to museum policies
– Museum X: [Accept, Not sure, Reject]
– Museum Y: [Foreign, Judgmental, Strong reject,
Strong accept] …
8. Need for automated trust analysis
• Algorithms automatically or semi-automatically
evaluate annotations
[Example candidate tags for an image: (a) Flower (b) 19th century (c) Sunshine (d) Vermeer (e) Bronze]
9. Automated Trust analysis algorithms
• Requirements
– High accuracy (Accurately predict evaluations
most of the time)
– Minimum input from cultural heritage
professionals
– Scalable and efficient (w.r.t. resources and time)
– Works with different cultural heritage data
10. Definition
• Trustworthy annotation
– Relevant to image
– Enhances or reinstates existing knowledge
– Is acceptable under museum policies for
publication on their website
12. How to determine trust from users
contributing annotations to the
system?
[Diagram: user Jones contributed the tags Tulips, Roses, Night Sky, Van Gogh, Buddhist Portrait, Monument, Asian, War memorial via the Accurator Interface]
13. How to determine trust from the
Annotation Process?
[Diagram: user Jones contributed the tags Tulips, Roses, Night Sky, Van Gogh, Buddhist Portrait, Monument, Asian, War memorial via the Accurator Interface]
14. How to determine trust from
contributed data?
[Diagram: user Jones contributed the tags Tulips, Roses, Night Sky, Van Gogh, Buddhist Portrait, Monument, Asian, War memorial via the Accurator Interface]
15. How to determine trust from
users?[1]
• Evaluate subset of user tags
[Diagram: the museum evaluates a train set of Jones's contributed tags (Tulips, Van Gogh, Buddhist, Monument); the remaining tags form the test set (Roses, Night sky, Van Gogh, Asian, War memorial)]
16. How to determine trust from users?[1]
• A user who is an expert on one topic might,
with a certain probability, be an expert on
similar topics
[Diagram: expert on Tulips → possibly expert on Roses and Lilies; train set: Tulips, Van Gogh, Buddhist, Monument; test set: Roses, Night sky, Van Gogh, Asian, War memorial]
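The propagation idea above can be sketched in plain Python. This is a minimal illustration, not the talk's actual algorithm: it assumes trust in a topic is the smoothed acceptance rate of the museum-evaluated tags (the mean of a Beta distribution) and that pairwise topic-similarity scores are supplied from elsewhere; the function names and all numbers are hypothetical.

```python
def topic_trust(evaluations, prior_accept=1, prior_reject=1):
    """Smoothed acceptance rate per topic (mean of a Beta distribution).

    evaluations maps topic -> list of booleans: True if the museum
    accepted the tag, False if it rejected it.
    """
    trust = {}
    for topic, results in evaluations.items():
        accepted = sum(results)
        total = len(results) + prior_accept + prior_reject
        trust[topic] = (accepted + prior_accept) / total
    return trust


def propagate_trust(trust, similarity, new_topic):
    """Estimate trust on an unevaluated topic as a similarity-weighted
    average of trust on evaluated topics."""
    pairs = [(similarity.get((known, new_topic), 0.0), value)
             for known, value in trust.items()]
    weight = sum(w for w, _ in pairs)
    if weight == 0:
        return 0.5  # no related evidence: fall back to a neutral prior
    return sum(w * v for w, v in pairs) / weight


# Hypothetical evaluations of Jones's tags by the museum
trust = topic_trust({"Tulips": [True, True, True, True, False],
                     "Monument": [True, False]})
# Hypothetical similarity scores between topics
sim = {("Tulips", "Roses"): 0.8, ("Monument", "Roses"): 0.1}
rose_trust = propagate_trust(trust, sim, "Roses")
```

With these numbers, trust in the unevaluated topic Roses ends up between the neutral prior and the well-supported Tulips score, because Roses is highly similar to Tulips but only weakly similar to Monument.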
17. Determine trust from users[2]
• User profile: [Experience, education, country,
gender, income, museum visits…]
Steve.museum dataset
18. Determine trust from users[2]
• Predict user reputation using machine
learning
• [Feature1, Feature2, ..] -> Category of user
– [21 yrs, Female, Bachelors, Australia] -> Excellent
– [60 yrs, Male, PhD, America] -> Good
– [56 yrs, Female, Masters, Croatia] -> Bad
– [30 yrs, Male, Bachelors, Mexico] -> ?
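The slides do not name the classifier used for this prediction; as a hedged sketch, a 1-nearest-neighbour rule over the profile features shown above could look like this. The distance function and its weighting are invented for illustration.

```python
def profile_distance(a, b):
    """Distance between profiles (age, gender, education, country):
    scaled age gap plus one per mismatched categorical feature.
    This metric is hypothetical, chosen only for the example."""
    age_gap = abs(a[0] - b[0]) / 100.0
    mismatches = sum(1 for x, y in zip(a[1:], b[1:]) if x != y)
    return age_gap + mismatches


def predict_reputation(train, profile):
    """1-nearest-neighbour: return the category of the closest
    already-evaluated profile."""
    _, category = min(train, key=lambda item: profile_distance(item[0], profile))
    return category


# Evaluated profiles from the slide, paired with their categories
train = [
    ((21, "Female", "Bachelors", "Australia"), "Excellent"),
    ((60, "Male", "PhD", "America"), "Good"),
    ((56, "Female", "Masters", "Croatia"), "Bad"),
]
category = predict_reputation(train, (30, "Male", "Bachelors", "Mexico"))
```

A real pipeline would learn feature weights from the evaluated data rather than hard-coding them, but the shape of the problem, profile features in, reputation category out, is the same.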
19. How to determine trust from
Annotation process?
• Time of day, day of week, day of month, etc.
affect user quality
• Typing speed affects user quality
– Typing fast might indicate higher confidence
[Example tags: Tulips, Van Gogh, Buddhist, Monument, Rich Lady, Plant, Leonardo, Bronze plate]
20. How to determine trust from
Annotation process?
• Predict tag quality using machine learning
• [Feature1, Feature2, ....] -> Category of Tag
– [10:00, Monday, June, 3s] -> Excellent
– [12:00, Wednesday, 15s] -> Good
– [23:56, Friday, April, 80s] -> Bad
– [06:00, Thursday, March, 70s] -> ?
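Turning these process attributes into inputs for a classifier is a simple featurization step. A minimal sketch, with an assumed encoding (time of day as fractional hours, weekday as an index, typing time in raw seconds); the encoding is one plausible choice, not necessarily the one used in the talk:

```python
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def encode_process_features(clock_time, weekday, typing_seconds):
    """Encode annotation-process attributes as a numeric vector
    that a standard classifier can consume."""
    hour, minute = map(int, clock_time.split(":"))
    return [hour + minute / 60.0,     # time of day as fractional hours
            WEEKDAYS.index(weekday),  # day of week as an index
            typing_seconds]           # slow typing may signal lower confidence


vec = encode_process_features("10:00", "Monday", 3)
```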
21. How to determine trust from
Annotation process?
• Why is this important?
– Useful for anonymous users who did not fill in
profile information
22. How to determine trust from data?
• The contributed data itself has features; use
machine learning to predict tag quality
– Length
– Specificity
– Presence in vocabularies
– Times already contributed
– Whether the tag is a noun
[Example tags: Tulips, Van Gogh, Buddhist, Monument]
[6, specific, yes, English, 10, no, …] -> Good
[7, specific, yes, Dutch, 1, yes, …] -> Bad
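A sketch of how such tag-level features might be extracted, assuming a controlled vocabulary and a history of previously contributed tags are available as inputs. Specificity and part-of-speech checks would need a lexical resource (e.g. WordNet) and are omitted here; all names and data are illustrative.

```python
def tag_features(tag, vocabulary, previous_tags):
    """Feature dictionary for one tag. 'vocabulary' stands in for the
    controlled vocabularies a real system would consult, and
    'previous_tags' for the collection's tag history."""
    normalized = tag.lower()
    return {
        "length": len(tag),
        "in_vocabulary": normalized in vocabulary,
        "times_contributed": previous_tags.count(normalized),
        "word_count": len(tag.split()),  # multi-word tags are often more specific
    }


vocab = {"tulips", "van gogh", "monument"}
history = ["tulips", "tulips", "van gogh"]
features = tag_features("Tulips", vocab, history)
```

Each contributed tag then becomes a feature vector like the ones on the slide, and a classifier trained on museum-evaluated examples maps it to a quality category.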
23. Goals achieved
• Requirements
– High accuracy (Accurately predict evaluations
most of the time)
– Minimum input from cultural heritage
professionals
– Scalable and efficient
– Works with different cultural heritage data
25. Goal 1: High accuracy (accurately predict
evaluations most of the time)
• Predicted the quality of a tag based on the user
profile with accuracy from 68% to 72%
(Steve dataset results)
26. Goal 2: Minimum input from
Cultural Heritage Institutions
• Algorithms require a minimum of 5 evaluated
tags per user for predictions
• Working to minimize or eliminate this
requirement
27. Goal 3: Scalable and efficient
• Reduced computation time while maintaining
accuracy in Steve dataset
28. Goal 4: Works with different
cultural heritage data
• Steve Museum dataset
• Waisda? Dataset
– Video Tagging Game
• SEALINC Media experiments at CWI
29. Future Work
• Employ our experiences and algorithms to
analyze the data from Accurator
• Employ trust scores for ranking in search
• Identify techniques to visualize trust