Semi-Automated Assessment of Annotation Trustworthiness
Davide Ceolin, Archana Nottamkandath, Wan Fokkink
Museums...
have a problem.
Photo: flickr.com/clumsyjim, flickr.com/grrrl
Let’s recruit some help
Photo: flickr.com/anirudhkoul
Red!
Tulips!
Tasty!
???
Photo: flickr.com/anirudhkoul, flickr.com/hz536n
Museums are meticulous
• The crowd alone does not solve the problem.
• We need to select only the most trustworthy
annotations.
• System requirements:
✓reliable
✓with minimum effort
✓efficient
User Reputation
User reputation should reflect the trustworthiness of
their annotations.
Tulip
Flower
Ugly
Photo: flickr.com/hz536n
Reputation estimation
Tulip
Flower
Red
Pink
Purple
We compute reputations as in subjective logic.
Few annotations per author:
✓ minimum effort
[Plot: probability density over reputation values in [0, 1]]
Photo: flickr.com/hz536n
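As a sketch of this step: in subjective logic, a user's reputation is the expected value of a Beta distribution built from positive and negative evidence about their past annotations. The Python below is a minimal illustration, assuming the customary base rate of 0.5 and prior weight of 2; the function name and the evidence counts are illustrative, not the authors' exact code.

def reputation(positive: int, negative: int,
               base_rate: float = 0.5, prior_weight: float = 2.0) -> float:
    # Expected value of the Beta distribution that subjective logic
    # builds from the evidence: E = (p + a*W) / (p + n + W).
    return (positive + base_rate * prior_weight) / (positive + negative + prior_weight)

# A user with 3 annotations evaluated as correct and 2 as incorrect:
print(reputation(3, 2))  # 0.571..., already meaningful after few evaluations

Because the Beta prior smooths small evidence counts, a handful of evaluated annotations per author already yields a usable reputation, which is what keeps the museum's effort minimal.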
Interpreting reputation
User reputation: 0.6
[Plot: probability density over reputation values, expected value E = 0.6]
Should we accept all of their annotations? Reject all?
No: accept only the best 60%.
How?
Estimating expertise can help
Expertise
Training set: Tulip, Flower, Red, Pink, Purple
New annotation: Rose
Semantic distance links the new annotation to the training set.
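A sketch of the semantic-distance step, here instantiated with WordNet path similarity from NLTK. This is one common relatedness measure, not necessarily the one used in the paper, and semantic_distance is an illustrative name.

from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet') once

def semantic_distance(word_a: str, word_b: str) -> float:
    # 1 minus the best path similarity over the noun senses of both words.
    syns_a = wn.synsets(word_a, pos=wn.NOUN)
    syns_b = wn.synsets(word_b, pos=wn.NOUN)
    best = max((a.path_similarity(b) or 0.0
                for a in syns_a for b in syns_b), default=0.0)
    return 1.0 - best

training_set = ["tulip", "flower", "red", "pink", "purple"]
new_annotation = "rose"
# Expertise signal: how close the new tag is to what this user tagged before.
print(min(semantic_distance(new_annotation, t) for t in training_set))

A small distance to the training set suggests the new annotation falls inside the user's demonstrated expertise.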
We are almost there...
Rank the annotations and evaluate them.
But we can improve the efficiency of the system.

Test set, ranked by score (user reputation: 0.6):
  Rose    0.9  }
  Violet  0.8  }  60% accepted
  ...
  ...
  Tomato  0.3  }  40% rejected
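A sketch of the acceptance rule, assuming every test-set annotation already carries a similarity score in [0, 1]; the entries "Daisy" and "Brick" fill the elided rows purely for illustration.

import math

def accept_best(scored_annotations, reputation):
    # Rank by score, then accept exactly the top `reputation` fraction.
    ranked = sorted(scored_annotations, key=lambda pair: pair[1], reverse=True)
    cutoff = math.floor(reputation * len(ranked))
    return ranked[:cutoff], ranked[cutoff:]

test_set = [("Rose", 0.9), ("Violet", 0.8), ("Daisy", 0.7),  # hypothetical rows
            ("Brick", 0.4), ("Tomato", 0.3)]
accepted, rejected = accept_best(test_set, reputation=0.6)
print(accepted)  # top 60%: Rose, Violet, Daisy
print(rejected)  # bottom 40%: Brick, Tomato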
Semantic clustering
✓efficiency
Clustering the training set semantically helps improve the efficiency of the system.
Training set: Tulip, Flower, Red, Pink, Purple
New annotation: Rose
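A sketch of the clustering idea, reusing the semantic_distance function sketched earlier: group the training set once, then compare a new annotation against one representative per cluster instead of against every label. The threshold and the single-pass grouping are illustrative simplifications, not the authors' algorithm.

def cluster(labels, distance, threshold=0.5):
    # Single-pass grouping: each cluster keeps its first label as representative.
    clusters = []
    for label in labels:
        for c in clusters:
            if distance(label, c[0]) <= threshold:
                c.append(label)
                break
        else:
            clusters.append([label])
    return clusters

def closest_cluster(new_label, clusters, distance):
    # One comparison per cluster rather than one per training label.
    return min(clusters, key=lambda c: distance(new_label, c[0]))

training_set = ["tulip", "flower", "red", "pink", "purple"]
clusters = cluster(training_set, semantic_distance)
best = closest_cluster("rose", clusters, semantic_distance)

With k clusters instead of n training labels, evaluating a new annotation costs roughly k distance computations instead of n, which is where the time savings reported below come from.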
Results
• We ran experiments with 5 to 30 annotations per user, on two datasets.
• Accuracy: 68% - 84%
• Precision: 87% - 88%
• Recall: 80% - 96%
• Time saved by clustering: 44% - 52%
Discussion
• Results are satisfactory but they can be
further improved
• Provenance can help
• Image analysis programs can help
• Reuse of evaluations can reduce the
museum effort
Recap
• We propose a system to evaluate the trustworthiness of museum annotations
• Uses reputation, semantic similarity and clustering
✓Reliable
✓With minimum effort
✓Efficient
Thanks!
