This document summarizes a project to develop a tool for multilingual scene description using semantics. The tool was tested by archivists describing video clips. Key findings were that the conceptual model for semantic description was not difficult to learn but slowed the cataloguing process. The search function worked accurately and fast once clips were described. Based on user feedback, improvements are needed to the terminology and interface to make the tool more intuitive for end users. The goal is to eventually automatically generate semantic descriptions using natural language processing.
4. How can different
users describe the
same scene?
≠ Different languages
≠ Different words
Ambiguity is the key
5. 2018 FIAT/IFTA World
Conference
Developing a tool for scene
description in a multilingual
context using semantics
Unambiguous global searches
Audiovisual-Semantics
Ontology Conceptual Model
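One way to picture an unambiguous search across languages is to resolve every label, whatever its language, to a single concept identifier before searching. The sketch below is illustrative only: the concept IDs, the `avs:` prefix, the translations and the helper names are assumptions, not taken from the AVS ontology or the VSN tool.

```python
# Hypothetical mini-vocabulary: IDs and labels are invented for illustration.
LABELS = {
    "avs:saleswoman": {"en": "saleswoman", "es": "dependienta", "fr": "vendeuse"},
    "avs:clothes": {"en": "clothes", "es": "ropa", "fr": "vêtements"},
}


def build_index(labels):
    """Map every label, in any language, to its concept ID (case-insensitive)."""
    index = {}
    for concept_id, by_lang in labels.items():
        for label in by_lang.values():
            index[label.casefold()] = concept_id
    return index


def resolve(term, index):
    """Return the concept ID for a term, or None if the term is unknown."""
    return index.get(term.strip().casefold())


INDEX = build_index(LABELS)

# A Spanish and an English label resolve to the same concept.
same_concept = resolve("Dependienta", INDEX) == resolve("saleswoman", INDEX)
```

This simplification assumes each label belongs to exactly one concept; a real vocabulary would also have to handle labels shared by several concepts.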
11. User interface
Enables users to introduce complex
descriptions:
One or multiple agents
Qualified or not
Performing one or several actions
To one or several objects
With an undetermined number of
happening-wide qualifiers
Hides the complexity of the AVS
conceptual model
Allows quick data entry
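The description structure listed on this slide could be modelled roughly as follows. This is a minimal sketch, not the AVS conceptual model itself: the class names, the field names and the expansion into agent/action/object triples are assumptions made for illustration. The sample terms are taken from the user descriptions shown later in the deck.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """An agent or object, optionally qualified (hypothetical structure)."""
    concept: str
    qualifiers: tuple = ()


@dataclass
class Happening:
    """One happening: agents performing actions on objects."""
    agents: list
    actions: list
    objects: list = field(default_factory=list)
    qualifiers: list = field(default_factory=list)  # happening-wide qualifiers

    def statements(self):
        """Expand into (agent, action, object) triples; object is None if absent."""
        objs = [o.concept for o in self.objects] or [None]
        return [
            (agent.concept, action, obj)
            for agent in self.agents
            for action in self.actions
            for obj in objs
        ]


happening = Happening(
    agents=[Term("SALESWOMAN", ("YOUNG",))],
    actions=["FOLD", "PLACE"],
    objects=[Term("CLOTHES")],
    qualifiers=["SHOP"],
)
triples = happening.statements()
```

Expanding every agent, action and object combination is a simplifying choice; the actual model may relate them more selectively.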
14. End user tests
Focus group: 8 people, different skills...
Could semantic scene description be easily adopted?
Is the tool user-friendly enough?
15. “
You have 10 seconds to use your
imagination
If you can't, maybe you shouldn't
watch so much TV
16. Individual working
sessions
Access to VSN MAM.
Basic training.
5 clips to be described.
10 queries to be performed.
1 survey.
Interviews.
Users' concerns
How to qualify agents and
objects.
How to describe a scene
without actions or very
general scenes such as.
Being very exhaustive.
System usability
Archivists tend to use
the search option rather
than the autocomplete
one.
The term validation
process was not intuitive.
Minor bugs were
detected and solved.
21. USER | SCENE | HAPPENINGS | AGENTS, ACTIONS, OBJECTS, QUALIFIERS
MGL | 7 | 7 | SALESWOMAN, TALK, PLACE, DROP, HAND ON, CLOTHES, CUSTOMERS, CLOTHES, YOUNG, AFRO-AMERICAN, SHELF, CHANGING ROOM, SHOP
EEA | 1 | 1 | SALESWOMAN, PLACE, CLOTHES
CDP | 1 | 2 | WOMAN, SHOP, CLOTHES, ORDER CLOTHES, AFRO-AMERICAN, SHOP
PPM | 4 | 6 | SALESWOMAN, FOLD, PLACE, DROP IN, ORDER, CLOTHES, SHOP, SHELF, CHANGING ROOM, STREET
AHG | 4 | 4 | SALESWOMAN, PLACE, FOLD, WALK, DROP IN, SPEAK, CLOTHES, CUSTOMERS, SHOP, STREET, BLACK, YOUNG, CHANGING ROOM
JAG | 2 | 2 | WOMAN, WORK, FOLD, SHOP, CLOTHES, CHANGING ROOM, SALESWOMAN, BLACK
ILS | 4 | 4 | WOMAN, PLACE, FOLD, WALK, ATTEND, CLOTHES, CHANGING ROOM, SALESWOMAN, BLACK, SHOP, KIABI
NGC | 3 | 3 | SALESWOMAN, WOMAN, PLACE, DROP, ATTEND, GARMENT, CHANGING ROOM, BLACK, SHOP
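How much two users' descriptions of the same clip overlap can be quantified with a simple set measure. The sketch below uses the Jaccard index on the terms listed above for users EEA and NGC; the choice of measure and the function name are assumptions for illustration, not something the project reports having used.

```python
def jaccard(a, b):
    """Jaccard index: shared terms divided by all distinct terms."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty descriptions are treated as identical
    return len(a & b) / len(a | b)


# Terms entered by two users for the same clip (from the slide above).
EEA = ["SALESWOMAN", "PLACE", "CLOTHES"]
NGC = [
    "SALESWOMAN", "WOMAN", "PLACE", "DROP", "ATTEND",
    "GARMENT", "CHANGING ROOM", "BLACK", "SHOP",
]

# 2 shared terms (SALESWOMAN, PLACE) out of 10 distinct terms.
overlap = jaccard(EEA, NGC)
```

Here CLOTHES and GARMENT count as different terms, which is exactly the kind of mismatch a shared vocabulary of concepts is meant to reduce.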
25. End users survey
The conceptual model for
scene description is not
difficult.
The learning process was
easy.
Conceptual problems are the
26. Archivists'
conclusions
The cataloguing process is slower
than natural language cataloguing.
Semantic description can be
relevant for archive footage, where
descriptions must be thorough and
detailed.
The data entry process determines
the information retrieval.
The search process is accurate and
fast.
For fact-checking, the tool should be
complemented with other
technologies such as speech-to-text.
27. What's next
A terminology closer to the end
user is needed.
Improve the User Interface.
Clean up the vocabularies.
Our main goal is to
automatically generate semantic
descriptions by applying NLP.
The relations between the terms
are the key to this model.
28. A project by Financed by
CREDITS
Virginia Bazán-Gil (RTVE)
Manuel Escribano (VSN)
Juan-Antonio Pastor-Sánchez (UMU)
Tomás Saorín (UMU)