This presentation explains situation recognition, a form of visual semantic role labeling for image understanding.
I presented this paper at the natural language processing lab as an undergraduate research assistant.
(July 16th, 2019)
2. Natural Language Processing Labs. By Daanv
Abstract
Situation recognition: the problem of producing a concise summary of the situation an image depicts, including:
1) The main activity
2) The participating actors (objects)
3) The roles these participants play in the activity
FrameNet (Frame Semantics)
: a project building a lexical database of English that is both human- and machine-readable, based on annotating examples of how words are used in actual texts.
https://framenet.icsi.berkeley.edu/fndrupal/
Formal Task Definition
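- Verbs (V), nouns (N), frames (F)
- A realized frame Rf pairs each semantic role of a frame with a noun, or with Ø when the role is unfilled.
Ex) Rf = {(agent, boy), (source, cliff), (obstacle, Ø), (destination, water), (place, lake)}
> Predict a situation, S = (v, Rf), where v is a verb in V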
Predict: S = (jumping, {(agent, boy), (source, cliff), (obstacle, Ø), (destination, water), (place, lake)})
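The definition above can be sketched as a small data structure. This is only an illustration of S = (v, Rf): the class name `Situation`, the field names, and the use of `None` for Ø are assumptions made here, not something taken from the paper or its code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Situation:
    """A situation S = (v, Rf): a verb plus its realized frame."""
    verb: str
    # Realized frame Rf: role name -> noun, with None standing in for Ø.
    realized_frame: Dict[str, Optional[str]] = field(default_factory=dict)

    def filled_roles(self) -> Dict[str, str]:
        """Return only the roles that were assigned a noun."""
        return {r: n for r, n in self.realized_frame.items() if n is not None}


# The "jumping" example from the slide.
jumping = Situation(
    verb="jumping",
    realized_frame={
        "agent": "boy",
        "source": "cliff",
        "obstacle": None,  # Ø: no obstacle in the image
        "destination": "water",
        "place": "lake",
    },
)
print(jumping.filled_roles())
```

Keeping the unfilled role in the dictionary (rather than dropping it) preserves the fact that the frame for "jumping" has an obstacle role, even when nothing fills it.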
Situation Recognition
Performance on this task is compared to baselines that independently recognize activities and objects.
Dataset
> imSitu
Images labeled with situations (over 500 verbs across 125,000 images).
Metrics
> Accuracy
The evaluation data has situations provided by multiple annotators.
A prediction therefore counts as correct when the predicted verb (verb) and the predicted semantic role-value pairs (value) match one of the annotations.
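One way to read this scoring rule is sketched below. It is a simplified interpretation, not the official evaluation script: the function names, the dictionary format for frames, and the choice to score role-value pairs as zero when the verb is wrong are assumptions, and the annotations are made-up toy data.

```python
from typing import Dict, List, Optional

Frame = Dict[str, Optional[str]]  # role -> noun, None stands in for Ø


def verb_correct(predicted_verb: str, gold_verb: str) -> bool:
    """The verb metric: the predicted activity matches the labeled one."""
    return predicted_verb == gold_verb


def value_accuracy(predicted_verb: str, predicted_frame: Frame,
                   gold_verb: str, annotations: List[Frame]) -> float:
    """Fraction of predicted role-value pairs found in at least one annotation.

    Assumption: pairs only count when the verb is also correct.
    """
    if not verb_correct(predicted_verb, gold_verb) or not predicted_frame:
        return 0.0
    hits = 0
    for role, noun in predicted_frame.items():
        # A pair is correct if any annotator gave this noun for this role.
        if any(role in ann and ann[role] == noun for ann in annotations):
            hits += 1
    return hits / len(predicted_frame)


# Hypothetical annotations from two annotators for one "jumping" image.
annotations = [
    {"agent": "boy", "source": "cliff", "obstacle": None,
     "destination": "water", "place": "lake"},
    {"agent": "child", "source": "rock", "obstacle": None,
     "destination": "water", "place": "lake"},
]
prediction = {"agent": "boy", "source": "rock", "obstacle": None,
              "destination": "water", "place": "river"}

score = value_accuracy("jumping", prediction, "jumping", annotations)
print(score)
```

Here four of the five predicted pairs agree with some annotator ("river" matches neither), so the toy score is 4/5.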
Systems
> CRF (features: ImageNet, WordNet) with VGG
- Stochastic gradient descent
- Batch size: 192 / epochs: 30 / learning rate: 1e-5
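To show how these hyperparameters fit together, here is a minimal sketch of a mini-batch SGD schedule. Only the three values (batch size 192, 30 epochs, learning rate 1e-5) come from the slide; the names `TrainConfig`, `minibatches`, `sgd_step`, and `count_updates` are hypothetical, and the dataset size used in the example is arbitrary rather than the imSitu training split.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class TrainConfig:
    batch_size: int = 192
    epochs: int = 30
    learning_rate: float = 1e-5


def minibatches(n_examples: int, batch_size: int,
                rng: random.Random) -> Iterator[List[int]]:
    """Yield shuffled index batches; the last batch may be smaller."""
    indices = list(range(n_examples))
    rng.shuffle(indices)
    for start in range(0, n_examples, batch_size):
        yield indices[start:start + batch_size]


def sgd_step(weights: List[float], grads: List[float],
             learning_rate: float) -> List[float]:
    """One plain SGD update: w <- w - lr * grad."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]


def count_updates(n_examples: int, config: TrainConfig) -> int:
    """Count how many parameter updates a full training run performs."""
    rng = random.Random(0)
    updates = 0
    for _ in range(config.epochs):
        for _batch in minibatches(n_examples, config.batch_size, rng):
            updates += 1  # a real loop would compute grads and call sgd_step
    return updates


config = TrainConfig()
# Arbitrary toy dataset size, chosen only to show the arithmetic.
print(count_updates(1000, config))
```

With 1,000 toy examples, each epoch has six batches (five of 192 and one of 40), so 30 epochs give 180 updates.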
Situation Recognition
The number of semantic roles a noun can participate in, on a log-scale.
Situation Recognition
The number of nouns that can participate in a sample of semantic roles.
Situation Recognition
The number of nouns that appear with a sample of verbs.
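The three dataset statistics named on these slides (roles per noun, nouns per role, nouns per verb) could be computed along the following lines. This is a sketch over made-up toy annotations, not imSitu data; the function name `count_statistics` and the choice to identify a semantic role by a (verb, role) pair are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Annotation = Tuple[str, Dict[str, Optional[str]]]  # (verb, role -> noun)


def count_statistics(situations: List[Annotation]):
    """Return (roles per noun, nouns per role, nouns per verb) as counts."""
    roles_per_noun = defaultdict(set)  # noun -> {(verb, role)}
    nouns_per_role = defaultdict(set)  # (verb, role) -> {noun}
    nouns_per_verb = defaultdict(set)  # verb -> {noun}
    for verb, frame in situations:
        for role, noun in frame.items():
            if noun is None:  # skip unfilled roles (Ø)
                continue
            roles_per_noun[noun].add((verb, role))
            nouns_per_role[(verb, role)].add(noun)
            nouns_per_verb[verb].add(noun)
    as_counts = lambda table: {key: len(values) for key, values in table.items()}
    return (as_counts(roles_per_noun),
            as_counts(nouns_per_role),
            as_counts(nouns_per_verb))


# Hypothetical annotations, for illustration only.
toy = [
    ("jumping", {"agent": "boy", "source": "cliff", "destination": "water"}),
    ("swimming", {"agent": "boy", "place": "water", "tool": None}),
]
roles_per_noun, nouns_per_role, nouns_per_verb = count_statistics(toy)
print(roles_per_noun)
print(nouns_per_verb)
```

Plotting the first table on a log scale, or the other two for a sample of roles and verbs, would give charts of the kind these slides describe.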
Situation Recognition
Quantitative Results
Qualitative Results
[Figure: for each activity, the semantic roles and the values for those roles. The standard annotated data is shown alongside the output from the CRF model when it correctly predicted the activity.]