Situation recognition visual semantic role labeling for image understanding

Natural Language Processing Labs. By Daanv
2019.07.16
Situation Recognition

Abstract
The problem of producing a concise summary of the situation an image depicts
1) The main activity
2) The participating actors (objects)
3) The roles these participants play in the activity
https://framenet.icsi.berkeley.edu/fndrupal/
FrameNet(Frame Semantics)
: building a lexical database of English that is
both human and machine readable,
based on annotating examples of how words are used in actual texts.

Formal Task Definition
- verb(V) , nouns(N), frames(F)
Ex) Rf = {(agent, boy), (source, cliff), (obstacle, Ø), (destination, water), (place, lake)}
> Predict a situation, S = (v, Rf)
Predict: S = (jumping, {(agent, boy), (source, cliff), (obstacle, Ø), (destination, water), (place, lake)})

This task compare performance to baselines that independently recognize activities and objects.
Dataset
> imSitu
Images labeled with situations (over 500 verbs with 125,000 images.)
Metrics
> Accuracy
The evaluation data has situations provided by multiple annotators.
So, the accuracy is high
when the verb prediction(verb) and semantic role-value pair predictions(value) matches one of the annotations.
Systems
> CRF (feature: ImageNet, wordNet) with VGG
- Stochastic gradient descent
- Batch size: 192 / epochs: 30 / learning rate: 1e-5

The number of semantic roles a noun can participate in, on a log-scale.

The number of nouns that can participate in a sample of semantic roles.

The number of nouns that appear with a sample of verbs.

Quantitative Results
Qualitative Results
Semantic roles Values for those roles
activity
The standard annotated data The output from CRF model when it correctly predicted the activity

THANK YOU.

Situation recognition visual semantic role labeling for image understanding

More Related Content

Similar to Situation recognition visual semantic role labeling for image understanding

More from Danbi Cho

Recently uploaded

Situation recognition visual semantic role labeling for image understanding