Best of both worlds: human-machine collaboration for object annotation (CVPR 2015)
Olga Russakovsky (1), Li-Jia Li (2), Li Fei-Fei (1)
(1) Stanford University, (2) Snapchat (Yahoo! Labs)
Presenter: 品川 政太朗 (Shinagawa Seitaro, NAIST)
Paper reading
Note: all images are quoted from the paper.
Notes
• "bbox" is short for "bounding box"
• "TP" means "true positive": $\mathrm{TP} = \frac{\#(\text{"yes" is the correct answer})}{\#(\text{answered "yes"})}$
• "TN" means "true negative": $\mathrm{TN} = \frac{\#(\text{"no" is the correct answer})}{\#(\text{answered "no"})}$
Reference numbers are the same as in the paper.
[paper] http://ai.stanford.edu/~olga/papers/RussakovskyCVPR15.pdf
[supplements] http://ai.stanford.edu/~olga/papers/RussakovskyCVPR15_supp.pdf
[CVPR poster] http://ai.stanford.edu/~olga/posters/cvpr15-poster.pdf
[slides made by first author] http://ai.stanford.edu/~olga/slides/best_of_both_worlds_slides.pdf
Please annotate every object in the image as quickly as possible.
Answer
Annotating quickly and without omissions is hard work for humans.
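Is there an easier way?
Promising approach: automatic annotation with object detection
R-CNN (Regions with CNN features) [Girshick et al. 2014]
detects bboxes and classifies the content of each bbox with a CNN (very strong)
Problem: even with current object detection, only a limited set of objects can be annotated
green: success
yellow: bbox is misaligned
pink: detection failed
complex task -> ask a human (human-in-the-loop)
Human Machine Collaboration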
Human Machine Collaboration
There is a trade-off:
• trivial tasks (yes/no questions): less accuracy, low cost
• complex tasks (drawing bboxes): high accuracy, high cost
• binary question-and-answer [6,59,60]: low cost (not accurate)
• attribute-based feedback [40,39,34]
• free-form object annotation [58]: accurate (but high cost)
Research question: what kind of questions should be asked to raise annotation accuracy while lowering cost?
(assuming the human can always return the correct answer)
The best point should be where accuracy and cost are best balanced.
Optimize the dialogue policy (which question to ask) with reinforcement learning (MDP).
Related Work (1/3)
Recognition with humans in the loop
image classification [6,59,12]
image segmentation [26]
attribute-based classification [32,40,3]
image clustering [34]
image annotation [54,55,47]
human interaction [31]
object annotation in video [58]
[6,59,12,60] discuss the relationship between human time and annotation accuracy in human-machine collaboration
→ only a single type of human response
[26,13,54] use multiple-modality feedback (with varying costs) and predict the success of each modality
→ they do not incorporate iterative improvement
Related Work (2/3)
Better object detection
weakly supervised data [42,23,52,8,24,15]
active learning [32,56]
mining the web for object names and exemplars [8,11,15]
→ minimize human annotation
Related Work (3/3)
Cheaper manual annotation
development of crowdsourcing techniques:
・annotation games [57,12,30]
・tricks to reduce the annotation search space [13,4]
・effective user interface design [50,58]
・making use of existing annotations [5]
・making use of weak human supervision [26,7]
・accurately computing the number of required workers [46]
[10,46,28,62]: iterative improvement of task accuracy per unit of human cost
Authors (Stanford Vision Lab team)
Olga Russakovsky
• postdoctoral fellow at Carnegie Mellon Univ.
(PhD student when this paper was published)
• large-scale recognition, ML, HCI
Li-Jia Li
• Snapchat
• PhD from Stanford Univ.
Li Fei-Fei
• Associate Professor, Stanford Univ.
• a powerhouse of CV
(Crowdsourcing) + (large-scale object recognition)
+ (to reduce annotation cost) = this paper?
This paper only
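Problem Formulation
Utility (similar to recall):
the number of labels for which every instance in the image was detected
(Example) ground truth: 2 eggs, 1 chair, 2 people
detected: 1 egg, 1 chair, 2 people
$\mathrm{Utility} = 2$
Precision:
among the annotated objects, the number of labels that were annotated correctly
Budget:
the time a human spends on annotation
First, set the thresholds $(U^*, P^*, B^*)$.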
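The egg/chair/person example can be checked with a few lines of Python. This is a minimal sketch that follows the slide's reading of utility (count the classes whose instances were all detected); the function name `utility` and the use of `Counter` are illustrative assumptions, not something taken from the paper.

```python
from collections import Counter


def utility(ground_truth: Counter, detected: Counter) -> int:
    """Count the classes for which every ground-truth instance was detected."""
    return sum(1 for cls, n in ground_truth.items() if detected.get(cls, 0) >= n)


# The example from the slide: one of the two eggs was missed.
truth = Counter({"egg": 2, "chair": 1, "person": 2})
found = Counter({"egg": 1, "chair": 1, "person": 2})
print(utility(truth, found))  # chair and person are complete, egg is not
```

Only the chair and person classes are fully covered, which matches the Utility = 2 stated on the slide.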
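Problem Formulation
(U, P, B) as equations
$\mathcal{Y} = \{(B_i, C_i, p_i)\}_{i=1}^{N}, \quad Y \subseteq \mathcal{Y}$
• $\mathcal{Y}$: the set of all objects ($N$ of them)
• $Y$: the set of objects in a given labeling
• $B$: bbox corners (position, size)
• $C$: class label
• $p$: probability that the detection is correct
$\mathbb{E}[\mathrm{Precision}(Y)] = \frac{\sum_{i \in Y} p_i}{|Y|}, \quad \mathrm{Precision}(\emptyset) = 1$
$\mathbb{E}[\mathrm{Utility}(Y)] = \sum_{i \in Y} p_i \, f(B_i, C_i)$
$f(B_i, C_i) = \begin{cases} 1 & \text{class } C_i \text{ objects completely annotated by bbox } B_i \\ 0 & \text{otherwise} \end{cases}$
Anything that does not exist in the image in the first place is defined as correctly detected.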
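The two expectations above translate directly into code. The following is a minimal sketch under the slide's definitions; the `Annotation` class and the way `f` is passed in as a function are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Annotation:
    box: Tuple[float, float, float, float]  # B_i as (x1, y1, x2, y2)
    cls: str                                # C_i
    p: float                                # p_i, probability the annotation is correct


def expected_precision(labeling: List[Annotation]) -> float:
    """E[Precision(Y)] = sum of p_i over Y divided by |Y|; 1 for the empty set."""
    if not labeling:
        return 1.0
    return sum(a.p for a in labeling) / len(labeling)


def expected_utility(labeling: List[Annotation],
                     f: Callable[[Annotation], int]) -> float:
    """E[Utility(Y)] = sum of p_i * f(B_i, C_i) over Y."""
    return sum(a.p * f(a) for a in labeling)


# Hypothetical labeling with two annotations.
Y = [Annotation((0, 0, 10, 10), "egg", 0.9),
     Annotation((5, 5, 20, 20), "chair", 0.5)]
print(expected_precision(Y))
print(expected_utility(Y, lambda a: 1))
```

Here `f` always returns 1, so the expected utility is just the sum of the probabilities; a real `f` would check whether the class is completely annotated.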
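MDP formulation
• states $\mathcal{S}$: the current annotation state
• actions $\mathcal{A}$: questions from the system to the user
• transition $\mathcal{T}$: the predicted user response
• rewards $\mathcal{R}$: increase as (utility of labeling)/cost rises
$\mathcal{R}_a(s, s') = \frac{\mathbb{E}[\mathrm{Utility}(\mathcal{Y}(s'))] - \mathbb{E}[\mathrm{Utility}(\mathcal{Y}(s))]}{\mathrm{cost}(a)}$
$\mathcal{R}_a(s, s') = -\infty \quad \text{if } \mathrm{cost}(a) > \mathrm{Budget}$
$a^*(s) = \arg\max_{a \in \mathcal{A}} \sum_{s'} P_a(s, s') \left( \mathcal{R}_a(s, s') + V(s') \right)$
$V(s) = \sum_{s'} P_{a^*}(s, s') \left( \mathcal{R}_{a^*}(s, s') + V(s') \right)$
Choose the $a^*(s)$ that maximizes $V(s)$, using a 2-step lookahead search.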
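The action selection can be sketched as a small depth-limited search. This is an illustration only, not the authors' implementation: the state representation, the toy transition model, the numbers, and the choice to value "stop asking" at 0 are all assumptions. Skipping an action whose cost exceeds the remaining budget has the same effect as the reward of negative infinity.

```python
def best_action(state, budget, depth, actions, transitions, utility):
    """Depth-limited search for a*(s); returns (value, action name or None).

    actions:     {name: cost in seconds}
    transitions: function (state, name) -> [(probability, next_state), ...]
    utility:     function state -> expected utility of the labeling
    """
    best_value, best_name = 0.0, None        # assumption: stopping is worth 0
    if depth == 0:
        return best_value, best_name
    for name, cost in actions.items():
        if cost > budget:                    # same effect as a reward of -inf
            continue
        value = 0.0
        for prob, nxt in transitions(state, name):
            reward = (utility(nxt) - utility(state)) / cost
            future, _ = best_action(nxt, budget - cost, depth - 1,
                                    actions, transitions, utility)
            value += prob * (reward + future)
        if value > best_value:
            best_value, best_name = value, name
    return best_value, best_name


# Toy problem (all numbers are made up): the state is the expected utility.
ACTIONS = {"verify": 5.0, "draw": 10.0}


def toy_transitions(state, name):
    if name == "verify":
        return [(0.5, state + 1.0), (0.5, state)]
    return [(0.9, state + 1.5), (0.1, state)]


value, action = best_action(0.0, 12.0, 2, ACTIONS, toy_transitions, lambda s: s)
print(action, round(value, 3))
```

With a budget of 12 seconds the cheap question wins in this toy setting, because it leaves room for a second question while the expensive one does not.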
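MDP formulation
transition probabilities $\mathcal{T}$
$P(u_T \mid I, U^{T-1}) = \sum_{k=1}^{K} P(u_T \mid E_k^T) \, P(E_k^T \mid I, U^{T-1})$
• $E_1^T, E_2^T, \dots, E_K^T$: the possible answers to question $a_T$
• $u_T$: the user's actual response at time $T$
($U^{T-1} = u_1, u_2, \dots, u_{T-1}$)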
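MDP formulation
transition probabilities $\mathcal{T}$
$P(u_T \mid I, U^{T-1}) = \sum_{k=1}^{K} P(u_T \mid E_k^T) \, P(E_k^T \mid I, U^{T-1})$
• $E_1^T, E_2^T, \dots, E_K^T$: the possible answers
• $u_T$: the user's response at time $T$
($U^{T-1} = u_1, u_2, \dots, u_{T-1}$)
$P(u_T \mid E_k^T)$ is a simplification of $P(u_T \mid E_k^T, I, U^{T-1})$: the probability that the user chooses $E_k^T$ as the response when $E_k^T$ is the correct answer to $a_T$.
Assumptions:
1) the noise in user responses is independent of the image
2) user responses are independent of each other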
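MDP formulation
transition probabilities $\mathcal{T}$
$P(u_T \mid I, U^{T-1}) = \sum_{k=1}^{K} P(u_T \mid E_k^T) \, P(E_k^T \mid I, U^{T-1})$
• $E_1^T, E_2^T, \dots, E_K^T$: the possible answers
• $u_T$: the user's response at time $T$
($U^{T-1} = u_1, u_2, \dots, u_{T-1}$)
$P(E_k^T \mid I, U^{T-1})$: the probability that $E_k^T$ is the correct answer to $a_T$
When the object detection system is used first ($P(E_k^T \mid I)$ is the object detection model):
$P(E_k^T \mid I, U^{T-1}) \propto P(E_k^T \mid I) \prod_{t=1}^{T-1} P(u_t \mid E_k^T, I, U^{t-1})$
When a human is asked to draw the bbox (drawn at time $T$):
$P(E \mid I, U^{T-1}) \propto P(E \mid u_T) \prod_{t \in \{1, \dots, T-1\} \setminus \{T\}} P(u_t \mid E, I, U^{t-1})$
(the excluded index is the step at which the box was drawn; the slide reuses the symbol $T$ for it)
Note: user responses are assumed to be independent of previous responses and of the image.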
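Under the independence assumptions, each response likelihood reduces to $P(u_t \mid E_k)$, and the two formulas become a Bayesian update followed by a marginalization. The sketch below illustrates this for the detector-first case; the function names and all numbers are hypothetical.

```python
def posterior(prior, likelihood, responses):
    """P(E_k | I, U) proportional to P(E_k | I) * product over t of P(u_t | E_k).

    prior:      list, prior[k] = P(E_k | I) from the vision model
    likelihood: dict, likelihood[u][k] = P(user says u | E_k is correct)
    responses:  the observed responses u_1 .. u_{T-1}
    """
    post = list(prior)
    for u in responses:
        post = [p * likelihood[u][k] for k, p in enumerate(post)]
    total = sum(post)  # assumed non-zero for this sketch
    return [p / total for p in post]


def predict_response(post, likelihood):
    """P(u_T | I, U) = sum over k of P(u_T | E_k) * P(E_k | I, U)."""
    return {u: sum(l * p for l, p in zip(row, post))
            for u, row in likelihood.items()}


# Hypothetical yes/no question: E_0 = "yes is correct", E_1 = "no is correct".
LIKELIHOOD = {"yes": [0.9, 0.1], "no": [0.1, 0.9]}
post = posterior([0.5, 0.5], LIKELIHOOD, ["yes", "yes"])
print(post)
print(predict_response(post, LIKELIHOOD))
```

Two consistent "yes" responses move an uninformative prior strongly toward "yes", and the predicted next response follows the updated belief.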
Requests from system to human
| Task (MDP action) | Template | TP | TN | Cost |
|---|---|---|---|---|
| Verify-box | Is box B tight around an instance of class C? | 0.87 | 0.98 | 5.34 s |
| Verify-image | Does the image contain an object of class C? | 0.77 | 0.93 | 5.89 s |
| Verify-cover | Are there more instances of class C not covered by the set of boxes B? | 0.75 | 0.74 | 7.57 s |
| Draw-box | Draw a new instance of class C not already in set of boxes B. | 0.72 | 0.84 | 10.21 s |
| Name-image | Name an object class in the image besides the known object classes C. | 0.71 | 0.96 | 5.71 s |
| Verify-object | Is box B tight around some object? | 0.75 | 0.92 | 9.67 s |
| Name-box | If box B is tight around an object other than the objects in $C_B$, name the object. | 0.98 | 0.88 | 9.46 s |
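The table can be held as plain data and combined with a vision-model probability to predict how a worker will answer. This is a sketch under one stated assumption: TP is read as the probability of answering "yes" when "yes" is correct, and TN as the probability of answering "no" when "no" is correct. The names `TASKS` and `prob_yes` are illustrative.

```python
# Values copied from the table above; cost is in seconds.
TASKS = {
    "Verify-box":    {"tp": 0.87, "tn": 0.98, "cost": 5.34},
    "Verify-image":  {"tp": 0.77, "tn": 0.93, "cost": 5.89},
    "Verify-cover":  {"tp": 0.75, "tn": 0.74, "cost": 7.57},
    "Draw-box":      {"tp": 0.72, "tn": 0.84, "cost": 10.21},
    "Name-image":    {"tp": 0.71, "tn": 0.96, "cost": 5.71},
    "Verify-object": {"tp": 0.75, "tn": 0.92, "cost": 9.67},
    "Name-box":      {"tp": 0.98, "tn": 0.88, "cost": 9.46},
}


def prob_yes(task: str, p_true: float) -> float:
    """P(worker says yes) when the model believes 'yes' is correct with p_true.

    Assumes TP = P(yes | yes is correct) and TN = P(no | no is correct).
    """
    t = TASKS[task]
    return t["tp"] * p_true + (1.0 - t["tn"]) * (1.0 - p_true)


cheapest = min(TASKS, key=lambda name: TASKS[name]["cost"])
print(cheapest, TASKS[cheapest]["cost"])
print(prob_yes("Verify-box", 0.6))
```

The yes/no marginalization applies to the binary tasks; the drawing and naming tasks return richer answers and would need their own response model.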
CV model
| Task (MDP action) | Template | CV model |
|---|---|---|
| Verify-box | Is box B tight around an instance of class C? | $P(\mathrm{det}(B, C) \mid I)$ |
| Verify-image | Does the image contain an object of class C? | $P(\mathrm{cls}(C) \mid I)$ |
| Verify-cover | Are there more instances of class C not covered by the set of boxes B? | $P(\mathrm{more}(B, C) \mid I)$ |
| Draw-box | Draw a new instance of class C not already in set of boxes B. | $P(\mathrm{morecls}(C) \mid I)$ |
| Name-image | Name an object class in the image besides the known object classes C. | $P(\mathrm{morecls}(C) \mid I)$ |
| Verify-object | Is box B tight around some object? | $P(\mathrm{obj}(B))$ (is the bbox tight?) |
| Name-box | If box B is tight around an object other than the objects in $C_B$, name the object. | $P(\mathrm{new}(B, C))$ |
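$P(\mathrm{new}(B, \mathcal{C})) = P(\mathrm{obj}(B)) \prod_{C \in \mathcal{C}} \left( 1 - P(\mathrm{det}(B, C)) \right)$
$P(\mathrm{more}(\mathcal{B}, C) \mid I) = \begin{cases} P(\mathrm{cls}(C) \mid I) & \text{if } n = 0 \\ P(\mathrm{more} \mid n) & \text{otherwise} \end{cases}$
$n = \mathrm{round\_nearest\_int}(\mathbb{E}[nc(\mathcal{B}, C)])$
$\mathbb{E}[nc(\mathcal{B}, C)] = \sum_{B \in \mathcal{B}} P(\mathrm{det}(B, C) \mid I)$
$nc(\mathcal{B}, C)$: the number of bboxes in the set $\mathcal{B}$ that match class $C$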
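These three quantities are simple enough to compute by hand, and a short sketch makes the case split explicit. The slide does not say how $P(\mathrm{more} \mid n)$ is estimated, so it is passed in as a hypothetical function here; the half-up rounding is also an assumption about what "round to nearest integer" means.

```python
import math
from typing import Callable, Sequence


def prob_new(p_obj: float, det_probs: Sequence[float]) -> float:
    """P(new(B, C)) = P(obj(B)) * product over known classes of (1 - P(det(B, C)))."""
    return p_obj * math.prod(1.0 - p for p in det_probs)


def expected_count(det_probs: Sequence[float]) -> float:
    """E[nc(B, C)] = sum over boxes in the set of P(det(B, C) | I)."""
    return sum(det_probs)


def prob_more(det_probs: Sequence[float], p_cls: float,
              p_more_given_n: Callable[[int], float]) -> float:
    """P(more(B, C) | I): use P(cls(C) | I) if n = 0, else P(more | n)."""
    n = math.floor(expected_count(det_probs) + 0.5)  # round half up
    return p_cls if n == 0 else p_more_given_n(n)


# Hypothetical numbers for illustration only.
print(prob_new(0.8, [0.5, 0.5]))
print(prob_more([], 0.3, lambda n: 1.0 / (n + 1)))
print(prob_more([0.9, 0.8], 0.3, lambda n: 1.0 / (n + 1)))
```

With no boxes yet, the estimate falls back to the image-level classifier; once boxes exist, it depends only on how many instances are expected to be covered already.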
Verify-box (Task 1/7)
Request: focus on an object
(existence known)
(bbox exists)
(bbox quality unknown)
Q: Does the bbox fit tightly around the object? (yes/no)
Answer: in this case, "yes" is the correct answer.
Verify-image (Task 2/7)
Request: focus on an object
(existence unknown)
Q: Does the object exist in the image? (yes/no)
Answer: in this case, "no" is the correct answer.
Verify-cover (Task 3/7)
Request: focus on multiple objects
at least one object exists
(existence known)
(bbox exists)
(bbox fitness known)
however, there may be more objects
(existence unknown)
Q: Are all of the objects completely annotated? (yes/no)
Answer: in this case, "no" is the correct answer.
Draw-box (Task 4/7)
Request: focus on multiple objects
at least one object exists
(existence known)
(bbox exists)
(bbox quality known)
however, there may be more objects
(existence unknown)
Q: Are all of the objects completely annotated? (yes -> draw a box / no)
Answer: in this case, "no" is the correct answer.
Name-image (Task 5/7)
Request: some objects
(existence known)
unannotated objects
(existence unknown)
Q: Are there any unannotated objects in the image? (yes -> input the name of the object / no)
Answer: in this case, "umbrella" is an example answer.
Verify-object (Task 6/7)
Request: focus on a bbox
(bbox exists)
(bbox quality unknown)
Q: Is this bbox good? (yes/no)
Answer: in this case, "yes" is the correct answer.
Difference from Verify-box: it does not focus on a specific object.
Name-object (box) (Task 7/7)
Request: focus on a bbox
(bbox exists)
(bbox quality may be good)
(object name unknown)
Q: Is this bbox good? (yes -> input the object name / no)
Answer: in this case, "no" is the correct answer.
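Experiment Setup
dataset: ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014 detection dataset
• about 400K training images and about 20K validation images
• validation is split into val1 and val2 (val2 is used for testing)
• from val2, 2216 images with at least 4 annotations per image
CV model:
• object detector -> pretrained R-CNN [Girshick et al. 2014]
• trained on the ILSVRC2013 detection training set
• detections and classifications with probability < 0.1 are discarded
• non-maximum suppression is applied to the detector output
1) to avoid detecting the same object multiple times
2) to reduce computation
The target for a successful annotation is IOU = 0.7.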
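Intersection over union (IOU)
The higher the IOU, the better the bbox.
A box in which the object fills a large fraction of the area does not always have a high IOU, so for some objects a high IOU is hard to obtain with current CV techniques (e.g. a corkscrew) ⇒ human help is needed.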
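IOU is the area of overlap between two boxes divided by the area of their union. The sketch below computes it for axis-aligned boxes and applies the 0.7 target from the experiment setup; the `(x1, y1, x2, y2)` box format and the function names are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_successful(predicted, truth, threshold=0.7):
    """An annotation counts as successful when its IOU reaches the target."""
    return iou(predicted, truth) >= threshold


truth = (0, 0, 10, 10)
print(iou((0, 0, 10, 8), truth), is_successful((0, 0, 10, 8), truth))
print(iou((5, 5, 15, 15), truth), is_successful((5, 5, 15, 15), truth))
```

The first box misses only a strip of the object and passes the 0.7 target; the second is shifted diagonally and overlaps too little, even though a quarter of the object is inside it.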
Experimental Results
• Computer vision model + human (CV+H) outperforms the other methods
• CV only has budget = 0 (no human involved)
setting:
2K images of the ILSVRC2014 detection validation set (that have at least 4 objects)
• For budget < 120 s, CV+H is higher than the others
• The MDP is effective
• ILSVRC-DET [43] also uses human-in-the-loop annotation, but it takes a long time before annotators are asked to draw bboxes: 446.9 s/image
(figure labels: only binary question; CV only)
Utility of returned labeling $U$
• Req.prec -> requested precision $P^*$
• higher precision ⇒ lower utility
• this can be interpreted as the system becoming more cautious
Fraction of feasible images
• Req.util -> requested utility $U^*$
• the fraction of images whose obtained utility $U$ satisfies $U \ge U^*$
Precision of returned labeling $P$
• expected precision of the labeling
Constraint $(U^*, P^*, B^*)$
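The "fraction of feasible images" metric, as described on this slide, is a simple count over per-image utilities. A minimal sketch with made-up utility values follows; the function name is an assumption.

```python
def fraction_feasible(utilities, requested_utility):
    """Fraction of images whose obtained utility U satisfies U >= U*."""
    if not utilities:
        return 0.0
    hits = sum(1 for u in utilities if u >= requested_utility)
    return hits / len(utilities)


# Hypothetical per-image utilities for four images.
per_image_utility = [1.0, 2.5, 3.0, 0.5]
print(fraction_feasible(per_image_utility, 2.0))
```

Raising the requested utility can only lower this fraction, which is the trade-off the plots on this slide show.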
System Process Examples
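Discussion
• Are these seven task types really sufficient?
Personally, they seem a little redundant.
If the budget of Verify-object is lower than Name-object's, isn't Name-object unnecessary?
• I would like to see comparisons with other existing methods.
It is hard to tell which component is actually effective.
• In the end, it is unclear how well images that need a large number of annotations, such as indoor scenes, can now be handled.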
Good/Bad Annotations
(Instructions for crowdsourcing)
Good
• tight bbox
• each bbox covers most of an object
Bad
• redundant bbox
• each bbox covers only a part of an object
• bbox covers multiple objects
Paper reading best of both world

  • 1. Best of both worlds: human-machine collaboration for object annotation (CVPR2015) Olga Russakovsky@1, Li-Jia Li@2, Li Fei-Fei@1 @1: Stanford University, @2: Snapchat(Yahoo! Labs) 1 Presenter : 品川 政太朗(NAIST) paper reading ※ All of images are quoted from the paper.
  • 2. 諸注意 2 • “bbox” は “boundary(bounding) box”の略 • “TP” は “True Positive” • “TN” は “True Negative”. 𝑛𝑢𝑚 "yes" is correct answer 𝑛𝑢𝑚 𝑎𝑛𝑠𝑤𝑒𝑟 "𝑦𝑒𝑠" 𝑛𝑢𝑚 "no" is correct answer 𝑛𝑢𝑚 𝑎𝑛𝑠𝑤𝑒𝑟 "𝑛𝑜" reference number is same in the paper [paper] http://ai.stanford.edu/~olga/papers/RussakovskyCVPR15.pdf [supplements] http://ai.stanford.edu/~olga/papers/RussakovskyCVPR15_supp.pdf [CVPR poster] http://ai.stanford.edu/~olga/posters/cvpr15-poster.pdf [slides made by first author] http://ai.stanford.edu/~olga/slides/best_of_both_worlds_slides.pdf
  • 5. 5楽をする方法はないか? 有望な方法:物体検出技術による自動アノテーション RCNN(Regions with CNN) [Girshick et al. 2014] detect bbox and classify internal bbox using CNN (so strong) 問題点: 現状の物体検出技術 でもアノテーションでき る物体は限られている green : 成功 yellow : bboxにずれ有 pink : 検出失敗 complex task -> ask human (human-in-the-loop) Human Machine Collaboration
  • 6. Human Machine Collaboration
      - There is a trade-off: trivial tasks (yes/no questions) give less accuracy at low cost; complex tasks (drawing bboxes) give high accuracy at high cost.
      - binary question-and-answer [6,59,60]: low cost (not accurate)
      - attribute-based feedback [40,39,34]
      - free-form object annotation [58]: accurate (but high cost)
      - Research question: which questions should be asked to raise annotation accuracy while lowering cost? (Assuming the human can always return the correct answer.)
      - The best-balanced point should be the best one.
      - Optimize the dialogue policy (which question to ask) with reinforcement learning (MDP).
  • 7. Related Work (1/3): Recognition with humans in the loop
      - image classification [6,59,12]
      - image segmentation [26]
      - attribute-based classification [32,40,3]
      - image clustering [34]
      - image annotation [54,55,47]
      - human interaction [31]
      - object annotation in video [58]
      - [6,59,12,60] discuss the relationship between human time and annotation accuracy in human-machine collaboration -> but use only a single type of human response.
      - [26,13,54] use multiple-modality feedback (with varying costs) and predict the success of each modality -> but they do not incorporate iterative improvement.
  • 8. Related Work (2/3): Better object detection
      - weakly supervised data [42,23,52,8,24,15]
      - active learning [32,56]
      - mining the web for object names and exemplars [8,11,15]
      -> these approaches minimize human annotation.
  • 9. Related Work (3/3): Cheaper manual annotation
      Some developments in crowdsourcing techniques:
      - annotation games [57,12,30]
      - tricks to reduce the annotation search space [13,4]
      - effective user interface design [50,58]
      - making use of existing annotations [5]
      - making use of weak human supervision [26,7]
      - accurately computing the number of required workers [46]
      - [10,46,28,62]: iterative improvement of a task's accuracy per unit of human cost
  • 10. Authors (Stanford Vision Lab team)
      - Olga Russakovsky: postdoctoral fellow at Carnegie Mellon Univ. (a PhD student when this paper was published); large-scale recognition, ML, HCI.
      - Li-Jia Li: Snapchat; PhD degree from Stanford Univ.
      - Li Fei-Fei: Associate Professor, Stanford Univ.; a heavyweight of CV.
      - (Crowdsourcing) + (large-scale object recognition) + (reducing annotation cost) = this paper? (this paper only)
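  • 11. Problem Formulation
      - Utility (similar to recall): the number of labels for which every instance in the image was detected. Example: ground truth is 2 eggs, 1 chair, 2 people; detected are 1 egg, 1 chair, 2 people; so Utility = 2.
      - Precision: among the annotated objects, the number of labels that were annotated correctly.
      - Budget: the time a human spends on annotation.
      - Thresholds $(U^*, P^*, B^*)$ are set at the start.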
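  • 12. Problem Formulation: (U, P, B) as formulas
      - $\mathcal{Y} = \{(B_i, C_i, p_i)\}_{i=1}^{N}$, $Y \subseteq \mathcal{Y}$
      - $\mathcal{Y}$: the set of all objects ($N$ of them); $Y$: the set of objects for a given labeling; $B$: bbox vertices (position, size); $C$: class label; $p$: probability that the detection is correct.
      - $\mathbb{E}[\mathrm{Precision}(Y)] = \frac{\sum_{i \in Y} p_i}{|Y|}$, with $\mathrm{Precision}(\emptyset) = 1$ (things that are not in the image at all are defined as correctly detected).
      - $\mathbb{E}[\mathrm{Utility}(Y)] = \sum_{i \in Y} p_i \, f(B_i, C_i)$
      - $f(B_i, C_i) = 1$ if the objects of class $C_i$ are completely annotated by bbox $B_i$, and $0$ otherwise.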
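  • 13. MDP formulation
      - states $\mathcal{S}$: the current annotation state
      - actions $\mathcal{A}$: questions from the system to the user
      - transition $\mathcal{T}$: the predicted user response
      - rewards $\mathcal{R}$: increase as (utility of labeling)/cost increases
      - $\mathcal{R}_a(s, s') = \frac{\mathbb{E}[\mathrm{Utility}(\mathcal{Y}(s'))] - \mathbb{E}[\mathrm{Utility}(\mathcal{Y}(s))]}{\mathrm{cost}(a)}$ for $a \in \mathcal{A}$
      - $\mathcal{R}_a(s, s') = -\infty$ if $\mathrm{cost}(a) > \mathrm{Budget}$
      - $a^*(s) = \arg\max_a \sum_{s'} P_a(s, s') \left( R_a(s, s') + V(s') \right)$
      - $V(s) = \sum_{s'} P_{a^*}(s, s') \left( R_{a^*}(s, s') + V(s') \right)$
      - Choose the $a^*(s)$ that maximizes $V(s)$, using a 2-step lookahead search.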
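  • 14. MDP formulation: transition probabilities $\mathcal{T}$
      - $P(u_T \mid I, U^{T-1}) = \sum_{k=1}^{K} P(u_T \mid E_k^T) \, P(E_k^T \mid I, U^{T-1})$
      - $E_1^T, E_2^T, \dots, E_K^T$: the possible answers to question $a_T$
      - $u_T$: the user's actual response at time $T$, with $U^{T-1} = \{u_1, u_2, \dots, u_{T-1}\}$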
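  • 15. MDP formulation: transition probabilities $\mathcal{T}$ (user model)
      - $P(u_T \mid I, U^{T-1}) = \sum_{k=1}^{K} P(u_T \mid E_k^T) \, P(E_k^T \mid I, U^{T-1})$
      - $E_1^T, E_2^T, \dots, E_K^T$: the possible answers; $u_T$: the user's response at time $T$; $U^{T-1} = \{u_1, u_2, \dots, u_{T-1}\}$
      - $P(u_T \mid E_k^T)$ is a simplification of $P(u_T \mid E_k^T, I, U^{T-1})$: the probability that the user chooses $E_k^T$ as the response when $E_k^T$ is the correct answer to $a_T$.
      - Assumptions: 1) the noise in the user's response is independent of the image; 2) the user's responses are independent of each other.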
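  • 16. MDP formulation: transition probabilities $\mathcal{T}$ (answer model)
      - $P(u_T \mid I, U^{T-1}) = \sum_{k=1}^{K} P(u_T \mid E_k^T) \, P(E_k^T \mid I, U^{T-1})$
      - $P(E_k^T \mid I, U^{T-1})$: the probability that $E_k^T$ is the correct answer to $a_T$.
      - When the object detection system is used first: $P(E_k^T \mid I, U^{T-1}) \propto P(E_k^T \mid I) \prod_{t=1}^{T-1} P(u_t \mid E_k^T, I, U^{t-1})$, where $P(E_k^T \mid I)$ is the object detection model.
      - When a human draws the bbox (drawn at time $T$): $P(E \mid I, U^{T-1}) \propto P(E \mid u_T) \prod_{t \in \{1, \dots, T-1\} \setminus T} P(u_t \mid E, I, U^{t-1})$
      - $E_1^T, E_2^T, \dots, E_K^T$: the possible answers; $u_T$: the user's response at time $T$; $U^{T-1} = \{u_1, u_2, \dots, u_{T-1}\}$
      - Note: the user's response is assumed to be independent of previous responses and of the image.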
  • 17. Requests from system to human
      Task (MDP action) | Template | TP | TN | Cost
      Verify-box | Is box B tight around an instance of class C? | 0.87 | 0.98 | 5.34 s
      Verify-image | Does the image contain an object of class C? | 0.77 | 0.93 | 5.89 s
      Verify-cover | Are there more instances of class C not covered by the set of boxes B? | 0.75 | 0.74 | 7.57 s
      Draw-box | Draw a new instance of class C not already in the set of boxes B. | 0.72 | 0.84 | 10.21 s
      Name-image | Name an object class in the image besides the known object classes C. | 0.71 | 0.96 | 5.71 s
      Verify-object | Is box B tight around some object? | 0.75 | 0.92 | 9.67 s
      Name-box | If box B is tight around an object other than the objects in $C_B$, name the object. | 0.98 | 0.88 | 9.46 s
  • 18. CV model
      Task (MDP action) | Template | CV model
      Verify-box | Is box B tight around an instance of class C? | $P(\mathrm{det}(B, C) \mid I)$
      Verify-image | Does the image contain an object of class C? | $P(\mathrm{cls}(C) \mid I)$
      Verify-cover | Are there more instances of class C not covered by the set of boxes B? | $P(\mathrm{more}(B, C) \mid I)$
      Draw-box | Draw a new instance of class C not already in the set of boxes B. | $P(\mathrm{morecls}(C) \mid I)$
      Name-image | Name an object class in the image besides the known object classes C. | $P(\mathrm{morecls}(C) \mid I)$
      Verify-object | Is box B tight around some object? | $P(\mathrm{obj}(B))$ (is the bbox tight?)
      Name-box | If box B is tight around an object other than the objects in $C_B$, name the object. | $P(\mathrm{new}(B, C))$
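  • 19. CV model (details)
      - $P(\mathrm{new}(B, C)) = P(\mathrm{obj}(B)) \prod_{C \in c} \left(1 - P(\mathrm{det}(B, C))\right)$
      - $P(\mathrm{more}(\mathcal{B}, C) \mid I) = P(\mathrm{cls}(C) \mid I)$ if $n = 0$, else $P(\mathrm{more} \mid n)$
      - $n = \mathrm{round\_nearest\_int}(\mathbb{E}[nc(\mathcal{B}, C)])$
      - $\mathbb{E}[nc(\mathcal{B}, C)] = \sum_{B \in \mathcal{B}} P(\mathrm{det}(B, C) \mid I)$
      - $nc(\mathcal{B}, C)$: the number of bboxes in the set $\mathcal{B}$ that satisfy class $C$.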
  • 20. Verify-box (Task 1/7). Focus on one object (existence known, bbox exists, bbox quality unknown). Request: Is the bbox tight around the object? (yes/no). Answer: in this case, "yes" is the correct answer.
  • 21. Verify-image (Task 2/7). Focus on one object (existence unknown). Request: Does the object exist in the image? (yes/no). Answer: in this case, "no" is the correct answer.
  • 22. Verify-cover (Task 3/7). Focus on multiple objects: at least one object exists (existence known, bbox exists, bbox fit known), but whether further objects exist is unknown. Request: Are all of the objects completely annotated? (yes/no). Answer: in this case, "no" is the correct answer.
  • 23. Draw-box (Task 4/7). Focus on multiple objects: at least one object exists (existence known, bbox exists, bbox quality known), but whether further objects exist is unknown. Request: Are all of the objects completely annotated? (yes -> draw a box / no). Answer: in this case, "no" is the correct answer.
  • 24. Name-image (Task 5/7). Some objects are known to exist; whether unannotated objects exist is unknown. Request: Are there any unannotated objects in the image? (yes -> input the name of the object / no). Answer: in this case, "umbrella" is an example.
  • 25. Verify-object (Task 6/7). Focus on one bbox (bbox exists, bbox quality unknown). Request: Is this bbox good? (yes/no). Answer: in this case, "yes" is the correct answer. Difference from Verify-box: the focus is not on a particular object.
  • 26. Name-object(box) (Task 7/7). Focus on one bbox (bbox exists, bbox quality may be good, object name unknown). Request: Is this bbox good? (yes -> input the object name / no). Answer: in this case, "no" is the correct answer.
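  • 27. Experiment Setup
      - Dataset: ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014 detection dataset, with about 400K training images and about 20K validation images.
      - The validation set is split into val1 and val2 (val2 is used for testing). val2 has 2216 images, each with at least 4 annotations.
      - CV model: object detector -> pretrained R-CNN [Girshick et al. 2014], trained on the ILSVRC2013 detection training set.
      - Detection or classification results with probability < 0.1 are discarded.
      - Non-maximum suppression is applied to the detector output for two reasons: 1) to avoid detecting the same object many times; 2) to reduce computation.
      - The target for counting an annotation as successful is IOU = 0.7.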
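  • 28. Intersection over union (IOU)
      - The higher the IOU, the better the bbox.
      - A high fraction of the bbox being occupied by the object does not always mean a high IOU, so for some objects a high IOU is hard to obtain with current CV techniques (example: corkscrew) ⇒ human help is needed.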
  • 29. Experimental Results
      - Setting: 2K images of the ILSVRC2014 detection validation set (those that have at least 4 objects).
      - Computer Vision model + Human (CV+H) outperforms the other methods.
      - "CV only" corresponds to Budget = 0 (no human involved).
      - For budgets < 120 s, CV+H scores higher than the others.
      - The MDP is effective.
      - ILSVRC-DET [43] also uses humans in the loop, but it takes a long time before annotators can be asked to draw a bbox: 446.9 s/image.
      - (Plot legend: only binary questions; CV only.)
  • 30. Results under the constraint $(U^*, P^*, B^*)$
      - Utility of returned labeling $U$: "Req. prec" is the requested precision $P^*$. Higher precision ⇒ lower utility, which can be interpreted as the system becoming more cautious.
      - Fraction of feasible images: "Req. util" is the requested utility $U^*$. This is the fraction of images whose obtained utility $U$ satisfies $U \ge U^*$.
      - Precision of returned labeling $P$: the expected precision of the labeling.
  • 33. Good/Bad Annotations (instructions for crowdsourcing)
      - Good: tight bbox; each bbox covers most of an object.
      - Bad: redundant bbox; each bbox covers only a part of an object; bbox covers multiple objects.
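
The expected precision and utility from slide 12 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Detection` class and the function names are hypothetical, and the indicator f(B_i, C_i) is passed in as an optional callable that defaults to 1 for every detection.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Detection:
    """One candidate annotation (B_i, C_i, p_i); a hypothetical container."""
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    label: str                              # class label C_i
    prob: float                             # p_i: probability the detection is correct


def expected_precision(labeling: Sequence[Detection]) -> float:
    """E[Precision(Y)] = sum of p_i over Y divided by |Y|; Precision(empty) = 1."""
    if not labeling:
        return 1.0
    return sum(d.prob for d in labeling) / len(labeling)


def expected_utility(
    labeling: Sequence[Detection],
    f: Callable[[Detection], float] = lambda d: 1.0,  # assumed default for f(B_i, C_i)
) -> float:
    """E[Utility(Y)] = sum over Y of p_i * f(B_i, C_i)."""
    return sum(d.prob * f(d) for d in labeling)


labeling = [
    Detection((0, 0, 10, 10), "egg", 0.9),
    Detection((20, 20, 40, 60), "chair", 0.5),
]
print(expected_precision(labeling), expected_utility(labeling))
```

Adding a low-probability detection raises the expected utility but lowers the expected precision, which is the tension the thresholds (U*, P*, B*) are meant to control.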
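
The reward on slide 13 (utility gain per unit cost, with minus infinity when the cost exceeds the budget) can be illustrated with a greedy action choice. This sketch is a simplification, not the slide's method: it uses a one-step lookahead with V(s') = 0 instead of the 2-step lookahead search. The costs are taken from the table on slide 17; the outcome probabilities and utilities are made-up numbers for illustration.

```python
import math
from typing import Dict, List, Tuple

# (probability of the outcome, expected utility after the outcome)
Outcome = Tuple[float, float]


def expected_reward(utility_now: float, cost: float,
                    outcomes: List[Outcome], remaining_budget: float) -> float:
    """Expected R_a(s, s') = (E[Utility(s')] - E[Utility(s)]) / cost(a)."""
    if cost > remaining_budget:
        return -math.inf  # the action does not fit in the budget
    return sum(p * (u_next - utility_now) / cost for p, u_next in outcomes)


def best_action(actions: Dict[str, Tuple[float, List[Outcome]]],
                utility_now: float, remaining_budget: float) -> str:
    """Pick the action with the highest expected reward (one-step lookahead)."""
    return max(
        actions,
        key=lambda name: expected_reward(
            utility_now, actions[name][0], actions[name][1], remaining_budget
        ),
    )


# Costs in seconds from slide 17; outcome numbers are hypothetical.
ACTIONS = {
    "verify-box": (5.34, [(0.6, 1.5), (0.4, 1.0)]),
    "draw-box": (10.21, [(0.9, 2.0), (0.1, 1.0)]),
}

print(best_action(ACTIONS, utility_now=1.0, remaining_budget=20.0))
print(best_action(ACTIONS, utility_now=1.0, remaining_budget=8.0))
```

With a generous budget the more expensive Draw-box request wins because its expected gain per second is higher; once the remaining budget drops below its cost, only the cheaper verification remains feasible.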
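
For a yes/no request, the transition probability on slides 14 to 16 reduces to a two-term sum, and the belief update is a single application of Bayes' rule. The sketch below assumes that TP is the probability that the user answers "yes" when "yes" is correct and TN the probability that the user answers "no" when "no" is correct; that reading, the function names, and the prior of 0.5 are assumptions for illustration. The TP and TN values are the Verify-box entries from slide 17.

```python
def prob_user_says_yes(prior_yes: float, tp: float, tn: float) -> float:
    """P(u = yes | I) = P(yes | E = yes) P(E = yes | I) + P(yes | E = no) P(E = no | I)."""
    return tp * prior_yes + (1.0 - tn) * (1.0 - prior_yes)


def update_belief(prior_yes: float, answered_yes: bool, tp: float, tn: float) -> float:
    """Posterior P(E = yes | I, u), proportional to P(u | E = yes) P(E = yes | I)."""
    like_if_yes = tp if answered_yes else 1.0 - tp   # P(u | E = yes)
    like_if_no = (1.0 - tn) if answered_yes else tn  # P(u | E = no)
    numerator = like_if_yes * prior_yes
    return numerator / (numerator + like_if_no * (1.0 - prior_yes))


# Verify-box accuracies from slide 17; the detector prior of 0.5 is hypothetical.
TP, TN = 0.87, 0.98
prior = 0.5
print(prob_user_says_yes(prior, TP, TN))
print(update_belief(prior, True, TP, TN))
print(update_belief(prior, False, TP, TN))
```

Because responses are assumed independent of each other, repeated answers can be folded in by feeding each posterior back in as the next prior.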
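
The count-based rule on slide 19 can be written out directly. In this sketch the lookup table for P(more | n) is a hypothetical stand-in (the slide does not give its values), counts beyond the table fall back to its largest key as an assumption, and rounding uses floor(x + 0.5) because Python's built-in `round` rounds halves to even.

```python
import math
from typing import Dict, Sequence


def expected_count(det_probs: Sequence[float]) -> float:
    """E[nc(B, C)] = sum over boxes of P(det(B, C) | I)."""
    return sum(det_probs)


def prob_more(det_probs: Sequence[float], p_cls: float,
              p_more_given_n: Dict[int, float]) -> float:
    """P(more(B, C) | I) = P(cls(C) | I) if n = 0, else P(more | n)."""
    n = math.floor(expected_count(det_probs) + 0.5)  # round to the nearest integer
    if n == 0:
        return p_cls
    # Assumption: counts above the table's range reuse the largest tabulated n.
    return p_more_given_n[min(n, max(p_more_given_n))]


def prob_new(p_obj: float, det_probs_known_classes: Sequence[float]) -> float:
    """P(new(B, C)) = P(obj(B)) * product over known classes of (1 - P(det(B, C)))."""
    result = p_obj
    for p in det_probs_known_classes:
        result *= 1.0 - p
    return result


P_MORE_GIVEN_N = {1: 0.3, 2: 0.4}  # hypothetical values
print(prob_more([0.9, 0.8, 0.2], p_cls=0.6, p_more_given_n=P_MORE_GIVEN_N))
print(prob_more([0.2, 0.1], p_cls=0.6, p_more_given_n=P_MORE_GIVEN_N))
print(prob_new(0.8, [0.5, 0.5]))
```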
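
The IOU criterion from slides 27 and 28 is the intersection area of two boxes divided by the area of their union. The sketch below assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max); the function names are illustrative, and the 0.7 threshold is the success target stated on slide 27.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes have area 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_box = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    inter = area(inter_box)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def is_successful(pred: Box, truth: Box, threshold: float = 0.7) -> bool:
    """An annotation counts as successful when IOU reaches the threshold."""
    return iou(pred, truth) >= threshold


truth = (0.0, 0.0, 10.0, 10.0)
print(iou(truth, (5.0, 0.0, 15.0, 10.0)))
print(is_successful((1.0, 0.0, 10.0, 10.0), truth))
```

A box shifted by half its width overlaps the ground truth by only one third in IOU terms, which shows how strict the 0.7 target is for thin or irregular objects.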