• Like
From Large Scale Image Categorization to Entry-Level Categories
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

From Large Scale Image Categorization to Entry-Level Categories

  • 5,891 views
Published

CV勉強会@関東 ICCV2013読み会資料

CV勉強会@関東 ICCV2013読み会資料

Published in Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
5,891
On SlideShare
0
From Embeds
0
Number of Embeds
4

Actions

Shares
Downloads
11
Comments
0
Likes
5

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. 2014/02/23 CV勉強会@関東 ICCV2013読み会 発表資料 takmin
  • 2. 紹介する研究  “From Large Scale Image Categorization to Entry-Level Categories”  Vicente Ordonez  Jia Deng  Yejin Choi  Alexander C. Berg  Tamara L. Berg
  • 3. はじめに  この資料は、First AuthorのOrdonezさんが自 身のサイトに発表スライドをアップしていたた め、それを元に和訳と若干の追加説明を加え たものです。   http://www.cs.unc.edu/~vicente/entrylevel/index. html Best Paperに選ばれた理由  大規模一般物体認識に対して新しい問題提起を 行ったこと?
  • 4. この絵をなんて呼びますか? Grampus griseus Dolphin (イルカ) (ハナゴンドウ)
  • 5. この絵をなんて呼びますか? Object (物体) Organism (生物) Animal (動物) Chordate (脊索動物) Vertebrate (脊椎動物) Bird (鳥) Aquatic bird (水鳥) Swan (白鳥) Whistling swan (アメリカコハクチョウ) Cygnus Columbianus (コハクチョウ)
  • 6. 基礎知識  WordNet      英語の概念辞書 約15万語 synsetと呼ばれる同義語のグループに分類され、 synset同士の関係が記述されている 名詞と動詞は階層構造を持つ ImageNet   WordNetの名詞に対応した画像データセット 1つのsynsetあたり平均1000枚の画像
  • 7. 画像の内容に名前をつける (0.80) (0.83) American black bear (0.16) Grizzly bear (0.25) King penguin (0.11) Cormorant (0.56) Homing pigeon (0.26) Ball-peen hammer (0.06) Spigot (0.07) Diskette, floppy (0.06) Steel arch bridge (0.16) Farmhouse (0.03) Soapweed (0.12) Brazilian rosewood (0.13) Bristlecone pine (0.04) Cliffdiving (0.19) Vision Grampus griseus Crabapple Input Image Thousands of Noisy Category Predictions Grampus griseus Pick the Best
  • 8. 画像の内容に名前をつける (0.80) (0.83) American black bear (0.16) Grizzly bear (0.25) King penguin (0.11) Cormorant (0.56) Homing pigeon (0.26) Ball-peen hammer (0.06) Spigot (0.07) Diskette, floppy (0.06) Steel arch bridge (0.16) Farmhouse (0.03) Soapweed (0.12) Brazilian rosewood (0.13) Bristlecone pine (0.04) Cliffdiving (0.19) Vision Grampus griseus Crabapple Input Image Thousands of Noisy Category Predictions Grampus Naming griseus Pick the Best Dolphin What Should I Call It?
  • 9. Entry-Level Category 物体の画像を見せられた時、 人がつけるであろう名前 Rosch et al, 1976 Jolicoeur, Gluck & Kosslyn, 1984 上位後: animal, vertebrate エントリーレベル: bird 下位語: Black-capped chickadee (アメリカコガラ)
  • 10. Entry-Level Category 物体の画像を見せられた時、 人がつけるであろう名前 Rosch et al, 1976 Jolicoeur, Gluck & Kosslyn, 1984 上位語: animal, bird エントリーレベル: penguin 下位語: Chinstrap penguin (アゴヒモペンギン)
  • 11. それって難しいの? wordnet hierarchy Living thing Plant, Flora Bird 球根植物 被子植物 Angiosperm Seabird Bulbous Plant スイセン Narcissus Flower Penguin Cormorant 鵜 King penguin どこまで上位語に遡れ ば良い? Orchid Daisy Daffodil ラッパスイセン Frog Orchid 上位語にEntry Categoryを 含まない場合は?
  • 12. さて、どうやろう? Wordnet Linguistic resources Imagenet Google Web 1T Computer Vision Lots of text The Egyptian cat statue by the floor clock and perpetual motion Interior design of modern white and brown living room furniture hanging. SBU Captioned Dataset Man sits in a rusted car buried in the sand on Waitarere beach Labeled Images Little girl and her dog in northern Thailand. They both seemed. Our dog Zoe in her bed Emma in her hat looking super cute Lots of images with text
  • 13. Ground Truthデータ作成 WordNetの葉ノード(x500) Friesian, Holstein, HolsteinFriesian cow cattle pasture 対応する画像 x10 投票 Amazon Mechanical Terk fence x8
  • 14. 1. Goal: Category Translation Detailed Category What should I Call It? (Entry-Level Category) Grampus griseus dolphin 𝑑 𝑒 2. Goal: Content Naming Input Image What should I Call It? (Entry-Level Category) dolphin 𝑒
  • 15. 1. Goal: Category Translation Detailed Category What should I Call It? (Entry-Level Category) Grampus griseus dolphin 𝑑 𝑒 2. Goal: Content Naming Input Image What should I Call It? (Entry-Level Category) dolphin 𝑒
  • 16. 1. Goal: Category Translation Detailed Category What should I Call It? (Entry-Level Category) Grampus griseus dolphin 𝑑 𝑒 1.1 Text Based WebコーパスとWordNetの階層構造のみか ら推定 1.2 Image Based 画像特徴からWordNetの単語へ投票して推 定
  • 17. 1.1 Category Translation: Text-based 𝜓(𝑑, 𝑒) wordnet hierarchy 656M Animal Bird Mammal 15M 128M Seabird Cetacean 0.9M Penguin 88M 1.2M Whale 55M 30M King penguin 22M Dolphin 6.4M Grampus griseus 0.08M Sperm whale Semantic Distance 366M Cormorant 𝜙(𝑒) n-gram Frequency
  • 18. 1.1 Category Translation: Text-based 𝜓(𝑑, 𝑒) wordnet hierarchy 656M Animal Mammal 15M 128M Seabird Cetacean 0.9M Penguin 88M 1.2M Whale 55M 30M King penguin 22M Dolphin 6.4M Grampus griseus Naturalness Bird Semantic Distance 366M Cormorant 𝜙(𝑒) Sperm whale 0.08M 𝜏 𝑑, 𝜆 = argmax[𝜙 𝑒 − 𝜆𝜓(𝑑, 𝑒)] 𝑤
  • 19. 1.2 Category Translation: Image-based WordNetの葉ノード Friesian, Holstein, HolsteinFriesian 対応する画像 (1.9071) cow (1.1851) orange_tree (0.6136) stall (0.5630) mushroom (0.3825) pasture (0.3156) sheep (0.3321) black_bear (0.3015) puppy (0.2409) pedestrian_bridge (0.2353) nest TF-IDFでランク 付け Vision System アノテーション
  • 20. Category Translation: Examples HUMANS TEXT BASED IMAGE BASED cactus wren bird bird bird buzzard, Buteo buteo hawk hawk bird whinchat, Saxicola rubetra bird chat bird Weimaraner dog dog dog numbat, banded anteater, anteater anteater anteater cat rhea, Rhea americana ostrich bird grass Europ. black grouse, heathfowl bird bird duck yellowbelly marmot, rockchuck Squirrel marmot rock
  • 21. Category Translation: Examples HUMANS TEXT BASED IMAGE BASED cactus wren bird bird bird buzzard, Buteo buteo hawk hawk bird whinchat, Saxicola rubetra bird chat bird Weimaraner dog dog dog numbat, banded anteater, anteater anteater anteater cat rhea, Rhea americana ostrich bird grass Europ. black grouse, heathfowl bird bird duck yellowbelly marmot, rockchuck Squirrel marmot rock
  • 22. Category Translation: Examples HUMANS TEXT BASED IMAGE BASED cactus wren bird bird bird buzzard, Buteo buteo hawk hawk bird whinchat, Saxicola rubetra bird chat bird Weimaraner numbat, banded anteater, anteater dog dog dog 「chat」は鳥の種類。コーパスに 「おしゃべり」の意味で頻出cat anteater anteater rhea, Rhea americana ostrich bird grass Europ. black grouse, heathfowl bird bird duck yellowbelly marmot, rockchuck Squirrel marmot rock
  • 23. Category Translation: Examples HUMANS TEXT BASED IMAGE BASED cactus wren bird bird bird buzzard, Buteo buteo hawk hawk bird whinchat, Saxicola rubetra bird chat bird アメリカダチョウ Weimaraner dog dog dog numbat, banded anteater, anteater anteater anteater cat rhea, Rhea americana ostrich bird grass Europ. black grouse, heathfowl bird bird duck yellowbelly marmot, rockchuck Squirrel marmot rock
  • 24. 1. Goal: Category Translation Detailed Category What should I Call It? (Entry-Level Category) Grampus griseus dolphin 𝑑 𝑒 2. Goal: Content Naming Input Image What should I Call It? (Entry-Level Category) dolphin 𝑒
  • 25. 2.1 Propagated Visual Estimates 画像認識結果にWebコーパスの情報を加味し て、“自然な”WordNet上の単語を選択 2.2 Supervised Learning ImageNetの葉ノードに対する認識結果を Entry-Level Categoriesへ変換 2. Goal: Content Naming Input Image What should I Call It? (Entry-Level Category) dolphin 𝑒
  • 26. Large Scale Categorization (0.80) (0.41) Homing pigeon (0.26) Ball-peen hammer (0.06) Spigot (0.07) Diskette, floppy (0.06) Steel arch bridge (0.16) Farmhouse (0.03) Soapweed (0.12) Brazilian rosewood (0.13) Spatial pooling Cormorant (0.56) Coding (LLC), Wang et al. CVPR 2010 King penguin (0.11) Local descriptors Grizzly bear (0.25) Selective Search Windows. van De Sande et al. ICCV 2011 American black bear (0.16) Flat Classifiers Grampus griseus Bristlecone pine (0.04) Cliffdiving (0.19) Crabapple
  • 27. 2.1 Propagated Visual Estimates Accuracy 𝑓(𝑣, 𝐼) (0.05) Cormorant King penguin (0.15) Sperm whale Grampus griseus (0.6) (0.2)
  • 28. 2.1 Propagated Visual Estimates Animal 𝑓(𝑣, 𝐼) (1.0) (0.2) Mammal (0.8) Seabird (0.2) Cetacean (0.8) Whale (0.8) Penguin (0.15) King penguin (0.15) (0.05) Cormorant Dolphin (0.6) Grampus griseus (0.6) Sperm whale Accuracy Bird (0.2)
  • 29. 2.1 Propagated Visual Estimates Animal 𝑓(𝑣, 𝐼) - 𝜓(𝑣) (1.0) (0.8) Seabird (0.2) Cetacean (0.8) Whale (0.8) Penguin (0.15) King penguin (0.15) (0.05) Cormorant Dolphin (0.6) Grampus griseus Sperm whale (0.2) (0.6) Deng et al. CVPR 2012 𝑓 𝑣, 𝐼, 𝜆 = 𝑓(𝑣, 𝐼) [− 𝜓 𝑣 + 𝜆] 葉からの距離 Mammal Specificity (0.2) Accuracy Bird
  • 30. 2.1 Propagated Visual Estimates Animal 656M 𝑓(𝑣, 𝐼) - 𝜓(𝑣) (1.0) Mammal (0.8) Seabird (0.2) 0.9M Cetacean (0.8) 55M Whale (0.8) Dolphin (0.6) 6.4M Sperm whale Penguin (0.15) 1.2M King penguin (0.15) (0.05) Cormorant 30M 0.08M Grampus griseus (0.2) (0.6) OurDeng et al. CVPR 2012 work 𝑓 𝑓(𝑣, 𝐼) 𝐼) 𝑣 𝑓 𝑛𝑎𝑡 𝑣,𝑣,𝐼,𝐼,𝜆 𝜆 = = 𝑓(𝑣,[− 𝜓[𝜙 +𝒗 𝜆]− 𝜆 𝜓(𝑣)] Naturalness 15M Specificity 22M (0.2) 128M 88M Bird Accuracy 366M 𝜙(𝑣)
  • 31. 2.2 Supervised Learning SBU Captioned Dataset 100万枚のキャプション付き画像データセット(from Flickr) POS Taggerでノイズ除去 キャプションからEntry-Level Categories生成 ImageNetの葉ノードとEntry-Level Categories との関係を学習
  • 32. 2.2 Supervised Learning (0.80) Grampus griseus (0.41) American black bear (0.16) Grizzly bear (0.25) King penguin (0.11) Cormorant (0.56) 画像認識結果 Homing pigeon (0.26) 𝑋= Ball-peen hammer (0.06) Spigot (0.07) Diskette, floppy (0.06) Steel arch bridge (0.16) Farmhouse (0.03) Soapweed (0.12) Brazilian rosewood (0.13) Bristlecone pine (0.04) Cliffdiving (0.19) Crabapple ImageNetの葉ノード
  • 33. 2.2 Supervised Learning (0.80) (0.41) American black bear (0.16) Grizzly bear (0.25) King penguin (0.11) Cormorant Bear (0.56) Homing pigeon Dog (0.26) 𝑋= Grampus griseus Ball-peen hammer (0.06) Spigot (0.07) Diskette, floppy (0.06) Steel arch bridge (0.16) Farmhouse Penguin (0.03) Soapweed Tree (0.12) Brazilian rosewood Palm tree (0.13) Bristlecone pine (0.04) Cliffdiving (0.19) Crabapple training from weak annotations SBU Captioned Photo Dataset 1 million captioned images! Building House Bird 𝑓𝑠𝑣𝑚 1 学習 𝒗 𝒊 , 𝐼, Θ = 1 − exp(𝑎Θ 𝑇 𝑋 + 𝑏)
  • 34. Extracting Meaning from Data “tree”を学習させた結果 snag shade tree bracket fungus, shelf fungus bristlecone pine, Rocky Mountain bristlecone pine, Pinus aristata Brazilian rosewood, caviuna wood, jacaranda, Dalbergia nigra redheaded woodpecker, redhead, Melanerpes erythrocephalus redbud, Cercis canadensis mangrove, Rhizophora mangle chiton, coat-of-mail shell, sea cradle, polyplacophore crab apple, crabapple papaya, papaia, pawpaw, papaya tree, melon tree, Carica papaya frogmouth Mammals Birds Instruments Structures Plants Other
  • 35. Extracting Meaning from Data “water”を学習させた結果 water dog surfing, surfboarding, surfriding manatee, Trichechus manatus punt dip, plunge cliff diving fly-fishing sockeye, sockeye salmon, red salmon, blueback salmon, Oncorhynchus nerka sea otter, Enhydra lutris American coot, marsh hen, mud hen, water hen, Fulica americana booby canal boat, narrow boat, narrowboat Mammals Birds Instruments Structures Plants Other
  • 36. 2.3 Joint 2.1 Propagated Visual Estimatesと 2.2 Supervised Learning の統合 𝑓𝑗𝑜𝑖𝑛𝑡 𝑣, 𝛼 = 𝛼 𝑓 𝑛𝑎𝑡 𝑣 + 𝑓𝑠𝑣𝑚 𝑣
  • 37. Results: Content Naming Human Labels Flat Classifier Deng et al. CVPR’12 Propagated Visual Estimates Supervised Learning Joint farm, fence field horse, mule kite, dirt people tree, zoo gelding yearling shire yearling draft horse equine perissodactyl ungulate male horse tree equine male gelding horse pasture field cow fence horse pasture field cow fence
  • 38. Results: Content Naming Human Labels Flat Classifier Deng et al. CVPR’12 Propagated Visual Estimates Supervised Learning Joint fence, junk sign stop sign street sign trash can tree woody tree structure plant vascular tree structure building plant area logo street neighborhoo building office building logo street neighborhood building office feeder Hyla cleaner box large
  • 39. Evaluation: Content Naming Test Set B – High Confidence Prediction Scores Test Set A – Random Images 26% 26% 24% 24% 22% 22% 20% 20% 18% 18% 16% 16% 14% 14% 12% 12% 10% 10% 8% 8% 6% 6% 4% 4% 2% 2% 0% 0% Flat Deng et al. Propagated Supervised Combined Classifier CVPR'12 Visual Learning Estimates Precision Recall Flat Deng et al. Propagated Supervised Combined Classifier CVPR'12 Visual Learning Estimates Precision Recall
  • 40. Conclusions/Future Work  画像へ名前付けを行うための新しい方法を 検討  画像に対し、人間のような名前付けを行うた めの方法を示した  名詞以外にもアクションや属性の推定など にも今後拡張していきたい
  • 41. Questions?