Presentation 17 May morning keynote Cees Snoek


  1. A Rosetta Stone for Image Understanding (22-05-13). Cees Snoek, University of Amsterdam, The Netherlands; Euvision Technologies, The Netherlands. A classical problem: understanding was lost from 394 CE to 1822.
  2. Rosetta Stone: discovery in 1799. A decree by King Ptolemy V in hieroglyphs, Demotic script, and Ancient Greek. Key to decipherment in 1822: J.F. Champollion. RECOGNIZING WORDS. Understanding images. Mazloom et al., ICMR 201
  3. How difficult is the problem? Human vision consumes 50% brain power… Van Essen, Science 1992. Visual labeling in a nutshell. Visualization by Jasper Schulte.
  4. Visual labeling by machine. [Diagram labels: Encode, Reduce, Encode, Reduce, Learn, Label.] International competition: NIST TRECVID Benchmark. Promote progress in video retrieval research. Open data, tasks, evaluation and innovation. http://trecvid.nist.gov/
  5. Are we making progress? [Plot legend: • 1000+ others, × MediaMill team. MediaMill team, TRECVID 2004-2012.] Performance doubled in just 3 years. Snoek & Smeulders, IEEE Computer 2010. Software licensed by Euvision Technologies.
  6. MediaMill video search engines. Learning from social-tagged images. Xirong Li et al., TMM 2009. Exploit consistency in tagging behavior of different users for visually similar images.
  7. Tag relevance. Objective tags are identified and reinforced. Based on 3.5 million images downloaded from Flickr. RECOGNIZING SENTENCES. Understanding images. Mazloom et al., ICMR 2013.
  8. Human event description on web video. We analyze 13K web videos and their descriptions. Examples: "People competing in a sand sculpting competition and children playing on the beach." "A woman folds and packages a scarf she has made." Habibian et al., ICMR 2013. Human concept-vocabulary: consists of 5K distinct and mostly rare concepts; includes general and specialized concepts; is composed of various concept types. [Bar chart: portions (in %, axis 0 to 50) per concept type: Non Visual, Attribute, Scene, Action, Object, Animal, People.]
  9. Concepts categorized by type: Object, People, Animal, Scene, Action, Attribute. From concepts to sentences. [Diagram labels: Input Video; Concept Vocabulary (Concept 1, Concept 2, …, Concept K); Train SVM; Event Models.] Creating the concept vocabulary is critical. Sadanand, CVPR12; Merler, TMM12; Althoff, MM12. Attempting a board trick.
  10. Video sentence examples: Attempting a board trick; Working on a woodworking project; Changing a vehicle tire. Are more concepts better? In general, more is better. But a vocabulary of 500 concepts exists that outperforms all others. Mazloom et al., ICMR 2013.
  11. Results for “Landing a fish in”. A vocabulary of 100 concepts is the best performer. Informative concepts vs. all concepts: the 23% most informative concepts lead to a 65% relative increase in event detection accuracy.
  12. What concepts are informative? Font size correlates with informativeness. [Word clouds: Wedding Ceremony; Landing a Fish.] Visual translation: represent images and text in unified semantic space. [Diagram labels: C1, C2, …, Cn; Concept Detectors (Textual); Concept Detectors (Visual); Semantic Space. Sample text: "The 18th-largest country in the world in terms of area at 1,648,195 Iran has a population of around 75 million. It is a country of particular geo.."]
  13. Example: query by a video. Video translation. Summary of most likely translations. Habibian et al., submitted.
  14. Conclusion. AI progress and human descriptions on the web act as a ‘Rosetta Stone’ for image understanding. Automatic metadata generation jumps from words to sentences. www.ceessnoek.info
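
The tag relevance idea on slides 6 and 7 (exploit the consistency in how different users tag visually similar images) can be illustrated with a small neighbor-voting sketch. This is not the authors' implementation: the two-dimensional feature vectors, the Euclidean distance, the one-vote-per-user rule, the value of k, and all names (`tag_relevance`, `collection`, `query`) are assumptions made only for illustration. The slides do not give the actual features or scoring formula.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def tag_relevance(query_features, collection, k=3):
    """Count tag votes from the k visually nearest images.

    Each user contributes at most one neighbor (an assumption), so a single
    user's tagging habits cannot dominate the vote.
    """
    by_distance = sorted(
        collection, key=lambda item: euclidean(query_features, item["features"])
    )
    votes = Counter()
    seen_users = set()
    for item in by_distance:
        if item["user"] in seen_users:
            continue  # skip further images from a user who already voted
        seen_users.add(item["user"])
        votes.update(set(item["tags"]))  # one vote per tag per neighbor
        if len(seen_users) == k:
            break
    return votes


# Hypothetical toy collection: features stand in for real visual descriptors.
collection = [
    {"user": "u1", "features": (0.90, 0.10), "tags": ["beach", "holiday"]},
    {"user": "u2", "features": (0.80, 0.20), "tags": ["beach", "sand"]},
    {"user": "u3", "features": (0.85, 0.15), "tags": ["beach", "me"]},
    {"user": "u4", "features": (0.10, 0.90), "tags": ["city", "holiday"]},
    {"user": "u5", "features": (0.20, 0.80), "tags": ["city", "me"]},
]
query = {"features": (0.88, 0.12), "tags": ["beach", "holiday", "me"]}

votes = tag_relevance(query["features"], collection, k=3)
# Rank the query image's own tags: more neighbor votes means more relevant.
ranked = sorted(query["tags"], key=lambda tag: (-votes[tag], tag))
print(ranked)
```

In this toy setting the tag that visually similar images agree on ("beach") is reinforced, while subjective tags such as "me" receive few votes, which mirrors the slide's claim that objective tags are identified and reinforced.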
