孫民 / Artificial Intelligence from a Computer Vision Perspective: The Next Big Thing


Dr. Min Sun (孫民) teaches in the Department of Electrical Engineering at National Tsing Hua University. After graduating from the Department of Electronics Engineering at National Chiao Tung University, he received an M.S. in Electrical Engineering from Stanford, a Ph.D. in Electrical Engineering: Systems from the University of Michigan, Ann Arbor, and completed postdoctoral work in computer engineering at the University of Washington, Seattle. His research interests span computer vision, machine learning, and human-computer interaction. Building on recent deep-learning breakthroughs in computer vision, he works on systems that bridge different subfields of AI, such as automatic video captioning (vision x natural language) and intelligent machines that interact with human behavior (vision x control).


1. Artificial Intelligence: The Next Big Thing, from a computer vision perspective. VSLab, NTHU Electrical Engineering (清大電機), 孫民
2. What's the Next Big Thing? http://research.microsoft.com/en-us/um/redmond/events/fs2015
3. Goal: "big data being the source, machine learning being the technique, and AI being the outcome" (Prof. Hsuan-Tien Lin at IEEE BigData 2016). Many kinds of sources (data) and outcomes (AI tasks) can be trained end-to-end using Deep Learning (DL).
4. Classical AI Tests: the Turing Test, by Alan Turing in 1950
5. Chatbot@F8. https://developers.facebook.com/videos/f8-2016/keynote/
6. Classical AI Tests: CAPTCHA
7. Breaking CAPTCHA, by vicarious.com
8. AlphaGo, 2016, by Google DeepMind. Are these what AI is all about?
9. 2014: Subfields of AI
10. 2015: Artificial General Intelligence (AGI)
11. Deep Learning (DL) • Data • GPU Computing • Talents
12. DL Fuses AI Subfields • Vision and Language (http://mscoco.org/) • Vision and Control (Atari Breakout game & AlphaGo, DeepMind) • Multiple Encoding and Decoding -> AGI
13. Image Captioning: f( image ) = "The man at bat is ready to swing at the pitch." Vision -> Language: a Convolutional Neural Network (CNN) encodes the image (credit: wiki) and a Recurrent Neural Network (RNN) generates the sentence (credit: Nature).
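To make the encoder-decoder pipeline on slide 13 concrete, here is a minimal PyTorch-style sketch (my own illustration, not the talk's actual model): a CNN encodes the image into a feature vector and an RNN (here an LSTM) decodes it into a word sequence. The ResNet-18 backbone and the sizes `vocab_size`, `embed_dim`, `hidden_dim` are illustrative assumptions.

```python
# Minimal CNN-encoder + RNN-decoder captioning sketch (PyTorch).
# All sizes and the ResNet-18 backbone are illustrative, not the speaker's model.
import torch
import torch.nn as nn
import torchvision.models as models

class CaptionModel(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=256, hidden_dim=512):
        super().__init__()
        cnn = models.resnet18(weights=None)                        # CNN encoder (torchvision >= 0.13 API)
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])   # drop the classifier head
        self.img_proj = nn.Linear(512, embed_dim)                  # image feature -> word space
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # RNN decoder
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feat = self.encoder(images).flatten(1)       # (B, 512) image feature: the "f(image)"
        feat = self.img_proj(feat).unsqueeze(1)      # (B, 1, embed_dim)
        words = self.embed(captions)                 # (B, T, embed_dim)
        seq = torch.cat([feat, words], dim=1)        # image feature kicks off the sequence
        hidden, _ = self.rnn(seq)
        return self.out(hidden)                      # per-step logits over the vocabulary

model = CaptionModel()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 10000, (2, 12)))
print(logits.shape)                                  # torch.Size([2, 13, 10000])
```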
14. Image Question Answering. http://visualqa.org/
15. Zhen et al., ECCV 2016, from VSLab and Stanford AI Lab
16. Big Video Data with Titles • Pairs of raw video and title, with video frames encoded by a CNN
17. Viral Videos. Google for "viral video company"
18. Large Video Repository: currently 28,740 videos and growing
19. DL Fuses AI Subfields • Vision and Language (http://mscoco.org/) • Vision and Control (Atari Breakout game & AlphaGo, DeepMind) • Multiple Encoding and Decoding -> AGI
20. Vision and Control (https://gym.openai.com/) • Learning to play games with weak supervision: Reinforcement Learning (RL)
21. Where It All Begins… "Playing Atari with Deep Reinforcement Learning", by DeepMind, NIPS 2013 Deep Learning Workshop. Slides by Yen-Chen Lin.
22. Control: Learning to Act. Playing Breakout amounts to • Input: screen images • Output: actions (do nothing | left | right) -> supervised classification. Slides by Yen-Chen Lin.
23. Supervised Solution • Training data: recorded expert game sessions • Target label: the action the expert takes at every step. Problems: • What if there is no expert? • This is not how humans learn. Slides by Yen-Chen Lin.
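As a rough illustration of this supervised framing (my sketch, not from the talk), the snippet below treats game screens as inputs and the three Breakout actions as class labels; `expert_screens` and `expert_actions` are placeholder tensors standing in for the recorded expert sessions.

```python
# Behavior cloning: screens -> 3-way action labels, trained like any classifier.
# The data tensors below are random placeholders for recorded expert sessions.
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(84 * 84, 256), nn.ReLU(),
    nn.Linear(256, 3),                         # do nothing | left | right
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

expert_screens = torch.randn(32, 1, 84, 84)    # fake batch of 84x84 grayscale screens
expert_actions = torch.randint(0, 3, (32,))    # fake expert action labels

loss = loss_fn(classifier(expert_screens), expert_actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```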
24. How Humans Learn • We don't need somebody to tell us a million times which move to choose at each screen • We just need occasional feedback that we did the right thing. Slides by Yen-Chen Lin.
25. Reinforcement Learning • Somewhere between supervised and unsupervised learning • Sparse and time-delayed labels (rewards). Based only on those rewards, the agent has to learn to behave in the environment; a rational agent should maximize total reward. Slides by Yen-Chen Lin.
26. RL in a Nutshell. Slides by Yen-Chen Lin.
27. Markov Decision Process • State • Action • Reward. The probability of the next state s_{i+1} depends only on the current state s_i and action a_i. Slides by Yen-Chen Lin.
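Written out in standard notation, matching the slide's indices, the Markov property is:

\[
P(s_{i+1} \mid s_i, a_i, s_{i-1}, a_{i-1}, \dots, s_1, a_1) \;=\; P(s_{i+1} \mid s_i, a_i).
\]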
28. Episode. One episode of this process (e.g. one game) forms a finite sequence of states, actions and rewards. Slides by Yen-Chen Lin.
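Concretely, with states s, actions a, and rewards r, one episode ending in a terminal state s_n looks like:

\[
s_0, a_0, r_1,\; s_1, a_1, r_2,\; s_2, \dots,\; s_{n-1}, a_{n-1}, r_n,\; s_n.
\]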
29. Example: Breakout • State: game screen • Action: 1. do nothing 2. left 3. right • Reward: game score. Slides by Yen-Chen Lin.
30. Example: Breakout • State: successive game screens • Action: 1. do nothing 2. left 3. right • Reward: game score. Slides by Yen-Chen Lin.
31. Reward • To perform well, we should also take future rewards into account; how do we do that? Total reward and total future reward (written out below). Slides by Yen-Chen Lin.
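Written out for an episode with rewards r_1, ..., r_n, the two quantities named on the slide are:

\[
R = r_1 + r_2 + \dots + r_n, \qquad
R_t = r_t + r_{t+1} + \dots + r_n.
\]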
32. Discounted Future Reward • However, since the environment is stochastic, intuitively one should earn reward as soon as possible. Total discounted future reward (written out below). Slides by Yen-Chen Lin.
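The usual definition, with discount factor 0 ≤ γ ≤ 1; the recursive form at the end is what Q-learning exploits:

\[
R_t = r_t + \gamma\, r_{t+1} + \gamma^2 r_{t+2} + \dots + \gamma^{\,n-t} r_n
    \;=\; r_t + \gamma\, R_{t+1}.
\]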
33. Q Function • Q(s, a): the maximum discounted future reward when we perform action a in state s, and continue optimally from that point on. It represents the "quality" of a certain action in a given state. Slides by Yen-Chen Lin.
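Using the discounted future reward defined above (where r_{t+1} is the reward received after taking a_t in s_t), the slide's definition can be written as

\[
Q(s_t, a_t) \;=\; \max R_{t+1},
\]

with the maximum taken over all ways of acting from time t+1 onward.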
34. How to Choose an Action? Here π represents the policy, the rule for how we choose an action in each state. If we know the Q function, the choice is simply the action with the highest Q-value (see the equation below). Slides by Yen-Chen Lin.
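With a known Q function, the greedy policy on this slide is simply:

\[
\pi(s) \;=\; \arg\max_{a} Q(s, a).
\]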
35. Q Function Implementation (as a table of Q-values):

            action 0   action 1   action 2
state 0        -2         -1          5
state 1         3          2          3
state 2         5          6         -6

Slides by Yen-Chen Lin.
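Such a table can be filled in with tabular Q-learning. The sketch below is my own minimal illustration (not part of the talk); the 3x3 shape mirrors the slide's table, and the learning rate `alpha`, discount `gamma`, and exploration rate `epsilon` are illustrative values.

```python
# Minimal tabular Q-learning sketch for a 3-state / 3-action toy problem.
# Hyperparameters and the environment interface are illustrative only.
import numpy as np

n_states, n_actions = 3, 3
Q = np.zeros((n_states, n_actions))        # the Q "table" from the slide
alpha, gamma, epsilon = 0.1, 0.99, 0.1     # learning rate, discount, exploration rate

def choose_action(state):
    """Epsilon-greedy policy: usually argmax_a Q(s, a), occasionally explore."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def q_update(s, a, r, s_next):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```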
36. If We Use Pixels as State 1. Resize images to 84x84 2. Convert to grayscale with 256 levels 3. Use the last 4 frames to represent the state. That gives 256^(84x84x4) ≈ 10^67970 possible game states. We can never cover all the cases! Slides by Yen-Chen Lin.
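A quick sanity check of that count (assuming 256 gray levels per pixel, 84x84 pixels, and 4 stacked frames, as listed above):

```python
# Number of decimal digits in 256^(84*84*4), the state count on the slide.
import math
digits = 84 * 84 * 4 * math.log10(256)
print(round(digits))   # 67970 -> roughly 10^67970 distinct pixel states
```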
37. Vision & Control: Deep Q Network. We use a CNN to represent the Q function, which takes • Input: the state (4 game screens) • Output: Q-values of the different actions a (i.e., Q(s,a) for every a). π(s) = argmax_a Q(s, a). Slides by Yen-Chen Lin.
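Below is a minimal sketch of such a Q-network in PyTorch (my illustration; the layer sizes follow the widely used Atari DQN architecture and are not taken from the talk): the input is the stack of 4 preprocessed 84x84 frames and the output is one Q-value per action.

```python
# Minimal DQN-style Q-network: 4 stacked 84x84 frames in, one Q-value per action out.
# Layer sizes are the commonly used Atari DQN choices; treat them as an assumption.
import torch
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, n_actions=3):            # Breakout: do nothing | left | right
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),           # Q(s, a) for every action a
        )

    def forward(self, state):                    # state: (B, 4, 84, 84)
        return self.head(self.conv(state))

q_net = DQN()
state = torch.randn(1, 4, 84, 84)
action = q_net(state).argmax(dim=1)              # greedy policy: argmax_a Q(s, a)
```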
38. Fusing Multiple Sensors (figure: hand-manipulation regions from a side-view camera, with labels such as "kettle", "medium wrap", "thumb + 4 finger"). Chan et al., ECCV 2015, from VSLab.
39. (figure: Left Hand / Head / Right Hand camera views recorded in Lab, Office and Home settings)
40. (figure: Left Hand / Head / Right Hand camera views recorded in Lab, Office and Home settings, continued)
41. Recognition from Wearable Cameras (figure: predictions vs. ground truth for gesture recognition and object-category recognition)
42. Real-time Wearable Demo: fisheye camera, NVIDIA TK1
43. Real-time Wearable Demo: cellphone, bottle, keyboard, mouse, free hand
44. Take-Home Message • Encoding source (data): an N-D observation, or an N-D sequence of observations • Decoding outcome (AI tasks): an N-D single output, or an N-D open-ended output sequence • Multiple encoding and decoding • If each module is differentiable (or approximately differentiable) -> end-to-end learning (a minimal sketch follows below). We get many tools to tackle Artificial General Intelligence. Just try! Worst thing: do nothing.
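A minimal sketch of the end-to-end point (my illustration with placeholder modules and sizes): when the encoder and decoder are both differentiable, one backward pass through the decoded outcome updates the whole pipeline jointly.

```python
# End-to-end learning through a differentiable encoder + decoder (placeholder modules).
import torch
import torch.nn as nn

encoder = nn.Sequential(                       # N-D observation -> feature vector
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
decoder = nn.Linear(8, 5)                      # feature -> task output (5 classes here)

x = torch.randn(4, 3, 32, 32)                  # fake batch of observations
target = torch.randint(0, 5, (4,))             # fake task labels
loss = nn.functional.cross_entropy(decoder(encoder(x)), target)
loss.backward()                                # gradients reach both decoder and encoder
```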
45. My Two Cents for Taiwan
46. Questions • Can I simply ask my engineers to use open-source deep learning tools to create new products? Answer: yes and not really. Yes, if you want to complete a well-known task; but Google's MLaaS products will almost always beat you. Not really, if you want to solve your own problem with your own data; you need talent, or engineers who are not afraid of failure.
47. Where Can I Find Talent? • Most of the talent consists of PhD students or young professionals in the US and EU. http://www.economist.com/news/business/21695908-silicon-valley-fights-talent-universities-struggle-hold-their How can we compete?
48. Local Students • Our students know deep learning is HOT! [Deep Learning Workshop at Academia Sinica (中研院)]: 500 participants
49. Case Study: NTHU@TW Undergraduate. https://github.com/yenchenlin1994/DeepLearningFlappyBird
50. Case Study: UNIST@Korea Undergraduate
51. To-Do for Local Students • We need more students to work on realistic deep learning projects with enough computing resources • We need some of them to stay in our local industry. Advanced Deep Learning Course at NTHU (academic year 105, 2016-17): 1. Taught by a group of professors 2. Topics include the latest DNN models, distributed training, and DL for embedded systems 3. Sponsored by MTK and ITRI's Big Data Center (巨資中心) 4. More sponsors are welcome!
52. For Talents Abroad: get in the talent race! http://cvpr2016.thecvf.com/exhibit/industry_expo
53. For Talents Abroad: most of them are fresh PhDs; 1 billion USD pledged
54. For Talents Abroad
55. AI Is Happening Fast
56. Thanks!
