
Roland Memisevic at AI Frontiers: Using Video to Make Your Assistant See



In this talk, I will introduce an AI system that interacts with you while "looking" at you, in order to understand your behaviour, your surroundings and the full context of the engagement. At the core of this technology is a crowd-acting platform that allows humans to engage with and teach the system about everyday aspects of our lives and of our physical world. Combining this with deep neural networks makes it possible to generate a high degree of human-like "awareness" of everyday scenes and situations. I will describe how this technology allows devices, ranging from information kiosks to cars, to engage with humans more naturally and instinctively, and how TwentyBN uses this ability to create commercial value for our customers.

Published in: Technology


  1. Using video to make your assistant see. Roland Memisevic, AI Frontiers 2018
  2. TwentyBN’s real-time vision system
  3. Crowd-Acting: we create labels describing complex scenes and actions; we distribute the labels to workers all over the world using our patented platform; workers record, submit and verify videos, resulting in high-quality, densely labelled video data for training neural networks
  4. Data variability: camera angles and scene layouts; multi-person actions and localization; interactivity; complex object interactions
  5. 15 people, 3 street signs
  6. Contrastive examples
  7. The data loop: network training, data platform, real-time engineering, user interface
  8. Dog, cat
  9. Something-Something (V2): recognizing complex activities end-to-end
  10. (Mahdisoltani et al., 2018)
  11. Grad-CAM visualizations
  12. 20bn-kitchenware
  13. 20bn-kitchenware, by training setup:
       • From scratch
       • ImageNet baselines
       • Smth-Smth classification on 40 action groups (e.g. “Holding something”)
       • Smth-Smth classification on 178 action classes (e.g. “Holding [something] next to [something]”)
       • Captioning on full captions (e.g. “Holding a metallic cup next to a big blue box”)
       • Captioning on “single-object” captions (e.g. “Holding cup next to box”)
  14. https://github.com/TwentyBN/smth-smth-v2-baseline-with-models
  15. 20bn.com
  16. Twenty Billion Neurons fact sheet:
       • Founded in 2015 by three machine learning researchers
       • The company mission is to make cameras see just like the human eye
       • The company builds computer vision systems that run at the edge and are driven by a single RGB camera
       • The company has 20 full-time staff across Berlin and Toronto
  17. Contrastive groups: after training on all 174 classes, evaluate the classifier within each contrastive group: argmax p(class | video, group)
  19. Common sense score
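The within-group evaluation on slide 17 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not TwentyBN's evaluation code: it assumes the classifier emits one logit per class (174 in the slides), that each contrastive group is given as a list of class indices, and that p(class | video, group) is obtained by renormalizing the full softmax over the group's members. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over all classes."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def group_posterior(logits: Sequence[float], group: Sequence[int]) -> Dict[int, float]:
    """Assumed p(class | video, group): full-softmax probabilities renormalized over the group."""
    probs = softmax(logits)
    mass = sum(probs[c] for c in group)
    return {c: probs[c] / mass for c in group}


def predict_in_group(logits: Sequence[float], group: Sequence[int]) -> int:
    """argmax over classes c in the group of p(c | video, group)."""
    posterior = group_posterior(logits, group)
    return max(posterior, key=posterior.get)


def contrastive_accuracy(
    all_logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    groups: Sequence[Sequence[int]],
) -> float:
    """Fraction of videos whose true class wins within its own contrastive group."""
    correct = sum(
        predict_in_group(z, g) == y for z, y, g in zip(all_logits, labels, groups)
    )
    return correct / len(labels)
```

For example, with logits [2.0, 0.5, 1.0, 3.0] and group [1, 2], `predict_in_group` returns 2 even though class 3 has the highest score overall. Because renormalization preserves the ordering of probabilities, the group argmax equals the argmax of the raw logits restricted to the group; the renormalized values matter only if calibrated within-group probabilities are wanted.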
