AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Deep learning continues to push the state of the art in domains such as video analytics, computer vision, and speech recognition. Deep networks are powered by amazing levels of representational power, feature learning, and abstraction. This approach comes at the cost of a significant increase in required compute power, which makes the AWS cloud an excellent environment for training. Innovators in this space are applying deep learning to a variety of applications. One such innovator, Vilynx, a startup based in Palo Alto, realized that the current pre-roll advertising-based models for mobile video weren’t returning publishers' desired levels of engagement. In this session, we explain the algorithmic challenges of scaling across multiple nodes, and what Intel is doing on AWS to overcome them. We describe the benefits of using AWS CloudFormation to set up a distributed training environment for deep networks. We also showcase Vilynx’s contributions to video discoverability, and explain how Vilynx uses AWS tools to understand video content. This session is sponsored by Intel.

  1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. November 2016. MAC205: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS. Andres Rodriguez, PhD, Solutions Architect, Intel Corporation; Juan Carlos Riveiro, CEO, Vilynx
  2. Content Outline • Deep learning overview and usages • Worked example of fine-tuning a neural network (NN) • Some theory behind deep learning • Vilynx: video discoverability
  3. Deep Learning • A branch of machine learning • Data is passed through multiple non-linear transformations • Goal: learn the parameters of the transformations that minimize a cost function
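To make the definition on slide 3 concrete, here is a tiny sketch. Data passes through a stack of transformations: a linear map followed by a ReLU nonlinearity, then a final linear map. A cost function then scores the output. The weights, the layer sizes, and the squared-error cost are illustrative assumptions, not values from the talk. Training would adjust `layers` to lower `cost`.

```python
def relu(v):
    # Element-wise non-linearity: max(0, a)
    return [max(0.0, a) for a in v]

def linear(W, b, x):
    # y_j = sum_k W[j][k] * x[k] + b[j]
    return [sum(w_jk * x_k for w_jk, x_k in zip(row, x)) + b_j for row, b_j in zip(W, b)]

def forward(layers, x):
    """Apply each (W, b) layer in turn; every layer except the last is followed by ReLU."""
    for i, (W, b) in enumerate(layers):
        x = linear(W, b, x)
        if i < len(layers) - 1:
            x = relu(x)
    return x

def cost(output, target):
    """Sum of squared errors between the network output and the expected output."""
    return sum((o - t) ** 2 for o, t in zip(output, target))

# Hypothetical parameters: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, 0.0], [0.5, -1.0]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.5]),
]
x = [1.0, -2.0]
y = forward(layers, x)
print(y, cost(y, [0.0]))  # [-1.0] 1.0
```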
  4. Why Now? Bigger Data, Better Hardware, Smarter Algorithms • Bigger data: image: 1,000 KB/picture; audio: 5,000 KB/song; video: 5,000,000 KB/movie • Better hardware: transistor density doubles every 18 months; cost/GB in 1995: $1,000.00; cost/GB in 2015: $0.03 • Smarter algorithms: advances in algorithms, including neural networks, leading to more accurate trained models
  5. Types of Deep Learning • Supervised learning: data -> labels • Unsupervised learning: no labels; clustering; dimensionality reduction • Reinforcement learning: reward actions (e.g., robotics) (Image: http://ode.engin.umich.edu/presentations/idetc2014/img/image_feature_learning_clear.png)
  6. Training (diagram): data -> forward propagation -> output (person 0.10, cat 0.15, dog 0.20, …, bike 0.05), compared with the expected output (person 0, cat 1, dog 0, …, bike 0) -> penalty (error or cost) -> back propagation
  7. Training and inference (diagram): the training loop of slide 6, plus inference, which runs data through forward propagation only to produce the output
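Slides 6 and 7 do not name the penalty function. A common choice for a one-hot expected output is cross-entropy on softmax probabilities, and the sketch below assumes that choice. It scores the slide's visible "cat" probability of 0.15. It also shows where back propagation starts: the gradient of the penalty with respect to the raw scores (logits) is simply predicted minus expected. The logits are hypothetical values for illustration.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Penalty for a one-hot expected output: -log(probability of the true class)."""
    return -math.log(probs[target_index])

def loss_from_logits(logits, target_index):
    return cross_entropy(softmax(logits), target_index)

def grad_wrt_logits(logits, target_index):
    """First back-propagation step for softmax + cross-entropy: dLoss/dz_i = p_i - y_i."""
    probs = softmax(logits)
    return [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]

classes = ["person", "cat", "dog", "bike"]
target = classes.index("cat")  # expected output: one-hot [0, 1, 0, 0]

# Penalty for the slide's predicted probability of "cat" (0.15):
print(round(-math.log(0.15), 3))  # 1.897

# Hypothetical raw scores (logits) for the four visible classes:
logits = [0.2, 0.6, 0.9, -0.5]
print(loss_from_logits(logits, target), grad_wrt_logits(logits, target))
```

The gradient entries sum to zero. The true class gets a negative gradient, so a descent step raises its score, and every other class gets a positive one.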
  8. Deep Learning Use Cases • Fraud/face detection • Gaming, check processing • Computer server monitoring • Financial forecasting and prediction • Network intrusion detection • Recommender systems • Personal assistants • Automatic speech recognition • Natural language processing • Image and video recognition/tagging • Targeted ads • Industries: cloud service providers, financial services, healthcare, automotive
  9. Optimized Deep Learning Environment • Fuel the development of vertical solutions • Deliver an excellent deep learning environment • Develop deep networks across frameworks • Maximum performance on Intel architecture • EC2 • Intel® Math Kernel Library (Intel® MKL)
  10. Elastic Compute Cloud (EC2) C4 Instances • “Highest performing processors and the lowest price/compute performance in EC2” [1] • Vilynx: deep learning for video content extraction; supports various companies: CBS, TBS, etc. • Pikazo app: transforms photos into artistic renderings • Sources: [1] https://aws.amazon.com/ec2/instance-types/; https://www.stlmag.com/news/st-louis-app-pikazo-will-turn-your-profile-picture/
  11. Elastic Compute Cloud (EC2) C4 Instances • c4.8xlarge On-Demand: $1.675/hr • GoogLeNet inference: batch size 32; 237 images/sec = 4.2 ms/image; 1 million images cost $1.96 • Spot prices are lower • Chart: c4.8xlarge MXNet inference throughput (images/sec), no MKL vs. MKL: AlexNet 6.1 vs. 679.5; GoogLeNet v1 2.4 vs. 262.5; ResNet-50 1.2 vs. 79.7; GoogLeNet v3 0.8 vs. 73.9 • Configuration: OS: Linux version 3.13.0-86-generic (buildd@lgw01-51) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)) #131-Ubuntu SMP Thu May 12 23:33:13 UTC 2016. MXNet tip of tree: commit de41c736422d730e7cfad72dd6afc229ce08cf90, Tue Nov 1 11:43:04 2016 +0800. MKL 2017 Gold update 1
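The per-million-image figure on slide 11 follows directly from the measured throughput and the On-Demand price. The short sketch below reproduces it. It assumes the instance is kept fully busy and billed in proportion to the time the job runs, with no startup or data-loading overhead.

```python
def inference_cost(num_images, images_per_sec, price_per_hour):
    """Return (ms per image, total USD) for a throughput-bound inference job."""
    ms_per_image = 1000.0 / images_per_sec
    hours = num_images / images_per_sec / 3600.0
    return ms_per_image, hours * price_per_hour

ms, usd = inference_cost(1_000_000, images_per_sec=237, price_per_hour=1.675)
print(f"{ms:.1f} ms/image, ${usd:.2f} per million images")
# 4.2 ms/image, $1.96 per million images
```

Passing a Spot price as `price_per_hour` gives the corresponding Spot estimate under the same assumptions.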
  12. Intel® Math Kernel Library 2017 (Intel® MKL 2017) • Optimized for EC2 instances with Intel® Xeon® CPUs • Optimized for common deep learning operations: GEMM (useful in RNNs and fully connected layers), convolutions, pooling, ReLU, batch normalization • Figure: recurrent NN; convolutional NN
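Slide 12 notes that GEMM underlies fully connected layers. The reason is that a fully connected layer applied to a whole batch is a single matrix product. The batch (batch × inputs) is multiplied by the transposed weight matrix (inputs × outputs), and a bias is added. A pure-Python sketch of that mapping follows, with made-up sizes and values. An optimized library such as Intel MKL would compute the same product with a tuned GEMM routine instead of the nested loops used here.

```python
def matmul(A, B):
    """Plain GEMM: (m x k) @ (k x n) -> (m x n)."""
    n = len(B[0])
    return [[sum(a_ik * B[k][j] for k, a_ik in enumerate(row)) for j in range(n)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def fully_connected(X, W, b):
    """Fully connected layer on a batch: Y = X @ W^T + b.
    X: batch x inputs, W: outputs x inputs, b: one bias per output."""
    Y = matmul(X, transpose(W))
    return [[y + b_j for y, b_j in zip(row, b)] for row in Y]

X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # batch of 2 inputs, 3 features each
W = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]  # 2 output units
b = [0.0, 1.0]
print(fully_connected(X, W, b))  # [[-2.0, 4.0], [-2.0, 8.5]]
```

Batching is what turns many small matrix-vector products into one large matrix-matrix product, the shape that tuned GEMM kernels handle best.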
  13. Naïve Convolution (figure source: https://en.wikipedia.org/wiki/Convolutional_neural_network)
  14. Cache-Friendly Convolution (figure source: arxiv.org/pdf/1602.06709v1.pdf)
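Slides 13 and 14 contrast a naive convolution with a cache-friendly one. The sketch below shows both loop orders in pure Python. It assumes a single image stored as [channel][row][column] nested lists, stride 1, and no padding. The blocked version tiles the output-channel and output-column loops so that each filter weight is reused across a strip of outputs read from one input row. This is a generic loop-tiling illustration, not the specific blocking scheme from the paper cited on slide 14, and the block sizes are arbitrary. In Python it demonstrates only the restructuring, since both functions return identical values. Any speedup would come from compiled, vectorized implementations.

```python
def conv2d_naive(x, w):
    """Direct convolution as computed in CNN layers (cross-correlation), stride 1, no padding.
    x: [C][H][W] input; w: [K][C][R][S] filters; returns [K][H-R+1][W-S+1]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    P, Q = H - R + 1, W - S + 1
    out = [[[0.0] * Q for _ in range(P)] for _ in range(K)]
    for k in range(K):
        for p in range(P):
            for q in range(Q):
                acc = 0.0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += x[c][p + r][q + s] * w[k][c][r][s]
                out[k][p][q] = acc
    return out


def conv2d_blocked(x, w, block_k=2, block_q=4):
    """Same result as conv2d_naive, with the output-channel (k) and output-column (q)
    loops tiled: each weight w[k][c][r][s] is loaded once and reused across a strip
    of up to block_q outputs that read one contiguous input-row segment."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    P, Q = H - R + 1, W - S + 1
    out = [[[0.0] * Q for _ in range(P)] for _ in range(K)]
    for k0 in range(0, K, block_k):
        for p in range(P):
            for q0 in range(0, Q, block_q):
                q_end = min(q0 + block_q, Q)
                for c in range(C):
                    for r in range(R):
                        row = x[c][p + r]
                        for s in range(S):
                            for k in range(k0, min(k0 + block_k, K)):
                                wv = w[k][c][r][s]
                                out_row = out[k][p]
                                for q in range(q0, q_end):
                                    out_row[q] += row[q + s] * wv
    return out


x = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]  # 1 channel, 3x3
w = [[[[1.0, 0.0], [0.0, -1.0]]]]                          # 1 filter, 1 channel, 2x2
print(conv2d_naive(x, w))    # [[[-4.0, -4.0], [-4.0, -4.0]]]
print(conv2d_blocked(x, w))  # [[[-4.0, -4.0], [-4.0, -4.0]]]
```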
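  15. Gradient Descent: cost $J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$, plotted against $\mathbf{w}$, with the starting point $\mathbf{w}^{(0)}$ marked
  16. Gradient Descent: $J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$; slope at $\mathbf{w}^{(0)}$: $\frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$
  17. Gradient Descent: $J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$; $\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$
  18. Gradient Descent: $J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$; $\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \alpha \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$, where $\alpha$ is the learning rate
  19. Gradient Descent: $\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \alpha \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$; the step to $\mathbf{w}^{(1)}$ is too small
  20. Gradient Descent: $\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \alpha \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$; the step to $\mathbf{w}^{(1)}$ is too large
  21. Gradient Descent: $\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \alpha \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$; the step to $\mathbf{w}^{(1)}$ is good enough
  22. Gradient Descent: $J(\mathbf{w}^{(1)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(1)}, \mathbf{x}_i)$; $\mathbf{w}^{(2)} = \mathbf{w}^{(1)} - \alpha \frac{dJ(\mathbf{w}^{(1)})}{d\mathbf{w}}$
  23. Gradient Descent: $J(\mathbf{w}^{(2)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(2)}, \mathbf{x}_i)$; $\mathbf{w}^{(3)} = \mathbf{w}^{(2)} - \alpha \frac{dJ(\mathbf{w}^{(2)})}{d\mathbf{w}}$
  24. Gradient Descent: $J(\mathbf{w}^{(3)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(3)}, \mathbf{x}_i)$; $\mathbf{w}^{(4)} = \mathbf{w}^{(3)} - \alpha \frac{dJ(\mathbf{w}^{(3)})}{d\mathbf{w}}$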
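The update rule on slides 15 to 24 can be run directly. The sketch below assumes a scalar weight and a squared-error cost per sample, cost(w, x_i) = (w - x_i)^2, so the summed cost J is minimized at the mean of the data. The data and learning rates are hypothetical. For this data, dJ/dw = 8(w - 3), so each step multiplies the distance to the minimum by (1 - 8α). With α = 0.001 the weight crawls, with α = 0.1 it converges quickly, and with α = 0.3 it overshoots by more each step and diverges. These mirror the "too small", "good enough", and "too large" cases.

```python
def gradient_descent(xs, w0, alpha, steps):
    """Minimize J(w) = sum_i (w - x_i)^2 with the update w <- w - alpha * dJ/dw.
    Returns the sequence w(0), w(1), ..., w(steps)."""
    w = w0
    path = [w]
    for _ in range(steps):
        dJ_dw = sum(2.0 * (w - x) for x in xs)  # derivative of the summed cost
        w = w - alpha * dJ_dw
        path.append(w)
    return path

xs = [1.0, 2.0, 3.0, 6.0]  # hypothetical data; J is minimized at their mean, w = 3
for alpha in (0.001, 0.1, 0.3):  # too small, good enough, too large for this J
    path = gradient_descent(xs, w0=0.0, alpha=alpha, steps=50)
    print(alpha, path[-1])
```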
  25. Transfer Learning via Fine-Tuning • The first few layers are usually very similar within a domain • The last layers are task-specific • Take a trained model and fine-tune it for a particular task (Images and sources: http://vision.stanford.edu/Datasets/collage_s.png; https://www.kaggle.com/c/dogs-vs-cats; http://adas.cvc.uab.es/task-cv2016/papers/0026.pdf)
  26. Fine-Tuning Steps • Install Intel-optimized Caffe (or your favorite framework): https://software.intel.com/en-us/articles/training-and-deploying-deep-learning-networks-with-caffe-optimized-for-intel-architecture • Download a pre-trained model: http://dl.caffe.berkeleyvision.org/bvlc_reference_caffenet.caffemodel • Modify the training model (next slide)
  27. Fine-Tuning: ILSVRC -> DogsVsCats
      Original training model (ILSVRC):
        layer { name: "data" type: "Data" data_param { source: "ilsvrc12_train_lmdb" ... } ... }
        ...
        layer { name: "fc8" type: "InnerProduct" inner_product_param { num_output: 1000 ... } }
      Modified training model (DogsVsCats):
        layer { name: "data" type: "Data" data_param { source: "dogs_cats_train_lmdb" ... } ... }
        ...
        layer { name: "fc8-ft" type: "InnerProduct" inner_product_param { num_output: 2 ... } }
      From the command line:
        caffe train -solver solver.prototxt -weights trainedModel.caffemodel
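Renaming `fc8` to `fc8-ft` on slide 27 is deliberate. When `caffe train` is given `-weights`, Caffe copies parameters from the .caffemodel into layers whose names match. Layers with a new name keep their filler initialization, and pretrained layers with no matching name are ignored. In BVLC Caffe, keeping the name `fc8` with `num_output: 2` would instead stop with a shape-mismatch error, because the pretrained blob has 1000 outputs. The sketch below models only that name-matching rule, not Caffe's actual loader. `plan_weight_copy` is a hypothetical helper, and the lists contain the CaffeNet layers that have learnable parameters.

```python
def plan_weight_copy(pretrained_layers, new_layers):
    """Model of name-based weight transfer: matched layers get pretrained parameters,
    unmatched new layers keep their filler initialization, and pretrained layers
    with no counterpart in the new net are ignored."""
    pretrained, new = set(pretrained_layers), set(new_layers)
    copied = [name for name in new_layers if name in pretrained]
    fresh = [name for name in new_layers if name not in pretrained]
    ignored = [name for name in pretrained_layers if name not in new]
    return copied, fresh, ignored

# CaffeNet layers with learnable parameters, before and after the slide's rename.
caffenet = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7", "fc8"]
dogs_cats = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7", "fc8-ft"]
copied, fresh, ignored = plan_weight_copy(caffenet, dogs_cats)
print("copied:", copied)
print("fresh:", fresh)      # ['fc8-ft']
print("ignored:", ignored)  # ['fc8']
```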
  28. Fine-Tuning Guidelines • Freeze all but the last layer (or unfreeze more layers if the new dataset is very different): set lr_mult=0 in the frozen layers' local learning rates • Earlier layer weights won't change very much • Drop the initial learning rate (base_lr in solver.prototxt) by 10x • Replace the 1000-unit output layer with a 2-unit layer and train its (4096+1) × 2 weights (Image: http://www.mdpi.com/remotesensing/remotesensing-07-14680/article_deploy/html/images/remotesensing-07-14680-g002-1024.png)
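The guidelines on slide 28 translate into a few prototxt settings. The sketch below uses a hypothetical helper, not part of Caffe, to render an InnerProduct layer with its two per-blob learning-rate multipliers: one `param` block for the weights and one for the bias. It sets them to 0 for a frozen layer. The sketch also checks the slide's parameter count, (4096+1) × 2 = 8,194 trainable values, and the 10x learning-rate drop. The gaussian filler, its std, and `base_lr = 0.01` are illustrative assumptions.

```python
def inner_product_layer(name, bottom, num_output, frozen):
    """Render a Caffe InnerProduct layer as prototxt text (hypothetical helper).

    The two `param` blocks hold the local learning-rate multipliers for the
    weights and the bias, in that order; lr_mult: 0 freezes them.
    """
    lr = 0 if frozen else 1
    return (
        "layer {\n"
        f'  name: "{name}"\n'
        '  type: "InnerProduct"\n'
        f'  bottom: "{bottom}"\n'
        f'  top: "{name}"\n'
        f"  param {{ lr_mult: {lr} }}\n"
        f"  param {{ lr_mult: {lr} }}\n"
        f"  inner_product_param {{ num_output: {num_output} "
        'weight_filler { type: "gaussian" std: 0.01 } }\n'
        "}\n"
    )

def trainable_params(fan_in, num_output):
    """Weights plus one bias per output unit: (fan_in + 1) * num_output."""
    return (fan_in + 1) * num_output

print(inner_product_layer("fc7", "fc6", 4096, frozen=True))      # frozen, pretrained
print(inner_product_layer("fc8-ft", "fc7", 2, frozen=False))     # new 2-unit classifier
print(trainable_params(4096, 2))  # 8194

base_lr = 0.01                    # hypothetical base_lr of the original solver.prototxt
finetune_base_lr = base_lr / 10   # "drop the initial learning rate by 10x"
```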
  29. Demo • Fine-tune the trained model for dogs vs. cats (Images: http://vision.stanford.edu/Datasets/collage_s.png; https://www.kaggle.com/c/dogs-vs-cats)
  30. Juan Carlos Riveiro: CEO and Cofounder
  31. Hello. We're Vilynx, the video personalization company. We select the relevant content targeted to individual needs, solving the content discovery problem. How? By building the biggest dataset for video deep learning: auto-tagging selected video scenes in real time and leveraging the web and social media to continuously update the tags. Benefit? Increased views, time spent watching videos, and time spent in video search. Markets: media, smartphones, drones, security, robots, smart cities.
  32. Outstanding Tech Team: Experienced and Very Successful • Juan Carlos Riveiro, CEO: more than 100 patents in signal processing, data statistics/algorithms, and machine learning; founder and CEO of Gigle Networks (acquired by Broadcom); CTO & VP of R&D at DS2 (acquired by Marvell) • Elisenda Bou, CTO: PhD from UPC and MIT; expert in machine learning and complex software architectures; worked on adaptive satellite control systems; recipient of a 2013 Google Faculty Research Award • José Cordero Rama: MS in Deep Learning at UPC/BSC; data scientist at King, Bdigital, and Gen-Med • Joan Capdevila, PhD: MS and PhD in Machine Learning at Georgia Tech and UPC/BSC; data scientist at AIS and Accenture • Jordi Pont-Tuset, PhD: postdoc in machine learning at ETH Zurich; PhD in image segmentation at UPC; Disney Research • Asier Aduriz: Computer Science and Telecom Engineering degree at UPC (top 1% of class); engineer at CERN • Dèlia Fernàndez: MS in Deep Learning at Columbia University; signal processing researcher at Northeastern University; data scientist at InnoTech • David Varas, PhD: PhD in video object tracking at UPC; adjunct professor of computer vision & statistical signal processing at UPC
  33. Vilynx: Indexing Visual Knowledge • 8 cameras per car • Smart cities: connecting everything • VR/AR: changing everything • A camera at every corner in London • Drones everywhere (Amazon) • 1,000+ hours of video uploaded to the internet every minute • How is all this visual content going to be indexed? Today it is just like the internet before Google.
  34. The Vilynx Knowledge Graph • The average vocabulary of a 5-year-old is 5,000 words • 4,800 words/concepts • 1.8 tags per video • 8M videos • The average vocabulary of an adult is 30,000 words • 2M words/concepts • 50 tags per video • 10M videos
  35. First Market Driven by Video Content Producers • Media companies need content personalization to drive audiences through multiple channels
  36. Some Customer Examples: http://www.cbs.com/shows/the-late-show-with-stephen-colbert/ • https://www.americasgreatestmakers.com/ • http://www.vanitatis.elconfidencial.com/
  37. Vilynx Products • Inputs: videos; audience data; contextual data (social networks, YouTube, web) • Outputs: key 5-sec clips; intelligent auto-tagging • Applications: video thumbnails, social sharing, recommendations, video search, ad market • Better video discovery • Native ad integration • Programmatic ad matching • More video views and longer engagement times • VOD & live events • Drive branding • Amplification with keyword recommendation • Drive click-through rates • Better user experience
  38. Vilynx Workflow: Machine Learning or Deep Learning 1. We ingest customer videos and the contextual information around them. 2. We then take cues from around the web and social networks. 3. This combined input is fed to the most advanced convolutional deep neural network in the industry. 4. The outputs are video previews optimized to engage your audience, and rich metadata that can further drive your video content. • 98% accuracy in finding the relevant parts of the video • CTR increase of 50% to 500% (customer validated)
  39. Advancing Deep Learning Networks: Move from Simple Classification to Indexing All Visual Content • A training set of video moments that includes: • 10M (and growing) tagged 5-sec video moments (ImageNet for video has only 4,000 moments) • 2M contextual tags (and growing) • A training set continuously updated with new tags by crawling social media and the web • Real-time unsupervised training of the network to autonomously learn and identify new patterns
  40. Demo Results • Results of the fine-tuned dogs vs. cats classifier (Images: http://vision.stanford.edu/Datasets/collage_s.png; https://www.kaggle.com/c/dogs-vs-cats)
  41. Call to Action • Use Intel-optimized frameworks for your workloads: https://github.com/intel/caffe, https://github.com/dmlc/mxnet, https://github.com/intel/theano, https://github.com/intel/torch (other frameworks coming soon…) • Deep learning tutorial: https://software.intel.com/en-us/articles/training-and-deploying-deep-learning-networks-with-caffe-optimized-for-intel-architecture • Distributed training of deep networks on AWS: https://software.intel.com/en-us/articles/distributed-training-of-deep-networks-on-amazon-web-services-aws
  42. Legal Notices & Disclaimers: This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. No computer system can be absolutely secure. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance. Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including the annual report on Form 10-K. The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. Intel, the Intel logo, Pentium, Celeron, Atom, Core, Xeon and others are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. © 2016 Intel Corporation.
  43. Thank you! (Huge) contributions from: Joseph Spisak, Elisenda Bou, Hendrik Van der Meer, Zhenlin Luo, Ravi Panchumarthy, Ryan Saffores, Niv Sundaram, and many more.
  44. Remember to complete your evaluations!
