Intelligent Thumbnail Selection
Kamil Sindi, Lead Data Scientist
JW Player

Building a TensorFlow-based model that extracts the "best" frames from a video, which are then used as auto-generated thumbnails and thumbstrips. We used transfer learning on Google's InceptionV3 model, which was pre-trained with ImageNet data and retrained on JW Player's thumbnail library.

Slide 1: Intelligent Thumbnail Selection
Kamil Sindi, Lead Data Scientist
Slide 2: JW Player
1. Company
   a. Open-source video player
   b. Hosting platform
   c. 5% of global internet video traffic
   d. 150+ team
2. Data Team
   a. Handling 5MM events per minute
   b. Storing 1TB+ per day
   c. Stack: Storm (Trident), Kafka, Luigi, Elasticsearch, Spark, AWS, MySQL
(Customer logos)
Slide 3: Thumbnails are Important
● Your video's first impression
● Types: Upload, Manual, Auto (default)
● Manual >> Auto in play rate
● The current Auto thumbnail is the frame at the 10th second
● Many big publishers only use Manual
● 90% of thumbnails are Auto! :-(
Source: tastingtable.com (2016-10-12)
Slide 4: What’s a “Good” Thumbnail?
It’s subjective to the viewer! Common themes:
● Not blurry
● Balanced brightness
● Centered objects
● Large text overlay
● Relevant to subject
Source: Big Buck Bunny, Blender Studios
Slide 5: Manually Creating a Model is Hard
● Which features to extract?
● How to describe those features?
● How to weight features?
● How to penalize overfitting of models?
● Many techniques: SIFT, SURF, HOG?
So many image features: edge detection, color histograms, pixel segmentation...
You need to be an expert in Computer Vision :-(
Slide 6: Deep Learning
● Learn features implicitly
● Learn from examples
● Techniques to avoid overfitting
● Success in a lot of applications:
  ○ Image classification
  ○ Image captioning
  ○ Machine translation
  ○ Speech-to-text
Slide 7: Inception
● Learn multiple models in parallel; concatenate their outputs (“modules”; sketch below)
● Factored convolutions (“towers”): e.g. 1x1 convs followed by 3x3
● Parameter reduction: GoogLeNet (5MM) vs. AlexNet (60MM), VGG-16 (~138MM)
● Auxiliary classifiers for regularization
● Residual connections (Inception-v4)
● Depthwise separable convolutions (Xception)
https://www.udacity.com/course/deep-learning--ud730
https://arxiv.org/abs/1409.4842
Source: Rethinking the Inception Architecture for Computer Vision
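The “module” idea is just parallel convolution towers over the same input, concatenated along the channel axis. A minimal sketch in tf.keras follows; the branch widths (64/48/64/32) are illustrative, not the exact InceptionV3 configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x):
    # Tower 1: a plain 1x1 convolution
    b1 = layers.Conv2D(64, 1, padding="same", activation="relu")(x)
    # Tower 2: 1x1 bottleneck followed by a 3x3 convolution
    b2 = layers.Conv2D(48, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(64, 3, padding="same", activation="relu")(b2)
    # Tower 3: pooling followed by a 1x1 projection
    b3 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    b3 = layers.Conv2D(32, 1, padding="same", activation="relu")(b3)
    # Concatenate the tower outputs along the channel axis
    return layers.Concatenate(axis=-1)([b1, b2, b3])

# Example: a 35x35 feature map with 192 channels -> 64 + 64 + 32 = 160 channels
y = inception_module(tf.random.normal((1, 35, 35, 192)))
```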
Slide 8: 1x1 Convolutions: What’s the Point?
1. Dimensionality reduction: fewer channels, strides, feature pooling
2. Parameter reduction: faster, less overfitting (worked example below)
3. “Cheap” nonlinearity: 1x1 + 3x3 is nonlinear
4. Cross-channel ⊥ spatial correlations
Figures: 1x1 convolution with strides; pooling with 1x1 convolution
Source: http://iamaaditya.github.io/2016/03/one-by-one-convolution/
“In Convolutional Nets, there is no such thing as ‘fully-connected layers’. There are only convolution layers with 1x1 convolution kernels.” – Yann LeCun
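To make the parameter-reduction point concrete, here is a back-of-the-envelope comparison in tf.keras; the channel counts (288 in, 96 out) are sized like an InceptionV3 feature map and are illustrative only:

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 35, 35, 288))  # a 288-channel feature map

# Direct 3x3 convolution to 96 channels: 288 * 96 * 3 * 3 ≈ 249K weights
direct = layers.Conv2D(96, 3, padding="same", activation="relu")

# 1x1 bottleneck to 64 channels, then 3x3 to 96 channels:
# 288 * 64 + 64 * 96 * 3 * 3 ≈ 74K weights, plus one extra nonlinearity
reduce = layers.Conv2D(64, 1, activation="relu")
conv3 = layers.Conv2D(96, 3, padding="same", activation="relu")

# Same output shape, ~3.4x fewer weights
assert direct(x).shape == conv3(reduce(x)).shape
```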
Slide 9: InceptionV3 Architecture
https://research.googleblog.com/2016/03/train-your-own-image-classifier-with.html
Example output: Dog (0.80), Cat (0.05), Rat (0.01), ...
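For reference, producing class scores like those above from the stock pre-trained model takes only a few lines with the keras.applications API (the image path is a placeholder):

```python
import numpy as np
from tensorflow.keras.applications.inception_v3 import (
    InceptionV3, decode_predictions, preprocess_input)
from tensorflow.keras.preprocessing import image

model = InceptionV3(weights="imagenet")  # the full 1000-way ImageNet classifier

img = image.load_img("dog.jpg", target_size=(299, 299))  # InceptionV3 input size
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

preds = model.predict(x)
print(decode_predictions(preds, top=3))  # top-3 (class id, name, score) tuples
```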
Slide 10: Transfer Learning
● Use a pre-trained model (ImageNet: 1,000,000 images, 1,000 categories)
  ○ Cheaper (no GPU required)
  ○ Faster
  ○ Prevents overfitting
● The penultimate (“bottleneck”) layer contains the image’s “essence” (CNN codes); it acts as a feature extractor
● Just add a linear classifier (Softmax; linear SVM) on the bottleneck features (sketch below)
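A minimal sketch of the bottleneck-as-feature-extractor idea, assuming the tf.keras applications API (not JW Player’s production code):

```python
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input

# Drop the 1000-way ImageNet head; global average pooling over the last
# convolutional block yields one 2048-dim "CNN code" per image.
extractor = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def cnn_codes(images):
    """images: float32 array of shape (n, 299, 299, 3), pixel values in [0, 255]."""
    return extractor.predict(preprocess_input(images))
```

The linear classifier from the slide is then trained on these 2048-dim codes rather than on raw pixels.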
Slide 11: Fine Tuning + Tips
● Change the classification layer and backpropagate through the layers behind it (sketch below)
● Idea: early layers learn basic filters; later layers are more dataset-specific
● Generally use a pre-trained model regardless of data size or similarity

Data size (per class):   < 500       > 500                  > 5,000
Similar to original      Too small   TL                     TL + FT earlier layers
Not similar              Too small   TL on earlier layers   TL + FT entire network
(TL = transfer learning, FT = fine tuning)
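The freeze-early / retrain-late recipe looks roughly like the following in tf.keras. The cut-off at layer 249 (the top two Inception blocks) comes from the Keras fine-tuning example and is one reasonable choice, not a value from this deck:

```python
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications.inception_v3 import InceptionV3

base = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

# Replace the classification layer with a two-class (Manual vs. Auto) head.
out = layers.Dense(1, activation="sigmoid")(base.output)
model = models.Model(base.input, out)

# Early layers hold generic filters: keep them frozen.
# Later layers are more dataset-specific: let gradients flow back into them.
for layer in base.layers[:249]:   # 249 = start of the top two Inception blocks
    layer.trainable = False
for layer in base.layers[249:]:
    layer.trainable = True

# A small learning rate avoids destroying the pre-trained weights.
model.compile(optimizer=optimizers.SGD(learning_rate=1e-4, momentum=0.9),
              loss="binary_crossentropy", metrics=["accuracy"])
```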
Slide 12: Other Applications of Transfer Learning
Image captioning: Google “Show and Tell”
https://github.com/tensorflow/models/tree/master/im2txt
Image search
http://www.slideshare.net/ScottThompson90/applying-transfer-learning-in-tensorflow
Slide 13: Training: Thesis
Train to differentiate between Manual and Auto thumbnails:
● Manual thumbnails are (usually) better than Auto
● Select Manual thumbnails with high views and play rate; Auto examples are selected at random from low-play videos
● We have a lot of examples: 10K+ Manual
● We used InceptionV3 pre-trained on ImageNet (training sketch below)
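Since Slide 10 mentions a Softmax or linear SVM on the bottleneck codes, a hypothetical version of the training and scoring step could be as simple as the following; the data here is random placeholder data standing in for real CNN codes:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Placeholder data; in practice these are 2048-dim bottleneck codes from the
# extractor above, labeled 1 = Manual (positive), 0 = Auto (negative).
train_codes = np.random.rand(200, 2048)
train_labels = np.random.randint(0, 2, size=200)

clf = LinearSVC(C=1.0)
clf.fit(train_codes, train_labels)

# Score candidate frames by signed distance from the decision boundary;
# the highest-scoring frames are the most "Manual-like".
candidate_codes = np.random.rand(100, 2048)
ranking = np.argsort(clf.decision_function(candidate_codes))[::-1]
```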
Slide 14: Training: Examples
Positive examples (Manual) vs. negative examples (Auto)
Slide 15: Video Pre-Filter
Use FFmpeg to select the top 100 frame candidates (sketch below). Methods:
● Color histogram changes to avoid dupes
● Coded macroblock information
● Remove “black” frames
● Measure motion vectors
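One way to approximate the change-detection part of this filter from Python is to shell out to FFmpeg’s built-in `select` scene-change expression; the 0.3 threshold and the output naming are illustrative assumptions, not the values JW Player uses:

```python
import subprocess

def extract_candidates(video_path, out_pattern="cand_%03d.jpg",
                       scene_threshold=0.3, max_frames=100):
    """Dump up to max_frames frames whose scene-change score (how different a
    frame is from the previous one) exceeds the threshold, which suppresses
    near-duplicate frames."""
    subprocess.run([
        "ffmpeg", "-i", video_path,
        "-vf", f"select='gt(scene,{scene_threshold})'",
        "-vsync", "vfr",                # emit only the selected frames
        "-frames:v", str(max_frames),
        out_pattern,
    ], check=True)

extract_candidates("input.mp4")
```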
Slide 16: Motion Vectors
Source: Sintel, Blender Studios
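FFmpeg can export the codec’s motion vectors directly and render an overlay like the Sintel frame shown on this slide; a sketch (input and output names are placeholders):

```python
import subprocess

# Have the decoder export motion vectors, then draw them with the codecview
# filter (mv=pf+bf+bb: forward/backward motion vectors of P- and B-frames).
subprocess.run([
    "ffmpeg", "-flags2", "+export_mvs", "-i", "sintel.mp4",
    "-vf", "codecview=mv=pf+bf+bb",
    "mv_overlay.mp4",
], check=True)
```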
Slide 17: Engineering
Slide 18: Demo: Evaluation Tool
Slide 19: Demo: Examples
Original Auto thumbnail (frame at the 10th second) vs. top-scored frames from the new model
Slide 20: What’s Next
● Refinements:
  ○ Fine-tuning of earlier layers
  ○ Other models: ResNetV2, Xception
  ○ Pre-filtering: adaptive, hardware acceleration
● Products:
  ○ New auto thumbnails
  ○ Thumbstrips
Slide 21: Resources
Blog posts:
● https://research.googleblog.com/2016/03/train-your-own-image-classifier-with.html
● https://github.com/tensorflow/models/tree/master/inception
● http://iamaaditya.github.io/2016/03/one-by-one-convolution/
● http://www.slideshare.net/ScottThompson90/applying-transfer-learning-in-tensorflow
● https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
● http://cs231n.github.io/transfer-learning/
● https://research.googleblog.com/2015/10/improving-youtube-video-thumbnails-with.html
● https://pseudoprofound.wordpress.com/2016/08/28/notes-on-the-tensorflow-implementation-of-inception-v3/
● https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
Papers:
● Rethinking the Inception Architecture for Computer Vision. https://arxiv.org/abs/1512.00567
● Xception: Deep Learning with Depthwise Separable Convolutions. https://arxiv.org/abs/1610.02357
● CNN Features off-the-shelf: an Astounding Baseline for Recognition. https://arxiv.org/abs/1403.6382
● DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. https://arxiv.org/abs/1310.1531
● How transferable are features in deep neural networks? https://arxiv.org/abs/1411.1792
