DEEP LEARNING IN PRACTICE
Tess Ferrandez – Microsoft - @TessFerrandez
TIM
MITYA
YANA
VITO
CLAUS
TESS
SHOTS ON GOAL
DETECTING CANCER
SHOPLIFTING
DEEP LEARNING
What’s so magical about it
int EstimatePrice(...){
    int price = 100000 +
        67000 * area_in_sqm +
        200000 * has_pool +
        100000 * new_kitchen +
        5000 * neighborhood_quality;
    return price;
}
Price = b + w1*area_in_sqm + w2*has_pool + ...
[LINEAR REGRESSION]
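To make the step from the hand-written EstimatePrice to learned weights concrete, here is a minimal sketch of linear regression by ordinary least squares in plain Python. It assumes the slide's constants are the "true" weights, generates synthetic houses from them (the value ranges and the sample count are made up for illustration), and checks that the fit recovers b and w1..w4. The function names are hypothetical, and a real project would more likely use a library routine.

```python
import random

def estimate_price(area_in_sqm, has_pool, new_kitchen, neighborhood_quality):
    # The hand-written rule from the slide, used here only to generate data.
    return (100000 + 67000 * area_in_sqm + 200000 * has_pool
            + 100000 * new_kitchen + 5000 * neighborhood_quality)

def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    # Prepend a constant 1.0 so the first weight plays the role of the bias b.
    X = [[1.0] + list(r) for r in rows]
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# Synthetic, noise-free samples (assumed ranges, not real housing data).
rng = random.Random(0)
houses = [(rng.uniform(40, 250), rng.randint(0, 1),
           rng.randint(0, 1), rng.randint(1, 10)) for _ in range(200)]
prices = [estimate_price(*h) for h in houses]

weights = fit_linear_regression(houses, prices)
print([round(w) for w in weights])  # b, w1, w2, w3, w4
```

Because the data is generated without noise, the fitted weights should land very close to the constants in EstimatePrice; with real, noisy prices they would only approximate them.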
GIVEN ENOUGH SAMPLES,
A NEURAL NETWORK WILL
FIND THE PATTERN
SEQUENCE / SPACE
CONVOLUTIONAL NN / RECURRENT NN / DENSE NN
[ 0.01949719, 0.09399229, -0.01618082, -0.00876935, 0.03146157, 0.06853894, 0.00096175, -0.06854118, -0.04771797, -0.05296798, 0.02119147, 0.00511259, 0.13726683, ... ]
INTERMEDIATE REPRESENTATION / EMBEDDING / SECRET CODE
RECOMMEND A BOOK
-1.0 1.0
ADULT / YOUTH
ADULT
FICTION
(-0.6, 0.4) *
(0.7, 0.9) *
FICTION ADULT
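One way to read the book example is that each book sits at a point in a small embedding space, and a recommendation is the book whose point best matches a reader's taste. The sketch below takes the two coordinate pairs from the slide as 2-D book embeddings; the reader vector, the book names, and the use of a dot product as the match score are assumptions made for illustration.

```python
def affinity(book_vec, reader_vec):
    # Dot product: large when book and reader point the same way on each axis.
    return sum(b * r for b, r in zip(book_vec, reader_vec))

# Coordinates from the slide; which axis is "fiction" and which is "adult"
# is an assumption, as are the book names.
books = {
    "book_a": (-0.6, 0.4),
    "book_b": (0.7, 0.9),
}
reader = (0.5, 0.8)  # hypothetical reader preferences on the same two axes

scores = {name: affinity(vec, reader) for name, vec in books.items()}
recommended = max(scores, key=scores.get)
print(scores, recommended)
```

A learned embedding works the same way, only with more dimensions, and nobody names the axes in advance; labels such as the ones on the following slides are interpretations made after training.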
MATH REFERENCES
US-CENTRIC
CHICK LIT
FUNNY
SCI-FI
LAWYERS
WOULD BRAD PITT PLAY A
CHARACTER IN THE MOVIE?
EMBEDDING
[ 0.01949719, 0.09399229, -0.01618082, -0.00876935, 0.03146157, 0.06853894, 0.00096175, -0.06854118, -0.04771797, -0.05296798, 0.02119147, 0.00511259, 0.1372668, ... ]
WORD
EMBEDDINGS
[ 0.01949719, 0.09399229, -0.01618082, -0.00876935, 0.03146157, 0.06853894, 0.00096175, -0.06854118, -0.04771797, -0.05296798, 0.02119147, 0.00511259, 0.1372668, ... ]
[FACENET]
T-SNE
Projection of
128D to 2D
FACE RECOGNITION DEMO
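Face recognition on top of embeddings usually comes down to comparing distances: two photos of the same person should map to nearby vectors. The sketch below illustrates that idea with short made-up vectors standing in for 128-D face embeddings. The vectors, the function names, and the 0.8 threshold are all hypothetical, and this is not the code behind the demo.

```python
import math

def l2_normalize(vec):
    # Scale the vector to unit length so distances are comparable.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def embedding_distance(a, b):
    # Euclidean distance between the unit-length versions of a and b.
    a, b = l2_normalize(a), l2_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_person(a, b, threshold=0.8):
    # threshold is an assumed value; in practice it is tuned on labeled pairs.
    return embedding_distance(a, b) < threshold

# Made-up 4-D stand-ins for 128-D face embeddings.
anchor = [0.9, 0.1, 0.3, 0.2]
same_face = [0.8, 0.2, 0.35, 0.15]
other_face = [-0.2, 0.9, -0.4, 0.1]

print(same_person(anchor, same_face), same_person(anchor, other_face))
```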
SEGMENTATION
encoder decoder
IN PRACTICE
The secrets behind the magic
Time for the Epoch
Training data
Validation data
MODEL LOSS ACCURACY
BASIC 0.2507 91.05%
OOPSIE DOOPSIE!
We’re overfitting
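Overfitting shows up when the training loss keeps falling while the validation loss turns upward. As a rough illustration, the sketch below scans two loss curves for the first epoch where that happens. The loss values are invented for the example and are not the numbers from this talk.

```python
def overfitting_starts_at(train_losses, val_losses):
    """Return the first epoch where training loss improves but validation
    loss gets worse, or None if that never happens."""
    for epoch in range(1, len(train_losses)):
        train_improved = train_losses[epoch] < train_losses[epoch - 1]
        val_worsened = val_losses[epoch] > val_losses[epoch - 1]
        if train_improved and val_worsened:
            return epoch
    return None

# Hypothetical per-epoch losses (epoch 0 first).
train_losses = [0.69, 0.52, 0.40, 0.31, 0.22, 0.15, 0.09]
val_losses = [0.70, 0.55, 0.47, 0.45, 0.48, 0.53, 0.60]

print(overfitting_starts_at(train_losses, val_losses))
```

Real curves are noisier, so a single bad epoch is weak evidence; looking at the trend over several epochs is safer.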
Chihuahua the movie
[DATA AUGMENTATION]
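Data augmentation creates extra training samples by applying label-preserving changes to the images you already have. A minimal sketch, assuming images are plain nested lists of pixel values and using only two transforms (horizontal flip and random crop); the function names are hypothetical, and deep learning frameworks ship their own, richer versions.

```python
import random

def horizontal_flip(image):
    # Mirror each row left to right.
    return [row[::-1] for row in image]

def random_crop(image, size, rng):
    # Cut a size x size window at a random position inside the image.
    top = rng.randint(0, len(image) - size)
    left = rng.randint(0, len(image[0]) - size)
    return [row[left:left + size] for row in image[top:top + size]]

def augment(image, size, rng):
    # Flip half of the time, then crop; each call yields a new variant.
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    return random_crop(image, size, rng)

rng = random.Random(42)
image = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 "image"
variants = [augment(image, 3, rng) for _ in range(5)]
print(variants[0])
```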
[DROPOUT]
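Dropout randomly silences a fraction of the activations during training so the network cannot lean on any single unit. The sketch below shows the common "inverted dropout" variant on a plain list of activations. It is a simplified illustration rather than any framework's implementation, and the rate of 0.5 is just an example value.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each activation with probability `rate` and
    scale the survivors by 1 / (1 - rate) so the expected sum is unchanged."""
    if not training or rate == 0.0:
        # At inference time the activations pass through untouched.
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]
print(dropout(activations, 0.5, rng))
print(dropout(activations, 0.5, rng, training=False))
```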
MODEL LOSS ACCURACY
BASIC 0.2507 91.05%
AUGMENTATION 0.1988 93.68%
TRANSFER LEARN 0.01253 99.47%
APPLIED MACHINE LEARNING
When the magic is gone, and we’re left with Software Engineering
UNDERSTAND THE BUSINESS NEEDS
WHAT IS THE PROBLEM?
HOW WILL THE MODEL BE USED? / REQUIREMENTS
HOW IS IT DONE TODAY?
IS IT FEASIBLE?
ETHICAL CONCERNS
UNDERSTAND THE BUSINESS NEEDS
MINE CLEAN EXPLORE
ENGINEER MODEL DEPLOY
LOTS OF LABELED SAMPLES and
NO CONSEQUENTIAL DECISIONS
SHOTS ON GOAL
LOUD CROWD
GOAL VISIBLE
SPEED/DIRECTION PLAYER DENSITY
PLAYER POSES
SCENE CHANGES
GOAL IN VIEW
NEGATIVE SAMPLING
5S VIDEOS - AROUND ACTION
NEGATIVE SAMPLES FROM ATTACKS
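The sampling idea above can be sketched as follows: cut a 5-second clip around each labeled shot, and take negative clips from attacks that did not end in a shot, so the negatives look similar to the positives. Centering the window on the timestamp, the overlap check, and all timestamps below are assumptions for illustration.

```python
def clip_window(center_s, length_s=5.0):
    # A clip of length_s seconds centered on the event, clamped at 0.
    half = length_s / 2
    return (max(0.0, center_s - half), center_s + half)

def overlaps(a, b):
    # True when two (start, end) windows share any time.
    return a[0] < b[1] and b[0] < a[1]

def build_clips(shot_times, attack_times, length_s=5.0):
    positives = [clip_window(t, length_s) for t in shot_times]
    # Keep only attack clips that do not overlap any shot clip.
    negatives = [w for w in (clip_window(t, length_s) for t in attack_times)
                 if not any(overlaps(w, p) for p in positives)]
    return positives, negatives

# Hypothetical timestamps in seconds from the start of the match video.
positives, negatives = build_clips([62.0, 310.5], [60.0, 150.0, 500.0])
print(positives)
print(negatives)
```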
VGG EMBEDDINGS
90+%
MODEL
ACCURACY
GRASS?
GOAL?
SCENE CHANGE
Shot / No Shot
https://github.com/tyiannak/pyAudioAnalysis
AUDIO
PEOPLE CLUSTERS - SIZES
https://github.com/fizyr/keras-retinanet
import numpy as np
from keras_retinanet import models
from keras_retinanet.utils.image import read_image_bgr, preprocess_image, resize_image

# load the pretrained RetinaNet model (ResNet-50 backbone)
model_path = 'c:/Tess/source/vision_samples/models/resnet50_coco_best_v2.1.0.h5'
model = models.load_model(model_path, backbone_name='resnet50')

# load and prepare the image
image_path = 'C:/Tess/source/vision_samples/data/images/basket_image.jpg'
image = read_image_bgr(image_path)
image = preprocess_image(image)
image, scale = resize_image(image)

# process image (add a batch dimension first)
# note: boxes are in resized-image coordinates; divide by scale to map them back
boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
GOAL / NO GOAL
SCENE CHANGE DETECTION
FOCUSED OPTICAL FLOW ON PLAYERS
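Scene change detection can be approximated very simply: a camera cut makes consecutive frames differ far more than ordinary motion does. A minimal sketch, assuming grayscale frames flattened into lists of pixel values and an arbitrary threshold; the frames below are toy data, and this is not necessarily the method used in the project.

```python
def mean_abs_diff(frame_a, frame_b):
    # Average per-pixel absolute difference between two equally sized frames.
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def scene_changes(frames, threshold=40.0):
    """Indices of frames that differ strongly from the frame before them.
    threshold is an assumed value that would need tuning on real footage."""
    return [i for i in range(1, len(frames))
            if mean_abs_diff(frames[i - 1], frames[i]) > threshold]

# Toy 4-pixel "frames": a dark scene followed by a cut to a bright one.
frames = [
    [10, 12, 11, 10],
    [11, 12, 10, 10],
    [200, 190, 210, 205],
    [201, 191, 209, 204],
]
print(scene_changes(frames))
```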
DETECTING CANCER
VERY FEW POSITIVE SAMPLES
EXTREME ACCURACY NEEDS
POTENTIAL FOR BIAS
HARD TO DIFFERENTIATE
ONLY PARTIALLY LABELED
EXTREMELY LARGE IMAGES
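With very few positive samples, plain accuracy says little: a model that never flags cancer can still score very high. The sketch below shows this with an invented data set of 10 positives in 1000 samples; the prevalence and the function name are assumptions for illustration.

```python
def accuracy_and_recall(y_true, y_pred):
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    positives = sum(y_true)
    accuracy = correct / len(y_true)
    # Recall: the share of actual positives that the model found.
    recall = true_positives / positives if positives else 0.0
    return accuracy, recall

# Hypothetical data: 1% positives, and a "model" that always predicts negative.
y_true = [1] * 10 + [0] * 990
always_negative = [0] * 1000

accuracy, recall = accuracy_and_recall(y_true, always_negative)
print(accuracy, recall)
```

The always-negative model is right 99% of the time and still misses every positive case, which is why recall (and precision) matter more than accuracy here.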
COLOR SEGMENTATION
CONVEX HULL
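A convex hull wraps a set of points in the smallest convex polygon, which is one way to turn the scattered pixels that a color mask picks out into a single region of interest. The sketch below uses Andrew's monotone chain algorithm; the sample points are made up, and treating them as mask pixel coordinates is an assumption about how the two steps connect.

```python
def cross(o, a, b):
    # Positive when o -> a -> b makes a counter-clockwise turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would create a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]

# Hypothetical (x, y) coordinates of pixels selected by a color mask.
mask_pixels = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
hull = convex_hull(mask_pixels)
print(hull)
```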
SHOPLIFTING
VERY FEW POSITIVE SAMPLES
VERY FEW SAMPLES PER ACTION TYPE
VERY SENSITIVE TO BIAS
COVERED FACES
ALONE
MEN 20-40 HOODIES
SHOPLIFTING POSES
12:32:00
CHRISTMAS
FISH EYE
DETECT not PREDICT
HUD ARTIFACTS
NEGATIVE SAMPLES FROM SAME VIDEOS
PEOPLE SHOPPING
POSE DETECTION
BACKGROUND SUBTRACTION
CLASSIFICATION AT THE BOX LEVEL
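Background subtraction, one of the steps listed above, separates moving people from the static store interior by comparing each frame with a slowly updated background estimate. A minimal sketch on flattened grayscale frames, assuming a running-average background; the alpha and threshold values and the pixel data are hypothetical.

```python
def update_background(background, frame, alpha=0.05):
    # Running average: the background drifts slowly toward the current frame.
    return [(1 - alpha) * b + alpha * f for b, f in zip(background, frame)]

def foreground_mask(background, frame, threshold=30):
    # A pixel is foreground when it differs clearly from the background.
    return [abs(f - b) > threshold for b, f in zip(background, frame)]

# Toy 6-pixel frames: an empty aisle, then something bright at pixels 2 and 3.
background = [100.0] * 6
frame = [101, 99, 180, 175, 100, 102]

mask = foreground_mask(background, frame)
background = update_background(background, frame)
print(mask)
```

Cropping or masking to the foreground before classification also hides fixed overlays and decorations, which is one way to keep the model from latching onto them.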
A LITTLE
DOMAIN KNOWLEDGE
GOES A LONG WAY
KISS - Keep it Simple …

Deep learning and computer vision