Deep learning and computer vision

An updated version of the Practical Guide to Deep Learning, with some real projects and our process for working through deep learning projects: less end-to-end, more joining several models together.

Published in: Engineering

  1. DEEP LEARNING IN PRACTICE Tess Ferrandez – Microsoft - @TessFerrandez
  2. TIM, MITYA, YANA, VITO, CLAUS, TESS
  3. SHOTS ON GOAL
  4. DETECTING CANCER
  5. SHOPLIFTING
  6. DEEP LEARNING: What’s so magical about it?
  7. ```
     // Hand-picked weights: each feature adds a fixed amount to the price
     int EstimatePrice(...)
     {
         int price = 100000
                   + 67000 * area_in_sqm
                   + 200000 * has_pool
                   + 100000 * new_kitchen
                   + 5000 * neighborhood_quality;
         return price;
     }
     ```
     Price = b + w1*area_in_sqm + w2*has_pool + ...
  8. [LINEAR REGRESSION]
  9. GIVEN ENOUGH SAMPLES, A NEURAL NETWORK WILL FIND THE PATTERN
  10. GIVEN ENOUGH SAMPLES, A NEURAL NETWORK WILL FIND THE PATTERN
  11. DENSE NN, CONVOLUTIONAL NN (SPACE), RECURRENT NN (SEQUENCE)
  12. [ 0.01949719, 0.09399229, -0.01618082, -0.00876935, 0.03146157, 0.06853894, 0.00096175, -0.06854118, -0.04771797, -0.05296798, 0.02119147, 0.00511259, 0.13726683, … ]: INTERMEDIATE REPRESENTATION / EMBEDDING / SECRET CODE
  13. RECOMMEND A BOOK
  14. ADULT / YOUTH: a scale from -1.0 to 1.0
  15. ADULT and FICTION axes: points at (-0.6, 0.4) and (0.7, 0.9)
  16. FICTION, ADULT, MATH REFERENCES, US-CENTRIC, CHICK LIT, FUNNY, SCI-FI, LAWYERS, WOULD BRAD PITT PLAY A CHARACTER IN THE MOVIE? → EMBEDDING [ 0.01949719, 0.09399229, -0.01618082, -0.00876935, 0.03146157, 0.06853894, 0.00096175, -0.06854118, -0.04771797, -0.05296798, 0.02119147, 0.00511259, 0.1372668, … ]
  17. WORD EMBEDDINGS
  18. [ 0.01949719, 0.09399229, -0.01618082, -0.00876935, 0.03146157, 0.06853894, 0.00096175, -0.06854118, -0.04771797, -0.05296798, 0.02119147, 0.00511259, 0.1372668, … ]
  19. [FACENET] t-SNE projection of 128-D embeddings to 2-D
  20. FACE RECOGNITION DEMO
  21. SEGMENTATION: encoder, decoder
  22. encoder, decoder
  23. encoder, decoder
  24. IN PRACTICE: The secrets behind the magic
  25. Time for the Epoch: training data vs. validation data
  26. BASIC model: loss 0.2507, accuracy 91.05%
  27. OOPSIE DOOPSIE! We’re overfitting
  28. Chihuahua the movie [DATA AUGMENTATION]
  29. [DROPOUT]
  30. BASIC: loss 0.2507, accuracy 91.05%; AUGMENTATION: loss 0.1988, accuracy 93.68%
  31. BASIC: loss 0.2507, accuracy 91.05%; AUGMENTATION: loss 0.1988, accuracy 93.68%; TRANSFER LEARN: loss 0.01253, accuracy 99.47%
  32. APPLIED MACHINE LEARNING: When the magic is gone, and we’re left with Software Engineering
  33. UNDERSTAND THE BUSINESS NEEDS
  34. UNDERSTAND THE BUSINESS NEEDS: WHAT IS THE PROBLEM? HOW WILL THE MODEL BE USED (REQUIREMENTS)? HOW IS IT DONE TODAY? IS IT FEASIBLE? ETHICAL CONCERNS
  35. UNDERSTAND THE BUSINESS NEEDS → MINE → CLEAN → EXPLORE
  36. UNDERSTAND THE BUSINESS NEEDS → MINE → CLEAN → EXPLORE → ENGINEER → MODEL → DEPLOY
  37. SHOTS ON GOAL: LOTS OF LABELED SAMPLES and NO CONSEQUENTIAL DECISIONS
  38. LOUD CROWD, GOAL VISIBLE, SPEED/DIRECTION, PLAYER DENSITY, PLAYER POSES
  39. SCENE CHANGES, GOAL IN VIEW, NEGATIVE SAMPLING
  40. 5-SECOND VIDEOS AROUND THE ACTION, NEGATIVE SAMPLES FROM ATTACKS
  41. VGG EMBEDDINGS
  42. 90+% MODEL ACCURACY. GRASS? GOAL? SCENE CHANGE
  43. AUDIO: ShotNoShot (https://github.com/tyiannak/pyAudioAnalysis)
  44. PEOPLE CLUSTERS: SIZES
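  45. https://github.com/fizyr/keras-retinanet
      ```python
      import numpy as np
      from keras_retinanet import models
      from keras_retinanet.utils.image import read_image_bgr, preprocess_image, resize_image

      # Load the COCO-trained RetinaNet detector with a ResNet-50 backbone
      model_path = 'c:/Tess/source/vision_samples/models/resnet50_coco_best_v2.1.0.h5'
      model = models.load_model(model_path, backbone_name='resnet50')

      # Read the image (BGR channel order), normalize it and resize it for the network
      image_path = 'C:/Tess/source/vision_samples/data/images/basket_image.jpg'
      image = read_image_bgr(image_path)
      image = preprocess_image(image)
      image, scale = resize_image(image)

      # process image: the model expects a batch, so add a leading batch dimension
      boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))

      # Boxes are in resized-image coordinates; map them back to the original image
      boxes /= scale
      ```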
  46. GOAL / NO GOAL
  47. SCENE CHANGE DETECTION
  48. FOCUSED OPTICAL FLOW ON PLAYERS
  49. DETECTING CANCER: VERY FEW POSITIVE SAMPLES, EXTREME ACCURACY NEEDS, POTENTIAL FOR BIAS
  50. HARD TO DIFFERENTIATE, ONLY PARTIALLY LABELED, EXTREMELY LARGE IMAGES
  51. COLOR SEGMENTATION
  52. CONVEX HULL
  53. SHOPLIFTING: VERY FEW POSITIVE SAMPLES, VERY FEW SAMPLES PER ACTION TYPE, VERY SENSITIVE TO BIAS
  54. COVERED FACES, ALONE, MEN 20-40, HOODIES; SHOPLIFTING POSES
  55. 12:32:00, CHRISTMAS, FISH EYE, HUD ARTIFACTS; DETECT, not PREDICT
  56. NEGATIVE SAMPLES FROM THE SAME VIDEOS: PEOPLE SHOPPING
  57. POSE DETECTION
  58. BACKGROUND SUBTRACTION
  59. CLASSIFICATION AT THE BOX LEVEL
  60. A LITTLE DOMAIN KNOWLEDGE GOES A LONG WAY
  61. KISS: Keep It Simple …
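
Slides 7 to 10 move from hand-picked weights in `EstimatePrice` to weights learned from data (Price = b + w1*area_in_sqm + w2*has_pool + ...). Here is a minimal Python sketch of that idea. It assumes synthetic, noise-free houses priced with slide 7's own coefficients for just two features. A single linear neuron trained by full-batch gradient descent should then recover b, w1 and w2. The helper names, the data ranges and the standardization step are illustrative choices, not code from the talk.

```python
import random


def make_houses(n, seed=0):
    """Synthetic (features, price) pairs priced with slide 7's formula, two features only."""
    rng = random.Random(seed)
    houses = []
    for _ in range(n):
        area_in_sqm = rng.uniform(40, 200)
        has_pool = rng.choice([0, 1])
        price = 100000 + 67000 * area_in_sqm + 200000 * has_pool
        houses.append(([area_in_sqm, has_pool], price))
    return houses


def fit_linear(samples, lr=0.1, epochs=500):
    """Learn b and w for price = b + sum(w_j * x_j) by gradient descent on squared error."""
    n, n_feat = len(samples), len(samples[0][0])
    # Standardize each feature so a single learning rate works for all of them.
    means = [sum(x[j] for x, _ in samples) / n for j in range(n_feat)]
    stds = [(sum((x[j] - means[j]) ** 2 for x, _ in samples) / n) ** 0.5
            for j in range(n_feat)]
    xs = [[(x[j] - means[j]) / stds[j] for j in range(n_feat)] for x, _ in samples]
    ys = [y for _, y in samples]

    b, w = 0.0, [0.0] * n_feat
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * n_feat
        for x, y in zip(xs, ys):
            err = b + sum(wj * xj for wj, xj in zip(w, x)) - y
            grad_b += err
            for j in range(n_feat):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]

    # Undo the standardization so the weights are in the original units.
    w_orig = [w[j] / stds[j] for j in range(n_feat)]
    b_orig = b - sum(w_orig[j] * means[j] for j in range(n_feat))
    return b_orig, w_orig


b, (w_area, w_pool) = fit_linear(make_houses(100))
print(round(b), round(w_area), round(w_pool))
```

Standardizing lets one learning rate suit both `area_in_sqm` (tens to hundreds) and `has_pool` (0 or 1). Without it, a step size small enough to keep the area weight stable barely moves the pool weight.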
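
Slides 12 to 18 describe embeddings: each book (or word, or face) becomes a vector of learned scores along traits such as FICTION or ADULT. Once items live in that space, "recommend a book" can be treated as a nearest-neighbour lookup. The sketch below uses cosine similarity, a common choice that the slides do not name. The two catalog points are the coordinates from slide 15; `liked` and the titles are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def recommend(liked, catalog):
    """Return the catalog title whose embedding points most nearly in the direction of `liked`."""
    return max(catalog, key=lambda title: cosine_similarity(liked, catalog[title]))


# The two points from slide 15 (axis order as printed on the slide).
catalog = {"book_a": (-0.6, 0.4), "book_b": (0.7, 0.9)}
# A hypothetical embedding for a book the reader already liked.
liked = (0.6, 0.8)

print(recommend(liked, catalog))  # prints: book_b
print(round(cosine_similarity(catalog["book_a"], catalog["book_b"]), 3))  # prints: -0.073
```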
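
Slides 25 to 27 plot training against validation data per epoch and flag overfitting when the two diverge. One standard response is early stopping: keep the weights from the epoch with the lowest validation loss, and stop once that loss has not improved for a few epochs. This sketch uses hypothetical loss values. `patience` and `min_delta` mirror the usual knobs but are illustrative here.

```python
def best_epoch(val_losses, patience=3, min_delta=0.0):
    """Index of the epoch with the lowest validation loss, scanning until it stops improving.

    Scanning halts once `patience` consecutive epochs fail to beat the best loss by
    more than `min_delta`; the weights from the returned epoch are the ones to keep.
    """
    best, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best


# Hypothetical validation losses that turn upward while training loss keeps falling.
val_loss = [0.95, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.67]
print(best_epoch(val_loss))  # prints: 3
```

In Keras, the same policy is available as `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)`.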
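
Slide 28 fights overfitting with data augmentation: the network sees altered copies of each training image, so it cannot simply memorize the originals. Below is a dependency-free sketch on a toy image stored as a list of pixel rows. Horizontal flips and random crops are two common augmentations chosen for illustration; the slide does not say which transforms were used.

```python
import random


def hflip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position inside the image."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment(image, crop_h, crop_w, rng):
    """One random variant of `image`: flip with probability 0.5, then crop."""
    if rng.random() < 0.5:
        image = hflip(image)
    return random_crop(image, crop_h, crop_w, rng)


rng = random.Random(42)
image = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 toy "image"
variants = [augment(image, 3, 3, rng) for _ in range(5)]
```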
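
Slide 29 names dropout as the second remedy. This is a minimal sketch of inverted dropout, the variant that Keras's `Dropout` layer implements. During training, each activation is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate). The expected value of every unit is therefore unchanged, and nothing needs rescaling at inference time.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout.

    During training each unit is zeroed with probability `rate` and survivors are
    scaled by 1 / (1 - rate); at inference the activations pass through unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
out = dropout([1.0] * 10, rate=0.5, rng=rng)  # each entry is 0.0 or 2.0
```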
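
Slides 39 and 40 build the shots-on-goal training set from 5-second videos around the action, with negative samples taken from attacks. The sketch below shows that windowing, assuming shot and attack times are already known as timestamps in seconds. `clip_window` and `build_clips` are hypothetical helpers. Dropping attack clips that overlap a shot clip is our own assumption, added so that no stretch of video carries both labels.

```python
def clip_window(center, length, duration):
    """(start, end) of a `length`-second window centred on `center`, kept inside [0, duration].

    Assumes length <= duration.
    """
    start = max(0.0, min(center - length / 2, duration - length))
    return (start, start + length)


def build_clips(shot_times, attack_times, duration, length=5.0):
    """Labelled clips: 1 around each shot, 0 around each attack that does not overlap a shot clip."""
    positives = [clip_window(t, length, duration) + (1,) for t in shot_times]
    negatives = []
    for t in attack_times:
        start, end = clip_window(t, length, duration)
        if all(end <= p_start or start >= p_end for p_start, p_end, _ in positives):
            negatives.append((start, end, 0))
    return positives + negatives


clips = build_clips([10.0, 300.0], [12.0, 100.0, 1.0], duration=302.0)
```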
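
Slide 42 questions what a 90+% model actually keyed on (grass? goal? scene change), and slide 47 lists scene change detection as its own component. The deck does not say how this was done. One simple, common technique, assumed here, is to compare the grey-level histograms of consecutive frames and flag a cut when they barely overlap. Frames are lists of rows of 0-255 integers. `threshold` and `bins` are hypothetical tuning values.

```python
def grey_histogram(frame, bins=16):
    """Normalized histogram of a grey-scale frame given as rows of 0-255 integers."""
    counts = [0] * bins
    for row in frame:
        for px in row:
            counts[px * bins // 256] += 1
    total = sum(counts)
    return [c / total for c in counts]


def scene_changes(frames, threshold=0.5, bins=16):
    """Indices of frames whose histogram differs sharply from the previous frame's.

    The distance is half the L1 difference between normalized histograms, so it
    runs from 0.0 (identical distributions) to 1.0 (no overlap at all).
    """
    changes = []
    prev = grey_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = grey_histogram(frames[i], bins)
        if 0.5 * sum(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            changes.append(i)
        prev = cur
    return changes


dark = [[20] * 4 for _ in range(4)]
bright = [[230] * 4 for _ in range(4)]
print(scene_changes([dark, dark, bright, bright, dark]))  # prints: [2, 4]
```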
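
Slides 44 and 45 run a keras-retinanet person detector and look at the sizes of people clusters. The sketch below covers that post-processing under three assumptions:

- `boxes`, `scores` and `labels` are the first batch entries of the `predict_on_batch` output on slide 45 (`boxes[0]`, `scores[0]`, `labels[0]`).
- Boxes are (x1, y1, x2, y2) in pixels.
- Label 0 means "person", as in the COCO label map of the keras-retinanet examples.

Grouping box centres by single linkage with a pixel threshold is our own choice, and `max_dist` is hypothetical.

```python
import math

PERSON = 0  # 'person' in the COCO label map (assumes a COCO-trained model)


def person_centres(boxes, scores, labels, min_score=0.5):
    """Centres of confident person detections; entries labelled -1 (e.g. padding) are skipped too."""
    return [((x1 + x2) / 2, (y1 + y2) / 2)
            for (x1, y1, x2, y2), score, label in zip(boxes, scores, labels)
            if label == PERSON and score >= min_score]


def cluster_sizes(points, max_dist):
    """Sizes of single-linkage clusters: points chained by gaps <= max_dist share a cluster."""
    unvisited = set(range(len(points)))
    sizes = []
    while unvisited:
        stack, size = [unvisited.pop()], 0
        while stack:
            i = stack.pop()
            size += 1
            near = [j for j in unvisited if math.dist(points[i], points[j]) <= max_dist]
            unvisited.difference_update(near)
            stack.extend(near)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Hypothetical detections: two nearby people, one far away, one low-score box, one padding entry.
boxes = [(0, 0, 10, 20), (12, 0, 22, 20), (200, 0, 210, 20), (5, 5, 15, 25), (-1, -1, -1, -1)]
scores = [0.9, 0.8, 0.95, 0.3, -1.0]
labels = [0, 0, 0, 0, -1]
centres = person_centres(boxes, scores, labels)
print(cluster_sizes(centres, max_dist=30))  # prints: [2, 1]
```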
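
Slides 49 and 53 both stress how few positive samples there are. Take a hypothetical set of 10 positives among 1,000 samples: a model that always answers "no" already scores 99% accuracy, so accuracy alone says little. One common mitigation is to weight the loss by class, though it is not necessarily the one used in these projects. The sketch uses the "balanced" heuristic weight_c = n_samples / (n_classes * count_c), the same formula scikit-learn applies for `class_weight="balanced"`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c): rare classes get proportionally larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * count) for c, count in counts.items()}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, heavily imbalanced labels: 990 negatives, 10 positives.
labels = [0] * 990 + [1] * 10
print(accuracy(labels, [0] * len(labels)))  # prints: 0.99, from always predicting "no"
weights = balanced_class_weights(labels)
```

The resulting dictionary can be passed to Keras as `model.fit(..., class_weight=weights)`.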
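
Slide 50 lists extremely large images among the cancer-detection challenges. The slide does not describe the approach taken. A common one, assumed here, is to cut each image into fixed-size, optionally overlapping tiles, classify the tiles, and combine the results. The sketch only computes tile corners, and the 256-pixel tile size is hypothetical.

```python
def tile_origins(height, width, tile, overlap=0):
    """Top-left (row, col) corners of tile x tile patches that cover the whole image.

    The last patch in each direction is shifted inward so it ends exactly at the
    image border. An image smaller than one tile gets a single patch at 0 and
    would need padding before it is cut.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    stride = tile - overlap

    def starts(size):
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # final tile flush with the border
        return positions

    return [(r, c) for r in starts(height) for c in starts(width)]


origins = tile_origins(1000, 512, tile=256)
print(len(origins), origins[0], origins[-1])  # prints: 8 (0, 0) (744, 256)
```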
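
Slide 51 applies color segmentation. This dependency-free sketch keeps the pixels whose hue falls inside a band, using the standard-library `colorsys` module. The hue band and the saturation and brightness floors are hypothetical values, not ones from the project. With OpenCV, `cv2.inRange` on an HSV image does the same job on real arrays.

```python
import colorsys


def color_mask(image, hue_lo, hue_hi, min_sat=0.3, min_val=0.2):
    """True where a pixel's hue lies in [hue_lo, hue_hi] and it is saturated and bright enough.

    `image` is a list of rows of (r, g, b) tuples with 0-255 channels; hue runs from
    0.0 to 1.0 as in colorsys, so red sits at both ends of the range.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            mask_row.append(hue_lo <= h <= hue_hi and s >= min_sat and v >= min_val)
        mask.append(mask_row)
    return mask


image = [[(255, 0, 0), (0, 0, 255)],
         [(250, 10, 10), (128, 128, 128)]]
print(color_mask(image, 0.0, 0.05))  # prints: [[True, False], [True, False]]
```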
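
Slide 52 follows segmentation with a convex hull: the smallest convex polygon enclosing a set of points, such as the pixels of a segmented region. The sketch uses Andrew's monotone chain algorithm, which is our choice. OpenCV's `cv2.convexHull` provides the same operation for contour points.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o): positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices counter-clockwise (y up), collinear points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half(seq):
        chain = []
        for p in seq:
            # Pop points that would make a clockwise or straight turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower, upper = half(pts), half(reversed(pts))
    return lower[:-1] + upper[:-1]  # the end points are shared by both halves


# Pixel coordinates of a segmented blob (hypothetical), including interior points.
blob = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
print(convex_hull(blob))  # prints: [(0, 0), (2, 0), (2, 2), (0, 2)]
```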
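
Slide 58 turns to background subtraction. Assuming a fixed camera, subtracting a model of the static scene leaves the moving people. Their regions could then be cropped for the box-level classification on slide 59. This minimal sketch keeps an exponential running-average background on grey-scale frames. `alpha` and `threshold` are hypothetical values. OpenCV offers more robust models such as `cv2.createBackgroundSubtractorMOG2`.

```python
def update_background(background, frame, alpha=0.05):
    """Running-average background model: bg <- (1 - alpha) * bg + alpha * frame."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(bg_row, fr_row)]
            for bg_row, fr_row in zip(background, frame)]


def foreground_mask(background, frame, threshold=30):
    """True where the frame differs from the background by more than `threshold` grey levels."""
    return [[abs(f - b) > threshold for b, f in zip(bg_row, fr_row)]
            for bg_row, fr_row in zip(background, frame)]


background = [[100.0] * 3 for _ in range(3)]
frame = [[100] * 3 for _ in range(3)]
frame[1][1] = 200  # something moves into the centre pixel
mask = foreground_mask(background, frame)
background = update_background(background, frame, alpha=0.5)
```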
