Solr and Machine Vision - Scott Cote, Lucidworks & Trevor Grant, IBM

Presented at Lucene/Solr Revolution 2017

Published in: Technology

  1. 1. Solr and Machine Vision Scott Cote and Trevor Grant Lucidworks / IBM
  2. 2. ABOUT US
     Trevor Grant
       • PMC: Apache Mahout, Apache Streams
       • IBM: Open Source Evangelist, “AI Engineer”
       • @rawkintrevo
       • www.rawkintrevo.org
     Scott Cote
       • Organizer: DFW Data Science
       • Mahout Fan
       • Lucidworks: Senior Software Engineer (Fusion Core Team)
       • @scottccote, @dfwdatascience
  3. 3. ACT 1 The Maths
  4. 4. DEEP LEARNING: AN OVERVIEW Deep learning is an exciting new technology with numerous applications, such as detecting cats in pictures, creating nonsensical manuscripts, “completing” unfinished symphonies, magically returning your company to profitability after decades of poor management through clever application of buzzwords, etc.
  5. 5. DEEP LEARNING: AN OVERVIEW
  6. 6. WHO’S INTERESTED IN “DEEP LEARNING”
  7. 7. WHAT I THINK YOU DO
  8. 8. DEEP LEARNING: SLOW TRAINING / PREDICTION TIMES
  9. 9. ALTERNATIVELY- EXPENSIVE (AND STILL SLOW)
  10. 10. BECAUSE YOU DON’T
  11. 11. IMAGE DETECTION
                           Haar Cascade Filters              Deep Learning
      Speed of training    Days                              Months
      Speed of prediction  Ultrafast                         Not great
      Accuracy             Slightly lower                    Higher to MUCH higher (domain)
      Type of recognition  Well understood problem (faces)   Poorly understood problem (dark matter)
      Best use-case:
        • You understand the domain
        • You can use multiple methods
        • You have limited resources:
            • Limited time
            • Limited compute power
            • Limited $$$
  12. 12. DON’T HURT YOUR EYES (IMAGE DETECTION PUN)
  13. 13. “FAST PREDICTION” IS RELATIVE
  14. 14. REAL TIME VIDEO- OK, NOT GOOD ENOUGH
  15. 15. LAST MEMES FOR A WHILE
  16. 16. LESS HATER-Y “Neural nets are universal function approximators” - Jake Mannix, talk an hour ago. When milliseconds count, we can’t afford to approximate. - Me, now.
  17. 17. ANCIENT PARADIGM Fast (Training and Prediction Time) Right (Highest accuracy) Cheap (In dollars and in hardware) GPU Deep Learning Haar-Cascade Filters CPU Deep Learning
  18. 18. CASCADE FILTER OVERVIEW  Scans for areas that match certain patterns.  Historical Context of Cascade Filters
  19. 19. CASCADE FILTER OVERVIEW
  20. 20. CASCADE FILTER
  21. 21. CASCADE FILTER (AREAWISE)
  22. 22. EIGENFACES (FACIAL RECOGNITION) OVERVIEW
      • Similar to Principal Component Analysis:
          - We seek to reduce the dimensionality of images (tens of thousands of individual pixels) to a composition of “eigenfaces”.
          - A face (as a 250x250 image) is represented as a vector of length 62500 (250 x 250 = 62500 pixels).
          - If we decompose into a combination of 130 Eigenfaces, we can represent a face with a vector of length 130.
      • Advantages over “Deep Learning”:
          - Quicker to identify a face
          - Quicker to retrain
          - Can instantaneously add a new face to the dataset
      • History of Eigenfaces:
  23. 23. WHY NOT LANDMARK RECOGNITION
  24. 24. EIGENFACES (FACIAL RECOGNITION)
  25. 25. EIGENFACES (FACIAL RECOGNITION)
  26. 26. EIGENFACES (PIXELWISE) Squares represent pixels…
  27. 27. EIGENFACES (PIXELWISE) Squares represent pixels…
  28. 28. EIGENFACES (PIXELWISE) Squares represent pixels…
  29. 29. EIGENFACES (PIXELWISE) 22 85 54 123 56 187 92 91 111 204 103 245 8 247 155 212 239 87 99 84 Squares represent pixels…
  30. 30. EIGENFACES (PIXELWISE) 22 85 54 123 56 187 92 91 111 204 103 245 8 247 155 212 239 87 99 84 Squares represent pixels…
  31. 31. EIGENFACES (FACIAL RECOGNITION)
  32. 32. EIGENFACES (FACIAL RECOGNITION)
  33. 33. EIGENFACES (FACIAL RECOGNITION) Matrix of Faces ith Image jth Pixel Position
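A minimal sketch of how the matrix of faces on this slide could be assembled, in plain Python. The tiny 2x2 "images" and the helper names `flatten_image` and `build_face_matrix` are illustrative assumptions, not code from the talk; with the real 250x250 images each row would have 62500 entries.

```python
def flatten_image(image):
    """Flatten a 2-D grid of grayscale pixel values into one row vector."""
    return [pixel for row in image for pixel in row]


def build_face_matrix(images):
    """Stack flattened images: row i is the i-th image, column j the j-th pixel position."""
    rows = [flatten_image(img) for img in images]
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all images must have the same number of pixels")
    return rows


# Two hypothetical 2x2 grayscale images (values 0-255), standing in for 250x250 faces.
images = [
    [[22, 85], [54, 123]],
    [[56, 187], [92, 91]],
]
face_matrix = build_face_matrix(images)
```

Here `face_matrix[i][j]` is the value of pixel position `j` in image `i`, which is the layout the decomposition on the following slides assumes.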
  34. 34. EIGENFACES: SINGULAR VALUE DECOMPOSITION U x V = Matrix of Faces
  35. 35. EIGENFACES: MATRIX V U x V (Eigenfaces) = Matrix of Faces
  36. 36. EIGENFACES: MATRIX U U x V = Matrix of Faces. Linear combinations of Eigenfaces required to form the Nth Face: Nth Face = 2.456 x [Eigenface] - 7.2345 x [Eigenface] + 0.4125 x [Eigenface]
  37. 37. NEW FACES y: the new face. V Transpose: each column is an eigenface.
  38. 38. NEW FACES y = X β, solved for β by simple regression (OLS, Ordinary Least Squares)
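The regression step on this slide can be sketched in plain Python as follows. This is an illustration under assumptions, not the Mahout code used in the talk: `X` holds one eigenface per column, `y` is the flattened new face, and the coefficients are found from the normal equations (X^T X) β = X^T y. The toy 4-pixel, 2-eigenface data and all function names are made up for the example.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def ols_coefficients(X, y):
    """Return beta minimizing ||y - X beta||^2 via the normal equations."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)


# Toy data (assumption): 4 pixel positions, 2 eigenfaces as columns of X.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
true_beta = [2.0, -3.0]
y = [sum(x * b for x, b in zip(row, true_beta)) for row in X]
beta = ols_coefficients(X, y)  # recovers the weights used to build y
```

With 130 eigenfaces, `beta` would be the length-130 vector that represents the face in place of its 62500 pixels.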
  39. 39. RECAP Cascade Filters: Facial Detection (where is there, or is there, a “face” in this picture?). Eigenfaces: Facial Recognition (WHO am I looking at?). Neural nets / deep learning could do both in one pass, but very, very slowly.
  40. 40. ACT 2 Real-time Facial Recognition
  41. 41. CREATING THE EIGENFACES: COMPUTING Apache Spark: an in-memory Map-Reduce engine (it has a weak ML library, which we won’t use). Apache Mahout: provides a Distributed Stochastic Singular Value Decomposition method (it also provides a mathematically expressive Scala DSL and GPU/CPU acceleration). Creating Eigenfaces: the Spark job took 45 minutes on a desktop with 32GB RAM and 8 CPUs @ 3.9GHz, but I was also watching Rick and Morty. THIS JOB CAN BE GPU ACCELERATED BY CHANGING ONE DEPENDENCY.
  42. 42. CREATING THE EIGENFACES: DATASET University of Massachusetts Faces in the Wild dataset: 10k images of labeled faces from the internet. Each image is 250x250 (62500 pixels). 10k Faces Dataset Matrix (10,000 x 62500): each row corresponds to 1 image of a face; each column corresponds to a given pixel position.
  43. 43. APACHE MAHOUT ON APACHE SPARK CALCULATES EIGENFACES Linear Combos x Eigenfaces = 10k Faces Dataset Matrix
  44. 44. OPEN CV DETECTS FACES IN VIDEO FRAME
  45. 45. SCALE THE IMAGE TO 250X250
  46. 46. MAHOUT DECOMPOSES FACERECT INTO LINEAR COMBINATION OF EIGENFACES VECTOR [Figure: face rect pixels 1-9 and Eigenfaces, combined by Ordinary Least Squares into a Linear Combination of Eigenfaces]
  47. 47. SEARCH SOLR FOR MATCHING VECTOR
  48. 48. THE QUERY, RESPONSE, AND DOCUMENTS Document: { name_s: “Richard Hatch”, e0_d : 1.512 e1_d : 5.125 e2_d : -15.1256 e3_d : 4.241 … e129_d : 1.245 ... call_sign_s : “Apollo” last_seen_dt : 2017-02-08T08:52:12 alias_s : “Tom Zarek” ... }
  49. 49. THE QUERY, RESPONSE, AND DOCUMENTS Query; Document: { name_s: “Richard Hatch”, e0_d : 1.512 e1_d : 5.125 e2_d : -15.1256 e3_d : 4.241 … e129_d : 1.245 ... call_sign_s : “Apollo” last_seen_dt : 2017-02-08T08:52:12 alias_s : “Tom Zarek” ... }
  50. 50. THE QUERY, RESPONSE, AND DOCUMENTS Query ALL Documents Euclidean Distance Ascending Order
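What this slide asks Solr to do can be sketched locally in plain Python: score every document by Euclidean distance to the query vector and return them in ascending order. The three-dimensional vectors, the `"e"` key, and the function names are assumptions for illustration; the real documents carry 130 coefficients (e0 to e129) and the ranking happens inside Solr.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_distance(query, documents, rows=5):
    """Score ALL documents against the query and return the closest `rows`."""
    scored = [
        {"name_s": doc["name_s"], "calcDist": euclidean(query, doc["e"])}
        for doc in documents
    ]
    scored.sort(key=lambda r: r["calcDist"])  # ascending: closest first
    return scored[:rows]


# Hypothetical documents with 3 coefficients each instead of 130.
documents = [
    {"name_s": "Starbuck", "e": [4.0, 1.0, 0.0]},
    {"name_s": "Apollo", "e": [1.0, 5.0, -15.0]},
    {"name_s": "Caprica 6", "e": [-8.0, 2.0, 9.0]},
]
query = [1.0, 5.0, -14.0]
results = rank_by_distance(query, documents)
```

The first entry of `results` is the nearest stored face, which is what the later recognize-or-add decision looks at.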
  51. 51. THE QUERY, RESPONSE, AND DOCUMENTS Response: [ { “name_s” : “Apollo”, “calcDist” : 1256.254, “lastseen_pdt”: 1979-05-11T08:41:25}, { “name_s” : “Tom Zarek”, “calcDist” : 1826.529, “lastseen_pdt”: 2017-02-07T08:41:25}, { “name_s” : “Starbuck”, “calcDist” : 5826.529, “lastseen_pdt”: 2017-09-14T15:22:56}, { “name_s” : “Caprica 6”, “calcDist” : 7119.525, “lastseen_pdt”: 2017-09-14T08:41:25}, … ]
  52. 52. RECOGNIZE OR NEW ENTITY? Response Recognize? Yes No Add Person to Solr Done
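A minimal sketch of the decision on this slide, assuming the response is already sorted by ascending distance. The distance threshold `max_distance`, the function name, and the in-memory `known_names` list (standing in for Solr) are all assumptions; the `HUMAN-0001` naming pattern is borrowed from a later slide.

```python
def recognize_or_add(response, known_names, max_distance):
    """Return (name, is_new) for the closest match in a distance-sorted response."""
    if response and response[0]["calcDist"] <= max_distance:
        return response[0]["name_s"], False  # recognized: done
    # Not recognized: register a new person (here only in a local list).
    new_name = "HUMAN-%04d" % (len(known_names) + 1)
    known_names.append(new_name)
    return new_name, True


known = []
threshold = 2000.0  # hypothetical tolerance, would need tuning on real data
hit = recognize_or_add([{"name_s": "Apollo", "calcDist": 1256.254}], known, threshold)
miss = recognize_or_add([{"name_s": "Caprica 6", "calcDist": 7119.525}], known, threshold)
```

In this run `hit` names an existing person, while `miss` creates a new entry because the best match is too far away.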
  53. 53. WHO DOES THE WORK? (Response: Recognize?)
      Local
        Advantages:
          - Edge device can use context clues to make the final decision
        Disadvantages:
          - Requires more hardware at the edge to “think”
      On Solr
        Advantages:
          - Leverage advantages of Solr
          - Less hardware required at the edge
        Disadvantages:
          - “Contextual clues” must be encoded in the query
  54. 54. ACT 3 Building your own Cylons
  55. 55. DRONES ARE GETTING CHEAP
      • Drone 2-Pack
          - $99.99
          - Controlled via smartphone
      • FPV Camera
          - $39.99 / ea
          - Video over Wifi via RTSP
      Video-enabled drones for ~$90 each
  56. 56. CHALLENGES AND OPPORTUNITIES
      Challenge:
          - Cascade Filters inconsistently frame the face
          - “Ghost Faces”
          - Eigenfaces are not robust to facial expressions, changes in light, etc.
  57. 57. CHALLENGES AND OPPORTUNITIES
      Opportunity:
          - Video gives us a lot more “context clues” than still frames.
          - People don’t sporadically disappear and appear.
          - Someone seen recently is more likely to be present than someone seen long ago.
  58. 58. OPENCV DETECTS FACES IN A VIDEO FRAME
  59. 59. OPENCV DETECTS FACES IN A VIDEO FRAME
  60. 60. OPENCV DETECTS FACES IN A VIDEO FRAME
  61. 61. OPENCV DETECTS FACES IN A VIDEO FRAME
  62. 62. OPENCV DETECTS FACES IN A VIDEO FRAME
  63. 63. OPENCV DETECTS FACES IN A VIDEO FRAME
  64. 64. 2 PROBLEMS 1. The face is inconsistently detected (Eigenfaces is sensitive to this). 2. Shadows, patterns on clothes, etc. cause “ghost faces” to be identified sporadically.
  65. 65. OPENCV DETECTS FACES IN A VIDEO FRAME
  66. 66. SOLUTION: CLUSTERING / FILTERING / WINDOWING
      • Proposal: Cluster faces by location in frame. If a cluster has fewer than N faces, remove all faces in that cluster (e.g. ghost clusters).
      • Problem 2: People move around the frame over time.
      • Proposal 2: Break frames up into a sliding window of M seconds.
      • Problem 3: Clustering/machine learning can be somewhat computationally expensive.
      • Proposal 3: Canopy clustering (an old but still effective one-pass clustering method).
  67. 67. CANOPY CLUSTERING
      • Create N-second window
      • Cluster faces in window
      • Quick and dirty clustering, but effective:
          - The first point is a “center”.
          - All points within distance t2 of a center are “in that cluster”.
          - If a point is not within t2 of any cluster, it becomes a new cluster center.
  68. 68. OPENCV DETECTS FACES IN VIDEO FRAME t2 = max square width
  69. 69. CANOPY CLUSTERING TO REMOVE “GHOST” FACES
      t2 = max square width
      • First rect: new cluster
      • Second rect: within one width of first rect (same cluster)
      • Third rect: within one width of first rect (same cluster)
      • Fourth rect: NOT within one width of first rect (new cluster)
      • Fifth rect: within one width of first rect (same cluster)
      Finally, any cluster with fewer than two entities in the window gets filtered out.
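The walk-through on this slide can be sketched as a one-pass clustering loop in plain Python. This is the simplified single-threshold variant the slides describe (only t2), not the full two-threshold canopy algorithm and not Mahout's implementation; the rectangle centers, the value of `t2`, and the function names are made up for illustration.

```python
import math


def canopy_cluster(points, t2):
    """One-pass clustering: join the first center within t2, else start a new cluster."""
    clusters = []
    for p in points:
        for c in clusters:
            cx, cy = c["center"]
            if math.hypot(p[0] - cx, p[1] - cy) <= t2:
                c["members"].append(p)
                break
        else:
            # No existing center is close enough: p becomes a new center.
            clusters.append({"center": p, "members": [p]})
    return clusters


def drop_ghosts(clusters, min_size=2):
    """Filter out clusters with fewer than min_size detections in the window."""
    return [c for c in clusters if len(c["members"]) >= min_size]


# Hypothetical centers of five detected face rects within one window.
rect_centers = [(100, 100), (104, 98), (97, 103), (300, 220), (101, 101)]
t2 = 50.0  # assumed "max square width" in pixels
clusters = canopy_cluster(rect_centers, t2)
kept = drop_ghosts(clusters)
```

As on the slide, the fourth rect starts its own cluster, and because nothing else joins it, that cluster is treated as a ghost and dropped.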
  70. 70. CLUSTERING BECAUSE WE DON’T KNOW HOW MANY TRUE FACES THERE ARE
  71. 71. SETTING THE “LOOSE DISTANCE” Half the width of largest rectangle is the “Loose Distance”
  72. 72. SETTING THE “TIGHT DISTANCE” Half the width of largest rectangle is the “Loose Distance”
  73. 73. ADAPTIVE HYPER-PARAMETERS A very simple machine learning algorithm adapts itself in real time to the input it is receiving… A.I. is a strong buzzword but...
  74. 74. A BETTER WAY TO SOLR (1)
  75. 75. SEARCH SOLR FOR MATCHING VECTOR
  76. 76. A BETTER WAY TO SOLR Document: { name_s: “Richard Hatch”, e0_d : 1.512 e1_d : 5.125 e2_d : -15.1256 e3_d : 4.241 … e129_d : 1.245 ... call_sign_s : “Apollo” last_seen_dt : 2017-02-08T08:52:12 alias_s : “Tom Zarek” ... }
  77. 77. LEVERAGE PAYLOAD CAPABILITY OF TERM FIELDS
      New Query (x_e0 … x_e129 stand for the query face's coefficients):
        q="*:*"
        &sort=dist(
           2
          ,payload("e_dpf","e0")
          ,payload("e_dpf","e1")
          ...
          ,payload("e_dpf","e129")
          ,x_e0
          ,x_e1
          ...
          ,x_e129
        ) asc
        &rows=5
      Document:
        { name_s : “Richard Hatch” ,e_dpf : “e0|1.512 e1|5.125 … e129|1.245” … ,call_sign_s : “Apollo” ,last_seen_dt : “2017-02-08T08:52:12” ,alias_s : “Tom Zarek” … }
      Thank you Erik Hatcher (SOLR-1485, https://issues.apache.org/jira/browse/SOLR-1485)
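Writing out 130 `payload(...)` terms by hand is error-prone, so the query string on this slide could be generated. The sketch below only builds and URL-encodes the parameters; it does not contact Solr. The field name `e_dpf` and the term names `e0`, `e1`, … follow the slide, while the function name, the two-value example vector, and the unquoted argument style are assumptions to verify against your Solr version.

```python
from urllib.parse import parse_qs, urlencode


def build_distance_query(query_vector, payload_field="e_dpf", rows=5):
    """Build Solr params that sort all documents by Euclidean distance to query_vector."""
    args = ["2"]  # power 2 makes dist() a Euclidean distance
    # First point: the stored coefficients, read from the payload field.
    args += ["payload(%s,e%d)" % (payload_field, i) for i in range(len(query_vector))]
    # Second point: the query face's coefficients as literal values.
    args += [repr(float(v)) for v in query_vector]
    sort = "dist(%s) asc" % ",".join(args)
    return urlencode({"q": "*:*", "sort": sort, "rows": rows})


# Hypothetical 2-coefficient query vector; a real one would have 130 values.
params = build_distance_query([1.5, -2.0])
decoded = parse_qs(params)  # decode again just to inspect the result
```

The resulting `params` string would be appended to a collection's select URL; the host and collection name are deliberately left out here.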
  78. 78. THESE METHODS
  79. 79. BETTER SCALING Cluster1 Cluster2 Cluster3 Cluster2a Cluster2b
  80. 80. WINDOWING  A video is just a stream of Frames  Apache Flink gives us a nice API for splitting/joining the stream, as well as creating windows and applying functions to the windows. (Other bonuses too)
  81. 81. ENTER THE STREAM: OPENCV DETECTS FACES
  82. 82. ENTER THE STREAM: MAHOUT CANOPY CLUSTER An n-m-second sliding window: every m seconds this window emits a set of clusters based on the last n seconds of data. For example, 5-1: every 1 second, a new set of “face zones” based on faces detected in the previous 5 seconds.
  83. 83. MAHOUT CANOPY CLUSTER An n-m-second sliding window: every m seconds this window emits a set of clusters based on the last n seconds of data. For example, 5-1: every 1 second, a new set of “face zones” based on faces detected in the previous 5 seconds. (Or 0.5 / 0.1: every 10th of a second, based on the last half second.)
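The n-m-second window semantics can be illustrated in plain Python. This is a batch sketch of the idea only, not Apache Flink's windowing API: every `slide` seconds it emits the items whose timestamps fall in the preceding `size` seconds. The event data and the function name are assumptions.

```python
def sliding_windows(events, size, slide):
    """events: (timestamp_seconds, item) pairs sorted by time.

    Returns (window_end, items) pairs, one per `slide` seconds, where items
    are those with window_end - size <= timestamp < window_end.
    """
    if not events:
        return []
    first, last = events[0][0], events[-1][0]
    windows = []
    k = 1
    while first + k * slide <= last + slide:
        end = first + k * slide
        start = end - size
        items = [item for ts, item in events if start <= ts < end]
        windows.append((end, items))
        k += 1
    return windows


# Hypothetical detections: (second, face id), using the 5-1 window from the slide.
events = [(0, "a"), (1, "b"), (2, "c"), (6, "d")]
windows = sliding_windows(events, size=5, slide=1)
```

Each emitted window would then be handed to the clustering step, so a detection contributes to several consecutive windows before it ages out.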
  84. 84. ENTER THE STREAM: A LAG Here a small lag is introduced.
  85. 85. “APPLY THE CLUSTERS” BASED ON FIT CANOPIES Face Cluster 1 Face Cluster 1 Face Cluster 1 Face Cluster 1 Face Cluster 2 (only 1 image- Ghost)
  86. 86. STORE OUR MEMORIES IN SOLR
  87. 87. STORE OUR MEMORIES IN SOLR METHOD 1: AVERAGING
      1. Take all face rects in the cluster.
      2. Average them all together.
      3. Search Solr for this averaged image.
      4. If this “Average Face” matches a face in the cluster (within some distance tolerance), we assign that name to every face in the cluster and write all faces to Solr under that person’s name.
      5. Otherwise, we create a new name and write all faces to Solr under the new name.
      6. This really doesn’t work very well at all.
      7. ADVANTAGE: Minimizes network traffic / Solr taxation.
  88. 88. STORE OUR MEMORIES IN SOLR METHOD 2: “VOTING”
      1. Search EACH face.
      2. Get the list of names in the results.
      3. Assign points based on rank or distance.
      4. Aggregate points across all rects; the highest points “wins”. If the winner meets some minimum threshold, assign that name.
      5. Otherwise, we create a new name and write all faces to Solr under the new name.
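The voting steps above can be sketched in plain Python. The slide leaves the scoring rule open ("rank or distance"), so the reciprocal-rank points used here are an assumption, as are the function name, the threshold values, and the example result lists.

```python
from collections import defaultdict


def vote(results_per_face, min_points):
    """Aggregate ranked name lists (one per face rect) and return the winner or None."""
    points = defaultdict(float)
    for ranked_names in results_per_face:
        for rank, name in enumerate(ranked_names):
            points[name] += 1.0 / (rank + 1)  # assumed rule: 1, 1/2, 1/3, ...
    if not points:
        return None
    winner = max(points, key=points.get)
    # None means "no confident match": the caller would create a new name.
    return winner if points[winner] >= min_points else None


# Hypothetical search results for three face rects in one cluster.
results_per_face = [
    ["Apollo", "Tom Zarek"],
    ["Apollo", "Starbuck"],
    ["Tom Zarek", "Apollo"],
]
confident = vote(results_per_face, min_points=2.0)
unsure = vote(results_per_face, min_points=3.0)
```

Because each rect votes separately, one badly framed detection cannot decide the name on its own, which is the point of voting over averaging.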
  89. 89. PUNCHLINE: Second benefit of Eigenfaces over “deep learning”: quickly add faces
  90. 90. WHY APACHE SOLR
      - Capable of storing large amounts of data
      - Scales to petabytes; text oriented
      - Numeric compute friendly
      - Many ways to store different types of data
  91. 91. WHY APACHE MAHOUT
      - Engine agnostic (Spark/Flink/Standalone/RYO)
      - Native acceleration on CPU/GPU/CUDA
      - Possible to accelerate BLAS operations on ANY arch (edge devices)
      - Mathematically expressive Scala
  92. 92. WHY APACHE FLINK
      - Sophisticated windowing functions
      - Complex Event Library
      - Scales linearly (1 drone vs Army of Drones)
  93. 93. TECHNICALLY “BORG-STYLE” AI, NOT CYLONS   A finer technical point for those familiar with the Cylons and the Borg  “Hive Mind” Architecture
  94. 94. LEARNING PROPAGATES QUICKLY “NEW HUMAN-0001” / “OH, hai HUMAN-0001”
  95. 95. SHAPE OF THINGS TO COME. The “Science Fiction” of 10 years ago is today the domain of hobbyists. The demo presented here is “Science Fair” grade AI. Vlad Putin was recently talking about how “it is undesirable for anyone to monopolize AI”. (Yay Apache!)
  96. 96. DEMO Here’s a fun video while I set up
  97. 97. Thank You