
Practical Computer Vision: A Problem-Driven Approach Towards Learning CV/ML/DL

Albert Chen, Ph.D., 2017-07-26, at Academia Sinica, Taiwan
Invited talk during Academia Sinica's AI Month


  1. 1. Practical Computer Vision A problem-driven approach towards learning CV/ML/DL Albert Y. C. Chen, Ph.D. Vice President, R&D Viscovery
  2. 2. Albert Y. C. Chen, Ph.D. • Experience: 2017-present: Vice President of R&D @ Viscovery; 2016-2017: Chief Scientist @ Viscovery; 2015: Principal Scientist @ Nervve Technologies; 2013-2014: Computer Vision Scientist @ Tandent; 2011-2012: GE Global Research • Education: Ph.D. in Computer Science, SUNY-Buffalo; M.S. in Computer Science, NTNU; B.S. in Computer Science, NTHU
  3. 3. 1. W. Wu, A. Y. C. Chen, L. Zhao, and J. J. Corso. Brain tumor detection and segmentation in a CRF framework with pixel-wise affinity and superpixel-level features. International Journal of Computer Assisted Radiology and Surgery, 2015. 2. S. N. Lim, A. Y. C. Chen, and X. Yang. Parameter Inference Engine (PIE) on the Pareto Front. In Proceedings of International Conference on Machine Learning, AutoML Workshop, 2014. 3. A. Y. C. Chen, S. Whitt, C. Xu, and J. J. Corso. Hierarchical supervoxel fusion for robust pixel label propagation in videos. In Submission to ACM Multimedia, 2013. 4. A. Y. C. Chen and J. J. Corso. Temporally consistent multi-class video-object segmentation with the video graph-shifts algorithm. In Proceedings of IEEE Workshop on Applications of Computer Vision, 2011. 5. D. R. Schlegel, A. Y. C. Chen, C. Xiong, J. A. Delmerico, and J. J. Corso. Airtouch: Interacting with computer systems at a distance. In Proceedings of IEEE Workshop on Applications of Computer Vision, 2011. 6. A. Y. C. Chen and J. J. Corso. On the effects of normalization in adaptive MRF Hierarchies. In Proceedings of International Symposium CompIMAGE, 2010. 7. A. Y. C. Chen and J. J. Corso. Propagating multi-class pixel labels throughout video frames. In Proceedings of IEEE Western New York Image Processing Workshop, 2010. 8. A. Y. C. Chen and J. J. Corso. On the effects of normalization in adaptive MRF Hierarchies. Computational Modeling of Objects Represented in Images, pages 275–286, 2010. 9. Y. Tao, L. Lu, M. Dewan, A. Y. C. Chen, J. J. Corso, J. Xuan, M. Salganicoff, and A. Krishnan. Multi-level ground glass nodule detection and segmentation in CT lung images. Medical Image Computing and Computer-Assisted Intervention, 2009. 10. A. Y. C. Chen, J. J. Corso, and L. Wang. Hops: Efficient region labeling using higher order proxy neighborhoods. In Proceedings of IEEE International Conference on Pattern Recognition, 2008.
  4. 4. Some work done before I caught the startup fever Freestyle Sketching Stage AirTouch waits in background for the initialization signal Initialize Terminate Output image database Start: Results CBIR query Airtouch HCI interface for Content-based Image Retrieval
  5. 5. Interactive Segmentation & Classification • Segmentation then classification: • computationally more efficient, • results in much higher classification accuracy. • Pioneered the “pixel label propagation” field. • First to utilize superpixels and supervoxels for the task. FG Traditional Spatial Propagation Pixel label map Label a subset of pixels BG Spatio-temporal Propagation time
  6. 6. Image/Video Object Recognition and Content Understanding (figure: low-level, mid-level, and high-level reasoning over time; an ontology linking objects and persons through activities such as approaches, carries, gives, and receives)
  7. 7. Learning and Adapting Optimal Classifier Parameters subspace B subspace A subspace C Image-level feature space priors Patch-level feature space posterior probability suggest optimal parameter configuration
  8. 8. Graphical Models and Stochastic Optimization (a) The space-time volume of a video showing the objects (A-F) and their appearing time-span. (b) The temporal relationship graph. An edge between two vertices means that the two objects overlap in time. (c) The goal is to cover all objects with the smallest number of "ground truth key frames". (d) This translates to iteratively solving the max clique problem until all vertices belong to a clique. (figure labels: space, time; key 1, key 2; frame t-1, frame t; layer n, layer n+1; Temporal Shift; Shift µ)
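The key-frame selection described in (c) and (d) can be sketched roughly as follows. This is only an illustration, not the author's algorithm: it assumes each object is visible over one contiguous span of frames, so a set of objects that all overlap at some frame forms a clique in the temporal relationship graph. The function name and the example spans are hypothetical, and the greedy choice is a heuristic that is not guaranteed to give the minimum number of key frames.

```python
def select_key_frames(spans):
    """Greedily pick key frames until every object appears in one of them.

    spans maps an object name to its (first_frame, last_frame), inclusive.
    Returns a list of (frame, objects_covered_by_that_frame).
    """
    uncovered = set(spans)
    keys = []
    while uncovered:
        def visible(t):
            # Uncovered objects whose time span contains frame t.
            return {o for o in uncovered if spans[o][0] <= t <= spans[o][1]}

        # The largest group of mutually overlapping spans always shares the
        # earliest end point among its members, so end points suffice.
        candidates = sorted({spans[o][1] for o in uncovered})
        best = max(candidates, key=lambda t: len(visible(t)))
        covered = visible(best)
        keys.append((best, sorted(covered)))
        uncovered -= covered
    return keys


# Hypothetical spans loosely mirroring objects A-F on the slide.
example = {"A": (0, 5), "B": (2, 7), "E": (3, 8), "F": (4, 6),
           "C": (10, 14), "D": (12, 16)}
print(select_key_frames(example))
```

With these made-up spans, two key frames are enough: one for A, B, E, F and one for C, D.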
  9. 9. Medical Imaging and Geospatial Imaging GNN detection and segmentation in Lung CT geospatial imaging: building detection Brain tumor detection and segmentation in MR images.
  10. 10. Why are we here today? To make a better change for our future.
  11. 11. Change is the only constant -Heraclitus (535 BC - 475 BC)
  12. 12. Change is the only constant -Heraclitus (535 BC - 475 BC)
  13. 13. Why Risk Innovating? • Good business models NEVER last forever. • Average “shelf life” on S&P 500: 20 years. • 100-year-old companies constantly reinvent themselves every 10-20 years. • Startups contribute to 20% of USA’s GDP.
  14. 14. The Death of a Good Business Model • Foxconn 20-year revenue vs. net profit (now at 5%)
  15. 15. What do 100 year old corporations do? GE Schenectady, 1896
  16. 16. History of change at GE • 1896: one of the 12 original companies on the Dow Jones Industrial Average (also the only one remaining). • 1889: lightbulbs • 1919: radios • 1927: TV • 1941: jet engine • 1960: nuclear power • 1971: room AC units • 1995: MRI
  17. 17. History of change at IBM • 1960s: mainframe computer • 1980s: personal computer • 2000s: integrated solutions • 2020s: AI, Watson
  18. 18. How about the leading Semiconductor companies?
  19. 19. NVidia reinventing itself —2 times in 20 years
  20. 20. “Bad money drives out good” in the desktop GPU market
  21. 21. The rise of mobile computing, and how NVidia missed the boat!
  22. 22. NVidia’s Tegra mobile processors never took off; then the market saturated…
  23. 23. NVidia did not just survive. NVidia is thriving!
  24. 24. Meet the new NVidia: Deep Learning, Deep Learning, and still, Deep Learning
  25. 25. The king is dead, long live the king!
  26. 26. Now, again, do we want to do OEM/ODM forever? Optimizing an old business model is just delaying its eventual death.
  27. 27. Computer Vision, it can’t be that hard, right? hmm… grayscale color can’t work alone… maybe color works better?
  28. 28. Computer Vision, it can’t be that hard, right? White and Gold or Blue and Black? The Dress 2015/02/26
  29. 29. Computer Vision, it can’t be that hard, right?
  30. 30. Even if we can auto-correct all lighting and color temperature and force all apples to be encoded as: [w w w w] [w r r w] [w r r w] [w w w w], we’d still have all these “affine transformation” issues:
  31. 31. Even if lighting, color, affine transformation are not an issue • Our 3D world can’t simply be represented by fixed 2D encoding:
  32. 32. Brief History “In 1966, Minsky hired a first-year undergraduate student and assigned him a problem to solve over the summer: connect a television camera to a computer and get the machine to describe what it sees.” The student never worked on Computer Vision problems again. (photos: Marvin Minsky, Gerald Sussman)
  33. 33. Brief History • 1960’s: interpretation of synthetic worlds • 1970’s: some progress on interpreting selected images • 1980’s: ANNs come and go; shift toward geometry and increased mathematical rigor • 1990’s: face recognition; statistical analysis in vogue • 2000’s: broader recognition; large annotated datasets available; video processing starts Guzman ‘68 Ohta Kanade ‘78 Turk and Pentland ‘91
  34. 34. What was in our arsenal? • Image filters • Feature descriptors • Classifiers
  35. 35. Filters: blurring
  36. 36. Filters: sharpening
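Both blurring and sharpening can be expressed as sliding a small kernel over the image. The sketch below is a minimal pure-Python illustration, not taken from the slides: the function name `filter2d`, the 3x3 box-blur kernel, and the common 5/-1 sharpening kernel are illustrative choices, and border pixels are simply replicated.

```python
def filter2d(image, kernel):
    """Slide a 3x3 kernel over a 2D list of numbers (borders are replicated).

    Strictly this is correlation (the kernel is not flipped); for the
    symmetric kernels below that is identical to convolution.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy = min(max(y + ky - 1, 0), h - 1)  # clamp to the image
                    xx = min(max(x + kx - 1, 0), w - 1)
                    acc += kernel[ky][kx] * image[yy][xx]
            out[y][x] = acc
    return out


# Box blur: every output pixel is the mean of its 3x3 neighborhood.
BOX_BLUR = [[1 / 9] * 3 for _ in range(3)]
# Sharpen: boost the center and subtract the four direct neighbors.
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]

# A single bright pixel on a dark background.
image = [[0] * 5 for _ in range(5)]
image[2][2] = 9
blurred = filter2d(image, BOX_BLUR)    # the spike is spread over 3x3 pixels
sharpened = filter2d(image, SHARPEN)   # the spike is amplified, neighbors dip
```

Because the sharpening kernel sums to 1, a perfectly flat region passes through unchanged; only local contrast is exaggerated.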
  37. 37. Filters: Canny Edge Detector
  38. 38. Filters: straight lines
  39. 39. Features: a compact and (hopefully) invariant representation
  40. 40. Features: Gabor
  41. 41. Features: Harris Corners
  42. 42. Features: Laplacian of Gaussian (LoG; scale detection)
  43. 43. Features: Orientation How to compute the rotation? Create edge orientation histogram and find peak.
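The "histogram and find peak" recipe above could look roughly like this. It is a simplified sketch under assumptions not stated on the slide: gradients come from central differences, votes are weighted by gradient magnitude, the histogram has 36 bins of 10 degrees, and the function name is hypothetical.

```python
import math


def dominant_orientation(image, bins=36):
    """Return the center (in degrees) of the peak gradient-orientation bin.

    image is a 2D list of gray values. Angles are measured in image
    coordinates, where y grows downward.
    """
    h, w = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # horizontal gradient
            gy = image[y + 1][x] - image[y - 1][x]  # vertical gradient
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue  # flat pixels carry no orientation information
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            hist[int(angle / (360.0 / bins)) % bins] += mag
    peak = max(range(bins), key=lambda b: hist[b])
    return (peak + 0.5) * 360.0 / bins


# A diagonal intensity ramp: the gradient points at 45 degrees everywhere.
ramp = [[x + y for x in range(8)] for y in range(8)]
print(dominant_orientation(ramp))  # 45.0
```

Rotating a patch by the negative of this angle before computing a descriptor is one common way to obtain rotation invariance.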
  44. 44. Features: SIFT
  45. 45. Features: SIFT
  46. 46. Classifier Training in Machine Learning • Supervised, discrete: Classification • Supervised, continuous: Regression • Unsupervised, discrete: Clustering • Unsupervised, continuous: Dimension Reduction
  47. 47. Classifiers: SVM
  48. 48. Classifiers: Ensemble
  49. 49. Classifiers: Random Fields
  50. 50. Classifiers: Deformable Parts Model (DPM)
  51. 51. Meta-Learning • Different use cases call for different ML algorithms. • Meta-Learning: learning how to learn. • Requires plenty of domain-specific know-how.
  52. 52. Neural Network (NN) Why didn’t it work; why now? • MNIST digit data 28x28 • LeCun’s 3 layer NN: 1170 variables. • Requires tens of thousands of samples. • Only learns simple line/curve combinations
  53. 53. AI Winter (1970-1980, 1990-2000) • Early NN problems: • redundant structure, • slow learning speed • need too much data • bad learning stability.
  54. 54. What’s in a NN: each neuron adds its weighted inputs and a bias to get z, then applies the activation function σ(z). (figure labels: input, weights, bias, activation function)
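A single neuron with the parts labeled on the slide (inputs, weights, bias, activation function) can be sketched as below. The slide only shows σ(z); taking σ to be the logistic sigmoid is an assumption, and the function names are illustrative.

```python
import math


def sigmoid(z):
    """Logistic activation: squashes any real z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def layer(inputs, weight_rows, biases):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))  # z = 0, so the output is 0.5
```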
  55. 55. NN breakthroughs since 1970’s 1. Better Network Structure • Convolutional Neural Network greatly reduces the number of variables in NN’s designed for images and videos. —> Improved convergence speed, reduced data requirements. Upper-left corner Bird Beak Detector Center Bird Beak Detector Almost identical, can be shared across regions
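To make the "greatly reduces the number of variables" claim concrete, the arithmetic below compares a fully connected layer with a convolutional layer whose filters are shared across all image positions. The layer sizes (a 28x28 gray image, 100 units or filters, 5x5 kernels) are illustrative assumptions, not figures from the talk.

```python
def dense_params(in_h, in_w, in_c, units):
    """Fully connected: every unit has one weight per input value, plus a bias."""
    return in_h * in_w * in_c * units + units


def conv_params(kernel, in_c, filters):
    """Convolution: each filter has kernel*kernel*in_c weights plus a bias,
    reused at every position, so the image size does not matter."""
    return kernel * kernel * in_c * filters + filters


print(dense_params(28, 28, 1, 100))  # 78500
print(conv_params(5, 1, 100))        # 2600
```

In this toy setting the shared filters need about 30 times fewer parameters, which is the effect the bird-beak detector example on the slide points at.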
  56. 56. NN breakthroughs since 1970’s 1. Network Structure
  57. 57. NN breakthroughs since 1970’s 2. Improved Activation Functions (figure: network with inputs x1 … xN and outputs y1 … yM, with regions labeled “Large” and “Small”)
  58. 58. NN breakthroughs since 1970’s 3. Effective Backpropagation w1 w2 Clipping [Razvan Pascanu, ICML’13]
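The "Clipping" label on this slide refers to gradient clipping. A minimal sketch of clipping by norm is shown below; the function name and the threshold value are illustrative, and real frameworks apply the same idea across all parameter tensors at once.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale the gradient vector if its L2 norm exceeds max_norm.

    The direction is preserved; only the step length is capped, which keeps
    a single exploding gradient from throwing the weights far away.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


print(clip_by_norm([3.0, 4.0], 1.0))  # norm 5 is scaled down to norm 1
print(clip_by_norm([0.3, 0.4], 1.0))  # already small: left unchanged
```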
  59. 59. NN breakthroughs since 1970’s 4. Efficient Training Methods • Mini-batch • Adaptive Learning Rate • Dropout, Batch normalization (figure: 1 epoch split into mini-batches)
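The relation between mini-batches and an epoch can be sketched as follows: one epoch is one shuffled pass over the data, cut into fixed-size chunks, with one weight update per chunk. The helper name and the fixed seed are illustrative assumptions.

```python
import random


def minibatches(samples, batch_size, seed=0):
    """Yield shuffled mini-batches that together cover the data exactly once."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)  # a new order each epoch if seed varies
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]


data = list(range(10))
for batch in minibatches(data, batch_size=4):
    # A real training loop would compute the gradient on `batch` here
    # and update the weights once per mini-batch.
    print(len(batch))
```

With 10 samples and a batch size of 4, one epoch consists of three updates (batches of 4, 4, and 2 samples).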
  60. 60. Deep Neural Networks (DNN) way more complex and capable!
  61. 61. What do DNNs learn? • Neurons act like “custom-trained filters”; react to very different visual cues, depending on data.
  62. 62. What do DNNs learn? • Neurons act like “custom-trained filters”; react to very different visual cues, depending on data.
  63. 63. What do DNNs learn? • Does not “memorize” millions of viewed images. • Extracts a greatly reduced number of features that are vital for classifying different classes of data. • Classifying data becomes a simple task when the features measured are “good”.
  64. 64. Mature/Maturing Computer Vision Applications
  65. 65. 1970s-now: Machine Vision for Industrial Inspection • Final inspection cells • Robot guidance and checking orientation of components • Packaging Inspection • Medical vial inspection • Food pack checks • Verifying engineered components • Wafer Dicing • Reading of Serial Numbers • Inspection of Saw Blades • Inspection of Ball Grid Arrays (BGAs) • Surface Inspection • Measuring of Spark Plugs • Molding Flash Detection • Inspection of Punched Sheets • 3D Plane Reconstruction with Stereo • Pose Verification of Resistors • Classification of Non-Woven Fabrics • Automated Train Examiner (ATEx) Systems • Automatic PCB inspection • Wood quality inspection • Final inspection of sub-assemblies • Engine part inspection • Label inspection on products • Checking medical devices for defects
  66. 66. Industrial Inspection: turbofan jet engine blade maintenance • Some seemingly daunting machine vision tasks actually work with relatively simple image processing algorithms.
  67. 67. Industrial Inspection: Cognex Omniview
  68. 68. Industrial Inspection: Cognex Omniview
  69. 69. License Plate Recognition (1979-now)
  70. 70. License Plate Readers with Text Detection and Neural Networks
  71. 71. Biometrics
  72. 72. Automated Fingerprint Identification (1970s-now)
  73. 73. Face Recognition (1990s-now) • Face Detection (Viola and Jones, 2001) • Face Verification (1:1) vs. Identification (1:N)
  74. 74. Face Verification and Identification, Labeled Faces in the Wild (LFW) Recognition Accuracy: • 1 to 1: 99%+ • 1 to 100: 90% • 1 to 10,000: 50%-70%. • 1 to 1M: 30%. LFW dataset, common FN↑, FP↓
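The difference between 1:1 verification and 1:N identification can be sketched as below. This is a toy illustration only: the three-dimensional "embeddings", the names in the gallery, and the 0.8 threshold are made up, and real systems compare feature vectors produced by a trained face model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def verify(probe, claimed, threshold=0.8):
    """1:1: is the probe the same person as the one claimed template?"""
    return cosine(probe, claimed) >= threshold


def identify(probe, gallery):
    """1:N: which of the N enrolled identities matches the probe best?"""
    return max(gallery, key=lambda name: cosine(probe, gallery[name]))


gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
probe = [0.9, 0.1, 0.0]
print(verify(probe, gallery["alice"]))  # True
print(identify(probe, gallery))         # alice
```

Verification makes one comparison, while identification has to win against every other enrolled identity, which is one way to read why the accuracy figures above fall as N grows.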
  75. 75. Sports—NFL first down line (1995-now)
  76. 76. Sports—NFL first down line minus equals
  77. 77. 3D Reconstruction (As old as CV; became practical since SIFT)
  78. 78. 3D Reconstruction with Feature Matching, Structure from Motion
  79. 79. 3D Reconstruction with Feature Matching, Structure from Motion
  80. 80. Image Panoramas (1980s - now)
  81. 81. Solving Panorama Problem with Markov Random Fields Input:
  82. 82. Solving Panorama Problem with Markov Random Fields Input:
  83. 83. Solving Panorama Problem with Markov Random Fields Input:
  84. 84. Solving Panorama Problem with Markov Random Fields Input:
  85. 85. Solving Panorama Problem with Markov Random Fields Input:
  86. 86. Solving Panorama Problem with Markov Random Fields Input:
  87. 87. Solving Panorama Problem with Markov Random Fields Input:
  88. 88. Solving Panorama Problem with Markov Random Fields
  89. 89. Solving Panorama Problem with Markov Random Fields ICM (Iterated Conditional Modes), 1986
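A rough sketch of ICM on a small grid MRF is given below. It is not the formulation used in the talk: it assumes a 4-connected grid, a per-pixel cost table (for a panorama, the cost of copying that pixel from each source image), and a simple Potts penalty whenever neighboring pixels take different labels. All names are illustrative.

```python
def icm(unary, smooth=1.0, sweeps=10):
    """Iterated Conditional Modes with a Potts smoothness term.

    unary[y][x][l] is the cost of giving pixel (y, x) label l.
    Each pixel in turn takes the label that is cheapest given its
    neighbors' current labels; sweeps stop when nothing changes.
    """
    h, w, n_labels = len(unary), len(unary[0]), len(unary[0][0])
    # Initialize every pixel with its cheapest label, ignoring neighbors.
    labels = [[min(range(n_labels), key=lambda l: unary[y][x][l])
               for x in range(w)] for y in range(h)]
    for _ in range(sweeps):
        changed = False
        for y in range(h):
            for x in range(w):
                neighbors = [labels[ny][nx]
                             for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                             if 0 <= ny < h and 0 <= nx < w]

                def cost(l):
                    # Data cost plus a penalty per disagreeing neighbor.
                    return unary[y][x][l] + smooth * sum(l != n for n in neighbors)

                best = min(range(n_labels), key=cost)
                if best != labels[y][x]:
                    labels[y][x] = best
                    changed = True
        if not changed:
            break
    return labels


# 3x3 grid, two labels: everything prefers label 0 except the center pixel.
unary = [[[0.0, 1.0] for _ in range(3)] for _ in range(3)]
unary[1][1] = [1.0, 0.0]
print(icm(unary, smooth=1.0))  # the lone outlier is smoothed away
```

ICM only ever makes locally greedy moves, so it can get stuck in poor local minima; that limitation is a common motivation for the belief propagation and graph-cut methods on the following slides.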
  90. 90. Solving Panorama Problem with Markov Random Fields Belief Propagation (1980-2000)
  91. 91. Solving Panorama Problem with Markov Random Fields Graph-Cuts (alpha expansion), 2001
  92. 92. Photosynthesis
  93. 93. Solving Photosynthesis Problems with Alpha-matting (2000s-now)
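Alpha matting is built on the compositing equation C = αF + (1 - α)B, where F is the foreground, B the background, and α the per-pixel opacity. The sketch below only applies that equation in the forward direction, assuming the slide title refers to compositing photos; estimating α from a single image is the hard inverse problem and is not shown. The function name and sample values are illustrative.

```python
def composite(foreground, background, alpha):
    """Blend two gray images pixel by pixel: C = alpha*F + (1 - alpha)*B."""
    return [[a * f + (1.0 - a) * b
             for f, b, a in zip(f_row, b_row, a_row)]
            for f_row, b_row, a_row in zip(foreground, background, alpha)]


fg = [[100, 100]]
bg = [[0, 0]]
alpha = [[1.0, 0.25]]  # fully opaque pixel, then a soft edge pixel
print(composite(fg, bg, alpha))  # [[100.0, 25.0]]
```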
  94. 94. Object Detection & Classification state-of-the-art • ImageNet Large Scale Visual Recognition Challenge (ILSVRC) • 1000+ classes, 1.2M images. (chart: classification error and classification + localization error, x-axis ticks 11 to 14, y-axis 0 to 0.5)
  95. 95. Image Scene Classification • MIT Places 401 dataset. • top-5 accuracy rates >80%.
  96. 96. Self-driving cars (2000s-now)
  97. 97. DARPA Grand Challenge (2005)
  98. 98. 2005 winner, Stanley (Stanford), 3mph through desert
  99. 99. DARPA Urban Challenge (2007)
  100. 100. 2007 winner, Boss (CMU), 13mph through the city
  101. 101. Self Driving Cadillac, US congressman to airport, 2013
  102. 102. Google Self Driving Car, 2015
  103. 103. Google Self-Driving Car, 2016
  104. 104. Google Self-Driving Car, 2016
  105. 105. NVidia Self Driving Car, 2016
  106. 106. How did we come this far? Race car drivers know the trick
  107. 107. Focus on Free Space / Drivable Area, not Obstacles!
  108. 108. Up-and-coming Computer Vision Applications
  109. 109. Structure from X, Floored
  110. 110. Structure from X, PIX4D
  111. 111. Object Recognition Blue River Technology
  112. 112. Augmented Reality Magic Leap
  113. 113. IMRSV
  114. 114. Retail Insights Source: Prism Skylabs
  115. 115. Other Applications in Business Intelligence • Measure brand exposure. • Measure sponsorship effectiveness. • Loss prevention and retail layout optimization.
  116. 116. How about Smart Surveillance?
  117. 117. Exciting applications many of you might be attempting to SOLVE!!!
  118. 118. Problem Solving Workflow Classical Workflow: 1. Data collection 2. Feature Extraction 3. Dimension Reduction 4. Classifier (re)Design 5. Classifier Verification 6. Deploy Modern Brute-force workflow 1. Data collection 2. Throw everything into a Deep Neural Network 3. Mommy, why doesn’t it work ???
  119. 119. Classical Problem #1: Curse of Dimensionality ze sit 앉다 sentarse • Number of Variables vs Number of Samples Q. Who would make such naive mistakes? A. Many “newbies” repeatedly do so.
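One way to see the "number of variables vs. number of samples" problem is to count how thinly a fixed number of samples is spread as dimensions are added. The arithmetic below assumes each variable is cut into 10 bins; the bin count and the function name are illustrative, not from the talk.

```python
def samples_per_cell(n_samples, n_dims, bins_per_dim=10):
    """Average number of samples per cell when every dimension is split into bins."""
    return n_samples / bins_per_dim ** n_dims


for dims in (1, 2, 3, 10):
    print(dims, samples_per_cell(100, dims))
```

With 100 samples there are 10 per cell in one dimension, 1 per cell in two, and only 0.1 per cell in three; by ten dimensions almost every cell is empty, so a classifier has essentially nothing to learn from in most of the space.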
  120. 120. Example 1-1: illegal parking detection legal parking samples x100 illegal parking samples x100 Let’s train a 150-layer Res-Net!!! What could possibly go wrong?
  121. 121. Example 1-1: illegal parking detection • Data: try cleaner data • Feature: fine-tune with pre-trained model; don’t train from scratch • Classifier overfitting: beware of statistical coincidences.
  122. 122. Example 1-2: Smart Photo Album with Google Cloud Vision
  123. 123. Example 1-2: Smart Photo Album with Google Cloud Vision No effective distance measure for thousands, if not millions of dimensions (tags); would be approximately zero most of the time.
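The "approximately zero most of the time" observation can be illustrated with a small simulation. It is a sketch under strong assumptions: each photo gets a handful of tags drawn uniformly at random from a large vocabulary, which real tagging services do not do, and the function names, vocabulary size, and tag count are hypothetical.

```python
import math
import random


def tag_cosine(tags_a, tags_b):
    """Cosine similarity between two photos viewed as binary tag vectors."""
    if not tags_a or not tags_b:
        return 0.0
    return len(tags_a & tags_b) / math.sqrt(len(tags_a) * len(tags_b))


def fraction_zero(vocab_size, tags_per_photo, n_pairs=1000, seed=0):
    """Share of random photo pairs whose similarity is exactly zero."""
    rng = random.Random(seed)
    zeros = 0
    for _ in range(n_pairs):
        a = set(rng.sample(range(vocab_size), tags_per_photo))
        b = set(rng.sample(range(vocab_size), tags_per_photo))
        zeros += tag_cosine(a, b) == 0.0
    return zeros / n_pairs


print(tag_cosine({"cat", "sofa"}, {"cat", "tree"}))  # 0.5
print(fraction_zero(vocab_size=10_000, tags_per_photo=5))
```

When almost all pairs score exactly zero, the measure cannot rank which unrelated photos are "less unrelated", which is why grouping tags into a much smaller set of concepts first tends to be necessary.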
  124. 124. Classical Problem #2: Overfitting Data • Make sure your deep learning algorithm is learning better features for data, not overfitting the data with complex classifiers.
  125. 125. Deep Learning Cookbook • Good results on training data? If not: new activation function, adaptive learning rate. • Good results on testing data? If not: early stopping, regularization, dropout. (credit: Prof. H.Y. Lee, NTU)
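Of the remedies listed, early stopping is the easiest to sketch: keep training only while the loss on held-out validation data keeps improving. The function name, the patience value, and the loss curve below are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch whose weights should be kept.

    Training halts once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss is rising: likely overfitting
    return best_epoch


# Validation loss falls, then rises again as the model starts to overfit.
print(early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.9, 0.6]))  # 2
```

Note the trade-off in the example: with `patience=2` the later dip to 0.6 is never reached, so the patience setting decides how much temporary worsening is tolerated.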
  126. 126. Example: AOI breakthroughs with Deep Learning: Metal Inspection (D Weimer et al. 2017)
  127. 127. Example: AOI breakthroughs with Deep Learning: Textile Inspection (X Funding Li et al. / IEEE Trans. on Automation Science and Engineering 2017, to appear)
  128. 128. Example: AOI breakthroughs with Deep Learning: Laser Welding (Johannes Günther et al. / Procedia Technology 15 (2014) 474 – 483)
  129. 129. Example: AOI breakthroughs with Deep Learning: Laser Welding (Johannes Günther et al. / Procedia Technology 15 (2014) 474 – 483)
  130. 130. Example: AOI breakthroughs with Deep Learning: Serial Number Processing (S. N. Lim et al. / GE Global Research)
  131. 131. Example: AOI breakthroughs with Deep Learning: Serial Number Processing (S. N. Lim et al. / GE Global Research)
  132. 132. Example: AOI breakthroughs with Deep Learning: Corrosion Detection (S. N. Lim et al. / GE Global Research)
  133. 133. Example: Dermatologist-level Skin Cancer Diagnosis with DNN+Smartphones • 5.4M cancer cases, 58M pre-cancer cases diagnosed every year in the US. (Andre Esteva, Sebastian Thrun, 2017)
  134. 134. Example: Dermatologist-level Skin Cancer Diagnosis with DNN+Smartphones
  135. 135. Example: Dermatologist-level Skin Cancer Diagnosis with DNN+Smartphones
  136. 136. Example: Hippocampus Segmentation in 7T MR Images (Dinggang Shen, 2017)
  137. 137. Example: Hippocampus Segmentation in 7T MR Images (Dinggang Shen, 2017)
  138. 138. Example: Hippocampus Segmentation in 7T MR Images (Dinggang Shen, 2017)
  139. 139. Example: Histopathological Image Classification w. DNN Microscopic view of Breast malignant tumor 40x 100x 200x 400x (FA Spanhol, IJCNN 2016)
  140. 140. Example: Histopathological Image Classification w. DNN
  141. 141. Example: Histopathological Image Classification w. DNN
  142. 142. Example: DNN for Plant Disease Detection (S Mohanty, 2016)
  143. 143. Example: DNN for Plant Disease Detection
  144. 144. Example: DNN for Plant Disease Detection
  145. 145. Thank You! albert@viscovery.com
  146. 146. Appendix 1: Startups • A company, partnership, or temporary organization designed to search for a new, repeatable and scalable business model.
  147. 147. Your Idea • Are you passionate about it? • Is it disruptive enough? • What is your business plan? • What is it? • Can it make money? • What is the future of the idea? • What is your competitive advantage? • How do you build up your entry barrier?
  148. 148. A minimal startup team • A hacker • A hustler • A hipster
  149. 149. Startup Timeline
  150. 150. Prototype • Hack out a prototype • Spend 2-10 weeks max. • Investors are much more likely to fund you if you have a minimal initial version of your idea. • Hackathons are a good place to start. • Iteratively improve the prototype
  151. 151. Money!
  152. 152. Build up your entry barrier! • Market (users) • Speed • Team • Technology
  153. 153. Building entry barrier with Technology!!
  154. 154. Angel.co
  155. 155. Appendix 2: My humble attempts at putting the latest Computer Vision algorithms to work
  156. 156. Intrinsic Imaging at Tandent Vision Science Computer Vision would be half-solved without shadows! (figure panels: Light, Original Image, Surface)
  157. 157. Tandent Lightbrush Video Tutorial for Tandent Lightbrush: https://vimeo.com/47009123
  158. 158. Issues • Highly anticipated, highly acclaimed, but small crowd at $500 a license. • Adobe Photoshop monopoly and the “not invented here” syndrome. • Adobe’s arch-rival, Corel (Corel Draw, Paint Shop Pro, Ulead PhotoImpact) was DYING and asked too much from the botched deal.
  159. 159. Have fun scribbling out your shadows in photoshop! Poor Bob from Adobe wasted 9 minutes removing just 1 shadow
  160. 160. Intrinsic Imaging for improving the RGB signal in autonomous driving
  161. 161. Intrinsic Imaging’s other applications
  162. 162. Retrospect • 20 researchers burned 25 million in 8 years; investors got 50 patents in return, period. • Overestimated the total addressable market size, in a market with existing monopoly. • Many missed opportunities. Counterexample of the lean startup model.
  163. 163. Some SfM, SLAM startups
  164. 164. Satellite/Aerial Imagery Analysis • 40cm resolution at 30fps for 90 sec for any location on earth. • One LEO satellite revisits any place on Earth every 3 days. • Need 24 satellites to revisit any place on Earth every 3 hours.
  165. 165. Challenges for Single satellite depth estimation and 3D reconstruction • At 30fps, a LEO satellite travels 250m between two consecutive frames -> theoretically sufficient for cm-level depth estimation. • Sources of Noise: • Camera distortions • Atmospheric Disturbance • Ground vegetation • Sub-pixel sampling noise
  166. 166. What happened? • B2B customers take too long to strike deals. • Google ate us alive in just 3 months, while we were still pitching for VC-funding with our prototype.
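The 250 m figure follows from simple arithmetic, sketched below. The orbital speed of about 7.5 km/s is an assumed typical value for low Earth orbit and does not come from the slides; the 30 fps and 90-second clip length do.

```python
LEO_SPEED_M_PER_S = 7_500.0  # assumed typical LEO speed, not from the slides


def baseline_between_frames(speed_m_per_s, fps):
    """Distance the satellite moves between two consecutive video frames."""
    return speed_m_per_s / fps


def baseline_over_clip(speed_m_per_s, seconds):
    """Distance covered over a whole clip: the widest available stereo baseline."""
    return speed_m_per_s * seconds


print(baseline_between_frames(LEO_SPEED_M_PER_S, 30))  # 250.0 metres
print(baseline_over_clip(LEO_SPEED_M_PER_S, 90))       # 675000.0 metres
```

Consecutive frames give a short baseline, while frames far apart in the same 90-second clip give a very long one; in practice, how useful each is depends on the noise sources listed above.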
  167. 167. Visual Search at Nervve
  168. 168. Retrospect • Growth pains expanding from intelligence community clients to advertisement clients. • Forming the right team of engineers and researchers and moving at the right pace. • For any Computer Vision/Machine Learning company: • Researchers that cannot program—> OUT • Engineers that don’t know math —> OUT
  169. 169. Visual Search, Simply Smarter
  170. 170. Once in a lifetime opportunity in China’s video streaming market
  171. 171. What do we need? Face Motion Image scene Text Audio Object Semantics
  172. 172. Viscovery VDS (Video Discovery Service)
  173. 173. Viscovery VDS (Video Discovery Service)
  174. 174. Viscovery VDS (Video Discovery Service)
  175. 175. Challenges Encountered Along the Way • From Product Recognition in Images, to Face, Logo, Object, Scene recognition in Videos. • Number of Categories • Recognition Accuracy • Recognition Speed • System Architecture • Business Model
  176. 176. Viscovery’s Edge • Market: first mover’s advantage in China’s video streaming market. • Speed: we built the whole VDS thing in a few months! • Team: You! Seriously! • Technology: • Depth • Breadth • Cloud • Customizability • Self-Learning
  177. 177. Life is not all rosy at startups • High Risk, High Pressure, High Uncertainty! • Resources are scarce, but you MUST DELIVER! • Forming your all-star team is not that easy… • Focus, and persistence.
  178. 178. Appendix 3: What can Taiwan’s academia do to help bridge the gap? HMM….
  179. 179. A healthy cycle (Academia, Industry, General Public): reputation and policy support; improved living standards; students; opportunity; well-trained graduates; grants and collaborations
  180. 180. A vicious cycle (Academia, Industry, General Public): unsupportive policies; stagnant wages; useless education; unemployable graduates; no grants; no students
  181. 181. Where should we start? Maybe with a few more stories.
  182. 182. Where should we start? Maybe with a few more stories.
  183. 183. Where should we start? Maybe with a few more stories.
  184. 184. The Goldilocks zone of innovation
  185. 185. The Goldilocks zone of innovation Business Relevance Academic Relevance plentiful resources; hierarchical organization lack of resources; responsive organization traditional corporations talking “innovation” corporate research startups struggling to survive academic spinoffs MSR
