
Computer Vision – From traditional approaches to deep neural networks

Event: GDG Munich February Meetup: Machine Learning, 27.02.2018
Speaker: Stanislav Frolov, inovex

More tech talks: https://www.inovex.de/de/content-pool/vortraege/
More tech articles on the inovex blog: https://www.inovex.de/blog

Published in: Software

  1. Computer Vision – From traditional approaches to deep neural networks. Stanislav Frolov, München, 27.02.2018
  2. Outline of this talk (what we are going to talk about): ● Computer vision ● Human vision ● Traditional approaches and methods ● Artificial neural networks ● Summary
  3. Stanislav Frolov, Big Data Engineer @inovex: ● Trained deep neural networks for object detection during my master's thesis ● Still fascinated by and interested in the topic
  4. What is computer vision - general: ● Teach computers how to see ● Automatic extraction, analysis and understanding of images ● Infer useful information, interpret and make decisions ● Automate tasks that the human visual system can do ● One of the most exciting fields in AI and ML
  5. What is computer vision - motivation: ● Era of pixels ● The internet consists mostly of images ● Explosion of visual data ● Far too much to be labeled by humans
  6. What is computer vision - drivers: ● Two drivers of the computer vision explosion ○ Compute (faster and cheaper) ○ Data (more data > algorithms)
  7. What is computer vision - interdisciplinary field (diagram): computer science, mathematics, engineering, physics, biology and psychology, connected to information retrieval, machine learning, graphs and algorithms, systems architecture, robotics, speech and NLP, image processing, optics, solid-state physics, neuroscience, cognitive sciences and biological vision
  8. Synonyms?
  9. What is computer vision - related fields - image processing: ● Imaging for statistical pattern recognition ● Image transformations such as pixel-by-pixel operations ○ Contrast enhancement ○ Edge extraction ○ Noise reduction ○ Geometrical and spatial operations (e.g. rotations)
  10. What is computer vision - related fields - computer graphics: ● Creates new images from scene descriptions ● Produces image data from 3D models ● The “inverse” of computer vision ● AR as a combination of both
  11. What is computer vision - related fields - machine vision: ● Mainly manufacturing applications ● Image-based automatic inspection, process control, robot guidance ● Usually employs strong assumptions (colour, shape, light, structure, orientation, ...), which is why it works very well ● Output is often pass/fail or good/bad ● Additionally numerical/measurement data, counts
  12. What is computer vision - related fields - AI: ● Create “intelligent” systems ● Study the computational aspects of intelligence ● Make computers do things at which, at the moment, people are better ● Many techniques play an important role (ML, ANNs) ● Currently does a few things better/faster at scale than humans can ● Whether it can do anything “human” remains an open question
  13. What is computer vision - related fields - summary: ● Related fields have a large intersection ● The basic techniques used, developed and studied are very similar
  14. Short trip to human vision
  15. What is human vision - general: ● Two-stage process ○ The eyes take in light reflected off objects, and the retina converts the 3D scene into 2D images ○ The brain's visual system interprets the 2D images and “rebuilds” a 3D model
  16. What is human vision - stereoscopic vision: ● A pair of 2D images taken from slightly different viewpoints allows depth to be inferred ● The position of nearby objects varies more across the two images than the position of more distant objects
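The stereo cue on this slide has a direct geometric counterpart in computer vision. A minimal sketch, assuming an idealized rectified stereo camera pair under the pinhole model, where depth Z = f·B/d is inversely proportional to the disparity d; the focal length and baseline below are hypothetical values, not taken from the talk.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for an idealized rectified stereo pair (pinhole model).

    disparity_px: horizontal shift of the same point between left and right image
    focal_px:     focal length in pixels (assumed known from calibration)
    baseline_m:   distance between the two cameras in metres
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical camera: 700 px focal length, 6.5 cm baseline (roughly human eye spacing).
near = depth_from_disparity(45.5, 700, 0.065)  # large disparity -> about 1 m away
far = depth_from_disparity(4.55, 700, 0.065)   # small disparity -> about 10 m away
print(f"near: {near:.2f} m, far: {far:.2f} m")
```

A point ten times farther away shifts ten times less between the two views, which is the effect the slide describes.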
  17. What is human vision - prior knowledge: ● Prior knowledge of relative sizes and depths is often key to understanding and interpretation
  18. What is human vision - texture pattern: ● Texture and changes in texture help with depth perception
  19. What is human vision - biases and illusions in human perception: ● Shadows make all the difference in interpretation ● Gradual changes in light are ignored so as not to be misled by shadows
  20. What is human vision - a few more illusions: ● Two arrows with different orientations have the same length
  21. What is human vision - biases and illusions in human perception: ● Assumptions and familiarity (distorted room) ● Face recognition bias ● Up-down orientation bias
  22. What is human vision - summary: ● Illusions are fun, but the puzzle of understanding human vision is far from solved
  23. Back to computer vision
  24. What is computer vision - typical tasks: ● Recognition ● Localization ● Detection ● Segmentation
  25. What is computer vision - typical tasks: ● Part-based detection ○ Deformable parts model ○ Pose estimation and poselets
  26. What is computer vision - typical tasks: ● Image captioning (actions, attributes)
  27. What is computer vision - typical tasks: ● Motion analysis ○ Egomotion (camera) ○ Optical flow (pixels)
  28. What is computer vision - typical tasks: ● Scene understanding and reconstruction
  29. What is computer vision - typical tasks: ● Image restoration ● Colouring black & white photos
  30. Solving these tasks is useful for many applications
  31. What is computer vision - typical applications: ● Assistance systems for cars and people ● Surveillance ● Navigation (obstacle avoidance, road following, path planning) ● Photo interpretation ● Military (“smart” weapons) ● Manufacturing (inspection, identification) ● Robotics ● Autonomous vehicles (dangerous zones)
  32. What is computer vision - typical applications: ● Recognition and tracking ● Event detection ● Interaction (man-machine interfaces) ● Modeling (medical, manufacturing, training, education) ● Organizing (database index, sorting/clustering) ● Fingerprint and biometrics ● …
  33. Why so difficult?
  34. What is computer vision - why it is difficult: ● Occlusion ● Deformation ● Scale ● Clutter ● Illumination ● Viewpoint ● Object pose ● Tons of classes and variants ● Often an n:1 mapping ● Computationally expensive ● A full understanding of biological vision is missing
  35. System overview
  36. What is computer vision - system overview: ● Input: image(s) + labels ● Output: semantic data, labels ● Digital image pixels usually have three channels [R, G, B], each in [0, 255], plus a location [x, y] ● Digital images are just vectors of numbers
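A small illustration of the last two bullets in plain Python (no imaging library assumed): a hypothetical 2x2 RGB image stored as nested lists, converted to one grey value per pixel with the ITU-R BT.601 luma weights (one common choice, assumed here) and flattened row by row into a single vector.

```python
# Hypothetical 2x2 RGB image: image[y][x] = (R, G, B), each channel in [0, 255].
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def to_grayscale(img):
    """One luminance value per pixel, using the (assumed) BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in img]


def flatten(img):
    """Concatenate the rows of a 2D grid into one vector."""
    return [value for row in img for value in row]


gray = to_grayscale(image)
vector = flatten(gray)
print(vector)  # [76, 150, 29, 255]
```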
  37. What is computer vision - system overview: 1. Image acquisition (camera, sensors) 2. Pre-processing (sampling, noise reduction, augmentation) 3. Feature extraction (lines, edges, regions, points) 4. Detection and segmentation 5. Post-processing (verification, estimation, recognition) 6. Decision making ● -> The ability of a machine to step back and interpret the big picture of those pixels
  38. Some history
  39. What is computer vision - history, 1950s: ● 2D imaging for statistical pattern recognition ● Theory of optical flow based on a fixed point towards which one moves
  40. What is computer vision - traditional approaches - image processing: ● Histograms ● Filtering ● Stitching ● Thresholding ● ...
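As a sketch of two of these operations, the code below builds a grey-level histogram and uses it to pick a binarization threshold automatically. Otsu's method is swapped in here as one classic way to choose the threshold; the talk does not name a specific algorithm, and the pixel values are made up.

```python
def histogram(pixels, bins=256):
    """Count how often each grey level 0..bins-1 occurs."""
    hist = [0] * bins
    for p in pixels:
        hist[p] += 1
    return hist


def otsu_threshold(pixels):
    """Return t maximizing the between-class variance of {p <= t} and {p > t}."""
    hist = histogram(pixels)
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so there is no split to evaluate
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def threshold(pixels, t):
    """Binarize: 1 for pixels brighter than t, else 0."""
    return [1 if p > t else 0 for p in pixels]


pixels = [10, 12, 15, 11, 200, 210, 205, 198]  # hypothetical dark and bright pixels
t = otsu_threshold(pixels)
print(t, threshold(pixels, t))
```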
  41. What is computer vision - history, 1960s: ● Desire to extract 3D structure from 2D images for scene understanding ● Began at pioneering AI universities, with the aim of mimicking the human visual system as a stepping stone towards intelligent robots ● Summer vision project at MIT: attach a camera to a computer and have it “describe what it saw”
  42. What is computer vision - history: summer vision project @MIT 1966: ● Given to 10 undergraduate students ● … an attempt to use our summer workers effectively … ● … construction of a significant part of a visual system … ● … task can be segmented into sub-problems … ● … participate in the construction of a system complex enough to be a real landmark in the development of “pattern recognition” …
  43. What is computer vision - history: summer vision project @MIT 1966: ● Goal: analyse scenes and identify objects ● Structure of the system: ○ Region proposal ○ Property lists for regions ○ Boundary construction ○ Match with properties ○ Segment ● Basic foreground/background segmentation with simple objects (cubes, cylinders, …)
  44. What is computer vision - history: summer vision project @MIT 1966: ● Unlike general intelligence, computer vision seemed tractable ● An amusing anecdote, but the project never aimed to “solve” computer vision ● Computer vision today differs from what it was thought to be in 1966
  45. What is computer vision - history, 1970s: ● Many algorithms that still exist today were formed ● Edges, lines and objects as interconnected structures
  46. What is computer vision - traditional approaches: edge detection based on ● Brightness ● Gradients ● Geometry ● Illumination
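A minimal gradient-based edge detector, assuming the Sobel operator (a standard choice, not one named on the slide): horizontal and vertical 3x3 kernels estimate the brightness gradient, and pixels with a large gradient magnitude are marked as edges. The image and the threshold of 50 are hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # responds to horizontal brightness change
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # responds to vertical brightness change


def apply_kernel(img, kernel, y, x):
    """Correlate a 3x3 kernel with the neighbourhood centred at (y, x)."""
    return sum(kernel[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))


def sobel_magnitude(img):
    """Gradient magnitude for interior pixels (border pixels are left at 0)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = apply_kernel(img, SOBEL_X, y, x)
            gy = apply_kernel(img, SOBEL_Y, y, x)
            out[y][x] = math.hypot(gx, gy)
    return out


# Dark left half, bright right half: a vertical edge between columns 1 and 2.
img = [[0, 0, 100, 100] for _ in range(4)]
mag = sobel_magnitude(img)
edges = [[1 if m > 50 else 0 for m in row] for row in mag]
print(edges)
```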
  47. What is computer vision - traditional approaches - part-based detector: ● Objects are composed of features of parts and their spatial relationships ● Challenge: how to define and combine them
  48. What is computer vision - history, 1980s: ● More rigorous mathematical analysis and quantitative aspects ● Optical character recognition ● Sliding window approaches ● Use of artificial neural networks
  49. What is computer vision - traditional approaches - HOG detection (histogram of oriented gradients): ● Concept from the 1980s, but only applied in 2005 ● Create HOG descriptors (object generalizations) ● One feature vector per object ● Train with an SVM ● Sliding window at multiple scales
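The sliding-window idea can be sketched independently of the classifier. Assuming a fixed 64x64 detector window, a stride of 32 pixels and two scales (all hypothetical values), this generator lists every window that a detector such as HOG + SVM would have to score.

```python
def sliding_windows(img_w, img_h, win=64, stride=32, scales=(1.0, 0.5)):
    """Yield square windows (x, y, size) in original-image coordinates.

    Sliding a fixed win x win detector over an image downscaled by `scale` is
    equivalent to sliding a window of size win / scale over the original image.
    """
    for scale in scales:
        size = int(win / scale)
        step = int(stride / scale)
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield (x, y, size)


windows = list(sliding_windows(128, 128))
print(len(windows), windows[:3], windows[-1])
```

Each window would then be turned into a feature vector and classified; the number of windows grows quickly with image size and the number of scales, which is one reason these detectors were slow.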
  50. What is computer vision - traditional approaches - HOG detection (histogram of oriented gradients): ● Computation of HOG descriptors: 1. Compute gradients 2. Compute histograms on cells 3. Normalize histograms 4. Concatenate histograms ● Requires a lot of engineering ● Must build ensembles of feature descriptors
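The four steps above can be sketched in plain Python. This is a simplified HOG under stated assumptions (9 unsigned orientation bins, 4x4-pixel cells, no overlapping blocks, no bin interpolation, L2 normalization per cell), not the exact descriptor from the original HOG paper.

```python
import math


def hog_descriptor(img, cell=4, bins=9):
    """Simplified HOG descriptor of a greyscale image given as a list of rows.

    Simplifications (assumptions of this sketch): non-overlapping cells, no bin
    interpolation, and L2 normalization per cell instead of per block.
    """
    h, w = len(img), len(img[0])

    # 1. Compute gradients with central differences (clamped at the border).
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation

    descriptor = []
    for cy in range(0, h - cell + 1, cell):
        for cx in range(0, w - cell + 1, cell):
            # 2. Histogram of orientations in this cell, weighted by gradient magnitude.
            hist = [0.0] * bins
            for y in range(cy, cy + cell):
                for x in range(cx, cx + cell):
                    b = int(ang[y][x] // (180.0 / bins)) % bins
                    hist[b] += mag[y][x]
            # 3. Normalize so the descriptor is less sensitive to contrast changes.
            norm = math.sqrt(sum(v * v for v in hist)) + 1e-6
            # 4. Concatenate the normalized cell histograms.
            descriptor.extend(v / norm for v in hist)
    return descriptor


# 8x8 image with a vertical edge: dark left half, bright right half.
img = [[0] * 4 + [100] * 4 for _ in range(8)]
d = hog_descriptor(img)
print(len(d), [round(v, 3) for v in d[:9]])
```

All gradient energy of the vertical edge lands in the 0° bin of each cell, so the descriptor encodes "horizontal gradient here" regardless of the absolute brightness.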
  51. What is computer vision - history, 1990s: ● Significant interaction with computer graphics (rendering, morphing, stitching) ● Approaches using statistical learning ● Eigenfaces (“ghost faces”) through principal component analysis (PCA)
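Eigenfaces boil down to PCA on flattened face images. A minimal sketch, assuming tiny hypothetical 4-pixel "faces" and power iteration to find only the leading principal component; real eigenface systems use many more pixels and components and would typically call an SVD routine from a numerics library instead.

```python
import math


def mean_vector(data):
    """Average image (the 'mean face')."""
    n = len(data)
    return [sum(col) / n for col in zip(*data)]


def covariance(centered):
    """Pixel-by-pixel covariance matrix of mean-centred images."""
    n, d = len(centered), len(centered[0])
    return [[sum(v[i] * v[j] for v in centered) / n for j in range(d)] for i in range(d)]


def top_component(cov, iters=100):
    """Leading eigenvector of cov via power iteration.

    Assumes cov is non-zero and the start vector is not orthogonal to that eigenvector.
    """
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(image, mean, component):
    """Coordinate of an image along one principal component ('eigenface')."""
    return sum((p - m) * c for p, m, c in zip(image, mean, component))


# Hypothetical 4-pixel "face images" whose brightness varies only in the first two pixels.
faces = [[1.0, 1.0, 0.0, 0.0], [2.0, 2.0, 0.0, 0.0], [3.0, 3.0, 0.0, 0.0], [4.0, 4.0, 0.0, 0.0]]
mean = mean_vector(faces)
eigenface = top_component(covariance([[p - m for p, m in zip(f, mean)] for f in faces]))
print([round(c, 3) for c in eigenface], round(project(faces[3], mean, eigenface), 3))
```

Each face is then represented by a few projection coefficients instead of all its pixels, and recognition compares those coefficients.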
  52. What is computer vision - traditional approaches - deformable parts model (DPM): ● Objects are constructed from their parts ● First match the whole object, then refine on the parts ● HOG + part-based + modern features ● Slow, but good at difficult objects ● Involves many heuristics
  53. What is computer vision - features: ● Feature points ○ A small area of pixels with certain properties ● Feature detection ○ Use features for identification ○ Activate if an “object” is present ● Examples: ○ Lines, edges, colours, blobs, … ○ Animals, faces, cars, ...
  54. What is computer vision - traditional approaches - classical recognition: ● Initialization: extract features for objects at different scales, colours, orientations, rotations and occlusion levels ● Inference: extract features from the query image and find the closest match in the database, or train a classifier ● Computationally expensive (hundreds of features per image, millions in the database) and complex due to errors and mismatches
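The inference step can be sketched as brute-force nearest-neighbour matching of descriptor vectors. The ratio test that discards ambiguous matches is Lowe's heuristic from SIFT, swapped in here as one common way to reduce mismatches; the 2D descriptors are hypothetical stand-ins for real feature vectors.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match(query_descs, db_descs, ratio=0.8):
    """Return (query_index, db_index) pairs for unambiguous nearest-neighbour matches.

    A match is kept only if the nearest database descriptor is clearly closer than
    the second nearest (ratio test). Requires at least two database descriptors.
    """
    matches = []
    for qi, q in enumerate(query_descs):
        dists = sorted((euclidean(q, d), di) for di, d in enumerate(db_descs))
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


db = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
query = [[1.0, 0.0], [5.0, 5.0]]  # the second query is equally far from all entries
print(match(query, db))
```

Comparing every query descriptor against every database entry is exactly the cost the slide calls computationally expensive; large systems replace the linear scan with approximate nearest-neighbour search.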
  55. What is computer vision - history: before the new era: ● Bags of features ● Handcrafted ensembles (diagram: input → feature extraction into feat. 1, feat. 2, ..., feat. n → final decision)
  56. The new era of computer vision
  57. Artificial neural networks - fundamentals - artificial neuron: ● Elementary building block ● Inspired by biological neurons ● Mathematical function y = f(wx + b) ● Learnable weights
  58. Artificial neural networks - fundamentals - artificial neural networks: ● Collection of neurons organized in layers ● Universal approximators ● A fully-connected network is shown here
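A minimal sketch of the neuron formula y = f(wx + b) and of a fully-connected layer built from such neurons. The weights, biases and the choice of ReLU and sigmoid activations are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def neuron(x, w, b, f=sigmoid):
    """Single artificial neuron: y = f(w . x + b)."""
    return f(sum(wi * xi for wi, xi in zip(w, x)) + b)


def dense_layer(x, weights, biases, f=sigmoid):
    """Fully-connected layer: one neuron per (weight row, bias) pair, all seeing the whole input."""
    return [neuron(x, w, b, f) for w, b in zip(weights, biases)]


x = [1.0, 2.0]
hidden = dense_layer(x, weights=[[0.5, -0.5], [1.0, 1.0]], biases=[0.0, -1.0], f=relu)
output = dense_layer(hidden, weights=[[1.0, 1.0]], biases=[0.0])
print(hidden, output)
```

Stacking such layers gives the network on the slide; training means adjusting the entries of `weights` and `biases`.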
  59. Artificial neural networks - fundamentals - training: ● Basically an optimization problem ● Find the minimum of a loss function by an iterative process (training) ● Designing the loss function is sometimes tricky
  60. Artificial neural networks - fundamentals - training: Simple optimizer algorithm: 1. Forward pass with a batch of data 2. Calculate the error between actual and wanted output 3. Nudge the weights in proportion to the error in the right direction (so that the same data would result in a smaller error) 4. Repeat until convergence
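The four-step loop can be sketched for the simplest possible model, a single linear neuron y = w·x + b trained with full-batch gradient descent on a mean squared error loss. The data, learning rate and fixed number of epochs (standing in for a real convergence check) are assumptions of this sketch.

```python
def train(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # 1. Forward pass with the (tiny) batch.
        preds = [w * x + b for x in xs]
        # 2. Error between actual and wanted output.
        errors = [p - y for p, y in zip(preds, ys)]
        # 3. Gradients of the loss w.r.t. w and b, then a small step against them.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    # 4. A fixed epoch budget stands in for "repeat until convergence".
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated by y = 2x + 1
w, b = train(xs, ys)
print(round(w, 4), round(b, 4))
```

Deep networks follow the same loop; backpropagation computes the gradients for millions of weights instead of two, and mini-batches replace the full dataset.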
  61. Artificial neural networks - fundamentals - CNN: ● The local neighbourhood contributes to the activation ● Exploits spatial information ● Hierarchical feature extractors ● Fewer parameters (figure: input, filters, activation, receptive field)
  62. Artificial neural networks - fundamentals - CNN: ● Filter of size 3x3 applied to an input of size 7x7
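The 3x3-on-7x7 example can be reproduced in a few lines. This sketch assumes stride 1 and no padding; like most deep learning libraries it computes cross-correlation (the kernel is not flipped), and the input values and filter weights are hypothetical.

```python
def conv2d(img, kernel):
    """'Valid' 2D convolution as used in CNNs (technically cross-correlation, no kernel flip).

    Output size is (H - kh + 1) x (W - kw + 1); the same kh * kw weights are
    reused at every position (weight sharing).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(kernel[i][j] * img[y + i][x + j] for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


img = [[7 * y + x for x in range(7)] for y in range(7)]  # 7x7 input, values 0..48
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]              # 3x3 filter (hypothetical weights)
out = conv2d(img, kernel)
print(len(out), len(out[0]), out[0][0])
```

The 7x7 input shrinks to 5x5, and the same 9 weights are reused at all 25 positions, which is why CNNs need far fewer parameters than fully-connected layers on images.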
  63. Artificial neural networks - fundamentals - pooling: ● Max-pooling ● Dimension reduction/adaptation ● The existence of a feature is more important than its exact location
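A minimal max-pooling sketch with a 2x2 window and stride 2 (a common configuration, assumed here): each output keeps only the strongest activation in its window, halving both spatial dimensions. The feature map values are hypothetical.

```python
def max_pool(fmap, size=2, stride=2):
    """Max over each size x size window, moving by `stride` (windows that don't fit are dropped)."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[y + i][x + j] for i in range(size) for j in range(size))
             for x in range(0, w - size + 1, stride)]
            for y in range(0, h - size + 1, stride)]


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 2, 7, 8]]
pooled = max_pool(fmap)
print(pooled)  # [[4, 2], [2, 8]]
```

Shifting the 4 in the top-left window to any other position inside that window would not change the output, which illustrates why pooling cares about existence rather than exact location.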
  64. Artificial neural networks - fundamentals - pooling: ● Zero-padding ● Controlling dimensions
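Zero-padding and the resulting output dimensions can be captured by the standard output-size formula floor((n - f + 2p) / s) + 1 for input size n, filter size f, padding p and stride s. A minimal sketch with hypothetical sizes:

```python
def conv_output_size(n, f, padding=0, stride=1):
    """Spatial output size of a convolution or pooling layer: floor((n - f + 2p) / s) + 1."""
    return (n - f + 2 * padding) // stride + 1


def zero_pad(img, p):
    """Surround a 2D grid with p rows/columns of zeros on every side."""
    width = len(img[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    padded += [[0] * p + list(row) + [0] * p for row in img]
    padded += [[0] * width for _ in range(p)]
    return padded


print(conv_output_size(7, 3))             # no padding: 7x7 -> 5x5
print(conv_output_size(7, 3, padding=1))  # padding 1 keeps 7x7
print(conv_output_size(4, 2, stride=2))   # 2x2 pooling with stride 2 halves 4x4 -> 2x2
print(zero_pad([[1, 2], [3, 4]], 1))
```

With p = 1, a 3x3 filter keeps a 7x7 input at 7x7, which is how padding lets networks stack many layers without shrinking the feature maps at every step.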
  65. Artificial neural networks - fundamentals - general network architecture (diagram): input image → convolutional layers → ... → final decision
  66. Artificial neural networks - fundamentals - hierarchical feature extractors (diagram): activations in the first layers respond to lines, edges, blobs, colours, ...; deeper layers respond to parts of abstract objects and to abstract objects
  67. Modern history of object recognition
  68. Benchmark datasets - PASCAL VOC: ● Classification and detection ○ ~11.5k images with ~27k annotated objects (VOC2012) ○ 20 classes ■ person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, tv/monitor
  69. Benchmark datasets - ImageNet: ● ~14M labeled images ● ~20k object categories ● Challenges (ILSVRC*) on a subset of ImageNet, usually 1,000 categories, including 90 out of 120 dog breeds (*ImageNet Large Scale Visual Recognition Challenge)
  70. Artificial neural networks - roadmap - AlexNet: ● ILSVRC 2012 winner by a large margin, reducing the top-5 error from ~25% to ~16% ● Proved the effectiveness of CNNs and kicked off a new era ● 8 layers, 650k neurons, 60M parameters
  71. Artificial neural networks - roadmap - ZFNet: ● ILSVRC 2013 winner with a best top-5 error of 11.6% ● AlexNet, but with smaller 7x7 first-layer kernels (instead of 11x11) to keep more information for the deeper layers
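ILSVRC classification results such as ZFNet's 11.6% are quoted as top-5 error: a prediction counts as correct if the true class is among the five highest-scoring classes. A minimal sketch with hypothetical scores over six classes:

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for s, label in zip(scores, labels):
        top_k = sorted(range(len(s)), key=lambda c: s[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


scores = [
    [0.5, 0.1, 0.1, 0.1, 0.1, 0.1],       # true class 0 ranked first
    [0.9, 0.02, 0.03, 0.02, 0.02, 0.01],  # true class 5 ranked last
    [0.4, 0.3, 0.2, 0.05, 0.03, 0.02],    # true class 2 ranked third
]
labels = [0, 5, 2]
print(top_k_error(scores, labels, k=1), top_k_error(scores, labels, k=5))
```

Top-5 error is more forgiving than top-1 error, which matters on a dataset with many visually similar classes such as dog breeds.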
  72. Artificial neural networks - roadmap - OverFeat: ● ILSVRC 2013 localization winner ● Uses an AlexNet-style network on multi-scale input images with a sliding window approach ● Accumulates bounding boxes for the final detection (instead of non-max suppression)
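For contrast with OverFeat's box accumulation, the standard alternative it replaces, greedy non-max suppression, can be sketched as follows: sort boxes by score, keep the best one, drop all remaining boxes that overlap it by more than an IoU (intersection over union) threshold, and repeat. Boxes are (x1, y1, x2, y2); the threshold of 0.5 and the example boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 is a near-duplicate of box 0
```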
  73. Artificial neural networks - roadmap - RCNN (region-based CNN): ● 2k region proposals generated by selective search ● SVMs trained for classification ● Multi-stage pipeline
  74. Artificial neural networks - roadmap - VGGNet: ● Not the ILSVRC 2014 classification winner, but famous for its simplicity and effectiveness ● Replaces large-kernel convolutions with stacks of several small-kernel convolutions
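The reason stacking small kernels pays off is simple arithmetic: n stacked stride-1 3x3 convolutions see a (2n + 1) x (2n + 1) region but need fewer weights than one large kernel with the same receptive field, and they add more non-linearities in between. A sketch assuming C input and C output channels per layer, with C = 256 chosen arbitrarily:

```python
def receptive_field(num_layers, k=3):
    """Receptive field of num_layers stacked stride-1 k x k convolutions."""
    return 1 + num_layers * (k - 1)


def conv_weights(k, channels):
    """Weights of one k x k convolution mapping C channels to C channels (biases ignored)."""
    return k * k * channels * channels


C = 256  # hypothetical channel count
for n in (2, 3):
    rf = receptive_field(n)
    stacked, single = n * conv_weights(3, C), conv_weights(rf, C)
    print(f"{n} stacked 3x3 layers: {rf}x{rf} receptive field, {stacked:,} weights "
          f"vs {single:,} for one {rf}x{rf} layer")
```

Independent of C, three 3x3 layers need 27·C² weights where a single 7x7 layer needs 49·C².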
  75. Artificial neural networks - roadmap - InceptionNet (GoogLeNet): ● ILSVRC 2014 winner ● Stacks “inception” modules ● 22 layers, 5M parameters
  76. Artificial neural networks - roadmap - Fast RCNN: ● Jointly learns classification and bounding-box regression in a single network (region proposals still come from an external method such as selective search) ● Uses a region of interest (RoI) pooling layer that lets all proposals reuse the convolutional features computed once per image
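A simplified sketch of RoI max pooling, the operation that lets every proposal reuse one shared feature map: each region, whatever its size, is divided into a fixed grid of bins and max-pooled to a fixed-size output. The 2x2 output, the integer bin boundaries and the feature map values are assumptions of this sketch, not the exact layer from the Fast RCNN paper.

```python
def roi_max_pool(fmap, roi, out_size=2):
    """Max-pool one region of interest of a feature map into an out_size x out_size grid.

    roi = (x1, y1, x2, y2) in feature-map cells, inclusive. Bins are split with
    integer arithmetic, a simplification of the original layer.
    """
    x1, y1, x2, y2 = roi
    h, w = y2 - y1 + 1, x2 - x1 + 1
    pooled = []
    for i in range(out_size):
        y_start = y1 + (i * h) // out_size
        y_end = max(y1 + ((i + 1) * h) // out_size, y_start + 1)  # every bin covers >= 1 row
        row = []
        for j in range(out_size):
            x_start = x1 + (j * w) // out_size
            x_end = max(x1 + ((j + 1) * w) // out_size, x_start + 1)
            row.append(max(fmap[y][x] for y in range(y_start, y_end) for x in range(x_start, x_end)))
        pooled.append(row)
    return pooled


# The feature map is computed once per image; each proposal only indexes into it.
fmap = [[y * 4 + x for x in range(4)] for y in range(4)]
for roi in [(0, 0, 3, 3), (1, 1, 3, 2)]:
    print(roi, roi_max_pool(fmap, roi))
```

Regions of different sizes both come out as 2x2, so the same fully-connected classifier head can process every proposal.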
  77. Artificial neural networks - roadmap - YOLO (you only look once): ● Directly predicts all objects and classes in one shot ● Very fast ● Processes images at ~45 FPS on a Titan X GPU ● First real-time state-of-the-art detector ● Divides the input image into a grid of cells; each cell predicts bounding boxes and class probabilities
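To make the grid idea concrete: in YOLO, the grid cell that contains an object's centre is responsible for predicting that object. A minimal sketch, assuming a 7x7 grid and a 448x448 input (the configuration of the original YOLO, used here only as example values) and a hypothetical box:

```python
def responsible_cell(box, img_w, img_h, S=7):
    """Grid cell (row, col) responsible for a box, plus the box centre's offset in that cell.

    box = (x1, y1, x2, y2) in pixels; the image is divided into an S x S grid.
    The offsets lie in [0, 1) and are what the cell regresses for the box centre.
    """
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 / img_w  # normalized centre in [0, 1]
    cy = (y1 + y2) / 2 / img_h
    col = min(int(cx * S), S - 1)
    row = min(int(cy * S), S - 1)
    return (row, col), (cx * S - col, cy * S - row)


cell, offset = responsible_cell((200, 100, 300, 300), img_w=448, img_h=448)
print(cell, offset)
```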
  78. Artificial neural networks - roadmap - ResNet (Microsoft): ● ILSVRC 2015 winner with a 3.6% top-5 error rate (human performance is estimated at 5-10%) ● Employs residual blocks, which allow very deep networks (hundreds of layers) to be trained ● Additional identity mapping (shortcut connection) around each block
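A minimal sketch of a residual block in plain Python, assuming two bias-free linear layers with a ReLU in between and a ReLU after the addition: the identity shortcut adds the input x back, so the layers only need to learn the residual F(x). The weights are hypothetical; real ResNet blocks use convolutions and batch normalization.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(v, W):
    """Matrix-vector product W . v (one output per row of W, no bias)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def residual_block(x, W1, W2):
    """y = ReLU(F(x) + x) with F(x) = W2 . ReLU(W1 . x); the '+ x' is the identity shortcut."""
    f = linear(relu(linear(x, W1)), W2)
    return relu([fi + xi for fi, xi in zip(f, x)])


x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
print(residual_block(x, zeros, zeros))        # F(x) = 0, so only ReLU(x) remains
print(residual_block(x, identity, identity))
```

If the learned weights are zero, the block reduces to ReLU(x), so extra blocks can fall back to passing their input through almost unchanged; this is one intuition for why very deep ResNets remain trainable.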
  79. Artificial neural networks - roadmap - MultiBox: ● Not a recognition network but a region proposal network ● Popularized prior/anchor boxes (found through clustering) from which offsets are predicted ● A much better strategy than starting the predictions from random coordinates ● Since then, heuristic approaches have gradually been fading out and replaced
  80. Artificial neural networks - roadmap - Faster RCNN: ● Fast RCNN with the heuristic region proposal replaced by a region proposal network (RPN) inspired by MultiBox ● The RPN shares full-image convolutional features with the detection network (nearly cost-free region proposals) ● The RPN acts as an “attention” mechanism that tells the network where to look ● ~5 FPS on a K40 GPU ● End-to-end training
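Predicting offsets relative to anchor boxes can be sketched as an encode/decode pair. The parameterization below (centre offsets scaled by the anchor size, log-scaled width and height) is the one used by Faster RCNN and SSD; it is assumed here for illustration and may differ from MultiBox's own encoding. The anchor and ground-truth boxes are hypothetical.

```python
import math


def encode(box, anchor):
    """Regression targets (tx, ty, tw, th) of a box relative to an anchor; boxes are (cx, cy, w, h)."""
    cx, cy, w, h = box
    ax, ay, aw, ah = anchor
    return ((cx - ax) / aw, (cy - ay) / ah, math.log(w / aw), math.log(h / ah))


def decode(offsets, anchor):
    """Invert encode(): turn predicted offsets back into a box (cx, cy, w, h)."""
    tx, ty, tw, th = offsets
    ax, ay, aw, ah = anchor
    return (ax + tx * aw, ay + ty * ah, aw * math.exp(tw), ah * math.exp(th))


anchor = (50.0, 50.0, 100.0, 50.0)
ground_truth = (60.0, 45.0, 120.0, 40.0)
targets = encode(ground_truth, anchor)
print([round(t, 4) for t in targets], decode(targets, anchor))
```

Because a good anchor is already close to the object, the network only has to learn small, well-scaled corrections instead of absolute coordinates.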
  81. Artificial neural networks - roadmap - SSD (single shot multibox detector): ● SSD leverages the RPN idea of Faster RCNN to directly classify objects inside each prior box (similar to YOLO) ● Predicts category scores and box offsets for a fixed set of default bounding boxes ● Fixes the predefined grid cells of YOLO by using default boxes with multiple aspect ratios ● Produces predictions at different scales ● ~59 FPS
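SSD's default boxes follow a simple rule: the scale grows linearly across the m feature maps, and each aspect ratio a stretches a box of scale s into width s·√a and height s/√a, centred on every feature-map cell. A sketch assuming the SSD paper's s_min = 0.2, s_max = 0.9 and m = 6 together with a reduced set of aspect ratios; all sizes are relative to the image.

```python
import math


def default_box_sizes(m=6, s_min=0.2, s_max=0.9, aspect_ratios=(1.0, 2.0, 0.5)):
    """Per feature map k = 1..m: list of (width, height) default boxes, relative to image size.

    Scales grow linearly: s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1);
    each aspect ratio a gives w = s_k * sqrt(a), h = s_k / sqrt(a).
    (The extra a = 1 box with scale sqrt(s_k * s_{k+1}) is omitted in this sketch.)
    """
    sizes = []
    for k in range(1, m + 1):
        s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1)
        sizes.append([(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in aspect_ratios])
    return sizes


def default_box_centres(fmap_size):
    """Centres ((i + 0.5) / f, (j + 0.5) / f), one per cell of an f x f feature map."""
    return [((i + 0.5) / fmap_size, (j + 0.5) / fmap_size)
            for j in range(fmap_size) for i in range(fmap_size)]


sizes = default_box_sizes()
print([round(s[0][0], 2) for s in sizes])  # scale of the square box per feature map
print(default_box_centres(2))
```

Early, high-resolution feature maps get small boxes and later, coarse feature maps get large ones, which is how SSD produces predictions at different scales.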
  82. Artificial neural networks - TensorFlow object detection API: ● TensorFlow: open-source software library for machine learning applications ● TensorFlow Object Detection API ○ A collection of pretrained models ○ Construct, train and deploy object detection models
  83. Summary
  84. Summary - human vs machine: ● Humans are good at understanding the big picture ● Neural networks are good at details ● But they can be fooled...
  85. Summary - computer vision is still difficult: ● Needs a large amount of data ● Lots of engineering ● Trial and error ● Long training times ● Still lots of hyperparameter tuning ● No general network (generalization is still an open question) ● Little mathematical foundation
  86. Summary - computer vision is hard: ● Despite all of these advances, the dream of having a computer interpret an image at the same level as a human remains unrealized
  87. Thank you. Stanislav Frolov, Big Data Engineer, sfrolov@inovex.de, 0173 318 11 35, inovex GmbH, Lindberghstraße 3, 80939 München
