Computer Vision – Old Problems and New Solutions
Gopi Krishna Nuti
Vice President, MUST Research
vp@must.co.in, ngopikrishna@gmail.com
Computer Vision – The Age-Old Problems
• What should a robot do for “scene understanding”?
  • Identify colours, brightness, etc.
  • Identify objects, a.k.a. image segmentation
    • Different things
    • Multiple occurrences of the same thing
    • Stuff other than things
  • Distance of things and stuff
    • Relative and absolute
Colour and Brightness
Colour spaces
• Grayscale, RGB, CMY
• Transparency/opacity using a fourth attribute (e.g. the alpha channel in RGBA)
Limitations
• Cannot represent all colours found in nature
• Colour perception is highly susceptible to lighting changes
New Solutions
• Colour spaces have been expanded greatly
• Counting micro- and macro-level variants, ~250 colour spaces are in use
  • HSV, HSL/HSI, YUV, YPbPr, YCbCr, etc.
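A minimal sketch of why alternative colour spaces help with the lighting problem: halving the brightness of an RGB colour changes all three RGB values, while in HSV only the value (V) component changes. The HSV conversion uses Python's standard `colorsys` module; the YCbCr coefficients are the full-range BT.601 (JPEG/JFIF) variant, which is only one of several YCbCr definitions. The pixel values are made up for illustration.

```python
import colorsys


def rgb8_to_hsv(r, g, b):
    """Convert 8-bit RGB (0-255) to HSV with every component in [0, 1]."""
    return colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)


def rgb8_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (BT.601 / JFIF coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# The same orange surface under full and half illumination (illustrative values).
bright = (200, 100, 50)
dim = (100, 50, 25)

h1, s1, v1 = rgb8_to_hsv(*bright)
h2, s2, v2 = rgb8_to_hsv(*dim)
print(f"hue {h1:.4f} vs {h2:.4f}, saturation {s1:.2f} vs {s2:.2f}, value {v1:.3f} vs {v2:.3f}")
print("YCbCr bright:", [round(c, 1) for c in rgb8_to_ycbcr(*bright)])
print("YCbCr dim:   ", [round(c, 1) for c in rgb8_to_ycbcr(*dim)])
```

Hue and saturation come out identical for both pixels; only V halves. This is a practical reason to threshold or cluster on hue rather than raw RGB when lighting varies, although hue becomes unstable for very dark or grey (low-saturation) pixels.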
Old Problem – Image Segmentation
Panoptic Segmentation: not a technique but a task definition, evaluated with its own metric (Panoptic Quality, PQ)
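For reference, a minimal per-class sketch of the Panoptic Quality (PQ) score: a predicted segment and a ground-truth segment count as a match (true positive) when their IoU exceeds 0.5, and PQ = (sum of IoUs over matches) / (TP + FP/2 + FN/2). Segments are represented as Python sets of (row, col) pixel coordinates purely for illustration, and the sketch assumes segments within each list do not overlap, which is what makes the IoU > 0.5 matching unique.

```python
def iou(a, b):
    """Intersection over union of two segments given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(pred, gt):
    """Per-class PQ. Non-overlapping segments with IoU > 0.5 can match at most once."""
    matched_ious = []
    matched_pred, matched_gt = set(), set()
    for i, p in enumerate(pred):
        for j, g in enumerate(gt):
            score = iou(p, g)
            if score > 0.5:
                matched_ious.append(score)
                matched_pred.add(i)
                matched_gt.add(j)
    tp = len(matched_ious)
    fp = len(pred) - len(matched_pred)   # predictions with no ground-truth match
    fn = len(gt) - len(matched_gt)       # ground-truth segments that were missed
    denominator = tp + 0.5 * fp + 0.5 * fn
    return sum(matched_ious) / denominator if denominator else 0.0


# Toy example: one ground-truth 2x2 object, one good prediction, one spurious blob.
gt = [{(0, 0), (0, 1), (1, 0), (1, 1)}]
pred = [{(0, 0), (0, 1), (1, 0)}, {(5, 5)}]
print(panoptic_quality(pred, gt))  # 0.5: IoU 0.75 / (1 TP + 0.5 * 1 FP)
```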
Old Problem – Image Segmentation
• An image is a matrix of numbers.
• How do we identify the edges of each object?
• How do we recognize each object correctly?
• How do we differentiate between “things” (foreground) and “stuff” (background)?
Image Segmentation – Old Solutions
Thresholding
• Algorithms: Otsu thresholding; adaptive local thresholding (mean, Gaussian)
• Drawbacks: suitable for reasonably simple scenarios only
Edges and Corners
• Algorithms: Canny, Sobel, Hough and Laplacian algorithms; Harris corner detection; convolution with kernels
• Drawbacks: unsuitable for noisy/blurry images
Region Growing
• Algorithms: Watershed
• Note: relatively strong at detecting overlapping/touching objects
Super Pixels
• Algorithms: SLIC (Simple Linear Iterative Clustering)
• Drawbacks: susceptible to noise; steep increase in algorithmic complexity
Clustering
• Algorithms: K-means; Fuzzy C-Means (FCM); Expectation Maximization (EM)
• Drawbacks: relies on low-level features such as colour; poor performance on complicated images
Clustering
• Algorithms: Image Pyramid
• Drawbacks: carefully controlled environments only; cannot handle transformations such as rotation and reflection; occlusions are a big no-no; compute intensive
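To make the thresholding family concrete, here is a minimal pure-Python sketch of Otsu's method on an 8-bit grayscale image stored as a list of rows: it picks the threshold that maximises the between-class variance of the two resulting pixel groups. The toy image values are invented for illustration; in practice a library routine such as OpenCV's `cv2.threshold` with the `cv2.THRESH_OTSU` flag would normally be used instead.

```python
def otsu_threshold(pixels, levels=256):
    """Return t maximising between-class variance; class 0 is values <= t."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(value * count for value, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of those pixel values
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: not a valid split
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_between:
            best_between, best_t = between, t
    return best_t


# Toy 4x4 "image": dark background (10-12) with a bright object (200-210).
image = [
    [10, 12, 10, 12],
    [10, 200, 210, 12],
    [12, 210, 200, 10],
    [10, 12, 12, 10],
]
flat = [p for row in image for p in row]
t = otsu_threshold(flat)
mask = [[1 if p > t else 0 for p in row] for row in image]
print(t, mask)
```

Because the threshold is computed from a single global histogram, this works well only when foreground and background intensities are well separated, which matches the "reasonably simple scenarios only" drawback listed for this family.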
Image Segmentation – Convolutional Neural Networks
• A specialized kind of neural network
• Process data that has a known, grid-like spatial structure (an image is a 2-D grid of pixels)
• Composed of many layers, such as convolution, pooling and fully connected layers
• Usually very deep, i.e. many layers and many weight parameters
• Non-linear activation functions are mandatory for learning complex features: without them, stacked convolution and fully connected layers collapse into a single linear (affine) map
http://cs231n.github.io/convolutional-networks/#overview
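The building blocks listed above can be sketched in a few lines of pure Python. This toy example is one hand-made "layer", not a trained CNN: a 3x3 Sobel kernel (the same kind of kernel used in classical edge detection) is slid over a small image, the ReLU non-linearity is applied, and 2x2 max pooling shrinks the result. As in most deep-learning libraries, the "convolution" is implemented as cross-correlation (the kernel is not flipped); in a real CNN the kernel weights are learned rather than hand-picked.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation, which CNN libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Non-linear activation: negative responses are clipped to zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 (an odd last row/column is dropped)."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


# Sobel kernel for horizontal intensity changes (it responds to vertical edges).
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# 6x6 toy image: a bright vertical stripe (9) on a dark background (0).
image = [[0, 0, 9, 9, 0, 0] for _ in range(6)]

response = conv2d(image, SOBEL_X)          # 4x4, every row is [36, 36, -36, -36]
features = max_pool_2x2(relu(response))    # 2x2
print(features)                            # [[36, 0], [36, 0]]
```

ReLU keeps the dark-to-bright edge and discards the bright-to-dark one; a real layer would learn a second kernel to capture the opposite edge direction.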
Evolution of CNN Classifiers
2014
• Regions with CNN Features (R-CNN)
2015
• Fast R-CNN
• Faster R-CNN
• Inception V3
2016
• YOLO
• SSD
• UberNet
2017
• Mask R-CNN
• Pixel-wise instance segmentation
Some Salient Points
R-CNN (Regions with CNN Features)
• Uses Selective Search
• Significantly reduced the search space to ~2000 region proposals per image
• Very slow and very complicated
Fast R-CNN (designed to solve the problems of R-CNN)
• Introduces RoI (Region of Interest) pooling: a pooling layer that turns each region proposal into a fixed-size feature map
• Jointly trains the feature extractor, classifier and bounding-box regressor as a single model
• Almost 25 times faster than R-CNN
Faster R-CNN (replaces Selective Search with a Region Proposal Network)
• 10 times faster than Fast R-CNN
YOLO (You Only Look Once)
• Treats detection as a regression problem
• Extremely fast but less accurate; struggles with small objects that appear in groups
SSD (Single Shot MultiBox Detector)
• Faster than YOLO and more accurate as well
Mask R-CNN (extension of Faster R-CNN)
• Predicts object masks as well as bounding boxes
• Impressive results
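Fast R-CNN's RoI pooling can be illustrated with a simplified sketch: whatever the size of a region proposal, its part of the feature map is split into a fixed grid of bins (2x2 here) and each bin is max-pooled, so every proposal yields an output of the same size for the fully connected layers. The sketch assumes integer, inclusive box coordinates already mapped onto the feature map; the bin-rounding rule (floor for the bin start, ceiling for the bin end) is one reasonable choice, not necessarily the exact one used by any particular implementation.

```python
import math


def roi_max_pool(feature, roi, out_h=2, out_w=2):
    """Max-pool the box roi = (x0, y0, x1, y1), inclusive, into an out_h x out_w grid."""
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0 + 1, x1 - x0 + 1
    pooled = []
    for i in range(out_h):
        # Floor for the bin start and ceil for the bin end, so every cell is covered.
        ys = y0 + math.floor(i * roi_h / out_h)
        ye = y0 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            xs = x0 + math.floor(j * roi_w / out_w)
            xe = x0 + math.ceil((j + 1) * roi_w / out_w)
            row.append(max(feature[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# Toy 8x8 feature map whose value at (row, col) is row * 8 + col.
feature = [[r * 8 + c for c in range(8)] for r in range(8)]

small = roi_max_pool(feature, (0, 0, 3, 3))   # 4x4 proposal
large = roi_max_pool(feature, (2, 1, 6, 7))   # 5-wide, 7-tall proposal
print(small)   # [[9, 11], [25, 27]]
print(large)   # [[36, 38], [60, 62]]
```

Both proposals produce a 2x2 output despite their different sizes, which is what lets Fast R-CNN run the convolutional layers once per image and reuse that feature map for every proposal.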
Old Problem – Depth Perception
Normal vision and depth perception expectation
Relative depth
Optical illusion based on depth
Picture of a picture: all pixels have the same depth
Old Solutions – Depth Perception
• Stereo cameras, spaced a fixed distance apart, capture the same scene from two viewpoints
• Remember trigonometry?
• Algorithm families
  • Triangulation
  • Interferometry
  • Time of Flight
• Many limitations
  • Cost
  • Complexity
  • Controlled environments only
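The trigonometry behind two of these families reduces to one-line formulas. For triangulation with a rectified stereo pair (pinhole cameras with focal length f in pixels and baseline B in metres), similar triangles give depth Z = f * B / d, where d is the disparity in pixels between the two views; time of flight gives Z = c * t / 2 from the round-trip time t of a light pulse. The focal length, baseline and timing values below are illustrative assumptions.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Triangulation for a rectified stereo pair: Z = f * B / d (metres)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive; zero disparity means a point at infinity")
    return focal_px * baseline_m / disparity_px


def depth_from_time_of_flight(round_trip_s):
    """Time of flight: the pulse travels to the object and back, so Z = c * t / 2."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2


# Illustrative numbers: 700 px focal length, 12 cm baseline.
print(depth_from_disparity(700, 0.12, 42))  # 2.0
print(depth_from_disparity(700, 0.12, 21))  # half the disparity, twice the depth
print(depth_from_time_of_flight(20e-9))     # roughly 3 m for a 20 ns round trip
```

Since depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant objects (small d) than for near ones, which is one source of the "controlled environments only" limitation.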
New Solutions – Depth Perception
• Furious research in progress
• Single camera moving between two fixed positions
• Monocular depth perception
• Some interesting proposals
  • Train a neural network on depth information and semantically segmented images
  • Use the trained models to predict depth in new images
Old Problem – Programmer’s Dilemma
• Which image format should I use?
• Which image file formats should I code for? Do I have to learn how to read and write image files?
• MATLAB is expensive
New Solution – OpenCV, Python, Pillow, etc.
• OpenCV
  • Democratized image processing
  • A large number of functionalities provided as APIs
  • Native C/C++ support, with impressive Python bindings and Java bindings
• Python
  • Pillow and many other libraries for reading images
  • Vectorization and NumPy arrays
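A minimal sketch of the vectorization point: a colour image read through Pillow (`np.asarray(Image.open(path))`, RGB channel order) or OpenCV (`cv2.imread(path)`, BGR channel order) is simply a NumPy array of shape (height, width, 3). Below, a random array stands in for a real file, and a grayscale conversion written as explicit Python loops is compared with the equivalent single vectorized expression. The BT.601 luma weights are an assumption chosen because they are the commonly documented RGB-to-gray weights.

```python
import numpy as np

# ITU-R BT.601 luma weights for R, G, B.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def gray_loop(img):
    """Pixel-by-pixel grayscale conversion with explicit Python loops (slow)."""
    height, width, _ = img.shape
    out = np.empty((height, width))
    for y in range(height):
        for x in range(width):
            r, g, b = img[y, x]
            out[y, x] = 0.299 * r + 0.587 * g + 0.114 * b
    return out


def gray_vectorized(img):
    """Same conversion as one matrix product over the colour axis (fast)."""
    return img.astype(np.float64) @ LUMA_WEIGHTS


# Random 8-bit RGB image standing in for an image loaded from disk.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(120, 160, 3), dtype=np.uint8)

print(np.allclose(gray_loop(img), gray_vectorized(img)))  # True
print(gray_vectorized(img).shape)                          # (120, 160)
```

The vectorized version pushes the per-pixel loop into NumPy's compiled code, which is typically much faster on real image sizes. For a BGR array from `cv2.imread`, the weight vector would need to be reversed.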
New Solutions – New Problems
Neural Networks
• Data hungry: lots and lots of training data
• Resource hungry and compute intensive
• Overfitting, underfitting, stochasticity
• Black box
Some solutions
• Transfer learning to reduce training time
• Hyperparameter tuning
• Hardware-based solutions for improving performance
• Ongoing research on explainability
• Ongoing research on reducing the training data requirement
• 3rd generation neural networks
Demos
