Computer vision old problems new solutions

Computer Vision - Old Problems and New Solutions
Gopi Krishna Nuti
Vice President, MUST Research
vp@must.co.in, ngopikrishna@gmail.com

Computer Vision - The (Age-)Old Problems
• What should a robot do in “scene understanding”?
• Identify colours, brightness etc.
• Identify objects, a.k.a. image segmentation
• Different things
• Multiple occurrences of the same thing
• Stuff other than things
• Distance of things and stuff
• Relative and absolute

Colour and Brightness
Colour spaces
• Grayscale, RGB, CMY
• Transparency/opacity using a fourth attribute
Limitations
• Do not represent all colours in nature
• Colour perception is highly susceptible to lighting changes
New Solutions
• Colour spaces have been expanded greatly
• Counting micro- and macro-level differences, ~250 colour spaces are in vogue
• HSV, HSL/HSI, YUV, YPbPr, YCbCr etc.
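
The lighting sensitivity noted under Limitations, and one reason spaces such as HSV are popular, can be illustrated with a minimal sketch using Python's standard `colorsys` module. The sample pixel and the "dimming" model (every channel scaled by the same factor) are assumptions made up for illustration, not a model of real illumination.

```python
import colorsys

# A hypothetical orange pixel, channels scaled to the 0..1 range.
bright = (0.9, 0.5, 0.1)
# The same surface under dimmer light: every channel halved (idealised assumption).
dim = tuple(0.5 * c for c in bright)

h1, s1, v1 = colorsys.rgb_to_hsv(*bright)
h2, s2, v2 = colorsys.rgb_to_hsv(*dim)

# In RGB all three numbers change; in HSV only the value channel moves.
print("RGB:       ", bright, dim)
print("hue:       ", round(h1, 4), round(h2, 4))
print("saturation:", round(s1, 4), round(s2, 4))
print("value:     ", round(v1, 4), round(v2, 4))
```

Under this idealised dimming, hue and saturation stay the same and only the value channel drops, so a colour test written on hue is less affected by brightness than one written on raw RGB.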

Old Problem - Image Segmentation
Panoptic Segmentation: not a technique, but a task definition with its own metric (Panoptic Quality)

Old Problem - Image Segmentation
• An image is a matrix of numbers
• How to identify the edges of each object
• How to recognize the object correctly
• How to differentiate between “things” (foreground) and “stuff” (background)

Image Segmentation - Old Solutions
(Solution family, with its algorithms and drawbacks)
Thresholding
• Algorithms: Otsu thresholding; adaptive local thresholding (mean, Gaussian)
• Drawbacks: suitable for reasonably simple scenarios only
Edges and Corners
• Algorithms: Canny, Sobel, Hough and Laplace algorithms; Harris corner detection; convolution of kernels
• Drawbacks: unsuitable for noisy or blurry images
Region Growing
• Algorithms: Watershed
• Notes: relatively strong at detecting overlapping or touching objects
Super Pixels
• Algorithms: SLIC algorithm
• Drawbacks: susceptible to noise; steep increase in algorithmic complexity
Clustering
• Algorithms: K-means; Fuzzy C-Means (FCM); Expectation Maximization (EM)
• Drawbacks: relies on low-level features such as colour; poor performance on complicated images
Clustering
• Algorithms: image pyramid
• Drawbacks: carefully controlled environments only; cannot handle transformations such as rotation and reflection; occlusions are a big no-no; compute intensive
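
Otsu thresholding, listed above under Thresholding, picks the grey level that best separates pixels into two classes by maximising the between-class variance. The following is a minimal pure-Python sketch; the function name and the toy pixel values are illustrative assumptions, and a library routine would normally be used in practice.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t maximising between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the lower class so far
    sum0 = 0  # intensity sum of the lower class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance (constant factor dropped).
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy flattened image: dark background (~10) with a few bright object pixels (~200).
pixels = [10, 12, 11, 10, 200, 202, 201, 12, 11, 199]
t = otsu_threshold(pixels)
mask = [1 if p > t else 0 for p in pixels]
print("threshold:", t)
print("mask:", mask)
```

The sketch works because the toy histogram has two well-separated peaks. This is the "reasonably simple scenarios only" drawback in action: with uneven lighting or overlapping intensity ranges a single global threshold no longer separates object from background.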

Image Segmentation - Convolutional Neural Networks
• A specialized kind of neural network
• Process data with a known grid-like spatial structure
• Composed of a large number of layers such as convolution, pooling and fully connected layers
• Usually very deep, i.e. many layers and many weight parameters
• Non-linear activation functions are needed for learning complex features
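
The three building blocks named above (convolution, a non-linear activation, pooling) can be sketched in plain Python on a toy image. This is only an illustration under stated assumptions: the kernel here is a hand-picked Sobel filter, whereas in a CNN the kernel weights are learned, and real networks stack many such layers over multi-channel tensors.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation (no padding), the operation CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Non-linear activation: negative responses are clipped to zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


# 6x6 toy image: dark left half (0), bright right half (9).
image = [[0, 0, 0, 9, 9, 9] for _ in range(6)]
# Hand-picked vertical-edge kernel (Sobel), standing in for learned weights.
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

response = conv2d(image, sobel_x)          # 4x4 map, strong where the edge is
features = max_pool(relu(response))        # 2x2 summary after activation and pooling
print(response[0])
print(features)
```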

http://cs231n.github.io/convolutional-networks/#overview

Evolution of CNN Classifiers
2014
• Regions with CNN Features
2015
• Fast R-CNN
• Faster R-CNN
• Inception V3
2016
• YOLO
• SSD
• UberNet
2017
• Mask R-CNN
• Pixel-wise instance segmentation

Some Salient Points
R-CNN: Regions with CNN Features
• Uses Selective Search
• Significantly reduced the search space to ~2000 region proposals
• Very slow and very complicated
Fast R-CNN: designed to solve the problems with R-CNN
• Region of Interest is treated as a pooling layer
• Jointly trains the feature extractor, classifier and bounding box regressor in a single model
• Almost 25 times faster than R-CNN
Faster R-CNN: replaces Selective Search with a region proposal network
• 10 times faster than Fast R-CNN
YOLO: You Only Look Once
• Detection is treated as a regression problem
• Extremely fast but less accurate; struggles with small objects that appear in groups
SSD: Single Shot MultiBox Detector
• Faster than YOLO and more accurate as well
Mask R-CNN: extension of Faster R-CNN
• Predicts object masks as well as bounding boxes
• Impressive results
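
The Fast R-CNN idea of treating a region of interest as a pooling layer can be sketched as follows: a region of any size is max-pooled into a fixed-size grid, so regions of different shapes can feed the same classifier. This is a simplified single-channel illustration; the function name, the integer bin-splitting rule and the toy feature map are assumptions, and real implementations differ in rounding details and work on multi-channel tensors.

```python
def roi_max_pool(fmap, roi, out_h=2, out_w=2):
    """Max-pool roi = (top, left, bottom, right), end-exclusive,
    into a fixed out_h x out_w grid regardless of the region's size."""
    top, left, bottom, right = roi
    h, w = bottom - top, right - left
    pooled = []
    for i in range(out_h):
        # Integer bin edges; max(..., start + 1) keeps every bin non-empty.
        r0 = top + (i * h) // out_h
        r1 = max(top + ((i + 1) * h) // out_h, r0 + 1)
        row = []
        for j in range(out_w):
            c0 = left + (j * w) // out_w
            c1 = max(left + ((j + 1) * w) // out_w, c0 + 1)
            row.append(max(fmap[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 5x6 feature map whose value encodes its position (row * 10 + column).
fmap = [[r * 10 + c for c in range(6)] for r in range(5)]

square = roi_max_pool(fmap, (0, 0, 4, 4))   # a 4x4 region
oblong = roi_max_pool(fmap, (1, 2, 4, 6))   # a 3x4 region
print(square)
print(oblong)
```

Both calls return a 2x2 grid even though the two regions differ in size, which is the property that lets one network head process every proposal.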

Old Problem - Depth Perception
• Normal vision and depth perception expectation
• Relative depth
• Optical illusion based on depth
• Picture of a picture: all pixels have the same depth

Old Solutions - Depth Perception
• Two cameras spaced a fixed distance apart capture the same scene
• Remember trigonometry?
• Algorithm Families
• Triangulation
• Interferometry
• Time of Flight
• Many Limitations
• Cost
• Complexity
• Controlled environments only
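
The trigonometry behind stereo triangulation reduces, for two parallel and rectified cameras, to the relation Z = f * B / d, where f is the focal length in pixels, B the distance between the cameras and d the disparity (the horizontal shift of a point between the two images). A minimal sketch follows; the focal length, baseline and disparity values are made-up illustrative numbers, not taken from any real rig.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point seen by two parallel, rectified cameras: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive; zero corresponds to a point at infinity")
    return focal_px * baseline_m / disparity_px


focal_px = 700.0    # assumed focal length, in pixels
baseline_m = 0.12   # assumed distance between the two cameras, in metres

for disparity in (84.0, 42.0, 8.4):
    z = depth_from_disparity(focal_px, baseline_m, disparity)
    print(f"disparity {disparity:5.1f} px -> depth {z:.2f} m")
```

Depth is inversely proportional to disparity, so far-away points produce tiny disparities and small matching errors turn into large depth errors. This is one reason the approach tends to need careful calibration and controlled conditions.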

New Solutions - Depth Perception
• Intense research in progress
• Single camera moving between two fixed positions
• Monocular depth perception
• Some interesting proposals
• Train a neural network with depth information and semantically segmented images
• Use the trained models to predict depth in new images

Old Problem –
Programmer’s Dilemma

Old Problem - Programmer’s Dilemma
• Which image format should I use?
• Which image file format should I code for? Do I have to learn how to read and write image files?
• MATLAB is expensive

New Solution - OpenCV, Python, Pillow etc.
• OpenCV
• Democratized image processing
• A large number of functionalities provided as APIs
• Impressive Python bindings, a native C/C++ API and Java bindings
• Python
• Pillow and many other libraries for reading images
• Vectorization and NumPy arrays

New Solutions - New Problems

Neural Networks
• Data hungry. Lots and lots of training data.
• Resource hungry and compute intensive.
• Overfitting, Underfitting, Stochasticity
• Black box

Some Solutions
• Transfer learning to reduce training time
• Hyperparameter tuning
• Hardware-based solutions for improving performance
• Ongoing research on explainability
• Ongoing research on reducing the training data requirement (3rd generation neural networks)
