My lecture at a Faculty Development Program on Computer Vision. It covers a breadth of topics in the field, from classical image processing algorithms and techniques to newer deep learning approaches.
1. Computer Vision - Old Problems and New Solutions
Gopi Krishna Nuti
Vice President, MUST Research
vp@must.co.in, ngopikrishna@gmail.com
2. Computer Vision - The (age) old problems
• What should a robot do in “Scene understanding”?
• Identify colours, brightness etc.
• Identify objects, a.k.a. Image Segmentation
• Different things
• Multiple occurrences of the same thing
• Stuff other than things
• Distance of things and stuff
• Relative and absolute
3. Colour and Brightness
Colour spaces
• Grayscale, RGB, CMY
• Transparency/opacity using a fourth attribute
Limitations
• Do not represent all colours in nature
• Colour perception is highly susceptible to lighting changes
New Solutions
• Colour spaces have been expanded greatly
• With micro and macro level differences, ~250 colour spaces are in vogue
• HSV, HSL/HSI, YUV, YPbPr, YCbCr etc.
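To make the point about lighting sensitivity concrete, here is a minimal sketch using Python's standard `colorsys` module. The sample pixel and the `darken` helper are hypothetical, chosen only for illustration, and uniform scaling is a crude stand-in for a real lighting change. Dimming the pixel alters all three RGB channels, whereas in HSV only the value (V) channel moves.

```python
import colorsys

def darken(rgb, factor):
    """Scale every RGB channel by the same factor (a crude model of dimmer lighting)."""
    return tuple(c * factor for c in rgb)

# Hypothetical pixel, channels in the range 0.0 to 1.0
bright = (0.8, 0.4, 0.2)
dim = darken(bright, 0.5)

h1, s1, v1 = colorsys.rgb_to_hsv(*bright)
h2, s2, v2 = colorsys.rgb_to_hsv(*dim)

# Hue and saturation stay (almost exactly) the same; only V is halved.
print("bright HSV:", round(h1, 3), round(s1, 3), round(v1, 3))
print("dim    HSV:", round(h2, 3), round(s2, 3), round(v2, 3))
```

This is one reason colour-based rules are often written against hue rather than raw RGB values.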
5. Old Problem - Image Segmentation
• An image is a matrix of numbers.
• How to identify the edges of each object?
• How to recognize the object correctly?
• Differentiate between “things” (foreground) and “stuff” (background)
6. Image Segmentation - Old Solutions
Solution Family | Algorithm | Drawbacks
Thresholding | Otsu thresholding; adaptive local thresholding (mean, Gaussian) | For reasonably simple scenarios only
Edges and Corners | Canny, Sobel, Hough and Laplace algorithms; Harris corner detection; convolution of kernels | Unsuitable for noisy/blurry images
Region Growing | Watershed | Relatively strong at detecting overlapping/touching objects
Super Pixels | SLIC algorithm | Susceptible to noise; steep increase in algorithmic complexity
Clustering | K-means; Fuzzy C-Means (FCM); Expectation Maximization (EM) | Relies on low level features like colour etc.; poor performance on complicated images
Clustering | Image Pyramid | Carefully controlled environments only; cannot handle transformations like rotation, reflection etc.; occlusions are a big no-no; compute intensive
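As an illustration of the thresholding family, the following is a minimal pure-Python sketch of Otsu thresholding: it picks the grey level that maximises the variance between the two resulting classes. The tiny `image` and the function name `otsu_threshold` are made up for this example; a real project would normally call a library routine instead.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximises between-class variance.

    Pixels with value <= t form the low class, pixels > t the high class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the low class so far
    sum0 = 0    # sum of intensities in the low class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue    # low class still empty
        if w1 == 0:
            break       # high class empty, nothing left to split
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Hypothetical 3x4 grayscale image: dark "stuff" on the left, bright "thing" on the right
image = [
    [12, 10, 200, 210],
    [11, 13, 205, 198],
    [10, 12, 202, 207],
]
pixels = [p for row in image for p in row]
t = otsu_threshold(pixels)
mask = [[1 if p > t else 0 for p in row] for row in image]
print("threshold:", t)
print("mask:", mask)
```

The sketch works because this toy histogram has two clean peaks. With noise or uneven lighting the peaks overlap, which is the "reasonably simple scenarios only" limitation noted above.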
7. Image Segmentation - Convolutional Neural Networks
• Specialized kind of neural networks
• Process data with a known grid-like spatial structure
• Composed of a large number of layers such as convolution, pooling and fully connected layers
• Usually very deep, i.e. lots of layers and lots of weight parameters
• Non-linear activation functions are mandatory for learning complex features
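The three building blocks named above (convolution, a non-linear activation and pooling) can be sketched in a few lines of pure Python. This is only an illustration: the 6x6 `image` and the hand-picked vertical-edge `kernel` are assumptions, whereas in a real CNN the kernel weights are learned during training.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap):
    """Non-linear activation: negative responses are clipped to zero."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]

# Hypothetical 6x6 image with a vertical edge between columns 2 and 3
image = [[0, 0, 0, 9, 9, 9] for _ in range(6)]
# Hand-picked vertical-edge kernel (a trained network would learn its own)
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

fmap = conv2d(image, kernel)      # 4x4 feature map, strong response at the edge
pooled = max_pool(relu(fmap))     # 2x2 summary after activation and pooling
print(fmap)
print(pooled)
```

Stacking many such layers, each with many learned kernels, is what makes these networks deep and parameter-heavy.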
10. Some Salient points
R-CNN: Regions with CNN Features
• Uses Selective Search
• Significantly reduced the search space to ~2000 region proposals
• Very slow and very complicated
Fast R-CNN: Designed to solve the problems with R-CNN
• Region of Interest (RoI) is treated as a pooling layer
• Jointly trains the feature extractor, classifier and bounding box regressor as a single model
• Almost 25 times faster than R-CNN
Faster R-CNN: Replaces Selective Search with a region proposal network
• 10 times faster than Fast R-CNN
YOLO: You Only Look Once
• Detection is treated as a regression problem
• Extremely fast but less accurate. Struggles with small objects that appear in groups
SSD: Single Shot MultiBox Detector
• Faster than YOLO and more accurate as well
Mask R-CNN: Extension of Faster R-CNN
• Predicts object masks as well as bounding boxes
• Impressive results
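The idea of treating a Region of Interest as a pooling layer can be sketched as follows: whatever the size of the proposed region, it is divided into a fixed grid and each cell is max-pooled, so every region yields an output of the same shape. This is a simplified illustration, not the exact Fast R-CNN implementation; the feature map, the region coordinates and the bin-rounding scheme are all assumptions made for the example.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -((-a) // b)

def roi_max_pool(fmap, roi, out_h=2, out_w=2):
    """Max-pool the region roi = (top, left, bottom, right) into an out_h x out_w grid.

    bottom and right are exclusive. Bin starts are rounded down and bin ends
    rounded up, so the bins cover the whole region and are never empty.
    """
    top, left, bottom, right = roi
    h, w = bottom - top, right - left
    pooled = []
    for i in range(out_h):
        r0 = top + (i * h) // out_h
        r1 = top + ceil_div((i + 1) * h, out_h)
        row = []
        for j in range(out_w):
            c0 = left + (j * w) // out_w
            c1 = left + ceil_div((j + 1) * w, out_w)
            row.append(max(fmap[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled

# Hypothetical 4x6 feature map
fmap = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 1, 2, 3],
    [4, 5, 6, 7, 8, 9],
    [1, 3, 5, 7, 9, 2],
]
pooled_a = roi_max_pool(fmap, (0, 0, 4, 4))   # a 4x4 region
pooled_b = roi_max_pool(fmap, (1, 1, 4, 6))   # a 3x5 region
print(pooled_a)
print(pooled_b)
```

Both regions end up as 2x2 outputs, which is what lets a single shared network handle thousands of differently sized proposals.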
12. Old Solutions - Depth Perception
• Stereo cameras spaced a fixed distance apart capture the same scene.
• Remember trigonometry?
• Algorithm Families
• Triangulation
• Interferometry
• Time of Flight
• Many Limitations
• Cost
• Complexity
• Controlled environments only
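The trigonometry behind stereo triangulation reduces to one formula when the two cameras are identical, parallel and rectified (all assumptions of this sketch): depth = focal length x baseline / disparity, where disparity is how far a point shifts horizontally between the two images. The function name and the numbers below are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres of a point seen by two rectified, parallel cameras.

    focal_px     -- focal length expressed in pixels
    baseline_m   -- distance between the two camera centres, in metres
    disparity_px -- horizontal shift of the point between the two images, in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, cameras 12 cm apart
near = depth_from_disparity(700, 0.12, 42)   # large shift, close object
far = depth_from_disparity(700, 0.12, 21)    # half the shift, twice the depth
print(round(near, 3), round(far, 3))
```

Because depth is inversely proportional to disparity, distant objects produce very small shifts, so small matching errors turn into large depth errors.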
13. New Solutions - Depth Perception
• Intense research in progress
• Single camera moving between two fixed positions
• Monocular depth perception
• Some interesting proposals
• Train a neural network with depth information and semantically segmented images
• Use the trained models to predict depth in new images
15. Old Problem - Programmer's Dilemma
• Which image format should I use?
• Which image file format should I code for? Do I have to learn reading and writing image files?
• MATLAB is expensive
16. New Solution - OpenCV, Python, Pillow etc.
• OpenCV
• Democratized image processing
• A large number of functionalities provided as APIs
• Impressive Python bindings and native support for C/C++, Java
• Python
• Pillow and many other libraries for reading images
• Vectorization and NumPy arrays
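To show what vectorization buys, here is a small sketch that converts an RGB array to grayscale twice: once with explicit Python loops and once with a single NumPy expression. In practice the array would come from a library such as Pillow or OpenCV; a tiny hand-built array is used here so the example stays self-contained. The luma weights (0.299, 0.587, 0.114) are one common choice and are an assumption of this example.

```python
import numpy as np

# Hypothetical 2x2 RGB image, shape (height, width, channels)
rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

WEIGHTS = np.array([0.299, 0.587, 0.114])   # assumed luma weights for R, G, B

def to_gray_loop(img):
    """Pixel-by-pixel conversion with explicit Python loops."""
    h, w, _ = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for y in range(h):
        for x in range(w):
            r, g, b = (float(c) for c in img[y, x])
            out[y, x] = WEIGHTS[0] * r + WEIGHTS[1] * g + WEIGHTS[2] * b
    return out

# Vectorized conversion: one matrix product over the channel axis
gray = rgb @ WEIGHTS

print(gray)
print(np.allclose(gray, to_gray_loop(rgb)))
```

Both versions give the same result, but the vectorized form pushes the loop into compiled code, which matters a great deal on full-size images.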
18. Neural Networks
• Data hungry. Lots and lots of training data.
• Resource hungry and compute intensive.
• Overfitting, Underfitting, Stochasticity
• Black box
19. Some solutions
• Transfer learning to reduce training time
• Hyperparameter tuning
• Hardware-based solutions for improving performance
• Ongoing research for explainability
• Ongoing research for reducing the training data requirement
• 3rd generation neural networks
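Hyperparameter tuning, in its simplest form, is a search over candidate settings scored on held-out data. The sketch below is a toy illustration only: it tunes the learning rate of a one-parameter model (y = w * x) trained by gradient descent. The data, the candidate values and the function names are all hypothetical.

```python
def train(xs, ys, lr, epochs=50):
    """Fit y = w * x by gradient descent on mean squared error; return w."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w

def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data following y = 2x, split into training and validation sets
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
val_x, val_y = [5.0, 6.0], [10.0, 12.0]

# Grid search: train once per candidate, score on the validation set
results = {}
for lr in [0.0001, 0.001, 0.01, 0.05, 0.2]:
    w = train(train_x, train_y, lr)
    results[lr] = mse(w, val_x, val_y)

best_lr = min(results, key=results.get)
print("validation error per learning rate:", results)
print("best learning rate:", best_lr)
```

Too small a learning rate leaves the model undertrained after 50 epochs, while the largest one makes training diverge. Real networks have many more hyperparameters, which is why the search becomes expensive.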