Computer Vision

2014/9/23
John Lu

Agenda
● Human Vision & Image Concept
● Introduction to Computer Vision
● Basic operations of Image Processing
● Computer Vision’s approaches
● Accelerating Computer Vision
● Summary

Human Vision & Image Concept
● Consciousness
● Sight and Light
● Eye to Vision
● Pin-hole system (Camera)
● Quantization & Resolution

Consciousness
● Humans have several senses (sight, hearing, touch, smell, taste).
● The most important sense is sight.
● Colour and the plastic (three-dimensional) effect are the most important aspects of sight.

Sight and Light
The visible spectrum is the portion of the electromagnetic spectrum that is
visible to (can be detected by) the human eye. Electromagnetic radiation in this
range of wavelengths is called visible light or simply light. A typical human eye
will respond to wavelengths from about 390 to 700 nm. In terms of
frequency, this corresponds to a band in the vicinity of 430–770 THz.
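The wavelength and frequency limits are two views of the same band, linked by f = c / λ. A minimal sketch of that conversion follows; the function name is illustrative and vacuum wavelengths are assumed.

```python
# Speed of light in vacuum, in metres per second (exact by definition).
C = 299_792_458.0


def wavelength_nm_to_thz(wavelength_nm: float) -> float:
    """Convert a vacuum wavelength in nanometres to a frequency in terahertz."""
    wavelength_m = wavelength_nm * 1e-9
    return C / wavelength_m / 1e12


# Shorter wavelengths map to higher frequencies, so the band edges swap order.
for nm in (390, 700):
    print(f"{nm} nm -> {wavelength_nm_to_thz(nm):.0f} THz")
```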

Eye to Vision
● Photoreceptor cell
● Rod: scotopic vision
● Cone: photopic vision

Quantization & Resolution
How to make an image: digitization
● Resolution: sampling discretizes space, that is, the domain of the
image function, giving $f : [1, \dots, N] \times [1, \dots, M] \to \mathbb{R}^m$.
● Quantization discretizes the intensity values, that is, the co-domain
of the function.
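The two steps can be sketched on a synthetic scene. This is only an illustration: the `scene` function, the unit-square domain, and the grid sizes are assumptions, not part of the original slides.

```python
import math


def sample(f, n_rows, n_cols):
    """Sampling: evaluate a continuous image f(x, y) on an n_rows x n_cols grid.

    The domain is assumed to be the unit square; each pixel uses its centre.
    """
    return [[f((c + 0.5) / n_cols, (r + 0.5) / n_rows) for c in range(n_cols)]
            for r in range(n_rows)]


def quantize(image, levels):
    """Quantization: map intensities in [0, 1] to integers 0 .. levels - 1."""
    return [[min(int(v * levels), levels - 1) for v in row] for row in image]


def scene(x, y):
    """Hypothetical continuous scene with intensities in [0, 1]."""
    return 0.5 + 0.5 * math.sin(2 * math.pi * x) * math.cos(2 * math.pi * y)


coarse = quantize(sample(scene, 4, 4), levels=4)    # low resolution, 2 bits
fine = quantize(sample(scene, 8, 8), levels=256)    # higher resolution, 8 bits
```

Changing the grid size alters the domain only; changing `levels` alters the co-domain only.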

Introduction to Computer Vision
● Like Human Vision
● Related Technologies
● Related Applications
● Image definition
● Image types
● Image Transforms

Like Human Vision
But…
● Needs high processing performance
● Needs cognitive awareness of the environment
● Must adapt to changes in environmental brightness and
color
● Features must operate simultaneously and
complement one another

Related Technologies
● Image Processing
● 3D information restoration
● Recognition
● Understanding

Related Applications
● Watermarking
● Pattern recognition
● 3D computer vision
● Motion analysis & tracking

Image definition
Images may be two-dimensional,
such as a photograph or screen
display, or three-dimensional,
such as a statue or
hologram.

Image types
● Number of spectral bands
● Device
● Range image

Image Transforms
● Geometric transformation:
Linear Transform, Euclidean Transform...

Image Transforms
● Space transformation:
Fourier transform, Discrete cosine transform, Wavelet
transform...

Basic operations of Image Processing
● Image enhancement
● Image segmentation
● Feature extraction
● Image representation

Image enhancement
● Spatial-domain enhancement and
Frequency-domain enhancement
● Spatial-domain enhancement: Noise
removal, Smoothing, Sharpening.
● Frequency-domain enhancement: Filter

Image enhancement
Transformation function:
Basic transforms (bi-level):
Image negatives, Log transform, Contrast stretching, and Power-law
(Gamma) transform
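The formulas for these transforms were shown as figures and are not in the text. The sketch below uses the common textbook forms, assuming 8-bit gray levels (L = 256); the function names and the piecewise-linear form of contrast stretching are illustrative choices.

```python
import math

L = 256  # number of gray levels, assuming 8-bit images


def negative(r):
    """Image negative: s = (L - 1) - r."""
    return (L - 1) - r


def log_transform(r):
    """Log transform: s = c * log(1 + r), with c chosen so 255 maps to 255."""
    c = (L - 1) / math.log(L)
    return round(c * math.log(1 + r))


def gamma_transform(r, gamma):
    """Power-law transform: s = (L - 1) * (r / (L - 1)) ** gamma."""
    return round((L - 1) * (r / (L - 1)) ** gamma)


def contrast_stretch(r, r_min, r_max):
    """Linearly map [r_min, r_max] onto [0, L - 1], clipping values outside."""
    if r <= r_min:
        return 0
    if r >= r_max:
        return L - 1
    return round((L - 1) * (r - r_min) / (r_max - r_min))
```

A gamma below 1 brightens dark regions, a gamma above 1 darkens them; the log transform behaves like a fixed strong brightening of dark values.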

Image enhancement
Image negatives:

Image enhancement
Log transform:

Image enhancement
Contrast stretching:

Image enhancement
Power-law (Gamma) transform:

Image enhancement
Mask operator: Smoothing and Sharpening
Mask sizes: 3×1, 3×3, 5×5, ..., 31×31
Neighborhood averaging, Median filtering, High-boost filter

Image enhancement
Neighborhood averaging:
Gaussian smoothing:
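The masks themselves were shown as figures. As a sketch, the 3×3 box mask and the 3×3 binomial approximation of a Gaussian below are common textbook choices, assumed here; `apply_mask` and the border replication are illustrative.

```python
def apply_mask(image, mask):
    """Correlate a 2-D image (list of rows) with an odd-sized mask.

    Pixels outside the image are replaced by the nearest border pixel.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(mask), len(mask[0])
    oy, ox = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    yy = min(max(y + j - oy, 0), h - 1)
                    xx = min(max(x + i - ox, 0), w - 1)
                    acc += mask[j][i] * image[yy][xx]
            out[y][x] = acc
    return out


# Neighborhood averaging: every neighbor has the same weight.
BOX_3x3 = [[1 / 9] * 3 for _ in range(3)]
# Gaussian smoothing (binomial approximation): the centre weighs most.
GAUSS_3x3 = [[1 / 16, 2 / 16, 1 / 16],
             [2 / 16, 4 / 16, 2 / 16],
             [1 / 16, 2 / 16, 1 / 16]]

# Flat image with a single noisy pixel in the middle.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 100
box = apply_mask(noisy, BOX_3x3)
gauss = apply_mask(noisy, GAUSS_3x3)
```

Both masks sum to 1, so flat regions keep their intensity while the outlier is pulled toward its neighbors.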

Image enhancement
Median filtering:
The main idea of the median filter is to run through the signal entry by
entry, replacing each entry with the median of neighboring entries.
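A minimal one-dimensional sketch of this idea, assuming an odd window and replicated borders (both are illustrative choices):

```python
from statistics import median


def median_filter(signal, window=3):
    """Replace each entry with the median of its window (odd window size)."""
    half = window // 2
    # Replicate the first and last entries so every window is full.
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(signal))]


# The spike (80) is removed, while the rising trend is preserved.
print(median_filter([2, 3, 80, 4, 5]))  # prints [2, 3, 4, 5, 5]
```

Unlike averaging, the outlier does not leak into its neighbors, which is why the median filter suits impulse (salt-and-pepper) noise.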

Image enhancement
High-boost filter: It is often desirable to emphasize high
frequency components representing the image details (by means such
as sharpening) without eliminating low frequency components
representing the basic form of the signal.
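One common formulation, assumed here, is high-boost = A × original − low-pass, which equals (A − 1) × original + high-pass. The one-dimensional sketch below uses a 3-tap average as the low-pass part; the names and the default value of `a` are illustrative.

```python
def smooth(signal):
    """3-tap moving average with replicated borders (the low-pass part)."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3
            for i in range(len(signal))]


def high_boost(signal, a=1.5):
    """High-boost filter: a * original - low_pass.

    With a = 1 this is a plain high-pass filter; a > 1 keeps part of the
    original signal, so the low-frequency content is not eliminated.
    """
    low = smooth(signal)
    return [a * s - l for s, l in zip(signal, low)]


step = [0, 0, 0, 9, 9, 9]
boosted = high_boost(step, a=2)  # the edge overshoots on both sides
```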

Image enhancement
Frequency high-pass filter:

Image segmentation
Segmentation is based on discontinuities between pixels in the image,
which are used to detect basic features: points, lines, edges, corners.

Image segmentation
Isolated point detection
Mask:
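The mask was shown as a figure. A widely used choice, assumed here, is the 3×3 Laplacian-type mask with 8 in the centre and −1 elsewhere; a pixel is flagged when the absolute response exceeds a threshold. The function name and threshold are illustrative.

```python
POINT_MASK = [[-1, -1, -1],
              [-1,  8, -1],
              [-1, -1, -1]]


def detect_points(image, threshold):
    """Return (row, col) of interior pixels whose |mask response| > threshold."""
    h, w = len(image), len(image[0])
    hits = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            response = sum(POINT_MASK[j][i] * image[y + j - 1][x + i - 1]
                           for j in range(3) for i in range(3))
            if abs(response) > threshold:
                hits.append((y, x))
    return hits


image = [[10] * 5 for _ in range(5)]
image[2][2] = 100                      # one isolated bright point
print(detect_points(image, 400))       # prints [(2, 2)]
```

The mask sums to zero, so flat regions give no response at all.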

Image segmentation
Line detection
Masks for 0°, +45°, 90°, −45°:

Image segmentation
How many lines?

Image segmentation
Use Hough transform (Global processing)
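A minimal sketch of the voting idea behind the Hough transform for straight lines, assuming the normal parameterization rho = x·cos(theta) + y·sin(theta) with 1° angular steps and rho rounded to whole pixels. The names and bin sizes are illustrative.

```python
import math
from collections import Counter


def hough_lines(points, n_theta=180):
    """Accumulate votes in (rho, theta index) space for a set of edge points."""
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] += 1
    return votes


# Edge points from two lines: y = 5 (horizontal) and x = 3 (vertical).
points = sorted({(x, 5) for x in range(10)} | {(3, y) for y in range(10)})
votes = hough_lines(points)

# Each line collects one vote per point on it in its own (rho, theta) bin.
horizontal_votes = votes[(5, 90)]   # rho = 5, theta = 90 degrees
vertical_votes = votes[(3, 0)]      # rho = 3, theta = 0 degrees
```

Counting the strong peaks in the accumulator answers "how many lines?" globally, even when the line pixels are not connected. Neighboring bins can tie with a peak, so practical implementations add non-maximum suppression.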

Image segmentation
Edge:
● Edges are pixels where the image
function changes abruptly.
● Derivatives can be used to detect
the presence of an edge
● Derivatives are sensitive to noise, even in small amounts

Image segmentation
Edge detection:
● Gradient operators: Prewitt and Sobel
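A small sketch of both operators using their standard 3×3 masks. The gradient magnitude is computed for interior pixels only; the helper names are illustrative.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_Y = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]


def response(image, mask, y, x):
    """Response of a 3x3 mask centred on pixel (y, x)."""
    return sum(mask[j][i] * image[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))


def gradient_magnitude(image, mask_x, mask_y):
    """Gradient magnitude sqrt(gx^2 + gy^2); border pixels are left at 0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = response(image, mask_x, y, x)
            gy = response(image, mask_y, y, x)
            out[y][x] = math.hypot(gx, gy)
    return out


# Vertical step edge between columns 2 and 3.
step = [[0, 0, 0, 9, 9, 9] for _ in range(5)]
sobel = gradient_magnitude(step, SOBEL_X, SOBEL_Y)
prewitt = gradient_magnitude(step, PREWITT_X, PREWITT_Y)
```

Sobel weights the centre row and column twice, which adds a little smoothing compared with Prewitt.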

Feature extraction
● A feature is defined as an "interesting" part
of an image, and features are used as a
starting point for many computer vision
algorithms.
● Feature detection: LoG, DoG, Hough
Transform...
● High-level feature extraction: HOG, SIFT,
SURF

Feature extraction
Laplacian of Gaussian (LoG)
Laplacian filters are derivative filters used to find areas of rapid change
(edges) in images. Since derivative filters are very sensitive to noise, it
is common to smooth the image (e.g., using a Gaussian filter) before
applying the Laplacian. This two-step process is called the Laplacian of
Gaussian (LoG) operation.
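Because convolution is associative, the two steps can be folded into a single kernel. The sketch below samples the continuous LoG function and then subtracts the mean so the discrete kernel sums to zero; the zero-sum correction, function name, and parameter values are illustrative assumptions.

```python
import math


def log_kernel(sigma, radius):
    """Sampled Laplacian-of-Gaussian kernel of size (2*radius + 1) squared.

    LoG(x, y) = -1 / (pi * sigma^4) * (1 - r^2 / (2 sigma^2)) * exp(-r^2 / (2 sigma^2))
    """
    s2 = 2.0 * sigma * sigma
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            r2 = x * x + y * y
            row.append(-(1.0 - r2 / s2) * math.exp(-r2 / s2)
                       / (math.pi * sigma ** 4))
        kernel.append(row)
    # Remove the small residual so flat regions give exactly no response.
    size = 2 * radius + 1
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]


kernel = log_kernel(sigma=1.0, radius=4)
```

Edges are then located at the zero crossings of the filtered image.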

Feature extraction
Difference of Gaussian (DoG)
Differences of Gaussians have also been used for blob detection in the
scale-invariant feature transform. In fact, the DoG, as the difference of
two multivariate normal distributions, always sums to zero, so
convolving it with a uniform signal generates no response.
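The zero-sum property can be checked numerically. In this sketch each Gaussian is normalized to sum to 1 on the discrete grid; the sigma values and kernel radius are illustrative.

```python
import math


def gaussian_kernel(sigma, radius):
    """Sampled 2-D Gaussian, normalized so the weights sum to 1."""
    raw = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            for x in range(-radius, radius + 1)]
           for y in range(-radius, radius + 1)]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]


def dog_kernel(sigma_narrow, sigma_wide, radius):
    """Difference of Gaussians: narrow Gaussian minus wide Gaussian."""
    g1 = gaussian_kernel(sigma_narrow, radius)
    g2 = gaussian_kernel(sigma_wide, radius)
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(g1, g2)]


dog = dog_kernel(1.0, 1.6, radius=5)
kernel_sum = sum(map(sum, dog))
# Response at one pixel of a uniform image with intensity 42.
uniform_response = sum(v * 42 for row in dog for v in row)
```

Both `kernel_sum` and `uniform_response` are zero up to floating-point rounding.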

Feature extraction
Histogram of Oriented Gradients (HOG)

Feature extraction
Scale Invariant Feature Transform (SIFT)
An important characteristic of these features is that the relative
positions between them in the original scene shouldn't change from one
image to another. For example, if only the four corners of a door were
used as features, they would work regardless of the door's position.
SIFT can robustly identify objects even among clutter and under partial
occlusion, because the SIFT feature descriptor is invariant to uniform
scaling, orientation, and partially invariant to affine distortion and
illumination changes.

Feature extraction
Scale Invariant Feature Transform (SIFT)

Feature extraction
Speeded Up Robust Features (SURF)
The standard version of SURF is several times faster than SIFT and
is claimed by its authors to be more robust against different image
transformations than SIFT.
It uses an integer approximation to the determinant of Hessian blob
detector, which can be computed extremely quickly with an integral
image (3 integer operations). For features, it uses the sum of the Haar
wavelet response around the point of interest.
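The speed comes from the integral image (summed-area table): once it is built, the sum over any axis-aligned box needs only four lookups combined with three additions or subtractions, whatever the box size. A minimal sketch with illustrative names:

```python
def integral_image(image):
    """ii[y][x] = sum of image[0:y][0:x]; one extra row and column of zeros."""
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 in O(1)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
ii = integral_image(image)
print(box_sum(ii, 1, 1, 3, 3))  # prints 28 (5 + 6 + 8 + 9)
```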

Feature extraction
Speeded Up Robust Features (SURF)

Image representation
Image representation is used as a pre-processing
step for image recognition.
● Boundary representation
● Skeleton representation

Boundary representation
Chain code, Polygonal representation, Gray-level histogram, Boundary
segments

Skeleton representation

Computer Vision’s approaches
● Pattern Recognition(PR)
○ Facial Recognition: Gender, Age, Expression
○ Object Recognition: Hand, Shape, License Plate
○ Human Behavior Analysis
○ Background Subtraction
● Popular Issues
○ Advanced Safety Vehicle (ASV)
○ Autonomous Land Vehicles (ALV)
○ Virtual Reality & Augmented Reality & Eye Tracking

PR - Facial Recognition
The Basic Framework

Features for Facial information: Geometry,
Texture, Facial parts

Histogram of some features of pixels
● Local binary patterns (LBP)
● Histograms of oriented gradients (HOG)

Local binary patterns (LBP):
LBP has become the choice of features for
gender classification.
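A sketch of the basic 3×3 LBP operator: each pixel is compared with its eight neighbors, the comparisons form an 8-bit code, and the histogram of codes over a region is the feature vector. The bit order (clockwise from the top-left neighbor) and the `>=` comparison are conventions assumed here.

```python
# Neighbor offsets (dy, dx), clockwise starting at the top-left neighbor.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, y, x):
    """8-bit LBP code of the interior pixel (y, x)."""
    center = image[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        if image[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            hist[lbp_code(image, y, x)] += 1
    return hist


patch = [[9, 1, 1],
         [1, 5, 1],
         [1, 1, 1]]
print(lbp_code(patch, 1, 1))  # prints 1 (only the top-left neighbor is >= 5)
```

In face analysis the image is usually split into blocks and the per-block histograms are concatenated before being passed to a classifier.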

LBP and Gender Classification

Moghaddam, Baback, and Ming-Hsuan Yang. "Learning gender with support faces." IEEE Transactions on Pattern Analysis and Machine Intelligence 24.5 (2002): 707-711.

Age Estimation
Geng, Xin, Zhi-Hua Zhou, and Kate Smith-Miles. "Automatic age estimation based on facial aging patterns." IEEE Transactions on Pattern Analysis and Machine Intelligence 29.12 (2007): 2234-2240.

Age Estimation: Some Age Manifolds

Expression Recognition: Typical Framework

Expression Recognition: Confusion Matrix

PR - Object Recognition
Hand Detection
1. Skin Color
2. Connected Component
3. Minimum Convex Polygon
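Step 3 can be sketched as a convex hull computation. The code assumes steps 1 and 2 have already produced the (x, y) coordinates of one connected skin-colored region, and it uses Andrew's monotone chain algorithm, which is one possible choice rather than the method used in the slides.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Smallest convex polygon containing all points, counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


# Hypothetical pixel coordinates of a detected region.
region = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
print(convex_hull(region))  # prints [(0, 0), (4, 0), (4, 4), (0, 4)]
```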

Hand Direction Detection
● Find Longest Axis Direction

Shape Recognition
● Shape Context
● Histogram
● Grid-based
● Edge

Shape Context
1. Finding a list of points on shape edges
2. Computing the shape context
3. Comparing shape context

Histogram
● Histogram projection
● Histogram distance

Grid-based

Edge: Chamfer Distance

License Plate Recognition
1. LP detection: AdaBoost + Haar-like features
2. Character Segmentation
3. NFL-based classification

License Plate Recognition
Accuracy measured on a dataset of 6498 images; speed measured at an image size of 320 x 240.

Method        | Detection rate | False alarms | Detector layers / features | Training time (minutes) | Speed (frames per second)
FFS           | 89.77%         | 1598         | 10 / 737                   | 52                      | 32.26
Fast Adaboost | 91.81%         | 1472         | 7 / 387                    | 69                      | 25.6

License Plate Recognition Issues
● Low spatial resolution
● Blurred image
● Low contrast, Overexposure, Bad lighting
conditions
● High distortion

Computer Vision’s approaches
● Pattern Recognition(PR)
○ Facial Recognition: Gender, Age, Emotion
○ Object Recognition: Hand, Shape, License Plate
○ Human Behavior Analysis
○ Background Subtraction
● Popular Issues
○ Advanced Safety Vehicle (ASV)
○ Autonomous Land Vehicles (ALV)
○ Virtual Reality & Augmented Reality & Eye Tracking

Human Behavior Analysis
Using String Representation

Background Subtraction
What can we do?
Problem: Is a spray pond (fountain) part of the foreground?

Background Subtraction
Background Modeling by Codebook

Virtual Reality & Augmented Reality

Accelerating Computer Vision
● Single Core vs. Multi-Core
● Parallel Programming
● GPGPU
● OpenCV
● Android Renderscript

Single Core vs. Multi-Core
Keeping up with Moore’s Law requires multi-core

Parallel Programming
Flynn's taxonomy

Parallel Programming
SIMD:
A computer which exploits multiple data
streams against a single instruction stream to
perform operations which may be naturally
parallelized. For example, an array processor
or GPU.
Example: mask operations!

GPGPU
● General-purpose computing on graphics processing
units - GPGPU
● OpenCL is the currently dominant open general-purpose
GPU computing language. The dominant
proprietary framework is Nvidia's CUDA

GPGPU
Since GPU’s
Stream Multi-Processor

OpenCV
A very popular computer vision library.
● 6M downloads
● BSD license
● ~1000 CV functions
● Modularized and Efficient
● Optimization: CUDA, OpenCL

Android Renderscript
● RenderScript is a framework for running
computationally intensive tasks at high
performance on Android.
● RenderScript runtime will parallelize work
across all processors available on a device,
such as multi-core CPUs, GPUs, or DSPs

Summary
● Human vision is more complex than
computer vision.
● Computer vision is mature in only a small part
of the whole field.
● Software technology is the key to computer
vision.
● Parallel programming is not a simple solution,
but it is the only way forward.
