Computer Vision 
2014/9/23 
John Lu
Agenda 
● Human Vision & Image Concept 
● Introduction of Computer Vision 
● Basic operations of Image Processing 
● Computer Vision’s approaches 
● Accelerating Computer Vision 
● Summary
Human Vision & Image Concept 
● Consciousness 
● Sight and Light 
● Eye to Vision 
● Pin-hole system (Camera) 
● Quantization & Resolution
Consciousness 
● Humans have several senses (sight, hearing, touch,
smell, taste).
● The most important sense is sight.
● Colour and the plastic (three-dimensional)
effect are the most important aspects of
sight.
Sight and Light 
The visible spectrum is the portion of the electromagnetic spectrum that is 
visible to (can be detected by) the human eye. Electromagnetic radiation in this 
range of wavelengths is called visible light or simply light. A typical human eye 
will respond to wavelengths from about 390 to 700 nm.[1] In terms of 
frequency, this corresponds to a band in the vicinity of 430–770 THz.
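Wavelength and frequency are linked by $f = c / \lambda$. A minimal Python sketch of the conversion, assuming the vacuum speed of light (the helper name is illustrative):

```python
# Speed of light in vacuum in m/s (exact by SI definition).
C = 299_792_458


def wavelength_nm_to_thz(wavelength_nm: float) -> float:
    """Convert a wavelength in nanometres to a frequency in terahertz (f = c / lambda)."""
    return C / (wavelength_nm * 1e-9) / 1e12


for nm in (390, 700):
    print(f"{nm} nm -> {wavelength_nm_to_thz(nm):.0f} THz")
# 390 nm -> 769 THz
# 700 nm -> 428 THz
```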
Sight and Light
Eye to Vision 
● Photoreceptor cell 
● Rod: scotopic (low-light) vision
● Cone: photopic (daylight, colour) vision
Eye to Vision
Pin-hole system (Camera)
Quantization & Resolution 
How to make an image: digitization
● Resolution: sampling discretizes space, that is, the domain of
the image function, giving $f : \{1, \dots, N\} \times \{1, \dots, M\} \to \mathbb{R}^m$.
● Quantization discretizes the intensity values, that is, the
co-domain of the function.
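A minimal Python sketch of both steps, assuming a hypothetical continuous intensity function on the unit square with values in [0, 1]: sampling evaluates it on an N × M grid, and quantization maps each sample to one of a fixed number of integer levels.

```python
def sample_and_quantize(func, n_rows, n_cols, levels):
    """Digitize a continuous image function defined on the unit square.

    func: hypothetical intensity function (x, y) -> value in [0, 1].
    Sampling evaluates func on an n_rows x n_cols grid (discretizes the domain).
    Quantization maps each sample to an integer in 0..levels-1 (discretizes the co-domain).
    """
    image = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            # Sample at the centre of each grid cell.
            x = (j + 0.5) / n_cols
            y = (i + 0.5) / n_rows
            value = min(max(func(x, y), 0.0), 1.0)
            # Uniform quantization; clamp so that value 1.0 maps to the top level.
            row.append(min(int(value * levels), levels - 1))
        image.append(row)
    return image


# A horizontal intensity ramp sampled on a 2 x 4 grid with 4 gray levels.
print(sample_and_quantize(lambda x, y: x, 2, 4, 4))  # [[0, 1, 2, 3], [0, 1, 2, 3]]
```

Fewer samples lower the resolution; fewer levels make the quantization coarser.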
Introduction of Computer Vision 
● Like Human Vision 
● Related Technologies
● Related Applications
● Image definition 
● Image types 
● Image Transforms
Like Human Vision
Like Human Vision 
But computer vision…
● Needs high processing performance
● Needs the ability to perceive its environment
● Must adapt to environmental changes in brightness and
color
● Needs features that operate simultaneously and
complement one another.
Related Technologies
● Image Processing 
● 3D information restoration 
● Recognition 
● Understanding
Related Applications
● Watermarking 
● Pattern recognition 
● 3D computer vision 
● Motion analysis & tracking
Image definition 
Images may be two-dimensional,
such as a photograph or screen
display, or three-dimensional,
such as a statue or
hologram.
Image types 
● By number of spectral bands
● By acquisition device
● Range (depth) images
Image Transforms 
● Geometric transformation:
Linear transform, Euclidean transform, ...
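A Euclidean (rigid) transform combines a rotation and a translation and preserves distances between points. A minimal Python sketch on 2D points (the function name is illustrative):

```python
import math


def euclidean_transform(points, theta, tx, ty):
    """Rigid transform: rotate each (x, y) by theta radians about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


pts = [(0.0, 0.0), (3.0, 4.0)]
moved = euclidean_transform(pts, math.pi / 2, 10.0, 0.0)
# The distance between the two points is 5 before and after the transform.
print(math.dist(*pts), round(math.dist(*moved), 9))  # 5.0 5.0
```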
Image Transforms 
● Space transformation: 
Fourier transform, Discrete cosine transform, Wavelet 
transform...
Basic operations of Image Processing 
● Image enhancement 
● Image segmentation 
● Feature extraction 
● Image representation
Image enhancement 
● Two families: spatial-domain enhancement and
frequency-domain enhancement
● Spatial-domain enhancement: noise
removal, smoothing, sharpening.
● Frequency-domain enhancement: filtering
Image enhancement 
Transformation function: 
Basic gray-level transforms:
image negatives, log transform, contrast stretching, and power-law
(gamma) transform
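The four transforms can be sketched as per-pixel functions of the input gray level r, assuming an 8-bit image (L = 256 levels). The scaling constants below are common choices, not necessarily those on the original slides:

```python
import math

L = 256  # number of gray levels (assumes 8-bit images)


def negative(r):
    """Image negative: s = (L - 1) - r."""
    return (L - 1) - r


def log_transform(r):
    """Log transform s = c * log(1 + r), with c chosen so that r = L - 1 maps to L - 1."""
    c = (L - 1) / math.log(L)
    return round(c * math.log(1 + r))


def contrast_stretch(r, r_min, r_max):
    """Linear contrast stretching of [r_min, r_max] onto [0, L - 1]; values outside are clipped."""
    if r_max <= r_min:
        raise ValueError("r_max must be greater than r_min")
    r = min(max(r, r_min), r_max)
    return round((r - r_min) * (L - 1) / (r_max - r_min))


def gamma_transform(r, gamma):
    """Power-law transform on normalized intensity: s = (L - 1) * (r / (L - 1)) ** gamma."""
    return round((L - 1) * (r / (L - 1)) ** gamma)


pixels = [0, 64, 128, 255]
print([negative(p) for p in pixels])  # [255, 191, 127, 0]
```

Gamma < 1 brightens dark regions and gamma > 1 darkens them; the log transform similarly expands low gray levels.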
Image enhancement 
Image negatives:
Image enhancement 
Log transform:
Image enhancement 
Contrast stretching:
Image enhancement 
Power-law(Gamma) transform:
Image enhancement 
Mask operator: Smoothing and Sharpening 
Mask sizes: 3×1, 3×3, 5×5, ..., 31×31
Neighborhood averaging, Median filtering, High-boost filter
Image enhancement 
Neighborhood averaging: 
Gaussian smoothing:
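Both smoothing methods slide a small mask over the image and replace each pixel with a weighted sum of its neighbourhood. A minimal Python sketch, assuming a grayscale image stored as a list of lists and edge replication at the borders; the 1-2-1 Gaussian mask is one common 3×3 approximation:

```python
def convolve3x3(image, kernel):
    """Apply a 3x3 mask to a grayscale image (list of lists of numbers).

    Border pixels use edge replication. The mask is applied as a correlation,
    which equals convolution for the symmetric smoothing masks below.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y = min(max(i + di, 0), h - 1)
                    x = min(max(j + dj, 0), w - 1)
                    acc += kernel[di + 1][dj + 1] * image[y][x]
            out[i][j] = acc
    return out


# Neighborhood averaging: all nine weights equal 1/9.
BOX = [[1 / 9] * 3 for _ in range(3)]
# A common 3x3 integer approximation of a Gaussian, normalized by 16.
GAUSS = [[1 / 16, 2 / 16, 1 / 16],
         [2 / 16, 4 / 16, 2 / 16],
         [1 / 16, 2 / 16, 1 / 16]]

spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(convolve3x3(spike, GAUSS)[1][1])  # 2.25
```

The Gaussian mask keeps more weight at the centre than the box mask, so it blurs edges less for the same mask size.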
Image enhancement 
Median filtering: 
The main idea of the median filter is to run through the signal entry by 
entry, replacing each entry with the median of neighboring entries.
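A minimal Python sketch of that idea for a 1D signal, assuming an odd window size and replicated values at the ends:

```python
def median_filter(signal, window=3):
    """Replace each entry with the median of the entries in a window centred on it."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    n = len(signal)
    out = []
    for i in range(n):
        # Neighborhood indices are clamped, i.e. the first/last entry is replicated.
        neighborhood = [signal[min(max(k, 0), n - 1)] for k in range(i - half, i + half + 1)]
        out.append(sorted(neighborhood)[half])
    return out


print(median_filter([2, 80, 6, 3, 1]))  # [2, 6, 6, 3, 1]: the spike at 80 is removed
```

Unlike neighborhood averaging, the median does not smear an isolated outlier into its neighbours, which is why it suits salt-and-pepper noise.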
Image enhancement 
High-boost filter: It is often desirable to emphasize high 
frequency components representing the image details (by means such 
as sharpening) without eliminating low frequency components 
representing the basic form of the signal.
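One common formulation is $f_{hb} = A f - f_{lp}$ with $A \ge 1$, where $f_{lp}$ is a low-pass (blurred) version of $f$; this equals $(A - 1) f$ plus the high-pass detail $f - f_{lp}$. A minimal 1D Python sketch, assuming a 3-tap moving average as the low-pass filter:

```python
def box_blur(signal):
    """3-tap moving average with edge replication (a simple low-pass filter)."""
    n = len(signal)
    return [(signal[max(i - 1, 0)] + signal[i] + signal[min(i + 1, n - 1)]) / 3 for i in range(n)]


def high_boost(signal, A=1.5):
    """High-boost filtering: A * f - lowpass(f) = (A - 1) * f + highpass(f).

    A = 1 keeps only the high-pass detail; A > 1 also keeps part of the
    original low-frequency content.
    """
    blurred = box_blur(signal)
    return [A * f - b for f, b in zip(signal, blurred)]


print(high_boost([0, 0, 3, 3], A=1.0))  # [0.0, -1.0, 1.0, 0.0]
```

The undershoot and overshoot on either side of the step are what make edges look sharper.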
Image enhancement 
Frequency high-pass filter:
Image segmentation 
Based on discontinuities in pixel values, detect basic
features: points, lines, edges, corners.
Image segmentation 
Isolated point detection
Mask:
Image segmentation 
Line detection
Masks for 0°, +45°, 90°, -45°:
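The mask images are not reproduced in this text. A minimal Python sketch using the commonly cited 3×3 isolated-point and line masks (an assumption; the original slides' coefficients may differ), with row 0 at the top of the patch:

```python
# Commonly used 3x3 line-detection masks; each sums to zero, so flat regions give 0.
LINE_MASKS = {
    "0": [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]],
    "+45": [[-1, -1, 2], [-1, 2, -1], [2, -1, -1]],
    "90": [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]],
    "-45": [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]],
}
# Isolated-point mask: large response where the centre differs from all neighbours.
POINT_MASK = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]


def response(patch, mask):
    """Mask response R = sum of mask weights times the corresponding 3x3 patch values."""
    return sum(mask[i][j] * patch[i][j] for i in range(3) for j in range(3))


def dominant_line_direction(patch):
    """Return the direction whose mask gives the largest absolute response."""
    return max(LINE_MASKS, key=lambda k: abs(response(patch, LINE_MASKS[k])))


vertical_line = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(dominant_line_direction(vertical_line))  # 90
```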
Image segmentation 
Line detection
Image segmentation 
How many lines?
Image segmentation 
Use the Hough transform (global processing)
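In the Hough transform every edge point votes for all lines $\rho = x\cos\theta + y\sin\theta$ that pass through it; peaks in the $(\rho, \theta)$ accumulator correspond to lines in the image, so counting strong peaks gives the number of lines. A minimal Python sketch, with the accumulator resolution chosen arbitrarily:

```python
import math
from collections import Counter


def hough_lines(points, n_theta=180, rho_step=1.0):
    """Vote in (rho, theta) space for lines through the given edge points.

    Lines are parameterized as rho = x*cos(theta) + y*sin(theta), with theta
    sampled in [0, pi). Returns a Counter mapping (rho_bin, theta_index) to votes.
    """
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(rho / rho_step), t)] += 1
    return votes


# Five collinear points on the vertical line x = 3, plus one outlier.
pts = [(3, 0), (3, 1), (3, 2), (3, 3), (3, 4), (7, 1)]
votes = hough_lines(pts)
print(votes[(3, 0)])  # 5 votes for the line x = 3 (rho = 3, theta = 0)
```

Because $\rho$ is binned coarsely, cells at neighbouring $\theta$ values can collect the same votes; practical implementations usually apply non-maximum suppression before counting peaks.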
Image segmentation 
Edge: 
● Edges are pixels where the image 
function changes abruptly. 
● Derivatives can be used to detect
the presence of an edge.
● Derivatives are sensitive to even small amounts of noise.
Image segmentation 
Edge detection:
● Gradient operators: Prewitt and Sobel
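Both operators estimate the horizontal and vertical derivatives with 3×3 masks and combine them into a gradient magnitude; Sobel weights the centre row or column by 2, which adds some smoothing. A minimal Python sketch that evaluates interior pixels only:

```python
import math

PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_Y = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_magnitude(image, kx, ky):
    """Gradient magnitude sqrt(gx^2 + gy^2) at interior pixels; border pixels are left at 0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = sum(kx[a][b] * image[i + a - 1][j + b - 1] for a in range(3) for b in range(3))
            gy = sum(ky[a][b] * image[i + a - 1][j + b - 1] for a in range(3) for b in range(3))
            out[i][j] = math.hypot(gx, gy)
    return out


# A vertical step edge: dark left half, bright right half.
step = [[0, 0, 10, 10] for _ in range(4)]
print(gradient_magnitude(step, SOBEL_X, SOBEL_Y)[1])  # [0.0, 40.0, 40.0, 0.0]
```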
Feature extraction 
● A feature is defined as an "interesting" part 
of an image, and features are used as a 
starting point for many computer vision 
algorithms. 
● Feature detection: LoG, DoG, Hough
transform, ...
● High-level feature extraction: HOG, SIFT,
SURF
Feature extraction 
Laplacian of Gaussian (LoG) 
Laplacian filters are derivative filters used to find areas of rapid change 
(edges) in images. Since derivative filters are very sensitive to noise, it 
is common to smooth the image (e.g., using a Gaussian filter) before 
applying the Laplacian. This two-step process is called the Laplacian of
Gaussian (LoG) operation.
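The two steps can be sketched in 1D: Gaussian smoothing followed by the discrete second derivative [1, -2, 1]. An edge then appears as a zero crossing of the response. A minimal Python sketch, assuming a sampled Gaussian truncated at 3σ and edge replication at the borders:

```python
import math


def gaussian_kernel_1d(sigma, radius):
    """Sampled 1D Gaussian, normalized so the weights sum to 1."""
    w = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


def convolve_1d(signal, kernel):
    """Same-size 1D filtering with edge replication (kernel assumed symmetric)."""
    r = len(kernel) // 2
    n = len(signal)
    return [sum(kernel[k] * signal[min(max(i + k - r, 0), n - 1)] for k in range(len(kernel)))
            for i in range(n)]


def log_1d(signal, sigma=1.0):
    """Step 1: Gaussian smoothing. Step 2: second derivative via the mask [1, -2, 1]."""
    smoothed = convolve_1d(signal, gaussian_kernel_1d(sigma, radius=3 * math.ceil(sigma)))
    return convolve_1d(smoothed, [1.0, -2.0, 1.0])


def zero_crossings(values, eps=1e-9):
    """Indices i where values[i] and values[i + 1] have clearly opposite signs."""
    return [i for i in range(len(values) - 1) if values[i] * values[i + 1] < -eps]


step = [0.0] * 8 + [10.0] * 8
print(zero_crossings(log_1d(step)))  # [7]: the edge lies between indices 7 and 8
```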
Feature extraction 
Difference of Gaussian (DoG) 
Differences of Gaussians have also been used for blob detection in the 
scale-invariant feature transform (SIFT). Because the DoG is the difference
of two normalized Gaussians (multivariate normal densities), its weights
always sum to zero, so convolving it with a uniform signal produces no response.
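The zero-sum property is easy to check numerically. A minimal Python sketch, assuming sampled Gaussians that are each renormalized to sum to 1 on a finite grid; the ratio sigma2 / sigma1 = 1.6 is a commonly used choice, picked here only for illustration:

```python
import math


def gaussian_2d(sigma, radius):
    """Sampled 2D Gaussian kernel on a (2*radius+1)^2 grid, normalized to sum to 1."""
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-radius, radius + 1)]
         for y in range(-radius, radius + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def dog_kernel(sigma1, sigma2, radius):
    """Difference of Gaussians G(sigma1) - G(sigma2) on the same grid."""
    g1, g2 = gaussian_2d(sigma1, radius), gaussian_2d(sigma2, radius)
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(g1, g2)]


dog = dog_kernel(1.0, 1.6, radius=5)
# The weights sum to zero up to floating-point rounding ...
print(abs(sum(map(sum, dog))) < 1e-12)  # True
# ... so the response to a uniform patch (every value 100) is also ~0.
print(abs(sum(100 * v for row in dog for v in row)) < 1e-9)  # True
```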
Feature extraction 
Histogram of Oriented Gradients (HOG)
Feature extraction 
Scale Invariant Feature Transform (SIFT) 
An important characteristic of these features is that the relative 
positions between them in the original scene shouldn't change from one 
image to another. For example, if only the four corners of a door were
used as features, they would work regardless of the door's position; but if
points in the frame were also used, recognition would fail if the door were
opened or closed.
SIFT can robustly identify objects even among clutter and under partial
occlusion, because the SIFT feature descriptor is invariant to uniform
scaling and orientation, and partially invariant to affine distortion and
illumination changes.
Feature extraction 
Scale Invariant Feature Transform (SIFT)
Feature extraction 
Speeded Up Robust Features (SURF) 
The standard version of SURF is several times faster than SIFT and 
claimed by its authors to be more robust against different image 
transformations than SIFT. 
It uses an integer approximation to the determinant of Hessian blob 
detector, which can be computed extremely quickly with an integral 
image (3 integer operations). For features, it uses the sum of the Haar 
wavelet response around the point of interest.
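With an integral image, the sum over any axis-aligned rectangle takes four lookups and three additions or subtractions, regardless of the rectangle's size; this is what makes SURF's box-filter approximations cheap. A minimal Python sketch of the integral image alone (it does not implement the Hessian approximation itself):

```python
def integral_image(image):
    """ii[y][x] = sum of image[0..y-1][0..x-1]; the extra zero row/column simplifies lookups."""
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom-1 and cols left..right-1: four lookups, three add/subtract ops."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3))  # 28 (= 5 + 6 + 8 + 9)
```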
Feature extraction 
Speeded Up Robust Features (SURF)
Image representation 
Image representation is used as a pre-processing step
for image recognition.
● Boundary representation 
● Skeleton representation
Image representation 
Boundary representation 
Chain code, Polygonal representation, Gray-level histogram, Boundary 
segments
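A chain code stores a boundary as the sequence of step directions between consecutive boundary pixels. A minimal Python sketch of the 8-direction (Freeman) chain code, assuming the boundary points are already ordered and 8-connected, with y pointing up:

```python
# Freeman 8-direction codes: 0 = east, then counter-clockwise in 45-degree steps.
# Offsets are (dx, dy) with y pointing up (a convention assumed here).
DIRECTIONS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
              (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}


def chain_code(boundary):
    """Encode an ordered, 8-connected boundary (list of (x, y) points) as direction codes."""
    return [DIRECTIONS[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(boundary, boundary[1:])]


# Boundary of a unit square traversed counter-clockwise, returning to the start.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(chain_code(square))  # [0, 2, 4, 6]
```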
Image representation 
Skeleton representation
Computer Vision’s approaches 
● Pattern Recognition(PR) 
○ Facial Recognition: Gender, Age, Expression 
○ Object Recognition: Hand, Shape, License Plate 
● Motion analysis & tracking 
○ Human Behavior Analysis 
○ Background Subtraction 
● Popular Issues 
○ Advanced Safety Vehicle(ASV) 
○ Autonomous Land Vehicles(ALV) 
○ Virtual Reality & Augmented Reality & Eye Tracking
PR - Facial Recognition 
The Basic Framework
PR - Facial Recognition 
Features for Facial information: Geometry, 
Texture, Facial parts
PR - Facial Recognition 
Histograms of local pixel features
● Local binary patterns (LBP) 
● Histograms of oriented gradients (HOG)
PR - Facial Recognition 
Local binary patterns (LBP): 
LBP has become a popular choice of feature for
gender classification.
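The basic LBP operator thresholds the 8 neighbours of each pixel against the centre value and reads the results as an 8-bit code; a histogram of these codes (usually computed per cell and concatenated) is the feature vector passed to a classifier. A minimal Python sketch, assuming a grayscale image as a list of lists and one particular bit ordering (conventions vary between papers):

```python
def lbp_code(patch):
    """Basic 8-neighbour LBP code for the centre of a 3x3 patch.

    A neighbour >= centre contributes a 1 bit; bits are read clockwise from the
    top-left neighbour, most significant bit first.
    """
    c = patch[1][1]
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (i, j) in enumerate(order):
        if patch[i][j] >= c:
            code |= 1 << (7 - bit)
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a grayscale image."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            patch = [row[x - 1:x + 2] for row in image[y - 1:y + 2]]
            hist[lbp_code(patch)] += 1
    return hist


patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch), format(lbp_code(patch), "08b"))  # 143 10001111
```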
PR - Facial Recognition 
LBP and Gender Classification
PR - Facial Recognition 
Moghaddam, Baback, and Ming-Hsuan Yang. "Learning gender with support faces." IEEE Transactions on Pattern Analysis and Machine Intelligence 24.5 (2002): 707-711.
PR - Facial Recognition 
Age Estimation 
Geng, Xin, Zhi-Hua Zhou, and Kate Smith-Miles. "Automatic age estimation based on facial aging patterns." IEEE Transactions on Pattern Analysis and Machine Intelligence 29.12 (2007): 2234-2240.
PR - Facial Recognition 
Age Estimation: Some Age Manifolds
PR - Facial Recognition 
Expression Recognition: Typical Framework
PR - Facial Recognition 
Expression Recognition: Confusion Matrix
PR - Object Recognition 
Hand Detection 
1. Skin Color 
2. Connected Component
3. Minimum Convex Polygon
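Steps 2 and 3 can be sketched in Python, assuming step 1 has already produced a binary skin mask (no colour thresholds are given here) and reading "minimum convex polygon" as the convex hull. The sketch keeps the largest 4-connected region and computes its hull with Andrew's monotone chain algorithm:

```python
from collections import deque


def largest_component(mask):
    """Return the pixels (x, y) of the largest 4-connected region of 1s in a binary mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx]:
                # Breadth-first flood fill from this seed pixel.
                seen[sy][sx] = True
                queue, region = deque([(sx, sy)]), []
                while queue:
                    x, y = queue.popleft()
                    region.append((x, y))
                    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                        if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
                if len(region) > len(best):
                    best = region
    return best


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order (y up)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


skin_mask = [[1, 1, 1, 0],
             [1, 1, 1, 0],
             [0, 0, 0, 1]]
hand = largest_component(skin_mask)
print(convex_hull(hand))  # [(0, 0), (2, 0), (2, 1), (0, 1)]
```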
PR - Object Recognition 
Hand Direction Detection 
● Find the direction of the longest axis
PR - Object Recognition 
Shape Recognition 
● Shape Context 
● Histogram 
● Grid-based
● Edge
PR - Object Recognition 
Shape Context 
1. Finding a list of points on shape edges 
2. Computing the shape context 
3. Comparing shape contexts
PR - Object Recognition 
Histogram 
● Histogram projection 
● Histogram distance
PR - Object Recognition 
Grid-based
PR - Object Recognition 
Edge: Chamfer Distance
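The chamfer distance scores how well a template's edge points fit an image's edge points: the mean distance from each template point to the nearest image edge point. A minimal brute-force Python sketch (practical implementations precompute a distance transform of the edge map instead):

```python
import math


def chamfer_distance(template_pts, image_pts):
    """Mean distance from each template edge point to its nearest image edge point.

    Brute force O(n * m) for clarity. The measure is asymmetric: swapping the
    arguments can give a different value.
    """
    return sum(min(math.dist(t, p) for p in image_pts) for t in template_pts) / len(template_pts)


image_edges = [(0, 0), (1, 0), (2, 0), (3, 0)]
template = [(0, 1), (1, 1), (2, 1), (3, 1)]  # same shape shifted up by one pixel
print(chamfer_distance(template, image_edges))  # 1.0
```

A lower score means a better match, so a detector slides the template over the image and keeps positions with small chamfer distance.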
PR - Object Recognition 
License Plate Recognition 
1. LP detection: AdaBoost + Haar-like features
2. Character Segmentation 
3. NFL-based classification
PR - Object Recognition 
License Plate Recognition
Dataset: 6498 images; image size: 320 x 240
Method          Detection rate   # False alarms   Layers / features   Training time (min)   Speed (frames per second)
FFS             89.77%           1598             10 / 737            52                    32.26
Fast AdaBoost   91.81%           1472             7 / 387             69                    25.6
PR - Object Recognition 
License Plate Recognition Issues 
● Low spatial resolution 
● Blurred image 
● Low contrast, Overexposure, Bad lighting 
conditions 
● High distortion
Computer Vision’s approaches 
● Pattern Recognition(PR) 
○ Facial Recognition: Gender, Age, Emotion 
○ Object Recognition: Hand, Shape, License Plate 
● Motion analysis & tracking 
○ Human Behavior Analysis 
○ Background Subtraction 
● Popular Issues 
○ Advanced Safety Vehicle(ASV) 
○ Autonomous Land Vehicles(ALV) 
○ Virtual Reality & Augmented Reality & Eye Tracking
Human Behavior Analysis 
Flow
Human Behavior Analysis 
Using String Representation
Background Subtraction 
What can we do? 
Problem: Is the water spray in a spray pond foreground?
Background Subtraction 
Background Modeling by Codebook
Computer Vision’s approaches 
● Pattern Recognition(PR) 
○ Facial Recognition: Gender, Age, Emotion 
○ Object Recognition: Hand, Shape, License Plate 
● Motion analysis & tracking 
○ Human Behavior Analysis 
○ Background Subtraction 
● Popular Issues 
○ Advanced Safety Vehicle(ASV) 
○ Autonomous Land Vehicles(ALV) 
○ Virtual Reality & Augmented Reality & Eye Tracking
Advanced Safety Vehicle(ASV)
Autonomous Land Vehicles(ALV)
Virtual Reality & Augmented Reality
Accelerating Computer Vision 
● Single Core vs. Multi-Core
● Parallel Programming 
● GPGPU 
● OpenCV 
● Android RenderScript
Single Core vs. Multi-Core
Single Core vs. Multi-Core
Sustaining Moore’s Law requires multi-core
Single Core vs. Multi-Core
Parallel Programming 
Flynn's taxonomy
Parallel Programming 
SIMD: 
A computer which exploits multiple data 
streams against a single instruction stream to 
perform operations which may be naturally 
parallelized. For example, an array processor 
or GPU. 
Example: mask operations, where the same instructions are applied to every pixel's neighbourhood.
GPGPU 
● General-purpose computing on graphics processing 
units - GPGPU 
● OpenCL is the currently dominant open general-purpose 
GPU computing language. The dominant 
proprietary framework is Nvidia's CUDA.
GPGPU
GPGPU 
The GPU pipeline
GPGPU 
The GPU's streaming
multiprocessors
GPGPU
OpenCV 
A very popular computer vision library. 
● 6M downloads 
● BSD license
● 1000+ CV functions
● Modularized and Efficient 
● Optimization: CUDA, OpenCL
Android RenderScript
● RenderScript is a framework for running 
computationally intensive tasks at high 
performance on Android. 
● RenderScript runtime will parallelize work 
across all processors available on a device, 
such as multi-core CPUs, GPUs, or DSPs
Android RenderScript
Android RenderScript
Summary 
● Human Vision is more complex than 
Computer Vision. 
● Computer Vision is mature in only a small part
of the field.
● Software technology is the key to Computer
Vision.
● Parallel programming is not a simple solution,
but it is the only way forward.
Thanks
