3. Feature | Machine Vision | Human Vision
Spectral range | Gamma rays to microwaves (10^-11 to 10^-1 m) | Visible light (4x10^-7 to 7x10^-7 m)
Spatial resolution | 4x10^6 pixels (area scan, growing rapidly); 8192 pixels (line scan) | Effectively approximately 4000 x 4000 pixels
Sensor size | Small (approx. 5 x 5 x 15 mm^3) | Very large
Quantitative measurement | Yes; capable of precise measurement of size and area | No
Ability to cope with unseen events | Poor | Good
Performance on repetitive tasks | Good | Poor, due to fatigue and boredom
Intelligence | Low | High (subjective)
Light level variability | Fixed, closely controlled | Highly variable
Light level (min.) | Equivalent to a cloudy moonless night | Quarter-moon light (greater if dark adaptation is extended)
Strobe lighting and lasers | Possible (good screening is needed for safety) | Unsafe
Consistency | Good | Poor
Capital cost | Moderate | Low
Running cost | Low | High
4. Feature | Machine Vision | Human Vision
Inspection cost per unit | Low | High
Ability to “program” in situ | Limited; special interfaces make the task easier | Speech is effective
Ability to cope with multiple views in space and/or time | Versatile | Limited
Ability to work in toxic or biohazard areas | Yes | Not easily
Non-standard scanning methods | Line scan, circular scan, random scan, spiral scan, radial scan | Not possible
Image storage | Good | Poor without photography or digital storage
Optical aids | Numerous available | Limited
6. Application features that make a vision system attractive include:
1. Inaccessible part (a robot in the way, for example)
2. Hostile manufacturing environment
3. Possible part damage from physical contact
4. Need to measure a large number of features
5. Predictable interaction with light
6. Poor or no visual access to part features of interest
7. Extremely poor visibility
8. Mechanical/electrical sensors provide the necessary data
9. Image formation:
Correct illumination plus an optical sensor (e.g. a high-resolution camera or a line-scan camera) and a frame grabber. Image formation is the transformation of the visual image of a physical object and its intrinsic characteristics into a set of digitized data that can be used by the image processing unit.
Image processing:
Image processing consists of image grabbing, image enhancement, feature extraction and output formatting. Its function is to create a new image by altering the data in such a way that the features of interest are enhanced and the noise is reduced.
Image analysis:
The main function of image analysis is the automatic extraction of explicit information regarding the content of the image, for example the shape, size, range data and local orientation information from several two-dimensional images. It utilizes several redundant information representations such as edges, boundaries, disparities, shading, etc. The most commonly used techniques are template matching, statistical pattern recognition and the Hough transform.
Decision making:
Decision making is concerned with reaching a decision based on the description of the image and using AI to control the process or task.
10. Generic Model of Machine Vision
• Scene constraints
• Image acquisition
• Preprocessing
• Segmentation
• Feature Extraction
• Classification and/or Interpretation
• Actuation
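The stages above can be pictured as a chain of functions. Below is a minimal, hypothetical sketch in Python; the stage functions are placeholders standing in for real implementations, not names from the original material:

```python
def machine_vision_pipeline(acquire, preprocess, segment,
                            extract_features, classify, actuate):
    """Chain the generic machine-vision stages. Each argument is a
    function implementing one stage of the model above (hypothetical
    decomposition for illustration)."""
    image = acquire()                               # image acquisition
    image = preprocess(image)                       # noise removal, enhancement
    regions = segment(image)                        # meaningful regions
    features = [extract_features(r) for r in regions]
    labels = [classify(f) for f in features]
    actuate(labels)                                 # e.g. accept/reject a part
    return labels
```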
11. Scene Constraints
• Scene refers to the environment in which the task is taking place and into which the machine-vision system is to be placed.
• The aim of the scene-constraint sub-system is to reduce the complexity of the subsequent subsystems to a manageable level. This is achieved by proper exploitation of a priori constraints, such as knowledge of the limited number of objects possible in the scene and knowledge of their surface finish and appearance. We can also impose new constraints, such as the replacement of ambient light with carefully controlled lighting.
12. Scene Constraints
Two types of scene constraints can be applied
1. Inherent or natural constraints
2. Imposed constraints
Inherent constraints
• Characteristics of the material
• Inherent features
• Limitations in the range of objects
• Inherent positional limitations
Imposed constraints
• Control of object features
• Control of object position
• Control of lighting conditions
15. Lighting Sources
• LED illumination units
• Metal halide light sources (“cold light sources”
transmitted over fibre-optic cables)
• Laser illumination units
• Fluorescent light (high-frequency)
• Halogen lamps
16. Light Source | Type | Advantages | Disadvantages
LED | Array of light-emitting diodes | Can form many configurations within the arrays; single-colour source can be useful in some applications | Some features hard to see with a single-colour source; large array required to light a large area
Fibre-optic illuminator | Incandescent lamp in a housing; light carried by an optical-fibre bundle to the application | Fibre bundles available in many configurations; heat and electrical power remote from the application; easy access for lamp replacement | Incandescent lamp has low efficiency, especially for blue light
Fluorescent | High-frequency tube or ring lamp | Diffuse source; wide or narrow spectral range available; lamps are efficient and long-lived | Limited range of configurations; intensity control not available on some lamps
Strobe | Xenon arc strobe lamp, with either direct or fibre-bundle light delivery | Freezes rapidly moving parts; high peak illumination intensity | Requires precise timing of the light source and image-capture electronics; may require eye protection for persons working near the application
18. Image Acquisition
• Translation from the light stimuli falling onto the photo sensors
of a camera to a stored digital value within the computer’s
memory.
• Each digitized picture is typically 512 x 512 pixels, with each pixel representing a binary, grey or colour value.
• To ensure that no useful information is lost, a proper choice of spatial and luminance resolution parameters must be made.
• Depending on the particular application, cameras with line-scan or area-scan elements can be used for image acquisition.
• While area-scan sensors have lower spatial resolution, they provide highly standardized interfacing to computers and do not need any relative motion between the object and the camera; line-scan sensors need relative motion to build a 2-D image (see the sketch below).
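As a rough illustration of the last point, a line-scan image is assembled row by row while the part moves past the sensor. In the sketch below, read_line is a hypothetical driver callback, not a real API:

```python
import numpy as np

def build_line_scan_image(read_line, n_lines):
    """Stack successive 1-D line-scan captures into a 2-D image.
    read_line() must return one row (a 1-D array) per call, taken
    while the object moves past the sensor at a constant speed."""
    rows = [read_line() for _ in range(n_lines)]
    return np.stack(rows)          # shape: (n_lines, pixels_per_line)
```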
19. Preprocessing-1
• To produce a form of the acquired image that is better suited for further operations, preprocessing (contrast enhancement and adjustment, filtering to remove noise and improve quality) modifies and prepares the pixel values of the digitized image.
• The fundamental information of the image is not changed by this module.
• The initially acquired image has a direct pixel-by-pixel relation to the original scene and thus lies in the spatial domain.
• Transformations from the spatial to the frequency domain can be done using Fourier transforms, although this is not a computationally cheap operation.
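A minimal sketch of the spatial-to-frequency transformation with NumPy's 2-D FFT (an illustration, not part of the original slides):

```python
import numpy as np

def to_frequency_domain(image):
    """2-D FFT of a greyscale image, with the zero-frequency component
    shifted to the centre of the spectrum for easier inspection."""
    return np.fft.fftshift(np.fft.fft2(image))

def to_spatial_domain(spectrum):
    """Inverse: undo the shift and transform back to the spatial domain."""
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))
```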
20. Preprocessing-2
• Low-level processing for image improvement, such as histogram manipulation (grey-level shifting or equalization), cleans up noisy images and highlights features of particular interest.
• An image histogram is easily produced by recording the number of pixels at each grey level.
• If the histogram shows a bias towards the lower-intensity grey levels, then a transformation that achieves a more equitable sharing of pixels among the grey levels will enhance or alter the appearance of the image. Such transformations simply enhance or suppress contrast, and stretch or compress grey levels, without any alteration of the structural information present in the image (see the sketch below).
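A minimal NumPy sketch of histogram equalization for an 8-bit greyscale image, illustrating the transformation described above (assumes a non-constant uint8 image):

```python
import numpy as np

def equalize_histogram(image):
    """Map grey levels through the normalized cumulative histogram so
    that pixels are shared more equitably among the 256 levels."""
    hist = np.bincount(image.ravel(), minlength=256)   # the image histogram
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalize to [0, 1]
    lut = np.round(255 * cdf).astype(np.uint8)         # grey-level mapping
    return lut[image]                                  # apply the look-up table
```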
21. Preprocessing-3
• Another important class of spatial-domain algorithms is designed to perform pixel transformations in which each output value is calculated as a function of a group of pixel values (a 'neighbourhood') in some specified spatial location in the original image.
• Many filtering algorithms for smoothing (low pass) and edge
enhancement (high pass) are firmly in this category.
• This introduces the basic principle of 'windowing operations' in
which a 2-D (two-dimensional) mask, or window, defining the
neighborhood of interest is moved across the image, taking
each pixel in turn as the centre, and at each position the
transformed value of the pixel of interest is calculated.
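The windowing operation can be sketched directly in NumPy (illustrative only; the 3x3 averaging mask at the end gives the low-pass smoothing mentioned above):

```python
import numpy as np

def window_filter(image, mask):
    """Move a 2-D mask across the image; at each position the output
    pixel is the weighted sum of the neighbourhood under the mask."""
    m, n = mask.shape
    padded = np.pad(image.astype(np.float64),
                    ((m // 2, m // 2), (n // 2, n // 2)), mode="edge")
    out = np.zeros(image.shape, dtype=np.float64)
    for i in range(m):                 # accumulate one shifted copy of the
        for j in range(n):             # image per mask coefficient
            out += mask[i, j] * padded[i:i + image.shape[0],
                                       j:j + image.shape[1]]
    return out

smoothing_mask = np.full((3, 3), 1 / 9)   # low-pass (smoothing) window
```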
23. Image Segmentation-1
• The acquired image is broken up into meaningful regions or segments, i.e. a partitioning of the image.
• Segmentation is not concerned with what the image represents. Broadly, two approaches are employed:
• Thresholding based on some predetermined criterion: global thresholding applies a single threshold value to the entire image, while local thresholding partitions the image into sub-images and determines a threshold for each of them (see the sketch after this list).
• Edge-based methods: digital versions of standard finite-difference operators accentuate intensity changes, which give rise to a peak in the first derivative or a zero crossing in the second derivative; these can be detected, and properties such as the position, sharpness and height of the peak give the location, sharpness and contrast of the intensity changes in the image. Edge elements can be used to form complete boundaries, as shown in Figure 1.5.
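Minimal sketches of the two thresholding variants (illustrative NumPy code; the block size and the mean-based local threshold are assumptions, not from the slides):

```python
import numpy as np

def global_threshold(image, t):
    """Global thresholding: one predetermined value t for the whole image."""
    return (image >= t).astype(np.uint8)

def local_threshold(image, block=64):
    """Local thresholding: partition the image into sub-images and
    threshold each at its own mean grey level."""
    out = np.zeros(image.shape, dtype=np.uint8)
    for r in range(0, image.shape[0], block):
        for c in range(0, image.shape[1], block):
            sub = image[r:r + block, c:c + block]
            out[r:r + block, c:c + block] = sub >= sub.mean()
    return out
```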
24. Image Segmentation-2
The classical approach to edge-based segmentation begins with edge enhancement, which makes use of digital versions of standard finite-difference operators, as in the first-order gradient operators (e.g. Roberts, Sobel) or the second-order Laplacian operator.
The difference operation accentuates intensity changes and transforms the image into a representation from which properties of these changes can be extracted more easily.
A significant intensity change gives rise to a peak in the first derivative or a zero crossing in the second derivative of the smoothed intensities.
These peaks, or zero crossings, can be detected easily, and properties such as the position, sharpness and height of the peaks indicate the location, sharpness and contrast of the intensity changes in the image.
Edge elements can be identified from the edge-enhanced image and these can then be linked to form complete boundaries of the regions of interest.
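A minimal sketch of first-order edge enhancement with the Sobel operators (illustrative; uses SciPy's ndimage convolution):

```python
import numpy as np
from scipy.ndimage import convolve

# First-order Sobel difference operators (horizontal and vertical).
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def edge_enhance(image):
    """Convolve with the Sobel gradient operators and return the
    gradient magnitude, which peaks at significant intensity changes."""
    img = image.astype(np.float64)
    gx = convolve(img, SOBEL_X, mode="nearest")
    gy = convolve(img, SOBEL_Y, mode="nearest")
    return np.hypot(gx, gy)
```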
25. Feature Extraction
• During this phase the inherent characteristics or
features of different regions within the image are
identified, which are checked against predetermined
standards.
• This description should be invariant to position,
orientation and scale of the object.
• A number of basic parameters, such as the minimum enclosing rectangle and the centre of area (the centre may be taken as an object-centred origin from which a series of feature descriptors can be developed), may be derived from an arbitrary shape and used for classification and position information (see the sketch below).
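A minimal sketch deriving the basic parameters named above from a binary region (illustrative NumPy code):

```python
import numpy as np

def basic_shape_features(binary_region):
    """Area, centre of area, and minimum enclosing (axis-aligned)
    rectangle of the non-zero pixels in a binary image."""
    ys, xs = np.nonzero(binary_region)
    area = ys.size                                    # pixel count
    centre = (ys.mean(), xs.mean())                   # centre of area
    rect = (ys.min(), xs.min(), ys.max(), xs.max())   # enclosing rectangle
    return area, centre, rect
```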
26. Image Classification (Analysis)
• The classification sub-system is concerned with pattern recognition or image classification. This process uses some or all of the extracted features to decide which category of objects the unknown object belongs to.
• There are three main techniques for classification
• Template matching
• Statistically based approaches
• Neural network approach
• Template matching is used in situations where the objects to be identified have well-defined and highly 'differentiated' features, for example standard alphanumeric character fonts. In such cases an unknown character is compared with a set of templates or masks, each of which fits just one character uniquely.
• Statistical techniques can be selected to provide optimum classification performance for more
varied industrial applications.
• If the vision task is well constrained then classification may be made via a simple tree
searching algorithm where classification proceeds by making branching choices on the basis of
single feature parameters. In more complex cases, n features are combined to create a 'feature
vector' which places a candidate object within the n-dimensional feature space. Provided that
the features have been properly chosen to divide the allowable range of candidate objects into
well separated 'clusters', then classification merely consists of dividing the space with one or
more 'decision surfaces', such that each decision surface reliably separates two clusters.
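As a minimal illustration of classification in feature space (not the slides' own algorithm): assign an unknown object to the class whose cluster centre lies nearest to its feature vector.

```python
import numpy as np

def nearest_cluster(feature_vector, cluster_centres):
    """Minimum-distance classification in n-dimensional feature space.
    cluster_centres maps class label -> centre vector (assumed known
    from training examples of well-separated clusters)."""
    x = np.asarray(feature_vector, dtype=np.float64)
    distances = {label: np.linalg.norm(x - np.asarray(centre))
                 for label, centre in cluster_centres.items()}
    return min(distances, key=distances.get)
```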
27. Industrial Vision: Image acquisition
CCD camera
Digitization
Data acquisition cards
Vision software
Cameras and sensors; lenses
33. Video sources
The video source can be:
• Video camera
• Camcorder
• Video recorder (VCR)
• Television broadcasts
• X-ray equipment
• Scanning Electron Microscope (SEM)
• CT scanner
34. Composite video = a signal containing both the video data (luminance + colour) and the timing (synchronisation) information. It is the standard that interconnects almost all video equipment (TVs, laserdisc players, video recorders, camcorders) at home.
Examples of composite video standards:
• RS-170:
• used in North America and Japan
• Monochrome signal
• Spatial resolution: 640 pixels x 480 lines
• Frequency: 60 fields/second (equivalent to 30 frames/second)
• NTSC/RS-330
• used in North America and Japan
• Equivalent to RS-170 but colour information is superimposed on the
monochrome signal.
• NTSC = National Television System Committee
36. Signal types for Image acquisition boards
S-Video (also called Y/C video): luminance (Y) and chrominance (C) are separate
signals. The Y signal contains timing (synchronisation) information. S-video can be
transported over 4 pin mini DIN connector, or over SCART connector.
Some image sources produce “nonstandard” video signals:
• Video and timing information can vary in format and may be carried as single or multiple signals. Such sources do not adhere to particular spatial resolutions, signal timing schemes, or signal characteristics. Consult the documentation provided with your video source.
Progressive-scan cameras (25-30 frames/sec) produce non-interlaced signals.
All previous camera signals are analogue.
DIGITAL CAMERAS: No frame grabber required!
• Cameras with FireWire (IEEE 1394) interface.
Supported by Apple, Windows XP
• Cameras with USB interface
38. Image acquisition boards
• The video capture device is often called frame grabber card.
• Frame grabber puts a pixel mask over the image: the card converts the
analogue image (or images) supplied by a video source into a digital array
(or arrays) of data points.
• It is a plug-in card (PCI) with an A/D converter. The ADC must run at video speed: 20 MHz or higher (30 or 25 video frames per second; 300 kB [640 x 480 x 8 bit] per frame).
• Other features:
• input multiplexer (to select one of the 4 inputs)
• Colour notch filter = chrominance filter (to acquire monochrome signals
from colour sources)
• Programmable gain stage (to match the signal into the ADC input range)
• Timing and acquisition control (to synchronise grabbing with sync pulses
of incoming signal: PLL or Digital Clock Synchronisation)
• Camera control stage (to send to the camera or to receive from the
camera setup and control signals, e.g. horizontal and vertical sync
signals, pixel clock and reset signals)
• Most cards provide digital I/O for input or output operations, to
communicate with external digital devices (e.g. industrial process). This
saves a separate I/O board.
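A back-of-the-envelope check of the figures above (the 20 MHz ADC figure also covers blanking intervals, which this rough sketch ignores):

```python
# Data volume for an RS-170-class video stream.
width, height, bits, fps = 640, 480, 8, 30
bytes_per_frame = width * height * bits // 8   # 307,200 B, i.e. about 300 kB
data_rate = bytes_per_frame * fps              # about 9.2 MB/s into the PC
pixel_rate = width * height * fps              # about 9.2 Mpixels/s before blanking
print(bytes_per_frame, data_rate, pixel_rate)
```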
40. Image acquisition boards (continued)
• Plug-in cards (image grabber, frame grabber card) for analogue cameras
• Are plugged in at a VME or PCI bus
• Are delivered with Windows 98 or NT drivers
• Accept cameras according to the EIA (30 frames/sec) or CCIR (25)
standards
• Good cards have their own processor (DMA data transfer to PC) and
large RAM
• Others (cheaper ones) use the PC processor
• They accept the signals: S-Video, composite video, TV or VCR signals (NTSC/PAL/SECAM)
• Some cards have camera control output
41. Image acquisition - Cameras
• Sensor types:
• Line
• Array
• Interface standards:
• CCIR / RS-170 (B&W, 50-60 fields/sec.)
• PAL / SECAM / NTSC (Colour)
• Progressive scan (25-30 frames/sec.)
• FireWire (IEEE 1394)
• USB
• Sensor technology:
• CCD (Charge Coupled Device)
• CMOS (Complementary Metal Oxide Semiconductor); a typical CMOS camera produces a 1000 x 1000 pixel image
42. Spatial resolution
• The number of rows (N) from a video source generally corresponds
one-to-one with lines in the video image. The number of columns,
however, depends on the nature of the electronics that is used to
digitize the image. Different frame grabbers for the same video camera
might produce M = 384, 512, or 768 columns (pixels) per line.
• a CCIR / PAL image source can result in max 768 x 576 pixel image
• a RS-170 / NTSC source can result in max 640 x 480 pixel image
• Depending on video source or camera used, the spatial resolution can
range from 256 x 256 up to 4096 x 4096.
• Most applications use only the spatial resolution required. For fast
image transfer and manipulation, often 512 x 512 is used. For more
accurate image processing, 1024 x 1024 is common.
• The pixel aspect ratio (pixel width : pixel height) can be different from 1:1, typically 4:3. Some frame grabbers don't convert video data into square pixels but into rectangular ones. This makes a circle appear as an oval and squares appear as rectangles (see the sketch below for a simple correction).
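A minimal sketch of correcting non-square pixels by nearest-neighbour column resampling (illustrative; the 4:3 default matches the typical ratio mentioned above):

```python
import numpy as np

def to_square_pixels(image, pixel_aspect=4 / 3):
    """Resample columns so that pixels become square; pixel_aspect is
    pixel width : pixel height. Circles then display as circles
    rather than ovals."""
    h, w = image.shape[:2]
    new_w = int(round(w * pixel_aspect))
    src_cols = np.minimum((np.arange(new_w) / pixel_aspect).astype(int), w - 1)
    return image[:, src_cols]                  # nearest-neighbour resampling
```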
43. Spatial resolution
• Example 768 x 512 (aspect ratio 3 : 2)
Brightness resolution
• Brightness resolution = bit depth resolution: number of gray levels
(monochrome) or number of colours
• RS-170 / NTSC image: 8 bits = 256 gray levels
• A standard RS-170 image is 307 kB in size: 640 x 480 x 8 bit.
(Diagram: a 768-pixel x 512-row frame with aspect ratio 3:2, inside the CCIR maximum of 520 rows.)
44. Interlaced / non interlaced formats
• A video signal consists of a series of lines. Horizontal sync pulses separate the lines from each other.
• All composite video sources (RS-170/NTSC, CCIR/PAL) and some
nonstandard video sources transmit the lines in interlaced format: first
the odd (first field), afterwards the even lines (second field).
• Vertical sync pulses separate the fields from each other.
• Some nonstandard video sources transmit the lines in non-interlaced
format = progressive scan. Only one field, containing all the lines, is
transmitted.
• Progressive scan is recommended for fast moving images.
• If one is planning to use images that have been scanned from an
interlaced video source, it is important to know if the two half-images
have been appropriately "shuffled" by the digitization hardware or if that
should be implemented in software. Further, the analysis of moving
objects requires special care with interlaced video to avoid "zigzag"
edges.
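A minimal sketch of the "shuffling" step done in software: weaving the two fields of an interlaced frame back into a full-height image (illustrative NumPy code):

```python
import numpy as np

def weave_fields(first_field, second_field):
    """Interleave the two fields of an interlaced frame: the first
    field carries the odd signal lines, the second the even lines."""
    h, w = first_field.shape
    frame = np.empty((2 * h, w), dtype=first_field.dtype)
    frame[0::2] = first_field    # odd lines of the signal
    frame[1::2] = second_field   # even lines of the signal
    return frame
```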