2. Computer vision is a field of study within artificial intelligence (AI) that
focuses on enabling computers to interpret and understand
images and videos, in a manner similar to human vision. It involves
developing algorithms and techniques to extract meaningful information
from visual inputs and make sense of the visual world.
OR
What is Computer Vision?
Computer vision is a subfield of artificial intelligence that deals with
acquiring, processing, analyzing, and making sense of visual data such
as digital images and videos. It is one of the most compelling types of
artificial intelligence, and one that we regularly use in our daily routines.
4. Comparison between Image Processing and Computer Vision
Definition
Image Processing: The primary emphasis of image processing is on processing the raw
images that are entered into the system in order to improve them or prepare them for
use in other applications.
Computer Vision: The primary objective of computer vision is to derive information from
the input images or videos in order to gain an accurate understanding of the data and to
interpret the visual data in the same way that the human brain does.
Applicable methods
Image Processing: Methods such as anisotropic diffusion, hidden Markov models,
independent component analysis, various kinds of filtering, and many more are used.
Computer Vision: Image processing is just one of the many techniques employed in
computer vision; other approaches, such as machine learning and convolutional neural
networks (CNNs), are also used.
Function
Image Processing: Image processing is a subfield of computer vision.
Computer Vision: Computer vision is the broader field that includes image processing.
Applications
Image Processing: Applications include rescaling the image (also known as digital zoom),
correcting the illumination, and changing the tones, among other things.
Computer Vision: Applications include object detection, face detection, handwriting
recognition, and other similar tasks.
5. History of Computer Vision
Computer vision is not a new technology: scientists and experts have been trying to
develop machines that can see and understand visual data for almost six decades. Its
evolution can be summarized as follows:
1959: The first experiments related to computer vision began in 1959, when researchers
showed a cat an array of images and found that its visual system responded first to
hard edges or lines. This suggests that image processing begins with
simple shapes such as straight edges.
1960: Artificial intelligence became a field of academic study, with solving
human vision problems among its goals.
1963: In another major achievement, scientists developed
computers that could transform 2D images into 3D forms.
1974: Optical character recognition (OCR) and intelligent character
recognition (ICR) technologies were introduced. OCR addressed the
problem of recognizing text printed in any font or typeface, whereas ICR can decipher
handwritten text. These technologies are among the greatest achievements for document
and invoice processing, vehicle number plate recognition, mobile payments, machine
translation, etc.
6. History of Computer Vision
1982: Algorithms were developed to detect edges, corners, curves, and
other shapes. Scientists also developed a network of cells that could recognize
patterns.
2000: Scientists focused their research on object recognition.
2001: The first real-time face recognition application was developed.
2010: The ImageNet dataset became available with millions of tagged images;
it can be considered the foundation for recent convolutional neural network (CNN)
and deep learning models.
2012: A CNN was used for image recognition and achieved a markedly reduced error rate.
2014: The COCO dataset was developed to support object detection and
future research.
11. How computer vision works
Computer vision is a technique that extracts information from visual data, such
as images and videos.
First, a large amount of labeled visual data is provided to the machine to train
it.
This labeled data enables the machine to analyze patterns across all the
data points and relate them to the labels. E.g., suppose we provide
millions of images of dogs.
In that case, the computer learns from this data, analyzes each photo, its shapes,
the distances between shapes, colors, etc., identifies the patterns
that characterize dogs, and generates a model.
As a result, this computer vision model can then predict, for each input image,
whether the image contains a dog or not.
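The train-then-predict loop described above can be sketched with a toy classifier. This is only an illustration under strong assumptions: the two-number feature vectors and the helper names `train_centroids` and `predict` are hypothetical, and a nearest-centroid rule stands in for the far larger models used in practice.

```python
from math import dist

def train_centroids(samples):
    """Average the feature vectors of each label into one centroid per label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    return min(model, key=lambda label: dist(model[label], features))

# Hypothetical hand-made features; real systems learn features from pixels.
training_data = [
    ([0.9, 0.8], "dog"), ([0.8, 0.9], "dog"),
    ([0.2, 0.1], "not dog"), ([0.1, 0.3], "not dog"),
]
model = train_centroids(training_data)   # the "generated model"
print(predict(model, [0.85, 0.7]))       # prints: dog
```

The labeled samples play the role of the training images, and the centroids play the role of the learned model.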
13. Tasks Associated with Computer Vision
Although computer vision is used in many fields, a few tasks are common to most
computer vision systems. These tasks are given below:
Object classification: Object classification is a computer vision task used to classify
an image, such as whether it contains a dog, a person's face, or a banana. It analyzes
the visual content (videos and images) and assigns the object to a defined category. In other
words, with image classification we can predict the class of an object present in an
image.
Object Identification/detection: Object identification or detection uses image classification to
identify and locate the objects in an image or video. With such detection and identification
techniques, the system can count the objects in a given image or scene, determine their
locations, and label them. For example, in a given image, one dog, one cat, and one duck
can be detected and classified using object detection.
Object Tracking: The system processes videos, finds the objects that match the search criteria,
and tracks their movement.
Object Landmark Detection: The system defines the key points for the given object in the image
data.
Image Segmentation: Image segmentation does not just detect the classes present in an image, as
image classification does; it classifies each pixel of the image to specify which object it
belongs to. It tries to determine the role of each pixel in the image.
Object Recognition: Here, the system recognizes the object and its location within the
image.
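The difference between one label per image (classification) and one label per pixel (segmentation) can be sketched as follows. This is a toy illustration: the brightness threshold is an assumed stand-in for a learned model, and `segment` and `classify` are hypothetical helper names.

```python
def segment(image, threshold=128):
    """Label every pixel: 1 for 'object' (bright), 0 for 'background'."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in image]

def classify(image, threshold=128):
    """Give the whole image a single label based on its segmentation mask."""
    mask = segment(image, threshold)
    return "object" if any(any(row) for row in mask) else "background"

# A hypothetical 3x4 grayscale image (0 = black, 255 = white).
image = [
    [10, 12, 200, 210],
    [11, 13, 205, 220],
    [9, 14, 15, 16],
]

for row in segment(image):
    print(row)            # one label per pixel
print(classify(image))    # one label for the whole image
```

The mask tells us where the object is, whereas the single label only tells us that it is present.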
16. Applications of computer vision
Facial recognition: Computer vision enables machines to detect faces in images
and verify people's identity. The machine is given input images in
which computer vision algorithms detect facial features and compare them with
databases of face profiles. Popular social media platforms such as Facebook also use
facial recognition to detect and tag users. Further, various government intelligence agencies
use this capability to identify criminals in video feeds.
Healthcare and Medicine: Computer vision plays an important role in the
healthcare and medicine industry. Traditional approaches for evaluating cancerous
tumors are time-consuming and less accurate, whereas computer
vision technology provides faster and more accurate assessments of chemotherapy
response, helping doctors identify cancer patients who need surgery sooner with
life-saving precision.
17. Applications of computer vision
Self-driving vehicles: Computer vision helps self-driving vehicles
make sense of their surroundings by capturing video from
different angles around the car and feeding it into the software. This helps
detect other cars and objects, read traffic signals, find pedestrian paths, etc., and
drive passengers safely to their destination.
Optical character recognition (OCR): Optical character recognition helps us extract
printed or handwritten text from visual data such as images. It also enables
us to extract text from documents such as invoices, bills, and articles.
Machine inspection: Computer vision is vital in providing image-based automatic
inspection. It detects defects, features, functional flaws, and other
irregularities in manufactured products. Setting up such a system involves determining
inspection goals and choosing lighting and material-handling techniques.
18. Applications of computer vision
Retail (e.g., automated checkouts): Computer vision is also being implemented in the retail
industry to track products and shelves, record product movements in the store, etc. This
AI-based computer vision technique automatically charges the customer for the marked
products upon checkout from the retail store.
3D model building: 3D model building, or 3D modeling, is a technique for generating a 3D digital
representation of any object or surface using software. Here too, computer vision
plays a role in constructing 3D computer models from existing objects. Furthermore, 3D
modeling has a variety of applications, such as robotics, autonomous driving,
3D tracking, 3D scene reconstruction, and AR/VR.
Medical imaging: Computer vision helps medical professionals make better decisions about
treating patients by producing visualizations of specific body parts such as organs and tissues.
It helps them reach more accurate diagnoses and provide better patient care. For example, images
from Computed Tomography (CT) or Magnetic Resonance Imaging (MRI) scanners are used to diagnose
pathologies, guide medical interventions such as surgical planning, or support research.
19. Applications of computer vision
Automotive safety: Computer vision has added an important safety feature in the automotive
industry. E.g., if a vehicle is taught to detect objects and dangers, it could prevent accidents
and save lives and property.
Surveillance: This is one of computer vision technology's most important and beneficial use cases.
Nowadays, CCTV cameras are fitted almost everywhere, such as streets, roads, highways,
shops, and stores, to spot suspicious or criminal activities. They provide live footage of
public places, which helps identify suspicious behavior and dangerous objects, prevent crimes, and
maintain law and order.
Fingerprint recognition and biometrics: Computer vision technology detects fingerprints and
other biometrics to validate a user's identity. Biometrics deals with recognizing persons based on
physiological characteristics, such as the face, fingerprint, vascular pattern, or iris, and
behavioral traits, such as gait or speech. It combines computer vision with knowledge of human
physiology and behavior.
22. Computer Vision Challenges
Computer vision has emerged as one of the fastest-growing domains of artificial
intelligence, but it still faces a few challenges on the way to becoming a leading
technology. Some challenges observed while working with computer vision are given below.
Reasoning and analytical issues: All programming languages and technologies
require sound logic behind any task. To become a computer vision expert, you
must have strong reasoning and analytical skills. Without such skills,
defining any attribute in visual content may be a big problem.
Privacy and security: Privacy and security are among the most important concerns for
any country. Vision-powered surveillance raises serious
privacy issues in many countries. It can be used to restrict users from accessing
unauthorized content. Further, various countries avoid face recognition and detection
techniques for privacy and security reasons.
Duplicate and false content: Cyber security is always a big concern for
organizations, which try to protect their data from hackers and cyber
fraud. A data breach can lead to serious problems, such as duplicate or falsified images
and videos being created and spread over the internet.
24. Geometric camera model
In computer vision, the geometric camera model is a mathematical representation
of how a camera captures a 3D world in a 2D image. This model is essential in
various applications such as 3D reconstruction, augmented reality, and robotics.
Here, we describe the key components and concepts involved in the geometric
camera model:
Camera Intrinsic
Camera Extrinsic
Projection Models
Camera Calibration
Image Formation
Depth and Disparity
3D Reconstruction
25. Geometric camera model
1. Camera Intrinsic
1.1 Focal Length
The focal length is the distance between the camera center and the image plane.
It affects the field of view of the camera.
1.2 Principal Point
The principal point is the point where the camera's optical axis intersects the
image plane. It is usually assumed to be at the center of the image.
1.3 Lens Distortion
Real lenses introduce distortions to the image. The two main types of distortions
are radial and tangential distortions. These distortions are corrected using
distortion coefficients.
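As a sketch of how distortion coefficients enter the picture, the snippet below applies a commonly used polynomial model with two radial coefficients (k1, k2) and two tangential coefficients (p1, p2) to a normalized image point. The coefficient names and values are assumptions for illustration; correcting an image means inverting this mapping, which is usually done numerically.

```python
def distort(x, y, k1=0.0, k2=0.0, p1=0.0, p2=0.0):
    """Map an ideal normalized image point (x, y) to its distorted position.

    k1, k2: radial coefficients; p1, p2: tangential coefficients.
    """
    r2 = x * x + y * y                       # squared distance from the optical axis
    radial = 1.0 + k1 * r2 + k2 * r2 * r2    # radial scaling grows with r
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d

# Hypothetical coefficient: negative k1 pulls points toward the center (barrel distortion).
x_d, y_d = distort(0.5, 0.0, k1=-0.2)
print(x_d, y_d)

# With all coefficients at zero the point is unchanged.
print(distort(0.5, 0.25))
```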
26. Geometric camera model
2. Camera Extrinsic
2.1 Rotation Matrix
The rotation matrix describes the orientation of the camera in the world
coordinate system.
2.2 Translation Vector
The translation vector describes the offset between the world origin and the
camera center. With the convention X_c = R X_w + T, a camera centered at C in
world coordinates has T = -R C.
27. Geometric camera model
3. Projection Models
3.1 Perspective Projection
In this model, 3D points are projected onto the 2D image plane using a set of
perspective projection equations, so distant objects appear smaller.
3.2 Orthographic Projection
In this model, the projection lines are parallel, and there is no perspective
effect.
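The two projection models can be compared with a minimal sketch. It assumes points are already expressed in camera coordinates with the Z axis pointing forward; the function names and the focal length value are hypothetical.

```python
def perspective_project(point, f):
    """Perspective projection: x = f*X/Z, y = f*Y/Z (depth shrinks the image)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (f * X / Z, f * Y / Z)

def orthographic_project(point):
    """Orthographic projection: drop Z, so depth has no effect on the image."""
    X, Y, _ = point
    return (X, Y)

near = (1.0, 1.0, 2.0)
far = (1.0, 1.0, 4.0)   # same X and Y, twice as far away

print(perspective_project(near, f=1.0))   # (0.5, 0.5)
print(perspective_project(far, f=1.0))    # (0.25, 0.25)
print(orthographic_project(near))         # (1.0, 1.0)
print(orthographic_project(far))          # (1.0, 1.0)
```

Doubling the depth halves the perspective image coordinates, while the orthographic result does not change.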
28. Geometric camera model
4. Camera Calibration
Camera calibration is the process of estimating the camera's intrinsic and
extrinsic parameters. This process often involves capturing images of a
known pattern (like a chessboard) and using optimization techniques to find
the best-fitting parameters.
29. Geometric camera model
5. Image Formation
5.1 Pinhole Camera Model
This is a simple camera model that assumes a pinhole aperture. It is often
used as a starting point for more complex models.
5.2 Lens Model
This model incorporates the effects of a lens, including distortion and focal
length.
30. Geometric camera model
6. Depth and Disparity
6.1 Depth Estimation
Depth estimation involves determining the distance of objects from the
camera.
6.2 Stereo Vision
Stereo vision uses two cameras to estimate depth through triangulation.
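For a rectified stereo pair (two identical, parallel cameras), triangulation reduces to the relation Z = f * B / d, where f is the focal length in pixels, B is the baseline between the cameras, and d is the disparity in pixels. The sketch below assumes that idealized setup; the function name and numbers are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth (in meters) of a point seen by a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 800 px focal length, cameras 0.25 m apart.
print(depth_from_disparity(800.0, 0.25, 50.0))    # 4.0
print(depth_from_disparity(800.0, 0.25, 100.0))   # 2.0
```

A larger disparity means a closer object: the nearer a point is, the more its position shifts between the two images.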
31. Geometric camera model
7. 3D Reconstruction
Using the geometric camera model, one can reconstruct a 3D representation
of the scene from 2D images.
32. Geometric camera model
Commonly used coordinate systems in CV
World coordinate system (3D)
Camera coordinate system (3D)
Image coordinate system (2D)
Pixel coordinate system (2D)
33. Geometric camera model
Commonly used coordinate systems in CV
Transformations:
World-to-Camera: 3D-3D transformation. Rotation and translation (a rigid transform).
Camera-to-Image: 3D-2D projection. Loss of information (depth). Depends on the camera model
and its parameters (pinhole, f-theta, etc.).
Image-to-Pixel: 2D-2D transformation. Continuous to discrete. Quantization and origin shift.
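The image-to-pixel step can be sketched as follows, assuming square pixels, image-plane coordinates measured in micrometers from the principal point, and pixel indices counted from the top-left corner. The function name and the sensor values are hypothetical.

```python
import math

def image_to_pixel(x_um, y_um, pixel_size_um, cx, cy):
    """Convert continuous image-plane coordinates to integer pixel indices."""
    u = x_um / pixel_size_um + cx   # scale to pixel units, then shift the origin
    v = y_um / pixel_size_um + cy
    return math.floor(u), math.floor(v)   # quantization: continuous to discrete

# Hypothetical sensor: 4 um pixels, principal point at pixel (320, 240).
print(image_to_pixel(13.0, -6.0, 4.0, 320, 240))   # (323, 238)
print(image_to_pixel(0.0, 0.0, 4.0, 320, 240))     # (320, 240)
```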
34. Geometric camera model
Commonly used coordinate systems in CV
Camera Extrinsic Matrix (World-to-Camera):
Converts points from the world coordinate system to the camera coordinate system. Depends on the position and
orientation of the camera.
Camera Intrinsic Matrix (Camera-to-Image, Image-to-Pixel):
Converts points from the camera coordinate system to the pixel coordinate system. Depends on camera properties
(such as focal length, pixel dimensions, resolution, etc.)
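A common form of the intrinsic matrix is sketched below, assuming zero skew, with f_x, f_y the focal length in pixel units and (c_x, c_y) the principal point in pixels (symbol names assumed for illustration). Combined with the extrinsic parameters R and T, it gives the full world-to-pixel mapping, where s is a scale factor equal to the point's depth in the camera frame:

```latex
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix},
\qquad
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= K \, [\, R \mid T \,]
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
```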
35. Geometric camera model
Camera Extrinsic
Camera extrinsics are the parameters that represent the position and orientation of
the camera in a world coordinate system. These parameters are essential in
determining how a 3D point in the world coordinate system maps to a 2D point in the
image coordinate system.
36. Geometric camera model
Camera Extrinsic
Here are the detailed explanations of the components:
Rotation Matrix
Definition: The rotation matrix is a 3x3 matrix that represents the orientation of the camera in
the world coordinate system.
Mathematical Representation:
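Writing the entries as r_ij (names assumed here for illustration), a rotation matrix has the general form below; it is orthogonal and has determinant 1:

```latex
R = \begin{bmatrix}
r_{11} & r_{12} & r_{13} \\
r_{21} & r_{22} & r_{23} \\
r_{31} & r_{32} & r_{33}
\end{bmatrix},
\qquad R^{\top} R = I, \quad \det(R) = 1
```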
37. Geometric camera model
Camera Extrinsic
Translation Vector
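Definition: The translation vector is a 3x1 vector that encodes the position of the
camera relative to the world coordinate system. Under the common convention
X_c = R X_w + T, it equals -R C, where C is the camera center in world coordinates.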
Mathematical Representation:
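With components named t_x, t_y, t_z (assumed notation), the vector can be written as:

```latex
T = \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix}
```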
The camera's extrinsic matrix, which combines both the rotation and translation,
can be represented as:
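With entries named r_ij and t_x, t_y, t_z (assumed notation), one common way to write it is the 3x4 block form, often padded to a 4x4 homogeneous matrix so that it can be inverted and chained with other transforms:

```latex
[\, R \mid T \,] =
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & t_x \\
r_{21} & r_{22} & r_{23} & t_y \\
r_{31} & r_{32} & r_{33} & t_z
\end{bmatrix},
\qquad
\begin{bmatrix} R & T \\ \mathbf{0}^{\top} & 1 \end{bmatrix}
=
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & t_x \\
r_{21} & r_{22} & r_{23} & t_y \\
r_{31} & r_{32} & r_{33} & t_z \\
0 & 0 & 0 & 1
\end{bmatrix}
```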
This transformation matrix maps points from the world coordinate system to the camera's
coordinate system, and its inverse maps them back. It is essential for various computer vision tasks,
including object localization, camera tracking, and 3D reconstruction. The specific values of R
and T are determined through calibration or estimation processes.
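The mapping in both directions can be sketched in plain Python, assuming the convention X_c = R X_w + T, so that the inverse is X_w = R^T (X_c - T). The helper names and the example pose are hypothetical.

```python
def mat_vec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    """Transpose of a matrix; for a rotation matrix this is also its inverse."""
    return [list(col) for col in zip(*M)]

def world_to_camera(R, T, p_world):
    """X_c = R X_w + T."""
    return [r + t for r, t in zip(mat_vec(R, p_world), T)]

def camera_to_world(R, T, p_cam):
    """X_w = R^T (X_c - T)."""
    return mat_vec(transpose(R), [c - t for c, t in zip(p_cam, T)])

# Hypothetical pose: 90-degree rotation about the Z axis plus a translation.
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]
T = [1.0, 2.0, 3.0]

p_world = [1.0, 0.0, 0.0]
p_cam = world_to_camera(R, T, p_world)
print(p_cam)                          # [1.0, 3.0, 3.0]
print(camera_to_world(R, T, p_cam))   # recovers the original world point

# The camera center is the world point that maps to the camera origin.
camera_center = camera_to_world(R, T, [0.0, 0.0, 0.0])
print(camera_center)                  # [-2.0, 1.0, -3.0]
```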
38. Geometric camera model
Camera Intrinsic
Camera intrinsics are the internal characteristics of a camera that affect the imaging
process. These parameters are specific to each camera and are used to relate 3D
coordinates in the camera frame to 2D image coordinates. Here are the detailed explanations of the
components:
43. Geometric camera model
Here is a step-by-step breakdown of the process depicted in the diagram:
3D Point in World Coordinates (Xw, Yw, Zw): This is the initial position of the point in the world coordinate
system.
Rotation Matrix (R): The 3D point is rotated so that the axes of the world
coordinate system align with those of the camera coordinate system.
3D Point in Camera Coordinate: The point is now expressed along the camera's axes after
rotation.
Translation Vector (T): The point is then translated using the translation vector so that its
coordinates are relative to the camera center.
Camera Center Coordinate: The point's coordinates relative to the camera center.
Projection through Intrinsic Parameters (K): The point is then projected onto the image plane using the
camera's intrinsic parameters, which include the focal length and the principal point coordinates.
2D Point in Image Coordinate (u, v): Finally, the point is represented as a 2D point in the image
coordinate system.
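The steps above can be sketched end to end in a few lines. This is a minimal illustration that assumes a distortion-free pinhole camera with zero skew; the function name and the parameter names fx, fy, cx, cy are assumptions for the focal lengths and principal point contained in K.

```python
def project_point(p_world, R, T, fx, fy, cx, cy):
    """Project a 3D world point (Xw, Yw, Zw) to image coordinates (u, v)."""
    # Rotation: align the world axes with the camera axes.
    rotated = [sum(r * x for r, x in zip(row, p_world)) for row in R]
    # Translation: express the point relative to the camera center.
    Xc, Yc, Zc = (a + t for a, t in zip(rotated, T))
    if Zc <= 0:
        raise ValueError("point is behind the camera")
    # Intrinsics: perspective division, focal scaling, principal point shift.
    u = fx * Xc / Zc + cx
    v = fy * Yc / Zc + cy
    return u, v

# Hypothetical camera placed at the world origin, looking along +Z.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 0.0]

print(project_point([0.5, -0.25, 2.0], R, T, fx=800.0, fy=800.0, cx=320.0, cy=240.0))
# (520.0, 140.0)
```

A point on the optical axis, such as (0, 0, 2), lands exactly on the principal point (320, 240).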