1. Unit 3 Image Processing & Computer Vision:
Image - Definition and Tagging.
Classification of images. Tagging.
Image formation,
Deep Learning algorithms for Object detection & Recognition.
Face recognition,
Instance recognition,
Feature detection and matching,
Segmentation,
Recognition databases and test sets.
Applications - Feature extraction, Shape identification, Face detection.
Unit I and Unit II are mapped with the Coursera courses titled
Introduction to Electronics
(https://www.coursera.org/learn/electronics)
and Introduction to Artificial
Intelligence (AI).
3. When there’s only one object, the image tagging program has a chance to easily
rank the keywords by a percentage or decimal, because the object is exactly one
thing (a dog, for example).
4. Identifying Text in Images
• CAPTCHAs (or Completely Automated Public Turing test to tell
Computers and Humans Apart) are images which contain a distorted
rendering of some text.
5. Identifying Text in Images
• Their goal is to provide a task that is easy for humans to do, but
extremely hard for computer programs to perform equally well.
• For this task, OCR is generally not sufficient to extract the
text.
• This is a good example of why machine-readable information should
be available.
6. Google Images
• Luis von Ahn developed the “ESP Game” which could be used to tag
images.
• He presented a Google tech talk about the game as a form of human
computation.
• Google later licensed the technology to create a similar web
application called the Google Image Labeler.
7. Flickr - Geotagging
• Geotagging is the practice of adding
geospatial metadata to images,
such as the latitude, longitude,
and other directional indications
of where a photo was taken.
8. Facebook
• Facebook.com has a tagging feature
that is integrated with “My Photos”.
• It allows you to add a textual
descriptor (tag or person’s name)
to a specific point in the image.
• This allows the module to describe
who or what is included in a specific
album.
9. What Is Image Tagging?
Image tagging is the process of labeling or keywording images based on the subjects
that appear in a picture.
Image tagging software tags images automatically, though users can also carry out
the tagging themselves.
It makes images on websites more searchable through keywords pertaining to each
photo.
Tag:
A tag is a unique identifier for an element or line segment.
Tags are data.
Annotation: a piece of information about an element or line segment that is shown
on a drawing.
10. Image content
Suppose a project relies on a large amount
of user-generated or crawled
image content.
The problem:
These images come without any
structured metadata.
Until now you could either annotate the
images manually, using expensive
human labor and effort, or, in most
cases, present them in a random
order.
The solution: automated tagging!
Using auto-tagging,
you can assign relevant tags
to all these images in an
automated fashion.
The images are parsed by the auto-tagger,
which analyzes them and suggests the
tags they should be associated with.
11. How does image tagging work?
Imagine an entire home photo album is scanned and uploaded digitally to a
computer, with the images then uploaded all at once to a personal website.
As each image is uploaded, a program fills in details about it.
If a picture has a wedding cake in it, for example, the keywords ‘cake’ and ‘food’
are tagged.
When the process is complete, the user can search the entire digital
album using the keywords the program applied.
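The tag-then-search workflow above can be sketched as a simple inverted index (the album contents and function names here are illustrative, not from any specific tagging product):

```python
# A minimal sketch of keyword-based image search over auto-tagged images.
from collections import defaultdict

def build_tag_index(tagged_images):
    """Map each keyword to the set of image names it appears on."""
    index = defaultdict(set)
    for name, tags in tagged_images.items():
        for tag in tags:
            index[tag].add(name)
    return index

def search(index, keyword):
    """Return all images tagged with the given keyword, sorted by name."""
    return sorted(index.get(keyword, set()))

# Hypothetical album: each uploaded image already has auto-assigned tags.
album = {
    "img_001.jpg": ["cake", "food", "wedding"],
    "img_002.jpg": ["beach", "sunset"],
    "img_003.jpg": ["food", "pizza"],
}
index = build_tag_index(album)
print(search(index, "food"))   # ['img_001.jpg', 'img_003.jpg']
```

Searching for ‘food’ returns every photo whose tag list contains that keyword, which is exactly what makes the digital album searchable once tagging is complete.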
12. Image formation
Light reflection. At the surface of the apple, light
is reflected in all directions, and two of the rays
hit the eyes of two observers.
Fig. shows the reflection of a ray of light
at the object surface. The object surface
reflects the light in all directions.
13. Projection on the retina. The object in front of the eye is
projected on the retina.
The ray of light from the surface patch
is reflected in the direction of the
human eye and projected onto the
retina, the inner surface of the eye
that contains the light-sensitive cells.
14. Fig. Pinhole camera. The simplest model of
an optical camera is a simple box with a hole in it.
The optical principle of the human
eye is the same as for any optical
camera, be it a photo camera or a
video camera.
The simplest model for such an
optical camera is the pinhole
camera:
just a box with a small hole and a
photosensitive layer on the
opposite side.
See Fig. for a sketch of a pinhole
camera.
15. A simple model
- The scene is illuminated by a single source.
- The scene reflects radiation towards the camera.
- The camera senses it via chemicals on film.
16. Camera Geometry
– The simplest device to form an image of a 3D
scene on a 2D surface is the "pinhole" camera.
– Rays of light pass through the pinhole and form
an inverted image of the object on the image
plane.
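The pinhole geometry above can be sketched in a few lines (a minimal model assuming the pinhole at the origin, the scene along the +Z axis, and the image plane at distance f behind the hole; the focal length and test point are illustrative):

```python
# A minimal sketch of pinhole-camera projection. The image forms on the
# plane z = -f behind the pinhole, so the projected image is inverted,
# hence the minus signs.
def project_pinhole(point3d, f=1.0):
    """Project a 3D point (X, Y, Z), Z > 0, onto the image plane."""
    X, Y, Z = point3d
    if Z <= 0:
        raise ValueError("point must be in front of the pinhole (Z > 0)")
    return (-f * X / Z, -f * Y / Z)

print(project_pinhole((2.0, 1.0, 4.0), f=2.0))  # (-1.0, -0.5)
```

Note that doubling the distance Z halves the image coordinates, which matches the everyday observation that farther objects appear smaller.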
17. Introduction to Object Recognition
Image classification is straightforward:
image classification involves assigning a class label to an image.
The image classification task is to look at a picture and say whether there is a cat or not.
Classification with localization means not only labeling an object as a cat,
but also putting a bounding box, i.e. drawing a rectangle, around the position of the cat in the image.
18. Object detection is a method that is
used to recognize and detect different
objects present in an image or video
and label them to classify these
objects.
An object is an element that can be represented visually.
19. Object detection is a process of finding all the possible instances of real-world
objects, such as human faces, flowers, cars, etc. in images or videos, in real-
time with utmost accuracy.
Image classification involves predicting the class of one object in an image.
Object localization refers to identifying the location of one or more objects in an
image and drawing a bounding box around their extent.
Object detection combines these two tasks and localizes and classifies one
or more objects in an image.
21. three computer vision tasks:
•Image Classification: Predict the type or class of an object in an
image.
• Input: An image with a single object, such as a photograph.
• Output: A class label (e.g. one or more integers that are mapped
to class labels).
•Object Localization: Locate the presence of objects in an image and
indicate their location with a bounding box.
• Input: An image with one or more objects, such as a photograph.
• Output: One or more bounding boxes (e.g. defined by a point,
width, and height).
•Object Detection: Locate the presence of objects with a bounding
box and types or classes of the located objects in an image.
• Input: An image with one or more objects, such as a photograph.
• Output: One or more bounding boxes (e.g. defined by a point,
width, and height), and a class label for each bounding box.
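The input/output contracts of the three tasks can be sketched as plain Python data structures (the class and variable names are illustrative, not taken from any detection library):

```python
# Minimal sketch of the outputs of the three computer vision tasks.
from dataclasses import dataclass

@dataclass
class Box:
    x: float        # top-left corner, in pixels
    y: float
    width: float
    height: float

# Image classification: image -> a single class label.
classification_output = "cat"

# Object localization: image -> one or more bounding boxes.
localization_output = [Box(40, 30, 120, 90)]

# Object detection: image -> (bounding box, class label) pairs.
detection_output = [
    (Box(40, 30, 120, 90), "cat"),
    (Box(200, 50, 60, 60), "dog"),
]
```

The progression is visible in the types: classification returns only a label, localization only boxes, and detection pairs each box with its label.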
22. Object segmentation (also called object instance segmentation or semantic segmentation):
instances of recognized objects are indicated by highlighting the specific pixels of
the object instead of a coarse bounding box.
In the detection problem there might be multiple objects in the picture, and we have to detect them all and
localize them all.
The classification and the classification-with-localization problems usually have one big object in the middle
of the image that we are trying to recognize, or recognize and localize.
In the detection problem there can be multiple objects, possibly even multiple objects of different
categories, within a single image.
24. 0.4034
0.7330 0.9254
IOU>0.5 is
considered
a good prediction.
Intersection over Union is
an evaluation metric used
to measure the accuracy
of an object detector on a
particular dataset.
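The IoU metric can be computed directly from box coordinates (a minimal sketch for axis-aligned boxes given as corner coordinates; the example boxes are illustrative):

```python
# A minimal sketch of Intersection over Union for two axis-aligned boxes,
# each given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; width/height clamp to zero if the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Union = sum of areas minus the double-counted intersection.
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (2, 2, 10, 10)))  # 0.64 -> a good prediction (> 0.5)
```

A perfect prediction gives IoU = 1.0, disjoint boxes give 0.0, and the 0.5 threshold sits between the two as the usual pass/fail cutoff.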
25. Introduction to Object Recognition With Deep Learning
Object detection can be applied in many computer vision areas, such as video
surveillance, robotics, and human-computer interaction.
Due to factors such as complex backgrounds, illumination variation, scale variation,
occlusion, and object deformation, object detection is very challenging and difficult.
Object detection methods can be divided into two main classes:
handcrafted feature-based methods
deep learning-based methods
Around 2010, deep learning began to show superior performance in some computer vision
areas (e.g., image classification).
In 2012, with big image data, a deep CNN network (called AlexNet) achieved the
best performance on the ImageNet image classification challenge.
26. CNN Architectures of Object Detection
The pipeline of deep object detection
can be divided into two main classes, as shown in Fig.:
two-stage methods
one-stage methods
27. Two-Stage Methods for Deep Object Detection
Two-stage methods treat object detection as a multistage process.
Given an input image, some proposals of possible objects are firstly extracted.
After that, these proposals are further classified into the specific object categories
by the trained classifier.
Benefits of these methods:
The proposal stage reduces the large number of candidate regions that are put into the following classifier,
which accelerates detection speed.
The step of proposal generation can be seen as a bootstrap technique.
Among two-stage methods, the RCNN series, including RCNN, SPPnet, Fast RCNN
and Faster RCNN, is very representative.
29. Architecture of RCNN
Three steps
Step 1: Extracts the candidate object proposals, where the object proposals are
category-independent.
Step 2: For each object proposal of arbitrary scale, the image data is then warped
into a fixed size (e.g., 227×227) and put into the deep CNN network (e.g., AlexNet ) to
compute a 4096-d feature vector.
Step 3: Based on the feature vector extracted by CNN network, the SVM classifiers
predict the specific category of each proposal.
A drawback of this method is that RCNN suffers from repeated computation: the CNN runs once per proposal.
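Step 2's warping of an arbitrary-scale proposal to the fixed 227×227 input can be sketched with nearest-neighbor resampling (a simplified stand-in; R-CNN's actual warping is anisotropic scaling with context padding around the proposal):

```python
# A minimal sketch of warping a proposal crop of arbitrary size to the
# fixed CNN input size used by R-CNN (227x227), via nearest-neighbor
# resampling with NumPy fancy indexing.
import numpy as np

def warp_to_fixed_size(crop, size=(227, 227)):
    """Resize an HxWxC proposal crop to `size` (nearest neighbor)."""
    h, w = crop.shape[:2]
    out_h, out_w = size
    # For each output pixel, pick the nearest source row/column.
    rows = (np.arange(out_h) * h / out_h).astype(int)
    cols = (np.arange(out_w) * w / out_w).astype(int)
    return crop[rows][:, cols]

proposal = np.random.rand(64, 100, 3)   # an arbitrary-scale region crop
warped = warp_to_fixed_size(proposal)
print(warped.shape)                      # (227, 227, 3)
```

Every proposal, whatever its original aspect ratio, ends up as a 227×227 tensor, which is what lets a single fixed-input CNN compute the 4096-d feature vector for all of them.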
Editor's Notes
In the classification-with-localization problem, localization refers to figuring out where in the picture the cat we have detected is.