http://www.iaeme.com/IJARET/index.asp 104 editor@iaeme.com
International Journal of Advanced Research in Engineering and Technology
(IJARET)
Volume 6, Issue 12, Dec 2015, pp. 104-133, Article ID: IJARET_06_12_010
Available online at
http://www.iaeme.com/IJARET/issues.asp?JType=IJARET&VType=6&IType=12
ISSN Print: 0976-6480 and ISSN Online: 0976-6499
© IAEME Publication
REVIEW ON GENERIC OBJECT
RECOGNITION TECHNIQUES:
CHALLENGES AND OPPORTUNITIES
Prof. Deepika Shukla
Comp. Science and Engineering Department,
Institute of Technology, Nirma University, Ahmedabad, India
Apurva Desai
Department of Computer Science and Information Technology,
VNSGU, Surat India
ABSTRACT
Recognizing objects automatically in an image is a fundamental step for
many real-world computer vision applications. It is the task of identifying an
instance of an object in an image or video sequence with little or no human
intervention and assistance. In spite of its very high complexity, human beings
perform this task with little effort, even in a state of minimal attention.
Humans need little effort to recognize a huge number and variety of object
categories in images, even though the object in an image may differ in
size/scale, viewpoint, position or orientation. We can even recognize objects
that are only partially visible or set against a cluttered background.
Moreover, recognition may target a specific instance of an object or an object
category/class. When the task is performed for classes of objects, it is known
as generic object recognition, object-class detection or category-level object
recognition. Over the years many techniques have evolved for recognizing
object classes in images, but no automated object recognition system to date
has attained this capability fully on a par with human beings. This very fact
makes the recognition of objects in an image the most basic and fundamental
challenge in the field of computer vision research. The purpose of this study
is to give an overview and categorization of the approaches used in the
literature for generic object recognition and of the various technical
advancements achieved in the field. The survey mostly focuses on the leading
work since the year 2000.
We have discussed the challenges that the field currently faces. We have
also attempted to suggest future research directions in the area of generic
object recognition. Finally, we conclude the study with the hope that in the
near future more sophisticated object-class recognition systems will be
developed in an efficient and cost-effective manner.
Key words: Object Recognition, Generic Object Recognition, Object Class
Recognition, Scene Understanding, Scene Categorization, Image Analysis,
Computer Vision, Machine Vision, Scene Analysis.
Cite this Article: Prof. Deepika Shukla and Apurva Desai, Review on
Generic Object Recognition Techniques: Challenges and Opportunities.
International Journal of Advanced Research in Engineering and Technology,
6(12), 2015, pp. 104-133.
http://www.iaeme.com/IJARET/issues.asp?JType=IJARET&VType=6&IType=12
1. INTRODUCTION
Automated recognition of objects in images is a critical and fundamental step for
many real-world computer vision applications. It is the task of finding a given object
in an image or video sequence with little or no human intervention or assistance.
Very little effort is required on our part to detect and recognize a huge number of
classes of objects in images, even though the image of the object may differ in
size/scale, viewpoint, position or orientation. Human beings are able to recognize
objects in an image even when they are only partially visible or set against a
cluttered background. Moreover, the ability to generalize from examples and to
categorize objects, events, scenes and places is one of the core capabilities of the
human visual system. For a human being this is a mundane activity, but imbuing
machines with these capabilities has proved to be a significantly challenging task for
computer vision systems in general.
The reason may be rooted in the fact that automatic object recognition
requires an understanding of human visual perception and thus becomes a
multidisciplinary research area involving knowledge and expertise from fields such as
optics, psychology, pattern recognition, artificial intelligence, machine learning and,
most importantly, cognitive science, which in itself needs sophisticated concepts and
tools from mathematics as well as computer science [1].
Object recognition is a dominant field of research in computer vision as well
as in image analysis applications, and even the simplest machine vision task cannot be
solved without the help of recognition. This is evidenced by the vast volume of
research conducted in the area over the past three decades: entering "object
recognition from images" as a search string on ieeexplore.org returns more than
20,000 results. Given the substantial volume of literature on the topic, we can also
say that the field of object recognition is closely tied to, and is part and parcel of,
computer vision research.
This paper reviews most of the leading state-of-the-art research performed in the
area of generic object recognition. More specifically, it is focused on gaining
insight into the following research questions pertaining to the topic of Generic
Object Recognition.
What generic object recognition techniques and approaches are found in the
literature?
What different techniques are used for object representation?
Which feature detection and extraction methods are used by most of the prominent
researchers on the topic?
Which classification/learning techniques have been used in the classification stage of
the object recognition pipeline?
The rest of this paper is organized as follows. Section 2 introduces and explains
the problem of generic object recognition, which can be considered a specific subset
of the object recognition problem. Section 3 concentrates on the challenges that the
field of object recognition faces in general, and generic object recognition in
particular. Section 4 discusses the vast literature existing on the topic. Section 5
presents a roadmap of future research areas and directions. Section 6 concludes
the study.
2. GENERIC OBJECT RECOGNITION PROBLEM
The problem of object recognition can be viewed as a classification or labelling
problem in which models/representations of known objects are available to the system
and, when a novel image is given, the system has to predict the class of the object(s)
present in the image. Formally, it can be stated as follows: given an image containing
one or more objects of interest (and background) and a set of labels corresponding to
a set of models known to the system, the system should assign correct labels to
regions, or sets of regions, in the image. In other words, an object recognition
system should assign a high-level definition to an object based on the image data by
which it is represented.
The task of object recognition is often considered as broadly comprising three
sub-tasks:
Object detection: detecting whether an instance of the object category is present in
the image or not.
Localization: giving the location of the object category. Drawing a bounding box
around the object instance is the most prominent way of showing the localization
result in the literature.
Visual category recognition: recognizing and labelling the class/category of the
object present in the image.
Moreover, the image presented to the object recognition framework may contain a
single instance of some object class, multiple instances of a single class, or
multiple instances of multiple classes. At the top-most level, object recognition
approaches can broadly be categorized as following a top-down, bottom-up or hybrid
approach, and within that they can target a specific or a generic object. So,
basically, image-based object recognition can be stated as: given a database of
objects and an image, determine which, if any, of the objects are present in the
image. Thus the problem of object-class recognition can be considered an instance of
supervised classification.
Another dimension along which the task of object recognition can be categorized
is the following. In the first case, the specific object to be recognized is known to
the system and the system is trained for that specific object category only; examples
are face recognition and pedestrian recognition. In the second case we have a generic
object recognition system. Generic object recognition means that the computer
recognizes objects in images by their general name [2] or common name. Figure 1
shows an instance of generic object recognition. Generic object recognition has also
been referred to in the literature as object-class detection or category-level object
recognition [14]; it aims at recognizing the class to which the object present in the
image belongs. The images can contain a single instance of a class, multiple
instances of the same class, or multiple instances of multiple classes. When multiple
objects of multiple classes in an image are categorized, the task is known as scene
categorization.
Figure 1 Generic Object Recognition
2.1. Architecture of the object recognition system
Current vision systems can be said to consist of the activities shown in Figure 2.
Figure 2 Activities involved in a typical vision system
Any recognition system involves these activities, or some subset of them, in its
life cycle. In general, after the image acquisition stage, the image is pre-processed
for noise removal and some kind of enhancement. The pre-processing stage is followed
by the feature extraction and description/representation stage, whose output is then
passed on for recognition. In the representation stage, objects can be represented in
2-D or 3-D. Figure 3 shows the general architecture of an object recognition system.
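The stages above can be sketched as a simple pipeline. This is an illustrative toy, not the architecture of any particular surveyed system: the 3x3 mean filter stands in for real pre-processing, and `extract_features`/`classify` are hypothetical pluggable callables representing the later stages.

```python
import numpy as np

def preprocess(image):
    """Pre-processing stage: a 3x3 mean filter as a simple stand-in
    for noise removal/enhancement."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    return np.mean(
        [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)],
        axis=0)

def recognize(image, extract_features, classify):
    """Run one image through the pipeline of Figures 2 and 3:
    pre-processing -> feature extraction/description -> classification."""
    denoised = preprocess(image)
    descriptor = extract_features(denoised)
    return classify(descriptor)
```

Any concrete system plugs its own descriptor and classifier into the two callables; the control flow stays the same.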
The object recognition task is affected by several factors and can differ according
to various aspects, as shown in Figure 4, which categorizes the aspects along which
work is going on in the field. Approaches may differ in the form and representation
of objects, the matching scheme, the image formation model, the type of features, the
type of image and the type of data suited for categorization. Having studied these
various aspects, we found that the approaches mainly differ in the object
representation method, based on the type of features, or in the classification
approach adopted in the recognition phase.
As these factors change, the approach can be observed to change substantially, but
the approaches broadly follow three paradigms for formulating and attempting a
solution to the problem of object recognition in an image: the bottom-up, top-down
and hybrid paradigms [103].
Figure 3 Generic Architecture of Object Recognition System
Bottom-up: This can be considered image analysis starting from low-level data and is
based on image segmentation techniques. It considers the raw image data in the form
in which it is acquired. Boundaries of homogeneous regions are extracted by
performing non-purposive segmentation without prior knowledge about the properties of
individual object classes; no prior assumptions are made about what the objects are.
A fixed set of attributes is used to characterize these regions, and objects are
linked together to characterize the scene itself. However, without some additional
information, purely bottom-up approaches had, as of 2009, been unable to yield
figure-ground segmentations of sufficient quality for object categorization
[Leibe & Schiele]. Since then, many approaches have been developed [85, 86, 88, 89]
that use bottom-up segmentation methods, as discussed in [82] and [85], and have
achieved remarkable results; these will be discussed in detail in the literature
review section of this paper.
Top-down: This is image analysis starting from semantic-level data. In contrast to
the previous approach, this methodology proceeds on the assumption that the image
does contain a particular object or, if the problem is scene categorization, that it
shows a particular type of scene. The system attempts to verify the existence of a
hypothesized object. Purposive segmentation may be performed, or specialized ways of
representing the object are used.
Hybrid: In this kind of approach a combination of the two earlier paradigms is used
[61], [79].
3. KEY CHALLENGES
3.1. Challenges overview
As stated earlier, the problem of Object recognition in general and Generic Object
Recognition in particular faces various challenges.
(I) The appearance of an object in the image can have a large range of variation due
to:
1. Viewpoint changes
2. Scale, Orientation and Shape changes (e.g., non-rigid objects)
3. Photometric effects (scene illumination etc.)
4. Scene/Background clutter (therefore objects may be occluded)
(II) Different views of the same object can give rise to widely different images.
(III) A large number of object categories exist in the real world, and these
categories may exhibit very little inter-class variation.
Figure 4 Factors affecting the task of object recognition
3.2. Description
Object recognition can be considered yet another data processing task, so data is
given the highest priority and acquisition should be considered the most important
step. In recent years, with the advent of high-quality cameras and other image
capturing devices, we can collect a huge amount of data (images) in various forms,
such as intensity images and range images, and from various sources such as the web.
However, the major problem that the computer vision research community faces today is
the scarcity of accurately and precisely labelled image examples. As stated earlier,
the object recognition problem can essentially be considered a supervised
classification task, and for that to work successfully, labelled image examples are
needed. The problem is aggravated by the fact that annotation is labour intensive,
and the lack of human experts who can perform the image annotation task efficiently
and accurately makes it more challenging still.
Feature extraction is the next crucial step in the generic object recognition
pipeline. Assuming that the data is available, feature extraction becomes the most
important stage of the entire object recognition framework. If suitable features of
the right dimensions are not extracted, this phase can become the bottleneck of the
recognition pipeline. Although many sophisticated approaches have recently been
developed and exist in the literature, they are not sufficient to describe every
object, so feature extraction becomes highly object-specific and varies with the
viewpoint, size and illumination conditions under which the image is captured. Thus,
representing images by effective features is crucial to the performance of various
image analysis tasks. Features can be low-level (colour, texture, intensity),
middle-level (image patches) or high-level (objects, textually annotated objects).
Figure 5 shows one possible classification of the different kinds of features.
Choosing and deploying an appropriate classifier is the next important step of the
pipeline. The classifier can be linear or non-linear. Various classifiers, such as
the Bayesian classifier, SVMs, decision trees and neural networks, have been used in
the literature for classification, each with its own benefits and drawbacks. One
important issue inherent to the classification stage is the scalability of the
classifier. The number of object categories in the real world is very large, and many
visual features are required to model each category, forcing the system to hold a
huge volume of training data in order to model the variety of category classifiers.
To keep scalability manageable, a linear classifier is commonly used, but its
classification performance is inferior to that of a non-linear one, while non-linear
classifiers are more computation intensive. To remedy this shortcoming of linear
classifiers, a rich image feature set (which is, after all, a key factor in the
success of an image recognition system) must be designed per object class, so that
the system can distinctly recognize objects in images exhibiting inter-class and
intra-class variation, as shown in Figure 6. Additionally, the classifier has to be
updated continuously: even if it has been trained once for a category/class of
object, when previously unseen instances emerge or the appearance of the object
evolves, the previously trained classifier will no longer give correct results. This
kind of flexibility and resilience to change is inherently expected of any object
recognition framework.
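To make the linear option concrete, the sketch below trains a perceptron-style linear classifier on descriptor vectors. It is a toy illustration of why linear models scale well (training and prediction cost grow only linearly with the feature dimension), not the classifier of any particular system surveyed here.

```python
import numpy as np

def train_linear(X, y, epochs=100, lr=0.1):
    """Perceptron-style linear classifier over descriptor vectors.
    y holds labels +1/-1; a bias feature is appended to each sample."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:              # misclassified: update
                w += lr * yi * xi
    return w

def predict(w, X):
    """Apply the learned linear decision boundary."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.sign(Xb @ w)
```

A non-linear (e.g. kernel) classifier would replace the dot product `w @ xi` with a comparison against stored training samples, which is where the extra computation comes from.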
Figure 5 Classification of Image Features
Figure 6 Images of different instances of an object class (dog) under varied imaging conditions.
Intra-class appearance variations refer to the appearance differences among
different objects of the same class [14]. They may be due to differences in the
colour, shape and size of the object instances, or to differences in imaging
conditions; for example, images of the same object taken at different times of day,
in different seasons, at different places, with different devices and from different
viewpoints will be entirely different. In addition to intra-class appearance
variations, a generic object recognition system also has to handle inter-class
appearance variations efficiently and distinctly; in many cases these are very small,
as shown in Figure 7. For example, an object recognition system should be capable of
distinguishing between a donkey and a horse, or a horse and a mule.
Figure 7 Images of horses and donkeys with very small inter-class appearance
variation; the lower row shows images of horses (adapted from [14])
The performance of a generic object recognition framework is generally judged on
criteria such as robustness against noise; invariance to basic geometric
transformations, illumination and viewpoint changes; the ability to handle large
numbers and different types of objects; the ability to handle intra-class and
inter-class variations; the ability to recognize objects in the presence of clutter
or a complicated background; and the ability to recognize an object accurately and
efficiently even when it is partially occluded. These requirements are implicitly
expected of, and must be met by, any object recognition framework, and as a result
these issues can be considered the key challenges of the field of generic object
recognition.
4. LITERATURE REVIEW
4.1. Overview
The object recognition pipeline, as stated in the earlier section, consists of key
tasks such as image acquisition, pre-processing, feature extraction, feature
representation/description and classification. The image acquisition and
pre-processing phases, however, fall outside the scope of this study. Although most
of the related work surveyed and cited here focuses on one or another phase of this
pipeline, our main focus in this study is on feature extraction and description
techniques, and on obtaining answers to the research questions put forward at the
beginning of this manuscript. Various groups of researchers have attempted to survey
and review work in the field of computer vision, but these reviews either relate to
some specific object (for example, surveys on face recognition are presented in
[108, 109]), compare and survey various descriptors [14, 45, 48], or, as in the
separate survey in [114], cover object recognition using deep neural networks; that
is, one particular aspect of the topic is explored and the related literature is
reviewed alongside the authors' core work. Comprehensive surveys on generic object
recognition [14, 15] have been published periodically in the past, but given the
rapid pace of achievements in the field, it seems natural to survey the most recent
developments and object recognition techniques available in the literature. In this
study we have mostly tried to review the work done in the field since the year 2000,
with more emphasis on work done after 2011.
The rationale behind this is that most existing surveys and papers talk extensively
about the approaches before 2011. Given the pace of technical advancement in the
field, a lot of approaches have emerged since 2011 that demand detailed coverage,
and covering them is the basic motive of this review. The study therefore also aims
to present the survey in a way that helps the reader gain insight into this field of
research. As noted in the introduction, the task of object recognition is considered
as broadly comprising three sub-tasks, object detection, object localization and
object classification, but in this manuscript we have studied approaches to generic
object recognition, which is the highest-level task among the object categorization
sub-tasks: to categorize the class of an object in the image, object detection is
inherently performed, and the objects often need to be localized as well. For this
reason we have not segregated the approaches on the basis of detection, localization
or categorization.
4.2. Features and Feature Descriptors
The foundations of the field can be traced back to the 1950s and 1960s, when early
work was done in very simplistic domains [1]. The world was modelled as being
composed of blocks defined by the coordinates of their vertices and by edge
information. The "block image" represented areas of uniform brightness in the image,
and the edges of blocks were located in areas of intensity discontinuity. It was soon
realised, however, that this is not an ideal way to represent the complicated
information present in an image. Since then, various strategies have been developed
for the task of object recognition, with an emphasis on the feature extraction stage
and on the use of novel and efficient types of feature descriptor.
Object recognition approaches can be grouped into several broad categories,
including model-based, shape-based and appearance-based approaches. Model-based
approaches try to represent objects as sets of three-dimensional primitives
[1, 12, 13] such as generalized cylinders, cones, cubes, cuboids and spheres.
Shape-based approaches [13, 19, 20, 21, 52, 53] represent objects by shape primitives
such as boundary fragments, contours and shapelets. In appearance-based models, by
contrast, only the appearance is used, which is usually captured by different
two-dimensional views of the object of interest. It is thus easy to see that,
whatever the representation method, object representation takes centre stage in the
entire object recognition pipeline, and the problem of object-class recognition in
turn reduces to generating an efficient representation of the object that can detect,
localise and identify the class of the object discriminatively and repeatably.
As stated earlier, extracting and describing the features of the objects in the
images efficiently decides the fate and success of a typical object recognition
system. In a generic object recognition or categorization system, the relevant
features or descriptors of a characteristic point, patch or region of an image are
obtained by a variety of approaches. As shown in Figure 5, at the top-most level
features can be divided into two categories, global and local: the former
characterize the image as a whole, whereas the latter represent some local
information in the form of a pixel, patch or region. Another direction along which
many researchers have tried to classify features is structural versus statistical.
Although there are various classifications of features, significant overlap exists
among the classes; local features, for example, can be structural as well as
statistical. These features are often combined to form various descriptors; in
particular, region-level descriptors are formed by combining colour, texture and
other such low-level features.
As far as pixel-level features are concerned, they are regarded as low-level
information about the image; they are computed directly from the grayscale values of
individual pixels and are generally used to build more sophisticated patch-level or
region-level descriptors. We now briefly discuss some of the best-performing
descriptors proposed and used over the years. This is not meant to be an exhaustive
discussion of the existing approaches, but rather a sample of some relatively
successful and widely used ones.
4.2.1. Appearance-Based Object Representation
The Scale-Invariant Feature Transform (SIFT) [3][4], introduced by Lowe, is regarded
as one of the most popular patch-level feature descriptors reported in the
literature. The features identified are shown to be invariant to basic geometric
transformations and partially invariant to illumination changes and occlusion. SIFT
features proved successful because they do not depend on the exact grey-level
distribution within an image patch, but instead use the general configuration of the
image gradient [60]. This was considered one of the most prominent approaches in the
area of object recognition, and the work is considered a milestone in research on
object recognition, computer vision and other image analysis problems. However, since
the descriptors are appearance-based, they may produce poor results, especially if
the object does not carry enough texture information. SIFT has been applied to the
problem of object recognition in many works; two such usages are described in [3]
and [4]. In various other works [2, 39, 42, 75, 110, 111, 112], improvements have
been achieved by combining other features with SIFT or by using filters other than
the Gaussian [110]. The number of keypoints obtained when SIFT is applied is
relatively large, resulting in high-dimensional data. This drawback was recognized by
the authors of [5], who extended SIFT as PCA-SIFT, in which Principal Component
Analysis is applied to the normalized gradient patch, yielding a lower-dimensional
descriptor. PCA-SIFT yields a 36-dimensional descriptor that is fast to compute and
match but less distinctive [6], while the GLOH (Gradient Location-Oriented Histogram)
descriptor introduced by Mikolajczyk and Schmid [45] is another variant of SIFT that
proved more distinctive at the same dimensionality [6]. A colour image-based SIFT has
also been demonstrated in [75], in which colour gradients are used in the Gaussian
framework in place of intensity gradients.
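The central idea, describing a patch by the configuration of its image gradient rather than its raw grey levels, can be sketched as below. This is a deliberately simplified, hypothetical illustration; real SIFT additionally performs scale-space keypoint detection, splits the patch into a 4x4 grid of sub-cells, and interpolates and clips histogram entries.

```python
import numpy as np

def orientation_histogram(patch, bins=8):
    """Describe a patch by its gradient-orientation distribution,
    weighted by gradient magnitude (the SIFT building block)."""
    gy, gx = np.gradient(patch.astype(float))   # image gradients
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)      # orientation in [0, 2*pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi),
                           weights=mag)
    norm = np.linalg.norm(hist)
    # Normalising gives partial invariance to illumination changes
    return hist / norm if norm > 0 else hist
```

Because only the *distribution* of orientations is kept, a uniform brightness shift of the patch leaves the descriptor unchanged.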
As mentioned earlier, the high dimensionality of the descriptor is the major
limitation of SIFT. Another effective patch-level descriptor, SURF (Speeded-Up Robust
Features), was proposed by Bay et al. in [6]. The authors make use of integral
images, which yields features that are not only faster to compute but also
distinctive and repeatable. They base their descriptor on the Hessian matrix but use
a very basic approximation. Moreover, only 64 dimensions are used, much fewer than
SIFT's 128-dimensional vector. One can argue that PCA-SIFT results in only a
36-dimensional vector, but it loses distinctiveness in the process, whereas SURF has
proved more distinctive and repeatable.
Another level at which feature descriptors are generated in numerous papers is the
region level. Dalal and Triggs [32, 33, 34] used grids of locally normalised
Histograms of Oriented Gradients (HOG) as descriptors for object detection in static
images. The technique counts occurrences of gradient orientations in localized
portions of an image. The detector window is tiled with a grid of overlapping blocks,
in each of which a Histogram of Oriented Gradients feature vector is extracted. The
resulting detector is contrast-based, which makes it robust to small changes in image
contour locations and directions and to significant changes in image illumination and
colour, while
remaining highly discriminative for overall visual form. The work of Dalal and Triggs
[32, 33, 34] is aimed at the detection of humans in particular, but the descriptor
also proved effective in detecting other object classes in images. HOG has proved to
be a very efficient descriptor for representing structured objects; for example, it
has outperformed all other descriptors in pedestrian detection from videos and
images. Inspired by HOG [32], Bosch et al. [36] proposed a novel descriptor called
PHOG (Pyramid of HOG). The idea is to represent local image shape and its spatial
layout, together with a spatial pyramid kernel of Bag of Features (BoF) [25, 26].
Each image is divided into a sequence of increasingly finer spatial grids by
repeatedly doubling the number of divisions along each axis (like a quadtree), and
the number of points in each grid cell is recorded. A HOG vector is computed for each
grid cell at each pyramid resolution level, and the final PHOG descriptor for the
image is the concatenation of all the HOG vectors. This concatenated HOG vector is
then normalized to ensure that texture-rich images, or images with many edges, are
not weighted more strongly than others. Another descriptor built on the idea of the
histogram of gradients is CoHOG (Co-occurrence Histograms of Oriented Gradients),
proposed in [37]. CoHOG can express shapes in more detail than HOG because its
histograms have pairs of gradient orientations as their basic units; the histogram is
referred to as a co-occurrence matrix. This pairing increases the vocabulary size,
resulting in a more specific expression of the shape of the object in the image. The
use of a higher-dimensional matrix makes CoHOG powerful in terms of discriminative
power, but at the same time highly computation intensive.
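The HOG family above shares one computational core: per-cell orientation histograms concatenated over a grid. The following stripped-down sketch is our simplification for illustration only; the full Dalal-Triggs detector additionally uses overlapping blocks with per-block contrast normalisation.

```python
import numpy as np

def hog_sketch(image, cell=8, bins=9):
    """Per-cell histograms of unsigned gradient orientation (0-180
    degrees), concatenated over a non-overlapping grid of cells."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    h, w = image.shape
    cells = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            hist, _ = np.histogram(ang[i:i + cell, j:j + cell],
                                   bins=bins, range=(0, 180),
                                   weights=mag[i:i + cell, j:j + cell])
            cells.append(hist)
    vec = np.concatenate(cells)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```

PHOG would repeat this at several grid resolutions and concatenate the results; CoHOG would histogram *pairs* of orientations at fixed offsets instead of single orientations.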
Bag of Features and visual codebook based approaches
This approach is inspired by the BoW (Bag of Words) approach, first proposed in 1997
by [38] for describing textual data for the purpose of various text analysis tasks.
BoW represents a text document, or a sentence written in natural language, as a set
of words, taking into consideration neither its grammar nor the order in which the
words occur in the original text. The frequency of occurrence of each word is
calculated and then used for various language processing tasks. The analogous term
BoF (Bag of Features) is used for the image-based approach: similarly to the BoW
model, an image is represented as an orderless collection of local image features.
Similar terms, such as Bag of Keypoints (BoK) and Bag of Visual Words (BoVW), are
used by various researchers in their work. The method is based on vector quantization
of affine-invariant descriptors of image patches [39]: a bag of keypoints corresponds
to a histogram of the number of occurrences of particular image patterns in a given
image. The method uses clustering to obtain quite high-dimensional feature vectors
for a classifier. Since the BoF approach constructs a codebook, it is also often
referred to as a codebook-based approach. The method includes the following main
steps.
 Detection of image patches for the computation of patch descriptors.
 Computation of patch descriptors for these patches. These can be any invariant descriptor such as SIFT [3, 4] or a variant of it, or any other lower-level descriptor such as Harris-affine [43] or MSER.
 Construction of a visual codebook/vocabulary/dictionary by assigning the patch descriptors to predetermined clusters (a vocabulary) with a vector quantization algorithm that groups similar features together. Several clustering techniques have been used for this purpose; k-means clustering is applied most frequently [39], while hierarchical k-means clustering is adopted in [49] and mean-shift in [35].
 Generation of a histogram of the number of occurrences of the patches assigned to each cluster. The size of the resulting histogram equals the size of the codebook, and hence the number of clusters obtained from the clustering step [40].
 Treating the bag of features as a feature vector and using a classifier to classify the respective image. A distance measure is required when comparing two term vectors for similarity, but this measure operates in the term-vector space as opposed to the feature space.
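The pipeline above can be sketched end-to-end with a toy implementation, using random vectors to stand in for SIFT-like patch descriptors and a plain k-means loop for the vector quantization step (the cluster count and iteration budget are arbitrary choices here):

```python
import numpy as np

def build_vocabulary(descriptors, k=16, iters=10, seed=0):
    """Toy k-means vector quantization over patch descriptors."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest cluster center
        d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = descriptors[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def bof_histogram(descriptors, centers):
    """Quantize one image's descriptors against the vocabulary and
    return the normalized occurrence histogram (the BoF vector)."""
    d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

The resulting histogram is the fixed-length "term vector" that a classifier such as an SVM consumes.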
The bag-of-features (BoF) image representation proved popular for indexing and categorization applications for two reasons. First, the representation benefits from powerful local descriptors such as SIFT; second, these vector representations can be compared with standard distances and subsequently used by robust classification methods such as support vector machines [50]. In addition, codebook-model-based approaches, while ignoring any structural aspect of vision, provide state-of-the-art performance on current datasets [40]. The discriminative power of the visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. Codebook-based approaches are considered simple and efficient, and can also be made robust to clutter, occlusion, viewpoint change, and even non-rigid deformations [26, 25]. In spite of being one of the popular and successful approaches, BoF and visual-codebook generation also have certain limitations. Because BoF expresses the image as appearance-frequency histograms of visual words by quantizing SIFT-like features, location information and the geometric relationships between keypoints are lost. Moreover, since vector quantization is involved, some loss of information is inherent. Finally, because the geometric relations between features are lost, localization of the object is not possible.
To overcome the limitation of this orderless representation, several researchers have proposed ways to augment the bag of features with global spatial relations that significantly improve classification performance while remaining simple and computationally efficient enough for real-world applications [27]. The authors of [27] demonstrated that the bag-of-features description of an image can be extended to spatial pyramids so that the spatial locations of the features are retained. To generate these spatial pyramids, the input image is partitioned into increasingly fine sub-regions, histograms of local features are computed over these sub-regions, and the histograms are concatenated to form the final feature. This representation is combined with the kernel-based pyramid matching scheme proposed in [24], which efficiently computes an approximate global geometric correspondence between the sets of features in two images. While the spatial pyramid representation sacrifices the geometric invariance properties of bags of features, it compensates for this loss with the increased discriminative power derived from global spatial information.
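A minimal sketch of this spatial-pyramid extension, assuming keypoint coordinates and already-quantized visual-word indices as inputs (a made-up interface; real implementations work directly from detector output):

```python
import numpy as np

def spatial_pyramid(xs, ys, words, vocab_size, img_w, img_h, levels=2):
    """Sketch of a spatial-pyramid BoF feature: visual-word histograms
    computed per grid cell at each pyramid level and concatenated."""
    feats = []
    for level in range(levels + 1):
        n = 2 ** level                                # n x n grid at this level
        cx = np.minimum(xs * n // img_w, n - 1)
        cy = np.minimum(ys * n // img_h, n - 1)
        for i in range(n):
            for j in range(n):
                in_cell = (cx == i) & (cy == j)
                hist = np.bincount(words[in_cell], minlength=vocab_size)
                feats.append(hist)
    return np.concatenate(feats)
```

Each keypoint is counted once per level, so coarse layout information is retained while the per-level histograms stay orderless within their cells.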
Similarly, in [2], to overcome this inherent limitation of the BoF approach, a graph is constructed by connecting SIFT keypoints with lines. The keypoints thereby maintain their relationships, and a structural representation with location information is obtained. Since a graph representation is not suitable for statistical processing, the graph is embedded into a vector space according to the graph edit distance; with this approach, the authors achieved improved recognition accuracy compared to the conventional method in their experiments on the PASCAL VOC and Caltech-101 datasets. The basic idea for improving the BoF approach, then, is to incorporate the spatial locations of features into the BoF representation so that the method can be used not only for recognition but also for object localization. The
authors in [47] achieved an improvement by adding binary signatures to the descriptors: first, a Hamming Embedding (HE) of the SIFT descriptors, analogous to the Hamming distance; and second, a weak geometric consistency (WGC) check integrated within the inverted file system, which penalizes descriptors that are not consistent in terms of angle and scale. In this way, geometric information is incorporated in the index even for very large datasets. At the same time, both HE and WGC require additional information to be stored, so the memory requirement of the index increases.
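The Hamming side of this scheme reduces to comparing short binary signatures; a minimal sketch, with the signature width and the acceptance threshold chosen arbitrarily for illustration:

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary signatures packed as ints."""
    return bin(a ^ b).count("1")

def consistent(sig_query: int, sig_db: int, threshold: int = 24) -> bool:
    """A descriptor match is kept only if its binary signature lies
    within a small Hamming radius of the query's signature."""
    return hamming(sig_query, sig_db) <= threshold
```

Because the signatures are plain integers, the filtering step costs one XOR and a popcount per candidate, which is what makes the scheme viable at very large scale.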
The visual-codebook approach has been used by several other researchers in slightly different ways. For example, Leibe et al. [7, 8, 9] adopted a two-stage approach. In the first stage, a codebook of local appearances is learnt that captures which local structures may appear on objects of the target category. Next, an Implicit Shape Model (ISM) specifies where on the object the codebook entries may occur. To create the codebook, the authors adopted the method presented in [17] by Agarwal and Roth: from a variety of images, 25 x 25 pixel patches are extracted with the Harris interest-point detector, and these patches are grouped by agglomerative clustering to generate a compact set of clusters. These codebook entries are used to define the implicit shape model of the objects. The approach does not try to create a separate model for every possible shape an object can take; rather, it defines the shapes of an object in terms of patches that are consistent in local appearance. Due to this, fewer training examples are needed to learn an object's probable shapes. In a second pass, the codebook entries are scanned, and all entries whose similarity exceeds a chosen threshold are activated; this threshold is the same as the one used during clustering in the first step. In the recognition stage, a generalized Hough transform is performed to identify possible object centres.
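The voting step of the recognition stage can be sketched as follows, assuming each activated codebook entry carries a stored offset to the object centre (the tuple format is invented for illustration):

```python
import numpy as np

def vote_for_centre(matches, img_h, img_w):
    """ISM-style sketch of generalized Hough voting: each activated
    codebook entry votes for the object centre via its stored offset,
    and the vote-map maximum is the hypothesised centre.
    `matches` is a list of (patch_y, patch_x, offset_y, offset_x)."""
    votes = np.zeros((img_h, img_w))
    for py, px, dy, dx in matches:
        cy, cx = py + dy, px + dx
        if 0 <= cy < img_h and 0 <= cx < img_w:
            votes[cy, cx] += 1
    return np.unravel_index(votes.argmax(), votes.shape)
```

Real implementations weight the votes by match quality and smooth the vote map before taking maxima; this sketch keeps only the accumulation idea.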
GIST: Humans can recognize the gist of a novel image in a single glance, independent of its complexity [69], considering the image in a "holistic" manner while overlooking most of the details of its constituent objects. Intuitively, GIST summarizes the gradient information (scales and orientations) for different parts of an image, which provides a rough description (the gist) of the scene. The input image is divided into non-overlapping regions; each region is further divided into sub-regions, and a gradient-orientation histogram is computed for each sub-region. The GIST descriptor for a region is formed by concatenating the gradient-orientation histograms of all its sub-regions. The approach is more prevalently used for scene understanding. GIST-based approaches cannot be considered an alternative to local-feature-based image analysis, but they can serve as additional support for recognition by helping to constrain the local-feature-based analysis. In [72, 73], short binary codes are used to compress GIST descriptors, and the authors demonstrate that the approach works on millions of images obtained from the Internet without sacrificing recognition accuracy or effectiveness.
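The binary-code compression of [72, 73] can be illustrated with a random-projection sketch, where the sign of each projection gives one bit; the bit width and projection scheme here are illustrative assumptions, not the exact method of those papers:

```python
import numpy as np

def binary_code(descriptor, n_bits=64, seed=0):
    """Compress a (GIST-like) descriptor to a short binary code via
    random projections: the sign of each projection gives one bit."""
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((n_bits, len(descriptor)))
    return (proj @ np.asarray(descriptor, dtype=float) > 0).astype(np.uint8)

def code_distance(c1, c2):
    """Hamming distance between two binary codes."""
    return int(np.count_nonzero(c1 != c2))
```

Nearby descriptors tend to receive nearby codes, so nearest-neighbour search over millions of images reduces to cheap Hamming comparisons.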
4.2.2. Shape-Based Approaches
Many approaches based on intensity or colour gradients of image patches or regions were discussed in the previous part of this paper. As noted, these descriptors are very powerful and have been shown to perform object recognition with remarkable effectiveness. Still, there are cases where two object classes share the same colour and texture and differ only in shape, or where the appearance varies greatly across instances of the object. Such objects cannot be represented
with colour- and intensity-based features alone. For example, within a produce class, a raw mango and a capsicum are both green but have entirely different shapes. The recognition community also understood early on that, across the exemplars belonging to a category, shape is a more invariant property than appearance. As a result, the majority of recognition systems from the mid-1960s to the late 1990s attempted to extract shape features, typically beginning with the extraction of edges: at occluding boundaries and surface discontinuities, edges capture shape information. Shape is thus another important cue that can be used to generate a discriminative representation of objects. Different authors have taken different approaches to computing the shape of an object. Shape cues are frequently captured and described at the region level for object-class recognition or detection using contour or boundary fragments [19], shapelets, edgelets [20], shock graphs, etc. Another area of research in shape-based detection is how to set up the correspondence between shapes extracted from training and test images, i.e., how to decide whether two shapes match [52, 53]. One limitation of shape-based object description is that it cannot capture intra-class variations in a very discriminative way; for example, a zebra cannot be differentiated from a horse by shape alone. Shape-based cues are therefore often combined with other, appearance-based cues.
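As a toy illustration of shape description and matching, the sketch below uses a centroid-distance signature; real systems use far richer descriptors such as shape contexts, and the input format here is an assumption:

```python
import numpy as np

def shape_signature(contour, n_samples=32):
    """Toy shape descriptor: distances from contour points to their
    centroid, resampled and scale-normalized.  `contour` is an (N, 2)
    array of boundary points (a made-up input format)."""
    centroid = contour.mean(axis=0)
    d = np.linalg.norm(contour - centroid, axis=1)
    idx = np.linspace(0, len(d) - 1, n_samples).astype(int)
    sig = d[idx]
    return sig / sig.max()              # scale invariance

def shape_distance(sig_a, sig_b):
    """Match two shapes as the best alignment over circular shifts,
    giving crude invariance to the contour's starting point."""
    return min(float(np.abs(np.roll(sig_a, s) - sig_b).sum())
               for s in range(len(sig_a)))
```

The zebra-versus-horse problem noted above shows up immediately here: two contours of similar outline produce near-identical signatures regardless of their surface appearance.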
4.2.3. Part-Based Approaches
Object as 3D volumetric parts
The earliest attempts at solving the object recognition problem used high-level 3D parts to model objects, such as generalized cylinders (Binford), geons (Biederman [13]) and superquadrics (Pentland) [79]. The common characteristic of these representations is that they are all based on symmetry, a physical regularity of our world that the human visual system exploits. In practice, however, extracting such parts efficiently and inexpensively is too complex. Once extracted, though, they are semantically closer to a description of the image content, and such parts are limited in number compared with approaches that describe the object using low-level and mid-level features. Although methods based on low-level and mid-level features score on simplicity, ease of extraction and attractive invariance properties, they have proved weak at expressing high-level semantic information about the image. These facts made object representation using 3D volumetric parts attract a great deal of attention in the 1970s and 1980s. Detailed coverage of the topic is beyond the scope of this study, but the works of Binford and Nevatia [115] can be explored for further information.
Recognition based on parts
In part-based object recognition approaches, an object is modeled as a geometrically constrained set of parts, where each part has a distinctive appearance and spatial position; shape is represented by the mutual positions of the parts [22]. Using such features, it is determined whether an instance of the object of interest exists in the image and, if so, where it is located. The methods in the literature differ in how the parts are detected, how their positions are represented, and how many parts should ideally represent an image; generally these parameters are tuned to the requirements of the approach. In [22], objects are modelled as flexible constellations of parts. A probabilistic representation (in this case Gaussian) is used for all aspects of the object: shape, appearance, occlusion and relative scale. To learn and model an object category, regions and their scales are first detected. The identified regions are cropped from the image and rescaled to a small patch, typically 11 x 11 pixels, and the parameters of the above densities are then estimated from these regions such that the model gives a maximum-likelihood description of the training data. To detect the features, a histogram of the intensities in a circular region of some radius is generated for each point in the image; the entropy of this histogram is computed, and its local maxima determine the scale of the region. The N regions with the highest saliency over the image provide the features for learning and recognition. PCA is used to reduce the dimension of the feature set.
Deformable part-based approach
Deformable Part Models (DPMs) constitute the state of the art for sliding-window object detection [99]. DPMs are inspired by the pictorial structure representation introduced by Fischler and Elschlager in [91], where an object is modelled by a collection of parts arranged in a deformable configuration [92]. Small picture segments represent the visual properties of the object, while the deformable configuration is captured by spring-like connections between these segments. An energy function is formed by summing a match cost for each part and a deformation cost for each pair of connected parts, and this energy is minimized to find the best match of the model within an image. The effectiveness of the pictorial representation for image matching demonstrated in [91] is due to its simplicity; in addition, the representation is widely applicable because it does not depend on any particular scheme for modelling the appearance of the parts, and so can represent quite generic objects. On the other hand, the model suffers from some critical limitations. Many parameters are involved in constructing the model, so minimizing the energy function becomes very computation intensive. Moreover, only the single best match is found; if the image contains multiple instances of the same object, they are not all detected by the pictorial representation of [91]. These issues were aptly handled by Felzenszwalb in the pioneering work reported in [92]. The pictorial representation of Fischler and Elschlager can be viewed as a general graph, whereas Felzenszwalb and Huttenlocher used a tree representation, observing that many real-world objects, especially human beings and animals, can be modelled by a tree structure. With this improvement, the best match of the model to an image can be computed in polynomial time. The approach requires that the graph representing the object be acyclic, and that the function dij(li, lj), measuring the degree of deformation of the model when part vi is placed at location li and part vj at location lj, be a Mahalanobis distance between transformed locations. While deformable models can capture significant variations in appearance, a single deformable model is often not expressive enough to represent a rich object category [93]. It can also be noted that, in practice, simple models often outperform deformable part-based representations, because simpler models are easier to train than sophisticated models such as DPMs. The authors of [93] represent an object by a low-resolution root filter and a set of higher-resolution part filters arranged in a flexible spatial configuration. The flexible spatial configuration helps to
model the visual appearance at multiple scales. The approach has achieved benchmark results in the PASCAL object detection challenges. It basically uses HOG features [32] in a star-structured part-based model: a root filter similar to the filter used in [32], together with a set of part filters and deformation models.
The model presented in [101] is effective for shallow structures of at most two layers, but as the number of layers increases it becomes difficult to scale the model without introducing and tuning additional parameters. Yuille et al. [106] extended the model discussed in [101], proposing to describe an object class using several templates from different viewpoints. Each template is represented as a three-layer tree structure: the first layer represents the entire image; the second layer divides the image into 9 sub-images; and the third layer divides each second-layer sub-image into four, giving 36 sub-images in the third layer.
The approach used by Dalal and Triggs [32] to detect pedestrians fails in the presence of articulation, whereas [93, 94, 95, 96] allow an intermediate layer of parts that can be shifted with respect to each other, making the overall model deformable and thereby achieving generalization. Such approaches still do not work when the goal is to extract human pose from images. In [102], Bourdev and Malik introduced "poselets", parts that are tightly clustered in both appearance and configuration, for detection and pose estimation in images containing human bodies. In [79], Pablo et al. unified the approaches of Dalal and Triggs [32], Felzenszwalb [95], and Bourdev and Malik [102] into a single recognition framework that tries to take the benefits of each: region-based object descriptors are used to perform purposive semantic segmentation, and their outputs are then combined to improve performance.
4.2.4. Recent Approaches and Advancements
We have discussed many approaches, with their benefits and limitations, in the earlier sections. Note that all of those approaches make essential use of machine learning methods, and most machine learning methods work well because of human-designed representations and input features. Early conventional approaches involve hand-crafted features for object representation and then look for those features in the image; to do this, the programmer needed deep knowledge of the data and would laboriously engineer each of the feature-detection algorithms [114]. There have been big improvements in image analysis over the last few years due to the adoption of deep neural networks for vision problems. Fig-8 schematically shows the difference between traditional vision systems and recent deep-neural-network-based systems.
Neural Nets for Object Recognition: Neural networks have been used in object recognition systems for decades. They implement a classification approach, and their attraction lies in their ability to partition the feature space with nonlinear class boundaries. Earlier, neural networks served only as the classifier in the classification stage of the object recognition pipeline (Figure 8); more recently, with progress in vision research and the increase in computational power, neural networks are used for automatic feature learning (from the raw image data) as well as for classification. In 1989, LeCun [123] demonstrated an algorithm for training neural networks in a supervised way and showed that applications such as hand-written digit recognition perform remarkably well with it. Since then, convolutional neural networks have been used by many research communities.
Convolutional neural networks differ from conventional approaches such as BoF and DPM (Deformable Part Model) for two important reasons. First, they are deep architectures, whereas the conventional approaches were shallow; second, they do not need prior knowledge of the image data. Deep learning made it possible to learn features directly from data instead of handcrafting them explicitly. This has helped vision tasks, and object recognition in particular, by enabling effective capture of low-level as well as mid-level cues of the object to be recognized. As a result, deep neural networks have brought huge improvements in image analysis performance over the last few years.
What makes deep architectures achieve such good results?
Conventional neural networks used one or two layers of neurons, whereas a deep neural network (DNN) stacks several (often many) layers of neurons on top of each other. As a result, a DNN can learn more complex models without the need for hand-designed features. DNNs have shown good results on the ImageNet dataset [126]: on the test data, the authors achieved top-1 and top-5 error rates of 37.5% and 17%. Their network consisted of 650,000 neurons, had 5 convolutional layers, and learnt 60 million parameters in ILSVRC 2010.
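One convolutional stage of such a network can be sketched in plain NumPy; stacking several of these stages, followed by fully connected layers, is what makes the architecture deep (this is a didactic forward pass only, with no learning):

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D convolution (really cross-correlation, as in CNNs)."""
    h, w = x.shape[0] - k.shape[0] + 1, x.shape[1] - k.shape[1] + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (x[i:i + k.shape[0], j:j + k.shape[1]] * k).sum()
    return out

def forward(x, kernel):
    """One convolutional stage: convolution -> ReLU -> 2x2 max pooling."""
    a = np.maximum(conv2d(x, kernel), 0.0)           # ReLU nonlinearity
    h2, w2 = a.shape[0] // 2, a.shape[1] // 2
    return a[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2).max(axis=(1, 3))
```

In a trained network the kernel weights are learnt from data by backpropagation; that learnt convolution is precisely the "automatic feature learning" contrasted with hand-crafted descriptors above.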
Like every other approach, deep architectures also have certain limitations.
 They need very sophisticated hardware, and typically require images of a fixed size (e.g. 224 x 224).
 They contain a huge number of parameters to be trained, making them computation intensive.
 When trained using gradient descent, the gradient may not trickle down to the lower layers, so sub-optimal sets of weights are obtained [114].
Various modifications to DNNs have been suggested in the literature to overcome these limitations. To remove the fixed-image-size constraint, several efficient pooling strategies have been proposed. In [113], the network is equipped with a spatial pyramid pooling strategy (SPP-net); SPP-net can generate a fixed-length representation irrespective of image scale and size. Spatial pyramid pooling is based on spatial pyramid matching [24], which in turn is an extension of the BoF approach [26]. Another improvement is the RNN (Recursive Neural Network) [130] used for scene classification, which predicts a tree structure for scene images.
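The pooling idea behind SPP-net can be sketched for a single feature map: max-pooling over a fixed set of grids yields the same output length whatever the input size (the pyramid levels chosen here are illustrative):

```python
import numpy as np

def spp(feature_map, levels=(1, 2, 4)):
    """Spatial-pyramid-pooling sketch: max-pool one feature map over an
    n x n grid for each level and concatenate, giving a fixed-length
    vector regardless of the input size."""
    out = []
    for n in levels:
        for rows in np.array_split(np.arange(feature_map.shape[0]), n):
            for cols in np.array_split(np.arange(feature_map.shape[1]), n):
                out.append(feature_map[np.ix_(rows, cols)].max())
    return np.array(out)     # length = sum of n*n over levels (here 21)
```

Because the output length is fixed, the fully connected layers that follow never see the variation in input image size.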
Figure 8(a). Block diagram representing typical traditional object recognition system
Various competitive challenges and Datasets
Here we present some of the challenges that the computer vision community organizes annually to invite, evaluate and report innovative approaches developed by research groups across the world. These challenges provide a common platform on which researchers present their work and compete with one another in object detection, localization and categorization. They also provide datasets of sample images so that the approaches can be evaluated over a wide range of image content and image-capturing conditions.
Figure 8(b). Block diagram of Deep Architectures (Image taken from:
ufldl.stanford.edu/eccv10-tutorial/eccv10-tutorial_part4.ppt)
Early approaches to object recognition used very small sets of images to evaluate their algorithms. With the advent of the world-wide web, however, large numbers of annotated images became readily available in public and private repositories, and datasets have been created from these repositories and made available to the research community. As mentioned earlier, the challenges are an effort to bring the research community together in a framework of competition so that the best approaches in computer vision can be evaluated and publicized. Each challenge consists of two components: first, a publicly available dataset with ground-truth annotations and standardised evaluation software; and second, a competition and workshop [119]. To review these challenges, we first discuss the datasets made available by them, along with certain other widely used datasets.
Datasets: No research is possible in any area without appropriate datasets [30], and the same applies to object recognition and computer vision. Appropriate datasets are needed at all stages of recognition research: for learning visual models of objects and scene categories, for detecting and localizing instances of those models in images, and for evaluating the performance of recognition algorithms. The work in [30] reviews existing image datasets from the point of view of expectations, challenges and limitations. Ideally, datasets should offer a wide range of image variability and should be sufficiently challenging for algorithms to be evaluated meaningfully. One of the major difficulties in creating such datasets is that the images must be annotated. This annotation has to be done by human experts, and it turns out to be a mammoth task considering the huge number of real-world objects to be recognized for various applications; it is not easy to get human experts to accomplish it effectively, correctly and efficiently. An elegant approach to automatic dataset collection from the web, using object recognition techniques incrementally, is described in [66]: images found on the web are used to learn the model in a robust way. Another solution for obtaining annotated training examples is crowdsourcing, but the most common error an untrained annotator makes is failing to consider a relevant class as a possible label because they are unaware of its existence.
Now we discuss some of the most prevalent datasets.
Caltech-101 & Caltech-256: Caltech-101 is a collection of pictures of objects belonging to 101 categories, collected by Fei-Fei et al. [64] in 2003. There are about 40 to 800 images per category, with most categories having about 50 images. Most images have little or no clutter, and the objects tend to be centered in each image and in a stereotypical pose. In comparison, Caltech-256 is a collection of 256 object categories with 30,608 images in total. Fig-9 compares Caltech-101 and Caltech-256.
Figure 9 (Courtesy: http://www.vision.caltech.edu/Image_Datasets/Caltech256/details.html)
TRECVID: TRECVID organizes a competition every year and releases a dataset of video shots for evaluating performance. The goal of the conference series is to encourage research in information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. Annotations are not provided by the organizers.
LabelMe: LabelMe [74] is a publicly available annotated image database open to public contribution. The dataset is provided with an annotation tool so that anyone can annotate any image. Because images are annotated by experts as well as casual users, the annotations cannot be relied on for a test set, but a huge quantity of training images can be obtained.
COIL-20 & COIL-100: COIL-20 and COIL-100 are databases of grayscale images of 20 and 100 object categories respectively [120]. Different poses of the objects were generated by placing them on a rotating turntable and capturing images at angular displacements of 5 degrees, giving 72 views per object. The release also includes 720 unprocessed images of 10 object categories, along with 1440 size-normalized images.
Microsoft COCO: The Common Objects in Context database is a large-scale image database addressing three core research problems in scene understanding: detecting non-iconic views of objects, contextual reasoning between objects, and precise 2D localization of objects [122]. Contextual knowledge can help boost all components of the object recognition framework, and the dataset is designed to support object recognition based on the context in which objects occur in the scene. The dataset contains 91 common object categories, 82 of which have more than 5,000 labeled instances; in total it has 2,500,000 labeled instances in 328,000 images. Compared with other popular large-scale datasets such as PASCAL VOC and ImageNet (discussed in the following sections), COCO has fewer object categories but a very high number of instances per category.
The PASCAL Visual Object Classes Challenge
The PASCAL VOC challenge was first organized in 2005 and was then held annually up to 2012. The challenge consists of two components: a publicly released dataset of images of objects from 20 categories, obtained from the Flickr web-site; and a competition involving object classification, detection, segmentation, action classification and person layout. Everingham et al. review PASCAL VOC in [119]. The images are fully annotated for each of the objects. Note that such a rich dataset was not released in 2005: only a dataset of four categories (motorbikes, bicycles, cars, and people) was available that year, but the organizers kept enriching it every year until 2011. To assess different methods, bootstrapping of the ROC curve is used.
The evaluation technique is used in a number of different ways: to judge the variability of a given method, to compare the relative strengths of two methods, or to look at rank ranges in order to get an overall sense of all the methods in a competition [119].
ImageNet Large Scale Visual Recognition Challenge (ILSVRC): ILSVRC was first organized in 2010 and has been held annually since. It is one of the most prestigious series of competitions and workshops in the computer vision community for evaluating the performance of contemporary approaches developed by various researchers. The challenge is reviewed from various aspects in [118]. Similar to PASCAL VOC, ILSVRC provides a huge collection of annotated images, under the name ImageNet, by Deng et al.
ImageNet Dataset: ImageNet is an image database organized according to the WordNet hierarchy, in which each node of the hierarchy is depicted by hundreds to thousands of images; currently there are over five hundred images per node [121], and on average about 1,000 images illustrate each synset. Images of each concept are quality-controlled and human-annotated. ILSVRC is of much higher dimension than PASCAL VOC [121]: as of 2010, ImageNet was organized into 12 subtrees with 5,247 synsets and 3.2 million images in total. Since ImageNet's organization is inspired by the WordNet structure, which contains around 80,000 noun synsets, ImageNet aims to cover the majority of those 80,000 synsets with an average of 500-1,000 clean, full-resolution images each. To evaluate the approaches, the bootstrapping strategy used by PASCAL VOC is employed in the ILSVRC series as well. In Table 2 we present a comparison between the PASCAL VOC and ILSVRC challenges as of 2012, as referred from [101].
Table 2 Comparison of PASCAL VOC and ILSVRC as per [118]

Aspect for comparison              PASCAL VOC                        ILSVRC
Diversity of object classes        Objects carry only one class      Objects are refined into
                                   label, e.g. "boat" for all        subcategories, e.g. a boat is
                                   types of boat, be it lifeboat     not just a boat but a lifeboat
                                   or fireboat                       or gondola
Chance Performance of              8.8% on validation set for        20.8% for all 1000 categories
Localization (CPL)                 20 categories
Average object scale per class     0.241                             0.358
Average number of instances        1.69                              1.59
per class
Clutter per class (computed as     129.96                            106.98
number of bounding boxes)
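In both challenges, a detection is scored by the overlap between the predicted and ground-truth bounding boxes: a prediction counts as correct when its intersection-over-union (IoU) with the ground truth exceeds a threshold, conventionally 0.5. A minimal Python sketch of this criterion (an illustration, not the official evaluation code):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_correct_detection(pred, truth, threshold=0.5):
    return iou(pred, truth) >= threshold

# A prediction overlapping half of the ground-truth box scores IoU = 1/3:
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Localization metrics such as the CPL row in Table 2 are built on this kind of overlap criterion.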
In addition to all these, various other datasets are used in the literature, like GRAZ-01 by Opelt and Pinz, which contains four types of images: bikes, people, background with no bikes, and background with no people; the INRIA (people) dataset used by Dalal and Triggs in their work in [32, 33]; MNIST, a dataset of handwritten digits; ImageCLEF; INRIA (horses, cars); the TinyImages dataset by Torralba et al.; ETH-80; etc. The CIFAR-10 set has 6000 examples of each of 10 classes, and the CIFAR-100 set has 600 examples of each of 100 non-overlapping classes [125]. The list that we have considered is not exhaustive but exemplary; for an exhaustive list, [127] can further be explored.
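For concreteness, the CIFAR batches described in [125] store each image as a flat row of 3072 values: 1024 red, then 1024 green, then 1024 blue values, with each colour plane laid out row-major as 32x32. A small sketch of unpacking one such row, shown on a synthetic row rather than a real batch file:

```python
def cifar_row_to_image(row):
    """Unpack one flat CIFAR row (1024 red, 1024 green, 1024 blue values,
    each plane stored row-major as 32x32) into a 32x32 grid of RGB tuples."""
    assert len(row) == 3072, "CIFAR rows are 32 * 32 * 3 = 3072 values"
    r, g, b = row[0:1024], row[1024:2048], row[2048:3072]
    return [[(r[y * 32 + x], g[y * 32 + x], b[y * 32 + x]) for x in range(32)]
            for y in range(32)]

# Illustrative only: a synthetic row with constant channel values stands in
# for a row of the real (pickled) data batches.
row = [10] * 1024 + [20] * 1024 + [30] * 1024
img = cifar_row_to_image(row)
# img[y][x] is the (r, g, b) pixel at row y, column x.
```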
5. FUTURE RESEARCH DIRECTIONS
Object recognition is one of the most exciting research areas in the field of computer vision. Today the need is to develop systems which are computationally efficient and at the same time cost effective. We suggest some future research directions which can be explored and in turn be incorporated in recognition systems. These directions are suggested at the algorithmic level or at the product level; many of them can at present be considered ideas which may need a knowledge base from multidisciplinary fields.
 Deep learning is the current state-of-the-art in object recognition and has produced promising results, but deep networks suffer from the serious limitation of being resource intensive; in the absence of sophisticated hardware, DNNs cannot be adopted for object recognition. In such cases, enhancing the performance of conventional feature extraction techniques on shallow architectures can be helpful. New approaches are needed which require only shallow architectures and are still efficient. It has also been observed that DNNs have not shown very impressive results in the task of object localization; this area can further be explored.
 It is also to be understood how the learning of features takes place in convolutional neural networks. What makes deep architectures give such high recognition accuracy?
 Due to the advent of mobile and other hand-held devices with very good image capturing abilities, recognition algorithms suited to such devices are in great need.
 Considerable work exists in the literature on action recognition systems, and a complete line of research is going on in this direction, as the area in itself involves many and varied issues and research problems. Products can be developed involving action and activity recognition from videos.
 Computer vision techniques can be a good way to build assistive technology for blind people. For example, products can be developed which see the surroundings, generate a natural language description of the scene, and give it as output in spoken form. This will help blind people to understand their surroundings and to navigate.
 Research in the area of understanding videos from their content has already started but is still in its infancy. Generic object recognition also paves the path for research in areas like emotion recognition, which will actually enable us to recognize the meaning of the content in a video.
 Robotics is another important field which can benefit from active object recognition. Today's robots are able to work only in well-structured and constrained environments, whereas the requirement is to develop robots which can learn, adapt and execute their tasks in real human environments.
 Almost every device has a camera, and devices are now powerful enough to record and process live video. These videos can be exploited for real-time applications. How do we organize and personalize all of this content for the common man?
 New performance evaluation techniques are needed.
 Many rich datasets, like ImageNet and PASCAL VOC, have been generated and made publicly available by the computer vision research community. Although these datasets hold huge numbers of images pertaining to various categories, if we are to reach near-human vision capabilities in terms of flexibility and dynamism, these datasets have to be enriched further. Novel ways of labelling huge amounts of unlabeled image data should be found, so that images annotated with ground truth can be generated and made available publicly.
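As an illustration of the first point above, the conventional shallow features that could be strengthened as an alternative to DNNs are typically built from local gradient statistics. The toy gradient-orientation histogram below is in the spirit of HOG; it is a simplified sketch, not Dalal and Triggs' full descriptor [32]:

```python
import math

def orientation_histogram(gray, bins=9):
    """Toy gradient-orientation histogram, the core idea behind HOG-style
    shallow features: accumulate gradient magnitude into orientation bins."""
    h, w = len(gray), len(gray[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]   # central differences
            gy = gray[y + 1][x] - gray[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[min(int(ang * bins / 180.0), bins - 1)] += mag
    total = sum(hist)
    return [v / total for v in hist] if total else hist  # L1-normalised

# A vertical edge: intensity changes only along x, so all gradient energy
# falls into the 0-degree orientation bin.
patch = [[0] * 4 + [255] * 4 for _ in range(8)]
hist = orientation_histogram(patch)
```

A real descriptor would add cell/block pooling and contrast normalization, but the cheap, shallow character of the computation is what makes such features attractive without specialized hardware.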
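One family of "novel ways of labelling unlabeled data", mentioned in the last point above, is self-training, where a model's own confident predictions become pseudo-labels for retraining. A toy sketch follows; the classifier, threshold and data here are purely illustrative:

```python
def self_train(classify, labeled, unlabeled, confidence=0.9, rounds=3):
    """Toy self-training loop: repeatedly pseudo-label the unlabeled pool
    with the model's most confident predictions and retrain.

    `classify(labeled, item) -> (label, score)` is any scoring classifier
    that "trains" on the current labeled set.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        confident, rest = [], []
        for item in pool:
            label, score = classify(labeled, item)
            (confident if score >= confidence else rest).append((item, label))
        if not confident:
            break
        labeled.extend(confident)   # accept pseudo-labels as ground truth
        pool = [item for item, _ in rest]
    return labeled, pool

def nn_classify(labeled, item):
    # 1-nearest-neighbour on scalar "features"; confidence decays with distance.
    near, label = min(labeled, key=lambda pair: abs(pair[0] - item))
    return label, 1.0 / (1.0 + abs(near - item))

labeled_seed = [(0.0, "low"), (10.0, "high")]
unlabeled = [0.5, 1.0, 9.5, 5.0]
grown, leftover = self_train(nn_classify, labeled_seed, unlabeled, confidence=0.6)
# Points near the seeds get pseudo-labelled; the ambiguous 5.0 stays unlabeled.
```

The design risk, which any real system must address, is that a wrong pseudo-label is treated as ground truth forever; confidence thresholds and human spot-checks are the usual mitigations.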
6. CONCLUSION
From the literature available on the subject, it was found that the demand for efficient generic object recognition systems is increasing very fast, as the spectrum of applications in which object recognition is needed is very wide and rich. One major problem in generic object recognition is that the categories present in the real world are varied and huge in number. Due to this, training a recognition system for such a large number of categories and classes becomes a challenging task, however sophisticated the approach may be. Such a system is also expected to have the property of plasticity, i.e., the ability to gradually train itself for unseen categories, which further adds to its complexity; systems should be developed which are flexible enough to train themselves for new classes of objects. Another important issue with generic object recognition systems lies in the feature extraction and description phase. In most approaches, the number of features obtained is too large, and the features are handcrafted. This critical limitation has been overcome by deep architectures, which in turn exploit the sophisticated hardware accelerations that have evolved recently. Approaches are needed which make the entire setup cost effective and require fewer resources. Ideally, the recognition task should be performed at the semantic level, which will result in near-human vision systems.
One of the key objectives behind this survey was to answer the research questions identified by us in Section 1. From the literature surveyed, it can be deduced that earlier work on generic object recognition put more weight on the feature extraction stage and the type of features, whereas later works gave more prominence to the type of classifier used. Recent approaches, moreover, learn features directly from the image data, which can be regarded as a very striking innovation achieved by the vision community. Ways are now needed to enhance these approaches further, and the above efforts can also be extended to 3D images and to videos.
As a result of this study and from the referred material, a general remark can also be made about the kind of work that has been done in the field. Most papers before 2008 mainly present novel ways of modelling the object class, i.e., they emphasize novel ways of feature detection and description. In work published in the recent past, since 2011, with the advent of sophisticated hardware, more emphasis is given to handling more categories accurately and efficiently.
In this paper, the current scenario of generic object recognition is portrayed in brief, with the hope that in the near future an object recognition system will be developed which is capable of performing vision tasks similar to the human vision system, with the least possible effort and in a cost-effective manner.
REFERENCES
[1] Bennamoun, Mohammed, and George J. Mamic. Object recognition:
fundamentals and case studies. Springer Science & Business Media, 2002.
[2] Takahiro Hori, Tetsuya Takiguchi, Yasuo Ariki. Generic Object Recognition
Using Graph Embedding into a Vector Space, American Journal of Software
Engineering and Applications. Vol. 2, No. 1, 2013, pp. 13-18.
[3] David G. Lowe, "Object Recognition from Local Scale-Invariant Features", Proc. of the International Conference on Computer Vision, Corfu, September 1999.
[4] David G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision 60.2 (2004): 91-110.
[5] Ke, Yan, and Rahul Sukthankar. "PCA-SIFT: A more distinctive
representation for local image descriptors." Computer Vision and Pattern
Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer
Society Conference on. Vol. 2. IEEE, 2004.
[6] Bay, Herbert, Tinne Tuytelaars, and Luc Van Gool. "Surf: Speeded up robust
features." Computer Vision–ECCV 2006. Springer Berlin Heidelberg, 2006.
404-417.
[7] B. Leibe, A. Leonardis, and B. Schiele. "Combined Object Categorization and Segmentation with an Implicit Shape Model", In ECCV 2004 Workshop on Statistical Learning in Computer Vision, pages 17-32, May 2004.
[8] Alexander Thomas, Vittorio Ferrari, Bastian Leibe, Tinne Tuytelaars, Bernt Schiele, Luc Van Gool, "Towards Multi-View Object Class Detection", Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
[9] Bastian Leibe, Aleš Leonardis, and Bernt Schiele, "Robust Object Detection with Interleaved Categorization and Segmentation", IJCV (2008) 77, 259-289.
[10] Jia, Menglei , Li, Hua, Xie, Xing , Chen, Zheng Ma, Wei-ying, “Automatic
Classification Of Objects Within IMAGES” United states Microsoft
Corporation (Redmond, WA, US)20080037877
http://www.freepatentsonline.com/y2008/0037877.html
[11] Pisipati, Radha Krishna (Hyderabad, IN), Syed, Shahanaz (Guntur, IN),
Jonna, Kishore (Proddatur, IN), Bandyopadhyay, Subhadip (Kolkata, IN),
Narayan, Rudra Narayan (Jemadeipur, IN) 2014. Systems And Methods
For Multi-Dimensional Object Detection United States 20140029852
http://www.freepatentsonline.com/y2014/0029852.html
[12] Besl, Paul J., and Ramesh C. Jain. "Three-dimensional object recognition."
ACM Computing Surveys (CSUR) 17.1 (1985): 75-145.
[13] Irving Biederman. Recognition-by-components: A theory of human image
understanding. Psychological Review, 94(2):115-147, 1987.
[14] Zhang, Xin, et al. "Object class detection: A survey." ACM Computing
Surveys (CSUR) 46.1 (2013): 10.
[15] Andreopoulos, Alexander, and John K. Tsotsos. "50 Years of object
recognition: Directions forward." Computer Vision and Image Understanding
117.8 (2013): 827-891.
[16] Roth, Peter M., and Martin Winter. "Survey of appearance-based methods for
object recognition." Inst. for Computer Graphics and Vision, Graz University
of Technology, Austria, Technical Report ICGTR0108 (ICG-TR-01/08)
(2008).
[17] Agarwal, Shivani, Aatif Awan, and Dan Roth. "Learning to detect objects in
images via a sparse, part-based representation." Pattern Analysis and
Machine Intelligence, IEEE Transactions on 26.11 (2004): 1475-1490.
[18] Fergus, Robert, Pietro Perona, and Andrew Zisserman. "Object class
recognition by unsupervised scale-invariant learning." Computer Vision and
Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society
Conference on. Vol. 2. IEEE, 2003.
[19] A. Opelt, A. Pinz, and A. Zisserman. A Boundary-Fragment Model For
Object Detection. In Proc. ECCV, volume 2, pp 575–588, May 2006.
[20] Wu, Bo, and Ramakant Nevatia. "Detection of multiple, partially occluded
humans in a single image by bayesian combination of edgelet part
detectors."Computer Vision, 2005. ICCV 2005. Tenth IEEE International
Conference on. Vol. 1. IEEE, 2005.
[21] Andreas Opelt, Axel Pinz,Andrew Zisserman,” Incremental Learning Of
Object Detectors Using A Visual Shape Alphabet”, Proceedings of the 2006
IEEE Computer Society Conference on Computer Vision and Pattern
Recognition (CVPR’06)
[22] Andreas Opelt, Axel Pinz ,Andrew Zisserman,,“Fusing shape and appearance
information for object category detection”, 2006 - eprints.pascal-network.org
[23] Andreas Opelt, Axel Pinz ,Andrew Zisserman, “Learning an Alphabet of
Shape and Appearance for Multi-Class Object Detection”, IJCV (2008) 80:
16–44
[24] Grauman, Kristen, and Trevor Darrell. "Pyramid match kernel and related
techniques." U.S. Patent No. 7,949,186. 24 May 2011.
[25] Zhang, Jianguo, et al. "Local features and kernels for classification of texture
and object categories: A comprehensive study." International journal of
computer vision 73.2 (2007): 213-238.
[26] Lazebnik, Svetlana, Cordelia Schmid, and Jean Ponce. "Beyond bags of
features: Spatial pyramid matching for recognizing natural scene categories."
Computer Vision and Pattern Recognition, 2006 IEEE Computer Society
Conference on. Vol. 2. IEEE, 2006.
[27] Lazebnik, Svetlana, Cordelia Schmid, and Jean Ponce. "Spatial pyramid
matching." Object Categorization: Computer and Human Vision Perspectives
3 (2009): 4.
[28] Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. “A Discriminative
Framework for Texture and Object Recognition Using Local Image Features.
In Toward Category-Level Object Recognition. Springer-Verlag Lecture
Notes in Computer Science, J. Ponce, M. Hebert, C. Schmid, and A.
Zisserman (eds.), 2006.
[29] Lazebnik, Svetlana. "Local, semi-local and global models for texture, object
and scene recognition." (2006).
[30] J. Ponce, T. L. Berg, M. Everingham, D. A. Forsyth, M. Hebert, S. Lazebnik,
M. Marszalek, C. Schmid, B. C. Russell, A. Torralba, C. K. I. Williams, J.
Zhang, and A. Zisserman. “Dataset Issues in Object Recognition” In Toward
Category-Level Object Recognition. Springer-Verlag Lecture Notes in
Computer Science, J. Ponce, M. Hebert, C. Schmid, and A. Zisserman (eds.),
2006.
[31] Dorko, Gyuri, and Cordelia Schmid. "Object class recognition using
discriminative local features." (2005): 22.
[32] Dalal, N. And Triggs, B. 2005. Histograms of Oriented Gradients for Human
Detection. In Proceedings of theIEEE Conference on Computer Vision and
Pattern Recognition (CVPR’05).
[33] Dalal,N.,Triggs, B., And Schmid,C. 2006. Human Detection Using Oriented
Histograms Of Flow And Appearance. In Proceedings of the European
Conference on Computer Vision (ECCV’06).
[34] Dalal, Navneet. Finding people in images and videos. Diss. Institut National
Polytechnique de Grenoble-INPG, 2006.
[35] Jurie, Frederic, and Bill Triggs. "Creating efficient codebooks for visual
recognition." Computer Vision, 2005. ICCV 2005. Tenth IEEE International
Conference on. Vol. 1. IEEE, 2005.
[36] Bosch Anna, Andrew Zisserman, and Xavier Munoz. "Representing shape
with a spatial pyramid kernel." Proceedings of the 6th ACM international
conference on Image and video retrieval. ACM, 2007.
[37] Watanabe, Tomoki, Satoshi Ito, and Kentaro Yokoi. "Co-occurrence
histograms of oriented gradients for pedestrian detection." Advances in
Image and Video Technology. Springer Berlin Heidelberg, 2009. 37-47.
[38] JOACHIMS, T. 1997. A probabilistic analysis of the rocchio algorithm with
tfidf for text categorization. In Proceedings of the International Conference
on Machine Learning (ICML’97)
[39] Csurka, Gabriella, et al. "Visual categorization with bags of keypoints."
Workshop on statistical learning in computer vision, ECCV. Vol. 1. No. 1-22.
2004.
[40] Ramanan, Amirthalingam, and Mahesan Niranjan. "A review of codebook
models in patch-based visual object recognition." Journal of Signal
Processing Systems 68.3 (2012): 333-352.
[41] K. Mikolajczyk, C.Schmid,” Indexing based on scale invariant interest
points”, International Conference on Computer Vision (ICCV '01) 1 (2001)
525—531.
[42] K.Mikolajczyk A. Zisserman C. Schmid,” Shape recognition with edge-based
features”, British Machine Vision Conference (BMVC '03) 2 (2003) 779—
788.
[43] K. Mikolajczyk and C. Schmid. “Scale and affine invariant interest point
detectors”. Int. J. Comput. Vision, 60(1):63–86, 2004.
[44] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, "A Comparison of Affine Region Detectors", International Journal of Computer Vision 65, 1/2 (2005) 43-72.
[45] K.Mikolajczyk and C.Schmid,”A performance evaluation of local
descriptors”, IEEE Transactions on Pattern Analysis and Machine
Intelligence 27, 10 (2005) 1615—1630.
[46] Douze, Matthijs, et al. "Evaluation of gist descriptors for web-scale image
search." Proceedings of the ACM International Conference on Image and
Video Retrieval. ACM, 2009.
[47] Jégou, Hervé, Matthijs Douze, and Cordelia Schmid. "Improving bag-of-
features for large scale image search." International Journal of Computer
Vision 87.3 (2010): 316-336.
[48] Tuytelaars, Tinne, and Krystian Mikolajczyk. "Local invariant feature
detectors: a survey." Foundations and Trends® in Computer Graphics and
Vision 3.3 (2008): 177-280.
[49] K. Mikolajczyk, Bastian Leibe, Bernt Schiele,” Multiple Object Class
Detection with a Generative Model”, Computer Vision and Pattern
Recognition, 2006 IEEE Computer Society Conference on vol-1,pp 26 – 36
[50] Jégou, Hervé, et al. "Aggregating local descriptors into a compact image
representation." Computer Vision and Pattern Recognition (CVPR), 2010
IEEE Conference on. IEEE, 2010.
[51] Wengert, Christian, Matthijs Douze, and Hervé Jégou. "Bag-of-colors for
improved image search." Proceedings of the 19th ACM international
conference on Multimedia. ACM, 2011.
[52] Belongie, S., Malik, J., And Puzicha, J. 2001. “Matching shapes”, In
Proceedings of the IEEE International Conference on Computer Vision
(ICCV’01).
[53] Belongie, Serge, Jitendra Malik, and Jan Puzicha. "Shape matching and
object recognition using shape contexts." Pattern Analysis and Machine
Intelligence, IEEE Transactions on 24.4 (2002): 509-522
[54] Andras Ferencz, Erik G. Learned-Miller, Jitendra Malik,” Building a
Classification Cascade for Visual Identification from One Example”,
Proceedings of the Tenth IEEE International Conference on Computer Vision
(ICCV’05).
[55] Hao Zhang Alexander C. Berg Michael Maire Jitendra Malik,”SVM-
KNN:Discriminative Nearest Neighbor Classification for Visual
Recognition”, Proceedings of the 2006 IEEE Computer Society Conference
on Computer Vision and Pattern Recognition (CVPR’06)
[56] Bjorn Ommer Jitendra Malik,”Multi-Scale Object Detection by Clustering
Lines”, 2009 IEEE 12th International Conference on Computer Vision
(ICCV)pp 484-491
[57] Subhransu Maji, Jitendra Malik,”Object Detection using a Max-Margin
Hough Transform”,IEEE 2009, pp 1038-1045 .
[58] Vidal-Naquet, Michel, and Shimon Ullman. "Object Recognition with
Informative Features and Linear Classification." ICCV. Vol. 3. 2003
[59] Fergus, Robert, Pietro Perona, and Andrew Zisserman. "Object class
recognition by unsupervised scale-invariant learning." Computer Vision and
Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society
Conference on. Vol. 2. IEEE, 2003.
[60] Boris Epshtein Shimon Ullman,” Identifying Semantically Equivalent Object
Fragments”, Proceedings of the 2005 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (CVPR’05)
[61] Eran Borenstein and Shimon Ullman,” Combined Top-Down/Bottom-Up
Segmentation”, IEEE Transactions On Pattern Analysis And Machine
Intelligence, Vol. 30, No. 12, December 2008, pp 2109-2125
[62] L. Fei-Fei, R. Fergus, and P. Perona,”Learning generative visual models from
few training examples: An incremental bayesian approach tested on 101
object categories.” In Proc. CVPR Workshop on Generative-Model Based
Vision, 2004.
[63] Fei-Fei, Li, Rob Fergus, and Pietro Perona. "Learning generative visual
models from few training examples: An incremental bayesian approach tested
on 101 object categories." Computer Vision and Image Understanding 106.1
(2007): 59-70.
[64] www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html
[65] Fei-Fei, Li, Robert Fergus, and Pietro Perona. "One-shot learning of object
categories." Pattern Analysis and Machine Intelligence, IEEE Transactions
on28.4 (2006): 594-611.
[66] Li-Jia Li, Gang Wang and Li Fei-Fei,” OPTIMOL: Automatic Online Picture
Collection via Incremental Model Learning”, 2007 IEEE
[67] Hao Su, Min Sun, Li Fei-Fei,Silvio Savarese,”Learning a Dense Multi-View
Representation For Detection, Viewpoint Classification And Synthesis Of
Object Categories”, 2009 IEEE 12th International Conference on Computer
Vision (ICCV)
[68] Bangpeng Yao, Li Fei-Fei,” Recognizing Human-Object Interactions in Still
Images by Modeling Mutual Context of Object and Human Pose in Human-
Object Interaction Activities”, Ieee Transactions on Pattern Analysis and
Machine Intelligence, Vol. 34, No. 9, September 2012, pp 1691-1703
[69] Oliva, Aude, and Antonio Torralba. "Building the gist of a scene: The role of
global image features in recognition." Progress in brain research 155 (2006):
23-36.
[70] Antonio Torralba, Kevin P. Murphy and William T. Freeman, “Sharing
Visual Features for Multiclass and Multiview Object Detection”, April 2004.
[71] Antonio Torralba , Kevin P. Murphy, William T. Freeman,” Sharing features:
efficient boosting procedures for multiclass object detection”
[72] Oliva, Aude, and Antonio Torralba. "Modeling the shape of the scene: A
holistic representation of the spatial envelope." International journal of
computer vision 42.3 (2001): 145-175.
[73] Torralba, Antonio, Robert Fergus, and Yair Weiss. "Small codes and large
image databases for recognition." Computer Vision and Pattern Recognition,
2008. CVPR 2008. IEEE Conference on. IEEE, 2008.
[74] BC Russell, A Torralba, KP Murphy,“LabelMe: a database and web-based
tool for image annotation”,International journal of Computer Vision,
2008(77) – Springer, pp-157-173.
[75] Taha H. Rassem, Bee Ee Khoo,” Object Class Recognition using
Combination of Color SIFT Descriptors”, 2011 IEEE
[76] Gyuri Dorko, Cordelia Schmid, "Object Class Recognition Using Discriminative Local Features", Technical Report.
[77] Gy. Dorko and C. Schmid. Selection of scale-invariant parts for object class
recognition”. In Proceedings of the Ninth IEEE International Conference on
Computer Vision (ICCV’03), pages 634–639, 2003.
[78] Thomas Serre, Lior Wolf, Stanley Bileschi, Maximilian Riesenhuber, and
Tomaso Poggio,” Robust Object Recognition with Cortex-Like
Mechanisms”, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND
MACHINE INTELLIGENCE, VOL. 29, NO. 3, MARCH 2007 pp 411-426.
[79] B. Mayurathan, A. Ramanan, S. Mahesan & U.A.J. Pinidiyaarachchi,”
Speeded-up and Compact Visual Codebook for Object Recognition”,
International Journal of Image Processing (IJIP), Volume (7): Issue (1): 2013
pp 31-50
[80] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing.
Prentice-Hall Inc., 2002.
[81] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, “Learning object
Categories from google’s Image Search,” Computer Vision, 2005. ICCV’
2005. Tenth IEEE International Conference on, 2005.
[82] Joao Carreira and Cristian Sminchisescu, “Constrained Parametric Min-Cuts
for Automatic Object Segmentation”, Computer Vision and Pattern
Recognition (CVPR), 2010 IEEE Conference pp 3241-3248
[83] Van de Sande, K. E. A., Gevers, T., & Snoek, C. G. M. (2008). Evaluation of
color descriptors for object and scene recognition. In Proceedings of the IEEE
conference on computer vision and pattern recognition (CVPR’08)
[84] Uijlings, Jasper RR, et al. "Selective search for object
recognition."International journal of computer vision 104.2 (2013): 154-171.
[85] Van de Sande, Koen EA, et al. "Segmentation as selective search for object recognition." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
[86] Fuxin Li, Joao Carreira, and Cristian Sminchisescu, "Object Recognition as Ranking Holistic Figure-Ground Hypotheses", Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference, pp. 1712-1719.
[87] Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection
and semantic segmentation." Computer Vision and Pattern Recognition
(CVPR), 2014 IEEE Conference on. IEEE, 2014.
[88] Carreira, Joao, et al. "Semantic segmentation with second-order pooling."
Computer Vision–ECCV 2012. Springer Berlin Heidelberg, 2012. 430-443.
[89] Carreira, Joao, and Cristian Sminchisescu. "CPMC: Automatic object
segmentation using constrained parametric min-cuts." Pattern Analysis and
Machine Intelligence, IEEE Transactions on 34.7 (2012): 1312-1328.
[90] Li, Fuxin, Joao Carreira, and Cristian Sminchisescu. "Object recognition as
ranking holistic figure-ground hypotheses." Computer Vision and Pattern
Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010
[91] Fischler, Martin A., and Robert A. Elschlager. "The representation and
matching of pictorial structures." IEEE Transactions on Computers 22.1
(1973): 67-92.
[92] Felzenszwalb, Pedro F., and Daniel P. Huttenlocher. "Pictorial structures for
object recognition." International Journal of Computer Vision 61.1 (2005):
55-79.
[93] Felzenszwalb, Pedro F., et al. "Object detection with discriminatively trained
part-based models." Pattern Analysis and Machine Intelligence, IEEE
Transactions on 32.9 (2010): 1627-1645.
[94] Girshick, Ross B., Pedro F. Felzenszwalb, and D. McAllester.
"Discriminatively trained deformable part models, release 5." (2012).
[95] Felzenszwalb, Pedro, David McAllester, and Deva Ramanan. "A
discriminatively trained, multiscale, deformable part model." Computer
Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on.
IEEE, 2008.
[96] Felzenszwalb, Pedro F., Ross B. Girshick, and David McAllester. "Cascade
object detection with deformable part models." Computer vision and pattern
recognition (CVPR), 2010 IEEE conference on. IEEE, 2010.
[97] Ferrari, Vittorio, Frederic Jurie, and Cordelia Schmid. "Accurate object
detection with deformable shape models learnt from images." Computer
Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE,
2007.
[98] Pentland, Alex P. "Automatic extraction of deformable part models."
International Journal of Computer Vision 4.2 (1990): 107-126.
[99] Pandey, Megha, and Svetlana Lazebnik. "Scene recognition and weakly
supervised object localization with deformable part-based models." Computer
Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
[100] Ren, Xiaofeng, and Deva Ramanan. "Histograms of sparse codes for object
detection." Computer Vision and Pattern Recognition (CVPR), 2013 IEEE
Conference on. IEEE, 2013.
[101] Yang, Yi, and Deva Ramanan. "Articulated pose estimation with flexible
mixtures-of-parts." Computer Vision and Pattern Recognition (CVPR), 2011
IEEE Conference on. IEEE, 2011.
[102] Bourdev, Lubomir, and Jitendra Malik. "Poselets: Body part detectors trained
using 3d human pose annotations." Computer Vision, 2009 IEEE 12th
International Conference on. IEEE, 2009.
[103] Arbeláez, Pablo, et al. "Semantic segmentation using regions and parts."
Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference
on. IEEE, 2012.
[104] Bourdev, Lubomir, et al. "Detecting people using mutually consistent poselet
activations." Computer Vision–ECCV 2010. Springer Berlin Heidelberg,
2010. 168-181.
[105] Arbeláez, Pablo, Bharath Hariharan, Chunhui Gu, Saurabh Gupta, Lubomir
Bourdev, and Jitendra Malik. "Semantic segmentation using regions and
parts." In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE
Conference on, pp. 3378-3385. IEEE, 2012.
[106] Zhu, Long, et al. "Latent hierarchical structural learning for object detection."
Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference
on. IEEE, 2010.
[107] Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding
convolutional networks." Computer Vision–ECCV 2014. Springer
International Publishing, 2014. 818-833.
[108] Zhao, Wenyi, et al. "Face recognition: A literature survey." Acm Computing
Surveys (CSUR) 35.4 (2003): 399-458.
[109] Yang, Ming-Hsuan, David Kriegman, and Narendra Ahuja. "Detecting faces
in images: A survey." Pattern Analysis and Machine Intelligence, IEEE
Transactions on 24.1 (2002): 34-58.
[110] T. Yamazaki, T. Fujikawa, J. Katto, "Improving the performance of SIFT using Bilateral Filter and its Application to Generic Object Recognition", ICASSP 2012, IEEE, pp. 945-948.
[111] Chiu, Liang-Chi, et al. "Fast SIFT Design For Real-Time Visual Feature
Extraction." Image Processing, IEEE Transactions on 22.8 (2013): 3158-
3167.
[112] Kamencay, Patrik, et al. "Feature extraction for object recognition using
PCA-KNN with application to medical image analysis." Telecommunications
and Signal Processing (TSP), 2013 36th International Conference on. IEEE,
2013.
[113] He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks
for visual recognition."arXiv preprint arXiv: 1406.4729 (2014).
[114] Goyal, Soren, and Paul Benjamin. "Object Recognition Using Deep Neural
Networks: A Survey." arXiv preprint arXiv: 1412.3684 (2014).
[115] Nevatia, Ramakant, and Thomas O. Binford. "Description and recognition of
curved objects." Artificial Intelligence 8.1 (1977): 77-98.
[116] Fidler, Sanja, Marko Boben, and Ales Leonardis. "Learning a hierarchical
compositional shape vocabulary for multi-class object representation." arXiv
preprint arXiv: 1408.5516 (2014).
[117] Lee, Tom, Sanja Fidler, and Sven Dickinson. "Multi-cue mid-level
grouping."
[118] Russakovsky, Olga, et al. "Imagenet large scale visual recognition
challenge." arXiv preprint arXiv: 1409.0575 (2014).
[119] Everingham, Mark, et al. "The pascal visual object classes challenge: A
retrospective." International Journal of Computer Vision 111.1 (2014): 98-
136.
[120] Nene, Sameer A., Shree K. Nayar, and Hiroshi Murase. Columbia object
image library (COIL-20). Technical Report CUCS-005-96, 1996.
[121] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, ImageNet: a large-
scale hierarchical image database, IEEE Computer Vision and Pattern
Recognition, 2009. <http://www.image-net.org/>.
[122] Lin, Tsung-Yi, et al. "Microsoft COCO: Common objects in
context." Computer Vision–ECCV 2014. Springer International Publishing,
2014. 740-755.
[123] LeCun, Yann, et al. "Backpropagation applied to handwritten zip code
recognition." Neural computation 1.4 (1989): 541-551.
[124] Humphrey, Eric J., Juan Pablo Bello, and Yann LeCun. "Moving Beyond
Feature Design: Deep Architectures and Automatic Feature Learning in
Music Informatics." ISMIR. 2012.
[125] Krizhevsky, Alex, and Geoffrey Hinton. "Learning multiple layers of features
from tiny images." Computer Science Department, University of Toronto,
Tech. Rep 1.4 (2009): 7.
[126] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet
classification with deep convolutional neural networks." Advances in neural
information processing systems. 2012.
[127] http://riemenschneider.hayko.at/vision/dataset/index.php, as referred on 12th April 2015.
[128] http://image-net.org/challenges/LSVRC/2012/analysis/, as referred on 12th April 2015.
Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." Computer Vision–ECCV 2014. Springer International Publishing, 2014. 818-833.
[129] Bengio, Yoshua, et al. "Greedy layer-wise training of deep networks." Advances in Neural Information Processing Systems 19 (2007): 153.
[130] Bhisekar, Manisha, and Prajakta Deshmane. "Image Retrieval and Face Recognition Techniques: Literature Survey." International Journal of Electronics and Communication Engineering and Technology 5.1 (2014): 52-58.
[131] Almeida, Yoel E., Ashray S. Bhandare, and Aishwary P. Nipane. "Computer Vision Based Adaptive Lighting Solutions for Smart and Efficient System." International Journal of Computer Engineering and Technology 6.3 (2015): 01-11.
[132] Socher, Richard, et al. "Parsing natural scenes and natural language with recursive neural networks." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.

HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxhumanexperienceaaa
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...ranjana rawat
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 

Recently uploaded (20)

(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 

REVIEW ON GENERIC OBJECT RECOGNITION TECHNIQUES: CHALLENGES AND OPPORTUNITIES

http://www.iaeme.com/IJARET/index.asp 104 editor@iaeme.com

International Journal of Advanced Research in Engineering and Technology (IJARET)
Volume 6, Issue 12, Dec 2015, pp. 104-133, Article ID: IJARET_06_12_010
Available online at http://www.iaeme.com/IJARET/issues.asp?JType=IJARET&VType=6&IType=12
ISSN Print: 0976-6480 and ISSN Online: 0976-6499
© IAEME Publication

Prof. Deepika Shukla
Comp. Science and Engineering Department, Institute of Technology, Nirma University, Ahmedabad, India

Apurva Desai
Department of Computer Science and Information Technology, VNSGU, Surat, India

ABSTRACT

Recognizing objects automatically in an image is a fundamental step for many real-world computer vision applications. Object recognition is the task of identifying an instance of an object in an image or video sequence with little or no human intervention and assistance. Despite the very high complexity of the task, human beings perform it with little effort, even in a state of minimal attention. Humans readily recognize a huge number and variety of object categories in images, even though the object in the image may differ in size/scale, viewpoint, position or orientation. We can even recognize objects in an image when they are only partially visible or appear against a cluttered background. Moreover, recognition may target either a specific instance of an object or an object category/class. When the task is performed for classes of objects, it is known as generic object recognition, object-class detection or category-level object recognition. Over the years many techniques have evolved for recognizing object classes from images, but to date no automated object recognition system has fully matched this capability of human beings.
This very fact makes the recognition of objects in an image one of the most basic and fundamental challenges in computer vision research. The purpose of this study is to give an overview and categorization of the approaches used in the literature for generic object recognition, and of the various technical advancements achieved in the field. The survey focuses mainly on the leading work since the year 2000. We discuss the challenges that the field is currently facing and also attempt to suggest future research directions in the area of generic object recognition. Finally, we conclude the study with the hope that in the near future more sophisticated object class recognition systems will be developed in an efficient and cost-effective manner.

Key words: Object Recognition, Generic Object Recognition, Object Class Recognition, Scene Understanding, Scene Categorization, Image Analysis, Computer Vision, Machine Vision, Scene Analysis.

Cite this Article: Prof. Deepika Shukla and Apurva Desai, Review on Generic Object Recognition Techniques: Challenges and Opportunities. International Journal of Advanced Research in Engineering and Technology, 6(12), 2015, pp. 104-133.
http://www.iaeme.com/IJARET/issues.asp?JType=IJARET&VType=6&IType=12

1. INTRODUCTION

Automated recognition of objects in images is a critical and fundamental step for many real-world computer vision applications. It is the task of finding a given object in an image or video sequence with little or no human intervention or assistance. Very little effort is required on our part to detect and recognize a huge number of classes of objects in images, even though the image of an object may differ in size/scale, viewpoint, position or orientation. Human beings are able to recognize objects in an image even when they are only partially visible or appear against a cluttered background. Moreover, the ability to generalize from examples and to categorize objects, events, scenes and places is one of the core capabilities of the human visual system. For human beings this is a mundane activity, but imbibing these capabilities in a machine has proved to be a significantly challenging task for computer vision systems in general.
The reason for this may be rooted in the fact that automatic object recognition requires an understanding of human visual perception, and so becomes a multidisciplinary research area involving knowledge and expertise from fields such as optics, psychology, pattern recognition, artificial intelligence, machine learning and, most importantly, cognitive science, which in itself needs sophisticated concepts and tools from mathematics as well as computer science [1]. Object recognition is a dominant field of research in computer vision as well as image analysis applications, and even the simplest machine vision task cannot be solved without the help of recognition. This is evidenced by the vast volume of research conducted in the area over the past three decades: a search for "object recognition from images" on ieeexplore.org returns more than 20,000 results. From the substantial volume of current literature on the topic, we can also say that object recognition is closely tied to, and is part and parcel of, computer vision research. This paper reviews most of the leading state-of-the-art research performed in the area of generic object recognition. More specifically, it is focused on gaining insight into the following research questions pertaining to generic object recognition. What generic object recognition techniques and approaches are drawn upon by the literature? What different techniques are used for object representation?
Which feature detection and extraction methods are used by most of the prominent researchers on the topic? Which classification/learning techniques have been used in the classification stage of the object recognition pipeline?

The rest of this paper is organized as follows. Section 2 introduces and explains the problem of generic object recognition, which can be considered a specific subset of the object recognition problem. Section 3 concentrates on the challenges that the field of object recognition faces in general, and generic object recognition in particular. Section 4 discusses the vast literature existing on the topic. Section 5 presents a roadmap to future research areas and directions. Section 6 finally presents the conclusion of the study.

2. GENERIC OBJECT RECOGNITION PROBLEM

The problem of object recognition can be viewed as a classification or labelling problem, where models/representations of known objects are available to the system and, when a novel image is given, the system has to predict the class of the object(s) present in the image. Formally, it can be stated as: given an image containing one or more objects of interest (and background) and a set of labels corresponding to a set of models known to the system, the system should assign correct labels to regions, or sets of regions, in the image. That is, an object recognition system should assign a high-level definition to an object based on the image data by which it is represented. Oftentimes, the task of object recognition is considered as broadly comprising three sub-tasks. Object detection: detecting whether an instance of the object category is present in the image or not. Localization: giving the location of the object category; drawing a bounding box around the object instance is the way most prominently used in the literature to show the result of localization.
Visual category recognition: recognizing and labelling the class/category of the object present in the image. Moreover, the image presented to the object recognition framework may contain a single instance of some object class, multiple instances of a single class, or multiple instances of multiple classes. Therefore, at the top-most level, object recognition approaches can broadly be categorized as following a top-down, bottom-up or hybrid approach, and within that they may target specific or generic objects. So basically, image-based object recognition can be stated as: given a database of objects and an image, determine which, if any, of the objects are present in the image. Thus the problem of object class recognition can be considered an instance of supervised classification. Another dimension along which the task of object recognition can be categorized is the following. First, systems in which a specific object to be recognized is known and the system is trained for that specific object category only, for example face recognition or pedestrian recognition. Second, generic object recognition systems. Generic object recognition means that the computer recognizes objects in images by their general name [2] or common name. Figure 1 shows an instance of generic object recognition. Generic object recognition has also been referred to in the literature as object-class detection or category-level object recognition [14]; it aims at recognizing the class to which the object present in the image belongs. The images can contain a single instance of a class, multiple instances of the same class, or multiple instances of multiple classes. When categorization of multiple objects of multiple classes in an image is performed, it is known as scene categorization.

Figure 1 Generic Object Recognition

2.1. Architecture of the object recognition system

Current vision systems can be said to consist of the activities shown in Figure 2.

Figure 2 Activities involved in a typical vision system

Any recognition system involves these activities, or some subset of them, in its life cycle. In general, after the image acquisition stage, the image is pre-processed to perform noise removal and some kind of enhancement. The pre-processing stage is followed by the feature extraction and description/representation stage, whose outputs are then passed on for recognition. In the representation stage, objects can be represented in 2-D or 3-D. Figure 3 shows the general architecture of an object recognition system. The object recognition task is affected by several factors and can differ according to various aspects, as shown in Figure 4, which categorizes the aspects along which work is going on in the field. Approaches may differ on the basis of the form and representation of objects, the matching scheme, the image formation model, the type of features, the type of image and the type of data suited for categorization. Having studied these aspects, we found that the approaches mainly differ in the object representation method, based on the type of features, or in the classification approach adopted in the recognition phase. As these factors change, the approach can be observed to change substantially; nonetheless, the approaches broadly follow three paradigms for formulating and attempting a solution to the problem of object recognition from an image: bottom-up, top-down and hybrid [103].

Figure 3 Generic Architecture of Object Recognition System

Bottom-up: This can be considered as image analysis starting from the low-level data, and is based on image segmentation techniques. It considers the raw image data in the form in which it is acquired. Boundaries of homogeneous regions are extracted by performing non-purposive segmentation, without prior knowledge about the properties of individual object classes; no prior assumptions are made about what the objects are. A fixed set of attributes is used to characterize these regions, and objects are linked together to characterize the scene itself. However, without some additional information, purely bottom-up approaches had, until 2009, been unable to yield figure-ground segmentations of sufficient quality for object categorization [Leibe & Schiele]. Since then, many approaches have been developed [85, 86, 88, 89] which use bottom-up segmentation methods, as discussed in [82] and [85], and have achieved remarkable results; these will be discussed in detail in the literature review section of this paper.

Top-down: This is image analysis starting from the semantic-level data. In contrast to the previous approach, this methodology proceeds from the assumption that the image does contain a particular object; if the problem is one of scene categorization, it assumes that the image shows a particular type of scene.
The system attempts to verify the existence of a hypothesized object. Purposive segmentation may be performed, or specialized ways may be used to represent the object.

Hybrid: A combination of the two earlier paradigms is used in this kind of approach [61], [79].

3. KEY CHALLENGES

3.1. Challenges overview

As stated earlier, the problem of object recognition in general, and generic object recognition in particular, faces various challenges.

(I) The appearance of an object in the image can vary over a large range due to:
1. Viewpoint changes
2. Scale, orientation and shape changes (e.g., non-rigid objects)
3. Photometric effects (scene illumination etc.)
4. Scene/background clutter (so objects may be occluded)

(II) Different views of the same object can give rise to widely different images.

(III) A large number of object categories exist in the real world, and these categories may have very little inter-class variation.

Figure 4 Factors affecting the task of object recognition

3.2. Description

Object recognition can be considered yet another data processing task, so the data is given the highest priority and acquisition should be considered the most important step. In recent years, with the advent of high-quality cameras and other image capturing devices, we can collect a huge amount of data (images) in various forms, such as intensity images and range images, and from various sources such as the web; but the major problem the computer vision research community faces today is the scarcity of accurately and precisely labelled image examples. As stated earlier, the object recognition problem can essentially be considered a supervised classification task, and for that to work successfully, labelled image examples are needed. The problem is aggravated by the fact that labelling is labour intensive, and the non-availability of human experts who can perform the image annotation task efficiently and accurately makes it more challenging still. Feature extraction is the next crucial step in the generic object recognition pipeline. Assuming that the data is available, feature extraction becomes the most important stage of the entire object recognition framework: if suitable features of the right dimensions are not extracted, this phase can become the bottleneck of the recognition pipeline.
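To make the feature extraction step above concrete, the following minimal sketch (our own illustration, not taken from any cited work; the function name and bin count are arbitrary choices) computes one of the simplest low-level colour features, a normalised per-channel colour histogram, as a fixed-length feature vector:

```python
# Illustrative sketch of a low-level colour feature: a normalised
# per-channel histogram over an RGB image. Not a method from the
# surveyed literature; names and parameters are chosen for clarity.
import numpy as np

def colour_histogram(image, bins=8):
    """Return a normalised colour histogram feature vector.

    image: H x W x 3 uint8 array; result: vector of length 3*bins.
    """
    features = []
    for channel in range(3):
        hist, _ = np.histogram(image[:, :, channel],
                               bins=bins, range=(0, 256))
        features.append(hist)
    vec = np.concatenate(features).astype(float)
    return vec / vec.sum()   # normalise so image size does not matter

# A synthetic 4x4 "image" that is pure red.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:, :, 0] = 255
f = colour_histogram(img)
assert f.shape == (24,)
assert abs(f.sum() - 1.0) < 1e-9
```

Because the histogram is normalised, images of different sizes yield comparable feature vectors; note, however, that such a global descriptor carries no spatial information, which is one reason richer descriptors are discussed later in this survey.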
Although many sophisticated approaches have recently been developed and exist in the literature, they are not sufficient to describe every object; feature extraction thus becomes very object specific and varies as the viewpoint, size or illumination conditions of image capture vary. Representing images by effective features is therefore crucial to the performance of various image analysis tasks. Features can be low-level (colour, texture, intensity), middle-level (image patches) or high-level (objects, textually annotated objects). Figure 5 shows a possible classification of different kinds of features. Choosing and deploying an appropriate classifier is the next important step of the pipeline. The classifier can be linear or non-linear. Various classifiers, such as the Bayesian classifier, SVMs, decision trees and neural networks, are utilized in the literature for classification, each possessing its own benefits and drawbacks. One important issue inherent to the classification stage is the scalability of the classifier. The number of object categories existing in the real world is very large, and many visual features are required to model each category, forcing the system to hold a huge volume of training data in order to model the many category classifiers. To keep scalability manageable, a linear classifier is commonly utilized, but its classification performance is inferior to that of a non-linear one, whereas non-linear classifiers are more computation intensive. To remedy this shortcoming of linear classifiers, a rich image feature set (which is, after all, a key factor in the success of an image recognition system) must be designed per object class, so that the system can distinctly recognize objects in images exhibiting inter-class and intra-class variation, as shown in Figure 6. Additionally, the classifier has to be updated continuously: even if it has been trained once for a category/class of object, when previously unseen instances of the object emerge or the appearance of the object evolves, the earlier trained classifier will not give correct results. This kind of flexibility and resilience to change is inherently expected from any object recognition framework.

Figure 5 Classification of Image Features

Figure 6 Images of different instances of an object (dog) in varied imaging conditions

Intra-class appearance variations refer to the appearance differences among different objects of the same class [14]. Intra-class appearance variation may be due either to differences in the colour, shape and size of the object's instances or to differences in imaging conditions.
For example, images of the same object taken at different times of day, in different seasons or at different places, with different devices and from different viewpoints, will be entirely different. In addition to intra-class appearance variations, a generic object recognition system has to efficiently and distinctly handle inter-class appearance variations, which in many cases may be very small, as shown in Figure 7. For example, an object recognition system should be capable of distinguishing between a donkey and a horse, or a horse and a mule.

Figure 7 Images of horses and donkeys with very small inter-class appearance variation; the lower row shows images of horses (adapted from [14])

The performance of a generic object recognition framework is generally judged on criteria such as robustness against noise; invariance to basic geometric transformations; invariance to illumination and viewpoint changes; the ability to handle large numbers and different types of objects; the ability to handle intra-class and inter-class variations; the ability to recognize objects in the presence of clutter or a complicated background; and the ability to recognize an object accurately and efficiently even when it is partially occluded. These requirements are implicitly expected of, and must be met by, any framework for object recognition, and as a result these issues can be considered the key challenges for the field of generic object recognition.

4. LITERATURE REVIEW

4.1. Overview

The object recognition pipeline, as stated in the earlier section, consists of the key tasks of image acquisition, pre-processing, feature extraction, feature representation/description and classification. The image acquisition and pre-processing phases, however, fall outside the scope of this study. Although most of the related work surveyed and cited here focuses on one or another phase of this pipeline, our main focus in this study is on feature extraction and description techniques, and on obtaining answers to the research questions put up at the beginning of this manuscript.
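The classification stage of the pipeline named above can be illustrated with a deliberately simple toy (not any method from the surveyed literature): a nearest-centroid classifier over feature vectors, which is linear and cheap, reflecting the scalability trade-off between linear and non-linear classifiers discussed in Section 3. The class names and feature values here are invented for illustration:

```python
# Toy illustration of the classification stage as supervised labelling.
# A nearest-centroid classifier: each class is modelled by the mean of
# its training feature vectors; a novel image's feature vector is given
# the label of the nearest centroid.
import numpy as np

def fit_centroids(features, labels):
    """Average the training feature vectors of each class."""
    return {c: features[labels == c].mean(axis=0) for c in set(labels)}

def predict(centroids, x):
    """Assign the label of the class centroid nearest to x."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# Hypothetical 2-D "feature vectors" for two object classes.
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
y = np.array(["cat", "car", "car", "car"][:0] + ["cat", "cat", "car", "car"])
model = fit_centroids(X, y)
assert predict(model, np.array([0.0, 0.0])) == "cat"
assert predict(model, np.array([1.0, 1.0])) == "car"
```

Real systems replace both ingredients: the 2-D toy features become high-dimensional descriptors, and the centroid rule becomes an SVM, Bayesian classifier or neural network, as surveyed below.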
Various groups of researchers have attempted to survey and review work in the field of computer vision, but these surveys are either related to some specific object class (e.g., surveys on face recognition are presented in [108, 109]), compare various descriptors [14, 45, 48], or, as in [114], cover object recognition using deep neural networks separately. That is, one particular aspect of the topic is explored and the related literature is reviewed while discussing the core work. Comprehensive surveys on generic object recognition [14, 15] have been published periodically in the past, but given the rapid pace of achievements in the field, it seems natural to survey the most recent developments and object recognition techniques available in the literature. In this study, we have mostly tried to review the work done in the field since the year 2000, with more emphasis on surveying the work done after 2011.
The rationale behind this is that most existing surveys and papers talk extensively about the approaches before 2011. Given the pace of technical advancement in the field, a lot of approaches have emerged since 2011 which demand detailed reportage, and covering them is the basic motive of this review. The study therefore also aims to present the survey in a way that helps the reader gain insight into this field of research. As noted in the introduction, the task of object recognition is broadly considered to comprise three sub-tasks: object detection, object localization and object classification. In this manuscript we have studied approaches to generic object recognition, which is the highest-level task among the object categorization sub-tasks; i.e., to categorize the class of an object in an image, object detection is inherently performed, and many a time the object also needs to be localized. For this reason we have not segregated the approaches on the basis of detection, localization or categorization.

4.2. Features and Feature Descriptors

The foundations of the field can be traced back to the 1950s and 1960s, when early work was done in very simplistic domains [1]. The world was modelled as being composed of blocks defined by the coordinates of their vertices and edge information. The "block image" represented areas of uniform brightness in the image, and the edges of blocks were located in areas of intensity discontinuity. It was soon realised that this is not an ideal way to represent the complicated information present in an image. Since then, various strategies have been developed for the task of object recognition, with an emphasis on the feature extraction stage and on the usage of novel and efficient feature descriptors. Object recognition approaches can be classified into various broad categories.
These include model-based, shape-based and appearance-based approaches. Model-based approaches try to represent objects as sets of three-dimensional primitives [1, 12, 13] such as generalized cylinders, cones, cubes, cuboids, spheres, etc. Shape-based approaches [13, 19, 20, 21, 52, 53] represent objects by shape primitives like boundary fragments, contours, shapelets, etc. In contrast, appearance-based models use only appearance, usually captured by different two-dimensional views of the object-of-interest. Whatever the representation method, object representation takes centre stage in the entire object recognition pipeline, and in turn the problem of object class recognition reduces to generating an efficient representation of the object which can detect, localise and identify its class discriminatively and repeatably. As stated earlier, extracting and describing the features of the objects in an image efficiently decides the fate and success of a typical object recognition system. In a generic object recognition or categorization system, the relevant features or descriptors for a characteristic point, patch or region of an image are obtained by different approaches. As shown in Figure 5, features can be divided at the topmost level into two categories, global and local, where the former characterizes the image as a whole and the latter represents local information in the form of a pixel, patch or region. Yet another direction along which many researchers have tried to classify features is structural versus statistical. Although there are various classifications for features, significant overlap exists among these classes; for example, local features can be structural as well as statistical.
These features are often combined to form various descriptors; in particular, region-level descriptors are formed by combining colour, texture and other such low-level features.
As far as pixel-level features are concerned, they are regarded as low-level information of the image, are computed directly from the grayscale values of individual pixels, and are generally used to build more sophisticated patch-level or region-level descriptors. We now briefly discuss some of the best performing descriptors proposed and utilized over the years. This is not meant to be an exhaustive discussion of the existing approaches, but rather to provide a sample of some relatively successful and widely used approaches.

4.2.1. Appearance-Based Object Representation

The Scale-Invariant Feature Transform (SIFT) [3][4], introduced by Lowe, is regarded as one of the most popular patch-level feature descriptors reported in the literature. The features identified are shown to be completely invariant to basic geometric transformations and partially invariant to illumination changes and occlusion. SIFT features proved successful because they do not depend on the exact grey-level distribution within an image patch, but instead use the general configuration of image gradients [60]. This was considered one of the most prominent approaches in the area of object recognition, and the work is considered a milestone in the research on object recognition, computer vision and other image analysis problems. However, as the descriptors are appearance-based, they may produce poor results if the object does not have enough texture information. SIFT has been applied to the problem of object recognition in many works; two such usages are mentioned in [3] and [4]. In various other works [2, 39, 42, 75, 110, 111, 112], improvements have been achieved by combining other features with SIFT or by using filters other than Gaussian [110].
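To make the idea concrete, the core of a SIFT-like descriptor can be sketched as a grid of magnitude-weighted gradient orientation histograms over a patch. The following is a hypothetical simplification for illustration only: it omits Lowe's scale-space keypoint detection, Gaussian weighting and trilinear interpolation, and all names here are our own.

```python
import numpy as np

def patch_orientation_histogram(patch, n_bins=8, grid=4):
    """Toy SIFT-style descriptor: a grid x grid array of gradient
    orientation histograms over one image patch (illustration only,
    not Lowe's full implementation)."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                               # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)          # orientation in [0, 2*pi)
    h, w = patch.shape
    desc = []
    for i in range(grid):                                # split the patch into cells
        for j in range(grid):
            sl = (slice(i * h // grid, (i + 1) * h // grid),
                  slice(j * w // grid, (j + 1) * w // grid))
            hist, _ = np.histogram(ang[sl], bins=n_bins,
                                   range=(0, 2 * np.pi),
                                   weights=mag[sl])      # magnitude-weighted votes
            desc.append(hist)
    desc = np.concatenate(desc)
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc             # normalise for illumination robustness

patch = np.random.default_rng(0).random((16, 16))        # stand-in for a real keypoint patch
d = patch_orientation_histogram(patch)
print(d.shape)   # (128,) -- same dimensionality as the standard SIFT descriptor
```

With grid = 4 and 8 orientation bins the sketch reproduces SIFT's 4 x 4 x 8 = 128-dimensional layout, which is exactly the high dimensionality that PCA-SIFT and SURF, discussed next, set out to reduce.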
SIFT descriptors are high-dimensional (128 values per keypoint) and the number of keypoints obtained is relatively large, resulting in high-dimensional data. This drawback was recognised by the authors of [5], who extended SIFT as PCA-SIFT, where Principal Component Analysis is applied to the normalized gradient patch, resulting in a lower-dimensional descriptor. PCA-SIFT yields a 36-dimensional descriptor which is fast to compute and match but less distinctive [6], while the descriptor introduced by Mikolajczyk and Schmid [45], GLOH (Gradient Location-Oriented Histogram), is another variant of SIFT which proved to be more distinctive at the same dimension [6]. A colour-image-based SIFT has also been demonstrated in [75], wherein colour gradients are used in place of intensity gradients in the Gaussian framework. As mentioned earlier, high dimensionality of the descriptor is the major limitation of SIFT; another effective patch-level descriptor, SURF (Speeded-Up Robust Features), was proposed in [6] by Bay et al. The authors make use of integral images, which yields features that are not only faster to compute but also distinctive and repeatable. They base their descriptor on the Hessian matrix but use a very basic approximation. Moreover, only 64 dimensions are used, much less than SIFT's 128-dimensional vector. Though one can argue that PCA-SIFT results in only a 36-dimensional vector, it loses distinctiveness, whereas SURF has proved to be more distinctive and repeatable. Another level at which feature descriptors are generated in numerous papers is the region level. Dalal and Triggs [32, 33, 34] used grids of locally normalised Histograms of Oriented Gradients (HOG) as descriptors for object detection in static images. The technique counts occurrences of gradient orientations in localized portions of an image. The detector window is tiled with a grid of overlapping blocks, in which Histogram of Oriented Gradient feature vectors are extracted.
The detector thus presented is contrast-based, which makes it robust to small changes in image contour locations and directions and to significant changes in image illumination and colour, while remaining highly discriminative for overall visual form. The work of Dalal and Triggs [32, 33, 34] is aimed at the detection of humans in particular, but has also proved effective in detecting other object classes in images. The HOG descriptor has proved very efficient for representing structured objects; for example, it has outperformed all other descriptors in pedestrian detection from videos and images. Inspired by HOG [32], Bosch et al. [36] proposed a novel descriptor called PHOG (Pyramid of HOG). The idea is to represent local image shape and its spatial layout, together with a spatial pyramid kernel of Bag of Features (BoF) [25, 26]. Each image is divided into a sequence of increasingly finer spatial grids by repeatedly doubling the number of divisions along each axis (like a quadtree), and the number of points in each grid cell is recorded. A HOG vector is computed for each grid cell at each pyramid resolution level, and the final PHOG descriptor for the image is a concatenation of all the HOG vectors. This concatenated vector is then normalized to ensure that texture-rich images, or images with more edges, are not weighted more strongly than others. Another descriptor built on the idea of histograms of gradients is CoHOG (Co-occurrence Histograms of Oriented Gradients), proposed in [37]. CoHOG can express shapes in more detail than HOG, as CoHOG histograms have pairs of gradient orientations as their basic units; the histogram is referred to as a co-occurrence matrix. Due to this pairing, the vocabulary size increases, resulting in a more specific expression of the shape of the object in the image. The use of a higher-dimensional matrix makes CoHOG powerful in terms of discriminative power, but at the same time highly computation-intensive.
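The cell-and-block computation behind HOG can be sketched as follows. This is a simplified illustration (unsigned gradients, L2 normalisation over overlapping 2 x 2 blocks of cells), not the exact Dalal-Triggs implementation, and the function name and parameters are our own.

```python
import numpy as np

def hog_descriptor(img, cell=8, n_bins=9):
    """Minimal HOG sketch: per-cell unsigned gradient orientation
    histograms, then L2 normalisation over overlapping 2x2 blocks."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)              # unsigned orientation in [0, pi)
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    cells = np.zeros((n_cy, n_cx, n_bins))
    for i in range(n_cy):                                # one histogram per cell
        for j in range(n_cx):
            sl = (slice(i * cell, (i + 1) * cell),
                  slice(j * cell, (j + 1) * cell))
            cells[i, j], _ = np.histogram(ang[sl], bins=n_bins,
                                          range=(0, np.pi), weights=mag[sl])
    feats = []
    for i in range(n_cy - 1):                            # overlapping 2x2 blocks of cells
        for j in range(n_cx - 1):
            block = cells[i:i + 2, j:j + 2].ravel()
            feats.append(block / (np.linalg.norm(block) + 1e-6))  # local contrast normalisation
    return np.concatenate(feats)

img = np.random.default_rng(1).random((64, 64))          # stand-in detector window
f = hog_descriptor(img)
print(f.shape)   # 7*7 blocks x (2*2*9) values = (1764,)
```

The per-block normalisation is what gives HOG the contrast robustness described above: each 36-value block is scaled independently, so local illumination changes cancel out.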
Bag of Features and visual codebook based approaches

This approach is inspired by the BoW (Bag of Words) approach, first proposed in 1997 by [38] for describing textual data for various text analysis tasks. It represents a text document or a sentence written in natural language as a set of words, without taking into consideration grammar or the order in which the words occur in the original text. The frequency of occurrence of each word is calculated and then used for various language processing tasks. The analogous term BoF (Bag of Features) is used for the image-based approach: similar to the BoW model, an image is represented as an orderless collection of local image features. Similar terms such as Bag of Keypoints (BoK) and Bag of Visual Words (BoVW) are used by various researchers in their works. The method is based on vector quantization of affine-invariant descriptors of image patches [39]. A bag of keypoints corresponds to a histogram of the number of occurrences of particular image patterns in a given image. The method uses clustering to obtain fairly high-dimensional feature vectors for a classifier. As a codebook is constructed in the BoF approach, it is often also referred to as a codebook-based approach. The method includes the following main steps:
- Detection of image patches for computation of patch descriptors.
- Computing patch descriptors for these patches. These can be any invariant feature descriptors, such as SIFT [3, 4] or any of its variants, or other lower-level descriptors like Harris-affine [43] or MSER.
- Construction of a visual codebook/vocabulary/dictionary by assigning patch descriptors to predetermined clusters (a vocabulary) with a vector quantization algorithm that groups similar features together. Several clustering techniques have been used for determining the clusters; most frequently, k-means clustering is applied [39], while hierarchical k-means clustering is adopted in [49] and mean-shift in [35].
- Generating a histogram of the number of occurrences of the patches assigned to each cluster. The size of the resulting histogram equals the size of the codebook, and hence the number of clusters obtained from the clustering technique [40].
- Treating the bag of features as a feature vector and using a classifier to classify the respective image.

A distance measure is required when comparing two term vectors for similarity, but this measure operates in the term-vector space as opposed to the feature space. There are two reasons why the bag-of-features image representation proved popular for indexing and categorization applications. First, this representation benefits from powerful local descriptors such as the SIFT descriptor; second, these vector representations can be compared with standard distances and subsequently used by robust classification methods such as support vector machines [50]. Also, codebook-model-based approaches, while ignoring any structural aspect of vision, provide state-of-the-art performance on current datasets [40]. The discriminative power of the visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. Codebook-based approaches are considered simple and efficient, and can also be made robust to clutter, occlusion, viewpoint change, and even non-rigid deformations [26, 25]. In spite of being one of the most popular and successful approaches, BoF and visual codebook generation have certain limitations. As BoF expresses the image as appearance-frequency histograms of visual words obtained by quantizing SIFT-like features, location information and the geometric relationships between keypoints are lost.
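The codebook pipeline described in the steps above can be sketched as follows. The descriptors here are random stand-ins for real SIFT-like vectors, and the plain k-means is illustrative only; real systems typically use optimised clustering libraries.

```python
import numpy as np

rng = np.random.default_rng(42)

def kmeans(X, k, iters=20):
    """Plain k-means for codebook construction (illustrative stand-in
    for the vector quantization step)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign every descriptor to its nearest visual word
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):                               # move each centre to the mean
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return centers

def bof_histogram(desc, codebook):
    """Quantise one image's local descriptors against the codebook and
    count occurrences of each visual word."""
    dist = np.linalg.norm(desc[:, None, :] - codebook[None, :, :], axis=2)
    words = dist.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                             # normalised word histogram

# stand-in for SIFT-like patch descriptors pooled from training images
train_desc = rng.random((500, 128))
codebook = kmeans(train_desc, k=32)                      # 32 visual words

image_desc = rng.random((60, 128))                       # descriptors from one test image
h = bof_histogram(image_desc, codebook)
print(h.shape)   # (32,) -- one bin per visual word
```

The resulting fixed-length histogram `h` is what gets fed to an SVM or other classifier; note that it records only how often each word occurs, not where, which is exactly the loss of location information discussed here.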
Also, as vector quantization is involved, some loss of information inherently occurs, and due to the loss of geometric relations between the features, localization of the object is not possible. To overcome the limitation of this orderless representation of objects, several researchers have proposed approaches that augment the bag of features with global spatial relations in a way that significantly improves classification performance while remaining simple and computationally efficient enough for real-world applications [27]. The authors of [27] demonstrated that the bag-of-features description of an image can be extended to spatial pyramids so that the spatial location information of the features is retained. To generate these spatial pyramids, the input image is partitioned into increasingly fine sub-regions. Histograms of local features are computed over these sub-regions, and the histograms are then concatenated to generate the final features. This representation is combined with a kernel-based pyramid matching scheme proposed by [24] that efficiently computes approximate global geometric correspondence between sets of features in two images. While the spatial pyramid representation sacrifices the geometric invariance properties of bags of features, it compensates for this loss with the increased discriminative power derived from the global spatial information. Similarly in [2], to overcome the same inherent problem of the BoF approach, a graph is constructed by connecting SIFT keypoints with lines. As a result, the keypoints maintain their relationships, and a structural representation with location information is achieved. Since a graph representation is not suitable for statistical processing, the graph is embedded into a vector space according to the graph edit distance. With this, the authors achieved improved recognition accuracy compared to the conventional method in their experiments on the PASCAL VOC and Caltech-101 datasets.
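The spatial-pyramid extension of BoF described above can be sketched as follows: visual-word histograms are computed over increasingly fine grids (1 x 1, 2 x 2, 4 x 4, ...) and concatenated, so coarse spatial layout is retained. This is a simplified illustration (no pyramid-match weighting), and the function name and inputs are our own.

```python
import numpy as np

def spatial_pyramid_histogram(keypoints, words, n_words, levels=2):
    """Concatenated visual-word histograms over a pyramid of grids.
    `keypoints` are (x, y) locations normalised to [0, 1);
    `words` are their codebook (visual word) indices."""
    feats = []
    for lvl in range(levels + 1):
        g = 2 ** lvl                                     # g x g grid at this level
        cell_x = np.minimum((keypoints[:, 0] * g).astype(int), g - 1)
        cell_y = np.minimum((keypoints[:, 1] * g).astype(int), g - 1)
        for cy in range(g):
            for cx in range(g):
                in_cell = (cell_x == cx) & (cell_y == cy)
                # word histogram restricted to this spatial cell
                feats.append(np.bincount(words[in_cell], minlength=n_words))
    return np.concatenate(feats)

rng = np.random.default_rng(3)
kp = rng.random((100, 2))                 # normalised keypoint locations
w = rng.integers(0, 32, size=100)         # their visual-word assignments
f = spatial_pyramid_histogram(kp, w, n_words=32)
print(f.shape)   # (1 + 4 + 16) cells x 32 words = (672,)
```

The level-0 block of the output is exactly the plain BoF histogram; the finer levels add the spatial layout information that plain BoF discards.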
So, the basic idea for improving the BoF approach is to somehow incorporate the spatial location information of features into the BoF features, so that the method can be used not only for recognition but also successfully for object localization. The authors of [47] achieved improvement by adding binary signatures to the descriptors: first, a Hamming Embedding (HE) of the SIFT descriptors, analogous to the Hamming distance; and second, a weak geometric consistency (WGC) check integrated within the inverted file system, which penalizes descriptors that are not consistent in terms of angle and scale. In this way geometric information is incorporated in the index even for very large datasets. At the same time, both HE and WGC require additional information to be stored, so the memory requirement of the index increases. The visual codebook approach has been used by several other researchers in slightly different ways. For example, Leibe et al. in [7, 8, 9] adopt a two-stage approach. In the first stage, a codebook of local appearances is learnt which captures which local structures may appear on objects of the target category. Next, an Implicit Shape Model (ISM) is learnt that specifies where on the object the codebook entries may occur. To create the codebook, the authors adopt the method presented in [17] by Agarwal and Roth: from a variety of images, 25 x 25 pixel patches are extracted with the Harris interest point detector, and these patches are grouped using agglomerative clustering to generate a compact codebook. The codebook entries are used to define the implicit shape model of the objects. The approach does not try to create a separate model for every possible shape an object can take, but rather defines the shapes of an object in terms of patches that are consistent in local appearance. Due to this, fewer training examples are needed to learn an object's probable shapes. In a second pass, the codebook entries are scanned and all entries whose similarity is above a certain chosen threshold are activated.
The threshold chosen is the same as the threshold used during the clustering performed in the first step. In the recognition stage, a generalized Hough transform is performed to identify possible object centres.

GIST: Humans can recognize the gist of a novel image in a single glance, independent of its complexity [69], by considering it in a "holistic" manner while overlooking most of the details of the constituent objects. Intuitively, GIST summarizes the gradient information (scales and orientations) for different parts of an image, which provides a rough description (the gist) of the scene. The input image is divided into non-overlapping regions; each region is further divided into sub-regions, and a gradient orientation histogram is computed for each sub-region. The GIST descriptor for a region is formed by concatenating these gradient orientation histograms over all of its sub-regions. The approach is more prevalently used for scene understanding. Approaches based on GIST cannot be considered an alternative to image analysis based on local features, but they can serve as additional support for recognition problems by helping to constrain local-feature-based image analysis. In [72, 73], short binary codes are used to compress local GIST descriptors, and it is demonstrated that the approach works on millions of images obtained from the internet without sacrificing recognition accuracy or effectiveness.

4.2.2. Shape-Based Approaches

Many approaches based on intensity or colour gradients of image patches or regions have been discussed in the previous part of this paper. Although these descriptors are very powerful and have been shown to perform object recognition with remarkable effectiveness, there may still be cases where two object classes exist with the same colour and texture, differing only in shape, or classes whose appearance varies greatly across instances.
Such objects cannot be represented with colour- and intensity-based features alone. For example, a raw mango and a capsicum are both green in colour but have entirely different shapes. The recognition community also understood quite early that, across the exemplars that belong to a category, shape is a more invariant property than appearance. As a result, the majority of recognition systems from the mid-1960s to the late 1990s attempted to extract shape features, typically beginning with the extraction of edges; at occluding boundaries and surface discontinuities, edges capture shape information. Shape is thus another important cue which can be used to generate a discriminative representation of objects. To compute the shape of an object, different authors have taken different approaches. Shape cues are frequently captured and described at the region level for object class recognition or detection using contour or boundary fragments [19], shapelets, edgelets [20], shock graphs, etc. Another area of research in shape-based detection is how to establish correspondence between the shapes extracted from training and test images, i.e., how to decide that two shapes match [52, 53]. One limitation of shape-based object description is that it cannot capture intra-class variations in a very discriminative way; for example, a zebra cannot be differentiated from a horse. Shape-based cues are therefore often combined with other appearance-based object cues.

4.2.3. Part-Based Approaches

Objects as 3D volumetric parts

The earliest attempts at solving the object recognition problem used high-level 3D part-based representations of objects, such as generalized cylinders (Binford) and other deformable primitives such as geons (Biederman [13]) and superquadrics (Pentland) [79].
The common characteristic among all of these is that they are based on symmetry, a physical regularity in our world which is exploited by the human visual system. In practice, however, it is too complex to extract such parts efficiently and inexpensively; but once extracted, they are semantically closer to a description of the image content. Such parts are limited in number compared to approaches where low-level and mid-level features are used to describe the object. Although methods based on low-level and mid-level features score on simplicity, ease of extraction and attractive invariance properties, they have proved weak at expressing high-level semantic information of the image. These facts made object representation using 3D volumetric parts attract a lot of attention in the 1970s and 1980s. Detailed coverage of the topic is beyond the scope of this study, but the works of Binford and Nevatia [115] can be explored for further information on the concept.

Recognition based on parts

In part-based object recognition approaches, an object is modelled as a geometrically constrained set of parts, where each part has a distinctive appearance and spatial position. In such approaches, shape is represented by the mutual positions of the parts [22]. Using such features, it is determined whether an instance of the object of interest exists in the image and, if so, where. Various methods exist in the literature, differing in how the parts are detected, how their positions are represented, and what the ideal number of parts to represent an image should be; generally these parameters are tuned to the requirements of the approach. In [22], objects are modelled as flexible constellations of parts. A probabilistic representation (in this case Gaussian) is used for all aspects of the object, such as shape, appearance, occlusion and relative scale. To learn and model an object category, regions and their scales are first detected. Once the regions are identified, they are cropped from the image and rescaled to the size of a small patch, typically 11 x 11 pixels, and the parameters of the above densities are estimated from these regions such that the model gives a maximum-likelihood description of the training data. To detect the features, a histogram of the intensities in a circular region of some radius is generated for each point in the image. The entropy of this histogram is calculated, and its local maxima determine the scale of the region. The N regions with the highest saliency over the image provide the features for learning and recognition. To reduce the dimension of the feature set, PCA is used.

Deformable part based approach

Deformable Part Models (DPMs) constitute the state of the art for sliding-window object detection [99]. DPMs are inspired by the pictorial structure representation introduced in [91] by Fischler and Elschlager, where an object is modelled by a collection of parts arranged in a deformable configuration [92]. Small picture segments are used to represent the visual properties of the object, and the deformable configuration is captured by spring-like connections between these segments. An energy function is computed from the match cost of each part and the deformation cost of each pair of connected parts, and this energy function is minimized to find the best match of the model within an image. The effectiveness of the pictorial representation for image matching demonstrated in [91] is due to the fact that the representation is simple.
In addition, the representation possesses wide general applicability, as it does not depend on any particular scheme for modelling the appearance of the parts and so can be used to represent quite generic objects. On the other hand, the model suffers from certain critical limitations. Too many parameters are involved in the construction of the model, so solving the energy minimization problem becomes very computation-intensive. Also, only the best match is found; if the image contains multiple instances of the same object, they will not all be detected by the pictorial representation of [91]. These issues are aptly handled by Felzenszwalb and Huttenlocher in the pioneering work reported in [92]. The pictorial representation proposed by Fischler and Elschlager can be viewed as a general graph, whereas Felzenszwalb and Huttenlocher used a tree representation, realising that many real-world objects, especially human beings and animals, can be represented by a tree structure. With this restriction, finding the best match of a model to an image can be computed in polynomial time. The approach demands that the graph generated to represent the object be acyclic, and that the function dij(li, lj), measuring the degree of deformation of the model when part vi is placed at location li and part vj at location lj, be a Mahalanobis distance between transformed locations. DPMs are an impressive way of representing objects. While deformable models can capture significant variations in appearance, a single deformable model is often not expressive enough to represent a rich object category [93]. It can also be noted that in practice simple models generally outperform approaches using deformable part-based representations, the reason being that simpler models can be trained easily, whereas it is more difficult to train sophisticated models like DPMs.
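The pictorial-structure matching just described can be summarised by a single energy function. Writing L = (l1, ..., ln) for the candidate part locations, mi for the match cost of placing part vi at li, and dij for the deformation cost over the connected pairs E, the best configuration minimises (the notation here is the commonly used one, not copied verbatim from [91, 92]):

```latex
L^{*} = \arg\min_{L} \left( \sum_{i=1}^{n} m_i(l_i)
        \;+\; \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j) \right)
```

When the graph E is a tree and dij is a Mahalanobis distance between transformed locations, this minimisation can be carried out by dynamic programming over the tree, which is the source of the polynomial-time matching mentioned above.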
The authors of [93] describe a deformable part-based model that represents an object by a low-resolution root filter and a set of higher-resolution part filters arranged in a flexible spatial configuration. The flexible spatial configuration helps to model the visual appearance at multiple scales. The approach has achieved benchmark results in the PASCAL object detection challenges. It basically uses HOG features [32] in a star-structured part-based model defined by a root filter similar to the filter used in [32], together with a set of part filters and deformation models. The model presented in [101] is effective for shallow structures consisting of at most two layers, but as the number of layers increases, it becomes difficult to scale the model without incorporating and tuning additional parameters. Yuille et al. in [106] have extended the model discussed in [101]. They propose describing an object class using several templates from different viewpoints. Each template is represented as a tree structure consisting of three layers: the first layer represents the entire image; the second layer divides the image into 9 sub-images; and the third layer divides each sub-image of the second layer into four sub-images, giving 36 sub-images in the third layer. The approach used by Dalal and Triggs [32] to detect pedestrians fails in the presence of articulation, whereas [93, 94, 95, 96] allow an intermediate layer of parts that can be shifted with respect to each other, making the overall model deformable and thereby achieving generalization. But such approaches do not work for extracting human pose from images. In [102], Bourdev and Malik introduced 'poselets' (parts that are tightly clustered in both appearance and configuration) for detection and pose estimation in images containing human bodies.
In [79], Pablo et al. unify the approaches presented by Dalal and Triggs [32], Felzenszwalb [95] and Bourdev and Malik [102] into a single recognition framework, trying to take the benefit of each approach: region-based object descriptors are used to perform purposive semantic segmentation, and their outputs are subsequently combined, improving overall performance.

4.2.4. Recent Approaches and Advancements

We have discussed many approaches, with their benefits and limitations, in the earlier sections. It can also be noted that all those approaches to object recognition make essential use of machine learning methods. Most such machine learning methods work well because of human-designed representations and input features: early conventional approaches involve hand-crafted features for object representation and look for these features in the image. To do this, the programmer was required to have deep knowledge of the data and to laboriously engineer each of the feature detection algorithms [114]. There have been big improvements in image analysis over the last few years due to the adoption of deep neural networks to solve vision problems. Figure 8 shows schematically the difference between traditional vision systems and recent deep-neural-network-based systems.

Neural Nets for Object Recognition: Neural networks have been used in object recognition systems for decades. Neural nets implement a classification approach; their attraction lies in their ability to partition the feature space using nonlinear class boundaries. Earlier, neural networks were used only as the classifier in the classification stage of the object recognition pipeline (Figure 8), but more recently, with the progress in vision research and the increase in computational power, neural networks are utilized for automatic feature learning (from the raw image data) as well as for classification.
LeCun [123] demonstrated in 1989 an algorithm to train neural networks in a supervised way and showed that applications such as hand-written digit recognition benefit from it and perform remarkably well. Since then, convolutional neural networks have been used by many research communities.
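The basic operation such convolutional networks stack and train is a 2-D convolution whose kernels are learned rather than hand-designed. A minimal NumPy sketch of the forward operation follows; the fixed edge kernel is used purely for illustration, standing in for a filter a network would learn.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation: the basic operation a
    convolutional layer applies, here with a single channel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # dot product of the kernel with the image patch under it
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# a horizontal-difference kernel, the kind of edge filter a CNN might learn
img = np.zeros((5, 5)); img[:, 2:] = 1.0
k = np.array([[-1.0, 1.0]])
resp = conv2d_valid(img, k)      # strong response at the vertical edge
```

A convolutional layer applies many such kernels in parallel, interleaves them with nonlinearities and pooling, and adjusts the kernel weights by backpropagation instead of fixing them by hand.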
Convolutional neural networks differ from conventional approaches such as BoF and DPM (Deformable Part Model) for two very important reasons. First, they are deep architectures, whereas the conventional approaches were shallow. Second, they do not need prior knowledge of the image data. Deep learning made it possible to learn features directly from the data instead of handcrafting them explicitly. This has greatly helped vision tasks, particularly object recognition, by enabling effective capture of both low-level and mid-level cues of the object to be recognized. As a result, deep neural networks have brought large improvements in the performance of image analysis over the last few years. What makes deep architectures achieve such good results? Conventional neural nets used one to two layers of neurons, whereas a "deep neural network" (DNN) uses architectures with several layers stacked on top of each other. As a result, a DNN can learn more complex models without the need for hand-designed features. DNNs have shown good results on the ImageNet dataset [126]: on the test data in ILSVRC 2010, the authors achieved top-1 and top-5 error rates of 37.5% and 17%. Their neural network consisted of 650,000 neurons, had 5 convolutional layers, and learnt 60 million parameters. Like every other approach, deep architectures also have certain limitations:
• They need very sophisticated hardware, and images of a fixed size, typically 224 x 224.
• They contain a huge number of parameters to be trained, and are therefore computation intensive.
• When trained using gradient descent, the gradient does not trickle down to the lower layers, so sub-optimal sets of weights are obtained [114].
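The fixed-input-size limitation noted above is what spatial-pyramid-style pooling addresses: max-pooling a convolutional feature map over a fixed grid of bins at several pyramid levels yields a vector whose length depends only on the channel count and the level sizes, not on the image size. A rough NumPy sketch follows; the pyramid levels and shapes are illustrative, not those of any particular paper, and the feature map is assumed to be at least as large as the finest grid.

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a C x H x W feature map into a fixed-length vector:
    each level splits the map into an n x n grid and takes the max
    per bin, so the output length is independent of H and W."""
    c, h, w = fmap.shape
    parts = []
    for n in levels:
        # bin edges chosen so the bins exactly cover the whole map
        ys = np.linspace(0, h, n + 1).astype(int)
        xs = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                bin_ = fmap[:, ys[i]:ys[i+1], xs[j]:xs[j+1]]
                parts.append(bin_.max(axis=(1, 2)))
    return np.concatenate(parts)   # length = C * sum(n*n for n in levels)

rng = np.random.default_rng(1)
v_small = spatial_pyramid_pool(rng.random((3, 13, 9)))    # small 'image'
v_large = spatial_pyramid_pool(rng.random((3, 40, 57)))   # larger 'image'
# both vectors have the same fixed length: 3 * (1 + 4 + 16) = 63
```

Placed between the last convolutional layer and the fully connected layers, such pooling lets the fully connected part of the network see a constant-size input regardless of the original image dimensions.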
Various modifications to DNNs have been suggested in the literature to overcome these limitations. To overcome the fixed-size input constraint of deep neural networks, several efficient pooling strategies have been proposed. In [113], the network is equipped with a spatial pyramid pooling strategy (SPP-net). SPP-net can generate a fixed-length representation irrespective of image scale and size. Spatial pyramid pooling is based on spatial pyramid matching [24], which in turn is an extension of the BoF approach [26]. Another improvement is the Recursive Neural Network (RNN) [130], used for scene classification; the method predicts a tree structure for scene images.

Figure 8(a). Block diagram representing a typical traditional object recognition system

Various Competitive Challenges and Datasets
In this section we present some of the challenges that the computer vision community organizes annually to invite, evaluate, and report the innovative approaches developed by research groups across the world. These challenges serve the purpose of setting a common platform for researchers to present their work and compete with each other in the areas of object detection, localization, and categorization. These
challenges also provide datasets of sample images so that the approaches can be evaluated over a wide range of image content and variations in image capturing conditions.

Figure 8(b). Block diagram of deep architectures (image taken from: ufldl.stanford.edu/eccv10-tutorial/eccv10-tutorial_part4.ppt)

Early approaches to object recognition used very small sets of images to evaluate their algorithms. With the advent of the sophisticated world-wide web, however, large numbers of annotated images have become readily available in public as well as private repositories. To harness the benefit of such repositories, datasets have been created and made available to the research community. As mentioned earlier, these challenges are an effort to bring the research community together in a framework of competition so that the best approaches in computer vision can be evaluated and publicized. They consist of two components: first, a publicly available dataset with ground-truth annotations and standardised evaluation software, and second, a competition and workshop [119]. To review these challenges, we first discuss the datasets made available by the competitions, along with certain other widely used datasets.

Datasets: No research is possible in any research area without appropriate datasets [30], and the same applies to object recognition and computer vision research. Appropriate datasets are needed at all stages of recognition research: for learning visual models of objects and scene categories, for detecting and localizing instances of these models in images, and for evaluating the performance of recognition algorithms. The work in [30] reviews existing image datasets from the point of view of expectations, challenges, and limitations.
Datasets should ideally offer a wide range of image variability and should be sufficiently challenging that algorithms can be evaluated meaningfully. One of the major limitations in creating such datasets is that the images must be annotated. This annotation has to be done by human experts and turns out to be a mammoth task, considering the huge number of real-world objects to be recognized for various applications, and it is not easy to get human experts to accomplish it effectively, correctly, and efficiently. An elegant approach to automatic dataset collection from the web is presented in [66], which applies an object recognition technique incrementally: images found on the web are used to learn the model in a robust way. Another solution for getting annotated
training examples is crowdsourcing, but the most common error an untrained annotator is susceptible to is a failure to consider a relevant class as a possible label, simply because they are unaware of its existence. We now discuss some of the most prevalent datasets.

Caltech-101 & Caltech-256: Caltech-101 is a collection of pictures of objects belonging to 101 categories, collected by Fei-Fei et al. [64] in 2003. There are about 40 to 800 images per category; most categories have about 50 images. Most images have little or no clutter, and the objects tend to be centered in each image in a stereotypical pose. In comparison to Caltech-101, Caltech-256 is a collection of 256 object categories, with 30,608 images in total. Figure 9 compares Caltech-101 and Caltech-256.

Figure 9 (Courtesy: http://www.vision.caltech.edu/Image_Datasets/Caltech256/details.html)

TRECVID: TRECVID organizes a competition every year and, for evaluating performance, releases a dataset consisting of video shots. The goal of the conference series is to encourage research in information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. Annotation is not provided by the organizers.

LabelMe: LabelMe [74] is a publicly available annotated image database open for public contribution. The dataset is provided with an annotation tool, so that anyone can annotate any image. As images are annotated by experts as well as casual users, it cannot be relied on for obtaining a test set, but a huge quantity of training images can be obtained from it.

COIL-20 & COIL-100: COIL-20 and COIL-100 are databases of grayscale images of 20 and 100 object categories respectively [120].
Different poses of the objects were generated by placing each object on a rotating turntable and capturing images at angular displacements of 5 degrees, giving 72 views per object. The unprocessed set consists of 720 images of 10 object categories; 1,440 size-normalized images are also provided.

Microsoft COCO: The Common Objects in Context database is a large-scale image database that addresses three core research problems in scene understanding: detecting non-iconic views of objects, contextual reasoning between objects, and the precise 2D localization of objects [122]. Contextual knowledge can help boost all components of the object recognition framework, and the dataset is provided to support object recognition based on the context in which objects lie in the scene. The dataset contains 91 common object categories, 82 of which have more than 5,000 labeled instances; in total it has 2,500,000 labeled instances in 328,000 images. The dataset consists of fewer object categories but a very high number of instances per category, which differentiates it from other popular large-scale datasets such as PASCAL VOC and the ImageNet dataset, discussed in the following sections.

The PASCAL Visual Object Classes Challenge: The PASCAL VOC challenge was first organized in the year 2005 and was then held annually up to 2012. The challenge basically consists of two components. A dataset consisting of 1,000 images related to objects of
20 categories, obtained from the Flickr web-site, was made publicly available, together with a competition involving object classification, detection, segmentation, action classification, and person layout. Everingham et al. have reviewed PASCAL VOC in [119]. Each of the objects was fully annotated. The dataset was not so rich from the beginning: in 2005 only four categories (motorbikes, bicycles, cars, and people) were made available, but every year the organizers kept enriching it, until the full dataset was released in 2011. To assess the different methods, bootstrapping of the ROC curve is used. This evaluation technique is applied in a number of different ways: to simply judge the variability of a given method, to compare the relative strength of two methods, or to look at rank ranges in order to get an overall sense of all methods in a competition [119].

ImageNet Large Scale Visual Recognition Challenge (ILSVRC): ILSVRC was first organized in 2010, and since then the event has been held annually. ILSVRC is one of the most prestigious series of competitions and workshops in the computer vision community for evaluating the performance of contemporary approaches developed by various researchers. The challenge is nicely reviewed from various aspects in [118]. Similar to PASCAL VOC, ILSVRC provides a huge collection of annotated images, under the name ImageNet, by Deng et al.

ImageNet Dataset: ImageNet is an image database organized according to the WordNet hierarchy, in which each node of the hierarchy is depicted by hundreds to thousands of images; currently there are over five hundred images per node [121]. In ImageNet, on average about 1,000 images are provided to illustrate each synset. Images of each concept are quality-controlled and human-annotated.
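The bootstrapping used for such evaluations can be sketched generically: resample the per-image outcomes with replacement many times and read off a confidence interval for a score such as accuracy. This is a simplified illustration of the resampling idea, not the exact VOC protocol; the outcome vector and interval settings are hypothetical.

```python
import numpy as np

def bootstrap_ci(correct, n_boot=2000, alpha=0.05, seed=0):
    """Estimate a confidence interval for a method's accuracy by
    resampling its per-image outcomes with replacement."""
    rng = np.random.default_rng(seed)
    correct = np.asarray(correct, dtype=float)
    n = len(correct)
    # accuracy of each bootstrap resample, sorted for percentile lookup
    stats = np.sort([rng.choice(correct, n, replace=True).mean()
                     for _ in range(n_boot)])
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return correct.mean(), (lo, hi)

# toy outcomes: 1 = image correctly classified, 0 = error
outcomes = [1]*80 + [0]*20
acc, (lo, hi) = bootstrap_ci(outcomes)   # point estimate plus interval
```

The same resampling applied to two methods' outcomes gives a direct sense of whether an observed gap between them exceeds the evaluation noise.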
ILSVRC is of much larger scale than PASCAL VOC [121]. As per 2010 data, it is organized in the form of 12 subtrees with 5,247 synsets and 3.2 million images in total. As ImageNet's organization is inspired by the WordNet structure, and there are around 80,000 noun synsets in WordNet, ImageNet aims at providing the majority of those 80,000 synsets with an average of 500-1,000 clean, full-resolution images each. To evaluate the approaches, the effective bootstrapping strategy used by PASCAL VOC is employed in the ILSVRC series as well. In Table 2, we present a comparison between the PASCAL VOC and ILSVRC challenges as of 2012, as referred from [101].

Table 2 Comparison of PASCAL VOC and ILSVRC as per [101]
Aspect for comparison | PASCAL VOC | ILSVRC
Diversity of object classes | Objects carry only one class label, e.g. 'boat' for all types of boat, be it lifeboat or fireboat | Objects are further refined into subcategories, e.g. not just 'boat' but lifeboat, gondola
Chance performance of localization (CPL) | 8.8% on the validation set for 20 categories | 20.8% for all 1000 categories
Average object scale per class | 0.241 | 0.358
Average number of instances per class | 1.69 | 1.59
Clutter per class (computed as number of boxes) | 129.96 | 106.98
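Challenges such as PASCAL VOC also summarize ranked detection or classification outputs with precision-recall measures, most commonly average precision (AP). A simplified, non-interpolated sketch of AP over a ranked list follows; the scores and labels are toy values for illustration.

```python
import numpy as np

def average_precision(scores, labels):
    """Average precision of a ranked list: sort by score descending,
    then average the precision measured at each true positive
    (the simple non-interpolated form of the measure)."""
    order = np.argsort(scores)[::-1]           # highest score first
    labels = np.asarray(labels)[order]
    hits = np.cumsum(labels)                   # true positives so far
    precision = hits / np.arange(1, len(labels) + 1)
    return float(np.sum(precision * labels) / labels.sum())

# three detections: the two highest-scored are true positives
ap = average_precision([0.9, 0.8, 0.3], [1, 1, 0])   # perfect ranking
```

A perfect ranking, with all positives ahead of all negatives, yields AP = 1.0; any negative ranked above a positive pulls the value down.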
In addition to all these, various other datasets are used in the literature: GRAZ-01 by Opelt and Pinz, which contains four types of images (bikes, people, background with no bikes, background with no people); the INRIA person dataset used by Dalal and Triggs in [32, 33]; MNIST, a dataset of handwritten digits; ImageCLEF; INRIA (horses, cars); the TinyImages dataset by Torralba et al.; ETH-80; and others. The CIFAR-10 set has 6,000 examples of each of 10 classes, and the CIFAR-100 set has 600 examples of each of 100 non-overlapping classes [125]. The list we have considered is exemplary rather than exhaustive; for an exhaustive list, [127] can be explored further.

5. FUTURE RESEARCH DIRECTIONS
Object recognition is one of the most exciting research areas in the field of computer vision. The need today is to develop systems that are computationally efficient and at the same time cost effective. We suggest some future research directions which can be explored and in turn incorporated into recognition systems, at the algorithmic level or at the product level; many of these can at present be considered as ideas that may need a knowledge base from multidisciplinary fields.
• Deep learning is the current state of the art in object recognition and has produced promising results, but it suffers from the serious limitation of being resource intensive; in the absence of sophisticated hardware, DNNs cannot be adopted for object recognition. In such cases, enhancing the performance of conventional feature extraction techniques on shallow architectures can be helpful. New approaches are needed which require only shallow architectures yet remain efficient. It is also recognized that DNNs have not shown very impressive results on the task of object localization, so this area can be explored further.
• It also remains to be understood how the learning of features takes place in convolutional neural networks: what makes deep architectures achieve such high recognition accuracy?
• With the advent of mobile and other hand-held devices with very good image-capturing abilities, recognition algorithms suited to such devices are in great demand.
• Considerable work exists in the literature on action recognition, and a complete line of research continues in this direction, as the area involves many varied issues and research problems; products can be developed involving action and activity recognition from videos.
• Computer vision techniques can be a good way to build assistive technology for blind people. For example, products can be developed that observe the surroundings, generate a natural-language description of the scene, and give it as output in spoken form. This will help blind people understand their surroundings and navigate.
• Research on understanding videos from their content has already started but is still in its infancy. Generic object recognition also paves the path for research in areas such as emotion recognition, which would actually enable us to recognize the meaning of the content in a video.
• Robotics is another important field that can benefit from active object recognition. Today's robots are able to work only in well-structured, constrained environments, whereas the requirement is to develop robots that can learn, adapt, and execute their tasks in real human environments.
• Almost every device has a camera, and devices are now powerful enough to record and process live video. These videos can be exploited for real-time applications. How do we organize and personalize all of this content for the common man?
• New performance evaluation techniques are needed.
• Many rich datasets, such as ImageNet and PASCAL VOC, have been generated by the computer vision research community and made available to the public. Although these datasets hold huge numbers of images pertaining to various categories, if we are to reach the level of near-human vision capability in terms of flexibility and dynamism, these datasets must be enriched further. Novel ways of labelling the huge amount of unlabeled image data should be found, so that images annotated with ground truth can be generated and made publicly available.

6. CONCLUSION
From the literature available on the subject, it was found that demand for efficient generic object recognition systems is increasing very fast, as the spectrum of applications in which object recognition is needed is wide and rich. One major problem in generic object recognition is that the categories present in the real world are varied and huge in number. Due to this fact, training a recognition system for such a large number of categories and classes becomes a challenging task, however sophisticated the approach may be. Such systems are also expected to exhibit the property of plasticity, i.e., the ability to gradually train themselves on unseen categories, which further adds to their complexity. Systems should therefore be developed that are flexible enough to train themselves for new classes of objects. Another important issue with generic object recognition systems lies in the feature extraction and description phase.
In most approaches, the number of features obtained is too large, and the features are handcrafted. This critical limitation has been overcome by deep architectures, which in turn exploit the sophisticated hardware acceleration that has evolved recently. Approaches are now needed that make the entire setup cost effective and less resource hungry. Ideally, the recognition task should be performed at the semantic level, which would result in near-human vision systems. One of the key objectives behind this survey was to answer the research questions identified by us in Section 1. From the literature surveyed, it can be deduced that earlier work on generic object recognition put more weight on the feature extraction stage and the type of features, whereas later work gave more prominence to the type of classifier used. Recent approaches, moreover, learn features directly from the image data; this can be regarded as a very striking innovation achieved by the vision community. Ways are now needed to further enhance these approaches, and the above efforts can also be extended to 3D images and videos. As a result of this study and the referred material, a general remark can also be made about the kind of work done in the field: most papers before 2008 mainly present novel ways of modelling the object class, i.e., they emphasize novel ways of feature detection and description, whereas work presented and published in the recent past, since 2011 and with the advent of sophisticated hardware, places more emphasis on handling more categories accurately and efficiently. In this paper, the current scenario of generic object recognition is portrayed in brief, with the hope that in the near future an object recognition system will be developed
which is capable of performing vision tasks similar to the human vision system, with the least possible effort and in a cost-effective manner.

REFERENCES
[1] Bennamoun, Mohammed, and George J. Mamic. Object Recognition: Fundamentals and Case Studies. Springer Science & Business Media, 2002.
[2] Hori, Takahiro, Tetsuya Takiguchi, and Yasuo Ariki. "Generic Object Recognition Using Graph Embedding into a Vector Space." American Journal of Software Engineering and Applications 2.1 (2013): 13-18.
[3] Lowe, David G. "Object Recognition from Local Scale-Invariant Features." Proc. International Conference on Computer Vision, Corfu, Sept. 1999.
[4] Lowe, David G. "Distinctive Image Features from Scale-Invariant Keypoints." 2004.
[5] Ke, Yan, and Rahul Sukthankar. "PCA-SIFT: A More Distinctive Representation for Local Image Descriptors." Proc. CVPR 2004, Vol. 2. IEEE, 2004.
[6] Bay, Herbert, Tinne Tuytelaars, and Luc Van Gool. "SURF: Speeded Up Robust Features." Computer Vision - ECCV 2006. Springer Berlin Heidelberg, 2006. 404-417.
[7] Leibe, B., A. Leonardis, and B. Schiele. "Combined Object Categorization and Segmentation with an Implicit Shape Model." ECCV 2004 Workshop on Statistical Learning in Computer Vision, May 2004. 17-32.
[8] Thomas, Alexander, Vittorio Ferrari, Bastian Leibe, Tinne Tuytelaars, Bernt Schiele, and Luc Van Gool. "Towards Multi-View Object Class Detection." Proc. CVPR 2006. IEEE, 2006.
[9] Leibe, Bastian, Aleš Leonardis, and Bernt Schiele. "Robust Object Detection with Interleaved Categorization and Segmentation." International Journal of Computer Vision 77 (2008): 259-289.
[10] Jia, Menglei, Hua Li, Xing Xie, Zheng Chen, and Wei-Ying Ma. "Automatic Classification of Objects within Images." US Patent Application 20080037877, Microsoft Corporation (Redmond, WA, US). http://www.freepatentsonline.com/y2008/0037877.html
[11] Pisipati, Radha Krishna, Shahanaz Syed, Kishore Jonna, Subhadip Bandyopadhyay, and Rudra Narayan Narayan. "Systems and Methods for Multi-Dimensional Object Detection." US Patent Application 20140029852, 2014. http://www.freepatentsonline.com/y2014/0029852.html
[12] Besl, Paul J., and Ramesh C. Jain. "Three-Dimensional Object Recognition." ACM Computing Surveys 17.1 (1985): 75-145.
[13] Biederman, Irving. "Recognition-by-Components: A Theory of Human Image Understanding." Psychological Review 94.2 (1987): 115-147.
[14] Zhang, Xin, et al. "Object Class Detection: A Survey." ACM Computing Surveys 46.1 (2013): 10.
[15] Andreopoulos, Alexander, and John K. Tsotsos. "50 Years of Object Recognition: Directions Forward." Computer Vision and Image Understanding 117.8 (2013): 827-891.
[16] Roth, Peter M., and Martin Winter. "Survey of Appearance-Based Methods for Object Recognition." Technical Report ICG-TR-01/08, Inst. for Computer Graphics and Vision, Graz University of Technology, Austria, 2008.
[17] Agarwal, Shivani, Aatif Awan, and Dan Roth. "Learning to Detect Objects in Images via a Sparse, Part-Based Representation." IEEE Transactions on Pattern Analysis and Machine Intelligence 26.11 (2004): 1475-1490.
[18] Fergus, Robert, Pietro Perona, and Andrew Zisserman. "Object Class Recognition by Unsupervised Scale-Invariant Learning." Proc. CVPR 2003, Vol. 2. IEEE, 2003.
[19] Opelt, A., A. Pinz, and A. Zisserman. "A Boundary-Fragment Model for Object Detection." Proc. ECCV, Vol. 2, May 2006. 575-588.
[20] Wu, Bo, and Ramakant Nevatia. "Detection of Multiple, Partially Occluded Humans in a Single Image by Bayesian Combination of Edgelet Part Detectors." Proc. ICCV 2005, Vol. 1. IEEE, 2005.
[21] Opelt, Andreas, Axel Pinz, and Andrew Zisserman. "Incremental Learning of Object Detectors Using a Visual Shape Alphabet." Proc. CVPR 2006. IEEE, 2006.
[22] Opelt, Andreas, Axel Pinz, and Andrew Zisserman. "Fusing Shape and Appearance Information for Object Category Detection." 2006. eprints.pascal-network.org
[23] Opelt, Andreas, Axel Pinz, and Andrew Zisserman. "Learning an Alphabet of Shape and Appearance for Multi-Class Object Detection." International Journal of Computer Vision 80 (2008): 16-44.
[24] Grauman, Kristen, and Trevor Darrell. "Pyramid Match Kernel and Related Techniques." U.S. Patent No. 7,949,186, 24 May 2011.
[25] Zhang, Jianguo, et al.
"Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study." International Journal of Computer Vision 73.2 (2007): 213-238.
[26] Lazebnik, Svetlana, Cordelia Schmid, and Jean Ponce. "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories." Proc. CVPR 2006, Vol. 2. IEEE, 2006.
[27] Lazebnik, Svetlana, Cordelia Schmid, and Jean Ponce. "Spatial Pyramid Matching." Object Categorization: Computer and Human Vision Perspectives 3 (2009): 4.
[28] Lazebnik, Svetlana, Cordelia Schmid, and Jean Ponce. "A Discriminative Framework for Texture and Object Recognition Using Local Image Features." In Toward Category-Level Object Recognition, Springer-Verlag Lecture Notes in Computer Science, J. Ponce, M. Hebert, C. Schmid, and A. Zisserman (eds.), 2006.
[29] Lazebnik, Svetlana. "Local, Semi-Local and Global Models for Texture, Object and Scene Recognition." 2006.
[30] Ponce, J., T. L. Berg, M. Everingham, D. A. Forsyth, M. Hebert, S. Lazebnik, M. Marszalek, C. Schmid, B. C. Russell, A. Torralba, C. K. I. Williams, J. Zhang, and A. Zisserman. "Dataset Issues in Object Recognition." In Toward Category-Level Object Recognition, Springer-Verlag Lecture Notes in Computer Science, J. Ponce, M. Hebert, C. Schmid, and A. Zisserman (eds.), 2006.
[31] Dorko, Gyuri, and Cordelia Schmid. "Object Class Recognition Using Discriminative Local Features." 2005.
[32] Dalal, N., and B. Triggs. "Histograms of Oriented Gradients for Human Detection." Proc. CVPR 2005. IEEE, 2005.
[33] Dalal, N., B. Triggs, and C. Schmid. "Human Detection Using Oriented Histograms of Flow and Appearance." Proc. ECCV 2006.
[34] Dalal, Navneet. Finding People in Images and Videos. Dissertation, Institut National Polytechnique de Grenoble (INPG), 2006.
[35] Jurie, Frederic, and Bill Triggs. "Creating Efficient Codebooks for Visual Recognition." Proc. ICCV 2005, Vol. 1. IEEE, 2005.
[36] Bosch, Anna, Andrew Zisserman, and Xavier Munoz. "Representing Shape with a Spatial Pyramid Kernel." Proc. 6th ACM International Conference on Image and Video Retrieval. ACM, 2007.
[37] Watanabe, Tomoki, Satoshi Ito, and Kentaro Yokoi. "Co-occurrence Histograms of Oriented Gradients for Pedestrian Detection." Advances in Image and Video Technology. Springer Berlin Heidelberg, 2009. 37-47.
[38] Joachims, T. "A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization." Proc. ICML 1997.
[39] Csurka, Gabriella, et al. "Visual Categorization with Bags of Keypoints." Workshop on Statistical Learning in Computer Vision, ECCV, 2004.
[40] Ramanan, Amirthalingam, and Mahesan Niranjan. "A Review of Codebook Models in Patch-Based Visual Object Recognition." Journal of Signal Processing Systems 68.3 (2012): 333-352.
[41] Mikolajczyk, K., and C. Schmid. "Indexing Based on Scale Invariant Interest Points." Proc. ICCV 2001, Vol. 1. 525-531.
[42] Mikolajczyk, K., A. Zisserman, and C. Schmid. "Shape Recognition with Edge-Based Features." Proc. BMVC 2003, Vol. 2. 779-788.
[43] Mikolajczyk, K., and C. Schmid. "Scale and Affine Invariant Interest Point Detectors." International Journal of Computer Vision 60.1 (2004): 63-86.
[44] Mikolajczyk, K., T. Tuytelaars, C. Schmid, and A. Zisserman. "A Comparison of Affine Region Detectors." International Journal of Computer Vision 65.1-2 (2005): 43-72.
[45] Mikolajczyk, K., and C. Schmid. "A Performance Evaluation of Local Descriptors." IEEE Transactions on Pattern Analysis and Machine Intelligence 27.10 (2005): 1615-1630.
[46] Douze, Matthijs, et al. "Evaluation of GIST Descriptors for Web-Scale Image Search." Proc. ACM International Conference on Image and Video Retrieval. ACM, 2009.
[47] Jégou, Hervé, Matthijs Douze, and Cordelia Schmid. "Improving Bag-of-Features for Large Scale Image Search." International Journal of Computer Vision 87.3 (2010): 316-336.
[48] Tuytelaars, Tinne, and Krystian Mikolajczyk. "Local Invariant Feature Detectors: A Survey." Foundations and Trends in Computer Graphics and Vision 3.3 (2008): 177-280.
[49] Mikolajczyk, K., Bastian Leibe, and Bernt Schiele. "Multiple Object Class Detection with a Generative Model." Proc. CVPR 2006, Vol. 1. IEEE, 2006. 26-36.
[50] Jégou, Hervé, et al. "Aggregating Local Descriptors into a Compact Image Representation." Proc. CVPR 2010. IEEE, 2010.
[51] Wengert, Christian, Matthijs Douze, and Hervé Jégou. "Bag-of-Colors for Improved Image Search." Proc. 19th ACM International Conference on Multimedia. ACM, 2011.
[52] Belongie, S., J. Malik, and J. Puzicha. "Matching Shapes." Proc. ICCV 2001. IEEE, 2001.
[53] Belongie, Serge, Jitendra Malik, and Jan Puzicha. "Shape Matching and Object Recognition Using Shape Contexts." IEEE Transactions on Pattern Analysis and Machine Intelligence 24.4 (2002): 509-522.
[54] Ferencz, Andras, Erik G. Learned-Miller, and Jitendra Malik. "Building a Classification Cascade for Visual Identification from One Example." Proc. ICCV 2005. IEEE, 2005.
[55] Zhang, Hao, Alexander C. Berg, Michael Maire, and Jitendra Malik. "SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Recognition." Proc. CVPR 2006. IEEE, 2006.
[56] Ommer, Bjorn, and Jitendra Malik. "Multi-Scale Object Detection by Clustering Lines." Proc. ICCV 2009. IEEE, 2009. 484-491.
[57] Maji, Subhransu, and Jitendra Malik. "Object Detection Using a Max-Margin Hough Transform." IEEE, 2009. 1038-1045.
[58] Vidal-Naquet, Michel, and Shimon Ullman. "Object Recognition with Informative Features and Linear Classification." Proc. ICCV, Vol. 3,
2003.
[59] Fergus, Robert, Pietro Perona, and Andrew Zisserman. "Object Class Recognition by Unsupervised Scale-Invariant Learning." Proc. CVPR 2003, Vol. 2. IEEE, 2003.
[60] Epshtein, Boris, and Shimon Ullman. "Identifying Semantically Equivalent Object Fragments." Proc. CVPR 2005. IEEE, 2005.
[61] Borenstein, Eran, and Shimon Ullman. "Combined Top-Down/Bottom-Up Segmentation." IEEE Transactions on Pattern Analysis and Machine Intelligence 30.12 (2008): 2109-2125.
[62] Fei-Fei, L., R. Fergus, and P. Perona. "Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories." Proc. CVPR Workshop on Generative-Model Based Vision, 2004.
[63] Fei-Fei, Li, Rob Fergus, and Pietro Perona. "Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories." Computer Vision and Image Understanding 106.1 (2007): 59-70.
[64] www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html
[65] Fei-Fei, Li, Robert Fergus, and Pietro Perona. "One-shot learning of object categories." Pattern Analysis and Machine Intelligence, IEEE Transactions on 28.4 (2006): 594-611.
[66] Li, Li-Jia, Gang Wang, and Li Fei-Fei. "OPTIMOL: Automatic Online Picture Collection via Incremental Model Learning." IEEE, 2007.
[67] Su, Hao, Min Sun, Li Fei-Fei, and Silvio Savarese. "Learning a Dense Multi-View Representation for Detection, Viewpoint Classification and Synthesis of Object Categories." 2009 IEEE 12th International Conference on Computer Vision (ICCV).
[68] Yao, Bangpeng, and Li Fei-Fei. "Recognizing Human-Object Interactions in Still Images by Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities." IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34, No. 9, September 2012, pp. 1691-1703.
[69] Oliva, Aude, and Antonio Torralba. "Building the gist of a scene: The role of global image features in recognition." Progress in Brain Research 155 (2006): 23-36.
[70] Torralba, Antonio, Kevin P. Murphy, and William T. Freeman. "Sharing Visual Features for Multiclass and Multiview Object Detection." April 2004.
[71] Torralba, Antonio, Kevin P. Murphy, and William T. Freeman. "Sharing features: efficient boosting procedures for multiclass object detection."
[72] Oliva, Aude, and Antonio Torralba. "Modeling the shape of the scene: A holistic representation of the spatial envelope." International Journal of Computer Vision 42.3 (2001): 145-175.
[73] Torralba, Antonio, Robert Fergus, and Yair Weiss. "Small codes and large image databases for recognition." Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008.
[74] Russell, B. C., A. Torralba, and K. P. Murphy. "LabelMe: a database and web-based tool for image annotation." International Journal of Computer Vision 77 (2008): 157-173. Springer.
[75] Rassem, Taha H., and Bee Ee Khoo. "Object Class Recognition using Combination of Color SIFT Descriptors." IEEE, 2011.
[76] Dorko, Gyuri, and Cordelia Schmid. "Object Class Recognition Using Discriminative Local Features." Technical Report.
[77] Dorko, Gy., and C. Schmid. "Selection of scale-invariant parts for object class recognition." Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV'03), pp. 634-639, 2003.
[78] Serre, Thomas, Lior Wolf, Stanley Bileschi, Maximilian Riesenhuber, and Tomaso Poggio. "Robust Object Recognition with Cortex-Like Mechanisms." IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 3, March 2007, pp. 411-426.
[79] Mayurathan, B., A. Ramanan, S. Mahesan, and U. A. J. Pinidiyaarachchi. "Speeded-up and Compact Visual Codebook for Object Recognition." International Journal of Image Processing (IJIP) 7.1 (2013): 31-50.
[80] Gonzalez, Rafael C., and Richard E. Woods. Digital Image Processing. Prentice-Hall Inc., 2002.
[81] Fergus, R., L. Fei-Fei, P. Perona, and A. Zisserman. "Learning Object Categories from Google's Image Search." Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, 2005.
[82] Carreira, Joao, and Cristian Sminchisescu. "Constrained Parametric Min-Cuts for Automatic Object Segmentation." Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp. 3241-3248.
[83] Van de Sande, K. E. A., T. Gevers, and C. G. M. Snoek. "Evaluation of color descriptors for object and scene recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08), 2008.
[84] Uijlings, Jasper R. R., et al. "Selective search for object recognition." International Journal of Computer Vision 104.2 (2013): 154-171.
[85] Van de Sande, Koen E. A., et al. "Segmentation as selective search for object recognition." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
[86] Li, Fuxin, Joao Carreira, and Cristian Sminchisescu. "Object Recognition as Ranking Holistic Figure-Ground Hypotheses." Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp. 1712-1719.
[87] Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014.
[88] Carreira, Joao, et al. "Semantic segmentation with second-order pooling." Computer Vision–ECCV 2012. Springer Berlin Heidelberg, 2012. 430-443.
[89] Carreira, Joao, and Cristian Sminchisescu. "CPMC: Automatic object segmentation using constrained parametric min-cuts." Pattern Analysis and Machine Intelligence, IEEE Transactions on 34.7 (2012): 1312-1328.
[90] Li, Fuxin, Joao Carreira, and Cristian Sminchisescu. "Object recognition as ranking holistic figure-ground hypotheses." Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010.
[91] Fischler, Martin A., and Robert A. Elschlager. "The representation and matching of pictorial structures." IEEE Transactions on Computers 22.1 (1973): 67-92.
[92] Felzenszwalb, Pedro F., and Daniel P. Huttenlocher. "Pictorial structures for object recognition." International Journal of Computer Vision 61.1 (2005): 55-79.
[93] Felzenszwalb, Pedro F., et al. "Object detection with discriminatively trained part-based models." Pattern Analysis and Machine Intelligence, IEEE Transactions on 32.9 (2010): 1627-1645.
[94] Girshick, Ross B., Pedro F. Felzenszwalb, and D. McAllester. "Discriminatively trained deformable part models, release 5." (2012).
[95] Felzenszwalb, Pedro, David McAllester, and Deva Ramanan. "A discriminatively trained, multiscale, deformable part model." Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008.
[96] Felzenszwalb, Pedro F., Ross B. Girshick, and David McAllester. "Cascade object detection with deformable part models." Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010.
[97] Ferrari, Vittorio, Frederic Jurie, and Cordelia Schmid. "Accurate object detection with deformable shape models learnt from images." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007.
[98] Pentland, Alex P. "Automatic extraction of deformable part models." International Journal of Computer Vision 4.2 (1990): 107-126.
[99] Pandey, Megha, and Svetlana Lazebnik. "Scene recognition and weakly supervised object localization with deformable part-based models." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
[100] Ren, Xiaofeng, and Deva Ramanan. "Histograms of sparse codes for object detection." Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013.
[101] Yang, Yi, and Deva Ramanan. "Articulated pose estimation with flexible mixtures-of-parts." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.
[102] Bourdev, Lubomir, and Jitendra Malik. "Poselets: Body part detectors trained using 3D human pose annotations." Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009.
[103] Arbeláez, Pablo, et al. "Semantic segmentation using regions and parts." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
[104] Bourdev, Lubomir, et al. "Detecting people using mutually consistent poselet activations." Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 168-181.
[105] Arbeláez, Pablo, Bharath Hariharan, Chunhui Gu, Saurabh Gupta, Lubomir Bourdev, and Jitendra Malik. "Semantic segmentation using regions and parts." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pp. 3378-3385. IEEE, 2012.
[106] Zhu, Long, et al. "Latent hierarchical structural learning for object detection." Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010.
[107] Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." Computer Vision–ECCV 2014. Springer International Publishing, 2014. 818-833.
[108] Zhao, Wenyi, et al. "Face recognition: A literature survey." ACM Computing Surveys (CSUR) 35.4 (2003): 399-458.
[109] Yang, Ming-Hsuan, David Kriegman, and Narendra Ahuja. "Detecting faces in images: A survey." Pattern Analysis and Machine Intelligence, IEEE Transactions on 24.1 (2002): 34-58.
[110] Yamazaki, T., T. Fujikawa, and J. Katto. "Improving the performance of SIFT using Bilateral Filter and its Application to Generic Object Recognition." ICASSP 2012, IEEE, pp. 945-948.
[111] Chiu, Liang-Chi, et al. "Fast SIFT Design for Real-Time Visual Feature Extraction." Image Processing, IEEE Transactions on 22.8 (2013): 3158-3167.
[112] Kamencay, Patrik, et al. "Feature extraction for object recognition using PCA-KNN with application to medical image analysis." Telecommunications and Signal Processing (TSP), 2013 36th International Conference on. IEEE, 2013.
[113] He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." arXiv preprint arXiv:1406.4729 (2014).
[114] Goyal, Soren, and Paul Benjamin. "Object Recognition Using Deep Neural Networks: A Survey." arXiv preprint arXiv:1412.3684 (2014).
[115] Nevatia, Ramakant, and Thomas O. Binford. "Description and recognition of curved objects." Artificial Intelligence 8.1 (1977): 77-98.
[116] Fidler, Sanja, Marko Boben, and Ales Leonardis. "Learning a hierarchical compositional shape vocabulary for multi-class object representation." arXiv preprint arXiv:1408.5516 (2014).
[117] Lee, Tom, Sanja Fidler, and Sven Dickinson. "Multi-cue mid-level grouping."
[118] Russakovsky, Olga, et al. "ImageNet large scale visual recognition challenge." arXiv preprint arXiv:1409.0575 (2014).
[119] Everingham, Mark, et al. "The PASCAL visual object classes challenge: A retrospective." International Journal of Computer Vision 111.1 (2014): 98-136.
[120] Nene, Sameer A., Shree K. Nayar, and Hiroshi Murase. Columbia Object Image Library (COIL-20). Technical Report CUCS-005-96, 1996.
[121] Deng, J., W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. "ImageNet: a large-scale hierarchical image database." IEEE Computer Vision and Pattern Recognition, 2009. <http://www.image-net.org/>.
[122] Lin, Tsung-Yi, et al. "Microsoft COCO: Common objects in context." Computer Vision–ECCV 2014. Springer International Publishing, 2014. 740-755.
[123] LeCun, Yann, et al. "Backpropagation applied to handwritten zip code recognition." Neural Computation 1.4 (1989): 541-551.
[124] Humphrey, Eric J., Juan Pablo Bello, and Yann LeCun. "Moving Beyond Feature Design: Deep Architectures and Automatic Feature Learning in Music Informatics." ISMIR, 2012.
[125] Krizhevsky, Alex, and Geoffrey Hinton. "Learning multiple layers of features from tiny images." Computer Science Department, University of Toronto, Tech. Rep 1.4 (2009): 7.
[126] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
[127] http://riemenschneider.hayko.at/vision/dataset/index.php, as accessed on 12 April 2015.
[128] http://image-net.org/challenges/LSVRC/2012/analysis/, as accessed on 12 April 2015.
[129] Bengio, Yoshua, et al. "Greedy layer-wise training of deep networks." Advances in Neural Information Processing Systems 19 (2007): 153.
[130] Bhisekar, Manisha, and Prajakta Deshmane. "Image Retrieval and Face Recognition Techniques: Literature Survey." International Journal of Electronics and Communication Engineering and Technology 5(1), 2014, pp. 52-58.
[131] Almeida, Yoel E., Ashray S. Bhandare, and Aishwary P. Nipane. "Computer Vision Based Adaptive Lighting Solutions for Smart and Efficient System." International Journal of Computer Engineering and Technology 6(3), 2015, pp. 01-11.
[132] Socher, Richard, et al. "Parsing natural scenes and natural language with recursive neural networks." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.