
Be the first to like this
Published on
Learning a Structured Model for Visual Category Recognition
Abstract:
This thesis deals with the problem of estimating structure in data due to the semantic relations between data elements and leveraging this information to learn a visual model for category recognition. A visual model consists of dictionary learning, which computes a succinct representation of training data by partitioning feature space, and feature encoding, which learns a representation of each image as a combination of dictionary elements. Besides variations in lighting and pose, a key challenge of classifying a category is intracategory appearance variation. The key idea in this thesis is that feature data describing a category has latent structure due to visual content idiomatic to a category. However, popular algorithms in literature disregard this structure when computing a visual model.
Towards incorporating this structure in the learning algorithms, this thesis analyses two facets of feature data to discover relevant structure. The first is structure amongst the subspaces of the feature descriptor. Several subspace embedding techniques that use global or local information to compute a projection function are analysed. A novel entropy based measure of structure in the embedded descriptors suggests that relevant structure has local extent. The second is structure amongst the partitions of feature space. Hard partitioning of feature space leads to ambiguity in feature encoding. To address this issue, novel fuzzy logic based dictionary learning and feature encoding algorithms are employed that are able to model the local feature vectors distributions and provide performance benefits.
To estimate structure amongst subspaces, coclustering is used with a training descriptor data matrix to compute groups of subspaces. A dictionary learnt on feature vectors embedded in these multiple submanifolds is demonstrated to model data better than a dictionary learnt on feature vectors embedded in a single submanifold computed using principal components. In a similar manner, coclustering is used with encoded feature data matrix to compute groups of dictionary elements  referred to as `topics'. A topic dictionary is demonstrated to perform better than a regular dictionary of comparable size. Both these results suggest that the groups of subspaces and dictionary elements have semantic relevance.
All the methods developed here have been viewed from the unifying perspective of matrix factorization, where a data matrix is decomposed to two factor matrices which are interpreted as a dictionary matrix and a coefficient matrix. Sparse coding methods, which are currently enjoying much success, can be viewed as matrix factorization with a regularization constraint on the vectors of the dictionary or coefficient matrices. ....
Clipping is a handy way to collect important slides you want to go back to later.
Clipping is a handy way to collect and organize the most important slides from a presentation. You can keep your great finds in clipboards organized around topics.
Be the first to comment