Be the first to like this
Learning a Structured Model for Visual Category Recognition
This thesis deals with the problem of estimating structure in data due to the semantic relations between data elements and leveraging this information to learn a visual model for category recognition. A visual model consists of dictionary learning, which computes a succinct representation of training data by partitioning feature space, and feature encoding, which learns a representation of each image as a combination of dictionary elements. Besides variations in lighting and pose, a key challenge of classifying a category is intra-category appearance variation. The key idea in this thesis is that feature data describing a category has latent structure due to visual content idiomatic to a category. However, popular algorithms in literature disregard this structure when computing a visual model.
Towards incorporating this structure in the learning algorithms, this thesis analyses two facets of feature data to discover relevant structure. The first is structure amongst the sub-spaces of the feature descriptor. Several sub-space embedding techniques that use global or local information to compute a projection function are analysed. A novel entropy based measure of structure in the embedded descriptors suggests that relevant structure has local extent. The second is structure amongst the partitions of feature space. Hard partitioning of feature space leads to ambiguity in feature encoding. To address this issue, novel fuzzy logic based dictionary learning and feature encoding algorithms are employed that are able to model the local feature vectors distributions and provide performance benefits.
To estimate structure amongst sub-spaces, co-clustering is used with a training descriptor data matrix to compute groups of sub-spaces. A dictionary learnt on feature vectors embedded in these multiple sub-manifolds is demonstrated to model data better than a dictionary learnt on feature vectors embedded in a single sub-manifold computed using principal components. In a similar manner, co-clustering is used with encoded feature data matrix to compute groups of dictionary elements - referred to as `topics'. A topic dictionary is demonstrated to perform better than a regular dictionary of comparable size. Both these results suggest that the groups of sub-spaces and dictionary elements have semantic relevance.
All the methods developed here have been viewed from the unifying perspective of matrix factorization, where a data matrix is decomposed to two factor matrices which are interpreted as a dictionary matrix and a co-efficient matrix. Sparse coding methods, which are currently enjoying much success, can be viewed as matrix factorization with a regularization constraint on the vectors of the dictionary or co-efficient matrices. ....