The thesis addresses the problem of holistic recognition of printed text in the Nastalique writing style of the Urdu language. The main difficulty of the recognition process lies in the large number of classes (17,000 different possible classes in our Urdu text data). This large number of classes not only limits the run-time efficiency of many recognition algorithms, but also makes it harder to use some state-of-the-art classifiers, such as random forests, that assume a much smaller number of classes. In this work, we investigate different strategies for improving the efficiency of nearest-neighbor-based classification of Urdu ligatures by reducing its search space.
Experiments using spectral hashing show that the search space of nearest neighbor comparison can be reduced by about 50% without loss in recognition accuracy.
Further experiments demonstrate that a random forest classifier can reliably distinguish one-character ligatures from multiple-character ligatures.
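The idea of reducing the nearest neighbor search space with a hash can be illustrated with a minimal sketch. It is not the spectral hashing method used in the thesis: as a simpler stand-in, each feature dimension is thresholded at its median to form a binary code, and a query is compared only against training samples whose code lies within a small Hamming radius. The class `HashedNearestNeighbor`, the helper names, the `radius` parameter, and the synthetic four-class data are all assumptions made for illustration; the fraction of comparisons saved here says nothing about the 50% figure reported above.

```python
import random
from collections import defaultdict
from statistics import median


def fit_thresholds(vectors):
    """Per-dimension median of the training features, used as bit thresholds."""
    return [median(column) for column in zip(*vectors)]


def binary_code(vector, thresholds):
    """One bit per dimension: 1 if the value exceeds that dimension's threshold."""
    return tuple(int(x > t) for x, t in zip(vector, thresholds))


def hamming(a, b):
    """Number of positions in which two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class HashedNearestNeighbor:
    """Hypothetical 1-NN classifier that prunes candidates by binary code."""

    def __init__(self, vectors, labels, radius=1):
        self.vectors, self.labels, self.radius = vectors, labels, radius
        self.thresholds = fit_thresholds(vectors)
        # Group training sample indices by their binary code.
        self.buckets = defaultdict(list)
        for i, v in enumerate(vectors):
            self.buckets[binary_code(v, self.thresholds)].append(i)

    def candidates(self, query):
        """Indices whose code is within `radius` bits of the query's code."""
        q = binary_code(query, self.thresholds)
        idx = [i for code, members in self.buckets.items()
               if hamming(q, code) <= self.radius for i in members]
        # Fall back to exhaustive search if no bucket is close enough.
        return idx or list(range(len(self.vectors)))

    def predict(self, query):
        """Return (label of nearest candidate, number of distance comparisons)."""
        idx = self.candidates(query)
        best = min(idx, key=lambda i: sq_dist(query, self.vectors[i]))
        return self.labels[best], len(idx)


def make_demo_data(seed=0, per_class=50, sigma=0.3):
    """Synthetic stand-in for ligature features: four well-separated clusters."""
    rng = random.Random(seed)
    centers = [(2, 2, 2, 2), (-2, -2, 2, 2), (2, 2, -2, -2), (-2, -2, -2, -2)]
    vectors, labels = [], []
    for label, center in enumerate(centers):
        for _ in range(per_class):
            vectors.append([c + rng.gauss(0, sigma) for c in center])
            labels.append(label)
    return vectors, labels


vectors, labels = make_demo_data()
index = HashedNearestNeighbor(vectors, labels, radius=1)
label, n_compared = index.predict([2, 2, 2, 2])
print(f"predicted class {label} after {n_compared} of {len(vectors)} comparisons")
```

In this sketch, accuracy is preserved only when the true nearest neighbor shares a nearby code with the query, which is why the radius and the fallback to exhaustive search matter. Spectral hashing differs in how the codes are learned, but the pruning step follows the same pattern.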