Semi-automatic ground truth generation using unsupervised clustering and limited manual labeling: Application to handwritten character recognition
Szilárd Vajda, Yves Rangoni, Hubert Cecotti
Pattern Recognition Letters, 2015
6. Feature representations
• Raw pixels
– pixel intensities of the raw image
• Profiles (upper/lower/left/right)
– consider only the outer shape of the character
– e.g. the upper profile records, for each column, the distance from the top edge of the image to the closest foreground pixel
• Local Binary Patterns (LBP)
– a local, rotation-invariant texture representation
L. Heutte, T. Paquet, J.V. Moreau, Y. Lecourtier, C. Olivier, A structural/statistical feature-based vector for handwritten character recognition, Pattern Recognit. Lett. 19 (7) (1998) 629–641.
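The upper-profile feature described above can be sketched as follows; this is a minimal NumPy version (the exact thresholding and normalization used in the paper may differ):

```python
import numpy as np

def upper_profile(img, threshold=0):
    """For each column, the distance from the top edge of the image to
    the first foreground pixel (the image height if the column is empty)."""
    h, w = img.shape
    fg = img > threshold                              # foreground mask
    # argmax gives the row index of the first True per column;
    # columns with no foreground pixel fall back to h
    return np.where(fg.any(axis=0), fg.argmax(axis=0), h)

# Toy 5x4 "character": a diagonal stroke
img = np.zeros((5, 4), dtype=np.uint8)
for c in range(4):
    img[c + 1, c] = 1
print(upper_profile(img))  # [1 2 3 4]
```

The lower, left, and right profiles follow the same pattern by flipping the image or scanning along the other axis.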
7. Feature representations
• Radon transform
– takes multiple parallel-beam projections of the image from different angles
• Encoder network
– a type of deep learning architecture
– data-driven representation
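A parallel-beam projection is just a sum of pixel intensities along one direction. The sketch below computes the two axis-aligned projections (0° and 90°) in plain NumPy; projections at arbitrary angles additionally require rotating the image first (e.g. with `skimage.transform.radon`):

```python
import numpy as np

def projections_0_90(img):
    """Parallel-beam projections at 0 and 90 degrees: sums of pixel
    intensities along columns and along rows, respectively."""
    return img.sum(axis=0), img.sum(axis=1)

img = np.array([[0, 1, 0],
                [1, 1, 1],
                [0, 1, 0]], dtype=float)
p0, p90 = projections_0_90(img)
print(p0)   # [1. 3. 1.]
print(p90)  # [1. 3. 1.]
```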
8. Classifiers
• Definitions
– the number of patterns that should be assigned to the i-th class
– the number of patterns that are assigned to that class after classification
– 𝑁𝑝 = 𝑁𝑑𝑒𝑐 + 𝑁𝑟𝑒𝑗
– 𝑁𝑑𝑒𝑐: patterns that have a class assigned; 𝑁𝑟𝑒𝑗: patterns that are rejected (no class assigned)
– 𝑁+ / 𝑁−: patterns that have been correctly / incorrectly classified
• Voting scheme: consensus, majority voting
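The two voting rules and the decomposition N_p = N_dec + N_rej can be illustrated with a small sketch (the function and its tie-handling are illustrative assumptions, not the paper's exact rule):

```python
from collections import Counter

def majority_vote(labels, consensus=False):
    """Combine the predictions of several classifiers for one pattern.
    consensus=True: all classifiers must agree, otherwise reject (None).
    consensus=False: plain majority voting; reject on a tie."""
    counts = Counter(labels).most_common()
    if consensus:
        return labels[0] if len(counts) == 1 else None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None                        # tie -> rejected
    return counts[0][0]

# Three patterns, each voted on by three classifiers
votes = [["3", "3", "3"], ["3", "3", "8"], ["3", "8", "5"]]
decisions = [majority_vote(v) for v in votes]
n_dec = sum(d is not None for d in decisions)
n_rej = sum(d is None for d in decisions)
assert n_dec + n_rej == len(votes)         # N_p = N_dec + N_rej
print(decisions, n_dec, n_rej)  # ['3', '3', None] 2 1
```

Consensus voting rejects more patterns but the kept labels are more reliable, which matters when the labels feed a downstream classifier.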
9. Classifiers
• Unsupervised clustering
– K-means clustering (Lloyd's algorithm)
– Self-Organizing Map (SOM): a neural network trained in an unsupervised fashion to produce a two-dimensional mapping of the input data
– Growing Neural Gas (GNG): imposes no constraints on the topology, unlike the SOM
• Supervised classification
– the k-nearest neighbor (k-NN) classifier
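A minimal k-NN classifier, as used here for the supervised step, can be written in a few lines of NumPy (Euclidean distance and majority vote; the paper's distance metric and k are not assumed):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Label of x by majority vote among its k nearest training points."""
    d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
    nearest = y_train[np.argsort(d)[:k]]      # labels of the k closest
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[np.argmax(counts)]

# Toy 2-D data: two well-separated classes
X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.5])))  # 0
print(knn_predict(X, y, np.array([5.1, 5.5])))  # 1
```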
10. Classifiers
• Evaluation
– measures that combine inter-class and intra-class variances
– measure the reliability of the labeling strategy
– X: total number of vectors to be clustered
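One common way to combine inter-class and intra-class variances is their ratio: tight clusters that are far apart score high. The sketch below implements such a generic variance-ratio index; it is not necessarily the exact measure used in the paper:

```python
import numpy as np

def variance_ratio(X, labels):
    """Between-class variance divided by within-class variance.
    Larger values indicate tighter, better-separated clusters.
    (A generic index, assumed for illustration.)"""
    mean = X.mean(axis=0)
    between = within = 0.0
    for c in np.unique(labels):
        Xc = X[labels == c]
        centroid = Xc.mean(axis=0)
        between += len(Xc) * np.sum((centroid - mean) ** 2)
        within += np.sum((Xc - centroid) ** 2)
    return between / within

# Two compact, well-separated clusters -> large ratio
X = np.array([[0., 0.], [0.1, 0.], [5., 5.], [5.1, 5.]])
labels = np.array([0, 0, 1, 1])
print(variance_ratio(X, labels))
```

Scaling this ratio by the numbers of clusters and samples gives the classic Calinski–Harabasz index.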
11. Dataset
• MNIST
– Arabic digits
– 10 classes (0, 1, …, 9)
– 60,000 training / 10,000 test images
• Lampung
– a multi-writer handwritten collection produced by 82 high school students from Bandar Lampung, Indonesia
– 20 character classes
– 23,447 characters for training
– 7,853 characters for testing
18. Results
• Classification performance under different voting schemes
– a fully connected multi-layer perceptron (MLP) classifier
– accuracies: 96.69 / 96.74 / 96.77
– the network is more sensitive to samples with wrong labels than the k-NN classifier
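For reference, the forward pass of a fully connected one-hidden-layer MLP can be sketched as below. The layer sizes (784 inputs for a flattened 28×28 MNIST digit, 100 hidden units, 10 classes) and the sigmoid activation are illustrative assumptions, not the architecture reported in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer with sigmoid activation, softmax output."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))  # hidden activations
    z = W2 @ h + b2                           # class scores
    e = np.exp(z - z.max())                   # numerically stable softmax
    return e / e.sum()                        # class probabilities

n_in, n_hidden, n_classes = 784, 100, 10      # MNIST-sized layers (assumed)
W1 = rng.normal(0, 0.01, (n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.01, (n_classes, n_hidden))
b2 = np.zeros(n_classes)

x = rng.random(n_in)                          # stand-in for a flattened digit
p = mlp_forward(x, W1, b1, W2, b2)
print(p.argmax(), round(float(p.sum()), 6))
```

Because the weights are fit to the (possibly noisy) labels by gradient descent, mislabeled samples shift the decision boundaries directly, which is consistent with the sensitivity noted above.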
19. Conclusion
• Semi-automatic labeling scheme with minimal human
involvement.
• The labels discovered by this scheme are evaluated with a k-NN classifier and compared against randomly selected labeled samples and the fully labeled dataset.