In this paper we propose a multiple classifier system for handwritten Arabic alphabet recognition and investigate whether it achieves a substantial increase in recognition accuracy compared with a single feature-based classifier system.
1. Boosting
Boosting can be implemented in three different ways:
Boosting by filtering: the training set is filtered by a set of weak learners. This approach requires a large training set.
Boosting by re-sampling: training examples are re-sampled, which overcomes the need for a large training set.
Boosting by re-weighting: weak learners are still used, but the training examples are assumed to carry weights.
a. Boosting by filtering
The committee consists of three experts (classifiers) trained as follows (a sketch of the filtering steps appears after this procedure):
1. The first expert is trained on N1 patterns.
2. The trained first expert is used to filter another set of patterns as follows:
Flip a fair coin to generate a random guess.
If the result is heads, pass new patterns through the first expert and discard correctly classified patterns until a pattern is misclassified. That pattern is added to the training set for the second expert.
If the result is tails, pass new patterns through the first expert and discard incorrectly classified patterns until a pattern is classified correctly. That pattern is added to the training set for the second expert.
Continue this process until a total of N1 patterns have been filtered by the first expert, then use this training set to train the second expert.
3. Once the second expert is trained, a third training set is formed for the third expert as follows:
Pass a new pattern through experts 1 and 2. If the two agree on a decision, discard the pattern. If they disagree, add the pattern to the training set for the third expert.
Continue this process until a total of N1 patterns have been filtered jointly by the first and second experts.
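As a concrete illustration, the following minimal Python sketch implements the two filtering steps above. The expert objects with a predict(pattern) method, the stream of labelled patterns, and the function names are assumptions made for this sketch, not part of the original description.

import random

def filter_for_second_expert(expert_1, pattern_stream, n1):
    # Build the second expert's training set by coin-flip filtering.
    # expert_1: an already trained classifier with a predict(pattern) method (assumed interface).
    # pattern_stream: an iterator yielding (pattern, label) pairs of new data.
    # n1: number of patterns to collect (same size as the first training set).
    training_set_2 = []
    while len(training_set_2) < n1:
        want_misclassified = random.random() < 0.5  # fair coin: heads -> keep a mistake
        for pattern, label in pattern_stream:
            correct = (expert_1.predict(pattern) == label)
            # Heads: discard correctly classified patterns until one is misclassified.
            # Tails: discard misclassified patterns until one is classified correctly.
            if correct != want_misclassified:
                training_set_2.append((pattern, label))
                break
        else:
            break  # stream exhausted before n1 patterns were collected
    return training_set_2

def filter_for_third_expert(expert_1, expert_2, pattern_stream, n1):
    # Collect patterns on which the first two trained experts disagree.
    training_set_3 = []
    for pattern, label in pattern_stream:
        if expert_1.predict(pattern) != expert_2.predict(pattern):
            training_set_3.append((pattern, label))
            if len(training_set_3) == n1:
                break
    return training_set_3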
The generated experts can be combined by:
Simple voting: if the first and second experts agree, their common decision is used. Otherwise, the decision of the third expert is used.
Sum rule: the outputs of all the experts are added together, and the final decision (class label) is based on the summed output.
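A minimal sketch of these two combination rules follows; the predict(pattern) and predict_scores(pattern) methods are assumed interfaces (the latter returning one score per class), used here only for illustration.

import numpy as np

def simple_voting(expert_1, expert_2, expert_3, pattern):
    # If the first two experts agree, use their shared decision; otherwise defer to the third.
    d1, d2 = expert_1.predict(pattern), expert_2.predict(pattern)
    return d1 if d1 == d2 else expert_3.predict(pattern)

def sum_rule(experts, pattern):
    # Add the experts' per-class score vectors and pick the highest-scoring class.
    total = sum(np.asarray(e.predict_scores(pattern)) for e in experts)
    return int(np.argmax(total))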
b. Boosting by re-sampling
The practical limitation of boosting by filtering is that it requires a large training set. This limitation can be overcome using re-sampling, as in AdaBoost. AdaBoost differs from other boosting techniques in that:
It adjusts adaptively to the errors of the weak hypotheses returned by the weak learning model.
The bound on its performance depends only on the performance of the weak learning model on the distributions that are actually generated during the learning process.
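To make the re-weighting idea concrete, here is a minimal sketch of an AdaBoost-style loop for binary labels in {-1, +1}; scikit-learn decision stumps stand in for the weak learning model, and all names are illustrative assumptions rather than the paper's implementation.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):
    # X: feature matrix; y: numpy array of labels in {-1, +1}.
    n = len(y)
    w = np.full(n, 1.0 / n)                # initial distribution over training examples
    hypotheses, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y])         # weighted error of this weak hypothesis
        if err <= 0 or err >= 0.5:         # stop if the weak learner is perfect or no better than chance
            break
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)     # concentrate the distribution on misclassified examples
        w /= w.sum()
        hypotheses.append(stump)
        alphas.append(alpha)

    def predict(X_new):
        # Weighted vote of the accumulated weak hypotheses.
        scores = sum(a * h.predict(X_new) for a, h in zip(alphas, hypotheses))
        return np.sign(scores)

    return predict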
Boosting generates a hypothesis whose error rate is small by combining many hypotheses whose error rates may be large. It is useful on learning problems that have one of the following properties:
The observed examples tend to have varying degrees of difficulty. For such problems, the boosting algorithm tends to generate distributions that concentrate on the harder examples.
The learning algorithm is sensitive to changes in the training examples, so that significantly different hypotheses are generated for different training sets. This is related to the reduction in variance achieved by boosting.
2. Stacked Generalization
Stacked generalization is a layered architecture framework:
The classifiers at Level 0 receive the original data as input, and each classifier outputs a prediction for its own subproblem.
Each successive layer receives as input the predictions of the layer immediately preceding it.
A single classifier at the top level outputs the final prediction.
Stacked generalization attempts to minimize the generalization error by using the classifiers in the higher layers to learn the types of errors made by the classifiers immediately below them.
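A minimal sketch of this prediction-time flow for a two-layer stack, assuming classifier objects with a predict method and numeric outputs:

import numpy as np

def stacked_predict(level0_classifiers, level1_classifier, pattern):
    # Level 0 classifiers see the original pattern; the Level 1 classifier sees
    # only the vector of their predictions and outputs the final class.
    level0_outputs = np.array([clf.predict(pattern) for clf in level0_classifiers])
    return level1_classifier.predict(level0_outputs.reshape(1, -1))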
The learning phase begins by training each of the Level 0 classifiers using leave-one-out cross-validation: each pattern in the training set is omitted in turn and the remaining patterns are used for training. After training, the left-out pattern is classified, and a vector is formed from the predictions of the Level 0 classifiers together with the actual class of that pattern.
The Level 1 classifier is then trained on the collection of vectors generated by the Level 0 classifiers.
Training the second and subsequent levels requires that large quantities of data be available. The scheme is also prone to error caused by unrealistic confidence from certain base classifiers.
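The following sketch shows how the Level 1 training set could be built with leave-one-out cross-validation, assuming scikit-learn style estimators and numeric class labels; the function and variable names are illustrative, not the paper's implementation.

import numpy as np
from sklearn.base import clone
from sklearn.model_selection import LeaveOneOut

def build_level1_training_set(level0_models, X, y):
    # For each left-out pattern, every Level 0 model is trained on the remaining
    # patterns, and its prediction on the left-out pattern becomes one meta-feature.
    meta_features = np.zeros((len(y), len(level0_models)))
    for train_idx, test_idx in LeaveOneOut().split(X):
        for j, model in enumerate(level0_models):
            fitted = clone(model).fit(X[train_idx], y[train_idx])
            meta_features[test_idx, j] = fitted.predict(X[test_idx])
    # The Level 1 classifier is then trained on (meta_features, y).
    return meta_features, y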
3. Mixture of Experts
Mixture of experts (ME) is the classical adaptive ensemble method:
A gating network generates a partition of the feature space into different regions, with one expert in the ensemble responsible for generating the correct output within each region.
The experts in the ensemble and the gating network are trained simultaneously, which can be performed efficiently with the Expectation-Maximization (EM) algorithm.
ME can be extended to a multi-level hierarchical structure in which each component is itself an ME; in this case, a linear network can be used for the terminal classifiers.
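As a rough illustration of how the gate partitions the feature space, here is a minimal forward pass for an ME with linear experts and a softmax gating network; all parameter names and shapes are assumptions, and in practice the gate and experts would be trained jointly, e.g. with EM.

import numpy as np

def mixture_of_experts_output(gating_weights, expert_weights, x):
    # gating_weights: (n_experts, d) parameters of the gating network.
    # expert_weights: list of (n_classes, d) matrices, one per linear expert.
    gate_scores = gating_weights @ x
    gate = np.exp(gate_scores - gate_scores.max())
    gate /= gate.sum()                                           # softmax: soft partition of the feature space
    expert_outputs = np.stack([W @ x for W in expert_weights])   # (n_experts, n_classes)
    return gate @ expert_outputs                                 # gate-weighted combination of expert outputs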