I study classification problems in a standard learning framework in which an unsupervised learner creates a discriminant function over each class, and an observation is labeled according to which discriminant function returns the highest value on that observation. I examine whether this approach gains a significant advantage over traditional discriminant techniques.
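As an illustrative sketch of this framework (not the construction studied here; the Gaussian kernel density estimator and all names below are placeholders assumed for the example), each class's observations are modelled separately by an unsupervised density estimate, and a new observation receives the label of the discriminant returning the highest value:

    import numpy as np
    from scipy.stats import gaussian_kde

    def fit_discriminants(observations_by_class):
        # One unsupervised density estimate per class, fit only on that
        # class's own observations (a Gaussian KDE stands in for whatever
        # discriminant function the unsupervised learner produces).
        return {label: gaussian_kde(np.asarray(obs).T)
                for label, obs in observations_by_class.items()}

    def classify(discriminants, x):
        # The observation is labeled by the discriminant returning the
        # highest value on it.
        return max(discriminants, key=lambda label: float(discriminants[label](x)[0]))

    # Usage: label = classify(fit_discriminants({"a": xs_a, "b": xs_b}), x_new)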
Probably Approximately Correct (PAC) learning of distributions over class labels under L1 distance or KL-divergence is shown to imply PAC classification in this framework. I determine bounds on the regret associated with the resulting classifier, taking into account the possibility of variable misclassification penalties, and demonstrate the advantage of estimating the a posteriori probability distributions over class labels in the setting of Optical Character Recognition.
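To make the role of variable misclassification penalties concrete, a standard minimum-expected-cost decision rule over estimated a posteriori class probabilities looks as follows (a generic sketch, not the exact classifier analysed above; the array names are assumptions for illustration):

    import numpy as np

    def minimum_risk_label(posteriors, penalty):
        # posteriors[k]: estimated a posteriori probability of class k
        # penalty[k, j]: penalty for predicting j when the true class is k
        expected_penalty = np.asarray(posteriors) @ np.asarray(penalty)
        return int(np.argmin(expected_penalty))

    # With a 0/1 penalty matrix this reduces to predicting the most probable
    # class; unequal penalties shift the decision boundary accordingly.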
Unsupervised learners can be used to learn a class of probabilistic concepts (stochastic rules denoting the probability that an observation has a positive label in a 2-class setting). I demonstrate a situation where unsupervised learners can be used even when it is hard to learn distributions over class labels; in this case the discriminant functions do not estimate the class probability densities.
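For concreteness, a probabilistic concept in the above sense assigns each observation a probability of carrying the positive label; a toy sketch (the particular rule c below is invented purely for illustration):

    import random

    def sample_label(c, x):
        # c is a probabilistic concept: c(x) is the probability that the
        # observation x receives the positive label; the label itself is a
        # coin flip with that bias.
        return 1 if random.random() < c(x) else 0

    # Toy stochastic rule on [0, 1]: larger x is more likely to be positive.
    c = lambda x: min(max(x, 0.0), 1.0)
    labels = [sample_label(c, 0.7) for _ in range(5)]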
Finally, I use a standard state-merging technique to PAC-learn a class of probabilistic automata. The results show that by learning the distribution over outputs under the weaker L1 distance, rather than KL-divergence, it is possible to learn without knowledge of the expected length of an output. It is also shown that, for a restricted class of these automata, learning under L1 distance is equivalent to learning under KL-divergence.
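The two criteria contrasted here can be stated concretely for distributions on a finite output set (a simplifying assumption for the sketch; the automata of interest generate distributions over strings):

    import numpy as np

    def l1_distance(p, q):
        # L1 (variation) distance between two distributions on the same support.
        return float(np.sum(np.abs(np.asarray(p, float) - np.asarray(q, float))))

    def kl_divergence(p, q):
        # KL-divergence D(p || q).  It is infinite whenever the estimate q puts
        # zero mass where the target p does not, one reason learning under KL
        # is a more demanding requirement than learning under L1 distance.
        p, q = np.asarray(p, float), np.asarray(q, float)
        support = p > 0
        if np.any(q[support] == 0):
            return float("inf")
        return float(np.sum(p[support] * np.log(p[support] / q[support])))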