The “closeness” between documents is calculated as the correlation between the vectors that represent them, using measures such as the cosine of the angle between the two vectors.
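A minimal Python sketch of the cosine measure, assuming each document is already represented as a term-frequency vector over a shared vocabulary (the vectors and vocabulary below are hypothetical):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length term vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention: an all-zero vector is dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical term counts over the vocabulary ["data", "mining", "cluster"]
doc1 = [2, 1, 0]
doc2 = [1, 1, 1]
print(round(cosine_similarity(doc1, doc2), 3))  # → 0.775
```

A value of 1 means the two vectors point in the same direction (identical term proportions), and 0 means the documents share no terms.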
Build a classifier on each bootstrap sample set.
Bagging Algorithm
Let k be the number of bootstrap sample sets
For i = 1 to k do
    Create a bootstrap sample D_i of size N
    Train a (base) classifier C_i on D_i
End for
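The loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: labels are in {-1, +1}, each example has one numeric feature, the base learner is a hypothetical decision stump (`train_stump`), and the k classifiers are combined by majority vote, the usual way bagging produces its final prediction.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) examples uniformly at random, with replacement."""
    return [rng.choice(data) for _ in range(len(data))]


def train_stump(sample):
    """Hypothetical base learner: a 1-D threshold classifier (decision stump)
    chosen to minimize the number of errors on `sample`."""
    xs = sorted({x for x, _ in sample})
    # Candidate thresholds: below all values, between neighbours, above all values
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    best = None
    for t in candidates:
        for sign in (1, -1):
            errors = sum(1 for x, y in sample if (sign if x > t else -sign) != y)
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    _, best_t, best_sign = best
    return lambda x: best_sign if x > best_t else -best_sign


def bagging(data, k, seed=0):
    """Train k base classifiers C_i, each on its own bootstrap sample D_i of size N."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(k)]


def predict(classifiers, x):
    """Combine the base classifiers by majority vote."""
    votes = Counter(c(x) for c in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: x < 5 is class -1, x >= 5 is class +1
data = [(x, -1) for x in range(5)] + [(x, 1) for x in range(5, 10)]
models = bagging(data, k=11)
print(predict(models, 0), predict(models, 9))
```

Choosing an odd k avoids ties when there are only two classes.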
Given (x_j, y_j): a set of N training examples (j = 1, …, N)
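The error rate of a base classifier C_i is the weighted fraction of training examples it misclassifies:
$$\varepsilon_i = \sum_{j=1}^{N} w_j \, I\big(C_i(x_j) \neq y_j\big),$$
where w_j is the weight of example (x_j, y_j), the weights sum to 1, and I(p) = 1 if p is true, and 0 otherwise. The importance of a classifier C_i is:
$$\alpha_i = \frac{1}{2} \ln\left(\frac{1 - \varepsilon_i}{\varepsilon_i}\right)$$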
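The error rate and importance can be computed directly. A minimal Python sketch, assuming labels in {-1, +1} and the standard AdaBoost definitions ε_i = Σ_j w_j I(C_i(x_j) ≠ y_j) and α_i = ½ ln((1 − ε_i)/ε_i); the function names are hypothetical:

```python
import math


def weighted_error(weights, predictions, labels):
    """epsilon_i: weight of the misclassified examples divided by the total weight
    (equal to the plain weighted sum when the weights already sum to 1)."""
    wrong = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    return wrong / sum(weights)


def importance(eps):
    """alpha_i = 0.5 * ln((1 - eps) / eps); defined only for 0 < eps < 1."""
    if not 0.0 < eps < 1.0:
        raise ValueError("error rate must lie strictly between 0 and 1")
    return 0.5 * math.log((1.0 - eps) / eps)


# Four equally weighted examples; the classifier gets the second one wrong
eps = weighted_error([0.25] * 4, [1, 1, -1, -1], [1, -1, -1, -1])
print(eps, importance(eps))
```

The importance is 0 when ε_i = 0.5 (no better than guessing) and grows as ε_i shrinks, so more accurate classifiers get a larger say.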
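AdaBoosting Algorithm
The weight update mechanism:
$$w_j^{(i+1)} = \frac{w_j^{(i)}}{Z_i} \times \begin{cases} e^{-\alpha_i} & \text{if } C_i(x_j) = y_j \\ e^{\alpha_i} & \text{if } C_i(x_j) \neq y_j \end{cases}$$
where $Z_i$ is the normalization factor that ensures $\sum_{j} w_j^{(i+1)} = 1$, and $w_j^{(i)}$ is the weight of example (x_j, y_j) during the i-th round.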
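One round of the weight update can be sketched in Python, assuming labels in {-1, +1} and the standard rule: multiply each weight by e^(−α_i) if C_i classified the example correctly and by e^(α_i) otherwise, then divide by Z_i. The function name is hypothetical.

```python
import math


def update_weights(weights, predictions, labels, alpha):
    """Scale by exp(-alpha) if correct, exp(alpha) if wrong, then divide by Z_i."""
    raw = [w * math.exp(-alpha if p == y else alpha)
           for w, p, y in zip(weights, predictions, labels)]
    z = sum(raw)  # normalization factor Z_i, so the new weights sum to 1
    return [r / z for r in raw]


# A toy round: one of four examples is wrong, so eps = 0.25 and alpha = 0.5 * ln(3)
new_w = update_weights([0.25] * 4, [1, 1, -1, -1], [1, -1, -1, -1], 0.5 * math.log(3))
print([round(w, 4) for w in new_w])  # → [0.1667, 0.5, 0.1667, 0.1667]
```

In this example the misclassified example's weight rises from 0.25 to 0.5, so the next sampled training set is more likely to contain it.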
AdaBoosting Algorithm
Let k be the number of boosting rounds and D the set of all N examples
Initialize the weights for all N examples: w_j = 1/N
For i = 1 to k do
    Create training set D_i by sampling (with replacement) from D according to the weights w
    Train a base classifier C_i on D_i
    Apply C_i to all examples in the original set D
    Update the weight of each example according to the weight-update equation, using the importance α_i of C_i
End for
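The whole loop can be sketched in Python. This is a minimal illustration, not a definitive implementation, and it rests on several assumptions: labels are in {-1, +1}, the base learner is a hypothetical one-feature decision stump (`train_stump`), a round whose error exceeds 0.5 is discarded and the weights are reset to 1/N, a perfect classifier's error is clipped to a small constant to keep α_i finite, and the final prediction is the weighted vote sign(Σ_i α_i C_i(x)), the usual AdaBoost combination for two classes.

```python
import math
import random


def train_stump(sample):
    """Hypothetical base learner: a 1-D threshold classifier minimizing errors on `sample`."""
    xs = sorted({x for x, _ in sample})
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    best = None
    for t in candidates:
        for sign in (1, -1):
            errors = sum(1 for x, y in sample if (sign if x > t else -sign) != y)
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    _, best_t, best_sign = best
    return lambda x: best_sign if x > best_t else -best_sign


def adaboost(data, k, seed=0):
    """Run k boosting rounds; returns a list of (alpha_i, C_i) pairs."""
    rng = random.Random(seed)
    n = len(data)
    labels = [y for _, y in data]
    w = [1.0 / n] * n  # initialize the weights for all N examples
    ensemble = []
    for _ in range(k):
        d_i = rng.choices(data, weights=w, k=n)  # sample D_i from D according to w
        c_i = train_stump(d_i)                   # train base classifier C_i on D_i
        preds = [c_i(x) for x, _ in data]        # apply C_i to every example in D
        eps = sum(wj for wj, p, y in zip(w, preds, labels) if p != y)
        if eps > 0.5:
            w = [1.0 / n] * n  # assumption: discard C_i and reset the weights
            continue
        eps = max(eps, 1e-10)  # assumption: keep alpha finite for a perfect C_i
        alpha = 0.5 * math.log((1 - eps) / eps)
        raw = [wj * math.exp(-alpha if p == y else alpha)
               for wj, p, y in zip(w, preds, labels)]
        z = sum(raw)  # normalization factor Z_i
        w = [r / z for r in raw]
        ensemble.append((alpha, c_i))
    return ensemble


def predict(ensemble, x):
    """Weighted vote: the sign of sum_i alpha_i * C_i(x)."""
    score = sum(alpha * c(x) for alpha, c in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: five negatives at x = 0, five positives at x = 10
data = [(0.0, -1)] * 5 + [(10.0, 1)] * 5
model = adaboost(data, k=10)
print(predict(model, 0.0), predict(model, 10.0))
```

Unlike bagging, each round's sample depends on the previous rounds through w, so the rounds cannot be run independently.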
Two “independent” views: split the features into two sets.
Train a classifier on each view.
Each classifier labels unlabeled examples that are then used to train the other classifier.
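A minimal Python sketch of this co-training loop, under illustrative assumptions: each example has two features, view 1 is the first feature and view 2 the second, labels are in {-1, +1}, the per-view learner is a hypothetical nearest-centroid classifier whose confidence is the gap between the two centroid distances, and each classifier moves one most-confident example per round into the other classifier's training set. Other variants add several examples per round to a single shared labeled set.

```python
def centroid_classifier(labeled, view):
    """Hypothetical per-view learner: nearest class centroid on feature `view`."""
    mu_pos = sum(x[view] for x, y in labeled if y == 1) / sum(1 for _, y in labeled if y == 1)
    mu_neg = sum(x[view] for x, y in labeled if y == -1) / sum(1 for _, y in labeled if y == -1)

    def classify(x):
        d_pos, d_neg = abs(x[view] - mu_pos), abs(x[view] - mu_neg)
        label = 1 if d_pos <= d_neg else -1
        return label, abs(d_pos - d_neg)  # (label, confidence)

    return classify


def co_train(labeled, unlabeled, rounds):
    """labeled: list of ((f1, f2), y); unlabeled: list of (f1, f2)."""
    train1 = list(labeled)  # training set of the view-1 classifier
    train2 = list(labeled)  # training set of the view-2 classifier
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        c1 = centroid_classifier(train1, view=0)
        c2 = centroid_classifier(train2, view=1)
        # Classifier 1 labels its most confident example, which trains classifier 2
        x1 = max(pool, key=lambda x: c1(x)[1])
        pool.remove(x1)
        train2.append((x1, c1(x1)[0]))
        if not pool:
            break
        # Classifier 2 labels its most confident example, which trains classifier 1
        x2 = max(pool, key=lambda x: c2(x)[1])
        pool.remove(x2)
        train1.append((x2, c2(x2)[0]))
    return centroid_classifier(train1, view=0), centroid_classifier(train2, view=1)


# Hypothetical data: one labeled example per class, four unlabeled examples
labeled = [((0.0, 0.0), -1), ((10.0, 10.0), 1)]
unlabeled = [(1.0, 2.0), (2.0, 1.0), (9.0, 8.0), (8.0, 9.0)]
f1, f2 = co_train(labeled, unlabeled, rounds=2)
print(f1((1.5, 5.0))[0], f2((5.0, 8.5))[0])
```

The approach relies on the two views being reasonably independent, so that one classifier's confident labels give the other classifier genuinely new information.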
[Figure: Co-Training Approach. Feature set X = (X1, X2) is split into subsets X1 and X2. Starting from labeled example set L, Classification Model One is trained on subset X1 and Classification Model Two on subset X2; each model classifies unlabeled data, producing new labeled data sets 1 and 2 that are fed back into training.]
A company has a database with details of its customers (positive examples) but no information on people who are not its customers, i.e., no negative examples.
It wants to find people who are similar to its customers for marketing.
It buys a database with details of people; which of them may be potential customers?
Existing 2-step strategy
Step 1: Using the positive set P, identify a set of reliable negative examples RN in the unlabeled set U; the remaining unlabeled examples form Q = U - RN.
Step 2: Build the final classifier either iteratively, using P, RN and Q, or using only P and RN.
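A minimal Python sketch of the two steps, under illustrative assumptions: examples are numeric feature tuples, P plays the role of the company's customers and U the purchased database. Step 1 uses a simplified centroid rule in the spirit of the Rocchio technique (an unlabeled example is a reliable negative if it is closer to the centroid of U than to the centroid of P), which is only one of several possible Step 1 techniques. Step 2 takes the "only P and RN" option with a hypothetical nearest-centroid classifier. All function names are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def step1_reliable_negatives(P, U):
    """Return (RN, Q): RN are U examples closer to the U centroid than to the P centroid."""
    cp, cu = centroid(P), centroid(U)
    rn = [u for u in U if math.dist(u, cu) < math.dist(u, cp)]
    q = [u for u in U if u not in rn]  # Q = U - RN
    return rn, q


def step2_classifier(P, RN):
    """Build a nearest-centroid classifier from P (positive) and RN (negative) only."""
    cp, cn = centroid(P), centroid(RN)
    return lambda x: 1 if math.dist(x, cp) <= math.dist(x, cn) else -1


# Hypothetical data: three known customers and five unlabeled people
P = [(9, 9), (10, 10), (9, 10)]
U = [(0, 0), (1, 0), (0, 1), (9, 9.5), (1, 1)]
rn, q = step1_reliable_negatives(P, U)
clf = step2_classifier(P, rn)
print(len(rn), q, clf((8, 8)), clf((2, 2)))
```

The unlabeled example near the customers ends up in Q rather than RN, which is exactly the kind of potential customer the marketing scenario is looking for.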