AdaBoost

Combining weak classifiers to form a strong one.


  1. AdaBoost Classifier. Hank, 2013/06/26
  2. Contents: Concept, Code Tracing, Haar + AdaBoost, Notice, Usage, Appendix
  3. Concept: Classifier. Training procedure: give +ve and -ve examples to the system, and the system will learn to classify an unknown input. E.g. give pictures of faces (+ve examples) and non-faces (-ve examples) to train the system. Detection procedure: input an unknown sample (e.g. an image) and the system will tell you whether it is a face or not. (Figure: face vs. non-face examples.)
  4. First, let us learn what a weak classifier h( ) is. A line v = mu + c (equivalently v - mu = c) with gradient m and intercept c splits the (u, v) plane: any point in the "gray" area satisfies v - mu < c, and any point in the "white" area satisfies v - mu > c.
      Case 1: if a point x = [u, v] is in the gray area, then h(x) = 1, otherwise h(x) = -1. This can be written as: h(x) = 1 if v - mu < c (m, c are given constants), -1 otherwise.
      Case 2: if a point x = [u, v] is in the white area, then h(x) = 1, otherwise h(x) = -1. This can be written as: h(x) = 1 if -(v - mu) < -c, -1 otherwise.
      At time t, cases 1 and 2 are combined into one equation by a polarity p_t \in {1, -1} that controls which case you want to use:
      h_t(x) = 1 if p_t f_t(x) < p_t c_t, -1 otherwise   ... (equation i)
      where f_t is the function f_t(x = [u, v]) = v - m_t u; m_t and c_t are constants, and u, v are variables.
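      To make equation (i) concrete, here is a minimal C++ sketch of such a line-based weak classifier (my own illustration, not taken from the slides; the struct name and the sample values are hypothetical):

          #include <cstdio>

          // Sketch of the weak classifier h_t(x) from equation (i):
          // h(x) = +1 if p * f(x) < p * c, else -1, with f(u, v) = v - m*u.
          struct LineWeakClassifier {
              double m;   // gradient of the line v = m*u + c
              double c;   // intercept / threshold
              int    p;   // polarity, +1 or -1, selects which side is labelled +1

              int classify(double u, double v) const {
                  double f = v - m * u;               // f(x) = v - m*u
                  return (p * f < p * c) ? +1 : -1;   // equation (i)
              }
          };

          int main() {
              LineWeakClassifier h{0.0, 1.0, +1};     // horizontal line v = 1; the "gray" side v < 1 is labelled +1
              std::printf("%d %d\n", h.classify(0.5, 0.2), h.classify(0.5, 3.0)); // prints: 1 -1
              return 0;
          }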
  5. AdaBoost (Adaptive Boosting). Instead of resampling, it uses training-set re-weighting: each training sample carries a weight that determines its probability of being selected for a training set. AdaBoost is an algorithm for constructing a "strong" classifier as a linear combination of "simple" "weak" classifiers. The final classification is based on a weighted vote of the weak classifiers.
  6. Concept. Weak learners are drawn from the family of lines; a line h with p(error) = 0.5 is at chance. Each data point has a class label y_t = +1 or -1 and a weight w_t = 1.
  7. Concept. This one seems to be the best. Each data point has a class label y_t = +1 or -1 and a weight w_t = 1. This is a "weak classifier": it performs slightly better than chance.
  8. Concept. We set a new problem for which the previous weak classifier performs at chance again. Each data point has a class label y_t = +1 or -1; we update the weights: w_t <- w_t exp{-y_t H_t}.
  9. Concept. We set a new problem for which the previous weak classifier performs at chance again. Each data point has a class label y_t = +1 or -1; we update the weights: w_t <- w_t exp{-y_t H_t}.
  10. Concept. We set a new problem for which the previous weak classifier performs at chance again. Each data point has a class label y_t = +1 or -1; we update the weights: w_t <- w_t exp{-y_t H_t}.
  11. Concept. We set a new problem for which the previous weak classifier performs at chance again. Each data point has a class label y_t = +1 or -1; we update the weights: w_t <- w_t exp{-y_t H_t}.
  12. Concept. The strong (non-linear) classifier is built as the combination of all the weak (linear) classifiers f1, f2, f3, f4.
  13. An example to show how AdaBoost works. Training: present ten samples to the system, [x_i = {u_i, v_i}, y_i = '+' or '-'], namely 5 +ve (blue diamond) samples and 5 -ve (red circle) samples, then train up the system. Detection: give an input x_j = (1.5, 3.4) and the system will tell you whether it is '+' or '-', e.g. face or non-face. Example: u = weight, v = height; classification: suitability to play on the basketball team. (Figure: samples such as [x_i = {-0.48, 0}, y_i = '+'] and [x_i = {-0.2, -0.5}, y_i = '+'] plotted on the u- and v-axes.)
  14. AdaBoost concept. Using this training data (6 squares, 5 circles), how do we make a classifier? Objective: train a classifier to classify an unknown input as a circle or a square. One axis-parallel weak classifier cannot achieve 100% classification; e.g. h1( ), h2( ), h3( ) all fail. That means no matter how you place the decision line (horizontally or vertically) you cannot get a 100% classification result; you may try it yourself. The solution is a complex strong classifier H_complex( ). The strong classifier shown above would work, but how can we find it? Answer: combine many weak classifiers to achieve it.
  15. How? Each weak classifier may not be perfect, but each can achieve over a 50% correct rate. The classification results of h1( ), h2( ), ..., h7( ) are combined to form the final strong classifier H(x) = sign( \sum_{t=1}^{T} \alpha_t h_t(x) ), with a weight \alpha_i for each weak classifier, i = 1, 2, ..., 7.
  16. AdaBoost algorithm (see enlarged versions on the following slides).
      Given: (x_1, y_1), ..., (x_n, y_n), where x_i \in X, y_i \in Y = {-1, +1}.
      Initialization: initialize the distribution (weight) D_1(i) = 1/n, such that n = M + L, where M = number of positive (+1) examples and L = number of negative (-1) examples.
      Main training loop, for t = 1, ..., T:
      Step 1a: find the classifier h_t : X -> {-1, +1} that minimizes the error with respect to the distribution D_t, i.e. h_t = argmin_{h_q} \varepsilon_q.
      Step 1b: \varepsilon_t = \sum_{i=1}^{n} D_t(i) \, I[h_t(x_i) \ne y_i], where I = 1 if h_t(x_i) \ne y_i (classified incorrectly) and 0 otherwise.
      Checking step: prerequisite \varepsilon_t < 0.5 (an error smaller than 0.5 is ok), otherwise stop.
      Step 2: \alpha_t = \frac{1}{2} \ln\frac{1 - \varepsilon_t}{\varepsilon_t} (weight, or confidence value).
      Step 3: D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}, where Z_t is a normalization factor so that D_{t+1} is a probability distribution: Z_t = \sum_{i \,\text{correct}} D_t(i) e^{-\alpha_t} + \sum_{i \,\text{incorrect}} D_t(i) e^{\alpha_t} = correct weight + incorrect weight (see the following slides for an explanation).
      Step 4: current total cascaded-classifier error CE_t = \sum_{i=1}^{n} \frac{1}{n} E_i, where E_i = 0 if x_i is correctly classified by the current cascaded classifier, i.e. y_i = sign(\sum_{j=1}^{t} \alpha_j h_j(x_i)), and E_i = 1 if it is incorrectly classified, i.e. y_i \ne sign(\sum_{j=1}^{t} \alpha_j h_j(x_i)). If CE_t = 0 then set T = t and break.
      The final strong classifier: H(x) = sign( \sum_{t=1}^{T} \alpha_t h_t(x) ).
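      The whole loop above fits in a short, self-contained C++ sketch (my own illustration of the algorithm on these slides, not the OpenCV implementation; names such as Sample, Stump and adaboostTrain are made up, and the checking step \varepsilon_t < 0.5 is omitted for brevity):

          #include <algorithm>
          #include <cmath>
          #include <vector>

          // A training sample x = (u, v) with label y in {-1, +1}.
          struct Sample { double u, v; int y; };

          // Axis-parallel weak classifier ("decision stump"), as in slides 25-28:
          // h(x) = +1 if p * f(x) < p * c, else -1, where f(x) is either u or v.
          struct Stump {
              int axis;      // 0: threshold on u (vertical line), 1: threshold on v (horizontal line)
              double c;      // threshold
              int p;         // polarity, +1 or -1
              double alpha;  // classifier weight assigned in Step 2
              int classify(const Sample& s) const {
                  double f = (axis == 0) ? s.u : s.v;
                  return (p * f < p * c) ? +1 : -1;
              }
          };

          // Steps 1a/1b/2: exhaustively pick the stump with the lowest weighted error.
          static Stump bestStump(const std::vector<Sample>& X, const std::vector<double>& D) {
              Stump best{0, 0.0, +1, 0.0};
              double bestErr = 1e9;
              for (int axis = 0; axis < 2; ++axis)
                  for (const Sample& s : X)                                // candidate thresholds at the samples
                      for (int p = -1; p <= +1; p += 2) {
                          Stump h{axis, (axis == 0 ? s.u : s.v), p, 0.0};
                          double err = 0.0;
                          for (std::size_t i = 0; i < X.size(); ++i)
                              if (h.classify(X[i]) != X[i].y) err += D[i]; // Step 1b: weighted error
                          if (err < bestErr) { bestErr = err; best = h; }
                      }
              bestErr = std::max(bestErr, 1e-10);                          // avoid log(0) when a stump is perfect
              best.alpha = 0.5 * std::log((1.0 - bestErr) / bestErr);      // Step 2
              return best;
          }

          // Main training loop of slide 16.
          std::vector<Stump> adaboostTrain(const std::vector<Sample>& X, int T) {
              std::vector<double> D(X.size(), 1.0 / X.size());             // Initialization: D_1(i) = 1/n
              std::vector<Stump> H;
              for (int t = 0; t < T; ++t) {
                  Stump h = bestStump(X, D);                               // Steps 1a/1b/2
                  double Z = 0.0;
                  for (std::size_t i = 0; i < X.size(); ++i) {             // Step 3: re-weight the samples
                      D[i] *= std::exp(-h.alpha * X[i].y * h.classify(X[i]));
                      Z += D[i];
                  }
                  for (double& d : D) d /= Z;                              // normalize so D stays a distribution
                  H.push_back(h);
                  int wrong = 0;                                           // Step 4: cascaded-classifier error
                  for (const Sample& s : X) {
                      double o = 0.0;
                      for (const Stump& hk : H) o += hk.alpha * hk.classify(s);
                      if ((o >= 0 ? +1 : -1) != s.y) ++wrong;
                  }
                  if (wrong == 0) break;                                   // CE_t = 0: all samples classified
              }
              return H;
          }

      A new sample x is then classified as sign of the sum, over the returned stumps, of alpha * classify(x), exactly as in the final strong classifier above.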
  17. Initialization. Given: (x_1, y_1), ..., (x_n, y_n), where x_i \in X, y_i \in Y = {-1, +1}. Initialize the distribution (weight) D_1(i) = 1/n, such that n = M + L, where M = number of positive (+1) examples and L = number of negative (-1) examples.
  18. Main loop (steps 1, 2, 3). For t = 1, ..., T:
      Step 1a: find the classifier h_t : X -> {-1, +1} that minimizes the error with respect to the distribution D_t; that means h_t = argmin_{h_q} \varepsilon_q.
      Step 1b: error \varepsilon_t = \sum_{i=1}^{n} D_t(i) \, I[h_t(x_i) \ne y_i], where I = 1 if h_t(x_i) \ne y_i (classified incorrectly) and 0 otherwise.
      Checking step: prerequisite \varepsilon_t < 0.5 (an error smaller than 0.5 is ok), otherwise stop.
      Step 2: \alpha_t = \frac{1}{2} \ln\frac{1 - \varepsilon_t}{\varepsilon_t} (weight, or confidence value).
      Step 3: D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t} (see the next slide for an explanation).
  19. Main loop (step 4).
      Step 4: current total cascaded-classifier error CE_t = \sum_{i=1}^{n} \frac{1}{n} E_i, while the current classifier error for sample x_i is E_i, defined as follows: if x_i is correctly classified by the current cascaded classifier, i.e. y_i = sign(\sum_{j=1}^{t} \alpha_j h_j(x_i)), then E_i = 0; if x_i is incorrectly classified by the current cascaded classifier, i.e. y_i \ne sign(\sum_{j=1}^{t} \alpha_j h_j(x_i)), then E_i = 1.
      If CE_t = 0 then set T = t and break.
      The final strong classifier: H(x) = sign( \sum_{t=1}^{T} \alpha_t h_t(x) ).
  20. Note on the normalization factor Z_t in Step 3. Recall Step 3: D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}, where Z_t is the normalization factor that makes D_{t+1} a probability distribution: Z_t = \sum_{i \,\text{correctly classified}} D_t(i) e^{-\alpha_t} + \sum_{i \,\text{incorrectly classified}} D_t(i) e^{\alpha_t} = correct weight + incorrect weight.
      AdaBoost chooses this weight-update function deliberately because, when a training sample is correctly classified, its weight decreases, and when a training sample is incorrectly classified, its weight increases.
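      To spell out why the weight moves in the stated direction (a one-line check of my own, not on the original slide): since y_i h_t(x_i) = +1 for a correct classification and -1 for an incorrect one, and \alpha_t = \tfrac{1}{2}\ln\tfrac{1-\varepsilon_t}{\varepsilon_t} > 0 whenever \varepsilon_t < 0.5,
      \exp(-\alpha_t y_i h_t(x_i)) = e^{-\alpha_t} < 1 \quad (\text{correct: weight decreases}), \qquad \exp(-\alpha_t y_i h_t(x_i)) = e^{+\alpha_t} > 1 \quad (\text{incorrect: weight increases}).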
  21. Note: stopping criterion of the main loop. The main loop stops when all training data are correctly classified by the cascaded classifier up to stage t, i.e. when CE_t = \sum_{i=1}^{n} \frac{1}{n} E_i = 0 (with E_i defined as in Step 4), in which case T = t and the loop breaks.
  22. D_t(i) = weight. D_t(i) is the probability distribution of the i-th training sample at time t, i = 1, 2, ..., n. It shows how much you trust this sample. At t = 1, all samples have the same, equal weight: D_1(i) is identical for all i. At t > 1, D_t(i) will be modified, as we will see later.
  23. An example to show how AdaBoost works. Training: present ten samples to the system, [x_i = {u_i, v_i}, y_i = '+' or '-'], namely 5 +ve (blue diamond) samples and 5 -ve (red circle) samples, then train up the classification system. Detection example: give an input x_j = (1.5, 3.4) and the system will tell you whether it is '+' or '-', e.g. face or non-face. Example: you may treat u = weight and v = height; the classification task is suitability to play on the basketball team. (Figure: samples such as [x_i = {-0.48, 0}, y_i = '+'] and [x_i = {-0.2, -0.5}, y_i = '+'] plotted on the u- and v-axes.)
  24. Initialization. M = 5 +ve (blue diamond) samples, L = 5 -ve (red circle) samples, n = M + L = 10. Initialize the weight D_1(i) = 1/10 for all i = 1, 2, ..., 10, so D_1(1) = 0.1, D_1(2) = 0.1, ..., D_1(10) = 0.1. (Recall: D_1(i) = 1/n, such that n = M + L, where M = number of positive examples, L = number of negative examples, and y_i \in {-1, +1}.)
  25. Select h( ): for simplicity of implementation we use the axis-parallel weak classifier. Recall h_t(x) = 1 if p_t f_t(x) < p_t \theta_t, -1 otherwise, where the polarity p_t \in {1, -1}, \theta_t is the threshold, and f_t is the function f(u, v) = v - m u (m is a constant, u and v are variables).
      Axis-parallel weak classifiers: h_a(x) is a line of gradient m = 0 (a horizontal line), whose position can be controlled by v_0; h_b(x) is a line of gradient m = \infty (a vertical line), whose position can be controlled by u_0.
  26. Step 1a, 1b. Assume h( ) can only be a horizontal or vertical separator (an axis-parallel weak classifier). There are still many ways to set h( ); here, if this h_q( ) is selected, there will be 3 incorrectly classified training samples (see the 3 circled training samples). We can go through all the h( )'s and select the best one, i.e. the one with the least misclassification (see the following 2 slides). Recall Step 1a: find the classifier h_t : X -> {-1, +1} that minimizes the error with respect to D_t, i.e. h_t = argmin_{h_q} \varepsilon_q; and Step 1b (checking step): prerequisite \varepsilon_t < 0.5 (an error smaller than 0.5 is ok), otherwise stop.
  27. Example: training (example slides from [Smyth 2007]); classify the ten red (circle) / blue (diamond) dots. Step 1a: h_i(x) = 1 if p u < p u_i, -1 otherwise, with polarity p \in {1, -1} (v is not used because h_i is parallel to the vertical axis). Initialize D_1(i) = 1/10. You may choose one of the following axis-parallel (vertical-line) classifiers; the vertical dotted lines at u_1, u_2, ..., u_9 are the possible choices h_{i=1}(x), ..., h_{i=9}(x). There are 9 x 2 choices here: h_{i=1,...,9} (polarity +1) and h'_{i=1,...,9} (polarity -1).
  28. Example: training (example slides from [Smyth 2007]); classify the ten red (circle) / blue (diamond) dots. Step 1a: h_j(x) = 1 if p v < p v_j, -1 otherwise, with polarity p \in {1, -1} (u is not used because h_j is parallel to the horizontal axis). Initialize D_1(i) = 1/10. You may choose one of the following axis-parallel (horizontal-line) classifiers; the horizontal dotted lines at v_1, v_2, ..., v_9 are the possible choices h_{j=1}(x), ..., h_{j=9}(x). There are 9 x 2 choices here: h_{j=1,...,9} (polarity +1) and h'_{j=1,...,9} (polarity -1). Altogether, including the previous slide, there are 36 choices.
  29. Step 1b: find and check the error of the weak classifier h( ). To evaluate how successful your selected weak classifier h( ) is, we evaluate its error rate: \varepsilon_t = misclassification probability of h( ), i.e. \varepsilon_t = \sum_{i=1}^{n} D_t(i) \, I[h_t(x_i) \ne y_i], where I = 1 if h_t(x_i) \ne y_i (classified incorrectly) and 0 otherwise. Checking: if \varepsilon_t >= 0.5 (something is wrong), stop the training, because by definition a weak classifier should be slightly better than a random choice (probability 0.5). So if \varepsilon_t >= 0.5, your h( ) is a bad choice; design another h''( ) and redo the training based on the new h''( ).
  30. Assume h( ) can only be a horizontal or vertical separator. How many different classifiers are available? If h_j( ) is selected as shown, circle the misclassified training samples and find \varepsilon( ), the misclassification probability, given that the probability distribution D is the same for each sample. Find the h( ) with the minimum error (Step 1a), and check that \varepsilon_t < 0.5 (Step 1b), otherwise stop.
  31. Result of Step 2 at t = 1: the circled samples are the ones incorrectly classified by h_{t=1}(x).
  32. Step 2 at t = 1 (refer to the previous slide). Using \varepsilon_{t=1} = 0.3, because 3 samples are incorrectly classified: \varepsilon_{t=1} = 0.1 + 0.1 + 0.1 = 0.3, where \varepsilon_t is the weighted error rate of classifier h_t, so \alpha_{t=1} = \frac{1}{2} \ln\frac{1 - \varepsilon_t}{\varepsilon_t} = \frac{1}{2} \ln\frac{1 - 0.3}{0.3} = 0.424. (The proof can be found at http://vision.ucsd.edu/~bbabenko/data/boosting_note.pdf; also see the appendix.)
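      As a quick arithmetic check (mine, not on the slide):
      \alpha_{t=1} = \tfrac{1}{2}\ln\tfrac{0.7}{0.3} = \tfrac{1}{2}\ln(2.33) \approx \tfrac{1}{2}(0.847) \approx 0.424.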
  33. Step 3 at t = 1: update D_t to D_{t+1}. Update the weight D_t(i) for each training sample i: D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}, where Z_t is the normalization factor, so that D_{t+1} is a probability distribution. (The proof can be found at http://vision.ucsd.edu/~bbabenko/data/boosting_note.pdf; also see the appendix.)
  34. Step 3: first find Z_t (the normalization factor). Note: currently t = 1, D_{t=1}(i) = 0.1 for all i, \alpha_{t=1} = 0.424, with 7 samples correctly classified and 3 incorrectly classified.
      Z_t = \sum_i D_t(i) \exp(-\alpha_t y_i h_t(x_i)) = \sum_{i: y_i = h_t(x_i)} D_t(i) e^{-\alpha_t} + \sum_{i: y_i \ne h_t(x_i)} D_t(i) e^{\alpha_t} = total correct weight + total incorrect weight
      = 7 \times 0.1 \times e^{-0.424} + 3 \times 0.1 \times e^{0.424} \approx 7 \times 0.1 \times 0.65 + 3 \times 0.1 \times 1.52 \approx 0.455 + 0.456 \approx 0.911.
  35. Step 3, example: update D_t to D_{t+1}. If a sample is correctly classified, its weight D_{t+1} decreases, and vice versa. Since Z_t = 0.911:
      D_{t+1}(i_{correct}) = \frac{D_t(i) e^{-0.424}}{Z_t} = \frac{0.1 \times 0.65}{0.911} \approx 0.0714 (decrease);
      D_{t+1}(i_{incorrect}) = \frac{D_t(i) e^{0.424}}{Z_t} = \frac{0.1 \times 1.52}{0.911} \approx 0.167 (increase).
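      A quick sanity check (mine, not on the slide): the updated weights again sum to 1, as a probability distribution should:
      \sum_i D_{t+1}(i) \approx 7 \times 0.0714 + 3 \times 0.167 \approx 0.50 + 0.50 = 1.0.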
  36. Now run the main training loop a second time, t = 2, starting from the updated weights D_{t=2}(i_{correct}) \approx 0.0714 and D_{t=2}(i_{incorrect}) \approx 0.167.
  37. Now run the main training loop for the second time (t = 2), and then for t = 3. The final classifier is obtained by combining the three weak classifiers.
  38. Combined classifier for t = 1, 2, 3. H(x) = sign( \sum_{t=1}^{T} \alpha_t h_t(x) ) = sign( 0.424 \, h_1(x) + \alpha_2 h_2(x) + \alpha_3 h_3(x) ). Exercise: work out \alpha_2 and \alpha_3. The three weak classifiers h_{t=1}( ), h_{t=2}( ), h_{t=3}( ) are combined to form the classifier; one more step may be needed for the final classifier.
  39. Code Trace
  40. (Screenshot of the training code: the outer for loop over numStages.)
  41. CvCascadeBoost::train (simplified):
      update_weights( 0 );
      do
      {
          CvCascadeBoostTree* tree = new CvCascadeBoostTree;
          if( !tree->train( data, subsample_mask, this ) )
          {
              delete tree;
              continue;
          }
          cvSeqPush( weak, &tree );
          update_weights( tree );   // weak_eval[i] = f(x_i) in [-1, 1]; w_i *= exp(-y_i * f(x_i))
          trim_weights();
      }
      while( !isErrDesired() && (weak->total < params.weak_count) );
  42. Trace code. Main related files: traincascade.cpp (classifier.train). Main boosting algorithm: CvCascadeClassifier::train (file: CascadeClassifier.cpp); just follow the "for numStages" loop inside it:
      1. updateTrainingSet
         - Only samples that got through the previous stages are kept (predict = 1).
         - fillPassedSamples
           - imgReader.getPos and imgReader.getNeg behave somewhat differently.
           - CvCascadeBoost::predict (boost.cpp) is used to choose which samples are added per stage (when the stage is 0, everything is taken, i.e. predict(i) = 1).
             - acceptanceRatio = negCount / negConsumed
      2. Each stage computes tempLeafFARate; if it is already smaller than requiredLeafFARate, training stops.
      3. CvCascadeBoost::train (file: boost.cpp)
         - A CvCascadeBoostTrainData object is created (new) at this point.
         - update_weights: if no tree exists yet, the weights are updated (initialized) at this point.
         - featureEvaluator can be swapped out freely, e.g. for HaarEvaluator.
  43. Usage. Pre-processing: opencv_createsamples.exe. Training: opencv_traincascade.exe -featureType HAAR -data classifier/ -vec positive.vec -bg negative.dat -w 30 -h 30 -numPos 696 -numNeg 545 -numStages 16.
      Parameters: maxFalseAlarm is the maximum tolerable false-alarm rate; this parameter affects the stopping condition of each stage, via requiredLeafFARate = pow(maxFalseAlarm, numStages) / max_depth.
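      For a concrete feel of this stopping threshold (my own worked number, not from the slides): with maxFalseAlarm = 0.5, numStages = 16 and max_depth = 1,
      \text{requiredLeafFARate} = \frac{0.5^{16}}{1} = \frac{1}{65536} \approx 1.5 \times 10^{-5}.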
  44. Usage
      # pre-processing
      # resize images in the directory; you need to have the ImageMagick utility
      ################# 1. collect file names #############################
      # notice: a. negative image size should be larger than the positive ones
      find ./dataset/positive/resize/ -name '*.jpg' > temp.dat
      find ./dataset/negative/ -name '*.jpg' > negative.dat
      sed 's/$/ 1 0 0 30 30/' temp.dat > positive.dat
      rm temp.dat
      ################# 2. create samples #################################
      ./opencv_createsamples.exe -info positive.dat -vec positive.vec -w 30 -h 30 -show
      ################# 3. train samples ##################################
      ./opencv_traincascade.exe -featureType HAAR -data classifier -vec positive.vec -bg negative.dat -w 30 -h 30 -numPos 100 -numNeg 300 -numStages 18
  45. Usage: Detection.
      Window-based: haarClassifier.load, then haarClassifier.detectMultiScale(procImg, resultRect, 1.1, 3, 0, cvSize(12, 12), cvSize(80, 80)).
      Detect on your own: haarClassifier.load, then haarClassifier.featureEvaluator->setImage( scaledImage, originalWindowSize ), then haarClassifier.runAt(evaluator, Point(0, 0), gypWeight).
      Notes: an infinite loop can occur in CvCascadeClassifier::fillPassedSamples; solutions: add more samples, or reduce the number of stages.
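      A minimal window-based detection sketch using the standard OpenCV C++ API (OpenCV 2.4-era headers; file names such as classifier/cascade.xml and test.jpg are placeholders, and this mirrors the calls listed above rather than being copied from the slides):

          #include <opencv2/objdetect/objdetect.hpp>
          #include <opencv2/highgui/highgui.hpp>
          #include <opencv2/imgproc/imgproc.hpp>
          #include <vector>

          int main()
          {
              cv::CascadeClassifier haarClassifier;
              if( !haarClassifier.load( "classifier/cascade.xml" ) )   // cascade trained by opencv_traincascade
                  return -1;

              cv::Mat img = cv::imread( "test.jpg" ), gray;
              if( img.empty() )
                  return -1;
              cv::cvtColor( img, gray, cv::COLOR_BGR2GRAY );

              std::vector<cv::Rect> resultRect;
              // scaleFactor 1.1, minNeighbors 3, search windows from 12x12 up to 80x80
              haarClassifier.detectMultiScale( gray, resultRect, 1.1, 3, 0,
                                               cv::Size(12, 12), cv::Size(80, 80) );

              for( size_t i = 0; i < resultRect.size(); i++ )
                  cv::rectangle( img, resultRect[i], cv::Scalar(0, 255, 0), 2 );   // draw detections
              cv::imwrite( "result.jpg", img );
              return 0;
          }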
  46. Appendix: Haar-like Features
  47. Example. A feature's value is calculated as the difference between the sum of the pixels within the white and black rectangle regions: f_i = Sum(r_{i,white}) - Sum(r_{i,black}). The corresponding weak classifier is h_i(x) = 1 if f_i > threshold, and -1 if f_i < threshold.
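      A small sketch (my own illustration; the function names rectSum and haarStump are hypothetical) of evaluating such a two-rectangle feature with an integral image:

          #include <opencv2/imgproc/imgproc.hpp>

          // Sum of pixels inside rect r, using an integral image ii (size (h+1) x (w+1), CV_32S).
          static int rectSum( const cv::Mat& ii, const cv::Rect& r )
          {
              return ii.at<int>(r.y, r.x)
                   + ii.at<int>(r.y + r.height, r.x + r.width)
                   - ii.at<int>(r.y, r.x + r.width)
                   - ii.at<int>(r.y + r.height, r.x);
          }

          // Two-rectangle Haar-like feature: f = Sum(white) - Sum(black);
          // weak classifier h(x) = +1 if f > threshold, else -1.
          int haarStump( const cv::Mat& gray, const cv::Rect& white, const cv::Rect& black, int threshold )
          {
              cv::Mat ii;
              cv::integral( gray, ii, CV_32S );        // integral image has one extra row and column
              int f = rectSum( ii, white ) - rectSum( ii, black );
              return ( f > threshold ) ? +1 : -1;
          }

          int main()
          {
              cv::Mat gray( 8, 8, CV_8UC1, cv::Scalar(0) );
              gray( cv::Rect(0, 0, 4, 8) ) = 200;      // bright left half, dark right half
              // white rectangle on the bright half, black rectangle on the dark half
              int h = haarStump( gray, cv::Rect(0, 0, 4, 8), cv::Rect(4, 0, 4, 8), 0 );
              return (h == 1) ? 0 : 1;                 // expect +1: white sum >> black sum
          }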
  48. Reference: http://docs.opencv.org/doc/user_guide/ug_traincascade.html
