1. An Analysis of mnistclassify.m of
Hinton’s mnistdeepauto example
by Ali Riza SARAL
arsaral((at))yahoo.com
References:
Hinton’s «Lecture 12C _ Restricted Boltzmann Machines»
Hugo Larochelle’s «Neural networks [5.2] _ Restricted Boltzmann machine – inference»
Hugo Larochelle’s «Neural networks [5.4] _ Restricted Boltzmann machine - contrastive divergence»
2. mnistclassify
• clear all; close all;
• maxepoch=1; % maxepoch=50;
• numhid=250; numpen=250; numpen2=50;
• % numhid=500; numpen=500; numpen2=2000;
• fprintf(1,'Converting Raw files into Matlab format \n');
• converter;dos('erase *.ascii');
• fprintf(1,'Pretraining a deep autoencoder. \n');fprintf(1,'The Science paper used 50 epochs. This uses %3i \n', maxepoch);
• makebatches;
• [numcases numdims numbatches]=size(batchdata); % 100 784 600
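makebatches reshapes the training set into batchdata with one minibatch per page of a 3-D array. A NumPy sketch of the same layout (zero data stands in for the MNIST images here):

```python
import numpy as np

# Stand-in for the 60,000 MNIST training images of 28x28 = 784 pixels.
digitdata = np.zeros((60000, 784), dtype=np.float32)

# makebatches splits this into 600 minibatches of 100 cases each, giving
# batchdata of size (numcases, numdims, numbatches) = (100, 784, 600).
batchdata = digitdata.reshape(600, 100, 784).transpose(1, 2, 0)

numcases, numdims, numbatches = batchdata.shape
print(numcases, numdims, numbatches)  # 100 784 600
```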
4. Backpropclassify.m
• maxepoch=2; %maxepoch=200;
• fprintf(1,'\nTraining discriminative model on MNIST by minimizing cross entropy error. \n');
• fprintf(1,'60 batches of 1000 cases each. n');
• load ...
• makebatches;
• [numcases numdims numbatches]=size(batchdata); % 100 784 600
• N=numcases; % 100
• %%%% PREINITIALIZE WEIGHTS OF THE DISCRIMINATIVE MODEL
• w1, w2, w3
• %%%%%%%%%% END OF PREINITIALIZATION OF WEIGHTS
• set the layer lengths l1, l2, l3, l4, l5
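The preinitialization loads the pretrained RBM weights w1..w3 and records the layer lengths. A NumPy sketch of these shapes, using the reduced sizes from the listing above (the Science paper used 500/500/2000) and random placeholders in place of the pretrained weights:

```python
import numpy as np

# Layer lengths: input, hid, pen, pen2, and the 10 output classes.
l1, l2, l3, l4, l5 = 784, 250, 250, 50, 10

# Placeholder pretrained weights; each has one extra row for the biases.
w1 = 0.1 * np.random.randn(l1 + 1, l2)
w2 = 0.1 * np.random.randn(l2 + 1, l3)
w3 = 0.1 * np.random.randn(l3 + 1, l4)

# The top-level classification weights start random:
# w_class = 0.1*randn(size(w3,2)+1,10)
w_class = 0.1 * np.random.randn(w3.shape[1] + 1, l5)
print(w_class.shape)  # (51, 10)
```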
5. Backpropclassify.m
• test_err=[];
• train_err=[];
• for epoch = 1:maxepoch
•
• %%%%%%%%%%%%%%%%%%%% COMPUTE TRAINING MISCLASSIFICATION ERROR
• %%%%%%%%%%%%%% END OF COMPUTING TRAINING MISCLASSIFICATION ERROR
• %%%%%%%%%%%%%%%%%%%% COMPUTE TEST MISCLASSIFICATION ERROR
• %%%%%%%%%%%%%% END OF COMPUTING TEST MISCLASSIFICATION ERROR
6. Backpropclassify.m
• for epoch = 1:maxepoch
•
• %%%%%%%%%%%%%%%%%%%% COMPUTE TRAINING MISCLASSIFICATION ERROR
• ...
• %%%%%%%%%%%%%%%%%%%% COMPUTE TEST MISCLASSIFICATION ERROR
• ...
• for batch = 1:numbatches/10
• fprintf(1,'epoch %d batch %d\n',epoch,batch);
• %%%%%%%%%%% COMBINE 10 MINIBATCHES INTO 1 LARGER MINIBATCH
• %%%%%%%%%%%%%%% PERFORM CONJUGATE GRADIENT WITH 3 LINESEARCHES
• if epoch<2 % originally 6. First update top-level weights holding other weights fixed.
• [X, fX] = minimize(VV,'CG_CLASSIFY_INIT',max_iter,Dim,w3probs,targets);
• % 510 1, 4 1 = min(510 1,'CG_CLASS...',3, 2 1, 1000 51, 1000 10
• else
• [X, fX] = minimize(VV,'CG_CLASSIFY',max_iter,Dim,data,targets);
• %[% 272060 1, 4 1] %=mini..(272060 1,'CG_CLA..', 3, 5 1, 1000 784, 1000 10);
• end
• %%%%%%%%%%%%%%% END OF CONJUGATE GRADIENT WITH 3 LINESEARCHES
• end
• save mnistclassify_weights w1 w2 w3 w_class
• save mnistclassify_error test_err test_crerr train_err train_crerr;
• end
9. COMPUTE TRAINING MISCLASSIFICATION ERROR
• for batch = 1:numbatches % 1 : 600
• ...
• [I J]=max(targetout,[],2);
• % 100 1 , 100 1 = 100 1 -->I has the value J has the sequence
• [I1 J1]=max(target,[],2); % max(100 10,[],2) 100 1
•
• counter=counter+length(find(J==J1)); % =6 for the first batch
• err_cr = err_cr - sum(sum( target(:,1:end).*log(targetout))); % cross entropy
• end
• train_err(epoch)=(numcases*numbatches-counter);
• % total number of errors over all batches in this epoch
• train_crerr(epoch)=err_cr/numbatches;
• % total cross-entropy error over the complete batchdata in this epoch
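The per-batch error bookkeeping above can be sketched in NumPy for a single minibatch (random stand-ins for the network output and the one-hot labels):

```python
import numpy as np

rng = np.random.default_rng(0)
# targetout: normalized class probabilities; target: one-hot labels (100 cases).
targetout = rng.random((100, 10))
targetout /= targetout.sum(axis=1, keepdims=True)
target = np.eye(10)[rng.integers(0, 10, size=100)]

# [I J]=max(targetout,[],2): J is the predicted class index; J1 the true one.
J = np.argmax(targetout, axis=1)
J1 = np.argmax(target, axis=1)

# counter accumulates the number of correct classifications in this batch.
counter = int(np.sum(J == J1))

# err_cr accumulates the cross entropy -sum(target .* log(targetout)).
err_cr = -np.sum(target * np.log(targetout))

# train_err(epoch) = numcases*numbatches - counter over all 600 batches;
# here, the error count of this single batch:
errors = 100 - counter
```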
12. COMPUTE TEST MISCLASSIFICATION ERROR
• [I J]=max(targetout,[],2);
• % 100 1 , 100 1 = 100 1 -->I has the value J has the sequence
• [I1 J1]=max(target,[],2); % max(100 10,[],2) 100 1
• counter=counter+length(find(J==J1));
• % =9 for the first batch
• err_cr = err_cr - sum(sum( target(:,1:end).*log(targetout))); % cross entropy
• end
• test_err(epoch)=(testnumcases*testnumbatches-counter);
• % total number of errors over all batches in this epoch
• test_crerr(epoch)=err_cr/testnumbatches;
• % total cross-entropy error over the complete batchdata in this epoch
• fprintf(1,'Before epoch %d Train # misclassified: %d (from %d). Test # misclassified: %d (from %d) \t\t \n',...
epoch,train_err(epoch),numcases*numbatches,test_err(epoch),testnumcases*testnumbatches);
16. End of backpropclassify
• for epoch = 1:maxepoch
• ... The body of backpropclassify
• %%%%%%%%%%% END OF CONJUGATE GRADIENT WITH 3 LINESEARCHES
• end
• save mnistclassify_weights w1 w2 w3 w_class
• save mnistclassify_error test_err test_crerr train_err train_crerr;
• end
17. Outline
• There are two important loops in mnistclassify:
• the epoch loop and the batch loop.
• The batch loop handles the data in 600 batches.
• The epoch loop determines how many passes are made to approach the final result.
19. Calculate misclassification
• The epoch loop has three main sections.
• The first two sections compute the misclassification error.
• Training and test data are used to calculate the probabilities w1probs, w2probs, w3probs.
• This is done in a batch loop 600 times in each section.
• targetout is calculated using w3probs and w_class:
• targetout = exp(w3probs*w_class); % 100 501 * 501 10 = 100 10
20. Find the targetout and target
• targetout = targetout./repmat(sum(targetout,2),1,10);
• % 100 10 = 100 10 ./ repmat ( 100 1), 1 10) = 100 10 Normalize targetout
• Find the index (1..10) of the maximum value in each row
• [I J]=max(targetout,[],2);
• % 100 1 , 100 1 = 100 1 -->I has the value J has the sequence
• Find the same in target
• [I1 J1]=max(target,[],2); % max(100 10,[],2) 100 1
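The normalization and max lines above amount to a softmax over the 10 class units followed by an argmax. A NumPy sketch with hypothetical inputs of the full-size dimensions (w3probs 100x501, w_class 501x10):

```python
import numpy as np

rng = np.random.default_rng(1)
w3probs = rng.random((100, 501))            # top-layer activities with bias column
w_class = 0.1 * rng.standard_normal((501, 10))

# targetout = exp(w3probs*w_class), then each row normalized to sum to 1:
targetout = np.exp(w3probs @ w_class)
targetout /= targetout.sum(axis=1, keepdims=True)  # softmax over the 10 classes

# [I J]=max(targetout,[],2): I holds the max probability, J its class index.
I = targetout.max(axis=1)
J = targetout.argmax(axis=1)
print(targetout.shape, J.shape)  # (100, 10) (100,)
```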
21. End of misclassification calculation
• Count the number of correct results in this batch and calculate the cross entropy
• counter=counter+length(find(J==J1)); % =6 for the first batch
• err_cr = err_cr - sum(sum( target(:,1:end).*log(targetout))); % cross entropy
• Repeat these for all 600 batches, accumulating the error counter and the cross-entropy err_cr.
• fprintf prints these statistics for each epoch at the end of each test misclassification calculation.
22. COMBINE 10 MINIBATCHES INTO 1 LARGER MINIBATCH
• for kk=1:10
• data=[data; batchdata(:,:,(tt-1)*10+kk)]; % 1000 784
• targets=[targets; batchtargets(:,:,(tt-1)*10+kk)]; % 1000 10
• end
• Calculate data and targets: data is 1000 cases of 28x28 = 784 pixels, and targets is 1000 cases of 10 class probabilities.
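The vertical concatenation above can be sketched in NumPy (zero stand-ins for batchdata and batchtargets):

```python
import numpy as np

# Hypothetical batchdata/batchtargets as produced by makebatches.
batchdata = np.zeros((100, 784, 600), dtype=np.float32)
batchtargets = np.zeros((100, 10, 600), dtype=np.float32)

tt = 1  # the tt-th large batch (1-based, as in MATLAB)
data_parts, target_parts = [], []
for kk in range(1, 11):                       # for kk=1:10
    idx = (tt - 1) * 10 + kk - 1              # 0-based page index
    data_parts.append(batchdata[:, :, idx])
    target_parts.append(batchtargets[:, :, idx])

# Stacking 10 minibatches of 100 cases gives 1000 cases each.
data = np.vstack(data_parts)       # (1000, 784)
targets = np.vstack(target_parts)  # (1000, 10)
print(data.shape, targets.shape)  # (1000, 784) (1000, 10)
```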
23. PERFORM CONJUGATE GRADIENT WITH 3 LINESEARCHES
• Calculate w1probs, w2probs, w3probs again for data
• VV = [w_class(:)']'; % 51 10 = 510 1
• Dim = [l4; l5]; % 2 1
•
• [X, fX] = minimize(VV,'CG_CLASSIFY_INIT',max_iter,Dim,w3probs,targets);
• % 510 1, 4 1 = min(510 1,'CG_CLASS...',3, 2 1, 1000 51, 1000 10
• Minimize w_class using w3probs as data
• Update w_class so that it can be used in the next epoch:
• w_class = reshape(X,l4+1,l5); %reshape(X,51,10)= 51 10
24. PERFORM CONJUGATE GRADIENT WITH 3 LINESEARCHES
• if epoch<2 % original 6 First update top-level weights holding other weights fixed.
• ... (see the previous slide)
• else
• Minimize the weights and the w_class
• VV = [w1(:)' w2(:)' w3(:)' w_class(:)']'; % 272060 1
• [X, fX] = minimize(VV,'CG_CLASSIFY',max_iter,Dim,data,targets);
• %[272060 1, 4 1]=mini..(272060 1,'CG_CLA..', 3, 5 1, 1000 784, 1000 10);
• w1 = reshape(X(1:(l1+1)*l2),l1+1,l2);
• w2 and w3 are reshaped similarly
• w_class = reshape(X(xxx+1:xxx+(l4+1)*l5),l4+1,l5); % 51 10
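The packing of all weights into the single column VV, and the carving of the minimized X back into matrices, can be sketched in NumPy (zero placeholders, reduced layer sizes l1..l5 as above; order='F' reproduces MATLAB's column-major w(:)):

```python
import numpy as np

l1, l2, l3, l4, l5 = 784, 250, 250, 50, 10
w1 = np.zeros((l1 + 1, l2))
w2 = np.zeros((l2 + 1, l3))
w3 = np.zeros((l3 + 1, l4))
w_class = np.zeros((l4 + 1, l5))

# VV = [w1(:)' w2(:)' w3(:)' w_class(:)']' flattens everything into one column.
VV = np.concatenate([w.flatten(order='F') for w in (w1, w2, w3, w_class)])
print(VV.size)  # 272060

# After minimize returns X, the matrices are carved back out in the same order.
offset, unpacked = 0, []
for r, c in [(l1 + 1, l2), (l2 + 1, l3), (l3 + 1, l4), (l4 + 1, l5)]:
    unpacked.append(VV[offset:offset + r * c].reshape(r, c, order='F'))
    offset += r * c
```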
•
25. End of each epoch
• save mnistclassify_weights w1 w2 w3 w_class
• save mnistclassify_error test_err test_crerr train_err train_crerr;
26. THE PITH
• 1- We create the weight values of a net with RBMs. This net can reconstruct its input; it is a feature map of the data that was used to train it.
• 2- We make an epoch loop with batch loops inside it to repetitively approach the result.
• 3- We use the weights developed in the first stage and a randomly generated w_class to compute the misclassification error.
• We perform conjugate gradient with 3 line searches to minimize the weights and w_class.
27. THE PITH
• Note that
• [I J]=max(targetout,[],2); % 100 1 , 100 1 = 100 1 -->I has the max value and J has its index
• We use w_class to classify the digit.
• w_class is defined as:
• w_class = 0.1*randn(size(w3,2)+1,10); % randn(501,10) = 501 10
• w_class is a matrix of 501 rows and 10 columns.
• targetout = exp(w3probs*w_class); % 100 501 * 501 10 = 100 10
• targetout is a matrix of 100 rows and 10 columns and is normalized.
• [I J]=max(targetout,[],2); % 100 1 , 100 1 = 100 1 -->I has the value J has its index
• J is the classified digit: it is the index of the maximum probability value in each row.
31. CG_CLASSIFY
• Reshape VV to produce w1,w2,w3,w_class
• Produce w1probs, w2probs, w3probs
• Produce targetout from w3probs and w_class
• Produce cost value f using target and
targetout
• Produce dw1, dw2,dw3 and dw_class
• df = [dw1(:)' dw2(:)' dw3(:)' dw_class(:)']';
• % 272060 1
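The steps above can be sketched in NumPy for the top layer alone (the full CG_CLASSIFY also backpropagates through w1..w3; the inputs here are hypothetical stand-ins with the reduced sizes):

```python
import numpy as np

rng = np.random.default_rng(2)
w3probs = rng.random((1000, 51))              # top-layer activities with bias column
w_class = 0.1 * rng.standard_normal((51, 10))
target = np.eye(10)[rng.integers(0, 10, size=1000)]  # one-hot labels

# Forward pass: softmax output from w3probs and w_class.
targetout = np.exp(w3probs @ w_class)
targetout /= targetout.sum(axis=1, keepdims=True)

# Cost value f: cross entropy between target and targetout.
f = -np.sum(target * np.log(targetout))

# Gradient of f with respect to w_class (softmax + cross-entropy derivative).
IO = targetout - target
dw_class = w3probs.T @ IO                     # (51, 10)

# df = [dw_class(:)']' packs the gradient into one column for minimize.
df = dw_class.flatten(order='F')
print(df.shape)  # (510,)
```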