Team2

  1. Machine Learning Contest
     Team2: 정하용, JC Bazin (제이시), 강찬구
  2. Contents
     - Introduction
     - Machine Learning Algorithms
       - Neural Network
       - Support Vector Machine
       - Maximum Entropy Model
     - Feature Selection
     - Voting
     - Conclusion
  3. Introduction
     - In this team project, we were asked to develop a program that predicts whether a person's income is greater than 50K per year.
     - The objective of the contest:
       - A good result!
  4. Machine Learning Algorithms
     - For this project, we used 3 different learning algorithms:
       - Neural Network
       - Support Vector Machine
       - Maximum Entropy Model
  5. Neural Network
     - We used the MATLAB NN toolbox with the feed-forward back-propagation algorithm.
     - Format transformation (a sketch of the encoding follows this slide)
       - Only numbers are allowed as input.
       - The data are transformed into integers by assigning to each value its position in the given attribute list.
       - E.g., Race: (White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black)
         - 2 is assigned to "Asian-Pac-Islander"
       - Unknown values ("?") are assigned -1.
       - All other data values are positive.
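A minimal sketch of this encoding in Python (illustrative only; the team worked in MATLAB, and the attribute list here covers just one attribute):

```python
# Map a categorical value to its 1-based position in its attribute list;
# the unknown marker "?" becomes -1, so all known values stay positive.
RACE = ["White", "Asian-Pac-Islander", "Amer-Indian-Eskimo", "Other", "Black"]

def encode(value, attribute_values):
    if value == "?":
        return -1
    return attribute_values.index(value) + 1

print(encode("Asian-Pac-Islander", RACE))  # 2
print(encode("?", RACE))                   # -1
```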
  6. Neural Network
     - Parameters
       - The number of neurons in the hidden layer may be:
         - equal to the number of neurons in the input layer (Wierenga and Kluytmans, 1994),
         - equal to 75% of the number of neurons in the input layer (Venugopal and Baets, 1994),
         - equal to the square root of the product of the numbers of neurons in the input and output layers (Shepard, 1990).
       - The activation functions were chosen by trying the three most common ones: logsig, tansig, and purelin.
     - Result
       - Precision = 80.34%

     Configuration of the NN:
       - number of hidden layers: 1
       - number of neurons in the hidden layer: 3
       - number of neurons in the input layer: 14
       - number of neurons in the output layer: 1
       - activation functions: tansig and purelin
       - epochs: 1000
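A rough scikit-learn equivalent of this configuration (an illustrative sketch, not the team's MATLAB script; MLPClassifier uses a logistic output rather than purelin):

```python
from sklearn.neural_network import MLPClassifier

# One hidden layer of 3 tanh units (tansig), trained for up to 1000 epochs;
# the 14 input neurons correspond to the 14 integer-encoded attributes.
nn = MLPClassifier(hidden_layer_sizes=(3,), activation="tanh", max_iter=1000)
# nn.fit(X_train, y_train)   # X_train, y_train are hypothetical names
# nn.score(X_test, y_test)   # the team reports 80.34% precision
```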
  7. Support Vector Machine
     - LIBSVM (Library for Support Vector Machines)
     - Format transformation
       - The format of the training and testing data files is:
         <label> <index1>:<value1> <index2>:<value2> ...
       - The original data set has to be turned into this required format.
       - A converter translates the original format into a new one:
         - The new format does some reordering and maps attributes to numbers.
         - Old format:
           50, Private, 138179, Assoc-acdm, 12, Married-civ-spouse, Craft-repair, Husband, White, Male, 0, 1902, 40, United-States, >50K
         - New format (directly applicable to svm-train and svm-predict):
           0 1:50 2:0 3:138179 4:5 5:12 6:0 7:1 8:2 9:0 10:1 11:0 12:1902 13:40 14:0
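A sketch of such a converter (illustrative; the label and category tables below are abridged stand-ins, not the team's actual mapping):

```python
def to_libsvm(row, label_map, category_maps):
    """row: raw field strings, label last; category_maps keys are 1-based
    attribute indices, values are that attribute's list of categories."""
    *fields, label = row
    parts = [str(label_map[label])]
    for i, value in enumerate(fields, start=1):
        if i in category_maps:                      # categorical attribute
            value = category_maps[i].index(value)   # 0-based category number
        parts.append(f"{i}:{value}")
    return " ".join(parts)

row = ["50", "Private", "138179", ">50K"]           # abridged example row
print(to_libsvm(row, {">50K": 0, "<=50K": 1},
                {2: ["Private", "Self-emp-not-inc"]}))
# -> 0 1:50 2:0 3:138179
```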
  8. Support Vector Machine
     - Parameters
       - SVM type = C-SVC
         - Parameter C = 1
       - Kernel function = radial basis function
         - Degree in kernel function = 3
         - Gamma in kernel function = 1/k
         - Coefficient0 in kernel function = 0
       - Epsilon in loss function = 0.1
       - Tolerance of termination criterion = 0.001
       - Shrinking = 1
       - Parameter C of class i = weight * C, with weight = 1
     - Results
       - Precision = 76.43%
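These values match LIBSVM's documented defaults. A scikit-learn sketch with the same settings (SVC wraps libsvm; gamma="auto" means 1/k, and degree/coef0 are ignored by the RBF kernel):

```python
from sklearn.svm import SVC

# C-SVC with an RBF kernel, C = 1, gamma = 1/(number of features),
# stopping tolerance 0.001, shrinking heuristic on.
svc = SVC(C=1.0, kernel="rbf", gamma="auto", tol=0.001, shrinking=True)
# svc.fit(X_train, y_train)   # hypothetical variable names
# svc.score(X_test, y_test)   # the team reports 76.43% precision
```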
  9. Maximum Entropy Model
     - Language: Java
     - Library: OpenNLP MaxEnt-2.2.0
     - Parameters
       - GIS = 1201
       - IIS = 923
       - Steepest ascent = 212
       - Conjugate gradient (FR) = 74
       - Conjugate gradient (PRP) = 63
       - Limited-memory variable metric = 70
     - Results
       - Precision = 81.56%
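A maximum entropy classifier is equivalent to (multinomial) logistic regression, so a Python stand-in for the Java/OpenNLP model might look like this (a sketch under that assumption, not the team's code):

```python
from sklearn.linear_model import LogisticRegression

# Logistic regression as a maximum entropy model; max_iter echoes the
# 1201 GIS iterations listed above.
maxent = LogisticRegression(max_iter=1201)
# maxent.fit(X_train, y_train)
# maxent.score(X_test, y_test)   # the team reports 81.56% precision
```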
  10. Cross Validation
     - Why is it needed?
       - If we do something to improve performance (voting, feature selection, etc.),
         how can we know which variant is better than another?
     - What about training on all the training data and then testing on that same data?
       - That is not sufficient, because the test data would already contain all the answers.
     - Cross validation
       - Set aside some fraction of the known data and use it to test the prediction performance of a hypothesis induced from the remaining data.
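The fold-by-fold results on slide 12 (ten partial scores of 3200 examples each over 32000) suggest 10-fold cross validation; a minimal sketch of that split:

```python
from sklearn.model_selection import KFold

# Ten folds of 3200 held-out examples each over the 32000 training examples.
kf = KFold(n_splits=10)
# for train_idx, test_idx in kf.split(X):        # X, y, model are hypothetical
#     model.fit(X[train_idx], y[train_idx])
#     correct += (model.predict(X[test_idx]) == y[test_idx]).sum()
```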
  11. Feature Selection
     - If we use more and more features, do we get higher precision?
       - Some features help to make a decision, but some do not.
       - Moreover, some features can hinder the decision.
     - If we do not use such bad features, we can get better performance and shorter training time.
  12. Feature Selection
     - Experiments on MEM (a sketch of the subset search follows this slide)
       - Using all features
         - Precision: 81.56%
       - Using only 1 feature
         - Precision: 76.05% (baseline)
         - If we always answer "<=50K", we get 76.05%!
       - Using 5 features
         - Precision: 74.2%, 81.5%, 82.9%
       - ...
       - Using all features except the 3rd
         - Precision: 86.95% (best feature set)
     - Improvement: 5.4%
     - Precision using all training data: 87.32%

     MEM using only the 3rd feature (partial results, one per fold):
       2448/3200 = 0.765        2445/3200 = 0.7640625
       2418/3200 = 0.755625     2453/3200 = 0.7665625
       2423/3200 = 0.7571875    2450/3200 = 0.765625
       2410/3200 = 0.753125     2445/3200 = 0.7640625
       2424/3200 = 0.7575       2422/3200 = 0.756875
     Last result: 24338/32000 = 0.760586268320885

     Feature subsets (attribute indices) and cross-validated precision:
       Baseline                  : 24338/32000 = 0.7606
       All                       : 26098/32000 = 0.8156
       ...
       11,12,13                  : 26582/32000 = 0.8307
       4,11,12,13                : 26694/32000 = 0.8342
       2,4,7,11,12,13            : 26840/32000 = 0.8388
       ...
       4,6,11,12,13              : 27477/32000 = 0.8587
       4,8,11,12,13              : 27491/32000 = 0.8591
       ...
       4,6,8,10,11,12,13         : 27516/32000 = 0.8599
       2,4,6,7,8,10,11,12,13     : 27709/32000 = 0.8659
       1,2,4,6,7,8,10,11,12,13   : 27788/32000 = 0.8684
       ...
       All except 3rd            : 27823/32000 = 0.8695
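A sketch of the subset search implied by this table: score candidate feature subsets with cross validation and keep the best (illustrative Python; the slides do not include the team's actual search code):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def score_subset(X, y, features):
    """Cross-validated accuracy of the MaxEnt stand-in on a feature subset
    (features are 1-based attribute indices, as on the slide)."""
    cols = [f - 1 for f in features]
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X[:, cols], y, cv=10).mean()

# all_but_3rd = [i for i in range(1, 15) if i != 3]
# print(score_subset(X, y, all_but_3rd))   # slide reports 0.8695
```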
  13. Voting
     - What should we do when different learners give different results?
       - Voting by democracy (majority vote)
       - Weighted voting
     - Precision of the 3 learners
       - MEM : 27942/32000 = 87.32%
       - NN  : 25708/32000 = 80.34%
       - SVM : 24458/32000 = 76.43%
     - Precision of voting by democracy
       - 27382/32000 = 85.57%
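A minimal majority-vote combiner for the three learners (illustrative; the prediction arrays are hypothetical names):

```python
import numpy as np

def majority_vote(*predictions):
    """Each argument is an array of 0/1 labels; with 3 voters there are no ties."""
    votes = np.stack(predictions)                  # shape: (n_learners, n_samples)
    return (votes.sum(axis=0) * 2 > len(predictions)).astype(int)

# combined = majority_vote(pred_mem, pred_nn, pred_svm)
```

Note that the two weaker learners can outvote the stronger one, which is consistent with the voted precision (85.57%) falling below MEM alone (87.32%).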
  14. Conclusion
     - We got the best result using MEM alone
       - Precision: 27942/32000 = 87.32%
     - Why?
       - We could not use the best feature set for NN and SVM.
         - Precision of SVM using the best features: 29036/32000 = 90.74%
       - We did not run enough experiments on voting.
