Transcript

  • 1. Machine Learning Contest Team2: 정하용, JC Bazin (제이시), 강찬구
  • 2. Contents
    • Introduction
    • Machine Learning Algorithms
      • Neural Network
      • Support Vector Machine
      • Maximum Entropy Model
    • Feature Selection
    • Voting
    • Conclusion
  • 3. Introduction
    • In this team project, we were asked to develop a program that predicts whether a person's income exceeds 50K per year.
    • The objective of the contest
      • A good result..!!
  • 4. Machine Learning Algorithms
    • For this project, we used three different learning algorithms:
      • Neural Network
      • Support Vector Machine
      • Maximum Entropy Model
  • 5. Neural network
    • We used the MATLAB NN toolbox with a feed-forward network trained by back-propagation.
    • Format Transformation
      • Only numeric inputs are allowed.
      • The data are transformed into integers by assigning to each value its position in the attribute's list of possible values, as sketched below.
      • E.g.:
      • Race: ( White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black )
      • 2 is assigned to “Asian-Pac-Islander”
      • The unknown value “?” is assigned -1
      • All other data are positive values
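    A minimal sketch of this encoding in Python (the attribute list and the encode helper are illustrative, not the team's actual conversion script):

      # Map a categorical value to its 1-based position in the attribute's
      # value list; "?" (unknown) becomes -1, numeric fields pass through.
      RACE = ["White", "Asian-Pac-Islander", "Amer-Indian-Eskimo", "Other", "Black"]

      def encode(value, values):
          value = value.strip()
          if value == "?":
              return -1                        # unknown information
          if value.lstrip("-").isdigit():
              return int(value)                # already numeric (age, hours, ...)
          return values.index(value) + 1       # position in the attribute list

      print(encode("Asian-Pac-Islander", RACE))  # -> 2
      print(encode("?", RACE))                   # -> -1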
  • 6. Neural network
    • Parameters
      • The number of neurons in the hidden layer may be:
        • equal to the number of neurons in the input layer (Wierenga and Kluytmans, 1994),
        • equal to 75% of the number of neurons in the input layer (Venugopal and Baets, 1994),
        • equal to the square root of the product of the number of neurons in the input and output layers (Shepard, 1990).
      • The activation function was chosen by trying the three most common ones: logsig, tansig and purelin.
    • Result
      • Precision = 80.34%
    Configuration of the NN:
      - number of hidden layers: 1
      - number of neurons in the hidden layer: 3
      - number of neurons in the input layer: 14
      - number of neurons in the output layer: 1
      - activation functions: tansig and purelin
      - epochs: 1000
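    As a quick check of the three sizing heuristics above (a sketch assuming the 14 inputs and 1 output listed in the configuration):

      from math import sqrt

      n_in, n_out = 14, 1                   # input/output sizes from this slide

      print(n_in)                           # Wierenga & Kluytmans -> 14
      print(round(0.75 * n_in))             # Venugopal & Baets    -> 10 (10.5 rounded)
      print(round(sqrt(n_in * n_out)))      # Shepard              -> 4

    The chosen configuration (3 hidden neurons) is closest to Shepard's rule.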
  • 7. Support Vector Machine
    • LIBSVM (Library for Support Vector Machines)
    • Format Transformation
      • The format of training and testing data file is:
        • <label> <index1>:<value1> <index2>:<value2> ...
      • A conversion script (sketched below) turns the original data set into this required format.
        • It reorders the fields and maps categorical attributes to numbers.
        • Old format
          • 50, Private, 138179, Assoc-acdm, 12, Married-civ-spouse, Craft-repair, Husband, White, Male, 0, 1902, 40, United-States, >50K
        • New format (directly applicable to svm-train and svm-predict)
          • 0 1:50 2:0 3:138179 4:5 5:12 6:0 7:1 8:2 9:0 10:1 11:0 12:1902 13:40 14:0
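    A hedged sketch of such a converter; the category-to-number tables here are illustrative placeholders (only the team's actual tables reproduce every index in the example above), and the label encoding >50K -> 0 follows that example:

      # Illustrative category tables; the real script has one per attribute.
      CATEGORY_MAPS = {
          2: {"Private": 0, "Self-emp-not-inc": 1},                           # workclass (partial)
          9: {"Amer-Indian-Eskimo": 0, "Asian-Pac-Islander": 1, "White": 2},  # race (partial)
      }
      LABELS = {">50K": 0, "<=50K": 1}

      def to_libsvm(line):
          *attrs, label = [f.strip() for f in line.split(",")]
          pairs = []
          for i, v in enumerate(attrs, start=1):
              if v.lstrip("-").isdigit():
                  num = int(v)                              # numeric attribute
              else:
                  num = CATEGORY_MAPS.get(i, {}).get(v, 0)  # categorical lookup
              pairs.append(f"{i}:{num}")
          return f"{LABELS[label]} " + " ".join(pairs)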
  • 8. Support Vector Machine
    • Parameters
      • SVM-type = C-SVC
        • Parameter C = 1
      • Kernel function = radial basis function
        • Degree in kernel function = 3
        • Gamma in kernel function = 1/k (k = number of attributes)
        • Coefficient0 in kernel function = 0
      • Epsilon in loss function = 0.1
      • Tolerance of termination criterion = 0.001
      • Shrinking = 1
      • Class weight (parameter C of class i is weight*C) = 1
    • Results
      • Precision = 76.43%
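    These are LIBSVM's default settings. A hedged sketch of training and prediction with the Python bindings bundled with LIBSVM (file names are placeholders; gamma = 1/14 for the 14 attributes):

      from svmutil import svm_read_problem, svm_train, svm_predict

      # Data already converted to the <label> <index>:<value> format above.
      y_train, x_train = svm_read_problem("adult.train")
      y_test, x_test = svm_read_problem("adult.test")

      # -s 0: C-SVC, -t 2: RBF kernel, -c 1, -g 1/k, -e 0.001, -h 1: shrinking
      model = svm_train(y_train, x_train, "-s 0 -t 2 -c 1 -g 0.0714 -e 0.001 -h 1")
      labels, accuracy, values = svm_predict(y_test, x_test, model)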
  • 9. Maximum Entropy Model
    • Language : Java
    • Library: OpenNLP MaxEnt-2.2.0
    • Parameters (iterations per training algorithm)
      • GIS = 1201
      • IIS = 923
      • Steepest ascent = 212
      • Conjugate gradient (FR) = 74
      • Conjugate gradient (PRP) = 63
      • Limited-memory variable metric = 70
    • Results
      • Precision = 81.56%
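    For reference, a maximum entropy model is log-linear: p(y|x) is proportional to exp(sum_i lambda_i * f_i(x, y)). A minimal sketch of the decision rule (the features and weights are illustrative, not values from the trained model):

      from math import exp

      def maxent_predict(features, weights, outcomes):
          # Unnormalized score exp(sum of lambdas) for each outcome.
          scores = {y: exp(sum(weights.get((f, y), 0.0) for f in features))
                    for y in outcomes}
          z = sum(scores.values())                 # normalization constant
          probs = {y: s / z for y, s in scores.items()}
          return max(probs, key=probs.get), probs

      weights = {("education=Assoc-acdm", ">50K"): 0.8,
                 ("age=50", ">50K"): 0.3,
                 ("education=Assoc-acdm", "<=50K"): 0.1}
      print(maxent_predict(["education=Assoc-acdm", "age=50"],
                           weights, [">50K", "<=50K"]))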
  • 10. Cross Validation
    • Why is it needed?
      • If we do something to improve performance (voting, feature selection, etc.),
      • how can we know which variant is better than another?
    • What about training on all the training data and then testing on the same data?
      • That is not a fair estimate, because the training data already contains all the answers.
    • Cross Validation (sketched below)
      • Set aside some fraction of the known data and use it to test the prediction performance of a hypothesis induced from the remaining data.
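    The partial results on the feature-selection slide (ten blocks of 3200 examples out of 32000) suggest 10-fold cross validation; a minimal sketch, with train and evaluate as placeholders for any of the three learners:

      def cross_validate(data, train, evaluate, k=10):
          # Average held-out accuracy over k equally sized folds.
          fold = len(data) // k
          total = 0.0
          for i in range(k):
              held_out = data[i * fold:(i + 1) * fold]
              training = data[:i * fold] + data[(i + 1) * fold:]
              model = train(training)
              total += evaluate(model, held_out)   # fraction correct on the fold
          return total / k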
  • 11. Feature Selection
    • If we use more and more features, do we get higher precision?
      • Some features help to make a decision, but some do not.
      • Moreover, some features actively disturb the decision.
    • If we drop such bad features, we get better performance and shorter training time.
  • 12. Feature Selection
    • Experiments on MEM
      • Using all features
        • Precision: 81.56%
      • Using only 1 feature
        • Precision: 76.05% (baseline)
        • If we always answer “<=50K”, we can get 76.05%..!!
      • Using 5 features
        • Precision: 74.2%, 81.5%, 82.9%
      • ...
      • Using all features except the 3rd feature
        • Precision: 86.95% (best features)
    • Improvement: 5.4%
    • Precision using all training data : 87.32%
    =============================
    MEM using only the 3rd feature
    Partial results (10 folds):
      2448/3200 = 0.765
      2445/3200 = 0.7640625
      2418/3200 = 0.755625
      2453/3200 = 0.7665625
      2423/3200 = 0.7571875
      2450/3200 = 0.765625
      2410/3200 = 0.753125
      2445/3200 = 0.7640625
      2424/3200 = 0.7575
      2422/3200 = 0.756875
    Last result: 24338/32000 = 0.760586268320885
    =============================
    Baseline                 : 24338/32000 = 0.7606
    All                      : 26098/32000 = 0.8156
    …
    11,12,13                 : 26582/32000 = 0.8307
    4,11,12,13               : 26694/32000 = 0.8342
    2,4,7,11,12,13           : 26840/32000 = 0.8388
    …
    4,6,11,12,13             : 27477/32000 = 0.8587
    4,8,11,12,13             : 27491/32000 = 0.8591
    …
    4,6,8,10,11,12,13        : 27516/32000 = 0.8599
    2,4,6,7,8,10,11,12,13    : 27709/32000 = 0.8659
    1,2,4,6,7,8,10,11,12,13  : 27788/32000 = 0.8684
    …
    All except 3rd           : 27823/32000 = 0.8695
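    The slide does not state the exact search strategy behind these subsets; a hedged sketch of a simple leave-one-feature-out pass, with cv_score a placeholder for the cross-validated precision of MEM on a feature subset:

      def leave_one_out_features(all_features, cv_score):
          # Score each subset that drops exactly one feature; a score above
          # the all-features baseline means the dropped feature was hurting.
          baseline = cv_score(all_features)
          results = {}
          for f in all_features:
              subset = [g for g in all_features if g != f]
              results[f] = cv_score(subset)
          return baseline, results

      # On this data, dropping feature 3 raised precision from 0.8156 to 0.8695.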
  • 13. Voting
    • What should we do when different learners give us different results?
      • Voting by democracy (sketched below)
      • Weighted voting
    • Precision of 3 learners
      • MEM : 27942/32000 = 87.32%
      • NN : 25708/32000 = 80.34%
      • SVM : 24458/32000 = 76.43%
    • Precision of Voting by democracy
      • 27382/32000 = 85.57%
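    A minimal sketch of the democratic vote over the three learners' outputs (labels illustrative):

      from collections import Counter

      def majority_vote(predictions):
          # One predicted label per learner, e.g. [mem, nn, svm].
          return Counter(predictions).most_common(1)[0][0]

      print(majority_vote([">50K", "<=50K", ">50K"]))  # -> ">50K"

    With only three learners, the two weaker models (NN and SVM) can outvote MEM whenever they agree, which helps explain why the vote (85.57%) trails MEM alone (87.32%).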
  • 14. Conclusion
    • We got the best result using MEM alone
      • Precision: 27942/32000 = 87.32%
    • Why?
      • We couldn't use the best features for NN and SVM
        • Precision of SVM using the best features: 29036/32000 = 90.74%
      • We didn't run further experiments on voting (e.g., weighted voting)