
Using support vector machine with a hybrid feature selection method to the stock trend prediction


  1. Using support vector machine with a hybrid feature selection method to the stock trend prediction
     Ming-Chi Lee, Expert Systems with Applications, 2009
     Presenter: Yu Hsiang Huang. Date: 2012-05-17
  2. Outline
     • Introduction
     • Feature selection
     • Research design
     • Experimental results and analysis
     • Conclusion
  3. Introduction
     • Stock market
       – A highly nonlinear dynamic system
     • Applications of AI
       – Expert systems, fuzzy systems, neural networks
       – Back-propagation neural network (BPNN)
         • Predictive power is better than that of the other approaches
         • Requires a large amount of training data to estimate the distribution of input patterns
         • Prone to over-fitting
         • Fully depends on the researcher's experience and knowledge to preprocess the data: relevant input variables, hidden layer size, learning rate, momentum, etc.
  4. Introduction
     • In this paper
       – Support vector machine (SVM)
         • Captures geometric characteristics of the feature space without deriving network weights from the training data
         • Extracts the optimal solution even with a small training set
         • Reaches a global optimum rather than a local optimum
         • Does not over-fit
         • Classification performance is still influenced by the dimension (number of feature variables)
       – Feature selection
         • Addresses the dimensionality-reduction problem by determining the subset of available features that is most essential for classification
         • Hybrid feature selection: filter method + wrapper method → F_SSFS
         • F_SSFS: F-score + supported sequential forward search
       – Optimal parameter search
       – Compare the performance of BPNN and SVM
  5. SVM-based model with F_SSFS (flow diagram)
     Original feature variables → hybrid feature selection:
     • Filter part: feature pruning using F-score → pre-selected features
     • Wrapper part: SSFS algorithm finds the best feature variables → best feature variables
     Data + best feature variables → SVM: training, testing, and evaluating the classification accuracy
  6. Feature selection
     • Filter method
       – No feedback from the classifier
       – Estimates the classification performance by an indirect assessment
         • Distance: reflects how well the classes separate from each other
  7. Feature selection (figure)
  8. Feature selection
     • F-score and supported sequential forward search (F_SSFS)
       – F-score (filter part): original feature variables → calculate the F-score of each feature → sort by F-score → select the top-K features
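The filter step above can be sketched in a few lines. This is a minimal pure-Python illustration of the F-score ranking (the between-class scatter of a feature divided by its within-class scatter, for binary labels +1/-1); the function names `f_score` and `top_k_features` are hypothetical, not from the paper.

```python
# F-score of one feature: how far apart the class means sit,
# relative to the variance inside each class (a larger score
# suggests a more discriminative feature).
def f_score(x, y):
    pos = [v for v, lbl in zip(x, y) if lbl == +1]
    neg = [v for v, lbl in zip(x, y) if lbl == -1]
    mean = sum(x) / len(x)
    mp, mn = sum(pos) / len(pos), sum(neg) / len(neg)
    between = (mp - mean) ** 2 + (mn - mean) ** 2
    within = (sum((v - mp) ** 2 for v in pos) / (len(pos) - 1)
              + sum((v - mn) ** 2 for v in neg) / (len(neg) - 1))
    return between / within

def top_k_features(X, y, k):
    """Rank feature columns of X by F-score and keep the top-K column indices."""
    scores = [f_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```

With a toy matrix where only the first column separates the two classes, `top_k_features(X, y, 1)` keeps column 0, which is the behavior the filter part relies on before the wrapper ever runs.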
  9. SVM-based model with F_SSFS (flow diagram, revisited)
     Original feature variables → hybrid feature selection:
     • Filter part: feature pruning using F-score → pre-selected features
     • Wrapper part: SSFS algorithm finds the best feature variables → best feature variables
     Data + best feature variables → SVM: training, testing, and evaluating the classification accuracy
  10. Feature selection
      • Wrapper method
        – Classifier-dependent: feedback comes from the classifier
          • Evaluates the "goodness" of the selected feature subset directly via the classifier
          • Should intuitively yield better performance
        – Has limited applicability due to the high computational complexity involved
  11. Feature selection
      • F-score and supported sequential forward search (F_SSFS)
        – Supported sequential forward search (SSFS)
          • Plays the role of the wrapper
          • A variation of the sequential forward search (SFS) algorithm, specially tailored to SVM to expedite the feature search
          • Support vectors: training samples other than the support vectors make no contribution to determining the decision boundary
          • Dynamically maintains an active subset as candidates for the support vectors
          • Trains the SVM on the reduced subset rather than the entire training set, at a lower computational cost
  12. Feature selection
      • F-score and supported sequential forward search (F_SSFS)
        – SSFS input: an N × k feature matrix (rows r1 … rN, features f1 … fk), each row tagged with a class label (+ / −)
  13. Feature selection
      • F-score and supported sequential forward search (F_SSFS)
        – SSFS (figure: iteration 1 → iteration n+1 → termination)
  14. Feature selection
      • F-score and supported sequential forward search (F_SSFS)
        – F_SSFS
          • Uses the F-score measure to pre-select candidate feature subsets
          • Uses the SSFS algorithm to select the final best feature subset
          • Reduces the number of features that have to be tested through SVM training
          • Avoids the unnecessary computation time the wrapper method would spend testing non-informative features
  15. Research design
      • Data collection and preprocessing
        – Prediction target: the direction of change in the daily NASDAQ index
        – Index futures lead the spot index
        – 30 technical indices are used as the whole feature set: 20 futures contracts, 9 spot indexes, and the 1-day-lagged NASDAQ index
        – "1" and "-1" denote whether the next day's index is higher or lower than today's
        – From Nov 8, 2001 to Nov 8, 2007, with 1065 observations per feature
        – The original data are scaled into the range (0, 1)
        – Result: a 1065 × 30 feature matrix (f1 … f30) with labels 1 / -1
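The two preprocessing steps on this slide (scaling each feature and deriving the next-day direction label) can be sketched as follows. This is an illustrative assumption of how the paper's description translates to code: `closes` is a hypothetical series of daily index values, and the exact (0, 1) scaling convention is not specified in the slide, so a simple min-max mapping is used.

```python
# Min-max scale a feature series toward the unit interval. The small
# eps keeps the maximum strictly below 1 and avoids division by zero
# for a constant series (the minimum still maps to exactly 0).
def minmax_scale(values, eps=1e-9):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo + eps) for v in values]

# Label each day +1 if the next day's close is higher, else -1
# (one fewer label than observations, since the last day has no "next").
def direction_labels(closes):
    return [+1 if nxt > cur else -1 for cur, nxt in zip(closes, closes[1:])]
```

For example, `direction_labels([100.0, 102.0, 101.0, 103.0])` yields `[1, -1, 1]`, matching the slide's "1 / -1" coding of the next-day direction.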
  16. Research design (figure)
  17. Research design (figure)
  18. SVM-based model with F_SSFS (flow diagram, revisited)
      Original feature variables → hybrid feature selection:
      • Filter part: feature pruning using F-score → pre-selected K features
      • Wrapper part: SSFS algorithm finds the best feature variables → best feature variables
      Data + best feature variables → SVM: training, testing, and evaluating the classification accuracy
  19. Experimental results and analysis
      • Experimental results of F_SSFS
        – The threshold K determines how many features are kept after filtering
          • If K equals the number of original features, the filter part contributes nothing
          • If K equals 1, the wrapper part is unnecessary
  20. Experimental results and analysis (figure)
  21. Experimental results and analysis
      • Experimental results of F_SSFS – wrapper part
        – With K = 22, after the wrapper part 17 features remain, with an average accuracy rate of 81.7%
  22. Experimental results and analysis (figure)
  23. Experimental results and analysis
      • Experimental results of SVM
      • Experimental results of BPNN
  24. Experimental results and analysis
      • Experimental results of feature selection
        – A key deficiency of neural-network models for stock trend prediction is the difficulty of selecting discriminative features and explaining the rationale behind the prediction
        – Relative importance of each feature
  25. Conclusion
      – Stock trend prediction using a support vector machine with the hybrid feature selection method (F_SSFS)
      – Reduces the high computational cost and the risk of over-fitting
      – Future work: determining the optimal values of the SVM parameters for the best prediction performance
      – Future work: studying the generalization of SVM with respect to the appropriate training set size, and giving a guideline for measuring generalization performance
