The document presents a methodology for predicting stock market prices using support vector machine regression (SVR) with different windowing techniques. It involves collecting historical stock market data, preprocessing the data with several windowing approaches that convert the time series into a supervised learning format, training SVR models on the windowed data with different parameters, and evaluating the models' ability to predict stock prices on testing data. The results show that the rectangular and flatten windows achieved low prediction errors (MAPE) against the actual stock prices in the testing period, while the de-flatten window produced the highest errors.
Predicting Stock Market Price Using Support Vector Regression
Master’s Thesis Presentation
• Title: Predicting Stock Market Price Using Support Vector Machine with Different Kinds of Windows
• By: Risul Islam Rasel (Student ID: 5407011866512, Major Field: I-MIT)
• Advisor: Assoc. Prof. Dr. Phayung Meesad
Copyright@IT.KMUTNB
1. Agenda
Introduction
- Purpose of the study
- Scope of the study
Literature review
- Time series Prediction
- SVR
- Windowing
- Some recent research work
Experiment design
- Data collection
- Data preprocess
- Work flow diagram
- Model Tree structure
Results
- windowing parameter values
- SVR kernel function parameter value
- model result analysis
- Error calculation
Conclusion
Introduction
• Stock exchange:
- is an emerging business sector that is becoming more popular among people.
- many people and organizations are involved in this business.
- gaining insight into the market trend has become an important factor.
• Stock trend or price prediction is regarded as a challenging task because the market is:
- essentially non-linear and non-parametric
- noisy
- a deterministically chaotic system
• Why a deterministically chaotic system?
- Liquid money and stock adequacy
- Human behavior and news related to the stock market
- Share gambling
- Money exchange rate, etc.
Purpose of the study
• To propose a stock market time series prediction model combining support vector machine regression (SVR) and a windowing operator.
• To apply the proposed model to different stock market historical data sets.
• To evaluate the model’s prediction results against real-time data sets from stock markets in order to measure the prediction accuracy.
Scope of the study
• To develop a model that can raise early warnings of financial crises in the stock market and give insight into the current trend of the market.
• To propose a model that can be applied to different stock indexes in order to predict stock prices. For this study, data were collected from the Dhaka Stock Exchange (DSE), Bangladesh, so that the research results can be compared and evaluated. Four years (2009-2012) of historical data were collected and separated into two groups: a training data set (2009-2011) and a testing data set (2012).
• To compare the prediction results across stock market time series data from different stock indexes in order to evaluate the performance.
2. Literature review
Time Series Prediction
• A time series is a sequence of data points, typically measured at successive points in time spaced at uniform intervals.
• Examples of time series are the daily closing values of a stock index, daily exchange rates, daily rainfall, river flow volume, etc.
• A time series analysis consists of two steps:
(1) Building a model that represents a time series, and
(2) Using the model to predict (forecast) future values.
• Forecasting systems are usually fed the time series members of the last several days, and the next day's closing price is obtained at the system output, i.e.
Close[t-n], Close[t-n+1], …, Close[t-1], Close[t] → Close[t+1]
Support Vector Machine Regression
• Support vector machine (SVM) is a novel artificial intelligence-based method developed from statistical learning theory.
• SVM has two major features: classification (SVC) and regression (SVR).
• In SVM regression, the input is first mapped onto an m-dimensional feature space using some fixed (nonlinear) mapping, and then a linear model is constructed in this feature space.
• A margin of tolerance (epsilon) is set in the approximation.
• This type of function is often called the epsilon-insensitive loss function.
• Slack variables are used to overcome noise in the data and non-separability.
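The epsilon margin mentioned above can be illustrated with a minimal sketch of the epsilon-insensitive loss for a single prediction. The function name and the sample prices are assumptions made for illustration; they do not come from the slides.

```python
def epsilon_insensitive_loss(actual, predicted, epsilon):
    """Return 0 when the prediction lies within +/- epsilon of the actual
    value, otherwise the distance by which it falls outside that margin."""
    return max(0.0, abs(actual - predicted) - epsilon)


# Hypothetical prices, chosen only for illustration.
inside = epsilon_insensitive_loss(500.0, 501.5, epsilon=2.0)   # within the margin
outside = epsilon_insensitive_loss(500.0, 505.0, epsilon=2.0)  # 3.0 beyond the margin
print(inside, outside)
```

Errors smaller than epsilon cost nothing, which is why small noise in the training prices does not move the fitted function.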
Figure 1: Stock Market Time Series (x-axis: Time (Days), 1/1/2012 to 1/9/2012; y-axis: Value (Price); series: open, high, low, close)
Figure 2: Time Series Prediction Process
3. Windowing operator:
• Transforms the time series data into a generic data set.
• Converts the last row of a window within the time series into a label or target variable.
• Feeds the cross-sectional values as inputs to a machine learning technique such as linear regression, neural network, support vector machine, and so on.
• Parameters:
Horizon (h)
Window size
Step size
Training window width
Testing window width
Figure 3: Linear and Nonlinear SVR
• Normal rectangular windowing
• The single attribute close is selected, the window size is 3, and the horizon size is 1.
• So, the windowed attributes close-0, close-1, and close-2 are generated.
• label = (WS + Hz)th value = (3 + 1) = 4th value
Figure 4: converting time series to windowed data
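The rectangular windowing described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function name `rectangular_windows`, the chronological ordering of each window, and the sample close prices are hypothetical and do not reproduce the exact operator used in the study.

```python
def rectangular_windows(series, window_size=3, horizon=1, step_size=1):
    """Turn a univariate series into (window, label) rows.

    Each window holds `window_size` consecutive values in chronological
    order; the label is the value `horizon` steps after the window ends,
    i.e. the (window_size + horizon)-th value counted from the window start.
    """
    if window_size < 1 or horizon < 1 or step_size < 1:
        raise ValueError("window_size, horizon and step_size must be >= 1")
    rows = []
    last_start = len(series) - window_size - horizon
    for start in range(0, last_start + 1, step_size):
        window = series[start:start + window_size]
        label = series[start + window_size + horizon - 1]
        rows.append((window, label))
    return rows


# Hypothetical close prices, for illustration only.
closes = [533.1, 516.5, 508.8, 514.5, 508.7]
for window, label in rectangular_windows(closes, window_size=3, horizon=1):
    print(window, "->", label)
```

With window size 3 and horizon 1, the first row uses the first three values as inputs and the 4th value as the label, matching the rule on the slide.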
4. • Flatten windowing
• First, it removes all attributes lying between the time point zero (attribute name ending "0") and the time point before the horizon values.
• Second, it transforms the corresponding time point zero of the specified label stem to the actual label.
• Last, it re-represents all values relative to the last known time value for each original dimension, including the label value.
* Since horizon = 1, close-1 is all 0 and close-0 is deleted.
* New close-2 = old close-2 - old close-1.
* New close-0 = old close-0 - old close-1 (since horizon = 1).
* Since old close-0 is selected as base_value, base value = old close-0 - new close-0.
• De-flatten windowing
• It adds the values of the base value special attribute to both the label and the predicted label (if available), so the original data (range) is restored.
• After that, it removes the base value special attribute.
* label + close-0 = label_original (since close-0 = base_value)
* close-0 is removed, since it was selected as the base value.
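The arithmetic in the flatten and de-flatten notes above can be sketched as follows. This is a simplified reading of the slide, not the exact operator: it assumes the base value is the last known value in the window, and the function names and sample numbers are hypothetical.

```python
def flatten_window(window, label):
    """Re-express a window and its label relative to the last known value.

    Assumption: the last element of `window` is the last known value and
    serves as the base value. Its own relative value is always 0, so it is
    dropped from the returned inputs.
    """
    base_value = window[-1]
    relative_inputs = [value - base_value for value in window[:-1]]
    relative_label = label - base_value
    return relative_inputs, relative_label, base_value


def deflatten_label(relative_label, base_value):
    """Restore the original price range by adding the base value back."""
    return relative_label + base_value


# Hypothetical values: two known closes and the next close as the label.
inputs, rel_label, base = flatten_window([500.0, 510.0], label=505.0)
print(inputs, rel_label, base)
print(deflatten_label(rel_label, base))
```

The same `deflatten_label` step would be applied to a predicted relative label to bring the prediction back to the price scale.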
Figure panels: Original time series data; Normal windowed data set; Flatten windowed data set; De-flatten windowed data set
Some recent research works
1. “Stock Forecasting Using Support Vector Machine,”
• Authors: Lucas K. C. Lai, James N. K. Liu
• Applied technique: SVM and NN
• Data preprocess technique: Exponential Moving Average (EMA15) and relative difference in percentage of price (RDP)
• Domain: Hong Kong Stock Exchange
2. “Stock Index Prediction: A Comparison of MARS, BPN and SVR in an Emerging Market,”
• Authors: Lu, Chi-Jie; Chang, Chih-Hsiang; Chen, Chien-Yu; Chiu, Chih-Chou; Lee, Tian-Shyug
• Applied technique: Multivariate adaptive regression splines (MARS), back propagation neural network (BPN), support vector regression (SVR), and multiple linear regression (MLR)
• Domain: Shanghai B-share stock index
3. “An Improved Support Vector Regression Modeling for Taiwan Stock Exchange Market Weighted Index Forecasting,”
• Authors: Kuan-Yu Chen, Chia-Hui Ho
• Applied technique: SVR, GA, Auto regression (AR)
• Domain: Taiwan Stock Exchange
5. Company
Open
High
Low
Close
Volume
January 25, 2009
776
790
767
781.25
2,600.00
1STICB
January 25, 2009
5100
5100
5100
5100
10.00
2NDICB
Methodology
Date
1STBSRS
January 25, 2009
1782
1782
1781
1781.5
15.00
4THICB
1040
1010
1031.5
60.00
1005
1006
999
1001.5
60.00
January 25, 2009
543.75
544
520
524
2,840.00
7THICB
January 25, 2009
710
710
680
702.75
250.00
8THICB
January 25, 2009
495
504
495
498.5
250.00
ABBANK
The experiment dataset was collected from the Dhaka Stock Exchange (DSE), Bangladesh.
Four years of historical data (January 2009 to June 2012) were collected.
About 522 companies are listed on the DSE, but for the convenience of the experiment, data from only one well-known company were selected.
The dataset had 6 attributes: date, open price, high price, low price, close price, and volume.
All attributes except volume (5 attributes) were used in the experiment.
The data cover 822 days in total: 700 days were used as the training dataset and 122 days as the testing dataset.
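The 700/122 division above is a chronological split, which can be sketched as below. The helper name `chronological_split` is hypothetical, and integers stand in for the daily records.

```python
def chronological_split(rows, n_train):
    """Split time-ordered rows into an earlier training part and a later
    testing part, without shuffling."""
    if not 0 <= n_train <= len(rows):
        raise ValueError("n_train must be between 0 and len(rows)")
    return rows[:n_train], rows[n_train:]


# Placeholder records: one integer per trading day.
days = list(range(822))
train, test = chronological_split(days, n_train=700)
print(len(train), len(test))  # 700 122
```

Keeping the order intact ensures that the testing period lies strictly after the training period.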
1040
January 25, 2009
6THICB
Data collection
January 25, 2009
5THICB
January 25, 2009
759.75
759.75
732.5
734.25
34,740.00
289,100.00
ACI
January 25, 2009
518
538.5
514
533.1
AFTABAUTO
January 25, 2009
418.5
448
418.5
442.25
155,090.00
AGNISYSL
January 25, 2009
65
66.1
63
63.3
264,500.00
AIMS1STMF
January 25, 2009
15.05
15.15
14.82
14.89
2,417,500.00
ALARABANK
January 25, 2009
419
419.25
410
410.75
22,750.00
AMBEEPHA
January 25, 2009
132
139.9
130
138.3
50,800.00
AMCL(PRAN)
January 25, 2009
1150
1203
1145
1181
6,960.00
APEXADELFT
January 25, 2009
2220
2250
2185
2200.25
7,220.00
APEXFOODS
January 25, 2009
881.25
907
874
885
4,645.00
APEXSPINN
January 25, 2009
472
500
472
484.25
380.00
APEXTANRY
January 25, 2009
1060
1083
1042
1051.5
37,690.00
3,950.00
ARAMIT
17
Company
Date
Open
High
Low
January 25, 2009
518
538.5
514
January 26, 2009
538.5
541
515
January 27, 2009
515
519
507.1
January 28, 2009
509
519.9
509
January 29, 2009
515.5
521.9
506.2
February 1, 2009
504.2
512
504
February 2, 2009
511
512.9
497
498.5
ACI
February 3, 2009
500
500
489.1
February 4, 2009
504
505.9
492.1
495.2
ACI
February 5, 2009
495.2
500.9
490.4
493.5
ACI
February 8, 2009
508
508
485
354.9
335
339.8
76,950.00
489.3
ACI
February 9, 2009
489
491
471.1
February 10, 2009
472.1
485.9
463
481.4
ACI
February 11, 2009
488
488
460
463.8
ACI
February 12, 2009
477
485
469
February 15, 2009
485
490
474
475.3
ACI
February 16, 2009
476
479.5
470
472.3
ACI
February 17, 2009
473
485
472.1
477.1
ACI
February 18, 2009
484
500
483
494
ACI
February 19, 2009
499.9
502
478
483.5
ACI
February 22, 2009
479
484.9
472.2
478.5
ACI
February 23, 2009
480
488
476.2
479.2
ACI
February 25, 2009
475
477.9
463
Data pre-process
(windowing)
481.5
ACI
Training phase
475.6
ACI
350
493.3
ACI
9,100.00
January 25, 2009
505.3
ACI
220.5
508.7
ACI
268.6
215.5
514.5
ACI
267.1
225
508.8
ACI
270.1
215.5
516.5
ACI
267.2
January 25, 2009
533.1
ACI
January 25, 2009
Close
ACI
ASIAPACINS
ATLASBANG
Training phase
Step 1: Read the training dataset from the local repository.
Step 2: Apply the windowing operator to transform the time series data into a generic dataset. This step converts the last row of a window within the time series into a label or target variable; the last variable is treated as the label.
Step 3: Run a cross-validation process on the labeled examples produced by the windowing operator in order to feed them as inputs into the SVR model.
Step 4: Select the kernel type and the special parameters of SVR (C, epsilon (+/-), g, etc.).
Step 5: Run the model and observe the performance (accuracy).
Step 6: If the performance accuracy is good, go to step 7; otherwise go to step 4.
Step 7: Exit from the training phase and apply the trained model to the testing dataset.
Testing phase
Step 1: Read the testing dataset from the local repository.
Step 2: Apply the trained model to the out-of-sample dataset for price prediction.
Step 3: Produce the predicted trends and stock price.
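Steps 4 to 6 of the training phase form a select, run, and check loop, which can be sketched as below. This is only an illustration: `evaluate` is a hypothetical callback that would train an SVR with the given parameters and return its validation error, `toy_error` is a made-up stand-in for it, and the candidate values and `target_error` are assumptions.

```python
from itertools import product


def select_parameters(evaluate, c_values, g_values, epsilon_values, target_error):
    """Try parameter combinations until the error is acceptable.

    Returns (best_error, best_params) for the best combination seen.
    """
    best_error, best_params = None, None
    for c, g, epsilon in product(c_values, g_values, epsilon_values):
        error = evaluate(c, g, epsilon)          # Step 5: run and observe
        if best_error is None or error < best_error:
            best_error = error
            best_params = {"C": c, "g": g, "epsilon": epsilon}
        if error <= target_error:                # Step 6: good enough, stop
            break
    return best_error, best_params


def toy_error(c, g, epsilon):
    """Made-up error surface used only to exercise the loop; not a real SVR."""
    return abs(g - 1.0) + abs(epsilon - 2.0) + 100.0 / c


best_error, best_params = select_parameters(
    toy_error,
    c_values=[100, 10000],
    g_values=[0.1, 1.0],
    epsilon_values=[1.0, 2.0],
    target_error=0.05,
)
print(best_error, best_params)
```

In a real run, `evaluate` would wrap the windowed training data and the validation scheme, so the loop itself stays independent of the SVR library.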
469.1
Machine learning
(SVR)
Testing phase
6. Results
Data pre-process & optimized input selection
Windowing name | Model | Window size | Step size | Training window | Testing window
Rectangular | All | 3 | 1 | 30 | 30
Flatten window | 1 day | 3 | 1 | 30 | 30
Flatten window | 5 days | 8 | 1 | 30 | 30
Flatten window | 22 days | 25 | 1 | 30 | 30
De-Flatten window | All | 5 | 1 | 30 | 30
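SVR Kernel function parameter settings
SVR Model | Kernel | C | g | ε | ε+ | ε-
1 Day a-head | RBF | 10000 | 1 | 2 | 1 | 1
5 Days a-head | RBF | 10000 | 1 | 2 | 1 | 1
22 Days ahead | RBF | 10000 | 1 | 2 | 1 | 1
Month | 1 day a-head: Actual, Predicted, Error | 5 days a-head: Actual, Predicted, Error | 22 days a-head: Actual, Predicted
Jan'12 | 3846.7, 3924.9, -78.2 | 2973.6, 3115.8, -142.2 | ---, ---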
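The RBF kernel named in the kernel parameter settings above can be written out as a short sketch. It assumes that g plays the usual role of the width parameter in exp(-g * ||x - y||^2); the slides do not spell out the formula, and the function name and sample windows are hypothetical.

```python
import math


def rbf_kernel(x, y, g):
    """RBF kernel value exp(-g * squared Euclidean distance) for two
    equal-length feature vectors (for example, two windows of close prices)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * squared_distance)


# Hypothetical windows of scaled prices, for illustration only.
print(rbf_kernel([0.50, 0.52, 0.51], [0.50, 0.52, 0.51], g=1.0))  # identical windows give 1.0
print(rbf_kernel([0.50, 0.52, 0.51], [0.60, 0.58, 0.62], g=1.0))  # value between 0 and 1
```

Identical windows give a kernel value of 1, and the value decays toward 0 as the windows move apart, faster for larger g.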
3310.9
4252.3
-941.4
3310.9
3849.9
-539.0
2976.2
3387.5
-411.3
1025.9
-138.7
---
---
4.4
3310.9
3319.7
-8.8
2976.2
3494.8
-518.6
39.1
4015.8
3899.6
116.2
4015.8
---
3482.5
533.3
5279.5
5242.2
37.3
5279.5
5139.4
140.1
5279.5
4413.4
4417.0
-101.3
4315.7
4601.0
-285.3
3958.6
4321.1
-362.5
Jun'12
3437.0
3447.8
-10.8
2604.1
2651.1
-47.0
3437.0
4417.0
-980.0
Jan'12
3410.7
6680.7
-3270.0
2551.7
4721.2
-2169.5
---
---
---
Feb'12
3310.9
9756.1
-6445.2
3310.9
12826.9
-9516.0
5506.4
2981.4
4015.8
7778.7
-3762.9
4015.8
8705.9
-4690.1
4015.8
10189.4
Actual Close (A)
Actual Close (A)
-6173.6
5279.5
10330.9
-5051.4
5279.5
8832.3
-3552.8
5279.5
8072.5
-2793.0
4315.7
9527.1
-5211.4
4315.7
8535.5
-4219.8
4315.7
8890.2
-4574.5
Jun'12
3437.0
10381.4
-6944.4
3437.0
13218.6
-9781.6
3437.0
9310.5
-5873.5
Apr'12
May'12
Predicted close (P)
Predicted close (P)
22 days a-head model's result for DSE using Flatten window
2525.0
Mar'12
Days
Days
866.1
4315.7
Close Price (BDT)
Apr'12
May'12
6/3/2012
3112.3
3306.5
3976.7
6/17/2012
-37.0
3310.9
4015.8
5/6/2012
3883.7
Feb'12
Mar'12
5/20/2012
-863.0
4/8/2012
4300.0
4/22/2012
3437.0
3/25/2012
-567.1
3/11/2012
4004.1
2/26/2012
3437.0
2/12/2012
-1076.2
1/29/2012
4513.2
Close Price (BDT)
3437.0
Jun'12
5 Days a-head model's results for DSE using Flatten window
1 Day a-head model's results for DSE using Flatten window
6/3/2012
-189.7
6/17/2012
4253.6
4505.4
5/6/2012
4015.8
5/20/2012
5279.5
4315.7
4/8/2012
20.9
4/22/2012
224.4
-447.4
2973.6
3994.9
3/25/2012
5055.1
4763.1
3/11/2012
4015.8
2/26/2012
5279.5
4315.7
2/12/2012
-93.6
1/29/2012
97.4
-297.6
3846.7
4109.4
1/1/2012
5182.1
4613.3
Jan'12
4015.8
1/15/2012
735.2
5279.5
4315.7
1/15/2012
3280.6
Apr'12
Close Price (BDT)
Mar'12
Error
May'12
De-Flatten
ε+
2
---
Feb'12
Flatten
ε
1
22 days a-head
Month
Rectangular
g
10000
22 Days ahead
21
C
RBF
5 Days a-head
Kernel
1 Day a-head
Days
Actual Close (A)
Predicted close (P)
7. Result evaluation technique:
Error calculation: Used MAPE
MAPE: Mean Absolute Percentage Error (MAPE) was used to calculate the error rate between actual and predicted price.
$$\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{A_i - P_i}{A_i} \right|$$
Here, A = Actual price, P = Predicted price, n = number of data to be counted
Model | Horizon | Rectangular window | Flatten window | Deflatten window
1 Day a-head | 1 | 0.65 | 0.08 | 6.84
5 days a-head | 5 | 0.48 | 0.19 | 8.28
22 days a-head | 22 | 0.82 | 0.74 | 5.40
Figures: MAPE for Rectangular window; MAPE for Flatten window; MAPE for De-Flatten window (x-axis: Month, Jan to June; y-axis: MAPE (error); series: 1 day a-head, 5 days a-head, 22 days a-head)
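The MAPE calculation used for the error rates can be sketched in a few lines. The function name and the sample prices are hypothetical, and the absolute value follows the standard MAPE definition.

```python
def mape(actual, predicted):
    """Mean absolute percentage error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical prices: errors of 10% and 5% average to roughly 7.5%.
print(mape([100.0, 200.0], [110.0, 190.0]))
```

Because each error is divided by the actual price, MAPE values can be compared across stocks or indexes with very different price levels.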
Conclusion
• Discussions:
- Different windowing functions can produce different prediction results.
- In this study, three types of windowing operators are used: normal rectangular window, flatten window, and de-flatten window.
- Rectangular and flatten windows are able to produce good prediction results for time series data.
- The de-flatten window cannot produce good prediction results.
• Future works:
- Apply other windowing operators.
- Compare the model results with other machine learning techniques.
Compare with other Index data (MAPE by index name)
Window Type | Model | S&P 500 | DSE | IBM
Rectangular | 1 day a-head | 0.65 | 0.65 | 0.02
Rectangular | 5 days a-head | 0.74 | 0.48 | 0.57
Rectangular | 22 days a-head | 1.43 | 0.82 | 3.22
Flatten | 1 day a-head | 0.01 | 0.08 | 0.01
Flatten | 5 days a-head | 0.03 | 0.19 | 0.47
Flatten | 22 days a-head | 0.14 | 0.74 | 0.21
De-Flatten | 1 day a-head | 3.99 | 6.84 | 3.86
De-Flatten | 5 days a-head | … | 8.28 | …
De-Flatten | 22 days a-head | … | 5.40 | …
De-Flatten values for S&P 500 and IBM at 5 and 22 days a-head, in extraction order (cell positions not recoverable): 6.51, 2.45, 5.84, 6.19
** S&P 500 and IBM index data were collected from Google Finance: “http://www.google.com/finance”
8. • Publication
1) P. Meesad, R. I. Rasel. “Dhaka Stock Exchange Trend Analysis Using Support Vector Regression.” In: Advances in Intelligent Systems and Computing 209 (Springer), 2013, pp. 135-143.
2) Phayung Meesad, Risul Islam Rasel. “Stock Market Price Prediction Using Support Vector Regression.” In: 2nd International Conference on Informatics, Electronic and Vision (ICIEV’2013), indexed in IEEE Xplore, pp. 1-6.
• Presentation
1) 9th International Conference in Communication and Information Technology (IC2IT’2013)
2) 2nd International Conference on Informatics, Electronic and Vision (ICIEV’2013).
THANK YOU