SlideShare a Scribd company logo
Financial data mining III
Price movement prediction in Hong
Kong equity market
GROUP LCW1004
Ying Ting Chung
Au Chun Man
Review
• To learn patterns from historical data
– using data mining
– for forecasting future price movement
• HSI (Hang Seng Index), daily prices
Prediction Problem
Semester 1
• Finished data collection and pre-processing
• Binary classification
– Target label: “8-day bottom”
Semester 1
• K-Nearest Neighbour (k-NN)
– Low bias
• Sliding window testing
– Model changes through time
Data Visualization
• Label against various input attribute
• Label distribution through time
• Input Attributes:
• Prediction Target: 5-day bottom
Slope of SMA(22) and EMA(8, 13, 18)
Price Deviation from SMA(9, 22) and EMA(8, 16)
Percentage aggregate return for last 5 and 6 days
Daily gap open
Data Visualization
Parameter Estimation
• Selecting no. of voting neighbours
0.53
0.535
0.54
0.545
0.55
0.555
0.56
0.565
0.57
0.575
0.58
0 5 10 15 20 25 30
K (neighbours)
window size = 45 (days)
Precision
Recall
Parameter Estimation
• Selecting no. of days looking back
– Window size reduced, performance improved?
0.5
0.52
0.54
0.56
0.58
0.6
0.62
0.64
0.66
0.68
0.7
0 20 40 60 80 100 120
window size (days)
K = 3
Precision
Recall
Problem with fixed sliding window
• Given 3-NN majority voting
• Property of target label
– Continually positive
• Fail to learn
– From history
– From input variable
Function of time as attribute
• Recent labels are more important in trend-
dominant market
• New dimension in feature space:
Exponential function of time index
• Recent data vectors pulled closer
to current query point
• i.e. Memory decay
• controversial
Function of time as attribute
0.6
0.62
0.64
0.66
0.68
0.7
0.72
0.74
0 100 200 300 400 500 600
window size (days)
K = 3
Precision
Recall
Function of time as attribute
• New dimension in feature space:
Sine function of time index
• Data vectors at regular interval pulled closer
to each other
• Choice of period?
Function of time as attribute
0.65
0.66
0.67
0.68
0.69
0.7
0.71
0.72
0.73
0.74
0.75
0 100 200 300 400 500 600
window size (days)
K = 3
Precision
Recall
Result
• Looking back: 540 days
• K = 3 neighbours
• Error rate: 24%
• Precision: 0.731; Recall: 0.698
Limitation of label definition
• Consider a use case:
– A trader buys on first occurrence of positive
prediction?
– He needs a smart trading rule
Prediction Problem
Re-consider label definition
• N-bottom + r% gain after N days
– For each time point t, if its closing price is lower
than the low price by m% after N time points, then
time point t is a hold-N-bottom.
• Trading signal
Re-consider label definition
• N-bottom + m% stop-loss within N days
– For each time point t, if its closing price is lower
than the minimum low price within N time points
in the future by m%, then time point t is a stop-N-
bottom.
• More specific, more useful
• Fewer positive labels -> imbalanced dataset
New Result
• Looking back: 120 days
• K = 3 neighbours
• Error rate: 19.6%
• Precision: 0.601; Recall: 0.476
Artificial Neural Networks
• Theory
• Input and Output Selection
• Experiment
Artificial Neural Networks - Theory
Output units yp
Input units xn
Hidden units hm
Back Propagation Algorithm
• Step 1 : forward pass
• Step 2 : backward pass
• Step 3 : weight updates
Back Propagation Algorithm –
Step 1 :forward pass
• set of input vectors:
• set of output vectors:
.
Back Propagation Algorithm –
Step 1 :forward pass
• f(.) is a sigmoid activation function
x
e
xsigmoid 


1
1
)(
Back Propagation Algorithm –
Step 2 :backward pass
• Sum of squared errors:
Gradient Descent
• To minimize E
• partial derivation iW
E


Gradient Descent – local minimum
Step 3 : weight updates
• Weight updating:
• learning rate : η
• momentum factor : α
Learning Rate & Momentum
Limitations
• requires lots of supervised training, with lots
of input-output examples
• there is no guarantee that the system will
converge to an acceptable solution
~local minimum
Experiment
• The learning rate :0.25 to 0.5 ,and
• Momentum set to WEKA’s default:0.2
• the size of training set: 500 to 400000.
Experiment
• Confusion matrix
TNFPNegative
FNTPPositive
NegativePositive
True
Predicted
Experiment - Result
Size of
training set
500 5000 10000 20000 30000 400000
Learning
rate = 0.25
N/A
Learning
rate = 0.3
Learning
rate = 0.35
Learning
rate = 0.5
N/A
Result analysis
FPTP
TP
FNTNFPTP
TNTP





Precision
Accuracy
TNFPNegative
FNTPPositive
NegativePositive
True
Predicted
Result analysis
RSI(14) Accuracy Precision
4 day bottom 0.5209627 0.6080084
8 day bottom 0.4274533 0.5185729
12 day bottom 0.5257611 0.54337349
16 day bottom 0.5111502 0.46223958
20 day bottom 0.5226190 0.42605156
Size of training set Learning rate = 0.35 Accuracy Precision
500 0.716129032 0.75
5000 0.754838709 0.782868525
10000 0.709677419 0.779068767
20000 0.711290322 0.780582524
30000 0/711290322 0.778420038
40000 0.709677419 0.779069767
Result analysis - Problem?
• False Positive
TNFPNegative
FNTPPositive
NegativePositive
True
Predicted
FNTPTNFP
FP
rateFPSpecial
False Positive
RSI(14) Special FP rate
4 day bottom 0.216783216
8 day bottom 0.23656542
12 day bottom 0.221896955
16 day bottom 0.242370892
20 day bottom 0.251785714
Size of training set Learning rate = 0.35 Special FP rate
500 0.233870967
5000 0.175806451
10000 0.183870967
20000 0.182258064
30000 0.185483871
Discussion
• 100% accuracy is impossible in practice
• Effectiveness of classification model also
depends on how it is used, i.e. coupled with
trading rules
Further Improvement
How if?!

More Related Content

Similar to Price movement prediction in Hong Kong equity market

An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learning
milad abbasi
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
Mehrnaz Faraz
 
1710 track3 zhu
1710 track3 zhu1710 track3 zhu
1710 track3 zhu
Rising Media, Inc.
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
DongHyun Kwak
 
Causal reasoning and Learning Systems
Causal reasoning and Learning SystemsCausal reasoning and Learning Systems
Causal reasoning and Learning Systems
Trieu Nguyen
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement Learning
NAVER Engineering
 
Wayfair-Data Science Project
Wayfair-Data Science ProjectWayfair-Data Science Project
Wayfair-Data Science Project
Mehnaz Maharin
 
House Sale Price Prediction
House Sale Price PredictionHouse Sale Price Prediction
House Sale Price Predictionsriram30691
 
Intro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIIntro to Reinforcement learning - part III
Intro to Reinforcement learning - part III
Mikko Mäkipää
 
Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...
Dalei Li
 
Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...
Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...
Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...
WithTheBest
 
Study on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit ScoringStudy on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit Scoring
harmonylab
 
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
Experfy
 
6-130914140240-phpapp01.pdf
6-130914140240-phpapp01.pdf6-130914140240-phpapp01.pdf
6-130914140240-phpapp01.pdf
ssuserdca880
 
Applications of Machine Learning in High Frequency Trading
Applications of Machine Learning in High Frequency TradingApplications of Machine Learning in High Frequency Trading
Applications of Machine Learning in High Frequency Trading
Ayan Sengupta
 
Deep learning
Deep learningDeep learning
Deep learning
Rouyun Pan
 
Policy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detectionPolicy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detection
Kishor Datta Gupta
 
Six sigma pedagogy
Six sigma pedagogySix sigma pedagogy
Six sigma pedagogy
MallikarjunRaoPanaba
 
Six sigma
Six sigma Six sigma
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
Rising Media, Inc.
 

Similar to Price movement prediction in Hong Kong equity market (20)

An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learning
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
1710 track3 zhu
1710 track3 zhu1710 track3 zhu
1710 track3 zhu
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Causal reasoning and Learning Systems
Causal reasoning and Learning SystemsCausal reasoning and Learning Systems
Causal reasoning and Learning Systems
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement Learning
 
Wayfair-Data Science Project
Wayfair-Data Science ProjectWayfair-Data Science Project
Wayfair-Data Science Project
 
House Sale Price Prediction
House Sale Price PredictionHouse Sale Price Prediction
House Sale Price Prediction
 
Intro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIIntro to Reinforcement learning - part III
Intro to Reinforcement learning - part III
 
Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...
 
Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...
Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...
Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...
 
Study on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit ScoringStudy on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit Scoring
 
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
 
6-130914140240-phpapp01.pdf
6-130914140240-phpapp01.pdf6-130914140240-phpapp01.pdf
6-130914140240-phpapp01.pdf
 
Applications of Machine Learning in High Frequency Trading
Applications of Machine Learning in High Frequency TradingApplications of Machine Learning in High Frequency Trading
Applications of Machine Learning in High Frequency Trading
 
Deep learning
Deep learningDeep learning
Deep learning
 
Policy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detectionPolicy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detection
 
Six sigma pedagogy
Six sigma pedagogySix sigma pedagogy
Six sigma pedagogy
 
Six sigma
Six sigma Six sigma
Six sigma
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
 

Recently uploaded

一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 

Recently uploaded (20)

一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 

Price movement prediction in Hong Kong equity market

  • 1. Financial data mining III Price movement prediction in Hong Kong equity market GROUP LCW1004 Ying Ting Chung Au Chun Man
  • 2. Review • To learn patterns from historical data – using data mining – for forecasting future price movement • HSI (Hang Seng Index), daily prices
  • 4. Semester 1 • Finished data collection and pre-processing • Binary classification – Target label: “8-day bottom”
  • 5. Semester 1 • K-Nearest Neighbour (k-NN) – Low bias • Sliding window testing – Model changes through time
  • 6. Data Visualization • Label against various input attribute • Label distribution through time • Input Attributes: • Prediction Target: 5-day bottom Slope of SMA(22) and EMA(8, 13, 18) Price Deviation from SMA(9, 22) and EMA(8, 16) Percentage aggregate return for last 5 and 6 days Daily gap open
  • 8. Parameter Estimation • Selecting no. of voting neighbours 0.53 0.535 0.54 0.545 0.55 0.555 0.56 0.565 0.57 0.575 0.58 0 5 10 15 20 25 30 K (neighbours) window size = 45 (days) Precision Recall
  • 9. Parameter Estimation • Selecting no. of days looking back – Window size reduced, performance improved? 0.5 0.52 0.54 0.56 0.58 0.6 0.62 0.64 0.66 0.68 0.7 0 20 40 60 80 100 120 window size (days) K = 3 Precision Recall
  • 10. Problem with fixed sliding window • Given 3-NN majority voting • Property of target label – Continually positive • Fail to learn – From history – From input variable
  • 11. Function of time as attribute • Recent labels are more important in trend- dominant market • New dimension in feature space: Exponential function of time index • Recent data vectors pulled closer to current query point • i.e. Memory decay • controversial
  • 12. Function of time as attribute 0.6 0.62 0.64 0.66 0.68 0.7 0.72 0.74 0 100 200 300 400 500 600 window size (days) K = 3 Precision Recall
  • 13. Function of time as attribute • New dimension in feature space: Sine function of time index • Data vectors at regular interval pulled closer to each other • Choice of period?
  • 14. Function of time as attribute 0.65 0.66 0.67 0.68 0.69 0.7 0.71 0.72 0.73 0.74 0.75 0 100 200 300 400 500 600 window size (days) K = 3 Precision Recall
  • 15. Result • Looking back: 540 days • K = 3 neighbours • Error rate: 24% • Precision: 0.731; Recall: 0.698
  • 16. Limitation of label definition • Consider a use case: – A trader buys on first occurrence of positive prediction? – He needs a smart trading rule
  • 18. Re-consider label definition • N-bottom + r% gain after N days – For each time point t, if its closing price is lower than the low price by m% after N time points, then time point t is a hold-N-bottom. • Trading signal
  • 19. Re-consider label definition • N-bottom + m% stop-loss within N days – For each time point t, if its closing price is lower than the minimum low price within N time points in the future by m%, then time point t is a stop-N- bottom. • More specific, more useful • Fewer positive labels -> imbalanced dataset
  • 20. New Result • Looking back: 120 days • K = 3 neighbours • Error rate: 19.6% • Precision: 0.601; Recall: 0.476
  • 21. Artificial Neural Networks • Theory • Input and Output Selection • Experiment
  • 22. Artificial Neural Networks - Theory Output units yp Input units xn Hidden units hm
  • 23. Back Propagation Algorithm • Step 1 : forward pass • Step 2 : backward pass • Step 3 : weight updates
  • 24. Back Propagation Algorithm – Step 1 :forward pass • set of input vectors: • set of output vectors: .
  • 25. Back Propagation Algorithm – Step 1 :forward pass • f(.) is a sigmoid activation function x e xsigmoid    1 1 )(
  • 26. Back Propagation Algorithm – Step 2 :backward pass • Sum of squared errors:
  • 27. Gradient Descent • To minimize E • partial derivation iW E  
  • 28. Gradient Descent – local minimum
  • 29. Step 3 : weight updates • Weight updating: • learning rate : η • momentum factor : α
  • 30. Learning Rate & Momentum
  • 31. Limitations • requires lots of supervised training, with lots of input-output examples • there is no guarantee that the system will converge to an acceptable solution ~local minimum
  • 32. Experiment • The learning rate :0.25 to 0.5 ,and • Momentum set to WEKA’s default:0.2 • the size of training set: 500 to 400000.
  • 34. Experiment - Result Size of training set 500 5000 10000 20000 30000 400000 Learning rate = 0.25 N/A Learning rate = 0.3 Learning rate = 0.35 Learning rate = 0.5 N/A
  • 36. Result analysis RSI(14) Accuracy Precision 4 day bottom 0.5209627 0.6080084 8 day bottom 0.4274533 0.5185729 12 day bottom 0.5257611 0.54337349 16 day bottom 0.5111502 0.46223958 20 day bottom 0.5226190 0.42605156 Size of training set Learning rate = 0.35 Accuracy Precision 500 0.716129032 0.75 5000 0.754838709 0.782868525 10000 0.709677419 0.779068767 20000 0.711290322 0.780582524 30000 0/711290322 0.778420038 40000 0.709677419 0.779069767
  • 37. Result analysis - Problem? • False Positive TNFPNegative FNTPPositive NegativePositive True Predicted FNTPTNFP FP rateFPSpecial
  • 38. False Positive RSI(14) Special FP rate 4 day bottom 0.216783216 8 day bottom 0.23656542 12 day bottom 0.221896955 16 day bottom 0.242370892 20 day bottom 0.251785714 Size of training set Learning rate = 0.35 Special FP rate 500 0.233870967 5000 0.175806451 10000 0.183870967 20000 0.182258064 30000 0.185483871
  • 39. Discussion • 100% accuracy is impossible in practice • Effectiveness of classification model also depends on how it is used, i.e. coupled with trading rules