Solutions
Machine learning
Time Series Analysis
Challenges
Finlab CTO 韓承佑
2019 / 07 / 05
Outline
• Introduction
• Background
• Motivation
• Proposed Method
• Conclusion
Human Evolution
Agricultural revolution
Industrial revolution
Information revolution
14000 years
Human vs Computer at Go
4300 years
200 years
Machine Learning
Gartner Hype Cycle for Emerging Technologies, 2018: Deep Learning
Gartner Hype Cycle for Emerging Technologies, 2015: Machine Learning
History of Trading
1948 Technical Indicators
1970 Algorithmic Trading
1980 Personal Computer
1990 High-frequency trading
Machine Learning
• Mimics the ability to see and hear
• Extracts rules automatically from data
• Spots patterns in high-dimensional data
ML algorithms in finance?
[Figure: live trading result, profit by date]
What is AI, ML, DL?
AI (Artificial Intelligence)
ML (Machine Learning)
DL (Deep Learning)
Dog or Cat?
Categories  Probability
Dog         0.9
Cat         0.1
Supervised Machine Learning
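Training
Color  Weight  Age  Category
-      3.2 kg  2    cat
-      4.2 kg  5    cat
-      6.2 kg  4    dog
(Color, Weight, and Age are the features; Category is the label. Both are fed to the ML model.)
Testing
Color  Weight  Age
-      3.2 kg  2
-      4.2 kg  5
-      6.2 kg  4
(Only the features are fed to the ML model, which returns a prediction.)
True Answer  Prediction
cat          cat
cat          dog
dog          dog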
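The training and testing flow on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the model used in the talk: the nearest-centroid classifier, the function names, and the decision to ignore the Color column (its values did not survive in the slide text) are all assumptions.

```python
# Toy supervised learning: fit on labeled rows, then score predictions.
# Rows are (weight_kg, age) taken from the slide; the classifier is a
# hypothetical nearest-centroid model chosen only for brevity.

def fit_centroids(features, labels):
    """Return {label: mean feature vector} for the training rows."""
    sums, counts = {}, {}
    for row, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, row):
    """Predict the label whose centroid is closest (squared distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true answers."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


train_X = [(3.2, 2), (4.2, 5), (6.2, 4)]
train_y = ["cat", "cat", "dog"]
model = fit_centroids(train_X, train_y)
predictions = [predict(model, row) for row in train_X]

# Score the "True Answer / Prediction" columns shown on the slide.
slide_accuracy = accuracy(["cat", "cat", "dog"], ["cat", "dog", "dog"])
```

Scoring on the training rows, as done here for brevity, overstates performance; the point of the Testing step is to score on rows the model has not seen.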
Outline
• Machine Learning Models
• Training & Testing
• Evaluation
• Feature Engineering
• Data Preprocessing
Feature Engineering & Data Preprocessing
Feature Source
Fundamental data
• Difficult to confirm the data release date
• Missing data is often backfilled
• Values may be corrected multiple times after release
• May be useful in combination with other data types
Market data
• Trading book
• Each market participant leaves a characteristic footprint
• A massive amount of data is generated in one day
Alternative data
• News, trends, web, satellites…
• Primary data source
• Hard to process; difficult to confirm consistency
Challenges of labeling the data
Time t  KD  RSI  MACD  Category
1       -   -    -     -1
2       -   -    -     -1
…       -   -    -     0
t       -   -    -     1
(KD, RSI, and MACD are the features; Category is the label.)
Fixed time horizon
[Figure: price vs. time with labels 1, 0, and -1; annotations $p_t$, $p_{t+w}$, $p_{t+w} + \tau$, and $p_{t+w} - \tau$]
• A popular method in the literature
• $\tau$ is a constant regardless of the volatility
• Does not have stop-loss limits
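The fixed time horizon rule can be sketched as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, and expressing the threshold τ as a return over w steps (rather than as a price offset) is a choice made here, not something the slide specifies.

```python
def fixed_horizon_labels(prices, w, tau):
    """Label each time t by the return over the next w steps.

    1   if the return exceeds tau
    -1  if the return is below -tau
    0   otherwise
    The last w points have no future price and are left unlabeled.
    """
    labels = []
    for t in range(len(prices) - w):
        ret = prices[t + w] / prices[t] - 1.0
        if ret > tau:
            labels.append(1)
        elif ret < -tau:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


prices = [100.0, 101.0, 103.0, 102.0, 99.0, 100.0]
labels = fixed_horizon_labels(prices, w=2, tau=0.02)
```

Because tau is the same in quiet and in volatile periods, most labels collapse to 0 when volatility is low, and the path between t and t + w is ignored, which is why this scheme has no stop-loss.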
Label generation for financial prices
• Triple barrier [Prado 2018]
• Continuous trading signals [Dash 2016]
• Trading point decision [Chang 2009]
[Prado 2018] Advances in Financial Machine Learning
[Tsantekidis 2017] Using Deep Learning to Detect Price Change Indications in Financial Markets
[Dash 2016] A hybrid stock trading framework integrating technical analysis with machine learning techniques
[Chang 2009] Integrating a Piecewise Linear Representation Method and a Neural Network Model for Stock Trading Points Prediction
Triple barrier [Prado 2018]
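[Figure: price vs. time with labels 1, 0, and -1; annotations $p_t$, $p_{t+w}$, upper barrier $p_{t+w} + \tau_1$, and lower barrier $p_{t+w} - \tau_2$]
• Labels are assigned according to the first of the three barriers touched
• The horizontal barriers are defined by profit-taking and stop-loss limits
• $\tau_1$ and $\tau_2$ are dynamic, set according to the estimated volatility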
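A minimal sketch of triple-barrier labeling, under stated assumptions: the function names are hypothetical, the horizontal barriers are placed around the entry price as fractional offsets, and the volatility estimate is a plain rolling standard deviation of returns rather than whatever estimator the cited book or the speaker uses.

```python
def triple_barrier_label(prices, t, w, tau1, tau2):
    """Label time t by the first barrier the price path touches.

    Upper barrier: prices[t] * (1 + tau1)  -> label 1  (profit-taking)
    Lower barrier: prices[t] * (1 - tau2)  -> label -1 (stop-loss)
    Vertical barrier: t + w reached first  -> label 0
    """
    upper = prices[t] * (1.0 + tau1)
    lower = prices[t] * (1.0 - tau2)
    last = min(t + w, len(prices) - 1)
    for s in range(t + 1, last + 1):
        if prices[s] >= upper:
            return 1
        if prices[s] <= lower:
            return -1
    return 0


def rolling_volatility(prices, t, lookback):
    """Sample standard deviation of simple returns over the window ending at t."""
    start = max(1, t - lookback + 1)
    rets = [prices[i] / prices[i - 1] - 1.0 for i in range(start, t + 1)]
    if len(rets) < 2:
        return 0.0
    mean = sum(rets) / len(rets)
    return (sum((r - mean) ** 2 for r in rets) / (len(rets) - 1)) ** 0.5


prices = [100.0, 101.0, 99.0, 100.0, 104.0, 97.0, 100.0]
label_up = triple_barrier_label(prices, t=0, w=5, tau1=0.03, tau2=0.03)
label_down = triple_barrier_label(prices, t=4, w=2, tau1=0.03, tau2=0.03)
label_flat = triple_barrier_label(prices, t=0, w=3, tau1=0.03, tau2=0.03)
```

To make the barriers dynamic, tau1 and tau2 could be set to a multiple of `rolling_volatility(prices, t, lookback)` at each t; the multiple is a tuning choice.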
Continuous trading signals [Dash 2016]
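[Figure: price vs. time over the window from $t$ to $t+w$, marking $p_t$, $p_{t+w}$, the window maximum $p^{\max}_{t,t+w}$, and the window minimum $p^{\min}_{t,t+w}$; signal levels 0, 0.5, and 1]
• Uses the momentum of the stock price
• The labels y(t) are continuous
• Provides more detailed information than discrete labels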
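One way to produce a continuous label in [0, 1] from the quantities in the figure is sketched below. This is an assumption-laden illustration: it min-max scales the current price within the forward window and maps the result to [0.5, 1] in an uptrend and [0, 0.5] in a downtrend. The function name is hypothetical, the trend flag must be supplied by the caller, and the exact formula in [Dash 2016] may differ.

```python
def continuous_signal(prices, t, w, uptrend):
    """Map the price at t to a signal in [0, 1] using the window [t, t + w].

    The price is min-max scaled within the window, then mapped to
    [0.5, 1] when the caller flags an uptrend and to [0, 0.5] otherwise.
    """
    window = prices[t : t + w + 1]
    lo, hi = min(window), max(window)
    scaled = 0.0 if hi == lo else (prices[t] - lo) / (hi - lo)
    return 0.5 * scaled + (0.5 if uptrend else 0.0)


rising = [10.0, 12.0, 11.0, 14.0]
falling = [14.0, 12.0, 10.0]
signal_at_low_in_uptrend = continuous_signal(rising, t=0, w=3, uptrend=True)
signal_at_high_in_downtrend = continuous_signal(falling, t=0, w=2, uptrend=False)
```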
Trading point decision [Chang 2009]
• Find the local minimum and maximum points
• Divide the time series into subsegments
• The threshold value d determines the length of each trend
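The segmentation step can be sketched with a top-down piecewise linear split. This is a minimal sketch, assuming that the threshold d is the maximum vertical distance allowed between a price and the straight line joining the segment end points; the function name is hypothetical and the cited paper's procedure may differ in detail.

```python
def plr_turning_points(prices, d):
    """Top-down piecewise linear segmentation.

    A segment is split at the point farthest (vertically) from the straight
    line joining its end points, as long as that distance exceeds d.
    Returns the sorted indices of the segment end points.
    """
    def split(start, end):
        if end - start < 2:
            return []
        slope = (prices[end] - prices[start]) / (end - start)
        best_i, best_dist = None, d
        for i in range(start + 1, end):
            line = prices[start] + slope * (i - start)
            dist = abs(prices[i] - line)
            if dist > best_dist:
                best_i, best_dist = i, dist
        if best_i is None:
            return []
        return split(start, best_i) + [best_i] + split(best_i, end)

    return [0] + split(0, len(prices) - 1) + [len(prices) - 1]


prices = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0]
fine = plr_turning_points(prices, d=0.5)
coarse = plr_turning_points(prices, d=5.0)
```

A larger d tolerates more deviation, so it yields fewer and longer segments. The interior end points are the local minima and maxima that can serve as candidate buy and sell points.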
Machine learning Models
Neural Network [McCulloch 1943]
• Built to model the human brain
• Designed to recognize patterns
• Interpret numeric data through a kind of machine perception
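Human neuron structure / Single neuron model
$y_1 = g(v)$, where $v = w_0 + w_1 x_1 + w_2 x_2$
[Figure: inputs 1, $x_1$, and $x_2$ weighted by $w_0$, $w_1$, and $w_2$, summed into $v$ and passed through the activation $g$ to give $y_1$; plot of $g(v)$ against $v$]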
[McCulloch 1943] A Logical Calculus of Ideas Immanent in Nervous Activity
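The single neuron model translates directly into code. In this minimal sketch the sigmoid activation is an assumption (the slide does not fix a particular g), and the function names are illustrative.

```python
import math


def sigmoid(v):
    """A common choice for the activation g (assumed here, not from the slide)."""
    return 1.0 / (1.0 + math.exp(-v))


def neuron(x1, x2, w0, w1, w2, g=sigmoid):
    """Single neuron: y1 = g(w0 + w1*x1 + w2*x2)."""
    v = w0 + w1 * x1 + w2 * x2
    return g(v)


y1 = neuron(x1=1.0, x2=2.0, w0=-1.0, w1=0.5, w2=0.25)
```

The constant input 1 in the diagram is what makes w0 act as a bias term.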
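Neural network [McCulloch 1943]
• Single node in a neural network: [Figure: inputs 1, $x_1$, $x_2$ with weights $w_0$, $w_1$, $w_2$, activation $g(v)$, and output $y_1$]
• Simplified expression: [Figure: the same node drawn without $v$ and $g$]
• A layer contains multiple neurons: [Figure: inputs 1, $x_1$, $x_2$ feeding outputs $y_1$ to $y_4$]
Deep neural network
• Multi-layer deep neural network: [Figure: inputs 1, $x_1$, $x_2$ passing through several layers to outputs $y_1$, $y_2$, $y_3$]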
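Stacking neurons into layers, and layers into a deep network, can be sketched as a forward pass. This is an illustrative sketch only: the tanh activation, the layer sizes, and the weight values are arbitrary assumptions, and no training is shown.

```python
import math


def dense_layer(inputs, weights, biases, g=math.tanh):
    """One layer: every neuron applies g to its own weighted sum of the inputs."""
    return [
        g(b + sum(w * x for w, x in zip(neuron_weights, inputs)))
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the inputs through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Two inputs -> hidden layer of 3 neurons -> 1 output; weights are arbitrary.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.2]),
]
output = forward([1.0, 2.0], layers)
```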
Deep Neural Network Training Result
Features: Scaled technical indicators
Asset: Taiwan Capitalization Weighted Stock Index
Labels: Fixed time horizon
Data split: Train 2006 ~ 2014, Validate 2015, Backtest 2016 ~ 2019-3-1
[Figure: backtest vs. benchmark, 2018-1-1 to 2019-7-1]
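A train / validate / backtest split of this kind is chronological, never shuffled. A minimal sketch, in which the function name and the placeholder rows are assumptions and only the cut-off years come from the slide:

```python
from datetime import date


def chronological_split(rows, train_end, validate_end):
    """Split (date, value) rows by time, never shuffling.

    train:    date <= train_end
    validate: train_end < date <= validate_end
    backtest: date > validate_end
    """
    train = [r for r in rows if r[0] <= train_end]
    validate = [r for r in rows if train_end < r[0] <= validate_end]
    backtest = [r for r in rows if r[0] > validate_end]
    return train, validate, backtest


# Placeholder rows: one synthetic observation per year, not real index data.
rows = [(date(year, 6, 30), float(year)) for year in range(2006, 2020)]
train, validate, backtest = chronological_split(
    rows, train_end=date(2014, 12, 31), validate_end=date(2015, 12, 31)
)
```

Keeping the validation year between training and backtest means hyperparameters are tuned on data the backtest never touches.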
Long short-term memory neural network (LSTM) [Hochreiter 1997]
• Can process sequences of data
• Designed to deal with the exploding and vanishing gradient problems
[Hochreiter 1997] Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9.8 (1997): 1735-1780.
[Figure: a recurrent neural network unrolled over time steps t = 1, 2, 3, each step mapping an input x to an output y; an LSTM cell with input gate, forget gate, output gate, and tanh]
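The gates named in the diagram combine as in the sketch below, which runs one scalar LSTM cell over a short sequence. This is the commonly used formulation with input, forget, and output gates; the parameter layout, names, and values are illustrative assumptions, and real implementations use vectors and matrices.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell.

    p maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))        # input gate: how much new content to write
    f = sigmoid(pre("forget"))       # forget gate: how much old cell state to keep
    o = sigmoid(pre("output"))       # output gate: how much of the cell to expose
    g = math.tanh(pre("candidate"))  # candidate content
    c = f * c_prev + i * g           # additive cell-state update
    h = o * math.tanh(c)
    return h, c


gate_names = ("input", "forget", "output", "candidate")
params = {name: (0.5, 0.1, 0.0) for name in gate_names}
h, c = 0.0, 0.0
for x in [0.2, -0.1, 0.4]:
    h, c = lstm_step(x, h, c, params)
```

The additive update of the cell state c is what lets gradients flow across many time steps without being repeatedly squashed.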
Long short-term memory neural network (LSTM) [Hochreiter 1997]
Features: Scaled technical indicators
Asset: Taiwan Capitalization Weighted Stock Index
Labels: Fixed time horizon
Data split: Train 2006 ~ 2014, Validate 2015, Backtest 2016 ~ 2019-3-1
[Figure: loss and profit by epoch]
Convolutional Neural Network [Krizhevsky 2012]
• Commonly applied to analyzing visual imagery
• Helps prevent overfitting (shared weights mean fewer parameters)
Convolutional Layer
Time Series to Image Conversion Approach [Sezer 2018]
[Sezer 2018] Algorithmic Financial Trading with Deep Convolutional Neural Networks: Time Series to Image Conversion Approach
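The idea of turning a time series into an image can be sketched by stacking an indicator computed over several periods (rows) and several recent days (columns). This is a simplified illustration, not the cited paper's pipeline: using only simple moving averages, the grid size, and the per-image scaling are all assumptions.

```python
def sma(prices, end, period):
    """Simple moving average of the `period` prices ending at index `end`."""
    window = prices[end - period + 1 : end + 1]
    return sum(window) / period


def series_to_image(prices, end, size):
    """Build a size x size grid for the day at index `end`.

    Row r uses an SMA of period r + 2; column c looks back size - 1 - c days,
    so the rightmost column is the most recent day. Values are min-max
    scaled to [0, 1] per image. Requires end >= 2 * size - 1.
    """
    grid = [
        [sma(prices, end - (size - 1 - c), r + 2) for c in range(size)]
        for r in range(size)
    ]
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in grid]


# Synthetic prices for illustration only.
prices = [100 + 0.5 * i + (3 if i % 4 == 0 else 0) for i in range(40)]
image = series_to_image(prices, end=39, size=5)
```

Each such grid, paired with a label for the same day, becomes one training example for a 2-D convolutional network.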
Which time series is a random walk?
[Figure: six time series plots, numbered 1 to 6]
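For comparison with real price charts, a random walk is easy to generate. A minimal sketch, assuming Gaussian steps; the function name, starting level, and step size are arbitrary.

```python
import random


def random_walk(n_steps, start=100.0, step_std=1.0, seed=None):
    """Cumulative sum of independent Gaussian steps."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, step_std))
    return path


walk = random_walk(250, seed=7)
```

Series like this contain no predictable structure by construction, so any "signal" a model or an importance measure finds in them is spurious.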
Generative Adversarial Networks (GAN)
[Figure: GAN diagram with historical price data as input and a Real/Generated output]
• The Generator is trained to generate data that looks like historical prices
• The Discriminator is trained to tell the difference between generated and real data
StockMarketGAN
• The Discriminator's features are useful for predicting the direction of the price
[Figure: historical price data passes through the Discriminator with fixed weights to produce new features, which are used to predict Rise/Fall]
https://github.com/nmharmon8/StockMarketGAN
Backtest
• Survivorship bias, look-ahead bias, training, transaction costs, outliers
• Finding the lottery tickets that won the last game
• Machine learning overfitting
• Solutions
  • Develop models for entire asset classes
  • Use bootstrap aggregating (bagging)
  • Record every backtest conducted
  • K-fold cross validation
K-fold Cross Validation
• Determines the generalization error of an ML algorithm
• Helps prevent overfitting
• Assumes the training set and the testing set are IID
Drawback
[Figure: price series split into Train, Test, and Train segments]
• The training set contains information that also appears in the testing set
• Observations cannot be assumed to be drawn from an IID process
• Multiple testing and selection bias
Purged K-fold Cross Validation [Prado 2018]
[Figure: price series split into train and test sets, before purging and after purging]
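Purging can be sketched as an index generator that drops training samples whose labels overlap the test fold in time. This is a simplified sketch under stated assumptions: every label at index i is taken to depend on prices in [i, i + horizon], folds are contiguous, and no additional embargo period is applied. The function name is hypothetical.

```python
def purged_kfold(n_samples, n_splits, horizon):
    """Yield (train_idx, test_idx) for contiguous test folds.

    Sample i is assumed to have a label that depends on prices in
    [i, i + horizon]. A training sample is purged when its label window
    overlaps the label window of any test sample.
    """
    fold = n_samples // n_splits
    for k in range(n_splits):
        start = k * fold
        stop = n_samples if k == n_splits - 1 else start + fold
        test_idx = list(range(start, stop))
        train_idx = [
            i for i in range(n_samples)
            if i + horizon < start or i > stop - 1 + horizon
        ]
        yield train_idx, test_idx


folds = list(purged_kfold(n_samples=20, n_splits=4, horizon=2))
```

With horizon=0 this reduces to ordinary K-fold on contiguous blocks; a longer label horizon removes more training samples on both sides of each test fold.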
Feature Importance [Liaw 2002]
• Understand which features contribute to the performance
• Add features that strengthen the predictive power
• Opens up the proverbial black box
• How to deal with selection bias
  • Evaluate features on multiple assets
  • Use different tree-based classifiers/regressors
  • Use a random feature as a baseline to distinguish powerful features
Mean decrease impurity (MDI)
• Adds up the information gain for each feature
• Simple and efficient to calculate
• Inflates the importance of continuous features
[Figure: decision tree splits on Humidity and on Throw a coin, each with outcomes Rain and Sunny day]
[Figure: MDI importance by indicator ID, for technical indicators of TXF1 and for technical indicators of a random walk time series; correlation coefficient 0.94]
[Louppe 2013] Understanding variable importances in forests of randomized trees
Permutation Importance
• Much more computationally expensive
• Results are more reliable
Color  Weight  Age
-      3.2 kg  2
-      4.2 kg  5
-      6.2 kg  4
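import numpy as np  # X is assumed to be a 2-D NumPy array and model a fitted estimator
baseline = model.score(X, y)
importance = {}
for j in range(X.shape[1]):
    original = X[:, j].copy()
    X[:, j] = np.random.permutation(original)  # permute one column randomly
    reduced_score = model.score(X, y)
    X[:, j] = original  # recover the column
    importance[j] = baseline - reduced_score  # importance of the feature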
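The same procedure can be run end to end with a toy model. In this self-contained sketch the `ThresholdModel` class is a hypothetical stand-in for a fitted estimator, the data is synthetic, and the second column is pure noise, which illustrates the idea of using a random feature as a reference level.

```python
import random


class ThresholdModel:
    """Hypothetical stand-in for a fitted model: predicts 1 when column 0 > 0."""

    def predict(self, X):
        return [1 if row[0] > 0 else 0 for row in X]

    def score(self, X, y):
        preds = self.predict(X)
        return sum(p == t for p, t in zip(preds, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in score after shuffling each column, one column at a time."""
    rng = random.Random(seed)
    baseline = model.score(X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - model.score(X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


rng = random.Random(42)
# Column 0 drives the label; column 1 is pure noise used as a reference.
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [1 if row[0] > 0 else 0 for row in X]
importances = permutation_importance(ThresholdModel(), X, y)
```

Features whose importance is no larger than that of the noise column are candidates for removal. The repeated scoring per column is what makes this method more expensive than MDI.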
[Figure: permutation importance by indicator ID, for technical indicators of TXF1 and for technical indicators of a random walk time series; correlation coefficient -0.04] [Louppe 2013]
Conclusion
• Feature engineering
• Labeling
• Fixed time horizon, triple barrier, Continuous signal
• Multi-asset feature selection
• MDI, MDA
• Model
• Neural network
• LSTM
• CNN
• Cross validation, backtest
• Reality check
Future work
Feature enumeration
Stationarize features
Feature selection
Preprocessing
Training
Trading strategies development
Reality check
Multiple assets
Strategies
Strategy management
Live trading
Q&A
Facebook, AI Course, Finlab Blog, AI Diagnose

Machine learning & Time Series Analysis

  • 1. Machine learning & Time Series Analysis: Challenges and Solutions. Finlab CTO 韓承佑, 2019/07/05
  • 2. Outline • Introduction • Background • Motivation • Proposed Method • Conclusion
  • 4. Human vs Computer at Go 4300 years 200 years
  • 6. Gartner Hype Cycle for Emerging Technologies, 2018: Deep Learning
  • 7. Gartner Hype Cycle for Emerging Technologies, 2015: Machine Learning
  • 8. History of Trading: 1948 Technical Indicators; 1970 Algorithmic Trading; 1980 Personal Computer; 1990 High-frequency Trading
  • 9. Machine Learning • Mimics the ability to see and hear • Extracts rules automatically from data • Spots patterns in high-dimensional data
  • 10. ML algorithms in finance?
  • 12. What is AI, ML, DL? AI (Artificial Intelligence), ML (Machine Learning), DL (Deep Learning). Example: Dog or Cat? Categories and probabilities: Dog 0.9, Cat 0.1
  • 13. Supervised Machine Learning. Training: features (Color, Weight, Age) and labels (Category) are fed to the ML model: (3.2 kg, 2, cat), (4.2 kg, 5, cat), (6.2 kg, 4, dog). Testing: features only, (3.2 kg, 2), (4.2 kg, 5), (6.2 kg, 4), are fed to the ML model. True answer vs. prediction: cat / cat, cat / dog, dog / dog
  • 14. Outline • Machine Learning Models • Training & Testing • Evaluation • Feature Engineering • Data Preprocessing
  • 15. Feature Engineering & Data Preprocessing
  • 16. Feature Source. Fundamental data: • Difficult to confirm the data release date • Missing data is often backfilled • Consider multiple corrections • May be useful to combine with other data types. Market data: • Trading book • Characteristic footprints of market participants • Massive amount of data generated in one day. Alternative data: • News, trends, web, satellites… • Primary data source • Hard to process, difficult to confirm consistency
  • 17. Challenges of labeling the data. Features (Time t, KD, RSI, MACD) and labels (Category): time 1: -1; time 2: -1; …: 0; time t: 1. Fixed time horizon (figure: price vs. time from $t$ to $t+w$, with levels $p_{t+w}+\tau$, $p_{t+w}$, $p_{t+w}-\tau$ and labels 1, 0, -1) • A popular method in the literature • $\tau$ is a constant regardless of the volatility • Does not have stop-loss limits
  • 18. Label generation for financial prices • Triple barrier [Prado 2018] • Continuous trading signals [Dash 2016] • Trading point decision [Chang 2009]. References: [Prado 2018] Advances in Financial Machine Learning; [Tsantekidis 2017] Using Deep Learning to Detect Price Change Indications in Financial Markets; [Dash 2016] A hybrid stock trading framework integrating technical analysis with machine learning techniques; [Chang 2009] Integrating a Piecewise Linear Representation Method and a Neural Network Model for Stock Trading Points Prediction
  • 19. Triple barrier [Prado 2018] (figure: price vs. time from $t$ to $t+w$, with levels $p_{t+w}+\tau_1$, $p_{t+w}$, $p_{t+w}-\tau_2$ and labels 1, 0, -1) • Labels according to the first barrier touched out of three barriers • Horizontal barriers are defined by profit-taking and stop-loss limits • $\tau_1$ and $\tau_2$ are dynamic according to estimated volatility
  • 20. Continuous trading signals [Dash 2016] (figure: price vs. time from $t$ to $t+w$, with $p^{\min}_{t,t+w}$ and $p^{\max}_{t,t+w}$ marked; signal values 0, 0.5, 1) • Uses the momentum of the stock price • The $y(t)$ values are continuous • Provides more detailed information
  • 21. Trading point decision [Chang 2009] • Find the local minimum and maximum points • Divide the time series into subsegments • Threshold value d → length of trend
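  • 27. Neural Network [McCulloch 1943] • Built to model the human brain • Designed to recognize patterns • Interprets numeric data through a kind of machine perception. Figures: human neuron structure; single neuron model with inputs 1, $x_1$, $x_2$, weights $w_0$, $w_1$, $w_2$, and output $y_1 = g(v)$, where $v = w_1 x_1 + w_2 x_2 + w_0$. [McCulloch 1943] A Logical Calculus of the Ideas Immanent in Nervous Activity
  • 28. Neural network [McCulloch 1943]: single node in a neural network (figure: inputs 1, $x_1$, $x_2$ with weights $w_0$, $w_1$, $w_2$; $v$ passes through $g(v)$ to give $y_1$)
  • 29. Neural network [McCulloch 1943]: simplified expression (figure: inputs 1, $x_1$, $x_2$ with weights $w_0$, $w_1$, $w_2$ connected directly to $y_1$)
  • 30. Neural network [McCulloch 1943]: a layer contains multiple neurons (figure: inputs 1, $x_1$, $x_2$; outputs $y_1$, $y_2$, $y_3$, $y_4$)
  • 31. Deep neural network: multi-layer deep neural network (figure: inputs 1, $x_1$, $x_2$; outputs $y_1$, $y_2$, $y_3$)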
  • 32. Deep Neural Network: Training Result. Data split: Train 2006 ~ 2014, Validate 2015, Backtest 2016 ~ 2019-3-1. Features: scaled technical indicators. Labels: fixed time horizon. Asset: Taiwan Capitalization Weighted Stock Index. (Figure: backtest vs. benchmark, 2018-1-1 to 2019-7-1.)
  • 33. Long short-term memory neural network (LSTM) [Hochreiter 1997] • Can process sequences of data • LSTM deals with the exploding and vanishing gradient problems. (Figures: a neural network mapping x to y, unrolled over t = 1, 2, 3; an LSTM cell with input gate, forget gate, output gate and tanh.) Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9.8 (1997): 1735-1780.
  • 34. Long short-term memory neural network (LSTM) [Hochreiter 1997]. Data split: Train 2006 ~ 2014, Validate 2015, Backtest 2016 ~ 2019-3-1. Features: scaled technical indicators. Labels: fixed time horizon. Asset: Taiwan Capitalization Weighted Stock Index. (Figure: loss and profit vs. epoch.)
  • 35. Convolutional Neural Network [Krizhevsky 2012] • Commonly applied to analyzing visual imagery • Prevents overfitting. (Figure: convolutional layer.)
  • 36. Time series to image conversion approach [Sezer 2018]. [Sezer 2018] Algorithmic Financial Trading with Deep Convolutional Neural Networks: Time Series to Image Conversion Approach
  • 37. Which time series is a random walk? (Figure: six time series, numbered 1 to 6.)
  • 38. Generative Adversarial Networks (GAN) (figure: historical price data and generated data go to the Discriminator, which outputs Real/Generated) • The Generator is trained to generate data that looks like historical prices • The Discriminator is trained to tell the difference between generated and real data. https://github.com/nmharmon8/StockMarketGAN
  • 39. StockMarketGAN: the Discriminator's features are good for predicting the direction of the price (figure: historical price data → Discriminator with fixed weights (new features) → Rise/Fall). https://github.com/nmharmon8/StockMarketGAN
  • 40. Backtest • Survivorship bias, look-ahead bias, training, transaction costs, outliers • Finding the lottery tickets that won the last game • Machine learning overfitting. Solutions: • Develop models for entire asset classes • Use bootstrap aggregating (bagging) • Record every backtest conducted • K-fold cross validation
  • 41. K-fold Cross Validation • Determine the generalization error of an ML algorithm • Prevent overfitting • Assume the training set and the testing set are IID
  • 42. Drawback (figure: price series split into Train / Test / Train) • The training set contains information that also appears in the testing set • Observations cannot be assumed to be drawn from an IID process • Multiple testing and selection bias
  • 43. Purged K-fold Cross Validation [Prado 2018] (figures: price series before purging and after purging)
  • 44. Feature Importance [Liaw 2002] • Understand which features contributed to the performance • Add features that strengthen the predictive power • Opens up the proverbial black box. How to deal with selection bias: • Evaluate features on multiple assets • Use different tree-based classifiers/regressors • Use a random feature to distinguish powerful features
  • 45. Mean decrease impurity (MDI) [Louppe 2013] • Adds up the information gain for each feature • Simple and efficient to calculate • Inflates the importance of continuous features. (Figures: example splits on Humidity and on "Throw a coin", each leading to Rain / Sunny day; importance by indicator ID for technical indicators of TXF1 vs. technical indicators of a random-walk time series, correlation coefficient 0.94.) [Louppe 2013] Understanding variable importances in forests of randomized trees
  • 46. Permutation Importance [Louppe 2013] • Much more computationally expensive • Results are more reliable. Example features (Color, Weight, Age): (3.2 kg, 2), (4.2 kg, 5), (6.2 kg, 4). Pseudocode: baseline = model.score(X, y); for each column in X: permute the column randomly; reduced_score = model.score(X, y); restore the column; importance of the feature = baseline - reduced_score. (Figure: importance by indicator ID for technical indicators of TXF1 vs. technical indicators of a random-walk time series, correlation coefficient -0.04.) [Louppe 2013] Understanding variable importances in forests of randomized trees
  • 47. Conclusion • Feature engineering • Labeling • Fixed time horizon, triple barrier, continuous signal • Multi-asset feature selection • MDI, MDA • Model • Neural network • LSTM • CNN • Cross validation, backtest • Reality check
  • 48. Future work Feature enumeration Stationarize features Feature selection Preprocessing Training Trading strategies development Reality check Multiple assets Strategies Strategy management Live trading
  • 49. Q&A. Links: Facebook, AI Course, Finlab Blog, AI Diagnose
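
The triple-barrier rule on slide 19 can be sketched in a few lines of Python. This is only an illustration, not the author's implementation: the function name `triple_barrier_labels`, the choice to measure both horizontal barriers as relative widths around the entry price, and the `upper`/`lower` inputs (which a caller would derive from an estimated volatility) are all assumptions made here.

```python
from typing import List, Sequence


def triple_barrier_labels(
    prices: Sequence[float],
    horizon: int,
    upper: Sequence[float],
    lower: Sequence[float],
) -> List[int]:
    """Label each bar by the first of three barriers touched.

    upper[t] and lower[t] are hypothetical relative barrier widths
    (0.02 means 2 %), e.g. a multiple of an estimated volatility.
    Returns +1 if the profit-taking barrier is hit first, -1 if the
    stop-loss barrier is hit first, and 0 if the vertical barrier
    (the end of the horizon) is reached without touching either.
    """
    labels = []
    # Only bars with a full look-ahead window of `horizon` bars are labeled.
    for t in range(len(prices) - horizon):
        top = prices[t] * (1.0 + upper[t])      # profit-taking level
        bottom = prices[t] * (1.0 - lower[t])   # stop-loss level
        label = 0                               # default: vertical barrier
        for future in prices[t + 1 : t + horizon + 1]:
            if future >= top:
                label = 1
                break
            if future <= bottom:
                label = -1
                break
        labels.append(label)
    return labels
```

Passing the same constant for every entry of `upper` and `lower` gives barriers that ignore volatility, which is the weakness slide 17 points out for the fixed time horizon method; scaling them by a rolling volatility estimate makes the barriers dynamic.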

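The permutation-importance pseudocode on slide 46 can be made runnable as follows. This is a minimal sketch under stated assumptions: `score` stands in for `model.score` and is any callable returning a "higher is better" metric, the data are plain Python lists, and `n_repeats` and `seed` are parameters added here for stability and reproducibility rather than anything the slides specify.

```python
import random
from typing import Callable, List, Sequence


def permutation_importance(
    score: Callable[[List[List[float]], Sequence[float]], float],
    X: List[List[float]],
    y: Sequence[float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Return the mean score drop caused by shuffling each column of X."""
    rng = random.Random(seed)
    baseline = score(X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle one column, leaving every other column untouched.
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Build a permuted copy so the original X never needs restoring.
            X_perm = [
                row[:col] + [value] + row[col + 1 :]
                for row, value in zip(X, shuffled)
            ]
            drops.append(baseline - score(X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model never uses gets an importance of exactly zero, which is the idea behind adding a random feature as a reference point on slide 44. For scikit-learn estimators, `sklearn.inspection.permutation_importance` provides a maintained version of the same procedure.
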
Editor's Notes

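  1. Thousands of years of evolution. Solving the problem of survival: agriculture. Solving the problem of labor: the Industrial Revolution. Solving the problem of thinking: the Information Revolution. Humans are lazy creatures: too lazy to go out hunting, too lazy to labor, too lazy to think. Humans have been thinking for so long.
  2. This is where we have arrived today. Since the invention of the vacuum tube 200 years ago, computers have made enormous progress.
  3. If only… that would be great. If only… it is painful. If only… I could earn money lying down.
  4. As emotional beings, subject to fears, hopes, and agendas, humans are not particularly good at making fact-based decisions, particularly when those decisions involve conflicts of interest. Millions of years of evolution (a genetic algorithm) have fine-tuned our ape brains to survive in a hostile 3-dimensional world where the laws of nature are static. Humans are slow learners, which puts us at a disadvantage in a fast-changing world like finance. Ten years makes a big difference. The first algorithmic trading. Technical trading -> quantitative trading -> high-frequency trading -> machine learning trading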