Options Pricing
Group Project
DSO 530 - Group 14
Bingxin Li, Shimin Liang, Xinye Yang,
Xinyi Zhang, Yuxin Tang, Ziyi Gao
Agenda
Business Understanding
Data Understanding
Data Preparation
Modeling
Business Questions
Business Understanding
[Figure labels: strike price, option price (premium), time to maturity, underlying asset price; profit and loss regions]
Definition: A European call option gives the owner the right, but not the obligation, to buy the underlying security at the strike price at expiry.
For an investor to profit from a European call option, the
stock's price, at expiry, has to be trading high enough
above the strike price to cover the cost of the option
premium.
➔ The market price of an option sometimes deviates
from its fair price, so we need a tool that can help us
judge pricing.
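The profit condition described above can be written out as a small calculation. This is a minimal sketch, assuming the option is held to expiry and ignoring transaction costs; the function names and the numbers in the example are illustrative and do not come from the project data.

```python
def call_profit_at_expiry(spot_at_expiry: float, strike: float, premium: float) -> float:
    """Profit per unit for the buyer of a European call held to expiry."""
    # The option is exercised only when the asset trades above the strike.
    payoff = max(spot_at_expiry - strike, 0.0)
    # The premium is paid up front and is lost whether or not the option is exercised.
    return payoff - premium


def break_even_price(strike: float, premium: float) -> float:
    """Asset price at expiry at which the buyer exactly recovers the premium."""
    return strike + premium


if __name__ == "__main__":
    strike, premium = 100.0, 5.0  # illustrative values only
    for spot in (90.0, 100.0, 105.0, 120.0):
        print(spot, call_profit_at_expiry(spot, strike, premium))
```

The buyer's loss is capped at the premium, and the position only turns profitable once the asset price exceeds the strike price plus the premium.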
6 fields
1680 records
Exploratory Data Analysis
Dependent Variables: one for regression analysis, one for classification analysis
Independent Variables
Checking Missing Values
Data Cleaning
Handling Outliers
Three primary methods of treating outliers
● Trimming/removing the outlier
● Quantile-based flooring and capping
● Mean/median imputation
[Figure: boxplot after data cleaning]
Handling Missing Values
Two primary ways of handling missing values
● Deleting the missing values
● Imputing the missing values
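Two of the cleaning options listed above can be sketched in plain Python: deleting records with missing values, and quantile-based flooring and capping. This is only an illustration under assumptions: the 5th/95th percentile cut-offs, the dictionary row layout, and the function names are hypothetical and are not taken from the project.

```python
import statistics


def drop_incomplete(rows):
    """Delete records that contain any missing (None) value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def cap_by_quantiles(values, lower_pct=5, upper_pct=95):
    """Quantile-based flooring and capping.

    Values below the lower percentile are raised to it, and values above
    the upper percentile are lowered to it. The cut-offs are assumptions.
    """
    # quantiles(..., n=100) returns the 99 percentile cut points.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]


if __name__ == "__main__":
    rows = [{"S": 431.6, "K": None}, {"S": 432.6, "K": 420.0}]  # made-up rows
    print(drop_incomplete(rows))

    values = list(range(1, 101)) + [10000]  # one obvious outlier
    print(max(cap_by_quantiles(values)))
```

Capping keeps the record count unchanged, whereas deletion shrinks the dataset; which is preferable depends on whether the extreme value is a data-entry error or a genuine observation.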
Model Preparation
Normalization (MinMax Scaler): regression & classification
Dummy variables: classification
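The two preparation steps can be illustrated without any library. The sketch below shows the min-max formula that a MinMax scaler applies to one column, and a 0/1 encoding of a two-class target. The choice of which class maps to 1 and the sample numbers are assumptions made for illustration.

```python
def min_max_scale(column):
    """Rescale a numeric column to the range [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def encode_label(label):
    """Encode a two-class target as a dummy variable (assumed: Over -> 1, Under -> 0)."""
    return 1 if label == "Over" else 0


if __name__ == "__main__":
    print(min_max_scale([400, 450, 500]))
    print([encode_label(x) for x in ["Over", "Under", "Over"]])
```

Min-max scaling preserves the rank order of the values, which is why it leaves tree-based models unaffected while helping distance-based ones. In a cross-validated workflow the minimum and maximum are usually computed on the training portion only, so that no information leaks from the held-out fold.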
Regression Models Building
1. Use 5 statistical/ML models to predict option
value on training data
2. Use GridSearchCV to tune the parameters of
the models
3. Given the cross-validation scores (R-squared as
criterion), we choose the random forest
model
[Figure: cross-validation scores of regression models]
Regression Champion Model - Random Forest Regression
Why does random forest get a higher R-squared?
Lasso and Ridge are linear models. According to the cross-validation results, random forest predicts
option value much better than the linear models.
Random forest can discover more complex dependencies, at the cost of more fitting time.
But random forest still has some drawbacks…
“Random forests are black boxes derived by machine-learning.”
Classification results
● Use 7 statistical/ML models to predict the BS
label (Over/Under) on training data
● Use GridSearchCV to tune the parameters
of the models
● Given the cross-validation scores
(accuracy rate as criterion), we
choose the random forest model
● We do not choose Gradient Boosting
because its cross-validation scores have a much
larger variance than random forest's, which
indicates it is less stable.
[Figure: cross-validation scores of classification models]
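The selection rule described above (compare mean cross-validation scores, then prefer the model whose scores vary less) can be sketched as follows. The fold scores, the model names, and the `tolerance` threshold are placeholders invented for illustration; they are not the project's results.

```python
from statistics import mean, stdev


def summarize_cv(scores_by_model):
    """Return {model: (mean score, sample standard deviation)} across folds."""
    return {name: (mean(s), stdev(s)) for name, s in scores_by_model.items()}


def pick_model(scores_by_model, tolerance=0.01):
    """Among models within `tolerance` of the best mean score,
    choose the one with the smallest spread across folds."""
    summary = summarize_cv(scores_by_model)
    best_mean = max(m for m, _ in summary.values())
    candidates = {
        name: sd for name, (m, sd) in summary.items() if best_mean - m <= tolerance
    }
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    # Placeholder fold scores, not measured values.
    scores = {
        "model_a": [0.90, 0.96, 0.88, 0.97, 0.89],  # high mean, unstable
        "model_b": [0.91, 0.92, 0.91, 0.92, 0.92],  # similar mean, stable
        "model_c": [0.80, 0.81, 0.79, 0.80, 0.80],  # clearly lower mean
    }
    print(summarize_cv(scores))
    print(pick_model(scores))
```

Here `model_b` is chosen even though `model_a` has a marginally higher mean, mirroring the reasoning of preferring the more stable of two similarly accurate models.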
Classification Champion Model - Random Forest Classification
Why does random forest get a higher accuracy rate?
The random forest algorithm is based on decision trees. It has a higher accuracy rate than distance-based
classification methods like KNN and SVM because:
1. It can judge the importance of each feature
2. It can capture interactions between different features
3. It is less prone to overfitting than a single decision tree
[Figure: one decision tree from the random forest. Node splits shown: K<=427.5 (gini=0.126), K<=452.5 (gini=0.306), K<=422.5 (gini=0.005), K<=427.5 (gini=0.0349), S<=443.411 (gini=0.459), K<=437.5 (gini=0.492)]
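The gini values attached to the tree nodes are Gini impurities. As a rough illustration of how such a number is computed and how a split is scored, here is a minimal sketch; the label lists are made up and the helper names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini_after_split(left, right):
    """Impurity of a binary split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


if __name__ == "__main__":
    parent = ["Over"] * 4 + ["Under"] * 4  # made-up labels
    left, right = parent[:4], parent[4:]   # a split that separates the classes
    print(gini_impurity(parent))
    print(weighted_gini_after_split(left, right))
```

For two classes the impurity ranges from 0 (a pure node) to 0.5 (an even mix). A greedy tree builder picks, at each node, the feature threshold with the lowest weighted impurity after the split.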
Model Selection Criteria
Accuracy vs. Interpretation
Accuracy
➔ Only a score to pass to an automated process
➔ Large amount of data being processed
E.g., spam detection
✓ Random Forest Classifier
✓ Random Forest Regressor
Interpretation
➔ May need further model modification or human judgement
➔ Increases social acceptance
E.g., medical cases
✓ Linear Regression
✓ Decision Tree
Option pricing: thousands of options are traded every day -> huge amount of data
Four Feature Understanding
Strike Price: lower strike price, lower risk of loss (negative relationship with option price)
Asset Value: higher current asset value, lower risk of loss (positive)
Time to Maturity: longer time to maturity, more freedom for buyers to make decisions (positive)
Interest Rate: higher interest rate, higher value for buyers’ cash (positive)
Why Machine Learning Models Outperform Black-Scholes
✓ Machine learning models do
not rely on pre-assumptions
✓ They calculate prices from historical data
✓ They can reproduce most of the
empirical characteristics of
option prices
BS is based on the following
assumptions
1. No dividends are paid out during
the life of the option.
2. The risk-free rate and volatility of
the underlying asset are known and
constant.
3. Markets are random; there are no
emotional decisions.
4. There are no transaction costs in
buying the option.
5. The price of the underlying asset
is log-normally distributed (its log returns are normally distributed).
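For reference, the Black-Scholes price of a European call under the assumptions listed above can be computed with the standard closed-form formula. The sketch below uses the symbols S, K, tau and r as they appear in the notes; the volatility `sigma` is not among the four features described for the dataset, so its value here is purely an assumption, as are the sample inputs. The labelling helper follows the rule stated in the notes (Over when the formula's value exceeds the current value).

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call_price(S: float, K: float, tau: float, r: float, sigma: float) -> float:
    """Black-Scholes price of a European call on a non-dividend-paying asset.

    S: current asset value, K: strike price, tau: time to maturity in years,
    r: annual risk-free rate, sigma: annual volatility (assumed, must be > 0).
    Requires tau > 0.
    """
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)


def label_against_bs(bs_price: float, observed_value: float) -> str:
    """'Over' if the Black-Scholes value exceeds the observed value, else 'Under'."""
    return "Over" if bs_price > observed_value else "Under"


if __name__ == "__main__":
    price = bs_call_price(S=100.0, K=100.0, tau=1.0, r=0.05, sigma=0.2)  # sample inputs
    print(round(price, 4))
    print(label_against_bs(price, observed_value=9.0))
```

Because `sigma` and `r` are held constant inside the formula, any change in volatility over the option's life is outside what it can express, which is the limitation the list above points to.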
Using the trained model to predict option values for Tesla stock?
S&P 500
➔ 500 companies
➔ Less fluctuation
➔ Overall stock market performance
Factors affecting the stock market:
1. Supply and demand
2. Investor sentiment
3. Interest rates
4. Politics
5. Current events
6. Natural calamities
7. Exchange rates
Tesla
➔ Only 1 company
➔ Less stability
➔ Company performance
Factors affecting the company:
1. Product
2. Revenue & Debt
3. Investor capital
4. Management
5. Mergers & Acquisitions
6. …
Thank you!

DSO530 Group project


Editor's Notes

  1. Hello everyone, we are Group 14 presenting our understanding and machine learning models for options pricing project.
  2. We will follow a typical machine learning project workflow by starting with business understanding, and we will conclude our presentation by answering the 4 business questions.
  3. A European call option gives the owner the right to acquire the underlying security at expiry. For an investor to profit from a European call option, the stock's price, at expiry, has to be trading high enough above the strike price to cover the cost of the option price. But the market price of the options sometimes deviates from the fair price, so we need a tool that can help us judge pricing. Therefore, we decided to explore machine learning algorithms in calculating fair option prices.
  4. Our dataset has 1680 records and consists of 2 dependent variables, 4 independent variables. ‘K’ stands for the strike price of option. ‘S’ stands for the current asset value. ‘tau’ stands for days remaining to expiration converted to the percentage of the year. So the legal range should be between 0 and 1. ‘R’ stands for the annual interest rate.
  5. The value field stands for the current European call option value. By applying the BS formula to the features data, we get the predicted option value. If the predicted value is greater than current value, we associate that option with Over, otherwise we associate that with Under. From the count plot, we can see the ratio between ‘over’ and ‘under’ is relatively balanced, so there is no need to upsample or downsample our dataset.
  6. From the boxplot of ‘S’, we can identify one extreme outlier ‘0’. Also, we can identify 2 extreme outliers from the box plot of ‘tau’; those 2 outliers might be due to human error.
  7. From this table we can see observation 292 has 3 missing values, observation 818 has 2 missing values, and one of the missing values is located in the target field.
  8. Since the missing values all appear in 2 records, imputing missing values might distort the data. We choose to delete the two observations. It is obvious that the outlier is due to incorrectly entered or measured data, so we choose to simply drop the outliers.
  9. In the model preparation stage, we first normalize our four feature variables, since they are on different scales. Normalization brings the values of numeric columns to a common scale without distorting differences in the ranges of values. Some of the regression and classification models we use are distance-based, and for these algorithms normalization reduces the scale difference between features. For tree-based algorithms such as random forest, normalization does not change the rank order of the data, so it makes no difference. For easier comparison between models, we chose to use normalized data for all of them. For classification models, we took the extra step of converting the target variable to dummy variables so that the algorithms can read it.
  10. In the first part of model building, we used 5 regression models to predict option value and used GridSearchCV to tune each model's parameters, such as the number of trees in Random Forest. We then used cross validation to evaluate the models and compare their performance. As the box plot shows, the Random Forest model is the most robust: it has the highest R-squared, and the standard deviation of its R-squared is small.
  11. After choosing Random Forest as our final regression model, we wanted to find out why it performs better. First, performance differs significantly between linear models like Lasso and tree-based models like Random Forest. Tree-based models can discover more complex dependencies, whereas linear models can only produce functions with a linear "shape". Therefore, if the relationship between the input variables and option value is non-linear, a tree-based model can capture it, but a linear model cannot. However, Random Forest still has some drawbacks. Almost everyone can understand and interpret linear models easily. In contrast, a Random Forest behaves like a black box, and it is very hard to get such a straightforward interpretation from it. We will dive into the tradeoff between model accuracy and interpretability in the insights part.
  12. We used 7 machine learning models to predict the BS label on training data and used GridSearchCV to tune their parameters. We used 10-fold cross validation to test the models, with accuracy rate as the criterion, and found that gradient boosting and random forest have higher accuracy than all the rest. We finally chose random forest instead of gradient boosting because its scores have a much smaller variance than gradient boosting's, which indicates it is more stable.
  13. After selecting the models, we found 3 possible reasons why decision-tree-based models have a much higher accuracy rate than distance-based models like KNN and SVM. First, random forest can judge the importance of each feature. Second, it can capture interactions between different features. Third, it is less prone to overfitting than a single decision tree. Here is one of the decision trees from our random forest model. Because the decision tree uses a greedy approach to minimize Gini impurity, we can see that the first binary split on K dramatically decreases the Gini impurity, which implies that feature K is much more important than the other features. Thus the model can classify better, resulting in a higher accuracy rate.
  14. We decided to use the model with the highest accuracy, which is the random forest model. We think problems that require more interpretability are case-by-case problems that need further model modification or human judgement based on the result. For example, in tumor detection, doctors need to understand the model to avoid misclassification, so the model should be highly interpretable. What's more, simple models earn trust more easily from others, such as patients. Problems that require a high accuracy rate are those that process large amounts of data at a fast pace and pass the results to another process. There is no need to understand the model logic because high accuracy is good enough for the purpose, and the occasional error does little harm. For example, users do not need to understand a complicated spam detection model; even if some spam goes undetected, the overall impact is small, so no human interpretation or intervention is needed. In our option pricing situation, thousands of options are traded every day. The amount of data being processed is so large that a good overall price prediction with some errors is good enough. Therefore, we think accuracy matters more in this case.
  15. Under the model, we think all 4 features we used are necessary and important for predicting the option price. A higher current asset value and a lower strike price mean that the asset's price needs to rise less for the option to pay off, so there is a lower risk of loss and the option price will be higher. A higher interest rate means call option buyers can earn more interest by holding their cash in the bank until maturity, so they are willing to pay more for the call option. In contrast, the seller loses the opportunity to benefit from the higher interest rate, because the cash stays in the buyer's hands until maturity. Lastly, a longer time to maturity gives call option holders more freedom to trade the option in the market, which again increases the call option price. Also, in our linear regression test, the p-values for all variables are well below 0.05, meaning they are all statistically significant and worth including.
  16. Machine learning models avoid the assumptions made by Black-Scholes, such as a constant risk-free rate and constant volatility over the option's life, and instead learn from historical data. These assumptions do not hold in reality; for example, volatility fluctuates with the level of supply and demand. The other assumptions can likewise lead to prices that deviate from the real world. Machine learning models relax these assumptions and calculate prices from a large amount of historical data, and this pricing process can reproduce most of the empirical characteristics of option prices. In this sense, machine learning models outperform Black-Scholes.