3. Business Understanding
[Payoff diagram: profit and loss versus the underlying asset price at expiry, annotated with the strike price, the option price (premium), and time to maturity]
Definition: A European call option gives the owner the right to acquire the underlying security at expiry.
For an investor to profit from a European call option, the stock's price at expiry has to be trading high enough above the strike price to cover the cost of the option premium.
➔ The market price of an option sometimes deviates from its fair price, so we need a tool that can help us judge pricing.
8. Data Cleaning
Handling Outliers
Three primary methods of treating outliers:
● Trimming/removing the outlier
● Quantile-based flooring and capping
● Mean/median imputation
[Boxplot after data cleaning]
Handling Missing Values
Two primary ways of handling missing values:
● Deleting the missing values
● Imputing the missing values
10. Regression Model Building
Cross Validation Scores of Regression Models
1. Use 5 statistical/ML models to predict option value on training data
2. Use GridSearchCV to tune the parameters of the models
3. Given the cross-validation scores (R-squared as criterion), we finally choose the random forest model
11. Regression Champion Model - Random Forest Regression
Why does random forest get a higher R-squared?
Lasso and Ridge are types of linear models. According to the cross-validation results, random forest has a much greater advantage in predicting option value than the linear models.
Random forest is able to discover more complex dependencies, at the cost of longer fitting time.
But random forest still has some drawbacks…
“Random forests are black boxes derived by machine learning.”
12. Classification Results
Cross Validation Scores of Classification Models
● Use 7 statistical/ML models to predict the BS over/under label on training data
● Use GridSearchCV to tune the parameters of the models
● Given the cross-validation scores (accuracy as criterion), we finally choose the random forest model
● We do not choose gradient boosting because it has a much larger variance than random forest, which indicates it is unstable
13. Classification Champion Model - Random Forest Classification
Why does random forest get a higher accuracy rate?
The random forest algorithm is based on decision trees. It has a higher accuracy rate than distance-based classification methods like KNN and SVM because:
1. It can judge the importance of each feature
2. It can judge the interaction between different features
3. It is less prone to overfitting than a single decision tree
[Example tree from the forest: successive binary splits, mostly on K (e.g. K<=427.5, gini=0.126; K<=452.5, gini=0.306), with one split on S (S<=443.411, gini=0.459)]
14. Model Selection Criteria
Accuracy vs. Interpretation

When accuracy matters:
➔ Only a score to pass to an automated process
➔ Large amount of data being processed
Eg: spam detection
Suitable models: Random Forest Regressor ✓, Random Forest Classifier ✓

When interpretation matters:
➔ Results may need further modification
➔ Increases social acceptance
Eg: medical cases
Suitable models: Linear Regression ✓, Decision Tree ✓

Option pricing: thousands of options trading every day -> a huge amount of data, so accuracy matters more.
15. Four Feature Understanding
● Strike Price (Negative): lower strike price, lower risk of loss
● Asset Value (Positive): higher current asset value, lower risk of loss
● Time to Maturity (Positive): higher time to maturity, more freedom for buyers to make decisions
● Interest Rate (Positive): higher interest rate, higher value for buyers’ cash
16. Why Machine Learning Models Outperform Black-Scholes
BS is based on the following assumptions:
1. No dividends are paid out during the life of the option.
2. The risk-free rate and volatility of the underlying asset are known and constant.
3. Market movements are random; there are no emotion-driven decisions.
4. There are no transaction costs in buying the option.
5. The returns on the underlying asset are log-normally distributed.
In contrast, machine learning models:
✓ Do not rely on pre-assumptions
✓ Calculate from historical data
✓ Can reproduce most of the empirical characteristics of option prices
17. Using the trained model to predict option values for Tesla stock?

S&P 500:
➔ 500 companies
➔ Less fluctuation
➔ Reflects overall stock market performance
Factors affecting the stock market:
1. Supply and demand
2. Investor sentiment
3. Interest rates
4. Politics
5. Current events
6. Natural calamities
7. Exchange rates

Tesla:
➔ Only 1 company
➔ Less stability
➔ Reflects company performance
Factors affecting the stock:
1. Product
2. Revenue & debt
3. Investor capital
4. Management
5. Mergers & acquisitions
6. …
Hello everyone, we are Group 14, presenting our understanding and machine learning models for the options pricing project.
We will follow a typical machine learning project workflow by starting with business understanding, and we will conclude our presentation by answering the 4 business questions.
A European call option gives the owner the right to acquire the underlying security at expiry.
For an investor to profit from a European call option, the stock's price at expiry has to be trading high enough above the strike price to cover the cost of the option premium.
But the market price of the options sometimes deviates from the fair price, so we need a tool that can help us judge pricing.
Therefore, we decided to explore machine learning algorithms in calculating fair option prices.
Our dataset has 1680 records and consists of 2 dependent variables and 4 independent variables.
‘K’ stands for the strike price of the option.
‘S’ stands for the current asset value.
‘tau’ stands for days remaining to expiration converted to the percentage of the year. So the legal range should be between 0 and 1.
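As a small illustration, the day-count conversion described for ‘tau’ might be sketched as follows; the 365-day year and the function name are our assumptions (the project may have used trading days instead):

```python
def days_to_tau(days_to_expiry: float, days_per_year: int = 365) -> float:
    """Convert days remaining to expiration into a fraction of a year."""
    tau = days_to_expiry / days_per_year
    # Enforce the legal range [0, 1] stated in the data description.
    if not 0.0 <= tau <= 1.0:
        raise ValueError(f"tau={tau:.3f} is outside the legal range [0, 1]")
    return tau
```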
‘R’ stands for the annual interest rate.
The value field stands for the current European call option value.
By applying the BS formula to the feature data, we get a predicted option value.
If the predicted value is greater than the current value, we label that option ‘Over’; otherwise, we label it ‘Under’.
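The labeling rule just described can be sketched with the standard Black-Scholes call formula. Note that the dataset as described has no volatility column, so `sigma` below is an assumed extra input, and the function names are ours:

```python
from math import exp, log, sqrt
from statistics import NormalDist

def bs_call_price(S: float, K: float, tau: float, r: float, sigma: float) -> float:
    """Black-Scholes price of a European call.
    S: current asset value, K: strike price, tau: time to maturity in years,
    r: annual risk-free rate, sigma: volatility (assumed known and constant)."""
    N = NormalDist().cdf
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * N(d1) - K * exp(-r * tau) * N(d2)

def label_option(bs_value: float, market_value: float) -> str:
    """'Over' if the BS prediction exceeds the current market value, else 'Under'."""
    return "Over" if bs_value > market_value else "Under"
```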
From the count plot, we can see the ratio between ‘over’ and ‘under’ is relatively balanced, so there is no need to upsample or downsample our dataset.
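A minimal sketch of that balance check; the counts are made up, and the 1.5 ratio threshold is a hypothetical stand-in for "relatively balanced":

```python
import pandas as pd

# Hypothetical frame mirroring the dataset's 'BS' label column.
df = pd.DataFrame({"BS": ["Over"] * 870 + ["Under"] * 810})

counts = df["BS"].value_counts(normalize=True)  # class proportions
imbalance_ratio = counts.max() / counts.min()
# A ratio near 1 means the classes are balanced enough to skip resampling.
needs_resampling = imbalance_ratio > 1.5
```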
From the boxplot of ‘S’, we can identify one extreme outlier ‘0’.
Also, we can identify 2 extreme outliers from the box plot of ‘tau’; those 2 outliers might be due to human error.
From this table we can see observation 292 has 3 missing values, observation 818 has 2 missing values, and one of the missing values is located in the target field.
Since the missing values are concentrated in just 2 records, and imputing them might distort the data, we choose to delete the two observations.
It is obvious that these outliers are due to incorrectly entered or measured data, so we choose to simply drop them.
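In pandas, the two cleaning steps described (deleting the records carrying missing values, then dropping the impossible outliers) might look like this; the toy values are invented to mirror the outliers mentioned above:

```python
import pandas as pd

# Toy frame standing in for the option dataset.
df = pd.DataFrame({
    "S": [431.0, 0.0, 445.2, 438.9],    # 0.0 is the extreme outlier from the boxplot
    "tau": [0.12, 0.30, 3.50, 0.25],    # 3.50 is outside the legal [0, 1] range
    "value": [21.3, None, 18.7, 25.1],  # a missing target value
})

# Drop the records that carry missing values...
df = df.dropna()
# ...then drop rows whose values are clearly data-entry errors.
df = df[(df["S"] > 0) & df["tau"].between(0, 1)]
```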
In the model preparation stage, we first normalize our four feature variables since they are on different scales. Normalization rescales numeric columns to a common scale without distorting differences in the ranges of values.
Some of the regression and classification models we will use are distance-based, and for those algorithms normalization helps reduce the scale difference between features. For tree-based algorithms such as random forest, normalization does not change the ranks of the data, so it makes no difference. For easier comparison between models, we use normalized data for all of them.
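A sketch of the normalization step, assuming min-max scaling to [0, 1] (the write-up does not say which scaler was used); the feature values are illustrative:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Features on very different scales: K, S, tau, R (illustrative values).
X = np.array([
    [420.0, 431.6, 0.12, 0.03],
    [450.0, 445.2, 0.50, 0.05],
    [435.0, 438.9, 0.88, 0.04],
])

scaler = MinMaxScaler()              # rescales each column to [0, 1]
X_scaled = scaler.fit_transform(X)   # within-column ordering is preserved
```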
For classification models, we took the extra step of converting the target variable to dummies so that the algorithms can read it.
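That encoding step might look like the following; the 0/1 mapping direction is our assumption:

```python
import pandas as pd

# Hypothetical slice of the target column.
labels = pd.Series(["Over", "Under", "Over", "Under"], name="BS")
# Map the two classes to 0/1 so the classifiers can consume the target.
y = labels.map({"Under": 0, "Over": 1})
```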
In the first part of model building, we used 5 regression models to predict option value and used GridSearchCV to tune each model's parameters, such as the number of trees in the random forest. We then ran cross-validation to evaluate the models and compare their performance. As the box plot shows, the random forest model is the most robust: it has the highest R-squared, and the standard deviation of its R-squared is also small.
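A condensed sketch of this tuning-plus-cross-validation loop with scikit-learn, on synthetic data and with a deliberately small parameter grid (the project's actual grid is not given):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(size=(120, 4))                         # stand-ins for scaled K, S, tau, R
y = np.maximum(X[:, 1] - X[:, 0], 0) + 0.1 * X[:, 2]   # call-payoff-like synthetic target

# Tune the number of trees, then score the winner with 5-fold CV (R-squared).
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100]},
    scoring="r2",
    cv=5,
)
grid.fit(X, y)
cv_scores = cross_val_score(grid.best_estimator_, X, y, scoring="r2", cv=5)
```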
After choosing random forest as our final regression model, we wanted to find out why it performs better. The performance differs significantly between linear models like Lasso and tree-based models like random forest. Tree-based models can discover more complex dependencies, whereas linear models can only produce functions with a linear "shape". Therefore, if the relationship between the input variables and the option value is non-linear, a tree-based model can capture it but a linear model cannot. However, random forest still has some drawbacks. Almost everyone can understand and interpret linear models easily; in contrast, random forest is like a black box, and it is very hard to get such a straightforward interpretation. We will dive into the tradeoff between model accuracy and interpretability in the insights part.
We used 7 machine learning models with GridSearchCV to tune their parameters and predicted the BS over/under label on the training data. We tested each model with 10-fold cross-validation, using accuracy as the criterion, and found that gradient boosting and random forest have higher accuracy than all the rest. We finally chose random forest over gradient boosting because gradient boosting has a much larger variance, which indicates it is less stable.
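The comparison described here, 10-fold cross-validation with accuracy as the criterion, looking at both the mean and the spread, can be sketched on synthetic data as follows:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.uniform(size=(150, 4))          # stand-ins for the four scaled features
y = (X[:, 1] > X[:, 0]).astype(int)     # stand-in for the Over/Under label

models = {
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
}
# 10-fold CV accuracy; the std of the fold scores measures stability.
results = {
    name: cross_val_score(model, X, y, scoring="accuracy", cv=10)
    for name, model in models.items()
}
summary = {name: (scores.mean(), scores.std()) for name, scores in results.items()}
```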
After selecting the models, we found 3 possible reasons to explain why decision tree based models have much higher accuracy rate compared to distance based models like KNN and SVM.
First, it can judge the importance of each feature.
Second, it can judge the interaction between different features.
Third, it is less prone to overfitting than a single decision tree.
Here is one of the decision trees from our random forest model. Because a decision tree uses a greedy approach to minimize Gini impurity, we can see the first binary split on the K value dramatically decreases the Gini impurity, which implies that feature K is much more important than the other features. This lets the model make better classifications and achieve a higher accuracy rate.
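The impurity-based feature importance this paragraph appeals to can be read directly off a fitted forest in scikit-learn; here on synthetic data where the label depends only on the stand-ins for K and S:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.uniform(size=(200, 4))          # columns stand in for K, S, tau, R
y = (X[:, 1] > X[:, 0]).astype(int)     # label driven only by the K and S columns

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# Normalized mean decrease in Gini impurity per feature, across all trees.
importances = dict(zip(["K", "S", "tau", "R"], clf.feature_importances_))
```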
We decided to use the model with the highest accuracy, which is the random forest model. We think problems that require more interpretation are case-by-case problems that need further model modifications or judgments based on the result. For example, in tumor detection, doctors need to understand the model to avoid misclassification, so the model should be highly interpretable. What's more, simple models find it easier to earn trust from others, such as patients.
Problems that require a high accuracy rate are ones that need to process large amounts of data at a fast pace to deliver the results to another process. There is no need to understand the model logic because high accuracy is good enough for the purpose. Even if they sometimes generate some errors, the influence will not be detrimental. For example, the users do not need to understand the complicated spam detection model. Even if spam is not detected, the overall influence is not huge, so no human interpretation and intervention is needed.
For our option pricing situation, there are thousands of options trading every day. The amount of data being processed is so huge that a good overall prediction of prices, even with some errors, is good enough. Therefore, we think accuracy matters more in this case.
Under the model, we think all 4 features we used are necessary and important for predicting the option price. A higher current asset value or a lower strike price means the asset's price needs to rise less for the option to pay off, so there is a lower risk of loss and the option price will be higher. A higher interest rate means call option buyers can earn more interest by holding cash in the bank until maturity, so they are willing to pay more for the call option; in contrast, the seller loses the opportunity to benefit from the increased interest rate, because the cash stays in the buyer's hands until maturity. Lastly, a longer time to maturity gives call option holders more freedom to trade the option in the market, which again increases the call option price. Also, in our linear regression test, the p-values for all variables are well under 0.05, meaning they are all important to include.
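The significance check mentioned at the end can be reproduced with a hand-rolled OLS t-test; the data below is synthetic, with coefficient signs chosen to match the feature discussion above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 200
X = rng.uniform(size=(n, 4))             # stand-ins for K, S, tau, R
beta = np.array([-1.0, 1.0, 0.5, 0.3])   # K negative, the other three positive
y = X @ beta + rng.normal(scale=0.05, size=n)

# Ordinary least squares with an intercept, plus classic t-test p-values.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef
dof = n - A.shape[1]
sigma2 = resid @ resid / dof                       # residual variance estimate
cov = sigma2 * np.linalg.inv(A.T @ A)              # coefficient covariance
t_stats = coef / np.sqrt(np.diag(cov))
p_values = 2 * stats.t.sf(np.abs(t_stats), dof)    # two-sided p-value per coefficient
```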
Machine learning models avoid the assumptions made by Black-Scholes, such as a constant risk-free rate and constant volatility over the option's life, and instead learn from historical data. These assumptions do not hold in reality: volatility fluctuates with supply and demand, and the other assumptions can likewise lead to prices that deviate from the real world. Machine learning models relax these assumptions and calculate prices from a large amount of historical data, and this pricing process can reproduce most of the empirical characteristics of option prices. In this sense, machine learning models outperform Black-Scholes.