Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Time Series Analysis Sample Code
1. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 1/40
Time
Series
Analysis
In [49]: import os
import datetime
import requests
import zipfile
import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.tsa.ar_model import AR
from sklearn.metrics import r2_score, mean_squared_error
%matplotlib inline
0.
Get
Data
In [50]: sorted(list(filter(lambda f: not f.startswith("."), os.listdir("."))))
In [51]: def get_data(data_url):
with requests.get(data_url) as r:
with zipfile.ZipFile(io.BytesIO(r.content)) as z:
z.extractall()
In [52]: data_url = "http://quantquote.com/files/quantquote_daily_sp500_83986.zi
p"
get_data(data_url=data_url)
data_dir = os.path.join("quantquote_daily_sp500_83986", "daily")
Out[50]: ['README.md',
'Untitled.ipynb',
'census_data.csv',
'data_science_answers.ipynb',
'data_science_raw.ipynb',
'pandas_basics.ipynb',
'pandas_basics_addntl.ipynb',
'pandas_basics_answers.ipynb',
'python_basics.ipynb',
'python_basics_addntl.ipynb',
'python_basics_answers.ipynb',
'quantquote_daily_sp500_83986',
'requirements.txt',
'time_series_analysis.ipynb',
'time_series_analysis_answers.ipynb']
2. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 2/40
Scope
out
data
directory
In [53]: stock_csv_names = sorted(os.listdir(data_dir))
In [54]: cols = ['date', 'time', 'open', 'high', 'low_price', 'close', 'volume']
Check
sample
data
In [55]: df = pd.read_csv(os.path.join(data_dir, stock_csv_names[0]),names=cols)
In [56]: df.shape
In [64]: df.head()
In [58]: df.dtypes
We see that most of the typing here looks good (no object / string representations of numeric data) but we do
see that date is coming in as an int rather than a datetime.
Out[56]: (3452, 7)
Out[64]:
date time open high low_price close volume
0 1999-11-18 0 42.2076 46.3820 37.4581 39.1928 4.398181e+07
1 1999-11-19 0 39.8329 39.8885 36.9293 37.6251 1.139020e+07
2 1999-11-22 0 38.3208 40.0091 37.1613 39.9442 4.654716e+06
3 1999-11-23 0 39.4247 40.4729 37.3375 37.5138 4.268903e+06
4 1999-11-24 0 37.2262 38.9052 37.1056 38.0889 3.602367e+06
Out[58]: date int64
time int64
open float64
high float64
low_price float64
close float64
volume float64
dtype: object
3. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 3/40
In [59]: df.isnull().sum()
In [60]: df.duplicated().sum()
No missing data, and no duplicate data.
In [61]: df.date = pd.to_datetime(df.date.astype(str), infer_datetime_format=True
)
check that this is a proper time-series data set, i.e. we're indexed on time, which in this case will mean that we
have one date for every row:
In [67]: df.date.nunique()/len(df)
In [31]: df=df.set_index('date')
In [68]: df.time.nunique()
In [69]: df=df.drop('time', axis=1)
Problem
Get an iterable of DataFrames , one for each stock in our dataset, with the wrangling we did above included.
In [82]: def get_csv_path(csv_name, stock_csv_folder=data_dir):
return os.path.join(stock_csv_folder, csv_name)
Out[59]: date 0
time 0
open 0
high 0
low_price 0
close 0
volume 0
dtype: int64
Out[60]: 0
Out[67]: 1.0
Out[68]: 1
4. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 4/40
In [83]: def get_df(csv_name, cols=cols):
df = pd.read_csv(get_csv_path(csv_name),
names=cols,
usecols=list(filter(lambda c: c!= "time", cols)))
df.date = pd.to_datetime(df.date.astype(str), infer_datetime_format=
True)
return df.set_index("date", drop=False)
In [84]: dfs_iter = (get_df(csv_name) for csv_name in stock_csv_names)
In [85]: dfs_list = list(dfs_iter) #convert generator object to list
In [89]: len(dfs_list)
1.
Prices
&
Returns
Prices
In [90]: aapl_df = get_df("table_aapl.csv")
In [93]: pd.Series(aapl_df.index).quantile([0, 1])
In [94]: aapl_df.isnull().sum()
In [95]: aapl_df.duplicated().sum()
Out[89]: 500
Out[93]: 0.0 1998-01-02
1.0 2013-08-09
Name: date, dtype: datetime64[ns]
Out[94]: date 0
open 0
high 0
low_price 0
close 0
volume 0
dtype: int64
Out[95]: 0
5. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 5/40
In [96]: ax = aapl_df.close.plot(figsize=(11, 8))
t = ax.set_title("aapl: closing price")
A couple of observations on the above graph:
the stock's price has increased over time
there is a good bit of variability in between the start and end points
it would have been nice to buy AAPL back in the 90s!
We can see clearly that there are a number of different components, if you will, to the above time series:
there's an upward trend over time
there look to be some periodic-ish patterns
there's a fair amount of noise-ish stuff, too
9. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 9/40
In [113]: ax = aapl_df.return_net.plot(figsize=(11, 8))
t = ax.set_title("aapl: net daily return")
10. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 10/40
In [114]: plot_time_series_decomposition(aapl_df.return_simple.dropna())
In [115]: aapl_df["return_simple_rolling_21_mean"] = aapl_df.return_simple.rolling
(21).mean()
11. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 11/40
In [122]: ax = aapl_df.return_simple_rolling_21_mean.plot(figsize=(11, 8))
t = ax.set_title("aapl: net daily return, rolling 21 mean(one-month)")
retain this functionality to try other windows
In [127]: def set_and_plot_rolling_mean(window):
col_name = f"return_simple_rolling_{window}_mean"
aapl_df[col_name] = aapl_df.return_simple.rolling(window).mean()
ax = aapl_df[col_name].plot(figsize=(11, 8))
t = ax.set_title(f"aapl: net daily return, rolling {window} day mea
n")
12. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 12/40
In [121]: set_and_plot_rolling_mean(63)
In [123]: set_and_plot_rolling_mean(166)
13. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 13/40
In [124]: set_and_plot_rolling_mean(252)
In [126]: def set_and_plot_rolling_std(window):
col_name = f"return_simple_rolling_{window}_std"
aapl_df[col_name] = aapl_df.return_simple.rolling(window).std()
ax = aapl_df[col_name].plot(figsize=(11, 8))
t = ax.set_title(f"aapl: net daily return, rolling {window} day std"
)
14. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 14/40
In [128]: set_and_plot_rolling_std(63)
Plot rolling mean and std together
In [129]: [c for c in aapl_df.columns if "63" in c]
In [132]: aapl_quarterly = aapl_df[[c for c in aapl_df.columns if "63" in c]].drop
na()
In [134]: aapl_quarterly = aapl_quarterly / aapl_quarterly.iloc[0]
In [135]: aapl_quarterly.iloc[0:5]
Out[129]: ['return_simple_rolling_63_mean', 'return_simple_rolling_63_std']
Out[135]:
return_simple_rolling_63_mean return_simple_rolling_63_std
date
1998-04-03 1.000000 1.000000
1998-04-06 0.973895 1.004831
1998-04-07 0.601837 0.829599
1998-04-08 0.699827 0.788331
1998-04-09 0.681255 0.784945
15. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 15/40
In [136]: ax = aapl_quarterly.plot(figsize=(11, 8))
t = ax.set_title("aapl: simple daily return, rolling 63 mean and std")
inverse relationship between the two
In [138]: aapl_quarterly.corr()
In [139]: aapl_quarterly.iloc[0:5]
Out[138]:
return_simple_rolling_63_mean return_simple_rolling_63_std
return_simple_rolling_63_mean 1.000000 -0.370614
return_simple_rolling_63_std -0.370614 1.000000
Out[139]:
return_simple_rolling_63_mean return_simple_rolling_63_std
date
1998-04-03 1.000000 1.000000
1998-04-06 0.973895 1.004831
1998-04-07 0.601837 0.829599
1998-04-08 0.699827 0.788331
1998-04-09 0.681255 0.784945
16. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 16/40
In [141]: plot_time_series_decomposition(aapl_df.return_simple_rolling_63_std.drop
na())
Compare rolling returns and volatility with differing window lengths and quantify these relationships using
correlation.
In [142]: def compare_return_to_vol(return_col, vol_col, df=aapl_df):
return df[[return_col, vol_col]].corr()
In [144]: compare_return_to_vol("return_simple_rolling_63_mean", "return_simple_ro
lling_63_std")
Out[144]:
return_simple_rolling_63_mean return_simple_rolling_63_std
return_simple_rolling_63_mean 1.000000 -0.370614
return_simple_rolling_63_std -0.370614 1.000000
18. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 18/40
In [151]: compare_return_to_vol("return_simple_rolling_166_mean", "return_simple_r
olling_166_std")
In [152]: aapl_halflyish = aapl_df[[c for c in aapl_df.columns if "166" in c]
].dropna()
In [153]: aapl_halflyish = aapl_halflyish / aapl_halflyish.iloc[0]
In [154]: ax = aapl_halflyish.plot(figsize=(11, 8))
t = ax.set_title("aapl: simple daily return, rolling 166 mean and std")
2.
Multi-Stock
Analysis
In [155]: len(dfs_list)
Out[151]:
return_simple_rolling_166_mean return_simple_rolling_166_std
return_simple_rolling_166_mean 1.000000 -0.349501
return_simple_rolling_166_std -0.349501 1.000000
Out[155]: 500
19. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 19/40
In [157]: os.listdir(data_dir)[:5]
In [158]: os.listdir(data_dir)[-5:]
In [159]: def get_stock_name(csv_name):
file_name = csv_name.split(".")[0]
return file_name.split("_")[1]
In [160]: dfs_list_indexed = [
pd.concat({get_stock_name(csv_name): dfs_list[i]}, axis=1)
for i, csv_name in enumerate(stock_csv_names)]
In [161]: all_stocks = dfs_list_indexed[0].join(dfs_list_indexed[1:])
In [162]: all_stocks.iloc[:5, :8]
Out[157]: ['table_dlph.csv',
'table_cat.csv',
'table_coh.csv',
'table_mcd.csv',
'table_ca.csv']
Out[158]: ['table_schw.csv',
'table_cl.csv',
'table_te.csv',
'table_vz.csv',
'table_hrs.csv']
Out[162]:
a aa
date open high low_price close volume date open
date
1999-11-
18
1999-11-
18
42.2076 46.3820 37.4581 39.1928 4.398181e+07
1999-11-
18
24.5183
1999-11-
19
1999-11-
19
39.8329 39.8885 36.9293 37.6251 1.139020e+07
1999-11-
19
24.5647
1999-11-
22
1999-11-
22
38.3208 40.0091 37.1613 39.9442 4.654716e+06
1999-11-
22
24.6846
1999-11-
23
1999-11-
23
39.4247 40.4729 37.3375 37.5138 4.268903e+06
1999-11-
23
25.0018
1999-11-
24
1999-11-
24
37.2262 38.9052 37.1056 38.0889 3.602367e+06
1999-11-
24
25.0482
20. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 20/40
In [164]: all_stocks["aa"].head()
Close
In [165]: tickers = all_stocks.columns.levels[0]
tickers[:5]
In [166]: _all_close_list = [all_stocks[tick].close for tick in tickers]
In [170]: _all_close_list[0].head()
In [171]: _all_close_list = [srs.rename(tickers[i])
for i, srs in enumerate(_all_close_list)]
In [176]: all_stocks_close = _all_close_list[0].to_frame().join(_all_close_list[1
:])
Out[164]:
date open high low_price close volume
date
1999-11-18 1999-11-18 24.5183 24.5183 24.0579 24.2514 2658683.414
1999-11-19 1999-11-19 24.5647 24.7581 24.2978 24.6846 3022133.556
1999-11-22 1999-11-22 24.6846 25.1450 24.4912 25.0250 4525318.956
1999-11-23 1999-11-23 25.0018 25.4351 24.9515 25.2185 5622139.724
1999-11-24 1999-11-24 25.0482 25.0482 24.6614 24.9283 3144923.734
Out[165]: Index(['a', 'aa', 'aapl', 'abbv', 'abc'], dtype='object')
Out[170]: date
1999-11-18 39.1928
1999-11-19 37.6251
1999-11-22 39.9442
1999-11-23 37.5138
1999-11-24 38.0889
Name: close, dtype: float64
21. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 21/40
In [177]: all_stocks_close.iloc[:5, :8]
In [178]: all_stocks_returns = all_stocks_close / all_stocks_close.shift(1)
Correlation
In [179]: corrs = all_stocks_returns.corr()
In [180]: corrs.iloc[:5, :5]
In [181]: corrs_modified = corrs.replace(1, np.nan)
In [182]: corrs_modified.iloc[:5, :5]
Out[177]:
a aa aapl abbv abc abt ace acn
date
1999-11-18 39.1928 24.2514 21.7924 NaN 3.09830 11.9023 14.3008 NaN
1999-11-19 37.6251 24.6846 22.4756 NaN 3.05589 11.8612 13.9294 NaN
1999-11-22 39.9442 25.0250 22.0185 NaN 2.95767 12.2375 14.0259 NaN
1999-11-23 37.5138 25.2185 22.6264 NaN 2.74784 12.0383 13.9294 NaN
1999-11-24 38.0889 24.9283 23.0373 NaN 2.74784 12.1964 13.3722 NaN
Out[180]:
a aa aapl abbv abc
a 1.000000 0.383146 0.361200 0.164908 0.169508
aa 0.383146 1.000000 0.283109 0.067920 0.254100
aapl 0.361200 0.283109 1.000000 0.032483 0.142794
abbv 0.164908 0.067920 0.032483 1.000000 0.282846
abc 0.169508 0.254100 0.142794 0.282846 1.000000
Out[182]:
a aa aapl abbv abc
a NaN 0.383146 0.361200 0.164908 0.169508
aa 0.383146 NaN 0.283109 0.067920 0.254100
aapl 0.361200 0.283109 NaN 0.032483 0.142794
abbv 0.164908 0.067920 0.032483 NaN 0.282846
abc 0.169508 0.254100 0.142794 0.282846 NaN
22. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 22/40
In [183]: corrs_modified.max().sort_values(ascending=False)[:5]
In [184]: corrs_modified.min().sort_values(ascending=False)[:5]
Problem
Create a price
index that provides insight into the daily tendencies of our 500 stocks. We are going to do this as
follows: calculate, for each day, the
mean
spread
between
open
and
closing
prices
across
all
500
stocks.
Next, vizualize your results. Plot on the same axes a the volatility
of
the
open
and
closing
price
spreads. Note
that you may want to normalize prior to plotting.
Additionally, find:
the 10 days with the highest average open-close spreads
the 10 days with the highest volatility in their open-close spreads
In [185]: _all_open_list = [all_stocks[tick].open for tick in tickers]
In [186]: _all_open_list[0].head()
In [187]: _all_open_list = [srs.rename(tickers[i])
for i, srs in enumerate(_all_open_list)]
In [188]: all_stocks_open = _all_open_list[0].to_frame().join(_all_open_list[1:])
Out[183]: eqr 0.884192
avb 0.884192
spg 0.883757
bxp 0.883757
vno 0.883234
dtype: float64
Out[184]: wyn 0.178355
pfg 0.175845
dfs 0.175066
dd 0.168540
hst 0.159443
dtype: float64
Out[186]: date
1999-11-18 42.2076
1999-11-19 39.8329
1999-11-22 38.3208
1999-11-23 39.4247
1999-11-24 37.2262
Name: open, dtype: float64
23. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 23/40
In [189]: all_stocks_open.iloc[:5, :8]
In [190]: mean_spread = (all_stocks_close - all_stocks_open).mean(axis=1)
In [191]: mean_spread.head()
In [193]: std_spread = (all_stocks_close - all_stocks_open).std(axis=1)
In [194]: std_spread.head()
In [196]: joined_spread = mean_spread.to_frame().join(std_spread)
In [197]: joined_spread = joined_spread / joined_spread.iloc[0]
Out[189]:
a aa aapl abbv abc abt ace acn
date
1999-11-18 42.2076 24.5183 22.1401 NaN 2.88847 12.1775 14.8134 NaN
1999-11-19 39.8329 24.5647 21.7608 NaN 3.08267 12.0003 14.3973 NaN
1999-11-22 38.3208 24.6846 22.3079 NaN 3.01348 12.0193 14.0259 NaN
1999-11-23 39.4247 25.0018 22.3079 NaN 2.95767 12.1585 14.2116 NaN
1999-11-24 37.2262 25.0482 22.6118 NaN 2.73445 12.0794 13.8848 NaN
Out[191]: date
1999-11-18 0.029497
1999-11-19 -0.136570
1999-11-22 -0.136851
1999-11-23 -0.344567
1999-11-24 0.187926
dtype: float64
Out[194]: date
1999-11-18 1.447925
1999-11-19 2.082665
1999-11-22 1.295058
1999-11-23 2.393124
1999-11-24 1.648351
dtype: float64
24. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 24/40
In [198]: ax = joined_spread.plot(figsize=(11, 8))
3.
Returns,
by
Month
Check if AAPL post better daily performance in certain months
Average
daily
returns,
by
month
In [200]: aapl_df["month"] = aapl_df.date.apply(lambda d: d.month)
In [201]: daily_by_month = aapl_df[["month", "return_simple"]
].groupby("month"
).agg([np.mean, np.std])
26. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 26/40
In [204]: ax = daily_by_month.plot.barh(figsize=(11, 11))
t = ax.set_title("aapl: average simple daily return, by month")
So, we see that, on average, AAPL has seen negative daily returns (with higher-than-average volatility) in
September - earnings?
Average
monthly
returns,
by
month
In [206]: aapl_df["year"] = aapl_df.date.apply(lambda d: d.year)
In [207]: monthly_by_month = aapl_df[["year", "month", "return_simple"]
].groupby(["year", "month"]
).agg([np.amin, np.amax])
In [209]: monthly_by_month = monthly_by_month.diff(axis=1)
27. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 27/40
In [211]: monthly_by_month.columns = monthly_by_month.columns.swaplevel()
In [212]: monthly_by_month = monthly_by_month["amax"]
In [213]: monthly_by_month.head()
In [218]: monthlies = monthly_by_month.groupby(level=1
).agg([np.mean, np.std])
Out[213]:
return_simple
year month
1998 1 0.262945
2 0.106502
3 0.151977
4 0.089050
5 0.076243
28. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 28/40
In [219]: ax = monthlies.plot.barh(figsize=(11, 11))
t = ax.set_title("aapl: average simple monthly return, by month")
4.
Autocorrelation
29. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 29/40
In [220]: ax = aapl_df.return_simple.to_frame(
).join(
aapl_df.return_simple.shift(1).rename("shifted")
).plot.scatter(x="shifted", y="return_simple", figsize=(11,
8))
scrub the outlier and try again
In [221]: t_and_t_plus_one = aapl_df.return_simple.to_frame(
).join(
aapl_df.return_simple.shift(1).rename("shif
ted")
)
In [222]: (t_and_t_plus_one < -.5).sum()
In [223]: for col in t_and_t_plus_one.columns:
t_and_t_plus_one.loc[t_and_t_plus_one[col] < -.5, col] = np.nan
Out[222]: return_simple 1
shifted 1
dtype: int64
30. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 30/40
In [224]: ax = t_and_t_plus_one.plot.scatter(x="shifted", y="return_simple", figsi
ze=(11, 8))
In [226]: aapl_autocorrs = pd.Series({
lag: aapl_df.return_simple.autocorr(lag=lag) for lag in list
(range(1, 20))
})
31. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 31/40
In [227]: ax = aapl_autocorrs.plot.bar(figsize=(11, 8), rot=0)
t = ax.title.set_text("aapl: autocorrelation")
Problem
Extract the most signification lag values from above, and understand whether day-of-the-week makes a
difference.
In [228]: aapl_df["weekday"] = aapl_df.date.apply(lambda d: d.weekday())
In [229]: aapl_df.weekday.value_counts(normalize=True).sort_index()
Out[229]: 0 0.187723
1 0.204534
2 0.205807
3 0.201732
4 0.200204
Name: weekday, dtype: float64
32. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 32/40
In [230]: aapl_df[aapl_df.weekday == 4].return_simple.head()
In [231]: aapl_df[aapl_df.weekday == 4].return_simple.autocorr(lag=1)
In [232]: for w in sorted(aapl_df.weekday.unique()):
print(w, aapl_df[aapl_df.weekday == w].return_simple.autocorr(lag=1
), sep=": ")
Correlations,
with
rolling
window
returns
In [234]: aapl_df.return_simple.to_frame(
).join(
aapl_df.return_simple_rolling_21_mean.shift(1)
).corr()
In [235]: aapl_df.return_simple.to_frame(
).join(
aapl_df.return_simple_rolling_63_mean.shift(1)
).corr()
5.
Log
Returns
Out[230]: date
1998-01-02 NaN
1998-01-09 0.010519
1998-01-16 -0.022928
1998-01-23 0.009838
1998-01-30 -0.009699
Name: return_simple, dtype: float64
Out[231]: 0.06890856162364511
0: 0.03307978683299217
1: 0.02142547271905113
2: 0.01132272930404902
3: -0.025362416692087088
4: 0.06890856162364511
Out[234]:
return_simple return_simple_rolling_21_mean
return_simple 1.000000 -0.004471
return_simple_rolling_21_mean -0.004471 1.000000
Out[235]:
return_simple return_simple_rolling_63_mean
return_simple 1.00000 0.01565
return_simple_rolling_63_mean 0.01565 1.00000
33. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 33/40
Taking the logarithm of prices allows for easy returns calculations. Additionally, taking the log of a data set can
squeeze in outliers and reduce skewness:
In [237]: aapl_df["close_log"] = aapl_df.close.apply(np.log)
In [238]: ax = aapl_df.close_log.plot(figsize=(11, 8))
t = ax.set_title("aapl: log closing price")
35. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 35/40
In [241]: ax = aapl_df.return_log.hist(figsize=(11, 8), bins=100)
t = ax.set_title("aapl: log daily return")
In [242]: aapl_df.return_simple.skew()
In [243]: aapl_df.return_log.skew()
For skewed data with negative values, take the cube
root of it: to separate out the positive and negative values,
and it's interesting to look at each group's frequencies separately.
In [244]: aapl_df["return_cubrt"] = aapl_df.return_simple.apply(np.cbrt)
Out[242]: -1.198448451367238
Out[243]: -3.4413935608928203
36. 8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 36/40
In [246]: ax = aapl_df.return_cubrt.hist(figsize=(11, 8), bins=500)
t = ax.set_title("aapl: cube root of daily return")
In [247]: aapl_df["day_of_week"] = aapl_df.date.apply(lambda d: d.weekday())
In [248]: aapl_df.day_of_week.value_counts(normalize=True).sort_index()
In [249]: aapl_df["is_monday"] = aapl_df.day_of_week == 0
Out[248]: 0 0.187723
1 0.204534
2 0.205807
3 0.201732
4 0.200204
Name: day_of_week, dtype: float64