Time Series Analysis Sample Code

8/1/2019 Aiden Wu's Sample Code
localhost:8888/nbconvert/html/Desktop/PythonRelated/python_fundamentals-master/Aiden Wu's Sample Code.ipynb?download=false 1/40
Time
Series
Analysis
In [49]: import os
import datetime
import requests
import zipfile
import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.tsa.ar_model import AR
from sklearn.metrics import r2_score, mean_squared_error
%matplotlib inline
0.
Get
Data
In [50]: sorted(list(filter(lambda f: not f.startswith("."), os.listdir("."))))
In [51]: def get_data(data_url):
with requests.get(data_url) as r:
with zipfile.ZipFile(io.BytesIO(r.content)) as z:
z.extractall()
In [52]: data_url = "http://quantquote.com/files/quantquote_daily_sp500_83986.zi
p"
get_data(data_url=data_url)
data_dir = os.path.join("quantquote_daily_sp500_83986", "daily")
Out[50]: ['README.md',
'Untitled.ipynb',
'census_data.csv',
'data_science_answers.ipynb',
'data_science_raw.ipynb',
'pandas_basics.ipynb',
'pandas_basics_addntl.ipynb',
'pandas_basics_answers.ipynb',
'python_basics.ipynb',
'python_basics_addntl.ipynb',
'python_basics_answers.ipynb',
'quantquote_daily_sp500_83986',
'requirements.txt',
'time_series_analysis.ipynb',
'time_series_analysis_answers.ipynb']

Scope
out
data
directory
In [53]: stock_csv_names = sorted(os.listdir(data_dir))
In [54]: cols = ['date', 'time', 'open', 'high', 'low_price', 'close', 'volume']
Check
sample
data
In [55]: df = pd.read_csv(os.path.join(data_dir, stock_csv_names[0]),names=cols)
In [56]: df.shape
In [64]: df.head()
In [58]: df.dtypes
We see that most of the typing here looks good (no object / string representations of numeric data) but we do
see that date is coming in as an int rather than a datetime.
Out[56]: (3452, 7)
Out[64]:
date time open high low_price close volume
0 1999-11-18 0 42.2076 46.3820 37.4581 39.1928 4.398181e+07
1 1999-11-19 0 39.8329 39.8885 36.9293 37.6251 1.139020e+07
2 1999-11-22 0 38.3208 40.0091 37.1613 39.9442 4.654716e+06
3 1999-11-23 0 39.4247 40.4729 37.3375 37.5138 4.268903e+06
4 1999-11-24 0 37.2262 38.9052 37.1056 38.0889 3.602367e+06
Out[58]: date int64
time int64
open float64
high float64
low_price float64
close float64
volume float64
dtype: object

In [59]: df.isnull().sum()
In [60]: df.duplicated().sum()
No missing data, and no duplicate data.
In [61]: df.date = pd.to_datetime(df.date.astype(str), infer_datetime_format=True
)
check that this is a proper time-series data set, i.e. we're indexed on time, which in this case will mean that we
have one date for every row:
In [67]: df.date.nunique()/len(df)
In [31]: df=df.set_index('date')
In [68]: df.time.nunique()
In [69]: df=df.drop('time', axis=1)
Problem
Get an iterable of DataFrames , one for each stock in our dataset, with the wrangling we did above included.
In [82]: def get_csv_path(csv_name, stock_csv_folder=data_dir):
return os.path.join(stock_csv_folder, csv_name)
Out[59]: date 0
time 0
open 0
high 0
low_price 0
close 0
volume 0
dtype: int64
Out[60]: 0
Out[67]: 1.0
Out[68]: 1

In [83]: def get_df(csv_name, cols=cols):
df = pd.read_csv(get_csv_path(csv_name),
names=cols,
usecols=list(filter(lambda c: c!= "time", cols)))
df.date = pd.to_datetime(df.date.astype(str), infer_datetime_format=
True)
return df.set_index("date", drop=False)
In [84]: dfs_iter = (get_df(csv_name) for csv_name in stock_csv_names)
In [85]: dfs_list = list(dfs_iter) #convert generator object to list
In [89]: len(dfs_list)
1.
Prices
&
Returns
Prices
In [90]: aapl_df = get_df("table_aapl.csv")
In [93]: pd.Series(aapl_df.index).quantile([0, 1])
In [94]: aapl_df.isnull().sum()
In [95]: aapl_df.duplicated().sum()
Out[89]: 500
Out[93]: 0.0 1998-01-02
1.0 2013-08-09
Name: date, dtype: datetime64[ns]
Out[94]: date 0
open 0
high 0
low_price 0
close 0
volume 0
dtype: int64
Out[95]: 0

In [96]: ax = aapl_df.close.plot(figsize=(11, 8))
t = ax.set_title("aapl: closing price")
A couple of observations on the above graph:
the stock's price has increased over time
there is a good bit of variability in between the start and end points
it would have been nice to buy AAPL back in the 90s!
We can see clearly that there are a number of diﬀerent components, if you will, to the above time series:
there's an upward trend over time
there look to be some periodic-ish patterns
there's a fair amount of noise-ish stuﬀ, too

In [97]: def plot_time_series_decomposition(series, freq=252):
res = sm.tsa.seasonal_decompose(series, freq=freq)
fig, (ax1,ax2,ax3) = plt.subplots(3,1, figsize=(11,8))
p1 = res.trend.plot(ax=ax1, rot=0)
t1 = ax1.title.set_text("trend")
p2 = res.seasonal.plot(ax=ax2, rot=0)
t2 = ax2.title.set_text("seasonal")
p3 = res.resid.plot(ax=ax3, rot=0)
t3 = ax3.title.set_text("resid")
fig.tight_layout()
In [98]: plot_time_series_decomposition(aapl_df.close)
Returns
In [101]: aapl_df["return_gross"] = aapl_df.close.divide(aapl_df.close.shift(1))
In [104]: aapl_df["return_net"] = aapl_df.return_gross - 1

In [105]: aapl_df.return_net.describe()
Returns Distirbution
In [112]: ax = aapl_df.return_net.hist(figsize=(11, 8), bins=200)
t = ax.set_title("aapl: simple daily return")
In [107]: (aapl_df.return_simple < -.1).sum()
Out[105]: count 3925.000000
mean 0.001672
std 0.029774
min -0.518150
25% -0.013670
50% 0.000859
75% 0.016312
max 0.183749
Name: return_net, dtype: float64
Out[107]: 10

In [108]: aapl_df[aapl_df.return_simple < -.1]
Out[108]:
date open high low_price close volume return_gross return_sim
date
1999-
01-14
1999-
01-14
11.06280 11.18440 9.98318 10.07560 5.813583e+07 0.892333 -0.1076
1999-
09-21
1999-
09-21
17.79520 17.80980 16.77650 16.81050 1.138997e+08 0.875187 -0.1248
2000-
09-29
2000-
09-29
13.64490 14.10200 12.34160 12.52160 2.339738e+08 0.481850 -0.5181
2000-
12-06
2000-
12-06
7.11421 7.29413 6.80785 6.99264 4.577349e+07 0.845883 -0.1541
2001-
07-18
2001-
07-18
10.62510 10.69810 9.92974 10.10480 3.854362e+07 0.829214 -0.1707
2002-
06-19
2002-
06-19
8.44660 8.55844 8.20832 8.33476 5.893970e+07 0.850621 -0.1493
2002-
07-17
2002-
07-17
7.84362 7.87766 7.38652 7.62479 4.142656e+07 0.878923 -0.1210
2008-
01-23
2008-
01-23
132.46100 136.15700 122.67700 135.03800 1.181346e+08 0.880185 -0.1198
2008-
09-29
2008-
09-29
116.41400 116.41400 97.82880 103.12900 9.250666e+07 0.823352 -0.1766
2013-
01-24
2013-
01-24
451.33600 456.96900 441.77900 442.29000 4.939171e+07 0.876643 -0.1233

In [113]: ax = aapl_df.return_net.plot(figsize=(11, 8))
t = ax.set_title("aapl: net daily return")

In [114]: plot_time_series_decomposition(aapl_df.return_simple.dropna())
In [115]: aapl_df["return_simple_rolling_21_mean"] = aapl_df.return_simple.rolling
(21).mean()

In [122]: ax = aapl_df.return_simple_rolling_21_mean.plot(figsize=(11, 8))
t = ax.set_title("aapl: net daily return, rolling 21 mean(one-month)")
retain this functionality to try other windows
In [127]: def set_and_plot_rolling_mean(window):
col_name = f"return_simple_rolling_{window}_mean"
aapl_df[col_name] = aapl_df.return_simple.rolling(window).mean()
ax = aapl_df[col_name].plot(figsize=(11, 8))
t = ax.set_title(f"aapl: net daily return, rolling {window} day mea
n")

In [121]: set_and_plot_rolling_mean(63)

In [126]: def set_and_plot_rolling_std(window):
col_name = f"return_simple_rolling_{window}_std"
aapl_df[col_name] = aapl_df.return_simple.rolling(window).std()
ax = aapl_df[col_name].plot(figsize=(11, 8))
t = ax.set_title(f"aapl: net daily return, rolling {window} day std"
)

In [128]: set_and_plot_rolling_std(63)
Plot rolling mean and std together
In [129]: [c for c in aapl_df.columns if "63" in c]
In [132]: aapl_quarterly = aapl_df[[c for c in aapl_df.columns if "63" in c]].drop
na()
In [134]: aapl_quarterly = aapl_quarterly / aapl_quarterly.iloc[0]
In [135]: aapl_quarterly.iloc[0:5]
Out[129]: ['return_simple_rolling_63_mean', 'return_simple_rolling_63_std']
Out[135]:
return_simple_rolling_63_mean return_simple_rolling_63_std
date
1998-04-03 1.000000 1.000000
1998-04-06 0.973895 1.004831
1998-04-07 0.601837 0.829599
1998-04-08 0.699827 0.788331
1998-04-09 0.681255 0.784945

In [136]: ax = aapl_quarterly.plot(figsize=(11, 8))
t = ax.set_title("aapl: simple daily return, rolling 63 mean and std")
inverse relationship between the two
In [138]: aapl_quarterly.corr()
In [139]: aapl_quarterly.iloc[0:5]
Out[138]:
return_simple_rolling_63_mean 1.000000 -0.370614
return_simple_rolling_63_std -0.370614 1.000000
Out[139]:
date
1998-04-03 1.000000 1.000000
1998-04-06 0.973895 1.004831
1998-04-07 0.601837 0.829599
1998-04-08 0.699827 0.788331
1998-04-09 0.681255 0.784945

In [141]: plot_time_series_decomposition(aapl_df.return_simple_rolling_63_std.drop
na())
Compare rolling returns and volatility with diﬀering window lengths and quantify these relationships using
correlation.
In [142]: def compare_return_to_vol(return_col, vol_col, df=aapl_df):
return df[[return_col, vol_col]].corr()
In [144]: compare_return_to_vol("return_simple_rolling_63_mean", "return_simple_ro
lling_63_std")
Out[144]:

In [145]: aapl_df.columns
In [148]: compare_return_to_vol("return_simple_rolling_166_mean", "return_simple_r
olling_63_std")
In [149]: set_and_plot_rolling_std(166)
Out[145]: Index(['date', 'open', 'high', 'low_price', 'close', 'volume', 'return_
gross',
'return_simple', 'return_net', 'return_simple_rolling_21_mean',
'return_simple_rolling_63_mean', 'return_simple_rolling_166_mea
n',
'return_simple_rolling_252_mean', 'return_simple_rolling_63_st
d'],
dtype='object')
Out[148]:

In [151]: compare_return_to_vol("return_simple_rolling_166_mean", "return_simple_r
olling_166_std")
In [152]: aapl_halflyish = aapl_df[[c for c in aapl_df.columns if "166" in c]
].dropna()
In [153]: aapl_halflyish = aapl_halflyish / aapl_halflyish.iloc[0]
In [154]: ax = aapl_halflyish.plot(figsize=(11, 8))
t = ax.set_title("aapl: simple daily return, rolling 166 mean and std")
2.
Multi-Stock
Analysis
In [155]: len(dfs_list)
Out[151]:
Out[155]: 500

In [157]: os.listdir(data_dir)[:5]
In [158]: os.listdir(data_dir)[-5:]
In [159]: def get_stock_name(csv_name):
file_name = csv_name.split(".")[0]
return file_name.split("_")[1]
In [160]: dfs_list_indexed = [
pd.concat({get_stock_name(csv_name): dfs_list[i]}, axis=1)
for i, csv_name in enumerate(stock_csv_names)]
In [161]: all_stocks = dfs_list_indexed[0].join(dfs_list_indexed[1:])
In [162]: all_stocks.iloc[:5, :8]
Out[157]: ['table_dlph.csv',
'table_cat.csv',
'table_coh.csv',
'table_mcd.csv',
'table_ca.csv']
Out[158]: ['table_schw.csv',
'table_cl.csv',
'table_te.csv',
'table_vz.csv',
'table_hrs.csv']
Out[162]:
a aa
date open high low_price close volume date open
date
1999-11-
18
1999-11-
18
42.2076 46.3820 37.4581 39.1928 4.398181e+07
1999-11-
18
24.5183
1999-11-
19
1999-11-
19
39.8329 39.8885 36.9293 37.6251 1.139020e+07
1999-11-
19
24.5647
1999-11-
22
1999-11-
22
38.3208 40.0091 37.1613 39.9442 4.654716e+06
1999-11-
22
24.6846
1999-11-
23
1999-11-
23
39.4247 40.4729 37.3375 37.5138 4.268903e+06
1999-11-
23
25.0018
1999-11-
24
1999-11-
24
37.2262 38.9052 37.1056 38.0889 3.602367e+06
1999-11-
24
25.0482

In [164]: all_stocks["aa"].head()
Close
In [165]: tickers = all_stocks.columns.levels[0]
tickers[:5]
In [166]: _all_close_list = [all_stocks[tick].close for tick in tickers]
In [170]: _all_close_list[0].head()
In [171]: _all_close_list = [srs.rename(tickers[i])
for i, srs in enumerate(_all_close_list)]
In [176]: all_stocks_close = _all_close_list[0].to_frame().join(_all_close_list[1
:])
Out[164]:
date open high low_price close volume
date
1999-11-18 1999-11-18 24.5183 24.5183 24.0579 24.2514 2658683.414
1999-11-19 1999-11-19 24.5647 24.7581 24.2978 24.6846 3022133.556
1999-11-22 1999-11-22 24.6846 25.1450 24.4912 25.0250 4525318.956
1999-11-23 1999-11-23 25.0018 25.4351 24.9515 25.2185 5622139.724
1999-11-24 1999-11-24 25.0482 25.0482 24.6614 24.9283 3144923.734
Out[165]: Index(['a', 'aa', 'aapl', 'abbv', 'abc'], dtype='object')
Out[170]: date
1999-11-18 39.1928
1999-11-19 37.6251
1999-11-22 39.9442
1999-11-23 37.5138
1999-11-24 38.0889
Name: close, dtype: float64

In [177]: all_stocks_close.iloc[:5, :8]
In [178]: all_stocks_returns = all_stocks_close / all_stocks_close.shift(1)
Correlation
In [179]: corrs = all_stocks_returns.corr()
In [180]: corrs.iloc[:5, :5]
In [181]: corrs_modified = corrs.replace(1, np.nan)
In [182]: corrs_modified.iloc[:5, :5]
Out[177]:
a aa aapl abbv abc abt ace acn
date
1999-11-18 39.1928 24.2514 21.7924 NaN 3.09830 11.9023 14.3008 NaN
1999-11-19 37.6251 24.6846 22.4756 NaN 3.05589 11.8612 13.9294 NaN
1999-11-22 39.9442 25.0250 22.0185 NaN 2.95767 12.2375 14.0259 NaN
1999-11-23 37.5138 25.2185 22.6264 NaN 2.74784 12.0383 13.9294 NaN
1999-11-24 38.0889 24.9283 23.0373 NaN 2.74784 12.1964 13.3722 NaN
Out[180]:
a aa aapl abbv abc
a 1.000000 0.383146 0.361200 0.164908 0.169508
aa 0.383146 1.000000 0.283109 0.067920 0.254100
aapl 0.361200 0.283109 1.000000 0.032483 0.142794
abbv 0.164908 0.067920 0.032483 1.000000 0.282846
abc 0.169508 0.254100 0.142794 0.282846 1.000000
Out[182]:
a aa aapl abbv abc
a NaN 0.383146 0.361200 0.164908 0.169508
aa 0.383146 NaN 0.283109 0.067920 0.254100
aapl 0.361200 0.283109 NaN 0.032483 0.142794
abbv 0.164908 0.067920 0.032483 NaN 0.282846
abc 0.169508 0.254100 0.142794 0.282846 NaN

In [183]: corrs_modified.max().sort_values(ascending=False)[:5]
In [184]: corrs_modified.min().sort_values(ascending=False)[:5]
Problem
Create a price
index that provides insight into the daily tendencies of our 500 stocks. We are going to do this as
follows: calculate, for each day, the
mean
spread
between
open
and
closing
prices
across
all
500
stocks.
Next, vizualize your results. Plot on the same axes a the volatility
of
the
open
and
closing
price
spreads. Note
that you may want to normalize prior to plotting.
Additionally, ﬁnd:
the 10 days with the highest average open-close spreads
the 10 days with the highest volatility in their open-close spreads
In [185]: _all_open_list = [all_stocks[tick].open for tick in tickers]
In [186]: _all_open_list[0].head()
In [187]: _all_open_list = [srs.rename(tickers[i])
for i, srs in enumerate(_all_open_list)]
In [188]: all_stocks_open = _all_open_list[0].to_frame().join(_all_open_list[1:])
Out[183]: eqr 0.884192
avb 0.884192
spg 0.883757
bxp 0.883757
vno 0.883234
dtype: float64
Out[184]: wyn 0.178355
pfg 0.175845
dfs 0.175066
dd 0.168540
hst 0.159443
dtype: float64
Out[186]: date
1999-11-18 42.2076
1999-11-19 39.8329
1999-11-22 38.3208
1999-11-23 39.4247
1999-11-24 37.2262
Name: open, dtype: float64

In [189]: all_stocks_open.iloc[:5, :8]
In [190]: mean_spread = (all_stocks_close - all_stocks_open).mean(axis=1)
In [191]: mean_spread.head()
In [193]: std_spread = (all_stocks_close - all_stocks_open).std(axis=1)
In [194]: std_spread.head()
In [196]: joined_spread = mean_spread.to_frame().join(std_spread)
In [197]: joined_spread = joined_spread / joined_spread.iloc[0]
Out[189]:
a aa aapl abbv abc abt ace acn
date
1999-11-18 42.2076 24.5183 22.1401 NaN 2.88847 12.1775 14.8134 NaN
1999-11-19 39.8329 24.5647 21.7608 NaN 3.08267 12.0003 14.3973 NaN
1999-11-22 38.3208 24.6846 22.3079 NaN 3.01348 12.0193 14.0259 NaN
1999-11-23 39.4247 25.0018 22.3079 NaN 2.95767 12.1585 14.2116 NaN
1999-11-24 37.2262 25.0482 22.6118 NaN 2.73445 12.0794 13.8848 NaN
Out[191]: date
1999-11-18 0.029497
1999-11-19 -0.136570
1999-11-22 -0.136851
1999-11-23 -0.344567
1999-11-24 0.187926
dtype: float64
Out[194]: date
1999-11-18 1.447925
1999-11-19 2.082665
1999-11-22 1.295058
1999-11-23 2.393124
1999-11-24 1.648351
dtype: float64

In [198]: ax = joined_spread.plot(figsize=(11, 8))
3.
Returns,
by
Month
Check if AAPL post better daily performance in certain months
Average
daily
returns,
by
month
In [200]: aapl_df["month"] = aapl_df.date.apply(lambda d: d.month)
In [201]: daily_by_month = aapl_df[["month", "return_simple"]
].groupby("month"
).agg([np.mean, np.std])

In [202]: daily_by_month
In [203]: daily_by_month.columns = daily_by_month.columns.droplevel()
Out[202]:
return_simple
mean std
month
1 0.002190 0.034613
2 0.001037 0.025793
3 0.003630 0.027720
4 0.002755 0.030354
5 -0.000095 0.025757
6 0.000647 0.024897
7 0.003089 0.030554
8 0.001921 0.023400
9 -0.000956 0.042424
10 0.003242 0.033829
11 0.001369 0.028624
12 0.000834 0.025551

In [204]: ax = daily_by_month.plot.barh(figsize=(11, 11))
t = ax.set_title("aapl: average simple daily return, by month")
So, we see that, on average, AAPL has seen negative daily returns (with higher-than-average volatility) in
September - earnings?
Average
monthly
returns,
by
month
In [206]: aapl_df["year"] = aapl_df.date.apply(lambda d: d.year)
In [207]: monthly_by_month = aapl_df[["year", "month", "return_simple"]
].groupby(["year", "month"]
).agg([np.amin, np.amax])
In [209]: monthly_by_month = monthly_by_month.diff(axis=1)

In [211]: monthly_by_month.columns = monthly_by_month.columns.swaplevel()
In [212]: monthly_by_month = monthly_by_month["amax"]
In [213]: monthly_by_month.head()
In [218]: monthlies = monthly_by_month.groupby(level=1
).agg([np.mean, np.std])
Out[213]:
return_simple
year month
1998 1 0.262945
2 0.106502
3 0.151977
4 0.089050
5 0.076243

In [219]: ax = monthlies.plot.barh(figsize=(11, 11))
t = ax.set_title("aapl: average simple monthly return, by month")
4.
Autocorrelation

In [220]: ax = aapl_df.return_simple.to_frame(
).join(
aapl_df.return_simple.shift(1).rename("shifted")
).plot.scatter(x="shifted", y="return_simple", figsize=(11,
8))
scrub the outlier and try again
In [221]: t_and_t_plus_one = aapl_df.return_simple.to_frame(
).join(
aapl_df.return_simple.shift(1).rename("shif
ted")
)
In [222]: (t_and_t_plus_one < -.5).sum()
In [223]: for col in t_and_t_plus_one.columns:
t_and_t_plus_one.loc[t_and_t_plus_one[col] < -.5, col] = np.nan
Out[222]: return_simple 1
shifted 1
dtype: int64

In [224]: ax = t_and_t_plus_one.plot.scatter(x="shifted", y="return_simple", figsi
ze=(11, 8))
In [226]: aapl_autocorrs = pd.Series({
lag: aapl_df.return_simple.autocorr(lag=lag) for lag in list
(range(1, 20))
})

In [227]: ax = aapl_autocorrs.plot.bar(figsize=(11, 8), rot=0)
t = ax.title.set_text("aapl: autocorrelation")
Problem
Extract the most signiﬁcation lag values from above, and understand whether day-of-the-week makes a
diﬀerence.
In [228]: aapl_df["weekday"] = aapl_df.date.apply(lambda d: d.weekday())
In [229]: aapl_df.weekday.value_counts(normalize=True).sort_index()
Out[229]: 0 0.187723
1 0.204534
2 0.205807
3 0.201732
4 0.200204
Name: weekday, dtype: float64

In [230]: aapl_df[aapl_df.weekday == 4].return_simple.head()
In [231]: aapl_df[aapl_df.weekday == 4].return_simple.autocorr(lag=1)
In [232]: for w in sorted(aapl_df.weekday.unique()):
print(w, aapl_df[aapl_df.weekday == w].return_simple.autocorr(lag=1
), sep=": ")
Correlations,
with
rolling
window
returns
In [234]: aapl_df.return_simple.to_frame(
).join(
aapl_df.return_simple_rolling_21_mean.shift(1)
).corr()
In [235]: aapl_df.return_simple.to_frame(
).join(
aapl_df.return_simple_rolling_63_mean.shift(1)
).corr()
5.
Log
Returns
Out[230]: date
1998-01-02 NaN
1998-01-09 0.010519
1998-01-16 -0.022928
1998-01-23 0.009838
1998-01-30 -0.009699
Name: return_simple, dtype: float64
Out[231]: 0.06890856162364511
0: 0.03307978683299217
1: 0.02142547271905113
2: 0.01132272930404902
3: -0.025362416692087088
4: 0.06890856162364511
Out[234]:
return_simple return_simple_rolling_21_mean
return_simple 1.000000 -0.004471
return_simple_rolling_21_mean -0.004471 1.000000
Out[235]:
return_simple return_simple_rolling_63_mean
return_simple 1.00000 0.01565
return_simple_rolling_63_mean 0.01565 1.00000

Taking the logarithm of prices allows for easy returns calculations. Additionally, taking the log of a data set can
squeeze in outliers and reduce skewness:
In [237]: aapl_df["close_log"] = aapl_df.close.apply(np.log)
In [238]: ax = aapl_df.close_log.plot(figsize=(11, 8))
t = ax.set_title("aapl: log closing price")

In [239]: plot_time_series_decomposition(aapl_df.close_log)
In [240]: aapl_df["return_log"] = aapl_df.close_log - aapl_df.close_log.shift(1)

In [241]: ax = aapl_df.return_log.hist(figsize=(11, 8), bins=100)
t = ax.set_title("aapl: log daily return")
In [242]: aapl_df.return_simple.skew()
In [243]: aapl_df.return_log.skew()
For skewed data with negative values, take the cube
root of it: to separate out the positive and negative values,
and it's interesting to look at each group's frequencies separately.
In [244]: aapl_df["return_cubrt"] = aapl_df.return_simple.apply(np.cbrt)
Out[242]: -1.198448451367238
Out[243]: -3.4413935608928203

In [246]: ax = aapl_df.return_cubrt.hist(figsize=(11, 8), bins=500)
t = ax.set_title("aapl: cube root of daily return")
In [247]: aapl_df["day_of_week"] = aapl_df.date.apply(lambda d: d.weekday())
In [248]: aapl_df.day_of_week.value_counts(normalize=True).sort_index()
In [249]: aapl_df["is_monday"] = aapl_df.day_of_week == 0
Out[248]: 0 0.187723
1 0.204534
2 0.205807
3 0.201732
4 0.200204
Name: day_of_week, dtype: float64

In [250]: ax = aapl_df.loc[aapl_df.is_monday == True, "return_cubrt"].hist(
figsize=(11, 8), bins=100, density=True, alpha=.35)
ax = aapl_df.loc[aapl_df.is_monday == False, "return_cubrt"].hist(ax=ax,
alpha=.35, bins=100, density=True)
t = ax.set_title("aapl: cube root of daily return")
In [251]: aapl_df[["return_simple", "is_monday"]].groupby("is_monday").agg([np.mea
n, np.std])
6.
Autoregression
In [252]: aapl_returns_2010 = aapl_df.loc[aapl_df.date.apply(lambda d: d.year == 2
010), "return_simple"]
Out[251]:
return_simple
mean std
is_monday
False 0.001286 0.029905
True 0.003341 0.029161

In [253]: train = aapl_returns_2010.loc[:datetime.date(2010, 6, 30)]
In [254]: train.head()
In [255]: train.tail()
In [256]: test = aapl_returns_2010.loc[datetime.date(2010, 7, 1):datetime.date(201
0, 7, 31)]
In [257]: model = AR(train.values)
In [258]: model_fit = model.fit()
In [259]: model_fit.k_ar
In [260]: model_fit.params
In [261]: params_df = pd.DataFrame(model_fit.params,
index=[i for i in range(len(model_fit.params
))],
columns=["model_params"])
In [262]: autocorrs = aapl_autocorrs.rename("autocorr").loc[:13].to_frame()
Out[254]: date
2010-01-04 0.016518
2010-01-05 0.000701
2010-01-06 -0.016521
2010-01-07 -0.001990
2010-01-08 0.007175
Out[255]: date
2010-06-24 -0.006807
2010-06-25 -0.005433
2010-06-28 0.005089
2010-06-29 -0.044515
2010-06-30 -0.020156
Out[259]: 13
Out[260]: array([ 0.00246467, 0.04125616, 0.03683962, -0.25484031, 0.04155076,
0.09983197, -0.17533004, -0.02235045, -0.07667173, -0.08510721,
-0.01421416, -0.03022782, 0.04335507, 0.29210439])

In [263]: ax = autocorrs.join(params_df).plot.bar(figsize=(11, 8))
In [264]: predictions = model_fit.predict(start=len(train),
end=len(train) + len(test) -1,
dynamic=False)
In [265]: predictions_df = pd.DataFrame(predictions, index=test.index, columns=["p
redicted"])

In [266]: ax = test.to_frame().join(predictions_df).plot(figsize=(11, 8), rot=10)
In [ ]:
In [ ]:

Time Series Analysis Sample Code

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Time Series Analysis Sample Code

Similar to Time Series Analysis Sample Code (20)

Recently uploaded

Recently uploaded (20)

Time Series Analysis Sample Code