The lure of creating models to predict the stock market has drawn talent from fields beyond finance and economics, reaching into disciplines such as physics, computational chemistry, applied mathematics, electrical engineering and, perhaps most recently, statistics and what we now refer to as data science. The attraction is clear: the stock market (and the economy and internet at large) throws off massive and ever-increasing reams of data, from garden-variety time series, to complex structured data sets like quarterly financials, to unstructured data sets like conference call transcripts, news articles and, of course, tweets! While all this data holds promise, it also holds traps and blind alleys that can be tricky to avoid. In this session we’ll review some of the common (but not easy!) pitfalls to avoid in creating models for predicting stock returns: overfitting and exploding model complexity, non-stationary processes, time-travel illusions, and underestimation of real-world costs.
Modeling the Stock Market: Common pitfalls and how to avoid them!
1. Modeling the Stock Market:
Common pitfalls… and how to avoid them!
Jess Stauth
Portfolio Management and Research
Jess@quantopian.com / @jstauth
2. Disclaimer
Quantopian provides this presentation to help people write trading
algorithms - it is not intended to provide investment advice. More
specifically, the material is provided for informational purposes only
and does not constitute an offer to sell, a solicitation to buy, or a
recommendation or endorsement for any security or strategy, nor does
it constitute an offer to provide investment advisory or other services
by Quantopian. In addition, the content neither constitutes investment
advice nor offers any opinion with respect to the suitability of any
security or any specific investment.
3. Motivation
Building a beautiful backtest is easy!
But…
Don’t expect anyone to pay you for it!
Building a model that predicts the future is HARD!
But…
Many people will fight to pay you a lot for doing that!
4. Ok, so it’s hard. I love hard work!
What’s the catch?
• It can be hard to know when you have what you want – aka “future predictor”!
• We “simulate” the future (usually using the past!) to validate our model
• But what if our simulation doesn’t match reality?
• Or our data was flawed?
• Or we just got lucky?
• Or…
[Diagram: Idea → Data → Research/Build model → Simulate → Trade → $$$]
5. Common pitfalls that turn into
1. Overfitting
2. Overtrading
3. Non-stationary processes / regime changes
4. Lookahead aka “time travel illusion”
5. Model complexity
7. 1. Overfitting
Real world example: The incredible shrinking portfolio
Example from a Quantopian author / model developer in diligence.
A robust ‘information rich’ signal should show stable or increasingly good performance (Sharpe ratio) as you increase the number of assets included.
Fundamental law of active management*: IR = IC * sqrt(N), where IR is the information ratio, IC is the information coefficient (the correlation between forecasts and realized returns) and N is the number of independent bets.
Finding that your signal is degraded by expanding the number of assets scored is a red flag that you may have identified an unstable, noisy, or spurious effect.
How to avoid: Take care not to ‘over optimize’ your model on a small number of data points (in our use case those are assets / stock tickers).
*Grinold and Kahn. Active Portfolio Management – pdf online
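As a rough illustration of the fundamental law quoted above, the sketch below evaluates IR = IC * sqrt(N) for a fixed, hypothetical information coefficient. The value 0.02 and the function name `information_ratio` are assumptions made for this example only, not figures from the talk. If the signal's skill really is stable, the expected IR should grow with the number of independent bets, which is why a signal that degrades as assets are added deserves suspicion.

```python
import math


def information_ratio(ic: float, n_bets: int) -> float:
    """Fundamental law of active management: IR = IC * sqrt(N)."""
    if n_bets < 0:
        raise ValueError("n_bets must be non-negative")
    return ic * math.sqrt(n_bets)


# Hypothetical skill level, chosen only for illustration.
ASSUMED_IC = 0.02

# With a constant IC, a broader universe should raise the expected IR.
for n_bets in (10, 100, 1000):
    ir = information_ratio(ASSUMED_IC, n_bets)
    print(f"N={n_bets:>4}  IR={ir:.3f}")
```

Note that the law assumes the N bets are independent; highly correlated assets add less breadth than their count suggests.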
8. This phenomenon of overly concentrated portfolios turned out to be prevalent in the submissions to Quantopian’s daily contest.
In a ‘tearsheet feedback’ thread and webinar we highlighted this pitfall.
We ran a second feedback session a few weeks later and…
9. 2. Overtrading – three real examples
[Figure: backtest results for Algo A, Algo B and Algo C under “low” vs. “high” cost assumptions]
Trading algorithms developed with the assumption of “low” (or no) cost of trading in the markets often show unrealistically good returns.
How to avoid: Use conservative cost estimates, and check how sensitive your stock market model is to the underlying assumption of what your costs will be. That can be the difference between profits and losses in the real world!
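One way to check cost sensitivity is to re-run the same return stream under a range of cost assumptions. The sketch below is a minimal version of that idea using a simple proportional cost model (cost per day = turnover × cost in basis points). The cost model, the function names, and all of the numbers are assumptions made for illustration. They are not Quantopian's slippage or commission model.

```python
def net_returns(gross_returns, turnover, cost_bps):
    """Subtract a proportional trading cost from each day's gross return.

    turnover: fraction of the portfolio traded each day (0.5 = half the book).
    cost_bps: assumed cost per dollar traded, in basis points.
    """
    cost_rate = cost_bps / 10_000.0
    return [r - t * cost_rate for r, t in zip(gross_returns, turnover)]


def total_return(returns):
    """Compound a list of periodic returns into one cumulative return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


# Hypothetical strategy: 10 bps of gross return per day, trading half the book daily.
gross = [0.001] * 252
turnover = [0.5] * 252

# Sweep the cost assumption and watch where the profit disappears.
for cost_bps in (0, 5, 10, 20, 40):
    result = total_return(net_returns(gross, turnover, cost_bps))
    print(f"cost={cost_bps:>2} bps  total return={result:+.2%}")
```

In this toy setup the strategy breaks even at 20 bps and loses money beyond that, so a model that only looks good below its break-even cost is fragile.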
10. 3. Regime Shift/Non-stationarity
• Many common time-series techniques assume data are stationary (constant mean and variance).
• Imagine doing all your research on data from 2016/17 and evaluating a model that makes money shorting volatility…
• How to avoid: Know that markets are always changing and make sure to backtest over long enough time ranges to see regime changes that might impact your model.
Vol Regimes – Quantopian Blog
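A quick, informal way to see whether a return series looks stationary is to compare its mean and standard deviation across consecutive windows. The sketch below does this on synthetic data with a deliberate volatility shift halfway through. The data, the window length of 250, and the helper name `window_stats` are all assumptions for illustration. This is a sanity check rather than a formal statistical test.

```python
import random
import statistics


def window_stats(series, window):
    """Mean and standard deviation over consecutive non-overlapping windows."""
    stats = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        stats.append((statistics.fmean(chunk), statistics.pstdev(chunk)))
    return stats


# Synthetic daily returns: a calm regime followed by a stressed one.
rng = random.Random(0)
calm = [rng.gauss(0.0, 0.005) for _ in range(500)]
stressed = [rng.gauss(0.0, 0.02) for _ in range(500)]
returns = calm + stressed

# A model researched only on the first two windows never sees the second regime.
for i, (mean, stdev) in enumerate(window_stats(returns, 250)):
    print(f"window {i}: mean={mean:+.4f}  stdev={stdev:.4f}")
```

If the per-window statistics differ sharply, as they do here, results from a backtest confined to one regime should not be expected to carry over to another.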
11. 4. Time travel illusion: What did you know and when did you know it?
• Classic date alignment fail examples:
  • Drop the timestamp from close prices and build a daily technical factor... You’ll prove that knowing the 4pm price at 9am would be super valuable!
  • Modeling earnings surprises and assuming your model knows actual reported earnings on quarter end dates, when IRL you don’t get them for 45+ DAYS after…
• How to avoid: Same principle as with modeling market impact: be conservative with your assumptions about data timeliness, and check your strategy’s robustness to lagged data over a range of lags.
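The lag check described above can be sketched as follows: measure how well a signal lines up with returns when the signal is delayed by 0, 1, 2, ... days. The example uses synthetic data in which the signal accidentally contains the same day's return, mimicking the dropped-timestamp mistake. The data and the helper names `pearson` and `lagged_correlation` are assumptions for illustration only.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(signal, returns, lag):
    """Correlate the signal as known `lag` days earlier with each day's return."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(signal, returns)
    return pearson(signal[:-lag], returns[lag:])


# Synthetic example: the signal for day t leaks day t's own return plus noise.
rng = random.Random(1)
returns = [rng.gauss(0.0, 0.01) for _ in range(1000)]
leaky_signal = [r + rng.gauss(0.0, 0.01) for r in returns]

# Predictive power that vanishes at lag 1 suggests lookahead, not skill.
for lag in range(4):
    corr = lagged_correlation(leaky_signal, returns, lag)
    print(f"lag={lag}  correlation={corr:+.3f}")
```

A genuine signal usually decays gradually as the lag grows. A cliff between lag 0 and lag 1 is a hint that the unlagged version was using information it could not have had at the time.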
Stationarity. A common assumption in many time series techniques is that the data are stationary. A stationary process has the property that the mean, variance and autocorrelation structure do not change over time.
“With four parameters I can fit an elephant, and with five I can make him wiggle his trunk” - John von Neumann