Huge number and variety of open source libraries. </li></ul>
Some Styles of Trading <ul><li>HFT (High-Frequency Trading) </li></ul>Market-making or scalping (opinions vary) in sub-second windows. <ul><li>Day Trading </li></ul>Buying & selling a stock within the same trading day, attempting to profit from same-day price moves. <ul><li>Position Trading </li></ul>Attempting to profit from trends over a window of a few days to a few years. <ul><li>Swing Trading </li></ul>Attempting to profit from volatility over a window of a few days to a few weeks.
How do humans get an edge or time the market? <ul><li>Dumb luck.
OK, how can I automate this? <ul><li>Convert market data to signals.
The software interprets the signals & decides how to trade. </li><ul><li>Refinement toward implementation: map a discrete set of signal responses to a discrete set of trading decisions. </li></ul><li>Simulate trades based on these decisions.
The goal, of course, is to optimize for profitable trades. </li></ul>
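To make the signal-to-decision idea concrete, here is a minimal sketch in plain Java (Watchmaker is a Java framework, but nothing here depends on it). The moving-average crossover used as the "signal", the three-valued Signal and Decision enums, and the lookup table are hypothetical illustrations, not the signals used in this project.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical pipeline: prices -> discrete signal -> discrete trading decision. */
final class SignalPipeline {

    enum Signal { BULLISH, NEUTRAL, BEARISH }
    enum Decision { BUY, HOLD, SELL }

    /** Simple moving average of the `window` closes ending at index `end` (inclusive). */
    static double sma(double[] closes, int end, int window) {
        double sum = 0.0;
        for (int i = end - window + 1; i <= end; i++) {
            sum += closes[i];
        }
        return sum / window;
    }

    /**
     * Discretize market data on day `day` by comparing a fast and a slow moving average.
     * Requires day >= longWindow - 1 and positive prices.
     */
    static Signal signalAt(double[] closes, int day, int shortWindow, int longWindow, double band) {
        double fast = sma(closes, day, shortWindow);
        double slow = sma(closes, day, longWindow);
        double gap = (fast - slow) / slow;   // relative distance between the two averages
        if (gap > band) return Signal.BULLISH;
        if (gap < -band) return Signal.BEARISH;
        return Signal.NEUTRAL;               // inside the dead band: no clear signal
    }

    /** Map each discrete signal response to a discrete trading decision via a lookup table. */
    static Decision decide(Signal signal, Map<Signal, Decision> table) {
        return table.getOrDefault(signal, Decision.HOLD);
    }

    /** An example table; an optimizer could choose these entries instead. */
    static Map<Signal, Decision> defaultTable() {
        Map<Signal, Decision> table = new EnumMap<>(Signal.class);
        table.put(Signal.BULLISH, Decision.BUY);
        table.put(Signal.NEUTRAL, Decision.HOLD);
        table.put(Signal.BEARISH, Decision.SELL);
        return table;
    }
}
```

Because both sides are small enums, a strategy reduces to a table lookup plus a few numeric parameters, which is what makes it searchable by an optimizer.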
Leads us to: Optimization Algos and/or Machine Learning! <ul><li>Artificial Neural Networks
The genome, an array of doubles, serves as the settings (the 'randomness') for each candidate. </li></ul>
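A minimal sketch of how a genome of doubles could serve as a candidate's settings, assuming each gene lives in [0, 1] and is decoded into a concrete parameter. The gene layout (SHORT_WINDOW_GENE and friends) and the parameter ranges are hypothetical; the project's real genome is larger and its layout isn't described here.

```java
import java.util.Random;

/**
 * Hypothetical double[] genome: each gene is a value in [0, 1] that is decoded
 * into a concrete setting. Gene positions and ranges are illustrative only.
 */
final class Genome {
    static final int SHORT_WINDOW_GENE = 0; // hypothetical: fast moving-average length
    static final int LONG_WINDOW_GENE = 1;  // hypothetical: slow moving-average length
    static final int BAND_GENE = 2;         // hypothetical: dead band around zero
    static final int LENGTH = 3;

    /** A random candidate: every gene drawn uniformly from [0, 1). */
    static double[] randomGenome(Random rng) {
        double[] genes = new double[LENGTH];
        for (int i = 0; i < LENGTH; i++) {
            genes[i] = rng.nextDouble();
        }
        return genes;
    }

    /** Decode a gene into an integer in [min, max], inclusive. */
    static int decodeInt(double gene, int min, int max) {
        double g = Math.max(0.0, Math.min(1.0, gene));           // tolerate out-of-range genes
        int value = min + (int) Math.floor(g * (max - min + 1));
        return Math.min(value, max);                               // g == 1.0 would overshoot by one
    }

    /** Decode a gene into a real value in [min, max]. */
    static double decodeDouble(double gene, double min, double max) {
        double g = Math.max(0.0, Math.min(1.0, gene));
        return min + g * (max - min);
    }

    /** Decode a gene into one of a small set of discrete choices. */
    static <T> T decodeChoice(double gene, T[] choices) {
        return choices[decodeInt(gene, 0, choices.length - 1)];
    }

    // Example decoders for the hypothetical layout above.
    static int shortWindow(double[] genes) { return decodeInt(genes[SHORT_WINDOW_GENE], 2, 20); }
    static int longWindow(double[] genes)  { return decodeInt(genes[LONG_WINDOW_GENE], 21, 200); }
    static double band(double[] genes)     { return decodeDouble(genes[BAND_GENE], 0.0, 0.05); }
}
```

Keeping every gene in the same [0, 1] range means crossover and mutation can treat all genes alike, while the decoders decide what each position means.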
Evolving the Candidates <ul><li>The trades generated during backtesting will be used to score the individual candidate.
The GA will select the fittest candidates for mating, based on their scores.
The GA will recombine (crossover) and mutate (spontaneously change values) mated genomes. </li></ul>
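One generation of the evolve loop, sketched in plain Java rather than with Watchmaker's own operators. The specific operators are assumptions chosen for illustration: roulette-wheel (fitness-proportionate) selection, single-point crossover, and per-gene Gaussian mutation clamped to [0, 1]. The score function stands in for the backtest and is assumed to return non-negative values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

/** Hypothetical single GA generation over double[] genomes (genome length must be >= 2). */
final class Generation {

    static List<double[]> next(List<double[]> population,
                               ToDoubleFunction<double[]> score,   // e.g. a backtest; must be >= 0
                               double mutationRate,
                               double mutationStdDev,
                               Random rng) {
        int n = population.size();
        double[] fitness = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {                 // score every candidate once
            fitness[i] = score.applyAsDouble(population.get(i));
            total += fitness[i];
        }
        List<double[]> offspring = new ArrayList<>(n);
        while (offspring.size() < n) {
            double[] mum = select(population, fitness, total, rng);
            double[] dad = select(population, fitness, total, rng);
            for (double[] child : crossover(mum, dad, rng)) {
                if (offspring.size() < n) {
                    mutate(child, mutationRate, mutationStdDev, rng);
                    offspring.add(child);
                }
            }
        }
        return offspring;
    }

    /** Roulette wheel: the chance of being picked is proportional to fitness. */
    static double[] select(List<double[]> pop, double[] fitness, double total, Random rng) {
        if (total <= 0.0) {
            return pop.get(rng.nextInt(pop.size()));  // all scores zero: pick uniformly
        }
        double spin = rng.nextDouble() * total;
        for (int i = 0; i < pop.size(); i++) {
            spin -= fitness[i];
            if (spin < 0.0) return pop.get(i);
        }
        return pop.get(pop.size() - 1);               // guard against rounding error
    }

    /** Single-point crossover; returns two new arrays and leaves the parents untouched. */
    static double[][] crossover(double[] a, double[] b, Random rng) {
        int point = 1 + rng.nextInt(a.length - 1);    // cut somewhere in 1..length-1
        double[] c1 = new double[a.length];
        double[] c2 = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            c1[i] = i < point ? a[i] : b[i];
            c2[i] = i < point ? b[i] : a[i];
        }
        return new double[][] {c1, c2};
    }

    /** Each gene mutates with probability `rate` by Gaussian noise, clamped to [0, 1]. */
    static void mutate(double[] genes, double rate, double stdDev, Random rng) {
        for (int i = 0; i < genes.length; i++) {
            if (rng.nextDouble() < rate) {
                double g = genes[i] + rng.nextGaussian() * stdDev;
                genes[i] = Math.max(0.0, Math.min(1.0, g));
            }
        }
    }
}
```

Roulette-wheel selection only makes sense with non-negative scores, which is one reason a GA framework may insist on them.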
More Details <ul><li>I have simplified some things for this presentation.
I pre-screen the stocks for swing-trading suitability, since I don't want to wait while it runs through the whole Nasdaq.
I backtest & evolve on a subset of available data.
Then I try (backtest w/o evolution) the top evolved candidates against recent out-of-sample data. </li></ul>
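Splitting the data so the evolved candidates are judged on days they never saw can be sketched like this, assuming a chronologically ordered array of daily closes whose last element is today. The class and method names are hypothetical. With outOfSampleDays = 45 the split matches the demo setup (in-sample through Today - 45 trading days, out-of-sample from Today - 44 through Today).

```java
import java.util.Arrays;

/** Hypothetical chronological split into in-sample and out-of-sample windows. */
final class SampleSplit {

    private static void check(double[] closes, int outOfSampleDays) {
        if (outOfSampleDays <= 0 || outOfSampleDays >= closes.length) {
            throw new IllegalArgumentException("need 0 < outOfSampleDays < closes.length");
        }
    }

    /** Prices used for backtesting + evolution: everything except the last `outOfSampleDays`. */
    static double[] inSample(double[] closes, int outOfSampleDays) {
        check(closes, outOfSampleDays);
        return Arrays.copyOfRange(closes, 0, closes.length - outOfSampleDays);
    }

    /** The most recent `outOfSampleDays` prices, never seen during evolution. */
    static double[] outOfSample(double[] closes, int outOfSampleDays) {
        check(closes, outOfSampleDays);
        return Arrays.copyOfRange(closes, closes.length - outOfSampleDays, closes.length);
    }
}
```

In practice an out-of-sample backtest may still need some earlier prices to warm up indicators such as moving averages; what matters is that no trade or fitness evaluation during evolution touches the out-of-sample window.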
Other Comments <ul><li>The GA is the easy part.
Designing the machine to work with it is a bit of an art.
Reduce the solution space by limiting choices. </li><ul><li>Ex: Pre-screen stocks, favor discrete choices </li></ul><li>Allow the GA plenty of wiggle room. </li><ul><li>Ex: Large genome, multiple voting schemes </li></ul><li>How meta to get? I hard-coded some values that could have been part of the genome. </li></ul>
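As an example of favoring discrete choices while still leaving the GA room to move, the sketch below (plain Java, hypothetical names) lets several indicators each vote buy (+1), abstain (0) or sell (-1), and uses one gene to pick among a few voting rules. The three rules are illustrative assumptions, not the schemes used in this project.

```java
/** Hypothetical voting schemes that turn several indicator votes into one decision. */
final class Voting {

    enum Scheme { MAJORITY, UNANIMOUS, ANY }

    /** Returns +1 (buy), -1 (sell) or 0 (do nothing). */
    static int combine(int[] votes, Scheme scheme) {
        if (votes.length == 0) return 0;          // no indicators, no trade
        int buys = 0, sells = 0;
        for (int v : votes) {
            if (v > 0) buys++;
            else if (v < 0) sells++;
        }
        switch (scheme) {
            case MAJORITY:                        // strictly more than half agree
                if (buys > votes.length / 2) return 1;
                if (sells > votes.length / 2) return -1;
                return 0;
            case UNANIMOUS:                       // every indicator agrees
                if (buys == votes.length) return 1;
                if (sells == votes.length) return -1;
                return 0;
            case ANY:                             // at least one vote and no dissent
                if (buys > 0 && sells == 0) return 1;
                if (sells > 0 && buys == 0) return -1;
                return 0;
            default:
                throw new IllegalArgumentException("unknown scheme " + scheme);
        }
    }

    /** One gene in [0, 1] selects one of the discrete schemes. */
    static Scheme schemeFromGene(double gene) {
        Scheme[] all = Scheme.values();
        int idx = (int) Math.floor(Math.max(0.0, Math.min(1.0, gene)) * all.length);
        return all[Math.min(idx, all.length - 1)];
    }
}
```

The choice is discrete (three rules), which keeps the search space small, but because the rule itself is evolved, the GA can still trade off caution against trade frequency.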
Last Minute Notes <ul><li>Termination condition: number of stagnant generations (generations with no improvement).
Scoring – I'm using the ending account balance (cash + approx val of open positions) as the score.
The chosen scoring formula has a critical impact on the outcome. It provides the 'motivation' for the GA to 'improve'.
What happens when you try a score like numGoodTrades / numBadTrades?
Watchmaker only supports non-negative double fitness scores.
The trade PnL calculation includes transaction costs (my target broker charges $2.50 per trade). </li></ul>
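A minimal sketch of the scoring described on this slide, in plain Java: ending balance as the score, a flat per-trade commission, and a clamp so the score never goes negative. Assumptions: the $2.50 is charged on both the buy and the sell of a round trip, and a negative balance is clamped to 0 so the score stays within what Watchmaker accepts. The goodBadRatio method illustrates one answer to the numGoodTrades / numBadTrades question: a candidate with no losing trades scores Infinity (and 0 / 0 gives NaN), so the GA would reward trading rarely, and the ratio ignores how large each win or loss is.

```java
/** Hypothetical scoring helpers; names and the round-trip convention are assumptions. */
final class Scoring {
    static final double COMMISSION = 2.50;   // per trade, i.e. per buy or per sell

    /** Net PnL of one round trip: buy `shares` at `entry`, sell at `exit`, pay two commissions. */
    static double roundTripPnl(int shares, double entry, double exit) {
        return shares * (exit - entry) - 2 * COMMISSION;
    }

    /** Ending balance = cash + approximate market value of positions still open. */
    static double endingBalance(double cash, int openShares, double lastPrice) {
        return cash + openShares * lastPrice;
    }

    /** Fitness must not be negative, so a blown-up account scores 0. */
    static double fitness(double endingBalance) {
        return Math.max(0.0, endingBalance);
    }

    /** The ratio score from the slide: Infinity with no losers, NaN with no trades at all. */
    static double goodBadRatio(int good, int bad) {
        return (double) good / bad;
    }
}
```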
Evolve & test models on 20080101 thru (Today - 45 trading days). This is the in-sample data.
Stagnation Condition: quit after 7 generations w/ no improvement.
Keep the best 2 evolved individuals for each stock, for a total of 8 reserved candidates (so 4 stocks in this demo).
Test them on out-of-sample data from (Today - 44 trading days) through Today.
Print out the trading results of the top 3 out-of-sample tests. </li></ul>
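The run protocol above can be sketched in plain Java: a stagnation counter that stops evolution after a set number of generations without a new best score, and a helper that keeps the top k candidates by score. The names are hypothetical; limit = 7 and k = 2 would match this demo's settings, and topK(..., 3) on the out-of-sample scores would pick the results to print.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helpers for the stagnation stop and for reserving the best candidates. */
final class RunProtocol {

    /** Tracks the best score seen and how many generations have passed without beating it. */
    static final class Stagnation {
        private final int limit;
        private double best = Double.NEGATIVE_INFINITY;
        private int stagnant = 0;

        Stagnation(int limit) { this.limit = limit; }

        /** Record one generation's best score; returns true when evolution should stop. */
        boolean update(double generationBest) {
            if (generationBest > best) {
                best = generationBest;   // improvement: reset the counter
                stagnant = 0;
            } else {
                stagnant++;              // no improvement this generation
            }
            return stagnant >= limit;
        }
    }

    /** Indices of the k highest scores, best first. */
    static List<Integer> topK(double[] scores, int k) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) idx.add(i);
        idx.sort(Comparator.comparingDouble((Integer j) -> scores[j]).reversed());
        return new ArrayList<>(idx.subList(0, Math.min(k, idx.size())));
    }
}
```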
Results – Talkin' 'bout an evolution... [Figure: best candidate score vs. number of generations, Run 1 and Run 2] Yes, the score is in $, but since this is in-sample data, please take it with a grain of salt.
Results – Discussion of <ul><li>These 2 demo runs took about 5 minutes each. I cut corners on some of my settings (the number of stocks examined and the stagnation limit) to keep the run time short.
The graphs show that evolution is a choppy process, due in part to my aggressive crossover & mutation strategies, as well as my choice not to enable elitism (where the best members survive unchanged into the next generation). </li></ul>
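For comparison, elitism could be added with a few lines like the following plain-Java sketch (hypothetical names): the eliteCount best candidates are copied unchanged into the next generation and offspring fill the remaining slots. With a deterministic backtest, any eliteCount of 1 or more means the best score can no longer drop from one generation to the next; eliteCount = 0 reproduces the no-elitism setting used in these runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical elitism step: preserve the top candidates, then fill with offspring. */
final class Elitism {

    static List<double[]> nextGeneration(List<double[]> population, double[] scores,
                                         List<double[]> offspring, int eliteCount) {
        // Rank current candidates by score, best first.
        Integer[] order = new Integer[population.size()];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer k) -> scores[k]).reversed());

        List<double[]> next = new ArrayList<>(population.size());
        for (int i = 0; i < eliteCount && i < order.length; i++) {
            next.add(population.get(order[i]).clone());   // elites survive untouched
        }
        for (int i = 0; next.size() < population.size() && i < offspring.size(); i++) {
            next.add(offspring.get(i));                    // offspring fill the remaining slots
        }
        return next;
    }
}
```

The trade-off is diversity: strong elitism can make the population converge early on one good-looking candidate, which is one reason to leave it off or keep eliteCount small.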
Results – Conclusions <ul><li>On average, I am making about 3% per simulated trade. The max gain was about 10% and the max loss about 7%. The standard deviation was about 4.5%.
So, the generated models appear to have predictive power.
The validity of these results depends on my belief that my backtesting routine is both reasonable & bug-free.
There is no guarantee that the evolved models will keep working when they face new, out-of-sample data. I'm still just betting. But, maybe, with an edge. </li></ul>
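The summary numbers on the conclusions slide (mean, max gain, max loss, and standard deviation of per-trade returns) can be computed as in the following plain-Java sketch. Whether the original used the sample (n - 1) or population (n) standard deviation isn't stated; this sketch assumes the sample version. It also includes the standard error of the mean (stdDev / sqrt(n)): the slides don't give the number of trades, and that count is what decides how confidently a 3% average can be distinguished from zero.

```java
/** Hypothetical summary statistics over per-trade percentage returns. */
final class TradeStats {
    final int count;
    final double mean, min, max, stdDev;

    TradeStats(double[] returnsPct) {
        if (returnsPct.length < 2) throw new IllegalArgumentException("need at least 2 trades");
        double sum = 0.0, lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY;
        for (double r : returnsPct) {
            sum += r;
            lo = Math.min(lo, r);   // worst trade (max loss)
            hi = Math.max(hi, r);   // best trade (max gain)
        }
        count = returnsPct.length;
        mean = sum / count;
        double sq = 0.0;
        for (double r : returnsPct) sq += (r - mean) * (r - mean);
        stdDev = Math.sqrt(sq / (count - 1));   // sample standard deviation
        min = lo;
        max = hi;
    }

    /** Standard error of the mean: shrinks as the number of trades grows. */
    double standardError() {
        return stdDev / Math.sqrt(count);
    }
}
```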