Using Java & Genetic Algorithms to Beat the Market

I presented this at JavaOne 2011 along with a demo of software that I've written. Since you won't see the demo, I added a few more slides to explain it.

Matthew Ring – JavaOne 2011 BOF Session 22382 [20111007 15:11 CDT – MRing – added results pages]

401(k) Problem?

I can't speak for you, but...

I know, I'll write software to time the market!

Give it the ol' college try, son!

Shoestring Toolkit
  - K.I.S.S. – Let's trade stocks.
    - Cheap or free market data, especially daily OHLCV data.
    - Generally liquid.
    - Low transaction fees.
    - Many brokers offer equity trading APIs.
  - Java – My primary language.
    - Free.
    - Huge number and variety of open source libraries.

Some Styles of Trading
  - HFT – Market-making or scalping (opinions vary) in sub-second windows.
  - Day Trading – Buying & selling a stock in the same trading day, attempting to profit on same-day price moves.
  - Position Trading – Attempting to profit on trends over a window of a few days to a few years.
  - Swing Trading – Attempting to profit on volatility over a window of a few days to a few weeks.

How do humans get an edge or time the market?
  - Dumb luck.
  - Inside information.
  - Gut instinct.
  - Find repeating patterns in the market.
  - Manipulate other participants.
  - Better or faster information.
  - Faster execution.

So, for which edge should *I* try to write software?
  - I like the sound of “Find repeating patterns in the market”.
  - Patterns? … Signals?
  - Technical Analysis!
  - TA-Lib: Technical Analysis Library – Open-source API for C/C++, Java, Perl, Python and 100% Managed .NET.

What is Technical Analysis?
  - Bands
  - Oscillators
  - Moving averages
  - Candlesticks
  - Trading decisions as reactions to these signals
  - Voodoo?

OK, how can I automate this?
  - Convert market data to signals.
  - The software interprets the signals & decides how to trade.
    - Refinement towards implementation – Map a discrete set of signal responses to a discrete set of trading decisions.
  - Simulate trades based on these decisions.
  - The goal is to optimize for profitable trades, of course.

Leads us to: Optimization Algos and/or Machine Learning!
  - Artificial Neural Networks
  - Simulated Annealing
  - Emergent Behavior (swarm, cellular automata)
  - Genetic Programming
  - Genetic Algorithms
  - Support Vector Machines

Genetic Algorithms – Source: http://www.slideshare.net/kancho/genetic-algorithm-by-example (Nobal Niraula)

Genetic Algorithms – Source: http://www.slideshare.net/Mathijsje/genetic-algorithms-7276974 (Mathijs van Meerkerk)

Some possibly familiar examples of GA...

Watchmaker Framework
  - Open source, Java, created by Dan Dyer.
  - Provides easy-to-follow examples & an interface-based API.

Our Machine
  [Diagram: Market Data, Operational Settings, Trading Decisions, ?, Backtesting, Feedback, GA]

Piecing our Machine Together
  - Get the OHLCV market data as CSV files, load into DB. (Boring; will not be shown here.)
  - Generate signals from market data with TA-Lib.
  - Generate choices for trading decisions.
  - React to the signals.
  - Map reactions to choices.
  - Test the choices made.

Generate Signals
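
The deck computes its signals with TA-Lib, whose call signatures are not reproduced here. As a dependency-free stand-in, this sketch computes one typical signal, a simple moving average of daily closes, and shows where a signal's 'lookback' comes from. The class and method names are hypothetical and are not TA-Lib's API.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-in for a TA-Lib moving-average call (not TA-Lib's API):
 * a simple moving average (SMA) of daily closing prices.
 */
public class SmaSignal {

    /**
     * Returns an array aligned with closes. Days inside the lookback
     * (the first period - 1 days) have no average yet and are NaN.
     */
    static double[] sma(double[] closes, int period) {
        if (period < 1) {
            throw new IllegalArgumentException("period must be >= 1");
        }
        double[] out = new double[closes.length];
        Arrays.fill(out, Double.NaN);
        double windowSum = 0.0;
        for (int i = 0; i < closes.length; i++) {
            windowSum += closes[i];
            if (i >= period) {
                windowSum -= closes[i - period]; // drop the close that left the window
            }
            if (i >= period - 1) {
                out[i] = windowSum / period;
            }
        }
        return out;
    }

    /** Number of leading days without a value for the given period. */
    static int lookback(int period) {
        return period - 1;
    }

    public static void main(String[] args) {
        double[] closes = {10, 11, 12, 13, 14};
        System.out.println(Arrays.toString(sma(closes, 3)));
    }
}
```

With a 3-day period, the first two entries are NaN; those are exactly the days a backtest has to skip.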

Generate Choices for Trading Decisions
What decisions? How about...
  - When to buy?
  - How much to buy?
  - How long to hold?
  - Predicted low price (for buying)?
  - Predicted high price (for selling)?
  - Price at which I should sell prior to the end of the holding period (opportunistic sell)?

Generate Choices for Trading Decisions
  - Recall that the number of choices for each decision is purposely limited.
  - Choice values are generated from 'random' doubles, scaled to meet their various requirements.
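
A minimal sketch of how 'random' doubles might be scaled into a limited set of choice values, assuming genes are doubles in [0, 1). The helper names and the example ranges (3 to 20 holding days, a predicted low 0% to 5% below the close) are illustrative assumptions, not the author's settings.

```java
/**
 * Sketch: scaling genome values in [0, 1) into the small, fixed set of
 * choices available for one trading decision. Names and ranges are hypothetical.
 */
public class ChoiceScaling {

    /** Maps a gene in [0, 1) linearly onto [min, max). */
    static double scale(double gene, double min, double max) {
        return min + gene * (max - min);
    }

    /** Maps a gene in [0, 1) onto an integer in [min, max], inclusive. */
    static int scaleToInt(double gene, int min, int max) {
        int value = min + (int) Math.floor(gene * (max - min + 1));
        return Math.min(value, max); // guards against a gene of exactly 1.0
    }

    /** Reads `count` consecutive genes starting at `offset` and scales each one. */
    static double[] choices(double[] genome, int offset, int count, double min, double max) {
        double[] out = new double[count];
        for (int i = 0; i < count; i++) {
            out[i] = scale(genome[offset + i], min, max);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] genome = {0.0, 0.5, 0.999, 0.25};
        // Hypothetical "how long to hold?" choices: 3 to 20 trading days.
        for (int i = 0; i < 3; i++) {
            System.out.println("hold days choice " + i + ": " + scaleToInt(genome[i], 3, 20));
        }
        // Hypothetical "predicted low" choices: 0% to 5% below the last close.
        double[] lowOffsets = choices(genome, 0, 4, -0.05, 0.0);
        System.out.println(java.util.Arrays.toString(lowOffsets));
    }
}
```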

React to the signals
  - Each signal feeds, one day at a time, into a device I named Thresholds.
  - Each Thresholds instance is initialized with a 'random', sorted array of doubles, scaled to span the range of the signal it is to consume.
  - This initialization step establishes both the number of bins and the bin boundaries.

    public int findBin(double testVal)

React to the signals
  - A Thresholds instance selects a bin based on the signal level via a simple sigVal < binBoundary check.
  - The bin index is the reaction.
  - Sort of like an A/D converter, if you pretend that the signal is analog, sampled daily.
  - A subclass provides a memory of its previous reaction, so it reacts to both the current signal level and the change from its previous reaction.
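
The two Thresholds slides can be sketched roughly as follows. Only the findBin(double testVal) signature and the sigVal < binBoundary check come from the deck; the constructor and the hypothetical MemoryThresholds subclass, which encodes "current bin plus direction of change" as bin * 3 + direction, are assumptions about one way to do it.

```java
import java.util.Arrays;

/**
 * Sketch of a Thresholds "device": sorted bin boundaries scaled from genes,
 * and a findBin that applies a simple sigVal < binBoundary check.
 */
public class Thresholds {

    protected final double[] boundaries; // ascending bin boundaries

    /** Scales genes in [0, 1) onto [sigMin, sigMax] and sorts them into boundaries. */
    public Thresholds(double[] genes, double sigMin, double sigMax) {
        boundaries = new double[genes.length];
        for (int i = 0; i < genes.length; i++) {
            boundaries[i] = sigMin + genes[i] * (sigMax - sigMin);
        }
        Arrays.sort(boundaries);
    }

    /** n boundaries split the signal range into n + 1 bins. */
    public int numBins() {
        return boundaries.length + 1;
    }

    /** Returns the index of the first boundary above testVal, or the last bin if none is. */
    public int findBin(double testVal) {
        for (int i = 0; i < boundaries.length; i++) {
            if (testVal < boundaries[i]) {
                return i;
            }
        }
        return boundaries.length;
    }

    /**
     * Hypothetical subclass with memory: the reaction encodes both the current
     * bin and whether it moved down (0), stayed (1), or moved up (2).
     */
    public static class MemoryThresholds extends Thresholds {
        private int previousBin = -1; // -1 means "no previous reaction yet"

        public MemoryThresholds(double[] genes, double sigMin, double sigMax) {
            super(genes, sigMin, sigMax);
        }

        @Override
        public int numBins() {
            return super.numBins() * 3;
        }

        @Override
        public int findBin(double testVal) {
            int bin = super.findBin(testVal);
            int direction;
            if (previousBin < 0 || bin == previousBin) {
                direction = 1;
            } else {
                direction = bin < previousBin ? 0 : 2;
            }
            previousBin = bin;
            return bin * 3 + direction;
        }
    }

    public static void main(String[] args) {
        double[] genes = {0.5, 0.0, 1.0}; // boundaries become {0, 5, 10} on [0, 10]
        Thresholds t = new Thresholds(genes, 0.0, 10.0);
        System.out.println(t.findBin(-1.0) + " " + t.findBin(7.0) + " " + t.findBin(10.0));
        MemoryThresholds m = new MemoryThresholds(genes, 0.0, 10.0);
        System.out.println(m.findBin(7.0) + " " + m.findBin(3.0) + " " + m.findBin(12.0));
    }
}
```

A linear scan is fine for a handful of boundaries; Arrays.binarySearch would be the obvious swap if a Thresholds held many bins.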

Summary, so far...
  - Daily stock OHLCV data become signals.
  - I have chosen 24 different TA signals.
  - I have defined 6 trading-related decisions: Buy? How much? Holding period? Low price guess? High price guess? Opportunistic sell price?
  - For each decision, for each signal, a Thresholds instance will react (24 × 6 = 144 Thresholds instances).

Converting reactions to decisions
  - Each threshold bin in each decision layer is mapped to one of the decision choices.
  - The mappings are established 'randomly'.
  - The choice that gets the most bin reactions is selected.
  - Winner-take-all voting.
  - Note that some mappings will effectively abstain from voting.
  - Ties are broken based on iteration order.
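
A sketch of one decision layer's winner-take-all vote. The array layout, the ABSTAIN marker, and the hypothetical mapGene rule (reserving one slice of each gene's range for abstaining) are assumptions; iterating choices in index order with a strict '>' is one concrete reading of "ties are broken based on iteration order".

```java
/**
 * Sketch of winner-take-all voting for one decision layer.
 * reactions[s]      = bin index reported today by signal s's Thresholds
 * binToChoice[s][b] = choice that bin b of signal s votes for, or ABSTAIN
 */
public class WinnerTakeAllVote {

    static final int ABSTAIN = -1;

    /**
     * Hypothetical gene-to-mapping rule: splits [0, 1) into numChoices + 1
     * slices, the first of which means "abstain".
     */
    static int mapGene(double gene, int numChoices) {
        int slice = (int) Math.floor(gene * (numChoices + 1));
        return Math.min(slice, numChoices) - 1; // -1 == ABSTAIN
    }

    /** Returns the winning choice, or -1 if every bin abstained. */
    static int vote(int[] reactions, int[][] binToChoice, int numChoices) {
        int[] tally = new int[numChoices];
        for (int s = 0; s < reactions.length; s++) {
            int choice = binToChoice[s][reactions[s]];
            if (choice != ABSTAIN) {
                tally[choice]++;
            }
        }
        int winner = -1;
        int bestCount = 0;
        for (int c = 0; c < numChoices; c++) {
            if (tally[c] > bestCount) { // strict '>' lets the earliest choice win a tie
                bestCount = tally[c];
                winner = c;
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        int[] reactions = {0, 1, 2};
        int[][] binToChoice = {
            {1, 0},
            {2, 1},
            {ABSTAIN, ABSTAIN, 0},
        };
        System.out.println("winning choice: " + vote(reactions, binToChoice, 3));
    }
}
```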

Converting reactions to decisions
  - Now to get a little fancier...
  - Added combinational voting to each layer.
  - Some 'randomly' selected bins are ANDed with others, then 'randomly' mapped to choices for the given decision layer.
  - More opinions in the mix.
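
Combinational voting might look like this, assuming each AND term combines exactly two (signal, bin) pairs and adds its vote to the same tally as the single-bin votes. AndTerm and the other names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of combinational voting: an AND term fires only when two signals
 * are both in specific bins today, then votes for its mapped choice.
 * The two-input AndTerm structure is a hypothetical illustration.
 */
public class CombinationalVote {

    static final class AndTerm {
        final int signalA, binA, signalB, binB, choice;

        AndTerm(int signalA, int binA, int signalB, int binB, int choice) {
            this.signalA = signalA;
            this.binA = binA;
            this.signalB = signalB;
            this.binB = binB;
            this.choice = choice;
        }

        boolean fires(int[] reactions) {
            return reactions[signalA] == binA && reactions[signalB] == binB;
        }
    }

    /** Adds one vote per firing AND term to the layer's existing tally. */
    static void addVotes(int[] reactions, List<AndTerm> terms, int[] tally) {
        for (AndTerm term : terms) {
            if (term.fires(reactions)) {
                tally[term.choice]++;
            }
        }
    }

    /** Winner-take-all over the combined tally; the earliest choice wins ties. */
    static int winner(int[] tally) {
        int winner = -1;
        int bestCount = 0;
        for (int c = 0; c < tally.length; c++) {
            if (tally[c] > bestCount) {
                bestCount = tally[c];
                winner = c;
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        int[] reactions = {2, 0, 1};  // today's bin for signals 0, 1, 2
        int[] tally = {1, 0};         // e.g. votes already cast by single bins
        List<AndTerm> terms = Arrays.asList(
            new AndTerm(0, 2, 1, 0, 1),  // fires: signal 0 in bin 2 AND signal 1 in bin 0
            new AndTerm(0, 2, 2, 0, 0),  // does not fire: signal 2 is in bin 1
            new AndTerm(1, 0, 2, 1, 1)); // fires
        addVotes(reactions, terms, tally);
        System.out.println("tally " + Arrays.toString(tally) + ", winner " + winner(tally));
    }
}
```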

Backtesting the decisions
  - Rules, laws & things to keep in mind:
    - Cash account trades settle in T+3 days.
    - Avoid free-riding:
      - Buy X at T.
      - Sell X at T', where T' < T+3.
      - Buy Y on T' with proceeds from the sale of X.
      - Cannot sell Y before T'+3.
      - If you do, the Federal Reserve Board requires that your broker freeze your account for 90 days.
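
A sketch of how a backtest could honor the T+3 settlement and free-riding rules above, using trading-day indices. The CashAccount class is hypothetical; it simply applies the slide's rule that a position bought on T' with unsettled sale proceeds must not be sold before T'+3.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch of a backtest cash account that tracks T+3 settlement and reports
 * when a newly bought position may be sold without free-riding.
 * Days are trading-day indices; class and method names are hypothetical.
 */
public class CashAccount {

    static final int SETTLEMENT_DAYS = 3;

    /** Sale proceeds that become settled cash on settleDay. */
    private static final class Pending {
        final int settleDay;
        double amount;

        Pending(int settleDay, double amount) {
            this.settleDay = settleDay;
            this.amount = amount;
        }
    }

    private double settledCash;
    private final Deque<Pending> pending = new ArrayDeque<>(); // ordered by settleDay

    CashAccount(double startingCash) {
        settledCash = startingCash;
    }

    double settledCash() {
        return settledCash;
    }

    /** Moves proceeds whose settlement day has arrived into settled cash. */
    void advanceTo(int day) {
        while (!pending.isEmpty() && pending.peekFirst().settleDay <= day) {
            settledCash += pending.pollFirst().amount;
        }
    }

    /** Sales must be recorded in day order so the queue stays sorted. */
    void recordSale(int day, double proceeds) {
        pending.addLast(new Pending(day + SETTLEMENT_DAYS, proceeds));
    }

    /** Buys on `day` and returns the earliest day the new position may be sold. */
    int buy(int day, double cost) {
        advanceTo(day);
        double unsettled = 0.0;
        for (Pending p : pending) {
            unsettled += p.amount;
        }
        if (cost > settledCash + unsettled) {
            throw new IllegalStateException("insufficient cash");
        }
        if (cost <= settledCash) {
            settledCash -= cost;
            return day; // paid with settled funds: no free-riding restriction
        }
        double fromUnsettled = cost - settledCash;
        settledCash = 0.0;
        // Spend unsettled proceeds, earliest-settling first.
        while (fromUnsettled > 0.0 && !pending.isEmpty()) {
            Pending p = pending.peekFirst();
            double used = Math.min(p.amount, fromUnsettled);
            p.amount -= used;
            fromUnsettled -= used;
            if (p.amount <= 0.0) {
                pending.pollFirst();
            }
        }
        // Bought on T' with unsettled proceeds: hold until T' + 3.
        return day + SETTLEMENT_DAYS;
    }

    public static void main(String[] args) {
        CashAccount account = new CashAccount(2000.0);
        System.out.println("sell X from day " + account.buy(0, 2000.0));
        account.recordSale(2, 2100.0); // sell X on T' = 2; proceeds settle on day 5
        System.out.println("sell Y from day " + account.buy(2, 1000.0));
        account.advanceTo(5);
        System.out.println("settled cash on day 5: " + account.settledCash());
    }
}
```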

Backtesting the decisions
  - More rules, laws & things to keep in mind:
    - Avoid day trading:
      - Buy X on T.
      - Sell X later on T.
      - If you do this more than 3 times within 5 business days, you will be classified as a Pattern Day Trader.
      - FINRA rules require that Pattern Day Traders use margin accounts with a minimum balance of $25K.
    - When backtesting:
      - Allow for slippage (be pessimistic in fill price).
      - The market will eat up high bids & low asks.
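
If a backtest did allow same-day round trips, a guard like the hypothetical DayTradeGuard below could count them in a rolling window of 5 trading days and flag the fourth one, per the Pattern Day Trader rule above. The window handling is a simplified assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch of a Pattern Day Trader guard for a backtest: counts day trades in a
 * rolling window of 5 trading days and flags the one that would exceed 3.
 * Days are trading-day indices; names are hypothetical.
 */
public class DayTradeGuard {

    static final int WINDOW_DAYS = 5;
    static final int MAX_DAY_TRADES = 3;

    private final Deque<Integer> dayTradeDays = new ArrayDeque<>();

    /** Drops day trades that have fallen out of the window ending today. */
    private void expire(int today) {
        while (!dayTradeDays.isEmpty() && dayTradeDays.peekFirst() <= today - WINDOW_DAYS) {
            dayTradeDays.pollFirst();
        }
    }

    /** True if making one more day trade today would exceed the limit. */
    boolean wouldTriggerPdt(int today) {
        expire(today);
        return dayTradeDays.size() + 1 > MAX_DAY_TRADES;
    }

    /** Records a buy and sell of the same stock on `today`. */
    void recordDayTrade(int today) {
        expire(today);
        dayTradeDays.addLast(today);
    }

    public static void main(String[] args) {
        DayTradeGuard guard = new DayTradeGuard();
        guard.recordDayTrade(0);
        guard.recordDayTrade(1);
        guard.recordDayTrade(2);
        System.out.println("4th day trade on day 3 flagged? " + guard.wouldTriggerPdt(3));
        System.out.println("4th day trade on day 5 flagged? " + guard.wouldTriggerPdt(5));
    }
}
```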

Backtesting the decisions
  - Based on the last slides, I usually set minHoldDays=3.
  - Iterate over the signal data while trying to create trades based on machine advice & market actuals.
  - Remember to skip the first N days, where N is the largest 'lookback' of any signal, as many signals involve moving averages.
  - Strongly bias 'buy' fills toward the high, 'sell' fills toward the low.
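
The fill bias and the lookback skip take only a few lines. The linear interpolation between the day's low and high, and a bias of 0.8 standing in for "strongly", are assumptions; the names are hypothetical.

```java
/**
 * Sketch of pessimistic backtest fills: buys are filled toward the day's high,
 * sells toward the day's low. The bias value and names are hypothetical.
 */
public class PessimisticFills {

    /** bias in [0, 1]: 0 fills a buy at the low, 1 at the high. */
    static double buyFill(double low, double high, double bias) {
        return low + bias * (high - low);
    }

    /** bias in [0, 1]: 0 fills a sell at the high, 1 at the low. */
    static double sellFill(double low, double high, double bias) {
        return high - bias * (high - low);
    }

    /** First day index on which every signal has a value (skips the max lookback). */
    static int firstTradableDay(int... lookbacks) {
        int max = 0;
        for (int lookback : lookbacks) {
            max = Math.max(max, lookback);
        }
        return max;
    }

    public static void main(String[] args) {
        double bias = 0.8; // "strongly" biased; the exact value is an assumption
        System.out.println("buy fill:  " + buyFill(10.0, 20.0, bias));
        System.out.println("sell fill: " + sellFill(10.0, 20.0, bias));
        System.out.println("start at day " + firstTradableDay(2, 13, 25));
    }
}
```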

Backtesting the decisions
  - Check proposed buy & sell prices against the current day's open as a sanity check.
  - Sell is greedy – it will try to sell at Max(predicted high price, chosen opportunistic sell price).
  - Any position where holdDays >= maxHoldDays is dumped at the end of the day at a price biased toward the low and the close.
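
One day's sell decision, sketched under stated assumptions: a limit at or below the open is treated as filled at the open, a limit reached intraday fills at the limit, and a forced exit uses the midpoint of the low and the close as a stand-in for "biased toward the low and the close". SellRules is a hypothetical name.

```java
/**
 * Sketch of one day's sell decision for an open position, following the
 * greedy-sell and max-hold rules. The fill model (open, limit, or the
 * midpoint of low and close) is an assumption.
 */
public class SellRules {

    /** Returns today's sell price, or NaN to keep holding. */
    static double decideSell(double open, double high, double low, double close,
                             double predictedHigh, double opportunisticSell,
                             int holdDays, int maxHoldDays) {
        // Greedy: aim for the higher of the two candidate prices.
        double target = Math.max(predictedHigh, opportunisticSell);
        if (open >= target) {
            return open; // sanity check vs. the open: a limit already below the open fills there
        }
        if (high >= target) {
            return target; // the limit was reached during the day
        }
        if (holdDays >= maxHoldDays) {
            return (low + close) / 2.0; // forced exit, biased toward the low and the close
        }
        return Double.NaN; // keep holding
    }

    public static void main(String[] args) {
        System.out.println(decideSell(10.0, 12.0, 9.0, 11.0, 11.5, 11.0, 1, 10));
        System.out.println(decideSell(10.0, 11.0, 9.0, 10.0, 11.5, 11.0, 10, 10));
        System.out.println(decideSell(10.0, 11.0, 9.0, 10.0, 11.5, 11.0, 2, 10));
    }
}
```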

Evolving the Candidates
  - Notice that the previous slides have mentioned 'random' & 'randomly' several times.
  - This is where the GA comes into play.
  - The genome, a double[6850], serves as the settings (the 'randomness') for each candidate.

Evolving the Candidates
  - The trades generated during backtesting will be used to score each candidate.
  - The GA will select the fittest candidates for mating, based on their scores.
  - The GA will recombine (crossover) and mutate (spontaneously change values in) the mated genomes.
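
The deck uses Watchmaker for this step. To stay dependency-free, the sketch below is a hand-rolled generational loop, not Watchmaker's API: binary tournament selection (the deck does not name its selection strategy), single-point crossover, per-gene mutation, no elitism, and termination after a number of stagnant generations. The deck's genome is a double[6850]; the demo uses 5 genes and a toy fitness in place of a backtest.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

/**
 * Hand-rolled generational GA over double[] genomes (not Watchmaker's API):
 * binary tournament selection, single-point crossover, per-gene mutation,
 * no elitism, and a stagnation-based termination condition.
 */
public class SimpleGa {

    static double[] evolve(int genomeLength, int populationSize, double mutationRate,
                           int stagnationLimit, int maxGenerations,
                           ToDoubleFunction<double[]> fitness, Random rng) {
        double[][] population = new double[populationSize][genomeLength];
        for (double[] genome : population) {
            for (int i = 0; i < genomeLength; i++) {
                genome[i] = rng.nextDouble(); // genes in [0, 1), scaled later by each consumer
            }
        }
        double[] best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        int stagnant = 0;
        for (int gen = 0; gen < maxGenerations && stagnant < stagnationLimit; gen++) {
            // Score every candidate (in the deck: backtest it and read the ending balance).
            double[] scores = new double[populationSize];
            boolean improved = false;
            for (int p = 0; p < populationSize; p++) {
                scores[p] = fitness.applyAsDouble(population[p]);
                if (scores[p] > bestScore) {
                    bestScore = scores[p];
                    best = population[p].clone();
                    improved = true;
                }
            }
            stagnant = improved ? 0 : stagnant + 1;

            // Breed the next generation; no member is copied over unchanged (no elitism).
            double[][] next = new double[populationSize][];
            for (int p = 0; p < populationSize; p++) {
                double[] mom = population[tournament(scores, rng)];
                double[] dad = population[tournament(scores, rng)];
                int cut = rng.nextInt(genomeLength);
                double[] child = new double[genomeLength];
                for (int i = 0; i < genomeLength; i++) {
                    child[i] = i < cut ? mom[i] : dad[i]; // single-point crossover
                    if (rng.nextDouble() < mutationRate) {
                        child[i] = rng.nextDouble();      // mutation: replace the gene
                    }
                }
                next[p] = child;
            }
            population = next;
        }
        return best;
    }

    /** Binary tournament: returns the index of the fitter of two random members. */
    static int tournament(double[] scores, Random rng) {
        int a = rng.nextInt(scores.length);
        int b = rng.nextInt(scores.length);
        return scores[a] >= scores[b] ? a : b;
    }

    /** Toy non-negative fitness: genes closer to 0.5 score higher (max 0.25 per gene). */
    static double toyFitness(double[] genome) {
        double penalty = 0.0;
        for (double g : genome) {
            penalty += (g - 0.5) * (g - 0.5);
        }
        return genome.length * 0.25 - penalty;
    }

    public static void main(String[] args) {
        double[] best = evolve(5, 30, 0.05, 7, 500, SimpleGa::toyFitness, new Random(42));
        System.out.println("best toy fitness: " + toyFitness(best));
    }
}
```

The stagnation limit of 7 mirrors the demo's setting; maxGenerations is an extra safety cap that the deck does not mention.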

More Details
  - I have simplified some things for this presentation.
  - I pre-screen the stocks used for swing-trading appropriateness. I don't want to wait while it runs through the whole Nasdaq.
  - I backtest & evolve on a subset of the available data.
  - Then I try (backtest without evolution) the top evolved candidates against recent out-of-sample data.

Other Comments
  - The GA is the easy part.
  - Designing the machine to work with it is a bit of an art.
  - Reduce the solution space by limiting choices.
    - Ex: Pre-screen stocks, favor discrete choices.
  - Allow the GA plenty of wiggle room.
    - Ex: Large genome, multiple voting schemes.
  - How meta to get? I hard-coded some values that could have been part of the genome.

Last Minute Notes
  - Termination condition: number of stagnant generations.
  - Scoring – I'm using the ending account balance (cash + approx. value of open positions) as the score.
  - The chosen scoring formula has a critical impact on the outcome. It provides the 'motivation' for the GA to 'improve'.
  - What happens when you try a score like numGoodTrades / numBadTrades?
  - Watchmaker only supports non-negative double score values.
  - The trade PnL calculation includes transaction costs (my target broker charges $2.50 a trade).
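
A sketch of the scoring described above, assuming the $2.50 fee applies to both the buy and the sell of a round trip. Clamping the score at zero keeps it within Watchmaker's requirement that fitness values not be negative. Scoring and its methods are hypothetical names.

```java
/**
 * Sketch of the scoring: ending account balance (cash plus an approximate
 * value of open positions), with per-trade fees included in each trade's PnL.
 * Charging the fee on both the buy and the sell is an assumption.
 */
public class Scoring {

    static final double FEE_PER_TRADE = 2.50; // the deck's target broker

    /** Net PnL of one round trip after paying the fee on the buy and on the sell. */
    static double tradePnl(int shares, double buyPrice, double sellPrice) {
        return shares * (sellPrice - buyPrice) - 2 * FEE_PER_TRADE;
    }

    /**
     * Score = cash + approximate value of open positions, clamped at zero
     * because Watchmaker fitness values must not be negative.
     */
    static double score(double cash, int[] openShares, double[] lastPrices) {
        double balance = cash;
        for (int i = 0; i < openShares.length; i++) {
            balance += openShares[i] * lastPrices[i];
        }
        return Math.max(0.0, balance);
    }

    public static void main(String[] args) {
        System.out.println("round-trip PnL: " + tradePnl(100, 10.00, 10.50));
        System.out.println("score: " + score(1500.0, new int[] {50}, new double[] {12.0}));
    }
}
```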

Relevant Links
  - http://watchmaker.uncommons.org/
  - http://www.eoddata.com/
  - http://ta-lib.org/
  - http://www.slideshare.net/kancho/genetic-algorithm-by-example
  - http://www.slideshare.net/Mathijsje/genetic-algorithms-7276974

Results – About the demo you didn't get to see (sorry).
  - Pick 4 random stocks from the list of pre-screened Nasdaq stocks.
  - Each model starts with $2000.
  - Evolve & test models on 20080101 thru (Today - 45 trading days). This is the in-sample data.
  - Stagnation condition: quit after 7 generations with no improvement.
  - Keep the best 2 evolved individuals for each stock, for a total of 8 reserved candidates.
  - Test them on the out-of-sample data: (Today - 44 trading days) thru Today.
  - Print out the trading results of the top 3 out-of-sample tests.
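
The demo's date windows can be computed from a sorted array of trading dates, as sketched here. Representing dates as yyyymmdd ints and the SampleSplit name are assumptions; with outOfSampleDays = 45, the in-sample window ends at Today - 45 and the out-of-sample window runs from Today - 44 through Today, matching the slide.

```java
import java.util.Arrays;

/**
 * Sketch of the demo's in-sample / out-of-sample split over an array of
 * trading dates (yyyymmdd ints, ascending). Names are hypothetical.
 */
public class SampleSplit {

    /** Index of the first trading date on or after `date`. */
    static int firstIndexOnOrAfter(int[] tradingDates, int date) {
        int found = Arrays.binarySearch(tradingDates, date);
        return found >= 0 ? found : -found - 1; // decode the insertion point
    }

    /**
     * Returns {{inStart, inEnd}, {outStart, outEnd}} as inclusive indexes:
     * in-sample runs thru Today - outOfSampleDays; out-of-sample runs from the next day thru Today.
     */
    static int[][] split(int[] tradingDates, int firstDate, int outOfSampleDays) {
        int today = tradingDates.length - 1;
        int inStart = firstIndexOnOrAfter(tradingDates, firstDate);
        int inEnd = today - outOfSampleDays;
        return new int[][] {{inStart, inEnd}, {inEnd + 1, today}};
    }

    public static void main(String[] args) {
        int[] dates = {20071231, 20080102, 20080103, 20080104, 20080107, 20080108};
        int[][] windows = split(dates, 20080101, 2); // tiny example: 2 out-of-sample days
        System.out.println("in-sample " + Arrays.toString(windows[0])
                + ", out-of-sample " + Arrays.toString(windows[1]));
    }
}
```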

Results – Talkin' 'bout an evolution...
  [Charts, Run 1 and Run 2: best candidate score vs. the number of generations]
  Yes, the score is in $, but since this is in-sample data, please take it with a grain of salt.

Results – Out-of-sample trades
  [Figures: Run 1, Run 2]

Results – Discussion of
  - These 2 demo runs took about 5 minutes each. I cut corners with some of my settings (number of stocks examined; stagnation) in order to keep the run time short.
  - The graphs show that evolution is a choppy process, due in part to my use of aggressive crossover & mutation strategies, as well as my choice not to enable elitism (where the best members survive unchanged into the next generation).

Results – Conclusions
  - On average, I am making about 3% per simulated trade. The max gain was about 10%, the max loss was about 7%. The standard deviation was about 4.5%.
  - So, the generated models appear to have predictive power.
  - The validity of these results depends on my belief that my backtesting routine is both reasonable & bug-free.
  - There is no guarantee that the evolved models will continue to work when facing new, out-of-sample data. I'm still just betting. But, maybe, with an edge.
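
The conclusion's figures (average, max gain, max loss, and standard deviation per trade) are straightforward to compute from a list of per-trade returns. The sketch assumes the sample (n - 1) standard deviation, since the deck does not say which one it used; the returns in main are made-up placeholders, not the demo's trades.

```java
/**
 * Sketch of per-trade summary statistics: mean, max gain, max loss and
 * standard deviation. The sample (n - 1) standard deviation is an assumption.
 */
public class TradeStats {

    /** Returns {mean, max, min, sampleStdDev}; needs at least two returns. */
    static double[] summarize(double[] returns) {
        if (returns.length < 2) {
            throw new IllegalArgumentException("need at least two trades");
        }
        double sum = 0.0;
        double max = Double.NEGATIVE_INFINITY;
        double min = Double.POSITIVE_INFINITY;
        for (double r : returns) {
            sum += r;
            max = Math.max(max, r);
            min = Math.min(min, r);
        }
        double mean = sum / returns.length;
        double squared = 0.0;
        for (double r : returns) {
            squared += (r - mean) * (r - mean);
        }
        double stdDev = Math.sqrt(squared / (returns.length - 1));
        return new double[] {mean, max, min, stdDev};
    }

    public static void main(String[] args) {
        // Placeholder returns for illustration only, not the demo's actual trades.
        double[] returns = {0.05, -0.02, 0.03, 0.08, -0.01};
        double[] s = summarize(returns);
        System.out.printf("mean %.2f%%, max %.2f%%, min %.2f%%, sd %.2f%%%n",
                s[0] * 100, s[1] * 100, s[2] * 100, s[3] * 100);
    }
}
```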
