I was shocked to realize that any good backtest result I had previously obtained could simply be a local minimum that happens to perform well on the test set, or in other words, a lottery won through random initialization. To prevent this kind of test-set overfitting, I have developed a system that simultaneously trains 10K differently initialized trials on a GPU and then validates with the mean portfolio, rather than selecting the best trial. More details of the method and the blind-set results can be found in this presentation.
To my knowledge, this is also the first method that optimizes a constant rebalanced portfolio for a defined risk-adjusted reward under an assumed transaction cost, using Reinforcement Learning (Policy Gradient).
2. UCRP Optimization
- Current portfolio selection methods only address selecting an optimal buy & hold portfolio; they do not help with selecting a constant rebalanced portfolio.
- Due to mean reversion, a minimum-volatility constant rebalanced portfolio that has non-positive returns when simply held can still generate profits.
- In this work, I instead try to select an optimal portfolio for a given trading policy (such as UCRP) and risk-adjusted reward, under transaction costs (1%).
- Besides the portfolio weights, a divergence threshold, which decides when to rebalance back to the selected constant portfolio, is also optimized (see the sketch below).
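A minimal, illustrative sketch of one way this objective could look in PyTorch, assuming a Sharpe-like reward and a 1% cost on turnover. It simplifies by rebalancing daily rather than using the divergence threshold, and the data below is a random placeholder, so it is not the exact formulation used in the project:

```python
import torch

# Illustrative only: Sharpe-like reward for a constant rebalanced portfolio
# with a 1% cost on turnover. Assumes daily rebalancing for simplicity
# (the actual method rebalances on a divergence threshold instead).
def crp_reward(logits, returns, tc=0.01, eps=1e-8):
    w = torch.softmax(logits, dim=-1)              # keep weights on the simplex
    port_ret = (returns * w).sum(dim=-1)           # (n_days,) portfolio returns
    growth = w * (1.0 + returns)                   # weights drift with prices
    w_drift = growth / (growth.sum(dim=-1, keepdim=True) + eps)
    turnover = (w_drift - w).abs().sum(dim=-1)     # volume traded to rebalance
    net_ret = port_ret - tc * turnover
    return net_ret.mean() / (net_ret.std() + eps)  # risk-adjusted reward

logits = torch.zeros(33, requires_grad=True)       # 33 ETFs, uniform start
opt = torch.optim.Adam([logits], lr=1e-2)
returns = torch.randn(500, 33) * 0.01              # placeholder daily returns
for _ in range(100):
    loss = -crp_reward(logits, returns)            # gradient ascent on reward
    opt.zero_grad(); loss.backward(); opt.step()
```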
3. UCRP Optimization
- The asset weights of a buy & hold portfolio change constantly with prices. UCRP rebalances the portfolio back to the target weights when they diverge too much (simulated in the sketch below).
- Red-coloured assets shown on the slide are inverse ETF products that help hedge the risk.
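A toy simulation of this threshold rule, under simplified assumptions of my own (L1 divergence, a flat 1% cost, and the hypothetical parameter name `theta`):

```python
import torch

# Toy UCRP simulation: let weights drift with prices and rebalance back to
# the target only when the L1 divergence exceeds a threshold `theta`.
def simulate_ucrp(target, returns, theta=0.05, tc=0.01):
    w, wealth = target.clone(), 1.0
    for r in returns:                        # r: (n_assets,) daily returns
        growth = w * (1.0 + r)
        wealth *= growth.sum().item()
        w = growth / growth.sum()            # drifted weights
        div = (w - target).abs().sum()
        if div > theta:                      # diverged too much?
            wealth *= 1.0 - tc * div.item()  # pay cost on traded volume
            w = target.clone()               # rebalance back
    return wealth

target = torch.full((33,), 1.0 / 33)         # e.g. a uniform CRP
print(simulate_ucrp(target, torch.randn(500, 33) * 0.01))
```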
4. Test-Set Overfitting!
- The standard approach in machine learning for preventing overfitting to the training set is to use a validation set to decide when to terminate training.
- While optimizing a portfolio-weight vector, one risks finding a policy that performs well on the validation set but does not generalize to a blind test set.
- In fact, due to random initialization alone, one can start with a portfolio weight that already performs best on the very validation set that will be used for early stopping (see the toy demo below).
- For that reason, it is difficult to prevent test-set overfitting and to decide on an early-stopping epoch that can be reused when retraining the weights for live trading.
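A toy demo of this lottery effect on purely synthetic data: among thousands of random candidates, the one that scores best on the validation slice tells you nothing about the test slice, because there is no real signal at all:

```python
import torch

torch.manual_seed(0)
n_cand, n_assets = 10_000, 33
val = torch.randn(250, n_assets) * 0.01       # pure-noise "validation" returns
test = torch.randn(250, n_assets) * 0.01      # pure-noise "test" returns
w = torch.softmax(torch.randn(n_cand, n_assets), dim=-1)  # random candidates

def sharpe(w, rets, eps=1e-8):
    pr = rets @ w.T                           # (n_days, n_cand)
    return pr.mean(dim=0) / (pr.std(dim=0) + eps)

best = sharpe(w, val).argmax()                # the validation "lottery winner"
# High Sharpe on validation, near zero on test, despite zero real signal:
print(sharpe(w, val)[best].item(), sharpe(w, test)[best].item())
```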
5. Training a Population
- Instead of optimizing a single portfolio-weight vector, train a population of them in parallel. At initialization, some will already be overfitting the validation set.
- After each epoch of training, use the mean weight of the top 50% of candidates to calculate the validation loss that drives early stopping.
- Combine the training and validation sets and train the population until the early-stopping epoch. Use the mean weight of the top 50% of candidates for evaluation on the test set.
- In this project, 8192 portfolio-weight candidates over 33 ETFs are optimized in parallel via PyTorch's autograd package on an RTX 2060 GPU (sketched below).
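A condensed sketch of the population scheme, with simplifications of my own: a plain Sharpe reward without transaction costs, the top 50% ranked by training reward, placeholder data, and an assumed patience of 10 epochs. Only the structure mirrors the method:

```python
import torch

def sharpe(w, returns, eps=1e-8):
    pr = returns @ w.T                           # (n_days, n_candidates)
    return pr.mean(dim=0) / (pr.std(dim=0) + eps)

P, A = 8192, 33                                  # population size, n assets
logits = torch.randn(P, A, requires_grad=True)   # differently initialized
opt = torch.optim.Adam([logits], lr=1e-2)
train = torch.randn(750, A) * 0.01               # placeholder returns
valid = torch.randn(250, A) * 0.01

best_val, patience = float("-inf"), 0
for epoch in range(200):
    reward = sharpe(torch.softmax(logits, -1), train)  # (P,) all at once
    loss = -reward.mean()                        # one backward updates all
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                        # early-stopping signal:
        w = torch.softmax(logits, -1)
        top = sharpe(w, train).topk(P // 2).indices    # top 50% candidates
        mean_w = w[top].mean(dim=0, keepdim=True)      # their mean portfolio
        val = sharpe(mean_w, valid).item()       # score the mean, not the best
    if val > best_val:
        best_val, patience = val, 0
    else:
        patience += 1
        if patience >= 10:
            break                                # early-stopping epoch found
```

Once the early-stopping epoch is found, the same loop is re-run on the combined train and validation data up to that epoch, and the mean weight of the top 50% is evaluated once on the test set.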
10. Conclusion & Future Work
- Paper trading has been running for the past 3 months and has performed in line with the backtest results. The next goal is to start live trading.
- Instead of optimizing the divergence threshold, train a basic model that decides when to rebalance the portfolio back to the selected constant weights, or to liquidate it, using the diverged weights and the selected portfolio's returns as inputs.
- Explore further evolutionary approaches for accelerating training, such as resampling the bottom X% of candidates from the distribution of the remaining candidates (sketched below).
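One possible reading of that resampling step, as a sketch; the diagonal-Gaussian refit and the function name `resample_bottom` are assumptions of mine, not a decided design:

```python
import torch

# Hypothetical resampling step: redraw the bottom `frac` of candidates from
# a diagonal Gaussian fitted to the surviving (better) candidates.
def resample_bottom(logits, reward, frac=0.2):
    P, A = logits.shape
    k = int(P * frac)
    order = reward.argsort()                 # ascending: worst first
    survivors = logits[order[k:]]
    mu, sigma = survivors.mean(dim=0), survivors.std(dim=0)
    new_logits = logits.clone()
    new_logits[order[:k]] = mu + sigma * torch.randn(k, A)
    return new_logits                        # call between training epochs
```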
- Real-time 3D visualization of the population during training using TensorBoard.
11. Who Am I?
I am the Chief Data Scientist (CDS) of Hawk:AI, an Anti-Money Laundering startup. I was previously CDS at ConnectedLife GmbH, a global all-in-one Smart Living & Healthcare technology provider. I founded AI startups (LivingRooms GmbH & OTA Expert Inc) and worked at internationally reputable research institutes, including the Socio-Digital Systems (Human Experience & Design) group in the Computer Mediated Living laboratory of Microsoft Research Cambridge (MSRC) and the Quality & Usability group of Deutsche Telekom Innovation Laboratories (T-Labs), besides the Computer Vision & Pattern Analysis (VPALAB), Computer Graphics (CGLAB) and Distributed Artificial Intelligence (DAI-Labor) laboratories of Sabanci University & TU-Berlin, where I co-authored 35+ publications on AI.