
Deep Analytics

QuantMinds 2019 presentation by Antoine Savine and Brian Huge of Danske Bank
How to incorporate deep learning into risk management and regulatory systems, and how to structure and train deep neural nets for the revaluation of derivatives trading books.



  1. 1. Deep Analytics Brian Huge and Antoine Savine, Danske Bank
  2. 2. Part I Revaluation context
  3. 3. Revaluation problem • The key problem common to regulations  XVA, CCR, FRTB, PRIIPS, SIMM/MVA, …  Estimate the distribution of value/risk on future "exposure" dates • Methodology  Generate scenarios for the state of the market on exposure dates  Sometimes called 'outer simulations'  Sampled from  Calibrated model / risk-neutral probability Q (XVA)  Historical model / probability P (CCR)  Prescribed by regulator / probability R (PRIIPS) • Then, compute future book values/risks in all scenarios • Regulators insist that revaluation be consistent with FO practice [Diagram: outer distribution P/Q/R from today's market to the exposure date; scenario i maps to a book value Vi = f(Si), i = 1, …, N]
  4. 4. Brute force revaluation • Force full FO reval on every simulated scenario • Massive computation burden  All trades in the book must be valued  One by one  In (up to) thousands of scenarios  On (up to) hundreds of exposure dates  Together with all risks for SIMM/MVA (cheap with AAD) • Industry has been actively researching computationally efficient solutions • See e.g. Andreasen, "XVA on iPad Mini" presentations
  5. 5. Reval trading book as one trade • Transaction: collection of event-driven cash-flows • Cash-flow: payment of a function(al) of the state vector (S) path up to the payment date: $CF_p = f_p\left(S_t,\, 0 \le t \le T_p\right)$ • Trading book: collection of transactions = collection of cash-flows  Aggregates CFs from all the transactions  Itself a (meta-)transaction • Book: hybrid derivative, depends on different assets from different classes • Valued in a hybrid model  Joint, arbitrage-free model on all relevant assets (a.k.a. state vector S)  With correlation/dependence assumptions  Calibrated to European option prices / implied volatility surfaces / marginal distributions  Written under some risk-neutral measure (even if the outer model is not)  High dimensional, generally implemented with (inner) Monte-Carlo simulations • Trading book valued as one transaction in one run of the hybrid model • (In practice, we may want to split cash-flows into a small number of sub-books)
  6. 6. Nested simulations • Outer simulations: sample market states simulated under P/Q/R • Revaluation in each scenario:  Hybrid (Q) model  Implemented with (inner) simulations  Nested simulations • Better than brute force  One set of nested simulations  Shared by all transactions • But the computational burden is still far too high [Diagram: outer simulations under P/Q/R from today's market to the exposure date, then inner simulations under Q producing book values V1, …, VN]
  7. 7. Conventional approximations • Temptation to reduce the computation burden at the expense of accuracy • For example: reduce the number of inner simulations • Widespread practice: conventional approximations  Replace nested sims with "light" closed-form approximations  Hand-crafted for different types of transactions [Diagram: no more nested simulations; closed-form approximations instead, with analytic approx. giving book value Vi ≈ g(Si) in each scenario]
  8. 8. Approximate analytics • Flawed in major ways:  High cost: develop/maintain approximate analytics for every transaction and every model  Typically inaccurate and biased: not numerical methods, no convergence property  Inconsistent with FO pricing: violates the regulatory requirement  Forces representation of transactions by trade, not by cash-flow • But they do resolve the computation burden • Can we design approximate analytics without the flaws? • Yes: Machine Learning analytics
  9. 9. ML pricing • Pricing problem (different from the reval problem)  Given dynamics Q and cash-flow CF, find an analytic formula for EQ[CF]  As a function of model parameters: spot, volatility, (volatility of volatility), …  And product parameters: strike, expiry, (barrier), … • Solutions  Exact formulas only for the simplest models (like Black-Scholes) and products (Europeans, barriers)  In all other cases, slower numerical methods are necessary  Many applications require fast pricing, including calibration and European risk management  Widely accepted solution: approximate analytics  Manually find approximate but precise closed-form solutions  Generally by working on the stochastic equations  (Major) example: SABR, Pat Hagan, 2001 • Costly and specific  Requires (considerable) human expertise and effort  Solves one model, one product at a time, not reusable
  10. 10. ML pricing (2) • Major current trend  McGhee, Ferguson-Green, Horvath et al., … 2018-2019  Find analytic approximations by machine learning  Machines learn from data, not mathematics  But find analytic approximations all the same  Automatically, without the human cost  In principle, for any model and any product  Using universal approximators, with convergence guarantees • A (smart, efficient) tabular approach  PV = f(mdl_prm, prd_prm)  Train a universal approximator on prices produced with slower numerical methods  Generalize to out-of-sample parameter sets • Train once offline, use forever • The trained approximator is the analytic
  11. 11. ML reval • Revaluation is a different problem  We need future_value = f(future_state), given fixed model and product parameters  Not a tabular approach: trains on path-wise samples, not prices  Disposable approximators: train once, use once  Must train online, quickly, without human supervision • But the core idea applies:  Find analytic approximation by ML techniques  Train universal approximators  Reap benefits of analytic approximations without the cost and flaws
  12. 12. ML reval (2) • Principle: approximate the value function f by a universal approximator f-hat • Value = (unknown) function of state (ignoring discounting): $V_t = E^Q\left[\sum_{T_p \ge t} CF_p \,\middle|\, S_t\right] = f\left(S_t\right)$ • Universal approximator $\hat f\left(S_t; w\right)$  Parameterized family of functions of state  Asymptotically guaranteed to approximate any function to arbitrary accuracy  Example: linear combination of (fixed) basis functions $\hat f\left(S_t; w\right) = \sum_i w_i\, g_i\left(S_t\right)$  Where the gi form a basis of the function space, e.g. polynomials, splines, Fourier basis, …
  13. 13. Simulated training set • Simulate a training set to calibrate the approximator  Generate m scenarios under the hybrid model Q  For each scenario i, denote the realizations $X^{(i)} = S_{T_{ex}}^{(i)}$ and $Y^{(i)} = \sum_{T_p > T_{ex}} CF_p^{(i)}$ (ignoring discounting) • The simulated training set is $\left\{\left(X^{(i)}, Y^{(i)}\right),\, 1 \le i \le m\right\}$; X(i) is a vector, Y(i) is a real number • No nested simulations • Value given by the trained approximator  Target: $V_{T_{ex}} = E^Q\left[\sum_{T_p > T_{ex}} CF_p \,\middle|\, S_{T_{ex}}\right] = f\left(S_{T_{ex}}\right)$
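A minimal sketch of such a simulated training set, assuming a one-dimensional Black-Scholes model and a single European call cash-flow paying at T_mat after the exposure date T_ex. All names and parameters (spot, vol, rate, strike, dates, sample size) are illustrative placeholders, not the authors' engine.

```python
# Minimal sketch of a simulated training set, assuming a one-dimensional
# Black-Scholes model and a single European call cash-flow. All parameters
# (spot, vol, rate, strike, dates, sample size) are illustrative placeholders.
import numpy as np

def simulate_training_set(m=65536, S0=1.0, sigma=0.2, r=0.0,
                          K=1.0, T_ex=1.0, T_mat=2.0, seed=1234):
    rng = np.random.default_rng(seed)
    # state at the exposure date (the feature X)
    z1 = rng.standard_normal(m)
    X = S0 * np.exp((r - 0.5 * sigma**2) * T_ex + sigma * np.sqrt(T_ex) * z1)
    # continue the path to maturity and evaluate the cash-flow (the label Y)
    dt = T_mat - T_ex
    z2 = rng.standard_normal(m)
    S_T = X * np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z2)
    Y = np.maximum(S_T - K, 0.0)     # one noisy payoff per scenario, no nested sims
    return X, Y

X, Y = simulate_training_set()
```

Each Y is a single noisy realization whose conditional expectation given X is the value function to be learned.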
  14. 14. Training the approximator • Value: $V_{T_{ex}} = E^Q\left[\sum_{T_p > T_{ex}} CF_p \,\middle|\, S_{T_{ex}}\right] = E\left[Y \mid X\right] = h^*(X)$, where $h^* = \arg\min_h E\left[\left(Y - h(X)\right)^2\right]$ (by definition of the conditional expectation) • Approximate the minimization: $\min_h E\left[\left(Y - h(X)\right)^2\right] \approx \min_w E\left[\left(Y - \hat f(X; w)\right)^2\right]$ (because f-hat is a universal approximator that encodes functions in its weights, asymptotically) $\approx \min_w \frac{1}{m}\sum_{i=1}^m \left(Y^{(i)} - \hat f\left(X^{(i)}; w\right)\right)^2$ (because the training examples were independently sampled from the correct distribution) • So value = conditional expectation = minimum of the usual cost function (mean squared error)
  15. 15. Training a linear approximator • Particular case: linear approximator $\hat f\left(X^{(i)}; w\right) = \sum_j w_j\, g_j\left(X^{(i)}\right)$ • Then $w^* = \arg\min_w \frac{1}{m}\sum_{i=1}^m \left(Y^{(i)} - \hat f\left(X^{(i)}; w\right)\right)^2$ is known in closed form  Normal equation: $w^* = \left(G^T G\right)^{-1} G^T Y$  Where $G = \begin{pmatrix} g_1\left(X^{(1)}\right) & \cdots & g_n\left(X^{(1)}\right) \\ \vdots & \ddots & \vdots \\ g_1\left(X^{(m)}\right) & \cdots & g_n\left(X^{(m)}\right) \end{pmatrix}$  And potential (near) singularity of $G^T G$ is classically corrected by  Cutting small singular values in the SVD decomposition $G = U D V^T$  Or Tikhonov regularization $w^* = \left(G^T G + \lambda I\right)^{-1} G^T Y$  Which also mitigates overfitting but requires a hyper-parameter lambda (see the sketch below)
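A minimal sketch of the closed-form fit above (normal equation with Tikhonov regularization), assuming a monomial basis and a toy (X, Y) set of the kind simulated earlier; the basis, degree and lambda are illustrative choices, not prescriptions.

```python
# Minimal sketch of the Tikhonov-regularized normal equation for a linear
# approximator with a monomial basis. The toy training set regenerated below
# stands in for the (X, Y) arrays from the earlier simulation sketch.
import numpy as np

rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=0.2, size=10000)
Y = np.maximum(X * rng.lognormal(mean=-0.02, sigma=0.2, size=10000) - 1.0, 0.0)

def fit_linear_approximator(X, Y, degree=5, lam=1e-6):
    # design matrix G: one column per basis function g_j(X) = X^j
    G = np.vander(X, N=degree + 1, increasing=True)          # shape (m, n)
    n = G.shape[1]
    # Tikhonov-regularized normal equation: w* = (G'G + lam I)^{-1} G'Y
    return np.linalg.solve(G.T @ G + lam * np.eye(n), G.T @ Y)

def predict(w, x):
    return np.vander(np.atleast_1d(x), N=len(w), increasing=True) @ w

w = fit_linear_approximator(X, Y)
print(predict(w, [0.8, 1.0, 1.2]))   # approximate values in three scenarios
```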
  16. 16. The LSM algorithm • Previous slides discussed training linear approximators on simulated data  Also known as the LSM algorithm  Longstaff-Schwartz, 2001 and Carriere, 1996 • Well known in the derivatives industry  Applied for two decades  For Bermudas/Callables  (So we have been applying machine learning for a long time after all!) • Approximates continuation values  With (universal) linear approximators  Trained on simulated datasets  With fixed basis functions (often polynomials or splines) • Works well for Bermudas/Callables  Especially with POI (proxies only in indicators, see Huge-Savine, LSM Reloaded, 2017)  See also QuantMinds 2017 for application to XVA/RWA • Not so well for revaluation of trading books
  17. 17. The problem with linear models • For Bermudas/Callables:  We generally know what variables affect the (continuation) value  E.g. for a standard Bermuda: swap to maturity and discount to next call (and perhaps volatility)  So we pick appropriate basis functions / features with a hard-coded rule • For trading book revaluation:  The choice of basis functions depends on the cash-flows in the trading book  European options: only non-linear functions of the state variables  Basket options: only non-linear functions of the basket • Basis functions must be found from the cash-flows in the book (automatically)
  18. 18. How to find your basis functions • One possible strategy:  Inspect the cash-flows in the book  Find the major axes of non-linearity  Select basis functions of these axes  Not discussed further here, see scripting (Savine, QM2018 – Andreasen, QM2019) • Another possibility:  Find basis functions from simulated dataset  With neural nets (a.k.a deep learning) • Neural networks:  Extension of linear models  Basis functions are learned from the dataset  In ML/DL lingo: automatic feature extraction  Also universal approximators (Universal Approximation Theorem, see e.g. Horvath, 2019)
  19. 19. Part II Neural Revaluation
  20. 20. Feed-forward neural nets • Input layer: $z_0 = x$ • Hidden layers: $z_1, \ldots, z_{L-1}$ • Output layer: $\hat y = z_L$ • Feed-forward equation: $z_l = w_l\, g\left(z_{l-1}\right) + b_l$ • g: activation, a non-linear scalar function applied element-wise
  21. 21. Automatic feature extraction • $h = g\left(z_{L-1}\right)$: vector of basis functions of x for the regression • Output: linear regression on h, $\hat y = w_L\, h + b_L$ • The hidden layers learn/build/encode the basis vector h (see the sketch below)
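A minimal sketch of the feed-forward equations above, assuming a scalar input, softplus activations and illustrative layer sizes; the first layer is linear in x, as on the back-propagation slide further down, and the weights here are random because training is the subject of the following slides.

```python
# Minimal sketch of the feed-forward equations, with softplus activation.
# Layer sizes and weights are illustrative; this is an untrained network.
import numpy as np

def softplus(z):
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)   # numerically stable

def init_net(sizes, seed=0):
    rng = np.random.default_rng(seed)
    return [(rng.standard_normal((n_out, n_in)) / np.sqrt(n_in), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def feed_forward(params, x):
    (W1, b1), rest = params[0], params[1:]
    z = W1 @ x + b1                               # z1 = w1 x + b1
    for W, b in rest:
        z = W @ softplus(z) + b                   # z_l = w_l g(z_{l-1}) + b_l
    return z                                      # y_hat = z_L

params = init_net([1, 20, 20, 1])
x = np.array([1.05])                              # one market state
print(feed_forward(params, x))                    # untrained value estimate
```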
  22. 22. Deeply learning future prices • Training: find optimal connection weights $w^* = \arg\min_w \frac{1}{m}\sum_{i=1}^m \left(Y^{(i)} - \hat f\left(X^{(i)}; w\right)\right)^2$ • Find regression weights and basis function encoding at the same time • No closed-form solution • Not even a convex problem • No algorithm with guaranteed convergence • Versions of gradient descent work well in practice but without guarantee
  23. 23. Deeply Learning Finance • Financial DL is fundamentally different from classic ML contexts  We train ANNs on realizations  We target values = conditional expectations  We don't attempt to predict realizations  We know that the target (value) is a fixed, deterministic function of the input (state)  In classic ML, the dependency of output on input is not guaranteed and may change with time  We train on simulated data  Guaranteed sampling of the correct distribution  Guaranteed independence of samples  In classic ML, access to clean, IID data may be a major challenge
  24. 24. Deeply learning Finance (2) [Chart: payoff max(S2-K, 0)]
  25. 25. Challenges • Overfitting  Training on realizations is more prone to overfit noise in the training set  Classic DL applies regularization  Tikhonov: encourage small weights, $w^* = \arg\min_w \frac{1}{m}\sum_{i=1}^m \left(Y^{(i)} - \hat f\left(X^{(i)}; w\right)\right)^2 + \frac{\lambda}{n}\left\|w\right\|^2$  Dropout: randomly drop units during training  Early stopping: stop when the cross-validation error starts increasing • Sensitivities  We need good approximations not only of values  But also of risks  Explicitly for SIMM/MVA  Implicitly for FRTB (ordering)  Universal Approximation Theorem  Extends to differentials, see Horvath et al., 2019  But only asymptotically  With a finite capacity/training set, derivatives may be wrong  "Derivatives of a good approx. are not a good approx. of derivatives" [Charts: value and delta of an ANN trained on simulated Black-Scholes data]
  26. 26. Challenges (2) • Extrapolation  ANNs struggle to learn the extrapolation slope  Extrapolation is key for ordering (FRTB/ES) • "Unsupervised" Supervised Learning  The ANN loss is not a convex function of the weights  Optimization algorithms are not guaranteed to converge  In practice, variants of SGD like ADAM perform well in many cases  But without guarantee: cannot run risk management on faith  OK for training a network once and reusing it forever  Like in pricing problems a la McGhee/Ferguson-Green/Horvath et al.  But not for revaluation  Networks only applicable to a specific book  With specific model parameters calibrated to today's market  Train once, use once: disposable networks  Must train in seconds to minutes, on their own  With some guarantee of finding a "decent" function [Charts: value and delta from another run of an ANN trained on Black-Scholes simulated data]
  27. 27. Activation • Gaussian quadratures:  We approximate an integral $\int_a^b f(x)\,dx = \int_a^b W(x)\, g(x)\,dx \approx \sum_{j=1}^n w_j\, g\left(x_j\right)$  By choosing both weights and abscissas  The sum is exact for polynomials up to degree 2n-1  Choosing the split $f(x) = W(x)\, g(x)$ wisely gives a better approximation • Universal Approximation Theorem:  We can approximate $f\left(X_T\right) \approx \hat f\left(X_T; w\right)$ with a 1-layer neural net  The UAT is a limit statement; for some functions we may need a lot of connections and weights • If we can choose activation functions appropriately we will  Need fewer connections and weights  Have better extrapolation properties  End up with financially meaningful results • Revaluation = conditional expectation  helps reason about activation
  28. 28. Activation (2) • For a European payoff in 1 dimension: $V_S = E\left[f\left(S_T\right) \mid S\right] = \int f(x)\,\varphi_S(x)\,dx$  Integrating by parts turns the density (straddles) into the CDF (digitals), then into call/put values (Carr-Madan) • Using Carr-Madan gives a weighted sum of call values: $V_S \approx \sum_i w_i\, c_S\left(k_i\right)$  The value of a piecewise linear payoff • From samples of the conditional expectation we optimize over the weights and strikes $\left(w_i, k_i\right)$  We find the best piecewise linear payoff to approximate (hedge) the actual payoff • Alternatives: hedge using digitals (CDF) or straddles (density)
  29. 29. Activation (3) [Charts: interpolation, payoff approximation and extrapolation with call, digital and straddle activations] • Straddle has constant extrapolation with the same value left and right • Digital has constant extrapolation, different left and right • Call allows linear extrapolation
  30. 30. Activation (4) • We do not know the call value function in the underlying model, or it is too slow to calculate • We approximate the call value with a sum of calls from a different model • Softplus $\log\left(1+e^{x}\right)$ is the call value in a logistic distribution with density $\frac{1}{v}\,\frac{e^{x/v}}{\left(1+e^{x/v}\right)^2}$, where $x = s - k$ is moneyness (numerical check below) • Examples used in finance  Jump diffusion models: $c_J(k) = \sum_{i \ge 0} e^{-\lambda T}\,\frac{(\lambda T)^i}{i!}\, c_i(k)$  Mixture models: $c_M(k) = \sum_{i=1}^n p_i\, c_i(k)$  Change of measure: $c(k) = \int \varphi(x)\,(x-k)^+\,dx = \int \tilde\varphi(x)\,\frac{\varphi(x)}{\tilde\varphi(x)}\,(x-k)^+\,dx$, i.e. use Carr-Madan with payoff $\frac{\varphi(x)}{\tilde\varphi(x)}\,(x-k)^+$ under a proxy density
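A quick numerical check of the softplus claim above: under a logistic distribution with location mu and scale v, the call value E[(S - k)^+] equals v times softplus of the moneyness over v. The parameters below are illustrative.

```python
# Numerical check: softplus is the call value under a logistic distribution.
# With a logistic underlying of location mu and scale v,
# E[(S - k)^+] = v * softplus((mu - k) / v). Parameters are illustrative.
import numpy as np

def softplus(z):
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def logistic_call_mc(mu, v, k, n=2_000_000, seed=0):
    u = np.random.default_rng(seed).uniform(size=n)
    s = mu + v * np.log(u / (1.0 - u))            # logistic samples via inverse CDF
    return np.maximum(s - k, 0.0).mean()

mu, v, k = 100.0, 5.0, 103.0
print(logistic_call_mc(mu, v, k))                  # Monte-Carlo estimate
print(v * softplus((mu - k) / v))                  # closed form via softplus
```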
  31. 31. Activation (5) [Table: for each activation, the distribution for which it is a call value, with the corresponding call, CDF and PDF formulas] • ReLU  Dirac: call $x^+$, CDF $1_{\{x>0\}}$, PDF $\delta(x)$ • ELU  Exponential • Softplus / Sigmoid  Logistic: call $v\log\left(1+e^{x/v}\right)$, CDF $\frac{1}{1+e^{-x/v}}$, PDF $\frac{1}{v}\,\frac{e^{x/v}}{\left(1+e^{x/v}\right)^2}$ • Bachelier  Gaussian: call $x\,N\!\left(\frac{x}{v}\right)+v\,n\!\left(\frac{x}{v}\right)$, CDF $N\!\left(\frac{x}{v}\right)$, PDF $\frac{1}{v}\,n\!\left(\frac{x}{v}\right)$ • Multiquadric  Student's t
  32. 32. Activation (6) • Exotic products  In some cases we may directly observe that the network generalizes  For example, Bermudas may be viewed as Europeans on Europeans on …, i.e. a multi-layer network • If we use iterated expectations: $V_{T_1} = E\left[CF_1 + CF_2 \mid S_{T_1}\right] = E\left[CF_1 + E\left[CF_2 \mid S_{T_2}\right] \mid S_{T_1}\right] = E\left[CF_1 + V_{T_2} \mid S_{T_1}\right] \approx E\left[CF_1 + \hat f\left(S_{T_2}; w_2\right) \mid S_{T_1}\right] \approx \hat f\left(S_{T_1}; w_1\right)$ (two different neural nets) • We find the iterative structure of the neural network
  33. 33. Activation (7) • Asymptotically, we know the activation does not matter  But with a finite network size, not all activations are equivalent • When activations are option values  ReLU, ELU, softplus, Bachelier, multiquadric • The network  approximates values as combinations of Europeans (in approximate models)  approximates payoffs as piece-wise linear  extrapolates linearly • These activations behave best in finance • We found softplus works best for revaluation
  34. 34. Training algorithms • Weight optimization  1st order iterative algorithms like SGD empirically shown to perform well on variety of problems  Differentials of cost function  Computed by back-propagation for a cost similar to one evaluation  Frameworks like TensorFlow or your own/favourite AAD software implement back-prop automatically, behind the scenes • 2nd order algorithms?  (In our experience) ADAM converges significantly faster than vanilla SGD for financial nets  Roughly, ADAM = SGD + momentum + normalization of gradient by variance  Hence, ADAM approximates 2nd order search  Hinting that true 2nd order algorithms (like Levenberg-Marquardt or Conjugate Gradients) may perform better on financial nets (untested)
  35. 35. Regularization • Overfitting  Learn noise specific to the training set, preventing correct generalization  Diagnosed with a small training error combined with a large test error  (Asymptotically) vanishes when the size of the training set grows  Maths here (among other places): https://qr.ae/TWImtE  Of particular concern when training ANNs to produce expectations out of samples • Regularization  Overfitting classically mitigated by regularization  Tikhonov regularization: $\min_w \frac{1}{m}\sum_{i=1}^m \left(Y^{(i)} - \hat f\left(X^{(i)}; w\right)\right)^2 + \frac{\lambda}{n}\left\|w\right\|^2$  Penalize weight size to constrain parameters  Effectively mitigates overfitting by constraining weights  But what is it we prefer about small weights?  If we are going to constrain weights  Can we do this in a way that qualitatively improves the behaviour of the network?
  36. 36. Differential regularization • Idea: constrain weights so the ANN produces correct sensitivities • Value: $V_{T_{ex}} = E^Q\left[\sum_{T_p > T_{ex}} CF_p \,\middle|\, S_{T_{ex}}\right] = E\left[Y \mid X\right]$, so $\frac{\partial V_{T_{ex}}}{\partial S_{T_{ex}}} = E^Q\left[\sum_{T_p > T_{ex}} \frac{\partial CF_p}{\partial S_{T_{ex}}} \,\middle|\, S_{T_{ex}}\right] = E\left[\frac{\partial Y}{\partial X} \,\middle|\, X\right]$ • Train the net to produce correct values: $w^* = \arg\min_w \frac{1}{m}\sum_{i=1}^m \left(Y^{(i)} - \hat f\left(X^{(i)}; w\right)\right)^2$ • Train the net to produce correct sensitivities: $w^* = \arg\min_w \frac{1}{m}\sum_{i=1}^m \left\|\frac{\partial Y^{(i)}}{\partial X^{(i)}} - \frac{\partial \hat f\left(X^{(i)}; w\right)}{\partial X^{(i)}}\right\|^2$ • Combine the two: $w^* = \arg\min_w \frac{1}{m}\sum_{i=1}^m \left(Y^{(i)} - \hat f\left(X^{(i)}; w\right)\right)^2 + \frac{\lambda}{n\,m}\sum_{i=1}^m \left\|\frac{\partial Y^{(i)}}{\partial X^{(i)}} - \frac{\partial \hat f\left(X^{(i)}; w\right)}{\partial X^{(i)}}\right\|^2$ • Compare to Tikhonov: $w^* = \arg\min_w \frac{1}{m}\sum_{i=1}^m \left(Y^{(i)} - \hat f\left(X^{(i)}; w\right)\right)^2 + \frac{\lambda}{n}\left\|w\right\|^2$
  37. 37. Differential regularization (2) • Combined objective: $w^* = \arg\min_w \frac{1}{m}\sum_{i=1}^m \left(Y^{(i)} - \hat f\left(X^{(i)}; w\right)\right)^2 + \frac{\lambda}{n\,m}\sum_{i=1}^m \left\|\frac{\partial Y^{(i)}}{\partial X^{(i)}} - \frac{\partial \hat f\left(X^{(i)}; w\right)}{\partial X^{(i)}}\right\|^2$ • Derivative labels $\frac{\partial Y^{(i)}}{\partial X^{(i)}} = \frac{\partial \sum_{T_p > T_{ex}} CF_p^{(i)}}{\partial S_{T_{ex}}^{(i)}}$  a.k.a. path-wise sensitivities  Simulated with the training set  Fed to the training algorithm • Derivative results $\frac{\partial \hat f\left(X^{(i)}; w\right)}{\partial X^{(i)}}$  a.k.a. (approximate) value sensitivities  Computed in the network
  38. 38. Path-wise sensitivities • Extended training set $\left\{\left(X^{(i)}, Y^{(i)}, \frac{\partial Y^{(i)}}{\partial X^{(i)}}\right),\, 1 \le i \le m\right\}$ with $X^{(i)} = S_{T_{ex}}^{(i)}$ and $Y^{(i)} = \sum_{T_p > T_{ex}} CF_p^{(i)}$  In addition to pathwise states  And cash-flows  We must simulate path-wise differentials  As sensitivity labels for the training set (sketch below) • Produced by classic path-wise AAD over Monte-Carlo simulations  Easily  Efficiently  As long as your Monte-Carlo engine is AAD aware
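A minimal sketch of the extended training set, in the same illustrative Black-Scholes call example: here the path-wise derivative dY/dX happens to be available in closed form, whereas a production Monte-Carlo engine would produce it with path-wise AAD as described above.

```python
# Minimal sketch of the extended training set with derivative labels, assuming
# the same one-dimensional Black-Scholes call example as before. Here
# dY/dX = 1{S_T > K} * S_T / S_Tex is known in closed form; a production engine
# would obtain it with path-wise AAD over the simulation.
import numpy as np

def simulate_extended_training_set(m=65536, S0=1.0, sigma=0.2, r=0.0,
                                   K=1.0, T_ex=1.0, T_mat=2.0, seed=1234):
    rng = np.random.default_rng(seed)
    z1, z2 = rng.standard_normal(m), rng.standard_normal(m)
    X = S0 * np.exp((r - 0.5 * sigma**2) * T_ex + sigma * np.sqrt(T_ex) * z1)
    dt = T_mat - T_ex
    growth = np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z2)
    S_T = X * growth
    Y = np.maximum(S_T - K, 0.0)                   # payoff label
    dYdX = np.where(S_T > K, growth, 0.0)          # path-wise delta label
    return X, Y, dYdX

X, Y, dYdX = simulate_extended_training_set()
```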
  39. 39. A brief reminder of AAD • Principle:  Given any scalar function $y = h\left(x_1, \ldots, x_n\right)$  Coded as a computer program  Compute all the differentials $\frac{\partial y}{\partial x_i}$  In a time similar to one function evaluation  Automatically • How:  Build the evaluation graph of h (automatically, e.g. with operator overloading)  That is, split the calculation into elementary building blocks: +, -, *, /, log, exp, sqrt, …  Whose derivatives are known  Denote them $x_i = f_i\left(x_j,\, j \in A_i\right)$, with $A_i$ = ancestors of $x_i$, so $j < i$  Inputs $x_1, \ldots, x_n$, then operations $x_{n+1}, \ldots, x_{N-1}$, then the result $x_N$  The sequence of operations and dependencies is called the 'tape' in AAD lingo
  40. 40. AAD • Apply the chain rule in reverse order through the tape:  $\frac{\partial y}{\partial x_N} = 1$ because $x_N = y$  $\frac{\partial y}{\partial x_j} = \sum_{i \in S_j} \frac{\partial y}{\partial x_i}\,\frac{\partial x_i}{\partial x_j}$ by the chain rule, with $S_j$ = successors of $x_j$, so $i > j$ • Which gives us the reverse propagation algorithm for derivatives (sketch below):  Evaluation runs forward through the tape: $x_1 \to x_2 \to \cdots \to x_N$, with $x_i = f_i\left(x_j,\, j \in A_i\right)$  Differentiation runs backward: $\frac{\partial y}{\partial x_N} \to \cdots \to \frac{\partial y}{\partial x_1}$, accumulating $\frac{\partial y}{\partial x_j} \mathrel{+}= \frac{\partial y}{\partial x_i}\,\frac{\partial f_i}{\partial x_j}$ for $j \in A_i$
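A tiny illustration of the reverse sweep above on a hand-built tape, assuming the toy function y = exp(x1*x2) + x1; a real AAD library would build the tape automatically via operator overloading.

```python
# Hand-built tape for y = exp(x1 * x2) + x1, then one backward sweep that
# accumulates dy/dx_j += dy/dx_i * df_i/dx_j over the tape (reverse chain rule).
import math

x1, x2 = 1.5, -0.7

# forward sweep: record (value, {ancestor index: local partial}) for each node
tape = [
    (x1, {}),                                  # node 0: input x1
    (x2, {}),                                  # node 1: input x2
]
v2 = x1 * x2;      tape.append((v2, {0: x2, 1: x1}))          # node 2: x1*x2
v3 = math.exp(v2); tape.append((v3, {2: v3}))                 # node 3: exp
v4 = v3 + x1;      tape.append((v4, {3: 1.0, 0: 1.0}))        # node 4: y

# backward sweep: adjoints, starting from dy/dy = 1
adj = [0.0] * len(tape)
adj[-1] = 1.0
for i in range(len(tape) - 1, -1, -1):
    for j, partial in tape[i][1].items():
        adj[j] += adj[i] * partial

print("y      =", tape[-1][0])
print("dy/dx1 =", adj[0], "(expect x2*exp(x1*x2) + 1)")
print("dy/dx2 =", adj[1], "(expect x1*exp(x1*x2))")
```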
  41. 41. Path-wise sensitivities with AAD • Path-wise Monte-Carlo simulation 1. Draw random numbers (typically, Brownian increments) 2. Run the SDE to generate the path: $S_0 \to S_{T_1} \to \cdots \to S_{T_{ex}} = X \to \cdots \to S_{T_k} \to \cdots \to S_{T^*}$ 3. Evaluate the cash-flows: $Y = \sum_p CF_p = f\left(S_{T_{ex}}, \ldots, S_{T_k}, \ldots, S_{T^*}\right)$ • An AAD-instrumented simulation automatically puts this sequence (path simulation, then CF evaluation) on tape • Running the adjoint equation backwards, we get all the differentials: $\frac{\partial Y}{\partial CF_P}, \ldots, \frac{\partial Y}{\partial CF_1}, \frac{\partial Y}{\partial S_{T^*}}, \ldots, \frac{\partial Y}{\partial S_{T_{ex}}}, \ldots, \frac{\partial Y}{\partial S_0}$ • Including the path-wise derivative label for training nets: $\frac{\partial Y}{\partial X} = \frac{\partial Y}{\partial S_{T_{ex}}}$
  42. 42. • Path-wise sensitivities  Produced by the simulation engine  Efficiently and conveniently, with path-wise AAD  Fed to the training algorithm as derivative labels • We also need value sensitivities  Produced by the neural net  Computed with classic back-propagation equations  Also, efficiently and conveniently  Which is not surprising, since back-prop is also a form of AAD Value sensitivities
  43. 43. Back-propagation • Feed-forward (g: softplus): $z_0 = x$, $z_1 = w_1 x + b_1$, $z_2 = w_2\, g\left(z_1\right) + b_2$, $\hat y = z_3 = w_3\, g\left(z_2\right) + b_3$, i.e. $z_l = w_l\, g\left(z_{l-1}\right) + b_l$ • Back-prop (g': sigmoid), the adjoint of feed-forward: $\frac{\partial \hat y}{\partial z_3} = 1$, then $\frac{\partial \hat y}{\partial z_{l-1}} = \frac{\partial \hat y}{\partial z_l}\,\frac{\partial z_l}{\partial z_{l-1}} = g'\left(z_{l-1}\right) \circ \left(w_l^T\, \frac{\partial \hat y}{\partial z_l}\right)$, and finally $\frac{\partial \hat y}{\partial x} = w_1^T\, \frac{\partial \hat y}{\partial z_1}$ • Feed-forward is a sequence of matrix operations, layer 0 to L • Back-prop is another sequence of matrix operations with the same weights, layer L to 0
  44. 44. Value sensis with back-prop • Neural net with integrated back-prop  Back-prop: additional layers, also matrix operations  The net doubles in depth, with an unchanged number of weights  Weight sharing  regularization  Doubles the cost of traversing the net but estimates all sensis in addition to values • Feed-forward: $z_0 = x$, $z_1 = w_1 x + b_1$, $z_2 = w_2\, g\left(z_1\right) + b_2$, $\hat y = w_3\, g\left(z_2\right) + b_3$ • Back-prop = more feed-forward: $\frac{\partial \hat y}{\partial z_2} = g'\left(z_2\right) \circ w_3^T$, $\frac{\partial \hat y}{\partial z_1} = g'\left(z_1\right) \circ \left(w_2^T\, \frac{\partial \hat y}{\partial z_2}\right)$, $\frac{\partial \hat y}{\partial x} = w_1^T\, \frac{\partial \hat y}{\partial z_1}$ (sketch below)
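A minimal NumPy sketch of the twin network above, assuming illustrative layer sizes and random (untrained) weights: one forward sweep gives the value, one backward sweep through the same weights gives the sensitivity.

```python
# "Twin" network: a forward sweep produces the value y_hat, and a backward
# sweep through the same weights produces dy_hat/dx. g is softplus, so g' is
# the sigmoid, as in the slides. Sizes and weights are illustrative.
import numpy as np

def softplus(z):
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def twin_net(x, W1, b1, W2, b2, W3, b3):
    # feed-forward
    z1 = W1 @ x + b1
    z2 = W2 @ softplus(z1) + b2
    y  = W3 @ softplus(z2) + b3              # y_hat
    # back-prop written as more feed-forward layers with the same weights
    dz2 = sigmoid(z2) * W3.ravel()           # dy/dz2 = g'(z2) o W3^T
    dz1 = sigmoid(z1) * (W2.T @ dz2)         # dy/dz1 = g'(z1) o (W2^T dy/dz2)
    dx  = W1.T @ dz1                         # dy/dx  = W1^T dy/dz1
    return y, dx

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((20, 1)), np.zeros(20)
W2, b2 = rng.standard_normal((20, 20)) / np.sqrt(20), np.zeros(20)
W3, b3 = rng.standard_normal((1, 20)) / np.sqrt(20), np.zeros(1)

y, dydx = twin_net(np.array([1.0]), W1, b1, W2, b2, W3, b3)
print(y, dydx)     # value and sensitivity from one double-depth traversal
```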
  45. 45. Back-prop through value sensis • Back-prop written as additional feed-forward layers with shared weights • ANN now outputs values and sensis • (Low) cost: double net depth  double computation/train time • Valuation + differentiation written as sequence of matrix operations • This whole sequence is:  Efficiently differentiated for training  With another round of back-prop  Automatic with TensorFlow or your own/favourite DL or AAD framework  Of course, you can also do it manually and explicitly if you wish  And see how it is another sequence of matrix operations (not shown here)  Explicit second order differentiation is not necessary
  46. 46. (Simple) classic TensorFlow code
  47. 47. TensorFlow code (2)
  48. 48. TensorFlow code (3)
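The three slides above are titles of code screenshots that did not survive this transcript. As a stand-in, here is a minimal differential-training sketch in TensorFlow 2 (the original appears to use classic, graph-mode TensorFlow); the toy data, network sizes and lambda weight are illustrative, not the authors' code.

```python
# Differential training sketch: the loss penalizes both value errors and
# sensitivity errors, with dy_hat/dx obtained by back-prop inside the loss.
import numpy as np
import tensorflow as tf

# toy training set with derivative labels (stand-in for the Black-Scholes sketch)
rng = np.random.default_rng(0)
X = rng.lognormal(0.0, 0.2, size=(16384, 1)).astype(np.float32)
growth = rng.lognormal(-0.02, 0.2, size=(16384, 1)).astype(np.float32)
Y = np.maximum(X * growth - 1.0, 0.0).astype(np.float32)
dYdX = np.where(X * growth > 1.0, growth, 0.0).astype(np.float32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(20, activation="softplus"),
    tf.keras.layers.Dense(20, activation="softplus"),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.Adam(1e-3)
lam = 1.0

@tf.function
def train_step(x, y, dydx):
    with tf.GradientTape() as tape:
        with tf.GradientTape() as inner:
            inner.watch(x)
            y_hat = model(x)
        dy_hat = inner.gradient(y_hat, x)               # value sensitivities
        loss = tf.reduce_mean((y - y_hat) ** 2) \
             + lam * tf.reduce_mean((dydx - dy_hat) ** 2)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

for step in range(100):
    loss = train_step(tf.constant(X), tf.constant(Y), tf.constant(dYdX))
print(float(loss))
```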
  49. 49. Results • Effective regularization • The net learns to produce correct sensis • And to compute them efficiently • Additional benefits:  Improves the correctness of the value function shape  Improves the ordering of paths against the state  Mitigates the extrapolation challenge [Charts: fits with no regularization vs. differential regularization]
  50. 50. Regularization in classic ML • Textbook ML example (Bishop):  Data comes from $y = \sin(2\pi x) + \text{noise}$  Fit a degree-9 polynomial model, linear in the weights: $\hat y = \sum_{i=0}^{9} w_i\, x^i$  On 10 (noisy) training examples (sketch below) • Result:  Classic overfitting  Perfect fit of the training examples  Completely wrong generalization • Observe:  Training values are perfectly fitted  But their sensitivities are completely wrong
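A minimal sketch of the textbook example above: 10 noisy samples of sin(2πx) fitted exactly by a degree-9 polynomial; the unregularized fit matches the training values while generalizing badly between and beyond the samples.

```python
# 10 noisy samples of sin(2*pi*x), fitted exactly by a degree-9 polynomial
# (no regularization): training values are reproduced perfectly, the shape
# and its derivatives are wrong away from the samples.
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0.0, 1.0, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.standard_normal(10)

G = np.vander(x_train, N=10, increasing=True)        # degree-9 monomial basis
w = np.linalg.solve(G, y_train)                       # exact interpolation

x_test = np.linspace(0.0, 1.0, 101)
y_fit = np.vander(x_test, N=10, increasing=True) @ w
print("max training error:", np.max(np.abs(G @ w - y_train)))                     # ~0
print("max test error    :", np.max(np.abs(y_fit - np.sin(2 * np.pi * x_test))))  # much larger
```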
  51. 51. Tikhonov regularization [Charts: fits for lambda = 0.00001, 0.0001, 0.001, 0.01, 0.1, 1.0]
  52. 52. Differential regularization [Charts: fits for lambda = 0.00001, 0.001, 0.01, 0.1, 1.0, 10.0]
  53. 53. Regularization: ML and Finance • Differential regularization is  Natural and Powerful  Efficient and Easy (modulo a good understanding of AAD and back-prop) • Why is it (to our knowledge) unknown to ML?  ML mostly deals with real world data  No differential labels • In finance:  We have control over data (since we simulate it)  We can leverage it to find more powerful ML algorithms • Importantly:  Only part of the work is on ANN design and training  Most work is on data simulation  This is where we need other powerful financial techniques  Scripting and otherwise handling of cash-flows  Hierarchical models of everything  Generic parallel simulation engines  AAD
  54. 54. The extrapolation challenge • Neural nets struggle on extrapolation • Known problem, same (worse) with linear models • Bad extrapolation not acceptable in Finance  Messes with ordering, damaging ES computations, e.g. for FRTB  Inaccurate revaluation in stress scenarios  Wrong assessment and management of tail risk • Differential regularization  Helps  But does not eliminate problem • (In our experience)  Classic ML/DL methods help only marginally  For example, early stopping may prevent overfitting extrapolation slope  But does not help finding the correct extrapolation slope in the first place • We need a specific solution
  55. 55. Sol 1: widen sampling dist. of X • Sample from a wider distribution Q'  For example, increase the volatility of the simulation over the period from today (0) to Tex  Leaving parameters unchanged after Tex  So the relation between X and Y is unchanged (because Y = functional of the S path after Tex)  But the X tails are pushed further away • Effectively resolves extrapolation • Without any work on the nets, only on the simulation • Implementation  Trivial with one exposure date Tex  With multiple exposure dates  Requires different simulation sets for each exposure date  At significant computational cost for the production of the training sets  Can be avoided by importance sampling (sketch below)  Simulate the full path from the wider distribution Q'  Weight training labels by likelihood ratios: $X = S_{T_{ex}}$, $Y = \left(\sum_{T_p > T_{ex}} CF_p\right)\frac{dQ}{dQ'}$ [Diagram: widened distribution of states between today's market and the exposure date]
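A minimal sketch of one way to implement the likelihood-ratio weighting above in the illustrative Black-Scholes example: the whole path is simulated with a wider volatility, and the label is weighted by the likelihood ratio of the post-exposure increment so that its conditional expectation given X is unchanged. This reading of the slide, and all parameters, are assumptions for illustration.

```python
# Importance-sampling sketch: wider volatility sigma_wide pushes the tails of
# X out; the post-exposure likelihood ratio dQ/dQ' restores the Q-conditional
# expectation of the weighted label. Parameters are illustrative.
import numpy as np

def norm_logpdf(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2 * np.pi))

def wide_training_set(m=65536, S0=1.0, sigma=0.2, sigma_wide=0.4, r=0.0,
                      K=1.0, T_ex=1.0, T_mat=2.0, seed=1234):
    rng = np.random.default_rng(seed)
    z1, z2 = rng.standard_normal(m), rng.standard_normal(m)
    # simulate the full path from the wider distribution Q' (vol = sigma_wide)
    X = S0 * np.exp((r - 0.5 * sigma_wide**2) * T_ex
                    + sigma_wide * np.sqrt(T_ex) * z1)        # wider X tails
    dt = T_mat - T_ex
    log_ret = (r - 0.5 * sigma_wide**2) * dt + sigma_wide * np.sqrt(dt) * z2
    S_T = X * np.exp(log_ret)
    # likelihood ratio dQ/dQ' of the increment after T_ex (conditional on X)
    lr = np.exp(norm_logpdf(log_ret, (r - 0.5 * sigma**2) * dt, sigma * np.sqrt(dt))
                - norm_logpdf(log_ret, (r - 0.5 * sigma_wide**2) * dt,
                              sigma_wide * np.sqrt(dt)))
    Y = np.maximum(S_T - K, 0.0) * lr                         # weighted label
    return X, Y

X, Y = wide_training_set()
```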
  56. 56. Sol 2: stabilize Y with antithetics • Monte-Carlo simulation with antithetic branching:  Simulate an additional antithetic path from every exposure date  Y = average of the cash-flows from the main and antithetic* paths  Since CFs are generally linear in the tails, value ~ 0.5 (CF + CF*)  (In the linear region) Y is the value, not a realization  Train the extrapolation slope on values, with the noise effectively removed by the antithetic (sketch below) • Also effectively resolves the extrapolation problem • Again, only by working on the simulation engine • At (roughly) double the simulation cost [Diagram: main path with antithetic branches from exposure dates ex1, ex2, ex3]
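A minimal sketch of antithetic branching in the same illustrative example: from each state at the exposure date, a main and an antithetic branch are simulated and the label is their average, which removes most of the noise where the payoff is (close to) linear.

```python
# Antithetic branching from the exposure date in the Black-Scholes example:
# Y = 0.5 * (CF on main branch + CF on antithetic branch). Parameters are
# illustrative.
import numpy as np

def antithetic_training_set(m=65536, S0=1.0, sigma=0.2, r=0.0,
                            K=1.0, T_ex=1.0, T_mat=2.0, seed=1234):
    rng = np.random.default_rng(seed)
    z1, z2 = rng.standard_normal(m), rng.standard_normal(m)
    X = S0 * np.exp((r - 0.5 * sigma**2) * T_ex + sigma * np.sqrt(T_ex) * z1)
    dt = T_mat - T_ex
    drift = (r - 0.5 * sigma**2) * dt
    S_main = X * np.exp(drift + sigma * np.sqrt(dt) * z2)     # main branch
    S_anti = X * np.exp(drift - sigma * np.sqrt(dt) * z2)     # antithetic branch
    Y = 0.5 * (np.maximum(S_main - K, 0.0) + np.maximum(S_anti - K, 0.0))
    return X, Y

X, Y = antithetic_training_set()
```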
  57. 57. Part III One Training Set Simulator
  58. 58. One Analytic Engine • One single system design and implementation for:  Front-Office valuation and risk management  Regulatory risk management  XVA, CCR, FRTB, MVA, …  Customer analytics • Architecture based on 3 pillars (4 with ML) 1. One transaction representation, down to cash flows (including books and regulations) 2. One model hierarchy 3. One simulation/risk engine, parallel and AAD aware 4. Integrated ML
  59. 59. One Analytic: Danske Bank
  60. 60. One cash flow engine • We aggregate and manipulate cash-flows from different transactions  Swaps, options, exotics, … • All transactions must be  Represented consistently  Described in terms of cash-flows • Common denominator to all transactions/books/netting sets: event-driven cash-flows • In general, cash-flows are functionals of the state path prior to payment: $CF_p = f_p\left(S_t,\, 0 \le t \le T_p\right)$
  61. 61. One cash flow engine (2) • Scripting applies a language to represent all cash-flows  Human readable: a ”financial” programming language  Computer readable by visitors (code that reads and interprets scripts) • Scripting = one unique interface to all booking systems • Scripted cash-flows aggregated (compressed) to represent trading books • Regulatory amounts also scripted as options on trading book values simple example: (uncollateralized) CVA = European put on netting set • For more on scripting see Savine QM 2018 and Andreasen QM 2019
  62. 62. One transaction engine: (automated) CF extraction • Swap booking system: notional N1, start date t0, end date t1, type payer, fixed leg C, ann, 30/360, float leg EUR3M+s • Option booking system: notional N2, underlying EUR/USD, expiry T, strike K • Extracted transaction scripts:  from t0 to t1 every 1y: fixed_leg pays -N1*C*cvg(start(), end(), 30/360) on payDate()  from t0 to t1 every 3m: float_leg pays N1*(Libor(start(), end(), EUR3M)+s)*cvg(start(), end(), act.360) on payDate()  on T: opt pays N2*max(spot(EUR/USD)-K,0) • Aggregation (compression) into the trading book as one script:  from t0 to t1 every 1y: pv pays -N1*C*cvg(start(), end(), 30/360) on payDate()  from t0 to t1 every 3m: pv pays N1*(Libor(start(), end(), EUR3M)+s)*cvg(start(), end(), act.360) on payDate()  on T: pv pays N2*max(spot(EUR/USD)-K,0) • Then: model selection  model  simulated paths  cash-flow eval  (X at Tex, Y)  training set  train and use approximators
  63. 63. One Model hierarchy • Linear models (linear market)  Store today's prices of stocks, bonds, indices, … and interpolate today's curves: rates, spreads, dividends, repos, …  Compute discounts and forward prices/rates for all maturities  Used by FO for linear transactions • Molecules, a.k.a. implied volatility surfaces (molecule price, molecule rate)  Store and interpolate today's implied volatilities (Black-Scholes, Bachelier, Heston, SABR, …)  Compute European options of all strikes and maturities  Used by FO for European options • Dynamic models (dynamic price, dynamic rate)  Produce scenarios with arbitrage-free dynamics for market variables  Used by FO for exotic options  Calibrated to molecules • Dynamic hybrid / regulatory model  Correctly assembles and correlates dynamic models  Used by FO for hybrid options, xVA, (CCR), PRIIPS, … (inflation etc.)
  64. 64. One Method hierarchy • Linear market = expectations (closed form, linear market surfaces)  Interpolate discretely observed linear quotes  Pricing of linear cash flows CF = alpha * S + beta • Molecules = marginal distributions (Fourier inversion)  Interpolate discretely observed European option quotes  Pricing of European cash flows CF = f(ST) • Stochastic processes = copulas (SLV (Heston), local vol (Dupire))  Calibrate to marginal distributions  (Implicitly) define joint distributions (a.k.a. copulas)  Typically not directly applied to pricing • Monte Carlo simulators (hybrid MC, MFC MC (rates), LV MC (prices))  Find their parameters from the stochastic process  Pricing path-dependent cash-flows CF = f(St, 0<t<Tp)  (Callable = path-dependent with LSM) • Neural nets  Nets "calibrate" to a simulated training set  Pricing/reval of any trade/book with a trained neural net
  65. 65. One Risk Engine • Risk engine  Combines models, methods and transactions  Calibrates models  Simulates state paths  Evaluates cash-flows path-wise  Produces a training set  Trains neural nets  Computes values and risks • Risk engine implementation  Accommodates any model (via appropriate API) and instrument (via scripting)  Leverages hardware parallelism  Vectorization  Multi-threading  (GPU)  Incorporates AAD for efficient production of differentials
  66. 66. One Analytic with ML • Today any FO derivatives system includes:  Simulation: generation of Monte-Carlo paths for the state vector  Evaluation of "payoff" along simulated paths  (For Bermudan/American options) Estimation of "continuation" values with LSM • Hence, a FO derivatives system can train a simple linear model to estimate continuation values (at least for Bermudan options) • With our One Analytic Engine, this process is generalized so that:  Any model can generate samples for any transaction  Including a whole trading book  And a regulatory calculation as a (hybrid) option on the cash flows of the book  Any network can be trained on these samples, including differential regularization  With computational and numerical efficiency:  Parallel simulation  AAD
  67. 67. Thank you for your attention Slides available on https://www.slideshare.net/AntoineSavine
  68. 68. Everything about AAD and MC It would not be much of an exaggeration to say that Antoine Savine's book ranks as the 21st century peer to Merton's 'Continuous-Time Finance'. Vladimir Piterbarg This book [...] addresses the challenges of AAD head on. [...] The exposition is [...] ideal for a Finance audience. The conceptual, mathematical, and computational ideas behind AAD are patiently developed in a step-by-step manner, where the many brain-twisting aspects of AAD are de-mystified. For real-life application projects, the book is loaded with modern C++ code and battle-tested advice on how to get AAD to run for real. [...] Start reading! Leif Andersen An indispensable resource for any quant. Written by experts in the field and filled with practical examples and industry insights that are hard to find elsewhere, the book sets a new standard for computational finance. Paul Glasserman A passion to instruct A knack for clarity An obsession with detail A luminous writer An instant classic. Bruno Dupire
