How to incorporate Deep Learning into Risk Management and Regulatory Systems, structure and train deep neural nets for the purpose of revaluation of derivatives trading books

Published in: Economy & Finance


- 1. Deep Analytics Brian Huge and Antoine Savine, Danske Bank
- 2. Part I Revaluation context
- 3. Revaluation problem
  • The key problem common to regulations (XVA, CCR, FRTB, PRIIPS, SIMM/MVA, ...): estimate the distribution of value/risk on future "exposure" dates.
  • Methodology: generate scenarios for the state of the market on the exposure dates, sometimes called 'outer simulations', sampled from a calibrated model / risk-neutral probability Q (XVA), a historical model / probability P (CCR), or a model prescribed by the regulator / probability R (PRIIPS).
  • Then, compute future book values/risks in all scenarios.
  • Regulators insist that revaluation be consistent with FO practice.
  [Diagram: from today's market, scenarios $S_1, \dots, S_N$ are drawn under the outer distribution P/Q/R on the exposure date; in scenario $i$ the book value is $V_i = f(S_i)$.]
- 4. Brute Force Revaluation
  • Force a full FO revaluation in every simulated scenario.
  • Massive computational burden: all trades in the book must be valued, one by one, in (up to) thousands of scenarios, on (up to) hundreds of exposure dates, together with all risks for SIMM/MVA (cheap with AAD).
  • The industry has been actively researching computationally efficient solutions.
  • See e.g. Andreasen, "XVA on iPad Mini" presentations.
- 5. Reval trading book as one trade
  • Transaction: collection of event-driven cash-flows.
  • Cash-flow: payment of a function(al) of the state vector (S) path up to the payment date: $CF_p = f_p\left(S_t, 0 \le t \le T_p\right)$.
  • Trading book: collection of transactions = collection of cash-flows; it aggregates the CF from all the transactions and is itself a (meta-)transaction.
  • Book: hybrid derivative, depends on different assets from different classes.
  • Valued in a hybrid model: a joint, arbitrage-free model on all relevant assets (a.k.a. state vector S), with correlation/dependence assumptions, calibrated to European option prices / implied volatility surfaces / marginal distributions, written under some risk-neutral measure (even if the outer model is not), high dimensional, generally implemented with (inner) Monte-Carlo simulations.
  • The trading book is valued as one transaction in one run of the hybrid model.
  • (In practice, we may want to split cash-flows into a small number of sub-books.)
- 6. Nested simulations
  • Outer simulations: sample the market state under P/Q/R.
  • Revaluation in each scenario: hybrid (Q) model, implemented with (inner) simulations — nested simulations.
  • Better than brute force: one set of nested simulations, shared by all transactions.
  • But the computational burden is still far too high.
  [Diagram: outer simulations under P/Q/R from today's market to the exposure date; inner simulations under Q from each outer scenario, giving book values $V_1, \dots, V_N$.]
- 7. Conventional approximations
  • Temptation: reduce the computational burden at the expense of accuracy.
  • For example: reduce the number of inner simulations.
  • Widespread practice: conventional approximations — replace nested sims by "light" closed-form approximations, hand-crafted for different types of transactions.
  [Diagram: no more nested simulations, closed-form approximations instead; in scenario $i$ the analytic approximation of the book value is $V_i \approx g(S_i)$.]
- 8. Approximate analytics
  • Flawed in major ways: high cost (develop/maintain approximate analytics for every transaction and every model); typically inaccurate and biased — not numerical methods, no convergence property; inconsistent with FO pricing, which violates the regulatory requirement; forces transactions to be represented by trade, not by cash-flow.
  • But they do resolve the computational burden.
  • Can we design approximate analytics without the flaws?
  • Yes: Machine Learning analytics.
- 9. ML pricing
  • Pricing problem (different from the reval problem): given dynamics Q and cash-flow CF, find an analytic formula for $E^Q[CF]$ as a function of model parameters (spot, volatility, volatility of volatility, ...) and product parameters (strike, expiry, barrier, ...).
  • Solutions: exact formulas exist only for the simplest models (like Black-Scholes) and products (Europeans, barriers); in all other cases, slower numerical methods are necessary. Many applications require fast pricing, including calibration and European risk management.
  • Widely accepted solution: approximate analytics — manually find approximate, but precise, closed-form solutions, generally by working on the stochastic equations. (Major) example: SABR, Pat Hagan, 2001.
  • Costly and specific: requires (considerable) human expertise and effort, and solves one model, one product at a time — not reusable.
- 10. ML pricing (2)
  • Major current trend (McGhee; Ferguson-Green; Horvath et al.; ... 2018-2019): find analytic approximations by machine learning. Machines learn from data, not mathematics, but find analytic approximations all the same — automatically, without the human cost, in principle for any model and any product, using universal approximators with convergence guarantees.
  • A (smart, efficient) tabular approach: PV = f(mdl_prm, prd_prm). Train a universal approximator on prices produced with slower numerical methods; generalize to out-of-sample parameter sets.
  • Train once offline, use forever.
  • The trained approximator is the analytic.
- 11. ML reval
  • Revaluation is a different problem: we need future_value = f(future_state), given fixed model and product parameters. Not a tabular approach: it trains on path-wise samples, not prices. Disposable approximators: train once, use once. Must train online, quickly, without human supervision.
  • But the core idea applies: find an analytic approximation by ML techniques, train universal approximators, and reap the benefits of analytic approximations without the cost and flaws.
- 12. ML reval (2)
  • Principle: approximate the value function f by a universal approximator $\hat f$.
  • Value = (unknown) function of state (ignoring discounting): $V_t = E^Q\left[\sum_{T_p > t} CF_p \mid S_t\right] = f\left(S_t\right)$.
  • Universal approximator $\hat f\left(S_t; w\right)$: a parameterized family of functions of the state, asymptotically guaranteed to approximate any function to arbitrary accuracy.
  • Example: a linear combination of (fixed) basis functions, $\hat f\left(S_t; w\right) = \sum_i w_i\, g_i\left(S_t\right)$, where the $g_i$ form a basis of the function space, e.g. polynomials, splines, a Fourier basis...
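As a toy illustration of such a linear approximator — the monomial basis, data and all function names below are invented for the sketch, not taken from the slides:

```python
import numpy as np

def fit_linear_approximator(x, y, degree=5):
    """Least-squares fit of f-hat(x; w) = sum_i w_i * g_i(x),
    with monomial basis functions g_i(x) = x**i."""
    G = np.vander(x, degree + 1, increasing=True)  # G[j, i] = g_i(x_j)
    w, *_ = np.linalg.lstsq(G, y, rcond=None)
    return w

def predict(w, x):
    G = np.vander(x, len(w), increasing=True)
    return G @ w

# recover a smooth "value function" from noisy path-wise samples
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 500)
y = np.sin(2 * x) + 0.1 * rng.standard_normal(500)
w = fit_linear_approximator(x, y)
```

The fitted `w` defines the analytic approximation; evaluating it in a scenario is just `predict(w, states)`.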
- 13. Simulated training set
  • Simulate a training set to calibrate the approximator: generate m scenarios under the hybrid model Q.
  • Target (ignoring discounting): $V_{T_{ex}} = E^Q\left[\sum_{T_p > T_{ex}} CF_p \mid S_{T_{ex}}\right] = f\left(S_{T_{ex}}\right)$.
  • For each scenario i, denote the realizations $X^{(i)} = S_{T_{ex}}^{(i)}$ and $Y^{(i)} = \sum_{T_p > T_{ex}} CF_p^{(i)}$.
  • The simulated training set is $\left\{\left(X^{(i)}, Y^{(i)}\right), 1 \le i \le m\right\}$; $X^{(i)}$ is a vector, $Y^{(i)}$ is a real number.
  • No nested simulations.
  • Value given by the trained approximator.
- 14. Training the approximator
  • Value: $V_{T_{ex}} = E^Q\left[\sum_{T_p > T_{ex}} CF_p \mid S_{T_{ex}}\right] = E\left[Y \mid X\right] = h^*(X)$, where $h^* = \arg\min_h E\left[\left(Y - h(X)\right)^2\right]$ — by definition of the conditional expectation.
  • Approximate the minimization: $\min_h E\left[\left(Y - h(X)\right)^2\right] \approx \min_w E\left[\left(Y - \hat f(X; w)\right)^2\right]$, because $\hat f$ is a universal approximator that encodes functions in its weights (asymptotically); and $\approx \min_w \frac{1}{m}\sum_{i=1}^m \left(Y^{(i)} - \hat f\left(X^{(i)}; w\right)\right)^2$, because the training examples were independently sampled from the correct distribution.
  • So value = conditional expectation = minimum of the usual cost function (mean squared error).
- 15. Training a linear approximator
  • Particular case: linear approximator $\hat f\left(X^{(i)}; w\right) = \sum_j w_j\, g_j\left(X^{(i)}\right)$.
  • Then $w^* = \arg\min_w \frac{1}{m}\sum_{i=1}^m \left(Y^{(i)} - \hat f\left(X^{(i)}; w\right)\right)^2$ is known in closed form: the normal equation $w^* = \left(G^T G\right)^{-1} G^T Y$, where $G_{ij} = g_j\left(X^{(i)}\right)$.
  • Potential (near) singularity of $G^T G$ is classically corrected by cutting small singular values in the SVD decomposition $G = U D V^T$, or by Tikhonov regularization, $w^* = \left(G^T G + \lambda I\right)^{-1} G^T Y$, which also mitigates overfitting but requires a hyper-parameter $\lambda$.
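Both fixes can be sketched directly in NumPy (function names and the synthetic data are illustrative):

```python
import numpy as np

def ridge_weights(G, Y, lam=1e-6):
    """Tikhonov-regularized normal equation: w* = (G^T G + lam I)^-1 G^T Y."""
    n = G.shape[1]
    return np.linalg.solve(G.T @ G + lam * np.eye(n), G.T @ Y)

def svd_weights(G, Y, cutoff=1e-8):
    """Pseudo-inverse via G = U D V^T, cutting small singular values."""
    U, D, Vt = np.linalg.svd(G, full_matrices=False)
    Dinv = np.where(D > cutoff * D.max(), 1.0 / D, 0.0)
    return Vt.T @ (Dinv * (U.T @ Y))

# sanity check on a well-posed synthetic regression
rng = np.random.default_rng(1)
G = rng.standard_normal((200, 5))
w_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
Y = G @ w_true
w_ridge = ridge_weights(G, Y)
w_svd = svd_weights(G, Y)
```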
- 16. The LSM algorithm
  • The previous slides discussed training linear approximators on simulated data, also known as the LSM algorithm (Longstaff-Schwartz, 2001 and Carriere, 1996).
  • Well known in the derivatives industry: applied for two decades for Bermudas/callables. (So we have been applying machine learning for a long time after all!)
  • Approximates continuation values with (universal) linear approximators, trained on simulated datasets, with fixed basis functions (often polynomials or splines).
  • Works well for Bermudas/callables, especially with POI (proxies only in indicators, see Huge-Savine, LSM Reloaded, 2017). See also QuantMinds 2017 for application to XVA/RWA.
  • Not so well for the revaluation of trading books.
- 17. The problem with linear models
  • For Bermudas/callables, we generally know what variables affect the (continuation) value — e.g. for a standard Bermuda: the swap to maturity and the discount to the next call (and perhaps volatility) — so we pick appropriate basis functions of these features with a hard-coded rule.
  • For trading book revaluation, the choice of basis functions depends on the cash-flows in the trading book: European options are non-linear functions of the state variables only; basket options are non-linear functions of the basket only.
  • Basis functions must be found from the cash-flows in the book — automatically.
- 18. How to find your basis functions
  • One possible strategy: inspect the cash-flows in the book, find the major axes of non-linearity, and select basis functions of these axes. Not discussed further here; see scripting (Savine, QM2018 — Andreasen, QM2019).
  • Another possibility: find the basis functions from the simulated dataset, with neural nets (a.k.a. deep learning).
  • Neural networks: an extension of linear models where the basis functions are learned from the dataset — in ML/DL lingo, automatic feature extraction. Also universal approximators (Universal Approximation Theorem, see e.g. Horvath, 2019).
- 19. Part II Neural Revaluation
- 20. Feed-forward neural nets
  • Input layer: $z_0 = x$.
  • Hidden layers: feed-forward equation $z_l = w_l\, g\left(z_{l-1}\right) + b_l$.
  • Output layer: $\hat y = z_L$.
  • g: the activation — a non-linear scalar function, applied element-wise.
- 21. Automatic feature extraction
  • The hidden layers learn/build/encode a basis vector $h = g\left(z_{L-1}\right)$: a vector of basis functions of x for regression.
  • The output is a linear regression on h: $\hat y = w_L\, h + b_L$.
- 22. Deeply learning future prices
  • Training: find the optimal connection weights $w^* = \arg\min_w \frac{1}{m}\sum_{i=1}^m \left(Y^{(i)} - \hat f\left(X^{(i)}; w\right)\right)^2$.
  • Finds the regression weights and the basis function encoding at the same time.
  • No closed-form solution; not even a convex problem; no algorithm with guaranteed convergence.
  • Versions of gradient descent work well in practice, but without guarantee.
- 23. Deeply Learning Finance
  • Financial DL is fundamentally different from classic ML contexts.
  • We train ANNs on realizations, but target values = conditional expectations; we don't attempt to predict realizations. We know that the target (value) is a fixed, deterministic function of the input (state). In classic ML, the dependency of output on input is not guaranteed and may change with time.
  • We train on simulated data: guaranteed sampling of the correct distribution, guaranteed independence of samples. In classic ML, access to clean, IID data may be a major challenge.
- 24. Deeply learning Finance (2)
  [Chart: network fit of the payoff $\max\left(S_2 - K, 0\right)$.]
- 25. Challenges
  • Overfitting: training on realizations is more prone to overfitting the noise in the training set. Classic DL applies regularization — Tikhonov: encourage small weights, $w^* = \arg\min_w \frac{1}{m}\sum_{i=1}^m \left(Y^{(i)} - \hat f\left(X^{(i)}; w\right)\right)^2 + \lambda \left\|w\right\|^2$; dropout: randomly drop units during training; early stopping: stop when the cross-validation error starts increasing.
  • Sensitivities: we need a good approximation not only of the value but also of the risks — explicitly for SIMM/MVA, implicitly for FRTB (ordering). The Universal Approximation Theorem extends to differentials (see Horvath et al., 2019), but only asymptotically: with a finite capacity/training set, the derivatives may be wrong. "Derivatives of a good approximation are not a good approximation of derivatives."
  [Charts: value and delta of an ANN trained on simulated data in Black-Scholes.]
- 26. Challenges (2)
  • Extrapolation: ANNs struggle to learn the extrapolation slope, and extrapolation is key for ordering (FRTB/ES).
  • "Unsupervised" supervised learning: the ANN loss is not a convex function of the weights, and optimization algorithms are not guaranteed to converge. In practice, variants of SGD like ADAM perform well in many cases, but without guarantee — and we cannot run risk management on faith.
  • This is OK for training a network once and reusing it forever, as in pricing problems a la McGhee/Ferguson-Green/Horvath et al. But not for revaluation: the networks are only applicable to a specific book, with specific model parameters calibrated to today's market. Train once, use once: disposable networks must train in seconds to minutes, on their own, with some guarantee of finding a "decent" function.
  [Charts: value and delta from another run of an ANN trained on Black-Scholes simulated data.]
- 27. Activation
  • Gaussian quadratures: we approximate an integral by choosing both the weights and the abscissas, $\int_a^b f(x)\,dx = \int_a^b W(x)\, g(x)\,dx \approx \sum_{j=1}^n w_j\, g\left(x_j\right)$, where $f(x) = W(x)\, g(x)$. The sum is exact for polynomials g up to degree $2n-1$; choosing the split wisely gives a better approximation.
  • Universal Approximation Theorem: we can approximate $f\left(X_T\right) \approx \hat f\left(X_T; w\right)$ with a 1-layer neural net. The UAT is a limit statement; for some functions we may need a lot of connections and weights.
  • If we can choose the activation functions appropriately, we will need fewer connections and weights, have better extrapolation properties, and end up with financially meaningful results.
  • Revaluation = conditional expectation helps reason about activation.
- 28. Activation (2)
  • For a European payoff f in 1 dimension, the value is an integral of the payoff against the risk-neutral density: $V = E\left[f\left(S_T\right)\right] = \int f(x)\,\varphi(x)\,dx$. Integrating by parts (Carr-Madan) rewrites it against the CDF (digitals), or twice against call values: $V = \int f''(x)\, c(x)\,dx$, up to the linear terms.
  • Carr-Madan thus gives the value as a weighted sum of call values, $\int f''(x)\, c(x)\,dx \approx \sum_i w_i\, c\left(k_i\right)$ — the value of a piecewise linear payoff.
  • From samples of the conditional expectation, we optimize over $\left(w_i, k_i\right)$: we find the best piecewise linear payoff to approximate (hedge) the actual payoff.
  • Alternatives: hedge using digitals (CDF) or straddles (density).
- 29. Activation (3)
  [Charts: interpolation, payoff approximation and extrapolation with call, digital and straddle activations.]
  • Straddle: constant extrapolation, with the same value left and right.
  • Digital: constant extrapolation, different left and right.
  • Call: allows linear extrapolation.
- 30. Activation (4)
  • We do not know the call value function in the underlying model, or it is too slow to calculate.
  • We approximate the call value with a sum of calls from a different model.
  • Softplus, $\log\left(1 + e^x\right)$, is the call value in a logistic distribution, where x is the moneyness.
  • Examples used in finance: jump diffusion models, e.g. $c(k) = e^{-\lambda T}\sum_i \frac{(\lambda T)^i}{i!}\, c_i(k)$; mixture models, $c^M(k) = \sum_{i=1}^n p_i\, c_i(k)$; change of measure (use Carr-Madan with payoff $(x - k)^+$).
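The softplus/logistic identity is easy to check by Monte-Carlo, assuming a standard (unit-scale) logistic underlying Z and moneyness x, so that the undiscounted call value is $E\left[\max(Z + x, 0)\right]$:

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

# Monte-Carlo check: for Z ~ Logistic(0, 1),
# E[max(Z + x, 0)] = softplus(x) = log(1 + e^x).
rng = np.random.default_rng(42)
Z = rng.logistic(0.0, 1.0, size=1_000_000)
mc = {x: float(np.maximum(Z + x, 0.0).mean()) for x in (-1.0, 0.0, 1.5)}
```

Each Monte-Carlo average `mc[x]` lands on `softplus(x)` to within sampling noise.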
- 31. Activation (5)
  | Activation | Distribution |
  |---|---|
  | ReLU | Dirac |
  | ELU | Exponential |
  | Softplus $\log\left(1 + e^x\right)$ / Sigmoid | Logistic |
  | Bachelier $x N(x) + n(x)$ | Gaussian |
  | Multiquadric | Student's t |
  (Each activation is the call value in the indicated distribution, with the corresponding CDF and PDF; the original slide also tabulates the call, CDF and PDF formulas, which are not legible in this transcript.)
- 32. Activation (6)
  • Exotic products: in some cases we may directly observe that the network generalizes. For example, Bermudas may be viewed as Europeans on Europeans on ... — i.e. a multi-layer network.
  • Using iterated expectations, $V = E\left[CF_{T_1} + CF_{T_2}\right] = E\left[CF_{T_1} + E\left[CF_{T_2} \mid S_{T_1}\right]\right] = E\left[CF_{T_1} + V_{T_1}\left(S_{T_1}\right)\right] \approx E\left[CF_{T_1} + \hat f_1\left(S_{T_1}; w_1\right)\right]$, with the inner value approximated by another net $\hat f_1$ — two different neural nets.
  • We find the iterative structure of the neural network.
- 33. Activation (7)
  • Asymptotically, we know the activation does not matter; but with a finite network size, not all activations are equivalent.
  • When the activations are option values (ReLU, ELU, softplus, Bachelier, multiquadric), the network approximates values as combinations of Europeans (in approximate models), approximates payoffs as piecewise linear, and extrapolates linearly.
  • These activations behave best in finance.
  • We found softplus works best for revaluation.
- 34. Training algorithms
  • Weight optimization: 1st order iterative algorithms like SGD are empirically shown to perform well on a variety of problems. The differentials of the cost function are computed by back-propagation, for a cost similar to one evaluation. Frameworks like TensorFlow, or your own/favourite AAD software, implement back-prop automatically, behind the scenes.
  • 2nd order algorithms? (In our experience) ADAM converges significantly faster than vanilla SGD for financial nets. Roughly, ADAM = SGD + momentum + normalization of the gradient by variance; hence ADAM approximates a 2nd order search, hinting that true 2nd order algorithms (like Levenberg-Marquardt or Conjugate Gradients) may perform better on financial nets (untested).
- 35. Regularization
  • Overfitting: learning noise specific to the training set prevents correct generalization. Diagnosed by a small training error combined with a large test error. (Asymptotically) vanishes as the size of the training set grows — maths here (among other places): https://qr.ae/TWImtE. Of particular concern when training ANNs to produce expectations out of samples.
  • Regularization: overfitting is classically mitigated by regularization. Tikhonov regularization penalizes the weight size to constrain the parameters: $\min_w \frac{1}{m}\sum_{i=1}^m \left(Y^{(i)} - \hat f\left(X^{(i)}; w\right)\right)^2 + \lambda \left\|w\right\|^2$. It effectively mitigates overfitting by constraining the weights — but what is it we prefer about small weights? If we are going to constrain the weights, can we do it in a way that qualitatively improves the behaviour of the network?
- 36. Differential regularization • Idea: constrain weights so the ANN produces correct sensitivities so • Combine the two: • Compare to Tikhonov: ex ex p ex Q T p T T T V E CF S E Y X p exex ex ex ex p T TT Q T T T CF V Y E S E X S S X 2 1 1 ˆw* arg min ; m i i w i Y f X w m 2 1 ˆ ;1 w* arg min i im w i i i f X wY nm X X 2 2 11 1 ˆw ; ˆ * arg min ;m i i w i im i ii i f X w n Y f X Y m Xm X w 2 2 1 1 ˆw* arg min ; m i i w i Y f X w w nm train net to produce correct values train net to produce correct sensitivities
- 37. Differential regularization (2)
  • Derivative labels $\frac{\partial Y^{(i)}}{\partial X^{(i)}} = \frac{\partial \sum_{T_p > T_{ex}} CF_p^{(i)}}{\partial S_{T_{ex}}^{(i)}}$: a.k.a. path-wise sensitivities, simulated with the training set and fed to the training algorithm.
  • Derivative results $\frac{\partial \hat f\left(X^{(i)}; w\right)}{\partial X^{(i)}}$: a.k.a. (approximate) value sensitivities, computed in the network.
- 38. Path-wise sensitivities
  • Extended training set: in addition to the path-wise states $X^{(i)} = S_{T_{ex}}^{(i)}$ and cash-flows $Y^{(i)} = \sum_{T_p > T_{ex}} CF_p^{(i)}$, we must simulate the path-wise differentials $\frac{\partial Y^{(i)}}{\partial X^{(i)}}$ as sensitivity labels: $\left\{\left(X^{(i)}, Y^{(i)}, \frac{\partial Y^{(i)}}{\partial X^{(i)}}\right), 1 \le i \le m\right\}$.
  • Produced by classic path-wise AAD over the Monte-Carlo simulations — easily and efficiently, as long as your Monte-Carlo engine is AAD aware.
- 39. A brief reminder of AAD
  • Principle: given any scalar function $y = h\left(x_0, \dots, x_n\right)$, coded as a computer program, compute all the differentials $\frac{\partial y}{\partial x_i}$ in a time similar to one function evaluation, automatically.
  • How: build the evaluation graph of h (automatically, e.g. with operator overloading); that is, split the calculation into elementary building blocks (+, -, *, /, log, exp, sqrt, ...) whose derivatives are known. Denote them $x_i = f_i\left(x_j, j \in A_i\right)$, with $A_i$ the ancestors of $x_i$, so $j < i$.
  • The sequence of operations and dependencies is called the 'tape' in AAD lingo.
  [Diagram: inputs $x_1, \dots, x_n$, intermediate operations $x_{n+1}, x_{n+2}, \dots$, result $x_N$.]
- 40. AAD
  • Apply the chain rule in reverse order through the tape: $\frac{\partial y}{\partial x_N} = 1$, because $x_N = y$; and, by the chain rule, $\frac{\partial y}{\partial x_j} = \sum_{i \in S_j} \frac{\partial y}{\partial x_i} \frac{\partial x_i}{\partial x_j}$, with $S_j$ the successors of $x_j$, so $i > j$.
  • Which gives us the reverse propagation algorithm for derivatives: evaluate $x_1, x_2, \dots, x_N$ forward with $x_i = f_i\left(x_j, j \in A_i\right)$; then differentiate $dx_N, \dots, dx_2, dx_1$ backward with $\frac{\partial y}{\partial x_j} = \sum_{i \in S_j} \frac{\partial y}{\partial x_i} \frac{\partial f_i}{\partial x_j}$.
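The reverse propagation above can be sketched with a toy operator-overloading tape in Python (illustrative only, supporting just + and *; a real AAD library handles the full set of elementary operations):

```python
_tape = []  # global tape: nodes recorded in evaluation order

class Num:
    """Minimal reverse-mode AAD number, built by operator overloading."""
    def __init__(self, val, parents=()):
        self.val, self.parents, self.adj = val, list(parents), 0.0
        _tape.append(self)

    def __add__(self, other):
        other = other if isinstance(other, Num) else Num(other)
        return Num(self.val + other.val, [(self, 1.0), (other, 1.0)])
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Num) else Num(other)
        return Num(self.val * other.val, [(self, other.val), (other, self.val)])
    __rmul__ = __mul__

def backprop(y):
    """Reverse sweep: each node adds adj * (local derivative) to its parents."""
    y.adj = 1.0
    for node in reversed(_tape):
        for parent, local in node.parents:
            parent.adj += node.adj * local

x1, x2 = Num(3.0), Num(4.0)
y = x1 * x2 + x1        # y = x1*x2 + x1; dy/dx1 = x2 + 1, dy/dx2 = x1
backprop(y)
```

After the sweep, `x1.adj` and `x2.adj` hold dy/dx1 and dy/dx2, both obtained in one backward pass.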
- 41. Path-wise sensitivities with AAD
  • Path-wise Monte-Carlo simulation: 1. draw random numbers (typically, Brownian increments); 2. run the SDE to generate the path $S_0 \to S_{T_1} \to \dots \to S_{T_{ex}} \to \dots \to S_{T^*}$; 3. evaluate the cash-flows $Y = \sum_{T_p > T_{ex}} CF_p = f\left(S_{T_{ex}}, \dots, S_{T^*}\right)$.
  • An AAD-instrumented simulation automatically puts this sequence (path simulation, then CF evaluation) on tape.
  • Running the adjoint equation backwards, we get all the differentials $\frac{\partial Y}{\partial S_{T^*}}, \dots, \frac{\partial Y}{\partial S_{T_{ex}}}, \dots, \frac{\partial Y}{\partial S_0}$ — including the path-wise derivative label for training nets, $\frac{\partial Y}{\partial X} = \frac{\partial Y}{\partial S_{T_{ex}}}$.
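In the simplest setting the path-wise derivative label is available in closed form, so the idea can be checked without any AAD machinery. A NumPy sketch in Black-Scholes with illustrative parameters (r = 0 for brevity): the labels average to the call's delta.

```python
import numpy as np
from math import log, sqrt
from statistics import NormalDist

S0, K, sigma, T = 100.0, 100.0, 0.2, 1.0   # illustrative parameters, r = 0
rng = np.random.default_rng(1)
Z = rng.standard_normal(1_000_000)
ST = S0 * np.exp(-0.5 * sigma**2 * T + sigma * sqrt(T) * Z)

Y = np.maximum(ST - K, 0.0)                # cash-flow labels
dYdS0 = (ST > K) * ST / S0                 # path-wise derivative labels

d1 = (log(S0 / K) + 0.5 * sigma**2 * T) / (sigma * sqrt(T))
bs_delta = NormalDist().cdf(d1)            # the labels average to this
```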
- 42. Value sensitivities
  • Path-wise sensitivities: produced by the simulation engine, efficiently and conveniently with path-wise AAD, and fed to the training algorithm as derivative labels.
  • We also need value sensitivities: produced by the neural net, computed with the classic back-propagation equations — also efficiently and conveniently, which is not surprising, since back-prop is itself a form of AAD.
- 43. Back-propagation
  • Feed-forward (g: softplus): $z_0 = x$; $z_1 = w_1 x + b_1$; $z_2 = w_2\, g\left(z_1\right) + b_2$; $\hat y = z_3 = w_3\, g\left(z_2\right) + b_3$ — a sequence of matrix operations, $z_0 \to \dots \to z_L$, following $z_l = w_l\, g\left(z_{l-1}\right) + b_l$.
  • Back-prop (g': sigmoid), the adjoint of feed-forward: $\frac{\partial \hat y}{\partial z_3} = 1$; $\frac{\partial \hat y}{\partial z_{l-1}} = \left(w_l^T \frac{\partial \hat y}{\partial z_l}\right) \circ g'\left(z_{l-1}\right)$; down to $\frac{\partial \hat y}{\partial x} = w_1^T \frac{\partial \hat y}{\partial z_1}$ — another sequence of matrix operations, with the same weights, $z_L \to \dots \to z_0$.
- 44. Value sensis with back-prop
  • Neural net with integrated back-prop: the back-prop pass adds layers which are also matrix operations. The net doubles in depth, with an unchanged number of weights (weight sharing regularization); traversing it costs twice as much, but estimates all the sensitivities in addition to the value.
  • Feed-forward: $z_0 = x$; $z_1 = w_1 x + b_1$; $z_2 = w_2\, g\left(z_1\right) + b_2$; $\hat y = w_3\, g\left(z_2\right) + b_3$.
  • Back-prop = more feed-forward: $\frac{\partial \hat y}{\partial z_2} = w_3^T \circ g'\left(z_2\right)$; $\frac{\partial \hat y}{\partial z_1} = \left(w_2^T \frac{\partial \hat y}{\partial z_2}\right) \circ g'\left(z_1\right)$; $\frac{\partial \hat y}{\partial x} = w_1^T \frac{\partial \hat y}{\partial z_1}$.
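A sketch of this twin-network idea in NumPy, assuming a single hidden softplus layer (names and sizes are illustrative): the back-prop layers reuse the feed-forward weights and return the sensitivities alongside the value, checked here against finite differences.

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def sigmoid(x):                     # softplus'
    return 1.0 / (1.0 + np.exp(-x))

def twin_net(x, weights):
    """Feed-forward value, then back-prop layers (shared weights)
    returning both y-hat and d y-hat / d x."""
    (w1, b1), (w2, b2) = weights
    z1 = x @ w1 + b1                # hidden pre-activations
    y = softplus(z1) @ w2 + b2      # scalar value output
    # back-prop written as more feed-forward layers:
    dydz1 = sigmoid(z1) * w2.T      # d y / d z1
    dydx = dydz1 @ w1.T             # d y / d x
    return y, dydx

rng = np.random.default_rng(0)
w1, b1 = rng.standard_normal((2, 8)), rng.standard_normal(8)
w2, b2 = rng.standard_normal((8, 1)), rng.standard_normal(1)
x = np.array([[0.3, -0.7]])
y, dydx = twin_net(x, [(w1, b1), (w2, b2)])
```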
- 45. Back-prop through value sensis
  • Back-prop written as additional feed-forward layers with shared weights: the ANN now outputs values and sensis, at the (low) cost of double net depth, i.e. double computation/train time.
  • Valuation + differentiation is written as a sequence of matrix operations. This whole sequence is efficiently differentiated for training with another round of back-prop — automatic with TensorFlow or your own/favourite DL or AAD framework. Of course, you can also do it manually and explicitly if you wish, and see how it is another sequence of matrix operations (not shown here). Explicit second order differentiation is not necessary.
- 46. (Simple) classic TensorFlow code [code listing not preserved in this transcript]
- 47. TensorFlow code (2) [listing not preserved]
- 48. TensorFlow code (3) [listing not preserved]
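The TensorFlow listings on these slides did not survive the transcript. As a stand-in, here is a minimal NumPy training loop for the same kind of one-hidden-layer softplus net; all names, sizes, hyper-parameters and the toy target are illustrative, not taken from the slides:

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def sigmoid(x):                            # softplus'
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (256, 1))
Y = X**2                                   # toy stand-in for simulated labels

w1, b1 = rng.standard_normal((1, 16)), np.zeros(16)
w2, b2 = rng.standard_normal((16, 1)) * 0.1, np.zeros(1)
lr = 0.05
for step in range(5000):
    z1 = X @ w1 + b1
    h = softplus(z1)
    pred = h @ w2 + b2
    err = pred - Y                         # MSE gradient up to a constant
    # back-prop gradients
    g_w2 = h.T @ err / len(X); g_b2 = err.mean(0)
    dz1 = (err @ w2.T) * sigmoid(z1)
    g_w1 = X.T @ dz1 / len(X); g_b1 = dz1.mean(0)
    for p, g in ((w1, g_w1), (b1, g_b1), (w2, g_w2), (b2, g_b2)):
        p -= lr * g                        # plain gradient descent step

pred = softplus(X @ w1 + b1) @ w2 + b2
mse = float(((pred - Y) ** 2).mean())
```

In a framework like TensorFlow, the back-prop gradients in the loop are generated automatically rather than written by hand.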
- 49. Results
  • Effective regularization: the net learns to produce correct sensis, and to compute them efficiently.
  • Additional benefits: improves the correctness of the value function's shape, improves the ordering of paths against state, and mitigates the extrapolation challenge.
  [Charts: fit without regularization vs. with differential regularization.]
- 50. Regularization in classic ML
  • Textbook ML example (Bishop): the data comes from $y = \sin(2\pi x) + \text{noise}$; fit a degree-9 polynomial model (linear in the weights), $\hat y = \sum_{i=0}^{9} w_i x^i$, on 10 (noisy) training examples.
  • Result: classic overfitting — a perfect fit of the training examples, completely wrong generalization.
  • Observe: the training values are perfectly fitted, but their sensitivities are completely wrong.
- 51. Tikhonov regularization lambda = 0.00001 lambda = 0.01 lambda = 0.0001 lambda = 0.1 lambda = 0.001 lambda = 1.0
- 52. Differential regularization lambda = 0.00001 lambda = 0.1 lambda = 0.001 lambda = 1.0 lambda = 0.01 lambda = 10.0
- 53. Regularization: ML and Finance
  • Differential regularization is natural and powerful, efficient and easy (modulo a good understanding of AAD and back-prop).
  • Why is it (to our knowledge) unknown in ML? ML mostly deals with real-world data, with no differential labels.
  • In finance, we have control over the data (since we simulate it), and we can leverage that to find more powerful ML algorithms.
  • Importantly, only part of the work is on ANN design and training. Most of the work is on data simulation, and this is where we need other powerful financial techniques: scripting and other handling of cash-flows, hierarchical models of everything, generic parallel simulation engines, AAD.
- 54. The extrapolation challenge
  • Neural nets struggle with extrapolation — a known problem, the same (worse) with linear models.
  • Bad extrapolation is not acceptable in finance: it messes with ordering, damaging ES computations, e.g. for FRTB; gives inaccurate revaluation in stress scenarios; and leads to wrong assessment and management of tail risk.
  • Differential regularization helps, but does not eliminate the problem.
  • (In our experience) classic ML/DL methods help only marginally. For example, early stopping may prevent overfitting the extrapolation slope, but does not help finding the correct extrapolation slope in the first place.
  • We need a specific solution.
- 55. Sol 1: widen sampling dist. of X
  • Sample from a wider distribution Q': for example, increase the volatility of the simulation over the period from today (0) to $T_{ex}$, leaving the parameters unchanged after $T_{ex}$, so that the relation between X and Y is unchanged (because Y is a functional of the S path after $T_{ex}$) but the X tails are pushed further away.
  • Effectively resolves extrapolation, without any work on the nets — only on the simulation.
  • Implementation: trivial with one exposure date $T_{ex}$. With multiple exposure dates, it requires different simulation sets for each exposure date, at significant computational cost for the production of the training sets. This can be avoided by importance sampling: simulate the full path from the wider distribution Q' and weight the training labels by the likelihood ratios, $X = S_{T_{ex}}$, $Y = \sum_{T_p > T_{ex}} CF_p \cdot \frac{dQ}{dQ'}$.
- 56. Sol 2: stabilize Y with antithetics • Monte-Carlo simulation with antithetic branching: Simulate additional antithetic path from every ex date Y = average cash-flows from main and antithetic* path Since CF are generally linear in tails, value ~ 0.5(CF + CF*) (In linear region) Y is the value, not a realization Train extrapolation slope on values, with Noise effectively removed by antithetic • Also, effectively resolves extrapolation problem • Again, only by working on the simulation engine • At (roughly) double simulation cost ex1 ex2 ex3 main path antithetic branch
- 57. Part III One Training Set Simulator
- 58. One Analytic Engine
  • One single system design and implementation for: Front-Office valuation and risk management; regulatory risk management (XVA, CCR, FRTB, MVA, ...); customer analytics.
  • Architecture based on 3 pillars (4 with ML): 1. one transaction representation, down to cash flows (including books and regulations); 2. one model hierarchy; 3. one simulation/risk engine, parallel and AAD aware; 4. integrated ML.
- 59. One Analytic: Danske Bank
- 60. One cash flow engine
  • We aggregate and manipulate cash-flows from different transactions: swaps, options, exotics...
  • All transactions must be represented consistently, described in terms of cash-flows.
  • Common denominator to all transactions/books/netting sets: event-driven cash-flows.
  • In general, cash-flows are functionals of the state path prior to payment: $CF_p = f_p\left(S_t, 0 \le t \le T_p\right)$.
- 61. One cash flow engine (2)
  • Scripting applies a language to represent all cash-flows: human readable — a "financial" programming language — and computer readable by visitors (code that reads and interprets scripts).
  • Scripting = one unique interface to all booking systems.
  • Scripted cash-flows are aggregated (compressed) to represent trading books.
  • Regulatory amounts are also scripted, as options on trading book values — simple example: (uncollateralized) CVA = European put on the netting set.
  • For more on scripting see Savine QM 2018 and Andreasen QM 2019.
- 62. One transaction engine — (automated) CF extraction
  • Swap booking system (notional: N1, start date: t0, end date: t1, type: payer, fixed leg: C, ann, 30/360, float leg: EUR3M+s) produces the transaction script:
    from t0 to t1 every 1y: fixed_leg pays -N1*C*cvg(start(), end(), 30/360) on payDate()
    from t0 to t1 every 3m: float_leg pays N1*(Libor(start(), end(), EUR3M)+s)*cvg(start(), end(), act.360) on payDate()
  • Option booking system (notional: N2, underlying: EUR/USD, expiry: T, strike: K) produces:
    on T: opt pays N2*max(spot(EUR/USD)-K,0)
  • Aggregation (compression) yields the trading book as one script:
    from t0 to t1 every 1y: pv pays -N1*C*cvg(start(), end(), 30/360) on payDate()
    from t0 to t1 every 3m: pv pays N1*(Libor(start(), end(), EUR3M)+s)*cvg(start(), end(), act.360) on payDate()
    on T: pv pays N2*max(spot(EUR/USD)-K,0)
  • Then: model selection; paths simulated under the model; cash-flow evaluation at $T_{ex}$ producing the (X, Y) training set; train and use the approximators.
- 63. One Model hierarchy
  • Linear market (linear models): store today's prices of stocks, bonds, indices, ... and interpolate today's curves (rates, spreads, dividends, repos, ...); compute discounts and forward prices/rates for all maturities. Used by FO for linear transactions.
  • Molecules, a.k.a. implied volatility surfaces (molecule price, molecule rate): store and interpolate today's implied volatilities (Black-Scholes, Bachelier, Heston, SABR, ...); compute European options of all strikes and maturities. Used by FO for European options.
  • Dynamic models (dynamic price, dynamic rate): produce scenarios with arbitrage-free dynamics for the market variables, calibrated to the molecules. Used by FO for exotic options.
  • Dynamic hybrid / regulatory model: correctly assembles and correlates the dynamic models (inflation etc.). Used by FO for hybrid options, xVA, (CCR), PRIIPS, ...
- 64. One Method hierarchy
  • Linear market = expectations (closed form): linear market surfaces interpolate discretely observed linear quotes; pricing of linear cash flows CF = alpha * S + beta.
  • Molecules = marginal distributions (Fourier inversion): interpolate discretely observed European option quotes; pricing of European cash flows CF = f(ST).
  • Stochastic processes = copulas (SLV (Heston), local vol (Dupire)): calibrate to the marginal distributions; (implicitly) define the joint distributions (a.k.a. copulas); typically not directly applied to pricing.
  • Monte Carlo simulators (hybrid MC, MFC MC (rates), LV MC (prices)): find their parameters from the stochastic process; pricing of path-dependent cash-flows CF = f(St, 0<t<Tp); (callable = path dependent with LSM).
  • Neural nets: "calibrate" to a simulated training set; pricing/reval of any trade/book with a trained neural net.
- 65. One Risk Engine
  • The risk engine combines models, methods and transactions: it calibrates models, simulates state paths, evaluates cash-flows path-wise, produces a training set, trains neural nets, and computes values and risks.
  • The risk engine implementation accommodates any model (via an appropriate API) and any instrument (via scripting); leverages hardware parallelism (vectorization, multi-threading, GPU); and incorporates AAD for efficient production of differentials.
- 66. One Analytic with ML
  • Today, any FO derivatives system includes: simulation (generation of Monte-Carlo paths for the state vector); evaluation of the "payoff" along simulated paths; and, for Bermudan/American options, estimation of "continuation" values with LSM.
  • Hence, a FO derivatives system can already train a simple linear model to estimate continuation values (at least for Bermudan options).
  • With our One Analytic Engine, this process is generalized so that: any model can generate samples for any transaction, including a whole trading book, and a regulatory calculation as a (hybrid) option on the cash flows of the book; any network can be trained on these samples, including with derivatives regularization; with computational and numerical efficiency: parallel simulation and AAD.
- 67. Thank you for your attention Slides available on https://www.slideshare.net/AntoineSavine
- 68. Everything about AAD and MC
  • "It would not be much of an exaggeration to say that Antoine Savine's book ranks as the 21st century peer to Merton's 'Continuous-Time Finance'." — Vladimir Piterbarg
  • "This book [...] addresses the challenges of AAD head on. [...] The exposition is [...] ideal for a Finance audience. The conceptual, mathematical, and computational ideas behind AAD are patiently developed in a step-by-step manner, where the many brain-twisting aspects of AAD are de-mystified. For real-life application projects, the book is loaded with modern C++ code and battle-tested advice on how to get AAD to run for real. [...] Start reading!" — Leif Andersen
  • "An indispensable resource for any quant. Written by experts in the field and filled with practical examples and industry insights that are hard to find elsewhere, the book sets a new standard for computational finance." — Paul Glasserman
  • "A passion to instruct. A knack for clarity. An obsession with detail. A luminous writer. An instant classic." — Bruno Dupire
