Experts in numerical algorithms
and HPC services
NAG for HPC Finance
John Holden
john.holden@nag.co.uk
14th July 2015
The good, bad and ugly of
accelerators in finance
and a complementary path
2
 NAG Introduction
 NAG for HPC Finance
 Why Quants Love NAG
 Accelerators
 NVIDIA
 Intel Xeon-Phi
 Algorithmic Differentiation
 Summary
Agenda
3
 Founded 1970
 Not-for-profit organisation
 Surpluses fund on-going R&D
 Mathematical and Statistical Expertise
 Numerical Libraries of components
 Consulting
 HPC Services
 Computational Science and Engineering (CSE) support
 Procurement advice, market watch, benchmarking
NAG Background
4
HPC Services
 Government, Academic
and Commercial
 Full CSE service
 Code porting, tuning,
scaling, rewriting…
 Training
 1-20 FTEs per annum
 Procurement
advice/benchmarking
ARM
5
Financial Services
 Many clients in FSI
 Most Tier 1 Banks have
licences
 > 60% have global
licences
 Typically the NAG
Library is embedded in
the banks' own “quant”
libraries (C++, .NET,
Java, Python,…)
6
 NAG Introduction
 NAG for HPC Finance
 Why Quants Love NAG
 Accelerators
 NVIDIA
 Intel Xeon-Phi
 Algorithmic Differentiation
 Summary
Agenda
7
Why do Quants use NAG Libraries and Toolboxes?
 Global reputation for quality – accuracy, reliability
and robustness…
 Extensively tested, supported and maintained code
 Reduces development time
 Allows concentration on your key areas
 Components
 Fit into your environment
 Simple interfaces to your favourite packages
 Regular performance improvements!
 Give “qualified error” messages e.g. tolerances of
answers
8
from Finance - k Factor Problem
9
from Finance - k Factor Problem
Principal Factors method (Andersen et al., 2003)
does NOT always converge to correct answer…
(no convergence theory)
Should have come to NAG….
Our* spectral projected gradient method respects
constraints, exploits convexity, converges to a feasible
stationary point
*NAG Library G02AE - Borsdorf, Higham & Raydan, 2010
10
NAG Library and Toolbox Contents
 Root Finding
 Summation of Series
 Quadrature
 Ordinary Differential
Equations
 Partial Differential Equations
 Numerical Differentiation
 Integral Equations
 Mesh Generation
 Interpolation
 Curve and Surface Fitting
 Optimization
 Approximations of Special
Functions
 Dense Linear Algebra
 Sparse Linear Algebra
 Correlation & Regression
Analysis
 Multivariate Methods
 Analysis of Variance
 Random Number Generators
 Univariate Estimation
 Nonparametric Statistics
 Smoothing in Statistics
 Contingency Table Analysis
 Survival Analysis
 Time Series Analysis
 Operations Research
11
Use of NAG Software in Finance
 Portfolio allocation / Risk management / Stress testing
 Optimization, interpolation, linear algebra, RNGs, Distributions,
Copulas…
 Derivative pricing, Hedging
 PDEs, RNGs, multivariate normal, curve & surface fitting,
quadrature…
 Calibration
 Optimisation, Interpolation, Root Finders, Splines
 Data analysis
 Time series, GARCH, principal component analysis, data smoothing,
Data Mining…
 Monte Carlo simulation
 RNGs, Brownian Bridge constructor, Linear Algebra
 …
12
Why Quantitative Analysts Love NAG
 General Problem
 To build asset models and risk engines in a timely manner
that are
 Robust
 Stable
 Quick
 Solution
 Use robust, well tested, fast numerical components
 This allows the “expensive” experts to concentrate on the
modelling and interpretation
 avoiding distraction with low level numerical components
13
Problem 1: Simulation (Monte Carlo)
 Simulation is important for scenario generation
 Several different numerical components needed
 Random Number Generators
 Brownian bridge constructor
 Interpolation/Splines
 Principal Component Analysis
 Cholesky Decomposition
 Distributions (uniform, Normal, exponential, gamma,
Poisson, Student’s t, Weibull,..)
 ..
14
Problem 1: Simulation (Monte Carlo)
 Simulation is important for scenario generation
NAG to the rescue (CPU or GPU)
 Several different numerical components needed
 Random Number Generators √
 Brownian bridge constructor √
 Interpolation/Splines √
 Principal Component Analysis √
 Cholesky Decomposition √
 Distributions (uniform, Normal, exponential, gamma,
Poisson, Student’s t, Weibull,..) √
 .. √ √
15
Problem 2: Calibration
 Financial institutions all need to calibrate their
models
 Several different numerical components needed
 Optimisation functions (e.g. constrained non-linear
optimisers)
 Interpolation functions (used intelligently*)
 Spline functions
 ..
*An interpolator must be used carefully: you must know its properties to pick an appropriate method
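Calibration in miniature can be sketched as a one-parameter problem: find the volatility at which a model price matches a quoted price. The example below assumes a Black-Scholes call and a plain bisection root finder. Neither is taken from the slides, and all names and market inputs are hypothetical. Real calibrations have many parameters and need the constrained optimisers listed above.

```python
import math


def norm_cdf(x):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, rate, expiry, vol):
    """Black-Scholes price of a European call (assumed model for this sketch)."""
    sd = vol * math.sqrt(expiry)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * expiry) / sd
    d2 = d1 - sd
    return spot * norm_cdf(d1) - strike * math.exp(-rate * expiry) * norm_cdf(d2)


def implied_vol(price, spot, strike, rate, expiry, lo=1e-6, hi=5.0, tol=1e-10):
    """Bisection on vol; relies on the call price increasing with vol."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, rate, expiry, mid) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical market data: generate a price at vol = 0.25, then recover it.
market_price = bs_call(100.0, 105.0, 0.01, 1.0, 0.25)
recovered = implied_vol(market_price, 100.0, 105.0, 0.01, 1.0)
print(abs(recovered - 0.25) < 1e-8)
```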
16
Problem 2: Calibration
 Financial institutions all need to calibrate their
models
NAG to the rescue
 Several different numerical components needed
 Optimisation functions (e.g. constrained non-linear
optimisers) √
 Interpolation functions (used intelligently*) √
 Spline functions √
 .. √ √
*An interpolator must be used carefully: you must know its properties to pick an appropriate method
17
 NAG Introduction
 NAG for HPC Finance
 Why Quants Love NAG
 Accelerators
 NVIDIA
 Intel Xeon-Phi
 Algorithmic Differentiation
 Summary
Agenda
18
 Escalator?:
Want more performance? Buy the next processor!
 To get performance/efficiency we have to go
(massively) parallel
 Disruption causing serious look at ‘other’
technologies and algorithms!
 Even CPUs with tens of cores per node
 Hybrid, shared-memory and distributed-memory
parallelism
 Painful whichever way we turn!
Where has my Escalator gone?
19
 Loose definition: hardware on which to run your
software better than on your (general purpose) CPU
 Generally NOT an easy win
 Significant learning curve and effort
 Offload disadvantages
 …
Accelerators
20
 ClearSpeed
 Similar to GPU
 Lacked a good software eco-system
 IBM Cell
 Lacked a good software eco-system
 GPGPU
 NVIDIA invested in the software eco-system (AMD not!)
 Intel Phi
 Early days – an encouraging start
 Expecting a lot more with Knights Landing!
Accelerators
21
 We provide
 A suite of Numerical Routines for Monte Carlo simulation
 from a collaboration with Professor Mike Giles
 also MAGMA based Linear Algebra from Jack Dongarra
 worked with Professor William Shaw to implement new Inverse
CDFs (new distributions and speed up to existing code)
 “Bespoke” consultancy codes
 PDE Solver for Stochastic Local Volatility
 FX Basket Option, Local Vol Model
 Solutions combining GPU and Algorithmic Differentiation
 More on that later….
 Training courses for CUDA and OpenCL
NAG and GPGPUs (NVIDIA)
22
NAG and Intel Xeon-Phi
 Relatively easy to take existing OpenMP based code and port to Phi
 Tuning for Phi takes some learning and expertise
 … but feedback into Xeon code is often very strong
 Performance Issues
 As always, need large enough problems to make the offload
worthwhile
 seems to have significant offload overheads
 NAG Library for Intel Xeon-Phi available
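The point about problem size can be made concrete with a toy cost model: host time grows linearly with n, while the offloaded run pays a fixed overhead and then runs faster. This is a rough sketch under that assumed model. The per-item cost, speedup and overhead below are hypothetical numbers, not measurements from the slides.

```python
def cpu_time(n, per_item):
    """Assumed host cost: linear in the problem size n."""
    return n * per_item


def offload_time(n, per_item, speedup, overhead):
    """Assumed accelerator cost: fixed offload overhead plus a faster linear term."""
    return overhead + n * per_item / speedup


def break_even_size(per_item, speedup, overhead):
    """Solve n * per_item == overhead + n * per_item / speedup for n."""
    return overhead / (per_item * (1.0 - 1.0 / speedup))


# Hypothetical parameters, for illustration only.
per_item = 1e-6   # seconds per item on the host
speedup = 4.0     # raw accelerator speedup once data is on the device
overhead = 0.3    # seconds of fixed offload cost

n_star = break_even_size(per_item, speedup, overhead)
for n in (n_star / 10, n_star * 10):
    print(n, cpu_time(n, per_item), offload_time(n, per_item, speedup, overhead))
```

Below the break-even size the fixed overhead dominates and the host wins. Above it the offload pays off.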
23
[Figure: NAG Distance Matrix (g03ea) – Intel Xeon Phi. Time (s) against problem size (n) for four series: 32 threads original, Phi offload original, Phi offload opt, 32 threads opt.]
 n=30k; m=3k
 Xeon 32t: 192s
 Xeon 32t*: 75.7s
 Phi 240t*: 40.6s
 Phi gain ~5x over
original or ~2x over
optimised
24
[Figure: Uniform RNG – NAG Mersenne Twister (g05sa) – Intel Xeon Phi. Time (s) against size of problem (n, log scale) for four series: 8 threads original, Native Phi original, Native Phi opt, 8 threads opt.]
 n=500m
 Xeon 8t: 0.25s
 Phi 240t: 1.50s
 Xeon 8t*: 0.22s
 Phi gain ~3x
25
NAG & AD: Algorithmic Differentiation in a nutshell
Computers can only add, subtract, multiply and divide
numbers.
 A computer program implementing a model is many
of these basic operations strung together
 Elementary to compute the derivatives of these
 Chain rule + basic derivatives = program derivative
 Classes, templates and operator overloading can do
this efficiently and non-intrusively
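The recipe above can be sketched with a small class that carries a value and its derivative through every arithmetic operation (forward, or tangent, mode). This is an illustrative sketch only, not the NAG or RWTH Aachen tooling. It uses Python classes and operator overloading where a C++ tool would also use templates, and the `Dual` class and `model` function are hypothetical names.

```python
class Dual:
    """A value together with its derivative with respect to one chosen input."""

    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    @staticmethod
    def lift(x):
        """Treat plain numbers as constants (derivative zero)."""
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.dot + o.dot)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.dot - o.dot)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        # Product rule.
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)

    __rmul__ = __mul__

    def __truediv__(self, other):
        o = Dual.lift(other)
        # Quotient rule.
        return Dual(self.val / o.val,
                    (self.dot * o.val - self.val * o.dot) / (o.val * o.val))

    def __rtruediv__(self, other):
        return Dual.lift(other) / self


def model(x, y):
    """Ordinary model code: only +, -, * and /. It is unaware of Dual."""
    return (x * y + 3.0) / (y - x * x)


# Seed dot=1 on the input we differentiate against.
dx = model(Dual(2.0, 1.0), Dual(5.0, 0.0))   # derivative with respect to x
dy = model(Dual(2.0, 0.0), Dual(5.0, 1.0))   # derivative with respect to y
print(dx.val, dx.dot, dy.dot)
```

The model function is untouched, which is what "non-intrusively" means here. Forward mode still needs one pass per input, which is the cost the adjoint mode discussed later avoids.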
26
To get acceleration look at the algorithms
[Figure: 5000 gradient evaluations of LIBOR Market Model*. Runtime (s) against number of inputs (size of Delta) for three series: using finite differences (bumping), using adjoints, 2nd-order adjoints (projected Hessian).]
*M.B. Giles and P. Glasserman, `Smoking adjoints: fast Monte Carlo Greeks', RISK, January 2006
27
Computing derivatives in finance is important…
 Calculating a product’s sensitivities to a range of risk
factors (a.k.a. Greeks) creates huge computational
demand on risk and price models
 Traditional approach: “bumping” (finite differences)
 Computationally very expensive... needs more hardware!
The alternatives to finite differences are
 Write derivative code by hand
 Efficient, but difficult to write & highly error prone (need to
develop original and adjoint models)
 Algorithmic Differentiation
 flexible and just develop the original model - obvious choice
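The cost difference between bumping and adjoints can be seen on a toy model. The sketch below is a hypothetical illustration rather than NAG's AD tool: it records operations on a small tape and sweeps it once in reverse, then compares the number of model evaluations with one-sided bumping. The `Var` class, the toy `model` and the inputs are all assumptions.

```python
class Var:
    """Tape node: value, adjoint, and (parent, local partial derivative) pairs."""
    tape = []

    def __init__(self, val, parents=()):
        self.val = val
        self.adj = 0.0
        self.parents = parents
        Var.tape.append(self)

    def __add__(self, other):
        return Var(self.val + other.val, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.val * other.val, ((self, other.val), (other, self.val)))


def model(v):
    """Toy 'pricer' built from + and * only; works on floats and on Var."""
    total = v[0] * v[0]
    for a, b in zip(v, v[1:]):
        total = total + a * b
    return total


calls = {"n": 0}


def counted_model(v):
    """Wrapper that counts how many times the model is evaluated."""
    calls["n"] += 1
    return model(v)


def bumped_gradient(xs, h=1e-6):
    """One-sided finite differences: one base run plus one run per input."""
    base = counted_model(xs)
    grad = []
    for i in range(len(xs)):
        bumped = list(xs)
        bumped[i] += h
        grad.append((counted_model(bumped) - base) / h)
    return grad


def adjoint_gradient(xs):
    """One recorded run, then one reverse sweep applying the chain rule."""
    Var.tape = []
    inputs = [Var(x) for x in xs]
    out = counted_model(inputs)
    out.adj = 1.0
    for node in reversed(Var.tape):
        for parent, partial in node.parents:
            parent.adj += partial * node.adj
    return [v.adj for v in inputs]


xs = [1.0, 2.0, 3.0, 4.0]   # hypothetical inputs

calls["n"] = 0
fd = bumped_gradient(xs)
bump_calls = calls["n"]

calls["n"] = 0
ad = adjoint_gradient(xs)
adjoint_calls = calls["n"]

print(bump_calls, adjoint_calls, ad)
```

Bumping needs one model run per input plus a base run, while the adjoint needs a single recorded run and a reverse sweep whatever the number of inputs. The price is the memory used to store the tape.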
28
NAG and AAD
 Adjoint Algorithmic Differentiation (AAD) reduces runtime
 With RWTH Aachen University (Prof. Uwe Naumann
et al.) NAG are delivering Algorithmic Differentiation
(AD) tools and services to the finance community for
C/C++/CUDA codes.
 Our example codes include
 LIBOR Market Model
 PDE based Local Volatility model
 GPU accelerated Local Vol Basket Option pricer
 Our solutions deliver for accelerators
29
A few numbers
Monte Carlo
  n     f      cfd        AD               AD/f     cfd/AD
  34    0.5s   29.0s      3.0s (2.5MB)     6.0x     9.7x
  62    0.7s   80.9s      5.1s (3.2MB)     7.3x     15.9x
  142   1.5s   423.5s     12.4s (5.1MB)    8.3x     34.2x
  222   2.3s   1010.7s    24.4s (7.1MB)    10.6x    41.4x
PDE
  n     f      cfd        AD               AD/f     cfd/AD
  34    0.6s   37.7s      11.6s (535MB)    19.3x    3.3x
  62    1.0s   119.5s     18.7s (919MB)    18.7x    6.4x
  142   2.6s   741.2s     39s (2GB)        15.0x    19x
  222   4.1s   1857.3s    60s (3GB)        14.6x    31x
30
AAD and GPUs
 “AAD Vs GPUs: banks turn to maths tricks as chips lose
appeal” risk.net, Jan 2015 – NONSENSE… surely
combining AAD and GPUs makes the ultimate
accelerator!
 “…. Join the AAD revolution” risk.net, July 15 – making
a lot more sense…
“In computational finance, there is no silver bullet. AAD is an
algorithmic advance… …GPUs are parallel compute
accelerations. The two are complementary” J. Ashley, IBM
 NAG is already delivering “combined” solutions to
our clients (in FSI and other sectors)
31
 NAG Introduction
 NAG for HPC Finance
 Why Quants Love NAG
 Accelerators
 NVIDIA
 Intel Xeon-Phi
 Algorithmic Differentiation
 Summary
Agenda
32
 NAG is keen to collaborate in building models and
risk engines
 Requirements are likely to be varied across FSI
 We want to make sure we have what you need
 The importance of HPC Finance is growing and will
involve a LOT of computation (Basel III, CVA,…)
 NAG has significant experience in HPC libraries, services,
consulting and training
 We know how to do large scale computations efficiently
 This is non-trivial! Our expertise has been sought out and
exploited by organisations such as AMD, HECToR,
Microsoft, Oracle, major banks, major oil & gas cos,…
HPC Finance - Summary
33
 www.nag.co.uk
 AD explained
http://www.nag.co.uk/pss/nag-and-algorithmic-differentiation
 Adjoint Algorithmic Differentiation Tool Support for Typical
Numerical Patterns in Computational Finance
http://www.nag.co.uk/doc/techrep/pdf/tr3_14.pdf
 Adjoint Algorithmic Differentiation of a GPU Accelerated
Application
http://www.nag.co.uk/Market/articles/adjoint-algorithmic-differentiation-of-gpu-accelerated-app.pdf
References

ISC Frankfurt 2015: Good, bad and ugly of accelerators and a complementary path
