SlideShare a Scribd company logo
Estimating Financial
Risk with Spark
Sandy Ryza
Senior Data Scientist, Cloudera
In reasonable
circumstances, what’s
the most you can expect
to lose?
def valueAtRisk(
portfolio,
timePeriod,
pValue
): Double = { ... }
def valueAtRisk(
portfolio,
2 weeks,
.05
): = $1,000,000
Probability
density
Portfolio return ($) over time period
VaR estimation
approaches
• Variance-covariance
• Historical
• Monte Carlo
Market Risk Factors
• Indexes (S&P 500, NASDAQ)
• Prices of commodities
• Currency exchange rates
• Treasury bonds
Predicting Instrument Returns from
Factor Returns
• Train a linear model on the factors for each
instrument
Fancier
• Add features that are non-linear transformations of
the market risk factors
• Decision trees
• For options, use Black-Scholes
import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression
// Load the instruments and factors
val factorReturns: Array[Array[Double]] = ...
val instrumentReturns: RDD[Array[Double]] = ...
// Fit a model to each instrument
val models: Array[Array[Double]] =
instrumentReturns.map { instrument =>
val regression = new OLSMultipleLinearRegression()
regression.newSampleData(instrument, factorReturns)
regression.estimateRegressionParameters()
}.collect()
How to sample factor
returns?
• Need to be able to generate sample vectors where
each component is a factor return.
• Factors returns are usually correlated.
Distribution of US treasury bond two-
week returns
Distribution of crude oil two-week
returns
The Multivariate Normal Distribution
• Probability distribution over vectors of length N
• Given all the variables but one, that variable is
distributed according to a univariate normal
distribution
• Models correlations between variables
import org.apache.commons.math3.stat.correlation.Covariance
// Compute means
val factorMeans: Array[Double] = transpose(factorReturns)
.map(factor => factor.sum / factor.size)
// Compute covariances
val factorCovs: Array[Array[Double]] = new Covariance(factorReturns)
.getCovarianceMatrix().getData()
Fancier
• Multivariate normal often a poor choice compared to
more sophisticated options
• Fatter tails: Multivariate T Distribution
• Filtered historical simulation
• ARMA
• GARCH
Running the Simulations
• Create an RDD of seeds
• Use each seed to generate a set of simulations
• Aggregate results
def trialReturn(factorDist: MultivariateNormalDistribution, models: Seq[Array[Double]]): Double = {
val trialFactorReturns = factorDist.sample()
var totalReturn = 0.0
for (model <- models) {
// Add the returns from the instrument to the total trial return
for (i <- until trialFactorsReturns.length) {
totalReturn += trialFactorReturns(i) * model(i)
}
}
totalReturn
}
// Broadcast the factor return -> instrument return models
val bModels = sc.broadcast(models)
// Generate a seed for each task
val seeds = (baseSeed until baseSeed + parallelism)
val seedRdd = sc.parallelize(seeds, parallelism)
// Create an RDD of trials
val trialReturns: RDD[Double] = seedRdd.flatMap { seed =>
trialReturns(seed, trialsPerTask, bModels.value, factorMeans, factorCovs)
}
Executor
Executor
Time
Model
parameters
Model
parameters
Executor
Task Task
Trial Trial TrialTrial
Task Task
Trial Trial Trial Trial
Executor
Task Task
Trial Trial TrialTrial
Task Task
Trial Trial Trial Trial
Time
Model
parameters
Model
parameters
// Cache for reuse
trialReturns.cache()
val numTrialReturns = trialReturns.count().toInt
// Compute value at risk
val valueAtRisk = trials.takeOrdered(numTrialReturns / 20).last
// Compute expected shortfall
val expectedShortfall =
trials.takeOrdered(numTrialReturns / 20).sum / (numTrialReturns / 20)
VaR
So why Spark?
Easier to use
• Scala and Python REPLs
• Single platform for
• Cleaning data
• Fitting models
• Running simulations
• Analyzing results
New powers
• Save full simulation-loss matrix in memory (or disk)
• Run deeper analyses
• Join with other datasets
But it’s CPU bound and
we’re using Java?
• Computational bottlenecks are normally in matrix
operations, which can be BLAS-ified
• Can call out to GPUs just like in C++
• Memory access patterns aren’t high-GC inducing
Want to do this
yourself?
spark-finance
• https://github.com/cloudera/spark-finance
• Everything here + the fancier stuff
• Patches welcome!
Estimating Value at Risk with Spark

More Related Content

Similar to Estimating Value at Risk with Spark

wk5ppt2_Iris
wk5ppt2_Iriswk5ppt2_Iris
wk5ppt2_Iris
AliciaWei1
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning Service
Lviv Startup Club
 
Unit test candidate solutions
Unit test candidate solutionsUnit test candidate solutions
Unit test candidate solutionsbenewu
 
Dive into DevOps | March, Building with Terraform, Volodymyr Tsap
Dive into DevOps | March, Building with Terraform, Volodymyr TsapDive into DevOps | March, Building with Terraform, Volodymyr Tsap
Dive into DevOps | March, Building with Terraform, Volodymyr Tsap
Provectus
 
Visual diagnostics at scale
Visual diagnostics at scaleVisual diagnostics at scale
Visual diagnostics at scale
Rebecca Bilbro
 
Training Large-scale Ad Ranking Models in Spark
Training Large-scale Ad Ranking Models in SparkTraining Large-scale Ad Ranking Models in Spark
Training Large-scale Ad Ranking Models in Spark
Patrick Pletscher
 
Reactive programming every day
Reactive programming every dayReactive programming every day
Reactive programming every day
Vadym Khondar
 
An Introduction to Property Based Testing
An Introduction to Property Based TestingAn Introduction to Property Based Testing
An Introduction to Property Based Testing
C4Media
 
Building calloutswithoutwsdl2apex
Building calloutswithoutwsdl2apexBuilding calloutswithoutwsdl2apex
Building calloutswithoutwsdl2apex
Ming Yuan
 
Android Automated Testing
Android Automated TestingAndroid Automated Testing
Android Automated Testing
roisagiv
 
Unsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at ScaleUnsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at Scale
Aaron (Ari) Bornstein
 
Testing in those hard to reach places
Testing in those hard to reach placesTesting in those hard to reach places
Testing in those hard to reach places
dn
 
Real world cross-platform testing
Real world cross-platform testingReal world cross-platform testing
Real world cross-platform testingPeter Edwards
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Yao Yao
 
Analytics with Spark
Analytics with SparkAnalytics with Spark
Analytics with Spark
Probst Ludwine
 
Workshop 23: ReactJS, React & Redux testing
Workshop 23: ReactJS, React & Redux testingWorkshop 23: ReactJS, React & Redux testing
Workshop 23: ReactJS, React & Redux testing
Visual Engineering
 
Test in action week 3
Test in action   week 3Test in action   week 3
Test in action week 3Yi-Huan Chan
 
Converting R to PMML
Converting R to PMMLConverting R to PMML
Converting R to PMML
Villu Ruusmann
 
Azure machine learning service
Azure machine learning serviceAzure machine learning service
Azure machine learning service
Ruth Yakubu
 

Similar to Estimating Value at Risk with Spark (20)

wk5ppt2_Iris
wk5ppt2_Iriswk5ppt2_Iris
wk5ppt2_Iris
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning Service
 
Unit test candidate solutions
Unit test candidate solutionsUnit test candidate solutions
Unit test candidate solutions
 
Dive into DevOps | March, Building with Terraform, Volodymyr Tsap
Dive into DevOps | March, Building with Terraform, Volodymyr TsapDive into DevOps | March, Building with Terraform, Volodymyr Tsap
Dive into DevOps | March, Building with Terraform, Volodymyr Tsap
 
Visual diagnostics at scale
Visual diagnostics at scaleVisual diagnostics at scale
Visual diagnostics at scale
 
Training Large-scale Ad Ranking Models in Spark
Training Large-scale Ad Ranking Models in SparkTraining Large-scale Ad Ranking Models in Spark
Training Large-scale Ad Ranking Models in Spark
 
Reactive programming every day
Reactive programming every dayReactive programming every day
Reactive programming every day
 
An Introduction to Property Based Testing
An Introduction to Property Based TestingAn Introduction to Property Based Testing
An Introduction to Property Based Testing
 
Building calloutswithoutwsdl2apex
Building calloutswithoutwsdl2apexBuilding calloutswithoutwsdl2apex
Building calloutswithoutwsdl2apex
 
ppopoff
ppopoffppopoff
ppopoff
 
Android Automated Testing
Android Automated TestingAndroid Automated Testing
Android Automated Testing
 
Unsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at ScaleUnsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at Scale
 
Testing in those hard to reach places
Testing in those hard to reach placesTesting in those hard to reach places
Testing in those hard to reach places
 
Real world cross-platform testing
Real world cross-platform testingReal world cross-platform testing
Real world cross-platform testing
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
 
Analytics with Spark
Analytics with SparkAnalytics with Spark
Analytics with Spark
 
Workshop 23: ReactJS, React & Redux testing
Workshop 23: ReactJS, React & Redux testingWorkshop 23: ReactJS, React & Redux testing
Workshop 23: ReactJS, React & Redux testing
 
Test in action week 3
Test in action   week 3Test in action   week 3
Test in action week 3
 
Converting R to PMML
Converting R to PMMLConverting R to PMML
Converting R to PMML
 
Azure machine learning service
Azure machine learning serviceAzure machine learning service
Azure machine learning service
 

Recently uploaded

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 

Recently uploaded (20)

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 

Estimating Value at Risk with Spark