Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forecasts
Kyle Foreman, PhD
07 June 2016
Simulations in Spark
• Context
• Motivation
• SimBuilder
• Backends
• Benchmarks
• Discussion
What is IHME?
• Independent research center at the
University of Washington
• Core funding by Bill & Melinda Gates
Foundation and State of Washington
• ~300 faculty, researchers, students and
staff
• Providing rigorous, scientific measurement
o What are the world’s major health problems?
o How well is society addressing these problems?
o How should we best dedicate resources to
improving health?
Goal: improve health by providing the best information on population health
Global Burden of Disease
• A systematic scientific effort to
quantify the comparative magnitude
of health loss due to diseases,
injuries and risk factors
o 188 countries
o 300+ diseases
o 1990 – 2015+
o 50+ risk factors
GBD Collaborative Model
1,100 experts in 106 countries
[Figure: Death Rate in 2013 (age-standardized)]
[Figure: Childhood Death Rate in 2013]
[Figure: DALYs in High Income Countries (2010)]
[Figure: DALYs in Low & Middle Income Countries (2010)]
Forecasting the Burden of Disease
1. Generate a baseline scenario projecting the GBD 25 years into the future
o Mortality, morbidity, population, and risk factors
o Every country, cause, sex, age
2. Create a comprehensive simulation framework to assess alternative
scenarios
o Capture the complex interrelationships between risk factors, interventions, and
diseases to explore the effects of changes to the system
3. Build a modular and flexible platform that can incorporate new models to
answer detailed “what if?” questions
o E.g. introduction of a new vaccine, effects of global warming, scale-up of ART
coverage, risk of global pandemics, etc.
Simulations
• Takes into account interdependencies between risk factors,
diseases, etc.
o Use draws of model parameters to advance from t to t+1, allowing for
forecasts of other quantities (e.g. risk factors and mortality) to interact
• Modular structure allows for detailed “what ifs”
o E.g. scale up of coverage of a new intervention
• Allows us to incorporate alternative models to test sensitivity to model
choice and specification
Basic Simulation Process
1. Fit statistical models (PyMB) capturing most important
relationships as statistical parameters
2. Generate many draws from those parameters, reflecting
their uncertainty and correlation structure
3. Run Monte Carlo simulations to update each quantity
over time, taking into account its dependencies
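The three steps above can be sketched in a few lines of numpy. This is only an illustration under stated assumptions: the parameter means, the covariance matrix, the starting value, and the toy growth rule are all made up here and are not the fitted IHME models.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Step 1 stand-in: pretend a fitted model returned these estimates
# (hypothetical mean and covariance for two parameters, alpha and beta).
param_mean = np.array([0.02, -0.001])
param_cov = np.array([[1.0e-4, -2.0e-6],
                      [-2.0e-6, 1.0e-6]])

n_draws, n_years = 1000, 25

# Step 2: draw the parameters jointly so their correlation is preserved.
draws = rng.multivariate_normal(param_mean, param_cov, size=n_draws)
alpha, beta = draws[:, 0], draws[:, 1]

# Step 3: advance every draw from t to t+1 with a toy growth rule.
gdp = np.empty((n_draws, n_years + 1))
gdp[:, 0] = 100.0  # made-up starting value shared by all draws
for t in range(n_years):
    growth = alpha + beta * np.log(gdp[:, t])
    gdp[:, t + 1] = gdp[:, t] * np.exp(growth)

# Summarize the uncertainty in the final year across draws.
lower, median, upper = np.percentile(gdp[:, -1], [2.5, 50, 97.5])
```

Each row of `gdp` is one Monte Carlo trajectory, so the spread across rows in any year reflects the parameter uncertainty carried forward through time.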
SimBuilder
• Directed Acyclic Graph (DAG) construction and validation
o yaml and sympy specification
o curses interface
o graphviz visualization
• Flexible execution backends
o pandas
o pyspark
Example Global Health DAG
[Figure: DAG linking GDP, Deaths, Births, Population, and Workforce nodes for 2016 and 2017]
Creating a DAG with SimBuilder
1. Create a YAML for the overall simulation
o Specify child models and the dimensions of the simulation
2. Build separate YAML files for each child model
o Dimensions along which the model varies
o Location of historical data
o Expression for how to generate the quantity in t+1
3. Run SimBuilder to validate and explore DAG
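The validation in step 3 can be pictured as unrolling the per-model dependencies over time and checking that an execution order exists. The sketch below is not SimBuilder's code: it uses Python's standard `graphlib`, and the `DEPENDENCIES` table, `unroll`, and `validate` are hypothetical names. The lags are taken from the gdp and gdp_per_capita expressions in this deck; pop and workforce are treated as inputs with no parents.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical dependency table: model -> list of (parent model, time lag).
DEPENDENCIES = {
    "gdp_per_capita": [("gdp", 0), ("pop", 1)],
    "gdp": [("gdp", 1), ("gdp_per_capita", 1), ("workforce", 0)],
}

def unroll(dependencies, n_steps):
    """Expand per-model dependencies into one node per model and time step."""
    graph = {}
    for t in range(n_steps):
        for child, parents in dependencies.items():
            graph[f"{child}{{{t}}}"] = {
                f"{parent}{{{t - lag}}}"
                for parent, lag in parents
                if t - lag >= 0  # earlier values would come from historical data
            }
    return graph

def validate(graph):
    """Return a valid execution order, or raise CycleError if none exists."""
    return list(TopologicalSorter(graph).static_order())

order = validate(unroll(DEPENDENCIES, n_steps=2))
```

A model that depended on itself at lag 0 would make `validate` raise `CycleError`, which is the kind of specification mistake a validation pass is meant to catch.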
Creating a DAG with SimBuilder
sim.yaml
    name: gdp_pop
    models:
      - name: gdp_per_capita
      - name: workforce
      - name: mortality
      - name: gdp
      - name: pop
    global_parameters:
      - name: loc
        set: List
        values: ["USA", "MEX", "CAN", ..., ]
      - name: t
        set: Range
        start: 2014
        end: 2040
      - name: draw
        set: Range
        start: 0
        end: 999
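To see how the global parameters in sim.yaml determine the size of the simulation, they can be expanded into axes. The sketch below mirrors the YAML as a Python literal rather than reading the file. It assumes that a Range includes its `end` value (so draws 0 to 999 give 1000 draws) and uses only the three location codes shown, since the full list is elided in the slide.

```python
from math import prod

# Mirrors global_parameters from sim.yaml; loc is limited to the codes shown.
global_parameters = [
    {"name": "loc", "set": "List", "values": ["USA", "MEX", "CAN"]},
    {"name": "t", "set": "Range", "start": 2014, "end": 2040},
    {"name": "draw", "set": "Range", "start": 0, "end": 999},
]

def axis_values(param):
    """Expand one global parameter into its list of values."""
    if param["set"] == "List":
        return list(param["values"])
    if param["set"] == "Range":
        # Assumption: "end" is inclusive.
        return list(range(param["start"], param["end"] + 1))
    raise ValueError(f"unknown set type: {param['set']}")

axes = {p["name"]: axis_values(p) for p in global_parameters}
shape = {name: len(values) for name, values in axes.items()}
n_cells = prod(shape.values())  # values per node that spans every axis
```

Under these assumptions a node that varies over all three axes holds 3 x 27 x 1000 = 81,000 values.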
Creating a DAG with SimBuilder
pop.yaml
    name: pop
    version: 0.1
    variables: [loc, t]
    history:
      - type: csv
        path: pop.csv
Creating a DAG with SimBuilder
gdp_per_capita.yaml
    name: gdp_per_capita
    version: 0.1
    expr: gdp(draw, t, loc)/pop(draw, t-1, loc)
    variables: [draw, t, loc]
    history:
      - type: csv
        path: gdp_pc_w_draw.csv
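An `expr` such as the one for gdp_per_capita already encodes the DAG edges: each call names a parent model, and its time argument gives the lag. SimBuilder uses sympy for this; the sketch below swaps in Python's standard `ast` module as a stand-in so that it runs without extra packages. `find_dependencies` is a hypothetical helper, not part of SimBuilder.

```python
import ast

def find_dependencies(expr, time_var="t"):
    """Return (model, lag) pairs for every call such as pop(draw, t-1, loc)."""
    deps = []
    for node in ast.walk(ast.parse(expr, mode="eval")):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        lag = None  # stays None when the call has no time argument
        for arg in node.args:
            if isinstance(arg, ast.Name) and arg.id == time_var:
                lag = 0  # plain "t": same time step
            elif (isinstance(arg, ast.BinOp) and isinstance(arg.op, ast.Sub)
                  and isinstance(arg.left, ast.Name) and arg.left.id == time_var
                  and isinstance(arg.right, ast.Constant)):
                lag = arg.right.value  # "t-k": k steps back
        deps.append((node.func.id, lag))
    return sorted(deps, key=lambda dep: dep[0])

deps = find_dependencies("gdp(draw, t, loc)/pop(draw, t-1, loc)")
```

Here `deps` lists gdp at lag 0 and pop at lag 1. Math functions such as `exp` and `log` would also appear (with lag `None`), so a real implementation would need to filter them out.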
Creating a DAG with SimBuilder
gdp.yaml
    name: gdp
    version: 0.1
    expr: gdp(draw, loc, t-1) * exp(
        alpha_0(loc, t, draw) +
        beta_gdp(draw) * log(gdp_per_capita(draw, loc, t-1)) +
        beta_pop(draw, t) * workforce(loc, t)
      )
    variables: [draw, loc, t]
    history:
      - type: csv
        path: gdp_w_draw.csv
    model_parameters:
      - name: alpha_0
        variables: [draw, loc, t]
        history:
          - type: csv
            path: gdp/alpha_0.csv
      - name: beta_gdp
        variables: [draw]
        history:
          - type: csv
            path: gdp/beta_gdp.csv
      - name: beta_pop
        variables: [draw, t]
        history:
          - type: csv
            path: gdp/beta_pop.csv
DAG
[Figure: DAG at time step 0. Model nodes: gdp{0}, gdp_per_capita{0}, workforce{0}, pop{0}, mortality{0}. Parameter nodes: gdp_alpha_0{0}, gdp_beta_gdp, gdp_beta_pop{0}, mortality_alpha_0{0}, mortality_beta_gdp, mortality_beta_pop{0}.]
[Figure: DAG extended to time step 1, adding gdp{1}, gdp_per_capita{1}, workforce{1}, pop{1}, mortality{1}, gdp_alpha_0{1}, gdp_beta_pop{1}, mortality_alpha_0{1}, mortality_beta_pop{1}. gdp_beta_gdp and mortality_beta_gdp carry no time index.]
[Figure: DAG extended through time step 4, repeating the same node pattern for steps 2, 3, and 4.]
SimBuilder Backends
• SimBuilder traverses the DAG
o Backends plug in via an API
• Backends provide
o Data loading strategy
o Methods to combine data from different nodes to
calculate new nodes
o Method to save computed data
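One way to picture this split between traversal and backend is an abstract interface with the three responsibilities listed above. The class and method names below (`Backend`, `load`, `combine`, `save`, `DictBackend`, `run`) are hypothetical and chosen for illustration; the slides do not show SimBuilder's actual API.

```python
from abc import ABC, abstractmethod

class Backend(ABC):
    """Hypothetical plugin interface; method names are illustrative only."""

    @abstractmethod
    def load(self, node):
        """Load historical data for a node."""

    @abstractmethod
    def combine(self, func, parents):
        """Compute a new node from its parents' data."""

    @abstractmethod
    def save(self, node, data):
        """Persist computed data for a node."""

class DictBackend(Backend):
    """Toy in-memory backend that stores plain Python numbers."""

    def __init__(self, history):
        self.history = dict(history)
        self.saved = {}

    def load(self, node):
        return self.history[node]

    def combine(self, func, parents):
        return func(*(self.saved[p] for p in parents))

    def save(self, node, data):
        self.saved[node] = data

def run(backend, order, rules):
    """Visit nodes in dependency order, delegating all data work to the backend."""
    for node in order:
        if node in rules:
            func, parents = rules[node]
            backend.save(node, backend.combine(func, parents))
        else:
            backend.save(node, backend.load(node))
    return backend.saved

backend = DictBackend({"gdp": 100.0, "pop": 10.0})
result = run(backend, ["gdp", "pop", "gdp_per_capita"],
             {"gdp_per_capita": (lambda g, p: g / p, ["gdp", "pop"])})
```

With this shape, the traversal code in `run` never touches the data representation, so a pandas or Spark backend could be swapped in by implementing the same three methods.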
Local Pandas Backend
• Useful for debugging and small simulations
• Master process maintains list of Model objects:
o pandas DataFrames of all model data
─ Indexed by global variables
o sympy expression for how to calculate t+1
─ Vectorized when possible, fallback to row-wise apply
• Simulating over time traverses DAG to join necessary
DataFrames and compute results to fill in new columns
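A minimal pandas sketch of one such step, computing gdp_per_capita from gdp(t) and pop(t-1) as in the example expression. The numbers are invented, and the lag handling shown (shifting the year column before the join) is one possible approach, not necessarily what SimBuilder does.

```python
import pandas as pd

index_cols = ["loc", "t", "draw"]

# gdp varies by location, year, and draw (values are made up).
gdp = pd.DataFrame({
    "loc": ["USA", "USA", "MEX", "MEX"],
    "t": [2015, 2015, 2015, 2015],
    "draw": [0, 1, 0, 1],
    "gdp": [100.0, 110.0, 50.0, 55.0],
}).set_index(index_cols)

# Population has no draw dimension and is needed with a one-year lag.
pop = pd.DataFrame({
    "loc": ["USA", "MEX"],
    "t": [2014, 2014],
    "pop": [10.0, 5.0],
}).set_index(["loc", "t"])

# Shift population forward one year so that pop(t-1) lines up with gdp(t).
pop_lagged = pop.reset_index()
pop_lagged["t"] = pop_lagged["t"] + 1

# Join the parents on their shared dimensions, then compute in one vectorized step.
joined = gdp.reset_index().merge(pop_lagged, on=["loc", "t"], how="left")
joined["gdp_per_capita"] = joined["gdp"] / joined["pop"]
gdp_per_capita = joined.set_index(index_cols)["gdp_per_capita"]
```

The join repeats each population value across draws, which is the alignment work that becomes the dominant cost when the same idea is moved to Spark DataFrames.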
Spark DataFrame Backend
• Similar to local backend, swapping in Spark’s DataFrame
for Pandas’
• Spark context maintains list of Model objects:
o pyspark DataFrames of all model data
─ Columns for each global variable and
o sympy expression for how to calculate t+1
─ Calculated using row-wise apply
• Joins DataFrames where necessary
Spark + Numpy Backend
• Uses Spark to distribute and schedule the execution of
vectorized numpy operations
• One RDD for each node and year:
o Containing an ndarray and index vector for each axis
o Can optionally partition the data array into key-value pairs
along one or more axes (similar to bolt)
• To calculate a new node, all the parent RDDs are
unioned and then reduced into a new child RDD
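The union-then-reduce pattern can be emulated locally without Spark, which makes the data flow easier to follow. In the sketch below each "RDD" is a plain list of key-value pairs partitioned by location, and `union` and `reduce_by_key` are local stand-ins for PySpark's `RDD.union` and `RDD.reduceByKey`. Whether SimBuilder uses exactly those calls is an assumption; the arrays are invented.

```python
from functools import reduce
import numpy as np

# Each emulated RDD holds (partition key, {node name: ndarray}) records.
# Arrays here have shape (n_draws,), one value per draw.
gdp_rdd = [("USA", {"gdp": np.array([100.0, 110.0])}),
           ("MEX", {"gdp": np.array([50.0, 55.0])})]
pop_rdd = [("USA", {"pop": np.array([10.0, 10.0])}),
           ("MEX", {"pop": np.array([5.0, 5.0])})]

def union(*rdds):
    """Concatenate the parents' records into one collection."""
    return [record for rdd in rdds for record in rdd]

def reduce_by_key(records, merge):
    """Group records by key and fold each group's values together."""
    groups = {}
    for key, value in records:
        groups.setdefault(key, []).append(value)
    return {key: reduce(merge, values) for key, values in groups.items()}

def merge_parents(left, right):
    """Combine two {node: array} dicts; associative, as a reducer must be."""
    return {**left, **right}

# Gather all parents of the child node under each key, then compute with numpy.
parents = reduce_by_key(union(gdp_rdd, pop_rdd), merge_parents)
gdp_per_capita = {loc: p["gdp"] / p["pop"] for loc, p in parents.items()}
```

Once the parents for a key sit side by side, the actual calculation is a single vectorized numpy operation with no join on dimension columns.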
Benchmarking
• Single executor (8 cores, 32GB)
• Synthetic data
o 65 nodes in DAG
o 3 countries x 20 ages x 2 sexes x N draws
o Forecasted forward 25 years
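A rough size estimate for this setup can be worked out directly. The sketch assumes that every node spans every dimension and that each value is a float64, which makes it an upper bound. N is not given in the slides; 1000 is used only as an example, matching the draw range 0 to 999 in sim.yaml.

```python
def benchmark_size(n_draws, n_countries=3, n_ages=20, n_sexes=2,
                   n_nodes=65, n_years=25, bytes_per_value=8):
    """Upper-bound value count and float64 footprint for the synthetic benchmark."""
    per_node_year = n_countries * n_ages * n_sexes * n_draws
    total_values = per_node_year * n_nodes * n_years
    return per_node_year, total_values, total_values * bytes_per_value

per_node_year, total_values, total_bytes = benchmark_size(n_draws=1000)
```

With N = 1000 this gives 120,000 values per node per year, 195 million values over 65 nodes and 25 years, and about 1.56 GB as float64.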
Benchmarks
[Figures: two benchmark charts]
Limitations of Spark DataFrame
• Most of the execution time is spent joining DataFrames
and aligning the dimension columns
• These times were achieved after careful tuning of
partitions
o Perhaps custom partitioners could reduce runtime further
Taking Advantage of Numpy
• For multidimensional datasets of consistent shape and
size, simply aligning indices can eliminate the overhead
of joins
o Including generalizations to axes of size 1 to simplify coding
• Numpy’s vectorized operations are highly tuned and
scale well
o Including multithreading when e.g. compiled against MKL
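The idea of aligning indices instead of joining can be shown with numpy broadcasting. In this sketch every array uses the same axis order (draw, loc, t), and a quantity that does not vary along an axis gets size 1 there. The axis order, sizes, and random values are assumptions made for the example.

```python
import numpy as np

n_draw, n_loc, n_t = 4, 3, 5
rng = np.random.default_rng(seed=0)

# Axis order is fixed as (draw, loc, t); missing dimensions become size 1.
gdp = rng.uniform(50.0, 150.0, size=(n_draw, n_loc, n_t))  # varies on all axes
pop = rng.uniform(1.0, 10.0, size=(1, n_loc, n_t))         # no draw dimension
beta = rng.normal(0.0, 0.1, size=(n_draw, 1, 1))           # varies by draw only

# No join needed: broadcasting stretches the size-1 axes to match.
gdp_per_capita = gdp / pop
scaled = gdp_per_capita * np.exp(beta)
```

Both results have the full shape (4, 3, 5), and the population values are never physically repeated across draws.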
Future Development
• Experiment with better partitioning of Numpy RDDs
o Perhaps use bolt?
• Improve partial graph execution
o Executing the entire DAG at once often crashes the driver
o Executing each year’s partial DAG in sequence results in idle CPU
time towards the end of that DAG
• Investigate more efficient Spark DataFrame joining
methods for Panel data
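The idle-CPU issue with year-by-year execution can be seen by grouping a partial DAG into waves of nodes that are ready at the same time. The sketch uses Python's standard `graphlib`; `execution_waves` is a hypothetical helper and the edges are a simplified subset of the gdp example.

```python
from graphlib import TopologicalSorter

def execution_waves(graph):
    """Group nodes into waves; every node in a wave has all parents finished."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves

# One year's partial DAG: node -> set of parents (edges simplified).
year_graph = {
    "gdp{1}": {"gdp{0}", "gdp_per_capita{0}", "workforce{1}"},
    "gdp_per_capita{1}": {"gdp{1}", "pop{0}"},
}
waves = execution_waves(year_graph)
```

The first wave contains four independent nodes, while the last two waves contain one node each, so most cores would sit idle near the end of the year's DAG unless work from the next year can start early.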
Team
Kyle Heuton
krheuton@uw.edu
Chun-Wei Yuan
cwyuan@uw.edu
Kyle Foreman
kfor@uw.edu
healthdata.org

More Related Content

What's hot

Data Visualization for Management Consultants & Analyst
Data Visualization for Management Consultants & AnalystData Visualization for Management Consultants & Analyst
Data Visualization for Management Consultants & Analyst
Asen Gyczew
 
How to get into management consulting
How to get into management consultingHow to get into management consulting
How to get into management consulting
Asen Gyczew
 
Integrating Business Rules and Business Processes
Integrating Business Rules and Business ProcessesIntegrating Business Rules and Business Processes
Integrating Business Rules and Business Processes
Michael zur Muehlen
 
Tice - HR & Sales Partnership: Developin High-Performance Sales People
Tice - HR & Sales Partnership:  Developin High-Performance Sales PeopleTice - HR & Sales Partnership:  Developin High-Performance Sales People
Tice - HR & Sales Partnership: Developin High-Performance Sales PeopleHR Florida State Council, Inc.
 
The Business Model of Consulting is Dead
The Business Model of Consulting is DeadThe Business Model of Consulting is Dead
The Business Model of Consulting is Dead
Patrick Van der Pijl
 
Silicon Valley’s Tools for Translating Startup Ideas into Billion Dollar Comp...
Silicon Valley’s Tools for Translating Startup Ideas into Billion Dollar Comp...Silicon Valley’s Tools for Translating Startup Ideas into Billion Dollar Comp...
Silicon Valley’s Tools for Translating Startup Ideas into Billion Dollar Comp...
Rod King, Ph.D.
 
What Makes a Great Sales Playbook?
What Makes a Great Sales Playbook?What Makes a Great Sales Playbook?
What Makes a Great Sales Playbook?
Carrie Morgan
 
Kellogg The Top 5 Scale-Up Mistakes.pdf
Kellogg The Top 5 Scale-Up Mistakes.pdfKellogg The Top 5 Scale-Up Mistakes.pdf
Kellogg The Top 5 Scale-Up Mistakes.pdf
Dave Kellogg
 
Summary of 'The Mom Test' (v2 2013-11-05)
Summary of 'The Mom Test' (v2 2013-11-05)Summary of 'The Mom Test' (v2 2013-11-05)
Summary of 'The Mom Test' (v2 2013-11-05)
Max Völkel
 
E-mail Restart 2023: Jakub Malý - Gamifikace, omnichannel komunikace a kineti...
E-mail Restart 2023: Jakub Malý - Gamifikace, omnichannel komunikace a kineti...E-mail Restart 2023: Jakub Malý - Gamifikace, omnichannel komunikace a kineti...
E-mail Restart 2023: Jakub Malý - Gamifikace, omnichannel komunikace a kineti...
Taste
 
Essential Finance & Accounting for Management Consultants and Business Analysts
Essential Finance & Accounting for Management Consultants and Business AnalystsEssential Finance & Accounting for Management Consultants and Business Analysts
Essential Finance & Accounting for Management Consultants and Business Analysts
Asen Gyczew
 
Essential Real Estate Modeling in Excel
Essential Real Estate Modeling in ExcelEssential Real Estate Modeling in Excel
Essential Real Estate Modeling in Excel
Asen Gyczew
 
Essential Excel for Business Analysts and Consultants
Essential Excel for Business Analysts and ConsultantsEssential Excel for Business Analysts and Consultants
Essential Excel for Business Analysts and Consultants
Asen Gyczew
 
Zero to One - Book Summary Report
Zero to One - Book Summary ReportZero to One - Book Summary Report
Zero to One - Book Summary Report
Corey O'Neal
 
Top Courses for Business Analysts
Top Courses for Business AnalystsTop Courses for Business Analysts
Top Courses for Business Analysts
Asen Gyczew
 
E-mail Restart 2023: Ondřej Kužílek - Budování značky v e-mail marketingu
E-mail Restart 2023: Ondřej Kužílek - Budování značky v e-mail marketinguE-mail Restart 2023: Ondřej Kužílek - Budování značky v e-mail marketingu
E-mail Restart 2023: Ondřej Kužílek - Budování značky v e-mail marketingu
Taste
 
A Primer on Primary Market Research
A Primer on Primary Market ResearchA Primer on Primary Market Research
A Primer on Primary Market Research
Elaine Chen
 
Lean New Product & Process Development
Lean New Product & Process DevelopmentLean New Product & Process Development
Lean New Product & Process Development
ICEES Global Private Limited
 
Liquidity Management for Management Consultants & Managers
Liquidity Management for Management Consultants & ManagersLiquidity Management for Management Consultants & Managers
Liquidity Management for Management Consultants & Managers
Asen Gyczew
 

What's hot (20)

Data Visualization for Management Consultants & Analyst
Data Visualization for Management Consultants & AnalystData Visualization for Management Consultants & Analyst
Data Visualization for Management Consultants & Analyst
 
How to get into management consulting
How to get into management consultingHow to get into management consulting
How to get into management consulting
 
Integrating Business Rules and Business Processes
Integrating Business Rules and Business ProcessesIntegrating Business Rules and Business Processes
Integrating Business Rules and Business Processes
 
Tice - HR & Sales Partnership: Developin High-Performance Sales People
Tice - HR & Sales Partnership:  Developin High-Performance Sales PeopleTice - HR & Sales Partnership:  Developin High-Performance Sales People
Tice - HR & Sales Partnership: Developin High-Performance Sales People
 
The Business Model of Consulting is Dead
The Business Model of Consulting is DeadThe Business Model of Consulting is Dead
The Business Model of Consulting is Dead
 
Silicon Valley’s Tools for Translating Startup Ideas into Billion Dollar Comp...
Silicon Valley’s Tools for Translating Startup Ideas into Billion Dollar Comp...Silicon Valley’s Tools for Translating Startup Ideas into Billion Dollar Comp...
Silicon Valley’s Tools for Translating Startup Ideas into Billion Dollar Comp...
 
What Makes a Great Sales Playbook?
What Makes a Great Sales Playbook?What Makes a Great Sales Playbook?
What Makes a Great Sales Playbook?
 
Kellogg The Top 5 Scale-Up Mistakes.pdf
Kellogg The Top 5 Scale-Up Mistakes.pdfKellogg The Top 5 Scale-Up Mistakes.pdf
Kellogg The Top 5 Scale-Up Mistakes.pdf
 
Summary of 'The Mom Test' (v2 2013-11-05)
Summary of 'The Mom Test' (v2 2013-11-05)Summary of 'The Mom Test' (v2 2013-11-05)
Summary of 'The Mom Test' (v2 2013-11-05)
 
E-mail Restart 2023: Jakub Malý - Gamifikace, omnichannel komunikace a kineti...
E-mail Restart 2023: Jakub Malý - Gamifikace, omnichannel komunikace a kineti...E-mail Restart 2023: Jakub Malý - Gamifikace, omnichannel komunikace a kineti...
E-mail Restart 2023: Jakub Malý - Gamifikace, omnichannel komunikace a kineti...
 
Essential Finance & Accounting for Management Consultants and Business Analysts
Essential Finance & Accounting for Management Consultants and Business AnalystsEssential Finance & Accounting for Management Consultants and Business Analysts
Essential Finance & Accounting for Management Consultants and Business Analysts
 
Essential Real Estate Modeling in Excel
Essential Real Estate Modeling in ExcelEssential Real Estate Modeling in Excel
Essential Real Estate Modeling in Excel
 
Essential Excel for Business Analysts and Consultants
Essential Excel for Business Analysts and ConsultantsEssential Excel for Business Analysts and Consultants
Essential Excel for Business Analysts and Consultants
 
Zero to One - Book Summary Report
Zero to One - Book Summary ReportZero to One - Book Summary Report
Zero to One - Book Summary Report
 
Top Courses for Business Analysts
Top Courses for Business AnalystsTop Courses for Business Analysts
Top Courses for Business Analysts
 
E-mail Restart 2023: Ondřej Kužílek - Budování značky v e-mail marketingu
E-mail Restart 2023: Ondřej Kužílek - Budování značky v e-mail marketinguE-mail Restart 2023: Ondřej Kužílek - Budování značky v e-mail marketingu
E-mail Restart 2023: Ondřej Kužílek - Budování značky v e-mail marketingu
 
A Primer on Primary Market Research
A Primer on Primary Market ResearchA Primer on Primary Market Research
A Primer on Primary Market Research
 
Business Models Template E145
Business Models Template E145Business Models Template E145
Business Models Template E145
 
Lean New Product & Process Development
Lean New Product & Process DevelopmentLean New Product & Process Development
Lean New Product & Process Development
 
Liquidity Management for Management Consultants & Managers
Liquidity Management for Management Consultants & ManagersLiquidity Management for Management Consultants & Managers
Liquidity Management for Management Consultants & Managers
 

Viewers also liked

GPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonGPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And Python
Jen Aman
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
Jen Aman
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
Jen Aman
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
Jen Aman
 
Spark on Mesos
Spark on MesosSpark on Mesos
Spark on Mesos
Jen Aman
 
Deploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkDeploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using Spark
Jen Aman
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
Jen Aman
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Jen Aman
 
Spatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using SparkSpatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using Spark
Jen Aman
 
Spark at Bloomberg: Dynamically Composable Analytics
Spark at Bloomberg:  Dynamically Composable Analytics Spark at Bloomberg:  Dynamically Composable Analytics
Spark at Bloomberg: Dynamically Composable Analytics
Jen Aman
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
Jen Aman
 
Big Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the CloudBig Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the Cloud
Jen Aman
 
Spark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 FuriousSpark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 Furious
Jen Aman
 
Livy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache SparkLivy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache Spark
Jen Aman
 
Elasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlibElasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlib
Jen Aman
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Databricks
 
Scaling Unsupervised Ciliary Motion Analysis for Actionable Biomedical Insigh...
Scaling Unsupervised Ciliary Motion Analysis for Actionable Biomedical Insigh...Scaling Unsupervised Ciliary Motion Analysis for Actionable Biomedical Insigh...
Scaling Unsupervised Ciliary Motion Analysis for Actionable Biomedical Insigh...
Spark Summit
 
Estimating Financial Risk with Apache Spark
Estimating Financial Risk with Apache SparkEstimating Financial Risk with Apache Spark
Estimating Financial Risk with Apache Spark
Cloudera, Inc.
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
Jen Aman
 
Morticia: Visualizing And Debugging Complex Spark Workflows
Morticia: Visualizing And Debugging Complex Spark WorkflowsMorticia: Visualizing And Debugging Complex Spark Workflows
Morticia: Visualizing And Debugging Complex Spark Workflows
Spark Summit
 

Viewers also liked (20)

GPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonGPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And Python
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
 
Spark on Mesos
Spark on MesosSpark on Mesos
Spark on Mesos
 
Deploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkDeploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using Spark
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
 
Spatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using SparkSpatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using Spark
 
Spark at Bloomberg: Dynamically Composable Analytics
Spark at Bloomberg:  Dynamically Composable Analytics Spark at Bloomberg:  Dynamically Composable Analytics
Spark at Bloomberg: Dynamically Composable Analytics
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 
Big Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the CloudBig Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the Cloud
 
Spark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 FuriousSpark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 Furious
 
Livy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache SparkLivy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache Spark
 
Elasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlibElasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlib
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
 
Scaling Unsupervised Ciliary Motion Analysis for Actionable Biomedical Insigh...
Scaling Unsupervised Ciliary Motion Analysis for Actionable Biomedical Insigh...Scaling Unsupervised Ciliary Motion Analysis for Actionable Biomedical Insigh...
Scaling Unsupervised Ciliary Motion Analysis for Actionable Biomedical Insigh...
 
Estimating Financial Risk with Apache Spark
Estimating Financial Risk with Apache SparkEstimating Financial Risk with Apache Spark
Estimating Financial Risk with Apache Spark
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Morticia: Visualizing And Debugging Complex Spark Workflows
Morticia: Visualizing And Debugging Complex Spark WorkflowsMorticia: Visualizing And Debugging Complex Spark Workflows
Morticia: Visualizing And Debugging Complex Spark Workflows
 

Similar to Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forecasts

ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
Jason Riedy
 
19CS3052R-CO1-7-S7 ECE
19CS3052R-CO1-7-S7 ECE19CS3052R-CO1-7-S7 ECE
19CS3052R-CO1-7-S7 ECE
Bharath123Maddipati
 
ONS local presents clustering
ONS local presents clusteringONS local presents clustering
ONS local presents clustering
Office for National Statistics
 
[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊
台灣資料科學年會
 
Big data
Big dataBig data
Big data
Big dataBig data
Big data
Harshit Namdev
 
Hadoop PDF
Hadoop PDFHadoop PDF
Hadoop PDF
1904saikrishna
 
Numerical inflation: simulation of observational parameters
Numerical inflation: simulation of observational parametersNumerical inflation: simulation of observational parameters
Numerical inflation: simulation of observational parameters
Milan Milošević
 
PCL (Point Cloud Library)
PCL (Point Cloud Library)PCL (Point Cloud Library)
PCL (Point Cloud Library)
University of Oklahoma
 
Skillwise Big data
Skillwise Big dataSkillwise Big data
Skillwise Big data
Skillwise Group
 
Graphs and Financial Services Analytics
Graphs and Financial Services AnalyticsGraphs and Financial Services Analytics
Graphs and Financial Services Analytics
Neo4j
 
Quick tour all handout
Quick tour all handoutQuick tour all handout
Quick tour all handout
Yi-Shin Chen
 
DutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingDutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision Making
BigML, Inc
 
IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079
ibankuk
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupDoug Needham
 
Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science Challenge
Mark Nichols, P.E.
 
A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)
Toshiyuki Shimono
 
Big data
Big dataBig data
Big data
Zeeshan Khan
 
Carbon dioxide emissions from cars of different brands.
Carbon dioxide emissions from cars of different brands.Carbon dioxide emissions from cars of different brands.
Carbon dioxide emissions from cars of different brands.
Rohan386712
 
Euro30 2019 - Benchmarking tree approaches on street data
Euro30 2019 - Benchmarking tree approaches on street dataEuro30 2019 - Benchmarking tree approaches on street data
Euro30 2019 - Benchmarking tree approaches on street data
Fabion Kauker
 

Similar to Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forecasts (20)

ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
19CS3052R-CO1-7-S7 ECE
19CS3052R-CO1-7-S7 ECE19CS3052R-CO1-7-S7 ECE
19CS3052R-CO1-7-S7 ECE
 
ONS local presents clustering
ONS local presents clusteringONS local presents clustering
ONS local presents clustering
 
[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Hadoop PDF
Hadoop PDFHadoop PDF
Hadoop PDF
 
Numerical inflation: simulation of observational parameters
Numerical inflation: simulation of observational parametersNumerical inflation: simulation of observational parameters
Numerical inflation: simulation of observational parameters
 
PCL (Point Cloud Library)
PCL (Point Cloud Library)PCL (Point Cloud Library)
PCL (Point Cloud Library)
 
Skillwise Big data
Skillwise Big dataSkillwise Big data
Skillwise Big data
 
Graphs and Financial Services Analytics
Graphs and Financial Services AnalyticsGraphs and Financial Services Analytics
Graphs and Financial Services Analytics
 
Quick tour all handout
Quick tour all handoutQuick tour all handout
Quick tour all handout
 
DutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingDutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision Making
 
IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup Group
 
Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science Challenge
 
A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)
 
Big data
Big dataBig data
Big data
 
Carbon dioxide emissions from cars of different brands.
Carbon dioxide emissions from cars of different brands.Carbon dioxide emissions from cars of different brands.
Carbon dioxide emissions from cars of different brands.
 
Euro30 2019 - Benchmarking tree approaches on street data
Euro30 2019 - Benchmarking tree approaches on street dataEuro30 2019 - Benchmarking tree approaches on street data
Euro30 2019 - Benchmarking tree approaches on street data
 

Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forecasts

  • 1. Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forecasts Kyle Foreman, PhD 07 June 2016
  • 2. Simulations in Spark • Context • Motivation • SimBuilder • Backends • Benchmarks • Discussion
  • 3. What is IHME?
      • Independent research center at the University of Washington
      • Core funding from the Bill & Melinda Gates Foundation and the State of Washington
      • ~300 faculty, researchers, students and staff
      • Providing rigorous, scientific measurement
        o What are the world’s major health problems?
        o How well is society addressing these problems?
        o How should we best dedicate resources to improving health?
      Goal: improve health by providing the best information on population health
  • 4. Global Burden of Disease
      • A systematic scientific effort to quantify the comparative magnitude of health loss due to diseases, injuries and risk factors
        o 188 countries
        o 300+ diseases
        o 1990 – 2015+
        o 50+ risk factors
  • 5. GBD Collaborative Model: 1,100 experts in 106 countries
  • 6. Death Rate in 2013 (age-standardized)
  • 9. DALYs in Low & Middle Income Countries (2010)
  • 10. Simulations in Spark • Context • Motivation • SimBuilder • Backends • Benchmarks • Discussion
  • 11. Forecasting the Burden of Disease
      1. Generate a baseline scenario projecting the GBD 25 years into the future
        o Mortality, morbidity, population, and risk factors
        o Every country, cause, sex, age
      2. Create a comprehensive simulation framework to assess alternative scenarios
        o Capture the complex interrelationships between risk factors, interventions, and diseases to explore the effects of changes to the system
      3. Build a modular and flexible platform that can incorporate new models to answer detailed “what if?” questions
        o E.g. introduction of a new vaccine, effects of global warming, scale-up of ART coverage, risk of global pandemics, etc.
  • 12. Simulations
      • Takes into account interdependencies between risk factors, diseases, etc.
        o Use draws of model parameters to advance from t to t+1, allowing forecasts of other quantities (e.g. risk factors and mortality) to interact
      • Modular structure allows for detailed “what ifs”
        o E.g. scale-up of coverage of a new intervention
      • Allows us to incorporate alternative models to test sensitivity to model choice and specification
  • 13. Basic Simulation Process
      1. Fit statistical models (PyMB) capturing the most important relationships as statistical parameters
      2. Generate many draws from those parameters, reflecting their uncertainty and correlation structure
      3. Run Monte Carlo simulations to update each quantity over time, taking into account its dependencies
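
Steps 2 and 3 of the process above can be sketched in a few lines of plain Python. This is a toy illustration only: the two parameters, their means, standard deviations, correlation, and the log-linear growth rule are all assumptions made up for the example, not values or models from the talk, and step 1 (model fitting) is skipped entirely.

```python
import math
import random


def draw_parameters(n_draws, mean, sd, rho, seed=0):
    """Draw (alpha, beta) pairs from a bivariate normal with correlation rho."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # 2x2 Cholesky factor applied by hand so the two parameters stay correlated.
        alpha = mean[0] + sd[0] * z1
        beta = mean[1] + sd[1] * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        draws.append((alpha, beta))
    return draws


def simulate(draws, start_value, covariate, n_steps):
    """Advance one quantity from t to t+1 separately for every parameter draw."""
    paths = []
    for alpha, beta in draws:
        path = [start_value]
        for _ in range(n_steps):
            # Hypothetical log-linear update rule, used only for illustration.
            path.append(path[-1] * math.exp(alpha + beta * covariate))
        paths.append(path)
    return paths


draws = draw_parameters(n_draws=1000, mean=(0.02, 0.01), sd=(0.005, 0.002), rho=0.5)
paths = simulate(draws, start_value=100.0, covariate=1.0, n_steps=25)

# Summarize uncertainty across draws: roughly the 2.5th, 50th and 97.5th percentiles.
final = sorted(path[-1] for path in paths)
summary = (final[25], final[500], final[975])
```

Each draw produces its own 25-year trajectory, so the spread of final values across draws carries the parameter uncertainty through to the forecast.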
  • 14. Simulations in Spark • Context • Motivation • SimBuilder • Backends • Benchmarks • Discussion
  • 15. SimBuilder
      • Directed Acyclic Graph (DAG) construction and validation
        o YAML and sympy specification
        o curses interface
        o graphviz visualization
      • Flexible execution backends
        o pandas
        o pyspark
  • 16. Example Global Health DAG [Figure: DAG with nodes GDP, Deaths, Births, Population, and Workforce for 2016 and for 2017]
  • 17. Creating a DAG with SimBuilder
      1. Create a YAML for the overall simulation
        o Specify child models and the dimensions of the simulation
      2. Build separate YAML files for each child model
        o Dimensions along which the model varies
        o Location of historical data
        o Expression for how to generate the quantity in t+1
      3. Run SimBuilder to validate and explore DAG
  • 18. Creating a DAG with SimBuilder: sim.yaml
      name: gdp_pop
      models:
        - name: gdp_per_capita
        - name: workforce
        - name: mortality
        - name: gdp
        - name: pop
      global_parameters:
        - name: loc
          set: List
          values: ["USA", "MEX", "CAN", ..., ]
        - name: t
          set: Range
          start: 2014
          end: 2040
        - name: draw
          set: Range
          start: 0
          end: 999
  • 19. Creating a DAG with SimBuilder: pop.yaml
      name: pop
      version: 0.1
      variables: [loc, t]
      history:
        - type: csv
          path: pop.csv
  • 20. Creating a DAG with SimBuilder: gdp_per_capita.yaml
      name: gdp_per_capita
      version: 0.1
      expr: gdp(draw, t, loc)/pop(draw, t-1, loc)
      variables: [draw, t, loc]
      history:
        - type: csv
          path: gdp_pc_w_draw.csv
  • 21. Creating a DAG with SimBuilder: gdp.yaml
      name: gdp
      version: 0.1
      expr: gdp(draw, loc, t-1) * exp(
              alpha_0(loc, t, draw)
              + beta_gdp(draw) * log(gdp_per_capita(draw, loc, t-1))
              + beta_pop(draw, t) * workforce(loc, t)
            )
      variables: [draw, loc, t]
      history:
        - type: csv
          path: gdp_w_draw.csv
      model_parameters:
        - name: alpha_0
          variables: [draw, loc, t]
          history:
            - type: csv
              path: gdp/alpha_0.csv
        - name: beta_gdp
          variables: [draw]
          history:
            - type: csv
              path: gdp/beta_gdp.csv
        - name: beta_pop
          variables: [draw, t]
          history:
            - type: csv
              path: gdp/beta_pop.csv
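
Written as an equation, the `expr` field of gdp.yaml reads as follows. The subscripts are shorthand introduced here (d for the draw, l for the location, t for the year), and gdppc abbreviates gdp_per_capita; none of this notation appears in the slides.

```latex
\mathrm{gdp}_{d,l,t}
  = \mathrm{gdp}_{d,l,t-1}\,
    \exp\!\Big(
      \alpha_{0,\,d,l,t}
      + \beta^{\mathrm{gdp}}_{d}\,\log \mathrm{gdppc}_{d,l,t-1}
      + \beta^{\mathrm{pop}}_{d,t}\,\mathrm{workforce}_{l,t}
    \Big)
```

The exponent is a log growth rate, so each draw d advances GDP multiplicatively from year t-1 to year t.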
  • 24. [Figure: expanded DAG for time steps 0 to 4, with nodes gdp{t}, gdp_per_capita{t}, workforce{t}, mortality{t}, and pop{t}, and parameter nodes gdp_alpha_0{t}, gdp_beta_pop{t}, gdp_beta_gdp, mortality_alpha_0{t}, mortality_beta_pop{t}, and mortality_beta_gdp]
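
Unrolling per-model dependencies into a time-indexed graph like the one pictured, and checking that it is acyclic, can be sketched with the standard library. This is not SimBuilder's code: the `DEPENDENCIES` table, its edges and lags, and the helper names `expand` and `execution_order` are hypothetical and chosen only to resemble the example models.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency table: node -> list of (parent, lag in years).
# A lag of 1 means the node at year t reads the parent at year t-1.
DEPENDENCIES = {
    "gdp": [("gdp", 1), ("gdp_per_capita", 1), ("workforce", 0)],
    "gdp_per_capita": [("gdp", 0), ("pop", 1)],
    "workforce": [("pop", 1)],
    "pop": [("pop", 1), ("mortality", 0)],
    "mortality": [("gdp_per_capita", 1)],
}


def expand(dependencies, first_year, last_year):
    """Unroll the per-model dependencies into one graph over (model, year) nodes."""
    graph = {}
    for year in range(first_year, last_year + 1):
        for node, parents in dependencies.items():
            preds = set()
            for parent, lag in parents:
                # Parents before the first year are historical data, not DAG nodes.
                if year - lag >= first_year:
                    preds.add((parent, year - lag))
            graph[(node, year)] = preds
    return graph


def execution_order(graph):
    """Return nodes with parents first; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())


graph = expand(DEPENDENCIES, 2014, 2018)
order = execution_order(graph)
```

A specification in which two models read each other within the same year would fail here with `CycleError`, which is the kind of mistake a validation pass is meant to catch before any data are loaded.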
  • 25. Simulations in Spark • Context • Motivation • SimBuilder • Backends • Benchmarks • Discussion
  • 26. SimBuilder Backends
      • SimBuilder traverses the DAG
        o Backends plug in via an API
      • Backends provide
        o Data loading strategy
        o Methods to combine data from different nodes to calculate new nodes
        o Method to save computed data
  • 27. Local Pandas Backend
      • Useful for debugging and small simulations
      • Master process maintains list of Model objects:
        o pandas DataFrames of all model data
          ─ Indexed by global variables
        o sympy expression for how to calculate t+1
          ─ Vectorized when possible, fallback to row-wise apply
      • Simulating over time traverses DAG to join necessary DataFrames and compute results to fill in new columns
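
A rough sketch of the join-then-compute step in pandas, using the gdp_per_capita expression from the example. The data values, the frame layout, and the function name are invented for illustration, and the actual backend may organize its DataFrames differently.

```python
import pandas as pd

# Toy data: gdp varies by draw, loc and t; pop varies by loc and t only.
gdp = pd.DataFrame(
    {"gdp": [10.0, 11.0, 4.0, 5.0, 12.0, 13.0, 6.0, 7.0]},
    index=pd.MultiIndex.from_product(
        [[0, 1], ["USA", "MEX"], [2014, 2015]], names=["draw", "loc", "t"]
    ),
)
pop = pd.DataFrame(
    {"pop": [2.0, 2.0, 1.0, 1.0]},
    index=pd.MultiIndex.from_product(
        [["USA", "MEX"], [2014, 2015]], names=["loc", "t"]
    ),
)


def gdp_per_capita(gdp, pop, year):
    """Compute gdp(draw, t, loc) / pop(draw, t-1, loc) for a single year."""
    current = gdp.xs(year, level="t").reset_index()        # columns: draw, loc, gdp
    previous = pop.xs(year - 1, level="t").reset_index()   # columns: loc, pop
    # Join on the shared dimension so every draw sees the same population.
    joined = current.merge(previous, on="loc", how="left")
    # Vectorized division over the joined columns, no row-wise apply needed.
    joined["gdp_per_capita"] = joined["gdp"] / joined["pop"]
    return joined.set_index(["draw", "loc"])["gdp_per_capita"]


result = gdp_per_capita(gdp, pop, 2015)
```

The merge is where the dimension columns get aligned; in this layout it has to be repeated for every node and every year.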
  • 28. Spark DataFrame Backend
      • Similar to local backend, swapping in Spark’s DataFrame for Pandas’
      • Spark context maintains list of Model objects:
        o pyspark DataFrames of all model data
          ─ Columns for each global variable and
        o sympy expression for how to calculate t+1
          ─ Calculated using row-wise apply
      • Joins DataFrames where necessary
  • 29. Spark + Numpy Backend
      • Uses Spark to distribute and schedule the execution of vectorized numpy operations
      • One RDD for each node and year:
        o Containing an ndarray and index vector for each axis
        o Can optionally partition the data array into key-value pairs along one or more axes (similar to bolt)
      • To calculate a new node, all the parent RDDs are unioned and then reduced into a new child RDD
  • 30. Simulations in Spark • Context • Motivation • SimBuilder • Backends • Benchmarks • Discussion
  • 31. Benchmarking
      • Single executor (8 cores, 32GB)
      • Synthetic data
        o 65 nodes in DAG
        o 3 countries x 20 ages x 2 sexes x N draws
        o Forecasted forward 25 years
  • 34. Simulations in Spark • Context • Motivation • SimBuilder • Backends • Benchmarks • Discussion
  • 35. Limitations of Spark DataFrame
      • Majority of execution time is spent joining the DataFrames and aligning the dimension columns
      • These times were achieved after careful tuning of partitions
        o Perhaps custom partitioners could reduce runtime further
  • 36. Taking Advantage of Numpy
      • For multidimensional datasets of consistent shape and size, simply aligning indices can eliminate the overhead of joins
        o Including generalizations to axes of size 1 to simplify coding
      • Numpy’s vectorized operations are highly tuned and scale well
        o Including multithreading when e.g. compiled against MKL
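
The point about size-1 axes can be illustrated with numpy broadcasting. The axis order (draw, loc, t), the array sizes, and the random values below are assumptions for the sketch rather than details from the talk.

```python
import numpy as np

# Every node uses the same axis order: (draw, loc, t).
n_draw, n_loc, n_t = 4, 3, 2
rng = np.random.default_rng(0)

gdp = rng.uniform(1.0, 2.0, size=(n_draw, n_loc, n_t))   # varies on every axis
pop = rng.uniform(1.0, 2.0, size=(1, n_loc, n_t))        # no draw axis: size 1
beta = rng.normal(0.0, 0.1, size=(n_draw, 1, 1))         # varies by draw only

# All arrays have rank 3 with matching axis order, so broadcasting does the
# alignment that a DataFrame backend would do with joins.
gdp_per_capita = gdp / pop
growth = np.exp(beta * np.log(gdp_per_capita))
```

A quantity that does not vary along a dimension keeps that axis at length 1, so the same expression works whether or not a parameter depends on draw, location, or year.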
  • 37. Future Development
      • Experiment with better partitioning of Numpy RDDs
        o Perhaps use bolt?
      • Improve partial graph execution
        o Executing the entire DAG at once often crashes the driver
        o Executing each year’s partial DAG in sequence results in idle CPU time towards the end of that DAG
      • Investigate more efficient Spark DataFrame joining methods for panel data