This document discusses methods for interpreting complex predictive models, specifically focusing on determining variable importance. It introduces input shuffling as a technique to assess variable importance for any model, including linear regression, decision trees, neural networks, and ensembles. Input shuffling randomly shuffles values of one input variable at a time and measures the impact on model predictions to identify influential variables. The document demonstrates input shuffling on both regression and classification tasks and compares it to other variable importance metrics for linear models.
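The shuffling procedure described here (often called permutation importance) can be sketched in a few lines of Python. The model and data below are hypothetical toys, not taken from the document:

```python
import random

# Toy illustration of input shuffling (permutation importance).
# The "model" is a fixed linear function; x2 is an irrelevant input.
def model(row):
    x1, x2 = row
    return 3.0 * x1  # depends only on x1

random.seed(0)
data = [(random.uniform(0, 1), random.uniform(0, 1)) for _ in range(1000)]
y = [model(r) for r in data]  # targets generated by the model itself

def mse(rows):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, y)) / len(rows)

def shuffled_importance(col):
    """Shuffle one input column and measure the rise in prediction error."""
    values = [r[col] for r in data]
    random.shuffle(values)
    shuffled = [tuple(v if i == col else r[i] for i in range(2))
                for r, v in zip(data, values)]
    return mse(shuffled) - mse(data)

imp = [shuffled_importance(c) for c in (0, 1)]
print(imp)  # shuffling x1 hurts accuracy; shuffling x2 changes nothing
```

Because the procedure only needs predictions, it works unchanged for trees, neural networks, or ensembles.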
This document provides instructions on using Excel functions and charts. It describes Excel components and arithmetic operators. It explains the order of precedence for calculations and how to use the Insert Function button to select functions. Examples show how to define functions within functions, examine the Insert Function dialog box, and create column and pie charts using the Chart Wizard. The Chart Wizard dialog boxes guide the user in selecting a chart type, choosing data series, and customizing the chart.
Applications of simulation in Business with Example (Pratima Ray)
Simulation is modeling a real-world system on a computer to understand its behavior and evaluate strategies. It allows experimenting with a model instead of the real system. Some key uses of simulation include handling complex problems without optimal solutions, risky or costly real-world experiments, and answering "what if" questions. Simulation can be static or dynamic, deterministic or stochastic, discrete or continuous. Monte Carlo simulation uses random numbers to model uncertainty and is useful for decision-making under risk. Businesses apply simulation to areas like stock analysis, pricing, marketing, and cash flow forecasting. An example is using simulation to analyze a university health clinic's queuing system and improve operations.
The document provides an overview of Unified Modeling Language (UML) diagrams. It discusses 13 types of UML diagrams but notes that most users focus on class, sequence, and state machine diagrams. The document describes the components and syntax of class, sequence, and state machine diagrams. It provides examples of each and guidelines for creating them to model the structure and behavior of software systems.
Choosing the right process improvement tool for your project.
Learn how an experienced engineer decides when simulation is the right tool for a project, and when it isn't.
With the evolution of process improvement software, it can be difficult to decide the right tool for the job. Using something too powerful and complex can be a lengthy and unnecessary process, but underestimating the depth of analysis required and choosing something too simplistic early in a project can result in repeated work later.
Adversaries compromise at will, penetrating today’s signature and IOC dependent detection capabilities. Most incident responders are locked in a cycle of constant reaction to the fraction of activity that is known. Often, undetected attackers remain active in the network as reported incidents are remediated. A new approach is needed to break the cycle of reaction and eradicate the unknown.
An offense-based approach must be adopted. Hunting puts the defender on the offensive within their networks, allowing for rapid detection and remediation of threats. Adversary dwell time can be drastically reduced, reducing business impacts and recovery costs. The Endgame hunt platform enables instant protection, visibility, and precision response across your endpoints and automates detection of known and never before seen adversaries without relying on signatures.
This talk covers:
• Description and benefits of hunt
• Challenges of hunting
• Solutions and hunting best practices
This document provides an overview of regression analysis and linear regression. It explains that regression analysis estimates relationships among variables to predict continuous outcomes. Linear regression finds the best fitting line through minimizing error. It describes modeling with multiple features, representing data in vector and matrix form, and using gradient descent optimization to learn the weights through iterative updates. The goal is to minimize a cost function measuring error between predictions and true values.
MATLAB is a high-level programming language and computing environment used for numerical computations, visualization, and programming. The document discusses MATLAB's capabilities including its toolboxes, plotting functions, control structures, M-files, and user-defined functions. MATLAB is useful for engineering and scientific calculations due to its matrix-based operations and built-in functions.
This document provides an overview of MATLAB, including its uses, features, and basic programming concepts. MATLAB is a numerical computing environment and programming language that allows matrix manipulations, data visualization, algorithm development, and interfacing with other languages. It has a comprehensive set of built-in functions for mathematical and technical computing. The document discusses MATLAB's programming constructs like scripts, functions, operators, decision making statements, and loops. It also covers basic data types like vectors and matrices.
Machine Learning Essentials Demystified part2 | Big Data Demystified (Omid Vahdaty)
The document provides an overview of machine learning concepts including linear regression, artificial neural networks, and convolutional neural networks. It discusses how artificial neural networks are inspired by biological neurons and can learn relationships in data. The document uses the MNIST dataset example to demonstrate how a neural network can be trained to classify images of handwritten digits using backpropagation to adjust weights to minimize error. TensorFlow is introduced as a popular Python library for building machine learning models, enabling flexible creation and training of neural networks.
Machine Learning can often be a daunting subject to tackle much less utilize in a meaningful manner. In this session, attendees will learn how to take their existing data, shape it, and create models that automatically can make principled business decisions directly in their applications. The discussion will include explanations of the data acquisition and shaping process. Additionally, attendees will learn the basics of machine learning - primarily the supervised learning problem.
This document provides an overview of MATLAB for students in engineering fields. It introduces MATLAB as a tool for matrix calculations and numerical computing. It describes the MATLAB environment and commands for help, variables, matrices, logical operations, flow control, scripts and functions. It also covers image processing in MATLAB, including importing and displaying images, image data types, basic operations, and examples of blending and edge detection on images. Finally, it discusses performance issues and the importance of vectorizing code to avoid slow loops.
Elementary Data Analysis with MS Excel_Day-4 (Redwan Ferdous)
This event took place on 12th September 2020 and was arranged by EMK Center (Makerlab). The title was 'Elementary Data Analysis with MS Excel', covering very basic data analysis with MS Excel.
Day 4 covered the MS Excel Data tab, the View and Review tabs, and the Developer tab of the horizontal top ribbon, as well as the Quick Analysis tools, What-If Analysis, Data Tables, the Scenario Manager, and Pareto charts.
The document discusses problem-solving and design skills needed for computer programming. It covers several key topics:
1. Candidates should understand top-down design and be able to break down computer systems into subsystems using structure diagrams, flowcharts, pseudocode, and subroutines.
2. Candidates should be able to work with algorithms - explaining them, suggesting test data, and identifying/fixing errors. They should be able to produce algorithms for problems.
3. Top-down design is described as the process of breaking down a computer system into subsystems, then breaking each subsystem into smaller subsystems, until each performs a single action.
The document provides an introduction to image processing and MATLAB. It defines key concepts in image processing like image formation through sampling and quantization. It also introduces various tools in MATLAB for working with digital images, such as importing/exporting images, displaying images using functions like imshow and imagesc, and performing basic operations on image matrices.
This is a single-day course that gives the learner hands-on experience with the basics of deep learning: the first half builds a network using Python/NumPy only, and the second half builds a more advanced network using TensorFlow/Keras.
At the end you will find a list of useful pointers for further study.
course git: https://gitlab.com/eshlomo/EazyDnn
Dataset Preparation
Abstract: This PDSG workshop introduces basic concepts on preparing a dataset for training a model. Concepts covered are data wrangling, replacing missing values, categorical variable conversion, and feature scaling.
Level: Fundamental
Requirements: No prior programming or statistics knowledge required.
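The preparation steps the abstract lists can be illustrated with a toy, pure-Python sketch. The data and column names are hypothetical; a real workshop would likely use pandas or scikit-learn:

```python
# Toy sketch of dataset preparation: impute a missing value with the column
# mean, one-hot encode a categorical column, and min-max scale a numeric one.
rows = [
    {"age": 24.0, "city": "Paris"},
    {"age": None, "city": "Tokyo"},   # missing value to impute
    {"age": 48.0, "city": "Paris"},
]

# 1) Replace missing ages with the mean of the observed ages
ages = [r["age"] for r in rows if r["age"] is not None]
mean_age = sum(ages) / len(ages)
for r in rows:
    if r["age"] is None:
        r["age"] = mean_age

# 2) One-hot encode the categorical "city" column
cities = sorted({r["city"] for r in rows})
for r in rows:
    for c in cities:
        r[f"city_{c}"] = 1 if r["city"] == c else 0
    del r["city"]

# 3) Min-max scale "age" into [0, 1]
lo, hi = min(r["age"] for r in rows), max(r["age"] for r in rows)
for r in rows:
    r["age"] = (r["age"] - lo) / (hi - lo)

print(rows)
```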
The value of "a.value" will be printed to the VBA Immediate window when that line is executed. The Debug.Print statement sends its output to the Immediate window, which is useful for inspecting variable values while code is running without stopping the execution.
This document provides an overview of machine learning concepts including:
1. It defines data science and machine learning, distinguishing machine learning's focus on letting systems learn from data rather than being explicitly programmed.
2. It describes the two main areas of machine learning - supervised learning which uses labeled examples to predict outcomes, and unsupervised learning which finds patterns in unlabeled data.
3. It outlines the typical machine learning process of obtaining data, cleaning and transforming it, applying mathematical models, and using the resulting models to make predictions. Popular models like decision trees, neural networks, and support vector machines are also briefly introduced.
This document provides an introduction to Excel, Word, and PowerPoint. It discusses the basics of spreadsheets in Excel including creating and formatting worksheets, calculations with formulas, and copying data to other programs. It also covers creating and formatting presentations in PowerPoint including adding slides, text, images, and charts. Finally, it discusses opening and viewing documents in Word and resources for learning more about Microsoft Office applications.
Machine learning for IoT - unpacking the blackbox (Ivo Andreev)
This document provides an overview of machine learning and how it can be applied to IoT scenarios. It discusses different machine learning algorithms like supervised and unsupervised learning. It also compares various machine learning platforms like Azure ML, BigML, Amazon ML, Google Prediction and IBM Watson ML. It provides guidance on choosing the right algorithm based on the data and diagnosing why machine learning models may fail. It also introduces neural networks and deep learning concepts. Finally, it demonstrates Azure ML capabilities through a predictive maintenance example.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks, or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
Lecture 8 - Feature Engineering and Optimization, a lecture in subject module... (Maninda Edirisooriya)
This lesson covers the core data science content required for applying ML. It was one of the lectures of a full course I taught at the University of Moratuwa, Sri Lanka, in the second half of 2023.
This document describes how modeling and simulation (M&S) can be used to project outcomes for clinical trials. M&S involves building statistical models based on incoming patient data and then simulating the remainder of the study multiple times. This allows researchers to predict milestones, test alternative scenarios, and validate study assumptions. The document provides examples of how M&S was used to accurately forecast timelines and inform decisions for trials experiencing issues with enrollment rates and event rates differing from initial assumptions. Management found the simulations to be very valuable for planning by providing projections when other methods would have involved guessing.
This document introduces Tolstoy Targets, a visualization method using radial axes to provide a concise summary of multiple objectives or attributes. It discusses principles like using traffic light colors to indicate success or failure of predefined targets. Conventions are outlined, such as grouping attributes by direction and adding confidence ranges. Practical examples demonstrate comparing projects, mass screening of enzymes, transfusion risks for multiple patients, and assessment scores. The document concludes by providing contact information for the author.
This document describes how modeling and simulation (M&S) can be used to project timelines and resource needs for clinical trials. M&S involves building statistical models based on incoming patient data and then simulating the remainder of the study multiple times. This allows researchers to predict milestones, test alternative scenarios, and validate study assumptions. The document provides examples of how M&S accurately predicted timelines for trials with complex multi-segment designs and competing risk events. Study managers found the projections from M&S to be very valuable for planning purposes.
This document discusses metrics for assessing the performance of randomization methods in clinical trials. It proposes measuring randomness using potential selection bias, which calculates how well an observer could guess the next treatment assignment based on previous assignments. It also considers periodicity to detect patterns. Balance is measured using efficiency loss, which quantifies the increase in variability due to imbalances. The document outlines a simulation study comparing randomization methods using these proposed metrics. Stratification factors are modeled using a Zipf-Mandelbrot distribution to generate realistic subgroup sizes. Randomness and balance metrics are calculated at interim analyses and summarized graphically.
This document discusses metrics for assessing the predictability and efficiency of covariate-adaptive randomization designs in clinical trials. It proposes measuring predictability using a modified Blackwell-Hodges potential selection bias metric that calculates how well an observer could guess the next treatment assignment. It also considers entropy and periodicity measures. Balance/efficiency is proposed to be measured using Atkinson's method of quantifying the loss of statistical power as an equivalent reduction in sample size due to treatment imbalances within subgroups. The document then outlines a simulation study to compare various randomization methods using these proposed metrics.
This document discusses methods for assessing randomization in clinical trials that use covariate-adaptive randomization designs. It presents metrics for measuring randomness, balance, and efficiency loss in randomization schemes. The document outlines a simulation approach and discusses results from comparing different randomization factors and sample sizes. It proposes future directions such as optimizing randomization parameters and exploring periodicity in system behavior.
- Simulations of clinical trial randomization methods showed consistent trade-offs between efficiency and unpredictability over different methods and parameters. No single best method optimized both metrics.
- Two metrics were used to evaluate predictability (potential for selection bias) and efficiency (loss of statistical power): simulations revealed clear trade-offs between higher predictability and lower efficiency.
- As sample size increased, most methods became more efficient while some also became more predictable and others less predictable, depending on the method. Permuted blocks, dynamic allocation, and complete randomization were among the methods evaluated.
The document discusses randomization in clinical trials. It explains that randomization is important to minimize bias and balance treatment groups. Different randomization methods are presented: complete randomization, minimization, and permuted blocks. Metrics for evaluating randomization, such as balance, predictability, and loss of power, are covered. Simulations comparing the methods with respect to confounding factors, overall performance, and patient discontinuation are described. The importance of balanced treatment groups for maintaining statistical power and avoiding underpowered results is emphasized.
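The predictability trade-off these summaries describe can be illustrated with a small sketch: under permuted blocks, an observer who always guesses the arm assigned least often so far in the current block is right well over half the time. This is a hypothetical Python sketch, not code from the documents:

```python
import random

# Sketch of why permuted blocks trade unpredictability for balance.
# The observer guesses the next assignment as the arm used least so far in
# the current block (a Blackwell-Hodges "convergence" strategy).
def permuted_blocks(n, block=4):
    seq = []
    while len(seq) < n:
        b = ["A"] * (block // 2) + ["B"] * (block // 2)
        random.shuffle(b)
        seq.extend(b)
    return seq[:n]

def correct_guesses(seq, block=4):
    hits = 0
    for i, actual in enumerate(seq):
        sofar = seq[i - i % block : i]          # assignments in current block
        guess = "A" if sofar.count("A") <= sofar.count("B") else "B"
        hits += guess == actual
    return hits / len(seq)

random.seed(1)
rates = [correct_guesses(permuted_blocks(400)) for _ in range(50)]
avg = sum(rates) / len(rates)
print(round(avg, 2))  # well above 0.5: blocks make assignments partly predictable
```

Complete randomization would hold the guess rate at 0.5 but gives up the within-block balance guarantee, which is exactly the efficiency/unpredictability trade-off the simulations report.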
Splatter plots provide:
(1) A comprehensive yet reducible way to visualize data across multiple dimensions.
(2) Diagnostic insights are obvious and interpretable at a glance, with problem areas visually identified.
(3) Various symbols, colors and visual cues can be used depending on the type of data and desired level of precision needed.
1. Simulation in Excel:
Tricks, Trials & Trends
Presented to the
American College of Radiology
12 January 2012
Dennis Sweitzer, Ph.D.
www.Dennis-Sweitzer.com
2. Abstract
Simulation in Excel: Tricks, Trials & Trends
Excel is a general purpose spreadsheet which is widely used & understood, but rarely used by itself for
simulations. However, the Data Table function in MS Excel can be used to execute substantial
simulations, without requiring cumbersome programming "tricks" or VBA coding. The result is an
arbitrarily large results table in which each row is one iteration of the simulation, and each column is a
random variable generated in the simulation.
A small number of additional probability functions are easily programmed in VBA, making Excel a
general-purpose simulation package. Because VBA is interpreted, use of VBA functions can greatly limit
the speed of a simulation. However, for simulations of modest size and complexity, the ease and familiarity
of working in Excel outweigh the disadvantage in speed. Examples from clinical trials will be used.
Finally, I discuss new methods to move simulations out of the black boxes and into the enterprise, based
on work by Sam Savage. Simulation results (a “SIP”, or “Stochastic Information Packet”) from multiple
platforms can be stored as XML strings (using the DIST standard) in a “SLURP” (“Stochastic Library Unit
with Relationships Preserved”), and from there used for reports, planning, etc., or incorporated into other
simulations.
3. Outline
• How to do Simulation in Excel
• Notes on using Inverse Probability Functions
• Some Macros and VBA functions
• Clinical Trial Examples
• Probability Management in SIPs, SLURPs, & DIST
4. Background
• Occasional need for simulations
• Excel is convenient, but
– does not explicitly support simulations
– simulation usually requires VBA programming (so why not use R or SAS instead?)
– or commercial add-in programs (e.g., @Risk)
– or some academic add-ins
• Excel does have iterative calculations and Solver
• So why not simulation?
5. Simulate what?
• Stochastic Models
– Unknown parameters? → Guesstimate a distribution
– Optimizing choices? → Test each with simulations
• Sensitivity Analysis
– Variations in inputs → variations in outputs
– 2 parameters: use a table
– >2 parameters: simulate & compare variation
6. Excel: Pros
Common Language / Common Tools
• Most people understand Excel
• Many tools available in Excel
Transparency: modeling assumptions can be specified – graphed – debated
What you see is what you get!
More hands on deck, more eyes on the prize…:
– Statistician: builds the initial model, then repairs & enhances it
– Team member: explores & breaks the model
– …repeat until satisfied
7. Excel Cons
Slower than SAS, S+, R, etc.
Lacks some statistical/probability functions
• Latest versions are a little better
• Still need to add some VBA code
• Known bugs in statistical routines (often fixed)
Tradeoffs:
• Quicker modifications vs. slower execution
8. Simple Solution: Data Tables
Excel Data Tables
• Creates a table of values of a function
– each column is a random variable
• The leftmost column is used as an argument
– (unneeded for simulation)
• The Data Table repeats the calculations for each row
– each row is a simulation iteration
9. 1. Create Simulation
Create random variables using the inverse probability method:
For a random variable X with distribution function F(x): ℝ → [0,1],
if U is a random uniform on [0,1], then
X = F⁻¹(U)   (Excel: U = RAND())
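The inverse-probability step can be sketched outside Excel as well. A minimal Python stand-in for what a cell formula like `=-LN(1-RAND())/rate` does (the exponential distribution and the `rate` parameter are my own illustration, not from the slides):

```python
import math
import random

def exponential_inverse_cdf(u, rate):
    """F^-1(u) for the exponential distribution, where F(x) = 1 - exp(-rate*x)."""
    return -math.log(1.0 - u) / rate

# One simulated draw: generate U ~ Uniform[0,1), then map it through F^-1,
# just as the slide's X = F^-1(U) with U = RAND() does in a spreadsheet cell.
u = random.random()
x = exponential_inverse_cdf(u, rate=2.0)
print(x)  # a single exponential(rate=2) draw
```

Each Data Table row would hold one such `u` and the resulting `x`; saving the `u` values is what later slides use for debugging.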
10. 2. Align Random Variables
• Calculations can be anywhere in the spreadsheet
• Reference the variables in a row
• It is best to label the variables consistently
11. 3. Select Data Table
• Select the table region
– 1st row is the random variables
– 1st column is not used (can label iterations)
• From the toolbar:
– Data > Data Table
12. 4. Create Simulation Table
• Column input cell = upper left-hand corner of the table
• Row input cell = leave blank
• OK → populates the table
• (may have to recalculate manually)
13. 5. Execute Simulation
Iterative development
• The simulation can be changed
• Add reporting variables
• Recalculate to rerun
– (no need to use Data Table again, unless expanding)
• Hint: debug with a short table, expand for the final run
15. But still more….
• Why use inverse probability distributions
(instead of random variables)?
• When not to use a spreadsheet for simulation?
• Tools:
– Macros to set up a simulation
– VBA functions for common simulation distributions
• Trends: Probability Management
– SIPs, SLURPS, DIST
16. Inverse Probability Function
• Most systems directly generate random variables with the desired distribution
• So why use inverse probability functions, which are (probably) slower?
Personal opinion:
• Testing & debugging
• Verification ← calculates correctly
• Validation ← calculations answer the problem
• Sensitivity ← input vs. output variability
17. Why use Inverse Probability Distributions?
• Testing & Debugging
• Validation & Verification
• Sensitivity
← Save the RAND() values
→ Recreate unexpected results
→ Reasonableness: do small changes in RAND() produce small changes in output?
→ Explore the impact of small changes in RAND() values on the simulation output
18. As Mapping function
Mapping: U ⟼ F⁻¹(U)
Probability distribution: F(x): ℝ → [0,1]
Random uniform: U ∈ (0,1]
Inverse PDF: X = F⁻¹(U)
For a continuous (or monotone) F⁻¹:
small changes in U → small changes in F⁻¹(U)
21. Example #1
Simple model, saving {Uᵢ}: a function of 2 random variables
• Verify • Replicate • Quantify
A max value looks high. Is it a bug? If not, how often?
Saved the random U[0,1] for each iteration;
check the u that generated the high value.
u = 0.983… → a random high → rarely happens
23. Spreadsheet limitations
• Only simple data structures are available
– rows & columns; no lists or trees
– (so discrete-event simulations are difficult)
• Complex algorithms are difficult
– e.g., while or for loops
– can improvise (cumbersome, slow, buggy)
• Speed: slow
• Data storage: what-you-see-is-all-you-get
24. Tools: Excel Simulation Template
• Adds some missing random functions
• Adds some set-up macros
Excel template & examples at:
www.Dennis-Sweitzer.com
25. Macro SimulationSampler
To start a new simulation when you don't remember the names & parameters of
common random variables used in simulation:
• Run the macro SimulationSampler
• Copy, delete, and edit as needed.
• Make sure all random values are referenced in the first row of the data table
at the bottom.
27. Macro SimulationSampler
• Sets up a header row for the data table
• Sets up a place for statistics
28. Macro Simulate
• Highlight the row of random variables
– (1st row of the simulation table)
• Run the macro "Simulate"
– prompts for the number of simulation iterations
– the default number of iterations is 100
– debug & develop (recalculate manually)
– final run with >1000 iterations
– Visual Basic code is computationally intensive
30. Excel Random Variables
Rand() – random uniform on [0,1]
NormSInv() – inverse standard normal distribution
CriticalBinomial() – inverse binomial distribution
LogNormInv() – inverse log normal distribution
Caveat: parameters are the mean and SD after the log transformation
31. Erlang Distribution
How long do you wait until you get a
predetermined number of arrivals?
• Interarrival times are distributed IID
exponential
• Erlang is Gamma with integer parameter
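Because the Erlang is the sum of k iid exponential interarrival times, it can be simulated directly from the inverse-probability exponential draws above. A minimal Python sketch (the shape/rate values are my own illustration):

```python
import math
import random

def erlang_draw(k, rate):
    """Waiting time until the k-th arrival: the sum of k iid exponential
    interarrival times, i.e. an Erlang (Gamma with integer shape k)."""
    return sum(-math.log(1.0 - random.random()) / rate for _ in range(k))

random.seed(1)
# The mean of an Erlang(k, rate) is k/rate; check it roughly by simulation.
draws = [erlang_draw(k=5, rate=2.0) for _ in range(20000)]
print(sum(draws) / len(draws))  # ≈ 2.5
```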
32. Beta Distribution
Can use as
• Distribution of a Binomial probability
• Range = [0,1]
• Generic bounded hump (vs Normal as generic unbounded hump)
• Better behaved than a triangular distribution
34. Example #2, Simulation
• Time to the 100th patient
• Patients arrive IID exponential
Summary statistics of the simulated values (below)
Interpretation: under the assumptions, 90% of simulations required more than
4.4 months
35. Added VBA Functions
Inverse functions needed for simulation
• Poisson, negative binomial
Interpolation from a table
• Interpolate: 1- or 2-dimensional interpolation
Convenience
• Beta with mean, SD as parameters
• Beta with high, low, and mode as parameters
• Log normal with mean, SD as parameters
36. Missing Statistical Functions
Inverse distributions
• InvPoisson :: Poisson
• InvPascal :: negative binomial
– (how many failures before k successes)
• The negative binomial generalizes to a continuous (non-integer) parameter;
• the discrete, integer-parameter version is often denoted the Pascal distribution
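An inverse Poisson like the slide's `InvPoisson` can be built by accumulating the CDF until it reaches u. A minimal Python sketch of that approach (the function name and parameters here are illustrative, not the presenter's VBA signature); intended for u strictly between 0 and 1:

```python
import math

def inv_poisson(u, lam):
    """Smallest k with F(k) >= u for a Poisson(lam) variable, built by
    accumulating pmf terms p(k) = exp(-lam) * lam^k / k!."""
    p = math.exp(-lam)   # p(0)
    cdf = p
    k = 0
    while cdf < u:
        k += 1
        p *= lam / k     # p(k) computed recursively from p(k-1)
        cdf += p
    return k

print(inv_poisson(0.5, 3.0))  # 3, the median of Poisson(3)
```

Feeding `u = RAND()` through such a function yields Poisson draws, exactly as for the continuous inverses.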
37. Example#3,
Patients to Screen
Expected enrollment rate = 75% ± 5% ~ Beta distribution
# Screen failures ~ negative binomial (Pascal)
– depends on the enrollment rate
38. Beta Distribution (2)
For convenience:
• Beta distribution given mean, SD
• Beta distribution given mean, SD, upper & lower bounds
• Beta distribution given mode, upper & lower bounds
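The mean/SD parameterization above is a method-of-moments conversion to the usual alpha, beta shape parameters. A minimal Python sketch (my own implementation of the standard formulas, not the presenter's VBA code), using the 75% ± 5% enrollment rate from Example #3:

```python
def beta_params_from_mean_sd(mean, sd):
    """Method-of-moments conversion of (mean, SD) on [0,1] into the
    (alpha, beta) shape parameters of a Beta distribution."""
    common = mean * (1.0 - mean) / sd**2 - 1.0
    if common <= 0:
        raise ValueError("SD too large for a Beta with this mean")
    return mean * common, (1.0 - mean) * common

a, b = beta_params_from_mean_sd(0.75, 0.05)
print(a, b)  # alpha = 55.5, beta = 18.5; a/(a+b) recovers the mean 0.75
```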
39. Simulation from a Table
Find the value in the 1st vector;
← return the interpolated value from the 2nd
Simulate an arbitrary distribution:
• Top row: values in [0,1]
• Bottom row: quantiles
• Result: the interpolated value of U from the table
Or a function y = f(x):
• x is found in the top row; y is interpolated from the bottom row
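The two-row table lookup amounts to linear interpolation of a quantile function. A minimal Python sketch (the example probability/quantile table is my own illustration):

```python
from bisect import bisect_right

def table_inverse(u, probs, quantiles):
    """Linearly interpolate a quantile from a (probability, quantile) table —
    the top and bottom rows of the slide's two-row layout."""
    i = bisect_right(probs, u) - 1
    if i >= len(probs) - 1:
        return quantiles[-1]
    frac = (u - probs[i]) / (probs[i + 1] - probs[i])
    return quantiles[i] + frac * (quantiles[i + 1] - quantiles[i])

# A crude empirical distribution given as quantiles at 0, .25, .5, .75, 1:
probs = [0.0, 0.25, 0.5, 0.75, 1.0]
quantiles = [0.0, 1.0, 2.0, 4.0, 8.0]
print(table_inverse(0.5, probs, quantiles))    # 2.0
print(table_inverse(0.625, probs, quantiles))  # midway between 2 and 4 -> 3.0
```

Feeding `u = RAND()` through such a table simulates any distribution you can tabulate, including an estimated Kaplan-Meier curve.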
40. Table Simulation Uses
• Polygonal distributions (like Triangular)
• Survival curve (for time to event)
– Est. K-M curve from data, simulate rest of trial
• Arbitrary empirical distributions
• Distribution from observations
• Table of power calculations
– eg, assurance calculations:
• If # patients is random, so is effective power of the study
• If True effect size is random, so is Pr{success}
41. Simulation from a 2-dimensional table
Here:
• Rows are quartiles of a random function
• Left column is value of a parameter
• A family of distributions which vary with the parameter
• Parameter y=75% (can be random)
• Generate random numbers from the interpolated distribution.
42. Example #4: Interim Review
• After 2 months, review randomization rates
• Continue to Randomize to 100 patients
• How long?
43. Example#4: Interim Review (Simulation)
Y = # patients at 2 months ~ Poisson
Time to randomize (100 − Y) additional patients ~ Erlang (Gamma)
80% CI: (2.5, 3.7) months
45. Planning
Expected trial performance
• Usually not of interest – already done without simulation
• But it should be
Variability of trial performance
• Important for risk management: what's the earliest, the latest, the most,
the least, etc.
• 80% CIs
Structural problems
• Interactions of parameters may doom the trial before it even starts!
– (e.g., mean(max{X, Y}) vs. max{mean(X), mean(Y)})
¡The Flaw of Averages!
46. Prototyping
Prototyping:
• Toy simulation with hands-on teamwork
• Development model
• Get team buy-in on assumptions
• Processing speed not important
• Rapid modifications are important
Ideal?
• Develop a prototype in a 1-hour meeting
• Check for errors later
• Run large simulations later for precise estimates
47. Checking planning assumptions
• H0 = Simulation assumptions
• Observed: a value X
• {xi} = corresponding values in simulation
• Rank of X in {xi} ≈ p-value
Stored values: use the PercentRank function
Descriptive statistics: use a frequency count
Use to:
• Test assumptions, validate the model, +??
• If an observed value of X is rare in the simulation,
question the assumptions!
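The rank-based check above can be sketched directly. A minimal Python stand-in for the PercentRank idea (the simulated completion rates below are invented for illustration; this is the one-sided, lower-tail version):

```python
from bisect import bisect_right

def simulated_p_value(observed, simulated):
    """Fraction of simulated values at or below the observed value —
    the slide's 'rank of X in {x_i} ≈ p-value' idea (lower tail)."""
    xs = sorted(simulated)
    return bisect_right(xs, observed) / len(xs)

# Ten hypothetical simulated completion rates from the planning model:
sims = [0.55, 0.60, 0.62, 0.65, 0.65, 0.68, 0.70, 0.72, 0.75, 0.80]
print(simulated_p_value(0.50, sims))  # 0.0 -> observed rate below every sim
```

A value near 0 or 1 says the observation is rare under the modeled assumptions, which is exactly the cue to question them.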
48. Checking Assumptions
Example:
• A trial is designed based on a non-trivial simulation.
• The model predicts a completion rate of 65%
with 95% C.I.= (55%, 75%)
• 4 months into the trial, a 50% completion rate is
observed.
• How significant is this discrepancy?
Resimulate:
• {xi} = simulated completion rates (1/iteration)
• Rank of observed 50% in simulated {xi} ≈ p-value
• How likely is the observation, under the modeled
assumptions?
49. Sensitivity Analysis
• What-ifs
• Interactions between parameters
→ Identify key control points!
• Vary parameters between simulations
• Compare simulation results
– e.g., average and worst-case scenarios
• Correlations between simulated parameters and outcomes
50. Weighted simulations
Advantages:
• Large but unlikely events are more likely to be simulated
• Common but dull events are simulated infrequently, but up-weighted
• Rare, but exciting, events are simulated, and down-weighted
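The weighted-simulation idea described here is importance sampling. A minimal Python sketch, assuming a rare-event example of my own choosing (estimating the N(0,1) tail probability P(Z > 4) by sampling from a shifted N(4,1) proposal and down-weighting each draw by the likelihood ratio):

```python
import math
import random

def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

random.seed(7)
# Sample the rare tail from N(4,1) so "exciting" events occur often,
# then down-weight each draw by the ratio of target to proposal density.
n = 50000
total = 0.0
for _ in range(n):
    y = random.gauss(4.0, 1.0)                       # proposal: shifted normal
    if y > 4.0:
        total += normal_pdf(y) / normal_pdf(y, 4.0)  # likelihood-ratio weight
estimate = total / n
print(estimate)  # close to the true tail probability ≈ 3.17e-5
```

Plain simulation would need millions of iterations to see this event at all; the weights keep the estimate unbiased.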
51. Macro Management
VBA Editor: Alt-F11 (or find the menu)
• Copy a module between sheets
• Copy code from an .xls sheet & insert it into the VBA editor
• Open & save as a new sheet
52. Macro Management (newer)
In Visual Basic, from the toolbar:
• File > Export File
– exports VBA code (module: “SweitzerSimulationCoreCode”)
• File > Import File
– imports VBA code (into a module)
53. Further resources
Commercial and Free software packages
Provide:
• More rigorous algorithms
• More functions
– Resampling, multivariate, etc
• More support
55. Free Add-Ins
PopTools (Windows only)
www.cse.csiro.au/poptools
SimTools.xla (Macintosh & Windows)
http://home.uchicago.edu/~rmyerson/addins.htm
Caveat: licensing
• Free for non-commercial use (e.g., education)
• Not clear for other uses
(NB: the VBA code from my website is free for all uses,
but not as useful)
57. Additional Reading
INTRODUCTION TO MODELING AND GENERATING
PROBABILISTIC INPUT PROCESSES FOR SIMULATION
www.informs-sim.org/wsc07papers/008.pdf
Spreadsheet Simulation (Seila, 2006)
www.informs-sim.org/wsc06papers/002.pdf
Work Smarter, Not Harder: Guidelines for
Designing Simulation Experiments
www.informs-sim.org/wsc06papers/005.pdf
Tips for the Successful Practice of Simulation
www.informs-sim.org/wsc06papers/007.pdf
58. Probability Management
Built more elaborate models; learned to:
• Display results in a column
• Copy values to save them
• Do math with the results
Why not?
• Save columns of simulated iterations
• Recombine as needed
59. Combining simulations results
4 simulations: {2 studies} × {2 scenarios}
(Study #1 early start, Study #1 late start, Study #2 early start, Study #2 late start)
Why not?
• Save columns of simulated iterations
• Recombine as needed into estimates of the total:
– resources
– costs
– Pr{success}
• Pick the optimal combination
Requires independence!
• I.e., portfolio optimization
60. Combining simulation iterations
4 simulations: {2 studies} × {2 scenarios}
(Study #1 early start, Study #1 late start, Study #2 early start, Study #2 late start)
Why not?
• Save columns of simulated iterations
• Recombine as needed → estimates of …
• Drive all four from a simulation of common factors
• Preserves relationships
61. Probability Management
Other people are already doing it
Further research:
Primary source for the rest of this presentation:
Savage, Scholtes, and Zweidler, 2006, "Probability Management,"
OR/MS Today, Vol. 33, No. 1 (February 2006)
• http://www.orms-today.org/orms-2-06/frprobability.html
(Part 2)
• http://www.orms-today.org/orms-4-06/frprobability.html
62. Basic idea
[Diagram: simulations of common factors feed several dependent simulations,
which feed reporting & analysis programs → estimates of …]
63. Basic idea
Multiple simulations:
• Different platforms
• Different sources
• Different uses
→ Reporting & analysis programs
• A database of simulation results
• Results at the iteration level
• Coherent
64. Basic Definitions
SIP: Stochastic Information Packet
• The basic unit of information
• E.g., “the price of oil”, but for 10,000 alternative universes
SLURP: Stochastic Library Unit with Relationships Preserved
• SIPs are coherent with each other
– e.g., in each SIP, iteration #4567 is from the same alternative universe
• Analogous to demographic “representative samples”
65. Basic Definitions
Benefits of coherent modeling:
• Statistical dependencies are modeled consistently across the organization
• Models can be “rolled up” between levels of the organization
• Auditability: it is easier to audit individual simple models
Requires central control:
• Common standards
• Certification authority – a “Chief Probability Officer”
66. Coherence
Example: variables X & Y
• Coherent
• But not correlated
67. DIST Standard
How to store SIPs?
• Massive amounts of data
→ XML: 10,000 numbers in 1 XML string – metadata + Base64 encoding of the values
Contents:
• Name
• Mean, min, max, count of values
• Data type (binary, 1 or 2 byte)
How to share SIPs?
• Reduce precision and pack it!
• Base64 packs 3 bytes (8 bits each) into 4 characters (6 bits each)
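The metadata-plus-Base64 packing can be sketched in a few lines. This Python example is illustrative only — it packs raw doubles, whereas the actual DIST standard defines its own quantization, data types, and XML attributes:

```python
import base64
import struct

# Pack a column of simulated values into a compact Base64 string,
# in the spirit of the DIST "metadata + Base64" idea (illustrative only).
values = [3.1, 2.03, 7.75, 4.2]
raw = struct.pack("<%dd" % len(values), *values)   # 8 bytes per double
encoded = base64.b64encode(raw).decode("ascii")
# Base64 maps every 3 bytes (8 bits each) to 4 characters (6 bits each):
print(len(raw), len(encoded))   # 32 bytes -> 44 characters (with padding)

decoded = struct.unpack("<%dd" % len(values), base64.b64decode(encoded))
print(list(decoded) == values)  # round-trips exactly
```

The real standard reduces precision (1- or 2-byte values) before packing, which is why the metadata carries the min, max, and data type needed to decode.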
68. DIST Standard
• A SIP in DIST fits into 1 cell on a spreadsheet
<dist name="User Interface, weeks"
avg="3.3751" min="2.03" max="7.75" count="100"
type="Double" origin="DistShaper3 at smpro.ca"
ver="1.1" >G00Z9SIDCIEmC0nYFtMi6R0XKZ
+KvSzBI85ui5tMZgoDlbGt dF1d/
CqEMwUlmCfVMMg6oUByUXQyIATsaSw1QhgrhOwaaAI9D
6oks9M+IDk0XQyIDlI2mhJZBkQXRnm7IR45ST3D///
IDlgrHD I38VraK2kLownZf41jWw1tROxTsS/
jGRAUJCbwHfwougAAEXR r3A83FQnpnhXukBxM
+kswBykeb0gOQ5RByk83PxtV7mCrH1QQ
jy6LPGstpgFYRrYKvqZ9Ez8AAAAA</dist>!
• Each cell contains an array
• Operations apply functions
to each element in array
Source: Marc Thibault & Sam Savage, "Probability Management for Projects:
Managing Uncertainty in Plan Estimates and Targets," October 2011
69. Supporting Software
MS Excel spreadsheet add-ins:
• Risk Solver from Frontline Systems (www.Solver.com)
• XLSim 3 (www.VectorEconomics.com)
– small (single-sheet) interactive simulation with DISTs
– enables users of Oracle Crystal Ball and @Risk from Palisade Corp.
to read and write DISTs
• Analytica from Lumina Decision Systems, Inc. (www.Lumina.com)
SAS?
R/S+ – already vector oriented
• RExcel runs R from Excel. ??
70. R/S+
> x1 <- rnorm(10000)    # an array of 10,000 standard random normals
> y1 <- rpois(10000, 5) # an array of 10,000 random Poissons
> (x1 + y1)[1:10]       # element-by-element operations
• Already handles vectors – very fast
• Needs functions to encode & decode DIST
Accessing R from within a spreadsheet?
• RExcel – access R from within Excel (add-in)
• ROOo – access R from within the OpenOffice spreadsheet
• Open source (like Linux)
• (Perhaps) use the spreadsheet for the upper-level simulation
• Use R at the lower level – each cell contains 1000s of simulated values