This document discusses methods for interpreting complex predictive models, specifically focusing on determining variable importance. It introduces input shuffling as a technique to assess variable importance for any model, including linear regression, decision trees, neural networks, and ensembles. Input shuffling randomly shuffles values of one input variable at a time and measures the impact on model predictions to identify influential variables. The document demonstrates input shuffling on both regression and classification tasks and compares it to other variable importance metrics for linear models.
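The shuffling procedure described here (often called permutation importance) can be sketched in a few lines of Python. The model and data below are hypothetical toys, not taken from the document:

```python
import random

# Toy illustration of input shuffling (permutation importance).
# The "model" is a fixed linear function; x2 is an irrelevant input.
def model(row):
    x1, x2 = row
    return 3.0 * x1  # depends only on x1

random.seed(0)
data = [(random.uniform(0, 1), random.uniform(0, 1)) for _ in range(1000)]
y = [model(r) for r in data]  # targets generated by the model itself

def mse(rows):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, y)) / len(rows)

def shuffled_importance(col):
    """Shuffle one input column and measure the rise in prediction error."""
    values = [r[col] for r in data]
    random.shuffle(values)
    shuffled = [tuple(v if i == col else r[i] for i in range(2))
                for r, v in zip(data, values)]
    return mse(shuffled) - mse(data)

imp = [shuffled_importance(c) for c in (0, 1)]
print(imp)  # shuffling x1 hurts accuracy; shuffling x2 changes nothing
```

Because the procedure only needs predictions, it works unchanged for trees, neural networks, or ensembles.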
This document provides instructions on using Excel functions and charts. It describes Excel components and arithmetic operators. It explains the order of precedence for calculations and how to use the Insert Function button to select functions. Examples show how to define functions within functions, examine the Insert Function dialog box, and create column and pie charts using the Chart Wizard. The Chart Wizard dialog boxes guide the user in selecting a chart type, choosing data series, and customizing the chart.
Applications of simulation in Business with Example (Pratima Ray)
Simulation is modeling a real-world system on a computer to understand its behavior and evaluate strategies. It allows experimenting with a model instead of the real system. Some key uses of simulation include handling complex problems without optimal solutions, risky or costly real-world experiments, and answering "what if" questions. Simulation can be static or dynamic, deterministic or stochastic, discrete or continuous. Monte Carlo simulation uses random numbers to model uncertainty and is useful for decision-making under risk. Businesses apply simulation to areas like stock analysis, pricing, marketing, and cash flow forecasting. An example is using simulation to analyze a university health clinic's queuing system and improve operations.
The document provides an overview of Unified Modeling Language (UML) diagrams. It discusses 13 types of UML diagrams but notes that most users focus on class, sequence, and state machine diagrams. The document describes the components and syntax of class, sequence, and state machine diagrams. It provides examples of each and guidelines for creating them to model the structure and behavior of software systems.
Choosing the right process improvement tool for your project.
Learn how an experienced engineer decides when simulation is the right tool for a project, and when it isn't.
With the evolution of process improvement software, it can be difficult to decide the right tool for the job. Using something too powerful and complex can be a lengthy and unnecessary process, but underestimating the depth of analysis required and choosing something too simplistic early in a project can result in repeated work later.
Adversaries compromise at will, penetrating today’s signature and IOC dependent detection capabilities. Most incident responders are locked in a cycle of constant reaction to the fraction of activity that is known. Often, undetected attackers remain active in the network as reported incidents are remediated. A new approach is needed to break the cycle of reaction and eradicate the unknown.
An offense-based approach must be adopted. Hunting puts the defender on the offensive within their networks, allowing for rapid detection and remediation of threats. Adversary dwell time can be drastically reduced, reducing business impacts and recovery costs. The Endgame hunt platform enables instant protection, visibility, and precision response across your endpoints and automates detection of known and never before seen adversaries without relying on signatures.
This talk covers:
• Description and benefits of hunt
• Challenges of hunting
• Solutions and hunting best practices
This document provides an overview of regression analysis and linear regression. It explains that regression analysis estimates relationships among variables to predict continuous outcomes. Linear regression finds the best fitting line through minimizing error. It describes modeling with multiple features, representing data in vector and matrix form, and using gradient descent optimization to learn the weights through iterative updates. The goal is to minimize a cost function measuring error between predictions and true values.
MATLAB is a high-level programming language and computing environment used for numerical computations, visualization, and programming. The document discusses MATLAB's capabilities including its toolboxes, plotting functions, control structures, M-files, and user-defined functions. MATLAB is useful for engineering and scientific calculations due to its matrix-based operations and built-in functions.
This document provides an overview of MATLAB, including its uses, features, and basic programming concepts. MATLAB is a numerical computing environment and programming language that allows matrix manipulations, data visualization, algorithm development, and interfacing with other languages. It has a comprehensive set of built-in functions for mathematical and technical computing. The document discusses MATLAB's programming constructs like scripts, functions, operators, decision making statements, and loops. It also covers basic data types like vectors and matrices.
Machine Learning Essentials Demystified part2 | Big Data Demystified (Omid Vahdaty)
The document provides an overview of machine learning concepts including linear regression, artificial neural networks, and convolutional neural networks. It discusses how artificial neural networks are inspired by biological neurons and can learn relationships in data. The document uses the MNIST dataset example to demonstrate how a neural network can be trained to classify images of handwritten digits using backpropagation to adjust weights to minimize error. TensorFlow is introduced as a popular Python library for building machine learning models, enabling flexible creation and training of neural networks.
Machine Learning can often be a daunting subject to tackle much less utilize in a meaningful manner. In this session, attendees will learn how to take their existing data, shape it, and create models that automatically can make principled business decisions directly in their applications. The discussion will include explanations of the data acquisition and shaping process. Additionally, attendees will learn the basics of machine learning - primarily the supervised learning problem.
This document provides an overview of MATLAB for students in engineering fields. It introduces MATLAB as a tool for matrix calculations and numerical computing. It describes the MATLAB environment and commands for help, variables, matrices, logical operations, flow control, scripts and functions. It also covers image processing in MATLAB, including importing and displaying images, image data types, basic operations, and examples of blending and edge detection on images. Finally, it discusses performance issues and the importance of vectorizing code to avoid slow loops.
Elementary Data Analysis with MS Excel_Day-4 (Redwan Ferdous)
This event took place on 12th September 2020 and was arranged by EMK Center (Makerlab). The title was 'Elementary Data Analysis with MS Excel', covering very basic data analysis with MS Excel.
Day 4 covered the MS Excel Data tab, the View and Review tabs, and the Developer tab of the horizontal top ribbon, as well as the Quick Analysis tools, What-If Analysis, Data Tables, the Scenario Manager, and Pareto charts.
The document discusses problem-solving and design skills needed for computer programming. It covers several key topics:
1. Candidates should understand top-down design and be able to break down computer systems into subsystems using structure diagrams, flowcharts, pseudocode, and subroutines.
2. Candidates should be able to work with algorithms - explaining them, suggesting test data, and identifying/fixing errors. They should be able to produce algorithms for problems.
3. Top-down design is described as the process of breaking down a computer system into subsystems, then breaking each subsystem into smaller subsystems, until each performs a single action.
The document provides an introduction to image processing and MATLAB. It defines key concepts in image processing like image formation through sampling and quantization. It also introduces various tools in MATLAB for working with digital images, such as importing/exporting images, displaying images using functions like imshow and imagesc, and performing basic operations on image matrices.
This is a single-day course that gives the learner hands-on experience with the basics of deep learning: the first half builds a network using Python/NumPy only, and the second half builds a more advanced network using TensorFlow/Keras.
At the end you will find a list of useful pointers for further study.
course git: https://gitlab.com/eshlomo/EazyDnn
Dataset Preparation
Abstract: This PDSG workshop introduces basic concepts on preparing a dataset for training a model. Concepts covered are data wrangling, replacing missing values, categorical variable conversion, and feature scaling.
Level: Fundamental
Requirements: No prior programming or statistics knowledge required.
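The preparation steps the abstract lists can be illustrated with a toy, pure-Python sketch. The data and column names are hypothetical; a real workshop would likely use pandas or scikit-learn:

```python
# Toy sketch of dataset preparation: impute a missing value with the column
# mean, one-hot encode a categorical column, and min-max scale a numeric one.
rows = [
    {"age": 24.0, "city": "Paris"},
    {"age": None, "city": "Tokyo"},   # missing value to impute
    {"age": 48.0, "city": "Paris"},
]

# 1) Replace missing ages with the mean of the observed ages
ages = [r["age"] for r in rows if r["age"] is not None]
mean_age = sum(ages) / len(ages)
for r in rows:
    if r["age"] is None:
        r["age"] = mean_age

# 2) One-hot encode the categorical "city" column
cities = sorted({r["city"] for r in rows})
for r in rows:
    for c in cities:
        r[f"city_{c}"] = 1 if r["city"] == c else 0
    del r["city"]

# 3) Min-max scale "age" into [0, 1]
lo, hi = min(r["age"] for r in rows), max(r["age"] for r in rows)
for r in rows:
    r["age"] = (r["age"] - lo) / (hi - lo)

print(rows)
```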
The value of "a.value" will be printed to the VBA Immediate window when that line is executed. The Debug.Print statement sends its output to the Immediate window, which is useful for inspecting variable values while code is running without stopping the execution.
This document provides an overview of machine learning concepts including:
1. It defines data science and machine learning, distinguishing machine learning's focus on letting systems learn from data rather than being explicitly programmed.
2. It describes the two main areas of machine learning - supervised learning which uses labeled examples to predict outcomes, and unsupervised learning which finds patterns in unlabeled data.
3. It outlines the typical machine learning process of obtaining data, cleaning and transforming it, applying mathematical models, and using the resulting models to make predictions. Popular models like decision trees, neural networks, and support vector machines are also briefly introduced.
This document provides an introduction to Excel, Word, and PowerPoint. It discusses the basics of spreadsheets in Excel including creating and formatting worksheets, calculations with formulas, and copying data to other programs. It also covers creating and formatting presentations in PowerPoint including adding slides, text, images, and charts. Finally, it discusses opening and viewing documents in Word and resources for learning more about Microsoft Office applications.
Machine learning for IoT - unpacking the blackbox (Ivo Andreev)
This document provides an overview of machine learning and how it can be applied to IoT scenarios. It discusses different machine learning algorithms like supervised and unsupervised learning. It also compares various machine learning platforms like Azure ML, BigML, Amazon ML, Google Prediction and IBM Watson ML. It provides guidance on choosing the right algorithm based on the data and diagnosing why machine learning models may fail. It also introduces neural networks and deep learning concepts. Finally, it demonstrates Azure ML capabilities through a predictive maintenance example.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks, or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
Lecture 8 - Feature Engineering and Optimization, a lecture in subject module... (Maninda Edirisooriya)
This lesson covers the core data science content required for applying ML. It was one of the lectures of a full course I taught at the University of Moratuwa, Sri Lanka, in the second half of 2023.
This document describes how modeling and simulation (M&S) can be used to project outcomes for clinical trials. M&S involves building statistical models based on incoming patient data and then simulating the remainder of the study multiple times. This allows researchers to predict milestones, test alternative scenarios, and validate study assumptions. The document provides examples of how M&S was used to accurately forecast timelines and inform decisions for trials experiencing issues with enrollment rates and event rates differing from initial assumptions. Management found the simulations to be very valuable for planning by providing projections when other methods would have involved guessing.
This document introduces Tolstoy Targets, a visualization method using radial axes to provide a concise summary of multiple objectives or attributes. It discusses principles like using traffic light colors to indicate success or failure of predefined targets. Conventions are outlined, such as grouping attributes by direction and adding confidence ranges. Practical examples demonstrate comparing projects, mass screening of enzymes, transfusion risks for multiple patients, and assessment scores. The document concludes by providing contact information for the author.
This document describes how modeling and simulation (M&S) can be used to project timelines and resource needs for clinical trials. M&S involves building statistical models based on incoming patient data and then simulating the remainder of the study multiple times. This allows researchers to predict milestones, test alternative scenarios, and validate study assumptions. The document provides examples of how M&S accurately predicted timelines for trials with complex multi-segment designs and competing risk events. Study managers found the projections from M&S to be very valuable for planning purposes.
This document discusses metrics for assessing the performance of randomization methods in clinical trials. It proposes measuring randomness using potential selection bias, which calculates how well an observer could guess the next treatment assignment based on previous assignments. It also considers periodicity to detect patterns. Balance is measured using efficiency loss, which quantifies the increase in variability due to imbalances. The document outlines a simulation study comparing randomization methods using these proposed metrics. Stratification factors are modeled using a Zipf-Mandelbrot distribution to generate realistic subgroup sizes. Randomness and balance metrics are calculated at interim analyses and summarized graphically.
This document discusses metrics for assessing the predictability and efficiency of covariate-adaptive randomization designs in clinical trials. It proposes measuring predictability using a modified Blackwell-Hodges potential selection bias metric that calculates how well an observer could guess the next treatment assignment. It also considers entropy and periodicity measures. Balance/efficiency is proposed to be measured using Atkinson's method of quantifying the loss of statistical power as an equivalent reduction in sample size due to treatment imbalances within subgroups. The document then outlines a simulation study to compare various randomization methods using these proposed metrics.
This document discusses methods for assessing randomization in clinical trials that use covariate-adaptive randomization designs. It presents metrics for measuring randomness, balance, and efficiency loss in randomization schemes. The document outlines a simulation approach and discusses results from comparing different randomization factors and sample sizes. It proposes future directions such as optimizing randomization parameters and exploring periodicity in system behavior.
- Simulations of clinical trial randomization methods showed consistent trade-offs between efficiency and unpredictability over different methods and parameters. No single best method optimized both metrics.
- Two metrics were used to evaluate predictability (potential for selection bias) and efficiency (loss of statistical power): simulations revealed clear trade-offs between higher predictability and lower efficiency.
- As sample size increased, most methods became more efficient while some also became more predictable and others less predictable, depending on the method. Permuted blocks, dynamic allocation, and complete randomization were among the methods evaluated.
The document discusses randomization in clinical trials. It explains that randomization is important to minimize bias and balance treatment groups. Different randomization methods are presented: complete randomization, minimization, and permuted blocks. Metrics for evaluating randomization, such as balance, predictability, and loss of power, are covered. Simulations comparing the methods with respect to confounding factors, overall performance, and patient discontinuation are described. The importance of balanced treatment groups for maintaining statistical power and avoiding underpowered results is emphasized.
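The predictability trade-off these summaries describe can be illustrated with a small sketch: under permuted blocks, an observer who always guesses the arm assigned least often so far in the current block is right well over half the time. This is a hypothetical Python sketch, not code from the documents:

```python
import random

# Sketch of why permuted blocks trade unpredictability for balance.
# The observer guesses the next assignment as the arm used least so far in
# the current block (a Blackwell-Hodges "convergence" strategy).
def permuted_blocks(n, block=4):
    seq = []
    while len(seq) < n:
        b = ["A"] * (block // 2) + ["B"] * (block // 2)
        random.shuffle(b)
        seq.extend(b)
    return seq[:n]

def correct_guesses(seq, block=4):
    hits = 0
    for i, actual in enumerate(seq):
        sofar = seq[i - i % block : i]          # assignments in current block
        guess = "A" if sofar.count("A") <= sofar.count("B") else "B"
        hits += guess == actual
    return hits / len(seq)

random.seed(1)
rates = [correct_guesses(permuted_blocks(400)) for _ in range(50)]
avg = sum(rates) / len(rates)
print(round(avg, 2))  # well above 0.5: blocks make assignments partly predictable
```

Complete randomization would hold the guess rate at 0.5 but gives up the within-block balance guarantee, which is exactly the efficiency/unpredictability trade-off the simulations report.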
Splatter plots provide:
(1) A comprehensive yet reducible way to visualize data across multiple dimensions.
(2) Diagnostic insights are obvious and interpretable at a glance, with problem areas visually identified.
(3) Various symbols, colors and visual cues can be used depending on the type of data and desired level of precision needed.
1. Simulation in Excel:
Tricks, Trials & Trends
Presented to the
American College of Radiology
12 January 2012
Dennis Sweitzer, Ph.D.
www.Dennis-Sweitzer.com
2. Abstract
Simulation in Excel: Tricks, Trials & Trends
Excel is a general purpose spreadsheet which is widely used & understood, but rarely used by itself for
simulations. However, the Data Table function in MS Excel can be used to execute substantial
simulations, without requiring cumbersome programming "tricks" or VBA coding. The result is an
arbitrarily large results table in which each row is one iteration of the simulation, and each column is a
random variable generated in the simulation.
A small number of additional probability functions are easily programmed in VBA, making Excel a
general-purpose simulation package. Because VBA is interpreted, use of VBA functions can greatly limit
the speed of a simulation. However, for simulations of modest size and complexity, the ease and familiarity
of working in Excel outweigh the disadvantage in speed. Examples from clinical trials will be used.
Finally, I discuss new methods to move simulations out of the black boxes and into the enterprise, based
on work by Sam Savage. Simulation results (a “SIP”, or “Stochastic Information Packet”) from multiple
platforms can be stored as XML strings (using the DIST standard) in a “SLURP” (“Stochastic Library Unit
with Relationships Preserved”), and from there used for reports, planning, etc., or incorporated into other
simulations.
3. Outline
• How to do Simulation in Excel
• Notes on using Inverse Probability Functions
• Some Macros and VBA functions
• Clinical Trial Examples
• Probability Management in SIPs, SLURPs, & DIST
4. Background
• Occasional need for simulations
• Excel is convenient, but
– does not explicitly support simulations
– simulation usually requires VBA programming (so why not use R or SAS instead?)
– or commercial add-in programs (e.g., @Risk)
– or some academic add-ins
• Excel does have iterative calculations and Solver
• So why not simulation?
5. Simulate what?
• Stochastic Models
– Unknown parameters? → Guesstimate a distribution
– Optimizing choices? → Test each with simulations
• Sensitivity Analysis
– Variations in inputs → variations in outputs
– 2 parameters: use a table
– >2 parameters: simulate & compare variation
6. Excel: Pros
Common Language / Common Tools
• Most people understand Excel
• Many tools available in Excel
Transparency: modeling assumptions can be specified – graphed – debated
What you see is what you get!
More hands on deck, more eyes on the prize…:
– Statistician: builds the initial model, then repairs & enhances it
– Team member: explores & breaks the model
– …repeat until satisfied
7. Excel Cons
Slower than SAS, S+, R, etc.
Lacks some statistical/probability functions
• Latest versions are a little better
• Still need to add some VBA code
• Known bugs in statistical routines (often fixed)
Tradeoffs:
• Quicker modifications vs. slower execution
8. Simple Solution: Data Tables
Excel Data Tables
• Creates a table of values of a function
– each column is a random variable
• The leftmost column is used as an argument
– (unneeded for simulation)
• The Data Table repeats the calculations for each row
– each row is a simulation iteration
9. 1. Create Simulation
Create random variables using the inverse probability method:
For a random variable X with distribution function F(x): ℝ → [0,1],
if U is a random uniform on [0,1], then
X = F⁻¹(U)   (Excel: U = RAND())
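The inverse-probability step can be sketched outside Excel as well. A minimal Python stand-in for what a cell formula like `=-LN(1-RAND())/rate` does (the exponential distribution and the `rate` parameter are my own illustration, not from the slides):

```python
import math
import random

def exponential_inverse_cdf(u, rate):
    """F^-1(u) for the exponential distribution, where F(x) = 1 - exp(-rate*x)."""
    return -math.log(1.0 - u) / rate

# One simulated draw: generate U ~ Uniform[0,1), then map it through F^-1,
# just as the slide's X = F^-1(U) with U = RAND() does in a spreadsheet cell.
u = random.random()
x = exponential_inverse_cdf(u, rate=2.0)
print(x)  # a single exponential(rate=2) draw
```

Each Data Table row would hold one such `u` and the resulting `x`; saving the `u` values is what later slides use for debugging.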
10. 2. Align Random Variables
• Calculations can be anywhere in the spreadsheet
• Reference the variables in a row
• It is best to label the variables consistently
11. 3. Select Data Table
• Select the table region
– 1st row is the random variables
– 1st column is not used (can label iterations)
• From the toolbar:
– Data > Data Table
12. 4. Create Simulation Table
• Column input cell = upper left-hand corner of the table
• Row input cell = leave blank
• OK → populates the table
• (may have to recalculate manually)
13. 5. Execute Simulation
Iterative development
• The simulation can be changed
• Add reporting variables
• Recalculate to rerun
– (no need to use Data Table again, unless expanding)
• Hint: debug with a short table, expand for the final run
15. But still more….
• Why use inverse probability distributions
(instead of random variables)?
• When not to use a spreadsheet for simulation?
• Tools:
– Macros to set up a simulation
– VBA functions for common simulation distributions
• Trends: Probability Management
– SIPs, SLURPS, DIST
16. Inverse Probability Function
• Most systems directly generate random variables with the desired distribution
• So why use inverse probability functions, which are (probably) slower?
Personal opinion:
• Testing & debugging
• Verification ← calculates correctly
• Validation ← calculations answer the problem
• Sensitivity ← input vs. output variability
17. Why use Inverse Probability Distributions?
• Testing & Debugging
• Validation & Verification
• Sensitivity
← Save the RAND() values
→ Recreate unexpected results
→ Reasonableness: do small changes in RAND() produce small changes in output?
→ Explore the impact of small changes in RAND() values on the simulation output
18. As Mapping function
Mapping: U ⟼ F⁻¹(U)
Probability distribution: F(x): ℝ → [0,1]
Random uniform: U ∈ (0,1]
Inverse PDF: X = F⁻¹(U)
For a continuous (or monotone) F⁻¹:
small changes in U → small changes in F⁻¹(U)
21. Example #1
Simple model, saving {Uᵢ}: a function of 2 random variables
• Verify • Replicate • Quantify
A max value looks high. Is it a bug? If not, how often?
Saved the random U[0,1] for each iteration;
check the u that generated the high value.
u = 0.983… → a random high → rarely happens
23. Spreadsheet limitations
• Only simple data structures are available
– rows & columns; no lists or trees
– (so discrete-event simulations are difficult)
• Complex algorithms are difficult
– e.g., while or for loops
– can improvise (cumbersome, slow, buggy)
• Speed: slow
• Data storage: what-you-see-is-all-you-get
24. Tools: Excel Simulation Template
• Adds some missing random functions
• Adds some set-up macros
Excel template & examples at:
www.Dennis-Sweitzer.com
25. Macro SimulationSampler
To start a new simulation when you don't remember the names & parameters of
common random variables used in simulation:
• Run the macro SimulationSampler
• Copy, delete, and edit as needed.
• Make sure all random values are referenced in the first row of the data table
at the bottom.
27. Macro SimulationSampler
• Sets up a header row for the data table
• Sets up a place for statistics
28. Macro Simulate
• Highlight the row of random variables
– (1st row of the simulation table)
• Run the macro "Simulate"
– prompts for the number of simulation iterations
– the default number of iterations is 100
– debug & develop (recalculate manually)
– final run with >1000 iterations
– Visual Basic code is computationally intensive
30. Excel Random Variables
Rand() – random uniform on [0,1]
NormSInv() – inverse standard normal distribution
CriticalBinomial() – inverse binomial distribution
LogNormInv() – inverse log normal distribution
Caveat: parameters are the mean and SD after the log transformation
31. Erlang Distribution
How long do you wait until you get a
predetermined number of arrivals?
• Interarrival times are distributed IID
exponential
• Erlang is Gamma with integer parameter
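Because the Erlang is the sum of k iid exponential interarrival times, it can be simulated directly from the inverse-probability exponential draws above. A minimal Python sketch (the shape/rate values are my own illustration):

```python
import math
import random

def erlang_draw(k, rate):
    """Waiting time until the k-th arrival: the sum of k iid exponential
    interarrival times, i.e. an Erlang (Gamma with integer shape k)."""
    return sum(-math.log(1.0 - random.random()) / rate for _ in range(k))

random.seed(1)
# The mean of an Erlang(k, rate) is k/rate; check it roughly by simulation.
draws = [erlang_draw(k=5, rate=2.0) for _ in range(20000)]
print(sum(draws) / len(draws))  # ≈ 2.5
```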
32. Beta Distribution
Can use as
• Distribution of a Binomial probability
• Range = [0,1]
• Generic bounded hump (vs Normal as generic unbounded hump)
• Better behaved than a triangular distribution
34. Example #2, Simulation
• Time to the 100th patient
• Patients arrive IID exponential
Summary statistics of the simulated values (below)
Interpretation: under the assumptions, 90% of simulations required more than
4.4 months
35. Added VBA Functions
Inverse functions needed for simulation
• Poisson, negative binomial
Interpolation from a table
• Interpolate: 1- or 2-dimensional interpolation
Convenience
• Beta with mean, SD as parameters
• Beta with high, low, and mode as parameters
• Log normal with mean, SD as parameters
36. Missing Statistical Functions
Inverse distributions
• InvPoisson :: Poisson
• InvPascal :: negative binomial
– (how many failures before k successes)
• The negative binomial generalizes to a continuous (non-integer) parameter;
• the discrete, integer-parameter version is often denoted the Pascal distribution
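An inverse Poisson like the slide's `InvPoisson` can be built by accumulating the CDF until it reaches u. A minimal Python sketch of that approach (the function name and parameters here are illustrative, not the presenter's VBA signature); intended for u strictly between 0 and 1:

```python
import math

def inv_poisson(u, lam):
    """Smallest k with F(k) >= u for a Poisson(lam) variable, built by
    accumulating pmf terms p(k) = exp(-lam) * lam^k / k!."""
    p = math.exp(-lam)   # p(0)
    cdf = p
    k = 0
    while cdf < u:
        k += 1
        p *= lam / k     # p(k) computed recursively from p(k-1)
        cdf += p
    return k

print(inv_poisson(0.5, 3.0))  # 3, the median of Poisson(3)
```

Feeding `u = RAND()` through such a function yields Poisson draws, exactly as for the continuous inverses.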
37. Example#3,
Patients to Screen
Expected enrollment rate = 75% ± 5% ~ Beta distribution
# Screen failures ~ negative binomial (Pascal)
– depends on the enrollment rate
38. Beta Distribution (2)
For convenience:
• Beta distribution given mean, SD
• Beta distribution given mean, SD, upper & lower bounds
• Beta distribution given mode, upper & lower bounds
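The mean/SD parameterization above is a method-of-moments conversion to the usual alpha, beta shape parameters. A minimal Python sketch (my own implementation of the standard formulas, not the presenter's VBA code), using the 75% ± 5% enrollment rate from Example #3:

```python
def beta_params_from_mean_sd(mean, sd):
    """Method-of-moments conversion of (mean, SD) on [0,1] into the
    (alpha, beta) shape parameters of a Beta distribution."""
    common = mean * (1.0 - mean) / sd**2 - 1.0
    if common <= 0:
        raise ValueError("SD too large for a Beta with this mean")
    return mean * common, (1.0 - mean) * common

a, b = beta_params_from_mean_sd(0.75, 0.05)
print(a, b)  # alpha = 55.5, beta = 18.5; a/(a+b) recovers the mean 0.75
```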
39. Simulation from a Table
Find the value in the 1st vector;
← return the interpolated value from the 2nd
Simulate an arbitrary distribution:
• Top row: values in [0,1]
• Bottom row: quantiles
• Result: the interpolated value of U from the table
Or a function y = f(x):
• x is found in the top row; y is interpolated from the bottom row
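The two-row table lookup amounts to linear interpolation of a quantile function. A minimal Python sketch (the example probability/quantile table is my own illustration):

```python
from bisect import bisect_right

def table_inverse(u, probs, quantiles):
    """Linearly interpolate a quantile from a (probability, quantile) table —
    the top and bottom rows of the slide's two-row layout."""
    i = bisect_right(probs, u) - 1
    if i >= len(probs) - 1:
        return quantiles[-1]
    frac = (u - probs[i]) / (probs[i + 1] - probs[i])
    return quantiles[i] + frac * (quantiles[i + 1] - quantiles[i])

# A crude empirical distribution given as quantiles at 0, .25, .5, .75, 1:
probs = [0.0, 0.25, 0.5, 0.75, 1.0]
quantiles = [0.0, 1.0, 2.0, 4.0, 8.0]
print(table_inverse(0.5, probs, quantiles))    # 2.0
print(table_inverse(0.625, probs, quantiles))  # midway between 2 and 4 -> 3.0
```

Feeding `u = RAND()` through such a table simulates any distribution you can tabulate, including an estimated Kaplan-Meier curve.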
40. Table Simulation Uses
• Polygonal distributions (like Triangular)
• Survival curve (for time to event)
– Est. K-M curve from data, simulate rest of trial
• Arbitrary empirical distributions
• Distribution from observations
• Table of power calculations
– eg, assurance calculations:
• If # patients is random, so is effective power of the study
• If True effect size is random, so is Pr{success}
41. Simulation from a 2-dimensional table
Here:
• Rows are quartiles of a random function
• Left column is value of a parameter
• A family of distributions which vary with the parameter
• Parameter y=75% (can be random)
• Generate random numbers from the interpolated distribution.
42. Example #4: Interim Review
• After 2 months, review randomization rates
• Continue to Randomize to 100 patients
• How long?
43. Example#4: Interim Review (Simulation)
Y = # patients at 2 months ~ Poisson
Time to randomize (100 − Y) additional patients ~ Erlang (Gamma)
80% CI: (2.5, 3.7) months
45. Planning
Expected trial performance
• Usually not of interest – already done without simulation
• But it should be
Variability of trial performance
• Important for risk management: what's the earliest, the latest, the most,
the least, etc.
• 80% CIs
Structural problems
• Interactions of parameters may doom the trial before it even starts!
– (e.g., mean(max{X, Y}) vs. max{mean(X), mean(Y)})
¡The Flaw of Averages!
46. Prototyping
Prototyping:
• Toy simulation with hands-on teamwork
• Development model
• Get team buy-in on assumptions
• Processing speed not important
• Rapid modifications are important
Ideal?
• Develop a prototype in a 1-hour meeting
• Check for errors later
• Run large simulations later for precise estimates
47. Checking planning assumptions
• H0 = Simulation assumptions
• Observed: a value X
• {xi} = corresponding values in simulation
• Rank of X in {xi} ≈ p-value
Stored values: use the PercentRank function
Descriptive statistics: use a frequency count
Use to:
• Test assumptions, validate the model, +??
• If an observed value of X is rare in the simulation,
question the assumptions!
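The rank-based check above can be sketched directly. A minimal Python stand-in for the PercentRank idea (the simulated completion rates below are invented for illustration; this is the one-sided, lower-tail version):

```python
from bisect import bisect_right

def simulated_p_value(observed, simulated):
    """Fraction of simulated values at or below the observed value —
    the slide's 'rank of X in {x_i} ≈ p-value' idea (lower tail)."""
    xs = sorted(simulated)
    return bisect_right(xs, observed) / len(xs)

# Ten hypothetical simulated completion rates from the planning model:
sims = [0.55, 0.60, 0.62, 0.65, 0.65, 0.68, 0.70, 0.72, 0.75, 0.80]
print(simulated_p_value(0.50, sims))  # 0.0 -> observed rate below every sim
```

A value near 0 or 1 says the observation is rare under the modeled assumptions, which is exactly the cue to question them.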
48. Checking Assumptions
Example:
• A trial is designed based on a non-trivial simulation.
• The model predicts a completion rate of 65%
with 95% C.I.= (55%, 75%)
• 4 months into the trial, a 50% completion rate is
observed.
• How significant is this discrepancy?
Resimulate:
• {xi} = simulated completion rates (1/iteration)
• Rank of observed 50% in simulated {xi} ≈ p-value
• How likely is the observation, under the modeled
assumptions?
49. Sensitivity Analysis
• What-ifs
• Interactions between parameters
→ Identify key control points!
• Vary parameters between simulations
• Compare simulation results
– e.g., average and worst-case scenarios
• Correlations between simulated parameters and outcomes
50. Weighted simulations
Advantages:
• Large but unlikely events are more likely to be simulated
• Common but dull events are simulated infrequently, but up-weighted
• Rare, but exciting, events are simulated, and down-weighted
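The weighted-simulation idea described here is importance sampling. A minimal Python sketch, assuming a rare-event example of my own choosing (estimating the N(0,1) tail probability P(Z > 4) by sampling from a shifted N(4,1) proposal and down-weighting each draw by the likelihood ratio):

```python
import math
import random

def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

random.seed(7)
# Sample the rare tail from N(4,1) so "exciting" events occur often,
# then down-weight each draw by the ratio of target to proposal density.
n = 50000
total = 0.0
for _ in range(n):
    y = random.gauss(4.0, 1.0)                       # proposal: shifted normal
    if y > 4.0:
        total += normal_pdf(y) / normal_pdf(y, 4.0)  # likelihood-ratio weight
estimate = total / n
print(estimate)  # close to the true tail probability ≈ 3.17e-5
```

Plain simulation would need millions of iterations to see this event at all; the weights keep the estimate unbiased.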
51. Macro Management
VBA Editor: Alt-F11 (or find the menu)
• Copy a module between sheets
• Copy code from an .xls sheet & insert it into the VBA editor
• Open & save as a new sheet
52. Macro Management (newer)
In Visual Basic, from the toolbar:
• File > Export File
– exports VBA code (module: “SweitzerSimulationCoreCode”)
• File > Import File
– imports VBA code (into a module)
53. Further resources
Commercial and Free software packages
Provide:
• More rigorous algorithms
• More functions
– Resampling, multivariate, etc
• More support
55. Free Add-Ins
PopTools (Windows only)
www.cse.csiro.au/poptools
SimTools.xla (Macintosh & Windows)
http://home.uchicago.edu/~rmyerson/addins.htm
Caveat: licensing
• Free for non-commercial use (e.g., education)
• Not clear for other uses
(NB: the VBA code from my website is free for all uses,
but not as useful)
57. Additional Reading
INTRODUCTION TO MODELING AND GENERATING
PROBABILISTIC INPUT PROCESSES FOR SIMULATION
www.informs-sim.org/wsc07papers/008.pdf
Spreadsheet Simulation (Seila, 2006)
www.informs-sim.org/wsc06papers/002.pdf
Work Smarter, Not Harder: Guidelines for
Designing Simulation Experiments
www.informs-sim.org/wsc06papers/005.pdf
Tips for the Successful Practice of Simulation
www.informs-sim.org/wsc06papers/007.pdf
58. Probability Management
Built more elaborate models; learned to:
• Display results in a column
• Copy values to save them
• Do math with the results
Why not?
• Save columns of simulated iterations
• Recombine as needed
59. Combining simulations results
4 simulations: {2 studies} × {2 scenarios}
(Study #1 early start, Study #1 late start, Study #2 early start, Study #2 late start)
Why not?
• Save columns of simulated iterations
• Recombine as needed into estimates of the total:
– resources
– costs
– Pr{success}
• Pick the optimal combination
Requires independence!
• I.e., portfolio optimization
60. Combining simulation iterations
4 simulations: {2 studies} × {2 scenarios}
(Study #1 early start, Study #1 late start, Study #2 early start, Study #2 late start)
Why not?
• Save columns of simulated iterations
• Recombine as needed → estimates of …
• Drive all four from a simulation of common factors
• Preserves relationships
61. Probability Management
Other people are already doing it
Further research:
Primary source for the rest of this presentation:
Savage, Scholtes, and Zweidler, 2006, "Probability Management,"
OR/MS Today, Vol. 33, No. 1 (February 2006)
• http://www.orms-today.org/orms-2-06/frprobability.html
(Part 2)
• http://www.orms-today.org/orms-4-06/frprobability.html
62. Basic idea
[Diagram: simulations of common factors feed several dependent simulations,
which feed reporting & analysis programs → estimates of …]
63. Basic idea
Multiple simulations:
• Different platforms
• Different sources
• Different uses
→ Reporting & analysis programs
• A database of simulation results
• Results at the iteration level
• Coherent
64. Basic Definitions
SIP: Stochastic Information Packet
• The basic unit of information
• E.g., “the price of oil”, but for 10,000 alternative universes
SLURP: Stochastic Library Unit with Relationships Preserved
• SIPs are coherent with each other
– e.g., in each SIP, iteration #4567 is from the same alternative universe
• Analogous to demographic “representative samples”
65. Basic Definitions
Benefits of coherent modeling:
• Statistical dependencies are modeled consistently across the organization
• Models can be “rolled up” between levels of the organization
• Auditability: it is easier to audit individual simple models
Requires central control:
• Common standards
• Certification authority – a “Chief Probability Officer”
66. Coherence
Example: variables X & Y
• Coherent
• But not correlated
67. DIST Standard
How to store SIPs?
• Massive amounts of data
→ XML: 10,000 numbers in 1 XML string – metadata + Base64 encoding of the values
Contents:
• Name
• Mean, min, max, count of values
• Data type (binary, 1 or 2 byte)
How to share SIPs?
• Reduce precision and pack it!
• Base64 packs 3 bytes (8 bits each) into 4 characters (6 bits each)
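The metadata-plus-Base64 packing can be sketched in a few lines. This Python example is illustrative only — it packs raw doubles, whereas the actual DIST standard defines its own quantization, data types, and XML attributes:

```python
import base64
import struct

# Pack a column of simulated values into a compact Base64 string,
# in the spirit of the DIST "metadata + Base64" idea (illustrative only).
values = [3.1, 2.03, 7.75, 4.2]
raw = struct.pack("<%dd" % len(values), *values)   # 8 bytes per double
encoded = base64.b64encode(raw).decode("ascii")
# Base64 maps every 3 bytes (8 bits each) to 4 characters (6 bits each):
print(len(raw), len(encoded))   # 32 bytes -> 44 characters (with padding)

decoded = struct.unpack("<%dd" % len(values), base64.b64decode(encoded))
print(list(decoded) == values)  # round-trips exactly
```

The real standard reduces precision (1- or 2-byte values) before packing, which is why the metadata carries the min, max, and data type needed to decode.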
68. DIST Standard
• A SIP in DIST fits into 1 cell on a spreadsheet
<dist name="User Interface, weeks"
avg="3.3751" min="2.03" max="7.75" count="100"
type="Double" origin="DistShaper3 at smpro.ca"
ver="1.1" >G00Z9SIDCIEmC0nYFtMi6R0XKZ
+KvSzBI85ui5tMZgoDlbGt dF1d/
CqEMwUlmCfVMMg6oUByUXQyIATsaSw1QhgrhOwaaAI9D
6oks9M+IDk0XQyIDlI2mhJZBkQXRnm7IR45ST3D///
IDlgrHD I38VraK2kLownZf41jWw1tROxTsS/
jGRAUJCbwHfwougAAEXR r3A83FQnpnhXukBxM
+kswBykeb0gOQ5RByk83PxtV7mCrH1QQ
jy6LPGstpgFYRrYKvqZ9Ez8AAAAA</dist>!
• Each cell contains an array
• Operations apply functions
to each element in array
Source: Marc Thibault & Sam Savage, "Probability Management for Projects:
Managing Uncertainty in Plan Estimates and Targets," October 2011
69. Supporting Software
MS Excel spreadsheet add-ins:
• Risk Solver from Frontline Systems (www.Solver.com)
• XLSim 3 (www.VectorEconomics.com)
– small (single-sheet) interactive simulation with DISTs
– enables users of Oracle Crystal Ball and @Risk from Palisade Corp.
to read and write DISTs
• Analytica from Lumina Decision Systems, Inc. (www.Lumina.com)
SAS?
R/S+ – already vector oriented
• RExcel runs R from Excel. ??
70. R/S+
> x1 <- rnorm(10000)    # an array of 10,000 standard random normals
> y1 <- rpois(10000, 5) # an array of 10,000 random Poissons
> (x1 + y1)[1:10]       # element-by-element operations
• Already handles vectors – very fast
• Needs functions to encode & decode DIST
Accessing R from within a spreadsheet?
• RExcel – access R from within Excel (add-in)
• ROOo – access R from within the OpenOffice spreadsheet
• Open source (like Linux)
• (Perhaps) use the spreadsheet for the upper-level simulation
• Use R at the lower level – each cell contains 1000s of simulated values