AN INTRODUCTION TO
 SIMULATION DESIGN IN
 THE SOCIAL SCIENCES
By Francis Smart

Michigan State University
Agricultural, Food, and Resource Economics
Measurement and Quantitative Methods


www.econometricsbysimulation.com
Why Simulate?
Simulation is a detailed thought experiment:

1.   Confirm theoretical results.

2.   Explore unknown theoretical environments.

3.   Serve as a statistical method for generating estimates.
Why Simulate?
1.    Confirmatory results:
     a.   Develop theory
     b.   Design simulation
     c.   Get results
     d.   Sensitivity analysis

2. Exploratory analysis:
     a.   Develop simulation
     b.   Get results
     c.   Develop theory
     d.   Sensitivity analysis

3. Statistical estimators:
     a. Bootstrap
     b. Markov Chain Monte Carlo (Bayesian)
Some examples
• Confirmatory:
1. Econometrician – new estimator, demonstrate performance
2. Psychometrician – new item response function, demonstrate
    performance

• Exploratory:
1. Econometrician – test performance of consistent estimator
    on small sample
2. Epidemiologist – explore the effects of different levels of
    mosquito net usage in a dynamic infection model
3. Educational researcher – explore the best way to
    estimate teacher ability when students are non-randomly
    assigned.
Simulation Stages
All simulations can be broken down into a series of discrete
stages.

1. Specify Model: choose a theoretical paradigm.
2. Assign Parameters: survey the literature, calibrate the model, or draw from real data.
3. Generate Data: most simulations generate a data set every time they run.
4. Calculate/Store Indicators: know what indicators you need and develop methods for generating those indicators.
5. Compute Results: perform summary statistics from the collection of indicators and the parameters that generated them.

Repeat
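
The stages above can be sketched as a small R skeleton. This is only an illustration under assumed values: the linear model, the parameter values, and the function name run_once are hypothetical and do not come from the slides.

```r
# Hypothetical skeleton of the simulation stages (illustrative values only)
set.seed(101)                      # make the run reproducible

# 1. Specify model: y = b0 + b1*x + u   (assumed for illustration)
# 2. Assign parameters
b0 <- 5; b1 <- 2; n <- 200; reps <- 500

run_once <- function() {
  # 3. Generate data
  x <- rnorm(n)
  u <- rnorm(n)
  y <- b0 + b1 * x + u
  # 4. Calculate/store indicators: here, the OLS slope estimate
  unname(coef(lm(y ~ x))["x"])
}

# Repeat stages 3-4, then
# 5. Compute results from the collection of indicators
slopes <- replicate(reps, run_once())
results <- c(mean = mean(slopes), sd = sd(slopes))
print(results)
```

Keeping one repetition inside a function makes the Repeat step a single call to replicate().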
1. Specify Model
• Identify underlying model (theoretical paradigm)
  This is usually obvious from the discipline you are in,
  though it is not uncommon for simulations to be
  interdisciplinary in nature.

• Identify minimum required complexity
  Generally, the simpler the model in which you can
  test/demonstrate your theory, the better. The more
  complex your model, the more room there is for
  uncertainty about what is driving your results.
Choice of Environment
                            Stata or R*
1. Most people will have a previously defined preference.

2. Simple simulations are often easier in Stata because of built-in
commands like “simulate”.

3. Simulations handling multiple agents, multiple data sets, or
complex relationships are often easier in R.

4. Stata is to Accounting like R is to Tetris.

* There are many other programming languages suitable for
simulation studies. These are the two which I know well.
2. Assign Parameters
• Survey the literature for reasonable model parameters.

• Estimate reasonable model parameters from available
  data.

• Generate a reasonable argument for parameter choices
  without theoretical backing.

• Allow some parameters to vary either gradually or
  randomly.
Model Calibration
• Typically there are parameters for which no estimates are
  available.

• Adjust these parameters to calibrate the model so that it
  produces believable and desirable outcomes.

• For instance: In the malaria transmission simulation we varied
  mosquito speed and malaria resistance rates to achieve a desired
  infection rate among the general population of 15-30% at steady
  state.
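
A calibration step of this kind might look like the sketch below. It is a toy example under stated assumptions, not the malaria simulation itself: the one-line infection model, the parameter names beta and gamma, and the function steady_state_rate are all hypothetical. Only the 15-30% target comes from the slide.

```r
# Toy calibration sketch (hypothetical model, not the malaria simulation)
# Share infected evolves as: s' = s + beta*s*(1 - s) - gamma*s
steady_state_rate <- function(beta, gamma = 0.5, rounds = 500, s0 = 0.01) {
  s <- s0
  for (t in seq_len(rounds)) {
    s <- s + beta * s * (1 - s) - gamma * s
  }
  s
}

# Candidate values for the parameter that has no available estimate
betas <- seq(0.5, 1.0, by = 0.05)
rates <- sapply(betas, steady_state_rate)

# Keep the values whose long-run infection rate lies in the 15-30% target
ok_betas <- betas[rates >= 0.15 & rates <= 0.30]
print(data.frame(beta = betas, rate = round(rates, 3)))
print(ok_betas)
```

A grid search is the simplest choice here. With several free parameters, the same idea applies to a grid built with expand.grid().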
3. Generate Data
• Draw from theoretical distributions.


          Distribution     Stata            R
          Normal           rnormal()        rnorm()
          Uniform          runiform()       runif()
          Poisson          rpoisson()       rpois()
          Bernoulli        rbinomial(1,…)   rbinom(…,1,…)


• Resample from available data (bootstrapping, for instance).

• Sort or organize data.
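
The three ways of generating data listed above can be sketched in R as follows. The sample size, seed, and distribution parameters are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed so the draws can be reproduced
n <- 1000

# Draw from theoretical distributions (parameter values are illustrative)
x_norm <- rnorm(n, mean = 0, sd = 1)
x_unif <- runif(n, min = 0, max = 1)
x_pois <- rpois(n, lambda = 3)
x_bern <- rbinom(n, size = 1, prob = 0.3)   # Bernoulli = binomial with size 1

# Resample from available data: one bootstrap sample (with replacement)
boot_sample <- sample(x_norm, size = n, replace = TRUE)

# Sort or organize data
sorted_x <- sort(x_norm)
```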
Random Seed
• Most programs are incapable of generating truly random
  numbers.

• Often, truly random numbers are undesirable.

• If the numbers are truly random, then results cannot be duplicated.

• Setting the seed allows exactly the same ‘random’ draws to be
  generated again. Thus results do not change from run to run.
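
A minimal R sketch of the point above, with an arbitrary seed value: resetting the seed reproduces the same draws, while continuing without a reset produces new ones.

```r
set.seed(123)          # the seed value itself is arbitrary
first <- rnorm(5)

set.seed(123)          # resetting the seed restarts the same sequence
second <- rnorm(5)

same_after_reset <- identical(first, second)   # TRUE

third <- rnorm(5)      # no reset: the sequence continues with new draws
same_without_reset <- identical(first, third)  # FALSE

print(c(same_after_reset, same_without_reset))
```

In Stata the equivalent command is set seed followed by a number.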
Calculate results
• Know what results are needed for confirmation of your theory. For
  example:
  1. Benefit of bednet usage is greater than the cost of bednets
  2. The estimator is unbiased.
  3. Estimates from one estimator are better than those from another.


• Know what results are needed for confirmation that the simulation is
  working properly. For example:
  1. Students should only have one teacher per grade.
  2. The skewness of the explanatory variable should be less than that of
  the dependent variable.
Repeat
• This may seem like a trivial task but it is not. Repetition is essential
  in most simulations. It is generally unconvincing (and often
  uninformative) to run a simulation only once.

• Some people do not believe results of any simulation that is not
  repeated at least 1000 times.

• How one repeats a simulation and how one interprets the results of
  the collective set of repetitions are important questions. For
  example:
 1.   Does one count the number of times that a mosquito net is profitable to
      buy, or report the average return from purchasing mosquito nets?
 2.   Does one present the average of an estimator and its standard deviation,
      or does one present how frequently the true parameter falls within the
      confidence interval of the estimator?
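
The two ways of summarizing an estimator across repetitions can be sketched in R as below. The model, the parameter values, and the function name one_rep are assumptions made for illustration. The 1000 repetitions echo the rule of thumb mentioned above.

```r
set.seed(2024)
b1 <- 2; n <- 100; reps <- 1000     # illustrative values

one_rep <- function() {
  x <- rnorm(n)
  y <- 5 + b1 * x + rnorm(n)
  fit <- lm(y ~ x)
  ci <- confint(fit)["x", ]          # 95% interval by default
  c(estimate = unname(coef(fit)["x"]),
    covered  = unname(ci[1] <= b1 && b1 <= ci[2]))
}

out <- replicate(reps, one_rep())    # matrix with rows "estimate" and "covered"

# Summary 1: average of the estimator and its standard deviation
est_mean <- mean(out["estimate", ])
est_sd   <- sd(out["estimate", ])

# Summary 2: how often the true parameter falls inside the interval
coverage <- mean(out["covered", ])

print(c(mean = est_mean, sd = est_sd, coverage = coverage))
```

The first summary speaks to bias and precision. The second checks whether the reported confidence intervals behave as advertised.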
Necessary Programming Tools
• Macros/scalar manipulation

• Data generating commands

• For/While loops

• The ability to store results after commands
Example Simulation:
Stata: Simulate the result of errors correlated with the explanatory variable.

set more off
* Turn off the -more- pause so output scrolls without stopping (I have it set to permanently off on my computer)

clear
* Clear the old data

set obs 1000
* Tell Stata you want 1000 observations available to be used for data generation.


gen x = rnormal()
* This is some random explanatory variable
Sort x and u
sort x
* Now the data is ordered from the smallest x to the largest x

gen id = _n
* This will count from 1 to 1000 so that each observation has a unique id

gen u = rnormal()
* u is the unobserved error in the model

sort u
* Now the data is ordered from the smallest u to the largest u

gen x2 = .
* We are going to match up the smallest u with the smallest x.
Force the correlation between x draws and the
error to be positive.
* This will loop from 1 to 1000
forv i=1/1000 {
  replace x2 = x[`i'] if id[`i']==_n
}

drop x
rename x2 x

corr x u
/*           |      x        u
-------------+------------------
          x | 1.0000
          u | 0.9980 1.0000 */
Results
gen y = 5 + 2*x + u*5

reg y x



      Source |       SS       df       MS              Number of obs    =     1000
-------------+------------------------------           F( 1,     998)   =        .
       Model | 50827.8493      1 50827.8493            Prob > F         =   0.0000
    Residual | 55.8351723    998 .055947066            R-squared        =   0.9989
-------------+------------------------------           Adj R-squared    =   0.9989
       Total | 50883.6844    999 50.9346191            Root MSE         =   .23653

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   7.145123   .0074963   953.15   0.000     7.130412    7.159833
       _cons |   4.858391   .0074869   648.92   0.000     4.843699    4.873083
------------------------------------------------------------------------------

* When the error is correlated with the explanatory variable, the OLS estimator
* can be severely biased: the true coefficient on x is 2, but the estimate is about 7.1.
Same simulation in R
x = sort(rnorm(1000))
u = sort(rnorm(1000))

y = 5 + 2*x + u*5

summary(lm(y~x))
# This simulation turns out to be extremely easy in R

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.75818     0.01281   371.5   <2e-16 ***
x            6.86977    0.01282   535.8   <2e-16 ***
Multi-agent simulations
• Are simulations in which agents with specified command routines
  interact. Some result of that interaction is subsequently observed
  and stored for analysis.

• An example from my work is a recent project with Andrew Dillon in
  which we simulated an environment populated by both humans and
  mosquitos. The human population stayed constant while the
  mosquito population moved each round. Mosquitos had the
  chance of becoming infected with malaria or infecting humans with
  malaria. Two hundred days (rounds) were simulated per simulation
  and the last thirty were used to calculate the returns from
  technology choice for the group that decided to use prevention
  technology at the beginning of the simulation relative to those who
  decided against prevention technology.
Multi-agent simulations: Error Checking
Especially prone to errors. Develop error routines to check for bugs.

1.   If assigning subjects to groups make sure all of the subjects have
     only one group and all of the groups have equal numbers of
     subjects (if balanced).

2. If generating composite random variables be sure the resulting
   random variables have reasonable ranges (probabilities cannot
   be less than 0 or greater than 1).
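
Both checks above can be written as short assertions. The following R sketch assumes a hypothetical balanced assignment of 120 subjects to 4 groups and a made-up composite variable. stopifnot() halts the run as soon as a check fails.

```r
set.seed(7)
n_subjects <- 120; n_groups <- 4      # illustrative sizes

# Balanced random assignment: each subject appears once
assignment <- data.frame(
  subject = seq_len(n_subjects),
  group   = sample(rep(seq_len(n_groups), length.out = n_subjects))
)

# Check 1: every subject has exactly one group
stopifnot(anyDuplicated(assignment$subject) == 0)
stopifnot(!anyNA(assignment$group))

# Check 2: all groups have equal numbers of subjects
group_sizes <- table(assignment$group)
stopifnot(length(unique(as.vector(group_sizes))) == 1)

# Composite random variable meant to be a probability (logistic transform)
p <- plogis(rnorm(n_subjects) + 0.5 * rnorm(n_subjects))

# Check 3: probabilities must lie between 0 and 1
stopifnot(all(p >= 0 & p <= 1))
```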
Graphical error checks
• Generate graphical figures as a means of checking for errors




The simulation appears to be converging on a steady state.
Statistical Estimators
• Bootstrap (case resampling)
  The bootstrap routine takes advantage of the assumption of
  random sampling. It is often used to estimate the variances of
  random variables such as estimators.

• Markov Chain Monte Carlo (Bayesian Estimation)
  MCMC is a class of algorithms that construct a Markov chain whose
  equilibrium distribution is the desired distribution. In Bayesian
  estimation, the chain's rules combine a specified prior distribution
  with the sample to produce draws from the posterior distribution.
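
A minimal case-resampling bootstrap in R might look like the sketch below. The data are simulated as a stand-in for an observed sample, and the number of bootstrap draws is an arbitrary choice.

```r
set.seed(99)
sample_data <- rnorm(200, mean = 10, sd = 3)   # stand-in for an observed sample

n_boot <- 2000
boot_means <- replicate(n_boot, {
  resample <- sample(sample_data, replace = TRUE)  # case resampling
  mean(resample)
})

boot_se   <- sd(boot_means)                               # bootstrap standard error
theory_se <- sd(sample_data) / sqrt(length(sample_data))  # textbook formula
print(c(bootstrap = boot_se, formula = theory_se))
```

For the sample mean the two numbers should be close. The bootstrap becomes useful for statistics that have no simple variance formula.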
For Additional Reference

• For many more examples of simulations in R and
  Stata go to www.econometricsbysimulation.com

More Related Content

What's hot

Machine learning Mind Map
Machine learning Mind MapMachine learning Mind Map
Machine learning Mind MapAshish Patel
 
Machine Learning Unit 3 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 3 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 3 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 3 Semester 3 MSc IT Part 2 Mumbai UniversityMadhav Mishra
 
Deep learning MindMap
Deep learning MindMapDeep learning MindMap
Deep learning MindMapAshish Patel
 
Feature selection concepts and methods
Feature selection concepts and methodsFeature selection concepts and methods
Feature selection concepts and methodsReza Ramezani
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Derek Kane
 
Machine learning algorithms and business use cases
Machine learning algorithms and business use casesMachine learning algorithms and business use cases
Machine learning algorithms and business use casesSridhar Ratakonda
 
Machine Learning Guide maXbox Starter62
Machine Learning Guide maXbox Starter62Machine Learning Guide maXbox Starter62
Machine Learning Guide maXbox Starter62Max Kleiner
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsMd. Main Uddin Rony
 
Cs854 lecturenotes01
Cs854 lecturenotes01Cs854 lecturenotes01
Cs854 lecturenotes01Mehmet Çelik
 
Predict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an OrganizationPredict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an OrganizationPiyush Srivastava
 
Decision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmDecision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmPalin analytics
 
Machine Learning Unit 4 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 4 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 4 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 4 Semester 3 MSc IT Part 2 Mumbai UniversityMadhav Mishra
 
Data mining Computerassignment 3
Data mining Computerassignment 3Data mining Computerassignment 3
Data mining Computerassignment 3BarryK88
 
Learning machine learning with Yellowbrick
Learning machine learning with YellowbrickLearning machine learning with Yellowbrick
Learning machine learning with YellowbrickRebecca Bilbro
 
Machine learning algorithms
Machine learning algorithmsMachine learning algorithms
Machine learning algorithmsShalitha Suranga
 

What's hot (17)

Machine learning Mind Map
Machine learning Mind MapMachine learning Mind Map
Machine learning Mind Map
 
Machine Learning Unit 3 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 3 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 3 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 3 Semester 3 MSc IT Part 2 Mumbai University
 
Deep learning MindMap
Deep learning MindMapDeep learning MindMap
Deep learning MindMap
 
Feature selection concepts and methods
Feature selection concepts and methodsFeature selection concepts and methods
Feature selection concepts and methods
 
Machine learning
Machine learningMachine learning
Machine learning
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 
Machine learning algorithms and business use cases
Machine learning algorithms and business use casesMachine learning algorithms and business use cases
Machine learning algorithms and business use cases
 
Machine Learning Guide maXbox Starter62
Machine Learning Guide maXbox Starter62Machine Learning Guide maXbox Starter62
Machine Learning Guide maXbox Starter62
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
Cs854 lecturenotes01
Cs854 lecturenotes01Cs854 lecturenotes01
Cs854 lecturenotes01
 
Predict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an OrganizationPredict Backorder on a supply chain data for an Organization
Predict Backorder on a supply chain data for an Organization
 
K fold validation
K fold validationK fold validation
K fold validation
 
Decision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmDecision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning Algorithm
 
Machine Learning Unit 4 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 4 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 4 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 4 Semester 3 MSc IT Part 2 Mumbai University
 
Data mining Computerassignment 3
Data mining Computerassignment 3Data mining Computerassignment 3
Data mining Computerassignment 3
 
Learning machine learning with Yellowbrick
Learning machine learning with YellowbrickLearning machine learning with Yellowbrick
Learning machine learning with Yellowbrick
 
Machine learning algorithms
Machine learning algorithmsMachine learning algorithms
Machine learning algorithms
 

Viewers also liked

Simulation Technology Challenges
Simulation Technology ChallengesSimulation Technology Challenges
Simulation Technology ChallengesCETES
 
Mourão Moura - input2012
Mourão Moura - input2012Mourão Moura - input2012
Mourão Moura - input2012INPUT 2012
 
Introduction to simulation
Introduction to simulationIntroduction to simulation
Introduction to simulationn_cool001
 
Unit 1 introduction
Unit 1 introductionUnit 1 introduction
Unit 1 introductionraksharao
 
Introduction to Simulation- Predictive Analytics
Introduction to Simulation- Predictive AnalyticsIntroduction to Simulation- Predictive Analytics
Introduction to Simulation- Predictive AnalyticsPerformanceG2, Inc.
 
02 20110314-simulation
02 20110314-simulation02 20110314-simulation
02 20110314-simulationSaad Gabr
 
The use of 3D simulation technology to improve health and safety performance ...
The use of 3D simulation technology to improve health and safety performance ...The use of 3D simulation technology to improve health and safety performance ...
The use of 3D simulation technology to improve health and safety performance ...Stephen Au
 
Future Of Simulation In Healthcare Education
Future Of Simulation In Healthcare EducationFuture Of Simulation In Healthcare Education
Future Of Simulation In Healthcare EducationCarolyn Jenkins
 
Esri CityEngine
Esri CityEngineEsri CityEngine
Esri CityEngineEsri
 
Dashboard Business Simulation Deck
Dashboard  Business Simulation DeckDashboard  Business Simulation Deck
Dashboard Business Simulation DeckAPSinc
 
Simulation technology, speed up your iterative process (by Jan Buytaert)
Simulation technology, speed up your iterative process (by Jan Buytaert)Simulation technology, speed up your iterative process (by Jan Buytaert)
Simulation technology, speed up your iterative process (by Jan Buytaert)Verhaert Masters in Innovation
 
Spatial Microsimulation for City Modelling, Social Forecasting and Urban Poli...
Spatial Microsimulation for City Modelling, Social Forecasting and Urban Poli...Spatial Microsimulation for City Modelling, Social Forecasting and Urban Poli...
Spatial Microsimulation for City Modelling, Social Forecasting and Urban Poli...NeISSProject
 
Parallel Simulation of Urban Dynamics on the GPU Ivan Blečić, Arnaldo Cecchi...
Parallel Simulation of Urban Dynamics on the GPU  Ivan Blečić, Arnaldo Cecchi...Parallel Simulation of Urban Dynamics on the GPU  Ivan Blečić, Arnaldo Cecchi...
Parallel Simulation of Urban Dynamics on the GPU Ivan Blečić, Arnaldo Cecchi...Beniamino Murgante
 
Simulation of urban mobility (sumo) prest
Simulation of urban mobility (sumo) prestSimulation of urban mobility (sumo) prest
Simulation of urban mobility (sumo) prestJaskaranpreet Singh
 
Leonardo Marques Monteiro - New Methods in Urban Simulation
Leonardo Marques Monteiro - New Methods in Urban SimulationLeonardo Marques Monteiro - New Methods in Urban Simulation
Leonardo Marques Monteiro - New Methods in Urban Simulationleo4mm
 
Introduction to simulation modeling
Introduction to simulation modelingIntroduction to simulation modeling
Introduction to simulation modelingbhupendra kumar
 
A collaborative environment for urban landscape simulation
A collaborative environment for urban landscape simulationA collaborative environment for urban landscape simulation
A collaborative environment for urban landscape simulationDaniele Gianni
 

Viewers also liked (20)

Within and Between Analysis (WABA).
Within and Between Analysis (WABA).Within and Between Analysis (WABA).
Within and Between Analysis (WABA).
 
Simulation Technology Challenges
Simulation Technology ChallengesSimulation Technology Challenges
Simulation Technology Challenges
 
Mourão Moura - input2012
Mourão Moura - input2012Mourão Moura - input2012
Mourão Moura - input2012
 
Introduction to simulation
Introduction to simulationIntroduction to simulation
Introduction to simulation
 
Unit 1 introduction
Unit 1 introductionUnit 1 introduction
Unit 1 introduction
 
Introduction to Simulation- Predictive Analytics
Introduction to Simulation- Predictive AnalyticsIntroduction to Simulation- Predictive Analytics
Introduction to Simulation- Predictive Analytics
 
02 20110314-simulation
02 20110314-simulation02 20110314-simulation
02 20110314-simulation
 
The use of 3D simulation technology to improve health and safety performance ...
The use of 3D simulation technology to improve health and safety performance ...The use of 3D simulation technology to improve health and safety performance ...
The use of 3D simulation technology to improve health and safety performance ...
 
Future Of Simulation In Healthcare Education
Future Of Simulation In Healthcare EducationFuture Of Simulation In Healthcare Education
Future Of Simulation In Healthcare Education
 
Esri CityEngine
Esri CityEngineEsri CityEngine
Esri CityEngine
 
Smell Simulation...A technology that can smell
Smell Simulation...A technology that can smellSmell Simulation...A technology that can smell
Smell Simulation...A technology that can smell
 
Dashboard Business Simulation Deck
Dashboard  Business Simulation DeckDashboard  Business Simulation Deck
Dashboard Business Simulation Deck
 
Simulation technology, speed up your iterative process (by Jan Buytaert)
Simulation technology, speed up your iterative process (by Jan Buytaert)Simulation technology, speed up your iterative process (by Jan Buytaert)
Simulation technology, speed up your iterative process (by Jan Buytaert)
 
Simulator
SimulatorSimulator
Simulator
 
Spatial Microsimulation for City Modelling, Social Forecasting and Urban Poli...
Spatial Microsimulation for City Modelling, Social Forecasting and Urban Poli...Spatial Microsimulation for City Modelling, Social Forecasting and Urban Poli...
Spatial Microsimulation for City Modelling, Social Forecasting and Urban Poli...
 
Parallel Simulation of Urban Dynamics on the GPU Ivan Blečić, Arnaldo Cecchi...
Parallel Simulation of Urban Dynamics on the GPU  Ivan Blečić, Arnaldo Cecchi...Parallel Simulation of Urban Dynamics on the GPU  Ivan Blečić, Arnaldo Cecchi...
Parallel Simulation of Urban Dynamics on the GPU Ivan Blečić, Arnaldo Cecchi...
 
Simulation of urban mobility (sumo) prest
Simulation of urban mobility (sumo) prestSimulation of urban mobility (sumo) prest
Simulation of urban mobility (sumo) prest
 
Leonardo Marques Monteiro - New Methods in Urban Simulation
Leonardo Marques Monteiro - New Methods in Urban SimulationLeonardo Marques Monteiro - New Methods in Urban Simulation
Leonardo Marques Monteiro - New Methods in Urban Simulation
 
Introduction to simulation modeling
Introduction to simulation modelingIntroduction to simulation modeling
Introduction to simulation modeling
 
A collaborative environment for urban landscape simulation
A collaborative environment for urban landscape simulationA collaborative environment for urban landscape simulation
A collaborative environment for urban landscape simulation
 

Similar to An Introduction to Simulation in the Social Sciences

Azure Machine Learning and ML on Premises
Azure Machine Learning and ML on PremisesAzure Machine Learning and ML on Premises
Azure Machine Learning and ML on PremisesIvo Andreev
 
Application of Machine Learning in Agriculture
Application of Machine  Learning in AgricultureApplication of Machine  Learning in Agriculture
Application of Machine Learning in AgricultureAman Vasisht
 
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Sagar Deogirkar
 
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)Sherri Gunder
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerDatabricks
 
Mine Your Simulation Model: Automated Discovery of Business Process Simulatio...
Mine Your Simulation Model: Automated Discovery of Business Process Simulatio...Mine Your Simulation Model: Automated Discovery of Business Process Simulatio...
Mine Your Simulation Model: Automated Discovery of Business Process Simulatio...Marlon Dumas
 
House Sale Price Prediction
House Sale Price PredictionHouse Sale Price Prediction
House Sale Price Predictionsriram30691
 
AIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONAIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONIRJET Journal
 
Nose Dive into Apache Spark ML
Nose Dive into Apache Spark MLNose Dive into Apache Spark ML
Nose Dive into Apache Spark MLAhmet Bulut
 
Machine_Learning_Trushita
Machine_Learning_TrushitaMachine_Learning_Trushita
Machine_Learning_TrushitaTrushita Redij
 
Two methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersTwo methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersUniversity of Huddersfield
 
Multiple-Linear-Regression-Model-Analysis.pptx
Multiple-Linear-Regression-Model-Analysis.pptxMultiple-Linear-Regression-Model-Analysis.pptx
Multiple-Linear-Regression-Model-Analysis.pptxNaryCasila
 
Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...
Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...
Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...TEJVEER SINGH
 
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...IJCSES Journal
 
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...ijcseit
 
Intro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data ScientistsIntro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data ScientistsParinaz Ameri
 

Similar to An Introduction to Simulation in the Social Sciences (20)

Azure Machine Learning and ML on Premises
Azure Machine Learning and ML on PremisesAzure Machine Learning and ML on Premises
Azure Machine Learning and ML on Premises
 
Application of Machine Learning in Agriculture
Application of Machine  Learning in AgricultureApplication of Machine  Learning in Agriculture
Application of Machine Learning in Agriculture
 
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
 
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles Baker
 
Mine Your Simulation Model: Automated Discovery of Business Process Simulatio...
Mine Your Simulation Model: Automated Discovery of Business Process Simulatio...Mine Your Simulation Model: Automated Discovery of Business Process Simulatio...
Mine Your Simulation Model: Automated Discovery of Business Process Simulatio...
 
House Sale Price Prediction
House Sale Price PredictionHouse Sale Price Prediction
House Sale Price Prediction
 
AIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONAIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTION
 
Descriptive Analytics: Data Reduction
 Descriptive Analytics: Data Reduction Descriptive Analytics: Data Reduction
Descriptive Analytics: Data Reduction
 
Nose Dive into Apache Spark ML
Nose Dive into Apache Spark MLNose Dive into Apache Spark ML
Nose Dive into Apache Spark ML
 
Random Forest Decision Tree.pptx
Random Forest Decision Tree.pptxRandom Forest Decision Tree.pptx
Random Forest Decision Tree.pptx
 
Intro to ml_2021
Intro to ml_2021Intro to ml_2021
Intro to ml_2021
 
Machine_Learning_Trushita
Machine_Learning_TrushitaMachine_Learning_Trushita
Machine_Learning_Trushita
 
Two methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersTwo methods for optimising cognitive model parameters
Two methods for optimising cognitive model parameters
 
Multiple-Linear-Regression-Model-Analysis.pptx
Multiple-Linear-Regression-Model-Analysis.pptxMultiple-Linear-Regression-Model-Analysis.pptx
Multiple-Linear-Regression-Model-Analysis.pptx
 
Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...
Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...
Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...
 
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
 
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
 
Ml ppt at
Ml ppt atMl ppt at
Ml ppt at
 
Intro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data ScientistsIntro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data Scientists
 

An Introduction to Simulation in the Social Sciences

  • 1. AN INTRODUCTION TO SIMULATION DESIGN IN THE SOCIAL SCIENCES By Francis Smart Michigan State University Agricultural, Food, and Resource Economics Measurement and Quantitative Methods www.econometricsbysimulation.com
  • 2. Why Simulate? Simulation is a detailed thought experiment: 1. Confirm theoretical results. 2. Explore the unknown theoretical environments. 3. Statistical method for generating estimates.
  • 3. Why Simulate? 1. Confirmatory results: a. Develop theory b. Design simulation c. Get results d. Sensitivity analysis 2. Exploratory analysis: a. Develop simulation b. Get results c. Develop theory d. Sensitivity analysis 3. Statistical estimators: a. Bootstrap b. Markov Chain Monte Carlo (Bayesian)
  • 4. Some examples • Confirmatory: 1. Econometrician – proposes a new estimator and demonstrates its performance 2. Psychometrician – proposes a new item response function and demonstrates its performance • Exploratory: 1. Econometrician – tests the performance of a consistent estimator on a small sample 2. Epidemiologist – explores the effects of different levels of mosquito net usage in a dynamic infection model 3. Educational researcher – investigates the best way to estimate teacher ability when students are non-randomly assigned.
  • 5. Simulation Stages All simulations can be broken down into a series of discrete stages:
      1. Specify Model: choose a theoretical paradigm.
      2. Assign Parameters: survey the literature, calibrate the model, or draw from real data.
      3. Generate Data: most simulations generate a data set every time they run.
      4. Calculate/Store Indicators: know what indicators you need and develop methods for generating those indicators.
      5. Compute Results: perform summary statistics from the collection of indicators and the parameters that generated them.
      Repeat.
  • 6. 1. Specify Model • Identify the underlying model (theoretical paradigm). This is usually obvious from the discipline you are in, though it is not uncommon for simulations to be interdisciplinary in nature. • Identify the minimum required complexity. Generally, the simpler the model with which you can test or demonstrate your theory, the better. The more complexity in your model, the more places there are for uncertainty about what is driving your results.
  • 7. Choice of Environment Stata or R* 1. Most people will have a previously defined preference. 2. Simple simulations are often easier in Stata because of built-in commands like “simulate”. 3. Simulations handling multiple agents, multiple data sets, or complex relationships are often easier in R. 4. Stata is to Accounting as R is to Tetris. * There are many other programming languages suitable for simulation studies. These are the two that I know well.
  • 8. 2. Assign Parameters • Survey the literature for reasonable model parameters. • Estimate reasonable model parameters from available data. • Generate a reasonable argument for parameter choices without theoretical backing. • Allow some parameters to vary either gradually or randomly.
  • 9. Model Calibration • Typically there are parameters for which no estimates are available. • Modify these parameters so as to calibrate the model, leading to believable and desirable outcomes. • For instance: in the malaria transmission simulation we varied mosquito speed and malaria resistance rates to achieve a desired infection rate among the general population of 15-30% at steady state.
  • 10. 3. Generate Data • Draw from theoretical distributions.
      Distribution   Stata             R
      Normal         rnormal()         rnorm()
      Uniform        runiform()        runif()
      Poisson        rpoisson()        rpois()
      Bernoulli      rbinomial(1,…)    rbinom(…,1,…)
    • Resample from available data. Bootstrapping (for instance)
    • Sort or organize data.
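The three ways of generating data listed on slide 10 can be sketched in R roughly as follows. This is a minimal illustration, not code from the original deck: the sample size, distribution parameters, seed, and variable names are all assumptions chosen for the example.

```r
set.seed(101)          # arbitrary seed, chosen only for this illustration
n <- 1000              # hypothetical sample size

# Draw from theoretical distributions (all parameter values are assumptions)
sim <- data.frame(
  z = rnorm(n, mean = 0, sd = 1),      # Normal
  w = runif(n, min = 0, max = 1),      # Uniform
  k = rpois(n, lambda = 3),            # Poisson
  d = rbinom(n, size = 1, prob = 0.3)  # Bernoulli = binomial with one trial
)

# Resample rows with replacement from the available data (one bootstrap draw)
boot_sample <- sim[sample(n, size = n, replace = TRUE), ]

# Sort the data by one of the variables
sim_sorted <- sim[order(sim$z), ]
```

Note that a Bernoulli draw in R is simply `rbinom()` with `size = 1`.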
  • 11. Random Seed • Most programs are incapable of generating truly random numbers. • Often, truly random numbers are undesirable. • If randomness exists, then results cannot be duplicated. • Setting the seed allows for exactly duplicate ‘random’ variables to be generated. Thus results do not change.
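A short R sketch of what setting the seed buys you, assuming an arbitrary seed value (the number itself carries no meaning). In Stata the corresponding command is `set seed`.

```r
set.seed(12345)        # arbitrary seed value
first_run <- rnorm(5)

set.seed(12345)        # resetting the same seed restarts the same sequence
second_run <- rnorm(5)

third_run <- rnorm(5)  # no reset: the generator simply continues

# The first two runs are exact duplicates; the third is a fresh set of draws
same_after_reset  <- identical(first_run, second_run)
same_without_reset <- identical(first_run, third_run)
```

Resetting the seed before each run reproduces the draws exactly, which is what makes simulation results replicable by others.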
  • 12. Calculate results • Know what results are needed for confirmation of your theory. For example: 1. The benefit of bednet usage is greater than the cost of bednets. 2. The estimator is unbiased. 3. Estimates from one estimator are better than those from another. • Know what results are needed to confirm that the simulation is working properly. For example: 1. Students should only have one teacher per grade. 2. The skewness of the explanatory variable should be less than that of the dependent variable.
  • 13. Repeat • This may seem like a trivial task but it is not. Repetition is essential in most simulations. It is generally unconvincing (and often uninformative) to run a simulation only once. • Some people do not believe the results of any simulation that is not repeated at least 1000 times. • How one repeats a simulation and how one interprets the results of the collective set of repetitions are important questions. For example: 1. Does one count the number of times that a mosquito net is profitable to buy, or report the average return from purchasing mosquito nets? 2. Does one present the average of an estimator and its standard deviation, or does one present how frequently the true parameter falls within the confidence interval of the estimator?
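The second question on slide 13 can be made concrete with a small repeated simulation. The sketch below is an assumption-laden illustration rather than the author's code: it uses a simple linear model with an error drawn independently of x, and the sample size, true slope, seed, and function name `one_rep` are all hypothetical. It reports both summaries so they can be compared.

```r
set.seed(2024)            # arbitrary seed so the run can be reproduced
reps      <- 1000         # number of repetitions
n         <- 100          # hypothetical sample size per repetition
true_beta <- 2            # hypothetical true slope

# One repetition: generate data, estimate, and store two indicators
one_rep <- function() {
  x   <- rnorm(n)
  u   <- rnorm(n)                     # error drawn independently of x
  y   <- 5 + true_beta * x + u
  fit <- lm(y ~ x)
  ci  <- confint(fit)["x", ]          # 95% confidence interval for the slope
  c(estimate = unname(coef(fit)["x"]),
    covered  = unname(ci[1] <= true_beta && true_beta <= ci[2]))
}

results <- replicate(reps, one_rep())  # 2 x reps matrix, one column per repetition

avg_estimate <- mean(results["estimate", ])  # average of the estimator
sd_estimate  <- sd(results["estimate", ])    # its standard deviation across repetitions
coverage     <- mean(results["covered", ])   # share of intervals containing the true slope
```

The first two numbers describe the sampling distribution of the estimator; the third asks whether the reported confidence intervals behave as advertised. Which to present depends on the claim the simulation is meant to support.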
  • 14. Necessary Programming Tools • Macros/scalar manipulation • Data generating commands • For/While loops • The ability to store results after commands
  • 15. Example Simulation: Stata: Simulate the result of errors correlated with the explanatory variable.
      set more off
      * Turn the scroll lock off (I have it set to permanently off on my computer)
      clear
      * Clear the old data
      set obs 1000
      * Tell Stata you want 1000 observations available to be used for data generation.
      gen x = rnormal()
      * This is some random explanatory variable
  • 16. Sort x and u
      sort x
      * Now the data is ordered from the smallest x to the largest x
      gen id = _n
      * This will count from 1 to 1000 so that each observation has a unique id
      gen u = rnormal()
      * u is the unobserved error in the model
      sort u
      * Now the data is ordered from the smallest u to the largest u
      gen x2 = .
      * We are going to match up the smallest u with the smallest x.
  • 17. Force the correlation between x draws and the error to be positive.
      * This will loop from 1 to 1000
      forv i=1/1000 {
        replace x2 = x[`i'] if id[`i']==_n
      }
      drop x
      rename x2 x
      corr x u
      /*
                   |        x        u
      -------------+------------------
                 x |   1.0000
                 u |   0.9980   1.0000
      */
  • 18. Results
      gen y = 5 + 2*x + u*5
      reg y x

            Source |       SS       df       MS              Number of obs =    1000
      -------------+------------------------------           F(  1,   998) =       .
             Model |  50827.8493     1  50827.8493           Prob > F      =  0.0000
          Residual |  55.8351723   998  .055947066           R-squared     =  0.9989
      -------------+------------------------------           Adj R-squared =  0.9989
             Total |  50883.6844   999  50.9346191           Root MSE      =  .23653

      ------------------------------------------------------------------------------
                 y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                 x |   7.145123   .0074963   953.15   0.000     7.130412    7.159833
             _cons |   4.858391   .0074869   648.92   0.000     4.843699    4.873083
      ------------------------------------------------------------------------------

      * This shows that when the error is correlated with the explanatory variable, the OLS estimator can be severely biased: the true slope is 2 but the estimate is about 7.15.
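The size of the estimate on slide 18 can be checked against the usual probability limit of the OLS slope. The derivation below is a sketch added for illustration; it assumes the data-generating process shown on the slide, y = 5 + 2x + 5u, and uses the reported correlation of 0.998 between x and u.

```latex
\hat{\beta}_{OLS} \;\xrightarrow{p}\; \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}
  = \frac{\operatorname{Cov}(x,\; 5 + 2x + 5u)}{\operatorname{Var}(x)}
  = 2 + 5\,\frac{\operatorname{Cov}(x, u)}{\operatorname{Var}(x)}
  = 2 + 5\,\rho_{xu}\,\frac{\sigma_u}{\sigma_x}
```

With x and u both drawn from a standard normal, the ratio of standard deviations is close to 1, so the slope should be near 2 + 5(0.998) ≈ 6.99. The reported 7.145 is in line with this once sampling variation in the two standard deviations is allowed for.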
  • 19. Same simulation in R
      x = sort(rnorm(1000))
      u = sort(rnorm(1000))
      y = 5 + 2*x + u*5
      summary(lm(y~x))
      # This simulation turns out to be extremely easy in R

      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)  4.75818    0.01281   371.5   <2e-16 ***
      x            6.86977    0.01282   535.8   <2e-16 ***
  • 20. Multi-agent simulations • Are simulations in which agents with specified command routines interact. Some result of that interaction is subsequently observed and stored for analysis. • An example from my work is a recent project with Andrew Dillon in which we simulated an environment populated by both humans and mosquitos. The human population stayed constant while the mosquito population moved each round. Mosquitos had the chance of becoming infected with malaria or infecting humans with malaria. Two hundred days (rounds) were simulated per simulation and the last thirty were used to calculate the returns from technology choice for the group that decided to use prevention technology at the beginning of the simulation relative to those who decided against prevention technology.
  • 21. Multi-agent simulations: Error Checking Especially prone to errors. Develop error routines to check for bugs. 1. If assigning subjects to groups make sure all of the subjects have only one group and all of the groups have equal numbers of subjects (if balanced). 2. If generating composite random variables be sure the resulting random variables have reasonable ranges (probabilities cannot be less than 0 or greater than 1).
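Both checks on slide 21 can be written as assertions that stop the simulation as soon as something is wrong. The following is a minimal R sketch under assumed values: the number of subjects and groups, the seed, and the way the composite probability is built are all hypothetical.

```r
set.seed(7)                         # arbitrary seed
n_subjects <- 120                   # hypothetical number of subjects
n_groups   <- 4                     # hypothetical number of groups

# Balanced random assignment: shuffle a vector that holds each group label equally often
subjects <- data.frame(
  id    = seq_len(n_subjects),
  group = sample(rep(seq_len(n_groups), length.out = n_subjects))
)

# Check 1: every subject appears exactly once, so no one has two groups
stopifnot(!anyDuplicated(subjects$id))
# Check 2: all groups have the same number of subjects (balanced design)
stopifnot(length(unique(table(subjects$group))) == 1)

# A composite random variable meant to be used as a probability
p_raw <- 0.5 + 0.2 * rnorm(n_subjects)   # can fall outside [0, 1]
p     <- pmin(pmax(p_raw, 0), 1)         # one possible fix: truncate to [0, 1]

# Check 3: probabilities cannot be less than 0 or greater than 1
stopifnot(all(p >= 0 & p <= 1))
```

Running such checks inside every repetition costs little and catches bugs that would otherwise only show up as strange final results.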
  • 22. Graphical error checks • Generate graphical figures as a means of checking for errors. [Figure: the simulation appears to be converging on a steady state.]
  • 23. Statistical Estimators • Bootstrap (case resampling) The bootstrap routine takes advantage of the assumption of random sampling: resampling the observed data with replacement mimics drawing new samples from the population. It is often used to estimate the variances of estimators. • Markov Chain Monte Carlo (Bayesian Estimation) MCMC is a class of algorithms that construct a Markov chain whose equilibrium distribution is the desired distribution. In Bayesian estimation, MCMC uses transition rules to move from a specified prior distribution to a posterior distribution that reflects the sample data.
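Case resampling can be sketched in a few lines of R. The example below is an illustration only: the "observed" sample is itself simulated from an exponential distribution, and the choice of the median as the statistic, the number of resamples, and the seed are assumptions.

```r
set.seed(99)                        # arbitrary seed
sample_data <- rexp(200, rate = 1)  # hypothetical observed sample
B <- 2000                           # number of bootstrap resamples

# Resample the cases with replacement and recompute the statistic each time
boot_medians <- replicate(B, {
  resample <- sample(sample_data, size = length(sample_data), replace = TRUE)
  median(resample)
})

boot_se <- sd(boot_medians)                         # bootstrap standard error of the median
boot_ci <- quantile(boot_medians, c(0.025, 0.975))  # percentile interval
```

The median is a convenient example because its sampling variance has no simple textbook formula, which is the situation where the bootstrap is most useful.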
  • 24. For Additional Reference • For many more examples of simulations in R and Stata go to www.econometricsbysimulation.com