BU_FCAI_SCC430_Modeling&Simulation_Ch06.pptx

SCC430
Modeling and Simulation
Chapter 06
Input Modeling
Dr. Ahmed Hagag
Faculty of Computers and Artificial Intelligence
Benha University
Fall 2020

• Introduction.
• Data Collection.
• Useful Probability Distributions.
• Identifying the Distribution with Data.
• Distribution-Fitting Software (e.g., ExpertFit ®).
©AhmedHagag SCC430 Modeling and Simulation 2
Chapter 6: Input Modeling

Recall
Introduction (1/6)
Queueing Sim.
Inventory Sim.

Introduction (2/6)
Input Models (Distribution)
If we know the desired distribution, we can build a
random number generator to generate input samples.

Introduction (3/6)
Value   Probability
1       0.5
2       0.3
3       0.1
4       0.1
Generating Random Variates
(from a Distribution Function and PRNs)
Approaches:
1. Inverse Transform
2. Composition
3. Convolution
4. Acceptance-Rejection
5. Ratio of Uniforms
6. {Others …}

Introduction (3/6)
Inverse Transform
• Select Distribution: from the data, select a distribution 𝑋̂ with cumulative distribution 𝐹(𝑥).
• Inputs: the generated PRNs 𝑈 and the inverse distribution 𝐹⁻¹(𝑢).
• Output: the random variates 𝑋.
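A minimal Python sketch of the inverse-transform approach, applied to the four-value distribution tabulated in Introduction (3/6). The function names, the seed, and the sample size are illustrative assumptions, not part of the slides.

```python
import random
from bisect import bisect_left
from itertools import accumulate

# Distribution from the slide: value -> probability.
VALUES = [1, 2, 3, 4]
PROBS = [0.5, 0.3, 0.1, 0.1]

# Cumulative distribution F(x) at each value (approximately 0.5, 0.8, 0.9, 1.0).
CDF = list(accumulate(PROBS))


def inverse_transform(u):
    """Return the smallest value x with F(x) >= u, for a PRN u in [0, 1)."""
    index = bisect_left(CDF, u)
    # Guard against floating-point round-off in the last cumulative sum.
    return VALUES[min(index, len(VALUES) - 1)]


def generate(n, seed=12345):
    """Turn n pseudo-random numbers (PRNs) into n random variates."""
    rng = random.Random(seed)  # the seed is an arbitrary choice
    return [inverse_transform(rng.random()) for _ in range(n)]


sample = generate(10_000)
for value in VALUES:
    # The relative frequencies should be close to the tabulated probabilities.
    print(value, sample.count(value) / len(sample))
```

Each PRN selects the first value whose cumulative probability reaches it, so values with larger probabilities cover a wider slice of [0, 1).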

Introduction (4/6)
Inputs are the independent variables in the system:
• The customer inter-arrival time in a single service
facility is an input. It is a continuous random variable
since the time could be any value between two
limits.
• The demand size for an inventory system is an
input. It is a discrete random variable since the
demand size can take only specific values.

Introduction (5/6)
• Input models provide the driving force for a
simulation model.
• Input modeling is one of the biggest tasks in solving a real problem.
• It is an important factor in controlling simulation quality.

Introduction (6/6)
[Flowchart: steps of input modeling]
1. Collect data from the real system of interest.
2. Identify a probability distribution to
represent the input process.
3. Choose parameters that determine a specific
instance of the distribution family.
4. Evaluate the chosen distribution and the associated
parameters for goodness of fit.
• When data are not available, expert opinion and
knowledge of the process must be used.

Data Collection (1/5)
• If a simulation product does not have good statistical-analysis
features, then it is impossible to obtain correct results from a
simulation study:
1. The software must have a good random-number
generator.
2. Each source of randomness in the system of interest
should be represented in the simulation model by a
probability distribution.
3. The software must support independent runs of the
simulation model (e.g., each run uses a separate set of
different random numbers) and output data analysis.

Probability Distribution:
• If it is possible to find a standard theoretical
distribution (e.g., Uniform, Poisson, Exponential,
Normal) that is a good model for a particular source
of randomness, then this distribution should be used
in the model.
• If a theoretical distribution cannot be found that is a
good representation for a source of randomness, then
an empirical (or user-defined) distribution based on
the data should be used.

Which is Better?
• It is preferable to use a standard theoretical
distribution rather than an empirical (or user-defined)
distribution.

• Even when the model structure is valid, simulation results
can be misleading if the input data are:
  ◦ Inaccurately collected.
  ◦ Inappropriately analyzed.
  ◦ Not representative of the environment.

• If we model an existing system, we collect data from
the real system (the inter-arrival time or the demand size,
for example). Then we use the collected data to feed the
model.
• If the data are not available, expert-guided guessing is
used to choose a suitable theoretical distribution, and
then a random number generator is built to generate
the input.

Useful Prob. Dist. (1/10)
Continuous Distributions:
• Uniform.
• Exponential.
• Gamma.
• Normal (Gaussian).

Continuous Distributions (Uniform) (1/2):
• 𝑎 and 𝑏 real numbers with 𝑎 < 𝑏.
𝑼(𝒂, 𝒃)

Continuous Distributions (Uniform) (2/2):
𝑼(𝒂, 𝒃)
PRN
Generating Random Variates 𝑋’s
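One common way to turn a PRN into a U(a, b) variate is the inverse transform X = a + (b − a)U, which follows from the uniform CDF F(x) = (x − a)/(b − a). The sketch below assumes that approach; the function name, the seed, and the choice a = 2, b = 5 are illustrative only.

```python
import random


def uniform_variate(a, b, u):
    """Map a PRN u in [0, 1) to a U(a, b) variate: X = a + (b - a) * u."""
    if not a < b:
        raise ValueError("U(a, b) requires a < b")
    return a + (b - a) * u


rng = random.Random(2020)  # the seed is an arbitrary choice
sample = [uniform_variate(2.0, 5.0, rng.random()) for _ in range(5)]
print(sample)  # five values, each in [2, 5)
```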

Continuous Distributions (Exponential) (1/2):
• Scale parameter 𝛽 > 0.
expo(𝜷)
PRN
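For the exponential distribution with scale 𝛽, the CDF F(x) = 1 − exp(−x/𝛽) can be inverted in closed form, giving X = −𝛽 ln(1 − U). A minimal sketch under that assumption follows; the function name, the seed, and 𝛽 = 2 are illustrative choices.

```python
import math
import random


def expo_variate(beta, u):
    """Inverse transform for expo(beta), where F(x) = 1 - exp(-x / beta)."""
    if beta <= 0:
        raise ValueError("scale parameter beta must be positive")
    # Solving u = F(x) for x gives x = -beta * ln(1 - u); u < 1 keeps the log finite.
    return -beta * math.log(1.0 - u)


rng = random.Random(430)  # the seed is an arbitrary choice
sample = [expo_variate(2.0, rng.random()) for _ in range(20_000)]
print(sum(sample) / len(sample))  # the sample mean should be near beta = 2.0
```

Because 1 − U is itself uniform on (0, 1], using ln(U) instead of ln(1 − U) gives the same distribution, as long as U = 0 is excluded.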

Continuous Distributions (Exponential) (2/2):
expo(𝜷)
[Figure: expo(𝜷) curves for 𝜷 = 3, 𝜷 = 2, and 𝜷 = 1]

Continuous Distributions (Gamma):
• Shape parameter 𝛼 > 0, Scale parameter 𝛽 > 0.
gamma(𝜶, 𝜷)

Continuous Distributions (Gamma) (1/2):
• Shape parameter 𝛼 > 0, Scale parameter 𝛽 > 0.
gamma(𝜶, 𝜷)

Continuous Distributions (Normal):
• Location parameter 𝜇 ∈ ℝ, Scale parameter 𝜎 > 0.
N(𝝁, 𝝈²)

Discrete Distributions:
• Discrete Uniform.
• Geometric.
• Poisson.

Discrete Distributions (Discrete Uniform) (1/2):
• 𝑖 and 𝑗 integers with 𝑖 ≤ 𝑗.
𝑫𝑼(𝒊,𝒋)

Discrete Distributions (Discrete Uniform) (2/2):
• 𝑖 and 𝑗 integers with 𝑖 ≤ 𝑗.
𝑫𝑼(𝒊,𝒋)

Discrete Distributions (Geometric) (1/3):
• 𝑝 ∈ (0, 1).
geom(𝒑)

Discrete Distributions (Geometric) (2/3):
• 𝑝 ∈ (0, 1).
geom(𝒑)

Discrete Distributions (Geometric) (3/3):
• 𝑝 ∈ (0, 1).
geom(𝒑)
PRN
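A sketch of generating geom(p) variates from a PRN by the inverse transform. It assumes the convention in which X counts the failures before the first success, so X takes the values 0, 1, 2, … with P(X = x) = p(1 − p)^x; if the slides use the "number of trials" convention instead, add 1 to the result. The function name, the seed, and p = 0.25 are illustrative.

```python
import math
import random


def geom_variate(p, u):
    """Inverse transform for geom(p) on 0, 1, 2, ... (failures before first success)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    # F(x) = 1 - (1 - p)**(x + 1), so X = floor(ln(1 - u) / ln(1 - p)).
    return math.floor(math.log(1.0 - u) / math.log(1.0 - p))


rng = random.Random(6)  # the seed is an arbitrary choice
sample = [geom_variate(0.25, rng.random()) for _ in range(20_000)]
print(sum(sample) / len(sample))  # should be near (1 - p) / p = 3
```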

Discrete Distributions (Poisson) (1/3):
• 𝜆 > 0.
Poisson(𝝀)

• 𝜆 > 0.
Poisson(𝝀)

Choosing a standard theoretical distribution:
1. Select the type (family) of the distribution.
2. Estimate the parameters of the selected family using
the collected data.
3. Test the selected distribution.
  ◦ Heuristic (Graphical Comparison) test.
  ◦ Goodness-of-Fit test (Anderson-Darling (AD),
Kolmogorov-Smirnov (K-S), and Chi-square tests).
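The three steps can be sketched for one concrete case: selecting the exponential family, estimating its scale by the sample mean (the maximum-likelihood estimate), and computing the K-S statistic. The data below are hypothetical, and the helper names are assumptions made for illustration. The sketch stops at the statistic; comparing it with a critical value is left out because the usual K-S tables need adjusting when the parameter is estimated from the same data.

```python
import math


def fit_exponential(data):
    """Maximum-likelihood estimate of the expo(beta) scale: the sample mean."""
    return sum(data) / len(data)


def ks_statistic(data, cdf):
    """K-S statistic: the largest gap between the empirical and the fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


# Hypothetical interarrival times in minutes (not data from the slides).
times = [0.05, 0.12, 0.20, 0.31, 0.45, 0.60, 0.82, 1.10, 1.50, 2.20]

beta_hat = fit_exponential(times)  # step 2: estimate the parameter


def fitted_cdf(x):
    """CDF of the fitted member of the exponential family (step 1)."""
    return 1.0 - math.exp(-x / beta_hat)


d_n = ks_statistic(times, fitted_cdf)  # step 3: goodness-of-fit statistic
print(beta_hat, d_n)
```

A small statistic means the fitted CDF stays close to the empirical one; how small is small enough depends on the sample size and the chosen significance level.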

Ident. the Dist. with Data (1/9)
Choose the suitable distribution using:
1. Summary Statistics.
2. Histogram.
3. Quantile-Quantile (𝑞-𝑞) Plot.
4. Boxplot.
5. {Others …}

Summary Statistics (1/3):
Sample variance:
$\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

For example:
• If the median is equal or nearly equal to the mean, this
suggests a symmetric distribution (e.g., normal).
• If the coefficient of variation (cv) is close to 1, this
suggests an exponential distribution, because its cv is 1.
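The checks above can be computed directly. The sketch below uses Python's standard `statistics` module on a hypothetical sample; the dictionary keys and the skewness estimator (third central moment divided by the 3/2 power of the sample variance) are assumptions chosen for illustration.

```python
import math
import statistics


def summary_statistics(data):
    """Return the mean, median, sample variance, cv, and skewness of the data."""
    n = len(data)
    mean = statistics.fmean(data)
    variance = statistics.variance(data)  # divides by n - 1
    third_moment = sum((x - mean) ** 3 for x in data) / n
    return {
        "mean": mean,
        "median": statistics.median(data),
        "variance": variance,
        "cv": math.sqrt(variance) / mean,
        "skewness": third_moment / variance ** 1.5,
    }


# Hypothetical right-skewed sample (not data from the slides).
times = [0.05, 0.12, 0.20, 0.31, 0.45, 0.60, 0.82, 1.10, 1.50, 2.20]
stats = summary_statistics(times)
for name, value in stats.items():
    print(name, round(value, 3))
```

A mean well above the median together with a positive skewness points to a right-skewed family, and a cv near 1 is consistent with the exponential distribution.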

Example1 – Summary Statistics (1/4)
A simulation model was developed for a drive-up banking
facility, and data were collected on the arrival pattern for cars.
Over a fixed 90-minute interval, 220 cars arrived, and we noted
the (continuous) interarrival time 𝑋𝑖 (in minutes) between cars 𝑖
and 𝑖 + 1, for 𝑖 = 1, 2,… , 219.

Since the sample mean exceeds the sample median,
$\bar{X}(219) = 0.399 > 0.270 = \hat{x}_{0.5}(219)$,
and the sample skewness is $\hat{\nu}(219) = 1.478$, this suggests that
the underlying distribution is skewed to
the right, rather than symmetric.
Furthermore, $\widehat{cv}(219) = 0.953$, which
is close to the theoretical value of 1 for the
exponential distribution.

Given the values and counts for 𝑛 = 156 observations on the
(discrete) number of items demanded in a week from an
inventory system, arranged into increasing order. Rather than
giving all the individual values, we give the frequency counts;
59 𝑋𝑖’s were equal to 0, 26 𝑋𝑖’s were equal to 1, etc.

Histogram (1/3)
To make a histogram, we break up the range of values covered
by the data into 𝑘 disjoint adjacent intervals [𝑏0, 𝑏1), [𝑏1, 𝑏2), . . . , [𝑏𝑘−1, 𝑏𝑘).
All the intervals should be the same width ∆𝑏, which might
necessitate throwing out a few extremely large or small 𝑋𝑖’s to
avoid getting an unwieldy-looking histogram plot.
$\Delta b = \frac{\max X - \min X}{k}, \qquad k = \frac{\max X - \min X}{\Delta b}$

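Histogram (2/3)
For 𝑗 = 1, 2, . . . , 𝑘, let ℎ𝑗 be the proportion of the 𝑋𝑖’s that are in
the 𝑗th interval [𝑏𝑗−1, 𝑏𝑗), i.e.,
$h_1 = \frac{\text{number of } X_i\text{'s in } [b_0, b_1)}{\text{total number of } X_i\text{'s}}$.
Finally, we define the function
$h(x) = \begin{cases} 0 & \text{if } x < b_0 \\ h_j & \text{if } b_{j-1} \le x < b_j,\ j = 1, 2, \ldots, k \\ 0 & \text{if } b_k \le x \end{cases}$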

Histogram (3/3)
The number of intervals 𝑘 may be chosen according to the
following formula (Sturges's rule):
$k = \lfloor 1 + \log_2 n \rfloor$
However, in general, we do not believe that such rules are very
useful. We recommend trying several different values of ∆𝑏 and
choosing the smallest one that gives a “smooth” histogram.
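A short sketch of both ingredients: the interval count from Sturges's rule, k = ⌊1 + log₂ n⌋ (the rule that gives k = 8 for n = 219), and the proportions ℎ𝑗 for a chosen width ∆𝑏. The function names and the four-point data set are hypothetical.

```python
import math


def sturges_k(n):
    """Number of intervals from Sturges's rule: floor(1 + log2(n))."""
    return math.floor(1 + math.log2(n))


def histogram_proportions(data, b0, width, k):
    """Return h_1..h_k, the proportion of observations in each [b_(j-1), b_j)."""
    counts = [0] * k
    for x in data:
        j = math.floor((x - b0) / width)
        if 0 <= j < k:  # observations outside [b0, bk) are left out
            counts[j] += 1
    return [count / len(data) for count in counts]


print(sturges_k(219))
# Hypothetical observations, binned into 4 intervals of width 0.25 from 0.
print(histogram_proportions([0.1, 0.3, 0.3, 0.9], b0=0.0, width=0.25, k=4))
```

Re-running `histogram_proportions` with several widths is one way to follow the recommendation of trying different values of ∆𝑏 before settling on one.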

Example1 – Histogram (1/8)
The number of intervals 𝑘 from the formula:
$k = \lfloor 1 + \log_2 219 \rfloor = 8$
$\max X = 1.96 \approx 2, \quad \min X = 0.01 \approx 0, \quad \Delta b = \frac{2 - 0}{8} = 0.25$
Intervals:
[0, 0.25), [0.25, 0.5), [0.5, 0.75), [0.75, 1),
[1, 1.25), [1.25, 1.5), [1.5, 1.75), [1.75, 2)

For ∆𝑏 = 0.25: $k = \frac{2 - 0}{0.25} = 8$

For ∆𝑏 = 0.05: $k = \frac{2 - 0}{0.05} = 40$

For ∆𝑏 = 0.075: $k = \frac{2 - 0}{0.075} \approx 27$

For ∆𝑏 = 0.1: $k = \frac{2 - 0}{0.1} = 20$

Heuristic (Graphical comparison) test
The smoothest-looking histogram appears
to be for ∆𝑏 = 0.1 and its shape resembles
that of an exponential density.

Quantile-Quantile (𝒒-𝒒) plot (1/8)
When there is a small number of data points, say, 30 or fewer, a
histogram is not as useful for evaluating the fit of the chosen
distribution. Further, our perception of the fit depends on the
widths of the histogram intervals. But, even if the intervals are
well chosen, grouping data into cells makes it difficult to
compare a histogram to a continuous probability density
function. A quantile-quantile (𝑞-𝑞) plot is a useful tool for
evaluating distribution fit, one that does not suffer from these
problems.

A quantile-quantile (𝑞-𝑞) plot is a probability plot, which is a
graphical method for comparing two probability distributions
by plotting their quantiles against each other.

We plot (x, y), where the x’s are the observed values (observed data
quantiles) and the y’s are the theoretical values (theoretical
quantiles). If we select an appropriate family of theoretical
distributions, the points (x, y) will lie approximately on a
straight line. On the other hand, if the assumed distribution is
inappropriate, the points will deviate from a straight line.
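A sketch of building the (x, y) pairs for a normal 𝑞-𝑞 plot. It assumes the common plotting position (j − 0.5)/n for the j-th smallest observation and fits the normal distribution by the sample mean and standard deviation; the data are hypothetical installation times in seconds, not the values from the slides.

```python
import statistics


def qq_points(data):
    """Pair each ordered observation with the matching fitted-normal quantile."""
    xs = sorted(data)
    n = len(xs)
    fitted = statistics.NormalDist(statistics.fmean(xs), statistics.stdev(xs))
    # The j-th smallest observation is paired with the (j - 0.5) / n quantile.
    return [(x, fitted.inv_cdf((j - 0.5) / n)) for j, x in enumerate(xs, start=1)]


# Hypothetical installation times in seconds.
times = [99.6, 100.4, 100.0, 99.8, 100.2]
for observed, theoretical in qq_points(times):
    print(observed, round(theoretical, 3))
```

Plotting these pairs and checking whether they fall near a straight line is the visual test described above.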

Example: A robot is used to install the doors on automobiles
along an assembly line. It was thought that the installation times
followed a normal distribution. The robot is capable of
measuring installation times accurately. A sample of 20
installation times was automatically taken by the robot, with the
following results, where the values are in seconds:

The observations are now ordered from smallest to largest as follows:

20 values from the normal distribution with
mean 99.99 and variance (0.2832)²

Boxplot (1/4)
A boxplot is a way of graphically depicting groups of numerical data
through their five-number summaries:
1) Smallest observation.
2) 1st Quartile = Lower Quartile = 𝑄1.
3) 2nd Quartile = Median Quartile = 𝑄2.
4) 3rd Quartile = Upper Quartile = 𝑄3.
5) Largest observation.

Boxplot (2/4)
Interquartile Range (𝐼𝑄𝑅) = 𝑄3 − 𝑄1
Smallest Non-Outliers = 𝑄1 – (1.5 × 𝐼𝑄𝑅)
Largest Non-Outliers = 𝑄3 + (1.5 × 𝐼𝑄𝑅)
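The five-number summary and the outlier limits can be sketched as below. The sketch assumes the quartile convention in which 𝑄1 and 𝑄3 are the medians of the lower and upper halves of the sorted data, with the median itself excluded when 𝑛 is odd; other conventions give slightly different quartiles. Function names are illustrative, and the data are the 14 numbers used in the boxplot example of these slides.

```python
import statistics


def five_number_summary(data):
    """Return (smallest, Q1, Q2, Q3, largest) using medians of the two halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]          # values below the median position
    upper = xs[half + n % 2:]  # values above it (median skipped when n is odd)
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


def outlier_limits(q1, q3):
    """Return (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(data):
    """Return the observations that fall outside the outlier limits."""
    _, q1, _, q3, _ = five_number_summary(data)
    low, high = outlier_limits(q1, q3)
    return [x for x in data if x < low or x > high]


data = [1, 30, 6, 7.2, 4, 8, 9, 10, 6.8, 8.3, 2, 2, 10, 1]
print(five_number_summary(data))
print(outliers(data))
```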

Boxplot (3/4)

Boxplot (4/4)

Example1 – Boxplot (1/5)
Create a boxplot for the following data set of 14 numbers:
1, 30, 6, 7.2, 4, 8, 9, 10, 6.8, 8.3, 2, 2, 10, 1

Sort: 1, 1, 2, 2, 4, 6, 6.8, 7.2, 8, 8.3, 9, 10, 10, 30
1) Smallest observation = 1.
2) 1st Quartile = Lower Quartile = 𝑄1 = 2.
3) 2nd Quartile = Median Quartile = 𝑄2 = (6.8 + 7.2) / 2 = 7.
4) 3rd Quartile = Upper Quartile = 𝑄3 = 9.
5) Largest observation = 30.

𝑄1 = 2, 𝑄2 = 7, 𝑄3 = 9
Interquartile Range (𝐼𝑄𝑅) = 𝑄3 − 𝑄1 = 7
Smallest Non-Outliers = 𝑄1 – 1.5 × 𝐼𝑄𝑅 = −8.5
Largest Non-Outliers = 𝑄3 + 1.5 × 𝐼𝑄𝑅 = 19.5
Outlier: 30 (greater than 19.5).

Create a boxplot for the following data set of 11 numbers:
150, 130, 160, 170, 140, 80, 190, 100, 160, 160, 130

Sort: 80, 100, 130, 130, 140, 150, 160, 160, 160, 170, 190
1) Smallest observation = 80.
2) 1st Quartile = Lower Quartile = 𝑄1 = 130.
3) 2nd Quartile = Median Quartile = 𝑄2 = 150.
4) 3rd Quartile = Upper Quartile = 𝑄3 = 160.
5) Largest observation = 190.

𝑄1 = 130, 𝑄2 = 150, 𝑄3 = 160
Interquartile Range (𝐼𝑄𝑅) = 𝑄3 − 𝑄1 = 30
Smallest Non-Outliers = 𝑄1 – 1.5 × 𝐼𝑄𝑅 = 85
Largest Non-Outliers = 𝑄3 + 1.5 × 𝐼𝑄𝑅 = 205
Outlier: 80 (less than 85).

Software (1/22)
http://www.averill-law.com/distribution-fitting/

[Software (2/22) to (22/22): no text recovered from these slides]

Dr. AhmedHagag
ahagag@fci.bu.edu.eg
