Chap_05_Data_Collection_and_Analysis.ppt

Chapter 5
Data Collection and Analysis
“You can observe a lot just by watching.”
– Yogi Berra
McGraw-Hill/Irwin Copyright © 2012 by The McGraw-Hill Companies, Inc. All rights reserved.

5-2
Questions to be answered:
 What types of data should be gathered?
 How should data be gathered?
 What statistical background do I need?
 How should data be analyzed?
 How do you get data in the right form for
use in simulation?
 How should data be documented?

5-3
Objective of Data Collection and
Analysis
The goal of data collection and analysis is to
come up with descriptive information and
statistics that define the behavior of the
system.

5-4
What Types of Data Should Be
Gathered?
 Structural
 Operational
 Numerical

5-5
Structural Data
 Resources used
 Workstations and buffers
 Entity types
 Paths and conveyors

5-6
Operational Data
 Routings
 Arrivals
 Work Schedules (shifts)
 Decision logic

5-7
Numerical Data
 Resource quantities
 Buffer sizes
 Operation times
 Move times
 Interarrival times
 Batch sizes
 Time between failures
(Requires some statistical analysis)

5-8
How Should Data Be Gathered?
1. Determine data requirements.
2. Identify reliable data sources.
3. Collect the data.
4. Summarize the data.
5. Document and approve data.

5-9
Suggestions for Data Gathering
 Define the problem/objective for the simulation
(maximize throughput, minimize cycle time, etc.)
 Identify only factors that bear on the problem
(operation times, resource scheduling, etc.)
 Focus on input variables, not response variables (flow
times, throughput rates, utilizations, etc.)
 Separate delay times from delay conditions (e.g. getting
a resource vs. activity time).
 Look for common groupings (e.g., part families)
 Focus on essence (i.e. time, conditions), not substance
(i.e. how the activity is performed).
 Look for triggering events (What triggers entity
movement? What triggers a machine setup?).

5-10
Look for the Constraints
 A system constraint is anything that keeps
everything from happening all at once.
 Constraints include time delays,
conditional delays due to unavailable
resources, parts, etc.

5-11
Use a Questionnaire
 Organizes and simplifies the data
gathering process.
 Helps ensure all issues are addressed.
 Can send it to process owners in
advance and leave a copy afterwards

5-12
Sources of Data
 Historical records (production, sales,
scrap rates, equipment reliability)
 System documentation (process plans,
facility layouts, work procedures)
 Personal Observation (facility walk-
through, time studies, work sampling)
 Interviews (operators, maintenance
personnel, engineers, managers)

5-13
Sources of Data (cont.)
 Comparative systems (same or similar
industries)
 Vendor claims (cycle times, equipment
reliability)
 Design estimates (process times, move
times, etc. for a new system)
 Literature (published research on
learning curves, predetermined time
studies, etc.)

5-14
What to Avoid
 Opinions when actual times can be
determined.
 Taking only one or two sample times.
 Taking samples from only one day or only
one operator then applying the results to
many operators.

5-16
Systematic Data Collection
1. Define overall flow logic.
2. Describe each process step.
3. Give specific values.

5-17
Defining the Process Flow
[Diagram: process flow for Product A through Stations 1 to 5]

5-18
Description of Operation
Location            Activity Time   Activity Resource   Next Location       Move Trigger             Move Time   Move Resource
Check-in Counter    N(1,.2) min.    Secretary           Waiting Room        None                     .2 min.     None
Waiting Room        None            None                Exam Room           When room is available   .8 min.*    Nurse
Exam Room           N(15,4) min.    Doctor              Check-out Counter   None                     .2 min.     None
Check-out Counter   N(3,.5) min.    Secretary           Exit                None                     .2 min.     None
[Layout diagram: Patient, Check-in Counter, Waiting Room, Exam Room (3), Check-out Counter]

5-20
Random System Variables
Defining characteristics of a system that vary in value from
one observation to the next (e.g., cycle time, boxes per
pallet).
 Qualitative variable -- A non-numerically valued variable.
 Quantitative variable -- A numerically valued variable.
 Discrete variable -- A variable whose possible values form a
set of distinct, separate values (categories for
qualitative data, whole numbers for quantitative data).
 Continuous variable -- A quantitative variable whose
possible values can vary infinitely within a range.

5-21
Characterizing Random
Variables
You don’t need to be a professional statistician;
all you need is a basic knowledge of:
 descriptive statistics (describe the data)
 data analysis (looks for correlations in the
data)
 distribution fitting (determines the
appropriate probability distribution to
represent the data)

5-22
Discrete vs. Continuous Variables
 Continuous – The variable can take on
any value within a range (e.g., Height,
Weight, Time)
 Discrete – The variable can only take
select values within a range (e.g., Gender,
Patient Class, Part Type, Counts)

5-23
Data for a Variable
 A random system variable is defined by
gathering sample data on the variable.
 For random variables, the more data that
are gathered (i.e., the larger the sample
size), the more accurate the
characterization is of the variable.

5-24
Data Groupings
 Class – Category or range of values for
grouping data.
 Frequency -- The number of observations that
fall in a class.
 Frequency distribution -- A listing of all
classes along with their frequencies.
 Relative frequency -- The ratio of the
frequency of a class to the total number of
observations.
 Relative-frequency distribution -- A listing
of all classes along with their relative
frequencies.
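The definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_distribution`, its parameters, and the ten sample ages are hypothetical and do not come from the slides.

```python
from collections import Counter

def frequency_distribution(data, lower, width, n_classes):
    """Return (class label, frequency, relative frequency) for equal-width classes."""
    counts = Counter()
    for x in data:
        k = int((x - lower) // width)   # index of the class that contains x
        if 0 <= k < n_classes:
            counts[k] += 1
    total = sum(counts.values())
    rows = []
    for k in range(n_classes):
        lo = lower + k * width
        label = f"{lo} to < {lo + width}"
        rows.append((label, counts[k], counts[k] / total))
    return rows

# Hypothetical sample of ten ages (assumed data, not from the slides).
ages = [27, 34, 36, 39, 42, 45, 48, 51, 58, 66]
table = frequency_distribution(ages, lower=25, width=8, n_classes=6)
for label, freq, rel in table:
    print(f"{label:<12} {freq:>3} {rel:.2f}")
```

Each row is one class; the list of rows with frequencies is the frequency distribution, and the last column is the relative-frequency distribution.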

5-25
Histograms vs. Bar Charts
Histogram:
•Used for quantitative variables
•Horizontal axis shows range
•Bars touch each other
Bar chart:
•Used for categorical variables
•Horizontal axis shows category
•Bars don’t touch each other

5-26
Age          Tally            Frequency   Relative Frequency   Percent
25 to < 33   ||||             5           5/50 = .10           10%
33 to < 41   |||| |||| ||||   14          14/50 = .28          28%
41 to < 49   |||| |||| |||    13          13/50 = .26          26%
49 to < 57   |||| ||||        9           9/50 = .18           18%
57 to < 65   |||| ||          7           7/50 = .14           14%
65 to < 73   ||               2           2/50 = .04           4%
A histogram is used to show frequency or percentage by
interval (data will always be numerical).

5-28
Histograms reveal the shape,
center, and spread of a variable
 Shape refers to the shape formed by
the bars of the histogram
 Center refers to the mean of the
variable. If the histogram were an
object, it would “balance” on the mean.
(The point with half the area to the left and
half to the right is the median.)
 Spread refers to how far dispersed the
data values are

5-29
Measures of Center or “Location”
 Mean
 Median
 Mode

5-30
Calculating Sample Mean
Formula: $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$
That is, add up all of the data points and divide
by the number of data points.
Data (# of classes skipped): 2 8 3 4 1
Sample Mean = (2+8+3+4+1)/5 = 3.6
Do not round! Mean need not be a whole number.
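As a quick check of the arithmetic, here is a minimal Python sketch using the five data points from the slide. The variable names are illustrative only.

```python
import statistics

classes_skipped = [2, 8, 3, 4, 1]      # data from the slide

# Add up all of the data points and divide by the number of data points.
sample_mean = sum(classes_skipped) / len(classes_skipped)
print(sample_mean)                      # 3.6

# The standard library computes the same quantity.
library_mean = statistics.mean(classes_skipped)
```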

5-31
Median
 Another name for 50th percentile.
 Appropriate for describing measurement
data.
 “Robust to outliers,” that is, not
affected much by unusual values.

5-32
Mode
 The value that occurs most frequently.
 One data set can have many modes.
 Appropriate for all types of data, but most
useful for categorical data or discrete data
with only a few possible values.
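A brief sketch of finding the mode, including the case of more than one mode. The part-type data are hypothetical; `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

part_types = ["A", "B", "A", "C", "B"]  # assumed categorical data

# multimode returns every value tied for the highest frequency,
# in the order each was first seen.
modes = statistics.multimode(part_types)
print(modes)                             # ['A', 'B']
```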

5-33
Histograms can be unimodal,
multimodal or uniform

5-34
A histogram can show you if
there are outliers in the data.

5-35
Measures of Spread
 Range
 Variance
 Standard Deviation
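The three measures can be computed as follows. This sketch reuses the five data points from the sample-mean slide and assumes the sample (n - 1) form of the variance, which the slides do not specify.

```python
import statistics

data = [2, 8, 3, 4, 1]

data_range = max(data) - min(data)       # largest value minus smallest value
variance = statistics.variance(data)     # sample variance: divides by n - 1
std_dev = statistics.stdev(data)         # square root of the sample variance

print(data_range, variance, round(std_dev, 3))
```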

5-36
How and why should data be
analyzed?
 Data analysis ensures that your data is
meaningful and useful.
 Types of analysis include:
 Test for independence (randomness).
 Test for homogeneity (same source).
 Test for stationarity (not varying over time).

5-37
Testing for Independence
(Randomness)
 Scatter Plot
 Autocorrelation Plot
 Runs Test
These tests can be run using Stat::Fit
which can be run “Stand-alone” or from
the ProModel Tools menu.

5-38
Stat::Fit
 Stat::Fit is the data analysis and
distribution fitting software package
bundled with PROMODEL products.
 You can enter or import a dataset into
Stat::Fit for distribution fitting.
 You can copy and paste the fitted
distribution parameters into a ProModel
model.

5-39
Entering Data in Stat::Fit
 Type values.
 Open a .dat file
(File → Open).
 Copy and paste
from spreadsheet.

5-40
Scatter Plot
 Tests for Independence.
 Plots successive pairs of data as x,y values
(n-1 points).
 Random scatter of points indicates
independence.
 If data are correlated, the points will fall
along a line or curve.
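The pairing step described above can be sketched without any plotting library. The helper names and both data sets are hypothetical; the correlation of the pairs is used here only as a numeric stand-in for looking at the plot, and is not necessarily what Stat::Fit reports.

```python
import math
import random

def successive_pairs(values):
    """Pair each observation with the next one: (x1, x2), (x2, x3), ..."""
    return list(zip(values[:-1], values[1:]))

def correlation(pairs):
    """Pearson correlation between the x and y coordinates of the pairs."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
# Assumed data: independent draws versus values that drift upward over time.
independent = [rng.gauss(10, 2) for _ in range(100)]
trending = [20 + 0.1 * i + rng.gauss(0, 0.5) for i in range(100)]

pairs_ind = successive_pairs(independent)
pairs_trend = successive_pairs(trending)
print(len(pairs_ind))                    # n - 1 points
print(round(correlation(pairs_ind), 2), round(correlation(pairs_trend), 2))
```

Plotting `pairs_ind` would give a shapeless cloud, while `pairs_trend` would fall along a line.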

5-41
Scatter Plot for 100 Inspection
Times

5-42
Scatter Plot for 100 Temperatures

5-43
Autocorrelation Plot
 Another test for independence.
 Independence is ascertained by computing
autocorrelations for data values at varying
time lags.
 If independent, such autocorrelations
should be near zero for any and all time-
lag separations.
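One common estimator of the autocorrelation at a given lag is sketched below. The function name is illustrative, and Stat::Fit may use a slightly different formula. The alternating series is a deliberately dependent, made-up data set.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((x - mean) ** 2 for x in values)
    num = sum((values[i] - mean) * (values[i + lag] - mean)
              for i in range(n - lag))
    return num / denom

# Assumed data: a series that alternates between two values is
# strongly dependent, so its autocorrelations are far from zero.
alternating = [1, -1] * 50
for lag in (1, 2, 3):
    print(lag, autocorrelation(alternating, lag))
```

For independent data the printed values would all be close to zero; here they alternate between values near -1 and +1.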

5-44
Autocorrelation Plot for
Inspection Times

5-45
Autocorrelation Plot for
Temperatures

5-46
Data That Tend to Be
Nonhomogeneous
 Activity times that take longer or
shorter depending on the type of entity
being processed.
 Inter-arrival times that fluctuate in
length depending on the time of day or
day of the week.
 Time between failures and time to
repair where the failure may result from
a number of different causes.

5-47
Testing for Identically Distributed
(Homogeneous) Data
[Figure: Bimodal Distribution of Downtimes Indicating Multiple Causes. Horizontal axis: repair time; vertical axis: frequency of occurrence; the two peaks are labeled Part Jams and Mechanical Failures.]

5-48
Nonstationary (time-variant)
Data
 Behavior that changes over time
Examples:
 Customer arrivals
 Equipment reliability

5-49
Non-stationary Data
[Figure: Change in Rate of Customer Arrivals Between 10 a.m. and 6 p.m. Horizontal axis: time of day (10:00 a.m. to 6:00 p.m.); vertical axis: rate of arrival.]

5-50
Three ways to represent data
 Use actual data -- e.g. read from text file
 Use a frequency table -- called an
empirical or user-defined distribution
 Use a standard distribution -- best guess
or Stat::Fit
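The second option, a frequency table used as an empirical distribution, might look like this in Python. The table values are hypothetical, and this shows general-purpose sampling, not ProModel's user-defined distribution syntax.

```python
import random

# Assumed frequency table: observed repair time in minutes -> count.
frequency_table = {5: 12, 10: 30, 15: 6, 20: 2}

values = list(frequency_table)
weights = list(frequency_table.values())

rng = random.Random(42)
# Each value is drawn with probability equal to its relative frequency.
samples = rng.choices(values, weights=weights, k=1000)
print(samples[:10])
```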

5-51
Probability Distributions
 A probability distribution assigns a
probability to each possible value (or
range of values) of a system variable.
 Distributions can be either discrete
(probability mass function) or continuous
(probability density function).

5-52
Bernoulli
 The output of a process is either defective
or non-defective
 An employee shows up for work or not
 An operation is required or not
[Plot: probability mass function f(x) for x = 0 and x = 1]

5-53
Binomial
 The number of defective items in a batch.
 The number of customers of a particular
type that enter the system.
 The number of employees out of a group
of employees who call in sick on a given
day.
[Plot: probability mass function f(x) for x = 0 to 6]

5-54
Poisson
 The number of entities arriving each hour.
 The number of defects per item.
 The number of times a resource is
interrupted each hour.
[Plot: probability mass function f(x) for x = 0 to 10]

5-55
Geometric
 The number of machine cycles before a
failure occurs.
 The number of items inspected before a
defective item is found.
 The number of customers processed
before a particular type is encountered.
[Plot: probability mass function f(x) for x = 1 to 7]

5-56
Uniform
 The type of an incoming entity given that
each possible type is equally likely to
occur
[Plot: f(x) vs. x, constant between a and b]

5-57
Triangular
• Good first approximation to the true
underlying distribution when data are sparse
and no distribution-fitting analysis has been
performed
[Plot: density f(x) vs. x, with a, m, and c marked on the x-axis]
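When only a minimum, most likely, and maximum value are available, sampling from a triangular distribution can be sketched as follows. The three values are illustrative. Note that Python's `random.triangular` takes its arguments in the order low, high, mode.

```python
import random

rng = random.Random(7)
minimum, mode, maximum = 0.8, 1.0, 1.5   # assumed operation time estimates

# Argument order is (low, high, mode), not (low, mode, high).
times = [rng.triangular(minimum, maximum, mode) for _ in range(10_000)]

sample_mean = sum(times) / len(times)
theoretical_mean = (minimum + mode + maximum) / 3
print(round(sample_mean, 3), round(theoretical_mean, 3))
```

The sample mean should land close to the theoretical mean of a triangular distribution, (min + mode + max) / 3.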

5-58
Normal
 Popular but rarely a true representation of
actual data.
[Plot: density f(x) vs. x]

5-59
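Exponential
 Intervals between occurrences, such as
the time between customer arrivals.
 Certain repair times or activities, such as
the duration of telephone conversations.
 Counterpart of the Poisson: when the number of
events per period is Poisson with rate λ, the time
between events is exponential with mean 1/λ.
[Plot: density f(x) vs. x]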
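The link between exponential interarrival times and Poisson counts can be checked with a short simulation. The arrival rate of 12 per hour is an assumed value chosen for illustration.

```python
import random

rng = random.Random(3)
rate_per_hour = 12.0                     # assumed arrival rate (lambda)

# Exponential times between arrivals, in hours.
gaps = [rng.expovariate(rate_per_hour) for _ in range(20_000)]
mean_gap = sum(gaps) / len(gaps)         # should be near 1 / lambda

# Count how many arrivals fall in each clock hour.
clock = 0.0
arrivals_per_hour = {}
for gap in gaps:
    clock += gap
    hour = int(clock)
    arrivals_per_hour[hour] = arrivals_per_hour.get(hour, 0) + 1

complete_hours = int(clock)              # the final partial hour is excluded
mean_count = sum(arrivals_per_hour.get(h, 0)
                 for h in range(complete_hours)) / complete_hours

print(round(mean_gap, 4), round(mean_count, 2))
```

The average gap should be close to 1/12 hour and the average hourly count close to 12.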

5-60
Beta
 Random proportions such as the
percentage of defective items in a lot
 Activity times, particularly when multiple
tasks make up the activity (PERT
analysis is based on beta distribution).
[Plot: density f(x) vs. x from 0 to 1.0 for shape-parameter pairs (0.5, 2), (1.0, 2.0), and (.5, .5)]

5-61
Lognormal
 Manual activities such as assembly,
inspection or repair.
 The time between failures is often
lognormally distributed.
[Plot: density f(x) vs. x]

5-62
Gamma
 Manual tasks such as service times or
repair times
[Plot: density f(x) vs. x]

5-63
Weibull
 Used in reliability theory for defining the
time until failure, particularly for
items (e.g., bearings, tooling) that
wear
[Plot: density f(x) vs. x]

5-64
Bounded vs. Boundless
Distributions
 Bounded distributions prevent extreme values
from occurring, even ones that are plausible.
 Boundless distributions allow extreme values
to occur, even ones that are implausible.

5-65
Fitting Distributions to Data Using
Stat::Fit
1. Enter or import data as previously
discussed.
2. Plot the data and look at parameters to
get a sense of the shape of the data.
3. Select distributions to fit and analysis to
use.
4. Run the analysis and view rankings.
5. Make a selection.

5-66
Plotting the Data
To plot the raw data
you have imported,
select Input → Input
Graph.
The shape of the input
graph can help
determine the
appropriate distribution.

5-67
Descriptive Statistics
If desired, you can view
descriptive statistics
(data parameters) for
the data to get an idea
of the center and
spread.
Select Statistics →
Descriptive

5-68
Setup
Select Fit → Setup
The window that
comes up allows you
to select the
distributions Stat::Fit
will fit to the data set.

5-69
Setup
By selecting the
Calculations tab, you
can change the tests
to be run, the
estimators to be used,
and the level of
significance.

5-70
Estimators
Stat::Fit allows you to
change the estimation
technique utilized to fit
the distributions.
MLEs (maximum
likelihood estimators)
are generally preferred,
but in some cases the
MLE does not exist, so
you must use the
method of moments.

5-71
Tests
You can choose to
perform any
combination of the
three goodness of fit
tests that are available.
All three tests measure
how well the fitted
distribution models the
data set, but in different
ways.

5-72
Chi-Squared Test
1. Compares actual counts (values from the
input dataset) with expected counts
(values from the estimated distribution)
2. Derives p-value from how much these
values differ
3. Better with larger sample sizes

5-73
Kolmogorov-Smirnov Test
1. Difference between cumulative
distribution of data and fit distribution
2. Most conservative, least likely to reject
the correct distribution in error
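The "difference between cumulative distributions" can be sketched as the largest gap between the empirical step function and the fitted CDF. The interarrival times and the choice of an exponential fit are assumptions for illustration, and the function names are hypothetical.

```python
import math

def ks_statistic(data, cdf):
    """Largest vertical distance between the empirical CDF and cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def exponential_cdf(mean):
    """Return the CDF of an exponential distribution with the given mean."""
    def cdf(x):
        return 1.0 - math.exp(-x / mean) if x >= 0 else 0.0
    return cdf

# Assumed interarrival times in minutes.
gaps = [0.5, 1.2, 0.3, 2.4, 0.9, 1.7, 0.1, 3.1]
fitted_mean = sum(gaps) / len(gaps)
d_stat = ks_statistic(gaps, exponential_cdf(fitted_mean))
print(round(d_stat, 3))
```

The statistic is then compared with a critical value that depends on the sample size and the level of significance.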

5-74
Anderson-Darling Test
1. Like Kolmogorov Smirnov, but gives a
heavier weight to differences in the tails of
the distribution
2. Good for any sample size
3. Not good for discrete data

5-75
Tests
 The Hypotheses:
 H0: The distribution fit is in fact the correct
distribution to describe the variable of
interest.
 H1: The distribution fit is NOT the correct
distribution to describe the variable of
interest.

5-76
How often are you willing to be
wrong?
You set the level of
significance based on
your answer to the
question above.
It is the probability
that the test rejects a
distribution that
accurately describes
the data.

5-77
Errors
 There are two types of errors that can be
made when performing a statistical test:
 Type I: you reject H0 when in fact H0 is true
 Type II: you fail to reject H0 when in fact H1 is true
 The level of significance you choose IS the
probability of a Type I error

5-78
Fitting the Data
There are two ways to perform the goodness
of fit tests:
 Select the Auto::Fit button from the toolbar
 Select Fit → Goodness of Fit

5-79
Auto::Fit
Within the Auto::Fit
window you can select
to fit continuous or
discrete distributions.
If the distribution has a
lower bound, that value
can be specified here.

5-80
Auto::Fit
Using Auto::Fit, the distributions are automatically
ranked according to which seem to fit the data the best.

5-81
Goodness of Fit Tests
The test results are given
along with the actual
distribution fit. Each test
has a result: Reject or Do
Not Reject.
Do Not Reject means
there was not enough
evidence to conclude it is
not the correct
distribution to describe
the data.

5-82
Distribution Graph
After picking out the top
few distributions, it can
be useful to graph the
fit distribution against
the data.
Select Fit → Results
Graph → Comparison

5-83
Picking a Fit
 Compare test results.
 Compare graphs.
 Use what you know about the
process.

5-84
Exporting
Once you have selected
a fit, you need to export
it to the PROMODEL
product.
Select File → Export
→ Export Fit or the
Export button from the
toolbar.

5-85
Exporting
 Select the application
you would like to
export to (PROMODEL
Products)
 Select the distribution
to export

5-86
Exporting
 The precision box
allows you to change
the number of
decimal places in the
distribution
parameters
 Select OK
 The distribution is
now in the correct
form to paste into
your model

5-87
Stat::Fit
 What to avoid:
 Small samples
 Using all goodness of fit tests – it
increases the Type I error rate
 Taking the distribution into the
model without exporting
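A rough calculation shows why running every test can inflate the Type I error rate when a distribution is rejected as soon as any one test rejects it. The sketch assumes the three tests are independent, which is not strictly true for tests run on the same data, so the result is an illustration rather than an exact rate.

```python
alpha = 0.05        # level of significance for each individual test
n_tests = 3         # chi-squared, Kolmogorov-Smirnov, Anderson-Darling

# Probability that at least one of the tests rejects a correct
# distribution, under the (simplifying) independence assumption.
combined_error = 1 - (1 - alpha) ** n_tests
print(round(combined_error, 4))          # 0.1426
```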

5-88
Frequency Histogram of
Inspection Times

5-89
Best Distribution Fit for
Inspection Times

5-90
Beta Curve Representing
Inspection Times

5-91
Appropriate Adjustments
 Remember, you fit a distribution to
historical data, not necessarily to the
data reflecting the design period.
 Don’t forget to adjust the data to reflect
the period of interest.
 Is there a growth rate to factor in?
 Is there a learning curve to consider?

5-92
Handling Rare Behavior
 Repeating behavior – e.g. Occasional
abnormally long downtimes.
 Can include if not too infrequent.
 Can model once like non-repeating.
 Non-repeating behavior – e.g. Labor strike
 Throw one in to see what happens.

5-93
Absence of Data
 A single, most likely or mean value
 Minimum and maximum values defining a range
 Minimum, most likely and maximum values
 Use sensitivity analysis:
 Best case
 Worst case
 Most likely case

5-94
Assumptions
 All models are based on assumptions.
 Relative comparisons may still be valid.
 Sensitivity analysis can show crucial
assumptions.

5-95
Data Documentation and
Approval
 Tables
 Flow Diagrams
 Assumption Lists
 Exclusion Lists
 Sources Used
 Consent (not necessarily validation)
from users and decision makers

5-96
Use of Flowchart
[Flowchart of monitor routing. Nodes: Station 1, Station 2, Inspection, Station 3. Flow labels: 19”, 21” & 25” Monitors; 25” Monitor; 19” & 21” Monitor; Rejected Monitors; Reworked Monitors.]

5-97
Use of Tables
Entity        Station      Opn. Time (minimum, mode, maximum)
19” Monitor   Station 1    .8, 1, 1.5
              Station 2    .9, 1.2, 1.8
              Inspection   1.8, 2.2, 3
21” Monitor   Station 1    .8, 1, 1.5
              Station 2    1.1, 1.3, 1.9
              Inspection   1.8, 2.2, 3
25” Monitor   Station 1    .9, 1.1, 1.6
              Station 2    1.2, 1.4, 2
              Inspection   1.8, 2.3, 3.2
              Station 3    .5, .7, 1

5-98
Use of Operation Rules
Handling Defective Monitors
• Defective monitors are detected at inspection and routed to
whichever station created the problem.
• Monitors waiting at a station for rework have a higher
priority than first-time monitors.
• Corrected monitors are routed back to inspection.
• If a reworked monitor fails a second time, it is removed from
the system.

5-99
Use of an Assumption List
Assumptions
• No downtimes are considered (downtimes are rare).
• Operators are dedicated at each workstation and are always
available during the scheduled work time.
• Rework times are half of the normal operation times.

5-100
Remember…
Information is never accurate or sufficient enough
to make a “no risk” decision. But the better the
information, the less risky the decision.
