Chapter 5
Data Collection and Analysis
“You can observe a lot just by watching.”
– Yogi Berra
McGraw-Hill/Irwin Copyright © 2012 by The McGraw-Hill Companies, Inc. All rights reserved.
5-2
Questions to be answered:
 What types of data should be gathered?
 How should data be gathered?
 What statistical background do I need?
 How should data be analyzed?
 How do you get data in the right form for
use in simulation?
 How should data be documented?
5-3
Objective of Data Collection and
Analysis
The goal of data collection and analysis is to
come up with descriptive information and
statistics that define the behavior of the
system.
5-4
What Types of Data Should Be
Gathered?
 Structural
 Operational
 Numerical
5-5
Structural Data
 Resources used
 Workstations and buffers
 Entity types
 Paths and conveyors
5-6
Operational Data
 Routings
 Arrivals
 Work Schedules (shifts)
 Decision logic
5-7
Numerical Data
 Resource quantities
 Buffer sizes
 Operation times
 Move times
 Interarrival times
 Batch sizes
 Time between failures
Requires some statistical analysis
5-8
How Should Data Be Gathered?
1. Determine data requirements.
2. Identify reliable data sources.
3. Collect the data.
4. Summarize the data
5. Document and approve data.
5-9
Suggestions for Data Gathering
 Define the problem/objective for the simulation
(maximize throughput, minimize cycle time, etc.)
 Identify only factors that bear on the problem
(operation times, resource scheduling, etc.)
 Focus on input variables, not response variables (flow
times, throughput rates, utilizations, etc.)
 Separate delay times from delay conditions (e.g. getting
a resource vs. activity time).
 Look for common groupings (e.g., part families)
 Focus on essence (i.e. time, conditions), not substance
(i.e. how the activity is performed).
 Look for triggering events (What triggers entity
movement? What triggers a machine setup?).
5-10
Look for the Constraints
 A system constraint is anything that keeps
everything from happening all at once.
 Constraints include time delays,
conditional delays due to unavailable
resources, parts, etc.
5-11
Use a Questionnaire
 Organizes and simplifies the data
gathering process.
 Helps ensure all issues are addressed.
 Can send it to process owners in
advance and leave a copy afterwards
5-12
Sources of Data
 Historical records (production, sales,
scrap rates, equipment reliability)
 System documentation (process plans,
facility layouts, work procedures)
 Personal Observation (facility walk-
through, time studies, work sampling)
 Interviews (operators, maintenance
personnel, engineers, managers)
5-13
Sources of Data (cont.)
 Comparative systems (same or similar
industries)
 Vendor claims (cycle times, equipment
reliability)
 Design estimates (process times, move
times, etc. for a new system)
 Literature (published research on
learning curves, predetermined time
studies, etc.)
5-14
What to Avoid
 Opinions when actual times can be
determined.
 Taking only one or two sample times.
 Taking samples from only one day or only
one operator then applying the results to
many operators.
5-15
Know when to quit.
5-16
Systematic Data Collection
1. Define overall flow logic.
2. Describe each process step
3. Give specific values
5-17
Defining the Process Flow
[Flow diagram: Product A routed among Station 1, Station 2, Station 3, Station 4, and Station 5]
5-18
Description of Operation
Location           Activity Time   Activity Resource   Next Location       Move Trigger             Move Time   Move Resource
Check-in Counter   N(1,.2) min.    Secretary           Waiting Room        None                     .2 min.     None
Waiting Room       None            None                Exam Room           When room is available   .8 min.*    Nurse
Exam Room          N(15,4) min.    Doctor              Check-out Counter   None                     .2 min.     None
Check-out Counter  N(3,.5) min.    Secretary           Exit                None                     .2 min.     None
[Layout diagram showing Patient, Check-in Counter, Waiting Room, Exam Room (3), and Check-out Counter]
5-19
Random System Variables
5-20
Random System Variables
Random system variables are defining characteristics of a system that vary in value from one observation to the next (e.g., cycle time, boxes per pallet).
• Qualitative variable -- A non-numerically valued variable.
• Quantitative variable -- A numerically valued variable.
• Discrete variable -- A variable whose possible values form a set of distinct, specific values (categories for qualitative data, whole numbers for quantitative data).
• Continuous variable -- A quantitative variable that can take any value within a range.
5-21
Characterizing Random
Variables
You don’t need to be a professional statistician. All you need is a basic knowledge of:
• Descriptive statistics (describe the data)
• Data analysis (looks for correlations in the data)
• Distribution fitting (determines the appropriate probability distribution to represent the data)
5-22
Discrete vs. Continuous Variables
• Continuous – The variable can take on any value within a range (e.g., Height, Weight, Time)
• Discrete – The variable can only take select values within a range (e.g., Gender, Patient Class, Part Type, Counts)
5-23
Data for a Variable
 A random system variable is defined by
gathering sample data on the variable.
 For random variables, the more data that
are gathered (i.e., the larger the sample
size), the more accurate the
characterization is of the variable.
5-24
Data Groupings
 Class – Category or range of values for
grouping data.
 Frequency -- The number of observations that
fall in a class.
 Frequency distribution -- A listing of all
classes along with their frequencies.
 Relative frequency -- The ratio of the
frequency of a class to the total number of
observations.
 Relative-frequency distribution -- A listing
of all classes along with their relative
frequencies.
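The grouping terms above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_distribution`, the equal-width classes, and the `ages` sample are assumptions made for the example, not part of the chapter.

```python
from collections import Counter

def frequency_distribution(values, lower, width, n_classes):
    """Group values into equal-width classes [lower + k*width, lower + (k+1)*width).

    Returns one row per class: (class_low, class_high, frequency, relative_frequency).
    """
    counts = Counter()
    for v in values:
        k = int((v - lower) // width)      # index of the class that v falls in
        if 0 <= k < n_classes:
            counts[k] += 1
    total = sum(counts.values())
    table = []
    for k in range(n_classes):
        low = lower + k * width
        freq = counts[k]
        rel = freq / total if total else 0.0
        table.append((low, low + width, freq, rel))
    return table

# Hypothetical sample of ages, for illustration only.
ages = [25, 31, 34, 38, 40, 41, 44, 47, 52, 58]
for low, high, freq, rel in frequency_distribution(ages, lower=25, width=8, n_classes=6):
    print(f"{low} to < {high}: frequency={freq}, relative frequency={rel:.2f}")
```

The relative frequencies always sum to 1 when every observation falls in some class, which is a quick check that no data were dropped by the class boundaries.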
5-25
Histograms vs. Bar Charts
Histograms:
• Used for quantitative variables
• Horizontal axis shows range
• Bars touch each other
Bar charts:
• Used for categorical variables
• Horizontal axis shows category
• Bars don’t touch each other
5-26
Age          Tally            Frequency   Relative Frequency   Percent
25 to < 33   ||||             5           5/50 = .10           10%
33 to < 41   |||| |||| ||||   14          14/50 = .28          28%
41 to < 49   |||| |||| |||    13          13/50 = .26          26%
49 to < 57   |||| ||||        9           9/50 = .18           18%
57 to < 65   |||| ||          7           7/50 = .14           14%
65 to < 73   ||               2           2/50 = .04           4%
A histogram is used to show frequency or percentage by interval (data will always be numerical).
5-27
Bar chart or Histogram?
5-28
Histograms reveal the shape,
center, and spread of a variable
• Shape refers to the shape formed by the bars of the histogram.
• Center refers to the mean of the variable. If the histogram were an object, it would “balance” on the mean. (The point with half the area to its left and half to its right is the median, which coincides with the mean when the histogram is symmetric.)
• Spread refers to how far dispersed the data values are.
5-29
Measures of Center or “Location”
 Mean
 Median
 Mode
5-30
Calculating Sample Mean
Formula: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
That is, add up all of the data points and divide
by the number of data points.
Data (# of classes skipped): 2 8 3 4 1
Sample Mean = (2+8+3+4+1)/5 = 3.6
Do not round! Mean need not be a whole number.
5-31
Median
 Another name for 50th percentile.
 Appropriate for describing measurement
data.
 “Robust to outliers,” that is, not
affected much by unusual values.
5-32
Mode
• The value that occurs most frequently.
• One data set can have many modes.
• Appropriate for all types of data, but most useful for categorical data or discrete data with only a few possible values.
5-33
Histograms can be unimodal,
multimodal or uniform
5-34
A histogram can show you if
there are outliers in the data.
5-35
Measures of Spread
 Range
 Variance
 Standard Deviation
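As a small worked example, the measures of center and spread can be computed with Python's standard `statistics` module. The data are the five values from the sample-mean slide; the variance and standard deviation shown are the sample versions (divisor n - 1), which is an assumption about the intended definition.

```python
import statistics

# Number of classes skipped, from the sample-mean example in this chapter.
data = [2, 8, 3, 4, 1]

mean = statistics.mean(data)            # add up the values, divide by n
median = statistics.median(data)        # middle value of the sorted data
modes = statistics.multimode(data)      # every value tied for most frequent
data_range = max(data) - min(data)      # largest value minus smallest value
variance = statistics.variance(data)    # sample variance (divisor n - 1)
std_dev = statistics.stdev(data)        # square root of the sample variance

print("mean:", mean)
print("median:", median)
print("modes:", modes)
print("range:", data_range)
print("variance:", variance)
print("standard deviation:", round(std_dev, 4))
```

Because every value in this small data set occurs exactly once, `multimode` reports all five values, which illustrates why the mode is most useful when only a few distinct values are possible.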
5-36
How and why should data be
analyzed?
• Data analysis ensures that your data is meaningful and useful.
• Types of analysis include:
  • Test for independence (randomness).
  • Test for homogeneity (same source).
  • Test for stationarity (not varying over time).
5-37
Testing for Independence
(Randomness)
 Scatter Plot
 Autocorrelation Plot
 Runs Test
These tests can be run using Stat::Fit
which can be run “Stand-alone” or from
the ProModel Tools menu.
5-38
Stat::Fit
 Stat::Fit is the data analysis and
distribution fitting software package
bundled with PROMODEL products.
 You can enter or import a dataset into
Stat::Fit for distribution fitting.
 You can copy and paste the fitted
distribution parameters into a ProModel
model.
5-39
Entering Data in Stat::Fit
• Type values.
• Open a .dat file (File → Open).
• Copy and paste from spreadsheet.
5-40
Scatter Plot
 Tests for Independence.
 Plots successive pairs of data as x,y values
(n-1 points).
 Random scatter of points indicates
independence.
 If data are correlated, the points will fall
along a line or curve.
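Outside Stat::Fit, the pairing behind a scatter plot can be sketched as follows. The helper names `successive_pairs` and `correlation` and the sample values are assumptions for illustration; the correlation coefficient is added here only as a numeric summary of what the plot shows and is not claimed to be what Stat::Fit reports.

```python
def successive_pairs(values):
    """Pair each observation with the next one: (x1, x2), (x2, x3), ... (n-1 points)."""
    return list(zip(values[:-1], values[1:]))

def correlation(pairs):
    """Pearson correlation of the x and y coordinates of the plotted points."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical steadily rising readings: successive values are strongly related.
rising = [1, 2, 3, 4, 5, 6]
points = successive_pairs(rising)
print(points)
print(correlation(points))
```

A value near zero is consistent with a random scatter of points, while a value near +1 or -1 means the points fall close to a line.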
5-41
Scatter Plot for 100 Inspection
Times
5-42
Scatter Plot for 100 Temperatures
5-43
Autocorrelation Plot
 Another test for independence.
 Independence is ascertained by computing
autocorrelations for data values at varying
time lags.
 If independent, such autocorrelations
should be near zero for any and all time-
lag separations.
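A minimal sketch of the autocorrelation calculation, assuming the usual sample estimator (lagged products of deviations from the mean, divided by the total sum of squared deviations). The two data series are generated for illustration and are not the inspection times or temperatures from the slides.

```python
import random

def autocorrelation(values, lag):
    """Sample autocorrelation of values at the given lag (lag 0 gives 1.0)."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    numer = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return numer / denom

rng = random.Random(5)  # fixed seed so the run is repeatable

# Hypothetical independent observations.
independent = [rng.gauss(10.0, 2.0) for _ in range(500)]

# Hypothetical drifting series: each value depends on the previous one.
drifting = []
level = 70.0
for _ in range(500):
    level += rng.gauss(0.0, 1.0)
    drifting.append(level)

for lag in (1, 2, 3, 4, 5):
    print(lag,
          round(autocorrelation(independent, lag), 3),
          round(autocorrelation(drifting, lag), 3))
```

For the independent series the values should hover near zero at every lag, while the drifting series should stay well above zero.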
5-44
Autocorrelation Plot for
Inspection Times
5-45
Autocorrelation Plot for
Temperatures
5-46
Data that tends to be non-homogeneous
 Activity times that take longer or
shorter depending on the type of entity
being processed.
 Inter-arrival times that fluctuate in
length depending on the time of day or
day of the week.
 Time between failures and time to
repair where the failure may result from
a number of different causes.
5-47
Testing for Identically Distributed
(Homogeneous) Data
[Figure: Bimodal Distribution of Downtimes Indicating Multiple Causes. Horizontal axis: Repair Time; vertical axis: Frequency of Occurrence; the two peaks are labeled Part Jams and Mechanical Failures.]
5-48
Nonstationary (time-variant)
Data
 Behavior that changes over time
Examples:
 Customer arrivals
 Equipment reliability
5-49
Non-stationary Data
[Figure: Change in Rate of Customer Arrivals Between 10 a.m. and 6 p.m. Horizontal axis: Time of Day (10:00 a.m. to 6:00 p.m.); vertical axis: Rate of Arrival.]
5-50
Three ways to represent data
 Use actual data -- e.g. read from text file
 Use a frequency table -- called an
empirical or user-defined distribution
 Use a standard distribution -- best guess
or Stat::Fit
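The second option, a frequency table used as an empirical (user-defined) distribution, can be sketched like this. The function name `sample_empirical` and the batch-size table are hypothetical and serve only to show the idea of drawing values in proportion to their relative frequencies.

```python
import random

def sample_empirical(table, rng):
    """Draw one value from a user-defined (empirical) discrete distribution.

    table is a list of (value, relative_frequency) pairs.
    """
    values = [value for value, _ in table]
    weights = [weight for _, weight in table]
    return rng.choices(values, weights=weights, k=1)[0]

# Hypothetical relative-frequency table of batch sizes (illustration only).
batch_size_table = [(1, 0.10), (2, 0.30), (3, 0.40), (4, 0.20)]

rng = random.Random(2012)  # fixed seed so the run is repeatable
draws = [sample_empirical(batch_size_table, rng) for _ in range(10_000)]
for value, weight in batch_size_table:
    print(value, weight, draws.count(value) / len(draws))
```

With enough draws, the observed proportions approach the relative frequencies in the table, which is the behavior a simulation relies on when it samples from an empirical distribution.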
5-51
Probability Distributions
 A Probability Distribution defines all
possible values of a system variable
plotted against their respective
probabilities.
 Distributions can be either discrete
(probability mass function) or continuous
(probability density function).
5-52
Bernoulli
• The output of a process is either defective or non-defective
• An employee shows up for work or not
• An operation is required or not
[Plot: f(x) versus x, with bars at x = 0 and x = 1]
5-53
Binomial
• The number of defective items in a batch.
• The number of customers of a particular type that enter the system.
• The number of employees out of a group of employees who call in sick on a given day.
[Plot: f(x) versus x for x = 0 to 6]
5-54
Poisson
• The number of entities arriving each hour.
• The number of defects per item.
• The number of times a resource is interrupted each hour.
[Plot: f(x) versus x for x = 0 to 10]
5-55
Geometric
• The number of machine cycles before a failure occurs.
• The number of items inspected before a defective item is found.
• The number of customers processed before a particular type is encountered.
[Plot: f(x) versus x for x = 1 to 7]
5-56
Uniform
• The type of an incoming entity given that each possible type is equally likely to occur
[Plot: f(x) versus x over the interval from a to b]
5-57
Triangular
• Good first approximation to the true underlying distribution when data is sparse and no distribution fitting analysis has been performed
[Plot: f(x) versus x, with the points a, m, and c marked on the x-axis]
5-58
Normal
• Popular but rarely a true representation of actual data.
[Plot: f(x) versus x]
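5-59
Exponential
• Intervals between occurrences such as the time between customer arrivals.
• Certain repair times or activities such as the duration of telephone conversations.
• Counterpart of the Poisson distribution: when the number of occurrences per unit of time is Poisson distributed, the time between occurrences is exponentially distributed.
[Plot: f(x) versus x]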
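The link between exponential interarrival times and Poisson counts can be checked with a short simulation. This is a sketch under stated assumptions: the function name `arrivals_per_period`, the rate of 5 arrivals per hour, and the 2,000 simulated hours are all chosen for illustration.

```python
import random
import statistics

def arrivals_per_period(rate, periods, rng):
    """Generate exponential interarrival times and count arrivals in each unit-length period."""
    counts = [0] * periods
    t = rng.expovariate(rate)           # time of the first arrival
    while t < periods:
        counts[int(t)] += 1             # arrival falls in period number int(t)
        t += rng.expovariate(rate)      # add the next exponential interarrival time
    return counts

rng = random.Random(42)  # fixed seed so the run is repeatable
counts = arrivals_per_period(rate=5.0, periods=2000, rng=rng)

# For a Poisson count, the mean and the variance both equal the rate.
print("mean arrivals per hour:", statistics.mean(counts))
print("variance of arrivals per hour:", statistics.variance(counts))
```

Both printed values should land close to the assumed rate of 5, which is the signature of Poisson-distributed counts.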
5-60
Beta
• Random proportions such as the percentage of defective items in a lot
• Activity times, particularly when multiple tasks make up the activity (PERT analysis is based on beta distribution).
[Plot: f(x) versus x on the interval 0 to 1.0, with curves for three shape-parameter pairs: (0.5, 2), (1.0, 2.0), and (.5, .5)]
5-61
Lognormal
• Manual activities such as assembly, inspection or repair.
• The time between failures is often lognormally distributed.
[Plot: f(x) versus x]
5-62
Gamma
• Manual tasks such as service times or repair times
[Plot: f(x) versus x]
5-63
Weibull
• Used in reliability theory for defining the time until failure, particularly due to items (e.g. bearings, tooling, etc.) that wear
[Plot: f(x) versus x]
5-64
Bounded vs. Boundless
Distributions
• Bounded distributions can prevent plausible extreme values from occurring.
• Boundless distributions can allow implausible extreme values to occur.
5-65
Fitting Distributions to Data Using
Stat::Fit
1. Enter or import data as previously
discussed.
2. Plot the data and look at parameters to
get a sense of the shape of the data.
3. Select distributions to fit and analysis to
use.
4. Run the analysis and view rankings.
5. Make a selection.
5-66
Plotting the Data
To plot the raw data you have imported, select Input → Input Graph.
The shape of the input graph can help determine the appropriate distribution.
5-67
Descriptive Statistics
If desired, you can view
descriptive statistics
(data parameters) for
the data to get an idea
of the center and
spread.
Select Statistics → Descriptive
5-68
Setup
Select Fit → Setup
The window that
comes up allows you
to select the
distributions Stat::Fit
will fit to the data set.
5-69
Setup
By selecting the
Calculations tab, you
can change the tests
to be run, the
estimators to be used,
and the level of
significance.
5-70
Estimators
Stat::Fit allows you to
change the estimation
technique utilized to fit
the distributions.
MLEs (maximum likelihood estimators) are generally preferred, but in some cases the MLE doesn’t exist, so you must use Moments (the method of moments).
5-71
Tests
You can choose to
perform any
combination of the
three goodness of fit
tests that are available.
All three tests measure
the extent that the fit
distribution models the
data set but in different
ways.
5-72
Chi-Squared Test
1. Compares actual counts (values from the input dataset) versus expected counts (values from the estimated distribution)
2. Derives p-value from how much these
values differ
3. Better with larger sample sizes
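The comparison of actual and expected counts can be sketched as below. This only computes the chi-squared statistic; turning it into a p-value requires the chi-squared distribution, which Stat::Fit handles and which this sketch does not attempt. The class boundaries, observed counts, fitted mean, and function names are all hypothetical.

```python
import math

def exponential_expected_counts(edges, mean, n):
    """Expected number of observations per class under an exponential fit."""
    def cdf(x):
        return 1.0 - math.exp(-x / mean)
    return [n * (cdf(high) - cdf(low)) for low, high in zip(edges[:-1], edges[1:])]

def chi_squared_statistic(observed, expected):
    """Sum over classes of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical class boundaries (minutes) and observed counts, illustration only.
edges = [0.0, 1.0, 2.0, 3.0, math.inf]
observed = [58, 27, 9, 6]
n = sum(observed)
fitted_mean = 1.2  # assumed parameter of the fitted exponential distribution

expected = exponential_expected_counts(edges, fitted_mean, n)
print("expected counts:", [round(e, 1) for e in expected])
print("chi-squared statistic:", round(chi_squared_statistic(observed, expected), 3))
```

The larger the statistic, the more the observed counts differ from what the fitted distribution predicts.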
5-73
Kolmogorov-Smirnov Test
1. Measures the largest difference between the cumulative distribution of the data and that of the fit distribution
2. Most conservative, least likely to reject
the correct distribution in error
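A minimal sketch of the Kolmogorov-Smirnov statistic: the largest vertical gap between the empirical cumulative distribution of the sample and the cumulative distribution of the fitted model. The interarrival times are hypothetical, and fitting the exponential by setting its mean to the sample mean is an assumption made for the example.

```python
import math

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of sample and the given model CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x; check both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def exponential_cdf(mean):
    """Return the CDF of an exponential distribution with the given mean."""
    def cdf(x):
        return 1.0 - math.exp(-x / mean) if x > 0 else 0.0
    return cdf

# Hypothetical interarrival times in minutes (illustration only).
times = [0.4, 1.1, 0.2, 2.7, 0.9, 1.6, 0.3, 3.8, 0.7, 1.3]
fitted_mean = sum(times) / len(times)
d = ks_statistic(times, exponential_cdf(fitted_mean))
print("fitted mean:", round(fitted_mean, 3))
print("K-S statistic:", round(d, 3))
```

The test then compares this statistic with a critical value that depends on the sample size and the level of significance.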
5-74
Anderson-Darling Test
1. Like Kolmogorov-Smirnov, but gives a heavier weight to differences in the tails of the distribution
2. Good for any sample size
3. Not good for discrete data
5-75
Tests
 The Hypotheses:
 H0: The distribution fit is in fact the correct
distribution to describe the variable of
interest.
 H1: The distribution fit is NOT the correct
distribution to describe the variable of
interest.
5-76
How often are you willing to be
wrong?
You set the value of the level of significance based on the answer to the question above.
It is the probability that the test rejects a distribution that accurately describes the data.
5-77
Errors
• There are two types of errors that can be made when performing a statistical test:
  • Type I: you reject H0 when in fact H0 is true
  • Type II: you fail to reject H0 when in fact H1 is true
• The level of significance you choose IS the probability of a Type I error
5-78
Fitting the Data
There are two ways to perform the goodness of fit tests:
• Select the Auto::Fit button from the toolbar
• Select Fit → Goodness of Fit
5-79
Auto::Fit
Within the Auto::Fit
window you can select
to fit continuous or
discrete distributions.
If the distribution has a
lower bound, that value
can be specified here.
5-80
Auto::Fit
Using Auto::Fit, the distributions are automatically
ranked according to which seem to fit the data the best.
5-81
Goodness of Fit Tests
The test results are given
along with the actual
distribution fit. Each test
has a result: Reject or Do
Not Reject.
Do Not Reject means
there was not enough
evidence to conclude it is
not the correct
distribution to describe
the data.
5-82
Distribution Graph
After picking out the top
few distributions, it can
be useful to graph the
fit distribution against
the data.
Select Fit → Results Graph → Comparison
5-83
Picking a Fit
 Compare test results.
 Compare graphs.
 Use what you know about the
process.
5-84
Exporting
Once you have selected
a fit, you need to export
it to the PROMODEL
product.
Select File → Export → Export Fit or the Export button from the toolbar.
5-85
Exporting
 Select the application
you would like to
export to (PROMODEL
Products)
 Select the distribution
to export
5-86
Exporting
 The precision box
allows you to change
the number of
decimal places in the
distribution
parameters
 Select OK
 The distribution is
now in the correct
form to paste into
your model
5-87
Stat::Fit
 What to avoid:
 Small samples
 Using all goodness of fit tests – it
increases the Type I error rate
 Taking the distribution into the
model without exporting
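A rough way to see why running several tests raises the Type I error rate, assuming a correct distribution is thrown out whenever any one of k tests rejects it and each test is run at level of significance α. The bounds on the left always hold; the closed form on the right additionally assumes the tests are independent, which tests computed on the same data set generally are not, so it is only a guide.

```latex
\alpha \;\le\; P(\text{at least one test rejects a correct fit}) \;\le\; k\alpha,
\qquad
P_{\text{independent}} = 1 - (1 - \alpha)^{k}
```

For example, with α = 0.05 and k = 3 the upper bound is 0.15, and the independent-tests value is 1 − 0.95³ ≈ 0.143.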
5-88
Frequency Histogram of
Inspection Times
5-89
Best Distribution Fit for
Inspection Times
5-90
Beta Curve Representing
Inspection Times
5-91
Appropriate Adjustments
 Remember, you fit a distribution to
historical data, not necessarily to the
data reflecting the design period.
 Don’t forget to adjust the data to reflect
the period of interest.
 Is there a growth rate to factor in?
 Is there a learning curve to consider?
5-92
Handling Rare Behavior
 Repeating behavior – e.g. Occasional
abnormally long downtimes.
 Can include if not too infrequent.
 Can model once like non-repeating.
 Non-repeating behavior – e.g. Labor strike
 Throw one in to see what happens.
5-93
Absence of Data
• A single, most likely or mean value
• Minimum and maximum values defining a range
• Minimum, most likely and maximum values
• Use sensitivity analysis:
  • Best case
  • Worst case
  • Most likely case
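When only estimates are available, a range can be sampled with a uniform distribution and a minimum, most likely, and maximum estimate with a triangular distribution. The sketch below uses Python's `random` module; the function names are hypothetical, and the values .8, 1, and 1.5 are borrowed from the Station 1 entry for the 19" monitor in this chapter's operation-time table.

```python
import random
import statistics

def sample_from_range(minimum, maximum, rng):
    """Only a minimum and maximum are known: treat every value in between as equally likely."""
    return rng.uniform(minimum, maximum)

def sample_from_three_point(minimum, most_likely, maximum, rng):
    """Minimum, most likely, and maximum are known: use a triangular distribution."""
    # random.triangular takes its arguments in the order (low, high, mode).
    return rng.triangular(minimum, maximum, most_likely)

rng = random.Random(7)  # fixed seed so the run is repeatable
times = [sample_from_three_point(0.8, 1.0, 1.5, rng) for _ in range(10_000)]

print("smallest sample:", round(min(times), 3))
print("largest sample:", round(max(times), 3))
print("average sample:", round(statistics.mean(times), 3))
```

The average of a triangular distribution is (minimum + most likely + maximum) / 3, so the printed average should be close to 1.1 for these inputs.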
5-94
Assumptions
 All models are based on assumptions.
 Relative comparisons may still be valid.
 Sensitivity analysis can show crucial
assumptions.
5-95
Data Documentation and
Approval
 Tables
 Flow Diagrams
 Assumption Lists
 Exclusion Lists
 Sources Used
 Consent (not necessarily validation)
from users and decision makers
5-96
Use of Flowchart
[Flowchart with nodes Station 1, Station 2, Inspection, and Station 3, and flows labeled 19”, 21” & 25” Monitors; 25” Monitor; 19” & 21” Monitor; Rejected Monitors; Reworked Monitors]
5-97
Use of Tables
Entity        Station      Opn. Time (min, mode, max)
19" Monitor   Station 1    .8, 1, 1.5
19" Monitor   Station 2    .9, 1.2, 1.8
19" Monitor   Inspection   1.8, 2.2, 3
21" Monitor   Station 1    .8, 1, 1.5
21" Monitor   Station 2    1.1, 1.3, 1.9
21" Monitor   Inspection   1.8, 2.2, 3
25" Monitor   Station 1    .9, 1.1, 1.6
25" Monitor   Station 2    1.2, 1.4, 2
25" Monitor   Inspection   1.8, 2.3, 3.2
25" Monitor   Station 3    .5, .7, 1
5-98
Use of Operation Rules
Handling Defective Monitors
• Defective monitors are detected at inspection and routed to whichever station created the problem.
• Monitors waiting at a station for rework have a higher priority than first-time monitors.
• Corrected monitors are routed back to inspection.
• If a reworked monitor fails a second time, it is removed from the system.
5-99
Use of an Assumption List
Assumptions
• No downtimes are considered (downtimes are rare).
• Operators are dedicated at each workstation and are always available during the scheduled work time.
• Rework times are half of the normal operation times.
5-100
Remember…
Information is never accurate or sufficient enough
to make a “no risk” decision. But the better the
information, the less risky the decision.

Chap_05_Data_Collection_and_Analysis.ppt

  • 1. Chapter 5 Data Collection and Analysis “You can observe a lot just by watching.” – Yogi Berra McGraw-Hill/Irwin Copyright © 2012 by The McGraw-Hill Companies, Inc. All rights reserved.
  • 2. 5-2 Questions to be answered:  What types of data should be gathered?  How should data be gathered?  What statistical background do I need?  How should data be analyzed?  How do you get data in the right form for use in simulation?  How should data be documented?
  • 3. 5-3 Objective of Data Collection and Analysis The goal of data collection and analysis it to come up with descriptive information and statistics that define the behavior of the system.
  • 4. 5-4 What Types of Data Should Be Gathered?  Structural  Operational  Numerical
  • 5. 5-5 Structural Data  Resources used  Workstations and buffers  Entity types  Paths and conveyors
  • 6. 5-6 Operational Data  Routings  Arrivals  Work Schedules (shifts)  Decision logic
  • 7. 5-7 Numerical Data  Resource quantities  Buffer sizes  Operation times  Move times  Interarrival times  Batch sizes  Time between failures Requires some statistical analysis
  • 8. 5-8 How Should Data Be Gathered? 1. Determine data requirements. 2. Identify reliable data sources. 3. Collect the data. 4. Summarize the data 5. Document and approve data.
  • 9. 5-9 Suggestions for Data Gathering  Define the problem/objective for the simulation (maximize throughput, minimize cycle time, etc.)  Identify only factors that bear on the problem (operation times, resource scheduling, etc.)  Focus on input variables, not response variables (flow times, throughput rates, utilizations, etc.)  Separate delay times from delay conditions (e.g. getting a resource vs. activity time).  Look for common groupings (e.g., part families)  Focus on essence (i.e. time, conditions), not substance (i.e. how the activity is performed).  Look for triggering events (What triggers entity movement? What triggers a machine setup?).
  • 10. 5-10 Look for the Constraints  A system constraint is anything that keeps everything from happening all at once.  Constraints include time delays, conditional delays due to unavailable resources, parts, etc.
  • 11. 5-11 Use a Questionnaire  Organizes and simplifies the data gathering process.  Helps ensure all issues are addressed.  Can send it to process owners in advance and leave a copy afterwards
  • 12. 5-12 Sources of Data  Historical records (production, sales, scrap rates, equipment reliability)  System documentation (process plans, facility layouts, work procedures)  Personal Observation (facility walk- through, time studies, work sampling)  Interviews (operators, maintenance personnel, engineers, managers)
  • 13. 5-13 Sources of Data (cont.)  Comparative systems (same or similar industries)  Vendor claims (cycle times, equipment reliability)  Design estimates (process times, move times, etc. for a new system)  Literature (published research on learning curves, predetermined time studies, etc.)
  • 14. 5-14 What to Avoid  Opinions when actual times can be determined.  Taking only one or two sample times.  Taking samples from only one day or only one operator then applying the results to many operators.
  • 16. 5-16 Systematic Data Collection 1. Define overall flow logic. 2. Describe each process step 3. Give specific values
  • 17. 5-17 Defining the Process Flow Station 1 Station 2 Station 3 Station 4 Station 5 Product A
  • 18. 5-18 Description of Operation
      Location           Activity Time  Activity Resource  Next Location      Move Trigger            Move Time  Move Resource
      Check-in Counter   N(1,.2) min.   Secretary          Waiting Room       None                    .2 min.    None
      Waiting Room       None           None               Exam Room          When room is available  .8 min.*   Nurse
      Exam Room          N(15,4) min.   Doctor             Check-out Counter  None                    .2 min.    None
      Check-out Counter  N(3,.5) min.   Secretary          Exit               None                    .2 min.    None
      [Diagram labels: Patient, Check-in Counter, Waiting Room, Exam Room (3), Check-out Counter]
  • 20. 5-20 Random System Variables  Defining characteristics of a system that vary in value from one observation to the next (e.g. cycle time, boxes per pallet).  Qualitative variable -- A non-numerically valued variable.  Quantitative variable -- A numerically valued variable.  Discrete variable -- A quantitative variable whose possible values form a finite set of specific values (categories for qualitative data, whole numbers for quantitative data).  Continuous variable -- A quantitative variable whose possible values can vary infinitely within a range.
  • 21. 5-21 Characterizing Random Variables  You don’t need to be a professional statistician; all you need is a basic knowledge of:  descriptive statistics (describe the data)  data analysis (looks for correlations in the data)  distribution fitting (determines the appropriate probability distribution to represent the data)
  • 22. 5-22 Discrete vs. Continuous Variables  Continuous – The variable can take on any value within a range (e.g. height, weight, time)  Discrete – The variable can take only select values within a range (e.g. gender, patient class, part type, counts)
  • 23. 5-23 Data for a Variable  A random system variable is defined by gathering sample data on the variable.  For random variables, the more data that are gathered (i.e., the larger the sample size), the more accurate the characterization is of the variable.
  • 24. 5-24 Data Groupings  Class – Category or range of values for grouping data.  Frequency -- The number of observations that fall in a class.  Frequency distribution -- A listing of all classes along with their frequencies.  Relative frequency -- The ratio of the frequency of a class to the total number of observations.  Relative-frequency distribution -- A listing of all classes along with their relative frequencies.
  • 25. 5-25 Histograms vs. Bar Charts  Histograms: •Used for quantitative variables •Horizontal axis shows range •Bars touch each other  Bar charts: •Used for categorical variables •Horizontal axis shows category •Bars don’t touch each other
  • 26. 5-26
      Age         Frequency  Relative Frequency  Percent
      25 to < 33  5          5/50 = .10          10%
      33 to < 41  14         14/50 = .28         28%
      41 to < 49  13         13/50 = .26         26%
      49 to < 57  9          9/50 = .18          18%
      57 to < 65  7          7/50 = .14          14%
      65 to < 73  2          2/50 = .04          4%
      A histogram is used to show frequency or percentage by interval (data will always be numerical).
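The grouping terms above (class, frequency, relative frequency) can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` and the list of ages are invented here and are not the raw data behind the slide's table; only the class layout (six classes of width 8 starting at 25) follows the slide.

```python
from collections import Counter

def frequency_table(values, start, width, n_classes):
    """Group values into equal-width classes [lo, hi) and return one
    (lo, hi, frequency, relative_frequency) tuple per class."""
    counts = Counter()
    for v in values:
        idx = int((v - start) // width)   # index of the class containing v
        if 0 <= idx < n_classes:
            counts[idx] += 1
    total = len(values)
    return [
        (start + i * width, start + (i + 1) * width, counts[i], counts[i] / total)
        for i in range(n_classes)
    ]

# Hypothetical ages, invented for illustration.
ages = [25, 31, 34, 38, 40, 41, 45, 48, 52, 56, 58, 64, 66, 72]
table = frequency_table(ages, start=25, width=8, n_classes=6)
for lo, hi, freq, rel in table:
    print(f"{lo} to < {hi}: frequency={freq}, relative frequency={rel:.2f}")
```

The relative frequencies always sum to 1 as long as every observation falls in one of the classes.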
  • 27. 5-27 Bar chart or Histogram?
  • 28. 5-28 Histograms reveal the shape, center, and spread of a variable  Shape refers to the shape formed by the bars of the histogram  Center refers to the mean of the variable. If the histogram were an object, it would “balance” on the mean. (The point with half the area to the left and half to the right is the median, which equals the mean only for symmetric data.)  Spread refers to how far dispersed the data values are
  • 29. 5-29 Measures of Center or “Location”  Mean  Median  Mode
  • 30. 5-30 Calculating Sample Mean  Formula: $\bar{X} = \frac{\sum X_i}{n}$  That is, add up all of the data points and divide by the number of data points.  Data (# of classes skipped): 2 8 3 4 1  Sample Mean = (2+8+3+4+1)/5 = 3.6  Do not round! Mean need not be a whole number.
  • 31. 5-31 Median  Another name for 50th percentile.  Appropriate for describing measurement data.  “Robust to outliers,” that is, not affected much by unusual values.
  • 32. 5-32 Mode  The value that occurs most frequently.  One data set can have many modes.  Appropriate for all types of data, but most useful for categorical data or discrete data with only a few possible values.
  • 33. 5-33 Histograms can be unimodal, multimodal or uniform
  • 34. 5-34 A histogram can show you if there are outliers in the data.
  • 35. 5-35 Measures of Spread  Range  Variance  Standard Deviation
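As a small sketch, the measures of center and spread listed above can be computed with Python's standard `statistics` module. The data are the five values from the sample-mean slide; the choice of the sample (n - 1) variance is an assumption, since the slides do not say which form is intended.

```python
import statistics

data = [2, 8, 3, 4, 1]  # number of classes skipped, from the sample-mean slide

# Measures of center
mean = statistics.mean(data)         # (2+8+3+4+1)/5 = 3.6
median = statistics.median(data)     # middle value of the sorted data = 3
modes = statistics.multimode(data)   # every value occurs once, so all five tie

# Measures of spread
data_range = max(data) - min(data)   # 8 - 1 = 7
variance = statistics.variance(data) # sample variance (divides by n - 1) = 7.3
std_dev = statistics.stdev(data)     # square root of the sample variance

print(mean, median, modes, data_range, variance, round(std_dev, 3))
```

Note how the single large value (8) pulls the mean above the median, which is the sense in which the median is robust to outliers.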
  • 36. 5-36 How and why should data be analyzed?  Data analysis ensures that your data is meaningful and useful.  Types of analysis include:  Test for independence (randomness).  Test for homogeneity (same source).  Test for stationarity (non varying over time).
  • 37. 5-37 Testing for Independence (Randomness)  Scatter Plot  Autocorrelation Plot  Runs Test These tests can be run using Stat::Fit which can be run “Stand-alone” or from the ProModel Tools menu.
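For readers without Stat::Fit at hand, a runs test can be sketched directly. The version below is a generic runs-above-and-below-the-median test with a normal approximation; it is an illustration only and not necessarily the variant Stat::Fit implements. The function name and the two example sequences are hypothetical.

```python
import math
import statistics

def runs_test_z(x):
    """Z statistic for the number of runs above/below the median.
    Values equal to the median are dropped. Assumes at least one value
    on each side of the median."""
    med = statistics.median(x)
    signs = [v > med for v in x if v != med]
    n1 = sum(signs)               # observations above the median
    n2 = len(signs) - n1          # observations below the median
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (runs - expected) / math.sqrt(variance)

# Hypothetical sequences: a steady drift gives too few runs (z very negative);
# strict alternation gives too many runs (z very positive).
drifting = list(range(1, 21))
alternating = [1, 2] * 10
z_drifting = runs_test_z(drifting)
z_alternating = runs_test_z(alternating)
print(round(z_drifting, 2), round(z_alternating, 2))
```

At a 5% level of significance, |z| greater than about 1.96 would suggest the observations are not independent.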
  • 38. 5-38 Stat::Fit  Stat::Fit is the data analysis and distribution fitting software package bundled with PROMODEL products.  You can enter or import a dataset into Stat::Fit for distribution fitting.  You can copy and paste the fitted distribution parameters into a ProModel model.
  • 39. 5-39 Entering Data in Stat::Fit  Type values.  Open a .dat file (File  Open).  Copy and paste from spreadsheet.
  • 40. 5-40 Scatter Plot  Tests for Independence.  Plots successive pairs of data as x,y values (n-1 points).  Random scatter of points indicates independence.  If data are correlated, the points will fall along a line or curve.
  • 41. 5-41 Scatter Plot for 100 Inspection Times
  • 42. 5-42 Scatter Plot for 100 Temperatures
  • 43. 5-43 Autocorrelation Plot  Another test for independence.  Independence is ascertained by computing autocorrelations for data values at varying time lags.  If independent, such autocorrelations should be near zero for any and all time-lag separations.
  • 46. 5-46 Data that tends to be nonhomogeneous  Activity times that take longer or shorter depending on the type of entity being processed.  Inter-arrival times that fluctuate in length depending on the time of day or day of the week.  Time between failures and time to repair where the failure may result from a number of different causes.
  • 47. 5-47 Testing for Identically Distributed (Homogeneous) Data  [Figure: Bimodal distribution of downtimes indicating multiple causes. Axes: repair time vs. frequency of occurrence; peaks labeled part jams and mechanical failures.]
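The scatter-plot pairs and the autocorrelation described above can be sketched as follows. The function names and both data sets are hypothetical stand-ins (independent "inspection times" and steadily drifting "temperatures"), generated here only to show the contrast; they are not the data behind the slides' plots.

```python
import random

def lag_pairs(x):
    """Successive pairs (x[i], x[i+1]) used for the scatter plot: n - 1 points."""
    return list(zip(x[:-1], x[1:]))

def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return num / denom

rng = random.Random(0)
independent = [rng.gauss(10, 2) for _ in range(1000)]  # hypothetical independent times
trending = [20 + 0.1 * i for i in range(100)]          # hypothetical drifting readings

r_independent = autocorrelation(independent, 1)  # expected to be near zero
r_trending = autocorrelation(trending, 1)        # expected to be close to one
print(len(lag_pairs(trending)), round(r_independent, 3), round(r_trending, 3))
```

Plotting `lag_pairs(...)` for the independent data would give a shapeless cloud, while the drifting data would fall along a line.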
  • 48. 5-48 Nonstationary (time-variant) Data  Behavior that changes over time Examples:  Customer arrivals  Equipment reliability
  • 49. 5-49 Non-stationary Data  [Figure: Change in rate of customer arrivals between 10 a.m. and 6 p.m. Axes: time of day vs. rate of arrival.]
  • 50. 5-50 Three ways to represent data  Use actual data -- e.g. read from text file  Use a frequency table -- called an empirical or user-defined distribution  Use a standard distribution -- best guess or Stat::Fit
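The second option, a frequency table used as an empirical (user-defined) distribution, can be sketched with the standard library. The table of batch sizes below is invented for illustration; `random.choices` draws values in proportion to the supplied weights.

```python
import random

# Hypothetical frequency table: observed batch size -> number of observations.
observed = {1: 12, 2: 30, 3: 45, 4: 13}

values = list(observed)
weights = list(observed.values())

rng = random.Random(42)  # fixed seed so the sketch is repeatable
samples = rng.choices(values, weights=weights, k=10_000)

# The sampled proportions should track the relative frequencies in the table.
share_of_threes = samples.count(3) / len(samples)
print(round(share_of_threes, 3))  # table relative frequency is 45/100 = 0.45
```

An empirical distribution reproduces the observed data faithfully but can never generate a value that was not observed, which is one reason a fitted standard distribution is often preferred.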
  • 51. 5-51 Probability Distributions  A Probability Distribution defines all possible values of a system variable plotted against their respective probabilities.  Distributions can be either discrete (probability mass function) or continuous (probability density function).
  • 52. 5-52 Bernoulli  The output of a process is either defective or non-defective  An employee shows up for work or not  An operation is required or not  [Figure: probability mass function f(x) for x = 0, 1]
  • 53. 5-53 Binomial  The number of defective items in a batch.  The number of customers of a particular type that enter the system.  The number of employees out of a group of employees who call in sick on a given day.  [Figure: probability mass function f(x) vs. x]
  • 54. 5-54 Poisson  The number of entities arriving each hour.  The number of defects per item.  The number of times a resource is interrupted each hour.  [Figure: probability mass function f(x) vs. x]
  • 55. 5-55 Geometric  The number of machine cycles before a failure occurs.  The number of items inspected before a defective item is found.  The number of customers processed before a particular type is encountered  [Figure: probability mass function f(x) vs. x]
  • 56. 5-56 Uniform  The type of an incoming entity given that each possible type is equally likely to occur  [Figure: f(x) vs. x between a and b]
  • 57. 5-57 Triangular  Good first approximation to the true underlying distribution when data is sparse and no distribution fitting analysis has been performed  [Figure: density f(x) vs. x with points labeled a, m, c]
  • 58. 5-58 Normal  Popular but rarely a true representation of actual data.  [Figure: density f(x) vs. x]
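  • 59. 5-59 Exponential  Intervals between occurrences such as the time between customer arrivals.  Certain repair times or activities such as the duration of telephone conversations.  Counterpart of the Poisson: if the number of occurrences per interval is Poisson, the times between occurrences are exponential.  [Figure: density f(x) vs. x]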
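The link between the exponential and Poisson distributions can be checked with a short simulation: generate exponential interarrival times, then count arrivals per hour. The function name, the rate of 6 per hour, and the 2,000-hour horizon are assumptions chosen only for illustration.

```python
import random
import statistics

def arrivals_per_hour(rate_per_hour, hours, rng):
    """Generate arrivals with exponential interarrival times and
    return the number of arrivals that fall in each whole hour."""
    counts = [0] * hours
    t = rng.expovariate(rate_per_hour)      # time of first arrival, in hours
    while t < hours:
        counts[int(t)] += 1
        t += rng.expovariate(rate_per_hour) # add the next interarrival time
    return counts

rng = random.Random(7)
counts = arrivals_per_hour(rate_per_hour=6, hours=2000, rng=rng)

# For a Poisson count, the mean and the variance both equal the rate.
mean_count = statistics.mean(counts)
var_count = statistics.variance(counts)
print(round(mean_count, 2), round(var_count, 2))
```

Both printed values should land near 6, which is the signature of Poisson-distributed hourly counts.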
  • 60. 5-60 Beta  Random proportions such as the percentage of defective items in a lot  Activity times, particularly when multiple tasks make up the activity (PERT analysis is based on beta distribution).  [Figure: density f(x) on 0 to 1.0 for three shape-parameter pairs: (0.5, 2), (1.0, 2.0), (.5, .5)]
  • 61. 5-61 Lognormal  Manual activities such as assembly, inspection or repair.  The time between failures is often lognormally distributed.  [Figure: density f(x) vs. x]
  • 62. 5-62 Gamma  Manual tasks such as service times or repair times  [Figure: density f(x) vs. x]
  • 63. 5-63 Weibull  Used in reliability theory for defining the time until failure, particularly due to items (e.g. bearings, tooling, etc.) that wear  [Figure: density f(x) vs. x]
  • 64. 5-64 Bounded vs. Boundless Distributions  Bounded distributions can prevent extreme values that really are likely from occurring.  Boundless distributions can cause extreme values that are really unlikely to occur.
  • 65. 5-65 Fitting Distributions to Data Using Stat::Fit 1. Enter or import data as previously discussed. 2. Plot the data and look at parameters to get a sense of the shape of the data. 3. Select distributions to fit and analysis to use. 4. Run the analysis and view rankings. 5. Make a selection.
  • 66. 5-66 Plotting the Data To plot the raw data you have imported select Input  Input Graph The shape of the input graph can help determine the appropriate distribution.
  • 67. 5-67 Descriptive Statistics If desired, you can view descriptive statistics (data parameters) for the data to get an idea of the center and spread. Select Statistics  Descriptive
  • 68. 5-68 Setup Select Fit  Setup The window that comes up allows you to select the distributions Stat::Fit will fit to the data set.
  • 69. 5-69 Setup By selecting the Calculations tab, you can change the tests to be run, the estimators to be used, and the level of significance.
  • 70. 5-70 Estimators Stat::Fit allows you to change the estimation technique utilized to fit the distributions. MLE’s (Maximum Likelihood Estimators) are generally preferred, but in some cases the MLE estimator doesn’t exist so you must use Moments.
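To make the two estimation techniques concrete, here is a minimal sketch that is independent of Stat::Fit. It shows the maximum likelihood estimate of an exponential rate (which has a simple closed form) and method-of-moments estimates for a gamma distribution. The function names and the simulated data are assumptions for illustration.

```python
import random
import statistics

def exponential_mle_rate(data):
    """MLE of the exponential rate: 1 / sample mean.
    (For the exponential, the moments estimate is the same.)"""
    return 1 / statistics.mean(data)

def gamma_moments(data):
    """Method-of-moments estimates (shape, scale) for a gamma distribution,
    from mean = shape * scale and variance = shape * scale**2."""
    m = statistics.mean(data)
    v = statistics.variance(data)
    return m * m / v, v / m

rng = random.Random(3)
# Hypothetical repair times drawn from a gamma with shape 2 and scale 3.
repair_times = [rng.gammavariate(2, 3) for _ in range(20_000)]
shape_hat, scale_hat = gamma_moments(repair_times)

# Hypothetical interarrival times drawn from an exponential with rate 0.5.
gaps = [rng.expovariate(0.5) for _ in range(20_000)]
rate_hat = exponential_mle_rate(gaps)
print(round(shape_hat, 2), round(scale_hat, 2), round(rate_hat, 3))
```

Moments estimates only require matching the sample mean and variance, which is why they remain available when a maximum likelihood estimate cannot be found.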
  • 71. 5-71 Tests You can choose to perform any combination of the three goodness of fit tests that are available. All three tests measure the extent that the fit distribution models the data set but in different ways.
  • 72. 5-72 Chi-Squared Test 1. Compares actual counts (values from the input dataset) with expected counts (values from the estimated distribution) 2. Derives p-value from how much these values differ 3. Better with larger sample sizes
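The comparison of actual and expected counts can be sketched by hand. The example below assumes an exponential fit with a mean of 2 minutes estimated from 100 observations; the bin edges, the observed counts, and the function names are all hypothetical. Instead of a p-value, the statistic is compared against the standard tabulated 5% critical value for 3 degrees of freedom (5 bins, minus 1, minus 1 estimated parameter).

```python
import math

def expected_counts_exponential(edges, mean, n):
    """Expected counts for bins [edges[i], edges[i+1]) plus a final
    open-ended bin, under an exponential distribution with the given mean."""
    cdf = [1 - math.exp(-e / mean) for e in edges] + [1.0]
    return [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

def chi_squared_statistic(observed, expected):
    """Sum over bins of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

edges = [0, 1, 2, 3, 4]            # bins: 0-1, 1-2, 2-3, 3-4, 4 and above
observed = [42, 21, 15, 10, 12]    # hypothetical counts, n = 100
expected = expected_counts_exponential(edges, mean=2.0, n=100)

stat = chi_squared_statistic(observed, expected)
CRITICAL_5PCT_DF3 = 7.815          # chi-squared table value, alpha = 0.05, df = 3
print(round(stat, 3), "reject" if stat > CRITICAL_5PCT_DF3 else "do not reject")
```

With these made-up counts the statistic is well below the critical value, so the exponential fit would not be rejected.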
  • 73. 5-73 Kolmogorov-Smirnov Test 1. Difference between cumulative distribution of data and fit distribution 2. Most conservative, least likely to reject the correct distribution in error
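The Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical cumulative distribution of the data and the cumulative distribution of the fit. A minimal sketch, with a hypothetical function name and a tiny made-up data set tested against a uniform distribution on 0 to 1:

```python
import math

def ks_statistic(data, cdf):
    """Largest absolute difference between the empirical CDF of the data
    and the candidate CDF, checked just before and at each data point."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def uniform_cdf(x):
    """CDF of a uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

data = [0.1, 0.3, 0.5, 0.7, 0.9]   # hypothetical observations
d_stat = ks_statistic(data, uniform_cdf)

# Common large-sample approximation to the 5% critical value when the
# distribution is fully specified in advance; shown only as a rough guide
# (it is not accurate for a sample this small).
approx_critical = 1.36 / math.sqrt(len(data))
print(round(d_stat, 3), round(approx_critical, 3))
```

When the distribution's parameters are estimated from the same data, as in distribution fitting, the appropriate critical values differ from this approximation.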
  • 74. 5-74 Anderson-Darling Test 1. Like Kolmogorov-Smirnov, but gives a heavier weight to differences in the tails of the distribution 2. Good for any sample size 3. Not good for discrete data
  • 75. 5-75 Tests  The Hypotheses:  H0: The distribution fit is in fact the correct distribution to describe the variable of interest.  H1: The distribution fit is NOT the correct distribution to describe the variable of interest.
  • 76. 5-76 How often are you willing to be wrong? You set the value of the level of significance based on the answer to the question above. The level of significance is the probability that the test rejects a distribution that does accurately describe the data.
  • 77. 5-77 Errors  There are two types of errors that can be made when performing a statistical test:  Type I: you reject H0 when in fact H0 is true  Type II: you accept H0 when in fact H1 is true  The level of significance you choose IS the probability of a Type I error
  • 78. 5-78 Fitting the Data There are two ways to perform the goodness of fit tests:  Select the Auto::Fit button from the toolbar  Select Fit  Goodness of Fit
  • 79. 5-79 Auto::Fit Within the Auto::Fit window you can select to fit continuous or discrete distributions. If the distribution has a lower bound, that value can be specified here.
  • 80. 5-80 Auto::Fit Using Auto::Fit, the distributions are automatically ranked according to which seem to fit the data the best.
  • 81. 5-81 Goodness of Fit Tests The test results are given along with the actual distribution fit. Each test has a result: Reject or Do Not Reject. Do Not Reject means there was not enough evidence to conclude it is not the correct distribution to describe the data.
  • 82. 5-82 Distribution Graph After picking out the top few distributions, it can be useful to graph the fit distribution against the data. Select Fit  Results Graph  Comparison
  • 83. 5-83 Picking a Fit  Compare test results.  Compare graphs.  Use what you know about the process.
  • 84. 5-84 Exporting Once you have selected a fit, you need to export it to the PROMODEL product. Select File  Export  Export Fit or the Export button from the toolbar.
  • 85. 5-85 Exporting  Select the application you would like to export to (PROMODEL Products)  Select the distribution to export
  • 86. 5-86 Exporting  The precision box allows you to change the number of decimal places in the distribution parameters  Select OK  The distribution is now in the correct form to paste into your model
  • 87. 5-87 Stat::Fit  What to avoid:  Small samples  Using all goodness of fit tests – it increases the Type I error rate  Taking the distribution into the model without exporting
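A rough way to see why running every goodness of fit test inflates the Type I error rate: suppose a distribution is discarded if any one of k tests rejects it, each test is run at level of significance α, and the tests are independent. Independence is an assumption made here only for illustration; tests applied to the same data set are correlated, so the real inflation is smaller than this figure, though the combined rate is still at least α.

```latex
P(\text{at least one false rejection}) = 1 - (1 - \alpha)^k,
\qquad \alpha = 0.05,\ k = 3:\quad 1 - 0.95^3 = 0.142625 \approx 0.14
```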
  • 89. 5-89 Best Distribution Fit for Inspection Times
  • 91. 5-91 Appropriate Adjustments  Remember, you fit a distribution to historical data, not necessarily to the data reflecting the design period.  Don’t forget to adjust the data to reflect the period of interest.  Is there a growth rate to factor in?  Is there a learning curve to consider?
  • 92. 5-92 Handling Rare Behavior  Repeating behavior – e.g. Occasional abnormally long downtimes.  Can include if not too infrequent.  Can model once like non-repeating.  Non-repeating behavior – e.g. Labor strike  Throw one in to see what happens.
  • 93. 5-93 Absence of Data  A single, most likely or mean value  Minimum and maximum values defining a range  Minimum, most likely and maximum values  Use sensitivity analysis:  Best case  Worst case  Most likely case
  • 94. 5-94 Assumptions  All models are based on assumptions.  Relative comparisons may still be valid.  Sensitivity analysis can show crucial assumptions.
  • 95. 5-95 Data Documentation and Approval  Tables  Flow Diagrams  Assumption Lists  Exclusion Lists  Sources Used  Consent (not necessarily validation) from users and decision makers
  • 96. 5-96 Use of Flowchart  [Flowchart: 19", 21" & 25" monitors flow through Station 1, Station 2, Station 3, and Inspection; flow labels: 19" & 21" Monitor, 25" Monitor, Rejected Monitors, Reworked Monitors]
  • 97. 5-97 Use of Tables
      Entity       Station     Opn. Time (min, mode, max)
      19" Monitor  Station 1   .8, 1, 1.5
                   Station 2   .9, 1.2, 1.8
                   Inspection  1.8, 2.2, 3
      21" Monitor  Station 1   .8, 1, 1.5
                   Station 2   1.1, 1.3, 1.9
                   Inspection  1.8, 2.2, 3
      25" Monitor  Station 1   .9, 1.1, 1.6
                   Station 2   1.2, 1.4, 2
                   Inspection  1.8, 2.3, 3.2
                   Station 3   .5, .7, 1
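A (min, mode, max) triple like those in the table maps naturally onto a triangular distribution. The sketch below samples operation times for the 19" monitor at Station 1 using Python's `random.triangular`; treating the triple as triangular parameters is an assumption made here for illustration, and the variable names are hypothetical.

```python
import random
import statistics

# (min, mode, max) for the 19" monitor at Station 1, from the table.
low, mode, high = 0.8, 1.0, 1.5

rng = random.Random(1)
# Note the argument order of random.triangular: low, high, then mode.
op_times = [rng.triangular(low, high, mode) for _ in range(20_000)]

sample_mean = statistics.mean(op_times)
theoretical_mean = (low + mode + high) / 3   # mean of a triangular distribution
print(round(sample_mean, 3), round(theoretical_mean, 3))
```

Because the triangular distribution is bounded, every sampled time stays between the stated minimum and maximum.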
  • 98. 5-98 Use of Operation Rules  Handling Defective Monitors  Defective monitors are detected at inspection and routed to whichever station created the problem.  Monitors waiting at a station for rework have a higher priority than first-time monitors.  Corrected monitors are routed back to inspection.  If a reworked monitor fails a second time, it is removed from the system.
  • 99. 5-99 Use of an Assumption List  Assumptions  No downtimes are considered (downtimes are rare).  Operators are dedicated at each workstation and are always available during the scheduled work time.  Rework times are half of the normal operation times.
  • 100. 5-100 Remember… Information is never accurate or sufficient enough to make a “no risk” decision. But the better the information, the less risky the decision.
