ISEN614 (ADVANCED QUALITY CONTROL)
PROJECT REPORT
Fall-2015
Naman Kapoor
Rahul Garg
Vinayak Nair
ISEN-614 Project Fall-2015
Index
1.) Executive Summary
2.) Introduction
3.) Approach
4.) Justification
5.) Results
6.) Conclusion
Executive Summary
The objective of this project was to develop a procedure for separating the in-control data points from the
out-of-control data points in a sample taken from a manufacturing process, so that the distribution parameters can be
estimated and a monitoring scheme can be set up for anomaly or change detection in the future process. The available
dataset had 552 records and 209 dimensions. It was therefore imperative to reduce the dimension of the dataset so that
the many small individual noise components do not add up, overwhelm the signal and make anomaly detection more
difficult. Although T², m-CUSUM and m-EWMA charts are available for change detection, they are extremely difficult to
apply to a data set of such high dimension.
We conducted Principal Component Analysis (PCA) on the dataset to reduce its dimension by selecting the ‘vital few’
dimensions over the ‘trivial many’. After performing PCA, we analyzed the Scree plot, the Pareto plot and the Minimum
Description Length (MDL) to choose the number of principal components (PCs) that explain most of the variance in the
dataset while remaining uncorrelated with one another. We found that the first four PCs were enough to explain about
80% of the variability in the given data set. We then plotted multiple univariate charts for the four selected PCs to
find the out-of-control data points and eliminate them. Univariate charts are a good option because there is no
correlation between the PCs, and hence there is no need to monitor changes in correlation. Several iterations were
needed before all the data points were in control and the data was clean enough to estimate distribution parameters for
future detection.
Working on this project was insightful in that we worked with a real high-dimensional manufacturing dataset and learned
how the theoretical concepts of PCA are applied in industry. It also helped us understand the implications of the
principle of effect sparsity and of spectral decomposition. All in all, the project was a good exercise in carrying out
phase-1 analysis for a data set of very high dimension and in using MATLAB to support it.
Introduction
In today’s world of cutthroat competition and large-scale production, only manufacturers that provide good-quality
products and services meeting or exceeding customer expectations can survive. Ongoing manufacturing processes need to be
monitored continuously so that any change in the process can be identified quickly and rectified to prevent production
loss. To help detect any change or anomaly in the process, we have control charts at our disposal, which are widely used
in manufacturing today because of their effectiveness and accuracy. Depending on the type of data (continuous or
discrete, univariate or multivariate), there are various options to choose from, ranging from the univariate x-bar chart
to the m-CUSUM or m-EWMA chart. If analysis of the control chart indicates that the process is currently in control
(i.e., it is stable, with variation coming only from sources common to the process), then no corrections or changes to
process control parameters are needed or desired. In addition, data from the process can be used to predict its future
performance. If the chart indicates that the monitored process is not in control, analysis of the chart can help
determine the sources of variation.
We use Principal Component Analysis (PCA) as the data reduction tool here. PCA converts correlated variables into
uncorrelated variables, the principal components, by spectral decomposition. From these uncorrelated principal
components we take a subset that explains most of the variation in the data. This also eliminates the noise contributed
by the large number of individual components and makes the signal stronger.
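In symbols, the spectral decomposition step can be written as below. The notation is introduced here for illustration and is an assumption, not taken from the report: x_i is the i-th of m observations of dimension p, S is the sample covariance matrix, the columns of V are its eigenvectors, and y_i is the vector of PC scores.

```latex
S = \frac{1}{m-1}\sum_{i=1}^{m} (x_i - \bar{x})(x_i - \bar{x})^{T}
  = V \Lambda V^{T}, \qquad
\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p), \quad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0

y_i = V^{T}(x_i - \bar{x}), \qquad
\mathrm{Cov}(y) = V^{T} S V = \Lambda

\frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{p} \lambda_j}
  = \text{fraction of the total variance captured by the first } k \text{ PCs}
```

Because Λ is diagonal, the PC scores are mutually uncorrelated, and the variance of the j-th score equals λ_j. Keeping only the first k scores therefore keeps the k largest variance contributions.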
After selecting the vital few principal components, we can clean the data set, that is, remove all out-of-control points
as part of the phase-1 analysis. Phase-1 analysis estimates the distribution parameters of the in-control data so that a
monitoring scheme can be developed to detect any change or anomaly in future observations. In this particular case we
chose four PCs, as these four PCs captured about 80% of the variation in the data. Several iterations were carried out
until all the data points were in control and the data was good enough to estimate the distribution parameters and to
carry on with the phase-2 analysis, which monitors the future process.
Approach
Dataset Interpretation
The dataset provided was multivariate and continuous, with 209 dimensions (columns) and 552 observations (rows), each
observation having a sample size of 1 (p = 209, m = 552, n = 1). The data set was judged to follow a normal distribution
from a plot of the profile of its mean. Because the observed distribution closely resembled the normal distribution, the
dataset was treated as continuous multivariate data in subsequent manipulations.
Dimensional Reduction
Since the dataset had 209 different variables (dimensions) with 552 observations each, it was difficult and, more
importantly, highly impractical to include all the variables in the monitoring process. Dimensional reduction was needed
to bring out the few variables that explain most of the variability in the data. The technique selected for identifying
the “vital few” dimensions was Principal Component Analysis (PCA). Since the information on the dataset is very limited
and we do not know the units of the variables, the PCs were calculated using both the covariance and the correlation
matrix.
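The reason unknown units matter can be illustrated with a small sketch. It assumes toy data and hypothetical variable names (`length_m`, `weight_kg`) that are not from the project dataset. Rescaling one variable changes the covariance matrix, and hence covariance-based PCs, while the correlation matrix stays the same.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance with the usual (n - 1) denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def cov_matrix(columns):
    return [[covariance(a, b) for b in columns] for a in columns]


def corr_matrix(columns):
    """Correlation matrix: covariance rescaled by the standard deviations."""
    s = cov_matrix(columns)
    sd = [math.sqrt(s[j][j]) for j in range(len(columns))]
    size = len(columns)
    return [[s[i][j] / (sd[i] * sd[j]) for j in range(size)] for i in range(size)]


# Hypothetical toy data, not taken from the project dataset.
length_m = [1.0, 2.0, 3.0, 4.0, 5.0]
weight_kg = [2.1, 3.9, 6.2, 7.8, 10.1]
length_mm = [1000.0 * v for v in length_m]  # same quantity, different unit

S_m = cov_matrix([length_m, weight_kg])
S_mm = cov_matrix([length_mm, weight_kg])
R_m = corr_matrix([length_m, weight_kg])
R_mm = corr_matrix([length_mm, weight_kg])

print("variance of length in m :", S_m[0][0])
print("variance of length in mm:", S_mm[0][0])
print("correlation (m) :", R_m[0][1])
print("correlation (mm):", R_mm[0][1])
```

With the length expressed in millimetres its variance dominates the covariance matrix, so a covariance-based first PC would point almost entirely along that variable. The correlation is the same in both cases.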
I. Method 1 - Calculation of PCs using the ‘Covariance’ matrix of the dataset.
Fig: Pareto chart of principal components
Fig: Scree plot of the eigenvalues of the principal components
Fig: Minimum Description Lengths of the Principal Components
With the help of the above charts, four principal components were identified that together explain over 80% of the
variability in the whole dataset.
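The Pareto-chart reading described above amounts to finding the smallest number of components whose cumulative share of the total variance reaches a target. A minimal sketch follows. It assumes a hypothetical eigenvalue list (not the project's eigenvalues) and an 80% threshold, and the function names are illustrative.

```python
def cumulative_variance_fraction(eigenvalues):
    """Cumulative share of total variance, largest eigenvalue first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    fractions = []
    running = 0.0
    for lam in ordered:
        running += lam
        fractions.append(running / total)
    return fractions


def components_needed(eigenvalues, threshold=0.80):
    """Smallest k such that the first k PCs explain at least `threshold`."""
    fractions = cumulative_variance_fraction(eigenvalues)
    for k, frac in enumerate(fractions, start=1):
        if frac >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues chosen for illustration only (they sum to 100).
example_eigenvalues = [40.0, 25.0, 10.0, 6.0, 3.0] + [2.0] * 8

print(cumulative_variance_fraction(example_eigenvalues)[:4])  # [0.4, 0.65, 0.75, 0.81]
print(components_needed(example_eigenvalues))                 # 4
```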
II. Method 2 - Calculation of PCs using the ‘Correlation’ matrix of the dataset.
Fig: Pareto chart of principal components
Fig: Scree plot of the eigenvalues of the principal components
Fig: Minimum Description Lengths of the Principal Components
With the help of the above charts, four principal components were identified that together explain over 90% of the
variability in the whole dataset.
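The report does not state which form of the Minimum Description Length criterion was plotted. The sketch below assumes one common form, MDL(l) = n(p - l) log(a_l / g_l) + l(2p - l) log(n) / 2, where a_l and g_l are the arithmetic and geometric means of the p - l smallest eigenvalues and n is the number of observations. The eigenvalues are hypothetical. Only n = 552 is taken from the report.

```python
import math


def mdl_scores(eigenvalues, n_obs):
    """MDL value for each candidate number of retained components l = 0..p-1.

    Assumes all eigenvalues are strictly positive.
    """
    lam = sorted(eigenvalues, reverse=True)
    p = len(lam)
    scores = []
    for l in range(p):
        tail = lam[l:]  # the p - l smallest eigenvalues
        arith = sum(tail) / len(tail)
        geo = math.exp(sum(math.log(v) for v in tail) / len(tail))
        fit = n_obs * (p - l) * math.log(arith / geo)
        penalty = l * (2 * p - l) * math.log(n_obs) / 2.0
        scores.append(fit + penalty)
    return scores


def mdl_choice(eigenvalues, n_obs):
    """Number of components that minimises the MDL score."""
    scores = mdl_scores(eigenvalues, n_obs)
    return scores.index(min(scores))


# Hypothetical spectrum: four large eigenvalues, then a flat noise floor.
toy_eigenvalues = [40.0, 25.0, 10.0, 6.0] + [1.0] * 6

print(mdl_choice(toy_eigenvalues, n_obs=552))  # 4
```

The fit term vanishes once the remaining eigenvalues are all equal, since their arithmetic and geometric means coincide. The penalty term then discourages keeping more components than necessary.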
Data Cleaning using Phase I analysis
After the dimensional reduction procedure, an appropriate control chart had to be selected for the phase I analysis,
whose purpose is to identify any out-of-control data points and eliminate them. PCA had already reduced the data to the
four principal components that explain most of the variability, so multiple univariate charts were selected for phase I
analysis: PCA de-correlates the variables by performing spectral decomposition on the covariance (or correlation) matrix
of the data, and the resulting principal components are uncorrelated with one another. This makes multiple univariate
control charts easier and more feasible to use, especially in a manufacturing setup. Using the multiple univariate
charts, the out-of-control data points were identified and eliminated through 7 iterations using method-I and 3
iterations using method-II.
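The iterate-and-remove loop can be sketched as follows. This is a simplified illustration under two stated assumptions: the control limits are taken as mean ± 3 sample standard deviations of each PC score (the report does not say how the UCL and LCL were computed), and the PC scores are held fixed, whereas the report recalculates the PCs after every removal. The data and names are hypothetical.

```python
import statistics


def control_limits(values, width=3.0):
    """Lower and upper limits as mean -/+ width * sample standard deviation."""
    center = statistics.mean(values)
    spread = statistics.stdev(values)
    return center - width * spread, center + width * spread


def phase_one_clean(score_columns, width=3.0, max_iter=50):
    """Repeatedly drop rows that fall outside the limits of any PC chart.

    score_columns: list of equal-length lists, one list of scores per PC.
    Returns (kept_row_indices, history), where history[i][j] is the number
    of out-of-control points on PC j at iteration i.
    """
    kept = list(range(len(score_columns[0])))
    history = []
    for _ in range(max_iter):
        flagged = set()
        counts = []
        for column in score_columns:
            current = [column[i] for i in kept]
            lcl, ucl = control_limits(current, width)
            out = [i for i in kept if not (lcl <= column[i] <= ucl)]
            counts.append(len(out))
            flagged.update(out)
        history.append(counts)
        if not flagged:
            break  # every remaining point is in control on every chart
        kept = [i for i in kept if i not in flagged]
    return kept, history


# Hypothetical scores: PC1 has one gross outlier at row 10, PC2 has none.
pc1 = [1.0 if i % 2 == 0 else -1.0 for i in range(30)]
pc1[10] = 25.0
pc2 = [0.5 if i % 2 == 0 else -0.5 for i in range(30)]

kept_rows, history = phase_one_clean([pc1, pc2])
print(history)         # [[1, 0], [0, 0]]
print(len(kept_rows))  # 29
```

The `history` list has the same layout as the per-iteration summary tables in the Results section: one row per iteration, one count per PC, ending with a row of zeros.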
Justification
In this project, we deal with a multivariate data set with 552 records and 209 dimensions. When the number of
dimensions is high, the noise components can add up to a large magnitude even if the individual components are
relatively small. This reduces the signal-to-noise ratio and makes the monitoring process harder, a problem known as the
“curse of dimensionality”. It is the basic reason why the dimension of the data needs to be reduced.
By the principle of effect sparsity, it is always the "vital few" rather than the "trivial many" that matter. Detection
can be done effectively in a lower dimension if we can extract the “vital few”. Principal Component Analysis (PCA) is an
effective way to reduce the dimension of data: we identify the directions in which most of the variability (or
information) exists, and we monitor only these “vital few” directions. Since we have no information about the dataset,
and in particular do not know whether the relative magnitude of the deviations is important, we performed PCA using both
the covariance matrix and the correlation matrix.
Results
The following are the univariate control charts for the individual PCs obtained using the COVARIANCE matrix.
Iteration 0:
We see that there are many out of control points. They are removed from the dataset and the PCs and control limits are
recalculated.
Iteration 1:
We see that there are still many out of control points in the control charts. So, we continue with the iterations till we
have a dataset that is completely in control.
Iteration 2:
Iteration 3:
Iteration 4:
Iteration 5:
Iteration 6:
Iteration 7:
Finally, after 7 iterations, we obtain a dataset in which all points are in control with respect to the recalculated UCL
and LCL. This is the end of the Phase 1 analysis.
The following table summarizes the number of data points removed after every iteration.
Iteration No.   PC1   PC2   PC3   PC4
0                12     5     0     0
1                 6     7     0     0
2                 2     5     0     0
3                 0     6     0     0
4                 1     5     0     0
5                 0     1     0     0
6                 0     1     1     0
7                 0     0     0     0
The following are the univariate control charts for the individual PCs obtained using the CORRELATION matrix.
Iteration 0:
We see that there are many out-of-control points. As before, we remove them and recalculate the PCs, UCL and LCL.
Iteration 1:
Iteration 2:
Iteration 3:
The following table shows the number of out-of-control points for each PC in every iteration.
Iteration No.   PC1   PC2   PC3   PC4
0                 0     0     2     4
1                 0     0     2     3
2                 0     0     1     0
3                 0     0     0     0
Conclusion
After applying PCA to both the covariance matrix and the correlation matrix, we were able to reduce the dimension of the
data from 209 to 4, which is much easier to analyze. Selecting these vital few directions, the four principal
components, also eliminated unnecessary noise. Since we do not know whether the relative magnitudes of the deviations in
each dimension are important, it is better to use the correlation matrix for the future monitoring process. All of the
out-of-control data points were removed after seven iterations for method-1 (i.e., using the covariance matrix) and
after three iterations for method-2 (i.e., using the correlation matrix). This clean data is now good enough to estimate
the population distribution parameters.
Finally, we can conclude that after this phase-1 analysis, the estimated population parameters can be used to carry out
the phase-2 analysis. Individual x-bar charts will be a good choice for detecting larger mean shifts, since all four PCs
are uncorrelated and there is no risk of missing a change in correlation. m-CUSUM charts will be better suited to
detecting small sustained mean shifts.
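To illustrate why a CUSUM-type chart suits small sustained shifts, the sketch below applies a univariate tabular CUSUM to a single PC score. This is a simplification swapped in for illustration, not the multivariate m-CUSUM named above. The reference value k = 0.5 and decision interval h = 5 (both in standard-deviation units) are conventional textbook choices assumed here, and the data are hypothetical.

```python
def tabular_cusum(values, mu0, sigma0, k=0.5, h=5.0):
    """Return the index of the first CUSUM signal, or None if there is none.

    mu0 and sigma0 are the in-control mean and standard deviation estimated
    in phase 1; k and h are expressed in units of sigma0.
    """
    c_plus = 0.0
    c_minus = 0.0
    for i, x in enumerate(values):
        z = (x - mu0) / sigma0
        c_plus = max(0.0, z - k + c_plus)     # accumulates upward drift
        c_minus = max(0.0, -z - k + c_minus)  # accumulates downward drift
        if c_plus > h or c_minus > h:
            return i
    return None


# Hypothetical future scores: 10 in-control points, then a sustained
# one-sigma upward shift that a 3-sigma individual chart would not flag.
future_scores = [0.0] * 10 + [1.0] * 15

print(tabular_cusum(future_scores, mu0=0.0, sigma0=1.0))  # 20
```

Each shifted point adds 0.5 to the upper sum, so the chart signals on the eleventh shifted observation even though no single point comes near a 3-sigma limit.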
