3. DATASET INTERPRETATION
The dataset provided was multivariate and continuous, with 552 observations (rows) of 209 variables (columns)
and a sample size of 1 per observation (p = 209, m = 552, n = 1).
The observed distribution closely resembled the normal distribution, so the dataset was treated as continuous
multivariate normal data in the subsequent analysis.
4. METHODOLOGY
Method-I: Calculation of PCAs using the COVARIANCE matrix
Principal Component Analysis was applied to the data set to reduce the dimensions to a “vital few”.
[Figures: Pareto plot, scree plot, MDL plot]
Four principal components were selected, which together appeared to explain over 80% of the
variability in the dataset.
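The selection step can be sketched as follows. This is a minimal illustration, not the original analysis code: the function name `pca_covariance`, the `var_threshold` parameter, and the synthetic data standing in for the real 552 x 209 dataset are all assumptions made for the example.

```python
import numpy as np

def pca_covariance(X, var_threshold=0.80):
    """PCA from the sample covariance matrix.

    Returns the scores on the first k components, their loadings, and the
    cumulative explained-variance ratios, where k is the smallest number
    of components whose cumulative ratio reaches var_threshold.
    """
    Xc = X - X.mean(axis=0)                 # centre each column
    S = np.cov(Xc, rowvar=False)            # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh, because S is symmetric
    order = np.argsort(eigvals)[::-1]       # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum_ratio = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cum_ratio, var_threshold) + 1)
    scores = Xc @ eigvecs[:, :k]            # project onto the first k components
    return scores, eigvecs[:, :k], cum_ratio

# Synthetic stand-in (assumption): 552 observations of 209 variables
# driven by 4 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(552, 4)) * np.array([10.0, 8.0, 6.0, 5.0])
mixing = rng.normal(size=(4, 209))
X = latent @ mixing + rng.normal(size=(552, 209))

scores, loadings, cum_ratio = pca_covariance(X)
print(scores.shape[1], "components retained")
```

The cumulative ratios in `cum_ratio` are the quantities a Pareto or scree plot would display, so the same array could be used to reproduce those plots.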
5. Phase I Analysis
Since the principal components obtained were uncorrelated, it was decided to use 4 univariate
charts, one per component, to perform the Phase I analysis.
Control limits for each control chart were calculated from the 3-sigma boundaries (L = 3), which
gives a type-I error of α = 0.0027 per chart.
The decision rule adopted for monitoring was: "if any of the charts signals, the process is
considered out of control".
The combined type-I error (α) for the 4 control charts was 1 - (1 - 0.0027)^4 = 0.010756.
The approximate in-control average run length of the combined charts was ARL0 = 1/0.010756 ≈ 93.
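The arithmetic behind these two figures can be checked with a few lines of Python. The sketch assumes the four charts signal independently, which holds for uncorrelated components only if the scores are jointly normal. The variable names are illustrative.

```python
import math

L = 3.0                 # control-limit width in sigma units
n_charts = 4            # one chart per retained principal component

# Two-sided tail probability of a normal variable beyond +/- L sigma.
alpha_exact = math.erfc(L / math.sqrt(2))   # approximately 0.0027
alpha = 0.0027                              # rounded value used in the report

# Probability that at least one of the charts signals while in control.
alpha_combined = 1 - (1 - alpha) ** n_charts
arl0 = 1 / alpha_combined                   # in-control average run length

print(f"combined alpha = {alpha_combined:.6f}")   # 0.010756
print(f"ARL0 = {arl0:.1f}")                       # 93.0
```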
Phase I analysis was completed after 7 iterations of eliminating out-of-control data points and
recalculating the control limits.
Iteration No.   PC1   PC2   PC3   PC4
0                12     5     0     0
1                 6     7     0     0
2                 2     5     0     0
3                 0     6     0     0
4                 1     5     0     0
5                 0     1     0     0
6                 0     1     1     0
7                 0     0     0     0
Table: Out-of-control data points observed on each chart after each iteration
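An iterative Phase I procedure of this kind can be sketched as below. Several details are assumptions, since the report does not state them: sigma is estimated with the sample standard deviation (a moving-range estimate is also common for individuals data), an observation is dropped if it is flagged on any chart, and the function name `phase1_iterate` and the synthetic scores are purely illustrative.

```python
import numpy as np

def phase1_iterate(scores, L=3.0, max_iter=50):
    """Iteratively remove observations outside mean +/- L*sigma on any chart.

    Returns the retained scores and a list holding, for each iteration,
    the number of out-of-control points found on each chart.
    """
    data = np.asarray(scores, dtype=float)
    history = []
    for _ in range(max_iter):
        center = data.mean(axis=0)
        sigma = data.std(axis=0, ddof=1)          # assumption: sample std dev
        out = np.abs(data - center) > L * sigma   # (rows x charts) boolean mask
        history.append(out.sum(axis=0).tolist())
        if not out.any():
            break                                 # all charts in control
        data = data[~out.any(axis=1)]             # drop rows flagged on any chart
    return data, history

# Synthetic stand-in for the PC scores (assumption), with a few injected outliers.
rng = np.random.default_rng(1)
scores = rng.normal(size=(552, 4))
scores[:5, 0] += 8.0

clean, history = phase1_iterate(scores)
for i, counts in enumerate(history):
    print(i, counts)
```

Each printed row has the same layout as a row of the table: the iteration number followed by the count of out-of-control points per chart.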
6. Results
The final control charts obtained after the elimination of out-of-control data points were as follows:
7. Method-II: Calculation of PCAs using the CORRELATION matrix
Principal Component Analysis was applied to the data set to reduce the dimensions to a “vital few”.
[Figures: Pareto plot, scree plot, MDL plot]
Four principal components were selected, which together appeared to explain over 90% of the
variability in the dataset.
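To illustrate why the choice of matrix matters, the sketch below compares the explained-variance ratios from the covariance and correlation matrices of a small synthetic dataset in which one variable has a much larger scale. The data and the helper `explained_ratio` are assumptions for illustration and do not reproduce the report's results.

```python
import numpy as np

def explained_ratio(M):
    """Cumulative explained-variance ratios from a symmetric matrix M."""
    eigvals = np.linalg.eigvalsh(M)[::-1]     # eigenvalues in descending order
    return np.cumsum(eigvals) / eigvals.sum()

rng = np.random.default_rng(2)
X = rng.normal(size=(552, 6))
X[:, 0] *= 100.0                              # one variable on a much larger scale

cov_ratio = explained_ratio(np.cov(X, rowvar=False))
corr_ratio = explained_ratio(np.corrcoef(X, rowvar=False))

# Standardising the columns and taking the covariance gives the correlation matrix,
# so correlation-based PCA is covariance-based PCA on standardised variables.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
same = np.allclose(np.cov(Z, rowvar=False), np.corrcoef(X, rowvar=False))

print("covariance PCA, first component:", round(cov_ratio[0], 3))
print("correlation PCA, first component:", round(corr_ratio[0], 3))
print("standardised covariance equals correlation:", same)
```

With the covariance matrix, the large-scale variable dominates the first component. The correlation matrix puts every variable on an equal footing, which is the safer default when it is unknown whether the differences in scale are meaningful.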
8. Phase I Analysis
Since the principal components obtained were uncorrelated, it was decided to use 4 univariate
charts, one per component, to perform the Phase I analysis.
Control limits for each control chart were calculated from the 3-sigma boundaries (L = 3), which
gives a type-I error of α = 0.0027 per chart.
The decision rule adopted for monitoring was: "if any of the charts signals, the process is
considered out of control".
The combined type-I error (α) for the 4 control charts was 1 - (1 - 0.0027)^4 = 0.010756.
The approximate in-control average run length of the combined charts was ARL0 = 1/0.010756 ≈ 93.
Phase I analysis was completed after 3 iterations of eliminating out-of-control data points and
recalculating the control limits.
Table: Out-of-control data points observed on each chart after each iteration
Iteration No.   PC1   PC2   PC3   PC4
0                 0     0     2     4
1                 0     0     2     3
2                 0     0     1     0
3                 0     0     0     0
9. Results
The final control charts obtained after the elimination of out-of-control data points were as follows:
10. Conclusion
Since the data do not indicate whether the relative magnitudes of the deviations in the individual
variables are important, it would be better to proceed with the correlation matrix.
With the correlation matrix, 4 PCs are enough to explain more than 90% of the variation in the dataset.
After completing the Phase I analysis, the estimated distribution parameters can be used for the
Phase II analysis.
Individuals (X) charts, one per PC, are a good choice for detecting large mean shifts and spikes,
since the PCs are uncorrelated.
m-CUSUM (multivariate CUSUM) or individual CUSUM charts can be used in the Phase II analysis to
detect small sustained mean shifts.
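As an illustration of the individual CUSUM option, a two-sided tabular CUSUM on a single PC score could be sketched as follows. The reference value k = 0.5 and decision interval h = 5 (both in sigma units) are commonly used textbook defaults and are assumptions here, as are the function name `tabular_cusum` and the simulated data stream. The in-control mean and standard deviation would come from the Phase I estimates.

```python
import numpy as np

def tabular_cusum(x, mu0, sigma0, k=0.5, h=5.0):
    """Two-sided tabular CUSUM on standardised observations.

    mu0 and sigma0 are the in-control mean and standard deviation
    estimated in Phase I; k (reference value) and h (decision interval)
    are in sigma units. Returns the index of the first signal, or None.
    """
    c_plus = c_minus = 0.0
    for i, xi in enumerate(x):
        z = (xi - mu0) / sigma0
        c_plus = max(0.0, z - k + c_plus)     # accumulates upward drift
        c_minus = max(0.0, -z - k + c_minus)  # accumulates downward drift
        if c_plus > h or c_minus > h:
            return i
    return None

# Simulated stream (assumption): 100 in-control points, then a sustained 1-sigma shift.
rng = np.random.default_rng(3)
in_control = rng.normal(0.0, 1.0, size=100)
shifted = rng.normal(1.0, 1.0, size=100)
stream = np.concatenate([in_control, shifted])

print("first signal at index:", tabular_cusum(stream, mu0=0.0, sigma0=1.0))
```

Because the statistic accumulates small deviations over time, it reacts to a sustained shift that a 3-sigma individuals chart would often miss on any single point.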