Big Data &
The Trouble with ‘Normal’
Common Pitfalls in Capability/Performance Analysis
Barry Khor
barrykhor@gmail.com
All rights reserved
This document was developed with knowledge sharing in mind. Distribution and reproduction of this document, in part
or in whole, is freely encouraged provided authorship information is preserved.
“So let us not talk falsely now, the hour is getting late”
Bob Dylan
All Along the Watchtower
Preface
The author wishes to express that the underlying statistical techniques employed in these studies
are very basic and they are not the focus of this document. The focus is instead on the
interpretation of data and the care and discipline in examining the data for problem solving and
continuous improvement.
Secondly, this document is a work in progress. As such, the author welcomes comments and
critique.
However the document may evolve over time, it is the author’s hope that
readers may benefit from these studies in its current state at any time.
Barry Khor
barrykhor@gmail.com
October 2017
“What are we waiting for, Christmas?”
“Things are not what they seem…”
CM Achuthan, author’s ex-boss/mentor @ Hitachi Penang
“Trust, but verify.”
Dr. Irwin M Jacobs, Founder/CEO, Qualcomm
Yogi-ism:
“When you come to a fork in the road, take it”
“You can observe a lot just by watching”
“No one goes there nowadays, it is too crowded”
Yogi Berra
“It’s not that I’m so smart, it’s just that I stay with problems longer”
“In the middle of difficulty lies opportunity”
“Everybody is a genius. But if you judge a fish by its ability to climb a tree, it will live its whole
life believing that it is stupid”
Albert Einstein
“There is no such thing as an Electrical Failure”
Failure Analysis 101
“The hardest to learn is the least complicated”
Indigo Girls
Favorite quotes (some related to the subject matter, others not).
Common Pitfalls to Avoid
in Statistical Capability/Performance Analysis
March 2011
Barry Khor
Case Study 1 – with engineered data. Data points
were ‘made up’ using the NORMDIST function in Excel.
Let’s pretend this is the distribution of a measurement with a targeted mean
value of 50. The distribution appears to be very normal, with a perfect bell shape
and good symmetry around the mean value of 50. Most would stop right here,
declare the process “robust”, happily set the lower and upper
control limits at 4 sigma or 5 sigma for SPC, and call it a day.
What’s wrong with that?
The trouble with Normal is that it is not Normal.
The perfect-looking bell-shaped distribution (“overall”) is actually the
sum of 2 subgroups with very different mean values but similar
standard deviations. This kind of composite distribution is quite common in
high-volume production involving many machines whose settings can
change over time.
“Something is rotten in the state of Denmark…”
William Shakespeare, Hamlet.
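A minimal sketch of such a composite distribution, using only the Python standard library. The subgroup means (48.5 and 51.5), the common standard deviation (2.0), and the sample sizes are assumptions chosen for illustration; they are not the values behind the original slides. An equal mixture of two normal subgroups with the same sigma stays single-peaked as long as the means are no more than 2 sigma apart, so a histogram of the pooled data can still look like one bell curve even though its spread is inflated by the offset.

```python
import math
import random
import statistics

random.seed(2011)  # fixed seed so the sketch is repeatable

SIGMA = 2.0                   # assumed common standard deviation of both subgroups
MEAN_A, MEAN_B = 48.5, 51.5   # assumed subgroup means, straddling the target of 50
N_PER_GROUP = 5000            # assumed sample size per subgroup

group_a = [random.gauss(MEAN_A, SIGMA) for _ in range(N_PER_GROUP)]
group_b = [random.gauss(MEAN_B, SIGMA) for _ in range(N_PER_GROUP)]
overall = group_a + group_b   # what the "big data" histogram would show

overall_mean = statistics.fmean(overall)
overall_sd = statistics.pstdev(overall)

# Pooled within-subgroup standard deviation (equal group sizes).
within_sd = math.sqrt(
    (statistics.pvariance(group_a) + statistics.pvariance(group_b)) / 2
)

# For an equal mixture: overall variance = within variance + (half the offset)^2.
expected_overall_sd = math.sqrt(SIGMA**2 + ((MEAN_B - MEAN_A) / 2) ** 2)

print(f"overall mean        : {overall_mean:.2f}")
print(f"overall std dev     : {overall_sd:.2f}")
print(f"within-subgroup sd  : {within_sd:.2f}")
print(f"expected overall sd : {expected_overall_sd:.2f}")
```

The pooled mean sits on target, which is exactly why the overall histogram alone gives no warning; only the gap between the overall and within-subgroup spread reveals the hidden offset.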
Armed with knowledge of the two subgroups and the
offset between their means, the process owner can target a
new setting to bring both groups together around
the targeted mean of 50. Mean shifts are generally
easier to correct because the shift is “translational”,
whereas variation around the mean is usually
harder to minimize because of its random, noise-sensitive
nature.
By centering the two subgroups on the targeted
mean of 50, the overall distribution becomes narrower,
with a higher population around the mean,
i.e. “better central tendencies”, which usually
translates into better product performance.
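The effect of centering can be sketched analytically with a normal model. The subgroup means, the sigma, the spec limits (44 and 56), and the "near target" window (49 to 51) below are all hypothetical values picked for illustration; the slides do not state them.

```python
from statistics import NormalDist

SIGMA = 2.0               # assumed standard deviation of each subgroup
LSL, USL = 44.0, 56.0     # hypothetical spec limits around the target of 50
NEAR_LO, NEAR_HI = 49.0, 51.0  # hypothetical "ideal performance" window


def fraction_out(mean: float) -> float:
    """Fraction of a normal subgroup falling outside the spec limits."""
    dist = NormalDist(mean, SIGMA)
    return dist.cdf(LSL) + (1.0 - dist.cdf(USL))


def fraction_near(mean: float) -> float:
    """Fraction of a normal subgroup falling inside the near-target window."""
    dist = NormalDist(mean, SIGMA)
    return dist.cdf(NEAR_HI) - dist.cdf(NEAR_LO)


# Before: equal mixture of two off-center subgroups.
reject_before = 0.5 * fraction_out(48.5) + 0.5 * fraction_out(51.5)
near_before = 0.5 * fraction_near(48.5) + 0.5 * fraction_near(51.5)

# After: both subgroups re-centered on the target.
reject_after = fraction_out(50.0)
near_after = fraction_near(50.0)

print(f"reject fraction before/after : {reject_before:.4f} / {reject_after:.4f}")
print(f"near-target share before/after: {near_before:.3f} / {near_after:.3f}")
```

Under these assumed numbers the tail rejection drops and the near-target share rises, which is the twofold benefit the centering argument relies on.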
Comparing the distribution before and after centering the subgroups: the new distribution is
significantly improved. The benefits are twofold:
1. Lower probability of rejection at the fringes (tails)
2. More units with ideal performance.
These benefits would have been lost had the wrong interpretation persisted from looking only at the
so-called BIG DATA. Always slice and dice to the extent allowable. Sometimes it may be necessary to stratify
using groupings that are not recorded in the data, such as sequence order or odd vs. even entries.
[Figure: overall distribution, Before vs. After centering the subgroups]
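When no grouping column exists, position in the run sequence can stand in for one. The sketch below splits a list of readings into even and odd entries and compares their means. The helper name `split_by_parity` and the synthetic data (two alternating sources centered at 48.5 and 51.5) are hypothetical, used only to show the mechanics.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable


def split_by_parity(values):
    """Return (even-position entries, odd-position entries) of a sequence."""
    return values[0::2], values[1::2]


# Synthetic run-order data: two alternating sources with different means (assumed).
readings = [
    random.gauss(48.5 if i % 2 == 0 else 51.5, 2.0) for i in range(2000)
]

even_entries, odd_entries = split_by_parity(readings)
gap = statistics.fmean(odd_entries) - statistics.fmean(even_entries)

print(f"mean of even entries: {statistics.fmean(even_entries):.2f}")
print(f"mean of odd entries : {statistics.fmean(odd_entries):.2f}")
print(f"gap between them    : {gap:.2f}")
```

A gap that is large relative to the scatter within each half suggests an alternating special cause worth tracing back to the equipment.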
A Case Study –
Solder Height Data Analysis
March 2011
Barry Khor
Case Study 2 – with real but sanitized data. The data
source is kept anonymous because it is irrelevant
to the purpose of this study.
Looks pretty normal, right?
Be suspicious… be very suspicious!
Let’s take a closer look.
Histogram of production data supplied by an SMT contract manufacturer of an
OEM, in support of its claim that an irregularity in an IC component is responsible for bad
yields attributed to both insufficient solder and solder bridging. The data represents
10,784 paste print height measurements, one per PCB pad, on 16 panels of 2 boards
each, with 337 pads per board (16 × 2 × 337 = 10,784). While not unprecedented, it can be considered BIG DATA.
To the trained eye, there are clues that this data needs further
analysis:
1. The abrupt cutoff at the left tail of the distribution suggests
some kind of screening or exclusion of data points. Excluding
data that lies within the natural distribution for the purpose of
capability analysis is not good practice. It defeats the
purpose of the study.
2. The distribution is somewhat skewed to the right: the right side
of the mode (the 4.2 to 4.3 bin) has more data points than the left,
even with the left tail accounted for.
What the process owner might say:
Both the cutoff and the skewness are a natural response, since there is a
minimum thickness that is pre-ordained by the stencil’s thickness.
This is where things can become dangerous: the process owner has only a very rudimentary understanding of the
concepts of capability analysis and performance analysis, and a Minitab report somehow legitimizes the proclamation that the
process is well in control with a Cpk of 1.57. The correct message from this chart is that the actual performance of the process
(Ppk = 0.66) is far below its intrinsic, or potential, capability (Cpk = 1.57). The potential distribution is
a great Minitab feature that unfortunately can easily lead to a false sense of security if wrongly interpreted.
[Figure: Minitab chart "Process Capability of Height". Histogram with LSL and USL marked and fitted Within and Overall curves; x-axis from 3.30 to 7.15.]

Process Data
  LSL               3.5
  Target            *
  USL               7.5
  Sample Mean       4.68858
  Sample N          10784
  StDev (Within)    0.252208
  StDev (Overall)   0.604139

Potential (Within) Capability
  Cp    2.64
  CPL   1.57
  CPU   3.72
  Cpk   1.57

Overall Capability
  Pp    1.10
  PPL   0.66
  PPU   1.55
  Ppk   0.66
  Cpm   *

Observed Performance
  PPM < LSL   0.00
  PPM > USL   0.00
  PPM Total   0.00

Exp. Within Performance
  PPM < LSL   1.22
  PPM > USL   0.00
  PPM Total   1.22

Exp. Overall Performance
  PPM < LSL   24569.27
  PPM > USL   1.63
  PPM Total   24570.90
The EMS (electronics manufacturing services) provider claimed a robust process with a Cpk of 1.57, well in excess of the
generally accepted minimum Cpk of 1.33, based on this Minitab-generated capability summary.
What is wrong with the preceding assessment?
The EMS claim of an acceptable Cpk was based on the
potential capability, but the actual distribution is much
worse (Ppk = 0.66).
Even though the original claim is dismissed, there is still
an underlying issue: what caused the actual
performance to be so much worse than the potential
performance? Fortunately, there is enough information
in the raw data to help determine the root cause.
The following pages explain the root cause of the
underperformance.
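The gap between the two indices can be reproduced from the process data in the Minitab summary (LSL, USL, sample mean, and the two standard deviations). The sketch below uses the standard capability formulas and a normal model for the expected PPM; the function names are illustrative, not Minitab's.

```python
from statistics import NormalDist

# Values read from the Minitab summary in this case study.
LSL, USL = 3.5, 7.5
MEAN = 4.68858
SD_WITHIN = 0.252208
SD_OVERALL = 0.604139


def capability(mean, sd, lsl, usl):
    """Return (spread index, lower index, upper index, min index).

    With the within sigma these are Cp, CPL, CPU, Cpk;
    with the overall sigma they are Pp, PPL, PPU, Ppk.
    """
    spread = (usl - lsl) / (6 * sd)
    lower = (mean - lsl) / (3 * sd)
    upper = (usl - mean) / (3 * sd)
    return spread, lower, upper, min(lower, upper)


def expected_ppm(mean, sd, lsl, usl):
    """Expected parts per million below LSL and above USL under a normal model."""
    std = NormalDist()
    below = std.cdf((lsl - mean) / sd) * 1e6
    above = std.cdf((mean - usl) / sd) * 1e6  # upper tail via symmetry
    return below, above


cp, cpl, cpu, cpk = capability(MEAN, SD_WITHIN, LSL, USL)
pp, ppl, ppu, ppk = capability(MEAN, SD_OVERALL, LSL, USL)
ppm_within = expected_ppm(MEAN, SD_WITHIN, LSL, USL)
ppm_overall = expected_ppm(MEAN, SD_OVERALL, LSL, USL)

print(f"Cp={cp:.2f} CPL={cpl:.2f} CPU={cpu:.2f} Cpk={cpk:.2f}")
print(f"Pp={pp:.2f} PPL={ppl:.2f} PPU={ppu:.2f} Ppk={ppk:.2f}")
print(f"expected PPM < LSL (within) : {ppm_within[0]:.2f}")
print(f"expected PPM < LSL (overall): {ppm_overall[0]:.2f}")
```

Both index sets use the same mean and the same limits; the only input that changes is the standard deviation, so the entire difference between Cpk 1.57 and Ppk 0.66 comes from the overall sigma being about 2.4 times the within sigma.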
Minitab Definition
Within and overall refer to different ways of estimating process variation. A within estimate, such as Rbar/d2, is based on variation within
subgroups. The overall estimate is the overall standard deviation for the entire study. Cp and Cpk are listed under Potential (Within)
Capability because they are calculated using the within estimate of variation. Pp and Ppk are listed under Overall Capability because they
are calculated using the overall standard deviation of the study.
The within variation corresponds to the inherent process variation defined in the Statistical Process Control (SPC) Reference Manual
(Chrysler Corporation, Ford Motor Company, and General Motors Corporation. Copyright by A.I.A.G) while overall variation corresponds
to the total process variation. Inherent process variation is due to common causes only. Overall variation is due to both common and
special causes. Cp and Cpk are called potential capability in Minitab, because they reflect the potential that could be attained if all special
causes were eliminated.
Big hint here: A within estimate, such as Rbar/d2, is based on variation within subgroups.
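A minimal sketch of why the within estimate can look so good: Rbar/d2 only sees the range inside each subgroup, so shifts between subgroups never enter it. The d2 values are the standard control-chart constants for subgroup sizes 2 to 5. The subgroup size of 5, the alternating centers (4.2 and 5.2), and the true sigma of 0.25 are assumptions for illustration, not the subgrouping used in the original Minitab run.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Standard d2 control-chart constants, keyed by subgroup size.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def within_sigma(subgroups):
    """Estimate within-subgroup sigma as (average range) / d2."""
    size = len(subgroups[0])
    rbar = statistics.fmean(max(g) - min(g) for g in subgroups)
    return rbar / D2[size]


TRUE_SIGMA = 0.25  # assumed short-term noise
subgroups = []
for k in range(200):
    center = 4.2 if k % 2 == 0 else 5.2  # assumed shift between subgroups
    subgroups.append([random.gauss(center, TRUE_SIGMA) for _ in range(5)])

sigma_within = within_sigma(subgroups)
sigma_overall = statistics.stdev([x for g in subgroups for x in g])

print(f"within sigma (Rbar/d2): {sigma_within:.3f}")
print(f"overall sigma         : {sigma_overall:.3f}")
```

The within estimate stays near the short-term noise while the overall standard deviation absorbs the subgroup-to-subgroup shift, which mirrors the Cpk versus Ppk gap in this case study.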
Process Capability Analysis (JMP)
Using current limits, Ppk = 0.67 with a rejection rate of 2.5%.
At a 2.5% pad rejection rate, board-level yield is essentially 0%.
Using the author’s suggested limits, Ppk = 0.18 with a rejection rate of 43%!
Could someone translate that?
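The arithmetic behind these figures can be sketched as follows. The sketch assumes a normal model with the sample mean and overall standard deviation from the Minitab summary, heights expressed in mils (so the author's suggested limits of 0.004” to 0.005” become 4.0 to 5.0), and pads that fail independently of one another; the independence assumption in particular is a simplification. Because it does not reproduce JMP's own estimation, the Ppk values come out close to, but not identical with, the JMP figures quoted (roughly 0.17 rather than 0.18 for the suggested limits).

```python
from statistics import NormalDist

MEAN = 4.68858         # sample mean from the Minitab summary
SD_OVERALL = 0.604139  # overall standard deviation from the Minitab summary
PADS_PER_BOARD = 337   # from the data description


def reject_fraction(lsl, usl):
    """Fraction of pads outside [lsl, usl] under a normal model."""
    dist = NormalDist(MEAN, SD_OVERALL)
    return dist.cdf(lsl) + (1.0 - dist.cdf(usl))


def ppk(lsl, usl):
    """Performance index from the overall standard deviation."""
    return min(MEAN - lsl, usl - MEAN) / (3 * SD_OVERALL)


def board_yield(pad_reject, pads=PADS_PER_BOARD):
    """Probability that every pad on a board passes (independent pads assumed)."""
    return (1.0 - pad_reject) ** pads


current_reject = reject_fraction(3.5, 7.5)    # current spec limits
suggested_reject = reject_fraction(4.0, 5.0)  # author's suggested limits

print(f"current limits  : Ppk={ppk(3.5, 7.5):.2f}, reject={current_reject:.1%}")
print(f"suggested limits: Ppk={ppk(4.0, 5.0):.2f}, reject={suggested_reject:.1%}")
print(f"board yield at current reject rate: {board_yield(current_reject):.4%}")
```

With 337 pads on a board, even a pad-level rejection rate of a few percent compounds to a board-level yield of a small fraction of one percent, which is why the slide calls it essentially 0%.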
Let’s give it a shot:
1. The actual process performance is indicated by the Ppk index. Cpk, as defined in the Minitab analysis, should be
taken to mean the potential capability IF special causes make no significant contribution to the variation.
Examination of the data indicates that special causes cannot be ignored here. The actual process performance is not
satisfactory even without considering the limits. In a well-qualified and controlled process, the Ppk should be close to
the intrinsic Cpk.
2. Since the big hint has to do with subgroups, the data was stratified into subgroups using the board design
information (number of pads per panel), and a trend emerged. The following slides show large variation, or shifts, between
successive panel numbers and between board numbers within a panel. This special cause alone is the largest contributor to the
wide overall distribution. It indicates variation between the forward and reverse strokes of the squeegee/print head, and
uneven setting (squeegee pressure) on each board.
3. The overall spec limits of 0.0035” to 0.0075” are too loose, in the author’s opinion, for a 0.004” (100 µm)
stencil. The appropriate limits should be 0.004” to 0.005”. Using these new limits, the process capability becomes very
low. The conclusion is that the paste print process is not stable, and this is probably the dominant reason for the poor
soldering yields. It was recommended that the process be re-qualified after correcting the differences between
the two squeegees in the print head.
4. Low paste thickness could contribute to poor soldering not just because of the lower solder volume but also because
less flux is available for optimal solder reflow.
Unleashing the power of data stratification: in this case the overall histogram is “SPLIT” 16 ways by
panel number, arranged in numeric sequence (down, then right to the top of the next column, and so on).
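The 16-way split can be sketched from the board design information alone: 337 pads per board and 2 boards per panel give 674 consecutive measurements per panel. The sketch assumes the raw measurements are stored in panel order, which the source does not state, and uses synthetic stand-in data (alternating panel centers of 4.2 and 5.2 with sigma 0.25) rather than the sanitized production data.

```python
import random
import statistics

random.seed(16)  # fixed seed so the sketch is repeatable

PADS_PER_BOARD = 337
BOARDS_PER_PANEL = 2
PANELS = 16
PADS_PER_PANEL = PADS_PER_BOARD * BOARDS_PER_PANEL  # 674


def split_by_panel(heights):
    """Cut a flat, panel-ordered list of heights into one list per panel."""
    return [
        heights[start:start + PADS_PER_PANEL]
        for start in range(0, len(heights), PADS_PER_PANEL)
    ]


# Synthetic stand-in data: alternate panels printed on opposite strokes (assumed).
heights = []
for panel in range(PANELS):
    center = 4.2 if panel % 2 == 0 else 5.2
    heights.extend(random.gauss(center, 0.25) for _ in range(PADS_PER_PANEL))

panels = split_by_panel(heights)
panel_means = [statistics.fmean(p) for p in panels]

# Compare alternate panels, as a proxy for forward vs. reverse squeegee strokes.
stroke_a = statistics.fmean(panel_means[0::2])
stroke_b = statistics.fmean(panel_means[1::2])

for number, mean in enumerate(panel_means, start=1):
    print(f"panel {number:2d}: mean height {mean:.2f}")
print(f"alternate-panel gap: {stroke_b - stroke_a:.2f}")
```

Printing or plotting the per-panel means in sequence is usually enough to expose an alternating pattern that the pooled histogram hides.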
The unmistakable alternating trend led to the conclusion that the screen printer’s squeegee was not set up correctly,
leading to wide variation in squeegee pressure between the forward and backward strokes. Upon presentation of this data,
the complaint was immediately dropped.
BINGO!
The End
Questions? Comments?
Barry Khor
barrykhor@gmail.com

More Related Content

Similar to Here is a translation of the key points:1. The actual process performance is indicated by Ppk, which is 0.67. This is below the generally accepted level of 1.33, indicating the process is not capable or stable enough. Cpk from Minitab refers to the potential capability if special causes were removed, but special causes cannot be ignored here based on the data. 2. The "big hint" refers to estimating process variation within subgroups. This suggests the data should be analyzed by subgroups to identify sources of variation between subgroups, which could be causing the low Ppk. 3. At the current process limits, the rejection rate would be 2.5%. But at the board level, this would

Scientific Benchmarking of Parallel Computing Systems
Scientific Benchmarking of Parallel Computing SystemsScientific Benchmarking of Parallel Computing Systems
Scientific Benchmarking of Parallel Computing Systemsinside-BigData.com
 
Metric Abuse: Frequently Misused Metrics in Oracle
Metric Abuse: Frequently Misused Metrics in OracleMetric Abuse: Frequently Misused Metrics in Oracle
Metric Abuse: Frequently Misused Metrics in OracleSteve Karam
 
Six Sigma Presentation Storybd 07 Mar24
Six Sigma Presentation Storybd 07 Mar24Six Sigma Presentation Storybd 07 Mar24
Six Sigma Presentation Storybd 07 Mar24SKelly514
 
Process Capability: Step 5 (Non-Normal Distributions)
Process Capability: Step 5 (Non-Normal Distributions)Process Capability: Step 5 (Non-Normal Distributions)
Process Capability: Step 5 (Non-Normal Distributions)Matt Hansen
 
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...Greg Makowski
 
Portfolio Management Using Questionable Quality Data
Portfolio Management Using Questionable Quality DataPortfolio Management Using Questionable Quality Data
Portfolio Management Using Questionable Quality DataPortfolio Decisions
 
Understanding Deep Learning Requires Rethinking Generalization
Understanding Deep Learning Requires Rethinking GeneralizationUnderstanding Deep Learning Requires Rethinking Generalization
Understanding Deep Learning Requires Rethinking GeneralizationAhmet Kuzubaşlı
 
Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentationHJ van Veen
 
Cpk guide 0211_tech1
Cpk guide 0211_tech1Cpk guide 0211_tech1
Cpk guide 0211_tech1Piyush Bose
 
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCEINTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCEIPutuAdiPratama
 
Lab - Surface WaterPre-Lab QuestionsWhen a river bends and.docx
Lab - Surface WaterPre-Lab QuestionsWhen a river bends and.docxLab - Surface WaterPre-Lab QuestionsWhen a river bends and.docx
Lab - Surface WaterPre-Lab QuestionsWhen a river bends and.docxsmile790243
 
Basics of Capability.ppt
Basics of Capability.pptBasics of Capability.ppt
Basics of Capability.pptValentinoDhiyu1
 
Six Sigma : Process Capability
Six Sigma : Process CapabilitySix Sigma : Process Capability
Six Sigma : Process CapabilityLalit Padekar
 
Bioanalytical validation house of cards
Bioanalytical validation house of cardsBioanalytical validation house of cards
Bioanalytical validation house of cardsE. Dennis Bashaw
 

Similar to Here is a translation of the key points:1. The actual process performance is indicated by Ppk, which is 0.67. This is below the generally accepted level of 1.33, indicating the process is not capable or stable enough. Cpk from Minitab refers to the potential capability if special causes were removed, but special causes cannot be ignored here based on the data. 2. The "big hint" refers to estimating process variation within subgroups. This suggests the data should be analyzed by subgroups to identify sources of variation between subgroups, which could be causing the low Ppk. 3. At the current process limits, the rejection rate would be 2.5%. But at the board level, this would (20)

Scientific Benchmarking of Parallel Computing Systems
Scientific Benchmarking of Parallel Computing SystemsScientific Benchmarking of Parallel Computing Systems
Scientific Benchmarking of Parallel Computing Systems
 
Metric Abuse: Frequently Misused Metrics in Oracle
Metric Abuse: Frequently Misused Metrics in OracleMetric Abuse: Frequently Misused Metrics in Oracle
Metric Abuse: Frequently Misused Metrics in Oracle
 
Six Sigma Presentation Storybd 07 Mar24
Six Sigma Presentation Storybd 07 Mar24Six Sigma Presentation Storybd 07 Mar24
Six Sigma Presentation Storybd 07 Mar24
 
Process Capability: Step 5 (Non-Normal Distributions)
Process Capability: Step 5 (Non-Normal Distributions)Process Capability: Step 5 (Non-Normal Distributions)
Process Capability: Step 5 (Non-Normal Distributions)
 
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
 
Portfolio Management Using Questionable Quality Data
Portfolio Management Using Questionable Quality DataPortfolio Management Using Questionable Quality Data
Portfolio Management Using Questionable Quality Data
 
2014 aus-agta
2014 aus-agta2014 aus-agta
2014 aus-agta
 
Understanding Deep Learning Requires Rethinking Generalization
Understanding Deep Learning Requires Rethinking GeneralizationUnderstanding Deep Learning Requires Rethinking Generalization
Understanding Deep Learning Requires Rethinking Generalization
 
IEEE 2 5 beta method unraveled
IEEE 2 5 beta method unraveledIEEE 2 5 beta method unraveled
IEEE 2 5 beta method unraveled
 
Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentation
 
2014 nci-edrn
2014 nci-edrn2014 nci-edrn
2014 nci-edrn
 
Basics of Process Capability
Basics of Process CapabilityBasics of Process Capability
Basics of Process Capability
 
Cpk guide 0211_tech1
Cpk guide 0211_tech1Cpk guide 0211_tech1
Cpk guide 0211_tech1
 
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCEINTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
 
Lab - Surface WaterPre-Lab QuestionsWhen a river bends and.docx
Lab - Surface WaterPre-Lab QuestionsWhen a river bends and.docxLab - Surface WaterPre-Lab QuestionsWhen a river bends and.docx
Lab - Surface WaterPre-Lab QuestionsWhen a river bends and.docx
 
Basics of Capability.ppt
Basics of Capability.pptBasics of Capability.ppt
Basics of Capability.ppt
 
Six Sigma : Process Capability
Six Sigma : Process CapabilitySix Sigma : Process Capability
Six Sigma : Process Capability
 
Bioanalytical validation house of cards
Bioanalytical validation house of cardsBioanalytical validation house of cards
Bioanalytical validation house of cards
 
Cshl minseqe 2013_ouellette
Cshl minseqe 2013_ouelletteCshl minseqe 2013_ouellette
Cshl minseqe 2013_ouellette
 
Analyzing Performance Test Data
Analyzing Performance Test DataAnalyzing Performance Test Data
Analyzing Performance Test Data
 

Recently uploaded

Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 

Recently uploaded (20)

Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 

Here is a translation of the key points:1. The actual process performance is indicated by Ppk, which is 0.67. This is below the generally accepted level of 1.33, indicating the process is not capable or stable enough. Cpk from Minitab refers to the potential capability if special causes were removed, but special causes cannot be ignored here based on the data. 2. The "big hint" refers to estimating process variation within subgroups. This suggests the data should be analyzed by subgroups to identify sources of variation between subgroups, which could be causing the low Ppk. 3. At the current process limits, the rejection rate would be 2.5%. But at the board level, this would

  • 1. Big Data & The Trouble with ‘Normal’ Common Pitfalls in Capability/Performance Analysis Barry Khor barrykhor@gmail.com All rights reserved This document was developed with knowledge sharing in mind. Distribution and reproduction of this document, in part or in whole is freely encouraged provided authorship information is preserved. 1 – “So let us not talk falsely now, the hour is getting late” Bob Dylan All along the watchtower
  • 2. Preface The author wishes to express that the underlying statistical techniques employed in these studies are very basic and they are not the focus of this document. The focus is instead on the interpretation of data and the care and discipline in examining the data for problem solving and continuous improvement. Secondly, this document is a work in progress. As such the author welcomes comments and critique. In whatever way the document may evolve over time it is the author’s hope that some of the readers may derive benefit from these studies in its current state at any time. Barry Khor barrykhor@gmail.com October 2017
  • 3. “What are we waiting for, Christmas?” “Things are not what they seem…." CM Achuthan, author’s ex-boss/mentor@ Hitachi Penang “Trust, but verify.” Dr. Irwin M Jacobs, Founder/CEO, Qualcomm Yogi-ism: “When you come to a fork in the road, take it" “You can observe a lot just by watching” “No one goes there nowadays, it is too crowded" Yogi Berra “It’s not that I’m so smart, it’s just that I stay with problems longer" “In the middle of difficulty lies opportunity” “Everybody is a genius. But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid“ Albert Einstein “There Is no such thing as an Electrical Failure” Failure Analysis 101 “The hardest to learn is the least complicated” Indigo Girls Favorite quotes (some related to the subject matter, others not).
  • 4. – 4 – Common Pitfalls to Avoid in Statistical Capability/Performance Analysis March 2011 Barry Khor Case Study 1 – with engineered data. Data points were ‘made up’ using NORMDIST function in Excel.
  • 5. Let’s pretend this is a distribution of a measurement with a targeted mean value of 50. The distribution appear to be very normal with perfect bell shape and good symmetry around the mean value of 50. Most would stop right here and declare the process “robust”, and happily sets the Lower and Upper control limits at 4 sigma or 5 sigma for SPC and call it a day. What’s wrong with that?
  • 6. The trouble with Normal is that it is not Normal. The perfect looking bell shape distribution (“overall”) is actually the summation of 2 subgroups with very different mean values, but similar standard deviation. This kind of composite distribution is quite common in high volume production involving many machines with settings which can change over time. “Something is rotten in the state of Denmark…” William Shakespeare, Hamlet.
  • 7. Armed with the knowledge of the two subgroups/ offset of the means, the process owner can target a new setting to bring both groups together around the targeted mean of 50. Mean shifts are generally easier to correct because the shift is “translational” whereas variations around the mean are usually harder to minimize due to their random noise- sensitive nature. By centering the two subgroups around the targeted mean of 50, the overall distribution has a narrower distribution and higher population around the mean, i.e. “better central tendencies” which usually translate into better product performance. .
  • 8. Comparing the distribution before and after the process centering of the sub-groups: The new distribution is significantly improved . The benefits are two fold: 1. Less probability of rejection at the fringes (tails) 2. More units with ideal performance. This realization of benefits would have been lost had the wrong interpretation persisted by just looking at the so-called BIG DATA. Always slice and dice to the extent allowable. Sometimes it may be necessary to stratify using non-existent grouping such as sequences, odd vs. even entries etc. After Before
  • 9. A Case Study – Solder Height Data Analysis. March 2011, Barry Khor. “So let us not talk falsely now, the hour is getting late” (Bob Dylan, All Along the Watchtower). Case Study 2, with real but sanitized data. The data source is kept anonymous because it is irrelevant to the purpose of this study.
  • 10. Looks pretty normal, right? Be suspicious… be very suspicious! Let’s take a closer look. The histogram is from production data supplied by an SMT contract manufacturer of an OEM, in support of their claim that irregularity in an IC component was responsible for bad yields attributed to both insufficient solder and solder bridging. The data represents 10,784 paste print height measurements, one from each PCB pad, on 16 panels of 2 boards each, with 337 pads per board. While not unprecedented, it can be considered BIG DATA.
  • 11. To the trained eye, there are clues that this data needs further analysis:
    1. The abrupt cutoff at the left tail of the distribution suggests some kind of screening or exclusion of data points. Excluding data that lies within the natural distribution is not good practice for capability analysis; it defeats the purpose of the study.
    2. The distribution is somewhat skewed to the right: the right side of the mode (the 4.2 to 4.3 bin) has more data points than the left, even with the left tail accounted for.
    What the process owner might say: both the cutoff and the skewness are a natural response, since there is a minimum thickness pre-ordained by the stencil’s thickness.
  • 12. The EMS claimed a robust process with a Cpk of 1.57, well in excess of the generally acceptable Cpk of 1.33, based on the following Minitab-generated capability summary.
    [Figure: Minitab “Process Capability of Height” chart: histogram with Within and Overall fitted curves, LSL and USL marked, x-axis from 3.30 to 7.15.]
    Process Data: LSL 3.5; Target *; USL 7.5; Sample Mean 4.68858; Sample N 10784; StDev(Within) 0.252208; StDev(Overall) 0.604139
    Potential (Within) Capability: Cp 2.64; CPL 1.57; CPU 3.72; Cpk 1.57
    Overall Capability: Pp 1.10; PPL 0.66; PPU 1.55; Ppk 0.66; Cpm *
    Observed Performance: PPM < LSL 0.00; PPM > USL 0.00; PPM Total 0.00
    Exp. Within Performance: PPM < LSL 1.22; PPM > USL 0.00; PPM Total 1.22
    Exp. Overall Performance: PPM < LSL 24569.27; PPM > USL 1.63; PPM Total 24570.90
    This is where things can become dangerous, when the process owner has only a very rudimentary understanding of the concepts of capability analysis and performance analysis. A Minitab report somehow legitimizes the proclamation that the process is well in control with a Cpk of 1.57. The correct message from this chart is that the performance of the process (as indicated by Ppk = 0.66) is far below its intrinsic capability, or potential Cpk (1.57 in this case). The potential distribution is a great Minitab feature that unfortunately can easily lead to a false sense of security if wrongly interpreted.
  • 13. What is wrong with the preceding assessment? The EMS claim of an acceptable Cpk was based on the potential capability, but the actual distribution is much worse (Ppk = 0.66). Even with the original claim dismissed, there is still an underlying issue: what caused the actual performance to be so much worse than the potential performance? Fortunately, there is enough intelligence in the raw data to help determine the root cause. The following pages explain the root cause of the underperformance. [Figure: the same Minitab “Process Capability of Height” summary as on the previous slide.]
    Minitab definition: Within and overall refer to different ways of estimating process variation. A within estimate, such as Rbar/d2, is based on variation within subgroups. The overall estimate is the overall standard deviation for the entire study. Cp and Cpk are listed under Potential (Within) Capability because they are calculated using the within estimate of variation. Pp and Ppk are listed under Overall Capability because they are calculated using the overall standard deviation of the study. The within variation corresponds to the inherent process variation defined in the Statistical Process Control (SPC) Reference Manual (Chrysler Corporation, Ford Motor Company, and General Motors Corporation. Copyright by A.I.A.G), while overall variation corresponds to the total process variation. Inherent process variation is due to common causes only. Overall variation is due to both common and special causes. Cp and Cpk are called potential capability in Minitab because they reflect the potential that could be attained if all special causes were eliminated.
    Big hint here: a within estimate, such as Rbar/d2, is based on variation within subgroups.
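The difference between the within and overall indices can be checked by hand from the process data reported in the Minitab summary (LSL, USL, sample mean, and the two standard deviation estimates). The sketch below uses the standard textbook formulas and a normal model for the expected PPM; the function names `capability` and `expected_ppm` are illustrative and are not Minitab's.

```python
from statistics import NormalDist

# Values reported in the Minitab capability summary (units: mils)
LSL, USL = 3.5, 7.5
MEAN = 4.68858
SD_WITHIN = 0.252208
SD_OVERALL = 0.604139


def capability(mean, sd, lsl, usl):
    """Return the two-sided, lower, upper, and minimum one-sided capability indices."""
    lower = (mean - lsl) / (3 * sd)
    upper = (usl - mean) / (3 * sd)
    return {
        "two_sided": (usl - lsl) / (6 * sd),  # Cp or Pp
        "lower": lower,                       # CPL or PPL
        "upper": upper,                       # CPU or PPU
        "k": min(lower, upper),               # Cpk or Ppk
    }


def expected_ppm(mean, sd, lsl, usl):
    """Expected parts per million below LSL and above USL under a normal model."""
    dist = NormalDist(mean, sd)
    return 1e6 * dist.cdf(lsl), 1e6 * (1.0 - dist.cdf(usl))


within = capability(MEAN, SD_WITHIN, LSL, USL)     # Cp, CPL, CPU, Cpk
overall = capability(MEAN, SD_OVERALL, LSL, USL)   # Pp, PPL, PPU, Ppk
ppm_within = expected_ppm(MEAN, SD_WITHIN, LSL, USL)
ppm_overall = expected_ppm(MEAN, SD_OVERALL, LSL, USL)

print(f"Cpk = {within['k']:.2f}, Ppk = {overall['k']:.2f}")
print(f"expected PPM < LSL  within: {ppm_within[0]:.2f}  overall: {ppm_overall[0]:.2f}")
```

The only input that changes between the two sets of indices is the standard deviation estimate. The same mean and the same limits give a Cpk above 1.5 with the within estimate and a Ppk below 0.7 with the overall estimate, so the gap between them is entirely the variation that lives between subgroups.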
  • 14. Process Capability Analysis (JMP). Using the current limits, Ppk = 0.67 with a rejection rate of 2.5%. At a 2.5% pad rejection rate, board-level yield is essentially 0%. Using the author’s suggested limits, Ppk = 0.18 with a rejection rate of 43%!
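The jump from a 2.5% pad rejection rate to an essentially zero board-level yield follows from the pad count. A minimal sketch, assuming that pads fail independently and that a single rejected pad rejects the board (both are simplifying assumptions, not statements from the slides), with 337 pads per board as given in the case study:

```python
PADS_PER_BOARD = 337  # from the case study description


def board_yield(pad_reject_rate, pads=PADS_PER_BOARD):
    """Probability that every pad on a board passes, assuming independent pads."""
    return (1.0 - pad_reject_rate) ** pads


yield_current = board_yield(0.025)  # pad rejection rate with the current limits
yield_tight = board_yield(0.43)     # pad rejection rate with the suggested limits

print(f"board yield at 2.5% pad rejection: {yield_current:.4%}")
print(f"board yield at 43% pad rejection:  {yield_tight:.3e}")
```

Under these assumptions the board yield at 2.5% pad rejection comes out at roughly 0.02%, which is what “essentially 0%” means in practice.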
  • 15. Could someone translate that? Let’s give it a shot:
    1. The actual process performance is indicated by the Ppk index. Cpk, as defined in the Minitab analysis, should be taken to mean the potential capability IF special causes make no significant contribution to the variation. Examination of the data indicates that special causes cannot be ignored here. The actual process performance is not satisfactory even without considering the limits. In a well-qualified and controlled process, Ppk should be close to the intrinsic Cpk.
    2. Since the big hint has to do with subgroups, the data was stratified into subgroups using the board design information (number of pads per panel), and a trend emerged. The following slides show large variation or shifts between successive panel numbers and between board numbers within a panel. This special cause alone is the largest contributor to the wide overall distribution. It indicates variation between the forward and reverse strokes of the squeegee/print head, and uneven settings (squeegee pressure) on each board.
    3. The overall spec limits of 0.0035” to 0.0075” are, in the author’s opinion, too loose for a 0.004” (100 µm) stencil. The appropriate limits should be 0.004” to 0.005”. With these new limits the process capability becomes very low. The conclusion is that the paste print process is not stable, and this is probably the dominant reason for the poor soldering yields. It was recommended that the process be re-qualified after adjusting for the differences between the two squeegees in the print head.
    4. Low paste thickness could contribute to poor soldering not just because of the lower solder volume but also because of the lower amount of flux available for optimal solder reflow.
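The effect of the tighter limits in point 3 can be approximated from the summary statistics alone. This sketch assumes a normal model and reuses the sample mean and overall standard deviation from the Minitab summary, with the limits expressed in mils (0.004” = 4.0, 0.005” = 5.0). It is not the JMP analysis itself, so small differences from the quoted JMP figures are to be expected.

```python
from statistics import NormalDist

MEAN = 4.68858         # sample mean from the Minitab summary, mils
SD_OVERALL = 0.604139  # overall standard deviation from the Minitab summary, mils


def ppk_and_reject(lsl, usl, mean=MEAN, sd=SD_OVERALL):
    """Return (Ppk, expected out-of-spec fraction) under a normal model."""
    ppk = min(mean - lsl, usl - mean) / (3 * sd)
    dist = NormalDist(mean, sd)
    reject = dist.cdf(lsl) + (1.0 - dist.cdf(usl))
    return ppk, reject


ppk_current, reject_current = ppk_and_reject(3.5, 7.5)  # current spec limits
ppk_tight, reject_tight = ppk_and_reject(4.0, 5.0)      # author's suggested limits

print(f"current limits:   Ppk = {ppk_current:.2f}, reject = {reject_current:.1%}")
print(f"suggested limits: Ppk = {ppk_tight:.2f}, reject = {reject_tight:.1%}")
```

With these inputs the suggested limits give a Ppk of about 0.17 and an expected rejection rate of about 43%, in line with the JMP results quoted on slide 14.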
  • 16. Unleashing the power of data stratification: in this case the overall histogram is “SPLIT” 16 ways by panel number, arranged in numeric sequence (down, then to the top of the next column on the right, and so on). The unmistakable alternating trend led to the conclusion that the screen printer’s squeegee was not set up correctly, leading to wide variation in squeegee pressure between the forward and backward strokes. Upon presenting this data, the complaint was immediately dropped. BINGO!
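The stratification step can be sketched as follows. The data here is SYNTHETIC and is not the case study data: the base level, the stroke offset, and the noise level are assumed values chosen only to mimic an alternating forward/reverse stroke pattern across 16 panels of 674 readings each (2 boards of 337 pads). The point is the method: group by panel, compare panel means, and compare the pooled within-panel spread with the overall spread.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable

N_PANELS = 16
READINGS_PER_PANEL = 674  # 2 boards x 337 pads, as in the case study
BASE = 4.7                # assumed overall level, mils
STROKE_OFFSET = 0.5       # assumed shift between forward and reverse strokes
NOISE_SD = 0.25           # assumed within-panel standard deviation

# SYNTHETIC data: odd panels printed on one stroke, even panels on the other
panels = {}
for p in range(1, N_PANELS + 1):
    shift = STROKE_OFFSET if p % 2 else -STROKE_OFFSET
    panels[p] = [rng.gauss(BASE + shift, NOISE_SD) for _ in range(READINGS_PER_PANEL)]

# Overall view: everything lumped into one histogram
all_readings = [x for values in panels.values() for x in values]
overall_sd = pstdev(all_readings)

# Stratified view: one mean per panel, plus the pooled within-panel spread
panel_means = {p: mean(values) for p, values in panels.items()}
within_sd = mean(pstdev(values) ** 2 for values in panels.values()) ** 0.5

odd_mean = mean(panel_means[p] for p in panels if p % 2)
even_mean = mean(panel_means[p] for p in panels if not p % 2)

for p in sorted(panel_means):
    print(f"panel {p:2d}: mean = {panel_means[p]:.2f}")
print(f"within sd = {within_sd:.3f}, overall sd = {overall_sd:.3f}")
print(f"odd-panel mean = {odd_mean:.2f}, even-panel mean = {even_mean:.2f}")
```

In the synthetic run the panel means alternate between two levels while the within-panel spread stays small, which is the same signature the author reads from the 16-way split: a large overall spread that is mostly a between-panel shift rather than random noise.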
  • 17. The End. Questions? Comments? Barry Khor, barrykhor@gmail.com