Statistical process control (SPC) can help businesses understand and reduce variation in processes. All processes have inherent variation, and SPC uses statistical methods like control charts to distinguish normal variation from special causes that require action. Control charts plot process data over time and establish upper and lower control limits to determine whether the process is behaving as expected or needs attention. SPC is used across industries to monitor processes and identify issues like click fraud in online advertising. Understanding variation through SPC allows businesses to focus on real problems rather than wasting resources reacting to normal fluctuations.
Statistical Process Control: Why You Should Care
by Pete Abilla on November 7, 2006
All processes are subject to some variation — this variation can either be inherent in the process
or imposed on the process by some outside force. By observing the variability in the process
output and comparing this to statistically calculated limits, objective decisions about when to
take action can be made. Without truly understanding the cause of the process variation,
resources may be wasted reacting to variation that is normal.
When I say ALL, I really mean ALL. In manufacturing, a process takes inputs and does something to
those inputs to form outputs, and there will be variation in that process. The challenge is to
determine statistically what is "normal" versus when some corrective action or more attention is
required. In internet processes, Statistical Process Control is also used to detect click-fraud,
which I'll explain shortly. And for any process in any industry, Statistical Process Control can
help one better understand whether or not the process is performing as statistically expected. If
it is not, that is a signal for corrective action or more attention.
Types of Variation
Common Cause Variation is fluctuation caused by many random factors resulting in random
distribution of the output around a mean. Common cause variation is a measure of the process’s
potential or how well the process will perform when all the special cause variation is removed.
Common cause variation is also called random variation, noise, non-controllable variation,
within-group variation, or inherent variation. A process that shows only common cause variation
is said to be in statistical control.
Special Cause Variation is caused by a specific factor that results in a non-random distribution of
output. Special Cause Variation can cause a shift or trend in the output and can usually be
reduced or eliminated through local actions. Special Cause variation is also referred to as
"exceptional" or "assignable" variation: variation due to an identifiable, out-of-the-ordinary
event that is not a usual part of the process.
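To make the two kinds of variation concrete, here is a minimal simulation sketch. The process mean of 50, the standard deviation of 2, and the shift to 58 are all made-up values chosen for illustration, not figures from any real process.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Common cause only: random scatter around a stable mean (assumed 50, sigma 2).
stable = [rng.gauss(50.0, 2.0) for _ in range(30)]

# Special cause: an assumed outside event shifts the process mean from 50 to 58.
shifted = [rng.gauss(58.0, 2.0) for _ in range(10)]

print(f"stable mean:  {mean(stable):.2f}")
print(f"shifted mean: {mean(shifted):.2f}")
```

The first series scatters randomly around its mean, which is what common cause variation looks like. The second series sits at a visibly different level, which is the kind of shift a special cause produces.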
Statistical Process Control Visualization
Control Charts are often used as part of process control systems. They were developed by W. A.
Shewhart in 1924 while he was working for Bell Telephone Laboratories. A Control Chart consists of
a center line and two boundary lines, the control limits, placed above and below the center line.
Control limits are based on the variability within the data. Values are plotted to determine the
state of the process. Control Charts tell you how the process is performing; they do not contain
Specification Limits.
Control charts can be viewed as a distribution plotted on its side – if you created a histogram of
the points, you would expect this to show a normal distribution (assuming the process stays in
control). Below is an example:
Given the chart above, if the outputs of a process fall within the control limits, in this case the
Lower Control Limit (LCL) and Upper Control Limit (UCL), then the process is said to be
performing as statistically expected. If the outputs fall below the LCL or above the UCL, the
process needs more attention.
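The check described above can be sketched in a few lines. The function name `out_of_control`, the limits, and the observations are all hypothetical values chosen for illustration.

```python
def out_of_control(points, lcl, ucl):
    """Return (index, value) pairs for points that fall outside [lcl, ucl]."""
    return [(i, x) for i, x in enumerate(points) if x < lcl or x > ucl]

# Hypothetical limits and observations, for illustration only.
lcl, ucl = 7.0, 14.0
observations = [10.2, 11.1, 9.8, 15.3, 10.5, 6.4, 11.0]

signals = out_of_control(observations, lcl, ucl)
print(signals)  # [(3, 15.3), (5, 6.4)]
```

Points inside the limits are left alone. Only the two flagged points would call for a closer look.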
Elements of a Control Chart
Control Charts are plots of one or more summary statistics from samples taken sequentially in
time (usually sample proportions, or means and ranges). Each chart has a center line at the mean
and Upper and Lower Control Limits. Control Limits are set to determine whether or not a
particular average (or range) is "within acceptable limits" of random variation. In other words,
Control Limits try to distinguish between common cause and special cause variation. Control Limits
(LCL and UCL) are typically set at ±3σ around the mean.
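As a rough sketch of the ±3σ rule, the limits can be computed from a baseline period that is believed to be in control. The helper name `control_limits` and the baseline data are hypothetical. This version uses the plain sample standard deviation for simplicity, whereas practitioners often estimate σ from subgroup ranges or moving ranges.

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Return (lcl, center, ucl) as the mean -/+ k sample standard deviations."""
    center = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation (n - 1 denominator)
    return center - k * sigma, center, center + k * sigma

# Hypothetical baseline measurements from a period assumed to be in control.
baseline = [10.0, 12.0, 11.0, 9.0, 10.0, 11.0, 12.0, 9.0, 10.0, 11.0]

lcl, center, ucl = control_limits(baseline)
print(f"LCL={lcl:.2f}  center={center:.2f}  UCL={ucl:.2f}")
```

Computing the limits from the data itself, rather than from a customer requirement, is what separates Control Limits from Specification Limits.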
Interpreting a Control Chart
Below is a general guideline for interpreting a Control Chart:
An Example: Click-Fraud
Google, Yahoo!, and other CPC shops are very concerned with the quality of their ads: the sites
that run them, the traffic that sees them, and the population that clicks on them. They are
concerned because they have a fiduciary duty to the advertisers, and because they have a
financial incentive to keep quality high so that Yahoo!, Microsoft, Google, or any other CPC shop
remains an attractive place for advertisers to advertise.
One way CPC shops can monitor the quality of their ads and detect click-fraud is by
implementing a Process Control System.
Suppose a blogger places AdSense ads on her blog. Google monitors the performance of those ads
by tracking the unique IP Address of the blogger and of anyone who clicks on them. For the
"clicking process", there is a statistically expected number of clicks. The data will show this.
There are, however, anomalies. For example, suppose this blogger writes a great blog post and
that article gets slashdot-ted or dugg to the digg front page. Traffic will come and, most likely,
ads will get clicked. The data will show this, and Google will have to classify this state of
affairs as special-cause variation, because the spike in traffic is outside of the expected range
and hence clicks will also be outside of the expected range.
Click-fraud, in this example, is detected when the same IP Address, or the gang-effect of several
IP Addresses, is found to be clicking at a rate outside the Control Limits. This is treated as
evidence of click-fraud, and Google is able to shut down the blogger's or publisher's account.
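The idea can be sketched as follows. This is only an illustration under stated assumptions, not a description of how Google or any other CPC shop actually detects fraud. The function name `suspicious_ips`, the baseline counts, and the click log are all hypothetical, and the IP addresses come from ranges reserved for documentation.

```python
from collections import Counter
from statistics import mean, stdev

def suspicious_ips(baseline_counts, todays_clicks, k=3.0):
    """Return {ip: clicks} for IPs whose click count exceeds the upper control limit."""
    ucl = mean(baseline_counts) + k * stdev(baseline_counts)
    counts = Counter(todays_clicks)  # clicks per IP address
    return {ip: n for ip, n in counts.items() if n > ucl}

# Hypothetical history: clicks per visitor IP per day on one blog.
baseline_counts = [1, 0, 2, 1, 1, 0, 1, 2, 1, 1]

# Hypothetical click log for today, one entry per click.
todays_clicks = ["192.0.2.10"] * 1 + ["192.0.2.11"] * 2 + ["198.51.100.7"] * 25

flagged = suspicious_ips(baseline_counts, todays_clicks)
print(flagged)  # {'198.51.100.7': 25}
```

Only the upper limit is checked here, since too few clicks is not a fraud signal. Count data like this is often charted with a c-chart, whose limits are based on the square root of the average count. The simpler mean plus 3σ rule is used above to stay close to the text.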
Why Should I Care?
Statistical Process Control is used everywhere, behind the scenes. It is a verifiable way to better
understand and make sense of variation.