TESCO Evaluation of Non-Normal Meter Data

Understand the requirements of normality in ANSI/ASQ Z1.9 and how they affect the analysis of meter test data.

Review typical meter test data distributions and how to determine whether meter test data is normal

Introduction to working with non-normal data

  1. 1. Evaluation of Non Normal Meter Test Data
     February 28, 2012
     Frank Garcia, Fred Rispoli
     The Eastern Specialty Company
  2. 2. Session Objectives
     Understand:
     • Requirements of normality in ANSI/ASQ Z1.9 and how that affects analysis of meter test data.
     • Review typical meter test data distributions and how to determine if meter test data is normal.
     • Introduction to working with non normal data.
     • Assessing risk of using ANSI/ASQ Z1.9 with non normal meter test data.
     Any Other Issues or Items of Interest?
  3. 3. We Assume That Population Fits a Statistical Model
     • The statistical model being used for the sampling/testing plan needs to match the actual distribution of the population.
     • In most circumstances, one is looking at a normal or Gaussian distribution (i.e. a bell curve) based on sampling theory.
     • We like ANSI/ASQ Z1.9 for ease of use and universal acceptance as a sampling plan.
  4. 4. What Does ANSI/ASQ Z1.9 Say About Normality?
     Paragraph A8 states:
     “This standard assumes the underlying distribution of individual measurements to be normal in shape. Failure of this assumption to be valid will affect the Operating Characteristic (OC) curves and probabilities based on these curves. In particular it will affect the estimate of percentage non conforming calculated from the mean and standard deviation of the distribution. The assumption should be verified prior to use of this standard.”
  5. 5. ANSI/ASQ Z1.9 Sampling Procedures and Tables for Inspection by Variables for Percent Nonconforming
     • Various methods (Variability Unknown - Standard Deviation Method, Variability Unknown - Range Method, and Variability Known Method)
     • All methods can be used with single or double specification limits.
  6. 6. Z1.9 Calculations
     EXAMPLE CALCULATIONS FOR ANSI/ASQ Z1.9-2003 STANDARD DEVIATION METHOD WITH DOUBLE SPECIFICATION LIMITS
     Weighted Average Calculated Data
     Example meter group has a population of 475 meters and 2% accuracy. Use AQL = 2.5
     Line | Information Needed | Value Obtained | Explanation
     1 | Sample Size: n | 25 | From Table I
     2 | Sum of % Registrations: ∑X | 2506.1 |
     3 | Sum of Squared % Registrations: ∑X² | 251222.03 |
     4 | Correction Factor (CF): (∑X)²/n | 251221.49 | (2506.1)²/25
     5 | Corrected Sum of Squares (SS): ∑X² - CF | 0.5416 |
     6 | Variance (V): SS/(n-1) | 0.0226 | 0.5416/24
     7 | Estimate of Lot Std. Deviation (S): √V | 0.1503 | √0.0226
     8 | Sample Mean (X bar): (∑X)/n | 100.24 | 2506.1/25
     9 | Upper Specification Limit: U | 102.0 |
     10 | Lower Specification Limit: L | 98.0 |
     11 | Quality Index (upper): QU = (U - X bar)/S | 11.71 | (102.0-100.24)/0.1503
     12 | Quality Index (lower): QL = (X bar - L)/S | 14.90 | (100.24-98.0)/0.1503
     13 | Est. of Lot % Out of Limits Above U: Pu | 0.00% | From Table V
     14 | Est. of Lot % Out of Limits Below L: Pl | 0.00% | From Table V
     15 | Total Est. % Out of Limits: P = Pu+Pl | 0.00% | 0.00% + 0.00%
     16 | Maximum Allowable % Out of Limits: M | 5.98% | From Table II and IV using AQL = 2.5
     17 | Acceptability Criterion: Pu + Pl < M | 0.00% < 5.98% | Therefore, the meter group is acceptable for continued service.
     Acceptability Criterion: If the estimated lot percent nonconforming (P) is equal to or less than the maximum allowable percent nonconforming (M), the lot meets the acceptability criterion. If P is greater than M or if either QU or QL or both are negative, then the lot does not meet the acceptability criterion.
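The arithmetic in lines 2 through 12 of the worksheet can be sketched in a few lines of Python. This is only an illustration of the standard deviation method's intermediate steps; the function name `z19_quality_indices` is made up for this sketch, and the Table V and Table II/IV lookups (lines 13 to 16) are not reproduced.

```python
import math

def z19_quality_indices(n, sum_x, sum_x2, lower, upper):
    """Return (mean, s, q_upper, q_lower) following worksheet lines 4-12."""
    correction = sum_x ** 2 / n      # line 4: correction factor (sum of X)^2 / n
    ss = sum_x2 - correction         # line 5: corrected sum of squares
    variance = ss / (n - 1)          # line 6: sample variance
    s = math.sqrt(variance)          # line 7: estimate of lot standard deviation
    mean = sum_x / n                 # line 8: sample mean
    q_upper = (upper - mean) / s     # line 11: quality index against U
    q_lower = (mean - lower) / s     # line 12: quality index against L
    return mean, s, q_upper, q_lower

# Values taken from the worksheet on this slide.
mean, s, q_u, q_l = z19_quality_indices(25, 2506.1, 251222.03, 98.0, 102.0)
print(f"mean={mean:.3f}  s={s:.4f}  QU={q_u:.2f}  QL={q_l:.2f}")
```

Because this sketch carries full precision through every step, its quality indices differ slightly from the worksheet's 11.71 and 14.90, which are computed from the rounded mean and standard deviation.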
  7. 7. Operating Characteristic Curves - OCCs
     The Acceptable Quality Limit (AQL) is the maximum percentage or proportion of nonconforming units in a lot that can be considered satisfactory as a process average for the purpose of acceptance sampling.
     When sampling, you face the risk of rejecting lots of AQL quality as well as the risk of accepting lots of poorer than AQL quality.
     We are interested in knowing how an acceptance sampling plan will accept or not accept lots over various lot qualities. A curve showing the probability of acceptance over various lot or process qualities is called the operating characteristic (OC) curve.
     [Figure: Ideal OC Curve]
  8. 8. Challenge of Meter Accuracy Test Data Distributions
     • Electromechanical electric meter accuracy test data tends to have more variability and tends to be normal.
     • Electronic (AMR) and Digital (AMI) meter accuracy is very high and tends to have a higher percentage of test points around the mean
       – The distribution has a different flatness, or what we call “kurtosis”, which results in a non normal distribution.
       – AMI meter test data is even more concentrated
  9. 9. Skew and Kurtosis
     Negative skew (-): Mean < Median
     Positive skew (+): Mean > Median
  10. 10. AMR Meter Accuracy Test Data Distribution
     Skew: -0.2   Kurtosis: +0.85
  11. 11. AMI Meter Accuracy Test Data Distribution
     Skew: -12.2   Kurtosis: +230.05
  12. 12. Gas Meter Accuracy Test Data Distribution
     Skew: -0.33   Kurtosis: +4.85
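Skew and kurtosis figures like those quoted on the preceding slides can be computed directly from test readings. The sketch below uses the bias-adjusted sample formulas that common statistics packages report (kurtosis as excess over the normal value, so a normal sample gives roughly 0). That the slides used exactly these formulas is an assumption, and the readings are invented for illustration.

```python
import math

def sample_skew_kurtosis(data):
    """Bias-adjusted sample skewness and excess kurtosis (needs n >= 4)."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    z3 = sum(((x - mean) / s) ** 3 for x in data)
    z4 = sum(((x - mean) / s) ** 4 for x in data)
    skew = n / ((n - 1) * (n - 2)) * z3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return skew, kurt

# Hypothetical % registration readings: tightly grouped with one low outlier.
readings = [74.65, 99.8, 99.9, 100.0, 100.0, 100.1, 100.2, 100.1, 99.9, 100.0]
skew, kurt = sample_skew_kurtosis(readings)
print(f"skew={skew:.2f}  excess kurtosis={kurt:.2f}")
```

A single low outlier in otherwise tightly grouped data is enough to produce a strongly negative skew and a large positive kurtosis, which is the pattern the AMI data shows.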
  13. 13. The Normal Curve
     The normal distribution is the most recognized distribution in statistics.
     The normal curve is a smooth, symmetrical, bell-shaped curve, generated by the density function.
     It is the most useful continuous probability model as many naturally occurring measurements such as heights, weights, etc. are approximately normally distributed.
  14. 14. Normal Distribution
     Each combination of mean and standard deviation generates a unique normal curve.
     “Standard” Normal Distribution
       – Has μ = 0 and σ = 1
       – Data from any normal distribution can be made to fit the standard normal by converting raw scores to standard scores.
       – Z-scores measure how many standard deviations from the mean a particular data value lies.
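As a small illustration of converting a raw score to a standard score, the sketch below computes a z-score and the corresponding normal tail area. The mean, standard deviation and limit are hypothetical numbers chosen for the example, not values from the presentation.

```python
from statistics import NormalDist

def z_score(x, mean, stdev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / stdev

# Hypothetical meter group: mean registration 100.0 %, standard deviation 0.5 %.
z = z_score(102.0, 100.0, 0.5)
# Fraction of a normal population expected above the 102.0 % limit.
tail_above = 1 - NormalDist().cdf(z)
print(f"z = {z:.1f}, fraction above limit = {tail_above:.6f}")
```

This is the same idea that underlies the quality indices in Z1.9: a specification limit is expressed in standard deviations from the mean, and the normal model turns that distance into an estimated percentage out of limits.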
  15. 15. Empirical Rule
     The Empirical Rule…
     [Figure: normal curve with the horizontal axis marked from -6 to +6 standard deviations]
     • 68.27 % of the data will fall within +/- 1 standard deviation
     • 95.45 % of the data will fall within +/- 2 standard deviations
     • 99.73 % of the data will fall within +/- 3 standard deviations
     • 99.9937 % of the data will fall within +/- 4 standard deviations
     • 99.999943 % of the data will fall within +/- 5 standard deviations
     • 99.9999998 % of the data will fall within +/- 6 standard deviations
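The percentages in the Empirical Rule follow from the normal distribution itself: the fraction of a normal population within k standard deviations of the mean is erf(k/√2). A minimal check, using only the Python standard library (the function name `coverage_within` is just a label for this sketch):

```python
import math

def coverage_within(k):
    """Fraction of a normal population within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in range(1, 7):
    print(f"+/- {k} standard deviations: {100 * coverage_within(k):.7f} %")
```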
  16. 16. Why Assess Normality?
     While many processes behave according to the normal distribution, many distributions in meter testing are not normal.
     There are many types of distributions:
     There are many statistical tools that assume normal distribution properties in their calculations such as Z1.9.
     So understanding just how “normal” the data is will impact how we look at the data.
  17. 17. Tools for Assessing Normality
     The shape of any normal curve can be calculated based on the normal probability density function.
     Tests for normality basically compare the shape of the calculated curve to the actual distribution of your data points.
     For the purposes of this training, we will focus on using the Anderson-Darling test and Normal Probability plots in MINITAB™ to assess normality.
     Watch that curve!
  18. 18. Goodness-of-Fit
     The Anderson-Darling test uses an empirical cumulative distribution function.
     [Figure: cumulative percent (0 to 100) versus raw data scale (3.0 to 5.5), comparing the curve expected for a normal distribution with the actual data; 20% is marked on the plot.]
     Departure of the actual data from the expected normal distribution: the Anderson-Darling Goodness-of-Fit test assesses the magnitude of these departures using an Observed minus Expected formula.
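For readers without MINITAB, the Anderson-Darling statistic can be sketched with the Python standard library. This is an illustrative implementation of the usual A² formula with mean and standard deviation estimated from the sample, plus the commonly used small-sample adjustment; it is not MINITAB's code and it does not compute a p-value. The roughly 0.752 critical value for the adjusted statistic at the 5% level is the commonly tabulated figure and should be treated as an assumption here. The test data are synthetic.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(data):
    """Return (A2, adjusted A2) for normality with estimated mean and sd."""
    y = sorted(data)
    n = len(y)
    fitted = NormalDist(mean(y), stdev(y))
    eps = 1e-12  # keep the CDF away from 0 and 1 so the logs stay finite
    cdf = [min(max(fitted.cdf(v), eps), 1 - eps) for v in y]
    # Weighted sum over ordered values; index i is 0-based, so the weight is 2i+1.
    total = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
                for i in range(n))
    a2 = -n - total / n
    a2_adj = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    return a2, a2_adj

# Synthetic data: 50 ideal normal scores, then the same set with one low outlier.
ideal = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
with_outlier = ideal[:-1] + [-10.0]

_, a2_ideal = anderson_darling_normal(ideal)
_, a2_outlier = anderson_darling_normal(with_outlier)
print(f"ideal: {a2_ideal:.3f}   with outlier: {a2_outlier:.3f}")
```

The larger the adjusted statistic, the larger the departure of the data from the fitted normal curve; values above the critical value correspond to a p-value below 0.05.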
  19. 19. The Normal Probability Plot
     [Figure: Probability Plot of Amount (Normal). Mean 84.69, StDev 7.913, N 70, AD 0.265, P-Value 0.684. Horizontal axis: Amount; vertical axis: Percent.]
     Notice the scale on the vertical axis.
     The Anderson-Darling test is a good litmus test for normality: if the P-value is more than 0.05, your data are normal enough for most purposes.
  20. 20. Anderson-Darling Caveat
     Use the Anderson Darling column to generate these graphs.
     [Figure: Summary for Anderson Darling (histogram with confidence intervals) and Probability Plot of Anderson Darling (Normal).]
     Anderson-Darling Normality Test: A-Squared 0.18, P-Value 0.921
     Mean 50.031, StDev 4.951, Variance 24.511, Skewness -0.061788, Kurtosis -0.180064, N 500
     Minimum 35.727, 1st Quartile 46.800, Median 50.006, 3rd Quartile 53.218, Maximum 62.823
     95% Confidence Interval for Mean: 49.596 to 50.466
     95% Confidence Interval for Median: 49.663 to 50.500
     95% Confidence Interval for StDev: 4.662 to 5.278
     In this case, both the Histogram and the Normality Plot look very “normal”. However, because the sample size is so large, the Anderson-Darling test is very sensitive and any slight deviation from normal will cause the p-value to be very low.
     Examples: Centron & AM 250 data
  21. 21. If the Data Is Not Normal, Don’t Panic!
     • There are lots of meaningful statistical tools you can use to analyze your data.
     • It just means you may have to think about your data in a slightly different way.
     Don’t touch that button!
  22. 22. Non Normality
     Why do we care if a data set is normally distributed?
       – When it is necessary to make inferences about the true nature of the population based on random samples drawn from the population.
       – For problem solving purposes, because we don’t want to make a bad decision: having normal data is so critical that with EVERY statistical test, the first thing we do is check for normality of the data.
     Some of the primary causes for non-normal data:
       – Skewness
       – Natural and Artificial Limits
       – Mixed Distributions - Multiple Modes
       – Kurtosis
  23. 23. Non Normal Data Analysis
     What happens if the process is not normally distributed?
     This is usually a concern when doing process capability analysis or hypothesis testing, but it can also be a factor in using Z1.9.
     • The Box-Cox or Johnson transformations are used to try to transform the data so that they become approximately normal.
     • Find another known distribution that fits the data.
     • Evaluate the risk of assuming a normal distribution using histograms, empirical cdfs, and OCCs
  24. 24. Box Cox Transformations
     • Box Cox transformations are used to try to convert non normal data into normal data
     • Box Cox transforms the input data, denoted by $g$, using $W = g^{\lambda}$, where $\lambda$ is any number typically between -5 and 5.
     • The trick is to choose the $\lambda$ that produces a curve that is as close to normal as possible.
     • Statistics software recommends a range for $\lambda$, often with a confidence interval.
     • User may choose values for $\lambda$ and observe the resulting curves.
     • Limited to positive data values and assumes the data is in subgroups.
     • Problem using with meter test data.
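The power transformation described on this slide can be sketched as follows. The function name `box_cox_power` is hypothetical, and using the natural logarithm when lambda is 0 is the usual convention rather than something stated on the slide. Choosing the best lambda (which statistics software does by searching over a range) is not shown.

```python
import math

def box_cox_power(values, lam):
    """Apply W = g**lam to strictly positive data (natural log when lam == 0)."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data values")
    if lam == 0:
        return [math.log(v) for v in values]   # conventional limiting case
    return [v ** lam for v in values]

# Hypothetical right-skewed data; lam = 0.5 is a square-root transformation.
raw = [1.0, 4.0, 9.0, 16.0, 100.0]
print(box_cox_power(raw, 0.5))
print(box_cox_power(raw, 0))
```

In practice one would try several values of lambda, re-check normality of the transformed values each time, and keep the value that gives the most nearly normal result.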
  25. 25. Exercise: Box Cox Transformations
     Examples: Dimensions Example, AM 250 Data
     [Figure: Box-Cox Plot of AM 250 Sort_1 no out. StDev versus Lambda (-5.0 to 5.0), with Lower CL, Upper CL and Limit marked. Lambda (using 95.0% confidence): Estimate 0.06, Lower CL -0.50, Upper CL 0.71, Rounded Value 0.00.]
     AM 250 data does not transform using Box Cox. Most meter data will not transform with Box Cox.
  26. 26. Johnson Transformations
     • Johnson transformations are used to try to convert non normal data into normal data.
     • Johnson transformation is chosen from three different functions.
     • Does not assume data is in subgroups.
     • The Johnson transformation is more powerful than a Box-Cox transformation, hence it works more often with meter data files.
  27. 27. Exercise: Johnson Transformations
     1. Use G1 data
     2. Based on the probability plots, what can you say about the original data and the transformed data?
     3. Using the transformed data in Z1.9
  28. 28. G1 Data & Histogram
     [Figure: Histogram of G1 Sorted Data with normal curve. Mean 99.41, StDev 3.344, N 57. Horizontal axis: FPL G1 Sorted; vertical axis: Frequency.]
     Data is Non Normal
     Variable | Total Count | N | Mean | StDev | Minimum | Median | Maximum | Range | Skewness | Kurtosis
     G1 | 57 | 57 | 99.410 | 3.344 | 74.650 | 99.890 | 100.110 | 25.460 | -7.51 | 56.61
  29. 29. G1 Johnson Transformed Data
     [Figure: Johnson Transformation for G1 Sorted_1, three panels: Probability Plot for Original Data; Select a Transformation (P-Value for AD test versus Z Value; P-Value = 0.005 means <= 0.005); Probability Plot for Transformed Data.]
     Original data: N 56, AD 3.003, P-Value <0.005
     Transformed data: N 56, AD 0.259, P-Value 0.703
     P-Value for Best Fit: 0.702832; Z for Best Fit: 0.6; Best Transformation Type: SU
     Transformation function equals 0.576014 + 0.884646 * Asinh( ( X - 99.9377 ) / 0.0840756 )
     Transformed Data Now Satisfies the Normality Requirement
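The fitted SU transformation function reported on this slide is easy to apply outside the statistics package. The sketch below hard-codes the four fitted constants from the slide; the names GAMMA, ETA, EPSILON and LAMBDA are conventional labels chosen for this illustration, and the example readings are hypothetical.

```python
import math

# Fitted constants copied from the transformation function on the slide.
GAMMA, ETA = 0.576014, 0.884646
EPSILON, LAMBDA = 99.9377, 0.0840756

def johnson_su(x):
    """Johnson SU transform: GAMMA + ETA * asinh((x - EPSILON) / LAMBDA)."""
    return GAMMA + ETA * math.asinh((x - EPSILON) / LAMBDA)

# Hypothetical % registration readings.
for reading in (99.5, 99.9377, 100.5):
    print(f"{reading:8.4f} -> {johnson_su(reading):7.4f}")
```

When transformed data is used in Z1.9, the specification limits have to be passed through the same function as the readings. Note that applying this particular function to 98.0 and 102.0 gives roughly -2.81 and 4.02, which are not the transformed limits quoted in the Z1.9 worksheet for the transformed G1 data (-2.47898 and 3.49830), so that worksheet presumably rests on a different fit.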
  30. 30. Z1.9 Calculations Using Johnson Transformed G1 Data
     EXAMPLE CALCULATIONS FOR ANSI/ASQ Z1.9 STANDARD DEVIATION METHOD WITH DOUBLE SPECIFICATION LIMITS
     Full Load Data Using Johnson Transformed Data
     Example meter group has a population of 3,000 meters and 2% accuracy. Use AQL = 0.4%
     Line | Information Needed | Value Obtained | Explanation
     1 | Sample Size: n | 57 | From Table I
     2 | Sum of % Registrations: ∑X | |
     3 | Sum of Squared % Registrations: ∑X² | |
     4 | Correction Factor (CF): (∑X)²/n | |
     5 | Corrected Sum of Squares (SS): ∑X² - CF | |
     6 | Variance (V): SS/(n-1) | |
     7 | Estimate of Lot Std. Deviation (S): √V | 1.089 |
     8 | Sample Mean (X bar): (∑X)/n | .012 |
     9 | Upper Specification Limit: U | 3.49830 | (102.0)
     10 | Lower Specification Limit: L | -2.47898 | (98.0)
     11 | Quality Index (upper): QU = (U - X bar)/S | 3.207 | (3.49830-.012)/1.089
     12 | Quality Index (lower): QL = (X bar - L)/S | 2.292 | (.012-(-2.47898))/1.08
     13 | Est. of Lot % Out of Limits Above U: Pu | 0.035% | From Table V
     14 | Est. of Lot % Out of Limits Below L: Pl | 0.954% | From Table V
     15 | Total Est. % Out of Limits: P = Pu+Pl | 0.989% | 0.035% + 0.954%
     16 | Maximum Allowable % Out of Limits: M | 1.16% | From Table II and IV using AQL = 0.4
     17 | Acceptability Criterion: Pu + Pl < M | 0.989% < 1.16% | Therefore, the meter group is acceptable for continued service.
     Acceptability Criterion: If the estimated lot percent nonconforming (P) is equal to or less than the maximum allowable percent nonconforming (M), the lot meets the acceptability criterion. If P is greater than M or if either QU or QL or both are negative, then the lot does not meet the acceptability criterion.
  31. 31. Checking for Alternative Distributions
     • Find another known distribution that fits the data
       – Normal
       – Lognormal
       – Exponential
       – Weibull
       – Smallest or Largest Extreme Value
       – Gamma
       – Logistic
       – Loglogistic
     • Can use Minitab or other statistics software package
     • Calculate probabilities based on distribution and determine how to accept the results
     • Very difficult to do
  32. 32. Exercise: Check for Alternative Distribution
     See Centron data
     [Figure: Probability Plot for Centron FL, four panels with 95% CI: Normal, Lognormal, 3-Parameter Lognormal, Exponential.]
     Goodness of Fit Test:
       – Normal: AD = 2.576, P-Value < 0.005
       – Lognormal: AD = 2.587, P-Value < 0.005
       – 3-Parameter Lognormal: AD = 2.565, P-Value = *
       – Exponential: AD = 564.693, P-Value < 0.003
     No Distributions Fit the Data Because It Is Basically a Normal Distribution with Kurtosis
  33. 33. Assessment of Risk – Histograms & CDF
     • Develop histogram of data
     • Develop cumulative probability distribution and compare to normal probability distribution
     • Calculate and plot empirical cdf using Minitab and compare to normal cdf
     • How close are the data plot and normal plot?
     Sampling is frequently from a population that is approximately normal. If the deviation from normality is not large, the best procedure may be to proceed with the standard Z1.9 methods and interpret the results with some degree of caution.
     See AM 250 Example Excel & Minitab Excel
  34. 34. Assessment of Risk – AM 250 Data Histograms & CDF
     [Figure: Histogram (with Normal Curve) of AM 250 Sort_1 no out. Mean 99.85, StDev 0.8336, N 7638. Horizontal axis: AM 250 Sort_1 no out; vertical axis: Frequency.]
     Data is not normal & Does Not Transform Using Box Cox or Johnson
  35. 35. Assessment of Risk - AM 250 Histogram CRF Data
     Compare Data CRF to Normal
     St Dev: 0.83355   Max: 105.7   Min: 94.3   Range: 11.4   Count: 7638
     Class | Bins | Frequency | Relative Frequency | Data Cumulative Relative Frequency | Normal Cumulative Relative Frequency
     93.95 to 94.95 | 94.95 | 2 | 0.03% | 0.03% | 0.000%
     94.95 to 95.95 | 95.95 | 7 | 0.09% | 0.12% | 0.000%
     95.95 to 96.95 | 96.95 | 12 | 0.16% | 0.27% | 0.026%
     96.95 to 97.95 | 97.95 | 65 | 0.85% | 1.13% | 1.147%
     97.95 to 98.95 | 98.95 | 871 | 11.40% | 12.53% | 14.107%
     98.95 to 99.95 | 99.95 | 3269 | 42.80% | 55.33% | 54.914%
     99.95 to 100.95 | 100.95 | 2938 | 38.47% | 93.79% | 90.701%
     100.95 to 101.95 | 101.95 | 392 | 5.13% | 98.93% | 99.416%
     101.95 to 102.95 | 102.95 | 45 | 0.59% | 99.52% | 99.990%
     102.95 to 103.95 | 103.95 | 19 | 0.25% | 99.76% | 100.000%
     103.95 to 104.95 | 104.95 | 10 | 0.13% | 99.90% | 100.000%
     104.95 to 105.95 | 105.95 | 8 | 0.10% | 100.00% | 100.000%
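The comparison in this table, data cumulative relative frequency against the normal cumulative value at each bin edge, can be reproduced approximately with the standard library. The sketch below uses the bin counts from the table, the standard deviation 0.83355 from this slide, and the rounded mean 99.85 from the AM 250 histogram, so its normal column will differ slightly from the slide's. The function name `cdf_gaps` is made up for the illustration.

```python
from statistics import NormalDist

# Upper bin edges and frequencies copied from the AM 250 table.
upper_edges = [94.95, 95.95, 96.95, 97.95, 98.95, 99.95,
               100.95, 101.95, 102.95, 103.95, 104.95, 105.95]
counts = [2, 7, 12, 65, 871, 3269, 2938, 392, 45, 19, 10, 8]

def cdf_gaps(edges, freqs, mean, stdev):
    """Return rows of (edge, data CRF, normal CDF, difference)."""
    normal = NormalDist(mean, stdev)
    total = sum(freqs)
    running = 0
    rows = []
    for edge, freq in zip(edges, freqs):
        running += freq
        data_crf = running / total
        normal_cdf = normal.cdf(edge)
        rows.append((edge, data_crf, normal_cdf, data_crf - normal_cdf))
    return rows

rows = cdf_gaps(upper_edges, counts, 99.85, 0.83355)
worst = max(rows, key=lambda row: abs(row[3]))
for edge, data_crf, normal_cdf, gap in rows:
    print(f"{edge:7.2f}  data {data_crf:8.3%}  normal {normal_cdf:8.3%}  gap {gap:+.3%}")
print(f"largest gap at bin edge {worst[0]}: {worst[3]:+.3%}")
```

The largest gap between the two cumulative curves gives a single number for how far the data departs from the normal model, which is one way to put the "how close are the plots?" question on a numerical footing.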
  36. 36. Assessment of Risk – Empirical CDF of AM 250
     Blue Curve = Normal CDF; Red Curve = AM 250 Data CDF
     [Figure: Empirical CDF of AM 250 Sort_1 no out (Normal). Mean 99.85, StDev 0.8336, N 7638. Horizontal axis: AM 250 Sort_1 no out; vertical axis: Percent.]
     Curves are nearly identical… Little Risk in Using Z1.9 Even Though Data is Not Normal
  37. 37. Assessment of Risk - OCCs
     Similar to Empirical cdf but difficult to evaluate
     The Acceptable Quality Limit (AQL) is the maximum percentage or proportion of nonconforming units in a lot that can be considered satisfactory as a process average for the purpose of acceptance sampling.
     [Figure: Typical OC Curve for AQL ~ 1%.]
     OCC shape will change depending on characteristics of the non normality.
  38. 38. Assessment of Risk - OCCs
     [Figure: ANSI/ASQ Z1.9 OCCs for Larger Population Size]
  39. 39. Can Always Punt – Use Z1.4
     • Normality is not a requirement. OCCs are usually calculated with Poisson, hypergeometric, or binomial distributions.
     • Go/No Go evaluation of the sample. Just count the number of meters that fail the test.
     • Downside: requires a larger sample size than Z1.9
       – Sample of 75 meters using Z1.9 requires 200 using Z1.4
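To illustrate how an operating characteristic curve for a go/no-go plan can be calculated with the binomial distribution, the sketch below computes the probability of accepting a lot for several true nonconforming rates. The sample size of 200 comes from the slide; the acceptance number of 5 is a hypothetical value chosen for the example and is not taken from the Z1.4 tables.

```python
from math import comb

def prob_accept(n, c, p):
    """Binomial probability of finding c or fewer failing meters in a sample of n."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c + 1))

SAMPLE_SIZE = 200      # from the slide
ACCEPT_NUMBER = 5      # hypothetical acceptance number, for illustration only

for pct in (0.5, 1.0, 2.5, 5.0):
    pa = prob_accept(SAMPLE_SIZE, ACCEPT_NUMBER, pct / 100)
    print(f"{pct:4.1f}% nonconforming -> P(accept) = {pa:.3f}")
```

Plotting the probability of acceptance against the percent nonconforming gives the OC curve for the plan. No normality assumption enters the calculation, which is why this approach works with non normal meter data.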
  40. 40. Summary
     If the Data Is Not Normal, Don’t Panic!
     • There are lots of meaningful statistical tools you can use to analyze your data.
     • It just means you may have to think about your data in a slightly different way.
     Best choices:
       – Use the Johnson transformation to try to transform the data so that it becomes approximately normal. Use Z1.9.
       – Using histograms and empirical cdfs, evaluate the risk of assuming a normal distribution and using Z1.9
       – Use Z1.4 and larger sample
