# Error analysis statistics

Published in: Education, Technology

### Error analysis statistics

1. Error Analysis - Statistics • Accuracy and Precision • Individual Measurement Uncertainty – Distribution of Data – Mean, Variance and Standard Deviation – Confidence Interval • Uncertainty of a Quantity Calculated from Several Measurements – Error Propagation • Least Squares Fitting of Data
2. Accuracy and Precision • Accuracy: closeness of the data (sample) to the “true value.” • Precision: closeness of the grouping of the data (sample) around some central value.
3. Accuracy and Precision • Precise but Inaccurate • Inaccurate & Imprecise [Figure: two plots of relative frequency vs. X value, each with the true value marked.]
4. Accuracy and Precision • Precise and Accurate • Accurate but Imprecise [Figure: two plots of relative frequency vs. X value, each with the true value marked.]
5. Accuracy and Precision Q: How do we quantify the concepts of accuracy and precision? That is, how do we characterize the error that occurred in our measurement?
6. Individual Measurement Statistics • Take $N$ measurements: $X_1, \ldots, X_N$. • Calculate the mean and standard deviation: $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} X_i$, $S_x = \left[\frac{1}{N}\sum_{i=1}^{N}\left(X_i - \bar{x}\right)^2\right]^{1/2}$. • What should we use as the “best value” and uncertainty so we can say we are Q% confident that the true value lies in the interval $x_{best} \pm \Delta x$? • We need to know how the data is distributed.
7. Population and Sample • Parent Population: the set of all possible measurements (e.g., a bag of marbles). • Sample: a subset of the population, i.e., the measurements actually made (e.g., a handful of marbles from the bag).
8. Histogram (Sample Based) • Histogram – a plot of the number of times a given value occurred. • Relative Frequency – a plot of the relative number of times a given value occurred. [Figure: a histogram (number of measurements vs. X value bin) and a relative frequency plot (relative frequency vs. X value bin), with bins from 30 to 80.]
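The difference between a histogram and a relative frequency plot can be sketched in a few lines of Python. This is only an illustration: the function name `relative_frequencies`, the bin layout (ten bins of width 5 starting at 30, loosely following the axis of the slide's figure) and the `readings` data are assumptions, not values from the slides.

```python
def relative_frequencies(data, start, width, n_bins):
    """Count the values falling in each bin, then normalize by the total count."""
    counts = [0] * n_bins
    for value in data:
        index = int((value - start) // width)  # bin index for this value
        if 0 <= index < n_bins:                # ignore values outside the bins
            counts[index] += 1
    total = sum(counts)
    rel_freq = [c / total for c in counts] if total else [0.0] * n_bins
    return counts, rel_freq


# Hypothetical measurements, made up for illustration only.
readings = [42, 47, 51, 53, 55, 56, 58, 61, 63, 72]
counts, rel_freq = relative_frequencies(readings, start=30, width=5, n_bins=10)
print(counts)    # histogram: number of measurements per bin
print(rel_freq)  # relative frequency: fraction of measurements per bin
```

The relative frequencies always sum to 1, which is why they approach the probability density (times the bin width) as the sample grows.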
9. Probability Distribution (Population Based) • Probability Density Function (pdf), $p(x)$ – Describes the probability distribution of all possible measures of $x$ (probability per unit change in $x$). – Limiting case of the relative frequency. • Probability Distribution Function, $P(x)$ – $P(x) = P[X \le x]$, the probability that $X \le x$. – The Probability Distribution Function is the integral of the pdf, i.e. $P(x) = \int_{-\infty}^{x} p(\xi)\,d\xi$. Q: Plot the probability distribution function vs. $x$. Q: What is the maximum value of $P(x)$? [Figure: probability density function vs. x value (bin), with bins from 30 to 80.]
10. Probability Density Function – The probability that a measurement $X$ takes a value in $(-\infty, \infty)$ is 1: $\int_{-\infty}^{\infty} p(x)\,dx = 1$. – Every pdf satisfies the above property. Ex: $p(x) = \frac{1}{A} e^{-x^2/B}$ is a probability density function. Find the relationship between $A$ and $B$. Hint: $\int_{0}^{\infty} e^{-a x^2}\,dx = \frac{1}{2}\sqrt{\frac{\pi}{a}}$. Q: Given a pdf, how would one find the probability that a measurement is between $A$ and $B$?
11. Common Statistical Distributions • Gaussian (Normal) Distribution: $p(x) = \frac{1}{\sigma_x\sqrt{2\pi}} \exp\left(-\frac{(x-\mu_x)^2}{2\sigma_x^2}\right)$, where $x$ = measured value, $\mu_x$ = true (mean) value, $\sigma_x$ = standard deviation, $\sigma_x^2$ = variance. Q: What are the two parameters that define a Gaussian distribution? Q: How would one calculate the probability of a Gaussian distribution between $x_1$ and $x_2$? (See Chapter 4, Appendix A.) [Figure: $p(x)$ vs. x value.]
12. Common Statistical Distributions • Uniform Distribution: $p(x) = \frac{1}{x_2 - x_1}$ for $x_1 \le x \le x_2$, and $p(x) = 0$ otherwise, where $x$ = measured value, $x_1$ = lower limit, $x_2$ = upper limit. Q: Why do $x_1$ and $x_2$ also define the magnitude of the uniform distribution PDF? [Figure: $p(x)$ vs. x value.]
13. Common Statistical Distributions Ex: A voltage measurement has a Gaussian distribution with a mean of 3.4 [V] and a standard deviation of 0.4 [V]. Using Chapter 4, Appendix A, calculate the probability that a measurement is between: (a) [2.98, 3.82] [V] (b) [2.4, 4.02] [V]. Ex: The quantization error of an ADC has a uniform distribution in the quantization interval Q. What is the probability that the actual input voltage is within Q/8 of the estimated input voltage?
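Both examples on this slide reduce to integrating a pdf between two limits. The sketch below replaces the table lookup in Appendix A (not reproduced here) with Python's standard-library `statistics.NormalDist`. The helper names are made up for illustration, and the ADC case assumes that the quantization error is uniform on the centered interval [-Q/2, +Q/2], which the slide does not state explicitly.

```python
from statistics import NormalDist


def gaussian_interval_probability(mean, std, low, high):
    """P(low <= X <= high) for a Gaussian, as a difference of two CDF values."""
    dist = NormalDist(mu=mean, sigma=std)
    return dist.cdf(high) - dist.cdf(low)


def uniform_interval_probability(x1, x2, low, high):
    """P(low <= X <= high) for a uniform pdf of height 1/(x2 - x1) on [x1, x2]."""
    low, high = max(low, x1), min(high, x2)  # clip to the support of the pdf
    if high <= low:
        return 0.0
    return (high - low) / (x2 - x1)


# Voltage example: mean 3.4 V, standard deviation 0.4 V.
p_a = gaussian_interval_probability(3.4, 0.4, 2.98, 3.82)
p_b = gaussian_interval_probability(3.4, 0.4, 2.40, 4.02)

# ADC example in units of Q, ASSUMING the error is uniform on [-Q/2, +Q/2].
p_adc = uniform_interval_probability(-0.5, 0.5, -0.125, 0.125)

print(p_a, p_b, p_adc)
```

Interval (a) is symmetric about the mean (±1.05 standard deviations), while (b) is not (-2.5 to +1.55 standard deviations), which is why a CDF difference is used rather than a single two-sided table entry.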
14. Statistical Analysis • Standard Deviation ($\sigma_x$ and $S_x$) – Characterizes the typical deviation of measurements from the mean and the width of the Gaussian distribution (bell curve). – Smaller $\sigma_x$ implies better ______________. – Population based: $\sigma_x = \left[\int_{-\infty}^{\infty} (x-\mu_x)^2\, p(x)\,dx\right]^{1/2}$ – Sample based ($N$ samples): $S_x = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_i - \mu_x\right)^2}$ Q: Often we do not know $\mu_x$; how should we calculate $S_x$?
15. Statistical Analysis • Standard Deviation ($\sigma_x$ and $S_x$) (cont.) For an error level $Z$, the interval is $\mu_x - Z\sigma_x \le x \le \mu_x + Z\sigma_x$.

    | Common name for "error" level | Error level in terms of $\sigma$ | % that the deviation from the mean is smaller | Odds that the deviation is greater |
    |---|---|---|---|
    | Standard deviation | $\pm\sigma$ | 68.3 | about 1 in 3 |
    | "Two-sigma error" | $\pm 2\sigma$ | 95 | 1 in 20 |
    | "Three-sigma error" | $\pm 3\sigma$ | 99.7 | 1 in 370 |
    | "Four-sigma error" | $\pm 4\sigma$ | 99.994 | 1 in 16,000 |
16. Statistical Analysis • The sampled mean $\bar{x}$ is the best estimate of $\mu_x$: $\mu_x = E[X] = \int_{-\infty}^{\infty} x\, p(x)\,dx$, with best estimate $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} X_i$. • Sampled Standard Deviation ($S_x$) – Use $\bar{x}$ when $\mu_x$ is not available, which reduces the degrees of freedom by one: $S_x = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_i - \mu_x)^2}$ becomes, when $\mu_x$ is not known, $S_x = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{x})^2}$. Q: If the sampled mean is only an estimate of the “true mean” $\mu_x$, how do we characterize its error? Q: If we take another set of samples, will we get a different sampled mean? Q: If we take many more sample sets, what will be the statistics of the set of sampled means?
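The two forms of the sampled standard deviation differ only in the reference value and the divisor. A minimal sketch follows, with hypothetical function names and made-up data; the true mean `mu=5.0` is assumed known purely for the comparison.

```python
import math


def sample_mean(values):
    """Best estimate of the true mean: the arithmetic average."""
    return sum(values) / len(values)


def std_with_known_mean(values, mu):
    """S_x when the true mean mu is known: divide by N."""
    n = len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def std_with_sample_mean(values):
    """S_x when mu is replaced by the sampled mean: divide by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    x_bar = sample_mean(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


# Made-up data for illustration; the true mean is ASSUMED to be 5.0.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
x_bar = sample_mean(data)
s_known = std_with_known_mean(data, mu=5.0)
s_sample = std_with_sample_mean(data)
print(x_bar, s_known, s_sample)
```

Python's standard library offers the same two estimators as `statistics.pstdev` (divide by N) and `statistics.stdev` (divide by N - 1).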
17. Statistical Analysis Ex: The inlet pressure of a steam generator was measured 100 times during a 12-hour period. The specified inlet pressure is 4.00 MPa, with 0.7% allowable fluctuation. The measured data is summarized in the following table:

    | Pressure P (MPa) | Number of results m |
    |---|---|
    | 3.970 | 1 |
    | 3.980 | 3 |
    | 3.990 | 12 |
    | 4.000 | 25 |
    | 4.010 | 33 |
    | 4.020 | 17 |
    | 4.030 | 6 |
    | 4.040 | 2 |
    | 4.050 | 1 |

    (1) Calculate the mean, variance and standard deviation. (2) Given the data, what pressure range will contain 95% of the data?
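One way to work the steam generator example is to treat each table row as a value repeated m times. The sketch below makes two assumptions that the slide leaves open: the variance uses the N - 1 divisor (because the mean is estimated from the same data), and the 95% range is taken as the mean ± 1.96 standard deviations, which presumes the data is roughly Gaussian. The function name `grouped_statistics` is hypothetical.

```python
import math
from statistics import NormalDist

# Pressure (MPa) -> number of results, copied from the example.
pressure_counts = {
    3.970: 1, 3.980: 3, 3.990: 12, 4.000: 25, 4.010: 33,
    4.020: 17, 4.030: 6, 4.040: 2, 4.050: 1,
}


def grouped_statistics(table):
    """Mean, variance and standard deviation of values given with repeat counts."""
    n = sum(table.values())
    mean = sum(p * m for p, m in table.items()) / n
    # ASSUMPTION: N - 1 divisor, since the mean is estimated from the data.
    variance = sum(m * (p - mean) ** 2 for p, m in table.items()) / (n - 1)
    return n, mean, variance, math.sqrt(variance)


n, mean, variance, std = grouped_statistics(pressure_counts)

# ASSUMPTION: Gaussian data, so 95% lies within about 1.96 standard deviations.
z95 = NormalDist().inv_cdf(0.975)
low95, high95 = mean - z95 * std, mean + z95 * std
print(n, mean, variance, std, (low95, high95))
```

The resulting range can then be compared against the 0.7% allowable fluctuation around 4.00 MPa.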
18. Confidence Interval • Sampled Mean Statistics – If $N$ is large, $\bar{x}$ will also have a Gaussian distribution (Central Limit Theorem). – Mean of $\bar{x}$: $\mu_{\bar{x}} = E[\bar{x}] = \mu_x$, so $\bar{x}$ is an unbiased estimate. – Standard deviation of $\bar{x}$: $\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}$; $\sigma_{\bar{x}}$ is the best estimate of the error in estimating $\mu_x$. Q: Since we don’t know $\sigma_x$, how would we calculate $\sigma_{\bar{x}}$? [Figure: distributions $p(x)$ and $p(\bar{x})$, both centered at $\mu_x$.]
19. Confidence Interval • For large samples ($N > 60$), Q% of all the sampled means $\bar{x}$ will lie in the interval $\mu_x - z_Q \frac{\sigma_x}{\sqrt{N}} \le \bar{x} \le \mu_x + z_Q \frac{\sigma_x}{\sqrt{N}}$. Equivalently, $\bar{x} - z_Q \frac{\sigma_x}{\sqrt{N}} \le \mu_x \le \bar{x} + z_Q \frac{\sigma_x}{\sqrt{N}}$ is the Q% confidence interval. When $\sigma_x$ is unknown, $S_x$ will be a reasonable approximation.
20. Confidence Interval Ex: 64 acceleration measurements were taken during an experiment. The estimated mean and standard deviation of the measurements were 3.15 m/s² and 0.4 m/s². (1) Find the 98% confidence interval for the true mean. (2) How confident are you that the true mean will be in the range from 2.85 to 3.45 m/s²?
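The acceleration example can be sketched with the large-sample (Gaussian) interval, using $S_x$ in place of the unknown $\sigma_x$. The function names below are hypothetical, and `statistics.NormalDist` stands in for the z table.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(x_bar, s_x, n, confidence):
    """Two-sided large-sample interval: x_bar +/- z_Q * s_x / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * s_x / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


def confidence_for_interval(x_bar, s_x, n, low, high):
    """Confidence that the true mean lies in [low, high], Gaussian sampled mean."""
    sampling = NormalDist(mu=x_bar, sigma=s_x / math.sqrt(n))
    return sampling.cdf(high) - sampling.cdf(low)


# Values from the example: N = 64, mean 3.15 m/s^2, standard deviation 0.4 m/s^2.
ci_low, ci_high = mean_confidence_interval(3.15, 0.4, 64, confidence=0.98)
confidence = confidence_for_interval(3.15, 0.4, 64, low=2.85, high=3.45)
print((ci_low, ci_high), confidence)
```

Here the standard deviation of the sampled mean is 0.4/√64 = 0.05 m/s², so the range in part (2) spans ±6 of those standard deviations.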
21. Confidence Interval • For small samples ($N < 60$), the Q% confidence interval can be calculated using the Student's t distribution, which is similar to the normal distribution but depends on $N$. – With Q% confidence, the true mean $\mu_x$ will lie in the following interval about any sampled mean: $\bar{x} - t_{\nu,Q} \frac{S_x}{\sqrt{N}} \le \mu_x \le \bar{x} + t_{\nu,Q} \frac{S_x}{\sqrt{N}}$ (the Q% confidence interval), where $\nu = N - 1$ and $t_{\nu,Q}$ is defined in class notes Chapter 4, Appendix B.
22. Confidence Interval Ex: A simple postal scale is supplied with ½, 1, 2, and 4 oz brass weights. For a quality check, 14 of the 1 oz weights were measured on a precision scale. The results, in oz, are as follows: 1.08, 1.03, 0.96, 0.95, 1.04, 1.01, 0.98, 0.99, 1.05, 1.08, 0.97, 1.00, 0.98, 1.01. Based on this sample, and assuming that the parent population of the weights is normally distributed, what is the 95% confidence interval for the “true” weight of the 1 oz brass weights?
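A small-sample interval for the brass weights can be sketched as follows. The Python standard library has no Student's t distribution, so the critical value is hard-coded: 2.160 is the two-sided 95% value for ν = 13 from a standard t table, standing in for the Appendix B lookup, which is not reproduced here. The variable names are illustrative only.

```python
import math
from statistics import mean, stdev

# The 14 measured weights (oz) from the example.
weights_oz = [1.08, 1.03, 0.96, 0.95, 1.04, 1.01, 0.98,
              0.99, 1.05, 1.08, 0.97, 1.00, 0.98, 1.01]

# ASSUMPTION: two-sided 95% Student-t critical value for nu = N - 1 = 13,
# taken from a standard t table rather than from the course appendix.
T_95_NU_13 = 2.160

n = len(weights_oz)
x_bar = mean(weights_oz)   # sampled mean
s_x = stdev(weights_oz)    # sampled standard deviation (N - 1 divisor)
half_width = T_95_NU_13 * s_x / math.sqrt(n)
ci_low, ci_high = x_bar - half_width, x_bar + half_width
print(x_bar, s_x, (ci_low, ci_high))
```

With a different sample size or confidence level, the critical value has to be looked up again for the new ν and Q.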
23. Propagation of Error Q: If you measured the diameter ($D$) and height ($h$) of a cylindrical container, how would the measurement error affect your estimation of the volume ($V = \pi D^2 h/4$)? Q: What is the uncertainty in calculating the kinetic energy ($mv^2/2$) given the uncertainties in the measurements of mass ($m$) and velocity ($v$)? How do errors propagate through calculations?
24. Propagation of Error • A Simple Example: Suppose that $y$ is related to two independent quantities $X_1$ and $X_2$ through $y = C_1 X_1 + C_2 X_2 = f(X_1, X_2)$. To relate the changes in $y$ to the uncertainties in $X_1$ and $X_2$, we need to find $dy = g(dX_1, dX_2)$: $dy = \frac{\partial f}{\partial X_1}\,dX_1 + \frac{\partial f}{\partial X_2}\,dX_2$. The magnitude of $dy$ is the expected change in $y$ due to the uncertainties $\delta_{x_1}$ and $\delta_{x_2}$ in $X_1$ and $X_2$: $\delta_y = \sqrt{\left(\frac{\partial f}{\partial X_1}\delta_{x_1}\right)^2 + \left(\frac{\partial f}{\partial X_2}\delta_{x_2}\right)^2} = \sqrt{C_1^2\,\delta_{x_1}^2 + C_2^2\,\delta_{x_2}^2}$.
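25. Propagation of Error • General Formula: Suppose that $y$ is related to $n$ independent measured variables $\{X_1, X_2, \ldots, X_n\}$ by a functional representation $y = f(X_1, X_2, \ldots, X_n)$. Given the uncertainties of the $X$'s around some operating points, $\bar{x}_1 \pm \delta_{x_1}, \bar{x}_2 \pm \delta_{x_2}, \ldots, \bar{x}_n \pm \delta_{x_n}$, the expected value of $y$ and its uncertainty $\delta_y$ are: $\bar{y} = f(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$ and $\delta_y = \sqrt{\left(\frac{\partial f}{\partial X_1}\delta_{x_1}\right)^2 + \left(\frac{\partial f}{\partial X_2}\delta_{x_2}\right)^2 + \cdots + \left(\frac{\partial f}{\partial X_n}\delta_{x_n}\right)^2}$, with the partial derivatives evaluated at $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$.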
26. Propagation of Error • Proof: Assume that the variability in measurement $y$ is caused by $k$ independent zero-mean error sources $e_1, e_2, \ldots, e_k$. Then $(y - y_{true})^2 = (e_1 + e_2 + \cdots + e_k)^2 = e_1^2 + e_2^2 + \cdots + e_k^2 + 2e_1e_2 + 2e_1e_3 + \cdots$. Taking expected values, $E[(y - y_{true})^2] = E[e_1^2 + e_2^2 + \cdots + e_k^2 + 2e_1e_2 + 2e_1e_3 + \cdots] = E[e_1^2 + e_2^2 + \cdots + e_k^2]$, because the cross terms vanish for independent zero-mean errors. Hence $\sigma_y = \sqrt{E[e_1^2] + E[e_2^2] + \cdots + E[e_k^2]} = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_k^2}$.
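27. Propagation of Error • Example (Standard Deviation of Sampled Mean): Given $\bar{x} = \frac{1}{N}\left(X_1 + X_2 + X_3 + \cdots + X_N\right)$, use the general formula for error propagation: $\delta_{\bar{x}} = \sqrt{\left(\frac{\partial \bar{x}}{\partial X_1}\delta_{x_1}\right)^2 + \left(\frac{\partial \bar{x}}{\partial X_2}\delta_{x_2}\right)^2 + \left(\frac{\partial \bar{x}}{\partial X_3}\delta_{x_3}\right)^2 + \cdots + \left(\frac{\partial \bar{x}}{\partial X_N}\delta_{x_N}\right)^2}$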
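The slide stops at the general formula. Under the additional assumption that every measurement carries the same uncertainty, $\delta_{x_i} = \sigma_x$, the remaining steps can be written out as below; the result matches the standard deviation of the sampled mean quoted on the Confidence Interval slides.

```latex
\begin{align*}
\frac{\partial \bar{x}}{\partial X_i} &= \frac{1}{N}
  \quad \text{for every } i = 1, \ldots, N, \\
\delta_{\bar{x}}
  &= \sqrt{\sum_{i=1}^{N} \left( \frac{1}{N}\,\delta_{x_i} \right)^{2}}
   = \sqrt{N \cdot \frac{\sigma_x^{2}}{N^{2}}}
   = \frac{\sigma_x}{\sqrt{N}} = \sigma_{\bar{x}} .
\end{align*}
```

Averaging $N$ independent measurements therefore reduces the uncertainty by a factor of $\sqrt{N}$.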
28. Propagation of Error Ex: What is the uncertainty in calculating the kinetic energy ($KE = mv^2/2$) given the uncertainties in the measurements of mass ($m$) and velocity ($v$)? $\delta_{KE} = \sqrt{\left(\frac{\partial KE}{\partial m}\delta_m\right)^2 + \left(\frac{\partial KE}{\partial v}\delta_v\right)^2} = \sqrt{\left(\frac{1}{2}v^2\,\delta_m\right)^2 + \left(m v\,\delta_v\right)^2} = \frac{1}{2}mv^2\sqrt{\left(\frac{\delta_m}{m}\right)^2 + \left(\frac{2\,\delta_v}{v}\right)^2}$
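The kinetic energy result can be checked numerically. The sketch below evaluates the propagated uncertainty in two ways: directly from the partial derivatives, and from the relative-uncertainty form. The function names and the values of `m`, `v`, `dm` and `dv` are hypothetical, chosen only for illustration.

```python
import math


def kinetic_energy(m, v):
    """KE = m * v^2 / 2."""
    return 0.5 * m * v ** 2


def ke_uncertainty(m, v, dm, dv):
    """Propagated uncertainty from the partial derivatives of KE."""
    d_ke_dm = 0.5 * v ** 2  # partial derivative of KE with respect to m
    d_ke_dv = m * v         # partial derivative of KE with respect to v
    return math.sqrt((d_ke_dm * dm) ** 2 + (d_ke_dv * dv) ** 2)


def ke_uncertainty_relative(m, v, dm, dv):
    """Same quantity via KE * sqrt((dm/m)^2 + (2*dv/v)^2)."""
    return kinetic_energy(m, v) * math.sqrt((dm / m) ** 2 + (2.0 * dv / v) ** 2)


# Hypothetical measurement: m = 2.00 +/- 0.02 kg, v = 3.00 +/- 0.06 m/s.
m, v, dm, dv = 2.0, 3.0, 0.02, 0.06
ke = kinetic_energy(m, v)
d_ke = ke_uncertainty(m, v, dm, dv)
d_ke_rel = ke_uncertainty_relative(m, v, dm, dv)
print(ke, d_ke, d_ke_rel)
```

Because velocity enters squared, its relative uncertainty counts twice as much as that of the mass.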
29. Least Squares Fitting of Data • Best Linear Fit – How do we characterize “BEST”? Fit a linear model (relation) $\hat{y}_i = a_o + a_1 x_i$ to $N$ pairs of $[x_i, y_i]$ measurements. Given $x_i$, the error between the estimated output $\hat{y}_i$ and the measured output $y_i$ is $n_i = y_i - \hat{y}_i$. The “BEST” fit is the model that minimizes the sum of the ___________ of the error (Least Square Error): $\min \sum_{i=1}^{N} n_i^2 = \min \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2$. [Figure: output Y vs. input X, showing the measured outputs $y_i$ and the best linear fit $y_{est}$.]
30. Least Squares Fitting of Data Let $J = \sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{N}\left(y_i - a_o - a_1 x_i\right)^2$. The two independent variables are? Minimize $J$: find $a_o$ and $a_1$ such that $dJ = 0$, i.e. $\frac{\partial J}{\partial a_o} = 0$ and $\frac{\partial J}{\partial a_1} = 0$, which gives $\sum_{i=1}^{N} -2\left(y_i - a_o - a_1 x_i\right) = 0$ and $\sum_{i=1}^{N} -2\,x_i\left(y_i - a_o - a_1 x_i\right) = 0$. Q: What are we trying to solve?
31. Least Squares Fitting of Data Rewrite the last two equations as two simultaneous equations for $a_o$ and $a_1$: $a_o N + a_1 \sum x_i = \sum y_i$ and $a_o \sum x_i + a_1 \sum x_i^2 = \sum x_i y_i$, or in matrix form $\begin{bmatrix} N & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \begin{bmatrix} a_o \\ a_1 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}$. Solving gives $a_o = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta}$ and $a_1 = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{\Delta}$, where $\Delta = N \sum x_i^2 - \left(\sum x_i\right)^2$.
32. Least Squares Fitting of Data • Summary: Given $N$ pairs of input/output measurements $[x_i, y_i]$, the best linear Least Squares model from input $x_i$ to output $y_i$ is $\hat{y}_i = a_o + a_1 x_i$, where $a_o = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta}$, $a_1 = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{\Delta}$ and $\Delta = N \sum x_i^2 - \left(\sum x_i\right)^2$. • The process of minimizing squared error can be used for fitting nonlinear models and in many engineering applications. • The same result can also be derived from a probability distribution point of view (see Course Notes, Ch. 4 - Maximum Likelihood Estimation). Q: Given a theoretical model $y = a_o + a_2 x^2$, what are the Least Squares estimates for $a_o$ and $a_2$?
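The closed-form solution in the summary translates directly into code. This is a minimal sketch: the function name `linear_least_squares` is hypothetical and the data points are made up for illustration.

```python
def linear_least_squares(xs, ys):
    """Return (a_o, a_1) minimizing sum((y_i - a_o - a_1 * x_i)^2)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    delta = n * sum_xx - sum_x ** 2  # determinant of the normal equations
    if delta == 0:
        raise ValueError("need at least two distinct x values")
    a_o = (sum_xx * sum_y - sum_x * sum_xy) / delta
    a_1 = (n * sum_xy - sum_x * sum_y) / delta
    return a_o, a_1


# Made-up measurements scattered around a straight line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a_o, a_1 = linear_least_squares(xs, ys)
print(a_o, a_1)
```

The guard on `delta` covers the degenerate case where all x values coincide and no unique line exists.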
33. Least Squares Fitting of Data • Variance of the fit: $\sigma_n^2 = \frac{1}{N-2}\sum_{i=1}^{N}\left(y_i - a_o - a_1 x_i\right)^2$ • Variance of the measurements in $y$: $\sigma_y^2$ • Assume measurements in $x$ are precise. • Correlation coefficient: $R^2 = 1 - \frac{\sigma_n^2}{\sigma_y^2} \approx 1 - \frac{S_n^2}{S_y^2}$ is a measure of how well the model explains the data. $R^2 = 1$ implies that the linear model fits the data perfectly.
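The goodness-of-fit measure can be sketched as follows. The function name `fit_quality` is hypothetical, and the variance of y is assumed to be estimated from the sample with the N - 1 divisor, which the slide does not spell out. The data is made up, and the coefficients passed in are the least-squares fit of that same data.

```python
def fit_quality(xs, ys, a_o, a_1):
    """Return (variance of the fit, sample variance of y, R^2) for a linear model."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points for the N - 2 divisor")
    residual_ss = sum((y - a_o - a_1 * x) ** 2 for x, y in zip(xs, ys))
    var_fit = residual_ss / (n - 2)  # two fitted parameters: a_o and a_1
    y_bar = sum(ys) / n
    # ASSUMPTION: sample variance of y with the N - 1 divisor.
    var_y = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    return var_fit, var_y, 1.0 - var_fit / var_y


# Made-up data; a_o = 0.05 and a_1 = 1.99 are its least-squares coefficients.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
var_fit, var_y, r_squared = fit_quality(xs, ys, a_o=0.05, a_1=1.99)
print(var_fit, var_y, r_squared)
```

Because the two variances use different divisors, this form of $R^2$ can fall slightly below zero for a model that explains nothing.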