- 1. Elementary statistics, by Atcharaporn Khoomtong
- 2. Outline: Introduction; Statistical methodology; Steps of scientific research; Important parametric tests; Important nonparametric tests; Examples using the Excel program; Using Excel for Statistics in Gateway Cases (Office 2007)
- 3. Most people become familiar with probability and statistics through radio, television, newspapers, and magazines. For example, the following statements were found in newspapers: "Eating 10 grams (g) of fiber a day reduces the risk of heart attack by 14%." "Thirty minutes of exercise two or three times each week can raise HDLs by 10 to 15%."
- 4. Statistics is used to analyze the results of surveys and as a tool in scientific research to make decisions based on controlled experiments. Other uses of statistics include operations research, quality control, estimation, and prediction.
- 5. What’s it? Flower
- 6. Statistics, as the basis of data analysis, is concerned with two basic types of problems: (1) summarizing, describing, and exploring the data, which is covered by descriptive statistics; and (2) using sampled data to infer the nature of the process that produced the data, which is covered by inferential statistics.
- 7. Descriptive statistics plays an important role in the description of mass phenomena. Data are organized and summarized for clear presentation and ease of communication. The data may come from studies of populations or samples. Descriptive statistics offers methods to summarize a collection of data. These methods may be numerical or graphical, and each kind has its own advantages and disadvantages.
- 8. Inferential statistics is used to draw conclusions about a data set. Usually this means drawing inferences about a population from a sample, either by estimating some relationship or by testing some hypothesis. A population is the set of all possible states of a random variable; its size may be either infinite or finite. A sample is a subset of the population; its size is always finite.
- 9. Descriptive statistics: graphical (arrange data in tables; bar graphs and pie charts); numerical (percentages, averages, range); relationships (correlation coefficient, regression analysis). Inferential statistics: confidence interval; compare means of two samples (t test, pre/post); compare means from three samples (F-test; ANOVA = analysis of variance; LSD, DMRT).
- 10. Another important aspect of data analysis is the data itself, which can be of two different types: qualitative data (e.g. sex, color, smell, taste) and quantitative data (e.g. height, weight, percentage). Qualitative data does not contain quantitative information, but it can be classified into categories.
- 11. Types of scale:

  | Type of scale | Possible statements | Allowed operators | Examples |
  | --- | --- | --- | --- |
  | Nominal scale | identity, countable | =, ≠ | colors, phone numbers, feelings |
  | Ordinal scale | identity, less than/greater than relations, countable | =, ≠, <, > | soccer league table, military ranks, energy efficiency classes |
  | Interval scale | identity, less than/greater than relations, equality of differences | =, ≠, <, >, +, − | dates (years), temperature in Celsius, IQ scale |
  | Ratio scale | identity, less than/greater than relations, equality of differences, equality of ratios, zero point | =, ≠, <, >, +, −, ×, ÷ | velocities, lengths, temperature in Kelvin, age |

- 12. Steps of scientific research: collecting the necessary facts; analyzing the facts (descriptive statistics); assessing the results (inferential statistics); making decisions; carrying out decisions.
- 13. Mode = the most frequent value. Median = the value of the middle point of the ordered measurements. Mean = the average (balancing point in the distribution). Variance = the average of the squared deviations of all the population measurements from the population mean. Standard deviation = the square root of the variance.
- 14. Descriptive formula (population): $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$. Inferential formula (sample): $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$, called the “unbiased estimator of the population value”.
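- 15. Population of profit margins for five companies: 8%, 10%, 15%, 12%, 5%. Mean: $\mu = \frac{8 + 10 + 15 + 12 + 5}{5} = \frac{50}{5} = 10\%$. Variance: $\sigma^2 = \frac{(8-10)^2 + (10-10)^2 + (15-10)^2 + (12-10)^2 + (5-10)^2}{5} = \frac{4 + 0 + 25 + 4 + 25}{5} = \frac{58}{5} = 11.6$. Standard deviation: $\sigma = \sqrt{11.6} = 3.406\%$.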
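
The profit-margin example can be checked with Python's standard `statistics` module. This is a minimal sketch rather than part of the original slides; it also prints the sample variance (divisor n − 1) so the descriptive and inferential formulas can be compared on the same data.

```python
import math
import statistics

# Profit margins (%) for the five companies in the worked example.
margins = [8, 10, 15, 12, 5]

mean = statistics.mean(margins)            # balancing point of the data
median = statistics.median(margins)        # middle value of the ordered data
pop_var = statistics.pvariance(margins)    # divides by N (descriptive formula)
pop_sd = math.sqrt(pop_var)                # square root of the variance
sample_var = statistics.variance(margins)  # divides by n - 1 (inferential formula)

print("mean:", mean)
print("median:", median)
print("population variance:", pop_var)
print("population standard deviation:", round(pop_sd, 3))
print("sample variance:", sample_var)
```

The sample variance is larger than the population variance here because the same sum of squares (58) is divided by 4 instead of 5.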
- 16. Hypothesis = an assumption or supposition to be proved or disproved, for example: “automobile A is performing as well as automobile B.”
- 17. Null hypothesis ($H_0$): expresses no difference; often said “H naught”. $H_0: \mu = 0$ (or any number); later, $H_0: \mu_1 = \mu_2$. Alternative hypothesis ($H_1$ or $H_A$). Example: $H_0: \mu = 0$ (null hypothesis); $H_A: \mu \neq 0$ (alternative hypothesis).
- 18. Type I error (α): rejecting H0 when H0 is true. Type II error (β): accepting H0 when H1 is true.
- 19. If the calculated F value is greater than the critical F value, the result is significant: reject H0. If the calculated F value is lower than the critical F value, the result is not significant: accept H0.
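- 20. Decision versus truth:

  | Decision (from data) | Truth: H0 correct | Truth: HA correct |
  | --- | --- | --- |
  | Decide H0 (“fail to reject H0”) | 1 − α (true negative) | β (false negative) |
  | Decide HA (“reject H0”) | α (false positive) | 1 − β (true positive) |

  α = significance level; 1 − β = power.
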
- 21. Z-test; t-test; F-test
- 22. The z-test is based on the normal probability distribution and is used for judging the significance of several statistical measures, particularly the mean. The z-test is generally used for comparing the mean of a sample to some hypothesized mean for the population in the case of a large sample (n > 30).
- 23. The t-test is based on the t-distribution and is considered an appropriate test for judging the significance of a sample mean, or the significance of the difference between the means of two samples, in the case of small sample(s) when the population variance is not known (in which case we use the variance of the sample as an estimate of the population variance). The t-test applies only in the case of small sample(s) when the population variance is unknown. With unknown variance, under $H_0$: $t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{(n-1)}$. Critical values come from statistics books or a computer. The t-distribution is approximately normal for degrees of freedom (df) > 30.
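
As a rough illustration of the one-sample t statistic described above, the sketch below computes t and its degrees of freedom with the standard library. The function name `one_sample_t` and the measurements are hypothetical, invented for illustration; they do not come from the slides.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for H0: population mean == mu0, variance unknown."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divisor n - 1)
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical measurements, invented for illustration only.
sample = [5.1, 4.9, 5.0, 5.2, 4.8]
t_value, df = one_sample_t(sample, mu0=4.8)
print("t =", round(t_value, 3), "with df =", df)
```

The resulting t would then be compared with the critical value for df = n − 1 from a t table or from software.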
- 24. The F-test is based on the F-distribution and is used to compare the variances of two independent samples. This test is also used in the context of analysis of variance (ANOVA) for judging the significance of more than two sample means at one and the same time. The test statistic, F, is calculated and compared with its probable value (found in the F-ratio tables for the degrees of freedom of the greater and smaller variances at the specified level of significance) to accept or reject the null hypothesis.
- 25. ANOVA table for a one-way ANOVA with N observations and T treatments:

  | Source | df | SS | MS | F |
  | --- | --- | --- | --- | --- |
  | Treatment | T − 1 | SStrt | SStrt/(T − 1) | MStrt/MSerr |
  | Error | by subtraction | SSerr | SSerr/dferr | |
  | Total | N − 1 | | | |

  Finally, you (or the PC) consult tables or otherwise obtain the probability of obtaining this F value given the df for treatment and error.

- 26. Step 1: Calculate N, Σx, and Σx² for the whole dataset. Step 2: Find the correction factor, CF = (Σx × Σx)/N. Step 3: Find the total sum of squares for the data, Σ(xᵢ²) − CF. Step 4: Add up the totals for each treatment in turn (Xt.), then calculate the treatment sum of squares, SStrt = Σt(Xt. × Xt.)/r − CF, where Xt. is the sum of all values within treatment t and r is the number of observations that went into that total. Step 5: Draw up the ANOVA table, getting the error terms by subtraction.
- 27. Experimental designs: Completely Randomized Design (CRD); Randomized Complete Block Design (RBD); Latin Square (LQ). Mean comparison: LSD (Least Significant Difference); DMRT (Duncan’s New Multiple Range Test). Terms: treatments; replication; degrees of freedom (df).
- 28. Most people have difficulties in determining whether a model is linear or nonlinear. Before discussing the issues of linear vs. nonlinear systems, let's have a short look at some examples, displaying several types of discrimination lines between two classes. (Figure: three example panels of discrimination lines, labeled nonlinear and linear.)
- 29. Here's the answer: linear models are linear in the parameters which have to be estimated, but not necessarily in the independent variables. This explains why the middle of the three figures above shows a linear discrimination line between the two classes, although the line is not linear in the sense of a straight line.
- 30. When calculating a regression model, we are interested in a measure of the usefulness of the model. There are several ways to do this, one of them being the coefficient of determination (also sometimes called goodness of fit). The concept behind this coefficient is to calculate the reduction of the error of prediction when the information provided by the x values is included in the calculation.
- 31. Thus the coefficient of determination specifies the amount of sample variation in y explained by x. For simple linear regression the coefficient of determination is simply the square of the correlation coefficient between Y and X. (Figure: correlation scale running from −1, a strong negative linear relationship, through 0, no linear relationship, to +1, a strong positive linear relationship.)
- 32. The correlation coefficient, also called Pearson's product-moment correlation after Karl Pearson, is calculated by $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$. The correlation coefficient may take any value between −1.0 and +1.0. Assumptions: a linear relationship between x and y; continuous random variables; both variables normally distributed; observations (x, y pairs) independent of each other.
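
A short sketch of Pearson's correlation coefficient, computed from deviations about the means, may help make the definition concrete. The function name `pearson_r` and the data are hypothetical, chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print("r =", round(r, 4))
print("r squared (coefficient of determination) =", round(r * r, 2))
```

Squaring r gives the coefficient of determination for simple linear regression, as noted on the previous slide.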
- 33. $\chi^2$ test
- 34. The $\chi^2$ test is based on the chi-square distribution and, as a parametric test, is used for comparing a sample variance to a theoretical population variance: $\chi^2 = \frac{s^2}{\sigma^2}(n - 1)$, where $s^2$ = variance of the sample; $\sigma^2$ = variance of the population; $(n - 1)$ = degrees of freedom, $n$ being the number of items in the sample.
- 36. Example I. In quality control, there are situations when we need to know whether a sample mean lies within the confidence limits of the entire population. This can be accomplished by using the t-distribution to determine confidence limits for a population mean at a selected probability. We will use the Excel function TINV() to obtain the critical t value.
- 37. Ten cans of sliced pineapple were removed at random from a population of 1000 cans. The drained weights of the contents were measured as 410.5, 411.4, 410.4, 412.6, 411.9, 411.5, 412.5, 411.4, 411.5, and 410.1 g. Determine the 95% confidence limits for the population mean.
- 38. We will first calculate the average of the ten data values using the AVERAGE() function. Next we will determine the standard deviation of the sample using the STDEV() function. Then we will use the following expression to estimate the lower and upper limits of the population mean: $\bar{x} \pm t \cdot s/\sqrt{n}$, where $t$ is the two-tailed critical value for $n - 1$ degrees of freedom.
- 39. Discussion: the results show that the 95% confidence lower and upper limits for the population mean are 410.78 and 411.98 g, respectively.
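
The confidence limits in Example I can be reproduced outside Excel. The sketch below uses only the Python standard library, which has no t quantile function, so the critical value 2.262 (two-tailed, α = 0.05, 9 degrees of freedom) is hard-coded from a t table; that constant is an assumption of the sketch, not something computed here.

```python
import math
import statistics

# Drained weights (g) of the ten pineapple cans from the example.
weights = [410.5, 411.4, 410.4, 412.6, 411.9, 411.5, 412.5, 411.4, 411.5, 410.1]

n = len(weights)
mean = statistics.mean(weights)  # Excel: AVERAGE()
s = statistics.stdev(weights)    # Excel: STDEV(), sample standard deviation
std_err = s / math.sqrt(n)       # standard error of the mean

# Two-tailed critical t for alpha = 0.05 and df = 9, taken from a t table
# (assumed here; Excel's TINV() is what the slides use to obtain it).
T_CRIT = 2.262

lower = mean - T_CRIT * std_err
upper = mean + T_CRIT * std_err
print("mean:", round(mean, 2))
print("95% confidence limits:", round(lower, 2), "to", round(upper, 2))
```

The limits agree with the 410.78 and 411.98 g reported in the discussion slide.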
- 40. Example II. When a sample is taken from a large population and analyzed for selected data, statistical analysis is helpful in obtaining estimates for the total population from which the sample was obtained. In this worksheet, we will use Excel's built-in data analysis techniques to determine various statistical descriptors for the sample and the population.
- 41. Case study: color data. A sample of 10 breads is obtained from a conveyor belt exiting a baking oven. The breads are analyzed for color by comparing them with a standard color chart. The values recorded, in customized color units, are as follows: 34, 33, 36, 37, 31, 32, 38, 33, 34, and 35. Estimate the mean, variance, and standard deviation of the population.
- 42. We will use the Data Analysis capability of Excel to determine the descriptive statistics for the given data. First, make sure that Data Analysis... is available under the Data menu. If it is not available, see the next slide for details on how to add this analysis package.
- 43. Click the Microsoft Office Button, then click Excel Options. Click Add-ins. In the Manage box, select Excel Add-ins and click Go. In the Add-Ins Available box, select the Analysis ToolPak check box and click OK. (If the ToolPak is not listed, click Browse to locate it.)
- 44. Step 1: Open a new worksheet expanded to full size. Step 2: In cells A2:A11, type the text labels and data values.
- 45. Step 3: Choose the menu items Data, Data Analysis.... A dialog box will open as shown. Step 4: Double-click on Descriptive Statistics.
- 46. Step 5: In the edit box for Input Range, type the range of cells as $A$2:$A$11. Step 6: Select the radio button Columns. Step 7: In Output Range, type A13. Click OK. Step 8: Excel will calculate the descriptive statistics and display the results in cells A13:B28. The results indicate that the sample mean is 34.3, the sample standard deviation (the estimate of the population standard deviation) is 2.214, and the sample variance (the estimate of the population variance) is 4.9.
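
For readers without the Analysis ToolPak, the main figures of the bread-color example can be approximated with Python's `statistics` module. This is a sketch of a subset of what a descriptive-statistics summary typically reports; the dictionary keys are labels chosen for this example, not Excel's exact output.

```python
import math
import statistics

# Color scores of the ten breads from the case study.
color = [34, 33, 36, 37, 31, 32, 38, 33, 34, 35]
n = len(color)

summary = {
    "mean": statistics.mean(color),
    "standard error": statistics.stdev(color) / math.sqrt(n),
    "median": statistics.median(color),
    "sample standard deviation": statistics.stdev(color),  # divisor n - 1
    "sample variance": statistics.variance(color),         # divisor n - 1
    "range": max(color) - min(color),
    "count": n,
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Both the variance and the standard deviation use the n − 1 divisor, which is why they serve as estimates of the population values.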
- 47. t = (difference between samples) / (variability). Excel will automatically calculate t-values to compare: the means of two datasets with equal variances; the means of two datasets with unequal variances; two sets of paired data. If abs(t-score) < abs(t-critical), accept H0: there is insufficient evidence that the observed differences reflect real, significant differences.
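
The equal-variance case can be sketched as follows: the "variability" in the denominator is the standard error built from the pooled variance of the two samples. The function name `pooled_t`, the two small samples, and the hard-coded critical value (2.306, two-tailed, α = 0.05, 8 degrees of freedom, from a t table) are assumptions made for this illustration.

```python
import math
import statistics


def pooled_t(sample1, sample2):
    """Return (t, df) for a two-sample t-test assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1)
    v2 = statistics.variance(sample2)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / std_err
    return t, n1 + n2 - 2


# Hypothetical samples, invented for illustration only.
group_1 = [2, 3, 4, 5, 6]
group_2 = [1, 2, 3, 4, 5]
t_value, df = pooled_t(group_1, group_2)

T_CRIT_TWO_TAIL = 2.306  # table value for alpha = 0.05, df = 8 (assumed)
decision = "reject H0" if abs(t_value) > T_CRIT_TWO_TAIL else "fail to reject H0"
print("t =", round(t_value, 3), "df =", df, "->", decision)
```

With these made-up numbers the difference in means is small relative to the variability, so H0 is not rejected.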
- 48. Example III. A researcher wishes to test whether the heavy metal content of soil has a different mean after a war threat than before it. The research hypothesis is that the mean after the war threat will exceed the mean before it. Use Excel to help test the hypothesis for the difference in population means.
- 49. Step 1: Open a new worksheet expanded to full size. Step 2: In cells B5:C19, type the text labels and data values. The null hypothesis (H0) and alternative hypothesis (HA) to be tested are stated in terms of the difference in population means, μ1 − μ2, against a hypothesized value of 0.0.
- 50. Step 3: Choose the menu items Data, Data Analysis.... A dialog box will open as shown. Step 4: Double-click on t-Test: Two-Sample Assuming Equal Variances.
- 52. Reading the output. Hypothesized mean difference: change this if you want to know whether the means of the two samples differ by at least some specified amount. One-tail: t > t-critical (one-tail), so the mean of sample #1 is significantly larger than the mean of sample #2; the p value for the one-tailed test is 0.003, which is less than 0.05, so we reject the null hypothesis. Two-tail: t > t-critical (two-tail), so the mean of sample #1 is significantly different from the mean of sample #2; the p value for the two-tailed test is 0.007, which is less than 0.05, so we reject the null hypothesis.
- 53. Example IV. In hypothesis testing, it is sometimes not possible to use the same judges for testing different treatments, although it would be desirable to use the same judges to evaluate samples obtained from different treatments. In such cases, we have a completely randomized design. Using single-factor ANOVA, we can test whether the treatments had any influence on the judges' scores; in other words, does the mean of each treatment differ?
- 54. Case study: weight of oranges. Consider the weights of oranges from three different suppliers A, B, and C. Five oranges were randomly sampled from each supplier and weighed. The following weights were obtained:

  | A | B | C |
  | --- | --- | --- |
  | 150 | 148 | 146 |
  | 151 | 150 | 148 |
  | 152 | 152 | 150 |
  | 153 | 154 | 152 |
  | 154 | 156 | 154 |

- 55. For each treatment (supplier), five oranges were weighed. Therefore, the design was completely randomized. Calculate the F value to determine whether the means of the three treatments are significantly different.
- 56. We will use the single-factor analysis of variance available in Excel. We will determine the F value at the 5% significance level (a probability of 0.95). These computations will allow us to determine whether the means of the three treatments are significantly different. First make sure that the Data Analysis... command is available under the menu item Data.
- 57. Step 1: Open a new worksheet expanded to full size. Step 2: In cells A4:C8, type the text labels and data values.
- 58. Step 3: Choose the menu items Data, Data Analysis.... A dialog box will open as shown. Step 4: Double-click on Anova: Single Factor.
- 59. The results show that the F value is 0.889. The critical F value at the 5% level is 3.885. For the example problem the calculated F value is lower than the critical value at the 5% level. Thus, we can say that there is no significant difference among the mean weights of the three suppliers (P > 0.05).
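
The F value for the orange weights can be reproduced by hand with the correction-factor steps listed on slide 26. The sketch below is one possible implementation; the function name `one_way_anova` and the dictionary keys are hypothetical, and the critical value 3.885 is simply the one quoted on the slide, not computed here.

```python
def one_way_anova(groups):
    """One-way ANOVA via the correction-factor method; returns a dict."""
    values = [x for g in groups for x in g]
    n_total = len(values)
    cf = sum(values) ** 2 / n_total                 # correction factor
    ss_total = sum(x * x for x in values) - cf      # total sum of squares
    ss_trt = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    ss_err = ss_total - ss_trt                      # error SS by subtraction
    df_trt = len(groups) - 1
    df_err = (n_total - 1) - df_trt                 # error df by subtraction
    ms_trt = ss_trt / df_trt
    ms_err = ss_err / df_err
    return {
        "SS_trt": ss_trt, "SS_err": ss_err,
        "df_trt": df_trt, "df_err": df_err,
        "F": ms_trt / ms_err,
    }


# Orange weights from the case study, one list per supplier.
suppliers = {
    "A": [150, 151, 152, 153, 154],
    "B": [148, 150, 152, 154, 156],
    "C": [146, 148, 150, 152, 154],
}
result = one_way_anova(list(suppliers.values()))

F_CRIT = 3.885  # critical F at the 5% level, as quoted on the slide
print("F =", round(result["F"], 3), "df =", (result["df_trt"], result["df_err"]))
print("significant at 5%:", result["F"] > F_CRIT)
```

Getting the error terms by subtraction mirrors the last step of the slide's recipe and avoids computing within-group deviations directly.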
- 60. Example V. When we are interested in evaluating samples for sensory characteristics using the same judges, with samples obtained from multiple treatments, analysis of variance for a two-factor design without replication is useful. This analysis helps in determining whether there are significant differences among the various treatments, as well as whether any significant differences exist among the judges themselves.
- 61. Three types of ice cream were evaluated by 11 judges. The judges assigned the following scores.

  | Judge | Ice Cream A | Ice Cream B | Ice Cream C |
  | --- | --- | --- | --- |
  | A | 16 | 14 | 15 |
  | B | 17 | 15 | 17 |
  | C | 16 | 16 | 16 |
  | D | 18 | 14 | 16 |
  | E | 16 | 14 | 14 |
  | F | 17 | 16 | 17 |
  | G | 18 | 14 | 15 |
  | H | 16 | 15 | 16 |
  | I | 17 | 14 | 14 |
  | J | 18 | 13 | 16 |
  | K | 17 | 15 | 15 |

- 62. We will use the built-in analysis pack available in the Excel command called Data Analysis.... Three sets of results will be obtained for the 5% level.
- 63. Step 1: Open a new worksheet expanded to full size. Step 2: In cells A3:D13, type the text labels and data values.
- 64. Step 3: Choose the menu items Data, Data Analysis.... A dialog box will open. Step 4: Double-click on Anova: Two-Factor Without Replication. A new dialog box will open. Step 5: Type entries in the edit boxes as shown. Step 6: The results will be displayed in the output cells.
- 65. For judges, the calculated F value is 1.36, which is lower than the critical F value of 2.35 at the 5% level. The difference among ice cream types is determined by examining the F values: the calculated F value is 19.73, which is greater than the critical value of 3.49 at the 5% level.
- 66. The difference among ice cream types is determined by examining the F values. The F value is calculated as 19.73. This value is greater than 3.49, the critical value at the 5% level, so the ice cream types are significantly different (p < 0.001). For judges, the calculated F value is 1.36. This value is lower than the critical F value of 2.35 at the 5% level, so the judges showed no significant difference in their mean scores.
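
The two F values in Example V can be checked with a short two-factor ANOVA without replication, again using the correction-factor approach. This is a sketch under the assumption that rows are judges (blocks) and columns are ice cream types (treatments); the function name and dictionary keys are hypothetical.

```python
def two_way_anova_no_replication(table):
    """Two-factor ANOVA without replication.

    table: list of rows (blocks, here judges); columns are treatments.
    """
    n_rows, n_cols = len(table), len(table[0])
    values = [x for row in table for x in row]
    cf = sum(values) ** 2 / len(values)              # correction factor
    ss_total = sum(x * x for x in values) - cf
    ss_rows = sum(sum(row) ** 2 for row in table) / n_cols - cf
    ss_cols = sum(sum(col) ** 2 for col in zip(*table)) / n_rows - cf
    ss_err = ss_total - ss_rows - ss_cols            # error SS by subtraction
    df_rows, df_cols = n_rows - 1, n_cols - 1
    df_err = df_rows * df_cols
    ms_err = ss_err / df_err
    return {
        "F_rows": (ss_rows / df_rows) / ms_err,
        "F_cols": (ss_cols / df_cols) / ms_err,
        "df": (df_rows, df_cols, df_err),
    }


# Scores from the case study: one row per judge (A..K), columns are
# ice cream A, B, and C.
scores = [
    [16, 14, 15], [17, 15, 17], [16, 16, 16], [18, 14, 16],
    [16, 14, 14], [17, 16, 17], [18, 14, 15], [16, 15, 16],
    [17, 14, 14], [18, 13, 16], [17, 15, 15],
]
result = two_way_anova_no_replication(scores)
print("F (judges) =", round(result["F_rows"], 2))
print("F (ice cream types) =", round(result["F_cols"], 2))
print("df (judges, ice cream, error) =", result["df"])
```

The degrees of freedom (10, 2, 20) are the ones against which the critical values 2.35 and 3.49 quoted on the slides are read.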
- 67. Example VI. Simple regression analysis involves determining the statistical relationship between two variables. One of the uses of such analysis is in predicting one variable on the basis of the other. We will use the regression analysis available in the Add-in package in Excel to determine the linear regression between two variables.
- 68. Case study: sensory scores. The data are off-flavor scores versus storage time for a frozen vegetable. Sensory scores obtained at 0, 1, 2, 3, 4, and 6 months were 1.5, 2, 2, 3, 2.5, and 3.5, respectively. Assuming that these data can be linearly correlated, determine the regression coefficient and predict the off-flavor score at 5 months of storage.
- 69. We will use the Regression package available as an Add-in item in Excel to obtain the required statistical relationships. We assume that a linear relationship exists between the off-flavor score and time (in months) with the equation y = mx + b, where y is the off-flavor score, x is the time in months, m is the slope, and b is the intercept.
- 70. Step 1: Open a new worksheet expanded to full size. Step 2: In cells A4:B9, enter the text labels and data values.
- 71. Step 3: Choose the menu items Data, Data Analysis.... A dialog box will open. Step 4: Double-click on Regression. Step 5: A new dialog box will open. Enter the range of cells for Y and X as shown. Check the boxes for Residuals and Line Fit Plots. Click OK.
- 72. The results will be displayed. Notes on reading the output: R Square is the share of the variation in y explained by variation in x (about 85% in this example); the remainder may be random error, or may be explained by some factor other than x. F is the ratio of variability explained by the model to leftover variability; a high number means the model explains most of the variation in the data. Significance F is the probability of getting this value of F by randomly sampling from a normally distributed population; a low value means the model (rather than random variability) explains most of the variation in the data. P-value is the probability of getting a slope or intercept this much different from zero by randomly sampling from a normally distributed population. Lower 95% and Upper 95% are the confidence limits on the slope and intercept. Fitted line: y = 0.31x + 1.58.
- 73. The r² value is calculated as 0.85 and the standard error is 0.318. The intercept is 1.5786 and the slope is 0.3143, so the linear equation is y = 0.31x + 1.58. The residual output gives the predicted values for the off-flavor score at the different time intervals. These data are also shown in the chart, together with the observed values. The predicted value at 5 months of storage is about 3.13 using the rounded equation (3.15 with the unrounded coefficients).
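
The regression figures for the off-flavor data can be verified with a few lines of ordinary least squares. This is a minimal sketch using only built-in Python; the variable names are chosen for this example, and it covers only the slope, intercept, r², standard error, and the 5-month prediction, not the full regression output.

```python
import math

# Storage time (months) and off-flavor scores from the case study.
months = [0, 1, 2, 3, 4, 6]
scores = [1.5, 2, 2, 3, 2.5, 3.5]

n = len(months)
mean_x = sum(months) / n
mean_y = sum(scores) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in months)
syy = sum((y - mean_y) ** 2 for y in scores)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, scores))

slope = sxy / sxx                        # m in y = mx + b
intercept = mean_y - slope * mean_x      # b in y = mx + b
r_squared = sxy ** 2 / (sxx * syy)       # coefficient of determination
residual_ss = syy - slope * sxy          # unexplained sum of squares
std_error = math.sqrt(residual_ss / (n - 2))

predicted_5 = slope * 5 + intercept      # off-flavor score at 5 months

print("slope:", round(slope, 4))
print("intercept:", round(intercept, 4))
print("r squared:", round(r_squared, 2))
print("standard error:", round(std_error, 3))
print("prediction at 5 months:", round(predicted_5, 2))
```

Using the unrounded slope and intercept gives a prediction close to 3.15; the slightly lower 3.13 comes from rounding the coefficients to two decimals first.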
- 76. Statistics: Descriptive Statistics; Histograms; Hypothesis Testing; Scatter Plots; Regression Analysis
- 77. Click the Microsoft Office Button, then click Excel Options. Click Add-ins. In the Manage box, select Excel Add-ins and click Go. In the Add-Ins Available box, select the Analysis ToolPak check box and click OK. (If the ToolPak is not listed, click Browse to locate it.)
- 78. Descriptive statistics: Click Data/Data Analysis (far right)/Descriptive Statistics and OK. Put checkmarks in the Summary Statistics, 95% or 99% Confidence Interval, and Labels in First Row boxes. Move the cursor to the Input Range window, highlight the data to analyze including labels, and click OK. Your data will appear on a new worksheet. Widen the columns by clicking Home/Format/AutoFit Column Width.
- 79. Histograms: Click Data/Data Analysis/Histogram and OK. Put checkmarks in the Chart Output and New Worksheet boxes. Move the cursor to the Input Range window and highlight the data going into the histogram. Move the cursor to Input Bin Range, highlight the data showing the upper value of each bin, and click OK. The histogram will be on a new worksheet. You may lengthen it by clicking blank space in the window, moving the cursor to the window's bottom line, and holding down the mouse button as you pull the window down.
- 80. Hypothesis testing: Go to Sheet One. Click Data/Data Analysis/ and the appropriate statistical test, then click OK. In the new window, check the Labels box and put the cursor on Variable 1 Range. Highlight the Variable 1 data including the label. Put the cursor on Variable 2 Range and highlight the Variable 2 data (including the label). Then click OK. Click Home/Format/AutoFit Column Width.
- 81. Scatter plots: Go to Sheet One. Highlight the data (be sure the X values are in the left column and the Y values are in the right column). Click Insert/Scatter. Pull down the menu and click the upper left icon. Right-click a data point on the chart, choose Add Trendline, and click Linear.
- 82. Regression analysis: Go to Sheet One. Click Data/Data Analysis (on the far right)/Regression and click OK. In the new window, check the Labels box and put the cursor on X Range. Highlight the X data including the label. Put the cursor on Y Range and highlight the Y data (including the label), then click OK. Click Home/Format/AutoFit Column Width.