# Inferential statistics

Published in: Education

### Inferential statistics

1. 1. NORMAL DISTRIBUTION<br /><ul><li>Given the standard normal distribution, find the area under the curve:
2. 2. To the left of z = -1.39
3. 3. Answer: P(z < -1.39) = 0.0823
4. 4. To the right of z = 1.96</li></ul>Answer: for z = 1.96 the table area to the left is 0.9750, so the area to the right is 1 - 0.9750 = 0.0250; therefore P(z > 1.96) = 0.0250<br /><ul><li>Between z = -0.48 and z = 1.74</li></ul>Answer: for z1 = -0.48 the table area is 0.3156 and for z2 = 1.74 it is 0.9591, so P(-0.48 < z < 1.74) = 0.9591 - 0.3156 = 0.6435<br />Given the normal distribution with μ = 200 and σ = 10, find the area under the curve<br /><ul><li>(a) below 214
5. 5. $z = \frac{x - \mu}{\sigma}$
6. 6. $z = \frac{214 - 200}{10}$
7. 7. $z = 1.4$, table area 0.9192; therefore P(x < 214) = 0.9192
8. 8. (b) area above 179</li></ul>$z = \frac{x - \mu}{\sigma}$<br />$z = \frac{179 - 200}{10}$<br />$z = -2.1$, table area 0.0179; the area above is 1 - 0.0179 = 0.9821, therefore P(x > 179) = 0.9821<br /><ul><li>(c) area between 188 and 206</li></ul>$z_1 = \frac{188 - 200}{10} = -1.2$, table area 0.1151<br />$z_2 = \frac{206 - 200}{10} = 0.6$, table area 0.7257<br />Therefore P(188 < x < 206) = 0.7257 - 0.1151 = 0.6106<br /><ul><li>(d) the x value that has 80% of the area below it</li></ul>An area of 0.80 corresponds to z = 0.84<br /><ul><li>$z = \frac{x - \mu}{\sigma}$
9. 9. $x = z\sigma + \mu = 0.84(10) + 200 = 208.4$</li></ul>SAMPLE:<br /><ul><li>The average height of females in the freshman class of a certain college has been 165.2 cm with a standard deviation of 6.9 cm. Is there reason to believe that there has been a change in average height if a random sample of 50 females in the present freshman class has an average height of 162.5 cm? Use a 0.01 level of significance.
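The table lookups in the normal-curve exercises above can also be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` stands in for the printed z-table; the helper names are illustrative and are not part of the original slides. Results can differ from the table values in the fourth decimal place because printed tables are rounded.

```python
from statistics import NormalDist


def area_below(x, mu=0.0, sigma=1.0):
    """Area under the normal curve to the left of x."""
    return NormalDist(mu, sigma).cdf(x)


def area_between(low, high, mu=0.0, sigma=1.0):
    """Area under the normal curve between low and high."""
    return area_below(high, mu, sigma) - area_below(low, mu, sigma)


def x_with_area_below(p, mu=0.0, sigma=1.0):
    """Solve x = z*sigma + mu, where z has area p to its left."""
    z = NormalDist().inv_cdf(p)
    return z * sigma + mu


# Standard normal exercises
left_of_minus_1_39 = area_below(-1.39)
right_of_1_96 = 1 - area_below(1.96)
between_z = area_between(-0.48, 1.74)

# Exercises with mu = 200 and sigma = 10
below_214 = area_below(214, 200, 10)
above_179 = 1 - area_below(179, 200, 10)
between_188_206 = area_between(188, 206, 200, 10)
x_80_percent = x_with_area_below(0.80, 200, 10)

for label, value in [
    ("P(z < -1.39)", left_of_minus_1_39),
    ("P(z > 1.96)", right_of_1_96),
    ("P(-0.48 < z < 1.74)", between_z),
    ("P(x < 214)", below_214),
    ("P(x > 179)", above_179),
    ("P(188 < x < 206)", between_188_206),
    ("x with 80% below", x_80_percent),
]:
    print(f"{label}: {value:.4f}")
```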
10. 10. Given: μ = 165.2, X̄ = 162.5, σ = 6.9, n = 50
11. 11. $z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
12. 12. $z = \frac{162.5 - 165.2}{6.9 / \sqrt{50}}$
13. 13. $z = -2.77$. Since |z| = 2.77 exceeds the two-tailed critical value of 2.58 at the 0.01 level, there is reason to believe that the average height has changed.</li></ul>INFERENTIAL STATISTICS<br />Inferential statistics consists of those methods by which one makes a generalization about the population.<br />It is divided into two areas: ESTIMATION and TEST OF HYPOTHESES.<br />Statistical inference is therefore concerned with making inferences about population parameters. A parameter is a numerical descriptive measure of a population. To make a clear distinction between population parameters and sample statistics, the following symbols are used:<br />μ = population mean; X̄ = sample mean; n = sample size; σ = population standard deviation; N = population size; s = sample standard deviation<br />ESTIMATION OF POPULATION MEAN<br /><ul><li>The value of a sample mean is a point estimate of the population mean. The statistic that one uses to obtain a point estimate is called an estimator, or DECISION FUNCTION.
14. 14. Sample means are always expected to depart from the “true” mean of a population. Any deviation of a sample mean may be regarded as an error of estimation. The standard error of the sample mean indicates the extent of the error of estimation.
15. 15. To estimate the value of the “TRUE” mean of the population is to estimate how much error there would be between a sample mean and the population mean. The population mean is located within a range of values known as a confidence interval. Since the sampling distribution of means for a sufficiently large sample is assumed to be normally distributed, 68.27% of sample means lie within one standard error of the “TRUE” mean of a population, 95% lie within 1.96 standard errors, and 99% lie within 2.58 standard errors.</li></ul>CONFIDENCE INTERVAL<br /><ul><li>A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.
16. 16. If independent samples are taken repeatedly from the same population, and a confidence interval calculated for each sample, then a certain percentage (confidence level) of the intervals will include the unknown population parameter. Confidence intervals are usually calculated so that this percentage is 95%, but we can produce 90%, 99%, 99.9% (or whatever) confidence intervals for the unknown parameter.
17. 17. The width of the confidence interval gives us some idea about how uncertain we are about the unknown parameter (see precision). A very wide interval may indicate that more data should be collected before anything very definite can be said about the parameter.
18. 18. Confidence intervals are more informative than the simple results of hypothesis tests (where we decide "reject H0" or "don't reject H0") since they provide a range of plausible values for the unknown parameter.
19. 19. To determine the confidence interval for population mean, the following formula is used:
20. 20. $CI = \bar{X} \pm z\,\frac{\sigma}{\sqrt{N-1}}$</li></ul>where:<br /><ul><li>CI is the confidence interval,
21. 21. $\bar{X}$ is the mean of the sample,
22. 22. z is the standard normal value for the chosen confidence level (for example, 1.96 for 95%),
23. 23. $\frac{\sigma}{\sqrt{N-1}}$ is the standard error of the sample mean.</li></ul>Confidence Limits<br /><ul><li>Confidence limits are the lower and upper boundaries of a confidence interval, that is, the values which define the range of a confidence interval.
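As an illustration of how confidence limits for a mean might be computed, here is a hedged sketch. It assumes the standard error form σ/√(N - 1) that this transcript appears to use (many textbooks use σ/√n instead), takes z from Python's `statistics.NormalDist` rather than a table, and borrows the freshman-height figures from the earlier sample problem purely as example input. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def confidence_limits(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) limits: mean +/- z * sigma / sqrt(n - 1)."""
    alpha = 1 - confidence
    # Two-sided interval: put alpha/2 in each tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Assumption: standard error as written in the transcript, sigma / sqrt(N - 1).
    standard_error = sigma / sqrt(n - 1)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Example input taken from the height problem: mean 162.5, sigma 6.9, n 50.
lower, upper = confidence_limits(162.5, 6.9, 50, confidence=0.95)
print(f"95% confidence limits: {lower:.3f} to {upper:.3f}")
```

A wider interval results from a higher confidence level or a smaller sample, which matches the remark that a very wide interval suggests more data are needed.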
24. 24. The upper and lower bounds of a 95% confidence interval are the 95% confidence limits. These limits may be taken for other confidence levels, for example, 90%, 99%, 99.9%.</li></ul>Confidence Level<br /><ul><li>The confidence level is the probability value (1 - α) associated with a confidence interval.
25. 25. It is often expressed as a percentage. For example, if α = 0.05 = 5%, then the confidence level is equal to (1 - 0.05) = 0.95, i.e. a 95% confidence level.
26. 26. Example: Suppose an opinion poll predicted that, if the election were held today, the Conservative party would win 60% of the vote. The pollster might attach a 95% confidence level to the interval 60% plus or minus 3%. That is, he thinks it very likely that the Conservative party would get between 57% and 63% of the total vote.</li></ul>TEST OF HYPOTHESIS<br />Hypothesis testing is the second area of statistical inference. The sample evidence is the basis for deciding between two or more prestated alternatives. The alternatives are formulated as two complementary assumptions about the true value of a decision parameter.<br />The procedure for formulating hypotheses and using sample evidence to decide which should be accepted is called a “test of hypothesis” or “test of significance”. In hypothesis testing, certain values of the design parameters are assumed before sampling, and decisions on their validity are then made.<br /><ul><li>Null Hypothesis (Ho)
27. 27. Hypothesis testing, or significance testing, begins with the statement of a supposition called the NULL HYPOTHESIS, denoted by Ho. It is called “null” because it typically specifies a zero value for a difference. At times the null hypothesis is also called a hypothesis of no difference. If the values to be compared are A and B, the null hypothesis would be expressed in symbols as Ho: A = B or Ho: A - B = 0.
28. 28. Alternative Hypothesis (H1)
29. 29. If the null hypothesis is rejected, the alternative hypothesis is accepted. It is denoted by H1 and may be expressed in a non-directional form, H1: A≠B, or in either of two directional forms, H1: A>B or H1: A<B.
30. 30. For example, if the average yield of a new strain appears larger than that of the control, the researcher may use a test of hypothesis in which the alternative hypothesis is H1: μ > μo. Unless a test of hypothesis is made, an observed value greater than μo does not warrant the conclusion that μ > μo. Only when the difference is larger than can be attributed to chance at a given significance level α (that is, 1 - confidence level) can the null hypothesis be rejected and the alternative hypothesis accepted.</li></ul>Type I and Type II Errors<br /><ul><li>On the basis of the sample evidence, the alternatives are weighed: the claim is either abandoned or used to take remedial action. Either course of action may lead to an incorrect decision. Accepting the null hypothesis when it is false, or rejecting it when it is true, is an error of decision. The probabilities of committing these errors are called ERROR PROBABILITIES, identified as Type I and Type II, as shown in the Decision Table:
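31. 31. Decision table (α and β are the error probabilities):

    | HYPOTHESIS Ho is | DECISION: Reject Ho | DECISION: Accept Ho |
    | --- | --- | --- |
    | TRUE | Type I error, p = α | Correct decision |
    | FALSE | Correct decision | Type II error, p = β |
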
32. 32. Any statistical test involves these four elements:
33. 33. Null hypothesis and alternative hypothesis
34. 34. Decision rule
35. 35. Test statistics
36. 36. Decision</li></ul>Example: A professor is concerned with the effectiveness of a new teaching technique. After teaching students with the new method, he claims that the average score in the achievement test is 80 for the total set of students. In an achievement test administered to a sample of 30 students, the average score is 75 with a standard deviation of 6.8. He can make two complementary assumptions about the effectiveness of the teaching technique.<br />Solution: Using the professor’s claim about the average score of the population, the hypotheses are<br />Ho: μ = 80 and H1: μ ≠ 80, where the population average is denoted by μ. Since the alternative hypothesis indicates inequality, it calls for a two-sided alternative and the test used is two-tailed. The z-values which separate the acceptance and rejection regions are referred to as critical values.<br />The risk of committing the α error, rejecting Ho when in fact it is true, is called the level of significance. The magnitude of the risk to be taken is determined by the researcher or the policy decision maker.<br />If the null hypothesis is rejected, it means that the sample does not support the null hypothesis. One does not prove that the null hypothesis is wrong, only that the sample evidence does not support it. If a null hypothesis is rejected, this implies the acceptance of the alternative hypothesis.<br />DECISION RULE: partitions the sampling distribution of the test statistic into regions for accepting or rejecting the null hypothesis. The partition is determined on the basis of the direction of the hypothesized parameter (two-tailed or one-tailed test) and the desired level of significance. The decision to accept or reject the null hypothesis by applying the decision rule to the sample data is the last part of the test of hypothesis problem.
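The decision rule for a two-tailed test about a single mean can be sketched in code. This is an illustration only: it assumes the sample standard deviation is used in place of σ (reasonable for large samples), that the professor's example is tested at α = 0.05 (the slides do not state a level), and that critical values come from Python's `statistics.NormalDist`. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, hypothesized_mean, sd, n, alpha=0.05):
    """Two-tailed z-test; returns (z, critical value, reject Ho?)."""
    z = (sample_mean - hypothesized_mean) / (sd / sqrt(n))
    # Critical value that leaves alpha/2 in each tail.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, critical, abs(z) > critical


# Professor's claim: Ho: mu = 80; sample of 30 with mean 75 and sd 6.8.
# alpha = 0.05 is an assumption made for this sketch.
z_prof, crit_prof, reject_prof = one_sample_z_test(75, 80, 6.8, 30, alpha=0.05)

# Freshman heights: Ho: mu = 165.2; sample of 50 with mean 162.5, sigma 6.9.
z_height, crit_height, reject_height = one_sample_z_test(
    162.5, 165.2, 6.9, 50, alpha=0.01
)

print(f"teaching: z = {z_prof:.2f}, critical = ±{crit_prof:.2f}, reject Ho: {reject_prof}")
print(f"heights:  z = {z_height:.2f}, critical = ±{crit_height:.2f}, reject Ho: {reject_height}")
```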
The error probabilities α and β are inversely related for a given sample size: if α decreases, β increases, and vice versa. The only way to reduce both risks is to increase the sample size.<br /><ul><li>INFERENCES ABOUT THE MEANS AND PROPORTIONS</li></ul>There are statistical problems in which one must decide whether an observed difference between two sample means can be attributed to chance. The theoretical basis of this comparative approach is that if X̄1 and X̄2 are the means of two large independent random samples of size n1 and n2, the sampling distribution of the statistic X̄1 - X̄2 can be closely approximated by a normal curve.<br />Z-Test <br />Z-test of Two-sample Test Between Population Means <br /><ul><li>Independent samples are those in which the selection of the members of one sample in no way affects the selection of the members of the second sample. Means obtained from independent samples are called uncorrelated means. The test statistic for determining differences between two sample means is given by the following formula: $z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$, where z is the test statistic, X̄1 is the mean of the 1st sample, σ1 is the standard deviation of the 1st group, X̄2 is the mean of the 2nd sample, σ2 is the standard deviation of the 2nd group, n1 is the number of items in sample 1, and n2 is the number of items in sample 2.
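37. 37. Example data:

    | | Sample 1 | Sample 2 |
    | --- | --- | --- |
    | n | 40 | 50 |
    | X̄ | 74 | 78 |
    | σ | 8 | 7 |

38. 38. Following the general steps in hypothesis testing:
39. 39. Ho: μ1 = μ2</li></ul>H1: μ1 ≠ μ2<br /><ul><li>Decision Rule:
40. 40. Let α = 0.05. The region of rejection consists of all values of z greater than 1.96 or less than -1.96.
41. 41. Test Statistic:
42. 42. $z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{74 - 78}{\sqrt{\frac{8^2}{40} + \frac{7^2}{50}}} \approx -2.49$ (about -2.5)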
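The test statistic for this two-sample example can be reproduced with a short sketch. It assumes the flattened data read as n = 40 and 50, means 74 and 78, and standard deviations 8 and 7, and it takes the critical value from Python's `statistics.NormalDist`; the function name is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sd1, sd2, n1, n2):
    """z statistic for the difference between two independent sample means."""
    return (mean1 - mean2) / sqrt(sd1**2 / n1 + sd2**2 / n2)


z = two_sample_z(mean1=74, mean2=78, sd1=8, sd2=7, n1=40, n2=50)

# Two-tailed decision rule at alpha = 0.05 (critical value about 1.96).
critical = NormalDist().inv_cdf(1 - 0.05 / 2)
reject_null = abs(z) > critical

print(f"z = {z:.2f}, critical = ±{critical:.2f}, reject Ho: {reject_null}")
```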
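43. 43. Standard deviation and standard error of a proportion: $SD_p = \sqrt{p(1-p)} = \sqrt{pq}$, where $SD_p$ is the standard deviation of a proportion, p is the proportion of subjects who possess the trait, and q is the proportion of subjects who do not possess the trait. The standard error of a proportion is estimated as $SE_p = \frac{SD_p}{\sqrt{N-1}}$, where $SE_p$ is the standard error. For large samples, the test statistic for the difference between two proportions is $z = \frac{\frac{X_1}{n_1} - \frac{X_2}{n_2}}{\sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where z is the test statistic, $\frac{X_1}{n_1}$ and $\frac{X_2}{n_2}$ are the proportions of the two samples, and n1, n2 are the numbers of cases in the two samples.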
44. 44. Decision:
45. 45. Since the computed z value is less than -1.96, the null hypothesis is REJECTED.</li></ul>Z-test of Two-Sample Test Between Population Proportions<br /><ul><li>If an event occurs x times in n trials, the relative frequency of its occurrence is x/n. This can be used as a sample proportion to estimate the corresponding population proportion. To compare differences between proportions, the standard error of a proportion and the shape of the distribution of sample proportions must be known or derived. The mean of a sample scored on a one-zero basis is p, that is, the proportion of subjects who possess the trait. The test statistic is $z = \frac{P_1 - P_2}{\sqrt{PQ\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}}$, where z is the test statistic, P1 is the percentage of the 1st variable, P2 is the percentage of the 2nd variable, P is the pooled percentage of P1 and P2, Q = 100 - P, N1 is the number of cases of the 1st variable, and N2 is the number of cases of the 2nd variable.</li></ul>Considerations: When p is larger than 0.90 or less than 0.10, the proportion should not be compared or tested. Sample sizes of less than 50 should be avoided since a change of only one item produces a relatively large change in p.<br /><ul><li>Z-test Between Percentages
46. 46. The z-test between percentages is an inferential statistical tool used to determine the significant difference between two percentages of related individuals, for two variables (bivariate), where the data are collected through survey or descriptive research. The formula is expressed this way.</li></ul>T-Test – The majority of research papers, theses and dissertations use the t-test in descriptive research with two variables (bivariate), but the t-test is inappropriate for descriptive research. Most are unaware of the z-test between means and the z-test between percentages, which are more appropriate for bivariate descriptive research in determining significant differences between means and percentages.<br /><ul><li>The t-test is another inferential statistical tool, applicable only to experimental research, for determining the significant difference between means. There are two statistical tools involved in getting the t-test.</li></ul>T-test of Two-Sample Pooled t-test<br /><ul><li>With small samples (fewer than 30) which are independently drawn from a population with a normal distribution, the t statistic is used. The formula is $t = \frac{\bar{X}_1 - \bar{X}_2}{S_{diff}}$, with $S_{diff} = \sqrt{\frac{\left(\sum X_1^2 - \frac{(\sum X_1)^2}{n_1}\right) + \left(\sum X_2^2 - \frac{(\sum X_2)^2}{n_2}\right)}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, where X̄1 is the mean of the 1st sample, X̄2 is the mean of the 2nd sample, and $S_{diff}$ is the standard error of the difference.
47. 47. PROCEDURE:
48. 48. Look for the mean of each group
49. 49. Solve for the variance of each group either working formula or machine formula
50. 50. Compute t-test using the formula
51. 51. Find the degrees of freedom (df) using df = N1 + N2 - 2 if two different groups of subjects are exposed to different variables, and df = N - 1 if only one group of subjects is exposed to two variables
52. 52. Choose the level of significance, either 0.01 or 0.05. Refer to the t-distribution table and compare the computed value (CV) with the tabular value (TV):
53. 53. If CV ≥ TV, the difference is significant; if CV < TV, it is not significant.</li></ul>T-test: Two sample Assuming Unequal Variance <br /><ul><li>$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$ It has TWO SETS of subjects exposed to TWO variables with different variances. It is easier, faster and more economical to compute the t-test (two sample assuming unequal variance) by computer, since results are obtained in the blink of an eye, saving time and effort.</li></ul>T-test: Paired Two-Sample for Means <br /><ul><li>$t = \frac{\sum D}{\sqrt{\frac{N\sum D^2 - (\sum D)^2}{N-1}}}$ It has only ONE SET of subjects but exposed to TWO variables. It is easier, faster and more economical to compute the t-test (paired two sample for means) by computer, since results are obtained in the blink of an eye, saving time and effort.</li></ul>T-test of Correlated Samples<br /><ul><li>In some situations of a before-and-after type, the same subject is measured twice. The two sets of data generated are said to be correlated. The statistic for paired data is given by the formula $t = \frac{\bar{D}}{S_{diff}}$, with $S_{diff} = \frac{S_d}{\sqrt{n}}$, where $\bar{D}$ is the mean of the differences in scores between pairs of matched subjects, n is the number of matched pairs, and $S_d$ is the standard deviation of the difference scores. The following is the procedure for determining the significance of the difference between paired or correlated means. This applies to situations where each subject in sample A is paired with a subject in sample B.
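The pooled and paired t statistics described above can be sketched as follows. The slides give no data for these tests, so the scores below are hypothetical values invented purely for illustration, and the function names are likewise assumptions. The computed t is meant to be compared with a tabular value at the stated degrees of freedom.

```python
from math import sqrt


def pooled_t(sample1, sample2):
    """Pooled two-sample t; returns (t, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = sum(sample1) / n1, sum(sample2) / n2
    # Sum of squares about each mean: sum(X^2) - (sum X)^2 / n
    ss1 = sum(x * x for x in sample1) - sum(sample1) ** 2 / n1
    ss2 = sum(x * x for x in sample2) - sum(sample2) ** 2 / n2
    s_diff = sqrt((ss1 + ss2) / (n1 + n2 - 2) * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / s_diff, n1 + n2 - 2


def paired_t(first, second):
    """Paired t from difference scores D; returns (t, degrees of freedom)."""
    d = [b - a for a, b in zip(first, second)]
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(x * x for x in d)
    t = sum_d / sqrt((n * sum_d2 - sum_d**2) / (n - 1))
    return t, n - 1


# Hypothetical scores, not taken from the slides.
group_a = [12, 15, 11, 14, 13]
group_b = [10, 9, 12, 8, 11]
t_pooled, df_pooled = pooled_t(group_a, group_b)

before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 14]
t_paired, df_paired = paired_t(before, after)

print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}")
print(f"paired: t = {t_paired:.3f}, df = {df_paired}")
```

The paired form using ΣD and ΣD² is algebraically the same as dividing the mean difference by S_d/√n, so either route should give the same t.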
54. 54. Compute the score differences for all pairs of subjects
55. 55. For these differences compute a mean
56. 56. Compute the standard deviation of the differences
57. 57. Square all D values and find their sum
58. 58. Sum all D values
59. 59. Apply the formula
60. 60. Divide Sd by √n, where n is the number of matched pairs in the two samples, to obtain Sdiff.
61. 61. Divide D̄ by Sdiff to get the t-value and consult the t-table.</li></ul>CHI-SQUARE - The chi-square test is used to compare an observed frequency distribution to its hypothetical frequency distribution: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O = actual (observed) frequency for a given cell and E = expected frequency for that cell. If the obtained variation is too large to attribute easily to chance, the chi-square is judged significant, that is, non-conforming to the norm or accepted condition.<br /><ul><li>The chi-square test is used to determine (a) uniformity of distribution, (b) goodness of fit, and (c) association or independence of attributes. The region of significance is the set of values equal to or larger than the tabular value of chi-square, where the degrees of freedom are determined in accordance with the number of values in the set of data.
62. 62. CHI-SQUARE is considered a versatile measure with various uses in both parametric and nonparametric tests. A researcher can use the chi-square to determine uniformity of responses in a certain questionnaire, uniformity of reading skill levels, etc.
63. 63. It is used to test association or independence of attributes, for example between scholastic achievement and various levels of socio-economic background, or whether the morale of teachers is associated with or independent of salary, sex, leadership behavior of the administrator, etc.
64. 64. It is also applied to test for normality of distribution of collected data, such as achievement scores in various subjects, or teachers’ reactions to the present evaluation instrument of DepEd.
65. 65. Test for Uniformity of Distribution – used to test for uniformity between the observed and the hypothesized distribution.
66. 66. PROCEDURE:
67. 67. Determine the number of individuals that fall into each category being observed.
68. 68. Determine the number of individuals that are expected to fall into each category by dividing the total number of individuals by the number of categories
69. 69. For each cell subtract Expected from Observed, square the difference and divide the result by Expected
70. 70. Sum the result of step 3 for all cells</li></ul>EXAMPLE : Sixty women were asked to select the color they like best and the following distribution of choices were observed.<br /><ul><li> Ho= There is an equality of distribution of choices H1= There is inequality of distribution of choices.
71. 71. Solve for x2
72. 72. Using the χ² table with degrees of freedom k - 1, where k is the number of categories (here df = 2), the tabular χ² values are 5.991 at the 5% and 9.210 at the 1% level of significance, respectively.
73. 73. Since the computed χ² value, 4.90, is less than 5.991, the decision is to accept Ho at both levels of significance and conclude that there is an equal distribution of choices.
74. 74. Chi-Square Test for Goodness of Fit</li></ul>$\chi^2 = \sum \frac{(O - E)^2}{E}$, where O = actual (observed) frequency for a given cell and E = expected frequency for that cell. To test for goodness of fit, the following procedure is used. Example: Is the following distribution of a teacher’s grades normally distributed?<br />

| Letter Grade | fo |
| --- | --- |
| A | 20 |
| B | 50 |
| C | 50 |
| D | 20 |
| F | 5 |

PROCEDURE:<br /><ul><li>Sum the observed frequencies fo to obtain N
75. 75. There are 5 subgroups. The base of each group will be 6/5=1.2 standard deviation in length under the normal curve.
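76. 76. Using half of the total area under the curve, 0.5, as the minuend, subtract the areas bounded by successive multiples of 1.2 standard deviations as follows:</li></ul>

| z | -3.0 | -1.8 | -0.6 | 0.6 | 1.8 | 3.0 |
| --- | --- | --- | --- | --- | --- | --- |
| Area to mean | .5000 | .4641 | .2257 | .2257 | .4641 | .5000 |

| Grade | F | D | C | B | A |
| --- | --- | --- | --- | --- | --- |
| Area to outer boundary | .5000 | .4641 | .2257 | .4641 | .5000 |
| Area to inner boundary | .4641 | .2257 | .2257 | .2257 | .4641 |
| Difference (sum for C, which spans the mean) | .0359 | .2384 | .4514 | .2384 | .0359 |

<br /><ul><li>Multiply each resulting proportion by 145 to obtain fe
77. 77. Subtract fe from fo
78. 78. Square the difference obtained in Step 5.
79. 79. Divide each value obtained in Step 6 by fe
80. 80. Add the quotients; the sum is the computed value of χ²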
81. 81. The degrees of freedom (df) is k-1 which is 4
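The goodness-of-fit steps for the grade distribution can be traced in code. This sketch assumes the observed frequencies read A 20, B 50, C 50, D 20, F 5 and uses the table proportions .0359, .2384, .4514, .2384, .0359 for the five 1.2-standard-deviation bands; the tabular value 9.488 (df = 4, 5% level) is supplied from a standard chi-square table and is not given in the slides.

```python
# Observed frequencies per letter grade (as read from the flattened table).
observed = {"A": 20, "B": 50, "C": 50, "D": 20, "F": 5}

# Proportion of a normal curve falling in each 1.2-SD band (table values).
proportion = {"A": 0.0359, "B": 0.2384, "C": 0.4514, "D": 0.2384, "F": 0.0359}

# Step 1: N is the sum of the observed frequencies.
n = sum(observed.values())

# Step 4: expected frequency fe = proportion * N.
expected = {grade: proportion[grade] * n for grade in observed}

# Steps 5 to 8: sum of (fo - fe)^2 / fe over all categories.
chi_square = sum(
    (observed[g] - expected[g]) ** 2 / expected[g] for g in observed
)

df = len(observed) - 1
tabular_value = 9.488  # chi-square table, df = 4, alpha = 0.05 (assumed level)

print(f"N = {n}, chi-square = {chi_square:.2f}, df = {df}")
print("reject Ho (not normal)" if chi_square >= tabular_value else "accept Ho")
```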
82. 82. The obtained value of χ², 58.73, is larger than the tabular value of χ², hence the null hypothesis is rejected. The obtained distribution is taken to be non-normal.</li></ul>Chi-Square: Test of Association or Independence of Attributes<br />Another use of the chi-square is to test whether certain attributes are associated with or independent of behavior in general. For example, the trait of sex may or may not be associated with study habits, diligence in studies with intelligence, or handedness with scholastic achievement. The following is an example of testing for association or independence of certain attributes.<br />Is sex related to study habits in the following data?<br />

| SEX | Always | Sometimes | Never |
| --- | --- | --- | --- |
| BOYS | 15 | 28 | 10 |
| GIRLS | 23 | 20 | 8 |

PROCEDURE<br /><ul><li>Sum the rows and columns, and find N
83. 83. Compute fe for each cell in the table using the formula fe = (row total × column total) / N
84. 84. Subtract fe from fo for each cell.
85. 85. Square each of these differences
86. 86. Divide each square by fe
87. 87. Sum the quotients to obtain the chi-square
88. 88. Determine df according to the following (# of rows-1)(number of columns-1)
89. 89. Refer to the Table of chi-square and compare the value of chi-square just obtained. Any computed value equal to or greater than the table will reject the hypothesis of no association.
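The association procedure above can be followed step by step in a short sketch. It assumes the study-habits table reads Boys 15, 28, 10 and Girls 23, 20, 8, uses the usual expected-frequency rule fe = (row total × column total) / N, and takes the tabular value 5.991 (df = 2, 5% level) from a standard chi-square table; the 5% level is an assumption, since the slides do not state one.

```python
# Rows: boys, girls. Columns: always, sometimes, never.
observed = [
    [15, 28, 10],
    [23, 20, 8],
]

# Step 1: row totals, column totals and N.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Steps 2 to 6: expected frequency per cell, then sum (fo - fe)^2 / fe.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n
        chi_square += (fo - fe) ** 2 / fe

# Step 7: df = (rows - 1)(columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)
tabular_value = 5.991  # chi-square table, df = 2, alpha = 0.05 (assumed level)

print(f"N = {n}, chi-square = {chi_square:.2f}, df = {df}")
if chi_square >= tabular_value:
    print("reject the hypothesis of no association")
else:
    print("do not reject the hypothesis of no association")
```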
90. 90. ASSUMPTIONS: The chi-square requires certain assumptions to be met before it can be applied. The chi-square is applicable to independent NOMINAL data. Data must be in frequencies or proportions of discrete categories. When data are organized in 2x2, 2x3 cells and so on, no cell should have an observed frequency of less than 5. If this happens, 2 cells can be combined. Chi-Square of Contingency Table</li></ul>For a 2x2 contingency table, the chi-square can be computed without resorting to the computation of fe. The following formula may be used:<br />$\chi^2 = \frac{N\left(|AD - BC| - \frac{N}{2}\right)^2}{(A+B)(C+D)(A+C)(B+D)}$<br />

| | | Total |
| --- | --- | --- |
| A | B | A+B |
| C | D | C+D |
| A+C | B+D | N |

ANALYSIS OF VARIANCE - F-ratio, ANOVA - When there are three or more sets of measurements on the same variable, each under its own conditions, there may be a need to know whether the differences can be attributed to chance, or whether there are real differences among the sets considered simultaneously. The method used is known as the F-test or Analysis of Variance (ANOVA). The F-test is the ratio of the variance based on the variation among the means to the variance within the groups or sets.<br />It is used to measure differences among three or more variables of a study, such as determining the differences in perception of teacher effectiveness along certain dimensions where students are suspected to differ in perceptions, or to determine differences among three or more subgroups.<br />The F-test provides an overall answer regarding the significance of the total collection of differences among means. There are times when a researcher desires more information than that; he may have a special interest in certain differences. In this case, he can make use of the z-test or t-test, as the case may be, to determine the specific mean difference between any two subgroups before the F-test is made.<br />For Experimental Design<br /><ul><li>Single-group design with different levels
91. 91. Parallel-group design
92. 92. One Control Group and Two or more experimental group
93. 93. Two Experimental groups
94. 94. Complete randomized design (CRD) using one-factor analysis of variance
95. 95. Latin-square design using two-factor ANOVA
96. 96. Randomized complete block design (RCBD) using two-factor ANOVA</li></ul>Example: Is there a significant difference among the means of the following test scores obtained by four groups of pupils on the same test?<br />

| GROUP I | GROUP II | GROUP III | GROUP IV |
| --- | --- | --- | --- |
| 3 | 6 | 10 | 9 |
| 4 | 8 | 12 | 10 |
| 8 | 7 | 11 | 12 |
| 7 | 7 | 10 | 9 |

PROCEDURE:<br /><ul><li>In the example above the number of groups is a=4; the number of observations within each group is n=4
97. 97. Find the sum of scores of all groups
98. 98. Find the sum of scores per group
99. 99. Consider each score as a total
100. 100. Square each score and add
101. 101. Square each group total and divide by the number of scores per group
102. 102. Square the grand total of the scores and divide by the total number of scores
103. 103. Summarize the sums of squares
104. 104. Prepare the anova table
105. 105. ANOVA table layout:

    | SOURCE | df | SS | Mean Squares | F-ratio |
    | --- | --- | --- | --- | --- |
    | Among groups | | | | |
    | Within groups | | | | |

106. 106.
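The single-factor ANOVA procedure for the four-group example can be sketched as below. This is an illustration under a stated assumption: the flattened score table is read row by row as Group I = 3, 4, 8, 7; Group II = 6, 8, 7, 7; Group III = 10, 12, 11, 10; Group IV = 9, 10, 12, 9, which is one plausible reading of the garbled data. The tabular F of 3.49 (df 3 and 12, 5% level) comes from a standard F table and is not stated in the slides.

```python
# Assumed reading of the flattened score table (see lead-in).
groups = {
    "I": [3, 4, 8, 7],
    "II": [6, 8, 7, 7],
    "III": [10, 12, 11, 10],
    "IV": [9, 10, 12, 9],
}

all_scores = [score for scores in groups.values() for score in scores]
total_n = len(all_scores)

# Square the grand total and divide by the total number of scores.
correction = sum(all_scores) ** 2 / total_n

# Square each score and add.
sum_of_squared_scores = sum(score**2 for score in all_scores)

# Square each group total and divide by the number of scores per group.
sum_group_terms = sum(sum(s) ** 2 / len(s) for s in groups.values())

ss_total = sum_of_squared_scores - correction
ss_among = sum_group_terms - correction
ss_within = ss_total - ss_among

df_among = len(groups) - 1
df_within = total_n - len(groups)

ms_among = ss_among / df_among
ms_within = ss_within / df_within
f_ratio = ms_among / ms_within

tabular_f = 3.49  # F table, df = (3, 12), alpha = 0.05 (assumed level)

print("SOURCE         df   SS        MS        F")
print(f"Among groups   {df_among:<4} {ss_among:<9.4f} {ms_among:<9.4f} {f_ratio:.2f}")
print(f"Within groups  {df_within:<4} {ss_within:<9.4f} {ms_within:<9.4f}")
print("significant" if f_ratio >= tabular_f else "not significant")
```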
107. 107. The sample problem presented is known as RANDOM EFFECT MODEL called the components of the variance model. It assumes that the treatments are random samples of all of the treatments. It does not look for differences among the group means of the treatments being tested, but rather ask whether there is significant variability among all the possible treatment groups.
108. 108. F-test or Single Factor Analysis of Variance (ANOVA) – it involves one independent variable as basis for classification. This is usually applied in single-group design and complete randomized design (CRD). To test the significance of the difference between means using F-test single factor ANOVA,
109. 109. F-test Two Factor or ANOVA Two-Factor – It involves three or more independent variables as basis for classification. F-test two-factor or ANOVA Two factor is appropriate for parallel-group design. In this design, three or more groups are used at the same time with one variable (control group) is manipulated or changed. The Experimental group varies while the parallel group serves as control group for manipulated purposes (Calmorin and Calmorin,2007). In parallel design, one variable is control group and two variables are experimental group.
110. 110. Kruskal-Wallis One-Way Analysis of Variance by Ranks (H). It is another inferential statistics used to gather independent samples both descriptive and experimental researches.This statistical technique is beneficial to test the difference of K independent samples from different populations whether they differ significantly or not. To apply H-test all the observations for the
111. 111. Friedman Two-Way Analysis of Variance (ANOVA) by Ranks (Xr2