# Small Sampling Theory Presentation1


**1. Small Sampling Theory**

**2. Introduction**

- Small sample theory: the study of statistical inference with small samples (n ≤ 30). It covers the t-distribution and the F-distribution, both defined in terms of the "number of degrees of freedom".
- Degrees of freedom (ν): the number of useful items of information generated by a sample of a given size with respect to the estimation of a given population parameter; equivalently, the total number of observations minus the number of independent constraints imposed on them.
- If n = number of observations and k = number of independent constraints, then ν = n − k degrees of freedom.
- Example: X = A + B + C (10 = 2 + 3 + C, so C = 5). Here n = 4, k = 3, so n − k = 1 degree of freedom.

**3. t-Distribution**

- William Sealy Gosset published the t-distribution in 1908 in Biometrika under the pen name "Student".
- When the sample size is larger than 30, the sampling distribution of the mean follows the normal distribution; when it is 30 or less, the sample statistic follows the t-distribution.
- Probability density function of the t-distribution: y = Y₀ (1 + t²/ν)^(−(ν+1)/2), where Y₀ is a constant depending on n such that the area under the curve is 1.
- The t-table gives the probability integral of the t-distribution.

**4. Properties of the t-Distribution**

- Ranges from −∞ to ∞.
- Bell-shaped and symmetrical around its mean of zero.
- Its shape changes as the number of degrees of freedom changes; hence ν is a parameter of the t-distribution.
- Its variance, ν/(ν − 2), is defined only for ν ≥ 3 and is always greater than one.
- It is more platykurtic (less peaked at the centre and higher in the tails) than the normal distribution.
- It has greater dispersion than the normal distribution; as n gets larger, the t-distribution approaches the normal form.

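These properties can be verified numerically; a minimal sketch, assuming SciPy is available (the choices ν = 10, ν = 5, and ν = 1000 are arbitrary illustrations):

```python
from scipy.stats import t, norm

nu = 10
# Variance of the t-distribution is nu/(nu - 2), defined for nu >= 3 and always > 1.
var_t = t.var(df=nu)
assert abs(var_t - nu / (nu - 2)) < 1e-9
assert var_t > 1

# Heavier tails than the standard normal: more probability beyond 2.
assert t.sf(2.0, df=5) > norm.sf(2.0)

# As the degrees of freedom grow, the t quantiles approach normal quantiles.
assert abs(t.ppf(0.975, df=1000) - norm.ppf(0.975)) < 0.01
```
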
**5. Steps involved in testing a hypothesis**

- Establish a null hypothesis.
- Suggest an alternative hypothesis.
- Calculate the t value.
- Find the degrees of freedom.
- Set up a suitable significance level.
- From the t-table, find the critical value of t using α (the risk of a type I error, i.e. the significance level) and ν, the degrees of freedom.
- If the calculated t value is less than the critical value from the table, the null hypothesis is accepted; otherwise the alternative hypothesis is accepted.

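The steps above can be sketched in code; this assumes SciPy is available, and the sample values and the hypothesised mean are made up for illustration:

```python
import numpy as np
from scipy.stats import t as t_dist

sample = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9])  # hypothetical data
mu0 = 12.0                    # Steps 1-2: H0: mu = 12 vs H1: mu != 12 (two-tailed)

n = len(sample)
s = sample.std(ddof=1)        # sample standard deviation with n - 1 divisor
t_calc = (sample.mean() - mu0) / (s / np.sqrt(n))   # Step 3: calculated t value
df = n - 1                    # Step 4: degrees of freedom
alpha = 0.05                  # Step 5: significance level
t_crit = t_dist.ppf(1 - alpha / 2, df)              # Step 6: two-tailed critical value

reject_h0 = abs(t_calc) > t_crit                    # Step 7: decision rule
```
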
**6. Applications of the t-distribution**

- Test of hypothesis about the population mean.
- Test of hypothesis about the difference between two means.
- Test of hypothesis about the difference between two means with dependent samples.
- Test of hypothesis about the coefficient of correlation.

**7. (1) Test of hypothesis about the population mean (σ unknown and small sample size)**

- Null hypothesis: H₀: μ = μ₀.
- The t value is given as t = (x̄ − μ₀) / (s/√n).
- The standard deviation of the sample is s = √[Σ(x − x̄)² / (n − 1)].
- Degrees of freedom = n − 1.
- Find the table value at the specified significance level and degrees of freedom; if the calculated value exceeds the table value, the null hypothesis is rejected.
- 100(1 − α)% confidence interval for the population mean: x̄ ± t_(α/2) · s/√n.

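The confidence-interval formula for the population mean can be computed directly; a sketch assuming SciPy, with hypothetical data values:

```python
import numpy as np
from scipy.stats import t as t_dist

x = np.array([9.8, 10.2, 10.5, 9.9, 10.1, 10.4])   # hypothetical sample
n, df = len(x), len(x) - 1
xbar = x.mean()
s = np.sqrt(((x - xbar) ** 2).sum() / (n - 1))     # sample s.d., n - 1 divisor

alpha = 0.05
margin = t_dist.ppf(1 - alpha / 2, df) * s / np.sqrt(n)
ci = (xbar - margin, xbar + margin)                # 95% CI for the population mean
```
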
**8. (2) Test of hypothesis about the difference between two means**

- When the population variances are unknown, the t-test is used in two forms:
- (a) when the variances are equal;
- (b) when the variances are not equal.

**9. (a) Case of equal variances**

- Null hypothesis: H₀: μ₁ = μ₂.
- The t value is given as t = (x̄₁ − x̄₂) / [sₚ √(1/n₁ + 1/n₂)],
- where the pooled variance is sₚ² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2).
- Degrees of freedom: n₁ + n₂ − 2.
- Find the table value at the specified significance level and degrees of freedom; if the calculated value exceeds the table value, the null hypothesis is rejected.

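The pooled (equal-variance) test can be sketched and cross-checked against SciPy's built-in two-sample t-test; the data values are hypothetical:

```python
import numpy as np
from scipy import stats

x1 = np.array([20.1, 19.8, 20.5, 20.2, 19.9])             # hypothetical sample 1
x2 = np.array([19.5, 19.2, 19.9, 19.4, 19.6, 19.3])       # hypothetical sample 2
n1, n2 = len(x1), len(x2)

# Pooled variance combines both samples' sums of squared deviations.
sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
t_calc = (x1.mean() - x2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

# SciPy's equal-variance test gives the same statistic.
t_scipy, p = stats.ttest_ind(x1, x2, equal_var=True)
assert abs(t_calc - t_scipy) < 1e-8
```
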
**10. (b) Case of unequal variances**

- When the population variances are not equal, we use the unbiased estimators s₁² and s₂² in place of σ₁² and σ₂².
- Here the sampling distribution has larger variability than the population variability.
- t value: t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂).
- Degrees of freedom (Welch–Satterthwaite approximation): ν = (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1)].
- Find the table value at the specified significance level and degrees of freedom; if the calculated value exceeds the table value, the null hypothesis is rejected.

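A sketch of the unequal-variance (Welch) case with its approximate degrees of freedom, cross-checked against SciPy; the data are hypothetical:

```python
import numpy as np
from scipy import stats

x1 = np.array([5.1, 5.3, 4.9, 5.4, 5.2])                  # hypothetical sample 1
x2 = np.array([4.2, 4.9, 3.8, 5.1, 4.4, 4.7, 4.0])        # hypothetical sample 2
n1, n2 = len(x1), len(x2)
v1, v2 = x1.var(ddof=1) / n1, x2.var(ddof=1) / n2         # s_i^2 / n_i terms

t_calc = (x1.mean() - x2.mean()) / np.sqrt(v1 + v2)
# Welch-Satterthwaite approximation for the degrees of freedom:
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

t_scipy, p = stats.ttest_ind(x1, x2, equal_var=False)     # SciPy's Welch test
assert abs(t_calc - t_scipy) < 1e-8
```

The approximate df falls between min(n₁ − 1, n₂ − 1) and n₁ + n₂ − 2 and is generally not an integer.
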
**11. Confidence interval for the difference between two means**

- Two samples of sizes n₁ and n₂ are randomly and independently drawn from two normally distributed populations with unknown but equal variances. The 100(1 − α)% confidence interval for μ₁ − μ₂ is given by: (x̄₁ − x̄₂) ± t_(α/2) · sₚ √(1/n₁ + 1/n₂), with n₁ + n₂ − 2 degrees of freedom.

**12. (3) Test of hypothesis about the difference between two means with dependent samples (paired t-test)**

- The samples are dependent: each observation in one sample is associated with some particular observation in the second sample.
- Observations in the two samples are collected in the form of matched pairs, so the two samples must have the same number of units.
- Instead of two samples we can take one random sample of pairs, and the two measurements in a pair will be related to each other; for example, in before-and-after experiments, or when observations are matched by some other criterion.

**13. (continued)**

- Null hypothesis: H₀: μ₁ = μ₂ (i.e., the mean difference is zero).
- The t value is given as t = d̄ / (s_d/√n),
- where the mean of the differences is d̄ = Σd/n and the standard deviation of the differences is s_d = √[Σ(d − d̄)² / (n − 1)].
- Degrees of freedom = n − 1.
- Find the table value at the specified significance level and degrees of freedom; if the calculated value exceeds the table value, the null hypothesis is rejected.
- Confidence interval for the mean of the differences: d̄ ± t_(α/2) · s_d/√n.

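The paired t-test above can be sketched on before/after measurements and cross-checked against SciPy's paired test; the measurements are hypothetical:

```python
import numpy as np
from scipy import stats

before = np.array([72, 75, 70, 78, 74, 77])    # hypothetical before-treatment values
after  = np.array([70, 72, 69, 74, 73, 74])    # hypothetical after-treatment values
d = before - after                             # difference for each matched pair
n = len(d)

dbar = d.mean()                                # mean of the differences
sd = d.std(ddof=1)                             # s.d. of the differences, n - 1 divisor
t_calc = dbar / (sd / np.sqrt(n))
df = n - 1

t_scipy, p = stats.ttest_rel(before, after)    # SciPy's paired t-test
assert abs(t_calc - t_scipy) < 1e-8
```
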
**14. (4) Test of hypothesis about the coefficient of correlation**

- Case 1: testing the hypothesis that the population coefficient of correlation equals zero, i.e., H₀: ρ = 0.
- Case 2: testing the hypothesis that the population coefficient of correlation equals some value other than zero, i.e., H₀: ρ = ρ₀.
- Case 3: testing the hypothesis for the difference between two independent correlation coefficients.

**15. Case 1: testing the hypothesis that the population coefficient of correlation equals zero, i.e., H₀: ρ = 0**

- Null hypothesis: there is no correlation in the population, i.e., H₀: ρ = 0.
- The t value is given as t = r√(n − 2) / √(1 − r²).
- Degrees of freedom: n − 2.
- Find the table value at the specified significance level and degrees of freedom; if the calculated value exceeds the table value, the null hypothesis is rejected and there is a linear relationship between the variables.

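The t statistic for H₀: ρ = 0 can be computed from the sample correlation and checked against the p-value SciPy's `pearsonr` reports; the (x, y) data are hypothetical:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])   # hypothetical data
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.7])
n = len(x)

r, p_scipy = stats.pearsonr(x, y)
t_calc = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)        # slide's t statistic
df = n - 2

# The two-tailed p-value from the t-distribution matches pearsonr's p-value.
p_manual = 2 * stats.t.sf(abs(t_calc), df)
assert abs(p_manual - p_scipy) < 1e-8
```
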
**16. Case 2: testing the hypothesis that the population coefficient of correlation equals some value other than zero, i.e., H₀: ρ = ρ₀**

- When ρ ≠ 0, a test based on the t-distribution is not appropriate; instead, Fisher's z-transformation applies:
- z = 0.5 logₑ[(1 + r)/(1 − r)], or equivalently z = 1.1513 log₁₀[(1 + r)/(1 − r)].
- z is approximately normally distributed with mean z_ρ = 0.5 logₑ[(1 + ρ)/(1 − ρ)] and standard deviation σ_z = 1/√(n − 3).
- This test is more applicable if the sample size is large (at least 10).

**17. (continued)**

- Null hypothesis: H₀: ρ = ρ₀.
- Test statistic: Z = (z − z_ρ) / σ_z = (z − z_ρ)√(n − 3),
- which approximately follows the standard normal distribution.

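Fisher's z-test above can be sketched directly, using the identity arctanh(r) = 0.5 logₑ[(1 + r)/(1 − r)]; the values of r, n, and ρ₀ are hypothetical:

```python
import numpy as np
from scipy.stats import norm

r, n, rho0 = 0.60, 28, 0.30      # hypothetical sample r, sample size, and H0 value

z_r = np.arctanh(r)              # = 0.5 * log((1 + r) / (1 - r))
z_rho = np.arctanh(rho0)         # mean of z under H0
sigma_z = 1 / np.sqrt(n - 3)     # standard deviation of z

Z = (z_r - z_rho) / sigma_z      # approximately standard normal under H0
p_value = 2 * norm.sf(abs(Z))    # two-tailed p-value
```
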
**18. Case 3: testing the hypothesis for the difference between two independent correlation coefficients**

- To test the hypothesis about two correlation coefficients derived from two separate samples, compare the difference of the two corresponding values of z with the standard error of that difference.
- Formula used: Z = (z₁ − z₂) / √[1/(n₁ − 3) + 1/(n₂ − 3)].
- If the absolute value of this statistic is greater than 1.96, the difference is significant at the 5% significance level.

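The two-correlation comparison can be sketched the same way; r₁, r₂ and the sample sizes are hypothetical:

```python
import numpy as np
from scipy.stats import norm

r1, n1 = 0.70, 39                # hypothetical correlation and size, sample 1
r2, n2 = 0.45, 48                # hypothetical correlation and size, sample 2

z1, z2 = np.arctanh(r1), np.arctanh(r2)            # Fisher z for each sample
se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))          # standard error of z1 - z2
Z = (z1 - z2) / se

significant_5pct = abs(Z) > 1.96                   # the slide's 5% decision rule
```
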
**19. The F-Distribution**

- Named in honour of R. A. Fisher, who studied it in 1924.
- It is defined in terms of the ratio of the variances of two normally distributed populations, so it is sometimes also called the variance ratio:
- F = (s₁²/σ₁²) / (s₂²/σ₂²),
- where s₁² and s₂² are unbiased estimators of σ₁² and σ₂² respectively.
- Degrees of freedom: ν₁ = n₁ − 1 and ν₂ = n₂ − 1.
- If σ₁² = σ₂², then F = s₁²/s₂².
- The distribution depends on ν₁ and ν₂ for the numerator and denominator respectively, so ν₁ and ν₂ are the parameters of the F-distribution; different values of ν₁ and ν₂ give different distributions.

**20. Probability density function**

- Probability density function of the F-distribution: f(F) = Y₀ · F^(ν₁/2 − 1) / (1 + ν₁F/ν₂)^((ν₁+ν₂)/2) for F ≥ 0, where Y₀ is a constant depending on ν₁ and ν₂ such that the area under the curve is 1.

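The density can be written out with its normalising constant (Y₀ expressed via the beta function) and checked pointwise against SciPy's implementation; the choice ν₁ = 5, ν₂ = 10 is arbitrary:

```python
import numpy as np
from scipy.stats import f
from scipy.special import beta

v1, v2 = 5, 10

def f_pdf(x, v1, v2):
    # f(x) = (v1/v2)^(v1/2) * x^(v1/2 - 1) * (1 + v1*x/v2)^(-(v1+v2)/2) / B(v1/2, v2/2)
    c = (v1 / v2) ** (v1 / 2) / beta(v1 / 2, v2 / 2)   # the constant Y0
    return c * x ** (v1 / 2 - 1) * (1 + v1 * x / v2) ** (-(v1 + v2) / 2)

xs = np.array([0.5, 1.0, 2.0, 3.0])
assert np.allclose(f_pdf(xs, v1, v2), f.pdf(xs, v1, v2))
```
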
**21. Properties of the F-distribution**

- It is positively skewed, and its skewness decreases as ν₁ and ν₂ increase.
- The value of F must always be positive or zero, since variances are squares; so its value lies between 0 and ∞.
- Mean: ν₂/(ν₂ − 2), for ν₂ > 2.
- Variance: [2ν₂²(ν₁ + ν₂ − 2)] / [ν₁(ν₂ − 2)²(ν₂ − 4)], for ν₂ > 4.
- The shape of the F-distribution depends upon the number of degrees of freedom.
- Areas in the left-hand tail can be found by taking the reciprocal of the F values in the right-hand tail with the numerator and denominator degrees of freedom interchanged. This is known as the reciprocal property: F_(1−α, ν₁, ν₂) = 1 / F_(α, ν₂, ν₁).
- Thus lower-tail F values can be found from the corresponding upper-tail F values, which are given in the appendix.

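The reciprocal property can be checked numerically with SciPy's F quantile function (here α and the degrees of freedom are arbitrary; note the slide's F_α is an upper-tail value, which corresponds to `f.ppf(1 - alpha, ...)`):

```python
from scipy.stats import f

alpha, v1, v2 = 0.05, 8, 12

lower_tail = f.ppf(alpha, v1, v2)       # F_{1-alpha, v1, v2} in upper-tail notation
upper_tail = f.ppf(1 - alpha, v2, v1)   # F_{alpha, v2, v1} with df interchanged

# Reciprocal property: F_{1-alpha, v1, v2} = 1 / F_{alpha, v2, v1}
assert abs(lower_tail - 1 / upper_tail) < 1e-9
```
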
**22. Testing a hypothesis for the equality of two variances**

- Based on the variances of two independently selected random samples drawn from two normal populations.
- Null hypothesis: H₀: σ₁² = σ₂².
- F = (s₁²/σ₁²) / (s₂²/σ₂²), which under H₀ reduces to F = s₁²/s₂².
- Place the larger sample variance in the numerator.
- Degrees of freedom: ν₁ and ν₂.
- Find the table value using ν₁ and ν₂; if the calculated F value exceeds the table F value, the null hypothesis is rejected.

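A sketch of this F-test, putting the larger sample variance in the numerator as the slide instructs; the data are hypothetical:

```python
import numpy as np
from scipy.stats import f

x1 = np.array([14.2, 15.1, 13.8, 16.0, 14.9, 15.5, 13.5, 15.8])   # hypothetical
x2 = np.array([14.8, 15.0, 14.9, 15.2, 14.7, 15.1, 15.0])         # hypothetical

s1, s2 = x1.var(ddof=1), x2.var(ddof=1)
if s1 >= s2:                                   # larger variance in the numerator
    F, v1, v2 = s1 / s2, len(x1) - 1, len(x2) - 1
else:
    F, v1, v2 = s2 / s1, len(x2) - 1, len(x1) - 1

alpha = 0.05
F_crit = f.ppf(1 - alpha, v1, v2)              # upper-tail table value
reject_h0 = F > F_crit
```
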
**23. Confidence interval for the ratio of two variances**

- The 100(1 − α)% confidence interval for the ratio of the variances of two normally distributed populations is given by:
- (s₁²/s₂²) / F_(α/2, ν₁, ν₂) < σ₁²/σ₂² < (s₁²/s₂²) / F_(1−α/2, ν₁, ν₂),
- where F_(α/2) and F_(1−α/2) are upper-tail critical values.

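The interval for the variance ratio can be sketched as follows (the sample variances and sizes are hypothetical; in SciPy's lower-tail convention, the upper-tail value F_(1−α/2) is `f.ppf(alpha / 2, ...)`):

```python
from scipy.stats import f

s1_sq, n1 = 4.0, 13        # hypothetical sample variance and size, sample 1
s2_sq, n2 = 2.5, 10        # hypothetical sample variance and size, sample 2
v1, v2 = n1 - 1, n2 - 1
alpha = 0.05

ratio = s1_sq / s2_sq
F_hi = f.ppf(1 - alpha / 2, v1, v2)   # upper-tail F_{alpha/2, v1, v2}
F_lo = f.ppf(alpha / 2, v1, v2)       # upper-tail F_{1-alpha/2, v1, v2}

ci = (ratio / F_hi, ratio / F_lo)     # 95% CI for sigma1^2 / sigma2^2
```
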