# Sampling distribution concepts



1. UNIT-V

2. Population & Sample
   - Population in statistics means the whole of the information that comes under the purview of a statistical investigation; it is the totality of all the observations of a statistical inquiry.
   - It is also known as the "universe"; a population may be finite or infinite.
   - A part of the population selected for study is called a sample. A sample is thus a group of items selected from a population in such a way that the group represents the population.
   - The number of individuals included in a finite sample is called the size of the sample.

3. Parameter & Statistic
   - Any statistical measure (such as mean, mode, or S.D.) computed from population data is known as a parameter.
   - Any statistical measure computed from sample data is known as a statistic.
   - A statistic computed from a sample drawn from the parent population plays an important role in (a) the theory of estimation and (b) the testing of hypotheses.

4. Notations used

   | Statistical measure | Population | Sample |
   | --- | --- | --- |
   | Mean | µ | x̄ |
   | Standard deviation | σ | s |
   | Size | N | n |
5. Sampling & Sampling Theory
   - Sampling is the process of selecting a sample from the population.
   - It can also be defined as the process of drawing a sample from the population and computing a suitable statistic in order to estimate the parameter of the parent population and to test the significance of the statistic computed from the sample.
   - Sampling theory is based on sampling; it deals with statistical inferences drawn from sampling results, which are of three types: (i) statistical estimation, (ii) tests of significance, and (iii) statistical inference.

6. Objects of Sampling Theory
   - To estimate a population parameter on the basis of a sample statistic.
   - To set the limits of accuracy and the degree of confidence of the estimates of the population parameter computed on the basis of the sample statistic.
   - To test significance about a population characteristic on the basis of a sample statistic.

7. Methods of Sampling
   - Random (probability) sampling: simple random sampling, stratified sampling, systematic sampling, multi-stage sampling.
   - Non-random sampling: judgment sampling, quota sampling, convenience sampling.
8. Random Sampling Methods

9. Simple Random Sampling
   - This method refers to the sampling technique in which each and every item of the population is given an equal chance of being included in the sample.
   - The selection is free from personal bias; this method is also known as the method of chance selection.
   - It is sometimes also referred to as "representative sampling": if the sample is chosen at random and the sample size is sufficiently large, it will represent all groups in the population.

10. Contd.
    - It is probability sampling because every item of the population has an equal opportunity of being selected in the sample.
    - Methods of obtaining a simple random sample: (1) the lottery method; (2) a table of random numbers (a number of random-number tables are available, such as Tippett's tables, Fisher and Yates's numbers, and Kendall and Babington Smith's numbers).
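The lottery method described above can be mimicked in software with a pseudo-random number generator. A minimal Python sketch (the sampling frame of 1,000 numbered items and the sample size of 50 are illustrative assumptions, not figures from the slides):

```python
import random

# Hypothetical sampling frame: items numbered 1..1000
population = list(range(1, 1001))

# Draw a simple random sample of 50 items without replacement;
# random.sample gives every item an equal chance of selection
sample = random.sample(population, 50)

print(len(sample))   # 50 distinct items
```

Because `random.sample` draws without replacement, no item can appear twice, which matches the lottery-ticket analogy.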
11. Stratified Sampling
    - It is one of the restricted random methods which, by using available information concerning the data, attempts to design a more efficient sample than that obtained by the simple random procedure.
    - The process of stratification requires that the population be divided into homogeneous groups or classes called strata.
    - A sample is then taken from each group by the simple random method, and the resulting sample is called a stratified sample.

12. Contd.
    - A stratified sample may be either proportional or disproportionate.
    - In a proportional stratified sampling plan, the number of items drawn from each stratum is proportional to the size of the stratum.
    - For example, if the population is divided into 4 strata, their respective sizes being 15%, 10%, 20%, and 55% of the population, and a sample of 1,000 is to be drawn, the desired proportional sample may be obtained in the following manner:

13. Contd.

    | Stratum | Calculation | Items |
    | --- | --- | --- |
    | One | 1000 × 0.15 | 150 |
    | Two | 1000 × 0.10 | 100 |
    | Three | 1000 × 0.20 | 200 |
    | Four | 1000 × 0.55 | 550 |
    | Sample size | | 1000 |

    Disproportionate stratified sampling takes an equal number of items from each stratum irrespective of its size.
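The proportional allocation in the table above is just the sample size multiplied by each stratum's share; a short Python sketch of that arithmetic:

```python
# Strata shares from the worked example: 15%, 10%, 20%, 55%
strata = {"one": 0.15, "two": 0.10, "three": 0.20, "four": 0.55}
n = 1000  # total sample size

# Items drawn from each stratum, proportional to its size
allocation = {name: round(n * share) for name, share in strata.items()}

print(allocation)   # {'one': 150, 'two': 100, 'three': 200, 'four': 550}
```

A simple random sample of the allocated size would then be drawn within each stratum.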
14. Systematic Sampling
    - This method is popularly used where a complete list of the population from which the sample is to be drawn is available.
    - The method is to select every kth item from the list, where k refers to the sampling interval: k = size of population / sample size (N/n).
    - The starting point between the first and the kth item is selected at random.

15. Contd.
    - For example, if a complete list of 1,000 students is available and we want to draw a sample of 200 students, then k = 1000/200 = 5, so we must take every 5th item.
    - The first item between one and five is selected at random. If it is three, we keep adding 5 to obtain the numbers of the desired sample: 3, 8, 13, and so on.
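The student example above can be sketched in a few lines of Python, fixing the random start at 3 as the slide does:

```python
N, n = 1000, 200      # population size and desired sample size
k = N // n            # sampling interval: 1000 / 200 = 5
start = 3             # random start between 1 and k (3, as in the slide)

# Every kth item from the random start onwards
sample_ids = list(range(start, N + 1, k))

print(sample_ids[:4], len(sample_ids))   # [3, 8, 13, 18] 200
```

In practice `start` would be drawn at random from 1..k rather than fixed.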
16. Cluster Sampling
    - It differs from stratified sampling in that each stratum consists of homogeneous items, whereas the groups in cluster sampling are mutually exclusive but not exactly homogeneous.
    - Multi-stage sampling is a type of cluster sampling.

17. Multi-stage Sampling
    - As the name suggests, this method refers to a sampling procedure carried out in several stages.
    - The material is regarded as made up of a number of first-stage sampling units, each made up of a number of second-stage units.
    - First, the first-stage units are sampled by some suitable method, such as random sampling. Then a sample of second-stage units is selected from each of the selected first-stage units, again by some suitable method, which may be the same as or different from the method employed for the first-stage units.

18. Non-Random Sampling Methods
19. Judgment Sampling
    - In this method of sampling the choice of sample items depends exclusively on the judgment of the investigator.
    - This method, though simple, is not scientific.
    - It is used in solving many types of economic and business problems, for example (i) when the sample size is small, or (ii) when an estimate must be made available quickly.

20. Quota Sampling
    - It is a type of judgment sampling.
    - In a quota sample, quotas are set up according to given criteria, but within the quotas the selection of sample items depends on personal judgment.

21. Convenience Sampling
    - It is also known as the chunk: a chunk is a fraction of one population taken for investigation because of its convenient availability.
    - Hence a chunk is selected neither by probability nor by judgment but by convenience.
    - Convenience samples are sometimes called accidental samples because those entering the sample enter by "accident".
22. Errors in Sampling: discrepancies between a statistical measure of the population (the parameter) and that of the sample drawn from the same population (the statistic).
    - Sampling errors are of two types: (a) biased errors, which arise due to bias in selection, estimation, etc.; and (b) unbiased errors, which arise due to chance factors. They occur primarily for the following reasons: (1) faulty selection of the sample; (2) substitution.
    - Non-sampling errors may arise in the following ways: (1) negligence and carelessness on the part of the investigator; (2) incomplete investigation and sample survey; (3) negligence and non-response on the part of the respondents; (4) errors in data processing.

23. Principles of Sampling
    - Principle of "statistical regularity": this principle lays down that a moderately large number of items chosen at random from a large group are almost sure, on average, to possess the characteristics of the large group.
    - Principle of "inertia of large numbers": this principle is a corollary of the above. It states that, other things being equal, the larger the size of the sample, the more accurate the results are likely to be.
24. Theory of Estimation
    - Statistical estimation is the procedure of using a sample statistic to estimate a population parameter.
    - A statistic used to estimate a parameter is called an estimator, and the value taken by the estimator is called an estimate.
    - For example, the sample mean (say 7.65) is an estimator of the population mean.

25. Statistical estimation is divided into two major categories: point estimation and interval estimation.
    - In point estimation, a single statistic is used to provide an estimate of the population parameter; a change in the sample will cause the estimate to deviate.
    - An interval estimate is a range of values within which a researcher can say with some confidence that the population parameter falls; this range is called a confidence interval.

26. Qualities of a good estimator
    - A good estimator is one which is as close to the true value of the parameter as possible.
    - A good estimator must possess the following characteristics: (i) unbiasedness, (ii) consistency, (iii) efficiency, and (iv) sufficiency.
27. Contd.
    - Unbiasedness: this is a desirable property for a good estimator. The sample mean, for instance, is an unbiased estimator of the population mean because the mean of the sampling distribution of sample means taken from the same population is equal to the population mean itself.
    - Efficiency: it refers to the size of the standard error of the statistic. If two statistics computed from samples of the same size are compared, the statistic with the smaller standard error (the standard deviation of its sampling distribution) is the more efficient estimator.

28. Contd.
    - Consistency: a statistic is a consistent estimator if, as the sample size increases, it becomes almost certain that the value of the statistic comes very close to the value of the population parameter.
    - Sufficiency: an estimator is sufficient if it makes so much use of the information in the sample that no other estimator could extract from the sample additional information about the population parameter being estimated.

29. Hypothesis Testing
    - Hypothesis testing is based on a hypothesis: an assumption about an unknown population parameter.
    - Hypothesis testing is a well-defined procedure which helps in deciding objectively whether to accept or reject the hypothesis based on the information available from the sample.
30. Hypothesis Testing Procedure
    Step 1: Set the null and alternative hypotheses.
    - The assumption which we want to test is called the null hypothesis, symbolized as H0.
    - The null hypothesis is set with no difference (i.e. the status quo) and is considered true unless and until it is disproved by the collected sample data.
    - Example: H0: µ = 500 ("the population mean is equal to 500").

31. Contd.
    - The alternative hypothesis, generally denoted by H1 or Ha, is the logical opposite of the null hypothesis: H1: µ ≠ 500 (or, for one-tailed tests, H1: µ > 500 or H1: µ < 500).
    - In other words, when the null hypothesis is found to be true, the alternative hypothesis must be false, and vice versa.
    - Rejection of the null hypothesis indicates that the difference has statistical significance; acceptance of the null hypothesis indicates that the difference is due to chance.

32. Step 2: Set up a suitable level of significance.
    - The level of significance, generally denoted by α, is the probability attached to a null hypothesis of being rejected even when it is true.
    - The level of significance is also known as the size of the rejection region or the size of the critical region.
    - It is generally specified before any samples are drawn, so that the results obtained will not influence the choice.
    - Any level of significance can be adopted; in practice we take either the 5% or the 1% level of significance.

33. Contd.
    - When we take the 5% level of significance, there are about 5 chances out of 100 that we would reject the null hypothesis when it should be accepted, i.e. we are about 95% confident that we have made the right decision.
    - When the null hypothesis is rejected at α = 0.05, the test result is said to be significant.
    - When the null hypothesis is rejected at α = 0.01, the test result is said to be highly significant.
34. Step 3: Determine a suitable test statistic.
    - Many of the test statistics that we shall encounter have the following form:

      test statistic = (sample statistic − hypothesized population parameter) / (standard error of the sample statistic)

35. Step 4: Set the decision rule.
    - The next step for the researcher is to establish a critical region:
    - Acceptance region: where the null hypothesis is accepted.
    - Rejection region: where the null hypothesis is rejected.

36. Step 5: Collect the sample data.
    - The data is now collected and the appropriate sample statistics are computed.

37. Step 6: Analyse the data.
    - This involves selecting an appropriate probability distribution for the particular test.
    - For example, when the sample is small (n < 30), the normal probability distribution (Z) is not an accurate choice; the t distribution needs to be used in this case.
    - Some commonly used testing procedures are the Z, t, F, and chi-square tests.

38. Step 7: Arrive at a statistical conclusion and business implication.
    - The statistical conclusion is a decision to accept or reject the null hypothesis.
    - This depends on whether the computed test statistic falls in the acceptance region or the rejection region.
39. Types of Errors in Hypothesis Testing

    | Decision | H0 true | H0 false |
    | --- | --- | --- |
    | Accept H0 | Correct decision | Type II error (β) |
    | Reject H0 | Type I error (α) | Correct decision |
40. Z-test
    - Hypothesis testing for large samples, i.e. n ≥ 30.
    - Based on the assumption that the population from which the sample is drawn has a normal distribution; as a result, the sampling distribution of the mean is also normally distributed.
    - Applications: (1) testing a hypothesis about a single population mean; (2) hypothesis testing for the difference between two population means; (3) hypothesis testing for attributes.
41. Formula for a single population mean (infinite population)

    Z = (x̄ − µ) / (σ/√n)

    where µ = population mean, x̄ = sample mean, σ = population standard deviation, n = sample size.
42. Q: A marketing research firm conducted a survey 10 years ago and found that the average household income of a particular geographic area was Rs 10,000. Mr. Gupta, who recently joined the firm as a VP, expresses doubts about this figure. To verify it, the firm decides to take a random sample of 200 households, which yields a sample mean of Rs 11,000. Assume that the population S.D. is Rs 1,200. Verify Mr. Gupta's doubts using α = 0.05.
    - Step 1: Set the null and alternative hypotheses: H0: µ = 10000; H1: µ ≠ 10000.
    - Step 2: Determine the appropriate statistical test. Since the sample size is ≥ 30, the z-test can be used for hypothesis testing.
    - Step 3: Set the level of significance: α = 0.05.
    - Step 4: Set the decision rule. The acceptance region covers 95% of the area and the rejection region 5%; the critical values obtained from the table are ±1.96.
43. Contd.
    - Step 5: Collect the sample data. A sample of 200 respondents yields a sample mean of Rs 11,000.
    - Step 6: Analyse the data. n = 200, µ = 10000, x̄ = 11000, σ = 1200:

      Z = (x̄ − µ) / (σ/√n) = (11000 − 10000) / (1200/√200) = 11.79

    - Step 7: Arrive at a statistical conclusion and business implication. The Z value of 11.79 is greater than +1.96, hence the null hypothesis is rejected and the alternative hypothesis is accepted. Mr. Gupta's doubt about the household income was justified.
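The z computation in the household-income example can be checked in Python using the figures from the slide:

```python
import math

# Figures from the worked example
n, x_bar, mu, sigma = 200, 11000, 10000, 1200

# z = (sample mean - hypothesized mean) / standard error
z = (x_bar - mu) / (sigma / math.sqrt(n))

print(round(z, 2))   # 11.79, well outside ±1.96 → reject H0
```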
44. Formula for a single population mean (finite population)

    Z = (x̄ − µ) / ( (σ/√n) · √((N − n)/(N − 1)) )

    When the population standard deviation is not known:

    Z = (x̄ − µ) / (s/√n)

    where s = sample standard deviation.
45. Hypothesis testing for the difference between two population means

    Z = ((x̄1 − x̄2) − (µ1 − µ2)) / √(σ1²/n1 + σ2²/n2)

46. Hypothesis testing for attributes

    Z = (x − µ) / √(npq)

    where n = sample size, µ = np, p = probability of happening, q = 1 − p = probability of not happening.
47. Q: In 600 throws of a 6-faced die, odd points appeared 360 times. Would you say that the die is fair at the 5% level of significance?
    - H0: the die is fair; p = q = ½; n = 600; np = 300; x = 360.

      Z = (x − np) / √(npq) = (360 − 300) / √(600 × ½ × ½) = 4.9

    - Z is greater than 1.96 (at 5%), so H0 is rejected. Hence, the die is not fair.
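The die example can likewise be verified numerically:

```python
import math

# Figures from the die-throwing example
n, p, q = 600, 0.5, 0.5
x = 360                  # observed number of odd points
expected = n * p         # np = 300 expected under H0

# z for attributes: (x - np) / sqrt(npq)
z = (x - expected) / math.sqrt(n * p * q)

print(round(z, 1))   # 4.9 > 1.96 → reject H0: the die is not fair
```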
48. t-test
    - Given by W.S. Gosset in 1908 under the pen name "Student".
    - The t-test can be applied when: (1) the researcher draws a small random sample (n < 30) to estimate the population mean (µ); (2) the population standard deviation (σ) is unknown; (3) the population is normally distributed.

49. Applications of the t-test
    - Hypothesis testing for a single population mean.
    - Hypothesis testing for the difference between two independent population means.
    - Hypothesis testing for the difference between two dependent population means.

50. Hypothesis testing for a single population mean

    t = (x̄ − µ) / (s/√n), with (n − 1) degrees of freedom

    where µ = population mean, x̄ = sample mean, s = sample standard deviation, n = sample size.
51. Q: Royal Tyre has launched a new brand of tyres for tractors and claims that under normal circumstances the average life of the tyres is 40,000 km. A retailer wants to test this claim and has taken a random sample of 8 tyres, testing their life under normal circumstances. The results obtained are:

    | Tyre | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | Km | 35000 | 38000 | 42000 | 41000 | 39000 | 41500 | 43000 | 38500 |

    Use α = 0.05 for testing the hypothesis.
    - Step 1: Set the null and alternative hypotheses. H0: µ = 40000; H1: µ ≠ 40000.
    - Step 2: Determine the appropriate statistical test. The sample size is less than 30, so the t-test is appropriate.
    - Step 3: Set the level of significance: α = 0.05.
    - Step 4: Set the decision rule. The t distribution value for a two-tailed test is t(0.025) = 2.365 for 7 degrees of freedom. If the computed t value falls outside the ±2.365 range, the null hypothesis will be rejected; otherwise it will be accepted.
52. Contd.
    - Step 5: Collect the sample data (the tyre-life figures above).
    - Step 6: Analyse the data. x̄ = 39750; µ = 40000; s = 2618.61; n = 8; df = n − 1 = 7; table value t(0.025, 7) = 2.365.

      t = (x̄ − µ) / (s/√n) = (39750 − 40000) / (2618.61/√8) = −0.27

    - Step 7: Arrive at a statistical conclusion and business implication. The observed t value of −0.27 falls within the acceptance region, hence the null hypothesis H0: µ = 40000 is accepted.
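The tyre example's mean, sample standard deviation, and t statistic can be reproduced with Python's standard library:

```python
import math
import statistics

# Tyre lives (km) from the worked example
km = [35000, 38000, 42000, 41000, 39000, 41500, 43000, 38500]

x_bar = statistics.mean(km)    # 39750
s = statistics.stdev(km)       # sample SD (n - 1 divisor) ≈ 2618.61
n = len(km)

# t = (sample mean - claimed mean) / (s / sqrt(n))
t = (x_bar - 40000) / (s / math.sqrt(n))

print(round(t, 2))   # -0.27, inside ±2.365 → H0 accepted
```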
53. Hypothesis testing for the difference between two independent population means

    t = ((x̄1 − x̄2) − (µ1 − µ2)) / ( σ̂ · √(1/n1 + 1/n2) )

    σ̂ can be estimated by pooling the two sample variances and computing the pooled standard deviation:

    σ̂ = s_pooled = √( (s1²(n1 − 1) + s2²(n2 − 1)) / (n1 + n2 − 2) )
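A minimal sketch of the pooled two-sample t statistic under H0: µ1 − µ2 = 0; the two small samples here are invented for illustration, not taken from the slides:

```python
import math
import statistics

# Hypothetical independent samples
sample1 = [12, 14, 15, 13, 16]
sample2 = [10, 11, 13, 12]
n1, n2 = len(sample1), len(sample2)

s1 = statistics.stdev(sample1)
s2 = statistics.stdev(sample2)

# Pooled standard deviation, as in the formula above
s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# t statistic with n1 + n2 - 2 degrees of freedom
t = (statistics.mean(sample1) - statistics.mean(sample2)) / (
    s_pooled * math.sqrt(1 / n1 + 1 / n2))

print(round(t, 2))
```

Pooling assumes the two populations have equal variances, which is the premise of this formula.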
54. F-test
    - Named after R.A. Fisher, who first studied it in 1934.
    - This distribution is usually defined in terms of the ratio of the variances of two normally distributed populations.
    - The quantity

      F = (s1²/σ1²) / (s2²/σ2²)

      is F-distributed with (n1 − 1) and (n2 − 1) degrees of freedom.

55. Contd.

    where s1² = Σ(x1 − x̄1)² / (n1 − 1) and s2² = Σ(x2 − x̄2)² / (n2 − 1).
56. Chi-square test
    - Chi-square is related to categorical data (counting frequencies of one or more variables).
    - Some researchers place the chi-square test in the category of non-parametric tests.
    - The χ² test was developed by Karl Pearson in 1900; the symbol χ stands for the Greek letter "chi".
    - χ² is a function of its degrees of freedom.

57. Contd.
    - Being a sum of squared quantities, χ² can never take a negative value.
    - χ² is a continuous probability distribution with range zero to infinity.

      χ² = Σ (O − E)² / E, with df = (r − 1)(c − 1)

      E = (row total × column total) / grand total
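The χ² computation above can be sketched in Python for a hypothetical 2×2 contingency table (the observed frequencies are invented for illustration):

```python
# Hypothetical 2x2 contingency table of observed frequencies
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]         # [50, 50]
col_totals = [sum(col) for col in zip(*observed)]   # [50, 50]
grand = sum(row_totals)                             # 100

# chi-square = sum over cells of (O - E)^2 / E,
# where E = row total * column total / grand total
chi2 = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / grand
        chi2 += (observed[i][j] - expected) ** 2 / expected

df = (2 - 1) * (2 - 1)
print(chi2, df)   # 4.0 1
```

Here every expected frequency is 25, so χ² = 4 × (5²/25) = 4.0 with 1 degree of freedom, which exceeds the 5% critical value of 3.841.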
58. Decision rule
    - If χ² calculated > χ² critical, reject the null hypothesis.
    - If χ² calculated < χ² critical, accept the null hypothesis.

59. Conditions to apply the chi-square test
    - Data should not be in percentages or ratios; they should be expressed in the original units.
    - The sample should consist of at least 50 observations, should be drawn randomly, and individual observations in the sample should be independent of each other.