Upcoming SlideShare
×

# Inferential statistics

4,968

Published on

Published in: Technology
0 Likes
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

• Be the first to like this

Views
Total Views
4,968
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
250
0
Likes
0
Embeds 0
No embeds

No notes for slide

### Inferential statistics

1. 1. Statistical Techniques Inferential StatisticsTerm paper - Inferential Statistics 1
2. 2. Statistical TechniquesTable of Contents1.0 WHAT IS INFERENTIAL STATISTICS?.................................................................................................32.0 BRIEF TIMELINE OF INFERENTIAL STATISTICS .............................................................................43.0 TEST OF WHAT? ..........................................................................................................................................6 3.1 HYPOTHESIS – INTRODUCTION ...............................................................................................6 3.1.2 ERRORS IN SAMPLING.................................................................................................7 3.1.3 STUDENT’s T-TEST ........................................................................................................9 3.1.4 CHI-SQUARE TEST.......................................................................................................10 3.2 REGRESSION? ...............................................................................................................................12 3.2.1 REGRESSION MODELS...............................................................................................12 3.2.2 SCATTER-PLOTS ..........................................................................................................12 3.2.3 REGRESSION EQUATION ..........................................................................................12 3.2.4 REGRESSION INTERPRETATION............................................................................15 3.2.5 R SQUARRED .................................................................................................................154.0 BIBLIOGRAPHY.........................................................................................................................................16Term paper - Inferential Statistics 2
3. 3. Statistical Techniques1.0 WHAT IS INFERENTIAL STATISTICS?The vital key to the difference between descriptive and inferential statistics are the capitalized wordsin the description: CAN DESCRIBE, COULD NOT CONCLUDE, AND REPRESENTATIVE OF.Descriptive statistics can only describe the actual sample you study. But to extend your conclusions toa broader population, like all such classes, all workers, all women, you must use inferential statistics,which means you have to be sure the sample you study is representative of the group you want togeneralize to.Allow me to exemplify: i. The study at the local mall and cannot be used to claim that what you find is valid for all shoppers and all malls. ii. Another example would be a study conducted on an intermediate college can’t claim that what you find is valid for the colleges of all levels (i.e. General Population). iii. Also visualize a survey conducted at a womens club that includes a majority of a particular single ethnic group cannot claim that what you find is valid for women for all ethnic groups. As you can see, descriptive statistics are useful and serviceable if you dont need to extend your results to whole segments of the population. But the social sciences tend to esteem studies that give us more or less "universal" truths, or at least truths that apply to large segments of the population, like all teenagers, all parents, all women, all perpetrators, all victims, or a fairly large segment of such groups. Leaving aside the theoretical and mechanical soundness of such an investigation for some kind of broad conclusion, various statistical approaches are to be utilized if one aspires to generalize. And the primary distinction is that of SAMPLING. One must choose a sample that is REPRESENTATIVE OF THE GROUP TO WHICH YOU PLAN TO GENERALIZE. To round up, Descriptive statistics are for describing data on the group you study, While Inferential statistics are for generalizing your findings to a broader population group.Term paper - Inferential Statistics 3
4. 4. Statistical Techniques2.0 BRIEF TIMELINE OF INFERENTIAL STATISTICS1733 1733 - In the 1700s, it was Thomas Bayer who gave birth to the concept of inferential statistics. The normal distribution was discovered in 1733 by a Huguenot refugee de Moivre as an approximation to the binomial distribution when the number of trials is too large. Today, not only do scientists but also many professions rely on statistics to understand behaviour and ideally make predictions about what circumstances relate to or cause these behaviours.1796 Historical Note: In 1796, Adophe Quetelet investigated the characteristics of French conscripts to determine the "average man." Florence Nightingale was so influenced by Quetelets work that she began collecting and analyzing medical records in the military hospitals during the Crimean War. Based on her work hospitals began keeping accurate records on their patients, to provide better follow-up care.1894 1894 - At the inception of the social survey, research results were confronted with the developments in inferential statistics. In 1894, Booth wrote The Aged Poor in England and Wales: Conditions? In this volume Booth claimed that there was no relationship between the ratio of welfare (out-of-doors relief) and workhouse relief (in-relief and the incidence of poverty by parish (or poor law union).1896 Dec 1896 - Walker died rather suddenly at the age of 56, just days after giving the address opening the first meeting of ASA outside Boston—in Washington, DC in December 1896. That meeting led to the founding of the Washington Statistical Society. His achievements in developing major federal data systems, in promoting the organizational development of statistics, and of bringing statistical ideas to a wide audience, left the field much richer than he found it.1899 1899 - Since inferential social statistics are primarily concerned with correlation and regression. To prove this Yule published his paper on poverty in London in 1899, this concern has occurred in a context of establishing causality. Often investigators seem to view statistical modeling as being equivalent to a regression model. The reader is cautioned that my critique of regression analysis is not necessarily equivalent to denying the value of empirical research.Term paper - Inferential Statistics 4
5. 5. Statistical Techniques1925 1925 – First, Sigmund Freud had developed a theory that self-explained the reasons for aggression and juvenile criminal behaviours in terms of childhood experiences. Second, in 1925, RA Fisher published Statistical Methods for Research Workers in which he identified an effective experimental paradigm that included control groups and inferential statistics. Freuds theory and Fishers paradigm provided a basis so that mental health professionals could initiate studies to identify many mental behaviours1930 IN 1930, THE YEAR the CH Stoelting Co. of Chicago published what was to be the largest- ever catalog of psychological apparatus, there was virtually no use of inferential statistics in psychology, in spite of the fact that William Sealey Cosset had long since presented the T-test and Sir Ronald Fisher had presented the general logic of null hypothesis testing. Only after Fishers epochal introduction to analysis of variance procedures did psychologists even notice the procedure.1930 1930 - The fiducial argument, which Fisher produced in 1930, generated much controversy and did not survive the death of its creator. Fisher created many terms in everyday use, eg statistic and sampling distribution and so there are many references to his work on the Words pages. Symbols in Statistics are his contributions to notation.1935 1935 - In the two decades following the publication of Ronald Aylmer Fishers Design of Experiments in 1935, Fishers link between experimental design and inferential statistics became institutionalized in American experimental psychology.1936 Apr 27, 1936 - . Pearson founded the journal Biometrics and was the editor of Annals of Eugenics. Because of his fundamental work in the development of modern statistics, many scholars today regard Pearson as the founder of 20th-century statistics. He died in Coldharbour, England, on April 27, 1936.1977 1977 - The youth violence prevention landscape has changed drastically in the last quarter century. In 1977, Wright and Dixon published a review of “Juveniles delinquency prevention program” reports. The results were disappointing. From approximately 6600 program abstracts, empirical data were available from only 96 . Of the 96 empirical reports, only 9 used random assignment of subjects, inferential statistics, outcomes measure of delinquency, and a follow-up period of at least six months. Of those 9, only 3 reported positive outcomes, and these three were based on the three smallest sample sizes among the 9 reports. The authors concluded that the literature was low in both scientific and policy utility. By contrast today dozens of summaries of research on prevention practices are available.Term paper - Inferential Statistics 5
6. 6. Statistical Techniques 1981 Jun 3, 1981 - Education Practical Statistics for Educators An introduction to the basic ideas of descriptive and inferential statistics as applied to the work of the classroom teachers counselors and administrators In the public schools Emphasis Is upon practical applications of statistics to problems. 1986 Dec 17, 1986 - Koop acknowleged that the proof of these smoker’s deaths was "inferential, of course," based on analyses of statistics gathered in past studies, including several in Japan, Hong Kong, Taiwan, Europe and the United States. 1995 Jan 1995 - A jury trial on compensatory damages was held in January 1995. Dannemiller testified that the selection of the random sample met the standards of inferential statistics, that the successful efforts to locate and obtain testimony from the claimants in the random sample "were of the highest standards " in his profession, that the procedures followed conformed to the standards of inferential statistics.3.0 TEST OF WHAT? Tests of significance are helpful in problems of generalization. A Chi-Square or a T-Test tells you the probability that the results you found in the group under study represent the population of the chosen group. It can be frequently observed, Chi-Square or a t-test gives you the probability that the results found could have occurred by chance when there is really no relationship at all between the variables you studied in the population. A known method used in inferential statistics is estimation. In estimation, the sample is used to estimate a parameter, and a confidence interval about the estimate is constructed. Other examples of inferential statistics methods include i. Hypothesis testing ii. Linear regression 3.1 HYPOTHESIS – INTRODUCTION Hyptothesis is a statement about the population parameter or about a population distribution. The testing of hypothesisis conducted in two phases. In the first phase, a test is designed where we decide as to when can the null hypothesis be rejected. In the second phase, the designed test is used to draw the conclusion. Hypothesis testing is to test some hypothesis about parent population from which the sample is drawn. DEFINITIONS PARAMETER - The statistical constants of the population namely mean (µ) , variance are usually referred to as parameters. STATISTIC - Statistical measures computed from the sample observations alone namely mean X Variance S2 have been termed as Statistic. Term paper - Inferential Statistics 6
7. 7. Statistical TechniquesUNBIASED ESTMATE - A statistic t = t (X1, X2, …..Xn), a function of the sample values X1, X2,…….Xn isan unbiased estimate of the population parameter 0, if E(t) = 0. In other words, ifE(Statistic) = Paramater, then statistic is said to be an unbiased estimate of the parameter.SAMPLING DISTRIBUTION OF A STATISTIC - If we draw a sample of size n from a given finite populationof size N, then the total number of possible samples is /n!(N-n)! = KSTANDARD ERROR - The standard deviation of the sampling distribution of a statistic is known as it standarderror.NULL HYPOTHESIS - A definite statement about the population parameter which is usually a hypothesis ofno difference is called Null Hypothesis an is usually denoted by HoALTERNATIVE HYPOTHESIS - Any hypothesis which is complementary to the null hypothesis is called analternative hypothesis usually denoted by H1. For example, if we want to test the null hypothesis that thepopulation has a specified mean Mo (say) is Ho : µ - µo then the alternative hypothesis could be a) H1 : µ ≠ µ o b) H1 : µ > µ o c) H1 : µ < µ o3.1.1 PROCEDURE FOR TESTING OF HYPOTHESISVarious steps in testing of a statistical hypothesis in a systematic manner : 1. Null hypothesis : Set up the null hypothesis H0 2. Alternative Hypothesis : Set up the alternative hypothesis H1. This will be enable us to decide whether we have to use a single tailed(right or left) test of two-tailed test. 3. Level of Significance : To choose the appropriate level of significance (x) 4. Test Statistic : To compute the test statistic : Z = t-E(t)/S1E(t) , under Ho 5. Conclusion : We compare the computed value of Z with the significant value Z2, at the given level of significance, if │z │ <z2 we say it not significant, it │z│ >z is then we say that it is significant and the null hypothesis is rejected at level of significance.3.1.2 ERRORS IN SAMPLINGThe main objective in sampling theory is to draw valid inference about the population parameters on the basisof the sample results. In practice, we decide to accept or reject the lot after examining a sample from it. As suchwe are able to commit the following two types of errors :TYPE 1 ERROR : Reject Ho when it is true,TYPE II ERROR : Accept Ho when it is wrong, ie. Accept Ho when H1 is trueIf we mention P(Accept Ho when it is wrong) = P(Accept Ho/H1) = β andP(Reject Ho when it is true) = P(Reject Ho/H1) = x then 2 and β are called the sizes of type 1 error and type IIerror, respectively. In practice, type I error amounts to rejecting a lot when it is good and type II error may beTerm paper - Inferential Statistics 7
8. 8. Statistical Techniquesregarded as accepting the lot when it is bad. Thus P(Reject a lot when it is good) = α and P (Accept a lot whenit is bad) = β where α and β are referred to as producer’s risk and consumer’s risk respectively.CRITICAL REGIONA region in the sample splace S which amounts to rejection of Ho is termed as critical region of rejection.ONE – TAILED AND TWO-TAILED TESTSHo: µ > µ o (Right-tailed), the critical region lies entirely in the right tailH1: µ < µ o (left-tailed), the critical region lies entirely in the left tail.A test of statistical hypothesis where the alternative hypothesis is two – tailed tests such as Ho:µ=µ o against thealternative hypothesis H1:µ=µ o isknown as two tailed test and in such a case the critical region is given by theportion of the area lying on both tailsof the probability curve of the test statistic.CRITICAL VALUE OR SIGNIFICANT VALUESThe value of test statistic which separates the critical (or rejection) region and the acceptance region is calledthe critical value or significant value. It depends on : 1) The level of significance used, and 2) The alternative hypothesis, whether it is two-tailed of single-tailedThe standardized variable corresponding to the statistic t namely Z =The value of z above under the null hypothesis is known as test statistic.The critical value of the test statistic at level of significance 2 for a two-tailed test is given by Z, where Z isdetermined by the equation : P(1Z1>Z o ) = α i.e., Zα is the value so that the total area of the critical region onboth tails is 2. Since normal probability curve is a symmetrical curve.In case of a single-tail alternative, the critical value of Zα is determined so that total area to the right of it (forright-tailed test) is α and for left-tailed test the total area to the left of (-Zα) is αThus the significant or critical value of Z for a single-tailed test (left or right) at level of significance α is sameas the critical value of Z for a two-tailed test at level of significance ‘α’. Please find below the critical values ofZ at commonly used levels of significance for both two-tailed and single-tailed tests Critical Value Z2 LEVEL OF SIGNIFICANCE 1% 5% 10%Two tailed test │Zα│ = 2.58 │Zα│ = 1.96 │Zα│ = 1.645Right tailed test Zα = 2.33 Zα = 1.645 Zα = 1.28Left tailed test Zα = 2.33 Zα = 1.645 Zα = 1.28TEST OF SIGNIFICANCE OF A SINGLE MEANIf X1, X2, …….Xn, in a random sample of size n from a normal population with mean M and variance 2, thenthe sample mean is distributed normally with mean M and variance .Term paper - Inferential Statistics 8
9. 9. Statistical TechniquesNull Hypothesis, Ho - The sample has been drawn from a population with mean M and variance , ie there is nosignificance difference between the sample mean(X) and population mean(M), the test statistic (for largesamples), is Z =If the population Standard Deviation is unknown, then we use its estimate provided by the sample variancegiven by (for large samples)TEST OF SIGNIFICANCE FOR DIFFERENCE OF MEANSThe mean of random sample of size n, from a population with Mean M, and Variance and let be the mean of anindependent random sample of size n2 from another population with mean M2 and variance ? then, sincesample size are large.TEST OF SIGNIFICANCE FOR THE DIFFERENCE OF STANDARD DEVIATIONIf S1 and S2 are the standard deviation of two independent samples, then under null hypothesis, Ho : 1= 2 i.ethe sample standard deviations don’t differ significantly. 1) (for large samples)But in case of large samples, the S.E of the difference of the sample standard deviations is givenby3.1.3 STUDENT’s T-TESTThe entire large sample theory was based on the application of “normal test”. However if the sample size n issmall, the distribution of the various statistics are far from normally and as such ‘normal test’ cannot be appliedif n is mall. In such cases exact sample tests, pioneered by W.S.Gosst(1908) who wrote under the pen name-ofstudent, and later on developed and extended by Prof.R.A.Fisher(1926) are used.Applications Of T-Distribution The t-distribution has a wide number of applications in statistics, and some of which are 1) To test if the sample mean( ) differs significantly from the hypothetical value µ of the population mean. 2) To test the significance of the difference between two sample means. 3) To test the significance of an observed sample correlation and sample regression coefficient. 4) To test the significance of observed partial correlation coefficient.T-Test For Single MeanAll hypothesis testing is done under the assumption the null hypothesis is truePopulation Standard Deviation KnownIf the population standard deviation, sigma, is known, then the population meanhas a normal distribution, and you will be using the z-score formula for sample means. The testTerm paper - Inferential Statistics 9
10. 10. Statistical Techniquesstatistic is the standard formula youve seen before. The critical value is obtained from the normal table, orthe bottom line from the t-table.Population Standard Deviation UnknownIf the population standard deviation, sigma, is unknown, then the population mean has astudents T-distribution, and you will be using the t-score formula for sample means. Thetest statistic is very similar to that for the z-score, except that sigma has been replaced by sand z has been replaced by t.The critical value is obtained from the t-table. The degree of freedom for this test is n-1.If youre performing a t-test where you found the statistics on the calculator (as opposed to being given them inthe problem), then use the VARS key to pull up the statistics in the calculation of the test statistic. This willsave you data entry and avoid round off errors.General PatternNotice the general pattern of these test statistics is (observed - expected) / standard deviation.3.1.4 CHI-SQUARE TESTA chi-square test (also chi squared test or χ2 test) is any statistical hypothesis test in which the samplingdistribution of the test statistic is a chi-square distribution when the null hypothesis is true, or any in which thisis asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made toapproximate a chi-square distribution as closely as desired by making the sample size large enough.Chi-Square Test In Contigency TableCHI-SQUARE distribution is utlised to determine the critical value of the chi-square variate at various level ofsignificance.Properties :(1) The value of chi-square varies from 0 to α. (2) When each Oi = Ei, the value of chi-square is zero.(3) Chi-square can never be negativeTerm paper - Inferential Statistics 10
11. 11. Statistical TechniquesCONTIGENCY TABLE : The test if independence of attributes when the frequencies are presented in a two waytable according to two attributes classified to various categories known as the contigency table.Test of hypothesis in a contingency table.A contingency is a rectangular array having rows and colums ascertaining to the categories of the attributes ofA & B. The null hypothesis : H0 : Two attributes are independent vs H1 : two attributes are dependant on eachother.Statistics X2 has (p-1) (q-1) d.fUnder Ho, the indepdendence of attributes, the expected frequency,Eij = ith row total x jith column N= Ri x Cj nDecision : The calculated value compared with tabulated value of X2 for (P-1) (Q-1) d.f. & prefixed level ofsignificance α. Calculation X2 > reject Ho, if Calculation < X2 – tab – accept Ho.CONTIGENCY TABLE OF ORDER 2X2DIRECT FORMULAR FOR 2x2 = n(ad-bc)2 (a+b) (c+d) (a+c) (b+d) X2 has 1 d.f. B1 B2A1 A B a+bA2 C (cell) D c+d a+c B+d a+b+c+d = nCalculation X2>X2α1, reject HoCalculation X2<x2α1, accept HoTerm paper - Inferential Statistics 11
12. 12. Statistical Techniques3.2 REGRESSION?Regression analysis is a method for determining the relationship between two variables. The regressionstatistical skeleton is at the core of observed social and political science research. Regression analysis works asa statistical substitute for controlled experiments, and can be used to make causal inferences.3.2.1 REGRESSION MODELSResearchers render verbal theories, hypothesis, even intuition into models. A model illustrates how and underwhat circumstances two (or more) variables are linked. A regression model with a dependent variable and oneindependent variable is known as a bi-variate regression model.A regression model with a dependent variable and two or more independent variables and/or control variables isknown as a multivariate regression model.Example: The dataset "Televisions, Doctors, and Smokers" contains, among other variables, the number ofsmokers per television set and the number of smokers per physician for 50 countries.3.2.2 SCATTER-PLOTSThe X axis normally depicts the values of the independent variable, while the Y axis represents the value of thedependent variable.Scatter-plots allow you to study the flow of the dots, or the relationship between the two variablesScatter-plots allow political scientists to identify : • Positive or negative relationships • Monotonic or linear relationships3.2.3 REGRESSION EQUATIONThe linear equation is specified as follows: Y = a + bX Where Y = dependent variable X = independent variable a = constant (value of Y when X = 0) b = is the slope of the regression line“a” can be positive or negative. Referred to “a” as the intercept, “a” is the point at which the slope line passesthrough the Y axis.“b” (the slope coefficient) can be positive or negative. A positive coefficient denotes a positive relationship anda negative coefficient denotes a negative relationship.The significant interpretation of the slope coefficient depends on the variables involved, how they are codedand the dimension of the variables. Larger coefficients may indicate a solid relationship, but not necessarily.Term paper - Inferential Statistics 12
13. 13. Statistical TechniquesThe goal of regression analysis is to find an equation which “best fits” the data.In regression, the equation is found in such a manner such that its graphis a line that reduces the squared vertical distances amid the data pointsand the lines drawn.“d1” and “d2” illustrate the distances of observed data points from anapproximate regression line.Regression analysis bring into play a mathematical equation that locatesthe single line that reduces the squared distances from the line.The standard regression equation is the same as the linear equation withone exception: the error factor. Y = α + βX + εWhere Y = dependent variableα = constant termβ = slope or regression coefficientX = independent variableε = error termThis regression process is called ordinary least squares (OLS). α (the constant term) interpreted the same as earlier β (the regression coefficient) tells how much Y changes if X changes by one unit.The regression coefficient indicates the inclination and strength of the relationship between the two quantitativevariables. The error (ε) denotes that observed data does not follow a tidy pattern that can be summarized with astraight line.A observations score on Y can be split as the following two parts: α + βX is due to the independent variable ε is due to errorObserved value = Predicted value (α + βX) + error (ε)The error is the difference between the predicted value of Y and the observed value of Y. This is known as theresidual.Term paper - Inferential Statistics 13
14. 14. Statistical TechniquesExample:Lets take an example to clarify what we theoretically know:In above data on the scatterplot:Y (dependent variable) = telephone lines for 1,000 peopleX (independent variable) = Infant mortalityWe will utilize regression to look at the relationship connecting communication capacity (measured here astelephone lines per capita) and infant mortality.In this example, the intercept and regression coefficient are as follows:α (or constant) = 121Means that when X (infant deaths) is 0 deaths, there are 121 phone lines per 1,000 population.β = -1.25Means that when X (deaths) increases by 1, there is a predicted or estimated decrease of 1.25 phone lines.Term paper - Inferential Statistics 14
15. 15. Statistical Techniques3.2.4 REGRESSION INTERPRETATIONThese computations can be helpful because they allow us to make useful predictions about the data. Forexample, an increase from 1 to 10 mortalities per 1,000 live births is related with a drop of 119.75 – 108.5 =11.25 telephone lines.Interpreting the meaning of a coefficient could be a bit fiddly. What does a coefficient of -1.25 mean? Well, it means a negative association between infant mortality and phone lines. It means for every extra infant death there is a reduction of 1.25 phone lines.This is useful info, however is there a gauge that tells us how good we do predicting the observed values? Yes,the measure is known as R-squared.3.2.5 R SQUARREDAs stated earlier, there are two components of the total deviation from the mean, which is calculated by theaddition of squares (or total variance).The difference between the mean and the predicted value of Y, this is theexplained part of the deviation, or (Regression Sum of Squares).The second component is the residual sum of squares (Residual Sum of Squares), which measures predictionerrors. The is the unexplained part of the deviation. Total SS = Regression SS + Residual SSIn other words, the total sum of squares is the sum of the regression sum of squares and the residual sum ofsquares.R2 = Regression SS/TSSThe more variance the regression model explains, the higher the R2.Term paper - Inferential Statistics 15
16. 16. Statistical Techniques 4.0 BIBLIOGRAPHY 1. Inferential statistics Timeline: http://www.google.ae/search?q=inferential+statistics&hl=en&tbo=1&rls=com.microsoft:en- us:IE-SearchBox&output=search&source=lnt&tbs=tl:1&sa=X&ei=HMsFTq- yKInIrQej2LSmDA&ved=0CBEQpwUoAw&biw=1366&bih=596 [Online]. [Accessed: 23th June 2011]. 2. Handbook of Injury and Violence Prevention By Lynda S. Doll, E. N. Haas Chapter 9.3 – Brief history of youth violence prevention efforts, pg#159Term paper - Inferential Statistics 16