# Areas In Statistics

Information about these areas in statistics: descriptive statistics, inferential statistics and regression.

### Areas In Statistics

1. 1. Areas in statistics Mª Carmen González Cortés
2. 2. 1.Descriptive statistics <ul><li>Descriptive statistics are numbers that are used to summarize and describe data (the information that has been collected from an experiment, a survey, an historical record, etc.). </li></ul>
3. 3. 1.Descriptive statistics <ul><li>Descriptive statistics is just descriptive. It does not involve generalizing beyond the data at hand. Generalizing from our data to another set of cases is the business of inferential statistics. </li></ul>
4. 4. 1.Descriptive statistics <ul><li>It can include: </li></ul><ul><li>graphical summaries that show the spread of the data; </li></ul><ul><li>numerical summaries that either measure the central tendency (a 'typical' data value) of a data set or describe the spread of the data. </li></ul>
5. 5. Example: numerical summary <ul><li>Descriptive statistics is central to the world of sports. For the Olympic marathon (a foot race of 26.2 miles), we have data covering more than a century of competition (the first modern Olympics took place in 1896). The following table shows the winning times for women, who have only been allowed to compete since 1984. </li></ul>
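6. 6. Women's Olympic marathon winners

   | Year | Winner | Country | Time |
   | --- | --- | --- | --- |
   | 1984 | Joan Benoit | USA | 2:24:52 |
   | 1988 | Rosa Mota | Portugal | 2:25:40 |
   | 1992 | Valentina Yegorova | UT | 2:32:41 |
   | 1996 | Fatuma Roba | Ethiopia | 2:26:05 |
   | 2000 | Naoko Takahashi | Japan | 2:23:14 |
   | 2004 | Mizuki Noguchi | Japan | 2:26:20 |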
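
The winning times above can be condensed into a few numerical summaries. The following is a minimal sketch using only the Python standard library; the helper names `to_seconds` and `to_hms` are illustrative, not part of the original slides.

```python
from statistics import mean, median

# Winning times from the table above, as (year, "h:mm:ss").
winning_times = [
    (1984, "2:24:52"),
    (1988, "2:25:40"),
    (1992, "2:32:41"),
    (1996, "2:26:05"),
    (2000, "2:23:14"),
    (2004, "2:26:20"),
]


def to_seconds(hms):
    """Convert an "h:mm:ss" string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds):
    """Convert seconds (rounded to the nearest integer) to "h:mm:ss"."""
    h, rem = divmod(round(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


seconds = [to_seconds(t) for _, t in winning_times]

# Central tendency (mean, median) and spread (range) of the six times.
summary = {
    "mean": to_hms(mean(seconds)),
    "fastest": to_hms(min(seconds)),
    "slowest": to_hms(max(seconds)),
    "range": to_hms(max(seconds) - min(seconds)),
}
median_seconds = median(seconds)
print(summary, median_seconds)
```

The mean and median measure central tendency, while the range (slowest minus fastest) is a simple measure of spread.
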
7. 7. Example: graphical summary <ul><li>One kind of graphical summary is the histogram, which groups data into classes to generalize the details of a data set while still illustrating the data's overall pattern. </li></ul><ul><li>Let’s see an example. </li></ul>
8. 9. <ul><li>In the previous histogram we see that the first class contains all the States that experienced between zero and nineteen tornadoes during 2000. </li></ul><ul><li>Histograms can also show gaps where no data values exist. In this one, there are three empty classes: 80-99, 100-119, and 120-139. </li></ul>
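
Grouping values into classes of width 20, as the histogram does, can be sketched as follows. The tornado counts below are hypothetical, chosen only to reproduce the shape described (a full first class and three empty classes); they are not the data behind the slide, and `histogram_classes` is an illustrative helper.

```python
from collections import Counter


def histogram_classes(values, width, n_classes):
    """Count how many values fall in each class [k*width, (k+1)*width - 1].

    Values beyond the last class are ignored in this sketch.
    """
    counts = Counter(v // width for v in values)
    return [
        (f"{k * width}-{(k + 1) * width - 1}", counts.get(k, 0))
        for k in range(n_classes)
    ]


# Hypothetical per-state tornado counts (assumption, for illustration only).
tornado_counts = [0, 3, 7, 12, 18, 22, 25, 31, 44, 47, 55, 61, 145]

classes = histogram_classes(tornado_counts, width=20, n_classes=8)
for label, count in classes:
    # Text histogram: one '#' per state in the class.
    print(f"{label:>7} | {'#' * count}")
```

Classes with a count of zero show up as empty rows, which is how gaps such as 80-99, 100-119, and 120-139 become visible.
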
9. 10. 2.Inferential statistics <ul><li>Inferential statistics is used to make inferences, predictions or comparisons from our data to more general conditions. </li></ul><ul><li>In contrast, with descriptive statistics we condense a set of known numbers into a few simple values (either numerically or graphically) to make those data easier to understand. </li></ul>
10. 11. 2.Inferential statistics <ul><li>A common method used in inferential statistics is estimation. In estimation, the sample is used to estimate a parameter, and a confidence interval about the estimate is constructed. </li></ul><ul><li>Other examples of inferential statistics methods include hypothesis testing, linear regression, and principal component analysis. </li></ul>
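
Estimation can be illustrated with a confidence interval for a population mean. This is a minimal sketch under stated assumptions: the sample is hypothetical, the function name `mean_confidence_interval` is illustrative, and the interval uses a normal approximation (for a small sample, a t-based interval would be somewhat wider).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (estimate, lower, upper) for the population mean.

    Uses the normal approximation: estimate +/- z * s / sqrt(n).
    """
    n = len(sample)
    estimate = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return estimate, estimate - margin, estimate + margin


# Hypothetical measurements (assumption, for illustration only).
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]
estimate, lower, upper = mean_confidence_interval(sample)
print(f"estimate={estimate:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

The sample mean is the point estimate of the parameter, and the interval expresses the uncertainty around it.
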
11. 12. 2.1.Hypothesis testing <ul><li>A statistical hypothesis is an assumption about a population parameter. This assumption may or may not be true. </li></ul><ul><li>The best way to determine whether a statistical hypothesis is true would be to examine the entire population. Since that is often impractical, researchers typically examine a random sample from the population. </li></ul>
12. 13. 2.1.Hypothesis testing <ul><li>There are two types of statistical hypotheses. </li></ul><ul><li>Null hypothesis (denoted by H0): the hypothesis that sample observations result purely from chance. </li></ul><ul><li>Alternative hypothesis (denoted by H1 or Ha): the hypothesis that sample observations are influenced by some non-random cause. </li></ul>
13. 14. 2.1.Hypothesis testing <ul><li>Statisticians follow a formal process to determine whether to reject a null hypothesis, based on sample data. This process is called hypothesis testing. </li></ul><ul><li>It consists of four steps: </li></ul><ul><li>1st step. State the hypotheses. </li></ul><ul><li>This involves stating the null and alternative hypotheses. The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false. </li></ul>
14. 15. 2.1.Hypothesis testing <ul><li>2nd step. Formulate an analysis plan. It describes how to use sample data to decide whether to reject the null hypothesis. The decision often centers on a single test statistic. </li></ul><ul><li>3rd step. Analyze sample data. </li></ul><ul><li>Find the value of the test statistic (mean score, proportion, t-score, z-score, etc.) described in the analysis plan. Complete other computations, as required by the plan. </li></ul>
15. 16. 2.1.Hypothesis testing <ul><li>4th step. Interpret results. </li></ul><ul><li>Apply the decision rule described in the analysis plan. If the test statistic is consistent with the null hypothesis, fail to reject the null hypothesis; otherwise, reject the null hypothesis. </li></ul>
16. 17. Hypothesis testing: example <ul><li>We wish to show that a new vaccine is more effective than the current vaccine used for preventing a particular disease. The null hypothesis is that there is no difference in efficacy between the two vaccines. The alternative hypothesis is that the new vaccine is better. </li></ul>
17. 18. Hypothesis testing: example <ul><li>We need a measurement that indicates the efficacy of each vaccine. The difference between the count of occurrences of the disease for the old vaccine and the count of occurrences of the disease for the new vaccine is calculated. If it is sufficiently large, the null hypothesis (that there is no difference in efficacy between the two vaccines) is rejected. If the difference is not sufficiently large, we fail to reject the null hypothesis. </li></ul>
18. 19. Hypothesis testing: example <ul><li>In all hypothesis testing, the final conclusion once the test has been carried out is always given in terms of the null hypothesis. </li></ul>
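
The vaccine example can be sketched as a one-sided two-proportion z-test that follows the four steps. This is one possible analysis plan, not necessarily the one the slides have in mind: it compares disease rates rather than raw counts so that groups of different sizes are comparable. The case counts, group sizes, significance level `ALPHA`, and function name are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(cases_old, n_old, cases_new, n_new):
    """One-sided z-test.

    H0: the disease rate is the same under both vaccines.
    H1: the disease rate is lower under the new vaccine.
    Returns the z statistic and the one-sided p-value.
    """
    p_old = cases_old / n_old
    p_new = cases_new / n_new
    # Under H0 both groups share one rate, estimated by pooling.
    pooled = (cases_old + cases_new) / (n_old + n_new)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_old + 1 / n_new))
    z = (p_old - p_new) / standard_error
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Step 2 (analysis plan): reject H0 if the p-value is below ALPHA.
ALPHA = 0.05

# Step 3 (analyze sample data): hypothetical counts, for illustration only.
z, p_value = two_proportion_z_test(cases_old=40, n_old=1000, cases_new=20, n_new=1000)

# Step 4 (interpret): the conclusion is stated in terms of H0.
decision = "reject H0" if p_value < ALPHA else "fail to reject H0"
print(f"z={z:.2f}, p={p_value:.4f}: {decision}")
```

Note that the outcome is phrased as rejecting or failing to reject the null hypothesis, never as proving the alternative.
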
19. 20. 3.Regression <ul><li>It includes any techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps us understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. Regression analysis is widely used for prediction (including forecasting of time-series data). </li></ul>
20. 21. 3.1 Linear regression <ul><li>It analyzes the relationship between two variables, X and Y. For each subject (or experimental unit), you know both X and Y and you want to find the best straight line through the data. </li></ul>
21. 22. 3.1 Linear regression <ul><li>The method was first used to examine the relationship between the heights of fathers and sons. The two were related, of course, but the slope was less than 1.0. A tall father tended to have sons shorter than himself; a short father tended to have sons taller than himself. The height of sons regressed to the mean. The term "regression" is now used for many sorts of curve fitting. </li></ul>
22. 23. 3.1 Linear regression <ul><li>The purpose of linear regression is to find the line that comes closest to your data. </li></ul><ul><li>Nonlinear regression is a general technique to fit a curve through your data. It fits data to any equation that defines Y as a function of X and one or more parameters. It finds the values of those parameters that generate the curve that comes closest to the data. </li></ul>
23. 24. 3.1 Linear regression <ul><li>A linear regression line has an equation of the form Y = a + bX , where X is the explanatory variable and Y is the dependent variable. The slope of the line is b , and a is the intercept (the value of Y when X = 0). </li></ul><ul><li>Example: The dataset "Televisions, Physicians, and Life Expectancy" contains, among other variables, the number of people per television set and the number of people per physician for 40 countries. For more details: http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm </li></ul>
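
The coefficients a and b in Y = a + bX can be computed directly. The sketch below assumes that "closest" means ordinary least squares (the usual criterion, though the slides do not name it) and uses small hypothetical data rather than the televisions dataset; `fit_line` is an illustrative name.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of Y = a + b*X; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = y_bar - b * x_bar
    return a, b


# Hypothetical observations (assumption, for illustration only).
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

a, b = fit_line(xs, ys)
print(f"Y = {a:.2f} + {b:.2f} X")
```

Here b estimates how much Y changes for a one-unit increase in X, and a is the predicted value of Y when X = 0.
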