IBStudies Statistics 2_var_ part 1


The first introduction to 2 variable statistics in the IB Studies course

Published in: Technology, Education

  1. 1. Statistics 2 (Two Variable)
      IB Mathematical Studies SL
  2. 2. Syllabus reference (table columns: Content, Detail)
  3. 3. Inferential Statistics
      Two-variable statistics involves discovering whether two variables are related or linked to each other in some way, e.g. Does IQ determine income? Is there a link between foot size and a person's height?
      One variable is independent (x-axis) whilst the other is dependent (y-axis).
      This branch of statistics draws conclusions about data that has not been collected, using data that has been collected.
      Hence we can infer or predict certain points based on the data collected. This often involves sampling, as analysing an entire population can be difficult.
  4. 4. Methods
      A scatter plot is a quick way to see whether the variables are related; more formally, we may need to measure:
      (1) Correlation: initially it may be necessary to determine whether a relationship exists between two or more variables (Pearson’s product-moment correlation coefficient).
      (2) Regression analysis: if a relationship appears to exist, we can conduct further analysis to determine the type and strength of the relationship (linear regression or least squares regression).
  5. 5. Correlation
      Correlation refers to the relationship or association between two variables.
  6. 6. Correlations are classified qualitatively in three ways:
  7. 7. Direction: positive, negative, none
  8. 8. Strength: weak, moderate, strong
  9. 9. Type: linear or non-linear
  10. 10. They are classified quantitatively by Pearson’s product-moment correlation coefficient.
  11. 11. Outliers must also be considered; they usually appear as isolated points away from the main body (group) of data.
      Exam hint: use this language!
  12. 12. Correlation scatter graphs
      [Figures: positive linear correlations; negative linear correlations]
  13. 13. CAUTION! - Causation
      Be careful not to jump to conclusions when you find a strong correlation between two variables. Why?
      A strong correlation does not mean that a causal relationship exists, i.e. one variable does not necessarily cause the other.
      e.g. there is a strong correlation between arm length and running speed; does that mean that short arms cause a reduction in running speed?
      A causal relationship exists only when changing one variable directly produces a change in the other.
  14. 14. Pearson’s product-moment correlation coefficient (r)
      The “r” value that your GDC gives in statistical calc mode measures the strength and direction of the linear correlation.
      It lies between -1 and 1 (inclusive).
      The closer the r-value is to 1, the stronger the (positive) correlation.
      The closer the r-value is to 0, the weaker the correlation.
      The closer the r-value is to -1, the stronger the (negative) correlation.
  15. 15. Linking “r” to terminology
      Note: these are only guideline values; there are no specific division points at which the description must change from strong to moderate, etc.
  16. 16. Correlation scatter graphs with “r” values
  17. 17. Formula for “r”
      There are several formulae for calculating “r”, but the one given and used in the IB course is:
      $r = \dfrac{s_{xy}}{s_x s_y}$
      $s_{xy}$ is the covariance (it will always be given if required).
      $s_x$ is the standard deviation of the x data values.
      $s_y$ is the standard deviation of the y data values.
      Exam hint: make sure you know that $s_x$ is σx and $s_y$ is σy on your GDC!
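As a rough illustration of the formula on this slide, the Python sketch below computes r as the covariance divided by the product of the two standard deviations. The function name `pearson_r` and the data values are made up for illustration and do not come from the slides; population (σ) statistics are assumed, in line with the exam hint.

```python
from statistics import fmean, pstdev

def pearson_r(xs, ys):
    """r = s_xy / (s_x * s_y), using population (sigma) statistics."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    # Population covariance: average product of deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)
    return s_xy / (pstdev(xs) * pstdev(ys))

# Hypothetical data, for illustration only (not from the slides).
foot_size = [22, 24, 25, 27, 29]
height = [155, 162, 168, 175, 181]
r = pearson_r(foot_size, height)
print(round(r, 3))
```

A value of r close to 1 here would be described as a strong positive linear correlation; comparing the result with the GDC's r value is a useful check.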
  18. 18. Example: correlation coefficient
      Use the data in the table below to calculate the r-value, given $s_{xy} = 7.92$.
      Calculate the standard deviation of x.
      Calculate the standard deviation of y.
      Evaluate “r” using the IB formula and compare it to your calculator.
  19. 19. Line of Best Fit & Linear Regression
      The line of best fit is the “quick and easy” way of finding the trend of the data.
  20. 20. Drawn by eye, it should have approximately the same number of data points above the line as below.
  21. 21. A more accurate method is to calculate the mean of the x data and of the y data, and ensure the trend line passes through the point $(\bar{x}, \bar{y})$, called the mean point.
  22. 22. Linear regression is the most accurate process for determining the trend line, as it takes every data point into account via a formula.
      Syllabus reference (table columns: Content, Detail)
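The link between the mean point and linear regression can be sketched in Python. The example below is a minimal illustration, assuming made-up data and a hypothetical helper named `least_squares_line`; it fits a least squares line and checks that the line passes through the mean point.

```python
from statistics import fmean

def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a*x + b."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    a = s_xy / s_xx
    b = mean_y - a * mean_x  # forces the line through the mean point
    return a, b

# Hypothetical data, for illustration only (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
a, b = least_squares_line(xs, ys)
mean_x, mean_y = fmean(xs), fmean(ys)
# The fitted line evaluated at mean_x gives mean_y (up to rounding error).
print(abs(a * mean_x + b - mean_y) < 1e-9)  # True
```

Unlike a line drawn by eye, every data point contributes to the gradient through the sums of deviations.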
  23. 23. Example: line of best fit
      A statistician wants to know if there is a correlation between HSC maths scores and Math Studies IB exam scores. She collected the following data from 10 randomly selected students.
      Is there a correlation? If so, what kind?
      Draw the scatter plot of IB vs HSC.
      Draw the line of best fit by finding the mean of each variable.
      If an HSC score is 77, predict the corresponding IB score.
      If an IB score is 2, predict the corresponding HSC score.
  24. 24. Formula for Linear Regression
      The line of best fit by the process of linear regression can be found using the given IB formula:
      $y - \bar{y} = \dfrac{s_{xy}}{s_x^{2}}\,(x - \bar{x})$
      $s_{xy}$ is the covariance (it will always be given if required).
      $s_x$ is the standard deviation of the x data values.
      $s_x^{2} = (s_x)^{2}$, i.e. the standard deviation of x, squared.
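To reach the y = ax + b form asked for in the next example, the IB regression formula can be rearranged as in the short derivation below. The labels a and b are simply the gradient and y-intercept of the resulting line.

```latex
\begin{align*}
y - \bar{y} &= \frac{s_{xy}}{s_x^{2}}\,(x - \bar{x}) \\
y &= \frac{s_{xy}}{s_x^{2}}\,x + \left(\bar{y} - \frac{s_{xy}}{s_x^{2}}\,\bar{x}\right) \\
\text{so}\quad a &= \frac{s_{xy}}{s_x^{2}}, \qquad b = \bar{y} - a\,\bar{x}
\end{align*}
```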
  25. 25. Example: regression line
      Find the equation of the line of best fit in y = ax + b form using the linear regression formula, if $s_{xy} = 9.23$, $s_x = 3.46$, and the mean point $(\bar{x}, \bar{y}) = (14.4, 35.2)$.
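One way to check an answer to this example is the small Python sketch below. It assumes the missing symbol in the question is the mean point, and the function name `regression_line` is hypothetical; the gradient is the covariance divided by the squared standard deviation of x, and the intercept makes the line pass through the mean point.

```python
def regression_line(s_xy, s_x, mean_x, mean_y):
    """Return (a, b) for y = a*x + b from the summary statistics."""
    a = s_xy / s_x ** 2      # gradient: covariance over variance of x
    b = mean_y - a * mean_x  # intercept: line passes through the mean point
    return a, b

# Values taken from the example; (14.4, 35.2) is assumed to be the mean point.
a, b = regression_line(s_xy=9.23, s_x=3.46, mean_x=14.4, mean_y=35.2)
print(f"y = {a:.3f}x + {b:.1f}")  # y = 0.771x + 24.1
```

Working by hand gives the same result: 3.46² ≈ 11.97, so a ≈ 9.23 / 11.97 ≈ 0.771 and b ≈ 35.2 - 0.771 × 14.4 ≈ 24.1.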
  26. 26. Extrapolation vs Interpolation
      Once you have a line of best fit, you can use its equation to infer or predict what would happen to one variable if the other changes.
      If you are predicting values within the range of your current data, you are “interpolating”.
      The accuracy of interpolation depends on the accuracy of your line of best fit and on your r-value.
      If you are predicting values outside the range of your data, you are “extrapolating”.
      The accuracy of extrapolation depends not only on the accuracy of your line of best fit but also on whether it is reasonable to assume that the same trend continues outside your range of data.
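The distinction above comes down to a range check on the independent variable, which can be sketched in Python. The function name `classify_prediction` and the scores are hypothetical, chosen only to illustrate the idea.

```python
def classify_prediction(x_new, xs):
    """Label a prediction at x_new as interpolation or extrapolation."""
    if min(xs) <= x_new <= max(xs):
        return "interpolation"  # inside the range of the collected data
    return "extrapolation"      # outside the range: the trend may not hold

# Hypothetical x data, for illustration only (not from the slides).
hsc_scores = [55, 62, 70, 81, 90]
print(classify_prediction(77, hsc_scores))  # interpolation
print(classify_prediction(98, hsc_scores))  # extrapolation
```

A prediction labelled "extrapolation" deserves extra caution, since nothing in the data confirms that the trend continues there.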
  27. 27. Graph of extrapolation and interpolation ranges