Lesson04

Statistics for International Business School, Hanze University of Applied Science, Groningen, The Netherlands

  • Correlation and Cause
    Just because two variables are correlated does not mean that one of them is the cause of the other. It could be the case, but it does not necessarily follow.
    There is a strong positive correlation between the number of cigarettes that a person smokes a day and that person's chances of contracting lung cancer (measured as the number of cases of lung cancer per hundred people who smoke a given number of cigarettes). The percentage of heavy smokers who contract lung cancer is higher than the percentage of light smokers who develop the disease, and both figures are higher than the percentage of non-smokers who get lung cancer. In this case, the cigarettes are definitely causing the cancer.
    There is a strong negative correlation between the total number of skiing holidays that people book for any month of the year and the total amount of ice cream that supermarkets sell for that month. This means that the more skiing holidays are booked, the less ice cream is sold. Is there a cause here? Are people spending so much money on ice cream that they can't afford skiing holidays? Is the fact that the ice cream is so cold putting people off skiing? Clearly not! The simple fact is that most people tend to book their skiing holidays in the winter, and they tend to buy ice cream in the summer.
    Although a correlation between two variables doesn't mean that one of them causes the other, it can suggest a way of finding out what the true cause might be. There may be some underlying variable that is causing both of them. For instance, if a survey found a correlation between the time that people spend watching television and the amount of crime that people commit, it could be because unemployed people tend to sit around watching television, and unemployed people are more likely to commit crime. If that were the case, then unemployment would be the true cause!

Lesson04 Presentation Transcript

  • IBS Statistics
    Year 1
    Dr. Ning DING
    n.ding@pl.hanze.nl
    I.007
  • What are we going to learn?
    • Review
    • Chapter 12: Simple Regression and Correlation
    • dependent / independent variables
    • scatter diagrams
    • regression analysis
    • least-squares estimating equation
    • the coefficient of determination
    • the coefficient of correlation
  • Review
    Find the interquartile range:
    1460, 1471, 1637, 1721, 1758, 1787, 1940, 2038, 2047, 2054, 2097, 2205, 2287, 2311, 2406
    Interquartile Range = Q3 - Q1 = 2205 - 1721 = 484
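The quartile calculation above can be sketched in Python. This is a minimal illustration, assuming the location rule L = (n + 1) × p that the Excel review slide uses, with linear interpolation when L is not a whole number; the function name `value_at_location` is hypothetical and not part of the course material.

```python
def value_at_location(sorted_values, fraction):
    """Return the value at location L = (n + 1) * fraction in ordered data.

    Interpolates linearly between neighbouring values when L is not whole.
    """
    n = len(sorted_values)
    location = (n + 1) * fraction   # 1-based position in the ordered data
    lower = int(location)           # whole part of the position
    weight = location - lower       # fractional part of the position
    if lower < 1:
        return sorted_values[0]
    if lower >= n:
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + weight * (above - below)


# The 15 ordered values from the exercise
data = [1460, 1471, 1637, 1721, 1758, 1787, 1940, 2038,
        2047, 2054, 2097, 2205, 2287, 2311, 2406]

q1 = value_at_location(data, 0.25)   # location (15 + 1) * 0.25 = 4
q3 = value_at_location(data, 0.75)   # location (15 + 1) * 0.75 = 12
iqr = q3 - q1
print(q1, q3, iqr)
```

Other software may use a different quartile convention, so results can differ slightly from this rule on small data sets.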
  • Review EXCEL Lesson
    L = (8+1) × 25% = 2.25, so Q1 = 133.5
    L = (8+1) × 75% = 6.75, so Q3 = 274.5
    Interquartile Range = 274.5 - 133.5 = 141
  • Review
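    Median, Quartile, Decile, Percentile
    Data: 1, 2, 2, 4, 5, 7, 8, 9, 12
    [Figure: the ordered data with the 1st decile (1st D), Q1 = 2, Q3 = 8.5, and the 9th decile (9th D) marked; the interquartile range runs from Q1 to Q3]
    Boxplot: how to interpret?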
    http://cnx.org/content/m11192/latest/
  • Review
    [Boxplot labels: € 20, Q1 = € 250, Median = € 350, Mean = € 450, Q3 = € 850, € 2000]
    The distribution is skewed to (a) __________ because the mean is (b) __________ the median.
    Answers: (a) the right; (b) larger than
    http://cnx.org/content/m11192/latest/
  • Review
    Data: 0.8, 1.0, 1.0, 1.2, 1.2, 1.3, 1.5, 1.7, 2.0, 2.0, 2.1, 2.2, 4.0
    Mean > Median: positively skewed
    Data: 2.0, 3.2, 3.6, 3.7, 4.0, 4.2, 4.2, 4.5, 4.5, 4.6, 4.8, 5.0, 5.0
    Mean < Median: negatively skewed
    http://qudata.com/online/statcalc/
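The mean-versus-median comparison on this slide can be checked with a short Python sketch. It uses the two data sets from the slide and the standard-library `statistics` module; the helper name `skew_direction` is hypothetical, and comparing mean with median is only the rule of thumb used in this lesson, not a formal skewness statistic.

```python
from statistics import mean, median


def skew_direction(values):
    """Classify skew by comparing the mean with the median (rule of thumb)."""
    m = mean(values)
    md = median(values)
    if m > md:
        return "positively skewed"
    if m < md:
        return "negatively skewed"
    return "no skew indicated"


first = [0.8, 1.0, 1.0, 1.2, 1.2, 1.3, 1.5, 1.7, 2.0, 2.0, 2.1, 2.2, 4.0]
second = [2.0, 3.2, 3.6, 3.7, 4.0, 4.2, 4.2, 4.5, 4.5, 4.6, 4.8, 5.0, 5.0]

for name, values in (("first", first), ("second", second)):
    print(name, round(mean(values), 2), median(values), skew_direction(values))
```

In the first data set the single large value 4.0 pulls the mean above the median; in the second, the small value 2.0 pulls it below.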
  • Review
    Zero skewness: mode = median = mean
    This means that the data is symmetrically distributed.
  • Chapter 12
    • scatter diagrams
    • dependent / independent variables
    • regression analysis
    • least-squares estimating equation
    • the coefficient of determination
    • the coefficient of correlation
  • Regression and Correlation Analyses
    • How to determine both the nature and the strength of a relationship between variables.
  • Regression and Correlation Analyses
    Scatter Diagram:
    Positive correlation
  • Regression and Correlation Analyses
    Scatter Diagram:
    Negative correlation
  • Regression and Correlation Analyses
    Scatter Diagram:
    No correlation
  • Regression and Correlation Analyses
    Scatter Diagrams:
    • Patterns indicating that the variables are related
    • If related, we can describe the relationship
    Weak & Positive correlation
    Strong & Positive correlation
    No correlation
    Weak & Negative correlation
    Strong & Negative correlation
  • Regression and Correlation Analyses
    Variables:
    • Independent variable: known
    • Dependent variable: to predict
    [Figure: axes labeled Dependent Variable and Independent Variable]
  • Regression and Correlation Analyses
    Correlation & Cause and Effect?
    • The relationships found by regression are relationships of association,
    • not necessarily of cause and effect.
  • Least-squares estimating equation:
    • The dependent variable Y is determined by the independent variable X
    [Figure: scatter diagram with the Dependent Variable (Y) on one axis and the Independent Variable (X) on the other]
    Ŷ = a + bX
  • Least-squares estimating equation:
    Ŷ = a + bX
  • Least-squares estimating equation:
    Ȳ = a + bX̄
    a = Ȳ - bX̄
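For reference, the slope and intercept of the least-squares line can be written out as below. The slope formula itself does not survive in this transcript; the form shown is the standard textbook one and matches the numeric substitution used in the auto-sales example later in the lesson. Here n is the number of data points, and X̄ and Ȳ are the sample means.

```latex
\[
  b = \frac{\sum XY - n\,\bar{X}\,\bar{Y}}{\sum X^{2} - n\,\bar{X}^{2}},
  \qquad
  a = \bar{Y} - b\,\bar{X},
  \qquad
  \hat{Y} = a + bX
\]
```

The intercept formula says that the fitted line always passes through the point of means (X̄, Ȳ).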
  • Least-squares estimating equation:
    The relationship between the age of a truck and the annual repair expense?
    Step 1: Ȳ = a + bX̄
    Step 2: a = Ȳ - bX̄
    Step 4: X̄ = 3, Ȳ = 6
    Step 5: a = 6 - 0.75 × 3 = 3.75
    Step 6: Ŷ = 3.75 + 0.75X
    Step 7: 6.75 = 3.75 + 0.75 × 4
    Step 8: If the city has a truck that is 4 years old, the director could use the equation to predict $675 annually in repairs.
  • Least-squares estimating equation:
    Example:
    • To find the simple/linear regression of Personal Income (X) and Auto Sales (Y)
    If X=64, what about Y?
    Step 1:
    Count the number of values.      
    N = 5
    Step 2:
    Find XY and X² for each pair of values (see the table below).
  • Least-squares estimating equation:
    Step 3:
    Find ΣX, ΣY, ΣXY, ΣX².
    ΣX = 311, mean X̄ = 62.2
    ΣY = 18.6, mean Ȳ = 3.72
    ΣXY = 1159.7
    ΣX² = 19359
    Step 4:
    Substitute into the slope formula.
    Slope (b) = (1159.7 - 5 × 62.2 × 3.72) / (19359 - 5 × 62.2 × 62.2) = 0.19
  • Least-squares estimating equation:
               
    Slope (b) = 0.19
    Step 5:
    Now substitute into the intercept formula.
    Intercept (a) = Ȳ - bX̄ = 3.72 - 0.19 × 62.2 = -8.098
    Step 6:
    Then substitute these values into the regression equation Ŷ = a + bX:
    Ŷ = -8.098 + 0.19X
    Suppose we want to know the approximate Y value for X = 64. Substituting this value into the regression equation gives:
    Ŷ = a + bX = -8.098 + 0.19(64) = -8.098 + 12.16 = 4.06
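The arithmetic of this example can be reproduced with a short Python sketch. It assumes only the sums given on the slides (n = 5, ΣX = 311, ΣY = 18.6, ΣXY = 1159.7, ΣX² = 19359); the function name `least_squares_from_sums` is hypothetical. The slope is rounded to two decimals before the intercept is computed, because that is what the slides do.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (slope, mean_x, mean_y) for the least-squares line from summary sums."""
    mean_x = sum_x / n
    mean_y = sum_y / n
    slope = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
    return slope, mean_x, mean_y


# Summary sums taken from the Personal Income (X) / Auto Sales (Y) example
slope, mean_x, mean_y = least_squares_from_sums(5, 311, 18.6, 1159.7, 19359)

b = round(slope, 2)            # the slides round the slope to 0.19
a = mean_y - b * mean_x        # intercept computed with the rounded slope
prediction = a + b * 64        # predicted Y for X = 64

print(f"b = {b:.2f}, a = {a:.3f}, Y-hat(64) = {prediction:.2f}")
```

Rounding the slope early shifts the intercept noticeably: with the unrounded slope the intercept is about -7.96 rather than -8.098, although the prediction at X = 64 still comes out at about 4.06.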
  • Least-squares estimating equation:
    Minimize the sum of the squares of the errors, which measures the goodness of fit of a line.
    eᵢ = residual i
    [Figure panels, each labeled SE: Strong correlation; Weak correlation]
  • Least-squares estimating equation:
    Minimize the sum of the squares of the errors, which measures the goodness of fit of a line.
    eᵢ = residual i
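The idea of minimizing the sum of squared residuals can be illustrated with a small Python sketch. The data below are hypothetical values made up for illustration, not taken from the lesson, and the function names are assumptions. The sketch fits the least-squares line and then shows that shifting or tilting the line makes the sum of squared errors larger.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line Y-hat = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """Sum of squared residuals e_i = Y_i - (a + b*X_i)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, a, b)
shifted = sum_squared_errors(xs, ys, a + 0.5, b)   # same slope, moved up
tilted = sum_squared_errors(xs, ys, a, b + 0.1)    # same intercept, steeper

print(best, shifted, tilted)
```

Any line other than the fitted one gives a larger sum of squared errors on the same data, which is what "least squares" refers to.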
  • Correlation Analysis:
    Describes the degree to which one variable is linearly related to another.
    Coefficient of Determination (r²):
    Measures the extent, or strength, of the association that exists between two variables.
    Coefficient of Correlation (r):
    Square root of the coefficient of determination
  • Coefficient of Determination (r²):
    Measures the extent, or strength, of the association that exists between two variables.
    • 0 ≤ r² ≤ 1.
    • The larger r², the stronger the linear relationship.
    • The closer r² is to 1, the more confident we are in our prediction.
  • Coefficient of Determination (r²):
  • Coefficient of Correlation (r):
    Square root of the coefficient of determination
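The two coefficients can be computed together, as in the Python sketch below. The formulas for r² did not survive in this transcript, so the sketch assumes the common definition r² = 1 - SSE/SST (one minus the unexplained variation over the total variation) and gives r the sign of the slope. The data and the function name `correlation_summary` are hypothetical.

```python
import math


def correlation_summary(xs, ys):
    """Return (r_squared, r) for the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                      # slope of the fitted line
    a = mean_y - b * mean_x            # intercept of the fitted line
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - mean_y) ** 2 for y in ys)                   # total
    r_squared = max(0.0, 1 - sse / sst)
    r = math.copysign(math.sqrt(r_squared), b)  # r takes the sign of the slope
    return r_squared, r


# Hypothetical data with a downward trend, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 2]

r_squared, r = correlation_summary(xs, ys)
print(round(r_squared, 3), round(r, 3))
```

Because the square root alone is never negative, the sign of r has to come from the direction of the relationship: negative when Y falls as X rises, positive when Y rises with X.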
  • Review
    Which value of r indicates a stronger correlation than 0.40?
    A. -0.30  B. -0.50  C. +0.38  D. 0
    If all the points on a scatter diagram lie on a straight line, what is the standard error of estimate?
    A. -1  B. +1  C. 0  D. Infinity
  • Review
    In the least squares equation Ŷ = 10 + 20X, the value of 20 indicates
    A. the Y intercept.
    B. for each unit increase in X, Y increases by 20.
    C. for each unit increase in Y, X increases by 20.
    D. none of these.
     
  • Review
    A sales manager for an advertising agency believes there is a relationship between the number of contacts and the amount of the sales. To verify this belief, the following data was collected: 
    What is the Y-intercept of the linear equation?
    A. -12.201  B. 2.1946  C. -2.1946  D. 12.201
  • What have we learnt?
    • scatter diagrams
    • dependent / independent variables
    • regression analysis
    • least-squares estimating equation
    • the coefficient of determination
    • the coefficient of correlation