Correlation & Regression - Relationship Analysis

Published in: Business

  1. Correlation/Regression: Relationship Analysis
  2. Learning Objectives
     - The meanings and uses of regression and correlation analyses
     - How to calculate regression and correlation
     - The basics of multivariate statistical analysis techniques
  3. Statistics Not Always Black and White
     - How does the story relate to marketing research?
     - Explain the meaning of this statement from the story: "Statistical fallacies by themselves might create a certain amount of random mischief. But the big problem is that statistics which seem to confirm the dogmas of the intelligentsia are seized upon and trumpeted throughout academia and the media, with little or no concern for 'multicollinearity' or any of the other pitfalls."
     - How can the Internet be used to help you understand multicollinearity, correlation, and other statistical concepts?
  4. Relationship Analysis
     - The examination of the association between two or more variables. In marketing, some of the more apparent relationships include associations between advertising and sales, company size and advertising budget, supply and demand for products, and customer satisfaction and customer loyalty.
  5. Scatter Diagrams
     - Two related variables, called bivariate data, plotted as points on a graph.
     - Each point on the diagram represents a pair of values, one based on the X scale (independent variable) and the other based on the Y scale (dependent variable).
     - Making a scatter diagram is usually the initial step in investigating the relationship between two variables, because the diagram shows visually the shape and degree of closeness of the relationship.
     - A scatter diagram also indicates whether the relationship between the two variables is positive or negative.
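The direction a scatter diagram reveals can also be checked numerically: the sample covariance of the two variables is positive when the points trend upward and negative when they trend downward. A minimal sketch, using small illustrative figures (not data from the slides):

```python
# Sample covariance of bivariate data; its sign gives the direction of
# the relationship (illustrative numbers, e.g. ad spend vs. sales).
x = [1, 2, 3, 4, 5]   # independent variable (X scale)
y = [2, 4, 5, 4, 5]   # dependent variable (Y scale)

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
cov_xy = sum((xi - mean_x) * (yi - mean_y)
             for xi, yi in zip(x, y)) / (n - 1)
print(cov_xy)  # 1.5 -> positive, upward-sloping scatter
```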
  6. Simple Regression Analysis
     - Refers to statistical techniques for measuring the relationship between a dependent variable and one or more independent variables. The relationship between two variables is characterized by how they vary together. Given pairs of X and Y values, regression analysis measures the direction (positive or negative) and rate of change (slope) in Y as X changes, or vice versa. Using the values of the independent variable, it attempts to predict the values of an interval- or ratio-scaled dependent variable.
  7. Regression Analysis Requires Two Operations
     - Derive an equation, called the regression equation, and a line representing the equation to describe the shape of the relationship between the variables. The regression line is the line drawn through a scatter diagram that "best fits" the data points and most accurately describes the relationship between the two variables. The equation and its line may be linear or curvilinear.
     - Estimate the dependent variable (Y) from the independent variable (X), based on the relationship described by the regression equation.
  8. Correlation Analysis
     - Statistical techniques for measuring the closeness of the relationship between variables.
     - It measures the degree to which changes in one variable are associated with changes in another.
     - It can only indicate the degree of association or covariance between variables. Covariance is a measure of the extent to which two variables are related.
  9. Correlation Analysis - continued
     - Regression and correlation analysis may be either simple or multiple. Simple analysis uses only two variables, one dependent and one independent. Multiple analysis deals with three or more variables, one dependent and two or more independent.
  10. Regression Equation and Line
     - Researchers estimate the regression line using the following equation:
       Yᵢ = β₀ + β₁Xᵢ + εᵢ
     - β₀ = the Y intercept, the value of Y when X equals zero
     - β₁ = the slope of the regression line, which is the increase or decrease in Y for each one-unit change in X
     - Xᵢ = a given value of the independent variable
     - i = the observation number
     - εᵢ = the error term associated with the i-th observation
  11. Regression Equation and Line - continued
     - The model involves parameters that are unknown (β₀ and β₁) but can be estimated from sample data. The error term, εᵢ (the Greek letter epsilon), is also unobservable, but can be estimated from sample data.
  12. The Lack of Precision Can Be Due To
     - The complexity of most marketing and other business problems
     - The functional form of the relationship between the dependent and independent variables may differ from the one selected
     - Measurement of the variables may be imperfect
     - Data are typically available only at an aggregate level
     - Data are based on human behavior, so the error term in the model may account for a "random" component in behavior
  13. Least-Squares Method
     - A statistical technique that fits a straight line to a scatter diagram by minimizing the sum of the squared vertical distances of all the points from the line.
     - The equation derived by this method will yield the regression line that best fits the data.
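For a straight line Yc = b₀ + b₁X, the least-squares criterion has a closed-form solution: b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b₀ = ȳ − b₁x̄. A sketch with illustrative data:

```python
# Least-squares fit of a straight line y = b0 + b1*x (illustrative data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)

b1 = s_xy / s_xx           # slope: change in y per unit change in x
b0 = mean_y - b1 * mean_x  # intercept: value of y when x = 0
print(b0, b1)  # intercept about 2.2, slope about 0.6
```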
  14. Regression coefficients are the values that represent the effect of the individual independent variables on the dependent variable.
  15. Standard Deviation of Regression
     - The standard deviation of the Y values from the regression line (Yc). This is also called the standard error of estimate, since it can be used to measure the error of the estimates of individual Y values based on the regression line.
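A common sample formula for this statistic (an assumption here, since the slide gives no formula) divides the squared deviations from the fitted line by n − 2, the degrees of freedom remaining after estimating two coefficients: s = sqrt(Σ(Y − Yc)² / (n − 2)). Continuing the illustrative data:

```python
import math

# Standard error of estimate: spread of the observed Y values around
# the fitted regression line Yc (illustrative data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
     / sum((xi - mean_x) ** 2 for xi in x)
b0 = mean_y - b1 * mean_x

y_c = [b0 + b1 * xi for xi in x]                     # fitted values Yc
sse = sum((yi - yc) ** 2 for yi, yc in zip(y, y_c))  # unexplained variation
s_yx = math.sqrt(sse / (n - 2))                      # standard error of estimate
print(s_yx)  # about 0.894
```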
  16. Total Deviation
     - Total deviation = Unexplained deviation + Explained deviation
     - The terms "explained" and "unexplained" indicate whether or not a portion of the total deviation is reduced by the introduction of the X values in computing Yc values. When these deviations are summed and squared individually, they estimate the explained and unexplained variation of Y.
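The identity above can be verified numerically: squaring and summing the deviations gives total variation (SST), unexplained variation (SSE), and explained variation (SSR), with SST = SSE + SSR. A sketch, continuing the illustrative data:

```python
# Decompose total variation of Y into explained and unexplained parts.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
     / sum((xi - mean_x) ** 2 for xi in x)
b0 = mean_y - b1 * mean_x
y_c = [b0 + b1 * xi for xi in x]                     # fitted values Yc

sst = sum((yi - mean_y) ** 2 for yi in y)            # total variation
sse = sum((yi - yc) ** 2 for yi, yc in zip(y, y_c))  # unexplained variation
ssr = sum((yc - mean_y) ** 2 for yc in y_c)          # explained variation
print(sst, sse + ssr)  # the two agree, up to rounding
```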
  17. Coefficient of Determination (r²)
     - The strength of association or degree of closeness of the relationship between two variables, measured by a relative value. It demonstrates how well the regression line fits the scattered points.
     - It indicates the amount of variation in the dependent variable that is explained by the variation in the independent variable.
     - It is defined as the ratio of the explained variation to the total variation.
  18. Coefficient of Determination (r²) - continued
     - When r² is close to 1, the Y values are very close to the regression line. When r² is close to 0, the Y values are not close to the regression line.
     - r² is always a positive number, so it cannot tell whether the relationship between the two variables is positive or negative.
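The ratio definition can be computed directly: r² = explained variation / total variation. Continuing the same illustrative fit:

```python
# r-squared as the share of total variation explained by the regression.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
     / sum((xi - mean_x) ** 2 for xi in x)
b0 = mean_y - b1 * mean_x
y_c = [b0 + b1 * xi for xi in x]

sst = sum((yi - mean_y) ** 2 for yi in y)   # total variation
ssr = sum((yc - mean_y) ** 2 for yc in y_c) # explained variation
r2 = ssr / sst
print(r2)  # about 0.6: 60% of the variation in Y is explained by X
```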
  19. Correlation Coefficient
     - The correlation coefficient, r, is the square root of r² and is frequently computed to indicate the direction of the relationship in addition to its degree.
     - It is the correlation between the observed and predicted values of the dependent variable.
  20. Correlation Coefficient - continued
     - Since the range of r² is from 0 to 1, the coefficient of correlation r will vary within the range −1 to +1.
     - A minus sign on r indicates a negative correlation and a plus sign a positive one. The sign of r is the same as the sign of b (the slope) in the regression equation.
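The sign convention can be checked two ways: take the square root of r² and attach the sign of the slope b, or compute r directly from the covariance-based formula r = Sxy / sqrt(Sxx·Syy); both give the same value. A sketch:

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

b1 = s_xy / s_xx                              # slope of the regression line
r2 = s_xy ** 2 / (s_xx * s_yy)                # coefficient of determination
r_signed = math.copysign(math.sqrt(r2), b1)   # sign of r matches sign of b
r_direct = s_xy / math.sqrt(s_xx * s_yy)      # Pearson's r, computed directly
print(r_signed, r_direct)  # both about 0.775, and both positive here
```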
  21. Calculating Regressions Using Computers
     - To run the calculations using SPSS:
       - Click on "Statistics"
       - Then click on "Regression" and "Linear"
       - These commands designate the statistical test to be run
     - To run the calculations using Excel:
       - Click on "Tools" and "Data Analysis"
       - Then click on "Regression"
  22. Multiple Regression Analysis
     - This test determines the association or relationship between dependent and independent variables.
     - In multiple regression analysis, more than two variables are included in the examination. While the dependent variable is still represented by Y, the independent variables are represented by X₁, X₂, X₃, and so on.
  23. Multiple Regression Analysis - continued
     - Since with multiple regression we are dealing with more than one independent variable, we refer to the association between the dependent and independent variables as the coefficient of multiple determination, denoted by R².
  24. Calculating Multiple Regression Using Computers
     - To perform the computations using SPSS for Windows:
       - Click on "Statistics"
       - Then click on "Regression" and "Linear"
       - These commands designate the statistical test to be run
     - To run the calculations using Excel:
       - Click on "Tools" and "Data Analysis"
       - Then click on "Regression"
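Outside the menu-driven packages, the same multiple regression can be computed with NumPy's least-squares solver (an assumption here: NumPy is not mentioned in the slides). With noise-free illustrative data generated as Y = 1 + 2X₁ + 3X₂, the solver recovers the coefficients:

```python
import numpy as np

# Illustrative, noise-free data: Y = 1 + 2*X1 + 3*X2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = 1 + 2 * x1 + 3 * x2

# Design matrix: a column of ones (intercept) plus one column per predictor.
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # recovers the intercept and the two slopes (1, 2, 3)
```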
  25. Forecasting Using Time Series Analysis
     - Numerical variables that are calculated, measured, or observed sequentially on a regular chronological basis are called time series.
     - A time series representing an organization's activity (sales, for example) is the result of interactions of many changing forces.
     - These forces can be business, economic, political, and social influences as well as the forces of nature.
  26. Time Series Patterns or Components
     - Secular trend - the direction of a time series movement over a long period of time, usually represented by a straight line or a smooth curve.
     - Seasonal variation - repeating periodic movement of a time series.
  27. Time Series Patterns or Components - continued
     - Cyclical fluctuations or "business cycles" - expansions (ups) and contractions (downs) of business activities around the normal value.
     - Irregular movements - erratic movements, including all types of time series movements other than secular, seasonal, or cyclical.
  28. Two Popular Forecasting Techniques
     - Trend Analysis - used when historical data are plotted and extrapolated to project some outcome in the future.
     - Exponential Smoothing - a type of weighted-average forecasting technique that assigns heavier weights to recent data and lighter weights to less recent data. When forecasting, the more recent data are more likely to be better predictors of the near future than are earlier periods.
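Simple exponential smoothing as described above can be written as S₁ = Y₁ and Sₜ = αYₜ + (1 − α)Sₜ₋₁, where the smoothing constant α (between 0 and 1) controls how heavily recent observations are weighted. A sketch with an illustrative series:

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; the last value forecasts the next period."""
    smoothed = [series[0]]   # initialize with the first observation
    for y_t in series[1:]:
        smoothed.append(alpha * y_t + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [10, 12, 14, 16]   # illustrative quarterly figures
smoothed = exponential_smoothing(sales, alpha=0.5)
forecast = smoothed[-1]    # forecast for the next period
print(smoothed)  # [10, 11.0, 12.5, 14.25]
print(forecast)  # 14.25
```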
  29. Multivariate Statistical Analysis
     - Any simultaneous analysis of more than two variables.
     - Many times, multivariate techniques are a means of performing in one analysis what used to take multiple analyses using univariate techniques (analysis of single-variable distributions).
     - Common multivariate techniques: multiple discriminant analysis, multidimensional scaling, factor analysis, cluster analysis, and conjoint analysis.
  30. Multiple Discriminant Analysis (MDA)
     - An appropriate tool for testing the hypothesis that the group means of a set of independent variables for two or more groups are equal.
     - Used if the dependent variable is categorical (either dichotomous or multichotomous) and the independent variables are interval or ratio data.
     - When two classifications are being examined, the technique is referred to as two-group discriminant analysis. When three or more classifications are identified, multiple discriminant analysis is used.
  31. Multiple Discriminant Analysis (MDA) - continued
     - The intent of this technique is twofold:
       - (1) to understand group differences
       - (2) to predict the likelihood that a case will belong to a particular group, based on several independent variables
     - The linear combination of the independent variables is known as the discriminant function.
     - An important function of discriminant analysis is to create a classification matrix, which shows the number of correctly and incorrectly classified cases.
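The classification matrix mentioned above is just a tally of predicted versus actual group membership; the diagonal holds the correctly classified cases. A minimal sketch with hypothetical group labels (the discriminant function itself is omitted):

```python
# Hypothetical actual vs. predicted group labels for six cases.
actual    = ["buyer", "buyer", "nonbuyer", "nonbuyer", "buyer", "nonbuyer"]
predicted = ["buyer", "nonbuyer", "nonbuyer", "nonbuyer", "buyer", "buyer"]
groups = ["buyer", "nonbuyer"]

# Rows = actual group, columns = predicted group.
matrix = {a: {p: 0 for p in groups} for a in groups}
for a, p in zip(actual, predicted):
    matrix[a][p] += 1

# Share of cases on the diagonal, i.e. classified correctly.
hit_rate = sum(matrix[g][g] for g in groups) / len(actual)
print(matrix)
print(hit_rate)  # 4 of 6 cases classified correctly
```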
  32. Factor Analysis
     - Groups attributes that are alike.
     - Used to examine interrelationships among many variables and to explain these variables in terms of their common underlying and unobservable dimensions (called "factors").
     - Factor analysis can be used to reduce the information contained in several original variables into a smaller, more manageable set of variables while losing as little information as possible.
     - Data must be gathered from interval scales.
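One common extraction approach (an assumption here; the slide names no method) is the principal-components method: factor the correlation matrix of the attributes and, by the Kaiser rule, retain factors whose eigenvalue exceeds 1. With a hypothetical correlation matrix in which four attributes form two correlated pairs, two factors emerge:

```python
import numpy as np

# Hypothetical correlation matrix: attributes 1-2 and 3-4 form two pairs.
R = np.array([
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
])

eigenvalues = np.linalg.eigvalsh(R)          # ascending, for symmetric matrices
n_factors = int(np.sum(eigenvalues > 1.0))   # Kaiser rule: keep eigenvalues > 1
print(sorted(eigenvalues, reverse=True))
print(n_factors)  # 2 underlying factors
```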
  33. Cluster Analysis
     - Grouping data into "clusters" such that elements in the same group are similar to each other, and elements in different groups are as different as possible.
     - Partitions a sample into homogeneous classes.
     - Used to identify market segments - groups of consumers with relatively similar needs.
     - Seeks to identify constructs that underlie objects.
     - Interval scales must have been used during data gathering.
     - Creates the groups itself; unlike discriminant analysis, it does not require previous knowledge of the group membership of each item included.
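A minimal k-means sketch (one clustering method among many; the slides do not name a specific algorithm): points are assigned to the nearest centroid, centroids are recomputed as cluster means, and the loop repeats until the assignments stop changing.

```python
import math

# Six hypothetical customers described by two interval-scaled attributes.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (8.5, 8.0), (9.0, 8.5)]
k = 2
centroids = [points[0], points[3]]   # naive initialization: one seed per cluster

def nearest(p, cents):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(cents)), key=lambda i: math.dist(p, cents[i]))

assignments = None
while True:
    new_assignments = [nearest(p, centroids) for p in points]
    if new_assignments == assignments:   # stable assignments -> converged
        break
    assignments = new_assignments
    for i in range(k):                   # move each centroid to its cluster mean
        members = [p for p, a in zip(points, assignments) if a == i]
        if members:
            centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))

print(assignments)  # [0, 0, 0, 1, 1, 1]: two homogeneous segments
```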
  34. Multidimensional Scaling
     - Also referred to as perceptual mapping.
     - Used to identify the important dimensions underlying respondents' evaluations of test objects such as products, services, or companies.
     - Converts consumer judgments of similarity or preference into distances represented in multidimensional space.
  35. Conjoint Analysis
     - Provides information about the relative importance respondents place on individual attributes when choosing from multiple brands.
     - Built on the assumption that consumers make complex decisions based not on one factor at a time but on several factors "jointly" (thus the term "conjoint").
  36. Net Impact
     - The Internet:
       - Will not help researchers with statistical analyses.
       - Will lend qualitative support for the research findings obtained from the quantitative analyses.
       - Can inform researchers about advancements made in statistical analyses through published manuscripts, discussion groups, and chat groups.
     - Researchers also use electronic mail extensively to share their research findings.
  37. Decision Time!
     - If correlation analysis is a popular and informative statistical method, why should researchers bother using the somewhat intimidating multivariate statistical techniques? Do you feel there is really much to gain from these methods?
     - Source: http://www.swlearning.com/marketing/shao/powerpoint/CH18_7.ppt#5
