# Fa2013 mba724-session 5 week 2 correlation-za edit

3. We are making a big assumption here: that the relationship is a straight line. Wouldn't life be so much easier if all relationships were straight lines?
4. The Pearson correlation r is a numeric index of the relationship between two continuous (interval/ratio) variables.

   Caution: if a variable is categorical (e.g., gender: male vs. female; ethnicity: white, black, Asian), you cannot meaningfully correlate it with another variable. Pearson r can only be calculated between two numeric variables (e.g., age, salary, height, weight).

   r tells us how closely the relationship follows a straight line. These graphs show possible ways two variables can relate to one another. The more the graph looks like a straight line, the stronger the r value is. The graphs that resemble a circle indicate very low or even no correlation between the two variables.

   The direction of the line indicates whether the correlation is positive or negative. If the line goes up to the right, it's a positive relationship (when X goes up, Y goes up too). If the line goes down to the right, it's a negative relationship (when X goes up, Y goes down, and vice versa).

   For example, "when we get older, we also get wiser." If this is true, there should be a strong positive Pearson correlation r between the age variable and the wisdom variable. If we are less happy when we have more money, there should be a negative Pearson correlation r between the happiness variable and the money variable.
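
To make slide 4 concrete, here is a minimal sketch of how r can be computed by hand, using the standard textbook definition (the sum of the products of the deviations from each mean, divided by the square root of the product of the summed squared deviations). The function name `pearson_r` and all data values are hypothetical, invented purely for illustration; they are not from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation r between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]          # deviations of X from its mean
    dy = [y - mean_y for y in ys]          # deviations of Y from its mean
    covariation = sum(a * b for a, b in zip(dx, dy))
    spread = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return covariation / spread

# Made-up data (assumption): NOT real measurements.
age = [25, 32, 41, 48, 56, 63]
wisdom = [3, 4, 4, 6, 7, 8]            # invented "wisdom score"
money = [20, 35, 50, 65, 80, 95]       # invented income, in thousands
happiness = [8, 8, 6, 5, 3, 2]         # invented "happiness score"

r_up = pearson_r(age, wisdom)          # line goes up to the right
r_down = pearson_r(money, happiness)   # line goes down to the right
print(f"age vs. wisdom:      r = {r_up:+.2f}")
print(f"money vs. happiness: r = {r_down:+.2f}")
```

The sign of the result gives the direction of the relationship, and how close its absolute value is to 1 gives the strength.
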
5. As you can see from these charts, Pearson correlation r becomes stronger as the data points cluster more tightly around a straight line. When the data points are distributed like a round circle, the X and Y variables have little relationship to each other. Note that most of these (except for the first graph) have positive correlations, although some of them are weaker (more rounded) than others (closer to a straight line).
6. The same principle applies to negative correlations. The trend goes down to the right when the correlation is negative.
7. Again, to summarize, there are two components to the correlation value:

   1. Its direction
   2. Its strength

   What kind of correlation are you predicting for your group project?
8. Caution: correlation measures the linear relationship between two variables. When the assumptions behind it are violated (for example, the relationship is curved or the data contain outliers), weird things happen. This slide illustrates 4 different datasets, all with the same correlation. The moral of the story is that we should always inspect the scatterplot when running correlations. Numbers should be interpreted sensibly.
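
The four datasets on slide 8 are not reproduced in these notes. A well-known published set with exactly this property is Anscombe's quartet (Anscombe, 1973), so the sketch below uses it as a stand-in; that the slide shows this particular quartet is an assumption. The helper `pearson_r` is a hypothetical name written for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation r from the deviation-product definition."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )

# Anscombe's quartet (assumed stand-in for the slide's four datasets).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    # I: roughly linear with scatter
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    # II: a smooth curve, not a straight line
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    # III: a tight line plus one outlier
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    # IV: every x identical except one extreme point
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

rs = {name: pearson_r(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in rs.items():
    print(f"dataset {name}: r = {r:.3f}")
```

All four values of r agree to two decimal places even though the scatterplots look nothing alike, which is why the number alone is not enough.
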
9. We can never stress enough that correlation is NOT the same as causation. One of my favorite examples by a student is about shoe size and intelligence. A positive correlation was found between shoe size and intelligence levels, leading people to think that bigger feet = smarter people. Then they realized that bigger shoe size also generally means older people, and in fact it wasn't the size of people's feet that was causing increased intelligence; it was simply the fact that they were older and therefore scored higher on tests!
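
The shoe-size story on slide 9 can be sketched with a tiny invented dataset. Every number below is made up for illustration (an assumption, not data from the study the student described): within each age group shoe size and test score are unrelated, yet pooling the groups produces a strong positive r because both variables rise with age.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation r from the deviation-product definition."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )

# Hypothetical children, grouped by age: (shoe sizes, test scores).
groups = {
    6: ([30, 31, 32], [41, 38, 41]),
    10: ([34, 35, 36], [61, 58, 61]),
    14: ([38, 39, 40], [81, 78, 81]),
}

# Correlation inside each age group: age is held constant here.
within = {age: pearson_r(shoes, scores) for age, (shoes, scores) in groups.items()}

# Correlation after pooling all ages together: age is now a hidden third variable.
all_shoes = [s for shoes, _ in groups.values() for s in shoes]
all_scores = [t for _, scores in groups.values() for t in scores]
pooled = pearson_r(all_shoes, all_scores)

for age, r in within.items():
    print(f"age {age:>2}: r = {r:+.2f}")
print(f"pooled: r = {pooled:+.2f}")
```

Holding age constant makes the apparent link between feet and test scores disappear in this toy data, which is the pattern the student's example describes.
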
10. We all want to have a positive relationship with our family, friends, coworkers, etc. Who wants a negative relationship, right? In that spirit, why would anyone want a negative correlation? And we should celebrate every time we have a positive correlation, right?

    How about a positive correlation between GDP and obesity level? How about a positive correlation between smoking and cancer? How about a positive correlation between the CEO's compensation and corruption level?

    Now let's look at some negative correlations that are supposed to be "depressing": more exercise associated with lower levels of obesity, more education associated with a lower crime rate, fewer meetings associated with increased productivity, and how about more relaxing weekends associated with lower stress levels?

    What's the moral of the story? Correlation is what it is: a number that indicates the strength and direction of a relationship between two numerical (continuous) variables. Whether the relationship is good for mankind or not is beyond the scope of the humble little number's responsibility!
11. Assigning numbers to categorical variables does not make them interval/ratio variables. This is because we can only do math with interval/ratio variables. Basic math principles don't apply to categorical variables, even if they have numbers associated with them. The numbers assigned to categorical variables are just for identification, like an SSN or a zip code.

    Suppose gender is coded as 1 = female and 2 = male. One math principle is that 1 + 1 = 2. In the gender case, this would mean that if you add a female and another female together, that's equal to a male. Another math principle is that 2 is twice as big as 1. In the gender case, that would mean that a male is twice as big as a female. All this madness would happen if we tried to treat categorical variables in numeric ways.

    Keep in mind that the Pearson correlation r value is calculated from a math formula. If you feed the gender variable into SPSS as numbers, SPSS CAN and WILL calculate a Pearson correlation value for you, but using that number requires you to make the kinds of crazy assumptions illustrated above.
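
One way to see the problem on slide 11 is to recode the same categories with different identification numbers. The sketch below uses a hypothetical three-category "region" variable and invented salaries (both are assumptions, not data from the slides). Because the code numbers are arbitrary labels, swapping them changes r dramatically even though nothing about the people has changed.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation r from the deviation-product definition."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )

# Invented data: nine employees, their region, and salary in thousands.
region = ["north"] * 3 + ["south"] * 3 + ["west"] * 3
salary = [50, 52, 48, 60, 62, 58, 40, 42, 38]

# Two equally valid ways to "number" the same categories.
coding_a = {"north": 1, "south": 2, "west": 3}
coding_b = {"west": 1, "north": 2, "south": 3}

r_a = pearson_r([coding_a[g] for g in region], salary)
r_b = pearson_r([coding_b[g] for g in region], salary)
print(f"coding A: r = {r_a:+.2f}")
print(f"coding B: r = {r_b:+.2f}")
```

The software will happily return a value either way; the value simply reflects whichever labels were chosen, not a real numeric relationship.
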