Total Payroll vs. Winning Percentage in Major League Baseball

Bayesian Statistics
Fall, 2014

Lingwen He
Zijian Su
Xiangyu Li
Padraic O’Shea

Introduction

Major League Baseball (MLB) is the last professional sport in America not to have adopted a salary cap. The lack of a salary cap has led to large differences in total payroll between big-market and small-market teams. This glaring difference in total payroll has fed the ongoing discussion of whether teams can “buy” wins by spending more money. To investigate whether teams that spend more money have higher winning percentages, we explore whether a linear relationship exists between the average total payroll and the average winning percentage of MLB teams from 2004 to 2012.

Methods

Data

From Baseball-Reference.com we acquired data on regular season winning percentage by team. This data can be accessed from the following link: http://www.baseball-reference.com/leagues/MLB/. Data on total payroll, by team, was acquired through USA Today. A link to that data is provided here: http://content.usatoday.com/sportsdata/baseball/mlb/salaries/team/2004.

To explore the linear relationship between total payroll and winning percentage over time, one data point for each team was needed. To calculate these data points, winning percentage and total payroll were collected for the 2004 to 2012 seasons and averaged by team. The predictor variable, average total payroll, was rescaled by dividing by one million to increase the size of the coefficient. Initial inference on the ‘averaged’ dataset did not indicate a severe violation of the assumption of normality, although the normal QQ plot was not perfectly linear. Three potential outliers were also identified while performing inference.
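
The averaging and rescaling step described above can be sketched as follows. This is a minimal illustration in Python rather than the authors' own procedure: the team names, seasons, and dollar amounts are invented placeholders, and the helper name average_by_team is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-season records: (team, season, payroll in dollars, winning pct).
# These values are made up for illustration and are NOT the study data.
seasons = [
    ("Team A", 2004, 60_000_000, 0.48),
    ("Team A", 2005, 70_000_000, 0.52),
    ("Team B", 2004, 120_000_000, 0.55),
    ("Team B", 2005, 140_000_000, 0.57),
]

def average_by_team(records):
    """Return {team: (average payroll in millions, average winning pct)}."""
    payrolls = defaultdict(list)
    pcts = defaultdict(list)
    for team, _season, payroll, pct in records:
        payrolls[team].append(payroll)
        pcts[team].append(pct)
    # Dividing by one million rescales the predictor, as described in the text.
    return {
        team: (mean(payrolls[team]) / 1_000_000, mean(pcts[team]))
        for team in payrolls
    }

averaged = average_by_team(seasons)
```

Each team contributes exactly one (payroll, winning percentage) pair to the regression, so the 30 MLB teams yield 30 data points.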
  
The possible outliers were removed one at a time, and each reduced dataset was analyzed using residual plots and QQ plots. Overall, we found that removing the points did not improve the model or the fit of the distribution. Therefore, the dataset containing all points was used for the analysis. The plots used for inference can be found in Appendix C.

To begin understanding the data, descriptive statistics may be considered. The mean, median and standard deviation for each variable under consideration are given in Table 1 below. Predictably, average total salary appears skewed to the right, as the mean is greater than the median. Average winning percentage appears to have a relatively normal distribution. Additionally, the standard deviation is fairly large, especially for total salary.

Data (Avg. 2004 to 2012)   Mean     Standard Deviation   Minimum   Median   Maximum
Total Salary               84.657   32.693               44.046    74.752   199.368
Winning %                  0.5003   0.0414               0.40      0.50     0.58

Table 1. Descriptive Statistics for Study Variables (Total Salary in Millions)
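
As a cross-check, the summary statistics in Table 1 can be recomputed from the averaged values listed in Appendix D. The sketch below uses Python's standard library purely for illustration; the helper name describe is hypothetical, and the sample (n - 1) standard deviation is assumed.

```python
from statistics import mean, median, stdev

# Average total payroll in millions for the 30 teams, copied from Appendix D.
payroll = [
    63.69154422, 89.76840667, 74.92461167, 140.7180136, 109.4106621,
    99.92325911, 68.43453544, 60.32229233, 67.06203433, 100.8169706,
    83.123759, 55.12364089, 114.6389837, 99.61515889, 48.75051056,
    69.04419133, 74.57996711, 56.90573011, 116.4527793, 199.368707,
    60.03360822, 118.5733706, 44.04681044, 55.889759, 94.06393778,
    92.80824244, 94.66015589, 44.78459711, 72.37762889, 69.80194444,
]
# Average winning percentage for the same teams, in the same order.
pct = [
    0.47, 0.54, 0.44, 0.56, 0.48, 0.53, 0.49, 0.49, 0.48, 0.52,
    0.46, 0.40, 0.55, 0.52, 0.50, 0.50, 0.50, 0.47, 0.50, 0.58,
    0.52, 0.53, 0.42, 0.50, 0.50, 0.46, 0.57, 0.49, 0.54, 0.50,
]

def describe(values):
    """Summary statistics matching the columns of Table 1."""
    return {
        "mean": mean(values),
        "sd": stdev(values),      # sample standard deviation (assumption)
        "min": min(values),
        "median": median(values),
        "max": max(values),
    }

payroll_stats = describe(payroll)
pct_stats = describe(pct)

# A mean above the median is the right-skew signal mentioned in the text.
right_skewed = payroll_stats["mean"] > payroll_stats["median"]
```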
  	
  
	
  
Statistical Method

We hypothesize that average total payroll and average winning percentage are linearly associated. To assess this relationship, Bayesian simple linear regressions are used with average winning percentage as the response. Two methods are used to explore this linear relationship. First, a non-informative prior is used to reflect the lack of prior knowledge about the effect of salary on winning percentage. Next, an informative prior based on our prior beliefs is used. The two methods’ predictive outputs are then compared.

For the informative prior, a N(0.5, 0.05) prior is used for beta0, as our expectation for the winning percentage is 50% with small variance. For beta1, a N(0.1, 100) prior is used, reflecting our lack of knowledge and an expectation that this rate will be positive but not overly large. Our expectation for the variance of beta1 is that it will be large.
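
For reference, the regression can be written out as below. This is a sketch reconstructed from the model code in Appendix B rather than a formula stated in the text; the symbols \bar{x}, \tau and \sigma correspond to the code's mean(x[]), tausq and sigma.

```latex
\begin{aligned}
y_i &\sim \mathcal{N}\!\left(\mu_i,\ \tau^{-1}\right), \qquad i = 1,\dots,30,\\
\mu_i &= \beta_0 + \beta_1\,(x_i - \bar{x}),\\
\tau &\sim \mathrm{Gamma}(0.001,\ 0.001), \qquad \sigma = 1/\sqrt{\tau}.
\end{aligned}
```

One point worth checking when reading the prior values: the OpenBUGS dnorm distribution takes a precision (1/variance) as its second argument. If the 0.05 and 100 are read that way, the beta0 prior has variance 1/0.05 = 20 and the beta1 prior has variance 1/100 = 0.01 (standard deviation 0.1), which is the reverse of the small-variance and large-variance descriptions. Because the predictor is centered, beta0 is the expected winning percentage of a team with average payroll.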
  
Convergence was assessed from the OpenBUGS output using history plots, auto-correlation plots and MC_error values. Due to rapid convergence, only one chain was used for the MCMC integration. However, this meant that BGR plots could not be used to assess burn-in. Because the initial values were based on intuition and were likely not representative of the posterior distribution, a burn-in of 3000 samples was used to exclude them.
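
To make the burn-in and auto-correlation checks concrete, the following Python sketch discards the first 3000 draws of a chain and computes the sample auto-correlation at a few lags. The chain here is simulated independent noise standing in for a well-mixed sampler, not the OpenBUGS output, and the function name autocorrelation is hypothetical.

```python
import random

def autocorrelation(chain, lag):
    """Sample auto-correlation of a sequence at a non-negative lag."""
    n = len(chain)
    centre = sum(chain) / n
    denom = sum((v - centre) ** 2 for v in chain)
    numer = sum(
        (chain[i] - centre) * (chain[i + lag] - centre) for i in range(n - lag)
    )
    return numer / denom

random.seed(1)       # fixed seed so the sketch is repeatable
BURN_IN = 3000       # burn-in length used in the report
KEPT = 12000         # retained samples, as in the 'sample' column of the output

# Simulated draws (illustrative assumption, not the real chain).
raw_chain = [random.gauss(0.5, 0.0064) for _ in range(BURN_IN + KEPT)]

# Drop the burn-in so that draws influenced by the initial values are excluded.
kept = raw_chain[BURN_IN:]

# Auto-correlation at lags 0..5; values near zero beyond lag 0 suggest good mixing.
acf = [autocorrelation(kept, lag) for lag in range(6)]
```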
  
	
  
	
   	
  
Results

The results of the Bayesian simple linear regression models performed using R are given below. Figure 1 contains the node statistics for the non-informative prior. The history plots and auto-correlation plots used for assessing convergence in the non-informative prior model can be found in Appendix A.

node mean sd MC_error val2.5pc median val97.5pc start sample
beta0 0.5002 0.006425 6.46E-5 0.4876 0.5002 0.5131 3001 12000
beta1 7.962E-4 2.01E-4 1.935E-6 4.009E-4 7.949E-4 0.001184 3001 12000
mu[1] 0.4835 0.007636 7.081E-5 0.4684 0.4835 0.4989 3001 12000
mu[2] 0.5043 0.006521 6.684E-5 0.4914 0.5043 0.5171 3001 12000
mu[3] 0.4925 0.006691 6.445E-5 0.4792 0.4924 0.5059 3001 12000
mu[4] 0.5449 0.01305 1.345E-4 0.5189 0.5449 0.5704 3001 12000
mu[5] 0.5199 0.008179 8.614E-5 0.5033 0.52 0.536 3001 12000
mu[6] 0.5124 0.007157 7.505E-5 0.498 0.5125 0.5266 3001 12000
mu[7] 0.4873 0.007166 6.734E-5 0.4732 0.4872 0.5018 3001 12000
mu[8] 0.4809 0.008022 7.386E-5 0.465 0.4808 0.497 3001 12000
mu[9] 0.4862 0.007292 6.824E-5 0.4718 0.4862 0.501 3001 12000
mu[10] 0.5131 0.007238 7.598E-5 0.4986 0.5132 0.5275 3001 12000
mu[11] 0.499 0.006429 6.422E-5 0.4864 0.499 0.512 3001 12000
mu[12] 0.4767 0.008688 7.94E-5 0.4594 0.4767 0.4943 3001 12000
mu[13] 0.5241 0.008868 9.323E-5 0.5062 0.5242 0.5415 3001 12000
mu[14] 0.5121 0.00713 7.474E-5 0.4978 0.5122 0.5262 3001 12000
mu[15] 0.4716 0.009598 8.729E-5 0.4525 0.4716 0.491 3001 12000
mu[16] 0.4878 0.007113 6.698E-5 0.4738 0.4877 0.5022 3001 12000
mu[17] 0.4922 0.006711 6.455E-5 0.4789 0.4922 0.5057 3001 12000
mu[18] 0.4781 0.008451 7.74E-5 0.4614 0.4781 0.4951 3001 12000
mu[19] 0.5256 0.009123 9.582E-5 0.5072 0.5257 0.5434 3001 12000
mu[20] 0.5916 0.02402 2.405E-4 0.5445 0.5915 0.6389 3001 12000
mu[21] 0.4806 0.008057 7.414E-5 0.4647 0.4806 0.4968 3001 12000
mu[22] 0.5272 0.00943 9.892E-5 0.5083 0.5274 0.5457 3001 12000
mu[23] 0.4679 0.01032 9.374E-5 0.4475 0.4678 0.4886 3001 12000
mu[24] 0.4773 0.008586 7.852E-5 0.4602 0.4773 0.4947 3001 12000
mu[25] 0.5077 0.006722 6.976E-5 0.4943 0.5078 0.5211 3001 12000
mu[26] 0.5067 0.006652 6.882E-5 0.4935 0.5068 0.5199 3001 12000
mu[27] 0.5082 0.006758 7.023E-5 0.4947 0.5082 0.5216 3001 12000
mu[28] 0.4685 0.0102 9.27E-5 0.4483 0.4684 0.4889 3001 12000
mu[29] 0.4905 0.006852 6.532E-5 0.4769 0.4904 0.5042 3001 12000
mu[30] 0.4884 0.007049 6.655E-5 0.4745 0.4883 0.5027 3001 12000
postprob 0.9998 0.01581 1.431E-4 1.0 1.0 1.0 3001 12000
sigma 0.03472 0.004829 4.703E-5 0.02672 0.03418 0.04542 3001 12000
tausq 876.2 235.3 2.254 484.9 856.2 1401.0 3001 12000
Figure 1. Node Statistics for Non-Informative Prior
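
One way to read the MC_error column in Figure 1 is to compare it with the posterior standard deviation of the same node. The sketch below applies a commonly quoted rule of thumb, that the MC error should be under roughly 5% of the posterior sd; that threshold is an assumption brought in for illustration and is not stated in this report.

```python
# (posterior sd, MC_error) pairs copied from Figure 1 for selected nodes.
nodes = {
    "beta0": (0.006425, 6.46e-5),
    "beta1": (2.01e-4, 1.935e-6),
    "postprob": (0.01581, 1.431e-4),
    "sigma": (0.004829, 4.703e-5),
    "tausq": (235.3, 2.254),
}

THRESHOLD = 0.05  # rule-of-thumb cutoff (assumption, not from the report)

# Ratio of Monte Carlo error to posterior standard deviation for each node.
ratios = {name: mc_error / sd for name, (sd, mc_error) in nodes.items()}

# True when every monitored node satisfies the rule of thumb.
all_below = all(ratio < THRESHOLD for ratio in ratios.values())
```

For these nodes the ratios are all close to 1%, which is about what 12000 nearly independent draws would give, since 1/sqrt(12000) is roughly 0.009.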
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
Figure 2 contains the node statistics for the informative prior. The history plots and auto-correlation plots used for assessing convergence in the informative prior model can be found in Appendix A.

node mean sd MC_error val2.5pc median val97.5pc start sample
beta0 0.5002 0.006425 6.46E-5 0.4876 0.5002 0.5131 3001 12000
beta1 7.966E-4 2.01E-4 1.935E-6 4.013E-4 7.952E-4 0.001184 3001 12000
mu[1] 0.4835 0.007636 7.081E-5 0.4684 0.4835 0.4989 3001 12000
mu[2] 0.5043 0.006521 6.684E-5 0.4914 0.5043 0.5171 3001 12000
mu[3] 0.4925 0.006691 6.445E-5 0.4792 0.4924 0.5059 3001 12000
mu[4] 0.5449 0.01305 1.345E-4 0.5189 0.5449 0.5705 3001 12000
mu[5] 0.52 0.008179 8.614E-5 0.5033 0.52 0.536 3001 12000
mu[6] 0.5124 0.007157 7.506E-5 0.498 0.5125 0.5266 3001 12000
mu[7] 0.4873 0.007166 6.734E-5 0.4732 0.4872 0.5018 3001 12000
mu[8] 0.4808 0.008022 7.386E-5 0.465 0.4808 0.497 3001 12000
mu[9] 0.4862 0.007292 6.824E-5 0.4718 0.4862 0.501 3001 12000
mu[10] 0.5131 0.007238 7.598E-5 0.4986 0.5132 0.5275 3001 12000
mu[11] 0.499 0.006429 6.421E-5 0.4864 0.499 0.512 3001 12000
mu[12] 0.4767 0.008688 7.94E-5 0.4593 0.4767 0.4943 3001 12000
mu[13] 0.5241 0.008868 9.323E-5 0.5062 0.5242 0.5415 3001 12000
mu[14] 0.5122 0.00713 7.474E-5 0.4978 0.5122 0.5262 3001 12000
mu[15] 0.4716 0.009598 8.73E-5 0.4525 0.4716 0.4909 3001 12000
mu[16] 0.4878 0.007113 6.698E-5 0.4738 0.4877 0.5022 3001 12000
mu[17] 0.4922 0.006711 6.455E-5 0.4789 0.4922 0.5057 3001 12000
mu[18] 0.4781 0.008451 7.74E-5 0.4614 0.4781 0.4951 3001 12000
mu[19] 0.5256 0.009123 9.582E-5 0.5072 0.5257 0.5434 3001 12000
mu[20] 0.5916 0.02402 2.405E-4 0.5445 0.5915 0.639 3001 12000
mu[21] 0.4806 0.008057 7.414E-5 0.4646 0.4806 0.4968 3001 12000
mu[22] 0.5273 0.00943 9.892E-5 0.5083 0.5274 0.5457 3001 12000
mu[23] 0.4679 0.01032 9.374E-5 0.4475 0.4678 0.4885 3001 12000
mu[24] 0.4773 0.008585 7.853E-5 0.4602 0.4773 0.4947 3001 12000
mu[25] 0.5077 0.006722 6.976E-5 0.4943 0.5078 0.5211 3001 12000
mu[26] 0.5067 0.006652 6.882E-5 0.4935 0.5068 0.5199 3001 12000
mu[27] 0.5082 0.006758 7.024E-5 0.4947 0.5082 0.5216 3001 12000
mu[28] 0.4685 0.0102 9.27E-5 0.4483 0.4684 0.4889 3001 12000
mu[29] 0.4905 0.006852 6.532E-5 0.4769 0.4904 0.5042 3001 12000
mu[30] 0.4884 0.007049 6.655E-5 0.4745 0.4883 0.5027 3001 12000
postprob 0.9998 0.01581 1.431E-4 1.0 1.0 1.0 3001 12000
sigma 0.03472 0.004829 4.704E-5 0.02672 0.03418 0.04542 3001 12000
tausq 876.2 235.3 2.254 484.9 856.2 1401.0 3001 12000	
  
Figure 2. Node Statistics for Informative Prior

Discussion

Based on our analyses, we found a positive relationship between average total payroll and average winning percentage in Major League Baseball for the years 2004 to 2012. For both the non-informative and informative methods, the statistics for postprob indicate that Pr(β1 ≥ 0 | y) is about 0.9998. In other words, there is a greater than 99% chance that beta1 > 0. These findings are similarly supported by the posterior means and the strictly positive 95% credible sets for beta1. Therefore, there does appear to be a linear association between average total payroll and average winning percentage for MLB teams.
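
The postprob node is simply the average of an indicator over the retained draws, which is how Pr(β1 ≥ 0 | y) is estimated. The Python sketch below mimics this with simulated draws: a normal distribution with the posterior mean and sd of beta1 from Figure 1 is assumed as a stand-in, so the numbers it produces are illustrative and are not the OpenBUGS results.

```python
import random

random.seed(2014)  # fixed seed so the sketch is repeatable

# Stand-in draws for beta1 (assumption: normal with the reported mean and sd).
draws = [random.gauss(7.962e-4, 2.01e-4) for _ in range(12000)]

def step(value):
    """Mimic the BUGS step() function: 1 if value >= 0, otherwise 0."""
    return 1 if value >= 0 else 0

# Monte Carlo estimate of Pr(beta1 >= 0 | y): the mean of the indicator.
postprob = sum(step(b) for b in draws) / len(draws)

# Equal-tailed 95% credible interval read off the sorted draws.
ordered = sorted(draws)
lower = ordered[int(0.025 * len(ordered))]
upper = ordered[int(0.975 * len(ordered)) - 1]
```

An interval whose lower end is above zero conveys the same message as a postprob close to 1.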
  
There was very little difference between the results under the non-informative and informative priors. Our belief is that this is because the informative prior is very consistent with the data. The posterior mean and median of beta1 under the informative prior (7.966E-4 and 7.952E-4) are actually slightly larger than those under the non-informative prior (7.962E-4 and 7.949E-4). This may be an indication that our non-informative prior fits the data better, but the difference is very small.

If the linear relationship between average total payroll and average winning percentage for MLB teams were explored further, more information about the parameters would help improve the analysis. Additionally, if an ‘averaged’ dataset were used in follow-up exploration, including more years would be advised. Finally, although a positive relationship appears to exist between total payroll and winning percentage based on this analysis, it would be important to explore the ongoing changes in the league. Most notable is the debate on the use of statistics for calculating wins based on on-base percentage rather than traditional baseball measurements of success. This ongoing development is having an impact on the perceived value of many players and may drastically affect a team’s salary and winning percentage.

References

Our dataset was constructed by combining historical Major League Baseball team salaries and winning percentages. The data for both variables were drawn from the same time period, 2004 to 2012. Links to these MLB data sources can be found below:

Baseball-Reference.com. (2014). Team Wins. Retrieved from http://www.baseball-reference.com/leagues/MLB/.

USA Today. (2014). USATODAY Salaries Database, MLB salaries by team for various years (2004 to 2014). Retrieved from http://content.usatoday.com/sportsdata/baseball/mlb/salaries/team/2004.

Appendix A: MCMC Integration Convergence

Figures 3 and 4 below are the history plots and auto-correlation plots, respectively, for the non-informative prior. From these plots it was assessed that convergence occurred quickly for every variable. Convergence of postprob was assessed using the MC_error found in the Results section of this paper.

Figure 3. History Plots for Non-Informative Prior

Figure 4. Auto-Correlation Plots for Non-Informative Prior

Figures 5 and 6 below are the history plots and auto-correlation plots, respectively, for the informative prior. From these plots it was assessed that convergence occurred quickly for every variable. Convergence of postprob was assessed using the MC_error found in the Results section of this paper.

Figure 5. History Plots for Informative Prior

Figure 6. Auto-Correlation Plots for Informative Prior

Appendix B: OpenBUGS Code

Non-informative Prior
model
{
    # Center the predictor on its mean
    for (i in 1:N) {
        xcent[i] <- x[i] - mean(x[])
    }
    # Likelihood: dnorm takes a mean and a precision
    for (i in 1:N) {
        mu[i] <- beta0 + beta1 * xcent[i]
        y[i] ~ dnorm(mu[i], tausq)
    }
    # Indicator that beta1 >= 0; its posterior mean estimates Pr(beta1 >= 0 | y)
    postprob <- step(beta1)
    # Flat (non-informative) priors on the regression coefficients
    beta0 ~ dflat()
    beta1 ~ dflat()
    tausq ~ dgamma(0.001, 0.001)
    sigma <- 1 / sqrt(tausq)
}

#data
list(x=c(63.69154422, 89.76840667, 74.92461167, 140.7180136, 109.4106621, 99.92325911, 68.43453544,
         60.32229233, 67.06203433, 100.8169706, 83.123759, 55.12364089, 114.6389837, 99.61515889,
         48.75051056, 69.04419133, 74.57996711, 56.90573011, 116.4527793, 199.368707, 60.03360822,
         118.5733706, 44.04681044, 55.889759, 94.06393778, 92.80824244, 94.66015589, 44.78459711,
         72.37762889, 69.80194444),
     y=c(0.47, 0.54, 0.44, 0.56, 0.48, 0.53, 0.49, 0.49, 0.48, 0.52, 0.46, 0.40, 0.55, 0.52,
         0.50, 0.50, 0.50, 0.47, 0.50, 0.58, 0.52, 0.53, 0.42, 0.50, 0.50, 0.46, 0.57, 0.49, 0.54, 0.50),
     N=30)

#inits
list(beta0=0, beta1=0, tausq=1)

Informative Prior
model
{
    # Center the predictor on its mean
    for (i in 1:N) {
        xcent[i] <- x[i] - mean(x[])
    }
    # Likelihood: dnorm takes a mean and a precision
    for (i in 1:N) {
        mu[i] <- beta0 + beta1 * xcent[i]
        y[i] ~ dnorm(mu[i], tausq)
    }
    # Indicator that beta1 >= 0; its posterior mean estimates Pr(beta1 >= 0 | y)
    postprob <- step(beta1)
    # Informative normal priors on the regression coefficients
    beta0 ~ dnorm(0.5, 0.05)
    beta1 ~ dnorm(0.1, 100)
    tausq ~ dgamma(0.001, 0.001)
    sigma <- 1 / sqrt(tausq)
}

#data
list(x=c(63.69154422, 89.76840667, 74.92461167, 140.7180136, 109.4106621, 99.92325911, 68.43453544,
         60.32229233, 67.06203433, 100.8169706, 83.123759, 55.12364089, 114.6389837, 99.61515889,
         48.75051056, 69.04419133, 74.57996711, 56.90573011, 116.4527793, 199.368707, 60.03360822,
         118.5733706, 44.04681044, 55.889759, 94.06393778, 92.80824244, 94.66015589, 44.78459711,
         72.37762889, 69.80194444),
     y=c(0.47, 0.54, 0.44, 0.56, 0.48, 0.53, 0.49, 0.49, 0.48, 0.52, 0.46, 0.40, 0.55, 0.52,
         0.50, 0.50, 0.50, 0.47, 0.50, 0.58, 0.52, 0.53, 0.42, 0.50, 0.50, 0.46, 0.57, 0.49, 0.54, 0.50),
     N=30)

#inits
list(beta0=0, beta1=0, tausq=1)

Appendix C: Inference (R code)

Complete Dataset
> data.bb=read.table('C://Users/xli63/Desktop/Baseball.txt', header=TRUE)
> attach(data.bb)
> head(data.bb)
> x <- data.bb$AverageTotalPayroll
> pct <- data.bb$AveragePCT
> lm.out=lm(pct~x)
> summary(lm.out)
Call:
lm(formula = pct ~ x)

Residuals:
      Min        1Q    Median        3Q       Max
-0.076797 -0.013157  0.007243  0.020457  0.061695

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4328659  0.0168386  25.707  < 2e-16 ***
x           0.0007969  0.0001860   4.286 0.000194 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03274 on 28 degrees of freedom
Multiple R-squared:  0.3961,    Adjusted R-squared:  0.3746
F-statistic: 18.37 on 1 and 28 DF,  p-value: 0.0001944
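
The least-squares estimates in the summary above can be cross-checked by hand from the Appendix D values. The Python sketch below is an illustration only (the function name simple_ols is hypothetical); it also shows how the uncentered intercept from lm() relates to beta0 in the centered OpenBUGS model.

```python
from statistics import mean

# Average total payroll in millions and average winning pct, from Appendix D.
x = [
    63.69154422, 89.76840667, 74.92461167, 140.7180136, 109.4106621,
    99.92325911, 68.43453544, 60.32229233, 67.06203433, 100.8169706,
    83.123759, 55.12364089, 114.6389837, 99.61515889, 48.75051056,
    69.04419133, 74.57996711, 56.90573011, 116.4527793, 199.368707,
    60.03360822, 118.5733706, 44.04681044, 55.889759, 94.06393778,
    92.80824244, 94.66015589, 44.78459711, 72.37762889, 69.80194444,
]
y = [
    0.47, 0.54, 0.44, 0.56, 0.48, 0.53, 0.49, 0.49, 0.48, 0.52,
    0.46, 0.40, 0.55, 0.52, 0.50, 0.50, 0.50, 0.47, 0.50, 0.58,
    0.52, 0.53, 0.42, 0.50, 0.50, 0.46, 0.57, 0.49, 0.54, 0.50,
]

def simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line ys ~ xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

intercept, slope = simple_ols(x, y)

# Value of the fitted line at the mean payroll; with a centered predictor
# this is the quantity that the model's beta0 represents.
centered_intercept = intercept + slope * mean(x)
```

The fitted line evaluated at the mean payroll equals the mean winning percentage, which is why the centered model's beta0 sits near 0.50 while the lm() intercept is near 0.43.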
  
	
  
Reduced Model (#3 Removed)
> data_up2 <- data.bb[-c(3),]
> xnew2 <- data_up2$AverageTotalPayroll
> pct2 <- data_up2$AveragePCT
> red_residual_line2 <- lm(pct2~xnew2)
> summary(red_residual_line2)
Call:
lm(formula = pct2 ~ xnew2)

Residuals:
      Min        1Q    Median        3Q       Max
-0.079121 -0.011606  0.005706  0.018941  0.060047

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4361350  0.0164219  26.558  < 2e-16 ***
xnew2       0.0007798  0.0001804   4.323 0.000187 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03171 on 27 degrees of freedom
Multiple R-squared:  0.4091,    Adjusted R-squared:  0.3872
F-statistic: 18.69 on 1 and 27 DF,  p-value: 0.0001872

Reduced Model (#12 Removed)
> data_new <- data.bb[-c(12),]
> xnew <- data_new$AverageTotalPayroll
> pct_new <- data_new$AveragePCT
> remove_residual_line <- lm(pct_new~xnew)
> summary(remove_residual_line)
Call:
lm(formula = pct_new ~ xnew)

Residuals:
      Min        1Q    Median        3Q       Max
-0.056063 -0.013108  0.004436  0.016632  0.059747

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4421938  0.0156408  28.272  < 2e-16 ***
xnew        0.0007190  0.0001709   4.208 0.000255 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02964 on 27 degrees of freedom
Multiple R-squared:  0.396,     Adjusted R-squared:  0.3737
F-statistic:  17.7 on 1 and 27 DF,  p-value: 0.0002551

Reduced Model (#24 Removed)
> data_up1 <- data.bb[-c(24),]
> xnew1 <- data_up1$AverageTotalPayroll
> pct1 <- data_up1$AveragePCT
> red_residual_line <- lm(pct1~xnew1)
> plot(red_residual_line)
> summary(red_residual_line)
Call:
lm(formula = pct1 ~ xnew1)

Residuals:
      Min        1Q    Median        3Q       Max
-0.075337 -0.013510  0.007229  0.017961  0.062273

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4301762  0.0174143  24.702  < 2e-16 ***
xnew1       0.0008193  0.0001903   4.305 0.000196 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03304 on 27 degrees of freedom
Multiple R-squared:  0.4071,    Adjusted R-squared:  0.3851
F-statistic: 18.54 on 1 and 27 DF,  p-value: 0.0001965

Residual & QQ plots Based on Complete Dataset

Residual & QQ plots Based on Reduced Model (Data point 3 Removed)

[Plot with axes x and y]

Residual & QQ plots Based on Reduced Model (Data point 12 Removed)

Residual & QQ plots Based on Reduced Model (Data point 24 Removed)

Appendix D: ‘Averaged’ Dataset

AverageTotalPayroll   AveragePCT
63.69154422           0.47
89.76840667           0.54
74.92461167           0.44
140.7180136           0.56
109.4106621           0.48
99.92325911           0.53
68.43453544           0.49
60.32229233           0.49
67.06203433           0.48
100.8169706           0.52
83.123759             0.46
55.12364089           0.40
114.6389837           0.55
99.61515889           0.52
48.75051056           0.50
69.04419133           0.50
74.57996711           0.50
56.90573011           0.47
116.4527793           0.50
199.368707            0.58
60.03360822           0.52
118.5733706           0.53
44.04681044           0.42
55.889759             0.50
94.06393778           0.50
92.80824244           0.46
94.66015589           0.57
44.78459711           0.49
72.37762889           0.54
69.80194444           0.50

*For Average Total Payroll 10.1 = 10,100,000

Contributions

Project proposal: All Members
OpenBUGS/R Computing:
- Non-informative prior: Lingwen He
- Informative prior: Zijian Su
- Inference: Xiangyu Li
- Additional Computing: Lingwen He, Zijian Su, Xiangyu Li
Interim report: All Members
Final report writing and formatting: Padraic O’Shea
More Related Content

Viewers also liked

Viewers also liked (13)

Hola mundo
Hola mundoHola mundo
Hola mundo
 
"Γυάλινα Γιάννενα", Μ. Γκανάς
"Γυάλινα Γιάννενα", Μ. Γκανάς"Γυάλινα Γιάννενα", Μ. Γκανάς
"Γυάλινα Γιάννενα", Μ. Γκανάς
 
Assesmente menu
Assesmente menuAssesmente menu
Assesmente menu
 
Deans List Award
Deans List AwardDeans List Award
Deans List Award
 
Orlando Magic
Orlando MagicOrlando Magic
Orlando Magic
 
Tugas pkn
Tugas pknTugas pkn
Tugas pkn
 
Praktik TIK
Praktik TIKPraktik TIK
Praktik TIK
 
Incukalns rb sa_25112015
Incukalns rb sa_25112015Incukalns rb sa_25112015
Incukalns rb sa_25112015
 
result presentation semnasteknomedia 2015
result presentation semnasteknomedia 2015result presentation semnasteknomedia 2015
result presentation semnasteknomedia 2015
 
Tema 10 Belén
Tema 10 BelénTema 10 Belén
Tema 10 Belén
 
On a testé pour vous...
On a testé pour vous...On a testé pour vous...
On a testé pour vous...
 
超限戰 911
超限戰 911超限戰 911
超限戰 911
 
Path
PathPath
Path
 

Similar to MLB Final Project

Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docxWeek 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docxjessiehampson
 
Basic Analytics Module for Sponsors
Basic Analytics Module for SponsorsBasic Analytics Module for Sponsors
Basic Analytics Module for SponsorsDee Daley
 
Apogee_StatisticalStudy
Apogee_StatisticalStudyApogee_StatisticalStudy
Apogee_StatisticalStudyHarry Mendell
 
Case Study: Analysis and findings of Qubee customer satisfaction in compariso...
Case Study: Analysis and findings of Qubee customer satisfaction in compariso...Case Study: Analysis and findings of Qubee customer satisfaction in compariso...
Case Study: Analysis and findings of Qubee customer satisfaction in compariso...shaika_jannat
 
The Effect of Employer Matching and Defaults on Workers' TSP Savings Behavior
The Effect of Employer Matching and Defaults on Workers' TSP Savings BehaviorThe Effect of Employer Matching and Defaults on Workers' TSP Savings Behavior
The Effect of Employer Matching and Defaults on Workers' TSP Savings BehaviorCongressional Budget Office
 
Bus 308 week 5 final paper
Bus 308 week 5 final paperBus 308 week 5 final paper
Bus 308 week 5 final paperuopexam
 
Construction of a robust prediction model to forecast the likelihood of a cre...
Construction of a robust prediction model to forecast the likelihood of a cre...Construction of a robust prediction model to forecast the likelihood of a cre...
Construction of a robust prediction model to forecast the likelihood of a cre...AdekunleJoseph4
 
Capstone Project - Nicholas Imholte - Final Draft
Capstone Project - Nicholas Imholte - Final DraftCapstone Project - Nicholas Imholte - Final Draft
Capstone Project - Nicholas Imholte - Final DraftNick Imholte
 
Supuestos Actuariales en tasas contingentes- versión inglés (3).pdf
Supuestos Actuariales en tasas contingentes- versión inglés (3).pdfSupuestos Actuariales en tasas contingentes- versión inglés (3).pdf
Supuestos Actuariales en tasas contingentes- versión inglés (3).pdfEvaristoDiz1
 
OPIM 5604 predictive modeling presentation group7
OPIM 5604 predictive modeling presentation group7OPIM 5604 predictive modeling presentation group7
OPIM 5604 predictive modeling presentation group7Shu-Feng Tsao
 
Pertemuan 3 & 4 - Pengendalian Mutu Statistik.pptx
Pertemuan 3 & 4 - Pengendalian Mutu Statistik.pptxPertemuan 3 & 4 - Pengendalian Mutu Statistik.pptx
Pertemuan 3 & 4 - Pengendalian Mutu Statistik.pptxgigol12808
 
Regression Analysis of NBA Points Final
Regression Analysis of NBA Points  FinalRegression Analysis of NBA Points  Final
Regression Analysis of NBA Points FinalJohn Michael Croft
 
Health informationexchangeacrossus healthinstitution (1)
Health informationexchangeacrossus healthinstitution (1)Health informationexchangeacrossus healthinstitution (1)
Health informationexchangeacrossus healthinstitution (1)University of Illinois,Chicago
 
Statistical-Process-Control-Analysis-Unraveled_updated210
Statistical-Process-Control-Analysis-Unraveled_updated210Statistical-Process-Control-Analysis-Unraveled_updated210
Statistical-Process-Control-Analysis-Unraveled_updated210pbaxter
 
Can CEO compensation be justified, at least statistically?
Can CEO compensation be justified, at least statistically?Can CEO compensation be justified, at least statistically?
Can CEO compensation be justified, at least statistically?Elias Sipunga
 
Can CEO compensation be justified, at least statistically?
Can CEO compensation be justified, at least statistically?Can CEO compensation be justified, at least statistically?
Can CEO compensation be justified, at least statistically?Elias Sipunga
 

MLB Final Project

Total Payroll vs. Winning Percentage in Major League Baseball
Bayesian Statistics
Fall, 2014

Lingwen He
Zijian Su
Xiangyu Li
Padraic O’Shea

Introduction

Major League Baseball (MLB) is the last major professional sports league in America that has not adopted a salary cap. The lack of a salary cap has led to large differences in total payroll between big-market and small-market teams. This glaring difference has fed the ongoing discussion of whether teams can “buy” wins by spending more money. To investigate whether teams that spend more money have higher winning percentages, we explore whether a linear relationship exists between average total payroll and average winning percentage of MLB teams from 2004 to 2012.

Methods

Data

From Baseball-Reference.com we acquired data on regular-season winning percentage by team. These data can be accessed at http://www.baseball-reference.com/leagues/MLB/. Data on total payroll by team were acquired from USA Today and are available at http://content.usatoday.com/sportsdata/baseball/mlb/salaries/team/2004.

To explore the linear relationship between total payroll and winning percentage over time, one data point for each team was needed. To calculate these data points, winning percentage and total payroll were collected for the 2004 to 2012 seasons and averaged by team. The predictor variable was rescaled by dividing by one million, which increases the size of the coefficient. Initial inference on the ‘averaged’ dataset did not indicate a severe violation of the normality assumption, although the normal QQ plot was not perfectly linear. Three potential outliers were also identified while performing inference.

The possible outliers were removed one at a time, and the analysis was repeated using residual plots and QQ plots. Overall, removing the points did not improve the model or the fit of the distribution. Therefore, the dataset containing all points was used for the analysis. The plots used for inference can be found in Appendix C.

To begin understanding the data, consider the descriptive statistics. The mean, median and standard deviation for each variable are given in Table 1 below. Predictably, average total salary appears skewed to the right, as its mean (84.657) is greater than its median (74.752). Average winning percentage appears to have a relatively normal distribution. Additionally, the standard deviation is fairly large, especially for total salary.
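
The summary statistics reported in Table 1 can be recomputed from the averaged dataset in Appendix D. The following is a minimal Python sketch (the paper's own computing was done in R and OpenBUGS); the names `payroll`, `win_pct` and `describe` are illustrative and do not come from the paper.

```python
import statistics

# Averaged 2004-2012 values transcribed from Appendix D (payroll in millions of dollars).
payroll = [
    63.69154422, 89.76840667, 74.92461167, 140.7180136, 109.4106621,
    99.92325911, 68.43453544, 60.32229233, 67.06203433, 100.8169706,
    83.123759, 55.12364089, 114.6389837, 99.61515889, 48.75051056,
    69.04419133, 74.57996711, 56.90573011, 116.4527793, 199.368707,
    60.03360822, 118.5733706, 44.04681044, 55.889759, 94.06393778,
    92.80824244, 94.66015589, 44.78459711, 72.37762889, 69.80194444,
]
win_pct = [
    0.47, 0.54, 0.44, 0.56, 0.48, 0.53, 0.49, 0.49, 0.48, 0.52,
    0.46, 0.40, 0.55, 0.52, 0.50, 0.50, 0.50, 0.47, 0.50, 0.58,
    0.52, 0.53, 0.42, 0.50, 0.50, 0.46, 0.57, 0.49, 0.54, 0.50,
]


def describe(values):
    """Return the five summary statistics used in Table 1."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


payroll_stats = describe(payroll)
win_stats = describe(win_pct)

# A mean well above the median is a quick signal of right skew.
right_skewed = payroll_stats["mean"] > payroll_stats["median"]

for name, summary in (("Total Salary", payroll_stats), ("Winning %", win_stats)):
    print(name, {key: round(value, 4) for key, value in summary.items()})
```

Comparing the mean with the median is only a rough skewness check; a histogram or the QQ plots in Appendix C give a fuller picture.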

Data (Avg. 2004 to 2012)   Mean     Standard Deviation   Minimum   Median   Maximum
Total Salary               84.657   32.693               44.046    74.752   199.368
Winning %                  0.5003   0.0414               0.40      0.50     0.58
Table 1. Descriptive Statistics for Study Variables (Total Salary in Millions)

Statistical Method

We hypothesize that average total payroll and average winning percentage are linearly associated. To assess this relationship, Bayesian simple linear regression is used with average winning percentage as the response. Two models are fit. The first uses a non-informative prior to reflect a lack of prior knowledge about the effect of salary on winning percentage. The second uses an informative prior based on our prior beliefs. The predictive outputs of the two models are then compared.

For the informative prior, a N(0.5, 0.05) is used for beta0 because we expect the winning percentage to be 50% with small variance. For beta1, a N(0.1, 100) is used because of our lack of knowledge and our expectation that this rate will be positive but not overly large; we expect the variance of beta1 to be large. Note that OpenBUGS's dnorm takes a precision (1/variance) as its second argument, so the code in Appendix B treats 0.05 and 100 as precisions rather than variances.

Convergence was assessed from the OpenBUGS output using history plots, auto-correlation plots and MC_error values. Because convergence was rapid, only one chain was used for the MCMC integration; as a result, BGR plots could not be used to assess burn-in. The initial values were based on intuition and are likely not representative of the posterior distribution, so a 3000-sample burn-in was used to exclude them.
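
Because BUGS-style normal distributions are specified by mean and precision, it can help to convert the second argument of each informative prior into a variance and standard deviation. The sketch below does that conversion in Python; the helper name `precision_to_sd` and the `priors` dictionary are illustrative, and the two (mean, precision) pairs are the dnorm arguments from the informative model in Appendix B.

```python
import math


def precision_to_sd(precision):
    """Convert a BUGS-style precision (1 / variance) to a standard deviation."""
    if precision <= 0:
        raise ValueError("precision must be positive")
    return 1.0 / math.sqrt(precision)


# (mean, precision) pairs as written in the informative model's dnorm() calls.
priors = {"beta0": (0.5, 0.05), "beta1": (0.1, 100.0)}

implied = {
    name: {"mean": mean, "variance": 1.0 / prec, "sd": precision_to_sd(prec)}
    for name, (mean, prec) in priors.items()
}

for name, summary in implied.items():
    print(name, summary)
```

Read this way, the beta0 prior has variance 20 and the beta1 prior has standard deviation 0.1, which is wide compared with the posterior standard deviation of beta1 (about 2E-4). That would be consistent with the two models giving nearly identical results.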

Results

The results of the Bayesian simple linear regression models performed using R are given below. Figure 1 contains the node statistics for the non-informative prior. The history plots and auto-correlation plots used for assessing convergence in the non-informative prior model can be found in Appendix A.

node       mean       sd         MC_error   val2.5pc   median     val97.5pc
beta0      0.5002     0.006425   6.46E-5    0.4876     0.5002     0.5131
beta1      7.962E-4   2.01E-4    1.935E-6   4.009E-4   7.949E-4   0.001184
mu[1]      0.4835     0.007636   7.081E-5   0.4684     0.4835     0.4989
mu[2]      0.5043     0.006521   6.684E-5   0.4914     0.5043     0.5171
mu[3]      0.4925     0.006691   6.445E-5   0.4792     0.4924     0.5059
mu[4]      0.5449     0.01305    1.345E-4   0.5189     0.5449     0.5704
mu[5]      0.5199     0.008179   8.614E-5   0.5033     0.52       0.536
mu[6]      0.5124     0.007157   7.505E-5   0.498      0.5125     0.5266
mu[7]      0.4873     0.007166   6.734E-5   0.4732     0.4872     0.5018
mu[8]      0.4809     0.008022   7.386E-5   0.465      0.4808     0.497
mu[9]      0.4862     0.007292   6.824E-5   0.4718     0.4862     0.501
mu[10]     0.5131     0.007238   7.598E-5   0.4986     0.5132     0.5275
mu[11]     0.499      0.006429   6.422E-5   0.4864     0.499      0.512
mu[12]     0.4767     0.008688   7.94E-5    0.4594     0.4767     0.4943
mu[13]     0.5241     0.008868   9.323E-5   0.5062     0.5242     0.5415
mu[14]     0.5121     0.00713    7.474E-5   0.4978     0.5122     0.5262
mu[15]     0.4716     0.009598   8.729E-5   0.4525     0.4716     0.491
mu[16]     0.4878     0.007113   6.698E-5   0.4738     0.4877     0.5022
mu[17]     0.4922     0.006711   6.455E-5   0.4789     0.4922     0.5057
mu[18]     0.4781     0.008451   7.74E-5    0.4614     0.4781     0.4951
mu[19]     0.5256     0.009123   9.582E-5   0.5072     0.5257     0.5434
mu[20]     0.5916     0.02402    2.405E-4   0.5445     0.5915     0.6389
mu[21]     0.4806     0.008057   7.414E-5   0.4647     0.4806     0.4968
mu[22]     0.5272     0.00943    9.892E-5   0.5083     0.5274     0.5457
mu[23]     0.4679     0.01032    9.374E-5   0.4475     0.4678     0.4886
mu[24]     0.4773     0.008586   7.852E-5   0.4602     0.4773     0.4947
mu[25]     0.5077     0.006722   6.976E-5   0.4943     0.5078     0.5211
mu[26]     0.5067     0.006652   6.882E-5   0.4935     0.5068     0.5199
mu[27]     0.5082     0.006758   7.023E-5   0.4947     0.5082     0.5216
mu[28]     0.4685     0.0102     9.27E-5    0.4483     0.4684     0.4889
mu[29]     0.4905     0.006852   6.532E-5   0.4769     0.4904     0.5042
mu[30]     0.4884     0.007049   6.655E-5   0.4745     0.4883     0.5027
postprob   0.9998     0.01581    1.431E-4   1.0        1.0        1.0
sigma      0.03472    0.004829   4.703E-5   0.02672    0.03418    0.04542
tausq      876.2      235.3      2.254      484.9      856.2      1401.0
For every node, start = 3001 and sample = 12000.
Figure 1. Node Statistics for Non-Informative Prior
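
To see roughly where the non-informative node statistics come from, the model in Appendix B can be sampled with a small Gibbs sampler. The sketch below is a Python illustration under stated assumptions, not a description of how OpenBUGS samples internally: it uses the standard conditional distributions for a normal linear model with flat priors on beta0 and beta1 and a Gamma(0.001, 0.001) prior on the precision. The function name `gibbs_flat_prior` and the seed are illustrative choices.

```python
import math
import random

# Averaged 2004-2012 values transcribed from Appendix D (payroll in millions of dollars).
payroll = [
    63.69154422, 89.76840667, 74.92461167, 140.7180136, 109.4106621,
    99.92325911, 68.43453544, 60.32229233, 67.06203433, 100.8169706,
    83.123759, 55.12364089, 114.6389837, 99.61515889, 48.75051056,
    69.04419133, 74.57996711, 56.90573011, 116.4527793, 199.368707,
    60.03360822, 118.5733706, 44.04681044, 55.889759, 94.06393778,
    92.80824244, 94.66015589, 44.78459711, 72.37762889, 69.80194444,
]
win_pct = [
    0.47, 0.54, 0.44, 0.56, 0.48, 0.53, 0.49, 0.49, 0.48, 0.52,
    0.46, 0.40, 0.55, 0.52, 0.50, 0.50, 0.50, 0.47, 0.50, 0.58,
    0.52, 0.53, 0.42, 0.50, 0.50, 0.46, 0.57, 0.49, 0.54, 0.50,
]


def gibbs_flat_prior(x, y, n_keep=12000, burn_in=3000, seed=1):
    """Gibbs sampler for y[i] ~ N(beta0 + beta1 * (x[i] - mean(x)), 1 / tau)
    with flat priors on beta0, beta1 and a Gamma(0.001, 0.001) prior on tau."""
    rng = random.Random(seed)
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xc = [xi - x_bar for xi in x]  # centered predictor, as in the BUGS model
    sxx = sum(v * v for v in xc)
    sxy = sum(v * yi for v, yi in zip(xc, y))
    a = b = 0.001
    tau = 1.0  # same starting value as the OpenBUGS inits
    draws = {"beta0": [], "beta1": [], "sigma": []}
    for iteration in range(burn_in + n_keep):
        # Centering makes beta0 and beta1 conditionally independent given tau.
        beta0 = rng.gauss(y_bar, 1.0 / math.sqrt(n * tau))
        beta1 = rng.gauss(sxy / sxx, 1.0 / math.sqrt(sxx * tau))
        sse = sum((yi - beta0 - beta1 * v) ** 2 for v, yi in zip(xc, y))
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        tau = rng.gammavariate(a + n / 2.0, 1.0 / (b + sse / 2.0))
        if iteration >= burn_in:
            draws["beta0"].append(beta0)
            draws["beta1"].append(beta1)
            draws["sigma"].append(1.0 / math.sqrt(tau))
    return draws


draws = gibbs_flat_prior(payroll, win_pct)
post_mean = {name: sum(values) / len(values) for name, values in draws.items()}
print(post_mean)
```

The posterior means from this sketch should land close to the beta0, beta1 and sigma rows of Figure 1, although the exact digits depend on the random seed and will not match the OpenBUGS run.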

Figure 2 contains the node statistics for the informative prior. The history plots and auto-correlation plots used for assessing convergence in the informative prior model can be found in Appendix A.

node       mean       sd         MC_error   val2.5pc   median     val97.5pc
beta0      0.5002     0.006425   6.46E-5    0.4876     0.5002     0.5131
beta1      7.966E-4   2.01E-4    1.935E-6   4.013E-4   7.952E-4   0.001184
mu[1]      0.4835     0.007636   7.081E-5   0.4684     0.4835     0.4989
mu[2]      0.5043     0.006521   6.684E-5   0.4914     0.5043     0.5171
mu[3]      0.4925     0.006691   6.445E-5   0.4792     0.4924     0.5059
mu[4]      0.5449     0.01305    1.345E-4   0.5189     0.5449     0.5705
mu[5]      0.52       0.008179   8.614E-5   0.5033     0.52       0.536
mu[6]      0.5124     0.007157   7.506E-5   0.498      0.5125     0.5266
mu[7]      0.4873     0.007166   6.734E-5   0.4732     0.4872     0.5018
mu[8]      0.4808     0.008022   7.386E-5   0.465      0.4808     0.497
mu[9]      0.4862     0.007292   6.824E-5   0.4718     0.4862     0.501
mu[10]     0.5131     0.007238   7.598E-5   0.4986     0.5132     0.5275
mu[11]     0.499      0.006429   6.421E-5   0.4864     0.499      0.512
mu[12]     0.4767     0.008688   7.94E-5    0.4593     0.4767     0.4943
mu[13]     0.5241     0.008868   9.323E-5   0.5062     0.5242     0.5415
mu[14]     0.5122     0.00713    7.474E-5   0.4978     0.5122     0.5262
mu[15]     0.4716     0.009598   8.73E-5    0.4525     0.4716     0.4909
mu[16]     0.4878     0.007113   6.698E-5   0.4738     0.4877     0.5022
mu[17]     0.4922     0.006711   6.455E-5   0.4789     0.4922     0.5057
mu[18]     0.4781     0.008451   7.74E-5    0.4614     0.4781     0.4951
mu[19]     0.5256     0.009123   9.582E-5   0.5072     0.5257     0.5434
mu[20]     0.5916     0.02402    2.405E-4   0.5445     0.5915     0.639
mu[21]     0.4806     0.008057   7.414E-5   0.4646     0.4806     0.4968
mu[22]     0.5273     0.00943    9.892E-5   0.5083     0.5274     0.5457
mu[23]     0.4679     0.01032    9.374E-5   0.4475     0.4678     0.4885
mu[24]     0.4773     0.008585   7.853E-5   0.4602     0.4773     0.4947
mu[25]     0.5077     0.006722   6.976E-5   0.4943     0.5078     0.5211
mu[26]     0.5067     0.006652   6.882E-5   0.4935     0.5068     0.5199
mu[27]     0.5082     0.006758   7.024E-5   0.4947     0.5082     0.5216
mu[28]     0.4685     0.0102     9.27E-5    0.4483     0.4684     0.4889
mu[29]     0.4905     0.006852   6.532E-5   0.4769     0.4904     0.5042
mu[30]     0.4884     0.007049   6.655E-5   0.4745     0.4883     0.5027
postprob   0.9998     0.01581    1.431E-4   1.0        1.0        1.0
sigma      0.03472    0.004829   4.704E-5   0.02672    0.03418    0.04542
tausq      876.2      235.3      2.254      484.9      856.2      1401.0
For every node, start = 3001 and sample = 12000.
Figure 2. Node Statistics for Informative Prior

Discussion

Based on our analyses, we found a positive relationship between average total payroll and average winning percentage in Major League Baseball for the years 2004 to 2012. For both the non-informative and informative methods, the statistics for postprob indicate that Pr(β1≥0|{y}) is about 0.9998. In other words, there is a greater than 99% chance that beta1 > 0. These findings are also supported by the posterior means and the strictly positive 95% credible sets for beta1. Therefore, there does appear to be a linear association between average total payroll and average winning percentage for MLB teams.
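
The postprob node is defined as step(beta1), which equals 1 when beta1 >= 0 and 0 otherwise, so its posterior mean is simply the fraction of retained draws with a non-negative slope. The Python sketch below illustrates that calculation. The draws here are stand-ins generated from a normal approximation with the reported posterior mean and standard deviation of beta1; they are not the paper's MCMC output, and the function names are illustrative.

```python
import random


def step(value):
    """Mimic the BUGS step() function: 1 if value >= 0, else 0."""
    return 1 if value >= 0 else 0


def posterior_probability(samples):
    """Monte Carlo estimate of Pr(parameter >= 0 | data): the mean of step() over draws."""
    if not samples:
        raise ValueError("samples must be non-empty")
    return sum(step(s) for s in samples) / len(samples)


# Stand-in draws for illustration only (normal approximation to the beta1 posterior).
rng = random.Random(2014)
beta1_draws = [rng.gauss(7.962e-4, 2.01e-4) for _ in range(12000)]

postprob = posterior_probability(beta1_draws)
print(round(postprob, 4))
```

With real MCMC output, `beta1_draws` would be replaced by the 12000 retained samples of beta1.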

There was very little difference between the results for the non-informative and informative priors. Our belief is that this is because the informative prior is very consistent with the data. The posterior mean and median of beta1 under the informative prior are actually slightly larger than those under the non-informative prior. This may be an indication that our non-informative prior fits the data better, but the difference is very small.

If the linear relationship between average total payroll and average winning percentage for MLB teams were explored further, more information about the parameters would help improve the analysis. Additionally, if an ‘averaged’ dataset were used in follow-up exploration, including more years would be advised. Finally, although this analysis suggests that a positive relationship exists between total payroll and winning percentage, it would be important to explore the ongoing changes in the league, most notably the debate over using statistics based on on-base percentage, rather than traditional baseball measurements of success, to calculate wins. This ongoing development is having an impact on the perceived value of many players and may drastically affect a team’s salary and winning percentage.

References

Our dataset was constructed by combining historical Major League Baseball team salaries and winning percentages. The data for both variables were drawn from the same time period, 2004 to 2012. The MLB data sources are listed below:

Baseball-Reference.com. (2014). Team Wins. Retrieved from http://www.baseball-reference.com/leagues/MLB/.

USA Today. (2014). USATODAY Salaries Database, MLB salaries by team for various years (2004 to 2014). Retrieved from http://content.usatoday.com/sportsdata/baseball/mlb/salaries/team/2004.

Appendix A: MCMC Integration Convergence

Figures 3 and 4 below are the history plots and auto-correlation plots, respectively, for the non-informative prior. From these plots it was assessed that convergence occurred quickly for every variable. Postprob’s convergence was assessed using the MC_error found in the Results section of this paper.
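
An auto-correlation plot shows, for each lag, how strongly a chain's draws are correlated with the draws that many iterations earlier. The Python sketch below computes that quantity for a single lag; it is a generic illustration rather than the routine OpenBUGS uses, and `autocorrelation` and `toy_chain` are illustrative names.

```python
def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(chain)")
    mean = sum(chain) / n
    denom = sum((v - mean) ** 2 for v in chain)
    if denom == 0:
        raise ValueError("chain is constant, so autocorrelation is undefined")
    numer = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag))
    return numer / denom


# Toy chain that alternates sign, so successive draws are strongly negatively correlated.
toy_chain = [1.0, -1.0] * 50
print([round(autocorrelation(toy_chain, k), 2) for k in range(4)])
```

For a chain that mixes well, the autocorrelation should fall toward zero within a few lags, which is the pattern the plots are checked for.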

Figure 3. History Plots for Non-Informative Prior

Figure 4. Auto-Correlation Plots for Non-Informative Prior

Figures 5 and 6 below are the history plots and auto-correlation plots, respectively, for the informative prior. From these plots it was assessed that convergence occurred quickly for every variable. Postprob’s convergence was assessed using the MC_error found in the Results section of this paper.

Figure 5. History Plots for Informative Prior

Figure 6. Auto-Correlation Plots for Informative Prior

Appendix B: OpenBUGS Code

Non-informative Prior

    model
    {
        # Center the predictor (average payroll, in millions) at its mean
        for (i in 1:N) {
            xcent[i] <- x[i] - mean(x[])
        }
        # Likelihood: normal errors with precision tausq
        for (i in 1:N) {
            mu[i] <- beta0 + beta1 * xcent[i]
            y[i] ~ dnorm(mu[i], tausq)
        }
        # Indicator that the slope is non-negative; its posterior mean estimates Pr(beta1 >= 0 | y)
        postprob <- step(beta1)
        # Flat priors on the regression coefficients
        beta0 ~ dflat()
        beta1 ~ dflat()
        # Vague gamma prior on the precision
        tausq ~ dgamma(0.001, 0.001)
        sigma <- 1 / sqrt(tausq)
    }

    #data
    list(x=c(63.69154422, 89.76840667, 74.92461167, 140.7180136, 109.4106621, 99.92325911, 68.43453544,
    60.32229233, 67.06203433, 100.8169706, 83.123759, 55.12364089, 114.6389837, 99.61515889,
    48.75051056, 69.04419133, 74.57996711, 56.90573011, 116.4527793, 199.368707, 60.03360822,
    118.5733706, 44.04681044, 55.889759, 94.06393778, 92.80824244, 94.66015589, 44.78459711,
    72.37762889, 69.80194444),
    y=c(0.47, 0.54, 0.44, 0.56, 0.48, 0.53, 0.49, 0.49, 0.48, 0.52, 0.46, 0.40, 0.55, 0.52,
    0.50, 0.50, 0.50, 0.47, 0.50, 0.58, 0.52, 0.53, 0.42, 0.50, 0.50, 0.46, 0.57, 0.49, 0.54, 0.50), N=30)

    #inits
    list(beta0=0, beta1=0, tausq=1)

Informative Prior

    model
    {
        for (i in 1:N) {
            xcent[i] <- x[i] - mean(x[])
        }
        for (i in 1:N) {
            mu[i] <- beta0 + beta1 * xcent[i]
            y[i] ~ dnorm(mu[i], tausq)
        }
        postprob <- step(beta1)
        # Informative priors; the second argument of dnorm is a precision
        beta0 ~ dnorm(0.5, 0.05)
        beta1 ~ dnorm(0.1, 100)
        tausq ~ dgamma(0.001, 0.001)
        sigma <- 1 / sqrt(tausq)
    }

    #data and #inits: identical to those listed for the non-informative prior above

Appendix C: Inference (R code)

Complete Dataset

    > data.bb <- read.table('C://Users/xli63/Desktop/Baseball.txt', header=TRUE)
    > attach(data.bb)
    > head(data.bb)
    > x <- data.bb$AverageTotalPayroll
    > pct <- data.bb$AveragePCT
    > lm.out <- lm(pct~x)
    > summary(lm.out)
    Call:
    lm(formula = pct ~ x)
    Residuals:
          Min        1Q    Median        3Q       Max
    -0.076797 -0.013157  0.007243  0.020457  0.061695
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) 0.4328659  0.0168386  25.707  < 2e-16 ***
    x           0.0007969  0.0001860   4.286 0.000194 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    Residual standard error: 0.03274 on 28 degrees of freedom
    Multiple R-squared:  0.3961,    Adjusted R-squared:  0.3746
    F-statistic: 18.37 on 1 and 28 DF,  p-value: 0.0001944

Reduced Model (#3 Removed)

    > data_up2 <- data.bb[-c(3),]
    > xnew2 <- data_up2$AverageTotalPayroll
    > pct2 <- data_up2$AveragePCT
    > red_residual_line2 <- lm(pct2~xnew2)
    > summary(red_residual_line2)
    Call:
    lm(formula = pct2 ~ xnew2)
    Residuals:
          Min        1Q    Median        3Q       Max
    -0.079121 -0.011606  0.005706  0.018941  0.060047
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) 0.4361350  0.0164219  26.558  < 2e-16 ***
    xnew2       0.0007798  0.0001804   4.323 0.000187 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    Residual standard error: 0.03171 on 27 degrees of freedom
    Multiple R-squared:  0.4091,    Adjusted R-squared:  0.3872
    F-statistic: 18.69 on 1 and 27 DF,  p-value: 0.0001872

Reduced Model (#12 Removed)

    > data_new <- data.bb[-c(12),]
    > xnew <- data_new$AverageTotalPayroll
    > pct_new <- data_new$AveragePCT
    > remove_residual_line <- lm(pct_new~xnew)
    > summary(remove_residual_line)
    Call:
    lm(formula = pct_new ~ xnew)
    Residuals:
          Min        1Q    Median        3Q       Max
    -0.056063 -0.013108  0.004436  0.016632  0.059747
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) 0.4421938  0.0156408  28.272  < 2e-16 ***
    xnew        0.0007190  0.0001709   4.208 0.000255 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    Residual standard error: 0.02964 on 27 degrees of freedom
    Multiple R-squared:  0.396,     Adjusted R-squared:  0.3737
    F-statistic:  17.7 on 1 and 27 DF,  p-value: 0.0002551

Reduced Model (#24 Removed)

    > data_up1 <- data.bb[-c(24),]
    > xnew1 <- data_up1$AverageTotalPayroll
    > pct1 <- data_up1$AveragePCT
    > red_residual_line <- lm(pct1~xnew1)
    > plot(red_residual_line)
    > summary(red_residual_line)
    Call:
    lm(formula = pct1 ~ xnew1)
    Residuals:
          Min        1Q    Median        3Q       Max
    -0.075337 -0.013510  0.007229  0.017961  0.062273
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) 0.4301762  0.0174143  24.702  < 2e-16 ***
    xnew1       0.0008193  0.0001903   4.305 0.000196 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    Residual standard error: 0.03304 on 27 degrees of freedom
    Multiple R-squared:  0.4071,    Adjusted R-squared:  0.3851
    F-statistic: 18.54 on 1 and 27 DF,  p-value: 0.0001965
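
The effect of dropping each candidate outlier on the fitted line can also be checked without R. The sketch below is a plain-Python least-squares fit applied to the full dataset and to the three reduced datasets (observations 3, 12 and 24 removed, using 1-based positions as in the R code). The helper names `ols` and `drop_point` are illustrative, and the data are transcribed from Appendix D.

```python
# Averaged 2004-2012 values transcribed from Appendix D (payroll in millions of dollars).
payroll = [
    63.69154422, 89.76840667, 74.92461167, 140.7180136, 109.4106621,
    99.92325911, 68.43453544, 60.32229233, 67.06203433, 100.8169706,
    83.123759, 55.12364089, 114.6389837, 99.61515889, 48.75051056,
    69.04419133, 74.57996711, 56.90573011, 116.4527793, 199.368707,
    60.03360822, 118.5733706, 44.04681044, 55.889759, 94.06393778,
    92.80824244, 94.66015589, 44.78459711, 72.37762889, 69.80194444,
]
win_pct = [
    0.47, 0.54, 0.44, 0.56, 0.48, 0.53, 0.49, 0.49, 0.48, 0.52,
    0.46, 0.40, 0.55, 0.52, 0.50, 0.50, 0.50, 0.47, 0.50, 0.58,
    0.52, 0.53, 0.42, 0.50, 0.50, 0.46, 0.57, 0.49, 0.54, 0.50,
]


def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def drop_point(values, index_1based):
    """Copy of values without the observation at the given 1-based position."""
    return values[:index_1based - 1] + values[index_1based:]


fits = {"all": ols(payroll, win_pct)}
for idx in (3, 12, 24):
    fits[idx] = ols(drop_point(payroll, idx), drop_point(win_pct, idx))

for label, (intercept, slope) in fits.items():
    print(label, round(intercept, 4), round(slope, 7))
```

The slope stays positive and of similar size in all four fits, which is in line with the decision to keep all 30 points.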

Residual & QQ plots Based on Complete Dataset

Residual & QQ plots Based on Reduced Model (Data point 3 Removed)

Residual & QQ plots Based on Reduced Model (Data point 12 Removed)

Residual & QQ plots Based on Reduced Model (Data point 24 Removed)

Appendix D: ‘Averaged’ Dataset

AverageTotalPayroll   AveragePCT
63.69154422           0.47
89.76840667           0.54
74.92461167           0.44
140.7180136           0.56
109.4106621           0.48
99.92325911           0.53
68.43453544           0.49
60.32229233           0.49
67.06203433           0.48
100.8169706           0.52
83.123759             0.46
55.12364089           0.40
114.6389837           0.55
99.61515889           0.52
48.75051056           0.50
69.04419133           0.50
74.57996711           0.50
56.90573011           0.47
116.4527793           0.50
199.368707            0.58
60.03360822           0.52
118.5733706           0.53
44.04681044           0.42
55.889759             0.50
94.06393778           0.50
92.80824244           0.46
94.66015589           0.57
44.78459711           0.49
72.37762889           0.54
69.80194444           0.50
*Average Total Payroll is in millions of dollars: 10.1 = 10,100,000

Contributions

Project proposal: All Members
OpenBUGS/R Computing:
- Non-informative prior: Lingwen He
- Informative prior: Zijian Su
- Inference: Xiangyu Li
- Additional Computing: Lingwen He, Zijian Su, Xiangyu Li
Interim report: All Members
Final report writing and formatting: Padraic O’Shea