What are MLB teams Really Paying For?
Daniel Crago
Section #03 Rohit Gupta
Introduction
Growing up, as with many American children, baseball has always been a part of my life. I’ve
been playing baseball since I was in the third grade, all the way through college here at UC Merced. As a
math major, and given that baseball offers so many statistics and ways to analyze a player’s game, I thought
this project would be a perfect opportunity to analyze the game I love.
The most primitive game of baseball dates all the way back to the 1700s, when a few English
conformist leaders set up the game each Sunday. Modern-day baseball as we know it originated when
the National League formed in 1876. When the American League formed in 1901, interleague matchups
and divisional competitions began. It was not until 1903 that we find the first-ever World Series
championship, when the Boston Americans defeated the Pittsburgh Pirates. To this day baseball
continues to be America’s pastime and has a worldwide impact on fans.
As the baseball industry continues to grow, the amount of money put into the sport is
huge. Every time Clayton Kershaw steps on the mound for the Los Angeles Dodgers, he is making
approximately $10,000,00. Even if the Dodgers lose 30-0 that day, Clayton Kershaw goes home a winner.
The financial implications of these numbers are something many people have built controversy over, and
many believe that most professional athletes earn way too much money. In my project, I plan to use statistics
from the Lahman database to analyze how player salaries correlate with actual player performance. The
Lahman database is found online at http://www.seanlahman.com/baseball-archive/statistics/
and is comprised of statistics dating from 1871 to 2014.
Goals
In this project I intend to compile the salaries of the 125 highest-paid players in 2013 and
compare them to batting average (BA), wins above replacement (WAR), and slugging percentage (total
bases / at-bats) for hitters, and to earned run average (ERA) and wins (W) for pitchers. These statistics will
give a good indication of how well a player performs both offensively and defensively, and whether
the team he plays for is truly getting its money’s worth. Our question of interest is
whether or not there is a correlation between a player’s actual performance and his salary. We will
also identify any outliers, defined as players who under- or over-perform their contracts.
Methods
I approached this project by taking the top 125 salaries from 2013 for both pitchers and
position players from the Lahman database and extracting them into R. I then calculated each position
player’s batting average and slugging percentage, calculated wins and ERA for pitchers, and compiled the
results into an Excel table for viewing. Then, using linear regression models, I compared these statistics to
the salaries of each of these players.
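As a minimal sketch of the batting calculations described above, the following R snippet derives batting average and slugging percentage from counting stats. The two rows are made-up toy values, not real 2013 data; the column names (`AB`, `H`, `X2B`, `X3B`, `HR`) follow the Lahman `Batting` table, and the data frame name `toy` is only an illustration.

```r
# Toy rows with made-up counting stats (NOT real 2013 data).
# Column names follow the Lahman "Batting" table.
toy <- data.frame(
  playerID = c("hitterA", "hitterB"),
  AB  = c(500, 400),  # at-bats
  H   = c(150, 100),  # hits (singles + doubles + triples + home runs)
  X2B = c(30, 20),    # doubles
  X3B = c(5, 0),      # triples
  HR  = c(20, 10)     # home runs
)

# Batting average: hits per at-bat
toy$BA <- toy$H / toy$AB

# Total bases: every hit is worth at least one base, so add one extra base
# per double, two per triple, and three per home run
toy$TB <- toy$H + toy$X2B + 2 * toy$X3B + 3 * toy$HR

# Slugging percentage: total bases per at-bat
toy$SLG <- toy$TB / toy$AB

print(toy[, c("playerID", "BA", "SLG")])
```

With the Lahman package loaded, the same three assignments could be applied to a subset of `Batting` instead of the toy data frame.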
Using online resources, I was able to find the outliers in the data. I identified outliers as players
who were either paid in the top 125 and did not play the full season, or were outside the
top 125 and over-performed their contracts. For these players I also found WAR for position
players, and W and ERA for the pitchers.
In R I was able to compile different subsets of data to plot and analyze with the “Pitching”,
“Batting” and “Salaries” data frames. In the “Salaries” data frame I first filtered for salaries from the year
2013. Looking online, using espn.com, I found that the average salary for 2013 was $3.2M,
so I then created two subset data frames: one of players with a salary over the average, the
other of players under the average salary. When analyzing the “Batting” data frame, I
created a subset of data from the year 2013, then another subset data frame of players with at least 200
at-bats (AB) (league minimum for computing relevant batting statistics). From that, I computed a final batting
data frame of players with a batting average above the 2013 league average of .253. From
these data frames I was able to create histograms and run regression analyses of the data to compute the
correlation coefficients.
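The subsetting steps above can be sketched in R as follows. Note one assumption that the text does not state: to compare a salary with a statistic, each salary has to be paired with the same player's stats, and this sketch does that with `merge()` on `playerID` and `yearID`. All values and the names `salaries_toy`, `batting_toy`, `eligible` and `paired` are hypothetical.

```r
# Toy stand-ins for the Lahman "Salaries" and "Batting" tables (made-up values)
salaries_toy <- data.frame(
  playerID = c("a", "b", "c", "d"),
  yearID   = 2013,
  salary   = c(12000000, 8000000, 4000000, 500000)
)
batting_toy <- data.frame(
  playerID = c("a", "b", "c", "e"),
  yearID   = 2013,
  AB       = c(550, 150, 420, 600),
  H        = c(150, 40, 120, 180)
)

# Keep only hitters with at least 200 at-bats, as in the text
eligible <- subset(batting_toy, AB >= 200)

# Pair each salary with the same player's batting line.
# Players missing from either table are dropped by the inner join.
paired <- merge(salaries_toy, eligible, by = c("playerID", "yearID"))
paired$BA <- paired$H / paired$AB

print(paired)
```

Here player "b" is dropped for having fewer than 200 at-bats, "d" has no batting line, and "e" has no salary record, so only "a" and "c" remain in `paired`.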
Results
After extracting the data from the Lahman database project, I was able to compile multiple subsets of
data, shown in the attached code. I found that there was little to no correlation between salaries and
player performance. When looking at the position players’ statistics, I found an R² value of 0.0011 when
comparing batting average to salaries (Figure 2). There was a slightly higher correlation between
slugging percentage and salaries (Figure 1), R² = 0.0042, but it is still very low, and not nearly high enough to
indicate that player performance correlates with salaries.
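For readers who want to see where an R² value like these comes from, here is a small sketch in R. It fits a simple linear regression with `lm()` and reads R² from `summary()`; for a one-predictor model this equals the square of the Pearson correlation from `cor()`. The six data points are invented for illustration and are not the 2013 players.

```r
# Made-up salary / batting-average pairs (NOT the 2013 data)
toy <- data.frame(
  salary = c(1, 3, 5, 8, 12, 20) * 1e6,
  BA     = c(0.270, 0.245, 0.290, 0.255, 0.262, 0.281)
)

# Simple linear regression of the statistic (y) on salary (x)
fit <- lm(BA ~ salary, data = toy)

# Coefficient of determination reported by the model summary
r2 <- summary(fit)$r.squared

# Pearson correlation coefficient between the two columns
r <- cor(toy$salary, toy$BA)

# With a single predictor, R squared is the square of r
print(c(r_squared = r2, r = r, r_times_r = r^2))
```

An R² near zero, as reported in this section, means salary explains almost none of the variation in the statistic.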
Figure 1. Slugging percentage vs. salaries of the 72 highest-paid position players. R² = 0.0042.
[Scatter plot: Slugging % vs. Salaries; x-axis: Salaries, y-axis: Slugging %]
Figure 2. Batting average vs. salaries of the 72 highest-paid position players in 2013. R² = 0.0011.
When looking over the pitching statistics, I found that, again, there was no correlation between
either ERA or W and salary. The computed R² for ERA and salaries is 0.0241, shown in
Figure 3 below. For wins and salary we find a very similar
R² of 0.0252 (Figure 4).
Figure 3. Analysis of ERA vs. salary of the 53 highest-paid pitchers in 2013.
Figure 4. Analysis of wins vs. salary of the 53 highest-paid pitchers in 2013.
[Scatter plot for Figure 2: Batting Average vs. Salaries; x-axis: Salaries, y-axis: Batting Average; R² = 0.0011]
[Scatter plot for Figure 3: ERA vs. Salaries; x-axis: Salaries, y-axis: ERA; R² = 0.0241]
[Scatter plot for Figure 4: Wins vs. Salaries; x-axis: Salaries, y-axis: Wins; R² = 0.0252]
When analyzing the salary data, I found that there was a strong skew toward salaries just over
$5M, with counts dropping off toward $15M. We can see in Figure 5 that there are over 300
players around the league minimum salary of $500,000.
Figure 5. Salaries below the 2013 league average of $3.2M.
Figure 6. Salaries above the 2013 league average of $3.2M.
When looking at our outliers and analyzing outside data found at baseball-reference.com and
espn.com, I found that there were multiple players who did not complete full seasons but still made
millions of dollars from their contracts. The highest-paid player in baseball in 2013, Alex Rodriguez,
made $29M; however, he played only 44 games prior to his suspension for performance-enhancing drugs
(PEDs). This means that for 44 games, Alex made nearly $8M. Other players such as Ryan Howard
(PHI), Derek Jeter (NYA), Hanley Ramirez (LAD), Justin Morneau (MIN), and Kevin Youkilis (NYA) all made
over $12M but failed to play more than 86 games due to injury. Simply put, these are players who did not
meet their contract expectations. The 4th and 5th highest-paid players in 2013 were Vernon Wells and
Mark Teixeira of the New York Yankees. They had batting averages of 0.233 and 0.151 respectively,
which fall well below the league average of 0.253.
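The "nearly $8M" figure for Alex Rodriguez can be reproduced with a quick proration, sketched here in R. The 162-game season length is an assumption supplied for this sketch (the text does not state how the figure was derived), and the variable names are illustrative.

```r
# Assumption: salary is spread evenly over a 162-game regular season
season_games <- 162

salary       <- 29e6  # 2013 salary quoted in the text
games_played <- 44    # games played quoted in the text

# Share of the salary attributable to the games actually played
prorated <- salary * games_played / season_games

print(round(prorated))
```

Under that assumption the prorated amount is a little under $7.9M, consistent with the "nearly $8M" stated above.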
When looking over the pitchers, I found that the second-highest-paid player in 2013, Johan
Santana, did not have any documented stats. I also found that Jeff Weaver (30th) and Chris Carpenter
(67th) had no stats for the 2013 season but still made over $12.5M. I found that there were more
pitchers who suffered injuries and were unable to perform than there were position players in the top
125 highest-paid players.
I was also able to interpret the stats of the most underpaid, or most efficient, position players and
pitchers. As shown below (Table 1), these players provided batting averages above
the league average, along with a WAR of roughly 2 or more. Mike Trout was clearly the most efficient player, with
10 wins above replacement and a batting average of .323. For the pitchers (Table 2), we see the most
efficient pitcher in 2013 was Francisco Liriano, with 16 wins and an ERA of 2.88 on a contract of $1M.
Table 1. Position players outperforming their contracts
Name              Salary     BA      WAR
Mike Trout        510000     0.323   10
Dioner Navarro    1750000    0.303   1.9
James Loney       2000000    0.299   2.2
Omar Infante      4000000    0.318   2.5
Yunel Escobar     5000000    0.256   3.3
Nick Punto        1500000    0.255   2.1
Marlon Byrd       700000     0.291   4.3
Ryan Raburn       1000000    0.272   2.3
Chris Denorfia    2000000    0.279   3.6
Jose Altuve       500000     0.283   SLUG=0.363
Table 2. Pitchers outperforming their contracts
Name                Salary     W/Save             ERA
Francisco Liriano   1000000    16                 2.88
Wade Miley          500000     10                 3.55
Craig Kimbrel       500000     S=50               1.21
Chris Sale          500000     11                 3.07
Mat Latos           4.6M       14                 3.16
David Robertson     2.7M       5 Wins, 51 Holds   2.04
Matt Harrison       6.1        18                 3.29
Discussion
Our results indicate one simple interpretation: money can’t buy you wins in baseball. When
looking at outliers, it is not always the best contracts that will get you the best players in baseball;
rather, the players with breakout years end up with the highest contracts in the following years. These
results imply that baseball will continue to have an issue with increasing payrolls and salary caps, with
players who, in the eyes of most, are earning way more than they deserve, and other “sleeper” players
having big years who outperform their contracts.
Our data and the correlation coefficients from all of our analyses indicate no correlation between
performance statistics and player salaries. To truly understand the extent of this information, we had
to draw on data from outside sources to see how teams and players finalize contracts.
Conclusion
Our findings tell us that in baseball there is no real indicator of how much a player should make;
it is all based on previous performance and the demand for the player from other teams. General
managers such as Billy Beane of the Oakland Athletics have attempted to come up with a formula for
winning by acknowledging the fact that money can’t always buy you wins. This idea, known as
sabermetrics, is the future of formulating wins. As statisticians with a love for the game continue to
analyze correlations such as these, we must find better numbers and coefficients that yield stronger
results.
References
http://mlb.mlb.com/mlb/history/mlb_history_teams.jsp
http://www.baseball-reference.com/players/
http://www.historyofbaseball.us/
http://www.spotrac.com/mlb/rankings/
http://www.seanlahman.com/baseball-archive/statistics/
http://espn.go.com/mlb/stats/batting/_/year/2013
Code
# Install the Lahman package once if it is not already available:
# install.packages("Lahman")
library(Lahman)

# --- Salaries ---
data(Salaries)
# Subset of salaries in 2013 only
Salaries_2013 <- subset(Salaries, yearID == 2013)
# Salaries at or above $3.2M, which was the league average
Salaries_overavg13 <- subset(Salaries_2013, salary >= 3200000)
# Salaries under the league average
Salaries_underavg13 <- subset(Salaries_2013, salary < 3200000)

# --- Batting ---
data(Batting)
# Batting stats in the year 2013
Batting_2013 <- subset(Batting, yearID == 2013)
# Only include batting stats of players with at least 200 AB
Batting_stats <- subset(Batting_2013, AB >= 200)
# Batting average at or above the 2013 league average of .253
Batting_avgover253 <- subset(Batting_stats, H / AB >= .253)

# --- Pitching ---
data(Pitching)
# Pitching stats in 2013
Pitching_2013 <- subset(Pitching, yearID == 2013)
# Pitchers with at least 15 games pitched
Pitching_Gover15 <- subset(Pitching_2013, G >= 15)
# Pitchers with ERA at or under 3.86 (league average)
Pitching_eraUnderAvg <- subset(Pitching_Gover15, ERA <= 3.86)

# --- Histograms (salary converted from dollars to millions) ---
hist(Salaries_overavg13$salary / 1e6, xlab = "Salaries in Millions",
     main = "Histogram of Salaries in 2013 over average")
hist(Salaries_underavg13$salary / 1e6, xlab = "Salaries in Millions",
     main = "Histogram of Salaries in 2013 under average")

# --- Scatter plots with regression lines ---
# Pair each salary with the same player's stats before plotting.
# Players with several stints in 2013 appear once per stint.
bat <- merge(Salaries_overavg13, Batting_avgover253, by = c("playerID", "yearID"))
bat$BA <- bat$H / bat$AB
plot(bat$salary, bat$BA, xlab = "Salaries", ylab = "Batting Average",
     main = "Batting Average VS. Salaries")
abline(lm(BA ~ salary, data = bat), col = "blue")  # regression line (y ~ x)

pit <- merge(Salaries_overavg13, Pitching_Gover15, by = c("playerID", "yearID"))
plot(pit$salary, pit$ERA, xlab = "Salaries", ylab = "ERA", main = "ERA VS. Salaries")
abline(lm(ERA ~ salary, data = pit), col = "blue")  # regression line (y ~ x)
plot(pit$salary, pit$W, xlab = "Salaries", ylab = "Wins", main = "Wins VS. Salaries")
abline(lm(W ~ salary, data = pit), col = "blue")  # regression line (y ~ x)
  • 9. #make a subsetof battingstats of avg over.280 ##league average in2013 was.253 Batting_avgover253<- as.data.frame(subset(Batting_stats,H/AB>= .253)) data(Pitching) Pitching View(Pitching) #subsetof stats in2013 Pitching_2013 <- as.data.frame(subset(Pitching,yearID>2012)) #subsetof gamespitched Pitching_Gover15<- as.data.frame(subset(Pitching_2013,G >= 15)) #subsetof pitcherswitheraunder3.86 (league average) Pitching_eraUnderAvg<- as.data.frame(subset(Pitching_Gover15,ERA <= 3.86)) hist( Salaries_overavg13$salary,xlab="SalariesinMillions",main="Histogramof Salariesin2013 over average") hist( Salaries_underavg13$salary,xlab="SalariesinMillions",main="Histogramof Salariesin2013 under average") plot(Salaries_overavg13$salary,Batting_avgover280$H/AB,xlab="Salaries",ylab="BattingAverage", main="BattingAvgerage VS.Salaries") abline(lm(salary~H/AB),col="blue") #regressionline (y~x) plot(Salaries_overavg13$salary,Pitching_Gover15$ERA,xlab="Salaries",ylab="ERA",main="ERA VS. Salaries") abline(lm(salary~ERA),col="blue") #regressionline (y~x) plot(Salaries_overavg13$salary,Pitching_Gover15$W,xlab="Salaries",ylab="Wins",main="WinsVS. Salaries") abline(lm(salary~W),col="blue")#regressionline(y~x)