Proceedings of the 2010 Industrial Engineering Research Conference
A. Johnson and J. Miller, eds.
Multi-criteria Selection of All-Star Pitching Staff for Fantasy
Baseball
Austin Lambert, Mark McGinley, Chaitanya Chandan, David Claudio, Lourdes Medina
The Pennsylvania State University Department of Industrial Engineering
State College, Pennsylvania 16801 USA
Abstract
This paper presents the problem of a fantasy baseball team managed by three decision makers with different
methods of conducting performance reviews. A multiple-criteria, multiple-decision-maker optimization approach
was used to select the best pitching staff for the team. The criteria were based on 2008 MLB pitching
statistics: ERA, OPP OBP, WHIP, FLD PCT, and K/BB ratio. The problem was formulated
as a linear/integer programming model with a weighted objective function. The results include the selected
pitching staff and additional analysis of the sensitivity of the results to different weight scenarios.
Keywords
Linear programming, Integer programming, Fantasy baseball, Multi-criteria selection, Multiple decision makers
1. Introduction
A frequently noted axiom in baseball is that a team is only as good as that day’s pitcher. Historically, a
well-rounded pitching staff has been needed in order to succeed. As a result, front offices in baseball make it
their first priority to lock up a quality pitching staff. In this paper, data from the 2008 Major League Baseball
(MLB) season is used to create the optimal pitching staff for a fantasy baseball team. The problem presented is
that of a fantasy baseball team managed by three decision makers, each with a different method of conducting
performance reviews. The intent is to maximize the decision makers' preferences when selecting the best
twelve-man staff, while limiting the field of potential candidates by imposing restrictions on permissible
variables. These variables were determined by looking at the types of decisions that go into building an
effective pitching staff for any team in MLB.
After research on the different variables and statistics that describe a pitching staff, it was determined that a set
of detailed constraints would allow the situation to be modeled as a linear/integer programming (LP) problem.
The variables include metrics such as the Earned Run Average (ERA), Opponents' On-Base Percentage (OPP
OBP), Walks plus Hits divided by Innings Pitched (WHIP), Fielding Percentage (FLD PCT), and Strikeout-to-Walk
ratio (K/BB). These tend to be the statistics that managers and coaches of a baseball team consider most
important. The problem was formulated as an LP model with a weighted objective function. With the
necessary funds, the solution to this LP problem will allow a fantasy team to build the most effective and efficient
pitching staff available, and thus have the greatest chance of success during a specific season.
2. Literature Review
LP has been used in numerous fields ranging from manufacturing to grocery shopping. The sports industry is no
exception, although LP has not been used there as extensively as in other industries. Within sports, LP has
frequently been used for scheduling seasonal games. For example, Hamiez and Hao [1] used algebra and LP while
attempting to solve the Sports League Scheduling Problem (SLSP). Although the specific details of the procedure
are not vital to this research, their work shows how various methods of LP can lead to very different results.
Michael Trick [2] also wrote an article on his method of scheduling MLB games and ACC basketball games. He
used various optimization methods (including LP) in order to schedule games without conflict [2].
Trick’s work helped this research by clarifying how to collect specific data for a problem and
the steps necessary to solve it.
Lambert, McGinley, Chandan, Medina, and Claudio
Others have proposed the use of optimization techniques for other sports-related problems. For example,
Zappe et al. [3] constructed a model to determine the consistency in performance of baseball players. They used
different weights across the criteria when ranking players at different positions. Although there could
be some controversy about the assigned weights, Zappe et al. state that the LP solution with the assigned weights
“makes sense” and thus poses no problem for the given solution [3]. Adler et al. [4] wrote an article on the use
of LP to find out exactly how a baseball team can make the playoffs. Although there is currently an accepted
method for predicting playoff spots in baseball, Adler et al. point out that this method is flawed because it does
not take into account additional games that must be played [4]. Similarly, a group of industrial engineers at
Berkeley have set up a website (Baseball Playoff Races) that uses a sophisticated LP model to predict and estimate
playoff spots several games earlier than the method that is currently used [5].
Although baseball involves the constant training of all the players on the team, there is another side of the game
that many fans do not see: the managerial aspect. Every day of the year, managers and supervisors of MLB
teams make important decisions regarding the offense, defense, pitching, and various other aspects of the game.
Each of these decisions can make or break a team and can be directly related to the win-loss ratio of a given
team. Lewis et al. [6] wrote an article on this exact aspect and on how LP can be used to evaluate a team’s
managerial staff and the decisions they make. By evaluating offensive, defensive, and postseason statistics, Lewis
et al. were able to use the fundamentals of LP to rank MLB’s managerial staffs in order of effectiveness over a
period of time [6]. Experiencing this side of the game is one of the goals of having a fantasy baseball team:
sports fans get to make the managerial decisions that affect the team they build.
3. Problem Description
The goal of using LP in fantasy baseball is to create a pitching roster that is optimized based on several ranked
pitching statistics. In this particular model, twelve pitchers were to be selected: five starters, five relievers,
and two closers.
In baseball, a pitcher’s skill is assessed with multiple statistics that range from strikeouts to wild pitches.
These statistics were used as the basis of the LP model. Although there are over forty official pitching statistics
that could be used, only five were selected to maintain the simplicity and usability of the program. The following
are the pitching statistics selected:
• ERA (Earned Run Average): total number of earned runs multiplied by 9, then divided by innings pitched*
• WHIP (Walks and Hits per Inning Pitched): the average number of walks and hits allowed by the pitcher
per inning*
• OPP OBP (Opponents' On-Base Percentage): times reached base divided by at bats plus walks plus hit by
pitch plus sacrifice flies*
• FLD PCT (Fielding Percentage): total plays divided by the number of total chances
• K/BB (Strikeout-to-Walk Ratio): number of strikeouts divided by number of bases on balls
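The five definitions above translate directly into arithmetic. The sketch below is illustrative only: the function names and the season line passed to them are hypothetical, and fielding percentage is written with the usual expansion of "total plays" as putouts plus assists and "total chances" as putouts plus assists plus errors.

```python
# Illustrative only: function names and the sample season line are hypothetical.

def era(earned_runs, innings_pitched):
    """Earned runs allowed per nine innings (lower is better)."""
    return 9 * earned_runs / innings_pitched

def whip(walks, hits, innings_pitched):
    """Walks plus hits allowed per inning pitched (lower is better)."""
    return (walks + hits) / innings_pitched

def opp_obp(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """Share of plate appearances in which the opponent reached base (lower is better)."""
    reached = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sacrifice_flies
    return reached / opportunities

def fld_pct(putouts, assists, errors):
    """Plays made divided by total chances (higher is better)."""
    return (putouts + assists) / (putouts + assists + errors)

def k_bb(strikeouts, walks):
    """Strikeouts per walk (higher is better)."""
    return strikeouts / walks

# Hypothetical season line, not taken from the paper's data set.
season = {
    "ERA": era(earned_runs=60, innings_pitched=200),
    "WHIP": whip(walks=50, hits=170, innings_pitched=200),
    "OPP OBP": opp_obp(hits=170, walks=50, hit_by_pitch=5, at_bats=740, sacrifice_flies=5),
    "FLD PCT": fld_pct(putouts=15, assists=33, errors=2),
    "K/BB": k_bb(strikeouts=180, walks=50),
}
```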
Note that attributes marked with a “*” are statistics for which a lower value is preferred. The data used for the
problem was obtained from references [7] and [8]. Table 1 contains a sample of pitchers from the overall pool along
with their respective statistics. All of this data was taken from the 2008 Major League Baseball season. Because
lower values are preferred for ERA, OPP OBP, and WHIP, it was necessary to linearize and normalize the data
so that the statistics could easily be compared to one another. The first step was to scale the
data by dividing the smallest ERA, WHIP, and OPP OBP in each pitching group by each individual player’s
respective statistic. Similarly, each player's FLD PCT and K/BB were divided by the largest FLD PCT and K/BB in
the player's pitching group.
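The scaling step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function name `scale_group` and the dictionary layout are hypothetical, and the three starters are taken from the Table 1 sample rather than the full pool, so the group minimum and maximum used here are those of the sample only.

```python
# Illustrative sketch of the scaling step; names and layout are assumptions.

LOWER_IS_BETTER = {"ERA", "WHIP", "OPP_OBP"}

def scale_group(pitchers):
    """Scale every statistic within one pitching group so that 1.0 is the best value.

    Lower-is-better statistics become (group minimum) / value;
    higher-is-better statistics become value / (group maximum).
    """
    stat_names = next(iter(pitchers.values())).keys()
    scaled = {name: {} for name in pitchers}
    for stat in stat_names:
        values = [stats[stat] for stats in pitchers.values()]
        best = min(values) if stat in LOWER_IS_BETTER else max(values)
        for name, stats in pitchers.items():
            if stat in LOWER_IS_BETTER:
                scaled[name][stat] = best / stats[stat]
            else:
                scaled[name][stat] = stats[stat] / best
    return scaled

# Three starters from the Table 1 sample (not the full candidate pool).
starters = {
    "Lincecum": {"ERA": 2.62, "WHIP": 1.15, "OPP_OBP": 0.297, "FLD_PCT": 1.0, "K_BB": 3.15},
    "Lee": {"ERA": 2.54, "WHIP": 1.11, "OPP_OBP": 0.285, "FLD_PCT": 0.95, "K_BB": 5.00},
    "Santana": {"ERA": 3.13, "WHIP": 1.21, "OPP_OBP": 0.296, "FLD_PCT": 0.951, "K_BB": 3.17},
}
scaled = scale_group(starters)
```

After scaling, every statistic is oriented so that larger is better, which is what allows the weighted values to be summed in a single maximization objective.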
Table 1: Sample of pitchers from the overall pool along with their respective statistics
Pitcher              Type      ERA   WHIP   OPP OBP  FLD PCT  K/BB
Tim Lincecum         Starter   2.62  1.15   .297     1.0      3.15
Cliff Lee            Starter   2.54  1.11   .285     .95      5.00
Johan Santana        Starter   3.13  1.21   .296     .951     3.17
Rich Harden          Starter   3.39  1.237  .283     .968     3.10
Bill                 Starter   3.53  1.20   .343     .951     2.11
Ben Sheets           Starter   3.73  1.20   .284     .954     3.36
Jon Lester           Starter   3.41  1.23   .301     .956     3.52
Joe Saunders         Starter   4.22  1.37   .349     1.0      1.78
Ricky Nolasco        Starter   5.06  1.25   .301     1.0      4.43
Paul Maholm          Starter   4.44  1.44   .346     .954     1.98
Joey Devine          Reliever  .59   .8     .225     1.0      3.27
Scott Downs          Reliever  1.78  1.15   .298     .96      2.11
Billy Wagner         Reliever  2.3   .89    .228     1.0      5.2
Jon Lieber           Reliever  4.05  1.39   .317     1.0      1.16
Brian Shouse         Reliever  2.81  1.17   .288     .950     5.79
Matt Thornton        Reliever  2.33  1.0    .258     1.0      4.05
Geoff Geary          Reliever  4.5   1.5    .290     0        3.94
Brad Ziegler         Reliever  1.06  1.16   .311     1.0      3.32
Grant Balfour        Reliever  1.54  .89    .233     1.0      3.7
Chris Perez          Reliever  3.46  1.34   .324     1.0      1.91
Francisco Rodriguez  Closer    2.24  1.29   0.31     0.83     2.26
Joe Nathan           Closer    1.33  0.90   0.24     0.90     4.11
Jonathon Papelbon    Closer    2.34  0.95   0.25     0.77     9.63
Brad Lidge           Closer    1.95  1.23   0.30     1.00     2.63
Mariano Rivera       Closer    1.40  0.67   0.19     1.00     12.83
Jose Valverde        Closer    3.38  1.18   0.29     0.90     3.61
Joakim Soria         Closer    1.60  0.86   0.25     1.00     3.47
Bobby Jenks          Closer    2.63  1.10   0.29     1.00     2.24
Brian Wilson         Closer    4.62  1.44   0.34     1.00     2.39
BJ Ryan              Closer    2.95  1.28   0.32     1.00     2.07
Next, each of the five statistics was weighted based on both the pitching group and the fantasy baseball owners'
preferences, as shown in Tables 2 through 4. Finally, each scaled value was multiplied by the appropriate weight.
The resulting scaled and weighted values are the coefficients that the objective function maximizes.
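The paper does not state how the weights were derived from the rankings, but the values listed in Tables 2 through 4 agree with the standard rank-sum rule, in which a statistic ranked r out of n receives weight (n - r + 1) divided by n(n + 1)/2. The sketch below is offered under that assumption; the function name `rank_sum_weights` is hypothetical.

```python
# Assumption: weights follow the rank-sum rule. The paper lists the weights
# but does not name the rule that produced them.

def rank_sum_weights(ranking):
    """Map {statistic: rank} to {statistic: weight}, with rank 1 the most important."""
    n = len(ranking)
    total = n * (n + 1) // 2  # 1 + 2 + ... + n
    return {stat: (n - rank + 1) / total for stat, rank in ranking.items()}

# Group ranking for starters, as listed in Table 2.
starter_ranking = {"ERA": 1, "WHIP": 2, "OBP": 3, "Fld Pct": 5, "K/BB": 4}

weights = rank_sum_weights(starter_ranking)
rounded = {stat: round(w, 3) for stat, w in weights.items()}
# rounded == {"ERA": 0.333, "WHIP": 0.267, "OBP": 0.2, "Fld Pct": 0.067, "K/BB": 0.133}
```

Because the weights sum to one and every scaled statistic is at most one, a pitcher's total weighted score is also at most one.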
Table 2: Starters’ Weights
Statistic  Group Ranking  Weight
ERA        1              .333
WHIP       2              .267
OBP        3              .200
Fld Pct    5              .067
K/BB       4              .133
Table 3: Closers’ Weights
Statistic  Group Ranking  Weight
ERA        4              .133
WHIP       1              .333
OBP        2              .267
Fld Pct    5              .067
K/BB       3              .200
Table 4: Relievers’ Weights
Statistic  Group Ranking  Weight
ERA        1              .333
WHIP       4              .133
OBP        2              .267
Fld Pct    5              .067
K/BB       3              .200
4. Problem Formulation
4.1 Objective Function
The objective function is
$$\text{Maximize } Z = \sum_i \sum_j \sum_k w_{ik} x_{ij} \qquad (1)$$
where $w_{ik}$ is the weighted value for pitcher $i$ on criterion $k$, and $x_{ij}$ is a binary variable indicating
whether pitcher $i$ is selected for position $j$. The index $i$ runs over the pitchers being considered, while $j$
identifies the three positions: Reliever ($j = 1$), Starter ($j = 2$), and Closer ($j = 3$). The index $k$
represents the criteria, previously defined as the five pitching statistics being considered: ERA ($k = 1$),
OPP OBP ($k = 2$), K/BB ($k = 3$), FLD PCT ($k = 4$), and WHIP ($k = 5$).
Also, note that:
$$x_{ij} = \begin{cases} 1 & \text{if pitcher } i \text{ is selected for position } j \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$
4.2 Constraints
In addition to letting the fantasy owners set their own weights (Tables 2-4), this particular LP allows for even
more customization within the code’s constraint section. A set of seven constraints was defined for each pitching
group. Each constraint represents a skill being evaluated and deemed necessary for the selection of a pitcher into
the roster. The skills are defined as follows:
• Pitching Skill: ERA + OPP OBP
• Balanced Player Skill 1: FLD PCT + WHIP
• Balanced Player Skill 2: WHIP + K/BB
• Ball Handling Skill: K/BB + FLD PCT
• Pitching Effectiveness Skill: WHIP + ERA
• Pitching Consistency Skill: ERA + K/BB
• Defensive Skill: FLD PCT + OPP OBP
Although these seven skills are the program’s defaults, any user can make adjustments by adding or
removing skill constraints within the program’s code (see Section 6). Each constraint was designed in such a way
that the program searches for the maximum sum of the two statistics that make up each skill. The skill cutoff values
(on the right-hand side of each constraint) were estimated from the minimum skill expected of each pitcher (e.g., the
1.35 in constraint 3 comes from 5*0.27, and the 0.54 in constraint 11 comes from 2*0.27). If a pitcher’s
skill is below the weighted cutoff value, that pitcher will neither be selected nor analyzed further in the study.
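The relationship between the per-pitcher minimum and the group cutoff can be checked for all seven skills. Only the 0.27 value for Pitching Skill is stated in the text; the other per-pitcher minimums below are inferred by dividing the starter and reliever cutoffs in constraints (4) through (9) by five, so they should be read as an assumption. The names in the sketch are hypothetical.

```python
# Per-pitcher minimum skill levels. Only 0.27 is stated in the text; the rest
# are inferred by dividing the right-hand sides of constraints (4)-(9) by 5.
PER_PITCHER_MINIMUM = {
    "Pitching Skill": 0.27,
    "Balanced 1": 0.15,
    "Ball Handling": 0.13,
    "Balanced 2": 0.21,
    "Effectiveness": 0.23,
    "Consistency": 0.15,
    "Defensive Skill": 0.25,
}

def group_cutoffs(per_pitcher_minimum, group_size):
    """Scale each per-pitcher minimum by the number of pitchers selected in the group."""
    return {
        skill: round(minimum * group_size, 2)
        for skill, minimum in per_pitcher_minimum.items()
    }

starter_reliever_cutoffs = group_cutoffs(PER_PITCHER_MINIMUM, group_size=5)
closer_cutoffs = group_cutoffs(PER_PITCHER_MINIMUM, group_size=2)
```

With a group size of five the function returns the right-hand sides of constraints (3) through (9), and with a group size of two it returns those of constraints (11) through (17).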
• For Starters and Relievers ($j = 1, 2$):
$\sum_i w_{i1} x_{ij} + \sum_i w_{i2} x_{ij} \ge 1.35 \quad \forall j = 1, 2$ (Pitching Skill) (3)
$\sum_i w_{i4} x_{ij} + \sum_i w_{i5} x_{ij} \ge 0.75 \quad \forall j = 1, 2$ (Balanced 1) (4)
$\sum_i w_{i3} x_{ij} + \sum_i w_{i4} x_{ij} \ge 0.65 \quad \forall j = 1, 2$ (Ball Handling) (5)
$\sum_i w_{i5} x_{ij} + \sum_i w_{i3} x_{ij} \ge 1.05 \quad \forall j = 1, 2$ (Balanced 2) (6)
$\sum_i w_{i5} x_{ij} + \sum_i w_{i1} x_{ij} \ge 1.15 \quad \forall j = 1, 2$ (Effectiveness) (7)
$\sum_i w_{i1} x_{ij} + \sum_i w_{i3} x_{ij} \ge 0.75 \quad \forall j = 1, 2$ (Consistency) (8)
$\sum_i w_{i4} x_{ij} + \sum_i w_{i2} x_{ij} \ge 1.25 \quad \forall j = 1, 2$ (Defensive Skill) (9)
$\sum_i x_{ij} = 5 \quad \forall j = 1, 2$ (5 starters and 5 relievers) (10)
• For Closers ($j = 3$):
$\sum_i w_{i1} x_{i3} + \sum_i w_{i2} x_{i3} \ge 0.54$ (Pitching Skill) (11)
$\sum_i w_{i4} x_{i3} + \sum_i w_{i5} x_{i3} \ge 0.30$ (Balanced 1) (12)
$\sum_i w_{i3} x_{i3} + \sum_i w_{i4} x_{i3} \ge 0.26$ (Ball Handling) (13)
$\sum_i w_{i5} x_{i3} + \sum_i w_{i3} x_{i3} \ge 0.42$ (Balanced 2) (14)
$\sum_i w_{i5} x_{i3} + \sum_i w_{i1} x_{i3} \ge 0.46$ (Effectiveness) (15)
$\sum_i w_{i1} x_{i3} + \sum_i w_{i3} x_{i3} \ge 0.30$ (Consistency) (16)
$\sum_i w_{i4} x_{i3} + \sum_i w_{i2} x_{i3} \ge 0.50$ (Defensive Skill) (17)
$\sum_i x_{i3} = 2$ (Need 2 closers) (18)
$w_{ik}, x_{ij} \ge 0 \quad \forall i, j, k$ (Non-negativity) (19)
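For a small pool, the closer subproblem defined by constraints (11) through (18) can be solved by plain enumeration, which makes the structure of the model easy to inspect. The sketch below is not the authors' GAMS program: it swaps in brute-force search over all candidate pairs, and the four closers "A" through "D" with their weighted values are hypothetical numbers chosen for illustration. Tuple positions follow the paper's index k in order (ERA, OPP OBP, K/BB, FLD PCT, WHIP).

```python
from itertools import combinations

# Each skill is the sum of two criteria; positions are 0-based versions of k.
SKILLS = {
    "Pitching Skill": (0, 1),
    "Balanced 1": (3, 4),
    "Ball Handling": (2, 3),
    "Balanced 2": (4, 2),
    "Effectiveness": (4, 0),
    "Consistency": (0, 2),
    "Defensive Skill": (3, 1),
}

# Right-hand sides of constraints (11)-(17).
CLOSER_CUTOFFS = {
    "Pitching Skill": 0.54, "Balanced 1": 0.30, "Ball Handling": 0.26,
    "Balanced 2": 0.42, "Effectiveness": 0.46, "Consistency": 0.30,
    "Defensive Skill": 0.50,
}

def is_feasible(combo, w, cutoffs):
    """Check every skill constraint for the selected set of pitchers."""
    return all(
        sum(w[i][a] + w[i][b] for i in combo) >= cutoffs[skill]
        for skill, (a, b) in SKILLS.items()
    )

def select_group(w, size, cutoffs):
    """Return the feasible set of `size` pitchers with the largest total weighted value."""
    best_set, best_score = None, None
    for combo in combinations(sorted(w), size):
        if not is_feasible(combo, w, cutoffs):
            continue
        score = sum(sum(w[i]) for i in combo)  # objective (1) restricted to this group
        if best_score is None or score > best_score:
            best_set, best_score = combo, score
    return best_set, best_score

# Hypothetical weighted values w_ik for four closers (not data from the paper).
weighted = {
    "A": (0.13, 0.26, 0.20, 0.067, 0.33),
    "B": (0.12, 0.21, 0.06, 0.060, 0.25),
    "C": (0.08, 0.18, 0.05, 0.067, 0.20),
    "D": (0.10, 0.20, 0.15, 0.052, 0.23),
}
staff, score = select_group(weighted, size=2, cutoffs=CLOSER_CUTOFFS)
```

Enumeration is only practical for small pools, since the number of candidate sets grows combinatorially; an integer programming solver is the appropriate tool at realistic sizes. In this hypothetical data the pair B and C is rejected by the Ball Handling constraint, which shows how a skill cutoff can exclude a combination regardless of its objective value.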
5. Summary of Results
After collecting the data and solving the linear program using GAMS, a pitching staff of twelve members was
selected. This staff consisted of five starters, five relievers, and two closers. The specific players chosen are
shown in Table 5.
Table 5: Players Selected
Starters  Relievers  Closers
Lincecum  Devine     Nathan
Lee       Wagner     Rivera
Nolasco   Shouse
Harden    Ziegler
Sheets    Balfour
6. Analysis
The analysis consisted of changing the weights of the different pitching criteria so that emphasis
was placed on different criteria (such as ERA). Note in particular the case of Matt Thornton (reliever), who missed
being selected by a value of .625. By analyzing Thornton’s ERA and OPP OBP values (see Table 6), it was possible
to calculate the minimum amount by which Thornton would have to improve in order to be selected over Shouse.
Tables 7 through 10 show different weight scenarios and their impact on the players selected.
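The effect of a weight change can be explored with a simplified re-scoring of the relievers sampled in Table 1. The sketch below is an assumption-laden illustration, not the authors' model: it ignores the skill constraints, treats the ten sampled relievers as the whole pool, and simply ranks them by total weighted score under the default reliever weights (Table 4) and the ERA-heavy weights (Table 7). Its output therefore need not reproduce Tables 8 or 10. The function name `top_five` is hypothetical.

```python
# Relievers from the Table 1 sample: (ERA, WHIP, OPP OBP, FLD PCT, K/BB).
RELIEVERS = {
    "Devine": (0.59, 0.80, 0.225, 1.0, 3.27),
    "Downs": (1.78, 1.15, 0.298, 0.96, 2.11),
    "Wagner": (2.30, 0.89, 0.228, 1.0, 5.20),
    "Lieber": (4.05, 1.39, 0.317, 1.0, 1.16),
    "Shouse": (2.81, 1.17, 0.288, 0.950, 5.79),
    "Thornton": (2.33, 1.00, 0.258, 1.0, 4.05),
    "Geary": (4.50, 1.50, 0.290, 0.0, 3.94),
    "Ziegler": (1.06, 1.16, 0.311, 1.0, 3.32),
    "Balfour": (1.54, 0.89, 0.233, 1.0, 3.70),
    "Perez": (3.46, 1.34, 0.324, 1.0, 1.91),
}
LOWER_IS_BETTER = (True, True, True, False, False)

DEFAULT_WEIGHTS = (0.333, 0.133, 0.267, 0.067, 0.200)  # Table 4
ERA_HEAVY_WEIGHTS = (0.50, 0.20, 0.15, 0.05, 0.10)     # Table 7

def top_five(pool, weights):
    """Rank pitchers by total weighted scaled score; skill constraints are ignored."""
    columns = list(zip(*pool.values()))
    best = [min(col) if low else max(col) for col, low in zip(columns, LOWER_IS_BETTER)]

    def score(stats):
        return sum(
            w * (b / s if low else s / b)
            for w, b, s, low in zip(weights, best, stats, LOWER_IS_BETTER)
        )

    ranked = sorted(pool, key=lambda name: score(pool[name]), reverse=True)
    return ranked[:5]

default_staff = top_five(RELIEVERS, DEFAULT_WEIGHTS)
era_heavy_staff = top_five(RELIEVERS, ERA_HEAVY_WEIGHTS)
```

On this sample, the default weights place Shouse in the top five with Thornton just behind, and the ERA-heavy weights move Thornton ahead of Shouse. That is the same direction of change reported for these two pitchers in Tables 5 and 8, although the remaining names produced by this simplified ranking do not all agree with Table 8.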
Table 6: Improvements needed for Matt Thornton to be selected
Player         Original ERA  Improved ERA for Selection  Original OBP  Improved OBP for Selection
Matt Thornton  2.325         2.297                       .258          .2449
Table 7: Changing the weights to have a greater emphasis on ERA
Statistic Weight
ERA .5
WHIP .2
OBP .15
Fielding Percentage .05
K/BB .1
Table 8: New Players Selected
Starters  Relievers  Closers
Lincecum  Downs      Nathan
Lee       Wagner     Lidge
Santana   Thornton
Harden    Ziegler
Sheets    Balfour
Table 9: Changing the weights for an all-around balanced pitcher
Statistic Weight
ERA .4
WHIP .2
OBP .2
Fielding Percentage .05
K/BB .15
Table 10: New Players Selected
Starters  Relievers  Closers
Lincecum  Shouse     Nathan
Lee       Wagner     Lidge
Harden    Thornton
Sheets    Ziegler
Nolasco   Balfour
7. Conclusions
This paper shows the formulation of a fantasy baseball selection problem as an LP and its solution through the use
of GAMS. An interesting result was that several players who were expected to be good enough to make the
pitching staff were in fact left out. This includes Santana (starter), who initially missed the cut by just several
points. However, the analysis demonstrated that simply changing the importance (in the form of weights) of the
statistics (ERA, OBP, etc.) can change the players selected considerably. This could be very useful to real baseball
organizations, as they could slightly alter parameters and constraints in the LP in order to develop the pitching
staff of their choice. This formulation could also be used to help MLB managers decide which minor league pitchers
to bring up into the major leagues. Finally, by reverse engineering the model, a baseball agent could inform players
of the improvements they would need to achieve in order to stay competitive or be selected by a particular team, as
illustrated by the example in the analysis. However, before the model can be used as a solution for MLB managers,
additional cost constraints will have to be added to reflect how much the team is willing to spend on each player.
Future research will focus on making this transition.
References
1. Hamiez, J.P., and Hao, J.K., 2004, "A Linear-Time Algorithm to Solve the Sports League Scheduling Problem,"
Discrete Applied Mathematics, 143(1-3), 252-265.
2. Trick, M., 2005, "Adventures in Sports Scheduling," Carnegie Mellon’s Graduate School of Industrial
Administration. http://www.cs.cmu.edu/~ACO/dimacs/trick.html
3. Zappe, C., Webster, W., and Horowitz, I., 1993, "Using Linear/Integer Programming to Determine Post-Facto
Consistency in Performance Evaluations," Interfaces, 23(6), 107-113.
4. Adler, I., Erera, A.L., Hochbaum, D.S., and Olinick, E.V., 2002, "Baseball, Optimization, and the World Wide
Web," Interfaces, 32(2), 12-22.
5. University of California, Berkeley, "Baseball Playoff Races," RIOT Baseball Project.
http://riot.ieor.berkeley.edu/~baseball/
6. Lewis, H.F., Lock, K.A., and Sexton, T.R., 2009, "Organizational Capability, Efficiency, and Effectiveness in
Major League Baseball: 1901-2002," European Journal of Operational Research, 197(2), 731-740.
7. Baseball statistics and history, 2008. Retrieved from http://www.baseball-reference.com.
8. Fantasy baseball, 2008. Retrieved from http://games.espn.go.com/frontpage/baseball.

More Related Content

Viewers also liked

การสื่อสารเพื่อกิจธุระ
การสื่อสารเพื่อกิจธุระการสื่อสารเพื่อกิจธุระ
การสื่อสารเพื่อกิจธุระkingkarn somchit
 
วรรคทองในวรรณคดี
วรรคทองในวรรณคดีวรรคทองในวรรณคดี
วรรคทองในวรรณคดีkingkarn somchit
 
เสียงในภาษา
เสียงในภาษาเสียงในภาษา
เสียงในภาษาkingkarn somchit
 
Designing a uniform filter bank using multirate concept
Designing a uniform filter bank using multirate conceptDesigning a uniform filter bank using multirate concept
Designing a uniform filter bank using multirate conceptRedwan Islam
 
การโน้มน้าวใจ
การโน้มน้าวใจการโน้มน้าวใจ
การโน้มน้าวใจkingkarn somchit
 
การเขียนที่บรรลุวัตถุประสงค์
การเขียนที่บรรลุวัตถุประสงค์การเขียนที่บรรลุวัตถุประสงค์
การเขียนที่บรรลุวัตถุประสงค์kingkarn somchit
 
ธรรมชาติของภาษา
ธรรมชาติของภาษาธรรมชาติของภาษา
ธรรมชาติของภาษาkingkarn somchit
 

Viewers also liked (8)

การสื่อสารเพื่อกิจธุระ
การสื่อสารเพื่อกิจธุระการสื่อสารเพื่อกิจธุระ
การสื่อสารเพื่อกิจธุระ
 
วรรคทองในวรรณคดี
วรรคทองในวรรณคดีวรรคทองในวรรณคดี
วรรคทองในวรรณคดี
 
เสียงในภาษา
เสียงในภาษาเสียงในภาษา
เสียงในภาษา
 
Designing a uniform filter bank using multirate concept
Designing a uniform filter bank using multirate conceptDesigning a uniform filter bank using multirate concept
Designing a uniform filter bank using multirate concept
 
การโน้มน้าวใจ
การโน้มน้าวใจการโน้มน้าวใจ
การโน้มน้าวใจ
 
คำ
คำคำ
คำ
 
การเขียนที่บรรลุวัตถุประสงค์
การเขียนที่บรรลุวัตถุประสงค์การเขียนที่บรรลุวัตถุประสงค์
การเขียนที่บรรลุวัตถุประสงค์
 
ธรรมชาติของภาษา
ธรรมชาติของภาษาธรรมชาติของภาษา
ธรรมชาติของภาษา
 

Similar to Multi Criteria Selection of All-Star Pitching Staff

Identifying Key Factors in Winning MLB Games Using a Data-Mining Approach
Identifying Key Factors in Winning MLB Games Using a Data-Mining ApproachIdentifying Key Factors in Winning MLB Games Using a Data-Mining Approach
Identifying Key Factors in Winning MLB Games Using a Data-Mining ApproachJoelDabady
 
Predicting Salary for MLB Players
Predicting Salary for MLB PlayersPredicting Salary for MLB Players
Predicting Salary for MLB PlayersRobert-Ian Greene
 
WageDiscriminationAmongstNFLAthletes
WageDiscriminationAmongstNFLAthletesWageDiscriminationAmongstNFLAthletes
WageDiscriminationAmongstNFLAthletesGeorge Ulloa
 
1. After watching the attached video by Dan Pink on .docx
1. After watching the attached video by Dan Pink on .docx1. After watching the attached video by Dan Pink on .docx
1. After watching the attached video by Dan Pink on .docxjeremylockett77
 
Data Visualization and Clustering of Players in Major League Baseball
Data Visualization and Clustering of Players in Major League BaseballData Visualization and Clustering of Players in Major League Baseball
Data Visualization and Clustering of Players in Major League BaseballKaushik Nuvvula
 
atlblaze-shot-quality-cassis-presentation-pptx-3
atlblaze-shot-quality-cassis-presentation-pptx-3atlblaze-shot-quality-cassis-presentation-pptx-3
atlblaze-shot-quality-cassis-presentation-pptx-3Tyler Schanzenbach
 
Clustering of Players in Major League Baseball
Clustering of Players in Major League Baseball Clustering of Players in Major League Baseball
Clustering of Players in Major League Baseball Srinivas Osuri
 
Measuring Team Chemistry in MLB
Measuring Team Chemistry in MLBMeasuring Team Chemistry in MLB
Measuring Team Chemistry in MLBDavid Kelly
 
SSRN-id2816685
SSRN-id2816685SSRN-id2816685
SSRN-id2816685Dean Dagan
 
Ranking College Football
Ranking College FootballRanking College Football
Ranking College FootballWinston DeLoney
 
NBA Shorter Game and Competitive Balance
NBA Shorter Game and Competitive BalanceNBA Shorter Game and Competitive Balance
NBA Shorter Game and Competitive BalanceDavid Schneider
 
NBA Shorter Game and Competitive Balance
NBA Shorter Game and Competitive BalanceNBA Shorter Game and Competitive Balance
NBA Shorter Game and Competitive BalanceCaleb Engelbourg
 
The Contract Year Effect in the NBA
The Contract Year Effect in the NBAThe Contract Year Effect in the NBA
The Contract Year Effect in the NBAJoshua Kaplan
 
Senior Project Research Paper
Senior Project Research PaperSenior Project Research Paper
Senior Project Research Papercrissy498
 

Similar to Multi Criteria Selection of All-Star Pitching Staff (20)

Identifying Key Factors in Winning MLB Games Using a Data-Mining Approach
Identifying Key Factors in Winning MLB Games Using a Data-Mining ApproachIdentifying Key Factors in Winning MLB Games Using a Data-Mining Approach
Identifying Key Factors in Winning MLB Games Using a Data-Mining Approach
 
Predicting Salary for MLB Players
Predicting Salary for MLB PlayersPredicting Salary for MLB Players
Predicting Salary for MLB Players
 
Final Thesis
Final ThesisFinal Thesis
Final Thesis
 
Directed Research MRP
Directed Research MRPDirected Research MRP
Directed Research MRP
 
WageDiscriminationAmongstNFLAthletes
WageDiscriminationAmongstNFLAthletesWageDiscriminationAmongstNFLAthletes
WageDiscriminationAmongstNFLAthletes
 
1. After watching the attached video by Dan Pink on .docx
1. After watching the attached video by Dan Pink on .docx1. After watching the attached video by Dan Pink on .docx
1. After watching the attached video by Dan Pink on .docx
 
Data Visualization and Clustering of Players in Major League Baseball
Data Visualization and Clustering of Players in Major League BaseballData Visualization and Clustering of Players in Major League Baseball
Data Visualization and Clustering of Players in Major League Baseball
 
atlblaze-shot-quality-cassis-presentation-pptx-3
atlblaze-shot-quality-cassis-presentation-pptx-3atlblaze-shot-quality-cassis-presentation-pptx-3
atlblaze-shot-quality-cassis-presentation-pptx-3
 
Clustering of Players in Major League Baseball
Clustering of Players in Major League Baseball Clustering of Players in Major League Baseball
Clustering of Players in Major League Baseball
 
Research Paper
Research PaperResearch Paper
Research Paper
 
Measuring Team Chemistry in MLB
Measuring Team Chemistry in MLBMeasuring Team Chemistry in MLB
Measuring Team Chemistry in MLB
 
Cricket predictor
Cricket predictorCricket predictor
Cricket predictor
 
SSRN-id2816685
SSRN-id2816685SSRN-id2816685
SSRN-id2816685
 
Ranking College Football
Ranking College FootballRanking College Football
Ranking College Football
 
NBA Shorter Game and Competitive Balance
NBA Shorter Game and Competitive BalanceNBA Shorter Game and Competitive Balance
NBA Shorter Game and Competitive Balance
 
NBA Shorter Game and Competitive Balance
NBA Shorter Game and Competitive BalanceNBA Shorter Game and Competitive Balance
NBA Shorter Game and Competitive Balance
 
The Contract Year Effect in the NBA
The Contract Year Effect in the NBAThe Contract Year Effect in the NBA
The Contract Year Effect in the NBA
 
I am omnipresent
I am omnipresentI am omnipresent
I am omnipresent
 
LAX IMPACT! White Paper
LAX IMPACT! White PaperLAX IMPACT! White Paper
LAX IMPACT! White Paper
 
Senior Project Research Paper
Senior Project Research PaperSenior Project Research Paper
Senior Project Research Paper
 

Multi Criteria Selection of All-Star Pitching Staff

  • 1. Proceedings of the 2010 Industrial Engineering Research Conference A. Johnson and J. Miller, eds. Multi-criteria Selection of All-Star Pitching Staff for Fantasy Baseball Austin Lambert, Mark McGinley, Chaitanya Chandan, David Claudio, Lourdes Medina The Pennsylvania State University Department of Industrial Engineering State College, Pennsylvania 16801 USA Abstract This paper presents the problem of a fantasy baseball team managed by three decision makers with different methods of conducting performance reviews. A multiple-criteria, multiple-decision maker optimization approach was used in order to select the best pitching staff for the team. The criteria considered was based on the 2008 MLB pitching statistics, including the ERA, OPP OBP, WHIP, FLD PTC, and K/BB ratio. The problem was formulated as a linear/integer programming model with a weighted objective function. The results include a list of the pitching selection and additional analysis conducted to evaluate the sensibility of the results to different weights scenarios. Keywords Linear programming, Integer programming Fantasy Baseball, Multi-criteria selection, Multiple-decision maker 1. Introduction A frequently noted axiom in baseball is that a team is only as good as that day’s pitcher. Historically, in order to succeed, a well-rounded pitching staff is needed. As a result, the offices in baseball strive have as their first priority to lock up a quality pitching staff. In this paper, the data from the 2008 Major League Baseball (MLB) season is used to create the optimal pitching staff for a fantasy baseball team. The problem presented is of a fantasy baseball team that is managed by three decision makers, where each has different methods of conducting performance reviews. The intent is to maximize the preferences when selecting the best twelve man staff, while limiting the field of potential candidates by imposing restrictions on permissible variables. 
These variables were determined by looking at the types of decisions that go into making an effective pitching staff for any team in the MLB. After research on the different variables and statistics that go into a pitching staff, it was determined that a set of detailed constraints would allow performing a linear/integer programming (LP) operation on this particular situation. Those variables include metrics such as the Earned Run Average (ERA), Oppositions On-Base Percentage (OPP OBP), Walks plus Hits divided by Innings Pitched (WHIP), Fielding Percentage (FLD PCT), and Strike to Ball ratio (K/BB ratio). These variables tend to be the most important statistics that are considered by managers and coaches of a baseball team. The problem was formulated as a LP model with a weighted objective function. With the necessary funds, the solution to this LP problem will allow a fantasy team to build the most effective and efficient pitching staff available, and thus have the greatest chance of success during a specific season. 2. Literature Review LP has been used in numerous fields ranging from manufacturing to grocery shopping. The sports industry is no exception although it has not been as extensively used as in other industries. Within sports, LP has been frequently used for scheduling the seasonal games. For example, Hamiez and Hao [1] used algebra and LP while attempting to solve the Sports League Scheduling Problem (SLSP). Although the specific details of the procedure are not necessarily vital to this research, this shows how various methods of LP can lead to very different results. Michael Trick [2] also wrote an interesting article on his method of scheduling MLB games and ACC basketball games. He used various optimization methods (including LP) in order to successfully schedule games without conflict [2]. Trick’s work helped in this research to gather more understanding on how to collect specific data for a problem and the steps necessary to complete the problem.
  • 2. Lambert, McGinley, Chandan, Medina, and Claudio Some people have proposed the use of optimization techniques for other sports related problems. For example, Zappe et al. [3] constructed a model to determine the consistency in performance of baseball players. He used different weights throughout the criteria when ranking different players in different positions. Although there could be some controversy about the assigned weights, Zappe et al. states that the LP solution and the assigned weights “makes sense” and thus proves no problem to the given solution [3]. Adler et al. [4] submitted an article on the use of LP to find out exactly how a baseball team can make the playoffs. Although there is currently an accepted method on the prediction of playoff spots in baseball, Adler et al. points out that this method is flawed as it does not take into account additional games that must be played [4]. Similarly, a group of industrial engineers at Berkley have set up a website (Baseball Playoff Races) that uses a sophisticated LP model to predict and estimate playoff spots several games in advance compared to the method that is currently used [5]. Although baseball involves the constant training of all the players on the team, there is another side of the game that many fans do not see, the managerial aspect of the game. Every day of the year managers and supervisors of MLB team make important decisions regarding the offense, defense, pitching, and various other aspects of the game. Each of these decisions can either make or break a team and can be in direct relation to the win loss ratio of a given team. Lewis et al. [6] wrote an article regarding this exact aspect and how LP can be used to evaluate a team’s managerial staff and the decisions they make. By evaluating offensive, defensive, and post season statistics, Lewis et al. were able to use the fundamentals of LP to rank MLB’s managerial staffs in order of effectiveness over a period of time [6]. 
This is one of the goals of having a fantasy baseball team. Sports fans get to experience the managerial decisions that can affect the team they build. 3. Problem Description The goal of using LP in fantasy baseball is to create a pitcher roster that is optimized based on several ranked pitching statistics. In this particular model, twelve pitchers were to be selected where five were to be starters, five to be relievers, and two to be closers. In baseball, a pitcher’s skill is determined by calculating multiple statistics that range from strikeouts to wild pitches. These statistics were used as the basis of the LP model. Although there are over forty official pitching statistics that could be used, only five were selected to maintain simplicity and usability of the program. The following are the pitching statistics selected: • ERA-Earned Run Average: total number of earned runs multiplied by 9 then divided by innings pitched* • WHIP-Walks and Hits per Inning Pitched: The average number of walks and hits allowed by the pitcher per inning* • OPP OBP- Oppositions On Base Percentage: times reached base divided by at bats plus walks plus hit by pitch plus sacrifice flies* • FLD PCT-Fielding Percentage: Total plays divided by the number of total chances • K/BB- Strikeout-to-Walk Ratio: number of strikeouts divided by number of base on balls Note that attributes with a “*” represent a statistic where a lower value is preferred. The data used for the problem was obtained from reference [7] and [8].Table 1 contains a sample of pitchers from the overall pool along with their respective statistics. All this data was taken from the 2008 Major League Baseball season. Since ERA, OPP OBP, and WHIP are all measured with a smaller value being desired it was necessary to linearize and normalize the data so that each statistic could easily be compared to one another. 
The first step was to scale all the data: for ERA, WHIP, and OPP OBP, the smallest value in each pitching group was divided by each individual player’s respective statistic. Similarly, each player’s FLD PCT and K/BB were divided by the largest FLD PCT and K/BB in each pitching group.
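The scaling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the minimum and maximum are taken over three starter rows from Table 1 only, whereas the paper scales against each full pitching group.

```python
def scale_lower_is_better(values):
    """Divide the group's smallest value by each player's value (ERA, WHIP, OPP OBP)."""
    best = min(values.values())
    return {name: best / v for name, v in values.items()}


def scale_higher_is_better(values):
    """Divide each player's value by the group's largest value (FLD PCT, K/BB)."""
    best = max(values.values())
    return {name: v / best for name, v in values.items()}


# Three starter rows from Table 1 (assumption: a sample, not the whole group).
era = {"Tim Lincecum": 2.62, "Cliff Lee": 2.54, "Johan Santana": 3.13}
kbb = {"Tim Lincecum": 3.15, "Cliff Lee": 5.00, "Johan Santana": 3.17}

scaled_era = scale_lower_is_better(era)
scaled_kbb = scale_higher_is_better(kbb)

for name in era:
    print(f"{name}: ERA {scaled_era[name]:.3f}, K/BB {scaled_kbb[name]:.3f}")
```

After scaling, every statistic lies in (0, 1] with 1 marking the best pitcher in the group, so all five criteria can be maximized together.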
Table 1: Sample of pitchers from the overall pool along with their respective statistics

Pitcher              Type      ERA   WHIP   OPP OBP  FLD PCT  K/BB
Tim Lincecum         Starter   2.62  1.15   .297     1.0      3.15
Cliff Lee            Starter   2.54  1.11   .285     .95      5.00
Johan Santana        Starter   3.13  1.21   .296     .951     3.17
Rich Harden          Starter   3.39  1.237  .283     .968     3.10
Bill                 Starter   3.53  1.20   .343     .951     2.11
Ben Sheets           Starter   3.73  1.20   .284     .954     3.36
Jon Lester           Starter   3.41  1.23   .301     .956     3.52
Joe Saunders         Starter   4.22  1.37   .349     1.0      1.78
Ricky Nolasco        Starter   5.06  1.25   .301     1.0      4.43
Paul Maholm          Starter   4.44  1.44   .346     .954     1.98
Joey Devine          Reliever  .59   .8     .225     1.0      3.27
Scott Downs          Reliever  1.78  1.15   .298     .96      2.11
Billy Wagner         Reliever  2.3   .89    .228     1.0      5.2
Jon Lieber           Reliever  4.05  1.39   .317     1.0      1.16
Brian Shouse         Reliever  2.81  1.17   .288     .950     5.79
Matt Thornton        Reliever  2.33  1.0    .258     1.0      4.05
Geoff Geary          Reliever  4.5   1.5    .290     0        3.94
Brad Ziegler         Reliever  1.06  1.16   .311     1.0      3.32
Grant Balfour        Reliever  1.54  .89    .233     1.0      3.7
Chris Perez          Reliever  3.46  1.34   .324     1.0      1.91
Francisco Rodriguez  Closer    2.24  1.29   0.31     0.83     2.26
Joe Nathan           Closer    1.33  0.90   0.24     0.90     4.11
Jonathan Papelbon    Closer    2.34  0.95   0.25     0.77     9.63
Brad Lidge           Closer    1.95  1.23   0.30     1.00     2.63
Mariano Rivera       Closer    1.40  0.67   0.19     1.00     12.83
Jose Valverde        Closer    3.38  1.18   0.29     0.90     3.61
Joakim Soria         Closer    1.60  0.86   0.25     1.00     3.47
Bobby Jenks          Closer    2.63  1.10   0.29     1.00     2.24
Brian Wilson         Closer    4.62  1.44   0.34     1.00     2.39
BJ Ryan              Closer    2.95  1.28   0.32     1.00     2.07

Next, each of the five statistics was weighted based on both the pitching group and the fantasy baseball owner’s preference, as shown in Tables 2 through 4. Finally, each scaled value was multiplied by the appropriate weight. The resulting scaled and weighted values allowed for the maximization of the objective function.
Table 2: Starters’ Weights

Statistic  Group Ranking  Weight
ERA        1              .333
WHIP       2              .267
OBP        3              .200
Fld Pct    5              .067
K/BB       4              .133

Table 3: Closers’ Weights

Statistic  Group Ranking  Weight
ERA        4              .133
WHIP       1              .333
OBP        2              .267
Fld Pct    5              .067
K/BB       3              .200

Table 4: Relievers’ Weights

Statistic  Group Ranking  Weight
ERA        1              .333
WHIP       4              .133
OBP        2              .267
Fld Pct    5              .067
K/BB       3              .200
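This section does not restate how the weights were derived from the rankings. The values in Tables 2 through 4 are, however, consistent with rank-sum weighting, in which a criterion ranked r out of n receives the weight (n - r + 1) / (1 + 2 + ... + n). The sketch below checks that reading against the starters' rankings; the function name is hypothetical and the rank-sum rule is an inference from the tabulated numbers, not a method stated in the text.

```python
def rank_sum_weights(ranks):
    """Map each criterion's rank (1 = most important) to a rank-sum weight."""
    n = len(ranks)
    total = n * (n + 1) // 2  # 1 + 2 + ... + n
    return {stat: (n - r + 1) / total for stat, r in ranks.items()}


# Group rankings for starters, as listed in Table 2.
starter_ranks = {"ERA": 1, "WHIP": 2, "OBP": 3, "Fld Pct": 5, "K/BB": 4}

weights = rank_sum_weights(starter_ranks)
rounded = {stat: round(w, 3) for stat, w in weights.items()}
print(rounded)
```

Because the weights sum to one, the weighted score of a pitcher whose scaled statistics are all 1 is also 1, which keeps scores comparable across pitching groups.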
4. Problem Formulation

4.1 Objective Function

The objective function is described by

Maximize $Z = \sum_i \sum_j \sum_k w_{ik} x_{ij}$ (1)

where $w_{ik}$ is the weighted value for pitcher i on criterion k, and $x_{ij}$ is a binary variable that indicates whether pitcher i is selected for position j. The range of index i depends on the number of pitchers being considered, while j identifies the three positions: Reliever (j = 1), Starter (j = 2), and Closer (j = 3). The index k represents the criteria, previously defined as the five pitching statistics being considered: ERA (k = 1), OPP OBP (k = 2), K/BB (k = 3), FLD PCT (k = 4), and WHIP (k = 5). Also, note that:

$x_{ij} = \begin{cases} 1 & \text{if pitcher } i \text{ is selected for position } j \\ 0 & \text{otherwise} \end{cases}$ (2)

4.2 Constraints

In addition to letting fantasy owners set their own weights (Tables 2-4), this LP allows further customization within the code’s constraint section. A set of seven constraints was defined for each pitching group, representing the skills being evaluated and deemed necessary for the selection of a pitcher into the roster. The skills are defined as follows:

• Pitching Skill: ERA + OPP OBP
• Balanced Player Skill 1: FLD PCT + WHIP
• Balanced Player Skill 2: WHIP + K/BB
• Ball Handling Skill: K/BB + FLD PCT
• Pitching Effectiveness Skill: WHIP + ERA
• Pitching Consistency Skill: ERA + K/BB
• Defensive Skill: FLD PCT + OPP OBP

Although these seven skills are the program’s defaults, any user can make adjustments by adding or removing skill constraints within the program’s code (see Section 6). Each constraint was designed so that the program searches for the maximum sum of the two statistics that make up each skill. The skill cutoff values (on the right-hand side of each constraint) were estimated based on the minimum skill expected for each pitcher (e.g.
the 1.35 in constraint 3 comes from the calculation 5*0.27, and the 0.54 in constraint 11 comes from 2*0.27). If any pitcher’s skill is below the weighted cutoff value, that pitcher will neither be selected nor analyzed further in the study.

• For Starters and Relievers (j=1,2):

$\sum_i w_{i1} x_{ij} + \sum_i w_{i2} x_{ij} \ge 1.35 \quad \forall j = 1,2$ (Pitching Skill) (3)
$\sum_i w_{i4} x_{ij} + \sum_i w_{i5} x_{ij} \ge 0.75 \quad \forall j = 1,2$ (Balanced 1) (4)
$\sum_i w_{i3} x_{ij} + \sum_i w_{i4} x_{ij} \ge 0.65 \quad \forall j = 1,2$ (Ball Handling) (5)
$\sum_i w_{i5} x_{ij} + \sum_i w_{i3} x_{ij} \ge 1.05 \quad \forall j = 1,2$ (Balanced 2) (6)
$\sum_i w_{i5} x_{ij} + \sum_i w_{i1} x_{ij} \ge 1.15 \quad \forall j = 1,2$ (Effectiveness) (7)
$\sum_i w_{i1} x_{ij} + \sum_i w_{i3} x_{ij} \ge 0.75 \quad \forall j = 1,2$ (Consistency) (8)
$\sum_i w_{i4} x_{ij} + \sum_i w_{i2} x_{ij} \ge 1.25 \quad \forall j = 1,2$ (Defensive Skill) (9)
$\sum_i x_{ij} = 5 \quad \forall j = 1,2$ (5 starters and 5 relievers) (10)

• For Closers (j=3):
$\sum_i w_{i1} x_{i3} + \sum_i w_{i2} x_{i3} \ge 0.54$ (Pitching Skill) (11)
$\sum_i w_{i4} x_{i3} + \sum_i w_{i5} x_{i3} \ge 0.30$ (Balanced 1) (12)
$\sum_i w_{i3} x_{i3} + \sum_i w_{i4} x_{i3} \ge 0.26$ (Ball Handling) (13)
$\sum_i w_{i5} x_{i3} + \sum_i w_{i3} x_{i3} \ge 0.42$ (Balanced 2) (14)
$\sum_i w_{i5} x_{i3} + \sum_i w_{i1} x_{i3} \ge 0.46$ (Effectiveness) (15)
$\sum_i w_{i1} x_{i3} + \sum_i w_{i3} x_{i3} \ge 0.30$ (Consistency) (16)
$\sum_i w_{i4} x_{i3} + \sum_i w_{i2} x_{i3} \ge 0.50$ (Defensive Skill) (17)
$\sum_i x_{i3} = 2$ (Need 2 closers) (18)
$w_{ik}, x_{ij} \ge 0 \quad \forall i, j, k$ (Non-negativity) (19)

5. Summary of Results

After the data was collected and the linear program was solved using GAMS, a pitching staff of twelve members was selected. This staff consisted of five starters, five relievers, and two closers. The specific players chosen are shown in the table below.

Table 5: Players Selected

Starters   Relievers  Closers
Lincecum   Devine     Nathan
Lee        Wagner     Rivera
Nolasco    Shouse
Harden     Ziegler
Sheets     Balfour

6. Analysis

The analysis consisted of changing the weights of the pitching criteria so that the emphasis was placed on different criteria (such as ERA). Note in particular the case of Matt Thornton (reliever), who missed being selected by a value of .625. By analyzing Thornton’s ERA and OPP OBP values (see Table 6), it was possible to calculate the minimum amount by which Thornton would have to improve in order to be selected over Shouse. Tables 7 through 10 show different weight scenarios and their impact on the players selected.
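Each weight scenario described above amounts to re-solving the selection model of Section 4 with a different weight vector. As a rough sketch of that loop, the code below solves only the closer sub-problem, constraints (11) through (18), by brute-force enumeration of all two-closer staffs instead of GAMS. The function names `weighted_values` and `select` are hypothetical, and scaling against the ten closers sampled in Table 1 (rather than the full pool) is an assumption.

```python
from itertools import combinations

STATS = ("ERA", "WHIP", "OPP OBP", "FLD PCT", "K/BB")
LOWER_IS_BETTER = {"ERA", "WHIP", "OPP OBP"}

# Closer rows copied from Table 1, in STATS order (a sample of the pool).
CLOSERS = {
    "Francisco Rodriguez": (2.24, 1.29, 0.31, 0.83, 2.26),
    "Joe Nathan": (1.33, 0.90, 0.24, 0.90, 4.11),
    "Jonathan Papelbon": (2.34, 0.95, 0.25, 0.77, 9.63),
    "Brad Lidge": (1.95, 1.23, 0.30, 1.00, 2.63),
    "Mariano Rivera": (1.40, 0.67, 0.19, 1.00, 12.83),
    "Jose Valverde": (3.38, 1.18, 0.29, 0.90, 3.61),
    "Joakim Soria": (1.60, 0.86, 0.25, 1.00, 3.47),
    "Bobby Jenks": (2.63, 1.10, 0.29, 1.00, 2.24),
    "Brian Wilson": (4.62, 1.44, 0.34, 1.00, 2.39),
    "BJ Ryan": (2.95, 1.28, 0.32, 1.00, 2.07),
}

# Closers' weights from Table 3.
TABLE3_WEIGHTS = {"ERA": 0.133, "WHIP": 0.333, "OPP OBP": 0.267,
                  "FLD PCT": 0.067, "K/BB": 0.200}

# Skill pairs and cutoffs from constraints (11) through (17).
CLOSER_CUTOFFS = {
    ("ERA", "OPP OBP"): 0.54,      # Pitching Skill
    ("FLD PCT", "WHIP"): 0.30,     # Balanced 1
    ("K/BB", "FLD PCT"): 0.26,     # Ball Handling
    ("WHIP", "K/BB"): 0.42,        # Balanced 2
    ("WHIP", "ERA"): 0.46,         # Effectiveness
    ("ERA", "K/BB"): 0.30,         # Consistency
    ("FLD PCT", "OPP OBP"): 0.50,  # Defensive Skill
}


def weighted_values(pool, weights):
    """Return w[name][stat]: the scaled statistic multiplied by its weight."""
    best = {}
    for k, stat in enumerate(STATS):
        column = [row[k] for row in pool.values()]
        best[stat] = min(column) if stat in LOWER_IS_BETTER else max(column)
    w = {}
    for name, row in pool.items():
        w[name] = {}
        for k, stat in enumerate(STATS):
            if stat in LOWER_IS_BETTER:
                scaled = best[stat] / row[k]
            else:
                scaled = row[k] / best[stat]
            w[name][stat] = weights[stat] * scaled
    return w


def select(pool, weights, size, cutoffs):
    """Enumerate every staff of `size` pitchers; keep the feasible one with the largest Z."""
    w = weighted_values(pool, weights)
    best_staff, best_z = None, float("-inf")
    for staff in combinations(sorted(pool), size):
        feasible = all(
            sum(w[p][a] + w[p][b] for p in staff) >= rhs
            for (a, b), rhs in cutoffs.items()
        )
        z = sum(sum(w[p].values()) for p in staff)
        if feasible and z > best_z:
            best_staff, best_z = staff, z
    return best_staff, best_z


closers, z = select(CLOSERS, TABLE3_WEIGHTS, 2, CLOSER_CUTOFFS)
print(closers, round(z, 3))
```

Calling `select` again with another weight dictionary, such as the one in Table 7, mirrors the scenario analysis. Because this sketch scales against the Table 1 sample only and keeps the default cutoffs, its output need not match Tables 8 and 10.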
Table 6: Improvements needed for Matt Thornton to be selected

Player         Original ERA  Improved ERA for Selection  Original OBP  Improved OBP for Selection
Matt Thornton  2.325         2.297                       .258          .2449

Table 7: Changing the weights to have a greater emphasis on ERA

Statistic            Weight
ERA                  .5
WHIP                 .2
OBP                  .15
Fielding Percentage  .05
K/BB                 .1

Table 8: New Players Selected

Starters   Relievers  Closers
Lincecum   Downs      Nathan
Lee        Wagner     Lidge
Santana    Thornton
Harden     Ziegler
Sheets     Balfour

Table 9: Changing the weights for an all-around balanced pitcher

Statistic            Weight
ERA                  .4
WHIP                 .2
OBP                  .2
Fielding Percentage  .05
K/BB                 .15

Table 10: New Players Selected

Starters   Relievers  Closers
Lincecum   Shouse     Nathan
Lee        Wagner     Lidge
Harden     Thornton
Sheets     Ziegler
Nolasco    Balfour

7. Conclusions

This paper shows the formulation of a fantasy baseball selection problem as an LP and its solution through the use of GAMS. An interesting result was that several players who were expected to be good enough to make the pitching staff were in fact left out. This includes Santana (starter), who initially missed the cut by just several points. However, the analysis demonstrated that simply changing the importance (in the form of weights) of the statistics (ERA, OBP, etc.) leads to a quite different selection of players. This could be very useful to real baseball organizations, as they could slightly alter parameters and constraints in the LP in order to develop the pitching staff of their choice. The formulation could also be used to help MLB managers decide which minor league pitchers to bring up into the major leagues. Finally, by reverse engineering the model, a baseball agent could inform players of the improvements they would need to achieve in order to stay competitive or be selected by a particular team, as illustrated by the example in the analysis. However, before the model can be used as a solution for MLB managers, additional cost constraints will have to be added to reflect how much the team is willing to spend on each player. Further research will focus on making this transition.

References

1. Hamiez, J.P., and Hao, J.K., 2004, "A Linear-Time Algorithm to Solve the Sports League Scheduling Problem," Discrete Applied Mathematics, 143(1-3), 252-265.
2. Trick, M., 2005, "Adventures in Sports Scheduling," Carnegie Mellon’s Graduate School of Industrial Administration. http://www.cs.cmu.edu/~ACO/dimacs/trick.html
3. Zappe, C., Webster, W., and Horowitz, I., 1993, "Using Linear/Integer Programming to Determine Post-Facto Consistency in Performance Evaluations," Interfaces, 23(6), 107-113.
4. Adler, I., Erera, A.L., Hochbaum, D.S., and Olinick, E.V., 2002, "Baseball, Optimization, and the World Wide Web," Interfaces, 32(2), 12-22.
5. University of California, Berkeley, "Baseball Playoff Races," RIOT Baseball Project. http://riot.ieor.berkeley.edu/~baseball/
6. Lewis, H.F., Lock, K.A., and Sexton, T.R., 2009, "Organizational Capability, Efficiency, and Effectiveness in Major League Baseball: 1901-2002," European Journal of Operational Research, 197(2), 731-740.
7. Baseball statistics and history, 2008. Retrieved from http://www.baseball-reference.com.
8. Fantasy baseball, 2008. Retrieved from http://games.espn.go.com/frontpage/baseball.