This presentation looks at one of the many real-life uses of statistics, sports, and at what some basic and advanced techniques can tell us when we use statistics to decide a not-so-straightforward question. Using SPSS as our backbone, along with previously published statistical analyses, we will look at how z-scores, standard deviations, case selection, and logistic regression can give us a solid foundation for answering a hazy question.
Memorable or just plain Amazing:
A Statistical Look at the 2010 MLB Season otherwise known as
“The Year of the Pitcher”
By: KC Burgos
Math 449
Nov. 1, 2011
What Exactly Are We Looking To Answer?
• 2010 MLB Season has been classified as the
“Year of the Pitcher”
• Seven No-Hitters (Six official)
• What exactly defines a great pitching year?
List of No-Hitters in 2010
• April 17th, Ubaldo Jiménez vs. Atlanta Braves (Final Score 4-0)
• May 9th, Dallas Braden vs. Tampa Bay Rays (Final Score 4-0)
• May 29th, Roy Halladay vs. Florida Marlins (Final Score 1-0)
• June 2nd, Armando Galarraga* vs. Cleveland Indians (Final Score 3-0)
• June 25th, Edwin Jackson vs. Tampa Bay Rays (Final Score 1-0)
• July 26th, Matt Garza vs. Detroit Tigers (Final Score 5-0)
• October 6th, Roy Halladay vs. Cincinnati Reds (Final Score 4-0)
Armando Galarraga’s No-Hitter
• Armando Galarraga’s no-hitter is officially
recognized by MLB as a one-hitter.
• The 27th out
Defining Parameters for our Question
• Many ways to answer what makes a season the
“Year of the Pitcher”
• Comparing Seasons and Stats
• No-Hitter Probabilities
Comparisons: Variables
• IP – Innings Pitched
• R – Runs
• ER – Earned Runs
• H – Hits
• BB – Bases on Balls (Walks)
• ERA – Earned Run Average
• WHIP – Walks and Hits per Inning Pitched
Computation Variables
• IP – recorded in thirds of an inning, so the digit after the point counts outs: 123.1 = 123 1/3, 453.2 = 453 2/3
• ERA – (9*ER)/IP
• WHIP – (H + BB)/IP
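The IP notation and the two formulas above can be sketched in Python. This is only an illustration, not the SPSS workflow used in the presentation; the function names `parse_ip`, `era` and `whip` are hypothetical, and the pitching line in the usage lines is made up.

```python
from fractions import Fraction


def parse_ip(ip: str) -> Fraction:
    """Convert box-score innings notation to exact innings.

    The digit after the point counts outs (0, 1 or 2), not tenths,
    so "123.1" means 123 1/3 innings and "453.2" means 453 2/3.
    """
    whole, _, outs = ip.partition(".")
    n_outs = int(outs) if outs else 0
    if n_outs not in (0, 1, 2):
        raise ValueError(f"invalid outs digit in {ip!r}")
    return int(whole) + Fraction(n_outs, 3)


def era(er: int, ip: str) -> float:
    """Earned run average: (9 * ER) / IP."""
    return float(9 * er / parse_ip(ip))


def whip(h: int, bb: int, ip: str) -> float:
    """Walks plus hits per inning pitched: (H + BB) / IP."""
    return float((h + bb) / parse_ip(ip))


# Hypothetical pitching line, made up for illustration only.
print(parse_ip("123.1"))       # 370/3
print(era(50, "150.0"))        # 3.0
print(whip(120, 30, "150.0"))  # 1.0
```

Using `Fraction` keeps 123 1/3 exact instead of treating "123.1" as the decimal 123.1, which would slightly distort ERA and WHIP.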
Comparisons: The Data
• 2010 Season
• 30 Teams
• Specifically Looking at Pitching
• Other Possible Variables We Could Have Used
Comparisons: Descriptives
[Descriptive statistics tables for MLB, NL, and AL]
Comparisons: Yearly Data
• Compare it to Data from 1969-2009
• Why 1969?
Comparisons: Yearly Data
• Special Years: 1969, 1977, 1981, 1993, 1994, 1995, 1998
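Comparisons: Normality
If our values follow a normal curve (or can be transformed to follow one), then we can use the already-derived percentages for how much of the data falls within each standard deviation of the mean: about 68% within one, 95% within two, and 99.7% within three.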
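The "already derived percentages" come from the normal curve itself. A minimal Python sketch (the function name `within_k_sd` is hypothetical) recovers them from the error function in the standard library:

```python
import math


def within_k_sd(k: float) -> float:
    """P(|Z| < k) for a normal variable: the share of values within k SDs of the mean.

    For a standard normal Z this equals erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```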
Comparisons: Normality and Homoscedasticity
[Figures: normality and homoscedasticity checks for MLB ERA, MLB WHIP, AL ERA, AL WHIP, NL ERA, and NL WHIP, 1969-2009]
Comparisons: Fisher’s Method
Fisher’s method tells us that if the
skewness value exceeds about twice its standard error
in either direction, then the curve is
severely skewed. A distribution is
considered normal if it meets
$$-1.96 < \frac{S}{SE_S} < 1.96$$
where S is the skewness and $SE_S$ is its standard error.
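The skewness check can be sketched in Python. This assumes the bias-adjusted skewness formula and the standard error that, as far as we know, SPSS reports in its descriptives; the function names are hypothetical and the sample lists are toy data, not the presentation's.

```python
import math
from statistics import mean, stdev


def skewness(xs):
    """Bias-adjusted sample skewness (the G1 statistic)."""
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


def se_skewness(n):
    """Standard error of the skewness for a sample of size n (n > 2)."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def fisher_skew_ok(xs, crit=1.96):
    """Return (S / SE_S, passes), where passes means -crit < S / SE_S < crit."""
    ratio = skewness(xs) / se_skewness(len(xs))
    return ratio, -crit < ratio < crit


print(fisher_skew_ok([1, 2, 3, 4, 5]))   # symmetric: ratio about 0, passes
print(fisher_skew_ok([1, 1, 1, 1, 10]))  # long right tail: ratio about 2.45, fails
```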
Comparisons: Fisher’s Method
It should also be noted that the same test applies
to kurtosis. However, I focus here more
on the skewness of the data, having already
taken kurtosis into account.
$$-1.96 < \frac{K}{SE_K} < 1.96$$
where K is the kurtosis and $SE_K$ is its standard error.
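The kurtosis version of the test can be sketched the same way, assuming the bias-adjusted excess kurtosis and its standard error as (to our understanding) reported by SPSS. Names are hypothetical and the data are a toy example.

```python
import math
from statistics import mean, stdev


def kurtosis(xs):
    """Bias-adjusted sample excess kurtosis (G2); 0 for a normal distribution."""
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    fourth = sum(((x - m) / s) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def se_kurtosis(n):
    """Standard error of the kurtosis for a sample of size n (n > 3)."""
    se_s = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return 2 * se_s * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def fisher_kurt_ok(xs, crit=1.96):
    """Return (K / SE_K, passes), where passes means -crit < K / SE_K < crit."""
    ratio = kurtosis(xs) / se_kurtosis(len(xs))
    return ratio, -crit < ratio < crit


print(kurtosis([1, 2, 3, 4, 5]))  # about -1.2 (flatter than normal)
print(se_kurtosis(5))             # about 2.0
```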
Comparisons: Yearly Data Descriptives
Comparisons: Doing the Math
With normality assumed, nearly all (about 99.7%)
yearly ERA and WHIP values should fall within
3 standard deviations of the mean.
Comparisons: Doing the Math
Lower 3 Standard Deviations (1969-2009 mean, then the values 1, 2, and 3 standard deviations below it)
             1969-2009 Mean   (1)       (2)       (3)
MLB ERA      4.0522           3.67005   3.2879    2.90575
MLB WHIP     1.3673           1.32313   1.27896   1.23479
AL ERA       4.31734          3.7317    3.29      2.8483
AL WHIP      1.3836           1.32916   1.27472   1.22028
NL ERA       3.91             3.54707   3.18414   2.82121
NL WHIP      1.35             1.30743   1.26486   1.22229
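The columns (1) to (3) are the mean minus one, two, and three standard deviations. The slide does not print the standard deviations, so this Python sketch assumes they can be back-derived as the mean minus column (1); the function name `lower_bounds` is hypothetical.

```python
def lower_bounds(mean, sd, ks=(1, 2, 3)):
    """Values k standard deviations below the mean, rounded to 5 places."""
    return [round(mean - k * sd, 5) for k in ks]


# SDs back-derived from the table as (mean) - (column 1),
# e.g. MLB ERA: 4.0522 - 3.67005 = 0.38215.
print(lower_bounds(4.0522, 4.0522 - 3.67005))  # matches the MLB ERA row
print(lower_bounds(1.3673, 1.3673 - 1.32313))  # matches the MLB WHIP row
```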
Comparisons: Doing the Math
z-Scores
A z-score gives the exact distance of a
number X from the mean, measured in standard
deviations.
$$z = \frac{X - \mu}{\sigma}$$
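A minimal Python version of the formula, checked against one value from the talk. The function name is hypothetical, and sigma is an assumption: it is back-derived from the table as the mean minus column (1), since the slide does not print it.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """How many standard deviations x lies from the mean mu: (x - mu) / sigma."""
    return (x - mu) / sigma


# 2010 MLB WHIP vs. the 1969-2009 MLB mean; sigma is back-derived from the
# table as 1.3673 - 1.32313 = 0.04417 (an inference, not a printed value).
z = z_score(1.3473, 1.3673, 1.3673 - 1.32313)
print(round(z, 6))  # -0.452796
```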
Comparisons: Doing the Math
2010 MLB ERA = 4.0743
2010 MLB WHIP = 1.3473
2010 AL ERA = 4.1384
2010 AL WHIP = 1.3463
2010 NL ERA = 4.0183
2010 NL WHIP = 1.3482
Z-Scores
2010 MLB ERA: z = .05783
2010 MLB WHIP: z = -.452796
2010 AL ERA: z = -.4051166
2010 AL WHIP: z = -.685158
2010 NL ERA: z = .298405
2010 NL WHIP: z = -.0422833
Comparisons: What Does It Tell Us?
Since every 2010 value lies within one standard
deviation of its 1969-2009 mean (all |z| < 1), we can say that 2010
was a relatively normal pitching year compared with
post-1969 pitching years, in terms of ERA
and WHIP.
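The one-standard-deviation argument can be checked mechanically. This sketch copies the six z-scores from the slide into a dictionary (the variable and function names are hypothetical) and reports how many whole standard deviations each lies from the 1969-2009 mean.

```python
import math

# z-scores of the 2010 season against 1969-2009, copied from the slide.
z_2010 = {
    "MLB ERA": 0.05783, "MLB WHIP": -0.452796,
    "AL ERA": -0.4051166, "AL WHIP": -0.685158,
    "NL ERA": 0.298405, "NL WHIP": -0.0422833,
}


def sd_band(z: float) -> int:
    """Smallest whole number k of standard deviations with |z| <= k."""
    return math.ceil(abs(z))


for name, z in z_2010.items():
    print(f"{name:8s} z = {z:+.4f}  within {sd_band(z)} SD")

print(all(abs(z) < 1 for z in z_2010.values()))  # True
```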
Comparisons: Defining a Smaller Interval
When looking at a scatter plot
of MLB ERA between 1969
and 2009, we can see an unusual
increase. So let’s apply a linear
fit line (with its R² value).
Although helpful, a linear fit
line doesn’t give us a good
description of what’s truly going
on. So instead let’s use a
LOESS fit line.
LOESS (Locally Weighted
Scatterplot Smoothing) fits a
weighted regression to small subsets
(neighborhoods) of our values and joins
these local fits into a smooth curve.
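SPSS draws its LOESS line internally. The Python function below is only a simplified stand-in (one pass of tricube-weighted local linear regression, without Cleveland's robustness iterations), so it should not be expected to reproduce the SPSS curve exactly. The name `loess` and the span parameter `frac` are hypothetical choices for this sketch.

```python
import math


def loess(xs, ys, frac=0.5):
    """Simplified LOESS: one pass of tricube-weighted local linear regression.

    For each x0, the r nearest points (r = ceil(frac * n), at least 3) set a
    window half-width h; points get weight (1 - (|x - x0| / h)^3)^3 inside the
    window and 0 at or beyond h, and a weighted least-squares line is
    evaluated at x0.
    """
    n = len(xs)
    r = min(n, max(3, math.ceil(frac * n)))
    fitted = []
    for x0 in xs:
        h = sorted(abs(x - x0) for x in xs)[r - 1] or 1e-12
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
        ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted
```

Called as `loess(years, era_values, frac=0.3)` (with hypothetical `years` and `era_values` lists), a smaller `frac` follows local bends such as the early-1990s turn more closely, while a larger one approaches the straight linear fit.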
Comparisons: Defining a Smaller Interval
We can now see an interesting
curve in our later values.
We want to start a subinterval
at that point: the last year
after which ERA went down
the following year before its
dramatic increases.
That year turns out to be 1991.
Comparisons: 1991-2009 Data Descriptives
As before, we need to check skewness and homoscedasticity. However,
thanks to Fisher’s method, we can check skewness directly from the
descriptives. As for homoscedasticity, all of the variables passed.
The descriptives show skewness problems for two variables.
Comparisons: 1991-2009 Data Descriptives
Although there are methods for correcting skewness to recover a
normal distribution, we can instead note that one statistic, WHIP,
is regarded as normal in every league (MLB, AL, and NL).
Therefore we will focus on those three WHIP series across the board.
Comparisons: Doing the Math 2
As before, we will compute the values one, two, and three standard deviations
below the mean and compare them with the 2010 numbers.
             1991-2009 Mean   (1)       (2)       (3)
MLB WHIP     1.3996           1.36352   1.32744   1.29136
AL WHIP      1.4214           1.37659   1.33178   1.28697
NL WHIP      1.3783           1.3406    1.3029    1.2652
Comparisons: Doing the Math 2
2010 MLB WHIP = 1.3473
2010 AL WHIP = 1.3463
2010 NL WHIP = 1.3482
z-Scores
2010 MLB WHIP: z = -1.4496
2010 AL WHIP: z = -1.67597
2010 NL WHIP: z = -.798408
Comparisons: So what does this tell us???
Even when looking at a recent interval (the 19 seasons
from 1991 to 2009), 2010 pitching, in terms of WHIP,
was within two standard deviations of the MLB, AL,
and NL averages over those 19 years (2010 MLB WHIP:
z = -1.4496, 2010 AL WHIP: z = -1.67597, 2010 NL WHIP:
z = -.798408).
Therefore 2010 was above average, but not so far off that
it could be considered extremely rare or an outlier. So,
once again, not exactly the “Year of the Pitcher”.
Is There An Easier Way???
Comparisons Cons:
• Too many numbers
• Sample Sizes
• Normality
Logistic Regression
Logistic Regression – When the question does not have a
continuous answer (e.g., yes-or-no questions), logistic
regression gives an equation for the odds of the outcome
based on the factors you supply.
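$$P(\text{event}) = \frac{1}{1 + e^{-(a + b_1X_1 + b_2X_2 + \dots + b_iX_i)}}$$
$$\text{Odds} = \frac{P(\text{event})}{1 - P(\text{event})}$$
$$\ln(\text{odds}) = a + b_1X_1 + b_2X_2 + \dots + b_iX_i$$
a = intercept (constant)
b = regression coefficients
X = factors (predictor variables)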
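The three equations describe one model seen three ways. This Python sketch (function names and numbers are hypothetical) computes P(event) from the linear part, converts it to odds, and shows that ln(odds) returns the linear part.

```python
import math


def p_event(a, bs, xs):
    """P(event) = 1 / (1 + e^-(a + b1*X1 + ... + bi*Xi))."""
    eta = a + sum(b * x for b, x in zip(bs, xs))
    return 1.0 / (1.0 + math.exp(-eta))


def odds(p):
    """Odds of the event: P / (1 - P)."""
    return p / (1.0 - p)


# Hypothetical coefficients and factor values, for illustration only.
p = p_event(-1.0, [0.5, 2.0], [1.0, 1.0])  # linear part = -1 + 0.5 + 2 = 1.5
print(round(p, 4))                  # 0.8176
print(round(math.log(odds(p)), 6))  # 1.5, ln(odds) recovers the linear part
```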
Logistic Regression: Assumptions
• DOES NOT ASSUME Linearity
• DOES NOT ASSUME Normal Distribution
• DOES NOT ASSUME Homoscedasticity
• DOES NOT ASSUME Normality of Residuals
• Does Assume Sample Representativeness
• Does Assume Appropriate Levels of Measurement (a dichotomous dependent variable)
• Does Assume No Multicollinearity (indicated if VIF > 10)
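The VIF for predictor j is 1 / (1 − R_j²), where R_j² comes from regressing that predictor on the others. With exactly two predictors, R_j² reduces to the squared Pearson correlation between them. This Python sketch assumes that two-predictor case; the function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF when a model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictor columns, for illustration only.
print(vif_two_predictors([1, -1, 1, -1], [1, 1, -1, -1]))  # 1.0: uncorrelated
print(vif_two_predictors([1, 2, 3, 4], [1, 2, 3, 5]) > 10)  # True: collinear
```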
Logistic Regression: Behind the Scenes
Suppose we want to predict a variable Y from X, where our data are
(x1, y1), …, (xn, yn).
We can look at two possibilities.
If Y has a normal distribution, with mean μ and variance σ².
If Y has only two possible values, 0 and 1.
Linear Regression
If Y has a normal distribution, with mean μ and variance σ².
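Then it has the probability density function (pdf)
$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}$$
Suppose $\mu = a + bx$.
The likelihood function is
$$L(a,b) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y_1-(a+bx_1))^2}{2\sigma^2}} \cdots \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y_n-(a+bx_n))^2}{2\sigma^2}} = \frac{1}{(2\pi)^{n/2}\,\sigma^{n}}\, e^{-\frac{\sum_{i=1}^{n}(y_i-(a+bx_i))^2}{2\sigma^2}}$$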
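Maximizing this likelihood function is the same as minimizing Q, since Q appears with a negative sign in the exponent:
$$Q = \sum_{i=1}^{n}\bigl(y_i - (a + bx_i)\bigr)^2$$
This can be done by setting the derivatives with respect to a and with respect to b equal to zero.
$$\frac{\partial Q}{\partial b} = \sum_{i=1}^{n} 2\bigl[y_i - (a + bx_i)\bigr](-x_i)$$
$$\frac{\partial Q}{\partial a} = \sum_{i=1}^{n} 2\bigl[y_i - (a + bx_i)\bigr](-1)$$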
Solving these equations leads to the solutions
$$b = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}$$
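The closed-form solution can be checked numerically with a short Python sketch; the function name `ols` and the sample points are hypothetical.

```python
def ols(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


print(ols([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0): the points lie on y = 1 + 2x
```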
Logistic Regression
If Y has only two possible values, 0 and 1.
Its probability mass function is
$$p^{y}(1-p)^{1-y}, \quad y = 0, 1$$
where p is the probability that y = 1
and 1 − p is the probability that y = 0.
Suppose $p = a + bx$.
This doesn’t work!
It allows the possibility that p < 0 or p > 1.
Let’s try odds!
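$$\text{odds} = \frac{p}{1-p} = a + bx$$
However, this does not work either, since it allows the possibility that
$$\frac{p}{1-p} < 0.$$
Let’s try this instead:
$$\log\left(\frac{p}{1-p}\right) = a + bx$$
This works, since $\log\left(\frac{p}{1-p}\right)$ can be any real number as p ranges over (0, 1):
$$\frac{p}{1-p} = e^{a+bx} \;\Rightarrow\; p = \frac{e^{a+bx}}{1+e^{a+bx}} = \frac{1}{1+e^{-(a+bx)}} \;\Rightarrow\; 1 - p = \frac{1}{1+e^{a+bx}}$$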
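The likelihood function is
$$L(a,b) = \prod_{i=1}^{n}\left(\frac{e^{a+bx_i}}{1+e^{a+bx_i}}\right)^{y_i}\left(\frac{1}{1+e^{a+bx_i}}\right)^{1-y_i}$$
where, for each observation, the first factor is p and the second is 1 − p.
Finding the derivatives with respect to a and b and setting them equal to zero is
very difficult: the resulting equations have no closed-form solution. Typically this
function is maximized by using iterative procedures from numerical analysis.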
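SPSS performs this maximization internally. As an illustration only, the sketch below uses Newton-Raphson, one standard iterative procedure, for a single predictor x; it is not necessarily the exact algorithm SPSS uses. The function name and the toy data are hypothetical. For the toy data the answer is known (p = 1/2 when x = 0 and p = 2/3 when x = 1, so a = 0 and b = ln 2), which makes it easy to check.

```python
import math


def fit_logistic(xs, ys, iters=50, tol=1e-10):
    """Maximize the logistic log-likelihood in (a, b) by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iters):
        ps = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]
        # Gradient of the log-likelihood with respect to a and b.
        ga = sum(y - p for y, p in zip(ys, ps))
        gb = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Information matrix (negative Hessian) entries.
        w = [p * (1 - p) for p in ps]
        haa = sum(w)
        hab = sum(wi * x for wi, x in zip(w, xs))
        hbb = sum(wi * x * x for wi, x in zip(w, xs))
        det = haa * hbb - hab * hab
        # Newton step: solve [[haa, hab], [hab, hbb]] @ [da, db] = [ga, gb].
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


print(fit_logistic([0, 0, 1, 1, 1], [0, 1, 0, 1, 1]))  # about (0.0, 0.6931)
```

If the two outcome groups are perfectly separated by x, the likelihood has no finite maximum and the estimates grow without bound; real software warns about this case.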
Logistic Regression: Our Use
• What made 2010 noticeable to begin with?
• NO HITTERS!!!!
• Step 1, Find our Variables
Logistic Regression: Variables and Coding
• Games Started (GS)
• Winning Percentage of Games Started (WinPer)
• Earned Run Average (ERA)
• Complete Game Percentage of Games Started
(CGPer)
• Strikeouts per Nine Innings Pitched (Kper9)
• Batting Average Against (BAA)
• Innings Pitched (IP)
• Hits per Nine Innings Pitched (Hper9)
• Shutout Percentage of Games Started (ShoPer)
• No Hitter Achieved (Coded 1 = yes, 0 = no)
Logistic Regression: Preparation
Before throwing our variables into the
regression, we can eliminate ones we
will not need by using independent t-tests to
see which variables differ significantly
between those who threw a no-hitter and
those who didn’t.
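The statistic behind an independent t-test can be sketched in Python. This version assumes equal variances (a pooled estimate); SPSS also reports an unequal-variances row. Python's standard library has no t-distribution CDF, so the sketch stops at t and its degrees of freedom rather than a p-value. The function name and sample data are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Student's independent-samples t with pooled variance; returns (t, df)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


print(pooled_t([1, 2, 3], [4, 5, 6]))  # about (-3.674, 4)
```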
But HOLD IT! You still haven’t checked
for multicollinearity!!!
Actually I have; it just turns out the two variables from the
independent t-tests pass that check as well.
Logistic Regression: Recap
What We Know
• The two variables being input are CGPer and ShoPer
• Both meet all assumptions
LET’S DO THIS THING!
Logistic Regression: Equation
So what do we get?
ln(odds of Throwing a No-Hitter) = -2.505 + 38.772 * (ShoPer)
Logistic Regression: Using It!
Take a look at the ln(odds) for the six pitchers who threw a no-hitter
(using their statistics from before the 2010 season)
• Roy Halladay ln(odds) = -2.505 + 38.772 * (.05226) = -.47878
• Ubaldo Jimenez ln(odds) = -2.505 + 38.772 * (0) = -2.505
• Dallas Braden ln(odds) = -2.505 + 38.772 * (0) = -2.505
• Armando Galarraga ln(odds) = -2.505 + 38.772 * (0) = -2.505
• Edwin Jackson ln(odds) = -2.505 + 38.772 * (.00909) = -2.15256
• Matt Garza ln(odds) = -2.505 + 38.772 * (.02326) = -1.60316
Logistic Regression: Using It!
Probability Table
Pitcher ln(odds) P(no-hitter)
Roy Halladay -.47878 0.382540
Ubaldo Jimenez -2.505 0.075508
Dallas Braden -2.505 0.075508
Armando Galarraga -2.505 0.075508
Edwin Jackson -2.15256 0.104092
Matt Garza -1.60316 0.167540
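The arithmetic on the two slides above can be reproduced directly. This Python sketch takes the fitted intercept (-2.505), the ShoPer coefficient (38.772), and each pitcher's pre-2010 ShoPer value from the slides, then applies ln(odds) = a + b·ShoPer and P = 1/(1 + e^(−ln(odds))). The variable names are hypothetical. Halladay's probability may differ from the table in the sixth decimal because the table rounds ln(odds) first.

```python
import math

A, B = -2.505, 38.772  # fitted intercept and ShoPer coefficient from the slide

sho_per = {  # pre-2010 shutout percentage of games started, from the slide
    "Roy Halladay": 0.05226, "Ubaldo Jimenez": 0.0, "Dallas Braden": 0.0,
    "Armando Galarraga": 0.0, "Edwin Jackson": 0.00909, "Matt Garza": 0.02326,
}

results = {}
for name, x in sho_per.items():
    log_odds = A + B * x
    p = 1 / (1 + math.exp(-log_odds))  # inverse of ln(p / (1 - p))
    results[name] = (log_odds, p)
    print(f"{name:18s} ln(odds) = {log_odds:8.5f}  P = {p:.6f}")
```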
Logistic Regression: One Last Comparison
Let’s make one last comparison: the average ln(odds) of the six pitchers who
threw no-hitters before these six is
-1.647795
Of our six pitchers, only one, the Philadelphia Phillies’ Roy Halladay
(-.47878), is above this average. In fact, he is above the average
ln(odds) of everyone in our database who threw a no-hitter,
-1.173541089.
As for our five others, the average ln(odds) of the pitchers in our
database who didn’t throw a no-hitter is -1.575964583. All five of
our other pitchers fall short of this mark, the closest being Matt Garza
at -1.60316.
Logistic Regression: What Does it Mean?
It means five of the six pitchers who threw a no-hitter had, in layman’s
terms, PULLED A MIRACLE OUT OF THEIR HAT!
Statistically, it means that five of our six pitchers not only beat the
odds, they destroyed them.
Three of our pitchers (Ubaldo Jimenez, Dallas Braden, and Armando
Galarraga) beat the odds despite never having thrown a shutout before
their no-hitters, leaving each with the model’s lowest possible probability (0.075508).
The other two (Edwin Jackson and Matt Garza) had odds favoring no
no-hitter, even though they had previously thrown shutouts.
***Although not directly assessed in this analysis, Dallas Braden pitched a perfect game,
and Armando Galarraga came within the disputed 27th out of one; perfect games are far rarer still***
Logistic Regression: Conclusion
What does the data tell us?
Overall pitching was average when we use the yearly averages and
standard deviations from 1969-2009 (2010 MLB ERA: z = .05783, 2010 MLB
WHIP: z = -.452796, 2010 AL ERA: z = -.4051166, 2010 AL WHIP: z = -.685158,
2010 NL ERA: z = .298405, 2010 NL WHIP: z = -.0422833), and only slightly
improved when we narrow our interval to 1991-2009 and focus on WHIP (2010
MLB WHIP: z = -1.4496, 2010 AL WHIP: z = -1.67597, 2010 NL WHIP: z = -.798408).
Of the pitchers who made this season memorable with their no-hitters, only
one showed any sign that a no-hitter could be coming (Roy Halladay,
ln(odds) = -.47878). The other five all had ln(odds) below the average for
the pitchers in our database who didn’t throw no-hitters (n = 144,
average ln(odds) = -1.575964583). Three of the five had the lowest
possible ln(odds) under our model (Dallas Braden, Armando Galarraga,
Ubaldo Jimenez, ln(odds) = -2.505).
Conclusion
So what should we call the 2010 MLB season?
“The Comeback Year of the Pitcher”
Special Thanks
Dr. Sprechini – Advisor
David Brown – Peer Reviewer
Bibliography
• Abu-Bader, Soleman. Advanced & Multivariate Statistical Methods for Social Science Research with a Complete SPSS Guide. Chicago, IL: Lyceum Books, 2010.
• Albert, Jim. "Is Roger Clemens' WHIP Trajectory Unusual?" Chance 22.2 (2009): 9-19.
• Bauman, Mike. "Halladay Stood Out in Year of the Pitcher." MLB News. http://mlb.mlb.com/news/article.jsp?ymd=20101116&content_id=16113190&vkey=news_mlb&c_id=mlb
• Forman, Sean, and Justin Kubatko. Baseball-Reference. www.baseball-reference.com
• McCarthy, David, David Groggel, and John A. Bailer. "Career Pitching Statistics and the Probability of Throwing a No-Hitter in MLB: A Case-Control Study." Chance 23.3 (2010): 25-35.
• Schmotzer, Brian, Patrick D. Kilgo, and Jeff Switchenko. "'The Natural'? The Effect of Steroids on Offensive Performance in Baseball." Chance 22.2 (2009): 21-31.
• Wikipedia Contributors. "No-Hitter." Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/wiki/No-hitter#Major_League_Baseball_no-hitters

Poland Vs Austria Poland Euro Cup 2024 squad Who is Michal Probierz bringing ...Poland Vs Austria Poland Euro Cup 2024 squad Who is Michal Probierz bringing ...
Poland Vs Austria Poland Euro Cup 2024 squad Who is Michal Probierz bringing ...
 
LtCol Thomas Jasper Marine Corps Marathon.pdf
LtCol Thomas Jasper Marine Corps Marathon.pdfLtCol Thomas Jasper Marine Corps Marathon.pdf
LtCol Thomas Jasper Marine Corps Marathon.pdf
 
Boletin de la I Copa Panamericana de Voleibol Femenino U17 Guatemala 2024
Boletin de la I Copa Panamericana de Voleibol Femenino U17 Guatemala 2024Boletin de la I Copa Panamericana de Voleibol Femenino U17 Guatemala 2024
Boletin de la I Copa Panamericana de Voleibol Femenino U17 Guatemala 2024
 
Slovakia Vs Ukraine UEFA Euro 2024 Calzona selects five Serie A players in Sl...
Slovakia Vs Ukraine UEFA Euro 2024 Calzona selects five Serie A players in Sl...Slovakia Vs Ukraine UEFA Euro 2024 Calzona selects five Serie A players in Sl...
Slovakia Vs Ukraine UEFA Euro 2024 Calzona selects five Serie A players in Sl...
 
My Personal Brand Key Note presentation.
My Personal Brand  Key Note presentation.My Personal Brand  Key Note presentation.
My Personal Brand Key Note presentation.
 
CAA Region II Day 1 Morning Result Accra event
CAA Region II Day 1 Morning Result Accra eventCAA Region II Day 1 Morning Result Accra event
CAA Region II Day 1 Morning Result Accra event
 
Denmark Vs England Cole Palmer thrilled to be selected in England’s Euro Cup ...
Denmark Vs England Cole Palmer thrilled to be selected in England’s Euro Cup ...Denmark Vs England Cole Palmer thrilled to be selected in England’s Euro Cup ...
Denmark Vs England Cole Palmer thrilled to be selected in England’s Euro Cup ...
 
The Richest Female Athletes of 2024: Champions of Wealth and Excellence | CIO...
The Richest Female Athletes of 2024: Champions of Wealth and Excellence | CIO...The Richest Female Athletes of 2024: Champions of Wealth and Excellence | CIO...
The Richest Female Athletes of 2024: Champions of Wealth and Excellence | CIO...
 
Turkey Hit by Double Injury Blow before of Euro 2024.docx
Turkey Hit by Double Injury Blow before of Euro 2024.docxTurkey Hit by Double Injury Blow before of Euro 2024.docx
Turkey Hit by Double Injury Blow before of Euro 2024.docx
 

The Year of the Pitcher: Analyzing No-Hitters

  • 1. This presentation takes a look at one of the many real-life uses of statistics, sports, and at what some basic and advanced techniques can tell us when we use statistics to settle a not-so-straightforward question. Using SPSS as our backbone, along with previously published statistical analyses, we will look at how z-scores, standard deviations, case selection, and logistic regression can give us a solid foundation for answering a hazy question. Memorable or Just Plain Amazing: A Statistical Look at the 2010 MLB Season, Otherwise Known as “The Year of the Pitcher” By: KC Burgos Math 449 Nov. 1, 2011
  • 2. What Exactly Are We Looking To Answer? • 2010 MLB Season has been classified as the “Year of the Pitcher” • Seven No-Hitters (Six official) • What exactly defines a great pitching year?
  • 3. List of No-Hitters in 2010 • April 17th, Ubaldo Jiménez vs. Atlanta Braves (Final Score 4-0) • May 9th, Dallas Braden vs. Tampa Bay Rays (Final Score 4-0) • May 29th, Roy Halladay vs. Florida Marlins (Final Score 1-0) • June 2nd, Armando Galarraga* vs. Cleveland Indians (Final Score 3-0) • June 25th, Edwin Jackson vs. Tampa Bay Rays (Final Score 1-0) • July 26th, Matt Garza vs. Detroit Tigers (Final Score 5-0) • October 6th, Roy Halladay vs. Cincinnati Reds (Final Score 4-0)
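  • 4. Armando Galarraga’s No-Hitter • Armando Galarraga’s no-hitter is officially recognized by MLB as a one-hitter. • The 27th out: first-base umpire Jim Joyce called the batter safe on what should have been the final out, costing Galarraga a perfect game.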
  • 5. Defining Parameters for our Question • Many ways to answer what makes a season the “Year of the Pitcher” • Comparing Seasons and Stats • No-Hitter Probabilities
  • 6. Comparisons: Variables • IP – Innings Pitched • R – Runs • ER – Earned Runs • H – Hits • BB – Bases on Balls (Walks) • ERA – Earned Run Average • WHIP – Walks and Hits per Inning Pitched
  • 7. Computation Variables • IP – the digit after the point counts outs (thirds of an inning): 123.1 = 123 1/3, 453.2 = 453 2/3 • ERA – (9*ER)/IP • WHIP – (H + BB)/IP
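The computation rules above can be sketched in Python. This is a minimal sketch, assuming IP arrives as a string in the box-score notation shown on the slide (the digit after the point counts outs, so it means thirds of an inning, not tenths); the function names and the sample pitching line are hypothetical and only for illustration.

```python
from fractions import Fraction


def innings_from_notation(ip: str) -> Fraction:
    """Convert box-score IP notation to exact innings.

    The digit after the point counts outs (thirds of an inning), so
    "123.1" means 123 1/3 innings and "453.2" means 453 2/3 innings.
    """
    whole, _, outs = ip.partition(".")
    outs_n = int(outs) if outs else 0
    if outs_n not in (0, 1, 2):
        raise ValueError(f"invalid outs digit in {ip!r}")
    return int(whole) + Fraction(outs_n, 3)


def era(earned_runs: int, ip: str) -> float:
    """Earned Run Average: (9 * ER) / IP."""
    return float(9 * earned_runs / innings_from_notation(ip))


def whip(hits: int, walks: int, ip: str) -> float:
    """Walks and Hits per Inning Pitched: (H + BB) / IP."""
    return float((hits + walks) / innings_from_notation(ip))


# Hypothetical pitching line, for illustration only.
print(innings_from_notation("123.1"))  # 370/3
print(era(45, "180.0"))                # 2.25
print(whip(150, 30, "180.0"))          # 1.0
```

Using exact fractions avoids reading 123.1 as 123.1 innings, which would slightly understate the innings and so inflate ERA and WHIP.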
  • 8. Comparisons: The Data • 2010 Season • 30 Teams • Specifically Looking at Pitching • Other Possible Variables We Could Have Used
  • 11. Comparisons: Yearly Data • Compare it to Data from 1969-2009 • Why 1969?
  • 12. Comparisons: Yearly Data • Special Years: 1969 1977 1981 1993 1994 1995 1998
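  • 13. Comparisons: Normality If our values follow a normal curve, or can be transformed to follow one, then we can use the already derived percentages of values that fall within each standard deviation of the mean (about 68%, 95%, and 99.7% within one, two, and three standard deviations).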
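The "already derived percentages" are the standard normal coverage figures. A minimal Python check using only the standard library's `math.erf`; the function name is hypothetical. It also shows that 1.96, the cutoff used in the skewness and kurtosis checks that follow, is the k for which the share is 95%.

```python
import math


def share_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of its mean.

    For a standard normal Z, P(-k < Z < k) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```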
  • 14. Comparisons: Normality and Homoscedasticity MLB ERA 1969-2009
  • 15. Comparisons: Normality and Homoscedasticity MLB WHIP 1969-2009
  • 16. Comparisons: Normality and Homoscedasticity AL ERA 1969-2009
  • 17. Comparisons: Normality and Homoscedasticity AL WHIP 1969-2009
  • 18. Comparisons: Normality and Homoscedasticity NL ERA 1969-2009
  • 19. Comparisons: Normality and Homoscedasticity NL WHIP 1969-2009
  • 20. Comparisons: Fisher’s Method Fisher’s method tells us that if the skewness value is more than about twice its standard error in either direction, then the curve is severely skewed. A distribution is considered normal if it meets $-1.96 < \frac{S}{SE_S} < 1.96$
  • 21. Comparisons: Fisher’s Method It should also be noted that this applies to kurtosis as well. However, I focus here more on the skewness of the data, having already taken kurtosis into account. $-1.96 < \frac{K}{SE_K} < 1.96$
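The two checks above can be sketched in Python. This assumes the bias-adjusted sample skewness and excess kurtosis, and their usual standard errors, of the kind SPSS reports in its descriptives; the function names are hypothetical, and the input would be one value per season (for example, MLB ERA for each year).

```python
import math
from typing import Sequence


def skew_kurt_ratios(xs: Sequence[float]) -> tuple[float, float]:
    """Return (S / SE_S, K / SE_K) for a sample of at least 4 non-identical values.

    S and K are the bias-adjusted sample skewness and excess kurtosis;
    SE_S and SE_K are their usual standard errors.
    """
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(xs) / n
    # Central moments (divisor n)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    g1 = m3 / m2 ** 1.5          # moment-based skewness
    g2 = m4 / m2 ** 2 - 3.0      # moment-based excess kurtosis
    # Bias-adjusted sample skewness S and excess kurtosis K
    s = math.sqrt(n * (n - 1)) / (n - 2) * g1
    k = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    # Standard errors of S and K
    se_s = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_k = 2 * se_s * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return s / se_s, k / se_k


def passes_normality_screen(xs: Sequence[float], crit: float = 1.96) -> bool:
    """True when both ratios fall strictly between -crit and crit."""
    zs, zk = skew_kurt_ratios(xs)
    return -crit < zs < crit and -crit < zk < crit
```

A ratio outside ±1.96 on either measure would flag a variable the way these slides describe.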
  • 22. Comparisons: Yearly Data Descriptives
  • 23. Comparisons: Doing the Math With normality assumed, virtually all ERA and WHIP values (about 99.7%) should fall within three standard deviations of the mean.
  • 24. Comparisons: Doing the Math Lower 3 Standard Deviations, 1969-2009 (the mean, then one, two, and three standard deviations below it)
    1969-2009   Mean      (1)       (2)       (3)
    MLB ERA     4.0522    3.67005   3.2879    2.90575
    MLB WHIP    1.3673    1.32313   1.27896   1.23479
    AL ERA      4.31734   3.7317    3.29      2.8483
    AL WHIP     1.3836    1.32916   1.27472   1.22028
    NL ERA      3.91      3.54707   3.18414   2.82121
    NL WHIP     1.35      1.30743   1.26486   1.22229
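Each row of the table can be regenerated from its mean and standard deviation. A minimal sketch, assuming the (1), (2), (3) columns are the mean minus one, two, and three standard deviations; since the standard deviations themselves are not listed, the SD here is recovered from the MLB ERA row (mean minus the (1) column), and the function name is hypothetical.

```python
def lower_bands(mean: float, sd: float, k_max: int = 3) -> list[float]:
    """Values 1, 2, ..., k_max standard deviations below the mean."""
    return [mean - k * sd for k in range(1, k_max + 1)]


# MLB ERA, 1969-2009: SD recovered from the table as mean minus the (1) column.
mlb_era_mean = 4.0522
mlb_era_sd = mlb_era_mean - 3.67005   # about 0.38215
print([round(v, 5) for v in lower_bands(mlb_era_mean, mlb_era_sd)])
# [3.67005, 3.2879, 2.90575]
```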
  • 25. Comparisons: Doing the Math z-Scores z-scores can be used to determine exactly how far a number X is from the mean, in terms of the standard deviation. $z = \frac{X - \mu}{\sigma}$
  • 26. Comparisons: Doing the Math
    1969-2009   Mean      (1)       (2)       (3)       2010      z (2010)
    MLB ERA     4.0522    3.67005   3.2879    2.90575   4.0743     .05783
    MLB WHIP    1.3673    1.32313   1.27896   1.23479   1.3473    -.452796
    AL ERA      4.31734   3.7317    3.29      2.8483    4.1384    -.4051166
    AL WHIP     1.3836    1.32916   1.27472   1.22028   1.3463    -.685158
    NL ERA      3.91      3.54707   3.18414   2.82121   4.0183     .298405
    NL WHIP     1.35      1.30743   1.26486   1.22229   1.3482    -.0422833
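The z-scores in the last column follow directly from the formula on the previous slide. A minimal sketch, assuming each row's standard deviation is the mean minus its (1) column; the function name is hypothetical.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """How many standard deviations x lies from the mean: z = (x - mu) / sigma."""
    return (x - mean) / sd


# 2010 MLB values against the 1969-2009 rows (SD = mean minus the (1) column).
mlb_era_z = z_score(4.0743, 4.0522, 4.0522 - 3.67005)
mlb_whip_z = z_score(1.3473, 1.3673, 1.3673 - 1.32313)
print(round(mlb_era_z, 5), round(mlb_whip_z, 5))
# 0.05783 -0.4528
```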
  • 27. Comparisons: What Does It Tell Us? Since every 2010 value is within one standard deviation of its 1969-2009 mean (|z| < 1), we can say that 2010 was a relatively normal pitching year compared with the post-1969 seasons, in terms of ERA and WHIP.
  • 28. Comparisons: Defining a Smaller Interval When looking at a scatter plot of MLB ERA between 1969 and 2009, we can see an unusual increase. So let’s apply a linear fit line (with its R²). Although helpful, a linear fit line doesn’t quite give us a good description of what’s truly going on. So instead let’s use a LOESS fit line. LOESS (locally weighted scatterplot smoothing) looks specifically at subsets of our values and produces a curve that fits them.
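The LOESS idea can be sketched without any plotting library. This is a simplified version, assuming locally weighted straight-line fits with tricube weights over the nearest `frac` share of the points and no robustness iterations (as in basic lowess); SPSS's own LOESS fit line options may differ, and the function name is hypothetical.

```python
from typing import Sequence


def loess(xs: Sequence[float], ys: Sequence[float], frac: float = 0.5) -> list[float]:
    """Locally weighted linear fit evaluated at each x (no robustness iterations).

    For every point, take the nearest `frac` share of the data, weight those
    neighbours with the tricube kernel, and fit a weighted straight line.
    """
    n = len(xs)
    k = max(3, round(frac * n))  # neighbourhood size
    fitted = []
    for x0 in xs:
        # Keep the k points closest to x0
        nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
        dmax = max(abs(xs[i] - x0) for i in nearest) or 1.0
        # Tricube weights: 1 at x0, falling to 0 at the farthest neighbour
        w = {i: (1 - (abs(xs[i] - x0) / dmax) ** 3) ** 3 for i in nearest}
        sw = sum(w.values())
        mx = sum(w[i] * xs[i] for i in nearest) / sw
        my = sum(w[i] * ys[i] for i in nearest) / sw
        sxx = sum(w[i] * (xs[i] - mx) ** 2 for i in nearest)
        sxy = sum(w[i] * (xs[i] - mx) * (ys[i] - my) for i in nearest)
        slope = sxy / sxx if sxx > 0 else 0.0
        # Evaluate the local line at x0
        fitted.append(my + slope * (x0 - mx))
    return fitted
```

Called as `loess(years, mlb_era)` with one value per season, the returned list traces the smoothed curve; data lying on a straight line come back unchanged, which is a quick sanity check.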
  • 29. Comparisons: Defining a Smaller Interval We can now see an interesting curve in our later values, and we want to start a subinterval at that point: the last year in which ERA went down the following year before the dramatic increases began. The year turns out to be 1991.
  • 30. Comparisons: 1991-2009 Data Descriptives As before, we want to check skewness and homoscedasticity. Thanks to Fisher’s method, we can simply use the descriptives for skewness. As for homoscedasticity, all variables passed. We can see that two variables have skewness issues.
  • 31. Comparisons: 1991-2009 Data Descriptives Although there are methods for correcting skewness to recover a normal distribution, we can instead see that in every league one statistic, WHIP, is regarded as normal. Therefore we will focus on those three WHIP values (MLB, AL, and NL) across the board.
  • 32. Comparisons: Doing the Math 2 As before, we determine the values one, two, and three standard deviations below the mean and compare them with the 2010 numbers.
    1991-2009   Mean      (1)       (2)       (3)
    MLB WHIP    1.3996    1.36352   1.32744   1.29136
    AL WHIP     1.4214    1.37659   1.33178   1.28697
    NL WHIP     1.3783    1.3406    1.3029    1.2652
  • 33. Comparisons: Doing the Math 2
    1991-2009   Mean      (1)       (2)       (3)       2010      z (2010)
    MLB WHIP    1.3996    1.36352   1.32744   1.29136   1.3473    -1.4496
    AL WHIP     1.4214    1.37659   1.33178   1.28697   1.3463    -1.67597
    NL WHIP     1.3783    1.3406    1.3029    1.2652    1.3482    -.798408
  • 34. Comparisons: So what does this tell us??? That even over a recent interval (the 19 seasons from 1991 to 2009), last year’s pitching, in terms of WHIP, was within two standard deviations of the MLB, AL, and NL averages over those 19 years (2010 MLB WHIP: z = -1.4496, 2010 AL WHIP: z = -1.67597, 2010 NL WHIP: z = -.798408). Therefore last year was above average, but not so far off that it could be considered extremely rare or an outlier. So, once again, not exactly the “Year of the Pitcher”.
  • 35. Is There An Easier Way??? Comparisons Cons: • Too many numbers • Sample Sizes • Normality
  • 36. Logistic Regression Logistic Regression – When a question does not have a continuous answer (i.e., a yes-or-no question), logistic regression gives an equation for the odds, and hence the probability, of the event based on the factors you give it: $P(\text{event}) = \frac{1}{1 + e^{-(a + b_1X_1 + b_2X_2 + \cdots + b_iX_i)}}$, $\text{Odds} = \frac{P(\text{event})}{1 - P(\text{event})}$, and $\ln(\text{odds}) = a + b_1X_1 + b_2X_2 + \cdots + b_iX_i$, where a = Y-intercept, b = regression coefficient, X = factors.
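The three equations on this slide can be written as small Python helpers, which also shows that odds and probability convert back and forth exactly. A minimal sketch; the function names are hypothetical, and a, b_i, X_i are as defined on the slide.

```python
import math
from typing import Sequence


def log_odds(a: float, bs: Sequence[float], xs: Sequence[float]) -> float:
    """ln(odds) = a + b1*X1 + b2*X2 + ... + bi*Xi."""
    if len(bs) != len(xs):
        raise ValueError("need one coefficient per factor")
    return a + sum(b * x for b, x in zip(bs, xs))


def probability(ln_odds: float) -> float:
    """P(event) = 1 / (1 + e^(-ln(odds)))."""
    return 1.0 / (1.0 + math.exp(-ln_odds))


def odds(p: float) -> float:
    """Odds = P(event) / (1 - P(event)), for 0 <= p < 1."""
    return p / (1.0 - p)
```

For example, `probability(0.0)` is 0.5, whose odds are exactly 1.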
  • 37. Logistic Regression: Assumptions • DOES NOT ASSUME Linearity • DOES NOT ASSUME Normal Distribution • DOES NOT ASSUME Homoscedasticity • DOES NOT ASSUME Normality of Residuals • Does Assume Sample Representativeness • Does Assume Appropriate Levels of Measurement • Does Assume no Multicollinearity (present if VIF > 10)
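With exactly two predictors, such as the CGPer/ShoPer pair used later, the R² from regressing one predictor on the other is the squared Pearson correlation r², so VIF = 1/(1 − r²) is the same for both. A minimal sketch under that two-predictor assumption; the function names are hypothetical, and models with more predictors need the full auxiliary regressions.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1: Sequence[float], x2: Sequence[float]) -> float:
    """VIF = 1 / (1 - r^2) when the model has exactly two predictors."""
    r2 = pearson_r(x1, x2) ** 2
    # Perfectly correlated predictors give an infinite VIF
    return math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
```

A value above 10 would signal the multicollinearity the slide warns about.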
  • 38. Logistic Regression: Behind the Scenes Suppose we want to predict a variable Y from X, where our data are $(x_1, y_1), \ldots, (x_n, y_n)$. We can look at two possibilities: Y has a normal distribution with mean $\mu$ and variance $\sigma^2$, or Y has only two possible values, 0 and 1.
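  • 39. Linear Regression If Y has a normal distribution with mean $\mu$ and variance $\sigma^2$, then it has the probability density function (pdf) $f(y) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y-\mu)^2}{2\sigma^2}}$. Suppose $\mu = a + bx$. The likelihood function is $L(a, b) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y_i - (a + bx_i))^2}{2\sigma^2}} = \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\left(-\frac{\sum_{i=1}^{n} (y_i - (a + bx_i))^2}{2\sigma^2}\right)$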
  • 40. Linear Regression If Y has a normal distribution with mean $\mu$ and variance $\sigma^2$: Maximizing this likelihood function is the same as minimizing $Q = \sum_{i=1}^{n} (y_i - (a + bx_i))^2$. This can be done by setting the derivatives with respect to a and with respect to b equal to zero: $\frac{\partial Q}{\partial b} = \sum_{i=1}^{n} 2[y_i - (a + bx_i)](-x_i)$, $\frac{\partial Q}{\partial a} = \sum_{i=1}^{n} 2[y_i - (a + bx_i)](-1)$
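  • 41. Linear Regression If Y has a normal distribution with mean $\mu$ and variance $\sigma^2$: Setting $\frac{\partial Q}{\partial b} = \sum_{i=1}^{n} 2[y_i - (a + bx_i)](-x_i) = 0$ and $\frac{\partial Q}{\partial a} = \sum_{i=1}^{n} 2[y_i - (a + bx_i)](-1) = 0$ and solving these equations leads to the solutions $b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$ and $a = \bar{y} - b\bar{x}$.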
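The closed-form solution can be checked numerically: the fitted a and b should make both derivatives of Q vanish. A minimal sketch; the function names and the data are made up for illustration.

```python
from typing import Sequence


def least_squares(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Return (a, b) minimising Q = sum((y_i - (a + b*x_i))**2)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


def dq_da_db(a: float, b: float, xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """The two derivatives of Q; both are zero at the least-squares solution."""
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    dq_da = sum(2 * r * -1 for r in resid)
    dq_db = sum(2 * r * -x for r, x in zip(resid, xs))
    return dq_da, dq_db


xs, ys = [0, 1, 2, 3], [1, 2, 2, 4]
a, b = least_squares(xs, ys)
print(a, b)                     # both about 0.9
print(dq_da_db(a, b, xs, ys))   # both about 0
```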
  • 42. Logistic Regression If Y has only two possible values, 0 and 1, its probability mass function is $p^y (1 - p)^{1-y}$, $y = 0, 1$, where p is the probability that y = 1 and 1 − p is the probability that y = 0. Suppose $p = a + bx$. This doesn’t work! It allows the possibility that p < 0 or p > 1. Let’s try odds!
  • 43. Logistic Regression If Y has only two possible values, 0 and 1: Suppose $\text{odds} = \frac{p}{1-p} = a + bx$. However, this does not work either, since it allows the possibility that $\frac{p}{1-p} < 0$. Let’s try this instead: $\ln\frac{p}{1-p} = a + bx$. This works, since the log of the odds can be any real number. Then $\frac{p}{1-p} = e^{a+bx} \Rightarrow p = \frac{e^{a+bx}}{1 + e^{a+bx}} = \frac{1}{1 + e^{-(a+bx)}} \Rightarrow 1 - p = \frac{1}{1 + e^{a+bx}}$
  • 44. Logistic Regression If Y has only two possible values, 0 and 1: The likelihood function is $L(a, b) = \prod_{i=1}^{n} \left(\frac{e^{a+bx_i}}{1 + e^{a+bx_i}}\right)^{y_i} \left(\frac{1}{1 + e^{a+bx_i}}\right)^{1-y_i}$, where the first factor is p and the second is 1 − p. Finding the derivatives with respect to a and b and setting them equal to zero gives equations that, in general, cannot be solved in closed form. Typically this function is maximized using iterative procedures from numerical analysis.
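One such iterative procedure is Newton-Raphson, which repeatedly moves (a, b) by the inverse of the information matrix times the gradient of the log-likelihood. A minimal sketch for the single-predictor model on this slide; it is not necessarily the algorithm SPSS uses, the function name is hypothetical, and it stops with an error when the data are perfectly separated (the likelihood then has no finite maximum).

```python
import math
from typing import Sequence


def fit_logistic(xs: Sequence[float], ys: Sequence[int],
                 tol: float = 1e-10, max_iter: int = 50) -> tuple[float, float]:
    """Maximise the log-likelihood of ln(p / (1 - p)) = a + b*x by Newton-Raphson."""
    a = b = 0.0
    for _ in range(max_iter):
        # Gradient (ga, gb) and information matrix [[haa, hab], [hab, hbb]]
        ga = gb = 0.0
        haa = hab = hbb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            ga += y - p
            gb += (y - p) * x
            w = p * (1.0 - p)
            haa += w
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        if det == 0.0:
            raise ValueError("singular information matrix")
        # Newton step: information matrix inverse times gradient
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a += da
        b += db
        if abs(da) < tol and abs(db) < tol:
            return a, b
    raise RuntimeError("did not converge (data may be perfectly separated)")
```

For a single 0/1 predictor the answer is known in closed form, which makes a handy check: with 1 success in 4 at x = 0 and 2 in 4 at x = 1, `fit_logistic([0, 0, 0, 0, 1, 1, 1, 1], [1, 0, 0, 0, 1, 1, 0, 0])` should return a = ln(1/3) and b = ln 3.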
  • 45. Logistic Regression: Our Use • What made 2010 noticeable to begin with? • NO HITTERS!!!! • Step 1, Find our Variables
  • 46. Logistic Regression: Variables and Coding • Games Started (GS) • Winning Percentage of Games Started (WinPer) • Earned Run Average (ERA) • Complete Game Percentage of Games Started (CGPer) • Strikeouts per Nine Innings Pitched (Kper9) • Batting Average Against (BAA) • Innings Pitched (IP) • Hits per Nine Innings Pitched (Hper9) • Shutout Percentage of Games Started (ShoPer) • No Hitter Achieved (Coded 1 = yes, 0 = no)
  • 47. Logistic Regression: Preparation Before throwing our variables into the regression, we can eliminate the ones we will not need by using an independent-samples t-test to see which variables differ significantly between the pitchers who threw a no-hitter and those who didn’t.
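The statistic behind that comparison can be sketched in Python. This assumes the pooled-variance (Student) form of the independent-samples t-test; SPSS also reports a Welch version for unequal variances, and the function name and numbers are hypothetical.

```python
import math
from typing import Sequence


def pooled_t(group1: Sequence[float], group2: Sequence[float]) -> tuple[float, int]:
    """Independent-samples t statistic with pooled variance, and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    # Sample variances (divisor n - 1)
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Made-up numbers, for illustration only.
t, df = pooled_t([1, 2, 3], [4, 5, 6])
print(round(t, 4), df)  # -3.6742 4
```

The t value is then compared with a t distribution on df degrees of freedom to get a p-value (`scipy.stats.ttest_ind` computes both); only variables that differ significantly would move on to the regression.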
  • 49. Logistic Regression: Preparation But HOLD IT! You still haven’t checked for multicollinearity! Actually I have; it just turns out the two variables from the independent t-test pass that check as well.
  • 50. Logistic Regression: Recap What We Know • The two variables being input are CGPer and ShoPer • Both fit all assumptions LET’S DO THIS THING!
  • 51. Logistic Regression: Equation So what do we get? ln(odds of throwing a no-hitter) = -2.505 + 38.772 * (ShoPer)
  • 52. Logistic Regression: Using It! Take a look at the ln(odds) for the six pitchers who threw a no-hitter (using their statistics from before the 2010 season) • Roy Halladay ln(odds) = -2.505 + 38.772 * (.05226) = -.47878 • Ubaldo Jimenez ln(odds) = -2.505 + 38.772 * (0) = -2.505 • Dallas Braden ln(odds) = -2.505 + 38.772 * (0) = -2.505 • Armando Galarraga ln(odds) = -2.505 + 38.772 * (0) = -2.505 • Edwin Jackson ln(odds) = -2.505 + 38.772 * (.00909) = -2.15256 • Matt Garza ln(odds) = -2.505 + 38.772 * (.02326) = -1.60316
  • 53. Logistic Regression: Using It! Probability Table
    Pitcher              ln(odds)    P(no-hitter)
    Roy Halladay         -.47878     0.382540
    Ubaldo Jimenez       -2.505      0.075508
    Dallas Braden        -2.505      0.075508
    Armando Galarraga    -2.505      0.075508
    Edwin Jackson        -2.15256    0.104092
    Matt Garza           -1.60316    0.167540
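The table can be reproduced from the fitted equation and the ShoPer values on the previous slide, converting ln(odds) to a probability with P = 1 / (1 + e^(−ln(odds))). A minimal sketch; only the coefficients and ShoPer values come from the slides, and the names are hypothetical.

```python
import math

INTERCEPT = -2.505      # a, from the fitted equation
SHOPER_COEF = 38.772    # b for ShoPer

# Pre-2010 ShoPer (shutout percentage of games started) for each 2010 no-hit pitcher.
SHOPER = {
    "Roy Halladay": 0.05226,
    "Ubaldo Jimenez": 0.0,
    "Dallas Braden": 0.0,
    "Armando Galarraga": 0.0,
    "Edwin Jackson": 0.00909,
    "Matt Garza": 0.02326,
}


def ln_odds_no_hitter(sho_per: float) -> float:
    """ln(odds of throwing a no-hitter) = a + b * ShoPer."""
    return INTERCEPT + SHOPER_COEF * sho_per


def p_no_hitter(sho_per: float) -> float:
    """Convert the log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-ln_odds_no_hitter(sho_per)))


for name, sho_per in SHOPER.items():
    print(f"{name:<18} {ln_odds_no_hitter(sho_per):9.5f} {p_no_hitter(sho_per):.6f}")
```

Because ShoPer is 0 for Jimenez, Braden, and Galarraga, their ln(odds) is just the intercept, which is why the three share the model's lowest probability.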
  • 54. Logistic Regression: One Last Comparison Let’s make one last comparison: our six pitchers against the average ln(odds) of the six pitchers who threw no-hitters before them, -1.647795. Two of our six clear that mark: the Philadelphia Phillies’ Roy Halladay at -.47878 and Matt Garza at -1.60316. Only Halladay, however, is above the average ln(odds) of everyone who threw a no-hitter in our database, -1.173541089. As for our five others, the average ln(odds) of the pitchers in our database who didn’t throw a no-hitter is -1.575964583. All five fall short of this mark, the closest being Matt Garza at -1.60316.
  • 55. Logistic Regression: What Does it Mean? It means five of the six pitchers who threw a no-hitter had, in layman’s terms, PULLED A MIRACLE OUT OF THEIR HAT! Statistically, it means that five of our six pitchers not only beat the odds, they destroyed them. Three of them (Ubaldo Jimenez, Dallas Braden, and Armando Galarraga) faced the longest odds our equation allows, having never even thrown a shutout before their no-hitters. The other two (Edwin Jackson and Matt Garza) had odds below even the average of the pitchers who did not throw a no-hitter, despite having previously thrown shutouts. ***Although not directly assessed in this analysis, Dallas Braden pitched a perfect game, and Armando Galarraga was one missed call away from one; perfect games are far rarer still.***
  • 57. Conclusion What does the data tell us? Overall pitching was average when we use the yearly averages and standard deviations from 1969-2009 (2010 MLB ERA: z = .05783, 2010 MLB WHIP: z = -.452796, 2010 AL ERA: z = -.4051166, 2010 AL WHIP: z = -.685158, 2010 NL ERA: z = .298405, 2010 NL WHIP: z = -.0422833), and only slightly better when we narrow our interval to 1991-2009 and focus on WHIP (2010 MLB WHIP: z = -1.4496, 2010 AL WHIP: z = -1.67597, 2010 NL WHIP: z = -.798408). Of the pitchers who made this season memorable with their no-hitters, only one showed any sign that a no-hitter could be coming (Roy Halladay, ln(odds) = -.47878). The other five all showed odds below the average ln(odds) of the pitchers who didn’t throw no-hitters (n = 144, average ln(odds) = -1.575964583). Three of the five actually had the lowest possible odds under our equation (Dallas Braden, Armando Galarraga, and Ubaldo Jimenez, ln(odds) = -2.505).
  • 58. Conclusion So what should we call the 2010 MLB season? “The Comeback Year of the Pitcher”
  • 59. Special Thanks Dr. Sprechini – Advisor David Brown – Peer Reviewer
  • 60. Bibliography • Abu-Bader, Soleman. Advanced & Multivariate Statistical Methods for Social Science Research with a Complete SPSS Guide. Chicago, Ill.: Lyceum Books, 2010. • Albert, Jim. "Is Roger Clemens' WHIP Trajectory Unusual?" Chance 22.2 (2009): 9-19. • Bauman, Mike. "Halladay Stood Out in Year of the Pitcher." MLB News. http://mlb.mlb.com/news/article.jsp?ymd=20101116&content_id=16113190&vkey=news_mlb&c_id=mlb • Forman, Sean, and Justin Kubatko. www.baseball-reference.com • McCarthy, David, David Groggel, and John A. Bailer. "Career Pitching Statistics and the Probability of Throwing a No-Hitter in MLB: A Case-Control Study." Chance 23.3 (2010): 25-35. • Schmotzer, Brian, Patrick D. Kilgo, and Jeff Switchenko. "'The Natural'? The Effect of Steroids on Offensive Performance in Baseball." Chance 22.2 (2009): 21-31. • Wikipedia Contributors. "No-hitter." Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/wiki/No-hitter#Major_League_Baseball_no-hitters