Baseball stats

Baseball and Steroids
Rachel Monaco
April 27, 2014
MA-315-A

You mad, bro?
 Steroid use has been a plague on the modern-day athletic world
 “Roid rage” is a commonly used term for athletes who show
overly aggressive tendencies
 However, steroid users say that the drugs give them better
moods, cognitive function, and confidence, along with many other
seemingly positive side effects
 Users also claim steroid use helps them be who they are
Case in point: if you’re a jerk to begin with, you’re probably
just going to be a bigger jerk on steroids

Baseball Steroid Use
 In 2004 the MLB decided to make all players submit to
random steroid testing to help cut down on what seemed to be
an epidemic during prior years
 I wanted to compare the two means for the batting averages
(for both the American and the National League) before and
after the MLB’s steroid testing
 I expect that batting averages dropped after steroid testing began,
because the super sluggers either faded a bit or stopped their
steroid use
 I also want to look at specific MLB stars Barry Bonds and
Alex Rodriguez and how their stats have changed during this
time

How the data was collected
Fortunately, baseball has an enormous amount of data that
has been collected over an incredible number of years. I was
able to use the MLB’s website full of statistics as well as the
Baseball Almanac. It was a fairly easy collection of data.
However, I wish I could have seen all of these games and
been able to collect the data that way.

Batting Averages before and after
 To test the difference in batting averages between the ten years
before and the ten years after random steroid testing was implemented, I
ran a two-sample test comparing the means of the combined batting
averages of the American League and the National League.
 My null hypothesis is that the MLB’s combined batting averages before
random steroid testing and the MLB’s combined batting averages after
random steroid testing are equal
Hn: μbt = μat
 My alternative hypothesis is that the MLB’s combined batting averages
before random steroid testing and the MLB’s combined batting averages
after random steroid testing are not equal
Ha: μbt ≠ μat
Where Hn is the null hypothesis, Ha is the alternative hypothesis, μbt is the
mean batting average before random steroid testing, and μat is the mean
batting average after random steroid testing
 Also, I have chosen a 0.10 level of significance

Pool or not to pool?
 Before comparing our two means we must use an F test to see whether our variances
are equal, in order to decide whether or not to pool our two sets of data.
 My null hypothesis is that the variances are equal
Hn: σ²bt = σ²at
 My alternative hypothesis is that the variances are not equal
Ha: σ²bt ≠ σ²at
 I used the Data Analysis ToolPak extension from Excel to run the F test. Before
running this, however, I decided that my level of significance would be 10%.
 We see that our F value is 0.2961 and our F Critical value is 0.4098. Because the
F value is below 1, the critical value marks the lower tail, and since 0.2961 falls
below 0.4098 we reject our null hypothesis that our variances are equal. We will
therefore assume unequal variances for our test comparing two means and will not
pool the data sets
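The decision rule can look backwards at first glance, since a test statistic below its critical value usually means "fail to reject." The short Python sketch below uses only the two numbers reported above and assumes the critical value Excel printed is a lower-tail value, which is how the tool typically behaves when F is below 1. Taking reciprocals gives the more familiar upper-tail form of the same comparison. The variable names are illustrative.

```python
# Values reported from the Excel F test (not recomputed from raw data).
f_stat = 0.2961        # ratio of the two sample variances
f_crit_lower = 0.4098  # assumed to be a lower-tail critical value at the 10% level

# Lower-tail form: with F < 1, reject equal variances if F is below the critical value.
reject_lower = f_stat < f_crit_lower

# Upper-tail form: flip the ratio so the larger variance is on top.
f_stat_flipped = 1 / f_stat
f_crit_upper = 1 / f_crit_lower
reject_upper = f_stat_flipped > f_crit_upper

print("Reject equal variances (lower-tail form):", reject_lower)
print("Reject equal variances (upper-tail form):", reject_upper)
print("Flipped F:", round(f_stat_flipped, 3), "vs critical:", round(f_crit_upper, 3))
```

Both forms lead to the same decision, so the data sets are not pooled.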

Comparing two means
 From there we end up using the equation for the unpooled standard error
SE = √(s²bt/nbt + s²at/nat)
where s² and n are the sample variance and the number of seasons in each period
 Where our SE is 0.002145.
 From here I am able to get our test statistic through the equation
t = (x̄bt − x̄at) / SE
where x̄bt and x̄at are the sample mean batting averages before and after testing
 We get our t to be 2.592
 From there we use our Excel command “=t.dist.2t(2.592,min(9,9))”
which gives us the output of 0.029 as my p value
 From here I am able to compare my p value to my level of significance,
which is 0.10. Since our p value is less than our level of significance,
we reject the null hypothesis that the MLB’s combined batting averages
before and after random steroid testing are equal.
 What does that mean?!
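As a cross-check on the Excel output, the two-tailed p value can be reproduced in plain Python. This is a minimal sketch, not the method used in the slides: it integrates the Student's t density numerically with Simpson's rule, using the reported t of 2.592 and 9 degrees of freedom. The function names are made up for illustration, and the "implied difference" is simply t times SE, back-computed from the reported figures rather than taken from the raw data.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value using Simpson's rule on [0, |t|]; steps must be even."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # probability mass between 0 and |t|
    return 1 - 2 * area           # mass in both tails

se = 0.002145        # reported standard error
t_stat = 2.592       # reported test statistic
df = min(9, 9)       # same degrees of freedom as the Excel command
alpha = 0.10         # chosen level of significance

p_value = two_tailed_p(t_stat, df)
implied_diff = t_stat * se  # back-computed gap between the two mean batting averages

print("p value:", round(p_value, 3))
print("implied difference in means:", round(implied_diff, 5))
print("reject null at 0.10:", p_value < alpha)
```

The p value should land close to the 0.029 that Excel reported, and the implied gap between the two periods is a little over five points of batting average.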

Alex Rodriguez Recent Scandal
 Within the last year, one of the biggest steroid scandals
has centered on Alex Rodriguez (A-Rod), who plays for the
New York Yankees. In 2013, A-Rod was handed the biggest
drug suspension in baseball for his use of steroids, which
he will serve during the 2014 season.
 A-Rod will be suspended for 162 games, reduced from the
original 211-game decision
 So I decided to look at this third baseman and did a One-
way ANOVA for his years with the NYY.

One-Way ANOVA
 I performed a one-way ANOVA test on the different types of hits
A-Rod had during his years with the Yankees from 2004-2013.
Our null hypothesis for a one-way ANOVA is that all of the
group means are equal to each other, whereas our alternative
hypothesis is that at least one of the means is different.
 After performing the test I see that the average for hits that got
him to first base is 140.4, for doubles 23.4, for triples 0.8, and for
home runs 30.9.
 Between the groups we see the p value is 1.07E-14, which is an
extremely small value, so we reject the null hypothesis that
the averages are all the same.
 In this case, given the data, I would assume that A-Rod is indeed
the power hitter we have assumed him to be, and as someone
who doesn’t like the Yankees I am hopeful that the 2014 season
without him will help others in the league make the strides they
need

Barry Bonds
 One of the greats
 Bonds definitely would have made it to the Hall of
Fame without the use of steroids, much like Nixon
would have won his second term even if he had
not cheated
 San Francisco Giants left fielder from 1993-2007
 While A-Rod’s years with the Yankees came after
random steroid testing started, Bonds played
before, during, and after implementation of the
change

Confidence Interval
 I wanted to construct a 95% confidence interval for
Barry Bonds’s home runs in the years he played for
the Giants.
 Doing this I get my critical z scores to be -1.96 and
1.96
 I defined my population as Barry Bonds’s home
runs during his years playing for the Giants;
therefore, I can figure out the population standard
deviation through my data

 I ran the descriptive statistics for Barry Bonds and
found the mean for his home runs to be 39.067, the
median to be 40, the mode to be 46, the standard
deviation to be 14.52, the minimum to be 5, and
the maximum to be 73.
 Because I know the standard deviation of my
population I can use the equation
E = z · σ / √n
where E is the margin of error, z is the critical z score, σ is the
standard deviation, and n is the number of seasons

 Using this equation we get our E to be 7.35. From there we
get our interval to be (31.72, 46.41).
 This means we can be 95% confident that Bonds’s mean home
runs per season with the Giants falls within this interval. It is a
statement about the mean, not about where any single season lands.
 So looking at this I wanted to look at the seasons that fall
outside the interval and analyze them
 We see that before the steroid testing we have two seasons
that are far higher than the upper bound of my confidence
interval, and after steroid testing we have two that are
well below the lower bound (setting aside 2005, when Bonds only
played 14 games and hit 5 home runs)
 Looking at this, it seems that Bonds’s best seasons were when he
was using steroids
 This makes sense because of Bonds’s involvement with the
BALCO scandal between 2003 and 2004, which was one of the
trigger points for implementing the random steroid testing rule
in the MLB
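The interval arithmetic can be checked with a few lines of Python. This sketch works from the summary statistics only, not the season-by-season data. It assumes a mean of 39.067 (the midpoint of the reported interval) and n = 15, one value per season from 1993 through 2007. Neither the value of n nor the variable names appear on the slides, so treat them as assumptions.

```python
import math

mean_hr = 39.067   # assumed mean home runs per season (midpoint of the reported interval)
sd_hr = 14.52      # reported standard deviation
n_seasons = 15     # assumption: one observation per season, 1993 through 2007
z_crit = 1.96      # critical z score for 95% confidence

# Margin of error: E = z * sigma / sqrt(n)
margin = z_crit * sd_hr / math.sqrt(n_seasons)
lower = mean_hr - margin
upper = mean_hr + margin

print("margin of error:", round(margin, 2))
print("interval:", (round(lower, 2), round(upper, 2)))
```

Because the inputs are rounded, the endpoints may differ from (31.72, 46.41) in the last decimal place. Any season above the upper bound or below the lower bound is one of the outlying seasons discussed above.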

Has it worked?
 Based on my analyses, I believe that the
implementation of random steroid drug testing has
been effective in preventing some cheating within
the sport
 I would like to see how the Yankees do this coming
season to better help with my analysis
