The Year of the Pitcher: Analyzing No-Hitters
1. This presentation looks at one of the many real-life uses of statistics, sports, and at what some basic and advanced methods can tell us when we use statistics to decide a not-so-straightforward question. Using SPSS as our backbone, along with previously published statistical analyses, we will look at how z-scores, standard deviations, case selection, and logistic regression can give us a solid foundation for answering a hazy question.
Memorable or Just Plain Amazing:
A Statistical Look at the 2010 MLB Season, Otherwise Known as
“The Year of the Pitcher”
By: KC Burgos
Math 449
Nov. 1, 2011
2. What Exactly Are We Looking To Answer?
• The 2010 MLB season has been called the “Year of the Pitcher”
• Seven no-hitters (six official)
• What exactly defines a great pitching year?
3. List of No-Hitters in 2010
• April 17th, Ubaldo Jiménez vs. Atlanta Braves (Final Score 4-0)
• May 9th, Dallas Braden vs. Tampa Bay Rays (Final Score 4-0)
• May 29th, Roy Halladay vs. Florida Marlins (Final Score 1-0)
• June 2nd, Armando Galarraga* vs. Cleveland Indians (Final Score 3-0)
• June 25th, Edwin Jackson vs. Tampa Bay Rays (Final Score 1-0)
• July 26th, Matt Garza vs. Detroit Tigers (Final Score 5-0)
• October 6th, Roy Halladay vs. Cincinnati Reds (Final Score 4-0)
4. Armando Galarraga’s No-Hitter
• Armando Galarraga’s no-hitter is officially recognized by MLB as a one-hitter.
• The 27th out
5. Defining Parameters for our Question
• Many ways to answer what makes a season the “Year of the Pitcher”
• Comparing Seasons and Stats
• No-Hitter Probabilities
6. Comparisons: Variables
• IP – Innings Pitched
• R – Runs
• ER – Earned Runs
• H – Hits
• BB – Base on Balls (Walks)
• ERA – Earned Run Average
• WHIP – Walks and Hits per Inning Pitched
13. Comparisons: Normality
If our values follow a normal curve (or can be transformed to follow one), we can use the already derived percentages of values expected within each standard deviation of the mean.
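Those "already derived percentages" come straight from the standard normal distribution. A minimal Python sketch, using only `math.erf` from the standard library, computes the share of a normal population expected within k standard deviations of the mean; the function name `within_k_sd` is our own, not something from the slides.

```python
import math


def within_k_sd(k: float) -> float:
    """Share of a normal population within k standard deviations of the mean."""
    # For Z ~ N(0, 1), P(-k < Z < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

These are the familiar 68-95-99.7 percentages that the later slides lean on when they check whether 2010 falls within 1, 2, or 3 standard deviations of the mean.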
20. Comparisons: Fisher’s Method
Fisher’s method tells us that if the skewness statistic is more than about twice its standard error (1.96 times, in either direction), the curve is severely skewed. A distribution is considered normal if it meets
$$-1.96 < \frac{S}{SE_S} < 1.96$$
where S is the skewness and SE_S is its standard error.
21. Comparisons: Fisher’s Method
The same test also applies to the kurtosis K, with standard error SE_K. However, I focus here on the skewness of the data, having already taken kurtosis into account.
$$-1.96 < \frac{K}{SE_K} < 1.96$$
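The two ratio checks above can be sketched in Python. This assumes the bias-adjusted sample skewness (G1) and excess kurtosis (G2) together with the standard-error formulas that SPSS is commonly documented to use; other software may use slightly different estimators, so the ratios may not match exactly. All function names are hypothetical.

```python
import math


def _central_moments(data):
    """Return n and the 2nd, 3rd, and 4th central moments (divided by n)."""
    n = len(data)
    mean = sum(data) / n
    devs = [x - mean for x in data]
    m2 = sum(d ** 2 for d in devs) / n
    m3 = sum(d ** 3 for d in devs) / n
    m4 = sum(d ** 4 for d in devs) / n
    return n, m2, m3, m4


def se_skewness(n):
    """Standard error of the bias-adjusted skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def skewness_ratio(data):
    """Return (G1, SE_S, G1 / SE_S)."""
    if len(data) < 3:
        raise ValueError("need at least 3 observations")
    n, m2, m3, _ = _central_moments(data)
    if m2 == 0:
        raise ValueError("all observations are equal")
    g1 = m3 / m2 ** 1.5
    big_g1 = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    se_s = se_skewness(n)
    return big_g1, se_s, big_g1 / se_s


def kurtosis_ratio(data):
    """Return (G2, SE_K, G2 / SE_K) for the bias-adjusted excess kurtosis."""
    if len(data) < 4:
        raise ValueError("need at least 4 observations")
    n, m2, _, m4 = _central_moments(data)
    if m2 == 0:
        raise ValueError("all observations are equal")
    g2 = m4 / m2 ** 2 - 3
    big_g2 = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    se_k = 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return big_g2, se_k, big_g2 / se_k


def looks_normal(ratio, bound=1.96):
    """Fisher-style check: the statistic lies within 1.96 standard errors of 0."""
    return -bound < ratio < bound


# Illustrative data: 1..10 is symmetric, so its skewness ratio is 0.
print(skewness_ratio(list(range(1, 11))))
print(kurtosis_ratio(list(range(1, 11))))
```

Passing each variable's yearly values through `skewness_ratio` and `looks_normal` mirrors the "is the ratio inside ±1.96?" reading of the descriptives table.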
23. Comparisons: Doing the Math
With normality assumed, ERA and WHIP values should fall within 3 standard deviations of the mean.
24. Comparisons: Doing the Math
Lower 3 Standard Deviations ((1), (2), (3) = 1, 2, and 3 standard deviations below the 1969-2009 mean)
1969-2009   Mean      (1)       (2)       (3)
MLB ERA     4.0522    3.67005   3.2879    2.90575
MLB WHIP    1.3673    1.32313   1.27896   1.23479
AL ERA      4.31734   3.7317    3.29      2.8483
AL WHIP     1.3836    1.32916   1.27472   1.22028
NL ERA      3.91      3.54707   3.18414   2.82121
NL WHIP     1.35      1.30743   1.26486   1.22229
25. Comparisons: Doing the Math
z-Scores
A z-score gives the exact distance of a value X from the mean μ, measured in standard deviations σ:
$$z = \frac{X - \mu}{\sigma}$$
26. Comparisons: Doing the Math
1969-2009   Mean      (1)       (2)       (3)
MLB ERA     4.0522    3.67005   3.2879    2.90575
MLB WHIP    1.3673    1.32313   1.27896   1.23479
AL ERA      4.31734   3.7317    3.29      2.8483
AL WHIP     1.3836    1.32916   1.27472   1.22028
NL ERA      3.91      3.54707   3.18414   2.82121
NL WHIP     1.35      1.30743   1.26486   1.22229
2010 MLB ERA = 4.0743
2010 MLB WHIP = 1.3473
2010 AL ERA = 4.1384
2010 AL WHIP = 1.3463
2010 NL ERA = 4.0183
2010 NL WHIP = 1.3482
Z-Scores
2010 MLB ERA: z = .05783
2010 MLB WHIP: z = -.452796
2010 AL ERA: z = -.4051166
2010 AL WHIP: z = -.685158
2010 NL ERA: z = .298405
2010 NL WHIP: z = -.0422833
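The z-scores above can be reproduced from the table. This Python sketch assumes that each standard deviation equals the mean minus column (1), which is how the columns appear to be spaced. It uses four of the rows, and the names `ROWS`, `z_score`, and `lower_thresholds` are illustrative.

```python
# (1969-2009 mean, value one SD below the mean, 2010 value), copied from the table
ROWS = {
    "MLB ERA": (4.0522, 3.67005, 4.0743),
    "MLB WHIP": (1.3673, 1.32313, 1.3473),
    "NL ERA": (3.91, 3.54707, 4.0183),
    "NL WHIP": (1.35, 1.30743, 1.3482),
}


def z_score(x: float, mu: float, sigma: float) -> float:
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mu) / sigma


def lower_thresholds(mu: float, sigma: float) -> list[float]:
    """Values 1, 2, and 3 standard deviations below the mean."""
    return [mu - k * sigma for k in (1, 2, 3)]


results = {}
for name, (mu, one_below, x_2010) in ROWS.items():
    sigma = mu - one_below  # assumes column (1) is exactly mean - 1 SD
    results[name] = z_score(x_2010, mu, sigma)
    print(f"{name}: sigma = {sigma:.5f}, z(2010) = {results[name]:.6f}")
```

Every |z| printed here is below 1, which is the observation the next slide builds on.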
27. Comparisons: What Does It Tell Us?
Since all of the 2010 values lie within one standard deviation of the 1969-2009 means (|z| < 1 in every case), we can say that 2010 was a relatively normal pitching year compared with post-1969 seasons, in terms of ERA and WHIP.
28. Comparisons: Defining a Smaller Interval
When we look at a scatter plot of MLB ERA between 1969 and 2009, we can see an unusual increase, so let’s apply a linear fit line (with its R²).
Although helpful, a linear fit line doesn’t give us a great description of what’s really going on, so instead let us use a LOESS fit line.
LOESS (locally weighted scatterplot smoothing) fits the values within local subsets of the data and joins those local fits into a smooth curve.
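SPSS draws the LOESS line for you. To show what such a smoother does, here is a minimal single-pass version in Python. It assumes the classic choices of tricube weights and a local straight-line fit over the nearest `frac * n` points. It omits the robustness iterations many implementations add, so its output may not match SPSS exactly. The usage example runs on synthetic data, not the ERA series.

```python
import math


def loess(xs, ys, frac=0.5):
    """Single-pass LOESS: tricube-weighted local linear fit at each x."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need matching x and y lists with at least 3 points")
    k = min(n, max(3, math.ceil(frac * n)))  # neighbourhood size
    fitted = []
    for x0 in xs:
        dists = [abs(x - x0) for x in xs]
        h = sorted(dists)[k - 1]  # distance to the k-th nearest point
        if h == 0:
            # Every neighbour shares x0: fall back to their mean y.
            same = [y for x, y in zip(xs, ys) if x == x0]
            fitted.append(sum(same) / len(same))
            continue
        # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
        w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dists]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        # Weighted least-squares line through the neighbourhood, evaluated at x0.
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Synthetic example: a flat stretch followed by a steady rise.
xs = list(range(20))
ys = [5.0 if x < 10 else 5.0 + 0.3 * (x - 10) for x in xs]
print([round(v, 2) for v in loess(xs, ys, frac=0.3)])
```

Because each point is fitted from its neighbours only, the curve stays flat early and bends upward where the rise starts, which a single straight line cannot show.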
29. Comparisons: Defining a Smaller Interval
We can now see an interesting curve in our later values.
We want to start a subinterval at that point: the last year in which ERA went down the following year before the dramatic increases.
That year turns out to be 1991.
30. Comparisons: 1991-2009 Data Descriptives
As before, we want to check skewness and homoscedasticity. Thanks to Fisher’s method, we can check skewness directly from the descriptives. As for homoscedasticity, all of the variables passed.
We can see that two variables have skewness problems.
31. Comparisons: 1991-2009 Data Descriptives
Although there are certainly methods for correcting skewness to recover a normal distribution, we can instead note that one statistic, WHIP, is regarded as normal in each league.
Therefore we will focus on the three WHIP values (MLB, AL, and NL) across the board.
32. Comparisons: Doing the Math 2
As before, we determine the values 1, 2, and 3 standard deviations below the mean and compare them with the 2010 numbers.
1991-2009   Mean      (1)       (2)       (3)
MLB WHIP    1.3996    1.36352   1.32744   1.29136
AL WHIP     1.4214    1.37659   1.33178   1.28697
NL WHIP     1.3783    1.3406    1.3029    1.2652
33. Comparisons: Doing the Math 2
1991-2009   Mean      (1)       (2)       (3)
MLB WHIP    1.3996    1.36352   1.32744   1.29136
AL WHIP     1.4214    1.37659   1.33178   1.28697
NL WHIP     1.3783    1.3406    1.3029    1.2652
2010 MLB WHIP = 1.3473
2010 AL WHIP = 1.3463
2010 NL WHIP = 1.3482
z-Scores
2010 MLB WHIP: z = -1.4496
2010 AL WHIP: z = -1.67597
2010 NL WHIP: z = -.798408
34. Comparisons: So what does this tell us???
Even when we look at a more recent interval (the 19 seasons from 1991 to 2009), last year's pitching, in terms of WHIP, was within two standard deviations of the MLB, AL, and NL averages over those 19 years (2010 MLB WHIP: z = -1.4496, 2010 AL WHIP: z = -1.67597, 2010 NL WHIP: z = -.798408).
Therefore last year was above average (a lower WHIP), but not so far off that it could be considered extremely rare or an outlier. So, once again, not exactly the “Year of the Pitcher.”
35. Is There An Easier Way???
Comparisons Cons:
• Too many numbers
• Sample Sizes
• Normality
36. Logistic Regression
Logistic Regression: when we want odds (and odds ratios) for a question that does not have a continuous answer (e.g., yes-or-no questions), logistic regression gives an equation for them based on the factors you supply.
$$P(\text{event}) = \frac{1}{1 + e^{-(a + b_1X_1 + b_2X_2 + \dots + b_iX_i)}}$$
$$\text{Odds} = \frac{P(\text{event})}{1 - P(\text{event})}$$
$$\ln(\text{odds}) = a + b_1X_1 + b_2X_2 + \dots + b_iX_i$$
a = Y-intercept
b = regression coefficient
X = factors
37. Logistic Regression: Assumptions
• DOES NOT ASSUME Linearity
• DOES NOT ASSUME Normal Distribution
• DOES NOT ASSUME Homoscedasticity
• DOES NOT ASSUME Normality of Residuals
• Does Assume Sample Representativeness
• Does Assume Levels of Measurement
• Does Assume no Multicollinearity (present if VIF > 10)
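With exactly two predictors, as in the final model later in the deck, regressing one predictor on the other gives R² = r², so both predictors share the variance inflation factor VIF = 1 / (1 - r²), where r is their Pearson correlation. A short sketch under that two-predictor assumption; the function names and the sample values are illustrative, not the study's data. With three or more predictors you would need the R² from regressing each predictor on all of the others.

```python
import math


def pearson_r(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def vif_two_predictors(u, v):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r2 = pearson_r(u, v) ** 2
    return math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)


# Illustrative values only.
print(vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1]))        # uncorrelated: 1.0
print(vif_two_predictors([1, 2, 3, 4], [1.0, 2.1, 2.9, 4.0]))  # nearly collinear: far above 10
```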
38. Logistic Regression: Behind the Scenes
Suppose we want to predict a variable Y from X, where our data are (x_1, y_1), …, (x_n, y_n).
We can look at two possibilities.
If Y has a normal distribution with mean μ and variance σ².
If Y has only two possible values, 0 and 1.
39. Linear Regression
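If Y has a normal distribution with mean μ and variance σ²,
then it has probability density function (pdf)
$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}$$
Suppose μ = a + bx.
The likelihood function is
$$L(a, b) = \frac{e^{-\frac{(y_1-(a+bx_1))^2}{2\sigma^2}}}{\sqrt{2\pi}\,\sigma} \cdots \frac{e^{-\frac{(y_n-(a+bx_n))^2}{2\sigma^2}}}{\sqrt{2\pi}\,\sigma} = \frac{e^{-\sum_{i=1}^{n}\frac{(y_i-(a+bx_i))^2}{2\sigma^2}}}{(2\pi)^{n/2}\,\sigma^{n}}$$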
40. Linear Regression
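If Y has a normal distribution with mean μ and variance σ².
Because the likelihood is a positive constant times e^{-Q/(2σ²)}, maximizing it over a and b is the same as minimizing
$$Q = \sum_{i=1}^{n} \left(y_i - (a + bx_i)\right)^2$$
This can be done by setting the derivatives with respect to a and with respect to b equal to zero.
$$\frac{\partial Q}{\partial b} = \sum_{i=1}^{n} 2\left[y_i - (a + bx_i)\right](-x_i)$$
$$\frac{\partial Q}{\partial a} = \sum_{i=1}^{n} 2\left[y_i - (a + bx_i)\right](-1)$$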
41. Linear Regression
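If Y has a normal distribution with mean μ and variance σ².
$$\frac{\partial Q}{\partial b} = \sum_{i=1}^{n} 2\left[y_i - (a + bx_i)\right](-x_i)$$
$$\frac{\partial Q}{\partial a} = \sum_{i=1}^{n} 2\left[y_i - (a + bx_i)\right](-1)$$
Solving these equations leads to the solutions
$$b = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}$$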
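A quick numerical check of this derivation: compute a and b from the closed-form solution, then confirm that both derivatives of Q vanish there. A Python sketch on small illustrative data (not baseball data); `ols_line` and `q_gradient` are hypothetical names.

```python
def ols_line(xs, ys):
    """Closed-form least-squares intercept a and slope b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx
    return y_bar - b * x_bar, b


def q_gradient(a, b, xs, ys):
    """dQ/da and dQ/db for Q = sum of (y_i - (a + b x_i))^2."""
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    dq_da = sum(2 * r * (-1) for r in resid)
    dq_db = sum(2 * r * (-x) for r, x in zip(resid, xs))
    return dq_da, dq_db


xs = [1, 2, 3, 4, 5]  # illustrative data
ys = [2, 4, 5, 4, 5]
a, b = ols_line(xs, ys)
print(a, b)                      # approximately 2.2 and 0.6
print(q_gradient(a, b, xs, ys))  # both derivatives approximately 0
```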
42. Logistic Regression
If Y has only two possible values, 0 and 1.
Its probability mass function is
$$f(y) = p^{y}(1-p)^{1-y}, \quad y = 0, 1$$
where p is the probability that y = 1 and 1 − p is the probability that y = 0.
Suppose p = a + bx.
This doesn’t work!
It allows the possibility that p < 0 or p > 1.
Let’s try odds!
43. Logistic Regression
If Y has only two possible values, 0 and 1.
$$\text{odds} = \frac{p}{1-p} = a + bx.$$
However, this does not work either, since it allows the possibility that p/(1 − p) < 0.
Let’s try this instead:
$$\log\frac{p}{1-p} = a + bx$$
This works, since the log of the odds can be any real number. Solving for p:
$$\frac{p}{1-p} = e^{a+bx} \;\Longrightarrow\; p = \frac{e^{a+bx}}{1+e^{a+bx}} = \frac{1}{1+e^{-(a+bx)}} \;\Longrightarrow\; 1 - p = \frac{1}{1+e^{a+bx}}$$
44. Logistic Regression
If Y has only two possible values, 0 and 1.
The likelihood function is
$$L(a, b) = \prod_{i=1}^{n} \underbrace{\left(\frac{e^{a+bx_i}}{1+e^{a+bx_i}}\right)^{y_i}}_{p_i^{\,y_i}} \underbrace{\left(\frac{1}{1+e^{a+bx_i}}\right)^{1-y_i}}_{(1-p_i)^{1-y_i}}$$
Finding the derivatives with respect to a and b and setting them equal to zero is very difficult, because the resulting equations have no closed-form solution. Typically this function is maximized by using iterative procedures from numerical analysis.
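One common iterative procedure is Newton-Raphson; the slides do not say which algorithm SPSS uses, so treat this as an illustration rather than a reproduction. The Python sketch fits the one-predictor model ln(p/(1 − p)) = a + bx and assumes the data are not perfectly separated (otherwise the estimates run off to infinity). For a 0/1 predictor the maximum-likelihood fit must reproduce each group's log-odds, which gives a handy check.

```python
import math


def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Maximize the logistic likelihood for ln(p/(1-p)) = a + b*x by Newton-Raphson."""
    a = b = 0.0
    for _ in range(max_iter):
        ps = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]
        # Gradient of the log-likelihood with respect to a and b.
        ga = sum(y - p for y, p in zip(ys, ps))
        gb = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Entries of the information matrix (the negative Hessian).
        ws = [p * (1.0 - p) for p in ps]
        s0 = sum(ws)
        s1 = sum(w * x for w, x in zip(ws, xs))
        s2 = sum(w * x * x for w, x in zip(ws, xs))
        det = s0 * s2 - s1 * s1
        if det <= 0:
            raise ValueError("information matrix is singular; check the data")
        # Newton step: solve [[s0, s1], [s1, s2]] @ step = gradient.
        da = (s2 * ga - s1 * gb) / det
        db = (s0 * gb - s1 * ga) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


# Illustrative check: 1 of 4 successes when x = 0, 3 of 4 when x = 1.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_logistic(xs, ys)
print(a, b)  # close to ln(1/3) and 2*ln(3)
```

Each step uses the exact second derivatives, so convergence is usually reached in a handful of iterations.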
45. Logistic Regression: Our Use
• What made 2010 noticeable to begin with?
• NO HITTERS!!!!
• Step 1, Find our Variables
46. Logistic Regression: Variables and Coding
• Games Started (GS)
• Winning Percentage of Games Started (WinPer)
• Earned Run Average (ERA)
• Complete Game Percentage of Games Started (CGPer)
• Strikeouts per Nine Innings Pitched (Kper9)
• Batting Average Against (BAA)
• Innings Pitched (IP)
• Hits per Nine Innings Pitched (Hper9)
• Shutout Percentage of Games Started (ShoPer)
• No Hitter Achieved (Coded 1 = yes, 0 = no)
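Most of these variables are rates built from raw season totals. The sketch below shows one way to derive several of them, assuming the usual definitions (per-start percentages divide by games started; per-nine-inning rates multiply by 9 and divide by innings pitched; BAA is hits over at-bats against). The field names and the sample pitcher line are hypothetical. WinPer is left out because the slide does not spell out its formula, and innings are assumed to be true innings (200 1/3, not the box-score notation 200.1).

```python
from dataclasses import dataclass


@dataclass
class SeasonTotals:
    """Hypothetical raw totals for one pitcher (field names are illustrative)."""
    games_started: int
    complete_games: int
    shutouts: int
    innings_pitched: float  # true innings, e.g. 200 1/3 -> 200.333...
    earned_runs: int
    strikeouts: int
    hits_allowed: int
    at_bats_against: int


def rate_variables(t: SeasonTotals) -> dict:
    """Derive per-start and per-nine-inning rates under assumed definitions."""
    return {
        "ERA": 9 * t.earned_runs / t.innings_pitched,
        "CGPer": t.complete_games / t.games_started,
        "ShoPer": t.shutouts / t.games_started,
        "Kper9": 9 * t.strikeouts / t.innings_pitched,
        "Hper9": 9 * t.hits_allowed / t.innings_pitched,
        "BAA": t.hits_allowed / t.at_bats_against,
    }


example = SeasonTotals(30, 3, 1, 200.0, 70, 180, 190, 760)  # hypothetical line
print(rate_variables(example))
```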
47. Logistic Regression: Preparation
Before putting our variables into the regression, we can eliminate the ones we will not need by using an independent-samples t-test to see which variables differ significantly between pitchers who threw a no-hitter and those who didn’t.
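The screening step compares each variable's mean between the two groups. SPSS reports both an equal-variances row and an unequal-variances (Welch) row for this test. The Python sketch below computes both t statistics and their degrees of freedom using only the standard library. It stops short of p-values, which need the t distribution's CDF: compare |t| against a table value (2.776 for df = 4 at α = 0.05, two-sided) or use `scipy.stats.ttest_ind`. Function names and the sample groups are illustrative.

```python
import math
from statistics import mean, variance


def pooled_t(g1, g2):
    """Student's t for two independent groups, assuming equal variances."""
    n1, n2 = len(g1), len(g2)
    sp2 = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    t = (mean(g1) - mean(g2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(g1, g2):
    """Welch's t with Welch-Satterthwaite degrees of freedom (unequal variances)."""
    n1, n2 = len(g1), len(g2)
    a, b = variance(g1) / n1, variance(g2) / n2
    t = (mean(g1) - mean(g2)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Illustrative groups, e.g. a variable's values for two sets of pitchers.
group_a = [1, 2, 3]
group_b = [4, 5, 6]
print(pooled_t(group_a, group_b))
print(welch_t(group_a, group_b))
```

With equal group sizes and equal variances, as here, the two versions agree; they diverge when the groups' spreads differ.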
49. Logistic Regression: Preparation
But HOLD IT! You still haven’t checked for multicollinearity!!!
Actually, I have; it just turns out that the two variables selected by the independent t-test pass that check.
50. Logistic Regression: Recap
What We Know
• The two variables being input are CGPer and ShoPer
• Both fit all assumptions
LET’S DO THIS THING!
52. Logistic Regression: Using It!
Here are the ln(odds) for the six pitchers who threw a no-hitter (using their statistics from before the 2010 season):
• Roy Halladay ln(odds) = -2.505 + 38.772 * (.05226) = -.47878
• Ubaldo Jimenez ln(odds) = -2.505 + 38.772 * (0) = -2.505
• Dallas Braden ln(odds) = -2.505 + 38.772 * (0) = -2.505
• Armando Galarraga ln(odds) = -2.505 + 38.772 * (0) = -2.505
• Edwin Jackson ln(odds) = -2.505 + 38.772 * (.00909) = -2.15256
• Matt Garza ln(odds) = -2.505 + 38.772 * (.02326) = -1.60316
53. Logistic Regression: Using It!
Probability Table
Pitcher ln(odds) P(no-hitter)
Roy Halladay -.47878 0.382540
Ubaldo Jimenez -2.505 0.075508
Dallas Braden -2.505 0.075508
Armando Galarraga -2.505 0.075508
Edwin Jackson -2.15256 0.104092
Matt Garza -1.60316 0.167540
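The probability column follows from inverting the log-odds, P = 1 / (1 + e^(−ln(odds))). This Python sketch recomputes both columns from the fitted intercept (−2.505) and coefficient (38.772) shown above. The per-pitcher predictor values are copied from the slide. Our reading is that they are shutout rates, given the 0 values for pitchers with no prior shutouts, but the slide itself does not name the variable.

```python
import math

INTERCEPT = -2.505
COEF = 38.772

# Predictor values for each pitcher, copied from the ln(odds) slide.
PITCHERS = {
    "Roy Halladay": 0.05226,
    "Ubaldo Jimenez": 0.0,
    "Dallas Braden": 0.0,
    "Armando Galarraga": 0.0,
    "Edwin Jackson": 0.00909,
    "Matt Garza": 0.02326,
}


def log_odds(x: float) -> float:
    """Fitted ln(odds) = a + b * x."""
    return INTERCEPT + COEF * x


def probability(ln_odds: float) -> float:
    """Invert ln(p / (1 - p)) to recover p."""
    return 1.0 / (1.0 + math.exp(-ln_odds))


table = {name: (log_odds(x), probability(log_odds(x))) for name, x in PITCHERS.items()}
for name, (lo, p) in table.items():
    print(f"{name:<18} {lo:9.5f} {p:.6f}")
```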
54. Logistic Regression: One Last Comparison
Let’s make one last comparison, between our six pitchers and the average ln(odds) of the previous six pitchers to throw no-hitters:
-1.647795
Of our six pitchers, we find only one, the Philadelphia Phillies’ Roy Halladay, above the average, at -.47878. In fact, he is above the average ln(odds) of everyone in our database who threw a no-hitter, -1.173541089.
As for our other five, the average ln(odds) of the pitchers in our database who didn’t throw a no-hitter is -1.575964583. All five fall short of this mark, the closest being Matt Garza at -1.60316.
55. Logistic Regression: What Does it Mean?
It means that four of the six pitchers who threw a no-hitter had, in layman’s terms, PULLED A MIRACLE OUT OF THEIR HAT!
Statistically, it means that four of our six pitchers not only beat the odds, they destroyed them.
Three of our pitchers (Ubaldo Jimenez, Dallas Braden, and Armando Galarraga) beat a predicted probability of about 0.0755 despite never having thrown a shutout before their no-hitters.
The other two (Edwin Jackson and Matt Garza) had odds that favored not throwing a no-hitter, even though they had previously thrown shutouts.
***Although not directly assessed in this analysis, Dallas Braden and Roy Halladay (May 29) both pitched perfect games, which are far rarer still, and Armando Galarraga’s game was perfect until the disputed 27th out.***
57. Conclusion
What does the data tell us?
Overall, pitching was average when compared with the yearly averages and standard deviations from 1969-2009 (2010 MLB ERA: z = .05783, 2010 MLB WHIP: z = -.452796, 2010 AL ERA: z = -.4051166, 2010 AL WHIP: z = -.685158, 2010 NL ERA: z = .298405, 2010 NL WHIP: z = -.0422833), and only slightly better when we reduce the interval to 1991-2009 and focus on WHIP (2010 MLB WHIP: z = -1.4496, 2010 AL WHIP: z = -1.67597, 2010 NL WHIP: z = -.798408).
Of the pitchers who made this season memorable with their no-hitters, only one showed any sign that a no-hitter could be coming (Roy Halladay, ln(odds) = -.47878). The other five all showed odds lower than the average ln(odds) of the pitchers who didn’t throw no-hitters (n = 144, average ln(odds) = -1.575964583). Three of the five actually had the lowest possible odds according to our ln(odds) model (Dallas Braden, Armando Galarraga, and Ubaldo Jimenez, ln(odds) = -2.505).
60. Bibliography
• Abu-Bader, Soleman. Advanced and Multivariate Statistical Methods for Social Science Research with a Complete SPSS Guide. Chicago, IL: Lyceum Books, 2010.
• Albert, Jim. "Is Roger Clemens' WHIP Trajectory Unusual?" Chance 22.2 (2009): 9-19.
• Bauman, Mike. "Halladay Stood Out in Year of the Pitcher." MLB News. http://mlb.mlb.com/news/article.jsp?ymd=20101116&content_id=16113190&vkey=news_mlb&c_id=mlb
• Forman, Sean, and Justin Kubatko. www.baseball-reference.com
• McCarthy, David, David Groggel, and John A. Bailer. "Career Pitching Statistics and the Probability of Throwing a No-Hitter in MLB: A Case-Control Study." Chance 23.3 (2010): 25-35.
• Schmotzer, Brian, Patrick D. Kilgo, and Jeff Switchenko. "'The Natural'? The Effect of Steroids on Offensive Performance in Baseball." Chance 22.2 (2009): 21-31.
• Wikipedia Contributors. No-Hitter. Wikipedia, The Free Encyclopedia,
http://en.wikipedia.org/wiki/No-hitter#Major_League_Baseball_no-hitters