Transferring Lean Six Sigma and DFSS Data Simply and Effectively - Baseball Analytics

Published in: Education, Technology, Sports

  1. TRANSFERRING LEAN SIX SIGMA AND DFSS DATA SIMPLY AND EFFECTIVELY: “Baseball Analytics”
     “Baseball is the only field of endeavor where a man can succeed three times out of ten and be considered a good performer.” ~Ted Williams
     4th Annual Design for Six Sigma Conference
     James M. Wasiloff, Cary Young
     US Army TACOM LCMC
     9 February 2009
  2. Agenda
       • Introduction of Baseball Analytics
       • Descriptive statistics and graphical data analysis
       • Hypothesis development and testing
       • Analysis of Variance (ANOVA)
       • Pearson Correlation Coefficient
       • Simple Linear Regression
       • Multiple Regression and Best Fit Model
       • Predictive Models
       • Statistical Process Control
       • Next Steps / Application in Other Sports
  3. Introduction
       • Why the session…
       • Better way to understand and teach LSS and DFSS tools
       • Can money spent = wins?
       • Keep it “Statistically Simple”
       • Just the beginning
     “The charm of baseball is that, dull as it may be on the field, it is endlessly fascinating as a rehash.” ~Jim Murray
  4. Test of Hypothesis
       • Null Hypothesis:
           • $H_0: \mu_1 = \mu_2$
           • MLB example: $H_0$: Mean Batting Average of the NY Yankees from 2006-2008 equals the Mean Batting Average of the Tampa Bay Rays from 2006-2008
       • Alternative Hypothesis:
       • $H_a: \mu_1 < \mu_2$ or $\mu_1 > \mu_2$
       • “They are not the same”
     “During my 18 years I came to bat almost 10,000 times. I struck out about 1,700 times and walked maybe 1,800 times. You figure a ballplayer will average about 500 at bats a season. That means I played seven years without ever hitting the ball.” ~Mickey Mantle, 1970
  5. Batting Stats
     “It ain't like football. You can't make up no trick plays.” ~Yogi Berra

     American League: Team Batting Averages
       Team             2006    2007    2008
       Baltimore        0.277   0.272   0.267
       Boston           0.269   0.279   0.280
       Chicago Sox      0.280   0.246   0.263
       Cleveland        0.280   0.268   0.262
       Detroit          0.274   0.287   0.271
       Kansas City      0.271   0.261   0.269
       LA Angels        0.274   0.284   0.268
       Minnesota        0.287   0.264   0.279
       NY Yankees       0.285   0.290   0.271
       Oakland          0.260   0.256   0.242
       Seattle          0.272   0.287   0.265
       Tampa Bay        0.255   0.268   0.260
       Texas            0.278   0.263   0.283
       Toronto          0.284   0.259   0.264

     National League: Team Batting Averages
       Team             2006    2007    2008
       Arizona          0.267   0.250   0.251
       Atlanta          0.270   0.275   0.270
       Chicago Cubs     0.268   0.271   0.278
       Cincinnati       0.257   0.267   0.247
       Colorado         0.270   0.280   0.263
       Florida          0.264   0.267   0.254
       Houston          0.255   0.260   0.263
       LA Dodgers       0.276   0.275   0.264
       Milwaukee        0.258   0.262   0.253
       NY Mets          0.264   0.275   0.266
       Philadelphia     0.267   0.274   0.255
       Pittsburgh       0.263   0.263   0.258
       San Diego        0.263   0.251   0.250
       San Francisco    0.259   0.254   0.262
       St. Louis        0.269   0.274   0.281
       Washington       0.262   0.256   0.251
  6. Test of Hypothesis
       • Are the batting averages of the National League different from those of the American League?
       • T-test
       • Interpretation: “P Low, null must go – P High, null will fly”

     Two-Sample T-Test and CI: AL, NL
            N    Mean      StDev     SE Mean
       AL   14   0.27086   0.00772   0.0021
       NL   16   0.26356   0.00704   0.0018
       Difference = mu (AL) - mu (NL)
       Estimate for difference: 0.007295
       95% CI for difference: (0.001717, 0.012872)
       T-Test of difference = 0 (vs not =): T-Value = 2.69  P-Value = 0.012

     Here the P-value (0.012) is below the usual 0.05 threshold, so the null hypothesis of equal league means "must go".
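The Minitab means above can be reproduced from the slide 5 table if each team contributes one value, its 2006-2008 average (N = 14 and N = 16 teams). The sketch below is an illustration under that assumption, and it also assumes an unpooled (Welch) two-sample t-test; the function names `team_means` and `welch_t` are made up for this example. It uses only the Python standard library, so it reports the t statistic and degrees of freedom but not the P-value.

```python
from math import sqrt
from statistics import mean, stdev

# Team batting averages for 2006, 2007, 2008, copied from the slide 5 table.
AL = {
    "Baltimore": (0.277, 0.272, 0.267), "Boston": (0.269, 0.279, 0.280),
    "Chicago Sox": (0.280, 0.246, 0.263), "Cleveland": (0.280, 0.268, 0.262),
    "Detroit": (0.274, 0.287, 0.271), "Kansas City": (0.271, 0.261, 0.269),
    "LA Angels": (0.274, 0.284, 0.268), "Minnesota": (0.287, 0.264, 0.279),
    "NY Yankees": (0.285, 0.290, 0.271), "Oakland": (0.260, 0.256, 0.242),
    "Seattle": (0.272, 0.287, 0.265), "Tampa Bay": (0.255, 0.268, 0.260),
    "Texas": (0.278, 0.263, 0.283), "Toronto": (0.284, 0.259, 0.264),
}
NL = {
    "Arizona": (0.267, 0.250, 0.251), "Atlanta": (0.270, 0.275, 0.270),
    "Chicago Cubs": (0.268, 0.271, 0.278), "Cincinnati": (0.257, 0.267, 0.247),
    "Colorado": (0.270, 0.280, 0.263), "Florida": (0.264, 0.267, 0.254),
    "Houston": (0.255, 0.260, 0.263), "LA Dodgers": (0.276, 0.275, 0.264),
    "Milwaukee": (0.258, 0.262, 0.253), "NY Mets": (0.264, 0.275, 0.266),
    "Philadelphia": (0.267, 0.274, 0.255), "Pittsburgh": (0.263, 0.263, 0.258),
    "San Diego": (0.263, 0.251, 0.250), "San Francisco": (0.259, 0.254, 0.262),
    "St. Louis": (0.269, 0.274, 0.281), "Washington": (0.262, 0.256, 0.251),
}


def team_means(league):
    """Collapse each team's three seasons into one average (an assumption)."""
    return [mean(seasons) for seasons in league.values()]


def welch_t(a, b):
    """Return (t, degrees of freedom) for an unpooled two-sample t-test."""
    va = stdev(a) ** 2 / len(a)  # squared standard error of sample a
    vb = stdev(b) ** 2 / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


al, nl = team_means(AL), team_means(NL)
t_stat, df = welch_t(al, nl)
print(f"AL mean = {mean(al):.5f}, NL mean = {mean(nl):.5f}")
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

Comparing the printed t against a t-table at the computed degrees of freedom gives the same accept/reject decision as the P-value rule quoted on the slide.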
  7. Are Salaries Correlated to Team Performance?
       • The trend is…
       • Problem statement:
           • Will increasing player salaries lead to more success?
     “Baseball was the major American sport in which money bought success.” ~George Will, Moneyball
  8. 2008 MLB Salaries and Win Count
       Team             Total Salary    Wins
       NY Yankees       $207,108,489      89
       NY Mets          $137,391,376      89
       Detroit          $137,290,196      74
       Boston           $133,220,112      95
       Chicago Sox      $121,189,332      89
       LA Angels        $118,825,333     100
       LA Dodgers       $118,188,536      84
       Chicago Cubs     $117,954,333      97
       Seattle          $116,876,482      61
       Atlanta          $102,849,666      72
       St. Louis         $99,624,449      86
       Toronto           $97,001,500      86
       Philadelphia      $95,479,880      92
       Houston           $88,930,414      86
       Cleveland         $78,970,066      81
       San Francisco     $76,194,000      72
       Milwaukee         $74,687,499      90
       Cincinnati        $74,117,695      74
       San Diego         $72,626,616      63
       Colorado          $68,655,500      74
       Baltimore         $66,806,249      68
       Texas             $66,312,326      79
       Arizona           $66,202,712      82
       Kansas City       $57,855,500      75
       Minnesota         $56,932,766      88
       Washington        $54,166,000      59
       Pittsburgh        $48,689,783      67
       Oakland           $47,167,126      75
       Tampa Bay         $43,422,997      97
       Florida           $22,650,000      84
  9. Correlation Between Salary and Wins?
  10. Use these Derivation Formulae or?
  11. Use This Simple Graphic?
      Pearson Correlation Coefficient: Definition, Values of r
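For reference, the standard definition of the Pearson correlation coefficient for $n$ paired observations $(x_i, y_i)$ can be written as follows. The slide's own notation is not preserved in this transcript, so the symbols here are an assumption.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

Values of $r$ near $+1$ or $-1$ indicate a strong linear relationship (positive or negative), and values near $0$ indicate little or no linear relationship.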
  12. Correlation Coefficient
       • Graphic approximation… what do you think?
       • Minitab results: Pearson correlation of Total Salary 2008 and Wins in 2008 = 0.323
       • Interpretation of results
  13. American League West in 2002 (“Moneyball” Data Set)
      Pearson correlation of Wins and Payroll = -0.928
       Team       Wins   Payroll
       Oakland     103   $41,942,665
       Anaheim      99   $62,757,041
       Seattle      93   $86,084,710
       Texas        73   $106,915,180
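The correlation reported for this four-team data set is small enough to check by hand or with a few lines of code. The following is a minimal sketch using only the Python standard library; `pearson_r` is a helper name chosen for this example, not something taken from the slides or from Minitab.

```python
from math import sqrt

# 2002 American League West wins and payroll, copied from the slide table.
teams = [
    ("Oakland", 103, 41_942_665),
    ("Anaheim", 99, 62_757_041),
    ("Seattle", 93, 86_084_710),
    ("Texas", 73, 106_915_180),
]


def pearson_r(xs, ys):
    """Pearson correlation: co-deviation sum over the product of spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


wins = [w for _, w, _ in teams]
payroll = [p for _, _, p in teams]
r = pearson_r(wins, payroll)
print(f"Pearson r (wins vs payroll) = {r:.3f}")  # the slide reports -0.928
```

With only four teams, a strong negative r like this is striking but rests on very little data, which is worth keeping in mind next to the 30-team 2008 result of 0.323.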
  14. ANOVA
       • Null Hypothesis:
           • $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_n$
           • MLB example: $H_0$: Mean Batting Average of the NY Yankees equals the Mean Batting Average of the Tampa Bay Rays equals the Mean Batting Average of the NY Mets equals the Mean Batting Average of the …
       • Alternative Hypothesis:
       • $H_a$: At least one $\mu_k$ is different from one other $\mu_k$
       • MLB example: At least one team has a Mean Batting Average different from that of at least one other team
      “A baseball fan has the digestive apparatus of a billy goat. He can, and does, devour any set of diamond statistics with insatiable appetite and then nuzzles hungrily for more.” ~Arthur Daley
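To make the hypothesis concrete, here is a small one-way ANOVA sketch for the three teams named in the example, using their 2006-2008 batting averages from the slide 5 table. The slides do not report an ANOVA result, so this is purely an illustration; treating three seasons as three independent observations per team is an assumption, and `one_way_anova_f` is a name invented for this example.

```python
from statistics import mean

# 2006, 2007, 2008 team batting averages, copied from the slide 5 table.
groups = {
    "NY Yankees": [0.285, 0.290, 0.271],
    "Tampa Bay": [0.255, 0.268, 0.260],
    "NY Mets": [0.264, 0.275, 0.266],
}


def one_way_anova_f(samples):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for s in samples for v in s]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Variation of observations around their own group mean.
    ss_within = sum((v - mean(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


f_stat, df_b, df_w = one_way_anova_f(list(groups.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The F statistic is then compared with the critical value from an F table at the chosen significance level; a larger F favors the alternative that at least one team mean differs.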
  15. Regression Analysis
       • Is it possible to model and predict number of wins for a season based on statistical parameters?
       • The initial simple linear regression model, 2002 data:
       Team       Wins   Payroll
       Oakland     103   $41,942,665
       Anaheim      99   $62,757,041
       Seattle      93   $86,084,710
       Texas        73   $106,915,180
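The fitted equation from this slide is not preserved in the transcript. As a hedged illustration, an ordinary least squares line of wins on payroll can be fitted to the four data points as shown below. Expressing payroll in millions of dollars, and the helper names `fit_line` and `predict_wins_from_payroll`, are choices made for this sketch only.

```python
# 2002 AL West data from the slide; payroll converted to millions of dollars.
payroll_musd = [41.942665, 62.757041, 86.084710, 106.915180]
wins = [103, 99, 93, 73]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


intercept, slope = fit_line(payroll_musd, wins)


def predict_wins_from_payroll(payroll_in_millions):
    """Predicted wins from the fitted line (only sensible inside the data range)."""
    return intercept + slope * payroll_in_millions


print(f"Wins = {intercept:.1f} + ({slope:.3f}) * Payroll ($M)")
```

The negative slope mirrors the negative correlation in this division and year; with four points it describes this data set and should not be read as a general rule about payroll.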
  16. Multiple Regression and Best Fit Model
       • Regression studies the relationship between the mean value of a random variable and the corresponding values of one or more independent variables.
           • A model for predicting one variable from another.
           • A statistical analysis assessing the association between two variables. Regression analysis is a method of analysis that enables you to quantify the relationship between two or more variables (X) and (Y) by fitting a line or plane through all the points such that they are evenly distributed about the line or plane.
       • Multiple regression is a method of determining the relationship between a continuous process output (Y) and several factors (Xs).
  17. American League West in 2002 (“Moneyball” Data Set)
       Team       Wins   Payroll
       Oakland     103   $41,942,665
       Anaheim      99   $62,757,041
       Seattle      93   $86,084,710
       Texas        73   $106,915,180
  18. Exploratory Data Analysis
      What does it mean?
  19. Testing the Predictive Model
       • Tigers 2008 data…
           • Here is the predictive transfer function from Minitab:
             Wins = 32.1 + 1.48 Average Age - 34.5 Team ERA + 154 Team Batting Average + 0.582 Saves (P) + 0.150 Runs (P) - 0.0202 Walks (P) - 0.0087 SO (P)
           • Testing on 2008 Data:
               • Actual win count = 74
               • Predicted win count = 74.26
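The transfer function above is a plain linear combination, so it can be wrapped in a small function. The sketch below copies the coefficients from the slide; the parameter names are made up for this example, the "(P)" terms are kept with a `_p` suffix as on the slide, and the input values are HYPOTHETICAL placeholders because the Tigers' 2008 inputs are not given in the transcript.

```python
INTERCEPT = 32.1

# Coefficients copied from the slide's transfer function.
COEFFICIENTS = {
    "average_age": 1.48,
    "team_era": -34.5,
    "team_batting_average": 154,
    "saves_p": 0.582,
    "runs_p": 0.150,
    "walks_p": -0.0202,
    "so_p": -0.0087,
}


def predict_wins(**stats):
    """Evaluate the slide's linear transfer function for one team-season."""
    missing = set(COEFFICIENTS) - set(stats)
    if missing:
        raise ValueError(f"missing inputs: {sorted(missing)}")
    return INTERCEPT + sum(COEFFICIENTS[k] * stats[k] for k in COEFFICIENTS)


# HYPOTHETICAL inputs for illustration only; these are not the Tigers' 2008 stats.
example = dict(
    average_age=29.0,
    team_era=4.50,
    team_batting_average=0.270,
    saves_p=35,
    runs_p=800,
    walks_p=600,
    so_p=1000,
)
print(f"Predicted wins: {predict_wins(**example):.2f}")
```

Feeding in a team's actual season statistics and comparing the result with its real win count is the check the slide describes (74.26 predicted against 74 actual).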
  20. Statistical Process Control and Statistical Thinking
       • “Statistical process control is the application of statistical methods to identify and control the special cause of variation in a process” – iSixSigma.com
       • Statistical Thinking: The process of using wide-ranging and interacting data to understand processes, problems, and solutions.
           • The opposite of “one factor at a time”, where the tendency is to change one factor and “see” what happens.
           • Statistical thinking is the tendency to want to understand situational phenomena over a wide range of data where several control factors may be interacting at once to produce an outcome.
           • Common cause variation becomes your friend and special cause variation your enemy.
           • Attribute judgements of good and bad are replaced with estimates of significance with given confidence.
  21. Example 1: Notional Data – Status at Game 37
       • Range outside UCL indicates “out of control”
       • Need to investigate “special cause”
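The chart and its data are not preserved in the transcript, so the following is only a sketch of how a "range outside UCL" signal can be detected. It assumes a moving range chart (ranges between consecutive games) with the standard constant D4 = 3.267 for subgroups of size 2; the slide may have used a different chart type. The game-by-game numbers are HYPOTHETICAL.

```python
D4_N2 = 3.267  # standard range-chart constant for subgroups of size 2


def moving_range_signals(values):
    """Return (average moving range, UCL, game numbers whose range exceeds UCL)."""
    ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(ranges) / len(ranges)
    ucl = D4_N2 * mr_bar
    # ranges[i] compares game i+1 with game i+2 (1-indexed), so flag game i+2.
    flagged = [i + 2 for i, r in enumerate(ranges) if r > ucl]
    return mr_bar, ucl, flagged


# HYPOTHETICAL hits per game for ten games; not data from the presentation.
hits_per_game = [8, 9, 7, 8, 9, 8, 7, 9, 8, 18]
mr_bar, ucl, flagged = moving_range_signals(hits_per_game)
print(f"MR-bar = {mr_bar:.2f}, UCL = {ucl:.2f}, games to investigate: {flagged}")
```

A flagged game is a prompt to look for a special cause, in line with the slide; points inside the limits are treated as common cause variation and left alone.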
  22. Which Method is Earliest at Detecting a “Special Cause”?
       • Old Way
       • Analytics Approach
  23. Next Steps
       • Additional MLB Analytics
       • System approach to baseball
       • Other sports?
           • Golf Fishbone Cause and Effect Analysis example
      “Baseball statistics are like a girl in a bikini. They show a lot, but not everything.” ~Toby Harrah, 1983
  24. (Slide 26) Wasiloff – Young Baseball Analytics: “Systems Approach to Batting”
      Diagram labels:
       • Analytic Based Reactive Batting Problem Solving
       • Pre Emptive Batting Problem Discovery
       • Optimal Batting System Design
       • Accessories
       • Stadium
       • Batter Fundamentals
       • Bat
      Our Mission: Develop world class batters who use consistent, disciplined, and proven methods of eliminating or preventing hitting problems, thereby providing our fans excellence in batting and league leading run creation, resulting in high level fan satisfaction
      Systems Based Potential Causes
       • Lean Six Sigma Analytics
       • Design for Six Sigma
       • Statistical Methods
       • Correlation/Regression Analysis
       • Design of Experiments
       • VOC / QFD
       • Taguchi Methods
       • Innovation Methods
  25. (Slide 28) Questions / comments? Thanks!
      “Baseball? It's just a game - as simple as a ball and a bat. Yet, as complex as the American spirit it symbolizes. It's a sport, business - and sometimes even religion.” ~Ernie Harwell, "The Game for All America," 1955