
Framework for Forecasting Professional Soccer Player Career Paths



Slide deck of a presentation given at the 2015 OptaPro Analytics Forum on a statistical forecasting model that projects performance output of a football player as he transitions between multiple leagues in a career. The objective is to create a soccer equivalent of projection systems such as PECOTA in baseball and SCHOENE in basketball while incorporating machine learning techniques as much as possible. Work on this model began at the beginning of the year, so don't expect a lot of results to be presented. The goal of this talk is to present at a high level the objectives and methodology of the model, obtain feedback from the soccer analytics community, and gauge interest from the broader football industry.

Published in: Sports


  1. 2015 OptaPro Analytics Forum
     Framework for a Player Career Forecast Model Between Multiple Leagues
     Howard Hamilton, Founder, Soccermetrics Research
  2. Main Points
     Developed a statistical career-forecasting framework for football players, automated by applying machine-learning techniques.
     Inputs
       1. Season statistical performance
       2. Physical / playing characteristics
     Outputs
       1. Identify peer group of players with comparable performance
       2. Forecast future statistical performance over a limited horizon
       3. Translate performance in one domestic league competition to performance in another
     Expected Interest
       • Clubs
       • Media
       • Betting
       • Fantasy
     Early Stage: Framework > Results
  3. Prior Art: Forecasting Statistical Performance in Sport
     Baseball
       1. Similarity Scores (Bill James, 1980s)
       2. Vladimir Forecasting System (Gary Huckabay, 1990s)
       3. PECOTA (Nate Silver/Baseball Prospectus, 2003)
     PECOTA-inspired forecasting models in other sports
       1. SCHOENE (Kevin Pelton/Basketball Prospectus/ESPN, mid 2000s)
       2. KUBIAK (Aaron Schatz/Football Outsiders, mid 2000s)
       3. VUKOTA (Puck Prospectus, 2010)
     Individual / team projection models in football
       1. Aaron Nielsen (ENB Sports)
          • One-year projection of individual/team performance
       2. Pérez Sánchez et al. (2013)
          • Estimating goal-scoring performance in Spanish league
  4. Challenges
     Data scarcity
       • Range of seasons
       • Statistical categories collected
       • League variations
     Characteristics of domestic leagues
       • Differences in aging curves between leagues
       • Would a 'universal' aging curve work? Not sure...
       • Statistical translations between leagues
       • Some leagues are very connected, others less so
  5. Modeling Components
     Data Source: ENB Soccer Database
       • 60,000+ players
       • 75 domestic league competitions
       • 500+ clubs
     Individual season statistics
       • 1992-93 to 2011-12 (European)
       • 1992 to 2012 (American/Scandinavian/Japanese)
     Database Analysis
       All players
         • Season
         • Team
         • Competition
         • Appearances
         • Subs
         • Minutes
         • Yellows / reds
       Field players
         • Goals
         • Assists
         • Shots
         • Fouls
       Goalkeepers
         • Goals allowed
         • Clean sheets
         • Shots faced
         • Wins
         • Draws
         • Losses
  6. Model Description
     Normalize statistical categories: (x − μ)/σ
       • Convert statistical values of players in same competition and season to “standard score”
       • Places statistical performances on one standard distribution
       • This is what allows us to compare players
     Identify K comparable players (“nearest neighbors”): K-NN
       • Consider players of same age and position
       • Calculate similarity score between statistical records
       • Comparable players: Score about 0.90 - 0.95
       • Relax threshold for “unique” players
     Forecast future performance with historical performance of comparable players
       • Using regression techniques
       • Adjust for aging and regression to mean
       • Convert to statistics for league competition of interest
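The normalize and nearest-neighbor steps on the Model Description slide can be sketched in a few lines of Python. This is a minimal illustration, not the model's actual implementation: the deck does not say how the similarity score is computed, so the `similarity` function below (inverse of one plus Euclidean distance between z-score vectors) is an assumption, as are all function names and the toy data.

```python
import math
from statistics import mean, pstdev


def standard_scores(values):
    """Convert raw values from one competition-season to z-scores: (x - mu) / sigma."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every player has the same value, so nobody deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def similarity(a, b):
    """ASSUMED metric: map Euclidean distance between z-score vectors into (0, 1]."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)


def nearest_neighbors(target, candidates, k=3, threshold=0.0):
    """Return up to k (name, score) pairs with score >= threshold, best first.

    `candidates` maps a player-season label to its z-score vector. Filtering
    by age and position is assumed to have happened before this call.
    """
    scored = [(name, similarity(target, vec)) for name, vec in candidates.items()]
    scored = [pair for pair in scored if pair[1] >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical goal totals for three players in one competition-season.
    print(standard_scores([10, 20, 30]))
    # Hypothetical z-score vectors (goals, assists) for candidate comparables.
    pool = {"player_a": [0.1, 0.0], "player_b": [2.0, 2.0], "player_c": [0.0, 0.0]}
    print(nearest_neighbors([0.0, 0.0], pool, k=2))
```

Lowering `threshold` corresponds to the slide's "relax threshold for unique players": a player with few close matches still gets a comparison group, at the cost of looser comparables.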
  7. Nearest Neighbor Results
     Cristiano Ronaldo: Forward, aged 27 (Spanish Primera 2011/12)
     Active Player. Scored 46 goals in 2011/12 La Liga season.

     Player              League                    Season     Similarity
     Osvaldo Val Baiano  Brazil Serie B            2007       0.961
     Wayne Rooney        English Premier League    2011-2012  0.957
     Oscar Cardozo       Portugal Primeira Liga    2009-2010  0.954
     Maciej Zurawski     Poland Ekstraklasa        2002-2003  0.939
     Carlos Tevez        English Premier League    2010-2011  0.926
     Javi Moreno         Spanish Primera           2000-2001  0.925
     Katlego Mphela      South Africa PSL          2010-2011  0.913
     Matt Tubbs          England Conference        2010-2011  0.913
     Kris Boyd           Scotland Premier League   2009-2010  0.905
     Goncalves Jonas     Brazil Serie A            2010       0.904
     Rickie Lambert      England League One        2008-2009  0.901
     Mario Bermejo       Spanish Segunda           2004-2005  0.897
     Alan Shearer        English Premier League    1996-1997  0.877
     Kevin Phillips      English Premier League    1999-2000  0.863

     Nearest Neighbor groups leading goalscorers at Ronaldo's age
     0.96 similarity metric – few players had a season as dominant
     Photo by Simon Harriyott
  8. Nearest Neighbor Results
     Marvin Bejarano: Defender, aged 21 (Bolivia Liga Profesional 2008)
     Active Player. Has played for one club over his career. 5 caps for Bolivia.

     Player              League                    Season     Similarity
     Fernando Tobio      Argentina Primera         2009-2010  0.996
     Charlie Wassmer     England League Two        2011-2012  0.990
     Oswaldo Alanis      Mexico Primera            2009-2010  0.985
     Jan Vertonghen      Netherlands Eredivisie    2007-2008  0.984
     Paul Papp           Romania Liga I            2009-2010  0.957
     Santiago Vergini    Paraguay Primera          2009       0.957
     Mauricio Casierra   Colombia Primera          2006       0.957
     Rafael Delgado      Argentina Nacional B      2010-2011  0.955
     Konstantin Engel    Germany 2 Bundesliga      2008-2009  0.954
     Jae Sung Lee        South Korea K-League      2009       0.953
     Koybasi Ismail      Turkey Super Lig          2009-2010  0.953
     Luke O'Brien        England League Two        2008-2009  0.951
     Hector Quinones     Colombia Primera          2012       0.950
     Mate Ghvinianidze   Germany 2 Bundesliga      2006-2007  0.950
     Franz Schiemer      Austria 1 Bundesliga      2006-2007  0.947

     0.996 similarity metric – very comparable, but limited defensive data
  9. Nearest Neighbor Results
     Iker Casillas: Goalkeeper, aged 26 (Spanish Primera, 2006-2007)
     Active Player. Has played for one club over his career. 450+ appearances at Real Madrid, 160 caps for Spain.

     Player              League                    Season     Similarity
     Gianluigi Buffon    Italy Serie A             2003-2004  0.994
     Mark Crossley       English Premier League    1994-1995  0.992
     Dionissis Chiotis   Greece Super League       2002-2003  0.990
     Steve Mandanda      France Ligue 1            2010-2011  0.989
     Marco Wolfli        Switzerland Super League  2007-2008  0.989
     Shay Given          English Premier League    2001-2002  0.986
     Guillermo Ochoa     Mexico Primera            2010-2011  0.986
     Eduardo Martini     Brazil Serie A            2004       0.985
     Morgan de Sanctis   Italy Serie A             2002-2003  0.984
     Hiroki Iikura       Japan J1-League           2011       0.982
     Cesar Lainez        Spanish Segunda           2002-2003  0.981
     Marcelo Grohe       Brazil Serie A            2012       0.981
     Hitoshi Sogahata    Japan J1-League           2005       0.980
     Henri Sillanpaa     Finland Veikkausliiga     2004       0.980

     Interesting that Gianluigi Buffon is closest comparable at 26 y/o
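Once a comparables list like the ones above exists, the forecast step from the Model Description slide (use the comparables' historical performance, regress to the mean, convert to the league of interest) could look roughly like the sketch below. The deck only says "regression techniques", so the similarity-weighted average, the `shrink` factor, and all names and numbers here are assumptions for illustration. No aging adjustment is modeled.

```python
def forecast_z(comparables, shrink=0.8):
    """Similarity-weighted average of comparables' next-season z-scores.

    `comparables` is a list of (similarity, next_season_z) pairs.
    `shrink` is an ASSUMED regression-to-the-mean factor: the weighted
    average is pulled toward the league mean, which is z = 0.
    """
    total_weight = sum(sim for sim, _ in comparables)
    if total_weight == 0:
        return 0.0  # no usable comparables: fall back to the league mean
    weighted = sum(sim * z for sim, z in comparables) / total_weight
    return shrink * weighted


def to_league_units(z, league_mean, league_std):
    """Convert a standard score into raw units for a target competition."""
    return league_mean + z * league_std


if __name__ == "__main__":
    # Hypothetical comparables: (similarity, z-score of goals in the following season).
    comps = [(0.96, 1.8), (0.95, 2.1), (0.93, 0.9)]
    z_next = forecast_z(comps)
    # Hypothetical target league with mean 4.0 goals and standard deviation 5.0.
    print(to_league_units(z_next, league_mean=4.0, league_std=5.0))
```

Working in standard scores keeps the league translation to a single line, which is one reason the normalization step matters for cross-league comparisons.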
  10. Conclusions
      Projecting career performance is difficult
        • Next steps:
          ● Use nearest neighbors to forecast future performance
          ● Quantify adjustments for age, league quality, position
          ● Create multiple career forecast paths with probabilities
        • Limited horizons important (2-3 years)
        • Probabilistic projections sensible, not necessarily useful
        • Accuracy vs. clarity
        • Diverse range of statistical categories necessary:
          ● Attacking and defending contributions and impact
          ● Advanced metrics
      Data normalization is a necessity!
      Club projections are a logical step
        • Need to enforce a “conservation of goals” in the universe of data in our system, i.e. total goals scored == total goals conceded
      Photo by Simon Harriyott
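The "conservation of goals" constraint lends itself to a simple data check. The sketch below assumes each player-season row carries `goals` (scored) and `goals_allowed` (conceded, normally nonzero only for goalkeepers); these field names and the sample rows are hypothetical, not the ENB database schema. It also ignores own goals, which would need separate handling in real data.

```python
from collections import defaultdict


def goal_imbalance(rows):
    """Return {(competition, season): scored - conceded} for unbalanced groups.

    An empty result means total goals scored equals total goals conceded
    in every competition-season present in `rows`.
    """
    balance = defaultdict(int)
    for row in rows:
        key = (row["competition"], row["season"])
        balance[key] += row.get("goals", 0) - row.get("goals_allowed", 0)
    return {key: diff for key, diff in balance.items() if diff != 0}


if __name__ == "__main__":
    # Hypothetical rows: two field players and two goalkeepers in one league-season.
    rows = [
        {"competition": "League X", "season": "2010", "goals": 3},
        {"competition": "League X", "season": "2010", "goals": 2},
        {"competition": "League X", "season": "2010", "goals_allowed": 4},
        {"competition": "League X", "season": "2010", "goals_allowed": 1},
    ]
    print(goal_imbalance(rows))
```

A non-empty result points at competition-seasons where player records are missing or miscounted, which matters before any club-level projection is built on top of individual totals.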
  11. Knowledge Transfer
      Customization
        • Integrate with financial/medical databases, scouting data
        • Greatest utility at football operations/sporting director level
      Biggest challenge: Data!
        • Not just data on all players in league, but players in all other leagues of interest
        • Some statistical categories not available in some leagues
        • As always, data collection and analysis problems are non-trivial
      Photo by JD Hancock
  12. Thank You!
      Special Thanks To:
        • OptaPro (Invitation to Forum)
        • Aaron Nielsen (ENB Database access)
        • Simon Harriyott (Presentation at Forum)
      For more information contact Soccermetrics Research: @soccermetrics