Major League Soccer Analytics                                 with Python                                   Chris Armstron...
The highest paid player, Thierry Henry earns            Results         $5,000,000.                                       ...
Owners can get the similar goal/assist production FIGURE 3 FORWARD, DEFENDER AND MIDFIELDER GOALS                         ...
Game Winning Assists and Salary to summarize the          main characteristics in easy-to-understand form.                ...
November 13, 2012, Hoboken, NJMajor League Soccer Analytics with Python                   5
Upcoming SlideShare
Loading in …5
×

Major League Soccer Player Analysis-Report

577 views

Published on

  • Be the first to comment

  • Be the first to like this

Major League Soccer Player Analysis-Report

  1. 1. Major League Soccer Analytics with Python Chris Armstrong, Dan Derringer, Jude Ken-Kwofie, Hemanth Mahadevaiah and Sujana Veeraganti Stevens Institute of Technology,chris.r.armstrong@gmail.com, dderringer311@gmail.com, jkenkwof@stevens.edu, hemanth.m1@gmail.com, andsujanaveeraganti@gmail.comAbstract– Unlike European soccer leagues and popular the salary data from 2012 would be used.American sports, relatively little work as been done onMajor League Soccer (MLS) player and team The next issue faced was how to merge the tables. The firstperformance analytics.With MLS growing in popularity idea was to use a for loop in Python to match the players’combined with the small community of individuals names and produce a master table with all their all-time statsconducting MLS analytics, we decided to apply web and salaries from 2012. Although it was successful, it wasanalytics concepts taught in Business Intelligence & less than ideal as it would take over 45 minutes to mergeAnalytics class (BIA 660) to help determine player these five tables. The next idea was to write a script in R toratings and compensation. To this end we used the merge the tables; since R is designed to be a statistical toolPython programming language and related modules to: and can better manipulate tables. This plan successfully1)crawl the web, 2) scrape relevant data, 3) compile reduced the processing time down to less than a minute andcaptured data into a data set, 4) determine player ratings we added the ability for Python to run the R scriptand simple statistics, and 5) create attractive plots automatically after the data scraping was complete.showcasing the data relationships. However, this wasn’t as clean as we would like it to be. The final solution was to use the Pandas module for Python. TheIndex Terms–Major League Soccer, Python, Visualization, Pandas module gave us the ability to manipulate data theWeb Scraping. way we need it, without having to go outside of Python. PROJECT GOAL The key Python scripts used in our work are as follows:The primary goal of the project was to use BIA 660 webanalytics lessons on the Python programming language and MLS_Statistical_Application.py – Includes a fullrelated modules to analyze and visualize MLS specific scraping function plusan interactive plotting featuredata.The following Python modules were used in this work: developed in Tkinter. The Tkinter function imports a comma-separated value (csv) and allows the user Web – mechanize, urllib2, BeautifulSoup, PyPDF2 to plot results by selecting column names as the x and y-axis. Regular Expression – re System & I/O - sys, StringIO, csv, print, json DATA ANALYSIS Data Analysis - R, Pandas, Numpy, Scipy Data Visualization – Tkinter, Matplotlib Initially, our analysis focused on determining 1) the best XI MLS players of all time and 2) if a reasonable correlationThe following sections describe our python data scraping, exists between player compensation and performance, i.e.,compilation, and analysis and visualization efforts. goals, assists, and shots. However, due to the lack of publically available player passing efficiency data we found DATA SCRAPING& COMPILATION it challenging to build relationships between salary andThe Python script has gone through severaliterations. The performance and to determine the best players. Ultimately,original plan was to extract four tables of players’ all-time we decided to analyze player compensation versus playerstats and six pdf files with salary data for players in 2007- goals, assists, shot as well as to simply calculate statistics2012. The idea was to merge these ten lists to create one based on player minutes, goal, assists, shots, shots on goal,master list; however, not all players in the all-time stats game winning goals and game winning assists. From a datatables collected a salary in 2012 and not all of those that set of 251 MLS players we determined for the year 2012:collected salaries in 2012 also collected a salary in 2007.This issue drastically reduced the number of records to The average MLS player earns $200,262.58.analyze in the master list. Therefore, it was decided to only The lowest paid player, Jeb Brovsky earns $33,750. November 13, 2012, Hoboken, NJ Major League Soccer Analytics with Python 1
  2. 2. The highest paid player, Thierry Henry earns Results $5,000,000. The above statistics shows the average salary, median,Out of the 251 players, 55.77% of the players make salaries lowest salary and highest salary by position. Also includedgreater than or equal to $100,000. Additional statistics are in the table are the top five players with highest salarypresented below. among each position. As anticipated, the forwards are paid a higher salary of the four positions. Goalkeepers are theWe also found with the data on hand that in the MLS there is lowest wage earners on average.little to no correlation between player’s salaries and goals,assists and shots (shown in Figure 1). Player compensation FIGURE 1 GOALS AND ASSISTS VERSUS SALARYseems to be based on their popularity than their ability toscore goals, assists and shots. There is a solid relationshipbetween players Google search hit rate and salary.The lack of correlation between salary and performance is aninteresting result since in other leagues the highest paidplayers are usually the best at scoring and assisting. Asmentioned earlier, an adequate data set on player passingmay provide better insights and results between salary andperformance. DATA VISUALIZATIONThe visual representation of the statistics was generated withR, Matplotlib and Pandas. Scatter plots and histograms weredeveloped to show: Player compensation versus player goals, assists and shots (scatter plots) Player minutes, goal, assists, shots, shots on goal, game winning goals and game winning assists (histrograms)The following section presents a few of the generatedvisuals. FIGURES, TABLES AND EQUATIONS TABLE 1 - PLAYER PAY BY POSITION November 13, 2012, Hoboken, NJ Major League Soccer Analytics with Python 2
  3. 3. Owners can get the similar goal/assist production FIGURE 3 FORWARD, DEFENDER AND MIDFIELDER GOALS from someone making < $200K as with someone VERSUS SALARY making >$400K to $1.2M. This tends to suggest that higher paid players have the same impact on goals or assists as a low wager, which is interesting. Data shows that that the players have similar skill sets. It takes special players to score goals or give assists. FIGURE 2 - 3D PLOT OF FORWARDS GOALS, ASSISTS AND MINUTES Figure 2 shows a 3D rendering of player assists, minutes and game winning assists. In general, the plots sh ows little correlation between the fields. However, for defenders there is a strong correlation between the fields suggesting assists by defenders lead to wins. FIGURE 3 - HISTROGRAMS OF PLAYER MINUTES, GOALS, ASSISTS, SHOTS, SHORTS ON GOAL, GAME WINNING GOALS, GAME WINNING ASSISTS AND SALARY Results The plot shows exploratory data analysis of theResults various attributes like Minutes, Goals, Shots, There is little correlation between a goals or assists Assists, and Shots on Goals, Game Winning Goals, and a high salary. November 13, 2012, Hoboken, NJ Major League Soccer Analytics with Python 3
  4. 4. Game Winning Assists and Salary to summarize the main characteristics in easy-to-understand form. CONCLUSIONUnlike European soccer leagues and popular Americansports, relatively little work as been done on Major LeagueSoccer (MLS) player and team performance analytics. WithMLS growing in popularity combined with the smallcommunity of individuals conducting MLS analytics, wedecided to apply web analytics concepts taught in BusinessIntelligence & Analytics class (BIA 660) to help determineplayer ratings and compensation.The primary goal of the project was to use BIA 660 webanalytics lessons on the Python programming language andrelated modules to analyze and visualize MLS specific data. ACKNOWLEDGMENTWe acknowledge the mentoring of Professor Winter Mason. REFERENCES PYTHON PROGRAMMING LANGUAGE – HTTP://WWW.PYTHON.ORG/ HTTP://WIKI.PYTHON.ORG/MOIN/TKINTER 1 AUTHOR INFORMATION Chris Armstrong,chris.r.armstrong@gmail.com Dan Derringer, dderringer311@gmail.com Jude Ken-Kwofie, jkenkwof@stevens.edu Hemanth Mahadevaiah,hemanth.m1@gmail.com Sujana Veeraganti, sujanaveeraganti@gmail.com1 Stevens Institute of Technology Business Intelligence & AnalyticsGraduate Students November 13, 2012, Hoboken, NJ Major League Soccer Analytics with Python 4
  5. 5. November 13, 2012, Hoboken, NJMajor League Soccer Analytics with Python 5

×