Major League Soccer Player Analysis-Report


Major League Soccer Analytics with Python

Chris Armstrong, Dan Derringer, Jude Ken-Kwofie, Hemanth Mahadevaiah and Sujana Veeraganti
Stevens Institute of Technology
November 13, 2012, Hoboken, NJ

Abstract - Unlike European soccer leagues and popular American sports, relatively little work has been done on Major League Soccer (MLS) player and team performance analytics. With MLS growing in popularity and only a small community of individuals conducting MLS analytics, we decided to apply web analytics concepts taught in the Business Intelligence & Analytics class (BIA 660) to help determine player ratings and compensation. To this end we used the Python programming language and related modules to: 1) crawl the web, 2) scrape relevant data, 3) compile captured data into a data set, 4) determine player ratings and simple statistics, and 5) create attractive plots showcasing the data relationships.

Index Terms - Major League Soccer, Python, Visualization, Web Scraping.

PROJECT GOAL

The primary goal of the project was to use BIA 660 web analytics lessons on the Python programming language and related modules to analyze and visualize MLS specific data. The following Python modules were used in this work:

  • Web - mechanize, urllib2, BeautifulSoup, PyPDF2
  • Regular Expression - re
  • System & I/O - sys, StringIO, csv, print, json
  • Data Analysis - R, Pandas, Numpy, Scipy
  • Data Visualization - Tkinter, Matplotlib

The following sections describe our Python data scraping, compilation, analysis and visualization efforts.

DATA SCRAPING & COMPILATION

The Python script has gone through several iterations. The original plan was to extract four tables of players' all-time stats and six PDF files with salary data for players in 2007-2012. The idea was to merge these ten lists to create one master list; however, not all players in the all-time stats tables collected a salary in 2012, and not all of those who collected a salary in 2012 also collected one in 2007. This issue drastically reduced the number of records to analyze in the master list. Therefore, it was decided that only the salary data from 2012 would be used.

The next issue faced was how to merge the tables. The first idea was to use a for loop in Python to match the players' names and produce a master table with all their all-time stats and salaries from 2012. Although it was successful, it was less than ideal, as it took over 45 minutes to merge these five tables. The next idea was to write a script in R to merge the tables, since R is designed as a statistical tool and can manipulate tables more easily. This plan reduced the processing time to less than a minute, and we added the ability for Python to run the R script automatically after the data scraping was complete. However, this wasn't as clean as we would have liked. The final solution was to use the Pandas module for Python, which gave us the ability to manipulate data the way we needed without having to go outside of Python.

The key Python scripts used in our work are as follows:

  • MLS_Statistical_Application.py - Includes a full scraping function plus an interactive plotting feature developed in Tkinter. The Tkinter function imports a comma-separated values (CSV) file and lets the user plot results by selecting column names for the x- and y-axes.

DATA ANALYSIS

Initially, our analysis focused on determining 1) the best XI MLS players of all time and 2) whether a reasonable correlation exists between player compensation and performance, i.e., goals, assists, and shots. However, due to the lack of publicly available player passing efficiency data, we found it challenging to build relationships between salary and performance and to determine the best players. Ultimately, we decided to analyze player compensation versus player goals, assists and shots, as well as to calculate simple statistics based on player minutes, goals, assists, shots, shots on goal, game winning goals and game winning assists.

From a data set of 251 MLS players we determined for the year 2012:

  • The average MLS player earns $200,262.58.
  • The lowest paid player, Jeb Brovsky, earns $33,750.
  • The highest paid player, Thierry Henry, earns $5,000,000.
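The Pandas-based merge described under DATA SCRAPING & COMPILATION, followed by the kind of salary statistics listed above, might look roughly like the sketch below. The original script is not reproduced in this report, so the table layout, the column names (`player`, `goals`, `assists`, `salary_2012`) and all values are hypothetical placeholders, not the project's data.

```python
import pandas as pd

# Hypothetical all-time stats table (names and numbers are made up).
stats = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C", "Player D"],
    "goals": [12, 3, 0, 7],
    "assists": [4, 9, 1, 2],
})

# Hypothetical 2012 salary table. Player D has no 2012 salary and
# Player E has a salary but no all-time stats row.
salaries = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C", "Player E"],
    "salary_2012": [250000.0, 90000.0, 45000.0, 120000.0],
})

# An inner join on the player name keeps only players found in both
# tables, replacing an explicit name-matching for loop.
master = stats.merge(salaries, on="player", how="inner")

# Simple salary statistics on the merged table.
salary = master["salary_2012"]
summary = {
    "mean": salary.mean(),
    "lowest_paid": master.loc[salary.idxmin(), "player"],
    "highest_paid": master.loc[salary.idxmax(), "player"],
    "pct_at_least_100k": (salary >= 100000).mean() * 100,
}

print(master)
print(summary)
```

Matching on names alone assumes the names are spelled identically in every source; in practice some normalization of the scraped names would likely be needed before merging.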
Out of the 251 players, 55.77% make salaries greater than or equal to $100,000. Additional statistics are presented below.

We also found, with the data on hand, that in the MLS there is little to no correlation between players' salaries and their goals, assists and shots (shown in Figure 1). Player compensation seems to be based more on popularity than on the ability to produce goals, assists and shots. There is a solid relationship between a player's Google search hit rate and salary.

The lack of correlation between salary and performance is an interesting result, since in other leagues the highest paid players are usually the best at scoring and assisting. As mentioned earlier, an adequate data set on player passing may provide better insight into the relationship between salary and performance.

DATA VISUALIZATION

The visual representation of the statistics was generated with R, Matplotlib and Pandas. Scatter plots and histograms were developed to show:

  • Player compensation versus player goals, assists and shots (scatter plots)
  • Player minutes, goals, assists, shots, shots on goal, game winning goals and game winning assists (histograms)

The following section presents a few of the generated visuals.

FIGURES, TABLES AND EQUATIONS

TABLE 1 - PLAYER PAY BY POSITION

Results

The above statistics show the average, median, lowest and highest salary by position. Also included in the table are the five highest paid players at each position. As anticipated, forwards are paid the highest salaries of the four positions. Goalkeepers are the lowest wage earners on average.

FIGURE 1 GOALS AND ASSISTS VERSUS SALARY
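A per-position salary summary like Table 1, and the salary-versus-performance correlation check, could be computed with Pandas along the following lines. This is a minimal sketch under assumed column names (`position`, `salary`, `goals`, `assists`, `shots`); every value is invented for illustration and does not reflect the 2012 data set.

```python
import pandas as pd

# Hypothetical player rows; values are invented for illustration only.
players = pd.DataFrame({
    "position": ["F", "F", "M", "M", "D", "D", "GK", "GK"],
    "salary": [300000, 100000, 150000, 90000, 80000, 120000, 60000, 70000],
    "goals": [10, 12, 5, 6, 1, 2, 0, 0],
    "assists": [3, 4, 7, 5, 2, 1, 0, 1],
    "shots": [40, 55, 30, 28, 8, 10, 0, 1],
})

# Average, median, lowest and highest salary for each position.
pay_by_position = players.groupby("position")["salary"].agg(
    ["mean", "median", "min", "max"]
)

# Up to five highest paid players within each position.
top_paid = (
    players.sort_values("salary", ascending=False)
    .groupby("position")
    .head(5)
)

# Pearson correlation of salary with each performance column.
salary_corr = players[["goals", "assists", "shots"]].corrwith(players["salary"])

print(pay_by_position)
print(top_paid)
print(salary_corr)
```

Pearson correlation only measures linear association, so a value near zero is consistent with the "little to no correlation" reading but would not rule out a nonlinear relationship.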
FIGURE 3 FORWARD, DEFENDER AND MIDFIELDER GOALS VERSUS SALARY

Results

There is little correlation between goals or assists and a high salary. Owners can get similar goal/assist production from someone making < $200K as from someone making > $400K to $1.2M. This suggests that higher paid players have the same impact on goals or assists as lower paid players, which is interesting. The data suggest that the players have similar skill sets. It takes special players to score goals or give assists.

FIGURE 2 - 3D PLOT OF FORWARDS GOALS, ASSISTS AND MINUTES

Figure 2 shows a 3D rendering of player assists, minutes and game winning assists. In general, the plots show little correlation between the fields. However, for defenders there is a strong correlation between the fields, suggesting that assists by defenders lead to wins.

FIGURE 3 - HISTOGRAMS OF PLAYER MINUTES, GOALS, ASSISTS, SHOTS, SHOTS ON GOAL, GAME WINNING GOALS, GAME WINNING ASSISTS AND SALARY

Results

The plot shows exploratory data analysis of attributes such as Minutes, Goals, Shots, Assists, Shots on Goal, Game Winning Goals, Game Winning Assists and Salary, summarizing their main characteristics in an easy-to-understand form.

CONCLUSION

Unlike European soccer leagues and popular American sports, relatively little work has been done on Major League Soccer (MLS) player and team performance analytics. With MLS growing in popularity and only a small community of individuals conducting MLS analytics, we decided to apply web analytics concepts taught in the Business Intelligence & Analytics class (BIA 660) to help determine player ratings and compensation.

The primary goal of the project was to use BIA 660 web analytics lessons on the Python programming language and related modules to analyze and visualize MLS specific data.

ACKNOWLEDGMENT

We acknowledge the mentoring of Professor Winter Mason.

REFERENCES

  • Python Programming Language - http://www.python.org/
  • Tkinter - http://wiki.python.org/moin/TkInter

AUTHOR INFORMATION

  • Chris Armstrong, chris.r.armstrong@gmail.com
  • Dan Derringer, dderringer311@gmail.com
  • Jude Ken-Kwofie, jkenkwof@stevens.edu
  • Hemanth Mahadevaiah, hemanth.m1@gmail.com
  • Sujana Veeraganti, sujanaveeraganti@gmail.com

All authors are graduate students in Business Intelligence & Analytics at Stevens Institute of Technology.
