Major League Soccer Player Analysis



  1. 1. Stevens Institute of Technology, Web Analytics - Fall 2012 Midterm Project. By Chris Armstrong, Dan Derringer, Jude Ken-Kwofie, Hemanth Mahadevaiah, and Sujana Veeraganti
  2. 2. Which player statistics (goals, assists, etc.) are most strongly correlated with a player's salary? ◦ What statistics determine a player's value? Are MLS player salaries correlated with player popularity?
  3. 3. Question #1: Which player statistics (goals, assists, etc.) are most strongly correlated with a player's salary? Method: Extract player statistics. Extract player salaries. Analyze to determine what correlation(s) exist.
  4. 4. Question #2: Are MLS player salaries correlated with player popularity? Method: Use the salary information previously extracted. Use the number of Google search results for each player's name as an indicator of popularity: David Beckham = 1,700,000 search results; Abdul Thompson Conteh = 2,480 search results. Popularity: David Beckham > Abdul Thompson Conteh.
  5. 5. All-time stats scraped by Chris Armstrong. 2007-2012 players' salaries scraped by Jude Ken-Kwofie. Scripts combined by Daniel Derringer. Issues encountered: Few players in the all-time stats received salaries in 2007-2012. Merging the data.
  6. 6. Data to be used: The all-time stats and 2012 salaries ◦ Using the salaries from 2007-2011 eliminated too many players for analysis. Merging the data: Create a for loop in Python to merge all five of the tables ◦ However, this took over 45 minutes to run. Write an R script to merge the tables ◦ However, this was not very elegant.
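One way to avoid a slow loop-based merge is to key every row on the player name in a dictionary, so each row is visited once instead of rescanning every table per player. The sketch below is an illustration only, not the authors' script: the function name `merge_tables`, the column names, and the sample rows are hypothetical, and it uses plain Python rather than pandas or R.

```python
def merge_tables(tables, key="player"):
    """Outer-join several tables (lists of dict rows) on a shared key column."""
    merged = {}
    for table in tables:
        for row in table:
            # Create the player's record the first time the name is seen,
            # then fold this table's columns into it.
            record = merged.setdefault(row[key], {key: row[key]})
            record.update(row)
    # Dicts keep insertion order, so players appear in first-seen order.
    return list(merged.values())


# Hypothetical sample rows standing in for three of the stat tables.
goals = [{"player": "Doe, John", "goals": 12}, {"player": "Roe, Rick", "goals": 3}]
assists = [{"player": "Doe, John", "assists": 7}]
shots = [{"player": "Roe, Rick", "shots": 40}, {"player": "Poe, Sam", "shots": 5}]

merged = merge_tables([goals, assists, shots])
for row in merged:
    print(row)
```

Players missing from a table simply lack that table's columns, which mirrors an outer join; a later step can fill those gaps or drop the rows.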
  7. 7. Tools Used: Mechanize, urllib2: URL handling. Regular Expressions, Beautiful Soup: Parsing, cleaning. Pandas: Data manipulation.
  8. 8. Process: 1) Iterate through each stat 'type' (goals, assists, goalkeeping, fouls, shots). 2) Extract all stats using Beautiful Soup/RE. 3) Merge dictionaries into one Pandas DataFrame (dropping duplicates). 4) Save output to CSV file.
  9. 9. Tools Used: urllib2: URL handling. Regular Expressions: Parsing, cleaning. PyPDF2: Reading/extracting from PDFs. Pandas: Data manipulation.
  10. 10. Open URL for 2012 Salaries. Save resulting PDF to local machine. Open PDF file and parse with PyPDF2. Extract player name and salary with regular expressions. Concatenate Last Name, First Name. Merge on player name with MLS Stats DataFrame.
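The name-and-salary extraction step might look roughly like the following. This is a sketch under stated assumptions: the line layout (club code, last name, first name, position, base salary) is a guess at what the extracted PDF text could look like, and `SALARY_LINE` and `parse_salary_line` are hypothetical names. The real PDF text would need its own pattern, and multi-word names would need extra handling.

```python
import re

# Assumed layout of one extracted text line (hypothetical):
#   <CLUB> <Last> <First> <POS> $<salary>
SALARY_LINE = re.compile(
    r"^(?P<club>[A-Z]{2,4})\s+"
    r"(?P<last>\S+)\s+(?P<first>\S+)\s+"
    r"(?P<pos>[A-Z/-]+)\s+"
    r"\$\s*(?P<salary>[\d,]+(?:\.\d+)?)"
)


def parse_salary_line(line):
    """Return ("Last, First", salary) for a matching line, else None."""
    match = SALARY_LINE.match(line.strip())
    if match is None:
        return None  # header rows and malformed lines are skipped
    # Concatenate the name in "Last, First" form so it can serve as a merge key.
    name = f"{match.group('last')}, {match.group('first')}"
    # Strip thousands separators before converting to a number.
    salary = float(match.group("salary").replace(",", ""))
    return name, salary


print(parse_salary_line("NY Henry Thierry F $5,000,000.00"))
print(parse_salary_line("Club Last Name First Name Pos Base Salary"))
```

Returning `None` for non-matching lines lets the caller filter out headers and page furniture before merging on the player name.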
  11. 11. Tools Used: Google Custom Search API. urllib2: URL handling. JSON: Data structure.
  12. 12. Use the Google Custom Search API to iterate through the MLS Stats DataFrame and search for each player name + 'MLS' (example: "John Doe" MLS). Extract the number of search results from the returned JSON object. Append to the MLS Stats DataFrame.
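The query construction and result-count extraction can be sketched as below. Assumptions are labeled: the helper names, the placeholder credentials, and the trimmed sample response are hypothetical, and the code uses Python 3's `urllib.parse` rather than the `urllib2` module the slides mention. No network request is made; the sample payload only mimics the `searchInformation.totalResults` field of the Custom Search JSON response.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(player_name, api_key, engine_id):
    """Build the request URL for an exact-phrase player search plus 'MLS'."""
    query = f'"{player_name}" MLS'
    params = {"key": api_key, "cx": engine_id, "q": query}
    return API_ENDPOINT + "?" + urlencode(params)


def total_results(response_text):
    """Pull the estimated hit count out of a JSON response body."""
    payload = json.loads(response_text)
    # The API reports the count as a string, so convert it explicitly.
    return int(payload["searchInformation"]["totalResults"])


# Placeholder credentials and a hand-written stand-in for a real response.
url = build_search_url("John Doe", api_key="KEY", engine_id="ENGINE")
sample_response = '{"searchInformation": {"totalResults": "1700000"}}'

print(url)
print(total_results(sample_response))
```

In a full run, each count would be appended as a new column next to the matching player row.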
  13. 13. 926 players. 29 stat categories plus salaries and search results. 280 players with salary figures. All contained in one DataFrame object. CSV saved for each scraping process, as well as for the master table.
  14. 14. Wanted to give users the ability to point and click options. Plotting on demand. High-level access to the script. Learn something new!
  15. 15. Tkinter is the de facto Python module for creating user interfaces. Interfaces can be as simple as dialog boxes or as complex as games. Wide range of options and very flexible (menus, radio buttons, checkboxes, etc.).
  16. 16. Used Tkinter "widgets" to create simple dialog box interfaces. Allows the user to upload files via dialog box. Interactive plotting ◦ Pandas/Matplotlib
  17. 17. Due to the lack of publicly available player passing efficiency data, we found it challenging to build relationships between salary and performance and to determine the best players. We analyzed player compensation versus player goals, assists, and shots, and calculated summary statistics for player minutes, goals, assists, shots, shots on goal, game-winning goals, and game-winning assists.
  18. 18. From a data set of 251 MLS players, we determined for the year 2012: The average MLS player earns $200,262.58. The lowest-paid player, Jeb Brovsky, earns $33,750. The highest-paid player, Thierry Henry, earns $5,000,000. Of the 251 players, 55.77% earn salaries greater than or equal to $100,000.
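Summary figures of this kind (mean, lowest and highest earner, share at or above a threshold) can be computed in a few lines. The sketch below is illustrative only: `salary_summary` is a hypothetical helper, and apart from the two salaries quoted on the slide, the sample entries ("Player A", "Player B") are made up, so the printed numbers do not reproduce the slide's results.

```python
from statistics import mean


def salary_summary(salaries, threshold=100_000):
    """Summarize a {player: salary} mapping."""
    values = list(salaries.values())
    return {
        "mean": mean(values),
        "lowest": min(salaries, key=salaries.get),    # name of lowest earner
        "highest": max(salaries, key=salaries.get),   # name of highest earner
        # Fraction of players earning at least the threshold.
        "share_at_or_above": sum(1 for s in values if s >= threshold) / len(values),
    }


# Two salaries from the slide plus two invented placeholders.
sample = {
    "Thierry Henry": 5_000_000,
    "Jeb Brovsky": 33_750,
    "Player A": 100_000,   # hypothetical
    "Player B": 66_250,    # hypothetical
}

summary = salary_summary(sample)
print(summary)
```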
  19. 19. Basic Statistics
  20. 20. The visual representation of the statistics was generated with R, Matplotlib and Pandas. Scatter plots and histograms were developed to show: Player compensation versus player goals, assists and shots (scatter plots). Player minutes, goals, assists, shots, shots on goal, game-winning goals and game-winning assists (histograms).
  21. 21. The plot shows exploratory data analysis of attributes such as Minutes, Goals, Shots, Assists, Shots on Goal, Game-Winning Goals, Game-Winning Assists and Salary, summarizing their main characteristics in an easy-to-understand form.
  22. 22. Remove the possible confound of more experienced players having higher "counting" stats by converting all stats to a per-game basis. Soccer is not necessarily a meritocracy: salary is more correlated with Google search results than with any other metric (cause and effect issue?). True player value is challenging to measure based on limited statistical information.
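The per-game conversion and the correlation check can be sketched as follows. This is a minimal illustration, not the analysis the authors ran: `per_game` and `pearson` are hypothetical helper names, the player rows are invented, and the Pearson coefficient is written out by hand so the block needs nothing beyond the standard library.

```python
from math import sqrt


def per_game(total, games):
    """Convert a counting stat to a per-game rate; 0.0 if no games played."""
    return total / games if games else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented rows: (career goals, games played, salary).
players = [(50, 200, 150_000), (10, 20, 400_000), (5, 50, 60_000)]

goals_per_game = [per_game(goals, games) for goals, games, _ in players]
salaries = [salary for _, _, salary in players]

print(goals_per_game)
print(pearson(goals_per_game, salaries))
```

Normalizing first means a long career no longer inflates a player's totals, so the correlation reflects scoring rate rather than longevity.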
  23. 23. Varying knowledge of Python and other platforms created issues when combining and editing code. Working on a team requires the right system for effective collaboration (beware the danger of email chains!). When you think you've debugged enough, debug some more.