This document summarizes a project to predict NBA game outcomes using statistical analysis. It scrapes data from ESPN, develops an Elo rating system to rank teams, and analyzes player and team statistics. Key findings include accurately predicting 15 of 16 playoff series in 2014 using maximum season Elo ratings. Salary is also found to correlate with success, as most top-paid teams made the playoffs. Elo alone predicts single games at 65.6% accuracy, identifying areas for improvement.
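The Elo machinery the summary refers to follows the standard update rule: each team's rating moves toward the game result by a step proportional to how surprising that result was. A minimal sketch, with an assumed K-factor of 20 (the project's actual parameters are not restated here):

```python
# A minimal sketch of the standard Elo update that rating systems like the
# project's are built on. The K-factor of 20 is an illustrative assumption.
def expected_score(rating_a, rating_b):
    """Win probability for team A under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a, rating_b, a_won, k=20):
    """Return both teams' ratings after one game."""
    exp_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (actual_a - exp_a)
    new_b = rating_b + k * ((1.0 - actual_a) - (1.0 - exp_a))
    return new_a, new_b
```

The update is zero-sum: whatever one team gains the other loses, so a season of updates preserves the league-wide average rating.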
This document provides an overview of the Salesforce Summer '16 release including new features for Lightning, Sales Cloud, Service Cloud, Field Service, and Community Cloud. Key highlights include global picklists, process builder automation, contacts associated with multiple accounts, an enhanced Lightning calendar and ability to edit page layouts. The Field Service section outlines features for scheduling and dispatching technicians from any device. Community Cloud allows connecting with customers through different community templates for customers, partners and employees.
Eames Consulting Group is an award-winning recruitment firm focused on identifying and delivering mid-to-senior level professionals to the financial and professional services sectors. They operate across 11 specialist divisions including actuarial, audit, broking, and claims. Eames conducts in-depth research to access niche skills and forges long-term relationships to deliver precise recruitment solutions globally through their offices in London, Zurich, Singapore, and Hong Kong.
Presentation entitled the "Art of Twitter" delivered at the Urological Society of Australia and New Zealand's Annual Scientific Meeting in March 2015 in Adelaide.
Follow Henry Woo on twitter @drhwoo or on his blog at www.surgicalopinion.blogspot.com.au
This document provides introductions from General Peter Pace and Congressman Ike Skelton to a book on the foundations of being a commissioned officer. General Pace emphasizes that the themes of honor, integrity, and moral leadership that defined the profession of arms in the past still apply today. Congressman Skelton commends members of the armed forces for their service and sacrifices. He introduces the book as continuing the tradition of educating officers on their moral and ethical obligations, and emphasizing that understanding the common foundation of military leadership is essential for excellence in joint operations. The book is intended to be the first volume in the professional library of newly commissioned officers.
Nast is a company that sells customizable products and original products to help people express themselves, with a focus on excellent customer service and personal attention to build trust. Customers can contact Nast on Facebook and Twitter.
The document summarizes the history and features of Karczowka Hill in Kielce, Poland. It describes the 17th century post-Bernardine monastery and early Baroque church located at the top of the hill. The hill rises 340 meters above sea level and consists of Devonian limestone covered by an old-growth pine forest. It was turned into a scenic reservation in 1953, and contains historic mining traces and geological phenomena from the mining of lead deposits in the 16th-17th centuries. A unique statue of Saint Barbara carved from a block of galena ore can be seen inside the Karczowka church chapel.
1. Vedran Puclin was born in 1984 in Zagreb, Croatia. On March 25, 2011, he completed a six-semester professional study in electrical engineering at the Polytechnic of Varaždin.
2. He passed all requisite examinations and met all other obligations to earn the professional title of Electrical Engineer in accordance with applicable regulations.
3. The document, numbered EL154/2011, is certified as a true translation from Croatian by certified court interpreter Snježana Husnjak Pavlek on February 18, 2016 in Varaždin, Croatia.
Employment laws govern virtually every aspect of an employer/ employee relationship. If a dispute arises under employment law, an employment law lawyer can represent workers or employers.
The UN Office for the Coordination of Humanitarian Affairs reports that donors have committed $35.7 million in humanitarian assistance to Mauritania in 2015 to address acute malnutrition. The Strategic Response Plan for Mauritania requests $95 million but is currently only 34% funded. The European Union's Humanitarian Aid department has contributed the most at $13.2 million. Nutrition and food security projects in Mauritania's Hodh El Chargui region have received $1.2 million out of $16.6 million total for such projects nationwide.
A remote service system is a tool for optimizing customer relationship management processes by providing self-service capabilities.
How effective is the combination of your main product and ancillary tasks? (Jake Wilde)
The author is pleased with the outcome of three media products created to promote a film. Continuity was the main technique used, with the same elements, such as the film logo, main character, and film name, repeated across the products. This clearly links the teaser trailer and film poster, and lets the audience recognize what genre of film is being promoted. Elements were deliberately shared between the ancillary products and the teaser/poster to create interest and recognition for viewers.
Elliptic curve cryptography (ECC) is among the most efficient public-key encryption schemes, using elliptic curve concepts to create faster, smaller, and more efficient cryptographic keys. ECC generates keys from the properties of the elliptic curve equation rather than through conventional key-generation methods, and it can be used alongside public-key techniques such as RSA, Diffie-Hellman key exchange, and digital signatures. A review of four protocols that apply ECC, namely Bitcoin, Secure Shell (SSH), Transport Layer Security (TLS), and the Austrian e-ID card, describes the high security achieved by using elliptic curve cryptography.
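The key-generation idea described above, deriving keys from curve properties rather than conventional methods, comes down to scalar multiplication of a base point. A toy sketch over the tiny textbook curve y² = x³ + 2x + 2 over F₁₇ (far too small for real security; production systems use standardized curves such as secp256r1, and all parameters here are teaching-sized assumptions):

```python
# Toy illustration only, NOT secure: the textbook curve y^2 = x^3 + 2x + 2
# over F_17 with base point G = (5, 1), whose subgroup has order 19.
P_MOD, A = 17, 2          # field modulus and curve coefficient a
G = (5, 1)                # base point

def point_add(p, q):
    """Add two curve points (None represents the point at infinity)."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None       # vertical line: result is the point at infinity
    if p == q:            # tangent-line slope for doubling
        s = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:                 # chord slope for distinct points
        s = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)

def scalar_mult(k, point):
    """Double-and-add: compute k * point."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result

# Key generation is just scalar multiplication by a secret integer:
# the public key is Q = d * G for a private scalar d.
private_key = 7
public_key = scalar_mult(private_key, G)
```

The same operation gives elliptic-curve Diffie-Hellman its symmetry: a·(b·G) equals b·(a·G), so two parties reach the same shared point.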
This document lists over 100 solo and ensemble performances by the individual from 2011 to 2015, including recitals, concerts, and competitions featuring works by composers such as Bach, Sor, Villa-Lobos, and Barrios Mangoré. The performances took place at venues in and around Morris, Minnesota, as well as locations in Illinois, Manitoba, and the Twin Cities. The extensive list demonstrates the individual's dedication to musical performance and involvement in both solo and ensemble works over several years of study.
Richard Filion has over 15 years of experience in sales management and business development roles. He has a track record of success developing new clients and sales territories. His experience spans several industries including automotive, transportation, tires, and technical services. He is bilingual in French and English with strong communication, negotiation, and client management skills.
According to a recent study by L'TUR Tourismus AG, most Germans are perfectly content to follow the RTL jungle camp on screen. Very few can imagine a vacation with Brigitte Nielsen, Jenny Elvers, Gunter Gabriel & Co.; for many, that would simply be too embarrassing.
This document summarizes a study that used statistical modeling to predict NBA game outcomes based on team statistics. The authors scraped ESPN for game and player stats from 2004-2015, cleaned the data, and calculated advanced metrics like points differential, efficiency, and rebounding rates. Single and multivariate logistic regression was used to identify the most predictive stats and create a prediction model achieving 67% accuracy. Principal component analysis did not improve the model's performance.
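The single-variable logistic regression step that study describes can be sketched as below, trained by batch gradient descent. The data here is synthetic (an invented point-differential feature), not the study's scraped ESPN statistics:

```python
# Minimal sketch of single-variable logistic regression of the kind the
# study applied. All training data below is synthetic, for illustration.
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(win) = sigmoid(w * x + b) by minimizing log-loss."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y   # dLoss/dz for log-loss
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Synthetic training set: a positive differential usually means a win.
random.seed(0)
xs = [random.uniform(-10, 10) for _ in range(200)]
ys = [1 if x + random.gauss(0, 3) > 0 else 0 for x in xs]
w, b = fit_logistic(xs, ys)
```

The multivariate case the study used is the same idea with a weight per statistic instead of a single slope.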
This document discusses using machine learning to analyze and predict results from matches in the Indian Premier League (IPL). Data from 2008-2020 is scraped and preprocessed. Models like random forest regression and logistic regression are used to predict first innings scores and the probability of the second team winning. Visualizations show team and player performances in different match situations. The models are deployed in a web app. The analysis provides insights to help teams and players improve strategies. Future work could incorporate more detailed match and player stats.
This document summarizes a study that compares the performance of K-Means clustering implemented in Apache Spark MLlib and MPI (Message Passing Interface). The authors applied K-Means clustering to NBA play-by-play game data to cluster teams based on their position distributions. They found that MPI ran faster for smaller cluster sizes and fewer iterations, while Spark provided more stable runtimes as parameters increased. The authors also tested different numbers of machines in MPI and found that runtime increased linearly with machine count, contrary to their expectation that runtimes would drop as more machines shared the work.
The document discusses using machine learning models to predict point totals in NBA games in order to inform sports betting. It explores using collaborative filtering, neural networks, and LSTMs to predict the combined score of both teams. The best models were able to achieve results similar to sportsbooks, correctly predicting the outcome 51.5% of the time based on the mean squared error between the model predictions and actual scores. Feature engineering included team performance statistics from previous games as well as player and opponent data.
This document summarizes a statistical model to predict results of Euro 2016 qualifiers using multivariate regression. The model examines individual player performance data from club matches to predict national team match outcomes. It finds the model can correctly predict match results 62.1% of the time and goal scores in 33.6% of matches. Factor analysis is used to compress defending and attacking player stats into defending and attacking factors for each team.
This document analyzes the performance of teams in the 2013-2014 La Liga season to determine strategic factors for success. It finds strong relationships between goals scored and matches won, as well as goals scored and shots on target. However, teams' offensive strategies are not very efficient, as most shots are wasted. The conclusion recommends that teams focus on strengthening defense, increasing shot efficiency, and motivating defensive players to improve performance going forward.
This document describes analyzing NFL drive data to predict points per drive (PPD). The authors cleaned 16 years of drive data, removing incomplete or clock-running drives. They tested linear regression models to predict PPD from variables like time, starting position, passing/rushing yards and attempts. The best model included passing efficiency, rushing efficiency, and passing/rushing first downs. This supports the hypothesis that passing stats better predict PPD than rushing. The model predicted test set PPD well, showing it effectively captures factors influencing offensive efficiency.
NIT1201 Introduction to Database System Assignment by USA Experts (Johnsmith5188)
The objective of this assignment is for you to put into practice, in a single cohesive database project, the many different skills that you are learning in this unit.
A Hybrid Constraint Programming And Enumeration Approach For Solving NHL Play... (Shannon Green)
This document proposes a hybrid constraint programming and enumeration approach to solve National Hockey League (NHL) playoff qualification and elimination problems. The approach uses constraint programming, enumeration, network flows and decomposition to efficiently determine the minimum points needed to guarantee or possibly earn a playoff spot. It was experimentally tested on NHL data from 2005-06 and 2006-07 seasons, providing earlier qualification results than newspapers. The approach can identify critical "must-win" games that significantly impact playoff chances.
This study simulates NBA seasons with game lengths of 48, 44, and 40 minutes to analyze the effects on competitive balance. The simulation assigns each team a Net Points Per Possession (NPPP) value based on historical data and simulates games using these values. Results showed decreasing game length from 48 to 40 minutes resulted in more parity league-wide, with significant changes in win totals for the best and worst teams, indicating shorter games may improve competitive balance. The NBA has discussed shortening games to reduce player fatigue and minutes as well as better align with international rules.
Shortening NBA games from 48 to 44 or 40 minutes would likely increase competitive balance according to a simulation. For the bottom 5 teams, win totals increased an average of 0.68 wins reducing from 48 to 44 minutes and 1.12 wins reducing to 40 minutes. The top 5 teams saw win totals decrease 0.62 and 1.04 wins respectively. Increased parity could boost league revenues by attracting more fans to less predictable games.
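The simulation design described above, assigning each team a net-points-per-possession rating and playing out games, can be sketched roughly as follows. The possession counts, noise scale, and ratings are illustrative assumptions, not the study's actual parameters:

```python
# Rough sketch of an NPPP-style season simulation: a game is a sum of noisy
# per-possession margins. All numeric parameters here are assumptions.
import random

def simulate_game(nppp_a, nppp_b, possessions, sigma=1.1, rng=random):
    """Return True if team A outscores team B over the given possessions."""
    margin = 0.0
    for _ in range(possessions):
        # average edge per possession plus possession-level randomness
        margin += (nppp_a - nppp_b) + rng.gauss(0, sigma)
    return margin > 0

def win_rate(nppp_a, nppp_b, possessions, games=5000, seed=1):
    """Monte Carlo estimate of team A's win probability."""
    rng = random.Random(seed)
    wins = sum(simulate_game(nppp_a, nppp_b, possessions, rng=rng)
               for _ in range(games))
    return wins / games

# A shorter game means fewer possessions, so random noise averages out
# less and the weaker team gets more chances at an upset.
long_game = win_rate(0.04, -0.04, possessions=96)   # ~48-minute game
short_game = win_rate(0.04, -0.04, possessions=80)  # ~40-minute game
```

With fewer possessions the favorite's per-possession edge has less room to accumulate, which is the mechanism behind the parity result described above.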
This document discusses projecting fantasy football points for quarterbacks using linear regression models. The authors aim to reproduce and improve projections from various websites by training linear regression models on NFL player statistics and game data from 2008-2014. They explore using single and multiple variable regression models to predict a quarterback's total seasonal fantasy points based on prior year performance and team/opponent factors. The best models will achieve high R² and low error values when evaluated on a test dataset.
This document summarizes the author's proposal for a new advanced lacrosse statistic, LAX IMPACT!, to better evaluate player performance in Major League Lacrosse. It finds current lacrosse stats are limited and don't account for context. LAX IMPACT! calculates a player's points per team possession by considering shots, ground balls, and faceoff wins. An analysis of 2015 MLL data ranks players and finds that top teams don't always have the highest LAX IMPACT! scores, due to differences in style of play. The author believes advanced stats can help teams evaluate players and improve as salaries rise.
A Time Series Analysis for Predicting Basketball Statistics (Joseph DeLay)
This document summarizes a time series analysis of points scored by NBA player Derrick Rose. The analysis found that an IMA(1,1) model best fit the data. When used to forecast future points, the model predictions narrowed to Rose's average points per game due to the limited data points. Adding more seasons of data would improve the model's accuracy for long-term predictions.
Instructions - Congratulations. You are a finalist in for a data a.docx (normanibarber20063)
Instructions:
Congratulations. You are a finalist for a data analyst position with a Major League Baseball (MLB) team. As you prepare for the final round of interviews, you've been asked to use the above data set to create a series of analytics dashboards showing how well the team is doing on two important KPIs: home-game attendance and salaries.
Within the MLB, the San Francisco Giants are in the:
League = National League
Division = West (W) Division
The intended audience for this dashboard is the Director of Analytics.
Limitations: Clearly this project is limited in terms of the scope of its data. In a real-world setting there would be ticket sales, customer demographic information, television viewership ratings, social media mentions/hits, and a whole host of additional data to churn through. But, realistically, like any project, it's good to start with one piece of the puzzle at a time, and in sequence. So consider this an initial step in what could be a much larger project.
Two files are needed for this submission: Your Power BI dashboard file and the answers to the questions below (in a Word document).
Broadly speaking this project's learning outcomes include:
· Assigning KPIs
· Trend analysis
· Comparative analysis
· Creating columns and measures
· Creating relationships between multiple data sources
· Creating the best visualization to appropriately show the data
Hint: Use the TeamsMostRecent table as the centralized table that all others are related to (connected with). But connect only Salaries to Team_Statistics and Team_Statistics to TeamsMostRecent; you don't want unnecessary relationships that would create circular logic in your design.
Hint 2: You will need to create a new column to join the Salaries and Team_Statistics tables together. Which two (or more) fields together create a unique identifier for each individual row in both of these tables? You will need to use this field to join the tables.
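The composite-key idea in the hint above can be illustrated outside Power BI: no single column uniquely identifies a row in either table, but a pair of columns taken together can. A Python sketch, where the column names and table contents are invented for illustration (a plausible pair such as team plus year is assumed, not the assignment's actual answer):

```python
# Composite-key join sketch: the (teamID, yearID) pair acts as the key.
# All table contents and column names below are invented examples.
salaries = [
    {"teamID": "SFN", "yearID": 2014, "payroll": 154_000_000},
    {"teamID": "SFN", "yearID": 2013, "payroll": 140_000_000},
]
team_statistics = [
    {"teamID": "SFN", "yearID": 2014, "wins": 88},
    {"teamID": "SFN", "yearID": 2013, "wins": 76},
]

def composite_key(row):
    # Neither field is unique on its own; the pair is.
    return (row["teamID"], row["yearID"])

# Index one table by the composite key, then look rows up from the other.
stats_by_key = {composite_key(r): r for r in team_statistics}

# Inner join: keep salary rows that have a matching statistics row.
joined = [
    {**s, "wins": stats_by_key[composite_key(s)]["wins"]}
    for s in salaries
    if composite_key(s) in stats_by_key
]
```

In Power BI the equivalent move is a calculated column that concatenates the two fields, giving each row a single unique value to relate on.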
Analytics portion:
1. Get a sense of the data to start. Create a matrix that has every ball club, each year (2006-2014) and total games played. This will allow you to see if there are any significant gaps in the data. Are there? Explain.
2. a. Choose the most appropriate visualization to show the team's total attendance from 2006-2014. What is their trend?
b. Choose the most appropriate visualization to show total attendance for each year and each club in their division for 2006-2014. What is the trend for the team? Which team came closest to surpassing them in attendance, and in what year?
c. Choose the most appropriate visualization to show how the team's average attendance (combined for all years, 2006-2014) compares with the average attendance of all other teams in the League. Sort by average attendance in descending order (most to least). How is the team ranked? Overall, are their attendance numbers considered "good" or "bad"? How do you know?
3. Plot all stadium addresses on a map.
This document discusses predicting the outcomes of National Hockey League (NHL) games using machine learning models. It aims to improve on the results of a previous University of Ottawa study that achieved 60% accuracy, using that study's dataset of statistics from 517 NHL games. It builds machine learning models using decision trees, neural networks, and proprietary software to predict game outcomes, with models built from different combinations of the dataset's categorical and continuous variables. The best-performing models achieve accuracies between 57% and 62%, an improvement over the previous study.
Explore the effect of offensive and defensive team productivity in the NBA on wins, 10+ years of NBA regular season data (2002 – 2013).
Key words: data normalization; directional hypotheses; feauture engineering; ols regression; web scraping
Data visualization graduate course final project paper. Paper seeks to identify characteristics of successful NCAA men's basketball teams and how it impacts their recruiting. Research uses three datasets to perform exploratory data analysis and build different dashboards in Tableau.
The document describes a statistical model created to predict NBA playoff results based on regular season statistics. Key steps included collecting NBA data, selecting important features through correlation analysis like net rating and true shooting percentage, training a logistic regression model on past seasons, and testing it on new seasons. The model successfully predicted 6 of 6 teams that reached conference finals, though missed some like the 2023 Miami Heat. Improving the model could include additional stats like all-NBA players. This type of predictive analytics could benefit NBA teams and be applied to other sports.
Table of Contents
1. Project Purpose
2. Acquiring Data from Internet Sources
3. Determining Strength of Teams
4. 2014 NBA Playoff Predictions
5. 2015 NBA Playoffs Predictions
6. Analyzing the Impact of Money on Performance
7. Analyzing Elo's Failures
8. Determining Players' Performances
9. Using Player Statistics to Categorize Teams
10. What's Next
11. Technologies Used
Project Purpose
The purpose of this project is to create a platform that analyzes statistics for professional basketball players and teams and, in doing so, predicts the outcome of future games. To this end, a platform with three major components was developed to Scrape, Develop, and Analyze data from ten seasons' worth of NBA games.
Acquiring Data from Internet Sources
To begin this project, a database of game statistics first needed to be generated. ESPN.com was selected for its uniform formatting across teams and games. A web scraping¹ component was developed in Python which uses BeautifulSoup4 to generate a parse tree from the HTML content. To scrape every season's worth of data, a recursive scraping algorithm was developed, which functions like so:
• Scrape ESPN.go.com/nba/standings to acquire the URL of each team.
• Scrape each of the previously acquired team URLs in parallel to acquire the URLs of each team's games.
• Scrape each of the previously acquired game URLs in parallel to acquire the data from each game.
• Store the data acquired by the scraper threads in a relational SQLite3 database.
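The first step of this pipeline, pulling team URLs out of the standings page, can be sketched with the standard library's `html.parser` (the real component used BeautifulSoup; the href pattern and the sample HTML below are invented for illustration):

```python
from html.parser import HTMLParser

class TeamLinkParser(HTMLParser):
    """Collects hrefs that look like NBA team pages.

    The "/nba/team/" pattern is illustrative; the real scraper walked
    BeautifulSoup's parse tree of ESPN's standings page instead.
    """
    def __init__(self):
        super().__init__()
        self.team_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if "/nba/team/" in href:
                self.team_urls.append(href)

# Invented sample page: one team link, one unrelated link.
html = ('<a href="/nba/team/_/name/sa">Spurs</a>'
        '<a href="/nba/scores">Scores</a>')
parser = TeamLinkParser()
parser.feed(html)
# parser.team_urls now holds only the team-page URL
```

The same idea, applied recursively, takes the scraper from the standings page down to individual game pages.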
Several problems were encountered while developing this program, first and foremost the lack of uniformity in some teams' homepage formatting. The most problematic team was the Charlotte Bobcats, now known as the Charlotte Hornets. Because of the team's name change last season, all of its hyperlinks on ESPN were incorrect and did not appear in the parse tree. To work around this, a page parsing algorithm was developed to determine whether either team in the game being parsed was "Charlotte"; if so, several values were hardcoded so that the rest of the scraping component could function correctly.
The second most complicated issue was multi-threaded writes to the SQLite3 database. Because the program launches upwards of 14,000 game scraping threads at a time (in a pipelined architecture), the writes experienced heavy data contention, causing the database to lock. The solution required two steps:
• If a thread attempts a database write but receives a DATABASE LOCKED error, it recursively calls the same function with the same parameters. If this sequence repeats more than two times, the thread exits and records the game ID in an error message.
• If any threads exited with an error, the program reports how many games could not be scraped. The program must then be run again to attempt to scrape those games.
This method of error handling allows for "Eventual Validity" of the database: although one pass of the program will not produce an entirely complete database with all games scraped, the set of games still to be scraped shrinks with every successful run. Scraping one season's worth of games takes a total of 2-3 runs.
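The retry-and-report write logic can be sketched as follows (a minimal sketch: the table layout and game ID are invented, and a short backoff is added before each retry rather than retrying immediately):

```python
import sqlite3
import time

def write_with_retry(conn, sql, params, max_attempts=3):
    """Attempt a write; on a 'database is locked' error, retry with a
    short backoff. After the final failed attempt, return False so the
    caller can log the game ID for a later scraping pass."""
    for attempt in range(max_attempts):
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(sql, params)
            return True
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc):
                raise  # unrelated error: propagate
            time.sleep(0.05 * (attempt + 1))
    return False

# Demo against an in-memory database (schema and game ID are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (game_id TEXT, home_pts INT, away_pts INT)")
ok = write_with_retry(conn, "INSERT INTO games VALUES (?, ?, ?)",
                      ("400489378", 105, 99))
```

A failed return value is what feeds the "Eventual Validity" loop: the caller records the game ID and a later run scrapes it again.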
1- A technique for extracting information from websites.
Determining Strength of Teams
The second component in this project is a team strength generator. To determine how important the outcome of a previous game was, we need some metric by which to rank teams. For this, an Elo rating system² was adopted. Elo is a metric developed by Arpad Elo, a Hungarian-American physicist. The system works well when highly ranked individuals compete against low-ranked ones, because rating updates scale with the difference in ratings.
The formulas read as follows:

E_A = 1 / (1 + 10^((R_B − R_A) / 400))
E_B = 1 / (1 + 10^((R_A − R_B) / 400))

Where E_X is the expected chance that team X will be the victor, and R_A and R_B denote the current ratings of Team A and Team B respectively.
To calculate the rating change after each game for team X, starting from an initial rating R_X^(0) = 1500, we use the following:

R_X^(n) = R_X^(n−1) + 32 (W − E_X)

Where W is a binary value: 1 = win, 0 = loss.
For example:
• 1500 rating vs 1500 rating: the victor gains 16 points and the loser loses 16 points.
• 1600 rating vs 1400 rating: if the 1400 team wins, ~24 points change hands, while if the 1600 team wins only ~8 points change hands.
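The expected-score and update formulas translate directly into code (a minimal sketch with K = 32, as in the update formula above):

```python
def expected_score(r_a, r_b):
    """Expected chance that the team rated r_a beats the team rated r_b."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update_ratings(r_a, r_b, a_won, k=32):
    """Return post-game ratings; the winner gains k*(W - E) points and
    the loser gives up the same amount."""
    e_a = expected_score(r_a, r_b)
    delta = k * ((1 if a_won else 0) - e_a)
    return r_a + delta, r_b - delta

# Evenly matched teams (E = 0.5) trade k/2 = 16 points.
new_a, new_b = update_ratings(1500, 1500, True)
# An upset by a 1400 team over a 1600 team moves roughly 24 points.
up_a, up_b = update_ratings(1400, 1600, True)
```

Note that the magnitudes follow directly from K = 32; a larger K would make ratings more volatile.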
To implement this component, each team was given an initial rating of 1500, and an algorithm was developed with the following steps:
• Select all days on which any games were played.
• Select all games played on each day.
• For each day on which games were played, spawn an appropriate number of threads so that each game's information has its own worker thread.
• Update the database to reflect the teams' Elo ratings.
2- http://en.wikipedia.org/wiki/Elo_rating_system
This implementation method solves one key problem: with all ten seasons scraped, upwards of 13,000 games needed Elo ratings generated. Performed sequentially, this would take an expected 200 minutes. By decomposing the problem set into the days on which games were played and processing each day's games in parallel, we reduce the work to smaller parallelizable chunks and cut the runtime to 45 minutes on a quad-core machine.
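The day-level decomposition can be sketched as follows. This is a simplified in-memory sketch: the tuple shape is invented, ratings live in a dict rather than SQLite, and the production generator was written in C++. Games within a day are independent because each one is scored against the ratings as they stood at the start of that day.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def rate_games(games, k=32):
    """games: iterable of (date, home, away, home_won) tuples.
    Days are processed in order; each day's games are scored in
    parallel, then the rating deltas are applied together."""
    ratings = defaultdict(lambda: 1500.0)
    by_day = defaultdict(list)
    for game in games:
        by_day[game[0]].append(game)

    def score(game):
        _, home, away, home_won = game
        e_home = 1 / (1 + 10 ** ((ratings[away] - ratings[home]) / 400))
        delta = k * ((1 if home_won else 0) - e_home)
        return home, away, delta

    for day in sorted(by_day):
        with ThreadPoolExecutor() as pool:
            updates = list(pool.map(score, by_day[day]))
        for home, away, delta in updates:  # apply after the day finishes
            ratings[home] += delta
            ratings[away] -= delta
    return ratings

ratings = rate_games([("2013-10-29", "mia", "chi", True),
                      ("2013-10-30", "sa", "mem", True)])
```

Deferring the delta application until a day completes is what makes the per-day parallelism safe.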
Analyzing the Elo output, we see an interesting trend each season. Throughout the season, each team's maximum Elo rating was recorded, and at the end of each season the sixteen teams that made the playoffs were ranked by it. Here is an example of the output from the 2013-2014 season:
Team Abbreviation Max Elo Achieved
sa 1827.277
okc 1776.423
mia 1755.340
lac 1751.937
ind 1746.647
hou 1739.535
por 1735.329
gsw 1696.646
mem 1686.782
bkn 1686.189
chi 1668.387
dal 1649.538
tor 1647.060
cha 1623.499
was 1588.494
atl 1568.493
2014 NBA Playoff Predictions
By comparing the maximum Elo ratings the teams achieved throughout the regular season, we can predict the outcome of each playoff series.
Round of 16: SA > DAL (correct), HOU > POR (incorrect), LAC > GSW (correct), OKC > MEM (correct), IND > ATL (correct), CHI > WAS (correct), BKN > TOR (correct), MIA > CHA (correct)
Round of 8: SA > POR (correct), OKC > LAC (correct), IND > WAS (correct), MIA > BKN (correct)
Round of 4: MIA > IND (correct), SA > OKC (correct)
Finals: SA > MIA
Using these predictions, we correctly predicted 14 of the 15 playoff series. By using the maximum Elo rating each team achieved throughout the season, we are effectively determining the team's peak performance. Because the playoffs are so high-stakes, each team is expected to perform at or near its peak. Although any team may win an individual game, over a seven-game series the chance that the stronger team wins four games is much higher.
The only case in which maximum Elo failed to predict the victor of a series was Houston versus Portland. Although Houston had the greater Elo rating, the difference was negligible, leading to an incredibly close six-game series in which three games went to overtime and the average spread was 4.7 points.
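The intuition that a longer series favors the stronger team can be quantified. Assuming independent games and a fixed per-game win probability p for the stronger team (p = 0.60 below is an arbitrary illustration, not a figure from the project), the chance of taking a best-of-7 is:

```python
from math import comb

def series_win_prob(p, wins_needed=4):
    """Probability that a team with per-game win probability p wins a
    best-of-7 series (first to 4 wins), assuming independent games."""
    # The team clinches in game (wins_needed + k) after the opponent has
    # won exactly k games; count the orderings of the earlier games.
    return sum(comb(wins_needed - 1 + k, k) * p**wins_needed * (1 - p)**k
               for k in range(wins_needed))

# A team winning 60% of individual games takes the series ~71% of the time.
p_series = series_win_prob(0.60)
```

So even a modest per-game edge compounds noticeably over seven games, which is why series-level predictions outperform single-game ones.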
2015 NBA Playoffs Predictions
Comparing this season's maximum Elo ratings with the previous season's, we see a large difference in team strengths.
2014 Playoffs 2015 Playoffs
sa 1827.27 lac 1741.23
okc 1776.423 mem 1729.13
mia 1755.340 atl 1718.02
lac 1751.937 hou 1716.41
ind 1746.647 gs 1702.32
We see that the peak strength of teams in the 2013-2014 season was significantly higher than in the 2014-2015 season. This implies that no team was dominant for long stretches this season, and that team strengths were more balanced.
Comparing the Elo differences in the 2014 playoffs with those we expect in the 2015 playoffs, we can expect a very interesting semifinal series between Memphis and Los Angeles, as well as a much closer Finals this year.
Analyzing the Impact of Money on Performance
(Data for 2015 season.)
An interesting comparison arises when each team's total salary is compared to the maximum Elo rating that team achieved during the season. We see a tiered distribution in which teams that spent less than ~$65M were unable to reach the peaks that additional money can bring. There are two outliers in the data set: Atlanta, which performed exceptionally well for a team with its salary, and Brooklyn, which performed very poorly despite having the highest salary in the league. These results help explain the predicted brackets and give us another piece of information about the teams.
The teams circled in red are those that made the playoffs. 14 of the 18 teams that spent $70 million or more made the playoffs, while only 3 teams that spent less than $70 million did.
Analyzing Elo's Failures
Beyond its effectiveness over a seven-game series, Elo is also effective at predicting single games. By comparing the Elo ratings of the two teams in every game against the game's result, we can predict the victor with an accuracy of 65.6%. This is significantly lower than the series-level accuracy we achieved in the playoffs, but still a very good result.
To improve our overall accuracy, we must look at the games in which a prediction based purely on Elo proved inaccurate.
We see that the Elo differential is skewed heavily toward the lower end of the plot: as the Elo differential between teams increases, the likelihood that the lower-rated team wins declines rapidly. This is expected from the definition of the Elo rating, but the graph lets us derive a percent chance of winning based on the Elo differential.
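One way to build that percent-chance curve is to bucket historical games by their pre-game Elo differential and measure the empirical upset rate in each bucket. A minimal sketch (the game tuples below are synthetic, not project data):

```python
from collections import defaultdict

def upset_rate_by_diff(games, bucket_size=100):
    """games: (winner_pregame_elo, loser_pregame_elo) pairs.
    Buckets games by absolute pre-game Elo differential and returns,
    per bucket, the fraction of games won by the lower-rated team."""
    totals = defaultdict(int)
    upsets = defaultdict(int)
    for winner_elo, loser_elo in games:
        bucket = int(abs(winner_elo - loser_elo)) // bucket_size * bucket_size
        totals[bucket] += 1
        if winner_elo < loser_elo:  # the underdog won
            upsets[bucket] += 1
    return {b: upsets[b] / totals[b] for b in sorted(totals)}

rates = upset_rate_by_diff([(1550, 1500), (1450, 1500),
                            (1700, 1500), (1500, 1650)])
```

On real data, the per-bucket rates would trace out the declining upset curve described above.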
To analyze why the lower-rated team was able to win, we must move to a lower-level view than team ratings. Since a whole is just the sum of its parts, we will look at the performances of individual players in these games to determine what factors led to the underdog coming out on top.
Determining Players' Performances
Now that we have developed a ranking system for each team, we must determine how well each player played in each game, as well as how he performed against the strength of the opposing team. We must do so in order to develop a player-based simulation in which a team's performance equals the sum of its players' performances. To gauge a player's performance effectively, we use his previous performances along with a standardized performance metric. The metrics used to measure player performance are:
• Performance Index Rating (PIR)³
• Normalized Performance Rating (NPR)⁴
Both metrics use all measurable statistics from every game for which data has been tracked:
Minutes Played, Field Goals (M/A), Three Pointers (M/A), Free Throws (M/A), Offensive Rebounds, Defensive Rebounds, Assists, Steals, Turnovers, Points, Plus/Minus
To calculate the PIR value for each player in each game, we use this fairly simple formula:

PIR = (Points + 2·Rebounds + Assists + Steals + 2·Blocks + Fouls Drawn) − (Missed FGs + Missed FTs + Turnovers + Fouls + Shots Blocked)
This gives us a generic way to measure a player's performance in a game, covering both his offensive and defensive contributions. The metric also extends reliably to team performance by generating a Team Performance Rating (TPR) with the following formula:

TPR = Σ (i = 1 to num_players) PIR_i
Comparing the TPRs of the two teams in each game predicts the winner with an incredible 90.6% accuracy. Although this value is incredibly accurate, the values from which it is calculated are intrinsically related to the outcome of the game. If we can predict individual players' PIRs with no knowledge of a future game, we will have an incredibly accurate method of determining both individual players' performances and the outcome of any individual game. We will return to the TPR statistic when we begin to generate simulations for each player.
3- Performance Index Rating (PIR) is a basketball statistical formula that is used in a variety of European Basketball
leagues. It is similar but not identical to the Efficiency rating used by the NBA.
4- Not related to National Public Radio.
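The PIR and TPR formulas above can be sketched directly (the coefficients follow the formula as defined in this project; the example stat line is invented):

```python
def pir(points, rebounds, assists, steals, blocks, fouls_drawn,
        missed_fg, missed_ft, turnovers, fouls, shots_blocked):
    """Performance Index Rating with this project's coefficients:
    positive contributions minus negative ones."""
    return ((points + 2 * rebounds + assists + steals + 2 * blocks
             + fouls_drawn)
            - (missed_fg + missed_ft + turnovers + fouls + shots_blocked))

def tpr(player_pirs):
    """Team Performance Rating: the sum of every player's PIR."""
    return sum(player_pirs)

# Invented stat line: 20 pts, 5 reb, 4 ast, 2 stl, 1 blk, 3 fouls drawn;
# 6 missed FGs, 2 missed FTs, 3 turnovers, 2 fouls, 1 shot blocked.
line = pir(20, 5, 4, 2, 1, 3, 6, 2, 3, 2, 1)
```

Here the positives sum to 41 and the negatives to 14, giving a PIR of 27.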
To compute the NPR value for each player in each game, we use an algorithm which functions in the following way:
• Create a player object for each player in the league.
• Go through each game in order.
◦ For each game, attribute each performance to the appropriate player.
◦ Using the player's μ and σ values up to the game being measured, determine how many standard deviations from his mean the player's performance in each statistic was.
▪ The NPR value for this game is then:

NPR = Σ (i = 1 to num_variables) (Performance_i − μ_i) / σ_i
By doing so, we can go through each player's game performances sequentially and determine the mean and standard deviation of each metric at the time any given game is played. We can then compare the player's performance in that particular game to see where it falls along his standard curve.
This NPR value tells us how well the player performed relative to his average: a positive value implies he performed better than his average game, while a negative value implies the opposite. We will use this value to predict a player's momentum. For example, if a player has the following NPR values for his previous 5 games:
• 0.5
• 1.6
• 6.4
• 16.4
• 15.8
We would infer that the player has been "heating up" and predict that his hot streak will continue into his next game. It is important to note that some players can generate inflated NPR values simply by having very few minutes played.
• Example: A player averaging 1 minute played and 0 in all statistics who played 5 minutes in one game and managed to score a few points would achieve astronomical NPR values that might not even be achievable by LeBron James.
We account for this effect by multiplying the NPR value by the percentage of minutes played within the game. This scales a player's performance down to his actual game contribution while still reflecting his personal momentum.
On its own, the NPR statistic does not tell us much about the outcome of a game, since it simply projects a player's performance against his own previous performances, but it does let us estimate with some certainty how well the player has been performing recently, and thus generate 5- and 10-game moving momentum measures.
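The running-normalization step can be sketched as follows (a simplified sketch: the stat names and inputs are illustrative, the minutes scaling is passed in as a precomputed fraction, and the production generator was written in C++):

```python
import statistics

def npr(game_stats, prior_games, minutes_frac=1.0):
    """Normalized Performance Rating: for each tracked stat, the number
    of standard deviations this game's value sits from the player's
    mean over his prior games, summed and scaled by the fraction of
    minutes played."""
    total = 0.0
    for stat, value in game_stats.items():
        history = [g[stat] for g in prior_games]
        mu = statistics.mean(history)
        sigma = statistics.pstdev(history)
        if sigma > 0:  # skip stats with no variation so far
            total += (value - mu) / sigma
    return total * minutes_frac

# Invented two-game history and a third game to score against it.
prior = [{"points": 10, "assists": 2}, {"points": 20, "assists": 4}]
tonight = {"points": 25, "assists": 3}
score = npr(tonight, prior)
```

In this toy example the points line sits two standard deviations above the player's mean while assists sit exactly at it, so the NPR is 2.0.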
Using Player Statistics to Categorize Teams
Using a team's player-level statistics, we can try to classify whether the team won or lost. We use an SVM⁵ model like so:
svm(Win~fga+tpa+fta+oreb+dreb+assist+steal+block+turnover+fouls+npr)
Note that all information about points scored is removed, to prevent it from directly determining the SVM model's output. Using this model, we can generate predictions for the teams in our data set and output them as a table showing where the SVM proved correct and where it failed.
> library(e1071)
> model <- svm(Win ~ fga + tpa + fta + oreb + dreb + assist + steal + block + turnover + fouls + npr, data = overall)
> pred <- predict(model, overall)
> table(pred = round(pred), true = overall$Win)
Actual
pred 0 1
0 776 173
1 122 725
The figures on the diagonal of this table are the important output of the SVM: the model places 776 of its 949 predicted losses correctly and 725 of its 847 predicted wins correctly. This means that if we can develop a model which predicts a team's performance in the tracked variables,
FGA, TPA, FTA, OREB, DREB, ASSIST, STEAL, BLOCK, TURNOVER, FOULS, NPR
then we can place that point in N-dimensional space and classify it by which side of the separating hyperplane it lands on: we predict the simulated team wins or loses accordingly. We can now use this model to estimate a team's chances of winning a game based on simulated values for each player.
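As a check on the reported figures: reading the printed matrix with predictions as rows and actual outcomes as columns, the quoted ratios are row-wise precisions, and they combine into an overall accuracy like so:

```python
# Confusion matrix reported above (rows: prediction, columns: actual).
pred0_actual0, pred0_actual1 = 776, 173
pred1_actual0, pred1_actual1 = 122, 725

# Of 949 predicted losses, 776 really were losses (~81.8%).
loss_precision = pred0_actual0 / (pred0_actual0 + pred0_actual1)
# Of 847 predicted wins, 725 really were wins (~85.6%).
win_precision = pred1_actual1 / (pred1_actual0 + pred1_actual1)
# Overall, 1501 of 1796 games are classified correctly (~83.6%).
overall = (pred0_actual0 + pred1_actual1) / (776 + 173 + 122 + 725)
```

So the SVM classifies roughly 84% of games correctly even with all point information withheld.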
5- Support Vector Machines classify vectors in N-dimensional space based on a dependent variable.
What's Next
- Develop a simulation which produces a predicted value for each team's performance; using the SVM, classify whether that performance counts as a Win or a Loss.
- Dig deeper into Salary ~ Performance at the player-statistics level.
- Compare predictions against Las Vegas bookie odds and determine the approximate returns of betting $100 on every predicted team.
Technologies Used
SQLite3 : Lightweight database with SQL syntax.
Git : Version control.
Docker : Virtual environment containing all library dependencies, packages, and data needed to run the program.
Python : Web scraping.
C++ : Team Elo, NPR, and PIR generators; game simulations.
R : Statistics on gathered data.