Predicting the NBA MVP with Data Science
bit.ly/nba-la
CrossCamp.us Events
About us
We train developers and data
scientists through 1-on-1
mentorship and career prep
About me
• Alex Nussbacher
• Lead Data Science Instructor at Thinkful
• Data scientist at Uber, focus on consumption
economics and economics of choice 🤔
What’s your background?
• I have a software background
• I have a math or stats background
• None of the above
Data Science Process
• Frame the question.
• Collect the raw data.
• Process the data.
• Explore the data.
• Communicate results.
Frame the question
• Who will win the MVP in the NBA this
season?
Collect the Data
• What kind of data do we need?
• Individual stats
• Team stats and success
• Past winners and voting records
• All data from basketball-reference.com
Process the data
• How’s the data “dirty” and how can we fix it?
• User input, redundancies, missing data…
• Formatting: adapt the data to meet certain
specifications.
• Cleaning: detecting and correcting corrupt
or inaccurate records.
Explore the data
• What are the meaningful patterns in the
data?
• How meaningful is each data point for our
predictions?
Goals
• Introduction to a data scientist's tools and
methods:
• Jupyter notebooks, numpy, pandas,
sklearn…
• Overview of basic machine learning concepts:
• Data formatting and cleaning, Decision
trees, Overfitting, Random Forests…
Jupyter Notebooks
• One of a data scientist’s everyday tools.
• Find the links in our classroom tool.
• Contains cells with code.
NumPy
• The fundamental package for scientific
computing with Python.
• Provides powerful multi-dimensional array
objects.
• Many methods for fast operations on arrays.
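As a small illustration of these array operations, here is a minimal sketch. The per-game point totals are made up for the example and are not real NBA data.

```python
import numpy as np

# Hypothetical points scored over five games by two players (made-up numbers).
points = np.array([
    [30, 25, 41, 18, 36],
    [22, 28, 19, 31, 25],
])

# Vectorized operations act on the whole array at once, with no Python loop.
per_player_mean = points.mean(axis=1)  # average along each row (one row per player)
best_game = points.max(axis=1)         # highest-scoring game for each player
above_25 = (points > 25).sum(axis=1)   # number of games with more than 25 points

print(points.shape)
print(per_player_mean, best_game, above_25)
```

The `axis=1` argument tells NumPy to reduce across columns, giving one result per row.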
Pandas
• Fundamental high-level building block for
doing practical, real world data analysis in
Python.
• Built on top of NumPy.
• Offers data structures and operations for
manipulating numerical tables and time
series.
Scikit-learn
• Python module for machine learning.
• Provides tools for classification, regression,
clustering, and model evaluation, built on
NumPy and SciPy.
Initial imports and loading data with Pandas
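The notebook code for this slide is not part of the transcript, so the following is only a sketch of what loading data with pandas can look like. The inline CSV text, its column names, and its values are invented stand-ins for the workshop's data file.

```python
import io

import pandas as pd

# Stand-in for the workshop's CSV file: columns and values are invented.
csv_text = """player,season,pts_per_game,team_record,vote_share
Player A,2015-16,30.1,73-9,1.000
Player B,2015-16,25.3,67-15,0.484
Player C,2015-16,23.5,55-27,0.000
"""

# pd.read_csv accepts a file path, a URL, or a file-like object such as StringIO.
train = pd.read_csv(io.StringIO(csv_text))

print(train.shape)   # (rows, columns)
print(train.dtypes)  # the type pandas inferred for each column
```

With a real file, the `io.StringIO(...)` argument would be replaced by the path to the CSV.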
Understanding your data
• .head(n) method: Returns first n rows.
• .value_counts() method: Returns the count of
each unique value in a column.
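A quick sketch of both methods on a tiny, made-up DataFrame (the player labels and positions are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical rows: labels and positions are made up.
df = pd.DataFrame({
    "player": ["A", "B", "C", "D", "E"],
    "position": ["G", "F", "G", "C", "G"],
})

first_two = df.head(2)                           # the first 2 rows
position_counts = df["position"].value_counts()  # how often each position appears

print(first_two)
print(position_counts)
```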
Training Set
• We loaded in our data as a training set.
• This is because we’re going to use this data
to build, or train, our model
• It consists of every year for which we have
data on NBA MVP voting, from the 1955-56
season onward
Formatting your Data
Formatting your Data
• We need to put our data in the easiest-to-use
format
• No blanks allowed
• Numeric strings (like a win-loss record) need
to have the numbers extracted and typed as
integers
• Factors, or categories, need to be changed to
dummies, which report a 0 or 1 to show if that
value is present
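The three formatting rules above can be sketched with pandas as follows. The column names, the sample values, and the choice to fill blanks with 0 are all assumptions for this example, not the workshop's actual cleaning code.

```python
import pandas as pd

# Hypothetical raw rows: column names and values are assumptions.
raw = pd.DataFrame({
    "player": ["A", "B", "C"],
    "position": ["G", "F", "C"],
    "team_record": ["73-9", "67-15", "55-27"],
    "blocks": [0.2, None, 1.9],
})

# No blanks allowed: fill missing values (0 is just one possible choice).
clean = raw.fillna({"blocks": 0.0})

# Numeric strings: split "wins-losses" into two columns and cast to integers.
wins_losses = clean["team_record"].str.split("-", expand=True).astype(int)
clean["wins"] = wins_losses[0]
clean["losses"] = wins_losses[1]
clean = clean.drop(columns="team_record")

# Categories: one 0/1 dummy column per position value.
clean = pd.get_dummies(clean, columns=["position"], dtype=int)

print(clean)
```

Each position becomes its own column (`position_C`, `position_F`, `position_G`), holding 1 where that value is present and 0 otherwise.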
Decision Trees
• It breaks down a dataset into smaller and
smaller subsets.
• The final result is a model with a tree
structure that has:
• Decision nodes: ask a question and have
two or more branches.
• Leaf nodes: represent a classification or
decision.
Classification vs Regression
• Classification — Predict categories.
• Identifying group membership.
• Regression — Predict values.
• Involves estimating or predicting a
response.
Classification
Classification
?
Regression
• Regression — Predict values.
• Involves estimating or predicting a
response.
• This is what we’ll be doing. Predicting
vote share…
Creating your first Decision Tree
You will use the scikit-learn and numpy libraries
to build your first decision tree. We will need the
following:
• Response (y): A one-dimensional array or
series containing the target from the train
data.
• Inputs (X): A two-dimensional pandas
DataFrame containing the features/predictors
from the train data.
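A minimal sketch of fitting a tree from X and y, assuming a regression tree because the target is vote share. The data here is synthetic: the feature names, the formula used to generate the target, and the `max_depth` value are all invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the training data (not real NBA statistics).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "pts_per_game": rng.uniform(5, 35, size=200),
    "team_wins": rng.integers(15, 70, size=200),
})

# Made-up target in [0, 1]: rises with both scoring and team wins.
y = ((X["pts_per_game"] / 35) * (X["team_wins"] / 70)).rename("vote_share")

# Fit a shallow regression tree: inputs X, response y.
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)

predictions = tree.predict(X)
print(predictions[:5])
```

Capping `max_depth` keeps the tree small enough to inspect; it is a tuning choice, not a requirement.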
Creating your first Decision Tree
Importances and Score
• .feature_importances_ attribute: tells us
how important the features are for the final
result.
• .score() method: returns the mean accuracy
for a classifier, or the R² score for a regressor.
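To make both of these concrete, here is a small sketch on synthetic data in which the target depends on only one of two features. The feature names and data are assumptions; the example uses a regression tree, for which `.score()` returns R².

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: the target depends only on pts_per_game, never on noise.
rng = np.random.default_rng(1)
X = pd.DataFrame({
    "pts_per_game": rng.uniform(5, 35, size=300),
    "noise": rng.normal(size=300),
})
y = X["pts_per_game"] / 35

tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

# One importance per feature; the values sum to 1.
importances = pd.Series(tree.feature_importances_, index=X.columns)

# For a regressor, score() is R^2 on the data passed in (here, the training data).
r2 = tree.score(X, y)

print(importances)
print(r2)
```

Scoring on the same rows used for fitting tends to flatter the model, so a held-out set gives a more honest number.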
Importances and Score
That looks good…
But that’s actually not clear.
CLASS IMBALANCE
• We have what is called a class imbalance
problem.
• The outcome of not being MVP is much, much
more common than being the MVP.
• So our model looks ‘accurate’ if it just tells
everyone they’re not going to be MVP
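The effect is easy to reproduce with a baseline that always predicts the majority class. In this sketch the 1-in-100 ratio is an invented example, and scikit-learn's `DummyClassifier` stands in for a model that tells everyone they will not be MVP.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 1 MVP among 100 player-seasons.
y = np.zeros(100, dtype=int)
y[0] = 1
X = np.zeros((100, 1))  # features are irrelevant to this baseline

# Always predict the most frequent class ("not MVP").
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = baseline.predict(X)

accuracy = accuracy_score(y, pred)  # share of correct predictions
recall = recall_score(y, pred)      # share of actual MVPs that were found

print(accuracy, recall)
```

High accuracy with zero recall is the signature of this problem, which is one reason to look at more than a single score.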
Looking closer
Looking at our results
• We seem to be doing a decent job of
identifying players who are great players
• But the ordering isn’t perfect
• And we have a lot of people who are scored
as equivalent
• Also note that this seems to be a year with a
lot of great performers
Let’s improve it!
• We have options for improving the model
• Firstly, we can look at our feature list and
select a smaller but more effective list of
features
• We could also choose a better type of
model…
Let’s improve it!
Modify the feature list
• We put a lot of features into our model
• Trimming it down to a smaller list could
improve the efficiency of our trees and
possibly improve accuracy as well
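One possible way to trim the list, sketched here as an assumption rather than the workshop's method, is to rank features by importance and refit on the top few. The column names (`stat_0` to `stat_5`) and the data are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: six candidate features, but only two drive the target.
rng = np.random.default_rng(2)
X = pd.DataFrame(
    rng.normal(size=(300, 6)),
    columns=[f"stat_{i}" for i in range(6)],
)
y = 2 * X["stat_0"] + X["stat_1"]

# Fit on every feature, then rank the features by importance.
full = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)
ranked = pd.Series(full.feature_importances_, index=X.columns)
ranked = ranked.sort_values(ascending=False)

# Keep the two highest-ranked features and refit on the smaller list.
keep = ranked.index[:2].tolist()
trimmed = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X[keep], y)

print(ranked)
print(keep)
```

How many features to keep is a judgment call; comparing scores on held-out data before and after trimming is a reasonable check.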
Overfitting
• Resulting model too tied to the training set.
• It doesn’t generalize to new data, which is the
point of prediction.
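Overfitting shows up as a gap between the score on the training data and the score on data the model has not seen. The sketch below uses synthetic noisy data and compares an unconstrained tree with a shallow one; the data, split size, and depth are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: a simple signal in the first column plus random noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(400, 3))
y = X[:, 0] + rng.normal(scale=0.3, size=400)

# Hold out half the rows to stand in for "new data".
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# An unconstrained tree can memorize the training rows, noise included.
deep = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
# A shallow tree is forced to capture only the broad pattern.
shallow = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_train, y_train)

deep_train, deep_test = deep.score(X_train, y_train), deep.score(X_test, y_test)
shallow_train, shallow_test = (
    shallow.score(X_train, y_train),
    shallow.score(X_test, y_test),
)

print("deep:    train", deep_train, "test", deep_test)
print("shallow: train", shallow_train, "test", shallow_test)
```

The deep tree fits its training rows almost perfectly yet drops sharply on the held-out rows, while the shallow tree's two scores stay much closer together.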
Random Forest Classifier
• Random Forest Classifiers use many
Decision Trees to build a classifier.
• We introduce a bit of randomness.
• Each tree uses a random subset of the data to
give a different answer (a vote). The final
classification is the most common amongst
the trees.
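A minimal sketch of a random forest classifier on synthetic data; the features, labels, and parameter values are assumptions. Note that scikit-learn's `RandomForestClassifier` combines its trees by averaging their predicted class probabilities rather than counting hard votes, which usually gives a similar result.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class problem: the label depends on the first two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# 100 trees; each sees a bootstrap sample of the rows and considers a random
# subset of the features at every split.
forest = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=0
)
forest.fit(X_train, y_train)

test_accuracy = forest.score(X_test, y_test)  # mean accuracy on held-out rows
print(len(forest.estimators_), test_accuracy)
```

For a numeric target such as vote share, `RandomForestRegressor` follows the same pattern and averages the trees' predictions.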
Random Forest Classifier
Results
And the MVP goes to…
Russell Westbrook!
What’s going on?
• Our model is giving good weight to major
statistical categories and position, but not
enough to team record…
• How could you continue to improve it?
Trim our variable list…
2016
STEPH!
2008
Kobe!
1996
MJ!
The End
More about Thinkful
• Anyone who’s committed can learn to code
• 1-on-1 mentorship is the best way to learn
• Flexibility! Learn anywhere, anytime, & at your
own pace
Our Program
You’ll learn concepts, practice with drills, and build
capstone projects — all guided by a personal mentor
Our Mentors
Mentors have, on average, 10+ years of experience
Data Science Syllabus
• Managing data with SQL and Python
• Modeling with both supervised and unsupervised
models
• Data visualization and communicating with data
• Technical interviews + career services
Special Introductory Offer
• Prep course for 50% off —
$250 instead of $500
• Covers math, stats,
Python, and data science
toolkit
• Option to continue into full
program
• Talk to me (or email
noel@thinkful.com) if
you’re interested

Predicting the NBA MVP

  • 1. Predicting the NBA MVP with Data Science bit.ly/nba-la CrossCamp.us Events
  • 2. About us We train developers and data scientists through 1-on-1 mentorship and career prep
  • 3. About me • Alex Nussbacher • Lead Data Science Instructor at Thinkful • Data scientist at Uber, focus on consumption economics and economics of choice 🤔
  • 4. What’s your background? • I have a software background • I have a math or stats background • None of the above
  • 5. Data Science Process • Frame the question. • Collect the raw data. • Process the data. • Explore the data. • Communicate results.
  • 6. Frame the question • Who will win the MVP in the NBA this season?
  • 7. Collect the Data • What kind of data do we need? • Individual stats • Team stats and success • Past winners and voting records • All data from basketball-reference.com
  • 8. Process the data • How’s the data “dirty” and how can we fix it? • User input, redundancies, missing data… • Formatting: adapt the data to meet certain specifications. • Cleaning: detecting and correcting corrupt or inaccurate records.
  • 9. Explore the data • What are the meaningful patterns in the data? • How meaningful is each data point for our predictions?
  • 10. Goals • Introduction to a data scientist's tools and methods: • Jupyter notebooks, numpy, pandas, sklearn… • Overview of basic machine learning concepts: • Data formatting and cleaning, Decision trees, Overfitting, Random Forests…
  • 11. Jupyter Notebooks • One of data scientist’s everyday tools. • Find the links in our classroom tool. • Contains cells with code.
  • 12. NumPy • The fundamental package for scientific computing with Python. • Provides powerful multi-dimensional array objects. • Many methods for fast operations on arrays.
  • 13. Pandas • Fundamental high-level building block for doing practical, real world data analysis in Python. • Built on top of NumPy. • Offers data structures and operations for manipulating numerical tables and time series.
  • 14. Scikit-learn • Python module for machine learning. • Provides tools for classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.
  • 15. Initial imports and loading data with Pandas
  • 16. Understanding your data • .head(n) method: Returns the first n rows. • .value_counts() method: Returns the count of each unique value in a column.
  • 17. Training Set • We loaded in our data as a training set. • This is because we’re going to use this data to build, or train, our model. • It consists of every year for which we have data on NBA MVP voting, from the 1955-56 season onward.
  • 19. Formatting your Data • We need to put our data in the easiest-to-use format • No blanks allowed • Numeric strings (like win-loss record) need to have the numbers extracted and typed as integers • Factors, or categories, need to be changed to dummies, which report a 0 or 1 to show if that value is present
  • 20. Decision Trees • It breaks down a dataset into smaller and smaller subsets. • The final result is a model with a tree structure that has: • Decision nodes: ask a question and have two or more branches. • Leaf nodes: represent a classification or decision.
  • 22. Classification vs Regression • Classification — Predict categories. • Identifying group membership. • Regression — Predict values. • Involves estimating or predicting a response.
  • 25. Regression • Regression — Predict values. • Involves estimating or predicting a response. • This is what we’ll be doing. Predicting vote share…
  • 26. Creating your first Decision Tree You will use the scikit-learn and numpy libraries to build your first decision tree. We will need the following to build a decision tree • Response (y): A one-dimensional array or series containing the target from the train data. • Inputs (X): A multidimensional pandas data frame containing the features/predictors from the train data.
  • 27. Creating your first Decision Tree
  • 28. Importances and Score • .feature_importances_ attribute: tells us how important each feature is for the final result. • .score() method: returns the mean accuracy of the fit for a classifier (for a regressor, it returns R²).
  • 30. That looks good… But that’s actually not clear.
  • 31. CLASS IMBALANCE • We have what is called a class imbalance problem. • The outcome of not being MVP is much, much more common than being the MVP. • So our model is ‘accurate’ if it just tells everyone they’re not going to be MVP
  • 33. Looking at our results • We seem to be doing a decent job of identifying players who are great players • But the ordering isn’t perfect • And we have a lot of people who are scored as equivalent • Also note this seems to be a year with a lot of great performers
  • 34. Let’s improve it! • We have options for improving the model • Firstly, we can look at our feature list and select a smaller but more effective list of features • We could also choose a better type of model…
  • 36. Modify the feature list • We put a lot of features into our model • Trimming it down to a smaller list could improve the efficiency of our trees and possibly improve accuracy as well
  • 37. Overfitting • Resulting model too tied to the training set. • It doesn’t generalize to new data, which is the point of prediction.
  • 38. Random Forest Classifier • Random Forest Classifiers use many Decision Trees to build a classifier. • We introduce a bit of randomness. • Each Tree uses a subset of the data to give a different answer (a vote). The final classification is the most common amongst the Trees.
  • 41. And the MVP goes to…
  • 44. What’s going on? • Our model is giving good weight to major statistical categories and position, but not enough to team record… • How could you continue to improve it?
  • 45. Trim our variable list…
  • 50. More about Thinkful • Anyone who’s committed can learn to code • 1-on-1 mentorship is the best way to learn • Flexibility! Learn anywhere, anytime, & at your own pace
  • 51. Our Program You’ll learn concepts, practice with drills, and build capstone projects — all guided by a personal mentor
  • 52. Our Mentors Mentors have, on average, 10+ years of experience
  • 53. Data Science Syllabus • Managing data with SQL and Python • Modeling with both supervised and unsupervised models • Data visualization and communicating with data • Technical interviews + career services
  • 54. Special Introductory Offer • Prep course for 50% off — $250 instead of $500 • Covers math, stats, Python, and data science toolkit • Option to continue into full program • Talk to me (or email noel@thinkful.com) if you’re interested
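
The formatting steps from slide 19 and the tree-building steps from slides 26 to 28 can be sketched as follows. This is a minimal illustration on made-up rows, not the notebook used in the talk: the column names (`record`, `pos`, `pts`, `vote_share`) and every value are assumptions chosen for the example.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy rows; the talk's real data comes from basketball-reference.com.
train = pd.DataFrame({
    "record": ["67-15", "55-27", "41-41", "61-21"],  # win-loss record as a string
    "pos": ["G", "F", "C", "G"],                      # position (a category)
    "pts": [30.1, 27.3, 22.0, 28.4],                  # points per game
    "vote_share": [0.90, 0.35, 0.01, 0.60],           # target: share of MVP votes
})

# Numeric strings: extract wins and losses and type them as integers.
parts = train["record"].str.split("-", expand=True)
train["wins"] = parts[0].astype(int)
train["losses"] = parts[1].astype(int)
train = train.drop(columns="record")

# Categories: convert position into 0/1 dummy columns.
train = pd.get_dummies(train, columns=["pos"], dtype=int)

# No blanks allowed: confirm nothing is missing before fitting.
assert not train.isna().any().any()

# Inputs (X) are the feature columns; response (y) is the target column.
X = train.drop(columns="vote_share")
y = train["vote_share"]

# Vote share is a number, so this sketch uses a regression tree.
tree = DecisionTreeRegressor(random_state=0).fit(X, y)
importances = pd.Series(tree.feature_importances_, index=X.columns)
r2 = tree.score(X, y)  # R² on the training rows themselves
```

A score computed on the same rows the tree was trained on says little about how the model handles new seasons, which is the overfitting concern raised on slide 37.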

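The class imbalance point from slide 31 can be checked with a baseline that always predicts "not MVP". The sketch below uses invented labels (1 MVP among 100 player-seasons, an assumption for illustration) and scikit-learn's `DummyClassifier`; it is not part of the original talk.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 1 MVP among 100 player-seasons.
y = np.zeros(100, dtype=int)
y[0] = 1
X = np.zeros((100, 1))  # features are irrelevant to this baseline

# Always predict the most common class ("not MVP").
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = baseline.predict(X)

accuracy = accuracy_score(y, pred)   # fraction of correct labels
mvp_recall = recall_score(y, pred)   # fraction of actual MVPs that were found
```

The baseline is right about almost every player yet never identifies the MVP, which is why accuracy alone is a misleading measure here.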
Editor's Notes

  1. 80-20 rule: 80% of a typical data science project is sourcing, cleaning, and preparing the data, while the remaining 20% is actual data analysis. This is a surprisingly time-consuming task. What we’re seeing now is an increased number of data analysts who work on cleaning data to free up data scientists’ time.
  2. Let's start by loading the training and testing sets into your Python environment. You will use the training set to build your model, and the test set to validate it. The data is stored as csv files. You can load this data with the read_csv() function from the pandas library.
  3. Before starting with the actual analysis, it's important to understand the structure of your data.
  4. The decision tree algorithm starts with all the data at the root node and scans all the variables for the best one to split on. Once a variable is chosen, you do the split and go down one level (or one node) and repeat.
  5. A famous example is the Iris data set. Each flower has four features: sepal length and width, and petal length and width.
  6. If we plot it out across two dimensions, we can see that the setosa is in red, versicolor in green and virginica in blue. Imagine each of these dots represents a training point, something I’ve told a computer about. Then I show that computer the gray dot and ask what it is. What should the computer predict? Imagine this same concept is taking place in three dimensions. Or more! The more data we have, the better we can teach the computer how to do various things.
  7. In January 2016, Thinkful became the first online bootcamp to publish a jobs report. And now we’re the first one to use a 3rd-party auditor to ensure our data is accurate and our methods are applied as advertised. We’ve seen 92% of our graduates land jobs as developers within 4 months of graduation. Our students generally move into full-time, salaried positions as developers or engineers. They work at startups and also larger, more established companies in several industries. We published the report because we want to give students the tools they need to make an informed decision about the programming school they attend. Education requires trust, and transparency builds it. Until now, students choosing a bootcamp have had to take a leap of faith that schools are honest, their numbers up to date, and the results accurate. That's not sustainable and we hope it stops. We want to make sure our students have the tools they need to make an informed decision on which programming school they attend. Feel free to take a look on our website if you’d like to see all the data and the audit report. 1. Job placement stats. Audited stats. We are the only bootcamp that publishes monthly job stats. One of the only bootcamps in the nation that has these stats verified by a third party. 2. 32% of flexible bootcamps. Whenever a student withdraws. Overall, the most common reason is a change in schedule or financial ability. We try to address the first one. Over 60% are full-time. Outside of our control. Full-time is 85% grad rate. In Atlanta, we’ve yet to have someone drop out.
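
The split search described in note 4 (scan every variable for the best one to split on) can be sketched in plain NumPy. This is a simplified illustration for a regression target, assuming the "best" split is the one that minimizes the summed squared error of the two child nodes. The function name `best_split` and the toy numbers are hypothetical, and scikit-learn's own implementation is more elaborate.

```python
import numpy as np

def best_split(X, y):
    """Scan every feature and threshold; return (feature, threshold, sse)
    for the split with the lowest summed squared error in the two children."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2:
            left = y[X[:, j] <= t]
            right = y[X[:, j] > t]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (j, t, sse)
    return best

# Hypothetical toy data: column 0 = points per game, column 1 = team wins.
X = np.array([[30.0, 67.0], [27.0, 55.0], [22.0, 41.0], [28.0, 61.0]])
y = np.array([0.90, 0.35, 0.01, 0.60])  # made-up MVP vote shares

feature, threshold, sse = best_split(X, y)
```

A full tree would apply `best_split` again to the rows on each side of the chosen threshold, going down one level at a time until a stopping rule is met.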