Machine Learning & Decision Trees

Nithum Thain
January 12th, 2013
Overview

• The Data Science Value Chain

• Common Uses for Machine Learning

• The Art of Prediction & Classification

• Introduction to Decision Trees

• The Data Ninja Methodology
The Data Science Value Chain

Collection -> Storage & Maintenance -> Visualization & Analysis -> Strategy, Marketing, Product, Operations

                                      Machine Learning lives here
Machine Learning vs. Artificial Intelligence

• Artificial Intelligence is a set of tools that allow machines to
perform higher-order functions. These include natural language
processing, robotics, knowledge representation, etc.


• Machine Learning is a subset of artificial intelligence. It is a set of
(usually statistical) tools that allow machines to detect and extract
patterns from data.
Subdomains of Machine Learning
Unsupervised Learning
• Clustering
• Optimization
• Recommendation Systems
Supervised Learning
• Prediction & Classification
Reinforcement Learning
Clustering
Optimization
Recommendation Systems
Reinforcement Learning
Prediction & Classification
What is a Prediction Problem?

• A set of known input variables.
• An unknown output variable.
• A training set of data for which both the inputs and
outputs are known.
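
A minimal sketch of this setup in Python. The housing rows, column meanings, and numbers below are illustrative assumptions, not data from the talk. Each training row stores the known inputs followed by the output; the new data point has inputs only. The baseline predictor is a deliberately naive stand-in for a real algorithm.

```python
# Hypothetical rows: [square_metres, bedrooms, price]; the last column is the output.
training_set = [
    [50, 1, 150_000],
    [80, 2, 240_000],
    [120, 3, 330_000],
]

# A new data point: the inputs are known, the output is what we want to predict.
test_inputs = [90, 2]


def split_inputs_output(row):
    """Separate a training row into (inputs, output)."""
    return row[:-1], row[-1]


def baseline_predict(rows, new_inputs):
    """Simplest possible predictor: ignore the inputs, return the mean output."""
    outputs = [split_inputs_output(r)[1] for r in rows]
    return sum(outputs) / len(outputs)


inputs, output = split_inputs_output(training_set[0])
prediction = baseline_predict(training_set, test_inputs)
```

Any real algorithm replaces `baseline_predict` with something that actually uses `new_inputs`; the shape of the problem stays the same.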
A Useful Formulation

[Figure labels: Output Variable, Training Set, Test Set]
The Algorithms Are Many

• Regression
• Decision Trees
• Neural Networks
• Support Vector Machines
• Random Forests
• Naive Bayes Classifier

Each has its own strengths and weaknesses.
Prediction vs. Classification


           1.61803398874989484820458683436.....
Break Time!
What is a Decision Tree?
Why Not Automate It?
I Did!

                                Internet Friends?



                   Video Games?               XBOX 360




               Friends?            Friends?




         PS3              Wii     PC             PS3
How Our Algorithm Works

1. Start with the “root” node.
2. Check whether all the data has the same output value. If so, then
   you are done.
3. Check how every possible input variable splits the data.
4. Choose the one that splits the data BEST
    - The one which most reduces the variance of the output variable
       in the resulting sets.
5. Repeat the process for the resulting “true” node and “false”
   node.
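
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code written in the session: rows are lists whose last entry is the output, inputs are numeric and tested with `>=`, nodes are plain dicts, and the keys `column`, `value`, `true`, `false`, and `result` are hypothetical names. The input must be a non-empty list of rows.

```python
def variance(rows):
    """Variance of the output variable (last column) in a set of rows."""
    if not rows:
        return 0.0
    outputs = [row[-1] for row in rows]
    mean = sum(outputs) / len(outputs)
    return sum((y - mean) ** 2 for y in outputs) / len(outputs)


def divideset(rows, column, value):
    """Split rows into (true_set, false_set) on the test row[column] >= value."""
    true_set = [r for r in rows if r[column] >= value]
    false_set = [r for r in rows if r[column] < value]
    return true_set, false_set


def buildtree(rows):
    """Recursively grow a tree from a non-empty list of rows."""
    current = variance(rows)
    if current == 0:  # step 2: every output is identical, so stop
        return {"result": rows[0][-1]}
    best_gain, best_split = 0.0, None
    for column in range(len(rows[0]) - 1):       # step 3: every input variable
        for value in {r[column] for r in rows}:  # and every value it takes
            t, f = divideset(rows, column, value)
            if not t or not f:
                continue  # a split that leaves one side empty is useless
            p = len(t) / len(rows)
            gain = current - p * variance(t) - (1 - p) * variance(f)
            if gain > best_gain:                 # step 4: keep the best split
                best_gain, best_split = gain, (column, value, t, f)
    if best_split is None:  # no split reduces variance: leaf holding the mean
        return {"result": sum(r[-1] for r in rows) / len(rows)}
    column, value, t, f = best_split
    return {"column": column, "value": value,    # step 5: recurse on both sides
            "true": buildtree(t), "false": buildtree(f)}


# Hypothetical data: one input column, output in the last column.
rows = [[1, 10], [2, 10], [3, 20], [4, 20]]
tree = buildtree(rows)
```

The gain is the parent's variance minus the size-weighted variance of the two children, which is one common way to make "reduces the variance most" precise.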
A Picture

(The tree is built up one node at a time across six slides; the final state is shown.)

                                   Internet Friends?

                      Video Games?               XBOX 360

                  Friends?            Friends?

            PS3              Wii     PC             PS3
Coding Time
The Classes and Functions We Will Build:

Classes
• decisionnode: The basic building block of our tree
Functions
• divideset: Splits a data set into two sets based on a variable
• variance: Calculates the variance of the output variable in a set
• buildtree: Builds the tree according to the algorithm described

• classify: For any new data points, uses the tree to predict their value
• printree: Prints a text-based version of the full decision tree
The decisionnode Class


      if variable >= value



                             or   result
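
A node therefore holds either a test or a result. The sketch below shows one way to write such a class, together with a tree walk in the spirit of `classify`. The attribute names (`column`, `value`, `true_branch`, `false_branch`, `result`) are assumptions for illustration; the class built in the session may differ.

```python
class decisionnode:
    """A tree node: either a test (column >= value) or a leaf holding a result."""

    def __init__(self, column=None, value=None, true_branch=None,
                 false_branch=None, result=None):
        self.column = column              # index of the input variable to test
        self.value = value                # threshold used in the test
        self.true_branch = true_branch    # subtree followed when the test passes
        self.false_branch = false_branch  # subtree followed when the test fails
        self.result = result              # set only on leaf nodes


def classify(inputs, node):
    """Walk from the given node down to a leaf and return its result."""
    while node.result is None:
        if inputs[node.column] >= node.value:
            node = node.true_branch
        else:
            node = node.false_branch
    return node.result


# Hand-built example tree: "is input 0 at least 3?"
tree = decisionnode(column=0, value=3,
                    true_branch=decisionnode(result=20),
                    false_branch=decisionnode(result=10))
```

For example, `classify([5], tree)` follows the true branch and returns 20, while `classify([1], tree)` returns 10.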
The Data Ninja Methodology

1. Find the appropriate data
2. Play with the data (plot, sort, examine)
3. Clean the data
4. Choose the appropriate tool for analysis
5. Apply the tool
6. Repeat steps 2-5 until something works
7. ...
8. Profit!
Let’s Try Predicting Housing Prices!
Beware Overfitting!
How Can We Improve Our Results?
Appendix
Neural Network

Editor's Notes

  1. Google Analytics
  2. 2:Internet?
     T-> Xbox360
     F-> 0:Yes?
       T-> 1:No?
         T-> PS3
         F-> PC
       F-> 1:No?
         T-> PS3
         F-> Wii