 Data-Applied.com: Decision
Introduction
Decision trees let you construct decision models. They can be used for forecasting, classification or decision making.
At each branch, the data is split based on the value of a particular field.
Decision trees are constructed using divide-and-conquer techniques.
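To make "split on a field at each branch" concrete, here is a minimal Python sketch of a tree stored as nested dictionaries and of how an instance is routed from the root to a leaf. The dictionary layout, the field names (`outlook`, `windy`) and the `classify` helper are assumptions for illustration, not Data Applied's internal format. The leaves use the majority class of each Outlook branch from the weather example worked through below.

```python
from typing import Dict, Union

# Hypothetical representation: a leaf is a class label (str); an internal node
# is a dict naming the field it splits on and mapping each field value to a subtree.
Tree = Union[str, dict]


def classify(tree: Tree, instance: Dict[str, str]) -> str:
    """Route an instance from the root to a leaf and return the leaf's class."""
    while isinstance(tree, dict):
        value = instance[tree["attribute"]]  # field tested at this node
        tree = tree["branches"][value]       # follow the matching branch
    return tree


# Illustrative one-level tree on "outlook"; each leaf is the majority class of
# that Outlook branch ([Yes, No] counts: Sunny [2, 3], Overcast [4, 0], Rainy [3, 2]).
stump: Tree = {
    "attribute": "outlook",
    "branches": {"sunny": "no", "overcast": "yes", "rainy": "yes"},
}

print(classify(stump, {"outlook": "overcast", "windy": "true"}))  # yes
```

Fields the tree never tests (here `windy`) are simply ignored during routing.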
Divide-and-Conquer: Constructing Decision Trees
Steps to construct a decision tree recursively:
Select an attribute to place at the root node and make one branch for each possible value.
Repeat the process recursively at each branch, using only those instances that reach the branch.
If at any time all instances at a node have the same classification, stop developing that part of the tree.
Problem: how do we decide which attribute to split on?
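The recursive steps above can be sketched in Python as follows. This is a minimal sketch, not Data Applied's implementation: the attribute choice is delegated to a `choose` callable because that choice is exactly the open problem just stated, `first_attribute` is a hypothetical placeholder chooser, and returning the majority class when no attributes remain is an added assumption the slide does not cover. Branches are created only for values that actually occur among the instances reaching a node.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

Row = Dict[str, str]
# A chooser picks the attribute to split on; how to choose is the open problem.
Chooser = Callable[[List[Row], Sequence[str], str], str]


def build_tree(rows: List[Row], attributes: Sequence[str], target: str, choose: Chooser):
    """Divide and conquer: split on one attribute, then recurse on each branch."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:
        return labels[0]  # all instances share one class: stop here
    if not attributes:
        # Assumption: with no attributes left, return the majority class.
        return Counter(labels).most_common(1)[0][0]
    best = choose(rows, attributes, target)
    remaining = [a for a in attributes if a != best]
    branches = {}
    # One branch per value of `best` among the rows reaching this node,
    # recursing only on the instances that reach that branch.
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, remaining, target, choose)
    return {"attribute": best, "branches": branches}


def first_attribute(rows: List[Row], attributes: Sequence[str], target: str) -> str:
    """Hypothetical placeholder chooser: simply takes the first remaining attribute."""
    return attributes[0]


toy = [  # hypothetical three-instance data set
    {"a": "x", "b": "p", "cls": "yes"},
    {"a": "x", "b": "q", "cls": "no"},
    {"a": "y", "b": "p", "cls": "yes"},
]
print(build_tree(toy, ["a", "b"], "cls", first_attribute))
```

Swapping in a smarter chooser changes which tree is built but not the recursion itself.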
Divide-and-Conquer: Constructing Decision Trees
Steps to find the attribute to split on:
Consider every attribute as a candidate and branch it according to its possible values.
For each attribute value, calculate the information of the resulting branch, then compute the information gain of each candidate attribute.
Select the attribute that gives the maximum information gain.
Repeat this until every branch ends in a node whose information is 0, that is, a node whose instances all belong to one class.
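A minimal Python sketch of "pick the attribute with maximum information gain", assuming instances are dictionaries of nominal values and the class is stored under a `target` key. The helper names and the toy data are hypothetical; ties are broken by the order of `attributes`, because Python's `max` returns the first maximal item.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Row = Dict[str, str]


def info(labels: Sequence[str]) -> float:
    """Information (entropy) of a list of class labels, in bits."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain(rows: List[Row], attribute: str, target: str) -> float:
    """Information before division minus weighted information after it."""
    before = info([row[target] for row in rows])
    after = 0.0
    for value in {row[attribute] for row in rows}:
        branch = [row[target] for row in rows if row[attribute] == value]
        after += len(branch) / len(rows) * info(branch)
    return before - after


def choose_max_gain(rows: List[Row], attributes: Sequence[str], target: str) -> str:
    """Pick the candidate attribute with the maximum information gain."""
    return max(attributes, key=lambda a: gain(rows, a, target))


toy = [  # hypothetical data: "a" determines the class, "b" does not
    {"a": "x", "b": "p", "cls": "yes"},
    {"a": "x", "b": "q", "cls": "yes"},
    {"a": "y", "b": "p", "cls": "no"},
    {"a": "y", "b": "q", "cls": "no"},
]
print(choose_max_gain(toy, ["b", "a"], "cls"))  # a
```

Here splitting on `a` yields two pure branches (gain 1 bit), while `b` leaves each branch evenly mixed (gain 0).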
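Divide-and-Conquer: Constructing Decision Trees
Calculation of Information and Gain:
For a class distribution (P1, P2, …, Pn) with P1 + P2 + … + Pn = 1, information is measured in bits (logarithms to base 2, with 0 log 0 taken as 0):
$$\text{Information}(P_1, P_2, \ldots, P_n) = -P_1 \log_2 P_1 - P_2 \log_2 P_2 - \cdots - P_n \log_2 P_n$$
$$\text{Gain} = \text{Information before division} - \text{Information after division}$$
The information after division is the average of the information of each branch, weighted by the fraction of instances that reach that branch.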
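The two formulas above translate almost directly into Python. This is a sketch under the slide's definitions; the function names and the explicit check that the probabilities sum to 1 are illustrative choices, not part of the tutorial.

```python
import math
from typing import Sequence


def information(probs: Sequence[float]) -> float:
    """Information(P1, ..., Pn) = -sum(Pi * log2(Pi)), in bits; 0 * log 0 counts as 0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log2(p) for p in probs if p > 0)


def gain(before: float, branch_infos: Sequence[float], branch_weights: Sequence[float]) -> float:
    """Information before division minus the weighted information after it."""
    after = sum(w * i for w, i in zip(branch_weights, branch_infos))
    return before - after


print(information([0.5, 0.5]))  # 1.0
print(information([1.0, 0.0]))  # 0.0 (a pure node)
```

An evenly mixed two-class node carries 1 bit; a pure node carries none, which is why splitting it further gains nothing.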
Divide-and-Conquer: Constructing Decision Trees
Example:
Here we consider each attribute individually. Each is divided into branches according to its possible values, and below each branch the number of instances of each class is marked.
Divide-and-Conquer: Constructing Decision Trees
Calculations:
Using the formula for information, initially we have:
Number of instances with class = Yes: 9
Number of instances with class = No: 5
So P1 = 9/14 and P2 = 5/14, and
$$\text{info}([9,5]) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} = 0.940 \text{ bits}$$
Now, for example, let's consider the Outlook attribute. Counting [Yes, No] instances in each branch, we observe the following:
Outlook = Sunny: [2, 3]
Outlook = Overcast: [4, 0]
Outlook = Rainy: [3, 2]
Divide-and-Conquer: Constructing Decision Trees
Example contd.
Gain from using Outlook for division:
$$\text{gain(Outlook)} = \text{info}([9,5]) - \text{info}([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247 \text{ bits}$$
Gain(Outlook) = 0.247 bits
Gain(Temperature) = 0.029 bits
Gain(Humidity) = 0.152 bits
Gain(Windy) = 0.048 bits
Since Outlook gives the maximum gain, we use it for division. We then repeat the steps for Outlook = Sunny and Outlook = Rainy, and stop for Overcast, since its information is 0.
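The Outlook numbers can be reproduced from the class counts alone. The sketch below assumes each branch is given as a [Yes, No] count pair (Sunny [2, 3], Overcast [4, 0], Rainy [3, 2]); the helper names are illustrative.

```python
import math
from typing import List, Sequence


def info(counts: Sequence[int]) -> float:
    """Information in bits of a class-count vector such as [9, 5]."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


def info_after(branches: List[Sequence[int]]) -> float:
    """Average branch information, weighted by each branch's share of instances."""
    total = sum(sum(b) for b in branches)
    return sum(sum(b) / total * info(b) for b in branches)


before = info([9, 5])                 # whole data set: 9 Yes, 5 No
outlook = [[2, 3], [4, 0], [3, 2]]    # Sunny, Overcast, Rainy as [Yes, No]
gain_outlook = before - info_after(outlook)

print(f"{before:.3f}")        # 0.940
print(f"{gain_outlook:.3f}")  # 0.247
```

The Overcast branch contributes nothing to the weighted sum because `info([4, 0])` is 0, which is also why that branch needs no further splitting.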
Divide-and-Conquer: Constructing Decision Trees
Highly branching attributes: the problem
The method described above tends to favor the attribute with the largest number of branches.
In the extreme case, it favors an attribute that has a different value for each instance, such as an identification code.
Divide-and-Conquer: Constructing Decision Trees
Highly branching attributes: the problem
Each branch of such an attribute holds a single instance, so every branch is pure and the information after division is 0:
$$\text{info}([0,1]) + \text{info}([0,1]) + \cdots + \text{info}([0,1]) = 0$$
The attribute therefore has the maximum possible gain and will be chosen for branching.
But such an attribute is useless for predicting the class of an unknown instance, and it tells us nothing about the structure of the division.
We therefore use the gain ratio to compensate for this.
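A short Python check of this argument, assuming a hypothetical identification code that gives each of the 14 weather instances its own branch (9 branches holding one Yes, 5 holding one No). Every branch is pure, so the gain equals the full 0.940 bits of the unsplit data.

```python
import math
from typing import List, Sequence


def info(counts: Sequence[int]) -> float:
    """Information in bits of a class-count vector such as [9, 5]."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


def info_after(branches: List[Sequence[int]]) -> float:
    """Average branch information, weighted by each branch's share of instances."""
    total = sum(sum(b) for b in branches)
    return sum(sum(b) / total * info(b) for b in branches)


# Hypothetical identification-code split: 14 branches, one instance each,
# written as [Yes, No] counts.
id_code = [[1, 0]] * 9 + [[0, 1]] * 5

print(info_after(id_code))                          # 0.0
print(f"{info([9, 5]) - info_after(id_code):.3f}")  # 0.940
```

No real attribute can beat this gain, since 0.940 bits is all the information there is to remove.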
Divide-and-Conquer: Constructing Decision Trees
Highly branching attributes: gain ratio
$$\text{Gain ratio} = \frac{\text{gain}}{\text{split info}}$$
To calculate the split info, we consider only the number of instances covered by each attribute value, irrespective of their class.
For the identification code, with 14 different values, the split info is:
$$\text{info}([1,1,\ldots,1]) = 14 \times \left(-\tfrac{1}{14}\log_2\tfrac{1}{14}\right) = 3.807 \text{ bits}$$
For Outlook, whose branches cover 5, 4 and 5 instances, the split info is:
$$\text{info}([5,4,5]) = -\tfrac{5}{14}\log_2\tfrac{5}{14} - \tfrac{4}{14}\log_2\tfrac{4}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} = 1.577 \text{ bits}$$
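A Python sketch of split info and gain ratio, under the same assumptions as before: branches are [Yes, No] count pairs, and the identification code is a hypothetical one-instance-per-branch split. Split info is just the information of the branch sizes, ignoring class labels.

```python
import math
from typing import List, Sequence


def info(counts: Sequence[int]) -> float:
    """Information in bits of a count vector."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


def split_info(branches: List[Sequence[int]]) -> float:
    """Information of the branch sizes alone, ignoring class labels."""
    return info([sum(b) for b in branches])


def gain_ratio(parent: Sequence[int], branches: List[Sequence[int]]) -> float:
    """Information gain divided by split info."""
    total = sum(parent)
    after = sum(sum(b) / total * info(b) for b in branches)
    return (info(parent) - after) / split_info(branches)


outlook = [[2, 3], [4, 0], [3, 2]]     # Sunny, Overcast, Rainy as [Yes, No]
id_code = [[1, 0]] * 9 + [[0, 1]] * 5  # hypothetical: one instance per code

print(f"{split_info(id_code):.3f}")  # 3.807
print(f"{split_info(outlook):.3f}")  # 1.577
print(f"{gain_ratio([9, 5], outlook):.3f}, {gain_ratio([9, 5], id_code):.3f}")  # 0.156, 0.247
```

On these counts the identification code's ratio (about 0.247) still exceeds Outlook's (about 0.156): the split-info penalty narrows the gap considerably, but does not by itself rule such attributes out.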
Decision using Data Applied’s web interface
Step1: Selection of data
Step2: Selecting Decision
Step3: Result
Visit more self-help tutorials

The tutorials section is free, self-guiding and will not involve any additional support.
Visit us at www.dataminingtools.net