From decision trees to
random forests
Viet-Trung Tran
Decision tree learning
•  Supervised learning
•  From a set of measurements, 
– learn a model
– to predict and understand a phenomenon
Example 1: wine taste preference
•  From physicochemical properties (alcohol, acidity,
sulphates, etc.)
•  Learn a model
•  To predict wine taste preference (from 0 to 10)
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis, Modeling wine preferences by data mining from physicochemical properties, 2009
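As a sketch of this setup, the snippet below fits a shallow regression tree to the Cortez et al. red-wine data. The local file name, the semicolon separator, and the "quality" column are assumptions about the downloaded UCI CSV, not details given in the slides.

```python
# Sketch: learn a decision tree for wine taste preference (Cortez et al., 2009).
# Assumes the UCI "winequality-red.csv" file (semicolon-separated, with a
# "quality" column scored 0-10) has been downloaded next to this script.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

wine = pd.read_csv("winequality-red.csv", sep=";")   # physicochemical properties + quality
X, y = wine.drop(columns="quality"), wine["quality"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeRegressor(max_depth=4, random_state=0)  # shallow tree, easy to read
tree.fit(X_train, y_train)
print("R^2 on held-out wines:", tree.score(X_test, y_test))
```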
  
Observation
•  A decision tree can be interpreted as a set of
IF...THEN rules (see the sketch below)
•  Can be applied to noisy data
•  One of the most popular inductive learning methods
•  Gives good results for real-life applications
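For instance, scikit-learn can print a fitted tree in exactly this rule form (a minimal sketch on the iris data, used here only as a stand-in):

```python
# A fitted tree printed as nested IF...THEN rules.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
print(export_text(clf, feature_names=list(iris.feature_names)))
# |--- petal width (cm) <= 0.80
# |   |--- class: 0
# ...  each root-to-leaf path is one IF...THEN rule.
```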
Decision tree representation
•  An inner node represents an attribute
•  An edge represents a test on the attribute of
the parent node
•  A leaf represents one of the classes 
•  Construction of a decision tree
– Based on the training data
– Top-down strategy
Example 2: Sport preference
Example 3: Weather & sport practicing
Classification 
•  The classification of an unknown input vector is done by
traversing the tree from the root node to a leaf node.
•  A record enters the tree at the root node.
•  At the root, a test is applied to determine which child node
the record will encounter next.
•  This process is repeated until the record arrives at a leaf
node.
•  All the records that end up at a given leaf of the tree are
classified in the same way.
•  There is a unique path from the root to each leaf.
•  The path is a rule which is used to classify the records.
•  The data set has five attributes.
•  There is a special attribute: the attribute class is the class
label.
•  The attributes temp (temperature) and humidity are
numerical
•  The other attributes are categorical, that is, they cannot be
ordered.
•  Based on the training data set, we want to find a set of rules
to know what values of outlook, temperature, humidity and
wind, determine whether or not to play golf.
•  RULE 1 If it is sunny and the humidity is not above 75%,
then play.
•  RULE 2 If it is sunny and the humidity is above 75%, then
do not play.
•  RULE 3 If it is overcast, then play.
•  RULE 4 If it is rainy and not windy, then play.
•  RULE 5 If it is rainy and windy, then don't play.
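Written directly as code, the five rules collapse to a few branches (a sketch; attribute values follow the golf data set above):

```python
# The five rules above, written directly as code.
def play_golf(outlook, humidity, windy):
    if outlook == "sunny":
        return humidity <= 75          # RULE 1 / RULE 2
    if outlook == "overcast":
        return True                    # RULE 3
    return not windy                   # RULE 4 / RULE 5 (rainy)

print(play_golf("sunny", 70, False))   # True: sunny and humidity not above 75%
print(play_golf("rainy", 80, True))    # False: rainy and windy
```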
Splitting attribute
•  At every node there is an attribute associated with
the node called the splitting attribute
•  Top-down traversal
–  In our example, outlook is the splitting attribute at root.
–  Since for the given record, outlook = rain, we move to the
rightmost child node of the root.
–  At this node, the splitting attribute is windy and we find
that for the record we want to classify, windy = true.
–  Hence, we move to the left child node and conclude that
the class label is "no play".
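A minimal sketch of that traversal: the tree as nested dicts, with a record routed from the root to a leaf (the dict layout is illustrative, not from the slides):

```python
# The golf tree as nested dicts; classify() follows edges from root to leaf.
tree = {
    "outlook": {
        "sunny":    {"humidity<=75": {True: "play", False: "no play"}},
        "overcast": "play",
        "rain":     {"windy": {True: "no play", False: "play"}},
    }
}

def classify(node, record):
    if not isinstance(node, dict):       # reached a leaf: its label is the class
        return node
    attribute = next(iter(node))         # splitting attribute at this node
    if attribute == "humidity<=75":
        branch = record["humidity"] <= 75
    else:
        branch = record[attribute]
    return classify(node[attribute][branch], record)

print(classify(tree, {"outlook": "rain", "windy": True}))   # -> "no play"
```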
Decision tree construction
•  Identify the splitting attribute and splitting
criterion at every level of the tree 
•  Algorithm 
– Iterative Dichotomizer (ID3)
Iterative Dichotomizer (ID3)
•  Quinlan (1986)
•  Each node corresponds to a splitting attribute
•  Each edge is a possible value of that attribute.
•  At each node the splitting attribute is selected to be the
most informative among the attributes not yet considered in
the path from the root.
•  Entropy is used to measure how informative a node is.
Splitting attribute selection
•  The algorithm uses the criterion of information gain
to determine the goodness of a split.
–  The attribute with the greatest information gain is taken
as the splitting attribute, and the data set is split on all
distinct values of that attribute.
•  Example: 2 classes: C1, C2, pick A1 or A2
Entropy – General Case
•  Impurity/Inhomogeneity measurement
•  Suppose X takes n values, V1, V2,… Vn, and
P(X=V1)=p1, P(X=V2)=p2, … P(X=Vn)=pn
•  What is the smallest number of bits, on average, per
symbol, needed to transmit the symbols drawn from
the distribution of X? It’s
E(X) = –p1 log2 p1 – p2 log2 p2 – … – pn log2 pn
•  E(X) = the entropy of X
Equivalently, $E(X) = -\sum_{i=1}^{n} p_i \log_2 p_i$
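As a minimal sketch, the formula transcribes directly to code (probabilities are passed in as a list; zero-probability values are skipped, since p log p tends to 0 as p goes to 0):

```python
# Entropy of a discrete distribution, straight from the formula above.
from math import log2

def entropy(probs):
    """E(X) = -sum_i p_i * log2(p_i), skipping zero-probability values."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))        # 1.0 bit: a fair coin
print(entropy([1.0]))             # 0.0 bits: a pure (single-class) node
print(entropy([9/14, 5/14]))      # ~0.940: the class distribution used below
```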
Example: 2 classes
Information gain
•  Gain(S,Wind)?
•  Wind = {Weak, Strong}
•  S = {9 Yes & 5 No}
•  Sweak = {6 Yes & 2 No | Wind=Weak}
•  Sstrong = {3 Yes & 3 No | Wind=Strong}
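Plugging these counts into the entropy formula reproduces the Wind gain reported on the next slide (a worked check, using the standard definition $\mathrm{Gain}(S,A)=\mathrm{Entropy}(S)-\sum_v \frac{|S_v|}{|S|}\mathrm{Entropy}(S_v)$):

$$\mathrm{Entropy}(S) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940$$
$$\mathrm{Entropy}(S_{weak}) \approx 0.811, \qquad \mathrm{Entropy}(S_{strong}) = 1.0$$
$$\mathrm{Gain}(S,\mathrm{Wind}) = 0.940 - \tfrac{8}{14}\cdot 0.811 - \tfrac{6}{14}\cdot 1.0 \approx 0.048$$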
Example: Decision tree learning
•  Choose splitting attribute for root among {Outlook,
Temperature, Humidity, Wind}?
–  Gain(S, Outlook) = ... = 0.246
–  Gain(S, Temperature) = ... = 0.029
–  Gain(S, Humidity) = ... = 0.151
–  Gain(S, Wind) = ... = 0.048
•  Then, for the Outlook = sunny branch:
–  Gain(Ssunny, Temperature) = 0.57
–  Gain(Ssunny, Humidity) = 0.97
–  Gain(Ssunny, Wind) = 0.019
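For a fuller check, the sketch below recomputes all four root gains. The 14-row play-tennis table is an assumption: it is the classic Quinlan/Mitchell data set these numbers come from, reconstructed here because the slides show only the results.

```python
# Recompute the root information gains from the 14-day play-tennis table.
from collections import Counter
from math import log2

# (outlook, temperature, humidity, wind, play)
data = [
    ("sunny","hot","high","weak","no"),          ("sunny","hot","high","strong","no"),
    ("overcast","hot","high","weak","yes"),      ("rain","mild","high","weak","yes"),
    ("rain","cool","normal","weak","yes"),       ("rain","cool","normal","strong","no"),
    ("overcast","cool","normal","strong","yes"), ("sunny","mild","high","weak","no"),
    ("sunny","cool","normal","weak","yes"),      ("rain","mild","normal","weak","yes"),
    ("sunny","mild","normal","strong","yes"),    ("overcast","mild","high","strong","yes"),
    ("overcast","hot","normal","weak","yes"),    ("rain","mild","high","strong","no"),
]
ATTRS = ["outlook", "temperature", "humidity", "wind"]

def entropy(labels):
    return -sum(c/len(labels) * log2(c/len(labels)) for c in Counter(labels).values())

def gain(rows, i):
    split = Counter(r[i] for r in rows)
    remainder = sum(n/len(rows) * entropy([r[-1] for r in rows if r[i] == v])
                    for v, n in split.items())
    return entropy([r[-1] for r in rows]) - remainder

for i, a in enumerate(ATTRS):
    print(f"Gain(S, {a}) = {gain(data, i):.3f}")
# -> outlook 0.246, temperature 0.029, humidity 0.151, wind 0.048
```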
Over-fitting example
•  Consider adding noisy training example #15
–  Sunny, hot, normal, strong, playTennis = No
•  What effect on earlier tree?
Over-fitting
Avoid over-fitting
•  Stop growing when data split not statistically
significant
•  Grow full tree then post-prune
•  How to select the best tree
– Measure performance over training data
– Measure performance over separate validation
dataset
– MDL: minimize
•  size(tree) + size(misclassifications(tree))
Reduced-error pruning
•  Split data into training and validation set
•  Do until further pruning is harmful
–  Evaluate impact on validation set of pruning
each possible node
– Greedily remove the one that most improves
validation set accuracy
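The slides describe reduced-error pruning; scikit-learn does not ship that exact algorithm, so the sketch below substitutes its cost-complexity pruning path, with a held-out validation set greedily picking the subtree with the best validation accuracy, in the spirit of the procedure above.

```python
# "Grow full tree, then post-prune" via the cost-complexity pruning path,
# selecting the subtree that scores best on a validation set.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
trees = [DecisionTreeClassifier(ccp_alpha=a, random_state=0).fit(X_train, y_train)
         for a in path.ccp_alphas]
best = max(trees, key=lambda t: t.score(X_val, y_val))   # greedy: best validation accuracy
print("leaves:", best.get_n_leaves(), " val acc:", best.score(X_val, y_val))
```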
Rule post-pruning
•  Convert tree to equivalent set
of rules
•  Prune each rule independently
of others
•  Sort final rules into desired
sequence for use
Issues in Decision Tree Learning
•  How deep to grow?
•  How to handle continuous attributes?
•  How to choose an appropriate attribute selection
measure?
•  How to handle data with missing attribute values?
•  How to handle attributes with different costs?
•  How to improve computational efficiency?
•  ID3 has been extended to handle most of these.
The resulting system is C4.5:
http://cis-linux1.temple.edu/~ingargio/cis587/readings/id3-c45.html
Decision tree – When?
References
•  Data mining, Nhat-Quang Nguyen, HUST
•  http://www.cs.cmu.edu/~awm/10701/slides/DTreesAndOverfitting-9-13-05.pdf
RANDOM FORESTS
Credits: Michal Malohlava @Oxdata
Motivation
•  Training sample of points
covering area [0,3] x [0,3]
•  Two possible colors of
points
•  The model should be able to predict the color of a
new point
Decision tree
How to grow a decision tree
•  Split rows in a given
node into two sets with
respect to an impurity
measure
–  The smaller the impurity, the more
skewed the distribution
–  Compare impurity of
parent with impurity of
children
When to stop growing the tree
•  Build full tree or
•  Apply stopping criterion - limit on:
–  Tree depth, or
–  Minimum number of points in a leaf
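A minimal sketch of these two stopping criteria as scikit-learn hyperparameters (the synthetic data set is only for illustration):

```python
# Pre-pruning: cap tree depth and leaf size instead of growing the full tree.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
full = DecisionTreeClassifier(random_state=0).fit(X, y)            # build full tree
stopped = DecisionTreeClassifier(max_depth=5,                      # limit on tree depth
                                 min_samples_leaf=10,              # min points in a leaf
                                 random_state=0).fit(X, y)
print("depth:", full.get_depth(), "->", stopped.get_depth())
```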
How to assign a leaf value?
•  If the leaf contains only one point,
its color represents the leaf value
•  Else the majority color is picked, or
the color distribution is stored
Decision tree
•  The tree covers the whole area with rectangles, each
predicting a point color
Decision tree scoring
•  The model can predict a point color based
on its coordinates.
Over-fitting
•  The tree perfectly represents the training data (0%
training error), but it has also learned the noise!
•  Hence it predicts new points poorly!
Handle over-fitting
•  Pre-pruning via stopping criterion!
•  Post-pruning: reduces the complexity of the
model and helps it generalize
•  Randomize tree building and combine trees
together
Randomize #1 - Bagging
•  Each tree sees only a sample of the training data
and captures only part of the information.
•  Build multiple weak trees which vote
together to give the resulting prediction
– voting is based on majority vote, or weighted
average
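A minimal bagging sketch with scikit-learn: each of the 100 trees is fit on a bootstrap sample of the rows and the ensemble votes by majority (the two-moons data set is only for illustration):

```python
# Bagging: many trees on bootstrap samples, combined by majority vote.
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

single = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                           random_state=0).fit(X_train, y_train)
print("single tree:", single.score(X_test, y_test))
print("bagged trees:", bagged.score(X_test, y_test))
```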
Bagging - boundary
•  Bagging averages many trees, and produces
smoother decision boundaries.
Randomize #2 - Feature selection

Random forest
Random forest - properties
•  Refinement of bagged trees; quite popular
•  At each tree split, a random sample of m features is drawn,
and only those m features are considered for splitting.
Typically
•  m=√p or log2(p), where p is the number of features
•  For each tree grown on a bootstrap sample, the error rate
for observations left out of the bootstrap sample is
monitored. This is called the “out-of-bag” error rate.
•  Random forests try to improve on bagging by “de-
correlating” the trees. Each tree has the same expectation.
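A minimal scikit-learn sketch of this feature-subsampling rule (the synthetic data set is only for illustration; max_features="sqrt" is scikit-learn's spelling of m = √p):

```python
# Random forest: at each split, only m = sqrt(p) of the p features are considered.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",  # m = sqrt(p)
                                random_state=0).fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```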
Advantages of Random Forest
•  Independent trees which can be built in
parallel
•  The model does not overfit easily
•  Produces reasonable accuracy
•  Brings more features for analyzing the data: variable
importance, proximities, missing-value
imputation
Out of bag points and validation
•  Each tree is built over
a sample of training
points.
•  Remaining points are
called “out-of-bag” (OOB).
These points are used for validation, as a good
approximation of the generalization error;
almost identical to N-fold cross-validation.
  
