Lecture 3b: Decision Trees (Part 1)
1. Machine Learning for Language Technology 2015
http://stp.lingfil.uu.se/~santinim/ml/2015/ml4lt_2015.htm
Decision Trees (Part 1)
Marina Santini
santinim@stp.lingfil.uu.se
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Autumn 2015
2. Outline
• Greediness
• Divide and Conquer
• Inductive Bias of the Decision Tree
• Loss function
• Expected loss
• Empirical error
• Induction
6. That is, ....
• Questions = Features
• Answers = Feature Values
• Ratings = Class Labels
• An example is a set of feature values.
• Training data is a set of examples associated with class labels.
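As a minimal sketch of this representation in Python (the feature names and class labels are invented for illustration, in the course-rating flavour of the example):

```python
# An example is a set of feature values; the training data pairs
# each example with a class label.
examples = [
    {"easy?": "yes", "ai?": "yes", "morning?": "no"},
    {"easy?": "no",  "ai?": "yes", "morning?": "yes"},
    {"easy?": "yes", "ai?": "no",  "morning?": "no"},
]
labels = ["like", "nah", "like"]          # one class label per example

training_data = list(zip(examples, labels))
```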
7. ”Greedy model”: the most useful feature
– Histograms
– Root node
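One way to make "most useful" concrete, sketched below under the assumption of binary yes/no features: build a histogram of class labels on each side of the split, and score the feature by how many examples the majority label on each side gets right. The best-scoring feature becomes the root node.

```python
from collections import Counter

def feature_score(data, feature):
    """How many examples the majority label on each side of the
    split on `feature` would classify correctly."""
    score = 0
    for value in ("yes", "no"):
        side = [label for example, label in data if example[feature] == value]
        if side:  # histogram of labels on this side of the split
            score += Counter(side).most_common(1)[0][1]
    return score

def most_useful_feature(data, features):
    # Greedy choice: ask the best single question first.
    return max(features, key=lambda f: feature_score(data, f))
```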
8. Divide & Conquer
• Divide:
– Partition the data into 2 parts:
• YES part vs NO part
• Conquer:
– Recurse and run the Divide routine
9. The end of the cycle
• ... When it becomes useless to query on
additional features
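A minimal recursive sketch of this divide-and-conquer loop, reusing `most_useful_feature` from above. Treating "all labels agree, no features left, or an empty split" as the end of the cycle is one simple assumption about when further queries become useless:

```python
from collections import Counter

def train_tree(data, features):
    labels = [label for _, label in data]
    majority = Counter(labels).most_common(1)[0][0]
    # End of the cycle: no more useful questions to ask.
    if len(set(labels)) == 1 or not features:
        return {"leaf": majority}
    best = most_useful_feature(data, features)
    # Divide: partition the data into a YES part and a NO part.
    yes_part = [(x, y) for x, y in data if x[best] == "yes"]
    no_part  = [(x, y) for x, y in data if x[best] == "no"]
    if not yes_part or not no_part:       # useless split; stop here
        return {"leaf": majority}
    remaining = [f for f in features if f != best]
    # Conquer: recurse and run the Divide routine on each part.
    return {"feature": best,
            "yes": train_tree(yes_part, remaining),
            "no":  train_tree(no_part, remaining)}
```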
10. Decision tree: Inductive Bias
• The goal of the decision tree learning model
is:
– to figure out what questions to ask
– in what order
– what answer to predict once you have asked
enough questions
– The inductive bias of decision trees: the things that we want to learn to predict are more like the root node and less like the other branch nodes.
11. Informal Definition
• A decision tree is:
– a flow-chart-like structure, where
• each internal (non-leaf) node denotes a test on an
attribute,
• each branch represents the outcome of a test, and
• each leaf (or terminal) node holds a class label.
• The topmost node in a tree is the root node.
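Classification then just follows the flow chart from the root node down to a leaf. A sketch matching the nested dictionaries built by `train_tree` above:

```python
def predict(tree, example):
    # Each internal node tests one attribute; each branch is an
    # outcome of the test; each leaf holds a class label.
    while "leaf" not in tree:
        tree = tree[example[tree["feature"]]]   # follow the "yes"/"no" branch
    return tree["leaf"]
```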
12. Formalising the learning problem:
1) the loss function
• The loss function $\ell(y, f(x))$ measures how bad it is to predict $f(x)$ when the true answer is $y$.
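Two textbook loss functions as a sketch (zero/one loss for classification, squared loss for regression; the lecture itself does not fix a particular choice):

```python
def zero_one_loss(y, y_hat):
    # 1 if the prediction is wrong, 0 if it is right.
    return 0 if y == y_hat else 1

def squared_loss(y, y_hat):
    # Penalizes large mistakes much more than small ones.
    return (y - y_hat) ** 2
```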
13. Formalising the learning problem:
2) Data Generating Distribution
$\mathcal{D}(x, y)$
14. Expected Loss
1. The loss function
2. The data generating distribution
15. Formulae: Expected Value
$\epsilon \triangleq \mathbb{E}_{(x,y) \sim \mathcal{D}}\big[\ell(y, f(x))\big] = \sum_{(x,y)} \mathcal{D}(x,y)\,\ell(y, f(x))$
How to read:
• $\epsilon$ = epsilon (the expected loss)
• $\triangleq$ = equal by definition to (or: is defined as)
• $\mathbb{E}$ = blackboard-bold E
• sub $(x,y) \sim \mathcal{D}$ = the pair (x, y) drawn from script D
• $\ell(y, f(x))$ = "l" of the pair y, f of x
• Right-hand side: the sum, over all pairs (x, y) in script D, of $\mathcal{D}(x, y)$ times $\ell(y, f(x))$.
16. Training Error
• The training error is the average error over the training data:
$\hat{\epsilon} \triangleq \frac{1}{N} \sum_{n=1}^{N} \ell\big(y_n, f(x_n)\big)$
• How to read: the training error epsilon-hat is equal by definition to 1 over N times the sum, from n = 1 to capital N, of "l" of $y_n$ and $f(x_n)$.
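In code, the training error is just this average. A sketch that plugs in any of the loss functions above:

```python
def training_error(data, f, loss):
    # Average loss of predictor f over the N training examples.
    return sum(loss(y, f(x)) for x, y in data) / len(data)

# e.g.: tree = train_tree(training_data, ["easy?", "ai?", "morning?"])
#       training_error(training_data, lambda x: predict(tree, x), zero_one_loss)
```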
17. Empirical Error
• Alpaydin (2010: 24): the empirical error is the proportion of training instances where the predictions of h (the hypothesis = the informed guess) do not match the required values given in X (the training set). The error of the hypothesis h given the training set X is:
$E(h \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \mathbb{1}\big(h(x^t) \neq r^t\big)$
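Since the formula counts mismatches, with zero/one loss this coincides with the training error above: the proportion of wrong predictions. A minimal sketch:

```python
def empirical_error(h, X):
    # X: list of (x, r) pairs, inputs with their required values.
    mismatches = sum(1 for x, r in X if h(x) != r)
    return mismatches / len(X)
```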
18. Induction
Given:
• a loss function l
and
• a sample d from some unknown distribution D
• you must compute a function f that has low
expected error ε over D with respect to l.
19. Quiz 1: Training error
• How would you define the training error on a dataset?
1. Training error is the average loss over the
training sample
2. Training error is the expected prediction error
over an independent test sample
3. None of the above
20. Quiz 2: Distributions
What kind of distribution is D
in the formula above?
1. Normal
2. Unknown
3. None of the above
21. Quiz 3: Loss function
• How would you define a loss function?
1. The loss function L(actual value, predicted value)
characterizes how bad predictions are
2. The loss function is an unknown distribution
3. Both definitions are incorrect.