Machine Learning
Jesse Davis
jdavis@cs.washington.edu
Outline
• Brief overview of learning
• Inductive learning
• Decision trees
A Few Quotes
• “A breakthrough in machine learning would be worth
ten Microsofts” (Bill Gates, Chairman, Microsoft)
• “Machine learning is the next Internet”
(Tony Tether, Director, DARPA)
• “Machine learning is the hot new thing”
(John Hennessy, President, Stanford)
• “Web rankings today are mostly a matter of machine
learning” (Prabhakar Raghavan, Dir. Research, Yahoo)
• “Machine learning is going to result in a real revolution” (Greg
Papadopoulos, CTO, Sun)
So What Is Machine Learning?
• Automating automation
• Getting computers to program themselves
• Writing software is the bottleneck
• Let the data do the work instead!
Traditional Programming
Machine Learning
Computer
Data
Program
Output
Computer
Data
Output
Program
Sample Applications
• Web search
• Computational biology
• Finance
• E-commerce
• Space exploration
• Robotics
• Information extraction
• Social networks
• Debugging
• [Your favorite area]
Defining A Learning Problem
• A program learns from experience E with
respect to task T and performance measure P,
if its performance at task T, as measured by P,
improves with experience E.
• Example:
– Task: Play checkers
– Performance: % of games won
– Experience: Play games against itself
Types of Learning
• Supervised (inductive) learning
– Training data includes desired outputs
• Unsupervised learning
– Training data does not include desired outputs
• Semi-supervised learning
– Training data includes a few desired outputs
• Reinforcement learning
– Rewards from sequence of actions
Outline
• Brief overview of learning
• Inductive learning
• Decision trees
Inductive Learning
• Inductive learning or “Prediction”:
– Given examples of a function (X, F(X))
– Predict function F(X) for new examples X
• Classification
F(X) = Discrete
• Regression
F(X) = Continuous
• Probability estimation
F(X) = Probability(X)
Terminology
Feature Space:
Properties that describe the problem
[Figure: an empty 2-D feature space, axes 0.0-6.0 by 0.0-3.0]
Terminology
Example:
<0.5,2.8,+>
[Figure: the feature space populated with labeled training examples, shown as + and -]
Terminology
Hypothesis:
Function for labeling examples
[Figure: a hypothesis splits the feature space into a Label: + region and a Label: - region; query points marked ? are labeled by the region they fall in]
Terminology
Hypothesis Space:
Set of legal hypotheses
[Figure: the labeled examples again; the hypothesis space is the set of all labeling functions the learner may consider]
Supervised Learning
Given: <x, f(x)> for some unknown function f
Learn: A hypothesis H that approximates f
Example Applications:
• Disease diagnosis
x: Properties of patient (e.g., symptoms, lab test results)
f(x): Predict disease
• Automated steering
x: Bitmap picture of road in front of car
f(x): Degrees to turn the steering wheel
• Credit risk assessment
x: Customer credit history and proposed purchase
f(x): Approve purchase or not
Inductive Bias
• Need to make assumptions
– Experience alone doesn’t allow us to make
conclusions about unseen data instances
• Two types of bias:
– Restriction: Limit the hypothesis space
(e.g., look at rules)
– Preference: Impose ordering on hypothesis space
(e.g., more general, consistent with data)
x1 → y
x3 → y
x4 → y
Eager
[Figures: an eager learner fits an explicit hypothesis to the labeled training data (a Label: + region and a Label: - region), then discards the data and classifies new query points (?) using only the stored hypothesis]
Lazy
[Figure: a lazy learner stores the training examples as-is and labels each query point (?) based on its neighbors]
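To make the lazy strategy concrete, here is a minimal 1-nearest-neighbor sketch in Python (my own illustration, not from the slides): there is no training step at all; the examples are stored verbatim and each query is labeled by its closest stored neighbor.

import math

def nn_classify(train, query):
    # Lazy learning: no training step; scan the stored examples at
    # query time and return the label of the nearest one
    dist = lambda p: math.hypot(p[0] - query[0], p[1] - query[1])
    point, label = min(train, key=lambda ex: dist(ex[0]))
    return label

# Points in the same 2-D feature space as the figures: ((x, y), label)
train = [((0.5, 2.8), '+'), ((1.0, 2.0), '+'),
         ((4.5, 1.0), '-'), ((5.0, 0.5), '-')]
print(nn_classify(train, (0.8, 2.5)))  # -> '+', labeled by its neighbors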
Batch
[Figures: a batch learner receives the entire training set at once and fits the Label: + / Label: - regions to all examples together]
Online
[Figures: an online learner receives examples one at a time (here a -, then a +, then another +) and updates its Label: + / Label: - hypothesis after each arrival]
Outline
• Brief overview of learning
• Inductive learning
• Decision trees
Decision Trees
• Convenient Representation
– Developed with learning in mind
– Deterministic
– Comprehensible output
• Expressive
– Equivalent to propositional DNF
– Handles discrete and continuous parameters
• Simple learning algorithm
– Handles noise well
– The algorithm can be characterized as follows:
• Constructive (build DT by adding nodes)
• Eager
• Batch (but incremental versions exist)
Concept Learning
• E.g., Learn concept “Edible mushroom”
– Target Function has two values: T or F
• Represent concepts as decision trees
• Use hill-climbing search through the
space of decision trees
– Start with simple concept
– Refine it into a complex concept as needed
Example: “Good day for tennis”
• Attributes of instances
– Outlook = {rainy (r), overcast (o), sunny (s)}
– Temperature = {cool (c), medium (m), hot (h)}
– Humidity = {normal (n), high (h)}
– Wind = {weak (w), strong (s)}
• Class value
– Play Tennis? = {don’t play (n), play (y)}
• Feature = attribute with one value
– E.g., outlook = sunny
• Sample instance
– outlook=sunny, temp=hot, humidity=high,
wind=weak
Experience: “Good day for tennis”
Day Outlook Temp Humid Wind PlayTennis?
d1 s h h w n
d2 s h h s n
d3 o h h w y
d4 r m h w y
d5 r c n w y
d6 r c n s n
d7 o c n s y
d8 s m h w n
d9 s c n w y
d10 r m n w y
d11 s m n s y
d12 o m h s y
d13 o h n w y
d14 r m h s n
Decision Tree Representation
Good day for tennis?
Outlook
  Sunny -> Humidity
    High -> Don’t play
    Normal -> Play
  Overcast -> Play
  Rain -> Wind
    Weak -> Play
    Strong -> Don’t play
Leaves = classification
Arcs = choice of value for parent attribute
Decision tree is equivalent to logic in disjunctive normal form:
Play ⇔ (Sunny ∧ Normal) ∨ Overcast ∨ (Rain ∧ Weak)
Numeric Attributes
Outlook
  Sunny -> Humidity
    >= 75% -> Don’t play
    < 75% -> Play
  Overcast -> Play
  Rain -> Wind
    < 10 MPH -> Play
    >= 10 MPH -> Don’t play
Use thresholds to convert numeric attributes into discrete values
DT Learning as Search
• Nodes: decision trees
• Operators: tree refinement (sprouting the tree)
• Initial node: smallest tree possible (a single leaf)
• Heuristic: information gain
• Goal: best tree possible (???)
What is the Simplest Tree?
Day Outlook Temp Humid Wind Play?
d1 s h h w n
d2 s h h s n
d3 o h h w y
d4 r m h w y
d5 r c n w y
d6 r c n s n
d7 o c n s y
d8 s m h w n
d9 s c n w y
d10 r m n w y
d11 s m n s y
d12 o m h s y
d13 o h n w y
d14 r m h s n
How good?
[9+, 5-]
Majority class:
correct on 9 examples
incorrect on 5 examples
Successors
[Figure: the four candidate splits at the root (Outlook, Temp, Humid, Wind), with the resulting child distributions rated from Bad to Good]
Which attribute should we use to split?
Disorder is bad; homogeneity is good.
Entropy
[Figure: entropy vs. the proportion of examples that are positive; entropy peaks at 1.0 for a 50-50 class split (maximum disorder) and falls to 0 for a pure, all-positive distribution]
Entropy (disorder) is bad
Homogeneity is good
• Let S be a set of examples
• Entropy(S) = -P log2(P) - N log2(N)
– P is the proportion of positive examples
– N is the proportion of negative examples
– 0 log 0 == 0
• Example: S has 9 pos and 5 neg
Entropy([9+, 5-]) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
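As a sanity check, this formula is easy to compute directly. A minimal Python sketch (my own, not from the slides) that reproduces the 0.940 value for [9+, 5-]:

import math

def entropy(pos, neg):
    # Entropy of a set with pos positive and neg negative examples,
    # using the convention 0 log 0 == 0
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:
            result -= p * math.log2(p)
    return result

print(round(entropy(9, 5), 3))  # 0.94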
Information Gain
• Measure of expected reduction in entropy
• Resulting from splitting along an attribute
Gain(S,A) = Entropy(S) - Σ_{v ∈ Values(A)} (|Sv| / |S|) Entropy(Sv)
where Entropy(S) = -P log2(P) - N log2(N)
Day Wind Tennis?
d1 weak n
d2 s n
d3 weak yes
d4 weak yes
d5 weak yes
d6 s n
d7 s yes
d8 weak n
d9 weak yes
d10 weak yes
d11 s yes
d12 s yes
d13 weak yes
d14 s n
Gain of Splitting on Wind
Values(wind) = {weak, s}
S = [9+, 5-]
Gain(S, wind)
= Entropy(S) - Σ_{v ∈ {weak, s}} (|Sv| / |S|) Entropy(Sv)
= Entropy(S) - (8/14) Entropy(Sweak) - (6/14) Entropy(Ss)
= 0.940 - (8/14)(0.811) - (6/14)(1.00)
= 0.048
Sweak = [6+, 2-]
Ss = [3+, 3-]
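The same bookkeeping can be scripted. A small self-contained sketch (my own; the dicts and attribute names are an assumed encoding of the table above) that reproduces Gain(S, wind) ≈ 0.048:

import math

def entropy(labels):
    # Entropy of a list of class labels (0 log 0 == 0)
    n = len(labels)
    return -sum(labels.count(c) / n * math.log2(labels.count(c) / n)
                for c in set(labels))

def gain(examples, attr, target):
    # Gain(S,A) = Entropy(S) - sum over v of (|Sv|/|S|) Entropy(Sv)
    total = entropy([ex[target] for ex in examples])
    for v in set(ex[attr] for ex in examples):
        sub = [ex[target] for ex in examples if ex[attr] == v]
        total -= len(sub) / len(examples) * entropy(sub)
    return total

# Wind and PlayTennis columns for days d1..d14
days = [('weak', 'n'), ('s', 'n'), ('weak', 'y'), ('weak', 'y'),
        ('weak', 'y'), ('s', 'n'), ('s', 'y'), ('weak', 'n'),
        ('weak', 'y'), ('weak', 'y'), ('s', 'y'), ('s', 'y'),
        ('weak', 'y'), ('s', 'n')]
examples = [{'wind': w, 'play': p} for w, p in days]
print(round(gain(examples, 'wind', 'play'), 3))  # 0.048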
Decision Tree Algorithm
BuildTree(TrainingData)
  Split(TrainingData)

Split(D)
  If (all points in D are of the same class)
    Then Return
  For each attribute A
    Evaluate splits on attribute A
  Use best split to partition D into D1, D2
  Split(D1)
  Split(D2)
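A runnable Python sketch of this procedure (my own minimal ID3-style variant: it splits multiway on each value of the best attribute rather than into a binary D1/D2, and the data encoding is assumed, not from the slides):

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attr, target):
    total = entropy([ex[target] for ex in examples])
    for v in {ex[attr] for ex in examples}:
        sub = [ex[target] for ex in examples if ex[attr] == v]
        total -= len(sub) / len(examples) * entropy(sub)
    return total

def build_tree(examples, attrs, target):
    # Stop when the node is pure or no attributes remain: majority-class leaf
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Otherwise split on the attribute with the highest information gain
    best = max(attrs, key=lambda a: gain(examples, a, target))
    branches = {}
    for v in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == v]
        branches[v] = build_tree(subset, [a for a in attrs if a != best], target)
    return (best, branches)

# The 14 tennis days from the table above
rows = [('s','h','h','w','n'), ('s','h','h','s','n'), ('o','h','h','w','y'),
        ('r','m','h','w','y'), ('r','c','n','w','y'), ('r','c','n','s','n'),
        ('o','c','n','s','y'), ('s','m','h','w','n'), ('s','c','n','w','y'),
        ('r','m','n','w','y'), ('s','m','n','s','y'), ('o','m','h','s','y'),
        ('o','h','n','w','y'), ('r','m','h','s','n')]
data = [dict(zip(('outlook','temp','humid','wind','play'), r)) for r in rows]
print(build_tree(data, ['outlook', 'temp', 'humid', 'wind'], 'play'))
# Splits on outlook first (gain 0.246), then humid and wind, as in the slides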
Evaluating Attributes
Yes
Outlook Temp
Humid Wind
Gain(S,Humid)
=0.151
Gain(S,Outlook)
=0.246
Gain(S,Temp)
=0.029
Gain(S,Wind)
=0.048
Resulting Tree
Good day for tennis?
Outlook
  Sunny -> Don’t Play [2+, 3-]
  Overcast -> Play [4+]
  Rain -> Play [3+, 2-]
(the Sunny and Rain leaves are still impure, so we recurse on them)
Recurse
Good day for tennis?
Recurse on the Sunny branch of Outlook, using the five sunny examples:
Day Temp Humid Wind Tennis?
d1 h h weak n
d2 h h s n
d8 m h weak n
d9 c n weak yes
d11 m n s yes
One Step Later
Good day for tennis?
Outlook
  Sunny -> Humidity
    High -> Don’t play [3-]
    Normal -> Play [2+]
  Overcast -> Play [4+]
  Rain -> Play [3+, 2-]
Recurse Again
Good day for tennis?
Recurse on the Rain branch of Outlook, using the five rainy examples:
Day Temp Humid Wind Tennis?
d4 m h weak yes
d5 c n weak yes
d6 c n s n
d10 m n weak yes
d14 m h s n
One Step Later: Final Tree
Good day for tennis?
Outlook
  Sunny -> Humidity
    High -> Don’t play [3-]
    Normal -> Play [2+]
  Overcast -> Play [4+]
  Rain -> Wind
    Weak -> Play [3+]
    Strong -> Don’t play [2-]
Issues
• Missing data
• Real-valued attributes
• Many-valued features
• Evaluation
• Overfitting
Missing Data 1
Day Temp Humid Wind Tennis?
d1 h h weak n
d2 h h s n
d8 m h weak n
d9 c ? weak yes
d11 m n s yes
Two simple fixes for d9’s missing Humidity value:
• Assign the most common value at this node: ? => h
• Assign the most common value for the example’s class: ? => n
Missing Data 2
• Alternatively, split d9 across branches as 75% h and 25% n (the proportions observed at this node)
• Use these fractional counts in the gain calculations
• Further subdivide if other attributes are also missing
• Use the same approach to classify a test example with a missing attribute
– The classification is the most probable one, summing over the leaves across which the example was divided
Day Temp Humid Wind Tennis?
d1 h h weak n
d2 h h s n
d8 m h weak n
d9 c ? weak yes
d11 m n s yes
With fractional counts, the Humid = h branch holds [0.75+, 3-] and the Humid = n branch holds [1.25+, 0-]
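A small sketch of the fractional-count idea (my own illustration): the entropy helper simply accepts weights instead of whole counts.

import math

def entropy_w(pos, neg):
    # Entropy from (possibly fractional) class weights; pure nodes give 0
    total = pos + neg
    return -sum(w / total * math.log2(w / total)
                for w in (pos, neg) if 0 < w < total)

print(round(entropy_w(0.75, 3.0), 3))  # Humid = h branch [0.75+, 3-] -> 0.722
print(entropy_w(1.25, 0.0))            # Humid = n branch [1.25+, 0-] -> 0 (pure)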
Real-valued Features
• Discretize?
• Threshold split using observed values?
Wind Play
8 n
25 n
12 y
10 y
10 n
12 y
7 y
6 y
7 y
7 y
6 y
5 n
7 y
11 n
Candidate thresholds on Wind:
split at >= 10: Gain = 0.048
split at >= 12: Gain = 0.0004
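A sketch of evaluating candidate thresholds (my own code; the wind/play lists transcribe the table above):

import math

def entropy(labels):
    n = len(labels)
    return -sum(labels.count(c) / n * math.log2(labels.count(c) / n)
                for c in set(labels))

def threshold_gain(values, labels, t):
    # Gain of the binary split "value >= t" vs. "value < t"
    total = entropy(labels)
    for side in (True, False):
        sub = [l for v, l in zip(values, labels) if (v >= t) == side]
        if sub:
            total -= len(sub) / len(labels) * entropy(sub)
    return total

wind = [8, 25, 12, 10, 10, 12, 7, 6, 7, 7, 6, 5, 7, 11]
play = list('nnyynyyyyyynyn')
for t in (10, 12):
    print(t, round(threshold_gain(wind, play, t), 4))
# Prints the gains for the two candidate thresholds compared above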
Many-valued Attributes
• Problem:
– If attribute has many values, Gain will select it
– Imagine using Date = June_6_1996
• So many values
– Divides examples into tiny sets
– Sets are likely uniform => high info gain
– Poor predictor
• Penalize these attributes
One Solution: Gain Ratio
Gain Ratio(S,A) = Gain(S,A) / SplitInfo(S,A)
SplitInfo(S,A) = - Σ_{v ∈ Values(A)} (|Sv| / |S|) log2(|Sv| / |S|)
SplitInfo is the entropy of S with respect to the values of A
(contrast with the entropy of S with respect to the target value)
It penalizes attributes with many uniformly distributed values:
e.g., if A splits S uniformly into n sets, SplitInfo = log2(n) (= 1 for a Boolean attribute)
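SplitInfo itself is easy to check numerically. A tiny sketch (my own) computing it from the subset sizes |Sv|, confirming log2(n) for uniform n-way splits:

import math

def split_info(sizes):
    # SplitInfo from the subset sizes |Sv|: entropy of S wrt the values of A
    n = sum(sizes)
    return -sum(s / n * math.log2(s / n) for s in sizes)

print(split_info([7, 7]))              # uniform Boolean split -> 1.0
print(split_info([2, 2, 2, 2]))        # uniform 4-way split -> log2(4) = 2.0
print(round(split_info([1] * 14), 2))  # a Date-like attribute -> log2(14) ~ 3.81
# Gain Ratio then divides Gain(S,A) by this quantity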
Evaluation: Cross Validation
• Partition examples into k disjoint sets
• Now create k training sets
– Each training set is the union of all folds except one
– So each training set has (k-1)/k of the original training data
[Diagram: in each round one fold is held out as Test while the remaining folds form Train]
Cross-Validation (2)
• Leave-one-out
– Use if < 100 examples (rough estimate)
– Hold out one example, train on remaining
examples
• M runs of N-fold cross-validation
– Repeat M times: divide the data into N folds and do N-fold cross-validation
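A plain-Python sketch of the fold construction (my own illustration): each fold serves once as the test set while the remaining folds form the training set.

def k_folds(examples, k):
    # Partition into k disjoint folds; yield (train, test) pairs where each
    # training set holds the other k-1 folds, i.e. about (k-1)/k of the data
    folds = [examples[i::k] for i in range(k)]
    for i in range(k):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, folds[i]

for train, test in k_folds(list(range(10)), 5):
    print(test, '<- test fold; train on the other', len(train), 'examples')
# Leave-one-out is the special case k = len(examples)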
Methodology Citations
• Dietterich, T. G. (1998). Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 10(7):1895-1924.
• Demšar, J. (2006). Statistical Comparisons of Classifiers over Multiple Data Sets. Journal of Machine Learning Research, 7:1-30.
Overfitting
[Figure: accuracy (0.6-0.9) vs. number of nodes in the decision tree; accuracy on the training data keeps climbing while accuracy on the test data peaks and then falls]
Overfitting Definition
• A decision tree DT is overfit when there exists another tree DT’ such that
– DT has smaller error than DT’ on the training examples, but
– DT has larger error than DT’ on the test examples
• Causes of overfitting
– Noisy data, or
– Training set is too small
• Solutions
– Reduced error pruning
– Early stopping
– Rule post pruning
Reduced Error Pruning
• Split data into train and validation set
• Repeat until pruning is harmful
– Remove each subtree and replace it with majority
class and evaluate on validation set
– Remove subtree that leads to largest gain in
accuracy
[Diagram: the data is divided into folds, one held out as Test and the others used as Tune (validation) sets]
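A sketch of this greedy loop (my own representation, not from the slides: internal nodes are dicts {'attr': ..., 'branches': {value: subtree}, 'majority': label} and leaves are plain class labels):

import copy

def classify(tree, ex):
    # Walk to a leaf; unseen attribute values fall back to the majority class
    while isinstance(tree, dict):
        tree = tree['branches'].get(ex[tree['attr']], tree['majority'])
    return tree

def accuracy(tree, examples, target):
    return sum(classify(tree, ex) == ex[target] for ex in examples) / len(examples)

def internal_paths(tree, path=()):
    # Yield the branch-value path to every internal node (pruning candidate)
    if isinstance(tree, dict):
        yield path
        for v, child in tree['branches'].items():
            yield from internal_paths(child, path + (v,))

def collapse(tree, path):
    # Copy of tree with the subtree at `path` replaced by its majority class
    t = copy.deepcopy(tree)
    if not path:
        return t['majority']
    node = t
    for v in path[:-1]:
        node = node['branches'][v]
    node['branches'][path[-1]] = node['branches'][path[-1]]['majority']
    return t

def reduced_error_prune(tree, val_set, target):
    # Repeatedly apply the single pruning that most improves validation
    # accuracy; stop as soon as no pruning helps
    while isinstance(tree, dict):
        base = accuracy(tree, val_set, target)
        best = max((collapse(tree, p) for p in internal_paths(tree)),
                   key=lambda t: accuracy(t, val_set, target))
        if accuracy(best, val_set, target) <= base:
            break
        tree = best
    return tree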
Reduced Error Pruning Example
Outlook
  Sunny -> Humidity
    High -> Don’t play
    Normal -> Play
  Overcast -> Play
  Rain -> Wind
    Weak -> Play
    Strong -> Don’t play
Validation set accuracy = 0.75
Reduced Error Pruning Example
Prune the Humidity subtree:
Outlook
  Sunny -> Don’t play
  Overcast -> Play
  Rain -> Wind
    Weak -> Play
    Strong -> Don’t play
Validation set accuracy = 0.80
Reduced Error Pruning Example
Prune the Wind subtree instead:
Outlook
  Sunny -> Humidity
    High -> Don’t play
    Normal -> Play
  Overcast -> Play
  Rain -> Play
Validation set accuracy = 0.70
Reduced Error Pruning Example
Outlook
  Sunny -> Don’t play
  Overcast -> Play
  Rain -> Wind
    Weak -> Play
    Strong -> Don’t play
Use this as the final tree
Early Stopping
[Figure: accuracy (0.6-0.9) vs. number of nodes on training, test, and validation data; remember the tree at the peak of the validation curve and use it as the final classifier]
Post Rule Pruning
• Split data into train and validation set
• Prune each rule independently
– Remove each pre-condition and evaluate accuracy
– Pick pre-condition that leads to largest
improvement in accuracy
• Note: there are also ways to do this using the training data and
statistical tests
Conversion to Rule
Outlook
  Sunny -> Humidity
    High -> Don’t play
    Normal -> Play
  Overcast -> Play
  Rain -> Wind
    Weak -> Play
    Strong -> Don’t play
Outlook = Sunny ∧ Humidity = High → Don’t play
Outlook = Sunny ∧ Humidity = Normal → Play
Outlook = Overcast → Play
…
Example
Outlook = Sunny ∧ Humidity = High → Don’t play (validation set accuracy = 0.68)
Drop the Humidity precondition: Outlook = Sunny → Don’t play (validation set accuracy = 0.65)
Drop the Outlook precondition: Humidity = High → Don’t play (validation set accuracy = 0.75)
Keep this last rule
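A sketch of the greedy precondition-dropping loop (my own code; here a rule’s accuracy is measured on the validation examples it covers, one common choice):

def rule_accuracy(pre, concl, examples, target):
    # Accuracy of the rule "pre -> concl" on the examples it covers
    covered = [ex for ex in examples if all(ex[a] == v for a, v in pre)]
    if not covered:
        return 0.0
    return sum(ex[target] == concl for ex in covered) / len(covered)

def prune_rule(pre, concl, examples, target):
    # Repeatedly drop the precondition whose removal most improves accuracy
    while pre:
        best_acc, drop = max(
            (rule_accuracy([p for p in pre if p != q], concl, examples, target), q)
            for q in pre)
        if best_acc <= rule_accuracy(pre, concl, examples, target):
            break
        pre = [p for p in pre if p != drop]
    return pre

# e.g. prune_rule([('outlook', 's'), ('humid', 'h')], 'n', validation_set, 'play')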
Summary
• Overview of inductive learning
– Hypothesis spaces
– Inductive bias
– Components of a learning algorithm
• Decision trees
– Algorithm for constructing trees
– Issues (e.g., real-valued data, overfitting)
end
Gain of Split on Humidity
Day Outlook Temp Humid Wind Play?
d1 s h h w n
d2 s h h s n
d3 o h h w y
d4 r m h w y
d5 r c n w y
d6 r c n s n
d7 o c n s y
d8 s m h w n
d9 s c n w y
d10 r m n w y
d11 s m n s y
d12 o m h s y
d13 o h n w y
d14 r m h s n
Entropy([9+,5-]) = 0.940
Entropy([3+,4-]) = 0.985 (Humid = h branch)
Entropy([6+,1-]) = 0.592 (Humid = n branch)
Gain = 0.940 - (7/14)(0.985) - (7/14)(0.592) = 0.151
Overfitting 2
[Figure from W. W. Cohen]
Choosing the Training Experience
• Credit assignment problem:
– Direct training examples:
• E.g. individual checker boards + correct move for each
• Supervised learning
– Indirect training examples:
• E.g. complete sequence of moves and final result
• Reinforcement learning
• Which examples:
– Random, teacher chooses, learner chooses
Example: Checkers
• Task T:
– Playing checkers
• Performance Measure P:
– Percent of games won against opponents
• Experience E:
– Playing practice games against itself
• Target Function
– V: board -> R
• Representation of approx. of target function
V(b) = a + bx1 + cx2 + dx3 + ex4 + fx5 + gx6
Choosing the Target Function
• What type of knowledge will be learned?
• How will the knowledge be used by the
performance program?
• E.g. checkers program
– Assume it knows legal moves
– Needs to choose best move
– So learn function: F: Boards -> Moves
• hard to learn
– Alternative: F: Boards -> R
Note similarity to choice of problem space
The Ideal Evaluation Function
• V(b) = 100 if b is a final, won board
• V(b) = -100 if b is a final, lost board
• V(b) = 0 if b is a final, drawn board
• Otherwise, if b is not final
V(b) = V(s), where s is the best final board reachable from b
Nonoperational…
Want an operational approximation V̂ of V
How to Represent the Target Function
• x1 = number of black pieces on the board
• x2 = number of red pieces on the board
• x3 = number of black kings on the board
• x4 = number of red kings on the board
• x5 = num of black pieces threatened by red
• x6 = num of red pieces threatened by black
V(b) = a + bx1 + cx2 + dx3 + ex4 + fx5 + gx6
Now just need to learn 7 numbers!
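For concreteness, a tiny sketch (my own) of this linear evaluation function; the weights a through g are the seven numbers to be learned, and the values below are hypothetical.

def V(features, weights):
    # Linear board evaluation: V(b) = a + b*x1 + c*x2 + ... + g*x6
    a, *coeffs = weights
    return a + sum(w * x for w, x in zip(coeffs, features))

x = (12, 12, 0, 0, 1, 2)                    # x1..x6 for some board b
w = (0.0, 1.0, -1.0, 1.5, -1.5, -0.5, 0.5)  # hypothetical learned weights
print(V(x, w))                              # 0.5 -> slightly favorable for black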
Target Function
• Profound Formulation:
Can express any type of inductive learning
as approximating a function
• E.g., Checkers
– V: boards -> evaluation
• E.g., Handwriting recognition
– V: image -> word
• E.g., Mushrooms
– V: mushroom-attributes -> {E, P}
A Framework for Learning Algorithms
• Search procedure
– Direct computation: solve for the hypothesis directly
– Local search: start with an initial hypothesis and make local refinements
– Constructive search: start with an empty hypothesis and add constraints
• Timing
– Eager: analyze the data and construct an explicit hypothesis
– Lazy: store the data and construct an ad-hoc hypothesis to classify each new instance
• Online vs. batch
– Online: process examples one at a time as they arrive
– Batch: process the entire training set at once