Class 7
Binary Classification & Decision Tree Learning
Legal Analytics
Professor Daniel Martin Katz
Professor Michael J Bommarito II
legalanalyticscourse.com
< Binary Classification >
access more at legalanalyticscourse.com
http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
Predicting a Quantity: Regression Methods
Predicting a Category: Classification Methods (Trees, Forests, kNN, etc.)
Adapted from Slides By
Victor Lavrenko and Nigel Goddard
@ University of Edinburgh
Take a Look at These 12
Age  Sex     Species
72   Female  Human
3    Female  Horse
36   Male    Human
21   Male    Human
67   Male    Human
29   Female  Human
54   Male    Human
44   Male    Human
50   Male    Human
42   Female  Human
6    Male    Dog
7    Female  Human
Binary Classification (Supervised Learning)
Task = Determine Whether the Agents Will Obtain Employment
f(agent) -> Job? (Yes or No)
Classification (Supervised Learning)
f(agent) -> Job? (Yes or No)
[Figure: a decision boundary separates the "Yes" region from the "No" region]
Multi Class Classification
https://www.youtube.com/watch?v=p5rTio1G4ys
Multi Class Classification (Supervised Learning)
Task = Determine Whether the Agents Will Obtain a Loan
f(agent) -> Loan? (Yes, Perhaps, or No)
[Figure: the space is divided into three regions labeled Yes, Maybe (Perhaps), and No]
Multiclass = Hyperplane
Regression (Supervised Learning)
Task = Determine the Age of the Respective Agents
f(agent) -> Age? (#, a number)
Generative vs. Discriminative Models
Follow the video
and take your
own notes
Intro to Decision Tree Learning
Classification And Regression Tree (CART)
Decision Trees in Decision Theory ≠ Decision Trees in Machine Learning
Introduction to Decision Trees
A decision tree uses a set of binary rules to calculate a target value
It is used for classification (categorical variables) or regression (continuous variables)
Different algorithms are used to determine the “best” split at a node
“CART Approach” to Decision Trees
Classification And Regression Tree (CART)
https://www.youtube.com/watch?v=WOOTNBxbi8c
http://www.r-bloggers.com/a-brief-tour-of-the-trees-and-forests/
http://www.r-bloggers.com/classification-tree-models/
https://www.youtube.com/watch?v=_RxqyvRK0Rw&list=PLD0F06AA0D2E8FFBA
Given Some Data:
(X1, Y1), ... , (Xn, Yn)
Now We Have a New Set of X’s
We Want to Predict the Y
CART (Classification & Regression Trees)
Form a Binary Tree that Minimizes the Error in each leaf of the tree
Observe the Correspondence Between the Data and Trees
[Figure: scatter plot of training points labeled 0 or 1, with axes Xi1 and Xi2. Adapted from Example By Mathematical Monk]
We want to build an approach which can lead to the proper classification (labeling) of new data points ( ) that are dropped into this space
Let's Begin to Partition the Space
[Figure: the same scatter plot with axis ticks at 1 and 2; split 1 is drawn and zone (a) is marked]
This Split Will Be Memorialized in the Tree
We Ask the Question: is Xi1 > 1 ? It has a binary (yes or no) response
[Tree: split 1 asks Xi1 > 1 ? with branches No and Yes]
If No, then we are in zone (a), where we tally the number of zeros and ones
Using Majority Rule, we assign a classification to this leaf
[Tree: Xi1 > 1 ? No leads to zone (a) with counts (0,5): Classify as 1]
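The majority rule for a leaf can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides; the function name majority_label is hypothetical, and a tie is broken in favor of whichever label appears first in the list.

```python
from collections import Counter

def majority_label(labels):
    """Return the most common label among the training points in a leaf."""
    counts = Counter(labels)
    # most_common(1) returns a one-item list: [(label, count)]
    return counts.most_common(1)[0][0]

# Zone (a) from the slides holds 0 zeros and 5 ones
zone_a = [1, 1, 1, 1, 1]
print(majority_label(zone_a))
```

A leaf with counts (4,1), meaning 4 zeros and 1 one, would be classified as 0 by the same function.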
Here we Classify as a 1 because (0,5) means 0 zeros and 5 ones
Using a Similar Approach, Let's Begin to Fill in the Rest of the Tree
[Tree: Xi1 > 1 ? No leads to zone (a), (0,5), Classify as 1; Yes leads to split 2, Xi2 > 1.45 ?, with branches No and Yes]
[Figure: the scatter plot with splits 1 to 3 drawn and zones (a), (b), and (c) marked]
[Tree: Xi1 > 1 ? No leads to zone (a), (0,5), Classify as 1; Yes leads to Xi2 > 1.45 ?; below it, split 3 asks Xi1 > 2 ? and produces zones (b) and (c), with counts (4,1) and (2,3), one classified as 0 and the other as 1]
[Figure: the scatter plot with splits 1 to 4 drawn and zones (a) to (e) marked]
[Tree: Xi1 > 1 ? No leads to zone (a), (0,5), Classify as 1; Yes leads to Xi2 > 1.45 ?, whose two branches lead to Xi1 > 2 ? and Xi1 > 2.2 ?. The four resulting leaves, zones (b) to (e), have counts (4,1), (2,3), (1,4), and (5,0); by majority rule two are classified as 0 and two as 1]
Okay, Let's Add Back the ( ) which are new items to be classified
For simplicity's sake there is one in each zone
We Will Use the Tree Because the Tree Is Our Prediction Machine
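The leaf counts on the slide also tell us how well the finished tree fits the training points. As a worked example, the sketch below tallies the points that each leaf misclassifies under majority rule. The counts are the (zeros, ones) pairs from the tree; the variable and function names are illustrative only.

```python
# (zeros, ones) in each of the five leaves of the tree
leaf_counts = [(0, 5), (4, 1), (2, 3), (1, 4), (5, 0)]

def leaf_errors(zeros, ones):
    """Under majority rule, the minority points in a leaf are misclassified."""
    return min(zeros, ones)

total_points = sum(z + o for z, o in leaf_counts)
total_errors = sum(leaf_errors(z, o) for z, o in leaf_counts)
error_rate = total_errors / total_points

print(total_points, total_errors, round(error_rate, 2))
```

With these counts, 4 of the 25 training points end up in a leaf whose majority label differs from their own.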
[Figure: each new item is dropped through the tree and receives the classification of the zone it lands in]
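Using the tree as a prediction machine amounts to answering one yes or no question per level until a leaf is reached. The sketch below encodes the four splits from the slides as nested tuples. Two things are assumptions rather than facts from the slides: which branch of Xi2 > 1.45 ? leads to which later split, and which side of each later split gets label 0 or 1. The names TREE and predict are hypothetical.

```python
# Internal node: (feature_index, threshold, no_branch, yes_branch)
# Leaf: an int label (0 or 1). Feature 0 is Xi1, feature 1 is Xi2.
TREE = (0, 1.0,
        1,                      # Xi1 > 1 ? No: zone (a), classify as 1
        (1, 1.45,               # Xi1 > 1 ? Yes: ask Xi2 > 1.45 ?
         (0, 2.0, 1, 0),        # ASSUMED placement and leaf labels
         (0, 2.2, 0, 1)))       # ASSUMED placement and leaf labels

def predict(node, x):
    """Walk from the root to a leaf, following the answer to each question."""
    while not isinstance(node, int):
        feature, threshold, no_branch, yes_branch = node
        node = yes_branch if x[feature] > threshold else no_branch
    return node

# One hypothetical new item per zone
for item in [(0.5, 1.0), (1.5, 1.0), (2.5, 1.0), (1.5, 2.0), (2.5, 2.0)]:
    print(item, "->", predict(TREE, item))
```

Only the root split and zone (a) are taken directly from the slides; the rest shows the mechanics of traversal under the stated assumptions.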
[Figure: the original scatter plot next to a second scatter plot of points labeled 0 or 1, with regions labeled A through G]
How about this one?
In this simple example, we eyeballed the 2D space, partitioned it, and stopped after 4 splits
Most Real Problems
are Not So Simple ...
(1) Real problems are n-dimensional (not 2D)
(2) For real problems, you need to select criteria (or a criterion) for deciding where to partition (split) the data
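One commonly used criterion in CART for classification is Gini impurity, which is 0 for a pure node and largest when the two classes are evenly mixed. The slides do not name a specific criterion, so treat this as one possible choice. The sketch scores split 1 from the example: zone (a) holds (0,5), and the remaining points add up to (12,8) across the other four leaves. Function names are illustrative.

```python
def gini(zeros, ones):
    """Gini impurity of a node with the given class counts (binary case)."""
    n = zeros + ones
    if n == 0:
        return 0.0
    p = ones / n                      # fraction of ones
    return 2 * p * (1 - p)            # equals 1 - p**2 - (1 - p)**2

def split_impurity(left, right):
    """Size-weighted average impurity of the two children of a split."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    return (n_left / n) * gini(*left) + (n_right / n) * gini(*right)

before = gini(12, 13)                          # all 25 points, before any split
after = split_impurity((0, 5), (12, 8))        # after split 1 (Xi1 > 1 ?)
print(round(before, 4), round(after, 4))
```

A split is attractive when the weighted impurity after it is lower than the impurity before it, as it is here.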
(3) For real problems, you must develop a stopping condition or pursue recursive partitioning of the space
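A stopping condition is usually a small test that is run before each further split. The sketch below shows three common checks. The parameter names and default values (max_depth, min_samples) are hypothetical choices for illustration, not values from the slides.

```python
def should_stop(labels, depth, max_depth=3, min_samples=5):
    """Decide whether a node should become a leaf instead of being split again."""
    if len(set(labels)) <= 1:         # node is pure: nothing left to separate
        return True
    if depth >= max_depth:            # tree is already as deep as allowed
        return True
    if len(labels) < min_samples:     # too few points to justify a split
        return True
    return False

print(should_stop([1, 1, 1, 1, 1], depth=1))        # pure, like zone (a)
print(should_stop([0, 1, 0, 1, 0, 1], depth=1))     # mixed, shallow, enough points
```

Looser limits grow a larger tree that follows the training data more closely; tighter limits stop earlier.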
Solutions to these 3 Problems
are among the core questions in
algorithm selection / development
From an Algorithmic Perspective, the Task is to Develop a Method to Partition the Trees
Must Do So Without Knowing
the Specific Contours of the
Data / Problem in Question
So How Do We Traverse Through the Data?
Optimal Partitioning of Trees is
NP-Complete
“Although any given solution to an NP-complete problem can
be verified quickly (in polynomial time), there is no known
efficient way to locate a solution in the first place; indeed, the
most notable characteristic of NP-complete problems is that no
fast solution to them is known. That is, the time required to
solve the problem using any currently known algorithm
increases very quickly as the size of the problem grows”
The key implication is that one cannot determine the “optimal tree” in advance
Breiman et al. (1984) use a Greedy Optimization Method
Greedy Optimization Method
is used to calculate the MLE
(maximum-likelihood estimation)
Greedy is a Heuristic
“makes the locally optimal choice at each stage
with the hope of finding a global optimum. In
many problems, a greedy strategy does not in
general produce an optimal solution, but
nonetheless a greedy heuristic may yield locally
optimal solutions that approximate a global optimal
solution in a reasonable time.”
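The greedy idea can be sketched as follows: at each node, try every feature and every midpoint between neighboring values, keep the split with the lowest weighted Gini impurity, and then repeat on each side without ever revisiting an earlier choice. This is a simplified illustration in the spirit of CART, not the algorithm as published. The toy data, the function names, and the max_depth limit are all made up for the example.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(points, labels):
    """Greedy step: return (impurity, feature, threshold) for the best single split."""
    best = None
    n = len(points)
    for feature in range(len(points[0])):
        values = sorted(set(p[feature] for p in points))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for p, y in zip(points, labels) if p[feature] <= threshold]
            right = [y for p, y in zip(points, labels) if p[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best  # None if every point has identical coordinates

def build(points, labels, depth=0, max_depth=3):
    """Recursively partition; a leaf is an int label, a node is a 4-tuple."""
    majority = int(2 * sum(labels) >= len(labels))   # ties go to 1
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(points, labels)
    if split is None:
        return majority
    _, feature, threshold = split
    no_idx = [i for i, p in enumerate(points) if p[feature] <= threshold]
    yes_idx = [i for i, p in enumerate(points) if p[feature] > threshold]
    return (feature, threshold,
            build([points[i] for i in no_idx], [labels[i] for i in no_idx],
                  depth + 1, max_depth),
            build([points[i] for i in yes_idx], [labels[i] for i in yes_idx],
                  depth + 1, max_depth))

def predict(node, x):
    """Follow the answers down the tree until a leaf label is reached."""
    while isinstance(node, tuple):
        feature, threshold, no_branch, yes_branch = node
        node = yes_branch if x[feature] > threshold else no_branch
    return node

# Made-up toy data: (Xi1, Xi2) pairs with 0/1 labels
points = [(0.5, 0.5), (1.0, 2.0), (2.0, 1.0), (2.5, 2.5), (3.0, 0.5), (3.5, 2.0)]
labels = [1, 1, 0, 0, 0, 0]
tree = build(points, labels)
print(tree)
print(predict(tree, (0.7, 1.0)), predict(tree, (2.2, 1.0)))
```

Because each split is chosen only for its immediate payoff, the resulting tree is not guaranteed to be the best tree overall, which is exactly the trade-off the quote describes.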
More on Trees (and Forests) Next Time ...
Legal Analytics
Class 7 - Binary Classification with Decision Tree Learning
daniel martin katz
blog | ComputationalLegalStudies
corp | LexPredict
twitter | @computational
site | danielmartinkatz.com
michael j bommarito
blog | ComputationalLegalStudies
corp | LexPredict
twitter | @mjbommar
site | bommaritollc.com
more content available at legalanalyticscourse.com