Machine Learning in 5 Minutes
classification edition
Brian Lange
hi, i’m a data scientist
classification algorithms
popular examples
- spam filters
- the Sorting Hat
things to know
- you need data labeled with the correct answers to “train” these algorithms before they work
- feature = dimension = attribute of the data
- class = category = Harry Potter house
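(a tiny sketch, not from the original slides, of what that looks like in Python: each row of features is one data point, and each label is its class. The values and names here are made up for illustration.)

```python
# hypothetical toy dataset: each row is one data point,
# each column is a feature (dimension / attribute)
X = [
    [5.0, 120],   # feature 1, feature 2
    [6.1, 180],
    [4.8, 100],
]

# one class (category / Harry Potter house) per row --
# the "correct answers" used to train a classifier
y = ["Hufflepuff", "Slytherin", "Gryffindor"]

for features, label in zip(X, y):
    print(features, "->", label)
```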
linear discriminants
“draw a line through it”
define what “shitty” means
(e.g. count misclassified points: one candidate line gets 6 wrong, another gets 4 wrong)
a map of shittiness
to find the least shitty line
[3D surface plot: shittiness as a function of slope and intercept]
probably don’t use these
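(a minimal sketch of the “map of shittiness” idea, my own illustration rather than code from the talk: brute-force a grid of slopes and intercepts and score each line by how many points it gets wrong. The toy data is made up.)

```python
import numpy as np

# hypothetical 2-D points with binary labels (0 / 1)
X = np.array([[1, 1], [2, 1], [1, 3], [4, 5], [5, 4], [6, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

def shittiness(slope, intercept):
    """How many points land on the wrong side of the line y = slope*x + intercept."""
    above = X[:, 1] > slope * X[:, 0] + intercept   # points above the line get predicted class 1
    return int(np.sum(above.astype(int) != y))

# the "map of shittiness": evaluate a grid of (slope, intercept) pairs, keep the least shitty
best = min(
    ((shittiness(m, b), m, b)
     for m in np.linspace(-3, 3, 61)
     for b in np.linspace(-8, 8, 81)),
    key=lambda t: t[0],
)
print("least shitty line (num wrong, slope, intercept):", best)
```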
linear discriminants:
logistic regression
“divide it with a logistic function”
🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉
+ gives you probabilities
+ the model is a formula
+ can “threshold” to make the model more or less conservative
💩💩💩💩💩💩💩💩💩💩💩
- only works with linear decision boundaries
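(a minimal sketch of those points, assuming scikit-learn and the same made-up toy data as above; not code from the slides)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# hypothetical 2-D points with binary labels
X = np.array([[1, 1], [2, 1], [1, 3], [4, 5], [5, 4], [6, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# + gives you probabilities
probs = model.predict_proba(X)[:, 1]        # P(class 1) for each point

# + the model is a formula: sigmoid(coef . x + intercept)
print(model.coef_, model.intercept_)

# + "threshold" to make the model more or less conservative
conservative = (probs > 0.9).astype(int)    # only call class 1 when very sure
lenient = (probs > 0.2).astype(int)         # call class 1 on weaker evidence
print(probs, conservative, lenient)
```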
SVMs (support vector machines)
“*advanced* draw a line through it”
- better definition of “shitty”
- lines can turn into non-linear shapes if you transform your data
“the kernel trick”
🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉
+ works well on a lot of different shapes of data thanks to the kernel trick
💩💩💩💩💩💩💩💩💩💩💩
- not super easy to explain to people
- can only kinda do probabilities
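(a minimal sketch, assuming scikit-learn, of an SVM with an RBF kernel on made-up data that no straight line can split; the `probability=True` option is the “kinda” probabilities, via Platt scaling)

```python
import numpy as np
from sklearn.svm import SVC

# hypothetical data: one class is a ring, the other a blob inside it
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 50)
ring = np.c_[3 * np.cos(angles), 3 * np.sin(angles)] + rng.normal(0, 0.2, (50, 2))
blob = rng.normal(0, 0.5, (50, 2))
X = np.vstack([ring, blob])
y = np.array([0] * 50 + [1] * 50)

# the kernel trick: an RBF kernel lets the "line" become a non-linear boundary
clf = SVC(kernel="rbf", probability=True).fit(X, y)

print(clf.predict([[0, 0], [3, 0]]))    # point inside the ring vs. on the ring
print(clf.predict_proba([[0, 0]]))      # "kinda" probabilities (Platt scaling)
```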
KNN (k-nearest neighbors)
“what do similar cases look like?”
(classify each new point by the majority label of its k closest neighbors: k=1, k=2, k=3, …)
🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉
+ no training, adding new data is easy
+ you get to define “distance” 

💩💩💩💩💩💩💩💩💩💩💩
- can be outlier-sensitive
- you have to define “distance”
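(a minimal sketch, assuming scikit-learn and made-up toy data; the `metric` parameter is where you get to, and have to, define “distance”)

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# hypothetical points with two classes
X = np.array([[1, 1], [2, 1], [1, 3], [4, 5], [5, 4], [6, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# no real "training": the model just stores the data
# you define "distance" -- here Manhattan instead of the default Euclidean
knn = KNeighborsClassifier(n_neighbors=3, metric="manhattan").fit(X, y)

print(knn.predict([[2, 2], [5, 5]]))   # majority label of the 3 nearest neighbors

# adding new data is easy: just refit with the extra points appended
X_new = np.vstack([X, [[3, 3]]])
y_new = np.append(y, 0)
knn = KNeighborsClassifier(n_neighbors=3, metric="manhattan").fit(X_new, y_new)
```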
decision tree learners
“make a flow chart of it”
[flow chart built up one split at a time: x < 3? with yes/no branches, then y < 4?, then x < 5?]
🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉
+ fit all kinds of arbitrary shapes
+ output is a clear set of conditionals

💩💩💩💩💩💩💩💩💩💩💩
- extremely prone to overfitting
- have to rebuild when you get new data
- no probability estimates
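(a minimal sketch, assuming scikit-learn and made-up toy data; `export_text` prints the tree as exactly the kind of “x < 3?” conditionals the flow chart shows)

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# hypothetical points with two classes
X = np.array([[1, 1], [2, 1], [1, 3], [4, 5], [5, 4], [6, 6], [2, 6], [6, 1]])
y = np.array([0, 0, 0, 1, 1, 1, 0, 1])

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# + output is a clear set of conditionals
print(export_text(tree, feature_names=["x", "y"]))

# - prone to overfitting: with no max_depth it will happily memorize the data
# - have to rebuild (refit from scratch) when new data arrives
```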
ensemble models
make a bunch of models and combine them
🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉
+ don’t overfit as much as their component parts
+ generally don’t require much parameter tweaking
+ if data doesn’t change very often, you can make them semi-online by just adding new trees
+ can provide probabilities
💩💩💩💩💩💩💩💩💩💩💩
- slower than their component parts (though if those are fast, it doesn’t matter)
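(a minimal sketch, assuming scikit-learn and made-up toy data, using a random forest as the ensemble since the slides talk about “adding new trees”; `warm_start` is one way to approximate the semi-online trick)

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# hypothetical points with two classes (same toy data as above)
X = np.array([[1, 1], [2, 1], [1, 3], [4, 5], [5, 4], [6, 6], [2, 6], [6, 1]])
y = np.array([0, 0, 0, 1, 1, 1, 0, 1])

# a bunch of decision trees, each fit on a random slice of the data,
# voting together -- overfits less than any single tree
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(forest.predict([[2, 2], [5, 5]]))
print(forest.predict_proba([[2, 2]]))   # + can provide probabilities

# semi-online-ish: warm_start keeps existing trees and just grows more
forest.set_params(warm_start=True, n_estimators=150)
forest.fit(X, y)   # adds 50 new trees without rebuilding the old ones
```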

Machine Learning in 5 Minutes: Classification