2nd edition
#MLSEV 2
Supervised vs. Unsupervised
The Lay of the Land
Charles Parker
VP Algorithms, BigML, Inc
#MLSEV 3
Machine Learning Landscape
It’s not just you - this stuff can be hard
• Do I have data on which I can do
machine learning?
• Do I have a problem to which I can apply
machine learning?
• Should I apply machine learning to that
problem?
• What if my problem doesn’t match a
traditional machine learning problem?
#MLSEV 4
ML-Ready Data
#MLSEV 5
Getting Your Data In Order
• Data takes many shapes and sizes
• Databases
• Collections of multimedia files
• Log files
• The largest class of ML algorithms
generally expects your data in
tabular form
• If it isn’t in that form, you’ve got to
get it there
#MLSEV 6
Rows: What Do You Want to Know About?
• Each row is a thing that you want to have
more information about
• Churn prediction: Each row is a customer
• Medical diagnosis: Each row is a patient
• Credit card fraud: Each row is a
transaction
• Market closing price prediction: Each row
is a day
#MLSEV 7
Columns: What Information Do You Have?
• Each column is a piece of information you can get about
the thing represented by the row
• Churn prediction: Last month’s bill, number of times
support was called, whether or not the customer churned
• Medical diagnosis: Body temperature, BMI, whether or not
the patient has a disease
• Credit card fraud: Transaction geolocation, the address of
the card holder, the amount
• Market closing price prediction: Opening price, volume
that day, day of week
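To make the row/column picture concrete, here is a tiny invented churn table built with pandas; the column names and values are purely illustrative, not from any real dataset:

```python
import pandas as pd

# One row per customer (the thing we want to know about),
# one column per piece of information we have about them.
churn = pd.DataFrame({
    "last_month_bill": [79.0, 120.5, 35.0],
    "support_calls":   [0, 4, 1],
    "churned":         ["no", "yes", "no"],
})
print(churn)
```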
#MLSEV 8
Feature Engineering
• You should in general try to reduce complex data to features that are either
numeric or categorical (i.e., a variable with a finite set of possible values)
• Aside: A good categorical feature should have no values that occur only once, and the set
of possible values should be small (fewer than 10 is good, fewer than 100 is maybe okay)
• Text data can be reduced to counts of informative words
• Strings representing a date can be reduced to the parts of the date (month, year,
day of month, day of week)
• Sometimes, you must do this yourself, but in some common cases it can be
automated (BigML does this for text and date-time data); a sketch of the idea follows
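A minimal sketch of the two reductions above, using pandas and scikit-learn (the sample strings are invented; BigML performs equivalent transformations automatically):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Date strings -> date parts (year, month, day of month, day of week).
dates = pd.to_datetime(pd.Series(["2020-03-26", "2019-11-02"]))
date_parts = pd.DataFrame({
    "year":         dates.dt.year,
    "month":        dates.dt.month,
    "day_of_month": dates.dt.day,
    "day_of_week":  dates.dt.dayofweek,
})

# Free text -> counts of (hopefully informative) words.
docs = ["the bill was too high", "great support, great product"]
word_counts = CountVectorizer(stop_words="english").fit_transform(docs)
```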
#MLSEV 9
Special Case: Aggregation
• If the rows in your data do not match the
thing you want to know about, you need
to do some sort of data transformation
• Problem: You have a table of
transactions, but you want to do
customer segmentation
• Solution: Create features that are per-
customer aggregations (e.g., total
number of transactions, average
purchase size, etc.)
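A minimal sketch of the per-customer aggregation with pandas, assuming a hypothetical transaction table with `customer_id` and `amount` columns:

```python
import pandas as pd

# Invented transaction log: one row per transaction, not per customer.
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount":      [10.0, 25.0, 5.0, 7.5, 12.0],
})

# Roll transactions up to one row per customer before segmenting.
per_customer = tx.groupby("customer_id")["amount"].agg(
    n_transactions="count",
    avg_purchase="mean",
    total_spent="sum",
).reset_index()
print(per_customer)
```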
#MLSEV 10
Supervised Learning
#MLSEV 11
Learning From Data
• In supervised learning, one of those columns is special and
is variously called the “objective”, “target variable” or
“label”.
• This is something we know when we’re training, but don’t
know at prediction time (but wish we did)
• Supervised machine learning creates a program to
predict that value from the other values in the training
data
• Said another way, it creates a program that transforms
the things you know into the things you want to know
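In code, the learned “program” is just a fitted model. Here is a minimal scikit-learn sketch on the invented churn columns from earlier (a decision tree stands in for whatever algorithm you would actually choose):

```python
from sklearn.tree import DecisionTreeClassifier

# Columns we know (last month's bill, support calls) and the one we
# only know at training time (churned?).
X = [[79.0, 0], [120.5, 4], [35.0, 1]]
y = ["no", "yes", "no"]

model = DecisionTreeClassifier().fit(X, y)   # create the "program"
print(model.predict([[150.0, 3]]))           # run it where the answer is unknown
```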
#MLSEV 12
The Objective Field
• The column in your data that you don’t
know in advance, but wish you did
• Churn prediction: Whether or not the
customer churned
• Medical diagnosis: Whether or not the
patient has a disease
• Credit card fraud: Whether or not the
transaction was fraudulent
• Market closing price prediction: The
closing price of the market that day
#MLSEV 13
So Many Algorithms!
• All supervised learning algorithms are
doing the same thing
• So why are there so many of them?
• Different algorithms make different assumptions
about the function they’re trying to fit
• Different algorithms have very different
performance characteristics
• The “right” algorithm depends on the
problem you’re trying to solve and the
data that you’re using to solve it
#MLSEV 14
A Simple Algorithmic Ontology
• Amount of data required: linear models < trees, ensembles < deep learning
• Potential to overfit: linear models < ensembles < trees, deep learning
• Speed: linear models, trees < ensembles < deep learning
• Representational power: linear models < trees < ensembles < deep learning
• How much data do you have?
• How fast do you need things to go?
• How much performance do you really need?
#MLSEV 15
The Triple Tradeoff
[Diagram: the triple tradeoff, linking prediction error, training data size, and algorithmic power]
#MLSEV 16
Unsupervised Learning
#MLSEV 17
I Have Nothing To Predict!
• What if there is no objective column? Is all
lost?
• Which segment does this customer fit into?
• What is this collection of documents about?
• What are some strong correlations in this dataset?
• Find me some points that are odd in this data
• This is unsupervised learning
• Unsupervised learning creates a structure
that explains all or part of the data
#MLSEV 18
Supervised Learning
Predict customer churn from the rest of the
features, like calls to support and last
month’s bill
• We have a bunch of columns we
know, both now and when we
make a prediction
• We have one column that we know
now, but would like to know
without having to acquire the
answer again
• Use the former to predict the latter
#MLSEV 19
Clustering
The best way to break these customers up
into three groups is group 1, with one
customer; group 2, with three customers;
and group 3, with two customers
• We have a bunch of columns we
know, but nothing to predict
• We'd like to see the groups this
data “naturally” falls into
• Applications: Customer
Segmentation, Recommendation
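A minimal clustering sketch with scikit-learn’s k-means, on invented per-customer features; note that there is no objective column anywhere:

```python
from sklearn.cluster import KMeans

# Invented per-customer features, e.g. (transactions, average purchase).
X = [[2, 17.5], [1, 300.0], [3, 8.2], [2, 15.0], [1, 280.0], [4, 9.9]]

# Ask for three "natural" groups; nothing is being predicted.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)  # group assignment for each customer
```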
#MLSEV 20
Association Rules
If last month’s bill was greater than $200
and the user called support more than twice,
then the customer usually churns
• We have a bunch of columns
• We don't have a specific prediction
in mind, but we’d like to see simple
rules where one thing predicts
another
• Applications: Market basket
analysis, data exploration, simple
modeling
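A minimal sketch using the mlxtend library (one common open-source implementation of association rule mining); the one-hot table of conditions below is invented:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Rows are customers, columns are true/false conditions.
data = pd.DataFrame({
    "high_bill":  [True, True, False, True],
    "many_calls": [True, True, False, False],
    "churned":    [True, True, False, False],
})

frequent = apriori(data, min_support=0.3, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "confidence"]])
```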
#MLSEV 21
Topic Modeling
Create a model of the topics that best
explain these text fields
• We have text data
• We’d like to know what this text
data is “about”, in terms of groups
of words that tend to occur
together
• Applications: Document discovery,
preprocessing for classification
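A minimal topic-modeling sketch with scikit-learn’s latent Dirichlet allocation, one standard algorithm for this task (the documents are invented):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the bill arrived late", "support fixed my bill",
        "great product quality", "the product shipped fast"]

counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each row of components_ is a topic: a weighting over words that
# tend to occur together.
print(lda.components_.shape)  # (2 topics, vocabulary size)
```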
#MLSEV 22
Anomaly Detection
This combination of feature values is unusual
amongst all combinations of values in the
dataset
• We have a bunch of rows
• We know most of them are the
same in some way (they are the
“usual case”)
• But a very few are not normal
• We'd like to find these very few
• Applications: Fraud detection, data
cleaning
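A minimal sketch with scikit-learn’s isolation forest, one common anomaly detector; the rows are invented, with one obvious outlier:

```python
from sklearn.ensemble import IsolationForest

# Mostly "usual" rows, plus one that is not like the others.
X = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [1.0, 0.95], [8.0, -3.0]]

detector = IsolationForest(random_state=0).fit(X)
print(detector.predict(X))  # -1 flags the unusual row, 1 the usual ones
```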
#MLSEV 23
Hold On A Second!
#MLSEV 24
I’m Sure You’re All Very Excited
• You’ve got data and a problem that ML
can solve. Great!
• Now, should you use ML to solve that
problem?
• What are some useful ways to think about
that question?
#MLSEV 25
Expert System: Expert And Programmer
• Historical computerized expert systems
are based on the knowledge of two
people
• The expert is the person with the
domain knowledge and experience
• The programmer interacts with the
expert and creates a computer program
based on their knowledge
• They may be the same person, but you
need both
#MLSEV 26
Machine Learning: Data and Algorithm
• In machine learning, data replaces the expert
and algorithms replace programmers
• Data is often more reliable and sometimes
easier to get than human expertise
• Algorithms work faster, generate more
complex programs, and are more modular
than human programmers
• Machine learning is a good idea when
you can leverage these advantages
#MLSEV 27
#1: People Can’t Tell You How They Do It
• Cases where everyone can do this thing, but
it’s hard for them to explain how they do it
• Many computer vision tasks
• Speech recognition
• Lots of NLP problems (e.g., document
classification)
• Many spatial navigation problems
• Bonus if many people have to do this thing
#MLSEV 28
#2: Human Experts are Expensive
• Cases where it’s tough to get your
hands on an expert, or their
knowledge is too deep to be readily
programmed
• Medical diagnosis
• Game-playing at high levels
• Autonomous helicopter piloting
#MLSEV 29
#3: Everyone Gets Their Own Algorithm
• Cases where a specific model in thousands
of locations would be better than one big
system, and each location is generating the
data necessary to create one
• Location prediction (via mobile)
• Spam detection (from content)
• Demand prediction
#MLSEV 30
#4: Every Little Bit Counts
• Cases where performance is the
overarching concern, even at very small
increments
• Market trading, financial modeling
• High volume vision tasks where mistakes
are costly
• Some product recommendation problems
#MLSEV 31
Some Negative Examples
• Human experts are cheap and easy to come
by (lots of examples in NLP and vision)
• Performance of humans is better (though it
may be slower and more expensive)
• A competent program can easily be written
by hand
• The data is difficult to acquire and/or to label
#MLSEV 32
The Great Beyond
#MLSEV 33
Am I In The Right Room?
• There are lots of problems solvable via ML
that don’t fall exactly into any of these
buckets
• Machine translation
• Image Segmentation
• Game-playing
• “Matching” problems
• Let’s talk about a few of these other
buckets now
#MLSEV 34
Label Sequence Learning
Part-of-speech tagging, OCR, multimedia annotation
We (pronoun) played (verb) outside (adverb) yesterday (adverb)
• Predictions come in a sequence
• Correct value may be “context dependent”
#MLSEV 35
Metric Learning
Document matching, query processing, recommendation
• Learn an embedding for the data
• In the new space, things that are related
should be close together, and unrelated
things should be far apart
• The objective isn’t usually a column, but a
list of pairs of things that are “related” and
“unrelated”
#MLSEV 36
Reinforcement Learning
Game playing, planning, control systems
• Predictions are sequential and when you make one
(take an action), it influences the next prediction you
have to make (next state)
• You may or may not get a reward when you take an
action in a certain state
• Taking a certain action in a certain state might not
always result in the same next state or reward
• The action space is often infinite, structured, and/or
conditional on the state
• You learn from a simulator instead of a dataset
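To make the state/action/reward loop concrete, here is a minimal tabular Q-learning sketch on an invented two-state, two-action toy problem (real applications learn against a simulator):

```python
import random

# Invented toy problem: 2 states, 2 actions, deterministic dynamics.
reward     = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 2.0}
next_state = {(0, 0): 0,   (0, 1): 1,   (1, 0): 0,   (1, 1): 1}

Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
alpha, gamma, state = 0.1, 0.9, 0

for _ in range(1000):
    action = random.choice((0, 1))  # pure exploration, for simplicity
    r, s2 = reward[(state, action)], next_state[(state, action)]
    best_next = max(Q[(s2, a)] for a in (0, 1))
    # Move Q(state, action) toward reward plus discounted future value.
    Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])
    state = s2

print(Q)  # action 1 should look best in both states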
#MLSEV 37
A Solution Might Be At Hand
• All of these problems have their own
algorithms, even their own niches in
the academic literature
• Sometimes, solutions can be
assembled by using standard
algorithms as parts of a solution
• See, for example, sliding window
classifiers
• . . . and WhizzML!
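As one illustration of assembling a solution from standard parts, a sliding-window tagger turns sequence labeling into ordinary classification by using each word’s neighbors as features; a minimal sketch on the invented tagged sentence from the earlier slide:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def windows(words):
    """One feature dict per word: the word plus its left/right neighbors."""
    padded = ["<s>"] + words + ["</s>"]
    return [{"prev": padded[i - 1], "word": padded[i], "next": padded[i + 1]}
            for i in range(1, len(padded) - 1)]

sentence = ["We", "played", "outside", "yesterday"]
tags = ["pronoun", "verb", "adverb", "adverb"]

# An ordinary classifier, applied per-position over window features.
tagger = make_pipeline(DictVectorizer(), LogisticRegression())
tagger.fit(windows(sentence), tags)  # toy-sized; a real tagger needs a corpus
print(tagger.predict(windows(sentence)))
```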
#MLSEV 38
Summary
• The quickest way to machine learning
is to get tabular data
• If you know what you want to predict,
it’s supervised learning; if you don’t,
it’s unsupervised learning
• Machine learning will only work if you
can leverage the advantages of the
data + algorithm paradigm