Demystifying Machine Learning

Uncloaking the Math in the Black Box

Hey Y'all
Ayodele Odubela
I'm a Data Scientist at MINDBODY.
I have a Master's Degree in Data Science from Regis
University.
3 years of experience working with machine learning.

Today's Workshop
What's in an Algorithm?
ML vs AI
Machine Learning Applications
Math for Machine Learning
Types of ML
What We Can Predict
Decision Trees
Neural Networks
Natural Language Processing
Code Along
Wrap Up

WHAT DOES AN ALGORITHM DO?
Input:
• An image
• A vector of numbers
"Complex" Math
Output:
• Cat or dog
• 0 or 1
• Chance of rain

ML takes cues
from
neuroscience
TEACHING MACHINES
• Show a program lots of data
• Teach it to recognize patterns
• Check if it learned well

ARTIFICIAL INTELLIGENCE
“Artificial intelligence is the science
of making computers behave in
ways that we thought required
human intelligence.”
Andrew Moore, Carnegie Mellon University
MACHINE LEARNING
“Machine learning algorithms can
figure out how to perform
important tasks by generalizing
from examples.”
Pedro Domingos, University of Washington

MACHINE LEARNING IN THE WILD
VIDEO GAME
ENEMIES
AI is the foundation of
many video games, from
controlling NPCs to
playing against an AI
trained on thousands of
past games.
PERSONAL
ASSISTANTS
Services like Siri, Alexa, and
Google Assistant use
both audio processing
and natural language
processing to retrieve
results and help you
send texts with your
voice.
ROUTE
OPTIMIZATION
Google Maps and Waze
can run hundreds of
potential routes to get
you to your destination
quickest. Many search
algorithms used in route
optimization are
fundamental to artificial
intelligence.
MOVIE
RECOMMENDATIONS
Netflix's movie
recommendations
use collaborative
filtering which helps
suggest movies based
on our past views and
people similar to us.

WILD MACHINE LEARNING
POLICING
Police use systems to
predict where crimes will
happen and deploy more
officers to an at-risk area.
This leads to an increase
in arrests, and without
feedback these systems
succumb to confirmation bias.
CREDIT
SCORING
These algorithms assess
the risk a creditor takes
on by giving you a loan
or credit card. Inputs like
zip code can be used as
a proxy for race:
90210 vs. any South
Central zip code.
RECIDIVISM
These algorithms are
used in counties across
the nation to predict
which incarcerated
people will commit
another crime after
release.
HR
SCREENINGS
Companies looking to
harness machine
learning in HR should be
wary of perpetuating
the same workplace bias
and hiring practices.
Does a candidate have to
look like successful
employees?

Math Foundations
BIGGEST BARRIER TO ENTRY
• Linear Algebra
• Calculus
• Statistics
• Discrete Math
• Logical Operators
• Probability
• Statistics
WHAT'S USED FREQUENTLY
WHAT THEY SAY YOU NEED

SUPERVISED
The machine is shown data along
with labeled answers that tell it
whether each prediction is right or wrong.
This learning is supervised because it
requires input data to be properly
labeled; often a binary
classification column is added as the
response variable.
LABELED OR NAH?
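
To make "labeled" concrete, here is a minimal sketch of supervised prediction using a 1-nearest-neighbor rule. The feature values and the choice of nearest neighbor are assumptions for illustration, not something the slides prescribe.

```python
# Toy labeled data: (resting heart rate, age) -> 1 = heart disease, 0 = none.
# All numbers are invented for illustration only.
training_data = [
    ((62, 35), 0),
    ((70, 41), 0),
    ((88, 63), 1),
    ((95, 70), 1),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(features):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    _, label = min(
        training_data, key=lambda pair: squared_distance(pair[0], features)
    )
    return label


print(predict((90, 66)))  # nearest labeled example has label 1
print(predict((65, 38)))  # nearest labeled example has label 0
```

The labels in the second position of each pair are what make this supervised: without them there would be nothing to return.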

UNSUPERVISED
CREATES SIMILAR GROUPINGS
Unsupervised models don't have
ground-truth labels and tend to have
a different goal.
Unsupervised learning methods
usually serve one of two purposes:
to cluster groups or to reduce
dimensionality.
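
As a rough sketch of the clustering purpose, the snippet below runs a tiny one-dimensional k-means. The spend values, the starting centers, and the choice of k-means itself are assumptions made for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then re-average."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical customer spend values with no labels attached.
spend = [5, 7, 6, 48, 52, 50]
centers, clusters = kmeans_1d(spend, centers=[0.0, 100.0])
print(centers)   # one center per discovered group
print(clusters)  # the values grouped around each center
```

No label is ever supplied; the grouping comes only from how close the values are to each other.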

REINFORCEMENT
A way of letting a system learn by
navigating its surroundings without
guidance, improving its
performance the more it
understands its current state.
An RL model will calculate the value
of being in a given state. There are
rewards for good actions (e.g., a Roomba
on a tile whose state went from dirty
to clean) and penalties for bad ones.
WHAT MANY CONSIDER AI
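
One common way to turn rewards and penalties into state values is tabular Q-learning. The sketch below applies it to the Roomba example; the state names, reward sizes, learning rate, and discount factor are all assumed for illustration.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed)
GAMMA = 0.9  # discount factor (assumed)
ACTIONS = ("clean", "move")

# (state, action) -> estimated value; unseen pairs start at 0.
q_values = defaultdict(float)


def update(state, action, reward, next_state):
    """One Q-learning update: nudge the estimate toward reward + discounted future value."""
    best_next = max(q_values[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    q_values[(state, action)] += ALPHA * (target - q_values[(state, action)])


# Cleaning a dirty tile is rewarded; leaving it dirty is penalized.
for _ in range(20):
    update("dirty", "clean", reward=1.0, next_state="clean")
    update("dirty", "move", reward=-1.0, next_state="dirty")

print(q_values[("dirty", "clean")])
print(q_values[("dirty", "move")])
```

After a few updates the value of cleaning a dirty tile ends up higher than the value of moving away from it, which is how the reward signal shapes behavior.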

CLASSIFICATION
In this case a machine learning
model will predict the class of the
inputs.
Your model will output whether it
thinks someone has heart disease or
not (0 or 1) or what segment a
customer is in (multi-class).
DISCRETE/CATEGORICAL VARIABLE

REGRESSION
REAL/CONTINUOUS NUMBERS
The model predicts a value based on
past data. In a linear dataset, the
regression line will likely pass
"through" the values it's trained on.
Your model will output what
temperature it will be tomorrow, the
price of Bitcoin, or the number of
people who will see the live action
Lion King movie.
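
A minimal sketch of regression is an ordinary least-squares line fit. The temperatures below are invented and perfectly linear, so the fitted line passes exactly through them.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


days = [1, 2, 3, 4, 5]
temps = [61.0, 63.0, 65.0, 67.0, 69.0]  # invented, perfectly linear data

slope, intercept = fit_line(days, temps)
tomorrow = slope * 6 + intercept  # a continuous prediction, not a class
print(tomorrow)  # 71.0
```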

DECISION TREES
• Breaks a dataset into small
subsets
• Tree structure includes
decision nodes and leaf nodes
• Root node is the best predictor
SUPERVISED

Entropy
• A measure of the degree of randomness in a variable
• "Good" Decision Trees have homogeneous leaf nodes
• The higher the entropy, the harder to draw conclusions
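
The bullets above can be sketched with Shannon entropy measured in bits. The "cat" and "dog" labels are placeholder data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits: 0 for a pure node, 1 for a 50/50 binary split."""
    total = len(labels)
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(labels).values()
    )


print(entropy(["cat", "cat", "cat", "cat"]))  # 0.0 (homogeneous leaf)
print(entropy(["cat", "cat", "dog", "dog"]))  # 1.0 (maximally mixed)
```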

Information Gain
• Used to decide which of the attributes are most relevant
• The purpose is to find the attribute that returns the most information gain
• Expected information gain = decrease in entropy
• The less random the variables, the more information is gained
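
As a sketch of "information gain = decrease in entropy", the snippet below compares a split that separates the classes perfectly with one that separates nothing. The yes/no labels and both splits are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits."""
    total = len(labels)
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(labels).values()
    )


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]
useless_split = [["yes", "no"], ["yes", "no"]]

print(information_gain(parent, perfect_split))  # 1.0
print(information_gain(parent, useless_split))  # 0.0
```

A tree builder would prefer the attribute that produces the first split, since it removes all of the uncertainty.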

Gini Index
• Measures how impure a node is
• Calculated per node
The Gini index is used in CART (Classification and Regression Trees).
The ID3 search algorithm uses entropy and information gain.
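
For comparison with entropy, here is a short sketch of the Gini impurity of a single node, using placeholder labels.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of one node: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


print(gini(["cat", "cat", "cat", "cat"]))  # 0.0 (pure node)
print(gini(["cat", "cat", "dog", "dog"]))  # 0.5 (most impure for two classes)
```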

Pre-pruning
Involves setting the tree parameters before building it so it stops early instead of
being fully built.
Variables to tune:
⚬ Set max tree depth
⚬ Set max terminal nodes
⚬ Set max number of features
⚬ Set min samples for a node split
￭ controls the size of terminal nodes

Post-pruning
• Validate the performance of the model on a test set
• Cut back splits that seem to overfit the noise in the training set
• Removes a branch from a fully grown tree
• Available in R; Python's scikit-learn package added cost-complexity pruning (the ccp_alpha parameter) in version 0.22

NEURAL
NETWORKS
• Neural nets are designed based on
the architecture of neurons in the
brain.
SUPERVISED

Forward Propagation
(aka making an
inference)
Calculate the weighted input to the
hidden layer.
Multiply each weight by its input and
pass the result to the next layer.
Apply an activation function.
Calculate this again to go from the
hidden layer to the output.
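
The steps above can be sketched as a forward pass through one hidden layer. The weights, biases, layer sizes, and the use of a sigmoid activation are arbitrary assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """Weighted sum plus bias for each neuron, followed by the activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


inputs = [1.0, 0.5]
hidden_weights = [[0.4, -0.2], [0.3, 0.8]]  # one row of weights per hidden neuron
hidden_biases = [0.0, 0.0]
output_weights = [[0.5, -0.5]]              # a single output neuron
output_biases = [0.0]

hidden = layer(inputs, hidden_weights, hidden_biases)   # input -> hidden
output = layer(hidden, output_weights, output_biases)   # hidden -> output
print(hidden)
print(output)
```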

Back Propagation
The output from forward
propagation is the predicted value.
We use a loss function to compare
the predicted value to the actual
value. The resulting error is passed
backward through the network to
adjust the weights.
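
One commonly used loss function is mean squared error. The sketch below assumes that choice; the slides do not name a specific loss, and the numbers are invented.

```python
def mean_squared_error(predicted, actual):
    """Average squared gap between predictions and true values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


close = mean_squared_error([0.9, 0.2], [1.0, 0.0])
far = mean_squared_error([0.1, 0.9], [1.0, 0.0])
print(close)
print(far)
```

Predictions that sit near the actual values give a small loss, and predictions that are far off give a large one.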

Learning Rate
• Gradient Descent is used to get
ideal weights for each neuron
• The learning rate is how fast or
slow you want the machine to
update weight values
WHAT YOU NEED TO KNOW:
• The learning rate should be high
enough to converge* in a
reasonable amount of time
• It should also be small enough to
find a local minimum
*Convergence is when the loss
settles close to a minimum
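
To see the trade-off, here is a small sketch that minimizes a made-up loss, (w - 3)**2, with three assumed learning rates: too small, reasonable, and too large.

```python
def descend(learning_rate, steps=50, start=0.0):
    """Minimize loss(w) = (w - 3)**2 with plain gradient descent."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)  # derivative of (w - 3)**2
        w -= learning_rate * gradient
    return w


for rate in (0.001, 0.1, 1.1):
    print(rate, descend(rate))
```

With the smallest rate the weight has barely moved toward 3 after 50 steps, with the middle rate it lands very close to 3, and with the largest rate each step overshoots and the weight moves further away.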

Activation Functions
The job of an activation function is to convert an input to an output signal.
This is based on a mathematical threshold. The activation function tells
the node to activate once a criterion has been met.
If a model thinks there's a 51% chance an image is a dog, the activation
function will output a prediction of dog (depending on your function).
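
Below is a sketch of three common activation functions. The 0.5 threshold mirrors the 51% example above; the function names are illustrative.

```python
import math


def step(z, threshold=0.0):
    """Fire (1) only once the input crosses the threshold."""
    return 1 if z > threshold else 0


def sigmoid(z):
    """Smooth version of the step: outputs a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Pass positive inputs through and zero out negative ones."""
    return max(0.0, z)


probability_dog = 0.51
prediction = "dog" if step(probability_dog, threshold=0.5) else "not dog"
print(prediction)  # dog
print(sigmoid(0.0), relu(-2.0), relu(3.0))
```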

Gradient Descent
• Gradient Descent is an iterative
machine learning optimization
algorithm that minimizes the loss
function.
• A low loss
means predicted values are
close to actual values
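
As a sketch of the iteration, the snippet below fits a single weight w in the model y = w * x by repeatedly stepping against the gradient of a mean-squared-error loss. The data, the starting weight, and the learning rate are assumptions.

```python
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # invented data that follows y = 2x


def loss(w):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w):
    """Derivative of the loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.0
learning_rate = 0.05
losses = []
for _ in range(100):
    losses.append(loss(w))
    w -= learning_rate * gradient(w)  # step downhill

print(round(w, 4))            # 2.0
print(losses[0], losses[-1])  # loss at the start vs. at the end
```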

NATURAL
LANGUAGE
PROCESSING
Based on the field of linguistics,
natural language processing is aided
by new text analytics packages and
the abundance of sample text online.
Challenges:
• Thousands of languages with
hundreds of thousands of words.
• Complex syntax (varying words
per sentence, relative clauses)
• Many ambiguities (special
names, sarcasm)

Word Embeddings
• Models that map a set
of words or phrases to vectors
of numbers.
Most popular are:
• Word2Vec (trained on Google
News)
• GloVe provided by Stanford
• FastText by Facebook
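
Once words are vectors, similarity becomes arithmetic. The sketch below uses cosine similarity on made-up three-dimensional vectors. Real embeddings from the models above have far more dimensions, and these numbers are not taken from any of them.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, invented for illustration only.
embeddings = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "banana": [0.1, 0.0, 0.9],
}

royal = cosine_similarity(embeddings["king"], embeddings["queen"])
fruit = cosine_similarity(embeddings["king"], embeddings["banana"])
print(royal, fruit)
```

In this toy setup the two related words score much higher than the unrelated pair.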

Sentiment Analysis
Measures the polarity of a word or phrase:
how positive or negative it skews.
Types:
• Subjectivity classification
• Polarity classification
• Intent classification
Challenges:
• Biased to the dominant culture
• Many sentiment packages are
based on linguistic work that is
not universal
• Variance in individual speech
is not taken into account

Lexical Density
• The number of meaningful words
• After removing stop words like "the", "and",
"I", etc., lexical density is the number of
words that add content divided by the total
number of words.
She told him that she loved him
2 lexical words out of 7 total words
28.57% lexical density
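
The calculation above can be reproduced in a few lines. The stop-word set here is a small hand-picked assumption, not a standard list.

```python
# A tiny, hand-picked stop-word list (assumed for this example only).
STOP_WORDS = {"she", "he", "him", "her", "that", "the", "and", "i", "a", "an"}


def lexical_density(sentence):
    """Share of words that carry content after dropping stop words."""
    words = sentence.lower().split()
    lexical_words = [w for w in words if w not in STOP_WORDS]
    return len(lexical_words) / len(words)


density = lexical_density("She told him that she loved him")
print(f"{density:.2%}")  # 28.57%
```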

Markov Chains for NLP
• First a dictionary is built based
on historical texts. The key is a
given word in a sentence and
the values are natural follow-up
words.
• Next calculate the word most
likely to follow a given word.
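
Here is a minimal sketch of those two steps, plus a simple generator. The one-line corpus is a placeholder, and the function names and the fixed random seed are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict


def build_chain(text):
    """Map each word to the list of words that followed it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def most_likely_next(chain, word):
    """The follow-up word seen most often after `word`."""
    return Counter(chain[word]).most_common(1)[0][0]


def generate(chain, start, length, seed=0):
    """Walk the chain, sampling each next word from the observed followers."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(length - 1):
        followers = chain.get(words[-1])
        if not followers:  # dead end: this word was never followed by anything
            break
        words.append(rng.choice(followers))
    return " ".join(words)


corpus = "the cat sat on the mat and the cat ran"  # placeholder text
chain = build_chain(corpus)
print(most_likely_next(chain, "the"))  # cat
print(generate(chain, "the", 6))
```

Swapping the placeholder corpus for a larger body of text is what gives the generated lines their style.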

Create-Your-Own Kanye Lyrics
GOAL
Generate rap verses that
almost sound like they
could belong to Kanye.
METHOD
Use a Markov Chain to
generate a new verse of
a Kany-AI song.
USE CASE
Perhaps you want to see
if your verses can fool
some fans.

Evaluating
Models
• Classification Accuracy
• Confusion matrix
• Logarithmic Loss
• Area under curve (AUC)
• F-Measure

Classification Accuracy
The most common metric you
might hear will be accuracy.
It's not always the best metric for
any given model.
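
A quick sketch of why accuracy can mislead: with imbalanced classes, a "model" that always predicts the majority class still scores well. The 95/5 class split is invented.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


actual = [0] * 95 + [1] * 5   # only 5% of cases are positive
always_negative = [0] * 100   # a "model" that never predicts the positive class

score = accuracy(always_negative, actual)
print(score)  # 0.95
```

The score looks strong even though every positive case was missed.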

Confusion Matrices
Table that visualizes the
performance of a classification
algorithm.
In this layout, rows represent the
predicted class and columns
represent the actual class. Some
libraries, such as scikit-learn, use
the opposite orientation.
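
Below is a sketch of a binary confusion matrix with rows as the predicted class and columns as the actual class, plus the precision, recall, and F-measure derived from it. The labels are invented.

```python
def confusion_matrix(predicted, actual, classes=(0, 1)):
    """matrix[p][a] counts cases predicted as class p whose actual class is a."""
    matrix = {p: {a: 0 for a in classes} for p in classes}
    for p, a in zip(predicted, actual):
        matrix[p][a] += 1
    return matrix


actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]

matrix = confusion_matrix(predicted, actual)
true_pos = matrix[1][1]
false_pos = matrix[1][0]   # predicted 1, actually 0
false_neg = matrix[0][1]   # predicted 0, actually 1

precision = true_pos / (true_pos + false_pos)
recall = true_pos / (true_pos + false_neg)
f_measure = 2 * precision * recall / (precision + recall)
print(matrix)
print(precision, recall, f_measure)
```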

Normalization
• The process of getting all data
for predictions on the same
scale
• Most algorithms have trouble
performing well with data on
multiple scales.
• Usually between 0 and 1
• A step in data pre-processing for
machine learning
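
One simple way to get values between 0 and 1 is min-max scaling, sketched below on invented ages and incomes. The sketch assumes each column contains at least two distinct values.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    low, high = min(values), max(values)
    # Assumes high != low; a constant column would need special handling.
    return [(v - low) / (high - low) for v in values]


ages = [20, 35, 50]
incomes = [30_000, 60_000, 90_000]

print(min_max_scale(ages))     # [0.0, 0.5, 1.0]
print(min_max_scale(incomes))  # [0.0, 0.5, 1.0]
```

After scaling, both features share the same range even though the raw numbers differ by orders of magnitude.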

Overfitting
Overfitting happens when a model describes
the training data too well. This means it has
essentially memorized the data without learning. We
say it hasn't learned because an overfit
model generalizes poorly to new data.
One of the biggest problems with
machine learning is that we taught our
machines to do what we told ourselves not to do.
Don't just memorize, learn.

Regularization
Adds a penalty to your model to keep the
model from becoming too complex and
overfitting the data.
There are multiple methods to do this (L1 and L2
regularization).
If there is noise in the training data,
regularization shrinks the learned "noise"
towards 0.
Regularization reduces the variance of the
model without a major increase in bias
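
As a sketch of the shrinking effect, the snippet below uses L2 (ridge) regularization on a one-weight model y = w * x, where the penalized least-squares solution has a simple closed form. The data and penalty strengths are assumptions.

```python
def ridge_weight(xs, ys, penalty):
    """Weight w minimizing sum((w*x - y)**2) + penalty * w**2 for the model y = w*x."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # invented data following y = 2x

unregularized = ridge_weight(xs, ys, penalty=0.0)
regularized = ridge_weight(xs, ys, penalty=14.0)
print(unregularized)  # 2.0
print(regularized)    # 1.0
```

A larger penalty pulls the learned weight closer to 0. L1 regularization penalizes the absolute value of the weight instead of its square.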

DATA IS VALUABLE
IF IT'S
PROTECTED
Does your favorite
website or app allow you
to use two-factor
authentication? How do
companies protect your
data?
WHO'S USING
IT
Our data has and will
continue to be used
against us. We question
the possibility of ever
having a free and fair
election.
WHO
COLLECTS IT
As we know from the
Cambridge Analytica
scandal, companies can
be dubious with our data
and with which developers
have access to it.
WHO OWNS IT
Recent outcry against
the privacy issues with
apps like FaceApp has
been at the forefront of
tech news.

USING ML IN YOUR PROJECTS
INVESTIGATE
DATA
TRAIN YOUR
MODEL
MODEL
EVALUATION

Will the
product be
better? If so,
how?

How will this
impact my
users?

How will I
protect their
data?

How often
will this
model get
feedback?

PICKING A MODEL
NO FREE LUNCH
• What type of data are my outputs?
• What am I trying to predict?
• What is wrong with the data? (small sample size, imbalanced classes)
• What data cleaning did I do?
• Will this be a problem when running the model in the real world?

The Future: XAI
eXplainable AI
TCAV (Testing with Concept Activation Vectors) is a new interpretability
method for understanding what signals your neural network models use for
prediction.

Thank you!
@data_bayes
@data_bayes
/ayodeleodubela
