Machine Learning 101

Edwin Jiménez
October 2017.

A.I.
Image from: https://i0.wp.com/dailypremiere.com/wp-content/uploads/2016/11/AI_Poster.jpg?resize=1024%2C641

What is:
Intelligence?
Learning?

“A very general mental capability that, [...],
involves the ability to reason, plan, solve
problems, think abstractly, comprehend
complex ideas, learn quickly and learn from
experience.”
Definition of intelligence, from "Mainstream Science on Intelligence" (1994).

“We define learning as the transformative
process of taking in information that—when
internalized and mixed with what we have
experienced—changes what we know and builds
on what we do. It’s based on input, process,
and reflection.”
Definition of learning, from The New Social Learning by Tony Bingham and
Marcia Conner.

A.I.
The term was coined by John
McCarthy, then at Dartmouth
College, for the 1956 Dartmouth
workshop.
“It is the science and
engineering of making
intelligent machines, especially
intelligent computer
programs.”

Image from: http://legalexecutiveinstitute.com/wp-content/uploads/2016/02/AI-Graphic-NEW.jpg

M.L.
Arthur Samuel (1959)
Field of study that gives
computers the ability to learn
without being explicitly
programmed.

‘Machine’ is a term
we use to denote a
mathematical model
which aims to optimize
a given function.

Data Input
Picture
Text (for example, a paragraph of free-form prose)
Vector of values:
ID  Name  Age  Sex     Student  Happy
01  Marc  25   Male    Yes      No
02  Ana   18   Female  Yes      Yes

Each value in a data record is called a feature.
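A minimal sketch of how the two rows from the table could be turned into numeric feature vectors. The 0/1 coding of the Sex and Student columns, and the choice of Happy as the label, are assumptions made only for illustration; the helper names are hypothetical.

```python
# Rows copied from the slide's table.
records = [
    {"ID": "01", "Name": "Marc", "Age": 25, "Sex": "Male", "Student": "Yes", "Happy": "No"},
    {"ID": "02", "Name": "Ana", "Age": 18, "Sex": "Female", "Student": "Yes", "Happy": "Yes"},
]


def to_feature_vector(record):
    """Encode one row as numbers (the 0/1 coding is an illustrative assumption)."""
    return [
        float(record["Age"]),
        1.0 if record["Sex"] == "Female" else 0.0,
        1.0 if record["Student"] == "Yes" else 0.0,
    ]


def to_label(record):
    """Treat the Happy column as a 0/1 label (an assumption)."""
    return 1 if record["Happy"] == "Yes" else 0


features = [to_feature_vector(r) for r in records]
labels = [to_label(r) for r in records]
print(features)  # [[25.0, 0.0, 1.0], [18.0, 1.0, 1.0]]
print(labels)    # [0, 1]
```

ID and Name are left out because they identify a row rather than describe it.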

Machine Learning
Unsupervised
We have only the data and its
features, with no labels.
Supervised
We have labeled data
and we learn from those labels.

How does it work?
Image from: https://cdn0.tnwcdn.com/wp-content/blogs.dir/1/files/2017/07/big-data-theree.png

Supervised
Input: data points (Feature 1, Feature 2)
-> Learn from data
-> Result (classification or regression)

Classification
Input -> Process -> Vector of probabilities
[0.01, 0.05, 0.07, 0.62, 0.12, 0.05, 0.02, 0.03, 0.01, 0.02]
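As a small illustration, the predicted class is usually read off the probability vector by taking the position of its largest entry. The sketch below uses the ten probabilities from the slide; the function name is hypothetical.

```python
probabilities = [0.01, 0.05, 0.07, 0.62, 0.12, 0.05, 0.02, 0.03, 0.01, 0.02]


def predict_class(probs):
    """Return the index of the largest probability."""
    return max(range(len(probs)), key=lambda i: probs[i])


predicted = predict_class(probabilities)
print(predicted)                     # 3
print(round(sum(probabilities), 2))  # 1.0
```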

Regression
Input: House Size: 289 m²
-> Process
-> Output: House Price: £4,000
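One possible way to build such a "Process" step is a straight-line fit by least squares. The sketch below is only an illustration: the training sizes and prices are made up and do NOT come from the slides, so the prediction it prints is not meant to match the figure above.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data (size in m2, price); not taken from the slides.
sizes = [50.0, 100.0, 150.0, 200.0]
prices = [1000.0, 2000.0, 3000.0, 4000.0]

slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 289.0 + intercept  # prediction for a 289 m2 house
print(slope, intercept, predicted_price)
```

The output of a regression model is a single continuous number, in contrast to the probability vector produced by a classifier.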

Popular Supervised Algorithms
● Nearest Neighbor
● Naive Bayes
● Decision Trees
● Linear Regression
● Support Vector Machines (SVM)
● Neural Networks

Unsupervised
Image from: https://www.packtpub.com/sites/default/files/Article-Images/B03905_01_01.png
Input: data points
-> Learn from data
-> Result (clustering)

Unsupervised
Image from: https://www.datascience.com/blog/k-means-clustering
Input: data points
-> Learn from data

Popular Unsupervised Algorithms
● k-means clustering
● Association Rules
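To make k-means concrete, here is a minimal sketch in plain Python. The points are made up, and seeding the centroids with the first k points is a simplification chosen to keep the example deterministic; real implementations usually pick the starting centroids at random.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for idx, (px, py) in enumerate(points):
            distances = [(px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centroids]
            assignments[idx] = distances.index(min(distances))
        # Update step: each centroid moves to the mean of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignments


# Made-up points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, k=2)
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

No labels are given to the algorithm; the two groups emerge from the distances alone.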

That’s nice
But how do I ‘learn’ from data?

● Dendrite - It receives signals from other neurons.
Image from: https://www.xenonstack.com/blog/overview-of-artificial-neural-networks-and-its-applications

● Soma (cell body) - It sums all the incoming signals to
generate the input.

● Axon - When the sum reaches a threshold value, the neuron
fires and the signal travels down the axon to the other
neurons.

● Synapses - The points of interconnection of one neuron
with other neurons. The amount of signal transmitted
depends upon the strength (synaptic weights) of the
connections.

Perceptron
Image from: https://d4datascience.files.wordpress.com/2016/09/600px-artificialneuronmodel_english.png

X1 … Xn are the features.
w1j … wnj are the weights; wij denotes the importance of Xi to the
result.
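A minimal sketch of a single perceptron: a weighted sum of the features followed by a step activation. The feature values, weights, and the threshold of 0 are hypothetical and chosen only for illustration.

```python
def perceptron_output(features, weights, threshold=0.0):
    """Weighted sum of the features followed by a step activation."""
    weighted_sum = sum(x * w for x, w in zip(features, weights))
    return 1 if weighted_sum >= threshold else 0


# Hypothetical values, chosen only for illustration.
features = [1.0, 0.5, -1.0]  # X1 ... Xn
weights = [0.4, 0.6, 0.2]    # w1j ... wnj
print(perceptron_output(features, weights))  # 1, since 0.4 + 0.3 - 0.2 = 0.5 >= 0
```

A larger weight lets its feature move the sum, and therefore the output, more strongly.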

Artificial Neural Network
Image from: http://cs231n.github.io/assets/nn1
● Supervised
● Classification

Perceptron
Vector of values: [1.245, 0.789]
How do we get probabilities? With softmax.
Vector of probabilities: softmax([1.245, 0.789]) ≈ [0.61, 0.39]
Predicted label: 0 (the more probable of the labels 0 and 1)
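A short sketch of softmax in plain Python. Evaluating it on 1.245 and 0.789 gives roughly 0.61 and 0.39, so the predicted label is 0. Subtracting the maximum before exponentiating is a common numerical-stability trick and is an addition not shown on the slides.

```python
import math


def softmax(values):
    """Exponentiate and normalize so the outputs sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([1.245, 0.789])
predicted_label = probabilities.index(max(probabilities))
print([round(p, 2) for p in probabilities])  # [0.61, 0.39]
print(predicted_label)                       # 0
```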

Training or learning from data
1. Give data to the model
2. Calculate probabilities
3. Evaluate how far the predicted label is from the
original label (error)
4. Update weights
5. Repeat 1 - 4 until the error is smaller than some
threshold value
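The five steps can be sketched as a loop. To keep it short, this example uses a single sigmoid unit instead of a full network, a made-up data set, and hypothetical values for the learning rate and the error threshold; none of these come from the slides.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(data, labels, learning_rate=0.5, max_error=0.05, max_epochs=10000):
    """Gradient descent on a single sigmoid unit (a stand-in for the model)."""
    weights = [0.0] * len(data[0])
    bias = 0.0
    error = float("inf")
    epochs = 0
    while error > max_error and epochs < max_epochs:  # 5. repeat until the error is small
        error = 0.0
        for x, t in zip(data, labels):                # 1. give data to the model
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = sigmoid(z)                            # 2. calculate the probability
            error += -(t * math.log(p) + (1 - t) * math.log(1 - p))  # 3. error
            for i, xi in enumerate(x):                # 4. update the weights
                weights[i] -= learning_rate * (p - t) * xi
            bias -= learning_rate * (p - t)
        error /= len(data)
        epochs += 1
    return weights, bias, error


# Made-up, linearly separable data: label 1 when the first feature is large.
data = [[0.0, 1.0], [1.0, 0.0], [3.0, 1.0], [4.0, 0.0]]
labels = [0, 0, 1, 1]
weights, bias, error = train(data, labels)
predictions = [
    1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) > 0.5 else 0
    for x in data
]
print(predictions, error)
```

The `max_epochs` cap is a safety net so the loop also ends if the error never drops below the threshold.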

Training
Input values: 12, 0.7, 0.4
Weights (slide notation): x1, x2, x3, x4 from the first input; x1, y1, z1 into the first unit
Value at the first unit: 12*x1 + 0.7*y1 + 0.4*z1
Computed values: 1.67, 1.27, 0.35
After softmax: 0.8, 0.2

Training (error propagation)
$L_i = -t_i \log(p_i)$, with base-10 logarithms in the numbers below
Expected label for classes X and Y: $t = (1, 0)$; predicted probabilities: $p = (0.8, 0.2)$
$\log(p) = (\log 0.8, \log 0.2) \approx (-0.096, -0.69)$
$L = -(1 \cdot (-0.096),\ 0 \cdot (-0.69)) = (0.096, 0)$
Why log? Because log(1) = 0: when the predicted probability of the
expected label is 1, the error is 0.
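The loss values can be checked with a few lines of Python. This sketch assumes base-10 logarithms, which is what reproduces the numbers on the slide (about 0.097 after rounding, shown truncated as 0.096); many libraries use the natural logarithm instead, which changes the scale but not the idea.

```python
import math


def cross_entropy_terms(targets, probabilities):
    """Per-class loss -t_i * log10(p_i); base 10 is assumed to match the slide."""
    return [t * -math.log10(p) for t, p in zip(targets, probabilities)]


targets = [1, 0]            # expected label, one-hot
probabilities = [0.8, 0.2]  # softmax output
losses = cross_entropy_terms(targets, probabilities)
print([round(loss, 3) for loss in losses])  # [0.097, 0.0]
```

Only the class with target 1 contributes, so the loss depends solely on the probability given to the expected label.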

Training
Input values: 12, 0.7, 0.4
Computed values: 1.67, 1.27, 0.35
Error: 0.096, 0

Training (back propagation)
initialize network weights (often small random values)
do
    forEach training example named ex
        prediction = neural-net-output(network, ex) // forward pass
        actual = teacher-output(ex)
        compute error at the output units
        compute Δwh for all weights from hidden layer to output layer // backward pass
        compute Δwi for all weights from input layer to hidden layer // backward pass
        update network weights // input layer not modified by error estimate
until all examples classified correctly or another stopping criterion satisfied
return the network
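The pseudocode above can be sketched in plain Python for a network with one hidden layer and one output unit. Everything specific here is an assumption made for illustration: sigmoid activations, squared error, the XOR truth table as training data, three hidden units, a learning rate of 0.5, and a fixed number of epochs as the stopping criterion.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in, n_hidden, seed=0):
    """Small random weights; the last entry of each row is a bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(network, x):
    """Forward pass: return the hidden activations and the output value."""
    hidden, output = network
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in hidden]
    y = sigmoid(sum(w * hi for w, hi in zip(output, h)) + output[-1])
    return h, y


def gradients(network, x, target):
    """Backward pass for the squared error 0.5 * (y - target) ** 2."""
    hidden, output = network
    h, y = forward(network, x)
    delta_out = (y - target) * y * (1.0 - y)  # error at the output unit
    grad_output = [delta_out * hi for hi in h] + [delta_out]
    grad_hidden = []
    for j, hj in enumerate(h):
        delta_h = delta_out * output[j] * hj * (1.0 - hj)  # error sent back to unit j
        grad_hidden.append([delta_h * xi for xi in x] + [delta_h])
    return grad_hidden, grad_output


def train(network, examples, learning_rate=0.5, epochs=5000):
    """Online gradient descent: update the weights after every example."""
    hidden, output = network
    for _ in range(epochs):
        for x, target in examples:
            grad_hidden, grad_output = gradients(network, x, target)
            for j, row in enumerate(hidden):
                for i in range(len(row)):
                    row[i] -= learning_rate * grad_hidden[j][i]
            for j in range(len(output)):
                output[j] -= learning_rate * grad_output[j]
    return network


def mean_error(network, examples):
    return sum(0.5 * (forward(network, x)[1] - t) ** 2 for x, t in examples) / len(examples)


# XOR truth table as a toy training set.
examples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
network = init_network(n_in=2, n_hidden=3)
error_before = mean_error(network, examples)
train(network, examples)
error_after = mean_error(network, examples)
print(error_before, error_after)
```

The gradients for both layers are computed before any weight changes, so the backward pass uses the same weights as the forward pass.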

Evaluation: Confusion Matrix
Image from: Wikipedia
                       Actual class
                       Cat                      Non-cat
Predicted   Cat        5 True Positives (TP)    2 False Positives (FP)
class       Non-cat    3 False Negatives (FN)   17 True Negatives (TN)
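From the four counts in the matrix, the usual summary metrics follow directly. A small sketch using the standard definitions of accuracy, precision, and recall:

```python
# Counts from the cat / non-cat confusion matrix.
tp, fp, fn, tn = 5, 2, 3, 17

accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that are right
precision = tp / (tp + fp)                  # share of predicted cats that are cats
recall = tp / (tp + fn)                     # share of actual cats that were found
print(round(accuracy, 3), round(precision, 3), round(recall, 3))  # 0.815 0.714 0.625
```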

Evaluation Confusion Matrix
T. Fawcett / Pattern Recognition Letters 27 (2006) 861–874

MNIST - Model definition
Input: 28x28 images (784 features); output: 10 categories
Model 1: layer of 10 units. Accuracy: 93.10%
Model 2: layers of 10 and 10 units. Accuracy: 93.51%
Model 3: hidden layers [10,10,10] (slide label: "hidden layer 5"). Accuracy: 93.75%

Why go deeper?
Image from: http://fortune.com/ai-artificial-intelligence-deep-machine-learning/

Image from: https://fortunedotcom.files.wordpress.com/2016/09/lrn-10-01-16-neural-networks-e1474990995824.png


Generate poetry from images
A man is taking a picture of
himself in the mirror of the
world.
So I said the word had come to
this story once a year ago,
and I was a fool for the sake of
the same word,
and the fact that I was a boy
who had been dead for a
moment.
He is a poor picture of the past.

Automatic Colorization of Black and White Images

Generate Christmas carols from images.

Automatically Adding Sounds To Silent Movies

Artistic reinterpretation of images (GoogLeNet, aka Inception)

Object Classification and Detection in Photographs

Automatic Handwriting Generation

Automatic Image Caption Generation

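Example with Theano
import numpy
import theano

# Symbolic inputs: a feature vector and the target value.
x = theano.tensor.fvector('x')
target = theano.tensor.fscalar('target')
# Shared weights, updated in place on every call to f.
W = theano.shared(numpy.asarray([0.2, 0.7]), 'W')
y = (x * W).sum()  # prediction: weighted sum of the inputs
cost = theano.tensor.sqr(target - y)  # squared error
gradients = theano.tensor.grad(cost, [W])  # list with one gradient per parameter
W_updated = W - (0.1 * gradients[0])  # gradient descent step, learning rate 0.1
updates = [(W, W_updated)]
f = theano.function([x, target], y, updates=updates)

# Train on a single example: inputs [1.0, 1.0], target 20.
for i in range(10):
    output = f([1.0, 1.0], 20.0)
    print(output)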
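For readers without Theano installed, the same update rule can be sketched in plain Python. This is an illustrative re-implementation, not Theano's own mechanism: the gradient of the squared error is written out by hand instead of being derived symbolically.

```python
weights = [0.2, 0.7]
x = [1.0, 1.0]
target = 20.0
learning_rate = 0.1

outputs = []
for _ in range(10):
    y = sum(xi * wi for xi, wi in zip(x, weights))
    outputs.append(y)  # record y before the weights are updated
    # Gradient of (target - y)^2 with respect to each weight: -2 * (target - y) * x_i
    gradient = [-2.0 * (target - y) * xi for xi in x]
    weights = [wi - learning_rate * gi for wi, gi in zip(weights, gradient)]

print([round(o, 3) for o in outputs[:3]])  # [0.9, 8.54, 13.124]
```

Each step shrinks the gap to the target by a factor of 0.6, so the output approaches 20 over the ten iterations.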

Example with Lasagne
import lasagne

# input_var, depth and width are defined elsewhere in the script.
# Input layer: batches of 1-channel 28x28 images.
network = lasagne.layers.InputLayer(shape=(None, 1, 28, 28),
                                    input_var=input_var)
# Hidden layers:
nonlin = lasagne.nonlinearities.rectify
for _ in range(depth):
    network = lasagne.layers.DenseLayer(network, width,
                                        nonlinearity=nonlin)
# Output layer:
softmax = lasagne.nonlinearities.softmax
network = lasagne.layers.DenseLayer(network, 10,
                                    nonlinearity=softmax)

Example with Lasagne
import lasagne

# input_var is defined elsewhere in the script.
network = lasagne.layers.InputLayer(shape=(None, 1, 28, 28),
                                    input_var=input_var)
# Convolution with 32 5x5 kernels:
network = lasagne.layers.Conv2DLayer(
    network, num_filters=32, filter_size=(5, 5),
    nonlinearity=lasagne.nonlinearities.rectify)
# Max-pooling layer of factor 2 in both dimensions:
network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
# Another convolution with 32 5x5 kernels, and another 2x2 pooling:
network = lasagne.layers.Conv2DLayer(
    network, num_filters=32, filter_size=(5, 5),
    nonlinearity=lasagne.nonlinearities.rectify)
network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
# A fully-connected layer of 256 units:
network = lasagne.layers.DenseLayer(
    network, num_units=256,
    nonlinearity=lasagne.nonlinearities.rectify)
# And, finally, the 10-unit output layer:
network = lasagne.layers.DenseLayer(
    network, num_units=10,
    nonlinearity=lasagne.nonlinearities.softmax)

https://es.coursera.org/learn/machine-learning
Andrew Ng
Stanford University

Is Machine Learning
the answer to everything?

“Turning over rocks and
finding nothing is progress”

Modern Data Scientist
http://www.marketingdistillery.com/wp-content/uploads/2014/08/mds.png
Math & Statistics
• Machine Learning
Supervised Learning
Unsupervised Learning
Optimization
• Statistical modeling
• Experiment design
• Bayesian Inference

Modern Data Scientist
http://www.marketingdistillery.com/wp-content/uploads/2014/08/mds.png
Programming & Database
• Computer science
fundamentals
• Scripting language
• Databases
• Relational Algebra
• MapReduce

Thanks!
Any questions?
You can find me at:
eejimenez@gdl.cinvestav.mx
edwinjimenezlepe
@Lepe_92
lepe92

“What is research,
but a blind date with knowledge.”
William Henry
