Machine Learning:
The Bare Math Behind Libraries
Piotr Czajka
Łukasz Gebel
Agenda
● What is Machine Learning?
● Supervised learning
● Unsupervised learning
● Q&A
Machine Learning
„Field of study that gives computers the ability to
learn without being explicitly programmed.”
Arthur Samuel
Machine Learning
„I consider every method that needs training
as an intelligent or machine learning method.”
Our Lecturer
Supervised learning
● It is similar to being taught by a teacher.
Supervised learning
● Build a model that performs a particular task:
– Prepare a data set consisting of examples & expected outputs
– Present examples to your model
– Check how it responds (model’s output values)
– Adjust the model’s parameters by comparing its output values with the expected output values
Neural Networks
● Inspired by biological brain mechanisms
● Many applications:
– Computer vision
– Speech recognition
– Compression
Biological Neuron
http://www.marekrei.com/blog/wp-content/uploads/2014/01/neuron.png
Artificial Neuron
● Inputs (x1, …, xn) are the features of a single example
● Multiply each input by its weight, sum the products, and pass the sum as the argument of the activation function
[Diagram: inputs x1 … xn weighted by w1 … wn, plus bias weight w0, feed a summation Σ producing s, which the activation function maps to the output y]
Activation function
● Sigmoid
– Maps the sum of the neuron’s input signals to a value from 0 to 1
– Continuous, nonlinear
– If the input is positive, it gives values > 0.5
f(x) = 1 / (1 + e^(−βx))
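A minimal sketch of such a neuron in Python (β defaults to 1 here; the function and variable names are ours):

```python
import math

def sigmoid(x, beta=1.0):
    """Sigmoid activation: maps any real input to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-beta * x))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus bias w0, passed through the sigmoid."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(s)
```

Note that sigmoid(0) is exactly 0.5, and any positive sum gives a value above 0.5, as the slide states.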
Linear Regression
● A method for modelling the relationship between variables
● Simplest form: how x relates to y
● Examples:
– House size vs house price
– Voltage vs electric current
Real life problem
What defines a superhero?
Costume
Costume price vs number of issues
● For a given amount of money, predict in how many comic book issues you’ll appear.

Costume price (x) | Number of issues (y)
240 | 6370
480 | 8697
... | ...
26 | 2200
Interesting fact
● The Invisible Woman’s costume costs $120
Invisible Woman
Linear regression
● Let's have a function:
f(x, Θ) = Θ1·x + Θ0
f(x, Θ) − number of comic book issues
x − costume price
Θ − parameters
Let's plot
Warning!
Objective function
Q(Θ) = 1/(2N) · Σ_{j=0..N} (f(x_j, Θ) − y_j)²
Q(Θ) − objective function
N − number of data samples
j − index of a particular data sample
Objective function - intuition
f(x_j, Θ) = 4, y = 2
(4 − 2)² = 2² = 4
sum += 4
Objective function - intuition
f(x_j, Θ) = 1, y = 1
(1 − 1)² = 0² = 0
sum += 0
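The objective function above can be sketched in a few lines (function names are ours):

```python
def predict(x, theta0, theta1):
    """Linear model f(x, Θ) = Θ1·x + Θ0."""
    return theta1 * x + theta0

def objective(xs, ys, theta0, theta1):
    """Q(Θ) = 1/(2N) · Σ (f(x_j, Θ) − y_j)²: half the mean squared error."""
    n = len(xs)
    return sum((predict(x, theta0, theta1) - y) ** 2
               for x, y in zip(xs, ys)) / (2 * n)
```

A perfect fit gives Q(Θ) = 0; the worked example on this slide (prediction 4, expected 2) contributes (4 − 2)² = 4 to the sum.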
Gradient descent
● Find the minimum of the objective function
● Iteratively update the function parameters:
Θ0(t+1) = Θ0(t) − α · 1/N · Σ_{j=0..N} (f(x_j, Θ) − y_j)
Θ1(t+1) = Θ1(t) − α · 1/N · Σ_{j=0..N} (f(x_j, Θ) − y_j) · x_j
t − iteration number
α − learning step
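These two update rules translate directly into a training loop (function names and the default α are ours):

```python
def gradient_step(xs, ys, theta0, theta1, alpha):
    """One simultaneous update of Θ0 and Θ1 along the negative gradient."""
    n = len(xs)
    errors = [(theta1 * x + theta0) - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / n
    grad1 = sum(e * x for e, x in zip(errors, xs)) / n
    return theta0 - alpha * grad0, theta1 - alpha * grad1

def fit(xs, ys, alpha=0.1, iterations=2000):
    """Repeat the update until the parameters settle near the minimum."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        theta0, theta1 = gradient_step(xs, ys, theta0, theta1, alpha)
    return theta0, theta1
```

On data generated by y = 2x + 1, this converges to Θ1 ≈ 2 and Θ0 ≈ 1.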
Gradient descent
● Where the derivative is positive: −α · (+derivative) < 0, so the parameter decreases
● Where the derivative is negative: −α · (−derivative) > 0, so the parameter increases
Demo
Linearly separable problem
Not linearly separable problem
NN – random weights init
[Diagram: a network with inputs x1, x2, bias units +1, one hidden layer, and output y; all weights start at random values]
NN – feed forward
[Diagram: the signals from x1, x2 and the bias units +1 propagate layer by layer until the output y is produced]
NN – compute error
[Diagram: the network’s output y is compared with the expected output]
(y − expected output)²
NN – backpropagation step
● Use gradient descent and the computed error
● Update every weight of every neuron in the hidden and output layers
NN – backpropagation step
[Diagram: the error (y − expected output)² is propagated backwards through the network, adjusting the weights layer by layer]
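A minimal sketch of the whole cycle for a 2-2-1 network (random init, feed forward, squared error, backpropagation). The class and variable names are ours, and sigmoid neurons with bias inputs +1 are assumed throughout, matching the diagrams:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyNet:
    """2 inputs -> 2 hidden sigmoid neurons -> 1 sigmoid output, each with a bias."""

    def __init__(self, seed=42):
        rnd = random.Random(seed)
        # Each weight list ends with the bias weight (for the constant +1 input).
        self.w_hidden = [[rnd.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
        self.w_out = [rnd.uniform(-1, 1) for _ in range(3)]

    def forward(self, x1, x2):
        """Feed forward: propagate the signals layer by layer."""
        inputs = [x1, x2, 1.0]
        self.h = [sigmoid(sum(w * v for w, v in zip(ws, inputs)))
                  for ws in self.w_hidden]
        self.y = sigmoid(sum(w * v for w, v in zip(self.w_out, self.h + [1.0])))
        return self.y

    def train_step(self, x1, x2, expected, alpha=0.5):
        """One backpropagation step on the squared error (y − expected)²."""
        y = self.forward(x1, x2)
        delta_out = (y - expected) * y * (1 - y)            # output-layer error signal
        deltas_h = [delta_out * self.w_out[i] * self.h[i] * (1 - self.h[i])
                    for i in range(2)]                      # hidden-layer error signals
        for i, v in enumerate(self.h + [1.0]):              # update output weights
            self.w_out[i] -= alpha * delta_out * v
        inputs = [x1, x2, 1.0]
        for i in range(2):                                  # update hidden weights
            for j, v in enumerate(inputs):
                self.w_hidden[i][j] -= alpha * deltas_h[i] * v
```

Repeating `train_step` over a data set drives the output toward the expected values; this is exactly gradient descent applied through the chain rule.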
Neural Network
„Neural networks are nothing more than
randomized optimization.”
Another Lecturer
Real life problem
● You said it can solve non-linear problems, so let's generate a superhero logo with it.
Unsupervised learning
What? They can learn by themselves?
Why would we let them?
● Less complex mathematical apparatus than in supervised learning.
● It is similar to discovering the world on your own.
Idea behind – groups
Why would we let them?
Used mostly for sorting and grouping when:
● The sorting key can’t be easily figured out.
● The data is very complex, so finding the key is not trivial.
Hebbian learning
● Works similarly to nature
● Great for beginners and biological simulations :)
● Simple Hebbian learning algorithm:
Δw_ij = η · x_ij · y_i
Δw_ij − change of the j-th weight of the i-th neuron
η − learning coefficient
x_ij − j-th input of the i-th neuron
y_i − output of the i-th neuron
Hebbian learning
● Generalised Hebbian learning algorithm:
Δw_ij = F(x_ij, y_i)
Δw_ij − change of the j-th weight of the i-th neuron
x_ij − j-th input of the i-th neuron
y_i − output of the i-th neuron
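A sketch of one simple Hebbian update for a single sigmoid neuron. The sigmoid is our assumption, but it is consistent with the worked example on the following slides, where an input sum of 0.249 yields an output of 0.562:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hebbian_step(weights, inputs, eta=0.1):
    """Simple Hebbian rule Δw_ij = η · x_ij · y_i for one sigmoid neuron.

    `inputs` includes the constant bias input 1 as its last element,
    matching the bias weight w0 at the end of `weights`.
    """
    y = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    new_weights = [w + eta * x * y for w, x in zip(weights, inputs)]
    return new_weights, y

# The worked example from the slides: weights (incl. bias) and inputs.
weights = [0.230, 0.010, 0.900, 0.110]
inputs = [0.200, 0.300, 0.100, 1.0]
new_weights, y = hebbian_step(weights, inputs)
```

Note that every weight only ever grows when inputs and output co-fire; this is the instability discussed under "Hebbian learning weaknesses" below.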
Hebb’s neuron model
[Diagram: inputs x1 … xn plus the constant input 1 are weighted by w1 … wn and w0, summed by Σ into s, and mapped to the output y; each weight changes by Δw_ij = F(x_ij, y_i)]
Hebb’s neuron model – worked example
Weights: 0.230, 0.010, 0.900; bias weight: 0.110
Inputs: 0.200, 0.300, 0.100; bias input: 1
Weighted inputs: 0.046, 0.003, 0.090, 0.110 → sum s = 0.249
Output: y = 0.562
Weight changes Δw_ij (η = 0.1): +0.011, +0.016, +0.005, +0.056
Updated weights: 0.241, 0.026, 0.905; bias weight: 0.166
Demo
Imagine superhero teams
Marvel database to the rescue
Hero | Intelligence | Strength | Speed | Durability | Energy projection | Fighting skills
Iron Man | 6 | 6 | 5 | 6 | 6 | 4
Spiderman | 4 | 4 | 3 | 3 | 1 | 4
Black Panther | 5 | 3 | 2 | 3 | 3 | 5
Wolverine | 2 | 4 | 2 | 4 | 1 | 7
Thor | 2 | 7 | 7 | 6 | 6 | 4
Dr Strange | 4 | 2 | 7 | 2 | 6 | 6
Hulk | 2 | 7 | 3 | 7 | 5 | 4
Cpt. America | 3 | 3 | 2 | 3 | 1 | 6
Mr Fantastic | 6 | 2 | 2 | 5 | 1 | 3
Human Torch | 2 | 2 | 5 | 2 | 5 | 3
Invisible Woman | 3 | 2 | 3 | 6 | 5 | 3
The Thing | 3 | 6 | 2 | 6 | 1 | 5
Luke Cage | 3 | 4 | 2 | 5 | 1 | 4
She Hulk | 3 | 7 | 3 | 6 | 1 | 4
Ms Marvel | 2 | 6 | 2 | 6 | 1 | 4
Daredevil | 3 | 3 | 2 | 2 | 4 | 5
Hebbian learning weaknesses
● Unstable.
● Prone to growing the weights ad infinitum.
● Some groups can trigger no response.
● Some groups may trigger a response from too many neurons.
And now – self organising networks
Self organising?
Learning with competition
● You try to generalize the input vector in the weights vector.
● Instead of checking the reaction to the input, you check the distance between the two vectors.
● Ideally, each neuron specializes in generalizing one class.
● Two main strategies:
– Winner Takes All (WTA)
– Winner Takes Most (WTM)
Idea behind – worked example
Neuron weights: 3.0, 2.0, 2.0; example: 1.0, 2.0, 3.0
Distance: d_i = w_i − x_i → 2.0, 0.0, −1.0
Learning step (η = 0.1): Δw_i = η · d_i → 0.2, 0.0, −0.1
Updated weights: w'_i = w_i − Δw_i → 2.8, 2.0, 2.1
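The update above can be sketched as a Winner Takes All step (function names are ours; with several neurons, only the one closest to the example moves):

```python
def wta_step(neurons, example, eta=0.1):
    """Winner Takes All: find the neuron whose weight vector is closest to the
    example and move it toward the example by w'_i = w_i − η·(w_i − x_i)."""
    def sq_dist(weights):
        return sum((w - x) ** 2 for w, x in zip(weights, example))
    winner = min(range(len(neurons)), key=lambda k: sq_dist(neurons[k]))
    neurons[winner] = [w - eta * (w - x)
                       for w, x in zip(neurons[winner], example)]
    return winner

# The worked example from the slides: a single neuron with weights 3.0, 2.0, 2.0.
neurons = [[3.0, 2.0, 2.0]]
wta_step(neurons, [1.0, 2.0, 3.0])
```

After the step the neuron's weights are 2.8, 2.0, 2.1, matching the slide; Winner Takes Most would additionally move the winner's neighbours by a smaller amount.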
Idea behind – initial setup, mid learning, homeostasis
[Diagrams: three neurons (1, 2, 3) move from their random initial positions toward the centres of the three groups as learning proceeds]
Demo time
Learning with competition
● Gives more diverse groups.
● Less prone to clustering (than Hebb’s).
● Searches a wider spectrum of answers.
● First step toward more complex networks.
Learning with competition – weaknesses
● WTA works best if the teaching examples are evenly distributed in the solution space.
● WTM works best if the weight vectors are evenly distributed in the solution space.
● Both can still get stuck in a local optimum.
Teuvo Kohonen’s SOM
Created by this nice guy here
Kohonen’s self-organizing map
● The most popular self-organizing competitive network.
● It teaches groups of neurons with the WTM algorithm.
● Special features:
– Neurons are organised in a grid
– Nevertheless, they are treated as a single layer
Kohonen’s self-organizing map
w_ij(s+1) = w_ij(s) + Θ(k_best, i, s) · η(s) · (I_j(s) − w_ij(s))
s − epoch number
k_best − best matching neuron
w_ij(s) − j-th weight of the i-th neuron
Θ(k_best, i, s) − neighbourhood function
η(s) − learning coefficient for epoch s
I_j(s) − j-th chunk of the example for epoch s
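A sketch of one epoch of this update. The Gaussian neighbourhood function and the 1/(1+s) decay schedules are our assumptions; the formula only requires some Θ and a decaying η(s):

```python
import math

def som_step(grid, example, best, s, eta0=0.5, sigma0=1.0):
    """Kohonen update: every neuron i moves toward the example, scaled by the
    neighbourhood function Θ(k_best, i, s) and the learning coefficient η(s)."""
    eta = eta0 / (1.0 + s)        # η(s): learning coefficient decaying per epoch
    sigma = sigma0 / (1.0 + s)    # neighbourhood radius shrinking per epoch
    for pos, weights in grid.items():
        d2 = sum((p - b) ** 2 for p, b in zip(pos, best))   # grid distance to k_best
        theta = math.exp(-d2 / (2 * sigma ** 2))            # Θ(k_best, i, s)
        grid[pos] = [w + theta * eta * (x - w)
                     for w, x in zip(weights, example)]

# A 2x2 grid of neurons, each holding a 1-dimensional weight vector.
grid = {(r, c): [0.0] for r in range(2) for c in range(2)}
som_step(grid, [1.0], best=(0, 0), s=0)
```

The winning neuron at (0, 0) moves furthest toward the example, while neurons farther away in the grid move progressively less.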
SOM model
By Mcld - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=10373592
Common weaknesses of artificial
neuron systems
● We still depend on randomly initialized weights.
● All algorithms can get stuck in local optima.
Thank you
Bibliography
● https://www.coursera.org/learn/machine-learning
● https://www.coursera.org/specializations/deep-learning
● Math for Machine Learning - Amazon Training and Certification
● Linear and Logistic Regression - Amazon Training and Certification
● Grus J., Data Science from Scratch: First Principles with Python
● Patterson J., Gibson A., Deep Learning: A Practitioner's Approach
● https://github.com/massie/octave-nn - neural network Octave implementation
● https://www.desmos.com/calculator/dnzfajfpym - Nanananana … Batman equation ;)
● https://xkcd.com/605/ - extrapolating ;)
● http://dilbert.com/strip/2013-02-02 - Dilbert & Machine Learning
● Presentation + code: https://bitbucket.org/medwith/public/downloads/ml-math-jotb.zip
