Presentation at the University of Lisbon on Machine Learning and Big Data.
Deep learning algorithms and applications to credit risk analysis, churn detection and recommendation algorithms.
3.
1. Machine Learning: finding features, patterns & representations
2. The connectionist approach: Neural Networks
3. Applications
4. The Deep Learning “revolution”: a step closer to the brain?
5. Applications
6. The Big Data deluge: better algorithms & more data
4. Was “Deep Blue” intelligent?
How about Watson?
Or Google?
Have machines reached the intelligence level of a rat?
….
Let’s be pragmatic: I’ll call “intelligent” any device capable of surprising me!
6.
1943 – McCulloch & Pitts + Hebb
1958 – Rosenblatt’s perceptron and the Minsky argument - or why a good theory may kill an even better idea
1985 – Rumelhart’s multilayer perceptron (backpropagation)
2006 – Hinton’s Deep Learning (Boltzmann networks)
All together now: Watson, Google et al.
9.
Input builds up on the receptors (dendrites).
The cell has an input threshold.
When the threshold is breached, an activation is fired down the axon (see the sketch below).
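This threshold behaviour is the McCulloch & Pitts model from the timeline above. A minimal Python sketch (my own illustration, not from the slides; the weights and threshold are arbitrary):

import numpy as np

def mp_neuron(inputs, weights, threshold):
    # Fire (1) only when the weighted input breaches the cell's threshold.
    return 1 if np.dot(inputs, weights) >= threshold else 0

# Example: a unit wired to compute logical AND of two binary inputs.
print(mp_neuron([1, 1], [1, 1], threshold=2))  # 1: threshold breached, fires
print(mp_neuron([1, 0], [1, 1], threshold=2))  # 0: stays silent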
12.
A step closer to success thanks to a training algorithm: backpropagation (sketched below).
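As an illustration only (not the presenter's code), here is backpropagation written out for a tiny two-layer network learning XOR; the architecture, seed and learning rate are arbitrary choices for the sketch:

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)  # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the error gradient layer by layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent update
    W2 -= h.T @ d_out; b2 -= d_out.sum(axis=0)
    W1 -= X.T @ d_h;   b1 -= d_h.sum(axis=0)

print(out.round(2).ravel())  # should approach [0, 1, 1, 0]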
15.
Training is nothing more than fitting: regression, classification, recommendations (see the sketch below).
The problem is that we have to find a way to represent the world (extract features).
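To make “training is fitting” concrete, a sketch assuming scikit-learn (the synthetic dataset already has its features extracted, which is exactly the hard part the slide warns about):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)  # training = fitting
print("test accuracy:", model.score(X_test, y_test))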
25.
ANNs are very hard to optimize:
Lots of local minima (traps for stochastic gradient descent)
Permutation-invariant (no unique solution)
When to stop training? (One answer, early stopping, is sketched below.)
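A sketch of early stopping with scikit-learn's MLPClassifier, where training halts once a held-out validation score stops improving (all parameter values are illustrative):

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Stop when the validation score no longer improves, rather than letting
# stochastic gradient descent settle ever deeper into some local minimum.
net = MLPClassifier(hidden_layer_sizes=(32,),
                    early_stopping=True,      # hold out part of the data
                    validation_fraction=0.2,  # size of that validation set
                    n_iter_no_change=10,      # patience before stopping
                    random_state=0)
net.fit(X, y)
print("stopped after", net.n_iter_, "iterations")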
Neural Networks are incredibly powerful algorithms.
But they are also wild beasts that should be treated with great care.
It’s very easy to fall into the GIGO trap.
Problems like overfitting, suboptimization, bad conditioning and wrong interpretation are common.
28. Interpretation of outputs
Loss function
Outputs ≠ probabilities
Where to draw the line?
Be VERY careful when interpreting the outputs of ML algorithms: you don’t always get what you see.
Input preparation
Clean & balance the data
Normalize it properly
Remove unneeded features, create new ones
Handle missing values
(Both points are sketched in the code below.)
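A sketch of both points, assuming scikit-learn (the model and the threshold are illustrative): normalize the inputs, and never read raw classifier scores as probabilities; calibrate them first, then draw the line deliberately.

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Imbalanced synthetic data: roughly 90% negatives, 10% positives.
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)

# Normalize inputs; a linear SVM outputs scores, not probabilities,
# so wrap it in a calibrator before interpreting its outputs.
svm = make_pipeline(StandardScaler(), LinearSVC())
model = CalibratedClassifierCV(svm, method="sigmoid", cv=5).fit(X, y)

proba = model.predict_proba(X)[:, 1]
threshold = 0.3               # "where to draw the line" is your decision,
flagged = proba > threshold   # not something the algorithm answers for you
print("flagged:", int(flagged.sum()), "of", len(y))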
32. [Figure: backscattering spectra of a 25 Å Ge layer under 400 nm Si; yield (arb. units) vs. channel for (a) beam energies of 1.2, 1.6 and 2 MeV, (b) scattering angles of 120° to 180°, and (c) angles of incidence of 0° to 50°.]
48.
Neural networks are good when: plenty of training data is available; the variables are continuous; the relevant features are known; the mapping is unique.
Neural networks are less useful when: the problem is linear; there are few data compared to the size of the search space; the data is high-dimensional; there are long-range correlations.
They are black boxes.
49.
Characteristic | Traditional methods (Von Neumann) | Artificial neural networks
Logic | Deductive | Inductive
Processing principle | Logical | Gestalt
Processing style | Sequential | Distributed (parallel)
Functions realised through | Concepts, rules, calculations | Concepts, images, categories, maps
Connections between concepts | Programmed a priori | Dynamic, evolving
Programming | Through a limited set of rigid rules | Self-programmable (given an appropriate architecture)
Learning | By rules | By examples (analogies)
Self-learning | Through internal algorithmic parameters | Continuously adaptable
Tolerance to errors | Mostly none | Inherent
50.
ANNs are massive correlation & feature-extraction machines. Isn’t that what intelligence is all about?
Knowledge is embedded in a messy network of weights.
Capable of modelling arbitrarily complex mappings.
51.
We need thousands of examples for training. Why? We lack the priors the brain has.
The algorithms are simple: the complexity lies in the data.
65.
“quasi” non-supervised machines
Extract and combine subtle features in the data
Build high-level representations (abstractions)
Capable of knowledge transfer
Can handle (very) high-dimensional data
Are deep and broad: millions of synapses
Work both ways: up and down
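A hedged sketch of the “quasi-unsupervised, works both ways” idea using a restricted Boltzmann machine, the building block of Hinton’s 2006 deep networks, via scikit-learn’s BernoulliRBM (hyperparameters are illustrative):

from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM

# Digit images scaled to [0, 1]; no labels are used to learn the features.
X = load_digits().data / 16.0

rbm = BernoulliRBM(n_components=64, learning_rate=0.05,
                   n_iter=20, random_state=0).fit(X)

hidden = rbm.transform(X)                     # "up": data -> learned features
visible = rbm.gibbs((X > 0.5).astype(float))  # "down": one Gibbs step back
print(hidden.shape, visible.shape)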
67.
Top on image identification (is some cases it
beat humans)
Top on video classification
Top on real-time translation
Top on Gene identification
Reverse engineering: can replicate complex
human behaviour, like walking.
Data visualization and of text disambiguation
(river-bank/bank-bailout)
Kaggle
73.
Most ML algorithms work better (sometimes much better) simply by throwing more data at them (see the learning-curve sketch below).
And now we have more data. Plenty of it!
Which is signal and which is noise? Let the machines decide (they are good at it).
Where do humans stand in this equation? We are feeding the machines!
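The “more data helps” claim can be checked directly with a learning curve; a sketch assuming scikit-learn (the dataset and model are just for illustration):

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)

# Cross-validated accuracy as the training set grows.
sizes, _, test_scores = learning_curve(
    LogisticRegression(max_iter=2000), X, y,
    train_sizes=[0.1, 0.3, 0.6, 1.0], cv=5)

for n, s in zip(sizes, test_scores.mean(axis=1)):
    print(f"{n:5d} examples -> accuracy {s:.3f}")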
74.
Don’t look for causation; welcome correlations
Messiness - prepare to get your hands dirty
Don’t expect definitive answers. Only
communists have them!
Stop searching God’s equation
Keep theories at bay and let the data speak
Exactitude may not be better than “estimations”
Forget about keep data clean and organized
Data is alive and wild. Don’t imprisoned it
75.
Flue prediction
Netflix movie rating contest
New York city building security
Used car
Veg food->airport
Prediction rare events frauds and why its
important
76.
A step closer to the brain? Yes and No
What is missing?
Predictive analytics (crime before it occurs)?
Algorithms that learn & adapt
Replace humans?
Augment reality
Big Data & algorithms are revolutionizing the
world. Fast!
78.
Recommendations (Amazon, Netflix, Facebook)
Trading (70% of Wall Street trading is done by them)
Identifying your partner, recruiting, votes
Images, video, voice, translation (real time)
Where are we heading?
NSA?
Black boxes?
80.
Matlab (several implementations; google for them)
R (CRAN repository), Rminer
Python (scikit-learn)
C++ (mainly on GitHub)
Torch
More on Deeplearning.net
81. Recommend an unseen item i to a user u based on the engagement of other users with items 1 to 8. The items recommended in this case are i2, followed by i1.
82. Item-based recommendation for a user ua, based on a neighbourhood of k = 3. The items recommended in this case are i3, followed by i4.
(Item-based CF is superior to user-based CF, but it requires a lot of information, such as ratings or user interactions with the product. A minimal sketch follows.)
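A minimal numpy sketch of item-based CF in the spirit of these slides (the ratings matrix, the user index and k are invented for illustration): score each unseen item by the similarity-weighted ratings the user gave to its k most similar items.

import numpy as np

# Rows = users, columns = items; 0 means "not rated".
R = np.array([[5, 3, 0, 1, 4],
              [4, 0, 0, 1, 2],
              [1, 1, 0, 5, 4],
              [1, 0, 5, 4, 0],
              [0, 1, 5, 4, 3]], dtype=float)
k, user = 3, 1

# Cosine similarity between item columns.
norms = np.linalg.norm(R, axis=0)
S = (R.T @ R) / np.outer(norms, norms)

rated = np.where(R[user] > 0)[0]
scores = {}
for item in np.where(R[user] == 0)[0]:            # items the user hasn't seen
    top = rated[np.argsort(S[item, rated])[-k:]]  # k most similar rated items
    scores[item] = S[item, top] @ R[user, top] / S[item, top].sum()

# Highest predicted rating first.
print(sorted(scores.items(), key=lambda kv: -kv[1]))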