Machine Learning in the Age of Big Data: New Approaches and Business Applications

Presentation at University of Lisbon on Machine Learning and big data.
Deep learning algorithms and applications to credit risk analysis, churn detection and recommendation algorithms

  • Storytelling: Google Flu
  • Invariants, parity, connectivity

Transcript

  • 1. Armando Vieira, Closer, Armando.lidinwise.com
  • 2. 1. Machine Learning: finding features, patterns & representations
       2. The connectionist approach: Neural Networks
       3. Applications
       4. The Deep Learning “revolution”: a step closer to the brain?
       5. Applications
       6. The Big Data deluge: better algorithms & more data
  • 3. Was “Deep Blue” intelligent? How about Watson? Or Google? Have machines reached the intelligence level of a rat? … Let’s be pragmatic: I’ll call “intelligent” any device capable of surprising me!
  • 4. Architectures: Hebb concept, MLP, deep networks
  • 5. 1943 – McCulloch & Pitts + Hebb
       1968 – Rosenblatt perceptron and the Minsky argument, or why a good theory may kill an even better idea
       1985 – Rumelhart perceptron
       2006 – Hinton Deep Learning (Boltzmann) networks
       All together: Watson, Google et al.
  • 6. Input builds up on receptors (dendrites)
       The cell has an input threshold
       When the cell’s threshold is breached, an activation is fired down the axon.
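
A minimal sketch of the threshold unit described in slide 6, assuming a McCulloch-Pitts style neuron; the weights and threshold below are illustrative, not taken from the presentation:

```python
import numpy as np

def threshold_neuron(inputs, weights, threshold):
    """Fire (output 1) only if the weighted input accumulated on the
    'dendrites' breaches the cell's threshold; otherwise stay silent."""
    activation = np.dot(weights, inputs)
    return 1 if activation >= threshold else 0

# Illustrative weights/threshold (not from the slides): an AND-like gate.
print(threshold_neuron(np.array([1, 1]), np.array([0.6, 0.6]), 1.0))  # -> 1
print(threshold_neuron(np.array([1, 0]), np.array([0.6, 0.6]), 1.0))  # -> 0
```
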
  • 7. The visual cortex
  • 8. A step closer to success thanks to a training algorithm: backpropagation
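
The slides do not show code, but a toy backpropagation loop makes the idea concrete: a tiny two-layer network trained on XOR with plain NumPy, where the output error is pushed backwards through the layers to update both weight matrices. The architecture, learning rate and data are illustrative assumptions, not the presenter's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)                # 2 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)                # 4 hidden -> 1 output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: squared-error gradient propagated layer by layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient-descent updates (learning rate 0.5 is an arbitrary choice)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2))   # should approach [[0], [1], [1], [0]]
```
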
  • 9. Training is nothing more than fitting: regression, classification, recommendations
       The problem is that we have to find a way to represent the world (extract features)
  • 10. FRUSTRATION (chart: Age vs. Money)
  • 11. A simpler hypothesis has a lower error rate
  • 12. ANNs are very hard to optimize
       Lots of local minima (traps for stochastic gradient descent)
       Permutation invariance (no unique solution)
       How to stop training?
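
One common answer to “how to stop training?” is early stopping against a held-out validation set. A hedged sketch with scikit-learn's MLPClassifier on synthetic data (all hyperparameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic data; stands in for any real classification problem.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Hold out 20% of the training data and stop as soon as the validation
# score stops improving, rather than descending into an overfitted minimum.
clf = MLPClassifier(hidden_layer_sizes=(50,),
                    early_stopping=True,
                    validation_fraction=0.2,
                    n_iter_no_change=10,
                    max_iter=500,
                    random_state=0)
clf.fit(X, y)
print("stopped after", clf.n_iter_, "epochs")
```
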
  • 13. Neural networks are incredibly powerful algorithms
       But they are also wild beasts that should be treated with great care
       It’s very easy to fall into the GIGO trap
       Problems like overfitting, sub-optimization, bad conditioning and wrong interpretation are common
  • 14. Interpretation of outputs
       - Loss function
       - Outputs ≠ probabilities
       - Where to draw the line?
       - Be VERY careful when interpreting the outputs of ML algorithms: you don’t always get what you see
       Input preparation
       - Clean & balance the data
       - Normalize it properly
       - Remove unneeded features, create new ones
       - Handle missing values
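
Those preparation steps can be chained into a single pipeline so they are applied consistently at training and prediction time. A sketch with scikit-learn, assuming a tabular dataset with missing values and unbalanced classes; X_train, y_train and X_test are hypothetical placeholders:

```python
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

prep_and_model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),           # fill in missing values
    ("scale", StandardScaler()),                            # normalize the features
    ("clf", LogisticRegression(class_weight="balanced")),   # compensate class imbalance
])

# prep_and_model.fit(X_train, y_train)
# prep_and_model.predict_proba(X_test)   # interpret these scores with care (see slide 14)
```
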
  • 15. Rutherford Backscattering (RBS)
       Credit risk & scoring
       Churn prediction (CDR)
       Prediction of hotel demand with Google Trends
       AdWords optimization
  • 16. Ion beam analysis (MeV/amu): RBS, channelling, NRA, PIXE, ERDA
  • 17. (RBS spectra of a 25 Å Ge layer under 400 nm of Si: yield (arb. units) vs. channel for (a) beam energies of 1.2, 1.6 and 2 MeV, (b) scattering angles of 120°, 140° and 180°, (c) angles of incidence of 0°, 25° and 50°.)
  • 18. Architecture                    Train set error   Test set error
        (I, 100, O)                     6.3               11.7
        (I, 250, O)                     5.2               10.1
        (I, 100, 80, O)                 3.6               5.3
        (I, 100, 50, 20, O)             4.2               5.1
        (I, 100, 80, 50, O)             3.0               4.1
        (I, 100, 80, 80, O)             2.8               4.7
        (I, 100, 50, 100, O)            3.0               4.2
        (I, 100, 80, 80, 50, O)         3.2               4.1
        (I, 100, 80, 50, 30, 20, O)     3.8               5.3
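
The table compares train and test errors for networks of increasing depth on the RBS problem. The original data is not reproduced here, but the same kind of architecture scan can be sketched with scikit-learn on synthetic data (the dataset, architectures and scores below are illustrative only):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the RBS spectra used in the original study.
X, y = make_regression(n_samples=2000, n_features=128, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for hidden in [(100,), (250,), (100, 80), (100, 80, 50), (100, 80, 80, 50)]:
    net = MLPRegressor(hidden_layer_sizes=hidden, max_iter=500, random_state=0)
    net.fit(X_tr, y_tr)
    print(hidden, "test R^2:", round(net.score(X_te, y_te), 3))
```
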
  • 19. (Scatter plots comparing ANN predictions with data for depth and dose, both in units of 10^15 at/cm².)
  • 20. (Surface plot of the score as a function of EBIT and current ratio.)
  • 21. Before After
  • 23. Neural networks are good when: many training data are available; variables are continuous; the relevant features are known; the mapping is unique.
       Neural networks are less useful when: the problem is linear; there are few data compared to the size of the search space; the data is high-dimensional; there are long-range correlations.
       They are black boxes.
  • 24. Characteristic                 Traditional methods (Von Neumann)         Artificial neural networks
        Logic                          Deductive                                 Inductive
        Processing principle           Logical                                   Gestalt
        Processing style               Sequential                                Distributed (parallel)
        Functions realised through     Concepts, rules, calculations             Concepts, images, categories, maps
        Connections between concepts   Programmed a priori                       Dynamic, evolving
        Programming                    Through a limited set of rigid rules      Self-programmable (given an appropriate architecture)
        Learning                       By rules                                  By examples (analogies)
        Self-learning                  Through internal algorithmic parameters   Continuously adaptable
        Tolerance to errors            Mostly none                               Inherent
  • 25. ANNs are massive correlation & feature-extraction machines: isn’t that what intelligence is all about?
       Knowledge is embedded in a messy network of weights
       Capable of modelling an arbitrarily complex mapping
  • 26. We need thousands of examples for training. Why?
       Priors
       Algorithms are simple: complexity lies in the data
  • 27. Hinton et al, 2006
  • 28. “Quasi” unsupervised machines
       Extract and combine subtle features in the data
       Build high-level representations (abstractions)
       Capable of knowledge transfer
       Can handle (very) high-dimensional data
       Are deep and broad: millions of synapses
       Work both ways: up and down
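
In the spirit of Hinton's 2006 layer-wise training (though not the presenter's implementation), a rough sketch: two Bernoulli RBMs stacked as unsupervised feature extractors with a simple classifier on top. The digits dataset and every hyperparameter here are assumptions for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import minmax_scale

X, y = load_digits(return_X_y=True)
X = minmax_scale(X)                      # RBMs expect inputs in [0, 1]

# Each RBM learns a layer of features from the output of the previous one;
# only the final logistic regression uses the labels.
model = Pipeline([
    ("rbm1", BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
print("training accuracy:", round(model.score(X, y), 3))
```
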
  • 29. Learning features that are not mutually exclusive
  • 30. Top at image identification (in some cases it beats humans)
       Top at video classification
       Top at real-time translation
       Top at gene identification
       Reverse engineering: can replicate complex human behaviour, like walking
       Data visualization and text disambiguation (river bank / bank bailout)
       Kaggle
  • 31. (Diagram: Data → Features → Results.)
  • 32. Better algorithms + powerful computers + available data = MAGIC
  • 33. In 2 years we produce more data (and garbage) than accumulated over all of history
       Zettabytes of data (10^21 bytes) produced every year
  • 34. Machine learning molecules (**)
  • 35. Most ML algorithms work better (sometimes much better) simply by throwing more data at them
       And now we have more data. Plenty of it!
       Which is signal and which is noise? Let the machines decide (they are good at it)
       Where do humans stand in this equation? We are feeding the machines!
  • 36. Don’t look for causation; welcome correlations
       Messiness: prepare to get your hands dirty
       Don’t expect definitive answers. Only communists have them!
       Stop searching for God’s equation
       Keep theories at bay and let the data speak
       Exactitude may not be better than “estimations”
       Forget about keeping data clean and organized
       Data is alive and wild. Don’t imprison it
  • 37. Flu prediction
       Netflix movie rating contest
       New York City building security
       Used cars
       Veg food -> airport
       Prediction of rare events (fraud) and why it’s important
  • 38. A step closer to the brain? Yes and no
       What is missing?
       Predictive analytics (crime before it occurs)?
       Algorithms that learn & adapt
       Replace humans?
       Augment reality
       Big Data & algorithms are revolutionizing the world. Fast!
  • 39. Recommendations (Amazon, Netflix, Facebook)
       Trading (70% of Wall Street trading is done by algorithms)
       Identifying your partner, recruiting, votes
       Images, video, voice, translation (real time)
       Where are we heading? NSA? Black boxes?
  • 40. Deeplearning.net
       Hinton Google talks
       “Too big to know”
       Big Data: a new revolution that will transform business
       Machine Learning in R
  • 41. Matlab (several packages – google for them)
       R (CRAN repository), Rminer
       Python (scikit-learn)
       C++ (mainly on GitHub)
       Torch
       More on Deeplearning.net
  • 42. Recommend an unseen item i to a user u based on the engagement of other users with items 1 to 8. Items recommended in this case are i2 followed by i1.
  • 43. Item-based recommendation for a user ua based on a neighbourhood of k = 3. Items recommended in this case are i3 followed by i4. (Item-based CF is superior to user-based CF, but it requires a lot of information, such as ratings or user interaction with the product.)
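
A minimal item-based collaborative-filtering sketch along the lines of slides 42-43: cosine similarity between item columns of a small, made-up user-item engagement matrix, recommending the unseen item most similar to what the user already engaged with (the matrix, users and items are hypothetical, not those in the figures):

```python
import numpy as np

# Hypothetical engagement matrix: rows are users, columns are items i1..i5.
R = np.array([
    [1, 0, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 1, 0],
    [1, 0, 1, 1, 0],
], dtype=float)

def item_similarity(M):
    """Cosine similarity between item columns (item-item matrix)."""
    M = M / (np.linalg.norm(M, axis=0, keepdims=True) + 1e-9)
    return M.T @ M

sim = item_similarity(R)
user = R[0]                          # recommend for the first user
scores = sim @ user                  # similarity of each item to the user's items
scores[user > 0] = -np.inf           # mask items the user has already seen
print("recommend item index:", int(np.argmax(scores)))
```
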