PyData2015

Representation learning
PyData Warsaw 2015
Michael Jamroz
Matthew Opala
24th September 2015

Presentation Plan
● Goals of AI
● Learning representations
● Deep learning
● Examples

AI
● Goal: build an intelligent machine
● It needs knowledge to make decisions
● Impossible to hand-code all of that knowledge
into a computer program
● Knowledge gained by learning from data

Data representation
● Representation - features passed to ML
algorithms, crucial for good performance on
various tasks
● Features can be handcrafted or learned
automatically
● Representation learning: discovering
meaningful features by the computer

ML in industry nowadays
● Most of the time spent on manual feature
extraction
● We would like to have

Why representation learning?
● Manual feature extraction is time-consuming
and incomplete (see previous slide)
● Unsupervised feature learning
○ Collected data are mostly unlabeled
(bigger datasets)
○ Labels do not provide enough information
○ Process of learning is independent of the
ML task performed on data

Semi-supervised, transfer learning
● Transfer learning - transferring knowledge
from previous learning to the new machine
learning task
● Semi-supervised learning - few labeled
examples, many unlabeled examples

Need for Deep Architectures
● a deep architecture can represent certain
functions more compactly than a shallow one
● any boolean function (e.g. AND, OR, XOR)
can be represented with a single hidden layer
- however, it may require an exponential number
of hidden units
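
As a minimal sketch of the single-hidden-layer claim, XOR on two bits can be computed with two threshold hidden units. The weights below are hand-picked for illustration and are an assumption, not taken from the talk.

```python
import numpy as np

def step(z):
    # Linear threshold unit: fires when the weighted input is positive.
    return (np.asarray(z) > 0).astype(int)

# Hidden unit 0 computes OR, hidden unit 1 computes AND (hand-picked weights).
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])

# The output fires when OR is on and AND is off, which is XOR.
w_out = np.array([1.0, -1.0])
b_out = -0.5

def xor_net(x):
    h = step(W_hidden @ x + b_hidden)
    return int(step(w_out @ h + b_out))

outputs = [xor_net(np.array(x)) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Two hidden units are enough for two inputs; the exponential blow-up only shows up as the number of input bits grows.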

Formally
● Yao (1985) showed that depth-2 circuits
computing d-bit parity have exponential size
● Håstad (1991) generalised this to perceptrons
with linear threshold units
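
To get a feel for the growth, the sketch below counts the AND terms in the canonical OR-of-ANDs formula for d-bit parity and compares that with a chain of two-input XOR gates. It illustrates the size gap only; it is not a proof of the lower bounds cited above, and the function names are hypothetical.

```python
from itertools import product

def parity_dnf_terms(d):
    # The canonical depth-2 formula uses one AND term per input with odd parity.
    return sum(1 for bits in product((0, 1), repeat=d) if sum(bits) % 2 == 1)

def parity_chain_gates(d):
    # A deep circuit can chain two-input XOR gates: ((x1 ^ x2) ^ x3) ^ ...
    return d - 1

# Maps d -> (shallow term count, deep gate count).
sizes = {d: (parity_dnf_terms(d), parity_chain_gates(d)) for d in (2, 4, 8, 16)}
```

The shallow count doubles with every extra input bit, while the deep chain grows by one gate.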

How deep representation
do we need?

Learning multiple levels of
representation

“I'm sorry, Dave. I'm afraid I can't do that.”

对不起，戴夫。恐怕我不能这样做。

Let’s build deep
representation

Multilayer Perceptron
input layer
hidden layers
output layer
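
A minimal NumPy sketch of the forward pass through such a network. The layer sizes, the tanh nonlinearity and the softmax output are assumptions chosen for illustration.

```python
import numpy as np

def mlp_forward(x, weights, biases):
    # Hidden layers: affine map followed by a tanh nonlinearity.
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ W + b)
    # Output layer: affine map followed by a softmax over classes.
    logits = h @ weights[-1] + biases[-1]
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
layer_sizes = [4, 8, 8, 3]  # input, two hidden layers, output (hypothetical sizes)
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

# A batch of 5 random inputs yields 5 rows of class probabilities.
probs = mlp_forward(rng.normal(size=(5, 4)), weights, biases)
```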

But MLPs have their problems
● vanishing, exploding gradients
● getting stuck in poor local optima
● lack of good initializations
● lack of labeled data
● little encouragement for further research
● slow hardware
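
The vanishing-gradient problem can be sketched with a chain of one-unit sigmoid layers: each layer multiplies the backpropagated gradient by at most 0.25 times the weight. The chain, the weight value and the depths are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(x, weight, depth):
    # Compute d(output)/d(input) through a chain of one-unit sigmoid layers.
    grad, a = 1.0, x
    for _ in range(depth):
        a = sigmoid(weight * a)
        grad *= weight * a * (1.0 - a)  # chain rule: w * sigmoid'(z)
    return grad

shallow = input_gradient(0.5, weight=1.0, depth=2)
deep = input_gradient(0.5, weight=1.0, depth=20)
```

With unit weights the gradient shrinks geometrically with depth, so the earliest layers receive almost no learning signal.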

Limitations of fully connected
networks
● for natural images we would like to be
invariant to translations, rotations and other
class-preserving transformations
● fully connected networks do not introduce
such invariance

Convolution = sparse connectivity +
parameter sharing
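
The identity above can be checked on a small 1-D example: a convolutional layer is a fully connected layer whose weight matrix is mostly zeros (sparse connectivity) and whose rows reuse the same kernel (parameter sharing). The kernel and input values are arbitrary, and the sliding window is applied without flipping the kernel, as is common in deep learning libraries.

```python
import numpy as np

kernel = np.array([1.0, 2.0, -1.0])  # 3 shared parameters
x = np.array([2.0, 4.0, 3.0, 7.0, 1.0, 5.0])
n_out = len(x) - len(kernel) + 1

# Equivalent fully connected weight matrix: each row holds the same kernel,
# shifted by one position, and is zero everywhere else.
M = np.zeros((n_out, len(x)))
for i in range(n_out):
    M[i, i:i + len(kernel)] = kernel

dense_out = M @ x
conv_out = np.correlate(x, kernel, mode="valid")  # sliding window, no kernel flip
```

Here the dense matrix has 24 entries, only 12 of them non-zero, and all of them are tied to just 3 free parameters.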

Word2Vec / Doc2Vec
● Tomas Mikolov et al., 2013
● Embedding words / documents in vector
space
● Neural network with one hidden layer
● Trained in an unsupervised way
● Representation for a word obtained by
computing the hidden layer activation
● Good explanation:
http://arxiv.org/pdf/1411.2738v1.pdf
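
A minimal sketch of how the hidden layer activation yields a word vector. Because the hidden layer is linear and the input is one-hot, the activation is simply one row of the input weight matrix. The toy vocabulary and the random matrix standing in for trained weights are assumptions.

```python
import numpy as np

vocab = ["deep", "learning", "representation", "feature"]  # toy vocabulary
embedding_dim = 3
rng = np.random.default_rng(0)
# Stand-in for the trained input-to-hidden weights (random here, not trained).
W_in = rng.normal(size=(len(vocab), embedding_dim))

def word_vector(word):
    one_hot = np.zeros(len(vocab))
    one_hot[vocab.index(word)] = 1.0
    # Linear hidden layer: the activation for a one-hot input is one row of W_in.
    return one_hot @ W_in

vec = word_vector("learning")
```

In practice the lookup is done by indexing the matrix directly, which avoids the multiplication.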

Problem
● ~180k documents - activity reports made by
American companies
● companies belong to different industry
segments (260)
● ~9k labeled documents (label: the industry the
company operates in)
● an example of semi-supervised learning
● task: classify the remaining
documents

Doc2Vec - classification
● Labeled set split into training/test data
with a 70/30 ratio
● Test set: ~2700 examples, 260 classes
● Classification performed on the representation
obtained from Doc2Vec
● Accuracy on test set:
○ KNN with voting: ~85%
○ SVM one-versus-one: ~83%
○ Random forest: ~80%
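
The split-and-classify procedure above can be sketched as follows, using KNN with majority voting. The vectors are synthetic Gaussian clusters standing in for Doc2Vec representations, and the value of k, the Euclidean distance and the class count are assumptions, so the resulting accuracy says nothing about the figures reported on this slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, per_class, dim = 5, 40, 16
# Synthetic stand-ins for Doc2Vec vectors: one Gaussian cluster per class.
centers = rng.normal(scale=3.0, size=(n_classes, dim))
X = np.vstack([c + rng.normal(size=(per_class, dim)) for c in centers])
y = np.repeat(np.arange(n_classes), per_class)

# 70/30 split of the labeled set.
order = rng.permutation(len(X))
n_train = len(X) * 7 // 10
train, test = order[:n_train], order[n_train:]

def knn_predict(x, k=5):
    # Majority vote among the k nearest training vectors (Euclidean distance).
    dists = np.linalg.norm(X[train] - x, axis=1)
    nearest = y[train][np.argsort(dists)[:k]]
    return np.bincount(nearest, minlength=n_classes).argmax()

pred = np.array([knn_predict(x) for x in X[test]])
accuracy = (pred == y[test]).mean()
```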

Summing up
● define loss function for content
● define loss function for art
● define total loss
● perform gradient-based optimization
● compute derivatives with respect to data
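
The steps above can be sketched in NumPy on a toy problem. This is not the code from the talk: the data array is used directly as its own feature map instead of passing through a network, the style loss compares Gram matrices, and the weights alpha and beta, the step size and the array shapes are all assumptions.

```python
import numpy as np

def gram(F):
    # Channel-by-channel correlations of a (channels, positions) feature map.
    return F @ F.T / F.shape[1]

def total_loss_and_grad(x, content, style_gram, alpha=1.0, beta=1.0):
    # Content loss: squared distance between feature maps.
    content_loss = 0.5 * np.sum((x - content) ** 2)
    # Style ("art") loss: squared distance between Gram matrices.
    diff = gram(x) - style_gram
    style_loss = 0.25 * np.sum(diff ** 2)
    # Derivative of the total loss with respect to the data x.
    grad = alpha * (x - content) + beta * (diff @ x) / x.shape[1]
    return alpha * content_loss + beta * style_loss, grad

rng = np.random.default_rng(0)
content = rng.normal(size=(4, 16))
style_gram = gram(rng.normal(size=(4, 16)))

x = rng.normal(size=(4, 16))  # start from noise
initial_loss, _ = total_loss_and_grad(x, content, style_gram)
for _ in range(200):  # plain gradient descent on the data, not on any weights
    _, grad = total_loss_and_grad(x, content, style_gram)
    x -= 0.01 * grad
final_loss, _ = total_loss_and_grad(x, content, style_gram)
```

The key design point is that the optimized variable is the data itself, so the derivatives are taken with respect to x rather than model parameters.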

● Theano & Lasagne
● NVIDIA GTX
● https://github.com/Craftinity/art_style
● http://deeplearning.net

Contact
● http://www.craftinity.com
● https://www.facebook.com/craftinitycom
● https://twitter.com/craftinitycom
● mateuszopala@craftinity.com
● michaljamroz@craftinity.com
● contact@craftinity.com
