Zürcher Fachhochschule
Deep Learning @ ZHAW
Thilo Stadelmann, Mark Cieliebak & Yves Pauchard
InIT Colloquium, 15. April 2015, Winterthur
Agenda
Overview
• What is Deep Learning? ‘15
• Our stake in it
• InIT Use Case: Text Analytics ‘10
• InIT Use Case: Face Recognition ‘10
Deep Learning is…
…a hot topic!
Deep Learning is…
…Continued Neural Network Research
What’s new?
• Novel architectures (wider, deeper)
• Faster and better training
(e.g., understanding of Backpropagation’s “vanishing gradient” problem, good initial weights)
• Better regularization (e.g., Dropout, Max-pooling etc.)
• Big Data (or augmentation) and corresponding computational power on GPUs
 «Add as many parameters as possible for your hardware and train the hell out of
it with proper regularization» (Yann LeCun)
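Dropout, listed above as a regularizer, randomly silences units during training so the network cannot rely on any single feature. A minimal sketch of the idea (the function name and list-based representation are illustrative, not from the talk):

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout: during training, zero each unit with probability p
    and rescale survivors by 1/(1-p) so expected activations stay unchanged."""
    if not training:
        return list(activations)
    return [0.0 if rng.random() < p else a / (1.0 - p) for a in activations]
```

At test time the layer is simply passed through unchanged; the 1/(1-p) rescaling during training is what makes that possible.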
Deep Learning is…
… Successful
Areas of successful application:
• Computer Vision (detection, segmentation, recognition, OCR, video analysis)
• Speech Processing (Recognition, Siri etc.)
• Natural Language Processing (Translation, Sentiment Analysis)
• Metric Learning (distances, invariances, hashing)
• Prediction & Forecasting (financial, time series)
Red titled slides by Jonathan Masci
Technical Idea
Learning Features, not just rules
Hand-engineering features is tedious
 Let each layer learn a new representation of the data by itself
Actual learning is…
• governed by the learning target (input-output pairs & objective function),
• facilitated by constraints & regularizations (e.g., sparsity to learn distributed codes),
• enforced by the Backpropagation algorithm (1970-1989)
What is learned?
• Highly non-linear functions purely from data
• Hierarchies of features, combinations of elements (distributed codes)
State of the Art
• CNNs (Convolutional Neural Networks) for vision tasks and beyond
 Relatively easy to use, very successful, biologically inspired, broad user basis
• RNNs (Recurrent Neural Networks) for sequences and hard tasks
 Turing complete, hot research topic
Honglak Lee, University of Michigan
Yan et al., National University of Singapore
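The training loop that Backpropagation drives can be illustrated on the smallest possible case: a single weight fitted by gradient descent on a squared-error objective (toy data, invented for illustration):

```python
# Fit a single weight w so that w * x approximates y, minimizing squared error.
# Backpropagation generalizes exactly this gradient computation to deep networks.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy input-output pairs, y = 2x
w, lr = 0.0, 0.05  # initial weight and learning rate

for epoch in range(200):
    for x, y in data:
        pred = w * x
        grad = 2.0 * (pred - y) * x  # d/dw of (pred - y)^2
        w -= lr * grad

print(round(w, 3))  # → 2.0
```

The learning target (input-output pairs plus an objective function) fully determines the gradient; deep networks add many layers of such weights, with the chain rule carrying the gradient backwards through them.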
The Deep Learning Market
… and what we do about it!
Strategic relevance
• 3 years ago: <10 research groups at «ivy league» universities
• 01/2014: Google acquires DeepMind for $500 million (startup by IDSIA / Ticino)
• Currently:
• Courses / books / software frameworks are all «beta versions»
• Boundaries between research and application are strongly domain-specific
• Outlook: Could be a tool like «SVM» in 2-5 years
Deep Learning @ Datalab
• Hardware investments: 2 multi-GPU workstations
http://www.zhaw.ch/de/zhaw/institute-zentren/uebergreifende-institute-zentren/dlab/hardware.html
• People investments: 13 researchers formed the Deep Learning Journal Club in 2014
deeplearning@downbirn.zhaw.ch
• Projects:
• 2 internal projects finished (see use cases later!)
• 2 CTI projects just funded (starting this summer)
• Several proposals pending
Use Case «Text Analytics»
Mark Cieliebak
What is "Text Analytics"?
Goal: turn text into information
• Sentiment Analysis
• Question Answering (Q&A)
• Named Entity Extraction
• Text Summarization
• Machine Translation
• Spelling Correction
• Information Retrieval
Approaches to Text Analytics
• Rule-Based
• Corpus-Based
• Deep Learning
Each approach maps input text to a predicted label.
Feature-Based Text Analytics
(Diagram: hand-crafted features feed a classifier that outputs the predicted label)
Sample Features for Tweets
• Word n-grams: presence or absence of contiguous sequences of 1, 2, 3, and 4 tokens; non-contiguous n-grams
• POS: the number of occurrences of each part-of-speech tag
• Sentiment lexica: each word annotated with a tonality score (-1..0..+1)
• Negation: the number of negated contexts
• Punctuation: the number of contiguous sequences of exclamation marks, question marks, and both exclamation and question marks
• Emoticons: presence or absence; whether the last token is a positive or negative emoticon
• Hashtags: the number of hashtags
• Elongated words: the number of words with one character repeated (e.g., ‘soooo’)
from: Mohammad et al., SemEval 2013
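A few of the features above can be sketched in plain Python; the function name is hypothetical and only a handful of the listed features are shown:

```python
import re

def tweet_features(text):
    """Extract a small subset of the SemEval-style features described above."""
    tokens = text.split()
    feats = {}
    # word n-grams (presence/absence); here only unigrams and bigrams
    for n in (1, 2):
        for i in range(len(tokens) - n + 1):
            feats["ng:" + " ".join(tokens[i:i + n])] = 1
    feats["hashtags"] = sum(t.startswith("#") for t in tokens)
    # runs of two or more exclamation marks
    feats["exclaim_runs"] = len(re.findall(r"!{2,}", text))
    # elongated words: any character repeated 3+ times, e.g. 'soooo'
    feats["elongated"] = len(re.findall(r"(\w)\1{2,}", text))
    return feats
```

A real system would add POS tags, lexicon scores, and negation handling on top; the point is that each feature must be designed and implemented by hand.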
Feature-Based Text Analytics
Most Important Issues
• Requires large annotated corpora
• Depends on good features
[6]
Deep Learning on Text
Deep Learning:
It's all about Word Vectors!
Word2Vec
• Huge set of text samples (billions of words)
• Extract dictionary
• Word matrix: a k-dimensional vector for each word (k typically 50-500)
• Word vectors initialized randomly
• Train word vectors to predict the next word, given a sequence of words from the sample text
Major contributions by Bengio et al. 2003, Collobert & Weston 2008, Socher et al. 2011, Mikolov et al. 2013
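Word2vec learns its vectors by prediction, but the basic idea of "a k-dimensional vector per word, shaped by context" can be shown with a much simpler count-based stand-in (a deliberate simplification; real word2vec trains a neural predictor):

```python
def cooccurrence_vectors(sentences, window=2):
    """Map each word to a vector of co-occurrence counts within a context window.
    A simplified, count-based stand-in for word2vec's predictive training."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = {w: [0.0] * len(vocab) for w in vocab}
    for s in sentences:
        for i, w in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if j != i:
                    vecs[w][index[s[j]]] += 1.0
    return vecs
```

Words that appear in similar contexts end up with similar vectors, which is exactly the property word2vec's training optimizes for at scale.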
The Magic of Word Vectors
King - Man + Woman ≈ Queen
Live demo on 100 billion words from the Google News dataset: http://radimrehurek.com/2014/02/word2vec-tutorial/
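The analogy works by plain vector arithmetic plus nearest-neighbor search under cosine similarity. A toy sketch with hand-made 2-d vectors (invented for illustration; real embeddings have hundreds of dimensions learned from data):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical 2-d embeddings: dimension 0 ~ "royalty", dimension 1 ~ "gender".
vectors = {
    "king":  [0.9,  0.9],
    "queen": [0.9, -0.9],
    "man":   [0.1,  0.9],
    "woman": [0.1, -0.9],
    "apple": [0.0,  0.1],
}

# king - man + woman, then find the nearest remaining word
target = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]
best = max((w for w in vectors if w not in ("king", "man", "woman")),
           key=lambda w: cosine(vectors[w], target))
print(best)  # → queen
```

Subtracting "man" removes the gender component, adding "woman" puts it back with the opposite sign, and the royalty component survives untouched.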
Relations Learned by Word2Vec
[11]
Using Word Vectors in NLP
Collobert et al., 2011:
• SENNA: generic NLP system based on word vectors
• Solves many NLP tasks as well as benchmark systems
Sentiment Analysis
"… WiFi Analytics is a free Android app that I find
very handy when it comes to troubleshooting and
monitoring a home network. "
Deep Learning and Sentiment
• Maas et al., 2011: word vectors with sentiment context
• Socher et al., 2013: representing sentence structures as trees with sentiment annotation
• Le and Mikolov, 2014: "Paragraph Vectors"
(Figure: sentiment-aware word vectors separate "wonderful"/"amazing" from "terrible"/"awful")
Words and Images
(Figure: image tagging demo, including an untrained class)
Demo: http://www.clarifai.com/#demo
Use Case «Face Recognition»
Yves Pauchard
piVision: Face recognition on a Raspberry Pi
What is face recognition?
Detection: Is this a face or not?
Verification: Are these two pictures showing the same face?
Identification: Is this Yves?
Pipeline
Detect → Align → Pre-processor (filter) → Feature extractor → Recognizer (model), with Train and Predict paths
• Detect: extract the face
• Align: correct pose
• Pre-processor (filter): correct illumination
• Feature extractor: dimensionality reduction
• Recognizer (model): classification
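The stages above compose into a single recognition function; a skeleton in Python (each stage is a stub standing in for the real implementation, and the names are illustrative):

```python
# Illustrative skeleton of the recognition pipeline; "images" are nested
# lists of pixel values, and each stage is a placeholder.

def detect(image):
    """Extract the face region (stub: returns the input unchanged)."""
    return image

def align(face):
    """Correct pose, e.g. via a 2D similarity transform (stub)."""
    return face

def preprocess(face):
    """Correct illumination (stub)."""
    return face

def extract_features(face):
    """Dimensionality reduction (stub: one value per row)."""
    return [sum(row) for row in face]

def classify(features, model):
    """Predict an identity label using a trained model."""
    return model(features)

def recognize(image, model):
    face = preprocess(align(detect(image)))
    return classify(extract_features(face), model)
```

The value of this structure is that stages can be swapped independently, which is exactly how the baseline and the deep-learning variants below differ.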
Software development
• Python (OpenCV) + PyCharm + SVN + TeamCity (Raspberry Pi and Linux agents)
• Timing and accuracy tests after each commit
Baseline: Fisherfaces (OpenCV)
• Detect: Viola & Jones
• Align: 2D similarity transform
• Pre-processor (filter): Gamma + DoG (Difference of Gaussians)
• Feature extractor: Principal Component Analysis
• Recognizer: Linear Discriminant Analysis
Deep Learning
• Detect: Viola & Jones
• Filter: local binary pattern + ellipse
• Feature extraction and recognition: Convolutional Neural Network, where the features are learned
Experiment
• Training: indoors (used for learning)
• Testing: outdoors (used exclusively for testing)
• Approx. 40 images of 6 individuals, acquired in 2 batches
• For CNN training, an augmented set was used, i.e., additional training images were synthetically created
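Data augmentation of this kind can be as simple as flipping, shifting, and adding noise to each training image; a minimal sketch (images as nested lists of pixel values, function name illustrative):

```python
import random

def augment(image, rng=random):
    """Create simple synthetic variants of an image (a nested list of pixels):
    the original, a horizontal flip, a 1-pixel circular shift, and a noisy copy."""
    variants = [image]
    variants.append([list(reversed(row)) for row in image])          # horizontal flip
    variants.append([row[1:] + row[:1] for row in image])            # 1-pixel shift
    variants.append([[p + rng.gauss(0.0, 0.01) for p in row] for row in image])
    return variants
```

Shifted and flipped variants teach the network invariance to small pose changes, which is why augmentation could stand in for the explicit alignment step.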
Results
Interesting findings
• Alignment is crucial for the baseline algorithm, and time consuming
• The CNN needs to be trained on a desktop PC with a GPU
• Training data augmentation for the CNN can effectively replace the alignment step, saving time
• The CNN outperforms the baseline algorithm 99.6% to 96.9%, dropping fewer images and saving time
• Let’s see it running: https://www.youtube.com/watch?v=oI1eJa-UWNU
Further Reading
• Very brief history with some links (2015)
http://dublin.zhaw.ch/~stdm/?p=241
• Comprehensive history & survey (2015)
Schmidhuber, “Deep Learning in Neural Networks: An Overview”
http://arxiv.org/abs/1404.7828
• Deep Learning kick-off (2006; of historical interest)
Hinton et al., “A Fast Learning Algorithm for Deep Belief Nets”
http://www.cs.toronto.edu/~hinton/absps/ncfast.pdf
• Very practical overview of Convolutional Neural Networks (CNNs, 1998)
LeCun et al., “Gradient-Based Learning Applied to Document Recognition”
http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf
• Cool application for which Google paid $500 million (2015)
Mnih et al., “Human-Level Control through Deep Reinforcement Learning”
http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html