Machine Learning
«A Gentle Introduction»
@ZimMatthias Matthias Zimmermann
BSI Business Systems Integration AG
«What is the difference
to IBM’s chess system
20 years ago?»
AlphaGo Hardware
Powered by TPUs
Tensor Processing Unit (TPU)
Specialized ML Hardware
What else
is needed?
Markov Decision Process
• Environment (Atari Breakout)
• Agent performing Actions (Left, Right, Release Ball)
• State (Bricks, location / direction of ball, …)
• Rewards (A Brick is hit)
Deep Reinforcement Learning
Q-Learning (simplified)
• Markov Decision Process
• Q(s, a): highest sum of future Rewards achievable by taking action a in state s
initialize Q randomly
set initial state s₀
repeat
  select and execute action a that maximizes Q(sᵢ, a)
  observe reward r and new state sᵢ₊₁
  set Q = update(Q, sᵢ, a, r, sᵢ₊₁)
  set sᵢ = sᵢ₊₁
until terminated
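A minimal Java sketch of the update step above, assuming the standard Q-learning rule Q(s, a) ← Q(s, a) + α · (r + γ · max Q(s′, a′) − Q(s, a)); the tabular double[][] representation and the alpha/gamma values are illustrative:

class QLearning {
    // alpha = learning rate, gamma = discount factor (both illustrative)
    static void update(double[][] q, int s, int a, double r, int sNext,
                       double alpha, double gamma) {
        // max over all actions in the new state: max_a' Q(s', a')
        double best = Double.NEGATIVE_INFINITY;
        for (double v : q[sNext]) best = Math.max(best, v);
        // move Q(s, a) toward the target r + gamma * max_a' Q(s', a')
        q[s][a] += alpha * (r + gamma * best - q[s][a]);
    }
}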
Deep Reinforcement Learning
Deep Q Learning (DQN)
• Q Learning
• Q(s, a) = Deep Neural Network (DNN)
• Retrain DNN regularly (using its own experience) (see the sketch below)
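The retraining step can be sketched as follows: take a remembered transition (s, a, r, s′), ask the network for its current Q-values, overwrite the entry for the action actually taken with the Bellman target, and fit the network toward the corrected values. A hedged pseudo-Java sketch; net.predict and net.train are hypothetical stand-ins, not the API of a specific library:

// One DQN training step on a replay sample (s, a, r, sNext, terminal);
// net.predict and net.train are hypothetical stand-ins for the network API
double gamma = 0.99;                      // discount factor (illustrative)
double[] target = net.predict(s);         // current Q-value estimates for state s
double best = 0.0;                        // terminal states contribute no future reward
if (!terminal) {
    double[] next = net.predict(sNext);   // Q-values in the successor state
    best = next[0];
    for (double v : next) best = Math.max(best, v);  // max_a' Q(s', a')
}
target[a] = r + gamma * best;             // Bellman target for the action actually taken
net.train(s, target);                     // one gradient step toward the corrected targets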
Deep Reinforcement Learning
[Diagram: State s → DNN Q(s, a) → Action a (Left, Right, Release)]
Machine Learning Concepts
Data
Models
Training and Evaluation
ML Topics
Challenges
• Getting the RIGHT data for the task
• And LOTs of it
• There is never enough data …
Real World Lessons
• Data is crucial for successful ML projects
• Most boring and time-consuming task
• Most underestimated task
Getting the Data
Rosemary, Rosmarinus officinalis
Sentiment Analysis
1245 NEGATIVE 
shallow , noisy and pretentious .
14575 POSITIVE 
one of the most splendid
entertainments to emerge from
the french film industry in years
Iris flower set, or an example for outlier detection?
86211,B,12.18,17.84,77.79, …
862261,B,9.787,19.94,62.11, …
862485,B,11.6,12.84,74.34, …
862548,M,14.42,19.77,94.48, …
862009,B,13.45,18.3,86.6, …
Data
Models
Training and Evaluation
ML Topics
2012, ImageNet, G. Hinton
Data
Models
Training and Evaluation
ML Topics
[Chart: Error Rate vs. Model Complexity / Training Iterations, for Training Data and Test Data]
«Underfitting»: model too simple, more training needed
«Overfitting»: model too complex, too much training
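One standard way to act on this curve is early stopping: keep training while the error on held-out test data still falls, and stop once it rises for several rounds. A minimal generic sketch in Java; train, testError, and snapshot are hypothetical stand-ins, not a specific library API:

// Train until the test error stops improving for 3 consecutive epochs
int maxEpochs = 50;                             // illustrative upper bound
double bestError = Double.MAX_VALUE;
int badEpochs = 0;
for (int epoch = 0; epoch < maxEpochs && badEpochs < 3; epoch++) {
    train(model, trainingData);                 // one pass over the training set
    double error = testError(model, testData);  // error on data the model never trains on
    if (error < bestError) {
        bestError = error;                      // still improving
        snapshot(model);                        // remember the best model so far
        badEpochs = 0;
    } else {
        badEpochs++;                            // test error rising: drifting into overfitting
    }
}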
Data
Models
Training and Evaluation
ML Topics
Supervised Learning
• Learning from Examples
• Right Answers are known
Unsupervised Learning
• Discover Structure in Data
• Dimensionality Reduction
Reinforcement Learning
• Interaction with Dynamic Environment
Demo Time
Demo 1 Supervised Learning
• Pattern recognition
• Handwritten character recognition
• Convolutional neural network
Demo 2 Unsupervised Learning
• Natural language processing (NLP)
• Neural word embeddings
• Word2vec
Demos
Data
• Which digit is this?
• Collect our own data
Model
• Deep Neural Network (LeNet-5)
Deeplearning4j
• Deep Learning Library
• Open Source (Apache)
• Java (minimal sketch below)
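A minimal Deeplearning4j sketch of a LeNet-style network on MNIST: the layer shapes follow the classic LeNet-5 design, while the hyperparameters (seed, batch size) are illustrative and the API corresponds to the 0.9.x/1.0.x line of the library:

import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.conf.layers.SubsamplingLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class MnistLeNet {
    public static void main(String[] args) throws Exception {
        // 28x28 grayscale MNIST digits, mini-batches of 64
        DataSetIterator train = new MnistDataSetIterator(64, true, 123);

        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(123)
                .list()
                .layer(0, new ConvolutionLayer.Builder(5, 5).nOut(20)
                        .activation(Activation.RELU).build())
                .layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2).stride(2, 2).build())
                .layer(2, new ConvolutionLayer.Builder(5, 5).nOut(50)
                        .activation(Activation.RELU).build())
                .layer(3, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2).stride(2, 2).build())
                .layer(4, new DenseLayer.Builder().nOut(500)
                        .activation(Activation.RELU).build())
                .layer(5, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nOut(10).activation(Activation.SOFTMAX).build())
                .setInputType(InputType.convolutionalFlat(28, 28, 1)) // infers nIn per layer
                .build();

        MultiLayerNetwork model = new MultiLayerNetwork(conf);
        model.init();
        model.fit(train); // one pass over the training data
    }
}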
Pattern Recognition
Handwritten Digits
1998 Gradient-Based Learning Applied to Document Recognition, Y. LeCun
Unsupervised Learning
Natural Language Processing
Data
• Google News text training dataset
• Texts with a total of 3’000’000’000 words
• Lexicon: 3’000’000 words/phrases
Model
• Word2Vec Skip-gram
• Mapping: Word → 300-dimensional number space
• Many useful properties (word clustering, syntax, semantics)
Deeplearning4j
• (Train), load, and use the Google News word2vec model (sketch below)
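Loading and querying the pre-trained Google News model with Deeplearning4j might look like the sketch below; the file name is an assumption (the published archive is commonly distributed as GoogleNews-vectors-negative300.bin.gz):

import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
import org.deeplearning4j.models.word2vec.Word2Vec;

import java.io.File;
import java.util.Arrays;
import java.util.Collection;

public class Word2VecDemo {
    public static void main(String[] args) {
        // Load the pre-trained 300-dimensional Google News vectors (file name is an assumption)
        Word2Vec vec = WordVectorSerializer.readWord2VecModel(
                new File("GoogleNews-vectors-negative300.bin.gz"));

        // Cosine similarity between two word vectors
        System.out.println(vec.similarity("day", "night"));

        // Classic analogy: king - man + woman ≈ queen
        Collection<String> nearest = vec.wordsNearest(
                Arrays.asList("king", "woman"), Arrays.asList("man"), 5);
        System.out.println(nearest);
    }
}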
Recent Advances
Games: Backgammon 1979, chess 1997, Jeopardy! 2011, Atari games 2014, Go 2016, Poker (Texas Hold’em) 2017
Visual: CAPTCHAs 2005, face recognition 2007, traffic sign reading 2011, ImageNet 2015, lip-reading 2016
Other: Age estimation from pictures 2013, personality judgement from Facebook «likes» 2014, conversational speech recognition 2016
ML performance >= Human Levels (2017)
https://finnaarupnielsen.wordpress.com/2015/03/15/status-on-human-vs-machines/
2014, Stanford
http://cs.stanford.edu/people/karpathy/deepimagesent/devisagen.pdf
https://gigaom.com/2014/11/18/google-stanford-build-hybrid-neural-networks-that-can-explain-photos/
2016, UMich + Max Planck
https://arxiv.org/pdf/1605.05396.pdf
2017, Cornell + Adobe
https://arxiv.org/abs/1703.07511
2016, Google
https://research.googleblog.com/2016/09/a-neural-network-for-machine.html
2016, Erlangen, Max Planck, Stanford
http://www.graphics.stanford.edu/~niessner/papers/2016/1facetoface/thies2016face.pdf
https://www.youtube.com/watch?v=ttGUiwfTYvg
[Timeline: Theano 2011, Deeplearning4j 2014, TensorFlow 2015, Keras 2016, PyTorch 2017]
ML Libraries
Food for Thought + Next Steps
?
Positive Outcomes
Statement by Lee Sedol
Socializing
• Go to talks, conferences
• Visit meetups (Zurich Machine Learning and Data Science, …)
Increase Context
• Blogs, Twitter, arxiv.org, …
Doing
• GitHub (deeplearning4j/deeplearning4j, BSI-Business-Systems-Integration-AG/anagnostes, …)
• Learn Python ;-)
Like to learn more?
Thanks!
@ZimMatthias

Machine Learning: A Gentle Introduction

Editor's Notes

  • #3 You have probably followed the story of the Korean world champion in Go losing against the AlphaGo system built by the team at DeepMind. So, how is playing Go different from playing chess from a systems perspective? This is nicely explained in the following video clip. The next clip has Demis Hassabis from DeepMind talking about the difference between playing Go and chess from a human perspective. Video 1: What is Go and how does it compare to chess? Video 2: As the complexity of Go is so much higher than that of chess, intuition becomes even more important. Video 3: The last snippet is a high-level description of the training of AlphaGo.
  • #4 But of course, as in the case of the chess-playing system, there is also a hardware story behind AlphaGo’s win. --- AlphaGo was powered by TPUs in the matches against Go world champion Lee Sedol, enabling it to "think" much faster and look farther ahead between moves. https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html?utm_content=buffer73148&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
  • #5 The Tensor Processing Units (TPU) are custom-built ASIC boards that boost machine learning applications by parallelizing large amounts of low-precision matrix computations. This amounts to roughly 3 generations of Moore’s law, making processing power available today that we would otherwise expect only 6 years in the future… https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html?utm_content=buffer73148&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
  • #6 Deep reinforcement learning was used for AlphaGo. A milestone article about Atari games was submitted to Nature just 2 years ago. DQN and the Atari paper: https://deepmind.com/research/dqn/ http://www.nature.com/nature/journal/v518/n7540/abs/nature14236.html Deep reinforcement learning: https://www.nervanasys.com/demystifying-deep-reinforcement-learning/ http://rll.berkeley.edu/deeprlcourse/ TensorFlow: https://www.tensorflow.org/
  • #7 https://www.nervanasys.com/demystifying-deep-reinforcement-learning/
  • #8 https://www.nervanasys.com/demystifying-deep-reinforcement-learning/
  • #9 https://github.com/tambetm/simple_dqn Agent implementation https://github.com/tambetm/simple_dqn/blob/master/src/agent.py DQN implementation https://github.com/tambetm/simple_dqn/blob/master/src/deepqnetwork.py https://www.nervanasys.com/openai/ https://github.com/NervanaSystems/neon
  • #10 In the following we will: get some stories about where all this came from, talk about the major concepts, do some small demos, and have a look at current work that provides a glimpse of the things to come.
  • #12 The most boring and time-consuming task of most machine learning projects (often most of the time of a new machine learning project is spent getting the right data, and enough of it). The most underestimated task. But: good data sets are valuable over many decades.
  • #13 movie reviews: sentiment polarity http://www.cs.cornell.edu/People/pabo/movie-review-data/ Handwritten digits: mnist http://yann.lecun.com/exdb/mnist/ breast cancer: wdbc https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic) imagenet competition http://image-net.org/challenges/LSVRC/2012/ http://image-net.org/challenges/LSVRC/2012/browse-synsets http://image-net.org/synset?wnid=n12864160 (rosemary) many more data sets http://deeplearning.net/datasets/
  • #15 Massive revival with deep learning: classical neural networks, many layers, some tricks, and a rebranding from neural nets to deep learning.
  • #17 Underfitting: more training needed. Overfitting: overtrained.
  • #19 http://cs229.stanford.edu/schedule.html
  • #20 In the following we will: get some stories about where all this came from, talk about the major concepts, do some small demos, and have a look at current work that provides a glimpse of the things to come.
  • #22 Convolutional NN explained: http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/?utm_content=bufferfb698&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer http://yann.lecun.com/exdb/mnist/ https://deeplearning4j.org/
  • #23 Online demo in the browser: https://transcranial.github.io/keras-js/#/mnist-cnn
  • #26 Word2vec model https://github.com/mmihaltz/word2vec-GoogleNews-vectors 3 billion words, 3 million word vocabulary (incl. phrases such as «boston globe») http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/ 100 billion words from a Google News dataset. Original paper (Mikolov et al., 2013a), «Efficient Estimation of Word Representations in Vector Space», vector dimensionality 300 and context size 5: https://arxiv.org/pdf/1301.3781.pdf [6 billion tokens, 1 million words in vocabulary] 2nd paper (Mikolov et al., 2013b): https://arxiv.org/pdf/1310.4546.pdf []
  • #30 https://finnaarupnielsen.wordpress.com/2015/03/15/status-on-human-vs-machines/ https://www.researchgate.net/publication/220271810_Computers_beat_Humans_at_Single_Character_Recognition_in_Reading_based_Human_Interaction_Proofs_HIPs
  • #31 Also in 2014, researchers started to combine pattern recognition and natural language processing, both based on deep learning. Stanford (2014): http://cs.stanford.edu/people/karpathy/deepimagesent/devisagen.pdf Google + Stanford (2014): https://gigaom.com/2014/11/18/google-stanford-build-hybrid-neural-networks-that-can-explain-photos/
  • #32 https://arxiv.org/pdf/1605.05396.pdf
  • #33 https://arxiv.org/abs/1703.07511
  • #34 https://research.googleblog.com/2016/09/a-neural-network-for-machine.html paper https://arxiv.org/abs/1609.08144 Newer version (multi-lingual) https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html
  • #35 face2face http://www.graphics.stanford.edu/~niessner/thies2016face.html video https://www.youtube.com/watch?v=ttGUiwfTYvg paper http://www.graphics.stanford.edu/~niessner/papers/2016/1facetoface/thies2016face.pdf
  • #36 Theano (python) http://deeplearning.net/software/theano/ (2011) Tensorflow (python) https://www.tensorflow.org/ (2015) Keras (Python) https://keras.io/ (2016) Pytorch (python) http://pytorch.org/ (2017) Deeplearning4j (Java) https://deeplearning4j.org/ (2014)
  • #38 Financial Times: https://www.ft.com/content/063c1176-d29a-11e5-969e-9d801cf5e15b Medium: https://medium.com/basic-income/deep-learning-is-going-to-teach-us-all-the-lesson-of-our-lives-jobs-are-for-machines-7c6442e37a49#.w1d0sk8dn
  • #39 https://www.stlouisfed.org/on-the-economy/2016/january/jobs-involving-routine-tasks-arent-growing 1. The ’90s: computers getting used everywhere, first industrial robots (? need to verify!) 2. 2008: financial crisis 3. 2017+: the future …
  • #41 arXiv https://arxiv.org/list/stat.ML/current Twitter @StanfordNLP, @OpenAI, @AndrewYNg, @goodfellow_ian, @karpathy, @hugo_larochelle, @ylecun, @dennybritz, @StatMLPapers