Tensorflow for Janitors

Deep learning: show me the golden path between hype and reality. How can you use it in your everyday development, and how will it change the software service landscape? The Data Janitor provides an enthusiastic, yet down-to-earth and practical view.

Published in: Data & Analytics

  1. Tensorflow for Janitors: Craft Conference Budapest 2017, Daniel Molnar (@soobrosa), door2door GmbH
  2. Perspective: • rounded, not complete, • slow, old, stupid and lazy, and • looking for feedback, either to add or to remove.
  3. Where I'm coming from: • head of data and analytics, • senior applied and data scientist, • data analyst, • head of data, • or just data janitor.
  4. Orientation. What this talk is about: - deep learning, - what a generalist can use TensorFlow for, - what it can teach us about a good product. What this talk is not about: - extinction or salvation by AI, - a coding tutorial, - pitching a Google product.
  5. Deciphering jargon via history
  6. Adventures in CS (ca. 1999): machine learning is a function, trained supervised or unsupervised on a dataset, that fits the data well but hopefully not too well (overfitting), so that it still generalizes. The results end up in fancy theses.
  7. Technical and simplified, we: • run multivariate linear regression • with a cost/loss function to optimize (typically squared error) • using batch gradient descent.
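The three bullets above fit in a few lines of plain Python. This is a minimal sketch. It assumes a noise-free toy dataset generated from y = 3*x1 - 2*x2 + 1 and hand-picked hyperparameters (learning rate 0.05, 2000 epochs). All names are hypothetical, and none of it is TensorFlow API.

```python
# Batch gradient descent for multivariate linear regression with a mean squared error loss.

def predict(weights, bias, x):
    """Linear model: weighted sum of the features plus a bias term."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def mse(weights, bias, xs, ys):
    """Mean squared error over the whole dataset (the cost to minimize)."""
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def batch_gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    n, n_features = len(xs), len(xs[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        # "Batch": average the gradient over every example before each update.
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            error = predict(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += 2 * error * x[j] / n
            grad_b += 2 * error / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


# Hypothetical toy data from y = 3*x1 - 2*x2 + 1.
xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
ys = [3 * a - 2 * b + 1 for a, b in xs]
weights, bias = batch_gradient_descent(xs, ys)
print(weights, bias, mse(weights, bias, xs, ys))
```

The recovered weights approach [3, -2] and the bias approaches 1. The stochastic variant that appears with backpropagation updates after each example instead of after the whole batch.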
  8. A neuron
  9. Neural networks
  10. Perceptron (1958): • random start weights, • the unit fires when the weighted sum of its inputs exceeds a threshold.
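Here is a sketch of a perceptron trained with the classic perceptron learning rule. It assumes a step activation, with the threshold folded into a bias term, and a fixed random seed. The AND and XOR truth tables are illustrative data, and the function names are hypothetical.

```python
import random


def step(z):
    """Threshold activation: output 1 when the weighted sum exceeds 0, else 0."""
    return 1 if z > 0 else 0


def perceptron_predict(weights, bias, x):
    # The bias plays the role of a (negated) threshold.
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, epochs=1000, learning_rate=0.1, seed=0):
    rng = random.Random(seed)
    n_features = len(samples[0][0])
    weights = [rng.uniform(-1, 1) for _ in range(n_features)]  # random start weights
    bias = rng.uniform(-1, 1)
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - perceptron_predict(weights, bias, x)
            if error != 0:
                mistakes += 1
                # Perceptron learning rule: nudge the weights so this example lands on the correct side.
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if mistakes == 0:  # a full pass with no errors: converged
            break
    return weights, bias


def accuracy(weights, bias, samples):
    return sum(perceptron_predict(weights, bias, x) == t for x, t in samples) / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w_and, b_and = train_perceptron(AND)
w_xor, b_xor = train_perceptron(XOR)
print(accuracy(w_and, b_and, AND))  # 1.0
print(accuracy(w_xor, b_xor, XOR))  # below 1.0
```

AND is linearly separable, so training stops with every example correct. XOR is not: no straight line separates its classes. The loop therefore runs out of epochs with at least one example still wrong, which is the XOR failure of single-layer networks.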
  11. Hidden layers before the AI winter ('70s), mostly Minsky's fault: - failure on non-linear problems (a single-layer perceptron cannot learn XOR), - no practical way yet to train hidden layers (backpropagation).
  12. Backpropagation ('70s-'80s): • the activation function is differentiable, • its derivative tells how to adjust each weight to reduce the error, • the chain rule assigns blame to prior layers, • optimize with stochastic gradient descent.
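The four bullets map onto a tiny hand-written network. This is a minimal sketch. It assumes a 2-2-1 network with sigmoid activations, a squared-error loss of 0.5 * (y - t)^2, XOR as toy data, and fixed seeds; the helper names are hypothetical. The code checks one chain-rule gradient against a finite-difference estimate, then trains with plain stochastic gradient descent.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(seed=0):
    """Random start weights for 2 inputs, 2 hidden units and 1 output."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)],  # [hidden][input]
        "b1": [0.0, 0.0],
        "W2": [rng.uniform(-1, 1) for _ in range(2)],
        "b2": 0.0,
    }


def forward(p, x):
    h = [sigmoid(sum(p["W1"][i][j] * x[j] for j in range(2)) + p["b1"][i]) for i in range(2)]
    y = sigmoid(sum(p["W2"][i] * h[i] for i in range(2)) + p["b2"])
    return h, y


def loss(p, x, target):
    return 0.5 * (forward(p, x)[1] - target) ** 2


def backprop(p, x, target):
    """Gradient of the loss for one example, layer by layer via the chain rule."""
    h, y = forward(p, x)
    # dL/dz_out = dL/dy * dy/dz_out, using sigmoid'(z) = y * (1 - y).
    delta_out = (y - target) * y * (1 - y)
    # Blame flows back to each hidden unit through its outgoing weight.
    delta_hidden = [delta_out * p["W2"][i] * h[i] * (1 - h[i]) for i in range(2)]
    return {
        "W1": [[delta_hidden[i] * x[j] for j in range(2)] for i in range(2)],
        "b1": delta_hidden,
        "W2": [delta_out * h[i] for i in range(2)],
        "b2": delta_out,
    }


def sgd_step(p, g, lr):
    p["W1"] = [[w - lr * gw for w, gw in zip(row, grow)] for row, grow in zip(p["W1"], g["W1"])]
    p["b1"] = [b - lr * gb for b, gb in zip(p["b1"], g["b1"])]
    p["W2"] = [w - lr * gw for w, gw in zip(p["W2"], g["W2"])]
    p["b2"] -= lr * g["b2"]


def mean_loss(p, data):
    return sum(loss(p, x, t) for x, t in data) / len(data)


def numerical_grad_w1(p, x, target, i, j, eps=1e-6):
    """Central finite-difference estimate of dL/dW1[i][j], used only to check backprop."""
    original = p["W1"][i][j]
    p["W1"][i][j] = original + eps
    plus = loss(p, x, target)
    p["W1"][i][j] = original - eps
    minus = loss(p, x, target)
    p["W1"][i][j] = original
    return (plus - minus) / (2 * eps)


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
params = init_params()
analytic = backprop(params, (1, 1), 0)["W1"][0][1]
numeric = numerical_grad_w1(params, (1, 1), 0, 0, 1)

before = mean_loss(params, XOR)
rng = random.Random(1)
data = list(XOR)
for epoch in range(5000):
    rng.shuffle(data)  # stochastic: one example at a time, in random order
    for x, t in data:
        sgd_step(params, backprop(params, x, t), lr=0.5)
after = mean_loss(params, XOR)
print(analytic, numeric, before, after)
```

With only two hidden units, XOR training can stall on a plateau depending on the random start. The sketch therefore only shows that the loss goes down, not that XOR is always solved.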
  13. Deep Learning
  14. What is it good for? • supervised: near-human-level accuracy in image classification, voice recognition and natural language processing; • unsupervised: using large volumes of unstructured data to learn hierarchical models that capture the complex structure in the data, then using these models to predict properties of previously unseen data.
  16. So is this supercharged ML? Kind of, yes: • large-scale neural networks with many layers, • weights can be n-dimensional arrays (tensors), • a high-level way of defining the prediction code (the forward pass), • the framework figures out the derivatives (the backward pass).
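A toy reverse-mode automatic differentiation sketch on scalars makes "the framework figures out the derivatives" concrete. TensorFlow does this on graphs of tensors. The Value class below is a hypothetical illustration, not TensorFlow's API. You write only the forward pass; backward() walks the recorded graph in reverse and applies the chain rule.

```python
import math


class Value:
    """A scalar that remembers how it was computed, so gradients can flow backwards."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # the Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    __radd__ = __add__
    __rmul__ = __mul__

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        """Backward pass: visit nodes from the output back to the inputs, applying the chain rule."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


# Forward pass of a single tanh neuron: the only code you write.
x1, x2 = Value(2.0), Value(-1.0)
w1, w2, b = Value(0.5), Value(0.25), Value(0.1)
y = (w1 * x1 + w2 * x2 + b).tanh()
# Backward pass: the derivatives are worked out for you.
y.backward()
print(y.data, w1.grad, w2.grad, b.grad)
```

Here dy/dw1 = (1 - y^2) * x1. The sketch arrives at that value without anyone writing the derivative by hand.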
  17. Who made it work? Blame Canada! According to Geoffrey Hinton, in the past: - our labeled datasets were thousands of times too small, - our computers were millions of times too slow, - we initialized the weights in a stupid way, - we used the wrong type of non-linearity.
  18. Datasets
  19. Speed
  20. GPUs to scale: training is mostly linear (matrix) algebra, which is highly parallelizable.
  21. Weights
  22. Training jargon: • regularization to avoid overfitting (dataset augmentation, early stopping, dropout layers, L1 and L2 weight penalties), • proper learning rate decay (a rate that is too high or too low can both be bad, so decay it over the course of training), • batch normalization (faster learning and higher overall accuracy).
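Four of the knobs above can be sketched in plain Python:

- an L2 weight penalty;
- exponential learning-rate decay, using the formula TensorFlow 1.x documents for tf.train.exponential_decay without staircase;
- training-mode batch normalization of a single feature;
- a simple patience-based early-stopping check.

The function names and defaults are hypothetical, and this is not library code.

```python
import math


def l2_penalty(weights, lam):
    """L2 weight penalty, added to the loss to discourage large weights."""
    return lam * sum(w * w for w in weights)


def exponential_decay(initial_lr, step, decay_rate, decay_steps):
    """Learning rate that shrinks smoothly: initial_lr * decay_rate ** (step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch norm for one feature: normalize over the mini-batch, then scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]


def should_stop_early(val_losses, patience=3):
    """Stop when the last `patience` validation losses never beat the best one before them."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before


normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
print(normalized)
print(exponential_decay(0.1, step=1000, decay_rate=0.96, decay_steps=1000))  # about 0.096
print(should_stop_early([1.0, 0.8, 0.7, 0.71, 0.72, 0.75]))  # True
```

At inference time, batch normalization uses running averages of the mean and variance collected during training, not the statistics of the current batch. The sketch covers only the training-time computation.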
  23. Non-linearity
  24. Activation functions: • sigmoid: 1/(1+e^(-x)) • tanh: 2/(1+e^(-2x)) - 1 • ReLU (rectified linear unit): max(0, x) • softplus: ln(1+e^x)
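The formulas above translate directly into Python. This sketch uses only the math module and is not hardened against overflow for extreme inputs. For example, math.exp(-x) in sigmoid raises OverflowError for very negative x.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    # Equivalent to math.tanh; written out to match the formula above.
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


def relu(x):
    return max(0.0, x)


def softplus(x):
    # ln(1 + e^x); log1p stays accurate when e^x is small.
    return math.log1p(math.exp(x))


for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), tanh(x), relu(x), softplus(x))
```

Note that tanh is a rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1. That is why the two curves share the same S shape.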
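  25. ReLU for president. ReLU • is sparse and gives more robust representations, • often gives the best performance in practice, • mitigates the vanishing gradient problem, • is not differentiable at exactly 0 (frameworks use a subgradient there), while its smooth approximation, softplus, is differentiable everywhere.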
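The vanishing-gradient point can be checked with a little arithmetic. Backpropagation multiplies one activation derivative per layer, and the sigmoid's derivative never exceeds 0.25. This sketch ignores the weights, which is a simplifying assumption, and compares ten sigmoid layers in their best case with ten active ReLUs. It also checks numerically that the derivative of softplus, ln(1 + e^x), is the sigmoid.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25, reached at x == 0


def relu_grad(x):
    # ReLU has no derivative at exactly 0; using 0 there is a common convention, assumed here.
    return 1.0 if x > 0 else 0.0


def softplus(x):
    return math.log1p(math.exp(x))


def numerical_derivative(f, x, eps=1e-6):
    return (f(x + eps) - f(x - eps)) / (2 * eps)


depth = 10
# Product of per-layer activation derivatives (weights ignored in this sketch).
sigmoid_chain = sigmoid_grad(0.0) ** depth  # 0.25 ** 10, about 9.5e-07
relu_chain = relu_grad(1.0) ** depth        # 1.0: active ReLUs pass the gradient through unchanged
print(sigmoid_chain, relu_chain)
# d/dx softplus(x) equals sigmoid(x).
print(numerical_derivative(softplus, 0.3), sigmoid(0.3))
```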
  26. Basic architectures
  27. Convolutional (CNN): • traditional computer vision relied on hand-crafted features, • mimics visual perception, • convolution extracts features (LeNet-5, Yann LeCun, 1998), • with lots of matrix multiplication, • subsampling/pooling to reduce size and avoid overfitting.
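Convolution and pooling are simple loops at heart. This plain-Python sketch assumes a single-channel image, no padding, stride 1, and non-overlapping 2x2 max pooling. It uses a hand-picked 2x2 vertical-edge kernel; in a CNN, the kernel values are learned.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution, stride 1, no padding. Like deep learning frameworks,
    this computes cross-correlation (the kernel is not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the strongest response in each size x size window."""
    return [
        [
            max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# A 5x5 image: dark on the left, bright on the right, so there is one vertical edge.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked vertical-edge detector; in a CNN these values would be learned.
kernel = [[-1, 1], [-1, 1]]

features = conv2d(image, kernel)  # 4x4; every row is [0, 2, 0, 0]
pooled = max_pool(features)       # 2x2: [[2, 0], [2, 0]]
print(features)
print(pooled)
```

Each output cell is an independent sum of products, which is what makes this work easy to spread across GPU cores.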
  28. Recurrent (RNN): • stateful, • TDNN: time delay neural networks, • LSTM: long short-term memory, • supervised.
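"Stateful" means the output at each step depends on everything seen so far. This sketch shows a simple Elman-style recurrent step. For readability, it assumes scalar weights chosen by hand. Real RNNs use weight matrices, and LSTMs add gates around this state; the function names are hypothetical.

```python
import math


def rnn_step(h_prev, x, w_h, w_x, b):
    """New state = tanh(w_h * old_state + w_x * input + b)."""
    return math.tanh(w_h * h_prev + w_x * x + b)


def run_rnn(sequence, w_h=0.5, w_x=1.0, b=0.0):
    h = 0.0  # initial state
    states = []
    for x in sequence:
        h = rnn_step(h, x, w_h, w_x, b)  # the state is carried into the next step
        states.append(h)
    return states


# Same final input (0), different history: the last state differs.
print(run_rnn([0.0, 0.0, 0.0])[-1])
print(run_rnn([1.0, 0.0, 0.0])[-1])
```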
  29. Autoencoder: • reinforcement learning (deliver action on context), • DBN: Deep Belief Networks (directed), • DBM: Deep Boltzmann Machines (undirected), • unsupervised.
  30. TensorFlow
  31. Recent major contestants: • 2002 Torch (Lua): industrial, multiple GPUs, acyclic computation graphs • 2010 Theano (Python): academic, with the high-level, lightweight Keras on top • 2011 DistBelief (Google) • 2013 Caffe (C++, with a Python interface): academic, boilerplate-heavy • 2015 TensorFlow (Python) • 2016 CNTK (C++)
  32. TF is 18 months old: • platforms: DSP, CPU (ARM, Intel), (multiple) GPU(s), TPU, • Linux, OS X, Windows, Android, iOS, Raspberry Pi, • language bindings for Python, Go, Rust, Java and Haskell, • performance improvements.
  33. Liaison with Python: • API stability, • resembles NumPy more closely, • pip packages are now PyPI-compliant, • the high-level API includes a new *.keras module (almost halving the boilerplate), • Sonnet, a new high-level API from DeepMind.
  34. TF has been open source for 18 months: • the most popular machine learning project on GitHub in 2016, • 16,644 commits by 789 people, • 10,031 related repos.
  35. Tooling: • TensorBoard to visualize network topology and performance, • Embedding Projector for high-level model understanding via visualization, • XLA, a domain-specific compiler for TF graphs (CPUs and GPUs), • Fold for dynamic batching, • TensorFlow Serving to serve TF models in production.
  36. TF product choices, Tesla not Ford: • the right language, • multiple GPUs for training efficiency, • great compile times (no to config), • a high-level API, • enabling the community, • tooling.
  37. Open-source models: dozens of pretrained models, such as • Inception (CNN), • the SyntaxNet parser (LSTM), • Parsey McParseface for English (LSTM), • Parsey's Cousins for 40 additional languages (LSTM).
  38. Examples: • CNN (perception, image recognition): recycling and cucumber sorting with a Raspberry Pi, preventing skin cancer and blindness in diabetics • LSTM (translation, speech recognition): language translation • RNN (generation, time series analysis): text, image and doodle generation in a style or from text • Reinforcement learning (control and play, autonomous driving): OpenAI Lab
  39. A good product: • don't lead the pack, • well stolen is half done, • end-to-end, • ecosystem (tooling), • eat your own dog food.
  40. Distributed deep learning. Past: centralized; shovel everything into the same pit, do magic, command and control. Future: pretrain models centrally, distribute the models, retrain locally, merge and manage the models (a SqueezeNet model can be about 500 KB). Gains: - efficiency, - no big data pipes, - privacy.
  41. Federated Learning (announced three weeks before this talk): phones collaboratively learn a shared prediction model. • The device downloads the current model, • improves it by learning from local data (retraining), • summarizes the changes to the model as a small, focused update; • only the update, never the data, is sent to the cloud, encrypted, • where it is averaged with other users' updates to improve the shared model.
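The five steps can be simulated in a few lines. This sketch uses one common aggregation scheme, federated averaging weighted by each device's number of examples. The one-parameter model y = w * x, the device data, and the hyperparameters are hypothetical. The encryption and update compression mentioned above are left out.

```python
def local_update(global_w, local_data, lr=0.1, epochs=5):
    """On one device: retrain y = w * x on local data and return only the change in w."""
    w = global_w
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in local_data) / len(local_data)
        w -= lr * grad
    return w - global_w  # the small update that leaves the device; the data never does


def federated_round(global_w, devices):
    """In the cloud: average the updates, weighting each device by its number of examples."""
    total = sum(len(data) for data in devices)
    avg_update = sum(local_update(global_w, data) * len(data) for data in devices) / total
    return global_w + avg_update


# Two hypothetical phones whose private data both follow y = 3 * x.
devices = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (3.0, 9.0)],
]
w = 0.0
for round_number in range(20):
    w = federated_round(w, devices)
print(w)  # close to 3.0
```

The shared weight converges to the value both devices agree on, even though the server only ever sees weight changes.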
  42. Subject matter experts who are deep learning novices: • Do you really need it? • Prepare the data (with small data, use transfer learning and domain adaptation; cover the problem space; balance the classes; lower the dimensionality). • Find an analogy (CNN, RNN/LSTM/GRU, RL). • Create a simple, small and easy baseline model; visualize and debug it. • Fine-tune (evaluation metrics on the test data, loss function on the training data). (Smith: Best Practices for Applying Deep Learning to Novel ..., 2017)
  43. Training: • hosted (GCMLE, Rescale, FloydHub, Indico, Bitfusion Boost), • rented GPU (AWS: TFAMI, AWS Deep Learning AMI for Ubuntu), • local (OSX :sigh:), • own GPU.
  44. Near future? • bots go bust, • deep learning goes commodity, • AI is cleantech 2.0 for VCs, • MLaaS dies a second death, • full-stack vertical AI startups actually work. (Cross: Five AI Startup Predictions for 2017)
  45. Major sources and further reading: • Andrey Kurenkov: A 'Brief' History of Neural Nets and Deep Learning, Parts 1-4 • Adam Geitgey: Machine Learning is Fun! • TensorFlow and Deep Learning – Without a PhD (1-hour and 3-hour versions) • Pete Warden: TensorFlow for Poets
  46. Thank you! @soobrosa
