Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

The Flow of TensorFlow

1,819 views

Published on

이 발표에서는 TensorFlow의 지난 1년을 간단하게 돌아보고, TensorFlow의 차기 로드맵에 따라 개발 및 도입될 예정인 여러 기능들을 소개합니다. 또한 2017년 및 2018년의 머신러닝 프레임워크 개발 트렌드와 방향에 대한 이야기도 함께 합니다.

In this talk, I look back the TensorFlow development over the past year. Then discusses the overall development direction of machine learning frameworks, with an introduction to features that will be added to TensorFlow later on.

Published in: Technology

The Flow of TensorFlow

  1. 1. The Flow of TensorFlow Jeongkyu Shin Lablup Inc. 2017. 11. 12 / GDG DevFest Nanjing 2017 2017. 11. 19 / GDG DevFest Seoul 2017
  2. 2. Descript.ion § CEO / Co-founder, Lablup Inc. § Develops Backend.AI § Open-source devotee § Google Developer Experts (Machine Learning) § Principal Researcher, KOSSLab., Korea § Textcube open-source project maintainer (10th anniversary!) § Physicist / Neuroscientist § Adj. professor (Dept. of CSE, Hanyang Univ.) § Ph.D in Statistical Physics (complex systems / computational neuroscience) Jeongkyu Shin / @inureyes
  3. 3. Machine Learning Era: All came from dust § Machine learning § ”Field of study that gives computers the ability to learn without being explicitly programmed” Arthur Samuel (1959) § "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” Tom Michel (1999) § Type of Machine Learning § Supervised learning § Unsupervised learning § Reinforcement learning § Recommender system
  4. 4. Artificial Intelligence § Definition § Allan Turing, ‘The Imitation Game” (1950) => Turing test § John McCarthy, Dartmouth Artificial Intelligence Conference (1956) § Information Processing Language (1955) § From axiom to theory § Heuristics to reduce probing space § Born of LISP programming language § First approach : IF-THEN rule § Probe every possible cases and choose the pathway with highest fitness
  5. 5. Artificial Neural Network: Basics § Effect of layers A. K. Jain, J. Mao, K. M. Mohiuddin (1996) Artificial Neural Networks: A Tutorial IEEE Computer 29
  6. 6. Winter was coming § First winter (1970s) § Complex problems: too difficult to construct logic models (by hand) § Second winter (1990s) § Overfitting problem → pre-training, supervised backpropagation → dropout (2013) § Convergence → vanishing gradient problem (1991) § Divergence problem → weight decay / sparsity regularization § Tedious training speed → IT evolution, mini-batch § And the spring: Environmental changes open the gate § Rise of big-data § Phenomenal computation cost reduction
  7. 7. Deep Learning: flower of the golden era § What if you have enough money to do (formally) crazy experiments? Like § Increase the number of hidden layers § Pour unlimited number of data § Breakthrough of deep learning § Geoffrey Hinton (2005) § Andrew Ng (2012) § Convolution Neural Network § Pooling layer + weight § Recurrent Neural Network § Feedforward routine with (long/short) term memory § Deep disbelief Network § Multipartite neural network with generative model § Deep Q-Network § Using deep learning for reinforcement learning
  8. 8. AlphaGo as a mixture of Machine Learning techniques § Reducing search space § Breadth reduction § And depth reduction § Prediction § 13 layer convolutional NN § Value network § Policy network § Principal variation
  9. 9. Flow of TensorFlow Still less than two years passed.
  10. 10. TensorFlow § Open-source software library for machine learning across a range of tasks § Developed by Google (Dec. 2015~) § Characteristics § Python API (like Theano) § From 1.0, TensorFlow expands native API binding with Java C, etc. § Supports § Linux, macOS § NVidia GPUs (pascal and above)
  11. 11. Before TensorFlow § User-friendly Deep-learning toolkits § Caffe (2012) § Generalized programming method to researchers § Provides common NN blocks § Configuration file + training kernel program § Theano (2013~2017) § User code / configuration part is written in Python § Keras (2015~) § Meta-framework for Deep Learning programming § Supports various backends: § Theano (default) / TensorFlow (2016~) / MXNet (2017~) / CNTK (WIP) § ETC § Paddle, Chainer, DL4J…
  12. 12. TensorFlow: Summary § Statistics § More than 24000 commits since Dec. 2015 § More than 1140 committers § More than 24000 forks for last 12 months § Dominates Bootstrap! (15000) § More than 6400 TensorFlow-related repository created on GitHub § Current § Complete ML model prototyping § Distributed training § CPU / GPU / TPU / Mobile support § TensorFlow Serving § Enables easier inference / model serving § XLA compiler (1.0~) § Support various environments / speedups § Keras API Support (1.2~) § High-level programming API § Keras-compatible API § Eager Execution (1.4~) § Interactive mode of TensorFlow § Treat TensorFlow python code as real python code https://www.infoworld.com/article/3233283/javascript/at-github-javascript-rules-in-usage-tensorflow-leads-in-forks.html
  13. 13. TensorFlow: Summary § TensorFlow Serving § Enables easier inference / model serving § XLA compiler (1.0~) § Support various environments / speedups § Keras API Support (1.2~) § High-level programming API § Keras-compatible API § Eager Execution (1.4~) § Interactive mode of TensorFlow § Treat TensorFlow python code as real python code 2016 2017 ⏤ TensorFlow Serving ⏤ Keras API ⏤ Eager Execution ⏤ TensorFlow Lite ⏤ XLA ⏤ OpenAL w/ OpenCompute ⏤ Distributed TensorFlow ⏤ Multi GPU support ⏤ Mobile TensorFlow ⏤ TensorFlow Datasets ⏤ SKLearn (contrib) ⏤ TensorFlow Slim ⏤ SyntaxNet ⏤ DRAGNN ⏤ TFLearn (contrib) ⏤ TensorFlow TimeSeries
  14. 14. How TensorFlow works § CPU § Multiprocessor § AVX-based acceleration § GPU part in chip § OpenMP § GPU § CUDA (NVidia) ➜ cuDNN § OpenCL (AMD) ➜ ComputeCPP / ROCm § TPU (1st, 2nd gen.) § ASIC for accelerating matrix calculation § In-house development by Google https://www.tensorflow.org/get_started/graph_viz
  15. 15. How TensorFlow works § Python but not Python § Python API is default API for TensorFlow § However, TF core is written in C++, with cuDNN library (for GPU acceleration) § Computation Graph § User TF code is not a code § it is a configuration to generate computation graph § Session § Creates a computation graph and run the training using C++ core § Tedious debug process
  16. 16. Google I/O 2017 / TensorFlow Frontiers How TensorFlow works
  17. 17. TensorFlow Features § Recent TensorFlow core features § TensorFlow Estimators § Included in 1.4 (Oct. 2017) / high-level API for using, modeling well-known estimators § TensorFlow Serving (independent project) § TensorFlow Keras-compatible API (Sep. 2017) § Included in 1.3 (Sep. 2017) § TensorFlow Datasets § Included in 1.4 (Oct. 2017) § Upcoming/testing TensorFlow core features § TensorFlow eager execution § Introduced in 1.4 (Oct. 2017) § TensorFlow Lite § (Work-in-progress)
  18. 18. XLA: linear algebra compiler for TensorFlow Google I/O 2017 / TensorFlow Frontiers
  19. 19. TensorFlow Serving § Serving system for inference service § Components § Servables § Loaders § Managers § Features § Model building § Model versioning § Model saving / loading § Online inference support with RPC
  20. 20. Keras-compatible API for TensorFlow § Keras ( https://keras.io ) § High-level API § Focus on user experience § “Deep learning accessible to everyone” § History § Announced at Feb. 2017 § Bundled as an contribution package from TF 1.2 § Official core package since 1.4 § Characteristics § “Simplified workflow for TensorFlow users, more powerful features to Keras users” § Most Keras code can be used on TensorFlow (with keras. to tf.keras.) § Can mix Keras code with TensorFlow codes
  21. 21. TensorFlow Datasets § New way to generate data pipeline § Dataset classes § TextLineDataset § TFRecordDataset § FixedLengthRecordDataset § Iterator
  22. 22. Example: Decoding and resizing image data # Reads an image from a file, decodes it into a dense tensor, and resizes it # to a fixed shape. def _parse_function(filename, label): image_string = tf.read_file(filename) image_decoded = tf.image.decode_image(image_string) image_resized = tf.image.resize_images(image_decoded, [28, 28]) return image_resized, label # A vector of filenames. filenames = tf.constant(["/var/data/image1.jpg", "/var/data/image2.jpg", ...]) # `labels[i]` is the label for the image in `filenames[i]. labels = tf.constant([0, 37, ...]) dataset = tf.data.Dataset.from_tensor_slices((filenames, labels)) dataset = dataset.map(_parse_function)
  23. 23. Eager execution § Announced at Oct. 30, 2017 § Makes TensorFlow execute operations immediately § Returns concrete values § Provides § A NumPy-like library for numerical computation § Support for GPU acceleration and automatic differentiation § A flexible platform for machine learning research and experiments § Advantages § Python debugger tools § Immediate error reporting § Easy control flow § Python data structures
  24. 24. Example: Session x = tf.placeholder(tf.float32, shape=[1, 1]) m = tf.matmul(x, x) print(m) # Tensor("MatMul:0", shape=(1, 1), dtype=float32) with tf.Session() as sess: m_out = sess.run(m, feed_dict={x: [[2.]]}) print(m_out) # [[4.]] x = [[2.]] m = tf.matmul(x, x) print(m) # tf.Tensor([[4.]], dtype=float32, shape=(1,1))
  25. 25. Example: Instant error x = tf.gather([0, 1, 2], 7) InvalidArgumentError: indices = 7 is not in [0, 3) [Op:Gather]
  26. 26. Example: removing metaprogramming x = tf.random_uniform([2, 2]) with tf.Session() as sess: for i in range(x.shape[0]): for j in range(x.shape[1]): print(sess.run(x[i, j])) x = tf.random_uniform([2, 2]) for i in range(x.shape[0]): for j in range(x.shape[1]): print(x[i, j])
  27. 27. a = tf.constant(6) while not tf.equal(a, 1): if tf.equal(a % 2, 0): a = a / 2 else: a = 3 * a + 1 print(a) Eager execution: Python Control Flow # Outputs tf.Tensor(3, dtype=int32) tf.Tensor(10, dtype=int32) tf.Tensor(5, dtype=int32) tf.Tensor(16, dtype=int32) tf.Tensor(8, dtype=int32) tf.Tensor(4, dtype=int32) tf.Tensor(2, dtype=int32) tf.Tensor(1, dtype=int32)
  28. 28. def square(x): return tf.multiply(x, x) # Or x * x grad = tfe.gradients_function(square) print(square(3.)) # tf.Tensor(9., dtype=tf.float32 print(grad(3.)) # [tf.Tensor(6., dtype=tf.float32))] Eager execution: Gradients
  29. 29. def square(x): return tf.multiply(x, x) # Or x * x grad = tfe.gradients_function(square) gradgrad = tfe.gradients_function(lambda x: grad(x)[0]) print(square(3.)) # tf.Tensor(9., dtype=tf.float32) print(grad(3.)) # [tf.Tensor(6., dtype=tf.float32)] print(gradgrad(3.)) # [tf.Tensor(2., dtype=tf.float32))] Eager execution: Gradients
  30. 30. def log1pexp(x): return tf.log(1 + tf.exp(x)) grad_log1pexp = tfe.gradients_function(log1pexp) print(grad_log1pexp(0.)) Eager execution: Custom Gradients Works fine, prints [0.5]
  31. 31. def log1pexp(x): return tf.log(1 + tf.exp(x)) grad_log1pexp = tfe.gradients_function(log1pexp) print(grad_log1pexp(100.)) Eager execution: Custom Gradients [nan] due to numeric instability
  32. 32. @tfe.custom_gradient def log1pexp(x): e = tf.exp(x) def grad(dy): return dy * (1 - 1 / (1 + e)) return tf.log(1 + e), grad grad_log1pexp = tfe.gradients_function(log1pexp) # Gradient at x = 0 works as before. print(grad_log1pexp(0.)) # [0.5] # And now gradient computation at x=100 works as well. print(grad_log1pexp(100.)) # [1.0] Eager execution: Custom Gradients
  33. 33. tf.device() for manual placement with tf.device(“/gpu:0”): x = tf.random_uniform([10, 10]) y = tf.matmul(x, x) # x and y reside in GPU memory Eager execution: Using GPUs
  34. 34. The same APIs as graph building (tf.layers, tf.train.Optimizer, tf.data etc.) model = tf.layers.Dense(units=1, use_bias=True) optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1) Eager execution: Building Models
  35. 35. model = tf.layers.Dense(units=1, use_bias=True) optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1) # Define a loss function def loss(x, y): return tf.reduce_mean(tf.square(y - model(x))) Eager execution: Building Models
  36. 36. Compute and apply gradients for (x, y) in get_next_batch(): optimizer.apply_gradients(grad_fn(x, y)) Eager execution: Training Models
  37. 37. Compute and apply gradients grad_fn = tfe.implicit_gradients(loss) for (x, y) in get_next_batch(): optimizer.apply_gradients(grad_fn(x, y)) Eager execution: Training Models
  38. 38. Comparison TensorFlow TFlearn TF Slim TF Eager Execution Keras (with TF backend) Keras (with MXNet backend) PyTorch CNTK MXNet Difficulty ■■■■ ■■■ ■■ ■■ ■■ ■■■ ■ ■■■■ ■■■■ Extensibility ■■■■ ■■■■ ■■■■ ■■ ■■ ■■ ■ ■■■■ ■■■■ Interactive mode X X X O X X O X X Multi-CPU (NUMA) O O X X O O O O O Multi-CPU (Cluster) O O O X O O X O O Multi-GPU (single node) O O O X O O ? (manual multi- batch) O O Multi-GPU (Cluster) O O O X O O X O O
  39. 39. TensorFlow Lite § TensorFlow Lite: Embedded TensorFlow § No additional environment installation required § OS level hardware acceleration § Leverages Android NN § XLA-based optimization support § Enables binding to various programming languages § Developer Preview (4 days ago) § Part of Android O-MR1 Google I/O 2017 / Android meets TensorFlow
  40. 40. TensorFlow Lite § Format § FlatBuffers instead of ProtocolBuffers § Provides converter § Models § InceptionV3 § MobileNets: vision-specific model family § API § Java § C++
  41. 41. TensorFlow Lite: Why and How § Why? Less traffic / faster response § Image / OCR, Speech <-> Text, Translation, NLP § Motion, GPS and more § ML can extract the meaning from raw data § Image recognition: Send raw image vs. send detected label § Motion detection: Send raw motion vs. send feature vector § How? Model compression § Graph freezing § Graph conversion tools § Quantization § Weight § Calculation § Memory mapping Google I/O 2017 / Android meets TensorFlow
  42. 42. Android Neural Network API § New APIs for NeuralNet § Part of Android Framework § Since next Android release § Reduce the library duplication through apps. § Supports Hardware acceleration § GPU, DSP, ISP, NeuralNet chips, etc. Google I/O 2017 / Android meets TensorFlow
  43. 43. Flow goes to: market What is flowing through the stream?
  44. 44. Market: API-based (personalized) deep learning service § Service with pre-baked models via API § Focuses on the fields that does not require real-time § e.g. Microsoft Azure Cognitive service § Pre-trained ANN + personalized data = personalized NN § Easy personalization : server-side training + =
  45. 45. Market: User-side deep learning services § Inference with trained models § Does not require heavy calculation § e.g. ARMv7 with ~512MB / 1GB RAM § Toys / light products § Smart toys for kidult (adult + kids) : Self-driving R/C car / drone § Home appliance and controllers § IoT + ML § Locality : Home (per room), Car, Office, etc. § E.g. Smart home resource management systems
  46. 46. Market: Deep Learning service for everyone § Digital assistants War § Digital assistant (with sprakers): Gateway of deep learning based services § Context extraction + inference + features § Echo (Amazon) / Google Home (Google) § Microsoft (Cortana in every MS products) / Apple (HomePod) § Korea? Also entering the war field § Naver: Wave / Friends § Kakao: Kakao mini § SK: Nugu
  47. 47. Flow goes to: tech. What is flowing through the stream?
  48. 48. Portability and extensibility § Training on § Mac / windows § GPU server § GPU / TPU on Cloud § Prediction / Inference using § Android / iOS § Raspberry Pi and TPU § Android Things Google I/O 2017 / Android meets TensorFlow
  49. 49. Open-source Machine Learning Framework § Machine Learning Framework: (almost) open-source § Google: TensorFlow (2015~) § Microsoft: CNTK (2016~) § Amazon: MxNet (2015~) § Facebook: Caffe 2 (2017~) / PyTorch (2016~) § Baidu: PaddlePaddle (2016~) § Why? § 2017 § General goal of new versions: user-friendly syntax § Rise of Keras, PyTorch leads TensorFlow Eager execution
  50. 50. Server-side machine learning § Machine learning workload characteristics § Training § Requires ultra-heavy computation resources § Need to feed big, indexed data § OR, (reinforcement learning) need pair model / training environment to give feedbacks § Serving § Requires (relatively) light resources: § Low CPU cost § Middle memory capacity (to load NeuralNet)
  51. 51. TensorFlow: Multiverse § TensorFlow AMD GPU acceleration § OpenCL with ComputeCPP (Feb. 2017) § Accelerates c++ codes (codeplay) § Khronos support / SYCL standard § Still in early stage § Only supports Linux § ROCm (AMD) based TensorFlow (Sep. 2017) § First open-source HPC/Hyperscale-class platform for GPU computing § LLVM based / HCC C++ / GCN compiler § https://github.com/ROCmSoftwarePlatform/ hiptensorflow
  52. 52. Hand-held machine learning: Why? § Issues from real-time models / apps § Autopilot § Real-time effect on photos / videos § Voice recognition § Automators § Privacy issues § Increasing privacy information § ETC § Lead the network cost reduction
  53. 53. Hand-held machine learning: How? § Apple’s approach § Keeping user privacy with Differential Privacy § Gather Anonymized user data § User-specific machine learning models: keep them in the phone § e.g. Photo face detection / voice recognition / smart keyboard § Core ML (iOS 11) § Support Machine Learning model as function (.mlmodel format) § Google’s approach § Ultra-large scale server side training using TPU (2nd gen.) § Mobile: Handles data compression and feature extraction (to reduce traffic) § On the mobile: § Android NeuralNet API (Android O) § TensorFlow Lite on Android (Android O) https://backchannel.com/an-exclusive-look-at-how-ai-and-machine-learning-work-at-apple-8dbfb131932b
  54. 54. Hand-held machine learning: How? § Train on server, Serve on smartphone § Enough to serve pre-trained models on smartphones § Both train and serve on smartphone § Keeping privacy / reduce traffic / personalization § Uses GPUs on recent smartphones § Working together § Feature extraction / compression / preprocessing ‒ Mobile side § Machine Learning model training / updating / streaming advanced models ‒ Server side
  55. 55. Hand-held machine learning: How? § TensorFlow § Supports both Android and iOS § XCode and Android Studio § XLA compiler framework since TensorFlow 1.0: § Will support diverse languages / environments § Also, optimizing for smartphones and tablets § MobileNet (Apr. 2017) § Efficient Convolutional Neural Networks for Mobile Vision Applications § TensorFlow Lite (Nov. 2017): development focus § Built-in operators for both quantized models (int (8bit) / fixed point) and floating point models (FP10, FP16) § Support for embedded GPUs / ASICs
  56. 56. Browser-side machine learning § Machine Learning without hassle § Ingredients for machine learning: Computation, Data, Algorithm § XLA: provides binary-code level optimization for various environment § Do we have cross-platform computation environment? § Java? § Browser! § Recent improvements of web browser § WebGL § Unified programming environment for many GPU-enabled machines § WebAssembly § Binary-level optimization § Shipped to every mainstream browser! (just in this week)
  57. 57. Convertible NeuralNet format § ONNX (Open Neural Network Exchange) § Microsoft / Facebook (Sep. 2017) § Caffe 2, PyTorch (by Facebook), CNTK (Microsoft) § MLMODEL (Code ML model, Machine Learning Model) § Apple (Aug. 2017) § Caffe, Keras, scikit-learn, LIBSVM (Open Source) § Provides Core ML converter / specification
  58. 58. Recap § Machine Learning / Artificial Intelligence § Flow of TensorFlow § TensorFlow Serving Project § Keras-compatible API § Datasets § Eager execution § TensorFlow Lite § Flow goes to § More user-friendly toolkits / frameworks § API-based / personalized § User-side inference / Hand-held ML § Convertible Machine Learning Model formats
  59. 59. End! Thank you for listening https://www.lablup.ai https://backend.ai https://cloud.backend.ai https://www.codeonweb.com https://github.com/lablup Lablup Inc. Backend.AI Backend.AI Cloud CodeOnWeb Service Github repository

×