
Tianqi Chen, PhD Student, University of Washington, at MLconf Seattle 2017

Tianqi holds a bachelor’s degree in Computer Science from Shanghai Jiao Tong University, where he was a member of the ACM Class, now part of Zhiyuan College at SJTU. He completed his master’s degree at Shanghai Jiao Tong University in the Apex Data and Knowledge Management Lab before joining the University of Washington as a PhD student. He has held several prestigious internships and visiting positions: at Google on the Brain Team, at GraphLab authoring the boosted tree and neural net toolkits, at Microsoft Research Asia in the Machine Learning Group, and at the Digital Enterprise Research Institute in Galway, Ireland. What really excites Tianqi is what processes and goals can be enabled when we bring advanced learning techniques and systems together. He pushes the envelope on deep learning, knowledge transfer, and lifelong learning. His PhD is supported by a Google PhD Fellowship.

Abstract summary

Build Scalable and Modular Learning Systems:
Machine learning and data-driven approaches are becoming very important in many areas. One key factor drives these successful applications: scalable learning systems that learn the model of interest from large datasets. More importantly, these systems need to be designed in a modular way, so they work with the existing ecosystem and improve users’ productivity. In this talk, I will present XGBoost and MXNet, two scalable and portable learning systems that I built. I will discuss how we can apply distributed computing, asynchronous scheduling, and hardware acceleration to improve these systems, as well as how they fit into the bigger open-source ecosystem of machine learning.


Tianqi Chen, PhD Student, University of Washington, at MLconf Seattle 2017

  1. 1. Build Scalable and Modular Learning Systems. Tianqi Chen, University of Washington, tqchen@cs.washington.edu. Joint work with contributors from
  2. 2. Machine Learning Impacts Us All: Advance Science, Improve Our Life, Improve Web Experience
  3. 3. Learning System: The Engine of Revolution. Flexible, Scalable, Modular, Lightweight
  4. 4. A Method to Solve Half of the Problems: Tree Boosting (Friedman 1999). [Figure: a toy ensemble of two decision trees, one splitting on “age < 15” and “is male?” with leaf scores +2, -1, and +0.1, the other splitting on “uses computer daily” with leaf scores +0.9 and -0.9; a prediction is the sum of leaf scores across the trees, e.g. f(x1) = 2 + 0.9 = 2.9 and f(x2) = -1 - 0.9 = -1.9.] Used by 17 out of 29 Kaggle winners last year and more; the winning solutions for all the problems on the previous slide all use it.
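The idea in code, as a minimal hedged sketch using the XGBoost Python API (the synthetic data and parameter values below are illustrative, not from the talk): each boosting round adds one tree, and a prediction is the sum of the leaf scores of all trees.

```python
import numpy as np
import xgboost as xgb

# Illustrative synthetic data (not from the talk).
X = np.random.rand(500, 4)
y = (X[:, 0] + 0.5 * X[:, 1] > 0.8).astype(int)

dtrain = xgb.DMatrix(X, label=y)
params = {"objective": "binary:logistic", "max_depth": 3, "eta": 0.1}

# Each round fits one more tree to the gradient of the loss.
bst = xgb.train(params, dtrain, num_boost_round=50)

probs = bst.predict(dtrain)                        # probabilities after the logistic link
margins = bst.predict(dtrain, output_margin=True)  # raw sum of leaf scores, i.e. f(x) above
```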
  5. 5. XGBoost is Great and is Getting Better
  6. 6. Monotonic Functions • Constrain predictions to be monotonic in certain features • Useful for interpretability and generalization
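As a hedged sketch of how this looks from the Python API (the data and parameter values are illustrative): XGBoost accepts a monotone_constraints parameter with one entry per feature, +1 for non-decreasing, -1 for non-increasing, 0 for unconstrained.

```python
import numpy as np
import xgboost as xgb

# Toy data where the target rises with feature 0 and falls with feature 1.
X = np.random.rand(1000, 2)
y = 5 * X[:, 0] - 3 * X[:, 1] + 0.1 * np.random.randn(1000)

dtrain = xgb.DMatrix(X, label=y)
params = {
    "objective": "reg:squarederror",
    "max_depth": 4,
    # Constrain predictions: non-decreasing in feature 0, non-increasing in feature 1.
    "monotone_constraints": "(1,-1)",
}
bst = xgb.train(params, dtrain, num_boost_round=100)
```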
  7. 7. Fast Histogram-based Trees • Bring recent improvements in histogram-based tree construction to XGBoost • FastBDT (Thomas Keck), LightGBM (Ke et al.) • Optimized for both categorical and continuous features. Contributed by Hyunsu Cho, University of Washington
  8. 8. GPU-based Optimization • Run each boosting iteration on the GPU • Uses fast parallel prefix sum / radix sort operations • Available now in XGBoost. Benchmark (training time, i7-6700K vs Titan X): Yahoo LTR 3738 s vs 507 s (7.37x speedup); Higgs 31352 s vs 4173 s (7.51x); Bosch 9460 s vs 1009 s (9.38x). Contributed by Rory Mitchell, Waikato University
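Both the histogram-based builder and its GPU counterpart are selected through XGBoost's tree_method parameter. The snippet below is a hedged sketch using parameter names from later XGBoost releases (the exact names have shifted across versions); the file path and settings are placeholders.

```python
import xgboost as xgb

dtrain = xgb.DMatrix("train.libsvm")   # placeholder path to a LIBSVM file

# Histogram-based tree construction on the CPU.
cpu_params = {"objective": "binary:logistic", "tree_method": "hist", "max_bin": 256}

# The same algorithm running on the GPU (requires a CUDA-enabled build).
gpu_params = {"objective": "binary:logistic", "tree_method": "gpu_hist"}

bst = xgb.train(gpu_params, dtrain, num_boost_round=100)
```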
  9. 9. Modularity: Platform-Agnostic Engine • In any language • On any platform • YARN, MPI, Flink, Spark, ... • Easily extensible to other cloud dataflow engines
  10. 10. Solution to the Other Half of the Problems
  11. 11. MXNet: A Scalable Deep Learning System Flexibility Scalable Modular
  12. 12. Declarative vs Imperative Programs • Declarative graphs are easy to store, port, and optimize • Theano, TensorFlow • Imperative programs are flexible but hard to optimize • PyTorch, Chainer, Numpy
  13. 13. MXNet’s Approach: Mixed Programming

  Imperative NDArray API:
    >>> import mxnet as mx
    >>> a = mx.nd.zeros((100, 50))
    >>> a.shape
    (100L, 50L)
    >>> b = mx.nd.ones((100, 50))
    >>> c = a + b
    >>> b += c

  Declarative API:
    >>> import mxnet as mx
    >>> net = mx.symbol.Variable('data')
    >>> net = mx.symbol.FullyConnected(data=net, num_hidden=128)
    >>> net = mx.symbol.SoftmaxOutput(data=net)
    >>> type(net)
    <class 'mxnet.symbol.Symbol'>
    >>> texec = net.simple_bind(data=data_shape)
  14. 14. MXNet Flexible Scalable Modular
  15. 15. Need for Parallelism • Speed is critical to deep learning • Parallelism leads to higher performance • Parallelization across multiple GPUs • Parallel execution of small kernels • Overlapping memory/networking transfer and computation • …
  16. 16. Parallel Programs are Painful to Write… • … because of dependencies
  17. 17. Solution: Auto-Parallelization with a Dependency Engine • Single-threaded abstraction of a parallel environment • Works for both symbolic and imperative programs
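In user code this shows up as asynchronous NDArray operations: calls return immediately, the dependency engine runs independent operations in parallel and serializes those that share data, and the program still reads as if it were single-threaded. A small illustrative sketch (the shapes are arbitrary):

```python
import mxnet as mx

a = mx.nd.ones((1000, 1000))
b = mx.nd.ones((1000, 1000))

# These calls only enqueue work on the dependency engine.
c = mx.nd.dot(a, b)   # depends on a and b
d = a + b             # independent of c, may run concurrently
e = c + d             # depends on both c and d

# Blocking happens only when a result is actually needed.
e.wait_to_read()          # wait for e's dependency chain to finish
print(e.asnumpy()[0, 0])
mx.nd.waitall()           # or explicitly wait for all pending work
```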
  18. 18. Scaling Up to 256 AWS GPUs • Weak scaling (fixed batch size per GPU) • Need to tune different optimal parameters with more GPUs • Larger learning rate • More noisy data augmentation (bias-variance trade-off) https://github.com/dmlc/mxnet/tree/master/example/image-classification#scalability-results Adopted as AWS’s choice of deep learning system
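Data-parallel training across multiple GPUs can be expressed in MXNet's (symbolic) Module API by passing a list of device contexts; the toy network, synthetic data, and hyperparameters below are placeholders and not the configuration behind the 256-GPU results.

```python
import mxnet as mx

# Toy symbolic network (placeholder, not the ImageNet model from the slide).
data = mx.symbol.Variable('data')
fc = mx.symbol.FullyConnected(data=data, num_hidden=10)
net = mx.symbol.SoftmaxOutput(data=fc, name='softmax')

# Synthetic data iterator, for illustration only.
train_iter = mx.io.NDArrayIter(data=mx.nd.ones((1024, 128)),
                               label=mx.nd.zeros((1024,)),
                               batch_size=256)

# Replicate the model on 4 GPUs: each batch is split across the devices
# and the gradients are aggregated (data parallelism).
mod = mx.mod.Module(symbol=net, context=[mx.gpu(i) for i in range(4)])
mod.fit(train_iter, num_epoch=1,
        optimizer='sgd', optimizer_params={'learning_rate': 0.1})
```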
  19. 19. Scaling Up Is Good; How About Big Models? Many models are bounded by memory
  20. 20. Memory Optimization • Memory sharing, in-place optimization
  21. 21. Trade Computation for Memory • Do re-computation instead of saving • Training Deep Nets with Sublinear Memory Cost, Chen et al., arXiv:1604.06174 • Memory-Efficient Backpropagation Through Time, Gruslys et al., arXiv:1606.03401
  22. 22. O(sqrt(N)) memory cost with ~25% compute overhead (ImageNet ResNet configurations). Train bigger models on a single GPU
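The recomputation idea can be illustrated with a toy chain of layers. This is a generic sketch of sqrt(N) checkpointing, not MXNet's actual implementation: the forward pass stores an activation only at every sqrt(N)-th layer, and the backward pass recomputes the activations inside each segment from its checkpoint before applying the chain rule.

```python
import math

# Toy chain of layers: a_{i+1} = tanh(w_i * a_i). The backward pass needs
# the activations a_{i+1}, which is exactly what checkpointing recomputes.
weights = [0.9, 1.1, 0.8, 1.2, 0.95, 1.05, 0.85, 1.15, 1.0]
N = len(weights)
k = max(int(math.sqrt(N)), 1)               # segment length ~ sqrt(N)
boundaries = list(range(0, N, k)) + [N]     # checkpointed positions

def run_segment(x, start, end):
    """Run layers [start, end) and return every intermediate activation."""
    acts = [x]
    for i in range(start, end):
        acts.append(math.tanh(weights[i] * acts[-1]))
    return acts

# Forward pass: keep only the O(sqrt(N)) checkpointed activations.
checkpoints = {}
x = 0.5                                     # illustrative input
for start, end in zip(boundaries[:-1], boundaries[1:]):
    checkpoints[start] = x
    x = run_segment(x, start, end)[-1]

# Backward pass: recompute each segment's activations from its checkpoint,
# then apply the chain rule through the segment in reverse.
grad = 1.0                                  # d(output)/d(output)
for start, end in zip(reversed(boundaries[:-1]), reversed(boundaries[1:])):
    acts = run_segment(checkpoints[start], start, end)   # recomputation
    for i in range(end - 1, start - 1, -1):
        out = acts[i - start + 1]                        # a_{i+1}
        grad *= weights[i] * (1.0 - out * out)           # d tanh(w_i * a_i) / d a_i

print("d(output)/d(input) =", grad)
```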
  23. 23. MXNet Flexible Scalable Modular
  24. 24. Deep Learning Systems Will Become More Heterogeneous [Figure: different stacks (mobile, System A, System B, System C) assemble their own front-ends, computation graph definition, gradient and execution layers, operator libraries, and code generators, while sharing common modules.] • More heterogeneous • Need different systems for specific cases (with common modules)
  25. 25. Unix Philosophy vs Monolithic System • Monolithic: Build one system to solve everything • Unix Philosophy: Build modules to solve one thing well, work with other pieces
  26. 26. NNVM: High-Level Graph Optimization for Deep Learning • Allow different front-ends and back-ends • Allow extensive optimizations: • Memory reuse • Runtime kernel fusion • Automatic tensor partition and placement • … Lightweight
  27. 27. The Challenge for the IR of Deep Learning Systems [Figure: operators such as Conv, ReLU, and BatchNorm expose a common set of operator attributes (FInferShape, FGradient, FCodeGen) that optimization passes such as shape inference, symbolic differentiation, and code generation consume.] The need for adding new operators; the need for adding new optimizations.
  28. 28. The Challenge for the IR of Deep Learning Systems. Comparison: most DL systems (e.g. old MXNet) make adding a new operator easy but ship a fixed set of optimization passes; LLVM has a fixed set of primitive ops but makes adding a new optimization pass easy; NNVM makes both easy. • Ease of adding new operators and optimization passes without changing the core interface • A fixed interface is useful for decentralization • New optimizations are directly usable by other projects, without pushing back to a centralized repo • Easy removal of passes that are not relevant
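A rough sketch of the registration idea in Python (names and structure are illustrative; the real NNVM core is C++): operators attach named attributes to a registry, and each pass only looks up the attributes it needs, so new operators and new passes can be added independently without touching the core interface.

```python
# Illustrative operator/attribute registry in the spirit of NNVM
# (not the real API, which lives in C++ with language bindings).
OP_ATTRS = {}

def register_op(name, **attrs):
    """Attach named attributes (FInferShape, FGradient, ...) to an operator."""
    OP_ATTRS.setdefault(name, {}).update(attrs)

# Adding a new operator only means registering its attributes.
register_op("conv2d",
            FInferShape=lambda in_shapes: [in_shapes[0]],   # toy shape rule
            FGradient=lambda: ["conv2d_backward"])
register_op("relu",
            FInferShape=lambda in_shapes: [in_shapes[0]],
            FGradient=lambda: ["relu_backward"])

# Adding a new pass is a plain function over the graph that consults the
# registry; it requires no changes to the operators registered above.
def infer_shape_pass(graph, input_shape):
    shapes = {graph[0]: input_shape}
    for prev, node in zip(graph, graph[1:]):
        op = node.split(":")[0]
        shapes[node] = OP_ATTRS[op]["FInferShape"]([shapes[prev]])[0]
    return shapes

# Toy linear graph of operator instances ("opname:node_id").
graph = ["data:0", "conv2d:1", "relu:2"]
print(infer_shape_pass(graph, (1, 3, 224, 224)))
```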
  29. 29. Graph, Attribute, and Pass. Attributes are key/value pairs attached to the graph, e.g. "A_shape": (256, 512), "A_dtype": Float, "B_shape": (256, 512), "B_dtype": Float, ... Passes such as GradientPass, PlaceDevicePass, InferShapePass, InferTypePass, and PlanMemoryPass transform the semantic graph into an execution graph.
  30. 30. In Progress: Multi-Level Compilation Pipeline [Figure: front-ends such as TinyFlow and MXNet feed NNVM, where high-level passes run (automatic differentiation, memory reuse planning, predefined executors/ops); TVM performs the low-level passes (tiling, parallel patterns, memory hierarchy) and targets CUDA, Metal, OpenCL, LLVM, x86/ARM, ...]
  31. 31. Recap Flexible Scalable Modular
  32. 32. MXNet and XGBoost are developed by over 100 collaborators. Special thanks to: Tianqi Chen (UW), Mu Li (CMU/Amazon), Bing Xu (Turi), Chiyuan Zhang (MIT), Junyuan Xie (UW), Yizhi Liu (MediaV), Tianjun Xiao (Microsoft), Yutian Li (Stanford), Yuan Tang (Uptake), Qian Kou (Indiana University), Hu Shiwen (Shanghai), Chuntao Hong (Microsoft), Min Lin (Qihoo 360), Naiyan Wang (TuSimple), Tong He (Simon Fraser University), Minjie Wang (NYU), Valentin Churavy (OIST), Ali Farhadi (UW/AI2), Carlos Guestrin (UW/Turi), Alexander Smola (CMU/Amazon), Zheng Zhang (NYU Shanghai)
