Machine Learning in Rust with Leaf and Collenchyma

Machine learning in Rust: a presentation about the machine intelligence framework Leaf and the portable computation framework Collenchyma, by Autumn.

  1. MACHINE LEARNING IN RUST WITH LEAF AND COLLENCHYMA RUST TALK BERLIN | Feb. 2016 > @autumn_eng
  2. “In machine learning, we seek methods by which the computer will come up with its own program based on examples that we provide.” MACHINE LEARNING ~ PROGRAMMING BY EXAMPLE [1]: http://www.cs.princeton.edu/courses/archive/spr08/cos511/scribe_notes/0204.pdf
  3. AREAS OF ML: DOMAIN KNOWLEDGE (BAYESIAN, INSIGHTFUL MODEL) | LOTS OF DATA (DEEP, FITTING MODEL) | PROOF TECHNIQUES (CLASSICAL, ANALYZABLE MODEL) [1]: http://cs.jhu.edu/~jason/tutorials/ml-simplex
  4. > DEEP LEARNING. TON OF DATA + SMART ALGOS + PARALLEL COMP. [1]: http://www.andreykurenkov.com/writing/a-brief-history-of-neural-nets-and-deep-learning/
  5. LARGE, SPARSE, HIGH-DIMENSIONAL DATASETS, LIKE IMAGES, AUDIO, TEXT, SENSORY & TIME-SERIES DATA KEY CONCEPTS | DATA
  6. > DEEP NEURAL NETWORK UNIVERSAL FUNCTION APPROXIMATOR, REPRESENTING HIERARCHICAL STRUCTURES IN LEARNED DATA KEY CONCEPTS | ALGORITHMS [1]: http://cs231n.github.io/neural-networks-1/
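      For intuition, the classic one-hidden-layer form behind the "universal function approximator" claim (a textbook statement, not from the slides), where \sigma is a non-linearity such as the sigmoid:

          f(x) \approx \sum_{i=1}^{N} c_i \, \sigma(w_i^\top x + b_i)

      Stacking such layers lets later layers approximate functions of features computed by earlier ones, which is the hierarchical structure mentioned above.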
  7. > DEEP NEURAL NETWORK KEY CONCEPTS | ALGORITHMS [image]: http://www.kdnuggets.com/wp-content/uploads/deep-learning.png
  8. > BACKPROPAGATION: COMPUTING GRADIENTS OF THE NETWORK THROUGH THE CHAIN RULE KEY CONCEPTS | ALGORITHMS [1]: http://cs231n.github.io/optimization-2/
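      A minimal, self-contained Rust sketch of the chain rule for a single sigmoid unit (illustrative only, not Collenchyma/Leaf code; the shape of `sigmoid_grad` loosely mirrors the plugin trait shown on slide 17):

          // Sigmoid activation: y = 1 / (1 + e^(-x)).
          fn sigmoid(x: f32) -> f32 {
              1.0 / (1.0 + (-x).exp())
          }

          // Chain rule through the sigmoid: dL/dx = dL/dy * dy/dx,
          // where dy/dx = y * (1 - y) for y = sigmoid(x).
          fn sigmoid_grad(y: f32, loss_grad: f32) -> f32 {
              loss_grad * y * (1.0 - y)
          }

          fn main() {
              let y = sigmoid(0.5);
              // Pretend a gradient of 1.0 flows back from the layer above.
              println!("dL/dx = {}", sigmoid_grad(y, 1.0));
          }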
  9. KEY CONCEPTS | PARALLEL COMPUTATION > MULTI-CORE DEVICES (GPUs) HIGH-DIMENSIONAL MATHEMATICAL OPERATIONS CAN BE EXECUTED MORE EFFICIENTLY ON SPECIAL-PURPOSE CHIPS LIKE GPUS OR FPGAS.
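      The operations in question are element-wise and matrix operations in which every element can be computed independently. A plain-Rust sketch of that data parallelism using OS threads (illustrative; real backends dispatch to tuned BLAS/cuDNN kernels instead, and the thread count of 4 is an arbitrary assumption):

          use std::thread;

          // Element-wise square of a vector, split across 4 threads.
          fn parallel_square(data: Vec<f32>) -> Vec<f32> {
              let chunk_len = (data.len() + 3) / 4;
              let handles: Vec<_> = data
                  .chunks(chunk_len)
                  .map(|c| c.to_vec())
                  .map(|part| thread::spawn(move || {
                      part.into_iter().map(|x| x * x).collect::<Vec<f32>>()
                  }))
                  .collect();
              // Join the threads and stitch the partial results back together.
              handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
          }

          fn main() {
              let squared = parallel_square((0..16).map(|i| i as f32).collect());
              println!("{:?}", squared);
          }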
  10. > COLLENCHYMA PORTABLE, PARALLEL, HIGH-PERFORMANCE COMPUTATION (HPC) IN RUST [1]: https://github.com/autumnai/collenchyma
  11. SIMILAR PROJECTS: ARRAYFIRE (C++) COLLENCHYMA | OVERVIEW [1]: https://github.com/autumnai/collenchyma
  12. COLLENCHYMA FUNDAMENTAL CONCEPTS
      1. PORTABLE COMPUTATION | FRAMEWORKS/BACKEND
      2. PLUGINS | OPERATIONS
      3. MEMORY MANAGEMENT | SHAREDTENSOR
  13. A COLLENCHYMA FRAMEWORK DESCRIBES A COMPUTATIONAL LANGUAGE LIKE RUST, OPENCL OR CUDA. A COLLENCHYMA BACKEND DESCRIBES A SINGLE COMPUTATION-CAPABLE PIECE OF HARDWARE (CPU, GPU, FPGA) WHICH IS ADDRESSABLE BY A FRAMEWORK. COLLENCHYMA | PORTABLE COMPUTATION
  14. COLLENCHYMA | PORTABLE COMPUTATION

      /// Defines a Framework.
      pub trait IFramework {
          /// The Hardware representation for this Framework (elided on the slide).
          type H;

          /// Initializes a new Framework.
          ///
          /// Loads all the available hardwares.
          fn new() -> Self where Self: Sized;

          /// Initializes a new Device from the provided hardwares.
          fn new_device(&self, &[Self::H]) -> Result<DeviceType, Error>;
      }

      /// Defines the main and highest struct of Collenchyma.
      pub struct Backend<F: IFramework> {
          framework: Box<F>,
          device: DeviceType,
      }
  15. COLLENCHYMA | PORTABLE COMPUTATION

      // Initialize a CUDA Backend.
      let backend = Backend::<Cuda>::default().unwrap();

      // Initialize a CUDA Backend - the explicit way.
      let framework = Cuda::new();
      let hardwares = framework.hardwares();
      let backend_config = BackendConfig::new(framework, hardwares[0]);
      let backend = Backend::new(backend_config).unwrap();
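      Because Backend is generic over the framework, the identical pattern addresses the host CPU; the Native backend below is the one used again in the full example on slide 22:

          // The same pattern, but running on the host CPU via the Native framework.
          let backend = Backend::<Native>::default().unwrap();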
  16. COLLENCHYMA PLUGINS ARE CRATES THAT EXTEND THE COLLENCHYMA BACKEND WITH FRAMEWORK-AGNOSTIC MATHEMATICAL OPERATIONS, E.G. BLAS OPERATIONS. COLLENCHYMA | OPERATIONS [1]: https://github.com/autumnai/collenchyma-blas [2]: https://github.com/autumnai/collenchyma-nn
  17. COLLENCHYMA | OPERATIONS

      /// Provides the functionality for a backend to support Neural Network related operations.
      pub trait NN<F: Float> {
          /// Initializes the Plugin.
          fn init_nn();
          /// Returns the device on which the Plugin operations will run.
          fn device(&self) -> &DeviceType;
      }

      /// Provides the functionality for a Backend to support Sigmoid operations.
      pub trait Sigmoid<F: Float>: NN<F> {
          fn sigmoid(&self, x: &mut SharedTensor<F>, result: &mut SharedTensor<F>)
              -> Result<(), ::co::error::Error>;
          fn sigmoid_plain(&self, x: &SharedTensor<F>, result: &mut SharedTensor<F>)
              -> Result<(), ::co::error::Error>;
          fn sigmoid_grad(&self, x: &mut SharedTensor<F>, x_diff: &mut SharedTensor<F>)
              -> Result<(), ::co::error::Error>;
          fn sigmoid_grad_plain(&self, x: &SharedTensor<F>, x_diff: &SharedTensor<F>)
              -> Result<(), ::co::error::Error>;
      }

      (The `_plain` variants skip the memory synchronization that the plain-named methods perform first; see the helper on slide 21.)

      [1]: https://github.com/autumnai/collenchyma-nn/blob/master/src/plugin.rs
  18. COLLENCHYMA | OPERATIONS

      impl NN<f32> for Backend<Cuda> {
          fn init_nn() { let _ = CUDNN.id_c(); }
          fn device(&self) -> &DeviceType { self.device() }
      }

      impl_desc_for_sharedtensor!(f32, ::cudnn::utils::DataType::Float);
      impl_ops_convolution_for!(f32, Backend<Cuda>);
      impl_ops_sigmoid_for!(f32, Backend<Cuda>);
      impl_ops_relu_for!(f32, Backend<Cuda>);
      impl_ops_softmax_for!(f32, Backend<Cuda>);
      impl_ops_lrn_for!(f32, Backend<Cuda>);
      impl_ops_pooling_for!(f32, Backend<Cuda>);

      [1]: https://github.com/autumnai/collenchyma-nn/blob/master/src/frameworks/cuda/mod.rs
  19. COLLENCHYMA’S SHAREDTENSOR IS A DEVICE- AND FRAMEWORK-AGNOSTIC, MEMORY-AWARE, N-DIMENSIONAL STORAGE. COLLENCHYMA | MEMORY MANAGEMENT
  20. COLLENCHYMA | MEMORY MANAGEMENT
      OPTIMIZED FOR NON-SYNC USE CASE

      pub struct SharedTensor<T> {
          desc: TensorDesc,
          latest_location: DeviceType,
          latest_copy: MemoryType,
          copies: LinearMap<DeviceType, MemoryType>,
          phantom: PhantomData<T>,
      }

      [1]: https://github.com/autumnai/collenchyma/blob/master/src/tensor.rs
  21. COLLENCHYMA | MEMORY MANAGEMENT

      fn sigmoid(&self, x: &mut SharedTensor<f32>, result: &mut SharedTensor<f32>)
          -> Result<(), ::co::error::Error>
      {
          // Make sure `x` has a copy on the backend's device and sync its data there.
          match x.add_device(self.device()) {
              _ => try!(x.sync(self.device()))
          }
          // `result` only needs memory allocated on the device; its content is overwritten.
          match result.add_device(self.device()) {
              _ => ()
          }
          self.sigmoid_plain(x, result)
      }

      [1]: https://github.com/autumnai/collenchyma-nn/blob/master/src/frameworks/cuda/helper.rs
  22. COLLENCHYMA | BRINGING IT TOGETHER

      fn main() {
          // Initialize a CUDA Backend.
          let backend = Backend::<Cuda>::default().unwrap();
          // Initialize two SharedTensors.
          let mut x = SharedTensor::<f32>::new(backend.device(), &(1, 1, 3)).unwrap();
          let mut result = SharedTensor::<f32>::new(backend.device(), &(1, 1, 3)).unwrap();
          // Fill `x` with some data.
          let payload: &[f32] = &::std::iter::repeat(1f32).take(x.capacity()).collect::<Vec<f32>>();
          let native = Backend::<Native>::default().unwrap();
          x.add_device(native.device()).unwrap();
          write_to_memory(x.get_mut(native.device()).unwrap(), payload); // Write to native host memory.
          x.sync(backend.device()).unwrap(); // Sync the data to the CUDA device.
          // Run the sigmoid operation, provided by the NN Plugin, on your CUDA-enabled GPU.
          backend.sigmoid(&mut x, &mut result).unwrap();
          // See the result.
          result.add_device(native.device()).unwrap(); // Add native host memory.
          result.sync(native.device()).unwrap(); // Sync the result to host memory.
          println!("{:?}", result.get(native.device()).unwrap().as_native().unwrap().as_slice::<f32>());
      }

      [1]: https://github.com/autumnai/collenchyma#examples
  23. > LEAF MACHINE INTELLIGENCE FRAMEWORK [1]: https://github.com/autumnai/leaf
  24. SIMILAR PROJECTS: Torch (Lua), Theano (Python), TensorFlow (C++/Python), Caffe (C++) LEAF | OVERVIEW
  25. Fundamental Parts:
      1. Layers (based on Collenchyma plugins)
      2. Solvers
  26. CONNECTED LAYERS FORM A NEURAL NETWORK. BACKPROPAGATION VIA TRAITS => GRADIENT CALCULATION EASILY SWAPPABLE. LEAF | Layers
  27. LEAF | Layers
      T = DATATYPE OF SHAREDTENSOR (e.g. f32). B = BACKEND.

      /// A Layer that can compute the gradient with respect to its input.
      pub trait ComputeInputGradient<T, B: IBackend> {
          /// Compute gradients with respect to the inputs and write them into `input_gradients`.
          fn compute_input_gradient(&self,
                                    backend: &B,
                                    weights_data: &[&SharedTensor<T>],
                                    output_data: &[&SharedTensor<T>],
                                    output_gradients: &[&SharedTensor<T>],
                                    input_data: &[&SharedTensor<T>],
                                    input_gradients: &mut [&mut SharedTensor<T>]);
      }
  28. POOLING LAYER: EXECUTES ONE OPERATION (E.G. MAX) OVER A REGION OF THE INPUT. LEAF | Layers [image]: http://cs231n.github.io/convolutional-networks/
  29. LEAF | Layers

      impl<B: IBackend + conn::Pooling<f32>> ComputeInputGradient<f32, B> for Pooling<B> {
          fn compute_input_gradient(&self,
                                    backend: &B,
                                    _weights_data: &[&SharedTensor<f32>],
                                    output_data: &[&SharedTensor<f32>],
                                    output_gradients: &[&SharedTensor<f32>],
                                    input_data: &[&SharedTensor<f32>],
                                    input_gradients: &mut [&mut SharedTensor<f32>]) {
              let config = &self.pooling_configs[0];
              match self.mode {
                  PoolingMode::Max => backend.pooling_max_grad_plain(
                      output_data[0],
                      output_gradients[0],
                      input_data[0],
                      input_gradients[0],
                      config).unwrap()
              }
          }
      }
  30. STOCHASTIC GRADIENT DESCENT: REQUIRES BACKPROPAGATION; MIGHT NOT FIND THE GLOBAL MINIMUM, BUT WORKS FOR A HUGE NUMBER OF WEIGHTS. LEAF | Solvers [image]: https://commons.wikimedia.org/wiki/File:Extrema_example_original.svg by Wikipedia user KSmrq
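      A minimal sketch of one SGD update in plain Rust (illustrative, not Leaf's solver API); `grads` is assumed to come from backpropagation:

          /// One plain SGD step: w <- w - alpha * dL/dw.
          fn sgd_step(weights: &mut [f32], grads: &[f32], alpha: f32) {
              for (w, g) in weights.iter_mut().zip(grads) {
                  *w -= alpha * g;
              }
          }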
  31. SGD WITH MOMENTUM: v ← μ·v − α·∇L(w), w ← w + v, WHERE α IS THE LEARNING RATE AND μ THE MOMENTUM APPLIED TO THE HISTORY OF UPDATES. LEAF | Solvers
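      The same update with momentum, as a plain-Rust sketch (again illustrative, not Leaf's solver code); with mu = 0.0 it reduces to the plain SGD step above:

          /// SGD with momentum:
          ///   v <- mu * v - alpha * dL/dw   (mu scales the history of updates)
          ///   w <- w + v
          fn sgd_momentum_step(weights: &mut [f32], velocity: &mut [f32],
                               grads: &[f32], alpha: f32, mu: f32) {
              for ((w, v), g) in weights.iter_mut().zip(velocity.iter_mut()).zip(grads) {
                  *v = mu * *v - alpha * g;
                  *w += *v;
              }
          }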
  32. LEAF: BRINGING ALL THE PARTS TOGETHER
  33. > LIVE EXAMPLE - MNIST: dataset of 50_000 (training) + 10_000 (test) handwritten digits; 28 x 28 px greyscale (8-bit) images. [image]: http://neuralnetworksanddeeplearning.com/chap1.html
  34. We will use a single-layer perceptron. LEAF | LIVE EXAMPLE [image]: http://neuralnetworksanddeeplearning.com/chap1.html
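      An illustrative forward pass for that single-layer perceptron in plain Rust (shapes assumed from the MNIST slide: 784 = 28 x 28 inputs, 10 digit classes; this is not Leaf's layer API):

          struct Perceptron {
              weights: Vec<f32>, // 10 x 784 weight matrix, row-major.
              bias: [f32; 10],
          }

          impl Perceptron {
              /// One fully connected layer followed by a sigmoid activation.
              fn forward(&self, pixels: &[f32]) -> [f32; 10] {
                  assert_eq!(pixels.len(), 784);
                  let mut out = [0f32; 10];
                  for class in 0..10 {
                      let row = &self.weights[class * 784..(class + 1) * 784];
                      let z: f32 = row.iter().zip(pixels).map(|(w, x)| w * x).sum::<f32>()
                          + self.bias[class];
                      out[class] = 1.0 / (1.0 + (-z).exp()); // sigmoid
                  }
                  out // The largest output is the predicted digit.
              }
          }

          fn main() {
              // Untrained dummy weights; training would be done by a solver (SGD).
              let net = Perceptron { weights: vec![0.0; 10 * 784], bias: [0.0; 10] };
              let image = vec![0.5f32; 784]; // a dummy grey 28 x 28 image
              println!("{:?}", net.forward(&image));
          }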
  35. RUST MACHINE LEARNING IRC: #rust-machine-learning on irc.mozilla.org TWITTER: @autumn_eng http://autumnai.com
