
- 1. Metronome: YARN and Parallel Iterative Algorithms
- 2. Josh Patterson. Email: josh@floe.tv. Twitter: @jpatanooga. Github: https://github.com/jpatanooga. Past: published in IAAI-09, “TinyTermite: A Secure Routing Algorithm”; grad work in meta-heuristics and ant algorithms; Tennessee Valley Authority (TVA), Hadoop and the Smartgrid; Cloudera, Principal Solution Architect. Today: Consultant.
- 3. Sections: 1. Parallel Iterative Algorithms; 2. Parallel Neural Networks; 3. Future Directions
- 4. Parallel Iterative Algorithms: Hadoop, YARN, and IterativeReduce
- 5. Machine Learning and Optimization. Direct Methods: Normal Equation. Iterative Methods: Newton’s Method, Quasi-Newton, Gradient Descent. Heuristics: AntNet, PSO, Genetic Algorithms.
- 6. Linear Regression. In linear regression, data is modeled using linear predictor functions, and unknown model parameters are estimated from the data. We use optimization techniques like Stochastic Gradient Descent to find the coefficients in the model: Y = (1*x0) + (c1*x1) + … + (cN*xN)
- 7. Stochastic Gradient Descent: hypothesis about data, cost function, update function. Andrew Ng’s Tutorial: https://class.coursera.org/ml/lecture/preview_view/11
- 8. Stochastic Gradient Descent. A simple gradient descent procedure; the loss function needs to be convex (with exceptions). Linear Regression SGD: the loss function is the squared error of the prediction; the prediction is a linear combination of coefficients and input variables. [Diagram: Training Data → Training → Model]
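The SGD procedure for linear regression described in slides 6 to 8 can be sketched roughly as below. This is an illustrative sketch, not Mahout's or Metronome's code: the function names, the learning rate, the epoch count, and the toy data are all assumptions, and the first input of each example is fixed to 1.0 so that the first coefficient acts as the intercept.

```python
import random


def predict(coefs, x):
    """Prediction: linear combination of coefficients and input variables."""
    return sum(c * xi for c, xi in zip(coefs, x))


def sgd_linear_regression(data, learning_rate=0.01, epochs=200, seed=0):
    """Estimate coefficients by SGD on the squared error of the prediction.

    data is a list of (x, y) pairs; x is a list of floats whose first
    entry is 1.0 (assumed convention for the intercept term).
    """
    rng = random.Random(seed)
    n_features = len(data[0][0])
    coefs = [0.0] * n_features
    examples = list(data)
    for _ in range(epochs):
        rng.shuffle(examples)  # visit examples in a random order each epoch
        for x, y in examples:
            error = predict(coefs, x) - y
            # Gradient of 0.5 * error**2 with respect to coefs[j] is error * x[j].
            for j in range(n_features):
                coefs[j] -= learning_rate * error * x[j]
    return coefs


# Hypothetical noise-free toy data generated from y = 2 + 3*x.
data = [([1.0, float(x)], 2.0 + 3.0 * x) for x in range(-5, 6)]
coefs = sgd_linear_regression(data)
```

On this toy data the estimated coefficients should approach 2 and 3; real data would need a tuned learning rate and a stopping criterion.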
- 9. Mahout’s SGD: currently single process; multi-threaded parallel, but not cluster parallel; runs locally, not deployed to the cluster; tied to the logistic regression implementation.
- 10. Distributed Learning Strategies. McDonald, 2010: Distributed Training Strategies for the Structured Perceptron. Langford, 2007: Vowpal Wabbit. Jeff Dean’s work on parallel SGD: DownPour SGD.
- 11. MapReduce vs. Parallel Iterative. [Diagram: MapReduce flows from Input through several Map tasks to Reduce and Output; the parallel iterative model runs a set of Processors through Superstep 1, Superstep 2, and so on]
- 12. YARN: Yet Another Resource Negotiator. Framework for scheduling distributed applications. Allows for any type of parallel application to run natively on Hadoop. MRv2 is now a distributed application. [Diagram: Clients, a Resource Manager, and Node Managers hosting App Masters and Containers, connected by Job Submission, Resource Request, Node Status, and MapReduce Status messages]
- 13. IterativeReduce API. ComputableMaster: Setup(), Compute(), Complete(). ComputableWorker: Setup(), Compute(). [Diagram: a row of Workers feeds a Master, which feeds the next row of Workers, and so on]
- 14. SGD: Serial vs Parallel. [Diagram: serial SGD turns Training Data into a single Model; in the parallel version, Split 1 to Split N go to Worker 1 to Worker N, each worker produces a Partial Model, and the Master combines them into the Global Model]
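One way to read the worker/master picture on slides 13 and 14 is as repeated supersteps in which each worker refines the global model on its own split and the master merges the partial models. The sketch below assumes the merge is a plain average (slide 28 mentions parameter averaging variations). The names `worker_compute`, `master_compute`, and `train` are hypothetical and are not the IterativeReduce API, and the workers run sequentially here rather than on a cluster.

```python
def worker_compute(global_model, split, learning_rate=0.01):
    """One SGD pass over this worker's split, starting from the global model."""
    model = list(global_model)
    for x, y in split:
        error = sum(c * xi for c, xi in zip(model, x)) - y
        for j in range(len(model)):
            model[j] -= learning_rate * error * x[j]
    return model  # the worker's partial model


def master_compute(partial_models):
    """Average the workers' partial models into a new global model."""
    n_workers = len(partial_models)
    n_features = len(partial_models[0])
    return [sum(m[j] for m in partial_models) / n_workers
            for j in range(n_features)]


def train(splits, n_features, supersteps=300):
    """Alternate worker and master steps for a fixed number of supersteps."""
    global_model = [0.0] * n_features
    for _ in range(supersteps):
        # On a cluster these calls would run in parallel, one per worker.
        partials = [worker_compute(global_model, split) for split in splits]
        global_model = master_compute(partials)
    return global_model


# Hypothetical toy data from y = 2 + 3*x, divided into three splits.
data = [([1.0, float(x)], 2.0 + 3.0 * x) for x in range(-5, 6)]
splits = [data[i::3] for i in range(3)]
model = train(splits, n_features=2)
```

Averaging once per pass keeps communication low, at the cost of each worker drifting slightly from the others between supersteps.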
- 15. Parallel Iterative Algorithms on YARN. Based directly on work we did with Knitting Boar (parallel logistic regression), and then added parallel linear regression and parallel neural networks. Packaged in a new suite of parallel iterative algorithms called Metronome: 100% Java, ASF 2.0 licensed, on GitHub.
- 16. Linear Regression Results. [Chart: Total Processing Time vs. Total Megabytes Processed (64 to 320 MB), Series 1 and Series 2]
- 17. Logistic Regression: 20Newsgroups. [Chart: Input Size vs Processing Time (input sizes 4.1 to 41.0), Series 1 and Series 2]
- 18. Convergence Testing. Debugging parallel iterative algorithms during testing is hard, and processes on different hosts are difficult to observe. Using the unit test framework IRUnit we can simulate the IterativeReduce framework. We know the plumbing of message passing works, which allows us to focus on parallel algorithm design and testing while still using standard debugging tools.
- 19. Parallel Neural Networks: Let’s Get Non-Linear
- 20. What are Neural Networks? Inspired by nervous systems in biological systems. Models layers of neurons in the brain. Can learn non-linear functions. Recently enjoying a surge in popularity.
- 21. Multi-Layer Perceptron. The first layer has input neurons; the last layer has output neurons. Each neuron in a layer is connected to all neurons in the next layer. Each neuron has an activation function, typically sigmoid / logistic. The input to a neuron is the sum of weight * input over its connections.
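The forward computation of a multi-layer perceptron as described on slide 21 can be sketched as follows. The weight layout, the example weights, and the absence of bias terms are assumptions made to keep the sketch close to the slide's wording; this is not Metronome's implementation.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(layers, inputs):
    """Run the inputs through a fully connected network.

    layers is a list of weight matrices; layers[k][j] holds the weights
    on the connections into neuron j of the next layer (assumed layout).
    """
    activations = list(inputs)
    for weights in layers:
        # Input to each neuron is the sum of weight * input over its
        # connections; the sigmoid of that sum is the neuron's activation.
        activations = [
            sigmoid(sum(w * a for w, a in zip(neuron_weights, activations)))
            for neuron_weights in weights
        ]
    return activations


# Hypothetical 2-2-1 network: two inputs, two hidden neurons, one output.
layers = [
    [[0.1, -0.2], [0.4, 0.3]],  # input layer -> hidden layer
    [[0.2, -0.1]],              # hidden layer -> output layer
]
output = forward(layers, [1.0, 0.5])
```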
- 22. Backpropagation Learning. Calculates the gradient of the error of the network with respect to the network's modifiable weights. Intuition: run a forward pass of the example through the network, computing activations and output; then iterate from the output layer back to the input layer (backwards) and, for each neuron in the layer, compute the node’s responsibility for the error and update the weights on its connections.
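The backpropagation steps listed on slide 22 can be sketched for a network with one hidden layer as below. The squared-error loss, the learning rate, the lack of bias terms, and all names and weights are assumptions for illustration; this is not Metronome's implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w_hidden, w_out, x):
    """Forward pass: compute hidden activations and network output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    output = [sigmoid(sum(w * h for w, h in zip(ws, hidden))) for ws in w_out]
    return hidden, output


def backprop_step(w_hidden, w_out, x, target, learning_rate=0.5):
    """Update the weights in place for one example.

    Returns the squared error measured before the update.
    """
    hidden, output = forward(w_hidden, w_out, x)
    # Output layer: each node's responsibility (delta) for the error.
    delta_out = [(o - t) * o * (1.0 - o) for o, t in zip(output, target)]
    # Hidden layer: propagate the deltas backwards through the output
    # weights, using the weights as they were before this update.
    delta_hidden = [
        h * (1.0 - h) * sum(delta_out[k] * w_out[k][j] for k in range(len(w_out)))
        for j, h in enumerate(hidden)
    ]
    # Update the weights on the connections.
    for k, ws in enumerate(w_out):
        for j in range(len(ws)):
            ws[j] -= learning_rate * delta_out[k] * hidden[j]
    for j, ws in enumerate(w_hidden):
        for i in range(len(ws)):
            ws[i] -= learning_rate * delta_hidden[j] * x[i]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(output, target))


# Hypothetical 2-2-1 network trained repeatedly on a single example.
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [[0.2, -0.1]]
x, target = [1.0, 0.5], [1.0]
errors = [backprop_step(w_hidden, w_out, x, target) for _ in range(50)]
```

The recorded errors should shrink as the updates are repeated, which is a quick sanity check that the gradient signs are right.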
- 23. Parallelizing Neural Networks. Dean (NIPS, 2012). First steps: focus on linear convex models, calculating a distributed gradient. Model parallelism must be combined with distributed optimization that leverages data parallelization: simultaneously process distinct training examples in each of the many model replicas, and periodically combine their results to optimize our objective function. Single-pass frameworks such as MapReduce are “ill-suited”.
- 24. Costs of Neural Network Training. The connection count explodes quickly as neurons and layers increase. Example: a {784, 450, 10} network has 357,300 connections. Need a fast iterative framework. Example: a 30 sec MR setup cost over 10k epochs is 30s x 10,000 == 300,000 seconds of setup time (5,000 minutes, or about 83 hours). 3 ways to speed up training: subdivide the dataset between workers (data parallelism); max out the transfer rate of disks and use vector caching to maximize data throughput; minimize inter-epoch setup times with a proper iterative framework.
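The two figures on slide 24 can be checked with a few lines of arithmetic. The helper name `connection_count` is an assumption, and the count covers only weights between adjacent fully connected layers, with no bias terms.

```python
def connection_count(layer_sizes):
    """Number of connections between adjacent fully connected layers."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


# 784 * 450 + 450 * 10 connections for the {784, 450, 10} network.
connections = connection_count([784, 450, 10])

# 30 seconds of MapReduce setup repeated for 10,000 epochs.
setup_seconds = 30 * 10_000
setup_minutes = setup_seconds / 60
setup_hours = setup_seconds / 3600
```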
- 25. Vector In-Memory Caching. Since we make lots of passes over the same dataset, in-memory caching makes sense here. Once a record is vectorized, it is cached in memory on the worker node. Speedup (single pass, “no cache” vs “cached”): ~12x
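The caching idea on slide 25 can be sketched as a small wrapper that vectorizes each record on the first pass and replays the stored vectors on later passes. The class name, the call counter, and the comma-separated record format are assumptions for illustration, not Metronome's code.

```python
class CachingVectorSource:
    """Vectorize records on the first pass; serve from memory afterwards."""

    def __init__(self, records, vectorize):
        self._records = records
        self._vectorize = vectorize
        self._cache = None
        self.vectorize_calls = 0  # counts how often parsing actually ran

    def __iter__(self):
        if self._cache is not None:
            # Later passes: no parsing, just replay the cached vectors.
            yield from self._cache
            return
        cache = []
        for record in self._records:
            self.vectorize_calls += 1
            vector = self._vectorize(record)
            cache.append(vector)
            yield vector
        # Only keep the cache once a full pass has completed.
        self._cache = cache


def parse(record):
    """Hypothetical vectorizer for comma-separated numeric records."""
    return [float(field) for field in record.split(",")]


source = CachingVectorSource(["1,2", "3,4"], parse)
first_pass = list(source)
second_pass = list(source)
```

This trades worker memory for repeated parsing cost, which pays off when the same split is scanned once per epoch.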
- 26. Neural Networks Parallelization Speedup
- 27. Future Directions: Going Forward
- 28. Lessons Learned. Linear scale continues to be achieved with parameter averaging variations. Tuning is critical: you need to be good at selecting a learning rate.
- 29. Future Directions. Adagrad (SGD adaptive learning rates). Parallel Quasi-Newton methods: L-BFGS, Conjugate Gradient. More neural network learning refinement. Training progressively larger networks.
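Slide 29 lists Adagrad as a future direction. A minimal sketch of its per-coefficient update rule is shown below; the function name, the base rate, the epsilon value, and the example gradient are assumptions, and this is not code from Metronome.

```python
import math


def adagrad_update(coefs, grad_sq_sums, gradient, base_rate=0.5, eps=1e-8):
    """Apply one Adagrad step in place.

    Each coefficient gets its own effective learning rate: the base rate
    divided by the root of that coefficient's accumulated squared gradients.
    """
    for j, g in enumerate(gradient):
        grad_sq_sums[j] += g * g
        coefs[j] -= base_rate * g / (math.sqrt(grad_sq_sums[j]) + eps)


coefs = [0.0, 0.0]
grad_sq_sums = [0.0, 0.0]

# Hypothetical gradient with very different magnitudes per coefficient.
gradient = [4.0, -0.1]
adagrad_update(coefs, grad_sq_sums, gradient)
after_first = list(coefs)

# A second step with the same gradient is smaller, because the
# accumulated squared gradients have grown.
adagrad_update(coefs, grad_sq_sums, gradient)
```

On the first step both coefficients move by roughly the base rate despite the 40x difference in gradient size, which is why Adagrad reduces the sensitivity to a single hand-picked learning rate.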
- 30. Github. IterativeReduce: https://github.com/emsixteeen/IterativeReduce. Metronome: https://github.com/jpatanooga/Metronome
- 31. Unit Testing and IRUnit. Simulates the IterativeReduce parallel framework. Uses the same app.properties file that YARN applications do. Examples: https://github.com/jpatanooga/Metronome/blob/master/src/test/java/tv/floe/metronome/linearregression/iterativereduce/TestSimulateLinearRegressionIterativeReduce.java and https://github.com/jpatanooga/KnittingBoar/blob/master/src/test/java/com/cloudera/knittingboar/sgd/iterativereduce/TestKnittingBoar_IRUnitSim.java
