Hadoop Summit 2012 | BranchReduce: Distributed Branch-and-Bound on YARN

Session Abstract

Branch-and-bound is a widely used technique for efficiently searching for solutions to combinatorial optimization problems. In this session, we will introduce BranchReduce, an open-source Java library for performing distributed branch-and-bound on a Hadoop cluster under YARN. Applications only need to write code that is specific to their optimization problem (namely the branching rule, the lower bound computation, and the upper bound computation), and BranchReduce handles deploying the application to the cluster, managing the execution, and periodically rebalancing the search space across the machines. We will give an overview of how BranchReduce works and then walk through an example that solves a scheduling problem with a near-linear speedup over a single-machine implementation.
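The abstract divides the work so that the application supplies a branching rule, a lower bound, and an upper bound. A small single-machine sketch can show what those three pieces look like. It is not BranchReduce code. The class `MakespanBnB` and its methods are hypothetical, and the problem is an assumed toy scheduling problem (assign job durations to identical machines to minimize the makespan), not necessarily the one from the talk. The initial upper bound comes from a greedy longest-job-first schedule. The lower bound is the larger of the busiest machine so far and the perfectly balanced load. Branching places the next job on each machine in turn.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: serial branch-and-bound for minimizing makespan on
// identical machines. Illustrative only; not BranchReduce's API.
class MakespanBnB {
    private final int[] jobs;     // job durations, longest first
    private final int machines;
    private final long total;     // sum of all durations
    private int best;             // incumbent makespan: the current upper bound

    MakespanBnB(int[] durations, int machines) {
        this.jobs = Arrays.stream(durations).boxed()
                .sorted(Comparator.reverseOrder())
                .mapToInt(Integer::intValue).toArray();
        this.machines = machines;
        this.total = Arrays.stream(jobs).asLongStream().sum();
        this.best = greedyUpperBound();
    }

    // Upper bound: a feasible schedule from longest-processing-time-first list scheduling.
    private int greedyUpperBound() {
        int[] loads = new int[machines];
        for (int job : jobs) {
            int least = 0;
            for (int m = 1; m < machines; m++) {
                if (loads[m] < loads[least]) least = m;
            }
            loads[least] += job;
        }
        return Arrays.stream(loads).max().getAsInt();
    }

    // Lower bound for any schedule extending this partial one: the busiest
    // machine so far, or the perfectly balanced load, whichever is larger.
    private int lowerBound(int[] loads) {
        int busiest = Arrays.stream(loads).max().getAsInt();
        int balanced = (int) ((total + machines - 1) / machines); // ceiling division
        return Math.max(busiest, balanced);
    }

    // Branching rule: try job `next` on every machine.
    private void search(int next, int[] loads) {
        if (lowerBound(loads) >= best) return;      // cannot beat the incumbent: prune
        if (next == jobs.length) {                  // complete schedule that beats the incumbent
            best = Arrays.stream(loads).max().getAsInt();
            return;
        }
        for (int m = 0; m < machines; m++) {
            loads[m] += jobs[next];
            search(next + 1, loads);
            loads[m] -= jobs[next];                 // undo before trying the next machine
        }
    }

    int solve() {
        search(0, new int[machines]);
        return best;
    }

    public static void main(String[] args) {
        System.out.println(new MakespanBnB(new int[] {3, 3, 2, 2, 2}, 2).solve()); // prints 6
    }
}
```

For {3, 3, 2, 2, 2} on two machines, the greedy schedule has a makespan of 7. The search improves the incumbent to the optimum of 6 (3 + 3 on one machine, 2 + 2 + 2 on the other), and after that every remaining node is pruned. Identical machines are not deduplicated, so the search explores symmetric branches redundantly. That cost is acceptable for a toy instance.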

  • 1600+ Lines of Java Code
    • 796 Client, 837 Application Master
  • Relatively Simple Lifecycle Patterns
    • Start Up: RPCs and Resource Definitions
    • Periodic Heartbeat Checks
    • Shut Down: Resource Cleanup
  • No advanced features
    • Checkpointing
    • Application Master failures
    • Dynamic resource allocation
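The lifecycle listed above has three phases: start-up RPCs and resource definitions, periodic heartbeat checks, and resource cleanup at shutdown. It can be sketched with only the JDK. Everything here is hypothetical. `LifecycleSketch`, `FakeAppMaster`, and their methods are invented names that loosely follow the `startUp`/`shutDown` hook style of the Guava service classes mentioned on slide 17. The "containers" are a simple counter, not real YARN resources.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the three lifecycle phases; no YARN APIs are used.
class LifecycleDemo {
    public static void main(String[] args) {
        FakeAppMaster am = new FakeAppMaster(2);
        am.run(10);
        am.events.forEach(System.out::println);
    }
}

abstract class LifecycleSketch {
    final List<String> events = new ArrayList<>();

    abstract void startUp();       // e.g. RPCs to register and define resources
    abstract boolean heartbeat();  // periodic status check; true when finished
    abstract void shutDown();      // resource cleanup

    // Runs start-up, then heartbeats until done; shutDown() always runs once
    // start-up has succeeded, even if a heartbeat throws or is interrupted.
    final void run(long heartbeatMillis) {
        startUp();
        try {
            while (!heartbeat()) {
                try {
                    Thread.sleep(heartbeatMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag
                    break;                              // stop monitoring early
                }
            }
        } finally {
            shutDown();
        }
    }
}

// Stand-in Application Master: one simulated container finishes per heartbeat.
class FakeAppMaster extends LifecycleSketch {
    private int running;

    FakeAppMaster(int containers) { this.running = containers; }

    @Override void startUp() { events.add("start: requested " + running + " containers"); }

    @Override boolean heartbeat() {
        events.add("heartbeat: " + running + " running");
        running--;
        return running <= 0;
    }

    @Override void shutDown() { events.add("stop: released resources"); }
}
```

The key design choice is placing `shutDown()` in a `finally` block. Cleanup then runs whether monitoring ends normally, throws, or is interrupted. Running `LifecycleDemo` prints the start event, two heartbeats, and the stop event.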

Hadoop Summit 2012 | BranchReduce: Distributed Branch-and-Bound on YARN Presentation Transcript

  • 1. BranchReduce: Distributed Branch-and-Bound on YARN, June 14, 2012
  • 2. About Me. Copyright 2012 Cloudera Inc. All rights reserved.
  • 3. Hadoop Distributed Processing Frameworks
  • 4. Lots of Other Parallel Processing Platforms
  • 5. Hadoop 2.0: Resource Scheduling with YARN
  • 6. The Data Deluge and the Cambrian Explosion
  • 7. Parallel Distributed Processing For Everyone
  • 8. Building a New Processing Framework on YARN
  • 9. A Terrifyingly Accurate Paraphrasing of JWZ: Some people, when confronted with a tedious problem, say, “I know, I’ll write a framework.” Now they have two tedious problems.
  • 10. On Designing Frameworks
  • 11. The Example YARN App: Distributed Shell
  • 12. Do We Need a New Programming Language for Developing YARN Applications?
  • 13. Do We Need a New Programming Language for Developing YARN Applications?
  • 14. Leverage Existing Frameworks • Popular RPC libraries with support for multiple languages • C++, Java, Python • We need to make it easy to deploy existing applications on YARN
  • 15. Kitten: Playing with YARN
  • 16. Design Pattern: The Unified Application Master • Contains business logic and YARN logic • Primary reason: Communication • Also: dynamic resource allocation • Develop our master/worker applications locally and then deploy them on YARN
  • 17. YARN Lifecycle Management as a Service • Specifically, extensions of Guava’s Service interface • YarnClientService • AppMasterService • Contains all of the logic for creating applications and keeping an eye on them
  • 18. Moving the Configuration Logic Out of Java
  • 19. Lua as a Configuration Language • Small and Simple • Looks like a configuration file • Functions are there when/if you need them • Inheritance • Don’t Repeat Yourself • Forgiving of undefined values • Java/C++ Integration
  • 20. First Kitten Utility: The cat Function
  • 21. Second Kitten Utility: The yarn Function
  • 22. BranchReduce
  • 23. Branch-and-Bound
  • 24. The Challenge of Parallel Branch-and-Bound: Unbalanced Search Space • Some branches are pruned quickly • Can be difficult to determine the best splits a priori • Easy to revert to a de facto single-threaded search
  • 25. The Solution: Work Stealing
  • 26. You Write Three Classes • A Task class that implements Writable • A GlobalState class that implements Writable and has a mergeWith(GlobalState other) method • A Processor class that defines: • execute(T task, BranchReduceContext<T, GlobalState> ctxt); • With optional initialize and cleanup methods • Configuration is done via BranchReduceJob
  • 27. Example: The Knapsack Problem
  • 28. 0-1 Integer Programming Problems • NP-Hard Resource Allocation Problem • Portfolio Optimization • Asset Securitization
  • 29. Problem Formulation: (Simplified) LP Format
  • 30. Questions? @josh_wills
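Slides 24 and 25 name the problem, an unbalanced search tree where some branches are pruned quickly and others run deep, and the remedy, work stealing. The transcript does not include the mechanism. The following sketch shows the general technique inside one JVM, with threads standing in for cluster machines. It is not BranchReduce's rebalancing code.

Each worker pushes and pops at the tail of its own deque, which gives a depth-first order. An idle worker steals from the head of another worker's deque, where the oldest entries sit. In a depth-first search, those entries tend to be the largest unexplored subtrees. The workload is an assumed, deliberately lopsided tree in which node `n` has children `n - 1` and `n - 2`. `WorkStealingSketch` and `countLeaves` are hypothetical names.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-process illustration of work stealing; not BranchReduce code.
// Node n of the toy tree has children n-1 and n-2, so the left side is much larger.
class WorkStealingSketch {

    static long countLeaves(int root, int workers) {
        @SuppressWarnings("unchecked")
        ConcurrentLinkedDeque<Integer>[] deques = new ConcurrentLinkedDeque[workers];
        for (int i = 0; i < workers; i++) deques[i] = new ConcurrentLinkedDeque<>();
        AtomicInteger pending = new AtomicInteger(1); // tasks queued or in progress
        AtomicLong leaves = new AtomicLong();
        deques[0].addLast(root);                      // all work starts on worker 0

        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int self = w;
            threads[w] = new Thread(() -> {
                while (pending.get() > 0) {
                    Integer n = deques[self].pollLast();              // own work, newest first
                    for (int v = 1; n == null && v < workers; v++) {
                        n = deques[(self + v) % workers].pollFirst(); // steal the oldest entry
                    }
                    if (n == null) {
                        Thread.onSpinWait();                          // nothing available yet
                        continue;
                    }
                    if (n < 2) {
                        leaves.incrementAndGet();
                    } else {
                        pending.addAndGet(2);         // register children before finishing parent
                        deques[self].addLast(n - 1);
                        deques[self].addLast(n - 2);
                    }
                    pending.decrementAndGet();        // this task is complete
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return leaves.get();
    }

    public static void main(String[] args) {
        System.out.println(countLeaves(20, 4)); // prints 10946
    }
}
```

Termination relies on ordering. A task's children are added to `pending` before the task itself is subtracted, so the counter reaches zero only after the whole tree has been processed. Stealing from the end of the deque opposite the owner's end also keeps owner and thief from competing for the same entries most of the time.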
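Slide 26 lists the three user classes and slide 27 introduces the knapsack example, but the transcript contains no code for either. The sketch below builds the same shape of program in a single process, under these assumptions:

- `Task` and `GlobalState` serialize themselves with the two methods of Hadoop's `org.apache.hadoop.io.Writable`, redeclared locally so the example compiles without Hadoop.
- `mergeWith` keeps the better incumbent.
- The processor's `execute` either prunes a node or emits its two children.

`LocalContext`, `emit`, `solve`, and `roundTrip` are invented for illustration. The real `BranchReduceContext` and `BranchReduceJob` APIs are not shown in the transcript and may differ.

```java
import java.io.*;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical single-process sketch of the three-class pattern from slide 26,
// applied to 0-1 knapsack. Not BranchReduce's actual API.
class KnapsackSketch {
    // Same two methods as org.apache.hadoop.io.Writable, redeclared to avoid the dependency.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // A search-tree node: items [0, next) have already been decided.
    static class Task implements Writable {
        int next, weight, value;
        Task() {}
        Task(int next, int weight, int value) { this.next = next; this.weight = weight; this.value = value; }
        public void write(DataOutput out) throws IOException {
            out.writeInt(next); out.writeInt(weight); out.writeInt(value);
        }
        public void readFields(DataInput in) throws IOException {
            next = in.readInt(); weight = in.readInt(); value = in.readInt();
        }
    }

    // Shared incumbent; merging keeps the better of two solution values.
    static class GlobalState implements Writable {
        int best;
        void mergeWith(GlobalState other) { best = Math.max(best, other.best); }
        public void write(DataOutput out) throws IOException { out.writeInt(best); }
        public void readFields(DataInput in) throws IOException { best = in.readInt(); }
    }

    // Invented stand-in for BranchReduceContext: a task stack plus the global state.
    static class LocalContext {
        final Deque<Task> pending = new ArrayDeque<>();
        final GlobalState state = new GlobalState();
        void emit(Task t) { pending.push(t); }
    }

    // Assumes items are sorted by value/weight ratio, highest first, and values are >= 0.
    static class Processor {
        final int[] weights, values;
        final int capacity;
        Processor(int[] weights, int[] values, int capacity) {
            this.weights = weights; this.values = values; this.capacity = capacity;
        }

        // Upper bound from the LP relaxation: fill greedily, allowing one fractional item.
        double bound(Task t) {
            double total = t.value;
            int room = capacity - t.weight;
            for (int i = t.next; i < weights.length && room > 0; i++) {
                int take = Math.min(room, weights[i]);
                total += (double) values[i] * take / weights[i];
                room -= take;
            }
            return total;
        }

        void execute(Task t, LocalContext ctx) {
            if (t.weight > capacity) return;                     // infeasible: discard
            ctx.state.best = Math.max(ctx.state.best, t.value);  // feasible: maybe a new incumbent
            if (t.next == weights.length || bound(t) <= ctx.state.best) return; // leaf or pruned
            int i = t.next;
            ctx.emit(new Task(i + 1, t.weight, t.value));                          // skip item i
            ctx.emit(new Task(i + 1, t.weight + weights[i], t.value + values[i])); // take item i
        }
    }

    static int solve(int[] weights, int[] values, int capacity) {
        Processor processor = new Processor(weights, values, capacity);
        LocalContext ctx = new LocalContext();
        ctx.emit(new Task(0, 0, 0));
        while (!ctx.pending.isEmpty()) processor.execute(ctx.pending.pop(), ctx);
        return ctx.state.best;
    }

    // Serializes a task and reads it back, as shipping it to another worker would.
    static Task roundTrip(Task t) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            t.write(new DataOutputStream(bytes)); // DataOutputStream writes straight through
            Task copy = new Task();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Items already sorted by value/weight: 60/10, 100/20, 120/30.
        System.out.println(solve(new int[] {10, 20, 30}, new int[] {60, 100, 120}, 50)); // prints 220
    }
}
```

The pruning bound is the fractional (LP-relaxation) knapsack bound. It is a valid upper bound only when items are ordered by value-to-weight ratio, which is why the sketch requires sorted input. In a distributed run each worker would hold its own `GlobalState`, and `mergeWith` is where their incumbents would be combined. With a single local context, that method is exercised only directly.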
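Slide 28 frames knapsack as one of a family of 0-1 integer programming problems. The transcript does not reproduce the simplified LP-format input from slide 29, so for reference, here is the standard textbook form of a 0-1 integer program. The knapsack problem is its special case with a single constraint (m = 1): c_j is the value of item j, a_{1j} its weight, and b_1 the capacity. Replacing x_j ∈ {0, 1} with 0 ≤ x_j ≤ 1 gives the LP relaxation, whose optimum is a valid upper bound for pruning a maximization search.

```latex
\begin{aligned}
\text{maximize}   \quad & \sum_{j=1}^{n} c_j x_j \\
\text{subject to} \quad & \sum_{j=1}^{n} a_{ij} x_j \le b_i, && i = 1, \dots, m, \\
                        & x_j \in \{0, 1\},                   && j = 1, \dots, n.
\end{aligned}
```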