Hadoop Summit 2012 | BranchReduce: Distributed Branch-and-Bound on YARN

Session Abstract

Branch-and-bound is a widely used technique for efficiently searching for solutions to combinatorial optimization problems. In this session, we will introduce BranchReduce, an open-source Java library for performing distributed branch-and-bound on a Hadoop cluster under YARN. Applications only need to write code that is specific to their optimization problem (namely the branching rule, the lower bound computation, and the upper bound computation), and BranchReduce handles deploying the application to the cluster, managing the execution, and periodically rebalancing the search space across the machines. We will give an overview of how BranchReduce works and then walk through an example that solves a scheduling problem with a near-linear speedup over a single-machine implementation.
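To make the three problem-specific pieces concrete, here is a minimal single-machine sketch for the 0-1 knapsack problem. It is not BranchReduce code: the class name `KnapsackBranchAndBound` and its methods are made up for illustration, and it assumes positive item weights. Because knapsack is a maximization problem, the roles are mirrored relative to the abstract's wording: the relaxation gives an optimistic upper bound, and the best solution found so far (the incumbent) acts as the lower bound.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Illustrative sketch only; not part of BranchReduce.
class KnapsackBranchAndBound {

    /** A search-tree node: items [0, next) are decided. */
    static final class Node {
        final int next;   // index of the next undecided item
        final int weight; // total weight of the items taken so far
        final int value;  // total value of the items taken so far
        Node(int next, int weight, int value) {
            this.next = next;
            this.weight = weight;
            this.value = value;
        }
    }

    /** Optimistic bound: fill the remaining capacity greedily, allowing one fractional item. */
    static double optimisticBound(Node n, int[] values, int[] weights, int capacity) {
        double bound = n.value;
        int remaining = capacity - n.weight;
        for (int i = n.next; i < values.length && remaining > 0; i++) {
            if (weights[i] <= remaining) {
                bound += values[i];
                remaining -= weights[i];
            } else {
                bound += (double) values[i] * remaining / weights[i];
                remaining = 0;
            }
        }
        return bound;
    }

    static int solve(int[] values, int[] weights, int capacity) {
        // Sort items by value density (descending) so the greedy fractional bound is valid.
        Integer[] order = new Integer[values.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) ->
                Long.compare((long) values[b] * weights[a], (long) values[a] * weights[b]));
        int[] v = new int[values.length];
        int[] w = new int[values.length];
        for (int i = 0; i < order.length; i++) {
            v[i] = values[order[i]];
            w[i] = weights[order[i]];
        }

        int best = 0; // incumbent: value of the best feasible selection seen so far
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(new Node(0, 0, 0));
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.value > best) best = n.value;  // every node on the stack is feasible
            if (n.next == v.length) continue;    // leaf: nothing left to decide
            if (optimisticBound(n, v, w, capacity) <= best) continue; // prune
            // Branching rule: either skip the next item or take it (if it fits).
            stack.push(new Node(n.next + 1, n.weight, n.value));
            if (n.weight + w[n.next] <= capacity) {
                stack.push(new Node(n.next + 1, n.weight + w[n.next], n.value + v[n.next]));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(solve(new int[]{60, 100, 120}, new int[]{10, 20, 30}, 50));
    }
}
```

In a distributed setting, the stack of open nodes is what would be split and rebalanced across machines, while the incumbent is the piece of state that has to be shared.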

Published in: Technology, Education

Notes
  • 1600+ Lines of Java Code
      • 796 Client, 837 Application Master
  • Relatively Simple Lifecycle Patterns
      • Start Up: RPCs and Resource Definitions
      • Periodic Heartbeat Checks
      • Shut Down: Resource Cleanup
  • No advanced features
      • Checkpointing
      • Application Master failures
      • Dynamic resource allocation

    1. BranchReduce: Distributed Branch-and-Bound on YARN (June 14, 2012)
    2. About Me Copyright 2012 Cloudera Inc. All rights reserved
    3. Hadoop Distributed Processing Frameworks
    4. Lots of Other Parallel Processing Platforms
    5. Hadoop 2.0: Resource Scheduling with YARN
    6. The Data Deluge and the Cambrian Explosion
    7. Parallel Distributed Processing For Everyone
    8. Building a New Processing Framework on YARN
    9. A Terrifyingly Accurate Paraphrasing of JWZ: Some people, when confronted with a tedious problem, say, “I know, I’ll write a framework.” Now they have two tedious problems.
    10. On Designing Frameworks
    11. The Example YARN App: Distributed Shell
    12. Do We Need a New Programming Language for Developing YARN Applications?
    13. Do We Need a New Programming Language for Developing YARN Applications?
    14. Leverage Existing Frameworks
        • Popular RPC libraries with support for multiple languages
        • C++, Java, Python
        • We need to make it easy to deploy existing applications on YARN
    15. Kitten: Playing with YARN
    16. Design Pattern: The Unified Application Master
        • Contains business logic and YARN logic
        • Primary reason: Communication
        • Also: dynamic resource allocation
        • Develop our master/worker applications locally and then deploy them on YARN
    17. YARN Lifecycle Management as a Service
        • Specifically, extensions of Guava’s Service interface
        • YarnClientService
        • AppMasterService
        • Contains all of the logic for creating applications and keeping an eye on them
    18. Moving the Configuration Logic Out of Java
    19. Lua as a Configuration Language
        • Small and Simple
        • Looks like a configuration file
        • Functions are there when/if you need them
        • Inheritance
        • Don’t Repeat Yourself
        • Forgiving of undefined values
        • Java/C++ Integration
    20. First Kitten Utility: The cat Function
    21. Second Kitten Utility: The yarn Function
    22. BranchReduce
    23. Branch-and-Bound
    24. The Challenge of Parallel Branch and Bound: Unbalanced Search Space
        • Some branches are pruned quickly
        • Can be difficult to determine the best splits a priori
        • Easy to revert to a de facto single-threaded search
    25. The Solution: Work Stealing
    26. You Write Three Classes
        • A Task class that implements Writable
        • A GlobalState class that implements Writable and has a mergeWith(GlobalState other) method
        • A Processor class that defines:
            • execute(T task, BranchReduceContext<T, GlobalState> ctxt);
            • With optional initialize and cleanup methods
        • Configuration is done via BranchReduceJob
    27. Example: The Knapsack Problem
    28. 0-1 Integer Programming Problems
        • NP-Hard Resource Allocation Problem
        • Portfolio Optimization
        • Asset Securitization
    29. Problem Formulation: (Simplified) LP Format
    30. Questions? @josh_wills
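
The three-class split described on slide 26 (a Task, a GlobalState with a mergeWith method, and a Processor with an execute method) can be sketched without Hadoop or BranchReduce on the classpath. Everything below is a local stand-in written for this example: the `Context` interface and its `emit`, `globalState`, and `offer` methods are hypothetical and are not the real `BranchReduceContext` API, the classes do not implement Hadoop's `Writable`, and `run` is a single-threaded loop standing in for whatever scheduling the framework does. The toy problem, chosen only to keep the code short, is to pick non-negative items whose sum is as large as possible without exceeding a target.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative stand-ins only: none of these types are the real BranchReduce or Hadoop classes.
class ThreeClassSketch {

    /** Hypothetical context handed to execute(); the method names are assumptions. */
    interface Context<T, G> {
        void emit(T childTask);   // give a new subproblem back to the framework
        G globalState();          // read the best-known shared state
        void offer(G candidate);  // propose a state to be merged into the shared one
    }

    /** Task: items [0, index) are decided and the chosen ones add up to sum. */
    static final class Task {
        final int index;
        final int sum;
        Task(int index, int sum) {
            this.index = index;
            this.sum = sum;
        }
    }

    /** GlobalState: the best sum found so far; merging keeps the larger value. */
    static final class GlobalState {
        int best;
        GlobalState(int best) { this.best = best; }
        void mergeWith(GlobalState other) { best = Math.max(best, other.best); }
    }

    /** Processor: bounds one task, then either prunes it or branches into two children. */
    static final class Processor {
        private final int[] items;
        private final int target;
        private final int[] suffix; // suffix[i] = items[i] + ... + items[n - 1]

        Processor(int[] items, int target) {
            this.items = items;
            this.target = target;
            this.suffix = new int[items.length + 1];
            for (int i = items.length - 1; i >= 0; i--) {
                suffix[i] = suffix[i + 1] + items[i];
            }
        }

        void execute(Task task, Context<Task, GlobalState> ctxt) {
            if (task.sum > ctxt.globalState().best) {
                ctxt.offer(new GlobalState(task.sum)); // new incumbent
            }
            if (task.index == items.length) return;    // leaf
            // Even taking every remaining item cannot beat the incumbent: prune.
            if (task.sum + suffix[task.index] <= ctxt.globalState().best) return;
            ctxt.emit(new Task(task.index + 1, task.sum)); // branch 1: skip the item
            int taken = task.sum + items[task.index];
            if (taken <= target) {
                ctxt.emit(new Task(task.index + 1, taken)); // branch 2: take the item
            }
        }
    }

    /** Single-threaded driver standing in for the framework's scheduling. */
    static int run(int[] items, int target) {
        final Deque<Task> pending = new ArrayDeque<>();
        final GlobalState shared = new GlobalState(0);
        Context<Task, GlobalState> ctxt = new Context<Task, GlobalState>() {
            public void emit(Task t) { pending.push(t); }
            public GlobalState globalState() { return shared; }
            public void offer(GlobalState g) { shared.mergeWith(g); }
        };
        Processor processor = new Processor(items, target);
        pending.push(new Task(0, 0));
        while (!pending.isEmpty()) {
            processor.execute(pending.pop(), ctxt);
        }
        return shared.best;
    }
}
```

Keeping the merge logic inside `mergeWith` means the driver never needs to know what "better" means for a given problem, which is presumably why the slide asks for it on the GlobalState class.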
