BranchReduce: Distributed Branch-and-Bound on YARN (June 14, 2012)
Hadoop Summit 2012 | BranchReduce: Distributed Branch-and-Bound on YARN


Session Abstract

Branch-and-bound is a widely used technique for efficiently searching for solutions to combinatorial optimization problems. In this session, we will introduce BranchReduce, an open-source Java library for performing distributed branch-and-bound on a Hadoop cluster under YARN. Applications only need to write code that is specific to their optimization problem (namely the branching rule, the lower bound computation, and the upper bound computation), and BranchReduce handles deploying the application to the cluster, managing the execution, and periodically rebalancing the search space across the machines. We will give an overview of how BranchReduce works and then walk through an example that solves a scheduling problem with a near-linear speedup over a single-machine implementation.
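As a rough illustration of the three problem-specific pieces named in the abstract, here is a minimal single-machine sketch for the 0-1 knapsack problem. It does not use BranchReduce, the class and method names are hypothetical, and item weights are assumed to be positive. Because knapsack is a maximization problem, the fractional relaxation supplies the upper bound used for pruning, and the best feasible value found so far serves as the lower bound.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Deque;

/** Sequential branch-and-bound for 0-1 knapsack (illustrative sketch only). */
class KnapsackBranchAndBound {

    /** A search-tree node: items [0, next) are decided, the rest are open. */
    static final class Node {
        final int next;   // index of the next undecided item
        final int value;  // total value of the items taken so far
        final int weight; // total weight of the items taken so far
        Node(int next, int value, int weight) {
            this.next = next;
            this.value = value;
            this.weight = weight;
        }
    }

    /** Returns the best achievable value. Assumes every weight is positive. */
    static int solve(int[] values, int[] weights, int capacity) {
        int n = values.length;
        // Order items by value density (descending) so the fractional bound is valid.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order,
                Comparator.comparingDouble((Integer i) -> -(double) values[i] / weights[i]));
        int[] v = new int[n];
        int[] w = new int[n];
        for (int k = 0; k < n; k++) {
            v[k] = values[order[k]];
            w[k] = weights[order[k]];
        }

        int best = 0; // lower bound: value of the best feasible solution so far
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(new Node(0, 0, 0));
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            best = Math.max(best, node.value); // every node on the stack is feasible
            if (node.next == n) continue;      // leaf: nothing left to decide
            if (upperBound(node, v, w, capacity) <= best) continue; // prune
            // Branching rule: either skip the next item or take it (if it fits).
            int item = node.next;
            stack.push(new Node(item + 1, node.value, node.weight));
            if (node.weight + w[item] <= capacity) {
                stack.push(new Node(item + 1, node.value + v[item], node.weight + w[item]));
            }
        }
        return best;
    }

    /** Upper bound: fill the remaining capacity greedily, allowing one fractional item. */
    static double upperBound(Node node, int[] v, int[] w, int capacity) {
        double bound = node.value;
        int room = capacity - node.weight;
        for (int i = node.next; i < v.length && room > 0; i++) {
            if (w[i] <= room) {
                bound += v[i];
                room -= w[i];
            } else {
                bound += (double) v[i] * room / w[i];
                room = 0;
            }
        }
        return bound;
    }
}
```

With values {60, 100, 120}, weights {10, 20, 30}, and capacity 50, `solve` returns 220 (the second and third items).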

Published in: Technology, Education
  • 1600+ lines of Java code
      • 796 client, 837 Application Master
  • Relatively simple lifecycle patterns
      • Start up: RPCs and resource definitions
      • Periodic heartbeat checks
      • Shut down: resource cleanup
  • No advanced features
      • Checkpointing
      • Application Master failures
      • Dynamic resource allocation
  • Transcript of "Hadoop Summit 2012 | BranchReduce: Distributed Branch-and-Bound on YARN"

    1. BranchReduce: Distributed Branch-and-Bound on YARN, June 14, 2012
    2. About Me (Copyright 2012 Cloudera Inc. All rights reserved)
    3. Hadoop Distributed Processing Frameworks
    4. Lots of Other Parallel Processing Platforms
    5. Hadoop 2.0: Resource Scheduling with YARN
    6. The Data Deluge and the Cambrian Explosion
    7. Parallel Distributed Processing For Everyone
    8. Building a New Processing Framework on YARN
    9. A Terrifyingly Accurate Paraphrasing of JWZ: Some people, when confronted with a tedious problem, say, “I know, I’ll write a framework.” Now they have two tedious problems.
    10. On Designing Frameworks
    11. The Example YARN App: Distributed Shell
    12. Do We Need a New Programming Language for Developing YARN Applications?
    13. Do We Need a New Programming Language for Developing YARN Applications?
    14. Leverage Existing Frameworks
        • Popular RPC libraries with support for multiple languages
            • C++, Java, Python
        • We need to make it easy to deploy existing applications on YARN
    15. Kitten: Playing with YARN
    16. Design Pattern: The Unified Application Master
        • Contains business logic and YARN logic
        • Primary reason: Communication
        • Also: dynamic resource allocation
        • Develop our master/worker applications locally and then deploy them on YARN
    17. YARN Lifecycle Management as a Service
        • Specifically, extensions of Guava’s Service interface
            • YarnClientService
            • AppMasterService
        • Contains all of the logic for creating applications and keeping an eye on them
    18. Moving the Configuration Logic Out of Java
    19. Lua as a Configuration Language
        • Small and Simple
            • Looks like a configuration file
            • Functions are there when/if you need them
        • Inheritance
            • Don’t Repeat Yourself
            • Forgiving of undefined values
        • Java/C++ Integration
    20. First Kitten Utility: The cat Function
    21. Second Kitten Utility: The yarn Function
    22. BranchReduce
    23. Branch-and-Bound
    24. The Challenge of Parallel Branch and Bound: Unbalanced Search Space
        • Some branches are pruned quickly
        • Can be difficult to determine the best splits a priori
        • Easy to revert to a de facto single-threaded search
    25. The Solution: Work Stealing
    26. You Write Three Classes
        • A Task class that implements Writable
        • A GlobalState class that implements Writable and has a mergeWith(GlobalState other) method
        • A Processor class that defines:
            • execute(T task, BranchReduceContext<T, GlobalState> ctxt);
            • With optional initialize and cleanup methods
        • Configuration is done via BranchReduceJob
    27. Example: The Knapsack Problem
    28. 0-1 Integer Programming Problems
        • NP-Hard Resource Allocation Problem
        • Portfolio Optimization
        • Asset Securitization
    29. Problem Formulation: (Simplified) LP Format
    30. Questions? @josh_wills
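The "You Write Three Classes" slide can be sketched roughly as follows. This is a toy under stated assumptions, not BranchReduce's actual API: the `Writable` interface below is a local stand-in that mirrors the two methods of Hadoop's `Writable`, the `Context` interface and its `emit`, `globalState`, and `update` methods are hypothetical placeholders for `BranchReduceContext` (whose methods the slides do not show), and `runLocally` is a single-threaded driver standing in for the framework. Configuration via `BranchReduceJob` is omitted because the slides do not describe it. The toy problem is to find the largest subset sum of `ITEMS` that does not exceed `LIMIT`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

class ThreeClassesSketch {
    /** Local stand-in mirroring the two methods of Hadoop's Writable. */
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Hypothetical stand-in for the context passed to execute(). */
    interface Context<T, G> {
        void emit(T task);        // hypothetical: queue a child task
        G globalState();          // hypothetical: read the shared state
        void update(G candidate); // hypothetical: merge a candidate into it
    }

    static final int[] ITEMS = {8, 6, 5, 3};
    static final int LIMIT = 12;

    /** Task: items [0, next) are decided and the chosen ones add up to sum. */
    static final class Task implements Writable {
        int next;
        int sum;
        Task() {}
        Task(int next, int sum) { this.next = next; this.sum = sum; }
        public void write(DataOutput out) throws IOException {
            out.writeInt(next);
            out.writeInt(sum);
        }
        public void readFields(DataInput in) throws IOException {
            next = in.readInt();
            sum = in.readInt();
        }
    }

    /** GlobalState: best feasible sum seen so far; merging keeps the larger one. */
    static final class GlobalState implements Writable {
        int best;
        GlobalState() {}
        GlobalState(int best) { this.best = best; }
        void mergeWith(GlobalState other) { best = Math.max(best, other.best); }
        public void write(DataOutput out) throws IOException { out.writeInt(best); }
        public void readFields(DataInput in) throws IOException { best = in.readInt(); }
    }

    /** Processor: report the task's sum, bound it, then branch into child tasks. */
    static final class Processor {
        void execute(Task task, Context<Task, GlobalState> ctxt) {
            ctxt.update(new GlobalState(task.sum)); // every task is feasible
            int remaining = 0;
            for (int i = task.next; i < ITEMS.length; i++) remaining += ITEMS[i];
            // Bound: no completion can beat min(LIMIT, sum + remaining).
            // Leaves have remaining == 0, so they always stop here.
            if (Math.min(LIMIT, task.sum + remaining) <= ctxt.globalState().best) return;
            ctxt.emit(new Task(task.next + 1, task.sum)); // skip the item
            if (task.sum + ITEMS[task.next] <= LIMIT) {   // take the item if it fits
                ctxt.emit(new Task(task.next + 1, task.sum + ITEMS[task.next]));
            }
        }
    }

    /** Single-threaded driver standing in for the framework. */
    static int runLocally() {
        final GlobalState global = new GlobalState(0);
        final Deque<Task> queue = new ArrayDeque<>();
        Context<Task, GlobalState> ctxt = new Context<Task, GlobalState>() {
            public void emit(Task task) { queue.push(task); }
            public GlobalState globalState() { return global; }
            public void update(GlobalState candidate) { global.mergeWith(candidate); }
        };
        queue.push(new Task(0, 0));
        Processor processor = new Processor();
        while (!queue.isEmpty()) processor.execute(queue.pop(), ctxt);
        return global.best;
    }

    /** Serializes a task and reads it back, as a framework would when shipping it. */
    static Task roundTrip(Task task) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            task.write(new DataOutputStream(bytes));
            Task copy = new Task();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping `Task` and `GlobalState` serializable is what would let a framework ship tasks between machines and merge partial results; in this sketch `roundTrip` only demonstrates that the two `Writable` methods are symmetric.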

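The work-stealing idea named in the slides can be illustrated with a single-process sketch in which threads stand in for cluster workers. The slides do not describe BranchReduce's actual rebalancing protocol, so everything here is an assumption: the class name, the per-worker deques, and the `pending` counter used to detect termination are all hypothetical. The workload is a deliberately unbalanced tree in which a node of size n has children of size n-1 and n-2, and all of it starts on worker 0.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** In-process work-stealing sketch; threads stand in for cluster workers. */
class WorkStealingSketch {

    /** Visits every node of the unbalanced tree and returns how many were visited. */
    static long countNodes(int rootSize, int workers) {
        @SuppressWarnings("unchecked")
        ConcurrentLinkedDeque<Integer>[] deques = new ConcurrentLinkedDeque[workers];
        for (int i = 0; i < workers; i++) deques[i] = new ConcurrentLinkedDeque<>();

        AtomicInteger pending = new AtomicInteger(1); // tasks queued or in progress
        AtomicLong visited = new AtomicLong();
        deques[0].addFirst(rootSize); // all work starts on worker 0

        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int me = w;
            threads[w] = new Thread(() -> {
                while (pending.get() > 0) {
                    // Own work first, newest task first (depth-first order).
                    Integer task = deques[me].pollFirst();
                    // Otherwise steal the oldest task from another worker's deque.
                    for (int k = 1; task == null && k < workers; k++) {
                        task = deques[(me + k) % workers].pollLast();
                    }
                    if (task == null) {
                        Thread.yield(); // nothing to do right now; try again
                        continue;
                    }
                    visited.incrementAndGet();
                    if (task >= 2) {
                        // Count the children before retiring the parent so that
                        // pending cannot reach zero while work remains.
                        pending.addAndGet(2);
                        deques[me].addFirst(task - 1);
                        deques[me].addFirst(task - 2);
                    }
                    pending.decrementAndGet();
                }
            });
            threads[w].start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return visited.get();
    }
}
```

Stealing from the opposite end of the victim's deque tends to hand over older tasks, which in a search tree are usually the larger subtrees, so a single steal moves a meaningful share of the work.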