REEF: Retainable Evaluator Execution Framework

Video and slides synchronized; MP3 and slide download available at http://bit.ly/1jaRzhu.

Rusty Sears introduces REEF along with examples of computational frameworks, including interactive sessions, iterative graph processing, bulk synchronous computations, Hive queries, and MapReduce. Filmed at qconsf.com.

Rusty Sears is a member of Microsoft's Cloud Information Services Lab, where he works on infrastructure for large-scale hosted services. In addition to his work on REEF, he has an interest in log-structured indexing and persistent storage for serving workloads. Prior to Microsoft, he worked on backend storage and services for mobile and large-scale applications at Yahoo! Research.

Published in: Technology, Education

REEF: Retainable Evaluator Execution Framework

  1. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/reef InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month
  2. Presented at QCon San Francisco (www.qconsf.com) • Purpose of QCon: to empower software development by facilitating the spread of knowledge and innovation • Strategy: practitioner-driven conference designed for YOU, influencers of change and innovation in your teams; speakers and topics driving the evolution and innovation; connecting and catalyzing the influencers and innovators • Highlights: attended by more than 12,000 delegates since 2007; held in 9 cities worldwide
  3. True multi-tenancy… • Unified realtime-batch workflows • In-situ processing • Utilization: one cluster for scientists and production • …but only for sophisticated apps
  4. True multi-tenancy… • Unified realtime-batch workflows • In-situ processing • Utilization: one cluster for scientists and production • …but only for sophisticated apps • Fault tolerance • Pre-emption • Elasticity
  5. (Diagram of σ, π, and ⋈ operators) • Checkpointing • Fault tolerance • Elasticity
  6. Checkpointing • Fault tolerance • Elasticity • Iterative computations
  7. Checkpointing • Fault tolerance • Elasticity • Iterative computations • Low latency communication
  8. (Diagram of σ, π, and ⋈ operators) • Tedious: users write code to dump + load data at each step • Slow: data unnecessarily written to disk, read back (and re-parsed) at each step • Hard to build: each duplicates the same mechanisms under the hood
  9. Support YARN versions of new (and existing) scalable data pipelines. Allow them to be transparently composed. Move redundant tooling and plumbing into shared libraries.
  10. YARN handles resource management (security, quotas, priorities) • Per-job Drivers request resources, coordinate computations, and handle faults, preemption, etc. • REEF Evaluators hold hardware resources, allowing multiple Activities (σ, π, etc.) to use the same cached state
  11. Handover of pre-partitioned and parsed data between frameworks • Iterative computation • Interactive queries • $…
  12. Thread per connection / file doesn’t scale • Provide static subset of Rx → static checking of event flows → aggressive JVM event inlining • Latency, throughput profiler
  13. Fault-tolerant async communication • Group communication / shuffle • Low-latency communication • Storage, checkpointing, preemption
  14. Configuring distributed systems is hard • So is reasoning about event flows • Tang performs static and dynamic checks to help ease the pain
  15. Configuring distributed systems is hard • So is reasoning about event flows • Tang performs static and dynamic checks to help ease the pain • container-4872364523847-02.stderr: NullPointerException at: java…eval():1234 ShellActivity.helper():546 ShellActivity.onNext():789 YarnEvaluator.onNext():12 • Error: Unknown parameter “Command”; Missing required parameter “cmd” • Error: Required instanceof Evaluator; Got ShellActivity
  16. sears@microsoft.com
  17. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/reef
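
Slides 8 and 10 contrast dumping and re-parsing data at every step with Evaluators that keep state cached between Activities. The following is a minimal Python sketch of that idea only. It does not use the REEF API: the `Evaluator` class, its `submit` method, and the three activity functions are hypothetical names invented for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical names throughout; this is not the REEF API.
class Evaluator:
    """Holds a resource (here, a dict of cached state) across activities."""

    def __init__(self) -> None:
        self.cache: Dict[str, Any] = {}
        self.activities_run = 0

    def submit(self, activity: Callable[[Dict[str, Any]], Any]) -> Any:
        # Every activity receives the same cache, so later activities
        # can reuse whatever earlier ones left behind.
        self.activities_run += 1
        return activity(self.cache)


parse_count = 0  # counts how many raw records were parsed in total


def load(cache: Dict[str, Any]) -> int:
    """Parse raw records once and keep the parsed rows in the evaluator."""
    global parse_count
    raw = ["1,apple", "2,banana", "3,cherry"]
    rows: List[Tuple[int, str]] = []
    for line in raw:
        key, name = line.split(",")
        rows.append((int(key), name))
        parse_count += 1
    cache["rows"] = rows
    return len(rows)


def select_odd(cache: Dict[str, Any]) -> int:
    """A selection (sigma) over the cached rows; nothing is re-parsed."""
    cache["rows"] = [row for row in cache["rows"] if row[0] % 2 == 1]
    return len(cache["rows"])


def project_names(cache: Dict[str, Any]) -> List[str]:
    """A projection (pi) over whatever the previous activity left."""
    return [name for _, name in cache["rows"]]


evaluator = Evaluator()
loaded = evaluator.submit(load)
kept = evaluator.submit(select_odd)
names = evaluator.submit(project_names)
```

In this sketch each record is parsed exactly once, by `load`; the selection and projection steps work on the in-memory rows rather than on data written out and read back between steps.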
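
Slides 14 and 15 say that Tang checks configurations up front so that mistakes surface as readable errors rather than as a stack trace from a remote container. The sketch below illustrates that general approach in Python under stated assumptions. It is not Tang's API: `check_config` and `shell_activity_params` are hypothetical, and the message wording is only loosely modelled on the slide's "Missing required parameter" text.

```python
from typing import Dict, List, Set


def check_config(required: Set[str], provided: Dict[str, str]) -> List[str]:
    """Return readable errors before launch instead of failing at runtime."""
    given = set(provided)
    errors: List[str] = []
    # Names that were supplied but that the component never declared.
    for name in sorted(given - required):
        errors.append(f'Unknown parameter "{name}"')
    # Names the component declared as required but that were not supplied.
    for name in sorted(required - given):
        errors.append(f'Missing required parameter "{name}"')
    return errors


# Hypothetical activity that declares a single required parameter, "cmd".
shell_activity_params = {"cmd"}

bad = check_config(shell_activity_params, {"Command": "ls"})
good = check_config(shell_activity_params, {"cmd": "ls"})
```

Here `bad` holds two messages (one for the unknown name, one for the missing name) and `good` is empty, so a launcher could refuse to start a job whenever the returned list is non-empty.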
