REEF: Retainable Evaluator Execution Framework

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1jaRzhu.

Rusty Sears introduces REEF along with examples of computational frameworks, including interactive sessions, iterative graph processing, bulk synchronous computations, Hive queries, and MapReduce. Filmed at qconsf.com.

Rusty Sears is a member of Microsoft's Cloud Information Services Lab, where he works on infrastructure for large-scale hosted services. In addition to his work on REEF, he has an interest in log-structured indexing and persistent storage for serving workloads. Prior to Microsoft, he worked on backend storage and services for mobile and large-scale applications at Yahoo! Research.

Transcript

  • 1. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/reef InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month
  • 2. Presented at QCon San Francisco www.qconsf.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 3. True multi-tenancy… unified realtime-batch workflows, in-situ processing, utilization (one cluster for scientists and production) …but only for sophisticated apps.
  • 4. True multi-tenancy… unified realtime-batch workflows, in-situ processing, utilization (one cluster for scientists and production) …but only for sophisticated apps. Fault tolerance, pre-emption, elasticity.
  • 5. [Diagram: σ, π, and ⋈ operators] Checkpointing, fault tolerance, elasticity.
  • 6. Checkpointing, fault tolerance, elasticity, iterative computations.
  • 7. Checkpointing, fault tolerance, elasticity, iterative computations, low-latency communication.
  • 8. [Diagram: σ, π, and ⋈ operators] Tedious: users write code to dump and load data at each step. Slow: data is unnecessarily written to disk and read back (and re-parsed) at each step. Hard to build: each framework duplicates the same mechanisms under the hood.
  • 9. Support YARN versions of new (and existing) scalable data pipelines. Allow them to be transparently composed. Move redundant tooling and plumbing into shared libraries.
  • 10. YARN handles resource management (security, quotas, priorities). Per-job Drivers request resources, coordinate computations, and handle faults, preemption, etc. REEF Evaluators hold hardware resources, allowing multiple Activities (σ, π, etc.) to use the same cached state.
  • 11. Handover of pre-partitioned and parsed data between frameworks; iterative computation; interactive queries; $…
  • 12. Thread-per-connection / thread-per-file doesn't scale. Provide a static subset of Rx → static checking of event flows → aggressive JVM event inlining. Latency and throughput profiler.
  • 13. Fault-tolerant async communication; group communication / shuffle; low-latency communication; storage, checkpointing, preemption.
  • 14. Configuring distributed systems is hard. So is reasoning about event flows. Tang performs static and dynamic checks to help ease the pain.
  • 15. Configuring distributed systems is hard. So is reasoning about event flows. Tang performs static and dynamic checks to help ease the pain. [Slide graphic showing error messages: a runtime trace, "container-4872364523847-02.stderr: NullPointerException at: java…eval():1234, ShellActivity.helper():546, ShellActivity.onNext():789, YarnEvaluator.onNext():12"; and configuration errors, "Unknowninstanceof Evaluator", "Got ShellActivity", "Required parameter “Command”", "Missing required parameter “cmd”".]
  • 16. sears@microsoft.com
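Slides 8 and 10 contrast pipelines that dump and reload data at every step with Evaluators that keep cached state for several Activities. The Java sketch below illustrates that contrast under simplifying assumptions: `RetainedStateDemo`, its nested `Evaluator` class, and the "activity" methods are hypothetical names invented for this example, not REEF's API, and an in-memory string stands in for data that would actually be written to and read back from disk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only (not REEF's API): contrasts re-parsing text at
// every step with parsing once and keeping the result in a long-lived object.
class RetainedStateDemo {
    // Counts how many times raw text has been parsed into numbers.
    static int parseCount = 0;

    static List<Integer> parse(String text) {
        parseCount++;
        List<Integer> out = new ArrayList<>();
        for (String tok : text.trim().split("\\s+")) {
            out.add(Integer.parseInt(tok));
        }
        return out;
    }

    static String dump(List<Integer> values) {
        StringBuilder sb = new StringBuilder();
        for (int v : values) {
            sb.append(v).append(' ');
        }
        return sb.toString();
    }

    // "Dump + load": every step parses its input text and writes text back out.
    static String dumpAndLoad(String input, int steps) {
        String current = input;
        for (int i = 0; i < steps; i++) {
            List<Integer> next = new ArrayList<>();
            for (int v : parse(current)) {
                next.add(v + 1);
            }
            current = dump(next);
        }
        return current;
    }

    // Stand-in for a retained evaluator: parses once, then keeps the data in memory.
    static final class Evaluator {
        private final List<Integer> cached;

        Evaluator(String input) {
            this.cached = parse(input);
        }

        // One "activity": increments every cached value in place, with no re-parse.
        void incrementActivity() {
            for (int i = 0; i < cached.size(); i++) {
                cached.set(i, cached.get(i) + 1);
            }
        }

        // A second "activity" that reads the same cached state.
        int sumActivity() {
            int sum = 0;
            for (int v : cached) {
                sum += v;
            }
            return sum;
        }
    }
}
```

With five steps, the dump-and-load version parses its input five times, while the retained version parses once and serves both activities from memory.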
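Slide 11 mentions handing over pre-partitioned, parsed data between frameworks that serve iterative computation and interactive queries. The following is a minimal sketch that assumes simple hash partitioning (a scheme chosen here for illustration; the talk does not name one) and uses a hypothetical `PartitionHandover` class. The data is partitioned once, and both a full scan and a point lookup reuse the same partitions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: partition once, then let different consumers reuse the result.
class PartitionHandover {
    // Hash-partitions keys into numPartitions buckets.
    static List<List<String>> partitionByHash(List<String> keys, int numPartitions) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (String k : keys) {
            // floorMod keeps the index non-negative even when hashCode() is negative.
            parts.get(Math.floorMod(k.hashCode(), numPartitions)).add(k);
        }
        return parts;
    }

    // Interactive-style lookup: only the one partition a key can live in is scanned.
    static boolean contains(List<List<String>> parts, String key) {
        return parts.get(Math.floorMod(key.hashCode(), parts.size())).contains(key);
    }

    // Iterative-style consumer: walks every partition, here just counting records.
    static int countAll(List<List<String>> parts) {
        int n = 0;
        for (List<String> p : parts) {
            n += p.size();
        }
        return n;
    }
}
```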
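Slide 12 notes that a thread per connection or file doesn't scale. One common alternative, sketched here in Java, is to dispatch events to handlers through a small, fixed pool of threads. `Handler`, `ThreadPoolStage`, and `EventStageDemo` are hypothetical names. The `onNext` method name mirrors the stack frames on slide 15, but this is not REEF's event interface, and it does not attempt the static Rx subset or the event-flow checking the slide describes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: many events share a few threads instead of one thread each.
class EventStageDemo {
    interface Handler<T> {
        void onNext(T event);
    }

    static final class ThreadPoolStage<T> implements AutoCloseable {
        private final Handler<T> handler;
        private final ExecutorService pool;

        ThreadPoolStage(Handler<T> handler, int threads) {
            this.handler = handler;
            this.pool = Executors.newFixedThreadPool(threads);
        }

        // Enqueues an event; one of the pool's threads later calls handler.onNext.
        void onNext(T event) {
            pool.execute(() -> handler.onNext(event));
        }

        // Stops accepting events and waits for already-queued events to finish.
        @Override
        public void close() throws InterruptedException {
            pool.shutdown();
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        }
    }

    // Sends events 1..events through a stage with `threads` workers and returns their sum.
    static long run(int events, int threads) throws InterruptedException {
        AtomicLong sum = new AtomicLong();
        try (ThreadPoolStage<Integer> stage = new ThreadPoolStage<>(event -> sum.addAndGet(event), threads)) {
            for (int i = 1; i <= events; i++) {
                stage.onNext(i);
            }
        }
        return sum.get();
    }
}
```

Running 1,000 events through 4 worker threads yields the same sum as a sequential loop, while never creating more than four threads.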
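Slide 13 lists storage, checkpointing, and preemption as shared services. The sketch below shows the basic checkpoint/restart idea for an iterative job under stated assumptions: a `HashMap` stands in for durable storage, a failure is simulated with an exception, and every class and method name (`CheckpointDemo`, `runFrom`, `runWithRecovery`) is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of checkpoint/restart. The HashMap stands in for durable
// storage; a real system would need it to survive the failed process.
class CheckpointDemo {
    static final class Checkpoint {
        final int iteration; // next iteration to run after restoring
        final long state;

        Checkpoint(int iteration, long state) {
            this.iteration = iteration;
            this.state = state;
        }
    }

    static final class SimulatedFailure extends RuntimeException {
        SimulatedFailure(String msg) {
            super(msg);
        }
    }

    // One step of the toy computation: state accumulates i * i.
    static long step(long state, int i) {
        return state + (long) i * i;
    }

    // Resumes from the last checkpoint (or iteration 0), checkpoints every `interval`
    // completed iterations, and throws when reaching `failAt` (-1 means never fail).
    static long runFrom(Map<String, Checkpoint> store, int total, int interval, int failAt) {
        Checkpoint cp = store.getOrDefault("job", new Checkpoint(0, 0L));
        long state = cp.state;
        for (int i = cp.iteration; i < total; i++) {
            if (i == failAt) {
                throw new SimulatedFailure("failed at iteration " + i);
            }
            state = step(state, i);
            if ((i + 1) % interval == 0) {
                store.put("job", new Checkpoint(i + 1, state));
            }
        }
        return state;
    }

    // Driver-style recovery: after one failure, rerun from the last checkpoint.
    static long runWithRecovery(int total, int interval, int failAt) {
        Map<String, Checkpoint> store = new HashMap<>();
        try {
            return runFrom(store, total, interval, failAt);
        } catch (SimulatedFailure e) {
            return runFrom(store, total, interval, -1);
        }
    }
}
```

With 10 iterations, a checkpoint every 3, and a failure at iteration 7, the retry resumes from the checkpoint taken after iteration 5 and produces the same result (285, the sum of i² for i from 0 to 9) as a run without any failure.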
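Slides 14 and 15 describe Tang performing static and dynamic checks so that configuration mistakes are reported clearly rather than surfacing as a `NullPointerException` deep in a container's stderr. The Java sketch below shows only the simplest dynamic check: validating a configuration map against declared parameter names before launch. It does not use Tang's API, and `ConfigCheckDemo` and `ComponentSpec` are hypothetical. The `ShellActivity` and `cmd` names are borrowed from the slide's error messages for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of up-front configuration checking (not Tang's API).
class ConfigCheckDemo {
    // A component and the parameter names it requires.
    static final class ComponentSpec {
        final String name;
        final List<String> requiredParams;

        ComponentSpec(String name, List<String> requiredParams) {
            this.name = name;
            this.requiredParams = requiredParams;
        }
    }

    // Returns readable errors; an empty list means the configuration passed the check.
    static List<String> validate(ComponentSpec spec, Map<String, String> config) {
        List<String> errors = new ArrayList<>();
        for (String p : spec.requiredParams) {
            if (!config.containsKey(p)) {
                errors.add(spec.name + ": missing required parameter \"" + p + "\"");
            }
        }
        for (String key : config.keySet()) {
            if (!spec.requiredParams.contains(key)) {
                errors.add(spec.name + ": unknown parameter \"" + key + "\"");
            }
        }
        return errors;
    }
}
```

For example, validating `{"Command": "ls"}` against a `ShellActivity` that declares only `cmd` reports both the missing `cmd` parameter and the unrecognized `Command` key before anything is launched.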
