Expressiveness, Simplicity and Users


Craig Chambers' ECOOP 2011 Keynote talk.

  1. 1. Expressiveness, Simplicity, and Users<br />Craig Chambers<br />Google<br />
  2. 2. A Brief Bio<br />MIT: 82-86<br />Argus, with Barbara Liskov, Bill Weihl, Mark Day<br />Stanford: 86-91<br />Self, with David Ungar, Urs Hölzle, …<br />U. of Washington: 91-07<br />Cecil, MultiJava, ArchJava; Vortex, DyC, Rhodium, ...<br />Jeff Dean, Dave Grove, Jonathan Aldrich, Todd Millstein, Sorin Lerner, …<br />Google: 07-<br />Flume, …<br />
  3. 3. Some Questions<br />What makes an idea successful?<br />Which ideas are adopted most?<br />Which ideas have the most impact?<br />
  4. 4. Outline<br />Some past projects<br />Self language, Self compiler<br />Cecil language, Vortex compiler<br />A current project<br />Flume: data-parallel programming system<br />
  5. 5. Self Language[Ungar & Smith 87]<br />Purified essence of Smalltalk-like languages<br />all data are objects<br />no classes<br />all actions are messages<br />field accesses, control structures<br />Core ideas are very simple<br />widely cited and understood<br />
  6. 6. Self v2[Chambers, Ungar, Chang 91]<br />Added encapsulation and privacy<br />Added prioritized multiple inheritance<br />supported both ordered and unordered mult. inh.<br />Sophisticated, or complicated?<br />Unified, or kitchen sink?<br />Not adopted; dropped from Self v3<br />
  7. 7. Self Compiler[Chambers, Ungar 89-91]<br />Dynamic optimizer (an early JIT compiler)<br />Customization: specialize code for each receiver class<br />Class/type dataflow analysis; lots of inlining<br />Lazy compilation of uncommon code paths<br />89: customization + simple analysis: effective<br />90: + complicated analysis: more effective but slow<br />91: + lazy compilation: still more effective, and fast<br />[Hölzle, … 92-94]: + dynamic type feedback: zowie!<br />Simple analysis + type feedback widely adopted<br />
  8. 8. Cecil Language[Chambers, Leavens, Millstein, Litvinov 92-99]<br />Pure objects, pure messages<br />Multimethods, static typechecking<br />encapsulation<br />modules, modular typechecking<br />constraint-based polymorphic type system<br />integrates F-bounded poly. and “where” clauses<br />later: MultiJava, EML [Lee], Diesel, …<br />Work on multimethods, “open classes” is well-known<br />Multimethods not widely available <br />
  9. 9. Vortex Compiler[Chambers, Dean, Grove, Lerner, … 94-01]<br />Whole-program optimizer, for Cecil, Java, …<br />Class hierarchy analysis<br />Profile-guided class/type feedback<br />Dataflow analysis, code specialization<br />Interprocedural static class/type analysis<br />Fast context-insensitive [Defouw], context-sensitive<br />Incremental recompilation; composable dataflow analyses<br />Project well-known<br />CHA: my most cited paper; a very simple idea<br />More-sophisticated work less widely adopted<br />
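The "very simple idea" behind class hierarchy analysis can be illustrated with a small sketch: given the whole program's class hierarchy, a message send can be bound statically when only one method implementation is reachable from the receiver's static class and its subclasses. The `ClassHierarchy` type, its string-based class and method names, and the single-inheritance model below are hypothetical simplifications for illustration, not Vortex's data structures.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, single-inheritance model of a whole-program class hierarchy.
final class ClassHierarchy {
    private final Map<String, String> parent = new HashMap<>();        // class -> superclass
    private final Map<String, Set<String>> children = new HashMap<>(); // class -> direct subclasses
    private final Map<String, Set<String>> declared = new HashMap<>(); // class -> methods it defines

    void addClass(String name, String superName, String... methods) {
        parent.put(name, superName);
        children.computeIfAbsent(name, k -> new HashSet<>());
        if (superName != null) {
            children.computeIfAbsent(superName, k -> new HashSet<>()).add(name);
        }
        declared.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    // Walk up from cls to the class whose definition of the method it uses.
    private String lookup(String cls, String method) {
        for (String c = cls; c != null; c = parent.get(c)) {
            if (declared.getOrDefault(c, Collections.emptySet()).contains(method)) {
                return c;
            }
        }
        return null;
    }

    // Every defining class that a send to a receiver of static type cls could reach.
    Set<String> possibleTargets(String cls, String method) {
        Set<String> targets = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(cls);
        while (!work.isEmpty()) {
            String c = work.pop();
            String impl = lookup(c, method);
            if (impl != null) {
                targets.add(impl);
            }
            work.addAll(children.getOrDefault(c, Collections.emptySet()));
        }
        return targets;
    }

    // A send with exactly one possible target needs no dynamic dispatch.
    boolean canBindStatically(String cls, String method) {
        return possibleTargets(cls, method).size() == 1;
    }
}
```

In this model, a method that no subclass overrides is statically bindable even when the receiver's static type is the root class, which is the case a compiler can then inline.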
  10. 10. Some Other Work<br />DyC [Grant, Philipose, Mock, Eggers 96-00]<br />Dynamic compilation for C<br />ArchJava, AliasJava, … [Aldrich, Notkin 01-04 …]<br />PL support for software architecture<br />Cobalt, Rhodium [Lerner, Millstein 02-05 …]<br />Provably correct compiler optimizations<br />
  11. 11. Trends<br />Simpler ideas easier to adopt<br />Sophisticated ideas need a simple story to be impactful<br />Ideal: “deceptively simple”<br />Unification != Swiss Army Knife<br />Language papers have had more citations; compiler work has had more practical impact<br />The combination can work well<br />
  12. 12. A Current Project: Flume[Chambers, Raniwala, Perry, ... 10]<br />Make data-parallel MapReduce-like pipelines easy to write<br />yet efficient to run<br />
  13. 13. Data-Parallel Programming<br />Analyze & transform large, homogeneous data sets, processing separate elements in parallel<br />Web pages<br />Click logs<br />Purchase records<br />Geographical data sets<br />Census data<br />…<br />Ideal: “embarrassingly parallel” analysis of petabytes of data<br />
  14. 14. Challenges<br />Parallel distributed programming is hard<br />To do:<br />Assign machines<br />Distribute program binaries<br />Partition input data across machines<br />Synchronize jobs, communicate data when needed<br />Monitor jobs<br />Deal with faults in programs, machines, network, …<br />Tune: stragglers, work stealing, …<br />What if user is a domain expert, not a systems/PL expert?<br />
  15. 15. MapReduce[Dean & Ghemawat, 04]<br />purchases: map (item -> co-item), shuffle (item -> all co-items), reduce (item -> recommend)<br />queries: map (term -> hour+city), shuffle (term -> (hour+city)*), reduce (term -> what’s hot, when)<br />
  16. 16. MapReduce<br />Greatly eases writing fault-tolerant data-parallel programs<br />Handles many tedious and/or tricky details<br />Has excellent (batch) performance<br />Offers a simple programming model<br />Lots of knobs for tuning<br />Pipelines of MapReduces?<br />Additional details to handle<br />temp files<br />pipeline control<br />Programming model becomes low-level<br />
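The map, shuffle, and reduce phases can be sketched in a single process, using the purchases example from the MapReduce slide (item -> co-item, item -> all co-items, item -> recommend). Everything here is an assumption for illustration: `MiniMapReduce`, its `run` signature, and the "most frequent co-item" recommendation rule are hypothetical, and none of the distribution, fault tolerance, or tuning that the real system provides is modeled.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical single-process model of map, shuffle, and reduce; not Google's MapReduce API.
final class MiniMapReduce {
    static <I, K, V, R> Map<K, R> run(List<I> inputs,
                                      Function<I, List<Map.Entry<K, V>>> mapper,
                                      BiFunction<K, List<V>, R> reducer) {
        // Map + shuffle: emit (key, value) pairs and group the values by key.
        Map<K, List<V>> shuffled = new LinkedHashMap<>();
        for (I input : inputs) {
            for (Map.Entry<K, V> kv : mapper.apply(input)) {
                shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        // Reduce: fold each key's grouped values into one result.
        Map<K, R> results = new LinkedHashMap<>();
        for (Map.Entry<K, List<V>> group : shuffled.entrySet()) {
            results.put(group.getKey(), reducer.apply(group.getKey(), group.getValue()));
        }
        return results;
    }

    // Purchases example: recommend, for each item, the item most often bought with it.
    static Map<String, String> recommend(List<List<String>> baskets) {
        Function<List<String>, List<Map.Entry<String, String>>> mapper = basket -> {
            List<Map.Entry<String, String>> pairs = new ArrayList<>();
            for (String item : basket) {
                for (String other : basket) {
                    if (!item.equals(other)) {
                        pairs.add(new AbstractMap.SimpleEntry<>(item, other));
                    }
                }
            }
            return pairs;
        };
        BiFunction<String, List<String>, String> reducer = (item, coItems) -> {
            Map<String, Integer> counts = new TreeMap<>();
            for (String coItem : coItems) {
                counts.merge(coItem, 1, Integer::sum);
            }
            // Most frequent co-item; ties go to the alphabetically first one.
            return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
        };
        return run(baskets, mapper, reducer);
    }
}
```

A pipeline of several such stages would need the caller to pass each stage's output to the next by hand, which is the low-level bookkeeping the slide describes.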
  17. 17. Flume<br />Ease task of writing data-parallel pipelines<br />Offer high-level data-parallel abstractions, as a Java or C++ library<br />Classes for (possibly huge) immutable collections<br />Methods for data-parallel operations<br />Easily composed to form pipelines<br />Entire pipeline in a single program<br />Automatically optimize and execute pipeline, e.g., via a series of MapReduces<br />Manage lower-level details automatically<br />
  18. 18. Flume Classes and Methods<br />Core data-parallel collection classes:<br />PCollection<T>, PTable<K,V><br />Core data-parallel methods:<br />parallelDo(DoFn)<br />groupByKey()<br />combineValues(CombineFn)<br />flatten(...)<br />read(Source), writeTo(Sink), …<br />Derive other methods from these primitives:<br />join(...), count(), top(CompareFn,N), ...<br />
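How a derived method can be built from the core primitives is sketched below for count(), which the TopWords execution graph in these slides appears to expand into a parallelDo, a groupByKey, and a combineValues. This is an in-memory sketch under stated assumptions: `DerivedCount` and its list- and map-based signatures are hypothetical stand-ins, not the Flume library's `PCollection` or `PTable` API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical in-memory stand-ins for three primitives; not the Flume API.
final class DerivedCount {
    // parallelDo stand-in: apply fn to each element independently.
    static <T, R> List<R> parallelDo(List<T> in, Function<T, R> fn) {
        List<R> out = new ArrayList<>();
        for (T element : in) {
            out.add(fn.apply(element));
        }
        return out;
    }

    // groupByKey stand-in: collect all values that share a key.
    static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> pairs) {
        Map<K, List<V>> groups = new LinkedHashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // combineValues stand-in: fold each key's values with an associative operator.
    static <K, V> Map<K, V> combineValues(Map<K, List<V>> groups, BinaryOperator<V> combine) {
        Map<K, V> out = new LinkedHashMap<>();
        for (Map.Entry<K, List<V>> group : groups.entrySet()) {
            // Each group holds at least one value, so the reduction is never empty.
            out.put(group.getKey(), group.getValue().stream().reduce(combine).get());
        }
        return out;
    }

    // count derived from the primitives: pair each element with 1, group, then sum.
    static <T> Map<T, Long> count(List<T> elements) {
        List<Map.Entry<T, Long>> ones =
            parallelDo(elements, e -> new AbstractMap.SimpleEntry<T, Long>(e, 1L));
        return combineValues(groupByKey(ones), Long::sum);
    }
}
```

Because count() only calls the three primitives, an optimizer that understands those primitives never needs to know that count() exists.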
  19. 19. Example: TopWords<br />PCollection<String> lines = read(TextIO.source("/gfs/corpus/*.txt"));<br />PCollection<String> words = lines.parallelDo(new ExtractWordsFn());<br />PTable<String, Long> wordCounts = words.count();<br />PCollection<Pair<String, Long>> topWords = wordCounts.top(new OrderCountsFn(), 1000);<br />PCollection<String> formattedOutput = topWords.parallelDo(new FormatCountFn());<br />formattedOutput.writeTo(TextIO.sink("cnts.txt"));<br />
  20. 20. Example: TopWords<br />read(TextIO.source("/gfs/corpus/*.txt"))<br />.parallelDo(new ExtractWordsFn())<br />.count()<br />.top(new OrderCountsFn(), 1000)<br />.parallelDo(new FormatCountFn())<br />.writeTo(TextIO.sink("cnts.txt"));<br />
  21. 21. Execution Graph<br />Data-parallel primitives (e.g., parallelDo) are “lazy”<br />Don’t actually run right away, but wait until demanded<br />Calls to primitives build an execution graph<br />Nodes are operations to be performed<br />Edges are PCollections that will hold the results<br />An unevaluated result PCollection is a “future”<br />Points to the graph that computes it<br />Derived operations (e.g., count, user code) call lazy primitives and so get inlined away<br />Evaluation is “demanded” by<br />Optimizes, then executes<br />
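The "lazy primitive" and "future" ideas on this slide can be sketched as a collection that records how to compute itself and runs only when asked. The `Deferred` class, its `run()` and `plan()` methods, and the list-based representation are hypothetical names chosen for illustration; this is not the Flume `PCollection` API, and it evaluates directly rather than optimizing first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical deferred collection: each instance is a node in an execution graph.
final class Deferred<T> {
    private final String op;                 // operation that produces this node
    private final Deferred<?> input;         // upstream node, or null for a source
    private final Supplier<List<T>> compute; // how to produce the elements when demanded
    private List<T> cached;                  // filled in on first evaluation

    private Deferred(String op, Deferred<?> input, Supplier<List<T>> compute) {
        this.op = op;
        this.input = input;
        this.compute = compute;
    }

    static <T> Deferred<T> source(List<T> data) {
        return new Deferred<T>("read", null, () -> new ArrayList<>(data));
    }

    // Records the operation and returns immediately; fn is not called until run().
    <R> Deferred<R> parallelDo(Function<T, List<R>> fn) {
        Deferred<T> upstream = this;
        return new Deferred<R>("pDo", upstream, () -> {
            List<R> out = new ArrayList<>();
            for (T element : upstream.run()) {
                out.addAll(fn.apply(element));
            }
            return out;
        });
    }

    boolean isEvaluated() {
        return cached != null;
    }

    // The chain of operations that would compute this node.
    String plan() {
        return (input == null ? "" : input.plan() + " -> ") + op;
    }

    // Demands evaluation: computes upstream nodes as needed and caches the result.
    List<T> run() {
        if (cached == null) {
            cached = compute.get();
        }
        return cached;
    }
}
```

Since the whole chain is available as data before anything runs, a real system has the opportunity to rewrite it between the demand and the execution.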
  22. 22. Execution Graph<br />read(TextIO.source("/…/*.txt")): read<br />parallelDo(new ExtractWordsFn()): pDo<br />count(): pDo, gbk, cv<br />top(new OrderCountsFn(), 1000): pDo, gbk, pDo<br />parallelDo(new FormatCountFn()): pDo<br />writeTo(TextIO.sink("cnts.txt")): write<br />
  23. 23. Optimizer<br />Fuse trees of parallelDo operations into one<br />Producer-consumer, co-consumers (“siblings”)<br />Eliminate now-unused intermediate PCollections<br />Form MapReduces<br />pDo + gbk + cv + pDo -> MapShuffleCombineReduce (MSCR)<br />General: multi-mapper, multi-reducer, multi-output<br />
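Producer-consumer fusion of parallelDo operations can be sketched as function composition: instead of materializing the producer's output and then scanning it again, each intermediate element is handed straight to the consumer. `Fusion`, `fuse`, and `runPass` are hypothetical names, and the per-element functions returning lists are an assumed stand-in for a DoFn; sibling fusion and MSCR formation are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of fusing two per-element operations into one.
final class Fusion {
    // Producer-consumer fusion: the result is a single operation with no
    // intermediate collection between producer and consumer.
    static <A, B, C> Function<A, List<C>> fuse(Function<A, List<B>> producer,
                                               Function<B, List<C>> consumer) {
        return input -> {
            List<C> out = new ArrayList<>();
            for (B intermediate : producer.apply(input)) {
                out.addAll(consumer.apply(intermediate));
            }
            return out;
        };
    }

    // One pass over the input applying a single per-element operation.
    static <A, C> List<C> runPass(List<A> inputs, Function<A, List<C>> doFn) {
        List<C> out = new ArrayList<>();
        for (A input : inputs) {
            out.addAll(doFn.apply(input));
        }
        return out;
    }
}
```

Running the fused function in one pass should give the same elements, in the same order, as running the two original functions in two passes.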
  24. 24. Final Pipeline<br />Fusion: 8 operations -> 2 operations<br />[Figure: the TopWords graph (read, pDo, pDo, gbk, cv, pDo, gbk, pDo, pDo, write) with its pDo, gbk, and cv operations grouped into two mscr nodes]<br />
  25. 25. Executor<br />Runs each optimized MSCR<br />If small data, runs locally, sequentially<br />develop and test in normal IDE<br />If large data, runs remotely, in parallel<br />Handles creating, deleting temp files<br />Supports fast re-execution of incomplete runs<br />Caches, reuses partial pipeline results<br />
  26. 26. Another Example: SiteData<br />[Figure: execution graph of pDo, gbk, and cv nodes, labeled GetPScoreFn, GetVerticalFn, GetDocInfoFn, PickBestFn, join(), MakeDocTraitsFn]<br />
  27. 27. Another Example: SiteData<br />[Figure: the same graph after optimization, grouped into two mscr nodes]<br />11 ops -> 2 ops<br />
  28. 28. Experience<br />FlumeJava released to Google users in May 2009<br />Now: hundreds of pipelines run by hundreds of users every month<br />Real pipelines process megabytes <=> petabytes<br />Users find FlumeJava a lot easier than MapReduce<br />Advanced users can exert control over optimizer and executor if/when necessary<br />But when things go wrong, lower abstraction levels intrude<br />
  29. 29. How Well Does It Work?<br />How does FlumeJava compare in speed to:<br />an equally modular Java MapReduce pipeline?<br />a hand-optimized Java MapReduce pipeline?<br />a hand-optimized Sawzall pipeline?<br />Sawzall: language for logs processing<br />How big are pipelines in practice?<br />How much does the optimizer help?<br />
  30. 30. Performance<br />
  31. 31. Optimizer Impact<br />
  32. 32. Current and Future Work<br />FlumeC++ just released to Google users<br />Auto-tuner<br />Profile executions, choose good settings for tuning MapReduces<br />Other execution substrates than MapReduce<br />Continuous/streaming execution?<br />Dynamic code generation and optimization?<br />
  33. 33. A More Advanced Approach<br />Apply advanced PL ideas to the data-parallel domain<br />A custom language tuned to this domain<br />A sophisticated static optimizer and code generator<br />An integrated parallel run-time system<br />
  34. 34. Lumberjack<br />A language designed for data-parallel programming<br />An implicitly parallel model<br />All collections potentially PCollections<br />All loops potentially parallel<br />Functional<br />Mostly side-effect free<br />Concise lambdas<br />Advanced type system to minimize verbosity<br />
  35. 35. Static Optimizer<br />Decide which collections are PCollections, which loops are parallel loops<br />Interprocedural context-sensitive analysis<br />OO type analysis<br />side-effect analysis<br />inlining<br />dead assignment elimination<br />…<br />
  36. 36. Parallel Run-Time System<br />Similar to Flume’s run-time system<br />Schedules MapReduces<br />Manages temp files<br />Handles faults <br />
  37. 37. Result: Not Successful<br />A new language is a hard sell to most developers<br />Language details obscure key new concepts<br />Hard to be proficient in yet another language with yet another syntax<br />Libraries?<br />Increases risk to their projects<br />Optimizer constrained by limits of static analysis<br />
  38. 38. Response: FlumeJava<br />Replace custom language with Java + Flume library<br />More verbose syntactically<br />Flume abstractions highlighted<br />All standard libraries & coding idioms preserved<br />Much less risk<br />Easy to try out, easy to like, easy to adopt<br />Dynamic optimizer less constrained than static optimizer<br />Reuse parallel run-time system<br />Sophistication and novelty can hinder adoption<br />
  44. 44. Some Related Systems<br />Hadoop, Cascading<br />C#/LINQ, Dryad<br />Pig, PigLatin<br />streaming languages (e.g. StreamIt, Brook)<br />database query optimizers<br />
  45. 45. Conclusions<br />Simpler ideas easier to adopt<br />By researchers and by users<br />Sophisticated ideas still needed, to support simple interfaces<br />Doing things dynamically instead of statically can be liberating<br />
  46. 46. Thanks!<br />