A Closer Look at Hue: How to Interface with Hadoop



A description of the various ways to interface with Hadoop (Thrift, REST, JobTracker plugins...) and of how to build an Oozie workflow drag & drop editor.


  1. Hue: A Closer Look at Hue
  2. What's on the Menu
     ● Hue Architecture
        ○ Many interfaces to implement
        ○ How do I list HDFS files, how do I submit a job...?
        ○ SDK
     ● Hue UI: Dynamic Workflow Editor
        ○ Why improve the user experience?
        ○ How can we improve the user experience?
        ○ Design considerations
        ○ Design and code deep dive
  3. View from 30,000 Feet
  4. Ecosystem
  5. Integrate with the Web
     ● HTTP, stateless (async queries)
     ● Frontend / backend (e.g. different servers, pagination)
     ● Resources (e.g. img, js, callbacks, css, json)
     ● Browsers, multiple technologies
     ● DB (SQLite, MySQL, PostgreSQL...)
     ● i18n
     ● ... More on the UI later
  6. Integrate Users
     ● Auth
        ○ Standard
        ○ LDAP
        ○ PAM
        ○ SPNEGO
        ○ Custom (OAuth, Cookie...)
  7. Integrate HDFS
     ● Interfaces
        ○ Thrift (old)
           ■ NN
        ○ REST
           ■ WebHDFS
           ■ HttpFS (HA, new bugs)
     ● Uploads to HDFS
           class HDFStemporaryUploadedFile(object):
           class HDFSfileUploadHandler(FileUploadHandler):
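As a rough illustration of the REST route, the sketch below builds a WebHDFS `LISTSTATUS` request URL and extracts file names from the JSON reply. The `/webhdfs/v1` prefix, `op=LISTSTATUS`, the `user.name` parameter and the `FileStatuses.FileStatus` response shape come from the WebHDFS REST API; the host, port, user, sample response and helper names are assumptions for illustration, not Hue's client code. HttpFS exposes the same API (commonly on port 14000).

```javascript
// Hypothetical helpers: a minimal sketch of listing an HDFS directory over WebHDFS.
// Only the URL is built here; sending the HTTP GET is left to any HTTP client.
function buildListStatusUrl(baseUrl, path, user) {
  // WebHDFS paths are absolute and live under the /webhdfs/v1 prefix.
  const encodedPath = path.split('/').map(encodeURIComponent).join('/');
  const params = new URLSearchParams({ op: 'LISTSTATUS', 'user.name': user });
  return `${baseUrl}/webhdfs/v1${encodedPath}?${params.toString()}`;
}

// Turns a LISTSTATUS JSON response into a simple list of {name, type, size}.
function parseListStatus(response) {
  return response.FileStatuses.FileStatus.map((status) => ({
    name: status.pathSuffix,
    type: status.type, // "FILE" or "DIRECTORY"
    size: status.length,
  }));
}

// Example with an assumed NameNode HTTP address and a hand-written response.
const url = buildListStatusUrl('http://namenode.example.com:50070', '/user/hue', 'hue');
const listing = parseListStatus({
  FileStatuses: {
    FileStatus: [
      { pathSuffix: 'data.csv', type: 'FILE', length: 1024 },
      { pathSuffix: 'jobs', type: 'DIRECTORY', length: 0 },
    ],
  },
});
console.log(url);
console.log(listing);
```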
  8. Integrate Hive
     ● Beeswax: embedded Hive CLI
     ● Concurrent executions
     ● Beeswax / HiveServer2 Thrift interfaces
     ● Hue models, HQL, Impala, DDL

          service BeeswaxService {
            QueryHandle query(1:Query query) throws(1:BeeswaxException error),
            QueryHandle executeAndWait(1:Query query, 2:LogContextId clientCtx)
              throws(1:BeeswaxException error),
            ....
          }

          service TCLIService {
            TExecuteStatementResp ExecuteStatement(1:TExecuteStatementReq req);
            TGetOperationStatusResp GetOperationStatus(1:TGetOperationStatusReq req);
            ....
          }
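Both services support a submit-then-poll pattern: a call returns a handle right away and the client checks the operation state until it completes, which keeps HTTP requests short and lets several queries run concurrently. The sketch below shows that loop against a fake in-memory client; `executeStatement`, `getOperationStatus` and the `RUNNING`/`FINISHED` states are simplified, hypothetical stand-ins loosely modeled on HiveServer2's `ExecuteStatement` / `GetOperationStatus`, not generated Thrift bindings.

```javascript
// Hypothetical sketch: submit a query, then poll its status until it completes.
// FakeHiveClient stands in for a real Thrift client; it finishes after N polls.
class FakeHiveClient {
  constructor(pollsUntilDone) {
    this.pollsUntilDone = pollsUntilDone;
    this.polls = 0;
  }
  async executeStatement(statement) {
    // A real server would return an operation handle immediately.
    return { operationId: 'op-1', statement };
  }
  async getOperationStatus(handle) {
    this.polls += 1;
    return { state: this.polls >= this.pollsUntilDone ? 'FINISHED' : 'RUNNING' };
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls every intervalMs until the operation leaves the RUNNING state.
async function runQuery(client, statement, intervalMs = 10, maxPolls = 100) {
  const handle = await client.executeStatement(statement);
  for (let i = 0; i < maxPolls; i++) {
    const { state } = await client.getOperationStatus(handle);
    if (state !== 'RUNNING') return { handle, state, polls: i + 1 };
    await sleep(intervalMs);
  }
  throw new Error(`Query ${handle.operationId} still running after ${maxPolls} polls`);
}

runQuery(new FakeHiveClient(3), 'SELECT COUNT(*) FROM my_table').then((result) => {
  console.log(result.state, result.polls); // FINISHED 3
});
```

In a web server the loop would typically live in the browser (or be split across requests), so no request blocks while Hive works.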
  9. Integrate Hive
     ● Moving to pluggable interfaces
        ○ DBMS: one SQL API with two implementations, Beeswax and HS2
        ○ Table: BTable and HS2Table
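The idea behind pluggable interfaces is that views talk to one DBMS facade while server-specific code sits behind a shared SQL API. A minimal sketch, assuming a simple two-method interface; the class names echo the slide, but the methods, the `getDbms` factory and the returned values are hypothetical.

```javascript
// Hypothetical sketch of pluggable query backends behind one facade.
class BeeswaxApi {
  executeQuery(hql) { return { server: 'beeswax', hql }; }
  getTable(database, name) { return { kind: 'BTable', database, name }; }
}

class HiveServer2Api {
  executeQuery(hql) { return { server: 'hiveserver2', hql }; }
  getTable(database, name) { return { kind: 'HS2Table', database, name }; }
}

// The facade depends only on the shared interface, not on a concrete server.
class Dbms {
  constructor(api) { this.api = api; }
  execute(hql) { return this.api.executeQuery(hql); }
  describe(database, table) { return this.api.getTable(database, table); }
}

// Picks the implementation from configuration, e.g. a "server type" setting.
function getDbms(serverType) {
  const apis = { beeswax: BeeswaxApi, hiveserver2: HiveServer2Api };
  const Api = apis[serverType];
  if (!Api) throw new Error(`Unknown server type: ${serverType}`);
  return new Dbms(new Api());
}

const dbms = getDbms('hiveserver2');
console.log(dbms.execute('SHOW TABLES').server);       // hiveserver2
console.log(dbms.describe('default', 'web_logs').kind); // HS2Table
```

With this shape, supporting another engine that speaks the same protocol is mostly a matter of registering one more implementation.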
  10. Integrate Impala
     ● New app
     ● Same Beeswax / HiveServer2 interfaces
     ● One more moving target...
  11. Integrate Jobs
     ● List, access, kill
     ● aka JobBrowser
     ● JobTracker Thrift plugin

       mapred-site.xml:
          <property>
            <name>jobtracker.thrift.address</name>
            <value></value>
          </property>
          <property>
            <name>mapred.jobtracker.plugins</name>
            <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
          </property>

       More Thrift:
          service Jobtracker extends common.HadoopServiceBase {
            ThriftJobInProgress getJob(10: common.RequestContext ctx, 1: ThriftJobID jobID)
              throws(1: JobNotFoundException err),
            ThriftJobList getRunningJobs(10: common.RequestContext ctx),
            ...
          }
  12. Integrate Jobs
     ● Submit jobs (MR, Hive, Java, Pig...)
     ● Manage workflows
     ● Schedule workflows
     ● REST (GET, PUT, POST)
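Submitting and managing workflows can go through Oozie's REST API: a `POST` to `/v1/jobs` with a Hadoop-style XML configuration submits a job (adding `action=start` also starts it), `PUT /v1/job/<id>?action=...` starts, suspends, resumes or kills it, and `GET /v1/job/<id>?show=info` reads its status. The sketch below only builds request descriptions; the Oozie URL, user, HDFS application path and job ID are placeholders, and the helper names are hypothetical.

```javascript
// Hypothetical helpers describing Oozie REST calls (no network I/O here).
const escapeXml = (s) =>
  String(s).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');

// Oozie expects a Hadoop-style <configuration> document as the POST body.
function configurationXml(properties) {
  const body = Object.entries(properties)
    .map(([name, value]) =>
      `<property><name>${escapeXml(name)}</name><value>${escapeXml(value)}</value></property>`)
    .join('');
  return `<configuration>${body}</configuration>`;
}

function submitWorkflow(oozieUrl, user, appPath, startNow = true) {
  return {
    method: 'POST',
    url: `${oozieUrl}/v1/jobs${startNow ? '?action=start' : ''}`,
    headers: { 'Content-Type': 'application/xml;charset=UTF-8' },
    body: configurationXml({ 'user.name': user, 'oozie.wf.application.path': appPath }),
  };
}

function manageJob(oozieUrl, jobId, action) {
  const allowed = ['start', 'suspend', 'resume', 'kill'];
  if (!allowed.includes(action)) throw new Error(`Unsupported action: ${action}`);
  return { method: 'PUT', url: `${oozieUrl}/v1/job/${jobId}?action=${action}` };
}

function jobInfo(oozieUrl, jobId) {
  return { method: 'GET', url: `${oozieUrl}/v1/job/${jobId}?show=info` };
}

const oozie = 'http://oozie.example.com:11000/oozie'; // assumed Oozie server URL
const jobId = '0000001-130101000000000-oozie-oozi-W';  // made-up workflow job ID
const submit = submitWorkflow(oozie, 'hue', 'hdfs:///user/hue/workflows/demo');
console.log(submit.method, submit.url);
console.log(manageJob(oozie, jobId, 'kill').url);
console.log(jobInfo(oozie, jobId).url);
```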
  13. Integrate Shell
     ● Pig
     ● HBase
     ● Sqoop 2
     ● Spawning server
     ● Greenlets
     ● popen/pty/tty
     ● IO (HTTP, DB...)
     ● setuid
     ● CSS/JS/POST
  14. Integrate YARN
     ● JobBrowser MR2, Oozie
     ● No JobTracker: 4 more REST APIs
     ● Finished MR jobs move to the History Server; missing logs...
     ● MR1/MR2 APIs not 100% compatible (like the Beeswax/HiveServer2 and Beeswax UI/Impala switches)
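Without a JobTracker, JobBrowser has to decide which daemon to ask about a job. A rough sketch, assuming the standard Hadoop 2 REST layouts: the ResourceManager describes applications under `/ws/v1/cluster/apps`, a running MapReduce job is served by its ApplicationMaster through the RM web proxy at `/proxy/<appId>/ws/v1/mapreduce/jobs/<jobId>`, and finished jobs come from the JobHistory Server at `/ws/v1/history/mapreduce/jobs/<jobId>`. Host names and ports are assumptions; `jobEndpoint` is hypothetical.

```javascript
// Hypothetical sketch: pick the YARN-era REST endpoint for a MapReduce job.
const RM = 'http://resourcemanager.example.com:8088';     // assumed RM web address
const HISTORY = 'http://historyserver.example.com:19888'; // assumed JHS web address

// MR job IDs mirror YARN application IDs: application_X_Y <-> job_X_Y.
function toJobId(appId) {
  return appId.replace(/^application_/, 'job_');
}

// Running jobs are asked of the ApplicationMaster (through the RM proxy);
// terminal ones of the JobHistory Server; anything earlier of the RM itself.
function jobEndpoint(appId, rmState) {
  const jobId = toJobId(appId);
  if (rmState === 'RUNNING') {
    return `${RM}/proxy/${appId}/ws/v1/mapreduce/jobs/${jobId}`;
  }
  if (['FINISHED', 'FAILED', 'KILLED'].includes(rmState)) {
    return `${HISTORY}/ws/v1/history/mapreduce/jobs/${jobId}`;
  }
  // NEW, SUBMITTED, ACCEPTED...: only the ResourceManager knows about it yet.
  return `${RM}/ws/v1/cluster/apps/${appId}`;
}

console.log(jobEndpoint('application_1357000000000_0001', 'RUNNING'));
console.log(jobEndpoint('application_1357000000000_0001', 'FINISHED'));
```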
  15. Integrate Security
     ● hue superuser
     ● One hue Kerberos ticket
     ● JT, Shell: setuid root:hue
     ● HiveServer2?
     ● hue Proxy User / doAs
        ○ HDFS
        ○ Oozie:
          <property>
            <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
            <value>*</value>
          </property>
          <property>
            <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
            <value>*</value>
          </property>
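With proxy-user settings like these (the `*` values let `hue` impersonate members of any group from any host), Hue authenticates as itself and names the end user on each request. WebHDFS and HttpFS take a `doas` query parameter next to `user.name` (under Kerberos the authenticated principal replaces `user.name`), while Oozie's REST API spells it `doAs`. The sketch below only rewrites URLs; hosts and user names are assumptions and `impersonate` is a hypothetical helper.

```javascript
// Hypothetical helper: add proxy-user parameters to a WebHDFS or Oozie URL.
// WebHDFS/HttpFS spell the parameter "doas"; Oozie spells it "doAs".
function impersonate(url, service, proxyUser, endUser) {
  const parsed = new URL(url);
  if (service === 'webhdfs') {
    parsed.searchParams.set('user.name', proxyUser); // simple (non-Kerberos) auth
    parsed.searchParams.set('doas', endUser);
  } else if (service === 'oozie') {
    parsed.searchParams.set('doAs', endUser);
  } else {
    throw new Error(`Unknown service: ${service}`);
  }
  return parsed.toString();
}

console.log(impersonate('http://namenode.example.com:50070/webhdfs/v1/user/alice?op=LISTSTATUS',
  'webhdfs', 'hue', 'alice'));
// http://namenode.example.com:50070/webhdfs/v1/user/alice?op=LISTSTATUS&user.name=hue&doas=alice
```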
  16. SDK: Integrate Developers
     ● Set of raw libs
     ● Hue models
          libs/
            /hadoop
              /jobtracker
              /webhdfs
              /yarn
            /liboozie
            /rest
            /thrift
          apps/
            /jobbrowser
            /oozie
            /...
  17. SDK: Integrate Developers
          $ ./build/env/bin/hue create_desktop_app clouddemo
     ● Custom: views/model/templates
     ● Reuse Hue libs
     http://cloudera.github.com/hue/docs-2.1.0/sdk/sdk.html#fast-guide-to-creating-a-new-hue-application
  18. CloudDemo Example
     Single click:
     ● HTTP
     ● HDFS
     ● Oozie
     ● JT
  19. After the Interfaces... Now the Dynamic UI (Oozie App Use Case)
  20. Why Improve User Experience?
     ● Users like things that are easy to use
     ● Intuition and ease of use
  21. How to Improve User Experience
     ● How can we do this for Oozie?
        ○ Hue users are not engineers
        ○ Most users are not familiar with shortcuts and command lines
        ○ Windowing systems have taught us that drag and drop is good
     Drag and drop everything in a workflow!
  22. Old Hue Windowing System
  23. Fundamentals of Front-End Design
     ● Behavior
        ○ JavaScript
        ○ Knockout JS
        ○ jQuery
     ● Presentation
        ○ CSS
        ○ Bootstrap
     ● Content
        ○ HTML (templates)
     ● MV*
        ○ MVC
        ○ MVP
        ○ MVVM
  24. Design Constraints
     ● Existing backend from Hue 2.1
        ○ Need to be able to easily migrate from Hue 2.1 to Hue 2.2
     ● Knockout JS and jQuery already chosen
        ○ Rudimentary templating
        ○ Subscription-based bindings
        ○ Observables for arrays and JavaScript literals only
        ○ Event delegation
     ● Existing UI from Hue 2.1
        ○ Provides basic node movement through form submission (reloads the page)
        ○ Not dynamic
  25. Other Design Considerations
     ● Serializing should be trivial
     ● Basic API
        ○ Save a workflow
        ○ Validate a node
        ○ Read a workflow
     ● Difference in representation between the Hue 2.1 backend and the Knockout JS way of doing things
     ● New nodes need an ID
  26. Design: High-Level Components
     ● Left out
        ○ Many event bindings and custom events
        ○ Views
  27. Purpose of the Node Model
     ● Provides defaults for data:
          // Defaults live on the prototype; ModelModule is defined elsewhere in Hue.
          var NodeModel = ModelModule($);
          $.extend(NodeModel.prototype, {
            id: 0,
            name: '',
            description: '',
            node_type: '',
            workflow: 0,
            child_links: []
          });
     ● Sent over the wire
     ● Mimics Django models
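Putting defaults on the prototype is cheap, but one JavaScript detail deserves care: scalar defaults are safe because assigning to an instance creates an own property, while an array default such as `child_links: []` is a single object shared by every instance until it is replaced. A minimal jQuery-free sketch, assuming a plain constructor in place of `ModelModule($)`; `makeNode` and the `{ parent, child }` link shape are hypothetical.

```javascript
// Stand-in for the NodeModel built by ModelModule($) (hypothetical, no jQuery).
function NodeModel(attrs) {
  Object.assign(this, attrs);
}
Object.assign(NodeModel.prototype, {
  id: 0,
  name: '',
  description: '',
  node_type: '',
  workflow: 0,
  child_links: [],
});

const a = new NodeModel({ id: 1 });
const b = new NodeModel({ id: 2 });
a.child_links.push({ parent: 1, child: 2 }); // mutates the shared prototype array
console.log(b.child_links.length); // 1: b "sees" a's link

// Safer: give each instance its own copy of mutable defaults.
function makeNode(attrs) {
  return new NodeModel({ child_links: [], ...attrs });
}
const c = makeNode({ id: 3 });
c.child_links.push({ parent: 3, child: 4 });
console.log(makeNode({ id: 4 }).child_links.length); // 0
```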
  28. Model / ModelView Separation
     ● ModelViews should be the "shield" and Models the source of truth.
     ● Models are more serializable if they do not carry extraneous data.
     ● Updates flow back through Knockout JS subscriptions:
          // Push every change of an observable on the view model back into the plain model.
          $.each(mapping, function(key, value) {
            if (ko.isObservable(self[key])) {
              self[key].subscribe(function(newValue) {
                model[key] = ko.mapping.toJS(newValue);
              });
            }
          });
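To show the one-way flow without pulling in Knockout, the sketch below uses a tiny hand-written observable with a `subscribe` method that mimics the calling shape of `ko.observable`; it is a stand-in, not Knockout's implementation, and `createViewModel` is hypothetical. Every wrapped field writes its new value back to the plain model, so the model stays trivially serializable. (Knockout's version also unwraps nested observables with `ko.mapping.toJS`.)

```javascript
// Minimal stand-in for ko.observable: a getter/setter function with subscribers.
function observable(initial) {
  let value = initial;
  const subscribers = [];
  function obs(newValue) {
    if (arguments.length === 0) return value; // read
    value = newValue;                          // write, then notify
    subscribers.forEach((fn) => fn(newValue));
    return obs;
  }
  obs.subscribe = (fn) => subscribers.push(fn);
  return obs;
}

// Wraps each model field in an observable and mirrors writes back to the model.
function createViewModel(model) {
  const viewModel = {};
  Object.keys(model).forEach((key) => {
    viewModel[key] = observable(model[key]);
    viewModel[key].subscribe((newValue) => {
      model[key] = newValue; // the model remains the source of truth
    });
  });
  return viewModel;
}

const model = { id: 7, name: 'mr-step' };
const vm = createViewModel(model);
vm.name('mapreduce-step');
console.log(JSON.stringify(model)); // {"id":7,"name":"mapreduce-step"}
```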
  29. Purpose of the Registry
     ● Construction optimization
     ● Constant-time node lookup
     ● Looking toward the future and storage
     ● Simple start:
          var self = this;
          self.nodes = {};
          module.prototype.initialize.apply(self, arguments);
          return self;
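The registry itself can stay very small: a map keyed by node ID gives constant-time lookup, and building nodes through the registry lets an existing node be reused instead of reconstructed. A minimal sketch using `Map`; the `NodeRegistry` class and its method names are hypothetical, not Hue's API.

```javascript
// Hypothetical registry: constant-time node lookup keyed by node ID.
class NodeRegistry {
  constructor() {
    this.nodes = new Map();
  }
  add(node) {
    this.nodes.set(node.id, node);
    return node;
  }
  get(id) {
    return this.nodes.get(id); // undefined if unknown
  }
  remove(id) {
    return this.nodes.delete(id);
  }
  // Returns the cached node, calling the factory only on first request.
  getOrCreate(id, factory) {
    return this.nodes.has(id) ? this.nodes.get(id) : this.add(factory(id));
  }
}

const registry = new NodeRegistry();
registry.add({ id: 'mapreduce:1', name: 'wordcount' });
console.log(registry.get('mapreduce:1').name); // wordcount
const pig = registry.getOrCreate('pig:2', (id) => ({ id, name: 'cleanup' }));
console.log(registry.getOrCreate('pig:2', () => null) === pig); // true
```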
  30. Purpose of ID Generation
     ● Unique identifier for new nodes (e.g. mapreduce:1).
     ● Assists in creating parent-child relationships through links.
          var IdGeneratorModule = function($) {
            return function(options) {
              var self = this;
              $.extend(self, options);
              self.counter = 1;
              // Produces "prefix:1", "prefix:2", ... or "1", "2", ... when no prefix is set.
              self.nextId = function() {
                return ((self.prefix) ? self.prefix + ':' : '') + self.counter++;
              };
            };
          };
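Here is the same generator without jQuery, used to name two new nodes and link them. `Object.assign` replaces `$.extend`; one generator per node type is an assumption, and `newNode`, `link` and the `{ parent, child }` link shape are illustrative rather than Hue's wire format.

```javascript
// jQuery-free version of the ID generator from the slide.
function IdGenerator(options) {
  Object.assign(this, options);
  this.counter = 1;
  this.nextId = () => (this.prefix ? this.prefix + ':' : '') + this.counter++;
}

// One generator per node type (an assumption for this sketch).
const generators = {
  mapreduce: new IdGenerator({ prefix: 'mapreduce' }),
  pig: new IdGenerator({ prefix: 'pig' }),
};

function newNode(nodeType) {
  return { id: generators[nodeType].nextId(), node_type: nodeType, child_links: [] };
}

// Links refer to IDs, so they can exist before the backend assigns real keys.
function link(parent, child) {
  const l = { parent: parent.id, child: child.id };
  parent.child_links.push(l);
  return l;
}

const first = newNode('mapreduce'); // id "mapreduce:1"
const second = newNode('pig');      // id "pig:1"
link(first, second);
console.log(JSON.stringify(first.child_links)); // [{"parent":"mapreduce:1","child":"pig:1"}]
```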
  31. Transpose to Show
     ● Knockout JS supports 3 kinds of observables
        ○ Observables for literals
        ○ Observable arrays
        ○ Computed observables
     ● The DAG is received represented as a tree
     ● The DAG is represented as a list of lists when we display it... an MVVM restriction
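Because templates can only iterate over observable arrays, the graph has to be flattened into rows before display. One way to do it, sketched under the assumption that each node lists its children by ID: place every node at its longest distance from a root (so a join appears below all of its parents) and group nodes by that depth. This is an illustrative layout algorithm based on Kahn's topological sort, not necessarily the one Hue uses; `dagToRows` is hypothetical and rejects cyclic input.

```javascript
// Hypothetical layout: turn a workflow DAG into a list of rows (list of lists).
// graph: { nodeId: [childId, ...] }; throws if the graph contains a cycle.
function dagToRows(graph) {
  const indegree = {};
  Object.keys(graph).forEach((id) => { indegree[id] = indegree[id] || 0; });
  Object.values(graph).forEach((children) =>
    children.forEach((c) => { indegree[c] = (indegree[c] || 0) + 1; }));

  // Kahn's algorithm gives a topological order; depth = longest path from a root.
  const depth = {};
  const queue = Object.keys(indegree).filter((id) => indegree[id] === 0);
  queue.forEach((id) => { depth[id] = 0; });
  let visited = 0;
  while (queue.length > 0) {
    const id = queue.shift();
    visited += 1;
    (graph[id] || []).forEach((child) => {
      depth[child] = Math.max(depth[child] || 0, depth[id] + 1);
      indegree[child] -= 1;
      if (indegree[child] === 0) queue.push(child);
    });
  }
  if (visited !== Object.keys(indegree).length) throw new Error('Cycle detected');

  // Group node IDs by depth into rows.
  const rows = [];
  Object.keys(depth).forEach((id) => {
    (rows[depth[id]] = rows[depth[id]] || []).push(id);
  });
  return rows;
}

const workflow = {
  start: ['fork'],
  fork: ['pig', 'hive'],
  pig: ['join'],
  hive: ['join'],
  join: ['end'],
  end: [],
};
console.log(JSON.stringify(dagToRows(workflow)));
// [["start"],["fork"],["pig","hive"],["join"],["end"]]
```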
  32. Other Difficulties
     ● Decision node representation
     ● JSON.stringify does not include parent class members
     ● Memory consumption
     ● Cycles, cycles, cycles
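The `JSON.stringify` issue comes from the language itself: it serializes only an object's own enumerable properties, so defaults stored on a prototype silently disappear, and it throws a `TypeError` on circular references. A minimal sketch of one workaround, copying inherited fields into a plain object before serializing; the simplified `NodeModel` and the `toPlainObject` helper are hypothetical.

```javascript
// Prototype holding defaults, in the spirit of the Node Model slide (simplified).
function NodeModel(attrs) { Object.assign(this, attrs); }
NodeModel.prototype.name = '';
NodeModel.prototype.node_type = 'mapreduce';

const node = new NodeModel({ id: 1 });
console.log(JSON.stringify(node)); // {"id":1}  (inherited defaults are dropped)

// for...in walks own *and* inherited enumerable properties.
function toPlainObject(obj) {
  const plain = {};
  for (const key in obj) {
    if (typeof obj[key] !== 'function') plain[key] = obj[key];
  }
  return plain;
}
console.log(JSON.stringify(toPlainObject(node))); // {"id":1,"name":"","node_type":"mapreduce"}

// Cycles: JSON.stringify refuses circular structures outright.
const parent = { id: 'fork' };
const child = { id: 'pig', parent };
parent.children = [child];
try {
  JSON.stringify(parent);
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```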
  33. Next Steps
     ● Integrate
        ○ Pig, HiveServer2
        ○ Oozie Bundles, SLA
        ○ Document model, "Editors", git
        ○ SDK revamp, language agnostic, proxy app
     ● UX
        ○ Impala real-time UI
        ○ Redesign overall layout
     ● Sqoop 2, HBase? Mahout?... Face of Hadoop/CDH