Hue
A Closer Look at Hue: How to Interface with Hadoop
What's on the Menu
● Hue Architecture
  ○ Many interfaces to implement
  ○ How do I list HDFS files, how do I submit a job...?
  ○ SDK
● Hue UI: Dynamic Workflow Editor
  ○   Why improve the user experience?
  ○   How can we improve the user experience?
  ○   Design Considerations
  ○   Design and Code Deep Dive
View from 30 000 feet
Ecosystem
Integrate with the Web
● HTTP, stateless (async queries)
● Frontend / Backend (e.g. different servers,
  pagination)
● Resources (e.g. img, js, callbacks, css, json)
● Browsers, multi techs
● DB (SQLite, MySQL, PostgreSQL...)
● i18n
● ...

More on UI later
Integrate Users
● Auth
  ○   Standard
  ○   LDAP
  ○   PAM
  ○   SPNEGO
  ○   Custom
      (OAuth, Cookie...)
Integrate HDFS
● Interfaces
     ○ Thrift (old)
       ■ NN
     ○ REST
       ■ WebHDFS
       ■ HttpFS (HA, new bugs)


● Uploads to HDFS
class HDFStemporaryUploadedFile(object):
class HDFSfileUploadHandler(FileUploadHandler):
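To make the REST interface concrete, here is a minimal sketch of how a WebHDFS listing URL is put together. The `/webhdfs/v1` prefix, the `op=LISTSTATUS` operation, and the `user.name` parameter come from the WebHDFS REST API; the helper name `webHdfsUrl`, the host, the port, and the user are assumptions made for illustration, and no request is actually sent.

```javascript
// Sketch only: webHdfsUrl is a hypothetical helper, not part of Hue.
function webHdfsUrl(baseUrl, path, op, params) {
  // Put `op` first, then any extra query parameters.
  const query = Object.entries(Object.assign({ op: op }, params || {}))
    .map(([key, value]) => encodeURIComponent(key) + '=' + encodeURIComponent(value))
    .join('&');
  // All WebHDFS paths live under the /webhdfs/v1 prefix.
  return baseUrl.replace(/\/$/, '') + '/webhdfs/v1' + path + '?' + query;
}

// Assumed NameNode address and user; adjust for a real cluster.
const listUrl = webHdfsUrl('http://namenode.example.com:50070', '/user/alice',
                           'LISTSTATUS', { 'user.name': 'alice' });
console.log(listUrl);
```

A GET on such a URL returns a JSON listing of the directory, which is what a file browser needs to render one page of results.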
Integrate Hive
   ● Beeswax: embedded Hive CLI
   ● Concurrent executions

   ● Beeswax / Hive Server 2 Thrift interfaces
   ● Hue models, HQL, Impala, DDL
service BeeswaxService {
  QueryHandle query(1:Query query) throws(1:BeeswaxException error),
  QueryHandle executeAndWait(1:Query query, 2:LogContextId clientCtx)
      throws(1:BeeswaxException error),
  ....

service TCLIService {
  TExecuteStatementResp ExecuteStatement(1:TExecuteStatementReq req);
  TGetOperationStatusResp GetOperationStatus(1:TGetOperationStatusReq req);
  ....
Integrate Hive
Moving to Pluggable interfaces
● DBMS (SQL API)
  ○ Beeswax
  ○ HS2
● Table
  ○ BTable
  ○ HS2Table
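The pluggable layering on this slide can be sketched as one common SQL API with one backend per server. This is a hypothetical illustration, not Hue's code: the class names, `getSqlApi`, and the simplified request fields are assumptions, and the Thrift clients are replaced by fakes that only record which RPC was called.

```javascript
// Hypothetical sketch: class, method, and field names are illustrative only.
class SqlApi {
  constructor(client) {
    this.client = client; // Thrift client stub for the chosen server
  }
  execute(statement) {
    throw new Error('execute() must be implemented by a backend');
  }
}

class BeeswaxApi extends SqlApi {
  execute(statement) {
    // Maps to BeeswaxService.query(Query) from the Thrift excerpt.
    return this.client.query({ query: statement });
  }
}

class HiveServer2Api extends SqlApi {
  execute(statement) {
    // Maps to TCLIService.ExecuteStatement(TExecuteStatementReq).
    return this.client.ExecuteStatement({ statement: statement });
  }
}

// Pick a backend by name so callers never touch Thrift directly.
const BACKENDS = new Map([['beeswax', BeeswaxApi], ['hiveserver2', HiveServer2Api]]);
function getSqlApi(serverType, client) {
  const Backend = BACKENDS.get(serverType);
  if (!Backend) {
    throw new Error('Unknown server type: ' + serverType);
  }
  return new Backend(client);
}

// Fake clients that only record which RPC was invoked.
const calls = [];
const fakeBeeswax = { query: function (q) { calls.push(['query', q.query]); return 'handle-1'; } };
const fakeHs2 = { ExecuteStatement: function (r) { calls.push(['ExecuteStatement', r.statement]); return 'handle-2'; } };

getSqlApi('beeswax', fakeBeeswax).execute('SHOW TABLES');
getSqlApi('hiveserver2', fakeHs2).execute('SHOW TABLES');
console.log(calls);
```

The point of the design is that the application code calls `execute()` and stays unchanged when the server behind it moves from Beeswax to Hive Server 2.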
Integrate Impala

● New app
● Same Beeswax/Hive Server 2 interfaces
● One more moving target...
Integrate Jobs
    ● List, access, kill
    ● aka JobBrowser

    ● JobTracker Thrift Plugin
mapred-site.xml
<property>
 <name>jobtracker.thrift.address</name>
 <value>0.0.0.0:9290</value>
</property>
<property>
 <name>mapred.jobtracker.plugins</name>
 <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
</property>

More Thrift
service Jobtracker extends common.HadoopServiceBase {
  ThriftJobInProgress getJob(10: common.RequestContext ctx, 1: ThriftJobID jobID)
      throws(1: JobNotFoundException err),
  ThriftJobList getRunningJobs(10: common.RequestContext ctx),
  ....
Integrate Jobs
● Submit jobs (MR, Hive, Java, Pig...)
● Manage workflows
● Schedule workflows


● REST (GET, PUT, POST)
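As a rough illustration of the three verbs, the sketch below builds request descriptors for the Oozie web services API (v1): GET to list workflows, POST to submit a job configuration, PUT to act on a job. The function names, the base URL, and the job ID are assumptions for illustration; nothing is sent over the network.

```javascript
// Sketch only: helper names and the server address are hypothetical.
function oozieBase(serverUrl) {
  return serverUrl.replace(/\/$/, '') + '/oozie/v1';
}

// GET: list workflow jobs.
function listWorkflows(serverUrl) {
  return { method: 'GET', url: oozieBase(serverUrl) + '/jobs?jobtype=wf' };
}

// POST: submit a job; the body is the job configuration as XML.
function submitJob(serverUrl, configXml) {
  return {
    method: 'POST',
    url: oozieBase(serverUrl) + '/jobs',
    headers: { 'Content-Type': 'application/xml;charset=UTF-8' },
    body: configXml
  };
}

// PUT: change the state of an existing job, here by killing it.
function killJob(serverUrl, jobId) {
  return {
    method: 'PUT',
    url: oozieBase(serverUrl) + '/job/' + encodeURIComponent(jobId) + '?action=kill'
  };
}

const server = 'http://oozie.example.com:11000'; // assumed address
const killRequest = killJob(server, '0000001-130101000000000-oozie-oozi-W');
console.log(listWorkflows(server).url);
console.log(killRequest.method, killRequest.url);
```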
Integrate Shell
● Pig
● HBase
● Sqoop 2

●   Spawning Server
●   Greenlets
●   popen/pty/tty
●   IO (HTTP, DB...)
●   setuid
●   css/js/POST
Integrate YARN
● JobBrowser MR2, Oozie

● No JT, 4 more REST APIs
● MR to History Server, missing logs...
● MR1/2 API not 100% compatible
  (like Beeswax/HiveServer2, Beeswax
  UI/Impala switches)
Integrate security
● 'hue' superuser
  JT, Shell setuid root:hue
● 'hue' Proxy User / doAs
  HDFS
  Oozie
      <property>
        <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
        <value>*</value>
      </property>
● One 'hue' Kerberos ticket
● Hive Server 2 ?
SDK: Integrate Developers
● Set of raw libs

libs
    /hadoop
            /jobtracker
            /webhdfs
            /yarn
    /liboozie
    /rest
    /thrift

● Hue models

apps/
    /jobbrowser
    /oozie
    /...
SDK: Integrate Developers
$ ./build/env/bin/hue create_desktop_app clouddemo


● Custom: views/model/templates
● Reuse Hue libs

http://cloudera.github.com/hue/docs-2.1.0/sdk/sdk.html#fast-guide-to-creating-a-new-hue-application
CloudDemo example
Single click:

●   HTTP
●   HDFS
●   Oozie
●   JT
After the Interfaces...

... now the dynamic UI
    (Oozie App use case)
Why Improve User Experience
● Users like things that are easy to use
● Intuition and ease of use
How to Improve User Experience
● How can we do this for Oozie?
  ○ Hue users are not engineers
  ○ Most users are not familiar with shortcuts and
    command lines
  ○ Windowing systems have taught us drag and drop is
    good




Drag and drop everything in a Workflow!
Old Hue Windowing System
Fundamentals of Front End Design
● Behavior
  ○ JavaScript
  ○ Knockout JS
  ○ jQuery
● Presentation
  ○ CSS
  ○ Bootstrap
● Content
  ○ HTML (Templates)
● MV*
  ○ MVC
  ○ MVP
  ○ MVVM
Design Constraints
● Existing backend from Hue 2.1
  ○ Need to be able to easily migrate from Hue 2.1 to
    Hue 2.2
● Knockout JS and jQuery already chosen
  ○   Rudimentary templating
  ○   Subscription based bindings
  ○   Observables for arrays and JavaScript literals only
  ○   Event delegation
● Existing UI from Hue 2.1
  ○ Provides basic node movement through form
    submission (reloads the page)
  ○ Not dynamic
Other Design Considerations
● Serializing should be trivial
● Basic API
   ○ Save a workflow
   ○ Validate a node
   ○ Read a workflow
● Difference in representation between Hue
  2.1 backend and the KnockoutJS way of
  doing things
● New nodes need an ID
Design - High Level Components




● Left out
  ○ Many event bindings and custom events
  ○ Views left out
Purpose of the Node Model
● Provides defaults for data:
var NodeModel = ModelModule($);
$.extend(NodeModel.prototype, {
  id: 0,
  name: '',
  description: '',
  node_type: '',
  workflow: 0,
  child_links: []
});

● Sent over the wire
● Mimics Django models
Model - ModelView Separation
● ModelViews should be the "shield" and
  Models the source of truth.
● Models are more serializable if they do not
  carry extraneous data.
● Subscribed update through KnockoutJS:
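$.each(mapping, function(key, value) {
    // Mirror every observable ModelView field back into the plain model.
    if (ko.isObservable(self[key])) {
        self[key].subscribe(function(value) {
            // Unwrap observables so the model only holds plain data.
            model[key] = ko.mapping.toJS(value);
        });
    }
});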
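The subscription pattern can be tried without a browser using a tiny stand-in for a Knockout observable. The `observable` function below is an assumption written for this sketch (it mimics only the read, write, and `subscribe` behaviour, not the real `ko` API), and the field values are made up.

```javascript
// Minimal stand-in for a Knockout observable (NOT the real ko implementation).
function observable(initial) {
  let value = initial;
  const subscribers = [];
  function accessor(next) {
    if (arguments.length === 0) {
      return value; // called with no argument: read
    }
    value = next; // called with an argument: write, then notify
    subscribers.forEach(function (fn) { fn(value); });
  }
  accessor.subscribe = function (fn) { subscribers.push(fn); };
  return accessor;
}

// The model holds plain data only, so it stays trivially serializable.
const model = { name: '', description: '' };

// The ModelView wraps each field and pushes every change back to the model.
const modelView = {};
Object.keys(model).forEach(function (key) {
  modelView[key] = observable(model[key]);
  modelView[key].subscribe(function (value) { model[key] = value; });
});

modelView.name('my-pig-action'); // what a UI binding would do on edit
console.log(JSON.stringify(model));
```

The UI only ever touches `modelView`, while `model` remains the object that is serialized and sent to the backend.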
Purpose of the Registry
●   Construction optimization
●   Constant time node lookup
●   Looking towards the future and storage
●   Simple start:
    var self = this;
    self.nodes = {};
    module.prototype.initialize.apply(self, arguments);
    return self;
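Constant-time lookup follows from keeping nodes in an object keyed by ID. The sketch below is a hypothetical registry built on that idea; the method names `add`, `get`, and `remove` are assumptions and are not taken from the Hue source.

```javascript
// Hypothetical registry: a plain object keyed by node ID.
function Registry() {
  // Object.create(null) avoids clashes between IDs and inherited keys.
  this.nodes = Object.create(null);
}

Registry.prototype.add = function (id, node) {
  this.nodes[id] = node;
  return node;
};

Registry.prototype.get = function (id) {
  // Property access by key; no scan over all nodes is needed.
  return this.nodes[id] || null;
};

Registry.prototype.remove = function (id) {
  delete this.nodes[id];
};

const registry = new Registry();
registry.add('mapreduce:1', { name: 'wordcount' });
console.log(registry.get('mapreduce:1').name);
console.log(registry.get('pig:7'));
```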
Purpose of ID Generation
● Unique identifier for new nodes (e.g. mapreduce:1).
● Assists in creating parent-child relationships through
   links.
var IdGeneratorModule = function($) {
   return function(options) {
      var self = this;
      $.extend(self, options);
      self.counter = 1;
      self.nextId = function() {
         return ((self.prefix) ? self.prefix + ':' : '') + self.counter++;
      };
   };
};
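A short usage sketch shows how generated IDs can tie a parent to a child before either node exists on the server. It restates the generator with `Object.assign` standing in for `$.extend` so it runs without jQuery; the link shape (`parent`, `child`) is an assumption for illustration.

```javascript
// Same idea as the generator above, with Object.assign instead of $.extend.
function IdGenerator(options) {
  Object.assign(this, options);
  this.counter = 1;
}

IdGenerator.prototype.nextId = function () {
  // With a prefix: "mapreduce:1", "mapreduce:2", ...; without one: "1", "2", ...
  return (this.prefix ? this.prefix + ':' : '') + this.counter++;
};

const generator = new IdGenerator({ prefix: 'mapreduce' });

// Two brand-new nodes receive temporary client-side IDs.
const parentNode = { id: generator.nextId(), child_links: [] };
const childNode = { id: generator.nextId(), child_links: [] };

// Hypothetical link record connecting the two by ID.
parentNode.child_links.push({ parent: parentNode.id, child: childNode.id });

console.log(parentNode.id, childNode.id);
```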
Transpose to Show
● KnockoutJS supports 3 kinds of observables
    ○ Observables for literals
    ○ Observable arrays
    ○ Computed Observables
●   DAG received is represented as a tree




● DAG represented as a list of lists for display
    (an MVVM restriction)
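The slides do not show how the received structure becomes a list of lists. One possible approach, assumed here purely for illustration, is a breadth-first walk that groups nodes into rows by depth; the function name `toRows` and the sample workflow are hypothetical.

```javascript
// Hypothetical transformation: group node IDs into display rows by depth.
function toRows(nodesById, startId) {
  const rows = [];
  const seen = new Set([startId]); // guards against revisiting (and cycles)
  let level = [startId];
  while (level.length > 0) {
    rows.push(level);
    const next = [];
    for (const id of level) {
      for (const childId of nodesById[id].children) {
        if (!seen.has(childId)) {
          seen.add(childId);
          next.push(childId);
        }
      }
    }
    level = next;
  }
  return rows;
}

// Sample workflow: a fork into two actions that meet again at a join.
const workflow = {
  start: { children: ['fork'] },
  fork: { children: ['pig', 'hive'] },
  pig: { children: ['join'] },
  hive: { children: ['join'] },
  join: { children: ['end'] },
  end: { children: [] }
};

const rows = toRows(workflow, 'start');
console.log(rows);
```

Each inner list maps naturally to an observable array, which is why a row-based shape fits the Knockout bindings better than a nested tree.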
Other Difficulties
● Decision node representation
● JSON.stringify only serializes an object's own
  properties, so members inherited from a parent class
  (prototype) are left out
● Memory consumption
● Cycles, cycles, cycles
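The serialization difficulty is easy to reproduce: defaults that live on a prototype are skipped by `JSON.stringify`. The sketch below shows the effect and one possible workaround, copying inherited data properties into a plain object first. The names `BaseNode` and `toPlainObject` are assumptions for illustration.

```javascript
// Defaults on the prototype, in the style of the node model.
function BaseNode() {}
BaseNode.prototype.node_type = 'mapreduce';
BaseNode.prototype.describe = function () { return this.node_type; };

const node = new BaseNode();
node.name = 'step-1'; // own property

// Only the own property survives serialization.
const naive = JSON.stringify(node);
console.log(naive); // {"name":"step-1"}

// Possible workaround: copy own AND inherited enumerable data properties.
function toPlainObject(obj) {
  const plain = {};
  for (const key in obj) { // for...in also walks the prototype chain
    if (typeof obj[key] !== 'function') {
      plain[key] = obj[key];
    }
  }
  return plain;
}

const complete = JSON.parse(JSON.stringify(toPlainObject(node)));
console.log(complete.node_type);
```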
Next steps
● Integrate
  ○   Pig, Hive Server 2
  ○   Oozie Bundles, SLA
  ○   Document model, "Editors", git
  ○   SDK revamp, language agnostic, proxy app
● UX
  ○ Impala real time UI
  ○ Redesign overall layout
● Sqoop 2, HBase? Mahout?...


              Face of Hadoop/CDH
