Long Haul Hadoop
Steve Loughran

© 2009 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice.
Mombasa: long-haul Hadoop

Published in: Technology, Business
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,810
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
81
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide

Long Haul Hadoop

Recurrent theme

 Issue           Title
 HADOOP-3421     Requirements for a Resource Manager
 HADOOP-4559     REST API for job/task statistics
 MAPREDUCE-454   Service interface for Job submission
 MAPREDUCE-445   Web service interface to the job tracker
 HADOOP-5123     Ant tasks for job submission
In-datacentre: binary

  Hadoop RPC, AVRO, Apache Thrift...
  Space-efficient text and data
  Performance through efficient marshalling, writable re-use
  Brittle – fails with odd messages
  Insecure – caller is trusted!
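The space-efficiency point above can be illustrated with a variable-length integer encoding of the kind binary RPC formats use. This is a generic base-128 varint sketch, not Hadoop's exact VInt wire format:

```python
def encode_varint(n):
    # Base-128 varint: 7 payload bits per byte, high bit set means
    # "more bytes follow". Small values cost 1 byte instead of a fixed 4 or 8.
    assert n >= 0
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # continuation bit
        else:
            out.append(b)
            return bytes(out)

def decode_varint(data):
    # Reassemble the 7-bit groups, least-significant group first
    n = shift = 0
    for b in data:
        n |= (b & 0x7F) << shift
        if not b & 0x80:
            return n
        shift += 7
```

The flip side of such compact formats is the brittleness the slide mentions: a corrupted or misaligned stream decodes to garbage or fails with an odd message rather than a clear protocol error.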
Requirements

  Works through proxies and firewalls
  Robust against cluster upgrades
  Deploys tasks and sequences of tasks
  Monitor jobs, logs
  Pause, resume, kill jobs
  Only trusted users to submit, manage jobs
WS-*

  Integrates with SOA story
  Integration with WS-* Management layer
  BPEL-ready
  Auto-generation of WSDL from server code
  Auto-generation of client code from WSDL
  Stable Apache stacks (Axis2, CXF)
  Security through HTTPS and WS-Security
WS-* Problems

 Generated WSDL is a maintenance mess
 Generated clients and server code brittle
 No stable WS-* Management layer (yet)
 WS-Security
 Little/no web browser integration
 Binary data tricky
REST
Pure REST

  PUT and DELETE
  Cleaner, purer
  Restlet is best client (LGPL)
  Needs HTML5 browser for in-browser PUT

Model jobs as URLs you PUT; state through attributes you manipulate.
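The pure-REST model can be sketched as a small in-memory resource table: each job is a URL you PUT, its state is an attribute you rewrite, and DELETE kills it. The `/jobs/<name>` URL scheme and the attribute names here are illustrative assumptions, not an actual Hadoop or JobTracker API:

```python
jobs = {}  # resource path -> attribute dict

def put(path, attrs):
    """PUT is idempotent: create the resource if absent, else update it."""
    created = path not in jobs
    jobs.setdefault(path, {}).update(attrs)
    return 201 if created else 200          # 201 Created / 200 OK

def get(path):
    """GET returns the resource's current attributes, or None (think 404)."""
    return jobs.get(path)

def delete(path):
    """DELETE on the job URL == kill the job."""
    return 204 if jobs.pop(path, None) is not None else 404

# Submit a job by PUTting it, then pause it by rewriting its state attribute
put("/jobs/wordcount-1", {"jar": "wc.jar", "state": "running"})
put("/jobs/wordcount-1", {"state": "paused"})
```

Because PUT is idempotent, a client behind a flaky proxy can safely retry a submission, which is one reason the purer verb set appeals for long-haul use.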
HTTP POST

  Easy to use from web browser
  Include URLs in bug reports
  Security through HTTPS only
  HttpClient for client API
  Binary files through form/multipart

Have a job manager you POST to; URLs for each created job.
JobTracker or Workflow?

1. HADOOP-5303: Oozie, Hadoop Workflow
2. Cascading
3. Pig
4. BPEL
5. Constraint-based languages

The REST model could be independent of the back-end.
Status as of Aug 6 2009

1. Can deploy SmartFrog components and hence workflows or constrained models
2. Portlet engine runs in datacentre
3. No REST API
4. Need access to logs, rest of JobTracker
5. Issue: how best to configure the work?