Django+NoSQL: How Hue Integrates with Hadoop
Abraham Elmahrek
Cloudera - March 5th, 2014

Monday, March 3, 14
How Hue integrates Hadoop with Django
Because big data systems are structured so differently from one another, they can be difficult to query and even more difficult to explore. Hue, a Django-driven web application, integrates with these components and provides a clean, easy-to-use interface. In this discussion, we'll cover how the Hue project addressed communicating with HBase, HDFS, and various query engines. We'll also cover the reasons behind these design decisions.

Published in: Technology
Transcript of "How Hue integrates Hadoop with Django"

  1. Django+NoSQL: How Hue Integrates with Hadoop. Abraham Elmahrek, Cloudera - March 5th, 2014.
  2. What is Hue? HUE 1: Desktop-like in a browser. It did its job but was pretty slow, had memory leaks, and was not very IE friendly, but it was definitely advanced for its time (2009-2010).
  3. HISTORY - HUE 2: The first flat structure port, with Twitter Bootstrap all over the place.
  4. HISTORY - HUE 2.5: New apps and an improved UX, adding nice new functionality like autocomplete and drag & drop.
  5. HISTORY - HUE 3 ALPHA: Proposed design, didn’t make it.
  6. HISTORY - HUE 3: Transition to the new UI, major improvements and new apps.
  7. HISTORY - HUE 3.5+
  8. APPS [Figure: diagram of the Hue apps; the rotated app labels did not survive text extraction.]
  9. APPS; Hue Plugins; YARN, JobTracker, Pig, Oozie, Cloudera Impala, HiveServer2, HDFS, Hive Metastore, HBase, Solr, ZooKeeper, Sqoop2, LDAP, SAML.
  10. FAST PACE - LAST MONTH: 91 issues created and 90 resolved. Core team + Community.
  11. STACK - BACKEND: Python + Django (2.6+/1.4.5). FRONTEND: jQuery, Bootstrap, Knockout.js, Love.
  12. HADOOP INTERFACES - REST & THRIFT: Many Hadoop interfaces used: WebHDFS, YARN API (RM, NM, MR...), HiveServer2, Impala, HBase, Oozie, Sqoop2, ZooKeeper, ... CUSTOM CLIENTS: Provide custom clients for more explicit API definitions.
  13. PROTOCOLS - REST: Use python-requests and a custom client to streamline RESTful interface calls. Thrift: Custom connection pooling and socket multiplexing to streamline Thrift calls.

          # REST: build a WebHDFS client, enabling Kerberos when security is on
          client = http_client.HttpClient(url, exc_class=WebHdfsException, logger=LOG)
          if security_enabled:
              client.set_kerberos_auth()
          return client

          # Thrift: obtain a (pooled) HiveServer2 client
          thrift_util.get_client(TCLIService.Client,
                                 query_server['server_host'],
                                 query_server['server_port'],
                                 service_name=query_server['server_name'],
                                 kerberos_principal=kerberos_principal_short_name,
                                 use_sasl=use_sasl,
                                 mechanism=mechanism,
                                 username=user.username,
                                 timeout_seconds=conf.SERVER_CONN_TIMEOUT.get(),
                                 use_ssl=conf.SSL.ENABLED.get(),
                                 ca_certs=conf.SSL.CACERTS.get(),
                                 keyfile=conf.SSL.KEY.get(),
                                 certfile=conf.SSL.CERT.get(),
                                 validate=conf.SSL.VALIDATE.get())
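Slide 13 mentions custom connection pooling for Thrift calls without showing it. As a rough illustration of the general idea only (this is not Hue's `thrift_util` code), a pool that reuses idle connections can be sketched with the standard library. `ConnectionPool`, `factory`, and `max_size` are hypothetical names chosen for this sketch.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Keeps up to `max_size` idle connections around for reuse.

    `factory` is a hypothetical zero-argument callable that opens a new
    connection (for Thrift, it would open a transport and build a client).
    """

    def __init__(self, factory, max_size=4):
        self._factory = factory
        self._idle = queue.LifoQueue(maxsize=max_size)
        self.created = 0  # how many connections were actually opened

    @contextmanager
    def connection(self):
        try:
            conn = self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            conn = self._factory()          # none idle: open a new one
            self.created += 1
        try:
            yield conn
        finally:
            try:
                self._idle.put_nowait(conn)  # hand it back for reuse
            except queue.Full:
                pass                         # pool is full: drop the extra


# Usage: two sequential calls share a single underlying connection.
pool = ConnectionPool(factory=object, max_size=2)
with pool.connection() as first:
    pass
with pool.connection() as second:
    pass
```

A real pool would also need to discard connections that failed mid-call and key the pool by host, port, and user; those details are left out here.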
  14. ACCESSIBILITY - Middleware: Make Hadoop interfaces accessible in request objects.

          class ClusterMiddleware(object):
              def process_view(self, request, ...):
                  request.fs = cluster.get_hdfs(request.fs_ref)
                  if request.user.is_authenticated():
                      if request.fs is not None:
                          request.fs.setuser(request.user.username)

          def download(request, path):
              if not request.fs.exists(path):
                  raise Http404(_("File not found."))
              if not request.fs.isfile(path):
                  raise PopupException(_("not a file."))
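The middleware pattern on slide 14 can be tried outside Django with stand-in objects. The sketch below is an assumption-laden illustration, not Hue's code: `FakeFs`, `FakeUser`, `FakeRequest`, and `ClusterMiddlewareSketch` are hypothetical, and only the `process_view(request, view_func, view_args, view_kwargs)` hook shape comes from Django itself.

```python
class FakeFs:
    """Stand-in for an HDFS client (hypothetical, not Hue's WebHdfs)."""

    def __init__(self):
        self.user = None

    def setuser(self, username):
        self.user = username


class FakeUser:
    """Stand-in for a Django user, using the older method-style check."""

    def __init__(self, username, authenticated=True):
        self.username = username
        self._authenticated = authenticated

    def is_authenticated(self):
        return self._authenticated


class FakeRequest:
    """Stand-in for an HttpRequest."""

    def __init__(self, user):
        self.user = user
        self.fs = None


class ClusterMiddlewareSketch:
    """Attaches a filesystem client to every request before the view runs."""

    def __init__(self, get_fs):
        self._get_fs = get_fs  # hypothetical factory returning a client or None

    def process_view(self, request, view_func=None, view_args=(), view_kwargs=None):
        request.fs = self._get_fs()
        if request.fs is not None and request.user.is_authenticated():
            # Act on the cluster as the logged-in user.
            request.fs.setuser(request.user.username)
        return None  # None tells Django to carry on and call the view


# Usage: after the middleware runs, a view can simply use request.fs.
request = FakeRequest(FakeUser("alice"))
ClusterMiddlewareSketch(get_fs=FakeFs).process_view(request)
```

The benefit of this design is that views such as `download` never construct a client themselves; they only read `request.fs`.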
  15. HDFS - Goal: Easily browse, create, read, update, and delete files in HDFS.
  16. HDFS - Communication. REST: The NameNode provides a RESTful server called WebHDFS. Explicit Client: Provide an API that is explicit. Request Accessible: Provide a middleware for populating a request member.

          http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE
          http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN
          ...

          class WebHdfs(Hdfs):
              def create(self, path, ...): ...
              def read(self, path, ...): ...

          def download(request, path):
              if not request.fs.exists(path):
                  raise Http404(_("File not found."))
              if not request.fs.isfile(path):
                  raise PopupException(_("not a file."))
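To make the URL pattern on slide 16 concrete, here is a minimal sketch of how an explicit client might assemble WebHDFS URLs. The helper name `webhdfs_url` is hypothetical, and the host and port in the usage lines are placeholders; `offset` and `length` are optional parameters of the WebHDFS `OPEN` operation.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=<OP>&... for one call."""
    # Normalise to exactly one leading slash, then escape unsafe characters
    # while keeping the path separators intact.
    hdfs_path = "/" + path.lstrip("/")
    query = urlencode({"op": op, **params})
    return "http://%s:%d/webhdfs/v1%s?%s" % (host, port, quote(hdfs_path), query)


# Usage (placeholder host and port): read the first 1024 bytes of a file.
url = webhdfs_url("namenode.example.com", 50070,
                  "/user/alice/my file.txt", "OPEN", offset=0, length=1024)
print(url)
```

Wrapping this in methods such as `create` and `read` is what gives callers the explicit API the slide describes, instead of hand-built query strings.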
  17. HDFS - Cool Things. MIME Type Detection: Detect the various kinds of files being read: Avro, GZIP, etc. Pagination: Nice pagination by block size when viewing a file (soon to be more like a PDF reader, with content added automatically as you scroll).
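Slide 17 does not show how detection or pagination is done. One common approach, sketched here as an assumption rather than as Hue's implementation, is to check the file's leading magic bytes and to turn a page number into a byte range. `sniff_format` and `page_range` are hypothetical helper names.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of every gzip stream
AVRO_MAGIC = b"Obj\x01"    # header of an Avro object container file


def sniff_format(head):
    """Guess the file type from the first few bytes of its content."""
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(AVRO_MAGIC):
        return "avro"
    return "unknown"


def page_range(page, block_size):
    """Return (offset, length) in bytes for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * block_size, block_size


# Usage: sniff a freshly compressed payload and locate page 3.
kind = sniff_format(gzip.compress(b"hello"))
offset, length = page_range(3, block_size=128)
```

The `(offset, length)` pair maps directly onto a ranged read, so only the visible page has to be fetched from the cluster.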
  18. HBase - Goal: Make it easy to view and search HBase.
  19. HBase - Technical Risk. 2 Dimensions: Infinitely many columns and rows. Sparseness: Column names will often differ per row.
  20. HBase - Communication. Thrift: Communicate with HBase using Thrift for better filtering. Explicit Client: Provide an API that is explicit.

          class HBaseApi(Hdfs):
              def createTable(self, cluster, tableName, ...): ...
              def getRows(self, cluster, tableName, columns, ...): ...

  21. HBase - Results. Improved View: Intelligent view that collapses null cells. Better Search: Improved searchability of HBase via flexible search. MIME Type Detection: Able to view documents in HBase: PDF, images, etc.
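The sparseness problem from slide 19 and the collapsed view from slide 21 can be illustrated with plain dictionaries. This is a toy sketch under the assumption that each row arrives as a mapping of column name to value; `collapse_row` and the sample data are hypothetical.

```python
def collapse_row(row):
    """Keep only the cells that hold a value, sorted by column name."""
    return sorted((col, val) for col, val in row.items() if val is not None)


# Two rows whose column sets barely overlap, as is typical in HBase.
rows = {
    "row1": {"cf:name": "a", "cf:email": None},
    "row2": {"cf:age": "30"},
}

# A dense grid would need every column for every row...
all_columns = sorted({col for cells in rows.values() for col in cells})

# ...whereas the collapsed view lists each row's own cells only.
collapsed = {key: collapse_row(cells) for key, cells in rows.items()}
```

With many rows the union of columns grows without bound, which is why rendering each row's own cells scales better than a fixed table.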
  22. Hive - Goal: Make it easy to run queries in Hive.
  23. Hive - Communication. Thrift: Communicate with HiveServer2 using Thrift. Explicit Client: Provide a higher level API that is explicit and easy to configure. DBMS: Further the capacities of the DBMS in Hue.

          thrift_util.get_client(TCLIService.Client,
                                 query_server['server_host'],
                                 query_server['server_port'],
                                 service_name=query_server['server_name'],
                                 ...)

          class HiveServerClient:
              HS2_MECHANISMS = {'KERBEROS': 'GSSAPI', 'NONE': 'PLAIN', 'NOSASL': 'NOSASL'}

              def __init__(self, query_server, user, ...):
                  thrift_util.get_client(TCLIService.Client, ...

          class HiveServer2Dbms(object):
              def get_databases(self):
                  return self.client.get_databases()
              ...
              def select_star_from(self, database, table):
                  hql = "SELECT * FROM `%s.%s` %s" % (database, table.name, self._get_browse_limit_clause(table))
                  return self.execute_statement(hql)
              ...
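The layering on slide 23 (a mechanism table in the client, query building in the DBMS class) can be exercised in isolation. The sketch below reuses the `HS2_MECHANISMS` mapping and the query template from the slide; `resolve_mechanism`, `select_star`, the `limit` parameter, and the rule that only `NOSASL` disables SASL are assumptions made for illustration.

```python
HS2_MECHANISMS = {'KERBEROS': 'GSSAPI', 'NONE': 'PLAIN', 'NOSASL': 'NOSASL'}


def resolve_mechanism(auth_setting):
    """Map a configured auth setting to (use_sasl, mechanism).

    Assumption: SASL is used for everything except the NOSASL setting.
    """
    mechanism = HS2_MECHANISMS[auth_setting.upper()]
    return mechanism != 'NOSASL', mechanism


def select_star(database, table_name, limit=None):
    """Build the browse query from the slide, with an optional LIMIT."""
    for identifier in (database, table_name):
        if "`" in identifier:
            # A backtick would break out of the quoted identifier.
            raise ValueError("invalid identifier: %r" % identifier)
    limit_clause = "LIMIT %d" % limit if limit is not None else ""
    hql = "SELECT * FROM `%s.%s` %s" % (database, table_name, limit_clause)
    return hql.strip()


# Usage
use_sasl, mechanism = resolve_mechanism("kerberos")
hql = select_star("default", "sample_07", limit=100)
```

Keeping the statement construction in one method means callers ask for "the first rows of this table" rather than writing HiveQL themselves.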
  24. Hive - Results. One Page App: Intelligent view that lets users worry only about their queries. Secure: Achieved some level of security through SASL, Kerberos, and SSL. Navigation: Able to navigate databases and tables easily.
  25. DEMO TIME
  26. Missed something? GET STARTED: Take a closer look at REST and Thrift communication in Hue; the inner workings of the Filebrowser; the fundamentals of the HBase browser; the concepts behind the Beeswax app.
  27. What else does Hue do with Django? Extensible settings: Configuration of settings.py provided through the hue.ini. Security: Configurable session timeouts, SAML authentication, etc. Doc Model: Polymorphic documents via a base document model. Authentication: LDAP, PAM, OAuth, etc. provided through authentication backends. Permissions: Per-app permissions configurable in the UserAdmin. Testing: Mocked and functional tests via nose + django-nose.
  28. GET HUE. CLOUDERA’S CDH: Stable and highly tested releases perfectly integrated with the Hadoop ecosystem, automagically configured by Cloudera Manager. TARBALL: Try in advance the latest and greatest but you’ll have to configure everything on your own. CLOUDERA’S DEMO VM: Get to play with Hue and various Hadoop components in 5 minutes. It’s a self contained CDH environment ready to use. HORTONWORKS*: In HDP there’s an old forked version of Hue 2.3. MAPR*: Newer version than HDP, close to the original 2.5 minus apps like HBase, Impala, Sqoop, Search. HP CLOUD*: The newest addition, ships Hue 3.0 through the GreenButton products. BIGTOP. EMBEDDED/DEMO IN IND. COMPANIES. * YOUR MILEAGE MAY VARY.
  29. LINKS - WEBSITE: http://gethue.com GITHUB: https://github.com/cloudera/hue/ BLOG: http://blog.gethue.com TWITTER: @gethue USER GROUP: hue-user@
  30. THANKS. QUESTIONS? gethue.com