INTEGRATE HUE
WITH YOUR
HADOOP CLUSTER
Romain Rigaux

Y! HUG Apr 16, 2014
Integrate Hue with your Hadoop cluster - Yahoo! Hadoop Meetup

This talk will describe how Hue can be integrated with existing Hadoop deployments with minimal changes/disturbances. Romain will cover details on how Hue can leverage the existing authentication system and security model of your company. He will also cover the Hive/Shark/Pig/Oozie best practice setup for Hue.

http://www.meetup.com/hadoop/events/125191612/

Published in: Data & Analytics, Technology

Transcript of "Integrate Hue with your Hadoop cluster - Yahoo! Hadoop Meetup"

  1. INTEGRATE HUE WITH YOUR HADOOP CLUSTER. Romain Rigaux, Y! HUG, Apr 16, 2014
  2. WHAT IS HUE? WEB INTERFACE FOR MAKING HADOOP EASIER TO USE
 Suite of apps for each Hadoop component,
 like Hive, Pig, Impala, Oozie, Solr, Sqoop2, HBase...
  3. VIEW FROM 30K FEET: Hadoop, Web Server, You and even that friend that uses IE9 ;)
  4. ECOSYSTEM AND APPS: YARN, JobTracker, Oozie, Pig, HDFS, HiveServer2, Hive Metastore, Cloudera Impala, Solr, HBase, Sqoop2, Zookeeper, LDAP, SAML, Hue Plugins
  5. TARGET OF HUE
 GETTING STARTED WITH HADOOP
 BEING PRODUCTIVE EXPLORING DIFFERENT ANGLES OF THE PLATFORM
 LET ANY USER FOCUS ON BIG DATA PROCESSING
 BEING COMPATIBLE WITH ANY HADOOP VERSION (0.20/1.2.0/2.3.0)
  6. OPEN SOURCE
 ~3000 COMMITS
 33 CONTRIBUTORS
 648 STARS
 212 FORKS
 github.com/cloudera/hue
  7. THE CORE TEAM PLAYERS (team.gethue.com): ABRAHAM ELMAHREK, ROMAIN RIGAUX, ENRICO BERTI, CHANG BEER
  8. TALKS AROUND THE WORLD
 Meetups and events in NYC, Paris, LA, Tokyo, SF, Stockholm, Vienna, San Jose, Singapore…
 Coming up in London, West coast
 RETREATS: Nov 13 Koh Chang, Thailand; May 14 Curaçao, Netherlands Antilles
  9. FAST PACE. LAST 30 DAYS: 41 issues created and 38 resolved. Core team + Community
  10. TREND: GROWTH gethue.com
  11. HISTORY: HUE 1. Desktop-like in a browser. It did its job but was pretty slow, leaked memory and was not very IE friendly, yet it was definitely advanced for its time (2009-2010).
  12. HISTORY: HUE 2. The first flat structure port, with Twitter Bootstrap all over the place.
  13. HISTORY: HUE 2.5. New apps and an improved UX, adding nice new functionality like autocomplete and drag & drop.
  14. HISTORY: HUE 3 ALPHA. Proposed design, didn’t make it.
  15. HISTORY: HUE 3.5+. Where we are now: new UI, several new apps, the most user-friendly features to date.
  16. WHICH VERSION TO USE? 6 months / 1k commits later / 1-2 years old. HUE 2.X, HUE 3.X, HUE 3.5 + 1/2 3.6
  17. WHICH DISTRIBUTION?
 GITHUB: very latest (HACKER)
 TARBALL: advanced preview (ADVANCED USER)
 CDH / CM: the most stable and cross-component checked (NORMAL USER)
  18. WHERE TO PUT HUE? IN ONE MACHINE
  19. WHERE TO PUT HUE? INSIDE THE CLUSTER
  20. WHERE TO PUT HUE? OUTSIDE THE CLUSTER
  21. WHAT DO YOU NEED?
 SERVER: Python 2.4 2.6
 That’s it if using a packaged version. If building from the source, here are the extra packages
 CLIENT: Web Browser
 IE 9+, FF 10+, Chrome, Safari
  22. WHAT DOES THE HUE SERVICE LOOK LIKE?
 1 SERVER: process serving pages and also static content
 1 DB: for cookies, saved queries, workflows, …
  23. HOW TO CONFIGURE HUE: HUE.INI
 Similar to core-site.xml but with .INI syntax
 Where? /etc/hue/conf/hue.ini or $HUE_HOME/desktop/conf/pseudo-distributed.ini
 [desktop]
 [[database]]
 # Database engine is typically one of:
 # postgresql_psycopg2, mysql, or sqlite3
 engine=sqlite3
 ## host=
 ## port=
 ## user=
 ## password=
 name=desktop/desktop.db
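The nesting in hue.ini comes from the number of brackets around a section name: [desktop] is a top-level section and [[database]] is a subsection of it. As a minimal sketch of how such a file can be read, assuming only the syntax visible on the slide (this is a stdlib-only illustration, and `parse_nested_ini` is a hypothetical helper, not Hue's own loader):

```python
import re


def parse_nested_ini(text):
    """Parse INI text whose section depth equals the number of brackets."""
    root = {}
    stack = [root]  # stack[d] is the dict of the section currently open at depth d
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines, comments and '##' disabled settings
        header = re.fullmatch(r"(\[+)([^\[\]]+)(\]+)", line)
        if header:
            depth = len(header.group(1))
            if len(header.group(3)) != depth or depth > len(stack):
                raise ValueError("bad section header: " + line)
            del stack[depth:]  # close deeper sections that were still open
            section = stack[-1].setdefault(header.group(2).strip(), {})
            stack.append(section)
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected key=value: " + line)
        stack[-1][key.strip()] = value.strip()
    return root


SAMPLE = """
[desktop]
[[database]]
# Database engine is typically one of:
# postgresql_psycopg2, mysql, or sqlite3
engine=sqlite3
## host=
name=desktop/desktop.db
"""

config = parse_nested_ini(SAMPLE)
print(config["desktop"]["database"]["engine"])
```

Settings prefixed with `##` are treated as comments here, which matches how the slide uses them for commented-out defaults.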
  24. AUTHENTICATE / LOGIN
 [desktop]
 [[auth]]
 # - django.contrib.auth.backends.ModelBackend (entirely Django backend)
 # - desktop.auth.backend.AllowAllBackend (allows everyone)
 # - desktop.auth.backend.AllowFirstUserDjangoBackend
 # - desktop.auth.backend.LdapBackend
 # - desktop.auth.backend.OAuthBackend
 # ...
 ## backend=desktop.auth.backend.AllowFirstUserDjangoBackend
  25. USERS
 ADMIN USER: can give and revoke permissions to single users or groups of users
 USER: regular user + permissions
  26. DB BACKEND
  27. LDAP BACKEND Integrate your employees: LDAP How to guide
  28. CONFIGURE APPS AND PERMISSIONS: LIST OF GROUPS AND PERMISSIONS
 A permission can:
 - allow access to one app (e.g. Hive Editor)
 - modify data from the app (e.g. drop Hive Tables or edit cells in HBase Browser)
 A list of permissions
  29. CONFIGURE APPS AND PERMISSIONS: PERMISSIONS IN ACTION
 User ‘test’ belonging to the group ‘hiveonly’ that has just the ‘hive’ permissions
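The ‘test’ / ‘hiveonly’ example can be sketched as a lookup from user to groups to app permissions. This is only an illustration of the idea on the slide, not Hue's data model; the dictionaries and the `can_access` helper are hypothetical names, and ‘pig’ is used purely as an example of an app the group does not have.

```python
# Hypothetical in-memory stand-ins for what Hue manages through its admin UI.
GROUP_PERMISSIONS = {"hiveonly": {"hive"}}  # group -> apps it may access
USER_GROUPS = {"test": {"hiveonly"}}        # user -> groups it belongs to


def can_access(user, app):
    """Return True if any of the user's groups grants access to the app."""
    return any(
        app in GROUP_PERMISSIONS.get(group, set())
        for group in USER_GROUPS.get(user, set())
    )


print(can_access("test", "hive"))
print(can_access("test", "pig"))
```

A user only sees an app when at least one of their groups carries that app's permission, which is why ‘test’ ends up with the Hive app alone.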
  30. HOW HUE INTERACTS WITH HADOOP: YARN, JobTracker, Oozie, Hue Plugins, LDAP, SAML, Pig, HDFS, HiveServer2, Hive Metastore, Cloudera Impala, Solr, HBase, Sqoop2, Zookeeper
  31. RPC CALLS TO ALL THE HADOOP COMPONENTS: HDFS EXAMPLE
 WebHDFS REST
 NN
 DN DN DN … DN
 http://localhost:50070/webhdfs/v1/<PATH>?op=LISTSTATUS
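The WebHDFS call above is a plain HTTP GET against the NameNode. A minimal sketch of building that URL, assuming the host and port shown on the slide (`webhdfs_url` is a hypothetical helper, not part of Hue):

```python
from urllib.parse import quote, urlencode


def webhdfs_url(path, op, namenode="http://localhost:50070", **params):
    """Build a WebHDFS REST URL for an absolute HDFS path and an operation."""
    query = urlencode({"op": op, **params})
    return f"{namenode}/webhdfs/v1{quote(path)}?{query}"


print(webhdfs_url("/tmp", "LISTSTATUS"))
# → http://localhost:50070/webhdfs/v1/tmp?op=LISTSTATUS
```

Nothing is sent over the network here; fetching the URL from a running cluster is left out so the sketch stays runnable anywhere.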
  32. HOW: RPC CALLS TO ALL THE HADOOP COMPONENTS
 The host/port of every service API (Oozie, YARN, HDFS, HBase…) is specified in hue.ini, in one section per major service (e.g. [hbase]), Hue core ([desktop]) or Hue lib ([liboozie])
 [hbase]
 # Comma-separated list of HBase Thrift servers for
 # clusters in the format of '(name|host:port)'.
 hbase_clusters=(Cluster|localhost:9090)
 [liboozie]
 # The URL where the Oozie service runs on.
 # oozie_url=http://hue.ent.cloudera.com:11000/oozie
 Full list
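The '(name|host:port)' format of hbase_clusters is compact enough to deserve an example. A minimal sketch of splitting it into names and endpoints, assuming only the format described in the comment above (`parse_hbase_clusters` is a hypothetical helper, not Hue's parser):

```python
import re


def parse_hbase_clusters(value):
    """Turn '(name|host:port),(name|host:port)' into {name: (host, port)}."""
    clusters = {}
    for name, host, port in re.findall(r"\(([^|()]+)\|([^:()]+):(\d+)\)", value):
        clusters[name.strip()] = (host.strip(), int(port))
    return clusters


print(parse_hbase_clusters("(Cluster|localhost:9090)"))
# → {'Cluster': ('localhost', 9090)}
```

Several clusters can be listed by separating the parenthesised entries with commas.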
  33. KERBEROS
 1 Hue ticket / principal, no user ticket
 Hue uses its ticket for authenticating to every other service (HDFS, Oozie, …)
 Read more in the Hue Security Guide
  34. HUE KERBEROS TICKET
 Add Hue user principal to Kerberos:
 kadmin: addprinc -randkey hue/hue.server.fully.qualified.domain.name@YOUR-REALM.COM
 Test:
 $ kinit -k -t /etc/hue/hue.keytab hue/hue.server.fully.qualified.domain.name@YOUR-REALM.COM
 Ticket should be renewable (krb5.conf and kdc.conf)
 hue.ini:
 [desktop]
 [[kerberos]]
 # Path to Hue's Kerberos keytab file
 hue_keytab=/etc/hue/hue.keytab
 # Kerberos principal name for Hue
 hue_principal=hue/FQDN@REALM
 # add kinit path for non root users
 kinit_path=/usr/kerberos/bin/kinit
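The three [[kerberos]] settings map directly onto the kinit test command. A minimal sketch of that mapping, assuming the values from the slide (`kinit_command` is a hypothetical helper, not Hue's ticket renewer):

```python
def kinit_command(keytab, principal, kinit_path="kinit"):
    """Build the argv for obtaining a ticket from a keytab (kinit -k -t)."""
    return [kinit_path, "-k", "-t", keytab, principal]


cmd = kinit_command(
    keytab="/etc/hue/hue.keytab",
    principal="hue/hue.server.fully.qualified.domain.name@YOUR-REALM.COM",
    kinit_path="/usr/kerberos/bin/kinit",
)
print(" ".join(cmd))
# On a host with Kerberos installed, the ticket could then be requested with:
#   import subprocess; subprocess.run(cmd, check=True)
```

The command is only built and printed, so the sketch runs on a machine without Kerberos.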
  35. HOW: IMPERSONATION
 Hue is a “super proxy”
 Client could be on a Windows machine, phone… and interact with all the Hadoop services
 Call for getting the information about an HDFS file:
 http://localhost:50070/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=hue&doas=bob
 WebHDFS, add to core-site.xml:
 <!-- Hue WebHDFS proxy user setting -->
 <property>
 <name>hadoop.proxyuser.hue.hosts</name>
 <value>*</value>
 </property>
 <property>
 <name>hadoop.proxyuser.hue.groups</name>
 <value>*</value>
 </property>
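Impersonation shows up in the request as two query parameters: user.name is the proxy user (hue) and doas is the end user on whose behalf the call is made. A minimal sketch that reproduces the URL from the slide (`impersonated_webhdfs_url` is a hypothetical helper, and the NameNode address is the slide's localhost default):

```python
from urllib.parse import quote, urlencode


def impersonated_webhdfs_url(path, op, end_user, proxy_user="hue",
                             namenode="http://localhost:50070"):
    """Build a WebHDFS URL where proxy_user acts on behalf of end_user."""
    query = urlencode([("op", op), ("user.name", proxy_user), ("doas", end_user)])
    return f"{namenode}/webhdfs/v1{quote(path)}?{query}"


url = impersonated_webhdfs_url("/tmp", "GETFILESTATUS", end_user="bob")
print(url)
# → http://localhost:50070/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=hue&doas=bob
```

The call is only accepted when the hadoop.proxyuser.hue.* properties allow hue to act as a proxy user, which is what the core-site.xml snippet configures.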
  36. OTHER SECURITY FEATURES: HTTPS, SSL DB, SSL WITH HIVESERVER2, AUDITING. READ MORE …
  37. HIGH AVAILABILITY HOW: 2 Hue instances, HA proxy, Multi DB. Performances: like a website, mostly RPC calls
  38. DEMO TIME

  39. SUM-UP
 INSTALL: Install Hue on one machine + Hue Kerberos ticket
 LDAP: Use an LDAP backend
 ENABLE: Enable Hadoop Service APIs for Hue as a proxy user
 CONFIGURE: Configure hue.ini to point to each Service API
 HELP: Get help on @gethue or hue-user
  40. CONFIGURATIONS ARE HARD… …GIVE CLOUDERA MANAGER A TRY! vimeo.com/91805055
  41. MISSED SOMETHING? learn.gethue.com
  42. LINKS
 TWITTER @gethue
 USER GROUP hue-user@
 WEBSITE http://gethue.com
 LEARN http://learn.gethue.com
  43. GET HUE
 CLOUDERA’S CDH: Stable and highly tested releases perfectly integrated with the Hadoop ecosystem, automagically configured by Cloudera Manager.
 TARBALL: Try in advance the latest and greatest but you’ll have to configure everything on your own.
 CLOUDERA’S DEMO VM: Get to play with Hue and various Hadoop components in 5 minutes. It’s a self contained CDH environment ready to use.
 HORTONWORKS*: In HDP there’s an old forked version of Hue 2.3.
 MAPR*: Newer version than HDP, close to the original 2.5 minus apps like HBase, Impala, Sqoop, Search.
 HP CLOUD*: The newest addition, ships Hue 3.0 through the GreenButton products.
 BIGTOP
 EMBEDDED/DEMO IN IND. COMPANIES
 * YOUR MILEAGE MAY VARY.
  44. QUESTIONS? WHAT ARE YOUR USE CASES? WHICH COMPONENTS DO YOU USE? WHAT WOULD YOU LIKE TO SEE IN HUE? INTERESTED IN CONTRIBUTING? WANNA SAY HELLO? DO YOU WANT A TAILOR MADE TEAM RETREAT? TEAM@GETHUE.COM
  45. THANK YOU! gethue.com
  46. APPENDIX

  47. HOW: HDFS FILE BROWSER
 Add the Hue WebHDFS proxy user setting, as on the IMPERSONATION slide
 Add the property below to hdfs-site.xml to enable WebHDFS in the NameNode and DataNodes:
 <property>
 <name>dfs.webhdfs.enabled</name>
 <value>true</value>
 </property>
 hue.ini:
 [hadoop]
 [[hdfs_clusters]]
 # HA support by using HttpFs
 [[[default]]]
 # Enter the filesystem uri
 ##fs_defaultfs=hdfs://localhost:8020
 # Use WebHdfs/HttpFs as the communication mechanism.
 ##webhdfs_url=http://localhost:50070/webhdfs/v1
  48. HOW: YARN / MR2
 Example of config for having Hue interact with YARN
 [hadoop]
 [[yarn_clusters]]
 [[[default]]]
 # Enter the host on which you are running the ResourceManager
 resourcemanager_host=localhost
 # The port where the ResourceManager IPC listens on
 ## resourcemanager_port=8032
 # Whether to submit jobs to this cluster
 submit_to=True
 # Change this if your YARN cluster is Kerberos-secured
 ## security_enabled=false
 # URL of the ResourceManager API
 ## resourcemanager_api_url=http://localhost:8088
 # URL of the ProxyServer API
 ## proxy_api_url=http://localhost:8088
 # URL of the HistoryServer API
 # history_server_api_url=http://localhost:19888
 [[[ha]]]
 # Enter the host on which you are running the failover Resource Manager
 resourcemanager_api_url=http://localhost:8088
 ## logical_name=
 submit_to=True
  49. HOW: HIVE (IMPALA / SHARK)
 Based on HiveServer2 interface
 Video demo
 Setup tutorial
 Note for Hive:
 <property>
 <name>hive.server2.enable.doAs</name>
 <value>true</value>
 </property>
 [beeswax]
 # Host where Hive server Thrift daemon is running.
 # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
 ## hive_server_host=localhost
 ## hive_server_port=10000
 # Hive configuration directory, where hive-site.xml is located
 ## hive_conf_dir=/etc/hive/conf
  50. HOW: OOZIE
 Make sure share lib is installed
 Alternative Dashboard and Editors
 [liboozie]
 #oozie_url=http://localhost.com:11000/oozie
 HOW: PIG
 Comes with Oozie, no PigServer yet
 Oozie sharelib
 Oozie credentials for security