Successfully reported this slideshow.
OPEN UP INTERACTIVE
BIG DATA ANALYSIS FOR
YOUR ENTERPRISE
Enrico Berti

Budapest DW Forum, Jun 4, 2014
GOAL

OF HUE
WEB INTERFACE FOR ANALYZING DATA
WITH APACHE HADOOP	
  
!
SIMPLIFY AND INTEGRATE



FREE AND OPEN SOURCE
!
—>...
VIEW FROM

30K FEET
Hadoop Web Server
You, your colleagues and even that
friend that uses IE9 ;)
OPEN SOURCE

~3350 COMMITS	
  


38 CONTRIBUTORS



698 STARS



245 FORKS
!


github.com/cloudera/hue
THE CORE

TEAM PLAYERS
Join	
  us	
  at	
  team.gethue.com
Romain	
  Rigaux Enrico	
  Ber9Chang Abraham	
  ElmahrekAmstel
TALKS
Meetups	
  and	
  events	
  in	
  NYC,	
  Paris,	
  
LA,	
  Tokyo,	
  SF,	
  Stockholm,	
  Vienna,	
  
San	
  Jose,	...
TREND: GROWTH
gethue.com
HISTORY

HUE 1
Desktop-­‐like	
  in	
  a	
  browser,	
  did	
  its	
  
job	
  but	
  preWy	
  slow,	
  memory	
  leaks	
  ...
HISTORY

HUE 2
The	
  first	
  flat	
  structure	
  port,	
  with	
  
TwiWer	
  Bootstrap	
  all	
  over	
  the	
  
place.
H...
HISTORY

HUE 3 ALPHA
Proposed	
  design,	
  didn’t	
  make	
  it.
HISTORY

HUE 3.5
New	
  UI,	
  several	
  new	
  apps,	
  the	
  
most	
  user	
  friendly	
  features	
  to	
  
date.
HISTORY

HUE 3.6+
Where	
  we	
  are	
  now,	
  a	
  brand	
  new	
  
way	
  to	
  search	
  and	
  explore	
  your	
  
da...
WHICH VERSION TO USE?
2500+	
  commits	
  later,	
  new	
  UI,	
  
interac9ve	
  search	
  /	
  SQL,	
  
dashboards...
1-­...
WHICH DISTRIBUTION?
Advanced	
  preview The	
  most	
  stable	
  and	
  cross	
  
component	
  checked
Very	
  latest
GITH...
WHERE TO PUT HUE? IN ONE MACHINE
WHERE TO PUT HUE? OUTSIDE THE CLUSTER
WHERE TO PUT HUE? INSIDE THE CLUSTER
Python	
  2.4	
  2.6



That’s	
  it	
  if	
  using	
  a	
  packaged	
  version.	
  If	
  building	
  from	
  the	
  
sour...
HOW DOES THE HUE SERVICE LOOK LIKE?
Process	
  serving	
  pages	
  and	
  also	
  
static	
  content
1 SERVER 1 DB
For	
  ...
HOW TO CONFIGURE HUE
HUE.INI
Similar	
  to	
  core-­‐site.xml	
  but	
  
with	
  .INI	
  syntax	
  
!
Where?	
  
/etc/hue/...
AUTHENTICATION
Login/Password	
  in	
  a	
  Database	
  
(SQLite,	
  MySQL,	
  …)
SIMPLE ENTERPRISE
LDAP	
  (most	
  used)...
DB BACKEND
LDAP BACKEND
Integrate	
  your	
  employees:	
  LDAP	
  How	
  to	
  guide
USERS
Can	
  give	
  and	
  revoke	
  
permissions	
  to	
  single	
  users	
  or	
  
group	
  of	
  users
ADMIN USER
Regu...
LIST OF GROUPS AND PERMISSIONS
A	
  permission	
  can:	
  
- allow	
  access	
  to	
  one	
  app	
  (e.g.	
  
Hive	
  Edit...
PERMISSIONS IN ACTION
User	
  ‘test’	
  belonging	
  to	
  the	
  group	
  
‘hiveonly’	
  that	
  has	
  just	
  the	
  ‘h...
HOW HUE INTERACTS

WITH HADOOP
YARN
JobTracker
Oozie
Hue Plugins
LDAP	

SAML
Pig
HDFS HiveServer2
Hive	

Metastore
Clouder...
RCP CALLS TO ALL

THE HADOOP COMPONENTS
HDFS EXAMPLE
WebHDFS
REST
DN
DN
DN
…
DN
NN
hWp://localhost:50070/webhdfs/v1/<PATH>...
HOW
List	
  all	
  the	
  host/port	
  of	
  Hadoop	
  
APIs	
  in	
  the	
  hue.ini	
  
!
For	
  example	
  here	
  HBase...
HTTPS SSL DBSSL WITH HIVESERVER2
READ MORE …AUDITING
SECURITY

FEATURES
KERBEROS
2	
  Hue	
  instances	
  
HA	
  proxy	
  
Mul9	
  DB	
  
Performances:	
  like	
  a	
  website,	
  
mostly	
  RPC	
  calls...
Impala,	
  Hive	
  integra9on,	
  Spark	
  
(Shark	
  too)	
  
Interac9ve	
  SQL	
  editor	
  	
  
Integra9on	
  with	
  M...
Solr	
  &	
  Cloud	
  integra9on	
  
Custom	
  interac9ve	
  dashboards	
  
Drag	
  &	
  drop	
  widgets	
  (charts,	
  
9...
Simple	
  custom	
  query	
  language	
  
Supports	
  HBase	
  filter	
  language	
  
Supports	
  selec9on	
  &	
  Copy	
  ...
DEMO
TIME

SUM-UP
Enable	
  Hadoop	
  Service	
  APIs	
  
for	
  Hue	
  as	
  a	
  proxy	
  user
Configure	
  hue.ini	
  to	
  point	...
ROADMAP

NEXT 6 MONTHS
Sentry	
  
Search,	
  Spark,	
  SQL	
  
More	
  dashboards!	
  
Oozie	
  v2	
  
Inter	
  component	...
CONFIGURATIONS ARE HARD…
…GIVE CLOUDERA MANAGER A TRY!
vimeo.com/91805055
MISSED

SOMETHING?
learn.gethue.com
TWITTER
@gethue
USER GROUP
hue-­‐user@
WEBSITE
hWp://gethue.com
LEARN
hWp://learn.gethue.com
THANK YOU! 

Upcoming SlideShare
Loading in …5
×

Open up interactive big data analysis for your enterprise

964 views

Published on

Open up interactive big data analysis for your enterprise
Hadoop brings many data crunching possibilities but also comes with a lot of complexity: the ecosystem is large and continuously changing, interactions happens on the command line, interfaces are built for engineers…
This talk describes how Hue can be integrated with existing Hadoop deployments with minimal changes/disturbances. Enrico covers details on how Hue can leverage the existing authentication system and security model of your company.
This talk describes through an interactive demo and dialog based on open source Hue how users can get started with Hadoop. We will detail how one can start or use an existing Hadoop cluster to setup Hue. The best practices about how to integrate your company directory and security will be shared. Moreover, the underlying technical details about interact with the ecosystem.
The presentation will continue with real life analytics business use cases. It will show how data can be imported and loaded into the cluster for then being queried interactively with SQL or a search dashboard. All through your Web Browser!
To sum-up, attendees of this talk will learn how Hadoop can be made more accessible and why Hue is the ideal gateway for quickly getting started or using the platform more efficiently.

Published in: Data & Analytics, Technology
  • Be the first to comment

  • Be the first to like this

Open up interactive big data analysis for your enterprise

  1. 1. OPEN UP INTERACTIVE BIG DATA ANALYSIS FOR YOUR ENTERPRISE Enrico Berti Budapest DW Forum, Jun 4, 2014
  2. 2. GOAL
 OF HUE WEB INTERFACE FOR ANALYZING DATA WITH APACHE HADOOP   ! SIMPLIFY AND INTEGRATE
 
 FREE AND OPEN SOURCE ! —> OPEN UP BIG DATA
  3. 3. VIEW FROM
 30K FEET Hadoop Web Server You, your colleagues and even that friend that uses IE9 ;)
  4. 4. OPEN SOURCE
 ~3350 COMMITS   
 38 CONTRIBUTORS
 
 698 STARS
 
 245 FORKS ! 
 github.com/cloudera/hue
  5. 5. THE CORE
 TEAM PLAYERS Join  us  at  team.gethue.com Romain  Rigaux Enrico  Ber9Chang Abraham  ElmahrekAmstel
  6. 6. TALKS Meetups  and  events  in  NYC,  Paris,   LA,  Tokyo,  SF,  Stockholm,  Vienna,   San  Jose,  Singapore,  Budapest…
 Coming  up  in  London,  West  coast AROUND
 THE WORLD RETREATS Nov  13  Koh  Chang,  Thailand   May  14  Curaçao,  Netherlands  An9lles   Nov  14  Goa,  India
  7. 7. TREND: GROWTH gethue.com
  8. 8. HISTORY
 HUE 1 Desktop-­‐like  in  a  browser,  did  its   job  but  preWy  slow,  memory  leaks   and  not  very  IE  friendly  but   definitely  advanced  for  its  9me   (2009-­‐2010).
  9. 9. HISTORY
 HUE 2 The  first  flat  structure  port,  with   TwiWer  Bootstrap  all  over  the   place. HUE 2.5 New  apps,  improved  the  UX   adding  new  nice  func9onali9es   like  autocomplete  and  drag  &   drop.
  10. 10. HISTORY
 HUE 3 ALPHA Proposed  design,  didn’t  make  it.
  11. 11. HISTORY
 HUE 3.5 New  UI,  several  new  apps,  the   most  user  friendly  features  to   date.
  12. 12. HISTORY
 HUE 3.6+ Where  we  are  now,  a  brand  new   way  to  search  and  explore  your   data.
  13. 13. WHICH VERSION TO USE? 2500+  commits  later,  new  UI,   interac9ve  search  /  SQL,   dashboards... 1-­‐2  years  old,  use  only  if  you   depend  on  Hive  <  0.12   HUE 2.X HUE 3.X
  14. 14. WHICH DISTRIBUTION? Advanced  preview The  most  stable  and  cross   component  checked Very  latest GITHUB CDH / CMTARBALL HACKER ADVANCED USER NORMAL USER
  15. 15. WHERE TO PUT HUE? IN ONE MACHINE
  16. 16. WHERE TO PUT HUE? OUTSIDE THE CLUSTER
  17. 17. WHERE TO PUT HUE? INSIDE THE CLUSTER
  18. 18. Python  2.4  2.6
 
 That’s  it  if  using  a  packaged  version.  If  building  from  the   source,  here  are  the  extra  packages SERVER CLIENT Web  Browser
 
 IE  9+,  FF  10+,  Chrome,  Safari WHAT DO YOU NEED? Hi  there,  I’m  “just”  a  web  server.
  19. 19. HOW DOES THE HUE SERVICE LOOK LIKE? Process  serving  pages  and  also   static  content 1 SERVER 1 DB For  cookies,  saved  queries,   workflows,  … Hi  there,  I’m  “just”  a  web  server.
  20. 20. HOW TO CONFIGURE HUE HUE.INI Similar  to  core-­‐site.xml  but   with  .INI  syntax   ! Where?   /etc/hue/conf/hue.ini
 or   $HUE_HOME/desktop/conf/ pseudo-distributed.ini [desktop] [[database]] # Database engine is typically one of: # postgresql_psycopg2, mysql, or sqlite3 engine=sqlite3 ## host= ## port= ## user= ## password= name=desktop/desktop.db
  21. 21. AUTHENTICATION Login/Password  in  a  Database   (SQLite,  MySQL,  …) SIMPLE ENTERPRISE LDAP  (most  used),  OAuth,   OpenID,  SAML
  22. 22. DB BACKEND
  23. 23. LDAP BACKEND Integrate  your  employees:  LDAP  How  to  guide
  24. 24. USERS Can  give  and  revoke   permissions  to  single  users  or   group  of  users ADMIN USER Regular  user  +  permissions
  25. 25. LIST OF GROUPS AND PERMISSIONS A  permission  can:   - allow  access  to  one  app  (e.g.   Hive  Editor)   - modify  data  from  the  app  (e.g   drop  Hive  Tables  or  edit  cells  in   HBase  Browser) CONFIGURE APPS
 AND PERMISSIONS A  list  of  permissions
  26. 26. PERMISSIONS IN ACTION User  ‘test’  belonging  to  the  group   ‘hiveonly’  that  has  just  the  ‘hive’   permissions CONFIGURE APPS
 AND PERMISSIONS
  27. 27. HOW HUE INTERACTS
 WITH HADOOP YARN JobTracker Oozie Hue Plugins LDAP SAML Pig HDFS HiveServer2 Hive Metastore Cloudera Impala Solr HBase Sqoop2 Zookeeper
  28. 28. RCP CALLS TO ALL
 THE HADOOP COMPONENTS HDFS EXAMPLE WebHDFS REST DN DN DN … DN NN hWp://localhost:50070/webhdfs/v1/<PATH>?op=LISTSTATUS
  29. 29. HOW List  all  the  host/port  of  Hadoop   APIs  in  the  hue.ini   ! For  example  here  HBase  and  Hive. RCP CALLS TO ALL
 THE HADOOP COMPONENTS Full  list [hbase] # Comma-separated list of HBase Thrift servers for # clusters in the format of '(name|host:port)'. hbase_clusters=(Cluster|localhost:9090) ! [beeswax] hive_server_host=host-abc hive_server_port=10000
  30. 30. HTTPS SSL DBSSL WITH HIVESERVER2 READ MORE …AUDITING SECURITY
 FEATURES KERBEROS
  31. 31. 2  Hue  instances   HA  proxy   Mul9  DB   Performances:  like  a  website,   mostly  RPC  calls HIGH AVAILABILITY HOW
  32. 32. Impala,  Hive  integra9on,  Spark   (Shark  too)   Interac9ve  SQL  editor     Integra9on  with  MapReduce,   Metastore,  HDFS SQL WHAT
  33. 33. Solr  &  Cloud  integra9on   Custom  interac9ve  dashboards   Drag  &  drop  widgets  (charts,   9meline…) SEARCH WHAT
  34. 34. Simple  custom  query  language   Supports  HBase  filter  language   Supports  selec9on  &  Copy  +  Paste,   gracefully  degrades  in  IE   Autocomplete  Help  Menu   Row$Key$ Scan$Length$ Prefix$Scan$ Column/Family$Filters$ Thri=$Filterstring$ Searchbar(Syntax(Breakdown( HBASE BROWSER WHAT
  35. 35. DEMO TIME

  36. 36. SUM-UP Enable  Hadoop  Service  APIs   for  Hue  as  a  proxy  user Configure  hue.ini  to  point  to   each  Service  API Get  help  on  @gethue  or  hue-­‐ user Install  Hue  on  one  machine Use  an  LDAP  backend INSTALL CONFIGUREENABLE HELPLDAP
  37. 37. ROADMAP
 NEXT 6 MONTHS Sentry   Search,  Spark,  SQL   More  dashboards!   Oozie  v2   Inter  component  integra9ons   (HBase  <-­‐>  Search,  create  index   wizards,  document  permissions),   Hadoop  Web  apps  SDK   Your  idea  here. WHAT
  38. 38. CONFIGURATIONS ARE HARD… …GIVE CLOUDERA MANAGER A TRY! vimeo.com/91805055
  39. 39. MISSED
 SOMETHING? learn.gethue.com
  40. 40. TWITTER @gethue USER GROUP hue-­‐user@ WEBSITE hWp://gethue.com LEARN hWp://learn.gethue.com THANK YOU! 


×