Cloudera Impala
Charm City Linux, March 2014

Alex Moundalexis
@technmsg

Thirty Seconds About Alex
• Solutions Architect
• aka consultant
• government
• infrastructure
• former coder of Perl
• former administrator
• likes shiny objects

What Does Cloudera Do?
• product
  • distribution of Hadoop components, Apache licensed
  • enterprise tooling
• support
• training
• services (aka consulting)
• community

Disclaimer
• Cloudera builds software
  • most donated to Apache
  • some closed-source
• Cloudera “products” I reference are open source
  • Apache Licensed
  • source code is on GitHub
  • https://github.com/cloudera

What This Talk Isn’t About
• deploying
  • Puppet, Chef, Ansible, homegrown scripts, intern labor
• sizing & tuning
  • depends heavily on data and workload
• coding
  • unless you count XML or CSV or SQL
• algorithms

The Apache Hadoop Ecosystem
Quick and dirty, for context.

Why “Ecosystem?”
• In the beginning, just Hadoop
  • HDFS
  • MapReduce
• Today, dozens of interrelated components
  • I/O
  • Processing
  • Specialty Applications
  • Configuration
  • Workflow

HDFS
• Distributed, highly fault-tolerant filesystem
• Optimized for large streaming access to data
• Based on Google File System
  • http://research.google.com/archive/gfs.html

Lots of Commodity Machines
[Image: Yahoo! Hadoop cluster, OSCON ’07]

MapReduce (MR)
• Programming paradigm
• Batch oriented, not realtime
• Works well with distributed computing
• Lots of Java, but other languages supported
• Based on Google’s paper
  • http://research.google.com/archive/mapreduce.html

Under the Covers

You specify map() and reduce() functions.
The framework does the rest.

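The division of labor described above can be sketched in a few lines of Python. This is only an illustration, assuming a word-count job: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and `run_mapreduce` is a single-process stand-in for the framework, not Hadoop's actual (Java) API.

```python
from collections import defaultdict


def map_fn(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce step: combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(lines, mapper, reducer):
    """Toy stand-in for the framework: map, group by key (shuffle), reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: collect values per key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


lines = ["hadoop stores data", "impala queries data", "hadoop runs mapreduce"]
word_counts = run_mapreduce(lines, map_fn, reduce_fn)
print(word_counts)
```

Only `map_fn` and `reduce_fn` are specific to the job; in a real cluster the grouping, scheduling, and retry logic inside `run_mapreduce` is what the framework provides across many machines.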
Apache Hive
• Abstraction of Hadoop’s Java API
• HiveQL “compiles” down to MR
  • a “SQL-like” language
• Eases analysis using MapReduce

Apache Hive Metastore
• Maps HDFS files to DB-like resources
• Databases
• Tables
• Column/field names, data types
• Roles/users
• InputFormat/OutputFormat

But wait…
WHY DO WE NEED THIS?

Super Shady SQL Supplement
I am not a SQL wizard by any means…

A Simple Relational Database

name   state     employer  year
Alex   Maryland  Cloudera  2013
Joey   Maryland  Cloudera  2011
Sean   Texas     Cloudera  2013
Paris  Maryland  AOL       2011

Interacting with Relational Data

SELECT * FROM people;

Requesting Specific Fields

SELECT name, state FROM people;

Requesting Specific Rows

SELECT name, state FROM people WHERE year > 2012;
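The three queries from the preceding slides can be tried against the sample `people` table with a short Python sketch. It assumes an in-memory SQLite database as a stand-in for any SQL engine (it is not Impala or Hive), and the `>` comparison in the last query is an assumption about the slide's intent.

```python
import sqlite3

# In-memory SQLite database standing in for any SQL engine (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (name TEXT, state TEXT, employer TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("Alex", "Maryland", "Cloudera", 2013),
        ("Joey", "Maryland", "Cloudera", 2011),
        ("Sean", "Texas", "Cloudera", 2013),
        ("Paris", "Maryland", "AOL", 2011),
    ],
)

# Every column of every row.
all_rows = conn.execute("SELECT * FROM people").fetchall()

# Only the requested fields (columns).
name_state = conn.execute("SELECT name, state FROM people").fetchall()

# Only the rows matching the filter; the '>' operator is assumed.
recent = conn.execute(
    "SELECT name, state FROM people WHERE year > 2012"
).fetchall()

for row in recent:
    print(row)
```

Selecting columns narrows the result horizontally, while the `WHERE` clause narrows it vertically; the two combine freely in one statement.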
  
Two Simple Tables

owner  species  name
Alex   Cactus   Marvin
Joey   Cat      Brain
Sean   None
Paris  Unknown

name   state     employer  year
Alex   Maryland  Cloudera  2013
Joey   Maryland  Cloudera  2011
Sean   Texas     Cloudera  2013
Paris  Maryland  AOL       2011

Joining Two Tables

SELECT people.name AS owner, people.state AS state, pets.name AS pet
FROM people LEFT JOIN pets ON people.name = pets.owner

owner  state     pet
Alex   Maryland  Marvin
Joey   Maryland  Brain
Sean   Texas
Paris  Maryland

Varying Implementation of JOIN

SELECT people.name AS owner, people.state AS state, pets.name AS pet
FROM people LEFT JOIN pets ON people.name = pets.owner

owner  state     pet
Alex   Maryland  Marvin
Joey   Maryland  Brain
Sean   Texas     ?
Paris  Maryland  ?

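The join from the slides above can be reproduced with a small Python sketch, again assuming an in-memory SQLite database as a stand-in engine. The missing pet names are stored as NULL here, which is an assumption about what the blank cells in the `pets` table mean; how such values are displayed (blank, `?`, `NULL`) depends on the engine and client, and Python's `sqlite3` returns them as `None`.

```python
import sqlite3

# In-memory SQLite database standing in for any SQL engine (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (name TEXT, state TEXT, employer TEXT, year INTEGER);
    CREATE TABLE pets (owner TEXT, species TEXT, name TEXT);

    INSERT INTO people VALUES
        ('Alex',  'Maryland', 'Cloudera', 2013),
        ('Joey',  'Maryland', 'Cloudera', 2011),
        ('Sean',  'Texas',    'Cloudera', 2013),
        ('Paris', 'Maryland', 'AOL',      2011);

    -- Blank pet names from the slide are stored as NULL (assumed).
    INSERT INTO pets VALUES
        ('Alex',  'Cactus',  'Marvin'),
        ('Joey',  'Cat',     'Brain'),
        ('Sean',  'None',    NULL),
        ('Paris', 'Unknown', NULL);
    """
)

# The query exactly as it appears on the slide.
joined = conn.execute(
    """
    SELECT people.name AS owner, people.state AS state, pets.name AS pet
    FROM people LEFT JOIN pets ON people.name = pets.owner
    """
).fetchall()

for owner, state, pet in joined:
    print(owner, state, pet)
```

A `LEFT JOIN` keeps every row from `people` even when there is nothing to match in `pets`, so an owner with no pet row at all would also appear with a NULL pet.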
Cloudera Impala
Familiar interface, but more powerful.

Cloudera Impala
• Interactive query on Hadoop
  • think seconds, not minutes
• Nearly ANSI-92 standard SQL
  • compatible with HiveQL
• Native MPP query engine
  • built for low-latency queries

Cloudera Impala – Design Choices
• Native daemons, written in C/C++
• No JVM, no MapReduce
• Saturate disks on reads
• Uses in-memory HDFS caching
• Re-uses Hive metastore
• Not as fault-tolerant as MapReduce

Cloudera Impala – Architecture
• Impala Daemon
  • runs on every node
  • handles client requests
  • handles query planning and execution
• State Store Daemon
  • provides name service
  • metadata distribution
  • used for finding data

Impala Query Execution
[Diagram: a SQL app connects via ODBC. The Hive Metastore, HDFS NameNode (NN), and Statestore sit alongside three nodes, each running a Query Planner, Query Coordinator, and Query Executor on top of an HDFS DataNode (DN) and HBase.]
1) Request arrives via ODBC/JDBC/HUE/Shell
2) Planner turns request into collections of plan fragments
3) Coordinator initiates execution on impalad(s) local to data
4) Intermediate results are streamed between impalad(s)
5) Query results are streamed back to client

Cloudera Impala – Results
• Allows for fast iteration/discovery
• How much faster?
  • 3-4x faster on I/O bound workloads
  • up to 45x faster on multi-MR queries
  • up to 90x faster on in-memory cache

Demo
Hold onto something, folks.

What’s Next?
• Download Hadoop!
  • CDH available at www.cloudera.com
• Already done that? Contribute…
• Cloudera provides pre-loaded VMs
  • http://tiny.cloudera.com/quickstartvm
• Clone our repos!
  • https://github.com/cloudera

Special thanks:
PARIS

Questions?
Preferably related to the talk… or not.

Thank You!
Alex Moundalexis
@technmsg

We’re hiring, kids! Well, not kids.
  

Introduction to Cloudera Impala

  • 1. 1 Cloudera  Impala   Charm  City  Linux,  March  2014     Alex  Moundalexis       @technmsg  
  • 2. Thirty  Seconds  About  Alex   •  Solu@ons  Architect   •  aka  consultant   •  government   •  infrastructure   •  former  coder  of  Perl   •  former  administrator   •  likes  shiny  objects   2  
  • 3. What  Does  Cloudera  Do?   •  product   •  distribu@on  of  Hadoop  components,  Apache  licensed   •  enterprise  tooling   •  support   •  training   •  services  (aka  consul@ng)   •  community   3
  • 4. Disclaimer   •  Cloudera  builds  things  soMware   •  most  donated  to  Apache   •  some  closed-­‐source   •  Cloudera  “products”  I  reference  are  open  source   •  Apache  Licensed   •  source  code  is  on  GitHub   •  hSps://github.com/cloudera   4
  • 5. What This Talk Isn’t About
      • deploying
      • Puppet, Chef, Ansible, homegrown scripts, intern labor
      • sizing & tuning
      • depends heavily on data and workload
      • coding
      • unless you count XML or CSV or SQL
      • algorithms
  • 6. The Apache Hadoop Ecosystem
      Quick and dirty, for context.
  • 7. Why “Ecosystem?”
      • In the beginning, just Hadoop
      • HDFS
      • MapReduce
      • Today, dozens of interrelated components
      • I/O
      • Processing
      • Specialty Applications
      • Configuration
      • Workflow
  • 8. HDFS
      • Distributed, highly fault-tolerant filesystem
      • Optimized for large streaming access to data
      • Based on Google File System
      • http://research.google.com/archive/gfs.html
  • 9. Lots of Commodity Machines
      Image: Yahoo! Hadoop cluster [ OSCON ’07 ]
  • 10. MapReduce (MR)
      • Programming paradigm
      • Batch oriented, not realtime
      • Works well with distributed computing
      • Lots of Java, but other languages supported
      • Based on Google’s paper
      • http://research.google.com/archive/mapreduce.html
  • 12. You specify map() and reduce() functions. The framework does the rest.
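To make the map()/reduce() split concrete, here is a minimal single-process sketch in Python. It is not Hadoop's API: the names `map_fn`, `reduce_fn`, and `run_job` are made up for illustration, and `run_job` merely stands in for the work the framework would do (calling map, grouping values by key, calling reduce).

```python
from collections import defaultdict


def map_fn(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce step: combine all the values emitted for a single key."""
    return word, sum(counts)


def run_job(lines):
    """Toy stand-in for the framework: map, group by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)  # the "shuffle": collect values per key
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


# Hypothetical input: two short "documents".
word_counts = run_job(["hadoop stores data", "impala queries data"])
```

In a real cluster the same two user-written functions would run on many machines at once; only the grouping and scheduling, which `run_job` fakes here, belong to the framework.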
  • 13. Apache Hive
      • Abstraction of Hadoop’s Java API
      • HiveQL “compiles” down to MR
      • a “SQL-like” language
      • Eases analysis using MapReduce
  • 14. Apache Hive Metastore
      • Maps HDFS files to DB-like resources
      • Databases
      • Tables
      • Column/field names, data types
      • Roles/users
      • InputFormat/OutputFormat
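One way to picture what the metastore holds is as a catalog that maps a table name to a file location plus a schema. The Python sketch below is only a mental model under that assumption: `TableEntry`, `register`, `describe`, the HDFS path, and the format labels are all hypothetical and do not reflect the real metastore's schema or API.

```python
from dataclasses import dataclass


@dataclass
class TableEntry:
    database: str
    name: str
    location: str        # directory in HDFS that holds the table's files
    columns: list        # (column name, data type) pairs
    input_format: str    # how files are read (illustrative label only)
    output_format: str   # how files are written (illustrative label only)


# The "metastore": a lookup from (database, table) to its description.
metastore = {}


def register(entry):
    """Record where a table's files live and how to interpret them."""
    metastore[(entry.database, entry.name)] = entry


def describe(database, table):
    """Return the column names and types recorded for a table."""
    return metastore[(database, table)].columns


register(TableEntry(
    database="default",
    name="people",
    location="hdfs:///example/warehouse/people",  # hypothetical path
    columns=[("name", "string"), ("state", "string"),
             ("employer", "string"), ("year", "int")],
    input_format="text",
    output_format="text",
))

people_columns = describe("default", "people")
```

The point of the sketch is that the files themselves carry no table definition; the catalog entry is what lets a query engine treat a directory of files as rows and columns.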
  • 15. But wait…
      WHY DO WE NEED THIS?
  • 16. 16  
  • 17. Super Shady SQL Supplement
      I am not a SQL wizard by any means…
  • 18. A Simple Relational Database
      name   state     employer  year
      Alex   Maryland  Cloudera  2013
      Joey   Maryland  Cloudera  2011
      Sean   Texas     Cloudera  2013
      Paris  Maryland  AOL       2011
  • 19. Interacting with Relational Data
      SELECT * FROM people;
      name   state     employer  year
      Alex   Maryland  Cloudera  2013
      Joey   Maryland  Cloudera  2011
      Sean   Texas     Cloudera  2013
      Paris  Maryland  AOL       2011
  • 21. Requesting Specific Fields
      SELECT name, state FROM people;
      name   state     employer  year
      Alex   Maryland  Cloudera  2013
      Joey   Maryland  Cloudera  2011
      Sean   Texas     Cloudera  2013
      Paris  Maryland  AOL       2011
  • 23. Requesting Specific Rows
      SELECT name, state FROM people WHERE year  2012;
      name   state     employer  year
      Alex   Maryland  Cloudera  2013
      Joey   Maryland  Cloudera  2011
      Sean   Texas     Cloudera  2013
      Paris  Maryland  AOL       2011
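The three query shapes above (all rows, specific fields, specific rows) can be tried locally with Python's built-in `sqlite3` module, as sketched below. This uses SQLite rather than Hive or Impala, so it only illustrates the SQL itself. The comparison operator in the `WHERE` clause did not survive in the transcript; `>` is an assumption made for this example, and the `ORDER BY` clauses are added only to make the output order predictable.

```python
import sqlite3

# In-memory database holding the "people" table from the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (name TEXT, state TEXT, employer TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("Alex", "Maryland", "Cloudera", 2013),
        ("Joey", "Maryland", "Cloudera", 2011),
        ("Sean", "Texas", "Cloudera", 2013),
        ("Paris", "Maryland", "AOL", 2011),
    ],
)

# Every column of every row.
all_rows = conn.execute("SELECT * FROM people").fetchall()

# Only two of the four columns.
name_state = conn.execute(
    "SELECT name, state FROM people ORDER BY name"
).fetchall()

# Only rows matching a condition. ASSUMPTION: the operator is '>'.
recent = conn.execute(
    "SELECT name, state FROM people WHERE year > 2012 ORDER BY name"
).fetchall()
```

With `>` as the operator, `recent` holds the two people whose year is 2013; a different operator would select a different subset.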
  • 25. Two Simple Tables
      owner  species  name
      Alex   Cactus   Marvin
      Joey   Cat      Brain
      Sean   None
      Paris  Unknown

      name   state     employer  year
      Alex   Maryland  Cloudera  2013
      Joey   Maryland  Cloudera  2011
      Sean   Texas     Cloudera  2013
      Paris  Maryland  AOL       2011
  • 26. Joining Two Tables
      SELECT people.name AS owner, people.state AS state, pets.name AS pet
      FROM people LEFT JOIN pets ON people.name = pets.owner

      owner  species  name
      Alex   Cactus   Marvin
      Joey   Cat      Brain
      Sean   None
      Paris  Unknown

      name   state     employer  year
      Alex   Maryland  Cloudera  2013
      Joey   Maryland  Cloudera  2011
      Sean   Texas     Cloudera  2013
      Paris  Maryland  AOL       2011
  • 29. Joining Two Tables
      SELECT people.name AS owner, people.state AS state, pets.name AS pet
      FROM people LEFT JOIN pets ON people.name = pets.owner

      owner  state     pet
      Alex   Maryland  Marvin
      Joey   Maryland  Brain
      Sean   Texas
      Paris  Maryland
  • 30. Varying Implementation of JOIN
      SELECT people.name AS owner, people.state AS state, pets.name AS pet
      FROM people LEFT JOIN pets ON people.name = pets.owner

      owner  state     pet
      Alex   Maryland  Marvin
      Joey   Maryland  Brain
      Sean   Texas     ?
      Paris  Maryland  ?
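The LEFT JOIN from the slides can be reproduced with Python's built-in `sqlite3` module. This sketch runs on SQLite, not Impala or Hive, and it assumes the blank pet names for Sean and Paris are stored as NULL. The `ORDER BY` is added only to fix the output order. In SQLite a missing pet name comes back as NULL (Python `None`); other engines may present it differently, which is presumably what the "?" cells on the last slide point at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (name TEXT, state TEXT, employer TEXT, year INTEGER)"
)
conn.execute("CREATE TABLE pets (owner TEXT, species TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("Alex", "Maryland", "Cloudera", 2013),
        ("Joey", "Maryland", "Cloudera", 2011),
        ("Sean", "Texas", "Cloudera", 2013),
        ("Paris", "Maryland", "AOL", 2011),
    ],
)
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [
        ("Alex", "Cactus", "Marvin"),
        ("Joey", "Cat", "Brain"),
        ("Sean", "None", None),      # ASSUMPTION: blank pet name stored as NULL
        ("Paris", "Unknown", None),  # ASSUMPTION: blank pet name stored as NULL
    ],
)

# LEFT JOIN keeps every row of "people", whether or not a pet name exists.
joined = conn.execute(
    """
    SELECT people.name AS owner, people.state AS state, pets.name AS pet
    FROM people LEFT JOIN pets ON people.name = pets.owner
    ORDER BY people.name
    """
).fetchall()
```

Every person appears exactly once in `joined`, and the `pet` field is `None` for the two owners without a named pet.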
  • 31. Cloudera Impala
      Familiar interface, but more powerful.
  • 32. Cloudera Impala
      • Interactive query on Hadoop
      • think seconds, not minutes
      • Nearly ANSI-92 standard SQL
      • compatible with HiveQL
      • Native MPP query engine
      • built for low-latency queries
  • 33. Cloudera Impala – Design Choices
      • Native daemons, written in C/C++
      • No JVM, no MapReduce
      • Saturate disks on reads
      • Uses in-memory HDFS caching
      • Re-uses Hive metastore
      • Not as fault-tolerant as MapReduce
  • 34. Cloudera Impala – Architecture
      • Impala Daemon
      • runs on every node
      • handles client requests
      • handles query planning and execution
      • State Store Daemon
      • provides name service
      • metadata distribution
      • used for finding data
  • 35. Impala Query Execution
      [Diagram: SQL App / ODBC; Hive Metastore, HDFS NN, Statestore; three nodes, each with Query Planner, Query Coordinator, Query Executor, HDFS DN, HBase. Label: SQL request]
      1) Request arrives via ODBC/JDBC/HUE/Shell
  • 36. Impala Query Execution
      [Same diagram]
      2) Planner turns request into collections of plan fragments
      3) Coordinator initiates execution on impalad(s) local to data
  • 37. Impala Query Execution
      [Same diagram. Label: Query results]
      4) Intermediate results are streamed between impalad(s)
      5) Query results are streamed back to client
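The five steps above follow a scatter-gather pattern: split the request into fragments, run each fragment next to its data, then merge the partial results. The toy Python sketch below imitates that flow in a single process. It is not Impala's implementation: `plan`, `execute_fragment`, `coordinate`, the node names, and the placement of rows on nodes are all hypothetical, and the query text is never parsed (the sketch hard-codes a count-by-state).

```python
from collections import Counter

# Hypothetical placement of "people" rows (name, state) on three nodes.
node_data = {
    "node1": [("Alex", "Maryland"), ("Joey", "Maryland")],
    "node2": [("Sean", "Texas")],
    "node3": [("Paris", "Maryland")],
}


def plan(query):
    """Planner: one fragment per node that holds data (query is not parsed)."""
    return [{"node": node, "op": "count_by_state"} for node in node_data]


def execute_fragment(fragment):
    """Executor: scan only the rows local to this node, return a partial count."""
    rows = node_data[fragment["node"]]
    return Counter(state for _name, state in rows)


def coordinate(query):
    """Coordinator: run every fragment, merge the partials, answer the client."""
    total = Counter()
    for fragment in plan(query):
        total.update(execute_fragment(fragment))  # merge intermediate results
    return dict(total)


result = coordinate("SELECT state, COUNT(*) FROM people GROUP BY state")
```

Each fragment touches only its own node's rows, so the only data that has to move is the small partial count, which is the property the "local to data" step is after.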
  • 38. Cloudera Impala – Results
      • Allows for fast iteration/discovery
      • How much faster?
      • 3-4x faster on I/O bound workloads
      • up to 45x faster on multi-MR queries
      • up to 90x faster on in-memory cache
  • 39. Demo
      Hold onto something, folks.
  • 40. What’s Next?
      • Download Hadoop!
      • CDH available at www.cloudera.com
      • Already done that? Contribute…
      • Cloudera provides pre-loaded VMs
      • http://tiny.cloudera.com/quickstartvm
      • Clone our repos!
      • https://github.com/cloudera
  • 42. Questions?
      Preferably related to the talk… or not.
  • 43. Thank You!
      Alex Moundalexis, @technmsg
      We’re hiring, kids! Well, not kids.