Impala Architecture presentation


Impala Architecture Presentation at Toronto Hadoop User Group, in January 2014 by Mark Grover.

Event details:
http://www.meetup.com/TorontoHUG/events/150328602/


  1. Impala: A Modern, Open-Source SQL Engine for Hadoop
     Mark Grover, Software Engineer, Cloudera
     January 7th, 2014
     Twitter: mark_grover
     github.com/markgrover/impala-thug/
  2. Agenda
     •  What is Impala and what is Hadoop?
     •  Execution frameworks on Hadoop: MR, Hive, etc.
     •  Goals and user view of Impala
     •  Architecture of Impala
     •  Comparing Impala to other systems
     •  Impala Roadmap
  3. What is Impala?
     •  General-purpose SQL engine
     •  Real-time queries in Apache Hadoop
  4. What is Apache Hadoop?
     Apache Hadoop is an open source platform for data storage and processing that is distributed, fault tolerant, and scalable.
     Has the flexibility to store and mine any type of data:
     •  Ask questions across structured and unstructured data that were previously impossible to ask or solve
     •  Not bound by a single schema
     Excels at processing complex data:
     •  Scale-out architecture divides workloads across multiple nodes
     •  Flexible file system eliminates ETL bottlenecks
     Scales economically:
     •  Can be deployed on commodity hardware
     •  Open source platform guards against vendor lock-in
     Core Hadoop system components:
     •  Hadoop Distributed File System (HDFS): self-healing, high-bandwidth clustered storage
     •  MapReduce: distributed computing framework
  5. So, what's wrong with MapReduce?
     •  Batch oriented
     •  High latency
     •  Not all paradigms fit very well
     •  Only for developers
  6. What are Hive and Pig?
     •  MR is hard and only for developers
     •  Higher-level platforms for converting declarative syntax to MapReduce:
        •  SQL: Hive
        •  workflow language: Pig
     •  Built on top of MapReduce
  7. What is Impala?
     •  General-purpose SQL engine
     •  Real-time queries in Apache Hadoop
     •  Beta version released in October 2012
     •  General availability (v1.0) release out since April 2013
     •  Open source under the Apache license
     •  Latest release (v1.2.3) released on December 23rd
  8. Impala Overview: Goals
     •  General-purpose SQL query engine:
        •  works both for analytical and transactional/single-row workloads
        •  supports queries that take from milliseconds to hours
     •  Runs directly within Hadoop:
        •  reads widely used Hadoop file formats
        •  talks to widely used Hadoop storage managers
        •  runs on the same nodes that run Hadoop processes
     •  High performance:
        •  C++ instead of Java
        •  runtime code generation
        •  completely new execution engine, no MapReduce
  9. User View of Impala: Overview
     •  Runs as a distributed service in the cluster: one Impala daemon on each node with data
     •  Highly available: no single point of failure
     •  User submits query via ODBC/JDBC, the Impala CLI, or Hue to any of the daemons
     •  Query is distributed to all nodes with relevant data
     •  Impala uses Hive's metadata interface, connects to the Hive metastore
  10. User View of Impala: Overview
      •  There is no 'Impala format'!
      •  There is no 'Impala format'!!
      •  Supported file formats:
         •  uncompressed/lzo-compressed text files
         •  sequence files and RCFile with snappy/gzip compression
         •  Avro data files
         •  Parquet columnar format (more on that later)
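      For instance, exposing an existing delimited text directory to Impala is plain DDL with no format conversion. A minimal sketch; the table name, columns, and HDFS path here are hypothetical:

          -- Point Impala at comma-delimited files already in HDFS.
          CREATE EXTERNAL TABLE sales_raw (
            custid  BIGINT,
            revenue DOUBLE
          )
          ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
          STORED AS TEXTFILE
          LOCATION '/user/hive/warehouse/sales_raw';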
  11. User View of Impala: SQL
      •  SQL support:
         •  essentially SQL-92, minus correlated subqueries
         •  INSERT INTO ... SELECT ...
         •  only equi-joins; no non-equi joins, no cross products
         •  ORDER BY requires LIMIT
         •  (limited) DDL support
         •  SQL-style authorization via Apache Sentry (incubating)
         •  UDFs and UDAFs are supported
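      A few statements that fall inside this dialect, as a sketch; the table and column names (orders, customers, top_customers) are made up:

          -- Equi-join plus aggregation: supported.
          SELECT o.custid, COUNT(*) AS cnt
          FROM orders o JOIN customers c ON (o.custid = c.custid)
          GROUP BY o.custid
          ORDER BY cnt DESC LIMIT 100;   -- ORDER BY must carry a LIMIT

          -- Loading derived data: supported.
          INSERT INTO top_customers SELECT custid FROM orders GROUP BY custid;

          -- Not in this release: correlated subqueries, non-equi joins, cross products.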
  12. User View of Impala: SQL
      •  Functional limitations:
         •  no file formats, SerDes
         •  no beyond-SQL features (buckets, samples, transforms, arrays, structs, maps, xpath, json)
      •  Broadcast joins and partitioned hash joins supported
         •  the smaller table has to fit in the aggregate memory of all executing nodes
  13. User View of Impala: HBase
      •  Functionality highlights:
         •  support for SELECT, INSERT INTO ... SELECT ..., and INSERT INTO ... VALUES(...)
         •  predicates on rowkey columns are mapped into start/stop rows
         •  predicates on other columns are mapped into SingleColumnValueFilters
      •  But: the mapping of HBase tables to metastore tables is patterned after Hive:
         •  all data stored as scalars and in ASCII
         •  the rowkey needs to be mapped into a single string column
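      To illustrate, a range predicate on the column mapped to the rowkey becomes an HBase scan range. Hypothetical sketch; it assumes an hbase_events table whose event_id column was mapped to the rowkey via a Hive-style table definition:

          -- The rowkey predicate below turns into HBase start/stop rows,
          -- so only the matching slice of the table is scanned.
          SELECT event_id, payload
          FROM hbase_events
          WHERE event_id >= '2014-01-01' AND event_id < '2014-01-08';

          -- A predicate on a non-key column instead becomes a
          -- SingleColumnValueFilter, evaluated inside HBase.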
  14. User View of Impala: HBase
      •  Roadmap:
         •  full support for UPDATE and DELETE
         •  storage of structured data to minimize storage and access overhead
         •  composite row key encoding, mapped into an arbitrary number of table columns
  15. Impala Architecture
      •  Three binaries: impalad, statestored, catalogd
      •  Impala daemon (impalad), N instances:
         •  handles client requests and all internal requests related to query execution
      •  State store daemon (statestored), 1 instance:
         •  provides name service and metadata distribution
      •  Catalog daemon (catalogd), 1 instance:
         •  relays metadata changes to all impalad's
  16. Impala Architecture
      •  Query execution phases:
         •  request arrives via ODBC/JDBC
         •  planner turns request into collections of plan fragments
         •  coordinator initiates execution on remote impalad's
  17. Impala Architecture
      •  During execution:
         •  intermediate results are streamed between executors
         •  query results are streamed back to the client
         •  subject to limitations imposed by blocking operators (top-n, aggregation)
  18. Impala Architecture: Query Execution
      Request arrives via ODBC/JDBC.
      [Diagram: a SQL app sends the request over ODBC to the Query Planner/Coordinator of one impalad; every impalad hosts a Query Planner, Query Coordinator, and Query Executor colocated with an HDFS DataNode and HBase, alongside the Hive metastore, HDFS NameNode, and Statestore + Catalogd.]
  19. Impala Architecture: Query Execution
      Planner turns the request into collections of plan fragments; coordinator initiates execution on remote impalad's.
      [Diagram: the coordinating impalad fans the plan fragments out to the Query Executors on the other impalad's.]
  20. Impala Architecture: Query Execution
      Intermediate results are streamed between impalad's; query results are streamed back to the client.
      [Diagram: executors exchange intermediate results directly while the coordinator returns the query results to the SQL app.]
  21. Query Planning: Overview
      •  2-phase planning process:
         •  single-node plan: left-deep tree of plan operators
         •  plan partitioning: partition the single-node plan to maximize scan locality, minimize data movement
      •  Parallelization of operators:
         •  all query operators are fully distributed
  22. Query Planning: Single-Node Plan
      •  Plan operators: Scan, HashJoin, HashAggregation, Union, TopN, Exchange
  23. Single-Node Plan: Example Query
      SELECT t1.custid,
             SUM(t2.revenue) AS revenue
      FROM LargeHdfsTable t1
      JOIN LargeHdfsTable t2 ON (t1.id1 = t2.id)
      JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id)
      WHERE t3.category = 'Online'
      GROUP BY t1.custid
      ORDER BY revenue DESC LIMIT 10;
  24. Query Planning: Single-Node Plan
      •  Single-node plan for the example:
      [Diagram: left-deep operator tree, bottom-up: Scan t1 and Scan t2 feed a HashJoin; its output joins Scan t3 in a second HashJoin, followed by Agg and TopN.]
  25. Query Planning: Distributed Plans
      •  Goals:
         o  maximize scan locality, minimize data movement
         o  full distribution of all query operators (where semantically correct)
      •  Parallel joins:
         o  broadcast join: join is collocated with the left input; the right-hand side table is broadcast to each node executing the join -> preferred for small right-hand side input
         o  partitioned join: both tables are hash-partitioned on the join columns -> preferred for large joins
         o  cost-based decision based on column stats/estimated cost of data transfers
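      When the cost-based choice needs overriding, the square-bracket join hints documented for Impala 1.x pick the strategy explicitly. A sketch reusing the tables from the example query; treat the hint spelling as version-dependent:

          -- Force a partitioned join for the large-x-large join and a
          -- broadcast join for the small HBase table.
          SELECT t1.custid, SUM(t2.revenue) AS revenue
          FROM LargeHdfsTable t1
          JOIN [SHUFFLE] LargeHdfsTable t2 ON (t1.id1 = t2.id)
          JOIN [BROADCAST] SmallHbaseTable t3 ON (t1.id2 = t3.id)
          GROUP BY t1.custid;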
  26. Query Planning: Distributed Plans
      •  Parallel aggregation:
         o  pre-aggregation where data is first materialized
         o  merge aggregation partitioned by grouping columns
      •  Parallel top-N:
         o  initial top-N operation where data is first materialized
         o  final top-N in the single-node plan fragment
  27. Query Planning: Distributed Plans
      •  In the example:
         o  scans are local: each scan receives its own fragment
         o  1st join: large x large -> partitioned join
         o  2nd join: large x small -> broadcast join
         o  pre-aggregation in the fragment that materializes the join result
         o  merge aggregation after repartitioning on the grouping column
         o  initial top-N in the fragment that does the merge aggregation
         o  final top-N in the coordinator fragment
  28. Query Planning: Distributed Plans
      [Diagram: distributed version of the example plan. Scans of t1 and t2 run at the HDFS DataNodes and are hash-partitioned on t1.id1 and t2.id into the first HashJoin; t3 is scanned at the HBase RegionServer and broadcast into the second HashJoin; a Pre-Agg hash-partitions on t1.custid into a MergeAgg and initial TopN; the final TopN runs at the coordinator.]
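      You can inspect the fragment boundaries Impala actually chose by prefixing the query with EXPLAIN (a real Impala statement; the output layout varies by version, so none is reproduced here):

          -- Shows the distributed plan: scan fragments, exchange and
          -- broadcast edges, pre-/merge-aggregation, coordinator top-N.
          EXPLAIN
          SELECT t1.custid, SUM(t2.revenue) AS revenue
          FROM LargeHdfsTable t1
          JOIN LargeHdfsTable t2 ON (t1.id1 = t2.id)
          JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id)
          WHERE t3.category = 'Online'
          GROUP BY t1.custid
          ORDER BY revenue DESC LIMIT 10;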
  29. Metadata Handling
      •  Impala metadata:
         o  Hive's metastore: logical metadata (table definitions, columns, CREATE TABLE parameters)
         o  HDFS NameNode: directory contents and block replica locations
         o  HDFS DataNode: block replicas' volume ids
  30. Metadata Handling
      •  Caches metadata: no synchronous metastore API calls during query execution
      •  impalad instances read metadata from the metastore at startup
      •  Catalog Service relays metadata when you run DDL or update metadata on one of the impalad's
      •  REFRESH [<tbl>]: reloads metadata on all impalad's (if you added new files via Hive)
      •  INVALIDATE METADATA: reloads metadata for all tables
      •  Roadmap: HCatalog
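      In practice that looks like this from any impalad; both statements come straight from the slide above, and only the table name is hypothetical:

          -- Hive just wrote new files into an existing table's directory:
          REFRESH sales_raw;

          -- Tables were created or dropped entirely outside Impala:
          INVALIDATE METADATA;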
  31. Impala Execution Engine
      •  Written in C++ for minimal execution overhead
      •  Internal in-memory tuple format puts fixed-width data at fixed offsets
      •  Uses intrinsics/special CPU instructions for text parsing, crc32 computation, etc.
      •  Runtime code generation for "big loops"
  32. Impala Execution Engine
      •  More on runtime code generation:
         •  example of a "big loop": insert a batch of rows into a hash table
         •  known at query compile time: # of tuples in a batch, tuple layout, column types, etc.
         •  generated at compile time: unrolled loop that inlines all function calls, contains no dead code, minimizes branches
         •  code generated using LLVM
  33. Impala's Statestore
      •  Central system state repository:
         •  name service (membership)
         •  metadata
         •  roadmap: other scheduling-relevant or diagnostic state
      •  Soft-state:
         •  all data can be reconstructed from the rest of the system
         •  cluster continues to function when the statestore fails, but per-node state becomes increasingly stale
      •  Sends periodic heartbeats:
         •  pushes new data
         •  checks for liveness
  34. Statestore: Why not ZooKeeper?
      •  ZK is not a good pub-sub system:
         •  Watch API is awkward and requires a lot of client logic
         •  multiple round-trips required to get data for changes to a node's children
         •  push model is more natural for our use case
      •  Don't need all the guarantees ZK provides:
         •  serializability
         •  persistence
         •  prefer to avoid complexity where possible
      •  ZK is bad at the things we care about and good at the things we don't
  35. Comparing Impala to Dremel
      •  What is Dremel?
         •  columnar storage for data with nested structures
         •  distributed scalable aggregation on top of that
      •  Columnar storage in Hadoop: Parquet
         •  stores data in appropriate native/binary types
         •  can also store nested structures similar to Dremel's ColumnIO
      •  Distributed aggregation: Impala
      •  Impala plus Parquet: a superset of the published version of Dremel (which didn't support joins)
  36. More about Parquet
      •  What is it:
         •  container format for all popular serialization formats: Avro, Thrift, Protocol Buffers
         •  successor to Trevni
         •  jointly developed by Cloudera and Twitter
         •  open source; hosted on github
      •  Features:
         •  rowgroup format: file contains multiple horizontal slices
         •  supports storing each column in a separate file
         •  supports fully shredded nested data; repetition and definition levels similar to Dremel's ColumnIO
         •  column values stored in native types (bool, int<x>, float, double, byte array)
         •  support for index pages for fast lookup
         •  extensible value encodings
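      Converting an existing table to Parquet from Impala is a one-time copy. A sketch: sales_raw is the hypothetical text table from earlier, and STORED AS PARQUETFILE is the spelling used by Impala 1.x:

          -- Create an empty Parquet-format table with the same schema,
          -- then rewrite the rows into columnar form.
          CREATE TABLE sales_parquet LIKE sales_raw STORED AS PARQUETFILE;
          INSERT OVERWRITE sales_parquet SELECT * FROM sales_raw;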
  37. Comparing Impala to Hive
      •  Hive: MapReduce as an execution engine
         •  high latency, low throughput queries
         •  fault-tolerance model based on MapReduce's on-disk checkpointing; materializes all intermediate results
         •  Java runtime allows for easy late-binding of functionality: file formats and UDFs
         •  extensive layering imposes high runtime overhead
      •  Impala:
         •  direct, process-to-process data exchange
         •  no fault tolerance
         •  an execution engine designed for low runtime overhead
  38. Impala Roadmap: 2013
      •  Additional SQL:
         •  ORDER BY without LIMIT
         •  analytic window functions
         •  support for structured data types
      •  Improved HBase support:
         •  composite keys, complex types in columns, index nested-loop joins, INSERT/UPDATE/DELETE
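      For context, the analytic window functions on this roadmap use standard SQL syntax like the following; this was not available in the 1.2 release the talk describes and is shown purely as an illustration:

          -- Rank each customer's orders by revenue within the customer.
          SELECT custid, orderid,
                 ROW_NUMBER() OVER (PARTITION BY custid
                                    ORDER BY revenue DESC) AS rk
          FROM orders;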
  39. Impala Roadmap: 2013
      •  Runtime optimizations:
         •  straggler handling
         •  improved cache management
         •  data collocation for improved join performance
      •  Resource management:
         •  goal: run exploratory and production workloads in the same cluster, against the same data, without impacting production jobs
  40. Demo
      •  Uses Cloudera's Quickstart VM: http://tiny.cloudera.com/quick-start
  41. Try it out!
      •  Open source! Available at cloudera.com, AWS EMR!
      •  We have packages for: RHEL 5/6, SLES 11, Ubuntu Lucid, Maverick, Precise, Debian, etc.
      •  Questions/comments? community.cloudera.com
      •  My twitter handle: mark_grover
      •  Slides at: github.com/markgrover/impala-thug