Introduction to Cloudera Impala

Cloudera Impala provides a fast, ad hoc query capability to Apache Hadoop, complementing traditional MapReduce batch processing. Learn the design choices and architecture behind Impala, and how to use near-ubiquitous SQL to explore your own data at scale.

As presented to Charm City Linux on March 25, 2014.
http://www.meetup.com/CharmCityLinux/events/168288632/

Transcript of "Introduction to Cloudera Impala"

  1. Cloudera Impala
     Charm City Linux, March 2014
     Alex Moundalexis, @technmsg
  2. Thirty Seconds About Alex
     • Solutions Architect
     • aka consultant
     • government
     • infrastructure
     • former coder of Perl
     • former administrator
     • likes shiny objects
  3. What Does Cloudera Do?
     • product
     • distribution of Hadoop components, Apache licensed
     • enterprise tooling
     • support
     • training
     • services (aka consulting)
     • community
  4. Disclaimer
     • Cloudera builds software
       • most donated to Apache
       • some closed-source
     • Cloudera “products” I reference are open source
       • Apache Licensed
       • source code is on GitHub
         • https://github.com/cloudera
  5. What This Talk Isn’t About
     • deploying
       • Puppet, Chef, Ansible, homegrown scripts, intern labor
     • sizing & tuning
       • depends heavily on data and workload
     • coding
       • unless you count XML or CSV or SQL
     • algorithms
  6. The Apache Hadoop Ecosystem
     Quick and dirty, for context.
  7. Why “Ecosystem?”
     • In the beginning, just Hadoop
       • HDFS
       • MapReduce
     • Today, dozens of interrelated components
       • I/O
       • Processing
       • Specialty Applications
       • Configuration
       • Workflow
  8. HDFS
     • Distributed, highly fault-tolerant filesystem
     • Optimized for large streaming access to data
     • Based on Google File System
       • http://research.google.com/archive/gfs.html
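To make "distributed, highly fault-tolerant" a little more concrete, here is a minimal Python sketch of the core idea: a file is cut into fixed-size blocks and each block is copied to several machines, so losing a machine does not lose data. The 128 MB block size and replication factor of 3 are assumptions (common defaults, both configurable), the plan_blocks and readable_after_failure helpers are hypothetical, and the round-robin placement is a simplification. Real HDFS uses a rack-aware placement policy, and the NameNode re-replicates blocks after a failure.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed parameters: 128 MB is a common HDFS block size (dfs.blocksize) and 3 is
# the usual replication factor (dfs.replication); both are configurable.
BLOCK_SIZE = 128 * 1024 * 1024
REPLICATION = 3


@dataclass
class Block:
    index: int
    offset: int          # byte offset of this block within the file
    length: int          # bytes in this block (the last block may be short)
    replicas: list[str]  # nodes holding a copy of this block


def plan_blocks(file_size: int, nodes: list[str],
                block_size: int = BLOCK_SIZE,
                replication: int = REPLICATION) -> list[Block]:
    """Split a file into fixed-size blocks and give each block `replication`
    copies on distinct nodes. Placement is simple round-robin (hypothetical),
    NOT the rack-aware policy HDFS actually uses."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    blocks: list[Block] = []
    offset = 0
    while offset < file_size:
        index = len(blocks)
        length = min(block_size, file_size - offset)
        # Consecutive nodes starting at a rotating position are always distinct
        # because replication <= len(nodes).
        replicas = [nodes[(index + r) % len(nodes)] for r in range(replication)]
        blocks.append(Block(index, offset, length, replicas))
        offset += length
    return blocks


def readable_after_failure(blocks: list[Block], failed: set[str]) -> bool:
    """The file stays readable if every block still has at least one live replica."""
    return all(any(node not in failed for node in b.replicas) for b in blocks)
```

With a 300 MB file on four nodes this yields blocks of 128, 128 and 44 MB, each stored on three different nodes, so any two nodes can fail and every block is still readable.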
  9. Lots of Commodity Machines
     [Image: Yahoo! Hadoop cluster, OSCON ’07]
  10. MapReduce (MR)
      • Programming paradigm
      • Batch oriented, not real-time
      • Works well with distributed computing
      • Lots of Java, but other languages supported
      • Based on Google’s paper
        • http://research.google.com/archive/mapreduce.html
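On "other languages supported": one common route for non-Java code is Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines on stdin and write tab-separated key/value lines on stdout, and the framework sorts by key in between. The sketch below is a minimal word count in that style, under stated assumptions: the file name wordcount.py and the "map"/"reduce" command-line switch are hypothetical conventions for this example, not anything Hadoop requires.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every word; Streaming treats text before the first tab as the key."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"  # lowercasing is an arbitrary choice for this sketch


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each word. Assumes input is sorted by key, as the
    framework guarantees between map and reduce, so equal words are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical CLI: "map" or "reduce" selects the stage; both read stdin, write stdout.
    step = mapper if sys.argv[1] == "map" else reducer
    for out_line in step(sys.stdin):
        print(out_line)
```

Locally, cat input.txt | python wordcount.py map | sort | python wordcount.py reduce imitates the job, with sort standing in for the framework's shuffle and sort (input.txt is a placeholder). On a cluster, the same two commands would be passed to Hadoop Streaming's -mapper and -reducer options.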
  11. Under the Covers
  12. You specify map() and reduce() functions.
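Roughly what "you specify map() and reduce()" means: the framework owns grouping, sorting, and distribution, and the programmer supplies two functions. The single-process Python sketch below is a stand-in for the framework, not Hadoop's API; run_mapreduce, the max_map/max_reduce job, and its toy data are all hypothetical. Real Hadoop runs many map tasks in parallel across the cluster and partitions keys among several reducers.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, Iterator, List, Tuple

Pair = Tuple[Any, Any]


def run_mapreduce(records: Iterable[Any],
                  map_fn: Callable[[Any], Iterable[Pair]],
                  reduce_fn: Callable[[Any, List[Any]], Iterable[Pair]]) -> List[Pair]:
    """Single-process stand-in for the framework: the caller supplies only map_fn and reduce_fn."""
    # Map + shuffle: run map() on every record and group emitted values by key.
    groups: defaultdict = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # Sort + reduce: hand each key and all of its values to reduce(), in key order.
    output: List[Pair] = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output


# Hypothetical job: highest reading per year from "year,reading" text lines.
def max_map(line: str) -> Iterator[Pair]:
    year, reading = line.split(",")
    yield year, int(reading)


def max_reduce(year: str, readings: List[int]) -> Iterator[Pair]:
    yield year, max(readings)


if __name__ == "__main__":
    lines = ["2013,31", "2014,27", "2013,35", "2014,29"]
    print(run_mapreduce(lines, max_map, max_reduce))
```

Run as a script, it prints [('2013', 35), ('2014', 29)]. Passing different map and reduce functions gives a different job without touching the driver, which is the point of the paradigm.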