Introduction to Cloudera Impala

Cloudera Impala provides a fast, ad hoc query capability to Apache Hadoop, complementing traditional MapReduce batch processing. Learn the design choices and architecture behind Impala, and how to use near-ubiquitous SQL to explore your own data at scale.

As presented to Charm City Linux on March 25th 2014.
http://www.meetup.com/CharmCityLinux/events/168288632/

Published in: Technology

Transcript

  1. Cloudera Impala: Charm City Linux, March 2014. Alex Moundalexis, @technmsg
  2. Thirty Seconds About Alex • Solutions Architect • aka consultant • government • infrastructure • former coder of Perl • former administrator • likes shiny objects
  3. What Does Cloudera Do? • product • distribution of Hadoop components, Apache licensed • enterprise tooling • support • training • services (aka consulting) • community
  4. Disclaimer • Cloudera builds software • most donated to Apache • some closed-source • Cloudera “products” I reference are open source • Apache Licensed • source code is on GitHub • https://github.com/cloudera
  5. What This Talk Isn’t About • deploying • Puppet, Chef, Ansible, homegrown scripts, intern labor • sizing & tuning • depends heavily on data and workload • coding • unless you count XML or CSV or SQL • algorithms
  6. Quick and dirty, for context. The Apache Hadoop Ecosystem
  7. Why “Ecosystem?” • In the beginning, just Hadoop • HDFS • MapReduce • Today, dozens of interrelated components • I/O • Processing • Specialty Applications • Configuration • Workflow
  8. HDFS • Distributed, highly fault-tolerant filesystem • Optimized for large streaming access to data • Based on Google File System • http://research.google.com/archive/gfs.html
  9. Lots of Commodity Machines • Image: Yahoo! Hadoop cluster [OSCON ’07]
  10. MapReduce (MR) • Programming paradigm • Batch oriented, not realtime • Works well with distributed computing • Lots of Java, but other languages supported • Based on Google’s paper • http://research.google.com/archive/mapreduce.html
  11. Under the Covers
  12. You specify map() and reduce() functions.
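
To make the map()/reduce() idea from slides 10 to 12 concrete, here is a minimal single-process sketch of the paradigm, using the classic word-count example. It is not Hadoop's API: the names `map_fn`, `reduce_fn`, and `run_job` are hypothetical, and the in-memory grouping step only stands in for the shuffle that a real cluster performs across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values seen for one key into a total."""
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, group by key (a stand-in for the shuffle), then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    # Each key is reduced independently, which is what lets a real
    # framework spread the reduce work over many machines.
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    print(run_job(["Hadoop stores data", "Impala queries data"]))
```

The user supplies only the two small functions; everything in `run_job` (splitting input, grouping by key, scheduling) is the part a framework such as Hadoop MapReduce takes care of at scale.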
