Introduction to Cloudera Impala
Cloudera Impala provides a fast, ad hoc query capability to Apache Hadoop, complementing traditional MapReduce batch processing. Learn the design choices and architecture behind Impala, and how to use near-ubiquitous SQL to explore your own data at scale.

As presented to Charm City Linux on March 25th 2014.
http://www.meetup.com/CharmCityLinux/events/168288632/

Published in: Technology

Transcript

  • 1. Cloudera Impala. Charm City Linux, March 2014. Alex Moundalexis, alexm+ccl@clouderagovt.com, @technmsg
  • 2. Thirty Seconds About Alex • Solutions Architect • aka consultant • government • infrastructure • former coder of Perl • former administrator • likes shiny objects
  • 3. What Does Cloudera Do? • product • distribution of Hadoop components, Apache licensed • enterprise tooling • support • training • services (aka consulting) • community
  • 4. Disclaimer • Cloudera builds software • most donated to Apache • some closed-source • Cloudera “products” I reference are open source • Apache Licensed • source code is on GitHub • https://github.com/cloudera
  • 5. What This Talk Isn’t About • deploying • Puppet, Chef, Ansible, homegrown scripts, intern labor • sizing & tuning • depends heavily on data and workload • coding • unless you count XML or CSV or SQL • algorithms
  • 6. Quick and dirty, for context. The Apache Hadoop Ecosystem
  • 7. Why “Ecosystem?” • In the beginning, just Hadoop • HDFS • MapReduce • Today, dozens of interrelated components • I/O • Processing • Specialty Applications • Configuration • Workflow
  • 8. HDFS • Distributed, highly fault-tolerant filesystem • Optimized for large streaming access to data • Based on Google File System • http://research.google.com/archive/gfs.html
  • 9. Lots of Commodity Machines (Image: Yahoo! Hadoop cluster, OSCON ’07)
  • 10. MapReduce (MR) • Programming paradigm • Batch oriented, not realtime • Works well with distributed computing • Lots of Java, but other languages supported • Based on Google’s paper • http://research.google.com/archive/mapreduce.html
  • 11. Under the Covers
  • 12. You specify map() and reduce() functions.
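
To make the map() and reduce() idea concrete, here is a minimal word-count sketch in plain Python. It is not Hadoop code and not taken from the talk: the names `map_fn`, `reduce_fn`, and `run_job` are hypothetical, and the in-memory grouping step only imitates the shuffle that a real MapReduce framework performs across machines.

```python
from collections import defaultdict


def map_fn(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Reduce step: combine all values emitted for a single key."""
    return word, sum(counts)


def run_job(lines):
    """Single-process stand-in for the framework (an assumption for illustration).

    A real cluster would run map tasks in parallel, shuffle the pairs by key
    over the network, and then run reduce tasks; here it is all one loop.
    """
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)  # "shuffle": group values by key
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    print(run_job(["Hadoop stores data", "Impala queries data"]))
```

The point of the sketch is the division of labor: the user writes only the two small functions, and everything in `run_job` is what the framework would normally handle.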
