Cloudera Impala

Cloudera Impala provides a fast, ad hoc query capability to Apache Hadoop, complementing traditional MapReduce batch processing. Learn the design choices and architecture behind Impala, and how to use near-ubiquitous SQL to explore your own data at scale.

As presented to the Portland Big Data User Group on July 23, 2014.
http://www.meetup.com/Hadoop-Portland/events/194930422/

Transcript

  • 1. Cloudera Impala. Portland Big Data User Group, July 2014. Alex Moundalexis, @technmsg
  • 2. Thirty Seconds About Alex • Solutions Architect • aka consultant • government • infrastructure • former coder of Perl • former administrator • fan of Portland
  • 3. What Does Cloudera Do? • product • distribution of Hadoop components, Apache licensed • enterprise tooling • support • training • services (aka consulting) • community
  • 4. Disclaimer • Cloudera builds software • most donated to Apache • some closed-source • Cloudera “products” I reference are open source • Apache Licensed • source code is on GitHub • https://github.com/cloudera
  • 5. What This Talk Isn’t About • deploying • Puppet, Chef, Ansible, homegrown scripts, intern labor • sizing & tuning • depends heavily on data and workload • coding • unless you count XML or CSV or SQL • algorithms
  • 6. Image credit: Public Domain, IFCAR
  • 7. Image credit: CC BY-SA, Lilian De Cassai
  • 8. cloud·e·ra im·pal·a /kloudˈi(ə)rə imˈpalə/ noun: a modern, open source, MPP SQL query engine for Apache Hadoop. “Cloudera Impala provides fast, ad hoc SQL query capability for Apache Hadoop, complementing traditional MapReduce batch processing.”
  • 9. The Apache Hadoop Ecosystem • Quick and dirty, for context.
  • 10. Why “Ecosystem?” • In the beginning, just Hadoop • HDFS • MapReduce • Today, dozens of interrelated components • I/O • Processing • Specialty Applications • Configuration • Workflow
  • 11. HDFS • Distributed, highly fault-tolerant filesystem • Optimized for large streaming access to data • Based on Google File System • http://research.google.com/archive/gfs.html
  • 12. Lots of Commodity Machines • Image: Yahoo! Hadoop cluster [OSCON ’07]
  • 13. MapReduce (MR) • Programming paradigm • Batch oriented, not realtime • Works well with distributed computing • Lots of Java, but other languages supported • Based on Google’s paper • http://research.google.com/archive/mapreduce.html
  • 14. Under the Covers
  • 15. You specify map() and reduce() functions.
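
To make the map() and reduce() idea concrete, here is a minimal single-process sketch of a word count in Python. It is not from the talk and does not use the Hadoop API; the names `map_fn`, `reduce_fn`, and `run_job` are hypothetical, and the in-memory grouping step only stands in for the shuffle that a real cluster would perform across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values seen for one key into a total."""
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical driver: map every line, group by key, then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            # Stand-in for the shuffle: collect values under their key.
            groups[key].append(value)
    # Reduce each key independently, in sorted key order.
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


result = run_job(["impala queries hadoop", "hadoop stores data"])
print(result)
```

The user-supplied pieces are only `map_fn` and `reduce_fn`; everything in `run_job` corresponds to work the framework would do on the user's behalf.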
