An introduction to Cloudera Impala

An introduction to Cloudera Impala: what is it and
how does it work? How can it bring real-time
performance gains to Apache Hadoop?

Published in Technology, Business

Transcript

  • 1. Impala
      ● What is it?
      ● How does it work?
      ● Performance
      ● Formats
      ● Architecture
      www.semtech-solutions.co.nz info@semtech-solutions.co.nz
  • 2. Impala – What is it?
      ● Ad hoc, real-time query for Hadoop
      ● Open source
      ● Developed by Cloudera
      ● Based on Google's 2010 Dremel paper
      ● Direct data access via the Impala engine
      ● A future Parquet update for Hadoop will
          – Add columnar binary storage to Hadoop
          – Improve Impala performance
  • 3. Impala – How does it work?
      ● Direct data access
      ● Query planning / coordination on data nodes
      ● Node-based query engine
      ● Low latency
      ● Performance improvement
      ● Query data on HDFS or HBase
      ● Uses the same HiveQL syntax (SQL-like)
      ● Has the Hue GUI
      ● Allows table joins and aggregation
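The "table joins and aggregation" point can be illustrated with a small SQL-like query. This is only a sketch: Python's built-in sqlite3 module is used as a stand-in engine so the example runs anywhere, it is not Impala, and the table and column names (customers, orders, region, amount) are hypothetical. The statement uses only generic SELECT / JOIN / GROUP BY constructs; whether a given Impala release accepts it unchanged is not confirmed by the slides.

```python
import sqlite3

# Stand-in engine: sqlite3 is used only so the sketch is runnable.
# It is NOT Impala; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south'), (3, 'north');
    INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5);
""")

# A join plus an aggregation, written in plain SQL-like syntax.
QUERY = """
    SELECT c.region, COUNT(*) AS order_count, SUM(o.amount) AS total_amount
    FROM orders o
    JOIN customers c ON o.customer_id = c.id
    GROUP BY c.region
"""

# Sort in Python so the output order does not depend on the engine.
rows = sorted(conn.execute(QUERY).fetchall())
for region, order_count, total_amount in rows:
    print(region, order_count, total_amount)
```

Each output row holds a region, the number of orders joined to it, and the summed order amount.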
  • 4. Impala – Performance
      Impala delivers performance gains
      ● IO-bound queries (limited by hardware): at least 3 times faster
      ● Complex queries (multiple MapReduce stages): at least 7 times faster
      ● Cached queries: at least 20 times faster
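As a rough worked example of what these minimum factors imply, the sketch below converts a baseline runtime into the longest runtime that would still meet the quoted gain. The slide does not name the baseline system, so the 420-second figure and the dictionary keys are assumptions chosen purely for illustration.

```python
# Minimum speed-up factors quoted on the slide.
MIN_SPEEDUP = {
    "io_bound": 3,      # IO-bound queries
    "multi_stage": 7,   # complex queries with multiple MapReduce stages
    "cached": 20,       # cached queries
}


def max_expected_runtime(baseline_seconds: float, query_type: str) -> float:
    """Longest runtime consistent with the minimum speed-up for this query type."""
    return baseline_seconds / MIN_SPEEDUP[query_type]


baseline = 420.0  # hypothetical baseline runtime in seconds (assumption)
for kind in MIN_SPEEDUP:
    print(kind, max_expected_runtime(baseline, kind))
```

With the assumed 420-second baseline, the three bounds come out at 140, 60 and 21 seconds respectively.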
  • 5. Impala – Formats
      Supported formats
          – Text and Sequence Files, which can be compressed as
              ● Snappy
              ● GZIP
              ● BZIP
          – Future support for
              ● Avro
              ● RCFile
              ● LZO text file
              ● Parquet
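For readers who want to try the compressed text formats, a delimited text file can be written with gzip or bzip2 compression using only the Python standard library. This is a minimal sketch under two assumptions: that "BZIP" on the slide refers to bzip2, and that the sample records and file names are hypothetical. Snappy is omitted because it is not part of the standard library, and loading the files into HDFS or Impala is outside the scope of the sketch.

```python
import bz2
import gzip
import os
import tempfile

# Hypothetical comma-delimited records; not taken from the slides.
records = ["1,north,10.0", "2,south,7.5"]
payload = ("\n".join(records) + "\n").encode("utf-8")

out_dir = tempfile.mkdtemp()
gz_path = os.path.join(out_dir, "sample.txt.gz")
bz_path = os.path.join(out_dir, "sample.txt.bz2")

# Write the same text payload with each codec.
with gzip.open(gz_path, "wb") as f:
    f.write(payload)
with bz2.open(bz_path, "wb") as f:
    f.write(payload)

# Read both files back to confirm the round trip is lossless.
with gzip.open(gz_path, "rb") as f:
    gz_roundtrip = f.read()
with bz2.open(bz_path, "rb") as f:
    bz_roundtrip = f.read()

print(gz_roundtrip == payload, bz_roundtrip == payload)
```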
  • 6. Impala – Architecture
  • 7. Impala – Requirements
      What does Impala need to run?
          – CentOS 6.2
          – or RHEL (Red Hat Enterprise Linux)
          – CDH 4.1 (Cloudera Hadoop Distribution)
          – Cloudera Manager (advised)
  • 8. Contact Us
      ● Feel free to contact us at
          – www.semtech-solutions.co.nz
          – info@semtech-solutions.co.nz
      ● We offer IT project consultancy
      ● We are happy to hear about your problems
      ● You pay only for the hours you need to solve your problems