An introduction to Cloudera Impala

An introduction to Cloudera Impala: what is it, and
how does it work? How can it bring real-time
performance gains to Apache Hadoop?

Published in: Technology, Business

Transcript

  • 1. Impala
      ● What is it?
      ● How does it work?
      ● Performance
      ● Formats
      ● Architecture
      www.semtech-solutions.co.nz info@semtech-solutions.co.nz
  • 2. Impala – What is it?
      ● Ad hoc, real-time queries for Hadoop
      ● Open source
      ● Developed by Cloudera
      ● Based on Google's 2010 Dremel paper
      ● Direct data access via the Impala engine
      ● A future Hadoop Parquet update will
          – Add columnar binary storage to Hadoop
          – Improve Impala performance
  • 3. Impala – How does it work?
      ● Direct data access
      ● Query planning / coordination on data nodes
      ● Node-based query engine
      ● Low latency
      ● Performance improvement
      ● Query data on HDFS or HBase
      ● Uses the same HiveQL syntax (SQL-like)
      ● Has the Hue GUI
      ● Allows table joins and aggregation
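The join and aggregation support listed on slide 3 can be illustrated with a query of the kind a SQL-like engine accepts. This is only a sketch: it does not connect to Impala, and instead uses Python's built-in sqlite3 module as a stand-in engine so that it runs anywhere. The table names (customers, orders), their columns, and the sample rows are hypothetical. The statement itself uses only SELECT, JOIN ... ON, and GROUP BY, which are common to standard SQL and HiveQL.

```python
import sqlite3

# Hypothetical query: order count and total amount per customer region.
# Only JOIN and GROUP BY are used, the two features named on the slide.
QUERY = """
SELECT c.region, COUNT(*) AS order_count, SUM(o.amount) AS total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.region
"""


def run_demo():
    """Run QUERY against an in-memory SQLite database (a stand-in, not Impala)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and sample data, invented for this illustration.
    conn.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "north"), (2, "south")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (1, 5.0), (2, 7.5)],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    # GROUP BY does not guarantee row order, so sort in Python.
    return sorted(rows)


if __name__ == "__main__":
    for region, order_count, total_amount in run_demo():
        print(region, order_count, total_amount)
```

Against a real cluster the same statement would be submitted through a client such as the Hue GUI mentioned above; only the connection differs, not the query text.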
  • 4. Impala – Performance
      Impala delivers performance gains
      ● I/O-bound queries
          – Hardware limitations
          – At least 3 times faster
      ● Complex queries
          – Multiple MapReduce stages
          – At least 7 times faster
      ● Cached queries
          – At least 20 times faster
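To make the minimum gains on slide 4 concrete, the arithmetic can be sketched as follows. The three factors (3, 7, 20) come from the slide; the slide does not say what baseline they are measured against, and the 420-second baseline runtime used here is a hypothetical figure chosen only so the division is easy to follow. The function name and the query-class keys are likewise made up for this example.

```python
# Minimum speed-up factors quoted on the slide, keyed by query class.
# The key names are labels invented for this sketch.
MIN_SPEEDUP = {
    "io_bound": 3,      # I/O-bound queries
    "multi_stage": 7,   # complex queries with multiple MapReduce stages
    "cached": 20,       # cached queries
}


def max_expected_runtime(baseline_seconds, query_class):
    """Upper bound on runtime implied by a minimum speed-up factor.

    If a query takes baseline_seconds on the baseline system and the gain
    is at least N times, the new runtime is at most baseline_seconds / N.
    """
    return baseline_seconds / MIN_SPEEDUP[query_class]


if __name__ == "__main__":
    baseline = 420  # hypothetical baseline runtime in seconds
    for query_class in MIN_SPEEDUP:
        print(query_class, max_expected_runtime(baseline, query_class))
```

With the hypothetical 420-second baseline, the bounds work out to 140, 60, and 21 seconds respectively. These are upper bounds implied by the quoted minimums, not measurements.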
  • 5. Impala – Formats
      Supported formats
          – Text and Sequence Files, which can be compressed as
              ● Snappy
              ● GZIP
              ● BZIP2
          – Future support for
              ● Avro
              ● RCFile
              ● LZO text file
              ● Parquet
  • 6. Impala – Architecture
  • 7. Impala – Requirements
      What does Impala need to run?
          – CentOS 6.2 or RHEL (Red Hat Enterprise Linux)
          – CDH 4.1 (Cloudera Hadoop Distribution)
          – Cloudera Manager (advised)
  • 8. Contact Us
      ● Feel free to contact us at
          – www.semtech-solutions.co.nz
          – info@semtech-solutions.co.nz
      ● We offer IT project consultancy
      ● We are happy to hear about your problems
      ● You can pay for just the hours you need to solve your problems