An introduction to Cloudera Impala

An introduction to Cloudera Impala: what is it, and
how does it work? How can it bring real-time
performance gains to Apache Hadoop?

  1. Impala
     ● What is it?
     ● How does it work?
     ● Performance
     ● Formats
     ● Architecture
     www.semtech-solutions.co.nz info@semtech-solutions.co.nz
  2. Impala – What is it?
     ● Ad hoc, real-time queries on Hadoop
     ● Open source
     ● Developed by Cloudera
     ● Based on Google's 2010 Dremel paper
     ● Direct data access via the Impala engine
     ● Upcoming Parquet support in Hadoop will
       – Add columnar binary storage to Hadoop
       – Improve Impala performance
  3. Impala – How does it work?
     ● Direct data access
     ● Query planning and coordination on the data nodes
     ● Node-based query engine
     ● Low latency
     ● Performance improvement
     ● Queries data on HDFS or HBase
     ● Uses the same HiveQL (SQL-like) syntax
     ● Has the Hue GUI
     ● Supports table joins and aggregation
  4. Impala – Performance
     Impala delivers performance gains compared with MapReduce-based queries:
     ● I/O-bound queries (limited by hardware): at least 3 times faster
     ● Complex queries (multiple MapReduce stages): at least 7 times faster
     ● Cached queries: at least 20 times faster
  5. Impala – Formats
     Supported formats:
     – Text and SequenceFiles, which can be compressed with
       ● Snappy
       ● GZIP
       ● BZIP2
     – Future support for
       ● Avro
       ● RCFile
       ● LZO text files
       ● Parquet
  6. Impala – Architecture
  7. Impala – Requirements
     What does Impala need to run?
     – CentOS 6.2 or RHEL (Red Hat Enterprise Linux)
     – CDH 4.1 (Cloudera's Distribution including Apache Hadoop)
     – Cloudera Manager (recommended)
  8. Contact Us
     ● Feel free to contact us at
       – www.semtech-solutions.co.nz
       – info@semtech-solutions.co.nz
     ● We offer IT project consultancy
     ● We are happy to hear about your problems
     ● You pay only for the hours you need to solve your problems
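
The "What is it?" slide says Parquet will add columnar binary storage to Hadoop. As a rough illustration of why that helps a query engine like Impala, the toy Python sketch below (not the Parquet format itself) holds the same hypothetical records row by row and column by column, and counts how many values each layout must read to sum a single column. The record fields and the values-read counter are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, float]

# Hypothetical sample records: (user_id, country, amount)
ROWS: List[Row] = [
    (1, "NZ", 10.0),
    (2, "AU", 25.5),
    (3, "NZ", 4.5),
]
FIELDS = ("user_id", "country", "amount")


def to_columns(rows: List[Row]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column (a toy columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(FIELDS)}


def sum_amount_row_store(rows: List[Row]) -> Tuple[float, int]:
    """Row layout: every field of every row is read just to reach 'amount'."""
    total, values_read = 0.0, 0
    for row in rows:
        values_read += len(row)  # the whole row is read
        total += row[2]
    return total, values_read


def sum_amount_column_store(columns: Dict[str, list]) -> Tuple[float, int]:
    """Column layout: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


if __name__ == "__main__":
    print(sum_amount_row_store(ROWS))                 # (40.0, 9)
    print(sum_amount_column_store(to_columns(ROWS)))  # (40.0, 3)
```

With three records of three fields, the row layout reads nine values while the column layout reads three; on real tables with many columns, skipping the columns a query does not use is where columnar storage saves I/O.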
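
The "How does it work?" slide describes query planning and coordination on the data nodes, with support for aggregation. One common way to picture this is a scatter/gather aggregation: each node aggregates only its local data, and a coordinator merges the small partial results. The sketch below models that pattern in plain Python, with threads standing in for nodes; it is a simplified toy under stated assumptions, not Impala's actual execution engine, and the node names and data are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical data blocks, one list per data node: (country, amount) rows.
NODE_BLOCKS: Dict[str, List[Tuple[str, float]]] = {
    "datanode-1": [("NZ", 10.0), ("AU", 25.5)],
    "datanode-2": [("NZ", 4.5), ("US", 7.0)],
    "datanode-3": [("AU", 2.5)],
}


def local_aggregate(rows: List[Tuple[str, float]]) -> Counter:
    """Runs 'on a data node': partial SUM(amount) per country over local rows only."""
    partial: Counter = Counter()
    for country, amount in rows:
        partial[country] += amount
    return partial


def coordinate(blocks: Dict[str, List[Tuple[str, float]]]) -> Dict[str, float]:
    """Runs on the coordinator: send the work to every node, then merge the partials."""
    with ThreadPoolExecutor(max_workers=max(1, len(blocks))) as pool:
        partials = list(pool.map(local_aggregate, blocks.values()))
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values key by key
    return dict(merged)


if __name__ == "__main__":
    print(coordinate(NODE_BLOCKS))
```

Here `coordinate(NODE_BLOCKS)` returns `{'NZ': 14.5, 'AU': 28.0, 'US': 7.0}`, the same answer a single machine would give for a hypothetical `SELECT country, SUM(amount) FROM sales GROUP BY country`, while only the per-node partial sums travel to the coordinator.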
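
The "Performance" slide quotes minimum speed-up factors but no absolute timings. The small helper below turns those factors into an upper bound on runtime for a given baseline. It assumes the factors are relative to running the same query as MapReduce jobs (the slide's mention of MapReduce stages suggests this, but does not state it outright), and the 600-second baseline in the example is hypothetical.

```python
# Minimum speed-up factors quoted on the Performance slide, keyed by query type.
MIN_SPEEDUP = {
    "io_bound": 3,      # I/O-bound queries limited by hardware
    "multi_stage": 7,   # complex queries needing multiple MapReduce stages
    "cached": 20,       # queries served from cache
}


def max_expected_runtime(baseline_seconds: float, query_type: str) -> float:
    """Upper bound on runtime if the slide's 'at least N times faster' holds.

    baseline_seconds is the (hypothetical) runtime of the same query
    executed as MapReduce jobs.
    """
    if baseline_seconds < 0:
        raise ValueError("baseline_seconds must be non-negative")
    return baseline_seconds / MIN_SPEEDUP[query_type]


if __name__ == "__main__":
    for kind in MIN_SPEEDUP:
        print(kind, round(max_expected_runtime(600.0, kind), 1))
```

For a query that takes 600 seconds as MapReduce jobs, the slide's figures imply at most 200 seconds if it is I/O bound, roughly 86 seconds if it is a complex multi-stage query, and 30 seconds if it is served from cache.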
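
The "Formats" slide lists text files compressed with Snappy, GZIP or BZIP2. The sketch below shows the general idea of reading such files transparently: choose a decompressor from the file extension, then parse the delimited text. It uses only Python's standard `gzip` and `bz2` modules; Snappy has no standard-library codec, so it is left out, and the extension-to-codec table and helper names are assumptions for illustration rather than Impala's or Hadoop's API.

```python
import bz2
import gzip
import os
import tempfile
from typing import Callable, Dict, List

# Hypothetical extension-to-opener table; Snappy is omitted (no stdlib codec).
# Any other extension is treated as uncompressed text.
OPENERS: Dict[str, Callable] = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
}


def open_text(path: str, mode: str = "rt"):
    """Pick a (de)compressor from the file extension and open in text mode."""
    ext = os.path.splitext(path)[1]
    opener = OPENERS.get(ext, open)
    return opener(path, mode, encoding="utf-8")


def read_rows(path: str, sep: str = ",") -> List[List[str]]:
    """Read a delimited text 'table' file, compressed or not, into rows of fields."""
    with open_text(path) as f:
        return [line.rstrip("\n").split(sep) for line in f if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for ext in (".txt", ".gz", ".bz2"):
            path = os.path.join(tmp, "sales" + ext)
            with open_text(path, "wt") as f:
                f.write("NZ,10.0\nAU,25.5\n")
            print(ext, read_rows(path))
```

Running the script writes the same two rows as plain, gzip and bzip2 files and reads each one back as `[['NZ', '10.0'], ['AU', '25.5']]`, so the caller never needs to know which compression was used.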
