Getting the most out of Impala - Best practices for infrastructure optimization

We started testing Impala to understand which hardware setup would give it the best performance for the price. We were not interested in how it performs in extreme cases, but in the regular situations most users encounter. Our aim was a quick, practical guide to choosing the infrastructure to run Impala on.

With this in mind, we looked at a medium-sized deployment of 4 Full Metal Compute Instances and scaled the hardware from a single CPU with low RAM capacity to dual CPUs with high RAM capacity.
We didn’t set out to prove any particular assumption. The purpose of the project was to explore and understand how Impala works with the hardware, and our findings were quite surprising.

This presentation was prepared for the London Enterprise Technology Meetup by Alex Bordei - Product Manager at Bigstep, together with Graham Gear - EMEA Director of Systems Engineering at Cloudera.

Stay informed: http://blog.bigstep.com/

Published in: Technology

  1. Who is Cloudera?
  2. The Leading Open Source Distribution of Apache Hadoop. Powerful Suite of System & Data Management Software. Built for the Enterprise. Founded: 2008. Employees: 500+. Customers: over 50% of the Fortune 50 and 65% of the Fortune 500, plus top US intelligence and defense agencies. Partners: 700+ in hardware, software, and services. Education: 15,000+ trained (developers, admins, analysts, data scientists). Community: founders and top supporters of the Hadoop open source ecosystem.
  3. What is Hadoop and how has it evolved? [Diagram labels: Batch Processing, Analytic SQL, Search Engine, Machine Learning, Stream Processing, 3rd Party Apps, Workload Management, Storage for Any Type of Data, Data Management, System Management, Enterprise Data Hub, Filesystem, Online NoSQL, Open Source, Scalable, Flexible, Cost Effective, Managed, Open Architecture, Secure and Governed]