6. Test setup
Performance test on Hive / Impala / Tajo
H/W:
CPU: 24 cores (Xeon 2.5 GHz, HT)
Memory: 64 GB
Disks: 3 TB x 6 (NL-SAS, 7200 RPM)
Network: 10G
Size: 1 master + 6 data nodes
Versions:
Hadoop: cdh4.3.0
Hive: 0.10.0-cdh4.3.0
Impala: impalad_version_1.1.1_RELEASE
Tajo: 0.2-SNAPSHOT
Data size: 1.7 TB (4.1B rows) for Q1; 8 GB or less (the results of Q1) for the remaining queries
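To put the Q1 input in perspective, the short sketch below works out the average row size and the share of data per data node from the figures on this slide (1.7 TB, 4.1B rows, 6 data nodes). It assumes decimal units, an even spread of data across nodes, and no HDFS replication; none of these assumptions is stated on the slide.

```python
# Figures taken from the test setup slide.
DATA_SIZE_TB = 1.7   # Q1 input size
ROW_COUNT = 4.1e9    # 4.1B rows
DATA_NODES = 6       # 1 master + 6 data nodes

# Assumption: decimal units (1 TB = 10**12 bytes, 1 GB = 10**9 bytes).
BYTES_PER_TB = 10**12
BYTES_PER_GB = 10**9

total_bytes = DATA_SIZE_TB * BYTES_PER_TB

# Average size of one input row.
avg_row_bytes = total_bytes / ROW_COUNT

# Assumption: data is spread evenly and replication is ignored.
per_node_gb = total_bytes / DATA_NODES / BYTES_PER_GB

print(f"average row size: {avg_row_bytes:.0f} bytes")
print(f"Q1 input per data node: {per_node_gb:.0f} GB")
```

Under these assumptions each row averages roughly 415 bytes, and each data node holds about 283 GB of the Q1 input.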
8. Test setup: Queries
Q1: scan using about 20 text pattern matching filters
Q2: 7 unions with joins
Q3: join
Q4: group by and order by
Q5: 30 text pattern matching filters with OR conditions, group by, having, and order by
12. Results: Q2 – unions, joins
[Bar chart: processing time (sec.) for Impala and Tajo; axis from 0 to 70; bar values 63.64 and 38.64]
NB:
*Tajo materializes all query results to HDFS, as is the main goal
*Unions are processed in sequence in Tajo now (parallel processing is coming soon)
20. Results: Wrap up
The project is underway; more findings expected in the future
Performance enhancement thanks to dynamic task scheduling: some results showed better performance than Impala, even though Tajo materializes every query result to HDFS and is still an early-stage build.