Impala: A Modern, Open-Source SQL Engine for Hadoop

[Figure: Impala architecture. A SQL app connects over ODBC and sends a SQL request to one of three Impala daemons. Each daemon contains a Query Planner, a Query Coordinator, and a Query Executor, and sits on top of an HDFS DataNode and HBase. Shared services: Hive Metastore, HDFS NameNode, Statestore & Catalog.]

[Figure: the same architecture diagram, annotated with the planning and dispatch steps.]
Planner turns request into collections of plan fragments
Coordinator initiates execution on remote nodes
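
The planner and coordinator steps can be pictured with a toy model. This is only a sketch under stated assumptions: `PlanFragment`, `PlanQuery`, and `AssignFragments` are hypothetical names, the fixed three-stage plan and the round-robin placement are invented for illustration, and none of it reflects Impala's real planner or scheduling logic.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one unit of work produced by the planner.
struct PlanFragment {
  int id;
  std::string op;  // e.g. "scan" or "merge"
};

// Toy planner: ignores the SQL text and always emits the same
// three-stage plan. A real planner derives fragments from the query.
std::vector<PlanFragment> PlanQuery(const std::string& sql) {
  assert(!sql.empty());
  return {{0, "scan"}, {1, "partial-aggregate"}, {2, "merge"}};
}

// Toy coordinator: hands fragments to nodes round-robin and returns
// the fragment ids assigned to each node name.
std::map<std::string, std::vector<int>> AssignFragments(
    const std::vector<PlanFragment>& fragments,
    const std::vector<std::string>& nodes) {
  assert(!nodes.empty());
  std::map<std::string, std::vector<int>> assignment;
  for (std::size_t i = 0; i < fragments.size(); ++i) {
    assignment[nodes[i % nodes.size()]].push_back(fragments[i].id);
  }
  return assignment;
}
```

Calling `AssignFragments(PlanQuery("SELECT 1"), {"node-a", "node-b"})` places fragments 0 and 2 on `node-a` and fragment 1 on `node-b`.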

Intermediate results are streamed between nodes
Operation permitted, query results are streamed back to client
[Figure: the same architecture diagram, with query results flowing from the coordinating daemon back to the SQL app over ODBC.]

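// Generic version: loops over the slots and switches on each slot's type.
void MaterializeTuple(char* tuple) {
  for (int i = 0; i < num_slots_; ++i) {
    char* slot = tuple + offsets_[i];
    switch (types_[i]) {
      case BOOLEAN:
        *slot = ParseBoolean();
        break;
      case INT:
        *slot = ParseInt();
        break;
      case FLOAT: …
      case STRING: …
      // etc.
    }
  }
}

// Same function specialized for one tuple layout (INT, BOOLEAN, INT):
// the loop is unrolled and the types and offsets are constants.
void MaterializeTuple(char* tuple) {
  // i = 0
  *(tuple + 0) = ParseInt();
  // i = 1
  *(tuple + 4) = ParseBoolean();
  // i = 2
  *(tuple + 5) = ParseInt();
}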
Hot code path, called per row
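
The contrast on this slide can be made concrete with a small compilable sketch. Impala produces the specialized form at runtime through LLVM code generation; here both versions are written by hand, and the `Parser` type, the `SlotType` enum, and the two function names are assumptions introduced for illustration only. The layout (a 4-byte INT at offset 0, a 1-byte BOOLEAN at offset 4, a 4-byte INT at offset 5) follows the offsets shown on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

enum class SlotType { kBoolean, kInt };

// Hypothetical input source: hands out pre-decoded values one at a time.
struct Parser {
  const std::int32_t* values;
  std::size_t pos;
  explicit Parser(const std::int32_t* v) : values(v), pos(0) {}
  std::int32_t ParseInt() { return values[pos++]; }
  bool ParseBoolean() { return values[pos++] != 0; }
};

// Generic version: a branch on the slot type for every slot of every row.
void MaterializeTupleInterpreted(Parser& in,
                                 const std::vector<SlotType>& types,
                                 const std::vector<int>& offsets,
                                 char* tuple) {
  assert(types.size() == offsets.size());
  for (std::size_t i = 0; i < types.size(); ++i) {
    char* slot = tuple + offsets[i];
    switch (types[i]) {
      case SlotType::kBoolean: {
        const char b = in.ParseBoolean() ? 1 : 0;
        std::memcpy(slot, &b, sizeof(b));
        break;
      }
      case SlotType::kInt: {
        const std::int32_t v = in.ParseInt();
        std::memcpy(slot, &v, sizeof(v));
        break;
      }
    }
  }
}

// Specialized version for (INT, BOOLEAN, INT): no loop, no switch,
// offsets are compile-time constants.
void MaterializeTupleSpecialized(Parser& in, char* tuple) {
  const std::int32_t a = in.ParseInt();
  std::memcpy(tuple + 0, &a, sizeof(a));
  const char b = in.ParseBoolean() ? 1 : 0;
  std::memcpy(tuple + 4, &b, sizeof(b));
  const std::int32_t c = in.ParseInt();
  std::memcpy(tuple + 5, &c, sizeof(c));
}
```

Both functions write the same 9 bytes for the same input. `std::memcpy` is used for the stores because the second INT sits at an unaligned offset.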

[Figure: inside an Impala daemon. Three query fragments issue reads through a shared IO Manager, which runs disk threads 0 to 4 over five disks.]

Container format for all popular serialization formats: Avro, Thrift, Protocol Buffers

From Twitter’s “Dremel Made Simple” blog
The most efficient IO is the one that never happens at all
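
One common way a columnar reader avoids IO is to consult per-chunk min/max statistics before reading any data. The following is a minimal sketch of that idea, assuming each row group stores the minimum and maximum of one integer column. `RowGroupStats` and `RowGroupsToRead` are hypothetical names, and this is not a description of what Impala's scanner actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-row-group statistics for a single integer column.
struct RowGroupStats {
  std::int64_t min_value;
  std::int64_t max_value;
};

// Returns the indices of row groups that might contain `target`.
// Every other row group can be skipped without reading it from disk.
std::vector<std::size_t> RowGroupsToRead(
    const std::vector<RowGroupStats>& stats, std::int64_t target) {
  std::vector<std::size_t> to_read;
  for (std::size_t i = 0; i < stats.size(); ++i) {
    assert(stats[i].min_value <= stats[i].max_value);
    if (target >= stats[i].min_value && target <= stats[i].max_value) {
      to_read.push_back(i);
    }
  }
  return to_read;
}
```

With row groups covering the ranges [0, 9], [10, 19], and [20, 29], a filter on the value 15 selects only the second row group, so the other two are never read.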

OVER PARTITION, RANK, LEAD, LAG, NTILE, ..
VARCHAR, CHAR

ROLLUP, CUBE, GROUPING SETS
Set operations: MINUS, INTERSECT

SELECT question FROM audience WHERE has_question = true;
