Dremel is a system that can run aggregate queries over trillion-row, read-only tables in seconds. It combines a columnar storage format for nested data with a multi-level serving tree that distributes queries across a cluster. Experiments showed that Dremel processed queries orders of magnitude faster than traditional MapReduce approaches on large datasets, and that it scaled with growing data volumes and query loads. Dremel's success led Google to develop BigQuery for interactive analysis of petabytes of data, and it influenced the open-source Apache Drill project.
WASHINGTON STATE UNIVERSITY
EXISTING TOOLS !!
❖ Hive
➢ allows users to read, write, and manage petabytes of
data using SQL-style syntax
❖ Pig
➢ platform for processing large datasets with scripts
written in the Pig Latin dataflow language
❖ MapReduce
➢ programming model for batch analysis of massive
datasets using user-written map and reduce functions
➢ works on record-oriented data
DREMEL
❖ System that can run aggregate queries over trillion-row,
read-only tables in seconds
❖ Scalable, Fault Tolerant & Fast
❖ Supports a nested data model
➢ Columnar storage of nested (non-relational) data
➢ Tree-like architecture similar to web search
❖ Accesses data in situ (e.g., GFS, Bigtable)
❖ Often used in conjunction with MapReduce
❖ Used by Google since 2006
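To illustrate why columnar storage of nested data helps aggregate queries, here is a minimal Python sketch. It assumes records are plain dicts and lists, and the helper name `stripe` and the sample field names are hypothetical. It is a simplification and not Dremel's actual format: the Dremel paper also stores repetition and definition levels so that records can be reassembled, which this sketch omits.

```python
from collections import defaultdict


def stripe(records):
    """Split nested records into one value list per dotted field path.

    Hypothetical helper for illustration only; record boundaries are lost.
    """
    columns = defaultdict(list)

    def walk(value, path):
        if isinstance(value, dict):
            # Descend into each named child field.
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # Repeated field: every element contributes to the same column.
            for child in value:
                walk(child, path)
        else:
            columns[path].append(value)

    for record in records:
        walk(record, "")
    return dict(columns)


# Made-up sample records with a repeated, nested "name" field.
records = [
    {"doc_id": 10, "name": [{"url": "http://A", "lang": ["en-us", "en"]}]},
    {"doc_id": 20, "name": [{"url": "http://C", "lang": []}]},
]

columns = stripe(records)

# An aggregate over one field only touches that field's column.
doc_id_sum = sum(columns["doc_id"])
```

The point of the layout is that `sum(columns["doc_id"])` never reads the `name.url` or `name.lang` values, whereas a record-oriented scan would have to read every record in full.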
Dremel : Speed
QUERY EXECUTION
❖ Dremel uses a multi-level serving tree to
execute queries
❖ A root server receives incoming queries,
and routes the queries to the next level in
the serving tree.
❖ The leaf servers communicate with the
storage layer or access the data on local
disk.
Fig: System architecture and execution inside a server node
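The serving tree described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the tree is a nested dict, each leaf holds its tablet as an in-memory list, and the query is a simple filtered count. The names `serve`, `leaf_count`, and the tree layout are hypothetical, not Dremel's interfaces.

```python
def leaf_count(tablet, predicate):
    """Leaf server: scan the local tablet and return a partial count."""
    return sum(1 for row in tablet if predicate(row))


def serve(node, predicate):
    """Route the query down the tree and combine partial results on the way up."""
    if "tablet" in node:
        return leaf_count(node["tablet"], predicate)
    # Root or intermediate server: forward to children, then aggregate.
    return sum(serve(child, predicate) for child in node["children"])


# Root -> two intermediate servers -> three leaf servers (made-up data).
tree = {
    "children": [
        {"children": [{"tablet": [1, 5, 8]}, {"tablet": [2, 9]}]},
        {"children": [{"tablet": [7, 3]}]},
    ]
}

# Roughly: SELECT COUNT(*) FROM T WHERE value > 4
total = serve(tree, lambda value: value > 4)
```

Because each level only returns a partial aggregate, the root combines a handful of numbers instead of reading the rows themselves.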
QUERY DISPATCHER
❖ In Dremel, several queries are executed simultaneously
❖ Tablets, horizontal partitions of the table, are usually three-way replicated
❖ Fault tolerant: if one replica fails or is unavailable, the work is
rescheduled on another server
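The replica fallback can be sketched as follows. This is a minimal illustration, assuming a caller-supplied `fetch` function that raises `ConnectionError` when a server is unavailable. The names `read_tablet`, `fake_fetch`, and the server labels are hypothetical and do not reflect the real dispatcher.

```python
def read_tablet(tablet_id, replicas, fetch):
    """Try each replica in turn and fall through to the next on failure."""
    errors = []
    for server in replicas:
        try:
            return fetch(server, tablet_id)
        except ConnectionError as exc:
            # Remember the failure and reschedule on the next replica.
            errors.append((server, str(exc)))
    raise RuntimeError(f"all replicas failed for {tablet_id}: {errors}")


def fake_fetch(server, tablet_id):
    """Stand-in for a storage read; 'server-a' is simulated as down."""
    if server == "server-a":
        raise ConnectionError("unavailable")
    return f"{tablet_id}@{server}"


# Three-way replicated tablet: the first replica fails, the second answers.
result = read_tablet("tablet-7", ["server-a", "server-b", "server-c"], fake_fetch)
```

With three replicas per tablet, a single unavailable server only costs one retry instead of failing the whole query.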
DREMEL LED TO !!
❖ Google BigQuery
❖ Apache Drill
➢ Open-source project inspired by Dremel
Fig: System architecture of BigQuery