SlideShare a Scribd company logo
1 of 10
Download to read offline
Tenzing
      A SQL Implementation On The MapReduce Framework

                  Biswapesh                               Liang Lin                        Weiran Liu                Sagar Mittal
                 Chattopadhyay                              lianglin@                         wrliu@                  sagarmittal@
                     biswapesh@
                    Prathyusha                       Vera Lychagina                    Younghee Kwon                Michael Wong
                     Aragonda                            vlychagina@                       youngheek@                  mcwong@
                    prathyusha@
                                                                              Google
                                                                         *@google.com

ABSTRACT                                                                               an alternative to SQL for increasingly sophisticated analyt-
Tenzing is a query engine built on top of MapReduce [9]                                ics. Such vendors include AsterData, GreenPlum, Paraccel,
for ad hoc analysis of Google data. Tenzing supports a                                 and Vertica.
mostly complete SQL implementation (with several exten-                                   In this paper, we describe Tenzing, a SQL query execution
sions) combined with several key characteristics such as het-                          engine built on top of MapReduce. We have been able to
erogeneity, high performance, scalability, reliability, meta-                          build a system with latency as low as ten seconds, high ef-
data awareness, low latency, support for columnar storage                              ficiency, and a comprehensive SQL92 implementation with
and structured data, and easy extensibility. Tenzing is cur-                           some SQL99 extensions. Tenzing also supports efficiently
rently used internally at Google by 1000+ employees and                                querying data in row stores, column stores, Bigtable [7],
serves 10000+ queries per day over 1.5 petabytes of com-                               GFS [14], text and protocol buffers. Users have access to
pressed data. In this paper, we describe the architecture                              the underlying platform through SQL extensions such as
and implementation of Tenzing, and present benchmarks of                               user defined table valued functions and native support for
typical analytical queries.                                                            nested relational data.
                                                                                          We take advantage of indexes and other traditional opti-
                                                                                       mization techniques—along with a few new ones—to achieve
1. INTRODUCTION                                                                        performance comparable to commercial parallel databases.
   The MapReduce [9] framework has gained wide popularity                              Thanks to MapReduce, Tenzing scales to thousands of cores
both inside and outside Google. Inside Google, MapReduce                               and petabytes of data on cheap, unreliable hardware. We
has quickly become the framework of choice for writing large                           worked closely with the MapReduce team to implement and
scalable distributed data processing jobs. Outside Google,                             take advantage of MapReduce optimizations. Our enhance-
projects such as Apache Hadoop have been gaining popular-                              ments to the MapReduce framework are described in more
ity rapidly. However, the framework is inaccessible to casual                          detail in section 5.1.
business users due to the need to understand distributed                                  Tenzing has been widely adopted inside Google, especially
data processing and C++ or Java.                                                       by the non-engineering community (Sales, Finance, Market-
   Attempts have been made to create simpler interfaces on                             ing), with more than a thousand users and ten thousand
top of MapReduce, both inside Google (Sawzall [19], Flume-                             analytic queries per day. The Tenzing service currently runs
Java [6]) and outside (PIG [18], HIVE [21], HadoopDB [1]).                             on two data centers with two thousand cores each and serves
To the best of our knowledge, such implementations suffer                               queries on over 1.5 petabytes of compressed data in several
from latency on the order of minutes, low efficiency, or poor                            different data sources and formats. Multiple Tenzing tables
SQL compatibility. Part of this inefficiency is a result of                              have over a trillion rows each.
MapReduce’s perceived inability to exploit familiar database
optimization and execution techniques [10, 12, 20]. At the
same time, distributed DBMS vendors have integrated the
MapReduce execution model in their engines [13] to provide                             2. HISTORY AND MOTIVATION
                                                                                          At the end of 2008, the data warehouse for Google Ads
                                                                                       data was implemented on top of a proprietary third-party
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are              database appliance henceforth referred to as DBMS-X. While
not made or distributed for profit or commercial advantage and that copies              it was working reasonably well, we faced the following issues:
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee. Articles from this volume were invited to present                 1. Increased cost of scalability: our need was to scale to
their results at The 37th International Conference on Very Large Data Bases,                  petabytes of data, but the cost of doing so on DBMS-X
August 29th - September 3rd 2011, Seattle, Washington.                                        was deemed unacceptably high.
Proceedings of the VLDB Endowment, Vol. 4, No. 12
Copyright 2011 VLDB Endowment 2150-8097/11/08... $ 10.00.




                                                                                    1318
WebUI, API, CLI                                            constantly. This allows us to significantly decrease the end-
          data & control
                                                                   to-end latency of queries. The pool consists of master and
        Query Server        Metadata Server                        worker nodes, plus an overall gatekeeper called the master
                                                                   watcher. The workers manipulate the data for all the tables
                            Master Watcher                         defined in the metadata layer. Since Tenzing is a heteroge-
                            Masters        Workers                 neous system, the backend storage can be a mix of various
                                                                   data stores, such as ColumnIO [17], Bigtable [7], GFS [14]
        Storage                                                    files, MySQL databases, etc.
                                                                      The query server serves as the gateway between the
                                                                   client and the pool. The query server parses the query, ap-
                                                                   plies optimizations and sends the plan to the master for
            Figure 1: Tenzing Architecture.
                                                                   execution. The Tenzing optimizer applies some basic rule
                                                                   and cost based optimizations to create an optimal execution
    2. Rapidly increasing loading times: importing new data        plan.
       took hours each day and adding new data sources took           Tenzing has several client interfaces, including a com-
       proportionally longer. Further, import jobs competed        mand line client (CLI) and a Web UI. The CLI provides
       with user queries for resources, leading to poor query      more power such as complex scripting and is used mostly by
       performance during the import process.                      power users. The Web UI, with easier-to-use features such
                                                                   as query & table browsers and syntax highlighting, is geared
    3. Analyst creativity was being stifled by the limitations      towards novice and intermediate users. There is also an API
       of SQL and lack of access to multiple sources of data.      to directly execute queries on the pool, and a standalone bi-
       An increasing number of analysts were being forced          nary which does not need any server side components, but
       to write custom code for more complex analysis, often       rather launches its own MapReduce jobs.
       directly against the source (such as Sawzall against           The metadata server provides an API to store and fetch
       logs).                                                      metadata such as table names and schemas, and pointers to
                                                                   the underlying data. The metadata server is also responsi-
  We decided to do a major re-design of the existing plat-         ble for storing ACLs (Access Control Lists) and other secu-
form by moving all our analysis to use Google infrastructure,      rity related information about the tables. The server uses
and specifically to the MapReduce platform. The new plat-           Bigtable as the persistent backing store.
form had to:
    1. Scale to thousands of cores, hundreds of users and
                                                                   3.1 Life Of A Query
       petabytes of data.                                              A typical Tenzing query goes through the following steps:

    2. Run on unreliable off-the-shelf hardware, while contin-          1. A user (or another process) submits the query to the
       uing to be highly reliable.                                        query server through the Web UI, CLI or API.

    3. Match or exceed the performance of the existing plat-           2. The query server parses the query into an intermediate
       form.                                                              parse tree.

    4. Have the ability to run directly off the data stored on          3. The query server fetches the required metadata from
       Google systems, to minimize expensive ETL processes.               the metadata server to create a more complete inter-
                                                                          mediate format.
    5. Provide all the required SQL features to the analysts
       to minimize the learning curve, while also support-             4. The optimizer goes through the intermediate format
       ing more advanced functionality such as complex user-              and applies various optimizations.
       defined functions, prediction and mining.
                                                                       5. The optimized execution plan consists of one or more
  We have been largely successful in this effort with Tenz-                MapReduces. For each MapReduce, the query server
ing. With about 18 months of development, we were able to                 finds an available master using the master watcher and
successfully migrate all users off DBMS-X to the new plat-                 submits the query to it. At this stage, the execution
form and provide them with significantly more data, more                   has been physically partitioned into multiple units of
powerful analysis capabilities, similar performance for most              work(i.e. shards).
common scenarios, and far better scalability.
                                                                       6. Idle workers poll the masters for available work. Re-
                                                                          duce workers write their results to an intermediate
3. IMPLEMENTATION OVERVIEW                                                storage.
   The system has four major components: the worker pool,
query server, client interfaces, and metadata server. Figure           7. The query server monitors the intermediate area for
1 illustrates the overall Tenzing architecture.                           results being created and gathers them as they arrive.
   The distributed worker pool is the execution system                    The results are then streamed to the upstream client.
which takes a query execution plan and executes the MapRe-
duces. In order to reduce query latency, we do not spawn           4. SQL FEATURES
any new processes,1 but instead keep the processes running           Tenzing is a largely feature complete SQL engine, and
1                                                                  supports all major SQL92 constructs, with some limitations.
 This option is available as a separate batch execution
model.                                                             Tenzing also adds a number of enhancements on core SQL



                                                                1319
for more advanced analysis. These enhancements are de-              MapReduce framework to relax this restriction so that all
signed to be fully parallelizable to utilize the underlying         values for the same key end up in the same reducer shard,
MapReduce framework.                                                but not necessarily in the same Reduce() call. This made
                                                                    it possible to completely avoid the sorting step on the re-
4.1     Projection And Filtering                                    ducer and implement a pure hash table based aggregation
   Tenzing supports all the standard SQL operators (arith-          on MapReduce. This can have a significant impact on the
metic operators, IN, LIKE, BETWEEN, CASE, etc.) and                 performance on certain types of queries. Due to optimizer
functions. In addition, the execution engine embeds the             limitations, the user must explicitly indicate that they want
Sawzall [19] language so that any built-in Sawzall function         hash based aggregation. A query using hash-based aggre-
can be used. Users can also write their own functions in            gation will fail if there is not enough memory for the hash
Sawzall and call them from Tenzing.                                 table.
   The Tenzing compiler does several basic optimizations re-           Consider the following query:
lated to filtering. Some examples include:
                                                                        SELECT dept id, COUNT(1)
                                                                        FROM Employee
  1. If an expression evaluates to a constant, it is converted
                                                                        /∗+ HASH ∗/ GROUP BY 1;
     to a constant value at compile time.
                                                                    The following is the pseudo-code:
  2. If a predicate is a constant or a constant range (e.g.,
     BETWEEN) and the source data is an indexed source                  // In the mapper startup, initialize a hash table with dept id
     (e.g., Bigtable), the compiler will push down the con-             // as key and count as value.
                                                                        Mapper::Start() {
     dition to an index range scan on the underlying source.              dept hash = new Hashtable()
     This is very useful for making date range scans on fact            }
     tables, or point queries on dimension tables, for exam-
     ple.                                                               // For each row, increment the count for the corresponding
                                                                        // dept id.
  3. If the predicate does not involve complex functions                Mapper::Map(in) {
     (e.g., Sawzall functions) and the source is a database               dept hash[in.dept id] ++
     (e.g., MySQL), the filter is pushed down to the under-              }
     lying query executed on the source database.                       // At the end of the mapping phase, flush the hash table to
                                                                        // the reducer, without sorting.
  4. If the underlying store is range partitioned on a col-             Mapper::Finish() {
     umn, and the predicate is a constant range on that                  for (dept id in dept hash) {
     column, the compiler will skip the partitions that fall               OutputToReducerNoSort(
     outside the scan range. This is very useful for date                    key = dept id, value = dept hash[dept id])
     based fact tables, for example.                                     }
                                                                        }
  5. ColumnIO files have headers which contain meta in-
                                                                        // Similarly, in the reducer, initialize a hash table
     formation about the data, including the low and high
                                                                        Reducer::Start() {
     values for each column. The execution engine will ig-                dept hash = new Hashtable()
     nore the file after processing the header if it can de-             }
     termine that the file does not contain any records of
     interest, based on the predicates defined for that table            // Each Reduce call receives a dept id and a count from
     in the query.                                                      // the map output. Increment the corresponding entry in
                                                                        // the hash table.
  6. Tenzing will scan only the columns required for query              Reducer::Reduce(key, value) {
                                                                          dept hash[key] += value
     execution if the underlying format supports it (e.g.,              }
     ColumnIO).
                                                                        // At the end of the reduce phase, flush out the
4.2     Aggregation                                                     // aggregated results.
   Tenzing supports all standard aggregate functions such as            Reducer::Finish() {
SUM, COUNT, MIN, MAX, etc. and the DISTINCT equiv-                        for (dept id in dept hash) {
                                                                            print dept id, dept hash[dept id]
alents (e.g., COUNT DISTINCT). In addition, we support a                  }
significant number of statistical aggregate functions such as            }
CORR, COVAR and STDDEV. The implementation of ag-
gregate functions on the MapReduce framework have been
discussed in numerous papers including the original MapRe-          4.3 Joins
duce paper [9]; however, Tenzing employs a few additional              Joins in MapReduce have been studied by various groups
optimizations. One such is pure hash table based aggrega-           [2,4,22]. Since joins are one of the most important aspects of
tion, which is discussed below.                                     our system, we have spent considerable time on implement-
                                                                    ing different types of joins and optimizing them. Tenzing
4.2.1    HASH BASED AGGREGATION                                     supports efficient joins across data sources, such as Colum-
  Hash table based aggregation is common in RDBMS sys-              nIO to Bigtable; inner, left, right, cross, and full outer joins;
tems. However, it is impossible to implement efficiently              and equi semi-equi, non-equi and function based joins. Cross
on the basic MapReduce framework, since the reducer al-             joins are only supported for tables small enough to fit in
ways unnecessarily sorts the data by key. We enhanced the           memory, and right outer joins are supported only with sort/



                                                                 1320
merge joins. Non-equi correlated subqueries are currently                4.3.3   DISTRIBUTED SORT-MERGE JOINS
not supported.                                                          Distributed sort-merge joins are the most widely used
  We include distributed implementations for nested loop,            joins in MapReduce implementations. Tenzing has an im-
sort/merge and hash joins. For sort/merge and hash joins,            plementation which is most effective when the two tables
the degree of parallelism must be explicitly specified due to         being joined are roughly the same size and neither has an
optimizer limitations. Some of the join techniques imple-            index on the join key.
mented in Tenzing are discussed below.
                                                                         4.3.4   DISTRIBUTED HASH JOINS
4.3.1    BROADCAST JOINS                                                Distributed hash joins are frequently the most effective
   The Tenzing cost-based optimizer can detect when a sec-           join method in Tenzing when:
ondary table is small enough to fit in memory. If the order of
tables specified in the query is not optimal, the compiler can              • Neither table fits completely in memory,
also use a combination of rule and cost based heuristics to                • One table is an order of magnitude larger than the
switch the order of joins so that the larger table becomes the               other,
driving table. If small enough, the secondary table is pulled
into the memory of each mapper / reducer process for in-                   • Neither table has an efficient index on the join key.
memory lookups, which typically is the fastest method for
joining. In some cases, we also use a sorted disk based serial-      These conditions are often satisfied by OLAP queries with
ized implementation for the bigger tables to conserve mem-           star joins to large dimensions, a type of query often used
ory. Broadcast joins are supported for all join conditions           with Tenzing.
(CROSS, EQUI, SEMI-EQUI, NON-EQUI), each having a                      Consider the execution of the following query:
specialized implementation for optimal performance. A few                SELECT Dimension.attr, sum(Fact.measure)
additional optimizations are applied on a case-by-case basis:            FROM Fact
                                                                         /∗+ HASH ∗/ JOIN Dimension USING (dimension key)
   • The data structure used to store the lookup data is                 /∗+ HASH ∗/ GROUP BY 1;
     determined at execution time. For example, if the sec-          Tenzing processes the query as follows:2
     ondary table has integer keys in a limited range, we use
     an integer array. For integer keys with wider range, we             // The optimizer creates the MapReduce operations required
                                                                         // for the hash join and aggregation.
     use a sparse integer map. Otherwise we use a data type              Optimizer::CreateHashJoinPlan(fact table, dimension table) {
     specific hash table.                                                   if (dimension table is not hash partitioned on
                                                                                 dimension key)
   • We apply filters on the join data while loading to re-                    Create MR1: Partition dimension table by dimension key
     duce the size of the in-memory structure, and also only               if (fact table is not hash partitioned on dimension key)
                                                                              Create MR2: Partition fact table by dimension key
     load the columns that are needed for the query.
                                                                           Create MR3: Shard−wise hash join and aggregation

   • For multi-threaded workers, we create a single copy                 // MR1 and MR2 are trivial repartitions and hence the
     of the join data in memory and share it between the                 // pseudo−code is skipped. Pseudo−code for MR3 is below:
     threads.
                                                                         // At the start of the mapper, load the corresponding
                                                                         // dimension shard into memory. Also initialize a hash
   • Once a secondary data set is copied into the worker                 // table for the aggregation.
     process, we retain the copy for the duration of the                 Mapper::Start() {
     query so that we do not have to copy the data for every               fact shard.Open()
     map shard. This is valuable when there are many map                   shard id = fact shard.GetCurrentShard()
     shards being processed by a relatively small number of                dimension shard[shard id].Open()
     workers.                                                              lookup hash = new HashTable()
                                                                           lookup hash.LoadFrom(dimension shard[shard id])
                                                                           agg hash = new Hashtable()
   • For tables which are both static and frequently used,               }
     we permanently cache the data in local disk of the
     worker to avoid remote reads. Only the first use of                  // For each row, increment the count for the corresponding
     the table results in a read into the worker. Subsequent             // dimension key.
                                                                         Mapper::Map(in) {
     reads are from the cached copy on local disk.
                                                                           dim rec = lookup hash(in.dimension key);
                                                                           agg hash(dim rec.attr) += in.measure;
   • We cache the join results from the last record; since               }
     input data often is naturally ordered on the join at-
     tribute(s), it saves us one lookup access.                          // At the end of the mapping phase, flush the hash table
                                                                         // to the reducer, without sorting.
4.3.2    REMOTE LOOKUP JOINS                                             Mapper::Finish() {
                                                                          for (attr in agg hash)
  For sources which support remote lookups on index (e.g.,                  OutputToReducerNoSort(
Bigtable), Tenzing supports remote lookup joins on the key                    key = attr, value = agg hash[attr]);
(or a prefix of the key). We employ an asynchronous batch                  }
lookup technique combined with a local LRU cache in order                }
to improve performance. The optimizer can intelligently              2
                                                                       Note that no sorting is required in any of the MapReduces,
switch table order to enable this if needed.                         so it is disabled for all of them.



                                                                  1321
The reducer code is identical to the hash aggregation             4.5 OLAP Extensions
pseudo-code in 4.2.1.                                                  Tenzing supports the ROLLUP() and CUBE() OLAP ex-
   The Tenzing scheduler can intelligently parallelize opera-        tensions to the SQL language. We follow the Oracle variant
tions to make hash joins run faster. In this case, for example,      of the syntax for these extensions. These are implemented
the compiler would chain MR3 and MR1, and identify MR2               by emitting additional records in the Map() phase based on
as a join source. This has the following implications for the        the aggregation combinations required.
backend scheduler:                                                     Consider the following query which outputs the salary of
                                                                     each employee and also department-wise and grand totals:
   • MR1 and MR2 will be started in parallel.
                                                                         SELECT dept, emp, SUM(salary)
                                                                         FROM Employee
   • MR3 will be started as soon as MR2 finishes and MR1                  GROUP BY ROLLUP(1, 2);
     starts producing data, without waiting for MR1 to fin-
                                                                     The following pseudo-code explains the backend implemen-
     ish.
                                                                     tation:
   We also added several optimizations to the MapReduce                  // For each row, emit multiple key value pairs, one
framework to improve the efficiency of distributed hash joins,             // for each combination of keys being aggregated.
                                                                         Mapper::Map(in) {
such as sort avoidance, streaming, memory chaining and                     OutputToReducerWithSort(
block shuffle. These are covered in more detail in 5.1.                        key = {dept, emp}, value = salary)
                                                                           OutputToReducerWithSort(
4.4    Analytic Functions                                                    key = {dept, NULL}, value = salary)
                                                                           OutputToReducerWithSort(
  Tenzing supports all the major analytic functions, with                    key = {NULL, NULL}, value = salary)
a syntax similar to PostgreSQL / Oracle. These functions                 }
have proven to be quite popular with the analyst commu-
nity. Some of the most commonly used analytic functions                  // The reducer will do a standard pre−sorted
are RANK, SUM, MIN, MAX, LEAD, LAG and NTILE.                            // aggregation on the data.
  Consider the following example with analytic functions                 Reducer::Reduce(key, list<value>) {
                                                                           sum := 0
which ranks employees in each department by their salary:                  for (i = 1 to list<value>.size()) {
                                                                             sum += value[i].salary
 SELECT                                                                    }
  dept, emp, salary,                                                       print key, sum
  RANK() OVER (                                                          }
    PARTITION BY dept ORDER BY salary DESC)
    AS salary rank
 FROM Employee;                                                      4.6 Set Operations
                                                                       Tenzing supports all standard SQL set operations such as
The following pseudo-code explains the backend implemen-
                                                                     UNION, UNION ALL, MINUS and MINUS ALL. Set op-
tation:
                                                                     erations are implemented mainly in the reduce phase, with
 Mapper::Map(in) {                                                   the mapper emitting the records using the whole record as
   // From the mapper, we output the partitioning key of             the key, in order to ensure that similar keys end up together
   // the analytic function as the key, and the ordering             in the same reduce call. This does not apply to UNION
   // key and other information as value.                            ALL, for which we use round robin partitioning and turn off
   OutputToReducerWithSort(                                          reducer side sorting for greater efficiency.
     key = in.dept, value = {in.emp, in.salary})
 }                                                                   4.7 Nested Queries And Subqueries
 Reducer::Reduce(key, values) {                                        Tenzing supports SELECT queries anywhere where a ta-
   // Reducer receives all values with the same partitioning         ble name is allowed, in accordance with the SQL standard.
   // key. The list is then sorted on the ordering key for           Typically, each nested SQL gets converted to a separate
   // the analytic function.                                         MapReduce and the resultant intermediate table is substi-
   sort(values on value.salary)
   // For simple analytic function such as RANK, it is
                                                                     tuted in the outer query. However, the compiler can opti-
         enough                                                      mize away relatively simple nested queries such that extra
   // to just print out the results once sorted.                     MapReduce jobs need not be created. For example, the
   for (value in values) {                                           query
     print key, value.emp, value.salary, i
   }                                                                     SELECT COUNT(∗) FROM (
 }                                                                        SELECT DISTINCT emp id FROM Employee);
                                                                     results in two MapReduce jobs: one for the inner DISTINCT
  Currently, Tenzing does not support the use of multi-              and a second for the outer COUNT. However, the query
ple analytic functions with different partitioning keys in the
same query. We plan to remove this restriction by rewriting              SELECT ∗ FROM (
                                                                          SELECT DISTINCT emp id FROM Employee);
such queries as merging result sets from multiple queries,
each with analytic functions having the same partitioning            results in only one MapReduce job. If two (or more) MapRe-
key. Note that it is possible to combine aggregation and             duces are required, Tenzing will put the reducer and follow-
analytic functions in the same query - the compiler simply           ing mapper in the same process. This is discussed in greater
rewrites it into multiple queries.                                   detail in 5.1.



                                                                  1322
message Department {                                                Tenzing allows the user to specify, but does not enforce,
   required int32 dept id = 1;                                      primary and foreign keys.
   required string dept name = 2;                                     Limited support (no joins) for UPDATE and DELETE is
   repeated message Employee {                                      implemented by applying the update or delete criteria on the
      required int32 emp id = 3;                                    data to create a new dataset. A reference to the new dataset
      required string emp name = 4;
   };                                                               then replaces the old reference in the metadata repository.
   repeated message Location {
      required string city name = 5;                                4.11 DDL
   }                                                                   We support a number of DDL operations, including CRE-
};                                                                  ATE [OR REPLACE] [EXTERNAL] [TEMPORARY] TA-
                                                                    BLE, DROP TABLE [IF EXISTS], RENAME TABLE, GEN-
      Figure 2: Sample protocol buffer definition.                    ERATE STATISTICS, GRANT and REVOKE. They all
                                                                    work as per standard SQL and act as the main means of
                                                                    controlling metadata and access. In addition, Tenzing has
4.8    Handling Structured Data                                     metadata discovery mechanisms built-in to simplify import-
   Tenzing has read-only support for structured (nested and         ing datasets into Tenzing. For example, we provide tools
repeated) data formats such as complex protocol buffer struc-        and commands to automatically determine the structure of
tures. The current implementation is somewhat native and            a MySQL database and make its tables accessible from Ten-
inefficient in that the data is flattened by the reader at the         zing. We can also discover the structure of protocol buffers
lowest level and fed as multiple records to the engine. The         from the protocol definition and import the metadata into
engine itself can only deal with flat relational data, unlike        Tenzing. This is useful for Bigtable, ColumnIO, RecordIO
Dremel [17]. Selecting fields from different repetition levels        and SSTable files in protocol buffer format.
in the same query is considered an error.
   For example, given the protocol buffer definition in figure
                                                                    4.12 Table Valued Functions
2, the query                                                           Tenzing supports both scalar and table-valued user-defined
                                                                    functions, implemented by embedding a Sawzall interpreter
 SELECT emp name, dept name FROM Department;                        in the Tenzing execution engine. The framework is designed
is valid since it involves one repeated level (Employee). How-      such that other languages can also be easily integrated. In-
ever, the query                                                     tegration of Lua and R has been proposed, and work is in
                                                                    progress. Tenzing currently has support for creating func-
 SELECT emp name, city name FROM Department;                        tions in Sawzall that take tables (vector of tuples) as input
is invalid since Employee and Location are independently            and emit tables as output. These are useful for tasks such
repeating groups.                                                   as normalization of data and doing complex computation
                                                                    involving groups of rows.
4.9    Views
  Tenzing supports logical views over data. Views are sim-          4.13 Data Formats
ply named SELECT statements which are expanded inline                  Tenzing supports direct querying of, loading data from,
during compilation. Views in Tenzing are predominantly              and downloading data into many formats. Various options
used for security reasons: users can be given access to views       can be specified to tweak the exact form of input / output.
without granting them access to underlying tables, enabling         For example, for delimited text format, the user can spec-
row and column level security of data.                              ify the delimiter, encoding, quoting, escaping, headers, etc.
  Consider the table Employee with fields emp id, ldap user,         The statement below will create the Employee table from a
name, dept id, and salary. We can create the following view         pipe delimited text file input and validate that the loaded
over it:                                                            data matches the table definition and all constraints (e.g.,
                                                                    primary key) are met:
 CREATE VIEW Employee V AS
 SELECT emp id, ldap user, name, dept id                               CREATE TABLE
 FROM Employee                                                           Employee(emp id int32, emp name string)
 WHERE dept id IN (                                                     WITH DATAFILE:CSV[delim=”|”]:”employee.txt”
  SELECT e.dept id FROM Employee e                                      WITH VALIDATION;
  WHERE e.ldap user=USERNAME());
                                                                        Other formats supported by Tenzing include:
This view allows users to see only other employees in their
department, and prevents users from seeing the salary of                 • ColumnIO, a columnar storage system developed by
other employees.                                                           the Dremel team [17].

4.10     DML                                                             • Bigtable, a highly distributed key-value store [7].
   Tenzing has basic support for DML operations INSERT,                  • Protocol buffers [11] stored in compressed record for-
UPDATE and DELETE. Support is batch mode (i.e. meant                       mat (RecordIO) and sorted strings format (SSTables
for batch DML application of relatively large volumes of                   [7]).
data). Tenzing is not ACID compliant - specifically, we are
atomic, consistent and durable, but do not support isolation.            • MySQL databases.
   INSERT is implemented by creating a new data set and
adding the new data set to the existing metadata. Essen-                 • Data embedded in the metadata (useful for testing and
tially, this acts as a batch mode APPEND style INSERT.                     small static data sets).




                                                                 1323
5.     PERFORMANCE                                                                 Workers    Time (s)    Throughput
  One of the key aims of Tenzing has been to have per-                                 100      188.74           16.74
formance comparable to traditional MPP database systems                                500       36.12           17.49
such as Teradata, Netezza and Vertica. In order to achieve                            1000       19.57           16.14
this, there are several areas that we had to work on:

5.1      MapReduce Enhancements                                                    Table 1: Tenzing Scalability.
   Tenzing is tightly integrated with the Google MapReduce
implementation, and we made several enhancements to the
MapReduce framework to increase throughput, decrease la-              the reducer of the upstream MR and the mapper of the
tency and make SQL operators more efficient.                            downstream MR are co-located in the same process.
   Workerpool. One of the key challenges we faced was re-                Sort Avoidance. Certain operators such as hash join
ducing latency from minutes to seconds. It became rapidly             and hash aggregation require shuffling, but not sorting. The
clear that in order to do so, we had to implement a solution          MapReduce API was enhanced to automatically turn off
which did not entail spawning of new binaries for each new            sorting for these operations. When sorting is turned off, the
Tenzing query. The MapReduce and Tenzing teams collab-                mapper feeds data to the reducer which directly passes the
oratively came up with the pool implementation. A typical             data to the Reduce() function bypassing the intermediate
pool consists of three process groups:                                sorting step. This makes many SQL operators significantly
                                                                      more efficient.
     1. The master watcher. The watcher is responsible for               Block Shuffle. Typically, MapReduce uses row based en-
        receiving a work request and assigning a free master          coding and decoding during shuffle. This is necessary since
        for the task. The watcher also monitors the overall           in order to sort the data, rows must be processed individu-
        health of the pool such as free resources, number of          ally. However, this is inefficient when sorting is not required.
        running queries, etc. There is usually one one watcher        We implemented a block-based shuffle mechanism on top of
        process for one instance of the pool.                         the existing row-based shuffler in MapReduce that combines
     2. The master pool. This consists of a relatively small          many small rows into compressed blocks of roughly 1MB in
        number of processes (usually a few dozen). The job            size. By treating the entire block as one row and avoiding
        of the master is to coordinate the execution of one           reducer side sorting, we were able to avoid some of the over-
        query. The master receives the task from the watcher          head associated with row serialization and deserialization in
        and distributes the tasks to the workers, and monitors        the underlying MapReduce framework code. This lead to
        their progress. Note that once a master receives a task,      3X faster shuffling of data compared to row based shuffling
        it takes over ownership of the task, and the death of         with sorting.
        the watcher process does not impact the query in any             Local Execution. Another simple MapReduce optimiza-
        way.                                                          tion we do is local run. The backend can detect the size of
                                                                      the underlying data to be processed. If the size is under a
     3. The worker pool. This contains a set of workers (typi-        threshold (typically 128 MB), the query is not sent to the
        cally a few thousand processes) which do all the heavy        pool, but executed directly in the client process. This re-
        lifting of processing the data. Each worker can work          duces the query latency to about 2 seconds.
        as either a mapper or a reducer or both. Each worker
        constantly monitors a common area for new tasks and           5.2 Scalability
        picks up new tasks as they arrive on a FIFO basis. We           Because it is built on the MapReduce framework, Ten-
        intend to implement a priority queue so that queries          zing has excellent scalability characteristics. The current
        can be tiered by priority.                                    production deployment runs in two data centers, using 2000
                                                                      cores each. Each core has 6 GB of RAM and 24 GB of lo-
  Using this approach, we were able to bring down the la-             cal disk, mainly used for sort buffers and local caches (Note
tency of the execution of a Tenzing query itself to around 7          that the data is primarily stored in GFS and Bigtable). We
seconds. There are other bottlenecks in the system however,           benchmarked the system for the simple query
such as computation of map splits, updating the metadata
service, committing / rolling back results (which involves                SELECT a, SUM(b)
                                                                          FROM T
file renames), etc. which means the typical latency varies                 WHERE c = k
between 10 and 20 seconds currently. We are working on                    GROUP BY a
various other enhancements and believe we can cut this time
down to less than 5 seconds end-to-end, which is fairly ac-           with data stored in ColumnIO format on GFS files (see table
ceptable to the analyst community.                                    1). Throughput, measured in rows per second per worker,
  Streaming & In-memory Chaining. The original im-                    remains steady as the number of workers is scaled up.
plementation of Tenzing serialized all intermediate data to
GFS. This led to poor performance for multi-MapReduce                 5.3 System Benchmarks
queries, such as hash joins and nested sub-selects. We im-               In order to evaluate Tenzing performance against com-
proved the performance of such queries significantly by im-            mercial parallel databases, we benchmarked four commonly
plementing streaming between MapReduces, i.e. the up-                 used analyst queries (see appendix A) against DBMS-X, a
stream and downstream MRs communicate using the net-                  leading MPP database appliance with row-major storage.
work and only use GFS for backup. We subsequently im-                 Tenzing data was stored in GFS in ColumnIO format. The
proved performance further by using memory chaining, where            Tenzing setup used 1000 processes with 1 CPU, 2 GB RAM



                                                                   1324
Query     DBMS-X (s)      Tenzing (s)    Change                     what better for pure select-project queries with high selec-
                                                                      tivity. In comparison to the production evaluation engine,
  #2                  129             93   39% faster
                                                                      the vector engine’s per-worker throughput was about three
  #4                   70             69   1.4% faster
                                                                      times higher; the LLVM engine’s per-worker throughput was
  #1                  155            213   38% slower
                                                                      six to twelve times higher.
  #3                    9             28   3.1 times slower
                                                                         Our LLVM based query engine stores intermediate results
                                                                      by rows, even when both input and output are columnar. In
                                                                      comparison to columnar vector processing, we saw several
           Table 2: Tenzing versus DBMS-X.
                                                                      advantages and disadvantages.

                                Throughput                                • For hash table based operators like aggregation and
           Rows     Selected    Vector     LLVM     Ratio                   join, using rows is more natural: composite key com-
                                                                            parison has cache locality, and conflict resolution is
      #8     104     99.58%       21         89        4.2                  done using pointers. Our engine iterates on input rows
      #6      60    100.00%        8.1       25        3.2                  and uses generated procedures that do both. However
      #1      19     99.16%        0.66       2.0      3.0                  to our best knowledge, there is no straightforward and
      #2      19      0.81%       23         61        2.6                  fast columnar solution for using hash table. Searching
      #7     104     58.18%        6.7       17        2.5                  in hash table breaks the cache locality for columns,
      #3      57      3.74%        7.4       15        2.0                  resulting in more random memory accesses.
      #5     104      4.28%       13         19        1.5
      #4      40      2.49%       12         13        1.1                • Vector processing engines always materialize interme-
                                                                            diate results in main memory. In contrast our gener-
                                                                            ated native code can store results on the stack (better
 Table 3: LLVM and Vector Engine Benchmarks.                                data locality) or even registers, depending on how the
                                                                            JIT compiler optimizes.

and 8 GB local disk each. Results are shown in table 2.                   • If selectivity of source data is low, vector processing
The poor performance on query #3 is because query exe-                      engines load less data into cache and thus scan faster.
cution time is dominated by startup time. The production                    Because our engine stores data in rows, scans end up
version of Tenzing was used for the benchmarks; we believe                  reading more data and are slower.
Tenzing’s results would have been significantly faster if the
experimental LLVM engine discussed in the next section had                • While LLVM provides support for debugging JITed
been ready at benchmark time.                                               code [16], there are more powerful analysis and debug-
                                                                            ging tools for the native C/C++ routines used by most
5.4    Experimental LLVM Query Engine                                       vector processing engines.
   Our execution engine has gone through multiple itera-
tions to achieve single-node efficiency close to commercial                The majority of queries in our workload are analytic queries
DBMS. The first implementation translated SQL expres-                  that have aggregations and/or joins. These operations are
sions to Sawzall code. This code was then compiled using              significant performance bottlenecks compared to other op-
Sawzall’s just-in-time (JIT) compiler. However, this proved           erators like selection and projection. We believe that native
to be inefficient because of the serialization and deserializa-         code generation engine deals better with these bottlenecks
tion costs associated with translating to and from Sawzall’s          and is a promising approach for query processing.
native type system. The second and current implementation
uses Dremel’s SQL expression evaluation engine, which is              6. RELATED WORK
based on direct evaluation of parse trees of SQL expressions.
While more efficient than the original Sawzall implementa-                 A significant amount of work has been done in recent years
tion, it was still somewhat slow because of its interpreter-like      in the area of large scale distributed data processing. These
nature and row based processing.                                      fall into the following broad categories:
   For the third iteration, we did extensive experiments with
                                                                          • Core frameworks for distributed data processing. Our
two major styles of execution: LLVM based native code gen-
                                                                            focus has been on MapReduce [9], but there are sev-
eration with row major block based intermediate data and
                                                                            eral others such as Nephele/PACT [3] and Dryad [15].
column major vector based processing with columnar in-
                                                                            Hadoop [8] is an open source implementation of the
termediate storage. The results of benchmarking LLVM vs
                                                                            MapReduce framework. These powerful but complex
vector on some typical aggregation queries is shown in table
                                                                            frameworks are designed for software engineers imple-
3. All experiments were done on the same dual-core Intel
                                                                            menting complex parallel algorithms.
machine with 4 GB of RAM, with input data in columnar
form in-memory. Note that the LLVM engine is a work                       • Simpler procedural languages built on top of these
in progress and has not yet been integrated into Tenzing.                   frameworks. The most popular of these are Sawzall
Table sizes are in millions of rows and throughput is again                 [19] and PIG [18]. These have limited optimizations
measured in millions of rows per worker per second. The                     built in, and are more suited for reasonably experi-
data suggests that the higher the selectivity of the where                  enced analysts, who are comfortable with a procedu-
clause, the better the LLVM engine’s relative performance.                  ral programming style, but need the ability to iterate
We found that the LLVM approach gave better overall re-                     quickly over massive volumes of data.
sults for real-life queries while vector processing was some-



                                                                   1325
• Language extensions, usually with special purpose op-                 • The MapReduce team, specifically Jerry Zhao, Mari´n
                                                                                                                              a
     timizers, built on top of the core frameworks. The                      Dvorsk´ and Derek Thomson, for working closely with
                                                                                    y
     ones we have studied are FlumeJava [6], which is built                  us in implementing various enhancements to MapRe-
     on top of MapReduce, and DryadLINQ [23], which is                       duce.
     built on top of Dryad. These seem to be mostly geared
     towards programmers who want to quickly build large                   • The Dremel team, specifically Sergey Melnik and Matt
     scalable pipelines with relatively simple operations (e.g.              Tolton, for ColumnIO storage and the SQL expression
     ETL pipelines for data warehouses).                                     evaluation engine.

   • Declarative query languages built on top of the core                  • The Sawzall team, specifically Polina Sokolova and
     frameworks with intermediate to advanced optimiza-                      Robert Griesemer, for helping us embed the Sawzall
     tions. These are geared towards the reporting and                       language which we use as the primary scripting lan-
     analysis community, which is very comfortable with                      guage for supporting complex user-defined functions.
     SQL. Tenzing is mostly focused on this use-case, as we
     believe, are HIVE [21], SCOPE [5] and HadoopDB [1]                    • Various other infrastructure teams at Google, includ-
     (recently commercialized as Hadapt). The latter is an                   ing but not limited to the Bigtable, GFS, Machines
     interesting hybrid that tries to approach the perfor-                   and SRE teams.
     mance of parallel DBMS by using a cluster of single-
     node databases and MapReduce as the glue layer.                       • Our loyal and long-suffering user base, especially the
                                                                             people in Sales & Finance.
   • Embedding MapReduce and related concepts into tra-
     ditional parallel DBMSs. A number of vendors have               9. REFERENCES
     such offerings now, including Greenplum, AsterData,
     Paraccel and Vertica.                                               [1] A. Abouzeid, K. Bajda-Pawlikowski, D. Abadi,
                                                                             A. Silberschatz, and A. Rasin. HadoopDB: an
                                                                             architectural hybrid of MapReduce and DBMS
7. CONCLUSION                                                                technologies for analytical workloads. Proceedings of
  We described Tenzing, a SQL query execution engine built                   the VLDB Endowment, 2:922–933, August 2009.
on top of MapReduce. We demonstrated that:                               [2] F.N. Afrati and J.D. Ullman. Optimizing joins in a
                                                                             Map-Reduce environment. In Proceedings of the 13th
   • It is possible to create a fully functional SQL engine                  International Conference on Extending Database
     on top of the MapReduce framework, with extensions                      Technology, pages 99–110. ACM, 2010.
     that go beyond SQL into deep analytics.
                                                                         [3] D. Battr´, S. Ewen, F. Hueske, O. Kao, V. Markl, and
                                                                                      e
                                                                             D. Warneke. Nephele/pacts: a programming model
   • With relatively minor enhancements to the MapRe-
                                                                             and execution framework for web-scale analytical
     duce framework, it is possible to implement a large
                                                                             processing. In Proceedings of the 1st ACM symposium
     number of optimizations currently available in com-
                                                                             on Cloud computing, SoCC ’10, pages 119–130, New
     mercial database systems, and create a system which
                                                                             York, NY, USA, 2010. ACM.
     can compete with commercial MPP DBMS in terms of
     throughput and latency.                                             [4] S. Blanas, J.M. Patel, V. Ercegovac, J. Rao, E.J.
                                                                             Shekita, and Y. Tian. A comparison of join algorithms
   • The MapReduce framework provides a combination of                       for log processing in MapReduce. In Proceedings of the
     high performance, high reliability and high scalability                 2010 international conference on Management of data,
     on cheap unreliable hardware, which makes it an ex-                     pages 975–986. ACM, 2010.
     cellent platform to build distributed applications that             [5] R. Chaiken, B. Jenkins, P. Larson, B. Ramsey,
     involve doing simple to medium complexity operations                    D. Shakib, S. Weaver, and J. Zhou. Scope: easy and
     on large data volumes.                                                  efficient parallel processing of massive data sets. Proc.
                                                                             VLDB Endow., 1:1265–1276, August 2008.
   • By designing the engine and the optimizer to be aware               [6] C. Chambers, A. Raniwala, F. Perry, S. Adams, R.R.
     of the characteristics of heterogeneous data sources, it                Henry, R. Bradshaw, and N. Weizenbaum. FlumeJava:
     is possible to create a smart system which can fully uti-               easy, efficient data-parallel pipelines. In Proceedings of
     lize the characteristics of the underlying data sources.                the 2010 ACM SIGPLAN conference on Programming
                                                                             language design and implementation, pages 363–375.
                                                                             ACM, 2010.
8. ACKNOWLEDGEMENTS                                                      [7] F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A.
  Tenzing is built on top of a large number of existing                      Wallach, M. Burrows, T. Chandra, A. Fikes, and R.E.
Google technologies. While it is not possible to list all in-                Gruber. Bigtable: A distributed storage system for
dividuals who have contributed either directly or indirectly                 structured data. ACM Transactions on Computer
towards the implementation, we would like to highlight the                   Systems (TOCS), 26(2):1–26, 2008.
contribution of the following:                                           [8] D. Cutting et al. Apache Hadoop Project.
                                                                             http://hadoop.apache.org/.
   • The data warehouse management, specifically Hemant
     Maheshwari, for providing business context, manage-                 [9] J. Dean and S. Ghemawat. MapReduce: Simplified
     ment support and overall guidance.                                      data processing on large clusters. Communications of
                                                                             the ACM, 51(1):107–113, 2008.



                                                                  1326
[10] J. Dean and S. Ghemawat. MapReduce: a flexible                        USA, 2008. USENIX Association.
     data processing tool. Communications of the ACM,
     53(1):72–77, 2010.                                            APPENDIX
[11] J. Dean, S. Ghemawat, K. Varda, et al. Protocol
     Buffers: Google’s Data Interchange Format.                     A. BENCHMARK QUERIES
     Documentation and open source release at                        This section contains the queries used for benchmarking
     http://code.google.com/p/protobuf/.                           performance against DBMS-X. The table sizes are indicated
[12] D.J. DeWitt and M. Stonebraker. MapReduce: A                  using XM or XK suffix.
     major step backwards. The Database Column, 1, 2008.
[13] E. Friedman, P. Pawlowski, and J. Cieslewicz.
                                                                   A.1 Query 1
     SQL/MapReduce: A practical approach to                            SELECT DISTINCT dim2.dim1 id
     self-describing, polymorphic, and parallelizable                  FROM FactTable1 XXB stats
     user-defined functions. Proceedings of the VLDB                    INNER JOIN DimensionTable1 XXXM dim1
     Endowment, 2(2):1402–1413, 2009.                                    USING (dimension1 id)
                                                                       INNER JOIN DimensionTable2 XXM dim2
[14] S. Ghemawat, H. Gobioff, and S. Leung. The Google                    USING (dimension2 id)
     file system. SIGOPS Oper. Syst. Rev., 37:29–43,                    WHERE dim2.attr BETWEEN A and B
     October 2003.                                                     AND stats.date id > some date id
[15] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly.           AND dim1.attr IN (Val1, Val2, Val3, ...);
     Dryad: distributed data-parallel programs from
     sequential building blocks. SIGOPS Oper. Syst. Rev.,          A.2 Query 2
     41:59–72, March 2007.
[16] R. Kleckner. LLVM: Debugging JITed Code With                      SELECT
     GDB. Retrieved June 27, 2011, from                                 dim1.attr1, dates.finance week id,
     http://llvm.org/docs/DebuggingJITedCode.html.                      SUM(FN1(dim3.attr3, stats.measure1)),
                                                                        SUM(FN2(dim3.attr4, stats.measure2)),
[17] S. Melnik, A. Gubarev, J.J. Long, G. Romer,
     S. Shivakumar, M. Tolton, and T. Vassilakis. Dremel:              FROM FactTable1 XXB stats
     Interactive Analysis of Web-Scale Datasets.                       INNER JOIN DimensionTable1 XXXM dim1
     Proceedings of the VLDB Endowment, 3(1), 2010.                      USING (dimension1 id)
                                                                       INNER JOIN DatesDim dates
[18] C. Olston, B. Reed, U. Srivastava, R. Kumar, and
                                                                         USING (date id)
     A. Tomkins. Pig Latin: a not-so-foreign language for              INNER JOIN DimensionTable3 XXXK dim3
     data processing. In Proceedings of the 2008 ACM                     USING (dimension3 id)
     SIGMOD international conference on Management of                  WHERE <fact date range>
     data, pages 1099–1110. ACM, 2008.                                 GROUP BY 1, 2;
[19] R. Pike, S. Dorward, R. Griesemer, and S. Quinlan.
     Interpreting the data: Parallel analysis with Sawzall.        A.3 Query 3
     Scientific Programming, 13(4):277–298, 2005.
[20] M. Stonebraker, D. Abadi, D.J. DeWitt, S. Madden,                 SELECT attr4 FROM (
     E. Paulson, A. Pavlo, and A. Rasin. MapReduce and                  SELECT dim4.attr4, COUNT(∗) AS dup count
                                                                        FROM DimensionTable4 XXM dim4
     parallel DBMSs: friends or foes? Communications of                 JOIN DimensionTable5 XXM dim5
     the ACM, 53(1):64–71, 2010.                                          USING (dimension4 id)
[21] A. Thusoo, J.S. Sarma, N. Jain, Z. Shao, P. Chakka,                WHERE dim4.attr1 BETWEEN Val1 and Val2
     S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive:                AND dim5.attr2 IN (Val3, Val4)
     a warehousing solution over a Map-Reduce framework.                GROUP BY 1
                                                                        HAVING dup count = 1) x;
     Proceedings of the VLDB Endowment, 2(2):1626–1629,
     2009.
[22] H. Yang, A. Dasdan, R.L. Hsiao, and D.S. Parker.              A.4 Query4
     Map-Reduce-Merge: simplified relational data
                                                                       SELECT attr1, measure1 / measure2
     processing on large clusters. In Proceedings of the 2007
                                                                       FROM (
     ACM SIGMOD international conference on                              SELECT
     Management of data, pages 1029–1040. ACM, 2007.                        attr1,
                                               ´
[23] Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson,                 SUM(FN1(attr2, attr3, attr4)) measure1,
     P. K. Gunda, and J. Currey. DryadLINQ: a system for                    SUM(FN2(attr5, attr6, attr7)) measure2,
     general-purpose distributed data-parallel computing                 FROM DimensionTable6 XXM
                                                                         WHERE FN3(attr8) AND FN4(attr9)
     using a high-level language. In Proceedings of the 8th              GROUP BY 1
     USENIX conference on Operating systems design and                 ) v;
     implementation, OSDI’08, pages 1–14, Berkeley, CA,




                                                                1327

More Related Content

What's hot

Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Deanna Kosaraju
 
Introduction to Hadoop part 2
Introduction to Hadoop part 2Introduction to Hadoop part 2
Introduction to Hadoop part 2Giovanna Roda
 
Parallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyParallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyKyong-Ha Lee
 
Hadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node CombinersHadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node Combinersijcsit
 
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Sumeet Singh
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methodspaperpublications3
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud ComputingFarzad Nozarian
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using HadoopWeb Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoopdbpublications
 
Big data processing using - Hadoop Technology
Big data processing using - Hadoop TechnologyBig data processing using - Hadoop Technology
Big data processing using - Hadoop TechnologyShital Kat
 
Big Data Processing: Performance Gain Through In-Memory Computation
Big Data Processing: Performance Gain Through In-Memory ComputationBig Data Processing: Performance Gain Through In-Memory Computation
Big Data Processing: Performance Gain Through In-Memory ComputationUT, San Antonio
 
Harnessing Hadoop and Big Data to Reduce Execution Times
Harnessing Hadoop and Big Data to Reduce Execution TimesHarnessing Hadoop and Big Data to Reduce Execution Times
Harnessing Hadoop and Big Data to Reduce Execution TimesDavid Tjahjono,MD,MBA(UK)
 
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Cognizant
 
KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.Kyong-Ha Lee
 
PERFORMANCE EVALUATION OF BIG DATA PROCESSING OF CLOAK-REDUCE
PERFORMANCE EVALUATION OF BIG DATA PROCESSING OF CLOAK-REDUCEPERFORMANCE EVALUATION OF BIG DATA PROCESSING OF CLOAK-REDUCE
PERFORMANCE EVALUATION OF BIG DATA PROCESSING OF CLOAK-REDUCEijdpsjournal
 
The Exabyte Journey and DataBrew with CICD
The Exabyte Journey and DataBrew with CICDThe Exabyte Journey and DataBrew with CICD
The Exabyte Journey and DataBrew with CICDShu-Jeng Hsieh
 
Large Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduceLarge Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduceHortonworks
 
Application of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingApplication of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingMohammad Mustaqeem
 

What's hot (20)

Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
 
Introduction to Hadoop part 2
Introduction to Hadoop part 2Introduction to Hadoop part 2
Introduction to Hadoop part 2
 
Parallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyParallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A Survey
 
Hadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node CombinersHadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node Combiners
 
Seminar Report Vaibhav
Seminar Report VaibhavSeminar Report Vaibhav
Seminar Report Vaibhav
 
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
 
43_Sameer_Kumar_Das2
43_Sameer_Kumar_Das243_Sameer_Kumar_Das2
43_Sameer_Kumar_Das2
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud Computing
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using HadoopWeb Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoop
 
Big data processing using - Hadoop Technology
Big data processing using - Hadoop TechnologyBig data processing using - Hadoop Technology
Big data processing using - Hadoop Technology
 
Big Data Processing: Performance Gain Through In-Memory Computation
Big Data Processing: Performance Gain Through In-Memory ComputationBig Data Processing: Performance Gain Through In-Memory Computation
Big Data Processing: Performance Gain Through In-Memory Computation
 
Harnessing Hadoop and Big Data to Reduce Execution Times
Harnessing Hadoop and Big Data to Reduce Execution TimesHarnessing Hadoop and Big Data to Reduce Execution Times
Harnessing Hadoop and Big Data to Reduce Execution Times
 
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
 
KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.
 
PERFORMANCE EVALUATION OF BIG DATA PROCESSING OF CLOAK-REDUCE
PERFORMANCE EVALUATION OF BIG DATA PROCESSING OF CLOAK-REDUCEPERFORMANCE EVALUATION OF BIG DATA PROCESSING OF CLOAK-REDUCE
PERFORMANCE EVALUATION OF BIG DATA PROCESSING OF CLOAK-REDUCE
 
The Exabyte Journey and DataBrew with CICD
The Exabyte Journey and DataBrew with CICDThe Exabyte Journey and DataBrew with CICD
The Exabyte Journey and DataBrew with CICD
 
Large Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduceLarge Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduce
 
MapReduce in Cloud Computing
MapReduce in Cloud ComputingMapReduce in Cloud Computing
MapReduce in Cloud Computing
 
Application of MapReduce in Cloud Computing
Application of MapReduce in Cloud ComputingApplication of MapReduce in Cloud Computing
Application of MapReduce in Cloud Computing
 

Similar to A sql implementation on the map reduce framework

Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...IOSR Journals
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyAlluxio, Inc.
 
Data Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeData Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeDenodo
 
Cisco Big Data Use Case
Cisco Big Data Use CaseCisco Big Data Use Case
Cisco Big Data Use CaseErni Susanti
 
cisco_bigdata_case_study_1
cisco_bigdata_case_study_1cisco_bigdata_case_study_1
cisco_bigdata_case_study_1Erni Susanti
 
oracle_workprofile.pptx
oracle_workprofile.pptxoracle_workprofile.pptx
oracle_workprofile.pptxssuser20fcbe
 
MataNui - Building a Grid Data Infrastructure that "doesn't suck!"
MataNui - Building a Grid Data Infrastructure that "doesn't suck!"MataNui - Building a Grid Data Infrastructure that "doesn't suck!"
MataNui - Building a Grid Data Infrastructure that "doesn't suck!"Guy K. Kloss
 
A Novel Approach for Workload Optimization and Improving Security in Cloud Co...
A Novel Approach for Workload Optimization and Improving Security in Cloud Co...A Novel Approach for Workload Optimization and Improving Security in Cloud Co...
A Novel Approach for Workload Optimization and Improving Security in Cloud Co...IOSR Journals
 
Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computinghuda2018
 
HAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopHAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopBigData Research
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringIRJET Journal
 
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Databricks
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemGregg Barrett
 
Big_SQL_3.0_Whitepaper
Big_SQL_3.0_WhitepaperBig_SQL_3.0_Whitepaper
Big_SQL_3.0_WhitepaperScott Gray
 

Similar to A sql implementation on the map reduce framework (20)

Cloud & Data Center Networking
Cloud & Data Center NetworkingCloud & Data Center Networking
Cloud & Data Center Networking
 
Facade
FacadeFacade
Facade
 
H017144148
H017144148H017144148
H017144148
 
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
 
Dremel
DremelDremel
Dremel
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
 
Data Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeData Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data Lake
 
Cisco Big Data Use Case
Cisco Big Data Use CaseCisco Big Data Use Case
Cisco Big Data Use Case
 
cisco_bigdata_case_study_1
cisco_bigdata_case_study_1cisco_bigdata_case_study_1
cisco_bigdata_case_study_1
 
oracle_workprofile.pptx
oracle_workprofile.pptxoracle_workprofile.pptx
oracle_workprofile.pptx
 
MataNui - Building a Grid Data Infrastructure that "doesn't suck!"
MataNui - Building a Grid Data Infrastructure that "doesn't suck!"MataNui - Building a Grid Data Infrastructure that "doesn't suck!"
MataNui - Building a Grid Data Infrastructure that "doesn't suck!"
 
D017212027
D017212027D017212027
D017212027
 
A Novel Approach for Workload Optimization and Improving Security in Cloud Co...
A Novel Approach for Workload Optimization and Improving Security in Cloud Co...A Novel Approach for Workload Optimization and Improving Security in Cloud Co...
A Novel Approach for Workload Optimization and Improving Security in Cloud Co...
 
Oracle
OracleOracle
Oracle
 
Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computing
 
HAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopHAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoop
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and Storing
 
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystem
 
Big_SQL_3.0_Whitepaper
Big_SQL_3.0_WhitepaperBig_SQL_3.0_Whitepaper
Big_SQL_3.0_Whitepaper
 

Recently uploaded

Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 

Recently uploaded (20)

Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 

A sql implementation on the map reduce framework

  • 1. Tenzing A SQL Implementation On The MapReduce Framework Biswapesh Liang Lin Weiran Liu Sagar Mittal Chattopadhyay lianglin@ wrliu@ sagarmittal@ biswapesh@ Prathyusha Vera Lychagina Younghee Kwon Michael Wong Aragonda vlychagina@ youngheek@ mcwong@ prathyusha@ Google *@google.com ABSTRACT an alternative to SQL for increasingly sophisticated analyt- Tenzing is a query engine built on top of MapReduce [9] ics. Such vendors include AsterData, GreenPlum, Paraccel, for ad hoc analysis of Google data. Tenzing supports a and Vertica. mostly complete SQL implementation (with several exten- In this paper, we describe Tenzing, a SQL query execution sions) combined with several key characteristics such as het- engine built on top of MapReduce. We have been able to erogeneity, high performance, scalability, reliability, meta- build a system with latency as low as ten seconds, high ef- data awareness, low latency, support for columnar storage ficiency, and a comprehensive SQL92 implementation with and structured data, and easy extensibility. Tenzing is cur- some SQL99 extensions. Tenzing also supports efficiently rently used internally at Google by 1000+ employees and querying data in row stores, column stores, Bigtable [7], serves 10000+ queries per day over 1.5 petabytes of com- GFS [14], text and protocol buffers. Users have access to pressed data. In this paper, we describe the architecture the underlying platform through SQL extensions such as and implementation of Tenzing, and present benchmarks of user defined table valued functions and native support for typical analytical queries. nested relational data. We take advantage of indexes and other traditional opti- mization techniques—along with a few new ones—to achieve 1. INTRODUCTION performance comparable to commercial parallel databases. The MapReduce [9] framework has gained wide popularity Thanks to MapReduce, Tenzing scales to thousands of cores both inside and outside Google. Inside Google, MapReduce and petabytes of data on cheap, unreliable hardware. We has quickly become the framework of choice for writing large worked closely with the MapReduce team to implement and scalable distributed data processing jobs. Outside Google, take advantage of MapReduce optimizations. Our enhance- projects such as Apache Hadoop have been gaining popular- ments to the MapReduce framework are described in more ity rapidly. However, the framework is inaccessible to casual detail in section 5.1. business users due to the need to understand distributed Tenzing has been widely adopted inside Google, especially data processing and C++ or Java. by the non-engineering community (Sales, Finance, Market- Attempts have been made to create simpler interfaces on ing), with more than a thousand users and ten thousand top of MapReduce, both inside Google (Sawzall [19], Flume- analytic queries per day. The Tenzing service currently runs Java [6]) and outside (PIG [18], HIVE [21], HadoopDB [1]). on two data centers with two thousand cores each and serves To the best of our knowledge, such implementations suffer queries on over 1.5 petabytes of compressed data in several from latency on the order of minutes, low efficiency, or poor different data sources and formats. Multiple Tenzing tables SQL compatibility. Part of this inefficiency is a result of have over a trillion rows each. MapReduce’s perceived inability to exploit familiar database optimization and execution techniques [10, 12, 20]. At the same time, distributed DBMS vendors have integrated the MapReduce execution model in their engines [13] to provide 2. HISTORY AND MOTIVATION At the end of 2008, the data warehouse for Google Ads data was implemented on top of a proprietary third-party Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are database appliance henceforth referred to as DBMS-X. While not made or distributed for profit or commercial advantage and that copies it was working reasonably well, we faced the following issues: bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Articles from this volume were invited to present 1. Increased cost of scalability: our need was to scale to their results at The 37th International Conference on Very Large Data Bases, petabytes of data, but the cost of doing so on DBMS-X August 29th - September 3rd 2011, Seattle, Washington. was deemed unacceptably high. Proceedings of the VLDB Endowment, Vol. 4, No. 12 Copyright 2011 VLDB Endowment 2150-8097/11/08... $ 10.00. 1318
  • 2. WebUI, API, CLI constantly. This allows us to significantly decrease the end- data & control to-end latency of queries. The pool consists of master and Query Server Metadata Server worker nodes, plus an overall gatekeeper called the master watcher. The workers manipulate the data for all the tables Master Watcher defined in the metadata layer. Since Tenzing is a heteroge- Masters Workers neous system, the backend storage can be a mix of various data stores, such as ColumnIO [17], Bigtable [7], GFS [14] Storage files, MySQL databases, etc. The query server serves as the gateway between the client and the pool. The query server parses the query, ap- plies optimizations and sends the plan to the master for Figure 1: Tenzing Architecture. execution. The Tenzing optimizer applies some basic rule and cost based optimizations to create an optimal execution 2. Rapidly increasing loading times: importing new data plan. took hours each day and adding new data sources took Tenzing has several client interfaces, including a com- proportionally longer. Further, import jobs competed mand line client (CLI) and a Web UI. The CLI provides with user queries for resources, leading to poor query more power such as complex scripting and is used mostly by performance during the import process. power users. The Web UI, with easier-to-use features such as query & table browsers and syntax highlighting, is geared 3. Analyst creativity was being stifled by the limitations towards novice and intermediate users. There is also an API of SQL and lack of access to multiple sources of data. to directly execute queries on the pool, and a standalone bi- An increasing number of analysts were being forced nary which does not need any server side components, but to write custom code for more complex analysis, often rather launches its own MapReduce jobs. directly against the source (such as Sawzall against The metadata server provides an API to store and fetch logs). metadata such as table names and schemas, and pointers to the underlying data. The metadata server is also responsi- We decided to do a major re-design of the existing plat- ble for storing ACLs (Access Control Lists) and other secu- form by moving all our analysis to use Google infrastructure, rity related information about the tables. The server uses and specifically to the MapReduce platform. The new plat- Bigtable as the persistent backing store. form had to: 1. Scale to thousands of cores, hundreds of users and 3.1 Life Of A Query petabytes of data. A typical Tenzing query goes through the following steps: 2. Run on unreliable off-the-shelf hardware, while contin- 1. A user (or another process) submits the query to the uing to be highly reliable. query server through the Web UI, CLI or API. 3. Match or exceed the performance of the existing plat- 2. The query server parses the query into an intermediate form. parse tree. 4. Have the ability to run directly off the data stored on 3. The query server fetches the required metadata from Google systems, to minimize expensive ETL processes. the metadata server to create a more complete inter- mediate format. 5. Provide all the required SQL features to the analysts to minimize the learning curve, while also support- 4. The optimizer goes through the intermediate format ing more advanced functionality such as complex user- and applies various optimizations. defined functions, prediction and mining. 5. The optimized execution plan consists of one or more We have been largely successful in this effort with Tenz- MapReduces. For each MapReduce, the query server ing. With about 18 months of development, we were able to finds an available master using the master watcher and successfully migrate all users off DBMS-X to the new plat- submits the query to it. At this stage, the execution form and provide them with significantly more data, more has been physically partitioned into multiple units of powerful analysis capabilities, similar performance for most work(i.e. shards). common scenarios, and far better scalability. 6. Idle workers poll the masters for available work. Re- duce workers write their results to an intermediate 3. IMPLEMENTATION OVERVIEW storage. The system has four major components: the worker pool, query server, client interfaces, and metadata server. Figure 7. The query server monitors the intermediate area for 1 illustrates the overall Tenzing architecture. results being created and gathers them as they arrive. The distributed worker pool is the execution system The results are then streamed to the upstream client. which takes a query execution plan and executes the MapRe- duces. In order to reduce query latency, we do not spawn 4. SQL FEATURES any new processes,1 but instead keep the processes running Tenzing is a largely feature complete SQL engine, and 1 supports all major SQL92 constructs, with some limitations. This option is available as a separate batch execution model. Tenzing also adds a number of enhancements on core SQL 1319
  • 3. for more advanced analysis. These enhancements are de- MapReduce framework to relax this restriction so that all signed to be fully parallelizable to utilize the underlying values for the same key end up in the same reducer shard, MapReduce framework. but not necessarily in the same Reduce() call. This made it possible to completely avoid the sorting step on the re- 4.1 Projection And Filtering ducer and implement a pure hash table based aggregation Tenzing supports all the standard SQL operators (arith- on MapReduce. This can have a significant impact on the metic operators, IN, LIKE, BETWEEN, CASE, etc.) and performance on certain types of queries. Due to optimizer functions. In addition, the execution engine embeds the limitations, the user must explicitly indicate that they want Sawzall [19] language so that any built-in Sawzall function hash based aggregation. A query using hash-based aggre- can be used. Users can also write their own functions in gation will fail if there is not enough memory for the hash Sawzall and call them from Tenzing. table. The Tenzing compiler does several basic optimizations re- Consider the following query: lated to filtering. Some examples include: SELECT dept id, COUNT(1) FROM Employee 1. If an expression evaluates to a constant, it is converted /∗+ HASH ∗/ GROUP BY 1; to a constant value at compile time. The following is the pseudo-code: 2. If a predicate is a constant or a constant range (e.g., BETWEEN) and the source data is an indexed source // In the mapper startup, initialize a hash table with dept id (e.g., Bigtable), the compiler will push down the con- // as key and count as value. Mapper::Start() { dition to an index range scan on the underlying source. dept hash = new Hashtable() This is very useful for making date range scans on fact } tables, or point queries on dimension tables, for exam- ple. // For each row, increment the count for the corresponding // dept id. 3. If the predicate does not involve complex functions Mapper::Map(in) { (e.g., Sawzall functions) and the source is a database dept hash[in.dept id] ++ (e.g., MySQL), the filter is pushed down to the under- } lying query executed on the source database. // At the end of the mapping phase, flush the hash table to // the reducer, without sorting. 4. If the underlying store is range partitioned on a col- Mapper::Finish() { umn, and the predicate is a constant range on that for (dept id in dept hash) { column, the compiler will skip the partitions that fall OutputToReducerNoSort( outside the scan range. This is very useful for date key = dept id, value = dept hash[dept id]) based fact tables, for example. } } 5. ColumnIO files have headers which contain meta in- // Similarly, in the reducer, initialize a hash table formation about the data, including the low and high Reducer::Start() { values for each column. The execution engine will ig- dept hash = new Hashtable() nore the file after processing the header if it can de- } termine that the file does not contain any records of interest, based on the predicates defined for that table // Each Reduce call receives a dept id and a count from in the query. // the map output. Increment the corresponding entry in // the hash table. 6. Tenzing will scan only the columns required for query Reducer::Reduce(key, value) { dept hash[key] += value execution if the underlying format supports it (e.g., } ColumnIO). // At the end of the reduce phase, flush out the 4.2 Aggregation // aggregated results. Tenzing supports all standard aggregate functions such as Reducer::Finish() { SUM, COUNT, MIN, MAX, etc. and the DISTINCT equiv- for (dept id in dept hash) { print dept id, dept hash[dept id] alents (e.g., COUNT DISTINCT). In addition, we support a } significant number of statistical aggregate functions such as } CORR, COVAR and STDDEV. The implementation of ag- gregate functions on the MapReduce framework have been discussed in numerous papers including the original MapRe- 4.3 Joins duce paper [9]; however, Tenzing employs a few additional Joins in MapReduce have been studied by various groups optimizations. One such is pure hash table based aggrega- [2,4,22]. Since joins are one of the most important aspects of tion, which is discussed below. our system, we have spent considerable time on implement- ing different types of joins and optimizing them. Tenzing 4.2.1 HASH BASED AGGREGATION supports efficient joins across data sources, such as Colum- Hash table based aggregation is common in RDBMS sys- nIO to Bigtable; inner, left, right, cross, and full outer joins; tems. However, it is impossible to implement efficiently and equi semi-equi, non-equi and function based joins. Cross on the basic MapReduce framework, since the reducer al- joins are only supported for tables small enough to fit in ways unnecessarily sorts the data by key. We enhanced the memory, and right outer joins are supported only with sort/ 1320
  • 4. merge joins. Non-equi correlated subqueries are currently 4.3.3 DISTRIBUTED SORT-MERGE JOINS not supported. Distributed sort-merge joins are the most widely used We include distributed implementations for nested loop, joins in MapReduce implementations. Tenzing has an im- sort/merge and hash joins. For sort/merge and hash joins, plementation which is most effective when the two tables the degree of parallelism must be explicitly specified due to being joined are roughly the same size and neither has an optimizer limitations. Some of the join techniques imple- index on the join key. mented in Tenzing are discussed below. 4.3.4 DISTRIBUTED HASH JOINS 4.3.1 BROADCAST JOINS Distributed hash joins are frequently the most effective The Tenzing cost-based optimizer can detect when a sec- join method in Tenzing when: ondary table is small enough to fit in memory. If the order of tables specified in the query is not optimal, the compiler can • Neither table fits completely in memory, also use a combination of rule and cost based heuristics to • One table is an order of magnitude larger than the switch the order of joins so that the larger table becomes the other, driving table. If small enough, the secondary table is pulled into the memory of each mapper / reducer process for in- • Neither table has an efficient index on the join key. memory lookups, which typically is the fastest method for joining. In some cases, we also use a sorted disk based serial- These conditions are often satisfied by OLAP queries with ized implementation for the bigger tables to conserve mem- star joins to large dimensions, a type of query often used ory. Broadcast joins are supported for all join conditions with Tenzing. (CROSS, EQUI, SEMI-EQUI, NON-EQUI), each having a Consider the execution of the following query: specialized implementation for optimal performance. A few SELECT Dimension.attr, sum(Fact.measure) additional optimizations are applied on a case-by-case basis: FROM Fact /∗+ HASH ∗/ JOIN Dimension USING (dimension key) • The data structure used to store the lookup data is /∗+ HASH ∗/ GROUP BY 1; determined at execution time. For example, if the sec- Tenzing processes the query as follows:2 ondary table has integer keys in a limited range, we use an integer array. For integer keys with wider range, we // The optimizer creates the MapReduce operations required // for the hash join and aggregation. use a sparse integer map. Otherwise we use a data type Optimizer::CreateHashJoinPlan(fact table, dimension table) { specific hash table. if (dimension table is not hash partitioned on dimension key) • We apply filters on the join data while loading to re- Create MR1: Partition dimension table by dimension key duce the size of the in-memory structure, and also only if (fact table is not hash partitioned on dimension key) Create MR2: Partition fact table by dimension key load the columns that are needed for the query. Create MR3: Shard−wise hash join and aggregation • For multi-threaded workers, we create a single copy // MR1 and MR2 are trivial repartitions and hence the of the join data in memory and share it between the // pseudo−code is skipped. Pseudo−code for MR3 is below: threads. // At the start of the mapper, load the corresponding // dimension shard into memory. Also initialize a hash • Once a secondary data set is copied into the worker // table for the aggregation. process, we retain the copy for the duration of the Mapper::Start() { query so that we do not have to copy the data for every fact shard.Open() map shard. This is valuable when there are many map shard id = fact shard.GetCurrentShard() shards being processed by a relatively small number of dimension shard[shard id].Open() workers. lookup hash = new HashTable() lookup hash.LoadFrom(dimension shard[shard id]) agg hash = new Hashtable() • For tables which are both static and frequently used, } we permanently cache the data in local disk of the worker to avoid remote reads. Only the first use of // For each row, increment the count for the corresponding the table results in a read into the worker. Subsequent // dimension key. Mapper::Map(in) { reads are from the cached copy on local disk. dim rec = lookup hash(in.dimension key); agg hash(dim rec.attr) += in.measure; • We cache the join results from the last record; since } input data often is naturally ordered on the join at- tribute(s), it saves us one lookup access. // At the end of the mapping phase, flush the hash table // to the reducer, without sorting. 4.3.2 REMOTE LOOKUP JOINS Mapper::Finish() { for (attr in agg hash) For sources which support remote lookups on index (e.g., OutputToReducerNoSort( Bigtable), Tenzing supports remote lookup joins on the key key = attr, value = agg hash[attr]); (or a prefix of the key). We employ an asynchronous batch } lookup technique combined with a local LRU cache in order } to improve performance. The optimizer can intelligently 2 Note that no sorting is required in any of the MapReduces, switch table order to enable this if needed. so it is disabled for all of them. 1321
  • 5. The reducer code is identical to the hash aggregation 4.5 OLAP Extensions pseudo-code in 4.2.1. Tenzing supports the ROLLUP() and CUBE() OLAP ex- The Tenzing scheduler can intelligently parallelize opera- tensions to the SQL language. We follow the Oracle variant tions to make hash joins run faster. In this case, for example, of the syntax for these extensions. These are implemented the compiler would chain MR3 and MR1, and identify MR2 by emitting additional records in the Map() phase based on as a join source. This has the following implications for the the aggregation combinations required. backend scheduler: Consider the following query which outputs the salary of each employee and also department-wise and grand totals: • MR1 and MR2 will be started in parallel. SELECT dept, emp, SUM(salary) FROM Employee • MR3 will be started as soon as MR2 finishes and MR1 GROUP BY ROLLUP(1, 2); starts producing data, without waiting for MR1 to fin- The following pseudo-code explains the backend implemen- ish. tation: We also added several optimizations to the MapReduce // For each row, emit multiple key value pairs, one framework to improve the efficiency of distributed hash joins, // for each combination of keys being aggregated. Mapper::Map(in) { such as sort avoidance, streaming, memory chaining and OutputToReducerWithSort( block shuffle. These are covered in more detail in 5.1. key = {dept, emp}, value = salary) OutputToReducerWithSort( 4.4 Analytic Functions key = {dept, NULL}, value = salary) OutputToReducerWithSort( Tenzing supports all the major analytic functions, with key = {NULL, NULL}, value = salary) a syntax similar to PostgreSQL / Oracle. These functions } have proven to be quite popular with the analyst commu- nity. Some of the most commonly used analytic functions // The reducer will do a standard pre−sorted are RANK, SUM, MIN, MAX, LEAD, LAG and NTILE. // aggregation on the data. Consider the following example with analytic functions Reducer::Reduce(key, list<value>) { sum := 0 which ranks employees in each department by their salary: for (i = 1 to list<value>.size()) { sum += value[i].salary SELECT } dept, emp, salary, print key, sum RANK() OVER ( } PARTITION BY dept ORDER BY salary DESC) AS salary rank FROM Employee; 4.6 Set Operations Tenzing supports all standard SQL set operations such as The following pseudo-code explains the backend implemen- UNION, UNION ALL, MINUS and MINUS ALL. Set op- tation: erations are implemented mainly in the reduce phase, with Mapper::Map(in) { the mapper emitting the records using the whole record as // From the mapper, we output the partitioning key of the key, in order to ensure that similar keys end up together // the analytic function as the key, and the ordering in the same reduce call. This does not apply to UNION // key and other information as value. ALL, for which we use round robin partitioning and turn off OutputToReducerWithSort( reducer side sorting for greater efficiency. key = in.dept, value = {in.emp, in.salary}) } 4.7 Nested Queries And Subqueries Reducer::Reduce(key, values) { Tenzing supports SELECT queries anywhere where a ta- // Reducer receives all values with the same partitioning ble name is allowed, in accordance with the SQL standard. // key. The list is then sorted on the ordering key for Typically, each nested SQL gets converted to a separate // the analytic function. MapReduce and the resultant intermediate table is substi- sort(values on value.salary) // For simple analytic function such as RANK, it is tuted in the outer query. However, the compiler can opti- enough mize away relatively simple nested queries such that extra // to just print out the results once sorted. MapReduce jobs need not be created. For example, the for (value in values) { query print key, value.emp, value.salary, i } SELECT COUNT(∗) FROM ( } SELECT DISTINCT emp id FROM Employee); results in two MapReduce jobs: one for the inner DISTINCT Currently, Tenzing does not support the use of multi- and a second for the outer COUNT. However, the query ple analytic functions with different partitioning keys in the same query. We plan to remove this restriction by rewriting SELECT ∗ FROM ( SELECT DISTINCT emp id FROM Employee); such queries as merging result sets from multiple queries, each with analytic functions having the same partitioning results in only one MapReduce job. If two (or more) MapRe- key. Note that it is possible to combine aggregation and duces are required, Tenzing will put the reducer and follow- analytic functions in the same query - the compiler simply ing mapper in the same process. This is discussed in greater rewrites it into multiple queries. detail in 5.1. 1322
  • 6. message Department { Tenzing allows the user to specify, but does not enforce, required int32 dept id = 1; primary and foreign keys. required string dept name = 2; Limited support (no joins) for UPDATE and DELETE is repeated message Employee { implemented by applying the update or delete criteria on the required int32 emp id = 3; data to create a new dataset. A reference to the new dataset required string emp name = 4; }; then replaces the old reference in the metadata repository. repeated message Location { required string city name = 5; 4.11 DDL } We support a number of DDL operations, including CRE- }; ATE [OR REPLACE] [EXTERNAL] [TEMPORARY] TA- BLE, DROP TABLE [IF EXISTS], RENAME TABLE, GEN- Figure 2: Sample protocol buffer definition. ERATE STATISTICS, GRANT and REVOKE. They all work as per standard SQL and act as the main means of controlling metadata and access. In addition, Tenzing has 4.8 Handling Structured Data metadata discovery mechanisms built-in to simplify import- Tenzing has read-only support for structured (nested and ing datasets into Tenzing. For example, we provide tools repeated) data formats such as complex protocol buffer struc- and commands to automatically determine the structure of tures. The current implementation is somewhat native and a MySQL database and make its tables accessible from Ten- inefficient in that the data is flattened by the reader at the zing. We can also discover the structure of protocol buffers lowest level and fed as multiple records to the engine. The from the protocol definition and import the metadata into engine itself can only deal with flat relational data, unlike Tenzing. This is useful for Bigtable, ColumnIO, RecordIO Dremel [17]. Selecting fields from different repetition levels and SSTable files in protocol buffer format. in the same query is considered an error. For example, given the protocol buffer definition in figure 4.12 Table Valued Functions 2, the query Tenzing supports both scalar and table-valued user-defined functions, implemented by embedding a Sawzall interpreter SELECT emp name, dept name FROM Department; in the Tenzing execution engine. The framework is designed is valid since it involves one repeated level (Employee). How- such that other languages can also be easily integrated. In- ever, the query tegration of Lua and R has been proposed, and work is in progress. Tenzing currently has support for creating func- SELECT emp name, city name FROM Department; tions in Sawzall that take tables (vector of tuples) as input is invalid since Employee and Location are independently and emit tables as output. These are useful for tasks such repeating groups. as normalization of data and doing complex computation involving groups of rows. 4.9 Views Tenzing supports logical views over data. Views are sim- 4.13 Data Formats ply named SELECT statements which are expanded inline Tenzing supports direct querying of, loading data from, during compilation. Views in Tenzing are predominantly and downloading data into many formats. Various options used for security reasons: users can be given access to views can be specified to tweak the exact form of input / output. without granting them access to underlying tables, enabling For example, for delimited text format, the user can spec- row and column level security of data. ify the delimiter, encoding, quoting, escaping, headers, etc. Consider the table Employee with fields emp id, ldap user, The statement below will create the Employee table from a name, dept id, and salary. We can create the following view pipe delimited text file input and validate that the loaded over it: data matches the table definition and all constraints (e.g., primary key) are met: CREATE VIEW Employee V AS SELECT emp id, ldap user, name, dept id CREATE TABLE FROM Employee Employee(emp id int32, emp name string) WHERE dept id IN ( WITH DATAFILE:CSV[delim=”|”]:”employee.txt” SELECT e.dept id FROM Employee e WITH VALIDATION; WHERE e.ldap user=USERNAME()); Other formats supported by Tenzing include: This view allows users to see only other employees in their department, and prevents users from seeing the salary of • ColumnIO, a columnar storage system developed by other employees. the Dremel team [17]. 4.10 DML • Bigtable, a highly distributed key-value store [7]. Tenzing has basic support for DML operations INSERT, • Protocol buffers [11] stored in compressed record for- UPDATE and DELETE. Support is batch mode (i.e. meant mat (RecordIO) and sorted strings format (SSTables for batch DML application of relatively large volumes of [7]). data). Tenzing is not ACID compliant - specifically, we are atomic, consistent and durable, but do not support isolation. • MySQL databases. INSERT is implemented by creating a new data set and adding the new data set to the existing metadata. Essen- • Data embedded in the metadata (useful for testing and tially, this acts as a batch mode APPEND style INSERT. small static data sets). 1323
  • 7. 5. PERFORMANCE Workers Time (s) Throughput One of the key aims of Tenzing has been to have per- 100 188.74 16.74 formance comparable to traditional MPP database systems 500 36.12 17.49 such as Teradata, Netezza and Vertica. In order to achieve 1000 19.57 16.14 this, there are several areas that we had to work on: 5.1 MapReduce Enhancements Table 1: Tenzing Scalability. Tenzing is tightly integrated with the Google MapReduce implementation, and we made several enhancements to the MapReduce framework to increase throughput, decrease la- the reducer of the upstream MR and the mapper of the tency and make SQL operators more efficient. downstream MR are co-located in the same process. Workerpool. One of the key challenges we faced was re- Sort Avoidance. Certain operators such as hash join ducing latency from minutes to seconds. It became rapidly and hash aggregation require shuffling, but not sorting. The clear that in order to do so, we had to implement a solution MapReduce API was enhanced to automatically turn off which did not entail spawning of new binaries for each new sorting for these operations. When sorting is turned off, the Tenzing query. The MapReduce and Tenzing teams collab- mapper feeds data to the reducer which directly passes the oratively came up with the pool implementation. A typical data to the Reduce() function bypassing the intermediate pool consists of three process groups: sorting step. This makes many SQL operators significantly more efficient. 1. The master watcher. The watcher is responsible for Block Shuffle. Typically, MapReduce uses row based en- receiving a work request and assigning a free master coding and decoding during shuffle. This is necessary since for the task. The watcher also monitors the overall in order to sort the data, rows must be processed individu- health of the pool such as free resources, number of ally. However, this is inefficient when sorting is not required. running queries, etc. There is usually one one watcher We implemented a block-based shuffle mechanism on top of process for one instance of the pool. the existing row-based shuffler in MapReduce that combines 2. The master pool. This consists of a relatively small many small rows into compressed blocks of roughly 1MB in number of processes (usually a few dozen). The job size. By treating the entire block as one row and avoiding of the master is to coordinate the execution of one reducer side sorting, we were able to avoid some of the over- query. The master receives the task from the watcher head associated with row serialization and deserialization in and distributes the tasks to the workers, and monitors the underlying MapReduce framework code. This lead to their progress. Note that once a master receives a task, 3X faster shuffling of data compared to row based shuffling it takes over ownership of the task, and the death of with sorting. the watcher process does not impact the query in any Local Execution. Another simple MapReduce optimiza- way. tion we do is local run. The backend can detect the size of the underlying data to be processed. If the size is under a 3. The worker pool. This contains a set of workers (typi- threshold (typically 128 MB), the query is not sent to the cally a few thousand processes) which do all the heavy pool, but executed directly in the client process. This re- lifting of processing the data. Each worker can work duces the query latency to about 2 seconds. as either a mapper or a reducer or both. Each worker constantly monitors a common area for new tasks and 5.2 Scalability picks up new tasks as they arrive on a FIFO basis. We Because it is built on the MapReduce framework, Ten- intend to implement a priority queue so that queries zing has excellent scalability characteristics. The current can be tiered by priority. production deployment runs in two data centers, using 2000 cores each. Each core has 6 GB of RAM and 24 GB of lo- Using this approach, we were able to bring down the la- cal disk, mainly used for sort buffers and local caches (Note tency of the execution of a Tenzing query itself to around 7 that the data is primarily stored in GFS and Bigtable). We seconds. There are other bottlenecks in the system however, benchmarked the system for the simple query such as computation of map splits, updating the metadata service, committing / rolling back results (which involves SELECT a, SUM(b) FROM T file renames), etc. which means the typical latency varies WHERE c = k between 10 and 20 seconds currently. We are working on GROUP BY a various other enhancements and believe we can cut this time down to less than 5 seconds end-to-end, which is fairly ac- with data stored in ColumnIO format on GFS files (see table ceptable to the analyst community. 1). Throughput, measured in rows per second per worker, Streaming & In-memory Chaining. The original im- remains steady as the number of workers is scaled up. plementation of Tenzing serialized all intermediate data to GFS. This led to poor performance for multi-MapReduce 5.3 System Benchmarks queries, such as hash joins and nested sub-selects. We im- In order to evaluate Tenzing performance against com- proved the performance of such queries significantly by im- mercial parallel databases, we benchmarked four commonly plementing streaming between MapReduces, i.e. the up- used analyst queries (see appendix A) against DBMS-X, a stream and downstream MRs communicate using the net- leading MPP database appliance with row-major storage. work and only use GFS for backup. We subsequently im- Tenzing data was stored in GFS in ColumnIO format. The proved performance further by using memory chaining, where Tenzing setup used 1000 processes with 1 CPU, 2 GB RAM 1324
  • 8. Query DBMS-X (s) Tenzing (s) Change what better for pure select-project queries with high selec- tivity. In comparison to the production evaluation engine, #2 129 93 39% faster the vector engine’s per-worker throughput was about three #4 70 69 1.4% faster times higher; the LLVM engine’s per-worker throughput was #1 155 213 38% slower six to twelve times higher. #3 9 28 3.1 times slower Our LLVM based query engine stores intermediate results by rows, even when both input and output are columnar. In comparison to columnar vector processing, we saw several Table 2: Tenzing versus DBMS-X. advantages and disadvantages. Throughput • For hash table based operators like aggregation and Rows Selected Vector LLVM Ratio join, using rows is more natural: composite key com- parison has cache locality, and conflict resolution is #8 104 99.58% 21 89 4.2 done using pointers. Our engine iterates on input rows #6 60 100.00% 8.1 25 3.2 and uses generated procedures that do both. However #1 19 99.16% 0.66 2.0 3.0 to our best knowledge, there is no straightforward and #2 19 0.81% 23 61 2.6 fast columnar solution for using hash table. Searching #7 104 58.18% 6.7 17 2.5 in hash table breaks the cache locality for columns, #3 57 3.74% 7.4 15 2.0 resulting in more random memory accesses. #5 104 4.28% 13 19 1.5 #4 40 2.49% 12 13 1.1 • Vector processing engines always materialize interme- diate results in main memory. In contrast our gener- ated native code can store results on the stack (better Table 3: LLVM and Vector Engine Benchmarks. data locality) or even registers, depending on how the JIT compiler optimizes. and 8 GB local disk each. Results are shown in table 2. • If selectivity of source data is low, vector processing The poor performance on query #3 is because query exe- engines load less data into cache and thus scan faster. cution time is dominated by startup time. The production Because our engine stores data in rows, scans end up version of Tenzing was used for the benchmarks; we believe reading more data and are slower. Tenzing’s results would have been significantly faster if the experimental LLVM engine discussed in the next section had • While LLVM provides support for debugging JITed been ready at benchmark time. code [16], there are more powerful analysis and debug- ging tools for the native C/C++ routines used by most 5.4 Experimental LLVM Query Engine vector processing engines. Our execution engine has gone through multiple itera- tions to achieve single-node efficiency close to commercial The majority of queries in our workload are analytic queries DBMS. The first implementation translated SQL expres- that have aggregations and/or joins. These operations are sions to Sawzall code. This code was then compiled using significant performance bottlenecks compared to other op- Sawzall’s just-in-time (JIT) compiler. However, this proved erators like selection and projection. We believe that native to be inefficient because of the serialization and deserializa- code generation engine deals better with these bottlenecks tion costs associated with translating to and from Sawzall’s and is a promising approach for query processing. native type system. The second and current implementation uses Dremel’s SQL expression evaluation engine, which is 6. RELATED WORK based on direct evaluation of parse trees of SQL expressions. While more efficient than the original Sawzall implementa- A significant amount of work has been done in recent years tion, it was still somewhat slow because of its interpreter-like in the area of large scale distributed data processing. These nature and row based processing. fall into the following broad categories: For the third iteration, we did extensive experiments with • Core frameworks for distributed data processing. Our two major styles of execution: LLVM based native code gen- focus has been on MapReduce [9], but there are sev- eration with row major block based intermediate data and eral others such as Nephele/PACT [3] and Dryad [15]. column major vector based processing with columnar in- Hadoop [8] is an open source implementation of the termediate storage. The results of benchmarking LLVM vs MapReduce framework. These powerful but complex vector on some typical aggregation queries is shown in table frameworks are designed for software engineers imple- 3. All experiments were done on the same dual-core Intel menting complex parallel algorithms. machine with 4 GB of RAM, with input data in columnar form in-memory. Note that the LLVM engine is a work • Simpler procedural languages built on top of these in progress and has not yet been integrated into Tenzing. frameworks. The most popular of these are Sawzall Table sizes are in millions of rows and throughput is again [19] and PIG [18]. These have limited optimizations measured in millions of rows per worker per second. The built in, and are more suited for reasonably experi- data suggests that the higher the selectivity of the where enced analysts, who are comfortable with a procedu- clause, the better the LLVM engine’s relative performance. ral programming style, but need the ability to iterate We found that the LLVM approach gave better overall re- quickly over massive volumes of data. sults for real-life queries while vector processing was some- 1325
  • 9. • Language extensions, usually with special purpose op- • The MapReduce team, specifically Jerry Zhao, Mari´n a timizers, built on top of the core frameworks. The Dvorsk´ and Derek Thomson, for working closely with y ones we have studied are FlumeJava [6], which is built us in implementing various enhancements to MapRe- on top of MapReduce, and DryadLINQ [23], which is duce. built on top of Dryad. These seem to be mostly geared towards programmers who want to quickly build large • The Dremel team, specifically Sergey Melnik and Matt scalable pipelines with relatively simple operations (e.g. Tolton, for ColumnIO storage and the SQL expression ETL pipelines for data warehouses). evaluation engine. • Declarative query languages built on top of the core • The Sawzall team, specifically Polina Sokolova and frameworks with intermediate to advanced optimiza- Robert Griesemer, for helping us embed the Sawzall tions. These are geared towards the reporting and language which we use as the primary scripting lan- analysis community, which is very comfortable with guage for supporting complex user-defined functions. SQL. Tenzing is mostly focused on this use-case, as we believe, are HIVE [21], SCOPE [5] and HadoopDB [1] • Various other infrastructure teams at Google, includ- (recently commercialized as Hadapt). The latter is an ing but not limited to the Bigtable, GFS, Machines interesting hybrid that tries to approach the perfor- and SRE teams. mance of parallel DBMS by using a cluster of single- node databases and MapReduce as the glue layer. • Our loyal and long-suffering user base, especially the people in Sales & Finance. • Embedding MapReduce and related concepts into tra- ditional parallel DBMSs. A number of vendors have 9. REFERENCES such offerings now, including Greenplum, AsterData, Paraccel and Vertica. [1] A. Abouzeid, K. Bajda-Pawlikowski, D. Abadi, A. Silberschatz, and A. Rasin. HadoopDB: an architectural hybrid of MapReduce and DBMS 7. CONCLUSION technologies for analytical workloads. Proceedings of We described Tenzing, a SQL query execution engine built the VLDB Endowment, 2:922–933, August 2009. on top of MapReduce. We demonstrated that: [2] F.N. Afrati and J.D. Ullman. Optimizing joins in a Map-Reduce environment. In Proceedings of the 13th • It is possible to create a fully functional SQL engine International Conference on Extending Database on top of the MapReduce framework, with extensions Technology, pages 99–110. ACM, 2010. that go beyond SQL into deep analytics. [3] D. Battr´, S. Ewen, F. Hueske, O. Kao, V. Markl, and e D. Warneke. Nephele/pacts: a programming model • With relatively minor enhancements to the MapRe- and execution framework for web-scale analytical duce framework, it is possible to implement a large processing. In Proceedings of the 1st ACM symposium number of optimizations currently available in com- on Cloud computing, SoCC ’10, pages 119–130, New mercial database systems, and create a system which York, NY, USA, 2010. ACM. can compete with commercial MPP DBMS in terms of throughput and latency. [4] S. Blanas, J.M. Patel, V. Ercegovac, J. Rao, E.J. Shekita, and Y. Tian. A comparison of join algorithms • The MapReduce framework provides a combination of for log processing in MapReduce. In Proceedings of the high performance, high reliability and high scalability 2010 international conference on Management of data, on cheap unreliable hardware, which makes it an ex- pages 975–986. ACM, 2010. cellent platform to build distributed applications that [5] R. Chaiken, B. Jenkins, P. Larson, B. Ramsey, involve doing simple to medium complexity operations D. Shakib, S. Weaver, and J. Zhou. Scope: easy and on large data volumes. efficient parallel processing of massive data sets. Proc. VLDB Endow., 1:1265–1276, August 2008. • By designing the engine and the optimizer to be aware [6] C. Chambers, A. Raniwala, F. Perry, S. Adams, R.R. of the characteristics of heterogeneous data sources, it Henry, R. Bradshaw, and N. Weizenbaum. FlumeJava: is possible to create a smart system which can fully uti- easy, efficient data-parallel pipelines. In Proceedings of lize the characteristics of the underlying data sources. the 2010 ACM SIGPLAN conference on Programming language design and implementation, pages 363–375. ACM, 2010. 8. ACKNOWLEDGEMENTS [7] F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Tenzing is built on top of a large number of existing Wallach, M. Burrows, T. Chandra, A. Fikes, and R.E. Google technologies. While it is not possible to list all in- Gruber. Bigtable: A distributed storage system for dividuals who have contributed either directly or indirectly structured data. ACM Transactions on Computer towards the implementation, we would like to highlight the Systems (TOCS), 26(2):1–26, 2008. contribution of the following: [8] D. Cutting et al. Apache Hadoop Project. http://hadoop.apache.org/. • The data warehouse management, specifically Hemant Maheshwari, for providing business context, manage- [9] J. Dean and S. Ghemawat. MapReduce: Simplified ment support and overall guidance. data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008. 1326
  • 10. [10] J. Dean and S. Ghemawat. MapReduce: a flexible USA, 2008. USENIX Association. data processing tool. Communications of the ACM, 53(1):72–77, 2010. APPENDIX [11] J. Dean, S. Ghemawat, K. Varda, et al. Protocol Buffers: Google’s Data Interchange Format. A. BENCHMARK QUERIES Documentation and open source release at This section contains the queries used for benchmarking http://code.google.com/p/protobuf/. performance against DBMS-X. The table sizes are indicated [12] D.J. DeWitt and M. Stonebraker. MapReduce: A using XM or XK suffix. major step backwards. The Database Column, 1, 2008. [13] E. Friedman, P. Pawlowski, and J. Cieslewicz. A.1 Query 1 SQL/MapReduce: A practical approach to SELECT DISTINCT dim2.dim1 id self-describing, polymorphic, and parallelizable FROM FactTable1 XXB stats user-defined functions. Proceedings of the VLDB INNER JOIN DimensionTable1 XXXM dim1 Endowment, 2(2):1402–1413, 2009. USING (dimension1 id) INNER JOIN DimensionTable2 XXM dim2 [14] S. Ghemawat, H. Gobioff, and S. Leung. The Google USING (dimension2 id) file system. SIGOPS Oper. Syst. Rev., 37:29–43, WHERE dim2.attr BETWEEN A and B October 2003. AND stats.date id > some date id [15] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. AND dim1.attr IN (Val1, Val2, Val3, ...); Dryad: distributed data-parallel programs from sequential building blocks. SIGOPS Oper. Syst. Rev., A.2 Query 2 41:59–72, March 2007. [16] R. Kleckner. LLVM: Debugging JITed Code With SELECT GDB. Retrieved June 27, 2011, from dim1.attr1, dates.finance week id, http://llvm.org/docs/DebuggingJITedCode.html. SUM(FN1(dim3.attr3, stats.measure1)), SUM(FN2(dim3.attr4, stats.measure2)), [17] S. Melnik, A. Gubarev, J.J. Long, G. Romer, S. Shivakumar, M. Tolton, and T. Vassilakis. Dremel: FROM FactTable1 XXB stats Interactive Analysis of Web-Scale Datasets. INNER JOIN DimensionTable1 XXXM dim1 Proceedings of the VLDB Endowment, 3(1), 2010. USING (dimension1 id) INNER JOIN DatesDim dates [18] C. Olston, B. Reed, U. Srivastava, R. Kumar, and USING (date id) A. Tomkins. Pig Latin: a not-so-foreign language for INNER JOIN DimensionTable3 XXXK dim3 data processing. In Proceedings of the 2008 ACM USING (dimension3 id) SIGMOD international conference on Management of WHERE <fact date range> data, pages 1099–1110. ACM, 2008. GROUP BY 1, 2; [19] R. Pike, S. Dorward, R. Griesemer, and S. Quinlan. Interpreting the data: Parallel analysis with Sawzall. A.3 Query 3 Scientific Programming, 13(4):277–298, 2005. [20] M. Stonebraker, D. Abadi, D.J. DeWitt, S. Madden, SELECT attr4 FROM ( E. Paulson, A. Pavlo, and A. Rasin. MapReduce and SELECT dim4.attr4, COUNT(∗) AS dup count FROM DimensionTable4 XXM dim4 parallel DBMSs: friends or foes? Communications of JOIN DimensionTable5 XXM dim5 the ACM, 53(1):64–71, 2010. USING (dimension4 id) [21] A. Thusoo, J.S. Sarma, N. Jain, Z. Shao, P. Chakka, WHERE dim4.attr1 BETWEEN Val1 and Val2 S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive: AND dim5.attr2 IN (Val3, Val4) a warehousing solution over a Map-Reduce framework. GROUP BY 1 HAVING dup count = 1) x; Proceedings of the VLDB Endowment, 2(2):1626–1629, 2009. [22] H. Yang, A. Dasdan, R.L. Hsiao, and D.S. Parker. A.4 Query4 Map-Reduce-Merge: simplified relational data SELECT attr1, measure1 / measure2 processing on large clusters. In Proceedings of the 2007 FROM ( ACM SIGMOD international conference on SELECT Management of data, pages 1029–1040. ACM, 2007. attr1, ´ [23] Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, SUM(FN1(attr2, attr3, attr4)) measure1, P. K. Gunda, and J. Currey. DryadLINQ: a system for SUM(FN2(attr5, attr6, attr7)) measure2, general-purpose distributed data-parallel computing FROM DimensionTable6 XXM WHERE FN3(attr8) AND FN4(attr9) using a high-level language. In Proceedings of the 8th GROUP BY 1 USENIX conference on Operating systems design and ) v; implementation, OSDI’08, pages 1–14, Berkeley, CA, 1327