MapReduce and DBMS Hybrids

12: MapReduce and DBMS Hybrids
Zubair Nabi
zubair.nabi@itu.edu.pk
May 26, 2013

Outline
1 Hive
2 HadoopDB
3 nCluster
4 Summary

Introduction
Data warehousing solution built atop Hadoop by Facebook
Now an Apache open source project
Queries are expressed in SQL-like HiveQL and compiled into map-reduce jobs
Also contains a type system for describing RDBMS-like tables
A system catalog, the Hive-Metastore, contains schemas and statistics and is used for data exploration and query optimization
Stores 2PB of uncompressed data at Facebook and is heavily used for simple summarization, business intelligence, and machine learning, among many other applications (https://www.facebook.com/note.php?note_id=89508453919)
Also used by Digg, Grooveshark, hi5, Last.fm, Scribd, etc.

Data Model
Tables:
  Similar to RDBMS tables
  Each table has a corresponding HDFS directory
  The contents of the table are serialized and stored in files within that directory
  Serialization can be either system provided or user defined
  Serialization information of each table is also stored in the Hive-Metastore for query optimization
  Tables can also be defined for data stored in external sources such as HDFS, NFS, and local FS

Data Model (2)
Partitions:
  Determine the distribution of data within sub-directories of the main table directory
  For instance, for a table T stored in /wh/T and partitioned on columns ds and ctry, data with ds value 20090101 and ctry value US will be stored in files within /wh/T/ds=20090101/ctry=US
Buckets:
  Data within partitions is divided into buckets
  Buckets are calculated based on the hash of a column within the partition
  Each bucket is stored within a file in the partition directory
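The directory layout and bucketing rules above can be sketched in a few lines of Python. This is only an illustration, not Hive's implementation: the helper names `partition_path` and `bucket_for`, the bucket count of 32, and the use of CRC32 as the hash are all assumptions made for the example.

```python
import zlib


def partition_path(table_dir, partition_values):
    """Build the sub-directory for a row from its partition column values.

    partition_values is an ordered list of (column, value) pairs.
    """
    parts = [f"{col}={val}" for col, val in partition_values]
    return "/".join([table_dir.rstrip("/")] + parts)


def bucket_for(value, num_buckets):
    """Pick a bucket by hashing the bucketing column (CRC32 is an assumption)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


path = partition_path("/wh/T", [("ds", "20090101"), ("ctry", "US")])
print(path)  # /wh/T/ds=20090101/ctry=US
print(bucket_for(238909, 32))  # an integer in [0, 32)
```

Because the partition values are encoded in the path, a query that filters on ds or ctry only needs to read the matching sub-directories.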

Column Data Types
Primitive types: integers, floats, strings, dates, and booleans
Nestable collection types: arrays and maps
Custom types: user-defined

HiveQL
Supports select, project, join, aggregate, union all, and sub-queries
Tables are created using data definition statements with specific serialization formats, partitioning, and bucketing
Data is loaded from external sources and inserted into tables
Support for multi-table insert: multiple queries on the same input data using a single HiveQL statement
User-defined column transformation and aggregation functions in Java
Custom map-reduce scripts written in any language can be embedded

Example: Facebook Status
Status updates are stored in flat files in an NFS directory /logs/status_updates
This data is loaded on a daily basis to a Hive table: status_updates(userid int, status string, ds string)
Using:
    LOAD DATA LOCAL INPATH '/logs/status_updates'
    INTO TABLE status_updates PARTITION (ds='2013-05-26')
Detailed profile information, such as gender and academic institution, is present in the table: profiles(userid int, school string, gender int)

Example: Facebook Status (2)
Query to work out the frequency of status updates based on gender and academic institution
    FROM (SELECT a.status, b.school, b.gender
          FROM status_updates a JOIN profiles b
          ON (a.userid = b.userid AND
              a.ds='2013-05-26')
         ) subq1
    INSERT OVERWRITE TABLE gender_summary
      PARTITION(ds='2013-05-26')
      SELECT subq1.gender, COUNT(1) GROUP BY subq1.gender
    INSERT OVERWRITE TABLE school_summary
      PARTITION(ds='2013-05-26')
      SELECT subq1.school, COUNT(1) GROUP BY subq1.school
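To make the multi-table insert concrete, the following Python sketch mimics what the query computes: the join is evaluated once and its rows feed two independent aggregations. The sample rows, the school names, and the `summarize` helper are hypothetical, and the sketch assumes one profile per userid; it says nothing about how Hive plans the job.

```python
from collections import Counter

# Hypothetical sample rows; the real tables live in Hive.
status_updates = [
    {"userid": 1, "status": "hello", "ds": "2013-05-26"},
    {"userid": 2, "status": "at work", "ds": "2013-05-26"},
    {"userid": 1, "status": "lunch", "ds": "2013-05-26"},
    {"userid": 3, "status": "old post", "ds": "2013-05-25"},
]
profiles = {
    1: {"school": "ITU", "gender": 0},
    2: {"school": "LUMS", "gender": 1},
    3: {"school": "ITU", "gender": 1},
}


def summarize(updates, profiles_by_user, ds):
    """Join once, then feed the joined rows to two independent aggregations."""
    gender_summary = Counter()
    school_summary = Counter()
    for row in updates:
        profile = profiles_by_user.get(row["userid"])
        if row["ds"] != ds or profile is None:
            continue  # inner join on userid, restricted to one partition
        gender_summary[profile["gender"]] += 1
        school_summary[profile["school"]] += 1
    return dict(gender_summary), dict(school_summary)


by_gender, by_school = summarize(status_updates, profiles, "2013-05-26")
print(by_gender)  # {0: 2, 1: 1}
print(by_school)  # {'ITU': 2, 'LUMS': 1}
```

The point of the single statement is that the input is scanned and joined only once, rather than once per summary table.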

Metastore
Similar to the metastore maintained by traditional warehousing solutions such as Oracle and IBM DB2 (distinguishes Hive from Pig or Cascading, which have no such store)
Stored in either a traditional DB such as MySQL or an FS such as NFS
Contains the following objects:
  Database: namespace for tables
  Table: metadata for a table including columns and their types, owner, storage, and serialization information
  Partition: metadata for a partition; similar to the information for a table

Outline
1 Hive
2 HadoopDB
3 nCluster
4 Summary

Introduction
Two options for data analytics on shared-nothing clusters:
1 Parallel databases, such as Teradata and Oracle, but they:
  Assume that failures are a rare event
  Assume that hardware is homogeneous
  Have never been tested in deployments with more than a few dozen nodes
2 MapReduce, but it:
  Has all the shortcomings pointed out by DeWitt and Stonebraker, as discussed before
  Is at times an order of magnitude slower than parallel DBs

Hybrid
Combine the scalability and non-existent monetary cost of MapReduce with the performance of parallel DBs
HadoopDB is such a hybrid
Unlike Hive, Pig, Greenplum, Aster, etc., which are language- and interface-level hybrids, HadoopDB is a systems-level hybrid
Uses MapReduce as the communication layer atop a cluster of nodes running single-node DBMS instances
PostgreSQL as the database layer, Hadoop as the communication layer, and Hive as the translation layer
Commercialized through the startup Hadapt (http://hadapt.com/)

HadoopDB
Consists of four components:
1 Database Connector: Interface between per-node database systems and Hadoop TaskTrackers
2 Catalog: Meta-information about per-node databases
3 Data Loader: Data partitioning across single-node databases
4 SQL to MapReduce to SQL (SMS) Planner: Translation between SQL and MapReduce

HadoopDB Architecture

Database Connector
Uses the Java Database Connectivity (JDBC)-compliant Hadoop InputFormat
The connector is served the SQL query and other information by the MapReduce job
The connector connects to the DB, executes the SQL query, and returns results in the form of key/value pairs
Hadoop in essence sees the DB as just another data source
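The connector's role can be illustrated with a small Python sketch: run a SQL query against a node-local database and hand the rows back as key/value pairs. This is a stand-in only. HadoopDB's connector is a Java InputFormat over JDBC talking to PostgreSQL; here SQLite, the `clicks` table, and the `read_from_node_db` helper are assumptions chosen so the example is self-contained.

```python
import sqlite3


def read_from_node_db(conn, sql, key_column=0):
    """Run a SQL query on a node-local DB and yield (key, value) pairs.

    The key is one column of each result row; the value is the remaining columns.
    """
    for row in conn.execute(sql):
        key = row[key_column]
        value = row[:key_column] + row[key_column + 1:]
        yield key, value


# Stand-in for one node's database (SQLite instead of PostgreSQL is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (userid INTEGER, url TEXT)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [(1, "/home"), (2, "/about"), (1, "/cart")],
)

query = "SELECT userid, COUNT(*) FROM clicks GROUP BY userid ORDER BY userid"
pairs = list(read_from_node_db(conn, query))
print(pairs)  # [(1, (2,)), (2, (1,))]
```

From the MapReduce side, the pairs look the same as records read from any other input source, which is what lets the database do the filtering and aggregation before the map phase sees the data.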

Catalog
Contains information, such as:
1 Connection parameters, such as DB location, format, and any credentials
2 Metadata about the datasets, replica locations, and partitioning scheme
Stored as an XML file on the HDFS

Data Loader
Consists of two key components:
1 Global Hasher: Executes a custom Hadoop job to repartition raw data files from the HDFS into n parts, where n is the number of nodes in the cluster
2 Local Hasher: Copies a partition from the HDFS to the node-local DB of each node and further partitions it into smaller chunks
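The two-level partitioning can be sketched as follows. This is a single-process illustration under stated assumptions: the real Global Hasher is a Hadoop job and the Local Hasher loads data into each node's database, whereas here both are plain functions over in-memory lists, and CRC32, the salt, and the names `stable_hash`, `global_hash`, and `local_hash` are hypothetical.

```python
import zlib


def stable_hash(key, salt=""):
    """Deterministic hash (CRC32 is an assumption; Python's hash() is salted for str)."""
    return zlib.crc32(f"{salt}{key}".encode("utf-8"))


def global_hash(rows, key, num_nodes):
    """Split rows into one partition per node based on the partitioning key."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[stable_hash(row[key]) % num_nodes].append(row)
    return partitions


def local_hash(partition, key, num_chunks):
    """Split one node's partition into smaller chunks."""
    chunks = [[] for _ in range(num_chunks)]
    for row in partition:
        # The salt gives a hash that differs from the one used in the global step.
        chunks[stable_hash(row[key], salt="local:") % num_chunks].append(row)
    return chunks


rows = [{"userid": i, "url": f"/page/{i % 3}"} for i in range(20)]
node_partitions = global_hash(rows, "userid", num_nodes=4)
chunks_on_node_0 = local_hash(node_partitions[0], "userid", num_chunks=2)
print([len(p) for p in node_partitions])
print([len(c) for c in chunks_on_node_0])
```

Rows with the same key always land on the same node and in the same chunk, which is what allows later operators on that key to run entirely inside the node-local database.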

SQL to MapReduce to SQL (SMS) Planner
Extends HiveQL in two key ways:
1 Before query execution, the Hive Metastore is updated with references to HadoopDB tables, table schemas, formats, and serialization information
2 All operators with partitioning keys similar to the node-local database are converted into SQL queries and pushed to the database layer

Outline
1 Hive
2 HadoopDB
3 nCluster
4 Summary

Introduction
The declarative nature of SQL is too limiting for describing most big data computation
The underlying subsystems are also suboptimal as they do not consider domain-specific optimizations
nCluster makes use of SQL/MR, a framework that inserts user-defined functions in any programming language into SQL queries
By itself, nCluster is a shared-nothing parallel database geared towards analytic workloads
Originally designed by Aster Data Systems and later acquired by Teradata
Used by Barnes and Noble, LinkedIn, SAS, etc.

SQL/MR Functions
Dynamically polymorphic: input and output schemas are decided at runtime
Parallelizable across cores and machines
Composable because their input and output behaviour is identical to SQL subqueries
Amenable to static and dynamic optimizations just like SQL subqueries or a relation
Can be implemented in a number of languages including Java, C#, C++, Python, etc. and can thus make use of third-party libraries
Executed within processes to provide sandboxing and resource allocation

Syntax
    SELECT ...
    FROM functionname(
      ON table-or-query
      [PARTITION BY expr, ...]
      [ORDER BY expr, ...]
      [clausename(arg, ...) ...]
    )
    ...
SQL/MR function appears in the FROM clause
ON is the only required clause; it specifies the input to the function
PARTITION BY partitions the input to the function on one or more attributes from the schema

Syntax (2)
ORDER BY sorts the input to the function and can only be used after a PARTITION BY clause
Any number of custom clauses can also be defined whose names and arguments are passed as a key/value map to the function
SQL/MR functions are implemented as relations, so they nest easily

Execution Model
Functions are equivalent to either map (row function) or reduce (partition function) functions
As in MapReduce, these functions are executed across many nodes and machines
Contracts identical to MapReduce functions:
  Only one row function operates over a row from the input table
  Only one partition function operates over a group of rows defined by the PARTITION BY clause, in the order specified by the ORDER BY clause
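The two contracts can be mimicked in Python to show the difference between a row function and a partition function. This is a single-process sketch with hypothetical helper names (`run_row_function`, `run_partition_function`, `double_ts`, `first_ts`); nCluster itself runs such functions in parallel across workers.

```python
from itertools import groupby
from operator import itemgetter


def run_row_function(rows, fn):
    """Map-like: fn sees one row at a time and may emit any number of rows."""
    for row in rows:
        yield from fn(row)


def run_partition_function(rows, fn, partition_by, order_by=None):
    """Reduce-like: fn sees all rows of one partition, in ORDER BY order."""
    sort_key = itemgetter(partition_by, order_by) if order_by else itemgetter(partition_by)
    ordered = sorted(rows, key=sort_key)
    for key, group in groupby(ordered, key=itemgetter(partition_by)):
        yield from fn(key, group)


rows = [
    {"userid": 2, "ts": 5},
    {"userid": 1, "ts": 9},
    {"userid": 1, "ts": 3},
]


def double_ts(row):
    # Row function: depends only on the current row.
    yield {**row, "ts": row["ts"] * 2}


def first_ts(userid, group):
    # Partition function: relies on the rows arriving ordered by ts.
    yield {"userid": userid, "first_ts": next(group)["ts"]}


print(list(run_row_function(rows, double_ts)))
print(list(run_partition_function(rows, first_ts, "userid", "ts")))
# [{'userid': 1, 'first_ts': 3}, {'userid': 2, 'first_ts': 5}]
```

Because a row function never looks at more than one row, the input can be split arbitrarily; a partition function needs every row of its group delivered to the same invocation.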

Programming Interface
A Runtime Contract is passed by the query planner to the function; it contains the names and types of the input columns and the names and values of the argument clauses
The function then completes this contract by filling in the output schema and making a call to complete()
Row and partition functions are implemented through the operateOnSomeRows and operateOnPartition methods, respectively
These methods are passed an iterator over their input rows and an emitter object for returning output rows to the database
operateOnPartition can also optionally implement the combiner interface

Installation
Functions need to be installed before they can be used
Can be supplied as a .zip along with third-party libraries
Install-time examination also enables static analysis of properties, such as row function or partition function, support for combining, etc.
Any arbitrary file, such as configuration files, binaries, etc., can be installed and is replicated to all workers
Each function is provided with a temporary directory which is garbage collected after execution

Architecture
One or more Queen nodes process queries and hash partition them across Worker nodes
The query planner honours the Runtime Contract with the function and invokes its initializer (constructor in the case of Java)
Functions are executed within the Worker databases as separate processes for isolation, security, resource allocation, forced termination, etc.
The worker database implements a “bridge” which manages its communication with the SQL/MR function
The SQL/MR function process contains a “runner” which manages its communication with the worker database

Architecture (2)

Example: Wordcount
    SELECT token, COUNT(*)
    FROM tokenizer(
      ON input-table
      DELIMITER(' ')
    )
    GROUP BY token;
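The shape of this query can be reproduced in Python: a row function splits each input row into tokens, and the surrounding query groups and counts them. The `text` column name and the sample rows are assumptions; the slides do not show the schema of the input table or the body of tokenizer.

```python
from collections import Counter


def tokenizer(row, delimiter=" "):
    """Row function: emit one output row per token in the input row's text."""
    for token in row["text"].split(delimiter):
        if token:
            yield {"token": token}


# Hypothetical input table with a single text column.
input_table = [{"text": "to be or not to be"}, {"text": "to do"}]

# SELECT token, COUNT(*) ... GROUP BY token, done here with a Counter.
counts = Counter(
    out["token"] for row in input_table for out in tokenizer(row, delimiter=" ")
)
print(dict(counts))  # {'to': 3, 'be': 2, 'or': 1, 'not': 1, 'do': 1}
```

Only the tokenizing step is a user-defined function; the grouping and counting are left to ordinary SQL, which is the division of labour SQL/MR aims for.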

Example: Clickstream Sessionization
Divide a user's clicks on a website into sessions
A session includes the user's clicks within a specified time period

Input:
Timestamp  User ID
10:00:00   238909
00:58:24   7656
10:00:24   238909
02:30:33   7656
10:01:23   238909
10:02:40   238909

Output:
Timestamp  User ID  Session ID
10:00:00   238909   0
10:00:24   238909   0
10:01:23   238909   0
10:02:40   238909   1
00:58:24   7656     0
02:30:33   7656     1

Example: Clickstream Sessionization (2)
    SELECT ts, userid, session
    FROM sessionize (
      ON clicks
      PARTITION BY userid
      ORDER BY ts
      TIMECOLUMN ('ts')
      TIMEOUT (60)
    );

Example: Clickstream Sessionization (3)
    public class Sessionize implements PartitionFunction {

        private int timeColumnIndex;
        private int timeout;

        public Sessionize(RuntimeContract contract) {
            // Get time column and timeout from contract
            // Define output schema
            contract.complete();
        }

        public void operateOnPartition(
                PartitionDefinition partition,
                RowIterator inputIterator,
                RowEmitter outputEmitter) {
            // Implement the partition function logic
            // Emit output rows
        }

    }
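The Java skeleton above leaves the partition logic as comments. A Python sketch of that logic, run over the click data from the earlier slide, might look like the following. It assumes the timeout is in seconds, that timestamps are HH:MM:SS strings within a single day, and that a new session starts when the gap is strictly greater than the timeout; none of these details are stated in the slides.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter


def sessionize(rows, time_column, timeout):
    """Partition function: rows belong to one user and arrive ordered by time.

    A new session starts when the gap to the previous click exceeds `timeout` seconds.
    """
    session = 0
    previous = None
    for row in rows:
        current = datetime.strptime(row[time_column], "%H:%M:%S")
        if previous is not None and (current - previous).total_seconds() > timeout:
            session += 1
        previous = current
        yield {**row, "session": session}


clicks = [
    {"ts": "10:00:00", "userid": 238909},
    {"ts": "00:58:24", "userid": 7656},
    {"ts": "10:00:24", "userid": 238909},
    {"ts": "02:30:33", "userid": 7656},
    {"ts": "10:01:23", "userid": 238909},
    {"ts": "10:02:40", "userid": 238909},
]

# PARTITION BY userid ORDER BY ts, then call the function once per partition.
ordered = sorted(clicks, key=itemgetter("userid", "ts"))
for userid, group in groupby(ordered, key=itemgetter("userid")):
    for out in sessionize(group, time_column="ts", timeout=60):
        print(out["ts"], out["userid"], out["session"])
```

With a timeout of 60, the 77-second gap before 10:02:40 opens a second session for user 238909, matching the session IDs in the example table (the users are printed in a different order).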

Outline
1 Hive
2 HadoopDB
3 nCluster
4 Summary

Summary
Hive, HadoopDB, and nCluster explore three different points in the design space
1 Hive uses MapReduce to give DBMS-like functionality
2 HadoopDB uses MapReduce and DBMS side-by-side
3 nCluster implements MapReduce within a DBMS

References
1 Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad
Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, and Raghotham
Murthy. 2009. Hive: a warehousing solution over a map-reduce
framework. Proc. VLDB Endow. 2, 2 (August 2009), 1626-1629.
2 Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi
Silberschatz, and Alexander Rasin. 2009. HadoopDB: an architectural
hybrid of MapReduce and DBMS technologies for analytical workloads.
Proc. VLDB Endow. 2, 1 (August 2009), 922-933.
3 Eric Friedman, Peter Pawlowski, and John Cieslewicz. 2009.
SQL/MapReduce: a practical approach to self-describing, polymorphic,
and parallelizable user-defined functions. Proc. VLDB Endow. 2, 2
(August 2009), 1402-1413.
