Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing

Slim Bouguerra
(bslim AT apache DOT org )
Apache Druid PMC
Apache Hive Committer
Apache Calcite Committer
Jesús Camacho-Rodríguez, Ashutosh Chauhan, Alan Gates,
Eugene Koifman, Owen O’Malley, Vineet Garg, Zoltan
Haindrich, Sergey Shelukhin, Prasanth Jayachandran,
Siddharth Seth, Deepak Jaiswal, Slim Bouguerra, Nishant
Bangarwa, Sankar Hariappan, Anishek Agarwal, Jason
Dere, Daniel Dai, Thejas Nair, Nita Dembla, Gopal
Vijayaraghavan, Günther Hagleitner
SIGMOD 2019 Industrial Track

© 2019 Cloudera, Inc. All rights reserved.
BRIEF HISTORY
HDFS, MapReduce, Hive, and Pig
• Hadoop (HDFS, MapReduce) is open sourced in 2006
– Ubiquitous platform for inexpensive data storage and processing
– Focused mainly on ETL and batch reporting workloads
• Hive (Facebook) and Pig (Yahoo!) are developed to expose a SQL-like
higher-level abstraction for data processing on top of MapReduce
“To the developers of the Hive and Pig database systems, for developing seminal software systems that
served to bring relational-style declarative programming to the Hadoop ecosystem”
2018 SIGMOD Systems Award
NO-SQL

MOTIVATION
• Evolve from a SQL-like batch engine to a low-latency,
full SQL engine on Hadoop
– Offload existing workloads from expensive
MPP databases
Option 1
Implement a new system
Option 2
Extend the existing system
– It already exists!
– Builds on years of code contributed by the
open source community
– Handles very large (XXXL) ETL workloads very well
– Handles many Hadoop/blob storage consistency
edge cases very well

MOTIVATION
Goal
• Requirements for our implementation
– Compliant: support SQL standard and provide ACID guarantees
– Efficient: use optimization techniques present in other MPP databases
– Flexible: work reliably for multiple use cases
– Extensible: able to interact with other data processing engines

APACHE HIVE IMPROVEMENTS
Compliant
SQL and ACID support
Flexible
Runtime latency
Efficient
Query optimization
Extensible
Federation capabilities

SQL AND ACID SUPPORT
ACID implementation
• Implementation of ACID compliant record level transactions
– Support to execute INSERT, UPDATE, DELETE and MERGE statements
• How to build this?
– Transaction manager
– Overcome Hadoop/cloud file system limitations (no in-place updates and S3
eventual consistency)
• Multi-version optimistic concurrency control (MVOCC)
– Snapshot isolation level
– Single statement transactions across tables
– Performance comparable to non-transactional tables

Write transaction
• HiveServer2 talks to the Transaction Manager (Hive Metastore):
– open transaction → TxnId
– get WriteId (table1, TxnId) → WriteId
– commit (TxnId)
• Table contents
table1/
├── delta_001_001/          (WriteId = 1)
│   ├── 0000
│   └── 0001
├── delete_delta_002_002/   (WriteId = 2)
│   ├── 0000
│   └── 0001
└── delta_003_003/          (WriteId = 3)
    └── 0000
• INSERT record: <ROW__ID> ‘john’ ‘doe’
• DELETE record: <ROW__ID> null null
• <ROW__ID> uniquely identifies every record in the table

Read transaction
• HiveServer2 talks to the Transaction Manager (Hive Metastore):
– get snapshot → <TXN_ID_LIST>
– get snapshot (table1, <TXN_ID_LIST>) → <WRITE_ID_LIST>
• Example snapshot: WRITE_ID_LIST = [2, ()]
• Table contents
table1/
├── delta_001_001/
│   ├── 0000
│   └── 0001
├── delete_delta_002_002/
│   ├── 0000
│   └── 0001
└── delta_003_003/
    └── 0000
• INSERT record: <ROW__ID> ‘john’ ‘doe’
• DELETE record: <ROW__ID> null null
• Deltas outside the snapshot are ignored by the record reader
• Record reader performs an anti-semijoin
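
The read path above can be sketched in a few lines of Python. This is a minimal illustration, not Hive's record reader: the `RowId` fields, the function name `snapshot_read`, and the sample rows other than ‘john’ ‘doe’ are assumptions made for the example, and WRITE_ID_LIST = [2, ()] is read here as "high watermark 2, no excluded WriteIds".

```python
from typing import NamedTuple


class RowId(NamedTuple):
    """Stand-in for <ROW__ID>; the three fields are an assumption."""
    write_id: int  # WriteId of the transaction that inserted the row
    file_id: int   # file within the delta directory, e.g. 0000
    row_num: int   # position of the row inside that file


def snapshot_read(insert_deltas, delete_deltas, high_watermark, excluded=frozenset()):
    """Return the row values visible in a snapshot.

    insert_deltas: {write_id: [(RowId, values), ...]}  (delta_* directories)
    delete_deltas: {write_id: [RowId, ...]}            (delete_delta_* directories)
    """
    def visible(write_id):
        # Deltas above the high watermark or in the excluded set are ignored.
        return write_id <= high_watermark and write_id not in excluded

    # Collect the row ids removed by visible delete deltas ...
    deleted = {rid for w, rids in delete_deltas.items() if visible(w) for rid in rids}
    # ... and anti-join them against the visible inserted rows.
    return [values
            for w, rows in sorted(insert_deltas.items()) if visible(w)
            for rid, values in rows if rid not in deleted]


# Hypothetical contents mirroring table1: WriteId 1 inserts, 2 deletes, 3 inserts.
inserts = {
    1: [(RowId(1, 0, 0), ("john", "doe")), (RowId(1, 1, 0), ("jane", "roe"))],
    3: [(RowId(3, 0, 0), ("ann", "lee"))],
}
deletes = {2: [RowId(1, 0, 0)]}

print(snapshot_read(inserts, deletes, high_watermark=2))  # [('jane', 'roe')]
```

With a high watermark of 1 the delete is not yet visible and both original rows are returned, which is the snapshot isolation behaviour the slides describe.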

Compactor
• Minor compaction: merge files in delta directories
• Major compaction: merge delta files with base directories

Minor compaction, table contents before:
table1/
├── delta_001_001/
│   ├── 0000
│   └── 0001
├── delete_delta_002_002/
│   ├── 0000
│   └── 0001
└── delta_003_003/
    └── 0000
Minor compaction, table contents after:
table1/
├── delta_001_003/
│   ├── 0000
│   └── 0001
└── delete_delta_002_002/
    ├── 0000
    └── 0001

Major compaction, table contents before:
table2/
├── base_100/
│   ├── 0000
│   └── 0001
└── delta_101_103/
    ├── 0000
    └── 0001
Major compaction, table contents after:
table2/
└── base_103/
    ├── 0000
    └── 0001
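
The directory bookkeeping of the two compaction types can be sketched as below. This only models directory names, not the merging of file contents; the helper names `parse`, `minor_compaction` and `major_compaction` are hypothetical, and the three-digit zero padding simply follows the slides.

```python
import re

# base_N, delta_LO_HI or delete_delta_LO_HI, as shown in the table listings.
_DIR = re.compile(r"^(base|delta|delete_delta)_(\d+)(?:_(\d+))?$")


def parse(name):
    """Split a directory name into (kind, lowest WriteId, highest WriteId)."""
    m = _DIR.match(name)
    if m is None:
        raise ValueError(f"unexpected directory name: {name}")
    low = int(m.group(2))
    high = int(m.group(3)) if m.group(3) is not None else low
    return m.group(1), low, high


def minor_compaction(dirs):
    """Merge delta dirs into one delta and delete deltas into one delete delta."""
    result = [d for d in dirs if parse(d)[0] == "base"]  # base dirs are untouched
    for kind in ("delta", "delete_delta"):
        ranges = [parse(d)[1:] for d in dirs if parse(d)[0] == kind]
        if ranges:
            low = min(r[0] for r in ranges)
            high = max(r[1] for r in ranges)
            result.append(f"{kind}_{low:03d}_{high:03d}")
    return result


def major_compaction(dirs):
    """Fold the base and all deltas into a single new base directory."""
    high = max(parse(d)[2] for d in dirs)
    return [f"base_{high:03d}"]


table1 = ["delta_001_001", "delete_delta_002_002", "delta_003_003"]
table2 = ["base_100", "delta_101_103"]
print(minor_compaction(table1))  # ['delta_001_003', 'delete_delta_002_002']
print(major_compaction(table2))  # ['base_103']
```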

QUERY OPTIMIZATION
Work smarter, not harder
• Rule and cost-based optimizer based on Apache Calcite
– Representing queries at the right abstraction level is critical to implementing
advanced optimization algorithms
• Query reoptimization
– Catches runtime errors and re-executes the query, changing configuration parameters
(overlay) or using statistics captured at runtime (re-optimize)
• Query results cache
– Reuses the results of a previously executed query by checking the internal
transactional state of the participating tables
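
One way to picture the results cache check is the sketch below. It is an illustration under assumptions, not Hive's implementation: the class name `ResultsCache` is hypothetical, and the "transactional state" of the participating tables is modelled as a plain mapping from table name to its latest committed WriteId.

```python
class ResultsCache:
    """Cache query results together with the table state they were computed on."""

    def __init__(self):
        self._entries = {}  # query text -> (table state at execution, result)

    def get(self, query, table_state):
        """Return the cached result, or None if it is missing or stale."""
        entry = self._entries.get(query)
        if entry is None:
            return None
        cached_state, result = entry
        if cached_state != table_state:
            # A participating table changed since the result was computed.
            del self._entries[query]
            return None
        return result

    def put(self, query, table_state, result):
        # Copy the state so later mutations by the caller do not affect the entry.
        self._entries[query] = (dict(table_state), result)


cache = ResultsCache()
query = "SELECT count(*) FROM table1"
cache.put(query, {"table1": 3}, 42)
print(cache.get(query, {"table1": 3}))  # 42
print(cache.get(query, {"table1": 4}))  # None
```

The second lookup misses because table1 has received a new write, so the earlier result can no longer be reused.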

QUERY OPTIMIZATION
Work smarter, not harder
• Materialized views:
– Transparent query rewriting (rich SQL dialect), incremental maintenance
• Shared work:
– Identifies overlapping subexpressions within the execution plan of a given query,
computing them only once and reusing their results
• Dynamic semijoin:
– Reduces the size of intermediate results during query execution by skipping complete
partitions (dynamic partition pruning) or row groups (index semijoin)
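
The dynamic semijoin idea can be sketched as follows. This is a simplified illustration rather than Hive's operators: the function names and data layout are hypothetical, and the exact key set and min/max range stand in for whatever filter the engine actually builds at runtime from the smaller join side.

```python
def prune_partitions(partitions, join_keys):
    """Dynamic partition pruning: skip partitions whose key cannot join.

    partitions: {partition_key: rows}; join_keys: keys produced at runtime
    by the (filtered) other side of the join.
    """
    return {key: rows for key, rows in partitions.items() if key in join_keys}


def prune_row_groups(row_groups, join_keys):
    """Skip row groups whose [min_key, max_key] range misses all join keys.

    row_groups: list of (min_key, max_key, payload) tuples, as could be
    obtained from file-level index metadata.
    """
    if not join_keys:
        return []  # nothing on the other side can match
    low, high = min(join_keys), max(join_keys)
    return [g for g in row_groups if not (g[1] < low or g[0] > high)]


# Hypothetical fact table partitioned by month, joined with a filtered dimension.
fact_partitions = {"2019-01": ["r1"], "2019-02": ["r2"], "2019-03": ["r3"]}
print(prune_partitions(fact_partitions, {"2019-02"}))  # {'2019-02': ['r2']}

groups = [(1, 10, "g0"), (11, 20, "g1"), (21, 30, "g2")]
print(prune_row_groups(groups, {12, 15}))  # [(11, 20, 'g1')]
```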

RUNTIME LATENCY
Motivation
• Previous improvements introduced by Stinger initiative reduced query latency
by orders of magnitude
– Apache Tez, columnar storage formats and vectorized operators
• Architecture tailored towards cluster throughput
– Execution requires container allocation → Startup time overhead
– Containers killed after query execution → JIT compiler optimizations not effective
– Impossible to exploit data sharing and caching → Unnecessary IO overhead

RUNTIME LATENCY
Apache Hive architecture (next-gen) LLAP
[Figure: Apache Hive next-gen architecture with LLAP. Layers and components:]
• Clients: JDBC, ODBC, Beeline
• Shared Hive services: HiveServer2, Hive Metastore (backed by an RDBMS)
• Ephemeral per query tasks: Query Coordinator, containers
• LLAP: LLAP daemons
• Infrastructure / Hadoop: YARN cluster (node managers), HDFS, object stores
(AWS, GCP, Azure), Apache Druid, JDBC, other external engines

RUNTIME LATENCY
LLAP daemon anatomy
[Figure: anatomy of an LLAP daemon]
• Query Coordinators submit fragments to the daemon
• Execution: a work queue of fragments feeding executors, each running a fragment
• IO elevator: an IO queue of requests served by readers, with an off-heap cache
(encoded data)
• Data is read from HDFS or an object store

RUNTIME LATENCY
Data caching in LLAP
• Fine-grained compact data cache
– Keep only the columns and rows that are accessed
– Data is stored encoded to minimize memory footprint
– Cache file metadata to enable predicate pushdown (PPD) with no FS reads!
• Supports the most common file formats: ORC, Parquet, Text
• Incremental: Adding new data to your tables does not invalidate the cache
• Pluggable replacement policy: FIFO, LRFU
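
For readers unfamiliar with LRFU, the sketch below shows the general policy: each entry carries a combined recency and frequency (CRF) score that decays with age, and the entry with the lowest current score is evicted. This is a generic textbook-style version with a linear scan for the victim, not LLAP's cache; the class name, the `lam` parameter and the logical clock are assumptions made for the example.

```python
class LRFUCache:
    """Toy LRFU cache: evicts the entry with the lowest decayed CRF score."""

    def __init__(self, capacity, lam=0.1):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.lam = lam     # 0 behaves like LFU; larger values favour recency
        self.clock = 0     # logical time, advanced on every access
        self.entries = {}  # key -> [value, crf, time of last reference]

    def _decay(self, age):
        return 0.5 ** (self.lam * age)

    def _score(self, key):
        # CRF of the entry as seen at the current logical time.
        _, crf, last = self.entries[key]
        return crf * self._decay(self.clock - last)

    def _touch(self, entry):
        # A new reference adds 1 to the decayed score.
        entry[1] = 1.0 + entry[1] * self._decay(self.clock - entry[2])
        entry[2] = self.clock

    def get(self, key):
        self.clock += 1
        entry = self.entries.get(key)
        if entry is None:
            return None
        self._touch(entry)
        return entry[0]

    def put(self, key, value):
        self.clock += 1
        if key in self.entries:
            entry = self.entries[key]
            entry[0] = value
            self._touch(entry)
            return
        if len(self.entries) >= self.capacity:
            victim = min(self.entries, key=self._score)
            del self.entries[victim]
        self.entries[key] = [value, 1.0, self.clock]


cache = LRFUCache(capacity=2)
cache.put("a", "chunk-a")
cache.put("b", "chunk-b")
cache.get("a")
cache.get("a")
cache.put("c", "chunk-c")    # evicts "b": referenced once, and longest ago
print(sorted(cache.entries))  # ['a', 'c']
```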

RUNTIME LATENCY
Multi-tenant deployments
• Fragment preemption based on state and priorities
• Workload manager
– Define resource plans to share LLAP cluster resources effectively
– Resource-based guardrail policies
• Example resource plan
– Resource pool BI: 80%
– Resource pool ETL: 20%
– Guardrail: downgrade when runtime > 3s
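
The example plan can be modelled roughly as below. This is a sketch under assumptions, not the workload manager's API: the class and field names are hypothetical, and "downgrade when runtime > 3s" is interpreted here as moving a query from the BI pool to the ETL pool once it has run for more than three seconds.

```python
from dataclasses import dataclass, field


@dataclass
class Guardrail:
    source_pool: str      # pool the rule applies to
    max_runtime_s: float  # runtime threshold in seconds
    target_pool: str      # pool the query is moved to once the threshold is passed


@dataclass
class ResourcePlan:
    pools: dict                              # pool name -> share of the cluster in percent
    guardrails: list = field(default_factory=list)

    def executors(self, total_executors):
        """Split the cluster's executors across pools according to their share."""
        return {name: total_executors * pct // 100 for name, pct in self.pools.items()}

    def pool_for(self, current_pool, runtime_s):
        """Return the pool a query should run in, given its runtime so far."""
        for rule in self.guardrails:
            if rule.source_pool == current_pool and runtime_s > rule.max_runtime_s:
                return rule.target_pool
        return current_pool


plan = ResourcePlan(
    pools={"BI": 80, "ETL": 20},
    guardrails=[Guardrail(source_pool="BI", max_runtime_s=3.0, target_pool="ETL")],
)
print(plan.executors(10))       # {'BI': 8, 'ETL': 2}
print(plan.pool_for("BI", 1.5))  # BI
print(plan.pool_for("BI", 4.0))  # ETL
```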

TPC-DS at 10 TB scale, running on 10 nodes, querying ACID tables on HDFS

FEDERATED WAREHOUSE SYSTEM
Motivation
• Growing proliferation of specialized data management systems
• Apache Hive as a mediator
– Use a blend of systems to achieve desired performance and functionality
– Implement data movement and transformations between systems
– Globally enforce access control and capture audit trails (Apache Ranger)
– Meet compliance requirements (Apache Atlas)

FEDERATED WAREHOUSE SYSTEM
Storage handler + Calcite adapter
• Storage handler implementation defines how to interact with another data
processing engine
– Treats the engine as an external Hive table
• Calcite adapters define which operations can be pushed to the engine and how
to generate queries for it
• Currently supported systems include Apache Druid, Kafka and JDBC sources
[Figure: a query plan of operators op1 to op6 shown at the planning (Calcite) and
execution stages, with the subplan op4, op5, op6 appearing alongside an operator op7]
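
The pushdown decision can be illustrated with a deliberately simplified sketch. Real planning in Calcite works with rules over operator trees; here the plan is flattened to a linear chain, and the function name `split_plan` and the operator names are hypothetical.

```python
def split_plan(operators, pushable):
    """Split a bottom-up operator chain into (pushed to engine, kept in Hive).

    Operators are pushed while each one is supported by the external engine;
    the first unsupported operator and everything above it stay in Hive.
    """
    pushed = []
    for op in operators:
        if op not in pushable:
            break
        pushed.append(op)
    return pushed, operators[len(pushed):]


# Hypothetical capabilities of an external engine.
engine_supports = {"scan", "filter", "aggregate", "sort"}
plan = ["scan", "filter", "aggregate", "window", "sort"]

pushed, local = split_plan(plan, engine_supports)
print(pushed)  # ['scan', 'filter', 'aggregate']
print(local)   # ['window', 'sort']
```

Note that the final sort stays in Hive even though the engine supports it, because it sits above an operator that could not be pushed.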

Conclusion and road ahead
• Hive’s architecture and design principles have
proven to be powerful in today’s analytic
landscape
• The work done by the community has taken
Hive a step closer to other existing MPP
database engines
7000 analysts, 80ms average latency, 1PB data
250k BI queries per hour
• Future improvements to Apache Hive
– Compliant, efficient, flexible, extensible

Containerized Hive in the Cloud
Work in progress
• Hive on Kubernetes
– Hive/LLAP side install (to main cluster)
– Multiple versions of Hive
– Multiple warehouse & compute instances
– Dynamic configuration and secrets
management
– Stateful and work preserving restarts (cache)
– Rolling restart for upgrades. Fast rollback to
previous good state

BRIEF HISTORY
Wide adoption of Hadoop in the enterprise
• YARN for resource management and job scheduling in Hadoop
• Growing range of workloads executed natively within Hadoop
– Batch, interactive, iterative, streaming
• Requirements
– Scalability
– Serviceability
– Multi-tenancy
– Locality awareness
– Reliability / Availability
– Secure and auditable operation
– High cluster utilization
– Support for programming model diversity
– Backwards compatible
– Flexible resource model

MOTIVATION
Why extend Hive?
• Apache Hive provided a solid foundation to satisfy these requirements
– Already designed for large-scale reliable computation in Hadoop
– Provided SQL compatibility (alas, limited)
– Implemented connectivity to other systems in the Hadoop ecosystem
• However, it needed to evolve and undergo major renovation

MOTIVATION
Apache Hive architecture (before 2.0)
[Figure: Apache Hive architecture before 2.0. Layers and components:]
• Clients: JDBC, ODBC, Beeline
• Shared Hive services: HiveServer2, Hive Metastore (backed by an RDBMS)
• Ephemeral per query tasks: Query Coordinator, containers
• Infrastructure / Hadoop: YARN cluster, HDFS, object stores
(AWS, GCP, Azure), Apache Druid, JDBC, other external engines

Offload data from Kafka exactly once.

RUNTIME LATENCY
Low-latency analytical processing
• Interactive queries require more fundamental enhancements
• LLAP (Live Long And Process) optional layer
– Persistent multi-threaded query executors
– Asynchronous IO and multi-tenant in-memory data cache
– Compatible with existing execution runtime
