Hive 3 a new horizon

© Hortonworks Inc. 2011–2018. All rights reserved
Apache Hive 3: A new horizon
Gunther Hagleitner, Thejas Nair, Gopal Vijayaraghavan

7000 analysts, 80ms average latency, 1PB data.
250k BI queries per hour
On-demand deep reporting in the cloud over 100 TB in minutes.

Agenda
● Data Analytics Studio
● Apache Hive 3
● Hive-Spark interoperability
● Performance
● Look ahead

Data Analytics Studio

Self-service question #1: Why is my query slow?
Causes: noisy neighbors, poor schema, inefficient queries, unstable demand
Remedies: smart query log search, storage optimizations, query optimizations, demand shifting
Hortonworks Data Analytics Studio

DAS 1.0
Smart query log search
⬢ Query log reports: Most expensive queries, Long running queries, Hot files/tables, Space usage by table etc.
⬢ Query log filter/search: Tables not using statistics, queries not optimized by CBO etc.
Storage optimizations
⬢ Storage heatmap and data layout optimization suggestions
⬢ Storage level optimization recommendations
⬢ Batch operations
Query optimizations
⬢ Query level optimization recommendations
⬢ Detailed query level report
⬢ Admin alerts on expensive queries
Quality of life changes
⬢ Query editor auto complete
⬢ Data browser (Top 20 rows sample data)
⬢ Specify output destination (S3, CSV etc.)
⬢ Query kill

One of the Extensible DataPlane Services
⬢ DAS 1.0 available now for HDP 3.0!
⬢ Monthly release cadence
⬢ Replaces Hive & Tez Views
⬢ Separate install from stack
Hortonworks Data Analytics Studio
[Diagram: Hortonworks DataPlane Service. Core capabilities: data source integration, data services catalog, security controls, multiple clusters and sources, multi-hybrid. Extensible services: Data Lifecycle Manager, Data Steward Studio, Data Analytics Studio, IBM DSX*, and other partner services. *Not yet available, coming soon.]

Apache Hive 3

Hive3: EDW analyst pipeline
[Diagram: Tableau and other BI systems on top of the Hive 3 features listed below]
Query result cache
• Results return from HDFS/cache directly
• Reduce load from repetitive queries
Workload management
• Allows more queries to be run in parallel
• Reduce resource starvation in large clusters
• Active/Passive HA
Materialized view, surrogate key, constraints
• More "tools" for optimizer to use
• More "tools" for DBAs to tune/optimize
• Invisible tuning of DB from users' perspective
ACID v2
• ACID v2 is as fast as regular tables
Cloud storage
• Hive 3 is optimized for S3/WASB/GCP
Connectors
• Support for JDBC/Kafka/Druid out of the box

Connectors

HIVE-1010: Information schema & sysdb
Question: Which tables have a column with 'ssn' as part of the column name?
use information_schema;
SELECT table_schema, table_name
FROM information_schema.columns
WHERE column_name LIKE '%ssn%';
Question: What are the biggest tables in the system?
use sys;
SELECT tbl_name, total_size
FROM table_stats_view v, tbls t
WHERE t.tbl_id = v.tbl_id
ORDER BY cast(v.total_size as bigint) DESC LIMIT 3;

HIVE-1555: JDBC connector
• How did we build the information_schema?
• We mapped the metastore into Hive’s table space!
• Uses Hive-JDBC connector
• Read-only for now
• Supports automatic pushdown of full subqueries
• Cost-based optimizer decides which part of the query runs in the RDBMS and which in Hive
• Joins, aggregates, filters, projections, etc.

JDBC Table mapping example
In Postgres:
CREATE TABLE postgres_table
(
id INT,
name varchar
);
In Hive:
CREATE EXTERNAL TABLE hive_table
(
id INT,
name STRING
)
STORED BY
'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
"hive.sql.database.type" = "POSTGRES",
"hive.sql.jdbc.driver" = "org.postgresql.Driver",
"hive.sql.jdbc.url" = "jdbc:postgresql://...",
"hive.sql.dbcp.username" = "jdbctest",
"hive.sql.dbcp.password" = "",
"hive.sql.query" = "select * from postgres_table",
"hive.sql.column.mapping" = "id=ID, name=NAME",
"hive.jdbc.update.on.duplicate" = "true"
);

Druid Connector
[Diagram: three Druid realtime nodes, a Druid broker, and HiveServer2]
Instantly analyze Kafka data with millisecond latency

Druid Connector - Joins between Hive and realtime data in Druid
Bloom filter pushdown greatly reduces data transfer
Send a promotional email to all customers from CA who purchased more than $1000 worth of merchandise today.
create external table sales(`__time` timestamp, quantity int, sales_price double, customer_id bigint, item_id int, store_id int)
stored by 'org.apache.hadoop.hive.druid.DruidStorageHandler'
tblproperties ("kafka.bootstrap.servers" = "localhost:9092", "kafka.topic" = "sales-topic",
"druid.kafka.ingestion.maxRowsInMemory" = "5");
create table customers (customer_id bigint, first_name string, last_name string, email string, state string);
select email from customers join sales on customers.customer_id = sales.customer_id
where to_date(sales.`__time`) = date '2018-09-06'
and quantity * sales_price > 1000 and customers.state = 'CA';

Kafka Connector
[Diagram: three LLAP nodes, a query coordinator, and HiveServer2]
Ad-hoc / Ingest / Transform

Kafka connector
Transformation over stream in real time
I want a moving average over a sliding window from a stock ticker Kafka stream.
create external table
tickers (`__time` timestamp, stock_id bigint, stock_sym varchar(4), price decimal(10,2), exchange_id int)
stored by 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
tblproperties ("kafka.topic" = "stock-topic", "kafka.bootstrap.servers" = "localhost:9092",
"kafka.serde.class" = "org.apache.hadoop.hive.serde2.JsonSerDe");
create external table
moving_avg (`__time` timestamp, stock_id bigint, avg_price decimal(10,2))
stored by 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
tblproperties ("kafka.topic" = "averages-topic", "kafka.bootstrap.servers" = "localhost:9092",
"kafka.serde.class" = "org.apache.hadoop.hive.serde2.JsonSerDe");
insert into table moving_avg select current_timestamp, stock_id, avg(price)
from tickers where `__timestamp` > 1000 * to_unix_timestamp(current_timestamp - interval '5' minutes)
group by stock_id;

Table types

Managed and External Tables
• Hive 3 cleans up semantics of managed and external tables
• External: Outside control and management of data
• Managed: Fully under Hive control, ACID only
• Non-native tables are external
• ACID: Full IUD on ORC, Insert-only on other formats
• Defaults have changed
• Managed: ORC + ACID
• External: TextFile
• Two tablespaces with different permissions & ownership

Differences between external and managed tables
• Storage based auth (doAs=true) is supported for external tables
• Ranger and SBA can co-exist in HDP 3 (Ranger is default)
• Script to convert from file permissions to Ranger policies on tables
Note: SBA in HDP 3 requires ACL in HDFS. ACL is turned on by default in HDP3
Hive managed tables
• ACID on by default
• No SBA, Ranger auth only
• Statistics and other optimizations apply
• Spark access via HiveWarehouseConnector
External tables
• No ACID, Text by default
• SBA possible
• Some optimizations unavailable
• Spark direct file access

ACID v2
V1: CREATE TABLE hello_acid (load_date date, key int, value int)
CLUSTERED BY(key) INTO 3 BUCKETS
STORED AS ORC TBLPROPERTIES ('transactional'='true');
V2: CREATE TABLE hello_acid_v2 (load_date date, key int, value int);
• Performance just as good as non-ACID tables
• No bucketing required
• Fully compatible with native cloud storage

SQL Enhancements

Materialized view
Optimizing workloads and queries without changing the SQL
SELECT distinct dest,origin
FROM flights;
SELECT origin, count(*)
FROM flights
GROUP BY origin
HAVING origin = 'OAK';
CREATE MATERIALIZED VIEW flight_agg
AS
SELECT dest,origin,count(*)
FROM flights
GROUP BY dest,origin;
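To see why both queries on this slide can be answered from flight_agg without rescanning flights, here is a small Python toy model. It is only a sketch of the idea, not Hive's rewrite engine; the sample rows and the function names are made up for illustration.

```python
from collections import Counter

# Hypothetical sample of the flights table as (dest, origin) pairs.
flights = [
    ("SFO", "OAK"), ("SFO", "OAK"), ("LAX", "OAK"),
    ("JFK", "SFO"), ("LAX", "SFO"),
]

# Toy equivalent of:
#   SELECT dest, origin, count(*) FROM flights GROUP BY dest, origin
flight_agg = Counter(flights)


def distinct_dest_origin(agg):
    """SELECT DISTINCT dest, origin: the view's group keys are already distinct."""
    return sorted(agg.keys())


def count_by_origin(agg, origin):
    """SELECT origin, count(*) ... HAVING origin = ?: roll up the view's counts."""
    return sum(n for (_dest, o), n in agg.items() if o == origin)


print(distinct_dest_origin(flight_agg))
print(count_by_origin(flight_agg, "OAK"))
```

The view holds one row per (dest, origin) pair, so both answers come from a much smaller input than the base table.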

Materialized view - Maintenance
• Partial table rewrites are supported
• Typical: Denormalize last month of data only
• Rewrite engine will produce union of latest and historical data
• Updates to base tables
• Invalidates views, but
• Can choose to allow stale views (max staleness) for performance
• Can partially match views and compute the delta after updates
• Incremental updates
• Common classes of views allow for incremental updates
• Others need full refresh

Constraints & defaults
• Helps optimizer to produce better plans
• BI tool integrations
• Data Integrity
• hive.constraint.notnull.enforce = true
• SQL compatibility & offload scenarios
Example:
CREATE TABLE Persons (
ID Int NOT NULL,
Name String NOT NULL,
Age Int,
Creator String DEFAULT CURRENT_USER(),
CreateDate Date DEFAULT CURRENT_DATE(),
PRIMARY KEY (ID) DISABLE NOVALIDATE
);
CREATE TABLE BusinessUnit (
ID Int NOT NULL,
Head Int NOT NULL,
Creator String DEFAULT CURRENT_USER(),
CreateDate Date DEFAULT CURRENT_DATE(),
PRIMARY KEY (ID) DISABLE NOVALIDATE,
CONSTRAINT fk FOREIGN KEY (Head)
REFERENCES Persons(ID) DISABLE NOVALIDATE
);

Default clause & Surrogate keys
• Multiple approaches
• Sequence number is dense & increasing, but: Bottleneck in distributed DBMS
• UUID is easy & distributable, but large and slow
• Surrogate key UDF is easy & distributable & fast, but: No sequence and has gaps
CREATE TABLE AIRLINES_V2
(ID BIGINT DEFAULT SURROGATE_KEY(24,16,24),
CODE STRING,
DESCRIPTION STRING,
PRIMARY KEY (ID) DISABLE NOVALIDATE);
INSERT INTO AIRLINES_V2 (CODE, DESCRIPTION) SELECT * FROM AIRLINES;
ALTER TABLE FLIGHTS ADD COLUMNS (carrier_sk BIGINT);
MERGE INTO FLIGHTS f USING AIRLINES_V2 a ON f.uniquecarrier = a.code
WHEN MATCHED THEN UPDATE SET carrier_sk = a.id;
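The trade-off listed above (distributable and fast, but with gaps) can be illustrated with a bit-packing sketch in Python. This assumes the three arguments in SURROGATE_KEY(24,16,24) are bit widths for a write id, a task id, and a per-task row counter packed into 64 bits; that layout, and the function name, are assumptions for illustration rather than a description of Hive's implementation.

```python
def make_surrogate_key(write_id, task_id, row_id,
                       write_bits=24, task_bits=16, row_bits=24):
    """Pack three counters into one integer (assumed layout, widths sum to 64)."""
    if write_bits + task_bits + row_bits != 64:
        raise ValueError("bit widths must add up to 64")
    fields = ((write_id, write_bits), (task_id, task_bits), (row_id, row_bits))
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError("counter does not fit in its bit field")
    return (write_id << (task_bits + row_bits)) | (task_id << row_bits) | row_id


# Two tasks of the same write generate keys without talking to each other.
keys_task0 = [make_surrogate_key(1, 0, row) for row in range(3)]
keys_task1 = [make_surrogate_key(1, 1, row) for row in range(3)]
print(keys_task0)
print(keys_task1)
```

Each task only increments its own row counter, so no shared sequence is needed; the price is the large gap between the last key of one task and the first key of the next.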

HIVE-17626: Optimizer is learning from planning mistakes
⬢ Symptoms
● Memory exhaustion due to under-provisioning
● Excessive runtime (future)
● Excessive spilling (future)
⬢ Solution
● Query fails because of stats estimation error
● Runtime sends observed statistics back to coordinator
● Statistics overrides are created at session, server or global level
● Query is replanned and resubmitted
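The fail, observe, override, replan loop on this slide can be sketched in a few lines of Python. This is a toy model under simplifying assumptions: "memory" is reduced to a row count per operator, and every class and function name is hypothetical.

```python
class OutOfMemory(Exception):
    """Raised by the toy runtime; carries the observed row count."""

    def __init__(self, operator, observed_rows):
        super().__init__(f"{operator} saw {observed_rows} rows")
        self.operator = operator
        self.observed_rows = observed_rows


def plan(estimates, overrides):
    """Provision each operator from an override when present, else the estimate."""
    return {op: overrides.get(op, rows) for op, rows in estimates.items()}


def execute(provisioned, actual_rows):
    """Fail on the first operator that receives more rows than provisioned."""
    for op, rows in actual_rows.items():
        if rows > provisioned[op]:
            raise OutOfMemory(op, rows)
    return "ok"


def run_with_replanning(estimates, actual_rows, overrides, max_attempts=3):
    """Re-plan with observed statistics after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return execute(plan(estimates, overrides), actual_rows), attempt
        except OutOfMemory as err:
            # The runtime reports what it saw; keep it as a statistics override.
            overrides[err.operator] = err.observed_rows
    raise RuntimeError("query still failing after re-planning")
```

Passing the same overrides dictionary to later queries models overrides kept at session scope; a wider scope would simply share that dictionary more broadly.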

Multitenancy

HIVE-17481: LLAP workload management
⬢ Effectively share LLAP cluster resources
– Resource allocation per user policy; separate ETL and BI, etc.
⬢ Resource-based guardrails
– Protect against long running queries, high memory usage
⬢ Improved, query-aware scheduling
– Scheduler is aware of query characteristics, types, etc.
– Fragments are easy to pre-empt compared to containers
– Queries get guaranteed fractions of the cluster, but can use empty space

Common Triggers
● ELAPSED_TIME
● EXECUTION_TIME
● TOTAL_TASKS
● HDFS_BYTES_READ, HDFS_BYTES_WRITTEN
● CREATED_FILES
● CREATED_DYNAMIC_PARTITIONS
Guardrail example
CREATE RESOURCE PLAN guardrail;
CREATE TRIGGER guardrail.long_running WHEN EXECUTION_TIME > 2000 DO KILL;
ALTER TRIGGER guardrail.long_running ADD TO UNMANAGED;
ALTER RESOURCE PLAN guardrail ENABLE ACTIVATE;

Resource plans example
CREATE RESOURCE PLAN daytime;
CREATE POOL daytime.bi WITH ALLOC_FRACTION=0.8, QUERY_PARALLELISM=5;
CREATE POOL daytime.etl WITH ALLOC_FRACTION=0.2, QUERY_PARALLELISM=20;
CREATE RULE downgrade IN daytime WHEN total_runtime > 3000 THEN MOVE etl;
ADD RULE downgrade TO bi;
CREATE APPLICATION MAPPING tableau in daytime TO bi;
ALTER PLAN daytime SET default pool= etl;
APPLY PLAN daytime;
[Diagram: resource plan daytime with pools bi (80%) and etl (20%); downgrade when total_runtime > 3000]
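As a rough mental model of what the daytime plan does, the Python sketch below routes a query to a pool by application name and moves it when the runtime rule fires. It mirrors the numbers in the example; the class layout and method names are hypothetical and say nothing about how LLAP implements scheduling.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    name: str
    alloc_fraction: float
    query_parallelism: int


@dataclass
class ResourcePlan:
    pools: dict
    default_pool: str
    app_mappings: dict = field(default_factory=dict)
    # Each rule is (source pool, total_runtime threshold, target pool).
    move_rules: list = field(default_factory=list)

    def route(self, application=None):
        """Pick the pool for a new query; unmapped applications use the default."""
        return self.app_mappings.get(application, self.default_pool)

    def apply_rules(self, pool, total_runtime):
        """Return the pool a running query belongs in after checking the rules."""
        for source, threshold, target in self.move_rules:
            if pool == source and total_runtime > threshold:
                return target
        return pool


daytime = ResourcePlan(
    pools={"bi": Pool("bi", 0.8, 5), "etl": Pool("etl", 0.2, 20)},
    default_pool="etl",
    app_mappings={"tableau": "bi"},
    move_rules=[("bi", 3000, "etl")],
)

pool = daytime.route("tableau")
print(pool, "->", daytime.apply_rules(pool, total_runtime=3500))
```

A Tableau query starts in bi with the larger share of the cluster and is downgraded to etl once it runs past the threshold, so a runaway dashboard query stops crowding out other BI users.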

BI caching

HIVE-18513: Query result cache
Returns results directly from storage (e.g. HDFS) without executing the query, if the same query has run before.
Important for dashboards, reports, etc., where repetitive queries are common.
[Chart: query runtime without cache vs. with cache]

HIVE-18513: Query result cache details
• hive.query.results.cache.enabled=true (on by default)
• Works only on hive managed tables
• If you JOIN an external table with a Hive managed table, Hive falls back to executing the full query, because Hive can't know whether the external table data has changed
• Works with ACID
• That means if a Hive table has been updated, the query will be rerun automatically
• Is different from LLAP cache
• LLAP cache is a data cache: multiple queries benefit by avoiding reads from disk, which speeds up the read path
• Result cache effectively bypasses execution of query
• Stored at /tmp/hive/__resultcache__/, default space is 2GB, LRU eviction
• Tunable setting hive.query.results.cache.max.size (bytes)
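The behaviour described above (reuse only while the underlying tables are unchanged, bounded size, LRU eviction) can be modelled with a short Python sketch. It is a toy model: the "table versions" dictionary stands in for whatever ACID metadata Hive uses to detect changes, and all names are hypothetical.

```python
from collections import OrderedDict


class QueryResultCache:
    """Toy size-bounded LRU cache keyed by query text."""

    def __init__(self, max_size_bytes):
        self.max_size_bytes = max_size_bytes
        self.used_bytes = 0
        # query text -> (table versions at cache time, result, size in bytes)
        self.entries = OrderedDict()

    def lookup(self, query, table_versions):
        entry = self.entries.get(query)
        if entry is None:
            return None
        cached_versions, result, size = entry
        if cached_versions != table_versions:
            # A table changed since the result was cached: drop the stale entry.
            del self.entries[query]
            self.used_bytes -= size
            return None
        self.entries.move_to_end(query)  # mark as most recently used
        return result

    def store(self, query, table_versions, result, size):
        if size > self.max_size_bytes:
            return  # too large to ever fit
        if query in self.entries:
            self.used_bytes -= self.entries.pop(query)[2]
        while self.used_bytes + size > self.max_size_bytes:
            _, (_, _, evicted_size) = self.entries.popitem(last=False)  # evict LRU
            self.used_bytes -= evicted_size
        self.entries[query] = (dict(table_versions), result, size)
        self.used_bytes += size
```

A caller would try lookup first and only execute and store on a miss. External tables have no version the cache could compare, which is the intuition behind the fallback to full execution.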

Metastore Cache
• With query execution time under 1 second, compilation time starts to dominate
• Metadata retrieval is often a significant part of compilation time. Most of it is spent in RDBMS queries.
• Cloud RDBMS-as-a-service is often slower, and frequent queries lead to throttling.
• Metadata cache speeds up compilation by around 50% with on-prem MySQL. The improvement is significantly larger with cloud RDBMS.
• Cache is consistent in a single-metastore setup and eventually consistent in an HA setup. Consistent HA support is in the works.

Phew. That was a lot.

Hive 3 feature summary
⬢ EDW offload
– Surrogate key and constraints
– Information schema
– Materialized views
⬢ Perf
– Query result & metastore caches
– LLAP workload management
⬢ Real-time capabilities with Kafka
– Ingest in ACID tables
– Instantly query using Druid
⬢ Unified SQL
– JDBC connector
– Druid connector
– Spark-hive connector
⬢ Table types
– ACID v2 and on by default
– External vs. managed
⬢ Cloud
– AWS/GCP/Azure cloud storage natively supported now

Spark-Hive connect

Hive is broadening its reach
SQL over Hadoop
• External tables only
• No ACID
• No LLAP
• SBA OK (doAs=True)
• Some perf penalty
Hive as EDW
• ACID
• LLAP
• doAs=False
• Managed tables
• Column-level security
• Stats, better perf etc.
[Diagram: Spark reaches "SQL over Hadoop" through direct external table access and "Hive as EDW" through the Hive Warehouse Connector]

Why change and why change now?
• Advancements: Today's workloads require higher performance, better security, and better ingest and update tooling, all of which require Hive to take more control
• Predictability: What is perceived to work today doesn't actually work
• Visible breakages are better than hidden, implied breakages
• Innovation: HDP 3 is a major release where we have the opportunity to innovate on the architecture; the next such point is at least a few years out

Hive’s features require more control
• Interactive query at scale
• Data caching and query result caching
• LLAP Workload management
• Materialized view
• ACID everywhere, single system that supports data change, query, etc.
• No need for bucketing
• Now faster with ACID v2, on par performance with non ACID tables
• Information schema, Kafka integration and many other goodies

Features: Spark access to ACID & column security
⬢ Hive supports traditional ACID semantics
– ORC with delta files to support low-latency writes
– Compaction to prevent storage fragmentation
– Custom readers to reconcile deltas on read
⬢ ACID tables use extended Metastore format
⬢ Spark doesn’t read/write ACID tables
⬢ Spark doesn’t use ACID Metastore format
Support Spark reads/writes for ACID tables

Features: Spark access to Ranger tables
– Column-level access control
– Column masking
• “show only first four chars of string column”
– Row-level access control
• “show only rows WHERE …”
Support Spark reads/writes for Ranger tables
[Screenshot: Ranger UI]

[Diagram: Spark (driver, executors, Spark metastore) next to Hive (HiveServer+Tez, LLAP daemons, Hive metastore); the paths from Spark to the ACID tables are crossed out]
Spark can’t read/write ACID tables
Spark doesn’t use ACID Metastore format

[Diagram: Spark driver and Spark metastore on one side, HiveServer2 and Hive metastore on the other, linked by HWC]
Isolate Spark and Hive Catalogs/Tables
Leverage connector for Spark <-> Hive
Uses Apache Arrow for fast data transfer

Connector WRITE API
a) hive.executeUpdate(sql : String) : Bool
• Create, Update, Alter, Insert, Merge, Delete, etc…
b) df.write.format(HIVE_WAREHOUSE_CONNECTOR)
• Write DataFrame using LOAD DATA INTO TABLE
c) df.writeStream.format(STREAM_TO_STREAM)
• Write Streaming DataFrame using Hive-Streaming

a) hive.executeUpdate("INSERT INTO s SELECT * FROM t")
[Diagram: Spark driver, HWC (Thrift JDBC), HiveServer2, and the Spark and Hive metastores; arrows numbered 1 to 3]
1. Driver submits update op to HiveServer2
2. Process update through Tez and/or LLAP
3. HWC returns true on success

Example: LOAD to Hive
df.select("ws_sold_time_sk", "ws_ship_date_sk")
.filter("ws_sold_time_sk > 80000")
.write.format(HIVE_WAREHOUSE_CONNECTOR)
.option("table", "my_acid_table")
.save()

b) df.write.format(HIVE_WAREHOUSE_CONNECTOR).save()
[Diagram: Spark driver, HiveServer2, the Spark and Hive metastores, HDFS /tmp, and ACID tables; arrows numbered 1 to 3]
1. Driver launches DataWriter tasks
2. Tasks write ORC files
3. On commit, Driver executes LOAD DATA INTO TABLE

Example: Stream to Hive
val df = spark.readStream.format("socket")
...
.load()
df.writeStream.format(STREAM_TO_STREAM)
.option("table", "my_acid_table")
.start()

c) df.writeStream.format(STREAM_TO_STREAM).start()
[Diagram: Spark driver and executors, HiveServer+Tez, the Spark and Hive metastores, and ACID tables; arrows numbered 1 to 3]
1. Driver launches DataWriter tasks
2. Tasks open Txns
3. Write rows to ACID tables in Tx

Performance

Benchmark journey
TPCDS 10TB scale on 10 node cluster
• Ran all 99 TPCDS queries
• Total query runtime has improved multifold in each release!
[Chart: 2016 to 2018; HDP 2.5 Hive 1, HDP 2.5 LLAP, HDP 2.6 LLAP, HDP 3.0 LLAP (ACID tables); speedup labels 25x, 3x, 2x]

OLAP Vectorization
• Faster analytical queries with improved vectorization in HDP 3.0
• Vectorized execution of PTF, rollup and grouping sets.
• Perf gain compared to HDP 2.6
• TPCDS query67 ~ 10x!

SELECT * FROM
( SELECT AVG(ss_list_price) B1_LP,
COUNT(ss_list_price) B1_CNT , COUNT(DISTINCT
ss_list_price) B1_CNTD
FROM store_sales
WHERE ss_quantity BETWEEN 0 AND 5 AND
(ss_list_price BETWEEN 11 and 11+10 OR
ss_coupon_amt BETWEEN 460 and 460+1000 OR
ss_wholesale_cost BETWEEN 14 and 14+20)) B1,
( SELECT AVG(ss_list_price) B2_LP,
COUNT(ss_list_price) B2_CNT , COUNT(DISTINCT
ss_list_price) B2_CNTD
FROM store_sales
WHERE ss_quantity BETWEEN 6 AND 10 AND
(ss_list_price BETWEEN 91 and 91+10 OR
ss_coupon_amt BETWEEN 1430 and 1430+1000 OR
ss_wholesale_cost BETWEEN 32 and 32+20)) B2,
. . .
LIMIT 100;
TPCDS SQL query 28 joins 6 instances of store_sales table
Shared scan - 4x improvement!
[Diagram: a single scan of store_sales applies the combined OR'ed B1-B6 filters, then the per-branch filters (B1 to B5 shown) feed reduce sinks (RS) into the join]
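The shared-scan idea can be shown with a small Python sketch: read the table once with the OR of all branch predicates, then split the survivors per branch. The rows are made up and the predicates are simplified versions of the B1 and B2 filters (list price only), so treat it as an illustration rather than the optimizer's actual plan.

```python
def between(value, low, high):
    return low <= value <= high


# Hypothetical store_sales rows.
store_sales = [
    {"ss_quantity": 3, "ss_list_price": 15.0},
    {"ss_quantity": 8, "ss_list_price": 95.0},
    {"ss_quantity": 4, "ss_list_price": 300.0},
    {"ss_quantity": 50, "ss_list_price": 12.0},
]

# Simplified B1 and B2 predicates from query 28.
branch_filters = {
    "B1": lambda r: between(r["ss_quantity"], 0, 5)
    and between(r["ss_list_price"], 11, 11 + 10),
    "B2": lambda r: between(r["ss_quantity"], 6, 10)
    and between(r["ss_list_price"], 91, 91 + 10),
}


def separate_scans(table, filters):
    """One full pass over the table per branch."""
    result = {name: [r for r in table if pred(r)] for name, pred in filters.items()}
    return result, len(filters)


def shared_scan(table, filters):
    """One pass with the OR'ed filter, then cheap per-branch filters on survivors."""
    survivors = [r for r in table if any(pred(r) for pred in filters.values())]
    result = {name: [r for r in survivors if pred(r)] for name, pred in filters.items()}
    return result, 1


print(separate_scans(store_sales, branch_filters)[1], "scans vs",
      shared_scan(store_sales, branch_filters)[1])
```

Both functions return the same per-branch rows; the shared version touches the large table once and only re-checks the rows that passed the combined filter.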

Dynamic Semijoin Reduction - 7x improvement for q72
• Dramatically improves performance of very selective joins
• Builds a bloom filter from one side of the join and filters rows from the other side
• Skips scan and further evaluation of rows that would not qualify for the join
SELECT …
FROM sales JOIN time ON
sales.time_id = time.time_id
WHERE time.year = 2014 AND
time.quarter IN ('Q1', 'Q2')
[Diagram: reduced scan on sales]
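Here is a minimal Python sketch of the technique: build a bloom filter from the filtered dimension side, use it to discard fact rows early, then let the exact join remove any false positives. The filter sizing, the hashing scheme, and the sample rows are illustrative assumptions, not what Hive uses.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter backed by a Python integer used as a bit set."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


# Hypothetical rows: time(time_id, year, quarter) and sales(time_id, amount).
time_dim = [(1, 2014, "Q1"), (2, 2014, "Q2"), (3, 2014, "Q3"), (4, 2015, "Q1")]
sales = [(1, 10.0), (3, 20.0), (2, 5.0), (4, 7.5), (1, 2.5)]


def semijoin_reduced_join(sales_rows, time_rows):
    # Small side: apply the WHERE clause and collect the join keys.
    wanted = {t for (t, year, quarter) in time_rows
              if year == 2014 and quarter in ("Q1", "Q2")}
    bloom = BloomFilter()
    for key in wanted:
        bloom.add(key)
    # Large side: drop rows during the scan if the filter rules them out.
    candidates = [row for row in sales_rows if bloom.might_contain(row[0])]
    # The exact join still runs, which removes any bloom false positives.
    joined = [row for row in candidates if row[0] in wanted]
    return candidates, joined


candidates, joined = semijoin_reduced_join(sales, time_dim)
print(len(sales), "rows scanned,", len(candidates), "passed the filter")
```

A bloom filter never produces false negatives, so the join result is unchanged; the gain is that most non-matching sales rows are discarded before they reach the join.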

We’ve made it! Questions?

Oh. One more thing.

Hive on Kubernetes (WIP)
⬢ Hive on Kubernetes solves:
– Hive/LLAP side install (to main cluster)
– Multiple versions of Hive
– Multiple warehouse & compute instances
– Dynamic configuration and secrets management
– Stateful and work preserving restarts (cache)
– Rolling restart for upgrades. Fast rollback to previous good state.
[Diagram: Hive on Kubernetes architecture. Kubernetes hosting environments: cloud providers (AWS, GCP, Azure) and on-prem/hybrid (OpenShift), under a Data OS layer of CPU, memory, and storage. Data plane services: Cluster Lifecycle Manager, Data Analytics Studio (DAS), organizational services. Labeled components: shared services (Ranger, Atlas, Metastore), Tiller, API server, DAS web service, registry, blobstore, indexer, RDBMS, and a compute cluster with Hive Server, query coordinators, and query executors. The diagram distinguishes a long-running Kubernetes cluster from an ephemeral one, with inter-cluster communication through an ingress controller or load balancer and intra-cluster communication through an internal service endpoint for a ReplicaSet or StatefulSet.]

Backup slides

Data Analytics Studio

SOLUTIONS: Pre-defined searches to quickly narrow down problematic queries in a large cluster

SOLUTIONS: Data Analytics Studio gives a database heatmap to quickly discover which parts of your cluster are utilized more

SOLUTIONS: Full-featured auto-complete, direct download of results, quick data preview and many other quality-of-life improvements

SOLUTIONS: Heuristic recommendation engine. Fully self-service query and storage optimization

SOLUTIONS: Built-in batch operations. No more scripting needed for day-to-day operations

Materialized view navigation
The query planner will automatically navigate to existing views

Other changes
• Hive-CLI is gone, beeline now connects to HS2
• Make sure Beeline has network connectivity to HS2
• Hive on MapReduce is gone
• By default, MapReduce jobs on Hive will fail
• There is a global option to let jobs automatically fall back to Tez, which helps when moving many jobs at once instead of one by one.
• WebHCat is gone
• Because it depends on Hive-CLI
• We’ll bring it back in HDP 3.1
• Oozie hive action does not work, will be re-implemented in HDP 3.x to use HS2

Workload management
• Cluster is divided into query pools (optionally nested)
• Resource plan can be switched without stopping queries, e.g. based on time of day
• Queries are automatically routed to pools based on user name or app
• Rules to kill, move, or deprioritize queries based on DFS usage, runtime, etc.
