1 © Hortonworks Inc. 2011–2018. All rights reserved.
What is new in Apache Hive?
Ashutosh Chauhan
Apache Hive – Distant Past – First Five Years
• Initial use case: batch processing
• Circa 2008
• Read-only data
• MapReduce
• HiveQL
Apache Hive – Past 5 Years
• Effort to take Hive beyond its batch processing roots
• Started in Apache Hive 0.10.0 (January 2013)
• Latest released version: Apache Hive 3.0 (May 2018)
• Extensive renovation along four different axes
• Runtime: Enable sub-second queries (LLAP)
• Compiler: Cost-Based Optimizer
• SQL support: Improved coverage of SQL syntax
• Transactional support: ACID
Hive – Today
• Comprehensive ANSI SQL including all TPC-DS Queries.
• The only Hadoop SQL with ACID MERGE for easy updates.
• In-Memory caching for MPP performance at Hadoop scale.
• Enables Per-User dynamic row and column security.
• Enables Replication and DR for critical workloads.
• Compatible with every major BI Tool.
• Proven at 300+ PB Scale.
Apache Hive: Fast Facts
• Most queries per hour: 100,000
• Analytics performance: 100 million rows/s per node
• Largest Hive warehouse: 300+ PB raw storage
• Largest cluster: 4,500+ nodes
Hive: Serving ETL Workloads to BI Systems
[Diagram: BI systems served by Hive through Materialized views, Improved Stats, Constraints, Query Result Cache, Workload management and ACID v2]
• Results return from HDFS/cache directly
• Reduce load from repetitive queries
• Allows more queries to be run in parallel
• Reduce resource starvation in large clusters
• Also: Active/Passive HA
• More “tools” for optimizer to use
• More “tools” for DBAs to tune/optimize
• Invisible tuning of DB from users’ perspective
• ACID v2 is as fast as regular tables
• SIGMOD Software Systems Award
• “For developing seminal software systems that served to bring relational-style
declarative programming to the Hadoop ecosystem.”
• Previous recipients include Postgres, SQLite and MonetDB
Hive – How Did We Get Here?
• LLAP Enhancements
• CBO Enhancements
• ACID Enhancements
Materialized Views in Hive
Accelerating Query Processing
• Change data physical properties (distribute, sort)
• Filter rows
• Denormalize
• Preaggregate
Optimization based on access patterns
Materialized Views to the Rescue
• Speed up aggregates and joins via MVs
• View navigation via CBO/Calcite
• Optionally allow rewrites against out-of-date materializations
Materialized Views in Hive 3
• Multiple storage options: Hive, Druid
• Multiple options to control materialized views lifecycle
Materialized View-based Rewriting
• Materialized view definition
CREATE MATERIALIZED VIEW mv AS
SELECT <dims>,
lo_revenue,
lo_extendedprice * lo_discount AS d_price,
lo_revenue - lo_supplycost
FROM
customer, dates, lineorder, part, supplier
WHERE
lo_orderdate = d_datekey
and lo_partkey = p_partkey
and lo_suppkey = s_suppkey
and lo_custkey = c_custkey;
• Query
SELECT sum(lo_extendedprice*lo_discount)
FROM
lineorder, dates
WHERE
lo_orderdate = d_datekey
and d_year = 2013
and lo_discount between 1 and 3;
• Materialized view-based rewriting
SELECT SUM(d_price)
FROM mv
WHERE
d_year = 2013
and lo_discount between 1 and 3;
[Diagram: the supplier, part, dates, customer and lineorder tables are joined into the mv contents, from which the query results are computed]
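The equivalence behind the rewriting above can be checked on a toy data set. The following is a minimal sketch, assuming Python's built-in sqlite3 as a stand-in engine and an ordinary table named mv in place of a real materialized view; the sample rows and the reduced column set (only d_year, lo_discount and d_price) are made up for illustration. It shows only that both queries return the same sum, not how Hive or Calcite performs the rewrite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy star schema with made-up rows; "mv" is a plain table standing in
# for the materialized view (sqlite3 has no materialized views).
cur.executescript("""
CREATE TABLE dates (d_datekey INTEGER PRIMARY KEY, d_year INTEGER);
CREATE TABLE lineorder (lo_orderdate INTEGER,
                        lo_extendedprice INTEGER,
                        lo_discount INTEGER);
INSERT INTO dates VALUES (20130101, 2013), (20130102, 2013), (20140101, 2014);
INSERT INTO lineorder VALUES
  (20130101, 100, 2),
  (20130102, 200, 5),
  (20130102, 300, 1),
  (20140101, 400, 3);
CREATE TABLE mv AS
SELECT d_year, lo_discount, lo_extendedprice * lo_discount AS d_price
FROM lineorder, dates
WHERE lo_orderdate = d_datekey;
""")

# The original query joins the base tables.
original = cur.execute("""
SELECT SUM(lo_extendedprice * lo_discount)
FROM lineorder, dates
WHERE lo_orderdate = d_datekey
  AND d_year = 2013
  AND lo_discount BETWEEN 1 AND 3
""").fetchone()[0]

# The rewritten query reads only the precomputed join.
rewritten = cur.execute("""
SELECT SUM(d_price)
FROM mv
WHERE d_year = 2013
  AND lo_discount BETWEEN 1 AND 3
""").fetchone()[0]

print(original, rewritten)
```

The rewritten query is valid only because the view keeps d_year and lo_discount among its dimensions, so the filters can still be applied after the join has been precomputed.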
Rebuilding Materialized Views
• ALTER MATERIALIZED VIEW [db_name.]materialized_view_name REBUILD;
• Incremental materialized view maintenance
• Only refresh data that has changed in source tables
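The idea of refreshing only what changed can be sketched for the simplest case: a SUM aggregate and insert-only changes to the source table. This is a toy model in Python, not Hive's maintenance algorithm; the function names and the (year, price) row shape are assumptions made for illustration.

```python
from collections import defaultdict


def full_rebuild(rows):
    """Recompute SUM(price) per year by scanning every source row."""
    totals = defaultdict(int)
    for year, price in rows:
        totals[year] += price
    return dict(totals)


def incremental_rebuild(mv, new_rows):
    """Fold only the rows added since the last rebuild into the stored sums."""
    updated = dict(mv)
    for year, price in new_rows:
        updated[year] = updated.get(year, 0) + price
    return updated


base = [(2013, 200), (2013, 300), (2014, 1200)]
mv = full_rebuild(base)

# Rows inserted into the source table after the view was built.
delta = [(2014, 50), (2015, 70)]

mv_incremental = incremental_rebuild(mv, delta)
mv_full = full_rebuild(base + delta)
print(mv_incremental == mv_full)
```

The incremental path touches two rows instead of five, and the gap grows with the size of the source table. Updates and deletes in the source need more bookkeeping than this sketch covers.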
Accelerating Query Processing with
Materialized Views in Apache Hive
Jesus Camacho Rodriguez
Tuesday, June 19
2:50 PM - 3:30 PM
Executive Ballroom 210A/E
Workload Management
Overview
• Effectively share LLAP cluster resources
• Resource allocation per user policy; separate ETL and BI, etc.
• Resource-based guardrails
• Protect against long-running queries, high memory usage
• Improved, query-aware scheduling
• Scheduler is aware of query characteristics, types, etc.
• Fragments are easier to pre-empt than containers
• Queries get guaranteed fractions of the cluster, but can use empty space
Resource Plans
• Resource plan is a workload management configuration for a cluster
• Switching is allowed without stopping queries, e.g. based on time of day
• Cluster is divided into query pools (optionally nested)
• Each pool defines query parallelism, cluster resources percentage
• Queries are automatically routed to pools based on user name, app, etc.
• Rules (Triggers) to kill, move, or deprioritize queries based on DFS usage, runtime, etc.
• Example :
CREATE RESOURCE PLAN daytime;
CREATE POOL bi IN daytime (resource_percent=75, concurrent_queries=5);
CREATE POOL etl IN daytime (resource_percent=25, concurrent_queries=10);
CREATE RULE downgrade IN daytime WHEN total_runtime > 120 THEN MOVE etl;
ADD RULE downgrade TO bi IN daytime ;
CREATE MAPPING tableau IN daytime (application='Tableau', pool=bi);
ALTER PLAN daytime SET default_pool='etl';
APPLY PLAN daytime;
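The semantics of the example plan (routing by application, a default pool, and a rule that moves long-running queries) can be modeled in a few lines. This is a toy model in Python, not Hive's implementation; the class names, the route and apply_rules methods, and the "nightly-job" application name are all hypothetical, and the unit of the runtime threshold is left as given on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    name: str
    resource_percent: int
    concurrent_queries: int
    # Each rule is (total_runtime threshold, destination pool name).
    move_rules: list = field(default_factory=list)


@dataclass
class ResourcePlan:
    pools: dict
    app_mappings: dict
    default_pool: str

    def route(self, application):
        """Pick the pool for a new query based on the submitting application."""
        return self.app_mappings.get(application, self.default_pool)

    def apply_rules(self, pool_name, total_runtime):
        """Return the pool a query should be in after checking the move rules."""
        for threshold, destination in self.pools[pool_name].move_rules:
            if total_runtime > threshold:
                return destination
        return pool_name


# Mirrors the "daytime" plan from the slide.
daytime = ResourcePlan(
    pools={
        "bi": Pool("bi", 75, 5, move_rules=[(120, "etl")]),
        "etl": Pool("etl", 25, 10),
    },
    app_mappings={"Tableau": "bi"},
    default_pool="etl",
)

print(daytime.route("Tableau"))          # mapped application
print(daytime.route("nightly-job"))      # unmapped, falls back to the default pool
print(daytime.apply_rules("bi", 300))    # exceeds the threshold of 120
```

Keeping the rule on the pool rather than on the query matches the slide, where the downgrade rule is attached to bi and leaves etl queries alone.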
Decentralized Guaranteed Resources
• A guaranteed task for each resource (executor slots)
• HS2 gives N guaranteed tasks to an AM based on configured resource plan
• Each AM marks N of its most important tasks as guaranteed at any given time
• Guaranteed tasks pre-empt speculative tasks
Guaranteed Tasks – BI and ETL Example
[Diagram: HS2 and LLAP Daemons 1 to 3, each with a wait queue and executors; 18 executors total]
• BI (80% = 14 guaranteed), Query 1: 10 active tasks (running): 10 guaranteed (running), 4 unused for now
• ETL (20% = 4 guaranteed), Query 2: 19 active tasks (8 running): 4 guaranteed (4 running), 15 speculative (4 running)
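The numbers in this example can be reproduced with a small calculation. The sketch below is a toy model in Python: the largest-remainder rounding is an assumption chosen because it yields the slide's 14 and 4 from 80% and 20% of 18, and the function names are hypothetical. Pre-emption of speculative tasks is not modeled.

```python
def split_guaranteed(total_executors, pool_percent):
    """Split executors into per-pool guaranteed counts.

    Uses largest-remainder rounding (an assumption, not confirmed by the slides).
    """
    raw = {pool: total_executors * pct / 100 for pool, pct in pool_percent.items()}
    alloc = {pool: int(share) for pool, share in raw.items()}
    leftover = total_executors - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda pool: raw[pool] - alloc[pool], reverse=True)
    for pool in by_remainder[:leftover]:
        alloc[pool] += 1
    return alloc


def schedule(total_executors, guaranteed, active_tasks):
    """Run guaranteed tasks first, then fill idle executors with speculative tasks."""
    running_guaranteed = {
        pool: min(active_tasks[pool], guaranteed[pool]) for pool in active_tasks
    }
    free = total_executors - sum(running_guaranteed.values())
    running_speculative = {}
    for pool in active_tasks:
        extra = min(active_tasks[pool] - running_guaranteed[pool], free)
        running_speculative[pool] = extra
        free -= extra
    return running_guaranteed, running_speculative


guaranteed = split_guaranteed(18, {"bi": 80, "etl": 20})
running_guaranteed, running_speculative = schedule(
    18, guaranteed, {"bi": 10, "etl": 19}
)
print(guaranteed, running_guaranteed, running_speculative)
```

The ETL query ends up with 8 running tasks: its 4 guaranteed ones plus 4 speculative ones on executors that BI is not using for now.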
Caching
Caching for BI Workloads
• Fine-grained (columnar), compact (dictionary, RLE encoded)
• Important due to projections over many wide EDW tables
• Prioritized – indexes are cached with higher priority
• Important to make use of predicate pushdown
• Off-heap (no extra GC), supports SSD
• LRFU replacement policy avoids the damage from large scans
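Why LRFU resists large scans can be seen in a minimal sketch of the policy. Each entry carries a combined recency and frequency (CRF) score that decays by a factor of (1/2)^(λ·Δt) and gains 1 on every access; the entry with the lowest decayed score is evicted. The class below is an illustration in Python, not LLAP's cache; the class name, the λ value of 0.1 and the logical clock are assumptions.

```python
class LRFUCache:
    """Toy LRFU cache that tracks keys only (no payloads)."""

    def __init__(self, capacity, lam=0.1):
        self.capacity = capacity
        self.lam = lam          # 0 behaves like LFU, larger values approach LRU
        self.clock = 0          # logical time, advanced on every access
        self.entries = {}       # key -> (crf score, time of last access)

    def _decayed(self, key):
        """CRF score of a key, decayed to the current logical time."""
        crf, last = self.entries[key]
        return crf * 0.5 ** (self.lam * (self.clock - last))

    def access(self, key):
        """Record an access; return the evicted key, if any."""
        self.clock += 1
        evicted = None
        if key in self.entries:
            score = 1.0 + self._decayed(key)
        else:
            if len(self.entries) >= self.capacity:
                evicted = min(self.entries, key=self._decayed)
                del self.entries[evicted]
            score = 1.0
        self.entries[key] = (score, self.clock)
        return evicted


cache = LRFUCache(capacity=2)
for _ in range(5):
    cache.access("hot")          # a frequently used block

# A large scan touches many blocks exactly once.
evictions = [cache.access(block) for block in ("scan1", "scan2", "scan3")]
print(evictions, "hot" in cache.entries)
```

A plain LRU cache of the same size would have dropped the hot block as soon as the second scan block arrived; here the scan blocks evict each other instead.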
Caching for BI Workloads – Formats, Zero-ETL
• ORC, Parquet
• Cached natively
• Zero-ETL analytics on CSV and JSON data with text caching
• Text is efficiently encoded in background; once cached, queries speed up
In-memory Processing – Native Columnar (ORC)
[Diagram: I/O threads (read planner, compression codec, ORC decoder for col1 and col2) read compressed data from the distributed FS into compact encoded data held in the off-heap cache and SSD cache, governed by a replacement policy. Execution threads run fragments of Hive operators with vectorized processing over native data vectors (col1, col2).]
Running Hive queries fast in the cloud
Nita Dembla
Wednesday, June 20
4:00 PM - 4:40 PM
Grand Ballroom 220C
Druid + Apache Hive
Layer | Data Access Pattern | Features
Hive Layer | Large-scale analytics | Joins, Subqueries, Windowing Functions, Transformations, Complex Aggregations, Advanced Sorting, UDFs
Druid Layer | Needles-in-a-haystack queries with large numbers of dimensions | Dimensional Aggregates, Top N Queries, Min/Max Values, Timeseries Queries, Approximate Distinct Count, Approximate Histograms
Druid Integration
• Pushdown of aggregate queries
• Pushdown of complex expressions
• Improvements in Druid to support SQL-standard NULL semantics
• Store MVs in Druid
Hive 3: Real-time Ingestion
[Diagram: Kafka-Druid-Hive ingest connecting Hive and Druid for real-time analytics]
• Druid answers in near real-time
Druid and Hive Together: Interactive
Realtime Analytics at Scale
Nishant Bangarwa
Tuesday, June 19
4:50 PM - 5:30 PM
Grand Ballroom 220B
ACID v2
• New on-disk storage format for ACID tables
• Run major compactions before you upgrade
• Update = Delete + Insert
• Performance on par with non-ACID tables
• Support for LOAD statements
• New streaming ingestion library
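The "Update = Delete + Insert" bullet can be illustrated with a toy model of delta handling. In this Python sketch an update never rewrites a stored row: it records a delete event for the old row and an insert event for the new version, and a reader merges base rows with both kinds of deltas. The integer row ids, the function names and the in-memory dictionaries are simplifying assumptions; Hive's actual row identifiers and file layout are more involved.

```python
def update(row_id, new_value, next_row_id):
    """Express an update as a delete event plus an insert event."""
    delete_delta = {row_id}
    insert_delta = {next_row_id: new_value}
    return delete_delta, insert_delta


def read_snapshot(base_rows, insert_deltas, delete_deltas):
    """Merge base rows with insert deltas, then drop every deleted row id."""
    rows = dict(base_rows)
    for delta in insert_deltas:
        rows.update(delta)
    deleted = set().union(*delete_deltas)
    return {rid: value for rid, value in rows.items() if rid not in deleted}


base = {1: "alice", 2: "bob"}

# UPDATE ... SET name = 'robert' WHERE the row is row 2
delete_delta, insert_delta = update(row_id=2, new_value="robert", next_row_id=3)

snapshot = read_snapshot(base, [insert_delta], [delete_delta])
print(snapshot)
```

Because existing rows are never modified in place, writers only append, and compaction can later fold the deltas back into the base.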
Insert-only Tables
• Transactional Semantics for non-ORC tables
• For INSERT INTO and INSERT OVERWRITE
• With near-zero overhead
• No rename() - Cloud friendly
Transactional Operations in Apache Hive
Eugene Koifman
Wednesday, June 20
11:50 AM - 12:30 PM
Executive Ballroom 210A/E
Disaster Recovery for Hive Data
[Diagram: On-Premise Data Center (a) and On-Premise Data Center (b) under Centralized Security and Governance, holding data sets A and B. Scheduled Policy (A) runs at 2am, 10am and 6pm daily; Scheduled Policy (B) runs at 2am daily.]
1. Data replication with scheduled policy
2. Disaster takes down Data Center (b)
3. Failover to Data Center (a); data set B made active
4. Active data set B changes to B’ in Data Center (a)
Hive-based Replication
• Repl v2 introduces new REPL commands
• Incremental replication: only copy delta changes
• Point-in-time replication
• Hive maintains the replication state
• Additional support for other database objects, e.g. functions, constraints, etc.
• Reduced number of copies
Seamless Replication and Disaster
Recovery for Apache Hive Warehouse
Sankar Hariappan
Thursday, June 21
9:30 AM - 10:10 AM
Meeting Room 211A/B/C/D
One Metastore to Rule Them All
[Diagram: HMS serving Hive LLAP, Hive on Tez and Spark over HDFS/S3 and Kafka, alongside Atlas, Ranger and SR]
Between Us and the Grand Vision
• Make HMS separable from Hive
• Standalone Metastore
• Unify HMS and Schema Registry so batch and streaming can see each other’s data
• Also reduces the number of metadata systems admins have to install and maintain
Sharing Metadata Across the Data Lake
and Streams
Alan Gates
Wednesday, June 20
11:50 AM - 12:30 PM
Meeting Room 230A
External Access – Spark LLAP
External Access – Relational View for Everyone
• Hive-on-Tez and other DAG executors can use LLAP directly
• LLAP also provides a "relational datanode" view of the data
• Anyone (with access) can push the (approved) code in, from complex query fragments to simple data reads
• E.g. a Spark DataFrame can be created with LlapInputFormat
• Gives external services access to
• Hive data: centralized, secure data access
• Ability to read all Hive table types, like ACID transactional tables
• Hive features: from column-level security, to LLAP columnar cache
Support Row/Column-level Security in Spark
spark-shell
pyspark
What Is Required?
• Apache Ranger
• Apache Hive with LLAP
• Spark-LLAP
• A library to integrate above tech with SparkSQL
HiveServer2 + LLAP + Ranger
[Diagram: a client app sends the SQL query "select name from users" to HiveServer2 in a YARN cluster. Plan generation applies Ranger dynamic policies and produces TableScan: users, Filter: state = ‘CA’, Projection: mask(name). The Hive query coordinator runs the plan on the LLAP daemons.]
1. Client sends query to HiveServer2.
2. Query plan generation by HiveServer2. Ranger security policies applied. Plan modified based on dynamic security policies.
3. Query plan sent to query coordinator.
4. Query plan sent to LLAP daemons for execution. Filtering/masking performed.
5. Results consolidated and sent to client.
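The effect of the modified plan (a row filter on state plus a mask on name, added to a query that only asked for name) can be sketched as follows. This is a toy model in Python, not Ranger's or Hive's code; the secure_scan function, the sample users, and the simplified masking rule (uppercase to X, lowercase to x, digits to n) are assumptions made for illustration.

```python
def mask(value):
    """Simplified masking rule: keep the shape of the value, hide its content."""
    out = []
    for ch in value:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("n")
        else:
            out.append(ch)
    return "".join(out)


def secure_scan(rows, row_filter, column_masks, projection):
    """Table scan with the policy-driven filter and masks applied before projection."""
    for row in rows:
        if row_filter(row):
            yield {
                col: column_masks.get(col, lambda v: v)(row[col])
                for col in projection
            }


users = [
    {"name": "Alice", "state": "CA"},
    {"name": "Bob", "state": "NY"},
    {"name": "Carol99", "state": "CA"},
]

# The user asked for "select name from users"; the policies add the rest.
result = list(
    secure_scan(
        users,
        row_filter=lambda row: row["state"] == "CA",
        column_masks={"name": mask},
        projection=["name"],
    )
)
print(result)
```

The client never sees unfiltered or unmasked rows, because the filter and mask are part of the plan that runs inside the trusted execution layer.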
HiveServer2 + LLAP + Ranger
[Diagram: a client app with an LLAP InputFormat, HiveServer2 and the Hive query coordinator in a YARN cluster, for the SQL query "select name from users". Plan generation applies Ranger dynamic policies and produces TableScan: users, Filter: state = ‘CA’, Projection: mask(name). The client talks to the LLAP daemons directly.]
1. Client requests data locations known as “splits” from HiveServer2.
2. Query plan generation by HiveServer2. Ranger security policies applied. Plan modified based on dynamic security policies.
3. Splits returned to client, which include the signed query plan.
4. LLAP splits used by client to securely submit query plan to LLAP. Filtering/masking performed. Data returned to client.
“Other” Improvements
• Query reoptimization
• Constraints
• Vectorization
• Query Cache
• Active/Passive HS2 HA for LLAP
• HLL BitVectors
• CachedStore
• Numerous enhancements in Spark Integration
Future
• Standalone Metastore
• Materialized Views – Automatic Recommendations
• Better integration with cloud storage
• HS2 scalability
Thanks to the open source community for continued success over the last 10 years.
Now, onwards to the next 10 years.

More Related Content

What's hot

What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
DataWorks Summit
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
DataWorks Summit/Hadoop Summit
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
DataWorks Summit/Hadoop Summit
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
Benjamin Leonhardi
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
enissoz
 
Ceph on Windows
Ceph on WindowsCeph on Windows
Ceph on Windows
Ceph Community
 
Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
DataWorks Summit
 
Write Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfWrite Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdf
Eric Xiao
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloud
Anshum Gupta
 
Redefining tables online without surprises
Redefining tables online without surprisesRedefining tables online without surprises
Redefining tables online without surprises
Nelson Calero
 
Getting Started with HBase
Getting Started with HBaseGetting Started with HBase
Getting Started with HBase
Carol McDonald
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
Julien Le Dem
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
Altinity Ltd
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
Databricks
 
Apache Arrow Flight Overview
Apache Arrow Flight OverviewApache Arrow Flight Overview
Apache Arrow Flight Overview
Jacques Nadeau
 
DAT341_Working with Amazon ElastiCache for Redis
DAT341_Working with Amazon ElastiCache for RedisDAT341_Working with Amazon ElastiCache for Redis
DAT341_Working with Amazon ElastiCache for Redis
Amazon Web Services
 
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019Storage 101: Rook and Ceph - Open Infrastructure Denver 2019
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019
Sean Cohen
 
Ozone: An Object Store in HDFS
Ozone: An Object Store in HDFSOzone: An Object Store in HDFS
Ozone: An Object Store in HDFS
DataWorks Summit
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ Spotify
Neville Li
 
Oracle Enterprise Manager 12c: EMCLI Crash Course
Oracle Enterprise Manager 12c: EMCLI Crash CourseOracle Enterprise Manager 12c: EMCLI Crash Course
Oracle Enterprise Manager 12c: EMCLI Crash Course
Gokhan Atil
 

What's hot (20)

What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
 
Ceph on Windows
Ceph on WindowsCeph on Windows
Ceph on Windows
 
Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
 
Write Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfWrite Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdf
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloud
 
Redefining tables online without surprises
Redefining tables online without surprisesRedefining tables online without surprises
Redefining tables online without surprises
 
Getting Started with HBase
Getting Started with HBaseGetting Started with HBase
Getting Started with HBase
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Apache Arrow Flight Overview
Apache Arrow Flight OverviewApache Arrow Flight Overview
Apache Arrow Flight Overview
 
DAT341_Working with Amazon ElastiCache for Redis
DAT341_Working with Amazon ElastiCache for RedisDAT341_Working with Amazon ElastiCache for Redis
DAT341_Working with Amazon ElastiCache for Redis
 
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019Storage 101: Rook and Ceph - Open Infrastructure Denver 2019
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019
 
Ozone: An Object Store in HDFS
Ozone: An Object Store in HDFSOzone: An Object Store in HDFS
Ozone: An Object Store in HDFS
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ Spotify
 
Oracle Enterprise Manager 12c: EMCLI Crash Course
Oracle Enterprise Manager 12c: EMCLI Crash CourseOracle Enterprise Manager 12c: EMCLI Crash Course
Oracle Enterprise Manager 12c: EMCLI Crash Course
 

Similar to What's new in apache hive

Apache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration storyApache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration story
Sunil Govindan
 
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
HortonworksJapan
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
alanfgates
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
DataWorks Summit
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFT
DataWorks Summit
 
Hive acid and_2.x new_features
Hive acid and_2.x new_featuresHive acid and_2.x new_features
Hive acid and_2.x new_features
Alberto Romero
 
LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BI
DataWorks Summit
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
DataWorks Summit
 
Containers and Big Data
Containers and Big DataContainers and Big Data
Containers and Big Data
DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo
DataWorks Summit
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
DataWorks Summit
 
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerCuring the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging Manager
DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
DataWorks Summit
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
DataWorks Summit
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
DataWorks Summit
 
HDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New FeaturesHDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New Features
Timothy Spann
 
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019
alanfgates
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
DataWorks Summit
 

Similar to What's new in apache hive (20)

Apache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration storyApache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration story
 
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFT
 
Hive acid and_2.x new_features
Hive acid and_2.x new_featuresHive acid and_2.x new_features
Hive acid and_2.x new_features
 
LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BI
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
 
Containers and Big Data
Containers and Big DataContainers and Big Data
Containers and Big Data
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerCuring the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging Manager
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
 
HDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New FeaturesHDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New Features
 
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example




What's new in Apache Hive

  • 1. 1 © Hortonworks Inc. 2011–2018. All rights reserved. What is new in Apache Hive? Ashutosh Chauhan
  • 2. Apache Hive – Distant Past – First Five Years
      • Initial use case: batch processing
      • Circa 2008
      • Read-only data
      • MapReduce
      • HiveQL
  • 3. Apache Hive – Past 5 Years
      • Effort to take Hive beyond its batch processing roots
      • Started in Apache Hive 0.10.0 (January 2013)
      • Latest released version: Apache Hive 3.0 (May 2018)
      • Extensive renovation along four different axes
          • Runtime: enable sub-second queries (LLAP)
          • Compiler: Cost Based Optimizer
          • SQL support: improved coverage of SQL syntax
          • Transactional support: ACID
  • 4. Hive – Today
      • Comprehensive ANSI SQL including all TPC-DS queries.
      • The only Hadoop SQL with ACID MERGE for easy updates.
      • In-memory caching for MPP performance at Hadoop scale.
      • Enables per-user dynamic row and column security.
      • Enables replication and DR for critical workloads.
      • Compatible with every major BI tool.
      • Proven at 300+ PB scale.
  • 5. Apache Hive: Fast Facts
      • Most queries per hour: 100,000 queries per hour
      • Analytics performance: 100 million rows/s per node
      • Largest Hive warehouse: 300+ PB raw storage
      • Largest cluster: 4,500+ nodes
  • 6. Hive: Serving ETL Workloads to BI Systems
      • Diagram components feeding BI systems: Materialized view, Improved stats, Constraints, Query result cache, Workload management, ACID v2
      • Results return from HDFS/cache directly
      • Reduce load from repetitive queries
      • Allows more queries to be run in parallel
      • Reduce resource starvation in large clusters
      • Also: Active/Passive HA
      • More “tools” for optimizer to use
      • More “tools” for DBAs to tune/optimize
      • Invisible tuning of DB from users’ perspective
      • ACID v2 is as fast as regular tables
  • 7. SIGMOD Software Systems Award
      • “For developing seminal software systems that served to bring relational-style declarative programming to the Hadoop ecosystem.”
      • Postgres, SQLite and MonetDB
  • 8. Hive – How Did We Get Here?
      • LLAP enhancements
      • CBO enhancements
      • ACID enhancements
  • 9. Materialized Views in Hive
  • 10. Accelerating Query Processing
      • Optimization based on access patterns
          • Change data physical properties (distribute, sort)
          • Filter rows
          • Denormalize
          • Preaggregate
  • 11. Materialized Views to Rescue
      • Speed up aggregates and joins via MVs
      • View navigation via CBO/Calcite
      • Optionally allow rewrites against out-of-date materializations
  • 12. Materialized Views in Hive 3
      • Multiple storage options: Hive, Druid
      • Multiple options to control materialized views lifecycle
  • 13. Materialized View-based Rewriting
      • Materialized view definition
          CREATE MATERIALIZED VIEW mv AS
          SELECT <dims>, lo_revenue,
                 lo_extendedprice * lo_discount AS d_price,
                 lo_revenue - lo_supplycost
          FROM customer, dates, lineorder, part, supplier
          WHERE lo_orderdate = d_datekey
            AND lo_partkey = p_partkey
            AND lo_suppkey = s_suppkey
            AND lo_custkey = c_custkey;
      • Query
          SELECT SUM(lo_extendedprice * lo_discount)
          FROM lineorder, dates
          WHERE lo_orderdate = d_datekey
            AND d_year = 2013
            AND lo_discount BETWEEN 1 AND 3;
      • Materialized view-based rewriting
          SELECT SUM(d_price)
          FROM mv
          WHERE d_year = 2013
            AND lo_discount BETWEEN 1 AND 3;
      • Diagram: the supplier, part, dates, customer and lineorder tables feed the mv contents, from which the query results are computed
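
The rewrite on this slide can be checked on toy data. The sketch below is only an illustration: it uses Python's built-in `sqlite3` as a stand-in engine, models the materialized view as a plain `CREATE TABLE ... AS SELECT` (SQLite has no materialized views), keeps only the `dates` and `lineorder` tables, and uses made-up rows. In Hive the rewrite is chosen by the optimizer; here both queries are written by hand to show they return the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy versions of two of the slide's tables; the rows are invented for illustration.
cur.execute("CREATE TABLE dates (d_datekey INTEGER, d_year INTEGER)")
cur.execute(
    "CREATE TABLE lineorder ("
    "lo_orderdate INTEGER, lo_extendedprice INTEGER, lo_discount INTEGER)"
)
cur.executemany("INSERT INTO dates VALUES (?, ?)", [(1, 2012), (2, 2013), (3, 2013)])
cur.executemany(
    "INSERT INTO lineorder VALUES (?, ?, ?)",
    [(1, 100, 2), (2, 200, 1), (2, 300, 5), (3, 400, 3)],
)

# Stand-in for CREATE MATERIALIZED VIEW: precompute the join and the product.
cur.execute(
    """
    CREATE TABLE mv AS
    SELECT d_year, lo_discount, lo_extendedprice * lo_discount AS d_price
    FROM lineorder, dates
    WHERE lo_orderdate = d_datekey
    """
)

# The query as the user would write it, against the base tables.
original = cur.execute(
    """
    SELECT SUM(lo_extendedprice * lo_discount)
    FROM lineorder, dates
    WHERE lo_orderdate = d_datekey
      AND d_year = 2013
      AND lo_discount BETWEEN 1 AND 3
    """
).fetchone()[0]

# The rewritten form, answered from the precomputed table only.
rewritten = cur.execute(
    "SELECT SUM(d_price) FROM mv WHERE d_year = 2013 AND lo_discount BETWEEN 1 AND 3"
).fetchone()[0]

print(original, rewritten)  # both are 1400 for the rows above
```

The rewritten query needs no join at read time, which is where the speed-up for aggregates and joins comes from.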
  • 14. Rebuilding Materialized Views
      • ALTER MATERIALIZED VIEW [db_name.]materialized_view_name REBUILD;
      • Incremental materialized view maintenance
          • Only refresh data that has changed in source tables
  • 15. Accelerating Query Processing with Materialized Views in Apache Hive
      • Jesus Camacho Rodriguez
      • Tuesday, June 19, 2:50 PM - 3:30 PM
      • Executive Ballroom 210A/E
  • 16. Workload Management
  • 17. Overview
      • Effectively share LLAP cluster resources
          • Resource allocation per user policy; separate ETL and BI, etc.
      • Resource-based guardrails
          • Protect against long-running queries and high memory usage
      • Improved, query-aware scheduling
          • Scheduler is aware of query characteristics, types, etc.
          • Fragments are easy to pre-empt compared to containers
          • Queries get guaranteed fractions of the cluster, but can use empty space
  • 18. Resource Plans
      • A resource plan is a workload management configuration for a cluster
          • Switching is allowed without stopping queries, e.g. based on time of day
      • Cluster is divided into query pools (optionally nested)
          • Each pool defines query parallelism and a percentage of cluster resources
          • Queries are automatically routed to pools based on user name, app, etc.
      • Rules (triggers) to kill, move, or deprioritize queries based on DFS usage, runtime, etc.
      • Example:
          CREATE RESOURCE PLAN daytime;
          CREATE POOL bi IN daytime (resource_percent=75, concurrent_queries=5);
          CREATE POOL etl IN daytime (resource_percent=25, concurrent_queries=10);
          CREATE RULE downgrade IN daytime WHEN total_runtime > 120 THEN MOVE etl;
          ADD RULE downgrade TO bi IN daytime;
          CREATE MAPPING tableau IN daytime (application='Tableau', pool=bi);
          ALTER PLAN daytime SET default_pool='etl';
          APPLY PLAN daytime;
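
How the pieces of this example fit together (pools, a mapping, a default pool and a rule) can be sketched as a small Python model. This is not Hive code: the class names, `route` and `check` are hypothetical helpers invented for illustration, and the runtime threshold is treated as a bare number because the slide gives no unit.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    max_total_runtime: int  # threshold from the slide; unit not stated there
    move_to: str            # pool the query is moved to when the rule fires


@dataclass
class Pool:
    name: str
    resource_percent: int
    concurrent_queries: int
    rules: list = field(default_factory=list)


@dataclass
class ResourcePlan:
    name: str
    pools: dict          # pool name -> Pool
    mappings: dict       # application name -> pool name
    default_pool: str

    def route(self, application):
        """Pick the pool for a new query from its application name."""
        return self.mappings.get(application, self.default_pool)

    def check(self, pool_name, total_runtime):
        """Return the pool a running query should be in after rule evaluation."""
        for rule in self.pools[pool_name].rules:
            if total_runtime > rule.max_total_runtime:
                return rule.move_to
        return pool_name


# The "daytime" plan from the slide, expressed in this toy model.
downgrade = Rule("downgrade", max_total_runtime=120, move_to="etl")
daytime = ResourcePlan(
    name="daytime",
    pools={
        "bi": Pool("bi", resource_percent=75, concurrent_queries=5, rules=[downgrade]),
        "etl": Pool("etl", resource_percent=25, concurrent_queries=10),
    },
    mappings={"Tableau": "bi"},
    default_pool="etl",
)

print(daytime.route("Tableau"))       # bi
print(daytime.route("nightly-load"))  # etl (falls through to the default pool)
print(daytime.check("bi", 300))       # etl (the downgrade rule fires)
```

Attaching the rule to the `bi` pool only means that a slow BI query is demoted to `etl`, while queries already in `etl` are left alone.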
  • 19. Decentralized Guaranteed Resources
      • A guaranteed task for each resource (executor slot)
      • HS2 gives N guaranteed tasks to an AM based on the configured resource plan
      • Each AM marks N of its most important tasks as guaranteed at any given time
      • Guaranteed tasks pre-empt speculative tasks
  • 20. Guaranteed Tasks – BI and ETL Example
      • HS2: 18 executors total, spread over LLAP Daemons 1 to 3 (each with a wait queue and executors)
      • BI pool (80% = 14 guaranteed), Query 1: 10 active tasks (all running), 10 guaranteed (running), 4 unused for now
      • ETL pool (20% = 4 guaranteed), Query 2: 19 active tasks (8 running), 4 guaranteed (4 running), 15 speculative (4 running)
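
The numbers in this example can be reproduced with a short calculation. The sketch below is an illustration under stated assumptions: the slide shows only the results (14 and 4 guaranteed slots out of 18), so the largest-remainder rounding used here is a guess that happens to match, and both function names are hypothetical.

```python
import math


def split_guaranteed(total_executors, pool_percent):
    """Split executors into whole guaranteed slots per pool.

    Assumption: fractional shares are floored and leftover slots go to the
    pools with the largest remainders.
    """
    exact = {p: total_executors * pct / 100 for p, pct in pool_percent.items()}
    slots = {p: math.floor(v) for p, v in exact.items()}
    leftover = total_executors - sum(slots.values())
    by_remainder = sorted(exact, key=lambda p: exact[p] - slots[p], reverse=True)
    for p in by_remainder[:leftover]:
        slots[p] += 1
    return slots


def running_tasks(total_executors, guaranteed, active):
    """Run guaranteed tasks first, then fill idle executors with speculative ones."""
    guaranteed_running = {p: min(active[p], guaranteed[p]) for p in guaranteed}
    free = total_executors - sum(guaranteed_running.values())
    speculative_running = {}
    for p in guaranteed:
        waiting = active[p] - guaranteed_running[p]
        speculative_running[p] = min(waiting, free)
        free -= speculative_running[p]
    return guaranteed_running, speculative_running


guaranteed = split_guaranteed(18, {"bi": 80, "etl": 20})
g_run, s_run = running_tasks(18, guaranteed, {"bi": 10, "etl": 19})

print(guaranteed)  # {'bi': 14, 'etl': 4}
print(g_run)       # {'bi': 10, 'etl': 4}
print(s_run)       # {'bi': 0, 'etl': 4}
```

The ETL query ends up with 8 running tasks: its 4 guaranteed ones plus 4 speculative ones borrowing the slots BI is not using. If BI later needs those slots, its guaranteed tasks pre-empt the speculative ETL tasks.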
  • 21. Caching
  • 22. Caching for BI Workloads
      • Fine-grained (columnar), compact (dictionary, RLE encoded)
          • Important due to projections over many wide EDW tables
      • Prioritized: indexes are cached with higher priority
          • Important to make use of predicate pushdown
      • Off-heap (no extra GC), supports SSD
      • LRFU replacement policy avoids the damage from large scans
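
Why an LRFU policy resists large scans can be seen in a toy cache. The sketch below follows the textbook LRFU idea (each entry carries a score that grows on access and decays with age, and the lowest score is evicted); it is not LLAP's implementation, and the class, its parameters and the decay constants are assumptions chosen for illustration.

```python
class LRFUCache:
    """Toy LRFU cache: score = sum over past accesses of 0.5 ** (lam * age)."""

    def __init__(self, capacity, lam=0.1):
        self.capacity = capacity
        self.lam = lam          # small lam behaves like LFU, lam near 1 like LRU
        self.clock = 0          # logical time, one tick per access
        self.entries = {}       # key -> (score at last access, last access time)

    def _decayed(self, key):
        score, last = self.entries[key]
        return score * 0.5 ** (self.lam * (self.clock - last))

    def access(self, key):
        """Touch a key; returns True on a hit, False on a miss."""
        self.clock += 1
        if key in self.entries:
            self.entries[key] = (1.0 + self._decayed(key), self.clock)
            return True
        if len(self.entries) >= self.capacity:
            victim = min(self.entries, key=self._decayed)
            del self.entries[victim]
        self.entries[key] = (1.0, self.clock)
        return False


def hot_block_then_scan(lam):
    """Read one block repeatedly, then scan five blocks that are read once each."""
    cache = LRFUCache(capacity=3, lam=lam)
    for _ in range(5):
        cache.access("hot")
    for i in range(5):
        cache.access(f"scan-{i}")
    return set(cache.entries)


print("hot" in hot_block_then_scan(0.1))  # True: the hot block survives the scan
print("hot" in hot_block_then_scan(1.0))  # False: LRU-like decay evicts it
```

With a small decay constant the frequently read block keeps a high score and the scan only evicts other scan blocks, which is the behaviour the slide attributes to LRFU.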
  • 23. Caching for BI Workloads – Formats, Zero-ETL
      • ORC, Parquet
          • Cached natively
      • Zero-ETL analytics on CSV and JSON data with text caching
          • Text is efficiently encoded in background; once cached, queries speed up
  • 24. In-memory Processing – Native Columnar (ORC)
      • Diagram labels, storage side: Distributed FS (compressed data), SSD cache, Off-heap cache (compact encoded data), Replacement policy
      • Diagram labels, I/O threads: Read planner, Compression codec, Decoder: ORC (col1, col2)
      • Diagram labels, execution thread: Fragment, Hive operators, Vectorized processing, Native data vectors (col1, col2)
  • 25. Running Hive queries fast in the cloud
      • Nita Dembla
      • Wednesday, June 20, 4:00 PM - 4:40 PM
      • Grand Ballroom 220C
  • 26. Druid + Apache Hive
      • Hive layer
          • Data access pattern: large-scale analytics
          • Features: joins, subqueries, windowing functions, transformations, complex aggregations, advanced sorting, UDFs
      • Druid layer
          • Data access pattern: needles-in-a-haystack queries with large numbers of dimensions
          • Features: dimensional aggregates, Top N queries, min/max values, timeseries queries, approximate distinct count, approximate histograms
  • 27. Druid Integration
      • Pushdown of aggregate queries
      • Pushdown of complex expressions
      • Improvements in Druid to support SQL-standard NULL semantics
      • Store MVs in Druid
  • 28. Hive 3: Real-time Ingestion
      • Diagram labels: Kafka-Druid-Hive ingest, Hive, Druid, Real-time analytics
      • Druid answers in near real-time
  • 29. Druid and Hive Together: Interactive Realtime Analytics at Scale
      • Nishant Bangarwa
      • Tuesday, June 19, 4:50 PM - 5:30 PM
      • Grand Ballroom 220B
  • 30. ACID v2
      • New on-disk storage format for ACID tables
          • Run major compactions before you upgrade
      • Update = Delete + Insert
      • Performance on par with non-ACID tables
      • Support for LOAD statements
      • New streaming ingestion library
  • 31. Insert-only Tables
      • Transactional semantics for non-ORC tables
      • For INSERT INTO and INSERT OVERWRITE
      • With near-zero overhead
      • No rename(): cloud friendly
  • 32. Transactional Operations in Apache Hive
      • Eugene Koifman
      • Wednesday, June 20, 11:50 AM - 12:30 PM
      • Executive Ballroom 210A/E
  • 33. Disaster Recovery for Hive Data
      • Diagram: On-Premise Data Center (a) and On-Premise Data Center (b), each holding data sets A and B, under centralized security and governance
      • Scheduled Policy (A): 2am, 10am, 6pm daily
      • Scheduled Policy (B): 2am daily
      • Steps
          1. Data replication with scheduled policy
          2. Disaster takes down Data Center (b)
          3. Failover to Data Center (a); data set B made active
          4. Active data set B changes to B’ in Data Center (a)
  • 34. Hive-based Replication
      • Replv2 introduces new REPL commands
      • Incremental replication: only copy delta changes
      • Point-in-time replication
      • Hive maintains the replication state
      • Additional support for other database objects, e.g. functions, constraints, etc.
      • Reduce number of copies
  • 35. Seamless Replication and Disaster Recovery for Apache Hive Warehouse
      • Sankar Hariappan
      • Thursday, June 21, 9:30 AM - 10:10 AM
      • Meeting Room 211A/B/C/D
  • 36. One Metastore to Rule Them All
      • Diagram components: HDFS/S3, Kafka, Hive LLAP, Spark, HMS, Atlas, Ranger, SR, Hive on Tez
  • 37. Between Us and the Grand Vision
      • Make HMS separable from Hive
          • Standalone Metastore
      • Unify HMS and Schema Registry so batch and streaming can see each other’s data
          • Also reduces the number of metadata systems admins have to install and maintain
  • 38. Sharing Metadata Across the Data Lake and Streams
      • Alan Gates
      • Wednesday, June 20, 11:50 AM - 12:30 PM
      • Meeting Room 230A
  • 39. External Access – Spark LLAP
  • 40. External Access – Relational View for Everyone
      • Hive-on-Tez and other DAG executors can use LLAP directly
      • LLAP also provides a "relational datanode" view of the data
          • Anyone (with access) can push the (approved) code in, from complex query fragments to simple data reads
          • E.g. a Spark DataFrame can be created with LlapInputFormat
      • Gives external services access to
          • Hive data: centralized, secure data access
          • Ability to read all Hive table types, like ACID transactional tables
          • Hive features: from column-level security to the LLAP columnar cache
  • 41. Support Row/Column-level Security in Spark
      • Screenshots: spark-shell, pyspark
  • 42. What Is Required?
      • Apache Ranger
      • Apache Hive with LLAP
      • Spark-LLAP
      • A library to integrate the above with SparkSQL
  • 43. HiveServer2 + LLAP + Ranger
      • Diagram labels: Client App, HiveServer2 (Plan Generation), Ranger Dynamic Policies, Hive Query Coordinator, LLAP Daemons, YARN Cluster
      • Example: SQL query "select name from users"; generated plan: TableScan: users, Filter: state = 'CA', Projection: mask(name)
      • Steps
          1. Client sends query to HiveServer2.
          2. Query plan generation by HiveServer2. Ranger security policies applied. Plan modified based on dynamic security policies.
          3. Query plan sent to query coordinator.
          4. Query plan sent to LLAP daemons for execution. Filtering/masking performed.
          5. Results consolidated and sent to client.
  • 44. LLAP InputFormat (HiveServer2 + LLAP + Ranger)
      • Diagram labels: Client App, HiveServer2 (Plan Generation), Ranger Dynamic Policies, Hive Query Coordinator, LLAP Daemons, YARN Cluster
      • Example: SQL query "select name from users"; generated plan: TableScan: users, Filter: state = 'CA', Projection: mask(name)
      • Steps
          1. Client requests data locations known as “splits” from HiveServer2.
          2. Query plan generation by HiveServer2. Ranger security policies applied. Plan modified based on dynamic security policies.
          3. Splits returned to client, which include the signed query plan.
          4. LLAP splits used by client to securely submit query plan to LLAP. Filtering/masking performed. Data returned to client.
  • 45. “Other” Improvements
      • Query reoptimization
      • Constraints
      • Vectorization
      • Query cache
      • Active/Passive HS2 HA for LLAP
      • HLL BitVectors
      • CachedStore
      • Numerous enhancements in Spark integration
  • 46. Future
      • Standalone Metastore
      • Materialized views: automatic recommendations
      • Better integration with cloud storage
      • HS2 scalability
  • 47. Thanks to the open source community for continued success over the last 10 years. Now, onwards to the next 10 years.