SlideShare a Scribd company logo
1 of 29
1 © Hortonworks Inc. 2011–2018. All rights reserved
Accelerating query processing with
materialized views in Apache Hive
Jesús Camacho Rodríguez
DataWorks Summit San Jose
June 19, 2018
2 © Hortonworks Inc. 2011–2018. All rights reserved
Apache Hive
• Initial use case: batch processing
• Read-only data
• HiveQL (SQL-like query language)
• MapReduce
• Effort to take Hive beyond its batch processing roots
• Started in Apache Hive 0.10.0 (January 2013)
• Latest release: Apache Hive 3.0 (May 2018)
• Extensive renovation to improve three different axes
• Latency: allow interactive and sub-second queries
• Scalability: from TB to PB of data
• SQL support: move from HiveQL to SQL standard
3 © Hortonworks Inc. 2011–2018. All rights reserved
Apache Hive
• Multiple execution engines: Apache Tez and Apache Spark
• More efficient join execution algorithms
• Vectorized query execution
• Integration with columnar storage formats:
Apache ORC, Apache Parquet
• LLAP (Live Long and Process)
• Persistent deamons for low-latency queries
• Rule-based and cost-based optimizer
• Better statistics
• Tighter integration with other data processing systems: Druid
Important internals improvements
4 © Hortonworks Inc. 2011–2018. All rights reserved
Accelerating query processing
• Change data physical properties (distribute, sort)
• Filter rows
• Denormalize
• Preaggregate
Optimization based on access patterns
5 © Hortonworks Inc. 2011–2018. All rights reserved
Accelerating query processing
• Establish relationship between original and new tables
• Has a similar table already been created?
• Rewrite your queries to use new tables
• What happens when access patterns change?
• Maintain your new tables when original tables change
• Do I have to fully rebuild new tables?
Optimization based on access patterns
Currently, Hive users
have to do it manually
6 © Hortonworks Inc. 2011–2018. All rights reserved
Materialized views
• A materialized view is an entity that contains the result of a evaluating a query
• Important property  Awareness of the materialized view definition semantics
• Optimizer can exploit them for automatic query rewriting
• System can handle maintenance of the materialized views
• Generally, materializations can be created in different forms depending on the scope
• DBA writes “CREATE MATERIALIZED VIEW” statement
• Daemon creates materialized view based on recent query activity
• Cached result of previous similar query
• Query factorization identifies common pieces within a single query
7 © Hortonworks Inc. 2011–2018. All rights reserved
Possible workflow
1. Create materialized view using Hive tables
• Stored by Hive or Druid
2. User or dashboard sends queries to Hive
• Hive rewrites queries using available materialized views
• Execute rewitten query
Dashboards, BI tools
CREATE MATERIALIZED VIEW `ssb_mv`
STORED AS 'org.apache.hadoop.hive.druid.DruidStorageHandler'
ENABLE REWRITE
AS
<query>;
DBA, recommendation system
①
②
Data
Queries
8 © Hortonworks Inc. 2011–2018. All rights reserved
Materialized views in Apache Hive
• First implementation is part of Apache Hive 3.0
• Multiple storage options: Hive, Druid
• Automatic rewriting of incoming queries to use materialized views
• Efficient view maintenance
• Incremental refresh
• Multiple options to control materialized views lifecycle
9 © Hortonworks Inc. 2011–2018. All rights reserved
Management of
materialized views in Hive
10 © Hortonworks Inc. 2011–2018. All rights reserved
Materialized view creation
• CREATE MATERIALIZED VIEW statement
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_view_name
[DISABLE REWRITE]
[COMMENT materialized_view_comment]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
AS
<query>;
⇢ Supports custom table properties, storage format, etc.
11 © Hortonworks Inc. 2011–2018. All rights reserved
Materialized view creation (stored in Druid)
• CREATE MATERIALIZED VIEW statement
CREATE MATERIALIZED VIEW druid_wiki_mv
STORED AS 'org.apache.hadoop.hive.druid.DruidStorageHandler'
AS
SELECT __time, page, user, c_added, c_removed
FROM src;
Hive materialized view name
Hive storage handler classname
12 © Hortonworks Inc. 2011–2018. All rights reserved
Other operations for materialized view management
DROP MATERIALIZED VIEW [db_name.]materialized_view_name;
SHOW MATERIALIZED VIEWS [IN database_name] ['identifier_with_wildcards’];
DESCRIBE [EXTENDED | FORMATTED] [db_name.]materialized_view_name;
⇢ More operations to be added and extended
13 © Hortonworks Inc. 2011–2018. All rights reserved
Materialized view-based
query rewriting
14 © Hortonworks Inc. 2011–2018. All rights reserved
Materialized view-based rewriting algorithm
• Automatically rewrite incoming queries using materialized views
• Optimizer exploits materialized view definition semantics
• Built on the ideas presented in [GL01] using Apache Calcite
• Supports queries containing TableScan, Project, Filter, Join, Aggregate operators
• Includes some extensions
• Generation of additional rewritings without needing to do join permutation
• Partial rewritings using union operators
• More information about the rewriting coverage
• http://calcite.apache.org/docs/materialized_views#rewriting-using-plan-structural-information
[GL01] Jonathan Goldstein and Per-åke Larson. Optimizing queries using materialized views: A practical,
scalable solution. In Proc. ACM SIGMOD Conf., 2001.
15 © Hortonworks Inc. 2011–2018. All rights reserved
Enable materialized view-based rewriting
• Global property to enable materialized view rewriting for queries
SET hive.materializedview.rewriting=true;
• User can selectively use enable/disable materialized views for rewriting
• Materialized views are enabled by default for rewriting
• Behavior can be altered after materialized view has been created
ALTER MATERIALIZED VIEW [db_name.]materialized_view_name ENABLE|DISABLE REWRITE;
16 © Hortonworks Inc. 2011–2018. All rights reserved
depts
Materialized view-based rewriting (example)
• Materialized view definition
Employees that were hired after 2016
CREATE MATERIALIZED VIEW mv
AS
SELECT empid, deptname, hire_date
FROM emps JOIN depts
ON (emps.deptno = depts.deptno)
WHERE hire_date >= '2016-01-01';
• Query
Employees that were hired last quarter
SELECT empid, deptname
FROM emps JOIN depts
ON (emps.deptno = depts.deptno)
WHERE hire_date >= '2018-01-01'
AND hire_date <= '2018-03-31';
• Materialized view-based rewriting
SELECT empid, deptname
FROM mv
WHERE hire_date >= '2018-01-01'
AND hire_date <= '2018-03-31';
deptsemps
empid depname hire_date
10001 IT 2016-03-01
10002 IT 2017-01-02
10003 HR 2017-07-01
10004 Finance 2018-01-15
10005 HR 2018-02-02
mv contents
empid depname
10004 Finance
10005 HR
Query results
17 © Hortonworks Inc. 2011–2018. All rights reserved
Materialized view-based rewriting (example 2)
• Materialized view definition
CREATE MATERIALIZED VIEW mv AS
SELECT <dims>,
lo_revenue,
lo_extendedprice * lo_discount AS d_price,
lo_revenue - lo_supplycost
FROM
customer, dates, lineorder, part, supplier
WHERE
lo_orderdate = d_datekey
and lo_partkey = p_partkey
and lo_suppkey = s_suppkey
and lo_custkey = c_custkey;
• Query
SELECT sum(lo_extendedprice * lo_discount)
FROM
lineorder, dates
WHERE
lo_orderdate = d_datekey
and d_year = 2013
and lo_discount between 1 and 3;
• Materialized view-based rewriting
SELECT SUM(d_price)
FROM mv
WHERE
d_year = 2013
and lo_discount between 1 and 3;
supplier
part
dates
customerlineorder
Exploit SQL PK-FK and
NOT NULL constraints
d_year lo_discount <dims> d_price
2013 2 ... 7.55
2014 4 ... 432.60
2013 2 ... 34.45
2012 2 ... 2.05
… … ... …
mv contents
sum
42.0
…
Query results
18 © Hortonworks Inc. 2011–2018. All rights reserved
Materialized view-based rewriting (example 3)
• Materialized view definition
CREATE MATERIALIZED VIEW mv AS
SELECT floor(time to minute), page,
SUM(added) AS c_added,
SUM(removed) AS c_rmv
FROM wiki
GROUP BY floor(time to minute), page;
• Query
SELECT floor(time to month),
SUM(added) AS c_added
FROM wiki
GROUP BY floor(time to month);
• Materialized view-based rewriting
SELECT floor(time to month),
SUM(c_added) as c_added
FROM mv
GROUP BY floor(time to month);
wiki
__time page c_added c_rmv
2011-01-01 01:05:00 Justin 1800 25
2011-01-20 19:00:00 Justin 2912 42
2011-01-01 11:06:00 Ke$ha 1953 17
2011-02-02 13:15:00 Ke$ha 3194 170
2011-01-02 18:00:00 Miley 2232 34
mv contents
__time c_added
2011-01-01 00:00:00 8897
2011-02-01 00:00:00 3194
Query results
19 © Hortonworks Inc. 2011–2018. All rights reserved
Materialized view
maintenance
20 © Hortonworks Inc. 2011–2018. All rights reserved
Rebuilding materialized views
• Rebuild needs to be triggered manually by user
ALTER MATERIALIZED VIEW [db_name.]materialized_view_name REBUILD;
• Incremental materialized view maintenance
• Only refresh data that has changed in source tables
• Multiple benefits
• Decrease rebuild step execution time
• Preserves LLAP cache for existing data
• Materialized view should only use transactional tables (micromanaged or ACID)
• Current implementation only supports incremental rebuild for insert operations
• Update/delete operations force full rebuild
• Optimizer will attempt incremental rebuild
• Otherwise, fallback to full rebuild (INSERT OVERWRITE with MV definition)
21 © Hortonworks Inc. 2011–2018. All rights reserved
Incremental view maintenance algorithm
• Relies on materialized view rewriting algorithm
• Materialized view stores write ID for its tables when it is created/refreshed
• Write ID associates rows with transactions
• When rebuild is triggered, introduce filter condition on write ID column in MV definition
• Read only new rows from source tables
• Execute materialized view rewriting
• Rewrite INSERT OVERWRITE (full rebuild) into more efficient plan
• INSERT (table scan, filter, project, join)
• MERGE (table scan, filter, project, join, aggregate)
22 © Hortonworks Inc. 2011–2018. All rights reserved
CREATE MATERIALIZED VIEW mv1 AS
SELECT page, user,
SUM(added) AS c_added,
SUM(removed) AS c_rmv
FROM wiki
GROUP BY page, user;
Incremental view maintenance algorithm (example)
mv1 contents
page user c_added c_rmv
Justin Boxer 1800 25
Justin Reach 2912 42
Ke$ha Xeno 1953 17
Ke$ha Helz 3194 170
Miley Ashu 2232 34
page user … added removed … writeID
… … … … … … …
Miley Ashu … 68 16 … 10000
Justin Zaka … 392 239 … 10000
wiki contents
New records
⇢ ALTER MATERIALIZED VIEW mv1 REBUILD;
23 © Hortonworks Inc. 2011–2018. All rights reserved
CREATE MATERIALIZED VIEW mv1 AS
SELECT page, user,
SUM(added) AS c_added,
SUM(removed) AS c_rmv
FROM wiki
GROUP BY page, user;
① Rebuild statement rewriting
INSERT OVERWRITE mv1
SELECT page, user, SUM(added) AS c_added, SUM(removed) AS c_rmv
FROM (
SELECT page, user, c_added, c_removed
FROM mv1
UNION ALL
SELECT page, user, SUM(added) AS c_added, SUM(removed) AS c_rmv
FROM wiki
WHERE writeID > 9999
GROUP BY page, user) subq
GROUP BY page, user;
Incremental view maintenance algorithm (example)
Rollup data
mv1 contents
page user c_added c_rmv
Justin Boxer 1800 25
Justin Reach 2912 42
Ke$ha Xeno 1953 17
Ke$ha Helz 3194 170
Miley Ashu 2232 34
page user … added removed … writeID
… … … … … … …
Miley Ashu … 68 16 … 10000
Justin Zaka … 392 239 … 10000
wiki contents
New records
24 © Hortonworks Inc. 2011–2018. All rights reserved
CREATE MATERIALIZED VIEW mv1 AS
SELECT page, user,
SUM(added) AS c_added,
SUM(removed) AS c_rmv
FROM wiki
GROUP BY page, user;
② Rewrite INSERT OVERWRITE into MERGE statement
MERGE INTO mv1
USING (
SELECT page, user, SUM(added) AS c_added, SUM(removed) AS c_rmv
FROM wiki
WHERE writeID > 9999
GROUP BY page, user) src
ON mv1.page = src.page AND mv1.user = src.user
WHEN MATCHED
THEN UPDATE SET c_added = mv1.c_added + src.c_added,
c_removed = mv1.c_removed + src.c_rmv
WHEN NOT MATCHED
THEN INSERT VALUES (page, user, c_added, c_rmv);
Incremental view maintenance algorithm (example)
page user … added removed … writeID
… … … … … … …
Miley Ashu … 68 16 … 10000
Justin Zaka … 392 239 … 10000
wiki contents
New records
mv1 contents
page user c_added c_rmv
Justin Boxer 1800 25
Justin Reach 2912 42
Ke$ha Xeno 1953 17
Ke$ha Helz 3194 170
Miley Ashu 2300 50
Justin Zaka 392 239
25 © Hortonworks Inc. 2011–2018. All rights reserved
Materialized view lifecycle
26 © Hortonworks Inc. 2011–2018. All rights reserved
Management of materialized view lifecycle
• Do not accept stale data (default)
• If content of the materialized view is not fresh, we do not use it for automatic query rewriting
• Still possible to trigger partial rewritings that read both the stale materialized view and new data
from source tables
• Accept stale data
• Freshness defined as a time parameter
• If MV was not rebuilt for a certain time period and there were changes in base tables, ignore
• SET hive.materializedview.rewriting.time.window=10min;
• Can also be overriden by a certain materialized view using table properties
• Periodically rebuild materialized view, e.g., every 5 minutes
t=0min t=10min t=20min
Create MV Rebuild Rebuild Rebuild Rebuild
t=5min t=15min
27 © Hortonworks Inc. 2011–2018. All rights reserved
Road ahead
28 © Hortonworks Inc. 2011–2018. All rights reserved
Road ahead
• Improvements to current materialized views implementation
• Rewriting performance and scalability
• Single/many MVs
• Control physical distribution of data
• PARTITIONED BY,DISTRIBUTE BY, SORT BY, CLUSTER BY
• Increase incremental view maintenance coverage
• Support update/delete in source tables
• Materialized view recommender
• Ease the identification of access patterns for a given workload
29 © Hortonworks Inc. 2011–2018. All rights reserved
Thank you
https://cwiki.apache.org/confluence/display/Hive/Materialized+views

More Related Content

What's hot

Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudDataWorks Summit
 
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019alanfgates
 
Open source computer vision with TensorFlow, Apache MiniFi, Apache NiFi, Open...
Open source computer vision with TensorFlow, Apache MiniFi, Apache NiFi, Open...Open source computer vision with TensorFlow, Apache MiniFi, Apache NiFi, Open...
Open source computer vision with TensorFlow, Apache MiniFi, Apache NiFi, Open...DataWorks Summit
 
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerBreathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerDataWorks Summit
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's EvolutionDataWorks Summit
 
An Overview on Optimization in Apache Hive: Past, Present Future
An Overview on Optimization in Apache Hive: Past, Present FutureAn Overview on Optimization in Apache Hive: Past, Present Future
An Overview on Optimization in Apache Hive: Past, Present FutureDataWorks Summit/Hadoop Summit
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0DataWorks Summit
 
Enabling ABAC with Accumulo and Ranger integration
Enabling ABAC with Accumulo and Ranger integrationEnabling ABAC with Accumulo and Ranger integration
Enabling ABAC with Accumulo and Ranger integrationDataWorks Summit
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudDataWorks Summit
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseHortonworks
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3DataWorks Summit
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNDataWorks Summit
 
HDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHortonworks
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0DataWorks Summit
 
Authoring and Hosting Applications on YARN using Slider
Authoring and Hosting Applications on YARN using SliderAuthoring and Hosting Applications on YARN using Slider
Authoring and Hosting Applications on YARN using SliderDataWorks Summit
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseDataWorks Summit
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizonArtem Ervits
 

What's hot (20)

Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
 
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019
 
Open source computer vision with TensorFlow, Apache MiniFi, Apache NiFi, Open...
Open source computer vision with TensorFlow, Apache MiniFi, Apache NiFi, Open...Open source computer vision with TensorFlow, Apache MiniFi, Apache NiFi, Open...
Open source computer vision with TensorFlow, Apache MiniFi, Apache NiFi, Open...
 
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow ManagerBreathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's Evolution
 
What's New in Apache Hive
What's New in Apache HiveWhat's New in Apache Hive
What's New in Apache Hive
 
An Overview on Optimization in Apache Hive: Past, Present Future
An Overview on Optimization in Apache Hive: Past, Present FutureAn Overview on Optimization in Apache Hive: Past, Present Future
An Overview on Optimization in Apache Hive: Past, Present Future
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
 
Enabling ABAC with Accumulo and Ranger integration
Enabling ABAC with Accumulo and Ranger integrationEnabling ABAC with Accumulo and Ranger integration
Enabling ABAC with Accumulo and Ranger integration
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSense
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARN
 
HDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical Workshop
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
 
Authoring and Hosting Applications on YARN using Slider
Authoring and Hosting Applications on YARN using SliderAuthoring and Hosting Applications on YARN using Slider
Authoring and Hosting Applications on YARN using Slider
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 

Similar to Accelerating query processing

Accelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache HiveAccelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache HiveSahil Takiar
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?DataWorks Summit
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?DataWorks Summit
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?DataWorks Summit
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoDataWorks Summit
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?DataWorks Summit
 
OpenNTF Webinar, October 2020
OpenNTF Webinar, October 2020OpenNTF Webinar, October 2020
OpenNTF Webinar, October 2020Howard Greenberg
 
Liquibase – a time machine for your data
Liquibase – a time machine for your dataLiquibase – a time machine for your data
Liquibase – a time machine for your dataNeev Technologies
 
The Central View of your Data with Postgres
The Central View of your Data with PostgresThe Central View of your Data with Postgres
The Central View of your Data with PostgresEDB
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drillJulien Le Dem
 
Developing Kafka Streams Applications with Upgradability in Mind with Neil Bu...
Developing Kafka Streams Applications with Upgradability in Mind with Neil Bu...Developing Kafka Streams Applications with Upgradability in Mind with Neil Bu...
Developing Kafka Streams Applications with Upgradability in Mind with Neil Bu...HostedbyConfluent
 
Oracle DV V4 new features overview
Oracle DV V4 new features overviewOracle DV V4 new features overview
Oracle DV V4 new features overviewPhilippe Lions
 
SharePoint 2014: Where to save my data, for devs!
SharePoint 2014: Where to save my data, for devs!SharePoint 2014: Where to save my data, for devs!
SharePoint 2014: Where to save my data, for devs!Ben Steinhauser
 
Ansible Meetup FI - Ansible use cases with enterprise application
Ansible Meetup FI - Ansible use cases with enterprise application Ansible Meetup FI - Ansible use cases with enterprise application
Ansible Meetup FI - Ansible use cases with enterprise application Tanja REPO 🦊
 
Ajug hibernate-dos-donts
Ajug hibernate-dos-dontsAjug hibernate-dos-donts
Ajug hibernate-dos-dontsRoy Russo
 
Case study: Life Cycle Management for SAP BusinessObjects platform as well as...
Case study: Life Cycle Management for SAP BusinessObjects platform as well as...Case study: Life Cycle Management for SAP BusinessObjects platform as well as...
Case study: Life Cycle Management for SAP BusinessObjects platform as well as...Sebastien Goiffon
 
Interactive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using DruidInteractive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using DruidDataWorks Summit
 
Rocking the enterprise with Ruby - RubyKaigi 2010
Rocking the enterprise with Ruby - RubyKaigi 2010Rocking the enterprise with Ruby - RubyKaigi 2010
Rocking the enterprise with Ruby - RubyKaigi 2010releasebeta
 

Similar to Accelerating query processing (20)

Accelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache HiveAccelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache Hive
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
 
OpenNTF Webinar, October 2020
OpenNTF Webinar, October 2020OpenNTF Webinar, October 2020
OpenNTF Webinar, October 2020
 
Liquibase – a time machine for your data
Liquibase – a time machine for your dataLiquibase – a time machine for your data
Liquibase – a time machine for your data
 
The Central View of your Data with Postgres
The Central View of your Data with PostgresThe Central View of your Data with Postgres
The Central View of your Data with Postgres
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
 
Developing Kafka Streams Applications with Upgradability in Mind with Neil Bu...
Developing Kafka Streams Applications with Upgradability in Mind with Neil Bu...Developing Kafka Streams Applications with Upgradability in Mind with Neil Bu...
Developing Kafka Streams Applications with Upgradability in Mind with Neil Bu...
 
Oracle DV V4 new features overview
Oracle DV V4 new features overviewOracle DV V4 new features overview
Oracle DV V4 new features overview
 
SharePoint 2014: Where to save my data, for devs!
SharePoint 2014: Where to save my data, for devs!SharePoint 2014: Where to save my data, for devs!
SharePoint 2014: Where to save my data, for devs!
 
Ansible Meetup FI - Ansible use cases with enterprise application
Ansible Meetup FI - Ansible use cases with enterprise application Ansible Meetup FI - Ansible use cases with enterprise application
Ansible Meetup FI - Ansible use cases with enterprise application
 
Ajug hibernate-dos-donts
Ajug hibernate-dos-dontsAjug hibernate-dos-donts
Ajug hibernate-dos-donts
 
Case study: Life Cycle Management for SAP BusinessObjects platform as well as...
Case study: Life Cycle Management for SAP BusinessObjects platform as well as...Case study: Life Cycle Management for SAP BusinessObjects platform as well as...
Case study: Life Cycle Management for SAP BusinessObjects platform as well as...
 
Interactive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using DruidInteractive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using Druid
 
What's new in Ambari
What's new in AmbariWhat's new in Ambari
What's new in Ambari
 
Rocking the enterprise with Ruby - RubyKaigi 2010
Rocking the enterprise with Ruby - RubyKaigi 2010Rocking the enterprise with Ruby - RubyKaigi 2010
Rocking the enterprise with Ruby - RubyKaigi 2010
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 

Recently uploaded (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Accelerating query processing

  • 1. 1 © Hortonworks Inc. 2011–2018. All rights reserved Accelerating query processing with materialized views in Apache Hive Jesús Camacho Rodríguez DataWorks Summit San Jose June 19, 2018
  • 2. 2 © Hortonworks Inc. 2011–2018. All rights reserved Apache Hive • Initial use case: batch processing • Read-only data • HiveQL (SQL-like query language) • MapReduce • Effort to take Hive beyond its batch processing roots • Started in Apache Hive 0.10.0 (January 2013) • Latest release: Apache Hive 3.0 (May 2018) • Extensive renovation to improve three different axes • Latency: allow interactive and sub-second queries • Scalability: from TB to PB of data • SQL support: move from HiveQL to SQL standard
  • 3. 3 © Hortonworks Inc. 2011–2018. All rights reserved Apache Hive • Multiple execution engines: Apache Tez and Apache Spark • More efficient join execution algorithms • Vectorized query execution • Integration with columnar storage formats: Apache ORC, Apache Parquet • LLAP (Live Long and Process) • Persistent deamons for low-latency queries • Rule-based and cost-based optimizer • Better statistics • Tighter integration with other data processing systems: Druid Important internals improvements
  • 4. 4 © Hortonworks Inc. 2011–2018. All rights reserved Accelerating query processing • Change data physical properties (distribute, sort) • Filter rows • Denormalize • Preaggregate Optimization based on access patterns
  • 5. 5 © Hortonworks Inc. 2011–2018. All rights reserved Accelerating query processing • Establish relationship between original and new tables • Has a similar table already been created? • Rewrite your queries to use new tables • What happens when access patterns change? • Maintain your new tables when original tables change • Do I have to fully rebuild new tables? Optimization based on access patterns Currently, Hive users have to do it manually
  • 6. 6 © Hortonworks Inc. 2011–2018. All rights reserved Materialized views • A materialized view is an entity that contains the result of a evaluating a query • Important property  Awareness of the materialized view definition semantics • Optimizer can exploit them for automatic query rewriting • System can handle maintenance of the materialized views • Generally, materializations can be created in different forms depending on the scope • DBA writes “CREATE MATERIALIZED VIEW” statement • Daemon creates materialized view based on recent query activity • Cached result of previous similar query • Query factorization identifies common pieces within a single query
  • 7. 7 © Hortonworks Inc. 2011–2018. All rights reserved Possible workflow 1. Create materialized view using Hive tables • Stored by Hive or Druid 2. User or dashboard sends queries to Hive • Hive rewrites queries using available materialized views • Execute rewitten query Dashboards, BI tools CREATE MATERIALIZED VIEW `ssb_mv` STORED AS 'org.apache.hadoop.hive.druid.DruidStorageHandler' ENABLE REWRITE AS <query>; DBA, recommendation system ① ② Data Queries
  • 8. 8 © Hortonworks Inc. 2011–2018. All rights reserved Materialized views in Apache Hive • First implementation is part of Apache Hive 3.0 • Multiple storage options: Hive, Druid • Automatic rewriting of incoming queries to use materialized views • Efficient view maintenance • Incremental refresh • Multiple options to control materialized views lifecycle
  • 9. 9 © Hortonworks Inc. 2011–2018. All rights reserved Management of materialized views in Hive
  • 10. 10 © Hortonworks Inc. 2011–2018. All rights reserved Materialized view creation • CREATE MATERIALIZED VIEW statement CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_view_name [DISABLE REWRITE] [COMMENT materialized_view_comment] [ [ROW FORMAT row_format] [STORED AS file_format] | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] ] [LOCATION hdfs_path] [TBLPROPERTIES (property_name=property_value, ...)] AS <query>; ⇢ Supports custom table properties, storage format, etc.
  • 11. 11 © Hortonworks Inc. 2011–2018. All rights reserved Materialized view creation (stored in Druid) • CREATE MATERIALIZED VIEW statement CREATE MATERIALIZED VIEW druid_wiki_mv STORED AS 'org.apache.hadoop.hive.druid.DruidStorageHandler' AS SELECT __time, page, user, c_added, c_removed FROM src; Hive materialized view name Hive storage handler classname
  • 12. 12 © Hortonworks Inc. 2011–2018. All rights reserved Other operations for materialized view management DROP MATERIALIZED VIEW [db_name.]materialized_view_name; SHOW MATERIALIZED VIEWS [IN database_name] ['identifier_with_wildcards’]; DESCRIBE [EXTENDED | FORMATTED] [db_name.]materialized_view_name; ⇢ More operations to be added and extended
  • 13. 13 © Hortonworks Inc. 2011–2018. All rights reserved Materialized view-based query rewriting
  • 14. 14 © Hortonworks Inc. 2011–2018. All rights reserved Materialized view-based rewriting algorithm • Automatically rewrite incoming queries using materialized views • Optimizer exploits materialized view definition semantics • Built on the ideas presented in [GL01] using Apache Calcite • Supports queries containing TableScan, Project, Filter, Join, Aggregate operators • Includes some extensions • Generation of additional rewritings without needing to do join permutation • Partial rewritings using union operators • More information about the rewriting coverage • http://calcite.apache.org/docs/materialized_views#rewriting-using-plan-structural-information [GL01] Jonathan Goldstein and Per-åke Larson. Optimizing queries using materialized views: A practical, scalable solution. In Proc. ACM SIGMOD Conf., 2001.
  • 15. 15 © Hortonworks Inc. 2011–2018. All rights reserved Enable materialized view-based rewriting • Global property to enable materialized view rewriting for queries SET hive.materializedview.rewriting=true; • User can selectively use enable/disable materialized views for rewriting • Materialized views are enabled by default for rewriting • Behavior can be altered after materialized view has been created ALTER MATERIALIZED VIEW [db_name.]materialized_view_name ENABLE|DISABLE REWRITE;
  • 16. 16 © Hortonworks Inc. 2011–2018. All rights reserved depts Materialized view-based rewriting (example) • Materialized view definition Employees that were hired after 2016 CREATE MATERIALIZED VIEW mv AS SELECT empid, deptname, hire_date FROM emps JOIN depts ON (emps.deptno = depts.deptno) WHERE hire_date >= '2016-01-01'; • Query Employees that were hired last quarter SELECT empid, deptname FROM emps JOIN depts ON (emps.deptno = depts.deptno) WHERE hire_date >= '2018-01-01' AND hire_date <= '2018-03-31'; • Materialized view-based rewriting SELECT empid, deptname FROM mv WHERE hire_date >= '2018-01-01' AND hire_date <= '2018-03-31'; deptsemps empid depname hire_date 10001 IT 2016-03-01 10002 IT 2017-01-02 10003 HR 2017-07-01 10004 Finance 2018-01-15 10005 HR 2018-02-02 mv contents empid depname 10004 Finance 10005 HR Query results
  • 17. 17 © Hortonworks Inc. 2011–2018. All rights reserved Materialized view-based rewriting (example 2) • Materialized view definition CREATE MATERIALIZED VIEW mv AS SELECT <dims>, lo_revenue, lo_extendedprice * lo_discount AS d_price, lo_revenue - lo_supplycost FROM customer, dates, lineorder, part, supplier WHERE lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and lo_custkey = c_custkey; • Query SELECT sum(lo_extendedprice * lo_discount) FROM lineorder, dates WHERE lo_orderdate = d_datekey and d_year = 2013 and lo_discount between 1 and 3; • Materialized view-based rewriting SELECT SUM(d_price) FROM mv WHERE d_year = 2013 and lo_discount between 1 and 3; supplier part dates customerlineorder Exploit SQL PK-FK and NOT NULL constraints d_year lo_discount <dims> d_price 2013 2 ... 7.55 2014 4 ... 432.60 2013 2 ... 34.45 2012 2 ... 2.05 … … ... … mv contents sum 42.0 … Query results
  • 18. 18 © Hortonworks Inc. 2011–2018. All rights reserved Materialized view-based rewriting (example 3) • Materialized view definition CREATE MATERIALIZED VIEW mv AS SELECT floor(time to minute), page, SUM(added) AS c_added, SUM(removed) AS c_rmv FROM wiki GROUP BY floor(time to minute), page; • Query SELECT floor(time to month), SUM(added) AS c_added FROM wiki GROUP BY floor(time to month); • Materialized view-based rewriting SELECT floor(time to month), SUM(c_added) as c_added FROM mv GROUP BY floor(time to month); wiki __time page c_added c_rmv 2011-01-01 01:05:00 Justin 1800 25 2011-01-20 19:00:00 Justin 2912 42 2011-01-01 11:06:00 Ke$ha 1953 17 2011-02-02 13:15:00 Ke$ha 3194 170 2011-01-02 18:00:00 Miley 2232 34 mv contents __time c_added 2011-01-01 00:00:00 8897 2011-02-01 00:00:00 3194 Query results
  • 19. 19 © Hortonworks Inc. 2011–2018. All rights reserved Materialized view maintenance
  • 20. 20 © Hortonworks Inc. 2011–2018. All rights reserved Rebuilding materialized views • Rebuild needs to be triggered manually by user ALTER MATERIALIZED VIEW [db_name.]materialized_view_name REBUILD; • Incremental materialized view maintenance • Only refresh data that has changed in source tables • Multiple benefits • Decrease rebuild step execution time • Preserves LLAP cache for existing data • Materialized view should only use transactional tables (micromanaged or ACID) • Current implementation only supports incremental rebuild for insert operations • Update/delete operations force full rebuild • Optimizer will attempt incremental rebuild • Otherwise, fallback to full rebuild (INSERT OVERWRITE with MV definition)
  • 21. 21 © Hortonworks Inc. 2011–2018. All rights reserved Incremental view maintenance algorithm • Relies on materialized view rewriting algorithm • Materialized view stores write ID for its tables when it is created/refreshed • Write ID associates rows with transactions • When rebuild is triggered, introduce filter condition on write ID column in MV definition • Read only new rows from source tables • Execute materialized view rewriting • Rewrite INSERT OVERWRITE (full rebuild) into more efficient plan • INSERT (table scan, filter, project, join) • MERGE (table scan, filter, project, join, aggregate)
  • 22. 22 © Hortonworks Inc. 2011–2018. All rights reserved CREATE MATERIALIZED VIEW mv1 AS SELECT page, user, SUM(added) AS c_added, SUM(removed) AS c_rmv FROM wiki GROUP BY page, user; Incremental view maintenance algorithm (example) mv1 contents page user c_added c_rmv Justin Boxer 1800 25 Justin Reach 2912 42 Ke$ha Xeno 1953 17 Ke$ha Helz 3194 170 Miley Ashu 2232 34 page user … added removed … writeID … … … … … … … Miley Ashu … 68 16 … 10000 Justin Zaka … 392 239 … 10000 wiki contents New records ⇢ ALTER MATERIALIZED VIEW mv1 REBUILD;
  • 23. 23 © Hortonworks Inc. 2011–2018. All rights reserved CREATE MATERIALIZED VIEW mv1 AS SELECT page, user, SUM(added) AS c_added, SUM(removed) AS c_rmv FROM wiki GROUP BY page, user; ① Rebuild statement rewriting INSERT OVERWRITE mv1 SELECT page, user, SUM(added) AS c_added, SUM(removed) AS c_rmv FROM ( SELECT page, user, c_added, c_removed FROM mv1 UNION ALL SELECT page, user, SUM(added) AS c_added, SUM(removed) AS c_rmv FROM wiki WHERE writeID > 9999 GROUP BY page, user) subq GROUP BY page, user; Incremental view maintenance algorithm (example) Rollup data mv1 contents page user c_added c_rmv Justin Boxer 1800 25 Justin Reach 2912 42 Ke$ha Xeno 1953 17 Ke$ha Helz 3194 170 Miley Ashu 2232 34 page user … added removed … writeID … … … … … … … Miley Ashu … 68 16 … 10000 Justin Zaka … 392 239 … 10000 wiki contents New records
  • 24. 24 © Hortonworks Inc. 2011–2018. All rights reserved CREATE MATERIALIZED VIEW mv1 AS SELECT page, user, SUM(added) AS c_added, SUM(removed) AS c_rmv FROM wiki GROUP BY page, user; ② Rewrite INSERT OVERWRITE into MERGE statement MERGE INTO mv1 USING ( SELECT page, user, SUM(added) AS c_added, SUM(removed) AS c_rmv FROM wiki WHERE writeID > 9999 GROUP BY page, user) src ON mv1.page = src.page AND mv1.user = src.user WHEN MATCHED THEN UPDATE SET c_added = mv1.c_added + src.c_added, c_removed = mv1.c_removed + src.c_rmv WHEN NOT MATCHED THEN INSERT VALUES (page, user, c_added, c_rmv); Incremental view maintenance algorithm (example) page user … added removed … writeID … … … … … … … Miley Ashu … 68 16 … 10000 Justin Zaka … 392 239 … 10000 wiki contents New records mv1 contents page user c_added c_rmv Justin Boxer 1800 25 Justin Reach 2912 42 Ke$ha Xeno 1953 17 Ke$ha Helz 3194 170 Miley Ashu 2300 50 Justin Zaka 392 239
  • 25. 25 © Hortonworks Inc. 2011–2018. All rights reserved Materialized view lifecycle
  • 26. 26 © Hortonworks Inc. 2011–2018. All rights reserved Management of materialized view lifecycle • Do not accept stale data (default) • If content of the materialized view is not fresh, we do not use it for automatic query rewriting • Still possible to trigger partial rewritings that read both the stale materialized view and new data from source tables • Accept stale data • Freshness defined as a time parameter • If MV was not rebuilt for a certain time period and there were changes in base tables, ignore • SET hive.materializedview.rewriting.time.window=10min; • Can also be overriden by a certain materialized view using table properties • Periodically rebuild materialized view, e.g., every 5 minutes t=0min t=10min t=20min Create MV Rebuild Rebuild Rebuild Rebuild t=5min t=15min
  • 27. 27 © Hortonworks Inc. 2011–2018. All rights reserved Road ahead
  • 28. 28 © Hortonworks Inc. 2011–2018. All rights reserved Road ahead • Improvements to current materialized views implementation • Rewriting performance and scalability • Single/many MVs • Control physical distribution of data • PARTITIONED BY,DISTRIBUTE BY, SORT BY, CLUSTER BY • Increase incremental view maintenance coverage • Support update/delete in source tables • Materialized view recommender • Ease the identification of access patterns for a given workload
  • 29. 29 © Hortonworks Inc. 2011–2018. All rights reserved Thank you https://cwiki.apache.org/confluence/display/Hive/Materialized+views