© 2017 Pivotal Software, Inc. All rights reserved.
Querying Unmanaged Data
HAWQ meets Hive
Shivram Mani
Oleksandr Diachenko
Agenda
● Overview of Apache HAWQ (incubating)
● HAWQ Architecture
● HAWQ Extension Framework
● HAWQ Hive Integration
● HAWQ HCatalog Integration
Apache HAWQ’s Lineage
[Timeline, 1986 to 2015]
● Postgres developed at UC Berkeley
● Postgres adds support for SQL
● Open Source PostgreSQL
● PostgreSQL 7.0 released
● PostgreSQL 8.0 released
● Greenplum based on PostgreSQL
● Hadoop 1.0 Released
● HAWQ project launched
● Hadoop 2.0 Released
● HAWQ goes open-source (Apache)
HAWQ Overview
Advanced Analytics MPP Database for Enterprises
Multi-level Fault Tolerance
Granular Authorization
Resource Mgmt (+ YARN)
Multi-tenancy + Security
ANSI SQL Standard
OLAP Extensions
JDBC ODBC Connectivity
Online Expansion
Hadoop / HDFS Operations
Cost Based Optimizer (ORCA)
Dynamic Pipelining
ACID + Transactional
MPP Architecture
Data Federation
Language Extensions
Extensibility
HDFS Native File Formats
Compression + Partitioning
Core
Connectivity
Ambari Management
Machine Learning
- Enable Data Science
- Large Scale Analytics
- Query All Data Types & sources
- Manage Multiple Workloads
- Security controls
- Well Integrated
- Leverage Existing SQL Skills & BI Tools
- High-performance
HAWQ Components
HAWQ Master (1): Metadata, Transaction Mgr., Query Parser, Query Optimizer, Resource Mgr., NN cache, Query Dispatch, Fault Tolerant Svc
HAWQ Standby Master (1)
HAWQ Segment (1..N): Postmaster, Local directory (Temp Data / Logs), Virtual Segments (Query Executors), libhdfs3
Datanode, YARN NM
Query Execution (Native)
[Diagram: the HAWQ Master (Postmaster, Metadata, Transaction Mgr., Query Parser, Query Optimizer, Resource Mgr., NN Cache) shown with the NameNode and YARN RM; Server 1, Server 2 ... Server N each host a HAWQ Segment (Postmaster, local directory) and an HDFS Datanode; the nodes are linked by the Interconnect.]
Query Execution - Plan
[Diagram: same cluster layout; Query Dispatch now appears on the HAWQ Master.]
Query Execution - Resource
[Diagram: same cluster layout, with five virtual segments (VS) placed on the servers.]
Resource request: "I need 5 containers, each with 1 CPU core and 1 GB RAM"
Allocation:
● Server 1: 2 containers
● Server 2: 1 container
● Server N: 2 containers
VS = Virtual Segment (container for Query Executors)
# of QEs in a v-seg = # of slices in a query
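The sizing rule on this slide can be checked with a little arithmetic. The sketch below is illustrative only: the function names are invented for this example and are not HAWQ APIs, the slice count is a made-up value, and it assumes (as the diagram suggests) that each granted container hosts one virtual segment.

```python
# Illustrative arithmetic only; names and the slice count are assumptions,
# not HAWQ internals.

# Container placement from the slide: 5 containers of 1 CPU core / 1 GB RAM.
containers_per_server = {"Server 1": 2, "Server 2": 1, "Server N": 2}


def total_virtual_segments(placement):
    """Assume one virtual segment (VS) per granted container."""
    return sum(placement.values())


def total_query_executors(placement, slices):
    """Slide rule: # of QEs in a v-seg = # of slices in the query."""
    return total_virtual_segments(placement) * slices


hypothetical_slices = 3  # a plan with three slices, chosen for the example
print(total_virtual_segments(containers_per_server))                      # 5
print(total_query_executors(containers_per_server, hypothetical_slices))  # 15
```

With the slide's placement and a three-slice plan, the cluster would run 15 query executors in total, two virtual segments' worth on Server 1 and Server N and one on Server 2.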
Query Execution - Prepare
[Diagram: same cluster layout with five virtual segments (VS) across Server 1, Server 2 and Server N.]
Query Execution - Execute
[Diagram: same cluster layout with five virtual segments (VS) across Server 1, Server 2 and Server N.]
Query Execution - Result
[Diagram: same cluster layout with five virtual segments (VS) across Server 1, Server 2 and Server N.]
Reasons why HAWQ is high-performance
● Highly efficient MPP (massively parallel processing) heritage and architecture
● Dynamic pipelining, no intermediate writes to disk
● Advanced cost-based optimizer
● Scalable and fast Interconnect
● Native (C++) HDFS access/scan speed
● HDFS metadata cache
● Optimal data locality matching methods
TPC-DS benchmark
TPC-DS Queries with 5-Users
• HAWQ ~1.3x faster
• Competing MPP Hadoop engine failed to complete 47% of the queries (unmodified)
* Queries that did not complete are omitted from results on both platforms
[Chart: runtime in seconds, plus a grid of TPC-DS queries 1 to 99 marking the test queries that failed in the other engine. Failure categories: Unsupported SQL, Long running killed, Memory Limit Exceeded.]
Managed vs Unmanaged data
Managed data
Unmanaged data
Metadata Metadata
???
HAWQ eXtension Framework (aka PXF)
● Uniform tabular view to heterogeneous data sources
● Exploits parallelism for data access
● Pluggable framework for custom connectors (profiles)
● Built-in connectors for various data sources/formats
[Diagram labels: External Tables, REST API, Tomcat (Webapp), Java API, Java/Thrift]
● JDBC
● Solr
● Redis
● Cassandra
● GemfireXD
PXF Architecture
➔ Independent JVM
➔ Runs alongside namenode and datanodes
PXF
Query Execution (External Data)
[Diagram: HAWQ Master and NameNode; Server 1, Server 2 ... Server N each host a HAWQ Segment (Postmaster, local directory) and an HDFS Datanode.]
Query Planning - Distribution
[Diagram: same cluster layout, with PXF added.]
Get Partition Metadata: {P1, P2, P3, P4, P5}
Planner
Partition Mapper: {P1, P4} {P5} {P2, P3}
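The partition mapping step can be sketched as a small locality-aware assignment. This is not HAWQ's actual planner logic: the greedy rule, the function name `map_fragments`, the host names and the replica locations are all assumptions, chosen so that the result matches the grouping shown on the slide.

```python
# A minimal sketch of locality-aware fragment assignment.
# The algorithm and all names here are hypothetical, not HAWQ/PXF code.

def map_fragments(fragment_hosts, segment_hosts):
    """Assign each fragment to the least-loaded segment host that holds a
    replica of it; fall back to any segment host if none is local."""
    assignment = {host: [] for host in segment_hosts}
    for fragment, replica_hosts in fragment_hosts.items():
        local = [h for h in replica_hosts if h in assignment]
        candidates = local or list(segment_hosts)
        # min() returns the first host with the smallest current load.
        target = min(candidates, key=lambda h: len(assignment[h]))
        assignment[target].append(fragment)
    return assignment


# Hypothetical replica locations for the five partitions on the slide.
fragment_hosts = {
    "P1": ["server1"],
    "P2": ["serverN"],
    "P3": ["serverN"],
    "P4": ["server1"],
    "P5": ["server2", "server1"],
}

mapping = map_fragments(fragment_hosts, ["server1", "server2", "serverN"])
print(mapping)
```

P5 has replicas on two hosts in this made-up layout; the sketch sends it to `server2` because that host has no work yet, which yields the slide's grouping of {P1, P4}, {P5} and {P2, P3}.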
Query Execution - Read
[Diagram: same cluster layout; five virtual segments (VS) read partitions P1 to P5 through the PXF instances on the servers.]
Query Execution - Result
[Diagram: same cluster layout with five virtual segments (VS) and PXF; a Global Aggregate step is shown.]
HAWQ-Hive Data Integration
HiveRC
➢ Works for RCFile format
Hive
➢ Works for heterogeneous tables
➢ Supports all formats
➢ Unoptimized
HiveText
➢ Works fast for text data
➢ Lazy data resolution
➢ Only text datatypes are supported
HiveORC
➢ Optimized for ORC data
➢ Leverages predicate pushdown
➢ Column projection
HiveVectorizedORC
➢ Uses ORC Batch API
➢ Sends 1024-row batches to HAWQ
➢ Enables Vectorized Execution
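To illustrate the difference between handing rows over one at a time and handing them over in 1024-row batches, here is a minimal sketch. It does not use the ORC Batch API; the `batches` helper is a hypothetical stand-in that only shows the grouping, with the batch size taken from the slide.

```python
from itertools import islice

BATCH_SIZE = 1024  # batch size quoted on the slide


def batches(rows, size=BATCH_SIZE):
    """Yield lists of up to `size` rows instead of one row at a time."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# 2500 rows become three hand-offs instead of 2500.
batch_sizes = [len(batch) for batch in batches(range(2500))]
print(batch_sizes)  # [1024, 1024, 452]
```

Fewer, larger hand-offs reduce per-row overhead, which is the motivation the slide gives for the vectorized profile.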
HAWQ-Hive ORC Optimizations
SELECT col1, col2 FROM tab1
WHERE col3 = 'abc';
SELECT COUNT(*) FROM tab1
WHERE col3 = 'abc';
[Diagram: Query Dispatch on the HAWQ Master, a HAWQ Segment (Postmaster), PXF and the ORC API over a file with columns col1; col2; col3; col4.]
Passed to PXF:
● column attributes: col1, col2
● predicate: RPNF {filter(s)}
● aggregate functions
Passed to the ORC API: {col1, col2; col3='abc'}
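The slide labels the pushed-down predicate `RPNF`. Assuming this stands for a filter in reverse Polish (postfix) notation, the sketch below shows how such a filter can be evaluated against a row with a stack. The token format used here is invented for illustration and is not PXF's actual filter encoding.

```python
import operator

# Hypothetical token format: ("col", name), ("const", value), ("op", symbol).
OPERATORS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
}


def evaluate_rpn(tokens, row):
    """Evaluate a postfix filter against one row (a dict of column values)."""
    stack = []
    for kind, value in tokens:
        if kind == "col":
            stack.append(row[value])
        elif kind == "const":
            stack.append(value)
        elif kind == "op":
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[value](left, right))
        else:
            raise ValueError(f"unknown token kind: {kind}")
    if len(stack) != 1:
        raise ValueError("malformed filter")
    return stack[0]


# WHERE col3 = 'abc' in postfix form: col3 'abc' =
col3_is_abc = [("col", "col3"), ("const", "abc"), ("op", "=")]

rows = [
    {"col1": 1, "col2": "x", "col3": "abc"},
    {"col1": 2, "col2": "y", "col3": "zzz"},
]

# Apply the filter, then project only the requested columns col1 and col2.
selected = [(r["col1"], r["col2"]) for r in rows if evaluate_rpn(col3_is_abc, r)]
print(selected)  # [(1, 'x')]
```

A postfix form needs no parentheses or precedence rules, so a reader on the receiving side can evaluate or translate it in a single left-to-right pass.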
Optimizations
Statistics
● Exposing statistics about unmanaged tables
● Optimized query plan
Column projection
● Passing requested columns
● Disk I/O is optimized if data format allows
Predicate pushdown
● Passing down predicates from the WHERE clause through the PXF framework
● Partitions/stripes/files elimination
Batches vs tuples
● HiveText
● HiveVectorizedORC
● Lazy data resolution
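Stripe elimination and column projection can be sketched together. ORC files keep min/max statistics per stripe, which lets a reader skip stripes that cannot match a predicate. The code below imitates that idea with plain Python objects; the `Stripe` class, the sample data and the function names are assumptions made for this example, and no ORC library is involved.

```python
from dataclasses import dataclass


@dataclass
class Stripe:
    """Toy stand-in for an ORC stripe with min/max statistics on col3."""
    name: str
    col3_min: str
    col3_max: str
    rows: list


def stripes_to_read(stripes, wanted):
    """Keep only stripes whose [min, max] range can contain `wanted`."""
    return [s for s in stripes if s.col3_min <= wanted <= s.col3_max]


def scan(stripes, wanted, columns):
    """Read surviving stripes, filter on col3, and project `columns`."""
    result = []
    for stripe in stripes_to_read(stripes, wanted):
        for row in stripe.rows:
            if row["col3"] == wanted:
                result.append(tuple(row[c] for c in columns))
    return result


stripes = [
    Stripe("stripe-0", "aaa", "abz", [
        {"col1": 1, "col2": "x", "col3": "aaa"},
        {"col1": 2, "col2": "y", "col3": "abc"},
    ]),
    Stripe("stripe-1", "mno", "xyz", [
        {"col1": 3, "col2": "z", "col3": "mno"},
        {"col1": 4, "col2": "w", "col3": "xyz"},
    ]),
]

read = [s.name for s in stripes_to_read(stripes, "abc")]
matches = scan(stripes, "abc", ["col1", "col2"])
print(read)     # ['stripe-0']
print(matches)  # [(2, 'y')]
```

Only `stripe-0` is opened for `col3 = 'abc'`, and only `col1` and `col2` are returned, mirroring the two savings the slide lists: less data scanned and fewer columns shipped.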
HAWQ-Hive Catalog Integration
Was:
CREATE EXTERNAL TABLE items (column1 int, column2 text)
LOCATION ('pxf://namenode:51200/customer_db?PROFILE=Hive')
FORMAT 'custom' (formatter='pxfwritable_import');
SELECT * FROM items;
● Need to create external HAWQ table
● Users need to know HAWQ-Hive data type mapping
● Need to keep both tables' metadata in sync manually
Wanted:
SELECT * FROM items;
● No need to create external HAWQ table
● Users don't need to know about HAWQ-Hive data type mapping, etc.
● Metadata is always up to date
Challenges with Catalog Unification
Hive Catalog
Challenges with Catalog Unification
HAWQ Catalog
Where to store HCatalog data in HAWQ
Requires few HAWQ changes
Getting all catalog utilities for free
Catalog is polluted with external data
HCatalog objects are visible to concurrent sessions
Session-level isolation
Cheap cleanup process
HAWQ Catalog service needs to be changed to be able to work with disk/memory
Catalog utilities need to be modified to work with HCatalog objects
Object namespaces
[Diagram: object ID axis marked 0, 2^32 and 10*2^20]
HAWQ objects: Global counter (Persistent)
HCatalog objects: Session 1 counter, Session 2 counter ... Session N counter (each In-memory)
Session states are isolated
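The namespace split can be sketched as two kinds of ID counters. The extracted slide keeps the markers 2^32 and 10*2^20 but not where the boundary sits, so the layout below is an assumption: it reserves the top 10*2^20 IDs below 2^32 for HCatalog objects. The class and constant names are hypothetical and are not taken from the HAWQ source.

```python
MAX_OID = 2**32
HCATALOG_RANGE = 10 * 2**20                     # assumed size of the reserved range
FIRST_HCATALOG_OID = MAX_OID - HCATALOG_RANGE   # assumed position: top of the ID space


class GlobalOidCounter:
    """Stands in for the persistent, shared counter used for HAWQ objects."""

    def __init__(self, start=1):
        self._next = start

    def next_oid(self):
        if self._next >= FIRST_HCATALOG_OID:
            raise RuntimeError("HAWQ object ID range exhausted")
        oid = self._next
        self._next += 1
        return oid


class SessionOidCounter:
    """In-memory counter for HCatalog objects; one instance per session."""

    def __init__(self):
        self._next = FIRST_HCATALOG_OID

    def next_oid(self):
        if self._next >= MAX_OID:
            raise RuntimeError("HCatalog object ID range exhausted")
        oid = self._next
        self._next += 1
        return oid


def is_hcatalog_oid(oid):
    """True if the ID falls in the range reserved for HCatalog objects."""
    return FIRST_HCATALOG_OID <= oid < MAX_OID


session_1 = SessionOidCounter()
session_2 = SessionOidCounter()
first_in_session_1 = session_1.next_oid()
first_in_session_2 = session_2.next_oid()
hawq_oid = GlobalOidCounter().next_oid()
```

Both sessions hand out the same first ID without conflict because each counter lives only in its own session's memory, while HAWQ objects draw from a single counter that never enters the reserved range.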
HAWQ-HCatalog Integration
SELECT * FROM hcatalog.default.weblogs
WHERE ts between '2015-09-01' and '2015-09-30';
SELECT COUNT(*) FROM hcatalog.default.weblogs
WHERE ts between '2015-09-01' and '2015-09-30';
[Diagram: HIVE / HCAT holds the table Weblogs (id double, ts timestamp, ...). In HAWQ, the HAWQ Catalog service sits over the In Memory Catalog and the Disk Heap Catalog; the Weblogs definition appears in the In Memory Catalog. PXF instances connect HAWQ to Hive.]
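The lookup of a name such as `hcatalog.default.weblogs` can be sketched as an on-demand fetch into a per-session, in-memory catalog. Everything here is hypothetical: `SessionCatalog`, `fake_hive_lookup` and the cache lifetime are assumptions made for illustration, and the real metadata request goes through PXF rather than a Python function.

```python
class SessionCatalog:
    """Per-session, in-memory store of table metadata fetched on demand."""

    def __init__(self, fetch_metadata):
        # fetch_metadata(database, table) -> dict of column name -> type
        self._fetch = fetch_metadata
        self._tables = {}
        self.fetches = 0

    def resolve(self, qualified_name):
        parts = qualified_name.split(".")
        if len(parts) != 3 or parts[0] != "hcatalog":
            raise ValueError("expected hcatalog.<database>.<table>")
        key = (parts[1], parts[2])
        if key not in self._tables:
            self._tables[key] = self._fetch(*key)
            self.fetches += 1
        return self._tables[key]


def fake_hive_lookup(database, table):
    """Stand-in for the metadata request sent to Hive; data from the slide."""
    hive_tables = {("default", "weblogs"): {"id": "double", "ts": "timestamp"}}
    return hive_tables[(database, table)]


catalog = SessionCatalog(fake_hive_lookup)
columns = catalog.resolve("hcatalog.default.weblogs")
columns_again = catalog.resolve("hcatalog.default.weblogs")
print(columns)          # {'id': 'double', 'ts': 'timestamp'}
print(catalog.fetches)  # 1
```

No external table is created in this flow: the column definitions come from Hive when the name is first referenced, and nothing is written to the on-disk catalog.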
Apache Hive & HAWQ (via HDB)
The Most Comprehensive SQL on Hadoop
Avoid data duplication:
All processing engines point to the same copy of data
Right Tool for the Job:
Choose the right SQL engine based on your application’s needs.
⬢ Apache HAWQ
● MPP engine from the core
● Easy transition from Traditional DB/Warehouse
● Ad-hoc Analytics, BI & Visualization
● Low Query Latency
● Scales from 100s of TB to low PBs
● Machine Learning (MADlib)
⬢ Apache Hive
● Holds very detailed information
● Integrates all data sources
● Low-Mid Query Latency
● Scales to 100s of petabytes
● Large Community
Run HAWQ & Hive alongside!
Additional Resources
Documentation (Wiki/Docs): HAWQ Homepage, Getting Started, HAWQ Wiki, PXF Wiki, Sandbox
Code (Github, Apache): github.com/apache/incubator-hawq
Join Discussion/Ask Questions (Apache DLs): dev@hawq.incubator.apache.org, user@hawq.incubator.apache.org
Additional Slides
© 2016 Pivotal Software, Inc. All rights reserved.
[Diagram: HAWQ Resource Manager and its integration with YARN]
Components: LIBYARN Resource Broker, libyarn, Resource pool, YARN Resource Manager, YARN Node Manager, HAWQ Segment, segments, Global RM container Lifecycle Manager, Indexed Resource Quota Table, HAWQ resource queue manager, Query Quota Calculator, Queue Quota Calculator, Query Resource Request Queuing Facility, HAWQ Query Dispatcher
Reports fetched from YARN: Container report, Cluster report, Queue report
Annotations:
● Resource broker uses libyarn (a C/C++ version library) to communicate with YARN through protobuf.
● Register HAWQ as an unmanaged application exclusively consuming a YARN queue
● Periodically fetch YARN cluster report, container report and queue report to recognize YARN cluster
● Acquire YARN containers with host preference information
● Return YARN containers
● Unregister HAWQ in YARN
● Add activated YARN containers’ quota
● Return YARN containers’ quota
● Accepted YARN container quota
● To be returned YARN containers’ quota
● Active YARN containers with resource holding processes started
● Increase HAWQ segment resource quota when new global resource manager containers are allocated
● Decrease HAWQ segment resource quota when some global resource manager containers are to be kicked
● SQL statement
● Acquire/Return query resource
● Acquire calculated resource quota or return unused query resource
● Allocated query resource
● Drive resource broker to acquire global resource manager containers. The quota of a global resource manager container can be (1GB, 1core), (2GB, 1core), etc.
● Allocate virtual segments with fixed resource quota assigned and dispatch workload to segments. The resource quota can be as small as 128MB or 256MB and as large as GBs.
Resource Management
HAWQ Resource Manager
• Responsibility
– Responsible for acquiring & returning CPU/Mem resources from/to YARN
– Responsible for resource allocation among HAWQ users and queries
• Master resource manager process
– Resource negotiation with YARN and resource allocation
– Manage and maintain the resources in resource pool
– Handle resource allocation/return RPC requests from QD (query dispatcher)
– Fault tolerance service is in the same process
• Segment resource manager process
– One HAWQ RM on each Segment
– Negotiation with Master resource manager (for resource enforcement)
– Fault tolerance service: Heartbeat sender
SQL on Hadoop benchmark
PXF Data Flow
PXF Data Model
Putting it all together
Summary of HAWQ user experience (via HDB)
● External Data (PXF): Parallelized access to external data sources (read/write).
● Install and Configure: Ambari to deploy and manage HAWQ, just like any other Hadoop service.
● Manage Resources: YARN-integrated for dynamic resource allocation across hierarchical groups.
● Write Queries (ORCA): Advanced optimizer and dynamic pipelining for high-performance response.
● Enable Data Science: In-database machine learning algorithms for predictive analytics.
● Extend Data Processing: Procedural language extensions for custom application logic.

More Related Content

What's hot

Realizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache BeamRealizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache BeamDataWorks Summit
 
Schema Registry - Set Your Data Free
Schema Registry - Set Your Data FreeSchema Registry - Set Your Data Free
Schema Registry - Set Your Data FreeDataWorks Summit
 
Dynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the flyDynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the flyDataWorks Summit
 
Cloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsCloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsDataWorks Summit
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data InsightsDataWorks Summit
 
The Future of Apache Ambari
The Future of Apache AmbariThe Future of Apache Ambari
The Future of Apache AmbariDataWorks Summit
 
Securing data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerSecuring data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerDataWorks Summit
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsDataWorks Summit/Hadoop Summit
 
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...DataWorks Summit
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Chris Nauroth
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningEvans Ye
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateDataWorks Summit
 
Druid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDruid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDataWorks Summit
 
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...DataWorks Summit
 
The Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingThe Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingDataWorks Summit
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...DataWorks Summit
 

What's hot (20)

Realizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache BeamRealizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache Beam
 
Schema Registry - Set Your Data Free
Schema Registry - Set Your Data FreeSchema Registry - Set Your Data Free
Schema Registry - Set Your Data Free
 
Dynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the flyDynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the fly
 
Cloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsCloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerations
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data Insights
 
The Future of Apache Ambari
The Future of Apache AmbariThe Future of Apache Ambari
The Future of Apache Ambari
 
Securing data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerSecuring data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache Ranger
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioning
 
Deep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profitDeep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profit
 
Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
Securing Spark Applications
Securing Spark ApplicationsSecuring Spark Applications
Securing Spark Applications
 
Druid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDruid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best Practices
 
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger...
 
The Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingThe Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral Processing
 
Row/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache SparkRow/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache Spark
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
 

Similar to HAWQ meets Hive: Querying Unmanaged Data

Hawq meets Hive - DataWorks San Jose 2017
Hawq meets Hive - DataWorks San Jose 2017Hawq meets Hive - DataWorks San Jose 2017
Hawq meets Hive - DataWorks San Jose 2017Alex Diachenko
 
Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개Seungdon Choi
 
Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Ashish Narasimham
 
SQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQSQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQpivotalny
 
Coherence RoadMap 2018
Coherence RoadMap 2018Coherence RoadMap 2018
Coherence RoadMap 2018harvraja
 
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015NoSQLmatters
 
SQL and Machine Learning on Hadoop
SQL and Machine Learning on HadoopSQL and Machine Learning on Hadoop
SQL and Machine Learning on HadoopMukund Babbar
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...Debraj GuhaThakurta
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...Debraj GuhaThakurta
 
Experience sql server on l inux and docker
Experience sql server on l inux and dockerExperience sql server on l inux and docker
Experience sql server on l inux and dockerBob Ward
 
Episode 3: Kubernetes and Big Data Services
Episode 3: Kubernetes and Big Data ServicesEpisode 3: Kubernetes and Big Data Services
Episode 3: Kubernetes and Big Data ServicesMesosphere Inc.
 
Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Community
 
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store Ceph Community
 
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...Cloudera, Inc.
 
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph Ceph Community
 
MapR-DB Elasticsearch Integration
MapR-DB Elasticsearch IntegrationMapR-DB Elasticsearch Integration
MapR-DB Elasticsearch IntegrationMapR Technologies
 
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Rajit Saha
 
Building Big Data Applications using Spark, Hive, HBase and Kafka
Building Big Data Applications using Spark, Hive, HBase and KafkaBuilding Big Data Applications using Spark, Hive, HBase and Kafka
Building Big Data Applications using Spark, Hive, HBase and KafkaAshish Thapliyal
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and ThenSATOSHI TAGOMORI
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopPOSSCON
 

Similar to HAWQ meets Hive: Querying Unmanaged Data (20)

Hawq meets Hive - DataWorks San Jose 2017
Hawq meets Hive - DataWorks San Jose 2017Hawq meets Hive - DataWorks San Jose 2017
Hawq meets Hive - DataWorks San Jose 2017
 
Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개
 
Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30
 
SQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQSQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQ
 
Coherence RoadMap 2018
Coherence RoadMap 2018Coherence RoadMap 2018
Coherence RoadMap 2018
 
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
 
SQL and Machine Learning on Hadoop
SQL and Machine Learning on HadoopSQL and Machine Learning on Hadoop
SQL and Machine Learning on Hadoop
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
 
Experience sql server on l inux and docker





HAWQ meets Hive: Querying Unmanaged Data

  • 1. © 2017 Pivotal Software, Inc. All rights reserved. Querying Unmanaged Data: HAWQ meets Hive. Shivram Mani, Oleksandr Diachenko
  • 2. Agenda ● Overview of Apache HAWQ (incubating) ● HAWQ Architecture ● HAWQ Extension Framework ● HAWQ Hive Integration ● HAWQ HCatalog Integration
  • 3. Apache HAWQ’s Lineage (timeline, 1986 to 2015): Postgres developed at UC Berkeley; Postgres adds support for SQL; Open Source PostgreSQL; PostgreSQL 7.0 released; PostgreSQL 8.0 released; Greenplum based on PostgreSQL; Hadoop 1.0 released; HAWQ goes open-source (Apache); HAWQ project launched; Hadoop 2.0 released
  • 4. HAWQ Overview Multi-level Fault Tolerance Granular Authorization Resource Mgmt (+ YARN) Multi-tenancy + Security ANSI SQL Standard OLAP Extensions JDBC ODBC Connectivity Online Expansion Hadoop / HDFS Operations Cost Based Optimizer (ORCA) Dynamic Pipelining ACID + Transactional MPP Architecture Data Federation Language Extensions Advanced Analytics MPP Database for Enterprises Extensibility HDFS Native File Formats Compression + Partitioning Core Connectivity - Enable Data Science - Large Scale Analytics - Query All Data Types & sources - Manage Multiple Workloads - Security controls - Well Integrated - Leverage Existing SQL Skills & BI Tools - High-performance Ambari Management Machine Learning
  • 5. HAWQ Components HAWQ Master (1) Metadata Transaction Mgr. Query Parser Query Optimizer Resource Mgr. NN cache Query Dispatch Fault Tolerant Svc HAWQ Segment (1..N) Postmaster Local directory (Temp Data / Logs) Virtual Segments (Query Executors) libhdfs3 Datanode YARN NM HAWQ Standby Master (1)
  • 6. Query Execution (Native) Server 1 Server 2 Server N HAWQ Master Metadata Transaction Mgr. Query Parser Query Optimizer Resource Mgr. NameNode HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode YARN RM Postmaster Local directory Local directory Local directory Animated slides NN Cache Interconnect
  • 7. Query Execution - Plan Server 1 Server 2 Server N HAWQ Master Metadata Transaction Mgr. Query Parser Query Optimizer NN Cache Resource Mgr. NameNode HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode YARN RM Postmaster Query Dispatch Local directory Local directory Local directory
  • 8. Query Execution - Resource Server 1 Server 2 Server N HAWQ Master Metadata Transaction Mgr. Query Parser Query Optimizer NN Cache Resource Mgr. NameNode HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode YARN RM Postmaster Query Dispatch VS VS VS VS VS Local directory Local directory Local directory I need 5 containers, each with 1 CPU core and 1 GB RAM. Server 1: 2 containers, Server 2: 1 container, Server N: 2 containers. VS = Virtual Segment (container for Query Executors) # of QEs in a v-seg = # of slices in a query
  • 9. Query Execution - Prepare HAWQ Master Metadata Transaction Mgr. Query Parser Query Optimizer NN Cache Resource Mgr. NameNode HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode YARN RM Postmaster Query Dispatch VS VS VS VS VS Server 1 Local directory Server 2 Local directory Server N Local directory VS = Virtual Segment (container for Query Executors) # of QEs in a v-seg = # of slices in a query
  • 10. Query Execution - Execute HAWQ Master Metadata Transaction Mgr. Query Parser Query Optimizer NN Cache Resource Mgr. NameNode HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode YARN RM Postmaster Query Dispatch VS VS VS VS VS Server 1 Local directory Server 2 Local directory Server N Local directory VS = Virtual Segment (container for Query Executors) # of QEs in a v-seg = # of slices in a query
  • 11. Query Execution - Result HAWQ Master Metadata Transaction Mgr. Query Parser Query Optimizer NN Cache Resource Mgr. NameNode HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode YARN RM Postmaster Query Dispatch VS VS VS VS VS Server 1 Local directory Server 2 Local directory Server N Local directory VS = Virtual Segment (container for Query Executors) # of QEs in a v-seg = # of slices in a query
  • 12. Reasons why HAWQ is high-performance: Highly efficient MPP (massively parallel processing) heritage and architecture; Dynamic pipelining, no intermediate writes to disk; Advanced cost-based optimizer; Scalable and fast Interconnect; Native (C++) HDFS access/scan speed; HDFS metadata cache; Optimal data locality matching methods
  • 13. TPC-DS benchmark: TPC-DS Queries with 5 Users (bar chart, time in seconds for queries 1 to 99) • HAWQ ~1.3x faster • Competing MPP Hadoop engine failed to complete 47% of the queries (unmodified). Legend: Unsupported SQL; Long running killed; Memory Limit Exceeded; Test Query Failed in the other engine. * Queries that did not complete are omitted from results on both platforms
  • 14. Managed vs Unmanaged data Managed data Unmanaged data Metadata Metadata ???
  • 15. HAWQ eXtension Framework (aka PXF): Uniform tabular view to heterogeneous data sources; Exploits parallelism for data access; Pluggable framework for custom connectors (profiles); Built-in connectors for various data sources/formats
  • 16. PXF Architecture ➔ Independent JVM ➔ Runs alongside namenode and datanodes. Diagram labels: Tomcat (Webapp), REST API, Java API, External Tables, Java API, Java/Thrift ● JDBC ● Solr ● Redis ● Cassandra ● GemfireXD, PXF
  • 17. Query Execution (External Data) Server 1 Server 2 Server N HAWQ Master NameNode HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode Postmaster Local directory Local directory Local directory Animated slides
  • 18. Query Planning - Distribution Server 1 Server 2 Server N HAWQ Master NameNode HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode Postmaster PXF Local directory Local directory Local directory Get Partition Metadata {P1, P2, P3, P4, P5} Planner Partition Mapper {P1, P4} {P5} {P2, P3}
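The distribution step on slide 18 can be sketched in a few lines of Python. This is only an illustration, not PXF's actual Partition Mapper: the function name `map_partitions`, the host names, the placement of each partition on a host, and the locality-first, least-loaded policy are all assumptions made for the example.

```python
from collections import defaultdict

def map_partitions(partition_hosts, segment_hosts):
    """Assign each partition to one segment host (hypothetical policy).

    partition_hosts: dict of partition id -> hosts that hold a replica.
    segment_hosts: hosts that run a HAWQ segment.
    A segment co-located with a replica is preferred; ties, and
    partitions with no local segment, go to the least-loaded host.
    """
    assignment = defaultdict(list)
    for host in segment_hosts:
        assignment[host]  # register every segment, even if it stays idle
    for partition in sorted(partition_hosts):
        local = [h for h in partition_hosts[partition] if h in assignment]
        candidates = local or segment_hosts
        target = min(candidates, key=lambda h: (len(assignment[h]), h))
        assignment[target].append(partition)
    return dict(assignment)

# Invented replica locations, chosen so the result mirrors the slide's
# {P1, P4}, {P5}, {P2, P3} split.
partition_hosts = {
    "P1": ["server1"], "P2": ["serverN"], "P3": ["serverN"],
    "P4": ["server1"], "P5": ["server2"],
}
plan = map_partitions(partition_hosts, ["server1", "server2", "serverN"])
print(plan)
```

Each segment then reads only the partitions assigned to it, which is what lets the read step run in parallel across servers.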
  • 19. Query Execution - Read Server 1 Server 2 Server N HAWQ Master HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode Postmaster VS VS VS VS VS NameNode PXF PXF PXF PXF P2 P5 P1 P4 P3
  • 20. Query Execution - Result HAWQ Master HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode HAWQ Segment Postmaster HDFS Datanode VS VS VS VS VS Server 1 Local directory Server 2 Local directory Server N Local directory VS = Virtual Segment (container for Query Executors) # of QEs in a v-seg = # of slices in a query NameNode PXF Postmaster Global Aggregate
  • 21. HAWQ-Hive Data Integration. HiveRC ➢ Works for RCFile format. Hive ➢ Works for heterogeneous tables ➢ Supports all formats ➢ Unoptimized. HiveText ➢ Works fast for text data ➢ Lazy data resolution ➢ Only text datatypes are supported. HiveORC ➢ Optimized for ORC data ➢ Leverages predicate pushdown ➢ Column projection. HiveVectorizedORC ➢ Uses ORC Batch API ➢ Sends 1024-row batches to HAWQ ➢ Enables Vectorized Execution
  • 22. HAWQ-Hive ORC Optimizations HAWQ Master HAWQ Segment Postmaster PXF column attributes: col1, col2 predicate: RPNF {filter(s)} aggregate functions {col1, col2 col3='abc'} col4; col3; col2; col1; SELECT col1, col2 FROM tab1 WHERE col3 = 'abc'; SELECT COUNT(*) FROM tab1 WHERE col3 = 'abc'; Query Dispatch ORC API {col1, col2 col3='abc'}
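The two queries on slide 22 rely on column projection and predicate pushdown. A minimal Python sketch of the idea follows. It is not the PXF or ORC API: the `scan` function, the per-stripe min/max statistics layout, and the row-oriented sample data are assumptions made for illustration, and real ORC files are stored by column.

```python
def scan(stripes, columns, predicate_column, predicate_value):
    """Yield projected rows where predicate_column == predicate_value.

    Stripes whose min/max statistics rule out the value are skipped
    without reading their rows (stripe elimination).
    """
    for stripe in stripes:
        lo, hi = stripe["stats"][predicate_column]
        if not (lo <= predicate_value <= hi):
            continue  # no row in this stripe can match
        for row in stripe["rows"]:
            if row[predicate_column] == predicate_value:
                # Column projection: return only the requested columns.
                yield tuple(row[c] for c in columns)

# Hypothetical data: two stripes with statistics on col3.
stripes = [
    {"stats": {"col3": ("aaa", "abz")},
     "rows": [{"col1": 1, "col2": "x", "col3": "abc", "col4": 0.5},
              {"col1": 2, "col2": "y", "col3": "abd", "col4": 1.5}]},
    {"stats": {"col3": ("m", "z")},
     "rows": [{"col1": 3, "col2": "z", "col3": "xyz", "col4": 2.5}]},
]

# SELECT col1, col2 FROM tab1 WHERE col3 = 'abc';
rows = list(scan(stripes, ["col1", "col2"], "col3", "abc"))
print(rows)

# SELECT COUNT(*) FROM tab1 WHERE col3 = 'abc'; needs no columns at all.
count = sum(1 for _ in scan(stripes, [], "col3", "abc"))
print(count)
```

In this sketch the second stripe is never read, and col3 and col4 never leave the reader, which is where the I/O savings come from.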
  • 23. Optimizations. Statistics ● Exposing statistics about unmanaged tables ● Optimized query plan. Column projection ● Passing requested columns ● Disk I/O is optimized if the data format allows. Predicate pushdown ● Passing down predicates from the WHERE clause through the PXF framework ● Partition/stripe/file elimination. Batches vs tuples ● HiveText ● HiveVectorizedORC ● Lazy data resolution
  • 24. HAWQ-Hive Catalog Integration. Was: CREATE EXTERNAL TABLE items (column1 int, column2 string) LOCATION ('pxf://namenode:51200/customer_db?PROFILE=Hive') FORMAT 'custom' (formatter='pxfwritable_import'); SELECT * FROM items; ● Need to create external HAWQ table ● Users need to know HAWQ-Hive data mapping ● Need to keep both tables' metadata in sync manually. Wanted: SELECT * FROM items; ● No need to create external HAWQ table ● Users don't need to know about HAWQ-Hive data type mapping, etc. ● Metadata is always up to date
  • 25. Challenges with Catalog Unification Hive Catalog
  • 26. Challenges with Catalog Unification HAWQ Catalog
  • 27. Where to store HCatalog data in HAWQ: Requires few HAWQ changes; Getting all catalog utilities for free; Catalog is polluted with external data; HCatalog objects are visible to concurrent sessions; Session-level isolation; Cheap cleanup process; HAWQ Catalog service needs to be changed to be able to work with disk/memory; Catalog utilities need to be modified to work with HCatalog objects
  • 28. Object namespaces 0 2^32 10*2^20 Global counter Session 1 counter In-memory In-memory In-memory Session 2 counter Session N counter HAWQ objects HCatalog objects Persistent Session states are isolated
  • 29. HAWQ-HCatalog Integration Weblogs id double ts timestamp ... SELECT * FROM hcatalog.default.weblogs WHERE ts between '2015-09-01' and '2015-09-30'; HIVE PXF PXF PXF HCAT SELECT COUNT(*) FROM hcatalog.default.weblogs WHERE ts between '2015-09-01' and '2015-09-30'; In Memory Catalog Disk Heap Catalog Weblogs id double ts timestamp ... HAWQ Catalog service HAWQ
  • 30. Apache Hive & HAWQ (via HDB): The Most Comprehensive SQL on Hadoop. Avoid data duplication: All processing engines point to the same copy of data. Right Tool for the Job: Choose the right SQL engine based on your application’s needs. ⬢ Apache HAWQ ● MPP engine from the core ● Easy transition from Traditional DB/Warehouse ● Ad-hoc Analytics, BI & Visualization ● Low Query Latency ● Scales from 100s of TB to low PBs ● Machine Learning (MADlib) ⬢ Apache Hive ● Holds very detailed information ● Integrates all data sources ● Low-Mid Query Latency ● Scales to 100s of petabytes ● Large Community. Run HAWQ & Hive alongside!
  • 31. Additional Resources. Documentation (Wiki/Docs): HAWQ Homepage, Getting Started, HAWQ Wiki, PXF Wiki, Sandbox. Code (GitHub, Apache): github.com/apache/incubator-hawq. Join Discussion/Ask Questions (Apache DLs): dev@hawq.incubator.apache.org, user@hawq.incubator.apache.org
  • 33. [Diagram; flow steps are numbered 1 to 15] Diagram labels: SQL statement; HAWQ Query Dispatcher (acquire/return query resource; allocated query resource); HAWQ Resource Manager; Query Resource Request Queuing Facility; Query Quota Calculator; Queue Quota Calculator; HAWQ resource queue manager (acquire calculated resource quota or return unused query resource); Resource pool; Indexed Resource Quota Table; Global RM Container Lifecycle Manager; LIBYARN Resource Broker (libyarn); YARN Resource Manager; YARN Node Manager; HAWQ Segment; segments; cluster report; container report; queue report; accepted YARN container quota; to-be-returned YARN containers' quota; active YARN containers with resource-holding processes started. Resource broker actions: register HAWQ as an unmanaged application exclusively consuming a YARN queue; periodically fetch the YARN cluster report, container report and queue report to recognize the YARN cluster; acquire YARN containers with host preference information; return YARN containers; unregister HAWQ in YARN. The resource broker uses libyarn (a C/C++ library) to communicate with YARN through protobuf. Resource pool actions: add activated YARN containers' quota; return YARN containers' quota. Increase the HAWQ segment resource quota when new global resource manager containers are allocated; decrease it when some global resource manager containers are chosen to be returned. Drive the resource broker to acquire global resource manager containers. The quota of a global resource manager container can be (1GB, 1 core), (2GB, 1 core), etc. Allocate virtual segments with a fixed resource quota assigned and dispatch workload to segments. The resource quota can be as small as 128MB or 256MB and as large as several GB.
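The quota bookkeeping described on this slide (container quota added to or returned from a segment, and fixed-size virtual segments carved out of it) can be sketched as below. This is a simplified model under stated assumptions: `SegmentQuota` and its methods are hypothetical names, only memory is tracked, and the rule that containers cannot be returned while their quota is in use is an assumption of the example.

```python
class SegmentQuota:
    """Tracks the memory quota one HAWQ segment holds through YARN containers."""

    def __init__(self, container_mb: int) -> None:
        if container_mb <= 0:
            raise ValueError("container_mb must be positive")
        self.container_mb = container_mb  # e.g. 1024 for a (1GB, 1 core) container
        self.total_mb = 0                 # quota backed by active containers
        self.used_mb = 0                  # quota handed out to virtual segments

    def add_containers(self, count: int) -> None:
        """Increase the segment quota when new containers are allocated."""
        self.total_mb += count * self.container_mb

    def return_containers(self, count: int) -> None:
        """Decrease the segment quota when containers go back to YARN."""
        freed = count * self.container_mb
        if self.total_mb - freed < self.used_mb:
            raise ValueError("quota still in use by running virtual segments")
        self.total_mb -= freed

    def allocate_vsegs(self, count: int, vseg_mb: int) -> bool:
        """Reserve `count` virtual segments of a fixed quota; False if they do not fit."""
        needed = count * vseg_mb
        if needed > self.total_mb - self.used_mb:
            return False
        self.used_mb += needed
        return True

    def release_vsegs(self, count: int, vseg_mb: int) -> None:
        """Give back quota when a query finishes."""
        self.used_mb = max(0, self.used_mb - count * vseg_mb)


quota = SegmentQuota(container_mb=1024)
quota.add_containers(2)                       # 2048 MB available
fits = quota.allocate_vsegs(6, vseg_mb=256)   # 1536 MB requested
too_big = quota.allocate_vsegs(3, vseg_mb=256)
quota.release_vsegs(4, vseg_mb=256)           # 512 MB remain in use
quota.return_containers(1)                    # 1024 MB of quota remain
```

With 1GB containers and 256MB virtual segments, each container backs four virtual segments, which is why the second request for three more does not fit until quota is released.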
  • 34. Resource Management: HAWQ Resource Manager • Responsibility – Responsible for acquiring and returning CPU/memory resources from/to YARN – Responsible for resource allocation among HAWQ users and queries • Master resource manager process – Resource negotiation with YARN and resource allocation – Manages and maintains the resources in the resource pool – Handles resource allocation/return RPC requests from the QD (query dispatcher) – Fault tolerance service runs in the same process • Segment resource manager process – One HAWQ RM on each segment – Negotiates with the master resource manager (for resource enforcement) – Fault tolerance service: heartbeat sender
  • 35. SQL on Hadoop benchmark
  • 36. PXF Data Flow
  • 37. PXF Data Model
  • 38. Putting it all together: Summary of HAWQ user experience (via HDB). External Data (PXF): parallelized access to external data sources (read/write). Install and Configure: Ambari to deploy and manage HAWQ, just like any other Hadoop service. Manage Resources: YARN-integrated for dynamic resource allocation across hierarchical groups. Write Queries (ORCA): advanced optimizer and dynamic pipelining for high-performance response. Enable Data Science: in-database machine learning algorithms for predictive analytics. Extend Data Processing: procedural language extensions for custom application logic.