SlideShare a Scribd company logo
1 of 33
Download to read offline
© Copyright 2017 Pivotal Software, Inc. All rights Reserved. Version 1.0
Federated Queries with
Greenplum and PXF
Alexander Denissov
Software Architect
April 2018
Agenda
■ Introduction to Federated Queries
■ Federation Use Cases
■ Greenplum External Tables
■ PXF Architecture
■ PXF Connectors and Profiles
■ Advanced Topics
■ Q+A
Data Platform for Analytics
The world’s first open-
source massively parallel
processing (MPP) data
platform for advanced
analytics
Based on PostgreSQL
Developed since early 2000s
Open sourced in 2015
SQL 2003 compliant
Advanced cost-based
optimizer
ACID transactions
ANALYTICAL
APPLICATIONS
NATIVE INTERFACES
PIVOTAL
GREENPLUM
PLATFORM
MULTI-
STRUCTURED DATA
SOURCES &
PIPELINES
Structured Data
JDBC, ODBC
SQL
ANSI SQL
USERS
FLEXIBLE
DEPLOYMENT
Local
Storage
Other
RDBMSes
SparkGemFire
Cloud
Object
Storage
HDFS
JSON, Apache AVRO, Apache Parquet and XML
Teradata SQL
Other DB SQL
Apache MADlib
ML/Stats/Graph
Python. R,
Java, Perl, C
Programmatic
Apache SOLR
Text
PostGIS
GeoSpatial
Custom Apps BI / Reporting
Machine
Learning
AI
On-Premises
KafkaETL
Spring
Cloud
Data Flow
Massively
Parallel
(MPP)
PostgreSQL
Kernel
Petabyte
Scale
Loading
Query
Optimizer
(GPORCA)
Workload
Manager
Polymorphic
Storage
Command
Center
SQL
Compatibilit
y
(Hyper-Q)
IT Dev
Business
Analysts
Data
Scientists
Public
Clouds
Private
Clouds
Fully
Managed
Clouds
Greenplum = Massively Parallel Postgres for
Analytics
Standby
Master
…
Master
Host
SQL
Interconnect
Segment Host
Node1
Segment Host
Node2
Segment Host
Node3
Segment Host
NodeN
Local
Storage
Other
RDBMSes
SparkGemFire
Cloud
Object
Storage
HDFS KafkaETL
Spring
Cloud
Data Flow
Master Servers
Query planning and dispatch
Segment Servers
Query processing and data storage
Interconnect
External Sources & Pipelines
Parallel loading and streaming
Modern Enterprise : heterogeneous data formats
{ semi-structured
data }
unstructured
data
raw data
structured data
Modern Enterprise : wide variety of data engines
RDBMS
How can we access all this data
?
Cover w/ Image
Customer information is stored in native
Greenplum tables
Find all customer names in CA:
Managing internal data
id , name, state
1234, ACME, NJ
1235, PVTL, CA
SELECT c.name
FROM customers c
WHERE c.state = 'CA'
Cover w/ Image
Order transactions are stored as CSV
files in HDFS
Find all orders from today:
Viewing external data
SELECT *
FROM orders o
WHERE o.date = NOW()
cust, sku, amount,
date
1234, ABC, $9.90,
4/01
1235, CDE, $8.80,
3/30
Cover w/ Image
Merge order and customer data from
different data sources
Find all orders from today, including
customer names:
Joining with external
data
cust, sku, amount,
date
1234, ABC, $9.90,
4/01
1235, CDE, $8.80,
3/30
SELECT c.name, o.amount
FROM customer c, sales s
WHERE s.date = NOW()
AND c.id = s.cust
id , name, state
1234, ACME, NJ
1235, PVTL, CA
Analytics across data of wide time range
Data is stored in different
systems based on operational
requirements
Can I work with data created
5 seconds ago ?
Can I run a report on data
from 5 months ago ?
Can I inspect the data
archived 5 years ago ?
Data is available for analytics
with Greenplum no matter
where it resides !
In-memory
data grid
RDBMS
dataData Lake
HOT
WARM
COLD
Federated Query is the ability to
answer a SQL query with the
information from different
sources.
Cover w/ Image
Greenplum External
Table
Provides the definitions for:
● the schema of the external data
● the protocol used to access the data
● the location of the data in an external
system
● the format of the external data
Participates in query execution and allows plug-in
connectors to external data for different protocols.
CREATE [READABLE] EXTERNAL TABLE table_name
( col_name data_type [,...] | LIKE other_table )
LOCATION ('<protocol>://<path to data>...)
FORMAT 'TEXT'
CREATE WRITABLE EXTERNAL TABLE table_name
( col_name data_type [,...] | LIKE other_table )
LOCATION ('<protocol>://<path to data>...)
FORMAT 'CUSTOM'
(Formatter=<formatter_specifications>)
[ ENCODING 'encoding' ]
CREATE [READABLE] EXTERNAL WEB TABLE table_name
...
CREATE WRITABLE EXTERNAL WEB TABLE table_name
...
Cover w/ Image
External Protocol
● Provides connectivity to an external system
● Implements methods to read data from the
external system and write data into it
● Defines the validation logic for external
table specifications
● Can be packaged as a shared library file
(.so) and loaded dynamically
AVAILABLE PROTOCOLS
file:// -- for files on Greenplum segments
gpfdist:// -- for files on remote hosts
s3:// -- for files in AWS S3 bucket
gphdfs:// -- for files in Hadoop HDFS
http:// -- for WEB tables
pxf:// -- for data sources with JAVA APIs :
● files in Hadoop HDFS
● data in Apache Hive tables
● data in Apache HBase tables
● rows in RDBMS tables via JDBC
● objects in in-memory grids
● messages in queues
● ... build your own adapter ...
Platform Extension Framework (PXF)
The Platform Extension Framework (PXF) provides:
parallel, high throughput data access
federated queries across heterogeneous data sources
built-in connectors that map a Greenplum Database external
table definition to an external data source.
Available in since 2017 (5.1
release)
● PXF is originally a part of
Apache HAWQ (incubating)
launched in 2012 and open-
sourced in 2015
● PXF is used to connect to
data in Hadoop ecosystem
● PXF is open-sourced under
the Apache license
PXF > Architecture
REST API
Java APIs
Java API
segment
PXF
extension
pxf.so
Tomcat
S
E
R
V
E
R
HDFS
connector
HIVE
connector
. . .
PXF agent
webapp
Java / Thrift API
RDBMS GRIDS
PXF > HDFS Data Import Flow
1. Master submits a query and
segments start parallel
execution
2. Each segment query
execution slice gets a thread in
PXF JVM
3. PXF asks HDFS Namenode for
the information on file
fragments
4. PXF decides on a workload
distribution among threads
5. PXF reads data fragments via
HDFS APIs from Datanodes and
passes it to segments
6. Segments convert data into
seg4
seg5
seg6
P X F
Segment Host
seg1
seg2
seg3
P X F
Segment Host
…
master
Master Host
NameNode
DataNode
…
DataNode
H
D
F
S
J
A
V
A
A
P
I
Cover w/ Image
PXF Fragmenter
Functional interface which
splits data from an external data source
into a list of independent fragments
that can be read in parallel.
Examples of a fragment:
● FileSplit in HDFS
● Table partition in JDBC
External
Data
Source
Frag 1
Frag 2
Frag n
Fragments
FragmenterSELECT
Cover w/ Image
PXF Accessor
Functional interface which
reads a single fragment
from an external data source and
produces a list of records/rows.
Examples of a record:
● Line in a text file
● Row in a JDBC ResultSet
Frag 1
Fragment
Fragmenter Read Accessor
Rows
Row 1
Row 2
Row n
Cover w/ Image
PXF Resolver
Functional interface which
deserializes a record/row into fields and
transforms the data types
into those supported by Greenplum
Examples of a field:
● Value between commas in a CSV line
● Column value in a JDBC ResultSet
F1
Read Accessor Read Resolver
Row
F3 FnRow 1 F2
Fields
Cover w/ Image
PXF Profile
A profile is a simple name mapping to
a set of connector plug-in class names
implementing
Fragmenter, Accessor and Resolver
functional interfaces.
Profiles are useful when defining PXF
external tables in Greenplum
HdfsDataFragmenter
LineBreakAccessor
StringPassResolver
HdfsTextSimple
Cover w/ Image
PXF External Table
Register PXF Greenplum extension
Define an external table with:
the schema that corresponds to the
structure of external data
the protocol pxf:// and the location of
the data on external system
the profile to use for accessing the data
the format of data returned by PXF
cust, sku, amount,
date
1234, ABC, $9.90,
4/01
1235, CDE, $8.80,
3/30
-- create extension only once per database
CREATE EXTENSION pxf;
-- define external table
CREATE EXTERNAL TABLE sales
(cust int, sku text, amount decimal, date date)
LOCATION
('pxf:///2018/sales.csv?PROFILE=HdfsTextSimple')
FORMAT 'TEXT'
PXF > Data Flows Summary
Fragmenter, Accessor and
Resolver are working in
combination to process data
They can be specified as a pre-
built profile or independently
Greenplum external table
defines data schema, location,
format and the profile to use to
get the data
PXF can read the data from the
external system or write to it
PXF > HDFS Connector
Data Format Profile Name Description
Text HdfsTextSimple
HdfsTextMulti
Read delimited single or multi-line records from plain
text data on HDFS.
Parquet Parquet Read Parquet format data (<filename>.parq).
Avro Avro Read Avro format binary data (<filename>.avro).
JSON JSON Read JSON format data (<filename>.json).
PXF > Hive Connector
File Format Profile Name Description
TextFile Hive, HiveText Flat file with data in comma-, tab-, or space-
separated value format or JSON notation.
SequenceFile Hive Flat file consisting of binary key/value pairs.
RCFile Hive, HiveRC Record columnar data consisting of binary
key/value pairs; high row compression rate.
ORC Hive, HiveORC,
HiveVectorizedORC
Optimized row columnar data with stripe, footer,
and postscript sections; reduces data size.
Parquet Hive Compressed columnar data representation.
PXF > Other Connectors
Apache HBase connector
JDBC connector (community)
Apache Ignite connector (community)
Alluxio connector (community)
Advanced Topics > Data Processing Optimizations
● Avoid data deserialization -- read chunks of text and stream to Greenplum
without “resolving” in PXF
● Columnar vectorization -- resolve all row values for a given column at once
● Send multiple rows in batches
● Limit amount of data read from an external system and sent over the network
Advanced Topics > Column Projection
date:
state:
amount:
item:
{item:,
amount:,
state=’CA’}
SELECT item, amount FROM orders
WHERE state = 'CA'
SELECT COUNT(*) FROM orders
WHERE state = 'CA'
MASTER
SEGMENT
columns : item, amount
predicates : state=CA
aggregates : count
PXF with
Hive/ORC
columnar
storage format
Pushing information about
requested columns all the way
down to the external system
improves performance
Avoids sending unnecessary
columns over the network
from PXF to Greenplum
Avoids reading unnecessary
columns from the disk
Similar benefits can be
obtained for some aggregate
queries
Advanced Topics > Predicate Pushdown
state=NY
state=NJ
state=CA
state=CA
{item:,
amount:,
state=’CA’}
SELECT item, amount FROM orders
WHERE state = 'CA'
SELECT COUNT(*) FROM orders
WHERE state = 'CA'
MASTER
SEGMENT
columns : item, amount
predicates : state=CA
aggregates : count
PXF with
Hive/Text
row oriented
storage format
Pushing information about
filter conditions (predicates)
all the way down to the
external system improves
performance
PXF itself does not evaluate
predicates
But external system might
support predicates for its own
queries (e.g. JDBC)
A predicate might cause the
whole partition to be
eliminated from consideration
Advanced Topics > User Impersonation
Allows the PXF server to
submit requests to external
systems on behalf of
Greenplum end-users
Must be explicitly supported
by the PXF connectors
Prevents the need to grant the
PXF server OS user 'gpadmin'
superuser access in the
external system
Allows to preserve fine-
grained access control setting
in the external system
alice
scott
alice
scott
GREENPLUM PXF Server HDFS
gpadmin
gpadmin
without impersonation
alice
scott
alice
scott
GREENPLUM PXF Server HDFS
with impersonation
alice
scott
Advanced Topics > Kerberos Security
A Hadoop cluster secured with
Kerberos requires strong
authentication to services
based on keys and tickets
The PXF server registers service
principal with the Kerberos KDC
and stores its secret in a keytab
file on a local file system
The PXF server uses the key in
the keytab file to obtain a ticket
to access resources in Hadoop
cluster, such as files in HDFS
GREENPLUM PXF Server HDFS
KDC
Summary
Reviewed the Federated Query concept
Explored Greenplum External Tables
Learned about PXF and its architecture
Understood how to use Greenplum with PXF for
creating federated queries across multiple data
sources, data engines and data formats
More information at:
https://greenplum.org
https://github.com/greenplum-db/gpdb
https://github.com/apache/incubator-hawq/tree/master/pxf
http://gpdb.docs.pivotal.io/570/pxf/overview_pxf.html
You can contact me at:
Alexander Denissov
adenissov@pivotal.io
www.linkedin.com/in/denissov
Transforming How The World Builds Software
© Copyright 2017 Pivotal Software, Inc. All rights Reserved.

More Related Content

What's hot

Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Automate Your Kafka Cluster with Kubernetes Custom Resources
Automate Your Kafka Cluster with Kubernetes Custom Resources Automate Your Kafka Cluster with Kubernetes Custom Resources
Automate Your Kafka Cluster with Kubernetes Custom Resources confluent
 
JSON improvements in MySQL 8.0
JSON improvements in MySQL 8.0JSON improvements in MySQL 8.0
JSON improvements in MySQL 8.0Mydbops
 
MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks EDB
 
The Challenges of Distributing Postgres: A Citus Story | DataEngConf NYC 2017...
The Challenges of Distributing Postgres: A Citus Story | DataEngConf NYC 2017...The Challenges of Distributing Postgres: A Citus Story | DataEngConf NYC 2017...
The Challenges of Distributing Postgres: A Citus Story | DataEngConf NYC 2017...Citus Data
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Flink Forward
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsFlink Forward
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...Altinity Ltd
 
Kevin Kempter PostgreSQL Backup and Recovery Methods @ Postgres Open
Kevin Kempter PostgreSQL Backup and Recovery Methods @ Postgres OpenKevin Kempter PostgreSQL Backup and Recovery Methods @ Postgres Open
Kevin Kempter PostgreSQL Backup and Recovery Methods @ Postgres OpenPostgresOpen
 
Oracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret InternalsOracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret InternalsAnil Nair
 
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOxInfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOxInfluxData
 
Spark tunning in Apache Kylin
Spark tunning in Apache KylinSpark tunning in Apache Kylin
Spark tunning in Apache KylinShi Shao Feng
 
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in RustAndrew Lamb
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheDremio Corporation
 
PostgreSQL HA
PostgreSQL   HAPostgreSQL   HA
PostgreSQL HAharoonm
 
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...Databricks
 
RedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ TwitterRedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ TwitterRedis Labs
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityWes McKinney
 

What's hot (20)

Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Automate Your Kafka Cluster with Kubernetes Custom Resources
Automate Your Kafka Cluster with Kubernetes Custom Resources Automate Your Kafka Cluster with Kubernetes Custom Resources
Automate Your Kafka Cluster with Kubernetes Custom Resources
 
JSON improvements in MySQL 8.0
JSON improvements in MySQL 8.0JSON improvements in MySQL 8.0
JSON improvements in MySQL 8.0
 
MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks
 
The Challenges of Distributing Postgres: A Citus Story | DataEngConf NYC 2017...
The Challenges of Distributing Postgres: A Citus Story | DataEngConf NYC 2017...The Challenges of Distributing Postgres: A Citus Story | DataEngConf NYC 2017...
The Challenges of Distributing Postgres: A Citus Story | DataEngConf NYC 2017...
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
Kevin Kempter PostgreSQL Backup and Recovery Methods @ Postgres Open
Kevin Kempter PostgreSQL Backup and Recovery Methods @ Postgres OpenKevin Kempter PostgreSQL Backup and Recovery Methods @ Postgres Open
Kevin Kempter PostgreSQL Backup and Recovery Methods @ Postgres Open
 
Oracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret InternalsOracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret Internals
 
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOxInfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
 
Spark tunning in Apache Kylin
Spark tunning in Apache KylinSpark tunning in Apache Kylin
Spark tunning in Apache Kylin
 
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in Rust
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
 
PostgreSQL HA
PostgreSQL   HAPostgreSQL   HA
PostgreSQL HA
 
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
 
RedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ TwitterRedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ Twitter
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
 

Similar to Federated Queries Across Both Different Storage Mediums and Different Data Engines - Greenplum Summit 2018

PXF HAWQ Unmanaged Data
PXF HAWQ Unmanaged DataPXF HAWQ Unmanaged Data
PXF HAWQ Unmanaged DataShivram Mani
 
SQL/MED: Doping for PostgreSQL
SQL/MED: Doping for PostgreSQLSQL/MED: Doping for PostgreSQL
SQL/MED: Doping for PostgreSQLPeter Eisentraut
 
EKAW - Publishing with Triple Pattern Fragments
EKAW - Publishing with Triple Pattern FragmentsEKAW - Publishing with Triple Pattern Fragments
EKAW - Publishing with Triple Pattern FragmentsRuben Taelman
 
Accessing external hadoop data sources using pivotal e xtension framework (px...
Accessing external hadoop data sources using pivotal e xtension framework (px...Accessing external hadoop data sources using pivotal e xtension framework (px...
Accessing external hadoop data sources using pivotal e xtension framework (px...Sameer Tiwari
 
Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019Zhenxiao Luo
 
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...Stuart Chalk
 
Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesappaji intelhunt
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...Debraj GuhaThakurta
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...Debraj GuhaThakurta
 
A Comparative Survey Based on Processing Network Traffic Data Using Hadoop Pi...
A Comparative Survey Based on Processing Network Traffic Data Using Hadoop Pi...A Comparative Survey Based on Processing Network Traffic Data Using Hadoop Pi...
A Comparative Survey Based on Processing Network Traffic Data Using Hadoop Pi...IJCSES Journal
 
A comparative survey based on processing network traffic data using hadoop pi...
A comparative survey based on processing network traffic data using hadoop pi...A comparative survey based on processing network traffic data using hadoop pi...
A comparative survey based on processing network traffic data using hadoop pi...ijcses
 
Hadoop and big data training
Hadoop and big data trainingHadoop and big data training
Hadoop and big data trainingagiamas
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod GmodJun Zhao
 

Similar to Federated Queries Across Both Different Storage Mediums and Different Data Engines - Greenplum Summit 2018 (20)

Matlab, Big Data, and HDF Server
Matlab, Big Data, and HDF ServerMatlab, Big Data, and HDF Server
Matlab, Big Data, and HDF Server
 
Hadoop
HadoopHadoop
Hadoop
 
PXF HAWQ Unmanaged Data
PXF HAWQ Unmanaged DataPXF HAWQ Unmanaged Data
PXF HAWQ Unmanaged Data
 
SQL/MED: Doping for PostgreSQL
SQL/MED: Doping for PostgreSQLSQL/MED: Doping for PostgreSQL
SQL/MED: Doping for PostgreSQL
 
SQL/MED and PostgreSQL
SQL/MED and PostgreSQLSQL/MED and PostgreSQL
SQL/MED and PostgreSQL
 
EKAW - Publishing with Triple Pattern Fragments
EKAW - Publishing with Triple Pattern FragmentsEKAW - Publishing with Triple Pattern Fragments
EKAW - Publishing with Triple Pattern Fragments
 
PXF BDAM 2016
PXF BDAM 2016PXF BDAM 2016
PXF BDAM 2016
 
Accessing external hadoop data sources using pivotal e xtension framework (px...
Accessing external hadoop data sources using pivotal e xtension framework (px...Accessing external hadoop data sources using pivotal e xtension framework (px...
Accessing external hadoop data sources using pivotal e xtension framework (px...
 
Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019
 
Datalake Architecture
Datalake ArchitectureDatalake Architecture
Datalake Architecture
 
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
 
Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologies
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
HadoopDB
HadoopDBHadoopDB
HadoopDB
 
A Comparative Survey Based on Processing Network Traffic Data Using Hadoop Pi...
A Comparative Survey Based on Processing Network Traffic Data Using Hadoop Pi...A Comparative Survey Based on Processing Network Traffic Data Using Hadoop Pi...
A Comparative Survey Based on Processing Network Traffic Data Using Hadoop Pi...
 
A comparative survey based on processing network traffic data using hadoop pi...
A comparative survey based on processing network traffic data using hadoop pi...A comparative survey based on processing network traffic data using hadoop pi...
A comparative survey based on processing network traffic data using hadoop pi...
 
Hadoop and big data training
Hadoop and big data trainingHadoop and big data training
Hadoop and big data training
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
 

More from VMware Tanzu

What AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItWhat AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItVMware Tanzu
 
Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023VMware Tanzu
 
Enhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleEnhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleVMware Tanzu
 
Spring Update | July 2023
Spring Update | July 2023Spring Update | July 2023
Spring Update | July 2023VMware Tanzu
 
Platforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductPlatforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductVMware Tanzu
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready AppsVMware Tanzu
 
Spring Boot 3 And Beyond
Spring Boot 3 And BeyondSpring Boot 3 And Beyond
Spring Boot 3 And BeyondVMware Tanzu
 
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfSpring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfVMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023VMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023VMware Tanzu
 
tanzu_developer_connect.pptx
tanzu_developer_connect.pptxtanzu_developer_connect.pptx
tanzu_developer_connect.pptxVMware Tanzu
 
Tanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchTanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchVMware Tanzu
 
Tanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishTanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishVMware Tanzu
 
Virtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVirtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVMware Tanzu
 
Tanzu Developer Connect - French
Tanzu Developer Connect - FrenchTanzu Developer Connect - French
Tanzu Developer Connect - FrenchVMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023VMware Tanzu
 
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootSpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootVMware Tanzu
 
SpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerSpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerVMware Tanzu
 
SpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeSpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeVMware Tanzu
 
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsSpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsVMware Tanzu
 

More from VMware Tanzu (20)

What AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItWhat AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About It
 
Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023
 
Enhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleEnhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at Scale
 
Spring Update | July 2023
Spring Update | July 2023Spring Update | July 2023
Spring Update | July 2023
 
Platforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductPlatforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a Product
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready Apps
 
Spring Boot 3 And Beyond
Spring Boot 3 And BeyondSpring Boot 3 And Beyond
Spring Boot 3 And Beyond
 
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfSpring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
 
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
 
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
 
tanzu_developer_connect.pptx
tanzu_developer_connect.pptxtanzu_developer_connect.pptx
tanzu_developer_connect.pptx
 
Tanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchTanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - French
 
Tanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishTanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - English
 
Virtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVirtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - English
 
Tanzu Developer Connect - French
Tanzu Developer Connect - FrenchTanzu Developer Connect - French
Tanzu Developer Connect - French
 
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
 
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootSpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
 
SpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerSpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software Engineer
 
SpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeSpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs Practice
 
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsSpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
 

Recently uploaded

(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....ShaimaaMohamedGalal
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 

Recently uploaded (20)

(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 

Federated Queries Across Both Different Storage Mediums and Different Data Engines - Greenplum Summit 2018

  • 1. © Copyright 2017 Pivotal Software, Inc. All rights Reserved. Version 1.0 Federated Queries with Greenplum and PXF Alexander Denissov Software Architect April 2018
  • 2. Agenda ■ Introduction to Federated Queries ■ Federation Use Cases ■ Greenplum External Tables ■ PXF Architecture ■ PXF Connectors and Profiles ■ Advanced Topics ■ Q+A
  • 3. Data Platform for Analytics The world’s first open- source massively parallel processing (MPP) data platform for advanced analytics Based on PostgreSQL Developed since early 2000s Open sourced in 2015 SQL 2003 compliant Advanced cost-based optimizer ACID transactions ANALYTICAL APPLICATIONS NATIVE INTERFACES PIVOTAL GREENPLUM PLATFORM MULTI- STRUCTURED DATA SOURCES & PIPELINES Structured Data JDBC, ODBC SQL ANSI SQL USERS FLEXIBLE DEPLOYMENT Local Storage Other RDBMSes SparkGemFire Cloud Object Storage HDFS JSON, Apache AVRO, Apache Parquet and XML Teradata SQL Other DB SQL Apache MADlib ML/Stats/Graph Python. R, Java, Perl, C Programmatic Apache SOLR Text PostGIS GeoSpatial Custom Apps BI / Reporting Machine Learning AI On-Premises KafkaETL Spring Cloud Data Flow Massively Parallel (MPP) PostgreSQL Kernel Petabyte Scale Loading Query Optimizer (GPORCA) Workload Manager Polymorphic Storage Command Center SQL Compatibilit y (Hyper-Q) IT Dev Business Analysts Data Scientists Public Clouds Private Clouds Fully Managed Clouds
  • 4. Greenplum = Massively Parallel Postgres for Analytics Standby Master … Master Host SQL Interconnect Segment Host Node1 Segment Host Node2 Segment Host Node3 Segment Host NodeN Local Storage Other RDBMSes SparkGemFire Cloud Object Storage HDFS KafkaETL Spring Cloud Data Flow Master Servers Query planning and dispatch Segment Servers Query processing and data storage Interconnect External Sources & Pipelines Parallel loading and streaming
  • 5. Modern Enterprise : heterogeneous data formats { semi-structured data } unstructured data raw data structured data
  • 6. Modern Enterprise : wide variety of data engines RDBMS
  • 7. How can we access all this data ?
  • 8. Cover w/ Image Customer information is stored in native Greenplum tables Find all customer names in CA: Managing internal data id , name, state 1234, ACME, NJ 1235, PVTL, CA SELECT c.name FROM customers c WHERE c.state = 'CA'
  • 9. Cover w/ Image Order transactions are stored as CSV files in HDFS Find all orders from today: Viewing external data SELECT * FROM orders o WHERE o.date = NOW() cust, sku, amount, date 1234, ABC, $9.90, 4/01 1235, CDE, $8.80, 3/30
  • 10. Cover w/ Image Merge order and customer data from different data sources Find all orders from today, including customer names: Joining with external data cust, sku, amount, date 1234, ABC, $9.90, 4/01 1235, CDE, $8.80, 3/30 SELECT c.name, o.amount FROM customer c, sales s WHERE s.date = NOW() AND c.id = s.cust id , name, state 1234, ACME, NJ 1235, PVTL, CA
  • 11. Analytics across data of wide time range Data is stored in different systems based on operational requirements Can I work with data created 5 seconds ago ? Can I run a report on data from 5 months ago ? Can I inspect the data archived 5 years ago ? Data is available for analytics with Greenplum no matter where it resides ! In-memory data grid RDBMS dataData Lake HOT WARM COLD
  • 12. Federated Query is the ability to answer a SQL query with the information from different sources.
  • 13. Cover w/ Image Greenplum External Table Provides the definitions for: ● the schema of the external data ● the protocol used to access the data ● the location of the data in an external system ● the format of the external data Participates in query execution and allows plug-in connectors to external data for different protocols. CREATE [READABLE] EXTERNAL TABLE table_name ( col_name data_type [,...] | LIKE other_table ) LOCATION ('<protocol>://<path to data>...) FORMAT 'TEXT' CREATE WRITABLE EXTERNAL TABLE table_name ( col_name data_type [,...] | LIKE other_table ) LOCATION ('<protocol>://<path to data>...) FORMAT 'CUSTOM' (Formatter=<formatter_specifications>) [ ENCODING 'encoding' ] CREATE [READABLE] EXTERNAL WEB TABLE table_name ... CREATE WRITABLE EXTERNAL WEB TABLE table_name ...
  • 14. Cover w/ Image External Protocol ● Provides connectivity to an external system ● Implements methods to read data from the external system and write data into it ● Defines the validation logic for external table specifications ● Can be packaged as a shared library file (.so) and loaded dynamically AVAILABLE PROTOCOLS file:// -- for files on Greenplum segments gpfdist:// -- for files on remote hosts s3:// -- for files in AWS S3 bucket gphdfs:// -- for files in Hadoop HDFS http:// -- for WEB tables pxf:// -- for data sources with JAVA APIs : ● files in Hadoop HDFS ● data in Apache Hive tables ● data in Apache HBase tables ● rows in RDBMS tables via JDBC ● objects in in-memory grids ● messages in queues ● ... build your own adapter ...
  • 15. Platform Extension Framework (PXF) The Platform Extension Framework (PXF) provides: parallel, high throughput data access federated queries across heterogeneous data sources built-in connectors that map a Greenplum Database external table definition to an external data source. Available in since 2017 (5.1 release) ● PXF is originally a part of Apache HAWQ (incubating) launched in 2012 and open- sourced in 2015 ● PXF is used to connect to data in Hadoop ecosystem ● PXF is open-sourced under the Apache license
  • 16. PXF > Architecture REST API Java APIs Java API segment PXF extension pxf.so Tomcat S E R V E R HDFS connector HIVE connector . . . PXF agent webapp Java / Thrift API RDBMS GRIDS
  • 17. PXF > HDFS Data Import Flow 1. Master submits a query and segments start parallel execution 2. Each segment query execution slice gets a thread in PXF JVM 3. PXF asks HDFS Namenode for the information on file fragments 4. PXF decides on a workload distribution among threads 5. PXF reads data fragments via HDFS APIs from Datanodes and passes it to segments 6. Segments convert data into seg4 seg5 seg6 P X F Segment Host seg1 seg2 seg3 P X F Segment Host … master Master Host NameNode DataNode … DataNode H D F S J A V A A P I
  • 18. Cover w/ Image PXF Fragmenter Functional interface which splits data from an external data source into a list of independent fragments that can be read in parallel. Examples of a fragment: ● FileSplit in HDFS ● Table partition in JDBC External Data Source Frag 1 Frag 2 Frag n Fragments FragmenterSELECT
  • 19. Cover w/ Image PXF Accessor Functional interface which reads a single fragment from an external data source and produces a list of records/rows. Examples of a record: ● Line in a text file ● Row in a JDBC ResultSet Frag 1 Fragment Fragmenter Read Accessor Rows Row 1 Row 2 Row n
  • 20. Cover w/ Image PXF Resolver Functional interface which deserializes a record/row into fields and transforms the data types into those supported by Greenplum Examples of a field: ● Value between commas in a CSV line ● Column value in a JDBC ResultSet F1 Read Accessor Read Resolver Row F3 FnRow 1 F2 Fields
  • 21. Cover w/ Image PXF Profile A profile is a simple name mapping to a set of connector plug-in class names implementing Fragmenter, Accessor and Resolver functional interfaces. Profiles are useful when defining PXF external tables in Greenplum HdfsDataFragmenter LineBreakAccessor StringPassResolver HdfsTextSimple
  • 22. Cover w/ Image PXF External Table Register PXF Greenplum extension Define an external table with: the schema that corresponds to the structure of external data the protocol pxf:// and the location of the data on external system the profile to use for accessing the data the format of data returned by PXF cust, sku, amount, date 1234, ABC, $9.90, 4/01 1235, CDE, $8.80, 3/30 -- create extension only once per database CREATE EXTENSION pxf; -- define external table CREATE EXTERNAL TABLE sales (cust int, sku text, amount decimal, date date) LOCATION ('pxf:///2018/sales.csv?PROFILE=HdfsTextSimple') FORMAT 'TEXT'
  • 23. PXF > Data Flows Summary Fragmenter, Accessor and Resolver are working in combination to process data They can be specified as a pre- built profile or independently Greenplum external table defines data schema, location, format and the profile to use to get the data PXF can read the data from the external system or write to it
  • 24. PXF > HDFS Connector Data Format Profile Name Description Text HdfsTextSimple HdfsTextMulti Read delimited single or multi-line records from plain text data on HDFS. Parquet Parquet Read Parquet format data (<filename>.parq). Avro Avro Read Avro format binary data (<filename>.avro). JSON JSON Read JSON format data (<filename>.json).
  • 25. PXF > Hive Connector File Format Profile Name Description TextFile Hive, HiveText Flat file with data in comma-, tab-, or space- separated value format or JSON notation. SequenceFile Hive Flat file consisting of binary key/value pairs. RCFile Hive, HiveRC Record columnar data consisting of binary key/value pairs; high row compression rate. ORC Hive, HiveORC, HiveVectorizedORC Optimized row columnar data with stripe, footer, and postscript sections; reduces data size. Parquet Hive Compressed columnar data representation.
  • 26. PXF > Other Connectors Apache HBase connector JDBC connector (community) Apache Ignite connector (community) Alluxio connector (community)
  • 27. Advanced Topics > Data Processing Optimizations ● Avoid data deserialization -- read chunks of text and stream to Greenplum without “resolving” in PXF ● Columnar vectorization -- resolve all row values for a given column at once ● Send multiple rows in batches ● Limit amount of data read from an external system and sent over the network
  • 28. Advanced Topics > Column Projection date: state: amount: item: {item:, amount:, state=’CA’} SELECT item, amount FROM orders WHERE state = 'CA' SELECT COUNT(*) FROM orders WHERE state = 'CA' MASTER SEGMENT columns : item, amount predicates : state=CA aggregates : count PXF with Hive/ORC columnar storage format Pushing information about requested columns all the way down to the external system improves performance Avoids sending unnecessary columns over the network from PXF to Greenplum Avoids reading unnecessary columns from the disk Similar benefits can be obtained for some aggregate queries
  • 29. Advanced Topics > Predicate Pushdown state=NY state=NJ state=CA state=CA {item:, amount:, state=’CA’} SELECT item, amount FROM orders WHERE state = 'CA' SELECT COUNT(*) FROM orders WHERE state = 'CA' MASTER SEGMENT columns : item, amount predicates : state=CA aggregates : count PXF with Hive/Text row oriented storage format Pushing information about filter conditions (predicates) all the way down to the external system improves performance PXF itself does not evaluate predicates But external system might support predicates for its own queries (e.g. JDBC) A predicate might cause the whole partition to be eliminated from consideration
  • 30. Advanced Topics > User Impersonation Allows the PXF server to submit requests to external systems on behalf of Greenplum end-users Must be explicitly supported by the PXF connectors Prevents the need to grant the PXF server OS user 'gpadmin' superuser access in the external system Allows to preserve fine- grained access control setting in the external system alice scott alice scott GREENPLUM PXF Server HDFS gpadmin gpadmin without impersonation alice scott alice scott GREENPLUM PXF Server HDFS with impersonation alice scott
  • 31. Advanced Topics > Kerberos Security A Hadoop cluster secured with Kerberos requires strong authentication to services based on keys and tickets The PXF server registers service principal with the Kerberos KDC and stores its secret in a keytab file on a local file system The PXF server uses the key in the keytab file to obtain a ticket to access resources in Hadoop cluster, such as files in HDFS GREENPLUM PXF Server HDFS KDC
  • 32. Summary Reviewed the Federated Query concept Explored Greenplum External Tables Learned about PXF and its architecture Understood how to use Greenplum with PXF for creating federated queries across multiple data sources, data engines and data formats More information at: https://greenplum.org https://github.com/greenplum-db/gpdb https://github.com/apache/incubator-hawq/tree/master/pxf http://gpdb.docs.pivotal.io/570/pxf/overview_pxf.html You can contact me at: Alexander Denissov adenissov@pivotal.io www.linkedin.com/in/denissov
  • 33. Transforming How The World Builds Software © Copyright 2017 Pivotal Software, Inc. All rights Reserved.