© Copyright 2017 Pivotal Software, Inc. All rights Reserved.
Pivotal Greenplum
Hadoop Integration with PXF
Kong Yew, Chan
kochan@pivotal.io
Last update: Nov 2018
Disclaimer
This presentation contains statements relating to Pivotal’s expectations, projections, beliefs and prospects which are "forward-looking
statements” about Pivotal’s future which by their nature are uncertain. Such forward-looking statements are not guarantees of future performance,
and you are cautioned not to place undue reliance on these forward-looking statements. Actual results could differ materially from those projected in
the forward-looking statements as a result of many factors, including but not limited to: (i) adverse changes in general economic or market conditions;
(ii) delays or reductions in information technology spending; (iii) risks associated with managing the growth of Pivotal’s business, including operating
costs; (iv) changes to Pivotal’s software business model; (v) competitive factors, including pricing pressures and new product introductions; (vi)
Pivotal’s customers' ability to transition to new products and computing strategies such as cloud computing, the uncertainty of customer acceptance
of emerging technologies, and rapid technological and market changes; (vii) Pivotal's ability to protect its proprietary technology; (viii) Pivotal’s ability
to attract and retain highly qualified employees; (ix) Pivotal’s ability to execute on its plans and strategy; and (x) risks related to data and information
security vulnerabilities. All information set forth in this presentation is current as of the date of this presentation. These forward-looking statements are
based on current expectations and are subject to uncertainties and changes in condition, significance, value and effect as well as other risks disclosed
previously and from time to time in documents filed by Dell Technologies Inc., the parent company of Pivotal, with the U.S. Securities and Exchange
Commission. Dell and Pivotal assume no obligation to, and do not currently intend to, update any such forward-looking statements after the date of
this presentation.
The following is intended to outline the general direction of Pivotal's offerings. It is intended for information purposes only and may not be
incorporated into any contract. Any information regarding pre-release of Pivotal offerings, future updates or other planned modifications is subject to
ongoing evaluation by Pivotal and is subject to change. This information is provided without warranty or any kind, express or implied, and is not a
commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions regarding Pivotal's offerings.
These purchasing decisions should only be based on features currently available. The development, release, and timing of any features or
functionality described for Pivotal's offerings in this presentation remain at the sole discretion of Pivotal. Pivotal has no obligation to update forward-
looking information in this presentation.
Agenda
● Use Cases
● PXF Architecture
● Q&A
Greenplum-PXF use cases
● Combined Data Analysis
● Hadoop as the data lake / landing zone
● Hadoop as the data archive store (cold data store)
● Import / Export data for external systems (RDBMS)
PXF Extension Framework Overview
Apache Tomcat
PXF WebappREST API Java API
External Tables
HTTP, port: 5888
Java API
Java API
Pivotal Greenplum
In progress:
PXF Service*
PXF Service is installed on each GPDB segment*
Design - Define External Tables (SQL)
CREATE EXTERNAL TABLE ext_table <attr list, ...>
LOCATION('pxf://path/to/data?PROFILE=<profile>&
<Other custom user options>=<value>')
FORMAT'custom'(formatter='pxfwritable_import');
CREATE EXTERNAL TABLE ext_table <attr list, ...>
LOCATION('pxf://path/to/data?PROFILE=avro
FORMAT'custom'(formatter='pxfwritable_import');
* When defining a table, PXF doesn’t check if the file exists, accessible or name, port are correct, etc.
It is just a DEFINITION.
PXF Profiles - Examples
CREATE EXTERNAL TABLE pxf_hdfs_textsimple
(location text, month text, num_orders int, total_sales float8)
LOCATION
('pxf://data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');
CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name
( column_name data_type [, ...] | LIKE other_table )
LOCATION ('pxf://[alias:]/path?Profile=')
FORMAT 'CUSTOM' (Formatter='pxfwritable_import');
CREATE EXTERNAL TABLE pxf_hdfs_avro(id bigint, username text, followers text, fmap text,
relationship text, address text)
LOCATION
('pxf://data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&
RECORDKEY_DELIM=:')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
PXF Plugins, Profiles to be ported from Hawq
• Hawq Available Profiles
• HDFS: HDFSTextSimple(R/W), HDFSTextMulti(RO), Avro(RO)
• Hive: Hive, HiveRC, HiveText, HiveORC
• HBase: HBase
• JDBC profile ...
PXF Filter Push Down
• Goal: Performance , Efficiency, less data over wire
• Criteria: SQL “Where” Clause
• Single Expression or a group of “AND/OR” Expressions
• Supported data types and operators
• Types: text, int, smallint, bigint.
• Operators: EQ, NE, LT, GT, LE, GE & AND
© Copyright 2017 Pivotal Software, Inc. All rights Reserved.
Reference:
Greenplum
PXF documentation
Thank You

Greenplum-PXF November 2018

  • 1.
    © Copyright 2017Pivotal Software, Inc. All rights Reserved. Pivotal Greenplum Hadoop Integration with PXF Kong Yew, Chan kochan@pivotal.io Last update: Nov 2018
  • 2.
    Disclaimer This presentation containsstatements relating to Pivotal’s expectations, projections, beliefs and prospects which are "forward-looking statements” about Pivotal’s future which by their nature are uncertain. Such forward-looking statements are not guarantees of future performance, and you are cautioned not to place undue reliance on these forward-looking statements. Actual results could differ materially from those projected in the forward-looking statements as a result of many factors, including but not limited to: (i) adverse changes in general economic or market conditions; (ii) delays or reductions in information technology spending; (iii) risks associated with managing the growth of Pivotal’s business, including operating costs; (iv) changes to Pivotal’s software business model; (v) competitive factors, including pricing pressures and new product introductions; (vi) Pivotal’s customers' ability to transition to new products and computing strategies such as cloud computing, the uncertainty of customer acceptance of emerging technologies, and rapid technological and market changes; (vii) Pivotal's ability to protect its proprietary technology; (viii) Pivotal’s ability to attract and retain highly qualified employees; (ix) Pivotal’s ability to execute on its plans and strategy; and (x) risks related to data and information security vulnerabilities. All information set forth in this presentation is current as of the date of this presentation. These forward-looking statements are based on current expectations and are subject to uncertainties and changes in condition, significance, value and effect as well as other risks disclosed previously and from time to time in documents filed by Dell Technologies Inc., the parent company of Pivotal, with the U.S. Securities and Exchange Commission. Dell and Pivotal assume no obligation to, and do not currently intend to, update any such forward-looking statements after the date of this presentation. The following is intended to outline the general direction of Pivotal's offerings. It is intended for information purposes only and may not be incorporated into any contract. Any information regarding pre-release of Pivotal offerings, future updates or other planned modifications is subject to ongoing evaluation by Pivotal and is subject to change. This information is provided without warranty or any kind, express or implied, and is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions regarding Pivotal's offerings. These purchasing decisions should only be based on features currently available. The development, release, and timing of any features or functionality described for Pivotal's offerings in this presentation remain at the sole discretion of Pivotal. Pivotal has no obligation to update forward- looking information in this presentation.
  • 3.
    Agenda ● Use Cases ●PXF Architecture ● Q&A
  • 4.
    Greenplum-PXF use cases ●Combined Data Analysis ● Hadoop as the data lake / landing zone ● Hadoop as the data archive store (cold data store) ● Import / Export data for external systems (RDBMS)
  • 5.
    PXF Extension FrameworkOverview Apache Tomcat PXF WebappREST API Java API External Tables HTTP, port: 5888 Java API Java API Pivotal Greenplum In progress: PXF Service* PXF Service is installed on each GPDB segment*
  • 6.
    Design - DefineExternal Tables (SQL) CREATE EXTERNAL TABLE ext_table <attr list, ...> LOCATION('pxf://path/to/data?PROFILE=<profile>& <Other custom user options>=<value>') FORMAT'custom'(formatter='pxfwritable_import'); CREATE EXTERNAL TABLE ext_table <attr list, ...> LOCATION('pxf://path/to/data?PROFILE=avro FORMAT'custom'(formatter='pxfwritable_import'); * When defining a table, PXF doesn’t check if the file exists, accessible or name, port are correct, etc. It is just a DEFINITION.
  • 7.
    PXF Profiles -Examples CREATE EXTERNAL TABLE pxf_hdfs_textsimple (location text, month text, num_orders int, total_sales float8) LOCATION ('pxf://data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple') FORMAT 'TEXT' (delimiter=E','); CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name ( column_name data_type [, ...] | LIKE other_table ) LOCATION ('pxf://[alias:]/path?Profile=') FORMAT 'CUSTOM' (Formatter='pxfwritable_import'); CREATE EXTERNAL TABLE pxf_hdfs_avro(id bigint, username text, followers text, fmap text, relationship text, address text) LOCATION ('pxf://data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:& RECORDKEY_DELIM=:') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
  • 8.
    PXF Plugins, Profilesto be ported from Hawq • Hawq Available Profiles • HDFS: HDFSTextSimple(R/W), HDFSTextMulti(RO), Avro(RO) • Hive: Hive, HiveRC, HiveText, HiveORC • HBase: HBase • JDBC profile ...
  • 9.
    PXF Filter PushDown • Goal: Performance , Efficiency, less data over wire • Criteria: SQL “Where” Clause • Single Expression or a group of “AND/OR” Expressions • Supported data types and operators • Types: text, int, smallint, bigint. • Operators: EQ, NE, LT, GT, LE, GE & AND
  • 10.
    © Copyright 2017Pivotal Software, Inc. All rights Reserved. Reference: Greenplum PXF documentation Thank You