SlideShare a Scribd company logo
© Copyright 2017 Pivotal Software, Inc. All rights Reserved.
Pivotal Greenplum
Hadoop Integration with PXF
Kong Yew, Chan
kochan@pivotal.io
Last update: Nov 2018
Disclaimer
This presentation contains statements relating to Pivotal’s expectations, projections, beliefs and prospects which are "forward-looking
statements” about Pivotal’s future which by their nature are uncertain. Such forward-looking statements are not guarantees of future performance,
and you are cautioned not to place undue reliance on these forward-looking statements. Actual results could differ materially from those projected in
the forward-looking statements as a result of many factors, including but not limited to: (i) adverse changes in general economic or market conditions;
(ii) delays or reductions in information technology spending; (iii) risks associated with managing the growth of Pivotal’s business, including operating
costs; (iv) changes to Pivotal’s software business model; (v) competitive factors, including pricing pressures and new product introductions; (vi)
Pivotal’s customers' ability to transition to new products and computing strategies such as cloud computing, the uncertainty of customer acceptance
of emerging technologies, and rapid technological and market changes; (vii) Pivotal's ability to protect its proprietary technology; (viii) Pivotal’s ability
to attract and retain highly qualified employees; (ix) Pivotal’s ability to execute on its plans and strategy; and (x) risks related to data and information
security vulnerabilities. All information set forth in this presentation is current as of the date of this presentation. These forward-looking statements are
based on current expectations and are subject to uncertainties and changes in condition, significance, value and effect as well as other risks disclosed
previously and from time to time in documents filed by Dell Technologies Inc., the parent company of Pivotal, with the U.S. Securities and Exchange
Commission. Dell and Pivotal assume no obligation to, and do not currently intend to, update any such forward-looking statements after the date of
this presentation.
The following is intended to outline the general direction of Pivotal's offerings. It is intended for information purposes only and may not be
incorporated into any contract. Any information regarding pre-release of Pivotal offerings, future updates or other planned modifications is subject to
ongoing evaluation by Pivotal and is subject to change. This information is provided without warranty or any kind, express or implied, and is not a
commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions regarding Pivotal's offerings.
These purchasing decisions should only be based on features currently available. The development, release, and timing of any features or
functionality described for Pivotal's offerings in this presentation remain at the sole discretion of Pivotal. Pivotal has no obligation to update forward-
looking information in this presentation.
Agenda
● Use Cases
● PXF Architecture
● Q&A
Greenplum-PXF use cases
● Combined Data Analysis
● Hadoop as the data lake / landing zone
● Hadoop as the data archive store (cold data store)
● Import / Export data for external systems (RDBMS)
PXF Extension Framework Overview
Apache Tomcat
PXF WebappREST API Java API
External Tables
HTTP, port: 5888
Java API
Java API
Pivotal Greenplum
In progress:
PXF Service*
PXF Service is installed on each GPDB segment*
Design - Define External Tables (SQL)
CREATE EXTERNAL TABLE ext_table <attr list, ...>
LOCATION('pxf://path/to/data?PROFILE=<profile>&
<Other custom user options>=<value>')
FORMAT'custom'(formatter='pxfwritable_import');
CREATE EXTERNAL TABLE ext_table <attr list, ...>
LOCATION('pxf://path/to/data?PROFILE=avro
FORMAT'custom'(formatter='pxfwritable_import');
* When defining a table, PXF doesn’t check if the file exists, accessible or name, port are correct, etc.
It is just a DEFINITION.
PXF Profiles - Examples
CREATE EXTERNAL TABLE pxf_hdfs_textsimple
(location text, month text, num_orders int, total_sales float8)
LOCATION
('pxf://data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');
CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name
( column_name data_type [, ...] | LIKE other_table )
LOCATION ('pxf://[alias:]/path?Profile=')
FORMAT 'CUSTOM' (Formatter='pxfwritable_import');
CREATE EXTERNAL TABLE pxf_hdfs_avro(id bigint, username text, followers text, fmap text,
relationship text, address text)
LOCATION
('pxf://data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&
RECORDKEY_DELIM=:')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
PXF Plugins, Profiles to be ported from Hawq
• Hawq Available Profiles
• HDFS: HDFSTextSimple(R/W), HDFSTextMulti(RO), Avro(RO)
• Hive: Hive, HiveRC, HiveText, HiveORC
• HBase: HBase
• JDBC profile ...
PXF Filter Push Down
• Goal: Performance , Efficiency, less data over wire
• Criteria: SQL “Where” Clause
• Single Expression or a group of “AND/OR” Expressions
• Supported data types and operators
• Types: text, int, smallint, bigint.
• Operators: EQ, NE, LT, GT, LE, GE & AND
© Copyright 2017 Pivotal Software, Inc. All rights Reserved.
Reference:
Greenplum
PXF documentation
Thank You

More Related Content

What's hot

Redbooks with live links 2010 12-06
Redbooks with live links 2010 12-06Redbooks with live links 2010 12-06
Redbooks with live links 2010 12-06Willie Favero
 
Mixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache SparkMixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache Spark
VMware Tanzu
 
Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018
Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018
Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018
VMware Tanzu
 
Optimizing Time Series Performance in the Real World
Optimizing Time Series Performance in the Real WorldOptimizing Time Series Performance in the Real World
Optimizing Time Series Performance in the Real World
DevOps.com
 
The Architecture of Decoupling Compute and Storage with Alluxio
The Architecture of Decoupling Compute and Storage with AlluxioThe Architecture of Decoupling Compute and Storage with Alluxio
The Architecture of Decoupling Compute and Storage with Alluxio
Alluxio, Inc.
 
Data Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and CloudData Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and Cloud
Alluxio, Inc.
 
Containerized Storage
Containerized StorageContainerized Storage
Containerized Storage
Red_Hat_Storage
 
Achieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloadsAchieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloads
Alluxio, Inc.
 
Present & Future of Greenplum Database A massively parallel Postgres Database...
Present & Future of Greenplum Database A massively parallel Postgres Database...Present & Future of Greenplum Database A massively parallel Postgres Database...
Present & Future of Greenplum Database A massively parallel Postgres Database...
VMware Tanzu
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS
Alluxio, Inc.
 
Release webinar: Sansa and Ontario
Release webinar: Sansa and OntarioRelease webinar: Sansa and Ontario
Release webinar: Sansa and Ontario
BigData_Europe
 
Alluxio - Virtual Unified File System
Alluxio - Virtual Unified File System Alluxio - Virtual Unified File System
Alluxio - Virtual Unified File System
Alluxio, Inc.
 
Seagate Implementation of Dense Storage Utilizing HDDs and SSDs
Seagate Implementation of Dense Storage Utilizing HDDs and SSDsSeagate Implementation of Dense Storage Utilizing HDDs and SSDs
Seagate Implementation of Dense Storage Utilizing HDDs and SSDs
Red_Hat_Storage
 
Improving Presto performance with Alluxio at TikTok
Improving Presto performance with Alluxio at TikTokImproving Presto performance with Alluxio at TikTok
Improving Presto performance with Alluxio at TikTok
Alluxio, Inc.
 
Ten Reasons Why Netezza Professionals Should Consider Greenplum
Ten Reasons Why Netezza Professionals Should Consider GreenplumTen Reasons Why Netezza Professionals Should Consider Greenplum
Ten Reasons Why Netezza Professionals Should Consider Greenplum
VMware Tanzu
 
How to build a data stack from scratch
How to build a data stack from scratchHow to build a data stack from scratch
How to build a data stack from scratch
Vinayak Hegde
 
Learn How Dell Improved Postgres/Greenplum Performance 20x with a Database Pr...
Learn How Dell Improved Postgres/Greenplum Performance 20x with a Database Pr...Learn How Dell Improved Postgres/Greenplum Performance 20x with a Database Pr...
Learn How Dell Improved Postgres/Greenplum Performance 20x with a Database Pr...
VMware Tanzu
 
Autonomous ETL with Materialized Views
Autonomous ETL with Materialized ViewsAutonomous ETL with Materialized Views
Autonomous ETL with Materialized Views
Abhishek Somani
 
Incorporating ISO Metadata Using HDF Product Designer
Incorporating ISO Metadata Using HDF Product DesignerIncorporating ISO Metadata Using HDF Product Designer
Incorporating ISO Metadata Using HDF Product Designer
The HDF-EOS Tools and Information Center
 
Why Software-Defined Storage Matters
Why Software-Defined Storage MattersWhy Software-Defined Storage Matters
Why Software-Defined Storage Matters
Red_Hat_Storage
 

What's hot (20)

Redbooks with live links 2010 12-06
Redbooks with live links 2010 12-06Redbooks with live links 2010 12-06
Redbooks with live links 2010 12-06
 
Mixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache SparkMixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache Spark
 
Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018
Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018
Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018
 
Optimizing Time Series Performance in the Real World
Optimizing Time Series Performance in the Real WorldOptimizing Time Series Performance in the Real World
Optimizing Time Series Performance in the Real World
 
The Architecture of Decoupling Compute and Storage with Alluxio
The Architecture of Decoupling Compute and Storage with AlluxioThe Architecture of Decoupling Compute and Storage with Alluxio
The Architecture of Decoupling Compute and Storage with Alluxio
 
Data Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and CloudData Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and Cloud
 
Containerized Storage
Containerized StorageContainerized Storage
Containerized Storage
 
Achieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloadsAchieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloads
 
Present & Future of Greenplum Database A massively parallel Postgres Database...
Present & Future of Greenplum Database A massively parallel Postgres Database...Present & Future of Greenplum Database A massively parallel Postgres Database...
Present & Future of Greenplum Database A massively parallel Postgres Database...
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS
 
Release webinar: Sansa and Ontario
Release webinar: Sansa and OntarioRelease webinar: Sansa and Ontario
Release webinar: Sansa and Ontario
 
Alluxio - Virtual Unified File System
Alluxio - Virtual Unified File System Alluxio - Virtual Unified File System
Alluxio - Virtual Unified File System
 
Seagate Implementation of Dense Storage Utilizing HDDs and SSDs
Seagate Implementation of Dense Storage Utilizing HDDs and SSDsSeagate Implementation of Dense Storage Utilizing HDDs and SSDs
Seagate Implementation of Dense Storage Utilizing HDDs and SSDs
 
Improving Presto performance with Alluxio at TikTok
Improving Presto performance with Alluxio at TikTokImproving Presto performance with Alluxio at TikTok
Improving Presto performance with Alluxio at TikTok
 
Ten Reasons Why Netezza Professionals Should Consider Greenplum
Ten Reasons Why Netezza Professionals Should Consider GreenplumTen Reasons Why Netezza Professionals Should Consider Greenplum
Ten Reasons Why Netezza Professionals Should Consider Greenplum
 
How to build a data stack from scratch
How to build a data stack from scratchHow to build a data stack from scratch
How to build a data stack from scratch
 
Learn How Dell Improved Postgres/Greenplum Performance 20x with a Database Pr...
Learn How Dell Improved Postgres/Greenplum Performance 20x with a Database Pr...Learn How Dell Improved Postgres/Greenplum Performance 20x with a Database Pr...
Learn How Dell Improved Postgres/Greenplum Performance 20x with a Database Pr...
 
Autonomous ETL with Materialized Views
Autonomous ETL with Materialized ViewsAutonomous ETL with Materialized Views
Autonomous ETL with Materialized Views
 
Incorporating ISO Metadata Using HDF Product Designer
Incorporating ISO Metadata Using HDF Product DesignerIncorporating ISO Metadata Using HDF Product Designer
Incorporating ISO Metadata Using HDF Product Designer
 
Why Software-Defined Storage Matters
Why Software-Defined Storage MattersWhy Software-Defined Storage Matters
Why Software-Defined Storage Matters
 

Similar to Greenplum-PXF November 2018

Linux Containers Postgres Meetup Dec 2017
Linux Containers Postgres Meetup Dec 2017Linux Containers Postgres Meetup Dec 2017
Linux Containers Postgres Meetup Dec 2017
Cesar Rojas
 
Greenplum Roadmap
Greenplum RoadmapGreenplum Roadmap
Greenplum Roadmap
VMware Tanzu Korea
 
Oracle Big Data Jam Session #1 - オラクルのビッグデータ系サーバレス・サービスのポートフォリオ
Oracle Big Data Jam Session #1 - オラクルのビッグデータ系サーバレス・サービスのポートフォリオOracle Big Data Jam Session #1 - オラクルのビッグデータ系サーバレス・サービスのポートフォリオ
Oracle Big Data Jam Session #1 - オラクルのビッグデータ系サーバレス・サービスのポートフォリオ
オラクルエンジニア通信
 
Spring Cloud Stream: What's New in 2.x—and What's Next?
Spring Cloud Stream: What's New in 2.x—and What's Next?Spring Cloud Stream: What's New in 2.x—and What's Next?
Spring Cloud Stream: What's New in 2.x—and What's Next?
VMware Tanzu
 
Docker on Heroku のはじめ方
Docker on Heroku のはじめ方Docker on Heroku のはじめ方
Docker on Heroku のはじめ方
Takashi Abe
 
GPCloud ( GP on PKS)
GPCloud ( GP on PKS)GPCloud ( GP on PKS)
GPCloud ( GP on PKS)
VMware Tanzu Korea
 
Cloud Event Driven Architectures with Spring Cloud Stream 2.0 - SpringOne Tou...
Cloud Event Driven Architectures with Spring Cloud Stream 2.0 - SpringOne Tou...Cloud Event Driven Architectures with Spring Cloud Stream 2.0 - SpringOne Tou...
Cloud Event Driven Architectures with Spring Cloud Stream 2.0 - SpringOne Tou...
VMware Tanzu
 
gp text roadmap presentation
gp text roadmap presentationgp text roadmap presentation
gp text roadmap presentation
VMware Tanzu Korea
 
1 greenplum in banking sk cab
1 greenplum in banking   sk cab1 greenplum in banking   sk cab
1 greenplum in banking sk cab
VMware Tanzu Korea
 
Force.com Friday : Intro to Apex
Force.com Friday : Intro to Apex Force.com Friday : Intro to Apex
Force.com Friday : Intro to Apex
Salesforce Developers
 
Q4 2020 DBX Investor Presentation
Q4 2020 DBX Investor Presentation Q4 2020 DBX Investor Presentation
Q4 2020 DBX Investor Presentation
Dropbox
 
Supporting Java, Spring, and OpenJDK in the Enterprise: What You Need to Know
Supporting Java, Spring, and OpenJDK in the Enterprise: What You Need to KnowSupporting Java, Spring, and OpenJDK in the Enterprise: What You Need to Know
Supporting Java, Spring, and OpenJDK in the Enterprise: What You Need to Know
VMware Tanzu
 
Qonnections2015 - Why Qlik is better with Big Data
Qonnections2015 - Why Qlik is better with Big DataQonnections2015 - Why Qlik is better with Big Data
Qonnections2015 - Why Qlik is better with Big Data
John Park
 
Getting Started with Splunk Enterprise
Getting Started with Splunk EnterpriseGetting Started with Splunk Enterprise
Getting Started with Splunk Enterprise
Splunk
 
Pivotal Greenplum in Action on AWS, Azure, and GCP - Greenplum Summit 2018
Pivotal Greenplum in Action on AWS, Azure, and GCP - Greenplum Summit 2018Pivotal Greenplum in Action on AWS, Azure, and GCP - Greenplum Summit 2018
Pivotal Greenplum in Action on AWS, Azure, and GCP - Greenplum Summit 2018
VMware Tanzu
 
Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout Session
Splunk
 
Access External Data in Real-time with Lightning Connect
Access External Data in Real-time with Lightning ConnectAccess External Data in Real-time with Lightning Connect
Access External Data in Real-time with Lightning Connect
Salesforce Developers
 
MuleSoft Online meetup - An expert's guide to Runtime fabric - August 2020
MuleSoft Online meetup -  An expert's guide to Runtime fabric - August 2020MuleSoft Online meetup -  An expert's guide to Runtime fabric - August 2020
MuleSoft Online meetup - An expert's guide to Runtime fabric - August 2020
Royston Lobo
 
Gemfire Introduction
Gemfire Introduction Gemfire Introduction
Gemfire Introduction
VMware Tanzu Korea
 
Greenplum User Case
Greenplum User Case Greenplum User Case
Greenplum User Case
VMware Tanzu Korea
 

Similar to Greenplum-PXF November 2018 (20)

Linux Containers Postgres Meetup Dec 2017
Linux Containers Postgres Meetup Dec 2017Linux Containers Postgres Meetup Dec 2017
Linux Containers Postgres Meetup Dec 2017
 
Greenplum Roadmap
Greenplum RoadmapGreenplum Roadmap
Greenplum Roadmap
 
Oracle Big Data Jam Session #1 - オラクルのビッグデータ系サーバレス・サービスのポートフォリオ
Oracle Big Data Jam Session #1 - オラクルのビッグデータ系サーバレス・サービスのポートフォリオOracle Big Data Jam Session #1 - オラクルのビッグデータ系サーバレス・サービスのポートフォリオ
Oracle Big Data Jam Session #1 - オラクルのビッグデータ系サーバレス・サービスのポートフォリオ
 
Spring Cloud Stream: What's New in 2.x—and What's Next?
Spring Cloud Stream: What's New in 2.x—and What's Next?Spring Cloud Stream: What's New in 2.x—and What's Next?
Spring Cloud Stream: What's New in 2.x—and What's Next?
 
Docker on Heroku のはじめ方
Docker on Heroku のはじめ方Docker on Heroku のはじめ方
Docker on Heroku のはじめ方
 
GPCloud ( GP on PKS)
GPCloud ( GP on PKS)GPCloud ( GP on PKS)
GPCloud ( GP on PKS)
 
Cloud Event Driven Architectures with Spring Cloud Stream 2.0 - SpringOne Tou...
Cloud Event Driven Architectures with Spring Cloud Stream 2.0 - SpringOne Tou...Cloud Event Driven Architectures with Spring Cloud Stream 2.0 - SpringOne Tou...
Cloud Event Driven Architectures with Spring Cloud Stream 2.0 - SpringOne Tou...
 
gp text roadmap presentation
gp text roadmap presentationgp text roadmap presentation
gp text roadmap presentation
 
1 greenplum in banking sk cab
1 greenplum in banking   sk cab1 greenplum in banking   sk cab
1 greenplum in banking sk cab
 
Force.com Friday : Intro to Apex
Force.com Friday : Intro to Apex Force.com Friday : Intro to Apex
Force.com Friday : Intro to Apex
 
Q4 2020 DBX Investor Presentation
Q4 2020 DBX Investor Presentation Q4 2020 DBX Investor Presentation
Q4 2020 DBX Investor Presentation
 
Supporting Java, Spring, and OpenJDK in the Enterprise: What You Need to Know
Supporting Java, Spring, and OpenJDK in the Enterprise: What You Need to KnowSupporting Java, Spring, and OpenJDK in the Enterprise: What You Need to Know
Supporting Java, Spring, and OpenJDK in the Enterprise: What You Need to Know
 
Qonnections2015 - Why Qlik is better with Big Data
Qonnections2015 - Why Qlik is better with Big DataQonnections2015 - Why Qlik is better with Big Data
Qonnections2015 - Why Qlik is better with Big Data
 
Getting Started with Splunk Enterprise
Getting Started with Splunk EnterpriseGetting Started with Splunk Enterprise
Getting Started with Splunk Enterprise
 
Pivotal Greenplum in Action on AWS, Azure, and GCP - Greenplum Summit 2018
Pivotal Greenplum in Action on AWS, Azure, and GCP - Greenplum Summit 2018Pivotal Greenplum in Action on AWS, Azure, and GCP - Greenplum Summit 2018
Pivotal Greenplum in Action on AWS, Azure, and GCP - Greenplum Summit 2018
 
Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout Session
 
Access External Data in Real-time with Lightning Connect
Access External Data in Real-time with Lightning ConnectAccess External Data in Real-time with Lightning Connect
Access External Data in Real-time with Lightning Connect
 
MuleSoft Online meetup - An expert's guide to Runtime fabric - August 2020
MuleSoft Online meetup -  An expert's guide to Runtime fabric - August 2020MuleSoft Online meetup -  An expert's guide to Runtime fabric - August 2020
MuleSoft Online meetup - An expert's guide to Runtime fabric - August 2020
 
Gemfire Introduction
Gemfire Introduction Gemfire Introduction
Gemfire Introduction
 
Greenplum User Case
Greenplum User Case Greenplum User Case
Greenplum User Case
 

Recently uploaded

Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 

Recently uploaded (20)

Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 

Greenplum-PXF November 2018

  • 1. © Copyright 2017 Pivotal Software, Inc. All rights Reserved. Pivotal Greenplum Hadoop Integration with PXF Kong Yew, Chan kochan@pivotal.io Last update: Nov 2018
  • 2. Disclaimer This presentation contains statements relating to Pivotal’s expectations, projections, beliefs and prospects which are "forward-looking statements” about Pivotal’s future which by their nature are uncertain. Such forward-looking statements are not guarantees of future performance, and you are cautioned not to place undue reliance on these forward-looking statements. Actual results could differ materially from those projected in the forward-looking statements as a result of many factors, including but not limited to: (i) adverse changes in general economic or market conditions; (ii) delays or reductions in information technology spending; (iii) risks associated with managing the growth of Pivotal’s business, including operating costs; (iv) changes to Pivotal’s software business model; (v) competitive factors, including pricing pressures and new product introductions; (vi) Pivotal’s customers' ability to transition to new products and computing strategies such as cloud computing, the uncertainty of customer acceptance of emerging technologies, and rapid technological and market changes; (vii) Pivotal's ability to protect its proprietary technology; (viii) Pivotal’s ability to attract and retain highly qualified employees; (ix) Pivotal’s ability to execute on its plans and strategy; and (x) risks related to data and information security vulnerabilities. All information set forth in this presentation is current as of the date of this presentation. These forward-looking statements are based on current expectations and are subject to uncertainties and changes in condition, significance, value and effect as well as other risks disclosed previously and from time to time in documents filed by Dell Technologies Inc., the parent company of Pivotal, with the U.S. Securities and Exchange Commission. Dell and Pivotal assume no obligation to, and do not currently intend to, update any such forward-looking statements after the date of this presentation. The following is intended to outline the general direction of Pivotal's offerings. It is intended for information purposes only and may not be incorporated into any contract. Any information regarding pre-release of Pivotal offerings, future updates or other planned modifications is subject to ongoing evaluation by Pivotal and is subject to change. This information is provided without warranty or any kind, express or implied, and is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions regarding Pivotal's offerings. These purchasing decisions should only be based on features currently available. The development, release, and timing of any features or functionality described for Pivotal's offerings in this presentation remain at the sole discretion of Pivotal. Pivotal has no obligation to update forward- looking information in this presentation.
  • 3. Agenda ● Use Cases ● PXF Architecture ● Q&A
  • 4. Greenplum-PXF use cases ● Combined Data Analysis ● Hadoop as the data lake / landing zone ● Hadoop as the data archive store (cold data store) ● Import / Export data for external systems (RDBMS)
  • 5. PXF Extension Framework Overview Apache Tomcat PXF WebappREST API Java API External Tables HTTP, port: 5888 Java API Java API Pivotal Greenplum In progress: PXF Service* PXF Service is installed on each GPDB segment*
  • 6. Design - Define External Tables (SQL) CREATE EXTERNAL TABLE ext_table <attr list, ...> LOCATION('pxf://path/to/data?PROFILE=<profile>& <Other custom user options>=<value>') FORMAT'custom'(formatter='pxfwritable_import'); CREATE EXTERNAL TABLE ext_table <attr list, ...> LOCATION('pxf://path/to/data?PROFILE=avro FORMAT'custom'(formatter='pxfwritable_import'); * When defining a table, PXF doesn’t check if the file exists, accessible or name, port are correct, etc. It is just a DEFINITION.
  • 7. PXF Profiles - Examples CREATE EXTERNAL TABLE pxf_hdfs_textsimple (location text, month text, num_orders int, total_sales float8) LOCATION ('pxf://data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple') FORMAT 'TEXT' (delimiter=E','); CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name ( column_name data_type [, ...] | LIKE other_table ) LOCATION ('pxf://[alias:]/path?Profile=') FORMAT 'CUSTOM' (Formatter='pxfwritable_import'); CREATE EXTERNAL TABLE pxf_hdfs_avro(id bigint, username text, followers text, fmap text, relationship text, address text) LOCATION ('pxf://data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:& RECORDKEY_DELIM=:') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
  • 8. PXF Plugins, Profiles to be ported from Hawq • Hawq Available Profiles • HDFS: HDFSTextSimple(R/W), HDFSTextMulti(RO), Avro(RO) • Hive: Hive, HiveRC, HiveText, HiveORC • HBase: HBase • JDBC profile ...
  • 9. PXF Filter Push Down • Goal: Performance , Efficiency, less data over wire • Criteria: SQL “Where” Clause • Single Expression or a group of “AND/OR” Expressions • Supported data types and operators • Types: text, int, smallint, bigint. • Operators: EQ, NE, LT, GT, LE, GE & AND
  • 10. © Copyright 2017 Pivotal Software, Inc. All rights Reserved. Reference: Greenplum PXF documentation Thank You