© Copyright 2017 Pivotal Software, Inc. All rights Reserved.
Pivotal Greenplum
Hadoop Integration with PXF
Kong Yew, Chan
kochan@pivotal.io
Last update: Nov 2018
Disclaimer
This presentation contains statements relating to Pivotal’s expectations, projections, beliefs and prospects which are “forward-looking
statements” about Pivotal’s future which by their nature are uncertain. Such forward-looking statements are not guarantees of future performance,
and you are cautioned not to place undue reliance on these forward-looking statements. Actual results could differ materially from those projected in
the forward-looking statements as a result of many factors, including but not limited to: (i) adverse changes in general economic or market conditions;
(ii) delays or reductions in information technology spending; (iii) risks associated with managing the growth of Pivotal’s business, including operating
costs; (iv) changes to Pivotal’s software business model; (v) competitive factors, including pricing pressures and new product introductions; (vi)
Pivotal’s customers' ability to transition to new products and computing strategies such as cloud computing, the uncertainty of customer acceptance
of emerging technologies, and rapid technological and market changes; (vii) Pivotal's ability to protect its proprietary technology; (viii) Pivotal’s ability
to attract and retain highly qualified employees; (ix) Pivotal’s ability to execute on its plans and strategy; and (x) risks related to data and information
security vulnerabilities. All information set forth in this presentation is current as of the date of this presentation. These forward-looking statements are
based on current expectations and are subject to uncertainties and changes in condition, significance, value and effect as well as other risks disclosed
previously and from time to time in documents filed by Dell Technologies Inc., the parent company of Pivotal, with the U.S. Securities and Exchange
Commission. Dell and Pivotal assume no obligation to, and do not currently intend to, update any such forward-looking statements after the date of
this presentation.
The following is intended to outline the general direction of Pivotal's offerings. It is intended for information purposes only and may not be
incorporated into any contract. Any information regarding pre-release of Pivotal offerings, future updates or other planned modifications is subject to
ongoing evaluation by Pivotal and is subject to change. This information is provided without warranty of any kind, express or implied, and is not a
commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions regarding Pivotal's offerings.
These purchasing decisions should only be based on features currently available. The development, release, and timing of any features or
functionality described for Pivotal's offerings in this presentation remain at the sole discretion of Pivotal. Pivotal has no obligation to update
forward-looking information in this presentation.
Agenda
● Use Cases
● PXF Architecture
● Q&A
Greenplum-PXF use cases
● Combined Data Analysis
● Hadoop as the data lake / landing zone
● Hadoop as the data archive store (cold data store)
● Import / Export data for external systems (RDBMS)
PXF Extension Framework Overview
[Diagram: Pivotal Greenplum (External Tables) -> HTTP, port 51200 -> PXF Service* (Apache Tomcat hosting the PXF Webapp: REST API, Java API) -> Java API connectors. Part of the diagram is labeled "In progress".]
* PXF Service is installed on each GPDB segment.
Design - Define External Tables (SQL)
CREATE EXTERNAL TABLE ext_table (<attr list, ...>)
LOCATION ('pxf://path/to/data?PROFILE=<profile>&<other custom user option>=<value>')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
CREATE EXTERNAL TABLE ext_table (<attr list, ...>)
LOCATION ('pxf://path/to/data?PROFILE=Avro')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
* When a table is defined, PXF does not check whether the file exists or is accessible, or whether the name and port are correct.
It is just a DEFINITION.
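A minimal sketch of what "just a DEFINITION" means in practice. The table name, columns, and path are hypothetical, and the path is assumed not to exist in HDFS. The CREATE statement is expected to succeed anyway; the problem would only surface when the table is queried.

```sql
-- Hypothetical table; assume no/such/dir/missing.txt does not exist in HDFS.
-- Nothing is validated at this point, so the definition is accepted.
CREATE EXTERNAL TABLE ext_missing (id int, name text)
LOCATION ('pxf://no/such/dir/missing.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');

-- The external source is only contacted here, so this is where an error
-- about the missing file (or an unreachable PXF service) would be reported.
SELECT count(*) FROM ext_missing;
```

Running a cheap query such as the count above right after creating the table is one way to catch a wrong path or profile early.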
PXF Profiles - Examples
CREATE EXTERNAL TABLE pxf_hdfs_textsimple
(location text, month text, num_orders int, total_sales float8)
LOCATION
('pxf://data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');
CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name
( column_name data_type [, ...] | LIKE other_table )
LOCATION ('pxf://[alias:]/path?Profile=')
FORMAT 'CUSTOM' (Formatter='pxfwritable_import');
CREATE EXTERNAL TABLE pxf_hdfs_avro(id bigint, username text, followers text, fmap text,
relationship text, address text)
LOCATION
('pxf://data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&RECORDKEY_DELIM=:')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
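Once defined, an external table can be queried like any other table. The sketch below reads pxf_hdfs_textsimple and then joins it with a native Greenplum table, which is the "combined data analysis" use case. The table sales_targets and its columns are hypothetical and not part of the original slides.

```sql
-- Read the HDFS-backed external table like an ordinary table.
SELECT location, sum(total_sales) AS sales
FROM pxf_hdfs_textsimple
GROUP BY location;

-- Combined analysis: join HDFS data with a native Greenplum table.
-- sales_targets(location text, target float8) is a hypothetical local table.
SELECT e.location, sum(e.total_sales) AS sales, t.target
FROM pxf_hdfs_textsimple e
JOIN sales_targets t ON t.location = e.location
GROUP BY e.location, t.target;
```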
PXF Plugins, Profiles to be ported from HAWQ
• HAWQ Available Profiles
• HDFS: HdfsTextSimple (R/W), HdfsTextMulti (RO), Avro (RO)
• Hive: Hive, HiveRC, HiveText, HiveORC
• HBase: HBase
• JDBC profile ...
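Because the HDFS text profile is listed as read/write, it can also back a writable external table. The sketch below assumes a hypothetical HDFS directory and a hypothetical native table sales_history; it shows how rows might be exported to Hadoop, for example when using it as a cold data store.

```sql
-- Writable external table over an HDFS directory (hypothetical path).
CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_archive
(location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://data/pxf_examples/archive?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');

-- Export rows from a hypothetical native table sales_history into HDFS.
INSERT INTO pxf_hdfs_archive
SELECT location, month, num_orders, total_sales
FROM sales_history
WHERE month = 'Jan';
```

A writable external table only accepts INSERT; reading the exported files back would need a separate readable external table over the same path.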
PXF Filter Push Down
• Goal: performance, efficiency, less data over the wire
• Criteria: SQL WHERE clause
• Single expression or a group of AND/OR expressions
• Supported data types and operators
• Types: text, int, smallint, bigint
• Operators: EQ, NE, LT, GT, LE, GE, and AND
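The criteria above can be illustrated with a pair of queries. This is a sketch under assumptions: the Hive-backed table pxf_hive_sales and the Hive table default.sales_info are hypothetical, and the server parameter gp_external_enable_filter_pushdown is assumed to control push-down for external tables in the Greenplum release in use.

```sql
-- Assumption: this server parameter controls push-down for external tables.
SET gp_external_enable_filter_pushdown = on;

-- Hypothetical Hive-backed external table.
CREATE EXTERNAL TABLE pxf_hive_sales
(location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://default.sales_info?PROFILE=Hive')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');

-- text (EQ) and int (GT) predicates joined by AND: candidates for push-down.
SELECT location, num_orders
FROM pxf_hive_sales
WHERE location = 'Prague' AND num_orders > 100;

-- float8 is not in the listed types, so this predicate would presumably be
-- evaluated in Greenplum after the rows have been fetched.
SELECT location, total_sales
FROM pxf_hive_sales
WHERE total_sales > 1000.0;
```

Both queries return correct results either way; push-down only changes how much data crosses the wire.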
Questions? Contact kochan@pivotal.io
Thank You
