Pivotal Greenplum: Next-Generation Multi-Cloud Data Analytics Platform
© Copyright 2018 Pivotal. All rights reserved.
2.0
2018
• Pivotal
• Pivotal Greenplum
• Demo
– Pivotal Greenplum on AWS & Azure
– AWS S3
– In-DB analytics with Apache MADlib (k-Means Clustering)
Figure: Pivotal Data Suite (Spring Cloud Data Flow, GemFire, Greenplum) in the context of DWH/Hadoop, IaaS/PaaS, M2M, DevOps and CI/CD.
Figure: Pivotal Data Suite architecture.
• Sources: RDBMS, Hadoop, machines
• Spring Cloud Data Flow: high-speed ingestion
• Pivotal GemFire: in-memory data grid (data to cache)
• Pivotal Greenplum (Data Warehouse): massively parallel architecture, parallel configurable data load and external tables, predefined libraries, programmatic and in-DB predictive analytics, GPText, Hadoop data lakes and public cloud data lakes
• Consumers: analytics apps, online apps, mobile apps
• Data temperature: hot, warm, cold
Pivotal Greenplum
• Offered as part of Pivotal Data Suite (CPU-based licensing)
• MPP DB
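Pivotal Greenplum
MPP (Massively Parallel Processing)
Figure: SQL is submitted to the master, which hands the work to many segment hosts; the segments process it in parallel and exchange data over the gNet interconnect (redundant components are marked x 2).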
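To make the MPP idea concrete, here is a minimal Python sketch (not Greenplum code) of two-phase aggregation: rows are hashed to segments by a distribution key, each segment computes a partial `SUM ... GROUP BY` over its own rows, and the master merges the partial results. The function names, the CRC32-based hash and the four-segment default are assumptions for illustration; Greenplum's actual hashing and data-motion operators differ.

```python
import zlib
from collections import defaultdict


def distribute(rows, dist_key, n_segments):
    """Hash each row's distribution key to pick a segment (hypothetical hash scheme)."""
    segments = [[] for _ in range(n_segments)]
    for row in rows:
        seg = zlib.crc32(str(row[dist_key]).encode()) % n_segments
        segments[seg].append(row)
    return segments


def segment_partial_sum(rows, group_col, value_col):
    """Phase 1, on each segment: SUM(value_col) GROUP BY group_col over local rows only."""
    partial = defaultdict(float)
    for row in rows:
        partial[row[group_col]] += row[value_col]
    return partial


def master_merge(partials):
    """Phase 2, on the master: add up the partial sums gathered from every segment."""
    total = defaultdict(float)
    for partial in partials:
        for group, value in partial.items():
            total[group] += value
    return dict(total)


def mpp_group_sum(rows, dist_key, group_col, value_col, n_segments=4):
    segments = distribute(rows, dist_key, n_segments)
    # A real MPP database runs the segments concurrently; this sketch runs them one by one.
    partials = [segment_partial_sum(seg, group_col, value_col) for seg in segments]
    return master_merge(partials)


rows = [{"id": i, "region": "east" if i % 2 else "west", "amount": float(i)}
        for i in range(1, 11)]
print(mpp_group_sum(rows, "id", "region", "amount"))
```

Because each segment only touches its own rows, adding segments (and hosts) spreads both CPU and I/O; only the small partial results cross the interconnect.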
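Figure: a conventional RDB, where many CPUs share a single I/O path, compared with an MPP DB, where each node on commodity hardware (HW) has its own CPUs and I/O.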
Pivotal Greenplum
Figure: storage layouts compared (columns A, B, C, D stored row by row, as in a conventional RDBMS, versus column by column, with the I/O each needs), and a sample table spanning 2008-2012 with values HIGH, Medium and LOW and rows XXX, YYY, ZZZ, AAA, BBB, CCC.
Pivotal Greenplum/Hadoop
Figure: ways to combine Pivotal Greenplum and Hadoop, from a separate DB and Hadoop cluster to Pivotal Greenplum and Hadoop clusters connected over a 10Gb network.
Hadoop Data Lakes / Public Cloud Data Lakes / Hybrid / Local
Pivotal Greenplum 5
• PostgreSQL
• BI
Figure: a DWH feeding BI, shown in two configurations.
Pivotal Greenplum OSS
Apache MADlib
• 50+ algorithms
• Pivotal Greenplum
• Apache top-level project (July 2017)
• http://madlib.apache.org
Apache Solr
• http://lucene.apache.org/solr/
PostGIS (PostgreSQL OSS)
• http://postgis.net/
R
• https://www.r-project.org/
Python
• https://www.python.org/
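Scenario: find two people who work at Pivotal, whose names sound like Peter and Pavan, who each withdrew more than $200 from an ATM within 2 km of a given location in the last 24 hours, and who are linked to each other.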
-- Assumes the tables people, transactions, location, links and results already exist,
-- and that PostGIS, fuzzystrmatch (soundex), GPText and Apache MADlib are installed.
drop function if exists get_people(text,text,integer,integer,float,float);
CREATE FUNCTION get_people(text,text,integer,integer,float,float) RETURNS integer
AS $$
declare
  linkchk integer; v1 record; v2 record;
begin
  execute 'truncate table results;';
  -- Person 1: name sounds like $1, works at Pivotal (GPText), withdrew more than $3 within
  -- the last $4 hours at an ATM within 2 km of the reference point ($5 = longitude, $6 = latitude).
  for v1 in select distinct a.id,a.firstname,a.lastname,amount,tran_date,c.lat,c.lng,address,a.description,d.score from people a,transactions b,location c,
      (SELECT w.id, q.score FROM people w, gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'gpadmin.public.people' , 'Pivotal', null) q
       WHERE (q.id::integer) = w.id order by 2 desc) d
    where soundex(firstname)=soundex($1) and a.id=b.id and amount > $3
      -- hours elapsed since the withdrawal: now() minus tran_date, so the value is positive
      and (extract(epoch from now()) - extract(epoch from tran_date))/3600 < $4
      and st_distance_sphere(st_makepoint($5, $6),st_makepoint(c.lng, c.lat))/1000.0 <= 2.0 and b.locid=c.locid and a.id=d.id
  loop
    -- Breadth-first search from person 1 over the links graph. It depends only on v1,
    -- so it runs once per v1 rather than once per (v1, v2) pair.
    execute 'DROP TABLE IF EXISTS out, out_summary;';
    execute 'SELECT madlib.graph_bfs(''people'',''id'',''links'',NULL,'||v1.id||',''out'');' ;
    -- Person 2: the same conditions, with a name that sounds like $2.
    for v2 in select distinct a.id,a.firstname,a.lastname,amount,tran_date,c.lat,c.lng,address,a.description,d.score from people a,transactions b,location c,
        (SELECT w.id, q.score FROM people w, gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'gpadmin.public.people' , 'Pivotal', null) q
         WHERE (q.id::integer) = w.id order by 2 desc) d
      where soundex(firstname)=soundex($2) and a.id=b.id and amount > $3
        and (extract(epoch from now()) - extract(epoch from tran_date))/3600 < $4
        and st_distance_sphere(st_makepoint($5, $6),st_makepoint(c.lng, c.lat))/1000.0 <= 2.0 and b.locid=c.locid and a.id=d.id
    loop
      -- Any path of length >= 1 is a direct or indirect link. Dynamic SQL is used because "out"
      -- is recreated for every v1; linkchk becomes NULL when no row matches.
      execute 'select 1 from out where dist >= 1 and id = ' || v2.id into linkchk;
      if linkchk is not null then
        insert into results values (v1.id,v1.firstname,v1.lastname,v1.amount,v1.tran_date,v1.lat,v1.lng,v1.address,v1.description,v1.score);
        insert into results values (v2.id,v2.firstname,v2.lastname,v2.amount,v2.tran_date,v2.lat,v2.lng,v2.address,v2.description,v2.score);
      end if;
    end loop;
  end loop;
  return 0;
end
$$ LANGUAGE plpgsql;
-- person 1, person 2, amount, duration in hours, longitude, latitude (of the reference point)
select get_people('Pavan','Peter',200,24,103.912680, 1.309432) ;
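The name match above relies on `soundex()` from PostgreSQL's fuzzystrmatch module. The following Python sketch shows the idea: keep the first letter, map the remaining letters to digit classes, drop vowels and H, W, Y, skip a letter whose class equals that of the letter right before it, and pad to four characters. It uses the simple adjacency rule with no special H/W handling, which we believe is how fuzzystrmatch behaves, but treat it as an approximation; other Soundex variants can give different codes for some names.

```python
# Digit class for each letter A..Z; "0" (vowels, H, W, Y) is never emitted.
_CODES = "01230120022455012623010202"


def _code(ch: str) -> str:
    """Digit class of an ASCII letter; any other character is returned unchanged."""
    if ch.isascii() and ch.isalpha():
        return _CODES[ord(ch.upper()) - ord("A")]
    return ch


def soundex(name: str) -> str:
    # Skip leading non-letters; an input with no letters yields an empty code.
    i = 0
    while i < len(name) and not (name[i].isascii() and name[i].isalpha()):
        i += 1
    if i == len(name):
        return ""
    out = [name[i].upper()]  # the first letter is kept as-is
    prev = name[i]
    for ch in name[i + 1:]:
        if len(out) == 4:
            break
        # Emit a digit only when this letter's class differs from the previous character's.
        if ch.isascii() and ch.isalpha() and _code(ch) != _code(prev):
            digit = _code(ch)
            if digit != "0":
                out.append(digit)
        prev = ch
    return "".join(out).ljust(4, "0")


print(soundex("Peter"), soundex("Pieter"))  # P360 P360
```

Because spelling variants collapse to the same code, `soundex(firstname) = soundex($1)` finds "Pieter" when searching for "Peter".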
• Amount > $200: a plain comparison on the withdrawal amount.
• Withdrawn within the last 24 hours: Greenplum time functions calculate the hours between the withdrawal time and now.
• ATM within 2 km: the PostGIS functions st_distance_sphere() and st_makepoint() calculate the distance between the ATM location and the reference latitude/longitude.
• Works at 'Pivotal': the GPText function gptext.search() checks whether both people work at Pivotal.
• Name sounds like 'Pavan' or 'Peter': the Greenplum fuzzy string match function soundex().
• Linked to each other: a breadth-first search (BFS) with Greenplum and Apache MADlib finds direct or indirect links between the two people.
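The distance, time and amount conditions can also be checked outside the database with a short Python sketch. `distance_sphere_m` uses the haversine formula on a sphere; the radius below is the mean Earth radius, an assumption, so results may differ slightly from PostGIS `st_distance_sphere()`. `matches_filters` is a hypothetical helper that mirrors the filters of the scenario, measuring elapsed hours as now minus the transaction time.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6371008.8  # mean Earth radius (assumption; PostGIS may use another sphere)


def distance_sphere_m(lng1, lat1, lng2, lat2, radius=EARTH_RADIUS_M):
    """Great-circle (haversine) distance in metres between two lng/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def matches_filters(amount, tran_date, atm_lng, atm_lat, ref_lng, ref_lat,
                    min_amount=200, hours=24, max_km=2.0, now=None):
    """Hypothetical helper: amount > min_amount, recent enough, and ATM close enough."""
    now = now or datetime.now()
    recent = (now - tran_date).total_seconds() / 3600 < hours
    near = distance_sphere_m(ref_lng, ref_lat, atm_lng, atm_lat) / 1000.0 <= max_km
    return amount > min_amount and recent and near


now = datetime(2018, 10, 23, 12, 0)
print(matches_filters(250, now - timedelta(hours=1), 103.92, 1.31, 103.912680, 1.309432, now=now))
```

Passing `now` explicitly keeps the check reproducible; in the database, `now()` plays the same role.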
Apache MADlib
• 50+ algorithms
• Pivotal Greenplum
• http://madlib.apache.org/
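MADlib's `graph_bfs()` records, for every vertex reachable from a source vertex, its distance and its parent in an output table. A plain-Python sketch of the same breadth-first search shows why a positive distance means two people are linked directly or indirectly. Edges are treated as undirected by default, which we understand to be `graph_bfs`'s default; `graph_bfs` and `linked` below are hypothetical Python helpers, not the MADlib API.

```python
from collections import deque


def graph_bfs(edges, source, directed=False):
    """Return {vertex: (dist, parent)} for every vertex reachable from source."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
        if not directed:
            adj.setdefault(dst, []).append(src)
    result = {source: (0, None)}  # the source itself is at distance 0
    queue = deque([source])
    while queue:
        v = queue.popleft()
        dist = result[v][0]
        for w in adj.get(v, []):
            if w not in result:  # first visit is the shortest path in an unweighted graph
                result[w] = (dist + 1, v)
                queue.append(w)
    return result


def linked(edges, a, b):
    """True if b is reachable from a through one or more links (direct or indirect)."""
    return a != b and b in graph_bfs(edges, a)


edges = [(1, 2), (2, 3), (4, 5)]
print(linked(edges, 1, 3))  # True
```

Running the search once per source and then probing the result for each candidate is cheaper than searching again for every pair.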
MADlib
• Singular Value Decomposition (SVD)
• Principal Component Analysis (PCA)
• Latent Dirichlet Allocation (LDA)
• Conditional Random Fields (CRF)
• k-means clustering
• CountMin
• Flajolet-Martin
Latest release: MADlib v1.15, URL: madlib.net
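CountMin, listed above, is a sketch for approximate frequency counting in fixed memory. A minimal Python version keeps `depth` rows of `width` counters with one hash per row; an item's count is estimated as the minimum of its counters, so collisions can make the estimate too high but never too low. The width/depth defaults and the salted BLAKE2b hashing are assumptions for illustration, not MADlib's implementation.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency sketch: estimates may be too high, never too low."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        """Yield (row, column) pairs, using a differently salted BLAKE2b hash per row."""
        data = str(item).encode()
        for row in range(self.depth):
            digest = hashlib.blake2b(data, digest_size=8,
                                     salt=row.to_bytes(16, "little")).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions only ever add to a counter, so the smallest counter is the tightest bound.
        return min(self.table[row][col] for row, col in self._columns(item))


sketch = CountMinSketch()
for word in ["atm", "atm", "atm", "pos", "web"]:
    sketch.add(word)
print(sketch.estimate("atm"))  # at least 3
```

Memory stays at `width * depth` counters no matter how many distinct items are added, which is what makes such sketches useful on large tables.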
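1
PL/Container: R/Python Containerization on Greenplum
• Python and R run inside Pivotal Greenplum through PL/Python and PL/R.
• These are untrusted languages: a UDF can reach the host operating system, so the DBA has to vet every UDF.
• PL/Container: runs Greenplum R and Python UDFs inside Docker containers.
– R and Python are isolated from Greenplum.
– A UDF that calls System(“rm -rf /data”) cannot damage the database host.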
Figure: a conventional Greenplum cluster (master and standby master hosts; segment hosts Node1 … NodeN, each running primary and mirror segments; connected by the interconnect) next to Greenplum for Kubernetes, where the master, standby, primary and mirror segments each run in a pod (kubelet, kube-proxy, docker).
2
Greenplum for Kubernetes
• Day 1 - Easy to Build
• Pivotal
• BI/ETL
• Day 2 - Easy to Operate
Container-Native Databases.
PostgreSQL OSS
• Greenplum OSS: Greenplum and PostgreSQL
• Greenplum 5 incorporates 3,355 PostgreSQL commits
• New data types
– JSON
– Hstore (key/value pairs)
– XML
– UUID
• Analyze
• DO (anonymous code blocks)
• Default and VARIADIC function arguments
• DBLink
• PostGIS
• Lazy XID
Greenplum
• AWS, Azure
• AWS S3
• Apache MADlib: k-Means Clustering
▪ URL: http://madlib.apache.org/docs/v1.8/group__grp__kmeans.html
• CentOS 7.5 64bit, Pivotal Greenplum 5.9.0
• EC2: r4.xlarge
▪ URL: https://aws.amazon.com/marketplace/pp/B06XKQ8Z3H
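The demo runs k-means inside Greenplum with MADlib (MADlib provides functions such as `madlib.kmeanspp()`). To show what the algorithm itself does, here is a plain-Python sketch of Lloyd's iterations with randomly chosen initial centroids; it is an illustration, not MADlib's implementation, and the sample points are made up.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: assign points to the nearest centroid, then move each centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids drawn from the data
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                # Mean of each coordinate across the points in the cluster.
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(kmeans(pts, 2)))
```

In the database, the assignment step runs on every segment in parallel, which is why in-DB k-means scales with the MPP cluster instead of pulling the data out to a client.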
Figure: Pivotal Greenplum overview.
• Interfaces: JDBC, ODBC; ANSI SQL
• Data: structured data; RDBMS; Spark, GemFire, HDFS; JSON, Apache AVRO, Apache Parquet and XML
• SQL compatibility: Teradata SQL and other DB SQL; SQL (Hyper-Q)
• Analytics: Apache MADlib; Python, R, Java, Perl, C; Apache SOLR; PostGIS
• Ingestion: Kafka, ETL, Spring Cloud Data Flow
• Platform: MPP, PostgreSQL, query optimizer (GPORCA), Command Center
• Use cases: custom apps, BI / reporting, machine learning, AI
• Transforming your Data platform for Analytics
– Date: 2018-10-23 (Tue) 15:00-17:30 (registration opens 14:30)
– Venue: Pivotal (20F)
– Registration: https://pivotal.omniattend.com/seminar/Data20181023
• Pivotal Japan Tech Community
– https://pivotal-japan.connpass.com/
• Twitter: @greenplummy
– https://twitter.com/greenplummy
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 

Pivotal Greenplum: Next-Generation Multi-Cloud Data Analytics Platform

  • 1. © Copyright 2018 Pivotal. All rights reserved. 2.0, 2018
  • 2. • Pivotal • Pivotal Greenplum • Pivotal Greenplum on AWS & Azure – (AWS S3) – In-DB Apache MADlib (k-Means Clustering)
  • 4. Pivotal Data Suite: Spring Cloud Data Flow, GemFire, Greenplum
  • 5. Diagram labels: RDBMS, Hadoop, Machine; Analytics Apps, Online Apps, Mobile Apps; Pivotal Greenplum, Pivotal GemFire, Spring Cloud Data Flow
  • 6. Diagram: Pivotal Data Suite arranged by data temperature (Hot / Warm / Cold). Pivotal GemFire: In-Memory Data Grid. Pivotal Greenplum (Data Warehouse): Massively Parallel Architecture. Other labels: Hadoop Data Lakes, Public Cloud Data Lakes, Predefined Libraries, Programmatic, GPText, Parallel Configurable Data Load, High Speed Ingestion, Analytical Data to cache, Parallel Data Load and External Tables, In-DB Predictive Analytics
  • 7. Pivotal Greenplum • Pivotal Data Suite (CPU) • MPP DB • (etc.)
  • 8. Pivotal Greenplum: MPP (Massively Parallel Processing). Diagram labels: SQL (x 2), gNet interconnect
  • 9. Diagram comparing CPU and I/O resources. Labels: CPU (several per node), I/O, HW, RDB, DB
  • 10. Pivotal Greenplum. Diagram labels: RDBMS, IO, DB; columns A, B, C, D shown in two layouts, each with IO. Chart for 2008 to 2012 with legend HIGH / Medium / LOW and placeholder series XXX, YYY, ZZZ, AAA, BBB, CCC
  • 11. Pivotal Greenplum/Hadoop. Diagram labels: Hadoop nodes, Greenplum nodes; Pivotal Greenplum → DB, Hadoop; Pivotal Greenplum/Hadoop → (two variants); 10Gb; Pivotal Greenplum, Hadoop, DB
  • 12. Hadoop Data Lakes • Public Cloud Data Lakes • Hybrid • Local
  • 13. Pivotal Greenplum 5 • PostgreSQL • BI
  • 14.
  • 15.
  • 16.
  • 17. Pivotal Greenplum 5 • PostgreSQL • BI
  • 18. DWH • BI • DWH
  • 19. BI
  • 20. Pivotal Greenplum and OSS • Apache MADlib (50 • Pivotal Greenplum • Apache, July 2017): http://madlib.apache.org • Apache Solr: http://lucene.apache.org/solr/ • PostGIS (PostgreSQL OSS): http://postgis.net/ • R: https://www.r-project.org/ • Python: https://www.python.org/
  • 22. Demo: find people whose first names sound like 'Peter' and 'Pavan', who both work at Pivotal, who are linked to each other, and who each withdrew more than $200 within the last 24 hours from an ATM within 2 km of a given location.

      DROP FUNCTION IF EXISTS get_people(text, text, integer, integer, float, float);

      -- $1, $2: first names of person 1 and person 2 (fuzzy-matched with soundex)
      -- $3: minimum amount; $4: time window in hours
      -- $5, $6: longitude and latitude of the reference location
      CREATE FUNCTION get_people(text, text, integer, integer, float, float)
      RETURNS integer AS $$
      DECLARE
        linkchk integer;
        v1 record;
        v2 record;
      BEGIN
        TRUNCATE TABLE results;

        -- Candidates for person 1
        FOR v1 IN
          SELECT DISTINCT a.id, a.firstname, a.lastname, amount, tran_date,
                 c.lat, c.lng, address, a.description, d.score
          FROM people a, transactions b, location c,
               -- GPText full-text search for 'Pivotal' in the people index
               (SELECT w.id, q.score
                FROM people w,
                     gptext.search(TABLE(SELECT 1 SCATTER BY 1),
                                   'gpadmin.public.people', 'Pivotal', null) q
                WHERE (q.id::integer) = w.id) d
          WHERE soundex(firstname) = soundex($1)
            AND a.id = b.id
            AND amount > $3
            -- withdrawal within the last $4 hours (now minus tran_date, so the age is positive)
            AND (extract(epoch FROM now()) - extract(epoch FROM tran_date)) / 3600 < $4
            -- ATM within 2 km of ($5, $6); st_distance_sphere returns metres
            AND st_distance_sphere(st_makepoint($5, $6), st_makepoint(c.lng, c.lat)) / 1000.0 <= 2.0
            AND b.locid = c.locid
            AND a.id = d.id
        LOOP
          -- MADlib BFS from person 1; it does not depend on person 2, so run it once per v1
          EXECUTE 'DROP TABLE IF EXISTS out, out_summary';
          EXECUTE 'SELECT madlib.graph_bfs(''people'', ''id'', ''links'', NULL, '
                  || v1.id || ', ''out'')';

          -- Candidates for person 2: same filters, matched against $2
          FOR v2 IN
            SELECT DISTINCT a.id, a.firstname, a.lastname, amount, tran_date,
                   c.lat, c.lng, address, a.description, d.score
            FROM people a, transactions b, location c,
                 (SELECT w.id, q.score
                  FROM people w,
                       gptext.search(TABLE(SELECT 1 SCATTER BY 1),
                                     'gpadmin.public.people', 'Pivotal', null) q
                  WHERE (q.id::integer) = w.id) d
            WHERE soundex(firstname) = soundex($2)
              AND a.id = b.id
              AND amount > $3
              AND (extract(epoch FROM now()) - extract(epoch FROM tran_date)) / 3600 < $4
              AND st_distance_sphere(st_makepoint($5, $6), st_makepoint(c.lng, c.lat)) / 1000.0 <= 2.0
              AND b.locid = c.locid
              AND a.id = d.id
          LOOP
            -- dist = 1: person 2 is a direct neighbour of person 1 in the links graph
            SELECT 1 INTO linkchk FROM out WHERE dist = 1 AND id = v2.id;
            IF linkchk IS NOT NULL THEN
              INSERT INTO results VALUES (v1.id, v1.firstname, v1.lastname, v1.amount, v1.tran_date,
                                          v1.lat, v1.lng, v1.address, v1.description, v1.score);
              INSERT INTO results VALUES (v2.id, v2.firstname, v2.lastname, v2.amount, v2.tran_date,
                                          v2.lat, v2.lng, v2.address, v2.description, v2.score);
            END IF;
          END LOOP;
        END LOOP;
        RETURN 0;
      END
      $$ LANGUAGE plpgsql;

      -- person 1, person 2, amount, duration in hours, longitude, latitude (of the location in question)
      SELECT get_people('Pavan', 'Peter', 200, 24, 103.912680, 1.309432);

    Functions used in the demo:
    – PostGIS st_distance_sphere() and st_makepoint() compute the distance between the ATM location and the reference latitude/longitude (< 2 km).
    – GPText gptext.search() checks whether both people work at 'Pivotal'.
    – Apache MADlib BFS search on Greenplum finds links between people (the function above keeps only direct links, dist = 1; a larger distance would admit indirect ones).
    – The Greenplum fuzzy string match function soundex() checks whether a name sounds like 'Pavan' or 'Peter'.
    – Greenplum time functions check that the withdrawal happened less than 24 hours ago.
    – Amount > $200.
  • 23. Apache MADlib • 50 • Pivotal Greenplum • http://madlib.apache.org/
  • 24. MADlib • SVD • PCA • LDA • CRF • k-means • CountMin • Flajolet-Martin • Latest release: MADlib v1.15, URL: madlib.net
  • 25. Pivotal Greenplum 5 • PostgreSQL • BI
  • 26. (1) PL/Container: R/Python Containerization on Greenplum • Python and R in Pivotal Greenplum: PL/Python, PL/R • UDF • DBA • PL/Container: Greenplum + Docker – R and Python UDFs run inside Docker containers, isolated from the Greenplum host – limits the damage a UDF such as system("rm -rf /data") can do
  • 27. Diagram: a Greenplum cluster (Master Host, Standby Master, Interconnect, and Segment Hosts Node1, Node2, …, NodeN, each with Primary and Mirror segments) mapped onto Greenplum for Kubernetes, where the master, the standby, and each primary and mirror segment run in their own Pod (kubelet, kube-proxy, docker)
  • 28. (2) Greenplum for Kubernetes • Day 1: Easy to Build • Pivotal • BI/ETL • Day 2: Easy to Operate
  • 30. PostgreSQL and OSS • Greenplum OSS • Greenplum and PostgreSQL (3355 PostgreSQL commits), Greenplum 5 • New Data Types – JSON – Hstore – XML – UUID • Analyze • DO • default/Variadic • DBLink • PostGIS • Lazy XID
  • 31. Greenplum
  • 32. • AWS, Azure • AWS S3 • Apache MADlib: k-Means Clustering (URL: http://madlib.apache.org/docs/v1.8/group__grp__kmeans.html) • CentOS 7.5 64bit, Pivotal Greenplum 5.9.0 • EC2: r4.xlarge (URL: https://aws.amazon.com/marketplace/pp/B06XKQ8Z3H)
  • 33. Repeat of the slide 6 diagram: Pivotal Data Suite by data temperature, with Pivotal GemFire (In-Memory Data Grid) and Pivotal Greenplum (Data Warehouse)
  • 34. Diagram labels: PIVOTAL GREENPLUM (MPP, PostgreSQL, GPORCA, Command Center); Structured Data; JDBC, ODBC; SQL, ANSI SQL; RDBMS; Spark, GemFire, HDFS; JSON, Apache AVRO, Apache Parquet and XML; Teradata SQL, DB SQL, SQL (Hyper-Q); Apache MADlib; Python, R, Java, Perl, C; Apache SOLR; PostGIS; Custom Apps, BI / Reporting, Machine Learning, AI; Kafka, ETL, Spring Cloud Data Flow; IT
  • 35.
  • 36. • Transforming your Data platform for Analytics – Date: October 23, 2018 (Tue), 15:00-17:30 (doors open 14:30) – Venue: Pivotal (20F) – URL: https://pivotal.omniattend.com/seminar/Data20181023 • Pivotal Japan Tech Community – https://pivotal-japan.connpass.com/ • Twitter: @greenplummy – https://twitter.com/greenplummy
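Slide 8 names MPP (Massively Parallel Processing). As a minimal Python sketch of the idea, assuming a simplified model: rows are hash-distributed across a fixed number of hypothetical segments, each segment aggregates only the rows it holds, and a "master" merges the partial results. The segment count, the crc32 hash, and all function names are illustrative assumptions; Greenplum's real distribution hash and its gNet interconnect are not modeled, and threads merely stand in for segment hosts.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_SEGMENTS = 4  # hypothetical number of segments


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Choose a segment by hashing the distribution key (crc32 is illustrative, not Greenplum's hash)."""
    return crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments: int = NUM_SEGMENTS):
    """Hash-distribute (key, amount) rows so each segment holds a disjoint slice of the table."""
    segments = [[] for _ in range(num_segments)]
    for key, amount in rows:
        segments[segment_for(key, num_segments)].append((key, amount))
    return segments


def partial_sum_by_key(segment_rows):
    """Local step on one segment: aggregate only the rows that segment stores."""
    totals = {}
    for key, amount in segment_rows:
        totals[key] = totals.get(key, 0) + amount
    return totals


def parallel_sum_by_key(rows, num_segments: int = NUM_SEGMENTS):
    """Scatter rows, aggregate on every segment concurrently, then merge on the 'master'."""
    segments = distribute(rows, num_segments)
    # Threads stand in for segment hosts; each works on its own slice only.
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(partial_sum_by_key, segments))
    result = {}
    for partial in partials:
        for key, total in partial.items():
            result[key] = result.get(key, 0) + total
    return result
```

Adding partial totals in the merge, rather than overwriting them, keeps the result correct even when rows are distributed on a column other than the grouping key, since the same key can then appear on several segments.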
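The slide 22 demo chains several filters with a graph check. The plain-Python sketch below mirrors that logic so each step can be read on its own: a simple Soundex variant for the fuzzy name match (close to, but not guaranteed identical to, PostgreSQL's soundex()), the haversine great-circle distance as an approximation of st_distance_sphere() (assuming a 6,371 km Earth radius), the amount and 24-hour filters, and a breadth-first search where the direct link is BFS distance 1. Links are treated as undirected, an assumption about the demo's graph_bfs call. The record layout and all function names are hypothetical, the GPText search is reduced to a substring check, and matching id pairs are returned instead of rows being inserted into a results table.

```python
import math
from collections import deque
from datetime import datetime, timedelta

# Soundex digit for each letter A..Z; "0" marks letters that are dropped (vowels, H, W, Y)
_SOUNDEX_CODES = "01230120022455012623010202"
EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius for the spherical model


def _code(letter: str) -> str:
    return _SOUNDEX_CODES[ord(letter) - ord("A")]


def soundex(name: str) -> str:
    """4-character code: first letter, then consonant digits, skipping repeats of the adjacent letter's code."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    for prev, cur in zip(letters, letters[1:]):
        code = _code(cur)
        if code != "0" and code != _code(prev):
            result += code
            if len(result) == 4:
                break
    return result.ljust(4, "0")


def distance_km(lng1, lat1, lng2, lat2):
    """Great-circle (haversine) distance in km on a sphere of radius EARTH_RADIUS_KM."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bfs_distances(edges, source):
    """Hop count from source to every reachable vertex, treating each (src, dest) link as undirected."""
    adjacency = {}
    for src, dest in edges:
        adjacency.setdefault(src, []).append(dest)
        adjacency.setdefault(dest, []).append(src)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency.get(v, []):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def candidates(people, name, min_amount, hours, ref_lng, ref_lat, now):
    """Ids of people passing the name, 'Pivotal', amount, time-window and 2 km filters."""
    matches = []
    for person in people:
        if soundex(person["firstname"]) != soundex(name):
            continue
        if "Pivotal" not in person["description"]:  # stand-in for gptext.search()
            continue
        for amount, when, lng, lat in person["withdrawals"]:
            if (amount > min_amount
                    and now - when < timedelta(hours=hours)
                    and distance_km(ref_lng, ref_lat, lng, lat) <= 2.0):
                matches.append(person["id"])
                break  # one qualifying withdrawal is enough
    return matches


def get_people(people, links, name1, name2, min_amount, hours, ref_lng, ref_lat, now=None):
    """(id1, id2) pairs where both people qualify and are directly linked (BFS distance 1)."""
    if now is None:
        now = datetime.now()
    second = candidates(people, name2, min_amount, hours, ref_lng, ref_lat, now)
    pairs = []
    for id1 in candidates(people, name1, min_amount, hours, ref_lng, ref_lat, now):
        dist = bfs_distances(links, id1)  # one BFS per person 1, as in the SQL version
        pairs.extend((id1, id2) for id2 in second if dist.get(id2) == 1)
    return pairs
```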
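The slide 27 diagram gives every segment host both Primary and Mirror segments; the point of mirroring is that each primary's mirror lives on a different host, so one host failure loses no data. The Python sketch below shows one such layout, assuming a group-mirroring style in which all mirrors for host i's primaries go to host (i + 1) mod N. The host names, the primaries-per-host count, and both helper functions are hypothetical, and Greenplum for Kubernetes' actual pod scheduling is not modeled.

```python
from collections import Counter


def place_segments(hosts, primaries_per_host):
    """Map content id -> (primary_host, mirror_host), putting all of host i's mirrors on host i + 1."""
    if len(hosts) < 2:
        raise ValueError("mirroring needs at least two segment hosts")
    layout = {}
    content_id = 0
    for i, host in enumerate(hosts):
        mirror_host = hosts[(i + 1) % len(hosts)]  # never the same host as the primary
        for _ in range(primaries_per_host):
            layout[content_id] = (host, mirror_host)
            content_id += 1
    return layout


def acting_primaries(layout, failed_host=None):
    """Segments each host serves as primary, after mirrors on other hosts take over for failed_host."""
    counts = Counter()
    for primary, mirror in layout.values():
        counts[mirror if primary == failed_host else primary] += 1
    return counts
```

The layout makes one trade-off visible: after a host fails, the host holding its mirrors serves twice its usual number of primaries. Spreading each host's mirrors across several other hosts is a common way to even out that load.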
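Slide 32 runs k-Means Clustering in-database with Apache MADlib. As a rough illustration of what such a clustering step computes, this is a plain-Python sketch of Lloyd's algorithm with k-means++ seeding. It is not MADlib's implementation; the function names, the fixed random seed, the iteration cap, and the convergence tolerance are assumptions made for the example.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda j: _sq_dist(point, centroids[j]))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: draw each new centroid with probability proportional to its
    squared distance from the nearest centroid chosen so far."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(_sq_dist(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:  # every point already coincides with a centroid
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update until centroids stop moving."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        assignment = [_nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its assigned points
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        shift = max(_sq_dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    # Final labels are computed against the final centroids
    return centroids, [_nearest(p, centroids) for p in points]
```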