Pivotal Greenplum: Next-Generation Multi-Cloud Data Analytics Platform

© Copyright 2018 Pivotal. All rights reserved.

– Pivotal
• Pivotal Greenplum
– AWS & Azure: Pivotal Greenplum
– (AWS S3)
– Apache MADlib (k-Means Clustering), In-DB

[Figure labels: DWH/Hadoop, IaaS/PaaS, M2M, DevOps, CI/CD]

Pivotal Data Suite
Spring Cloud Data Flow, GemFire, Greenplum

[Diagram labels: RDBMS, Hadoop, Machine; Analytics Apps, Online Apps, Mobile Apps; PIVOTAL GREENPLUM, PIVOTAL GEMFIRE, Spring Cloud Data Flow]

Pivotal Data Suite
[Diagram labels: Hadoop Data Lakes; Public Cloud Data Lakes; Massively Parallel Architecture; Predefined Libraries; Programmatic; GPText; In-DB Predictive Analytics; Parallel Configurable Data Load; Parallel Data Load and External Tables; High Speed Ingestion; Analytical Data to cache; In-Memory Data Grid; Data Temperature: Cold, Warm, Hot; PIVOTAL GEMFIRE; PIVOTAL GREENPLUM (Data Warehouse)]

Pivotal Greenplum
• Pivotal Data Suite (CPU)
• MPP DB

Pivotal Greenplum
MPP (Massively Parallel Processing)
[Diagram labels: SQL, gNet, CPU, I/O, HW, RDB, DB]
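The MPP idea named on this slide can be sketched in a few lines of Python. This is an illustration only, not Greenplum's implementation: rows are spread across segments by hashing a distribution key, each segment aggregates its own rows, and the partial results are combined. The segment count, the CRC32 hash, and the `id`/`amount` fields are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical number of segments


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Pick a segment for a distribution-key value (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments=NUM_SEGMENTS):
    """Place every row on the segment chosen by hashing its distribution key."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_field], num_segments)].append(row)
    return segments


def parallel_sum(segments, value_field):
    """Sum a column: one partial sum per segment, then a final combine step."""
    partials = [sum(row[value_field] for row in rows) for rows in segments.values()]
    return sum(partials)


# Hypothetical sample table with 100 rows.
rows = [{"id": i, "amount": i * 10} for i in range(1, 101)]
segments = distribute(rows, "id")
total = parallel_sum(segments, "amount")
```

In a real cluster the per-segment work runs concurrently on separate hosts, which is where the speed-up comes from; here the partial sums are simply computed one after another.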

Pivotal Greenplum
[Figure labels: RDBMS, IO, DB; columns A B C D; years 2008 2009 2010 2011 2012; levels HIGH, Medium, LOW; XXX, YYY, ZZZ; AAA, BBB, CCC]

Pivotal Greenplum/Hadoop
[Diagram labels: Pivotal Greenplum, Hadoop, DB, 10Gb]

Hadoop Data Lakes, Public Cloud Data Lakes, Hybrid, Local

Pivotal Greenplum 5
• PostgreSQL
• BI

[Diagram labels: DWH, BI]

Pivotal Greenplum OSS
• Apache MADlib (Apache, 2017/7): http://madlib.apache.org
• Apache Solr: http://lucene.apache.org/solr/
• PostGIS (PostgreSQL, OSS): http://postgis.net/
• R: https://www.r-project.org/
• Python: https://www.python.org/

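Example: find linked people who work at "Pivotal", whose names sound like "Peter" or "Pavan", and who withdrew more than $200 within 24 hours from an ATM within 2 km of a reference location.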
-- Find pairs of directly linked people who work at 'Pivotal', whose first names
-- sound like $1 and $2, and who each transacted more than $3 within the last
-- $4 hours at a location within 2 km of the point ($5 = longitude, $6 = latitude).
DROP FUNCTION IF EXISTS get_people(text, text, integer, integer, float, float);
CREATE FUNCTION get_people(text, text, integer, integer, float, float) RETURNS integer
AS $$
DECLARE
  linkchk integer; v1 record; v2 record;
BEGIN
  EXECUTE 'TRUNCATE TABLE results;';
  -- Outer loop: candidates whose first name sounds like $1.
  FOR v1 IN
    SELECT DISTINCT a.id, a.firstname, a.lastname, amount, tran_date, c.lat, c.lng, address, a.description, d.score
    FROM people a, transactions b, location c,
      (SELECT w.id, q.score FROM people w, gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'gpadmin.public.people', 'Pivotal', null) q
       WHERE (q.id::integer) = w.id ORDER BY 2 DESC) d  -- GPText matches for 'Pivotal'
    WHERE soundex(firstname) = soundex($1) AND a.id = b.id AND b.locid = c.locid AND a.id = d.id
      AND amount > $3
      AND (extract(epoch FROM now()) - extract(epoch FROM tran_date)) / 3600 < $4  -- hours since the transaction
      AND st_distance_sphere(st_makepoint($5, $6), st_makepoint(c.lng, c.lat)) / 1000.0 <= 2.0  -- within 2 km
  LOOP
    -- Inner loop: the same query for candidates whose first name sounds like $2.
    FOR v2 IN
      SELECT DISTINCT a.id, a.firstname, a.lastname, amount, tran_date, c.lat, c.lng, address, a.description, d.score
      FROM people a, transactions b, location c,
        (SELECT w.id, q.score FROM people w, gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'gpadmin.public.people', 'Pivotal', null) q
         WHERE (q.id::integer) = w.id ORDER BY 2 DESC) d
      WHERE soundex(firstname) = soundex($2) AND a.id = b.id AND b.locid = c.locid AND a.id = d.id
        AND amount > $3
        AND (extract(epoch FROM now()) - extract(epoch FROM tran_date)) / 3600 < $4
        AND st_distance_sphere(st_makepoint($5, $6), st_makepoint(c.lng, c.lat)) / 1000.0 <= 2.0
    LOOP
      -- Breadth-first search from v1 over the links table; dist = 1 means a direct link.
      EXECUTE 'DROP TABLE IF EXISTS out, out_summary;';
      EXECUTE 'SELECT madlib.graph_bfs(''people'',''id'',''links'',NULL,' || v1.id || ',''out'');';
      SELECT 1 INTO linkchk FROM out WHERE dist = 1 AND id = v2.id;
      IF linkchk IS NOT NULL THEN
        INSERT INTO results VALUES (v1.id, v1.firstname, v1.lastname, v1.amount, v1.tran_date, v1.lat, v1.lng, v1.address, v1.description, v1.score);
        INSERT INTO results VALUES (v2.id, v2.firstname, v2.lastname, v2.amount, v2.tran_date, v2.lat, v2.lng, v2.address, v2.description, v2.score);
      END IF;
    END LOOP;
  END LOOP;
  RETURN 0;
END
$$ LANGUAGE plpgsql;
-- Arguments: person 1, person 2, amount, duration in hours, longitude, latitude (of the reference point)
SELECT get_people('Pavan', 'Peter', 200, 24, 103.912680, 1.309432);
• The Greenplum PostGIS functions st_distance_sphere() and st_makepoint() calculate the distance between the ATM location and the reference latitude/longitude, which must be under 2 km.
• The gptext.search() function checks whether both people work at 'Pivotal'.
• Greenplum and the Apache MADlib BFS search determine whether there are direct or indirect links between the people.
• The Greenplum fuzzy string match function soundex() checks whether a person's name sounds like 'Pavan' or 'Peter'.
• Greenplum time functions check that the withdrawal took place less than 24 hours ago.
• "Pivotal": GPText
• "Peter", "Pavan": Fuzzy String Match
• Links between people: Apache MADlib
• "2 km", "ATM": PostGIS
• "24 hours", amount > $200
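Two of the checks in the function above, the 2 km distance test and the direct-link test, can be sketched outside the database. The Python below is a minimal illustration, not what PostGIS or MADlib run internally: it uses the haversine formula with an assumed mean Earth radius, treats links as undirected (an assumption), and the ATM coordinates and link pairs are hypothetical sample data. Only the reference point comes from the call shown above.

```python
import math
from collections import deque

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius; PostGIS may use a slightly different value


def sphere_distance_km(lng1, lat1, lng2, lat2):
    """Great-circle distance between two (longitude, latitude) points, via haversine."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bfs_distances(edges, source):
    """Hop count from source to every reachable vertex (edges treated as undirected)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


reference = (103.912680, 1.309432)  # longitude, latitude from the call above
atm = (103.920000, 1.315000)        # hypothetical ATM location
within_2km = sphere_distance_km(*reference, *atm) <= 2.0

links = [(1, 2), (2, 3)]            # hypothetical person-to-person links
dist = bfs_distances(links, 1)
directly_linked = dist.get(2) == 1  # same idea as "dist = 1" in the out table
```

A hop count of 1 corresponds to a direct link; larger values correspond to the indirect links mentioned in the annotations.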

Apache MADlib
• 50
• Pivotal Greenplum
• http://madlib.apache.org/

MADlib
• Singular value decomposition (SVD)
• Principal component analysis (PCA)
• Latent Dirichlet allocation (LDA)
• Conditional random fields (CRF)
• k-means clustering
• CountMin sketch
• Flajolet-Martin sketch
Latest release: MADlib v1.15, URL: madlib.net

PL/Container: R/Python Containerization on Greenplum
• Python, R: Pivotal Greenplum PL/Python, PL/R
• UDF
• DBA, UDF
• PL/Container: Greenplum, Docker
– R, Python, Greenplum
– UDF
– Python/R
system("rm -rf /data")

[Diagram labels: Master Host, Standby Master, Interconnect; Segment Hosts Node1, Node2, ... NodeN, each with Primary and Mirror]
Greenplum for Kubernetes
[Diagram labels: Pods for master, standby, Primary, and Mirror; kubelet, kube-proxy, docker]

Greenplum for Kubernetes
• Day 1 - Easy to Build
• Pivotal
• BI/ETL
• Day 2 - Easy to Operate

PostgreSQL OSS
• Greenplum OSS, Greenplum, PostgreSQL
• 3355 PostgreSQL commits (Greenplum 5)
• New Data Types
– JSON
– Hstore
– XML
– UUID
• Analyze
• DO
• default/Variadic
• DBLink
• PostGIS
• Lazy XID

Greenplum

– AWS
– Azure
• (AWS S3)
• Apache MADlib
– k-Means Clustering
▪ URL: http://madlib.apache.org/docs/v1.8/group__grp__kmeans.html
– CentOS 7.5 64bit
– Pivotal Greenplum 5.9.0
• EC2
– r4.xlarge
▪ URL: https://aws.amazon.com/marketplace/pp/B06XKQ8Z3H
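For readers unfamiliar with the k-Means Clustering named above, the following is a minimal pure-Python sketch of the standard iteration (Lloyd's algorithm). It is not MADlib's in-database implementation; the function names, the sample points, and the starting centroids are hypothetical and chosen only to make the steps visible.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of its points; keep it if it has none."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        if members:
            new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

Running the same loop inside the database avoids moving the points out of Greenplum, which is the point of the In-DB approach this deck describes.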


PIVOTAL GREENPLUM
[Diagram labels:
Custom Apps, BI / Reporting, Machine Learning, AI
Structured Data; JDBC, ODBC; ANSI SQL; Teradata SQL, DB SQL (Hyper-Q)
RDBMS, Spark, GemFire, HDFS; JSON, Apache AVRO, Apache Parquet and XML
Kafka, ETL, Spring Cloud Data Flow
Apache MADlib; Python, R, Java, Perl, C; Apache SOLR; PostGIS
Pivotal Greenplum (MPP); PostgreSQL; GPORCA; Command Center; IT]

• Transforming your Data platform for Analytics
– Date: October 23, 2018, 15:00-17:30 (14:30)
– Venue: Pivotal (20F)
– URL: https://pivotal.omniattend.com/seminar/Data20181023
• Pivotal Japan Tech Community
– https://pivotal-japan.connpass.com/
• Twitter: @greenplummy
– https://twitter.com/greenplummy
