Pivotal Confidential–Internal Use Only
Modern Data Architecture
Alexey Grishchenko
About me
Enterprise Architect @ Pivotal
• 7 years in data processing
• 5 years with MPP
• 4 years with Hadoop
• Spark contributor
 http://0x0fff.com
How it started…
Front
End
How it started…
Front
End
Back
End
How it started…
Front
End
Back
End
DBMS
How it started…
Front
End
Back
End
DBMS
What about BI?
How it started…
Front
End
Back
End
DBMS
Just put it there!
How it started…
Front
End
Back
End
DBMS
BI
How it started…
Front
End
Back
End
DBMS
BI
Was it fast?
How it started…
Front
End
10ms
Back
End
DBMS
BI
100ms
200ms
1-2 min
How it started…
Front
End
10ms
Back
End
DBMS
BI
100ms
200ms
1-2 min
yes, single server…
First Issues
Front
End
10ms
Back
End
DBMS
BI
100ms
200ms
1-2 min
More users got
workstations
First Issues
Front
End
10ms
Back
End
DBMS
BI
400ms
800ms
1-2 min
First Issues
Front
End
10ms
Back
End
DBMS
BI
400ms
800ms
1-2 min
Split!
First Issues
Front
End
10ms
Back
End
DBMS
BI
300ms
600ms
1-2 min
First Issues
Front
End
10ms
Back
End
DBMS
BI
300ms
600ms
1-2 min
Even more users?
First Issues
Front
End
10ms
Back
End
DBMS
BI
300ms
600ms
1-2 min
Split!
First Issues
Front
End
10ms
Back
End
DBMS
BI
100ms
400ms
1-2 min
Front
End
Back
End
Front
End
Back
End
First Issues
Front
End
10ms
Back
End
DBMS
BI
100ms
400ms
1-2 min
Front
End
Back
End
Front
End
Back
End
What about
automated systems?
First Issues
Front
End
10ms
Back
End
DBMS
BI
100ms
1 sec
5-10 min
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
First Issues
Front
End
10ms
Back
End
DBMS
BI
100ms
1 sec
5-10 min
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
Database, please, live!
First Issues
Front
End
10ms
Back
End
DBMS
BI
100ms
800ms
15-20 min
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
First Issues
Front
End
10ms
Back
End
DBMS
BI
100ms
800ms
15-20 min
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
What if “split” didn’t
help this time?
First Issues
Front
End
10ms
Back
End
DBMS
BI
100ms
800ms
15-20 min
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
Split more! Eventually
it will help…
First Issues
Front
End
10ms
Back
End
DBMS
BI
100ms
300ms
35-40 min
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
DBMS DBMSDBMSDBMS
First Issues
Front
End
10ms
Back
End
DBMS
BI
100ms
300ms
35-40 min
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
DBMS DBMSDBMSDBMS
Sales went
10% up!
First Issues
Front
End
10ms
Back
End
DBMS
BI
100ms
300ms
35-40 min
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
DBMS DBMSDBMSDBMS
Sales went
10% up!
Sales went
20%
down!
First Issues
Front
End
10ms
Back
End
DBMS
BI
100ms
600ms
2-3 hrs
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
DBMS DBMSDBMSDBMS
Sales went
10% up!
Sales went
20%
down!
First Issues
Front
End
10ms
Back
End
DBMS
BI
100ms
600ms
2-3 hrs
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
Front
End
Back
End
DBMS DBMSDBMSDBMS
Sales went
10% up!
Sales went
20%
down!
Stop loading my
system with your
stupid reports!
BI
The Era of Data Warehouse
100ms
DBMS
300ms
2 days
FE
BE
DBMS DBMSDBMSDBMS
FE
BE
FE
BE
FE
BE
FE
BE
ETL
DWH
1 day
BI
The Era of Data Warehouse
100ms
DBMS
300ms
2 days
FE
BE
DBMS DBMSDBMSDBMS
FE
BE
FE
BE
FE
BE
FE
BE
ETL
DWH
1 day
We need more
reports!
BI
The Era of Data Warehouse
100ms
DBMS
300ms
3-4 days
FE
BE
DBMS DBMSDBMSDBMS
FE
BE
FE
BE
FE
BE
FE
BE
ETL
DWH
1 day
Data
Mining
OLAP…
BI
The Era of Data Warehouse
100ms
DBMS
300ms
3-4 days
FE
BE
DBMS DBMSDBMSDBMS
FE
BE
FE
BE
FE
BE
FE
BE
ETL
DWH
1 day
Data
Mining
OLAP…
We need a
secondary site!
The Era of Data Warehouse
100ms
300ms
3-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ETL
DWH
1 day
BI
Data
Mining
OLAP…
The Era of Data Warehouse
100ms
300ms
3-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ETL
DWH
1 day
BI
Data
Mining
OLAP…
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
WAL Replication
3-5 minutes late
The Era of Data Warehouse
100ms
300ms
3-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ETL
DWH
1 day
BI
Data
Mining
OLAP…
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
WAL Replication
3-5 minutes late
Where is our
DWH? We need
this data now!
ETL
The Era of Data Warehouse
100ms
300ms
3-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ETL
DWH
1 day
BI
Data
Mining
OLAP…
FE
BE
FE
BE
FE
BE
FE
BE
FE
BE
WAL Replication
3-5 minutes late
NAS NAS
Backup / Restore
3 days late
DWH
BI
Data
Mining
OLAP…
5-7 days
DBMS DBMS DBMS DBMS DBMS
ETL
The Era of Data Warehouse
100ms
300ms
3-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ETL
DWH
1 day
BI
Data
Mining
OLAP…
FE
BE
FE
BE
FE
BE
FE
BE
FE
BE
WAL Replication
3-5 minutes late
NAS NAS
Backup / Restore
3 days late
DWH
BI
Data
Mining
OLAP…
5-7 days
DBMS DBMS DBMS DBMS DBMS
Why is this data
so old?
ETL
Advanced Architecture – ELT
100ms
300ms
3-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ETL
DWH
1 day
BI
Data
Mining
OLAP…
FE
BE
FE
BE
FE
BE
FE
BE
FE
BE
WAL Replication
3-5 minutes late
NAS NAS
Backup / Restore
3 days late
DWH
BI
Data
Mining
OLAP…
5-7 days
DBMS DBMS DBMS DBMS DBMS
DBMS DBMS DBMS…
ETL
DDS
Data Marts Reports
Aggregates
OLAP
DBMS DBMS DBMS…
ELT
DDS
Data Marts Reports
Aggregates
OLAP
ODS ODS ODS…
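The two flows on this slide differ only in where the transform step runs. A minimal sketch of that difference, assuming an in-memory SQLite database as a stand-in for the DWH and a hypothetical orders feed (the table names `ods_orders` and `dds_orders` and the sample rows are illustrative, not from the slides):

```python
import sqlite3

# Hypothetical source rows: (order_id, amount_cents, status)
SOURCE_ROWS = [(1, 1250, "paid"), (2, 900, "cancelled"), (3, 4300, "paid")]


def etl(rows):
    """ETL: transform outside the DWH, load only the finished result."""
    dwh = sqlite3.connect(":memory:")
    dwh.execute("CREATE TABLE dds_orders (order_id INTEGER, amount REAL)")
    transformed = [
        (order_id, cents / 100.0)
        for order_id, cents, status in rows
        if status == "paid"
    ]
    dwh.executemany("INSERT INTO dds_orders VALUES (?, ?)", transformed)
    return dwh


def elt(rows):
    """ELT: load raw rows into an ODS table, then transform with SQL in the DWH."""
    dwh = sqlite3.connect(":memory:")
    dwh.execute(
        "CREATE TABLE ods_orders (order_id INTEGER, amount_cents INTEGER, status TEXT)"
    )
    dwh.executemany("INSERT INTO ods_orders VALUES (?, ?, ?)", rows)
    dwh.execute(
        "CREATE TABLE dds_orders AS "
        "SELECT order_id, amount_cents / 100.0 AS amount "
        "FROM ods_orders WHERE status = 'paid'"
    )
    return dwh


def dds_contents(dwh):
    """Read back the detailed data store so both variants can be compared."""
    return dwh.execute(
        "SELECT order_id, amount FROM dds_orders ORDER BY order_id"
    ).fetchall()
```

Both variants end with the same DDS contents. In the ELT variant the raw ODS copy stays inside the warehouse, so the transform can be rerun there without another extract from the source systems.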
ELT
Advanced Architecture – ELT
100ms
300ms
3-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
1 day
BI
Data
Mining
OLAP…
FE
BE
FE
BE
FE
BE
FE
BE
FE
BE
WAL Replication
3-5 minutes late
NAS NAS
Backup / Restore
3 days late
DWH
BI
Data
Mining
OLAP…
5-7 days
DBMS DBMS DBMS DBMS DBMS
ELT
Advanced Architecture – CDC
100ms
300ms
3-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
1 day
BI
Data
Mining
OLAP…
FE
BE
FE
BE
FE
BE
FE
BE
FE
BE
WAL Replication
3-5 minutes late
NAS NAS
Backup / Restore
3 days late
DWH
BI
Data
Mining
OLAP…
5-7 days
DBMS DBMS DBMS DBMS DBMS
DBMS DBMS DBMS…
ELT
DDS
Data Marts Reports
Aggregates
OLAP
ODS ODS ODS…
DBMS DBMS DBMS…
ELT
DDS
Data Marts Reports
Aggregates
OLAP
ODS ODS ODS…
CDC
1 day
1 hour
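CDC shortens the load window because only changed rows travel to the ODS. A minimal sketch of the apply side, assuming a change log of `(lsn, op, key, row)` tuples ordered by log sequence number (this record layout and the function name are hypothetical, chosen for illustration only):

```python
def apply_changes(replica, change_log, last_applied_lsn):
    """Apply log entries newer than last_applied_lsn and return the new position.

    replica is a dict keyed by primary key; change_log must be ordered by lsn.
    """
    for lsn, op, key, row in change_log:
        if lsn <= last_applied_lsn:
            continue  # already applied in an earlier cycle
        if op == "delete":
            replica.pop(key, None)
        else:
            # "insert" and "update" are both upserts on the replica side
            replica[key] = row
        last_applied_lsn = lsn
    return last_applied_lsn


# Hypothetical change log captured from an OLTP source
change_log = [
    (1, "insert", 10, {"name": "a"}),
    (2, "update", 10, {"name": "b"}),
    (3, "insert", 11, {"name": "c"}),
    (4, "delete", 11, None),
]

ods = {}
position = apply_changes(ods, change_log[:2], 0)    # first cycle
position = apply_changes(ods, change_log, position)  # next cycle, replays safely
```

Tracking the last applied position makes each cycle idempotent: replaying the same log does not change the replica.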
ELT CDC
Advanced Architecture – CDC
100ms
300ms
1-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
3-24 hrs
BI
Data
Mining
OLAP…
FE
BE
FE
BE
FE
BE
FE
BE
FE
BE
WAL Replication
3-5 minutes late
NAS NAS
Backup / Restore
3 days late
BI
Data
Mining
OLAP…
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
ELT CDC
Advanced Architecture – CDC
100ms
300ms
1-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
3-24 hrs
BI
Data
Mining
OLAP…
FE
BE
FE
BE
FE
BE
FE
BE
FE
BE
WAL Replication
3-5 minutes late
NAS NAS
Backup / Restore
3 days late
BI
Data
Mining
OLAP…
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
Why is our
secondary site’s
DWH so old?
ELT CDC
100ms
300ms
1-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
3-24 hrs
BI
Data
Mining
OLAP…
FE
BE
FE
BE
FE
BE
FE
BE
FE
BE
WAL Replication
3-5 minutes late
NAS NAS
Backup / Restore
3 days late
BI
Data
Mining
OLAP…
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
Moving Forward
ELT CDC
100ms
300ms
1-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
3-24 hrs
BI
Data
Mining
OLAP…
FE
BE
FE
BE
FE
BE
FE
BE
FE
BE
WAL Replication
3-5 minutes late
NAS NAS
Backup / Restore
3 days late
BI
Data
Mining
OLAP…
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
Our problems are
Moving Forward
ELT CDC
100ms
300ms
1-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
3-24 hrs
BI
Data
Mining
OLAP…
FE
BE
FE
BE
FE
BE
FE
BE
FE
BE
WAL Replication
3-5 minutes late
NAS NAS
Backup / Restore
3 days late
BI
Data
Mining
OLAP…
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
Our problems are
• Time to action takes up to 7 days
Moving Forward
ELT CDC
100ms
300ms
1-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
3-24 hrs
BI
Data
Mining
OLAP…
FE
BE
FE
BE
FE
BE
FE
BE
FE
BE
WAL Replication
3-5 minutes late
NAS NAS
Backup / Restore
3 days late
BI
Data
Mining
OLAP…
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
Our problems are
• Time to action takes up to 7 days
• Amount of data is growing
Moving Forward
ELT CDC
100ms
300ms
1-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
3-24 hrs
BI
Data
Mining
OLAP…
FE
BE
FE
BE
FE
BE
FE
BE
FE
BE
WAL Replication
3-5 minutes late
NAS NAS
Backup / Restore
3 days late
BI
Data
Mining
OLAP…
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
Our problems are
• Time to action takes up to 7 days
• Amount of data is growing
• DWH MPP storage is expensive
Moving Forward
ELT CDC
Modern Architectures
100ms
300ms
1-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
3-24 hrs
BI
Data
Mining
OLAP…
FE
BE
FE
BE
FE
BE
FE
BE
FE
BE
WAL Replication
3-5 minutes late
NAS NAS
Backup / Restore
3 days late
BI
Data
Mining
OLAP…
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
Our problems are
• Time to action takes up to 7 days
• Amount of data is growing
• DWH MPP storage is expensive
Data Lake
ELT CDC
Modern Architectures
100ms
300ms
1-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
3-24 hrs
BI
Data
Mining
OLAP…
FE
BE
FE
BE
FE
BE
FE
BE
FE
BE
WAL Replication
3-5 minutes late
NAS NAS
Backup / Restore
3 days late
BI
Data
Mining
OLAP…
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
Our problems are
• Time to action takes up to 7 days
• Amount of data is growing
• DWH MPP storage is expensive
Lambda
Data Lake
ELT CDC
Modern Architectures – Data Lake
100ms
300ms
1-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
3-24 hrs
BI
Data
Mining
OLAP…
FE
BE
FE
BE
FE
BE
FE
BE
FE
BE
WAL Replication
3-5 minutes late
NAS NAS
Backup / Restore
3 days late
BI
Data
Mining
OLAP…
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
Hadoop
DBMS DBMS DBMS…
ELT
DDS
OLAP Data Marts
Aggregates
Reports
ODS ODS ODS…
CDC
DWH
ODS UDS
Analytical Archives
BI
Data
Mining
OLAP
SQL-on-Hadoop
Data Mining
At Scale
ELT CDC
Modern Architectures – Data Lake
100ms
300ms
1-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
3-24 hrs
BI
Data
Mining
OLAP…
FE
BE
FE
BE
FE
BE
FE
BE
FE
BE
WAL Replication
3-5 minutes late
NAS NAS
Backup / Restore
3 days late
BI
Data
Mining
OLAP…
4-7 days
DBMS DBMS DBMS DBMS DBMS
CDC
DWH
ELT CDC
Modern Architectures – Data Lake
100ms
300ms
1-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
3-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
Data
Mining
BI OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
ELT CDC
Modern Architectures – Lambda
100ms
300ms
1-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
3-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
Data
Mining
BI OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
Source
Data
Speed Layer Batch Layer
Serving Layer
Query Query
Master Dataset
Batch
View
Batch
View
Batch
View
Real-time
View
Real-time
View
Real-time
View
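The Lambda diagram on this slide routes source data to a batch layer and a speed layer, and the serving layer answers queries from both kinds of view. A minimal sketch of that idea, assuming page-view counting as the workload (the event shape and function names are hypothetical, not taken from the talk):

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute the view from the full master dataset."""
    return Counter(event["page"] for event in master_dataset)


def realtime_view(recent_events):
    """Speed layer: incremental view over events the last batch run has not seen."""
    view = Counter()
    for event in recent_events:
        view[event["page"]] += 1
    return view


def query(page, batch, realtime):
    """Serving layer: merge the batch view and the real-time view at query time."""
    return batch[page] + realtime[page]


# Hypothetical events: the master dataset holds what the batch run covered,
# recent_events arrived after that run started.
master_dataset = [{"page": "/home"}, {"page": "/home"}, {"page": "/cart"}]
recent_events = [{"page": "/home"}]

batch = batch_view(master_dataset)
realtime = realtime_view(recent_events)
home_views = query("/home", batch, realtime)
```

Once the next batch run absorbs the recent events, the real-time view for that period can be discarded, which keeps the speed layer small.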
ELT CDC
Modern Architectures – Lambda
100ms
300ms
1-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
3-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
Data
Mining
BI OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
ELT CDC
Modern Architectures – Lambda
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
In-Memory
Data Store
ELT CDC
Modern Architectures
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Our problems are
In-Memory
Data Store
ELT CDC
Modern Architectures
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Our problems are
• Too many standby systems
In-Memory
Data Store
ELT CDC
Modern Architectures
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Our problems are
• Too many standby systems
• How to replicate a Hadoop cluster?
In-Memory
Data Store
ELT CDC
Modern Architectures
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Our problems are
• Too many standby systems
• How to replicate a Hadoop cluster?
• How to sync data in real-time systems?
In-Memory
Data Store
ELT CDC
Modern Architectures
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Our problems are
• Too many standby systems
• How to replicate a Hadoop cluster?
• How to sync data in real-time systems?
• How to better sync DWH?
In-Memory
Data Store
ELT CDC
Modern Architectures
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Our problems are
• Too many standby systems
• How to replicate a Hadoop cluster?
• How to sync data in real-time systems?
• How to better sync DWH?
Pipelining
In-Memory
Data Store
ELT CDC
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Modern Data Architecture – Pipelining
In-Memory
Data Store
ELT CDC
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Modern Data Architecture – Pipelining
FE
App
App
App
…HTTP
In-Memory
Data Store
ELT CDC
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Modern Data Architecture – Pipelining
FE
App
App
App
…HTTP
BE
Srv
Srv
Srv
…SOAP
In-Memory
Data Store
ELT CDC
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Modern Data Architecture – Pipelining
FE
App
App
App
…HTTP
BE
Srv
Srv
Srv
…SOAP
OLTP
SP
JDBC
Table
In-Memory
Data Store
ELT CDC
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Modern Data Architecture – Pipelining
FE
App
App
App
…HTTP
BE
Srv
Srv
Srv
…SOAP
OLTP
SP
JDBC
Log
Table
In-Memory
Data Store
ELT CDC
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Modern Data Architecture – Pipelining
FE
App
App
App
…HTTP
BE
Srv
Srv
Srv
…SOAP
OLTP
SP
JDBC
Log
Table
CDC
copy
Parse
Batch
In-Memory
Data Store
ELT CDC
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Modern Data Architecture – Pipelining
FE
App
App
App
…HTTP
BE
Srv
Srv
Srv
…SOAP
OLTP
SP
JDBC
Log
Table
CDC
copy
Parse
Batch
ETL
cp
Batch
ETL
In-Memory
Data Store
ELT CDC
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Modern Data Architecture – Pipelining
FE
App
App
App
…HTTP
BE
Srv
Srv
Srv
…SOAP
OLTP
SP
JDBC
Log
Table
CDC
copy
Parse
Batch
ETL
cp
Batch
ETL
load
ODS
DWH
In-Memory
Data Store
ELT CDC
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Modern Data Architecture – Pipelining
FE
App
App
App
…HTTP
BE
Srv
Srv
Srv
…SOAP
OLTP
SP
JDBC
Log
Table
CDC
copy
Parse
Batch
ETL
cp
Batch
ETL
load
ODS
DDS
DWH
In-Memory
Data Store
ELT CDC
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Modern Data Architecture – Pipelining
FE
App
App
App
…HTTP
BE
Srv
Srv
Srv
…SOAP
OLTP
SP
JDBC
Log
Table
CDC
copy
Parse
Batch
ETL
cp
Batch
ETL
load
ODS
DDS
DataMart
DWH
In-Memory
Data Store
ELT CDC
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Modern Data Architecture – Pipelining
FE
BI
App
App
App
…HTTP
BE
Srv
Srv
Srv
…SOAP
OLTP
SP
JDBC
Log
Table
CDC
copy
Parse
Batch
ETL
cp
Batch
ETL
load
ODS
DDS
DataMart
DWH
JDBC
In-Memory
Data Store
ELT CDC
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Modern Data Architecture – Pipelining
FE
BI
App
App
App
…HTTP
BE
Srv
Srv
Srv
…
OLTP
SP
JDBC
Log
Table
CDC
copy
Parse
Batch
ETL
cp
Batch
ETL
ODS
DDS
DataMart
DWH
JDBC
In-Memory
Data Store
ELT CDC
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Modern Data Architecture – Pipelining
FE
BI
App
App
App
…HTTP
BE
Srv
Srv
Srv
…
OLTP
SP
JDBC
Log
Table
CDC
copy
Parse
Batch
load
ODS
DDS
DataMart
DWH
JDBC
API
Queue ETL
ETLBatch
In-Memory
Data Store
ELT CDC
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Modern Data Architecture – Pipelining
FE
BI
App
App
App
…HTTP
BE
Srv
Srv
Srv
…
OLTP
SP
JDBC
Log
Table
CDC
copy
Parse
Batch
load
ODS
DDS
DataMart
DWH
JDBC
API
Queue ETL
ETLBatch
loadETL
In-Memory
Data Store
ELT CDC
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Modern Data Architecture – Pipelining
FE
BI
App
App
App
…HTTP
BE
Srv
Srv
Srv
…
OLTP
SP
JDBC
Log
Table
CDC
copy
Parse
Batch
load
ODS
DDS
DataMart
DWH
JDBC
API
Queue ETL
ETLBatchApp
ETLBatch
load
loadETL
In-Memory
Data Store
ELT CDC
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Modern Data Architecture – Pipelining
FE
BI
App
App
App
…HTTP
BE
Srv
Srv
Srv
…
OLTP
SP
JDBC
Log
Table
CDC
copy
Parse
Batch
load
ODS
DDS
DataMart
DWH
JDBC
API
Queue ETL
ETLBatchApp
ETLBatch
load
loadETL
STG
BatchApp
Hadoop
HDFS
SQL
On
Hadoop
In-Memory
Data Store
ELT CDC
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Modern Data Architecture – Pipelining
FE
BI
App
App
App
…HTTP
BE
Srv
Srv
Srv
…
OLTP
SP
JDBC
Log
Table
CDC
copy
Parse
Batch
load
ODS
DDS
DataMart
DWH
JDBC
API
Queue ETL
ETLBatchApp
ETLBatch
load
loadETL
STG
BatchApp
Hadoop
HDFS
SQL
On
Hadoop
RTI
App
In-Memory
Data Store
ELT CDC
100ms
300ms
0-4 days
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
0-24 hrs
OLAP
Data
Mining
BI…
FE
BE
FE
BE
FE
BE
NAS NAS
Backup / Restore
2 days late
OLAP…
3-6 days
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
CDC
DWHHadoop Hadoop
?
In-Memory
Data Store
RTDM BI
Data
Mining
Modern Data Architecture – Pipelining
FE
BI
App
App
App
…HTTP
BE
Srv
Srv
Srv
…
OLTP
SP
JDBC
Log
Table
CDC
copy
Parse
Batch
load
ODS
DDS
DataMart
DWH
JDBC
API
Queue ETL
ETLBatchApp
ETLBatch
load
loadETL
STG
BatchApp
Hadoop
HDFS
SQL
On
Hadoop
RTI
AppReplicate
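The pipelining slides replace the chain of copy, parse and batch load steps with a queue that the back end publishes to and ETL stages consume from. A minimal single-process sketch of that pattern, assuming Python's standard `queue` and `threading` modules as a stand-in for a real message broker (the record format and stage functions are hypothetical):

```python
import queue
import threading

SENTINEL = object()  # marks end of stream so each stage can shut down


def stage(transform, inbox, outbox):
    """Run one pipeline stage: read from inbox, transform, write to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown downstream
            return
        outbox.put(transform(item))


def parse(line):
    """Parse a raw 'user,amount' line into a record."""
    user, amount = line.split(",")
    return {"user": user, "amount": amount}


def convert(record):
    """ETL step: cast the amount to an integer."""
    return {"user": record["user"], "amount": int(record["amount"])}


def run_pipeline(raw_lines):
    """Push raw lines through parse and convert stages and collect the output."""
    source_q, parsed_q, out_q = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=stage, args=(parse, source_q, parsed_q)),
        threading.Thread(target=stage, args=(convert, parsed_q, out_q)),
    ]
    for worker in workers:
        worker.start()
    for line in raw_lines:
        source_q.put(line)
    source_q.put(SENTINEL)
    for worker in workers:
        worker.join()
    loaded = []
    while True:
        item = out_q.get()
        if item is SENTINEL:
            break
        loaded.append(item)
    return loaded
```

With one worker per stage the records keep their order. Each record moves on as soon as the previous stage finishes with it, rather than waiting for a whole batch to complete.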
ELT CDC
FE
BE
DBMS DBMS
FE
BE
DBMS
FE
BE
ELT
DWH
OLAP
Data
Mining
RTBI…
FE
BE
FE
BE
FE
BE
CDC
Hadoop
In-Memory
Data Store
BI
Modern Data Architecture – Pipelining
Replication Queue
3-5 minutes late
In-Memory
Data Store
OLAP…
DWH Hadoop
BI
Data
Mining
RTBI
DBMS DBMS DBMS
WAL Replication
3-5 minutes late
Pivotal and Modern Data Architecture
BI
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Pivotal GemFire
App
Spring XD
Streaming
Streaming
Data
Pivotal HD
Pivotal
HAWQ
ES
DDS
DataMart
Pivotal
Greenplum
Data
MartPostgreSQL
SP
Table
ODS
ETL
ETL
Pivotal and Modern Data Architecture
BI
HTTP
Pivotal GemFire
App
Spring XD
Streaming
Streaming
Data
Pivotal HD
Pivotal
HAWQ
ES
DDS
DataMart
Pivotal
Greenplum
Data
MartPostgreSQL
SP
Table
ODS
ETL
ETL
Pivotal Cloud Foundry
FE
…
App
App
App
Queue BE
…
App
App
App
• Pivotal Labs – agile software development for next-generation applications
• Pivotal Cloud Foundry – PaaS for customer applications
• RabbitMQ – distributed message queue service on top of PCF
• Spring IO – foundation platform for modern applications
Pivotal and Modern Data Architecture
BI
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Spring XD
Streaming
Streaming
Data
Pivotal HD
Pivotal
HAWQ
ES
DDS
DataMart
Pivotal
Greenplum
Data
MartPostgreSQL
SP
Table
ODS
ETL
ETL
Pivotal GemFire
App
• Pivotal GemFire and Apache Geode (incubating) – in-memory data grid enabling real-time data processing and real-time decision making for enterprises
Pivotal and Modern Data Architecture
BI
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Pivotal GemFire
App
Streaming
Data
Pivotal HD
Pivotal
HAWQ
ES
DDS
DataMart
Pivotal
Greenplum
Data
MartPostgreSQL
SP
Table
ODS
ETL
ETL
Spring XD
Streaming
• Spring XD – unified, distributed and extensible framework for data pipelining: ingesting, batching, processing and exporting
Pivotal and Modern Data Architecture
BI
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Pivotal GemFire
App
Spring XD
Streaming
ES
DDS
DataMart
Pivotal
Greenplum
PostgreSQL
SP
Table
ODS
ETL
ETL
Streaming
Data
Pivotal HD
Pivotal
HAWQ
Data
Mart
• Pivotal HD – leading Hadoop distribution based on ODP
• Pivotal HAWQ and Apache HAWQ (incubating) – bringing the power of MPP to the Hadoop cluster, best in class SQL-on-Hadoop solution
• Apache Spark – component of the Pivotal HD distribution, modern framework for distributed data processing
Pivotal and Modern Data Architecture
BI
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Pivotal GemFire
App
Spring XD
Streaming
Streaming
Data
Pivotal HD
Pivotal
HAWQ
ES
DDS
DataMart
Pivotal
Greenplum
Data
Mart
ODS
ETL
ETL
PostgreSQL
SP
Table
• Pivotal PostgreSQL – open source distribution of PostgreSQL, commercially supported by Pivotal
Pivotal and Modern Data Architecture
BI
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Pivotal GemFire
App
Spring XD
Streaming
Streaming
Data
Pivotal HD
Pivotal
HAWQ
Data
MartPostgreSQL
SP
Table
ETL
ETL
ES
DDS
DataMart
Pivotal
Greenplum
ODS
• Pivotal Greenplum – leading analytical MPP database, foundation for enterprise data warehousing systems and advanced analytics
Pivotal and Modern Data Architecture
Pivotal GemFire
App
Spring XD
Streaming
BI
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Streaming
Data
Pivotal HD
Pivotal
HAWQ
ES
DDS
DataMart
Pivotal
Greenplum
Data
MartPostgreSQL
SP
Table
ODS
ETL
ETL
Data Lake
Pivotal and Modern Data Architecture
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Spring XD
Streaming
ES
DDS
DataMart
Pivotal
Greenplum
PostgreSQL
SP
Table
ODS
ETL
ETL
Pivotal GemFire
App
Streaming
Data
Pivotal HD
Pivotal
HAWQ
Data
Mart
BI
Lambda Architecture
Pivotal and Modern Data Architecture
ES
DDS
DataMart
Pivotal
Greenplum
PostgreSQL
SP
Table
ODS
ETL
ETL
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Streaming
Pivotal HD
BI
Pivotal GemFire
App
Spring XD
Streaming
Data
Pivotal
HAWQ
Data
Mart
Pipelining
98Pivotal Confidential–Internal Use Only
Pivotal and Modern Data Architecture
BI
Pivotal Cloud Foundry
HTTP
FE
…
App
App
App
Queue BE
…
App
App
App
Pivotal GemFire
App
Spring XD
Streaming
Streaming
Data
Pivotal HD
Pivotal
HAWQ
ES
DDS
DataMart
Pivotal
Greenplum
Data
Mart
PostgreSQL
SP
Table
ODS
ETL
ETL
Questions?
BUILT FOR THE SPEED OF BUSINESS
