Evolution of Data Architectures:
From Hadoop to Data Lake
in becoming Data Driven
Alexandre Vasseur, Pivotal
@PivotalFrance
© Copyright 2015 Pivotal. All rights reserved.
If you have one thing to do
Store Massive Data Sets
Achieve Continuous Innovation at Scale
Becoming Data Driven with Apps
Data Driven Apps
AGILE DEV & DATA SCIENCE: MODERN, COLLABORATIVE
APP & DEV PLATFORM: MODERN, CLOUD-ORIENTED & OPEN
DATA FABRIC: MODERN, CLOUD-ORIENTED & OPEN
The Big Data Problem
Fragmentation, Constraints, Complexity
Pivotal + Hortonworks Alliance
•  Started July 2014 around Ambari collaboration
•  Announcing Pivotal Big Data Suite
on Hortonworks Data Platform
•  Advanced support backed by Hortonworks’ world-leading
support services
•  Joint engineering efforts and enhanced Pivotal HD
ODP - Standardize Hadoop Ecosystem
•  Deliver ODP Core to build a versioned, packaged,
tested set of Hadoop components
•  Focus on developing a platform, rather than projects
•  Initial scope: Apache Hadoop
HDFS / MapReduce / YARN / Ambari
Remove vendor lock-in
Ecosystem effect
Shorter innovation cycles
http://opendataplatform.org
…
Open Sourced but not just Hadoop
•  Open sourcing all Pivotal Big Data Suite components
–  Pivotal GemFire - premium in-memory NoSQL database
–  Pivotal HAWQ - world’s leading SQL-compliant enterprise
SQL-on-Hadoop engine
–  Pivotal Greenplum Database - advanced enterprise MPP
analytic database with Hadoop interconnect
–  SpringXD - unified, distributed, and extensible system for
data-driven application development
HAWQ SQL on Hadoop
PROVEN AT SCALE
PRODUCTIVE
NATIVE on HADOOP / ODP
OPEN & EXTENSIBLE
HAWQ SQL on Hadoop
10+ years R&D in Massively Parallel SQL
SQL engine for petabyte-scale analytics in the world’s largest industries
Mature cost-based query optimizer
Full SQL semantics
Rich ecosystem of ELT/dataviz/BI & partners
PL/*, built-in analytics, R native framing
All Hadoop formats (gz, Parquet, HAWQ, etc.)
DataNode short-circuit reads (co-located, not MapReduce-based)
Predicate pushdown to Hive and HBase
HAWQ PXF: query federation to NoSQL, databases, etc.
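Predicate pushdown means evaluating a query's filter at the data source (such as Hive or HBase) so that only matching rows are shipped to the SQL engine. The sketch below contrasts the two strategies under stated assumptions: `ExternalStore` is a hypothetical in-process stand-in for a remote source, and none of this is the PXF API.

```python
"""Minimal sketch of predicate pushdown vs. filtering after a full scan.

Assumption: ExternalStore is a hypothetical stand-in for a remote source
such as Hive or HBase; it is not the PXF API.
"""
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


@dataclass
class ExternalStore:
    """Hypothetical remote store that counts how many rows it ships."""
    rows: List[Row]
    rows_shipped: int = 0

    def scan(self, predicate: Optional[Predicate] = None) -> Iterator[Row]:
        for row in self.rows:
            # With pushdown the filter runs here, at the source, so
            # non-matching rows are never shipped to the engine.
            if predicate is None or predicate(row):
                self.rows_shipped += 1
                yield row


def query_without_pushdown(store: ExternalStore, predicate: Predicate) -> List[Row]:
    # Fetch every row, then filter inside the SQL engine.
    return [row for row in store.scan() if predicate(row)]


def query_with_pushdown(store: ExternalStore, predicate: Predicate) -> List[Row]:
    # Hand the filter to the source; only matching rows are shipped.
    return list(store.scan(predicate))


if __name__ == "__main__":
    data = [{"id": i, "country": "FR" if i % 4 == 0 else "US"} for i in range(1000)]
    is_fr: Predicate = lambda row: row["country"] == "FR"

    naive = ExternalStore(data)
    pushed = ExternalStore(data)
    assert query_without_pushdown(naive, is_fr) == query_with_pushdown(pushed, is_fr)
    print(naive.rows_shipped, pushed.rows_shipped)  # 1000 250
```

Both strategies return the same rows; the difference is how much data crosses the wire, which is what makes pushdown worthwhile on large external tables.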
SpringXD
Data from anywhere, to anywhere
Real time & batch
Ingest + analytics + job orchestration
Developer friendly
Built-in connectors
With / without Spark
DSL
Your choice of Hadoop
Your choice of messaging
Standalone, YARN & outside Hadoop
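SpringXD streams are written in a Unix-pipe-style DSL: a source, optional processors, and a sink chained with `|`. The toy sketch below only illustrates that composition idea with lazily chained Python generators; the module names (`numbers`, `double`, `evens`, `collect`) and the registry are hypothetical and are not SpringXD's module catalog or runtime.

```python
"""Toy pipe-style stream definition: 'source | processor* | sink'.

Assumption: the module names and registries below are hypothetical;
this is not SpringXD's implementation.
"""
from typing import Callable, Dict, Iterable, Iterator, List

Source = Callable[[], Iterator[int]]
Processor = Callable[[Iterable[int]], Iterator[int]]
Sink = Callable[[Iterable[int]], List[int]]

SOURCES: Dict[str, Source] = {
    "numbers": lambda: iter(range(1, 6)),
}
PROCESSORS: Dict[str, Processor] = {
    "double": lambda items: (x * 2 for x in items),
    "evens": lambda items: (x for x in items if x % 2 == 0),
}
SINKS: Dict[str, Sink] = {
    "collect": lambda items: list(items),
}


def run_stream(definition: str) -> List[int]:
    """Parse 'source | processor* | sink' and run it end to end."""
    names = [part.strip() for part in definition.split("|")]
    if len(names) < 2:
        raise ValueError("a stream needs at least a source and a sink")
    source, *processors, sink = names
    stream: Iterable[int] = SOURCES[source]()
    for name in processors:
        # Each processor lazily wraps the previous stage.
        stream = PROCESSORS[name](stream)
    return SINKS[sink](stream)


if __name__ == "__main__":
    print(run_stream("numbers | double | collect"))  # [2, 4, 6, 8, 10]
```

Because each stage is a generator, nothing is computed until the sink pulls data, which mirrors how a stream only flows once it is deployed end to end.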
Simplify Data Driven Applications
•  PaaS with NoSQL & Big Data choices built-in
•  Emergence of vertical services: Mobile, IoT, …
Data-centric runtimes built in
Java/PHP/Node.js/Ruby
Python
R/Shiny
Scala
SpringXD
Large choice of data services
DB, clustered MySQL, etc.
Memcache, Redis, etc.
GemFire, Cassandra, etc.
Hadoop, Greenplum, etc.
Can run virtualized inside PaaS
Can run multi-tenant alongside PaaS
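On a Cloud Foundry-style PaaS, a data service bound to an app is typically exposed to it as JSON in the `VCAP_SERVICES` environment variable, keyed by service label. A minimal sketch of reading those credentials follows; the `p-redis` label and the credential fields are illustrative assumptions, since the actual values depend on the service broker.

```python
"""Read data-service credentials bound to an app on a Cloud Foundry-style PaaS.

Assumption: the "p-redis" label and credential fields are illustrative;
real values depend on the service broker.
"""
import json
import os
from typing import Any, Dict, Mapping, Optional


def find_credentials(label: str, env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Return the credentials of the first bound instance of `label`."""
    env = os.environ if env is None else env
    services = json.loads(env.get("VCAP_SERVICES", "{}"))
    instances = services.get(label, [])
    if not instances:
        raise LookupError(f"no bound service with label {label!r}")
    return instances[0].get("credentials", {})


if __name__ == "__main__":
    # Hypothetical binding, shaped the way a PaaS might inject it.
    fake_env = {
        "VCAP_SERVICES": json.dumps({
            "p-redis": [{
                "name": "session-cache",
                "credentials": {"host": "10.0.0.5", "port": 6379, "password": "secret"},
            }]
        })
    }
    creds = find_credentials("p-redis", fake_env)
    print(creds["host"], creds["port"])  # 10.0.0.5 6379
```

Keeping connection details in the environment rather than in code is what lets the same app move between data services without a rebuild.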
DEMO
PHD (or any ODP Core-based Hadoop distribution), Pivotal Big Data Suite
[Architecture diagram; recoverable labels below]
Stores: HDFS, HBase, Hive, HAWQ (SQL on Hadoop), GreenplumDB (analytics DW), GemFire (JSON/object in-memory data grid), Redis (key-value store)
Processing and messaging: SpringXD (stream processing/scoring), Spark, RabbitMQ
Integration paths: PXF (filtered pushdown), direct store, federated, GPHDFS, write-behind persistence
Platform: Cloud Foundry data services
Consumers: analytic apps, online apps
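The demo's "Write behind" / "Persistence" labels, shown alongside GemFire, presumably refer to the write-behind pattern: writes land in memory first and are persisted to a slower store later, in batches. A minimal single-threaded sketch of that pattern, assuming a hypothetical `BackingStore` in place of HDFS or a database; it is not GemFire's API, and a real implementation would also flush on a timer from a background thread rather than only when a batch fills.

```python
"""Minimal write-behind cache sketch: writes hit memory first and are
persisted to a slower backing store later, in batches.

Assumption: generic illustration of the pattern, not GemFire's API;
BackingStore is a hypothetical stand-in for HDFS or a database.
"""
from collections import OrderedDict
from typing import Dict, List, Tuple


class BackingStore:
    """Hypothetical durable store that records each batch it receives."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.batches: List[List[Tuple[str, str]]] = []

    def write_batch(self, batch: List[Tuple[str, str]]) -> None:
        self.batches.append(batch)
        self.data.update(batch)


class WriteBehindCache:
    def __init__(self, store: BackingStore, batch_size: int = 3) -> None:
        self.store = store
        self.batch_size = batch_size
        self.memory: Dict[str, str] = {}
        # Pending writes; re-putting a key keeps only its latest value
        # (conflation), so the store sees one write per key per batch.
        self.pending: "OrderedDict[str, str]" = OrderedDict()

    def put(self, key: str, value: str) -> None:
        self.memory[key] = value      # readers see the new value immediately
        self.pending.pop(key, None)   # conflate with any earlier pending write
        self.pending[key] = value
        if len(self.pending) >= self.batch_size:
            self.flush()

    def get(self, key: str) -> str:
        return self.memory[key]

    def flush(self) -> None:
        if self.pending:
            self.store.write_batch(list(self.pending.items()))
            self.pending.clear()


if __name__ == "__main__":
    store = BackingStore()
    cache = WriteBehindCache(store, batch_size=3)
    cache.put("a", "1")
    cache.put("b", "2")
    cache.put("a", "3")   # conflated: still two pending keys
    print(cache.get("a"), store.data)  # 3 {}
    cache.put("c", "4")   # third distinct key triggers a batch
    print(store.batches)  # [[('b', '2'), ('a', '3'), ('c', '4')]]
```

The trade-off is latency versus durability: online apps get in-memory write speed, while anything still pending is lost if the process dies before a flush.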
The New Data Imperatives
Converged Data & Cloud
Open
Data-Driven Apps
A NEW PLATFORM FOR A NEW ERA
Meet us at the booth!
Come and do a “HAWQ in 2 min” lab
Win a pair of Beats Solo2 headphones!

Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven. - NoSQL matters Paris 2015
