Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven. - NoSQL matters Paris 2015

Evolution of Data Architectures:
From Hadoop to Data Lake in becoming Data Driven
Alexandre Vasseur, Pivotal
@PivotalFrance

© Copyright 2015 Pivotal. All rights reserved.
If you have one thing to do
•  Store massive data sets
•  Achieve continuous innovation at scale
•  Become data driven with apps

Data Driven Apps
•  Agile dev & data science: modern, collaborative
•  App & dev platform: modern, cloud-oriented & open
•  Data fabric: modern, cloud-oriented & open

The Big Data Problem
•  Fragmentation
•  Constraints
•  Complexity

Pivotal + Hortonworks Alliance
•  Started July 2014 around Ambari collaboration
•  Announcing Pivotal Big Data Suite
on Hortonworks Data Platform
•  Advanced support from Hortonworks’ world-leading support services
•  Joint engineering efforts and enhanced Pivotal HD

ODP - Standardize Hadoop Ecosystem
•  Deliver ODP Core: a versioned, packaged, and tested set of Hadoop components
•  Focus on developing a platform, rather than projects
•  Initial scope: Apache Hadoop (HDFS, MapReduce, YARN) and Apache Ambari
Remove vendor lock-in
Ecosystem effect
Shorter innovation cycles
http://opendataplatform.org
…

Open Sourced but not just Hadoop
•  Open sourcing all Pivotal Big Data Suite components
–  Pivotal GemFire - premium in-memory NoSQL database
–  Pivotal HAWQ - world’s leading SQL-compliant, enterprise-grade SQL on Hadoop engine
–  Pivotal Greenplum Database - advanced enterprise MPP analytic database with Hadoop interconnect
–  SpringXD - unified, distributed, and extensible system for data-driven application development

HAWQ SQL on Hadoop
•  Proven at scale
•  Productive
•  Native on Hadoop / ODP
•  Open & extensible

HAWQ SQL on Hadoop
10+ years R&D in Massively Parallel SQL
SQL engine for petabyte-scale analytics in the world’s largest industries
Mature cost based query optimizer
Full SQL semantics
Rich ecosystem of ELT/dataviz/BI & partners
PL/* procedural languages, built-in analytics, R native framing
All Hadoop formats (gz, Parquet, HAWQ, etc.)
DataNode short-circuit reads (co-located, not MapReduce-based)
Predicate pushdown to Hive and HBase
HAWQ PXF: query federation to NoSQL stores, databases, etc.
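Predicate pushdown and federated access both rest on the same idea: evaluate the filter where the data lives, so fewer rows travel to the SQL engine. The Python sketch below models only that idea. `ExternalTable`, `query_without_pushdown`, and `query_with_pushdown` are hypothetical names standing in for a Hive or HBase source; this is not the PXF API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


@dataclass
class ExternalTable:
    """Hypothetical external source (a stand-in for a Hive or HBase table).

    rows_shipped counts how many rows leave the source for the SQL engine.
    """
    rows: List[Row]
    rows_shipped: int = 0

    def scan(self, predicate: Optional[Predicate] = None) -> Iterator[Row]:
        for row in self.rows:
            # With pushdown, the source evaluates the filter before shipping a row.
            if predicate is None or predicate(row):
                self.rows_shipped += 1
                yield row


def query_without_pushdown(table: ExternalTable, predicate: Predicate) -> List[Row]:
    # The engine pulls every row, then filters on its own side.
    return [row for row in table.scan() if predicate(row)]


def query_with_pushdown(table: ExternalTable, predicate: Predicate) -> List[Row]:
    # The engine hands the predicate to the source; only matching rows are shipped.
    return list(table.scan(predicate))


def make_table() -> ExternalTable:
    # 1,000 synthetic rows, one in four tagged "FR".
    return ExternalTable([{"id": i, "country": "FR" if i % 4 == 0 else "US"} for i in range(1000)])


is_fr: Predicate = lambda row: row["country"] == "FR"
full, pushed = make_table(), make_table()
assert query_without_pushdown(full, is_fr) == query_with_pushdown(pushed, is_fr)
print(full.rows_shipped, pushed.rows_shipped)  # 1000 250
```

Both queries return the same answer. Only the amount of data crossing the boundary differs, and that difference is what pushdown is meant to save.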

SpringXD
Data from anywhere, to anywhere
Real time & batch
Ingest + analytics + job orchestration
Developer friendly
Built-in connectors
With or without Spark
DSL
Your choice of Hadoop
Your choice of messaging
Standalone, YARN & outside Hadoop
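SpringXD defines a stream with a pipe-style DSL that chains a source, optional processors, and a sink with `|` (for example `time | log`). As a rough analogy only, the Python sketch below chains generator functions in the same order. `numbers_source`, `score`, `only_high`, and `run_stream` are hypothetical names, not SpringXD modules.

```python
from typing import Callable, Iterable, Iterator, List

Processor = Callable[[Iterable[dict]], Iterator[dict]]


def numbers_source(n: int) -> Iterator[dict]:
    """Hypothetical source: emits n simple events."""
    for i in range(n):
        yield {"id": i, "amount": i * 10}


def score(events: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical processor: attaches a naive score to each event."""
    for event in events:
        yield {**event, "score": 1.0 if event["amount"] >= 50 else 0.0}


def only_high(events: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical filter processor: keeps events whose score is 1.0."""
    return (e for e in events if e["score"] == 1.0)


def run_stream(source: Iterable[dict], *processors: Processor) -> List[dict]:
    """Chain stages like 'source | p1 | p2 | sink'; here the sink just collects a list."""
    stream: Iterable[dict] = source
    for processor in processors:
        stream = processor(stream)
    return list(stream)


result = run_stream(numbers_source(8), score, only_high)
print([e["id"] for e in result])  # [5, 6, 7]
```

Because each stage is a generator, events move through the chain one at a time instead of being materialized between stages. Only the final sink builds a list.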

Simplify Data Driven Applications
•  PaaS with NoSQL & Big Data choices built-in
•  Emergence of vertical services: Mobile, IoT, …
Data-centric runtimes built in:
–  Java, PHP, Node.js, Ruby
–  Python
–  R/Shiny
–  Scala
–  SpringXD
Large choice of data services:
–  Databases (clustered MySQL, etc.)
–  Memcache, Redis, etc.
–  GemFire, Cassandra, etc.
–  Hadoop, Greenplum, etc.
Can run virtualized inside the PaaS
Can run as multi-tenant services alongside the PaaS

DEMO
Demo architecture (diagram): Pivotal Big Data Suite
•  PHD (or any ODP Core-based Hadoop distribution): HDFS, HBase, Hive
•  HAWQ (SQL on Hadoop), reaching HBase and Hive through PXF (filtered pushdown)
•  Greenplum DB (analytics DW)
•  GemFire (JSON/object in-memory data grid)
•  Redis (key-value store)
•  RabbitMQ
•  SpringXD (stream processing/scoring)
•  Spark
•  Cloud Foundry Data Services
•  Data paths shown: direct store, federated, GPHDFS, write-behind persistence
•  Consumers: analytic apps, online apps
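The demo diagram labels one data path "write-behind persistence". This pattern is commonly used to place an in-memory grid such as GemFire in front of a slower durable store. Writes are acknowledged from memory, and persistence happens later in batches. The Python sketch below illustrates that pattern only. `BackingStore` and `WriteBehindCache` are hypothetical names, not the GemFire API, and the explicit `flush()` stands in for the timer- or queue-size-driven asynchronous flushing a real grid would perform.

```python
from typing import Dict, List, Optional, Tuple


class BackingStore:
    """Hypothetical slow, durable store (a stand-in for HDFS or a database)."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.batches_written = 0

    def write_batch(self, batch: List[Tuple[str, str]]) -> None:
        for key, value in batch:
            self.data[key] = value
        self.batches_written += 1


class WriteBehindCache:
    """In-memory map whose writes reach the backing store later, in batches."""

    def __init__(self, store: BackingStore, batch_size: int = 100) -> None:
        self.store = store
        self.batch_size = batch_size
        self.memory: Dict[str, str] = {}
        self.pending: List[Tuple[str, str]] = []

    def put(self, key: str, value: str) -> None:
        # The write is visible in memory immediately; persistence is deferred.
        self.memory[key] = value
        self.pending.append((key, value))

    def get(self, key: str) -> Optional[str]:
        # Reads are served from memory, never from the backing store.
        return self.memory.get(key)

    def flush(self) -> None:
        # Assumption: a real grid would trigger this on a timer or a full queue.
        while self.pending:
            batch = self.pending[: self.batch_size]
            self.pending = self.pending[self.batch_size :]
            self.store.write_batch(batch)


store = BackingStore()
cache = WriteBehindCache(store, batch_size=2)
for i in range(5):
    cache.put(f"user:{i}", f"name-{i}")
print(cache.get("user:3"), len(store.data))  # name-3 0
cache.flush()
print(len(store.data), store.batches_written)  # 5 3
```

The trade-off is latency against durability. Writers never wait on the slow store, but anything still in `pending` is lost if the process dies before a flush. Real grids mitigate this risk with redundant queues.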

The New Data Imperatives
•  Converged data & cloud
•  Open
•  Data-driven apps

A NEW PLATFORM FOR A NEW ERA
Meet us at the booth!
Come try a “HAWQ in 2 min” lab.
Win Beats Solo2 headphones!
