10 Amazing Things To Do With a Hadoop-Based Data Lake

32,119 views

Greg Chase, Director, Product Marketing, presents "Big Data: 10 Amazing Things To Do With a Hadoop-Based Data Lake" at the Strata Conference + Hadoop World 2014 in NYC.

Published in: Technology

10 Amazing Things To Do With a Hadoop-Based Data Lake

  1. 10 Amazing Things To Do With a Hadoop-Based Data Lake. Strata Conference New York 2014. Greg Chase, Director, Product Marketing, Pivotal Software. © 2014 Pivotal Software, Inc. All rights reserved.
  2. Pivotal Business Data Lake Architecture. Tiers: Sources, Ingestion Tier, Distillation Tier, Processing Tier, Insights Tier, Action Tier, Unified Operations Tier. Components shown: Command Center, Spring XD, Oozie, GemFire XD, HAWQ/Greenplum, Pivotal HD (unstructured and structured data), Sqoop, Flume, HBase, MapReduce, Hive, Pig, query interfaces, GemFire, RabbitMQ, Redis, Pivotal CF. Sources: clickstream, sensor data, weblogs, network data, CRM data, ERP data.
  3. Pivotal Business Data Lake Architecture (same diagram as slide 2).
  4. 1. Store Massive Data Sets. Rack 1, Rack 2, Rack 3, … Rack n. Scale-out: use commodity hardware and storage.
  5. 2. Mix Disparate Data Sources. Sensor data, CRM data, website click streams. Schema flexibility: absorb different data types from data sources.
  6. Pivotal Business Data Lake Architecture (same diagram as slide 2).
  7. 3. Ingest Bulk Data. Scalable open source tools for batch loading data. Microbatch; batch. Flume: event driven; any source. Spring XD: bulk load; with processing; with analytics; any source. Sqoop: bulk load; RDBMS.
  8. 4. Ingest High-Velocity Data. Capture all volatile data. Apply structure. Streaming data. Spring XD: bulk load; real-time ingest; with processing; with analytics; any source. Pivotal GemFire XD: advanced DB operations; consistency; reliable persistence; convert to structured.
  9. Pivotal Business Data Lake Architecture (same diagram as slide 2).
  10. 5. Apply Structure to Unstructured / Semi-Structured Data. Flexible processing of different data types.
  11. 6. Make Data Available for MPP SQL Analysis. Fast processing for advanced analytics in many supported HDFS formats. Hadoop cluster diagram: Name Node, Resource Manager, HAWQ Master; four Data Nodes, each shown with a Node Manager and HAWQ Segment(s).
  12. 7. Achieve Data Integration. Create multi-dimensional analytical models.
  13. 8. Improve Machine Learning & Predictive Analytics. Richer, deeper data sets for accurate predictive analytics. Diagram: HAWQ Master, HAWQ Segment(s).
  14. 9. Deploy Real-Time Automation at Scale. Respond in real time, at scale. Archive history in Hadoop. Diagram: web apps, Pivotal GemFire XD (in-memory).
  15. 10. Achieve Continuous Innovation at Scale. Capture and store all data. Analyze to discover insights & algorithms. Deploy automation at scale. Diagram: sensor data, CRM data, website click streams; HAWQ Master and HAWQ Segment(s); in-memory; web apps.
  16. Increase Value Derived from Data With a Data Lake. Business value: store massive data sets; mix disparate data; ingest bulk data; ingest high-velocity data; apply structure; enable MPP analysis; achieve data integration; improve predictive analytics; deploy real-time automation at scale; achieve continuous innovation.
  17. For more information on Pivotal Big Data Suite, visit Pivotal.io/big-data.
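
The slides on mixing disparate sources, micro-batch ingestion, and applying structure to semi-structured data can be illustrated with a minimal Python sketch. It is not taken from the deck and does not use any Pivotal, Flume, Sqoop, or Spring XD API. The three raw record formats, the field names, and the helper functions `parse_record` and `microbatches` are all hypothetical, chosen only to show raw records being kept as-is and given a common structure when read.

```python
import json
from itertools import islice


def parse_record(source, raw):
    """Apply structure to one raw record, based on a hypothetical source tag."""
    if source == "sensor":
        # Assumed format: "device_id,timestamp,reading"
        device_id, ts, reading = raw.split(",")
        return {"source": source, "key": device_id, "ts": int(ts), "value": float(reading)}
    if source == "crm":
        # Assumed format: JSON object with "customer_id", "updated", "segment"
        obj = json.loads(raw)
        return {"source": source, "key": obj["customer_id"],
                "ts": int(obj["updated"]), "value": obj.get("segment")}
    if source == "clickstream":
        # Assumed format: "timestamp session_id url"
        ts, session_id, url = raw.split(" ", 2)
        return {"source": source, "key": session_id, "ts": int(ts), "value": url}
    raise ValueError(f"unknown source: {source}")


def microbatches(records, size):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Raw records stored untouched, each tagged with where it came from (made-up data).
raw_lake = [
    ("sensor", "dev-7,1413800000,21.5"),
    ("crm", '{"customer_id": "c-42", "updated": 1413800100, "segment": "gold"}'),
    ("clickstream", "1413800200 s-9 /products/42"),
]

# Structure is applied at read time, then records are grouped into micro-batches.
structured = [parse_record(source, raw) for source, raw in raw_lake]
batches = list(microbatches(structured, 2))
```

In a real deployment the parsing and batching would be handled by the ingestion tools the slides name; the sketch only shows the shape of the idea.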
