A NEW PLATFORM FOR A NEW ERA
© Copyright 2013 Pivotal. All rights reserved.
Big & Fast Data
Real-world architecture blueprints
…for building infinitely scalable systems
Frederico Melo
fmelo@gopivotal.com
Agenda
About Pivotal
Building infinitely scalable systems
Big + Fast Data
Pivotal Platform
Real world use-cases
Pivotal Platform
Cloud Storage
Virtualization
Data & Analytics Platform
Cloud Application Platform
Data-Driven Application Development
Pivotal Data Science Labs
Building infinitely scalable systems
What is scalability?
Scalability: how a system behaves (scales) as we incrementally add volume or load, increasing its processing power
Scalable: a system that handles increases in volume or load by increasing its throughput
Linear scalability: throughput increases at the same rate as the load (twice the requests coming in, twice the throughput), while keeping the same response time per transaction
Scalability limit: the point at which a system stops scaling as we add more load: we have a bottleneck!
Vertical vs. horizontal scalability
Scale up vs. scale out
Usual computer system
[Diagram: a single server with CPUs, main memory (RAM), an internal disk, and a NIC, connected through a network router and firewall to external storage]
What could prevent it from scaling out?
[Diagram: the same single server: CPUs, main memory (RAM), internal disk, NIC, network router, firewall, external storage]
What could prevent it from scaling out?
[Diagram: the same server, annotated with its I/O paths: disk I/O, memory I/O, network I/O, and external-device I/O]
Typical latencies
[Chart: typical latency figures]
Minimizing Disk I/O
Maximize disk speed: ultra-fast disks, SSDs
Parallelize disk I/O: write to multiple files/disks at once
Get rid of updates: avoids disk seeks, although there's still I/O
Minimize inserts: do only batch inserts
Asynchronous writes: remove disk I/O from the transaction's critical path
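The last two bullets combine naturally into a write-behind pattern. This is a minimal sketch (the class name and batch size are illustrative, not from any particular product): the transaction path only enqueues in memory, while a background thread batches records and flushes them to disk.

```python
import os
import queue
import tempfile
import threading

class AsyncWriter:
    """Write-behind sketch: callers touch memory only; disk I/O is batched."""

    def __init__(self, path, batch_size=100):
        self.q = queue.Queue()
        self.path = path
        self.batch_size = batch_size
        self.t = threading.Thread(target=self._drain, daemon=True)
        self.t.start()

    def write(self, record):
        # Critical path: enqueue in memory, no disk access here.
        self.q.put(record)

    def close(self):
        # Sentinel tells the drain thread to flush the tail and stop.
        self.q.put(None)
        self.t.join()

    def _drain(self):
        # Background path: accumulate a batch, then do one disk write.
        with open(self.path, "a") as f:
            batch = []
            while True:
                item = self.q.get()
                if item is None:
                    f.writelines(batch)
                    return
                batch.append(item + "\n")
                if len(batch) >= self.batch_size:
                    f.writelines(batch)
                    f.flush()
                    batch.clear()

# Usage: 25 writes return immediately; disk sees at most a few batched flushes.
path = os.path.join(tempfile.mkdtemp(), "log.txt")
w = AsyncWriter(path, batch_size=10)
for i in range(25):
    w.write("txn %d" % i)
w.close()
```

The trade-off, of course, is durability: records queued but not yet flushed are lost on a crash, which is why data grids pair write-behind with in-memory replication.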
Minimizing Disk I/O
Columnar Databases
Parallelizing disk I/O…
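The columnar idea can be sketched in a few lines (file layout and names here are invented for illustration): each column lives in its own file, so a scan over one attribute reads only that file instead of every full row.

```python
import json
import os
import tempfile

def write_columnar(rows, directory):
    """Pivot row dicts into one file per column (toy columnar layout)."""
    columns = {name: [row[name] for row in rows] for name in rows[0]}
    for name, values in columns.items():
        with open(os.path.join(directory, name + ".col"), "w") as f:
            json.dump(values, f)

def read_column(directory, name):
    """A column scan touches exactly one file."""
    with open(os.path.join(directory, name + ".col")) as f:
        return json.load(f)

rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 30}]
d = tempfile.mkdtemp()
write_columnar(rows, d)
print(sum(read_column(d, "amount")))  # 40, without ever reading the id column
```

Because the column files are independent, they can also live on different disks and be read in parallel, which is the parallelized-I/O angle of the slide.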
Minimizing Disk I/O
In-memory Databases
All storage is in memory
[Diagram: a single in-memory system with multiple processors]
Latencies are memory-based
Useful in *some* scenarios
However, there is no distributed processing (the processor usually becomes a bottleneck and limits horizontal scalability)
Starting to scale out…
Now that we're not pinned to disk I/O, we can start to divide and distribute processing power, scaling out
[Diagram: three in-memory systems, each with its own processors]
Starting to scale out…
… but then the network (the only shared resource) can become a bottleneck!
[Diagram: three in-memory systems exchanging objects over the network]
Minimizing Disk & Network I/O
Maximize network speed: fast gigabit networks, Fibre Channel
Bring computing close to the data: data-aware procedures, data partitioning
Improve algorithms: avoid multiple hops, avoid slow members
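"Bring computing close to the data" usually means function shipping over hash-partitioned data. Here is a minimal sketch of the idea (the node count, key format, and helper names are all illustrative, not any product's API): instead of pulling a value across the network to the caller, the function is routed to the node that owns the key and runs against that node's local store.

```python
from collections import defaultdict
from zlib import crc32

NODES = 3
stores = [defaultdict(int) for _ in range(NODES)]  # one local store per "node"

def owner(key):
    # Stable hash partitioning. (Python's built-in str hash is randomized
    # per process, so a real grid uses a stable hash like this.)
    return crc32(key.encode()) % NODES

def put(key, value):
    stores[owner(key)][key] = value

def execute_on_owner(key, fn):
    # Function shipping: run fn on the owning node's local data,
    # instead of moving the data over the network to the caller.
    return fn(stores[owner(key)], key)

put("order:42", 7)
print(execute_on_owner("order:42", lambda store, k: store[k] * 10))  # prints 70
```

Only the (small) function and its result cross the network; the (potentially large) data never moves, which is exactly what data-aware procedures buy you.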
Minimizing Disk & Network I/O
Hadoop
YARN
HDFS also distributes data among nodes, but persists on disk.
I/O is parallelized since it's distributed, but there's no in-memory latency.
Network latency + disk access latency: too slow for real-time queries and processing.
More suitable for data transformation, load, staging, and batches.
Minimizing Disk & Network I/O
Hadoop
Great when we have long-running jobs over huge amounts of data (multiple terabytes or petabytes)
Great for unstructured data (although Hive can provide SQL-like access)
Can't handle updates (insert-only model)
Not suitable for low latency
Minimizing Disk & Network I/O
Hadoop
Data can usually be both distributed across different members and partitioned into different files
Minimizing Disk & Network I/O
MPP Databases
[Diagram: MPP database members, each with its own processors, backed by external storage]
However, latency is still limited by disk access
Inserts are usually very slow (too many indexes, partitions, and distributions)
Great for huge amounts of structured data
Minimizing Disk & Network I/O
MPP Databases
[Diagram: three in-memory systems, each holding its own objects and processors]
Distribute data so as to minimize transfers between nodes
Functions are also distributed, executing close to where the data is
Minimizing Disk & Network I/O
In-memory Data Grids
Data can be distributed, replicated, or both across nodes
In-memory access times
Related data should be co-located to avoid network hops on joins
Now we're not pinned to either disk I/O or network I/O
… but we're limited by the servers' memory capacity :-)
Minimizing Disk & Network I/O
In-memory Data Grids
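Co-location is just a routing decision: related entries are partitioned by a *shared* key rather than their own. A tiny sketch of that idea, under assumed names (the node count and `cust:` key scheme are invented for illustration):

```python
from zlib import crc32

NODES = 4

def partition(routing_key):
    # Stable hash routing; all entries with the same routing key
    # land on the same node.
    return crc32(routing_key.encode()) % NODES

# Co-locate an order with its customer by routing the order on the
# *customer's* id instead of the order's own id.
customer_node = partition("cust:1001")           # the customer record
order_node = partition("cust:1001")              # order "order:77", routed via its customer
print(customer_node == order_node)               # True: a customer/orders join stays local
```

With this routing, a join of a customer against their orders never crosses the network, which is exactly the "avoid network hops on joins" point above.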
Strategy            | Access Latency | Horizontally Scalable | Storage I/O      | Capacity       | Variety
Traditional RDBMS   | Disk           | No                    | Disk             | Gigabytes      | Structured
In-Memory DB        | Memory         | No                    | Memory           | Few GB         | Structured
Columnar DB         | Disk           | No*                   | Partitioned disk | Terabytes      | Structured
Hadoop              | Disk           | Yes                   | Partitioned disk | Petabytes      | Unstructured*
In-Memory Data Grid | Memory         | Yes                   | Memory           | Hundreds of GB | Unstructured
New SQL Grid        | Memory         | Yes                   | Memory           | Hundreds of GB | Structured
MPP Database        | Disk           | Yes                   | Partitioned disk | Petabytes      | Structured
Fast Data meets Big Data
Working together they enable entirely new business models.
Ref. Architecture
Transactional systems
[Diagram: transactional systems ingest data into an In-Memory Data Grid (IMDG) of members that serves real-time analytical queries and "hot data" search; asynchronous persistence feeds an Enterprise Data Warehouse (RDBMS) and an Analytic Data Mart (MPP database) for Big Data analytical queries; distributed unstructured-data computing runs Big Data jobs via Map-Reduce, Hive, and Pig; reference data flows back into the grid]
Ref. Architecture
Real-time Analytics (real case)
[Diagram: transactional systems feed sales, visits, and invoices through an XML polling consumer and message dispatcher into a SQLFire cluster handling OLTP transactions and real-time analytics; real-time queries cover invoices with taxes, sales reps, customers, customer visits, ...; stored procedures are fired as needed; asynchronous inserts/updates persist to a columnar database (on a traditional or Hadoop filesystem) for OLAP, traditional analytics, and archival; table functions handle batching and long-running analytics; an end-user GUI serves real-time reports]
Ref. Architecture
Data Service
[Diagram: a GemFire/SQLFire cluster exposes its data model through ANSI SQL, Java/.NET/C++ APIs, and web services for highly scalable transaction processing and real-time analytics; legacy apps connect through a JCA connector, legacy APIs, and web services; asynchronous feeds through an HDFS connector and JDBC pipes persist unstructured data to the Hadoop filesystem (queried with Pivotal HAWQ) and structured data to Greenplum DB, enabling highly scalable structured + unstructured data analytics]
Ref. Architecture
App Modernization
[Diagram: a mainframe (CICS transaction manager and database hosting legacy apps) connects through a mainframe connector and web services to a GemFire/SQLFire cluster that exposes its data model through ANSI SQL, Java/.NET/C++ APIs, and web services for highly scalable transaction processing; modernized apps run against the cluster directly (database modernization); asynchronous feeds through an HDFS connector and JDBC pipes persist unstructured data to the Hadoop filesystem (Pivotal HAWQ) and structured data to Greenplum DB for highly scalable structured + unstructured data analytics]
Ref. Architecture
App Modernization
• Real case
• Brazilian banking industry
Ref. Architecture
Summary
[Diagram: transactional and analytic applications access a SQLFire/GemFire cluster through Java, .NET, and C++ APIs and web services; a message dispatcher and content enricher handle ingest with asynchronous inserts/updates; distributed stored procedures and distributed functions re-calculate the real-time analytic model; SQL serves transactions and real-time analytics; asynchronous feeds persist to the Hadoop filesystem and Greenplum DB (Pivotal HAWQ) for highly scalable data analytics]
Thank You
A NEW PLATFORM FOR A NEW ERA
Long journey of Ruby standard library at RubyConf AU 2024
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 

Big and Fast Data - Building Infinitely Scalable Systems

  • 1. A NEW PLATFORM FOR A NEW ERA
  • 2. 2© Copyright 2013 Pivotal. All rights reserved. 2© Copyright 2013 Pivotal. All rights reserved. Big & Fast Data Real-world architecture blueprints …for building infinitely scalable systems Frederico Melo fmelo@gopivotal.com
  • 3. 3© Copyright 2013 Pivotal. All rights reserved. Agenda About Pivotal Building infinitely scalable systems Big + Fast Data Pivotal Platform Real world use-cases
  • 4. 4© Copyright 2013 Pivotal. All rights reserved. Pivotal Platform Cloud Storage Virtualization Data & Analytics Platform Cloud Application Platform Data-Driven Application Development Pivotal Data Science Labs
  • 5. 5© Copyright 2013 Pivotal. All rights reserved. 5© Copyright 2013 Pivotal. All rights reserved. Building infinitely scalable systems
  • 6. 6© Copyright 2013 Pivotal. All rights reserved. What is scalability? Scalability: how a system behaves (scales) as we add volume or load, incrementally increasing its processing power. Scalable: a system that handles increases in volume or load by increasing its throughput. Linear scalability: throughput increases at the same rate as the load (twice the requests coming in, twice the throughput), keeping the same response time per transaction. Scalability limit: the point where a system stops scaling as we add more load -> we have a bottleneck!
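The linear-scalability definition on slide 6 can be sketched as a small calculation; the measurements below are hypothetical, purely to illustrate the ratio of observed to ideal throughput growth.

```python
def scaling_efficiency(base_load, base_throughput, new_load, new_throughput):
    """Ratio of observed to ideal throughput growth.

    1.0 means perfectly linear scaling (twice the load, twice the
    throughput); values below 1.0 suggest a bottleneck is emerging.
    """
    ideal = base_throughput * (new_load / base_load)
    return new_throughput / ideal

# Hypothetical measurements: doubling the load yields only 1.6x throughput,
# so the system is at 80% of linear scaling.
print(scaling_efficiency(100, 1000, 200, 1600))  # 0.8
```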
  • 7. 7© Copyright 2013 Pivotal. All rights reserved. Vertical scalability vs. horizontal scalability Scale up vs. scale out
  • 8. 8© Copyright 2013 Pivotal. All rights reserved. Usual computer system Location Firewall External Storage Network Router Processor Processor Processor Processor CPUs Main Memory (RAM) Internal Disk NIC
  • 9. 9© Copyright 2013 Pivotal. All rights reserved. What could prevent us from scaling out? Location Firewall External Storage Network Router Processor Processor Processor Processor CPUs Main Memory (RAM) Internal Disk NIC
  • 10. 10© Copyright 2013 Pivotal. All rights reserved. Location Firewall External Storage Network Router Processor Processor Processor Processor CPUs Main Memory (RAM) Internal Disk NIC I/O I/O I/O I/O Disk I/O Memory I/O Network I/O External Devices I/O What could prevent us from scaling out?
  • 11. 11© Copyright 2013 Pivotal. All rights reserved. Typical latencies
  • 12. 12© Copyright 2013 Pivotal. All rights reserved. Minimizing Disk I/O Maximize disk speed: ultra-fast disks, SSDs. Parallelize disk I/O: write to multiple files/disks at once. Get rid of updates: avoids disk seeks, although there's still I/O. Minimize inserts: do only batch inserts. Asynchronous writes: remove disk I/O from the transaction's critical path.
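The last two points on slide 12 — batch inserts and asynchronous writes — can be sketched as a write-behind queue. This is a minimal illustration, not any Pivotal product's API: the request path only enqueues, while a background thread drains records in batches (e.g. into a bulk INSERT), keeping disk I/O off the transaction's critical path.

```python
import queue
import threading

class AsyncBatchWriter:
    """Sketch of write-behind persistence with batching."""

    def __init__(self, flush, batch_size=100):
        self._q = queue.Queue()
        self._flush = flush          # e.g. a function issuing a bulk INSERT
        self._batch_size = batch_size
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, record):
        """Called on the request path; returns immediately."""
        self._q.put(record)

    def _drain(self):
        # Background loop: block for the first record, then greedily
        # collect up to batch_size records and flush them together.
        while True:
            batch = [self._q.get()]
            while len(batch) < self._batch_size:
                try:
                    batch.append(self._q.get_nowait())
                except queue.Empty:
                    break
            self._flush(batch)
```

The trade-off, as with any asynchronous persistence, is that an acknowledged write may be lost if the process dies before the batch is flushed.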
  • 13. 13© Copyright 2013 Pivotal. All rights reserved. Minimizing Disk I/O Columnar Databases Parallelizing disk I/O…
  • 14. 14© Copyright 2013 Pivotal. All rights reserved. Minimizing Disk I/O In-memory Databases All in-memory storage In-Memory System Processor Processor Latency times are memory-based. Useful in *some* scenarios. However, there's no distributed processing (the processor usually becomes a bottleneck and limits horizontal scalability).
  • 15. 15© Copyright 2013 Pivotal. All rights reserved. Starting to scale out… Now that we're not pinned to disk I/O, we can start to divide and distribute processing power, scaling out In-Memory System Processor Processor In-Memory System Processor Processor In-Memory System Processor Processor
  • 16. 16© Copyright 2013 Pivotal. All rights reserved. Starting to scale out… … but then the network (the only shared resource) can become a bottleneck! In-Memory System Processor Processor In-Memory System Processor Processor In-Memory System Processor Processor Obj Obj
  • 17. 17© Copyright 2013 Pivotal. All rights reserved. Minimizing Disk & Network I/O Maximize network speed: fast gigabit networks, Fibre Channel. Bring computing close to the data: data-aware procedures, data partitioning. Improve algorithms: avoid multiple hops, avoid slow members.
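"Bring computing close to data" on slide 17 relies on every client being able to compute which node owns a given key. A minimal hash-partitioning sketch (the node names are hypothetical, and real grids such as GemFire use more elaborate bucket schemes):

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members

def owner(key, nodes=NODES):
    """Hash-partition a key to the node that owns it, so a function can
    be shipped to the data instead of pulling data across the network."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

# Every client computes the same owner, so a request for customer:42
# always lands on the member holding that customer's data.
print(owner("customer:42"))
```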
  • 18. 18© Copyright 2013 Pivotal. All rights reserved. Minimizing Disk & Network I/O Hadoop YARN
  • 19. 19© Copyright 2013 Pivotal. All rights reserved. Minimizing Disk & Network I/O Hadoop HDFS also distributes data among nodes, but persists on disk. I/O is parallelized since it's distributed, but there's no in-memory latency. Network latency + disk access latency -> slow for real-time queries / processes. More suitable for data transformation / load / staging / batches.
  • 20. 20© Copyright 2013 Pivotal. All rights reserved. Minimizing Disk & Network I/O Hadoop Great when we have long-running jobs over huge amounts of data (multiple terabytes / petabytes). Great for non-structured data (although Hive can offer SQL-like access). Can't handle updates (insert-only model). Not suitable for low latency.
  • 21. 21© Copyright 2013 Pivotal. All rights reserved. Minimizing Disk & Network I/O MPP Databases Data can usually be both distributed to different members and partitioned into different files MPP Database member Processor Processor MPP Database member Processor Processor External Storage
  • 22. 22© Copyright 2013 Pivotal. All rights reserved. Minimizing Disk & Network I/O MPP Databases However, latency is still limited by disk access. Inserts are usually very slow (too many indexes, many partitions, many distributions). Great for huge amounts of structured data.
  • 23. 23© Copyright 2013 Pivotal. All rights reserved. Minimizing Disk & Network I/O In-memory Data Grids In-Memory System Processor Processor In-Memory System Processor Processor In-Memory System Processor Processor Obj Obj Obj Obj Distribute data to minimize transfers between nodes. Functions are also distributed, executing close to where the data is.
  • 24. 24© Copyright 2013 Pivotal. All rights reserved. Minimizing Disk & Network I/O In-memory Data Grids Data can be distributed, replicated, or both across nodes. In-memory access times. Related data should be co-located to avoid network hops on joins. Now we're not pinned to either disk I/O or network I/O… but we're limited by the servers' memory capacity :-)
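The co-location point on slide 24 can be sketched by routing related entries with a shared key. This is only an illustration of the idea (data grids such as GemFire expose it through custom partition resolvers): if a customer and its orders use the same routing key, they hash to the same partition, so a join between them never crosses the network.

```python
import zlib

def partition_of(routing_key, num_partitions=113):
    """Stable partition number for a routing key (CRC32-based sketch)."""
    return zlib.crc32(routing_key.encode()) % num_partitions

def routing_key(entity_type, customer_id):
    """Co-location: customers and their orders deliberately share a
    routing key, ignoring the entity type, so related data lands on
    the same grid member."""
    return f"customer-{customer_id}"

# Both entries map to the same partition, hence the same member.
cust_part = partition_of(routing_key("customer", 42))
order_part = partition_of(routing_key("order", 42))
assert cust_part == order_part
```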
  • 25. 25© Copyright 2013 Pivotal. All rights reserved.
     Strategy            | Access Latency | Horizontally Scalable | Storage I/O      | Capacity       | Variety
     Traditional RDBMS   | Disk           | No                    | Disk             | Gigabytes      | Structured
     In-Memory DB        | Memory         | No                    | Memory           | Few GB         | Structured
     Columnar DB         | Disk           | No*                   | Partitioned Disk | Terabytes      | Structured
     Hadoop              | Disk           | Yes                   | Partitioned Disk | Petabytes      | Unstructured*
     In-Memory Data Grid | Memory         | Yes                   | Memory           | Hundreds of GB | Unstructured
     New SQL Grid        | Memory         | Yes                   | Memory           | Hundreds of GB | Structured
     MPP Database        | Disk           | Yes                   | Partitioned Disk | Petabytes      | Structured
  • 26. 26© Copyright 2013 Pivotal. All rights reserved. Fast Data meets Big Data Working together they enable entirely new business models.
  • 27. 27© Copyright 2013 Pivotal. All rights reserved. Ref. Architecture Transactional systems Distributed non-structured data computing Enterprise Data Warehouse (RDBMS) In-Memory Data Grid IMDG Member Member Member Member Data Ingest Asynchronous Persistence Analytic Data Mart (MPP Database) Real-time analytical queries Big Data analytical queries "Hot data" search Reference Data Map-Reduce Big Data jobs Hive Pig Transactional System Transactional System Transactional System
  • 28. 28© Copyright 2013 Pivotal. All rights reserved. Ref. Architecture Real-time Analytics Real case SQLFire Cluster Sales Visits Invoices Message Dispatcher Table Functions Insert / Update SQL Columnar Database OLTP transactions and real-time analytics OLAP, traditional analytics and archival database Traditional FS or Hadoop FS Polling: - XML File access Stored Procedure Fire any needed SP XML Polling Consumer Sales Stored Procedure Stored Procedure - Real-time queries Invoices w/ taxes, sales reps, customer, customer visits, ... Async Insert / Update Async End-User GUI Table Function - Batching - Long-running analytics Real-time reports Data Load Invoices Other entities
  • 29. 29© Copyright 2013 Pivotal. All rights reserved. GemFire/SQLFire Cluster JCA Connector Greenplum Hadoop FileSystem Greenplum DB Highly scalable structured + unstructured data analytics Async Unstructured Data Structured Data Pivotal HAWQ Highly scalable transaction processing and real-time analytics Data Model ANSI SQL Java / .NET / C++ APIs Web Services Legacy API Async HDFS Connector JDBC Pipes Stored Procedure Stored Procedure Legacy App Stored Procedure Stored Procedure Legacy App Web Services Ref. Architecture Data Service
  • 30. 30© Copyright 2013 Pivotal. All rights reserved. GemFire/SQLFire Cluster Mainframe Connector Mainframe Greenplum Hadoop FileSystem Greenplum DB Highly scalable structured + unstructured data analytics Async Unstructured Data Structured Data Pivotal HAWQ Highly scalable transaction processing Stored Procedure Stored Procedure Legacy App Data Model ANSI SQL Java / .NET / C++ APIs Web Services CICS Web Services Async HDFS Connector JDBC Pipes Stored Procedure Stored Procedure Legacy App Transaction Manager Database Modernization Stored Procedure Stored Procedure Modernized App Stored Procedure Stored Procedure Modernized App Ref. Architecture App Modernization
  • 31. 31© Copyright 2013 Pivotal. All rights reserved. Ref. Architecture App Modernization • Real case • Brazilian banking industry
  • 32. 32© Copyright 2013 Pivotal. All rights reserved. SQLFire/GemFire Cluster Data Model Message Dispatcher Content Enricher Async Insert / Update SQL Transactions and real-time analytics Async Re-calculate RT analytics data Distributed Stored Procedure Distributed Function Update RT Analytic Model Hadoop FileSystem Greenplum DB Highly scalable data analytics Pivotal HAWQ Java API .NET API C++ API Web Services Transactional Applications Application Application Application Application Application Analytic Applications Application Ref. Architecture Summary
  • 33. 33© Copyright 2013 Pivotal. All rights reserved. Thank You
  • 34. A NEW PLATFORM FOR A NEW ERA

Editor's Notes

  1. There is a significant opportunity for EMC’s customers to take technology leadership, not only at the infrastructure level, but also across the rapidly growing and fast-moving application development and big data markets. Pivotal is aligning resources for our customers to leverage this transformational period and drive more quickly towards the rising opportunities. As the assets from EMC and VMware come together under Pivotal, they fall into three strategic areas: the Data and Analytics Platform, the Cloud Application Platform, and Data-Driven Application Development. We will discuss each of these today.
  2. The first is Data and Analytics. Combining the Greenplum Database and Hadoop with the fast data technologies of GemFire and SQLFire from VMware, Pivotal is delivering the industry’s most comprehensive big and fast data platform. Pivotal HD with HAWQ SQL technology is delivering the world’s most powerful Hadoop data infrastructure. No longer does the enterprise need to struggle with silos of data segmented by skills within the organization. Pivotal HD combines the scalable SQL processing of the Greenplum Database with enterprise-ready Hadoop to deliver the industry’s most complete offering. For customers looking to stop data mart sprawl or add deep analytical capabilities, the Greenplum Database continues to serve as the best-in-class MPP database. Proven at scale across many industries, the Greenplum Database delivers business value through analytics on structured data. Real-time has quickly become a new requirement for many enterprise companies. Data must be consumed and acted upon in microseconds to deliver new business models while staying competitive. GemFire and SQLFire provide industry-leading in-memory capabilities to deliver real-time performance while addressing enterprise availability requirements.
  3. The first is Data and Analytics. Combining Greenplum Database and Hadoop with the fast data technologies of Gemfire and SQLfire from VMware, Pivotal is delivering the industries most comprehensive big and fast data platform. Pivotal HD with HAWQ SQL technology is delivering the worlds most powerful Hadoop data infrastructure. No longer does the Enterprise need to struggle with silos of data segmented by skills within the organization. Pivotal HD combines the scalable SQL processing from the Greenplum database with Enterprise ready Hadoop to deliver the industries most complete offering. For customers looking to stop data mart sprawl or add deep analytical capabilities, the Greenplum Database continues to server as the best in class MPP database. Proven at scale across many industries, The Greenplum database delivers business value through analytics on structured dataReal-time has quickly become a new requirement for many Enterprise companies. Data must be consumed and acted upon in micro-seconds to deliver new business models while staying competitive. Gemfire and SQLfire provide industry leading in-memory capabilities to deliver real-time performance while addressing Enterprise availability requirements.
  4. The first is Data and Analytics. Combining Greenplum Database and Hadoop with the fast data technologies of Gemfire and SQLfire from VMware, Pivotal is delivering the industries most comprehensive big and fast data platform. Pivotal HD with HAWQ SQL technology is delivering the worlds most powerful Hadoop data infrastructure. No longer does the Enterprise need to struggle with silos of data segmented by skills within the organization. Pivotal HD combines the scalable SQL processing from the Greenplum database with Enterprise ready Hadoop to deliver the industries most complete offering. For customers looking to stop data mart sprawl or add deep analytical capabilities, the Greenplum Database continues to server as the best in class MPP database. Proven at scale across many industries, The Greenplum database delivers business value through analytics on structured dataReal-time has quickly become a new requirement for many Enterprise companies. Data must be consumed and acted upon in micro-seconds to deliver new business models while staying competitive. Gemfire and SQLfire provide industry leading in-memory capabilities to deliver real-time performance while addressing Enterprise availability requirements.
  5. The first is Data and Analytics. Combining Greenplum Database and Hadoop with the fast data technologies of Gemfire and SQLfire from VMware, Pivotal is delivering the industries most comprehensive big and fast data platform. Pivotal HD with HAWQ SQL technology is delivering the worlds most powerful Hadoop data infrastructure. No longer does the Enterprise need to struggle with silos of data segmented by skills within the organization. Pivotal HD combines the scalable SQL processing from the Greenplum database with Enterprise ready Hadoop to deliver the industries most complete offering. For customers looking to stop data mart sprawl or add deep analytical capabilities, the Greenplum Database continues to server as the best in class MPP database. Proven at scale across many industries, The Greenplum database delivers business value through analytics on structured dataReal-time has quickly become a new requirement for many Enterprise companies. Data must be consumed and acted upon in micro-seconds to deliver new business models while staying competitive. Gemfire and SQLfire provide industry leading in-memory capabilities to deliver real-time performance while addressing Enterprise availability requirements.
  6. The first is Data and Analytics. Combining Greenplum Database and Hadoop with the fast data technologies of Gemfire and SQLfire from VMware, Pivotal is delivering the industries most comprehensive big and fast data platform. Pivotal HD with HAWQ SQL technology is delivering the worlds most powerful Hadoop data infrastructure. No longer does the Enterprise need to struggle with silos of data segmented by skills within the organization. Pivotal HD combines the scalable SQL processing from the Greenplum database with Enterprise ready Hadoop to deliver the industries most complete offering. For customers looking to stop data mart sprawl or add deep analytical capabilities, the Greenplum Database continues to server as the best in class MPP database. Proven at scale across many industries, The Greenplum database delivers business value through analytics on structured dataReal-time has quickly become a new requirement for many Enterprise companies. Data must be consumed and acted upon in micro-seconds to deliver new business models while staying competitive. Gemfire and SQLfire provide industry leading in-memory capabilities to deliver real-time performance while addressing Enterprise availability requirements.
  7. The first is Data and Analytics. Combining Greenplum Database and Hadoop with the fast data technologies of Gemfire and SQLfire from VMware, Pivotal is delivering the industries most comprehensive big and fast data platform. Pivotal HD with HAWQ SQL technology is delivering the worlds most powerful Hadoop data infrastructure. No longer does the Enterprise need to struggle with silos of data segmented by skills within the organization. Pivotal HD combines the scalable SQL processing from the Greenplum database with Enterprise ready Hadoop to deliver the industries most complete offering. For customers looking to stop data mart sprawl or add deep analytical capabilities, the Greenplum Database continues to server as the best in class MPP database. Proven at scale across many industries, The Greenplum database delivers business value through analytics on structured dataReal-time has quickly become a new requirement for many Enterprise companies. Data must be consumed and acted upon in micro-seconds to deliver new business models while staying competitive. Gemfire and SQLfire provide industry leading in-memory capabilities to deliver real-time performance while addressing Enterprise availability requirements.
  8. The first is Data and Analytics. Combining Greenplum Database and Hadoop with the fast data technologies of Gemfire and SQLfire from VMware, Pivotal is delivering the industries most comprehensive big and fast data platform. Pivotal HD with HAWQ SQL technology is delivering the worlds most powerful Hadoop data infrastructure. No longer does the Enterprise need to struggle with silos of data segmented by skills within the organization. Pivotal HD combines the scalable SQL processing from the Greenplum database with Enterprise ready Hadoop to deliver the industries most complete offering. For customers looking to stop data mart sprawl or add deep analytical capabilities, the Greenplum Database continues to server as the best in class MPP database. Proven at scale across many industries, The Greenplum database delivers business value through analytics on structured dataReal-time has quickly become a new requirement for many Enterprise companies. Data must be consumed and acted upon in micro-seconds to deliver new business models while staying competitive. Gemfire and SQLfire provide industry leading in-memory capabilities to deliver real-time performance while addressing Enterprise availability requirements.
  9. The first is Data and Analytics. Combining Greenplum Database and Hadoop with the fast data technologies of Gemfire and SQLfire from VMware, Pivotal is delivering the industries most comprehensive big and fast data platform. Pivotal HD with HAWQ SQL technology is delivering the worlds most powerful Hadoop data infrastructure. No longer does the Enterprise need to struggle with silos of data segmented by skills within the organization. Pivotal HD combines the scalable SQL processing from the Greenplum database with Enterprise ready Hadoop to deliver the industries most complete offering. For customers looking to stop data mart sprawl or add deep analytical capabilities, the Greenplum Database continues to server as the best in class MPP database. Proven at scale across many industries, The Greenplum database delivers business value through analytics on structured dataReal-time has quickly become a new requirement for many Enterprise companies. Data must be consumed and acted upon in micro-seconds to deliver new business models while staying competitive. Gemfire and SQLfire provide industry leading in-memory capabilities to deliver real-time performance while addressing Enterprise availability requirements.
  10. The first is Data and Analytics. Combining Greenplum Database and Hadoop with the fast data technologies of Gemfire and SQLfire from VMware, Pivotal is delivering the industries most comprehensive big and fast data platform. Pivotal HD with HAWQ SQL technology is delivering the worlds most powerful Hadoop data infrastructure. No longer does the Enterprise need to struggle with silos of data segmented by skills within the organization. Pivotal HD combines the scalable SQL processing from the Greenplum database with Enterprise ready Hadoop to deliver the industries most complete offering. For customers looking to stop data mart sprawl or add deep analytical capabilities, the Greenplum Database continues to server as the best in class MPP database. Proven at scale across many industries, The Greenplum database delivers business value through analytics on structured dataReal-time has quickly become a new requirement for many Enterprise companies. Data must be consumed and acted upon in micro-seconds to deliver new business models while staying competitive. Gemfire and SQLfire provide industry leading in-memory capabilities to deliver real-time performance while addressing Enterprise availability requirements.
  11. The first is Data and Analytics. Combining Greenplum Database and Hadoop with the fast data technologies of Gemfire and SQLfire from VMware, Pivotal is delivering the industries most comprehensive big and fast data platform. Pivotal HD with HAWQ SQL technology is delivering the worlds most powerful Hadoop data infrastructure. No longer does the Enterprise need to struggle with silos of data segmented by skills within the organization. Pivotal HD combines the scalable SQL processing from the Greenplum database with Enterprise ready Hadoop to deliver the industries most complete offering. For customers looking to stop data mart sprawl or add deep analytical capabilities, the Greenplum Database continues to server as the best in class MPP database. Proven at scale across many industries, The Greenplum database delivers business value through analytics on structured dataReal-time has quickly become a new requirement for many Enterprise companies. Data must be consumed and acted upon in micro-seconds to deliver new business models while staying competitive. Gemfire and SQLfire provide industry leading in-memory capabilities to deliver real-time performance while addressing Enterprise availability requirements.
  12. The first is Data and Analytics. Combining Greenplum Database and Hadoop with the fast data technologies of Gemfire and SQLfire from VMware, Pivotal is delivering the industries most comprehensive big and fast data platform. Pivotal HD with HAWQ SQL technology is delivering the worlds most powerful Hadoop data infrastructure. No longer does the Enterprise need to struggle with silos of data segmented by skills within the organization. Pivotal HD combines the scalable SQL processing from the Greenplum database with Enterprise ready Hadoop to deliver the industries most complete offering. For customers looking to stop data mart sprawl or add deep analytical capabilities, the Greenplum Database continues to server as the best in class MPP database. Proven at scale across many industries, The Greenplum database delivers business value through analytics on structured dataReal-time has quickly become a new requirement for many Enterprise companies. Data must be consumed and acted upon in micro-seconds to deliver new business models while staying competitive. Gemfire and SQLfire provide industry leading in-memory capabilities to deliver real-time performance while addressing Enterprise availability requirements.
  13. The first is Data and Analytics. Combining Greenplum Database and Hadoop with the fast data technologies of Gemfire and SQLfire from VMware, Pivotal is delivering the industries most comprehensive big and fast data platform. Pivotal HD with HAWQ SQL technology is delivering the worlds most powerful Hadoop data infrastructure. No longer does the Enterprise need to struggle with silos of data segmented by skills within the organization. Pivotal HD combines the scalable SQL processing from the Greenplum database with Enterprise ready Hadoop to deliver the industries most complete offering. For customers looking to stop data mart sprawl or add deep analytical capabilities, the Greenplum Database continues to server as the best in class MPP database. Proven at scale across many industries, The Greenplum database delivers business value through analytics on structured dataReal-time has quickly become a new requirement for many Enterprise companies. Data must be consumed and acted upon in micro-seconds to deliver new business models while staying competitive. Gemfire and SQLfire provide industry leading in-memory capabilities to deliver real-time performance while addressing Enterprise availability requirements.
  21. The combination of Big Data and Fast Data working together enables new business models that were never possible before. The idea is to analyze the historical data looking for trends or patterns that lead to good results, then model those patterns so you can detect them as they unfold in real time based on incoming Fast Data. If you can influence the behavior of the actors even a little, you might be able to steer them toward the patterns that produce the good results.

For instance, there is at least one hedge fund out there that uses sentiment data from the Twitter "firehose" to pick the top 10 stocks for its strategy every day. It establishes the strategy using that Big Data, then executes against the strategy and makes course corrections as needed based on traditional market data as the day goes along: a true combination of Big and Fast that makes the business work better.

Let's look at some other cases. How about location-based services: mobile phone companies are looking at using their big data to determine things like travel congestion for crowd management or traffic management purposes. This was a very hot topic leading up to the 2012 Olympics.

Here's another interesting use case for Fast Data/Big Data: Amazon will pay shoppers $5 to walk out of stores empty-handed. Amazon offers consumers up to $5 off on purchases if they compare prices using its mobile Price Check app in a store. It gets consumers to submit the prices of items with the app, so Amazon knows whether it is still offering the best prices, and it grabs the sale right out of the store. Talk about capturing an opportunity in real time!
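The batch-plus-streaming pattern described above can be illustrated with a toy sketch: a "Big Data" pass derives a simple model (here, a per-key historical baseline) from stored records, and a "Fast Data" pass applies it to each incoming event as it arrives. All names and data below are hypothetical; a real deployment would use a platform like the ones discussed here rather than in-process Python.

```python
from collections import defaultdict

def build_model(history):
    """Batch step: compute the average historical value per key."""
    totals, counts = defaultdict(float), defaultdict(int)
    for key, value in history:
        totals[key] += value
        counts[key] += 1
    return {k: totals[k] / counts[k] for k in totals}

def score_event(model, key, value, factor=1.5):
    """Streaming step: flag events that exceed the historical baseline."""
    baseline = model.get(key)
    return baseline is not None and value > factor * baseline

# Hypothetical historical data and incoming events.
history = [("ACME", 10.0), ("ACME", 12.0), ("XYZ", 4.0)]
model = build_model(history)
print(score_event(model, "ACME", 20.0))  # 20.0 > 1.5 * 11.0 -> True
print(score_event(model, "XYZ", 5.0))    # 5.0 < 1.5 * 4.0  -> False
```

The point is the division of labor: the expensive pattern-finding runs offline over the full history, while the per-event check is cheap enough to run in real time on the stream.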