Big and Fast Data - Building Infinitely Scalable Systems

Published in: Technology, Business
  • There is a significant opportunity for EMC’s customers to take technology leadership, not only at the infrastructure level, but also across the rapidly growing and fast-moving application development and big data markets. Pivotal is aligning resources so that our customers can leverage this transformational period and move more quickly toward the emerging opportunities. As the assets from EMC and VMware come together under Pivotal, they fall into three strategic areas: Data and Analytics Platform, Cloud Application Platform, and Data-Driven Application Development. We will discuss each of these today.
  • The first is Data and Analytics. Combining Greenplum Database and Hadoop with the fast data technologies of GemFire and SQLFire from VMware, Pivotal is delivering the industry’s most comprehensive big and fast data platform. Pivotal HD with HAWQ SQL technology is delivering the world’s most powerful Hadoop data infrastructure. No longer does the enterprise need to struggle with silos of data segmented by skills within the organization. Pivotal HD combines the scalable SQL processing of the Greenplum Database with enterprise-ready Hadoop to deliver the industry’s most complete offering. For customers looking to stop data mart sprawl or add deep analytical capabilities, the Greenplum Database continues to serve as the best-in-class MPP database. Proven at scale across many industries, the Greenplum Database delivers business value through analytics on structured data. Real-time has quickly become a new requirement for many enterprise companies. Data must be consumed and acted upon in microseconds to deliver new business models while staying competitive. GemFire and SQLFire provide industry-leading in-memory capabilities to deliver real-time performance while addressing enterprise availability requirements.
  • The combination of Big Data and Fast Data working together enables business models that were not possible before. The idea is that you analyze the historical data, looking for trends or patterns that lead to good results. Then you model those patterns in such a way that you can detect them as they unfold in real time, based on incoming Fast Data. If you can influence the behavior of the actors just a little, you might be able to steer them toward the patterns that produce the good results. For instance, there is at least one hedge fund that uses sentiment data from the Twitter “fire hose” to pick the top 10 stocks for its strategy every day. It establishes the strategy using that Big Data, then executes against the strategy and makes course corrections as needed, based on traditional market data, as the day goes along. A true combination of Big and Fast, to make the business work better. Let’s look at some other cases. How about location-based services? Mobile phone companies are looking at using their big data to determine things like travel congestion for crowd management or traffic management purposes. This was a very hot topic leading up to the 2012 Olympics. Here’s another interesting use case for Fast Data/Big Data: Amazon will pay shoppers $5 to walk out of stores empty-handed. Amazon is offering consumers up to $5 off on purchases if they compare prices in a store using its mobile Price Check app. Amazon gets consumers to submit the prices of items through the app, so it knows whether it is still offering the best prices. And it grabs the sale right out of the store! Talk about capturing an opportunity in real time!
  • Big and Fast Data - Building Infinitely Scalable Systems

    1. A NEW PLATFORM FOR A NEW ERA
    2. Big & Fast Data: Real-world architecture blueprints …for building infinitely scalable systems. Frederico Melo, fmelo@gopivotal.com. © Copyright 2013 Pivotal. All rights reserved.
    3. Agenda
        • About Pivotal
        • Building infinitely scalable systems
        • Big + Fast Data
        • Pivotal Platform
        • Real world use-cases
    4. Pivotal Platform (diagram labels): Cloud Storage; Virtualization; Data & Analytics Platform; Cloud Application Platform; Data-Driven Application Development; Pivotal Data Science Labs
    5. Building infinitely scalable systems
    6. What is scalability?
        • Scalability: how a system behaves (scales) as we add volume or load while incrementally increasing its processing power.
        • Scalable: a system that handles increases in volume or load by increasing its throughput.
        • Linear scalability: throughput increases at the same rate as the load (twice the incoming requests, twice the throughput), with the same response time per transaction.
        • Scalability limit: the point where a system stops scaling as we add more load → we have a bottleneck!
    7. 7. 7© Copyright 2013 Pivotal. All rights reserved. Vertical scalability x Horizontal scalability Scale up x Scale out
    8. 8. 8© Copyright 2013 Pivotal. All rights reserved. Usual computer system Location Firewall External Storage Network RouterProcessor Processor Processor Processor CPUs Main Memory (RAM) Internal Disk NIC
    9. 9. 9© Copyright 2013 Pivotal. All rights reserved. What could prevent from scaling out? Location Firewall External Storage Network RouterProcessor Processor Processor Processor CPUs Main Memory (RAM) Internal Disk NIC
    10. 10. 10© Copyright 2013 Pivotal. All rights reserved. Location Firewall External Storage Network RouterProcessor Processor Processor Processor CPUs Main Memory (RAM) Internal Disk NIC I/O I/O I/O I/O Disc I/O Memory I/O Network I/O External Devices I/O What could prevent from scaling out?
    11. Typical latencies
    12. Minimizing Disk I/O. Maximize disk speed: ultra-fast disks, SSDs. Parallelize disk I/O: write to multiple files/disks at once. Get rid of updates: this avoids disk seeks, although there is still I/O. Minimize inserts: do batch inserts only. Asynchronous writes: remove disk I/O from the transaction's critical path.
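The last two points on this slide (batch inserts and asynchronous writes) are often combined in a write-behind buffer. The following is a minimal sketch under stated assumptions: `WriteBehindBuffer`, `flush_fn` and the batch size are hypothetical names and values chosen for illustration, and `flush_fn` stands in for whatever bulk write the real store offers.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the background writer to finish


class WriteBehindBuffer:
    """Accepts writes immediately and flushes them in batches on another thread."""

    def __init__(self, flush_fn, batch_size=100):
        self._queue = queue.Queue()
        self._flush_fn = flush_fn      # hypothetical bulk-write callback
        self._batch_size = batch_size
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def write(self, record):
        # Transaction path: only an in-memory enqueue, no disk I/O here.
        self._queue.put(record)

    def close(self):
        # Flush whatever is left and wait for the writer thread to stop.
        self._queue.put(_STOP)
        self._worker.join()

    def _run(self):
        batch = []
        while True:
            item = self._queue.get()
            if item is _STOP:
                break
            batch.append(item)
            if len(batch) >= self._batch_size:
                self._flush_fn(batch)  # one batched write instead of many
                batch = []
        if batch:
            self._flush_fn(batch)
```

A caller would create the buffer once, call `write()` inside each transaction and `close()` at shutdown. The trade-off is durability: records still in the queue are lost if the process dies before they are flushed.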
    13. Minimizing Disk I/O: Columnar Databases. Parallelizing disk I/O…
    14. Minimizing Disk I/O: In-memory Databases. All storage is in memory. Latency times are memory-based. Useful in *some* scenarios. However, there is no distributed processing (the processor usually becomes a bottleneck and limits horizontal scalability). [Diagram labels: In-Memory System; Processor, Processor]
    15. Starting to scale out… Now that we’re not pinned to disk I/O, we can start to divide and distribute processing power, scaling out. [Diagram: three in-memory systems, each with its own processors]
    16. Starting to scale out… but then the network (the only shared resource) can become a bottleneck! [Diagram: three in-memory systems, each with its own processors; labels: Obj, Obj]
    17. Minimizing Disk & Network I/O. Maximize network speed: fast GB networks, Fiber Channel. Bring computing close to the data: data-aware procedures, data partitioning. Improve algorithms: avoid multiple hops, avoid slow members.
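"Bring computing close to the data" can be illustrated with a toy partitioned cluster in which a function is sent to each node and only a small partial result travels back. This is a sketch under stated assumptions, simulated in one process: the `Node` and `Cluster` classes and the CRC32-based routing are hypothetical and do not represent any specific product's API.

```python
import zlib


class Node:
    """One cluster member holding its own partition of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def execute(self, fn):
        # The function runs next to the data; only its result leaves the node.
        return fn(self.data.values())


class Cluster:
    def __init__(self, node_names):
        self.nodes = [Node(name) for name in node_names]

    def owner(self, key):
        # crc32 is stable across processes, unlike Python's built-in hash() for str.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def put(self, key, value):
        # Data partitioning: each key lives on exactly one node.
        self.owner(key).data[key] = value

    def total(self):
        # Data-aware aggregation: one partial sum per node, combined by the caller.
        return sum(node.execute(sum) for node in self.nodes)
```

For example, after loading 100 values into a three-node `Cluster`, `total()` moves three numbers across the (simulated) network instead of 100 records.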
    18. Minimizing Disk & Network I/O: Hadoop YARN
    19. Minimizing Disk & Network I/O: Hadoop. HDFS also distributes data among nodes, but persists it on disk. I/O is parallelized since it’s distributed, but there’s no in-memory latency. Network latency + disk access latency → slow for real-time queries and processes. More suitable for data transformation, load, staging and batches.
    20. Minimizing Disk & Network I/O: Hadoop. Great for long-running jobs over huge amounts of data (multiple terabytes or petabytes). Great for unstructured data (although Hive offers SQL-like queries). Can’t handle updates (insert-only model). Not suitable for low latency.
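Batch jobs on Hadoop are commonly written in the map/shuffle/reduce style. The following is a toy, single-process sketch of that style only; it does not use the Hadoop API, and the function names are hypothetical. On a real cluster the map and reduce phases would run in parallel on the nodes that hold the data.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

The job reads its whole input and writes a new result set, which matches the insert-only, high-latency profile described on the slide.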
    21. Minimizing Disk & Network I/O: MPP Databases. Data can usually be both distributed across different members and partitioned into different files. [Diagram labels: two MPP database members, each with processors; External Storage]
    22. Minimizing Disk & Network I/O: MPP Databases. However, latency is still bound by disk access. Inserts are usually very slow (too many indexes, many partitions, many distributions). Great for huge amounts of structured data.
    23. Minimizing Disk & Network I/O: In-memory Data Grids. Distribute data so as to minimize its transfer between nodes. Functions are also distributed, executing close to where the data is. [Diagram: three in-memory systems, each with its own processors; labels: Obj (x4)]
    24. Minimizing Disk & Network I/O: In-memory Data Grids. Data can be distributed, replicated, or both across nodes. In-memory access times. Related data should be co-located to avoid network hops on joins. Now we’re not pinned to either disk I/O or network I/O… but we’re limited by the servers’ memory capacity :-)
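The co-location point can be made concrete by counting how many order-to-customer joins would need a network hop under two routing choices. This is a minimal sketch with made-up data: the modulo routing, the three-node cluster and the record layout are all simplifying assumptions, not a description of how any particular data grid routes entries.

```python
NODES = 3  # hypothetical cluster size


def node_for(key):
    # Simplified routing for integer keys; real grids use their own partitioning.
    return key % NODES


def remote_hops(orders, order_routing_key):
    """Count orders stored on a different node than their customer record."""
    hops = 0
    for order in orders:
        customer_node = node_for(order["customer_id"])
        order_node = node_for(order_routing_key(order))
        if order_node != customer_node:
            hops += 1
    return hops


# Made-up data set: 1000 orders spread over 10 customers.
orders = [{"order_id": i, "customer_id": i % 10} for i in range(1000)]

# Orders routed by customer_id always land next to their customer.
colocated = remote_hops(orders, lambda o: o["customer_id"])

# Orders routed by their own id are scattered, so many joins cross the network.
by_order_id = remote_hops(orders, lambda o: o["order_id"])
```

Routing orders by `customer_id` gives zero remote hops in this sketch, at the cost of possibly uneven partitions when a few customers own most of the orders.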
    25.
        Strategy            | Access Latency | Horizontally Scalable | Storage I/O      | Capacity       | Variety
        --------------------|----------------|-----------------------|------------------|----------------|--------------
        Traditional RDBMS   | Disk           | No                    | Disk             | Gigabytes      | Structured
        In-Memory DB        | Memory         | No                    | Memory           | Few GB         | Structured
        Columnar DB         | Disk           | No*                   | Partitioned Disk | Terabytes      | Structured
        Hadoop              | Disk           | Yes                   | Partitioned Disk | Petabytes      | Unstructured*
        In-Memory Data Grid | Memory         | Yes                   | Memory           | Hundreds of GB | Unstructured
        New SQL Grid        | Memory         | Yes                   | Memory           | Hundreds of GB | Structured
        MPP Database        | Disk           | Yes                   | Partitioned Disk | Petabytes      | Structured
    26. Fast Data meets Big Data: working together they enable entirely new business models.
    27. Ref. Architecture. [Diagram labels: Transactional systems; Distributed non-structured data computing; Enterprise Data Warehouse (RDBMS); In-Memory Data Grid (IMDG) with four members; Data Ingest; Asynchronous Persistence; Analytic Data Mart (MPP Database); Real-time analytical queries; Big Data analytical queries; "Hot data" search; Reference Data; Map-Reduce; Big Data jobs; Hive; Pig; Transactional System (x3)]
    28. Ref. Architecture: Real-time Analytics (real case). [Diagram labels: SQLFire Cluster; Sales; Visits; Invoices; Message Dispatcher; Table Functions; Insert / Update SQL; Columnar Database; OLTP transactions and real-time analytics; OLAP, traditional analytics and archival database; Traditional FS or Hadoop FS; Polling: XML file access; Stored Procedure: fire any needed SP; XML Polling Consumer; Sales; Stored Procedure: real-time queries (invoices w/ taxes, sales reps, customer, customer visits, ...); Async Insert / Update; Async; End-User GUI; Table Function: batching, long-running analytics; Real-time reports; Data Load; Invoices; Other entities]
    29. Ref. Architecture: Data Service. [Diagram labels: GemFire / SQLFire Cluster; JCA Connector; Greenplum; Hadoop FileSystem; Greenplum DB; Highly scalable structured + unstructured data analytics; Async; Unstructured Data; Structured Data; Pivotal HAWQ; Highly scalable transaction processing and real-time analytics; Data Model; ANSI SQL; Java / .NET / C++ APIs; Web Services; Legacy API; Async; HDFS Connector; JDBC; Pipes; Stored Procedure; Legacy App (x2); Web Services]
    30. Ref. Architecture: App Modernization. [Diagram labels: GemFire / SQLFire Cluster; Mainframe Connector; Mainframe; Greenplum; Hadoop FileSystem; Greenplum DB; Highly scalable structured + unstructured data analytics; Async; Unstructured Data; Structured Data; Pivotal HAWQ; Highly scalable transaction processing; Stored Procedure; Legacy App (x2); Data Model; ANSI SQL; Java / .NET / C++ APIs; Web Services; CICS; Web Services; Async; HDFS Connector; JDBC; Pipes; Transaction Manager; Database Modernization; Modernized App (x2)]
    31. Ref. Architecture: App Modernization. Real case: Brazilian banking industry
    32. Ref. Architecture: Summary. [Diagram labels: SQLFire / GemFire Cluster; Data Model; Message Dispatcher; Content Enricher; Async Insert / Update SQL; Transactions and real-time analytics; Async; Re-calculate RT analytics data; Distributed Stored Procedure; Distributed Function; Update RT Analytic Model; Hadoop FileSystem; Greenplum DB; Highly scalable data analytics; Pivotal HAWQ; Java API; .NET API; C++ API; Web Services; Transactional Applications; Stored Procedure; Application (x5); Analytic Applications; Application]
    33. Thank You
    34. A NEW PLATFORM FOR A NEW ERA
