How the data explosion of recent years has spawned many new technologies, and the role of in-memory technology today and in light of advances in flash memory.
2. What are we here to discuss?
• Making sense of the exploding data world
• The role of middleware in addressing scalability challenges
• The role of middleware in addressing integration challenges
4. Capacity and Performance Drive New Data Management Technologies
[Chart: data volume (GB → TB → PB) vs. data velocity (year → month → day → hour → minute → second → ms → µs), positioning Data Warehouse, Data Mining, Machine Learning, Business Intelligence, Exploratory Analytics, OLTP, High-Throughput OLTP, Operational Intelligence, and Streaming]
9. Key/Value
• Query: key → value
• Semantics:
  • Mostly read
  • No aggregation
  • No projection
  • No partial update
• Performance: millions of ops/sec
• Consistency: atomic*
• Scaling: mostly scale-out
• Availability: limited (varies quite substantially between implementations)
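The semantics above can be sketched with a minimal in-memory key/value store (a hypothetical illustration, not any particular vendor's API): reads and writes operate on whole values and are atomic per key, with no aggregation, projection, or partial update.

```python
import threading

class KVStore:
    """Minimal key/value store: whole-value reads/writes, atomic per key."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()  # coarse lock; real stores lock per key/partition

    def put(self, key, value):
        with self._lock:               # atomic: the value is replaced wholesale;
            self._data[key] = value    # there is no partial update of a stored value

    def get(self, key):
        with self._lock:               # read-mostly workloads dominate in practice
            return self._data.get(key)

store = KVStore()
store.put("user:42", {"name": "Ada", "visits": 7})
print(store.get("user:42"))  # the whole value comes back; no projecting single fields
```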
10. Stream Processing (Storm)
• Semantics
  – Event-driven data processing
  – Used for continuous updates
  – No need for a costly “SELECT FOR UPDATE”
• Performance: tens of millions of updates/sec
[Diagram: spouts feeding bolts in a Storm topology]
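A Storm topology wires spouts (event sources) to bolts (processing steps). A toy, framework-free sketch of that event-driven flow, in Python rather than Storm's actual Java API:

```python
def spout(events):
    """Spout: emits a stream of raw events."""
    for event in events:
        yield event

def counting_bolt(stream):
    """Bolt: continuously updates counts per key -- no 'SELECT FOR UPDATE'
    needed, because each event is applied as it arrives."""
    counts = {}
    for key in stream:
        counts[key] = counts.get(key, 0) + 1
        yield key, counts[key]

# Wire the topology: spout -> bolt, processing events one at a time.
updates = list(counting_bolt(spout(["a", "b", "a"])))
print(updates)  # [('a', 1), ('b', 1), ('a', 2)]
```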
11. Common Assumption: Disk Is the Bottleneck
[Chart (source: GigaOM Research), 2000–2020: compute performance improves roughly 100x–10,000x, while HDD latency (seek & rotate) shows little improvement]
12. Capacity and Performance Drive New Data Management Technologies
[Chart (source: IDC, 2013): RDBMS, Big Data (Hadoop), NoSQL, and in-memory/stream processing]
14. A Typical App Looks Like This…
[Diagram: a front end and analytics fed by separate real-time (Storm) and batch paths; the data flow is complex]
15. What if Disk Was No Longer the Bottleneck?
Flash closes the CPU-to-storage gap.
16. Our Application Could Look Like This…
[Diagram: a front end over a high-speed data store (using flash/NVM) that exposes key/value, SQL, document, graph, transactional, and map/reduce semantics, alongside StreamBase]
A common data store serving multiple semantics/APIs. Disk becomes the new tape.
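One way to read the "common data store serving multiple semantics/APIs" idea: a single storage layer with a thin facade per access model. A hypothetical sketch (not GigaSpaces' API) with a key/value facade and a document-query facade over the same data:

```python
class CommonStore:
    """One underlying store; multiple access semantics layered on top."""
    def __init__(self):
        self._docs = {}

    # Key/value facade: opaque whole-value access
    def kv_put(self, key, value):
        self._docs[key] = value

    def kv_get(self, key):
        return self._docs.get(key)

    # Document facade: field-level queries over the same entries
    def find(self, **criteria):
        return [d for d in self._docs.values()
                if all(d.get(k) == v for k, v in criteria.items())]

store = CommonStore()
store.kv_put("1", {"type": "order", "status": "open"})
store.kv_put("2", {"type": "order", "status": "closed"})
print(store.find(status="open"))  # [{'type': 'order', 'status': 'open'}]
```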
18. We Can Use a High-Speed Data Bus for Integrating All of Our Data Sources
[Diagram: the front end and analytics (real-time and batch/Storm) connect through a high-speed data bus with built-in caching, providing real-time transactional data access, direct access, real-time streaming, and synchronization with Hadoop, MySQL, and Mongo]
20. Data Grid: Ideal Integration Nexus
• Transactional
• HA (self-healing)
• Horizontally scalable
• FIFO (and partial-FIFO) support
• Queryable
• Ultra-high-performance read/write
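Two of the properties above, horizontal scalability (hash partitioning across nodes) and queryability, can be sketched in a few lines; this is an illustrative toy, and real data grids add replication, FIFO ordering, and transactions on top:

```python
class GridNode:
    """One partition of the grid."""
    def __init__(self):
        self.entries = {}

class DataGrid:
    """Hash-partitions entries across nodes; queries fan out to all partitions."""
    def __init__(self, node_count):
        self.nodes = [GridNode() for _ in range(node_count)]

    def _node_for(self, key):
        return self.nodes[hash(key) % len(self.nodes)]  # route by key hash

    def write(self, key, entry):
        self._node_for(key).entries[key] = entry

    def read(self, key):
        return self._node_for(key).entries.get(key)

    def query(self, predicate):
        # Scatter/gather: evaluate the predicate on every partition
        return [e for node in self.nodes for e in node.entries.values()
                if predicate(e)]

grid = DataGrid(node_count=4)
for i in range(10):
    grid.write(f"k{i}", {"id": i, "even": i % 2 == 0})
print(len(grid.query(lambda e: e["even"])))  # 5
```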
21. Designed for Transactional and Analytics Scenarios…
• Homeland security
• Real-time search
• Social
• eCommerce
• User tracking & engagement
• Financial services
31. So what?
• Data access is no longer tied to the store implementation.
• The middle tier grows into the source of truth.
• Data access stays simple as the system grows.
• Strong consistency can be supported where needed.
• Provides an HA platform for integration.
32. ZetaScale™ – XAP MemoryXtend
Benchmark setup:
• 1KB object size, uniform distribution
• 2 sockets, 2.8GHz CPU (24 cores total), CentOS 5.8, 2 FusionIO SLC PCIe cards in RAID
• YCSB measurements performed by SanDisk
Cost assumptions: 1TB flash = $2K; 1TB RAM = $20K.

[Bar chart (TPS/$):]
                               No Read / 100% Write   100% Read / No Write
ZetaScale-GigaSpaces on SSDs          62                     121
Stock GigaSpaces in DRAM              17                      56

[Bar chart (capacity): XAP = 20 vs. XAP MemoryXtend = 1,000, a 1:50 ratio; 242K reads/sec]

ZetaScale-GigaSpaces provides 2x–3.6x better TPS/$ and 50x the capacity (1:50): the performance of RAM at a cost/capacity closer to disk.
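The 2x–3.6x TPS/$ claim follows directly from the chart values; a quick check:

```python
# Chart values (TPS/$, relative units) from the SanDisk YCSB benchmark slide
zetascale_ssd = {"write_only": 62, "read_only": 121}
stock_dram    = {"write_only": 17, "read_only": 56}

for workload in ("write_only", "read_only"):
    ratio = zetascale_ssd[workload] / stock_dram[workload]
    print(f"{workload}: {ratio:.1f}x better TPS/$")
# write_only: 3.6x, read_only: 2.2x -- i.e., the quoted 2x-3.6x range
```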
33. Takeaways
• The explosion of data has created an explosion of targeted technologies.
• Many are architected on the assumption that “disk is slow.”
• Flash is changing that equation.
• In-memory technology is best suited to take advantage of flash.
• The line between in-memory middleware and data storage continues to blur.
The landscape is very hard to understand: the technologies all claim different, overlapping capabilities. This is no longer the handful of vendors we used to have; there are many niches.
To make sense of the preceding slide, it’s helpful to break down the various technologies.
Some of the emerging NewSQL and NoSQL disk-based databases might have been able to handle the more demanding data volume and variety, but disk-based databases have always been I/O bound; keeping up with the new velocity demands of data is much harder. Disks have always gotten in the way of database velocity and throughput: the closer to real time that transaction throughput or analytics must be, the harder it is for disk-based approaches to keep up. *In each domain, currently (and maybe always), you need different tools.*
Very rich because very mature: ad hoc queries, strong consistency.
Cassandra: a distributed, multi-dimensional hash map, optimized for writes, with tunable consistency; draws on Bigtable and Dynamo.
Mongo: a document store optimized for ease of development.
Couch: a document store with ACID transactions and MVCC.
The ability to scale an existing database, not to replace it: serve data from memory, highly available, applicable to any slow data store.
Memcached: usually a local, in-memory side-cache.
Redis: a distributed key/value store; does snapshotting for persistence; allows updates to backups (inconsistency).
Riak: key/value with eventual consistency.
None of these is transactional or relational.
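The "in-memory side-cache" pattern mentioned for Memcached can be sketched generically as cache-aside, with a plain dict standing in for the cache and a function standing in for the slow backing store (both are illustrative stand-ins, not Memcached's API):

```python
cache = {}          # stands in for Memcached: a local, in-memory side-cache
db_reads = 0        # counts trips to the slow backing store

def slow_store_read(key):
    """Stands in for a read against a disk-based database."""
    global db_reads
    db_reads += 1
    return f"value-for-{key}"

def cached_get(key):
    # Cache-aside: check the cache first, fall back to the store, then populate
    if key not in cache:
        cache[key] = slow_store_read(key)
    return cache[key]

cached_get("user:1")   # miss: hits the slow store
cached_get("user:1")   # hit: served from memory
print(db_reads)        # 1 -- the second read never touched the store
```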