Building An Elastic Real Time NoSQL Platform

Transcript

  1. Building An Elastic Real Time NoSQL Platform
     Creating a platform for unlimited elastic computation power and storage
  2. Motivation
     – Complete elastic solution stack
     – Applications that need massive "strategic" storage (disk-based NoSQL) and a real time ("tactical") component
     – Horizontally and vertically scalable
     – Highly available
     – Self-healing
     – Fault tolerant: suitable for a commodity hardware strategy
     – Simplified management and monitoring, compared with conventional multi-product solutions
     ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  3. What Is Real-Time?
     – It's all relative. In this context, it means "really fast".
     – How fast is really fast? Reads as low as 5 μs, and typically under 1 ms for a fully replicated write.
     – Source: http://blog.gigaspaces.com/2010/12/06/possible-impossibility-the-race-to-zero-latency/
  4. Two Layer Approach
     [Diagram labels: Raw Event Stream (×3), Real Time Events, In Memory Compute Cluster, Raw And Derived Events, NoSQL Cluster, Reporting Engine, SCALE]
     – Advantage: minimal "impedance mismatch" between layers, since both are NoSQL cluster technologies with similar advantages.
     – The grid layer serves as an in-memory cache for interactive requests.
     – The grid layer serves as a real time computation fabric for CEP, and provides a real time map/reduce capability (limited to allocated memory).
  5. Two Layer Approach (continued)
     – A grid layer doing CEP can act as a filter: many raw events get converted to semantic/business events, reducing meaningless data verbosity.
     – The grid layer provides scalable messaging.
     – The NoSQL layer provides unlimited cheap storage on commodity hardware.
     – The NoSQL layer provides virtually unlimited processing power at scale.
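As a minimal illustration of the "grid as a filter" idea in slide 5, the sketch below collapses a verbose stream of raw status events into business events that are emitted only when a device's status actually changes. The event types and field names are hypothetical, and the in-memory dict merely stands in for state the grid layer would hold; this is not any product's API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional


@dataclass(frozen=True)
class RawEvent:
    """Hypothetical raw event: a device reporting its current status."""
    device_id: str
    status: str


@dataclass(frozen=True)
class StatusChanged:
    """Hypothetical business event: emitted only when the status changes."""
    device_id: str
    old_status: Optional[str]
    new_status: str


def to_business_events(raw_events: Iterable[RawEvent]) -> Iterator[StatusChanged]:
    # Last known status per device, held in memory by the grid layer.
    last_seen: Dict[str, str] = {}
    for event in raw_events:
        previous = last_seen.get(event.device_id)
        if previous != event.status:
            # Only a change is meaningful; repeated reports are dropped here
            # instead of being written to the NoSQL layer.
            last_seen[event.device_id] = event.status
            yield StatusChanged(event.device_id, previous, event.status)
```

Six raw events such as `up, up, up, down, down` for one device and `up` for another reduce to three `StatusChanged` events, which is the kind of volume reduction the slide describes before data reaches the NoSQL layer.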
  6. Basics Of In Memory DataGrid Technology
     – An In Memory Data Grid (IMDG) is a data store; "grid" just means "cluster"
     – Data can be partitioned across cluster nodes
     – Processing power near data storage
     – Distributed hash table
     – Application-optimized data model denormalization
     – Nodes are typically configured with one or more replicas (sound familiar yet?)
     – Not a "cache": a system of record, but it can be used as a cache, or both
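To make "partitioned distributed hash table with replicas" concrete, here is a small single-process sketch. Keys are hashed to a partition, each partition is placed on a primary node plus backup nodes, writes go to every copy, and reads fail over to a backup when the primary is down. The round-robin placement, the `GridSketch` class, and its method names are illustrative assumptions, not how any particular IMDG assigns partitions.

```python
import zlib
from typing import Dict, List


def partition_for(key: str, num_partitions: int) -> int:
    # Stable hash (Python's built-in str hash is randomized per process).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def placement(partition: int, nodes: List[str], replicas: int = 1) -> List[str]:
    """Primary node followed by `replicas` backups on distinct nodes (assumed round-robin)."""
    if replicas + 1 > len(nodes):
        raise ValueError("need at least replicas + 1 nodes")
    return [nodes[(partition + i) % len(nodes)] for i in range(replicas + 1)]


class GridSketch:
    """Hypothetical in-process model of a partitioned, replicated grid."""

    def __init__(self, nodes: List[str], num_partitions: int = 8, replicas: int = 1):
        self.nodes = list(nodes)
        self.num_partitions = num_partitions
        self.replicas = replicas
        self.data: Dict[str, dict] = {n: {} for n in self.nodes}  # per-node memory
        self.down = set()  # nodes currently considered failed

    def put(self, key: str, value) -> None:
        p = partition_for(key, self.num_partitions)
        for node in placement(p, self.nodes, self.replicas):
            self.data[node][key] = value  # synchronous write to primary and backups

    def get(self, key: str):
        p = partition_for(key, self.num_partitions)
        for node in placement(p, self.nodes, self.replicas):
            if node not in self.down:  # fail over to the next live copy
                return self.data[node].get(key)
        raise RuntimeError("all replicas for this partition are unavailable")
```

Marking a key's primary node as down and reading the key again returns the value from its backup, which is the "sound familiar yet?" point of the slide: the same replication idea many NoSQL stores use.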
  7. Advanced Capabilities
     – Business logic (code) co-resident with data shards
     – Scalable messaging
     – Dynamic code execution across the cluster
     – Multi-language support
     – Object-oriented
     – Document-oriented/schema-free
     – Multi-level indexing
     – SQL queries
     – Full ACID transaction support
     – Elastic scaling (automatic and manual)
     – Write-behind persistence
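The "code co-resident with data shards" and "dynamic code execution across the cluster" items can be pictured as a scatter/gather: the same task runs against every shard in parallel, and only the small partial results are combined at the caller. In the sketch below, the shards are plain local dicts and a thread pool stands in for the cluster. The `execute_on_partitions` helper is hypothetical; a real grid would ship the task to the node that owns each shard.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List, TypeVar

R = TypeVar("R")


def execute_on_partitions(
    partitions: List[dict],
    task: Callable[[dict], R],
    combine: Callable[[R, R], R],
    initial: R,
) -> R:
    # "Map" phase: run the task next to each shard's data, in parallel.
    # Here a shard is just a local dict and the pool simulates the cluster.
    with ThreadPoolExecutor() as pool:
        partial_results = list(pool.map(task, partitions))
    # "Reduce" phase: only the per-shard results travel back to the caller.
    return reduce(combine, partial_results, initial)
```

For example, summing all values across shards `[{"a": 1, "b": 2}, {"c": 3}, {}]` with `task=lambda s: sum(s.values())` and `combine=lambda x, y: x + y` yields `6`, while the raw entries never leave their shards.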
  8. Features: IMDG vs NoSQL
     Data Grid vs Disk Based NoSQL: Low Latency, Eventual/Tunable Consistency, Horizontally Scalable, Code co-location, Service remoting, Parallel Execution, Unlimited scale, Fault Tolerant, Cloud enabled, Hadoop tools, Transactional, Highly Available, Elastic, Messaging, Platform Independent, Complex Event Processing, Flexible Schema
  9. Vive La Difference
     The IMDG complements a NoSQL store:
     – Can serve as a short-term request cache (side cache or inline)
     – Can serve as a cache for map/reduce (MR) results
     – Enables event-driven architectures / CEP
     – In-memory map/reduce
     – Very fast writes, regardless of the NoSQL store's write speed
     – Transactional layer: can essentially turn "eventual" consistency into fully transactional persistence without a performance hit
     – Highly available and independently scalable
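Slide 9 mentions two ways of using the grid as a request cache: a side cache and an inline cache. The sketch below contrasts them against a stub that stands in for the NoSQL store and counts reads. With a side cache (cache-aside), the application checks the cache and populates it on a miss; with an inline (read-through) cache, the application talks only to the cache, which loads misses itself. All class and function names here are hypothetical.

```python
class NoSqlStub:
    """Stand-in for the disk-based NoSQL layer; counts reads so cache hits are visible."""

    def __init__(self, data: dict):
        self.data = dict(data)
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.data.get(key)


def side_cache_get(cache: dict, store: NoSqlStub, key):
    """Side cache (cache-aside): the application consults the cache and fills it on a miss."""
    if key in cache:
        return cache[key]
    value = store.read(key)
    if value is not None:
        cache[key] = value  # the application, not the cache, decides to populate
    return value


class InlineCache:
    """Inline (read-through) cache: callers only see the cache, which loads misses itself."""

    def __init__(self, store: NoSqlStub):
        self.store = store
        self.entries = {}

    def get(self, key):
        if key not in self.entries:
            # Absent keys are cached as None as well, so repeated misses stay in memory.
            self.entries[key] = self.store.read(key)
        return self.entries[key]
```

In both styles, a second read of the same key is served from memory and the store's `reads` counter stays at 1. The difference is where the loading logic lives: in application code (side cache) or inside the cache layer (inline).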
  10. A Complete Scalable Application Platform
      [Diagram labels: Raw Event Stream (×3), Real Time Events, In Memory Compute Cluster, Raw And Derived Events, NoSQL Cluster, Reporting Engine, SCALE]
  11. Key Implementation Issues
      – The grid must support reliable asynchronous persistence:
        – If not reliable: in-flight data is at risk. Ideally, reliability is tunable to accommodate differing risk tolerance.
        – If not asynchronous: too slow.
        – If not persistent: obviously nothing gets sent to disk.
      – To do more than a distributed cache, the grid must support code and data partitioning:
        – Ideally, code is collocated in memory with its data partition.
        – This is needed to support CEP, application, and service remoting capabilities.
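The asynchronous persistence requirement of slide 11 is essentially write-behind. Here is a minimal sketch, assuming a plain dict as the backing store and a single background thread as the writer. A `put` returns as soon as the in-memory copy is updated, and queued writes are flushed to the store in batches. Note that this sketch deliberately shows the "not reliable" case from the slide: anything still in `pending` is lost if the process dies. A reliable grid would also need to protect that queue (for example by replicating it), which is not modeled here. `WriteBehindGrid` and its methods are hypothetical names.

```python
import queue
import threading

_STOP = object()  # sentinel telling the background writer to finish


class WriteBehindGrid:
    """Hypothetical write-behind layer: memory first, backing store later, in batches.

    NOT reliable: writes still queued in `pending` are lost if the process dies.
    """

    def __init__(self, store: dict, batch_size: int = 100):
        self.memory = {}              # in-memory copy that serves reads
        self.store = store            # stand-in for the NoSQL layer
        self.batch_size = batch_size
        self.pending = queue.Queue()  # FIFO of writes not yet persisted
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def put(self, key, value) -> None:
        self.memory[key] = value        # caller returns at memory speed...
        self.pending.put((key, value))  # ...persistence happens asynchronously

    def get(self, key):
        return self.memory.get(key)

    def _drain(self) -> None:
        while True:
            batch = [self.pending.get()]  # block until at least one write is queued
            while len(batch) < self.batch_size:
                try:
                    batch.append(self.pending.get_nowait())
                except queue.Empty:
                    break
            for item in batch:
                if item is _STOP:
                    return
                key, value = item
                self.store[key] = value  # a real backend would take one bulk write here

    def close(self) -> None:
        """Flush everything queued so far, then stop the writer thread."""
        self.pending.put(_STOP)
        self.worker.join()
```

Because the queue is FIFO and the stop sentinel is enqueued last, `close()` guarantees that every earlier `put` has reached the store before it returns.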
  12. Key Implementation Issues
      – The grid ideally supports FIFO entry ordering:
        – Key to using the grid as a queue
        – Key to scaling messaging without an additional tier
        – Combined with co-located business logic, it operates at memory speeds
      – Write speed on the NoSQL layer:
        – The grid is, in effect, queuing entries to the NoSQL layer
        – If the NoSQL layer cannot keep up, the in-memory grid backs up
        – This behavior is an asset (the grid absorbs bursts), unless an unanticipated, sustained flood occurs
        – The faster the write speed, the better
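The "grid backs up" behavior of slide 12 can be simulated with a bounded FIFO. In this simulation, events arrive at a fixed rate per tick, and the NoSQL layer drains a fixed number per tick. When arrivals outpace writes for long enough, the buffer fills and further offers are refused, which is the sustained-flood case. The capacity, the rates, and the choice to reject rather than block the producer are all assumptions made for illustration.

```python
from collections import deque
from typing import List, Tuple


class BoundedFifo:
    """Hypothetical bounded FIFO standing in for entries queued toward the NoSQL layer."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = deque()

    def offer(self, item) -> bool:
        if len(self.items) >= self.capacity:
            return False  # backpressure: the in-memory buffer has backed up
        self.items.append(item)
        return True

    def drain(self, max_items: int) -> list:
        # Entries leave in arrival order (FIFO), at most `max_items` per call.
        n = min(max_items, len(self.items))
        return [self.items.popleft() for _ in range(n)]


def simulate(ticks: int, arrivals_per_tick: int, writes_per_tick: int,
             capacity: int) -> Tuple[List[int], int, int]:
    """Return (entries written in order, number rejected, entries still queued)."""
    fifo = BoundedFifo(capacity)
    written: List[int] = []
    rejected = 0
    seq = 0
    for _ in range(ticks):
        for _ in range(arrivals_per_tick):
            if not fifo.offer(seq):
                rejected += 1
            seq += 1
        written.extend(fifo.drain(writes_per_tick))  # NoSQL write speed per tick
    return written, rejected, len(fifo.items)
```

With 10 arrivals and 6 writes per tick and a capacity of 20, the buffer absorbs the excess for the first three ticks and starts refusing entries in the fourth. With only 5 arrivals per tick, nothing is ever refused. This illustrates why the slide says the faster the NoSQL write speed, the better.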
  13. Use Case 1 – Event Cloud
      Complex event processing:
      – Collect events in real time: interactions, orders, bills, payments, activations, app usage, …
      – Transform them into decision factors: good customer, pays 3-6 days early, decreasing usage, missed payment, unusual bill, …
      – Original events, possibly scrubbed or annotated, are passed through.
      – Business-logic-derived "synthetic events" are constructed from the raw event stream. Possible rule engine integration (e.g. Drools).
      – Derived events and analytics are passed on to the NoSQL layer.
      – Other events are forwarded to external listeners and systems.
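As a rough sketch of how two of the slide's decision factors might be derived from raw events, the code below turns bill and payment events into "pays early" (paid 3 to 6 days before the due date) and "missed payment" (still unpaid after the due date) synthetic events. The event classes, the integer day numbers, and the hand-written rule are simplifying assumptions. A real deployment might express such rules in a rule engine like Drools instead, which this sketch does not use.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Union


@dataclass(frozen=True)
class Bill:
    account: str
    bill_id: str
    due_day: int  # simplified: day number instead of a real date


@dataclass(frozen=True)
class Payment:
    account: str
    bill_id: str
    paid_day: int


@dataclass(frozen=True)
class Derived:
    """Hypothetical synthetic event; kind is 'pays_early' or 'missed_payment'."""
    account: str
    kind: str
    bill_id: str


def derive(events: Iterable[Union[Bill, Payment]], today: int) -> List[Derived]:
    open_bills: Dict[str, Bill] = {}  # in-memory CEP state: bills awaiting payment
    derived: List[Derived] = []
    for ev in events:
        if isinstance(ev, Bill):
            open_bills[ev.bill_id] = ev
        elif isinstance(ev, Payment):
            bill = open_bills.pop(ev.bill_id, None)
            # Rule from the slide: paying 3-6 days early marks a good customer.
            if bill is not None and 3 <= bill.due_day - ev.paid_day <= 6:
                derived.append(Derived(ev.account, "pays_early", ev.bill_id))
    # Any bill still open past its due date becomes a missed-payment event.
    for bill in open_bills.values():
        if bill.due_day < today:
            derived.append(Derived(bill.account, "missed_payment", bill.bill_id))
    return derived
```

Only the handful of `Derived` events (plus any pass-through originals) would then be handed to the NoSQL layer or to external listeners.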
  14. Use Case 2 – Time Bounded
      – Suited to operations with a daily business cycle (e.g. trading).
      – The current day (or another time period that fits in memory) is held in memory, along with related application state, caching, etc.
      – Operations are still streamed to the underlying NoSQL platform, or held for an end-of-day flush if the back end can't write fast enough.
      – Supports application hosting, messaging, and complex event processing.
      – External clients are aware of the "current day" store vs. the archive.
      – Large-scale reports/analytics run in the background on the NoSQL archive.
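The following sketch covers the "hold for end-of-day flush" variant of the time-bounded use case. The current day's state lives in memory, `end_of_day` copies it into an archive keyed by `(day, key)`, and reads are routed to memory for the current day or to the archive for earlier days. A dict stands in for the NoSQL archive, and the class and method names are hypothetical.

```python
from datetime import date
from typing import Dict, Optional, Tuple


class TimeBoundedStore:
    """Hypothetical current-day in-memory store with an end-of-day flush to an archive."""

    def __init__(self, archive: Dict[Tuple[date, str], object], current_day: date):
        self.archive = archive          # stand-in for the NoSQL archive
        self.current_day = current_day
        self.today: Dict[str, object] = {}  # current-day state held in memory

    def put(self, key: str, value) -> None:
        self.today[key] = value

    def get(self, key: str, day: Optional[date] = None):
        # Clients choose between the "current day" store and the archive.
        if day is None or day == self.current_day:
            return self.today.get(key)
        return self.archive.get((day, key))

    def end_of_day(self, next_day: date) -> None:
        # Flush the whole day to the archive, then start the next day empty.
        for key, value in self.today.items():
            self.archive[(self.current_day, key)] = value
        self.today = {}
        self.current_day = next_day
```

The streaming alternative from the slide would combine this routing with a write-behind queue, so that the archive is updated continuously rather than once per day.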
  15. Use Case 3 – LRU
      – The grid holds a subset of the NoSQL store and supports an LRU caching model, either inline or as a side cache.
      – Appropriate only where, as with any cache, the usage pattern does not generate many cache misses.
      – Still supports CEP, messaging, and computation scaling (provided the grid product supports them).
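The LRU model of use case 3 can be sketched with `collections.OrderedDict`. The grid keeps at most `capacity` entries, moves an entry to the "most recent" end on every hit, and evicts the least recently used entry when a miss loaded from the NoSQL store overflows the capacity. The `loader` callable stands in for a NoSQL read, and counting misses makes the slide's caveat measurable. The class name is hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class LruGrid:
    """Hypothetical inline LRU cache holding a subset of a NoSQL store."""

    def __init__(self, capacity: int, loader: Callable[[str], object]):
        self.capacity = capacity
        self.loader = loader          # reads from the NoSQL store on a miss
        self.entries = OrderedDict()  # iteration order: least -> most recently used
        self.misses = 0

    def get(self, key: str):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.loader(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value
```

With a capacity of 2, the access sequence `a, b, a, c` evicts `b` (not `a`), and a later read of `b` is another miss. If the working set regularly exceeds the capacity, most reads become misses that fall through to the NoSQL layer, which is exactly when the slide says this model is inappropriate.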
  16. Wishlist
      – This platform concept is still at an early stage.
      – For Gigaspaces, integrations already exist for Cassandra and MongoDB, and customers are currently implementing solutions.
      – Stuff I'd like to see:
        – Unified management and scaling; shared infrastructure.
        – A grid/NoSQL-aware Hive façade that can run MR jobs on both, and perhaps integration with other Hadoop tools.
        – Deeper integration: to further optimize write speed/capacity, and perhaps to offload some in-memory aspects of the underlying NoSQL platform, minimizing duplication and possibly improving elasticity.
  17. Conclusion
      – Two shared-nothing "NoSQL" architectures complementing each other
      – Fully elastic/scalable
      – Ultra high performance/low latency combined with unlimited scale
      – Full application stack
      – Highly reliable and self-healing
      – Scalable complex event handling
      – Multi-language
      – Simple: two products