100K times faster apps.
In Memory Grids
Prateek Jain
Agenda
• In Memory Grids
– 10,000 foot view.
– Present scenario
– Why
– Why now
• Use Cases
• Types of In Memory Grids
– Compute Grid
– Data Grid
• Reference Architecture
• Sample application demo
• Further Resources
• Questions & Feedback
In Memory Grids
• 10,000 foot view
Break a problem into parts and solve them using multiple machines on the network.
Keep data in main memory instead of doing file I/O against disk.
BigData landscape
Traditional App / Associated Challenges
• RDBMS -- Used to run many analytics systems
– Performance (not real time), scaling, cost++
• CEP -- Designed to correlate data in real time
– Scaling (often necessary to aggregate events into a centralized source), not designed for historical data.
• Hadoop -- Designed for batch analytics and complex correlation
– Not designed for real time.
• NoSQL -- Designed to handle large data volumes at low cost
– Processing capability: sheer amount of data can be challenging.
• IMDG -- Fast for storing and processing data
– Storing vast amounts of information in-memory doesn’t scale, in terms of both system scaling and cost
Different problems call for different solutions.
Why ?
• Speed matters
– Citi : 100ms == $1 M
– Google : 500ms == 20% traffic drop
• Disk is up to 10^7 times slower than RAM.
In Memory Grids
• Why now?
– Hardware: ability++ and cost--
• 1TB RAM & 48 core cluster (can hold a full week of tweets) ~ $40K
[Charts: "Data Growth, PB" (data is growing exponentially; BigData tech. planned) and "DRAM Cost, $" (30% drop each 12-18 months)]
Use Cases
• Trading Systems
– Handle large volume of transactions
• Real time risk analytics
– Analysis of trading positions and risk
• Online gaming
– Online real-time backbone for gaming
• Geo Mapping
– Real-time geographical route and traffic information
• Bio Informatics
– Real-time DNA sequencing and matching
In Memory Compute Grid
(IMCG)
In Memory Grids
1. In Memory -- Compute Grid.
Compute Grids allow you to take a computation, optionally split it into multiple parts, and
execute them on different grid nodes in parallel.
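The split / execute / reduce idea above can be sketched in a few lines. This is only an illustration under stated assumptions: worker threads stand in for grid nodes, and the names `split`, `node_task` and `run_on_grid` are hypothetical, not part of any grid product.

```python
from concurrent.futures import ThreadPoolExecutor


def split(numbers, parts):
    """Split the job into roughly equal chunks, one per (simulated) node."""
    size = max(1, -(-len(numbers) // parts))  # ceiling division, at least 1
    return [numbers[i:i + size] for i in range(0, len(numbers), size)]


def node_task(chunk):
    """Work one grid node would run on its part: a partial sum of squares."""
    return sum(x * x for x in chunk)


def run_on_grid(numbers, nodes=4):
    """Map chunks onto worker threads (standing in for nodes), then reduce."""
    chunks = split(numbers, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(node_task, chunks))
    return sum(partials)  # the reduce step


total = run_on_grid(list(range(1, 101)))
```

A real compute grid would ship `node_task` to remote machines; the split and reduce steps keep the same shape.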
Functionality
• Distributed Execution Models - map-reduce, Streaming
Processing & CEP, MPP, MPI style
• Distributed Execution Management Services – task
distribution, failover, load balancing, collision resolution,
job stealing, redundant mapping support, task scheduling,
asynchronous reduction, task checkpoints
• Distributed Deployment & Provisioning.
• Distributed Resources Management - Automatic discovery
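Two of the management services listed above, task distribution and failover, can be illustrated with a toy scheduler. This is a sketch, assuming round-robin distribution and in-process objects in place of real nodes; `Node`, `NodeDown` and `run_with_failover` are hypothetical names.

```python
class NodeDown(Exception):
    """Raised by a simulated node that cannot run a task."""


class Node:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.completed = 0  # number of tasks this node has finished

    def execute(self, task, arg):
        if not self.healthy:
            raise NodeDown(self.name)
        self.completed += 1
        return task(arg)


def run_with_failover(nodes, task, args):
    """Distribute tasks round-robin; on failure, retry on the next node."""
    results = []
    for i, arg in enumerate(args):
        for attempt in range(len(nodes)):
            node = nodes[(i + attempt) % len(nodes)]
            try:
                results.append(node.execute(task, arg))
                break
            except NodeDown:
                continue  # failover: try the next node in the ring
        else:
            raise RuntimeError("no healthy node available for %r" % (arg,))
    return results


nodes = [Node("a"), Node("b", healthy=False), Node("c")]
results = run_with_failover(nodes, lambda x: x * 2, [1, 2, 3, 4, 5, 6])
```

Here node "b" is down, so its tasks fail over to "c". A production grid would add load-aware balancing and job stealing on top of this.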
In Memory Data Grid
(IMDG)
In Memory Grids
2. In Memory -- Data Grid. (aka, Distributed data caching )
Gives applications the ability to keep data in memory, with high availability, rather than
constantly fetching it from slower storage elsewhere, such as an RDBMS or shared file systems.
IMDG ?
• Several JVMs sharing in-memory partitioned data.
• Provides extremely low latency access to,
and high availability of, application data by keeping it in
memory, in a highly parallelized way.
• Supports most of the Big Data processing requirements.
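How partitioned in-memory data with high availability might work can be sketched as follows. The assumptions are loud ones: plain dicts model each node's memory, each key gets one primary and one backup owner, and `PartitionedMap` is a hypothetical class, not any product's API.

```python
import zlib


class PartitionedMap:
    """Toy data grid: a key hashes to a partition; each partition has a
    primary owner node and one backup node."""

    def __init__(self, node_names, partitions=8):
        self.node_names = list(node_names)
        self.partitions = partitions
        # Each node's memory is modelled as a plain dict
        self.stores = {name: {} for name in self.node_names}

    def _partition(self, key):
        # crc32 is stable across runs, unlike the built-in hash() for str
        return zlib.crc32(str(key).encode("utf-8")) % self.partitions

    def owners(self, key):
        """Return (primary, backup) node names for a key."""
        p = self._partition(key)
        n = len(self.node_names)
        return self.node_names[p % n], self.node_names[(p + 1) % n]

    def put(self, key, value):
        # Write to the primary and its backup
        for name in self.owners(key):
            self.stores[name][key] = value

    def get(self, key, failed=()):
        """Read from the primary, falling back to the backup if it is down."""
        for name in self.owners(key):
            if name not in failed:
                return self.stores[name].get(key)
        raise KeyError("no live replica for %r" % (key,))


grid = PartitionedMap(["n1", "n2", "n3"])
grid.put("trade:42", {"qty": 100})
primary, backup = grid.owners("trade:42")
value_after_failure = grid.get("trade:42", failed=(primary,))
```

Losing the primary node still leaves the value readable from the backup, which is the availability property the slide describes.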
Common Features
• Distributed maps
• Caching, evictions
• Code execution (executor service, map-reduce)
• Listeners
• Queries (SQL like)
• Pluggable indexing
• Hibernate L2 cache (optional)
• ACID Transactions
• MapStore (write-behind, write-through, read-through)
• Optimized Serialization
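The eviction and MapStore items above (read-through, write-through, write-behind) can be illustrated with a toy cache in front of a slower store. This is a minimal sketch, assuming a dict as the backing store and LRU as the eviction policy; `CachedMap` is a hypothetical class and not the MapStore interface of any particular product.

```python
from collections import OrderedDict


class CachedMap:
    """Toy cache in front of a slower backing store (a dict stands in)."""

    def __init__(self, backing_store, max_size=2, write_behind=False):
        self.store = backing_store
        self.max_size = max_size
        self.write_behind = write_behind
        self.cache = OrderedDict()  # order doubles as LRU order
        self.dirty = {}             # writes not yet flushed (write-behind)

    def _touch(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.max_size:
            self.cache.popitem(last=False)  # evict least recently used

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        # Read-through: a miss checks pending writes, then the store
        if key in self.dirty:
            value = self.dirty[key]
        else:
            value = self.store[key]  # raises KeyError if absent
        self._touch(key, value)
        return value

    def put(self, key, value):
        self._touch(key, value)
        if self.write_behind:
            self.dirty[key] = value  # deferred until flush()
        else:
            self.store[key] = value  # write-through: synchronous

    def flush(self):
        """Write-behind: push all pending writes to the store."""
        self.store.update(self.dirty)
        self.dirty.clear()


db = {"a": 1}
m = CachedMap(db, max_size=2)
m.get("a")      # read-through loads "a" into the cache
m.put("b", 2)   # write-through updates db immediately
m.put("c", 3)   # cache is full, so "a" is evicted
```

Write-through keeps the store consistent at the cost of write latency; write-behind trades durability of recent writes for speed.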
Common Features
• The same object your business logic is using can be kept in the data grid.
• No extra step of marshaling and un-marshaling.
• Embeddable (optional)
Reference Architecture
IMDG is not a
• NoSQL database
• In Memory Database (IMDB)
• How?
• Support for true distributed ACID transactions with highly optimized 2PC protocol implementation.
• Scalable Data Partitioning across a cluster including both partitioned and fully replicated scenarios
• Ability to work directly with application domain objects rather than with primitive types or “documents”
• Tight integration with In-Memory Compute Grid (IMCG)
• Pluggable segmentation (a.k.a. "split brain" problem) resolution
• Pluggable expiration policies
• Pluggable indexing support
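The 2PC (two-phase commit) protocol mentioned in the list above follows a prepare-then-commit pattern. The sketch below shows only the textbook shape, assuming in-process participants and no timeouts or crash recovery; it is not the optimized implementation of any product, and `Participant` and `two_phase_commit` are hypothetical names.

```python
class Participant:
    """One grid node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.data = {}      # committed state
        self.staged = None  # changes staged during phase 1

    def prepare(self, changes):
        """Phase 1: stage the changes and vote yes or no."""
        if not self.can_commit:
            return False
        self.staged = dict(changes)
        return True

    def commit(self):
        """Phase 2 (all voted yes): make staged changes visible."""
        self.data.update(self.staged or {})
        self.staged = None

    def rollback(self):
        """Phase 2 (any voted no): discard staged changes."""
        self.staged = None


def two_phase_commit(participants, changes):
    """Return True if every participant committed, False if all rolled back."""
    votes = [p.prepare(changes) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


good = [Participant("n1"), Participant("n2")]
committed = two_phase_commit(good, {"balance": 90})

mixed = [Participant("n1"), Participant("n2", can_commit=False)]
aborted = two_phase_commit(mixed, {"balance": 90})
```

Either every node applies the change or none does, which is the atomicity part of ACID across nodes.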
Further Reading
• http://www.ventanaresearch.com/uploadedFiles/Content/Landing_Pages/Ventana_Research_Big_Data_Benchmark_Research_Presentation.pdf
• http://wikibon.org/wiki/v/Data_in_DRAM_is_a_Flash_in_the_Pan#Data_in_Memory_Solutions_for_Real-Time_High-Performance_Transaction_Analytics
• http://www.gridgain.com/book/book.html
• http://java.dzone.com/articles/compute-grids-vs-data-grids
• http://www.infoq.com/articles/in-memory-data-grids
• http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach-to-facebooks-new-realtime-analytics-system.html
• https://del.sapient.resultspace.com/scm/gmtechip/POCs/gridgain_risk_analytics
