100K times faster apps.
In Memory Grids
Prateek Jain

Agenda
• In Memory Grids
  – 10,000 foot view
  – Present scenario
  – Why
  – Why now
• Use Cases
• Types of In Memory Grids
  – Compute Grid
  – Data Grid
• Reference Architecture
• Sample application demo
• Further Resources
• Questions & Feedback

In Memory Grids
• 10,000 foot view
  – Breaking your problem into parts and solving it using multiple resources on the network.
  – Using main memory instead of disk for I/O.

BigData landscape
Traditional App -- Associated Challenges
• RDBMS -- Used to run many analytics systems
  – Challenges: Performance (not real time), scaling, cost++
• CEP -- Designed to correlate data in real time
  – Challenges: Scaling (often necessary to aggregate events into a centralized source); not designed for historical data.
• Hadoop -- Designed for batch analytics and complex correlation
  – Challenges: Not designed for real time.
• NoSQL -- Designed to handle large data volumes at low cost
  – Challenges: Processing capability; the sheer amount of data can be challenging.
• IMDG -- Fast for storing and processing data
  – Challenges: Storing vast amounts of information in memory doesn't scale, in terms of both system scaling and cost.
Different problems call for different solutions.

Why?
• Speed matters
  – Citi: 100 ms == $1M
  – Google: 500 ms == 20% traffic drop
• Disk up to 10^7 times slower than RAM.

In Memory Grids
• Why now?
  – Hardware: ability++ and cost--
  – 1 TB RAM & 48 core cluster (can hold a full week of tweets) ~ $40K
[Figure: Data Growth (PB) and DRAM Cost ($), with a "BigData tech. planned" marker. Data is growing exponentially; DRAM cost drops 30% each 12-18 months.]

Use Cases
• Trading Systems
  – Handle large volume of transactions
• Real time risk analytics
  – Analysis of trading positions and risk
• Online gaming
  – Online real-time backbone for gaming
• Geo Mapping
  – Real-time geographical route and traffic information
• Bio Informatics
  – Real-time DNA sequencing and matching

In Memory Compute Grid (IMCG)

In Memory Grids
1. In Memory -- Compute Grid.
Compute Grids allow you to take a computation, optionally split it into multiple parts, and execute them on different grid nodes in parallel.

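The split / execute / combine idea can be sketched in a few lines. This is only an illustration under stated assumptions: threads in a local pool stand in for grid nodes, and the names `split`, `sum_of_squares` and `run_on_grid` are hypothetical, not part of any grid product's API.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Split a list into at most `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are fewer items than parts
            chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The per-node job: it only needs its own chunk, so it can run anywhere."""
    return sum(x * x for x in chunk)


def run_on_grid(items, nodes=4):
    """Split the computation, run the parts in parallel, then combine the results."""
    chunks = split(items, nodes)
    # Assumption: worker threads play the role of grid nodes in this sketch.
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


total = run_on_grid(list(range(10)), nodes=3)
```

A real compute grid would ship each job to another machine and handle node failure; the sketch only shows the shape of the computation.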
In Memory Grids
2. In Memory -- Data Grid (aka distributed data caching).
Provides applications with the ability to keep data in memory for high availability rather than constantly fetching it from slower storage elsewhere, like an RDBMS or shared file systems.

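The caching side of this can be sketched as a read-through cache: the first read goes to the slow store, later reads are served from memory. This is a minimal single-process sketch; `ReadThroughCache` and `slow_lookup` are hypothetical names, and `slow_lookup` merely stands in for an RDBMS or file system read.

```python
class ReadThroughCache:
    """Keeps values in memory and calls the loader only on a cache miss."""

    def __init__(self, loader):
        self._loader = loader
        self._data = {}
        self.loads = 0  # how many times the slow store was actually hit

    def get(self, key):
        if key not in self._data:
            self._data[key] = self._loader(key)
            self.loads += 1
        return self._data[key]


def slow_lookup(key):
    # Assumption: placeholder for a query against slower storage.
    return f"row-for-{key}"


cache = ReadThroughCache(slow_lookup)
first = cache.get("trade-42")   # miss: loaded from the slow store
second = cache.get("trade-42")  # hit: served from memory
```

After both calls `cache.loads` is 1, which is the point: repeated reads no longer touch the slower store.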
IMDG?
• Several JVMs sharing in-memory partitioned data.
• Provides extremely low latency access to, and high availability of, application data by keeping it in memory, and does so in a highly parallelized way.
• Supports most Big Data processing requirements.

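One way to picture "several JVMs sharing partitioned data" is hash partitioning with a backup copy. The sketch below is an assumption-laden toy: plain dictionaries stand in for JVM heaps, `PartitionedGrid` is a hypothetical class, and the modulo placement is the simplest possible scheme rather than what any particular product uses.

```python
import zlib


class PartitionedGrid:
    """Spreads keys over nodes by hash and keeps backup copies for availability."""

    def __init__(self, node_names, backups=1):
        self.names = list(node_names)
        self.nodes = {name: {} for name in self.names}  # one dict per simulated JVM
        self.backups = backups

    def owners(self, key):
        """Return the primary node for a key followed by its backup nodes."""
        # crc32 gives the same result on every run, unlike the built-in hash() for str.
        primary = zlib.crc32(key.encode("utf-8")) % len(self.names)
        count = min(self.backups + 1, len(self.names))
        return [self.names[(primary + i) % len(self.names)] for i in range(count)]

    def put(self, key, value):
        for name in self.owners(key):
            self.nodes[name][key] = value

    def get(self, key, down=()):
        """Read from the first owner that is not in the `down` set."""
        for name in self.owners(key):
            if name not in down:
                return self.nodes[name].get(key)
        raise LookupError(f"all owners of {key!r} are down")


grid = PartitionedGrid(["jvm-1", "jvm-2", "jvm-3"])
grid.put("position:ACME", 100)
primary = grid.owners("position:ACME")[0]
value_after_failure = grid.get("position:ACME", down={primary})
```

Each key lives on two of the three nodes, so the read still succeeds when the primary is treated as down.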
Common Features
• The same object your business logic is using can be kept in the data grid.
• No extra step of marshaling and un-marshaling.
• Embeddable (optional)

IMDG is not a
• NoSQL database
• In Memory Database (IMDB)
• How?
  – Support for true distributed ACID transactions with a highly optimized 2PC protocol implementation.
  – Scalable data partitioning across a cluster, covering both partitioned and fully replicated scenarios.
  – Ability to work directly with application domain objects rather than with primitive types or "documents".
  – Tight integration with In-Memory Compute Grid (IMCG).
  – Pluggable segmentation (a.k.a. "split brain" problem) resolution.
  – Pluggable expiration policies.
  – Pluggable indexing support.

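For readers unfamiliar with 2PC, the textbook form of the protocol can be sketched as follows. This is not the optimized implementation the slide refers to: `Participant` and `two_phase_commit` are hypothetical names, and timeouts, logging and crash recovery are left out.

```python
class Participant:
    """One node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates a node that votes "no"
        self.data = {}
        self._staged = None

    def prepare(self, updates):
        """Phase 1: stage the updates and vote yes or no."""
        if not self.can_commit:
            return False
        self._staged = dict(updates)
        return True

    def commit(self):
        """Phase 2 (success): make the staged updates visible."""
        if self._staged is not None:
            self.data.update(self._staged)
            self._staged = None

    def rollback(self):
        """Phase 2 (failure): discard the staged updates."""
        self._staged = None


def two_phase_commit(participants, updates):
    """Commit on every participant, or on none of them."""
    prepared = []
    for p in participants:
        if p.prepare(updates):
            prepared.append(p)
        else:
            for q in prepared:  # one "no" vote aborts the whole transaction
                q.rollback()
            return False
    for p in participants:
        p.commit()
    return True


a, b = Participant("a"), Participant("b")
ok = two_phase_commit([a, b], {"x": 1})

c = Participant("c", can_commit=False)
aborted = two_phase_commit([a, c], {"x": 2})
```

In the second call `c` votes no, so `a` discards its staged update and keeps `{"x": 1}`.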
Further Reading
• http://www.ventanaresearch.com/uploadedFiles/Content/Landing_Pages/Ventana_Research_Big_Data_Benchmark_Research_Presentation.pdf
• http://wikibon.org/wiki/v/Data_in_DRAM_is_a_Flash_in_the_Pan#Data_in_Memory_Solutions_for_Real-Time_High-Performance_Transaction_Analytics
• http://www.gridgain.com/book/book.html
• http://java.dzone.com/articles/compute-grids-vs-data-grids
• http://www.infoq.com/articles/in-memory-data-grids
• http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach-to-facebooks-new-realtime-analytics-system.html
• https://del.sapient.resultspace.com/scm/gmtechip/POCs/gridgain_risk_analytics