In Memory Data Grids, Demystified!


Uri Cohen

The principles and foundations of in memory data grids


Uri Cohen
Head of Product @ GigaSpaces
@uri1803
github.com/uric
In-Memory Data Grids,
Demystified

Agenda
• Why IMDG?
• Brief History
• How It Works
– Data model & placement
– HA and fault tolerance
– Consistency
– Internals

Why
IMDG?
Today, more than
ever, there are many
choices when it
comes to storing your
data

® Copyright 2011 Gigaspaces Ltd. All Rights
Reserved
But There Are
Many
Solutions

Just A Few Years Back

Memory will
always be faster
than disk
(usually by
orders of
magnitude)

67%
The share of IT managers who think that real-time analysis
is the biggest challenge for big data implementations

40%
• Plan to use in-memory technologies for big data projects
• Only 32% mentioned Hadoop

Hell, Even Gartner Thinks So
“In memory computing (IMC) … provides
transformational opportunities. The execution of
certain types of hours-long batch processes can be
squeezed into minutes or even seconds …
Millions of events can be scanned in a matter of a few
tens of milliseconds to detect correlations and patterns
pointing at emerging opportunities and threats ‘as
things happen.’”

And
nowadays
HW and SW
just make it
a whole lot
cheaper

Fast,
Transactional
Data Access
• Inventory
management
• Financial
reference data
• Real time
transactional data

Real Time
Stream
Processing
• Fraud Detection
• Click Stream
Analysis
• Real time
analytics
• Continuous
calculation

Heavyweight
Offline
Calculations
• Trade
Reconciliation
• Pattern analysis
and detection
• Number crunching

Caching
• Database
offloading
• Content heavy
websites

First There Were Local Caches
Cache: in-process caching of a Key->Value data structure
• Good for repetitive-data reads
• Limited in capacity
• Doesn’t handle write-heavy scenarios
• Reads are only part of the latency path

Then Came Distributed Caches
Distributed Cache: partitioned cache nodes
• Increased capacity
• Still no support for write-heavy scenarios
• Limited to ID-based reads
• Reads are only part of the latency path

In Memory Data Grids
IMDG: partitioned system of record
• Increased capacity
• Write scalability
• Can serve as system of record with querying & transaction semantics
• Still limited in capacity
• Latency can come from other parts of your app

[Timeline shown on each of these slides: Cache, Distributed Cache, IMDG, IMDG.next()]
Data Placement – Fixed Hashing
hash(key) % #nodes
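
The formula above can be sketched in a few lines of Python. This is only an illustration, not the placement code of any particular product: the function names `stable_hash` and `node_for` are made up for this example, and SHA-256 is an assumed choice of hash (Python's built-in `hash()` for strings changes between processes, so it is not suitable for placement).

```python
import hashlib


def stable_hash(key: str) -> int:
    """Hash a key to a non-negative integer that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def node_for(key: str, num_nodes: int) -> int:
    """Fixed hashing: the owning node index is hash(key) % #nodes."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return stable_hash(key) % num_nodes


# Every client computes the same owner for the same key, with no lookup table.
owner = node_for("customer:42", num_nodes=4)
```

The appeal of this scheme is that routing needs no shared state beyond the node count.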

Fixed Hashing - HA
hash(key) % #nodes
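
The slide text only repeats the placement formula, so the following is a guess at one simple way to add a backup copy, not a description of what the slide shows: the primary is chosen by `hash(key) % #nodes` and the backup is assumed to live on the next node index. The names `stable_hash` and `primary_and_backup`, and the "next node" rule itself, are assumptions made for this sketch.

```python
import hashlib
from typing import Tuple


def stable_hash(key: str) -> int:
    """Hash a key to a non-negative integer that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def primary_and_backup(key: str, num_nodes: int) -> Tuple[int, int]:
    """Return (primary, backup) node indexes for a key.

    Assumption: the backup sits on the node after the primary, wrapping
    around, so the two copies never share a node.
    """
    if num_nodes < 2:
        raise ValueError("need at least two nodes to hold a backup")
    primary = stable_hash(key) % num_nodes
    backup = (primary + 1) % num_nodes
    return primary, backup


primary, backup = primary_and_backup("customer:42", num_nodes=4)
```

If the primary's node fails, the backup copy is on a different node and can take over for that key.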

Fixed Hashing – Scaling
Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
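
The scaling problem with fixed hashing can be seen by counting how many keys change owner when one node is added. The sketch below is an illustration under assumed names (`stable_hash`, `node_for`, `fraction_moved`) and an assumed hash (SHA-256). With a uniform hash, a key keeps its owner when going from 4 to 5 nodes only if `h % 4 == h % 5`, which holds for about one key in five, so most of the data has to move.

```python
import hashlib
from typing import Iterable


def stable_hash(key: str) -> int:
    """Hash a key to a non-negative integer that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def node_for(key: str, num_nodes: int) -> int:
    """Fixed hashing: hash(key) % #nodes."""
    return stable_hash(key) % num_nodes


def fraction_moved(keys: Iterable[str], old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys whose owner changes when the node count changes."""
    keys = list(keys)
    moved = sum(1 for k in keys if node_for(k, old_nodes) != node_for(k, new_nodes))
    return moved / len(keys)


sample_keys = [f"key-{i}" for i in range(10_000)]
fraction = fraction_moved(sample_keys, old_nodes=4, new_nodes=5)
print(f"{fraction:.0%} of keys change owner when growing from 4 to 5 nodes")
```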

Data Placement – Consistent Hashing
Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
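
A minimal consistent-hashing ring, as a sketch only: nodes and keys are hashed onto the same ring, and a key belongs to the first node point found clockwise from the key's position. The class name `HashRing`, the use of 100 virtual points per node, and SHA-256 as the hash are assumptions for this example, not details taken from the slides or from any specific data grid.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def stable_hash(value: str) -> int:
    """Hash a string to a stable 64-bit position on the ring."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._points: List[int] = []      # sorted positions on the ring
        self._owner: Dict[int, str] = {}  # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place several virtual points for the node to even out the load."""
        for i in range(self._vnodes):
            point = stable_hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        """Return the node owning the first point clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, stable_hash(key))
        return self._owner[self._points[idx % len(self._points)]]


sample_keys = [f"key-{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {k: ring.node_for(k) for k in sample_keys}
ring.add_node("node-e")
after = {k: ring.node_for(k) for k in sample_keys}
moved = [k for k in sample_keys if before[k] != after[k]]
print(f"{len(moved) / len(sample_keys):.0%} of keys moved, all to the new node")
```

Unlike fixed hashing, adding a node only takes keys from its ring neighbours: every key that moves goes to the new node, and the rest stay where they were.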


Data
Consistency
Since we’re dealing
with distributed
data, consistency
cannot be taken for
granted
• Read after write
• Read after read
• Write-write consistency

Some More Concerns
• Transactions
• Querying
• Failure detection
• Leader election
• Persistency
• Interoperability

IMDG.next()
Using IMDG for
messaging, business logic (BL)


In Memory Data Grids, Demystified!

1. Uri Cohen Head of Product @ GigaSpaces @uri1803 github.com/uric In-Memory Data Grids, Demystified

2. Agenda • Why IMDG? • Brief History • How It Works – Data model & placement – HA and fault tolerance – Consistency – Internals

3. Why IMDG? Today, more than ever, there are many choices when it comes to storing your data

6. So Why Indeed??

7. The Need for Speed, In Real Time…

8. Some Facts

9. Memory will always be faster than disk (usually by orders of magnitude)

10. Recent Survey

11. 67%: the share of IT managers who think that real-time analysis is the biggest challenge for big data implementations

12. 40% • Plan to use in-memory technologies for big data projects • Only 32% mentioned Hadoop

13. Stream Processing

14. Hell, Even Gartner Thinks So “In memory computing (IMC) … provides transformational opportunities. The execution of certain types of hours-long batch processes can be squeezed into minutes or even seconds … Millions of events can be scanned in a matter of a few tens of milliseconds to detect correlations and patterns pointing at emerging opportunities and threats ‘as things happen.’”

15. And nowadays HW and SW just make it a whole lot cheaper

16. Some Common Use Cases

17. Fast, Transactional Data Access • Inventory management • Financial reference data • Real time transactional data

18. Real Time Stream Processing • Fraud Detection • Click Stream Analysis • Real time analytics • Continuous calculation

19. Heavyweight Offline Calculations • Trade Reconciliation • Pattern analysis and detection • Number crunching

20. Caching • Database offloading • Content heavy websites

21. The Evolution of Data Grids

22. First There Were Local Caches (Cache: in-process caching of a Key->Value data structure) • Good for repetitive-data reads • Limited in capacity • Doesn’t handle write-heavy scenarios • Reads are only part of the latency path

23. Then Came Distributed Caches (Distributed Cache: partitioned cache nodes) • Increased capacity • Still no support for write-heavy scenarios • Limited to ID-based reads • Reads are only part of the latency path

24. In Memory Data Grids (IMDG: partitioned system of record) • Increased capacity • Write scalability • Can serve as system of record with querying & transaction semantics • Still limited in capacity • Latency can come from other parts of your app

25. How It Works

26. Data Models

27. Data Placement – Fixed Hashing: hash(key) % #nodes

28. Fixed Hashing - HA: hash(key) % #nodes

29. Fixed Hashing – Scaling (Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/)

30-34. Data Placement – Consistent Hashing (Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/)

35. Data Consistency Since we’re dealing with distributed data, consistency cannot be taken for granted • Read after write • Read after read • Write-write consistency

36. Solution 1: Single Master

37. Solution 2: Read/Write Quorums
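
The slide gives only the title, so the following is a generic sketch of the quorum idea rather than the speaker's design: with N replicas, a write must reach W of them and a read must consult R of them, and choosing R + W > N forces every read set to overlap the latest write set. The class `QuorumStore`, the explicit version numbers, and the caller-chosen `targets` lists are all assumptions made to keep the example small and deterministic.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class QuorumStore:
    """Toy in-process model of N replicas with read/write quorums."""

    def __init__(self, n: int, w: int, r: int) -> None:
        if r + w <= n:
            raise ValueError("need R + W > N so read and write sets overlap")
        # Each replica maps key -> (version, value).
        self.replicas: List[Dict[str, Tuple[int, str]]] = [{} for _ in range(n)]
        self.w = w
        self.r = r

    def write(self, key: str, value: str, version: int, targets: Sequence[int]) -> None:
        """Apply a versioned write to at least W replicas."""
        if len(set(targets)) < self.w:
            raise ValueError("write quorum not met")
        for i in set(targets):
            current = self.replicas[i].get(key)
            if current is None or current[0] < version:
                self.replicas[i][key] = (version, value)

    def read(self, key: str, targets: Sequence[int]) -> Optional[str]:
        """Ask at least R replicas and return the value with the highest version."""
        if len(set(targets)) < self.r:
            raise ValueError("read quorum not met")
        answers = [self.replicas[i][key] for i in set(targets) if key in self.replicas[i]]
        if not answers:
            return None
        return max(answers, key=lambda answer: answer[0])[1]


store = QuorumStore(n=3, w=2, r=2)
store.write("k", "v1", version=1, targets=[0, 1])
store.write("k", "v2", version=2, targets=[1, 2])
latest = store.read("k", targets=[0, 1])  # replica 0 is stale, replica 1 is not
```

Because any two replicas out of three must include at least one that received the second write, the read returns the newer value even though one of the replicas it asked is stale.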

38. Some More Concerns • Transactions • Querying • Failure detection • Leader election • Persistency • Interoperability

39. IMDG.next() Using IMDG for messaging, business logic (BL)

40. IMDG.next() SSD FTW!

41. Thank You! docs.gigaspaces.com