Uri Cohen
Head of Product @ GigaSpaces
@uri1803
github.com/uric
In-Memory Data Grids,
Demystified
Agenda
• Why IMDG?
• Brief History
• How It Works
– Data model & placement
– HA and fault tolerance
– Consistency
– Intern...
Why
IMDG?
Today, more than
ever, there are many
choices when it
comes to storing your
data
® Copyright 2011 Gigaspaces Ltd. All Rights
Reserved
4
But There
Many
Solutions
Just A Few Years Back
So Why Indeed??
The Need for
Speed, In
Real Time…
Some Facts
Memory will
always be faster
than disk
(usually by
orders of
magnitude)
Recent Survey
67%
The share of IT
managers who think
that real time
analysis is the
biggest challenge
for
big data
implementations
40%
• Plan to use in
memory
technologies for
big data projects.
• Only 32%
mentioned
Hadoop
Stream Processing
Hell, Even Gartner Thinks So
“In memory computing (IMC) … provides
transformational opportunities. The execution of
certai...
And
nowadays
HW and SW
just make it
a whole lot
cheaper
Some
Common
Use Cases
Fast,
Transactional
Data Access
• Inventory
management
• Financial
reference data
• Real time
transactional data
Real Time
Stream
Processing
• Fraud Detection
• Click Stream
Analysis
• Real time
analytics
• Continuous
calculation
Heavyweight
Offline
Calculations
• Trade
Reconciliation
• Pattern analysis
and detection
• Number crunching
Caching
• Database
offloading
• Content heavy
websites
The
Evolution of
Data Grids
First There Were Local Caches
Cache
In process caching
of Key->Value data
structure
Distributed Cache
Partitioned cache
nod...
Then Came Distributed Caches
Cache
In process caching
of Key->Value data
structure
Distributed Cache
Partitioned cache
node...
In Memory Data Grids
Cache
In process caching
of Key->Value data
structure
Increased capacity
Write scalability
Can serve ...
How It
Works
Data Models
Data Placement – Fixed Hashing
hash(key) % #nodes
Fixed Hashing - HA
hash(key) % #nodes
Fixed Hashing – Scaling
Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
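A minimal sketch of the fixed-hashing rule above, `hash(key) % #nodes`, and of why scaling it is painful. The helper names (`stable_hash`, `fixed_node`, `moved_fraction`) and the choice of MD5 as the hash are assumptions made for illustration; they do not come from the slides or from any particular product.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Process-independent hash (Python's built-in hash() is salted per run)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def fixed_node(key: str, num_nodes: int) -> int:
    """Fixed hashing: the owning node is hash(key) % #nodes."""
    return stable_hash(key) % num_nodes


def moved_fraction(keys, old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys whose owning node changes when the node count changes."""
    moved = sum(
        1 for k in keys if fixed_node(k, old_nodes) != fixed_node(k, new_nodes)
    )
    return moved / len(keys)


keys = [f"key-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_nodes=4, new_nodes=5)
print(f"{fraction:.0%} of keys change node when growing from 4 to 5 nodes")
```

With a uniform hash, a key keeps its node only when `hash % 4 == hash % 5`, which holds for about one key in five, so most of the data has to be reshuffled when a single node is added.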
Data Placement – Consistent Hashing
Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
Data Placement – Consistent Hashing
Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
Data Placement – Consistent Hashing
Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
Data Placement – Consistent Hashing
Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
Data Placement – Consistent Hashing
Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
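The consistent hashing slides are diagrams, so here is a hedged sketch of the general technique: nodes are hashed onto a ring, and each key belongs to the first node point at or after its own hash. The `HashRing` class, the `vnodes` parameter (virtual nodes per physical node), and the node names are hypothetical and are not taken from the deck or the cited article.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Process-independent hash used for both nodes and keys."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes: int = 100):
        self._vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node is placed on the ring at several positions.
        for i in range(self._vnodes):
            point = stable_hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, stable_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"key-{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-e")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved after adding a fifth node")
```

Unlike fixed hashing, only the keys that now fall to the new node move, and nothing is reshuffled between the existing nodes. With five equally weighted nodes the moved share is on the order of a fifth.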
Data
Consistency
Since we’re dealing
with distributed
data, consistency
cannot be taken for
granted
• Read after write
• R...
Solution 1:
Single
Master
Solution 2:
Read/Write
Quorums
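As a rough illustration of read/write quorums: with N replicas, a write is acknowledged by W of them and a read consults R of them, and choosing W + R > N guarantees that every read set overlaps the latest write set. The `QuorumStore` class below is a hypothetical in-process simulation, not any product's API. It also assumes a single global version counter, which a real grid would replace with a coordinator or a versioning scheme.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    version: int = 0
    value: object = None


class QuorumStore:
    """Toy single-value store replicated N ways (illustrative only)."""

    def __init__(self, n: int, w: int, r: int):
        if w + r <= n:
            raise ValueError("need W + R > N so read and write sets overlap")
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self._clock = 0  # assumption: one global, monotonically increasing version

    def write(self, value, reachable) -> None:
        """Write to the reachable replica indices; fail without a write quorum."""
        if len(reachable) < self.w:
            raise RuntimeError("write quorum not reached")
        self._clock += 1
        for i in reachable:
            self.replicas[i] = Replica(self._clock, value)

    def read(self, reachable):
        """Read from the reachable replica indices and return the newest value."""
        if len(reachable) < self.r:
            raise RuntimeError("read quorum not reached")
        newest = max((self.replicas[i] for i in reachable), key=lambda rep: rep.version)
        return newest.value


store = QuorumStore(n=3, w=2, r=2)
store.write("v1", reachable=[0, 1])
store.write("v2", reachable=[1, 2])
print(store.read(reachable=[0, 1]))  # replica 0 is stale, replica 1 holds the newest version
```

Lowering R speeds up reads at the cost of a larger W, and vice versa. Setting W = N and R = 1 turns every write into a full broadcast, which is one way to see the trade-off against the single-master approach.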
Some More Concerns
• Transactions
• Querying
• Failure detection
• Leader election
• Persistency
• Interoperability
IMDG.next()
Using IMDG for
messaging, BL
IMDG.next()
SSD FTW!
Thank You!
docs.gigaspaces.com

The principles and foundations of in memory data grids

Published in: Technology, Business
In Memory Data Grids, Demystified!

  1. Uri Cohen, Head of Product @ GigaSpaces (@uri1803, github.com/uric): In-Memory Data Grids, Demystified
  2. Agenda • Why IMDG? • Brief History • How It Works – Data model & placement – HA and fault tolerance – Consistency – Internals
  3. Why IMDG? Today, more than ever, there are many choices when it comes to storing your data
  4. But There Many Solutions
  5. Just A Few Years Back
  6. So Why Indeed??
  7. The Need for Speed, In Real Time…
  8. Some Facts
  9. Memory will always be faster than disk (usually by orders of magnitude)
  10. Recent Survey
  11. 67%: the share of IT managers who think that real-time analysis is the biggest challenge for big data implementations
  12. 40% • Plan to use in-memory technologies for big data projects. • Only 32% mentioned Hadoop
  13. Stream Processing
  14. Hell, Even Gartner Thinks So: “In memory computing (IMC) … provides transformational opportunities. The execution of certain-types of hours-long batch processes can be squeezed into minutes or even seconds … Millions of events can be scanned in a matter of a few tens of millisecond to detect correlations and patterns pointing at emerging opportunities and threats ‘as things happen.’”
  15. And nowadays HW and SW just make it a whole lot cheaper
  16. Some Common Use Cases
  17. Fast, Transactional Data Access • Inventory management • Financial reference data • Real-time transactional data
  18. Real Time Stream Processing • Fraud Detection • Click Stream Analysis • Real-time analytics • Continuous calculation
  19. Heavyweight Offline Calculations • Trade Reconciliation • Pattern analysis and detection • Number crunching
  20. Caching • Database offloading • Content-heavy websites
  21. The Evolution of Data Grids
  22. First There Were Local Caches. Cache: in-process caching of a Key->Value data structure; Distributed Cache: partitioned cache nodes; IMDG: partitioned system of record; IMDG.next(). • Good for repetitive-data reads • Limited in capacity • Doesn’t handle write-heavy scenarios • Reads are only part of the latency path
  23. Then Came Distributed Caches. Cache: in-process caching of a Key->Value data structure; Distributed Cache: partitioned cache nodes; IMDG: partitioned system of record; IMDG.next(). • Increased capacity • Still no support for write-heavy scenarios • Limited to ID-based reads • Reads are only part of the latency path
  24. In Memory Data Grids. Cache: in-process caching of a Key->Value data structure; Distributed Cache: partitioned cache nodes; IMDG: partitioned system of record; IMDG.next(). • Increased capacity • Write scalability • Can serve as system of record with querying & transaction semantics • Still limited in capacity • Latency can come from other parts of your app
  25. How It Works
  26. Data Models
  27. Data Placement – Fixed Hashing: hash(key) % #nodes
  28. Fixed Hashing – HA: hash(key) % #nodes
  29. Fixed Hashing – Scaling. Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
  30. Data Placement – Consistent Hashing. Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
  31. Data Placement – Consistent Hashing. Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
  32. Data Placement – Consistent Hashing. Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
  33. Data Placement – Consistent Hashing. Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
  34. Data Placement – Consistent Hashing. Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
  35. Data Consistency: since we’re dealing with distributed data, consistency cannot be taken for granted • Read after write • Read after read • Write-write consistency
  36. Solution 1: Single Master
  37. Solution 2: Read/Write Quorums
  38. Some More Concerns • Transactions • Querying • Failure detection • Leader election • Persistency • Interoperability
  39. IMDG.next(): Using IMDG for messaging, BL
  40. IMDG.next(): SSD FTW!
  41. Thank You! docs.gigaspaces.com