Nothing can beat RAM when it comes to accessing, analyzing or searching data. So, what prevents us from leveraging this technology? For a long time, it was the limited size of available RAM and, of course, its reliability - node down, data gone. In-memory data grids try to mitigate these disadvantages and let us build highly scalable architectures from commodity hardware. Build your own supercomputer using concepts like "scale-out", "elasticity" and "resilience": the size of the data grid can be dynamically adapted to your resource needs, and node failures are compensated by failover mechanisms. And not only the data is distributed in the cluster, but also the logic that needs to be executed on that data.
In this talk I show what in-memory data grids are and how they work - always with Hazelcast in mind, an open-source IMDG implementation that is really lightweight and provides an easy way to get started with the topic (while nevertheless being capable of handling the requirements of large enterprises).
2. RALPH WINZINGER
WHO AM I?
• Senior Technical Leader @ Senacor Technologies
• evaluating new technologies
• planning of the Senacor-internal academy
• hacker :-)
• Senacor Technologies, several offices in Germany
• partner for large IT transformations @ big brands
• finance, logistics, industry, insurance, government
3. NEW CHALLENGES DUE TO DIGITIZATION
PHYSICAL MEETS DIGITAL
• nature of applications is changing
• increased amount of connectivity and communication
• increased amount of data collected from sensors and apps
• expectations of customers are changing
• services have to be available at any time
• responses need to be delivered immediately
4. NEW CHALLENGES DUE TO DIGITIZATION
PHYSICAL MEETS DIGITAL
MORE DATA
MORE REQUESTS
HIGHER AVAILABILITY
HIGHER PERFORMANCE
?
5. Web 2.0
MASSIVELY PARALLEL APPROACH
DOES IT SCALE?
[timeline 2000-2010]
• 2003 - Google Distributed Filesystem
• 2004 - Google MapReduce
• 2006 - Google Big Table
• 2007 - Amazon Dynamo
• 2009 - Facebook Cassandra
6. Web 2.0
MASSIVELY PARALLEL APPROACH
DOES IT SCALE?
[timeline 2000-2010 as on the previous slide]
scaling to billions of requests per day
with commodity hardware
- scaling out -
7. SCALE UP VS. SCALE OUT
DOES IT SCALE?
• scaling up is easy but surely expensive
• every piece of technology has upper limits
• scaling out is cheap but has certain drawbacks
• clustering has been commodity for many years now,
but it primarily addresses logic, not data
• synchronization issues
10. EVOLUTION OF PERFORMANCE AND PRICING
CAPABILITIES & COSTS
[chart: evolution of network latency, memory capacity and price over time]
THIS IS THE BASE FOR
IN-MEMORY DATA GRIDS
11. IN-MEMORY DATA-GRIDS
JUST KEEP IT IN MIND
• IN-MEMORY DATA
• all data needed is supposed to be kept in memory
• HEAP / RAM is becoming a first class citizen
• GRID
• too big for one node, so data is distributed in cluster
• already a couple of players out there
• Hazelcast, Oracle, Terracotta, Infinispan, GridGain, …
12. IN-MEMORY DATA-GRIDS
JUST KEEP IT IN MIND
• IN-MEMORY DATA
• all data needed is supposed to be kept in memory
• HEAP / RAM is becoming a first class citizen
• GRID
• too big for one node, so data is distributed in cluster
• already a couple of players out there
• Hazelcast, Oracle, Terracotta, Infinispan, GridGain, …
relative access latency: MEMORY x1, NETWORK x100, DISK x1000
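As a first impression of how lightweight this can be, here is a minimal sketch using Hazelcast's Java API (assuming a Hazelcast 3.x JAR on the classpath; the map name "customers" is just an example). Starting an instance forms or joins a cluster, and entries put into a distributed map are spread over its partitions:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class GridNode {
    public static void main(String[] args) {
        // starts (or joins) a cluster member inside this JVM
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // a distributed map - its entries are partitioned across all members
        IMap<String, String> customers = hz.getMap("customers");
        customers.put("42", "Jane Doe");

        // any member can read the entry, wherever it is actually stored
        System.out.println(customers.get("42"));
    }
}

Running this main class on several machines in the same network is already enough to form a small data grid.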
14. EMBEDDED OR CLIENT/SERVER IMDG APPROACH
MAKI AND NIGIRI
• embedded IMDG
• every node/instance of an app contributes
to the overall memory
• client / server
• dedicated memory cluster, separate from the application
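A sketch of the client/server variant (assuming the hazelcast-client artifact of Hazelcast 3.x and a cluster member reachable at the example address 10.0.0.1:5701) - the client holds no data itself and only talks to the dedicated cluster:

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;

public class AppClient {
    public static void main(String[] args) {
        ClientConfig config = new ClientConfig();
        // address of one member of the dedicated memory cluster (example value)
        config.getNetworkConfig().addAddress("10.0.0.1:5701");

        // connects to the remote grid instead of embedding a member
        HazelcastInstance client = HazelcastClient.newHazelcastClient(config);
        System.out.println(client.getMap("customers").size());
    }
}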
16. DISTRIBUTED DATA AND THE CAP THEOREM
… GO, CHOOSE TWO OF THEM!
or even better: "drop one of them"
• P - actually no choice, as long as we are in a network
• C - use a quorum: if there are enough nodes with the same data, that is the truth; might get expensive
• A - tolerate a "split brain" and keep on working; might get hard to merge
[diagram: CAP triangle - Consistency, Availability, Partition tolerance]
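To make the quorum option concrete, here is a tiny framework-independent sketch (all names are illustrative, not Hazelcast API): a read only returns a value if a strict majority of the reachable replicas agree on it, otherwise the system refuses to answer and trades availability for consistency:

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class QuorumRead {

    // returns a value only if a strict majority of replicas report the same one
    static Optional<String> readWithQuorum(List<String> replicaValues) {
        int quorum = replicaValues.size() / 2 + 1;
        Map<String, Integer> votes = new HashMap<>();
        for (String value : replicaValues) {
            votes.merge(value, 1, Integer::sum);
        }
        return votes.entrySet().stream()
                .filter(e -> e.getValue() >= quorum)
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        // 2 of 3 replicas agree -> quorum reached, a consistent answer is returned
        System.out.println(readWithQuorum(Arrays.asList("v7", "v7", "v6")));
        // no majority -> no answer (consistency is favoured over availability)
        System.out.println(readWithQuorum(Arrays.asList("v7", "v6")));
    }
}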
18. HIGH DENSITY DATA
HONEY, I SHRUNK THE DATA
• serialization has a massive impact on
• performance - how fast can data be de-/serialized?
• throughput - how big is the data on the wire?
• volume - how much data can be put in memory?
• go & compare Java, XML, JSON, Protobuf, Capnproto, Thrift, …
• … and be surprised!
• Hypercast = Hazelcast + C24 Preon
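For example, Hazelcast lets you take control of the wire format by implementing its DataSerializable interface and writing only the raw fields instead of relying on java.io.Serializable; a minimal sketch (the Customer class and its fields are made up for illustration):

import java.io.IOException;

import com.hazelcast.nio.ObjectDataInput;
import com.hazelcast.nio.ObjectDataOutput;
import com.hazelcast.nio.serialization.DataSerializable;

public class Customer implements DataSerializable {
    private long id;
    private String name;

    public Customer() { }                  // no-arg constructor needed for deserialization

    public Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public void writeData(ObjectDataOutput out) throws IOException {
        out.writeLong(id);                 // raw fields only - no class metadata on the wire
        out.writeUTF(name);
    }

    @Override
    public void readData(ObjectDataInput in) throws IOException {
        id = in.readLong();
        name = in.readUTF();
    }
}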
19. OFF-HEAP MEMORY
LEAVING THE SANDBOX
• IMDGs keep lots of data in memory - say hello to our friend, the
garbage collector!
• organizational overhead will be present if millions of objects are
stored on the heap
• tuning and a deep understanding of garbage collection are mandatory
• off-heap memory to the rescue
• data is not stored on the heap but in explicitly allocated areas
• IMDG is responsible for deallocating memory
20. OFF-HEAP MEMORY
LEAVING THE SANDBOX
• IMDGs keep lots of data in memory - say hello to our friend, the
garbage collector!
• organizational overhead will be present if millions of objects are
stored on the heap
• tuning and a deep understanding of garbage collection are mandatory
• off-heap memory to the rescue
• data is not stored on the heap but in explicitly allocated areas
• IMDG is responsible for deallocating memory
sun.misc.Unsafe
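As a rough illustration of the idea (not Hazelcast's actual storage engine): off-heap storage means writing serialized bytes into memory that the garbage collector never scans, for example a direct ByteBuffer, and making the owning component responsible for releasing it:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class OffHeapSketch {
    public static void main(String[] args) {
        // allocated outside the Java heap - invisible to the garbage collector
        ByteBuffer offHeap = ByteBuffer.allocateDirect(1024);

        byte[] value = "some serialized entry".getBytes(StandardCharsets.UTF_8);
        offHeap.putInt(value.length);      // store length + payload, as a storage engine would
        offHeap.put(value);

        // read it back
        offHeap.flip();
        byte[] read = new byte[offHeap.getInt()];
        offHeap.get(read);
        System.out.println(new String(read, StandardCharsets.UTF_8));

        // releasing the buffer is up to the owner (the IMDG), not the garbage collector
    }
}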
22. DATA SHARDING & ELASTICITY
WHERE DID IT GO?
• scaling out with distributed data only makes sense when data is
partitioned - how to find the right partition?
• an IMDG is quite close to a HashMap - partitions are buckets
• partitionID = hashcode() % num_partitions
• now think of a distributed HashMap - partitions are scattered over
our cluster
[diagram: NODE 1 holds partitions 1-3, NODE 2 holds 4-6, NODE 3 holds 7-9, …, NODE N holds the last partitions P-2, P-1, P]
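A sketch of both views (the key and map names are examples): the naive modulo formula from the slide, and asking Hazelcast's PartitionService which partition - and which member - actually owns a key:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.Partition;

public class WhereDidItGo {

    // the formula from the slide: bucket = hash modulo number of partitions
    // (masking the sign bit so negative hash codes still map to a valid bucket)
    static int partitionId(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // Hazelcast maintains the same kind of mapping internally (271 partitions by default)
        Partition partition = hz.getPartitionService().getPartition("customer-42");
        System.out.println("partition " + partition.getPartitionId()
                + " is owned by " + partition.getOwner());
    }
}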
23. DATA SHARDING & ELASTICITY
WHERE DID IT GO?
• scaling out with distributed data only makes sense when data is
partitioned - how to find the right partition?
• an IMDG is quite close to a HashMap - partitions are buckets
• partitionID = hashcode() % num_partitions
• now think of a distributed HashMap - partitions are scattered over
our cluster
[diagram: a new NODE N+1 joins the cluster and partitions are rebalanced - NODE N now holds P-5 to P-3, NODE N+1 holds P-2 to P]
24. DATA SHARDING & ELASTICITY
WHERE DID IT GO?
• scaling out with distributed data only makes sense when data is
partitioned - how to find the right partition?
• an IMDG is quite close to a HashMap - partitions are buckets
• partitionID = hashcode() % num_partitions
• now think of a distributed HashMap - partitions are scattered over
our cluster
[diagram: a further node (labelled EC3) joins - NODE N holds P-8 to P-6, NODE N+1 holds P-5 to P-3, the new node holds P-2 to P]
25. ELDEST MEMBER VS. CENTRAL MANAGEMENT
HAVING A PARTY
• there is no central management instance in a (Hazelcast) IMDG
cluster, no single point of failure
• autodiscovery via network multicast (or a configured member list)
• there is always one node that knows all other members - like the
first person at a party who gets introduced to all the other guests
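A sketch of how the join behaviour can be configured in Hazelcast: multicast autodiscovery, or an explicit member list for networks where multicast is not available (the address is an example value):

import com.hazelcast.config.Config;
import com.hazelcast.config.JoinConfig;
import com.hazelcast.core.Hazelcast;

public class Discovery {
    public static void main(String[] args) {
        Config config = new Config();
        JoinConfig join = config.getNetworkConfig().getJoin();

        // multicast discovery is enabled by default; disable it here to use a fixed list instead
        join.getMulticastConfig().setEnabled(false);

        // alternative: list well-known members explicitly (e.g. in locked-down networks)
        join.getTcpIpConfig().setEnabled(true).addMember("10.0.0.1");

        Hazelcast.newHazelcastInstance(config);
    }
}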
26. FAILOVER
AND IF I PULLED THE PLUG???
• data is not only sharded but also redundant to recover from failing
nodes
[diagram: partitions 1-3 distributed across NODE 1 - NODE 4]
27. FAILOVER
AND IF I PULLED THE PLUG???
• data is not only sharded but also redundant to recover from failing
nodes
[diagram: partitions 1-3 across NODE 1 - NODE 4, backup partitions added on other members]
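How much redundancy a data structure gets is configurable; a sketch (assuming a map named "customers") where every partition of that map keeps one synchronous backup on another member:

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class FailoverConfig {
    public static void main(String[] args) {
        Config config = new Config();

        // every partition of this map gets one synchronous backup on a different node;
        // if the owning node dies, the backup is promoted and a new backup is created
        config.getMapConfig("customers").setBackupCount(1);

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        hz.getMap("customers").put("42", "Jane Doe");
    }
}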
28. FAILOVER
AND IF I PULLED THE PLUG???
• data is not only sharded but also redundant to recover from failing
nodes
[diagram: backup partitions across NODE 1 - NODE 4 (animation step)]
29. FAILOVER
AND IF I PULLED THE PLUG???
• data is not only sharded but also redundant to recover from failing
nodes
[diagram: backup partitions across NODE 1 - NODE 4 (animation step)]
30. FAILOVER
AND IF I PULLED THE PLUG???
• data is not only sharded but also redundant to recover from failing
nodes
[diagram: backup partitions across NODE 1 - NODE 4 (animation step)]
31. FAILOVER
AND IF I PULLED THE PLUG???
• data is not only sharded but also redundant to recover from failing
nodes
[diagram: every partition now has a backup copy on another node]
32. FAILOVER
AND IF I PULLED THE PLUG???
• data is not only sharded but also redundant to recover from failing
nodes
[diagram: a node fails; its partitions are still available as backups on the remaining nodes (animation step)]
33. FAILOVER
AND IF I PULLED THE PLUG???
• data is not only sharded but also redundant to recover from failing
nodes
[diagram: a node fails; its partitions are still available as backups on the remaining nodes (animation step)]
34. FAILOVER
AND IF I PULLED THE PLUG???
• data is not only sharded but also redundant to recover from failing
nodes
[diagram: a node fails; its partitions are still available as backups on the remaining nodes (animation step)]
35. FAILOVER
AND IF I PULLED THE PLUG???
• data is not only sharded but also redundant to recover from failing
nodes
[diagram: a node fails; its partitions are still available as backups on the remaining nodes (animation step)]
36. FAILOVER
AND IF I PULLED THE PLUG???
• data is not only sharded but also redundant to recover from failing
nodes
[diagram: a node fails; its partitions are still available as backups on the remaining nodes (animation step)]
37. FAILOVER
AND IF I PULLED THE PLUG???
• data is not only sharded but also redundant to recover from failing
nodes
[diagram: new backup partitions 1', 2', 3' are created on the remaining nodes]
38. DISTRIBUTED COMPUTING IN AN IMDG
DIVIDE AND CONQUER
• reading data from the cluster and processing it is a straightforward
approach - but not always clever
• it might also be feasible to send algorithms to the cluster and
distribute processing
• MapReduce
• Hazelcast has built-in support for distributed executors
• think of them as serializable Runnables or Callables which can be sent to and
executed on a different node
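A sketch of such a distributed task using Hazelcast's IExecutorService: a serializable Callable is shipped to a member of the cluster and executed there, next to the data (the task and the names used are made up for illustration):

import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.HazelcastInstanceAware;
import com.hazelcast.core.IExecutorService;

public class DivideAndConquer {

    // the task travels over the wire, so it has to be serializable
    static class CountCustomers
            implements Callable<Integer>, Serializable, HazelcastInstanceAware {

        private transient HazelcastInstance hz;

        @Override
        public void setHazelcastInstance(HazelcastInstance hz) {
            this.hz = hz;                            // injected on the member that executes the task
        }

        @Override
        public Integer call() {
            return hz.getMap("customers").size();    // runs where (part of) the data lives
        }
    }

    public static void main(String[] args) throws Exception {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        hz.getMap("customers").put("42", "Jane Doe");

        IExecutorService executor = hz.getExecutorService("default");
        Future<Integer> result = executor.submit(new CountCustomers());
        System.out.println("customers in the grid: " + result.get());
    }
}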