Apache Ignite is a powerful, horizontally scalable in-memory computing platform capable of handling huge amounts of data in memory and on disk, with quick cluster restarts.
4. Stream Consuming Application: 1
Cache serves as first data layer
Manage persisting data to database
Processing much faster due to no direct DB access
5. Stream Consuming Application cont…
Cache serves as a first-class in-memory database
Manage persisting data to native storage
No overhead from DB connections or mechanisms
7. Cache Evolution
Distributed caches
Shared cache for app instances
Beyond local RAM capacity
Ease of maintenance
No auto sync with DB (yes/no)?
In App caches
Cache results
More responsive application
Reduce load on DB
Limited to local RAM size
8. Cache Evolution : Data grids
Benefits
Distributed caches with brains
Compute capabilities
DB Read/Write through
Collocated processing
Better scalability
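The read/write-through idea from this slide can be sketched in a few lines. This is a toy illustration in plain Python, not the Ignite API: the class name `ReadWriteThroughCache`, the `store_reads` counter, and the dict standing in for the database are all assumptions made for the example.

```python
class ReadWriteThroughCache:
    """Toy cache that keeps a backing store in sync (illustrative only)."""

    def __init__(self, store):
        self.store = store      # stands in for the external database
        self.entries = {}       # in-memory cache entries
        self.store_reads = 0    # counts how often the store was read

    def get(self, key):
        # Read-through: on a miss, load from the store and cache the value.
        if key not in self.entries:
            self.store_reads += 1
            value = self.store.get(key)
            if value is None:
                return None
            self.entries[key] = value
        return self.entries[key]

    def put(self, key, value):
        # Write-through: update the cache and the store in the same call.
        self.entries[key] = value
        self.store[key] = value


db = {"emp:1": "Alice"}
cache = ReadWriteThroughCache(db)
first = cache.get("emp:1")    # miss, loaded from the store
second = cache.get("emp:1")   # hit, served from memory
cache.put("emp:2", "Bob")     # written to both cache and store
```

The application only talks to the cache; the store is read once per key and is updated as a side effect of `put`.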
9. Cache Evolution : In memory computing
Memory centric storage
Scalable to store data in TBs
SQL and transaction support
Collocate related data
DB Read/Write through
Pluggable to external databases
Native storage on disk
No RAM warm-up
Compute capabilities
MapReduce
Collocated processing
Better scalability
10. What is Apache Ignite ?
A distributed cache
A distributed in-memory data grid
A distributed in-memory database
High-performance in-memory computing
ANSI-99 SQL compliant
Transactional operations
SQL transactions in beta
11. Ignite cluster
Group of nodes
Types:
Server : stores data, baseline node
Thick client node : doesn’t store data
Thin client node : not part of cluster
Attribute based grouping possible
Scalable
Fault tolerant
Data consistency
Demo
12. Data Grid
Distributed In-Memory Caching
Read/Write through
Data Consistency
Off-Heap Storage
Distributed SQL
ACID Support
Transactions
14. Cache Queries…
Scan Query : Returns data matching a BiPredicate
Predicate is sent to each node
Each node scans its cache
Data is consolidated by the requesting node
SQL Query : Loads data based on the given SQL
Needs indexing to be enabled
Registering indexing in config
Annotations for fields visibility
Other queries:
Text Query
Index query
Continuous query
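The scan query flow described on this slide (predicate shipped to every node, local scan, results merged by the requester) can be sketched as follows. This is a single-process simulation, not Ignite code: `scan_query`, the list of dicts standing in for node-local caches, and the sample employees are assumptions for illustration.

```python
def scan_query(nodes, predicate):
    """Run predicate(key, value) on every node and merge the matches."""
    results = []
    for node_cache in nodes:
        # Each node scans only its own local entries.
        local_matches = [(k, v) for k, v in node_cache.items() if predicate(k, v)]
        # The requesting node consolidates the partial results.
        results.extend(local_matches)
    return sorted(results, key=lambda pair: pair[0])


# Three hypothetical nodes, each holding part of an employee cache.
nodes = [
    {1: {"name": "Alice", "salary": 900}, 4: {"name": "Dan", "salary": 1500}},
    {2: {"name": "Bob", "salary": 2000}},
    {3: {"name": "Carol", "salary": 700}},
]
high_paid = scan_query(nodes, lambda key, emp: emp["salary"] > 1000)
```

The predicate takes a key and a value, mirroring the two-argument BiPredicate mentioned above.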
15. Data Partitioning
Partitioned caches
Backups
Ensure data availability during node failures
Read from backup node when primary node leaves
Demo
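A minimal sketch of the backup idea, assuming a fixed and entirely hypothetical partition-to-owner table with one primary and one backup per partition. The names `owners`, `storage`, `put`, and `get` are made up for the example and do not correspond to Ignite calls.

```python
PARTITIONS = 4

# Hypothetical assignment: partition -> [primary node, backup node].
owners = {0: ["A", "B"], 1: ["B", "C"], 2: ["C", "A"], 3: ["A", "C"]}
storage = {"A": {}, "B": {}, "C": {}}


def put(key, value):
    # Write the entry to the primary and to every backup copy.
    for node in owners[key % PARTITIONS]:
        storage[node][key] = value


def get(key, alive_nodes):
    # Prefer the primary; fall back to a backup if the primary has left.
    for node in owners[key % PARTITIONS]:
        if node in alive_nodes:
            return storage[node][key], node
    raise KeyError(f"no live owner for key {key}")


put(5, "order-5")  # partition 1: primary B, backup C
value_before, served_by_before = get(5, {"A", "B", "C"})
value_after, served_by_after = get(5, {"A", "C"})  # node B has left
```

With zero backups the second read would fail, which is the availability trade-off the slide points at.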
16. Demo Queries
Scan Query
Sql Query
Data collocation
Next week : this slide onwards
17. Data collocation
Collocate related data for performance
All Employees of dept. can be stored together
Affinity on dept. attribute
Only an attribute of the key can be used as the affinity key
Performant CRUD operations
Avoids network trips
Reduced latency
Can cause hot nodes if used inappropriately
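The affinity idea above can be illustrated by computing the partition from the department attribute alone, so that every employee of a department lands in the same partition as the department itself. This is a conceptual sketch: the CRC32 hash, the `(emp_id, dept_id)` key shape, and the function names are assumptions, not Ignite's actual affinity function.

```python
import zlib

PARTITIONS = 1024


def partition_for(affinity_value):
    # Deterministic hash of the affinity value, reduced to a partition number.
    return zlib.crc32(str(affinity_value).encode("utf-8")) % PARTITIONS


def employee_partition(employee_key):
    # The key is (emp_id, dept_id); only dept_id takes part in placement.
    emp_id, dept_id = employee_key
    return partition_for(dept_id)


def department_partition(dept_id):
    return partition_for(dept_id)


employees = [(1, "sales"), (2, "sales"), (3, "hr")]
placement = {key: employee_partition(key) for key in employees}
```

Because placement depends only on `dept_id`, a very large department concentrates its load on one partition, which is how hot nodes can appear.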
18. Compute Tasks
Run distributed computations on grid
Tasks can be run on selected nodes
Ignite handles task management
E.g. node-specific aggregates
List each dept.'s students stored on each node
Can be parallelized
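The node-specific aggregate example can be sketched as a job that runs once per node against that node's local data, followed by a reduce step on the caller. Everything here is a single-process simulation: `run_on_nodes`, `count_students_by_dept`, and the sample data are hypothetical and are not Ignite compute API calls.

```python
from collections import Counter


def count_students_by_dept(local_students):
    """Job that runs on one node and sees only that node's data."""
    return Counter(student["dept"] for student in local_students)


def run_on_nodes(node_data, job, selected_nodes=None):
    """Run the job on each selected node and return per-node results."""
    targets = selected_nodes if selected_nodes is not None else list(node_data)
    return {node: job(node_data[node]) for node in targets}


node_data = {
    "node-1": [{"name": "Ann", "dept": "CS"}, {"name": "Raj", "dept": "CS"}],
    "node-2": [{"name": "Li", "dept": "EE"}, {"name": "Sam", "dept": "CS"}],
}
per_node = run_on_nodes(node_data, count_students_by_dept)
total = sum(per_node.values(), Counter())  # reduce step on the caller
only_node_2 = run_on_nodes(node_data, count_students_by_dept, ["node-2"])
```

In a real grid the per-node jobs would run in parallel; the loop here is sequential only to keep the sketch short.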
19. Continuous Queries
Exactly-once processing semantics
3 basic components
Cache to monitor updates
Remote filter to look for data changes
Local listener to act upon data changes
Optional initial query to process initial data
Used to capture data changes on cache
Use case: Reacting to cache entry change
Listen for particular state of cache value
Process the state
Move to next state
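The three components and the state-transition use case can be sketched with a tiny in-process cache. The class `WatchedCache` and its `continuous_query` method are hypothetical stand-ins, the filter runs in the same process rather than remotely, and the sketch does not attempt to reproduce the exactly-once guarantee.

```python
class WatchedCache:
    """Toy cache that notifies listeners about matching updates."""

    def __init__(self):
        self.entries = {}
        self.subscriptions = []

    def continuous_query(self, remote_filter, local_listener, initial_query=None):
        # Optional initial query: process entries that already exist.
        if initial_query is not None:
            for key, value in list(self.entries.items()):
                if initial_query(key, value):
                    local_listener(key, None, value)
        self.subscriptions.append((remote_filter, local_listener))

    def put(self, key, value):
        old = self.entries.get(key)
        self.entries[key] = value
        for remote_filter, local_listener in list(self.subscriptions):
            # The filter decides which changes reach the listener.
            if remote_filter(key, old, value):
                local_listener(key, old, value)


orders = WatchedCache()
orders.put("o1", "NEW")  # exists before the query is registered
handled = []


def on_new(key, old, new):
    handled.append(key)            # process the state
    orders.put(key, "PROCESSED")   # move to the next state


orders.continuous_query(
    remote_filter=lambda key, old, new: new == "NEW",
    local_listener=on_new,
    initial_query=lambda key, value: value == "NEW",
)
orders.put("o2", "NEW")
orders.put("o3", "CANCELLED")
```

The listener's own `put` does not retrigger it, because the filter only passes entries whose new value is `"NEW"`.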
20. Eviction Policies
On Heap [cache level]
LRU : Recommended when in doubt
FIFO : Ignores element access order
Sorted : Entries are ordered by key
Off Heap [data region level]
Random-LRU
Random-2-LRU
Persistence On [Page replacement]
Random-LRU
Segmented-LRU
Clock
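To make the difference between LRU and FIFO concrete, here is a small bounded cache whose `track_access` flag switches between the two behaviours. It is a generic textbook sketch built on `OrderedDict`; the class name and flag are assumptions and it is not how Ignite implements its eviction policies.

```python
from collections import OrderedDict


class BoundedCache:
    """Bounded cache: LRU when track_access is True, FIFO otherwise."""

    def __init__(self, max_size, track_access=True):
        self.max_size = max_size
        self.track_access = track_access
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        if self.track_access:
            # LRU: a read makes the entry the most recently used.
            self.entries.move_to_end(key)
        return self.entries[key]

    def put(self, key, value):
        is_new = key not in self.entries
        self.entries[key] = value
        if self.track_access and not is_new:
            self.entries.move_to_end(key)
        if len(self.entries) > self.max_size:
            # Evict the entry at the front of the order.
            self.entries.popitem(last=False)


def surviving_keys(track_access):
    cache = BoundedCache(2, track_access)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")      # touch "a" before the cache overflows
    cache.put("c", 3)
    return list(cache.entries)


lru_keys = surviving_keys(track_access=True)
fifo_keys = surviving_keys(track_access=False)
```

Under LRU the recently read `"a"` survives and `"b"` is evicted; under FIFO the read is ignored and `"a"` goes first.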
22. Data Distribution
Why distribute data?
Data size can go beyond node limits
Load beyond node processing limits
Solutions:
partition the dataset
Migrate to a distributed database
Both will have a set of nodes : topology
23. Data Distribution Soln.
Distribution Requirements:
Algorithm
Distribution Uniformity
Minimal disruption
Approaches:
Mod N
Consistent Hashing
Rendezvous(HRW)
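The "minimal disruption" requirement is easiest to see by measuring how many keys change owner under plain Mod N when the node count changes. The sketch below assumes integer keys and owners numbered `0..N-1`; the function names are made up for the example.

```python
def mod_n_owner(key, node_count):
    # Mod N: the owner is simply the key modulo the number of nodes.
    return key % node_count


def moved_fraction(keys, old_count, new_count):
    """Fraction of keys whose owner changes when the node count changes."""
    moved = sum(
        1 for key in keys
        if mod_n_owner(key, old_count) != mod_n_owner(key, new_count)
    )
    return moved / len(keys)


keys = range(1000)
fraction = moved_fraction(keys, old_count=4, new_count=5)
```

Growing from 4 to 5 nodes moves 80% of these keys, while an ideal scheme would move only about one fifth of them (the share the new node should own). That gap is what consistent hashing and rendezvous hashing are meant to close.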
24. Data Distribution in Ignite
Mapping partition to node
Rendezvous Hashing
Cluster changes move partitions
Mapping key to partition
Mod N
Partitions are fixed
1024 by default
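The two-level mapping on this slide can be sketched as: key to partition by modulo over a fixed partition count, and partition to node by rendezvous (highest random weight) hashing. The SHA-256 based hash, the node names, and the helper names are assumptions chosen for a deterministic example; Ignite's real affinity function differs in its details.

```python
import hashlib

PARTITIONS = 1024  # fixed partition count, as stated on the slide


def stable_hash(text):
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def key_to_partition(key):
    # Key -> partition: Mod N over the fixed number of partitions.
    return stable_hash(str(key)) % PARTITIONS


def partition_to_node(partition, nodes):
    # Partition -> node: the node with the highest weight for this partition wins.
    return max(nodes, key=lambda node: stable_hash(f"{partition}:{node}"))


def assignment(nodes):
    return {p: partition_to_node(p, nodes) for p in range(PARTITIONS)}


before = assignment(["node-a", "node-b", "node-c"])
after = assignment(["node-a", "node-b", "node-c", "node-d"])
moved = [p for p in range(PARTITIONS) if before[p] != after[p]]
```

When `node-d` joins, the only partitions that move are the ones `node-d` now wins; partitions never shuffle between the existing nodes. Keys never change partition, because the partition count stays fixed.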
25. Data Rebalancing
Used when a new node joins the grid
In memory grids start rebalancing immediately
Enabled manually when persistence is enabled
Possibly more backups than configured in such scenarios
Rebalance Modes
SYNC : cache calls are blocked until rebalancing is completed
ASYNC : rebalancing happens in the background; the cache responds immediately
NONE : no rebalancing; the cache is loaded on demand or by explicit loading
26. Partition Map Exchange
Triggered when partitions need to be moved across nodes
A node joins/leaves the cluster
New cache is created/stopped
An index is created etc.
Cluster waits for ongoing operations
Oldest node is the coordinator
27. Native Storage Architecture
Work directory
Binary data : internal metadata
Marshaller : marshaller info
DB
Lock file : used to ensure node lock
node dir.(s) : cache partitions
cp dir. (checkpoint start end markers)
WAL dir.
node(s) dir. : wal segments
Archive dir.
Node(s) dir. : wal segments
28. Dirty Pages
Pages are always on disk, optionally in RAM
Each cache update is written to RAM and appended to WAL
Cache operations cause dirty pages
Dirty pages are accumulated in RAM
Checkpoint: batch of dirty pages written to disk
WAL file cleared after checkpoint
Updates between checkpoints are logged
Node crashes between checkpoints?
WAL to the rescue
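The interplay of dirty pages, checkpoints, and the WAL can be sketched with a toy node. The class `ToyNode`, its fields, and the whole-page log records are simplifying assumptions; real page memory, WAL segments, and checkpoint markers are considerably more involved.

```python
class ToyNode:
    """Toy model of RAM pages, a WAL, and checkpointed disk pages."""

    def __init__(self):
        self.disk_pages = {}   # durable page store
        self.wal = []          # durable write-ahead log
        self.ram_pages = {}    # volatile page cache
        self.dirty = set()     # pages changed since the last checkpoint

    def update(self, page_id, value):
        # Every update is logged to the WAL and applied in RAM.
        self.wal.append((page_id, value))
        self.ram_pages[page_id] = value
        self.dirty.add(page_id)

    def checkpoint(self):
        # Write the batch of dirty pages to disk, then clear the WAL.
        for page_id in self.dirty:
            self.disk_pages[page_id] = self.ram_pages[page_id]
        self.dirty.clear()
        self.wal.clear()

    def crash_and_recover(self):
        # RAM content is lost; replay the WAL on top of the disk pages.
        self.ram_pages = {}
        self.dirty = set()
        for page_id, value in self.wal:
            self.ram_pages[page_id] = value
            self.dirty.add(page_id)

    def read(self, page_id):
        if page_id in self.ram_pages:
            return self.ram_pages[page_id]
        return self.disk_pages.get(page_id)


node = ToyNode()
node.update("p1", "v1")
node.checkpoint()
node.update("p2", "v2")
node.update("p1", "v3")
disk_before_crash = dict(node.disk_pages)  # only checkpointed state
node.crash_and_recover()
recovered = (node.read("p1"), node.read("p2"))
```

The two updates made after the checkpoint exist only in RAM and the WAL, so replaying the WAL is what brings them back after the crash.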
29. Apache Ignite ~ Cassandra
Insert and update performance is comparable
Read and mixed (read + update) workloads are 2x+ better in Ignite
Cassandra UPDATE outperforms under high load
Cassandra demands upfront query patterns
Major model changes/new tables are needed if
Query changes are required
New queries with different requirements are needed
Ignite supports collocated/non-collocated joins, hence
Queries can be written just like old-school SQL
No major changes required except creating a few indexes if needed
Check reference slide for more
30. Next steps
Read docs
Get hands dirty with Ignite
Explore queries
Ignite compute tasks
Native persistence
Third party persistence