Surinder
2nd March 2022
Apache Ignite
Agenda
 Setting up context
 Cache Evolution
 Apache Ignite
 Data Queries
 Compute
 Data Partitioning
 Eviction policies
 Performance Comparison
Stream Consuming Application
Too many reads, writes and updates to the database
Limited connections
Can slow down stream under load
Stream Consuming Application: 1
Cache serves as the first data layer
Cache manages persisting data to the database
Processing is much faster because there is no direct DB access
Stream Consuming Application cont…
Cache serves as a first-class in-memory database
Cache manages persisting data to native storage
No DB connections and no connection-mechanism overhead
Cache Evolution
 Distributed caches
 Shared cache for app instances
 Beyond local RAM capacity
 Ease of maintenance
 No auto sync with DB (yes/no)?
 In App caches
 Cache results
 More responsive application
 Reduce load on DB
 Limited to local RAM size
Cache Evolution : Data grids
 Benefits
 Distributed caches with brains
 Compute capabilities
 DB Read/Write through
 Collocated processing
 Better scalability
Cache Evolution : In memory computing
 Memory centric storage
 Scalable to store data in TBs
 SQL and transaction support
 Collocate related data
 DB Read/Write through
 Pluggable to external databases
 Native storage on disk
 No RAM warm-up
 Compute capabilities
 Map Reduce
 Collocated processing
 Better scalability
What is Apache Ignite ?
 A distributed cache
 A distributed in-memory data grid
 A distributed in-memory database
 High-performance in-memory computing
 ANSI-99 SQL compliant
 Transactional operations
 SQL transactions in beta
Ignite cluster
 Group of nodes
 Types:
 Server : stores data, baseline node
 Thick client node : doesn’t store data
 Thin client node : not part of cluster
 Attribute based grouping possible
 Scalable
 Fault tolerant
 Data consistency
 Demo
Data Grid
 Distributed In-Memory Caching
 Read/Write through
 Data Consistency
 Off-Heap Storage
 Distributed SQL
 ACID Support
 Transactions
Cache Modes
Partitioned: keep the required number of backups
Replicated: everyone knows everything
Cache Queries…
 Scan Query : Returns data matching a BiPredicate
 Predicate is sent to each node
 Each node scans its own cache
 Results are consolidated by the requesting node
 SQL Query : Loads data based on the given SQL
 Requires indexing to be enabled
 Register indexed types in the configuration
 Annotate fields to make them visible to SQL
 Other queries:
 Text Query
 Index query
 Continuous query
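The scan-query flow above can be sketched in plain Python. This is a conceptual model, not the Ignite API: the `nodes` list, the `scan_query` function and the sample data are hypothetical stand-ins for cluster nodes that each hold part of one cache.

```python
# Conceptual sketch of a scan query; not Ignite code.
# Each "node" is a dict holding its share of one cache.
nodes = [
    {1: "alice", 2: "bob"},
    {3: "carol", 4: "dave"},
    {5: "erin"},
]

def scan_query(nodes, predicate):
    """Send the predicate to every node, scan locally, merge on the caller."""
    results = {}
    for local_cache in nodes:
        # Each node filters only its own entries with the (key, value) predicate.
        matches = {k: v for k, v in local_cache.items() if predicate(k, v)}
        # The requesting node consolidates the partial results.
        results.update(matches)
    return results

# Two-argument predicate, analogous to a BiPredicate over (key, value).
even_keys = scan_query(nodes, lambda k, v: k % 2 == 0)
print(even_keys)
```

Because the predicate runs where the data lives, only matching entries travel back to the caller.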
Data Partitioning
 Partitioned caches
 Backups
 Ensure data availability during node failures
 Read from backup node when primary node leaves
 Demo
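A minimal sketch of the backup idea, assuming a toy ring-style placement that is not Ignite's actual affinity function. The `PartitionedCache` class and its methods are hypothetical names; the point is only that a read can be served from a backup copy once the primary has left.

```python
import zlib

class PartitionedCache:
    """Toy partitioned cache that keeps one primary copy plus backups."""

    def __init__(self, nodes, backups=1):
        self.nodes = list(nodes)
        self.backups = backups
        self.storage = {node: {} for node in self.nodes}
        self.alive = set(self.nodes)

    def owners(self, key):
        # Primary first, then the configured number of backup nodes.
        start = zlib.crc32(key.encode()) % len(self.nodes)
        count = min(self.backups + 1, len(self.nodes))
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(count)]

    def put(self, key, value):
        # Write the entry to the primary and to every backup.
        for node in self.owners(key):
            self.storage[node][key] = value

    def get(self, key):
        # Read from the first owner that is still in the cluster.
        for node in self.owners(key):
            if node in self.alive:
                return self.storage[node].get(key)
        raise KeyError(f"all copies of {key!r} are lost")

    def fail(self, node):
        self.alive.discard(node)

cache = PartitionedCache(["n1", "n2", "n3"], backups=1)
cache.put("user-7", "alice")
primary = cache.owners("user-7")[0]
cache.fail(primary)
print(cache.get("user-7"))
```

With `backups=0` the same failure would lose the entry, which is the trade-off between memory cost and availability.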
Demo Queries
 Scan Query
 Sql Query
 Data collocation
 Next week : this slide onwards
Data collocation
 Collocate related data for performance
 All employees of a dept. can be stored together
 Affinity on dept. attribute
 Only an attribute of the key can be used as the affinity key
 Performant CRUD operations
 Avoids network trips
 Reduced latency
 Can cause hot nodes if used inappropriately
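The collocation idea can be illustrated with a small sketch. It assumes a simple hash-modulo partition function and a hypothetical `partition_for` helper, neither of which is Ignite's real implementation: the partition is derived from the department rather than from the employee id, so employees of one department land together.

```python
import hashlib

PARTITIONS = 1024  # assumed partition count for this sketch

def partition_for(affinity_value):
    """Map an affinity value to a partition with a stable hash."""
    digest = hashlib.md5(str(affinity_value).encode()).hexdigest()
    return int(digest, 16) % PARTITIONS

# Employee keys carry the department id, which acts as the affinity key.
employee_keys = [
    ("emp-1", "sales"),
    ("emp-2", "sales"),
    ("emp-3", "hr"),
]

placement = {emp: partition_for(dept) for emp, dept in employee_keys}
print(placement)
```

The same property explains the hot-node warning: a very large department maps to a single partition, and so to a single primary node.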
Compute Tasks
 Run distributed computations on grid
 Tasks can be run on selected nodes
 Ignite handles task management
 E.g. node specific aggregates
 List each dept.'s students stored on each node
 Can be parallelized
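The node-specific aggregate example can be sketched as a small map/reduce. This is not the Ignite compute API; threads stand in for nodes, and `node_data`, `count_by_dept` and `run_task` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-node data: student -> department.
node_data = {
    "node-a": {"s1": "math", "s2": "physics"},
    "node-b": {"s3": "math", "s4": "math"},
}

def count_by_dept(students):
    """Map step: runs against one node's local data only."""
    return Counter(students.values())

def run_task(node_data):
    # Threads stand in for nodes working in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(count_by_dept, node_data.values()))
    per_node = dict(zip(node_data.keys(), partials))
    # Reduce step: merge the node-level aggregates on the caller.
    total = sum(partials, Counter())
    return per_node, total

per_node, total = run_task(node_data)
print(per_node, total)
```

Only the small per-node counters cross the network in this model; the student records stay where they are stored.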
Continuous Queries
 Exactly-once processing semantics
 3 basic components
 Cache to monitor updates
 Remote filter to look for data changes
 Local listener to act upon data changes
 Optional initial query to process initial data
 Used to capture data changes on cache
 Use case: Reacting to cache entry change
 Listen for particular state of cache value
 Process the state
 Move to next state
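The three components and the optional initial query can be modelled with a toy observable cache. This is a single-process sketch, not Ignite's continuous query API; `ObservableCache`, `register` and the order states are hypothetical.

```python
class ObservableCache:
    """Toy cache that notifies registered continuous queries on each put."""

    def __init__(self):
        self._data = {}
        self._queries = []

    def register(self, remote_filter, local_listener, initial_query=None):
        # The optional initial query replays matching existing entries first.
        if initial_query is not None:
            for key, value in self._data.items():
                if initial_query(key, value):
                    local_listener(key, None, value)
        self._queries.append((remote_filter, local_listener))

    def put(self, key, value):
        old = self._data.get(key)
        self._data[key] = value
        for remote_filter, local_listener in self._queries:
            # The filter decides which updates reach the listener.
            if remote_filter(key, old, value):
                local_listener(key, old, value)

events = []
orders = ObservableCache()
orders.put("o1", "NEW")
orders.register(
    remote_filter=lambda k, old, new: new == "PAID",
    local_listener=lambda k, old, new: events.append((k, old, new)),
    initial_query=lambda k, v: v == "PAID",
)
orders.put("o1", "PAID")  # matches the filter
orders.put("o2", "NEW")   # filtered out
print(events)
```

In a real cluster the filter would run on the node that owns the entry, so uninteresting updates never reach the listener's node.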
Eviction Policies
 On Heap [cache level]
 LRU : Recommended when in doubt
 FIFO : It ignores the element access order
 Sorted : Eviction order follows the sort order of the keys
 Off Heap [data region level]
 Random-LRU
 Random-2-LRU
 Persistence On [Page replacement]
 Random-LRU
 Segmented-LRU
 Clock
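The difference between LRU and FIFO can be shown with a small bounded cache. This is a sketch of the two policies in general, with a hypothetical `BoundedCache` class, and does not reflect how Ignite implements eviction internally.

```python
from collections import OrderedDict

class BoundedCache:
    """Toy cache with a size limit and an LRU or FIFO eviction policy."""

    def __init__(self, max_size, policy="LRU"):
        self.max_size = max_size
        self.policy = policy
        self._data = OrderedDict()

    def get(self, key):
        value = self._data.get(key)
        if key in self._data and self.policy == "LRU":
            # LRU tracks access order; FIFO ignores it.
            self._data.move_to_end(key)
        return value

    def put(self, key, value):
        if key in self._data and self.policy == "LRU":
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.max_size:
            # Evict from the front: oldest access (LRU) or oldest insert (FIFO).
            self._data.popitem(last=False)

    def keys(self):
        return list(self._data)

def demo(policy):
    cache = BoundedCache(2, policy)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")     # touch "a"
    cache.put("c", 3)  # triggers eviction
    return cache.keys()

print(demo("LRU"), demo("FIFO"))
```

Under LRU the recently read entry "a" survives and "b" is evicted; under FIFO the read is ignored and "a" goes first.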
Persistent Store
 CacheStoreAdapter extendable
 Read through
 Write through
 Write behind
 Works behind the cache APIs
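The three store modes can be sketched with a dict standing in for the external database. `CachedStore` is a hypothetical class for illustration and is unrelated to the `CacheStoreAdapter` signature; it only shows when each mode touches the store.

```python
class CachedStore:
    """Toy cache in front of a backing store (a dict stands in for the DB)."""

    def __init__(self, db, write_behind=False):
        self.db = db
        self.cache = {}
        self.write_behind = write_behind
        self.pending = {}  # buffered updates for write-behind

    def get(self, key):
        if key not in self.cache and key in self.db:
            # Read-through: a cache miss loads from the store.
            self.cache[key] = self.db[key]
        return self.cache.get(key)

    def put(self, key, value):
        self.cache[key] = value
        if self.write_behind:
            # Write-behind: buffer the update and persist it later.
            self.pending[key] = value
        else:
            # Write-through: persist as part of the same call.
            self.db[key] = value

    def flush(self):
        # Persist all buffered updates in one batch.
        self.db.update(self.pending)
        self.pending.clear()

db = {"k1": "from-db"}
through = CachedStore(db)
through.put("k2", "v2")

behind_db = {}
behind = CachedStore(behind_db, write_behind=True)
behind.put("k3", "v3")
before_flush = dict(behind_db)
behind.flush()
print(through.get("k1"), db, before_flush, behind_db)
```

The caller only ever uses `get` and `put`, which is the sense in which the store works behind the cache APIs.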
Data Distribution
 Why distribute data?
 Data size can go beyond node limits
 Load beyond node processing limits
 Solutions:
 partition the dataset
 Migrate to distributed database
 Both involve a set of nodes: the topology
Data Distribution Soln.
 Distribution Requirements:
 Algorithm
 Distribution Uniformity
 Minimal disruption
 Approaches:
 Mod N
 Consistent Hashing
 Rendezvous(HRW)
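The "minimal disruption" requirement can be checked with a small experiment that compares Mod N with rendezvous (highest random weight) hashing when a node joins. The node names, key set and MD5-based `stable_hash` helper are assumptions made for this sketch.

```python
import hashlib

def stable_hash(text):
    """Process-independent hash; Python's built-in hash() is salted."""
    return int(hashlib.md5(text.encode()).hexdigest(), 16)

def mod_n(key, nodes):
    return nodes[stable_hash(key) % len(nodes)]

def rendezvous(key, nodes):
    # Highest random weight: pick the node with the largest hash(key, node).
    return max(nodes, key=lambda node: stable_hash(f"{key}:{node}"))

def moved(assign, keys, before, after):
    """Count keys whose owner changes when the topology changes."""
    return sum(assign(k, before) != assign(k, after) for k in keys)

keys = [f"key-{i}" for i in range(1000)]
before = ["n1", "n2", "n3"]
after = before + ["n4"]

moved_mod_n = moved(mod_n, keys, before, after)
moved_hrw = moved(rendezvous, keys, before, after)
print(moved_mod_n, moved_hrw)
```

With rendezvous hashing a key only changes owner if the new node wins it, so every relocated key lands on the new node. With Mod N most keys are reshuffled among all nodes.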
Data Distribution in Ignite
 Mapping partition to node
 Rendezvous Hashing
 Cluster changes move partitions
 Mapping key to partition
 Mod N
 Partitions are fixed
 1024 by default
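The two-step mapping above can be sketched as follows. The hash function is an MD5-based stand-in chosen for this example, not the one Ignite uses, and `key_to_partition`, `partition_to_node` and `owner` are hypothetical helper names.

```python
import hashlib

PARTITIONS = 1024  # default partition count quoted above

def stable_hash(text):
    return int(hashlib.md5(text.encode()).hexdigest(), 16)

def key_to_partition(key):
    # Step 1: key -> partition via Mod N over a fixed partition count.
    return stable_hash(str(key)) % PARTITIONS

def partition_to_node(partition, nodes):
    # Step 2: partition -> node via rendezvous (highest random weight).
    return max(nodes, key=lambda node: stable_hash(f"{partition}:{node}"))

def owner(key, nodes):
    return partition_to_node(key_to_partition(key), nodes)

nodes = ["n1", "n2", "n3"]
part = key_to_partition("order-42")
print(part, owner("order-42", nodes), owner("order-42", nodes + ["n4"]))
```

Because the partition count is fixed, a key's partition never changes; a topology change only affects which node owns that partition.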
Data Rebalancing
 Used when a new node joins the grid
 In memory grids start rebalancing immediately
 Enabled manually when persistence is enabled
 Possibly more backups than configured in such scenarios
 Rebalance Modes
 SYNC: cache calls blocked until rebalancing is completed
 ASYNC: rebalancing happens in the background; the cache responds immediately
 NONE : No rebalancing; the cache is loaded on demand or by explicit loading
Partition Map Exchange
 Triggered when partitions need to be moved across nodes
 A node joins/leaves the cluster
 New cache is created/stopped
 An index is created etc.
 Cluster waits for ongoing operations
 The oldest server node acts as coordinator
Native Storage Architecture
 Work directory
 Binary data : internal metadata
 Marshaller : marshaller info
 DB
 Lock file : used to ensure node lock
 node dir.(s) : cache partitions
 cp dir. : checkpoint start/end markers
 WAL dir.
 node(s) dir. : WAL segments
 Archive dir.
 node(s) dir. : WAL segments
Dirty Pages
 Pages are always on disk, optionally in RAM
 Each cache update is written to RAM and appended to the WAL
 Cache operations create dirty pages
 Dirty pages are accumulated in RAM
 Checkpoint: a batch of dirty pages is written to disk
 WAL file cleared after checkpoint
 Updates between checkpoints are logged
 Node crashes between checkpoints?
 WAL to the rescue
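The interplay of dirty pages, checkpoints and the WAL can be modelled with a toy page store. This is a simplified sketch under stated assumptions (whole-page values, a WAL that is cleared entirely at each checkpoint); `PageStore` and its methods are hypothetical and far simpler than Ignite's native persistence.

```python
class PageStore:
    """Toy model: pages in RAM, a WAL, and checkpoints to 'disk'."""

    def __init__(self):
        self.disk = {}      # durable pages
        self.wal = []       # durable append-only log
        self.ram = {}       # volatile page copies
        self.dirty = set()  # pages changed since the last checkpoint

    def update(self, page, value):
        # Every update goes to RAM and is appended to the WAL.
        self.ram[page] = value
        self.wal.append((page, value))
        self.dirty.add(page)

    def checkpoint(self):
        # Write the batch of dirty pages to disk, then clear the log.
        for page in self.dirty:
            self.disk[page] = self.ram[page]
        self.dirty.clear()
        self.wal.clear()

    def crash_and_recover(self):
        # RAM is lost; replay the WAL on top of the last checkpoint.
        self.ram = dict(self.disk)
        self.dirty = set()
        for page, value in self.wal:
            self.ram[page] = value
            self.dirty.add(page)

store = PageStore()
store.update("p1", "a")
store.checkpoint()
store.update("p2", "b")  # logged but not yet checkpointed
store.crash_and_recover()
print(store.disk, store.ram)
```

After recovery the update to "p2" is back in RAM even though it never reached a checkpoint, which is the role the WAL plays between checkpoints.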
Apache Ignite ~ Cassandra
 Insert and update performance is comparable
 Read and mixed (read + update) workloads are 2x+ better in Ignite
 Cassandra UPDATE outperforms under high load
 Cassandra requires query patterns to be known up front
 Major model changes or new tables are needed if
 Existing queries change
 New queries with different requirements are needed
 Ignite supports collocated and non-collocated joins, so
 Queries can be written just like old-school SQL
 No major changes are required beyond creating a few indexes if needed
 Check reference slide for more
Next steps
 Read docs
 Get your hands dirty with Ignite
 Explore queries
 Ignite compute tasks
 Native persistence
 Third party persistence
References
 https://ignite.apache.org/docs/latest/
 https://www.youtube.com/watch?v=eMs_2vEsbBk
 https://dzone.com/articles/apache-ignite-client-connectors-variety
 https://apacheignite.readme.io/docs/leader-election
 https://cwiki.apache.org/confluence/display/IGNITE/%28Partition+Map%29+Exchange+-+under+the+hood
 https://data-science-blog.com/blog/2020/09/25/in-memory-data-grid-vs-distributed-cache-which-is-best/
 https://hazelcast.com/blog/imdg-vs-imdb-a-business-level-perspective/
 https://www.gridgain.com/resources/blog/apacher-ignitetm-and-apacher-cassandratm-benchmarks-power-in-memory-computing
Questions

More Related Content

What's hot

NoSQL Technology
NoSQL TechnologyNoSQL Technology
NoSQL Technology
Fachry Bafadal
 
DPPL AcaDocFlow
DPPL AcaDocFlowDPPL AcaDocFlow
DPPL AcaDocFlowEdi Yanto
 
Fungsi Single Row dan Multi Row pada Oracle
Fungsi Single Row dan Multi Row pada OracleFungsi Single Row dan Multi Row pada Oracle
Fungsi Single Row dan Multi Row pada Oracle
RIZKY ASIAWATI
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
DataStax Academy
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Databricks
 
Operating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in ProductionOperating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in Production
Databricks
 
Principales bases de datos
Principales bases de datosPrincipales bases de datos
Principales bases de datos
Sergio Castañeda Ortega
 
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago MolaThe Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
Spark Summit
 
Standard IEEE
Standard IEEEStandard IEEE
Standard IEEE
afandi_latif
 
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Materialized Column: An Efficient Way to Optimize Queries on Nested ColumnsMaterialized Column: An Efficient Way to Optimize Queries on Nested Columns
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Databricks
 
Penjelasan bagian bagian motherboard
Penjelasan bagian bagian motherboard Penjelasan bagian bagian motherboard
Penjelasan bagian bagian motherboard
Telaga Gunawan
 
Apache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data FrameworkApache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data Framework
Wes McKinney
 
Software Engineering 1 (Introduction of Software Engineering)
Software Engineering 1 (Introduction of Software Engineering)Software Engineering 1 (Introduction of Software Engineering)
Software Engineering 1 (Introduction of Software Engineering)
Adam Mukharil Bachtiar
 
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
Andrew Lamb
 
Building Big Data Applications using Spark, Hive, HBase and Kafka
Building Big Data Applications using Spark, Hive, HBase and KafkaBuilding Big Data Applications using Spark, Hive, HBase and Kafka
Building Big Data Applications using Spark, Hive, HBase and Kafka
Ashish Thapliyal
 
Unit 5-apache hive
Unit 5-apache hiveUnit 5-apache hive
Unit 5-apache hive
vishal choudhary
 
BAB IV - FORMAT TABEL DAN HALAMAN WEB
BAB IV - FORMAT TABEL DAN HALAMAN WEBBAB IV - FORMAT TABEL DAN HALAMAN WEB
BAB IV - FORMAT TABEL DAN HALAMAN WEB
TeukuMahawira
 
In-memory Databases
In-memory DatabasesIn-memory Databases
In-memory Databases
Robert Friberg
 
RPL 1 (Lama) - Perancangan Perangkat Lunak
RPL 1 (Lama) - Perancangan Perangkat LunakRPL 1 (Lama) - Perancangan Perangkat Lunak
RPL 1 (Lama) - Perancangan Perangkat Lunak
Adam Mukharil Bachtiar
 
Apache Spark 101
Apache Spark 101Apache Spark 101
Apache Spark 101
Abdullah Çetin ÇAVDAR
 

What's hot (20)

NoSQL Technology
NoSQL TechnologyNoSQL Technology
NoSQL Technology
 
DPPL AcaDocFlow
DPPL AcaDocFlowDPPL AcaDocFlow
DPPL AcaDocFlow
 
Fungsi Single Row dan Multi Row pada Oracle
Fungsi Single Row dan Multi Row pada OracleFungsi Single Row dan Multi Row pada Oracle
Fungsi Single Row dan Multi Row pada Oracle
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
Operating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in ProductionOperating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in Production
 
Principales bases de datos
Principales bases de datosPrincipales bases de datos
Principales bases de datos
 
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago MolaThe Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
 
Standard IEEE
Standard IEEEStandard IEEE
Standard IEEE
 
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Materialized Column: An Efficient Way to Optimize Queries on Nested ColumnsMaterialized Column: An Efficient Way to Optimize Queries on Nested Columns
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
 
Penjelasan bagian bagian motherboard
Penjelasan bagian bagian motherboard Penjelasan bagian bagian motherboard
Penjelasan bagian bagian motherboard
 
Apache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data FrameworkApache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data Framework
 
Software Engineering 1 (Introduction of Software Engineering)
Software Engineering 1 (Introduction of Software Engineering)Software Engineering 1 (Introduction of Software Engineering)
Software Engineering 1 (Introduction of Software Engineering)
 
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
 
Building Big Data Applications using Spark, Hive, HBase and Kafka
Building Big Data Applications using Spark, Hive, HBase and KafkaBuilding Big Data Applications using Spark, Hive, HBase and Kafka
Building Big Data Applications using Spark, Hive, HBase and Kafka
 
Unit 5-apache hive
Unit 5-apache hiveUnit 5-apache hive
Unit 5-apache hive
 
BAB IV - FORMAT TABEL DAN HALAMAN WEB
BAB IV - FORMAT TABEL DAN HALAMAN WEBBAB IV - FORMAT TABEL DAN HALAMAN WEB
BAB IV - FORMAT TABEL DAN HALAMAN WEB
 
In-memory Databases
In-memory DatabasesIn-memory Databases
In-memory Databases
 
RPL 1 (Lama) - Perancangan Perangkat Lunak
RPL 1 (Lama) - Perancangan Perangkat LunakRPL 1 (Lama) - Perancangan Perangkat Lunak
RPL 1 (Lama) - Perancangan Perangkat Lunak
 
Apache Spark 101
Apache Spark 101Apache Spark 101
Apache Spark 101
 

Similar to Apache ignite as in-memory computing platform

De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...
De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...
De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...
CitiusTech
 
5266732.ppt
5266732.ppt5266732.ppt
5266732.ppt
hothyfa
 
Overview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational DatabasesOverview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational Databases
Andrew Kandels
 
Cassandra
CassandraCassandra
Cassandra
Upaang Saxena
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
Hadoop User Group
 
Unit-1 Introduction to Big Data.pptx
Unit-1 Introduction to Big Data.pptxUnit-1 Introduction to Big Data.pptx
Unit-1 Introduction to Big Data.pptx
AnkitChauhan817826
 
Handling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsHandling Data in Mega Scale Systems
Handling Data in Mega Scale Systems
Directi Group
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Alluxio, Inc.
 
Climbing the beanstalk
Climbing the beanstalkClimbing the beanstalk
Climbing the beanstalk
gordonyorke
 
MYSQL
MYSQLMYSQL
MYSQL
gilashikwa
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceJ Singh
 
System Design Basics by Pratyush Majumdar
System Design Basics by Pratyush MajumdarSystem Design Basics by Pratyush Majumdar
System Design Basics by Pratyush Majumdar
Pratyush Majumdar
 
17-NoSQL.pptx
17-NoSQL.pptx17-NoSQL.pptx
17-NoSQL.pptx
levichan1
 
Cassandra internals
Cassandra internalsCassandra internals
Cassandra internals
narsiman
 
NoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsNoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, Implementations
Firat Atagun
 
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.xModule 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
NPN Training
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
shrey mehrotra
 
Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologies
appaji intelhunt
 

Similar to Apache ignite as in-memory computing platform (20)

De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...
De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...
De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing ...
 
5266732.ppt
5266732.ppt5266732.ppt
5266732.ppt
 
Overview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational DatabasesOverview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational Databases
 
Cassandra
CassandraCassandra
Cassandra
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Unit-1 Introduction to Big Data.pptx
Unit-1 Introduction to Big Data.pptxUnit-1 Introduction to Big Data.pptx
Unit-1 Introduction to Big Data.pptx
 
Handling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsHandling Data in Mega Scale Systems
Handling Data in Mega Scale Systems
 
Sinfonia
Sinfonia Sinfonia
Sinfonia
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
 
Climbing the beanstalk
Climbing the beanstalkClimbing the beanstalk
Climbing the beanstalk
 
MYSQL
MYSQLMYSQL
MYSQL
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduce
 
System Design Basics by Pratyush Majumdar
System Design Basics by Pratyush MajumdarSystem Design Basics by Pratyush Majumdar
System Design Basics by Pratyush Majumdar
 
17-NoSQL.pptx
17-NoSQL.pptx17-NoSQL.pptx
17-NoSQL.pptx
 
Cassandra internals
Cassandra internalsCassandra internals
Cassandra internals
 
NoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsNoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, Implementations
 
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.xModule 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologies
 

Recently uploaded

State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 

Recently uploaded (20)

State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 

Apache ignite as in-memory computing platform

  • 1. Apache Ignite (Surinder, 2nd March 2022)
  • 2. Agenda
      • Setting up context
      • Cache Evolution
      • Apache Ignite
      • Data Queries
      • Compute
      • Data Partitioning
      • Eviction policies
      • Performance Comparison
  • 3. Stream Consuming Application
      • Too many reads, writes and updates to the database
      • Limited connections
      • Can slow down the stream under load
  • 4. Stream Consuming Application: 1
      • Cache serves as the first data layer
      • Manages persisting data to the database
      • Processing is much faster because there is no direct DB access
  • 5. Stream Consuming Application cont…
      • Cache serves as a first-class in-memory database
      • Manages persisting data to native storage
      • No DB connections, no mechanism overhead
  • 7. Cache Evolution
      • Distributed caches
          • Shared cache for app instances
          • Beyond local RAM capacity
          • Ease of maintenance
          • No auto sync with DB (yes/no)?
      • In-app caches
          • Cache results
          • More responsive application
          • Reduce load on DB
          • Limited to local RAM size
  • 8. Cache Evolution: Data grids
      • Benefits
      • Distributed caches with brains
      • Compute capabilities
      • DB read/write-through
      • Collocated processing
      • Better scalability
  • 9. Cache Evolution: In-memory computing
      • Memory-centric storage
      • Scalable to store data in TBs
      • SQL, transactions support
      • Collocate related data
      • DB read/write-through
      • Pluggable to external databases
      • Native storage on disk
      • No RAM warm-up
      • Compute capabilities
      • Map Reduce
      • Collocated processing
      • Better scalability
  • 10. What is Apache Ignite?
      • A distributed cache
      • A distributed in-memory data grid
      • A distributed in-memory database
      • High-performance in-memory computing
      • ANSI-99 SQL compliant
      • Transactional operations
      • SQL transactions in beta
  • 11. Ignite cluster
      • Group of nodes
      • Types:
          • Server: stores data, baseline node
          • Thick client node: doesn’t store data
          • Thin client node: not part of the cluster
      • Attribute-based grouping possible
      • Scalable
      • Fault tolerant
      • Data consistency
      • Demo
  • 12. Data Grid
      • Distributed in-memory caching
      • Read/write-through
      • Data consistency
      • Off-heap storage
      • Distributed SQL
      • ACID support
      • Transactions
  • 13. Cache Modes
      • Partitioned: keep the required backups
      • Replicated: everyone knows everything
  • 14. Cache Queries…
      • Scan Query: returns data matching a BiPredicate
          • Predicate is sent to each node
          • Each node scans its cache
          • Data is consolidated by the requesting node
      • SQL Query: loads data based on the given SQL
          • Needs indexing to be enabled
          • Indexing is registered in the config
          • Annotations control field visibility
      • Other queries:
          • Text Query
          • Index Query
          • Continuous Query
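The scan-query flow on this slide (predicate shipped to every node, local scan, consolidation on the requesting node) can be sketched in a few lines. This is a plain-Python illustration, not the Ignite API: the `nodes` list and the `scan_query` function are hypothetical stand-ins for a cluster and its query engine.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a cluster: each node holds its own slice of the cache.
nodes: List[Dict[int, str]] = [
    {1: "alice", 4: "dave"},
    {2: "bob", 5: "erin"},
    {3: "carol", 6: "frank"},
]

def scan_query(
    cluster: List[Dict[int, str]],
    predicate: Callable[[int, str], bool],
) -> List[Tuple[int, str]]:
    """Send the predicate to every node, scan locally, merge on the caller."""
    results: List[Tuple[int, str]] = []
    for local_cache in cluster:            # the predicate "travels" to each node
        matches = [(k, v) for k, v in local_cache.items() if predicate(k, v)]
        results.extend(matches)            # the requesting node consolidates
    return sorted(results)

even_keys = scan_query(nodes, lambda k, v: k % 2 == 0)
```

Because every node scans its whole local slice, the cost grows with the data size, which is why the slide points to SQL queries with indexes for selective lookups.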
  • 15. Data Partitioning
      • Partitioned caches
      • Backups
          • Ensure data availability on node failures
          • Reads go to a backup node when the primary node leaves
      • Demo
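A minimal sketch of why backups keep data available, assuming a toy cluster where each partition has a fixed owner list (primary first, then backups). The `PartitionedCache` class, the round-robin placement and the partition count of 8 are illustrative assumptions, not how Ignite assigns partitions.

```python
class PartitionedCache:
    def __init__(self, nodes, partitions=8, backups=1):
        self.alive = set(nodes)
        self.partitions = partitions
        ring = list(nodes)
        # Fixed owner list per partition: primary first, then backups.
        # Round-robin placement is a simplification for this sketch.
        self.owners = {
            p: [ring[(p + i) % len(ring)] for i in range(backups + 1)]
            for p in range(partitions)
        }
        self.store = {n: {} for n in ring}

    def put(self, key, value):
        for node in self.owners[key % self.partitions]:
            if node in self.alive:
                self.store[node][key] = value    # write to primary and backups

    def get(self, key):
        for node in self.owners[key % self.partitions]:
            if node in self.alive:               # first live owner serves the read
                return node, self.store[node].get(key)
        raise LookupError("all owners of this partition are gone")

    def fail(self, node):
        self.alive.discard(node)

cache = PartitionedCache(["A", "B", "C"])
cache.put(4, "payload")
before = cache.get(4)      # served by the primary
cache.fail(before[0])      # the primary leaves the cluster
after = cache.get(4)       # served by the backup copy
```

With zero backups the same failure would raise `LookupError`, which is the data-loss case the slide's backup setting guards against.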
  • 16. Demo Queries
      • Scan Query
      • SQL Query
      • Data collocation
      • Next week: this slide onwards
  • 17. Data collocation
      • Collocate related data for performance
      • All employees of a dept. can be stored together
      • Affinity on the dept. attribute
      • Only a key attribute can be used as the affinity key
      • Performant CRUD operations
      • Avoids network trips
      • Reduced latency
      • Can cause hot nodes if used inappropriately
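The idea of affinity on the department attribute can be sketched as follows: the partition is derived from the department rather than from the employee id, so all employees of one department land together. The `partition_for` helper, the partition count and the sample data are assumptions made for illustration; Ignite's own affinity function is not reproduced here.

```python
import zlib

PARTITIONS = 1024  # assumed partition count for this sketch

def partition_for(affinity_key: str) -> int:
    # crc32 is stable across runs, unlike Python's built-in str hash
    return zlib.crc32(affinity_key.encode()) % PARTITIONS

# Hypothetical (employee id, department) pairs
employees = [("e1", "sales"), ("e2", "sales"), ("e3", "hr"), ("e4", "sales")]

# Affinity on the department, not on the employee id
placement = {}
for emp_id, dept in employees:
    placement.setdefault(partition_for(dept), []).append(emp_id)

sales_partition = partition_for("sales")
sales_staff = placement[sales_partition]
```

The same property explains the hot-node warning: a department with far more employees than the others makes its single partition, and the node that owns it, disproportionately large and busy.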
  • 18. Compute Tasks
      • Run distributed computations on the grid
      • Tasks can be run on selected nodes
      • Ignite handles task management
      • E.g. node-specific aggregates
          • List each dept.'s students stored on each node
      • Can be parallelized
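The node-specific aggregate example can be sketched as a small map/reduce: one job per node counts students per department over that node's local data, and the caller merges the partial results. Threads stand in for cluster nodes here; `node_data` and `local_job` are hypothetical names, and none of this is the Ignite compute API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: student -> department, as stored on each node
node_data = {
    "node-1": {"s1": "math", "s2": "physics", "s3": "math"},
    "node-2": {"s4": "physics", "s5": "physics"},
}

def local_job(students):
    """Runs next to the data: count students per department on one node."""
    return Counter(students.values())

with ThreadPoolExecutor() as pool:                  # jobs run in parallel
    per_node = dict(zip(node_data, pool.map(local_job, node_data.values())))

total = sum(per_node.values(), Counter())           # reduce step on the caller
```

Only the small per-node counters cross the "network" in this model, not the student records themselves, which is the point of sending the computation to the data.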
  • 19. Continuous Queries
      • Exactly-once processing semantics
      • 3 basic components
          • Cache to monitor for updates
          • Remote filter to look for data changes
          • Local listener to act upon data changes
      • Optional initial query to process initial data
      • Used to capture data changes on a cache
      • Use case: reacting to cache entry changes
          • Listen for a particular state of a cache value
          • Process the state
          • Move to the next state
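The three components named on this slide (monitored cache, remote filter, local listener, plus the optional initial query) can be modelled with a small observable dictionary. `ObservableCache` and its method names are hypothetical; the sketch runs in a single process and does not attempt the exactly-once delivery guarantee.

```python
class ObservableCache:
    def __init__(self):
        self.data = {}
        self.queries = []

    def continuous_query(self, remote_filter, local_listener, initial_query=None):
        if initial_query is not None:                # optional: replay existing entries
            for k, v in self.data.items():
                if initial_query(k, v):
                    local_listener(k, None, v)
        self.queries.append((remote_filter, local_listener))

    def put(self, key, value):
        old = self.data.get(key)
        self.data[key] = value
        for remote_filter, local_listener in self.queries:
            if remote_filter(key, old, value):       # evaluated where the data lives
                local_listener(key, old, value)      # delivered to the subscriber

events = []
orders = ObservableCache()
orders.put("o1", "NEW")
orders.continuous_query(
    remote_filter=lambda k, old, new: new == "PAID",
    local_listener=lambda k, old, new: events.append((k, old, new)),
    initial_query=lambda k, v: v == "PAID",
)
orders.put("o1", "PAID")   # matches the filter, listener fires
orders.put("o2", "NEW")    # filtered out on the "remote" side
```

The listener here only records the event; in the state-machine use case from the slide it would process the "PAID" state and write the next state back to the cache.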
  • 20. Eviction Policies
      • On-heap [cache level]
          • LRU: recommended when in doubt
          • FIFO: ignores the element access order
          • Sorted: sorted according to key for order
      • Off-heap [data region level]
          • Random-LRU
          • Random-2-LRU
      • Persistence on [page replacement]
          • Random-LRU
          • Segmented-LRU
          • Clock
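To make the LRU policy concrete, here is a minimal sketch of least-recently-used eviction built on `collections.OrderedDict`. The `LruCache` class is illustrative only and says nothing about how Ignite implements its on-heap or page-level policies.

```python
from collections import OrderedDict

class LruCache:
    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = OrderedDict()

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)        # mark as most recently used
            return self.entries[key]
        return None

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)     # evict the least recently used

cache = LruCache(max_size=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" is now fresher than "b"
cache.put("c", 3)     # evicts "b"
remaining = list(cache.entries)
```

A FIFO policy would drop the `move_to_end` call in `get`, so the same sequence would evict "a" instead, which is what "ignores the element access order" means.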
  • 21. Persistent Store
      • Extend CacheStoreAdapter
      • Read-through
      • Write-through
      • Write-behind
      • Works behind the cache APIs
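The three store modes can be contrasted in a short sketch. A plain dictionary stands in for the external database, and `StoreBackedCache` with its `flush` method is a hypothetical class written for this illustration; it does not mirror the `CacheStoreAdapter` interface.

```python
class StoreBackedCache:
    def __init__(self, db, write_behind=False):
        self.db = db                  # stands in for the external database
        self.memory = {}
        self.write_behind = write_behind
        self.pending = {}             # updates not yet flushed to the db

    def get(self, key):
        if key not in self.memory and key in self.db:
            self.memory[key] = self.db[key]      # read-through on a miss
        return self.memory.get(key)

    def put(self, key, value):
        self.memory[key] = value
        if self.write_behind:
            self.pending[key] = value            # queue; later writes coalesce
        else:
            self.db[key] = value                 # write-through: synchronous

    def flush(self):
        self.db.update(self.pending)             # one batched write
        self.pending.clear()

db = {"k1": "from-db"}
cache = StoreBackedCache(db, write_behind=True)
hit = cache.get("k1")          # loaded from the db on first access
cache.put("k2", "v1")
cache.put("k2", "v2")          # overwrites the queued update
db_before_flush = dict(db)
cache.flush()
```

Write-behind trades durability for throughput: the two updates to "k2" reach the database as a single write, but anything still in `pending` is lost if the process dies before `flush`.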
  • 22. Data Distribution
      • Why distribute data?
          • Data size can go beyond node limits
          • Load can go beyond node processing limits
      • Solutions:
          • Partition the dataset
          • Migrate to a distributed database
      • Both will have a set of nodes: the topology
  • 23. Data Distribution Soln.
      • Distribution requirements:
          • Algorithm
          • Distribution uniformity
          • Minimal disruption
      • Approaches:
          • Mod N
          • Consistent hashing
          • Rendezvous (HRW)
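The "minimal disruption" requirement is easiest to see by counting how many keys change owner when a fourth node joins. The sketch below compares Mod N with rendezvous (highest random weight) hashing; the node names, the 12 integer keys and the SHA-256 based `score` function are assumptions chosen for the example.

```python
import hashlib

def score(node: str, key: int) -> int:
    digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def mod_n(key, nodes):
    return nodes[key % len(nodes)]

def rendezvous(key, nodes):
    return max(nodes, key=lambda n: score(n, key))   # highest random weight wins

keys = range(12)
old, new = ["n0", "n1", "n2"], ["n0", "n1", "n2", "n3"]

moved_mod = [k for k in keys if mod_n(k, old) != mod_n(k, new)]
moved_hrw = [k for k in keys if rendezvous(k, old) != rendezvous(k, new)]
```

With Mod N, 9 of the 12 keys change owner because the divisor changed. With rendezvous hashing a key moves only if the new node scores highest for it, so every moved key goes to "n3" and no data is shuffled between the existing nodes.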
  • 24. Data Distribution in Ignite
      • Mapping partition to node
          • Rendezvous hashing
          • Cluster changes move partitions
      • Mapping key to partition
          • Mod N
          • Partitions are fixed
          • 1024 by default
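The two-level mapping on this slide can be sketched as two small functions: key to partition by Mod N over a fixed partition count, and partition to node by rendezvous hashing. Integer keys, the SHA-256 weight function and the node names are assumptions for the sketch; a real cluster hashes arbitrary keys and uses its own affinity function.

```python
import hashlib

PARTITIONS = 1024   # the default named on the slide

def partition_of(key: int) -> int:
    return key % PARTITIONS                      # fixed: independent of the topology

def owner_of(partition: int, nodes) -> str:
    def weight(node):
        h = hashlib.sha256(f"{node}:{partition}".encode()).digest()
        return int.from_bytes(h[:8], "big")
    return max(nodes, key=weight)                # rendezvous: highest weight wins

def assignment(nodes):
    return [owner_of(p, nodes) for p in range(PARTITIONS)]

before = assignment(["n0", "n1", "n2"])
after = assignment(["n0", "n1", "n2", "n3"])
moved = [p for p in range(PARTITIONS) if before[p] != after[p]]
```

Because the partition count never changes, a key always maps to the same partition; a topology change only rewrites the partition-to-node table, and whole partitions (not individual keys) are what move to the new node.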
  • 25. Data Rebalancing
      • Used when a new node joins the grid
      • In-memory grids start rebalancing immediately
      • Enabled manually when persistence is enabled
      • Possibly more backups than configured in such scenarios
      • Rebalance modes
          • SYNC: cache calls are blocked until rebalancing is completed
          • ASYNC: rebalancing happens in the background; the cache responds immediately
          • NONE: no rebalancing; the cache is loaded on demand or by explicit loading
  • 26. Partition Map Exchange
      • Triggered when partitions need to be moved across nodes
          • A node joins/leaves the cluster
          • A new cache is created/stopped
          • An index is created, etc.
      • Cluster waits for ongoing operations
      • Oldest/youngest node is coordinator
  • 27. Native Storage Architecture
      • Work directory
          • Binary data: internal metadata
          • Marshaller: marshaller info
          • DB
              • Lock file: used to ensure node lock
              • Node dir(s): cache partitions
              • cp dir: checkpoint start/end markers
          • WAL dir
              • Node dir(s): WAL segments
          • Archive dir
              • Node dir(s): WAL segments
  • 28. Dirty Pages
      • Pages are always on disk, optionally in RAM
      • Each cache update is written to RAM and appended to the WAL
      • Cache operations cause dirty pages
      • Dirty pages accumulate in RAM
      • Checkpoint: a batch of dirty pages written to disk
      • WAL file is cleared after a checkpoint
      • Updates between checkpoints are logged
      • Node crashes between checkpoints?
          • WAL to the rescue
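The interplay of RAM, WAL and checkpoints can be sketched with a toy node. Everything here is a simplification made for illustration: `Node` is a hypothetical class, one key stands for one page, and lists and dictionaries stand in for files on disk.

```python
class Node:
    def __init__(self):
        self.disk_pages = {}   # durable page store
        self.wal = []          # durable, append-only log
        self.ram = {}          # volatile page cache
        self.dirty = set()     # pages changed since the last checkpoint

    def put(self, key, value):
        self.wal.append((key, value))    # log first, so the update survives a crash
        self.ram[key] = value
        self.dirty.add(key)

    def checkpoint(self):
        for key in self.dirty:           # write the batch of dirty pages
            self.disk_pages[key] = self.ram[key]
        self.dirty.clear()
        self.wal.clear()                 # logged updates are now on disk

    def crash_and_recover(self):
        self.ram, self.dirty = {}, set()         # RAM content is lost
        for key, value in self.wal:              # replay updates since the checkpoint
            self.ram[key] = value
            self.dirty.add(key)

    def get(self, key):
        return self.ram.get(key, self.disk_pages.get(key))

node = Node()
node.put("a", 1)
node.checkpoint()          # "a" reaches the page store, WAL is cleared
node.put("b", 2)           # only in RAM and in the WAL
node.crash_and_recover()
```

After recovery "a" is read from the page store and "b" is rebuilt from the WAL, so no acknowledged update is lost even though "b" never made it into a checkpoint.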
  • 29. Apache Ignite ~ Cassandra
      • Insert and update performance is comparable
      • Read and mixed (read + update) are 2x+ better in Ignite
      • Cassandra UPDATE outperforms under high load
      • Cassandra demands upfront query patterns
      • Major model changes/new tables if
          • Query changes are required
          • New queries with different requirements are needed
      • Ignite supports collocated/non-collocated joins, and hence
          • Queries can be created just like old-school SQL
          • No major changes required except creating a few indexes if needed
      • Check the reference slide for more
  • 30. Next steps
      • Read docs
      • Get hands dirty with Ignite
      • Explore queries
      • Ignite compute tasks
      • Native persistence
      • Third-party persistence
  • 31. References
      • https://ignite.apache.org/docs/latest/
      • https://www.youtube.com/watch?v=eMs_2vEsbBk
      • https://dzone.com/articles/apache-ignite-client-connectors-variety
      • https://apacheignite.readme.io/docs/leader-election
      • https://cwiki.apache.org/confluence/display/IGNITE/%28Partition+Map%29+Exchange+-+under+the+hood
      • https://data-science-blog.com/blog/2020/09/25/in-memory-data-grid-vs-distributed-cache-which-is-best/
      • https://hazelcast.com/blog/imdg-vs-imdb-a-business-level-perspective/
      • https://www.gridgain.com/resources/blog/apacher-ignitetm-and-apacher-cassandratm-benchmarks-power-in-memory-computing

Editor's Notes

  1. https://ignite.apache.org/docs/latest/memory-configuration/replacement-policies