Surinder
2nd March 2022
Apache Ignite
Agenda
 Setting up context
 Cache Evolution
 Apache Ignite
 Data Queries
 Compute
 Data Partitioning
 Eviction policies
 Performance Comparison
Stream Consuming Application
Too many reads, writes and updates to the database
Limited connections
Can slow down stream under load
Stream Consuming Application: 1
Cache serves as the first data layer
Manages persisting data to the database
Processing is much faster because there is no direct DB access
Stream Consuming Application cont…
Cache serves as a first-class in-memory database
Manages persisting data to native storage
No DB connections or related mechanism overhead
Cache Evolution
Cache Evolution
 Distributed caches
 Shared cache for app instances
 Beyond local RAM capacity
 Ease of maintenance
 No automatic sync with the DB (yes/no)?
 In App caches
 Cache results
 More responsive application
 Reduce load on DB
 Limited to local RAM size
Cache Evolution : Data grids
 Benefits
 Distributed caches with brains
 Compute capabilities
 DB Read/Write through
 Collocated processing
 Better scalability
Cache Evolution : In memory computing
 Memory centric storage
 Scalable to store data in TBs
 SQL and transaction support
 Collocate related data
 DB Read/Write through
 Pluggable to external databases
 Native storage on disk
 No RAM warm-up
 Compute capabilities
 Map Reduce
 Collocated processing
 Better scalability
What is Apache Ignite ?
 A distributed cache
 A Distributed in memory data grid
 A Distributed in memory database
 High-performance computing with in-memory
 ANSI-99 SQL compliant
 Transactional operations
 SQL transactions in beta
Ignite cluster
 Group of nodes
 Types:
 Server : stores data, baseline node
 Thick client node : doesn’t store data
 Thin client node : not part of cluster
 Attribute based grouping possible
 Scalable
 Fault tolerant
 Data consistency
 Demo
Data Grid
 Distributed In-Memory Caching
 Read/Write through
 Data Consistency
 Off-Heap Storage
 Distributed SQL
 ACID Support
 Transactions
Cache Modes
 Partitioned: keeps the required number of backups
 Replicated: everyone knows everything
Cache Queries…
 Scan query: returns entries matching a BiPredicate
 The predicate is sent to each node
 Each node scans its own cache
 Results are consolidated by the requesting node
 SQL query: loads data based on the given SQL
 Needs indexing to be enabled
 Register indexing in the config
 Annotations control field visibility
 Other queries:
 Text Query
 Index query
 Continuous query
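The scan query flow above can be sketched in plain Python. This is only an illustration of the idea, not Ignite's API: the `nodes` list of dictionaries and the `scan_query` helper are hypothetical stand-ins for cluster nodes and for the query call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in: each dict is the data held locally by one node.
Node = Dict[int, str]


def scan_query(nodes: List[Node],
               predicate: Callable[[int, str], bool]) -> List[Tuple[int, str]]:
    """Send the predicate to every node, scan locally, consolidate on the caller."""
    results: List[Tuple[int, str]] = []
    for local_cache in nodes:
        # Each node filters only the entries it owns.
        local_hits = [(k, v) for k, v in local_cache.items() if predicate(k, v)]
        # The requesting side merges the partial results.
        results.extend(local_hits)
    return sorted(results)


nodes = [{1: "alice", 4: "dave"}, {2: "bob"}, {3: "carol", 5: "erin"}]
matches = scan_query(nodes, lambda key, value: key % 2 == 1)
```

Because the predicate travels to the data, only matching entries cross the network; the cost is that every node scans its whole local cache.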
Data Partitioning
 Partitioned caches
 Backups
 Ensure data availability on node failures
 Reads are served from a backup node when the primary node leaves
 Demo
Demo Queries
 Scan Query
 Sql Query
 Data collocation
 Next week : this slide onwards
Data collocation
 Collocate related data for performance
 All employees of a dept. can be stored together
 Affinity on the dept. attribute
 Only key attributes can be used in the affinity key
 Performant CRUD operations
 Avoids network trips
 Reduced latency
 Can cause hot nodes if used inappropriately
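A minimal sketch of why an affinity key collocates data, assuming a simplified mapping: `partition_for`, the CRC32 hash and the sample `employees` list are illustrative choices, not Ignite's actual affinity function.

```python
import zlib
from collections import defaultdict

PARTITIONS = 1024  # the default partition count quoted later in this deck


def partition_for(affinity_value: str) -> int:
    """Map an affinity key value to a partition (CRC32 is an assumed stand-in hash)."""
    return zlib.crc32(affinity_value.encode("utf-8")) % PARTITIONS


# (employee id, department); the department is the affinity key.
employees = [(1, "sales"), (2, "sales"), (3, "hr"), (4, "sales"), (5, "hr")]

by_partition = defaultdict(list)
for emp_id, dept in employees:
    # The partition depends only on the department, not on the employee id.
    by_partition[partition_for(dept)].append(emp_id)

sales_partitions = {partition_for(dept) for _, dept in employees if dept == "sales"}
```

Every "sales" employee lands in the same partition, so a per-department operation touches one node. The same property explains hot nodes: one very large department concentrates all of its load on a single partition.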
Compute Tasks
 Run distributed computations on grid
 Tasks can be run on selected nodes
 Ignite handles task management
 E.g. node-specific aggregates
 List each dept.'s students stored on each node
 Can be parallelized
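The node-specific aggregate example can be sketched as follows. Threads stand in for cluster nodes, and `node_data`, `count_by_dept` and `run_on_nodes` are hypothetical names used only for this illustration; Ignite's compute API is not shown.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical node-local data: student id -> department.
node_data = {
    "node-a": {1: "physics", 2: "math", 3: "math"},
    "node-b": {4: "physics", 5: "physics"},
}


def count_by_dept(local_entries):
    """The job that runs next to the data on a single node."""
    return Counter(local_entries.values())


def run_on_nodes(job, selected):
    """Run the job on the selected nodes in parallel and collect per-node results."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(job, node_data[name]) for name in selected}
        return {name: dict(future.result()) for name, future in futures.items()}


per_node = run_on_nodes(count_by_dept, ["node-a", "node-b"])
# Reduce step: merge the per-node counts into a cluster-wide total.
total = sum((Counter(counts) for counts in per_node.values()), Counter())
```

Only the small per-node counters travel back to the caller, which is the point of running the computation where the data lives.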
Continuous Queries
 Exactly-once processing semantics
 3 basic components
 Cache to monitor updates
 Remote filter to look for data changes
 Local listener to act upon data changes
 Optional initial query to process initial data
 Used to capture data changes on cache
 Use case: Reacting to cache entry change
 Listen for particular state of cache value
 Process the state
 Move to next state
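The three components and the optional initial query can be sketched with a small in-process cache. `ObservableCache` and its methods are hypothetical; the sketch shows only the filter/listener wiring and does not model delivery guarantees across node failures.

```python
class ObservableCache:
    """Toy cache that notifies subscribers about updates (illustration only)."""

    def __init__(self):
        self._data = {}
        self._subscriptions = []

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        old = self._data.get(key)
        self._data[key] = value
        for remote_filter, local_listener in self._subscriptions:
            # The filter decides which changes are worth reporting.
            if remote_filter(key, old, value):
                local_listener(key, old, value)

    def continuous_query(self, remote_filter, local_listener, initial_query=None):
        # The optional initial query processes entries that already exist.
        if initial_query is not None:
            for key, value in list(self._data.items()):
                if initial_query(key, value):
                    local_listener(key, None, value)
        self._subscriptions.append((remote_filter, local_listener))


cache = ObservableCache()
cache.put("order-1", "NEW")  # present before the query is registered
seen = []


def on_new(key, old, new):
    seen.append(key)
    cache.put(key, "PROCESSING")  # move the entry to its next state


cache.continuous_query(
    remote_filter=lambda key, old, new: new == "NEW",
    local_listener=on_new,
    initial_query=lambda key, value: value == "NEW",
)
cache.put("order-2", "NEW")
cache.put("order-3", "DONE")
```

The listener reacts only to entries in the "NEW" state, processes them, and writes the next state back, which matches the use case listed above.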
Eviction Policies
 On Heap [cache level]
 LRU : Recommended when in doubt
 FIFO: ignores element access order
 Sorted: eviction order follows the sort order of the keys
 Off Heap [data region level]
 Random-LRU
 Random-2 LRU
 Persistence On [Page replacement]
 Random-LRU
 Segmented-LRU
 Clock
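As an illustration of the LRU policy named above, here is a minimal sketch built on `collections.OrderedDict`. The `LruCache` class is a hypothetical teaching aid; it does not reflect how Ignite implements its on-heap or page-level policies.

```python
from collections import OrderedDict


class LruCache:
    """Bounded cache that evicts the least recently used entry."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        # A read counts as a use, so the entry becomes the most recent one.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_size:
            # Drop the entry at the "oldest use" end.
            self._entries.popitem(last=False)

    def keys(self):
        return list(self._entries)


lru = LruCache(max_size=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")     # "a" is now more recently used than "b"
lru.put("c", 3)  # exceeds the limit, so "b" is evicted
```

A FIFO policy would have evicted "a" here, because it ignores the read and looks only at insertion order.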
Persistent Store
 CacheStoreAdapter extendable
 Read through
 Write through
 Write behind
 Works behind the cache APIs
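The three store modes can be sketched with a dictionary standing in for the external database. `StoreBackedCache` is a hypothetical class for illustration; it is not the `CacheStoreAdapter` contract itself.

```python
class StoreBackedCache:
    """Toy cache in front of a store (a dict stands in for the database)."""

    def __init__(self, store, write_behind=False):
        self.store = store
        self.write_behind = write_behind
        self._memory = {}
        self._pending = {}

    def get(self, key):
        # Read-through: on a miss, load from the store and keep the value in memory.
        if key not in self._memory and key in self.store:
            self._memory[key] = self.store[key]
        return self._memory.get(key)

    def put(self, key, value):
        self._memory[key] = value
        if self.write_behind:
            # Write-behind: queue the update; repeated writes to a key coalesce.
            self._pending[key] = value
        else:
            # Write-through: the store is updated as part of the put.
            self.store[key] = value

    def flush(self):
        # Apply the queued updates to the store in one batch.
        self.store.update(self._pending)
        self._pending.clear()


db = {"k1": "from-db"}
write_through = StoreBackedCache(db)
loaded = write_through.get("k1")
write_through.put("k2", "v2")

db2 = {}
write_behind = StoreBackedCache(db2, write_behind=True)
write_behind.put("k", 1)
write_behind.put("k", 2)
store_before_flush = dict(db2)
write_behind.flush()
```

The caller only ever uses `get` and `put`; the store interaction happens behind the cache API, and write-behind trades a window of unsaved updates for fewer store writes.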
Data Distribution
 Why distribute data?
 Data size can go beyond node limits
 Load beyond node processing limits
 Solutions:
 partition the dataset
 Migrate to distributed database
 Both involve a set of nodes: the topology
Data Distribution Soln.
 Distribution Requirements:
 Algorithm
 Distribution Uniformity
 Minimal disruption
 Approaches:
 Mod N
 Consistent Hashing
 Rendezvous(HRW)
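The "minimal disruption" requirement can be made concrete by counting how many keys change owner when a node is added. The sketch below compares Mod N with rendezvous (HRW) hashing; consistent hashing is left out for brevity. The SHA-256 based `stable_hash`, the node names and the key set are assumptions made for this illustration.

```python
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic hash (an assumed choice; any well-mixed hash would do)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def mod_n_owner(key, nodes):
    return nodes[stable_hash(key) % len(nodes)]


def rendezvous_owner(key, nodes):
    # Every node gets a score for the key; the highest score wins.
    return max(nodes, key=lambda node: stable_hash(f"{node}:{key}"))


def moved_fraction(owner_fn, keys, before, after):
    moved = sum(1 for k in keys if owner_fn(k, before) != owner_fn(k, after))
    return moved / len(keys)


keys = [f"key-{i}" for i in range(2000)]
before = ["n1", "n2", "n3", "n4"]
after = before + ["n5"]

mod_moved = moved_fraction(mod_n_owner, keys, before, after)
hrw_moved = moved_fraction(rendezvous_owner, keys, before, after)
```

With Mod N, a large majority of keys change owner when the node count changes. With rendezvous hashing, a key moves only if the new node wins its score contest, so the keys that move all go to the new node.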
Data Distribution in Ignite
 Mapping partition to node
 Rendezvous Hashing
 Cluster changes move partitions
 Mapping key to partition
 Mod N
 Partitions are fixed
 1024 by default
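The two-level mapping above can be sketched as follows: keys map to a fixed set of partitions with Mod N, and partitions map to nodes with rendezvous hashing. The hash function and helper names are assumptions for illustration; Ignite's real affinity function also handles backups and uses its own hashing.

```python
import hashlib

PARTITIONS = 1024  # fixed partition count, the default quoted above


def stable_hash(value: str) -> int:
    """Deterministic hash (an assumed stand-in)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def key_to_partition(key: str) -> int:
    # Mod N is safe here because the number of partitions never changes.
    return stable_hash(key) % PARTITIONS


def partition_to_node(partition: int, nodes):
    # Rendezvous hashing: the node with the highest score owns the partition.
    return max(nodes, key=lambda node: stable_hash(f"{node}/{partition}"))


def assignment(nodes):
    return {p: partition_to_node(p, nodes) for p in range(PARTITIONS)}


before = assignment(["n1", "n2", "n3"])
after = assignment(["n1", "n2"])  # node n3 leaves the cluster
reassigned = [p for p in range(PARTITIONS) if before[p] != after[p]]
```

A key's partition never depends on the topology; only the partition-to-node mapping changes, and only the partitions that belonged to the departed node are reassigned.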
Data Rebalancing
 Used when a new node joins the grid
 In-memory grids start rebalancing immediately
 Enabled manually when persistence is enabled
 Possibly more backups than configured in such scenarios
 Rebalance modes
 SYNC: cache calls are blocked until rebalancing completes
 ASYNC: rebalancing happens in the background; the cache responds immediately
 NONE: no rebalancing; the cache is loaded on demand or by explicit loading
Partition Map Exchange
 Triggered when partitions need to be moved across nodes
 A node joins/leaves the cluster
 A new cache is created/stopped
 An index is created etc.
 Cluster waits for ongoing operations
 Oldest/youngest node is coordinator
Native Storage Architecture
 Work directory
 Binary data : internal metadata
 Marshaller : marshaller info
 DB
 Lock file : used to ensure node lock
 Node dir.(s) : cache partitions
 cp dir. : checkpoint start/end markers
 WAL dir.
 Node dir.(s) : WAL segments
 Archive dir.
 Node dir.(s) : WAL segments
Dirty Pages
 Pages are always on disk, optionally in RAM
 Each cache update is written to RAM and appended to the WAL
 Cache operations cause dirty pages
 Dirty pages accumulate in RAM
 Checkpoint: a batch of dirty pages written to disk
 WAL file is cleared after a checkpoint
 Updates between checkpoints are logged
 Node crashes between checkpoints?
 WAL to the rescue
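The interplay of dirty pages, checkpoints and the WAL can be sketched with a toy page store. `PageStore` is hypothetical and follows the simplified description on this slide (the WAL is cleared at each checkpoint); it is not Ignite's storage engine.

```python
class PageStore:
    """Toy model of RAM pages, checkpointed disk pages and a write-ahead log."""

    def __init__(self):
        self.disk = {}      # checkpointed pages (survive a crash)
        self.wal = []       # append-only update log (survives a crash)
        self.ram = {}       # in-memory pages (lost on a crash)
        self.dirty = set()  # pages changed since the last checkpoint

    def update(self, page_id, value):
        # Log first, then change the page in RAM and mark it dirty.
        self.wal.append((page_id, value))
        self.ram[page_id] = value
        self.dirty.add(page_id)

    def checkpoint(self):
        # Write the batch of dirty pages to disk, then clear the log.
        for page_id in self.dirty:
            self.disk[page_id] = self.ram[page_id]
        self.dirty.clear()
        self.wal.clear()

    def crash_and_recover(self):
        # RAM content is gone: start from the last checkpoint and replay the WAL.
        self.ram = dict(self.disk)
        self.dirty.clear()
        for page_id, value in self.wal:
            self.ram[page_id] = value
            self.dirty.add(page_id)


store = PageStore()
store.update("p1", "v1")
store.checkpoint()
store.update("p1", "v2")  # logged, but not yet checkpointed
store.update("p2", "x")
store.crash_and_recover()
```

After recovery the in-memory state again contains the updates made after the last checkpoint, because they were replayed from the log.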
Apache Ignite ~ Cassandra
 Insert and update performance is comparable
 Read and mixed (read + update) workloads are 2x+ better in Ignite
 Cassandra UPDATE outperforms under high load
 Cassandra demands upfront query patterns
 Major model changes or new tables are needed if
 Queries change
 New queries with different requirements are needed
 Ignite supports collocated and non-collocated joins, hence
 Queries can be written just like old-school SQL
 No major changes required except creating a few indexes if needed
 Check reference slide for more
Next steps
 Read docs
 Get hands dirty with ignite
 Explore queries
 Ignite compute tasks
 Native persistence
 Third party persistence
References
 https://ignite.apache.org/docs/latest/
 https://www.youtube.com/watch?v=eMs_2vEsbBk
 https://dzone.com/articles/apache-ignite-client-connectors-variety
 https://apacheignite.readme.io/docs/leader-election
 https://cwiki.apache.org/confluence/display/IGNITE/%28Partition+Map%29+Exchange+-+under+the+hood
 https://data-science-blog.com/blog/2020/09/25/in-memory-data-grid-vs-distributed-cache-which-is-best/
 https://hazelcast.com/blog/imdg-vs-imdb-a-business-level-perspective/
 https://www.gridgain.com/resources/blog/apacher-ignitetm-and-apacher-cassandratm-benchmarks-power-in-memory-computing
Questions

Apache Ignite In-Memory Data Grid Agenda and Overview

  • 1. Apache Ignite (Surinder, 2nd March 2022)
  • 2. Agenda
      • Setting up context
      • Cache evolution
      • Apache Ignite
      • Data queries
      • Compute
      • Data partitioning
      • Eviction policies
      • Performance comparison
  • 3. Stream Consuming Application
      • Too many reads, writes and updates hit the database
      • Limited database connections
      • Can slow down the stream under load
  • 4. Stream Consuming Application: 1
      • Cache serves as the first data layer
      • Cache manages persisting data to the database
      • Processing is much faster because there is no direct DB access
  • 5. Stream Consuming Application cont…
      • Cache serves as a first-class in-memory database
      • Cache manages persisting data to native storage
      • No DB connections or mechanism overhead
  • 7. Cache Evolution
      • Distributed caches
          • Shared cache for app instances
          • Beyond local RAM capacity
          • Ease of maintenance
          • No auto sync with DB (yes/no)?
      • In-app caches
          • Cache results
          • More responsive application
          • Reduce load on DB
          • Limited to local RAM size
  • 8. Cache Evolution: Data grids
      • Benefits
          • Distributed caches with brains
          • Compute capabilities
          • DB read/write-through
          • Collocated processing
          • Better scalability
  • 9. Cache Evolution: In-memory computing
      • Memory-centric storage
      • Scalable to store data in TBs
      • SQL and transactions support
      • Collocate related data
      • DB read/write-through
      • Pluggable to external databases
      • Native storage on disk
      • No RAM warm-up
      • Compute capabilities
      • MapReduce
      • Collocated processing
      • Better scalability
  • 10. What is Apache Ignite?
      • A distributed cache
      • A distributed in-memory data grid
      • A distributed in-memory database
      • High-performance in-memory computing
      • ANSI-99 SQL compliant
      • Transactional operations
      • SQL transactions in beta
  • 11. Ignite cluster
      • Group of nodes
      • Node types:
          • Server: stores data, baseline node
          • Thick client node: doesn't store data
          • Thin client node: not part of the cluster
      • Attribute-based grouping possible
      • Scalable
      • Fault tolerant
      • Data consistency
      • Demo
  • 12. Data Grid
      • Distributed in-memory caching
      • Read/write-through
      • Data consistency
      • Off-heap storage
      • Distributed SQL
      • ACID support
      • Transactions
  • 13. Cache Modes
      • Keep required backup
      • Everyone knows everything
  • 14. Cache Queries…
      • Scan query: returns data matching a BiPredicate
          • The predicate is sent to each node
          • Each node scans its cache
          • Data is consolidated by the requesting node
      • SQL query: loads data based on the given SQL
          • Needs indexing to be enabled
          • Register indexing in the config
          • Annotations for field visibility
      • Other queries:
          • Text query
          • Index query
          • Continuous query
  • 15. Data Partitioning
      • Partitioned caches
      • Backups
          • Ensure data availability on node failures
          • Reads go to a backup node when the primary node leaves
      • Demo
  • 16. Demo Queries
      • Scan query
      • SQL query
      • Data collocation
      • Next week: this slide onwards
  • 17. Data collocation
      • Collocate related data for performance
      • All employees of a dept. can be stored together
      • Affinity on the dept. attribute
      • Only a key attribute can be used as the affinity key
      • Performant CRUD operations
      • Avoids network trips
      • Reduced latency
      • Can cause hot nodes if used inappropriately
  • 18. Compute Tasks
      • Run distributed computations on the grid
      • Tasks can be run on selected nodes
      • Ignite handles the task management
      • E.g. node-specific aggregates
          • List each dept.'s students stored on each node
      • Can be parallelized
  • 19. Continuous Queries
      • Exactly-once processing semantics
      • 3 basic components
          • Cache to monitor for updates
          • Remote filter to look for data changes
          • Local listener to act upon data changes
      • Optional initial query to process initial data
      • Used to capture data changes on a cache
      • Use case: reacting to a cache entry change
          • Listen for a particular state of a cache value
          • Process the state
          • Move to the next state
  • 20. Eviction Policies
      • On heap [cache level]
          • LRU: recommended when in doubt
          • FIFO: ignores the element access order
          • Sorted: entries are ordered by key
      • Off heap [data region level]
          • Random-LRU
          • Random-2-LRU
      • Persistence on [page replacement]
          • Random-LRU
          • Segmented-LRU
          • Clock
  • 21. Persistent Store
      • CacheStoreAdapter can be extended
      • Read-through
      • Write-through
      • Write-behind
      • Works behind the cache APIs
  • 22. Data Distribution
      • Why distribute data?
          • Data size can go beyond a node's limits
          • Load can go beyond a node's processing limits
      • Solutions:
          • Partition the dataset
          • Migrate to a distributed database
      • Both will have a set of nodes: the topology
  • 23. Data Distribution Soln.
      • Distribution requirements:
          • Algorithm
          • Distribution uniformity
          • Minimal disruption
      • Approaches:
          • Mod N
          • Consistent hashing
          • Rendezvous (HRW)
  • 24. Data Distribution in Ignite
      • Mapping partition to node
          • Rendezvous hashing
          • Cluster changes move partitions
      • Mapping key to partition
          • Mod N
          • Partitions are fixed
          • 1024 by default
  • 25. Data Rebalancing
      • Used when a new node joins the grid
      • In-memory grids start rebalancing immediately
      • Enabled manually when persistence is enabled
      • Possibly more backups than configured in such scenarios
      • Rebalance modes
          • SYNC: cache calls are blocked until rebalancing is completed
          • ASYNC: rebalancing happens in the background; the cache responds immediately
          • NONE: no rebalancing; the cache is loaded on demand when required or by explicit loading
  • 26. Partition Map Exchange
      • Triggered when partitions need to be moved across nodes
          • A node joins/leaves the cluster
          • A new cache is created/stopped
          • An index is created, etc.
      • Cluster waits for ongoing operations
      • Oldest/youngest node is coordinator
  • 27. Native Storage Architecture
      • Work directory
          • Binary data: internal metadata
          • Marshaller: marshaller info
          • DB
              • Lock file: used to ensure node lock
              • Node dir(s): cache partitions
              • cp dir: checkpoint start/end markers
          • WAL dir
              • Node dir(s): WAL segments
          • Archive dir
              • Node dir(s): WAL segments
  • 28. Dirty Pages
      • Pages are always on disk, optionally in RAM
      • Each cache update is written to RAM and appended to the WAL
      • Cache operations cause dirty pages
      • Dirty pages accumulate in RAM
      • Checkpoint: a batch of dirty pages written to disk
      • WAL file is cleared after a checkpoint
      • Updates between checkpoints are logged
      • Node crashes between checkpoints?
          • WAL to the rescue
  • 29. Apache Ignite ~ Cassandra
      • Insert and update performance is comparable
      • Read and mixed (read + update) performance is 2x+ better in Ignite
      • Cassandra UPDATE outperforms under high load
      • Cassandra demands upfront query patterns
      • Major model changes/new tables are needed if
          • Query changes are required
          • New queries with different requirements are needed
      • Ignite supports collocated/non-collocated joins and hence
          • Queries can be created just like old-school SQL
          • No major changes are required except creating a few indexes if needed
      • Check the reference slide for more
  • 30. Next steps
      • Read the docs
      • Get hands dirty with Ignite
      • Explore queries
      • Ignite compute tasks
      • Native persistence
      • Third-party persistence
  • 31. References
      • https://ignite.apache.org/docs/latest/
      • https://www.youtube.com/watch?v=eMs_2vEsbBk
      • https://dzone.com/articles/apache-ignite-client-connectors-variety
      • https://apacheignite.readme.io/docs/leader-election
      • https://cwiki.apache.org/confluence/display/IGNITE/%28Partition+Map%29+Exchange+-+under+the+hood
      • https://data-science-blog.com/blog/2020/09/25/in-memory-data-grid-vs-distributed-cache-which-is-best/
      • https://hazelcast.com/blog/imdg-vs-imdb-a-business-level-perspective/
      • https://www.gridgain.com/resources/blog/apacher-ignitetm-and-apacher-cassandratm-benchmarks-power-in-memory-computing
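
The two-level mapping from slide 24 (key to partition by Mod N, partition to node by rendezvous hashing) can be sketched in plain Python. This is an illustration of the technique only, not Ignite's implementation: the SHA-256 based `stable_hash`, the function names and the node names are all assumptions made for the example, and backups are left out.

```python
from __future__ import annotations

import hashlib

PARTITIONS = 1024  # default partition count quoted on slide 24


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash (assumption: any stable hash works here)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def partition_for_key(key: str, partitions: int = PARTITIONS) -> int:
    """Key -> partition: Mod N over a fixed number of partitions."""
    return stable_hash(key) % partitions


def node_for_partition(partition: int, nodes: list[str]) -> str:
    """Partition -> node: rendezvous (HRW) hashing.

    Each node is scored against the partition and the highest score wins.
    """
    return max(nodes, key=lambda node: stable_hash(f"{node}:{partition}"))


def assignment(nodes: list[str], partitions: int = PARTITIONS) -> dict[int, str]:
    """Primary owner of every partition for the given topology."""
    return {p: node_for_partition(p, nodes) for p in range(partitions)}


if __name__ == "__main__":
    # Hypothetical three-node topology; node-c then leaves the cluster.
    before = assignment(["node-a", "node-b", "node-c"])
    after = assignment(["node-a", "node-b"])
    moved = [p for p in before if before[p] != after[p]]
    # Only partitions that node-c owned get a new owner.
    print(all(before[p] == "node-c" for p in moved))
    print(f"{len(moved)} of {PARTITIONS} partitions moved")
```

Because the partition count is fixed, a key never changes partition when the topology changes; only the partition-to-node mapping is recomputed. With rendezvous hashing, the scores of the surviving nodes stay the same, so partitions move only away from the node that left, which is the "minimal disruption" requirement listed on slide 23.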

Editor's Notes

  1. https://ignite.apache.org/docs/latest/memory-configuration/replacement-policies
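
As a rough illustration of the Random-LRU idea named on slide 20, the sketch below evicts by sampling a few pages and dropping the one with the oldest access timestamp, instead of maintaining a full LRU ordering. It is a toy model under stated assumptions: the class name, the logical clock and the default sample size of 5 are chosen for the example and are not taken from Ignite's page memory code.

```python
from __future__ import annotations

import random


class RandomLruPages:
    """Toy page table that evicts by sampling rather than exact LRU order."""

    def __init__(self, capacity: int, sample_size: int = 5, seed: int | None = None):
        self.capacity = capacity
        self.sample_size = sample_size  # assumption: small fixed sample
        self.last_access: dict[int, int] = {}  # page id -> logical timestamp
        self.clock = 0
        self.rng = random.Random(seed)

    def touch(self, page_id: int) -> int | None:
        """Access a page; return the id of the page evicted to make room, if any."""
        evicted = None
        if page_id not in self.last_access and len(self.last_access) >= self.capacity:
            evicted = self._evict()
        self.clock += 1
        self.last_access[page_id] = self.clock
        return evicted

    def _evict(self) -> int:
        # Sample a few resident pages and evict the least recently used of them.
        count = min(self.sample_size, len(self.last_access))
        candidates = self.rng.sample(list(self.last_access), count)
        victim = min(candidates, key=self.last_access.__getitem__)
        del self.last_access[victim]
        return victim


if __name__ == "__main__":
    pages = RandomLruPages(capacity=3, seed=1)
    for page_id in (1, 2, 3, 1, 4):
        evicted = pages.touch(page_id)
        print(f"touch {page_id}: evicted {evicted}")
```

When the sample covers every resident page, as in the usage above, the policy behaves like exact LRU. With a smaller sample it only approximates LRU, trading accuracy for not having to keep a global ordering of pages.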