SlideShare a Scribd company logo
1 of 62
Download to read offline
Troubleshooting Apache® Ignite™
Customer Solutions, GridGain
Stan Lukyanov
2019 © GridGain Systems
GridGain and Apache Ignite
GridGain In-Memory Computing Platform
In-Memory
Data Grid
In-Memory
Database
Streaming
Analytics
Continuous
Learning Framework
Segmentation
Protection
Data Center
Replication
Monitoring &
Management
Enterprise
Security
Rolling
Upgrades
Point-in-Time
Recovery
Heterogenous
Recovery
Full, Incremental,
Continuous Backups
Network
Backups
1
2019 © GridGain Systems
Apache Ignite Support – Faster Time to Reliable Ignite
• Get up and running faster with
2 hours initial consultation
• Ensure fast, reliable Ignite with
unlimited 9x5 global support
– Unlimited web/e-mail support
– Identify bugs, workarounds
– Troubleshoot performance,
reliability issues
https://www.gridgain.com/products/services/support/support-apache-ignite
2
2019 © GridGain Systems
Agenda
3
• Monitoring
– Logging
– JMX
– GridGain Web Console
• Troubleshooting
– Network
– Storage
– Performance
• Checklist
2019 © GridGain Systems4
Monitoring
2019 © GridGain Systems
Monitoring
5
Setup it before something bad happens!
2019 © GridGain Systems
Agenda
6
• Monitoring
– Logging
– JMX
– GridGain Web Console
• Troubleshooting
– Network
– Storage
– Performance
• Checklist
2019 © GridGain Systems
Logging
7
What can you do with logs
• Manually check nodes state
• Identify issues with cluster configuration
• Add automatic parsing to report issues on the fly
– With custom or third-party tools
• Provide to GridGain Support experts
2019 © GridGain Systems
Logging
8
Configuring GridGain logs
• https://apacheignite.readme.io/docs/logging
• Use log4j 2.x integration
– Other options: j.u.logging, log4j 1.x, slf4j, custom integration
• Start Ignite in verbose mode
– ignite.sh –v
– java -DIGNITE_QUIET=false –cp ...
2019 © GridGain Systems
Logging
9
Quiet log
2019 © GridGain Systems
Logging
10
Configuring GC logs
• https://apacheignite.readme.io/docs/jvm-and-system-tuning
• Crucial to troubleshoot a lot of issues
• To configure
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
-Xloggc:/path/to/gc/logs/log.txt
2019 © GridGain Systems
Logging
11
How to manage logs
• Choose location for the log files
– The default location is ${IGNITE_HOME}/work/log/ignite.log
– Use a local disk with enough space (2 GB+)
– Don’t use /tmp!
– Good idea to store GridGain, application and GC logs together
• Archive old log files periodically to save on storage space
• Try to save logs for the cluster’s current uptime
2019 © GridGain Systems
Logging
Run-time configuration changes
• Helpful when you need to debug a
running deployment
• Easy to do with log4j 2.x
– Edit the log4j config file directly
• https://logging.apache.org/log4j/2.x/manual/c
onfiguration.html#AutomaticReconfiguration
– Use JMX – log4j has its own bean
• https://logging.apache.org/log4j/2.x/manual/j
mx.html
12
2019 © GridGain Systems
Agenda
13
• Monitoring
– Logging
– JMX
– GridGain Web Console
• Troubleshooting
– Network
– Storage
– Performance
2019 © GridGain Systems
JMX
14
What can you do with JMX
• Check state of various grid subsystems: storage, thread pools, etc
• Monitor metrics changing over time
• Add automatic warnings on important metrics
– With custom or third-party tools
2019 © GridGain Systems
JMX
15
How to leverage JMX beans
• Standard way to monitor Java
applications
– A lot of tools on the market
– JConsole and VisualVM are
bundled with JDK
2019 © GridGain Systems
GridGain Metrics
16
Name Description JMX Query Default
Cluster metrics Basic node information org.apache:clsLdr=*,group=Kernal,name=ClusterMetricsMXBeanImpl
org.apache:clsLdr=*,group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl
Enabled
Data region metrics Memory information org.apache:clsLdr=*,group=DataRegionMetrics,name=<region_name> Disabled
Data storage metrics Persistent storage
information
org.apache:clsLdr=*,group="Persistent Store",name=DataStorageMetrics Disabled
Cache metrics Cache statistics org.apache:clsLdr=*,group=<cache_name>,name="org.apache.ignite.internal.proc
essors.cache.CacheClusterMetricsMXBeanImpl“
org.apache:clsLdr=*,group=<cache_name>,name="org.apache.ignite.internal.proc
essors.cache.CacheLocalMetricsMXBeanImpl“
Disabled
Thread pool metrics Thread pools information org.apache:clsLdr=*,group="Thread Pools",name=<thread_pool_name> Enabled
• https://apacheignite.readme.io/docs/cluster-groups#section-cluster-group-metrics
• https://apacheignite.readme.io/docs/memory-metrics
• https://apacheignite.readme.io/docs/cache-metrics
2019 © GridGain Systems
Agenda
17
• Monitoring
– Logging
– JMX
– GridGain Web Console
• Troubleshooting
– Network
– Storage
– Performance
2019 © GridGain Systems
GridGain Tools
18
Web Console Visor GUI
• Basic version in
Ignite
• Enhanced version in
GridGain
• All-in-one solution
• Web-based
• The richest
functionality
• Available in
GridGain
• All-in-one solution
• Runs on a user’s PC
Command line tools
• Available in Ignite
• Multiple scripts
– Visor CMD
– control.sh
– SQLLine
2019 © GridGain Systems
GridGain Web Console
19
What can be done
• Manage multiple GridGain clusters through a web interface
– Monitor metrics changing over time
– Watch logs and thread dumps of the nodes
– Execute and monitor running queries
• Generate cluster configurations and RDBMS integrations
• Available online to try at https://console.gridgain.com/
• On-premise installation for production using Docker or bare metal
– https://apacheignite-tools.readme.io/docs/docker-deployment
2019 © GridGain Systems
GridGain Web Console
20
2019 © GridGain Systems
GridGain Web Console
21
2019 © GridGain Systems
GridGain Web Console
22
2019 © GridGain Systems
GridGain Web Console
23
2019 © GridGain Systems24
Troubleshooting
2019 © GridGain Systems
Agenda
25
• Monitoring
– Logging
– JMX
– GridGain Web Console
• Troubleshooting
– Network
– Storage
– Performance
• Checklist
2019 © GridGain Systems
Network troubleshooting
26
• Nodes not joining the cluster
• Nodes joining a wrong cluster
• Nodes taking a long time to connect
• Node is kicked out from the cluster
2019 © GridGain Systems
Nodes not joining the cluster
27
Symptom
• Started nodes don’t join in a single cluster – each has a cluster of its own instead
• Topology snapshot is “ver=1, <...> servers=1” on both nodes
2019 © GridGain Systems
Nodes not joining the cluster
28
Possible cause and solution
• Connection issues, firewall, etc
– Check via ping, check firewall settings
– Make sure TCP ports are open: 47500-47509, 47100-47109, 11211, 10800
• IP Finder configuration
– Use TcpDiscoveryVmIpFinder when you know all hosts in advance
• Make sure to list all IPs and ports
– A lot of other options: Kubernetes, cloud, etc
• https://apacheignite.readme.io/docs/kubernetes-deployment
• https://apacheignite.readme.io/docs/tcpip-discovery
2019 © GridGain Systems
Nodes joining a wrong cluster
29
Symptom
• Node started in a seemingly clean environment connects to some unknown cluster
• Topology snapshot is “ver=2, servers=2”
2019 © GridGain Systems
Nodes joining a wrong cluster
30
Possible cause and solution
• Multicast (TcpDiscoveryMulticastIpFinder) is used
– Only use it in closed subnets, or for a “Hello, world!”
• Wrong IPs in the configuration
– Check and fix IP finder configuration – see previous slide
2019 © GridGain Systems
Nodes taking a long time to connect
31
Symptom
• Cluster is taking a long time to start
– Or the first node is taking a long time to start
• After the nodes joined in the cluster performance is good
2019 © GridGain Systems
Nodes taking a long time to connect
32
Possible cause and solution
• Too many addresses in the IP finder
– Ignite will try to connect to each know IP
– Only list the addresses and ports you use
• Too high IgniteConfiguration.failureDetectionTimeout
– Basic timeout for most network operations
– Used for establishing a connection
– Reduce the failureDetectionTimeout to scan through the IP list faster
2019 © GridGain Systems
Node is kicked out from the cluster
33
Symptom
• Cluster nodes report that one of them has failed
[...][WARN ][disco-event-worker-#42][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=13169858-92db-48a3-bc4b-
db369cab6457, addrs=[…], sockAddrs=[…], discPort=47501,
order=2, intOrder=2, lastExchangeTime=1550680047403, loc=false,
ver=2.7.2#20190206-sha1:5f8f5488, isClient=false]
• “Local node segmented” messages on the failed node
[...][WARN ][disco-event-worker-#42][GridDiscoveryManager]
Local node SEGMENTED: TcpDiscoveryNode [id=13169858-92db-48a3-
bc4b-db369cab6457, addrs=[…], sockAddrs=[…], discPort=47501,
order=2, intOrder=2, lastExchangeTime=1550680109573, loc=true,
ver=2.7.2#20190206-sha1:5f8f5488, isClient=false]
2019 © GridGain Systems
Node is kicked out from the cluster
34
Possible cause and solution
• Likely a GC pause
• To confirm
– Check for “Possible too long JVM pause” messages in the logs
– Analyze GC logs around the time of the issue
• https://gceasy.io/ can help
2019 © GridGain Systems
Node is kicked out from the cluster
35
Possible cause and solution
• To handle GC issues
– Increase heap: -Xmx16g
– Try G1 with a latency target: -XX:+UseG1GC -XX:MaxGCPauseMillis=200
– Reduce heap pressure
• Use smaller SQL queries
▪ More about SQL later
• Use IgniteCache.withKeepBinary() to avoid deserialization
▪ https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/IgniteCache.html#withKeepBinary--
• Use on-heap caching with CacheConfiguration.copyOnRead=false
▪ https://apacheignite.readme.io/docs/memory-configuration#section-on-heap-caching
– https://apacheignite.readme.io/docs/jvm-and-system-tuning
• Remedy: change failureDetectionTimeout to allow a longer inactivity
period
2019 © GridGain Systems
Agenda
36
• Monitoring
– Logging
– JMX
– GridGain Web Console
• Troubleshooting
– Network
– Storage
– Performance
• Checklist
2019 © GridGain Systems
Storage troubleshooting
37
• “Out of memory” errors
• Lost data after node restart
• Incorrect/partial data returned by SQL
• Some nodes don’t store data
2019 © GridGain Systems
Out of memory
38
Symptom
• Three kinds of “out of memory” conditions
– Java’s OutOfMemoryError exception
– IgniteOutOfMemoryException exception
– OS killing the Ignite’s JVM
2019 © GridGain Systems
Out of memory
39
Possible cause and solution: OutOfMemoryError
• Heap is too small
– Increase –Xmx
• Large SQL queries are running
– Use lazy queries
• https://apacheignite-sql.readme.io/docs/performance-and-debugging#section-result-set-lazy-
loading
– Avoid many large queries running concurrently
– Split query to reduce data set size
• Before:
▪ SELECT * FROM PERSON ORDER BY AGE
• After:
▪ SELECT * FROM PERSON WHERE AGE < 40 ORDER BY AGE
▪ SELECT * FROM PERSON WHERE AGE >= 40 ORDER BY AGE
2019 © GridGain Systems
Out of memory
40
Possible cause and solution: IgniteOutOfMemoryException
• Data region is too small
IgniteOutOfMemoryException: Out of memory in data region [name=default,
initSize=256,0 MiB, maxSize=476,8 MiB, persistenceEnabled=false] Try the
following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies
– Increase DataRegionConfiguration.maxSize
• https://apacheignite.readme.io/docs/capacity-planning
– Use Native Persistence
• https://apacheignite.readme.io/docs/distributed-persistent-store
– Use DataRegionConfiguration.pageEvictionMode=RANDOM_2_LRU
• https://apacheignite.readme.io/docs/evictions
– Use an expiry policy
• https://apacheignite.readme.io/docs/expiry-policies
2019 © GridGain Systems
Out of memory
41
Possible cause and solution: OOM killer
• Total size of the processes is greater than RAM
– Check for message
Out of memory: Kill process <PID> (java) score <SCORE> or sacrifice
child
– Make sure that total size of the processes is within bounds
– Disable overcommit to have a more graceful, in-process errors
• sysctl -w vm.overcommit_memory=2
• https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
• May affect other applications, use carefully
2019 © GridGain Systems
Lost data after node restart
42
Symptom
• Some or all data was lost after one or several node restarts
2019 © GridGain Systems
Lost data after node restart
43
Possible cause and solution
• In-memory cache with zero backups
– Taking down a node will always lose data
• In-memory cache with one or more backups
– Don’t take down more nodes simultaneously than there are backups
– Wait for rebalance after bringing a node back online (use Web Console for that)
2019 © GridGain Systems
Incorrect/partial data returned by SQL
44
Symptom
• An SQL query with JOIN returns less data than expected
CREATE TABLE ORGANIZATION (ID INT PRIMARY KEY, NAME VARCHAR);
CREATE TABLE PERSON (ID INT, ORGID INT, NAME VARCHAR, PRIMARY KEY (ID, ORGID));
SELECT * FROM PERSON P JOIN ORGANIZATION O ON O.ORGID = O.ID;
2019 © GridGain Systems
Incorrect/partial data returned by SQL
45
Possible cause and solution
• Data isn’t collocated
– https://apacheignite.readme.io/docs/affinity-collocation
– Use affinity key to keep related data together
CREATE TABLE ORGANIZATION (ID INT PRIMARY KEY, NAME VARCHAR);
CREATE TABLE PERSON (ID INT, ORGID INT, NAME VARCHAR, PRIMARY KEY (ID, ORGID))
WITH “AFFINITY_KEY=ORGID”;
SELECT * FROM PERSON P JOIN ORGANIZATION O ON O.ORGID = O.ID;
– Use replicated tables – they’re collocated with all others
CREATE TABLE ORGANIZATION (ID INT PRIMARY KEY, NAME VARCHAR)
WITH “TEMPLATE=REPLICATED”;
CREATE TABLE PERSON (ID INT, ORGID INT, NAME VARCHAR, PRIMARY KEY (ID, ORGID));
SELECT * FROM PERSON P JOIN ORGANIZATION O ON O.ORGID = O.ID;
– Use distributedJoins=true to do JOINs without collocation (this is costly!)
2019 © GridGain Systems
Some nodes don’t store data
46
Symptom
• Native persistence is enabled
• Data is only stored on one
node/subset of nodes
• New nodes don’t store data
2019 © GridGain Systems
Some nodes don’t store data
47
Possible cause and solution
• Baseline topology doesn’t include some nodes
– https://apacheignite.readme.io/docs/baseline-topology
– Update the baseline after every long-term topology change (use Web Console for
that)
2019 © GridGain Systems
Agenda
48
• Monitoring
– Logging
– JMX
– GridGain Web Console
• Troubleshooting
– Network
– Storage
– Performance
• Checklist
2019 © GridGain Systems
Performance troubleshooting
49
• Throughput of persistent cache updates drops to zero
• Low throughput of SQL access
2019 © GridGain Systems
Throughput of cache updates drops to zero
50
Symptom
• Native persistence is enabled
• Write performance is good but there are periodic intervals of 0 ops/sec
2019 © GridGain Systems
Throughput of cache updates drops to zero
51
Possible cause and solution
• Updates in RAM happen faster than on disk – disk needs to catch up
– Enable write throttling
• DataStorageConfiguration.writeThrottlingEnabled=true
• Sacrifice peak performance for stable latency and throughput
– Increase checkpoint page buffer size
• Allow more pending updates in RAM
• https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Sto
re+-+under+the+hood
2019 © GridGain Systems
Low throughput of SQL access
52
Symptom
• SQL reads using indexed fields are slow
– Even for simple queries like SELECT * FROM FOO WHERE ID = 123
• Writes to an SQL-enabled cache are slow
– Even via key-value API
2019 © GridGain Systems
Low throughput of SQL access
53
Possible cause and solution
• A lot of possible causes – performance tuning is hard!
– https://apacheignite-sql.readme.io/docs/performance-and-debugging
• A common issue – indexes not fitting into maximum inline size
– Check for “Indexed columns of a row cannot be fully
inlined into index” warnings
– Do what the warning says
[2019-02-19T23:36:26,848][WARN ][main][H2TreeIndex] <CacheQueryExamplePersons> Indexed
columns of a row cannot be fully inlined into index what may lead to slowdown due to
additional data page reads, increase index inline size if needed (use INLINE_SIZE option
for CREATE INDEX command, QuerySqlField.inlineSize for annotated classes, or
QueryIndex.inlineSize for explicit QueryEntity configuration) [cacheName=PERSON,
tableName=CacheQueryExamplePersons, idxName=PERSON_FIRSTNAME_IDX, idxCols=(FIRSTNAME,
_KEY), idxType=SECONDARY, curSize=10, recommendedInlineSize=71]
2019 © GridGain Systems
Agenda
54
• Monitoring
– Logging
– JMX
– GridGain Web Console
• Troubleshooting
– Network
– Storage
– Performance
• Checklist
2019 © GridGain Systems55
Checklist
2019 © GridGain Systems
Checklist
56
Be prepared
• Setup monitoring before something happens, not after!
– Make sure logs are safely stored
– GridGain Web Console is an easy and effective way to monitor
• Properly configure the cluster
– Implement the suggestions from this presentation before you encounter the
issues they help with
• Know the common problems and how to react
– Logs often provide solutions
2019 © GridGain Systems
Checklist
57
Avoiding common problems
• Network-related issues are often caused by IP configuration
– Configure IP finder – TcpDiscoveryVmIpFinder is usually a good fit
• GC pauses cause performance and stability issues
– Use G1 GC, set a fitting heap size
– Reduce GC pressure by using lazy SQL, withKeepBinary() and on-
heap cache
• Insufficient memory will result in a crash
– Plan storage capacity in advance
– Assess required heap size during testing
– Account for other processes in the system
2019 © GridGain Systems
Checklist
58
Avoiding common problems
• Implement cluster administration processes
– For in-memory: wait for rebalance when starting or stopping nodes
– For persistence: update baseline for long-term topology changes
• Write performance may be unstable when update speed is more than the
disk speed
– Enable write throttling
• Consider performance suggestions reported in the log
– Index inline size suggestion is an example
2019 © GridGain Systems59
Questions?
2019 © GridGain Systems
Apache Ignite Resources
60
• Apache Ignite documentation
– https://apacheignite.readme.io/docs/
• Apache Ignite community resources
– user@ignite.apache.org – the mailing list
– https://ignite.apache.org/community/resources.html – other resources and
instructions
– http://apache-ignite-users.70518.x6.nabble.com/ – forum and archive
– https://stackoverflow.com/questions/tagged/ignite – StackOverflow questions
• Contact me!
– slukyanov@gridgain.com
– stanlukyanov@gmail.com
2019 © GridGain Systems
GridGain Resources
• White Papers
– Visit https://www.gridgain.com/resources/papers
• Webinars
– Visit https://www.gridgain.com/resources/webinars
• Videos
– Visit https://www.gridgain.com/resources/videos
• Free 30-Day Ultimate, Enterprise or Professional Edition Trial
– Visit https://www.gridgain.com/resources/download

More Related Content

What's hot

Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...HostedbyConfluent
 
Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance AnalysisBrendan Gregg
 
Docker, Linux Containers (LXC), and security
Docker, Linux Containers (LXC), and securityDocker, Linux Containers (LXC), and security
Docker, Linux Containers (LXC), and securityJérôme Petazzoni
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing GuideJose De La Rosa
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)DataStax Academy
 
RocksDB detail
RocksDB detailRocksDB detail
RocksDB detailMIJIN AN
 
Overview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/NeutronOverview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/Neutronvivekkonnect
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageKernel TLV
 
syzkaller: the next gen kernel fuzzer
syzkaller: the next gen kernel fuzzersyzkaller: the next gen kernel fuzzer
syzkaller: the next gen kernel fuzzerDmitry Vyukov
 
Introduction to redis - version 2
Introduction to redis - version 2Introduction to redis - version 2
Introduction to redis - version 2Dvir Volk
 
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-RegionJi-Woong Choi
 
Kernel Recipes 2017 - EBPF and XDP - Eric Leblond
Kernel Recipes 2017 - EBPF and XDP - Eric LeblondKernel Recipes 2017 - EBPF and XDP - Eric Leblond
Kernel Recipes 2017 - EBPF and XDP - Eric LeblondAnne Nicolas
 
Cumulus networks conversion guide
Cumulus networks conversion guideCumulus networks conversion guide
Cumulus networks conversion guideScott Suehle
 
Introducing OpenHPC Cross Platform Provisioning Assembly for Warewulf
Introducing OpenHPC Cross Platform Provisioning Assembly for WarewulfIntroducing OpenHPC Cross Platform Provisioning Assembly for Warewulf
Introducing OpenHPC Cross Platform Provisioning Assembly for WarewulfNaohiro Tamura
 
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...ScyllaDB
 

What's hot (20)

Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
 
Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance Analysis
 
Docker, Linux Containers (LXC), and security
Docker, Linux Containers (LXC), and securityDocker, Linux Containers (LXC), and security
Docker, Linux Containers (LXC), and security
 
Apache ZooKeeper
Apache ZooKeeperApache ZooKeeper
Apache ZooKeeper
 
Vagrant
Vagrant Vagrant
Vagrant
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
 
RocksDB detail
RocksDB detailRocksDB detail
RocksDB detail
 
Overview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/NeutronOverview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/Neutron
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast Storage
 
syzkaller: the next gen kernel fuzzer
syzkaller: the next gen kernel fuzzersyzkaller: the next gen kernel fuzzer
syzkaller: the next gen kernel fuzzer
 
Introduction to redis - version 2
Introduction to redis - version 2Introduction to redis - version 2
Introduction to redis - version 2
 
ClickHouse Keeper
ClickHouse KeeperClickHouse Keeper
ClickHouse Keeper
 
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
 
Kernel Recipes 2017 - EBPF and XDP - Eric Leblond
Kernel Recipes 2017 - EBPF and XDP - Eric LeblondKernel Recipes 2017 - EBPF and XDP - Eric Leblond
Kernel Recipes 2017 - EBPF and XDP - Eric Leblond
 
Cumulus networks conversion guide
Cumulus networks conversion guideCumulus networks conversion guide
Cumulus networks conversion guide
 
Introducing OpenHPC Cross Platform Provisioning Assembly for Warewulf
Introducing OpenHPC Cross Platform Provisioning Assembly for WarewulfIntroducing OpenHPC Cross Platform Provisioning Assembly for Warewulf
Introducing OpenHPC Cross Platform Provisioning Assembly for Warewulf
 
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
eBPF maps 101
eBPF maps 101eBPF maps 101
eBPF maps 101
 

Similar to Troubleshooting Apache® Ignite™

In-Memory Computing Essentials for Software Engineers
In-Memory Computing Essentials for Software EngineersIn-Memory Computing Essentials for Software Engineers
In-Memory Computing Essentials for Software EngineersDenis Magda
 
Container and Test Automation Management Practices in TrendMicro
Container and Test Automation Management Practices in TrendMicroContainer and Test Automation Management Practices in TrendMicro
Container and Test Automation Management Practices in TrendMicroJen-Chieh Ko
 
On Cloud Nine: How to be happy migrating your in-memory computing platform to...
On Cloud Nine: How to be happy migrating your in-memory computing platform to...On Cloud Nine: How to be happy migrating your in-memory computing platform to...
On Cloud Nine: How to be happy migrating your in-memory computing platform to...Stephen Darlington
 
Network Automation Journey, A systems engineer NetOps perspective
Network Automation Journey, A systems engineer NetOps perspectiveNetwork Automation Journey, A systems engineer NetOps perspective
Network Automation Journey, A systems engineer NetOps perspectiveWalid Shaari
 
DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data plat...
DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data plat...DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data plat...
DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data plat...DevOpsDays Riga
 
Kubernetes as data platform
Kubernetes as data platformKubernetes as data platform
Kubernetes as data platformLars Albertsson
 
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...NETWAYS
 
Viavi_TeraVM Core Emulator.pptx
Viavi_TeraVM Core Emulator.pptxViavi_TeraVM Core Emulator.pptx
Viavi_TeraVM Core Emulator.pptxmani723
 
Zendcon scaling magento
Zendcon scaling magentoZendcon scaling magento
Zendcon scaling magentoMathew Beane
 
ROI for IP Address Management (IPAM) Solutions
ROI for IP Address Management (IPAM) SolutionsROI for IP Address Management (IPAM) Solutions
ROI for IP Address Management (IPAM) SolutionsSolarWinds
 
November 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridNovember 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridYahoo Developer Network
 
QRadar_CEddfdfdsfdfdfdfdfdfdfdfdfdfdff.pdf
QRadar_CEddfdfdsfdfdfdfdfdfdfdfdfdfdff.pdfQRadar_CEddfdfdsfdfdfdfdfdfdfdfdfdfdff.pdf
QRadar_CEddfdfdsfdfdfdfdfdfdfdfdfdfdff.pdfmindhackers161
 
Digdag Updates 2020 July
Digdag Updates 2020 JulyDigdag Updates 2020 July
Digdag Updates 2020 JulyYou Yamagata
 
Distributed deep learning reference architecture v3.2l
Distributed deep learning reference architecture v3.2lDistributed deep learning reference architecture v3.2l
Distributed deep learning reference architecture v3.2lGanesan Narayanasamy
 
Kubecon seattle 2018 workshop slides
Kubecon seattle 2018 workshop slidesKubecon seattle 2018 workshop slides
Kubecon seattle 2018 workshop slidesWeaveworks
 
Grails 4: Upgrade your Game!
Grails 4: Upgrade your Game!Grails 4: Upgrade your Game!
Grails 4: Upgrade your Game!Zachary Klein
 
MuleSoft Manchester Meetup #2 slides 29th October 2019
MuleSoft Manchester Meetup #2 slides 29th October 2019MuleSoft Manchester Meetup #2 slides 29th October 2019
MuleSoft Manchester Meetup #2 slides 29th October 2019Ieva Navickaite
 

Similar to Troubleshooting Apache® Ignite™ (20)

In-Memory Computing Essentials for Software Engineers
In-Memory Computing Essentials for Software EngineersIn-Memory Computing Essentials for Software Engineers
In-Memory Computing Essentials for Software Engineers
 
Container and Test Automation Management Practices in TrendMicro
Container and Test Automation Management Practices in TrendMicroContainer and Test Automation Management Practices in TrendMicro
Container and Test Automation Management Practices in TrendMicro
 
On Cloud Nine: How to be happy migrating your in-memory computing platform to...
On Cloud Nine: How to be happy migrating your in-memory computing platform to...On Cloud Nine: How to be happy migrating your in-memory computing platform to...
On Cloud Nine: How to be happy migrating your in-memory computing platform to...
 
Network Automation Journey, A systems engineer NetOps perspective
Network Automation Journey, A systems engineer NetOps perspectiveNetwork Automation Journey, A systems engineer NetOps perspective
Network Automation Journey, A systems engineer NetOps perspective
 
DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data plat...
DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data plat...DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data plat...
DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data plat...
 
Kubernetes as data platform
Kubernetes as data platformKubernetes as data platform
Kubernetes as data platform
 
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...
OSDC 2018 | Apache Ignite - the in-memory hammer for your data science toolki...
 
Viavi_TeraVM Core Emulator.pptx
Viavi_TeraVM Core Emulator.pptxViavi_TeraVM Core Emulator.pptx
Viavi_TeraVM Core Emulator.pptx
 
Zendcon scaling magento
Zendcon scaling magentoZendcon scaling magento
Zendcon scaling magento
 
ROI for IP Address Management (IPAM) Solutions
ROI for IP Address Management (IPAM) SolutionsROI for IP Address Management (IPAM) Solutions
ROI for IP Address Management (IPAM) Solutions
 
November 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridNovember 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory grid
 
CFD on Power
CFD on Power CFD on Power
CFD on Power
 
QRadar_CEddfdfdsfdfdfdfdfdfdfdfdfdfdff.pdf
QRadar_CEddfdfdsfdfdfdfdfdfdfdfdfdfdff.pdfQRadar_CEddfdfdsfdfdfdfdfdfdfdfdfdfdff.pdf
QRadar_CEddfdfdsfdfdfdfdfdfdfdfdfdfdff.pdf
 
EVOLVE'15 | Maximize | Gary Gamitian | Informatica
EVOLVE'15 | Maximize | Gary Gamitian | InformaticaEVOLVE'15 | Maximize | Gary Gamitian | Informatica
EVOLVE'15 | Maximize | Gary Gamitian | Informatica
 
Digdag Updates 2020 July
Digdag Updates 2020 JulyDigdag Updates 2020 July
Digdag Updates 2020 July
 
Distributed deep learning reference architecture v3.2l
Distributed deep learning reference architecture v3.2lDistributed deep learning reference architecture v3.2l
Distributed deep learning reference architecture v3.2l
 
Securing Hadoop @eBay
Securing Hadoop @eBaySecuring Hadoop @eBay
Securing Hadoop @eBay
 
Kubecon seattle 2018 workshop slides
Kubecon seattle 2018 workshop slidesKubecon seattle 2018 workshop slides
Kubecon seattle 2018 workshop slides
 
Grails 4: Upgrade your Game!
Grails 4: Upgrade your Game!Grails 4: Upgrade your Game!
Grails 4: Upgrade your Game!
 
MuleSoft Manchester Meetup #2 slides 29th October 2019
MuleSoft Manchester Meetup #2 slides 29th October 2019MuleSoft Manchester Meetup #2 slides 29th October 2019
MuleSoft Manchester Meetup #2 slides 29th October 2019
 

More from Tom Diederich

Tom Diederich portfolio presentation (updated Nov. 18, 2016)
Tom Diederich portfolio presentation (updated Nov. 18, 2016)Tom Diederich portfolio presentation (updated Nov. 18, 2016)
Tom Diederich portfolio presentation (updated Nov. 18, 2016)Tom Diederich
 
How to build & grow online communities: with Tom Diederich
How to build & grow online communities: with Tom DiederichHow to build & grow online communities: with Tom Diederich
How to build & grow online communities: with Tom DiederichTom Diederich
 
How to build a production-ready in-memory-based application in 1 hour
How to build a production-ready in-memory-based application in 1 hourHow to build a production-ready in-memory-based application in 1 hour
How to build a production-ready in-memory-based application in 1 hourTom Diederich
 
Ingesting streaming data for analysis in apache ignite (stream sets theme)
Ingesting streaming data for analysis in apache ignite (stream sets theme)Ingesting streaming data for analysis in apache ignite (stream sets theme)
Ingesting streaming data for analysis in apache ignite (stream sets theme)Tom Diederich
 
IT Modernization in Practice
IT Modernization in PracticeIT Modernization in Practice
IT Modernization in PracticeTom Diederich
 
In-Memory Key Value Store (KVS) in FPGA for Ultra Low Latency and High Throug...
In-Memory Key Value Store (KVS) in FPGA for Ultra Low Latency and High Throug...In-Memory Key Value Store (KVS) in FPGA for Ultra Low Latency and High Throug...
In-Memory Key Value Store (KVS) in FPGA for Ultra Low Latency and High Throug...Tom Diederich
 
Machine learning and deep learning with Apache Ignite
Machine learning and deep learning with Apache IgniteMachine learning and deep learning with Apache Ignite
Machine learning and deep learning with Apache IgniteTom Diederich
 
Heimdall Data: "Increase Application Performance with SQL Auto-Caching; No Co...
Heimdall Data: "Increase Application Performance with SQL Auto-Caching; No Co...Heimdall Data: "Increase Application Performance with SQL Auto-Caching; No Co...
Heimdall Data: "Increase Application Performance with SQL Auto-Caching; No Co...Tom Diederich
 
Improving Apache Spark™ In-Memory Computing with Apache Ignite™
 Improving Apache Spark™ In-Memory Computing with Apache Ignite™ Improving Apache Spark™ In-Memory Computing with Apache Ignite™
Improving Apache Spark™ In-Memory Computing with Apache Ignite™Tom Diederich
 
Comparing Apache Ignite and Cassandra for Hybrid Transactional/Analytical Pro...
Comparing Apache Ignite and Cassandra for Hybrid Transactional/Analytical Pro...Comparing Apache Ignite and Cassandra for Hybrid Transactional/Analytical Pro...
Comparing Apache Ignite and Cassandra for Hybrid Transactional/Analytical Pro...Tom Diederich
 
“Building consistent and highly available distributed systems with Apache Ign...
“Building consistent and highly available distributed systems with Apache Ign...“Building consistent and highly available distributed systems with Apache Ign...
“Building consistent and highly available distributed systems with Apache Ign...Tom Diederich
 
Quick MySQL performance check
Quick MySQL performance checkQuick MySQL performance check
Quick MySQL performance checkTom Diederich
 

More from Tom Diederich (12)

Tom Diederich portfolio presentation (updated Nov. 18, 2016)
Tom Diederich portfolio presentation (updated Nov. 18, 2016)Tom Diederich portfolio presentation (updated Nov. 18, 2016)
Tom Diederich portfolio presentation (updated Nov. 18, 2016)
 
How to build & grow online communities: with Tom Diederich
How to build & grow online communities: with Tom DiederichHow to build & grow online communities: with Tom Diederich
How to build & grow online communities: with Tom Diederich
 
How to build a production-ready in-memory-based application in 1 hour
How to build a production-ready in-memory-based application in 1 hourHow to build a production-ready in-memory-based application in 1 hour
How to build a production-ready in-memory-based application in 1 hour
 
Ingesting streaming data for analysis in apache ignite (stream sets theme)
Ingesting streaming data for analysis in apache ignite (stream sets theme)Ingesting streaming data for analysis in apache ignite (stream sets theme)
Ingesting streaming data for analysis in apache ignite (stream sets theme)
 
IT Modernization in Practice
IT Modernization in PracticeIT Modernization in Practice
IT Modernization in Practice
 
In-Memory Key Value Store (KVS) in FPGA for Ultra Low Latency and High Throug...
In-Memory Key Value Store (KVS) in FPGA for Ultra Low Latency and High Throug...In-Memory Key Value Store (KVS) in FPGA for Ultra Low Latency and High Throug...
In-Memory Key Value Store (KVS) in FPGA for Ultra Low Latency and High Throug...
 
Machine learning and deep learning with Apache Ignite
Machine learning and deep learning with Apache IgniteMachine learning and deep learning with Apache Ignite
Machine learning and deep learning with Apache Ignite
 
Heimdall Data: "Increase Application Performance with SQL Auto-Caching; No Co...
Heimdall Data: "Increase Application Performance with SQL Auto-Caching; No Co...Heimdall Data: "Increase Application Performance with SQL Auto-Caching; No Co...
Heimdall Data: "Increase Application Performance with SQL Auto-Caching; No Co...
 
Improving Apache Spark™ In-Memory Computing with Apache Ignite™
 Improving Apache Spark™ In-Memory Computing with Apache Ignite™ Improving Apache Spark™ In-Memory Computing with Apache Ignite™
Improving Apache Spark™ In-Memory Computing with Apache Ignite™
 
Comparing Apache Ignite and Cassandra for Hybrid Transactional/Analytical Pro...
Comparing Apache Ignite and Cassandra for Hybrid Transactional/Analytical Pro...Comparing Apache Ignite and Cassandra for Hybrid Transactional/Analytical Pro...
Comparing Apache Ignite and Cassandra for Hybrid Transactional/Analytical Pro...
 
“Building consistent and highly available distributed systems with Apache Ign...
“Building consistent and highly available distributed systems with Apache Ign...“Building consistent and highly available distributed systems with Apache Ign...
“Building consistent and highly available distributed systems with Apache Ign...
 
Quick MySQL performance check
Quick MySQL performance checkQuick MySQL performance check
Quick MySQL performance check
 

Recently uploaded

Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noidabntitsolutionsrishis
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 

Recently uploaded (20)

2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 

Troubleshooting Apache® Ignite™

  • 1. Troubleshooting Apache® Ignite™ Customer Solutions, GridGain Stan Lukyanov
  • 2. 2019 © GridGain Systems GridGain and Apache Ignite GridGain In-Memory Computing Platform In-Memory Data Grid In-Memory Database Streaming Analytics Continuous Learning Framework Segmentation Protection Data Center Replication Monitoring & Management Enterprise Security Rolling Upgrades Point-in-Time Recovery Heterogenous Recovery Full, Incremental, Continuous Backups Network Backups 1
  • 3. 2019 © GridGain Systems Apache Ignite Support – Faster Time to Reliable Ignite • Get up and running faster with 2 hours initial consultation • Ensure fast, reliable Ignite with unlimited 9x5 global support – Unlimited web/e-mail support – Identify bugs, workarounds – Troubleshoot performance, reliability issues https://www.gridgain.com/products/services/support/support-apache-ignite 2
  • 4. 2019 © GridGain Systems Agenda 3 • Monitoring – Logging – JMX – GridGain Web Console • Troubleshooting – Network – Storage – Performance • Checklist
  • 5. 2019 © GridGain Systems4 Monitoring
  • 6. 2019 © GridGain Systems Monitoring 5 Setup it before something bad happens!
  • 7. 2019 © GridGain Systems Agenda 6 • Monitoring – Logging – JMX – GridGain Web Console • Troubleshooting – Network – Storage – Performance • Checklist
  • 8. 2019 © GridGain Systems Logging 7 What can you do with logs • Manually check nodes state • Identify issues with cluster configuration • Add automatic parsing to report issues on the fly – With custom or third-party tools • Provide to GridGain Support experts
  • 9. 2019 © GridGain Systems Logging 8 Configuring GridGain logs • https://apacheignite.readme.io/docs/logging • Use log4j 2.x integration – Other options: j.u.logging, log4j 1.x, slf4j, custom integration • Start Ignite in verbose mode – ignite.sh –v – java -DIGNITE_QUIET=false –cp ...
  • 10. 2019 © GridGain Systems Logging 9 Quiet log
  • 11. 2019 © GridGain Systems Logging 10 Configuring GC logs • https://apacheignite.readme.io/docs/jvm-and-system-tuning • Crucial to troubleshoot a lot of issues • To configure -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -Xloggc:/path/to/gc/logs/log.txt
  • 12. 2019 © GridGain Systems Logging 11 How to manage logs • Choose location for the log files – The default location is ${IGNITE_HOME}/work/log/ignite.log – Use a local disk with enough space (2 GB+) – Don’t use /tmp! – Good idea to store GridGain, application and GC logs together • Archive old log files periodically to save on storage space • Try to save logs for the cluster’s current uptime
  • 13. 2019 © GridGain Systems Logging Run-time configuration changes • Helpful when you need to debug a running deployment • Easy to do with log4j 2.x – Edit the log4j config file directly • https://logging.apache.org/log4j/2.x/manual/c onfiguration.html#AutomaticReconfiguration – Use JMX – log4j has its own bean • https://logging.apache.org/log4j/2.x/manual/j mx.html 12
  • 14. 2019 © GridGain Systems Agenda 13 • Monitoring – Logging – JMX – GridGain Web Console • Troubleshooting – Network – Storage – Performance
  • 15. 2019 © GridGain Systems JMX 14 What can you do with JMX • Check state of various grid subsystems: storage, thread pools, etc • Monitor metrics changing over time • Add automatic warnings on important metrics – With custom or third-party tools
  • 16. 2019 © GridGain Systems JMX 15 How to leverage JMX beans • Standard way to monitor Java applications – A lot of tools on the market – JConsole and VisualVM are bundled with JDK
  • 17. 2019 © GridGain Systems GridGain Metrics 16 Name Description JMX Query Default Cluster metrics Basic node information org.apache:clsLdr=*,group=Kernal,name=ClusterMetricsMXBeanImpl org.apache:clsLdr=*,group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl Enabled Data region metrics Memory information org.apache:clsLdr=*,group=DataRegionMetrics,name=<region_name> Disabled Data storage metrics Persistent storage information org.apache:clsLdr=*,group="Persistent Store",name=DataStorageMetrics Disabled Cache metrics Cache statistics org.apache:clsLdr=*,group=<cache_name>,name="org.apache.ignite.internal.proc essors.cache.CacheClusterMetricsMXBeanImpl“ org.apache:clsLdr=*,group=<cache_name>,name="org.apache.ignite.internal.proc essors.cache.CacheLocalMetricsMXBeanImpl“ Disabled Thread pool metrics Thread pools information org.apache:clsLdr=*,group="Thread Pools",name=<thread_pool_name> Enabled • https://apacheignite.readme.io/docs/cluster-groups#section-cluster-group-metrics • https://apacheignite.readme.io/docs/memory-metrics • https://apacheignite.readme.io/docs/cache-metrics
  • 18. 2019 © GridGain Systems Agenda 17 • Monitoring – Logging – JMX – GridGain Web Console • Troubleshooting – Network – Storage – Performance
  • 19. 2019 © GridGain Systems GridGain Tools 18 Web Console Visor GUI • Basic version in Ignite • Enhanced version in GridGain • All-in-one solution • Web-based • The richest functionality • Available in GridGain • All-in-one solution • Runs on a user’s PC Command line tools • Available in Ignite • Multiple scripts – Visor CMD – control.sh – SQLLine
  • 20. 2019 © GridGain Systems GridGain Web Console 19 What can be done • Manage multiple GridGain clusters through a web interface – Monitor metrics changing over time – Watch logs and thread dumps of the nodes – Execute and monitor running queries • Generate cluster configurations and RDBMS integrations • Available online to try at https://console.gridgain.com/ • On-premise installation for production using Docker or bare metal – https://apacheignite-tools.readme.io/docs/docker-deployment
  • 21. 2019 © GridGain Systems GridGain Web Console 20
  • 22. 2019 © GridGain Systems GridGain Web Console 21
  • 23. 2019 © GridGain Systems GridGain Web Console 22
  • 24. 2019 © GridGain Systems GridGain Web Console 23
  • 25. 2019 © GridGain Systems24 Troubleshooting
  • 26. 2019 © GridGain Systems Agenda 25 • Monitoring – Logging – JMX – GridGain Web Console • Troubleshooting – Network – Storage – Performance • Checklist
  • 27. 2019 © GridGain Systems Network troubleshooting 26 • Nodes not joining the cluster • Nodes joining a wrong cluster • Nodes taking a long time to connect • Node is kicked out from the cluster
  • 28. 2019 © GridGain Systems Nodes not joining the cluster 27 Symptom • Started nodes don’t join in a single cluster – each has a cluster of its own instead • Topology snapshot is “ver=1, <...> servers=1” on both nodes
  • 29. 2019 © GridGain Systems Nodes not joining the cluster 28 Possible cause and solution • Connection issues, firewall, etc – Check via ping, check firewall settings – Make sure TCP ports are open: 47500-47509, 47100-47109, 11211, 10800 • IP Finder configuration – Use TcpDiscoveryVmIpFinder when you know all hosts in advance • Make sure to list all IPs and ports – A lot of other options: Kubernetes, cloud, etc • https://apacheignite.readme.io/docs/kubernetes-deployment • https://apacheignite.readme.io/docs/tcpip-discovery
  • 30. 2019 © GridGain Systems Nodes joining a wrong cluster 29 Symptom • Node started in a seemingly clean environment connects to some unknown cluster • Topology snapshot is “ver=2, servers=2”
  • 31. 2019 © GridGain Systems Nodes joining a wrong cluster 30 Possible cause and solution • Multicast (TcpDiscoveryMulticastIpFinder) is used – Only use it in closed subnets, or for a “Hello, world!” • Wrong IPs in the configuration – Check and fix IP finder configuration – see previous slide
  • 32. 2019 © GridGain Systems Nodes taking a long time to connect 31 Symptom • Cluster is taking a long time to start – Or the first node is taking a long time to start • After the nodes joined in the cluster performance is good
  • 33. 2019 © GridGain Systems Nodes taking a long time to connect 32 Possible cause and solution • Too many addresses in the IP finder – Ignite will try to connect to each know IP – Only list the addresses and ports you use • Too high IgniteConfiguration.failureDetectionTimeout – Basic timeout for most network operations – Used for establishing a connection – Reduce the failureDetectionTimeout to scan through the IP list faster
  • 34. 2019 © GridGain Systems Node is kicked out from the cluster 33 Symptom • Cluster nodes report that one of them has failed [...][WARN ][disco-event-worker-#42][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=13169858-92db-48a3-bc4b- db369cab6457, addrs=[…], sockAddrs=[…], discPort=47501, order=2, intOrder=2, lastExchangeTime=1550680047403, loc=false, ver=2.7.2#20190206-sha1:5f8f5488, isClient=false] • “Local node segmented” messages on the failed node [...][WARN ][disco-event-worker-#42][GridDiscoveryManager] Local node SEGMENTED: TcpDiscoveryNode [id=13169858-92db-48a3- bc4b-db369cab6457, addrs=[…], sockAddrs=[…], discPort=47501, order=2, intOrder=2, lastExchangeTime=1550680109573, loc=true, ver=2.7.2#20190206-sha1:5f8f5488, isClient=false]
  • 35. 2019 © GridGain Systems Node is kicked out from the cluster 34 Possible cause and solution • Likely a GC pause • To confirm – Check for “Possible too long JVM pause” messages in the logs – Analyze GC logs around the time of the issue • https://gceasy.io/ can help
  • 36. 2019 © GridGain Systems Node is kicked out from the cluster 35 Possible cause and solution • To handle GC issues – Increase heap: -Xmx16g – Try G1 with a latency target: -XX:+UseG1GC -XX:MaxGCPauseMillis=200 – Reduce heap pressure • Use smaller SQL queries ▪ More about SQL later • Use IgniteCache.withKeepBinary() to avoid deserialization ▪ https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/IgniteCache.html#withKeepBinary-- • Use on-heap caching with CacheConfiguration.copyOnRead=false ▪ https://apacheignite.readme.io/docs/memory-configuration#section-on-heap-caching – https://apacheignite.readme.io/docs/jvm-and-system-tuning • Remedy: change failureDetectionTimeout to allow a longer inactivity period
  • 37. 2019 © GridGain Systems Agenda 36 • Monitoring – Logging – JMX – GridGain Web Console • Troubleshooting – Network – Storage – Performance • Checklist
  • 38. 2019 © GridGain Systems Storage troubleshooting 37 • “Out of memory” errors • Lost data after node restart • Incorrect/partial data returned by SQL • Some nodes don’t store data
  • 39. 2019 © GridGain Systems Out of memory 38 Symptom • Three kinds of “out of memory” conditions – Java’s OutOfMemoryError exception – IgniteOutOfMemoryException exception – OS killing the Ignite’s JVM
  • 40. 2019 © GridGain Systems Out of memory 39 Possible cause and solution: OutOfMemoryError • Heap is too small – Increase –Xmx • Large SQL queries are running – Use lazy queries • https://apacheignite-sql.readme.io/docs/performance-and-debugging#section-result-set-lazy- loading – Avoid many large queries running concurrently – Split query to reduce data set size • Before: ▪ SELECT * FROM PERSON ORDER BY AGE • After: ▪ SELECT * FROM PERSON WHERE AGE < 40 ORDER BY AGE ▪ SELECT * FROM PERSON WHERE AGE >= 40 ORDER BY AGE
  • 41. 2019 © GridGain Systems Out of memory 40 Possible cause and solution: IgniteOutOfMemoryException • Data region is too small IgniteOutOfMemoryException: Out of memory in data region [name=default, initSize=256,0 MiB, maxSize=476,8 MiB, persistenceEnabled=false] Try the following: ^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize) ^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled) ^-- Enable eviction or expiration policies – Increase DataRegionConfiguration.maxSize • https://apacheignite.readme.io/docs/capacity-planning – Use Native Persistence • https://apacheignite.readme.io/docs/distributed-persistent-store – Use DataRegionConfiguration.pageEvictionMode=RANDOM_2_LRU • https://apacheignite.readme.io/docs/evictions – Use an expiry policy • https://apacheignite.readme.io/docs/expiry-policies
  • 42. 2019 © GridGain Systems Out of memory 41 Possible cause and solution: OOM killer • Total size of the processes is greater than RAM – Check for message Out of memory: Kill process <PID> (java) score <SCORE> or sacrifice child – Make sure that total size of the processes is within bounds – Disable overcommit to have a more graceful, in-process errors • sysctl -w vm.overcommit_memory=2 • https://www.kernel.org/doc/Documentation/vm/overcommit-accounting • May affect other applications, use carefully
  • 43. 2019 © GridGain Systems Lost data after node restart 42 Symptom • Some or all data was lost after one or several node restarts
  • 44. 2019 © GridGain Systems Lost data after node restart 43 Possible cause and solution • In-memory cache with zero backups – Taking down a node will always lose data • In-memory cache with one or more backups – Don’t take down more nodes simultaneously than there are backups – Wait for rebalance after bringing a node back online (use Web Console for that)
  • 45. 2019 © GridGain Systems Incorrect/partial data returned by SQL 44 Symptom • An SQL query with JOIN returns less data than expected CREATE TABLE ORGANIZATION (ID INT PRIMARY KEY, NAME VARCHAR); CREATE TABLE PERSON (ID INT, ORGID INT, NAME VARCHAR, PRIMARY KEY (ID, ORGID)); SELECT * FROM PERSON P JOIN ORGANIZATION O ON O.ORGID = O.ID;
  • 46. 2019 © GridGain Systems Incorrect/partial data returned by SQL 45 Possible cause and solution • Data isn’t collocated – https://apacheignite.readme.io/docs/affinity-collocation – Use affinity key to keep related data together CREATE TABLE ORGANIZATION (ID INT PRIMARY KEY, NAME VARCHAR); CREATE TABLE PERSON (ID INT, ORGID INT, NAME VARCHAR, PRIMARY KEY (ID, ORGID)) WITH “AFFINITY_KEY=ORGID”; SELECT * FROM PERSON P JOIN ORGANIZATION O ON O.ORGID = O.ID; – Use replicated tables – they’re collocated with all others CREATE TABLE ORGANIZATION (ID INT PRIMARY KEY, NAME VARCHAR) WITH “TEMPLATE=REPLICATED”; CREATE TABLE PERSON (ID INT, ORGID INT, NAME VARCHAR, PRIMARY KEY (ID, ORGID)); SELECT * FROM PERSON P JOIN ORGANIZATION O ON O.ORGID = O.ID; – Use distributedJoins=true to do JOINs without collocation (this is costly!)
  • 47. 2019 © GridGain Systems Some nodes don’t store data 46 Symptom • Native persistence is enabled • Data is only stored on one node/subset of nodes • New nodes don’t store data
  • 48. 2019 © GridGain Systems Some nodes don’t store data 47 Possible cause and solution • Baseline topology doesn’t include some nodes – https://apacheignite.readme.io/docs/baseline-topology – Update the baseline after every long-term topology change (use Web Console for that)
  • 49. 2019 © GridGain Systems Agenda 48 • Monitoring – Logging – JMX – GridGain Web Console • Troubleshooting – Network – Storage – Performance • Checklist
  • 50. 2019 © GridGain Systems Performance troubleshooting 49 • Throughput of persistent cache updates drops to zero • Low throughput of SQL access
  • 51. 2019 © GridGain Systems Throughput of cache updates drops to zero 50 Symptom • Native persistence is enabled • Write performance is good but there are periodic intervals of 0 ops/sec
  • 52. 2019 © GridGain Systems Throughput of cache updates drops to zero 51 Possible cause and solution • Updates in RAM happen faster than on disk – disk needs to catch up – Enable write throttling • DataStorageConfiguration.writeThrottlingEnabled=true • Sacrifice peak performance for stable latency and throughput – Increase checkpoint page buffer size • Allow more pending updates in RAM • https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Sto re+-+under+the+hood
  • 53. 2019 © GridGain Systems Low throughput of SQL access 52 Symptom • SQL reads using indexed fields are slow – Even for simple queries like SELECT * FROM FOO WHERE ID = 123 • Writes to an SQL-enabled cache are slow – Even via key-value API
  • 54. 2019 © GridGain Systems Low throughput of SQL access 53 Possible cause and solution • A lot of possible causes – performance tuning is hard! – https://apacheignite-sql.readme.io/docs/performance-and-debugging • A common issue – indexes not fitting into maximum inline size – Check for “Indexed columns of a row cannot be fully inlined into index” warnings – Do what the warning says [2019-02-19T23:36:26,848][WARN ][main][H2TreeIndex] <CacheQueryExamplePersons> Indexed columns of a row cannot be fully inlined into index what may lead to slowdown due to additional data page reads, increase index inline size if needed (use INLINE_SIZE option for CREATE INDEX command, QuerySqlField.inlineSize for annotated classes, or QueryIndex.inlineSize for explicit QueryEntity configuration) [cacheName=PERSON, tableName=CacheQueryExamplePersons, idxName=PERSON_FIRSTNAME_IDX, idxCols=(FIRSTNAME, _KEY), idxType=SECONDARY, curSize=10, recommendedInlineSize=71]
  • 55. 2019 © GridGain Systems Agenda 54 • Monitoring – Logging – JMX – GridGain Web Console • Troubleshooting – Network – Storage – Performance • Checklist
  • 56. 2019 © GridGain Systems55 Checklist
  • 57. 2019 © GridGain Systems Checklist 56 Be prepared • Setup monitoring before something happens, not after! – Make sure logs are safely stored – GridGain Web Console is an easy and effective way to monitor • Properly configure the cluster – Implement the suggestions from this presentation before you encounter the issues they help with • Know the common problems and how to react – Logs often provide solutions
  • 58. 2019 © GridGain Systems Checklist 57 Avoiding common problems • Network-related issues are often caused by IP configuration – Configure IP finder – TcpDiscoveryVmIpFinder is usually a good fit • GC pauses cause performance and stability issues – Use G1 GC, set a fitting heap size – Reduce GC pressure by using lazy SQL, withKeepBinary() and on- heap cache • Insufficient memory will result in a crash – Plan storage capacity in advance – Assess required heap size during testing – Account for other processes in the system
  • 59. 2019 © GridGain Systems Checklist 58 Avoiding common problems • Implement cluster administration processes – For in-memory: wait for rebalance when starting or stopping nodes – For persistence: update baseline for long-term topology changes • Write performance may be unstable when update speed is more than the disk speed – Enable write throttling • Consider performance suggestions reported in the log – Index inline size suggestion is an example
  • 60. 2019 © GridGain Systems59 Questions?
  • 61. 2019 © GridGain Systems Apache Ignite Resources 60 • Apache Ignite documentation – https://apacheignite.readme.io/docs/ • Apache Ignite community resources – user@ignite.apache.org – the mailing list – https://ignite.apache.org/community/resources.html – other resources and instructions – http://apache-ignite-users.70518.x6.nabble.com/ – forum and archive – https://stackoverflow.com/questions/tagged/ignite – StackOverflow questions • Contact me! – slukyanov@gridgain.com – stanlukyanov@gmail.com
  • 62. 2019 © GridGain Systems GridGain Resources • White Papers – Visit https://www.gridgain.com/resources/papers • Webinars – Visit https://www.gridgain.com/resources/webinars • Videos – Visit https://www.gridgain.com/resources/videos • Free 30-Day Ultimate, Enterprise or Professional Edition Trial – Visit https://www.gridgain.com/resources/download