Building an app that scales well for Jira Data Center can be challenging, especially with regard to index replication.
Andriy Yakovlev, a Principal Premier Support Engineer at Atlassian, will share some common problems customers have experienced with apps on large instances, and how to prevent them.
Attendees will learn how indexing works in Jira, how indexes are replicated in Jira Data Center, and what to look out for to prevent problems before they happen.
9. L1: Application down or major malfunction
L2: Serious degradation of application performance or functionality
10. Jira admin perception
• Confusion: multiple end users affected and confused
• Confidence: loss of confidence or fear of installing the app
• Hours: time spent on troubleshooting
11. Agenda
Why is this important?
Overview of Jira Indexing
Index replication in Jira DC
When things break
17. Jira index
• JQL Search: converts JQL into a Lucene query request; extendable and pluggable.
• Filters, dashboards, Agile boards: use JQL as a building block.
• Lucene: text search engine that keeps its structures on disk.
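As a rough illustration of what "converting JQL into a Lucene query" means, the sketch below builds a toy inverted index in plain Python and answers a JQL-like query by intersecting term postings. It is only a teaching model: the grammar handles nothing beyond `field = value` clauses joined by `AND`, and `build_index`, `parse_jql` and `search` are hypothetical names, not Jira or Lucene APIs.

```python
from collections import defaultdict

def build_index(issues):
    """Map (field, value) terms to the set of issue IDs that contain them."""
    index = defaultdict(set)
    for issue_id, fields in issues.items():
        for field, value in fields.items():
            index[(field, str(value).lower())].add(issue_id)
    return index

def parse_jql(jql):
    """Parse 'a = x AND b = y' into a list of (field, value) terms (toy grammar)."""
    terms = []
    for clause in jql.split(" AND "):
        field, value = clause.split("=", 1)
        terms.append((field.strip(), value.strip().strip('"').lower()))
    return terms

def search(index, jql):
    """Intersect the posting sets of every term in the query."""
    postings = [index.get(term, set()) for term in parse_jql(jql)]
    return set.intersection(*postings) if postings else set()

# Illustrative data only.
ISSUES = {
    10001: {"project": "DEMO", "status": "Open"},
    10002: {"project": "DEMO", "status": "Done"},
    10003: {"project": "OPS", "status": "Open"},
}
INDEX = build_index(ISSUES)
```

Calling `search(INDEX, "project = DEMO AND status = Open")` looks up two terms and intersects their issue ID sets. Filters, dashboards and boards that reuse the same JQL all depend on the index being current.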
29. SLOW | FAST
Global scope | Project scope
Recomputing the value | Storing values
CF is indexed | No index for ViewOnly CF
Cascading dependencies | TTL for cached values
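The "storing values" and "TTL for cached values" points can be sketched as a small cache that reuses a computed custom field value until it expires. This is a minimal sketch under stated assumptions: `TtlValueCache`, its `compute` callback and the injectable `clock` are hypothetical names for illustration, not part of any Jira API.

```python
import time

class TtlValueCache:
    """Keep computed custom field values for a limited time (hypothetical helper)."""

    def __init__(self, compute, ttl_seconds, clock=time.monotonic):
        self._compute = compute        # expensive function: issue_id -> value
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so the behavior can be tested
        self._entries = {}             # issue_id -> (value, expires_at)
        self.computations = 0          # how often the expensive path ran

    def get(self, issue_id):
        now = self._clock()
        entry = self._entries.get(issue_id)
        if entry is not None and entry[1] > now:
            return entry[0]            # still fresh: no recomputation
        value = self._compute(issue_id)
        self.computations += 1
        self._entries[issue_id] = (value, now + self._ttl)
        return value
```

A TTL bounds how stale a value can get without recomputing it on every index operation. Where consistency matters more than speed, a shorter TTL or an explicit invalidation step is the safer choice.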
34. Jira index replication
• Replicating issues: issue data is replicated by ID and action.
• Eventually consistent: each node replays the replication at its own pace.
• Multiserver: each node has its own Lucene copy.
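The idea of replicating by ID and action, with each node replaying at its own pace, can be modeled in a few lines. The sketch below is a simplified simulation, not Jira's implementation: `Operation`, `Node`, the action names and the `max_ops` throttle are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Operation:
    op_id: int       # position in the shared replication log
    issue_id: int    # only the ID is replicated, not the issue data
    action: str      # "UPDATE" or "DELETE" (illustrative action names)

@dataclass
class Node:
    name: str
    position: int = 0                            # last op_id this node replayed
    index: dict = field(default_factory=dict)    # issue_id -> indexed document

    def replay(self, log, database, max_ops):
        """Replay at most max_ops pending operations, oldest first."""
        pending = [op for op in log if op.op_id > self.position][:max_ops]
        for op in pending:
            document = database.get(op.issue_id)
            if op.action == "DELETE" or document is None:
                self.index.pop(op.issue_id, None)
            else:
                # The node reloads the issue from the shared database itself.
                self.index[op.issue_id] = document
            self.position = op.op_id
        return len(pending)
```

A node that replays fewer operations per round simply lags behind. Its local index differs from the others until it catches up, which is what "eventually consistent" means here.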
35. "Everything is going to be alright, maybe not today, but eventually."
CONVENTIONAL WISDOM
41. How it works - DC replication (2)
[Figure: DC replication diagram]
44. DC index replication
Replication tables • RIO table • Index counter • Nodes table
Important tables for DC index replication
• replicatedindexoperation (RIO) - log entries of Lucene update events
• nodeindexcounter - position of each node in RIO table
• clusternode - all nodes in cluster
• clusternodeheartbeat - cluster heartbeat table
45. DC index replication: RIO table
ID | index_time | node_id | affected_index | entity_type | affected_ids | operation | filename
37821490 | 2019-08-09 03:30:17 | Node1 | ISSUE | NONE | 4500367 | UPDATE_WITH_RELATED |
46. DC index replication: RIO table (2)
ID | index_time | node_id | affected_index | entity_type | affected_ids | operation | filename
37834110 | 2019-08-09 03:30:17 | Node2 | ALL | NONE | - | FULL_REINDEX_END | IndexSnapshot_10402.zip
47. DC index replication: index counter
NodeIndexCounter
Each node keeps a record of the latest operation it processed from RIO.
id | node_id | sending_node_id | index_operation_id
10400 | node1 | node2 | 37821490
10201 | node2 | node1 | 37822387
10203 | node2 | node2 | 37814476
10204 | node1 | node1 | 37814782
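One way to use these two tables is to estimate how far each node is behind each sender. The sketch below models a reduced version of the tables in an in-memory SQLite database and counts the log entries newer than each node's recorded position. The schema is an assumption reduced to the columns shown on the slides, and reading the log's `node_id` as the sending node is also an assumption. The real Jira schema has more columns and should only be read, never modified.

```python
import sqlite3

def replication_lag(conn):
    """Return {(node_id, sending_node_id): operations not yet replayed}."""
    rows = conn.execute(
        """
        SELECT c.node_id, c.sending_node_id,
               (SELECT COUNT(*) FROM replicatedindexoperation r
                 WHERE r.node_id = c.sending_node_id
                   AND r.id > c.index_operation_id) AS behind
          FROM nodeindexcounter c
        """
    ).fetchall()
    return {(node, sender): behind for node, sender, behind in rows}

# Illustrative data in a simplified schema (assumed, not Jira's real DDL).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE replicatedindexoperation (
        id INTEGER PRIMARY KEY, node_id TEXT, operation TEXT);
    CREATE TABLE nodeindexcounter (
        id INTEGER PRIMARY KEY, node_id TEXT,
        sending_node_id TEXT, index_operation_id INTEGER);
    INSERT INTO replicatedindexoperation VALUES
        (1, 'node1', 'UPDATE'), (2, 'node2', 'UPDATE'),
        (3, 'node1', 'UPDATE'), (4, 'node1', 'UPDATE');
    INSERT INTO nodeindexcounter VALUES
        (10, 'node2', 'node1', 1),
        (11, 'node1', 'node2', 2);
    """
)
```

A lag that keeps growing for one node suggests that node cannot keep up with index replication.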
48. DC index replication: nodes table
Node status tables
Track the cluster status.
• clusternode - all nodes in cluster
node_id | node_state | ip | port | timestamp | node_build | version
node1 | ACTIVE | vm1.local | 40001 | 1556186274874 | 71305 | 7.13.5
node2 | ACTIVE | vm2.local | 40001 | 150366490664 | 71305 | 7.13.5
• clusternodeheartbeat - cluster heartbeat table
node_id | heartbeat_time | database_time
node2 | 1496015679597 | 1496015679597
node1 | 1496015687334 | 1496015687336
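The heartbeat values look like epoch timestamps in milliseconds, so a simple staleness check can flag nodes that have stopped reporting. The following is a minimal sketch: `stale_nodes`, `to_utc` and the 60-second threshold are hypothetical choices for illustration, not values taken from Jira.

```python
from datetime import datetime, timezone

def stale_nodes(heartbeats, now_ms, max_age_ms=60_000):
    """Return node IDs whose last heartbeat is older than max_age_ms."""
    return sorted(
        node_id
        for node_id, beat_ms in heartbeats.items()
        if now_ms - beat_ms > max_age_ms
    )

def to_utc(epoch_ms):
    """Convert an epoch-milliseconds value to a timezone-aware datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)

# heartbeat_time values copied from the clusternodeheartbeat rows above.
HEARTBEATS = {"node2": 1496015679597, "node1": 1496015687334}
```

Comparing against the database's own clock (the `database_time` column) rather than a node's local clock avoids false alarms caused by clock drift between nodes.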
49. SLOW | FAST
Global scope x Nodes | Project scope
Recomputing the value x Nodes | Storing values
CF is indexed x Nodes | No index for ViewOnly CF
Cascading dependencies x Nodes | TTL for cached values
52. When things break: slow indexing
Slow indexing • Slow replication • JQL search not consistent • Own replication
Unnecessary computations
• Global scope for app custom field
• Recomputing values for issues without modifications
Slow computations
• Cascading computations
• External calls to remote third-party systems
Best practice
• Project scope for app CF
• Storing values
• No index for ViewOnly CF
• Don’t abuse the reindexing API
• Test on large data sets (App Performance Toolkit)
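Avoiding "recomputing values for issues without modifications" can be sketched by storing each value together with the issue's last-modified marker and recomputing only when that marker changes. This is a sketch under assumptions: `StoredFieldValues` and the `issue_updated` marker are hypothetical, and a real app would persist the values in the database rather than in memory.

```python
class StoredFieldValues:
    """Store computed values keyed by issue and its last-modified marker."""

    def __init__(self, compute):
        self._compute = compute    # expensive function: issue_id -> value
        self._stored = {}          # issue_id -> (issue_updated, value)
        self.computations = 0      # how often the expensive path ran

    def value_for(self, issue_id, issue_updated):
        stored = self._stored.get(issue_id)
        if stored is not None and stored[0] == issue_updated:
            return stored[1]       # issue unchanged: reuse the stored value
        value = self._compute(issue_id)
        self.computations += 1
        self._stored[issue_id] = (issue_updated, value)
        return value
```

With this pattern, a reindex of an unchanged issue costs one lookup instead of one computation, which matters when the computation cascades or calls a remote system.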
55. When things break: slow replication
Node can’t keep up with index replication
• Includes slow indexing problems
• Write amplification due to large data sets
• Adding more nodes doesn’t help
Best practice
• Store values
• Test with 3+ nodes
• Measure and test indexing time; report CF indexing slowness to the Jira admin
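Measuring indexing time per custom field, so that slowness can be reported, might look like the following. This is a minimal sketch: `IndexingTimer`, its threshold and the injectable clock are hypothetical, and a real app would write the result to its log or an admin-facing health page.

```python
import time
from contextlib import contextmanager

class IndexingTimer:
    """Accumulate time spent indexing each custom field (hypothetical helper)."""

    def __init__(self, slow_threshold_seconds, clock=time.perf_counter):
        self._threshold = slow_threshold_seconds
        self._clock = clock
        self.totals = {}           # field name -> total seconds spent

    @contextmanager
    def measure(self, field_name):
        start = self._clock()
        try:
            yield
        finally:
            elapsed = self._clock() - start
            self.totals[field_name] = self.totals.get(field_name, 0.0) + elapsed

    def slow_fields(self):
        """Return the fields whose accumulated time exceeds the threshold."""
        return sorted(
            name for name, total in self.totals.items() if total > self._threshold
        )
```

Wrapping each field's value computation in `with timer.measure("my_cf"):` keeps the measurement in place even when the computation raises.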
56. "There are only two hard things in Computer Science: cache invalidation and naming things."
PHIL KARLTON
57. When things break: JQL search not consistent
Lucene data is not consistent
• Nodes collect the value at different times
• Recomputing in a different context
• Stale cache
• Errors in reindexing
Leaking Lucene searcher
• Avoid using ThreadLocalSearcherCache#startSearcherContext without cleaning the ThreadLocals
58. When things break: own replication
Creating your own replication
When issue and CF are not enough
• Use a cache for short-lived copies of values
• Use the DB to store values and pass the reference
• Possibly your own Lucene index
• Create your own replication queue
• Don’t use a cache if consistency is important
• Avoid using ClusterMessage for heavy traffic
• Monitoring and health check
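The "store values in the DB and pass the reference" advice can be sketched as a queue that carries only small row references, while the payload lives in shared storage. This is a simplified in-memory model under stated assumptions: `SharedStore`, `publish` and `consume` are hypothetical names, and a real app would use a database table and track a persistent position per node.

```python
import itertools

class SharedStore:
    """Stands in for a database table shared by all nodes."""

    def __init__(self):
        self._rows = {}
        self._ids = itertools.count(1)

    def put(self, payload):
        row_id = next(self._ids)
        self._rows[row_id] = payload
        return row_id

    def get(self, row_id):
        return self._rows[row_id]

def publish(store, queue, payload):
    """Persist the payload and enqueue only its small reference."""
    ref = store.put(payload)
    queue.append(ref)
    return ref

def consume(store, queue, position):
    """Return payloads for references after `position`, plus the new position."""
    refs = queue[position:]
    return [store.get(ref) for ref in refs], position + len(refs)
```

Keeping messages small is the reason for passing references. Each node resolves the reference against the database when it replays, so it reads the stored value rather than a copy that may already be stale.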
59. When things break
Knowledge
• Jira Data Center Troubleshooting
• Index Replication Jira Data Center Troubleshooting
• Keeping Lucene Index Synchronised in JIRA Data Center
• HealthCheck: Cluster Index Replication
61. To recap
• DC index replication problems: app CF can slow down index operations
• DC scale: use proper config and test for large data sets
• Insight: how Jira DC indexing works
66. Q & A