The document discusses troubleshooting techniques for Solr, including using a "TreeMap" process of establishing boundaries, splitting the problem, identifying relevant parts, zooming in, and re-formulating boundaries in an iterative process until the problem is fixed. It outlines how to establish boundaries by defining the identity, location, timing, and magnitude of the problem. Examples are provided for indexing and searching processes in Solr.
Solr Troubleshooting - TreeMap Approach: Presented by Alexandre Rafalovitch, United Nations
The magic of creation
Any sufficiently advanced technology is indistinguishable from magic
—Arthur C. Clarke
bin/solr start -e techproducts
Pure magic
1. Start the server
2. Setup the collection
3. Populate with documents
4. Commit
5. Profit!
The price of magic
bin/solr start … ???
1. What port is the server running on?
2. What is the collection name?
3. Is it static or dynamic schema? Or schemaless?
4. Which directory is schema configuration in? Data?
5. What documents have we populated already?
6. Is everything committed?
7. WHY DOES MY QUERY NOT WORK!!! :(
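Several of the questions above can be answered by asking the running server instead of guessing. The following is a minimal sketch, assuming a standalone (non-cloud) Solr and the CoreAdmin STATUS response layout (`status`, then core name, then `instanceDir`, `dataDir`, and `index.numDocs`). The base URL, the paths, and the sample response are illustrative assumptions, not real server output.

```python
import json
from urllib.request import urlopen


def fetch_core_status(base_url):
    """Call CoreAdmin STATUS. base_url (e.g. http://localhost:8983/solr) is an assumption."""
    with urlopen(base_url + "/admin/cores?action=STATUS&wt=json") as resp:
        return json.load(resp)


def summarize_cores(status_response):
    """Reduce a STATUS response to: which cores exist, where they live, how many docs."""
    summary = {}
    for name, core in status_response.get("status", {}).items():
        index = core.get("index", {})
        summary[name] = {
            "instanceDir": core.get("instanceDir"),
            "dataDir": core.get("dataDir"),
            "numDocs": index.get("numDocs"),
        }
    return summary


# Invented sample shaped like a STATUS response; paths and counts are hypothetical.
SAMPLE_STATUS = {
    "status": {
        "techproducts": {
            "name": "techproducts",
            "instanceDir": "/path/to/solr/techproducts/",
            "dataDir": "/path/to/solr/techproducts/data/",
            "index": {"numDocs": 32, "maxDoc": 32},
        }
    }
}

if __name__ == "__main__":
    # Against a live server: summarize_cores(fetch_core_status("http://localhost:8983/solr"))
    print(summarize_cores(SAMPLE_STATUS))
```

Note that `numDocs` only reflects committed documents, so a count lower than expected may point to a missing commit rather than missing data.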
Troubleshooting process
1. Troubleshooting is not a linear process
2. It is not taught often or well
3. Book is coming soon(-ish….)
4. Based on my experience as:
Solr-based project developer and popularizer
Senior (Weblogic) tech-support for 3 years
5. Hard to explain the book in 40 minutes
6. TreeMap is a (slightly) faster mental model
7. Adaptation of the Root Cause Analysis
8. Top-level concepts are described in The New Rational Manager by Kepner and Tregoe (1997)
Troubleshooting TreeMap
1. Establish the boundaries
2. Split the problem
3. Identify the relevant part
4. Zoom in
5. Re-formulate the boundaries
6. Repeat 2-5 until fixed
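The loop above can be pictured as walking down a tree of ever-smaller areas. Here is a minimal sketch under stated assumptions: the `Area` class, the `zoom_in` function, and the example component tree are all hypothetical names made up for illustration, and the `shows_problem` callback stands in for whatever manual test you run at each step.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Area:
    """A region of the system; children are the parts it can be split into."""
    name: str
    children: List["Area"] = field(default_factory=list)


def zoom_in(area: Area, shows_problem: Callable[[Area], bool]) -> List[str]:
    """Follow the problem down the tree and return the names of the areas visited."""
    path = [area.name]                      # 1. establish the boundaries
    while area.children:                    # 2. split the problem
        relevant = [c for c in area.children if shows_problem(c)]  # 3. identify
        if not relevant:
            break                           # this split did not isolate it; re-think the split
        area = relevant[0]                  # 4. zoom in
        path.append(area.name)              # 5. the new, narrower boundary
    return path                             # 6. the loop repeats steps 2-5


# Hypothetical breakdown of "Solr cannot find customer records".
EXAMPLE_TREE = Area("Solr", [
    Area("indexing", [Area("handler"), Area("schema"), Area("URP pipeline")]),
    Area("searching", [Area("query"), Area("fields")]),
])

if __name__ == "__main__":
    # Pretend our tests implicated the indexing side, then the schema.
    suspects = {"indexing", "schema"}
    print(zoom_in(EXAMPLE_TREE, lambda a: a.name in suspects))
```

When no child shows the problem, the sketch stops rather than guessing, which corresponds to going back and re-formulating the boundaries.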
Boundaries - Identity
Identity – action we want to accomplish/problem to solve
Initial (black-box) identity –
echoParams is duplicated with example config, sometimes
Zoomed-in –
Any query parameter that is also in request handler's defaults is duplicated
See SOLR-6780 for full story, a.k.a an evil freaking bug
Gets easier with practice
Boundaries - Location
Location – place (component) where the problem happens
Problem: Solr cannot find customer records
Could be indexing
• Record was never sent to Solr
• Wrong handler
• Invalid schema definition
• Incorrect URP pipeline
• ...
Could be searching
• Query too restrictive
• Query too permissive
• Searching wrong fields
• Searching against catch-all field
• ...
Cloud adds many more locations
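One quick way to split the indexing side from the searching side is to ask for a single known record by its unique key. A minimal sketch, assuming the unique key field is called `id`, the collection is called `customers`, and the record ID is `CUST-42` (all hypothetical), and assuming the standard JSON response layout with `response.numFound`:

```python
from urllib.parse import urlencode


def lookup_by_id_url(base_url, collection, doc_id, id_field="id"):
    """Build a /select URL that fetches one document by its unique key.

    Assumes doc_id contains no double quotes; those would need escaping.
    """
    params = {"q": f'{id_field}:"{doc_id}"', "rows": 1, "wt": "json"}
    return f"{base_url}/{collection}/select?{urlencode(params)}"


def classify_location(select_response):
    """Decide which half of the problem to zoom into next.

    No hit for a direct ID lookup means the record never became searchable
    (not sent, rejected, or not committed): look at indexing.
    A hit means the record is there and the original query misses it: look at searching.
    """
    num_found = select_response["response"]["numFound"]
    return "indexing" if num_found == 0 else "searching"


if __name__ == "__main__":
    print(lookup_by_id_url("http://localhost:8983/solr", "customers", "CUST-42"))
    # Hand-written response fragments for illustration only.
    print(classify_location({"response": {"numFound": 0, "docs": []}}))
```

The lookup deliberately avoids the user's original query, so its result depends only on what was indexed, not on query parsing or field choice.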
Boundaries - Timing
Timing – when/how often the problem shows itself
Reproducibility
1. Always – ideal, reproducible with debugger on, logs on/off
2. Seemingly intermittent (a.k.a sometimes) – useless
3. On trigger X (e.g. on commit) – nearly as good as always
Onset
1. The system worked at time point X but not at time point Y =
What did you change in the meantime?
2. Problem exists != Problem noticed, may have been shadowed
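Turning "sometimes" into a number is often the first step towards finding the trigger. The sketch below is one hedged way to do that: `check` is a hypothetical callable that you supply, returning True when the system behaves correctly, and the labels it returns are invented for this example.

```python
import itertools


def classify_reproducibility(check, attempts=20):
    """Run check() repeatedly and describe how often it fails."""
    failures = sum(1 for _ in range(attempts) if not check())
    if failures == attempts:
        return "always"
    if failures == 0:
        return "not reproduced"
    return f"intermittent ({failures}/{attempts})"


if __name__ == "__main__":
    # Stand-in check that alternates between passing and failing.
    flaky = itertools.cycle([True, False])
    print(classify_reproducibility(lambda: next(flaky)))
```

An intermittent result is a prompt to look for the hidden trigger (a commit, a cache warm-up, a particular node), not a final answer.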
Boundaries - Magnitude
Magnitude – WHAT is the extent of the problem
• Latest Solr, or a single old version (or a range of them)?
• Standard example configuration or only with custom schema?
• A single node or a whole cluster?
• The more standard/recent the config is = the easier it is to troubleshoot
Boundaries – through negation and comparison
“I choose a block of marble and chop off whatever I don’t need”
— (sculptor) Auguste Rodin
Clarify the problem by saying what it is NOT as well
1. Example: This affects Solr 5.1, BUT not Solr 5.2
2. The BUT part requires testing and may prove to be untrue
3. Thinking about the negative condition simplifies/purifies the test case
4. Also gives a parallel use-case that works – great for debugging
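Once there is a working case next to the failing one, the differences between them become the list of suspects. A minimal sketch, assuming each setup is written down as a flat dictionary; the keys and values below are hypothetical and only echo the 5.1 versus 5.2 example above.

```python
def diff_setups(works, fails):
    """Return {key: (value_in_working, value_in_failing)} for every key that differs."""
    keys = sorted(set(works) | set(fails))
    return {
        k: (works.get(k), fails.get(k))
        for k in keys
        if works.get(k) != fails.get(k)
    }


if __name__ == "__main__":
    working = {"solr": "5.2", "config": "techproducts", "mode": "standalone"}
    failing = {"solr": "5.1", "config": "techproducts", "mode": "standalone"}
    print(diff_setups(working, failing))
```

The fewer entries the diff returns, the purer the test case; a key present on only one side shows up with `None` for the other.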
Practical boundaries – what does the start script do?
bin/solr start … ???
1. Do not try to read the script – look at the ground truth
2. In Admin UI
Dashboard - Versions - solr-spec (version)
Dashboard - JVM - Args (command line params, abbrev.)
Collection - Overview - Instance (all the directories)
3. On the command line (Unix, Mac, and the like):
ps -aef | grep java
/usr/bin/java -server -Xss256k -Xms512m -Xmx512m -XX:NewRatio=3 -XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC
-XX:+UseParNewGC -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark
-XX:PretenureSizeThreshold=64m -XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled
-XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-Xloggc:/Users/arafalov/SearchEngines/solr-5.3.1/example/techproducts/solr//../logs/solr_gc.log …
4. On Windows: use Microsoft/Sysinternals ProcessExplorer
5. Example: SOLR-8073
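A long java command line is easier to read once it is split into system properties and other flags. The following is a minimal sketch using Python's standard `shlex` module. The function name is made up, and the sample command line is a shortened, hypothetical one: its `-D` properties and paths are illustrative assumptions, not a copy of the output above.

```python
import shlex


def split_jvm_args(command_line):
    """Split a java command line into (-D system properties, other dash options)."""
    tokens = shlex.split(command_line)
    props = {}
    flags = []
    for tok in tokens[1:]:                  # tokens[0] is the java executable
        if tok.startswith("-D"):
            key, _, value = tok[2:].partition("=")
            props[key] = value
        elif tok.startswith("-"):
            flags.append(tok)
    return props, flags


# Hypothetical, shortened command line for illustration.
SAMPLE_COMMAND = (
    "/usr/bin/java -server -Xss256k -Xms512m -Xmx512m "
    "-Djetty.port=8983 -Dsolr.solr.home=/path/to/solr/home -jar start.jar"
)

if __name__ == "__main__":
    props, flags = split_jvm_args(SAMPLE_COMMAND)
    for key, value in sorted(props.items()):
        print(f"{key} = {value}")
    print(flags)
```

Paste the real line from `ps` in place of `SAMPLE_COMMAND`; the properties it reveals (port, home directory) answer several of the boundary questions directly.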
TreeMap – Troubleshooting Solr cloud
1. Good luck with exponential complexity increase.
2. Try to reproduce in a standalone instance!
3. Tools exist, but they are themselves complex (e.g. Jepsen)
4. But the TreeMap process is the same overall
Cloud adds many more locations
Troubleshooting – closing notes and review
1. Troubleshooting is both art (intuition) and science
2. The more you apply the science, the better you become at the art
3. Remember the overall process
Establish the boundaries
Split the problem
Identify the relevant part
Zoom in
Re-formulate the boundaries
Repeat until fixed/problem identified
4. Remember the boundaries
Identity
Location
Timing
Magnitude
Troubleshooting – next step
1. My resources and mailing list: http://www.solr-start.com/
2. Solr-users mailing list and archives
Identify your boundary in the email
3. Books, current and upcoming
4. Google/Bing/DDG – use good keywords
5. Share what you learned