How SolrCloud Solved Recovery Issues
Cao Manh Dat
Lucene/Solr Committer, Lucidworks
@caomanhdat
#Activate18 #ActivateSearch
Agenda
• Basics of SolrCloud Indexing
• How Solr used to deal with indexing failures
• The new design
• Q&A
Basics of SolrCloud Indexing
[Diagram sequence: Shard 1 with leader L, replicas R1 and R2, and ZooKeeper (ZK)]
1. Update u1 arrives at the leader L
2. L forwards u1 to R1 and R2
3. R1 and R2 each respond "success" to L
4. L responds "success" for the update
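The update flow above can be sketched roughly as follows. This is a toy in-memory model, not Solr code: the `Replica` and `Leader` classes and their methods are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for a replica core; not a Solr class."""
    name: str
    reachable: bool = True
    docs: list = field(default_factory=list)

    def apply(self, update: str) -> bool:
        # True models "success"; False models "no response".
        if not self.reachable:
            return False
        self.docs.append(update)
        return True


@dataclass
class Leader:
    """Hypothetical shard leader that indexes locally, then forwards."""
    replicas: list
    docs: list = field(default_factory=list)

    def index(self, update: str) -> dict:
        self.docs.append(update)  # the leader applies the update itself
        # Forward to every replica and record who acknowledged.
        return {r.name: r.apply(update) for r in self.replicas}


r1, r2 = Replica("R1"), Replica("R2")
leader = Leader([r1, r2])
acks = leader.index("u1")
print(acks)  # per-replica acknowledgements for u1
```

Setting `reachable = False` on one replica reproduces the "no response" case discussed on the next slide.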
Index failure on replica side
[Diagram: Shard 1 with L, R1, R2 and ZK; one replica responds "success", the other gives no response, while a query arrives at the shard]
• Unavoidable
  • Connection issues
  • Replica's node met tragic events
State of Replica
[State diagram: states RECOVERING, ACTIVE, DOWN, SHUTDOWN; transition labels: SHUTDOWN, DO REC, FIN REC, SKIP REC]
How Solr used to deal with index failures
Idea (SOLR-5495): the LIR (leader-initiated recovery) process
1. Leader publishes the replica's state as DOWN
2. Leader requests the replica to do recovery
3. Replica does recovery
A. Replica publishes its state as RECOVERING
B. Replica syncs with the leader
C. Replica publishes its state as ACTIVE
[Diagram: L, R and ZK, with arrows labeled 1, 2, 3a/3c and 3b]
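A minimal sketch of the LIR steps above, assuming an in-memory stand-in for ZK. `FakeZk`, `leader_initiated_recovery` and `replica_recover` are hypothetical names, and the leader's recovery request is modelled as a plain function call rather than a network request.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    RECOVERING = "recovering"
    DOWN = "down"


class FakeZk:
    """Hypothetical in-memory stand-in for the replica states kept in ZK."""

    def __init__(self):
        self.states = {}
        self.history = []

    def publish(self, replica: str, state: State, by: str) -> None:
        self.states[replica] = state
        self.history.append((by, replica, state))


def replica_recover(zk: FakeZk, replica: str, sync_with_leader) -> None:
    zk.publish(replica, State.RECOVERING, by=replica)  # step 3A
    sync_with_leader()                                 # step 3B
    zk.publish(replica, State.ACTIVE, by=replica)      # step 3C


def leader_initiated_recovery(zk: FakeZk, replica: str, sync_with_leader) -> None:
    # Step 1: the leader marks the replica DOWN.
    zk.publish(replica, State.DOWN, by="leader")
    # Step 2: the leader asks the replica to recover (a direct call here).
    replica_recover(zk, replica, sync_with_leader)


zk = FakeZk()
leader_initiated_recovery(zk, "R", sync_with_leader=lambda: None)
print([(by, state.name) for by, _, state in zk.history])
```

Note that two different writers (the leader and the replica) publish to the same state entry.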
When replica's state is not enough
When a replica is out-of-sync
• It should not become leader
• It should not become ACTIVE without acknowledging the LIR process
• It should not skip recovery
Additional Flag : LIR State {ACTIVE, RECOVERING, DOWN}
• Both leader and replica can change it
• A replica has 9 different states in total
A failure case of the old design
Outcome: a replica stays in the DOWN state forever
[Sequence diagram between L, R and ZK; recoverable labels: START; update; failed to send an update; LIR; recovery on STARTUP; publish DOWN; R's state = DOWN; R's state = REC; wait for leader to see RECOVERING state; wait to see R's state is RECOVERING; "hmm, nope!"; timeout]
Cons of the old design
• Replica states are shared resources
• LIR states are shared resources
• Unable to prove its correctness
• Unable to handle all kinds of failures
The new design (SOLR-11702)
• Each replica will have an associated term (a positive number)
• The word "term" is borrowed from the Raft paper (https://goo.gl/9UaURg)
• Terms of all replicas of a shard are stored in ZK
• Path : /collections/collection1/terms/shard1
• Val : {"core1" : 2, "core2" : 2, "core3" : 0}
• Only replicas with the highest term can become leader
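As a small illustration, the path and value above can be read like this. The path layout and JSON payload are copied from the slide; `terms_path` and `eligible_leaders` are hypothetical helper names, and no real ZK client is involved.

```python
import json


def terms_path(collection: str, shard: str) -> str:
    """Build the terms node path in the layout shown on the slide."""
    return f"/collections/{collection}/terms/{shard}"


def eligible_leaders(terms: dict) -> list:
    """Replicas whose term equals the highest term in the shard."""
    highest = max(terms.values())
    return sorted(name for name, term in terms.items() if term == highest)


raw = '{"core1" : 2, "core2" : 2, "core3" : 0}'  # node value from the slide
terms = json.loads(raw)
print(terms_path("collection1", "shard1"))
print(eligible_leaders(terms))
```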
Operations for changing terms
• Op1 : A replica sets its term equal to its leader's term
• Op2 : A leader increases its own term and some other replicas' terms by 1
• from : {"core1" : 2, "core2" : 2, "core3" : 2}
• to : {"core1" : 3, "core2" : 3, "core3" : 2}
• Terms can only increase monotonically
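The two operations can be sketched as pure functions over the term map. The function names are hypothetical, and treating core1 as the leader in the example is an assumption; the slide does not say which core leads.

```python
def set_term_equal_to_leader(terms: dict, replica: str, leader: str) -> dict:
    """Op1: a replica sets its term equal to its leader's term."""
    updated = dict(terms)
    # max() keeps the change monotonic: a term never goes down.
    updated[replica] = max(terms[replica], terms[leader])
    return updated


def increase_terms(terms: dict, leader: str, replicas_in_sync: set) -> dict:
    """Op2: the leader bumps its own term and the chosen replicas' terms by 1."""
    bumped = {leader} | set(replicas_in_sync)
    return {name: term + 1 if name in bumped else term
            for name, term in terms.items()}


before = {"core1": 2, "core2": 2, "core3": 2}
after = increase_terms(before, leader="core1", replicas_in_sync={"core2"})
print(after)
caught_up = set_term_equal_to_leader(after, replica="core3", leader="core1")
print(caught_up)
```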
Rules
• Leader only forwards updates to replicas whose terms are equal to its own term
• {"core1" : 3, "core2" : 3, "core3" : 2}
• Replica watches the term values node and starts the recovery process whenever its term is less than its leader's
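Both rules reduce to simple comparisons on the term map. A minimal sketch, assuming core1 is the leader in the slide's example (the slide does not say) and using hypothetical function names:

```python
def forward_targets(terms: dict, leader: str) -> list:
    """Replicas the leader still forwards updates to (term equal to its own)."""
    return sorted(name for name, term in terms.items()
                  if name != leader and term == terms[leader])


def needs_recovery(terms: dict, replica: str, leader: str) -> bool:
    """The check a replica would run when its watch on the terms node fires."""
    return terms[replica] < terms[leader]


terms = {"core1": 3, "core2": 3, "core3": 2}  # values from the slide
print(forward_targets(terms, leader="core1"))
print(needs_recovery(terms, "core3", leader="core1"))
```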
How to deal with index failures
1. Leader (L) increases by 1 its own term and the terms of the replicas that responded successfully to the update
• from : {"L":1, "R1":1, "R2":1}
• to : {"L":2, "R1":1, "R2":2}
2. Replica (R1) watches ZK and gets notified that it needs to do recovery
• Replica sets its term equal to the leader's, then does recovery
• {"L":2, "R1":2, "R1_recovering":1, "R2":2}
[Diagram: L, R1 and ZK, with arrows labeled 1 and 2]
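The two steps can be replayed on a plain dict. This is a sketch with hypothetical function names; storing the replica's old term under the `"<name>_recovering"` key is inferred from the slide's example value, not from Solr's source.

```python
def on_update_result(terms: dict, leader: str, acks: dict) -> dict:
    """Step 1: bump the leader and every replica that acknowledged the update."""
    bumped = {leader} | {name for name, ok in acks.items() if ok}
    return {name: term + 1 if name in bumped else term
            for name, term in terms.items()}


def start_recovering(terms: dict, replica: str, leader: str) -> dict:
    """Step 2: the replica adopts the leader's term and marks itself recovering."""
    updated = dict(terms)
    # Assumption: the marker keeps the term the replica had before recovery.
    updated[f"{replica}_recovering"] = terms[replica]
    updated[replica] = terms[leader]
    return updated


terms = {"L": 1, "R1": 1, "R2": 1}
terms = on_update_result(terms, "L", acks={"R1": False, "R2": True})
print(terms)
terms = start_recovering(terms, "R1", leader="L")
print(terms)
```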
Consistency problem until 7.3
1. A shard with 3 replicas and R1 is leader
2. R2 and R3 go DOWN
3. R1 receives updates u1, u2
4. R1 goes DOWN
5. R2 and R3 come back
6. R2 or R3 becomes leader without having u1, u2
[Diagram: R1 holds u1, u2; R2 and R3 do not]
How the new design solved the consistency problem
1. A shard with 3 replicas and R1 is leader; terms: {R1:1, R2:1, R3:1}
2. R2 and R3 go DOWN; terms: {R1:1, R2:1, R3:1}
3. R1 receives updates u1, u2; terms: {R1:2, R2:1, R3:1}
4. R1 goes DOWN
5. R2 and R3 come back
6. R2 and R3 can't become leader since their terms are not the highest; terms: {R1:2, R2:1, R3:1}
[Diagram: R1 holds u1, u2; R2 and R3 do not]
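The scenario can be replayed as a short trace. This is an illustrative sketch only: `can_become_leader` is a hypothetical helper, and replica liveness is modelled as a plain set.

```python
def can_become_leader(terms: dict, candidate: str) -> bool:
    """A candidate is eligible only if no replica has a higher term."""
    return terms[candidate] == max(terms.values())


terms = {"R1": 1, "R2": 1, "R3": 1}   # step 1: R1 is leader
live = {"R1"}                          # step 2: R2 and R3 go DOWN

# Step 3: R1 indexes u1 and u2 alone, so only its own term increases.
terms["R1"] += 1

live = {"R2", "R3"}                    # steps 4-5: R1 goes DOWN, R2 and R3 return

# Step 6: no live replica has the highest term, so a replica
# that is missing u1 and u2 is not elected.
electable = sorted(r for r in live if can_become_leader(terms, r))
print(terms, electable)
```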
Pros of the new design
• Proof of correctness
• Terms of replicas are a great hint for leader election
• {"core1" : 1, "core2" : 2, "core3" : 0}
• No need for a direct connection between leader and replica
• Only the replica updates its own state
• Solved long-standing issues
• Replica stays in DOWN state forever
• Leaderless shard
• Design document : https://goo.gl/ueSLFT
Note for system administrators
• The new LIR design was introduced in Solr 7.3
• Leader in 7.3 will
• use the old LIR process for replicas running 7.2 or earlier versions
• use the new LIR process for replicas running 7.3 or later versions
• The backward-compatibility support will be removed in Solr 8.0
• Leader in 8.0 can only use the new LIR process
• Leverage the new design in leader election
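The compatibility notes above amount to a small decision table. A sketch, assuming versions are compared as (major, minor) tuples and that every 7.x leader from 7.3 onward behaves like the 7.3 leader described on the slide; `lir_process_for` is a hypothetical helper, not a Solr API.

```python
def lir_process_for(leader_version: tuple, replica_version: tuple) -> str:
    """Pick the LIR process a leader would use, per the notes above."""
    if leader_version >= (8, 0):
        return "new"   # 8.0 leaders can only use the new process
    if leader_version >= (7, 3):
        # Assumption: later 7.x leaders follow the same rule as 7.3.
        return "new" if replica_version >= (7, 3) else "old"
    return "old"       # the new design did not exist before 7.3


print(lir_process_for((7, 3), (7, 2)))
print(lir_process_for((7, 3), (7, 3)))
print(lir_process_for((8, 0), (8, 0)))
```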
Q&A

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

How SolrCloud Solved Recovery Issues - Dat Cao Manh, Lucidworks

  • 1. How SolrCloud Solved Recovery Issues Cao Manh Dat Lucene/Solr Committer, Lucidworks @caomanhdat #Activate18 #ActivateSearch
  • 2. Agenda • Basics of SolrCloud Indexing • How Solr used to deal with indexing failures • The new design • Q&A
  • 3. Basics of SolrCloud Indexing [Diagram: Shard 1 with leader L and replicas R1 and R2, plus ZooKeeper (ZK)]
  • 4. Basics of SolrCloud Indexing [Diagram: update u1 arrives at the shard]
  • 5. Basics of SolrCloud Indexing [Diagram: update u1 arrives; u1 is sent on to R1 and R2]
  • 6. Basics of SolrCloud Indexing [Diagram: R1 and R2 each answer "success"]
  • 7. Basics of SolrCloud Indexing [Diagram: "success" from R1, from R2, and for the update as a whole]
  • 8. Index failure on replica side • Unavoidable • Connection issues • Replica's node suffers a fatal event [Diagram: Shard 1 with L, R1, R2 and ZK; one replica answers "success", the other gives no response; a query arrives]
  • 9. Agenda • Basics of SolrCloud Indexing • How Solr used to deal with index failures • The new design • Q&A
  • 11. How Solr used to deal with index failures • Idea (SOLR-5495): the LIR (leader-initiated recovery) process 1. Leader publishes the replica's state as DOWN 2. Leader requests the replica to do recovery 3. Replica does recovery A. Replica publishes its state as RECOVERING B. Replica syncs with the leader C. Replica publishes its state as ACTIVE [Diagram: L, R and ZK, with arrows labeled 1, 2, 3a/3c and 3b]
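The LIR steps on the slide above can be sketched as a sequence of state publications. This is a minimal illustration only: the dictionary `zk_state` is a hypothetical in-memory stand-in for ZooKeeper, the function names are invented for the sketch, and the actual sync in step 3b is omitted.

```python
# Hypothetical in-memory stand-in for the replica states kept in ZooKeeper.
zk_state = {"R": "ACTIVE"}
history = []  # every state published for the replica, in order


def publish(core, state):
    """Record a state change for a core (stand-in for a ZK write)."""
    zk_state[core] = state
    history.append(state)


def replica_recover(core):
    """Step 3: the replica runs recovery and publishes its own state."""
    publish(core, "RECOVERING")  # step 3a
    # step 3b: sync with the leader (omitted in this sketch)
    publish(core, "ACTIVE")      # step 3c


def leader_initiated_recovery(core):
    """Steps 1 and 2: the leader marks the replica DOWN, then asks it to recover."""
    publish(core, "DOWN")        # step 1
    replica_recover(core)        # step 2 (a network request in a real cluster)


leader_initiated_recovery("R")
print(history)
```

Note that both the leader and the replica write the same state entry here, which is the shared-resource property the later slides list as a weakness.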
  • 12. When replica's state is not enough • When a replica is out of sync: • It should not become leader • It should not become ACTIVE without acknowledging the LIR process • It should not skip recovery • Additional flag: LIR state {ACTIVE, RECOVERING, DOWN} • Both leader and replica can change it • A replica has 9 different states in total
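One way to read the count of 9 on the slide above is as every pairing of a replica state with an LIR state. The sketch below assumes the replica state ranges over the same three values as the LIR flag (ACTIVE, RECOVERING, DOWN); that reading is an assumption, not something the slide spells out.

```python
from itertools import product

# Assumption: replica state and LIR state each take one of these three values.
STATES = ("ACTIVE", "RECOVERING", "DOWN")

# Every (replica_state, lir_state) pair the old design had to reason about.
combined = list(product(STATES, STATES))

for replica_state, lir_state in combined:
    print(f"replica={replica_state:<10} lir={lir_state}")
print(len(combined))
```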
  • 13. A failure case of the old design • Outcome: a replica stays in the DOWN state forever [Sequence diagram between L, R and ZK; labels: START; recovery on STARTUP; R's state = REC; update; failed to send an update; LIR; publish DOWN; R's state = DOWN; wait leader to see RECOVERING state; wait to see R's state is RECOVERING; "hmm, nope!"; Timeout]
  • 14. Cons of the old design • Replica states are shared resources • LIR states are shared resources • Its correctness cannot be proven • It cannot handle all kinds of failures
  • 15. Agenda • Basics of SolrCloud Indexing • How Solr used to deal with index failures • The new design • Q&A
  • 16. The new design (SOLR-11702) • Each replica has an associated term (a non-negative number) • The "term" terminology is borrowed from the Raft paper (https://goo.gl/9UaURg) • Terms of all replicas of a shard are stored in ZK • Path: /collections/collection1/terms/shard1 • Value: {"core1" : 2, "core2" : 2, "core3" : 0} • Only replicas with the highest term can become leader
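The leader-eligibility rule on the slide above can be sketched as a check against the term map. This is a minimal sketch: the map is a plain dictionary with the values shown on the slide, and `can_become_leader` is a hypothetical helper name, not a Solr API.

```python
def can_become_leader(terms, core):
    """A core is eligible only if no other core has a higher term."""
    return terms[core] == max(terms.values())


# Term map as shown on the slide for /collections/collection1/terms/shard1.
terms = {"core1": 2, "core2": 2, "core3": 0}

eligible = [core for core in terms if can_become_leader(terms, core)]
print(eligible)
```

With these values, core1 and core2 share the highest term and are both eligible, while core3 is not.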
  • 17. Operations for changing terms • Op1: A replica sets its term equal to its leader's term • Op2: A leader increases its own term and the terms of some other replicas by 1 • from: {"core1" : 2, "core2" : 2, "core3" : 2} • to: {"core1" : 3, "core2" : 3, "core3" : 2} • Terms can only increase monotonically
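The two operations on the slide above can be sketched as pure functions over the term map. The function names are hypothetical, core1 is assumed to be the leader, and taking `max` in Op1 is a choice made here to keep terms from ever decreasing.

```python
def set_term_equal_to_leader(terms, replica, leader):
    """Op1: the replica raises its term to the leader's term."""
    new_terms = dict(terms)
    # max() keeps the term monotonic even if the replica were already ahead.
    new_terms[replica] = max(terms[replica], terms[leader])
    return new_terms


def increase_terms(terms, leader, others):
    """Op2: the leader bumps its own term and the terms of some other replicas by 1."""
    new_terms = dict(terms)
    for core in {leader, *others}:
        new_terms[core] += 1
    return new_terms


before = {"core1": 2, "core2": 2, "core3": 2}
after = increase_terms(before, "core1", ["core2"])          # core3 is left behind
caught_up = set_term_equal_to_leader(after, "core3", "core1")
print(after)
print(caught_up)
```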
  • 18. Rules • The leader only forwards updates to replicas whose term equals its own • {"core1" : 3, "core2" : 3, "core3" : 2} • A replica watches the term values node and starts the recovery process whenever its term is lower than its leader's
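Both rules on the slide above reduce to comparisons against the leader's term. The sketch below assumes core1 is the leader (the slide does not say which core leads) and uses hypothetical helper names.

```python
def forward_targets(terms, leader):
    """Replicas that the leader forwards updates to: those with the leader's term."""
    return [core for core, term in terms.items()
            if core != leader and term == terms[leader]]


def needs_recovery(terms, replica, leader):
    """A replica watching the term node starts recovery when it is behind the leader."""
    return terms[replica] < terms[leader]


terms = {"core1": 3, "core2": 3, "core3": 2}
targets = forward_targets(terms, "core1")
lagging = [core for core in terms if needs_recovery(terms, core, "core1")]
print(targets, lagging)
```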
  • 19. How to deal with index failures 1. Leader (L) increases by 1 its own term and the terms of the replicas that responded successfully to the update • from : {"L":1, "R1":1, "R2":1} • to : {"L":2, "R1":1, "R2":2} 2. Replica (R1) watches ZK and gets notified that it needs to do recovery • The replica sets its term equal to the leader's, then does recovery • {"L":2, "R1":2, "R1_recovering":1, "R2":2} [Diagram: L, R1 and ZK, with arrows labeled 1 and 2]
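The two steps on the slide above can be replayed on the slide's own values. In this sketch the function names are hypothetical, and the `_recovering` entry is assumed to hold the replica's previous term, which matches the numbers on the slide but is not stated there.

```python
def on_update_result(terms, leader, succeeded):
    """Step 1: bump the leader and every replica that acknowledged the update."""
    new_terms = dict(terms)
    for core in [leader, *succeeded]:
        new_terms[core] += 1
    return new_terms


def start_recovery(terms, replica, leader):
    """Step 2: a lagging replica marks itself recovering and adopts the leader's term."""
    if terms[replica] >= terms[leader]:
        return dict(terms)  # not behind, nothing to do
    new_terms = dict(terms)
    new_terms[replica + "_recovering"] = terms[replica]  # assumed meaning, see lead-in
    new_terms[replica] = terms[leader]
    return new_terms


terms = {"L": 1, "R1": 1, "R2": 1}
step1 = on_update_result(terms, "L", succeeded=["R2"])  # R1 did not respond
step2 = start_recovery(step1, "R1", "L")
print(step1)
print(step2)
```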
  • 20. Consistency problem until 7.3 1. A shard with 3 replicas, and R1 is leader [Diagram: R1, R2, R3]
  • 21. Consistency problem until 7.3 1. A shard with 3 replicas, and R1 is leader 2. R2 and R3 go DOWN [Diagram: R1, R2, R3]
  • 22. Consistency problem until 7.3 1. A shard with 3 replicas, and R1 is leader 2. R2 and R3 go DOWN 3. R1 receives updates u1, u2 [Diagram: R1 holds u1, u2; R2, R3]
  • 23. Consistency problem until 7.3 1. A shard with 3 replicas, and R1 is leader 2. R2 and R3 go DOWN 3. R1 receives updates u1, u2 4. R1 goes DOWN 5. R2 and R3 come back [Diagram: R1 holds u1, u2; R2, R3]
  • 24. Consistency problem until 7.3 1. A shard with 3 replicas, and R1 is leader 2. R2 and R3 go DOWN 3. R1 receives updates u1, u2 4. R1 goes DOWN 5. R2 and R3 come back 6. R2 or R3 becomes leader without having u1, u2 [Diagram: R1 holds u1, u2; R2, R3]
  • 25. How the new design solved the consistency problem 1. A shard with 3 replicas, and R1 is leader [Diagram: R1, R2, R3; terms {R1:1, R2:1, R3:1}]
  • 26. How the new design solved the consistency problem 1. A shard with 3 replicas, and R1 is leader 2. R2 and R3 go DOWN [Diagram: R1, R2, R3; terms {R1:1, R2:1, R3:1}]
  • 27. How the new design solved the consistency problem 1. A shard with 3 replicas, and R1 is leader 2. R2 and R3 go DOWN 3. R1 receives updates u1, u2 [Diagram: R1 holds u1, u2; R2, R3; terms {R1:2, R2:1, R3:1}]
  • 28. How the new design solved the consistency problem 1. A shard with 3 replicas, and R1 is leader 2. R2 and R3 go DOWN 3. R1 receives updates u1, u2 4. R1 goes DOWN 5. R2 and R3 come back 6. R2 and R3 can't become leader since their terms are not the highest [Diagram: R1 holds u1, u2; R2, R3; terms {R1:2, R2:1, R3:1}]
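The scenario on slides 25 to 28 can be replayed in a few lines. This is a sketch under assumptions: the helper names are hypothetical, and the leader is assumed to bump terms only when a failed replica still shares its term, which is why two updates move R1 from 1 to 2 rather than to 3, as the slides show.

```python
def apply_update(terms, leader, failed):
    """Leader handles one update that the `failed` replicas did not acknowledge."""
    new_terms = dict(terms)
    # Assumption: bump only if some failed replica is not yet behind the leader.
    if any(terms[core] >= terms[leader] for core in failed):
        for core in terms:
            if core not in failed and terms[core] == terms[leader]:
                new_terms[core] += 1
    return new_terms


def eligible_leaders(terms, live):
    """Live replicas holding the highest term recorded for the shard."""
    highest = max(terms.values())
    return sorted(core for core in live if terms[core] == highest)


terms = {"R1": 1, "R2": 1, "R3": 1}          # step 1: R1 is leader
down = ["R2", "R3"]                          # step 2: R2 and R3 go DOWN
for _update in ("u1", "u2"):                 # step 3: R1 receives u1, u2
    terms = apply_update(terms, "R1", failed=down)
live = {"R2", "R3"}                          # steps 4 and 5: R1 down, R2 and R3 back
print(terms, eligible_leaders(terms, live))  # step 6: nobody live is eligible
```

Once R1 comes back, `eligible_leaders(terms, {"R1", "R2", "R3"})` contains only R1, so the replica holding u1 and u2 is the one that can lead.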
  • 29. Pros of the new design • Proof of correctness • Terms of replicas are a great hint for leader election • {"core1" : 1, "core2" : 2, "core3" : 0} • No need for a direct connection between leader and replica • Only the replica updates its own state • Solved long-standing issues • Replica stays in DOWN state forever • Leaderless shard • Design document: https://goo.gl/ueSLFT
  • 30. Note for system administrators • The new LIR design was introduced in Solr 7.3 • A leader in 7.3 will • use the old LIR process for replicas running 7.2 or earlier • use the new LIR process for replicas running 7.3 or later • Backward-compatibility support will be removed in Solr 8.0 • A leader in 8.0 can only use the new LIR process • Leverage the new design in leader election
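The version rules on the slide above amount to a small decision table. The sketch below is only an illustration of those rules: the function names and the dotted version-string format are assumptions, and it does not reflect how Solr detects a replica's version.

```python
def parse(version):
    """Turn a dotted version such as '7.3.1' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def lir_process(leader_version, replica_version):
    """Which LIR process a leader uses for a replica, per the slide's rules."""
    leader = parse(leader_version)
    if leader >= (8, 0):
        return "new"   # 8.0 leaders can only use the new process
    if leader >= (7, 3):
        # 7.3 leaders stay compatible with replicas on 7.2 or earlier
        return "new" if parse(replica_version) >= (7, 3) else "old"
    return "old"       # the term-based design does not exist before 7.3


for leader_v, replica_v in [("7.3", "7.2.1"), ("7.3", "7.3"), ("8.0", "8.0")]:
    print(leader_v, replica_v, lir_process(leader_v, replica_v))
```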
  • 31. Q&A