IMPACT
SOLR HOSTING & LARGE SCALE
By Tino Dainese – tid@impact.dk

AGENDA
• Simple hosting of Solr (NSSM)
• SolrCloud
– What is it? Startup, how it works
• ZooKeeper
– Setup
• Why 3 servers
– SE and Imerco
• An optimal setup (NLB)
• Backup

TERMINOLOGY
• Core, the basic unit of a Solr index.
• Collection, a set of documents – a collection is hosted in a core.
• Configuration, the set of configuration files (schema.xml, solrconfig.xml, etc.).
• Leader, each shard has one slice that is designated as the leader.
• Node, a physical server instance in SolrCloud.
• Replica, a copy of a slice and shard.
• Shard, one or more slices (leader and replicas).
• Slice, a part of a shard – either a leader or a replica.
• SolrCloud, a group of Solr nodes.
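
To make the relationships between these terms concrete, here is a minimal sketch that models them as the slide defines them. The class names, field names and the example shard are illustrative assumptions, not part of Solr's API; the host names SLR3 to SLR5 are borrowed from the hosting slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Slice:
    """One part of a shard on one node: either the leader or a replica."""
    node: str                 # host name of the node holding this slice
    is_leader: bool = False


@dataclass
class Shard:
    """One or more slices: exactly one leader plus any number of replicas."""
    name: str
    slices: List[Slice] = field(default_factory=list)

    def leader(self) -> Slice:
        leaders = [s for s in self.slices if s.is_leader]
        if len(leaders) != 1:
            raise ValueError(f"{self.name} must have exactly one leader")
        return leaders[0]

    def replicas(self) -> List[Slice]:
        return [s for s in self.slices if not s.is_leader]


# Hypothetical shard spread over three nodes.
shard1 = Shard("shard1", [Slice("SLR3", is_leader=True), Slice("SLR4"), Slice("SLR5")])
```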

INSTALLATION
• Install Java JRE 1.7
• Download the Solr zip
– http://mirrors.rackhosting.com/apache/lucene/solr/5.2.1/
• Unpack to c:\solr or similar.

HOSTING SOLR
• We use NSSM
• "the Non-Sucking Service Manager"
• AppDirectory = C:\solr
• Application = Java
• -Dsolr.solr.home=d:/Data/Solr/MultiCore
• -Dsolr.solr.log=e:/Solr
• -Djetty.home=c:/Solr/
• -Djetty.logs=e:/Solr
• -DzkHost=SLR3,SLR4,SLR5 -jar start.jar
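
The settings above can be assembled into a service registration. The following is a minimal sketch, assuming a service named "Solr" (a hypothetical name) and assuming NSSM is configured through its `nssm install` and `nssm set` commands; it only builds the command lines as strings and does not run them.

```python
def solr_java_args(system_props, zk_hosts):
    """Build the argument list passed to java when starting Solr."""
    args = [f"-D{key}={value}" for key, value in system_props.items()]
    # zkHost must be a single comma-separated token without spaces.
    args.append("-DzkHost=" + ",".join(zk_hosts))
    args += ["-jar", "start.jar"]
    return args


def nssm_commands(service, app_dir, java_args):
    """Return the nssm command lines that would register the service."""
    params = " ".join(java_args)
    return [
        f"nssm install {service} java",
        f"nssm set {service} AppDirectory {app_dir}",
        f'nssm set {service} AppParameters "{params}"',
    ]


# Values taken from the slide; the service name "Solr" is an assumption.
props = {
    "solr.solr.home": "d:/Data/Solr/MultiCore",
    "solr.solr.log": "e:/Solr",
    "jetty.home": "c:/Solr/",
    "jetty.logs": "e:/Solr",
}
commands = nssm_commands("Solr", r"C:\solr", solr_java_args(props, ["SLR3", "SLR4", "SLR5"]))
```

Printing `commands` gives three lines that can be reviewed before running them in an elevated command prompt.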

SOLRCLOUD
• No master/slave, but leaders and replicas
• Flexible distributed search and indexing
• Fault tolerant and highly available
• Automatic load balancing and failover for queries
• ZooKeeper integration for cluster coordination and configuration (locations, etc.)

STARTING SOLRCLOUD
• Estimate how many shards are needed
• Estimate how many machines are needed
• Start ZooKeeper
• Start Solr on all the servers
• Send documents to any machine
• Send queries to any machine
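
The last two steps can be sketched from the client side. This is a minimal illustration, assuming the nodes are the SLR3 to SLR5 hosts from the hosting slide, that Solr listens on port 8983 on each of them, and that the collection is named CmsContentCore (all three are assumptions). It builds the requests against Solr's `/update` and `/select` handlers without sending them.

```python
import json
import random
from urllib.parse import urlencode
from urllib.request import Request

NODES = ["SLR3", "SLR4", "SLR5"]   # assumed host names
COLLECTION = "CmsContentCore"      # assumed collection name


def pick_node(rng=random):
    """Any node will do: the cluster routes the request internally."""
    return rng.choice(NODES)


def update_request(node, docs):
    """Build a POST that indexes a list of documents as JSON."""
    url = f"http://{node}:8983/solr/{COLLECTION}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"})


def select_url(node, query):
    """Build a query URL for the select handler."""
    params = urlencode({"q": query, "wt": "json"})
    return f"http://{node}:8983/solr/{COLLECTION}/select?{params}"
```

Passing the result of `update_request` or `select_url` to `urllib.request.urlopen` would send it to the chosen node.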

SOLRCLOUD
[Diagram: four servers, each running a Solr instance and a ZK node: Solr Master 1, Solr Replica 1, Solr Master 2, Solr Replica 1]
- Add a node and point it at ZK
- The node takes on the role of shard 2
- Automatic document distribution
- Automatic querying across the cluster
- Central configuration and monitoring
- Add replica nodes
- Role assigned automatically by ZK
- Leader election handled by ZK
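
The idea behind automatic document distribution can be illustrated with a small sketch. Solr's own router hashes the document id and assigns hash ranges to shards; the CRC32-modulo scheme below is a simplified stand-in chosen for brevity, not Solr's algorithm, and the function name is hypothetical.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard index in [0, num_shards).

    Simplified stand-in for hash-based routing: the same id always
    lands on the same shard, whichever node received the document.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards
```

Because the mapping depends only on the id, any node can work out where a document belongs and forward it to that shard's leader.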

SOLRCLOUD
[Diagram: five servers, each running a Solr instance and a ZK node: Solr Master 1, Solr Replica 1, Solr Master 2, Solr Replica 1, Solr Master 2]

SOLRCLOUD
[Diagram: four servers, each running a Solr instance and a ZK node: Solr Master 1, Solr Replica 1, Solr Master 2, Solr Replica 1]
Startup parameters shown in the diagram:
-DzkRun
-Dcollection.configName=jz
-DnumShards=2
-Dbootstrap_confdir=./solr/coll/conf
-DzkHost=ZookeeperHost1 (shown four times in the diagram)
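
One plausible reading of these parameters is that the first node starts an embedded ZooKeeper and uploads the configuration, while every other node only points at ZooKeeper. The sketch below encodes that reading; the split between "first" and "other" nodes is an assumption, since the slide text does not say which node gets which flags, and the function names are hypothetical.

```python
def first_node_args(config_name, conf_dir, num_shards):
    """Arguments for the node that bootstraps the configuration (assumed role)."""
    return [
        "-DzkRun",                                  # run an embedded ZooKeeper
        f"-Dcollection.configName={config_name}",   # name the uploaded config set
        f"-DnumShards={num_shards}",                # number of shards in the collection
        f"-Dbootstrap_confdir={conf_dir}",          # directory uploaded to ZooKeeper
        "-jar", "start.jar",
    ]


def other_node_args(zk_host):
    """Arguments for every additional node: just point it at ZooKeeper."""
    return [f"-DzkHost={zk_host}", "-jar", "start.jar"]


# Values as shown on the slide.
first = first_node_args("jz", "./solr/coll/conf", 2)
others = other_node_args("ZookeeperHost1")
```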

SOLRCLOUD
[Diagram: a client searching for data goes through load balancers to Solr shard 1, Solr shard 2 and Solr shard 3, each with a ZK node]

SOLRCLOUD & ZOOKEEPER
• Optimal setup
• Fully redundant
• Take a machine out and put it back in without any reconfiguration
• All machines are active
• A ZooKeeper ensemble has 1, 3, 5, etc. servers
• One gives no redundancy and 5 is more than needed, hence 3
• No NLB setup on external servers
• Still cheap in licences
[Diagram: three servers (Solr Leader, Solr replica, Solr replica), each with a ZK node, behind Windows NLB; label: HTTP server for Solr]
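
The preference for odd ensemble sizes follows from ZooKeeper needing a majority of its servers to be up. A short sketch of that arithmetic (the function names are illustrative):

```python
def quorum(ensemble_size: int) -> int:
    """Smallest number of servers that forms a majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still running."""
    return ensemble_size - quorum(ensemble_size)


table = {n: tolerated_failures(n) for n in (1, 2, 3, 4, 5)}
# table == {1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
```

A single server tolerates no failures, three tolerate one, and a fourth server adds cost without tolerating any more failures than three.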

ZOOKEEPER SETUP
• Download ZooKeeper
• Unpack to e.g. C:\zookeeper-3.4.6
• NSSM is set up to start C:\zookeeper-3.4.6\bin\zkServer.cmd
• C:\zookeeper\data\myid must contain the server's unique id
• zoo.cfg must be located in "C:\zookeeper-3.4.6\conf" and contain:
[zoo.cfg contents not included in the slide text]
• Core configurations can be added to ZooKeeper by calling zkcli:
– zkcli -zkhost localhost:2181 -cmd upconfig -confdir N:\Solr\MultiCore/CmsContentCore/conf -confname CmsContentCore
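
Since the zoo.cfg contents are not part of the slide text, the following is only a sketch of what a three-server configuration typically looks like. The timing values are the defaults from ZooKeeper's sample configuration, the ports 2888 and 3888 are the conventional peer and election ports, and the host names and data directory are assumptions based on the other slides.

```python
def zoo_cfg(data_dir, servers, client_port=2181):
    """Render a zoo.cfg for an ensemble; server ids are assigned 1, 2, 3, ..."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={data_dir}",          # forward slashes avoid escaping issues on Windows
        f"clientPort={client_port}",
    ]
    for server_id, host in enumerate(servers, start=1):
        lines.append(f"server.{server_id}={host}:2888:3888")
    return "\n".join(lines) + "\n"


def myid_content(servers, host):
    """The myid file holds only the id matching this host's server.N line."""
    return str(servers.index(host) + 1)


ensemble = ["SLR3", "SLR4", "SLR5"]              # assumed host names
cfg = zoo_cfg("C:/zookeeper/data", ensemble)     # assumed data directory
```

The same zoo.cfg goes on every server; only the myid file differs from machine to machine.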

IMERCO AND SE

FILE BACKUP
• Why take backups at all?
– E.g. if the index is corrupted
– Design the architecture so that all data can be recreated easily
• Backup
– The data directory that holds the index files
– Config files such as schema.xml, solrconfig.xml, etc.
• Restore
– Shut Solr down, copy the files back, and start Solr.
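
The file-based procedure can be sketched as two small functions. This assumes a core directory that contains `data` and `conf` subdirectories; the function names and the timestamped folder naming are illustrative choices, and stopping and starting Solr is left to the operator.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_core(core_dir, backup_root, stamp=None):
    """Copy a core's data and conf directories into a timestamped folder."""
    core_dir = Path(core_dir)
    stamp = stamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{core_dir.name}-{stamp}"
    for sub in ("data", "conf"):
        shutil.copytree(core_dir / sub, target / sub)
    return target


def restore_core(backup_dir, core_dir):
    """Copy a backup back into place. Solr must be shut down first."""
    backup_dir, core_dir = Path(backup_dir), Path(core_dir)
    for sub in ("data", "conf"):
        shutil.rmtree(core_dir / sub, ignore_errors=True)
        shutil.copytree(backup_dir / sub, core_dir / sub)
```

A restore then amounts to stopping the Solr service, calling `restore_core` with the chosen backup folder, and starting the service again.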

BACKUP API
• Create a backup like this:
– http://localhost:8983/solr/<core>/replication?command=backup
• Check the backup like this:
– http://localhost:8983/solr/<core>/replication?command=details
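
The two calls can be scripted, for example from a scheduled job. This is a minimal sketch using only the standard library; the core name in the usage note is an assumption, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def replication_url(core, command, base="http://localhost:8983/solr", **params):
    """Build a URL for the core's replication handler."""
    query = urlencode({"command": command, **params})
    return f"{base}/{core}/replication?{query}"


def call(url, timeout=30):
    """Send the request and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a running Solr, `call(replication_url("CmsContentCore", "backup"))` would trigger a backup, and `call(replication_url("CmsContentCore", "details"))` would return the status to inspect afterwards.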
