Scaling with Solr Cloud
Saumitra Srivastav (saumitra.srivastav@glassbeam.com)
Bangalore Apache Solr Group, September 2014 Meetup
What is Solr Cloud? 
-a set of features that add distributed capabilities to Solr 
-fault tolerance and high availability 
-distributed indexing and search 
-enables and simplifies horizontal scaling of a search index using sharding and replication
Non-Cloud Single Node Deployment 
Machine (server) 1 
  Solr node (Jetty on port 8983) 
    Core 1: conf, data 
    Core 2: conf, data 
    Core N: conf, data
Use Solr Cloud for ... 
-performance 
-scalability 
-high-availability 
-simplicity 
-elasticity
Solr Cloud Glossary 
-Cluster 
-Node 
-Shard 
-Leader & Replica 
-Overseer 
-Collection 
-Zookeeper
High Level View
Glossary 
-Cluster 
-set of Solr nodes 
-Node 
-a JVM instance running Solr. 
-also known as a Solr server. 
-Core 
-an individual Solr instance (represents a logical index). 
-multiple cores can run on a single node.
Glossary 
-Collection 
-one or more documents grouped together in a single logical index. 
-can be spread across multiple cores. 
-Shard 
-a logical section of a single collection 
-implemented as a core 
-Replica 
-A copy of a shard or single logical index 
-used in failover or load balancing.
Glossary 
-Leader 
-The main node for each shard that routes document adds, updates, or deletes to other replicas 
-if the leader goes down, a new node is elected to take its place 
-Overseer 
-A single node in SolrCloud that is responsible for processing actions involving the entire cluster 
-if the overseer goes down, a new node is elected to take its place
Zookeeper 
-distributed coordination 
-maintaining configuration information 
(Diagram: Solr nodes 1 to 4, at 10.0.0.1:8983 through 10.0.0.4:8983, all connected to ZooKeeper)
Zookeeper 
(Diagram: the same four Solr nodes connected to a three-server ZooKeeper ensemble, zk-1:2181, zk-2:2182 and zk-3:2183, which forms a quorum; a client is also shown)
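A ZooKeeper ensemble keeps serving requests only while a majority (quorum) of its servers is up. The arithmetic behind the three-server ensemble in the diagram can be sketched in Python; the function names are hypothetical, and `zk_host` just joins the servers into the comma-separated host:port list that Solr's `zkHost` setting takes.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest majority of a ZooKeeper ensemble."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


def zk_host(servers: list) -> str:
    """Comma-separated connect string (hypothetical helper)."""
    return ",".join(servers)


if __name__ == "__main__":
    ensemble = ["zk-1:2181", "zk-2:2182", "zk-3:2183"]
    print(zk_host(ensemble))
    for n in (1, 3, 4, 5):
        # ensemble size, quorum size, failures tolerated
        print(n, quorum_size(n), tolerated_failures(n))
```

The loop also shows why ensembles are usually sized with an odd number of servers: four servers tolerate one failure, no more than three do.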
Zookeeper - Central Configuration
Zookeeper - distributed coordination 
-keeps track of live nodes in /live_nodes 
-collection metadata and replica state in /clusterstate.json 
-alias list in /aliases.json 
-Leader election
Collections 
-Collection is a distributed index defined by: 
-named configuration 
-stored in ZooKeeper 
-number of shards 
-replication factor 
-Number of copies of each document in the collection 
-document routing strategy: 
-how documents get assigned to shards
Collections API 
localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=4&replicationFactor=2&maxShardsPerNode=1&createNodeSet=localhost:8983_solr&collection.configName=collection1Config
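The CREATE call above can be assembled programmatically. A minimal sketch with Python's standard `urllib.parse.urlencode`, using the slide's parameter names (`createNodeSet` is left out for brevity); `create_collection_url` and `min_nodes_needed` are hypothetical helpers. The second function makes the sizing explicit: the collection has numShards × replicationFactor cores, and `maxShardsPerNode` caps how many of them one node may host. Note that `maxShardsPerNode` belongs to the Solr 4.x-era API shown here and was deprecated in later releases.

```python
from urllib.parse import urlencode


def create_collection_url(base: str, name: str, num_shards: int,
                          replication_factor: int, config_name: str,
                          max_shards_per_node: int = 1) -> str:
    """Build a Collections API CREATE request URL (hypothetical helper)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
        "collection.configName": config_name,
    }
    return f"{base}/solr/admin/collections?{urlencode(params)}"


def min_nodes_needed(num_shards: int, replication_factor: int,
                     max_shards_per_node: int) -> int:
    """Cores = shards x replicas; ceiling-divide by the per-node cap."""
    cores = num_shards * replication_factor
    return -(-cores // max_shards_per_node)


if __name__ == "__main__":
    print(create_collection_url("http://localhost:8983", "collection1", 4, 2,
                                "collection1Config"))
    print(min_nodes_needed(4, 2, 1))  # 8 nodes for 8 cores at one core per node
```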
Collections
Sharding 
-Collection has a fixed number of shards 
-existing shards can be split 
-When to shard? 
-Large number of docs 
-Large document sizes 
-Parallelization during indexing and queries 
-Data partitioning (custom hashing)
Replication 
-Why replicate? 
-High-availability 
-Load balancing 
-How does it work in SolrCloud? 
-Near-real-time, NOT master-slave 
-Leader forwards to replicas in parallel, waits for response 
-Error handling during indexing is tricky
Indexing
Indexing 
1. Get the cluster state from ZooKeeper. 
2. Route the document directly to its shard leader (hash on the document ID). 
3. Persist the document to durable storage (tlog). 
4. Forward it to the healthy replicas. 
5. Acknowledge the successful write to the client.
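Steps 3 to 5 happen on the shard leader. A toy in-memory model of that write path follows; `Replica`, `Leader` and the core names are hypothetical, steps 1 and 2 (finding the leader) are left out, and forwarding is sequential here, whereas the slides note that Solr forwards to replicas in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for a replica core."""
    name: str
    healthy: bool = True
    tlog: list = field(default_factory=list)

    def apply(self, doc: dict) -> bool:
        if not self.healthy:
            return False
        self.tlog.append(doc)  # the replica logs the forwarded update too
        return True


@dataclass
class Leader(Replica):
    replicas: list = field(default_factory=list)

    def index(self, doc: dict) -> dict:
        # Step 3: persist the raw document to the leader's own tlog first.
        self.tlog.append(doc)
        # Step 4: forward to every healthy replica and wait for each answer.
        acked = [r.name for r in self.replicas if r.healthy and r.apply(doc)]
        # Step 5: only now acknowledge the write to the client.
        return {"id": doc["id"], "status": "ok", "replicas_acked": acked}


if __name__ == "__main__":
    up, down = Replica("core_node2"), Replica("core_node3", healthy=False)
    leader = Leader("core_node1", replicas=[up, down])
    print(leader.index({"id": "bookdoc1"}))
```

The unhealthy replica simply misses the update in this model; in SolrCloud it would have to recover from the leader before serving requests again.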
Querying
Querying 
-Query client can be ZooKeeper-aware or just query via a load balancer 
-Client can send the query to any node in the cluster 
-Controller node distributes the query to one replica of each shard to identify the documents matching the query 
-Controller node merges and sorts the per-shard results, then issues a second query for all fields of the documents on the requested page
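The two-phase query can be sketched in Python with `heapq`. This is a simplified model, assuming each document already carries a precomputed `score` and that a plain predicate stands in for query matching; the function names and sample data are hypothetical. Phase one asks each shard only for (score, id) pairs, the controller merges them, and phase two fetches full documents for the requested page only.

```python
import heapq


def query_shard(docs: dict, matches, k: int) -> list:
    """Phase 1 on one shard: return only (score, id) pairs for its top-k matches."""
    hits = [(d["score"], doc_id) for doc_id, d in docs.items() if matches(d)]
    return heapq.nlargest(k, hits)


def distributed_search(shards: dict, matches, start: int = 0, rows: int = 10) -> list:
    """`shards` maps shard name -> {doc id: stored fields}."""
    k = start + rows  # every shard must return enough hits to cover the page
    per_shard = [query_shard(docs, matches, k) for docs in shards.values()]
    # The controller merges and sorts the lightweight per-shard results...
    merged = heapq.nlargest(k, (hit for hits in per_shard for hit in hits))
    page_ids = [doc_id for _, doc_id in merged[start:k]]
    # ...then fetches all fields, but only for the ids on this page.
    # (Simplified: a real controller asks the shards that own those ids.)
    stored = {doc_id: d for docs in shards.values() for doc_id, d in docs.items()}
    return [stored[doc_id] for doc_id in page_ids]


if __name__ == "__main__":
    shards = {
        "shard1": {"a": {"id": "a", "text": "soccer news", "score": 0.9},
                   "b": {"id": "b", "text": "cricket", "score": 0.8}},
        "shard2": {"c": {"id": "c", "text": "soccer scores", "score": 0.7},
                   "d": {"id": "d", "text": "soccer boots", "score": 0.95}},
    }
    page = distributed_search(shards, lambda d: "soccer" in d["text"], rows=2)
    print([d["id"] for d in page])  # ['d', 'a']
```

Requesting `start + rows` hits from every shard is what makes deep paging expensive in a distributed index: page 100 forces each shard to rank far more hits than it returns.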
Transaction Log (tlog) 
-file where the raw documents are written for recovery purposes 
-each replica (core) has its own tlog 
-replayed on server restart 
-in case of a non-graceful shutdown 
-“rolled over” automatically on hard commit 
-old one is closed and a new one is opened
Transaction Log (tlog)
Commits 
-Hard Commit & Soft Commit 
-Hard commits are about durability, soft commits are about visibility 
-Further reading: https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
What happens on hard commit? 
-The tlog is truncated. 
-A new tlog is started. 
-Old tlogs will be deleted if there are more than 100 documents in newer tlogs. 
-The current index segment is closed and flushed. 
-Background segment merges may be initiated.
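The tlog rollover on hard commit can be modelled in a few lines. A toy sketch, assuming each tlog is represented only by its document count and applying the slide's 100-document rule literally; in Solr the threshold is configurable through the update log's `numRecordsToKeep` setting. `TlogDir` and its methods are hypothetical.

```python
from dataclasses import dataclass, field

KEEP = 100  # threshold taken from the slide


@dataclass
class TlogDir:
    """Toy model of one core's tlog directory; counts documents per tlog."""
    closed: list = field(default_factory=list)  # closed tlogs, oldest first
    current: int = 0                             # documents in the open tlog

    def add(self, n_docs: int) -> None:
        self.current += n_docs

    def hard_commit(self) -> None:
        # Roll over: close the current tlog and open a new, empty one.
        self.closed.append(self.current)
        self.current = 0
        # Drop the oldest closed tlog while newer closed tlogs hold > KEEP docs.
        while len(self.closed) > 1 and sum(self.closed[1:]) > KEEP:
            self.closed.pop(0)

    def soft_commit(self) -> None:
        # Visibility only: the tlog is not rolled over and keeps growing.
        pass


if __name__ == "__main__":
    d = TlogDir()
    for n in (60, 70, 50):
        d.add(n)
        d.hard_commit()
    print(d.closed)  # [70, 50]
```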
What happens on soft commit? 
-The tlog has NOT been truncated. It will continue to grow. 
-New documents WILL be visible. 
-some caches will have to be reloaded 
-top-level caches will be invalidated.
Shard Splitting 
-Can split shards into two sub-shards 
-Live splitting. No downtime needed. 
-Requests start being forwarded to sub-shards automatically 
-Expensive operation: use it only when needed, during low-traffic periods
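Splitting a shard divides its hash range between the two sub-shards; by default SPLITSHARD cuts the range into two equal halves. A sketch of that arithmetic in the slides' unsigned hex notation (Solr itself stores ranges as signed 32-bit integers); `split_range` and `fmt` are hypothetical helpers.

```python
def split_range(lo: int, hi: int) -> tuple:
    """Split an inclusive hash range into two equal halves."""
    if lo >= hi:
        raise ValueError("range too small to split")
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)


def fmt(r: tuple) -> str:
    """Render a range the way the slides write it, e.g. 80000000-bfffffff."""
    return f"{r[0]:08x}-{r[1]:08x}"


if __name__ == "__main__":
    # Splitting the slides' "Shard 2" (80000000 - ffffffff).
    for sub in split_range(0x80000000, 0xFFFFFFFF):
        print(fmt(sub))
```

Because the sub-shards cover exactly the parent's range, documents never need re-hashing: each existing document already belongs to exactly one half.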
Overseer 
-Persists collection state change events to ZooKeeper 
-Controller for Collection API commands 
-One per cluster (for all collections); elected using leader election 
-Asynchronous (pub/sub messaging) 
-Automated failover to a healthy node 
-Can be assigned to a dedicated node
Overseer
Controlling data partitioning 
-Shard vs Replicas 
-Custom Routing 
-Collection Aliasing
Shard vs Replica 
(Diagram: more data? add shards; more queries? add replicas)
Document Routing 
-How to assign documents to shards 
-Default Routing 
-Custom routing 
-Routers 
-CompositeID 
-Implicit
Default Routing 
-Each shard covers a hash-range 
-Hash doc-ID into 32-bit integer, map to range 
-Leads to roughly balanced shards
Default Routing 
(Diagram: a collection where Shard 1 covers hash range 0 - 7fffffff and Shard 2 covers 80000000 - ffffffff; documents with IDs bookdoc1, magazinedoc1 and bookdoc2 are assigned by the 32-bit hash of their ID, with hash values 858919514, 2516704228 and 413288864 shown)
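Solr's default router hashes the document ID with MurmurHash3 (the 32-bit x86 variant) and picks the shard whose range contains the hash. A self-contained Python sketch of that idea using the slide's two-shard layout; `shard_for` and `SHARDS` are hypothetical, and the hash values it prints are not guaranteed to match the numbers on the slide.

```python
MASK = 0xFFFFFFFF


def _rotl(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 (x86, 32-bit variant), returned as an unsigned integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK
    n_full = len(data) - len(data) % 4
    for i in range(0, n_full, 4):  # body: 4-byte little-endian blocks
        k = int.from_bytes(data[i:i + 4], "little")
        k = _rotl((k * c1) & MASK, 15)
        h ^= (k * c2) & MASK
        h = (_rotl(h, 13) * 5 + 0xE6546B64) & MASK
    tail = data[n_full:]
    if tail:  # remaining 1 to 3 bytes
        k = int.from_bytes(tail, "little")
        k = _rotl((k * c1) & MASK, 15)
        h ^= (k * c2) & MASK
    h ^= len(data)  # finalization mix
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK
    return h ^ (h >> 16)


# The slide's two-shard layout, in its unsigned hex notation.
SHARDS = {"shard1": (0x00000000, 0x7FFFFFFF), "shard2": (0x80000000, 0xFFFFFFFF)}


def shard_for(doc_id: str, shards=SHARDS) -> str:
    """Default routing: hash the whole ID, find the shard whose range holds it."""
    h = murmur3_32(doc_id.encode("utf-8"))
    for name, (lo, hi) in shards.items():
        if lo <= h <= hi:
            return name
    raise LookupError(f"no shard covers hash {h:08x}")


if __name__ == "__main__":
    for doc_id in ("bookdoc1", "magazinedoc1", "bookdoc2"):
        print(doc_id, shard_for(doc_id))
```

Since the hash of unrelated IDs is effectively random, related documents such as bookdoc1 and bookdoc2 may land on different shards, which is what custom routing addresses.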
Default Routing - Querying 
(Diagram: the application sends q=soccer to a collection with eight shards; with no routing information, the query goes to all eight shards)
Custom Routing 
-Route documents to specific shards 
-based on a shard key component in the document ID
Custom Routing 
-send documents with a prefix in the document ID 
-the prefix in the ID is used to calculate the hash that determines the shard 
-the prefix must be separated from the rest of the ID by an exclamation mark (!) 
-Example: 
1. book!doc1 
2. magazine!doc1 
3. book!author!doc2
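With the compositeId router, the top 16 bits of the hash come from the prefix and the low 16 bits from the rest of the ID, so every ID sharing a prefix falls into one 65,536-value slice of the hash space, and a `_route_=book!` query only needs the shards overlapping that slice. A Python sketch under the slide's two-shard layout; it repeats the MurmurHash3 function so it runs on its own, and `composite_hash`, `shards_for_route` and `shard_for` are hypothetical. Three-part IDs such as book!author!doc2 use a different bit split in Solr and are rejected here.

```python
MASK = 0xFFFFFFFF
SHARDS = {"shard1": (0x00000000, 0x7FFFFFFF), "shard2": (0x80000000, 0xFFFFFFFF)}


def _rotl(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 (x86, 32-bit variant), unsigned result."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK
    n_full = len(data) - len(data) % 4
    for i in range(0, n_full, 4):
        k = _rotl((int.from_bytes(data[i:i + 4], "little") * c1) & MASK, 15)
        h ^= (k * c2) & MASK
        h = (_rotl(h, 13) * 5 + 0xE6546B64) & MASK
    if len(data) > n_full:
        k = _rotl((int.from_bytes(data[n_full:], "little") * c1) & MASK, 15)
        h ^= (k * c2) & MASK
    h ^= len(data)
    h = ((h ^ (h >> 16)) * 0x85EBCA6B) & MASK
    h = ((h ^ (h >> 13)) * 0xC2B2AE35) & MASK
    return h ^ (h >> 16)


def composite_hash(doc_id: str) -> int:
    """Two-part compositeId: top 16 bits from the prefix, low 16 from the rest."""
    parts = doc_id.split("!")
    if len(parts) == 1:
        return murmur3_32(doc_id.encode("utf-8"))  # no prefix: hash the whole ID
    if len(parts) != 2:
        raise ValueError("this sketch only handles a single '!' separator")
    prefix, rest = parts
    top = murmur3_32(prefix.encode("utf-8")) & 0xFFFF0000
    low = murmur3_32(rest.encode("utf-8")) & 0x0000FFFF
    return top | low


def shard_for(doc_id: str, shards=SHARDS) -> str:
    h = composite_hash(doc_id)
    return next(name for name, (lo, hi) in shards.items() if lo <= h <= hi)


def shards_for_route(route_key: str, shards=SHARDS) -> list:
    """Shards overlapping the hash slice of a route key such as 'book!'."""
    lo = murmur3_32(route_key.rstrip("!").encode("utf-8")) & 0xFFFF0000
    hi = lo | 0x0000FFFF
    return [n for n, (s_lo, s_hi) in shards.items() if s_lo <= hi and lo <= s_hi]


if __name__ == "__main__":
    for doc_id in ("book!doc1", "magazine!doc1", "book!doc2"):
        print(doc_id, f"{composite_hash(doc_id):08x}", shard_for(doc_id))
    print(shards_for_route("book!"))
```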
Custom Routing - Indexing 
(Diagram: Shard 1 covers hash range 0 - 7fffffff and Shard 2 covers 80000000 - ffffffff; documents book!doc1, magazine!doc1 and book!doc2 are assigned by the hash of their prefix, so both book! documents land on the same shard)
Custom Routing - Querying 
http://10.0.0.7:8983/solr/collection1/select?q=soccer&_route_=book!
http://10.0.0.7:8983/solr/collection1/select?q=soccer&_route_=book!,magazine!
Custom Routing - Querying 
(Diagram: the application sends q=soccer&_route_=book! to a collection with eight shards; only the shard holding the book! prefix is queried)
Implicit Router 
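-A field can be defined while creating the collection to be used for routing: each document's value in that field names the shard it is sent to. The implicit router also needs the shard names themselves, passed in the shards parameter (not shown here).
http://localhost:8983/solr/admin/collections?action=CREATE&name=articles&router.name=implicit&router.field=article-type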
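Unlike hash-based routing, the implicit router does no hashing at all: the routing field's value is the shard name. A toy Python model of that rule; the class, the shard names `news` and `blog`, and the choice to raise on an unknown shard are all hypothetical simplifications rather than Solr's exact behaviour.

```python
class ImplicitRouter:
    """Toy model: the value of the router field names the target shard."""

    def __init__(self, shard_names: list, router_field: str):
        self.shards = {name: [] for name in shard_names}
        self.router_field = router_field

    def add(self, doc: dict) -> str:
        target = doc.get(self.router_field)
        if target not in self.shards:
            # Simplification: reject documents that name no existing shard.
            raise KeyError(f"no shard named {target!r}")
        self.shards[target].append(doc["id"])
        return target


if __name__ == "__main__":
    router = ImplicitRouter(["news", "blog"], "article-type")
    print(router.add({"id": "1", "article-type": "news"}))  # news
```

This gives the application full control over placement, for example one shard per month or per tenant, at the cost of keeping shard sizes balanced itself.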
Collection Aliasing 
-allows you to set up a virtual collection that actually points to one or more real collections 
-Virtual collection == alias
localhost:8983/solr/admin/collections?action=CREATEALIAS&name=alias-name&collections=collection-list
Collection Aliasing 
-Time-series data 
(Diagram: real collections June, July, Aug, Sep and Oct; the alias last3months points to Aug, Sep and Oct, and the alias latest points to Oct)
Collection Aliasing 
localhost:8983/solr/admin/collections?action=CREATEALIAS&name=last3months&collections=aug,sep,oct
localhost:8983/solr/admin/collections?action=CREATEALIAS&name=latest&collections=oct
Collection Aliasing 
(Diagram: a new real collection, Nov, is added and both aliases are moved forward)
localhost:8983/solr/admin/collections?action=CREATEALIAS&name=last3months&collections=sep,oct,nov
localhost:8983/solr/admin/collections?action=CREATEALIAS&name=latest&collections=nov
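Moving the aliases forward each month is mechanical, so it is easy to script. A sketch that derives both CREATEALIAS calls from an ordered list of monthly collections; `alias_urls` and the lowercase collection names are hypothetical, and `safe=","` keeps the commas in the collection list unescaped so the URLs match the ones above.

```python
from urllib.parse import urlencode


def alias_urls(base: str, monthly: list, window: int = 3) -> list:
    """CREATEALIAS calls re-pointing both aliases at the newest collections."""
    aliases = {
        f"last{window}months": monthly[-window:],  # rolling window
        "latest": monthly[-1:],                    # newest month only
    }
    urls = []
    for name, cols in aliases.items():
        params = {"action": "CREATEALIAS", "name": name,
                  "collections": ",".join(cols)}
        urls.append(f"{base}/solr/admin/collections?{urlencode(params, safe=',')}")
    return urls


if __name__ == "__main__":
    months = ["june", "july", "aug", "sep", "oct", "nov"]
    for url in alias_urls("http://localhost:8983", months):
        print(url)
```

Re-issuing CREATEALIAS for an existing alias name re-points it, which is what lets queries against last3months switch over without any client change.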
Collection Aliasing 
-Aliases can be: 
•updated on the fly 
•queried just like a normal collection 
•used for indexing as long as it is pointing to a single collection
Other Features 
-Near-Real-Time Search 
-Atomic Updates 
-Optimistic Locking 
-HTTPS 
-Use HDFS for storing indexes 
-Use MapReduce for building index
Thanks 
-Attributions: 
•Shalin Mangar’s slides on “SolrCloud: Searching Big Data” 
•Rafał Kuć’s slides on “Scaling Solr with SolrCloud” 
-Connect 
•saumitra.srivastav@glassbeam.com 
•saumitra.srivastav7@gmail.com 
•https://www.linkedin.com/in/saumitras 
•@_saumitra_ 
-Join: 
•http://www.meetup.com/Bangalore-Apache-Solr-Lucene-Group/

More Related Content

What's hot

Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...
Databricks
 

What's hot (20)

Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...
Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...
Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...
 
So You Want to Write an Exporter
So You Want to Write an ExporterSo You Want to Write an Exporter
So You Want to Write an Exporter
 
AWS Connectivity, VPC Design and Security Pro Tips
AWS Connectivity, VPC Design and Security Pro TipsAWS Connectivity, VPC Design and Security Pro Tips
AWS Connectivity, VPC Design and Security Pro Tips
 
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity PlanDay 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
 
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh BhatnagarScalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
 
Amazon OpenSearch Deep dive - 내부구조, 성능최적화 그리고 스케일링
Amazon OpenSearch Deep dive - 내부구조, 성능최적화 그리고 스케일링Amazon OpenSearch Deep dive - 내부구조, 성능최적화 그리고 스케일링
Amazon OpenSearch Deep dive - 내부구조, 성능최적화 그리고 스케일링
 
Amazon Virtual Private Cloud
Amazon Virtual Private CloudAmazon Virtual Private Cloud
Amazon Virtual Private Cloud
 
AWS Introduction
AWS IntroductionAWS Introduction
AWS Introduction
 
Network Security and Access Control in AWS
Network Security and Access Control in AWSNetwork Security and Access Control in AWS
Network Security and Access Control in AWS
 
Serverless with IAC - terraform과 cloudformation 비교
Serverless with IAC - terraform과 cloudformation 비교Serverless with IAC - terraform과 cloudformation 비교
Serverless with IAC - terraform과 cloudformation 비교
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
 
OWASP AppSecEU 2018 – Attacking "Modern" Web Technologies
OWASP AppSecEU 2018 – Attacking "Modern" Web TechnologiesOWASP AppSecEU 2018 – Attacking "Modern" Web Technologies
OWASP AppSecEU 2018 – Attacking "Modern" Web Technologies
 
Moving Gigantic Files Into and Out of the Alfresco Repository
Moving Gigantic Files Into and Out of the Alfresco RepositoryMoving Gigantic Files Into and Out of the Alfresco Repository
Moving Gigantic Files Into and Out of the Alfresco Repository
 
[오픈소스컨설팅] Docker를 활용한 Gitlab CI/CD 구성 테스트
[오픈소스컨설팅] Docker를 활용한 Gitlab CI/CD 구성 테스트[오픈소스컨설팅] Docker를 활용한 Gitlab CI/CD 구성 테스트
[오픈소스컨설팅] Docker를 활용한 Gitlab CI/CD 구성 테스트
 
AWS Fargate on EKS 실전 사용하기
AWS Fargate on EKS 실전 사용하기AWS Fargate on EKS 실전 사용하기
AWS Fargate on EKS 실전 사용하기
 
Build and Manage Your APIs with Amazon API Gateway
Build and Manage Your APIs with Amazon API GatewayBuild and Manage Your APIs with Amazon API Gateway
Build and Manage Your APIs with Amazon API Gateway
 
AWS における サーバーレスの基礎からチューニングまで
AWS における サーバーレスの基礎からチューニングまでAWS における サーバーレスの基礎からチューニングまで
AWS における サーバーレスの基礎からチューニングまで
 
AWS AutoScaling
AWS AutoScalingAWS AutoScaling
AWS AutoScaling
 
ElastiCache & Redis
ElastiCache & RedisElastiCache & Redis
ElastiCache & Redis
 
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...
 

Viewers also liked

Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
Tommaso Teofili
 
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabsSolr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Lucidworks
 

Viewers also liked (20)

Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
 
Friends of Solr - Nutch & HDFS
Friends of Solr - Nutch & HDFSFriends of Solr - Nutch & HDFS
Friends of Solr - Nutch & HDFS
 
Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
GIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataGIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big Data
 
Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Election
 
Apache SolrCloud
Apache SolrCloudApache SolrCloud
Apache SolrCloud
 
SolrCloud and Shard Splitting
SolrCloud and Shard SplittingSolrCloud and Shard Splitting
SolrCloud and Shard Splitting
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
 
Intro to Apache Solr
Intro to Apache SolrIntro to Apache Solr
Intro to Apache Solr
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloud
 
High Performance Solr
High Performance SolrHigh Performance Solr
High Performance Solr
 
Scaling Solr with Solr Cloud
Scaling Solr with Solr CloudScaling Solr with Solr Cloud
Scaling Solr with Solr Cloud
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Drools Ecosystem
Drools EcosystemDrools Ecosystem
Drools Ecosystem
 
Efficient Scalable Search in a Multi-Tenant Environment: Presented by Harry H...
Efficient Scalable Search in a Multi-Tenant Environment: Presented by Harry H...Efficient Scalable Search in a Multi-Tenant Environment: Presented by Harry H...
Efficient Scalable Search in a Multi-Tenant Environment: Presented by Harry H...
 
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabsSolr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 

Similar to Scaling search with SolrCloud

Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Lucidworks (Archived)
 
Seeley yonik solr performance key innovations
Seeley yonik   solr performance key innovationsSeeley yonik   solr performance key innovations
Seeley yonik solr performance key innovations
Lucidworks (Archived)
 
Elasticsearch Data Analyses
Elasticsearch Data AnalysesElasticsearch Data Analyses
Elasticsearch Data Analyses
Alaa Elhadba
 
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Lucidworks
 

Similar to Scaling search with SolrCloud (20)

Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloud
 
Seeley yonik solr performance key innovations
Seeley yonik   solr performance key innovationsSeeley yonik   solr performance key innovations
Seeley yonik solr performance key innovations
 
First oslo solr community meetup lightning talk janhoy
First oslo solr community meetup lightning talk janhoyFirst oslo solr community meetup lightning talk janhoy
First oslo solr community meetup lightning talk janhoy
 
Solr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationSolr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin Presentation
 
Elasticsearch Data Analyses
Elasticsearch Data AnalysesElasticsearch Data Analyses
Elasticsearch Data Analyses
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
 
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Solr As A SparkSQL DataSource
Solr As A SparkSQL DataSourceSolr As A SparkSQL DataSource
Solr As A SparkSQL DataSource
 
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
 
Technical Overview of Apache Drill by Jacques Nadeau
Technical Overview of Apache Drill by Jacques NadeauTechnical Overview of Apache Drill by Jacques Nadeau
Technical Overview of Apache Drill by Jacques Nadeau
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr Performance
 
Solr4 nosql search_server_2013
Solr4 nosql search_server_2013Solr4 nosql search_server_2013
Solr4 nosql search_server_2013
 
Percona Live 2017 ­- Sharded cluster tutorial
Percona Live 2017 ­- Sharded cluster tutorialPercona Live 2017 ­- Sharded cluster tutorial
Percona Live 2017 ­- Sharded cluster tutorial
 
Zookeeper Introduce
Zookeeper IntroduceZookeeper Introduce
Zookeeper Introduce
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
RAC - The Savior of DBA
RAC - The Savior of DBARAC - The Savior of DBA
RAC - The Savior of DBA
 
Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Production
 
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
 

Recently uploaded

怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
HyderabadDolls
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 

Recently uploaded (20)

怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Vastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service AvailableVastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service Available
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 

Scaling search with SolrCloud

  • 1. Scaling with Solr Cloud Saumitra Srivastav saumitra.srivastav@glassbeam.com Bangalore Apache Solr Group September-2014 Meetup
  • 2. What is Solr Cloud? -set of features which add distributed capabilities in Solr -fault tolerance and high availability -distributed indexing and search -enable and simplify horizontal scaling a search index using sharding and replication
  • 3. Non-Cloud Single Node Deployment Machine(server) - 1 Solr Node ( jetty on port 8983 ) Core - 1 Conf Data Core - 2 Conf Data Core - N Conf Data ......... .........
  • 4. Use Solr Cloud for ... -performance -scalability -high-availability -simplicity -elasticity
  • 5. Solr Cloud Glossary -Cluster -Node -Shard -Leader & Replica -Overseer -Collection -Zookeeper
  • 7. Glossary -Cluster -set of solr nodes -Node -a JVM instance running Solr. -also known as a Solr server. -Core -an individual Solr instance (represents a logical index). -multiple cores can run on a single node.
  • 8. Glossary -Collection -one or more documents grouped together in a single logical index. -can be spread across multiple cores. -Shard -a logical section of a single collection -Implemented as core -Replica -A copy of a shard or single logical index -used in failover or load balancing.
  • 9. Glossary -Leader -The main node for each shard that routes document adds, updates, or deletes to other replicas -if leader goes down, a new node will be elected to take it's place -Overseer -A single node in SolrCloud that is responsible for processing actions involving the entire cluster -if overseer goes down, a new node will be elected to take it's place
  • 10. Zookeeper -distributed coordination -maintaining configuration information Solr Node 1 10.0.0.1:8983 Solr Node 3 10.0.0.3:8983 Solr Node 2 10.0.0.2:8983 Solr Node 4 10.0.0.4:8983 Zookeeper
  • 11. Zookeeper Solr Node 1 10.0.0.1:8983 Solr Node 3 10.0.0.3:8983 Solr Node 2 10.0.0.2:8983 Solr Node 4 10.0.0.4:8983 zk-1:2181 zk-2:2182 zk-3:2183 Quorum Client
  • 12. Zookeeper - Central Configuration
  • 13. Zookeeper - distributed coordination -Keep track of /live_nodes -Collection metadata and replica state in /clusterstate.json -Alias list in /aliasies.json -Leader election
  • 14. Collections -Collection is a distributed index defined by: -named configuration -stored in ZooKeeper -number of shards -replication factor -Number of copies of each document in the collection -document routing strategy: -how documents get assigned to shards
  • 15. Collections API localhost:8983/solr/admin/collections?action=CREATE &name=collection1 &numShards=4 &replicationFactor=2 &maxShardsPerNode=1 &createNodeSet=localhost:8933 &collection.configName=collection1Config
  • 17. Sharding -Collection has a fixed number of shards -existing shards can be split -When to shard? -Large number of docs -Large document sizes -Parallelization during indexing and queries -Data partitioning (custom hashing)
  • 18. Replication -Why replicate? -High-availability -Load balancing -How does it work in SolrCloud? -Near-real-time, NOT master-slave -Leader forwards to replicas in parallel, waits for response -Error handling during indexing is tricky
  • 20. Indexing 1. Get the cluster state from ZK 2. Route the document directly to its shard leader (hash on doc ID) 3. Persist the document on durable storage (tlog) 4. Forward it to healthy replicas 5. Acknowledge the successful write to the client
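The five indexing steps, together with the previous slide's point that the leader forwards to replicas in parallel and waits for their responses, can be sketched as an in-memory simulation. Everything here is hypothetical: the cluster-state shape, the `Core` class, and the use of `zlib.crc32` as a stand-in for Solr's actual document-ID hash.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


class Core:
    """A hypothetical replica: a transaction log plus an id -> doc map."""

    def __init__(self, name, healthy=True):
        self.name, self.healthy = name, healthy
        self.tlog, self.docs = [], {}

    def apply(self, doc):
        self.tlog.append(doc)  # step 3: write to the log first
        self.docs[doc["id"]] = doc
        return self.name


# Step 1: cluster state as it might be read from ZooKeeper (hypothetical shape).
CLUSTER = {
    "shard1": {"range": (0x00000000, 0x7FFFFFFF), "leader": Core("s1-leader"),
               "replicas": [Core("s1-r1"), Core("s1-r2", healthy=False)]},
    "shard2": {"range": (0x80000000, 0xFFFFFFFF), "leader": Core("s2-leader"),
               "replicas": [Core("s2-r1")]},
}


def shard_for(doc_id):
    # Step 2: hash the id; crc32 is only a stand-in for Solr's hash function.
    h = zlib.crc32(doc_id.encode("utf-8"))
    for name, shard in CLUSTER.items():
        lo, hi = shard["range"]
        if lo <= h <= hi:
            return name
    raise LookupError(doc_id)


def index(doc):
    name = shard_for(doc["id"])
    shard = CLUSTER[name]
    shard["leader"].apply(doc)
    # Step 4: forward to healthy replicas in parallel and wait for every response.
    healthy = [r for r in shard["replicas"] if r.healthy]
    with ThreadPoolExecutor() as pool:
        acked = list(pool.map(lambda r: r.apply(doc), healthy))
    # Step 5: only now acknowledge the write to the client.
    return {"status": "ok", "shard": name, "acked_by": acked}


print(index({"id": "bookdoc1"}))
```

Waiting for every replica before acknowledging is what makes this near-real-time replication rather than asynchronous master-slave copying; it is also why a slow or failing replica makes error handling tricky.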
  • 22. Querying -Query client can be ZK aware or just query via a load balancer -Client can send the query to any node in the cluster -Controller node distributes the query to a replica of each shard to identify the documents matching the query -Controller node merges and sorts the per-shard results, then issues a second query for all fields of a page of results
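A minimal sketch of the two query phases, using three in-memory "shards" with made-up scores for `q=soccer` (hypothetical data; no real Solr request is sent). Phase one collects only `(score, id)` pairs; phase two fetches stored fields for the ids on the requested page. Each shard returns its top `start + rows` hits because, in the worst case, the whole page could come from a single shard.

```python
import heapq

# Hypothetical per-shard indexes: doc id -> (score for q=soccer, stored fields).
SHARDS = [
    {"d1": (2.5, {"title": "World Cup"}), "d4": (0.7, {"title": "Local league"})},
    {"d2": (3.1, {"title": "Soccer tactics"}), "d5": (1.2, {"title": "Goalkeeping"})},
    {"d3": (1.9, {"title": "Stadiums"})},
]


def phase1(shard, n):
    # Each shard returns only (score, id) for its own top n matches.
    return heapq.nlargest(n, ((score, doc_id) for doc_id, (score, _) in shard.items()))


def phase2(shard, ids):
    # Second request: stored fields for the ids this shard owns.
    return {i: shard[i][1] for i in ids if i in shard}


def distributed_search(start, rows):
    per_shard = [phase1(s, start + rows) for s in SHARDS]
    merged = heapq.nlargest(start + rows, (hit for hits in per_shard for hit in hits))
    page = [doc_id for _, doc_id in merged[start:start + rows]]
    fields = {}
    for shard in SHARDS:
        fields.update(phase2(shard, page))
    return [(doc_id, fields[doc_id]["title"]) for doc_id in page]


print(distributed_search(start=0, rows=3))
# [('d2', 'Soccer tactics'), ('d1', 'World Cup'), ('d3', 'Stadiums')]
```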
  • 23. Transaction Log (tlog) -file where the raw documents are written for recovery purposes -each node has its own tlog -replayed on server restart -in case of a non-graceful shutdown -“rolled over” automatically on hard commit -the old one is closed and a new one is opened
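A toy transaction log showing the three behaviours on the slide: append before acknowledging, roll over on hard commit, and replay on restart. The one-JSON-document-per-line file format and the `TransactionLog` class are hypothetical; Solr's real tlog is a binary format, and this sketch only replays the log written since the last hard commit.

```python
import json
import os
import tempfile


class TransactionLog:
    """Toy append-only log, one JSON document per line (hypothetical format)."""

    def __init__(self, directory):
        self.directory = directory
        self.generation = 0
        self._open()

    def _path(self, gen):
        return os.path.join(self.directory, f"tlog.{gen:019d}")

    def _open(self):
        self.file = open(self._path(self.generation), "a", encoding="utf-8")

    def append(self, doc):
        self.file.write(json.dumps(doc) + "\n")
        self.file.flush()
        os.fsync(self.file.fileno())  # durable before the update is acknowledged

    def roll_over(self):
        # Hard commit: close the current log and open a new one.
        self.file.close()
        self.generation += 1
        self._open()

    def replay(self):
        # Restart after an unclean shutdown: re-apply updates logged since the
        # last hard commit (older logs hold documents already in the index).
        with open(self._path(self.generation), encoding="utf-8") as f:
            return [json.loads(line) for line in f]


with tempfile.TemporaryDirectory() as d:
    log = TransactionLog(d)
    log.append({"id": "doc1"})
    log.roll_over()  # doc1 is now committed to the index
    log.append({"id": "doc2"})
    print(log.replay())  # [{'id': 'doc2'}]
    log.file.close()
```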
  • 25. Commits -Hard Commit & Soft Commit -Hard commits are about durability, soft commits are about visibility -Further reading: https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
  • 26. What happens on hard Commit? -The tlog is truncated. -A new tlog is started. -Old tlogs will be deleted if there are more than 100 documents in newer tlogs. -The current index segment is closed and flushed. -Background segment merges may be initiated.
  • 27. What happens on soft commit? -The tlog has NOT been truncated. It will continue to grow. -New documents WILL be visible. -some caches will have to be reloaded -top-level caches will be invalidated.
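The durability/visibility split described on the last three slides can be captured in a toy model. `CommitModel` is hypothetical; `KEEP = 100` mirrors the slide's rule that old tlogs are deleted once newer closed tlogs hold more than 100 documents, and the `open_searcher` flag mirrors Solr's `openSearcher` commit option, which decides whether a hard commit also makes documents visible.

```python
class CommitModel:
    """Toy model of hard vs soft commit (not Solr's implementation)."""

    KEEP = 100  # newer closed tlogs must hold more than this before old ones go

    def __init__(self):
        self.closed_tlogs = []   # lists of doc ids, oldest first
        self.current_tlog = []
        self.indexed = []        # everything added so far
        self.flushed = set()     # durable in closed index segments
        self.visible = set()     # what a searcher can see

    def add(self, doc_id):
        self.current_tlog.append(doc_id)
        self.indexed.append(doc_id)

    def soft_commit(self):
        # Visibility only: new docs become searchable; the tlog keeps growing.
        self.visible = set(self.indexed)

    def hard_commit(self, open_searcher):
        # Durability: flush segments, close the current tlog, start a new one.
        self.flushed = set(self.indexed)
        self.closed_tlogs.append(self.current_tlog)
        self.current_tlog = []
        while (len(self.closed_tlogs) > 1
               and sum(len(t) for t in self.closed_tlogs[1:]) > self.KEEP):
            self.closed_tlogs.pop(0)
        if open_searcher:
            self.visible = set(self.indexed)


m = CommitModel()
for i in range(3):
    m.add(f"doc{i}")
m.soft_commit()
print(len(m.visible), len(m.flushed), len(m.current_tlog))  # 3 0 3
m.hard_commit(open_searcher=False)
print(len(m.flushed), len(m.current_tlog))  # 3 0
```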
  • 28. Shard Splitting -Can split a shard into two sub-shards -Live splitting: no downtime needed -Requests start being forwarded to the sub-shards automatically -Expensive operation: use only when required, during low traffic
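Splitting works on the shard's hash range: by default the Collections API `SPLITSHARD` action divides the parent range into two halves, and Solr names the children after the parent (for example `shard1_0` and `shard1_1`). The sketch below only shows the range arithmetic and the resulting re-routing; both helper functions are hypothetical.

```python
def split_range(lo, hi):
    # Two contiguous halves that together cover exactly [lo, hi].
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)


def sub_shard_for(h, sub_ranges):
    # After the split, an update is forwarded to the sub-shard owning its hash.
    for i, (lo, hi) in enumerate(sub_ranges):
        if lo <= h <= hi:
            return i
    raise LookupError(h)


shard1 = (0x00000000, 0x7FFFFFFF)
halves = split_range(*shard1)
print([f"{lo:08x}-{hi:08x}" for lo, hi in halves])  # ['00000000-3fffffff', '40000000-7fffffff']
print(sub_shard_for(858919514, halves))  # 0
```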
  • 29. Overseer -Persists collection state change events to ZooKeeper -Controller for Collection API commands -One per cluster (for all collections); elected using leader election -Asynchronous (pub/sub messaging) -Automated failover to a healthy node -Can be assigned to a dedicated node
  • 31. Controlling data partitioning -Shard vs Replicas -Custom Routing -Collection Aliasing
  • 32. Shard vs Replica -More data? Add shards -More queries? Add replicas [Diagram: a collection growing by adding shards for more data and by adding replicas to each shard for more queries]
  • 33. Document Routing -How to assign documents to shards -Default Routing -Custom routing -Routers -CompositeID -Implicit
  • 34. Default Routing -Each shard covers a hash range -Hash the doc ID into a 32-bit integer and map it to a range -Leads to roughly balanced shards
  • 35. Default Routing [Diagram: a collection with Shard 1 (0 - 7fffffff) and Shard 2 (80000000 - ffffffff). The 32-bit hashes of the document IDs bookdoc1, magazinedoc1 and bookdoc2 (shown as 858919514, 2516704228 and 413288864) decide the shard: two fall in Shard 1's range and one in Shard 2's]
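A sketch of default hash-range routing, assuming a MurmurHash3 x86 32-bit hash (seed 0) of the UTF-8 document ID, which is the hash family Solr's default router is based on. The ranges are the slide's, treated as unsigned values; the sketch is not guaranteed to reproduce the exact numbers in the diagram or Solr's own shard labels.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3, x86 32-bit variant, returned as an unsigned int."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF

    def rotl(x, r):
        return ((x << r) | (x >> (32 - r))) & mask

    h = seed & mask
    n = len(data)
    body = n - n % 4
    for i in range(0, body, 4):  # 4-byte little-endian blocks
        k = int.from_bytes(data[i:i + 4], "little")
        k = rotl((k * c1) & mask, 15) * c2 & mask
        h = (rotl(h ^ k, 13) * 5 + 0xE6546B64) & mask
    k = 0
    tail = data[body:]
    if len(tail) == 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        h ^= rotl((k * c1) & mask, 15) * c2 & mask
    h ^= n  # finalization mix
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    return h ^ (h >> 16)


RANGES = {"Shard 1": (0x00000000, 0x7FFFFFFF), "Shard 2": (0x80000000, 0xFFFFFFFF)}


def shard_for_hash(h):
    return next(name for name, (lo, hi) in RANGES.items() if lo <= h <= hi)


def shard_for(doc_id):
    return shard_for_hash(murmur3_32(doc_id.encode("utf-8")))


for doc_id in ["bookdoc1", "magazinedoc1", "bookdoc2"]:
    print(doc_id, murmur3_32(doc_id.encode("utf-8")), shard_for(doc_id))
```

Because the hash spreads IDs uniformly, each shard ends up with roughly the same number of documents, but related documents (all the "book" ones, say) are scattered across shards.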
  • 36. Default Routing - Querying [Diagram: the application sends q=soccer to a collection with Shards 1 to 8; with default routing the query goes to every shard]
  • 37. Custom Routing -Route documents to specific shards -based on a shard key component in the document ID
  • 38. Custom Routing -send documents with a prefix in the document ID -the prefix is used to calculate the hash that determines the shard -the prefix must be separated from the rest of the ID by an exclamation mark (!) -Examples: 1. book!doc1 2. magazine!doc1 3. book!author!doc2
  • 39. Custom Routing - Indexing [Diagram: documents book!doc1, magazine!doc1 and book!doc2 in a collection with Shard 1 (0 - 7fffffff) and Shard 2 (80000000 - ffffffff); the prefix of each ID determines its shard, so documents sharing a prefix land together]
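A sketch of how a composite ID keeps documents with the same prefix together: the upper 16 bits of the routing hash come from the prefix (the shard key) and the lower 16 bits from the rest of the ID, so every `book!` document falls into the same slice of the hash space. Two assumptions to flag: `zlib.crc32` stands in for Solr's MurmurHash3, so real placements will differ, and three-part IDs such as `book!author!doc2` (which Solr splits 8/8/16 bits) are simplified here to prefix `book` plus the remainder.

```python
import zlib


def h32(s):
    # Stand-in hash only; placements will not match a real Solr cluster.
    return zlib.crc32(s.encode("utf-8"))


def composite_hash(doc_id):
    if "!" not in doc_id:
        return h32(doc_id)
    prefix, rest = doc_id.split("!", 1)
    # Upper 16 bits from the shard key, lower 16 bits from the rest of the id.
    return (h32(prefix) & 0xFFFF0000) | (h32(rest) & 0x0000FFFF)


RANGES = {"Shard 1": (0x00000000, 0x7FFFFFFF), "Shard 2": (0x80000000, 0xFFFFFFFF)}


def shard_for(doc_id):
    h = composite_hash(doc_id)
    return next(name for name, (lo, hi) in RANGES.items() if lo <= h <= hi)


for doc_id in ["book!doc1", "magazine!doc1", "book!doc2"]:
    print(doc_id, f"{composite_hash(doc_id):08x}", shard_for(doc_id))
```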
  • 40. Custom Routing - Querying http://10.0.0.7:8983/solr/collection1/select?q=soccer&_route_=book! http://10.0.0.7:8983/solr/collection1/select?q=soccer&_route_=book!,magazine!
  • 41. Custom Routing - Querying [Diagram: the application sends q=soccer&_route_=book! to a collection with Shards 1 to 8; only the shard holding the book! documents is queried]
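  • 42. Implicit Router -A field can be defined while creating the collection to be used for routing; the field's value names the shard a document goes to (the implicit router also requires a shards parameter listing the shard names) http://localhost:8983/solr/admin/collections?action=CREATE&name=articles&router.name=implicit&router.field=article-type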
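With the implicit router there is no hashing: the value of the routing field is the name of the target shard. The sketch below models that with a hypothetical `ImplicitRoutedCollection`; the shard names `news` and `blog` are made up for illustration, and rejecting an unknown shard name is this sketch's choice.

```python
class ImplicitRoutedCollection:
    """Hypothetical model of a collection created with router.name=implicit."""

    def __init__(self, shards, router_field):
        self.router_field = router_field
        self.shards = {name: [] for name in shards}

    def add(self, doc):
        # No hashing: the routing field's value names the target shard.
        target = doc.get(self.router_field)
        if target not in self.shards:
            raise ValueError(f"no shard named {target!r}")
        self.shards[target].append(doc)
        return target


articles = ImplicitRoutedCollection(["news", "blog"], router_field="article-type")
print(articles.add({"id": "a1", "article-type": "news"}))  # news
```

This gives full control over placement (useful for partitioning by type or date), at the cost of balancing shards yourself.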
  • 43. Collection Aliasing -allows you to set up a virtual collection that actually points to one or more real collections -Virtual collection == alias localhost:8983/solr/admin/collections?action=CREATEALIAS&name=alias-name&collections=collection-list
  • 44. Collection Aliasing -Time-series data [Diagram: real collections June, July, Aug, Sep and Oct; alias last3months points to Aug, Sep and Oct, and alias latest points to Oct]
  • 45. Collection Aliasing [Diagram: real collections June to Oct with aliases last3months and latest] localhost:8983/solr/admin/collections?action=CREATEALIAS&name=last3months&collections=aug,sep,oct localhost:8983/solr/admin/collections?action=CREATEALIAS&name=latest&collections=oct
  • 46. Collection Aliasing [Diagram: a new real collection Nov is added and both aliases move forward] localhost:8983/solr/admin/collections?action=CREATEALIAS&name=last3months&collections=sep,oct,nov localhost:8983/solr/admin/collections?action=CREATEALIAS&name=latest&collections=nov
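The monthly roll-forward above can be generated rather than typed by hand. This sketch assumes one real collection per month named with a lowercase three-letter month (as in the slides' `aug,sep,oct`); the helper names are hypothetical. Re-issuing `CREATEALIAS` with an existing alias name repoints that alias.

```python
from urllib.parse import urlencode

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def alias_url(base, name, collections):
    params = {"action": "CREATEALIAS", "name": name,
              "collections": ",".join(collections)}
    return f"{base}/solr/admin/collections?{urlencode(params)}"


def roll_aliases(base, newest_month):
    # last3months covers the newest month and the two before it (wrapping the year).
    i = MONTHS.index(newest_month)
    window = [MONTHS[(i - k) % 12] for k in (2, 1, 0)]
    return [alias_url(base, "last3months", window),
            alias_url(base, "latest", [newest_month])]


for url in roll_aliases("http://localhost:8983", "nov"):
    print(url)
```

`urlencode` percent-encodes the commas as `%2C`, which the server decodes back to `sep,oct,nov`.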
  • 47. Collection Aliasing -Aliases can be: •updated on the fly •queried just like a normal collection •used for indexing as long as it is pointing to a single collection
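The three alias rules on this slide can be modelled with a small registry: re-creating an alias updates it, queries fan out to every collection behind it, and updates are accepted only when it resolves to exactly one collection. `AliasRegistry` is hypothetical, and the single-collection write rule follows this slide; later Solr releases may handle writes to multi-collection aliases differently.

```python
class AliasRegistry:
    """Hypothetical client-side view of the alias map kept in ZooKeeper."""

    def __init__(self):
        self.aliases = {}

    def create_alias(self, name, collections):
        # Re-creating an existing alias simply repoints it (an on-the-fly update).
        self.aliases[name] = list(collections)

    def collections_for_query(self, name):
        # Queries fan out to every collection; a real collection maps to itself.
        return self.aliases.get(name, [name])

    def collection_for_update(self, name):
        targets = self.collections_for_query(name)
        if len(targets) != 1:
            raise ValueError(f"{name!r} points to {len(targets)} collections; cannot index")
        return targets[0]


reg = AliasRegistry()
reg.create_alias("last3months", ["aug", "sep", "oct"])
reg.create_alias("latest", ["oct"])
print(reg.collections_for_query("last3months"))  # ['aug', 'sep', 'oct']
print(reg.collection_for_update("latest"))  # oct
```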
  • 48. Other Features -Near-Real-Time Search -Atomic Updates -Optimistic Locking -HTTPS -Use HDFS for storing indexes -Use MapReduce for building index
  • 49. Thanks -Attributions: •Shalin Mangar’s slides on “SolrCloud: Searching Big Data” •Rafał Kuć’s slides on “Scaling Solr with SolrCloud” -Connect •saumitra.srivastav@glassbeam.com •saumitra.srivastav7@gmail.com •https://www.linkedin.com/in/saumitras •@_saumitra_ -Join: •http://www.meetup.com/Bangalore-Apache-Solr-Lucene-Group/