SlideShare a Scribd company logo
1 of 15
Download to read offline
SolrCloud and Shard Splitting
Shalin Shekhar Mangar
Bangalore Lucene/Solr Meetup
8th
June 2013
Who am I?
●
Apache Lucene/Solr Committer and PMC member
●
Contributor since January 2008
●
Currently: Engineer at LucidWorks
●
Formerly with AOL
●
Email: shalin@apache.org
●
Twitter: shalinmangar
●
Blog: http://shal.in
Bangalore Lucene/Solr Meetup
8th
June 2013
SolrCloud: Overview
●
Distributed searching/indexing
●
No single points of failure
●
Near Real Time Friendly (push replication)
●
Transaction logs for durability and recovery
●
Real-time get
●
Atomic Updates
●
Optimistic Concurrency
●
Request forwarding from any node in cluster
●
A strong contender for your NoSQL needs as well
Bangalore Lucene/Solr Meetup
8th
June 2013
Bangalore Lucene/Solr Meetup
8th
June 2013
Document Routing
80000000-bfffffff
00000000-3fffffff
40000000-7fffffff
c0000000-ffffffff
shard1shard4
shard3 shard2
1f27
3c7
1
(MurmurHash
3)
1f27
000
0
1f27 ffffto
(hash)
shard
1
q=my_query
shard.keys=BigCo!
numShards=4
router=compositeId
id = BigCo!doc5
Bangalore Lucene/Solr Meetup
8th
June 2013
SolrCloud Collections API
●
/admin/collections?action=CREATE&name=mycollection
– &numShards=3
– &replicationFactor=4
– &maxShardsPerNode=2
– &createNodeSet=node1:8080,node2:8080,node3:8080,...
– &collection.configName=myconfigset
●
/admin/collections?action=DELETE&name=mycollection
●
/admin/collections?action=RELOAD&name=mycollection
●
/admin/collections?action=CREATEALIAS&name=south
– &collections=KA,TN,AP,KL,...
●
Coming soon: Shard aliases
Bangalore Lucene/Solr Meetup
8th
June 2013
Shard Splitting: Background
●
Before Solr 4.3, number of shards had to fixed at the time
of collection creation
●
Forced people to start with large number of shards
●
If a shard ran too hot, the only fix was to re-index and
therefore re-balance the collection
●
Each shard is assigned a hash range
●
Each shard also has a state which defaults to 'ACTIVE'
Bangalore Lucene/Solr Meetup
8th
June 2013
Shard Splitting: Features
●
Seamless on-the-fly splitting – no downtime required
●
Retried on failures
●
/admin/collections?
action=SPLITSHARD&collection=mycollection
– &shard=shardId
●
A lower-level CoreAdmin API comes free!
– /admin/cores?action=SPLIT&core=core0&targetCore=core1&targetCore=core2
– /admin/cores?action=SPLIT&core=core0&path=/path/to/index/1&path=/path/to/index/2
Bangalore Lucene/Solr Meetup
8th
June 2013
Shard2_0
Shard1
replic
a
leade
r
Shard2
replic
a
leade
r
Shard3
replic
a
leade
r
Shard2_1
update
Shard Splitting
Bangalore Lucene/Solr Meetup
8th
June 2013
Shard Splitting: Mechanism
●
New sub-shards created in “construction” state
●
Leader starts forwarding applicable updates, which are buffered
by the sub-shards
●
Leader index is split and installed on the sub-shards
●
Sub-shards apply buffered updates
●
Replicas are created for sub-shards and brought up to speed
●
Sub-shard becomes “active” and old shard becomes “inactive”
Bangalore Lucene/Solr Meetup
8th
June 2013
Shard Splitting: Tips and Gotchas
●
Supports collections with a hash based router i.e. “plain”
or “compositeId” routers
●
Operation is executed by the Overseer node, not by the
node you requested
●
HTTP request is synchronous but operation is async. A
read timeout does not mean failure!
●
Operation is retried on failure. Check parent leader's logs
before you re-issue the command or you may end with
more shards than you want
Bangalore Lucene/Solr Meetup
8th
June 2013
Shard Splitting: Tips and gotchas
●
Solr Admin GUI is not aware of shard states yet so the
inactive parent shard is also shown in “green”
●
The CoreAdmin split command can be used against non-
cloud deployments. It will spread docs alternately among
the sub-indexes
●
Inactive shards have to be cleaned up manually. Solr 4.4
will have a delete shard API
●
Shard splitting in 4.3 release is buggy. Wait for 4.3.1
Bangalore Lucene/Solr Meetup
8th
June 2013
Shard Splitting: Looking towards the future
●
GUI integration and better progress reporting/monitoring
●
Better support for custom sharding use-cases
●
More flexibility towards number of sub-shards, hash
ranges, number of replicas etc
●
Store replication factor per shard
●
Suggest splits to admins based on cluster state and load
Confidential and Proprietary
© 2012 LucidWorks14
About LucidWorks
• Intro to LucidWorks (formerly Lucid Imagination)
– Follow: @lucidworks, @lucidimagineer
– Learn: http://www.lucidworks.com
• Check out SearchHub: http://www.searchhub.org
• Solr 4.1 Reference Guide: http://bit.ly/11KSiMN
– Older versions: http://bit.ly/12t1Egq
• Our Products
– LucidWorks Search
– LucidWorks Big Data
• Lucene Revolution
– http://www.lucenerevolution.com
Bangalore Lucene/Solr Meetup
8th
June 2013
Thank you
Shalin Shekhar Mangar
LucidWorks

More Related Content

What's hot

The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Postgres connections at scale
Postgres connections at scalePostgres connections at scale
Postgres connections at scaleMydbops
 
PostgreSQL Deep Internal
PostgreSQL Deep InternalPostgreSQL Deep Internal
PostgreSQL Deep InternalEXEM
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Web Services
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBMydbops
 
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...Databricks
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!Guido Schmutz
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheDremio Corporation
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Introduction To Liquibase
Introduction To Liquibase Introduction To Liquibase
Introduction To Liquibase Knoldus Inc.
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaCloudera, Inc.
 
Write Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfWrite Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfEric Xiao
 
Apache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-PatternApache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-Patternconfluent
 
Maxscale_메뉴얼
Maxscale_메뉴얼Maxscale_메뉴얼
Maxscale_메뉴얼NeoClova
 
Continuous DB Changes Delivery With Liquibase
Continuous DB Changes Delivery With LiquibaseContinuous DB Changes Delivery With Liquibase
Continuous DB Changes Delivery With LiquibaseAidas Dragūnas
 
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...Michael Stack
 
Maxscale 소개 1.1.1
Maxscale 소개 1.1.1Maxscale 소개 1.1.1
Maxscale 소개 1.1.1NeoClova
 

What's hot (20)

The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Postgres connections at scale
Postgres connections at scalePostgres connections at scale
Postgres connections at scale
 
PostgreSQL Deep Internal
PostgreSQL Deep InternalPostgreSQL Deep Internal
PostgreSQL Deep Internal
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and Optimization
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDB
 
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
REST API Design
REST API DesignREST API Design
REST API Design
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Introduction To Liquibase
Introduction To Liquibase Introduction To Liquibase
Introduction To Liquibase
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Write Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfWrite Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdf
 
Apache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-PatternApache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-Pattern
 
Maxscale_메뉴얼
Maxscale_메뉴얼Maxscale_메뉴얼
Maxscale_메뉴얼
 
Continuous DB Changes Delivery With Liquibase
Continuous DB Changes Delivery With LiquibaseContinuous DB Changes Delivery With Liquibase
Continuous DB Changes Delivery With Liquibase
 
Parquet overview
Parquet overviewParquet overview
Parquet overview
 
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
 
Maxscale 소개 1.1.1
Maxscale 소개 1.1.1Maxscale 소개 1.1.1
Maxscale 소개 1.1.1
 

Viewers also liked

Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Electionravikgiitk
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudthelabdude
 
SolrCloud Failover and Testing
SolrCloud Failover and TestingSolrCloud Failover and Testing
SolrCloud Failover and TestingMark Miller
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloudVarun Thacker
 
Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Shalin Shekhar Mangar
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Shalin Shekhar Mangar
 
Inside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene MeetupInside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene MeetupShalin Shekhar Mangar
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksShalin Shekhar Mangar
 
GIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataGIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataShalin Shekhar Mangar
 
Introduction to apache zoo keeper
Introduction to apache zoo keeper Introduction to apache zoo keeper
Introduction to apache zoo keeper Omid Vahdaty
 
Apache zookeeper seminar_trinh_viet_dung_03_2016
Apache zookeeper seminar_trinh_viet_dung_03_2016Apache zookeeper seminar_trinh_viet_dung_03_2016
Apache zookeeper seminar_trinh_viet_dung_03_2016Viet-Dung TRINH
 

Viewers also liked (20)

Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Election
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloud
 
Scaling Solr with Solr Cloud
Scaling Solr with Solr CloudScaling Solr with Solr Cloud
Scaling Solr with Solr Cloud
 
SolrCloud Failover and Testing
SolrCloud Failover and TestingSolrCloud Failover and Testing
SolrCloud Failover and Testing
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloud
 
Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6
 
High Performance Solr
High Performance SolrHigh Performance Solr
High Performance Solr
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
 
Inside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene MeetupInside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene Meetup
 
Intro to Apache Solr
Intro to Apache SolrIntro to Apache Solr
Intro to Apache Solr
 
Scaling search with SolrCloud
Scaling search with SolrCloudScaling search with SolrCloud
Scaling search with SolrCloud
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networks
 
GIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataGIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big Data
 
Introduction to apache zoo keeper
Introduction to apache zoo keeper Introduction to apache zoo keeper
Introduction to apache zoo keeper
 
Search engine ppt
Search engine pptSearch engine ppt
Search engine ppt
 
Apache zookeeper seminar_trinh_viet_dung_03_2016
Apache zookeeper seminar_trinh_viet_dung_03_2016Apache zookeeper seminar_trinh_viet_dung_03_2016
Apache zookeeper seminar_trinh_viet_dung_03_2016
 

Similar to SolrCloud and Shard Splitting

Sequential Concurrency ... WHAT ???
Sequential Concurrency ... WHAT ???Sequential Concurrency ... WHAT ???
Sequential Concurrency ... WHAT ???Jitendra Chittoda
 
Apache Solr for TYPO3 CMS 101
Apache Solr for TYPO3 CMS 101Apache Solr for TYPO3 CMS 101
Apache Solr for TYPO3 CMS 101Olivier Dobberkau
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101Itiel Shwartz
 
apachecamelk-april2019-190409093034.pdf
apachecamelk-april2019-190409093034.pdfapachecamelk-april2019-190409093034.pdf
apachecamelk-april2019-190409093034.pdfssuserbb9f511
 
PostgreSQL Finland October meetup - PostgreSQL monitoring in Zalando
PostgreSQL Finland October meetup - PostgreSQL monitoring in ZalandoPostgreSQL Finland October meetup - PostgreSQL monitoring in Zalando
PostgreSQL Finland October meetup - PostgreSQL monitoring in ZalandoUri Savelchev
 
Oracle ADF Architecture TV - Design - Task Flow Navigation Options
Oracle ADF Architecture TV - Design - Task Flow Navigation OptionsOracle ADF Architecture TV - Design - Task Flow Navigation Options
Oracle ADF Architecture TV - Design - Task Flow Navigation OptionsChris Muir
 
BlackRay FOSS Asia 2010
BlackRay FOSS Asia 2010BlackRay FOSS Asia 2010
BlackRay FOSS Asia 2010fschupp
 
NoSQL & SQL - Best of both worlds - BarCamp Berkshire 2013
NoSQL & SQL - Best of both worlds - BarCamp Berkshire 2013NoSQL & SQL - Best of both worlds - BarCamp Berkshire 2013
NoSQL & SQL - Best of both worlds - BarCamp Berkshire 2013Andrew Morgan
 
Apache Camel K - Copenhagen
Apache Camel K - CopenhagenApache Camel K - Copenhagen
Apache Camel K - CopenhagenClaus Ibsen
 
Apache Camel K - Copenhagen v2
Apache Camel K - Copenhagen v2Apache Camel K - Copenhagen v2
Apache Camel K - Copenhagen v2Claus Ibsen
 
Akka Clustering And Sharding
Akka Clustering And ShardingAkka Clustering And Sharding
Akka Clustering And ShardingKnoldus Inc.
 
What's new in Solr 5.0
What's new in Solr 5.0What's new in Solr 5.0
What's new in Solr 5.0Anshum Gupta
 
Rails 3 : Cool New Things
Rails 3 : Cool New ThingsRails 3 : Cool New Things
Rails 3 : Cool New ThingsY. Thong Kuah
 
Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!PGConf APAC
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5israelekpo
 
Implementing GraphQL API in Elixir – Victor Deryagin
Implementing GraphQL API in Elixir – Victor DeryaginImplementing GraphQL API in Elixir – Victor Deryagin
Implementing GraphQL API in Elixir – Victor DeryaginElixir Club
 

Similar to SolrCloud and Shard Splitting (20)

Sequential Concurrency ... WHAT ???
Sequential Concurrency ... WHAT ???Sequential Concurrency ... WHAT ???
Sequential Concurrency ... WHAT ???
 
ForkJoinPools and parallel streams
ForkJoinPools and parallel streamsForkJoinPools and parallel streams
ForkJoinPools and parallel streams
 
Apache Solr for TYPO3 CMS 101
Apache Solr for TYPO3 CMS 101Apache Solr for TYPO3 CMS 101
Apache Solr for TYPO3 CMS 101
 
Intro to openfaas
Intro to openfaasIntro to openfaas
Intro to openfaas
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101
 
Distributed Tracing
Distributed TracingDistributed Tracing
Distributed Tracing
 
apachecamelk-april2019-190409093034.pdf
apachecamelk-april2019-190409093034.pdfapachecamelk-april2019-190409093034.pdf
apachecamelk-april2019-190409093034.pdf
 
PostgreSQL Finland October meetup - PostgreSQL monitoring in Zalando
PostgreSQL Finland October meetup - PostgreSQL monitoring in ZalandoPostgreSQL Finland October meetup - PostgreSQL monitoring in Zalando
PostgreSQL Finland October meetup - PostgreSQL monitoring in Zalando
 
Oracle ADF Architecture TV - Design - Task Flow Navigation Options
Oracle ADF Architecture TV - Design - Task Flow Navigation OptionsOracle ADF Architecture TV - Design - Task Flow Navigation Options
Oracle ADF Architecture TV - Design - Task Flow Navigation Options
 
BlackRay FOSS Asia 2010
BlackRay FOSS Asia 2010BlackRay FOSS Asia 2010
BlackRay FOSS Asia 2010
 
NoSQL & SQL - Best of both worlds - BarCamp Berkshire 2013
NoSQL & SQL - Best of both worlds - BarCamp Berkshire 2013NoSQL & SQL - Best of both worlds - BarCamp Berkshire 2013
NoSQL & SQL - Best of both worlds - BarCamp Berkshire 2013
 
Advance Features of Hibernate
Advance Features of HibernateAdvance Features of Hibernate
Advance Features of Hibernate
 
Apache Camel K - Copenhagen
Apache Camel K - CopenhagenApache Camel K - Copenhagen
Apache Camel K - Copenhagen
 
Apache Camel K - Copenhagen v2
Apache Camel K - Copenhagen v2Apache Camel K - Copenhagen v2
Apache Camel K - Copenhagen v2
 
Akka Clustering And Sharding
Akka Clustering And ShardingAkka Clustering And Sharding
Akka Clustering And Sharding
 
What's new in Solr 5.0
What's new in Solr 5.0What's new in Solr 5.0
What's new in Solr 5.0
 
Rails 3 : Cool New Things
Rails 3 : Cool New ThingsRails 3 : Cool New Things
Rails 3 : Cool New Things
 
Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5
 
Implementing GraphQL API in Elixir – Victor Deryagin
Implementing GraphQL API in Elixir – Victor DeryaginImplementing GraphQL API in Elixir – Victor Deryagin
Implementing GraphQL API in Elixir – Victor Deryagin
 

Recently uploaded

Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxYounusS2
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServiceRenan Moreira de Oliveira
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum ComputingGDSC PJATK
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfAnna Loughnan Colquhoun
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncObject Automation
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.francesco barbera
 

Recently uploaded (20)

Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptx
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum Computing
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdf
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation Inc
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.
 

SolrCloud and Shard Splitting

  • 1. SolrCloud and Shard Splitting Shalin Shekhar Mangar
  • 2. Bangalore Lucene/Solr Meetup 8th June 2013 Who am I? ● Apache Lucene/Solr Committer and PMC member ● Contributor since January 2008 ● Currently: Engineer at LucidWorks ● Formerly with AOL ● Email: shalin@apache.org ● Twitter: shalinmangar ● Blog: http://shal.in
  • 3. Bangalore Lucene/Solr Meetup 8th June 2013 SolrCloud: Overview ● Distributed searching/indexing ● No single points of failure ● Near Real Time Friendly (push replication) ● Transaction logs for durability and recovery ● Real-time get ● Atomic Updates ● Optimistic Concurrency ● Request forwarding from any node in cluster ● A strong contender for your NoSQL needs as well
  • 5. Bangalore Lucene/Solr Meetup 8th June 2013 Document Routing 80000000-bfffffff 00000000-3fffffff 40000000-7fffffff c0000000-ffffffff shard1shard4 shard3 shard2 1f27 3c7 1 (MurmurHash 3) 1f27 000 0 1f27 ffffto (hash) shard 1 q=my_query shard.keys=BigCo! numShards=4 router=compositeId id = BigCo!doc5
  • 6. Bangalore Lucene/Solr Meetup 8th June 2013 SolrCloud Collections API ● /admin/collections?action=CREATE&name=mycollection – &numShards=3 – &replicationFactor=4 – &maxShardsPerNode=2 – &createNodeSet=node1:8080,node2:8080,node3:8080,... – &collection.configName=myconfigset ● /admin/collections?action=DELETE&name=mycollection ● /admin/collections?action=RELOAD&name=mycollection ● /admin/collections?action=CREATEALIAS&name=south – &collections=KA,TN,AP,KL,... ● Coming soon: Shard aliases
  • 7. Bangalore Lucene/Solr Meetup 8th June 2013 Shard Splitting: Background ● Before Solr 4.3, number of shards had to fixed at the time of collection creation ● Forced people to start with large number of shards ● If a shard ran too hot, the only fix was to re-index and therefore re-balance the collection ● Each shard is assigned a hash range ● Each shard also has a state which defaults to 'ACTIVE'
  • 8. Bangalore Lucene/Solr Meetup 8th June 2013 Shard Splitting: Features ● Seamless on-the-fly splitting – no downtime required ● Retried on failures ● /admin/collections? action=SPLITSHARD&collection=mycollection – &shard=shardId ● A lower-level CoreAdmin API comes free! – /admin/cores?action=SPLIT&core=core0&targetCore=core1&targetCore=core2 – /admin/cores?action=SPLIT&core=core0&path=/path/to/index/1&path=/path/to/index/2
  • 9. Bangalore Lucene/Solr Meetup 8th June 2013 Shard2_0 Shard1 replic a leade r Shard2 replic a leade r Shard3 replic a leade r Shard2_1 update Shard Splitting
  • 10. Bangalore Lucene/Solr Meetup 8th June 2013 Shard Splitting: Mechanism ● New sub-shards created in “construction” state ● Leader starts forwarding applicable updates, which are buffered by the sub-shards ● Leader index is split and installed on the sub-shards ● Sub-shards apply buffered updates ● Replicas are created for sub-shards and brought up to speed ● Sub-shard becomes “active” and old shard becomes “inactive”
  • 11. Bangalore Lucene/Solr Meetup 8th June 2013 Shard Splitting: Tips and Gotchas ● Supports collections with a hash based router i.e. “plain” or “compositeId” routers ● Operation is executed by the Overseer node, not by the node you requested ● HTTP request is synchronous but operation is async. A read timeout does not mean failure! ● Operation is retried on failure. Check parent leader's logs before you re-issue the command or you may end with more shards than you want
  • 12. Bangalore Lucene/Solr Meetup 8th June 2013 Shard Splitting: Tips and gotchas ● Solr Admin GUI is not aware of shard states yet so the inactive parent shard is also shown in “green” ● The CoreAdmin split command can be used against non- cloud deployments. It will spread docs alternately among the sub-indexes ● Inactive shards have to be cleaned up manually. Solr 4.4 will have a delete shard API ● Shard splitting in 4.3 release is buggy. Wait for 4.3.1
  • 13. Bangalore Lucene/Solr Meetup 8th June 2013 Shard Splitting: Looking towards the future ● GUI integration and better progress reporting/monitoring ● Better support for custom sharding use-cases ● More flexibility towards number of sub-shards, hash ranges, number of replicas etc ● Store replication factor per shard ● Suggest splits to admins based on cluster state and load
  • 14. Confidential and Proprietary © 2012 LucidWorks14 About LucidWorks • Intro to LucidWorks (formerly Lucid Imagination) – Follow: @lucidworks, @lucidimagineer – Learn: http://www.lucidworks.com • Check out SearchHub: http://www.searchhub.org • Solr 4.1 Reference Guide: http://bit.ly/11KSiMN – Older versions: http://bit.ly/12t1Egq • Our Products – LucidWorks Search – LucidWorks Big Data • Lucene Revolution – http://www.lucenerevolution.com
  • 15. Bangalore Lucene/Solr Meetup 8th June 2013 Thank you Shalin Shekhar Mangar LucidWorks