SlideShare a Scribd company logo
O C T O B E R 1 3 - 1 6 , 2 0 1 6 • A U S T I N , T X
Lessons from Sharding Solr at Etsy
Gregg Donovan
@greggdonovan
Senior Software Engineer, etsy.com
• 5.5 Years Solr & Lucene at Etsy.com
• 3 Years Solr & Lucene at TheLadders.com
• Speaker at LuceneRevolution 2011 & 2013
Jeff Dean, Challenges in Building Large-Scale Information Retrieval Systems
Jeff Dean, Challenges in Building Large-Scale Information Retrieval Systems
1.5Million Active Shops
32Million Items Listed
21.7Million Active Buyers
Agenda
• Sharding Solr at Etsy V0 — No sharding
• Sharding Solr at Etsy V1 — Local sharding
• Sharding Solr at Etsy V2 (*) — Distributed sharding
• Questions
* —What we’re about to launch.
Sharding V0 — Not Sharding
• Why do we shard?
• Data size grows beyond RAM on a single box
• Lucene can handle this, but there’s a performance cost
• Data size grows beyond local disk
• Latency requirements
• Not sharding allowed us to avoid many problems we’ll discuss later.
Sharding V0 — Not Sharding
• How to keep data size small enough for one host?
• Don’t store anything other than IDs
• fl=pk_id,fk_id,score
• Keep materialized objects in memcached
• Only index fields needed
• Prune index after experiments add fields
• Get more RAM
Sharding V0 — Not Sharding
• How does it fail?
• GC
• Solution
• “Banner” protocol
• Client-side load balancer
• Client connects, waits for 4-bytes — OxCODEA5CF— from the server within 1-10ms before
sending query. Otherwise, try another server.
Sharding V1 — Local Sharding
• Motivations
• Better latency
• Smaller JVMs
• Tough to open a 31gb heap dump on your laptop
• Working set still fit in RAM on one box.
• What’s the simplest system we can built?
Sharding V1 — Local Sharding
• Lucene parallelism
• Shikhar Bhushan at Etsy experimented with segment level parallelism
• See Search-time Parallelism at Lucene Revolution 2014
• Made its way into LUCENE-6294 (Generalize how IndexSearcher parallelizes collection
execution). Committed in Lucene 5.1.
• Ended up with eight Solr shards per host, each in its own small JVM
• Moved query generation and re-ranking to separate process: the “mixer”
Sharding V1 — Local Sharding
• Based on Solr distributed search
• By default, Solr does two-pass distributed search
• First pass gets top IDs
• Second pass fetches stored fields for each top document
• Implemented distrib.singlePass mode (SOLR-5768)
• Does not make sense if individual documents are expensive to fetch
• Basic request tracing via HTTP headers (SOLR-5969)
Sharding V1 — Local Sharding
• Required us to fetch 1000+ results from each shard for reranking layer
• How to efficiently fetch 1000 documents per shard?
• Use Solr’s field syntax to fetch data from FieldCache
• e.g. fl=pk_id:field(pk_id),fk_id:field(fk_id),score
• When all fields are “pseudo” fields, no need to fetch stored fields per document.
Sharding V1 — Local Sharding
• Result
• Very large latency win
• Easy system to manage
• Well understood failure and recovery
• Avoided solving many distributed systems issues
Sharding V2 — Distributed Sharding
• Motivation
• Further latency improvements
• Prepare for data to exceed a single node’s capacity
• Significant latency improvements require finer sharding, more CPUs per request
• Requires a real distributed system and sophisticated RPC
• Before proceeding, stop what you’re doing and read everything by Google’s Jeff Dean and
Twitter’s Marius Eriksen
Sharding V2 — Distributed Sharding
• New problems
• Partial failures
• Lagging shards
• Synchronizing cluster state and configuration
• Network partitions
• Jespen
• Distributed IDF issues exacerbated
Solving Distributed IDF
• Inverse Document Frequency (IDF) now varies across shards, biasing ranking
• Calculate IDF offline in Hadoop
• IDFReplacedSimilarityFactory
• Offline data populates cache of Map<BytesRef,Float> (term —> score)
• Override SimilarityFactory#idfExplain
• Cache misses given rare document constant
• Can be extended to solve i18n IDF issues
Sharding V2 — Distributed Sharding
• ShardHandler
• Solr’s abstraction for fanning out queries to shards
• Ships with default implementation (HttpShardHandler) based on HTTP 1.1
• Does fanout (distrib=true) and processes requests coming from other Solr nodes
(distrib=false).
• Reads shards.rows and shards.start parameters
ShardHandler API
Solr’s SearchHandler calls submit for each shard and then either takeCompletedIncludingErrors
or takeCompletedOrError depending on partial results tolerance.
public abstract class ShardHandler {

public abstract void checkDistributed(ResponseBuilder rb);


public abstract void submit(ShardRequest sreq, String shard, ModifiableSolrParams params);

public abstract ShardResponse takeCompletedIncludingErrors();

public abstract ShardResponse takeCompletedOrError();

public abstract void cancelAll();

public abstract ShardHandlerFactory getShardHandlerFactory();

}
Sharding V2 — Distributed Sharding
Distributed query requirements
• Distributed tracing
• E.g.: Google’s Dapper, Twitter’s Zipkin, Etsy’s CrossStich
• Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
• Handle node failures, slowness
Better Know Your Switches
Have a clear understanding of your networking requirements and whether your hardware meets
them.
• Prefer line-rate switches
• Prefer cut-through to store-and-forward
• No buffering, just read the IP packet header and move packet to the destination
• Track and graph switch statistics in the same dashboard you display your search latency stats
• errors, retransmits, etc.
Sharding V2 — Distributed Sharding
First experiment, Twitter’s Finagle
• Built on Netty
• Mux RPC multiplexing protocol
• SeeYour Server as a Function by Marius Eriksen
• Built-in support for Zipkin distributed tracing
• Served as inspiration for Facebook’s futures-based RPC Wangle
• Implemented a FinagleShardHandler
Sharding V2 — Distributed Sharding
Second experiment, custom Thrift-based protocol
• Blocking I/O easier to integrate with SolrJ API
• Able to integrate our own distributed tracing
• LZ4 compression via a custom Thrift TTransport
Sharding V2 — Distributed Sharding
Future experiment: HTTP/2
• One TCP connection for all requests between two servers
• Libraries
• Square’s OkHttp
• Google’s gRpc
• Jetty client in 9.3+ — appears to be Solr’s choice
Sharding V2 — Distributed Sharding
Implementation note
• Separated fanout from individual request processing
• SolrJ client via an EmbeddedSolrServer containing empty RAM directory.
• Saves a network hop
• Makes shards easier to profile, tune
• Can return result to SolrJ without sending merged results over the network
Sharding V2 — Distributed Sharding
• Good
• Individual shard times demonstrate very low average latency
• Bad
• Overall p95, p99 nowhere near averages
• Why? Lagging shards due to GC, filterCache misses, etc.
• More shards means more chances to hit outliers
Sharding V2 — Distributed Sharding
• Solutions
• See The Tail at Scale by Jeff Dean, CACM 2013.
• Eliminate all sources of inter-host variability
• No filter or other cache misses
• No GC
• Eliminate OS pauses, networking hiccups, deploys, restarts, etc.
• Not realistic
Sharding V2 — Distributed Sharding
• Backup Requests
• Methods
• Brute force — send two copies of every request to different hosts, take the fastest
response
• Less crude — wait X milliseconds for the first server to respond, then send a backup
request.
• Adaptive — choose X based on the first Y% of responses to return.
• Cancellation — Cancel the slow request to save CPU once you’re sure you don’t need it.
Sharding V2 — Distributed Sharding
• “Good enough”
• Return results to user after X% of results return if there are enough results. Don’t issue
backup requests, just cancel laggards.
• Only applicable in certain domains.
• Poses questions:
• Should you cache partial results?
• How is paging effected?
Resilience Testing
Now you own a distributed system. How do you know it works?
• “The Troublemaker”
• Inspired by Netflix’s Chaos Monkey
• Authored by Etsy’s Toria Gibbs
• Make sure humans can operate it
• Failure simulation — don’t wait until 3am
• Gameday exercises and Runbooks
Bonus material!
Better Know Your Kernel
A lesson not about sharding learned while sharding…
• Linux’s futex_wait() was broken in CentOS 6.6
• Backported patches needed from Linux 3.18
• Future direction: make kernel updates independent from distribution updates
• E.g. Plenty of good stuff (e.g. networking improvements, kernel introspection [see
@brendangregg]) between 3.10 and 4.2+, but it won’t come to CentOS for years
• Updating kernel alone easier to roll out
What else are we working on?
• Mesos for cluster orchestration
• GPUs for massive increases in per query computational capacity
Thanks for coming.
gregg@etsy.com
@greggdonovan
Questions?
@greggdonovan
gregg@etsy.com

More Related Content

What's hot

Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
Lucidworks
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache Solr
Rahul Jain
 
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, LucidworksSearching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
Lucidworks
 
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Lucidworks
 
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
Lucidworks
 
Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )
Rahul Jain
 
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBMBuilding and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Lucidworks
 
Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Election
ravikgiitk
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloud
Varun Thacker
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, Twitter
Lucidworks
 
Solr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchSolr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchMark Miller
 
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Databricks
 
Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and Spark
Lucidworks
 
Mail Search As A Sercive: Presented by Rishi Easwaran, Aol
Mail Search As A Sercive: Presented by Rishi Easwaran, AolMail Search As A Sercive: Presented by Rishi Easwaran, Aol
Mail Search As A Sercive: Presented by Rishi Easwaran, Aol
Lucidworks
 
GIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataGIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big Data
Shalin Shekhar Mangar
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
Rahul Jain
 
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis TechnologySimple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Lucidworks
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloud
thelabdude
 
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and HadoopEventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop
Ayon Sinha
 
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Lucidworks
 

What's hot (20)

Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache Solr
 
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, LucidworksSearching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
 
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
 
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
 
Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )
 
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBMBuilding and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBM
 
Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Election
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloud
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, Twitter
 
Solr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchSolr + Hadoop = Big Data Search
Solr + Hadoop = Big Data Search
 
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
 
Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and Spark
 
Mail Search As A Sercive: Presented by Rishi Easwaran, Aol
Mail Search As A Sercive: Presented by Rishi Easwaran, AolMail Search As A Sercive: Presented by Rishi Easwaran, Aol
Mail Search As A Sercive: Presented by Rishi Easwaran, Aol
 
GIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataGIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big Data
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis TechnologySimple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloud
 
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and HadoopEventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop
 
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
 

Similar to Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy

Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloud
Anshum Gupta
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Lucidworks
 
AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...
AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...
AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...
Amazon Web Services
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Newlink
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Newlink
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Newlink
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640LLC NewLink
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Newlink
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling Twitter
John Adams
 
What's new in Solr 5.0
What's new in Solr 5.0What's new in Solr 5.0
What's new in Solr 5.0
Anshum Gupta
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
LinkedIn
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
Roger Xia
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
John Adams
 
VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012
Eonblast
 
Spil Storage Platform (Erlang) @ EUG-NL
Spil Storage Platform (Erlang) @ EUG-NLSpil Storage Platform (Erlang) @ EUG-NL
Spil Storage Platform (Erlang) @ EUG-NL
Thijs Terlouw
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?DATAVERSITY
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - Cassandra
Jon Haddad
 

Similar to Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy (20)

Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloud
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
 
AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...
AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...
AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling Twitter
 
What's new in Solr 5.0
What's new in Solr 5.0What's new in Solr 5.0
What's new in Solr 5.0
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012
 
Spil Storage Platform (Erlang) @ EUG-NL
Spil Storage Platform (Erlang) @ EUG-NLSpil Storage Platform (Erlang) @ EUG-NL
Spil Storage Platform (Erlang) @ EUG-NL
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - Cassandra
 

More from Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
Lucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
Lucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
Lucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
Lucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
Lucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
Lucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Lucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
Lucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Lucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Lucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
Lucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
Lucidworks
 

More from Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Recently uploaded

Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 

Recently uploaded (20)

Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 

Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy

  • 1. O C T O B E R 1 3 - 1 6 , 2 0 1 6 • A U S T I N , T X
  • 2. Lessons from Sharding Solr at Etsy Gregg Donovan @greggdonovan Senior Software Engineer, etsy.com
  • 3. • 5.5 Years Solr & Lucene at Etsy.com • 3 Years Solr & Lucene at TheLadders.com • Speaker at LuceneRevolution 2011 & 2013
  • 4. Jeff Dean, Challenges in Building Large-Scale Information Retrieval Systems
  • 5. Jeff Dean, Challenges in Building Large-Scale Information Retrieval Systems
  • 6.
  • 10. Agenda • Sharding Solr at Etsy V0 — No sharding • Sharding Solr at Etsy V1 — Local sharding • Sharding Solr at Etsy V2 (*) — Distributed sharding • Questions * —What we’re about to launch.
  • 11. Sharding V0 — Not Sharding • Why do we shard? • Data size grows beyond RAM on a single box • Lucene can handle this, but there’s a performance cost • Data size grows beyond local disk • Latency requirements • Not sharding allowed us to avoid many problems we’ll discuss later.
  • 12. Sharding V0 — Not Sharding • How to keep data size small enough for one host? • Don’t store anything other than IDs • fl=pk_id,fk_id,score • Keep materialized objects in memcached • Only index fields needed • Prune index after experiments add fields • Get more RAM
  • 13.
  • 14. Sharding V0 — Not Sharding • How does it fail? • GC • Solution • “Banner” protocol • Client-side load balancer • Client connects, waits for 4-bytes — OxCODEA5CF— from the server within 1-10ms before sending query. Otherwise, try another server.
  • 15. Sharding V1 — Local Sharding • Motivations • Better latency • Smaller JVMs • Tough to open a 31gb heap dump on your laptop • Working set still fit in RAM on one box. • What’s the simplest system we can built?
  • 16. Sharding V1 — Local Sharding • Lucene parallelism • Shikhar Bhushan at Etsy experimented with segment level parallelism • See Search-time Parallelism at Lucene Revolution 2014 • Made its way into LUCENE-6294 (Generalize how IndexSearcher parallelizes collection execution). Committed in Lucene 5.1. • Ended up with eight Solr shards per host, each in its own small JVM • Moved query generation and re-ranking to separate process: the “mixer”
  • 17. Sharding V1 — Local Sharding • Based on Solr distributed search • By default, Solr does two-pass distributed search • First pass gets top IDs • Second pass fetches stored fields for each top document • Implemented distrib.singlePass mode (SOLR-5768) • Does not make sense if individual documents are expensive to fetch • Basic request tracing via HTTP headers (SOLR-5969)
  • 18. Sharding V1 — Local Sharding • Required us to fetch 1000+ results from each shard for reranking layer • How to efficiently fetch 1000 documents per shard? • Use Solr’s field syntax to fetch data from FieldCache • e.g. fl=pk_id:field(pk_id),fk_id:field(fk_id),score • When all fields are “pseudo” fields, no need to fetch stored fields per document.
  • 19. Sharding V1 — Local Sharding • Result • Very large latency win • Easy system to manage • Well understood failure and recovery • Avoided solving many distributed systems issues
  • 20. Sharding V2 — Distributed Sharding • Motivation • Further latency improvements • Prepare for data to exceed a single node’s capacity • Significant latency improvements require finer sharding, more CPUs per request • Requires a real distributed system and sophisticated RPC • Before proceeding, stop what you’re doing and read everything by Google’s Jeff Dean and Twitter’s Marius Eriksen
  • 21. Sharding V2 — Distributed Sharding • New problems • Partial failures • Lagging shards • Synchronizing cluster state and configuration • Network partitions • Jespen • Distributed IDF issues exacerbated
  • 22. Solving Distributed IDF • Inverse Document Frequency (IDF) now varies across shards, biasing ranking • Calculate IDF offline in Hadoop • IDFReplacedSimilarityFactory • Offline data populates cache of Map<BytesRef,Float> (term —> score) • Override SimilarityFactory#idfExplain • Cache misses given rare document constant • Can be extended to solve i18n IDF issues
  • 23. Sharding V2 — Distributed Sharding • ShardHandler • Solr’s abstraction for fanning out queries to shards • Ships with default implementation (HttpShardHandler) based on HTTP 1.1 • Does fanout (distrib=true) and processes requests coming from other Solr nodes (distrib=false). • Reads shards.rows and shards.start parameters
  • 24. ShardHandler API Solr’s SearchHandler calls submit for each shard and then either takeCompletedIncludingErrors or takeCompletedOrError depending on partial results tolerance. public abstract class ShardHandler {
 public abstract void checkDistributed(ResponseBuilder rb); 
 public abstract void submit(ShardRequest sreq, String shard, ModifiableSolrParams params);
 public abstract ShardResponse takeCompletedIncludingErrors();
 public abstract ShardResponse takeCompletedOrError();
 public abstract void cancelAll();
 public abstract ShardHandlerFactory getShardHandlerFactory();
 }
  • 25. Sharding V2 — Distributed Sharding Distributed query requirements • Distributed tracing • E.g.: Google’s Dapper, Twitter’s Zipkin, Etsy’s CrossStich • Dapper, a Large-Scale Distributed Systems Tracing Infrastructure • Handle node failures, slowness
  • 26.
  • 27. Better Know Your Switches Have a clear understanding of your networking requirements and whether your hardware meets them. • Prefer line-rate switches • Prefer cut-through to store-and-forward • No buffering, just read the IP packet header and move packet to the destination • Track and graph switch statistics in the same dashboard you display your search latency stats • errors, retransmits, etc.
  • 28. Sharding V2 — Distributed Sharding First experiment, Twitter’s Finagle • Built on Netty • Mux RPC multiplexing protocol • SeeYour Server as a Function by Marius Eriksen • Built-in support for Zipkin distributed tracing • Served as inspiration for Facebook’s futures-based RPC Wangle • Implemented a FinagleShardHandler
  • 29. Sharding V2 — Distributed Sharding Second experiment, custom Thrift-based protocol • Blocking I/O easier to integrate with SolrJ API • Able to integrate our own distributed tracing • LZ4 compression via a custom Thrift TTransport
  • 30. Sharding V2 — Distributed Sharding Future experiment: HTTP/2 • One TCP connection for all requests between two servers • Libraries • Square’s OkHttp • Google’s gRpc • Jetty client in 9.3+ — appears to be Solr’s choice
  • 31. Sharding V2 — Distributed Sharding Implementation note • Separated fanout from individual request processing • SolrJ client via an EmbeddedSolrServer containing empty RAM directory. • Saves a network hop • Makes shards easier to profile, tune • Can return result to SolrJ without sending merged results over the network
  • 32. Sharding V2 — Distributed Sharding • Good • Individual shard times demonstrate very low average latency • Bad • Overall p95, p99 nowhere near averages • Why? Lagging shards due to GC, filterCache misses, etc. • More shards means more chances to hit outliers
  • 33. Sharding V2 — Distributed Sharding • Solutions • See The Tail at Scale by Jeff Dean, CACM 2013. • Eliminate all sources of inter-host variability • No filter or other cache misses • No GC • Eliminate OS pauses, networking hiccups, deploys, restarts, etc. • Not realistic
  • 34. Sharding V2 — Distributed Sharding • Backup Requests • Methods • Brute force — send two copies of every request to different hosts, take the fastest response • Less crude — wait X milliseconds for the first server to respond, then send a backup request. • Adaptive — choose X based on the first Y% of responses to return. • Cancellation — Cancel the slow request to save CPU once you’re sure you don’t need it.
  • 35. Sharding V2 — Distributed Sharding • “Good enough” • Return results to user after X% of results return if there are enough results. Don’t issue backup requests, just cancel laggards. • Only applicable in certain domains. • Poses questions: • Should you cache partial results? • How is paging effected?
  • 36. Resilience Testing Now you own a distributed system. How do you know it works? • “The Troublemaker” • Inspired by Netflix’s Chaos Monkey • Authored by Etsy’s Toria Gibbs • Make sure humans can operate it • Failure simulation — don’t wait until 3am • Gameday exercises and Runbooks
  • 38. Better Know Your Kernel A lesson not about sharding learned while sharding… • Linux’s futex_wait() was broken in CentOS 6.6 • Backported patches needed from Linux 3.18 • Future direction: make kernel updates independent from distribution updates • E.g. Plenty of good stuff (e.g. networking improvements, kernel introspection [see @brendangregg]) between 3.10 and 4.2+, but it won’t come to CentOS for years • Updating kernel alone easier to roll out
  • 39. What else are we working on? • Mesos for cluster orchestration • GPUs for massive increases in per query computational capacity