Solr Compute Cloud – An Elastic Solr Infrastructure
Nitin Sharma 
- Member of technical staff, BloomReach 
- nitin.sharma@bloomreach.com
Abstract 
Scaling search platforms is an extremely hard problem:
• Serving hundreds of millions of documents
• Low latency
• High-throughput workloads
• Optimized cost
At BloomReach, we have implemented SC2, an elastic Solr infrastructure for big data applications that:
• Supports heterogeneous workloads while hosted in the cloud.
• Dynamically grows/shrinks search servers.
• Provides application- and pipeline-level isolation, NRT search and indexing.
• Offers latency guarantees and application-specific performance tuning.
• Provides high-availability features such as cluster replacement, cross-data center support and disaster recovery.
About Us 
BloomReach 
BloomReach has developed a personalized discovery platform whose applications analyze big data to make our customers’ digital content more discoverable, relevant and profitable.
Myself 
I work on search platform scaling for BloomReach’s big data. My relevant experience and background include scaling real-time services for latency-sensitive applications and building performance and search-quality metrics infrastructure for personalization platforms.
The BloomReach Personalized Discovery Platform
BloomReach’s Applications 
Content understanding
Organic Search
• What it does: Content optimization, management and measurement
• Benefit: Enhanced discoverability and customer acquisition in organic search
SNAP
• What it does: Personalized onsite search and navigation across devices
• Benefit: Relevant and consistent onsite experiences for new and known users
Compass
• What it does: Merchandising tool that understands products and identifies opportunities
• Benefit: Prioritize and optimize online merchandising
Agenda 
• BloomReach search use cases and architecture 
• Old architecture and issues 
• Scaling challenges 
• Elastic SolrCloud architecture and benefits 
• Lessons learned
BloomReach Search Use Cases 
1. Front-end (serving) queries – Uptime and Latency sensitive 
2. Batch search pipelines – Throughput sensitive 
3. Time bound indexing requirements – Customer Specific 
4. Time bound Solr config updates
BloomReach Search Architecture 
[Diagram: a single Solr cluster, coordinated by a Zookeeper ensemble, serves read pipelines (Pipeline 1 to n), indexing pipelines (Indexing 1 to n) and public API search traffic. Other labels: Map Reduce. Legend: heavy, moderate and light load.]
Throughput Issues… 
[Diagram: Pipeline 1 to n, Indexing 1 to n and public API search traffic all hit one shared Solr cluster and Zookeeper ensemble.]
● Heterogeneous read workload
● Same collection - different pipelines, different query patterns, different schedule
● Cache tuning is virtually impossible
● Larger pipeline starving the small ones
● Machine utilization determines throughput and stability of a pipeline at any point
● No isolation among jobs
Stability and Uptime Issues… 
[Diagram: Pipeline 1 to n, Indexing 1 to n and public API search traffic all hit one shared Solr cluster and Zookeeper ensemble.]
● Bad clients – bring down the cluster/degrade performance
● Bad queries (with heavy load) – render nodes unresponsive
● Garbage collection issues
● ZK stability issues (as we scale collections)
● CPU/load issues
● Higher number of concurrent pipelines, higher number of issues
Indexing Issues… 
[Diagram: Pipeline 1 to n, Indexing 1 to n and public API search traffic all hit one shared Solr cluster and Zookeeper ensemble.]
● Commit frequencies vary with indexer types
● Indexer run during another pipeline – performance
● Indexer client leaks
● Too many stored fields
● Non-batch updates
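Non-batch updates are listed above as an indexing issue. A minimal sketch of client-side batching follows, assuming a hypothetical `send_batch` callable that stands in for whatever Solr client the indexer uses; the function names and the default batch size of 500 are illustrative and do not come from the talk.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(docs: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most batch_size documents from docs."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def index_in_batches(
    docs: Iterable[dict],
    send_batch: Callable[[List[dict]], None],  # hypothetical client call
    batch_size: int = 500,                      # illustrative value
) -> int:
    """Send docs in batches and return the number of update requests issued."""
    requests = 0
    for batch in batched(docs, batch_size):
        send_batch(batch)  # one update request per batch, not one per document
        requests += 1
    return requests
```

With this shape, the number of update requests grows with the number of batches rather than the number of documents, and the commit policy can be decided separately from how documents are sent.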
Rethinking… 
• Shared cluster for pipelines does not scale. 
• Guaranteeing an uptime of 99.99%+ is non-trivial.
• Every job runs great in isolation. When you put them together, they fail. 
• Running index-heavy load and read-heavy load - cluster performance issues. 
• Any direct access to production cluster – cluster stability (client leaks, bad queries etc.). 
What if every pipeline had its own cluster?
Solr Compute Cloud (SC2) 
• Elastic Infrastructure – Provision Solr Clusters on demand, on-the-fly. 
• Create, Use, Terminate Model - Create a temporary cluster with necessary data, use it and throw it away. 
• Technologies behind SC2 (built in-house)
  • Cluster Management API – Dynamic cluster provisioning and resource allocation.
  • Solr HAFT – High availability and data management library for SolrCloud.
• Isolation - Pipelines get their own cluster. One cannot disrupt another. 
• Dynamic Scaling – Every pipeline can state its own replication requirements. 
• Production Safeguard - No direct access. Safeguards from bad clients/access patterns. 
• Cost Saving – Provision for the average; withstand peak with elastic growth.
Solr Compute Cloud 
[Diagram: Pipeline 1 sends Request: {Collection: A, Replica: 6} to the Solr Compute Cloud API, which provisions a Solr cluster (Collection A, Replicas: 6). The Solr HAFT Service reads from the existing Solr cluster (with its Zookeeper ensemble) and replicates to the new cluster.]
1. Read pipeline requests collection and desired replicas from SC2 API.
2. SC2 API provisions cluster dynamically with needed setup (and streams Solr data).
3. SC2 calls HAFT service to replicate data from production to provisioned cluster.
4. Pipeline uses this cluster to run job.
Solr Compute Cloud… 
[Diagram: Pipeline 1 sends Terminate: {Cluster} to the Solr Compute Cloud API, which tears down the provisioned Solr cluster (Collection A, Replicas: 6).]
1. Pipeline finishes running the job.
2. Pipeline calls SC2 API to terminate the cluster.
3. SC2 terminates the cluster.
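The create, use, terminate model described in the slides above can be sketched as a context manager, so the cluster is terminated even when the job fails. `InMemorySC2` is a hypothetical stand-in for the SC2 API (the talk does not show its interface); only the request fields, a collection name and a replica count, come from the slides.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from itertools import count


@dataclass
class InMemorySC2:
    """Hypothetical stand-in for the SC2 API; tracks clusters in a dict."""
    clusters: dict = field(default_factory=dict)
    _ids: count = field(default_factory=count)

    def provision(self, collection: str, replicas: int) -> str:
        # Mirrors the slide's request shape: {Collection: A, Replica: 6}.
        cluster_id = f"cluster-{next(self._ids)}"
        self.clusters[cluster_id] = {"collection": collection, "replicas": replicas}
        return cluster_id

    def terminate(self, cluster_id: str) -> None:
        # Terminating an unknown or already removed cluster is a no-op.
        self.clusters.pop(cluster_id, None)


@contextmanager
def temporary_cluster(api, collection: str, replicas: int):
    """Provision a cluster, hand it to the job, and always terminate it."""
    cluster_id = api.provision(collection, replicas)
    try:
        yield cluster_id
    finally:
        api.terminate(cluster_id)
```

A pipeline would wrap its job in `with temporary_cluster(api, "A", 6) as cluster_id:`; tying termination to the `finally` clause is one way to avoid leaving clusters behind after a failed job.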
Solr Compute Cloud – Read Pipeline View 
[Diagram: each read pipeline requests its own cluster from the Solr Compute Cloud API, and the Solr HAFT Service populates it from the production Solr cluster.]
• Pipeline 1 – Request: {Collection: A, Replica: 6} – Solr Cluster: Collection A, Replicas: 6
• Pipeline 2 – Request: {Collection: B, Replica: 2} – Solr Cluster: Collection B, Replicas: 2
• Pipeline n – Request: {Collection: C, Replica: 1} – Solr Cluster: Collection C, Replicas: 1
Solr Compute Cloud – Indexing 
[Diagram: the indexing pipeline sends Request: {Collection: A, Replica: 2} to the Solr Compute Cloud API, which provisions a Solr cluster (labeled Collection A, Replicas: 6 on the slide). The Solr HAFT Service reads from that cluster and replicates to the production Solr cluster (with its Zookeeper ensemble).]
1. Indexing pipeline requests collection and desired replicas from SC2 API.
2. SC2 API provisions cluster dynamically with needed setup (and streams Solr data).
3. Indexer uses this cluster to index the data.
4. Indexer calls HAFT service to replicate the index from dynamic cluster to production.
5. HAFT service reads data from dynamic cluster and replicates to production Solr.
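The indexing flow above keeps production untouched until the final replication step. A minimal sketch of that idea, using plain dictionaries as stand-ins for Solr collections; `provision_scratch`, `replicate` and `run_indexing_job` are hypothetical names, and the real HAFT replication mechanism is not described in the talk.

```python
from copy import deepcopy
from typing import Dict, Iterable

Index = Dict[str, dict]  # doc id -> document; stands in for a Solr collection


def provision_scratch(production: Index) -> Index:
    """Steps 1-2: a new cluster seeded with a copy of the production data."""
    return deepcopy(production)


def replicate(source: Index, target: Index) -> None:
    """Steps 4-5: make target's contents match source (stand-in for HAFT)."""
    snapshot = deepcopy(source)
    target.clear()
    target.update(snapshot)


def run_indexing_job(production: Index, docs: Iterable[dict]) -> None:
    """Index into a scratch copy, then replicate the result to production."""
    scratch = provision_scratch(production)
    for doc in docs:  # step 3: writes go to the scratch cluster only
        scratch[doc["id"]] = doc
    replicate(scratch, production)  # reached only if indexing succeeded
```

If the indexer raises partway through, `replicate` is never called and the production index keeps its previous contents, which is the isolation property the slide describes.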
Solr Compute Cloud – Global View 
[Diagram: indexing pipelines 1 to n and read pipelines 1 to n send Provision: {Cluster} and Terminate: {Cluster} requests to the Solr Compute Cloud API and run their jobs on elastic clusters. The Solr HAFT Service replicates indexes between the elastic clusters and the production Solr cluster (with its Zookeeper ensemble).]
Solr Compute Cloud API 
1. API to provision clusters on demand. 
2. Dynamic cluster and resource allocation (includes cost optimization) 
3. Track request state, cluster performance and cost. 
4. Terminate long-running, runaway clusters.
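Item 4 above can be pictured as a periodic check over the clusters the API is tracking. The sketch below is an assumption about how such a check might look; `ClusterRecord`, its fields and `find_runaway` are hypothetical and are not taken from the SC2 implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ClusterRecord:
    """Hypothetical bookkeeping entry for one provisioned cluster."""
    cluster_id: str
    created_at: float      # seconds since the epoch
    max_lifetime_s: float  # lifetime budget granted at provisioning time


def find_runaway(clusters: Iterable[ClusterRecord], now: float) -> List[str]:
    """Return ids of clusters that have outlived their lifetime budget."""
    return [
        c.cluster_id
        for c in clusters
        if now - c.created_at > c.max_lifetime_s
    ]
```

A scheduler could call `find_runaway(records, time.time())` at a fixed interval and pass each returned id to the terminate call; passing `now` in explicitly keeps the check easy to test.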
Solr HAFT Service 
1. High availability and fault tolerance 
2. Home-grown technology 
3. Open Source :) (Work in progress)
4. Features 
• One push disaster recovery 
• High availability operations 
• Replace node 
• Add replicas 
• Repair collection 
• Collection versioning 
• Cluster backup operations 
• Dynamic replica creation 
• Cluster clone 
• Cluster swap 
• Cluster state reconstruction
Solr HAFT Service – Functional View 
[Diagram labels: Black Box Recording; Index Management Actions; Custom Commit; Node Replacement; Collection Versioning; Clone Collections; Clone Alias; Node Repair; Clone Cluster; Lucene Segment Optimize; High Availability Actions; Cluster Backup Operations; Solr Metadata; Zookeeper Metadata; Dynamic Replica Creation; Cluster Clone; Cluster Swap; Cluster State Reconstruction; Verification Monitoring]
Disaster Recovery in New Architecture 
[Diagram: the "Brave Soul on Pager Duty" triggers push-button recovery in the Solr HAFT Service, which builds a new Solr cluster (with its own Zookeeper ensemble) from the old production Solr cluster; DNS is then pointed at the new cluster.]
1. Guy on Pager clicks the recovery button
2. Solr HAFT Service triggers
   • Cluster Setup
   • State Reconstruction
   • Cluster Clone
   • Cluster Swap
3. Production DNS – New Cluster
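The recovery sequence above is an ordered list of steps where a later step should not run if an earlier one fails. A minimal sketch of such a runner, assuming each step is a plain callable; `run_recovery` and the step names in the usage note are illustrative, not the HAFT service's actual interface.

```python
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Callable[[], None]]


def run_recovery(steps: Sequence[Step]) -> List[str]:
    """Run steps in order and stop at the first failure.

    Returns the names of all steps when every step succeeds; otherwise
    raises RuntimeError naming the failed step and the completed ones.
    """
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            raise RuntimeError(
                f"recovery halted at {name!r}; completed: {completed}"
            ) from exc
        completed.append(name)
    return completed
```

For the slide's flow the step list would be, in order: cluster setup, state reconstruction, cluster clone, cluster swap, and finally the DNS switch, so that DNS only moves after the new cluster has been cloned and swapped in.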
SC2 vs Non-SC2 (Stability Features) 
Properties compared (Non-SC2 vs SC2):
• Linear Scalability for Heterogeneous Workload
• Pipeline Level Isolation
• Dynamic Collection Scaling
• Prevention from Bad Clients
• Pipeline Specific Performance
• No Direct Access to Production Cluster
• Can Sleep at night? :)
SC2 vs Non-SC2 (Availability Features) 
Properties compared (Non-SC2 vs SC2):
• Cross Data-Center Support
• Cluster Cloning
• Collection Versioning
• One-Push Disaster Recovery
• Repair API for Nodes/Collections
• Node Replacement
Lessons Learned 
1. Solr is a search platform. Do not use it as a database (for scans and lookups). 
Evaluate your stored fields. 
2. Understand access patterns, QPS and queries in detail. Be careful when tuning 
caches. 
3. Have access control for large-scale jobs that directly talk to your cluster. (Internal 
DDOS attacks are hard to track.) 
4. Instrument every piece of infrastructure and collect metrics. 
5. Build automated disaster recovery. You will need it. :)
Questions? 
Thank You! 
Nitin Sharma
nitin.sharma@bloomreach.com
https://www.linkedin.com/in/knitinsharma

More Related Content

What's hot

Building large scale applications in yarn with apache twill
Building large scale applications in yarn with apache twillBuilding large scale applications in yarn with apache twill
Building large scale applications in yarn with apache twillHenry Saputra
 
Senlin deep dive 2016
Senlin deep dive 2016Senlin deep dive 2016
Senlin deep dive 2016Qiming Teng
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Yahoo Developer Network
 
Akka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive PlatformAkka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive PlatformLegacy Typesafe (now Lightbend)
 
Oracle Real Application Cluster ( RAC )
Oracle Real Application Cluster ( RAC )Oracle Real Application Cluster ( RAC )
Oracle Real Application Cluster ( RAC )varasteh65
 
-Kerberos and Health Checks and Bare Metal, Oh My! Updates to OpenStack Sahar...
-Kerberos and Health Checks and Bare Metal, Oh My! Updates to OpenStack Sahar...-Kerberos and Health Checks and Bare Metal, Oh My! Updates to OpenStack Sahar...
-Kerberos and Health Checks and Bare Metal, Oh My! Updates to OpenStack Sahar...Гриднев Виталий
 
PaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpPaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpNathan Handler
 
Scale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARNScale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARNDataWorks Summit/Hadoop Summit
 
Declarative benchmarking of cassandra and it's data models
Declarative benchmarking of cassandra and it's data modelsDeclarative benchmarking of cassandra and it's data models
Declarative benchmarking of cassandra and it's data modelsMonal Daxini
 
YARN: a resource manager for analytic platform
YARN: a resource manager for analytic platformYARN: a resource manager for analytic platform
YARN: a resource manager for analytic platformTsuyoshi OZAWA
 
Patterns of-streaming-applications-qcon-2018-monal-daxini
Patterns of-streaming-applications-qcon-2018-monal-daxiniPatterns of-streaming-applications-qcon-2018-monal-daxini
Patterns of-streaming-applications-qcon-2018-monal-daxiniMonal Daxini
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Helena Edelson
 
Akka 2.4 plus commercial features in Typesafe Reactive Platform
Akka 2.4 plus commercial features in Typesafe Reactive PlatformAkka 2.4 plus commercial features in Typesafe Reactive Platform
Akka 2.4 plus commercial features in Typesafe Reactive PlatformLegacy Typesafe (now Lightbend)
 
Senlin deep dive 2015 05-20
Senlin deep dive 2015 05-20Senlin deep dive 2015 05-20
Senlin deep dive 2015 05-20Qiming Teng
 
Design and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative RebalancingDesign and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative Rebalancingconfluent
 
Managing Container Clusters in OpenStack Native Way
Managing Container Clusters in OpenStack Native WayManaging Container Clusters in OpenStack Native Way
Managing Container Clusters in OpenStack Native WayQiming Teng
 
Spark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingSpark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingJack Gudenkauf
 
Harnessing the power of YARN with Apache Twill
Harnessing the power of YARN with Apache TwillHarnessing the power of YARN with Apache Twill
Harnessing the power of YARN with Apache TwillTerence Yim
 
Zoo keeper in the wild
Zoo keeper in the wildZoo keeper in the wild
Zoo keeper in the wilddatamantra
 

What's hot (20)

Building large scale applications in yarn with apache twill
Building large scale applications in yarn with apache twillBuilding large scale applications in yarn with apache twill
Building large scale applications in yarn with apache twill
 
Senlin deep dive 2016
Senlin deep dive 2016Senlin deep dive 2016
Senlin deep dive 2016
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
 
Akka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive PlatformAkka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive Platform
 
Oracle Real Application Cluster ( RAC )
Oracle Real Application Cluster ( RAC )Oracle Real Application Cluster ( RAC )
Oracle Real Application Cluster ( RAC )
 
-Kerberos and Health Checks and Bare Metal, Oh My! Updates to OpenStack Sahar...
-Kerberos and Health Checks and Bare Metal, Oh My! Updates to OpenStack Sahar...-Kerberos and Health Checks and Bare Metal, Oh My! Updates to OpenStack Sahar...
-Kerberos and Health Checks and Bare Metal, Oh My! Updates to OpenStack Sahar...
 
PaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpPaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at Yelp
 
Scale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARNScale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARN
 
Declarative benchmarking of cassandra and it's data models
Declarative benchmarking of cassandra and it's data modelsDeclarative benchmarking of cassandra and it's data models
Declarative benchmarking of cassandra and it's data models
 
YARN: a resource manager for analytic platform
YARN: a resource manager for analytic platformYARN: a resource manager for analytic platform
YARN: a resource manager for analytic platform
 
Patterns of-streaming-applications-qcon-2018-monal-daxini
Patterns of-streaming-applications-qcon-2018-monal-daxiniPatterns of-streaming-applications-qcon-2018-monal-daxini
Patterns of-streaming-applications-qcon-2018-monal-daxini
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
 
Akka 2.4 plus commercial features in Typesafe Reactive Platform
Akka 2.4 plus commercial features in Typesafe Reactive PlatformAkka 2.4 plus commercial features in Typesafe Reactive Platform
Akka 2.4 plus commercial features in Typesafe Reactive Platform
 
Senlin deep dive 2015 05-20
Senlin deep dive 2015 05-20Senlin deep dive 2015 05-20
Senlin deep dive 2015 05-20
 
Design and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative RebalancingDesign and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative Rebalancing
 
Managing Container Clusters in OpenStack Native Way
Managing Container Clusters in OpenStack Native WayManaging Container Clusters in OpenStack Native Way
Managing Container Clusters in OpenStack Native Way
 
Spark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingSpark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream Processing
 
Harnessing the power of YARN with Apache Twill
Harnessing the power of YARN with Apache TwillHarnessing the power of YARN with Apache Twill
Harnessing the power of YARN with Apache Twill
 
Zoo keeper in the wild
Zoo keeper in the wildZoo keeper in the wild
Zoo keeper in the wild
 

Viewers also liked

SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...
SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...
SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...Lucidworks
 
GIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataGIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataShalin Shekhar Mangar
 
Inside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene MeetupInside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene MeetupShalin Shekhar Mangar
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Nitin S
 
Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Electionravikgiitk
 
Why Is My Solr Slow?: Presented by Mike Drob, Cloudera
Why Is My Solr Slow?: Presented by Mike Drob, ClouderaWhy Is My Solr Slow?: Presented by Mike Drob, Cloudera
Why Is My Solr Slow?: Presented by Mike Drob, ClouderaLucidworks
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkitthelabdude
 
Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Shalin Shekhar Mangar
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloudVarun Thacker
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCLucidworks (Archived)
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Shalin Shekhar Mangar
 

Viewers also liked (15)

SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...
SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...
SolrCloud - High Availability and Fault Tolerance: Presented by Mark Miller, ...
 
GIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataGIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big Data
 
Inside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene MeetupInside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene Meetup
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
 
Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Election
 
Why Is My Solr Slow?: Presented by Mike Drob, Cloudera
Why Is My Solr Slow?: Presented by Mike Drob, ClouderaWhy Is My Solr Slow?: Presented by Mike Drob, Cloudera
Why Is My Solr Slow?: Presented by Mike Drob, Cloudera
 
Apache SolrCloud
Apache SolrCloudApache SolrCloud
Apache SolrCloud
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
 
Intro to Apache Solr
Intro to Apache SolrIntro to Apache Solr
Intro to Apache Solr
 
Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloud
 
Scaling search with SolrCloud
Scaling search with SolrCloudScaling search with SolrCloud
Scaling search with SolrCloud
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
 

Similar to Solr Compute Cloud - An Elastic Solr Infrastructure for Big Data Applications

Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitinbloomreacheng
 
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...
  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...Lucidworks
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudthelabdude
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceLucidworks (Archived)
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr PerformanceLucidworks
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Lucidworks
 
Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexApache Apex
 
Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Productionconfluent
 
Understand oracle real application cluster
Understand oracle real application clusterUnderstand oracle real application cluster
Understand oracle real application clusterSatishbabu Gunukula
 
Stabilizing the Jenga tower: Scaling out Ceilometer
Stabilizing the Jenga tower: Scaling out CeilometerStabilizing the Jenga tower: Scaling out Ceilometer
Stabilizing the Jenga tower: Scaling out CeilometerPradeep Kilambi
 
Stabilising the jenga tower
Stabilising the jenga towerStabilising the jenga tower
Stabilising the jenga towerGordon Chung
 
Building a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrBuilding a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrlucenerevolution
 
Deploying and managing Solr at scale
Deploying and managing Solr at scaleDeploying and managing Solr at scale
Deploying and managing Solr at scaleAnshum Gupta
 
Introduction to Apache Geode (Cork, Ireland)
Introduction to Apache Geode (Cork, Ireland)Introduction to Apache Geode (Cork, Ireland)
Introduction to Apache Geode (Cork, Ireland)Anthony Baker
 
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly SolarWinds Loggly
 
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Lucidworks
 
Apache Pulsar Seattle - Meetup
Apache Pulsar Seattle - MeetupApache Pulsar Seattle - Meetup
Apache Pulsar Seattle - MeetupKarthik Ramasamy
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processingconfluent
 
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...confluent
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaHelena Edelson
 

Similar to Solr Compute Cloud - An Elastic Solr Infrastructure for Big Data Applications (20)

Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
 
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...
  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloud
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Benchmarking Solr Performance
Solr Compute Cloud - An Elastic Solr Infrastructure for Big Data Applications

  • 1.
  • 2. Solr Compute Cloud – An Elastic Solr Infrastructure Nitin Sharma - Member of technical staff, BloomReach - nitin.sharma@bloomreach.com
  • 3. Abstract. Scaling search platforms is an extremely hard problem:
      • Serving hundreds of millions of documents
      • Low latency
      • High-throughput workloads
      • Optimized cost
    At BloomReach, we have implemented SC2, an elastic Solr infrastructure for big data applications that:
      • Supports heterogeneous workloads while hosted in the cloud
      • Dynamically grows and shrinks search servers
      • Provides application- and pipeline-level isolation, NRT search and indexing
      • Offers latency guarantees and application-specific performance tuning
      • Provides high-availability features such as cluster replacement, cross-data-center support and disaster recovery
  • 4. About Us. BloomReach: BloomReach has developed a personalized discovery platform featuring applications that analyze big data to make our customers’ digital content more discoverable, relevant and profitable. Myself: I work on search platform scaling for BloomReach’s big data. My relevant experience and background include scaling real-time services for latency-sensitive applications and building performance and search-quality metrics infrastructure for personalization platforms.
  • 5. The BloomReach Personalized Discovery Platform
  • 6. BloomReach’s Applications (content understanding)
      • Organic Search. What it does: content optimization, management and measurement. Benefit: enhanced discoverability and customer acquisition in organic search.
      • SNAP. What it does: personalized onsite search and navigation across devices. Benefit: relevant and consistent onsite experiences for new and known users.
      • Compass. What it does: merchandising tool that understands products and identifies opportunities. Benefit: prioritize and optimize online merchandising.
  • 7. Agenda • BloomReach search use cases and architecture • Old architecture and issues • Scaling challenges • Elastic SolrCloud architecture and benefits • Lessons learned
  • 8. BloomReach Search Use Cases 1. Front-end (serving) queries – Uptime and Latency sensitive 2. Batch search pipelines – Throughput sensitive 3. Time bound indexing requirements – Customer Specific 4. Time bound Solr config updates
  • 9. BloomReach Search Architecture (diagram). Components shown: Zookeeper ensemble; Solr cluster; Map Reduce pipelines (reads): Pipeline 1, Pipeline 2, … Pipeline n; indexing pipelines: Indexing 1, Indexing 2, … Indexing n; public API search traffic. Legend: heavy load, moderate load, light load.
  • 10. Throughput Issues… (diagram: Pipeline 1 … n, Indexing 1 … n and public API search traffic all sharing one Solr cluster and Zookeeper ensemble)
      ● Heterogeneous read workload
      ● Same collection, but different pipelines, query patterns and schedules
      ● Cache tuning is virtually impossible
      ● Larger pipelines starve the smaller ones
      ● Machine utilization determines the throughput and stability of a pipeline at any point
      ● No isolation among jobs
  • 11. Stability and Uptime Issues… (diagram: Pipeline 1 … n, Indexing 1 … n and public API search traffic all sharing one Solr cluster and Zookeeper ensemble)
      ● Bad clients bring down the cluster or degrade performance
      ● Bad queries (under heavy load) render nodes unresponsive
      ● Garbage collection issues
      ● ZK stability issues (as we scale collections)
      ● CPU/load issues
      ● The higher the number of concurrent pipelines, the higher the number of issues
  • 12. Indexing Issues… (diagram: Pipeline 1 … n, Indexing 1 … n and public API search traffic all sharing one Solr cluster and Zookeeper ensemble)
      ● Commit frequencies vary with indexer type
      ● An indexer running during another pipeline hurts performance
      ● Indexer client leaks
      ● Too many stored fields
      ● Non-batch updates
  • 13. Rethinking…
      • A shared cluster for pipelines does not scale.
      • Guaranteeing an uptime of 99.99%+ is non-trivial.
      • Every job runs great in isolation. When you put them together, they fail.
      • Running index-heavy and read-heavy loads together causes cluster performance issues.
      • Any direct access to the production cluster puts cluster stability at risk (client leaks, bad queries etc.).
    What if every pipeline had its own cluster?
  • 14. Solr Compute Cloud (SC2)
      • Elastic infrastructure: provision Solr clusters on demand, on the fly.
      • Create, use, terminate model: create a temporary cluster with the necessary data, use it and throw it away.
      • Technologies behind SC2 (built in-house):
          • Cluster Management API: dynamic cluster provisioning and resource allocation.
          • Solr HAFT: high-availability and data management library for SolrCloud.
      • Isolation: pipelines get their own cluster. One cannot disrupt another.
      • Dynamic scaling: every pipeline can state its own replication requirements.
      • Production safeguard: no direct access. Safeguards from bad clients and access patterns.
      • Cost saving: provision for the average; withstand peaks with elastic growth.
  • 15. Solr Compute Cloud (diagram: Pipeline 1 sends Request: {Collection: A, Replica: 6} to the Solr Compute Cloud API; a Solr cluster with Collection A, Replicas: 6 is provisioned; the Solr HAFT Service reads from the existing Solr cluster and Zookeeper ensemble and replicates to the new cluster)
      1. Read pipeline requests a collection and the desired replicas from the SC2 API.
      2. SC2 API provisions a cluster dynamically with the needed setup (and streams Solr data).
      3. SC2 calls the HAFT service to replicate data from production to the provisioned cluster.
      4. Pipeline uses this cluster to run its job.
  • 16. Solr Compute Cloud… (diagram: Pipeline 1 sends Terminate: {Cluster} to the Solr Compute Cloud API for its Solr cluster with Collection A, Replicas: 6)
      1. Pipeline finishes running the job.
      2. Pipeline calls the SC2 API to terminate the cluster.
      3. SC2 terminates the cluster.
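The create, use, terminate model from slides 15 and 16 can be sketched as a context manager, so that the temporary cluster is released even when the job fails. This is only an illustration: the SC2 API is an in-house BloomReach service whose real interface is not shown in the slides, so `FakeSC2Api`, `provision`, `terminate` and `temporary_cluster` are hypothetical names backed by an in-memory stand-in.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakeSC2Api:
    """In-memory stand-in for the SC2 API. All names here are hypothetical."""
    clusters: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def provision(self, collection: str, replicas: int) -> str:
        # A real service would start nodes and have HAFT copy the
        # collection from production; here we only record the request.
        cluster_id = f"cluster-{next(self._ids)}"
        self.clusters[cluster_id] = {"collection": collection, "replicas": replicas}
        return cluster_id

    def terminate(self, cluster_id: str) -> None:
        # Terminating an unknown cluster is treated as a no-op.
        self.clusters.pop(cluster_id, None)


@contextmanager
def temporary_cluster(api: FakeSC2Api, collection: str, replicas: int):
    """Provision a cluster, hand it to the job, always terminate it afterwards."""
    cluster_id = api.provision(collection, replicas)
    try:
        yield cluster_id
    finally:
        api.terminate(cluster_id)


if __name__ == "__main__":
    api = FakeSC2Api()
    with temporary_cluster(api, collection="A", replicas=6) as cluster_id:
        print("running job on", cluster_id, api.clusters[cluster_id])
    print("clusters left after job:", api.clusters)
```

Wrapping the lifecycle this way means a crashed pipeline does not leave its cluster running, which is one plausible way to keep the "throw it away" step from being forgotten.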
  • 17. Solr Compute Cloud – Read Pipeline View (diagram). Each read pipeline sends its own request to the Solr Compute Cloud API and gets its own Solr cluster; the diagram also shows the Solr HAFT Service, the production Solr cluster and the Zookeeper ensemble.
      • Pipeline 1: Request: {Collection: A, Replica: 6} → Solr cluster with Collection A, Replicas: 6
      • Pipeline 2: Request: {Collection: B, Replica: 2} → Solr cluster with Collection B, Replicas: 2
      • Pipeline n: Request: {Collection: C, Replica: 1} → Solr cluster with Collection C, Replicas: 1
  • 18. Solr Compute Cloud – Indexing (diagram: Indexing sends Request: {Collection: A, Replica: 2} to the Solr Compute Cloud API; the provisioned Solr cluster is labelled Collection A, Replicas: 6; the Solr HAFT Service reads from it and replicates to the production Solr cluster and Zookeeper ensemble)
      1. Read pipeline requests a collection and the desired replicas from the SC2 API.
      2. SC2 API provisions a cluster dynamically with the needed setup (and streams Solr data).
      3. Indexer uses this cluster to index the data.
      4. Indexer calls the HAFT service to replicate the index from the dynamic cluster to production.
      5. HAFT service reads data from the dynamic cluster and replicates it to production Solr.
  • 19. Solr Compute Cloud – Global View (diagram). Indexing pipelines 1 … n and read pipelines 1 … n send Provision: {Cluster} and Terminate: {Cluster} requests to the Solr Compute Cloud API and run their jobs on elastic clusters. The Solr HAFT Service replicates indexes between the elastic clusters and the production Solr cluster. The Zookeeper ensemble is also shown.
  • 20. Solr Compute Cloud API
      1. API to provision clusters on demand.
      2. Dynamic cluster and resource allocation (includes cost optimization).
      3. Track request state, cluster performance and cost.
      4. Terminate long-running, runaway clusters.
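Item 4 above (terminating long-running, runaway clusters) could look roughly like the following check. The slides do not say how SC2 decides that a cluster is a runaway, so the per-cluster runtime budget and the names `ClusterRecord` and `find_runaway_clusters` are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ClusterRecord:
    """Hypothetical bookkeeping entry kept per provisioned cluster."""
    cluster_id: str
    started_at: float      # seconds since the epoch
    max_runtime_s: float   # assumed budget agreed at provisioning time


def find_runaway_clusters(records: Iterable[ClusterRecord], now: float) -> List[str]:
    """Return the ids of clusters that have exceeded their runtime budget."""
    return [r.cluster_id for r in records if now - r.started_at > r.max_runtime_s]


if __name__ == "__main__":
    records = [
        ClusterRecord("read-pipeline-1", started_at=0.0, max_runtime_s=100.0),
        ClusterRecord("indexer-7", started_at=900.0, max_runtime_s=200.0),
    ]
    for cluster_id in find_runaway_clusters(records, now=1000.0):
        print("would terminate", cluster_id)
```

A periodic job could feed the returned ids to the terminate call; tracking cost per cluster (item 3) would use the same records.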
  • 21. Solr HAFT Service
      1. High availability and fault tolerance
      2. Home-grown technology
      3. Open source :) (work in progress)
      4. Features
          • One-push disaster recovery
          • High availability operations
          • Replace node
          • Add replicas
          • Repair collection
          • Collection versioning
          • Cluster backup operations
          • Dynamic replica creation
          • Cluster clone
          • Cluster swap
          • Cluster state reconstruction
  • 22. Solr HAFT Service – Functional View (diagram). Blocks shown around the Solr HAFT Service: Black Box Recording; Index Management Actions; Custom Commit; Node Replacement; Collection Versioning; Clone Collections; Clone Alias; Node Repair; Clone Cluster; Lucene Segment Optimize; High Availability Actions; Cluster Backup Operations; Solr Metadata; Zookeeper Metadata; Dynamic Replica Creation; Cluster Clone; Cluster Swap; Cluster State Reconstruction; Verification; Monitoring.
  • 23. Disaster Recovery in New Architecture (diagram: a Brave Soul on Pager Duty triggers Push Button Recovery on the Solr HAFT Service; the old production Solr cluster and the new Solr cluster each have their own Zookeeper ensemble; DNS moves from the old cluster to the new one)
      1. The person on pager duty clicks the recovery button.
      2. Solr HAFT Service triggers: cluster setup, state reconstruction, cluster clone, cluster swap.
      3. Production DNS points to the new cluster.
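The push-button recovery flow above can be sketched as a fixed sequence of steps, with the DNS switch happening only after every step has succeeded. The slide lists the four HAFT actions but does not spell out their ordering or failure handling, so the strict ordering, the stop-on-first-error behaviour and the names `RECOVERY_STEPS` and `run_recovery` are assumptions for illustration.

```python
from typing import Callable, Dict, List

# Step names taken from the slide; treating them as strictly sequential is an assumption.
RECOVERY_STEPS = (
    "cluster_setup",
    "state_reconstruction",
    "cluster_clone",
    "cluster_swap",
)


def run_recovery(actions: Dict[str, Callable[[], None]],
                 switch_dns: Callable[[], None]) -> List[str]:
    """Run each recovery step in order, then switch DNS.

    Any step may raise; in that case the exception propagates and DNS
    is left pointing at the old cluster.
    """
    completed: List[str] = []
    for name in RECOVERY_STEPS:
        actions[name]()
        completed.append(name)
    switch_dns()
    completed.append("dns_switch")
    return completed


if __name__ == "__main__":
    # Placeholder actions that only print; real ones would call the HAFT service.
    demo_actions = {name: (lambda n=name: print("running", n)) for name in RECOVERY_STEPS}
    print(run_recovery(demo_actions, lambda: print("DNS now points to the new cluster")))
```

Keeping the DNS change as the last step means traffic only moves once the new cluster has been set up, reconstructed, cloned and swapped in.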
  • 24. SC2 vs Non-SC2 (Stability Features). Comparison table with columns Property, Non-SC2 and SC2. Properties compared:
      • Linear scalability for heterogeneous workload
      • Pipeline-level isolation
      • Dynamic collection scaling
      • Prevention from bad clients
      • Pipeline-specific performance
      • No direct access to production cluster
      • Can sleep at night? :)
  • 25. SC2 vs Non-SC2 (Availability Features). Comparison table with columns Property, Non-SC2 and SC2. Properties compared:
      • Cross data-center support
      • Cluster cloning
      • Collection versioning
      • One-push disaster recovery
      • Repair API for nodes/collections
      • Node replacement
  • 26. Lessons Learned
      1. Solr is a search platform. Do not use it as a database (for scans and lookups). Evaluate your stored fields.
      2. Understand access patterns, QPS and queries in detail. Be careful when tuning caches.
      3. Have access control for large-scale jobs that talk directly to your cluster. (Internal DDoS attacks are hard to track.)
      4. Instrument every piece of infrastructure and collect metrics.
      5. Build automated disaster recovery. (You will need it. :))
  • 27. Questions? Thank You! Nitin Sharma nitin.sharma@bloomreach.com https://www.linkedin.com/in/knitinsharma