October 13-16, 2016 • Austin, TX
Rackspace Email’s solution for indexing 50k documents per second
George Bailey – Software Developer, Rackspace
Cameron Baker – Linux Engineer, Rackspace
george.bailey@rackspace.com
cameron.baker@rackspace.com
Who we are…
•  “Rackers” dedicated to Fanatical Support!
•  Based out of San Antonio, TX
•  Part of Cloud Office
•  Email infrastructure development and engineering
2008: Original Problem
•  1 million mailboxes
•  Support needs to track message delivery
•  We need event aggregation + search
•  Needed to provide Fanatical Support!
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
Original System Design
•  Scribed: log aggregation, deposit into HDFS
•  Hadoop 0.20: index via MapReduce
•  Solr 1.4: search
•  Custom tools: index loader, MapReduce, scheduler
Original System Architecture
[Figure: original system architecture diagram]
Past performance
Step                          Time
Transport                     < 1 minute
Index Generation (MapReduce)  10 minutes (cron)
Index Merge                   10 minutes (cron)
Searchable Events             20+ minutes
7 years later…
•  4+ million mailboxes
•  Still running Solr 1.4, Hadoop 0.20, Scribed
•  Scaling, maintenance issues
•  Grew to 100+ physical servers, 15 VMs
•  Events need to be used in other contexts
•  20+ minute time-to-search no longer acceptable
Time to modernize!
Goals
•  Improve customer experience – Fanatical Support!
•  Provide search results faster
•  Reduce technologies
•  Reduce the amount of custom code
•  Reduce the number of physical servers
New System - Components
•  Apache Flume: aggregation + processing
•  Solr 1.4 to 4.x/5.x: NRT indexing, distributed search
•  SolrCloud allowed us to reduce custom code by 75%
System architecture
[Figure: new system architecture diagram]
Performance Tuning
Flume: backpressure + hop availability
•  Sinks may be unreachable or slow
•  File Channel = durable buffering
•  capacity: disk / event size
•  transactionCapacity: match source / sink
•  minimumRequiredSpace
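The "capacity: disk / event size" rule above can be sketched as a small calculation. This is a minimal sketch rather than the presenters' tooling: the disk budget, average event size, batch size, and the agent and channel names are hypothetical. `capacity`, `transactionCapacity`, and `minimumRequiredSpace` are the File Channel properties named on the slide.

```python
def file_channel_settings(disk_budget_bytes, avg_event_bytes, batch_size, reserve_bytes):
    """Derive File Channel settings from a disk budget (all inputs are assumptions)."""
    if avg_event_bytes <= 0 or batch_size <= 0:
        raise ValueError("avg_event_bytes and batch_size must be positive")
    usable = disk_budget_bytes - reserve_bytes
    if usable <= 0:
        raise ValueError("reserve_bytes must be smaller than disk_budget_bytes")
    # capacity: how many events fit in the usable part of the disk budget
    capacity = usable // avg_event_bytes
    return {
        "type": "file",
        "capacity": capacity,
        # transactionCapacity: sized to the source/sink batch size
        "transactionCapacity": min(batch_size, capacity),
        # minimumRequiredSpace: free space (bytes) to keep in reserve
        "minimumRequiredSpace": reserve_bytes,
    }


def render(agent, channel, settings):
    """Format the settings as Flume properties lines."""
    prefix = f"{agent}.channels.{channel}."
    return [f"{prefix}{key} = {value}" for key, value in settings.items()]


settings = file_channel_settings(
    disk_budget_bytes=100 * 1024**3,  # hypothetical 100 GiB buffer disk
    avg_event_bytes=2 * 1024,         # hypothetical 2 KiB average event
    batch_size=1000,                  # hypothetical source/sink batch size
    reserve_bytes=1 * 1024**3,        # hypothetical 1 GiB reserve
)
for line in render("agent", "mailEventChannel", settings):
    print(line)
```

Sizing the channel this way makes the buffering window explicit: capacity divided by the incoming event rate gives roughly how long a downstream hop can be unreachable before the channel fills.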
Flume: batching and throughput
•  Batch size is important
•  File channels = slow
•  Memory channels = fast
•  “Loopback” flows
Flume: controlling the flows
•  One event, multiple uses
•  Channel selectors
•  Optional channels
•  Interceptors
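# Route each event by the value of its eventType header
agent.sources.avroSource.selector.type = multiplexing
agent.sources.avroSource.selector.header = eventType
agent.sources.avroSource.selector.default = defaultChannel
# Header value to required channel(s); Flume expects the "mapping." prefix here
agent.sources.avroSource.selector.mapping.authEvent = authEventChannel
agent.sources.avroSource.selector.mapping.mailEvent = mailEventChannel
# Optional channel: a failed write here does not fail the transaction
agent.sources.avroSource.selector.optional.authEvent = optionalChannel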
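The routing that the selector configuration above describes can be modelled in a few lines. This is a simplified illustration, not Flume's implementation: the function name and data structures are hypothetical, and only the channel and header names come from the slide.

```python
def select_channels(headers, header_name, mapping, optional, default):
    """Return (required, optional) channel lists for one event.

    Simplified model of a multiplexing selector: look up the header value,
    fall back to the default channels when nothing is mapped, and treat any
    optional channels for that value as best-effort extras.
    """
    value = headers.get(header_name)
    required = list(mapping.get(value, default))
    extras = [c for c in optional.get(value, []) if c not in required]
    return required, extras


mapping = {
    "authEvent": ["authEventChannel"],
    "mailEvent": ["mailEventChannel"],
}
optional = {"authEvent": ["optionalChannel"]}
default = ["defaultChannel"]

for event_type in ("authEvent", "mailEvent", "somethingElse"):
    required, extras = select_channels(
        {"eventType": event_type}, "eventType", mapping, optional, default
    )
    print(event_type, required, extras)
```

This is how one event can serve multiple uses: the required channel feeds the main flow, while an optional channel can feed a secondary consumer without blocking the main flow when that consumer is slow.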
Flume: Morphlines + Solr
•  Works with SolrCloud
•  Many helpful built-in commands
•  Scripting support for Java
•  Route to multiple collections
•  Validate, modify events in-flight
http://kitesdk.org/docs/current/morphlines/morphlines-reference-guide.html
Requirements for Solr
•  Near real time indexing of 30,000+ docs per sec
•  Few queries (< 10,000 per day)
•  Heavy distributed facet/group/sort queries
•  Support removing documents older than X days
•  Minimize JVM GC impact on indexing performance
Basic Solr install
[Figure: one collection split into Shard 1 and Shard 2; Servers A, B, C, and D each run one Solr replica]
•  ~2,500 docs per second with this 4-server layout
•  Goal: 30,000 docs per second (30,000 / 2,500 = 12)
•  12 × 4 servers = 48 total servers
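The estimate above is a straight linear extrapolation, written out here as a sketch. The function name is hypothetical, and it assumes throughput scales linearly with the number of 4-server units, which is the same assumption the slide makes.

```python
import math


def servers_needed(target_docs_per_sec, measured_docs_per_sec, servers_per_unit):
    """Scale a measured 'unit' of servers linearly up to a target indexing rate."""
    units = math.ceil(target_docs_per_sec / measured_docs_per_sec)
    return units, units * servers_per_unit


units, total = servers_needed(
    target_docs_per_sec=30_000,   # goal from the slide
    measured_docs_per_sec=2_500,  # measured on the basic 4-server install
    servers_per_unit=4,
)
print(units, total)
```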
Consult the experts…
•  Days of talking and hundreds of emails with Rishi Easwaran
•  Recommendations from Shalin Mangar
•  solr-user@lucene.apache.org
Result:
•  Fewer physical servers
•  Faster indexing
Collections – Optimized for additions/deletions
collection-2015-10-11
collection-2015-10-12
collection-2015-10-13
collection-2015-10-14
collection-2015-10-15
collection-2015-10-16
•  Rolling collections by date
•  ~1 billion documents removed
•  Aliases for updates/queries
•  25 shards - 2 replicas per shard
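A rolling-by-date scheme like the one above can be sketched as follows. This is an illustration under assumptions, not the presenters' scheduler: the alias names (`events-write`, `events-search`), the base URL, and the 6-day retention are hypothetical. `CREATEALIAS` and `DELETE` are Solr Collections API actions, and the code only builds the request URLs without sending them.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def collection_name(day, prefix="collection"):
    """Name a daily collection the way the slide does: collection-YYYY-MM-DD."""
    return f"{prefix}-{day.isoformat()}"


def rolling_plan(today, retention_days, prefix="collection"):
    """Decide which collections to write to, search, and drop on a given day."""
    keep = [
        collection_name(today - timedelta(days=n), prefix)
        for n in range(retention_days)
    ]
    expired = collection_name(today - timedelta(days=retention_days), prefix)
    return {"write": keep[0], "search": keep, "drop": expired}


def collections_api_url(base_url, **params):
    """Build a Collections API URL (nothing is sent over the network)."""
    return f"{base_url}/admin/collections?{urlencode(params)}"


plan = rolling_plan(date(2015, 10, 16), retention_days=6)
base = "http://localhost:8983/solr"  # hypothetical Solr endpoint
urls = [
    # Point the (hypothetical) update alias at today's collection
    collections_api_url(base, action="CREATEALIAS", name="events-write",
                        collections=plan["write"]),
    # Point the (hypothetical) query alias at every retained collection
    collections_api_url(base, action="CREATEALIAS", name="events-search",
                        collections=",".join(plan["search"])),
    # Drop the collection that just aged out
    collections_api_url(base, action="DELETE", name=plan["drop"]),
]
for url in urls:
    print(url)
```

Dropping a whole collection removes a day of documents in one operation, which avoids delete-by-query and the segment merging that follows it. The new day's collection would need to be created before the aliases are moved.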
JVM – Lean and mean
•  4GB max/min JVM heap size
•  5 Solr JVM processes per server
•  Using Concurrent Mark Sweep GC
•  GC only on very heavy queries
•  GC < 10ms; occurs < 10 times a day
•  No impact on index performance
•  Reads 28 indexes; writes 2 indexes
[Figure: Server A running one Solr process vs. Server A running five Solr processes]
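The JVM layout above (several small heaps per server instead of one large one) could be expressed as a per-server plan. A minimal sketch: the port numbers and function names are hypothetical, and the talk does not list its exact flags. `-Xms`, `-Xmx`, and `-XX:+UseConcMarkSweepGC` are the standard HotSpot options for a fixed heap size and the Concurrent Mark Sweep collector.

```python
def solr_jvm_args(heap_gb):
    """Heap pinned to one size (min == max) plus the CMS collector flag."""
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g", "-XX:+UseConcMarkSweepGC"]


def plan_server(process_count, heap_gb, first_port):
    """One entry per Solr JVM on a server; ports are hypothetical."""
    return [
        {"port": first_port + i, "jvm_args": solr_jvm_args(heap_gb)}
        for i in range(process_count)
    ]


server_plan = plan_server(process_count=5, heap_gb=4, first_port=8983)
total_heap_gb = sum(4 for _ in server_plan)  # 5 processes x 4 GB each

for proc in server_plan:
    print(proc["port"], " ".join(proc["jvm_args"]))
print("total heap per server (GB):", total_heap_gb)
```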
JVM Monitoring – before it’s too late
•  Proactive OOM monitoring
•  Memory not being released
•  Trapped in GC
•  Restart processes
•  Can impact entire cluster
autoCommit for near real time indexing
Tested autoCommit and autoSoftCommit settings of:
•  autoCommit 5 seconds to 5 minutes
•  autoSoftCommit 1 second to 1 minute
Result:
•  autoSoftCommit of 5 seconds and autoCommit of 1 minute balanced memory usage and disk I/O
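The chosen commit intervals map onto the `autoCommit` and `autoSoftCommit` elements of `solrconfig.xml`, which take `maxTime` in milliseconds. The sketch below generates that fragment. It is an illustration, not the presenters' config: `openSearcher=false` is an assumption (the talk does not state it), chosen so that visibility is driven by the soft commit only.

```python
import xml.etree.ElementTree as ET


def commit_settings(hard_commit_ms, soft_commit_ms):
    """Build an <updateHandler> fragment with hard and soft auto-commit intervals."""
    handler = ET.Element("updateHandler", {"class": "solr.DirectUpdateHandler2"})

    # Hard commit: flushes to disk for durability
    auto = ET.SubElement(handler, "autoCommit")
    ET.SubElement(auto, "maxTime").text = str(hard_commit_ms)
    # Assumption: do not open a new searcher on hard commit
    ET.SubElement(auto, "openSearcher").text = "false"

    # Soft commit: makes new documents searchable
    soft = ET.SubElement(handler, "autoSoftCommit")
    ET.SubElement(soft, "maxTime").text = str(soft_commit_ms)
    return handler


handler = commit_settings(hard_commit_ms=60_000, soft_commit_ms=5_000)
if hasattr(ET, "indent"):  # pretty-printing is available on Python 3.9+
    ET.indent(handler)
print(ET.tostring(handler, encoding="unicode"))
```

The 5-second soft commit is what bounds time-to-search, while the 1-minute hard commit bounds how much uncommitted data has to be replayed from the transaction log after a restart.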
DocValues – Reduced OOM Errors
•  Struggled with OOME under heavy load
•  Automated restart for nodes trapped in GC cycle
•  Distributed facet/group/sort queries
Solution:
•  docValues="true" for facet/group/sort fields
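In schema terms, the fix above means setting the `docValues` attribute on each field used for faceting, grouping, or sorting. A hedged sketch that generates such field definitions: the field names and types below are hypothetical examples, not the presenters' schema.

```python
import xml.etree.ElementTree as ET


def field(name, field_type, doc_values):
    """Build a schema <field> element, enabling docValues only where requested."""
    return ET.Element("field", {
        "name": name,
        "type": field_type,
        "indexed": "true",
        "stored": "true",
        "docValues": "true" if doc_values else "false",
    })


# Hypothetical fields: the first two are faceted/grouped/sorted on, the last is not
fields = [
    field("eventType", "string", doc_values=True),
    field("timestamp", "tdate", doc_values=True),
    field("message", "text_general", doc_values=False),
]
for f in fields:
    print(ET.tostring(f, encoding="unicode"))
```

With docValues enabled, the per-field data needed for facet/group/sort is stored in a column-oriented structure on disk at index time instead of being built on the JVM heap at query time, which is why it relieves heap pressure. Existing documents have to be reindexed after the attribute is changed.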
Caching/Cache Warming – Measure and tune
•  filterCache/queryResultCache/documentCache/etc.
•  Very diverse queries (cache hits were too low)
•  Benefits for our workload did not justify the cost
•  Willing to accept slower queries
Configs - Keep it simple
•  Example configs show off advanced features
•  If you are not using the feature, turn it off
•  Start with a trimmed down config
•  Only add features as needed
Performance Comparison
Present performance
•  Sustained indexing of ~50,000 docs per sec
•  Each replica indexes ~1,000 docs per sec
•  New documents are searchable within 5 seconds
•  10,000 distributed facet/group/sort queries per day
•  1 billion new documents are indexed per day
•  13 billion documents are searchable
•  7TB of data across all indexes
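The figures above can be cross-checked against each other and against the "25 shards, 2 replicas per shard" layout from the collections slide. This is one reading that makes the numbers consistent; the talk does not say whether a document is counted once or once per replica, so treat the per-replica interpretation as an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures taken from the slides
sustained_docs_per_sec = 50_000
docs_per_replica_per_sec = 1_000
shards, replicas_per_shard = 25, 2
docs_per_day = 1_000_000_000
searchable_docs = 13_000_000_000

# 25 shards x 2 replicas = 50 replicas in the collection being written
replicas = shards * replicas_per_shard
# 50 replicas x ~1,000 docs/sec each matches the sustained rate
cluster_rate = replicas * docs_per_replica_per_sec
# 1 billion docs/day averaged over 24 hours
average_rate = docs_per_day / SECONDS_PER_DAY
# 13 billion searchable docs at 1 billion/day
days_searchable = searchable_docs / docs_per_day

print("replicas:", replicas)
print("cluster rate (docs/sec):", cluster_rate)
print("average daily rate (docs/sec):", round(average_rate))
print("days of data searchable:", days_searchable)
```

The daily average works out to roughly 11,600 docs per second, so the ~50,000 docs per second figure leaves headroom of about four times the average for peaks.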
Performance Comparison
Step              Performance (2008)  Performance (2015)
Transport         <1 minute           <1 second (NRT)
Index Generation  10 minutes          <5 seconds
Index Merge       10 minutes          N/A
Search            20+ minutes         <5 seconds
•  Faster transport
•  No more batch processing
•  No external index generation
•  NRT indexing with SolrCloud
Environment Comparison
Server Type           Servers (2008)               Servers (2015)
Transport             Physical: 4, Virtual: 15     Physical: 4, Virtual: 20
Storage / processing  Physical: 100+, Virtual: 0   Physical: 0, Virtual: 0
Search                Physical: 12, Virtual: 0     Physical: 10, Virtual: 5
Total                 Physical: 100+, Virtual: 15  Physical: 14, Virtual: 25
•  Flume / Solr handle event storage and processing
•  No more Hadoop footprint
•  Over 80% reduction in servers
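The server counts in the table can be totalled to see which reading of "over 80%" they support. A small sketch: it takes "100+" as exactly 100, so the 2008 figures are lower bounds, and it sums the rows rather than using the table's own "Total" line.

```python
# (physical, virtual) per tier, from the Environment Comparison table
servers_2008 = {"transport": (4, 15), "storage": (100, 0), "search": (12, 0)}
servers_2015 = {"transport": (4, 20), "storage": (0, 0), "search": (10, 5)}


def totals(env):
    """Sum physical and virtual servers across all tiers."""
    physical = sum(p for p, _ in env.values())
    virtual = sum(v for _, v in env.values())
    return physical, virtual


def reduction(before, after):
    """Fractional reduction going from 'before' to 'after'."""
    return 1 - after / before


phys_2008, virt_2008 = totals(servers_2008)
phys_2015, virt_2015 = totals(servers_2015)

physical_reduction = reduction(phys_2008, phys_2015)
overall_reduction = reduction(phys_2008 + virt_2008, phys_2015 + virt_2015)

print(f"physical: {phys_2008} -> {phys_2015} ({physical_reduction:.0%} fewer)")
print(f"all servers: {phys_2008 + virt_2008} -> {phys_2015 + virt_2015} "
      f"({overall_reduction:.0%} fewer)")
```

On these numbers the 80% figure holds for physical servers; counting virtual machines as well, the reduction is closer to 70%.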
Future…
•  Dedicated Solr nodes with SSDs for indexing
•  Shard query collections for improved performance
•  Larger JVM size for query nodes
•  Multiple Datacenter SolrCloud (replication/mirroring)
Thank you

More Related Content

What's hot

Apache Zeppelin & Cluster
Apache Zeppelin & ClusterApache Zeppelin & Cluster
Apache Zeppelin & ClusterJongyoul Lee
 
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...Lucidworks
 
Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - Jam...
Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - Jam...Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - Jam...
Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - Jam...JAXLondon2014
 
Akka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive PlatformAkka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive PlatformLegacy Typesafe (now Lightbend)
 
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lightbend
 
NoSQL - No Security?
NoSQL - No Security?NoSQL - No Security?
NoSQL - No Security?Gavin Holt
 
Apache Kafka® Security Overview
Apache Kafka® Security OverviewApache Kafka® Security Overview
Apache Kafka® Security Overviewconfluent
 
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google Cloud
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google CloudPakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google Cloud
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google CloudLightbend
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionSteve Loughran
 
Spark Streaming @ Scale (Clicktale)
Spark Streaming @ Scale (Clicktale)Spark Streaming @ Scale (Clicktale)
Spark Streaming @ Scale (Clicktale)Yuval Itzchakov
 
Do's and don'ts when deploying akka in production
Do's and don'ts when deploying akka in productionDo's and don'ts when deploying akka in production
Do's and don'ts when deploying akka in productionjglobal
 
Scale and Throughput @ Clicktale with Akka
Scale and Throughput @ Clicktale with AkkaScale and Throughput @ Clicktale with Akka
Scale and Throughput @ Clicktale with AkkaYuval Itzchakov
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In RubySATOSHI TAGOMORI
 
A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...lucenerevolution
 
NoSQL, no SQL injections?
NoSQL, no SQL injections?NoSQL, no SQL injections?
NoSQL, no SQL injections?Wayne Huang
 
SplunkLive London 2014 Developer Presentation
SplunkLive London 2014  Developer PresentationSplunkLive London 2014  Developer Presentation
SplunkLive London 2014 Developer PresentationDamien Dallimore
 
aclpwn - Active Directory ACL exploitation with BloodHound
aclpwn - Active Directory ACL exploitation with BloodHoundaclpwn - Active Directory ACL exploitation with BloodHound
aclpwn - Active Directory ACL exploitation with BloodHoundDirkjanMollema
 

What's hot (20)

How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
 
Apache Zeppelin & Cluster
Apache Zeppelin & ClusterApache Zeppelin & Cluster
Apache Zeppelin & Cluster
 
Solr vs ElasticSearch
Solr vs ElasticSearchSolr vs ElasticSearch
Solr vs ElasticSearch
 
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
 
Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - Jam...
Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - Jam...Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - Jam...
Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - Jam...
 
Akka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive PlatformAkka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive Platform
 
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
 
NoSQL - No Security?
NoSQL - No Security?NoSQL - No Security?
NoSQL - No Security?
 
Apache Kafka® Security Overview
Apache Kafka® Security OverviewApache Kafka® Security Overview
Apache Kafka® Security Overview
 
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google Cloud
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google CloudPakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google Cloud
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google Cloud
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
 
Spark Streaming @ Scale (Clicktale)
Spark Streaming @ Scale (Clicktale)Spark Streaming @ Scale (Clicktale)
Spark Streaming @ Scale (Clicktale)
 
Do's and don'ts when deploying akka in production
Do's and don'ts when deploying akka in productionDo's and don'ts when deploying akka in production
Do's and don'ts when deploying akka in production
 
Scale and Throughput @ Clicktale with Akka
Scale and Throughput @ Clicktale with AkkaScale and Throughput @ Clicktale with Akka
Scale and Throughput @ Clicktale with Akka
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In Ruby
 
A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...
 
NoSQL, no SQL injections?
NoSQL, no SQL injections?NoSQL, no SQL injections?
NoSQL, no SQL injections?
 
SplunkLive London 2014 Developer Presentation
SplunkLive London 2014  Developer PresentationSplunkLive London 2014  Developer Presentation
SplunkLive London 2014 Developer Presentation
 
aclpwn - Active Directory ACL exploitation with BloodHound
aclpwn - Active Directory ACL exploitation with BloodHoundaclpwn - Active Directory ACL exploitation with BloodHound
aclpwn - Active Directory ACL exploitation with BloodHound
 
Spring Integration Splunk
Spring Integration SplunkSpring Integration Splunk
Spring Integration Splunk
 

Viewers also liked

Developing A Big Data Search Engine - Where we have gone. Where we are going:...
Developing A Big Data Search Engine - Where we have gone. Where we are going:...Developing A Big Data Search Engine - Where we have gone. Where we are going:...
Developing A Big Data Search Engine - Where we have gone. Where we are going:...Lucidworks
 
Mail Search As A Sercive: Presented by Rishi Easwaran, Aol
Mail Search As A Sercive: Presented by Rishi Easwaran, AolMail Search As A Sercive: Presented by Rishi Easwaran, Aol
Mail Search As A Sercive: Presented by Rishi Easwaran, AolLucidworks
 
Flume and Hadoop performance insights
Flume and Hadoop performance insightsFlume and Hadoop performance insights
Flume and Hadoop performance insightsOmid Vahdaty
 
Bio solr building a better search for bioinformatics
Bio solr   building a better search for bioinformaticsBio solr   building a better search for bioinformatics
Bio solr building a better search for bioinformaticsCharlie Hull
 
JSONSchema with golang
JSONSchema with golangJSONSchema with golang
JSONSchema with golangSuraj Deshmukh
 
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015Charlie Hull
 
Cross Data Center Replication for the Enterprise: Presented by Adam Williams,...
Cross Data Center Replication for the Enterprise: Presented by Adam Williams,...Cross Data Center Replication for the Enterprise: Presented by Adam Williams,...
Cross Data Center Replication for the Enterprise: Presented by Adam Williams,...Lucidworks
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginsearchbox-com
 

Viewers also liked (10)

What's Your Money Persona?
What's Your Money Persona?What's Your Money Persona?
What's Your Money Persona?
 
Developing A Big Data Search Engine - Where we have gone. Where we are going:...
Developing A Big Data Search Engine - Where we have gone. Where we are going:...Developing A Big Data Search Engine - Where we have gone. Where we are going:...
Developing A Big Data Search Engine - Where we have gone. Where we are going:...
 
Mail Search As A Sercive: Presented by Rishi Easwaran, Aol
Mail Search As A Sercive: Presented by Rishi Easwaran, AolMail Search As A Sercive: Presented by Rishi Easwaran, Aol
Mail Search As A Sercive: Presented by Rishi Easwaran, Aol
 
Flume and Hadoop performance insights
Flume and Hadoop performance insightsFlume and Hadoop performance insights
Flume and Hadoop performance insights
 
Bio solr building a better search for bioinformatics
Bio solr   building a better search for bioinformaticsBio solr   building a better search for bioinformatics
Bio solr building a better search for bioinformatics
 
SolrCloud on Hadoop
SolrCloud on HadoopSolrCloud on Hadoop
SolrCloud on Hadoop
 
JSONSchema with golang
JSONSchema with golangJSONSchema with golang
JSONSchema with golang
 
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
 
Cross Data Center Replication for the Enterprise: Presented by Adam Williams,...
Cross Data Center Replication for the Enterprise: Presented by Adam Williams,...Cross Data Center Replication for the Enterprise: Presented by Adam Williams,...
Cross Data Center Replication for the Enterprise: Presented by Adam Williams,...
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component plugin
 

Similar to Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented by George Bailey & Cameron Baker, Rackspace

Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloudVarun Thacker
 
Webinar: Faster Log Indexing with Fusion
Webinar: Faster Log Indexing with FusionWebinar: Faster Log Indexing with Fusion
Webinar: Faster Log Indexing with FusionLucidworks
 
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...
  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...Lucidworks
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
Openstack meetup lyon_2017-09-28
Openstack meetup lyon_2017-09-28Openstack meetup lyon_2017-09-28
Openstack meetup lyon_2017-09-28Xavier Lucas
 
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestDataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestHakka Labs
 
Scalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at PinterestScalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at PinterestKrishna Gade
 
Using Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.comUsing Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.comDamien Krotkine
 
Scality S3 Server: Node js Meetup Presentation
Scality S3 Server: Node js Meetup PresentationScality S3 Server: Node js Meetup Presentation
Scality S3 Server: Node js Meetup PresentationScality
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudyJohn Adams
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaHelena Edelson
 
Monitoring MySQL at scale
Monitoring MySQL at scaleMonitoring MySQL at scale
Monitoring MySQL at scaleOvais Tariq
 
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitinbloomreacheng
 
Kubernetes Walk Through from Technical View
Kubernetes Walk Through from Technical ViewKubernetes Walk Through from Technical View
Kubernetes Walk Through from Technical ViewLei (Harry) Zhang
 
Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement
Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement
Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement VMware Tanzu
 
The Data Mullet: From all SQL to No SQL back to Some SQL
The Data Mullet: From all SQL to No SQL back to Some SQLThe Data Mullet: From all SQL to No SQL back to Some SQL
The Data Mullet: From all SQL to No SQL back to Some SQLDatadog
 
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly SolarWinds Loggly
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...smallerror
 

Similar to Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented by George Bailey & Cameron Baker, Rackspace (20)

Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloud
 
Webinar: Faster Log Indexing with Fusion
Webinar: Faster Log Indexing with FusionWebinar: Faster Log Indexing with Fusion
Webinar: Faster Log Indexing with Fusion
 
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...
  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Openstack meetup lyon_2017-09-28
Openstack meetup lyon_2017-09-28Openstack meetup lyon_2017-09-28
Openstack meetup lyon_2017-09-28
 
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestDataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
 
Scalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at PinterestScalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at Pinterest
 
Using Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.comUsing Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.com
 
Scality S3 Server: Node js Meetup Presentation
Scality S3 Server: Node js Meetup PresentationScality S3 Server: Node js Meetup Presentation
Scality S3 Server: Node js Meetup Presentation
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
 
Monitoring MySQL at scale
Monitoring MySQL at scaleMonitoring MySQL at scale
Monitoring MySQL at scale
 
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
 
Kubernetes Walk Through from Technical View
Kubernetes Walk Through from Technical ViewKubernetes Walk Through from Technical View
Kubernetes Walk Through from Technical View
 
Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement
Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement
Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement
 
The Data Mullet: From all SQL to No SQL back to Some SQL
The Data Mullet: From all SQL to No SQL back to Some SQLThe Data Mullet: From all SQL to No SQL back to Some SQL
The Data Mullet: From all SQL to No SQL back to Some SQL
 
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
AWS re:Invent presentation: Unmeltable Infrastructure at Scale by Loggly
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 

More from Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategyLucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceLucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsLucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesLucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteLucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentLucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented by George Bailey & Cameron Baker, Rackspace

  • 1. October 13-16, 2016 • Austin, TX
  • 2. Rackspace Email’s solution for indexing 50k documents per second
      George Bailey – Software Developer, Rackspace (george.bailey@rackspace.com)
      Cameron Baker – Linux Engineer, Rackspace (cameron.baker@rackspace.com)
  • 3. Who we are…
      • “Rackers” dedicated to Fanatical Support!
      • Based out of San Antonio, TX
      • Part of Cloud Office
      • Email infrastructure development and engineering
  • 4. 2008: Original Problem
      • 1 million mailboxes
      • Support needs to track message delivery
      • We need event aggregation + search
      • Needed to provide Fanatical Support!
      http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
  • 5. Original System Design
      • Scribed: log aggregation, deposit into HDFS
      • Hadoop 0.20: index via MapReduce
      • Solr 1.4: search
      • Custom tools: index loader, MapReduce, scheduler
  • 7. Past performance
      Step                          Time
      Transport                     < 1 minute
      Index Generation (MapReduce)  10 minutes (cron)
      Index Merge                   10 minutes (cron)
      Searchable Events             20+ minutes
  • 8. 7 years later…
      • 4+ million mailboxes
      • Still running Solr 1.4, Hadoop 0.20, Scribed
      • Scaling, maintenance issues
      • Grew to 100+ physical servers, 15 VMs
      • Events need to be used in other contexts
      • 20+ minute time-to-search no longer acceptable
  • 10. Goals
      • Improve customer experience – Fanatical Support!
      • Provide search results faster
      • Reduce technologies
      • Reduce the amount of custom code
      • Reduce the number of physical servers
  • 11. New System - Components
      • Apache Flume: aggregation + processing
      • Solr 1.4 to 4.x/5.x: NRT indexing, distributed search
      • SolrCloud allowed us to reduce custom code by 75%
  • 14. Flume: backpressure + hop availability
      • Sinks may be unreachable or slow
      • File Channel = durable buffering
      • capacity: disk / event size
      • transactionCapacity: match source / sink
      • minimumRequiredSpace
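The "capacity: disk / event size" rule of thumb on this slide can be sketched as a small calculation. This is only an illustration: the function name, the 100 GiB disk and the 1 KiB average event size are hypothetical, and subtracting `minimumRequiredSpace` before dividing is an assumption about how the headroom might be accounted for, not something the slides state. The default of 524288000 bytes is assumed to match Flume's documented File Channel default.

```python
def file_channel_capacity(disk_bytes: int,
                          avg_event_bytes: int,
                          minimum_required_space_bytes: int = 524_288_000) -> int:
    """Estimate how many events a File Channel could buffer on one disk.

    Assumption: the space reserved by minimumRequiredSpace is not usable
    for buffering, so it is subtracted before dividing by the event size.
    """
    if avg_event_bytes <= 0:
        raise ValueError("avg_event_bytes must be positive")
    usable = disk_bytes - minimum_required_space_bytes
    if usable <= 0:
        raise ValueError("disk is smaller than the reserved space")
    return usable // avg_event_bytes


# Hypothetical sizing: 100 GiB data disk, 1 KiB average event.
capacity = file_channel_capacity(100 * 1024**3, 1024)
print(capacity)
```

The result is an upper bound for the channel's `capacity` setting; real events also carry headers and per-record overhead on disk, so a lower value is likely safer.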
  • 15. Flume: batching and throughput
      • Batch size is important
      • File channels = slow
      • Memory channels = fast
      • “Loopback” flows
  • 16. Flume: controlling the flows
      • One event, multiple uses
      • Channel selectors
      • Optional channels
      • Interceptors
      agent.sources.avroSource.selector.type = multiplexing
      agent.sources.avroSource.selector.header = eventType
      agent.sources.avroSource.selector.default = defaultChannel
      agent.sources.avroSource.selector.mapping.authEvent = authEventChannel
      agent.sources.avroSource.selector.mapping.mailEvent = mailEventChannel
      agent.sources.avroSource.selector.optional.authEvent = optionalChannel
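To make the routing in this configuration easier to follow, here is a simplified Python model of how a multiplexing channel selector picks channels for one event. It is not Flume code: the function name and the dictionaries are hypothetical, and the rules (the header value selects the required channels, the default channels apply when no mapping matches, optional channels are tried in addition) follow the behaviour described in the Flume user guide as I understand it.

```python
def select_channels(headers, header_name, mapping, optional, default):
    """Return (required, optional) channel names for one event.

    Simplified model: required channels come from the mapping for the
    header value, falling back to the default channels; optional channels
    are those listed for the same header value, minus any required ones.
    """
    value = headers.get(header_name)
    required = list(mapping.get(value, [])) or list(default)
    extra = [c for c in optional.get(value, []) if c not in required]
    return required, extra


# Channel names taken from the slide's configuration.
mapping = {"authEvent": ["authEventChannel"], "mailEvent": ["mailEventChannel"]}
optional = {"authEvent": ["optionalChannel"]}
default = ["defaultChannel"]

print(select_channels({"eventType": "authEvent"}, "eventType", mapping, optional, default))
print(select_channels({"eventType": "other"}, "eventType", mapping, optional, default))
```

In this model a failed write to a required channel would fail the whole transaction, while a failed write to an optional channel would be ignored; that is what lets one event feed several uses without the extra consumers blocking the main flow.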
  • 17. Flume: Morphlines + Solr
      • Works with SolrCloud
      • Many helpful built-in commands
      • Scripting support for Java
      • Route to multiple collections
      • Validate, modify events in-flight
      http://kitesdk.org/docs/current/morphlines/morphlines-reference-guide.html
  • 18. Requirements for Solr
      • Near real time indexing of 30,000+ docs per sec
      • Few queries (< 10,000 per day)
      • Heavy distributed facet/group/sort queries
      • Support removing documents older than X days
      • Minimize JVM GC impact on indexing performance
  • 19. Basic Solr install
      • Servers A, B, C and D, each running one Solr replica
      • One collection: Shard 1, Shard 2
      • ~2,500 docs per second
      • Goal: 30,000 (30,000 / 2,500 = 12)
      • 12 * # of servers = 48 total servers
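The sizing arithmetic on this slide can be written out as a short calculation. The function and parameter names are hypothetical; the inputs (4 servers per basic install, ~2,500 docs per second per install, a goal of 30,000) are the slide's own figures, and linear scaling is the assumption the slide itself makes.

```python
import math


def servers_needed(target_docs_per_sec: int,
                   docs_per_sec_per_install: int,
                   servers_per_install: int) -> int:
    """Scale a basic install linearly until it meets the target rate."""
    installs = math.ceil(target_docs_per_sec / docs_per_sec_per_install)
    return installs * servers_per_install


# 30,000 / 2,500 = 12 installs, each with 4 servers.
print(servers_needed(30_000, 2_500, 4))  # 48
```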
  • 20. Consult the experts…
      • Days of talking/100s of emails with Rishi Easwaran
      • Recommendations from Shalin Mangar
      • solr-user@lucene.apache.org
      Result:
      • Fewer physical servers
      • Faster indexing
  • 21. Collections – Optimized for additions/deletions
      collection-2015-10-11, collection-2015-10-12, collection-2015-10-13, collection-2015-10-14, collection-2015-10-15, collection-2015-10-16
      • Rolling collections by date
      • ~1 billion documents removed
      • Aliases for updates/queries
      • 25 shards - 2 replicas per shard
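A daily rollover along these lines could be sketched as follows. The code only builds Solr Collections API URLs (`CREATE`, `CREATEALIAS`, `DELETE`) and sends nothing. The base URL, the alias names `events-update` and `events-query`, the six-day retention window and the function names are all hypothetical; the slides do not show the authors' actual tooling. `numShards=25` and `replicationFactor=2` are one reading of "25 shards - 2 replicas per shard".

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def collection_name(day: date, prefix: str = "collection") -> str:
    """Name a collection after the day it holds, e.g. collection-2015-10-16."""
    return f"{prefix}-{day.isoformat()}"


def rollover_plan(today: date, retention_days: int,
                  base: str = "http://localhost:8983/solr") -> list:
    """Build the Collections API calls for one daily rollover (hypothetical)."""
    newest = collection_name(today)
    kept = [collection_name(today - timedelta(days=i)) for i in range(retention_days)]
    expired = collection_name(today - timedelta(days=retention_days))
    api = base + "/admin/collections?"
    return [
        # Create today's collection.
        api + urlencode({"action": "CREATE", "name": newest,
                         "numShards": 25, "replicationFactor": 2}),
        # Point the update alias at the newest collection only.
        api + urlencode({"action": "CREATEALIAS", "name": "events-update",
                         "collections": newest}),
        # Point the query alias at every collection still retained.
        api + urlencode({"action": "CREATEALIAS", "name": "events-query",
                         "collections": ",".join(kept)}),
        # Drop the collection that just aged out.
        api + urlencode({"action": "DELETE", "name": expired}),
    ]


for url in rollover_plan(date(2015, 10, 16), retention_days=6):
    print(url)
```

Dropping a whole collection removes a day of documents in one operation, which is presumably why rolling by date suits the "remove documents older than X days" requirement better than delete-by-query on a single large index.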
  • 22. JVM – Lean and mean
      • 4GB max/min JVM heap size
      • 5 Solr JVM processes per server
      • Using Concurrent Mark Sweep GC
      • GC only on very heavy queries
      • GC < 10ms; occurs < 10 times a day
      • No impact on index performance
      • Reads 28 indexes; writes 2 indexes
      [Diagram: Server A with one Solr process; Server A with five Solr processes]
  • 23. JVM Monitoring – before it’s too late
      • Proactive OOM monitoring
      • Memory not being released
      • Trapped in GC
      • Restart processes
      • Can impact entire cluster
  • 24. autoCommit for near real time indexing
      Tested autoCommit and autoSoftCommit settings of:
      • autoCommit 5 seconds to 5 minutes
      • autoSoftCommit 1 second to 1 minute
      Result:
      • autoSoftCommit of 5 seconds and autoCommit of 1 minute balanced out memory usage and disk IO
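The chosen values translate to `maxTime` settings in milliseconds. The sketch below builds a JSON payload in the shape accepted by Solr's Config API `set-property` command, which is available from Solr 5 onward. This is only one possible way to apply the values: the slides do not say how the authors configured them, and these settings are commonly placed directly in solrconfig.xml instead. The function name is hypothetical and nothing is sent to a server.

```python
import json


def commit_settings_payload(soft_commit_seconds: int, hard_commit_seconds: int) -> str:
    """Build a Config API 'set-property' body; maxTime values are in milliseconds."""
    body = {
        "set-property": {
            "updateHandler.autoSoftCommit.maxTime": soft_commit_seconds * 1000,
            "updateHandler.autoCommit.maxTime": hard_commit_seconds * 1000,
        }
    }
    return json.dumps(body, indent=2, sort_keys=True)


# Values from the slide: soft commit every 5 seconds, hard commit every minute.
print(commit_settings_payload(soft_commit_seconds=5, hard_commit_seconds=60))
```

The soft commit interval controls how quickly new documents become searchable, while the hard commit interval controls how often the index is flushed to disk, which matches the memory versus disk IO trade-off the slide describes.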
  • 25. DocValues – Reduced OOM Errors
      • Struggled with OOME under heavy load
      • Automated restart for nodes trapped in GC cycle
      • Distributed facet/group/sort queries
      Solution:
      • docValues="true" – for facet/group/sort fields
  • 26. Caching/Cache Warming – Measure and tune
      • filterCache/queryResultCache/documentCache/etc.
      • Very diverse queries (cache hits were too low)
      • Benefits for our workload did not justify the cost
      • Willing to accept slower queries
  • 27. Configs - Keep it simple
      • Example configs show off advanced features
      • If you are not using the feature, turn it off
      • Start with a trimmed down config
      • Only add features as needed
  • 29. Present performance
      • Sustained indexing of ~50,000 docs per sec
      • Each replica indexes ~1,000 docs per sec
      • New documents are searchable within 5 seconds
      • 10,000 distributed facet/group/sort queries per day
      • 1 billion new documents are indexed per day
      • 13 billion documents are searchable
      • 7TB of data across all indexes
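A few derived figures help put these numbers in context. This is back-of-envelope arithmetic on the slide's rounded values, assuming 1 TB = 10^12 bytes and an even spread of documents across days; the variable names are hypothetical and the results are estimates, not measurements reported by the authors.

```python
SECONDS_PER_DAY = 24 * 60 * 60

docs_per_day = 1_000_000_000        # "1 billion new documents are indexed per day"
searchable_docs = 13_000_000_000    # "13 billion documents are searchable"
index_bytes = 7 * 10**12            # "7TB of data", assuming decimal terabytes

# Average ingest rate implied by the daily total.
avg_docs_per_sec = docs_per_day / SECONDS_PER_DAY

# Days of history implied by the searchable total.
retention_days = searchable_docs / docs_per_day

# Average index footprint per document.
bytes_per_doc = index_bytes // searchable_docs

print(round(avg_docs_per_sec), retention_days, bytes_per_doc)
```

The implied daily average of roughly 11,600 docs per second is lower than the ~50,000 docs per second sustained rate quoted on the slide; the slides do not say how the two figures relate.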
  • 30. Performance Comparison
      Step              Performance (2008)   Performance (2015)
      Transport         <1 minute            <1 second (NRT)
      Index Generation  10 minutes           <5 seconds
      Index Merge       10 minutes           N/A
      Search            20+ minutes          <5 seconds
      • Faster transport
      • No more batch processing
      • No external index generation
      • NRT indexing with SolrCloud
  • 31. Environment Comparison
      Server Type           Servers (2008)               Servers (2015)
      Transport             Physical: 4, Virtual: 15     Physical: 4, Virtual: 20
      Storage / processing  Physical: 100+, Virtual: 0   Physical: 0, Virtual: 0
      Search                Physical: 12, Virtual: 0     Physical: 10, Virtual: 5
      Total                 Physical: 100+, Virtual: 15  Physical: 14, Virtual: 25
      • Flume / Solr handle event storage and processing
      • No more Hadoop footprint
      • Over 80% reduction in servers
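The reduction figure can be checked against the totals row. The sketch below treats "100+" as exactly 100, which makes the physical-server result a lower bound; the function name is hypothetical.

```python
def reduction(before: int, after: int) -> float:
    """Fractional reduction going from `before` servers to `after` servers."""
    return 1 - after / before


# Totals row, with "100+" taken as 100.
physical = reduction(before=100, after=14)
everything = reduction(before=100 + 15, after=14 + 25)

print(f"physical servers: {physical:.0%}")
print(f"physical + virtual: {everything:.0%}")
```

On these numbers the physical-server count drops by at least 86%, which is consistent with the "over 80%" claim, while the combined physical and virtual count drops by roughly two thirds. The slide's figure therefore appears to refer to physical servers.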
  • 32. Future…
      • Dedicated Solr nodes with SSDs for indexing
      • Shard query collections for improved performance
      • Larger JVM size for query nodes
      • Multiple Datacenter SolrCloud (replication/mirroring)
  • 33. Thank you
      Rackspace Email’s solution for indexing 50k documents per second
      George Bailey – Software Developer, Rackspace (george.bailey@rackspace.com)
      Cameron Baker – Linux Engineer, Rackspace (cameron.baker@rackspace.com)