Building Real-Time Search at Mailchimp
Kevin Xu
Software Engineer
2018
JAN ‘17 OCT ‘17
MC Technical Overview
US1 - US19
SEARCH APP
(downstream)
search queries
indexing requests
(every 5 min)
SQLite Search
SEARCH APP
Problem: Querying
search queries
SEARCH APP
Problem: Indexing
indexing requests
(every ???)
SEARCH APP
search queries
indexing requests
(every ???)
SQLite Search
Considering a Replacement
- Use cases: full-text search, logging/log analysis, events and metrics
- Fast queries
- Scale horizontally
Elasticsearch Docs
Querying ES
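As a rough illustration of what a query against Elasticsearch can look like, the sketch below builds a request body in the Elasticsearch Query DSL: a full-text `match` clause combined with a non-scoring `term` filter. The field names `name` and `user_id` and the helper `build_search_body` are assumptions made for this example, not Mailchimp's actual schema or code.

```python
import json


def build_search_body(user_id: int, text: str, size: int = 10) -> dict:
    """Build a bool query: full-text match, restricted to one account."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text match against a hypothetical "name" field.
                "must": [{"match": {"name": text}}],
                # Non-scoring filter on a hypothetical "user_id" field.
                "filter": [{"term": {"user_id": user_id}}],
            }
        },
    }


body = build_search_body(user_id=42, text="ben")
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a `_search` request. Putting the account restriction in `filter` rather than `must` keeps it out of relevance scoring.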
Capturing Events from MC
US1
US19
Possible Solution: Direct Indexing?
Adding a Message Queue
Real-Time Streaming
Proprietary & Confidential 2018
Kafka: Topics
Kafka: Partitions
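A minimal sketch of how keyed messages map to partitions, assuming a simple hash-modulo scheme. Kafka's default partitioner hashes the key with murmur2; CRC32 is used here only as a stand-in, and the keys and the partition count of 4 are made up for illustration.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical change events as (key, action) pairs.
events = [("user:1", "create"), ("user:2", "create"), ("user:1", "update")]

partitions = {}
for key, action in events:
    partitions.setdefault(partition_for(key, 4), []).append((key, action))

for partition_id, messages in sorted(partitions.items()):
    print(partition_id, messages)
```

Messages with the same key always land in the same partition, so their relative order is preserved there. Order across different partitions is not guaranteed.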
Connecting the Dots
Tracking
Events
All Changes
Searchable Changes
App-Layer Filtering
config.searchable_model: "Contact";
config.searchable_model: "Campaign";
...
$contact = new Contact("Ben");
$contact->save(); // triggers onSave()
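The filtering idea in the fragment above can be sketched as follows, in Python rather than PHP for brevity. The class names, the `on_save` hook, the event shape, and the non-searchable `Session` model are all hypothetical; only the two searchable model names come from the slide.

```python
# Mirrors the config.searchable_model entries on the slide.
SEARCHABLE_MODELS = {"Contact", "Campaign"}


class Model:
    def __init__(self, name, sink):
        self.name = name
        self._sink = sink  # list standing in for the changelog

    def save(self):
        # Persistence is omitted; only the hook matters for this sketch.
        self.on_save()

    def on_save(self):
        model = type(self).__name__
        # App-layer filter: only searchable models emit a change event.
        if model in SEARCHABLE_MODELS:
            self._sink.append({"model": model, "name": self.name, "action": "save"})


class Contact(Model):
    pass


class Session(Model):  # hypothetical model that is not searchable
    pass


events = []
Contact("Ben", events).save()
Session("abc123", events).save()
print(events)
```

Filtering at the point of the save keeps non-searchable changes out of the pipeline entirely, so downstream consumers never have to discard them.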
Write to File, Ship to Kafka
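One way to picture this step: the application appends each change event to a local file as a JSON line, and a separate shipper process reads new lines from its last offset and forwards them. The sketch below covers only the file side. The function names, file name, and event fields are assumptions, and the returned batches stand in for what a shipper would hand to a Kafka producer.

```python
import json
import os
import tempfile


def append_event(path, event):
    """Append one change event to the log as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event, sort_keys=True) + "\n")


def read_events(path, offset=0):
    """Return (events, new_offset) for complete lines written after offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only up to the last newline so a half-written line is left alone.
    end = data.rfind(b"\n") + 1
    events = [json.loads(line) for line in data[:end].splitlines() if line]
    return events, offset + end


log_path = os.path.join(tempfile.mkdtemp(), "changelog.jsonl")

append_event(log_path, {"model": "Contact", "id": 1})
batch1, position = read_events(log_path)

append_event(log_path, {"model": "Campaign", "id": 7})
batch2, position = read_events(log_path, position)

print(batch1, batch2)
```

Writing to a local file first means a save in the application does not have to wait on, or fail because of, the message queue.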
Indexing to ES
PHP Consumers
?
Generating Documents
No Ordering Guarantees!
No Order, No Problem
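The slides do not spell out how ordering is made irrelevant. One common approach, assumed here, is to treat each event only as a notification that a record changed, and have the consumer regenerate the document from the current state of the source database. The dictionaries below are stand-ins for the database and the Elasticsearch index, and the `seq` field exists only to show the events arriving out of order.

```python
# Stand-ins: `database` is the source of truth, `index` plays Elasticsearch.
database = {1: {"id": 1, "name": "Ben"}}
index = {}


def handle_event(event):
    """Rebuild the document from current state; the event payload is ignored."""
    row = database.get(event["id"])
    if row is None:
        # The record no longer exists, so drop it from the index.
        index.pop(event["id"], None)
    else:
        index[event["id"]] = dict(row)


# The record is updated before either event is consumed.
database[1]["name"] = "Benjamin"

# Events arrive out of order (seq 2 before seq 1).
for event in [{"id": 1, "seq": 2}, {"id": 1, "seq": 1}]:
    handle_event(event)

print(index)
```

Because every event leads to the same "read current state, write document" step, processing events in any order, or more than once, converges on the same indexed document.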
PHP Consumers
Chart: number of queries taking > 1s, 9/26/17 to 10/10/17 (release marked)
Chart: number of queries taking > 2s, 9/26/17 to 10/10/17 (release marked)
250ms
Median (p50) Query Response Time
400ms
p95 Query Response Time
19
Number of Elasticsearch clusters
373 billion
Total docs across all ES clusters
93,000
Total Changelog events per second (peak)
3.7 minutes
Average Time to Index
0
Support tickets post-launch
What Now?
- Explore other applications of this infrastructure
- Ongoing technical challenges
  - Data Drift, Consumer Lag
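Both challenges can be expressed as simple measurements. The sketch below assumes that consumer lag means, per partition, the newest offset in the log minus the consumer's committed offset, and that data drift means records present in the source but missing from the index, or the reverse. The function names and sample numbers are illustrative only.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: newest offset in the log minus committed offset."""
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


def data_drift(source_ids, indexed_ids):
    """IDs missing from the index, and IDs indexed but gone from the source."""
    source, indexed = set(source_ids), set(indexed_ids)
    return {
        "missing": sorted(source - indexed),
        "stale": sorted(indexed - source),
    }


lag = consumer_lag({0: 1500, 1: 900}, {0: 1200, 1: 900})
drift = data_drift([1, 2, 3, 4], [2, 3, 4, 5])

print(lag, sum(lag.values()))
print(drift)
```

Tracked over time, a growing total lag suggests consumers are falling behind the event stream, while non-empty drift lists point to events that were lost or never applied.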
Thanks!
kevin.xu@mailchimp.com
