SF ElasticSearch Meetup 2012.10.03

Scaling ElasticSearch

SF Meetup
2012.10.03

Sushant Shankar
sushant.shankar@33across.com

Agenda
• Why we need a search engine
• Monitoring
• Index Building
• Query Performance

Who is 33Across
>600,000 Publishers
Machine Learning and Graph algorithms to:
- Build advertising segments
- Extract insights out of social and interest data
- Target via high-performance distributed systems that integrate with our advertising partners


Why we really need a search engine
Batch! Good for complicated tasks
(Machine Learning, Graph Algorithms, etc.)

… …

INDEX BUILDING
1 WEEK → 3 HOURS

Mappers to build index

6 nodes, 24GB RAM
16GB for ES service
4 cores
3x 1.5TB drive

Build index using MR job and Bulk API
>1TB/index (replicated)
~300M documents
~5KB / document
~3 hours
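
The figures above imply a rough indexing throughput. The sketch below is only back-of-the-envelope arithmetic on the slide's numbers; it assumes decimal units (1KB = 1,000 bytes) and an even spread of work across the 6 nodes, neither of which the slides state.

```python
# Back-of-the-envelope check of the slide's figures.
# Assumptions: decimal units, work spread evenly across nodes.
DOCS = 300_000_000          # ~300M documents
DOC_BYTES = 5_000           # ~5KB per document
BUILD_SECONDS = 3 * 3600    # ~3 hours
NODES = 6                   # cluster size from the hardware slide

raw_bytes = DOCS * DOC_BYTES                  # source data before replication
docs_per_second = DOCS / BUILD_SECONDS        # cluster-wide indexing rate
bytes_per_second = raw_bytes / BUILD_SECONDS  # cluster-wide ingest rate

print(f"raw input: {raw_bytes / 1e12:.1f} TB")
print(f"cluster: {docs_per_second:,.0f} docs/s, {bytes_per_second / 1e6:.0f} MB/s")
print(f"per node: {docs_per_second / NODES:,.0f} docs/s")
```

The raw input comes out at about 1.5 TB, in line with the >1TB/index figure, and the cluster has to sustain on the order of tens of thousands of documents per second to finish in 3 hours.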

Parameter Optimization
[Figure: amount bulk indexed and # shards, against time taken, CPU util., mem util., disk I/O, network]

Index Building: Learnings
• Bulk API
• No replicas
• 2 shards / CPU
• 10,000 documents (users) per indexing request
• Refresh off (index.refresh_interval = -1)
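
The learnings above can be sketched as request payloads. This is a minimal illustration, not the code used in the talk: the index name "users", the document fields, and the helper names batches and bulk_body are hypothetical. It only builds the settings body and the newline-delimited bulk bodies and leaves the HTTP calls (PUT <index>/_settings, POST /_bulk) to the reader. Elasticsearch versions from 2012 also expected a _type in each action line, which newer versions no longer accept.

```python
import json
from itertools import islice

BATCH_SIZE = 10_000  # documents per bulk request, per the slide

# Build-phase settings: no replicas, refresh disabled.
BUILD_SETTINGS = {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}


def batches(docs, size=BATCH_SIZE):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_body(index, docs):
    """Serialize docs as newline-delimited JSON for the _bulk endpoint.

    Each document contributes an action line and a source line,
    and the body ends with a trailing newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Hypothetical input: 25,000 small user documents.
users = ({"id": i, "interests": ["example"]} for i in range(25_000))
bodies = [bulk_body("users", chunk) for chunk in batches(users)]
print(len(bodies), "bulk requests")
```

In a MapReduce build, each mapper would run a loop like this over its own input split and send every body as one bulk request.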

QUERY PERFORMANCE
5 MINUTES → 10 SECONDS

Query Performance: Learnings
• 1-2 replicas (also for reliability)
• Turn refresh on again (5s; the Elasticsearch default is 1s)
• Warm up effect (Index Warm up API 0.20+)
• Optimize API
• Simulate multiple users
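
One way to act on the last two points is a small load harness that runs untimed warm-up passes and then replays queries from several threads. The sketch below is an assumption-laden illustration: simulate_users, timed, and fake_search are hypothetical names, and fake_search stands in for a real search request against the cluster. QUERY_SETTINGS shows the settings body that would restore replicas and refresh after the build, using values from the slides.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Query-phase settings (values taken from the slides).
QUERY_SETTINGS = {"index": {"number_of_replicas": 1, "refresh_interval": "5s"}}


def timed(run_query, query):
    """Return the wall-clock seconds one query takes."""
    start = time.perf_counter()
    run_query(query)
    return time.perf_counter() - start


def simulate_users(run_query, queries, users=8, warmup_rounds=1):
    """Replay `queries` once per simulated user, `users` at a time.

    The warm-up passes are executed but not timed, so caches are
    populated before any latency is recorded.
    """
    for _ in range(warmup_rounds):
        for q in queries:
            run_query(q)
    with ThreadPoolExecutor(max_workers=users) as pool:
        latencies = list(pool.map(lambda q: timed(run_query, q), queries * users))
    return sorted(latencies)


def fake_search(query):
    """Stand-in for a real search call; replace with an HTTP request."""
    time.sleep(0.001)


latencies = simulate_users(fake_search, ["q1", "q2", "q3"], users=4)
median_ms = latencies[len(latencies) // 2] * 1000
print(f"{len(latencies)} queries, median {median_ms:.1f} ms")
```

Comparing the latencies with warmup_rounds=0 and warmup_rounds=1 against a real cluster is a simple way to see the warm-up effect mentioned above.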

Warm Up: load into memory and cache

Other cool features
• Custom Scoring functions
• Scripts – MVEL, Python
• Facets

• Exploring:
- Real-time indexing
- Indexing images, files, etc.
- Parent-child relationships
