SF ElasticSearch Meetup 2012.10.03

933 views

Some thoughts on scaling ElasticSearch, especially related to index building and optimizing for query performance.

Published in: Technology

  1. Scaling ElasticSearch. SF Meetup, 2012.10.03. Sushant Shankar, sushant.shankar@33across.com
  2. Agenda
     • Why we need a search engine
     • Monitoring
     • Index Building
     • Query Performance
  3. Who is asdfas
     • >600,000 Publishers
     • Machine Learning and Graph algorithms to:
       - Build advertising segments
       - Extract insights out of social and interest data
       - Target via high-performance distributed systems that integrate with our advertising partners
  4. Why we really need a search engine
     • Batch! Good for complicated tasks (Machine Learning, Graph Algorithms, etc.) … …
  5. INDEX BUILDING: 1 WEEK → 3 HOURS
  6. Mappers to build index
     • 6 nodes, 24GB RAM
     • 16GB for ES service
     • 4 cores
     • 3x 1.5TB drive
     • >1TB/index
     • Build index (replicated) using MR job and Bulk API
     • ~300M documents
     • ~5KB / document
     • ~3 hours
  7. Monitoring: Zabbix
  8. Monitoring: SPM
  9. Parameter Optimization
     • Amount bulk indexed
     • Time taken
     • CPU util.
     • Mem util.
     • Disk I/O
     • Network
     • # Shards
  10. Index Building: Learnings
     • Bulk API
     • No replicas
     • 2 shards / CPU
     • 10,000 documents (users) per indexing request
     • Refresh off (index.refresh_interval = -1)
  11. QUERY PERFORMANCE: 5 MINUTES → 10 SECONDS
  12. Query Performance: Learnings
     • 1-2 Replicas (and for reliability)
     • Turn refresh on again (5s default)
     • Warm up effect (Index Warm up API 0.20+)
     • Optimize API
     • Simulate multiple users
  13. Warm Up: load into memory and cache
  14. Other cool features
     • Custom Scoring functions
     • Scripts – MVEL, Python
     • Facets
     • Exploring:
       - Real-time indexing
       - Indexing images, files, etc.
       - Parent-child relationships
  15. QUERIES?
  16. Index Building over time
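
The index-building and query-performance learnings in the slides can be sketched roughly as follows. This is a minimal illustration, not the speaker's actual job: it only builds the request bodies and leaves the HTTP calls out. The setting names `number_of_replicas` and `refresh_interval` are standard Elasticsearch index settings (applied through the index `_settings` endpoint), and the newline-delimited body is what the `_bulk` endpoint expects. The index name `users`, the `id` field, and the helper names are hypothetical, and choosing exactly one replica for the query phase is an assumption within the slide's "1-2 Replicas".

```python
import json
from itertools import islice

# Build-phase settings from the "Index Building: Learnings" slide:
# no replicas, refresh switched off.
BUILD_SETTINGS = {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}

# Query-phase settings from the "Query Performance: Learnings" slide.
# One replica is an assumed pick from "1-2 Replicas"; "5s" is the
# interval quoted on the slide.
QUERY_SETTINGS = {"index": {"number_of_replicas": 1, "refresh_interval": "5s"}}

BATCH_SIZE = 10_000  # documents per bulk request, as on the slide


def batches(docs, size=BATCH_SIZE):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_payload(batch, index_name):
    """Serialize one batch as a newline-delimited Bulk API body.

    Each document contributes an action line followed by a source line,
    and the body ends with a trailing newline. The "id" field is a
    hypothetical document key. Older Elasticsearch releases, including
    those current at the time of the talk, also expect a "_type".
    """
    lines = []
    for doc in batch:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"
```

In a build job, `BUILD_SETTINGS` would be applied before indexing, each `bulk_payload(...)` posted to `_bulk`, and `QUERY_SETTINGS` applied once the build finishes. At 10,000 documents per request, the ~300M documents from the slides work out to roughly 30,000 bulk requests.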
