SF ElasticSearch Meetup 2012.10.03

Some thoughts on scaling ElasticSearch, especially related to index building and optimizing for query performance.

  • Collect information on over 1B users internationally: text copied from over 600K publisher sites, images, searches, and pages visited. Different slices of data, now!

Presentation Transcript

  • Scaling ElasticSearch SF Meetup 2012.10.03 Sushant Shankar sushant.shankar@33across.com
  • Agenda
      • Why we need a search engine
      • Monitoring
      • Index Building
      • Query Performance
  • Who is 33Across
      • >600,000 publishers
      • Machine learning and graph algorithms to:
          - Build advertising segments
          - Extract insights out of social and interest data
          - Target via high-performance distributed systems that integrate with our advertising partners
      • Website | Facebook | Twitter
  • Why we really need a search engine
      • Batch! Good for complicated tasks (Machine Learning, Graph Algorithms, etc.) … …
  • INDEX BUILDING 1 WEEK → 3 HOURS
  • Mappers to build index
      • 6 nodes, 24GB RAM (16GB for ES service), 4 cores, 3x 1.5TB drive
      • >1TB/index
      • Build index (replicated) using MR job and Bulk API
      • ~300M documents, ~5KB / document
      • ~3 hours
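
As a rough sanity check on the slide's figures, the arithmetic below derives the raw input size and the indexing throughput the cluster had to sustain. It is a back-of-the-envelope sketch: the inputs are the approximate numbers quoted above, 1KB is taken as 1,000 bytes, and the variable names are my own.

```python
# Figures quoted on the slide (all approximate).
DOCS = 300_000_000        # ~300M documents
DOC_BYTES = 5_000         # ~5KB per document, taking 1KB = 1,000 bytes
BUILD_SECONDS = 3 * 3600  # ~3 hours
NODES = 6

raw_bytes = DOCS * DOC_BYTES                  # total size of the source documents
docs_per_second = DOCS / BUILD_SECONDS        # cluster-wide indexing rate
docs_per_second_per_node = docs_per_second / NODES
mb_per_second = raw_bytes / BUILD_SECONDS / 1e6

print(f"raw input size:      {raw_bytes / 1e12:.1f} TB")
print(f"cluster throughput:  {docs_per_second:,.0f} docs/s")
print(f"per-node throughput: {docs_per_second_per_node:,.0f} docs/s")
print(f"ingest bandwidth:    {mb_per_second:.0f} MB/s")
```

That works out to about 1.5 TB of raw documents, in line with the ">1TB/index" figure, and close to 28,000 documents per second across the cluster.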
  • Monitoring: Zabbix
  • Monitoring: SPM
  • Parameter Optimization
      • Amount bulk indexed
      • Time taken
      • CPU util.
      • Mem util.
      • Disk I/O
      • Network
      • # Shards
  • Index Building: Learnings
      • Bulk API
      • No replicas
      • 2 shards / CPU
      • 10,000 documents (users) per indexing request
      • Refresh off (index.refresh_interval = -1)
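
The learnings above can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's MR job: it only builds the request payloads (build-time index settings, then newline-delimited Bulk API bodies of 10,000 documents each) and sends nothing. The index name `users`, the type `user`, the document fields, and the helper names `batches` and `bulk_body` are all hypothetical.

```python
import json
from itertools import islice

BATCH_SIZE = 10_000  # documents per bulk request, as on the slide

# Index settings to apply before the build: no replicas, refresh disabled.
BUILD_SETTINGS = {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}


def batches(docs, size=BATCH_SIZE):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_body(batch, index, doc_type=None):
    """Serialize one batch as a Bulk API body: an action line, then the document."""
    lines = []
    for doc_id, source in batch:
        action = {"_index": index, "_id": doc_id}
        if doc_type is not None:  # older Elasticsearch versions expect a type
            action["_type"] = doc_type
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


# Hypothetical user documents, only for illustration.
users = ((str(i), {"user_id": i, "interests": ["example"]}) for i in range(25_000))
bodies = [bulk_body(b, index="users", doc_type="user") for b in batches(users)]
print(len(bodies), "bulk requests")
```

In a real build, `BUILD_SETTINGS` would go to the index's `_settings` endpoint before indexing starts, and each body would be posted to the `_bulk` endpoint, for example from inside each mapper.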
  • QUERY PERFORMANCE 5 MINUTES → 10 SECONDS
  • Query Performance: Learnings
      • 1-2 Replicas (and for reliability)
      • Turn refresh on again (5s default)
      • Warm up effect (Index Warm up API 0.20+)
      • Optimize API
      • Simulate multiple users
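
Once the bulk build finishes, the first, second, and fourth learnings above translate into two REST calls: restore replicas and refresh, then optimize the index. The sketch below only assembles those calls as (method, path, body) tuples. The helper name `post_build_requests` and the index name `users` are hypothetical, the `5s` refresh value is taken from the slide, and `max_num_segments=1` is my own choice, not something the slide specifies. The `_optimize` endpoint matches Elasticsearch versions of that era; later releases renamed it `_forcemerge`.

```python
import json


def post_build_requests(index, replicas=1, refresh_interval="5s", max_num_segments=1):
    """Return the REST calls to issue after a bulk build, in order."""
    settings = {
        "index": {
            "number_of_replicas": replicas,        # 1-2 replicas for query load and reliability
            "refresh_interval": refresh_interval,  # turn periodic refresh back on
        }
    }
    return [
        ("PUT", f"/{index}/_settings", json.dumps(settings)),
        # Merge segments down so queries touch fewer files.
        ("POST", f"/{index}/_optimize?max_num_segments={max_num_segments}", ""),
    ]


for method, path, body in post_build_requests("users"):
    print(method, path, body)
```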
  • Warm Up: load into memory and cache
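
One way to observe the warm-up effect, and to simulate multiple users at the same time, is to replay a set of representative queries from several threads over a few rounds and compare latencies per round. The sketch below is an assumption-laden harness, not the speaker's tooling: `run_query` is a stand-in that sleeps instead of calling a cluster, and the query strings are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(query):
    """Stand-in for a real search call; replace with an HTTP request to the cluster."""
    time.sleep(0.001)  # pretend the cluster took about 1 ms
    return {"query": query, "hits": 0}


def timed(query):
    """Run one query and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    run_query(query)
    return time.perf_counter() - start


def simulate_users(queries, users=8, rounds=3):
    """Replay `queries` from `users` concurrent threads; one latency list per round."""
    results = []
    with ThreadPoolExecutor(max_workers=users) as pool:
        for _ in range(rounds):
            results.append(list(pool.map(timed, queries)))
    return results


queries = [f"interest:{i}" for i in range(20)]  # hypothetical query strings
latencies = simulate_users(queries)
for n, round_latencies in enumerate(latencies, start=1):
    print(f"round {n}: max {max(round_latencies) * 1000:.1f} ms")
```

With the stub, every round looks alike. Against a real cluster, one would expect the first round to be slower than later ones, since the later rounds hit data already loaded into memory and caches.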
  • Other cool features
      • Custom Scoring functions
      • Scripts – MVEL, Python
      • Facets
      • Exploring:
          • Real-time indexing
          • Indexing images, files, etc.
          • Parent-child relationships
  • QUERIES?
  • Index Building over time