Boston elasticsearch meetup October 2012
  • http://www.flickr.com/photos/drachmann/327122302/
  • http://www.flickr.com/photos/4nitsirk/3778043845/
  • Transcript of "Boston elasticsearch meetup October 2012"

    1. Igor Motov • igor@motovs.org • twitter: @imotov • github: imotov
    2. Sonian Inc. • Cloud-based email archiving • Founded in 2007 • Headquarters: Newton, MA
    3. Small team of about 15 developers distributed from Campinas, Brazil to Vancouver, Canada
    4. Using elasticsearch since June 2010, v0.8.0
    5. We have about 6 billion records indexed in elasticsearch
    6. 100,000: Netflix DVD titles
    7. 3,000,000: Pages in en.wikipedia.org
    8. 22,000,000: Books in Library of Congress catalog
    9. 150,000,000: LinkedIn profiles
    10. 3,000,000,000: Estimated bing.com index size
    11. 6,000,000,000: Sonian Inc. index size
    12. 50,000,000,000: Estimated google.com index size
    13. Infrastructure
    14. http://www.sonian.com/awssonian-technical-diagram/
    15. Ingestion (safe): Clojure • Search Engine: elasticsearch • Web App: Ruby on Rails • Deployment: Chef • Monitoring: Sensu
    16. 10 clusters • 6 AWS Regions • 2-17 nodes in each cluster
    17. Custom version of elasticsearch based on 0.19.9 with several plugins
    18. jetty plugin • jetty-based http transport • SSL support • Authentication • Request logging (json, plain)
    19. Request logs are also indexed in elasticsearch
    20. Open source: https://github.com/sonian/elasticsearch-jetty
    21. Zookeeper plugin • Zookeeper-based discovery • Replacement for zen discovery • Experimental!
    22. Open source: https://github.com/sonian/elasticsearch-zookeeper
    23. Valve plugin • Custom jetty plugin filter • Rejects bulk indexing requests if cluster is overloaded
    24. Lessons learned in the last two years, or
    25. Proper Care and Feeding of Elasticsearch Nodes
    26. Rule 1: Give nodes plenty of space. Running out of disk space or memory is the simplest way to corrupt your index.
    27. Make sure elasticsearch doesn’t swap. Swapping reduces performance and causes nodes to leave clusters.
    28. elasticsearch.yml: bootstrap.mlockall: true
    29. Increase the number of open file descriptors to 64k.
    30. Rule 2: Distributed but well connected. All nodes should be able to talk to each other all the time.
    31. Otherwise your cluster might get split-brain syndrome.
    32. Consider setting discovery.zen.minimum_master_nodes
    33. Rule 3: Throttle the bulk indexing load. Asynchronous architecture makes es scalable and fast, but susceptible to running out of memory under excessive bulk indexing load.
    34. Rule 4: Try to make all shards approximately the same size. Elasticsearch allocates shards based on the number of shards. It doesn’t consider shard sizes or available disk space.
    35. 4 rules for happy elasticsearch: (1) Give nodes plenty of space (2) Distributed but well connected (3) Throttle the load (4) Make all shards the same size
    36. Questions?
    37. More Information • Latest stable release: 0.19.10 • Web Site: http://www.elasticsearch.org/ • Follow @elasticsearch on twitter • IRC: #elasticsearch on irc.freenode.net • GitHub: https://github.com/elasticsearch/elasticsearch • Mailing list: elasticsearch on http://groups.google.com/ • Stackoverflow tag: elasticsearch
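
Rule 2 suggests setting discovery.zen.minimum_master_nodes but the slides do not say which value to use. A common convention is a strict majority of the master-eligible nodes, so that only one side of a network partition can elect a master. The sketch below illustrates that arithmetic; the function names and the 5-node example are assumptions made for illustration, not something taken from the talk.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Return a strict majority of the master-eligible nodes (assumed convention)."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def can_elect_master(visible_nodes: int, minimum: int) -> bool:
    """A partition may elect a master only if it sees at least `minimum` nodes."""
    return visible_nodes >= minimum


if __name__ == "__main__":
    quorum = minimum_master_nodes(5)
    print("minimum_master_nodes for 5 nodes:", quorum)
    # Hypothetical network split of a 5-node cluster into 3 + 2 nodes:
    # only the larger side reaches the quorum, so there is a single master.
    for side in (3, 2):
        print(f"partition with {side} nodes can elect:", can_elect_master(side, quorum))
```

With a majority setting, the smaller side of the split refuses to elect its own master, which is the split-brain case from slide 31.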
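
Rule 3 and the Valve plugin slide describe rejecting bulk indexing requests when the cluster is overloaded. The slides do not say how the plugin detects overload, so the following is only a minimal sketch of the general idea, assuming "overloaded" means "too many bulk requests already in flight". `BulkValve`, `handle_bulk`, and the 200/503 status codes are hypothetical names and values chosen for this illustration, not the plugin's actual API.

```python
import threading


class BulkValve:
    """Admits at most `max_in_flight` concurrent bulk requests (hypothetical policy)."""

    def __init__(self, max_in_flight: int) -> None:
        self._max = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_enter(self) -> bool:
        """Reserve a slot; return False instead of blocking when the valve is full."""
        with self._lock:
            if self._in_flight >= self._max:
                return False
            self._in_flight += 1
            return True

    def leave(self) -> None:
        """Release a slot reserved by a successful try_enter()."""
        with self._lock:
            self._in_flight -= 1


def handle_bulk(valve: BulkValve, index_fn, payload) -> int:
    """Run index_fn(payload) if admitted; otherwise reject so the client can retry later."""
    if not valve.try_enter():
        return 503  # rejected: too much bulk work already in progress
    try:
        index_fn(payload)
        return 200
    finally:
        valve.leave()


if __name__ == "__main__":
    valve = BulkValve(max_in_flight=1)
    print(handle_bulk(valve, lambda docs: None, ["doc-1", "doc-2"]))
```

Rejecting early pushes the backlog onto the clients, which can retry, rather than letting queued requests pile up in node memory.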
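
Rule 4 says shards are allocated by count, without regard to shard size or free disk space. The toy model below shows why that matters. It is not Elasticsearch's allocator: it simply places each shard on the node that currently holds the fewest shards, and the shard sizes are made-up numbers.

```python
from typing import List


def allocate_by_count(shard_sizes_gb: List[int], node_count: int) -> List[List[int]]:
    """Place each shard on the node holding the fewest shards, ignoring shard size."""
    nodes: List[List[int]] = [[] for _ in range(node_count)]
    for size in shard_sizes_gb:
        # Ties go to the first node with the minimum shard count.
        target = min(nodes, key=len)
        target.append(size)
    return nodes


def disk_usage_gb(nodes: List[List[int]]) -> List[int]:
    """Total disk used on each node."""
    return [sum(shards) for shards in nodes]


if __name__ == "__main__":
    uneven = allocate_by_count([100, 1, 100, 1], node_count=2)
    even = allocate_by_count([50, 50, 50, 50], node_count=2)
    # Both layouts put two shards on each node, but disk usage differs sharply.
    print("uneven shard sizes:", disk_usage_gb(uneven))  # [200, 2]
    print("even shard sizes:  ", disk_usage_gb(even))    # [100, 100]
```

In both runs every node ends up with two shards, yet with uneven shard sizes one node carries almost all of the data, which ties back to Rule 1 about running out of disk space.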