0
ElasticSearch in     Production                  lessons                  learnedAnne Veling, ApacheCon EU, November 6, 2012
agendaIntroductionElasticSearchUdiniUpcoming ToolLessons Learned
introductionAnne Veling, @annevelingSelf-employed contractor  Software Architect  Agile process management  Performance op...
ElasticSearchApache LuceneStarted in 2010 by Shay BanonOpen Source – Apache LicenseA company was formed in 2012: ElasticSe...
ElasticSearchScalable  Distributed, Node Discovery  Automatic sharding  Query distributionRESTful, HTTP API  With API wrap...
ElasticSearchIntegrated faceting  With statistical aggregates (sum/avg/…) for freeField types and analyzers  String, numer...
udini.proquest.comProQuestThe World‟s Article StoreStack  Amazon EC2  Scala with Unfiltered  MongoDB, ElasticSearch
architecture Fulfillment                          Udini         Summon ProvidersPDF pipeline      mongo       Elastic     ...
SOLR at UdiniConnecting to Summon API  700M SOLR ClusterIn Udini, we serve a subset of 160M full text articles  Including ...
ElasticSearch at UdiniLocal index to search your articlesMany small user libraries, searching only locally  User-id as sha...
Exciting new productDeveloping for ProQuestExciting new research tool for scientific researchersCreating a large ElasticSe...
Lessons Learned   Very fast indexing   Bulk indexing ftw      Set up without replicas (replicas = 0, not 1)      Play with...
Lessons learnedSchema(less)?Automatic field type recognition   Can miss types   Strict about types #duhMapping of subfield...
Lessons learnedLearn to trust ElasticSearch  Analyzers: do not pretokenize queries yourself…Difference between “term” and ...
Lessons learnedIssues with automated testing and node discovery/startupStart/stop hundreds of times during Jenkins test jo...
Lessons learnedNew tools every monthwaitForYellowStatusAliases, routing allow for clever control
APIElasticSearch is new, connection libraries still in infancy,documentation growingIssues using the Java API in ScalaHapp...
#nodbElasticSearch used as a full nosql datastore?Using “version” and optimistic locking schemeCould replace MongoDb in ou...
SOLR vs ElasticSearchSOLR                      ElasticSearch  Well-known, many          New kid on the block  tools, exten...
search evolution     • Custom indexers     • Inverted index     • Segment merges     • Custom analyzers     • Faceting    ...
conclusionsElasticSearch benefits  Easy to setup  Very clever architectureDrawbacks  Very new software, tool support limit...
thank youAre you still using Solr?Come on, it’s 2012 already ;-)                      anne@beyondtrees.com                ...
Upcoming SlideShare
Loading in...5
×

ElasticSearch in Production: lessons learned

50,214

Published on

With Proquest Udini, we have created the worlds largest online article store, and aim to be the center for researchers all over the world. We connect to a 700M solr cluster for search, but have recently also implemented a search component with ElasticSearch. We will discuss how we did this, and how we want to use the 30M index for scientific citation recognition. We will highlight lessons learned in integrating ElasticSearch in our virtualized EC2 environments, and challenges aligning with our continuous deployment processes.

Published in: Technology
1 Comment
63 Likes
Statistics
Notes
  • 1M records do not take 40 seconds. It depends on which version you were operating. Also, you can break your 1 M records in batches as it was already mentioned and make your requests parallel and target different nodes of Elastic Search. This will improve your insertion rate quite a bit.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total Views
50,214
On Slideshare
0
From Embeds
0
Number of Embeds
4
Actions
Shares
0
Downloads
609
Comments
1
Likes
63
Embeds 0
No embeds

No notes for slide

Transcript of "ElasticSearch in Production: lessons learned"

  1. 1. ElasticSearch in Production lessons learnedAnne Veling, ApacheCon EU, November 6, 2012
  2. 2. agendaIntroductionElasticSearchUdiniUpcoming ToolLessons Learned
  3. 3. introductionAnne Veling, @annevelingSelf-employed contractor Software Architect Agile process management Performance optimization Lucene/SOLR/ElasticSearch implementations & training
  4. 4. ElasticSearchApache LuceneStarted in 2010 by Shay BanonOpen Source – Apache LicenseA company was formed in 2012: ElasticSearch Training, support and developmentCareful feature development vs. build because you can
  5. 5. ElasticSearchScalable Distributed, Node Discovery Automatic sharding Query distributionRESTful, HTTP API With API wrappers for Ruby, Java, Scala, … JSON in, JSON outDocument Model Maps book.author.lastName “schemaless” -> field type recognition Keeps source, keeps „version‟ number
  6. 6. ElasticSearchIntegrated faceting With statistical aggregates (sum/avg/…) for freeField types and analyzers String, numerics, geo, attachment, … Arrays, subdocuments, nested documentsIntegrated sharding Routing and alias Cross-index searching / multi-document type
  7. 7. udini.proquest.comProQuestThe World‟s Article StoreStack Amazon EC2 Scala with Unfiltered MongoDB, ElasticSearch
  8. 8. architecture Fulfillment Udini Summon ProvidersPDF pipeline mongo Elastic SOLR DB Search
  9. 9. SOLR at UdiniConnecting to Summon API 700M SOLR ClusterIn Udini, we serve a subset of 160M full text articles Including fulfillment mechanisms PDF and HTML5 viewing and annotation
  10. 10. ElasticSearch at UdiniLocal index to search your articlesMany small user libraries, searching only locally User-id as sharding key Include key in all queries
  11. 11. Exciting new productDeveloping for ProQuestExciting new research tool for scientific researchersCreating a large ElasticSearch index for journal articlecanonicalizationCurrently in private beta, launching in the coming months
  12. 12. Lessons Learned Very fast indexing Bulk indexing ftw Set up without replicas (replicas = 0, not 1) Play with bulk size Simple write to disk and CURL it in, is very fast 1M records in 40sfor f in ${BATCH_DIR}/batch-*.jsondo echo "about to index $f" curl --silent --show-error --request POST --data-binary @$f localhost:9200/_bulk > /dev/null echodone
  13. 13. Lessons learnedSchema(less)?Automatic field type recognition Can miss types Strict about types #duhMapping of subfields (doc.title vs doc.publication.title) Version dependentIn reality Schema still needed Mapping changes still non trivial
  14. 14. Lessons learnedLearn to trust ElasticSearch Analyzers: do not pretokenize queries yourself…Difference between “term” and “text” type queries tokenized or notElasticSearch probably already does what you want it todo Search for it Try it
  15. 15. Lessons learnedIssues with automated testing and node discovery/startupStart/stop hundreds of times during Jenkins test jobs ordevelopment boxes Takes time Locally sometimes picks up previous versionsMemory issues: ElasticSearch manages a large part of itsmemory outside of the heap Do not simply increase -Xmx
  16. 16. Lessons learnedNew tools every monthwaitForYellowStatusAliases, routing allow for clever control
  17. 17. APIElasticSearch is new, connection libraries still in infancy,documentation growingIssues using the Java API in ScalaHappy with Scalastic now synchronous asynchronous bulk prepare https://github.com/bsadeh/scalastic
  18. 18. #nodbElasticSearch used as a full nosql datastore?Using “version” and optimistic locking schemeCould replace MongoDb in our setupElasticSearch is actually a store optimized for getting stuffout, not for getting stuff in With free faceting Who needs multi-table transactions anyway?
  19. 19. SOLR vs ElasticSearchSOLR ElasticSearch Well-known, many New kid on the block tools, extensions Very easy to configure Feels clunky to Handles document to configure lucene mapping Manual document to Horizontally scalable lucene mapping Easy replication Replication and But: shard key indexing in a cluster non-trivial New schoolOld school ;-)
  20. 20. search evolution • Custom indexers • Inverted index • Segment merges • Custom analyzers • Faceting • Configuration of analyzers • Faceting, Geospatial • Document mapping • Sub-document queries • Replication • JSON document input • Faceting, complex queries just work
  21. 21. conclusionsElasticSearch benefits Easy to setup Very clever architectureDrawbacks Very new software, tool support limited But lots of movement Change sharding in a full index non-trivialElasticSearch Clever architecture, fast, stable Does exactly what you need
  22. 22. thank youAre you still using Solr?Come on, it’s 2012 already ;-) anne@beyondtrees.com @anneveling
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×