You're not using ElasticSearch (outdated)


Published on

The slightly verbose slides accompanying my introductory ElasticSearch talk at Utrecht.rb

These slides are outdated, see

Published in: Technology


  1. You’re not using ElasticSearch? Timon Vonk (@timonvonk)
  2. About me
     • CTO @ Tolq
     • Freelance hacker and consultant
     • Used ElasticSearch for Translation Memory, text author recommendation and spam filtering
  3. What is it?
     • Full-text search engine based on Apache Lucene
     • Built with the cloud in mind, and built for speed
     • Easy-to-use JSON API
     • A *lot* of room for custom solutions
  4. Why use it (vs Solr)?
     • Cloud-based setup out of the box, and setting it up is super easy. That means automagic replication, sharding and, as a bonus, map-reduce.
     • Indexing happens in a few seconds (vs minutes in Solr)
     • Again, an easy JSON API
     • NoSQL: mappings can be generated on the fly
     • Highly scriptable and customisable, for fancy aggregation or dynamic analysing
  5. OMG WHAT IS LUCENE
     • Java library by Apache for doing full-text searches
     • Just a library, by no means a solution in itself (although you can use it as one)
     • In more depth: search works via a term index. Scoring is based not only on how often a term occurs, but also on how unique it is
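The term index and the "occurrence plus uniqueness" scoring on this slide can be illustrated with a toy inverted index. This is a minimal sketch of plain TF-IDF, assuming whitespace tokenisation and made-up sample documents; it is not Lucene's actual scoring formula, which is more involved.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """Map each term to {doc_id: term_frequency} (a toy inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index

def search(index, n_docs, query):
    """Score documents by summing tf * idf over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        # A term found in fewer documents is more "unique" and weighs more.
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical sample documents, for illustration only.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "the quick dog jumps over the lazy dog",
}
index = build_index(docs)
results = search(index, len(docs), "the fox")
```

Document 1 ranks first here because "fox" appears in only one document, so it outweighs the common word "the" even though document 3 contains "the" twice.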
  6. Making a query
     • You send JSON to a _search endpoint on an index, optionally narrowed down to a type
     • This is a basic full-text search with a response:

       GET localhost:9200/example/peanuts/_search
       { "query": { "text": { "my_field": "many search terms" } } }

       { took: 5, timed_out: false,
         _shards: { total: 5, successful: 5, failed: 0 },
         hits: [
           { _index: "example",
             _type: "peanuts",
             _score: 0.9,
             _source: { …data } } ] }
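The same request can be assembled from any language with an HTTP client. Below is a minimal sketch using only the Python standard library; the helper name `build_search_request` is made up for this example, and it sends the body with POST, which the _search endpoint also accepts. Note that later ElasticSearch releases renamed the `text` query to `match` and dropped types from the URL, so the exact body and path depend on the version you run.

```python
import json
import urllib.request

def build_search_request(host, index, doc_type, field, terms):
    """Build (but do not send) the full-text search request from the slide."""
    url = f"http://{host}/{index}/{doc_type}/_search"
    # "text" is the query name used on the slide; newer versions call it "match".
    body = {"query": {"text": {field: terms}}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_search_request(
    "localhost:9200", "example", "peanuts", "my_field", "many search terms"
)

# To run it against a live node (assumes one is listening on localhost:9200):
#   with urllib.request.urlopen(request) as response:
#       result = json.load(response)
```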
  7. Other types of queries
     • Terms, full text, boolean, fuzzy, geolocation and lots more variants
     • You can also use filters to narrow down results
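As an illustration of combining a query with a filter, here is a sketch of a request body built as a plain Python dict. It assumes the `bool` query with a `filter` clause from later ElasticSearch releases (around the time of this talk, a `filtered` query played that role), and the field names `language` and `my_field` are hypothetical.

```python
import json

def filtered_search(field, terms, filter_field, filter_value):
    """Full-text match on one field, narrowed by an exact-value filter."""
    return {
        "query": {
            "bool": {
                # Scored part: contributes to relevance ranking.
                "must": [{"match": {field: terms}}],
                # Unscored part: only includes or excludes documents.
                "filter": [{"term": {filter_field: filter_value}}],
            }
        }
    }

body = filtered_search("my_field", "many search terms", "language", "nl")
payload = json.dumps(body, indent=2)  # ready to send to a _search endpoint
```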
  8. Analysing
     • Before data is indexed or a query is run, it is analysed
     • At this point, terms are scored
     • But before that, the data is normalised. This usually includes stemming and stop-word removal, with support for over 30 languages. Woot!
     • You can create your own analysers
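To make the normalisation step concrete, here is a deliberately naive analyser in plain Python. The stop-word list and the suffix-stripping "stemmer" are simplified assumptions for this sketch; the real language analysers use proper stemming algorithms and far larger word lists.

```python
import re

# Tiny, hypothetical stop-word list and suffix set for illustration.
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "to"}
SUFFIXES = ("ing", "ed", "s")

def stem(token):
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def analyse(text):
    """Lowercase, tokenise, drop stop words, then stem what is left."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(token) for token in tokens if token not in STOP_WORDS]

indexed_terms = analyse("The translators are translating the indexed documents")
query_terms = analyse("translated")
```

Because the indexed text and the query go through the same analyser, "translated" and "translating" both reduce to the same term and therefore match.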
  9. Facet me awesome
     • Facets allow you to do aggregation over your search.

       { "query": … }
       { "facets": { "my_facets": { "terms": { "name": "utrechtrb" } } } }

     • Facets will be deprecated in 1.0 in favour of "aggregations"
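Conceptually, a terms facet counts how often each value of a field occurs among the matching documents. The sketch below mimics that in plain Python over hypothetical hits, and also shows what the request body looks like in the later aggregations syntax (a terms aggregation on the field `name`, which is an assumed field name).

```python
from collections import Counter

def terms_facet(hits, field):
    """Count the values of one field across hits, most frequent first."""
    counts = Counter(hit[field] for hit in hits if field in hit)
    return [{"term": term, "count": count} for term, count in counts.most_common()]

# Hypothetical search hits, for illustration only.
hits = [
    {"name": "utrechtrb", "topic": "search"},
    {"name": "utrechtrb", "topic": "ruby"},
    {"name": "amsterdamrb", "topic": "ruby"},
]
facet = terms_facet(hits, "name")

# Roughly the same question asked through the aggregations API.
aggs_body = {"aggs": {"my_facets": {"terms": {"field": "name"}}}}
```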
  10. Ruby libraries
     • Good old Tire (deprecated)
     • Stretcher!
     • Or just a JSON client
  11. How Tolq uses ElasticSearch
     • We use ElasticSearch as a Translation Memory. In our case, that means we suggest other relevant translations while a translator is translating.
     • By storing original texts and dynamically adjusting the analyser to the correct languages, we can suggest similar translations.
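One way to picture "adjusting the analyser to the correct language" is to pick an analyser per request. The sketch below is not Tolq's code: the field name `original_text`, the language mapping and the helper itself are assumptions for illustration. It relies only on the `analyzer` option of the `match` query and on ElasticSearch's built-in language analysers.

```python
# Hypothetical mapping from language codes to built-in analyser names.
ANALYSERS = {"en": "english", "nl": "dutch", "de": "german"}

def similar_text_query(text, language, field="original_text"):
    """Build a match query that analyses the text in the given language."""
    analyser = ANALYSERS.get(language, "standard")  # fall back to the default
    return {"query": {"match": {field: {"query": text, "analyzer": analyser}}}}

dutch_query = similar_text_query("Dit is een voorbeeldzin", "nl")
unknown_query = similar_text_query("Some text", "xx")
```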
  12. Tolq code was here
  13. Tolq code was here
  14. Tolq code was here
  15. Also, it works great for text search! Thanks!