Elasticsearch
Scalable Full-Text Search Engine

Thursday, February 27, 14
Goals for this talk

Outline
• What’s full-text search and why do we use it?
• What can you do with Elasticsearch?
• Why is Elasticsearch different?
• DEMO TIME!

Text Search
Do I really need to explain it?

%LIKE%
• In the beginning there was:
SELECT * FROM tweets WHERE content LIKE '%zuckerberg%'
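To make the starting point concrete, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database. SQLite is only a stand-in, and the `tweets` schema and sample rows are assumptions made for illustration. The query plan shows that a leading-wildcard LIKE is answered by scanning the table.

```python
import sqlite3

# In-memory database; schema and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO tweets (content) VALUES (?)",
    [
        ("Mark Zuckerberg sells Facebook",),
        ("Facebook buys WhatsApp",),
        ("Mark's Facebook buys Instagram",),
    ],
)

# Substring search. SQLite's LIKE is case-insensitive for ASCII by default.
rows = conn.execute(
    "SELECT id, content FROM tweets WHERE content LIKE ?",
    ("%zuckerberg%",),
).fetchall()
print(rows)

# Ask SQLite how it would execute the query; the last column is the detail text.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM tweets WHERE content LIKE '%zuckerberg%'"
).fetchall()
print(plan[0][-1])
```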

But that’s not what you usually search for!

• You want:
Search by author
Search by time
Search by sentiment
Search by location
Search by everything!

That’s a lot of metadata!

• You can’t search through all that on the fly if you want real-time results

• You need to index it first!

Inverted Index
• Some documents:
1: ‘Mark Zuckerberg sells Facebook’ [Monday]
2: ‘Facebook buys WhatsApp’ [Tuesday]
3: ‘Mark’s Facebook buys Instagram’ [Monday]

• Inverted index for them:
Facebook: {1, 2, 3}
Mark: {1, 3}
Instagram: {3}
WhatsApp: {2}
[Monday]: {1, 3}
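A minimal sketch of how such an index could be built for the three example documents. The tokenizer (lowercasing, dropping the possessive 's) and the names `DOCS`, `tokenize` and `build_index` are assumptions made for illustration; this is not how Elasticsearch analyzes text internally.

```python
import re
from collections import defaultdict

# The three example documents: id -> (content, day).
DOCS = {
    1: ("Mark Zuckerberg sells Facebook", "Monday"),
    2: ("Facebook buys WhatsApp", "Tuesday"),
    3: ("Mark's Facebook buys Instagram", "Monday"),
}


def tokenize(text):
    """Lowercase, drop a possessive 's, and split into alphabetic terms."""
    text = re.sub(r"['’]s\b", "", text.lower())
    return re.findall(r"[a-z]+", text)


def build_index(docs):
    """Map every term (and every day tag) to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, (content, day) in docs.items():
        for term in tokenize(content):
            index[term].add(doc_id)
        index[f"[{day.lower()}]"].add(doc_id)
    return index


index = build_index(DOCS)
for term in ("facebook", "mark", "instagram", "whatsapp", "[monday]"):
    print(term, sorted(index[term]))

# A lookup is now a set operation instead of a scan over every document.
print(sorted(index["facebook"] & index["[monday]"]))
```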

OK, now that we have data, we also want some numbers behind it!

• In our previous example:
• Facebook is mentioned 3 times
• There are 2 posts on [Monday]
• The most frequent words are
Facebook and Mark
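The numbers above can be reproduced with a few lines of counting. This is only a sketch over the three example posts; the `POSTS` list and the `words` helper are illustrative assumptions, not an Elasticsearch API.

```python
import re
from collections import Counter

# The three example posts as (content, day) pairs.
POSTS = [
    ("Mark Zuckerberg sells Facebook", "Monday"),
    ("Facebook buys WhatsApp", "Tuesday"),
    ("Mark's Facebook buys Instagram", "Monday"),
]


def words(text):
    """Lowercase, drop a possessive 's, and split into alphabetic words."""
    return re.findall(r"[a-z]+", re.sub(r"['’]s\b", "", text.lower()))


# How often each word occurs across all posts.
word_counts = Counter(w for content, _ in POSTS for w in words(content))
# How many posts were made on each day.
day_counts = Counter(day for _, day in POSTS)

print(word_counts["facebook"])
print(day_counts["Monday"])
print(word_counts.most_common(2))
```

In this toy count, "buys" ties with "Mark" at two occurrences each.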

All 3 put together
Elasticsearch
=
Search(Content & Metadata) + Analytics
(oversimplified)

Let’s look at some
search features of
Elasticsearch

Features: Complex Queries

• Boolean Operators:

(apple OR pumpkin) AND pie

• Wildcards:
app*: apple, apples, appliance
appl?: apple, apply

• Fuzzy:
back~: back, pack, black, bank

Features: Complex Queries (continued)

• Attribute filtering:
apple AND pie AND location:california

• Range filtering:
apple AND published:[1393100055 TO 1393427055]
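The query strings on these slides can be sent to Elasticsearch inside a `query_string` query. The sketch below only builds the JSON request bodies and prints them; nothing is sent to a cluster, and the helper name `query_string_body` is an assumption made for illustration. The two numbers in the range example read as Unix timestamps in seconds, which the last lines decode.

```python
import json
from datetime import datetime, timezone


def query_string_body(query):
    """Wrap a Lucene-style query string in an Elasticsearch search request body."""
    return {"query": {"query_string": {"query": query}}}


# Query strings taken from the slides.
EXAMPLES = {
    "boolean": "(apple OR pumpkin) AND pie",
    "wildcard": "app*",
    "fuzzy": "back~",
    "attribute": "apple AND pie AND location:california",
    "range": "apple AND published:[1393100055 TO 1393427055]",
}

bodies = {name: query_string_body(q) for name, q in EXAMPLES.items()}
for name, body in bodies.items():
    print(name, json.dumps(body))

# Decode the range bounds, assuming they are Unix timestamps in seconds.
start = datetime.fromtimestamp(1393100055, tz=timezone.utc)
end = datetime.fromtimestamp(1393427055, tz=timezone.utc)
print(start.isoformat(), "to", end.isoformat())
```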

Features: Geo Queries
• Bounding Box Queries
• Distance Range Queries

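To illustrate what these two kinds of geo query decide for each document, here is a small sketch of the underlying geometry in plain Python. It is not the Elasticsearch API: the function names and the sample coordinates are assumptions made for illustration, and the bounding-box check ignores boxes that cross the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

# Illustrative (lat, lon) points; approximate city coordinates.
BUCHAREST = (44.4268, 26.1025)
CLUJ = (46.7712, 23.6236)


def in_bounding_box(point, top_left, bottom_right):
    """True if point lies inside the box given by its top-left and bottom-right corners."""
    lat, lon = point
    return (bottom_right[0] <= lat <= top_left[0]
            and top_left[1] <= lon <= bottom_right[1])


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def in_distance_range(point, center, min_km, max_km):
    """True if point is between min_km and max_km away from center."""
    return min_km <= haversine_km(point, center) <= max_km


print(in_bounding_box(CLUJ, top_left=(48.0, 20.0), bottom_right=(43.0, 30.0)))
print(round(haversine_km(BUCHAREST, CLUJ)))
print(in_distance_range(CLUJ, BUCHAREST, min_km=100, max_km=400))
```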
Feature: Built-in analytics

Feature: Built-in tag cloud

What’s special about
Elasticsearch?

Distributed

• Distributing data across multiple servers is easy and abstracted away from the developer

Performance/Scalability

• Add and remove nodes on the fly without ever stopping the search service

Performance/Scalability

• Indexing and searching can be scaled independently

Performance/Scalability

• With just a few nodes you can run complex queries on billions of documents

• 3 nodes: 20 million documents with 2 replicas each

Easy to back up
• Elasticsearch has a built-in backup solution, so you don’t have to worry about implementing one

Demo time!


Intro to Elasticsearch - Elasticsearch Bucharest Group @ Softbinator
