3. Outline
• What is full-text search, and why do we use it?
• What can you do with Elasticsearch?
• Why is Elasticsearch different?
• DEMO TIME!
Thursday, February 27, 14
4. Text Search
do I really need to explain it?
5. %LIKE%
• In the beginning there was:
SELECT * FROM tweets WHERE content
LIKE '%zuckerberg%';
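The slide's query can be tried with SQLite from Python's standard library. This is a minimal sketch that assumes a hypothetical tweets table holding the example posts used later in the deck. The leading % in the pattern means no ordinary index can help, so the database reads and tests every row. EXPLAIN QUERY PLAN reports this as a SCAN. SQLite's LIKE is case-insensitive for ASCII by default, so 'Zuckerberg' matches the lowercase pattern.

```python
import sqlite3

# Hypothetical in-memory table mirroring the slide's "tweets" example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO tweets (content) VALUES (?)",
    [
        ("Mark Zuckerberg sells Facebook",),
        ("Facebook buys WhatsApp",),
        ("Mark's Facebook buys Instagram",),
    ],
)

# Substring match: every row's content is compared against the pattern.
rows = conn.execute(
    "SELECT id, content FROM tweets WHERE content LIKE ?", ("%zuckerberg%",)
).fetchall()
print(rows)  # [(1, 'Mark Zuckerberg sells Facebook')]

# The query plan shows a full-table scan ("SCAN ..." in the detail column).
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM tweets WHERE content LIKE '%zuckerberg%'"
).fetchall()
print(plan)
```

The query works, but its cost grows with the number of rows. It also knows nothing about authors, dates or other metadata unless you add more columns and more conditions.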
6. But that’s not what you usually search for!
• You want:
Search by author
Search by time
Search by sentiment
Search by location
Search by everything!
7. That’s a lot of metadata!
• You can’t search through all that on the fly if you want real-time results
• You need to index it first!
8. Inverted Index
• Some documents:
1: ‘Mark Zuckerberg sells Facebook’ [Monday]
2: ‘Facebook buys WhatsApp’ [Tuesday]
3: ‘Mark’s Facebook buys Instagram’ [Monday]
• Inverted index for them:
Facebook: {1, 2, 3}
Mark: {1, 3}
Instagram: {3}
WhatsApp: {2}
[Monday]: {1, 3}
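An inverted index maps each term to the set of documents that contain it. A lookup then becomes a dictionary access instead of a scan over every document. This is a minimal sketch over the slide's three documents. It assumes a toy analyzer that splits text into words and strips a possessive 's, so Mark’s is indexed as Mark. It also assumes the day is indexed as an extra bracketed term. Real Elasticsearch analyzers also lowercase terms by default. Case is kept here only so the output matches the slide.

```python
import re
from collections import defaultdict

# The slide's example documents: id -> (text, day). Plain ASCII apostrophe used.
docs = {
    1: ("Mark Zuckerberg sells Facebook", "Monday"),
    2: ("Facebook buys WhatsApp", "Tuesday"),
    3: ("Mark's Facebook buys Instagram", "Monday"),
}

def tokenize(text):
    """Toy analyzer: split into words and drop a trailing possessive 's."""
    words = re.findall(r"[A-Za-z']+", text)
    return [re.sub(r"'s$", "", w) for w in words]

def build_inverted_index(docs):
    index = defaultdict(set)
    for doc_id, (text, day) in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
        # Metadata is indexed like any other term, under a bracketed key.
        index[f"[{day}]"].add(doc_id)
    return index

index = build_inverted_index(docs)
for term in ["Facebook", "Mark", "Instagram", "WhatsApp", "[Monday]"]:
    print(term, sorted(index[term]))
```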
9. OK, now that we have data, we also want some numbers behind it!
• In our previous example:
• Facebook is mentioned 3 times
• There are 2 posts on [Monday]
• The most frequent words are Facebook (3 posts) and Mark (2 posts, tied with buys)
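The numbers on this slide are simple aggregations over the same three posts. This is a minimal sketch in plain Python. It assumes the tokenization the inverted-index slide implies: no lowercasing, and a possessive 's is stripped. Each count is the number of posts containing the term. Elasticsearch computes summaries like these server-side, through facets or, from version 1.0 on, aggregations.

```python
import re
from collections import Counter

# The slide's example posts: id -> (text, day).
docs = {
    1: ("Mark Zuckerberg sells Facebook", "Monday"),
    2: ("Facebook buys WhatsApp", "Tuesday"),
    3: ("Mark's Facebook buys Instagram", "Monday"),
}

def tokenize(text):
    # Toy analyzer: split into words and drop a trailing possessive 's.
    return [re.sub(r"'s$", "", w) for w in re.findall(r"[A-Za-z']+", text)]

# Number of posts mentioning each term (each post counted once per term).
doc_freq = Counter(
    term for text, _ in docs.values() for term in set(tokenize(text))
)

# Number of posts published on each day.
posts_per_day = Counter(day for _, day in docs.values())

print(doc_freq["Facebook"])     # 3
print(posts_per_day["Monday"])  # 2
print(doc_freq.most_common(3))  # Facebook first; Mark and buys tie at 2
```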
10. All 3 put together
Elasticsearch = Search(Content & Metadata) + Analytics
(oversimplified)
11. Let’s look at some search features of Elasticsearch
12. Features: Complex Queries
• Boolean Operators:
(apple OR pumpkin) AND pie
• Wildcards:
app*: apple, apples, appliance
appl?: apple, apply
• Fuzzy:
back~: back, pack, black, bank
• Ranged:
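Each operator on this slide can be sketched over a term dictionary and its posting sets. This is a simplified illustration, not Lucene's implementation, and the posting sets and vocabulary below are hypothetical. Boolean operators become set union and intersection. Wildcards are matched with Python's fnmatch, where * matches any run of characters and ? matches exactly one. Fuzzy matching uses plain Levenshtein distance with at most one edit. Lucene's fuzzy query is more sophisticated; for example, it can count a swap of two adjacent letters as a single edit.

```python
import fnmatch

# Hypothetical postings: term -> set of document ids.
postings = {
    "apple": {1, 2, 4},
    "pumpkin": {3},
    "pie": {2, 3, 5},
}

def boolean_query(postings):
    # (apple OR pumpkin) AND pie -> union first, then intersection.
    return (postings["apple"] | postings["pumpkin"]) & postings["pie"]

def wildcard(pattern, vocabulary):
    # Match the pattern against every indexed term (case-sensitive).
    return sorted(t for t in vocabulary if fnmatch.fnmatchcase(t, pattern))

def edit_distance(a, b):
    # Levenshtein distance, computed one dynamic-programming row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def fuzzy(term, vocabulary, max_edits=1):
    return sorted(t for t in vocabulary if edit_distance(term, t) <= max_edits)

vocab = ["apple", "apples", "appliance", "apply",
         "back", "pack", "black", "bank", "book"]

print(sorted(boolean_query(postings)))  # [2, 3]
print(wildcard("app*", vocab))          # ['apple', 'apples', 'appliance', 'apply']
print(wildcard("appl?", vocab))         # ['apple', 'apply']
print(fuzzy("back", vocab))             # ['back', 'bank', 'black', 'pack']
```

Here "book" is two substitutions away from "back", so it falls outside the one-edit budget.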
13. Features: Complex Queries
• Attribute filtering:
apple AND pie AND location:california
• Range filtering:
apple AND published:[1393100055 TO 1393427055]
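Attribute and range filters narrow the text matches using metadata fields. This is a minimal sketch that assumes hypothetical documents with content, location and published fields. Both range bounds are treated as inclusive, which matches the square-bracket [lo TO hi] syntax. The published values are Unix timestamps in seconds, so the slide's range covers roughly 22 to 26 February 2014 (UTC). In Elasticsearch itself, query strings like these can be sent as a query_string query.

```python
from datetime import datetime, timezone

# Hypothetical documents: free text plus two metadata fields.
docs = [
    {"id": 1, "content": "apple pie recipe", "location": "california", "published": 1393200000},
    {"id": 2, "content": "apple pie contest", "location": "texas", "published": 1393300000},
    {"id": 3, "content": "apple pie review", "location": "california", "published": 1390000000},
    {"id": 4, "content": "pumpkin pie", "location": "california", "published": 1393300000},
]

def matches_all(doc, *terms):
    # apple AND pie: every term must appear in the content.
    words = doc["content"].split()
    return all(t in words for t in terms)

def attribute_filter(docs, field, value):
    # location:california -> exact match on a metadata field.
    return [d for d in docs if d[field] == value]

def range_filter(docs, field, lo, hi):
    # published:[lo TO hi] -> inclusive on both ends.
    return [d for d in docs if lo <= d[field] <= hi]

hits = [d for d in docs if matches_all(d, "apple", "pie")]
print([d["id"] for d in attribute_filter(hits, "location", "california")])  # [1, 3]
print([d["id"] for d in range_filter(hits, "published", 1393100055, 1393427055)])  # [1, 2]

# Convert the slide's bounds to readable UTC dates.
for ts in (1393100055, 1393427055):
    print(datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
```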
21. Performance/Scalability
• With just a few nodes you can run complex queries on billions of documents
• Example: 3 nodes, 20 million documents, 2 replicas each
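The cluster example is easier to read with the replica arithmetic spelled out. This back-of-the-envelope sketch assumes that "2 replicas" means Elasticsearch's number_of_replicas setting of 2, so every shard has one primary plus two replica copies. It also assumes shards are spread evenly across nodes. The helper name is hypothetical. Elasticsearch never allocates two copies of the same shard to one node, so with 3 nodes each node ends up holding a full copy of all 20 million documents.

```python
def replica_footprint(num_docs, replicas, nodes):
    """Hypothetical helper: estimate stored document copies in a cluster.

    Each shard has 1 primary + `replicas` replica copies. Copies of the same
    shard never share a node, so at most `nodes` copies can be allocated;
    any extra replicas stay unassigned.
    """
    wanted = replicas + 1
    allocated = min(wanted, nodes)
    stored = num_docs * allocated
    # Assumes shards (and therefore documents) are spread evenly over nodes.
    per_node = stored / nodes
    unassigned = wanted - allocated
    return stored, per_node, unassigned

stored, per_node, unassigned = replica_footprint(20_000_000, replicas=2, nodes=3)
print(stored, per_node, unassigned)  # 60000000 20000000.0 0
```

With this layout, every shard still has a copy after losing any two nodes. The cost is storing the whole data set three times.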
22. Easy to back up
• Elasticsearch has a built-in backup solution (snapshot and restore), so you don’t have to implement one yourself
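Elasticsearch's built-in backup is the snapshot and restore API, added in version 1.0. You register a repository, take named snapshots into it, and restore from them later. This minimal sketch only builds the HTTP requests with Python's urllib and does not send them. It assumes a local, unsecured node at localhost:9200. The repository name my_backup, the snapshot name snapshot_1 and the filesystem path are hypothetical. In later releases, a filesystem repository's location must also be whitelisted via path.repo in elasticsearch.yml.

```python
import json
import urllib.request

ES = "http://localhost:9200"  # assumption: local node, no authentication

def es_request(method, path, body=None):
    """Build (but do not send) a JSON request against the cluster."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        ES + path, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )

# 1. Register a shared-filesystem repository (hypothetical name and path).
register = es_request(
    "PUT", "/_snapshot/my_backup",
    {"type": "fs", "settings": {"location": "/mount/backups/my_backup"}},
)

# 2. Snapshot all indices and block until the snapshot completes.
snapshot = es_request("PUT", "/_snapshot/my_backup/snapshot_1?wait_for_completion=true")

# 3. Restore that snapshot later (target indices must be closed or absent).
restore = es_request("POST", "/_snapshot/my_backup/snapshot_1/_restore")

for req in (register, snapshot, restore):
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Snapshots are incremental: later snapshots in the same repository only copy data not already stored there.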