DIY Percolator
Jaideep Dhok
@jdhok
Published in: Technology
Transcript of "DIY Percolator"

  1. DIY Percolator
     Jaideep Dhok @jdhok
  2. ElasticSearch
     ● Schema-less (sort of)
     ● Single index can hold docs of multiple types
     ● Distributed index
     ● Query and document routing
     ● Faceting
     ● Scripting
     ● Percolator!
  3. Percolator in ElasticSearch
     ● You add queries to the percolator
     ● ES will save these in an index
     ● Later, you can 'percolate' a document and get matching queries in response
     ● Optionally percolate documents while indexing
  4. esClient.preparePercolate(indexName, typeName)
         .setSource(doc) // JSON document
         .execute()      // Gives a listenable Future
         .addListener(new ActionListener<PercolateResponse>() {
             @Override
             public void onResponse(PercolateResponse response) {
                 // Get IDs of matching queries
                 List<String> matchingQueries = response.matches();
                 // Have fun
             }

             @Override
             public void onFailure(Throwable e) {
                 // ActionListener also requires a failure callback
             }
         });
  5. Uses
     ● Standing queries
     ● Update alerts
     ● Streaming
     ● Log debugging
  6. Log debugging
     ● Logstash pushes logs to a web server
     ● Clients register queries with the server
     ● Server routes incoming log messages to matching queries
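One way to picture the routing step on this slide: the server keeps each client's standing query and pushes every incoming log line to the clients whose query matches. The sketch below is only an illustration under stated assumptions, not Logstash or ElasticSearch code. `LogRouter`, `register`, and `route` are hypothetical names, and a plain `Predicate<String>` stands in for a real percolator query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical server-side router: standing queries in, matching log lines out.
class LogRouter {
    // One registration per client: the standing query plus where to deliver hits.
    private static final class Subscription {
        final Predicate<String> query;
        final Consumer<String> sink;

        Subscription(Predicate<String> query, Consumer<String> sink) {
            this.query = query;
            this.sink = sink;
        }
    }

    private final Map<String, Subscription> subscriptions = new LinkedHashMap<>();

    // A client registers (or replaces) its standing query.
    synchronized void register(String clientId, Predicate<String> query, Consumer<String> sink) {
        subscriptions.put(clientId, new Subscription(query, sink));
    }

    // Called for every incoming log line; returns the ids of the clients notified.
    synchronized List<String> route(String logLine) {
        List<String> notified = new ArrayList<>();
        for (Map.Entry<String, Subscription> e : subscriptions.entrySet()) {
            Subscription s = e.getValue();
            if (s.query.test(logLine)) {
                s.sink.accept(logLine);
                notified.add(e.getKey());
            }
        }
        return notified;
    }
}
```

A client interested in errors could call `router.register("alice", line -> line.contains("ERROR"), alice::add)`, after which every call to `router.route(line)` delivers matching lines to that client's sink.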
  7. How does it work?
     ● MemoryIndex
     ● Hold a single document in the index
     ● For each incoming document
       ○ Clear the index
       ○ Add the doc to the index
       ○ Search all queries one by one
         ■ if score > 0: add query ID to matched list
       ○ Return matched list
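The loop on this slide can be sketched without Lucene. The following is a minimal stand-in and not the MemoryIndex implementation: a `HashMap` of term counts plays the role of the single-document index, each query is just a list of terms that must all be present, and the score is the summed term frequency (zero when any term is missing). The names `ToyPercolator`, `addQuery`, and `percolate` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for the percolation loop; NOT Lucene's MemoryIndex.
class ToyPercolator {
    // Registered queries: id -> terms that must all appear in the document.
    private final Map<String, List<String>> queries = new LinkedHashMap<>();
    // The "index": term -> frequency for the one document currently held.
    private final Map<String, Integer> index = new HashMap<>();

    synchronized void addQuery(String id, String... terms) {
        List<String> lowered = new ArrayList<>();
        for (String t : terms) {
            lowered.add(t.toLowerCase());
        }
        queries.put(id, lowered);
    }

    // Returns the ids of all registered queries that match the document.
    synchronized List<String> percolate(String doc) {
        index.clear();                                    // clear the index
        for (String token : doc.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                index.merge(token, 1, Integer::sum);      // add the doc
            }
        }
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, List<String>> q : queries.entrySet()) {
            if (score(q.getValue()) > 0) {                // search queries one by one
                matched.add(q.getKey());
            }
        }
        return matched;
    }

    // Sum of term frequencies, or 0 if any required term is absent.
    private int score(List<String> terms) {
        int total = 0;
        for (String t : terms) {
            Integer freq = index.get(t);
            if (freq == null) {
                return 0;
            }
            total += freq;
        }
        return total;
    }
}
```

Because the index only ever holds one document, each call costs one tokenization pass plus one lookup per query term, which is the property that makes the single-document approach attractive.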
  8. MemoryIndex
     ● Not RAMDirectory
     ● addField()
     ● IndexSearcher createSearcher()
     ● float search(Query)
     ● reset()
  9. Workflow differences
     ● Directory: IndexWriter -> Document -> Query
     ● MemoryIndex: Index -> Fields -> Query
  10. Let's write our own
      ● addQuery(Query)
      ● List<Query> getMatchingQueries(String jsonDoc)
  11. public Percolator() {
          queries = new ArrayList<Query>();
          index = new MemoryIndex();
      }

      public void addQuery(String query) throws ParseException {
          Analyzer analyzer = new SimpleAnalyzer(VERSION);
          QueryParser parser = new QueryParser(VERSION, F_CONTENT, analyzer);
          queries.add(parser.parse(query));
      }
  12. public List<Query> getMatchingQueries(String doc) {
          List<Query> matching = new ArrayList<Query>();
          // Hold the lock for the searches too; otherwise a concurrent call
          // could reset the shared index while this one is still searching.
          synchronized (index) {
              index.reset();
              index.addField(F_CONTENT, doc, new SimpleAnalyzer(VERSION));
              for (Query qry : queries) {
                  if (index.search(qry) > 0.0f) {
                      matching.add(qry);
                  }
              }
          }
          return matching;
      }
  13. Miscellaneous
      ● Adding documents is not thread safe
      ● "Typically, it is about 10-100 times faster than RAMDirectory"
      ● "Memory consumption is probably larger than for RAMDirectory"
      ● Indexing a field is O(N) best case, O(N log N) worst case, where N = number of tokens
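Since adding documents is not thread safe, a percolator shared across threads needs either a lock around the whole reset/add/search sequence or a separate index per thread. The sketch below illustrates the second option, which is an alternative design and not something the slides prescribe. `ScratchIndex` is a hypothetical stand-in for a single-document index (it is not Lucene's `MemoryIndex`), queries are reduced to single terms, and `ThreadLocal` gives each worker thread its own instance so `percolate` needs no lock.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical single-document index; deliberately not thread safe.
class ScratchIndex {
    private final Set<String> terms = new HashSet<>();

    void reset() {
        terms.clear();
    }

    void addField(String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
    }

    boolean matches(String term) {
        return terms.contains(term.toLowerCase());
    }
}

class PerThreadPercolator {
    // Queries are shared and read-mostly; CopyOnWriteArrayList allows safe iteration.
    private final List<String> queries = new CopyOnWriteArrayList<>();
    // Each thread lazily gets its own scratch index, so no two threads share one.
    private final ThreadLocal<ScratchIndex> index = ThreadLocal.withInitial(ScratchIndex::new);

    void addQuery(String term) {
        queries.add(term);
    }

    List<String> percolate(String doc) {
        ScratchIndex idx = index.get();
        idx.reset();
        idx.addField(doc);
        List<String> matched = new ArrayList<>();
        for (String q : queries) {
            if (idx.matches(q)) {
                matched.add(q);
            }
        }
        return matched;
    }
}
```

The trade-off is memory: one index per worker thread instead of one shared index, in exchange for percolation calls that never block each other.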
  14. Resources
      ● ElasticSearch feature - http://www.elasticsearch.org/blog/percolator/
      ● MemoryIndex - http://lucene.apache.org/core/4_4_0/memory/index.html
      ● Code - github: jdhok/diypercolate
  15. Thank You