DIY Percolator
Upcoming SlideShare
Loading in...5
×

Like this? Share it with your network

Share

DIY Percolator

  • 1,457 views
Uploaded on

 

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
1,457
On Slideshare
1,451
From Embeds
6
Number of Embeds
1

Actions

Shares
Downloads
9
Comments
0
Likes
1

Embeds 6

https://twitter.com 6

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. DIY Percolator Jaideep Dhok @jdhok
  • 2. ElasticSearch ● Schema less (sort of) ● Single index can hold docs of multiple types ● Distributed index ● Query, and Document routing ● Faceting ● Scripting ● Percolator!
  • 3. Percolator in ElasticSearch ● You add queries to the percolator ● ES will save these in an index ● Later, you can 'percolate' a document and get matching queries in response ● Optionally percolate documents while indexing
  • 4. esClient.preparePercolate(indexName, typeName) .setSource(doc) // JSON document .execute() // Gives a listenable Future .addListener(new ActionListener<PercolateResponse>() { @Override public void onResponse(PercolateResponse response) { // Get ID of matching queries List<String> matchingQueries = response.matches(); // Have fun } });
  • 5. Uses ● Standing queries ● Update alerts ● Streaming ● Log debugging
  • 6. Log debugging ● Logstash pushes logs to a web server ● Clients register queries with the server ● Server routes incoming log messages to matching queries
  • 7. How does it work? ● MemoryIndex ● Hold a single document in the index ● For each incoming document ○ Clear the index ○ Add the doc to the index ○ Search all queries one by one ■ if score > 0: add query Id to matched list ○ return matched list
  • 8. MemoryIndex ● Not RAMDirectory ● addField() ● IndexSearcher createSearcher() ● float search(Query) ● reset()
  • 9. Workflow differences ● Directory: IndexWriter -> Docuement -> Query ● MemoryIndex: Index -> Fields -> Query
  • 10. Let's write our own ● addQuery(Query) ● List<Query> getMatchingQueries(String jsonDoc)
  • 11. public Percolator() { queries = new ArrayList<Query>(); index = new MemoryIndex(); } public void addQuery(String query) throws ParseException { Analyzer analyzer = new SimpleAnalyzer(VERSION); QueryParser parser = new QueryParser(VERSION, F_CONTENT, analyzer); queries.add(parser.parse(query)); }
  • 12. public List<Query> getMatchingQueries(String doc) { synchronized (index) { index.reset(); index.addField(F_CONTENT, doc, new SimpleAnalyzer(VERSION)); } List<Query> matching = new ArrayList<Query>(); for (Query qry : queries) { if (index.search(qry) > 0.0f) { matching.add(qry); } else { // Didn't match } } return matching; }
  • 13. Miscellaneous ● Adding documents is not thread safe ● "Typically, it is about 10-100 times faster than RAMDirectory" ● "Memory consumption is probably larger than for RAMDirectory" ● Indexing a field is O(N) best case, O(Nlog(N)) worst case, where N = number of tokens
  • 14. Resources ● ElasticSearch feature - http://www.elasticsearch. org/blog/percolator/ ● MemoryIndex - http://lucene.apache. org/core/4_4_0/memory/index.html ● Code - github: jdhok/diypercolate
  • 15. Thank You