DIY Percolator
Published in: Technology
Transcript

  • 1. DIY Percolator Jaideep Dhok @jdhok
  • 2. ElasticSearch
      ● Schema less (sort of)
      ● Single index can hold docs of multiple types
      ● Distributed index
      ● Query, and Document routing
      ● Faceting
      ● Scripting
      ● Percolator!
  • 3. Percolator in ElasticSearch
      ● You add queries to the percolator
      ● ES will save these in an index
      ● Later, you can 'percolate' a document and get matching queries in response
      ● Optionally percolate documents while indexing
  • 4.
        esClient.preparePercolate(indexName, typeName)
            .setSource(doc)   // JSON document
            .execute()        // Gives a listenable Future
            .addListener(new ActionListener<PercolateResponse>() {
                @Override
                public void onResponse(PercolateResponse response) {
                    // Get IDs of matching queries
                    List<String> matchingQueries = response.matches();
                    // Have fun
                }

                @Override
                public void onFailure(Throwable e) {
                    // ActionListener also requires a failure callback
                }
            });
  • 5. Uses
      ● Standing queries
      ● Update alerts
      ● Streaming
      ● Log debugging
  • 6. Log debugging
      ● Logstash pushes logs to a web server
      ● Clients register queries with the server
      ● Server routes incoming log messages to matching queries
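The routing step in the log debugging slide can be sketched in plain Java. This is only an illustration under stated assumptions: `LogRouter`, `register` and `route` are hypothetical names, a `Predicate<String>` stands in for a registered query, and a `Consumer<String>` stands in for the client connection. No Logstash or web server is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical router: delivers each log line to clients whose query matches. */
final class LogRouter {
    /** One registered query together with the client that asked for it. */
    private static final class Subscription {
        final Predicate<String> query;   // stand-in for a real search query
        final Consumer<String> client;   // stand-in for a client connection

        Subscription(Predicate<String> query, Consumer<String> client) {
            this.query = query;
            this.client = client;
        }
    }

    // Copy-on-write list so clients can register while lines are being routed.
    private final List<Subscription> subscriptions = new CopyOnWriteArrayList<>();

    /** A client registers a query with the server. */
    void register(Predicate<String> query, Consumer<String> client) {
        subscriptions.add(new Subscription(query, client));
    }

    /** Routes one incoming log line; returns how many clients received it. */
    int route(String logLine) {
        int delivered = 0;
        for (Subscription s : subscriptions) {
            if (s.query.test(logLine)) {
                s.client.accept(logLine);
                delivered++;
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        LogRouter router = new LogRouter();
        List<String> inbox = new ArrayList<>();
        router.register(line -> line.contains("ERROR"), inbox::add);
        router.route("INFO service started");
        router.route("ERROR disk full");
        System.out.println(inbox);
    }
}
```

In a real setup the predicate would be replaced by a percolator lookup, so that one pass over the log line finds every matching query.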
  • 7. How does it work?
      ● MemoryIndex
      ● Hold a single document in the index
      ● For each incoming document
          ○ Clear the index
          ○ Add the doc to the index
          ○ Search all queries one by one
              ■ if score > 0: add query Id to matched list
          ○ return matched list
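The loop on this slide (clear, add the doc, search every query, collect the ones that score above zero) can be tried out without Lucene. The sketch below is a toy stand-in, not MemoryIndex: `ToyPercolator` is a hypothetical class, a "query" is just a set of terms that must all occur in the document, and the score is 1 or 0.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for the percolation loop; it is NOT Lucene's MemoryIndex. */
final class ToyPercolator {
    // Registered queries: id -> terms that must all appear in the document.
    private final Map<String, Set<String>> queries = new LinkedHashMap<>();
    // Single-document "index": the set of terms of the current document.
    private final Set<String> index = new HashSet<>();

    synchronized void addQuery(String id, String... requiredTerms) {
        Set<String> terms = new HashSet<>();
        for (String t : requiredTerms) {
            terms.add(t.toLowerCase());
        }
        queries.put(id, terms);
    }

    /** Returns the ids of all queries that match the given document. */
    synchronized List<String> percolate(String doc) {
        index.clear();                                   // clear the index
        for (String tok : doc.toLowerCase().split("[^a-z]+")) {
            if (!tok.isEmpty()) {
                index.add(tok);                          // add the doc to the index
            }
        }
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, Set<String>> q : queries.entrySet()) {
            // Toy score: 1 if every required term is present, otherwise 0.
            float score = index.containsAll(q.getValue()) ? 1.0f : 0.0f;
            if (score > 0) {
                matched.add(q.getKey());                 // add query id to matched list
            }
        }
        return matched;                                  // return matched list
    }

    public static void main(String[] args) {
        ToyPercolator p = new ToyPercolator();
        p.addQuery("disk-errors", "disk", "error");
        p.addQuery("logins", "login");
        System.out.println(p.percolate("ERROR: disk full on /dev/sda1"));
    }
}
```

The cost is one search per registered query for every document, which is why keeping the index down to a single in-memory document matters.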
  • 8. MemoryIndex
      ● Not RAMDirectory
      ● addField()
      ● IndexSearcher createSearcher()
      ● float search(Query)
      ● reset()
  • 9. Workflow differences
      ● Directory: IndexWriter -> Document -> Query
      ● MemoryIndex: Index -> Fields -> Query
  • 10. Let's write our own
      ● addQuery(Query)
      ● List<Query> getMatchingQueries(String jsonDoc)
  • 11.
        public Percolator() {
            queries = new ArrayList<Query>();
            index = new MemoryIndex();
        }

        public void addQuery(String query) throws ParseException {
            Analyzer analyzer = new SimpleAnalyzer(VERSION);
            QueryParser parser = new QueryParser(VERSION, F_CONTENT, analyzer);
            queries.add(parser.parse(query));
        }
  • 12.
        public List<Query> getMatchingQueries(String doc) {
            List<Query> matching = new ArrayList<Query>();
            // Hold the lock while indexing AND searching, so that another
            // thread cannot reset the index between the two steps.
            synchronized (index) {
                index.reset();
                index.addField(F_CONTENT, doc, new SimpleAnalyzer(VERSION));
                for (Query qry : queries) {
                    if (index.search(qry) > 0.0f) {
                        matching.add(qry);
                    }
                    // A score of 0 means the query didn't match
                }
            }
            return matching;
        }
  • 13. Miscellaneous
      ● Adding documents is not thread safe
      ● "Typically, it is about 10-100 times faster than RAMDirectory"
      ● "Memory consumption is probably larger than for RAMDirectory"
      ● Indexing a field is O(N) best case, O(N log N) worst case, where N = number of tokens
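Since adding documents is not thread safe, one option besides locking is to give every thread its own scratch index. The sketch below shows that pattern with a plain `HashSet` as a toy stand-in for the index. `PerThreadIndex` and `matches` are hypothetical names, and whether the same pattern suits MemoryIndex in a given setup is an assumption to verify.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy illustration of a per-thread scratch index (not Lucene code). */
final class PerThreadIndex {
    // Each thread lazily gets its own index, so no locking is needed.
    private static final ThreadLocal<Set<String>> INDEX =
            ThreadLocal.withInitial(HashSet::new);

    /** Re-indexes the document in this thread's index, then checks one term. */
    static boolean matches(String doc, String requiredTerm) {
        Set<String> index = INDEX.get();
        index.clear();                                            // reset
        index.addAll(Arrays.asList(doc.toLowerCase().split("\\s+"))); // index doc
        return index.contains(requiredTerm.toLowerCase());        // search
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> System.out.println(matches("disk error", "error")));
        Thread b = new Thread(() -> System.out.println(matches("all fine", "error")));
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

The trade-off is memory: every thread keeps its own index alive, in exchange for percolating documents in parallel without contention.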
  • 14. Resources
      ● ElasticSearch feature - http://www.elasticsearch.org/blog/percolator/
      ● MemoryIndex - http://lucene.apache.org/core/4_4_0/memory/index.html
      ● Code - github: jdhok/diypercolate
  • 15. Thank You
