DIY Percolator

jdhok

ElasticSearch
● Schemaless (sort of)
● Single index can hold docs of multiple types
● Distributed index
● Query and document routing
● Faceting
● Scripting
● Percolator!

Percolator in ElasticSearch
● You add queries to the percolator
● ES will save these in an index
● Later, you can 'percolate' a document and get the matching queries in response
● Optionally percolate documents while indexing

esClient.preparePercolate(indexName, typeName)
    .setSource(doc)   // JSON document
    .execute()        // Gives a listenable Future
    .addListener(new ActionListener<PercolateResponse>() {
        @Override
        public void onResponse(PercolateResponse response) {
            // Get IDs of matching queries
            List<String> matchingQueries = response.matches();
            // Have fun
        }

        @Override
        public void onFailure(Throwable e) {
            // Handle the failed percolate request
        }
    });

Uses
● Standing queries
● Update alerts
● Streaming
● Log debugging

Log debugging
● Logstash pushes logs to a web server
● Clients register queries with the server
● Server routes incoming log messages to matching queries
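The routing step can be sketched in plain Java. This is only an illustration under stated assumptions: `LogRouter`, `register`, and `route` are hypothetical names, a `Predicate<String>` stands in for a real query, and a `Consumer<String>` stands in for the connection back to a client.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical log router: each client registers a standing query plus a callback.
class LogRouter {
    // One registered client: the query and where to deliver matching messages.
    private static final class Subscription {
        final Predicate<String> query;
        final Consumer<String> sink;

        Subscription(Predicate<String> query, Consumer<String> sink) {
            this.query = query;
            this.sink = sink;
        }
    }

    // Copy-on-write list so clients can register while messages are being routed.
    private final List<Subscription> subscriptions = new CopyOnWriteArrayList<>();

    // Called when a client registers a query with the server.
    void register(Predicate<String> query, Consumer<String> sink) {
        subscriptions.add(new Subscription(query, sink));
    }

    // Called for every incoming log message; returns the number of deliveries.
    int route(String logMessage) {
        int delivered = 0;
        for (Subscription s : subscriptions) {
            if (s.query.test(logMessage)) {
                s.sink.accept(logMessage);
                delivered++;
            }
        }
        return delivered;
    }
}
```

In a real setup the predicate test would be replaced by a percolate call, and the sink by whatever transport the server uses to reach its clients.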

How does it work?
● MemoryIndex
● Hold a single document in the index
● For each incoming document
○ Clear the index
○ Add the doc to the index
○ Search all queries one by one
■ if score > 0: add query ID to matched list
○ return matched list
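The loop above can be sketched without Lucene. In this minimal sketch a set of lowercase tokens stands in for MemoryIndex and a `Predicate<Set<String>>` stands in for a Lucene `Query`; `ToyPercolator` and its methods are hypothetical names, and the point is only the control flow: clear, add, test every query, collect IDs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in for the percolation loop; not Lucene or ElasticSearch code.
class ToyPercolator {
    // Registered queries keyed by query ID, kept in registration order.
    private final Map<String, Predicate<Set<String>>> queries = new LinkedHashMap<>();
    // The "index": tokens of the single document currently held.
    private final Set<String> index = new HashSet<>();

    synchronized void addQuery(String id, Predicate<Set<String>> query) {
        queries.put(id, query);
    }

    // Returns the IDs of all queries that match the document.
    synchronized List<String> percolate(String doc) {
        // Clear the index, then add the incoming document.
        index.clear();
        for (String token : doc.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                index.add(token);
            }
        }
        // Test all queries one by one and collect the IDs that match.
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, Predicate<Set<String>>> e : queries.entrySet()) {
            if (e.getValue().test(index)) {
                matched.add(e.getKey());
            }
        }
        return matched;
    }
}
```

The cost per document grows linearly with the number of registered queries, which is the same trade-off the slide's loop implies.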

MemoryIndex
● Not RAMDirectory
● addField()
● IndexSearcher createSearcher()
● float search(Query)
● reset()

Workflow differences
● Directory: IndexWriter -> Document -> Query
● MemoryIndex: Index -> Fields -> Query

Let's write our own
● addQuery(Query)
● List<Query> getMatchingQueries(String jsonDoc)

public Percolator() {
    queries = new ArrayList<Query>();
    index = new MemoryIndex();
}

public void addQuery(String query) throws ParseException {
    Analyzer analyzer = new SimpleAnalyzer(VERSION);
    QueryParser parser = new QueryParser(VERSION, F_CONTENT, analyzer);
    queries.add(parser.parse(query));
}

public List<Query> getMatchingQueries(String doc) {
    List<Query> matching = new ArrayList<Query>();
    // Hold the lock while searching too, so that another thread
    // cannot reset the index in the middle of a search
    synchronized (index) {
        index.reset();
        index.addField(F_CONTENT, doc, new SimpleAnalyzer(VERSION));
        for (Query qry : queries) {
            if (index.search(qry) > 0.0f) {
                matching.add(qry);
            }
        }
    }
    return matching;
}

Miscellaneous
● Adding documents is not thread-safe
● "Typically, it is about 10-100 times faster than RAMDirectory"
● "Memory consumption is probably larger than for RAMDirectory"
● Indexing a field is O(N) best case, O(N log N) worst case, where N = number of tokens
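One possible way around the thread-safety point is to give every thread its own index instead of locking a shared one. The sketch below only illustrates that idea: `PerThreadIndex` and `containsTerm` are hypothetical names, and a `HashSet` of tokens stands in for MemoryIndex.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: one private "index" per thread, so clear/add/search
// on one thread can never interleave with another thread's calls.
class PerThreadIndex {
    // A HashSet of tokens stands in for MemoryIndex here.
    private static final ThreadLocal<Set<String>> INDEX =
            ThreadLocal.withInitial(HashSet::new);

    // Re-index the document on the calling thread's private index,
    // then check whether a single lowercase term is present.
    static boolean containsTerm(String doc, String term) {
        Set<String> index = INDEX.get();
        index.clear();
        for (String token : doc.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                index.add(token);
            }
        }
        return index.contains(term);
    }
}
```

This trades memory (one index per thread) for the absence of lock contention; a single synchronized index is the simpler choice when throughput does not matter.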

Resources
● ElasticSearch feature - http://www.elasticsearch.org/blog/percolator/
● MemoryIndex - http://lucene.apache.org/core/4_4_0/memory/index.html
● Code - github: jdhok/diypercolate

Jaideep Dhok @jdhok
