2. ElasticSearch
● Schemaless (sort of)
● Single index can hold docs of multiple types
● Distributed index
● Query and document routing
● Faceting
● Scripting
● Percolator!
3. Percolator in ElasticSearch
● You add queries to the percolator
● ES will save these in an index
● Later, you can 'percolate' a document and get the matching queries in response
● Optionally percolate documents while indexing
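The register-then-percolate flow above can be sketched in plain Java. This is only a toy model, not how ElasticSearch stores or evaluates queries: the class name ToyPercolator, its methods, and the rule that a query matches when all of its terms appear in the document are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy percolator (hypothetical): stores queries by ID and matches documents against them. */
class ToyPercolator {
    // Query ID -> terms that must all appear in a document for the query to match.
    private final Map<String, Set<String>> queries = new LinkedHashMap<>();

    /** Lower-cases the text and splits it on runs of characters other than a-z. */
    static Set<String> tokenize(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        Set<String> terms = new HashSet<>(Arrays.asList(lower.split("[^a-z]+")));
        terms.remove(""); // split() yields an empty token when the text starts with a separator
        return terms;
    }

    /** Step 1: add a query to the percolator. The text is treated as an AND of its terms. */
    void register(String queryId, String queryText) {
        queries.put(queryId, tokenize(queryText));
    }

    /** Step 2: percolate a document and get the IDs of the matching queries. */
    List<String> percolate(String document) {
        Set<String> docTerms = tokenize(document);
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : queries.entrySet()) {
            Set<String> required = entry.getValue();
            // An empty query is treated as matching nothing.
            if (!required.isEmpty() && docTerms.containsAll(required)) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        ToyPercolator percolator = new ToyPercolator();
        percolator.register("q1", "error disk");
        percolator.register("q2", "timeout");
        System.out.println(percolator.percolate("ERROR: disk full on node-3")); // [q1]
    }
}
```

Note the inversion compared with ordinary search: the queries are the stored data, and the document is the input.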
4. esClient.preparePercolate(indexName, typeName)
       .setSource(doc) // JSON document
       .execute() // Gives a listenable Future
       .addListener(new ActionListener<PercolateResponse>() {
           @Override
           public void onResponse(PercolateResponse response) {
               // Get IDs of matching queries
               List<String> matchingQueries = response.matches();
               // Have fun
           }
       });
6. Log debugging
● Logstash pushes logs to a web server
● Clients register queries with the server
● Server routes incoming log messages to matching queries
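A minimal sketch of that routing step, assuming a hypothetical LogRouter class: each client registers one query, and every incoming log line is delivered to the clients whose query matches. A Predicate<String> stands in for a real percolator query and an in-memory list stands in for the network push to the client; neither comes from the original setup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical log router: forwards each log line to the clients whose query matches it. */
class LogRouter {
    // Client ID -> registered query. A Predicate stands in for a real search query.
    private final Map<String, Predicate<String>> queries = new LinkedHashMap<>();
    // Client ID -> log lines delivered so far (stands in for a socket or HTTP push).
    private final Map<String, List<String>> inboxes = new LinkedHashMap<>();

    /** Registers (or replaces) the query for a client. */
    void register(String clientId, Predicate<String> query) {
        queries.put(clientId, query);
        inboxes.putIfAbsent(clientId, new ArrayList<>());
    }

    /** Delivers the log line to every matching client and returns their IDs. */
    List<String> route(String logLine) {
        List<String> recipients = new ArrayList<>();
        for (Map.Entry<String, Predicate<String>> entry : queries.entrySet()) {
            if (entry.getValue().test(logLine)) {
                inboxes.get(entry.getKey()).add(logLine);
                recipients.add(entry.getKey());
            }
        }
        return recipients;
    }

    /** Returns a read-only view of the lines delivered to a client. */
    List<String> inbox(String clientId) {
        List<String> lines = inboxes.get(clientId);
        return lines == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(lines);
    }

    public static void main(String[] args) {
        LogRouter router = new LogRouter();
        router.register("alice", line -> line.contains("ERROR"));
        router.register("bob", line -> line.contains("payment"));
        System.out.println(router.route("ERROR payment gateway timed out")); // [alice, bob]
        System.out.println(router.route("INFO server started"));             // []
    }
}
```

In a real deployment the predicate check would be replaced by a percolate call, so the server never has to evaluate the queries itself.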
7. How does it work?
● Lucene's MemoryIndex
● Holds a single document in the index
● For each incoming document
○ Clear the index
○ Add the doc to the index
○ Run every registered query against the index, one by one
■ if score > 0: add query ID to matched list
○ return matched list
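11. // Imports assume Lucene 4.x package names
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.core.SimpleAnalyzer;
    import org.apache.lucene.index.memory.MemoryIndex;
    import org.apache.lucene.queryparser.classic.ParseException;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.util.Version;
    public Percolator() {
        queries = new ArrayList<Query>();
        index = new MemoryIndex();
    }
    public void addQuery(String query) throws ParseException {
        Analyzer analyzer = new SimpleAnalyzer(VERSION);
        QueryParser parser = new QueryParser(VERSION,
                F_CONTENT, analyzer);
        queries.add(parser.parse(query));
    }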
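12. public List<Query> getMatchingQueries(String doc) {
        List<Query> matching = new ArrayList<Query>();
        // The shared MemoryIndex is not thread safe for writes, so hold the lock
        // for the whole reset/index/search cycle, not only while indexing.
        synchronized (index) {
            index.reset();
            index.addField(F_CONTENT, doc,
                    new SimpleAnalyzer(VERSION));
            for (Query qry : queries) {
                // search() returns a relevance score; 0.0f means no match
                if (index.search(qry) > 0.0f) {
                    matching.add(qry);
                }
            }
        }
        return matching;
    }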
13. Miscellaneous
● Adding documents is not thread-safe
● "Typically, it is about 10-100 times faster than RAMDirectory"
● "Memory consumption is probably larger than for RAMDirectory"
● Indexing a field is O(N) best case, O(N log N) worst case, where N = number of tokens