This presentation introduces Stratio's open-source, Lucene-based implementation of Cassandra secondary indexes. It lets users run complex queries in Cassandra using CQL3, including full-text search, top-k queries, and free multivariable search. Relevance queries and filters can be combined to express searches such as “give me the 100 tweets written in a certain date range that best match this phrase”.
Cluster-wide relevance search retrieves the N most relevant results that meet a given condition. It is implemented through a modified version of Cassandra’s storage proxy: the coordinator node requests the N best results from each node in the cluster in parallel, then merges the partial results and keeps the N best overall.
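The scatter-gather step described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Stratio's storage proxy code: the names `node_top_n` and `cluster_top_n`, the in-memory lists standing in for nodes, and the `(score, row_id)` pairs are all assumptions made for the example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def node_top_n(rows, n):
    """Return one node's n best (score, row_id) pairs, best first.

    Hypothetical stand-in for a node answering from its local index.
    """
    return heapq.nlargest(n, rows)


def cluster_top_n(nodes, n):
    """Coordinator: ask every node for its local top n in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partials = list(pool.map(lambda rows: node_top_n(rows, n), nodes))
    # Any row in the global top n is also in its own node's top n,
    # so merging the partial lists loses no candidate.
    merged = [hit for partial in partials for hit in partial]
    return heapq.nlargest(n, merged)


# Three simulated nodes, each holding (relevance score, row id) pairs.
nodes = [
    [(0.91, "a"), (0.10, "b"), (0.55, "c")],
    [(0.87, "d"), (0.30, "e")],
    [(0.99, "f"), (0.42, "g"), (0.60, "h")],
]
top3 = cluster_top_n(nodes, 3)
```

With this approach the coordinator handles at most N results per node rather than every matching row, which is what keeps the merge cheap.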
Stratio’s index is fully compatible with Apache Spark and Apache Hadoop because it supports all the key and token restrictions in CQL3 statements. Filters are a powerful aid when analyzing data stored in Cassandra with MapReduce frameworks such as Hadoop or, even better, Spark: filtering the job input avoids a full data scan and dramatically reduces the amount of data to be processed.
Any column of a table can be indexed, including primary key columns and collections. CQL3 wide rows are also supported.