Advanced search and
Top-K queries in Cassandra
Andrés de la Peña
andres@stratio.com
@a_de_la_pena
Apache Cassandra Meetup 2015
Who are we?
•  Stratio is a Big Data Company
•  Founded in 2013
•  Commercially launched in 2014
•  70+ employees in Madrid
•  Office in San Francisco
•  Certified Spark distribution
CONTENTS
1. Introduction to Cassandra
2. Cassandra query methods
3. Stratio Lucene based 2i implementation
4. Integrating Lucene 2i with Apache Spark
Apache Cassandra overview
Tunable consistency
Tradeoffs between consistency and latency are tunable. C* favors high availability and partition tolerance over consistency; strong consistency can be achieved, but there is no row locking.
Incremental scalability
Nodes added to a cluster increase throughput in a predictable, linear fashion.
The best of Dynamo & Bigtable
Combines the partitioning and replication of Amazon’s Dynamo with the log-structured data model of Google’s Bigtable.
Decentralized
P2P architecture without master node or single point of failure.
Apache Cassandra operators
Cassandra query methods
[Figure: primary key, secondary indexes and token ranges plotted by Throughput versus Expressiveness]
Primary key queries
•  O(1) node lookup for partition key
•  Range slices for clustering key
•  Usually requires denormalization
[Figure: a CLIENT requests a clustering key range within one partition key; rows for the partition keys apena, aagea and dhiguero are spread over Node 1, Node 2 and Node 3, each row holding date-keyed cells such as 2014-04-10:body]
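The two access paths on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-node `RING`, the MD5-based `token` function (standing in for Cassandra's real partitioner) and the helper names `node_for` and `range_slice` are all hypothetical, not Cassandra APIs.

```python
import bisect
import hashlib

# Hypothetical ring: each node owns every token up to its upper boundary.
RING = [(2**64 // 3, "node1"), (2 * 2**64 // 3, "node2"), (2**64 - 1, "node3")]


def token(partition_key: str) -> int:
    """Hash a partition key to a 64-bit token (MD5 is a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def node_for(partition_key: str) -> str:
    """Locate the owning node from the hash alone, without asking other nodes."""
    boundaries = [boundary for boundary, _ in RING]
    return RING[bisect.bisect_left(boundaries, token(partition_key))][1]


def range_slice(partition, start, end):
    """Return the cells with start <= clustering key <= end.

    `partition` is a list of (clustering_key, value) pairs kept sorted by
    clustering key, so the slice is one contiguous run.
    """
    keys = [key for key, _ in partition]
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_right(keys, end)
    return partition[lo:hi]
```

Because both steps depend only on the primary key, any query that filters on another column needs its own table, which is why this method usually requires denormalization.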
Secondary indexes queries
•  Inverted index
•  Mitigates denormalization
•  Queries may involve all C* nodes
•  Queries limited to a single column
[Figure: a CLIENT and three C* nodes, each with its own 2i local column family]
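A minimal sketch of why these queries may touch every node: each node keeps an inverted index over its own rows only, so an equality lookup has to be scattered to all nodes and the matches gathered. The `Node` class and `query_2i` function are hypothetical names for illustration, not Cassandra internals.

```python
from collections import defaultdict


class Node:
    """A node holding its own rows plus a local inverted index on one column."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                  # primary key -> row
        self.index = defaultdict(set)   # column value -> primary keys

    def insert(self, key, row):
        old = self.rows.get(key)
        if old is not None:
            # Drop the stale index entry before overwriting the row.
            self.index[old[self.indexed_column]].discard(key)
        self.rows[key] = row
        self.index[row[self.indexed_column]].add(key)

    def lookup(self, value):
        """Return the local rows whose indexed column equals `value`."""
        return [self.rows[key] for key in sorted(self.index.get(value, ()))]


def query_2i(nodes, value):
    """Scatter the equality lookup to every node and gather the matches."""
    results = []
    for node in nodes:
        results.extend(node.lookup(value))
    return results
```

The index maps a single column value to keys, which mirrors the single-column limit listed above.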
Token range queries
•  Used by MapReduce frameworks such as Hadoop or Spark
•  All kinds of queries are possible
•  Low throughput
•  Ad-hoc queries
•  Batch processing
•  Materialized views
[Figure: CLIENT, Spark master and three C* nodes; query = function(all data)]
Combining 2i with MapReduce
•  Expressiveness avoiding full scans
•  Still limited by one indexed column per query
[Figure: CLIENT, Spark master and three C* nodes, each with a secondary index]
MORE EXPRESSIVENESS
What do we miss from 2i indexes?
•  Range queries
•  Multivariable search
•  Full text search
•  Sorting by fields
•  Top-k queries
ITS ARCHITECTURE
What do we like from the existing 2i?
•  Each node indexes its own data
•  The index implementations do not need to be distributed
•  Can be created after design and ingestion
•  Natural extension point
Thinking about a custom secondary index implementation…
WHY NOT USE LUCENE?
Why we like Lucene
•  Proven stable and fast indexing solution
•  Expressive queries
- Multivariable, ranges, full text, sorting, top-k, etc.
•  Mature distributed search solutions built on top of it
- Solr, ElasticSearch
•  Can be fully embedded in application code
•  Published under the Apache License
HOW IT WORKS
Create index
•  Built in the background at any moment
•  Real time updates
•  Mapping eases ETL
•  Language aware

CREATE TABLE tweets (
    id bigint,
    createdAt timestamp,
    message text,
    userid bigint,
    username text,
    PRIMARY KEY (userid, createdAt, id)
);

ALTER TABLE tweets ADD lucene text;

CREATE CUSTOM INDEX tweets_idx ON twitter.tweets (lucene)
USING 'com.stratio.index.RowIndex'
WITH OPTIONS = {
    'refresh_seconds' : '60',
    'schema' : '{
        default_analyzer : "EnglishAnalyzer",
        fields : {
            createdat : {type : "date", pattern : "yyyy-MM-dd"},
            message   : {type : "text", analyzer : "EnglishAnalyzer"},
            userid    : {type : "string"},
            username  : {type : "string"}
        }
    }'
};
Filtering query
SELECT * FROM tweets WHERE lucene = '{
    filter : {type  : "match",
              field : "text",
              value : "cassandra"}
}' LIMIT 10;
[Figure: the CLIENT searches for 10 rows across three C* nodes, each with a local Lucene index; one node finds 6, another finds 4, and the query is done]
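The "search 10, found 6, found 4, done" flow can be sketched as below. It assumes the nodes are asked one after another and that each node is modelled as a plain list of rows; the real coordinator may contact nodes differently, and `filter_query` is a hypothetical helper, not part of the index.

```python
def filter_query(nodes, predicate, limit):
    """Collect matching rows node by node, stopping once `limit` is reached.

    Returns the rows found and the number of nodes that had to be contacted.
    A filter has no ranking, so any `limit` matching rows are a valid answer.
    """
    results = []
    contacted = 0
    for rows in nodes:
        if len(results) >= limit:
            break  # enough rows already: the remaining nodes are never asked
        contacted += 1
        matches = [row for row in rows if predicate(row)]
        results.extend(matches[: limit - len(results)])
    return results, contacted
```

With 6 matches on the first node and 4 on the second, a limit of 10 is satisfied without touching the third node.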
Top-k query
SELECT * FROM tweets WHERE lucene = '{
    query : {type  : "match",
             field : "text",
             value : "cassandra"}
}' LIMIT 5;
[Figure: the CLIENT asks each of three C* nodes, each with a local Lucene index, to search its top 5; the nodes find 5, 4 and 5 rows, and the 14 candidates are merged to the best 5]
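Unlike a filter, a top-k query cannot stop early: the best rows overall may sit on any node, so every node returns its local best k and the candidates are merged. A minimal sketch, assuming each node is a list of rows and `score` is any relevance function (both are illustrative; the actual scoring comes from Lucene):

```python
import heapq


def top_k(nodes, score, k):
    """Ask every node for its local top k, then merge to the global top k.

    Returns the k best rows and the number of candidates that were merged.
    """
    per_node = [heapq.nlargest(k, rows, key=score) for rows in nodes]
    candidates = [row for best in per_node for row in best]
    return heapq.nlargest(k, candidates, key=score), len(candidates)
```

Each node sends at most k rows, so the merge step handles at most k times the number of nodes, regardless of how many rows match.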
Queries can be as complex as you want
SELECT * FROM tweets WHERE lucene = '{
    filter : {
        type : "boolean",
        must : [
            {type : "range", field : "time", lower : "2014/04/25"},
            {type : "boolean", should : [
                {type : "prefix", field : "user", value : "a"},
                {type : "wildcard", field : "user", value : "*b*"},
                {type : "match", field : "user", value : "fast"}
            ]}
        ]
    },
    sort : {
        fields : [ {field : "time", reverse : true},
                   {field : "user", reverse : false} ]
    }
}' LIMIT 10000;
NO MAINTENANCE REQUIRED
Some implementation details
•  A Lucene document per CQL row, and a Lucene field per indexed column
•  SortingMergePolicy keeps index sorted in the same way that C* does
•  Index commits synchronized with column family flushes
•  Segments merge synchronized with column family compactions
LUCENE AND SPARK
Integrating Lucene & Spark
Split friendly: it supports searches within a token range
SELECT * FROM tweets WHERE lucene = '{
    filter : {type : "match", field : "text", value : "cassandra"}
}'
AND TOKEN(userid) > 253653456456
AND TOKEN(userid) <= 3456467456756
LIMIT 10000;
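A sketch of how a MapReduce-style job might use this: cut the token ring into contiguous ranges and issue one index query per range, so each split can be processed in parallel. The signed 64-bit bounds assume the Murmur3 partitioner, the token is taken over the partition key `userid`, and `token_splits` and `split_query` are hypothetical helpers rather than part of any connector.

```python
MIN_TOKEN = -2**63       # assumed Murmur3 lower bound
MAX_TOKEN = 2**63 - 1    # assumed Murmur3 upper bound


def token_splits(n):
    """Divide the full token range into n contiguous (start, end] ranges."""
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // n for i in range(n)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))


def split_query(lucene_json, start, end, limit=10000):
    """Build the CQL text for one split, restricting the search to its range."""
    return (
        f"SELECT * FROM tweets WHERE lucene = '{lucene_json}' "
        f"AND TOKEN(userid) > {start} AND TOKEN(userid) <= {end} "
        f"LIMIT {limit};"
    )
```

For example, `[split_query(search, lo, hi) for lo, hi in token_splits(8)]` yields eight statements that together cover the ring once, where `search` is the JSON search string.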
Integrating Lucene & Spark
Paging friendly: it supports starting queries at a certain point
SELECT * FROM tweets WHERE lucene = '{
    filter : {type : "match", field : "text", value : "cassandra"}
}'
AND userid = 3543534
AND createdAt > '2011-02-03 04:05+0000'
LIMIT 5000;
Integrating Lucene & Spark
•  Compute large amounts of data
•  Avoid systematic full scan
•  Reduces the amount of data to be processed
•  Filtering push-down
[Figure: CLIENT, Spark master and three C* nodes, each with Lucene]
WHEN TO USE INDEXES AND WHEN TO USE FULL SCAN
Index performance in Spark
[Figure: Time versus Records returned, comparing Full scan with Lucene 2i]
DEMO: Lucene indexes in C*
Conclusions
•  Added new query methods
- Multivariable queries (AND, OR, NOT)
- Range queries (>, >=, <, <=) and regular expressions
- Full text queries (match, phrase, fuzzy...)
•  Top-k query support
- Lucene scoring formula
- Sort by field values
•  Compatible with MapReduce frameworks
•  Preserves Cassandra’s functionality
It's open source
github.com/stratio/stratio-cassandra
•  Published as fork of Apache Cassandra
•  Apache License Version 2.0
stratio.github.io/crossdata
•  Apache License Version 2.0
github.com/stratio/deep-spark
•  Apache License Version 2.0

More Related Content

What's hot

Spark Summit - Stratio Streaming
Spark Summit - Stratio Streaming Spark Summit - Stratio Streaming
Spark Summit - Stratio Streaming Stratio
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraPatrick McFadin
 
An Introduction to Distributed Search with Datastax Enterprise Search
An Introduction to Distributed Search with Datastax Enterprise SearchAn Introduction to Distributed Search with Datastax Enterprise Search
An Introduction to Distributed Search with Datastax Enterprise SearchPatricia Gorla
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraJoe Stein
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Tugdual Grall
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and CassandraNatalino Busa
 
Kafka Lambda architecture with mirroring
Kafka Lambda architecture with mirroringKafka Lambda architecture with mirroring
Kafka Lambda architecture with mirroringAnant Rustagi
 
Lambda Architecture Using SQL
Lambda Architecture Using SQLLambda Architecture Using SQL
Lambda Architecture Using SQLSATOSHI TAGOMORI
 
Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data prajods
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Anton Kirillov
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkRahul Kumar
 
Spark with Cassandra by Christopher Batey
Spark with Cassandra by Christopher BateySpark with Cassandra by Christopher Batey
Spark with Cassandra by Christopher BateySpark Summit
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...DataStax Academy
 
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Summit
 
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
Sa introduction to big data pipelining with cassandra &amp; spark   west mins...Sa introduction to big data pipelining with cassandra &amp; spark   west mins...
Sa introduction to big data pipelining with cassandra &amp; spark west mins...Simon Ambridge
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Brian O'Neill
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015Robbie Strickland
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache SparkMammoth Data
 

What's hot (20)

Spark Summit - Stratio Streaming
Spark Summit - Stratio Streaming Spark Summit - Stratio Streaming
Spark Summit - Stratio Streaming
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 
An Introduction to Distributed Search with Datastax Enterprise Search
An Introduction to Distributed Search with Datastax Enterprise SearchAn Introduction to Distributed Search with Datastax Enterprise Search
An Introduction to Distributed Search with Datastax Enterprise Search
 
Cassandra & Spark for IoT
Cassandra & Spark for IoTCassandra & Spark for IoT
Cassandra & Spark for IoT
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
 
Kafka Lambda architecture with mirroring
Kafka Lambda architecture with mirroringKafka Lambda architecture with mirroring
Kafka Lambda architecture with mirroring
 
Lambda Architecture Using SQL
Lambda Architecture Using SQLLambda Architecture Using SQL
Lambda Architecture Using SQL
 
Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
 
Spark with Cassandra by Christopher Batey
Spark with Cassandra by Christopher BateySpark with Cassandra by Christopher Batey
Spark with Cassandra by Christopher Batey
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
 
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
 
Lambda architecture
Lambda architectureLambda architecture
Lambda architecture
 
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
Sa introduction to big data pipelining with cassandra &amp; spark   west mins...Sa introduction to big data pipelining with cassandra &amp; spark   west mins...
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 

Viewers also liked

Apache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsApache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsJulien Anguenot
 
Functional programming in scala
Functional programming in scalaFunctional programming in scala
Functional programming in scalaStratio
 
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014Stratio
 
Solr & Cassandra: Searching Cassandra with DataStax Enterprise
Solr & Cassandra: Searching Cassandra with DataStax EnterpriseSolr & Cassandra: Searching Cassandra with DataStax Enterprise
Solr & Cassandra: Searching Cassandra with DataStax EnterpriseDataStax Academy
 
Cassandra Materialized Views
Cassandra Materialized ViewsCassandra Materialized Views
Cassandra Materialized ViewsCarl Yeksigian
 
Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the ...
Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the ...Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the ...
Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the ...DataStax Academy
 
Migration Best Practices: From RDBMS to Cassandra without a Hitch
Migration Best Practices: From RDBMS to Cassandra without a HitchMigration Best Practices: From RDBMS to Cassandra without a Hitch
Migration Best Practices: From RDBMS to Cassandra without a HitchDataStax Academy
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaSpark Summit
 
Bulk Loading Data into Cassandra
Bulk Loading Data into CassandraBulk Loading Data into Cassandra
Bulk Loading Data into CassandraDataStax
 
700 Queries Per Second with Updates: Spark As A Real-Time Web Service
700 Queries Per Second with Updates: Spark As A Real-Time Web Service700 Queries Per Second with Updates: Spark As A Real-Time Web Service
700 Queries Per Second with Updates: Spark As A Real-Time Web ServiceSpark Summit
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 
Seminario Web MongoDB-Paradigma: Cree aplicaciones más escalables utilizando ...
Seminario Web MongoDB-Paradigma: Cree aplicaciones más escalables utilizando ...Seminario Web MongoDB-Paradigma: Cree aplicaciones más escalables utilizando ...
Seminario Web MongoDB-Paradigma: Cree aplicaciones más escalables utilizando ...MongoDB
 
Cassandra and materialized views
Cassandra and materialized viewsCassandra and materialized views
Cassandra and materialized viewsGrzegorz Duda
 
OpenStack Architecture
OpenStack ArchitectureOpenStack Architecture
OpenStack ArchitectureMirantis
 

Viewers also liked (15)

Apache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsApache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentials
 
Functional programming in scala
Functional programming in scalaFunctional programming in scala
Functional programming in scala
 
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
Solr & Cassandra: Searching Cassandra with DataStax Enterprise
Solr & Cassandra: Searching Cassandra with DataStax EnterpriseSolr & Cassandra: Searching Cassandra with DataStax Enterprise
Solr & Cassandra: Searching Cassandra with DataStax Enterprise
 
Cassandra Materialized Views
Cassandra Materialized ViewsCassandra Materialized Views
Cassandra Materialized Views
 
Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the ...
Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the ...Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the ...
Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the ...
 
Migration Best Practices: From RDBMS to Cassandra without a Hitch
Migration Best Practices: From RDBMS to Cassandra without a HitchMigration Best Practices: From RDBMS to Cassandra without a Hitch
Migration Best Practices: From RDBMS to Cassandra without a Hitch
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
 
Bulk Loading Data into Cassandra
Bulk Loading Data into CassandraBulk Loading Data into Cassandra
Bulk Loading Data into Cassandra
 
700 Queries Per Second with Updates: Spark As A Real-Time Web Service
700 Queries Per Second with Updates: Spark As A Real-Time Web Service700 Queries Per Second with Updates: Spark As A Real-Time Web Service
700 Queries Per Second with Updates: Spark As A Real-Time Web Service
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
Seminario Web MongoDB-Paradigma: Cree aplicaciones más escalables utilizando ...
Seminario Web MongoDB-Paradigma: Cree aplicaciones más escalables utilizando ...Seminario Web MongoDB-Paradigma: Cree aplicaciones más escalables utilizando ...
Seminario Web MongoDB-Paradigma: Cree aplicaciones más escalables utilizando ...
 
Cassandra and materialized views
Cassandra and materialized viewsCassandra and materialized views
Cassandra and materialized views
 
OpenStack Architecture
OpenStack ArchitectureOpenStack Architecture
OpenStack Architecture
 

Similar to Advanced search and Top-K queries in Cassandra

Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014Andrés de la Peña
 
Cassandra REST API with Pagination TEAM 15
Cassandra REST API with Pagination TEAM 15Cassandra REST API with Pagination TEAM 15
Cassandra REST API with Pagination TEAM 15Akash Kant
 
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...Databricks
 
Spark DataFrames and ML Pipelines
Spark DataFrames and ML PipelinesSpark DataFrames and ML Pipelines
Spark DataFrames and ML PipelinesDatabricks
 
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100xOscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100xshradha ambekar
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Robbie Strickland
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraVictor Coustenoble
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Databricks
 
Analytics with Cassandra & Spark
Analytics with Cassandra & SparkAnalytics with Cassandra & Spark
Analytics with Cassandra & SparkMatthias Niehoff
 
An efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraAn efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraStratio
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksAnyscale
 
Managing data analytics in a hybrid cloud
Managing data analytics in a hybrid cloudManaging data analytics in a hybrid cloud
Managing data analytics in a hybrid cloudKaran Singh
 
Announcing Spark Driver for Cassandra
Announcing Spark Driver for CassandraAnnouncing Spark Driver for Cassandra
Announcing Spark Driver for CassandraDataStax
 
Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Sparknickmbailey
 
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage systemArunit Gupta
 

Similar to Advanced search and Top-K queries in Cassandra (20)

Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
 
Cassandra REST API with Pagination TEAM 15
Cassandra REST API with Pagination TEAM 15Cassandra REST API with Pagination TEAM 15
Cassandra REST API with Pagination TEAM 15
 
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
 
Spark DataFrames and ML Pipelines
Spark DataFrames and ML PipelinesSpark DataFrames and ML Pipelines
Spark DataFrames and ML Pipelines
 
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100xOscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
 
Stratio big data spain
Stratio   big data spainStratio   big data spain
Stratio big data spain
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
 
Analytics with Cassandra & Spark
Analytics with Cassandra & SparkAnalytics with Cassandra & Spark
Analytics with Cassandra & Spark
 
An efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraAn efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and Cassandra
 
Jug - ecosystem
Jug -  ecosystemJug -  ecosystem
Jug - ecosystem
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
Chti jug - 2018-06-26
Chti jug - 2018-06-26Chti jug - 2018-06-26
Chti jug - 2018-06-26
 
Managing data analytics in a hybrid cloud
Managing data analytics in a hybrid cloudManaging data analytics in a hybrid cloud
Managing data analytics in a hybrid cloud
 
Announcing Spark Driver for Cassandra
Announcing Spark Driver for CassandraAnnouncing Spark Driver for Cassandra
Announcing Spark Driver for Cassandra
 
Spark Introduction
Spark IntroductionSpark Introduction
Spark Introduction
 
Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Spark
 
Apache cassandra
Apache cassandraApache cassandra
Apache cassandra
 
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage system
 

Advanced search and Top-K queries in Cassandra

  • 1. Advanced search and Top-K queries in Cassandra. Andrés de la Peña, andres@stratio.com, @a_de_la_pena. Apache Cassandra Meetup 2015
  • 2. Who are we? • Stratio is a Big Data company • Founded in 2013 • Commercially launched in 2014 • 70+ employees in Madrid • Office in San Francisco • Certified Spark distribution
  • 3. Contents: 1. Introduction to Cassandra 2. Cassandra query methods 3. Stratio Lucene-based 2i implementation 4. Integrating Lucene 2i with Apache Spark
  • 4. Apache Cassandra overview • Tunable consistency: tradeoffs between consistency and latency are tunable. C* favors high availability and partition tolerance over consistency; strong consistency can be achieved, but there is no row locking. • Incremental scalability: nodes added to a cluster increase throughput in a predictable, linear fashion. • The best of Dynamo & Bigtable: combines the partitioning and replication of Amazon's Dynamo with the log-structured data model of Google's Bigtable. • Decentralized: P2P architecture with no master node or single point of failure.
  • 5. Apache Cassandra operators
  • 6. Cassandra query methods: primary key, secondary indexes, token ranges (diagram axes: throughput and expressiveness)
  • 7. Primary key queries • O(1) node lookup for partition key • Range slices for clustering key • Usually requires denormalization (Diagram: CLIENT and Nodes 1 to 3; partition keys aagea, dhiguero and apena; a clustering key range over date-keyed columns such as 2014-04-06:body through 2014-04-11:body.)
  • 8. Cassandra query methods: primary key, secondary indexes, token ranges (diagram axes: throughput and expressiveness)
  • 9. Secondary index queries • Inverted index • Mitigates denormalization • Queries may involve all C* nodes • Queries limited to a single column (Diagram: CLIENT querying three C* nodes, each with a local 2i column family.)
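
The behaviour listed on slide 9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cassandra's implementation: the `Node` class, `query_by_value` and the sample rows are hypothetical names invented here. Each node keeps an inverted index (column value to primary keys) over its own rows only, so a query on the indexed column has to ask every node.

```python
from collections import defaultdict


class Node:
    """Hypothetical stand-in for one C* node with a local index on one column."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                  # primary key -> row (dict)
        self.index = defaultdict(set)   # indexed value -> primary keys on this node

    def insert(self, key, row):
        # Simplification: inserts only; updates and deletes are not handled.
        self.rows[key] = row
        self.index[row[self.indexed_column]].add(key)

    def lookup(self, value):
        # Served entirely from this node's local index; no other node is involved.
        return [self.rows[k] for k in sorted(self.index.get(value, ()))]


def query_by_value(nodes, value):
    """The index is local to each node, so the query fans out to all of them."""
    results = []
    for node in nodes:
        results.extend(node.lookup(value))
    return results


nodes = [Node("username") for _ in range(3)]
nodes[0].insert(1, {"id": 1, "username": "apena", "message": "When you.."})
nodes[1].insert(2, {"id": 2, "username": "aagea", "message": "To study and.."})
nodes[2].insert(3, {"id": 3, "username": "apena", "message": "When you do.."})

matches = query_by_value(nodes, "apena")
print([row["id"] for row in matches])
```

Because the index maps a single column's values to keys, a condition on a second column cannot be answered from it, which is the "limited to a single column" restriction on the slide.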
  • 10. Cassandra query methods: primary key, secondary indexes, token ranges (diagram axes: throughput and expressiveness)
  • 11. Token range queries • Used by MapReduce frameworks such as Hadoop or Spark • All kinds of queries are possible • Low throughput • Ad-hoc queries • Batch processing • Materialized views (Diagram: CLIENT, Spark master and three C* nodes; query = function(all data).)
  • 12. Combining 2i with MapReduce • Expressiveness while avoiding full scans • Still limited to one indexed column per query (Diagram: CLIENT, Spark master and three C* nodes, each with a secondary index.)
  • 13. What do we miss from 2i indexes? MORE EXPRESSIVENESS • Range queries • Multivariable search • Full text search • Sorting by fields • Top-k queries
  • 14. What do we like about the existing 2i? ITS ARCHITECTURE • Each node indexes its own data • The index implementations do not need to be distributed • Can be created after design and ingestion • Natural extension point
  • 15. Thinking of a custom secondary index implementation… WHY NOT USE LUCENE?
  • 16. Why we like Lucene • Proven, stable and fast indexing solution • Expressive queries: multivariable, ranges, full text, sorting, top-k, etc. • Mature distributed search solutions built on top of it: Solr, ElasticSearch • Can be fully embedded in application code • Published under the Apache License
  • 17. HOW IT WORKS
  • 18. Create index • Built in the background at any moment • Real-time updates • Mapping eases ETL • Language aware
        CREATE TABLE tweets (
            id bigint,
            createdAt timestamp,
            message text,
            userid bigint,
            username text,
            PRIMARY KEY (userid, createdAt, id)
        );
        ALTER TABLE tweets ADD lucene TEXT;
        CREATE CUSTOM INDEX tweets_idx ON twitter.tweets (lucene)
        USING 'com.stratio.index.RowIndex'
        WITH OPTIONS = {
            'refresh_seconds' : '60',
            'schema' : '{
                default_analyzer : "EnglishAnalyzer",
                fields : {
                    createdat : {type : "date", pattern : "yyyy-MM-dd"},
                    message : {type : "text", analyzer : "EnglishAnalyzer"},
                    userid : {type : "string"},
                    username : {type : "string"}
                }
            }'
        };
  • 19. Filtering query
        SELECT * FROM tweets WHERE lucene = '{
            filter : {type : "match", field : "text", value : "cassandra"}
        }' LIMIT 10;
    (Diagram: CLIENT and three C* nodes, each with a Lucene index; search 10, found 6, found 4, "We are done!")
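
The diagram on slide 19 (search 10, found 6, found 4, done) suggests that a filtering query can stop as soon as enough rows have been collected, because filtered results carry no relevance order. A minimal Python sketch of that idea follows. It assumes nodes are contacted one after another, which may not match the real coordinator; `filtered_search` and the sample data are hypothetical.

```python
def filtered_search(node_indexes, predicate, limit):
    """Collect up to `limit` matching rows, stopping once the limit is met.

    Assumption: nodes are asked sequentially, mirroring the slide's diagram.
    Returns the rows and the number of nodes that had to be contacted.
    """
    rows, contacted = [], 0
    for index in node_indexes:
        if len(rows) >= limit:
            break                       # enough rows: remaining nodes are skipped
        contacted += 1
        remaining = limit - len(rows)
        matches = [row for row in index if predicate(row)]
        rows.extend(matches[:remaining])
    return rows, contacted


# Three hypothetical nodes holding 6, 4 and 7 matching tweets plus some noise.
node_1 = ["cassandra tweet %d" % i for i in range(6)] + ["other"]
node_2 = ["cassandra tweet %d" % i for i in range(6, 10)] + ["other"]
node_3 = ["cassandra tweet %d" % i for i in range(10, 17)]

rows, contacted = filtered_search(
    [node_1, node_2, node_3], lambda text: "cassandra" in text, limit=10
)
print(len(rows), contacted)
```

With these inputs the first node contributes 6 rows and the second 4, so the third node is never asked.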
  • 20. Top-k query
        SELECT * FROM tweets WHERE lucene = '{
            query : {type : "match", field : "text", value : "cassandra"}
        }' LIMIT 5;
    (Diagram: the CLIENT's top-5 search goes to three C* nodes, each with a Lucene index; they find 5, 4 and 5 rows; the 14 candidates are merged to the best 5.)
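
The merge step on slide 20 can be sketched as follows. Unlike a filter, a relevance-ranked query cannot stop early: every node has to return its own best k, since any of them might hold the globally best rows. This is an illustrative Python sketch, not the plugin's code; the function names and the (score, id) tuples are assumptions, with made-up scores standing in for Lucene's.

```python
import heapq


def local_top_k(scored_rows, k):
    """Each node returns at most k of its rows, best score first."""
    return heapq.nlargest(k, scored_rows, key=lambda row: row[0])


def merge_top_k(per_node_results, k):
    """The coordinator merges all candidates and keeps only the best k."""
    candidates = [row for node_rows in per_node_results for row in node_rows]
    return heapq.nlargest(k, candidates, key=lambda row: row[0])


# Hypothetical (score, tweet id) pairs on three nodes.
node_a = [(0.9, "t1"), (0.8, "t2"), (0.4, "t3"), (0.3, "t4"), (0.2, "t5"), (0.1, "t6")]
node_b = [(0.95, "t7"), (0.7, "t8"), (0.6, "t9"), (0.05, "t10")]
node_c = [(0.85, "t11"), (0.5, "t12"), (0.45, "t13"), (0.35, "t14"), (0.15, "t15")]

partials = [local_top_k(node, 5) for node in (node_a, node_b, node_c)]
best = merge_top_k(partials, 5)
print(sum(len(p) for p in partials), [tweet for _, tweet in best])
```

Here the nodes return 5, 4 and 5 candidates, and the 14 candidates are reduced to the best 5, as in the diagram.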
  • 21. Queries can be as complex as you want
        SELECT * FROM tweets WHERE lucene = '{
            filter : {
                type : "boolean",
                must : [
                    {type : "range", field : "time", lower : "2014/04/25"},
                    {type : "boolean", should : [
                        {type : "prefix", field : "user", value : "a"},
                        {type : "wildcard", field : "user", value : "*b*"},
                        {type : "match", field : "user", value : "fast"}
                    ]}
                ]
            },
            sort : {
                fields : [
                    {field : "time", reverse : true},
                    {field : "user", reverse : false}
                ]
            }
        }' LIMIT 10000;
  • 22. Some implementation details • A Lucene document per CQL row, and a Lucene field per indexed column • SortingMergePolicy keeps the index sorted in the same way that C* does • Index commits are synchronized with column family flushes • Segment merges are synchronized with column family compactions • NO MAINTENANCE REQUIRED
  • 24. Integrating Lucene & Spark. Split friendly: it supports searches within a token range
        SELECT * FROM tweets WHERE lucene = '{
            filter : {type : "match", field : "text", value : "cassandra"}
        }'
        AND TOKEN(userid) > 253653456456
        AND TOKEN(userid) <= 3456467456756
        LIMIT 10000;
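
To show how a framework such as Spark could use the split-friendly property on slide 24, the sketch below divides the token ring into contiguous ranges and builds one query per range. It rests on several assumptions that are not in the slides: the Murmur3 token range of -2^63 to 2^63-1, evenly sized splits (real connectors usually follow the ring's actual ownership), and the hypothetical helper names `split_token_ring` and `split_queries`. The `TOKEN` function is applied to `userid`, the partition key of the `tweets` table.

```python
MIN_TOKEN = -2**63       # assumption: Murmur3Partitioner token range
MAX_TOKEN = 2**63 - 1


def split_token_ring(num_splits):
    """Return contiguous (start, end) token bounds that cover the whole ring."""
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // num_splits for i in range(num_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def split_queries(lucene_filter, num_splits):
    """Build one CQL string per split; each can be run by a separate task."""
    queries = []
    for i, (start, end) in enumerate(split_token_ring(num_splits)):
        lower_op = ">=" if i == 0 else ">"   # first split includes the lowest token
        queries.append(
            "SELECT * FROM tweets "
            f"WHERE lucene = '{lucene_filter}' "
            f"AND TOKEN(userid) {lower_op} {start} "
            f"AND TOKEN(userid) <= {end};"
        )
    return queries


lucene_filter = '{filter : {type : "match", field : "text", value : "cassandra"}}'
ranges = split_token_ring(4)
for query in split_queries(lucene_filter, 4):
    print(query)
```

Since every range ends where the next one starts, each row falls into exactly one split and the per-split results can simply be concatenated.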
  • 25. Integrating Lucene & Spark. Paging friendly: it supports starting queries at a certain point
        SELECT * FROM tweets WHERE lucene = '{
            filter : {type : "match", field : "text", value : "cassandra"}
        }'
        AND userid = 3543534
        AND createdAt > '2011-02-03 04:05+0000'
        LIMIT 5000;
  • 26. Integrating Lucene & Spark • Compute large amounts of data • Avoid systematic full scans • Reduces the amount of data to be processed • Filtering push-down (Diagram: CLIENT, Spark master and three C* nodes, each with a Lucene index.)
  • 27. WHEN TO USE INDEXES AND WHEN TO USE FULL SCAN
  • 28. Index performance in Spark (Chart: time against records returned, comparing full scan and Lucene 2i.)
  • 29. DEMO: Lucene indexes in C*
  • 30. Conclusions • Added new query methods: multivariable queries (AND, OR, NOT); range queries (>, >=, <, <=) and regular expressions; full text queries (match, phrase, fuzzy...) • Top-k query support: Lucene scoring formula; sort by field values • Compatible with MapReduce frameworks • Preserves Cassandra's functionality
  • 31. It's open source • github.com/stratio/stratio-cassandra: published as a fork of Apache Cassandra, Apache License Version 2.0 • stratio.github.io/crossdata: Apache License Version 2.0 • github.com/stratio/deep-spark: Apache License Version 2.0
  • 32. Advanced search and Top-K queries in Cassandra. Andrés de la Peña, andres@stratio.com, @a_de_la_pena