SlideShare a Scribd company logo
Introducing
RediSearch Aggregations
Super fast data processing pipeline!
RediSearch 101
•High Performance Search Index
•Redis Module
•Written From Scratch in C
•Supports multiple index types
•Open Source / Commercial
– Commercial is scalable and distributed
– Several commercial customers
– Largest cluster: 200 shards, ~2B documents, 8TB RAM
What Can RediSearch Do?
•Full-Text, Numeric, Geo and Tag indexes
•Real-time indexing and updates
•Distributed search on Billions of documents
•Complex query language
•Auto-Complete
•Highlighting
•And….
NEW! AGGREGATIONS!
Aggregations In Practice
• Transform plain search results into statistical insights
•Using a dynamic pipeline of:
– Grouping and Reducing
– Applying Transformations
– Sorting
GROUP
Result Stream
REDUCE
REDUCE
APPLY SORT
Aggregate vs Search
Search Request
•Top N relevant results
•Processing is minimal if any
•Focus on filtering/ranking
“People named Homer
Simpson”
Aggregate Request
• Reduce and transform results
• Complex processing pipeline
• Used for analytics and insights
“Average age of people by
surname”
• Let’s take all Stack Overflow questions history
• Count the number of questions asked by month
• We have
– Timestamp
– Tags
– Question Text
– Etc
A Little Taste
Count Questions By Month
FT.AGGREGATE idx “*”
APPLY month(@time) AS month
GROUPBY 1 @month
REDUCE COUNT 0 AS count
SORTBY 1 @month
Aggregation Result
Building The Aggregation Pipeline
Step 1: Filter
Filter
A regular RediSearch Query
Step 2: Group
Group by one or more fields
Filter Group
Step 2: Group
Group by one or more fields
Filter
Group
Reduce
Reduce
Step 3: Apply projections
Functions, expressions, whatever
Filter
Group
Reduce
Apply
Step 4: Sort
By one or more fields
Or output of previous steps
Filter Apply Sort
Group
Reduce
Add More Steps
Re-group, apply more, re-sort, page, etc.
Filter Apply Sort ?
Group
Reduce
Aggregate Filters
• Determine what results we process
• Can be used to create complex selections
@country:{Israel | Italy}
@age:[25 52]
@profession:”1337 h4x0r”
-@languages:{perl | php}
GROUPBY and Reduce
• Grouping by any number of fields / upstream properties
• A GROUPBY is associated with many reducers
• Reducers flatten the a group’s stream into one result
Group
Results
REDUCE
Single
Result
GROUP
Group
Results
REDUCE
Single
Result
Result Stream
Result Stream
Available Reducers
• count
• cont_distinct(ish)
• min/max
• avg
• sum
• stddev
• quantile
• first_value
• tolist
• random_sample
Result Stream
REDUCER
Single Result
APPLY Projections
• APPLY applies 1:1 transformations on results
• Evaluate a complex expression per result
• Operate on previous output
APPLY
Single ResultSingle Result
APPLY Complex Expressions
APPLY “sqrt((@a + @b)/2)” AS avg
APPLY “timefmt(day(@time))” AS date
APPLY “format(‘Hello, %s’, @userName)” AS msg
Numeric Functions and Expressions
• Numeric operators: + - / * % ^
• log(x), log2(x)
• floor(x), ciel(x)
• abs(x)
• sqrt(x)
• exp(x)
String Functions and Expressions
• lower(s)
• upper(s)
• substr(s, offset, len)
• format(fmt, …)
• matched_terms([max])
Time Utility Functions
• timefmt(unix_timestamp, [fmt]) → string
• parsetime(str, [fmt]) → timestamp
• minute(timestamp) → timestamp
• hour(timestamp) → timestamp
• day(timestamp) → timestamp
• year(timestamp) → integer
• dayofmonth(timestamp) → integer
• dayofweek(timestamp) → integer
• monthofyear(timestamp) → integer
SORTBY
•Sort by one or more properties
•Each can be ASC/DESC
•MAX limits to top-N results
LIMIT vs. SORTBY MAX
• SORTBY … MAX {n}
• LIMIT {offset} {num}
• Getting results 10-20 will work best as:
SORTBY … MAX 20 LIMIT 10 10
Distributed Aggregations
What If Our Data Is Split Across Nodes?
Classic Search Approach:
Coordinator
Index Shard
Top K
Top K
Top
K
Search: “Hello”
That Doesn’t Translate to Aggregation Pipelines
Coordinator
Index Shard
count_distinctx(x)
count_distinctx(x)
count_distinctx(x)
count_distinct(x)
That Doesn’t Translate to Aggregation Pipelines
Coordinator
Index Shard
count_distinctx(x)
count_distinctx(x)
count_distinctx(x)
count_distinct(x)
Distributing Aggregations: Naive Approach
Filter: “redis”
Group: by date
Reduce: count
Sort: by count
Original Pipeline
Distributing Aggregations: Naive Approach
Filter: “redis”
Group: by date
Reduce: count
Sort: by count
Coordinator Pipeline Shard Pipeline
Distribute To Shards
Merge
Problem: Every Record Must Be Transferred
COUNT_DISTINCT(x)
Index Shard
GET ALL (x)
GET ALL (x)
GET
ALL
(x)
count_distinct(x)
Push Down: 1 Record Per Group Transferred
Filter: “redis”
Group: by date
Reduce: COUNT
Sort: by sum(counts)
Coordinator Pipeline Shard Pipeline
Group: by date
Reduce: SUM(COUNTS)
Distribute To Shards
Merge
Example: Simple Counting
Client
COUNT() SUM(COUNT) COUNT
Example: MIN/MAX
Client
MAX(x) MAX(MAX(x)) MAX(x)
Example: Average
Client
AVG(x) SUM(SUM(x)) → A
SUM(COUNT) → B
A/B → AVG(x)
SUM(x)
COUNT
Example: Count Distinct Elements
Client
COUNT_DISTINCT(x) HLL_SUM(HLL(x)) HLL(x)
Example: Median
Client
MEDIAN(x) MEDIAN(SAMPLE(x)) RESERVOIR
SAMPLE(x)
FT.AGGREGATE idx “*”
APPLY month(@time) AS month
GROUPBY 1 @month
REDUCE COUNT 0 AS count
SORTBY 1 @month
The Query Plan Translator
FT.AGGREGATE idx “*”
APPLY month(@time) AS month
GROUPBY 1 @month
REDUCE COUNT 0 AS count
FT.AGGREGATE {DISTRIBUTE(...)}
GROUPBY 1 @month
REDUCE SUM 1 @count AS count
SORTBY 1 @month
• Two or more GROUPBYs - not possible / easy
• High number of groups - still slow
• Exact count distinct and quantiles are slow
• If you really want to parallelize huge workloads - use Spark et
al
Limitations
Aggregations In Practice
The Dataset
• All Stack Overflow Questions
• The Schema:
– Title / body
– Tags
– Answers
– Asking User Id
– Rating
– Timestamp
Count Questions By Month
FT.AGGREGATE idx “*”
APPLY month(@time) AS month
GROUPBY 1 @month
REDUCE COUNT 0 AS count
SORTBY 1 @month
Compare Tag Popularity Over Time
FT.AGGREGATE idx “@tags:{java|python|c}”
APPLY month(@time) AS month
APPLY matched_terms() AS tag
GROUPBY 2 @month @tag
REDUCE COUNT 0 AS count
SORTBY 2 @month @tag
Top Tags By Approx. Number of Askers
FT.AGGREGATE idx “*”
APPLY split(@tags) AS tag
GROUPBY 1 @tag
REDUCE COUNT_DISTINCTISH 1 @user
AS users
SORTBY 2 @users DESC MAX 10
Histogram Of Scores
FT.AGGREGATE idx “*”
APPLY floor(@score/10)*10 AS score
GROUPBY 1 @score
REDUCE COUNT 0
SORTBY 1 @score MAX 100
• Port all search functionality to the FT.AGGREGATE API
• Streaming time-window based aggregations
• Improve expression language
– Add post filters
– Expressions as groupby / sortby properties
• Support cursored operation in the coordinator
Now open to requests!
Future Challenges
Thank You

More Related Content

What's hot

Aerospike Architecture
Aerospike ArchitectureAerospike Architecture
Aerospike Architecture
Peter Milne
 
使用 AWS Step Functions 開發 Serverless 服務
使用 AWS Step Functions 開發 Serverless 服務使用 AWS Step Functions 開發 Serverless 服務
使用 AWS Step Functions 開發 Serverless 服務
Amazon Web Services
 
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Altinity Ltd
 
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOTricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Altinity Ltd
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Altinity Ltd
 
Cloud Monitoring with Prometheus
Cloud Monitoring with PrometheusCloud Monitoring with Prometheus
Cloud Monitoring with Prometheus
QAware GmbH
 
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
Altinity Ltd
 
Airflow and supervisor
Airflow and supervisorAirflow and supervisor
Airflow and supervisor
Rafael Roman Otero
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
Bill Liu
 
ksqlDB로 실시간 데이터 변환 및 스트림 처리
ksqlDB로 실시간 데이터 변환 및 스트림 처리ksqlDB로 실시간 데이터 변환 및 스트림 처리
ksqlDB로 실시간 데이터 변환 및 스트림 처리
confluent
 
All about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAll about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdf
Altinity Ltd
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
MySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZEMySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZE
Norvald Ryeng
 
Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)
Amazon Web Services
 
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlareClickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Altinity Ltd
 
Better than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseBetter than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouse
Altinity Ltd
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
Databricks
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
Altinity Ltd
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
Ketan Gote
 

What's hot (20)

Aerospike Architecture
Aerospike ArchitectureAerospike Architecture
Aerospike Architecture
 
使用 AWS Step Functions 開發 Serverless 服務
使用 AWS Step Functions 開發 Serverless 服務使用 AWS Step Functions 開發 Serverless 服務
使用 AWS Step Functions 開發 Serverless 服務
 
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
 
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOTricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
Cloud Monitoring with Prometheus
Cloud Monitoring with PrometheusCloud Monitoring with Prometheus
Cloud Monitoring with Prometheus
 
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
 
Airflow and supervisor
Airflow and supervisorAirflow and supervisor
Airflow and supervisor
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
ksqlDB로 실시간 데이터 변환 및 스트림 처리
ksqlDB로 실시간 데이터 변환 및 스트림 처리ksqlDB로 실시간 데이터 변환 및 스트림 처리
ksqlDB로 실시간 데이터 변환 및 스트림 처리
 
All about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAll about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdf
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
MySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZEMySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZE
 
Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)
 
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlareClickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
 
Better than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseBetter than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouse
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
 

Similar to RedisConf18 - Introducing RediSearch Aggregations

Redis Day TLV 2018 - RediSearch Aggregations
Redis Day TLV 2018 - RediSearch AggregationsRedis Day TLV 2018 - RediSearch Aggregations
Redis Day TLV 2018 - RediSearch Aggregations
Redis Labs
 
InfluxDB 1.0 - Optimizing InfluxDB by Sam Dillard
InfluxDB 1.0 - Optimizing InfluxDB by Sam DillardInfluxDB 1.0 - Optimizing InfluxDB by Sam Dillard
InfluxDB 1.0 - Optimizing InfluxDB by Sam Dillard
InfluxData
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
Amazon Web Services
 
managing big data
managing big datamanaging big data
managing big data
Suveeksha
 
SplunkLive! London: Splunk ninjas- new features and search dojo
SplunkLive! London: Splunk ninjas- new features and search dojoSplunkLive! London: Splunk ninjas- new features and search dojo
SplunkLive! London: Splunk ninjas- new features and search dojo
Splunk
 
OPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKOPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACK
InfluxData
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!
Julian Hyde
 
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Databricks
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
Databricks
 
Deep Dive - DynamoDB
Deep Dive - DynamoDBDeep Dive - DynamoDB
Deep Dive - DynamoDB
Amazon Web Services
 
The Evolution of Streaming Expressions - Joel Bernstein, Alfresco & Dennis Go...
The Evolution of Streaming Expressions - Joel Bernstein, Alfresco & Dennis Go...The Evolution of Streaming Expressions - Joel Bernstein, Alfresco & Dennis Go...
The Evolution of Streaming Expressions - Joel Bernstein, Alfresco & Dennis Go...
Lucidworks
 
Big Data Processing
Big Data ProcessingBig Data Processing
Big Data Processing
Michael Ming Lei
 
Deep Dive: Amazon DynamoDB
Deep Dive: Amazon DynamoDBDeep Dive: Amazon DynamoDB
Deep Dive: Amazon DynamoDB
Amazon Web Services
 
mongodb-aggregation-may-2012
mongodb-aggregation-may-2012mongodb-aggregation-may-2012
mongodb-aggregation-may-2012
Chris Westin
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
Paul Chao
 
An R primer for SQL folks
An R primer for SQL folksAn R primer for SQL folks
An R primer for SQL folks
Thomas Hütter
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
Mark Kromer
 
Hadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log projectHadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log project
Mao Geng
 
SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...
SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...
SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...
Datavail
 
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevWebinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Altinity Ltd
 

Similar to RedisConf18 - Introducing RediSearch Aggregations (20)

Redis Day TLV 2018 - RediSearch Aggregations
Redis Day TLV 2018 - RediSearch AggregationsRedis Day TLV 2018 - RediSearch Aggregations
Redis Day TLV 2018 - RediSearch Aggregations
 
InfluxDB 1.0 - Optimizing InfluxDB by Sam Dillard
InfluxDB 1.0 - Optimizing InfluxDB by Sam DillardInfluxDB 1.0 - Optimizing InfluxDB by Sam Dillard
InfluxDB 1.0 - Optimizing InfluxDB by Sam Dillard
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
 
managing big data
managing big datamanaging big data
managing big data
 
SplunkLive! London: Splunk ninjas- new features and search dojo
SplunkLive! London: Splunk ninjas- new features and search dojoSplunkLive! London: Splunk ninjas- new features and search dojo
SplunkLive! London: Splunk ninjas- new features and search dojo
 
OPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKOPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACK
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!
 
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
 
Deep Dive - DynamoDB
Deep Dive - DynamoDBDeep Dive - DynamoDB
Deep Dive - DynamoDB
 
The Evolution of Streaming Expressions - Joel Bernstein, Alfresco & Dennis Go...
The Evolution of Streaming Expressions - Joel Bernstein, Alfresco & Dennis Go...The Evolution of Streaming Expressions - Joel Bernstein, Alfresco & Dennis Go...
The Evolution of Streaming Expressions - Joel Bernstein, Alfresco & Dennis Go...
 
Big Data Processing
Big Data ProcessingBig Data Processing
Big Data Processing
 
Deep Dive: Amazon DynamoDB
Deep Dive: Amazon DynamoDBDeep Dive: Amazon DynamoDB
Deep Dive: Amazon DynamoDB
 
mongodb-aggregation-may-2012
mongodb-aggregation-may-2012mongodb-aggregation-may-2012
mongodb-aggregation-may-2012
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
 
An R primer for SQL folks
An R primer for SQL folksAn R primer for SQL folks
An R primer for SQL folks
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
Hadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log projectHadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log project
 
SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...
SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...
SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...
 
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevWebinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
 

More from Redis Labs

Redis Day Bangalore 2020 - Session state caching with redis
Redis Day Bangalore 2020 - Session state caching with redisRedis Day Bangalore 2020 - Session state caching with redis
Redis Day Bangalore 2020 - Session state caching with redis
Redis Labs
 
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
Redis Labs
 
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
Redis Labs
 
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
Redis Labs
 
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
Redis Labs
 
Redis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
Redis for Data Science and Engineering by Dmitry Polyakovsky of OracleRedis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
Redis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
Redis Labs
 
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
Redis Labs
 
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Redis Labs
 
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
Redis Labs
 
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
Redis Labs
 
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Redis Labs
 
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Redis Labs
 
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
Redis Labs
 
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
Redis Labs
 
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
Redis Labs
 
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
Redis Labs
 
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
Redis Labs
 
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
Redis Labs
 
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
Redis Labs
 
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
Redis Labs
 

More from Redis Labs (20)

Redis Day Bangalore 2020 - Session state caching with redis
Redis Day Bangalore 2020 - Session state caching with redisRedis Day Bangalore 2020 - Session state caching with redis
Redis Day Bangalore 2020 - Session state caching with redis
 
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
 
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
 
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
 
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
 
Redis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
Redis for Data Science and Engineering by Dmitry Polyakovsky of OracleRedis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
Redis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
 
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
 
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
 
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
 
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
 
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
 
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
 
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
 
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
 
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
 
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
 
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
 
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
 
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
 
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
 

Recently uploaded

Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 

Recently uploaded (20)

Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 

RedisConf18 - Introducing RediSearch Aggregations

  • 2. RediSearch 101 •High Performance Search Index •Redis Module •Written From Scratch in C •Supports multiple index types •Open Source / Commercial – Commercial is scalable and distributed – Several commercial customers – Largest cluster: 200 shards, ~2B documents, 8TB RAM
  • 3. What Can RediSearch Do? •Full-Text, Numeric, Geo and Tag indexes •Real-time indexing and updates •Distributed search on Billions of documents •Complex query language •Auto-Complete •Highlighting •And….
  • 5. Aggregations In Practice • Transform plain search results into statistical insights •Using a dynamic pipeline of: – Grouping and Reducing – Applying Transformations – Sorting GROUP Result Stream REDUCE REDUCE APPLY SORT
  • 6. Aggregate vs Search Search Request •Top N relevant results •Processing is minimal if any •Focus on filtering/ranking “People named Homer Simpson” Aggregate Request • Reduce and transform results • Complex processing pipeline • Used for analytics and insights “Average age of people by surname”
  • 7. • Let’s take all Stack Overflow questions history • Count the number of questions asked by month • We have – Timestamp – Tags – Question Text – Etc A Little Taste
  • 8. Count Questions By Month FT.AGGREGATE idx “*” APPLY month(@time) AS month GROUPBY 1 @month REDUCE COUNT 0 AS count SORTBY 1 @month
  • 11. Step 1: Filter Filter A regular RediSearch Query
  • 12. Step 2: Group Group by one or more fields Filter Group
  • 13. Step 2: Group Group by one or more fields Filter Group Reduce Reduce
  • 14. Step 3: Apply projections Functions, expressions, whatever Filter Group Reduce Apply
  • 15. Step 4: Sort By one or more fields Or output of previous steps Filter Apply Sort Group Reduce
  • 16. Add More Steps Re-group, apply more, re-sort, page, etc. Filter Apply Sort ? Group Reduce
  • 17. Aggregate Filters • Determine what results we process • Can be used to create complex selections @country:{Israel | Italy} @age:[25 52] @profession:”1337 h4x0r” -@languages:{perl | php}
  • 18. GROUPBY and Reduce • Grouping by any number of fields / upstream properties • A GROUPBY is associated with many reducers • Reducers flatten the a group’s stream into one result Group Results REDUCE Single Result GROUP Group Results REDUCE Single Result Result Stream Result Stream
  • 19. Available Reducers • count • cont_distinct(ish) • min/max • avg • sum • stddev • quantile • first_value • tolist • random_sample Result Stream REDUCER Single Result
  • 20. APPLY Projections • APPLY applies 1:1 transformations on results • Evaluate a complex expression per result • Operate on previous output APPLY Single ResultSingle Result
  • 21. APPLY Complex Expressions APPLY “sqrt((@a + @b)/2)” AS avg APPLY “timefmt(day(@time))” AS date APPLY “format(‘Hello, %s’, @userName)” AS msg
  • 22. Numeric Functions and Expressions • Numeric operators: + - / * % ^ • log(x), log2(x) • floor(x), ciel(x) • abs(x) • sqrt(x) • exp(x)
  • 23. String Functions and Expressions • lower(s) • upper(s) • substr(s, offset, len) • format(fmt, …) • matched_terms([max])
  • 24. Time Utility Functions • timefmt(unix_timestamp, [fmt]) → string • parsetime(str, [fmt]) → timestamp • minute(timestamp) → timestamp • hour(timestamp) → timestamp • day(timestamp) → timestamp • year(timestamp) → integer • dayofmonth(timestamp) → integer • dayofweek(timestamp) → integer • monthofyear(timestamp) → integer
  • 25. SORTBY •Sort by one or more properties •Each can be ASC/DESC •MAX limits to top-N results
  • 26. LIMIT vs. SORTBY MAX • SORTBY … MAX {n} • LIMIT {offset} {num} • Getting results 10-20 will work best as: SORTBY … MAX 20 LIMIT 10 10
  • 28. What If Our Data Is Split Across Nodes?
  • 29. Classic Search Approach: Coordinator Index Shard Top K Top K Top K Search: “Hello”
  • 30. That Doesn’t Translate to Aggregation Pipelines Coordinator Index Shard count_distinctx(x) count_distinctx(x) count_distinctx(x) count_distinct(x)
  • 31. That Doesn’t Translate to Aggregation Pipelines Coordinator Index Shard count_distinctx(x) count_distinctx(x) count_distinctx(x) count_distinct(x)
  • 32. Distributing Aggregations: Naive Approach Filter: “redis” Group: by date Reduce: count Sort: by count Original Pipeline
  • 33. Distributing Aggregations: Naive Approach Filter: “redis” Group: by date Reduce: count Sort: by count Coordinator Pipeline Shard Pipeline Distribute To Shards Merge
  • 34. Problem: Every Record Must Be Transferred COUNT_DISTINCT(x) Index Shard GET ALL (x) GET ALL (x) GET ALL (x) count_distinct(x)
  • 35. Push Down: 1 Record Per Group Transferred Filter: “redis” Group: by date Reduce: COUNT Sort: by sum(counts) Coordinator Pipeline Shard Pipeline Group: by date Reduce: SUM(COUNTS) Distribute To Shards Merge
  • 38. Example: Average Client AVG(x) SUM(SUM(x)) → A SUM(COUNT) → B A/B → AVG(x) SUM(x) COUNT
  • 39. Example: Count Distinct Elements Client COUNT_DISTINCT(x) HLL_SUM(HLL(x)) HLL(x)
  • 41. FT.AGGREGATE idx “*” APPLY month(@time) AS month GROUPBY 1 @month REDUCE COUNT 0 AS count SORTBY 1 @month The Query Plan Translator FT.AGGREGATE idx “*” APPLY month(@time) AS month GROUPBY 1 @month REDUCE COUNT 0 AS count FT.AGGREGATE {DISTRIBUTE(...)} GROUPBY 1 @month REDUCE SUM 1 @count AS count SORTBY 1 @month
  • 42. • Two or more GROUPBYs - not possible / easy • High number of groups - still slow • Exact count distinct and quantiles are slow • If you really want to parallelize huge workloads - use Spark et al Limitations
  • 44. The Dataset • All Stack Overflow Questions • The Schema: – Title / body – Tags – Answers – Asking User Id – Rating – Timestamp
  • 45. Count Questions By Month FT.AGGREGATE idx “*” APPLY month(@time) AS month GROUPBY 1 @month REDUCE COUNT 0 AS count SORTBY 1 @month
  • 46. Compare Tag Popularity Over Time FT.AGGREGATE idx “@tags:{java|python|c}” APPLY month(@time) AS month APPLY matched_terms() AS tag GROUPBY 2 @month @tag REDUCE COUNT 0 AS count SORTBY 2 @month @tag
  • 47. Top Tags By Approx. Number of Askers FT.AGGREGATE idx “*” APPLY split(@tags) AS tag GROUPBY 1 @tag REDUCE COUNT_DISTINCTISH 1 @user AS users SORTBY 2 @users DESC MAX 10
  • 48. Histogram Of Scores FT.AGGREGATE idx “*” APPLY floor(@score/10)*10 AS score GROUPBY 1 @score REDUCE COUNT 0 SORTBY 1 @score MAX 100
  • 49. • Port all search functionality to the FT.AGGREGATE API • Streaming time-window based aggregations • Improve expression language – Add post filters – Expressions as groupby / sortby properties • Support cursored operation in the coordinator Now open to requests! Future Challenges