Multiple Device Polling
Events Systems
Thales Biancalana, Senior Backend Developer
A case study from iFood
Presenter
Thales Biancalana, Senior Backend Developer at iFood
Control and Automation Engineer who decided that
programming is more exciting than building robots. Has worked on
multiple applications using .NET, Node, React, Swift and Java, and
now works as a backend developer at iFood. Always looking for
new challenges and different ways to solve them.
Context
iFood
■ Food-tech delivery business.
■ Main delivery app in Brazil and also present in Colombia and Mexico
> 100k | > 20M/month
iFood apps
iFood Infrastructure
■ Migrating to an event-driven microservice architecture
Connection Team
Responsible for delivering orders' events to merchants
Connection
[Diagram: Merchant App and Integrations talk to the Polling Services: GET /events for orders' events, POST /acks for acknowledgements]
Polling Services
[Diagram: multiple polling systems running in parallel: App, Proxy Service, Service A, Service B]
■ HTTP requests every 30 seconds from each device
■ A database query on every call
■ Heavy queries on read nodes: “all non-acked events by the device”
■ Mid-term goal: support 500k connected merchants with 1 device
each
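To put the numbers above in perspective, here is a rough back-of-the-envelope calculation of the polling load at the mid-term goal. It assumes polls are spread evenly over the 30-second interval, which the slides do not state; the function name is only illustrative.

```python
POLL_INTERVAL_SECONDS = 30   # from the slide: one request every 30 seconds per device
TARGET_MERCHANTS = 500_000   # mid-term goal from the slide
DEVICES_PER_MERCHANT = 1     # from the slide


def polling_requests_per_second(merchants, devices_per_merchant, interval_seconds):
    """Average request rate, assuming polls are spread evenly over the interval."""
    return merchants * devices_per_merchant / interval_seconds


rate = polling_requests_per_second(
    TARGET_MERCHANTS, DEVICES_PER_MERCHANT, POLL_INTERVAL_SECONDS
)
print(f"{rate:,.0f} requests/s")  # prints "16,667 requests/s"
```

Since every call hits the database, this is also roughly the read-query rate the storage layer would have to sustain.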
Polling Services
■ Why?
Polling Services
Multiple polling systems running in parallel:
■ Proxy Service: Kitchen-Polling
■ Service A: Gateway-Core (PostgreSQL) - Dead
■ Service B: Connection-Order-Events (Apache Ignite)
■ Service C: Connection-Polling (DynamoDB) - Dying
■ Service D: Connection-Polling (ScyllaDB)
PostgreSQL
Gateway-Core
PostgreSQL Legacy Service
■ Events indexed in one table and the acknowledgements in another
■ Reads (JOINs) were starting to become a problem as the number
of events and merchants increased
■ Master node “suffering with increasing load”
■ Single point of failure
PostgreSQL Legacy Service - Data
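A minimal sketch of what the two-table layout and the "all non-acked events by the device" read could look like. The table and column names are hypothetical (the original schema is not shown in the slides), and SQLite stands in for PostgreSQL so the example runs on its own.

```python
import sqlite3

# Hypothetical schema: events in one table, acknowledgements in another.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, merchant_id TEXT, payload TEXT);
    CREATE TABLE acks (event_id INTEGER, device_id TEXT,
                       PRIMARY KEY (event_id, device_id));
""")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "m1", "PLACED"), (2, "m1", "CONFIRMED"), (3, "m2", "PLACED")],
)
conn.execute("INSERT INTO acks VALUES (1, 'device-a')")


def non_acked_events(conn, merchant_id, device_id):
    """Events of a merchant that this device has not acknowledged yet."""
    rows = conn.execute(
        """
        SELECT e.id, e.payload
        FROM events e
        LEFT JOIN acks a ON a.event_id = e.id AND a.device_id = ?
        WHERE e.merchant_id = ? AND a.event_id IS NULL
        ORDER BY e.id
        """,
        (device_id, merchant_id),
    )
    return rows.fetchall()


print(non_acked_events(conn, "m1", "device-a"))  # [(2, 'CONFIRMED')]
print(non_acked_events(conn, "m1", "device-b"))  # [(1, 'PLACED'), (2, 'CONFIRMED')]
```

A join of this shape runs on every poll from every device, which is why it gets heavier as events and merchants grow.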
Apache Ignite
Connection-Order-Events
Apache Ignite Service
■ Works really well (reads take ~3 ms)
Problems:
■ Hard to monitor, as the service and the database are one
■ We need to save events in another database, which is used when adding
machines or recovering from disasters (more code to maintain)
■ It takes longer to get the service back up, as it needs to fill the cache
from the PostgreSQL database. That's why we have a fallback system
for when it is down
NoSQL
NoSQL Modeling
■ Our main query?
● All events that were not acked by a device
■ Orders (and events) belong to merchants, not devices
● We need to know the merchant's devices when saving events
■ What to do with new devices?
● Return all merchant events and save them to the not-acked-by-device
table
■ We are only interested in events within 8 hours of the delivery
time
NoSQL Modeling
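The modeling described above can be sketched in memory as follows. This is only an illustration under assumptions: the class, method and field names are hypothetical, plain dictionaries stand in for the database tables, and the 8-hour window is applied as a filter at read time rather than by database expiration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

RETENTION = timedelta(hours=8)  # from the slides: 8-hour window around delivery time


class PollingStore:
    """In-memory stand-in for the device-oriented tables (hypothetical names)."""

    def __init__(self):
        self.devices_by_merchant = defaultdict(set)
        self.events_by_merchant = defaultdict(dict)   # merchant -> {event_id: delivery_time}
        self.not_acked_by_device = defaultdict(dict)  # device -> {event_id: delivery_time}

    def save_event(self, merchant_id, event_id, delivery_time):
        # Events belong to merchants, so fan out to every known device on write.
        self.events_by_merchant[merchant_id][event_id] = delivery_time
        for device_id in self.devices_by_merchant[merchant_id]:
            self.not_acked_by_device[device_id][event_id] = delivery_time

    def register_device(self, merchant_id, device_id):
        # A new device receives all of the merchant's existing events.
        self.devices_by_merchant[merchant_id].add(device_id)
        self.not_acked_by_device[device_id].update(self.events_by_merchant[merchant_id])

    def poll(self, device_id, now):
        # Main query: all events not acked by this device, within the window.
        pending = self.not_acked_by_device[device_id]
        return sorted(e for e, t in pending.items() if now - t <= RETENTION)

    def ack(self, device_id, event_id):
        self.not_acked_by_device[device_id].pop(event_id, None)


store = PollingStore()
t0 = datetime(2019, 11, 5, 12, 0)
store.register_device("m1", "tablet")
store.save_event("m1", "e1", t0)
store.register_device("m1", "phone")  # joins later, still gets e1
store.save_event("m1", "e2", t0)
store.ack("tablet", "e1")
print(store.poll("tablet", t0))                      # ['e2']
print(store.poll("phone", t0))                       # ['e1', 'e2']
print(store.poll("phone", t0 + timedelta(hours=9)))  # []
```

The read is a single-key lookup per device, at the price of extra writes per event and extra work whenever a device is added.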
DynamoDB
Connection-Polling
DynamoDB Service
■ Why DynamoDB?
● Try a NoSQL approach
● Most of our infrastructure is in AWS
● Fully managed solution
DynamoDB Service - Issues
Issues with DynamoDB for our use:
■ DynamoDB autoscaling was not fast enough for our use case unless
we left a high minimum throughput or managed it ourselves
● Defeats the purpose of a fully managed solution
■ DynamoDB's new on-demand mode is great, but expensive
ScyllaDB
Connection-Polling-v2
ScyllaDB Service A
■ Quite easy to migrate from DynamoDB to Scylla with the same
modeling. Should be even easier with the new Project Alternator
ScyllaDB Service A - Results
■ How did it compare with DynamoDB?
● We started with a cluster of three c5.2xlarge machines that easily held
the throughput. This was a nearly 9x database cost reduction
(from around $4.5k to $500/month), with room for more throughput
ScyllaDB Service A - Learnings
■ Scylla sets TTL per column, whereas DynamoDB sets an expiration time per
document
■ Scylla Support: we identified a bug when reading pages from a
secondary index with prepared statements. After opening a GitHub
issue, we had a new build with the fix in less than 4 days
(https://github.com/scylladb/scylla/issues/4569)
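To illustrate the TTL difference mentioned in the learnings above, here is a small sketch that only builds the statement and the item, without talking to any database. The table, column and attribute names are hypothetical. In CQL, `USING TTL` is a number of seconds counted from the write and applies to the columns written by that statement; DynamoDB instead reads an absolute epoch-seconds timestamp from a per-item attribute whose name is configured on the table.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(hours=8)  # retention window taken from the slides


def remaining_ttl_seconds(delivery_time, now):
    """Seconds left until delivery_time + 8h, since CQL TTL counts from write time."""
    remaining = int((delivery_time + RETENTION - now).total_seconds())
    return max(remaining, 1)  # a TTL of 0 would mean "never expire" in CQL


def scylla_insert_with_ttl(table, columns, ttl_seconds):
    """CQL text for a prepared INSERT whose written columns expire after ttl_seconds."""
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({cols}) VALUES ({marks}) USING TTL {ttl_seconds}"


def dynamodb_item_with_expiry(item, delivery_time, ttl_attribute="expires_at"):
    """Copy of the item with an absolute expiration time in epoch seconds."""
    expires_at = int((delivery_time + RETENTION).timestamp())
    return {**item, ttl_attribute: expires_at}


delivery = datetime(2019, 11, 5, 12, 0, tzinfo=timezone.utc)
now = delivery + timedelta(hours=1)
ttl = remaining_ttl_seconds(delivery, now)
print(scylla_insert_with_ttl("events", ["merchant_id", "event_id", "payload"], ttl))
print(dynamodb_item_with_expiry({"event_id": "e1"}, delivery))
```

When migrating with the same modeling, this is one place where the write path has to change: a relative TTL on each write instead of an absolute timestamp stored in the item.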
Modeling Issues
Issues with this modeling:
■ We need to manage restaurant devices
■ Need to manage old events for new devices
● It may be quite heavy to introduce a new device in the middle of the day
ScyllaDB Service B
Second modeling using collections.
Drawbacks:
■ Reads are expected to be slower (okay as a fallback system)
Advantages:
■ Less complex
■ Events table can be used to populate the Ignite cache
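One way a collection-based modeling could work is sketched below in memory. The actual schema is not shown in the slides, so everything here is an assumption: each event row carries a set of the devices that acked it (standing in for a CQL collection column), and a per-merchant list stands in for a secondary index on the merchant.

```python
from collections import defaultdict


class CollectionStore:
    """In-memory stand-in for an events table with an acked-by collection (hypothetical)."""

    def __init__(self):
        self.events = {}                             # event_id -> row
        self.events_by_merchant = defaultdict(list)  # stands in for a secondary index

    def save_event(self, merchant_id, event_id):
        # One write per event; no need to know the merchant's devices.
        self.events[event_id] = {"merchant_id": merchant_id, "acked_by": set()}
        self.events_by_merchant[merchant_id].append(event_id)

    def ack(self, event_id, device_id):
        # Acknowledging is a collection update on the event row.
        self.events[event_id]["acked_by"].add(device_id)

    def poll(self, merchant_id, device_id):
        # Read all merchant events, then filter out the ones this device acked.
        return [
            event_id
            for event_id in self.events_by_merchant[merchant_id]
            if device_id not in self.events[event_id]["acked_by"]
        ]


store = CollectionStore()
store.save_event("m1", "e1")
store.save_event("m1", "e2")
store.ack("e1", "tablet")
print(store.poll("m1", "tablet"))  # ['e2']
print(store.poll("m1", "phone"))   # ['e1', 'e2'], a new device needs no setup
```

Under these assumptions, the trade-off matches the lists above: writes and device handling get simpler, while each read scans all of the merchant's events and filters them.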
ScyllaDB Service B - Catch
ScyllaDB Service B - Results
The good:
■ Nearly 9x database cost reduction compared with DynamoDB
on-demand
■ Time to index an event dropped from ~80 ms to ~3 ms, which resulted in a
nearly 8x infrastructure reduction for writes
■ Solution complexity reduced from 4 tables and 2 indexes to 2 tables
and 1 index, with 40% less code
The bad:
■ Increase in read times, worth it for now as a fallback system
■ Collection updates are CPU-intensive and generate tombstones, so
use them carefully
Final Thoughts
■ Scylla was cheaper than DynamoDB, but we created
a cluster on AWS machines
● Take into consideration the cost of maintaining a cluster. Learn from other talks how
easy it is to maintain a cluster when choosing between databases.
● But we have had no problems so far
■ Check what you know about your domain and problem; it can be used
to simplify the solution
● Knowing it was a fallback system, and knowing the average number of devices per merchant
and orders per merchant, led me to believe that collection updates, which should be
used carefully, were a good trade-off
Final Thoughts
■ Get to know all the features of your database before using them
● Collection updates are not cheap! Each update incurs a tombstone, which
slows down reads and gives more work to the garbage collector. We are still toying
with gc_grace to improve performance
● ScyllaDB secondary indexes are global by default, which was a good thing for our
second solution, where the index has a cardinality as high as the number of
merchants (a bit more than 100k merchants online today). The same could be achieved in
Cassandra with Materialized Views.
● Global is the default, but it may not always be the best choice, so Scylla also
supports local indexes, and you need to know when to use each.
Next Steps
■ No acknowledgment polling solution using Scylla
■ Force Scylla to fail
■ Working on MQTT pub/sub solution
Thank you. Stay in touch.
Any questions?
Thales Biancalana
thales.biancalana@ifood.com.br
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 



iFood on Delivering 100 Million Events a Month to Restaurants with Scylla

  • 1. Multiple Device Polling Events Systems Thales Biancalana, Senior Backend Developer A case study from iFood
  • 2. Presenter Thales Biancalana, Senior Backend Developer at iFood Control and Automation Engineer who decided that programming is more exciting than building robots. Worked on multiple applications using .NET, Node, React, Swift and Java, now working as a backend developer at iFood. Always looking for new challenges and different ways to solve them
  • 4. iFood ■ Food-tech delivery business. ■ Main delivery app in Brazil and also present in Colombia and Mexico ■ > 100k merchants ■ > 20M orders/month
  • 6. iFood Infrastructure ■ Migrating to a microservice, event-driven architecture
  • 8. Connection Team Responsible for delivering orders' events to merchants Connection Merchant App Integrations
  • 10. POST /acks Polling Services Multiple polling systems running in parallel Service A Service B Proxy Service App Orders Events Acknowledgements GET /events
  • 11. Polling Services ■ HTTP requests every 30 seconds for each device ■ Database invoked on each call ■ Heavy queries on read nodes: “all non-acked events by the device” ■ Mid-term goal to support 500k connected merchants with 1 device each ■ Why?
  • 12. Polling Services Multiple polling systems running in parallel: ■ Proxy Service: Kitchen-Polling ■ Service A: Gateway-Core (PostgreSQL) - Dead ■ Service B: Connection-Order-Events (Apache Ignite) ■ Service C: Connection-Polling (DynamoDB) - Dying ■ Service D: Connection-Polling (ScyllaDB)
  • 14. PostgreSQL Legacy Service ■ Events indexed in one table and the acknowledgements in another ■ Reads (JOINs) were starting to become a problem as the number of events and merchants increased ■ Master node “suffering with increasing load” ■ Single point of failure
  • 17. Apache Ignite Service ■ Works really well (reading ~3ms) Problems: ■ Hard to monitor, as service and database are one ■ We need to save events in another database, used when adding machines or recovering from disasters (more code to maintain) ■ It takes longer to get the service back up, as it needs to fill the cache from the PostgreSQL database. That's why we have a fallback system for when it is down
  • 18. NoSQL
  • 19. NoSQL Modeling ■ Our main query? ● All events that were not acked by a device ■ Orders (and events) belong to merchants, not devices ● We need the merchant devices when saving events ■ What to do with new devices? ● Return all merchant events and save them to the not acked by device table ■ We are only interested in events from the last 8 hours from delivery time
  • 22. DynamoDB Service ■ Why DynamoDB? ● Try a NoSQL approach ● Most of infrastructure is in AWS ● Fully managed solution
  • 23. DynamoDB Service - Issues Issues with DynamoDB for our use: ■ DynamoDB autoscaling was not fast enough for our use case unless we left a high minimum throughput or managed it ourselves ● Defeats the purpose of a fully managed solution ■ DynamoDB's new on-demand mode is great, but expensive
  • 25. ScyllaDB Service A ■ Quite easy to migrate from DynamoDB to Scylla with the same modeling. Should be even easier with the new Project Alternator
  • 26. ScyllaDB Service A - Results ■ How did it compare with DynamoDB? ● We started with a cluster of three c5.2xlarge machines that easily held the throughput. This was a nearly 9x database cost reduction (from around $4.5k to $500/month), and the cluster could still hold more throughput
  • 27. ScyllaDB Service A - Learnings ■ Scylla uses TTL by column vs DynamoDB expiration time by document ■ Scylla Support: we identified a bug when reading pages from a secondary index with prepared statements. After opening a GitHub issue we had a new build with the fix in less than 4 days (https://github.com/scylladb/scylla/issues/4569)
  • 28. Modeling Issues Issues with this modeling: ■ We need to manage restaurant devices ■ Need to manage old events for new devices ● It may be quite heavy to introduce a new device in the middle of the day
  • 29. ScyllaDB Service B Second modeling using collections. Drawbacks: ■ Reads are expected to be slower (okay as a fallback system) Advantages: ■ Less complex ■ Events table can be used to populate the Ignite cache
  • 31. ScyllaDB Service B - Results The good: ■ Nearly 9x database cost reduction compared with DynamoDB on-demand ■ Time to index events reduced from ~80ms to ~3ms, which resulted in a nearly 8x infrastructure reduction for writes ■ Solution complexity reduced from 4 tables and 2 indexes to 2 tables and 1 index, with 40% less code The bad: ■ Increase in read times, worth it for now as a fallback system ■ Collection updates are CPU intensive and generate tombstones -> use carefully
  • 33. Final Thoughts ■ Scylla was cheaper compared with DynamoDB, but we created a cluster on AWS machines ● Take into consideration the cost of maintaining a cluster. Learn from other talks how easy it is to maintain a cluster when choosing between databases. ● But we have had no problems as of now ■ Check what you know about your domain and problem, as it can be used to simplify the solution ● Knowing it was a fallback system, plus the average number of devices per merchant and orders per merchant, led me to believe it was a good trade-off to use collection updates, which should be used carefully
  • 34. Final Thoughts ■ Get to know all features of your database before using them ● Collection updates are not cheap! Each update incurs a tombstone, which slows down reads and gives more work to the garbage collector. We are still toying with gc_grace to improve performance ● ScyllaDB secondary indexes are global by default, which was a good thing for our second solution, where the index has a cardinality as high as the number of merchants (a bit more than 100k merchants online today). It could be achieved in Cassandra with Materialized Views. ● Global is the default, but it may not always be the best one to use, so Scylla also supports local indexes and you need to know when to use each.
  • 36. Next Steps ■ No acknowledgment polling solution using Scylla ■ Force Scylla to fail ■ Working on MQTT pub/sub solution
  • 37. Thank you Stay in touch Any questions? Thales Biancalana thales.biancalana@ifood.com.br
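
The second modeling (slides 29 and 31) can be pictured with a small in-memory sketch. This is only an illustration under stated assumptions, not the code iFood runs: the names `Event`, `CollectionModelStore`, `poll` and `ack` are hypothetical, a Python dict stands in for the Scylla events table, and the "not acked by this device" filter is assumed to run in the service rather than in the database.

```python
from dataclasses import dataclass, field

EIGHT_HOURS = 8 * 60 * 60  # retention window from slide 19, in seconds


@dataclass
class Event:
    event_id: str
    merchant_id: str
    delivery_time: float  # epoch seconds
    # Stand-in for the collection column: devices that already acked the event.
    acked_devices: set = field(default_factory=set)


class CollectionModelStore:
    """Hypothetical in-memory stand-in for an events table keyed by merchant."""

    def __init__(self):
        self._events_by_merchant = {}  # merchant_id -> {event_id: Event}

    def save_event(self, event):
        merchant_events = self._events_by_merchant.setdefault(event.merchant_id, {})
        merchant_events[event.event_id] = event

    def poll(self, merchant_id, device_id, now):
        """GET /events: recent merchant events not yet acked by this device."""
        events = self._events_by_merchant.get(merchant_id, {}).values()
        return [
            e for e in events
            if now - e.delivery_time <= EIGHT_HOURS
            and device_id not in e.acked_devices  # assumed service-side filter
        ]

    def ack(self, merchant_id, device_id, event_ids):
        """POST /acks: add the device to the acked set of each known event."""
        events = self._events_by_merchant.get(merchant_id, {})
        for event_id in event_ids:
            if event_id in events:
                events[event_id].acked_devices.add(device_id)


store = CollectionModelStore()
store.save_event(Event("e1", "m1", delivery_time=1000.0))
store.ack("m1", "d1", ["e1"])
pending_for_new_device = store.poll("m1", "d2", now=1010.0)  # still sees e1
```

In this shape a brand-new device needs no bookkeeping: it simply has not acked anything yet, which is the complexity reduction the slides describe. The price is that every poll reads all recent merchant events and filters them, matching the slower reads noted on slide 31.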

Editor's Notes

  1. Hi everyone, my name is Thales. I’m here today because I’ve seen a lot of Scylla presentations talking about how awesome it is from a tech perspective, like how many ops it handles or how it compares with Cassandra as a drop-in replacement, so I'm here to give a different perspective: what it was like to develop an application using Scylla, from a developer's point of view. I will not go into how to maintain the infrastructure, just monitoring and costs.
  2. (I'll probably skip this slide, but I'll leave it here) So as I said I’m Thales
  3. Let me start by giving a little bit of context of what we do at iFood
  4. iFood is a food tech delivery business. It is the main delivery app in Brazil, but we are present in other countries: Colombia and Mexico. We connect over 12 million users, 100 thousand merchants - mostly restaurants today, and deliverymen to deliver a bit over 20 million orders a month as of now, which amounts to a bit over 100 million events going through the platform every month.
  5. So here is the user app on the left and the merchant web app on the right.
  6. Something relevant is how fast iFood grew: it went from 1 million to 20 million orders a month in a bit over two years. Because of that, we still have some legacy services being broken into microservices using Java, Node, Docker and Kubernetes. This was only possible using a cloud service, and most of iFood's infrastructure runs on AWS, which is why we are still using SNS and SQS to move events around our platform. We use other technologies, but I'll mostly focus on these for our problem. Despite its size, iFood is not an established tech company as of now, and with the growth issues we are facing we are always looking for new ways to scale the infrastructure, which is what I'll try to share with you today. Most of our database infrastructure today is on PostgreSQL and DynamoDB, which is not scaling well as it is becoming expensive.
  7. Now to talk about the project I'll first have to introduce the team I work in: Connection.
  8. We are responsible, among other things, for delivering order events to merchants, either directly to the merchant app I showed before or to integrations for huge food companies. One of the ways this is done today is with a polling API.
  9. So now I'll present the polling services we've worked on until we got to the Scylla solution.
  10. Events arrive from the platform via SNS-SQS and are indexed in multiple services running in parallel so we can compare them. These events are polled from the app via a GET /events endpoint and are acknowledged via a POST /acks endpoint. The app then sends an acknowledgement for each event it receives, so as not to receive it again on the next poll.
  11. The polling is done every 30 seconds for each device. The database is invoked on each /events call. We have heavy queries on the read nodes: all events not acked by the device. Something that I want to address now: why are we using polling instead of something with a pub/sub approach? We do have an MQTT service, which we are still developing, but unfortunately we also need to support external integrations, and a lot of them are not tech savvy, so having a REST API is a strategic advantage for getting more merchants without going after them.
  12. This is just to give names to all the services we developed
  13. We started with a monolithic polling system over a PostgreSQL database that was the core of iFood for a long time. Reads were starting to become a problem as we got close to 10 million orders a month. We could solve it for some time by replicating to more databases and scaling the master vertically, but since we were separating the polling system from other functionalities, the team took this opportunity to work on something better.
  14. Just to give you a better understanding, this was the relational data format. We had the events table at the top and an acknowledgement table with the event id and the device id for each acknowledgement. We would then join both tables for the polling result.
  15. Our second approach was to deliver events using the Apache Ignite in-memory database, indexing events and acknowledgements. We decided to use Apache Ignite because we already used it in another service. It was put in place around October last year. It works really well and is currently the primary polling system at iFood. When we first deployed it, it had the Postgres solution as a fallback. It works wonderfully, but after working with it for some time we had some bones to pick with it. First, service and database are one, so we need to be really careful about deployments and scaling (one machine at a time), and, although not a problem directly with Ignite, we had multiple issues with AWS ELB discovery for the machines to talk with each other. We also need to save the events/acks in another database for when adding machines or recovering from disasters. With that in mind, and thinking about removing the Postgres solution as a fallback, we started working on our first NoSQL solution.
  16. So what do we know about our domain? First, we want all events not acked by a device. Second, orders (and events) belong to a merchant, not to the device, so we need to know the merchant's devices when saving the device events. We also need to index the events by merchant to query them when introducing a new device. Also, we are only interested in events from the last 8 hours from the delivery time. I say delivery time because we may have scheduled orders.
  17. So this was our first NoSQL model. We have a table for unacked device events, one table for the restaurant or merchant events and another for the restaurant devices. I’m just going to point out that we introduced restaurants as merchants not so long ago, so we sometimes still use the term restaurant.
  18. Now we get to the good NoSQL part, where I'll go into a bit more detail than for the other solutions on how we implemented it. But first, why did we choose DynamoDB as our first NoSQL solution? First, we wanted to try a NoSQL solution; second, we were already in the AWS ecosystem; and third, it is a fully managed solution.
  19. As you can see, the solution is quite complex: we need to manage the restaurant devices and the events for new devices. Another problem with this solution is that DynamoDB autoscaling was not fast enough unless we left high enough read and write capacities, which would defeat the purpose of cutting costs. DynamoDB autoscaling only happens every 5 minutes, which is not fast enough for us: lunch and especially dinner go from 0 to max throughput quite fast. We are currently using on-demand, but it is expensive. We could do the autoscaling ourselves, but then it would no longer be a fully managed solution. It was around this time that Scylla got in contact with our DBAs and we started working on a new Scylla solution. The main problem we saw was the cost. From the AWS documentation: the scaling policy also contains a target utilization, the percentage of consumed provisioned throughput at a point in time. Application Auto Scaling uses a target tracking algorithm to adjust the provisioned throughput of the table (or index) upward or downward in response to actual workloads, so that the actual capacity utilization remains at or near your target utilization. You can set the auto scaling target utilization values between 20 and 90 percent for your read and write capacity. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/AutoScaling.html
  20. The first implementation using Scylla was a direct comparison between the Scylla and DynamoDB solutions, so we implemented the same modeling. Because of this we could use the same code base and only change the DAO.
  21. This CPU load chart was taken from the scylla grafana overview dashboard provided by the scylla team.
  22. I'll talk about Scylla collections
  24. Reads are slower, which is OK for a fallback system. They could be faster if NOT CONTAINS were supported on SETs (which is not supported in Cassandra, as it is not usually a good approach).
  25. Remember what I said about Scylla TTL? It is column based, not document based, so the new acked devices column would not have the TTL, and thus it would never be deleted.
  26. This is probably the most important slide
  27. This is probably the most important slide https://www.scylladb.com/2019/07/23/global-or-localsecondary-indexes-in-scylla-the-choice-is-now-yours/ https://thelastpickle.com/blog/2018/03/21/hinted-handoff-gc-grace-demystified.html
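
The per-column TTL pitfall from note 25 can be illustrated with a toy model. This is a minimal sketch, not Scylla itself and not a driver API: the `Row` class and its `write`/`read` methods are hypothetical, each column simply carries its own expiry, and no table-level default TTL is assumed.

```python
class Row:
    """Toy row in which every column keeps its own expiry time.

    An expiry of None means the column never expires.
    """

    def __init__(self):
        self._cells = {}  # column name -> (value, expires_at or None)

    def write(self, now, ttl=None, **columns):
        # The TTL applies only to the columns written by this call.
        expires_at = None if ttl is None else now + ttl
        for column, value in columns.items():
            self._cells[column] = (value, expires_at)

    def read(self, now):
        # Return only the columns that are still alive at time `now`.
        return {
            column: value
            for column, (value, expires_at) in self._cells.items()
            if expires_at is None or now < expires_at
        }


EIGHT_HOURS = 8 * 60 * 60

row = Row()
row.write(now=0, ttl=EIGHT_HOURS, payload="order placed")
row.write(now=60, acked_devices={"d1"})      # later update, TTL forgotten
leftover = row.read(now=EIGHT_HOURS + 1)     # payload expired, acks remain
```

In this model the fix is to pass the remaining lifetime on the later write as well, for example `row.write(now=60, ttl=EIGHT_HOURS - 60, acked_devices={"d1"})`, so that every column of the row expires together.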