iFood is the largest Brazilian-based food delivery app company. It connects users, restaurants, and delivery workers through an event-driven architecture built on AWS SQS and SNS, with services written in Java and Node.js. Thales' team is responsible for delivering order events to restaurant devices at least once, which is currently done through a REST API polling-and-acknowledgment system.
Learn how their database infrastructure evolved: it began with a PostgreSQL database that started to show limitations and was a single point of failure, then grew through a few intermediary steps, including Amazon DynamoDB, before turning to Scylla for its data model and for collections that condense multiple tables. Using Scylla, iFood cut the time to process events and acknowledgments from ~80ms to ~3ms and reduced costs by over 9x compared to DynamoDB.
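The at-least-once delivery described above hinges on one rule: polling never removes an event, only an acknowledgment does. A minimal in-memory sketch of that contract (class and IDs are hypothetical, not iFood's actual code) might look like:

```python
from collections import OrderedDict

class EventDeliveryQueue:
    """Sketch of at-least-once delivery: events stay pending until the
    restaurant device acknowledges them, so a poll after a crash or a
    missed ack re-delivers the same event."""

    def __init__(self):
        self._pending = OrderedDict()  # event_id -> payload

    def publish(self, event_id, payload):
        self._pending[event_id] = payload

    def poll(self):
        # Polling does NOT remove events; only an ack does.
        return list(self._pending.items())

    def ack(self, event_id):
        self._pending.pop(event_id, None)

queue = EventDeliveryQueue()
queue.publish("order-1", {"status": "CONFIRMED"})
queue.publish("order-2", {"status": "DISPATCHED"})

first_poll = queue.poll()   # both events delivered
queue.ack("order-1")        # device confirms order-1
second_poll = queue.poll()  # order-2 is re-delivered until acked
```

The same semantics carry over whether the pending set lives in PostgreSQL, DynamoDB, or Scylla; what changed for iFood was how cheaply each store could serve the poll/ack hot path.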
How Workload Prioritization Reduces Your Datacenter Footprint - ScyllaDB
Are you running separate database clusters for operational and analytical workloads? Scylla now has the ability to handle multiple workloads from a single cluster--without performance degradation to either. This session will cover:
- The evolving requirements for operational (OLTP) and analytics (OLAP) workloads in the modern datacenter
- How Scylla provides built-in control over workload priority and makes it easy for administrators to configure workload priorities
- The TCO impact of minimizing integrations and maintenance tasks, while also shrinking the datacenter footprint and maximizing utilization
Plus we'll share test results of how it performs in real-world settings.
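The core idea behind workload prioritization is proportional scheduling: each workload is assigned shares, and over time it receives resources in proportion to them. A toy fair-share scheduler (a sketch of the concept, not Scylla's actual implementation) shows how shares translate into a dispatch ratio:

```python
import heapq

def dispatch(workload_shares, n_requests):
    """Always serve the workload with the lowest accumulated weighted
    cost; each dispatch costs 1/shares, so over time each workload is
    served in proportion to its shares."""
    heap = [(0.0, name) for name in workload_shares]
    heapq.heapify(heap)
    counts = {name: 0 for name in workload_shares}
    for _ in range(n_requests):
        vruntime, name = heapq.heappop(heap)
        counts[name] += 1
        heapq.heappush(heap, (vruntime + 1.0 / workload_shares[name], name))
    return counts

# OLTP gets 800 shares, OLAP gets 200: expect roughly a 4:1 split.
counts = dispatch({"oltp": 800, "olap": 200}, 1000)
```

Because the split is enforced per dispatch rather than by static partitioning, the low-priority workload still runs, just never at the high-priority workload's expense.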
SAS Institute on Changing All Four Tires While Driving an AdTech Engine at Fu... - ScyllaDB
SAS Intelligent Advertising changed its ad-serving platform from using Datastax Cassandra clusters to Scylla clusters for its real-time visitor data storage. This presentation describes how this migration was executed with no downtime and with no loss of data, even as data was constantly being created or updated.
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night - ScyllaDB
Opera chose Scylla over Cassandra to sync the data of millions of browsers to a back-end data repository. The results of the migration and further optimizations they made in their stack helped Opera to gain better latency/throughput and lower resources usage beyond their expectations.
Attend this session to learn how to:
- Migrate your data in a sane way, without any downtime
- Connect a Python+Django web app to Scylla
- Use intranode sharding to improve your application
FireEye & Scylla: Intel Threat Analysis Using a Graph Database - ScyllaDB
FireEye believes in intelligence driven cyber security. Their legacy system used PostgreSQL with a custom graph database system to store and facilitate analysis of threat intelligence data. As their user base increased they ran into scaling issues requiring a system redesign with a new platform.
This presentation will focus on the backend systems and the migration path to a new technology stack using JanusGraph running on top of Scylla plus Elasticsearch.
Using ScyllaDB turned out to be a game-changer in terms of performance and the types of analysis our application is able to do effortlessly.
Lightweight Transactions at Lightning Speed - ScyllaDB
This talk will outline the Scylla implementation of Lightweight Transactions (LWT) that brings us to parity with Apache Cassandra. We will cover how to use it, what is working, and what is left to be done. We will also cover what further improvements to Scylla's transactional capabilities are in store and why they matter.
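At their core, lightweight transactions give you compare-and-set semantics: a write applies only if a stated condition on the current value holds, and the response tells you whether it was applied. A single-cell sketch of that contract (illustrative only, mirroring CQL's `UPDATE ... IF value = ?` applied/not-applied result):

```python
class Register:
    """Single-cell compare-and-set, the semantics behind LWT: the write
    applies only if the current value matches the expected one."""

    def __init__(self, value=None):
        self.value = value

    def compare_and_set(self, expected, new):
        if self.value == expected:
            self.value = new
            return (True, new)       # [applied] = True
        return (False, self.value)   # [applied] = False; current value returned

reg = Register("free")
ok1 = reg.compare_and_set("free", "alice")  # first claim wins
ok2 = reg.compare_and_set("free", "bob")    # loses: value is now "alice"
```

The hard part, and the subject of the talk, is making this condition check linearizable across replicas without giving up throughput.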
Lookout on Scaling Security to 100 Million Devices - ScyllaDB
The massive increase of security-related data requires companies to respond with new approaches to ingestion. Learn how Lookout has changed its approach for ingesting telemetry to meet their goal of growing from 1.5 million devices to 100 million devices and beyond, using Kafka Connect and switching from AWS DynamoDB to Scylla.
How SkyElectric Uses Scylla to Power Its Smart Energy Platform - ScyllaDB
SkyElectric uses Scylla to power its smart energy platform. Scylla provides better performance, scalability, and lower latency than their previous MySQL database. With Scylla, SkyElectric has seen average write latency of 1.4ms and read latency under 1ms, along with 10x the throughput of MySQL. While Scylla has been easy to operate, with responsive support for upgrades and repairs, SkyElectric hopes to see improvements in data change logging, faster node joining, and the backup/restore process.
Should I use more, smaller instances, or fewer, bigger instances? Is 1Gbps enough for my network cards? Should I use batches? Can I have a collection 3GB in size? These are just some of the many questions we see users asking daily on our mailing list, Slack, and support tickets. In this talk, I will explore the answers to these common questions and help you make sure that your deployment is up to the highest standards.
Scylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra - ScyllaDB
ScyllaDB CEO Dor Laor lays out the ten million dollar engineering problem for distributed systems, and how only Scylla is architected to address the issue at the heart of Big Data ROI. He then introduces ScyllaDB's Glauber Costa and Packet's James Malachowski to reveal a new level of performance for a persistent NoSQL datastore. Dor concludes his talk with a bold proposition about how Scylla is uniquely positioned to help companies easily create and scale the software they need to achieve their vision.
Scylla began with a Cassandra compatibility story, implementing Cassandra’s query language (CQL) and replicating its user-visible architecture. Recently we introduced “Alternator” - an experimental feature adding compatibility with a second NoSQL database: Amazon’s DynamoDB. In this talk we look at why DynamoDB’s API was chosen as a good target for our API extension, how DynamoDB is similar to Scylla - and how it differs, and how we can implement DynamoDB’s API in Scylla. We will describe our progress so far in making Alternator compatible with DynamoDB - and what still remains to be done so that any DynamoDB application can run unmodified on Scylla.
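A large part of implementing a second API is translating its wire format. DynamoDB encodes every attribute as typed JSON, e.g. `{"S": "x"}` or `{"N": "42"}`, and a compatibility layer has to decode that on every request. A minimal decoder for a few common type tags (a sketch of the translation problem, not Alternator's actual code):

```python
def decode_av(av):
    """Decode a DynamoDB attribute value (typed JSON) into a plain
    Python value. Covers S, N, BOOL, NULL, L and M type tags."""
    (tag, value), = av.items()
    if tag == "S":
        return value
    if tag == "N":
        # DynamoDB sends numbers as strings to preserve precision.
        return int(value) if value.lstrip("-").isdigit() else float(value)
    if tag == "BOOL":
        return value
    if tag == "NULL":
        return None
    if tag == "L":
        return [decode_av(v) for v in value]
    if tag == "M":
        return {k: decode_av(v) for k, v in value.items()}
    raise ValueError(f"unsupported type tag: {tag}")

item = {"name": {"S": "scylla"}, "nodes": {"N": "3"},
        "tags": {"L": [{"S": "nosql"}, {"S": "cql"}]}}
decoded = {k: decode_av(v) for k, v in item.items()}
```

The appeal of targeting DynamoDB's API is precisely that the client side needs no such changes: an existing application only re-points its endpoint at the Alternator cluster.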
Using ScyllaDB with JanusGraph for Cyber Security - ScyllaDB
Come hear how QOMPLX, a leader in Cyber Security Risk Management solutions, uses ScyllaDB and JanusGraph to detect, manage, and assess risks for large corporate and government clients. By leveraging two highly horizontally scalable and fault-tolerant technologies, QOMPLX can flex with their clients' needs.
How to Monitor and Size Workloads on AWS i3 Instances - ScyllaDB
There is a new class of machines in town! Amazon recently unveiled i3 instances, targeted at I/O-intensive workloads. Scylla will officially support i3, and previews are already available.
Join our webinar to learn how to build a state-of-the-art database solution. Presenters Glauber Costa and Eyal Gutkind will cover how to:
- Determine which workloads can benefit from i3 instances
- Ensure Scylla fully leverages the great resources in the i3 family
- Effectively navigate the Scylla monitoring system and identify bottlenecks
You'll also see a live demonstration with a dashboard featuring an i3 cluster with different data models and workloads.
No matter how resilient your database infrastructure is, backups are still needed to defend against catastrophic failures, be it the unlikely hardware failure of all data centers, or the more likely and all-too-human user error. Acknowledging the importance of good backup procedures, Scylla Manager now natively supports backup and restore operations. In this talk, we will learn how that works and the guarantees provided, as well as how to set it up for maximum resiliency of your cluster.
Scylla: 1 Million CQL operations per second per server - Avi Kivity
My Cassandra Summit 2015 presentation introducing Scylla, an open source NoSQL implementation compatible with Apache Cassandra, but 10 times faster.
Scylla Summit 2018: Meshify - A Case Study, or Petshop Seamonsters - ScyllaDB
Meshify is an IoT platform focused on wireless sensor technology for industrial/insurance IoT. This talk will provide an overview of how Meshify is using Scylla. It will also explain why and how, when everything else in Meshify's platform is moving to a managed cloud service or a container-based microservice, the Scylla nodes are the only pet "seamonsters" in Meshify's platform.
Scylla is a new open source NoSQL database that is compatible with Apache Cassandra but provides significantly higher performance through a redesign that takes advantage of modern hardware. Scylla is capable of over 1.8 million operations per second per node with predictable low latencies. It uses an architecture with shard-per-core and reactor programming that avoids locks and threads for near-linear scaling. Scylla also has its own efficient unified cache and I/O scheduler that maximize throughput and allow it to outperform Cassandra on benchmarks by an order of magnitude. Scylla is fully compatible with Cassandra and aims to build an open source community around ongoing core database improvements.
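The shard-per-core design works because every partition key deterministically maps to one shard, so a request can be routed straight to the owning core and no cross-core locking is needed. A simplified illustration of key-to-shard routing (Scylla's real algorithm uses Murmur3 tokens and a biased token-range split, not a plain hash modulo):

```python
import hashlib

def shard_of(partition_key: str, nr_shards: int) -> int:
    """Hash the partition key to a token, then map the token to a shard.
    Illustrative stand-in for Scylla's token-based shard routing."""
    token = int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")
    return token % nr_shards

# A good hash spreads keys roughly evenly across the cores' shards.
counts = [0] * 8
for i in range(10000):
    counts[shard_of(f"key-{i}", 8)] += 1
```

Shard-aware drivers exploit exactly this determinism: by computing the shard client-side, they open a connection per shard and skip an extra network hop inside the node.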
Seastar is an open source framework for building highly scalable, asynchronous distributed applications. It uses a shared-nothing architecture with no locks or threads to achieve linear scaling across cores. Applications built on Seastar can handle millions of connections and I/O operations in parallel, using an asynchronous programming model based on promises and futures with zero-copy networking and disk I/O for high performance.
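The futures-based model means each operation yields instead of blocking, so a single thread per core can interleave enormous numbers of in-flight requests. Python's asyncio offers a rough analogy to Seastar's C++ futures (an analogy only, not Seastar itself):

```python
import asyncio

async def read_request(i):
    await asyncio.sleep(0)  # stand-in for non-blocking network I/O
    return f"req-{i}"

async def handle(i):
    # Each await is a suspension point, not a blocked thread: while this
    # handler waits for I/O, the same thread runs other handlers.
    req = await read_request(i)
    return req.upper()

async def main():
    # Run many "connections" concurrently on a single thread.
    return await asyncio.gather(*(handle(i) for i in range(1000)))

results = asyncio.run(main())
```

Seastar pushes the same idea much further: no shared event loop across cores, no allocator contention, and zero-copy I/O paths, which is where the per-core scaling comes from.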
Scylla Summit 2022: Stream Processing with ScyllaDB - ScyllaDB
Palo Alto Networks processes terabytes of events each day. One of their many challenges is to understand which of those events (which might come from various different sensors) actually describe the same story but from many different viewpoints.
Traditionally, such a system would need some sort of a database to store the events, and a message queue to notify consumers about new events that arrived into the system. They wanted to mitigate the cost and operational overhead of deploying yet another stateful component to their system, and designed a solution that uses ScyllaDB as the database for the events *and* as a message queue that allows their consumers to consume the correct events each time. Join this talk with Daniel Belenky, Principal Software Engineer, Palo Alto Networks, where he will walk you through their process.
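The database-as-queue pattern boils down to two pieces of state in the same store: per-partition event rows in arrival order, and a per-consumer offset row marking how far each consumer has read. A minimal in-memory sketch of that pattern (hypothetical names; not Palo Alto Networks' design):

```python
from collections import defaultdict

class TableQueue:
    """One store acting as both event table and queue: events append to
    per-partition rows, and each consumer persists an offset so it reads
    only rows past what it has already consumed."""

    def __init__(self):
        self.rows = defaultdict(list)     # partition -> ordered events
        self.offsets = defaultdict(int)   # (consumer, partition) -> next index

    def append(self, partition, event):
        self.rows[partition].append(event)

    def consume(self, consumer, partition):
        start = self.offsets[(consumer, partition)]
        events = self.rows[partition][start:]
        self.offsets[(consumer, partition)] = len(self.rows[partition])
        return events

q = TableQueue()
q.append("sensor-a", "e1")
q.append("sensor-a", "e2")
batch1 = q.consume("worker-1", "sensor-a")  # first read: both events
q.append("sensor-a", "e3")
batch2 = q.consume("worker-1", "sensor-a")  # only the new event
```

Because the events and the offsets live in the same database, there is one less stateful system to deploy, monitor, and keep consistent with the data.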
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Since its inception, Scylla has offered a compelling alternative to Apache Cassandra, providing better performance for a lower cost of ownership.
With Scylla Open Source 4.0 we continue to extend our CQL interface features and capabilities and also now provide an open source alternative to DynamoDB, allowing you to run your workloads anywhere, on any cloud provider, or on premises.
Join ScyllaDB co-founders, CTO Avi Kivity and CEO Dor Laor, for a look at the new features in Scylla Open Source 4.0, and architectural and cost comparisons with the coming Cassandra 4.0.
Topics will include:
Improved consistency with our new Lightweight Transactions
Scylla Operator for Kubernetes
How we stack up against Apache Cassandra 4.0
Our “run anywhere” DynamoDB alternative
This document discusses ScyllaDB's process for sizing a Scylla cluster. It begins by outlining the importance of understanding business, application, and infrastructure requirements. Then it walks through building a sample system based on provided workload details. It shows how the sample system could be configured on different cloud platforms like AWS, Azure, and GCP. Finally, it highlights Scylla's sizing sheet tool for helping to determine hardware needs based on workload characteristics and performance goals.
High-Load Storage of Users’ Actions with ScyllaDB and HDDs - ScyllaDB
The presentation gives a brief overview of a high-load service that stores users' actions. The service is able to serve up to 240k writes per second with a 95th-percentile latency under 2ms using just a few ScyllaDB nodes packed with HDDs. Hardware setup, cluster specification, live load numbers, and latencies achieved are given. The problems encountered with the HDD setup are described along with possible solutions.
Scylla Summit 2018: Make Scylla Fast Again! Find out how using Tools, Talent,... - ScyllaDB
Scylla strives to deliver high throughput at low, consistent latencies under any scenario. But in the field, things can and do get slower than one would like. Some of those issues come from bad data modelling and anti-patterns, others from lack of resources and bad system configuration, and in rare cases even product malfunction.
But how to tell them apart? And once you do, how to understand how to fix your application or reconfigure your system? Scylla has a rich ecosystem of tools available to answer those questions and in this talk we’ll discuss the proper use of some of them and how to take advantage of each tool’s strength. We will discuss real examples using tools like CQL tracing, nodetool commands, the Scylla monitor and others.
This document summarizes a presentation by Vinay Chella and Joey Lynch from Netflix on building and running cloud native Cassandra. They outline some of Cassandra's limitations for cloud deployments including development friction, packaging issues, cluster startup difficulties, and lack of scaling tools. Their proposals aim to address these by improving documentation, automating builds/tests, packaging for containers/packages, adding cluster control planes, and integrating metrics/monitoring. The speakers believe targeted changes can help Cassandra better support cloud-native principles of flexibility, scalability, and reliability.
Scylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle - ScyllaDB
Applications have never been so data-hungry, nor as demanding for scale, speed and availability. Hear from CEO Dor Laor as he shares how ScyllaDB is powering this next tech cycle.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Implementing a Distributed NoSQL Database in a Persistent Distributed Ledger ... - ScyllaDB
IOTA uses Scylla to store unlimited amounts of data generated by IoT devices on the IOTA network. Scylla is an open source NoSQL database with a shared-nothing architecture and LSM-based storage optimized for IOTA's needs. Key benefits include the ability to handle unbounded amounts of time-series data from devices, to dynamically add or remove nodes from the Scylla cluster supporting IOTA, and to power the IOTA Tangle Explorer.
ScyllaDB CTO Avi Kivity looks at the present state of Scylla's capabilities, and offers a glimpse of what's to come. From incremental compaction strategy to take advantage of newer, denser nodes, to data transformations with User Defined Functions (UDFs) and User Defined Aggregates (UDAs), ScyllaDB continues to expand its horizons for capabilities, use cases and APIs.
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records - ScyllaDB
In this talk, we will discuss Happn's war story about migrating a Cassandra 2.1 cluster containing more than 68 billion records in a counter table to ScyllaDB Open Source.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
This document discusses PostgreSQL database architecture patterns for running PostgreSQL at scale when a managed relational database service like Amazon RDS won't meet your needs. It describes challenges faced with MySQL, Redshift, and Vertica, and how PostgreSQL was better suited through techniques like partitioning by date, TOAST compression, foreign data wrappers, and "poor man's" parallel processing. Key takeaways are that PostgreSQL supported scaling to petabytes of data, sub-second queries across large date ranges, and the custom extensions needed, while avoiding the limitations and expenses of other database options.
Storing State Forever: Why It Can Be Good For Your Analytics - Yaroslav Tkachenko
State is an essential part of modern streaming pipelines: it enables a variety of foundational capabilities like windowing, aggregation, enrichment, etc. But usually, the state is either transient, kept only until the window is closed, or fairly small and doesn't grow much. But what if we treat the state differently? The keyed state in Flink can be scaled vertically and horizontally, it's reliable and fault-tolerant... so is scaling a stateful Flink application that different from scaling any data store like Kafka or MySQL?
At Shopify, we've worked on a massive analytical data pipeline that's needed to support complex streaming joins and correctly handle arbitrarily late-arriving data. We came up with an idea to never clear state and support joins this way. We've made a successful proof of concept, ingested all historical transactional Shopify data and ended up storing more than 10 TB of Flink state. In the end, it allowed us to achieve 100% data correctness.
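The essence of the never-clear-state join is that whichever side of the join arrives first is retained indefinitely, so an arbitrarily late record on the other side still produces a correct match. A tiny keyed-join sketch of the idea (illustrative only; Shopify's pipeline uses Flink's keyed state, not in-memory dicts):

```python
from collections import defaultdict

class ForeverJoin:
    """Keyed streaming join that never clears state: each side buffers
    its records per key forever, and every arrival emits joins against
    everything already buffered on the opposite side."""

    def __init__(self):
        self.left = defaultdict(list)
        self.right = defaultdict(list)

    def on_left(self, key, value):
        self.left[key].append(value)
        return [(value, r) for r in self.right[key]]

    def on_right(self, key, value):
        self.right[key].append(value)
        return [(l, value) for l in self.left[key]]

j = ForeverJoin()
early = j.on_left("order-1", "created")   # no match yet; state retained
late = j.on_right("order-1", "paid")      # arrives much later, still joins
```

The trade-off is explicit: state grows without bound (10+ TB in the talk's case), and in exchange no window boundary can ever drop a legitimate match.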
Scylla Summit 2019 Keynote - Dor Laor - Beyond CassandraScyllaDB
ScyllaDB CEO Dor Laor lays out the ten million dollar engineering problem for distributed systems, and how only Scylla is architected to address the issue at the heart of Big Data ROI. He then introduces ScyllaDB's Glauber Costa and Packet's James Malachowski to reveal a new level of performance for a persistent NoSQL datastore. Dor concludes his talk with a bold proposition about how Scylla is uniquely positioned to help companies easily create and scale the software they need to achieve their vision.
Scylla began with a Cassandra compatibility story, implementing Cassandra’s query language (CQL) and replicating its user-visible architecture. Recently we introduced “Alternator” - an experimental feature adding compatibility with a second NoSQL database: Amazon’s DynamoDB. In this talk we look at why DynamoDB’s API was chosen as a good target for our API extension, how DynamoDB is similar to Scylla - and how it differs, and how we can implement DynamoDB’s API in Scylla. We will describe our progress so far in making Alternator compatible with DynamoDB - and what still remains to be done so that any DynamoDB application can run unmodified on Scylla.
Using ScyllaDB with JanusGraph for Cyber SecurityScyllaDB
Come hear how QOMPLX, a leader in Cyber Security Risk Management solutions uses ScyllaDB and JanusGraph to detect, manage and assess risks for large corporate and government clients. By leveraging two highly horizontally scalable and fault tolerant technologies, QOMPLX can flex with their clients' needs.
How to Monitor and Size Workloads on AWS i3 instancesScyllaDB
There is a new class of machines in town! Amazon recently unveiled i3, a new class of machines targeted at I/O-intensive workloads. Scylla will officially support i3, and previews are already available.
Join our webinar to learn how to build a state-of-the-art database solution. Presenters Glauber Costa and Eyal Gutkind will cover how to:
- Determine which workloads can benefit from i3 instances
- Ensure Scylla fully leverages the great resources in the i3 family
- Effectively navigate the Scylla monitoring system and identify bottlenecks
You'll also see a live demonstration with a dashboard featuring an i3 cluster with different data models and workloads.
No matter how resilient your database infrastructure is, backups are still needed to defend against catastrophic failures. Be it the unlikely hardware failure of all data centers, or the more likely and all-too-human user error. Acknowledging the importance of good backup procedures, the Scylla Manager now natively supports backup and restore operations. In this talk, we will learn more about how that works and the guarantees provided, as well as how to set it up to guarantee maximum resiliency to your cluster.
Scylla: 1 Million CQL operations per second per serverAvi Kivity
My Cassandra Summit 2015 presentation introducing Scylla, an open source NoSQL implementation compatible with Apache Cassandra, but 10 times faster.
De-animated
http://scylladb.com
Scylla Summit 2018: Meshify - A Case Study, or Petshop SeamonstersScyllaDB
Meshify is the IOT platform focused on wireless sensor technology for industrial/insurance IOT. This talk will provide an overview of how Meshify is using Scylla. It will also explain why, when everything else in Meshify’s platform is moving to a managed cloud service or a container based microservice, why and how the Scylla nodes are the only pet “seamonsters” in Meshify’s platform.
Scylla is a new open source NoSQL database that is compatible with Apache Cassandra but provides significantly higher performance through a redesign that takes advantage of modern hardware. Scylla is capable of over 1.8 million operations per second per node with predictable low latencies. It uses an architecture with shard-per-core and reactor programming that avoids locks and threads for near-linear scaling. Scylla also has its own efficient unified cache and I/O scheduler that maximize throughput and allow it to outperform Cassandra on benchmarks by an order of magnitude. Scylla is fully compatible with Cassandra and aims to build an open source community around ongoing core database improvements.
Seastar is an open source framework that provides highly scalable and asynchronous distributed applications. It uses a shared-nothing architecture with no locks or threads to achieve linear scaling across cores. Applications built on Seastar can handle millions of connections and I/O operations in parallel. It uses an asynchronous programming model based on promises and futures with zero-copy networking and disk I/O for high performance.
Scylla Summit 2022: Stream Processing with ScyllaDBScyllaDB
Palo Alto Networks processes terabytes of events each day. One of their many challenges is to understand which of those events (which might come from various different sensors) actually describe the same story but from many different viewpoints.
Traditionally, such a system would need some sort of a database to store the events, and a message queue to notify consumers about new events that arrived into the system. They wanted to mitigate the cost and operational overhead of deploying yet another stateful component to their system, and designed a solution that uses ScyllaDB as the database for the events *and* as a message queue that allows our consumers to consume the correct events each time. Join this talk with Daniel Belenky, Principal Software Engineer, Palo Alto Networks where he will walk you through their process.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Since its inception, Scylla has offered a compelling alternative to Apache Cassandra, providing better performance for a lower cost of ownership.
With Scylla Open Source 4.0 we continue to extend our CQL interface features and capabilities and also now provide an open source alternative to DynamoDB, allowing you to run your workloads anywhere, on any cloud provider, or on premises.
Join ScyllaDB co-founders, CTO Avi Kivity and CEO Dor Laor, for a look at the new features in Scylla Open Source 4.0, and architectural and cost comparisons with the coming Cassandra 4.0.
Topics will include:
Improved consistency with our new Lightweight Transactions
Scylla Operator for Kubernetes
How we stack up against Apache Cassandra 4.0
Our “run anywhere” DynamoDB alternative
This document discusses ScyllaDB's process for sizing a Scylla cluster. It begins by outlining the importance of understanding business, application, and infrastructure requirements. Then it walks through building a sample system based on provided workload details. It shows how the sample system could be configured on different cloud platforms like AWS, Azure, and GCP. Finally, it highlights Scylla's sizing sheet tool for helping to determine hardware needs based on workload characteristics and performance goals.
High-Load Storage of Users’ Actions with ScyllaDB and HDDsScyllaDB
The presentation gives a brief overview of the high-load service that stores users' actions. The given service is able to serve up to 240k writes per second in less than 2ms 95 percentile with just a few ScyllaDB nodes packed with HDDs. Hardware setup, cluster specification, live load numbers and latencies achieved are given. The problems we encountered with HDD setup are described along with the possible solutions to them.
Scylla Summit 2018: Make Scylla Fast Again! Find out how using Tools, Talent,...ScyllaDB
Scylla strives to deliver high throughput at low, consistent latencies under any scenario. But in the field things can and do get slower than one would like. Some of those issues come from bad data modelling and anti-patterns. Some others from lack of resources and bad system configuration, and in rare cases even product malfunction.
But how to tell them apart? And once you do, how to understand how to fix your application or reconfigure your system? Scylla has a rich ecosystem of tools available to answer those questions and in this talk we’ll discuss the proper use of some of them and how to take advantage of each tool’s strength. We will discuss real examples using tools like CQL tracing, nodetool commands, the Scylla monitor and others.
This document summarizes a presentation by Vinay Chella and Joey Lynch from Netflix on building and running cloud native Cassandra. They outline some of Cassandra's limitations for cloud deployments including development friction, packaging issues, cluster startup difficulties, and lack of scaling tools. Their proposals aim to address these by improving documentation, automating builds/tests, packaging for containers/packages, adding cluster control planes, and integrating metrics/monitoring. The speakers believe targeted changes can help Cassandra better support cloud-native principles of flexibility, scalability, and reliability.
Scylla Summit 2022: How ScyllaDB Powers This Next Tech CycleScyllaDB
Applications have never been so data-hungry, nor as demanding for scale, speed and availability. Hear from CEO Dor Laor as he shares how ScyllaDB is powering this next tech cycle.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Implementing a Distributed NoSQL Database in a Persistent Distributed Ledger ...ScyllaDB
IOTA uses Scylla to store unlimited amounts of data generated by IoT devices on the IOTA network. Scylla is an open source NoSQL database that provides shared-nothing architecture and LSM-based storage optimized for IOTA's needs. Key features of using Scylla include its ability to handle unlimited amounts of time series data from devices, dynamically add or remove nodes from the Scylla cluster supporting IOTA, and power the IOTA Tangle Explorer.
ScyllaDB CTO Avi Kivity looks at the present state of Scylla's capabilities, and offers a glimpse of what's to come. From incremental compaction strategy to take advantage of newer, denser nodes, to data transformations with User Defined Functions (UDFs) and User Defined Aggregates (UDAs), ScyllaDB continues to expand its horizons for capabilities, use cases and APIs.
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScyllaDB
In this talk, we will discuss Happn's war story about migrating a Cassandra 2.1 cluster containing more than 68 Billion records in a counter table to ScyllaDB Open Source.
This document discusses PostgreSQL database architecture patterns for running PostgreSQL at scale when a relational database as a service like Amazon RDS won't meet needs. It describes challenges faced with MySQL, Redshift and Vertica and how PostgreSQL was better suited through techniques like partitioning by date, TOAST compression, foreign data wrappers, and poor man's parallel processing. Key takeaways are that PostgreSQL supported scaling to petabytes of data, sub-second queries across large date ranges, and custom extensions needed while avoiding limitations and expenses of other database options.
Storing State Forever: Why It Can Be Good For Your Analytics (Yaroslav Tkachenko)
State is an essential part of the modern streaming pipelines: it enables a variety of foundational capabilities like windowing, aggregation, enrichment, etc. But usually, the state is either transient, so we only keep it until the window is closed, or it's fairly small and doesn't grow much. But what if we treat the state differently? The keyed state in Flink can be scaled vertically and horizontally, it's reliable and fault-tolerant... so is scaling a stateful Flink application that different from scaling any data store like Kafka or MySQL?
At Shopify, we've worked on a massive analytical data pipeline that's needed to support complex streaming joins and correctly handle arbitrarily late-arriving data. We came up with an idea to never clear state and support joins this way. We've made a successful proof of concept, ingested all historical transactional Shopify data and ended up storing more than 10 TB of Flink state. In the end, it allowed us to achieve 100% data correctness.
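Outside Flink, the core idea described above — keyed state that is never cleared, so an arbitrarily late-arriving record can still be joined against everything seen before it — can be sketched with a plain dictionary. This is an illustrative toy only (hypothetical event names, not Shopify's actual pipeline):

```python
from collections import defaultdict

# Keyed state that is never cleared: for each key we keep every record
# seen on both sides of the join, so a late record can still be matched
# against all earlier records on the other side.
state = defaultdict(lambda: {"left": [], "right": []})

def process(side, key, value):
    """Ingest one record and emit all join pairs it completes."""
    other = "right" if side == "left" else "left"
    state[key][side].append(value)
    # Join the new record with everything already stored on the other side.
    return [(value, v) if side == "left" else (v, value)
            for v in state[key][other]]

# Normal arrival order...
assert process("left", "order-1", "created") == []
assert process("right", "order-1", "paid") == [("created", "paid")]
# ...and a very late left-side record still joins correctly,
# because the right-side state was never expired.
assert process("left", "order-1", "refund") == [("refund", "paid")]
```

The tradeoff, as the talk notes, is state size: never clearing state means storage grows without bound (10+ TB in Shopify's case), in exchange for 100% correctness on late data.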
IBM Insight 2013 - Aetna's production experience using IBM DB2 Analytics Acce... (Daniel Martin)
Aetna uses IBM's DB2 Analytics Accelerator to improve the performance of long-running reports on its DB2 database. The accelerator offloads eligible queries to the Netezza appliance, reducing query times from hours to seconds. Aetna saw a 4x compression rate on its data and was able to load 1.5 billion rows in 15 minutes. Reports that previously timed out after 82 minutes now return results in 27 seconds, improving business users' ability to analyze data.
Kafka used at scale to deliver real-time notifications (Sérgio Nunes)
This document discusses using Apache Kafka at scale to deliver real-time notifications. It describes using Kafka to listen to database events, process them into transactions, translate transactions into notifications, and push notifications to mobile devices in real-time. It outlines the system architecture, challenges faced around partitioning, scaling consumers, and upgrading Kafka clients. Monitoring metrics and consumer lag is also discussed using tools like Burrow and Datadog.
Uber uses a scalable real-time complex event processing system to analyze streaming data from its services. The system uses Apache Samza for distributed stream processing and WSO2 Siddhi for complex event processing. Key events are detected using Siddhi queries and then actions like notifications or indexing to databases are triggered. The system processes over 30 billion messages per day across many use cases. Maintaining scalability, fault tolerance, and low latency are ongoing challenges.
Hangfire
An easy way to perform background processing in .NET and .NET Core applications. No Windows Service or separate process required.
Why Background Processing?
Lengthy operations like updating a lot of records in the DB
Checking every 2 hours for new data or files
Invoice generation at the end of every billing period
Monthly Reporting
Rebuild data, indexes or search-optimized index after data change
Automatic subscription renewal
Regular Mailings
Send an email due to an action
Background service provisioning
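Hangfire itself is a .NET library, but the fire-and-forget pattern behind the use cases above is language-neutral. As a rough, language-agnostic sketch (not Hangfire's actual API), the idea reduces to a queue drained by a background worker thread so the request path returns immediately:

```python
import queue
import threading

# A minimal fire-and-forget background processor: jobs are enqueued by
# the request-handling code and executed later by a worker thread.
jobs = queue.Queue()

def worker():
    while True:
        func, args = jobs.get()
        if func is None:          # sentinel value stops the worker
            break
        try:
            func(*args)           # run the job outside the request path
        finally:
            jobs.task_done()

def enqueue(func, *args):
    """Fire-and-forget: return immediately, run the job in the background."""
    jobs.put((func, args))

results = []
t = threading.Thread(target=worker, daemon=True)
t.start()

enqueue(results.append, "invoice-2024-06")   # e.g. invoice generation
enqueue(results.append, "monthly-report")    # e.g. monthly reporting
jobs.join()                                  # wait only for demo purposes
jobs.put((None, ()))                         # stop the worker
t.join()
print(results)                               # ['invoice-2024-06', 'monthly-report']
```

What Hangfire adds on top of this bare pattern is persistence (jobs survive process restarts), retries, recurring schedules, and a dashboard — which is why a library is preferable to a hand-rolled queue in production.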
This document summarizes Netflix's use of Kafka in their data pipeline. It discusses how Netflix evolved from using S3 and EMR to introducing Kafka and Kafka producers and consumers to handle 400 billion events per day. It covers challenges of scaling Kafka clusters and tuning Kafka clients and brokers. Finally, it outlines Netflix's roadmap which includes contributing to open source projects like Kafka and testing failure resilience.
This document summarizes Netflix's use of Kafka in their data pipeline. It discusses the evolution of Netflix's data pipeline to incorporate Kafka to handle 400 billion events per day. It describes how Netflix uses Kafka clusters with different priorities and configurations. It also outlines some of the challenges of using Kafka at Netflix's scale, such as Zookeeper client issues and cluster scaling, and the solutions Netflix developed to address these challenges.
Processing 19 billion messages in real time and NOT dying in the process (Jampp)
Here is an introduction to the Jampp architecture for data processing. We walk through our journey of migrating to systems that allow us to process more data in real time.
LinkedIn has multiple data-centers hosting tens of thousands of servers across them. A large percentage of these servers host our data infrastructure - our distributed data store called Espresso is sizeable amongst them. The fleet of servers contain various hardware components including, but not limited to, SSDs; and hardware has a tendency of failing from time to time. In case of hardware failures the servers need to undergo maintenance, which can take a significant amount of time based on the type of failure. This creates reduced capacity for that duration and poses an interesting problem of maintaining capacity in the face of multiple failures. This talk covers how LinkedIn uses Camunda, wrapped with several components, to achieve hands-off capacity management via multiple workflows, with asynchronous pauses and synchronisation among them. It will also highlight how we achieved seamless integrations with various platforms and components within LinkedIn's infrastructure, and a few best practices that helped us achieve the final state.
The challenges of live events scalability (Guy Tomer)
Guy Tomer, founder of attracTV, discusses the challenges of supporting large online live events with TV participation. AttracTV supported an online music awards show with over 1 million streams and 100,000 concurrent users. Key challenges included scaling to support these large user numbers, handling steep increases in users, managing large amounts of data, maintaining high availability, testing at this scale, and controlling costs. Tomer emphasizes planning for horizontal scalability, keeping things simple, testing under realistic conditions, continuous monitoring, and using cloud infrastructure to handle demand flexibly at an affordable cost.
The Journey To Serverless At Home24 - reflections and insights (AWS Germany)
Presentation "The Journey To Serverless At Home24" from Çağatay Gürtürk & Martin Lindenberg at the AWS E-Business Web Day for windows applications. All videos and presentations can be found here: http://amzn.to/2ds3aMX
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor... (ScyllaDB)
Customer Data Platforms, commonly called CDPs, form an integral part of the marketing stack powering Zeotap's Adtech and Martech use-cases. The company offers a privacy-compliant CDP platform, and ScyllaDB is an integral part. Zeotap's CDP demands a mix of OLTP, OLAP, and real-time data ingestion, requiring a highly-performant store.
In this presentation, Shubham Patil, Lead Software Engineer, and Safal Pandita, Senior Software Engineer at Zeotap, share how ScyllaDB is powering their solution and why it's a great fit. They begin by describing their business use case and the challenges they were facing before moving to ScyllaDB. Then they cover their technical use cases and requirements for real-time and batch data ingestion. They delve into their data access patterns and describe their data model supporting all use cases simultaneously for ingress/egress. They explain how they are using Scylla Migrator for their migration needs, then describe their multiregional, multi-tenant production setup for onboarding more than 130 partners. Finally, they finish by sharing some of their learnings, performance benchmarks, and future plans.
Building a real-time, scalable and intelligent programmatic ad buying platform (Jampp)
After a brief introduction to programmatic ads and RTB, we go through the evolution of Jampp's data platform to handle the enormous amount of data we need to process.
The document describes the migration journey from Amazon RDS to Postgres Plus Cloud Database (PPCD). It outlines the business challenges with Amazon RDS including limited storage capacity, slow performance, and lack of control. It then discusses how xDB replication was used along with pg_dump and pg_restore to migrate the data. Several issues were encountered with xDB replication including prepared statements, monitoring, and NaN values. The migration involved fixing these issues, performing a final sync, and pointing the application to the new target database on PPCD. The document stresses the importance of proper planning, validation, and deep knowledge of migration tools.
Serverless Design Patterns for Rethinking Traditional Enterprise Application ... (Amazon Web Services)
AWS Lambda is a powerful and flexible tool for solving diverse business problems, from traditional grid computing to scheduled batch processing workflows. Cloud native solutions using AWS Lambda enable architectures that depart from traditional enterprise application design. These new design patterns can provide substantially increased performance and reduced costs. In this session, learn how Fannie Mae re-architected one of their mission-critical traditional grid computing applications to a modern serverless solution using AWS Lambda. Learn More: https://aws.amazon.com/government-education/
Many enterprise leaders have similar questions when considering a move to the cloud: How should I optimize my cloud strategy to support the business? How do I help the organization become faster, better and cheaper? How do I increase business agility and drive more and better business analytics and insight? In this webinar you will learn about best practices for optimizing your cloud for better business agility and cost.
MongoDB 3.2 introduces a host of new features and benefits, including encryption at rest, document validation, MongoDB Compass, numerous improvements to queries and the aggregation framework, and more. To take advantage of these features, your team needs an upgrade plan.
In this session, we’ll walk you through how to build an upgrade plan. We’ll show you how to validate your existing deployment, build a test environment with a representative workload, and detail how to carry out the upgrade. By the end, you should be prepared to start developing an upgrade plan for your deployment.
Similar to iFood on Delivering 100 Million Events a Month to Restaurants with Scylla (20)
Optimizing NoSQL Performance Through Observability (ScyllaDB)
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. But before you squeeze, make sure you know what to monitor!
Watch our experienced Postgres developer work through monitoring and performance strategies that help him understand what mistakes he’s made moving to NoSQL. And learn with him as our database performance expert offers friendly guidance on how to use monitoring and performance tuning to get his sample Rust application on the right track.
This webinar focuses on using monitoring and performance tuning to discover and correct mistakes that commonly occur when developers move from SQL to NoSQL. For example:
- Common issues getting up and running with the monitoring stack
- Using the CQL optimizations dashboard
- Common issues causing high latency in a node
- Common issues causing replica imbalance
- What a healthy system looks like in terms of memory
- Key metrics to keep an eye on
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
Event-Driven Architecture Masterclass: Challenges in Stream Processing (ScyllaDB)
Discuss the core tradeoffs and considerations involved in order-free and ordered stream processing. Brian Taylor walks through the pros and cons of three different approaches: no data dependency, deferred inter-event data dependency, and streaming inter-event data dependency.
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac... (ScyllaDB)
We start by setting up a common ground introducing why relational databases fall short, addressing common EDA characteristics such as the need for real-time response times and schemaless approaches to address recurring changes to adapt and on-board new use cases. Next, interact with a sample Rust-based application: a social network app demonstrating an integration of both ScyllaDB and Redpanda.
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance... (ScyllaDB)
Discover how to avoid common pitfalls when shifting to an event-driven architecture (EDA) in order to boost system recovery and scalability. We cover Kafka Schema Registry, in-broker transformations, event sourcing, and more.
Developer Data Modeling Mistakes: From Postgres to NoSQL (ScyllaDB)
See where an RDBMS-pro’s intuition leads him astray – and learn practical tips for the data modeling transition
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning.
Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts.
This webinar focuses on common data modeling and querying mistakes that occur when developers move from SQL to NoSQL. For example:
- Understanding query first design principles
- Planning for schema evolution
- Steering clear of common pitfalls and anti-patterns
- Assessing data access patterns
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
What Developers Need to Unlearn for High Performance NoSQL (ScyllaDB)
See where an RDBMS-pro’s intuition leads him astray – and learn practical tips for the transition
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning.
Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts.
Our first webinar of this series will cover common mistakes with practices such as:
- Translating the data model to NoSQL
- Optimizing table design
- Optimizing query performance
- Planning for partitioning
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
Low Latency at Extreme Scale: Proven Practices & Pitfalls (ScyllaDB)
Expert tips on how to maximize your database performance at scale
Untangle the complexity of achieving database performance at scale. Join this webinar to discover commonly overlooked ways to get predictable low latency, even at extreme scale. Our Solution Architects will walk you through the strategies and pitfalls learned by working on thousands of real-world distributed database projects, many reaching 1M OPS with single-digit MS latencies.
In addition to offering clear recommendations, we’ll also explain the process behind how we arrived at them – so you can benefit from the lessons learned by other teams.
We’ll cover how to:
- Design and deploy a large-scale distributed database cluster
- Optimize your clients’ interactions with it
- Expand the cluster horizontally and globally
- Ensure it survives whatever disasters the world throws at it
Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments.
Join us for an interactive session where we dissect four specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma, we'll:
- Examine the context and technical requirements
- Talk about potential solutions and cover the pros and cons of each
- Disclose what approach the team took, and how it worked out
About the speaker:
Felipe is an IT specialist with years of experience in distributed systems and open-source technologies. He is one of the co-authors of "Database Performance at Scale", an Open Access, freely available publication for individuals interested in improving database performance. At ScyllaDB, he works as a Solution Architect.
Beyond Linear Scaling: A New Path for Performance with ScyllaDB (ScyllaDB)
Linear scaling (sometimes near linear scaling) is often mentioned in several benchmarks, articles and product comparisons as proof that a given technology and algorithmic optimizations perform better than another. But is that really what performance is all about, and should you even care?
This webinar discusses performance beyond linear scalability, including what typically matters more when running high throughput and low latency workloads at scale. We'll cover how ScyllaDB offers unparalleled performance and share our insights on:
- The hidden aspects of linear scaling
- When linear scaling matters most and when it’s simply irrelevant
- Often overlooked considerations for optimizing and measuring distributed systems performance
Watch now to learn from our experience (and lessons learned) in building the fastest NoSQL database in the world.
Navigating Complex Database Performance Hurdles
Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments.
Join us for an interactive session where we dissect 4 specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma:
- The presenters will describe the context and technical requirements
- Together, we’ll talk about potential solutions and cover the pros and cons of each
- Finally, we’ll disclose what approach the team took, and how it worked out
Throughout the event, we’ll have opportunities to win ScyllaDB swag and prizes! Come prepared to engage in lively discussions and gain valuable insight into database performance strategies.
Database Performance at Scale Masterclass: Workload Characteristics by Felipe... (ScyllaDB)
Felipe Cardeneti Mendes, Solutions Architect at ScyllaDB
Navigating workload-specific performance challenges and tradeoffs.
Felipe Mendes covers how to navigate the top performance challenges and tradeoffs that you’re likely to face with your project’s specific workload characteristics and technical/business requirements.
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya... (ScyllaDB)
Pavel Emelyanov, Principal Engineer at ScyllaDB
Botond Denes, C++ Developer at ScyllaDB
What performance-minded engineers need to know.
Hear from Pavel Emelyanov and Botond Dénes on the impact of database internals – specifically, what to look for if you need latency and/or throughput improvements.
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna (ScyllaDB)
Piotr Sarna, Software Engineer at Turso
Understanding and tapping your driver’s performance potential.
Piotr Sarna discusses how to get the most out of a driver, particularly from the performance perspective, and select a driver that’s a good fit for your needs.
This document discusses replacing external caching solutions with using the internal caching capabilities of ScyllaDB. It provides examples of companies that improved performance, reduced costs and complexity by moving from Redis or Elasticsearch with an external cache to using ScyllaDB's embedded cache instead. The document also outlines some of the advantages of ScyllaDB's cache like improved latency, coherency with the database and observability compared to external caching layers.
Powering Real-Time Apps with ScyllaDB: Low Latency & Linear Scalability (ScyllaDB)
Discover how your team can achieve low latency at the extreme scale that your data-intensive applications require. We’ll walk you through an example of how ScyllaDB scales linearly to achieve 1M and then 2M OPS – with <1ms P99 latency. We’ll cover how this works on a sample realtime app (an ML feature store), share best practices for performance, and talk about the most important tradeoffs you’ll need to negotiate.
Join us to learn:
- Why and how to ensure your database takes full advantage of your cloud infrastructure
- What architectural considerations matter most for high throughput and low latency
- Key factors to consider when selecting a high-performance database
7 Reasons Not to Put an External Cache in Front of Your Database (ScyllaDB)
This document discusses the pros and cons of placing an external cache in front of a database. It introduces Tomasz Grabiec and Tzach Livyatan from ScyllaDB and describes ScyllaDB's optimized internal caching design. External caches can increase latency and costs while ignoring the database's context and workload knowledge. ScyllaDB embeds its cache to minimize overhead and ensure data and query awareness. The document shares customer examples that improved performance and reduced costs by moving from cached databases to ScyllaDB.
Expert tips on how to maximize your database potential
If you’re considering or getting started with ScyllaDB, you’re probably intrigued by its potential to achieve high throughput and predictable low latency at a reasonable cost. So how do you ensure that you’re maximizing that potential for your team’s specific workloads and use case?
This webinar offers practical advice for navigating the various decision points you’ll face as you assess whether ScyllaDB is a good fit for your team and later roll it out into production. We’ll cover the most critical considerations, tradeoffs, and recommendations related to:
- Infrastructure selection
- ScyllaDB configuration
- Client-side setup
- Data modeling
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration (ScyllaDB)
In this talk, Felipe Mendes, Solutions Architect at ScyllaDB, shares how 4 companies managed their migration. He covers:
- Disney+ – No migration needed!
- Discord – Shadow cluster
- OpenWeb – TTL expiration, cover Load and Stream
- MyHeritage – Counters
- ShareChat – Bonus: a bit of everything
In this talk, Lubos discusses tools and methods for a successful migration. He covers:
- Methods
- Data (re)modeling
- APIs
- Spark Migrator
- DS bulk
- Tuning
- Testing/monitoring
NoSQL Data Migration Masterclass - Session 1: Migration Strategies and Challenges (ScyllaDB)
In this talk, Jon discusses practical strategies and issues to consider. He covers:
- Reasons for Migrations
- DB Functionality
- Cost/Licensing
- Outdated Technology
- Scaling Problems
- Technology Evolution
- SQL to NoSQL
Unlock the Future of Search with MongoDB Atlas: Vector Search Unleashed (Malak Abu Hammad)
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
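As a minimal, library-agnostic illustration of the "What is Vector Search?" point above (this is not MongoDB Atlas's actual API), vector search reduces to nearest-neighbor ranking by similarity over embedding vectors — here, cosine similarity over toy vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "embeddings" (in practice these come from an embedding model,
# and a real engine uses an approximate index rather than a full scan).
docs = {
    "pasta recipe":   [0.9, 0.1, 0.0],
    "pizza review":   [0.8, 0.2, 0.1],
    "tax form guide": [0.0, 0.1, 0.9],
}

def vector_search(query_vec, k=2):
    """Return the k documents whose vectors are closest to the query."""
    ranked = sorted(docs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

print(vector_search([1.0, 0.0, 0.0]))  # -> ['pasta recipe', 'pizza review']
```

Production systems replace the brute-force scan with an approximate nearest-neighbor index (e.g. HNSW), which is what makes vector search practical at scale.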
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Communications Mining Series - Zero to Hero - Session 1 (DianaGray10)
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe (Paige Cruz)
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to the purview of ops, infra, and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
TrustArc Webinar - 2024 Global Privacy Survey (TrustArc)
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
2. Presenter
Thales Biancalana, Senior Backend Developer at iFood
Control and Automation Engineer who decided that programming is more exciting than building robots. Has worked on multiple applications using .NET, Node, React, Swift, and Java, and now works as a backend developer at iFood. Always looking for new challenges and different ways to solve them.
11. Polling Services
■ Why?
■ HTTP requests every 30 seconds for each device
■ Database invoked on every call
■ Heavy queries on read nodes: “all non-acked events by the device”
■ Mid-term goal: support 500k connected merchants with 1 device each
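The poll-and-ack contract described above can be sketched with a minimal in-memory stand-in. This is plain Java for illustration only; the class and method names are hypothetical, not iFood's actual API:

```java
import java.util.*;

/** In-memory sketch of the GET /events + POST /acks polling contract. */
class PollingStore {
    // device id -> events not yet acknowledged by that device
    private final Map<String, List<String>> pending = new HashMap<>();

    /** Called when a new order event arrives for a device. */
    public void index(String deviceId, String event) {
        pending.computeIfAbsent(deviceId, d -> new ArrayList<>()).add(event);
    }

    /** GET /events: every poll returns all not-yet-acked events. */
    public List<String> poll(String deviceId) {
        return new ArrayList<>(pending.getOrDefault(deviceId, List.of()));
    }

    /** POST /acks: an acked event is never delivered again. */
    public void ack(String deviceId, String event) {
        pending.getOrDefault(deviceId, new ArrayList<>()).remove(event);
    }
}
```

Every poll returns all not-yet-acked events again, which is why the acknowledgment step is what turns this into at-least-once delivery rather than endless redelivery.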
12. Polling Services
Multiple polling systems running in parallel:
■ Proxy Service: Kitchen-Polling
■ Service A: Gateway-Core (PostgreSQL) - Dead
■ Service B: Connection-Order-Events (Apache Ignite)
■ Service C: Connection-Polling (DynamoDB) - Dying
■ Service D: Connection-Polling (ScyllaDB)
14. PostgreSQL Legacy Service
■ Events indexed in one table and the acknowledgments in another
■ Reads (JOINs) were becoming a problem as the number of events and merchants increased
■ Master node suffering under increasing load
■ Single point of failure
17. Apache Ignite Service
■ Works really well (reads ~3ms)
Problems:
■ Hard to monitor, as service and database are one
■ We need to save events in another database for use when adding machines or recovering from disasters (more code to maintain)
■ It takes longer to get the service back up, as it needs to fill the cache from the PostgreSQL database; that's why we keep a fallback system for when it is down
19. NoSQL Modeling
■ Our main query?
● All events that were not acked by a device
■ Orders (and events) belong to merchants, not devices
● We need the merchant's devices when saving events
■ What to do with new devices?
● Return all merchant events and save them to the not-acked-by-device table
■ We are only interested in events from the last 8 hours from delivery time
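The first NoSQL modeling can be sketched in memory as three logical "tables" (merchant devices, merchant events, and unacked events by device). All names here are illustrative, and the 8-hour TTL is omitted for brevity (in DynamoDB/Scylla it is handled by expiration/TTL):

```java
import java.util.*;

/** In-memory sketch of the first NoSQL model: three "tables" as maps. */
class EventModelA {
    // table 1: merchant -> its devices
    private final Map<String, Set<String>> merchantDevices = new HashMap<>();
    // table 2: merchant -> all its events (used to backfill new devices)
    private final Map<String, List<String>> merchantEvents = new HashMap<>();
    // table 3: device -> events not yet acked by that device
    private final Map<String, List<String>> unackedByDevice = new HashMap<>();

    /** Saving an event fans out a copy to every known device of the merchant. */
    public void saveEvent(String merchantId, String event) {
        merchantEvents.computeIfAbsent(merchantId, m -> new ArrayList<>()).add(event);
        for (String device : merchantDevices.getOrDefault(merchantId, Set.of())) {
            unackedByDevice.computeIfAbsent(device, d -> new ArrayList<>()).add(event);
        }
    }

    /** A new device gets the merchant's existing events copied over. */
    public void registerDevice(String merchantId, String deviceId) {
        merchantDevices.computeIfAbsent(merchantId, m -> new HashSet<>()).add(deviceId);
        unackedByDevice.computeIfAbsent(deviceId, d -> new ArrayList<>())
                       .addAll(merchantEvents.getOrDefault(merchantId, List.of()));
    }

    /** The main query: all events not acked by a device. */
    public List<String> poll(String deviceId) {
        return new ArrayList<>(unackedByDevice.getOrDefault(deviceId, List.of()));
    }

    public void ack(String deviceId, String event) {
        unackedByDevice.getOrDefault(deviceId, new ArrayList<>()).remove(event);
    }
}
```

Note that the fan-out on write is what makes the main query a single-key read, at the cost of having to manage devices and backfill new ones, exactly the modeling issues the talk raises later.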
22. DynamoDB Service
■ Why DynamoDB?
● Try a NoSQL approach
● Most of our infrastructure is in AWS
● Fully managed solution
23. DynamoDB Service - Issues
Issues with DynamoDB for our use:
■ DynamoDB autoscaling was not fast enough for our use case unless we left a high minimum throughput or managed scaling ourselves
● Defeats the purpose of a fully managed solution
■ DynamoDB's new on-demand mode is great, but expensive
25. ScyllaDB Service A
■ Quite easy to migrate from DynamoDB to Scylla with the same modeling; it should be even easier with the new Project Alternator
26. ScyllaDB Service A - Results
■ How did it compare with DynamoDB?
● We started with a three-node c5.2xlarge cluster that easily held the throughput. This was nearly a 9x database cost reduction (around $4.5k down to $500/month), with headroom for even more throughput
27. ScyllaDB Service A - Learnings
■ Scylla uses TTL by column vs DynamoDB's expiration time by document
■ Scylla support: we identified a bug when reading pages from a secondary index with prepared statements. After opening a GitHub issue, we had a new build with the fix in less than 4 days (https://github.com/scylladb/scylla/issues/4569)
28. Modeling Issues
Issues with this modeling:
■ We need to manage restaurant devices
■ Need to manage old events for new devices
● It may be quite heavy to introduce a new device in the middle of the day
29. ScyllaDB Service B
Second modeling, using collections.
Drawbacks:
■ Reads are expected to be slower (okay for a fallback system)
Advantages:
■ Less complex
■ The events table can be used to populate the Ignite cache
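The collections-based model can be sketched the same way: a single events table per merchant, where each event carries a set of device ids that have already acked it (a set collection column in Scylla). This is an illustrative in-memory sketch with hypothetical names, not the production schema:

```java
import java.util.*;

/** Sketch of the second model: one events "table" with an acked-devices set per event. */
class EventModelB {
    private static class Event {
        final String payload;
        final Set<String> ackedBy = new HashSet<>(); // the collection column in Scylla
        Event(String payload) { this.payload = payload; }
    }

    // merchant -> its events (the device table is no longer needed)
    private final Map<String, List<Event>> eventsByMerchant = new HashMap<>();

    public void saveEvent(String merchantId, String payload) {
        eventsByMerchant.computeIfAbsent(merchantId, m -> new ArrayList<>())
                        .add(new Event(payload));
    }

    /** Read side: filter out events already acked by this device (slower, fine for a fallback). */
    public List<String> poll(String merchantId, String deviceId) {
        List<String> out = new ArrayList<>();
        for (Event e : eventsByMerchant.getOrDefault(merchantId, List.of())) {
            if (!e.ackedBy.contains(deviceId)) out.add(e.payload);
        }
        return out;
    }

    /** Ack: add the device to the event's set (a collection update in Scylla). */
    public void ack(String merchantId, String payload, String deviceId) {
        for (Event e : eventsByMerchant.getOrDefault(merchantId, List.of())) {
            if (e.payload.equals(payload)) e.ackedBy.add(deviceId);
        }
    }
}
```

Reads now scan the merchant's events and filter on the acked set, which is why they are slower; in exchange, a new device needs no backfill and the device table disappears. In Scylla, each ack is a collection update, which is exactly where the tombstone cost mentioned in the results comes from.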
31. ScyllaDB Service B - Results
The good:
■ Nearly 9x database cost reduction compared with DynamoDB on-demand
■ Time to index events reduced from ~80ms to ~3ms, which resulted in nearly 8x infrastructure reduction for writes
■ Solution complexity reduced from 4 tables and 2 indexes to 2 tables and 1 index, with 40% less code
The bad:
■ Increase in read times, worth it for now as a fallback system
■ Collection updates are CPU intensive and generate tombstones -> use carefully
33. Final Thoughts
■ Scylla was cheaper compared with DynamoDB, but we ran the cluster on AWS machines ourselves
● Take into consideration the cost of maintaining a cluster; learn from other talks how easy it is to maintain a cluster when choosing between databases
● But we have had no problems so far
■ Check what you know about your domain and problem; it can be used to simplify the solution
● Knowing it was a fallback system, and knowing the average number of devices and orders per merchant, led me to believe it was a good trade-off to accept collection updates, which should be used carefully
34. Final Thoughts
■ Get to know all the features of your database before using them
● Collection updates are not cheap! Each update incurs a tombstone, which slows down reads and gives more work to the garbage collector. We are still experimenting with gc_grace to improve performance
● ScyllaDB secondary indexes are global by default, which was a good thing for our second solution, where the index has a cardinality as high as the number of merchants (a bit more than 100k merchants online today). The same could be achieved in Cassandra with materialized views
● Global is the default, but it may not always be the best choice, so Scylla also supports local indexes; you need to know when to use each
36. Next Steps
■ No acknowledgment polling solution using Scylla
■ Force Scylla to fail
■ Working on MQTT pub/sub solution
37. Thank you
Stay in touch
Any questions?
Thales Biancalana
thales.biancalana@ifood.com.br
Editor's Notes
Hi everyone, my name is Thales. I'm here today because I've seen a lot of Scylla presentations talking about how awesome it is from a tech perspective: how many ops it can handle, how it compares with Cassandra as a drop-in replacement, and things like that. So I'm here to give a different perspective: what it was like to develop an application using Scylla, from a developer's point of view. I will not go into how to maintain the infrastructure, just monitoring and costs.
(I'll probably skip this slide, but I'll leave it here)
So as I said I’m Thales
Let me start by giving a little bit of context of what we do at iFood
iFood is a food tech delivery business. It is the main delivery app in Brazil, but we are present in other countries: Colombia and Mexico. We connect over 12 million users, 100 thousand merchants - mostly restaurants today, and deliverymen to deliver a bit over 20 million orders a month as of now, which amounts to a bit over 100 million events going through the platform every month.
So here is the user app on the left and the merchant web app on the right
Something relevant is how fast iFood grew: it went from 1 million to 20 million orders a month in a bit over two years. Because of that we still have some legacy services being broken into microservices using Java, Node, Docker, and Kubernetes. This was only possible using a cloud service, and most of iFood's infrastructure runs on AWS, which is why we are still using SNS and SQS to move events around our platform.
We use other technologies, but I'll mostly focus on these for our problem.
Despite its size, iFood is not yet an established tech company, and with the growth issues we are facing we are always looking for new ways to scale the infrastructure, which is what I'll try to share with you today. Most of our database infrastructure today sits on PostgreSQL and DynamoDB, which is not scaling well and is becoming expensive.
Now to talk about the project I'll first have to introduce the team I work in: Connection.
We are responsible, among other things, for delivering orders' events to merchants, either directly to the merchant app I showed before or to integrations for huge food companies. One of the ways this is done today is with a polling API.
So now I'll present the polling services we've worked on until we got to the Scylla solution.
Events arrive from the platform via SNS-SQS and are indexed in multiple services running in parallel so we can compare them. These events are polled by the app via a GET /events endpoint and are acknowledged via a POST /acks endpoint. The app then sends an acknowledgment for each event it receives so as not to receive it again on the next poll.
The polling is done every 30 seconds for each device
The database will be invoked on each /events call
We have heavy queries on reading nodes of: all non acked events by the device
The master
Something that I want to address now: why are we using polling instead of a pub/sub approach?
We do have a MQTT service, which we are still developing, but unfortunately we also need to support external integrations, and a lot of them are not tech savvy, so having a REST API is a strategic advantage for having more merchants without going after them.
This is just to give names to all the services we developed
We started with a monolith polling system over a PostgreSQL database that was the core of iFood for a long time. Reads were starting to become a problem as we got close to 10 million orders a month. We could solve it for a while by replicating to more read databases and scaling the master vertically, but since we were separating the polling system from other functionalities, the team took the opportunity to work on something better.
Just to give you a better understanding, this was the relational data format. We had the events table at the top and an acknowledge table with the event id and the device id for the acknowledge. We would then join both tables for the polling result.
Our second approach was to deliver events using the Apache Ignite in-memory database, indexing events and acknowledgments. We decided to use Apache Ignite because we already used it in another service. It was put in place around October last year. It works really well and is currently the primary polling system at iFood. When we first deployed it, it had the Postgres solution as a fallback.
It works wonderfully, but after working with it for some time we had some bones to pick with it. First, service and database are one, so we need to be really careful about deployments and scaling (one machine at a time), and, although not a problem with Ignite directly, we had multiple issues with AWS ELB discovery for the machines to talk to each other. We also need to save the events/acks in another database for when we add machines or recover from disasters. With that in mind, and thinking about removing the Postgres solution as a fallback, we started working on our first NoSQL solution.
So what do we know about our domain:
First that we want all events not acked by a device
Second that orders (and events) belong to a merchant, not to the device, so we need to know the merchant devices when saving the device events
We need to also index the events by merchant to query them when introducing a new device
Also, we are only interested in events from the last 8 hours from the delivery time. I say delivery time because we may have scheduled orders.
So this was our first NoSQL model. We have a table for unacked device events, one table for the restaurant or merchant events and another for the restaurant devices. I’m just going to point out that we introduced restaurants as merchants not so long ago, so we sometimes still use the term restaurant.
Now we get to the good NoSQL part, where I'll go into how we implemented the solution in a bit more depth than for the other solutions. But first, why did we choose DynamoDB as our first NoSQL solution? First, we wanted to try a NoSQL approach; second, we were already in the AWS ecosystem; and third, it is a fully managed solution.
As you can see, the solution is quite complex. We need to manage the restaurant devices and events for new devices
Other problems with this solution is that DynamoDB autoscaling was not fast enough unless we left a high enough reading and write capacities, which would defeat the purpose of cutting costs.
DynamoDB autoscaling only happens every 5 minutes, which is not fast enough for us. Lunch and especially dinner go from zero to max throughput quite fast. We are currently using on-demand, but it is expensive. We could do the autoscaling ourselves, but then it would no longer be a fully managed solution. It was around this time that Scylla got in contact with our DBAs and we started working on a new Scylla solution.
The main problem we saw was the cost
The scaling policy also contains a target utilization—the percentage of consumed provisioned throughput at a point in time. Application Auto Scaling uses a target tracking algorithm to adjust the provisioned throughput of the table (or index) upward or downward in response to actual workloads, so that the actual capacity utilization remains at or near your target utilization.
You can set the auto scaling target utilization values between 20 and 90 percent for your read and write capacity.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/AutoScaling.html
The first implementation using the Scylla service was a direct comparison between the Scylla and DynamoDB solutions, so we implemented the same modeling. Because of this we could use the same code base and only change the DAO.
This CPU load chart was taken from the scylla grafana overview dashboard provided by the scylla team.
I'll talk about Scylla collections
Reads are slower, which is okay for a fallback system. They could be faster if NOT CONTAINS were supported on sets (it is not supported in Cassandra either, as it is not usually a good approach).
Remember what I said about Scylla TTL? It is column based, not document based, so the new acked-devices column would not have the TTL and thus would never be deleted.
This is probably the most important slide
This is probably the most important slide
https://www.scylladb.com/2019/07/23/global-or-localsecondary-indexes-in-scylla-the-choice-is-now-yours/
https://thelastpickle.com/blog/2018/03/21/hinted-handoff-gc-grace-demystified.html