Opera chose Scylla over Cassandra to sync the data of millions of browsers to a back-end data repository. The migration, together with further optimizations in their stack, gave Opera better latency and throughput and lower resource usage, beyond their expectations.
Attend this session to learn how to:
- Migrate your data in a sane way, without any downtime
- Connect a Python+Django web app to Scylla
- Use intranode sharding to improve your application
Lightweight Transactions at Lightning Speed (ScyllaDB)
This talk will outline the Scylla implementation of Lightweight Transactions (LWT) that brings us to parity with Apache Cassandra. We will cover how to use it, what is working, and what is left to be done. We will also cover what other improvements are in store to improve Scylla's transactional capabilities and why it matters.
iFood on Delivering 100 Million Events a Month to Restaurants with Scylla (ScyllaDB)
iFood is the largest Brazilian-based food delivery app company. It connects users, restaurants, and deliverymen through an event-driven architecture built on AWS SQS and SNS, with services written in Java and Node.js. Thales' team is responsible for delivering orders' events to restaurant devices at least once, which is currently done using a REST API polling and acknowledgment system.
Learn how their database infrastructure evolved from a PostgreSQL database that began to show limitations and was a single point of failure, through a few intermediary steps including Amazon DynamoDB, before eventually turning to Scylla for its data model and collections, which let them condense multiple tables. Using Scylla, iFood reduced the time to process events and acknowledgments from ~80 ms to ~3 ms and cut costs by over 9x compared with DynamoDB.
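The at-least-once polling and acknowledgment pattern mentioned above can be illustrated with a minimal in-memory sketch. This is not iFood's implementation: the `EventInbox` class and its `publish`, `poll`, and `ack` methods are hypothetical names, and a plain dictionary stands in for the database. The idea is only that an event keeps being returned by polls until the device acknowledges it.

```python
class EventInbox:
    """In-memory stand-in for a per-restaurant event store (hypothetical)."""

    def __init__(self):
        # event_id -> payload, kept until the device acknowledges the event
        self._pending = {}

    def publish(self, event_id, payload):
        # Store the event; it stays pending until it is acknowledged.
        self._pending[event_id] = payload

    def poll(self):
        # Return every unacknowledged event, so repeated polls
        # may deliver the same event more than once.
        return sorted(self._pending.items())

    def ack(self, event_ids):
        # Drop acknowledged events; unknown ids are ignored,
        # which makes a repeated acknowledgment harmless.
        for event_id in event_ids:
            self._pending.pop(event_id, None)


inbox = EventInbox()
inbox.publish("order-1", "PLACED")
inbox.publish("order-2", "PLACED")

first_poll = inbox.poll()   # the device receives both events
inbox.ack(["order-1"])      # only one acknowledgment arrives
second_poll = inbox.poll()  # the unacknowledged event is delivered again
```

Because an event can arrive more than once, the receiving device has to tolerate duplicates, for example by ignoring event ids it has already processed.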
No matter how resilient your database infrastructure is, backups are still needed to defend against catastrophic failures, be it the unlikely hardware failure of all data centers or the more likely and all-too-human user error. Acknowledging the importance of good backup procedures, Scylla Manager now natively supports backup and restore operations. In this talk, we will learn more about how that works and the guarantees provided, as well as how to set it up to guarantee maximum resiliency for your cluster.
FireEye & Scylla: Intel Threat Analysis Using a Graph Database (ScyllaDB)
FireEye believes in intelligence-driven cyber security. Their legacy system used PostgreSQL with a custom graph database system to store and facilitate analysis of threat intelligence data. As their user base increased, they ran into scaling issues that required a system redesign on a new platform.
This presentation will focus on the backend systems and the migration path to a new technology stack using JanusGraph running on top of Scylla plus Elasticsearch.
Using ScyllaDB turned out to be a game-changer in terms of performance and the types of analysis our application is able to do effortlessly.
Should I use more, smaller instances, or fewer, bigger instances? Is 1Gbps enough for my network cards? Should I use batches? Can I have a collection that is 3GB in size? Those are just some of the many questions we see users asking on a daily basis over our mailing list, Slack, and corporate ticket requests. In this talk, I will explore the answers to these common questions and help you make sure that your deployment is up to the highest standards.
Lookout on Scaling Security to 100 Million Devices (ScyllaDB)
The massive increase of security-related data requires companies to respond with new approaches to ingestion. Learn how Lookout has changed its approach for ingesting telemetry to meet their goal of growing from 1.5 million devices to 100 million devices and beyond, using Kafka Connect and switching from AWS DynamoDB to Scylla.
High-Load Storage of Users’ Actions with ScyllaDB and HDDs (ScyllaDB)
The presentation gives a brief overview of a high-load service that stores users' actions. The service is able to serve up to 240k writes per second in under 2 ms at the 95th percentile with just a few ScyllaDB nodes packed with HDDs. Hardware setup, cluster specification, live load numbers and latencies achieved are given. The problems we encountered with the HDD setup are described along with possible solutions to them.
SAS Institute on Changing All Four Tires While Driving an AdTech Engine at Fu... (ScyllaDB)
SAS Intelligent Advertising changed its ad-serving platform from using DataStax Cassandra clusters to Scylla clusters for its real-time visitor data storage. This presentation describes how this migration was executed with no downtime and with no loss of data, even as data was constantly being created or updated.
Agreement in a distributed system is complicated but required. Scylla gained lightweight transactions through Paxos, but Paxos comes at a cost of 3X round trips. Raft can allow consistent transactions without the performance penalty. Beyond LWT, we plan to integrate Raft with most aspects of Scylla, making a leap forward in manageability and consistency.
Using ScyllaDB with JanusGraph for Cyber Security (ScyllaDB)
Come hear how QOMPLX, a leader in Cyber Security Risk Management solutions, uses ScyllaDB and JanusGraph to detect, manage and assess risks for large corporate and government clients. By leveraging two highly horizontally scalable and fault-tolerant technologies, QOMPLX can flex with their clients' needs.
How Workload Prioritization Reduces Your Datacenter Footprint (ScyllaDB)
Are you running separate database clusters for operational and analytical workloads? Scylla now has the ability to handle multiple workloads from a single cluster--without performance degradation to either. This session will cover:
- The evolving requirements for operational (OLTP) and analytics (OLAP) workloads in the modern datacenter
- How Scylla provides built-in control over workload priority and makes it easy for administrators to configure workload priorities
- The TCO impact of minimizing integrations and maintenance tasks, while also shrinking the datacenter footprint and maximizing utilization
Plus, we’ll share test results showing how it performs in real-world settings.
Scylla allows us to create highly performant and scalable systems. However, to achieve good results and prevent our Scylla cluster from being overloaded, we need to properly write our client application and configure the driver. Join this session to learn some practical tips that can help you make your applications faster and more available.
Introducing Scylla Manager: Cluster Management and Task Automation (ScyllaDB)
By centralizing cluster administration and automating recurring tasks, Scylla Manager brings greater predictability and control to Scylla-based environments.
In this webinar, you will learn about Scylla Manager’s recurrent repair capabilities, including why recurrent repair is critical for Scylla production cluster administration, and why keeping it manual results in errors and suboptimal performance.
We will present a demo of how to set up and run recurrent and ad-hoc repairs on a Scylla cluster, and give you a sneak peek of the Scylla Manager roadmap, which includes cluster management, rolling upgrades, and integrated monitoring.
Scylla’s Journey Towards Being an Elastic Cloud Native Database (ScyllaDB)
Cloud Native Databases are required to scale in response to growing online workloads with minimal disruption, and to complete the scaling operation as fast as possible. In this session we will review the different components that are stressed in scaling scenarios and present the work we have done over the year to improve Scylla’s elasticity as we enhance it to be a true Cloud Native Database.
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records (ScyllaDB)
In this talk, we will discuss Happn's war story about migrating a Cassandra 2.1 cluster containing more than 68 Billion records in a counter table to ScyllaDB Open Source.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
ScyllaDB CTO Avi Kivity looks at the present state of Scylla's capabilities, and offers a glimpse of what's to come. From incremental compaction strategy to take advantage of newer, denser nodes, to data transformations with User Defined Functions (UDFs) and User Defined Aggregates (UDAs), ScyllaDB continues to expand its horizons for capabilities, use cases and APIs.
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times Faster (ScyllaDB)
Fetching a large amount of data in a single query is a longstanding pain for applications. Queries that return a significant amount of data have to be paged, in other words, split into multiple subqueries that return data little by little. In both Scylla and Apache Cassandra, paging is stateless: each subquery is independent of the others and can even be sent to a different replica. Because of that, the work done in previous subqueries is not reused, which reduces throughput below the expected maximum. In this talk we are going to examine the problems with the previous stateless paging implementation and introduce the new stateful paging implementation that brings vast improvements in the throughput of large partition scans.
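The difference between stateless and stateful paging described above can be shown with a toy model. This is a sketch under stated assumptions, not Scylla's or Cassandra's implementation: the `Partition` class, its `open_reader` method, and the `readers_opened` counter are hypothetical, and "opening a reader" stands in for whatever per-page setup work a replica would otherwise have to repeat.

```python
import bisect
from itertools import islice


class Partition:
    """Toy partition that counts how often a reader is set up (hypothetical)."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.readers_opened = 0

    def open_reader(self, after_key=None):
        # Opening a reader models the setup work that is repeated per page.
        self.readers_opened += 1
        start = 0 if after_key is None else bisect.bisect_right(self.keys, after_key)
        return iter(self.keys[start:])


def stateless_scan(partition, page_size):
    # Each page is an independent subquery: it opens a fresh reader
    # and seeks past the last key returned by the previous page.
    rows, last_key = [], None
    while True:
        reader = partition.open_reader(after_key=last_key)
        page = list(islice(reader, page_size))
        rows.extend(page)
        if len(page) < page_size:
            return rows
        last_key = page[-1]


def stateful_scan(partition, page_size):
    # One reader is kept alive between pages, so its position is reused.
    rows = []
    reader = partition.open_reader()
    while True:
        page = list(islice(reader, page_size))
        rows.extend(page)
        if len(page) < page_size:
            return rows


stateless_partition = Partition(range(10))
stateful_partition = Partition(range(10))
stateless_rows = stateless_scan(stateless_partition, page_size=3)
stateful_rows = stateful_scan(stateful_partition, page_size=3)
```

Both scans return the same rows, but in this model the stateless scan opens one reader per page while the stateful scan opens a single reader for the whole partition.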
These slides are from the recent meetup @ Uber - Apache Cassandra at Uber and Netflix on new features in 4.0.
Abstract:
A glimpse of Cassandra 4.0 features:
There are a lot of exciting features coming in 4.0, but this talk covers some of the features that we at Netflix are particularly excited about and looking forward to. In this talk, we present an overview of just some of the many improvements shipping soon in 4.0.
Scylla: 1 Million CQL operations per second per server (Avi Kivity)
My Cassandra Summit 2015 presentation introducing Scylla, an open source NoSQL implementation compatible with Apache Cassandra, but 10 times faster.
De-animated
http://scylladb.com
Event Streaming Architectures with Confluent and ScyllaDB (ScyllaDB)
Jeff Bean will lead a discussion of event-driven architectures, Apache Kafka, Kafka Connect, KSQL and Confluent Cloud. Then we'll talk about some uses of Confluent and Scylla together, including a co-deployment with Lookout, ScyllaDB and Confluent in the IoT space, and the upcoming native connector.
Scylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra (ScyllaDB)
ScyllaDB CEO Dor Laor lays out the ten million dollar engineering problem for distributed systems, and how only Scylla is architected to address the issue at the heart of Big Data ROI. He then introduces ScyllaDB's Glauber Costa and Packet's James Malachowski to reveal a new level of performance for a persistent NoSQL datastore. Dor concludes his talk with a bold proposition about how Scylla is uniquely positioned to help companies easily create and scale the software they need to achieve their vision.
Seastar is a modern, open source server application framework written in C++ that presents a future/promise based API to the user while delivering top-of-the-line performance -- more than five times the nearest competitor, with 7 million requests per second served on a single machine.
Running a DynamoDB-compatible Database on Managed Kubernetes Services (ScyllaDB)
With the release of Alternator, Scylla’s DynamoDB-compatible API, you can now take your locked-in DynamoDB workloads and run them anywhere. Scylla provides a cost-effective open source alternative to Amazon’s DynamoDB, deployable wherever a user would want: on-premises, on other public clouds like Microsoft Azure or Google Cloud Platform, still on AWS (such as the high-density i3en instances) or as a fully managed DBaaS.
In this session, we will cover:
- Scylla Alternator: Scylla’s Amazon DynamoDB-compatible API
- Scylla Operator: Running Scylla Alternator on Kubernetes
- Demo Alternator - Demo and explain DynamoDB on GKE
Intro to Apache Kafka I gave at the Big Data Meetup in Geneva in June 2016. Covers the basics and gets into some more advanced topics. Includes demo and source code to write clients and unit tests in Java (GitHub repo on the last slides).
NetflixOSS Meetup S3 E1, covering latest components in Distributed Databases, Telemetry systems, Big Data tools and more. Speakers from Netflix, IBM Watson, Pivotal and Nike Digital
There are many common workloads in R that are "embarrassingly parallel": group-by analyses, simulations, and cross-validation of models are just a few examples. In this talk I'll describe several techniques available in R to speed up workloads like these, by running multiple iterations simultaneously, in parallel.
Many of these techniques require the use of a cluster of machines running R, and I'll provide examples of using cloud-based services to provision clusters for parallel computations. In particular, I will describe how you can use the SparklyR package to distribute data manipulations using the dplyr syntax, on a cluster of servers provisioned in the Azure cloud.
Presented by David Smith at Data Day Texas in Austin, January 27 2018.
We hear a lot about lambda architectures and how Cassandra and Spark can help us crunch our data both in batch and real-time. After a year in the trenches, I'll share how we at The Weather Company built a general purpose, weather-scale event processing pipeline to make sense of billions of events each day. If you want to avoid much of the pain learning how to get it right, this talk is for you.
How to achieve no compromise performance and availability (ScyllaDB)
ScyllaDB co-founders Dor Laor and Avi Kivity discuss why they started ScyllaDB, the design decisions they made to achieve no-compromise performance and availability, and give a demo on how to get up and running on Docker.
The deck describes ScyllaDB's flagship product: a drop-in replacement for Apache Cassandra at 10X the speed. ScyllaDB's innovative design relies on a shard-per-core architecture, its own caching, and C++ to deliver blazing and consistent performance. Check the deck to see how this was achieved.
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All (ScyllaDB)
The idea for implementing a brand new Rust driver for ScyllaDB emerged from an internal hackathon in 2020. The initial goal was to provide a native implementation of a CQL driver, fully compatible with Apache Cassandra™, but also contain a variety of Scylla-specific optimizations. The development was later continued as a Warsaw University project led by ScyllaDB.
Now it's an officially supported driver with excellent performance and a wide range of features. This session shares the design decisions taken in implementing the driver and its roadmap. It also presents a forward-thinking plan to unify other Scylla-specific drivers by translating them to bindings to our Rust driver, using work on our C++ driver as an example.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Migrating Data Pipeline from MongoDB to Cassandra (Demi Ben-Ari)
MongoDB is a great NoSQL database: it’s very flexible and easy to use. But would it handle massive read/write throughput? And what happens when you need to scale everything out, and do it easily?
We will lay out the reasons and the steps of migrating our data pipeline to Apache Cassandra in a short period without having any prior knowledge.
We’ll list our lessons learned as well.
Bio:
Demi Ben-Ari, Sr. Data Engineer @Windward,
I have over 9 years of experience building various systems, in both near-real-time applications and Big Data distributed systems.
Co-Organizer of the “Big Things” Big Data community: http://somebigthings.com/big-things-intro/
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka (DataWorks Summit)
At NMC (Nielsen Marketing Cloud) we provide our customers (marketers and publishers) real-time analytics tools to profile their target audiences.
To achieve that, we need to ingest billions of events per day into our big data stores, and we need to do it in a scalable yet cost-efficient manner.
In this session, we will discuss how we continuously transform our data infrastructure to support these goals.
Specifically, we will review how we went from CSV files and standalone Java applications all the way to multiple Kafka and Spark clusters, performing a mixture of Streaming and Batch ETLs, and supporting 10x data growth.
We will share our experience as early-adopters of Spark Streaming and Spark Structured Streaming, and how we overcame technical barriers (and there were plenty...).
We will present a rather unique solution of using Kafka to imitate streaming over our Data Lake, while significantly reducing our cloud services' costs.
Topics include:
* Kafka and Spark Streaming for stateless and stateful use-cases
* Spark Structured Streaming as a possible alternative
* Combining Spark Streaming with batch ETLs
* "Streaming" over Data Lake using Kafka
How Teads scale with Apache Cassandra.
Internet scale means tons of data, read heavy workload, massive data ingestion and low latency.
The French AdTech company Teads makes heavy use of Cassandra, a reliable and performant open source database.
Spawning Cassandra nodes in AWS is a piece of cake with Terraform and Chef.
Scio - Moving to Google Cloud, A Spotify Story (Neville Li)
Talk at Philly ETE Apr 28 2017
We will talk about Spotify’s story of migrating our big data infrastructure to Google Cloud. Over the past year or so we moved away from maintaining our own 2500+ node Hadoop cluster to managed services in the cloud. We replaced two key components in our data processing stack, Hive and Scalding, with BigQuery and Scio and are able to iterate at a much faster speed. We will focus on the technical aspects of Scio, a Scala API for Apache Beam and Google Cloud Dataflow, and how it changed the way we process data.
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu... (ScyllaDB)
MongoDB has become the prominent NoSQL database engine and is now used for a wide variety of use cases because of its flexibility and ease of use for developers. Scylla, a C++ rewrite of Cassandra, provides benefits through its architectural approach, including getting rid of the JVM and a CPU-level design that gets the most out of your hardware.
Numberly has been using MongoDB for over a decade and Scylla for over a year in production. The benefits of the Scylla architecture, allied to the Cassandra ecosystem, fuel rapid adoption in a very wide range of use cases: from real-time data pipelines and analytics batch processing to web application database backends.
Learn the motivations of such an adoption trend and why it proves to be successful so far while outlining its limits and why MongoDB is still here to stay!
Similar to How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night (20)
Optimizing NoSQL Performance Through Observability (ScyllaDB)
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. But before you squeeze, make sure you know what to monitor!
Watch our experienced Postgres developer work through monitoring and performance strategies that help him understand what mistakes he’s made moving to NoSQL. And learn with him as our database performance expert offers friendly guidance on how to use monitoring and performance tuning to get his sample Rust application on the right track.
This webinar focuses on using monitoring and performance tuning to discover and correct mistakes that commonly occur when developers move from SQL to NoSQL. For example:
- Common issues getting up and running with the monitoring stack
- Using the CQL optimizations dashboard
- Common issues causing high latency in a node
- Common issues causing replica imbalance
- What a healthy system looks like in terms of memory
- Key metrics to keep an eye on
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
Event-Driven Architecture Masterclass: Challenges in Stream Processing (ScyllaDB)
Discuss the core tradeoffs and considerations involved in order-free and ordered stream processing. Brian Taylor walks through the pros and cons of three different approaches: no data dependency, deferred inter-event data dependency, and streaming inter-event data dependency.
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac... (ScyllaDB)
We start by setting up a common ground introducing why relational databases fall short, addressing common EDA characteristics such as the need for real-time response times and schemaless approaches to address recurring changes to adapt and on-board new use cases. Next, interact with a sample Rust-based application: a social network app demonstrating an integration of both ScyllaDB and Redpanda.
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance... (ScyllaDB)
Discover how to avoid common pitfalls when shifting to an event-driven architecture (EDA) in order to boost system recovery and scalability. We cover Kafka Schema Registry, in-broker transformations, event sourcing, and more.
Developer Data Modeling Mistakes: From Postgres to NoSQL (ScyllaDB)
See where an RDBMS-pro’s intuition leads him astray – and learn practical tips for the data modeling transition
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning.
Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts.
This webinar focuses on common data modeling and querying mistakes that occur when developers move from SQL to NoSQL. For example:
- Understanding query first design principles
- Planning for schema evolution
- Steering clear of common pitfalls and anti-patterns
- Assessing data access patterns
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
What Developers Need to Unlearn for High Performance NoSQL (ScyllaDB)
See where an RDBMS-pro’s intuition leads him astray – and learn practical tips for the transition
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning.
Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts.
Our first webinar of this series will cover common mistakes with practices such as:
- Translating the data model to NoSQL
- Optimizing table design
- Optimizing query performance
- Planning for partitioning
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
Low Latency at Extreme Scale: Proven Practices & Pitfalls (ScyllaDB)
Expert tips on how to maximize your database performance at scale
Untangle the complexity of achieving database performance at scale. Join this webinar to discover commonly overlooked ways to get predictable low latency, even at extreme scale. Our Solution Architects will walk you through the strategies and pitfalls learned by working on thousands of real-world distributed database projects, many reaching 1M OPS with single-digit millisecond latencies.
In addition to offering clear recommendations, we’ll also explain the process behind how we arrived at them – so you can benefit from the lessons learned by other teams.
We’ll cover how to:
- Design and deploy a large-scale distributed database cluster
- Optimize your clients’ interactions with it
- Expand the cluster horizontally and globally
- Ensure it survives whatever disasters the world throws at it
Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments.
Join us for an interactive session where we dissect four specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma, we'll:
- Examine the context and technical requirements
- Talk about potential solutions and cover the pros and cons of each
- Disclose what approach the team took, and how it worked out
About the speaker:
Felipe is an IT specialist with years of experience in distributed systems and open-source technologies. He is one of the co-authors of "Database Performance at Scale", an Open Access, freely available publication for individuals interested in improving database performance. At ScyllaDB, he works as a Solution Architect.
Beyond Linear Scaling: A New Path for Performance with ScyllaDB (ScyllaDB)
Linear scaling (sometimes near linear scaling) is often mentioned in several benchmarks, articles and product comparisons as proof that a given technology and algorithmic optimizations perform better than another. But is that really what performance is all about, and should you even care?
This webinar discusses performance beyond linear scalability, including what typically matters more when running high throughput and low latency workloads at scale. We'll cover how ScyllaDB offers unparalleled performance and share our insights on:
- The hidden aspects of linear scaling
- When linear scaling matters most and when it’s simply irrelevant
- Often overlooked considerations for optimizing and measuring distributed systems performance
Watch now to learn from our experience (and lessons learned) in building the fastest NoSQL database in the world.
Navigating Complex Database Performance Hurdles
Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments.
Join us for an interactive session where we dissect 4 specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma:
- The presenters will describe the context and technical requirements
- Together, we’ll talk about potential solutions and cover the pros and cons of each
- Finally, we’ll disclose what approach the team took, and how it worked out
Throughout the event, we’ll have opportunities to win ScyllaDB swag and prizes! Come prepared to engage in lively discussions and gain valuable insight into database performance strategies.
Database Performance at Scale Masterclass: Workload Characteristics by Felipe... (ScyllaDB)
Felipe Cardeneti Mendes, Solutions Architect at ScyllaDB
Navigating workload-specific performance challenges and tradeoffs.
Felipe Mendes covers how to navigate the top performance challenges and tradeoffs that you’re likely to face with your project’s specific workload characteristics and technical/business requirements.
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya... (ScyllaDB)
Pavel Emelyanov, Principal Engineer at ScyllaDB
Botond Denes, C++ Developer at ScyllaDB
What performance-minded engineers need to know.
Hear from Pavel Emelyanov and Botond Dénes on the impact of database internals – specifically, what to look for if you need latency and/or throughput improvements.
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna (ScyllaDB)
Piotr Sarna, Software Engineer at Turso
Understanding and tapping your driver’s performance potential.
Piotr Sarna discusses how to get the most out of a driver, particularly from the performance perspective, and select a driver that’s a good fit for your needs.
Technical risks of putting a cache in front of your database, and what to do instead
Teams experiencing subpar latency commonly turn to an external cache to meet the required SLAs. Placing a cache in front of your database might seem like a fast and easy fix, but it often ends up introducing unanticipated complexity, costs, and risks. External caches can be one of the more problematic components of distributed application architecture.
Join this webinar for a technical discussion of the risks associated with using an external cache and a look at how ScyllaDB’s cache implementation simplifies your architecture without compromising latency. We’ll cover:
- Different approaches to caching (pre-caching vs. caching, side cache vs. transparent cache)
- 7 specific reasons why external caching is a bad choice
- Why Linux’s default caching doesn’t work well for databases
- The advantages & architecture of ScyllaDB's specialized row-based cache
- Real-world examples of why and how teams eliminated their external cache with ScyllaDB
Powering Real-Time Apps with ScyllaDB: Low Latency & Linear Scalability (ScyllaDB)
Discover how your team can achieve low latency at the extreme scale that your data-intensive applications require. We’ll walk you through an example of how ScyllaDB scales linearly to achieve 1M and then 2M OPS – with <1ms P99 latency. We’ll cover how this works on a sample realtime app (an ML feature store), share best practices for performance, and talk about the most important tradeoffs you’ll need to negotiate.
Join us to learn:
- Why and how to ensure your database takes full advantage of your cloud infrastructure
- What architectural considerations matter most for high throughput and low latency
- Key factors to consider when selecting a high-performance database
7 Reasons Not to Put an External Cache in Front of Your Database (ScyllaDB)
Teams experiencing subpar latency commonly turn to an external cache to meet the required SLAs. Placing a cache in front of your database might seem like a fast and easy fix, but it often ends up introducing unanticipated complexity, costs, and risks. Caches can be one of the more problematic components of distributed application architecture.
Join this webinar for a technical discussion of the risks associated with using an external cache and a look at an alternative strategy that simplifies your architecture without compromising latency. We’ll cover:
- Different approaches to caching (pre-caching vs. caching, side cache vs. transparent cache)
- 7 specific reasons why external caching can be a bad choice
- Why Linux’s default caching doesn’t work well for databases
- The advantages & architecture of specialized row-based caches
- Real-world examples of why and how teams eliminated their external cache
Expert tips on how to maximize your database potential
If you’re considering or getting started with ScyllaDB, you’re probably intrigued by its potential to achieve high throughput and predictable low latency at a reasonable cost. So how do you ensure that you’re maximizing that potential for your team’s specific workloads and use case?
This webinar offers practical advice for navigating the various decision points you’ll face as you assess whether ScyllaDB is a good fit for your team and later roll it out into production. We’ll cover the most critical considerations, tradeoffs, and recommendations related to:
- Infrastructure selection
- ScyllaDB configuration
- Client-side setup
- Data modeling
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration (ScyllaDB)
In this talk, Felipe Mendes, Solutions Architect at ScyllaDB, shares how 4 companies managed their migration. He covers:
Disney+ – No migration needed!
Discord – Shadow cluster
OpenWeb – TTL expiration, cover Load and Stream
MyHeritage – Counters
ShareChat – Bonus: A bit of everything
In this talk, Lubos discusses tools and methods for a successful migration. He covers:
Methods
Data (re)modeling
APIs
Spark Migrator
DS bulk
Tuning
Testing/monitoring
NoSQL Data Migration Masterclass - Session 1: Migration Strategies and Challenges (ScyllaDB)
In this talk, Jon discusses practical strategies and issues to consider. He covers:
Reasons for Migrations
DB Functionality
Cost/Licensing
Outdated Technology
Scaling Problems
Technology Evolution
SQL to NoSQL
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
1. How to Sync
Tens of Millions of Browsers
and Sleep Well at Night
Rafał Furmański & Piotr Olchawa
2. Presenters
Rafał Furmański, Engineering Manager
Project Manager, Software Engineer, Big Data enthusiast and certified Cassandra developer.
Rafał has 10+ years of experience in programming.
After work: addicted volleyball player.
Piotr Olchawa, Software Engineer
Piotr is a Software Engineer working at Opera between backend and SysOps.
He has over 4 years of experience in programming. He is a big fan of everything that’s extreme:
rock climbing, hackathons, public speaking.
3. Outline
■ About Opera and Sync
■ Problems with Cassandra and first encounter with Scylla
■ Migration process and results
■ Automated repairs with scylla-cli
■ Scylla proxy & shard awareness
6. About Opera
■ Founded in 1995 in Norway
■ HQ in Oslo
■ Branches in Poland, Sweden and China
■ Listed on NASDAQ
■ We make browsers & apps
● Desktop:
■ Opera
■ Opera GX
● Mobile:
■ Opera Mini
■ Opera for Android
■ Opera Touch
■ Opera News
7. About Opera
■ Opera has pioneered many concepts
found in the major browsers today
■ We continue to introduce unique features
in our products
8. Opera syncs
■ Favorite sites on the Speed Dial
■ Bookmarks
■ Open tabs from all devices
■ Browsing history
■ Passwords
■ Browser preferences
About Opera Sync
10. Opera Sync - infrastructure/software
■ Deployed on bare metal boxes in 2 datacenters:
● Backend - 2x10
● Database - 2x13
■ On each backend host:
● Debian Stretch
● Docker containers:
■ uWSGI (Python/Django App)
■ Nginx
■ Celery workers
■ RabbitMQ
■ statsd
■ Configuration/Deployment: Ansible & Docker Swarm
■ Monitoring: Graphite/Grafana + Nagios + PagerDuty
12. Opera Sync - example model and queries
class Bookmark(Model):
    user_id = columns.Text(partition_key=True)
    version = columns.BigInt(primary_key=True, clustering_order='ASC')
    id = columns.Text(primary_key=True)
    parent_id = columns.Text()
    position = columns.Bytes()
    name = columns.Text()
    ctime = columns.DateTime()
    mtime = columns.DateTime()
    deleted = columns.Boolean(default=False)
    folder = columns.Boolean(default=False)
    specifics = columns.Bytes()
Query 1: Get all bookmarks of user ‘Adam’ from version=5 # version == precise timestamp
Query 2: Change/remove bookmark of user ‘Adam’ with version=5 and id=’6’
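Both queries follow the primary key of the model above: user_id is the partition key, and version and id are clustering columns. A minimal sketch of the CQL they could translate to, assuming the table is named bookmark and reading "from version=5" as version >= 5. The helper names are hypothetical; they only build statement text with the %s placeholders used by the Python driver and do not talk to a cluster:

```python
def get_updates_query(user_id, from_version):
    """Query 1: all bookmarks of one user, starting at a given version.

    A range restriction on the first clustering column (version) is served
    from a single partition, so no ALLOW FILTERING is needed.
    """
    cql = "SELECT * FROM bookmark WHERE user_id = %s AND version >= %s"
    return cql, (user_id, from_version)


def rename_bookmark_query(user_id, version, bookmark_id, new_name):
    """Query 2 (change): the full primary key identifies exactly one row."""
    cql = ("UPDATE bookmark SET name = %s "
           "WHERE user_id = %s AND version = %s AND id = %s")
    return cql, (new_name, user_id, version, bookmark_id)


def remove_bookmark_query(user_id, version, bookmark_id):
    """Query 2 (remove): delete the row addressed by the full primary key."""
    cql = "DELETE FROM bookmark WHERE user_id = %s AND version = %s AND id = %s"
    return cql, (user_id, version, bookmark_id)
```

Each helper returns a (statement, parameters) pair that could be passed to a driver session's execute call.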
14. Problems with Cassandra
■ We started with Cassandra=2.1 and immediately got hit by:
● [CASSANDRA-9935] Repair fails with RuntimeException
● [CASSANDRA-10689] java.lang.OutOfMemoryError: Direct buffer memory
● [CASSANDRA-10697] Leak detected while running offline scrub
● [CASSANDRA-8558] Deleted row still can be selected out
● [CASSANDRA-8446] Lost writes when using lightweight transactions
● [CASSANDRA-8280] Crash on inserting data over 64K into indexed strings
● [CASSANDRA-8067] NullPointerException in KeyCacheSerializer
● [CASSANDRA-9681] Memtable heap size grows and GC pauses are triggered
15. Problems with Cassandra
■ Bugs, bugs, bugs…
■ Very high p95/p99 read/write latencies
■ Long GC pauses(!!!)
■ Insane CPU usage
■ Failing Gossip/Binary protocols
■ Restarts without specific reason
■ Problems with bootstrapping new nodes
■ Neverending repairs
16. Our “solutions”
■ Add more and more C* nodes!
■ Tune every piece of C*/Java config
■ Seek help from C* gurus
■ [SYNC-1146] Cron job to restart C* periodically (sic!)
17. Our journey with Scylla
■ September 2015: first encounter at Cassandra Summit
■ July 2018: first Scylla cluster & benchmarks
■ August 2018: decision to migrate
■ 13 May 2019: decommissioning of the last Cassandra node
18. Initial benchmarks
■ setup: 3 bare metal nodes in the cluster
■ tool: cassandra-stress
■ keyspace: sync, table: bookmark, time: 10 minutes
■ mixed workload: 50% GetUpdates / 50% Commit
20. Migration process
1. Make django-cassandra-engine connect to more than 1 database
2. Prepare 2x3 node Scylla Cluster (with monitoring)
3. Update backend to be connection-aware
Bookmark.objects.using(connection='scylla').filter(...)
4. Move a few test users to Scylla (me and coworkers)
5. Make all new users use Scylla
6. Slowly migrate all existing users from Cassandra to Scylla
   a. decommission nodes from Cassandra cluster
   b. add decommissioned nodes to Scylla cluster
7. Disconnect Cassandra and make Scylla the default database engine
8. Cleanup
22. Determining user’s connection
from cassandra.cqlengine.query import ContextQuery

def get_user_store(user_id):
    connection = UserStore.maybe_get_user_connection(user_id)  # from cache
    if connection is not None:  # We know exactly which connection to use
        with ContextQuery(UserStore, connection=connection) as US:
            return US.objects.get(user_id=user_id)
    else:  # We have no clue which connection is correct for this user
        try:
            with ContextQuery(UserStore, connection='cassandra') as US:
                user_store = US.objects.get(user_id=user_id)
        except UserStore.DoesNotExist:
            with ContextQuery(UserStore, connection='scylla') as US:
                user_store = US.objects.get(user_id=user_id)
        user_store.cache_user_connection()
        return user_store
23. Migration script
Requirements:
■ Ability to move user data from Cassandra to Scylla (and back)
■ Consistency check after migrating
■ Concurrent execution is a must
■ Measure everything:
● Number of migrated users
● Migration time (with distribution)
● Errors with reasons
● Failed migrations
24. Migration script
Algorithm:
1. Pick free user from Cassandra DB (check if not already being migrated) and mark as picked for migration
2. Set user_store.migration_pending = True (with TTL!)
3. Copy all the data to Scylla DB
4. Perform consistency check
5. Remove leftovers from Cassandra (and clear the connection cache)
6. Set user_store.migration_pending = False
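The six steps can be sketched end to end with plain dictionaries standing in for the two databases. This is a minimal illustration under stated assumptions, not the actual migration script: migrate_user, the migration_pending_until field (an expiry timestamp that imitates a flag written with a TTL) and the dict-based stores are all hypothetical.

```python
import time


class MigrationError(Exception):
    """Raised when the copied data does not match the source."""


def migrate_user(user_id, cassandra, scylla, user_stores, connection_cache,
                 pending_ttl=3600, now=time.time):
    """Move one user's rows from `cassandra` to `scylla` (both plain dicts)."""
    store = user_stores[user_id]

    # Step 1: skip users that another worker has already picked.
    if store.get("migration_pending_until", 0) > now():
        return False

    # Step 2: mark as pending. The expiry imitates a TTL, so a crashed
    # worker cannot leave the user locked forever.
    store["migration_pending_until"] = now() + pending_ttl

    # Step 3: copy all the data.
    rows = cassandra.get(user_id, [])
    scylla[user_id] = list(rows)

    # Step 4: consistency check (a real check would re-read both databases).
    if sorted(scylla[user_id]) != sorted(rows):
        del scylla[user_id]
        raise MigrationError(f"consistency check failed for {user_id}")

    # Step 5: remove leftovers and clear the cached connection.
    cassandra.pop(user_id, None)
    connection_cache.pop(user_id, None)

    # Step 6: clear the pending flag.
    store["migration_pending_until"] = 0
    return True
```

Because the pending marker expires on its own, several copies of the script can run concurrently without a separate lock service, at the cost of choosing a TTL longer than the slowest expected migration.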
25. Challenges during migration
■ Timeouts and Unavailables in Cassandra
■ Migrating huge accounts takes some time
■ User is cut off from Sync during the migration period
■ Synchronization of concurrent processes
26. Migration results
■ Reduced number of nodes: from 32 (a year ago) to 26 (now) to 8 (next)
■ Faster node bootstrap (hours instead of days)
■ Huge drops in latency
■ No more sleepless nights!
28. scylla-cli overview
■ Console script for
● Checking status of the cluster
● Performing range repairs
■ Connects to Scylla API on each host via SSH tunnel (or directly)
■ Written in Python
■ Available on PyPI:
$ pip install scylla-cli
29. Why repair with scylla-cli?
■ It works with Scylla Open Source
■ Performs repairs only on the primary range of a Scylla node (in discrete steps, node by node)
■ Performs advanced repair techniques (subrange repair)
■ Scheduled repairs - what and when (specific node, table)
■ Built-in retry mechanism
■ Real time repair progress and ETA
■ Works better on a busy cluster than regular nodetool repair
Example repair usage:
$ scli repair sync session --dc=Amsterdam
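Subrange repair means splitting a node's token range into smaller pieces and repairing them one at a time, so a failed step can be retried cheaply. The sketch below only illustrates the splitting. It is not scylla-cli's implementation, and `split_range` is a hypothetical helper. The token bounds are those of the Murmur3 partitioner.

```python
MIN_TOKEN = -(2 ** 63)     # lowest Murmur3 token
MAX_TOKEN = 2 ** 63 - 1    # highest Murmur3 token


def split_range(start, end, steps):
    """Split the token range (start, end] into up to `steps` contiguous subranges."""
    if steps < 1 or start >= end:
        raise ValueError("need steps >= 1 and start < end")
    width = end - start
    # Integer arithmetic keeps the bounds exact even for 64-bit tokens.
    bounds = [start + width * i // steps for i in range(steps + 1)]
    # Drop empty pieces, which appear only when steps exceeds the range width.
    return [(lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:]) if lo < hi]


# Example: repair a range in four discrete steps.
for lo, hi in split_range(0, 100, 4):
    print(f"repair subrange ({lo}, {hi}]")
```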
31. Scylla per-node CPU shard awareness
■ “Gains by Using Scylla-Specific Drivers” (Scylla Summit 2018, Piotr Jastrzębski): over 2x latency decrease
■ Cassandra native protocol extension
■ Achieved by per-node CPU connections
32. Shard-awareness for Sync
Rationale
■ 48 shards per Scylla server - potential performance improvement
Obstacles
■ No Python driver support
■ 300 uwsgi + celery workers per host
● 13 DBs * 48 shards * 300 workers = ~187000 connections
● Port range up to 65535
The solution
■ Proxy Scylla client/server (gocqlproxy)
● 1 connection / worker, and 1 connection / host-shard
● ~300 workers + ~600 shards = ~900 connections
■ Simplified protocol with just one message type
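The connection counts on this slide follow from simple arithmetic, reproduced here with the slide's own figures (300 workers, 13 DB hosts, 48 shards each). The function names are made up for the example.

```python
def direct_connections(workers, db_hosts, shards_per_host):
    """Every worker opens one connection to every shard of every DB host."""
    return workers * db_hosts * shards_per_host


def proxied_connections(workers, db_hosts, shards_per_host):
    """Workers connect once to the proxy; the proxy connects once to each shard."""
    return workers + db_hosts * shards_per_host


print(direct_connections(300, 13, 48))   # 187200, far above the 65535 port limit
print(proxied_connections(300, 13, 48))  # 924, i.e. roughly 900
```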
34. gocqlproxy implementation - driver and proxy
cassandra/proxy_session.py:
class ProxyConnection(DefaultConnection):
    # (...)
    def send_msg(self, msg, *args, **kwargs):
        # (...)
        proxied_msg = ProxiedMessage(msg, routing_key)
        return super().send_msg(proxied_msg, ...)
cassandra/protocol.py:
class ProxiedMessage(_MessageType):
    opcode = 0xF0
    # (...)
    def send_body(self, f, protocol_version):
        message_bytes = encode_message(self.message)
        write_longstring(f, message_bytes)
        write_longstring(f, self.routing_key)
proxy.go:
frameWriter := &writeProxiedFrame{
    head:      nestedHead,
    frameData: nestedFramer.rbuf,
}
// find the appropriate host/shard to forward the frame to:
partitionKey := query.clientFramer.readBytes()
serverConn, err := query.session.pickHost(partitionKey)
// send the frame to the chosen server/shard:
serverConn.exec(context.TODO(), frameWriter, nil)
// return the response to client (use client’s stream id)
clientFramer.writeHeader(response, outerHeader.stream)
clientFramer.wbuf = append(clientFramer.wbuf, frameData...)
clientFramer.finishWrite()
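The body of the wrapping message is just two length-prefixed byte strings: the original message, then the routing key. Below is a standalone sketch of that layout. It assumes the CQL native protocol's [long string] encoding (a 4-byte big-endian length followed by the bytes). The helper names `encode_proxied_body` and `decode_proxied_body` are hypothetical and appear in neither the driver nor gocqlproxy.

```python
import io
import struct


def write_longstring(f, data):
    """Write a 4-byte big-endian length followed by the raw bytes."""
    f.write(struct.pack(">i", len(data)))
    f.write(data)


def read_longstring(f):
    """Read back one length-prefixed byte string."""
    (length,) = struct.unpack(">i", f.read(4))
    return f.read(length)


def encode_proxied_body(message_bytes, routing_key):
    """Driver side: wrap the already encoded message together with its routing key."""
    buf = io.BytesIO()
    write_longstring(buf, message_bytes)
    write_longstring(buf, routing_key)
    return buf.getvalue()


def decode_proxied_body(body):
    """Proxy side: recover the nested message and the key used to pick a shard."""
    f = io.BytesIO(body)
    message_bytes = read_longstring(f)
    routing_key = read_longstring(f)
    return message_bytes, routing_key
```

Because the driver supplies the routing key explicitly, the proxy can route the frame without parsing the nested query.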
35. Shard-aware Sync and gocqlproxy – results
■ Production:
● A working prototype of gocqlproxy has been enabled and running stably for a few days
● We can use shard-awareness with ~900 connections instead of ~187000
■ Local synthetic benchmarks - measured latency decreases
(cluster-wide approximate latencies, averaged over 75-second test runs;
shown as ‘non-shard-aware→shard-aware’, decrease = 100% * (before - after) / before)
      read [μs]            write [μs]
avg   580→480 (~17%)       570→470 (~18%)
p95   1000→980 (~2%)       1000→980 (~2%)
p99   2000→1000 (~50%)     1900→1000 (~47%)
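The percentages in the table can be recomputed from the before/after values with the formula given on the slide. The function name is made up for the example.

```python
def latency_decrease_percent(before, after):
    """Relative latency decrease: 100% * (before - after) / before."""
    return 100 * (before - after) / before


# Values taken from the table above (microseconds).
for label, before, after in [
    ("read avg", 580, 480),
    ("write avg", 570, 470),
    ("read p99", 2000, 1000),
    ("write p99", 1900, 1000),
]:
    print(f"{label}: {latency_decrease_percent(before, after):.1f}%")
```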
37. Take away
■ Download Opera Browser
■ Django-cassandra-engine
■ Scylla-cli
■ Scylla-proxy:
● gocqlproxy
● Python driver
38. Thank you. Stay in touch
Any questions?
Rafał Furmański
rfurmanski@opera.com
r4fek
Piotr Olchawa
polchawa@opera.com
BugsKillPeople
Editor's Notes
350M -> monthly active users
shard-awareness for python-driver?
(+) no proxy required
(-) non-obvious logic to implement (shard num calculation)
(-) likely too many connections anyway
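For reference, the "shard num calculation" mentioned above looks roughly as follows. This sketch reflects our reading of Scylla's documented shard-aware protocol extension and is an assumption, not code from Sync or gocqlproxy. In it, `token` is the signed 64-bit Murmur3 token of the partition key, and `nr_shards` and `ignore_msb_bits` are values the server advertises to the client.

```python
def shard_of(token, nr_shards, ignore_msb_bits):
    """Map a signed 64-bit token to a shard index in [0, nr_shards)."""
    # Shift the signed token into the unsigned range [0, 2**64).
    biased = token + 2 ** 63
    # Drop the most significant bits the server asked clients to ignore.
    biased = (biased << ignore_msb_bits) % 2 ** 64
    # Scale the 64-bit value down to a shard index.
    return (biased * nr_shards) >> 64
```

Computing the token itself (Murmur3 over the serialized partition key) is the part that makes a pure-driver implementation less obvious than it looks.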
gocqlproxy
(+) simple - most responsibilities in the proxy - already implemented in gocql
(+) far fewer connections, similarly to twemproxy
(-) non-standard protocol changes, both in the driver and in the proxy
Without proxy:
every worker has one connection per shard
(about 300 workers x 600 shards = almost 200k)
With proxy:
every worker has just one connection to the single-process proxy (about 300 connections)
the proxy has one connection per shard (about 600 connections)
300 + 600 = 900
M workers, N shards -> M+N instead of M*N
Messages wrapped as (message, routing_key)
Idea #1: routing_key extracted in the proxy “transparently”
parsing logic to implement
Idea #2: routing_key duplicated by the driver
proxy can use it directly to find the right shard
simple to implement: just (1) add message type, (2) pass the message through to the server in the proxy
remember that stream id needs to come from the “outer”/“wrapping” message
Promising results in multiple-container, single-machine clusters
Up to 50% latency decreases (p99), when alternating between shard-aware/non-aware tests
Little performance improvement in production (except decreased connection counts)
Perhaps IPC overhead is negligible in Sync for some reason?