Ticketmaster is part of Live Nation Entertainment, the world's leading live entertainment company. Learn why they went with Scylla after conducting performance testing between Scylla, Apache Cassandra and DataStax Enterprise.
Powering a Graph Data System with Scylla + JanusGraph (ScyllaDB)
Key-value and column stores are not the only data models Scylla supports. In this presentation, learn the what, why and how of building and deploying a graph data system in the cloud, backed by the power of Scylla.
ScyllaDB's Avi Kivity on UDF, UDA, and the Future (ScyllaDB)
Scylla is now capable of executing user-defined functions and user-defined aggregates. That allows queries to be more flexible and, in many situations, faster too, because they avoid server-client data transfers. In this talk, we will look at the infrastructure added to Scylla to make it happen. One key piece of that infrastructure is the integration of a programming language interpreter that allows users to inject their own custom code. But once that happens, where do we stop? We will look into proposed extensions to Scylla that leverage this infrastructure to allow Scylla to consume your data in faster, more efficient, and more creative ways.
The talk will cover most of the performance enhancements introduced to Scylla over the past 12 months. As the throughput was very good before, we focused on Scylla’s behaviour under all types of workloads and data models. Scylla improved its latency under all scenarios, including better behaviour for data models such as large partitions and time series, improvements to the I/O scheduler, and better behaviour of streaming and repair.
Disney+ Hotstar provides on-demand video to more than 18.5 million paid subscribers and 300 million monthly active users. Hotstar's Indian Premier League (IPL) coverage in 2020 made it the most widely streamed sports event in the world to date. Learn why they chose Scylla Cloud to replace both Redis and Elasticsearch, and how they migrated their data with no downtime.
Empowering the AWS DynamoDB™ application developer with Alternator (ScyllaDB)
Getting started with AWS DynamoDB™ is famously easy, but as an application grows and evolves it often starts to struggle with DynamoDB’s limitations. We introduce Scylla’s Alternator, which provides the same API as DynamoDB but aims to empower the application developer. In this presentation we will survey some of Alternator’s developer-centered features. Alternator lets you test and eventually deploy your application anywhere, on any public cloud or private cluster. It efficiently supports multiple tables, so it does not require difficult single-table design. Finally, Alternator provides the developer with strong observability tools. The insights provided by these tools can help detect bottlenecks, improve performance and even lower the application's cost.
Scylla Summit 2018: Keynote - 4 Years of Scylla (ScyllaDB)
This document summarizes Dor Laor's experience over 4+ years with ScyllaDB, including key milestones and achievements as well as ongoing goals and challenges. It notes Scylla's initial release in 2016 and improvements over time to features such as materialized views and global secondary indexes. It also discusses optimizing performance on cloud infrastructure and addressing challenges related to workload types and capacity planning. Going forward, it outlines priorities like lightweight transactions, change data capture, and improving Cassandra compatibility. The overall message is one of pride in accomplishments while still feeling challenged to achieve further dreams and improvements.
Many NoSQL DBaaS vendors limit which cloud platforms you can run on and how much data you can store, and require you to over-provision cloud infrastructure resources, while failing to deliver performance and low latency at scale.
In this session, we will compare the performance and Total Cost of Ownership (TCO) of competing NoSQL DBaaS offerings. We will also review how to migrate to Scylla Cloud, our fully managed database service.
You will learn:
- The true cost of ownership for selected NoSQL DBaaS offerings
- The 8 essentials for selecting a NoSQL DBaaS
- Migration options from Apache Cassandra, DynamoDB and other databases
How Workload Prioritization Reduces Your Datacenter Footprint (ScyllaDB)
Are you running separate database clusters for operational and analytical workloads? Scylla now has the ability to handle multiple workloads from a single cluster--without performance degradation to either. This session will cover:
- The evolving requirements for operational (OLTP) and analytics (OLAP) workloads in the modern datacenter
- How Scylla provides built-in control over workload priority and makes it easy for administrators to configure workload priorities
- The TCO impact of minimizing integrations and maintenance tasks, while also shrinking the datacenter footprint and maximizing utilization
Plus we’ll share test results of how it performs in real-world settings.
Event Streaming Architectures with Confluent and ScyllaDB (ScyllaDB)
Jeff Bean will lead a discussion of event-driven architectures, Apache Kafka, Kafka Connect, KSQL and Confluent Cloud. Then we'll talk about some uses of Confluent and Scylla together, including a co-deployment with Lookout, ScyllaDB and Confluent in the IoT space, and the upcoming native connector.
Scylla Summit 2018: Getting the Most Out of Scylla on Kubernetes (ScyllaDB)
People want to have the convenience of deployment through Kubernetes, while still maintaining performance and management control. Moreno first began by getting Scylla working on Docker, and will discuss his in-depth investigation into getting past performance bottlenecks. After finding how to get most of the performance back, he then moved on to Kubernetes. StatefulSets have been production-ready since Kubernetes 1.9, but there is a lot around StatefulSets that is not quite there. What are the tradeoffs of running a stateful application in a stateless environment? How do we minimize those tradeoffs to get the best operational reliability on Kubernetes without losing Scylla performance optimizations? What do you do when you are trying to run as close to the hardware as possible and then you containerize your installation? How do you remain an auto-tuning database when you are running in a containerized world? Learn how to use Docker, Kubernetes and Helm Charts with Scylla. We now invite members of the open source user community to offer contributions, testing and feedback. Join our channels for #docker and #kubernetes on our open Slack!
GumGum is a global media company specializing in contextual intelligence. Find out how and why they moved to Scylla Cloud to meet their growing customer demands and deliver on their promise of future-proofed solutions.
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque... (Data Con LA)
Scylla is a new, open-source NoSQL data store with a novel design optimized for modern hardware, capable of 1.8 million requests per second per node, while providing Apache Cassandra compatibility and scaling properties. While conventional NoSQL databases suffer from latency hiccups, expensive locking, and low throughput due to low processor utilization, the Scylla design is based on a modern shared-nothing approach. Scylla runs multiple engines, one per core, each with its own memory, CPU and multi-queue NIC. The result is a NoSQL database that delivers an order of magnitude more performance, with less performance tuning needed from the administrator.
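The shared-nothing idea described above can be illustrated with a minimal sketch. It assumes a toy in-memory store in which every key belongs to exactly one shard, so shards never share state and need no locks. The names `Shard` and `ShardedStore` and the CRC32-modulo routing are hypothetical choices for illustration; they are not Scylla's actual token-based mapping or its code.

```python
import zlib


class Shard:
    """One engine: owns a private slice of the data (hypothetical toy model)."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.rows = {}  # touched only by this shard, so no locking is needed

    def write(self, key, value):
        self.rows[key] = value

    def read(self, key):
        return self.rows.get(key)


class ShardedStore:
    """Routes every request to the single shard that owns the key."""

    def __init__(self, num_shards):
        self.shards = [Shard(i) for i in range(num_shards)]

    def shard_for(self, key):
        # Assumption: CRC32 modulo the shard count, chosen only because it is
        # deterministic and simple. A real system would use its own partitioner.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def write(self, key, value):
        self.shard_for(key).write(key, value)

    def read(self, key):
        return self.shard_for(key).read(key)


if __name__ == "__main__":
    store = ShardedStore(num_shards=4)  # e.g. one shard per core
    store.write("user:42", {"name": "Ada"})
    print(store.shard_for("user:42").shard_id, store.read("user:42"))
```

Because a key always maps to the same shard, two shards never contend for the same row, which is the property the abstract credits for avoiding expensive locking.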
With extra performance to work with, NoSQL projects can have more flexibility to focus on other concerns, such as functionality and time to market. Come for the tech details on what Scylla does under the hood, and leave with some ideas on how to do more with NoSQL, faster.
Speaker bio
Don Marti is technical marketing manager for ScyllaDB. He has written for Linux Weekly News, Linux Journal, and other publications. He co-founded the Linux consulting firm Electric Lichen. Don is a strategic advisor for Mozilla, and has previously served as president and vice president of the Silicon Valley Linux Users Group and on the program committees for Uselinux, Codecon, and LinuxWorld Conference and Expo.
Vectorized is presenting on their RPC system and Redpanda product. They are focused on operational simplicity, safety, and 10x the performance of Kafka. Their RPC goals include using PODs instead of IDLs, isolation, avoiding translation costs, and embracing futures. Their measurements show their RPC system is up to 7x faster than flatbuffers for nested data structures and has lower latency than alternative systems.
FireEye & Scylla: Intel Threat Analysis Using a Graph Database (ScyllaDB)
FireEye believes in intelligence-driven cyber security. Their legacy system used PostgreSQL with a custom graph database system to store and facilitate analysis of threat intelligence data. As their user base increased they ran into scaling issues requiring a system redesign with a new platform.
This presentation will focus on the backend systems and migration path to a new technology stack using JanusGraph running on top of Scylla plus Elasticsearch.
Using ScyllaDB turned out to be a game-changer in terms of performance and the types of analysis our application is able to do effortlessly.
Scylla began with a Cassandra compatibility story, implementing Cassandra’s query language (CQL) and replicating its user-visible architecture. Recently we introduced “Alternator” - an experimental feature adding compatibility with a second NoSQL database: Amazon’s DynamoDB. In this talk we look at why DynamoDB’s API was chosen as a good target for our API extension, how DynamoDB is similar to Scylla - and how it differs, and how we can implement DynamoDB’s API in Scylla. We will describe our progress so far in making Alternator compatible with DynamoDB - and what still remains to be done so that any DynamoDB application can run unmodified on Scylla.
Using ScyllaDB with JanusGraph for Cyber Security (ScyllaDB)
Come hear how QOMPLX, a leader in Cyber Security Risk Management solutions, uses ScyllaDB and JanusGraph to detect, manage and assess risks for large corporate and government clients. By leveraging two highly horizontally scalable and fault-tolerant technologies, QOMPLX can flex with their clients' needs.
Addressing the High Cost of Apache Cassandra (ScyllaDB)
Is your Cassandra deployment size out of control?
- Do you get constant requests to source more nodes to sustain your NoSQL workload?
- Do you need to put an external cache in front of your database to ensure performance?
- Is managing your Cassandra clusters too time-consuming and expensive -- either from your own staff or the high price you’re paying your DBaaS vendor?
In this webinar, we’ll dive into the myths about Cassandra ownership costs and the pitfalls that come with them. We’ll show that using modern design techniques, simplified tuning and a scalable datastore can help you control your Total Cost of Ownership (TCO).
Eyal Gutkind, our VP of Solution Engineering, will walk you through:
- Primary and secondary considerations for evaluating the effectiveness of your data platform
- The correlation between use cases and deployment costs
- How you know it's time to migrate and why
Cassandra users! This is a must-attend session for you! We will show you ways to gauge your Cassandra overspend.
This document discusses Cassandra and techniques for inserting data into Cassandra using the Cassandra driver. It describes three methods for inserting data - execute (blocks until response), execute async (returns immediately without blocking), and batch insert (combines multiple statements). It also covers pagination in Cassandra using fetch size, saving the paging state, and offset queries. Performance comparisons show execute async has lower execution time than execute/sync for the same number of entries.
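The three insert styles named above can be compared with a small, self-contained sketch. It runs without a cluster by using a hypothetical `FakeSession` that charges a fixed delay per round trip. `FakeSession` and the helper functions are stand-ins invented for illustration and are not the Cassandra driver's API; only the blocking, non-blocking, and batched call patterns come from the summary.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeSession:
    """Hypothetical stand-in for a driver session (not a real driver class)."""

    def __init__(self, latency=0.005, workers=8):
        self.latency = latency          # simulated cost of one round trip
        self.rows = []
        self.round_trips = 0
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def _round_trip(self, rows):
        time.sleep(self.latency)        # pretend to wait on the network
        with self._lock:
            self.rows.extend(rows)
            self.round_trips += 1
        return len(rows)

    def execute(self, row):
        """Blocks until the simulated server has responded."""
        return self._round_trip([row])

    def execute_async(self, row):
        """Returns a future immediately; the write completes in the background."""
        return self._pool.submit(self._round_trip, [row])

    def execute_batch(self, rows):
        """Sends several statements in a single round trip."""
        return self._round_trip(list(rows))

    def close(self):
        self._pool.shutdown(wait=True)


def insert_sync(session, rows):
    for row in rows:
        session.execute(row)            # one blocking round trip per row


def insert_async(session, rows):
    futures = [session.execute_async(row) for row in rows]
    for future in futures:
        future.result()                 # wait for all in-flight writes


def insert_batch(session, rows):
    session.execute_batch(rows)         # one round trip for the whole group


if __name__ == "__main__":
    data = [("key%d" % i, i) for i in range(40)]
    for name, fn in [("sync", insert_sync), ("async", insert_async), ("batch", insert_batch)]:
        session = FakeSession()
        start = time.perf_counter()
        fn(session, data)
        elapsed = time.perf_counter() - start
        print(name, session.round_trips, "round trips", round(elapsed, 3), "s")
        session.close()
```

In this toy model the asynchronous variant overlaps its round trips, which is in line with the lower execution time the summary reports for execute async. Timings from a simulation like this say nothing about a real cluster.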
Should I use more, smaller instances, or fewer, bigger instances? Is 1Gbps enough for my network cards? Should I use batches? Can I have a collection that is 3GB in size? Those are just some of the many questions we see users asking themselves on a daily basis over our mailing list, Slack, and corporate ticket requests. In this talk, I will explore the answers to these common questions and help you make sure that your deployment is up to the highest standards.
GPS Insight on Using Presto with Scylla for Data Analytics and Data Archival (ScyllaDB)
GPS Insight is a leader in fleet vehicle management using IoT. Internally they use a combination of SQL and NoSQL big data technologies, including distributed SQL data analytics via Presto, an open-source query engine developed by Facebook. Learn how to set up, configure, and use Presto with Scylla to support ad hoc non-partition-key queries for analytics and data scientists. Plus hear how to use Presto for a data archival approach with CSV files on S3 or a similar storage appliance.
High-Load Storage of Users’ Actions with ScyllaDB and HDDs (ScyllaDB)
The presentation gives a brief overview of a high-load service that stores users' actions. The service can serve up to 240k writes per second with 95th-percentile latency under 2 ms, using just a few ScyllaDB nodes packed with HDDs. The hardware setup, cluster specification, live load numbers and achieved latencies are given. The problems we encountered with the HDD setup are described along with possible solutions to them.
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight (ScyllaDB)
Originally using SAP Adaptive Server Enterprise (ASE), the GPS Insight team soon found that relational databases simply aren’t a match for high-volume machine data. To top it off, SAP ASE’s clustering technology proved cumbersome to manage and operate. In this presentation, you’ll learn about GPS Insight’s hybrid Scylla deployment that runs on-premises and in an AWS datacenter. GPS Insight relies on Scylla to capture and analyze GPS data, offloading data from the RDBMS to Scylla for a hybrid analytics approach.
Scylla’s Journey Towards Being an Elastic Cloud Native Database (ScyllaDB)
Cloud Native Databases are required to scale while serving a growing online workload, with minimal disruption and as quickly as possible. In this session we will review the different components that are stressed in scaling scenarios and present the work we have done over the past year to improve Scylla’s elasticity as we enhance it to be a true Cloud Native Database.
ScyllaDB recently launched our Scylla Cloud database as a service, which combines the speed and power of the Scylla NoSQL database with the ease of a fully managed cloud service. Scylla Cloud relieves your team of day-to-day cluster management so you can focus on creating modern, interactive applications that respond to queries in milliseconds.
Join us for an overview of Scylla Cloud, including a live demo of how to launch and connect to a cluster, how to create and query a table, and how to run a few operations, all in minutes.
Seastar is a framework for disk, network, compute, and multicore intensive applications such as databases and filesystems. It treats multicore CPUs and disk I/O as asynchronous entities like networking, replacing locks with message passing. This provides benefits like high throughput, low latency, and control over where throughput and latency occur. The keynote discussed Seastar's approach to scheduling, opportunities around coroutines, and its goals for modules, stream revamping, and task co-execution. Compatibility policies were outlined emphasizing community involvement in supported compilers, APIs, and architectures.
Lookout on Scaling Security to 100 Million Devices (ScyllaDB)
The massive increase of security-related data requires companies to respond with new approaches to ingestion. Learn how Lookout has changed its approach for ingesting telemetry to meet their goal of growing from 1.5 million devices to 100 million devices and beyond, using Kafka Connect and switching from AWS DynamoDB to Scylla.
Ben Slater presented on load testing Cassandra applications. He discussed the importance of load testing to prove an application's stability under peak loads and establish capacity planning. Some considerations for Cassandra load testing include modeling background operations, data conditions, and non-scaling operations. The presentation demonstrated how to use the cassandra-stress tool to simulate different scenarios through a YAML configuration file, generating data and queries. Tips included controlling load rate and using different population distributions.
This presentation will walk through some of the key considerations for planning and running load tests to ensure your Cassandra application will meet your expected scaling requirements. We will also walk through some examples of using the cassandra-stress tool to construct load tests for real-life application scenarios.
About the Speaker
Ben Slater Chief Product Officer, Instaclustr
Instaclustr provides Cassandra and Spark as a managed service in the cloud. As Chief Product Officer, Ben is charged with steering Instaclustr's development roadmap, managing product engineering and overseeing the production support and consulting teams. Ben has over 20 years experience in systems development including previously as lead architect for the product that is now Oracle Policy Automation and over 10 years as a solution architect and project manager for Accenture.
There are many common workloads in R that are "embarrassingly parallel": group-by analyses, simulations, and cross-validation of models are just a few examples. In this talk I'll describe several techniques available in R to speed up workloads like these, by running multiple iterations simultaneously, in parallel.
Many of these techniques require the use of a cluster of machines running R, and I'll provide examples of using cloud-based services to provision clusters for parallel computations. In particular, I will describe how you can use the SparklyR package to distribute data manipulations using the dplyr syntax, on a cluster of servers provisioned in the Azure cloud.
Presented by David Smith at Data Day Texas in Austin, January 27 2018.
Pollfish is a survey platform that provides access to millions of targeted users. Pollfish allows easy distribution and targeting of surveys through existing mobile apps (https://www.pollfish.com/). At Pollfish we use Cassandra for different use cases, e.g. as an application data store to maximize write throughput when appropriate, and for our analytics project to find insights in application-generated data. To get where we are so far, we use DataStax's DSE 4.6 environment, which integrates Apache Cassandra, Spark and a Hadoop-compatible file system (CFS). We will discuss how we started, how the journey went and the impressions gained so far, along with some tips learned the hard way. This is the result of the joint work of an excellent team here at Pollfish.
This document provides tips and tricks for reducing total cost of ownership (TCO) on AWS. It discusses instance types and sizes best suited for different use cases. It also covers reserved instances, spot instances, managed databases vs bringing your own, and cost optimization strategies for services like S3, Glacier, and CloudWatch. Key recommendations include using T2 instance types for dev/test, choosing the right instance size, using reserved instances, bidding strategically for spot instances, and leveraging managed databases when possible. The document warns against pitfalls like oversizing or undersizing servers and exceeding API limits.
Breakthrough OLAP performance with Cassandra and SparkEvan Chan
Find out about breakthrough architectures for fast OLAP performance querying Cassandra data with Apache Spark, including a new open source project, FiloDB.
GumGum relies heavily on Cassandra for storing different kinds of metadata. Currently GumGum reaches 1 billion unique visitors per month using 3 Cassandra datacenters in Amazon Web Services spread across the globe.
This presentation will detail how we scaled out from one local Cassandra datacenter to a multi-datacenter Cassandra cluster and all the problems we encountered and choices we made while implementing it.
How did we architect multi-region Cassandra in AWS? What were our experiences in implementing multi-datacenter Cassandra? How did we achieve low latency with multi-region Cassandra and the Datastax Driver? What are the different Cassandra use cases at GumGum? How did we integrate our Cassandra with Spark?
This document provides an agenda and overview of Big Data Analytics using Spark and Cassandra. It discusses Cassandra as a distributed database and Spark as a data processing framework. It covers connecting Spark and Cassandra, reading and writing Cassandra tables as Spark RDDs, and using Spark SQL, Spark Streaming, and Spark MLLib with Cassandra data. Key capabilities of each technology are highlighted such as Cassandra's tunable consistency and Spark's fault tolerance through RDD lineage. Examples demonstrate basic operations like filtering, aggregating, and joining Cassandra data with Spark.
Dyn delivers exceptional Internet Performance. Enabling high quality services requires data centers around the globe. In order to manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple datacenters to enable sub 50 ms query responses for hundreds of billions of data points. From granular DNS traffic data, to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shined through upgrades, data center migrations, DDoS attacks and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements which led them to choose DSE as their go-to Big Data solution, the path which led to SPARK, and the lessons that we’ve learned in the process.
This document provides an agenda and introduction for a presentation on Apache Cassandra and DataStax Enterprise. The presentation covers an introduction to Cassandra and NoSQL, the CAP theorem, Apache Cassandra features and architecture including replication, consistency levels and failure handling. It also discusses the Cassandra Query Language, data modeling for time series data, and new features in DataStax Enterprise like Spark integration and secondary indexes on collections. The presentation concludes with recommendations for getting started with Cassandra in production environments.
Yaroslav Nedashkovsky - "Data Engineering in Information Security: how to col...Lviv Startup Club
This document discusses the system architecture for collecting, storing, and processing terabytes of data from viruses. It describes using Cassandra to store a variety of data sources in a scalable way, PostgreSQL for some relational data, AWS Kinesis and Spark Streaming for streaming and processing data in real time, and a REST API for accessing insights. The overall goal is to collect petabytes of data and gain insights through analytics.
MongoDB has taken a clear lead in adoption among the new generation of databases, including the enormous variety of NoSQL offerings. A key reason for this lead has been a unique combination of agility and scalability. Agility provides business units with a quick start and flexibility to maintain development velocity, despite changing data and requirements. Scalability maintains that flexibility while providing fast, interactive performance as data volume and usage increase. We'll address the key organizational, operational, and engineering considerations to ensure that agility and scalability stay aligned at increasing scale, from small development instances to web-scale applications. We will also survey some key examples of highly-scaled customer applications of MongoDB.
The document provides an overview of Apache Cassandra, including its key components, data replication, scalability, read/write operations, and tunable data consistency. It discusses how Cassandra is a distributed, decentralized database that provides high availability and horizontal scalability. The key components that enable these features are nodes, partitioners, snitches, gossip protocols, and the replication of data across multiple nodes.
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan OttTrivadis
First Steps of an Oracle-expert in the Big Data World. Everyone speaks about Big Data. But what does it mean? This speech focuses on one animal of the Big Data Zoo - Cassandra and answers the following questions:
- Why another database?
- There is Impala and Spark. Why would I need Cassandra?
- New database - do I need to learn a new language?
- How do I get the data in?
- Can I use SQL?
- Is it part of a distribution, for example Cloudera?
Demos will explain the theory.
Quick trip around the Cosmos - Things every astronaut supposed to knowRafał Hryniewski
Slides for my talk which overviews new(ish) product of Microsoft - multi-model, cloud database known as CosmosDB.
Recorded talk (in Polish) is available here: https://youtu.be/ZWpJne0kcds?t=1h52m45s
Similar to Performance Testing: Scylla vs. Cassandra vs. Datastax (20)
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google CloudScyllaDB
Digital Turbine, the Leading Mobile Growth & Monetization Platform, did the analysis and made the leap from DynamoDB to ScyllaDB Cloud on GCP. Suffice it to say, they stuck the landing. We'll introduce Joseph Shorter, VP, Platform Architecture at DT, who led the charge for change and can speak first-hand to the performance, reliability, and cost benefits of this move. Miles Ward, CTO @ SADA will help explore what this move looks like behind the scenes, in the Scylla Cloud SaaS platform. We'll walk you through before and after, and what it took to get there (easier than you'd guess I bet!).
Petabytes: That's the data volume currently being managed within ScyllaDB Cloud. In this keynote, ScyllaDB's Director of Product Michael Hollander shares how ScyllaDB Cloud harnesses cutting-edge technologies to manage massive datasets efficiently, providing insights into its robust features like API and Terraform integration, data security through encryption at rest, and advanced networking options such as VPC Peering and Transit Gateway, along with upcoming features and enhancements for 2024.
The Strategy Behind ReversingLabs’ Massive Key-Value MigrationScyllaDB
ReversingLabs recently completed the largest migration in their history: migrating more than 300 TB of data, more than 400 services, and data models from their internally-developed key-value database to ScyllaDB seamlessly, and with ZERO downtime. Services using multiple tables — reading, writing, and deleting data, and even using transactions — needed to go through a fast and seamless switch. So how did they pull it off? Martina shares their strategy, including service migration, data modeling changes, the actual data migration, and how they addressed distributed locking.
In ScyllaDB 6.0, we complete the transition to strong consistency for all of the cluster metadata. In this session, Konstantin Osipov covers the improvements we introduce along the way for such features as CDC, authentication, service levels, Gossip, and others.
CTO Insights: Steering a High-Stakes Database MigrationScyllaDB
In migrating a massive, business-critical database, the Chief Technology Officer's (CTO) perspective is crucial. This endeavor requires meticulous planning, risk assessment, and a structured approach to ensure minimal disruption and maximum data integrity during the transition. The CTO's role involves overseeing technical strategies, evaluating the impact on operations, ensuring data security, and coordinating with relevant teams to execute a seamless migration while mitigating potential risks. The focus is on maintaining continuity, optimising performance, and safeguarding the business's essential data throughout the migration process
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time MLScyllaDB
Tractian, an AI-driven industrial monitoring company, recently discovered that their real-time ML environment needed to handle a tenfold increase in data throughput. In this session, JP Voltani (Head of Engineering at Tractian), details why and how they moved to ScyllaDB to scale their data pipeline for this challenge. JP compares ScyllaDB, MongoDB, and PostgreSQL, evaluating their data models, query languages, sharding and replication, and benchmark results. Attendees will gain practical insights into the MongoDB to ScyllaDB migration process, including challenges, lessons learned, and the impact on product performance.
Inside Expedia's Migration to ScyllaDB for Change Data CaptureScyllaDB
Database migrations are no fun, and there are several different strategies and considerations one must be aware of before actually doing one in production. In this talk, Jean Carlo and Manikar Rangu take a deep dive into Expedia’s migration journey from Cassandra to ScyllaDB. They cover the aspects and pitfalls the team needed to overcome as part of their Identity service project.
Terraform Best Practices for Infrastructure ScalingScyllaDB
Terraform is a GREAT tool, but like a lot of other things in life, it has its pitfalls and bad practices.
Since you are working with Terraform, you probably went through its documentation, which can tell you what resources can be used - BUT do you always have a clear path towards using these resources? How should you structure your Terraform code in general?
And what about scaling? How do you make the most of Terraform when scaling your infrastructure as your organization grows?
In this talk, I’ll cover useful best practices, pitfalls to avoid and major obstacles to anticipate so that you can scale across many teams, avoid refactoring, and get a flying start now -- AND optimize for the future.
You’ll also gain a go-to approach and a paved way for working with Terraform, whether it’s an existing codebase or a new functionality altogether. Hopefully it will also make you think about the big picture and utilize Terraform in a broader context rather than just as an “infrastructure as code” tool.
Elasticity vs. State? Exploring Kafka Streams Cassandra State StoreScyllaDB
'kafka-streams-cassandra-state-store' is a drop-in Kafka Streams State Store implementation that persists data to Apache Cassandra.
By moving the state to an external datastore the stateful streams app (from a deployment point of view) effectively becomes stateless. This greatly improves elasticity and allows for fluent CI/CD (rolling upgrades, security patching, pod eviction, ...).
It can also help to reduce failure recovery and rebalancing downtimes, with demos showing sporty 100ms rebalancing downtimes for your stateful Kafka Streams application, no matter the size of the application’s state.
As a bonus accessing Cassandra State Stores via 'Interactive Queries' (e.g. exposing via REST API) is simple and efficient since there's no need for an RPC layer proxying and fanning out requests to all instances of your streams application.
DynamoDB to ScyllaDB: Technical Comparison and the Path to SuccessScyllaDB
What can you expect when migrating from DynamoDB to ScyllaDB? This session provides a jumpstart based on what we’ve learned from working with your peers across hundreds of use cases. Discover how ScyllaDB’s architecture, capabilities, and performance compare to DynamoDB’s. Then, hear about your DynamoDB to ScyllaDB migration options and practical strategies for success, including our top do’s and don’ts.
ScyllaDB Real-Time Event Processing with CDCScyllaDB
ScyllaDB’s Change Data Capture (CDC) allows you to stream both the current state as well as a history of all changes made to your ScyllaDB tables. In this talk, Senior Solution Architect Guilherme Nogueira will discuss how CDC can be used to enable Real-time Event Processing Systems, and explore a wide-range of integrations and distinct operations (such as Deltas, Pre-Images and Post-Images) for you to get started with it.
MongoDB to ScyllaDB: Technical Comparison and the Path to SuccessScyllaDB
What can you expect when migrating from MongoDB to ScyllaDB? This session provides a jumpstart based on what we’ve learned from working with your peers across hundreds of use cases. Discover how ScyllaDB’s architecture, capabilities, and performance compare to MongoDB’s. Then, hear about your MongoDB to ScyllaDB migration options and practical strategies for success, including our top do’s and don’ts.
Real-Time or Analytics Workloads... Why Not Both?ScyllaDB
ScyllaDB’s Workload Prioritization provides resource optimization and performance isolation across workloads with different performance needs, such as Analytics and Real-time. In this session, you will learn how Workload Prioritization works, how you can use it to run different types of workloads together under a single ScyllaDB cluster, and how to fine-tune priorities and resource allocation based on your specific requirements.
Supercell is the game developer behind Hay Day, Clash of Clans, Boom Beach, Clash Royale and Brawl Stars. Learn how they unified real-time event streaming for a social platform with hundreds of millions of users.
ScyllaDB Leaps Forward with Dor Laor, CEO of ScyllaDBScyllaDB
Join ScyllaDB’s CEO, Dor Laor, as he introduces the revolutionary tablet architecture that makes one of the fastest databases fully elastic. Dor will also detail the significant advancements in ScyllaDB Cloud’s security and elasticity features as well as the speed boost that ScyllaDB Enterprise 2024.1 received.
An All-Around Benchmark of the DBaaS MarketScyllaDB
The entire database market is moving towards Database-as-a-Service (DBaaS), resulting in a heterogeneous DBaaS landscape shaped by database vendors, cloud providers, and DBaaS brokers. This DBaaS landscape is rapidly evolving and the DBaaS products differ in their features but also their price and performance capabilities. In consequence, selecting the optimal DBaaS provider for the customer needs becomes a challenge, especially for performance-critical applications.
To enable an on-demand comparison of the DBaaS landscape we present the benchANT DBaaS Navigator, an open DBaaS comparison platform for management and deployment features, costs, and performance. The DBaaS Navigator is an open data platform that enables the comparison of over 20 DBaaS providers for the relational and NoSQL databases.
This talk will provide a brief overview of the benchmarked categories with a focus on the technical categories such as price/performance for NoSQL DBaaS and how ScyllaDB Cloud is performing.
Discover the Unseen: Tailored Recommendation of Unwatched ContentScyllaDB
The session shares how JioCinema approaches "watch discounting." This capability ensures that if a user has watched a certain amount of a show or movie, the platform no longer recommends that particular content to the user. Flawless operation of this feature promotes the discovery of new content, improving the overall user experience.
JioCinema is an Indian over-the-top media streaming service owned by Viacom18.
So You've Lost Quorum: Lessons From Accidental DowntimeScyllaDB
The best thing about databases is that they always work as intended, and never suffer any downtime. You'll never see a system go offline because of a database outage. In this talk, Bo Ingram -- staff engineer at Discord and author of ScyllaDB in Action --- dives into an outage with one of their ScyllaDB clusters, showing how a stressed ScyllaDB cluster looks and behaves during an incident. You'll learn about how to diagnose issues in your clusters, see how external failure modes manifest in ScyllaDB, and how you can avoid making a fault too big to tolerate.
ScyllaDB is making a major architecture shift. We’re moving from vNode replication to tablets – fragments of tables that are distributed independently, enabling dynamic data distribution and extreme elasticity. In this keynote, ScyllaDB co-founder and CTO Avi Kivity explains the reason for this shift, provides a look at the implementation and roadmap, and shares how this shift benefits ScyllaDB users.
Getting the Most Out of ScyllaDB Monitoring: ShareChat's TipsScyllaDB
ScyllaDB monitoring provides a lot of useful information. But sometimes it’s not easy to find the root of the problem if something is wrong or even estimate the remaining capacity by the load on the cluster. This talk shares our team's practical tips on: 1) How to find the root of the problem by metrics if ScyllaDB is slow 2) How to interpret the load and plan capacity for the future 3) Compaction strategies and how to choose the right one 4) Important metrics which aren’t available in the default monitoring setup.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready and for which client coverage is growing and scaling and performance aspects are life and death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk, firstly, we will analyze scaling approaches and then select the proper ones for our system.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsDianaGray10
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations, for
seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
The Microsoft 365 Migration Tutorial For Beginner.pptxoperationspcvita
This presentation will help you understand the power of Microsoft 365. However, we have mentioned every productivity app included in Office 365. Additionally, we have suggested the migration situation related to Office 365 and how we can help you.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframePrecisely
Inconsistent user experience and siloed data, high costs, and changing customer expectations – Citizens Bank was experiencing these challenges while it was attempting to deliver a superior digital banking experience for its clients. Its core banking applications run on the mainframe and Citizens was using legacy utilities to get the critical mainframe data to feed customer-facing channels, like call centers, web, and mobile. Ultimately, this led to higher operating costs (MIPS), delayed response times, and longer time to market.
Ever-changing customer expectations demand more modern digital experiences, and the bank needed to find a solution that could provide real-time data to its customer channels with low latency and operating costs. Join this session to learn how Citizens is leveraging Precisely to replicate mainframe data to its customer channels and deliver on their “modern digital bank” experiences.
2. Linda Xu
VP, data platform/TechOps
Ticketmaster
■ Who we are and our challenges
■ Cassandra test project
3. Somewhere in the world, every 20 minutes, there is a Live Nation event.
We power unforgettable moments for joy!
4. TECH LANDSCAPE
■ 27 ticketing systems and over 250 unique products.
■ Hybrid cloud with over 20,000 VMs across 7 global data centers and multiple AWS regions.
■ Thousands of databases with hybrid cloud deployment across RDBMS, NoSQL, etc.
5. Big Scale, Big Challenges
That’s a spike of >8 GBps!
Black Friday and Cyber Monday combined! On-sales = Black Friday every day!
■ Huge spikes / demand for tickets
■ Global company = across time zones
■ Limited inventory
■ Multiple sales channels
0 to 150M transactions in minutes!
Predictable OnSale Traffic
Can we be more prepared?
7. We have predictable business traffic, so we are looking for predictable backend solutions.
Database technology we are looking for:
a. Predictable as traffic grows
b. Elastic: able not only to scale up but also to scale down
c. Unified deployment to both cloud and on-prem with shippable technology
d. Balance between features and costs
e. Performance, performance and performance
8. Ticketmaster’s Cassandra Story
Early Cassandra adoption
First enterprise deployment 2019
Potential standardized key-value DB solution
Solution for different workloads and business tiers
Find balance between cost and performance
Easy evaluation and deployment toolset
9. Cassandra Test Project - Test Design
Database Cluster Setup
▪ Single region, one DC with 6 nodes across 3 AZs
▪ EC2: r5.2xlarge
▪ EBS: io1 + 10K IOPS
Testing nodes
▪ EC2: t2.2xlarge
▪ Single node vs six nodes
▪ Same region as the database cluster
Data warmup
▪ Each test starts with 50M data preparation.
Cassandra Stress
▪ Ticketmaster customized workload
▪ Using the cassandra-stress binary from each distribution
Test Workloads
▪ 100% read
▪ 100% write
▪ 50% read and 50% write
▪ 80% write and 20% read
▪ 20% write and 80% read
Test Duration: 20 mins
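The five workload mixes above map naturally onto the ops(...) ratios of cassandra-stress's user mode. The following is a minimal sketch of how such command lines might be assembled; the profile file name and node address are placeholders rather than values from the slides, and the operation names assume the insert and select_user operations defined in the customized YAML profile.

```python
import shlex

# Read/write mixes from the test design, as (insert, select_user) weights.
WORKLOADS = {
    "100% read": (0, 1),
    "100% write": (1, 0),
    "50% read and 50% write": (1, 1),
    "80% write and 20% read": (4, 1),
    "20% write and 80% read": (1, 4),
}


def stress_command(insert, select, profile="ticket_profile.yaml",
                   node="10.0.0.1", duration="20m"):
    """Build a cassandra-stress argument list for one workload mix.

    profile and node are hypothetical placeholders; duration matches the
    20-minute test duration stated in the slides.
    """
    weights = (("insert", insert), ("select_user", select))
    # Operations with a zero weight are left out of the ops(...) clause.
    ops = ",".join(f"{name}={w}" for name, w in weights if w > 0)
    return ["cassandra-stress", "user", f"profile={profile}",
            f"ops({ops})", f"duration={duration}", "-node", node]


if __name__ == "__main__":
    for label, (ins, sel) in WORKLOADS.items():
        # shlex.join quotes the ops(...) argument so it is shell-safe.
        print(f"{label}: {shlex.join(stress_command(ins, sel))}")
```

The argument lists can be passed to subprocess.run on a testing node; thread counts and rate limits are omitted here because the slides do not state them.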
10. Cassandra Test Project - Why EBS?
NVMe vs EBS testing in 2019
Performance testing:
■ Write performance is CPU bound
■ Read performance is memory bound
■ NVMe favors random reads
RTO
■ EBS: 1~10 mins for single node or entire DC recovery.
■ NVMe: The bigger the data set, the longer it will take. For 6TB, it will take 10 hrs or longer.
Memory and CPU scale up/down
■ EBS: 1~10 minutes
■ NVMe: same as RTO
Start with EBS
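As a rough back-of-the-envelope check on the NVMe recovery figure above, the short sketch below computes the sustained throughput implied by rebuilding 6TB in 10 hours. It assumes decimal units (1 TB = 10^12 bytes) and that recovery time is dominated by re-streaming the data onto the replacement node; neither assumption comes from the slides.

```python
def implied_throughput_mb_s(data_tb: float, hours: float) -> float:
    """Average throughput (MB/s) needed to move data_tb terabytes in hours."""
    total_bytes = data_tb * 1e12      # decimal terabytes (assumption)
    seconds = hours * 3600
    return total_bytes / seconds / 1e6


def recovery_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Inverse view: hours needed to restream data_tb at a given throughput."""
    return data_tb * 1e12 / (throughput_mb_s * 1e6) / 3600


if __name__ == "__main__":
    rate = implied_throughput_mb_s(6, 10)
    print(f"6 TB in 10 h implies about {rate:.0f} MB/s sustained")
    print(f"12 TB at the same rate: {recovery_hours(12, rate):.0f} h")
```

Because the rebuild time grows linearly with the data set under these assumptions, doubling the data doubles the recovery window, whereas reattaching an EBS volume does not depend on data size in the same way.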
11. Cassandra Test Project
Customized Yaml - keyspace
#
# Keyspace info
#
keyspace: user_space
#
# The CQL for creating a keyspace (optional if it already exists)
#
keyspace_definition: |
  CREATE KEYSPACE user_space WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east-1-nonprod1': 3};
12. 12
Cassandra Test Project
#
# Table info
#
table: user_table
#
# The CQL for creating a table you wish to stress (optional if it already exists)
#
table_definition: |
  CREATE TABLE user_table (
    user_id text,
    ticket_id text,
    ticket_type text,
    ticket_value double,
    time bigint,
    PRIMARY KEY (user_id, time)
  ) WITH CLUSTERING ORDER BY (time DESC)
  AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'};
Customized Yaml - table
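To make the primary key concrete: user_id is the partition key and time is the clustering column, stored newest first. The toy in-memory model below is only a sketch of that layout, not how any of the tested databases implement it; the class and method names are hypothetical.

```python
class UserTable:
    """Toy model of user_table: one partition per user_id, rows keyed by time."""

    def __init__(self):
        self._partitions = {}  # user_id -> {time: row}

    def insert(self, user_id, time, **columns):
        # Writing the same (user_id, time) again overwrites the row (upsert).
        row = {"user_id": user_id, "time": time, **columns}
        self._partitions.setdefault(user_id, {})[time] = row

    def latest(self, user_id, min_time, limit):
        """Rows of one partition with time >= min_time, newest first (time DESC)."""
        rows = self._partitions.get(user_id, {})
        newest_first = sorted((t for t in rows if t >= min_time), reverse=True)
        return [rows[t] for t in newest_first[:limit]]

table = UserTable()
for t in range(80, 120):
    table.insert("u1", t, ticket_id=f"t{t}")
table.insert("u1", 119, ticket_id="replaced")  # same key, so this overwrites

rows = table.latest("u1", min_time=90, limit=10)
print([r["time"] for r in rows])
```

Because rows in a partition are already ordered by time DESC, a "latest N" read for one user touches a single partition and needs no extra sorting in the real table.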
13. Customized Yaml - columnspec
Cassandra Test Project
columnspec:
  - name: user_id
    population: seq(1..5b) # 5 billion potential user_ids
    size: fixed(16)
  - name: ticket_type
    size: uniform(10..20) # ticket_type is 10-20 chars
    population: uniform(1..10) # there are 10 possible ticket_types
  - name: ticket_id
    size: fixed(16)
    population: seq(1..5b) # 5 billion unique ticket_ids
  - name: ticket_value
    population: gaussian(0..1000) # ticket_values range from 0-1000 and follow a gaussian distribution
14. Customized Yaml - operation
Cassandra Test Project
insert:
  partitions: fixed(1)
  batchtype: UNLOGGED
  select: fixed(10)/10
queries:
  select_user:
    cql: select user_id, ticket_value, time from user_table where user_id = ? and time >= 90 LIMIT 10
    fields: samerow
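The insert and select_user operations in this profile can be weighted to produce the five read/write mixes from the test design. A minimal sketch of building the cassandra-stress user-mode command lines, assuming a profile file named ticketmaster.yaml and a node address of 10.0.0.1 (both hypothetical); the exact option spelling should be checked against the binary shipped with each distribution:

```python
import shlex

# Read/write mixes from the test design, as (insert weight, select_user weight).
WORKLOADS = {
    "100r": (0, 1),
    "100w": (1, 0),
    "50w50r": (1, 1),
    "80w20r": (4, 1),
    "20w80r": (1, 4),
}

def stress_command(workload, profile="ticketmaster.yaml", node="10.0.0.1", threads=100):
    """Build a cassandra-stress argument list for one workload mix.

    profile and node are placeholder values, not taken from the slides.
    """
    insert_w, select_w = WORKLOADS[workload]
    weights = (("insert", insert_w), ("select_user", select_w))
    # Operations with weight 0 are left out of the ops(...) ratio.
    ops = ",".join(f"{name}={w}" for name, w in weights if w)
    return [
        "cassandra-stress", "user",
        f"profile={profile}",
        f"ops({ops})",
        "duration=20m",            # test duration from the test design
        "-rate", f"threads={threads}",
        "-node", node,
    ]

for name in WORKLOADS:
    print(shlex.join(stress_command(name)))
```

Returning a list rather than a string keeps the `ops(...)` argument intact when passed to `subprocess.run`, and `shlex.join` quotes it for display.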
This data is from before Covid-19. I prefer not to label it Covid / no Covid on the slide; when I record, I can point out that this is pre-Covid-19 data.
The testing is based on a 6-node Cassandra cluster with i3.2xlarge vs m5.2xlarge.
m5.2xlarge has about 50% of the memory of i3.2xlarge.
Tested in 2019.
Plug the diagram here
Test result
The yaml example
Graph the test result
We noticed that each distribution has a different behavior pattern in single-node testing.
Scylla and Apache Cassandra reached their median performance level right after the cluster was built. Occasionally we saw random performance drops later.
DSE reached its median level about 24 hrs after the cluster was built and has been stable since then.
Both ScyllaDB and DSE provided solid performance during 6-node concurrency testing.
Our observations show that CPU load on the database clusters reached the high end (>90%) during the stress test, with no crashes.
The Apache distribution showed some instability. We experienced errors during 50w50r testing: only one replica responded on reads.
Single node, 100 threads, multiple test runs: we report the average of 3 valid results (after removing the max and the min).
We noticed that the results show different behaviors.
Using the cassandra-stress binary from a different distribution confuses the output.
Scylla and Apache Cassandra seem to perform better right after the cluster is built.
DSE seems to perform better once the DB cluster has been built for more than 24 hrs.
But this seems to apply only to single-node testing. For 6*100 concurrency, the output is relatively stable.
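The averaging described in these notes (drop the max and the min, then average the remaining valid runs) can be sketched as below. This assumes five runs per test so that three remain; the notes do not state the run count, and the input numbers are made up for illustration, not measured results.

```python
def trimmed_average(results):
    """Average after discarding the single highest and single lowest result."""
    if len(results) < 3:
        raise ValueError("need at least 3 results to drop the max and the min")
    kept = sorted(results)[1:-1]  # drop one min and one max
    return sum(kept) / len(kept)

# Made-up throughput figures for five runs; 1000 plays the role of an outlier.
runs = [10, 20, 30, 40, 1000]
print(trimmed_average(runs))
```

Dropping the extremes keeps a single unusually fast or slow run, such as the occasional performance drops noted above, from skewing the reported number.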
6-node concurrency test, 6*100
We noticed that each run drives the database nodes' CPU close to 100%; we believe the load is high enough to approach the DB capacity.
Both DSE and ScyllaDB showed stability. No crashes or errors during multiple test runs.
Apache Cassandra showed some stability issues during stress testing. It crashed once during 50w50r testing.
The result is the average across runs. It shows ScyllaDB is faster under traffic stress for all workload types.