Event-driven architectures are increasingly part of a complete data transformation solution. Learn how to employ Apache Kafka, Cloud Native Computing Foundation’s NATS, Amazon SQS, or other message queueing technologies. This talk covers the details of each, their advantages and disadvantages, and how to select the best one for your company’s needs.
How to Secure Your Scylla Deployment: Authorization, Encryption, LDAP Authent... – ScyllaDB
Scylla includes multiple features that collectively provide a robust security model. Most recently we announced support for encryption-at-rest in Scylla Enterprise. This enables you to lock down your data even in multi-tenant and hybrid deployments of Scylla. Join Tzach and Dejan for an overview of security in Scylla and to see how you can approach it holistically using the array of Scylla capabilities. They will review Scylla security features, from basic to more advanced, including:
Reducing your attack surface
Authorization & Authentication
Role-Based Access Control
Encryption in Transit
Encryption at Rest, in 2019.1.1 and beyond
LDAP authentication is a common requirement for any enterprise software. It gives users consistent login procedures across multiple components of the IT infrastructure, while centralizing the control of access rights. Scylla Enterprise now supports authentication via LDAP. We will look into how to configure Scylla Enterprise for LDAP interaction and how to fine-tune access control through it.
FireEye & Scylla: Intel Threat Analysis Using a Graph Database – ScyllaDB
FireEye believes in intelligence driven cyber security. Their legacy system used PostgreSQL with a custom graph database system to store and facilitate analysis of threat intelligence data. As their user base increased they ran into scaling issues requiring a system redesign with a new platform.
This presentation will focus on the backend systems and migration path to a new technology stack using JanusGraph running on top of Scylla plus Elasticsearch.
Using ScyllaDB turned out to be a game-changer in terms of performance and the types of analysis our application is able to do effortlessly.
Why you need benchmarks
Finding the right database solution for your use case can be an arduous journey. The database deployment touches aspects of throughput performance, latency control, high availability and data resilience.
You will need to decide on the infrastructure to use: cloud, on-premises, or a hybrid solution.
Data models also have an impact on finding the right fit for the use case. Once you establish a requirements set, the next step is to test your use case against the databases of choice.
In this workshop, we will discuss the different data points you need to collect in order to get the most realistic testing environment.
We will cover:
Data model impact on performance and latency
Client behavior related to database capabilities
Failover and high availability testing
Hardware selection and cluster configuration impact
We will show two benchmarking tools you can use to benchmark your clusters and identify the optimal deployment scenario for your use case.
Attend this virtual workshop if you are:
Looking to minimize the cost of your database deployment
Making a database decision based on performance and scale data
Planning to emulate your workload on a pre-production system where you can test, fail fast and learn.
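As a flavor of the analysis such benchmarking produces, here is a small sketch (plain Python, no particular benchmarking tool assumed) that summarizes raw per-request latencies into the throughput and tail-latency figures a database evaluation typically compares:

```python
# Sketch: summarizing raw per-request latencies (in milliseconds) into the
# throughput and tail-latency numbers benchmark reports usually quote.
def percentile(sorted_samples, p):
    """Nearest-rank percentile of an ascending list (p in 0..100)."""
    idx = max(0, min(len(sorted_samples) - 1,
                     round(p / 100 * len(sorted_samples)) - 1))
    return sorted_samples[idx]

def summarize(latencies_ms, duration_s):
    s = sorted(latencies_ms)
    return {
        "ops_per_sec": len(s) / duration_s,
        "p50_ms": percentile(s, 50),
        "p99_ms": percentile(s, 99),   # the tail that latency SLAs care about
        "max_ms": s[-1],
    }

stats = summarize([1.2, 0.9, 1.1, 14.0, 1.0, 1.3, 0.8, 1.1, 1.2, 25.5], 2.0)
print(stats)  # note how two slow requests dominate p99 and max
```

Comparing p50 against p99 across candidate databases, rather than averages alone, is what reveals the latency-control differences the workshop discusses.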
How ReversingLabs Serves File Reputation Service for 10B Files – ScyllaDB
ReversingLabs is on a mission to deliver threat intelligence to their users by providing complete visibility and insight into every destructive object. To deliver on their commitment, they migrated to Scylla to handle thousands of updates per second in their processing engines. In their talk, they will go over their requirements and show how they tuned the system to handle requests from their API frontend.
Event Streaming Architectures with Confluent and ScyllaDB – ScyllaDB
Jeff Bean will lead a discussion of event-driven architectures, Apache Kafka, Kafka Connect, KSQL and Confluent Cloud. Then we'll talk about some uses of Confluent and Scylla together, including a co-deployment with Lookout, ScyllaDB and Confluent in the IoT space, and the upcoming native connector.
Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles... – ScyllaDB
AdTech requires high speed at massive scale. Sizmek serves millions of requests every second. Requests need to be processed in tens of milliseconds, while involving 10 simultaneous lookups into a database that contains tens of billions of profiles. In this presentation, you will discover how Scylla enables Sizmek’s real-time bidders to query a gigantic user profile store quickly and reliably with only a few nodes. We’ll discuss data modeling, server and driver configuration, techniques to minimize disk access, as well as considerations for leveraging Spark while migrating from HBase.
Augury: Real-Time Insights for the Industrial IoT – ScyllaDB
Augury stores and serves time-series features from massive streams of IoT data, both for real-time insights, and offline learning and analytics. Learn about Augury’s needs and constraints, their solution evaluation and architecture, and fundamental practices for efficient data modeling, plus get a glimpse into the next-gen architecture at Augury, with a view on time-series feature storage and serving.
GPS Insight on Using Presto with Scylla for Data Analytics and Data Archival – ScyllaDB
GPS Insight is a leader in fleet vehicle management using IoT. Internally they use a combination of SQL and NoSQL big data technologies, including distributed SQL data analytics via Presto, an open-source query engine developed by Facebook. Learn how to set up, configure, and use Presto with Scylla for supporting ad hoc non-partition-key queries for analytics and data scientists. Plus, hear how to use Presto for a data archival approach with CSV files on S3 or a similar storage appliance.
Disney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand Users – ScyllaDB
Disney+ Hotstar is the fastest growing branch of Disney+. Join Disney+ Hotstar Architect Vamsi Subhash and senior data engineer Balakrishnan Kaliyamoorthy to learn…
How Disney+ Hotstar architected their systems to handle massive data loads
Why they chose to replace both Redis and Elasticsearch
Their requirements for massively scalable data infrastructure and evolving data models
How they migrated their data to Scylla Cloud, ScyllaDB’s fully managed NoSQL database-as-a-service, without suffering downtime
Scylla Summit 2018: Grab and Scylla: Driving Southeast Asia Forward – ScyllaDB
To support 6 million on-demand rides per day, a lot has to happen in near-real time. Latency translates into missed rides and monetary losses. Grab relies on data streaming in Apache Kafka, with Scylla to tie it all together. This presentation details how Grab uses Scylla as a high-throughput, low-latency aggregation store to combine multiple Kafka streams in near real-time, highlighting impressive characteristics of Scylla and how it fared against other databases in Grab’s exhaustive evaluations.
SAS Institute on Changing All Four Tires While Driving an AdTech Engine at Fu... – ScyllaDB
SAS Intelligent Advertising changed its ad-serving platform from using Datastax Cassandra clusters to Scylla clusters for its real-time visitor data storage. This presentation describes how this migration was executed with no downtime and with no loss of data, even as data was constantly being created or updated.
Cisco: Cassandra adoption on Cisco UCS & OpenStack – DataStax Academy
In this talk we will address how we developed our Cassandra environments utilizing the Cisco UCS OpenStack Platform with DataStax Enterprise Edition software. In addition, we utilize open-source Ceph storage in our infrastructure to optimize performance and reduce costs.
Scylla began with a Cassandra compatibility story, implementing Cassandra’s query language (CQL) and replicating its user-visible architecture. Recently we introduced “Alternator” - an experimental feature adding compatibility with a second NoSQL database: Amazon’s DynamoDB. In this talk we look at why DynamoDB’s API was chosen as a good target for our API extension, how DynamoDB is similar to Scylla - and how it differs, and how we can implement DynamoDB’s API in Scylla. We will describe our progress so far in making Alternator compatible with DynamoDB - and what still remains to be done so that any DynamoDB application can run unmodified on Scylla.
Scylla Summit 2016: ScyllaDB, Present and Future – ScyllaDB
Where is Scylla now and where is it going? ScyllaDB's CTO Avi Kivity outlines the 3 ScyllaDB Commitments, and gives an overview of the ScyllaDB road map.
mParticle's Journey to Scylla from Cassandra – ScyllaDB
mParticle processes 50 billion monthly messages and needed a data store that provides full availability and performance. They previously used Cassandra but faced issues with high latency, complicated tuning, and backlogs of up to 20 hours. They tested Scylla and found it provided significantly lower latency and compaction backlogs with minimal tuning needed. Scylla also offered knowledgeable support. mParticle migrated their data from Cassandra to Scylla, which immediately kept up with their data loads with little to no backlog.
Scylla Summit 2022: Stream Processing with ScyllaDB – ScyllaDB
Palo Alto Networks processes terabytes of events each day. One of their many challenges is to understand which of those events (which might come from various different sensors) actually describe the same story but from many different viewpoints.
Traditionally, such a system would need some sort of a database to store the events, and a message queue to notify consumers about new events that arrived into the system. They wanted to mitigate the cost and operational overhead of deploying yet another stateful component to their system, and designed a solution that uses ScyllaDB as the database for the events *and* as a message queue that allows our consumers to consume the correct events each time. Join this talk with Daniel Belenky, Principal Software Engineer, Palo Alto Networks where he will walk you through their process.
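The pattern the talk describes, one table serving as both event store and queue, can be sketched in plain Python. This is an illustration of the idea only, not Palo Alto Networks' actual schema: events land in time buckets (standing in for database partitions), and a consumer polls a bucket while checkpointing the last timestamp it has processed.

```python
from collections import defaultdict

# Sketch of the "database as a queue" idea: events are written into time
# buckets (stand-ins for ScyllaDB partitions); consumers poll a bucket and
# resume from the last timestamp they processed, instead of using a broker.
BUCKET_SECONDS = 60

class EventQueue:
    def __init__(self):
        self.buckets = defaultdict(list)      # bucket -> [(ts, payload)]

    def write(self, ts, payload):
        self.buckets[ts // BUCKET_SECONDS].append((ts, payload))

    def poll(self, bucket, after_ts):
        """Return events in `bucket` newer than `after_ts`, in order."""
        return sorted(e for e in self.buckets[bucket] if e[0] > after_ts)

q = EventQueue()
q.write(5, "login")
q.write(42, "alert")
q.write(70, "logout")                         # lands in the next bucket

consumer_offset = 0
batch = q.poll(0, consumer_offset)            # first poll drains bucket 0
consumer_offset = batch[-1][0]                # checkpoint the last ts seen
print(batch, q.poll(0, consumer_offset))      # second poll: nothing new
```

The checkpoint-and-poll loop is what replaces the message-queue component: consumers advance through buckets in time order, so no separate stateful broker needs to be deployed.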
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
I'm going to cover something which could be seen as essential for Cassandra but which hasn't gotten much attention in the Cassandra community and literature. It's schema migrations: how you go about pushing out and versioning changes to your keyspace and table definitions across environments. This is an area that has established solutions in the relational database world, with tools like Liquibase (http://www.liquibase.org/) and Flyway (http://flywaydb.org/) and in web frameworks like Rails and Grails.
I'll explain the different types of migrations but then focus, for most of the talk, on schema migrations. I'll explain how schema migrations have been done in the Cassandra community and the roadblocks teams have faced trying to use Liquibase and Flyway to manage Cassandra migrations.
Then I'll share an elegant, lightweight schema migrations system that we at GridPoint built on top of Flyway. I'll use our system as a context for discussing schema migration best practices for Cassandra and the various choices teams have for their migrations and table definitions, including when NOT to use a tool like Flyway. I'll also touch on the other types of migrations besides keyspace and table definitions that can be versioned and driven off source control.
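The core of any such migrations system is small: record applied versions in a tracking table and apply pending scripts in order. A minimal sketch in Python, with in-memory stand-ins for the keyspace and the version-tracking table (not GridPoint's actual tool):

```python
# Minimal versioned-migrations runner, in the spirit of Flyway: each
# migration has a version and a function that mutates the schema; applied
# versions are recorded so that re-running the tool is idempotent.
applied_versions = set()       # stand-in for a schema-version tracking table
schema = {}                    # stand-in for keyspace/table definitions

MIGRATIONS = [
    (1, lambda s: s.update({"users": ["id", "name"]})),
    (2, lambda s: s["users"].append("email")),
]

def migrate(schema, migrations, applied):
    for version, apply_fn in sorted(migrations):
        if version not in applied:
            apply_fn(schema)
            applied.add(version)    # record success, as Flyway does per script

migrate(schema, MIGRATIONS, applied_versions)
migrate(schema, MIGRATIONS, applied_versions)   # second run is a no-op
print(schema)  # {'users': ['id', 'name', 'email']}
```

In a real Cassandra setup the tracking table lives in the cluster itself and the migration functions execute CQL `ALTER`/`CREATE` statements, but the ordering and idempotency logic is exactly this loop.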
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu... – ScyllaDB
This document compares MongoDB and ScyllaDB databases. It discusses their histories, architectures, data models, querying capabilities, consistency handling, and scaling approaches. It also provides takeaways for operations teams and developers, noting that ScyllaDB favors consistent performance over flexibility while MongoDB is more flexible but sacrifices some performance. The document also outlines how a company called Numberly uses both MongoDB and ScyllaDB for different use cases.
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor... – ScyllaDB
Customer Data Platforms, commonly called CDPs, form an integral part of the marketing stack powering Zeotap's Adtech and Martech use-cases. The company offers a privacy-compliant CDP platform, and ScyllaDB is an integral part. Zeotap's CDP demands a mix of OLTP, OLAP, and real-time data ingestion, requiring a highly-performant store.
In this presentation, Shubham Patil, Lead Software Engineer, and Safal Pandita, Senior Software Engineer at Zeotap will share how ScyllaDB is powering their solution and why it's a great fit. They begin by describing their business use case and the challenges they were facing before moving to ScyllaDB. Then they cover their technical use cases and requirements for real-time and batch data ingestions. They delve into their data access patterns and describe their data model supporting all use cases simultaneously for ingress/egress. They explain how they are using Scylla Migrator for their migration needs, then describe their multiregional, multi-tenant production setup for onboarding more than 130 partners. Finally, they finish by sharing some of their learnings, performance benchmarks, and future plans.
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ... – DataStax Academy
The state of analytics has changed dramatically over the last few years. Hadoop is now commonplace, and the ecosystem has evolved to include new tools such as Spark, Shark, and Drill, that live alongside the old MapReduce-based standards. It can be difficult to keep up with the pace of change, and newcomers are left with a dizzying variety of seemingly similar choices. This is compounded by the number of possible deployment permutations, which can cause all but the most determined to simply stick with the tried and true. But there are serious advantages to many of the new tools, and this presentation will give an analysis of the current state–including pros and cons as well as what’s needed to bootstrap and operate the various options.
About Robbie Strickland, Software Development Manager at The Weather Channel
Robbie works for The Weather Channel’s digital division as part of the team that builds backend services for weather.com and the TWC mobile apps. He has been involved in the Cassandra project since 2010 and has contributed in a variety of ways over the years; this includes work on drivers for Scala and C#, the Hadoop integration, heading up the Atlanta Cassandra Users Group, and answering lots of Stack Overflow questions.
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known – DataStax
A brief intro to how Barracuda Networks uses Cassandra and the ways in which they are replacing their MySQL infrastructure, with Cassandra. This presentation will include the lessons they've learned along the way during this migration.
Speaker: Michael Kjellman, Software Engineer at Barracuda Networks
Michael Kjellman is a Software Engineer, from San Francisco, working at Barracuda Networks. Michael works across multiple products, technologies, and languages. He primarily works on Barracuda's spam infrastructure and web filter classification data.
ClustrixDB: how distributed databases scale out – MariaDB plc
ClustrixDB, now part of MariaDB, is a fully distributed and transactional RDBMS for applications with the highest scalability requirements. In this session Robbie Mihalyi, VP of Engineering for ClustrixDB, provides an introduction to ClustrixDB, followed by an in-depth technical overview of its architecture, with a focus on distributed storage, transactions and query processing – and its unique approach to index partitioning.
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr... – ScyllaDB
To maximize the benefits of ScyllaDB, you must adapt the structure of your data. Data modeling for ScyllaDB should be query-driven based on your access patterns – a very different approach than normalization for SQL tables. In this session, you will learn how tools can help you migrate your existing SQL structures to accelerate your digital transformation and application modernization.
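The difference the session describes can be shown in plain Python with hypothetical "videos watched by user" data: instead of normalized rows joined at read time, a query-driven model duplicates data into a structure keyed exactly by the access pattern.

```python
# Normalized, SQL-style rows: answering "which videos did this user watch?"
# requires a join between the two relations at read time.
users = {1: "ada", 2: "lin"}                      # user_id -> name
watches = [(1, "intro.mp4"), (2, "demo.mp4"), (1, "deep-dive.mp4")]

# Query-driven model: one denormalized "table" per access pattern,
# partitioned by the value the query filters on (here, the user name).
videos_by_user = {}
for user_id, video in watches:
    videos_by_user.setdefault(users[user_id], []).append(video)

# The query is now a single partition lookup: no join, no full scan.
print(videos_by_user["ada"])  # ['intro.mp4', 'deep-dive.mp4']
```

The trade-off is deliberate: writes duplicate data across query tables so that each read hits exactly one partition, which is what makes the model fast on a distributed store like ScyllaDB.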
Real-time Fraud Detection for Southeast Asia’s Leading Mobile Platform – ScyllaDB
Grab is one of the most frequently used mobile platforms in Southeast Asia, providing the everyday services that matter most to consumers. Its users commute, eat, arrange shopping deliveries, and pay with one e-wallet. Grab relies on the combination of Apache Kafka and Scylla for a very critical use case: instantaneously detecting fraudulent transactions that might occur across more than six million on-demand rides per day taking place in eight countries across Southeast Asia. Doing this successfully requires many things to happen in near-real time.
Join our webinar for this fascinating real-time big data use case, and learn the steps Grab took to optimize their fraud detection systems using the Scylla NoSQL database along with Apache Kafka.
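As a flavor of the near-real-time aggregation such a pipeline performs, here is a plain-Python sketch (illustrative only, not Grab's implementation, with made-up thresholds): count events per account inside a sliding time window and flag accounts that exceed a limit.

```python
from collections import defaultdict, deque

# Sketch: flag an account if it generates more than THRESHOLD events within
# WINDOW_SECONDS - the kind of windowed aggregation a Kafka + Scylla
# fraud-detection pipeline computes in near-real time.
WINDOW_SECONDS = 60
THRESHOLD = 3

recent = defaultdict(deque)    # account -> event timestamps inside the window

def observe(account, ts):
    """Record one event; return True if the account now looks suspicious."""
    window = recent[account]
    window.append(ts)
    while window and window[0] <= ts - WINDOW_SECONDS:
        window.popleft()                       # expire events outside the window
    return len(window) > THRESHOLD

events = [("acct-1", t) for t in (0, 10, 20, 30)] + [("acct-2", 15)]
flags = [observe(a, t) for a, t in events]
print(flags)  # acct-1's fourth event within 60 seconds trips the threshold
```

In production the window state would live in a low-latency store rather than process memory, which is precisely the role the webinar assigns to Scylla alongside Kafka's event stream.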
Patience with Apache Cassandra’s volatile latencies was wearing thin at Rakuten, a global online retailer serving 1.5B worldwide members. The Rakuten Catalog Platform team architected an advanced data platform – with Cassandra at its core – to normalize, validate, transform, and store product data for their global operations. However, while the business was expecting this platform to support extreme growth with exceptional end-user experiences, the team was battling Cassandra’s instability, inconsistent performance at scale, and maintenance overhead. So, they decided to migrate.
Join this webinar to hear a firsthand account of:
How specific Cassandra challenges were impacting the team and their product
How they determined whether migration would be worth the effort
What processes they used to evaluate alternative databases
What their migration required from a technical perspective
Strategies (and lessons learned) for your own database migration
Dyn delivers exceptional Internet performance. Enabling high-quality services requires data centers around the globe. In order to manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple datacenters to enable sub-50 ms query responses for hundreds of billions of data points. From granular DNS traffic data to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shined through upgrades, data center migrations, DDoS attacks, and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements which led them to choose DSE as their go-to big data solution, the path which led to Spark, and the lessons they’ve learned in the process.
How Workload Prioritization Reduces Your Datacenter Footprint – ScyllaDB
Are you running separate database clusters for operational and analytical workloads? Scylla now has the ability to handle multiple workloads from a single cluster, without performance degradation to either. This session will cover:
- The evolving requirements for operational (OLTP) and analytics (OLAP) workloads in the modern datacenter
- How Scylla provides built-in control over workload priority and makes it easy for administrators to configure workload priorities
- The TCO impact of minimizing integrations and maintenance tasks, while also shrinking the datacenter footprint and maximizing utilization
Plus, we’ll share test results of how it performs in real-world settings.
How SkyElectric Uses Scylla to Power Its Smart Energy Platform – ScyllaDB
SkyElectric uses Scylla to power its smart energy platform. Scylla provides better performance, scalability, and lower latency than their previous MySQL database. With Scylla, SkyElectric has seen average write latency of 1.4 ms and read latency of under 1 ms, with 10x faster throughput than MySQL. Scylla has been easy to operate, with responsive support for upgrades and repairs, though SkyElectric hopes to see improvements in the data changelog, faster node joining, and the backup/restore processes.
Big Data Streams Architectures. Why? What? How? – Anton Nazaruk
With the current zoo of technologies and the different ways they interact, it is a big challenge to architect a system (or adapt an existing one) that meets low-latency big data analysis requirements. Apache Kafka, and the Kappa Architecture in particular, are attracting more and more attention compared with the classic Hadoop-centric technology stack. The new Consumer API has given a significant boost in this direction, and microservices-based stream processing together with the new Kafka Streams are proving a natural synergy in the big data world.
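The Kappa idea the abstract alludes to is that the log is the single source of truth: any materialized view can be rebuilt by replaying the log from the beginning with new logic. A conceptual sketch in plain Python (not the actual Kafka Streams API):

```python
# Kappa-style processing sketch: an append-only log is the source of truth;
# materialized views are just fold functions over that log, so
# "reprocessing" means replaying from offset 0 with updated logic.
log = []                                  # stand-in for a Kafka topic

def produce(event):
    log.append(event)

def materialize(fold, initial):
    """Rebuild a view by replaying the whole log from the start."""
    state = initial
    for event in log:
        state = fold(state, event)
    return state

for amount in (5, 7, 3):
    produce({"amount": amount})

total = materialize(lambda s, e: s + e["amount"], 0)   # one view of the log
count = materialize(lambda s, e: s + 1, 0)             # new view, same log
print(total, count)  # 15 3
```

This is the contrast with a Lambda-style setup: there is no separate batch layer to keep in sync, because a corrected or new view is just another replay of the same log.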
SpringPeople - Introduction to Cloud Computing – SpringPeople
Cloud computing is no longer a passing fad. It is real, and is perhaps the most talked-about subject. Various players in the cloud ecosystem have provided definitions closely aligned to their own sweet spots, be it infrastructure, platforms, or applications.
This presentation will expose participants to a variety of cloud computing techniques, architectures, and technology options, and will cover cloud fundamentals holistically, spanning dimensions such as cost, operations, and technology.
Disney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand UsersScyllaDB
Disney+ Hotstar is the fastest growing branch of Disney+. Join Disney+ Hotstar Architect Vamsi Subhash and senior data engineer Balakrishnan Kaliyamoorthy to learn…
How Disney+ Hotstar architected their systems to handle massive data loads
Why they chose to replace both Redis and Elasticsearch
Their requirements for massively scalable data infrastructure and evolving data models
How they migrated their data to Scylla Cloud, ScyllaDB’s fully managed NoSQL database-as-a-service, without suffering downtime
Scylla Summit 2018: Grab and Scylla: Driving Southeast Asia ForwardScyllaDB
To support 6 million on-demand rides per day, a lot has to happen in near-real time. Latency translates into missed rides and monetary losses. Grab relies data streaming in Apache Kafka, with Scylla to tie it all together. This presentation details how Grab uses Scylla as a high throughput, low-latency aggregation store to combine multiple Kafka streams in near real-time, highlighting impressive characteristics of Scylla and how it fared against other databases in Grab’s exhaustive evaluations.
SAS Institute on Changing All Four Tires While Driving an AdTech Engine at Fu...ScyllaDB
SAS Intelligent Advertising changed its ad-serving platform from using Datastax Cassandra clusters to Scylla clusters for its real-time visitor data storage. This presentation describes how this migration was executed with no downtime and with no loss of data, even as data was constantly being created or updated.
Cisco: Cassandra adoption on Cisco UCS & OpenStackDataStax Academy
n this talk we will address how we developed our Cassandra environments utilizing Cisco UCS Open Stack Platform with the DataStax Enterprise Edition software. In addition we are utilizing OpenSource CEPH storage in our Infrastructure to optimize the Performance and reduce the costs.
Scylla began with a Cassandra compatibility story, implementing Cassandra’s query language (CQL) and replicating its user-visible architecture. Recently we introduced “Alternator” - an experimental feature adding compatibility with a second NoSQL database: Amazon’s DynamoDB. In this talk we look at why DynamoDB’s API was chosen as a good target for our API extension, how DynamoDB is similar to Scylla - and how it differs, and how we can implement DynamoDB’s API in Scylla. We will describe our progress so far in making Alternator compatible with DynamoDB - and what still remains to be done so that any DynamoDB application can run unmodified on Scylla.
Scylla Summit 2016: ScyllaDB, Present and FutureScyllaDB
Where is Scylla now and where is it going? ScyllaDB's CTO Avi Kivity outlines the 3 ScyllaDB Commitments, and gives an overview of the ScyllaDB road map.
mParticle's Journey to Scylla from CassandraScyllaDB
mParticle processes 50 billion monthly messages and needed a data store that provides full availability and performance. They previously used Cassandra but faced issues with high latency, complicated tuning, and backlogs of up to 20 hours. They tested Scylla and found it provided significantly lower latency and compaction backlogs with minimal tuning needed. Scylla also offered knowledgeable support. mParticle migrated their data from Cassandra to Scylla, which immediately kept up with their data loads with little to no backlog.
Scylla Summit 2022: Stream Processing with ScyllaDBScyllaDB
Palo Alto Networks processes terabytes of events each day. One of their many challenges is to understand which of those events (which might come from various different sensors) actually describe the same story but from many different viewpoints.
Traditionally, such a system would need some sort of a database to store the events, and a message queue to notify consumers about new events that arrived into the system. They wanted to mitigate the cost and operational overhead of deploying yet another stateful component to their system, and designed a solution that uses ScyllaDB as the database for the events *and* as a message queue that allows our consumers to consume the correct events each time. Join this talk with Daniel Belenky, Principal Software Engineer, Palo Alto Networks where he will walk you through their process.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
I'm going to cover something which could be seen as essential for Cassandra but which hasn't gotten much attention in the Cassandra community and literature. It's schema migrations--how you go about pushing out and versioning changes to your keyspace and table definitions across environments. This is an area that has established solutions in the relational database world, with tools like Liquibase(http://www.liquibase.org/) and Flyway (http://flywaydb.org/) and in web frameworks like Rails and Grails.
I'll explain the different types of migrations but then focus, for most of the talk, on schema migrations. I'll explain how schema migrations have been done in the Cassandra community and the roadblocks teams have faced trying to use Liquibase and Flyway to manage Cassandra migrations.
Then I'll share an elegant, lightweight schema migrations system that we at GridPoint built on top of Flyway. I'll use our system as a context for discussing schema migration best practices for Cassandra and the various choices teams have for their migrations and table definitions, including when NOT to use a tool like Flyway. I'll also touch on the other types of migrations besides keyspace and table definitions that can be versioned and driven off source control.
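The essence of a Flyway-style migration runner is small enough to sketch. This is an illustrative toy, not GridPoint's actual tool: migrations are applied in version order, exactly once, and the applied versions are recorded (in real Cassandra, in a schema-version tracking table) so re-runs are no-ops.

```python
class MigrationRunner:
    """Toy Flyway-style runner: applies versioned migrations exactly once."""
    def __init__(self):
        self.applied = set()   # stands in for a schema_version tracking table
        self.log = []          # statements "executed" against the cluster

    def migrate(self, migrations):
        # migrations: {version: cql_statement}; apply pending ones in order
        for version in sorted(migrations):
            if version not in self.applied:
                self.log.append(migrations[version])  # session.execute(...) in real life
                self.applied.add(version)

runner = MigrationRunner()
v1 = {1: "CREATE TABLE users (id uuid PRIMARY KEY)"}
runner.migrate(v1)
runner.migrate(v1)  # re-running is a no-op: version 1 is already recorded
runner.migrate({1: v1[1], 2: "ALTER TABLE users ADD email text"})
```

Because the version set is persisted, every environment converges to the same schema no matter how many versions behind it starts.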
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...ScyllaDB
This document compares MongoDB and ScyllaDB databases. It discusses their histories, architectures, data models, querying capabilities, consistency handling, and scaling approaches. It also provides takeaways for operations teams and developers, noting that ScyllaDB favors consistent performance over flexibility while MongoDB is more flexible but sacrifices some performance. The document also outlines how a company called Numberly uses both MongoDB and ScyllaDB for different use cases.
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...ScyllaDB
Customer Data Platforms, commonly called CDPs, form an integral part of the marketing stack powering Zeotap's Adtech and Martech use-cases. The company offers a privacy-compliant CDP platform, and ScyllaDB is an integral part. Zeotap's CDP demands a mix of OLTP, OLAP, and real-time data ingestion, requiring a highly-performant store.
In this presentation, Shubham Patil, Lead Software Engineer, and Safal Pandita, Senior Software Engineer at Zeotap, share how ScyllaDB is powering their solution and why it's a great fit. They begin by describing their business use case and the challenges they were facing before moving to ScyllaDB. Then they cover their technical use cases and requirements for real-time and batch data ingestion. They delve into their data access patterns and describe the data model supporting all use cases simultaneously for ingress/egress. They explain how they are using the Scylla Migrator for their migration needs, then describe their multiregional, multi-tenant production setup for onboarding more than 130 partners. Finally, they finish by sharing some of their learnings, performance benchmarks, and future plans.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...DataStax Academy
The state of analytics has changed dramatically over the last few years. Hadoop is now commonplace, and the ecosystem has evolved to include new tools such as Spark, Shark, and Drill, that live alongside the old MapReduce-based standards. It can be difficult to keep up with the pace of change, and newcomers are left with a dizzying variety of seemingly similar choices. This is compounded by the number of possible deployment permutations, which can cause all but the most determined to simply stick with the tried and true. But there are serious advantages to many of the new tools, and this presentation will give an analysis of the current state–including pros and cons as well as what’s needed to bootstrap and operate the various options.
About Robbie Strickland, Software Development Manager at The Weather Channel
Robbie works for The Weather Channel’s digital division as part of the team that builds backend services for weather.com and the TWC mobile apps. He has been involved in the Cassandra project since 2010 and has contributed in a variety of ways over the years; this includes work on drivers for Scala and C#, the Hadoop integration, heading up the Atlanta Cassandra Users Group, and answering lots of Stack Overflow questions.
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd KnownDataStax
A brief intro to how Barracuda Networks uses Cassandra and the ways in which they are replacing their MySQL infrastructure with Cassandra. This presentation will include the lessons they've learned along the way during this migration.
Speaker: Michael Kjellman, Software Engineer at Barracuda Networks
Michael Kjellman is a Software Engineer, from San Francisco, working at Barracuda Networks. Michael works across multiple products, technologies, and languages. He primarily works on Barracuda's spam infrastructure and web filter classification data.
ClustrixDB: how distributed databases scale outMariaDB plc
ClustrixDB, now part of MariaDB, is a fully distributed and transactional RDBMS for applications with the highest scalability requirements. In this session Robbie Mihalyi, VP of Engineering for ClustrixDB, provides an introduction to ClustrixDB, followed by an in-depth technical overview of its architecture, with a focus on distributed storage, transactions and query processing – and its unique approach to index partitioning.
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...ScyllaDB
To maximize the benefits of ScyllaDB, you must adapt the structure of your data. Data modeling for ScyllaDB should be query-driven based on your access patterns – a very different approach than normalization for SQL tables. In this session, you will learn how tools can help you migrate your existing SQL structures to accelerate your digital transformation and application modernization.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
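Query-driven modeling means designing one table per access pattern and denormalizing at write time, rather than normalizing and joining at read time. A toy sketch of the idea (table and column names are made up for illustration):

```python
# Toy sketch of query-first modeling: instead of normalized tables plus a
# join, write the same fact into one "table" (here, a dict) per access pattern.
orders_by_user = {}   # partition key: user_id  -> answers "orders for a user"
orders_by_day = {}    # partition key: day      -> answers "orders on a day"

def record_order(user_id, day, order_id):
    # Denormalize at write time: one insert per query pattern.
    orders_by_user.setdefault(user_id, []).append(order_id)
    orders_by_day.setdefault(day, []).append(order_id)

record_order("u1", "2022-02-09", "o100")
record_order("u1", "2022-02-10", "o101")
record_order("u2", "2022-02-10", "o102")

# Each query is now a single-partition lookup, with no join needed.
u1_orders = orders_by_user["u1"]
feb10 = orders_by_day["2022-02-10"]
```

The trade is deliberate: writes are cheap and duplicated, so that each read touches exactly one partition.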
Real-time Fraud Detection for Southeast Asia’s Leading Mobile PlatformScyllaDB
Grab is one of the most frequently used mobile platforms in Southeast Asia, providing the everyday services that matter most to consumers. Its users commute, eat, arrange shopping deliveries, and pay with one e-wallet. Grab relies on the combination of Apache Kafka and Scylla for a very critical use case -- instantaneously detecting fraudulent transactions that might occur across the more than six million on-demand rides per day taking place in eight countries across Southeast Asia. Doing this successfully requires many things to happen in near-real time.
Join our webinar for this fascinating real-time big data use case, and learn the steps Grab took to optimize their fraud detection systems using the Scylla NoSQL database along with Apache Kafka.
Patience with Apache Cassandra’s volatile latencies was wearing thin at Rakuten, a global online retailer serving 1.5B worldwide members. The Rakuten Catalog Platform team architected an advanced data platform – with Cassandra at its core – to normalize, validate, transform, and store product data for their global operations. However, while the business was expecting this platform to support extreme growth with exceptional end-user experiences, the team was battling Cassandra’s instability, inconsistent performance at scale, and maintenance overhead. So, they decided to migrate.
Join this webinar to hear a firsthand account of:
How specific Cassandra challenges were impacting the team and their product
How they determined whether migration would be worth the effort
What processes they used to evaluate alternative databases
What their migration required from a technical perspective
Strategies (and lessons learned) for your own database migration
Dyn delivers exceptional Internet Performance. Enabling high-quality services requires data centers around the globe. In order to manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple datacenters to enable sub-50 ms query responses for hundreds of billions of data points. From granular DNS traffic data to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shined through upgrades, data center migrations, DDoS attacks, and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements which led them to choose DSE as their go-to big data solution, the path which led to Spark, and the lessons they've learned in the process.
How Workload Prioritization Reduces Your Datacenter FootprintScyllaDB
Are you running separate database clusters for operational and analytical workloads? Scylla now has the ability to handle multiple workloads from a single cluster--without performance degradation to either. This session will cover:
- The evolving requirements for operational (OLTP) and analytics (OLAP) workloads in the modern datacenter
- How Scylla provides built-in control over workload priority and makes it easy for administrators to configure workload priorities
- The TCO impact of minimizing integrations and maintenance tasks, while also shrinking the datacenter footprint and maximizing utilization
Plus, we'll share test results of how it performs in real-world settings.
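The intuition behind workload prioritization can be shown with a toy proportional-share scheduler. The share values and workload names below are made up for illustration (in Scylla itself, priorities are configured through service levels): each workload receives dispatch slots in proportion to its configured shares, so OLTP and OLAP can coexist on one cluster.

```python
def schedule(shares, slots):
    """Toy proportional-share scheduler: hand out `slots` dispatch slots
    to workloads in proportion to their configured shares."""
    total = sum(shares.values())
    return {workload: (s * slots) // total for workload, s in shares.items()}

# OLTP gets 600 shares, analytics 200 -> OLTP receives 3x the slots.
grants = schedule({"oltp": 600, "olap": 200}, slots=800)
```

When the OLAP workload is idle, a real scheduler would redistribute its unused slots, which is why neither workload starves under this model.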
How SkyElectric Uses Scylla to Power Its Smart Energy PlatformScyllaDB
SkyElectric uses Scylla to power its smart energy platform. Scylla provides better performance, scalability, and lower latency than their previous MySQL database. With Scylla, SkyElectric has seen average write latency of 1.4 ms and read latency of under 1 ms, with 10x the throughput of MySQL. While Scylla has been easy to operate, with responsive support for upgrades and repairs, SkyElectric hopes to see improvements in the data changelog, faster node joining, and the backup/restore processes.
Big Data Streams Architectures. Why? What? How?Anton Nazaruk
With the current zoo of technologies and the many different ways they interact, it is a big challenge to architect a system (or adapt an existing one) that meets low-latency big data analysis requirements. Apache Kafka, and the Kappa Architecture in particular, are drawing more and more attention away from the classic Hadoop-centric technology stack. The new Consumer API has given a significant boost in this direction. Microservices-based stream processing and the new Kafka Streams library are shaping up to be a synergy in the big data world.
SpringPeople - Introduction to Cloud ComputingSpringPeople
Cloud computing is no longer a passing fad. It is for real and is perhaps the most talked-about subject. Various players in the cloud ecosystem have provided definitions closely aligned with their sweet spot – be it infrastructure, platforms, or applications.
This presentation will expose participants to a variety of cloud computing techniques, architectures, and technology options, and in general will cover cloud fundamentals holistically, spanning dimensions such as cost, operations, and technology.
Roko Kruze of vectorized.io describes real-time analytics using Redpanda event streams and ClickHouse data warehouse. 15 December 2021 SF Bay Area ClickHouse Meetup
Maheedhar Gunturu presented on connecting Kafka message systems with Scylla. He discussed the benefits of message queues like Kafka, including centralized infrastructure, buffering capabilities, and streaming data transformations. He then explained Kafka Connect, which provides a standardized framework for building distributed, scalable connectors. Scylla and Cassandra connectors are available today, with a Scylla shard-aware connector in development.
Amazon aws big data demystified | Introduction to streaming and messaging flu...Omid Vahdaty
This document provides an overview of streaming data and messaging concepts including batch processing, streaming, streaming vs messaging, challenges with streaming data, and AWS services for streaming and messaging like Kinesis, Kinesis Firehose, SQS, and Kafka. It discusses use cases and comparisons for these different services. For example, Kinesis is suitable for complex analytics on streaming data while SQS focuses on per-event messaging. Firehose automatically loads streaming data into AWS services like S3 and Redshift without custom coding.
Modernizing Applications with Microservices and DC/OS (Lightbend/Mesosphere c...Lightbend
**Featuring Aaron Williams, Head of Advocacy at Mesosphere, Inc. and Markus Eisele, Developer Advocate at Lightbend, Inc.**
The traditional architecture that enterprises run their businesses on has typically been delivered as monolithic applications running in a virtualized, on-premise infrastructure. Public and private cloud technologies have changed everything, but if the applications are not designed, or re-designed, appropriately, then it is impossible to take advantage of the advances in both distributed application services and hybrid infrastructure. Consequently, enterprise architects are looking to microservices-based architectures as a means to modernize their legacy applications.
This webinar with Lightbend and partner Mesosphere will introduce a new framework specifically designed to help developers modernize legacy Java EE applications into systems of microservices and then discuss exactly what is required to run these distributed systems at enterprise scale.
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...apidays
apidays New York 2022 - Beyond API Regulations for Finance, Insurance, and Healthcare
July 27 & 28, 2022
Leveraging Event Streaming to Super-Charge your Business
Mary Grygleski, Streaming Developer Advocate at DataStax
------------
Check out our conferences at https://www.apidays.global/
Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8
Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io
Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/
Deep dive into the API industry with our reports:
https://www.apidays.global/industry-reports/
Subscribe to our global newsletter:
https://apidays.typeform.com/to/i1MPEW
HPC and cloud distributed computing, as a journeyPeter Clapham
Introducing an internal cloud brings new paradigms, tools, and infrastructure management. When placed alongside traditional HPC, the new opportunities are significant. But getting to the new world with micro-services, autoscaling and autodialing is a journey that cannot be achieved in a single step.
This document provides an overview of IBM Integration Bus, including:
- It is IBM's strategic enterprise integration product supporting Java, .NET and heterogeneous integration.
- Message flows provide reusable, scalable, transactional processing of messages passing between applications.
- The product contains many built-in node types for common integration protocols and tasks like transformation.
- An example message flow demonstrates routing messages based on conditions and transforming/outputting messages.
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)Spark Summit
This document describes BBVA's implementation of a Big Data Lake using Apache Spark for log collection, storage, and analytics. It discusses:
1) Using Syslog-ng for log collection from over 2,000 applications and devices, distributing logs to Kafka.
2) Storing normalized logs in HDFS and performing analytics using Spark, with outputs to analytics, compliance, and indexing systems.
3) Choosing Spark because it allows interactive, batch, and stream processing with one system using RDDs, SQL, streaming, and machine learning.
Beyond REST and RPC: Asynchronous Eventing and Messaging PatternsClemens Vasters
In this session you will learn about when and why to use asynchronous communication with and between services, what kind of eventing/messaging infrastructure you can use in the cloud and on the edge, and how to make it all work together.
The document discusses and compares several popular message queue technologies including RabbitMQ, CloudAMQP, Amazon SNS, Stormmq, ActiveMQ, SwiftMQ, Sparrow, Starling, Kestrel, and Kafka. It provides brief descriptions of each technology, highlighting key features such as supported protocols, reliability, speed, and usage patterns. RabbitMQ is described as robust, easy to use, open source, and supporting many platforms. CloudAMQP provides RabbitMQ as a hosted service. Amazon SNS focuses on push notifications and supports delivery to various endpoints. Stormmq offers free usage and high throughput. ActiveMQ supports multiple protocols and reliable messaging. The others described include SwiftMQ, Sparrow, Starling, Kestrel, and Kafka.
Big data conference europe real-time streaming in any and all clouds, hybri...Timothy Spann
Biography
Tim Spann is a Principal DataFlow Field Engineer at Cloudera where he works with Apache NiFi, MiniFi, Pulsar, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, DataWorks Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
Talk
Real-Time Streaming in Any and All Clouds, Hybrid and Beyond
Today, data is being generated from devices and containers living at the edge of networks, clouds, and data centers. We need to run business logic, analytics, and deep learning at scale, as events arrive.
Tools:
Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, DJL.ai, Apache MXNet.
References:
https://www.datainmotion.dev/2019/11/introducing-mm-flank-apache-flink-stack.html
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html
Source Code: https://github.com/tspannhw/MmFLaNK
FLiP Stack
StreamNative
This session on CloudStack, intended for new CloudStack users, provides an overview for varied audience levels, covering usage, use cases, deployment, and its architecture.
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageMayaData Inc
Webinar Session - https://youtu.be/_5MfGMf8PG4
In this webinar, we share how the Container Attached Storage pattern makes performance tuning more tractable, by giving each workload its own storage system, thereby decreasing the variables needed to understand and tune performance.
We then introduce MayaStor, a breakthrough in the use of containers and Kubernetes as a data plane. MayaStor is the first containerized data engine available that delivers near the theoretical maximum performance of underlying systems. MayaStor performance scales with the underlying hardware and has been shown, for example, to deliver in excess of 10 million IOPS in a particular environment.
The document discusses the Open Data Plane (ODP) project, which aims to create an open source framework for data plane applications. ODP provides a standardized API to enable networking applications across different architectures like ARM, Intel and PowerPC. It is based on the Event Machine model of work-driven processing. ODP implementations optimize the API for different hardware platforms while providing application portability. The project aims to support functions like dynamic load balancing, power management, and virtual switch integration.
Event Driven Architectures with Apache KafkaMatt Masuda
This document discusses event-driven architectures and how Apache Kafka can be used to enable them. It provides an overview of microservices architectures and the issues they can have with synchronous calls. Event-driven architectures address these issues using asynchronous messaging. Kafka is then introduced as a distributed messaging platform that allows publishing and subscribing to event streams. It describes key Kafka concepts like topics, partitions, producers, and consumers. The document argues that using Kafka for event-driven architectures solves problems around service location, load balancing, and integration of new services. It also provides durable storage and read positioning capabilities. Finally, it references additional resources and promises a demo.
Similar to Capital One: Why Stream Data as Part of Data Transformation? (20)
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
Optimizing NoSQL Performance Through ObservabilityScyllaDB
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. But before you squeeze, make sure you know what to monitor!
Watch our experienced Postgres developer work through monitoring and performance strategies that help him understand what mistakes he’s made moving to NoSQL. And learn with him as our database performance expert offers friendly guidance on how to use monitoring and performance tuning to get his sample Rust application on the right track.
This webinar focuses on using monitoring and performance tuning to discover and correct mistakes that commonly occur when developers move from SQL to NoSQL. For example:
- Common issues getting up and running with the monitoring stack
- Using the CQL optimizations dashboard
- Common issues causing high latency in a node
- Common issues causing replica imbalance
- What a healthy system looks like in terms of memory
- Key metrics to keep an eye on
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingScyllaDB
Discuss the core tradeoffs and considerations involved in order-free and ordered stream processing. Brian Taylor walks through the pros and cons of three different approaches: no data dependency, deferred inter-event data dependency, and streaming inter-event data dependency.
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...ScyllaDB
We start by setting up common ground, introducing why relational databases fall short and addressing common EDA characteristics such as the need for real-time response and schemaless approaches that adapt to recurring changes and onboard new use cases. Next, we interact with a sample Rust-based application: a social network app demonstrating an integration of both ScyllaDB and Redpanda.
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...ScyllaDB
Discover how to avoid common pitfalls when shifting to an event-driven architecture (EDA) in order to boost system recovery and scalability. We cover Kafka Schema Registry, in-broker transformations, event sourcing, and more.
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
See where an RDBMS-pro’s intuition leads him astray – and learn practical tips for the data modeling transition
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning.
Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts.
This webinar focuses on common data modeling and querying mistakes that occur when developers move from SQL to NoSQL. For example:
- Understanding query first design principles
- Planning for schema evolution
- Steering clear of common pitfalls and anti-patterns
- Assessing data access patterns
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
See where an RDBMS-pro’s intuition leads him astray – and learn practical tips for the transition
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning.
Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts.
Our first webinar of this series will cover common mistakes with practices such as:
- Translating the data model to NoSQL
- Optimizing table design
- Optimizing query performance
- Planning for partitioning
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
Expert tips on how to maximize your database performance at scale
Untangle the complexity of achieving database performance at scale. Join this webinar to discover commonly overlooked ways to get predictable low latency, even at extreme scale. Our Solution Architects will walk you through the strategies and pitfalls learned from working on thousands of real-world distributed database projects, many reaching 1M OPS with single-digit millisecond latencies.
In addition to offering clear recommendations, we’ll also explain the process behind how we arrived at them – so you can benefit from the lessons learned by other teams.
We’ll cover how to:
- Design and deploy a large-scale distributed database cluster
- Optimize your clients’ interactions with it
- Expand the cluster horizontally and globally
- Ensure it survives whatever disasters the world throws at it
Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments.
Join us for an interactive session where we dissect four specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma, we'll:
- Examine the context and technical requirements
- Talk about potential solutions and cover the pros and cons of each
- Disclose what approach the team took, and how it worked out
About the speaker:
Felipe is an IT specialist with years of experience in distributed systems and open-source technologies. He is one of the co-authors of "Database Performance at Scale", an open-access, freely available publication for individuals interested in improving database performance. At ScyllaDB, he works as a Solution Architect.
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
Linear scaling (sometimes near-linear scaling) is often cited in benchmarks, articles, and product comparisons as proof that a given technology and its algorithmic optimizations perform better than another. But is that really what performance is all about, and should you even care?
This webinar discusses performance beyond linear scalability, including what typically matters more when running high throughput and low latency workloads at scale. We'll cover how ScyllaDB offers unparalleled performance and share our insights on:
- The hidden aspects of linear scaling
- When linear scaling matters most and when it’s simply irrelevant
- Often overlooked considerations for optimizing and measuring distributed systems performance
Watch now to learn from our experience (and lessons learned) in building the fastest NoSQL database in the world.
Navigating Complex Database Performance Hurdles
Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments.
Join us for an interactive session where we dissect 4 specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma:
- The presenters will describe the context and technical requirements
- Together, we’ll talk about potential solutions and cover the pros and cons of each
- Finally, we’ll disclose what approach the team took, and how it worked out
Throughout the event, we’ll have opportunities to win ScyllaDB swag and prizes! Come prepared to engage in lively discussions and gain valuable insight into database performance strategies.
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...ScyllaDB
Felipe Cardeneti Mendes, Solutions Architect at ScyllaDB
Navigating workload-specific performance challenges and tradeoffs.
Felipe Mendes covers how to navigate the top performance challenges and tradeoffs that you’re likely to face with your project’s specific workload characteristics and technical/business requirements.
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
Pavel Emelyanov, Principal Engineer at ScyllaDB
Botond Denes, C++ Developer at ScyllaDB
What performance-minded engineers need to know.
Hear from Pavel Emelyanov and Botond Dénes on the impact of database internals – specifically, what to look for if you need latency and/or throughput improvements.
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
Piotr Sarna, Software Engineer at Turso
Understanding and tapping your driver’s performance potential.
Piotr Sarna discusses how to get the most out of a driver, particularly from the performance perspective, and select a driver that’s a good fit for your needs.
This document discusses replacing external caching solutions with ScyllaDB's internal caching capabilities. It provides examples of companies that improved performance and reduced costs and complexity by moving from an external cache such as Redis or Elasticsearch to ScyllaDB's embedded cache. The document also outlines some of the advantages of ScyllaDB's cache, like improved latency, coherency with the database, and observability, compared to external caching layers.
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
Discover how your team can achieve low latency at the extreme scale that your data-intensive applications require. We’ll walk you through an example of how ScyllaDB scales linearly to achieve 1M and then 2M OPS – with <1ms P99 latency. We’ll cover how this works on a sample realtime app (an ML feature store), share best practices for performance, and talk about the most important tradeoffs you’ll need to negotiate.
Join us to learn:
- Why and how to ensure your database takes full advantage of your cloud infrastructure
- What architectural considerations matter most for high throughput and low latency
- Key factors to consider when selecting a high-performance database
7 Reasons Not to Put an External Cache in Front of Your Database.pptxScyllaDB
This document discusses the pros and cons of placing an external cache in front of a database. It introduces Tomasz Grabiec and Tzach Livyatan from ScyllaDB and describes ScyllaDB's optimized internal caching design. External caches can increase latency and costs while ignoring the database's context and workload knowledge. ScyllaDB embeds its cache to minimize overhead and ensure data and query awareness. The document shares customer examples that improved performance and reduced costs by moving from cached databases to ScyllaDB.
Expert tips on how to maximize your database potential
If you’re considering or getting started with ScyllaDB, you’re probably intrigued by its potential to achieve high throughput and predictable low latency at a reasonable cost. So how do you ensure that you’re maximizing that potential for your team’s specific workloads and use case?
This webinar offers practical advice for navigating the various decision points you’ll face as you assess whether ScyllaDB is a good fit for your team and later roll it out into production. We’ll cover the most critical considerations, tradeoffs, and recommendations related to:
- Infrastructure selection
- ScyllaDB configuration
- Client-side setup
- Data modeling
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
In this talk, Felipe Mendes, Solutions Architect at ScyllaDB, shares how 4 companies managed their migration. He covers:
Disney+ – No migration needed!
Discord – Shadow cluster
OpenWeb – TTL expiration, cover Load and Stream
MyHeritage – Counters
ShareChat – Bonus: A bit of everything
In this talk, Lubos discusses tools and methods for a successful migration. He covers:
Methods
Data (re)modeling
APIs
Spark Migrator
DS bulk
Tuning
Testing/monitoring
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
A Comprehensive Guide to DeFi Development Services in 2024Intelisync
DeFi represents a paradigm shift in the financial industry. Instead of relying on traditional, centralized institutions like banks, DeFi leverages blockchain technology to create a decentralized network of financial services. This means that financial transactions can occur directly between parties, without intermediaries, using smart contracts on platforms like Ethereum.
In 2024, we are witnessing an explosion of new DeFi projects and protocols, each pushing the boundaries of what’s possible in finance.
In summary, DeFi in 2024 is not just a trend; it’s a revolution that democratizes finance, enhances security and transparency, and fosters continuous innovation. As we proceed through this presentation, we'll explore the various components and services of DeFi in detail, shedding light on how they are transforming the financial landscape.
At Intelisync, we specialize in providing comprehensive DeFi development services tailored to meet the unique needs of our clients. From smart contract development to dApp creation and security audits, we ensure that your DeFi project is built with innovation, security, and scalability in mind. Trust Intelisync to guide you through the intricate landscape of decentralized finance and unlock the full potential of blockchain technology.
Ready to take your DeFi project to the next level? Partner with Intelisync for expert DeFi development services today!
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on automated letter generation for Bonterra Impact Management using Google Workspace or Microsoft 365.
Interested in deploying letter generation automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Capital One: Why Stream Data as Part of Data Transformation?
1. Why Stream Data as Part
of Data Transformation
Glen Gomez Zuazo, Senior Solutions Architect
2. Presenter
Glen Gomez Zuazo, Senior Solutions Architect
● Data Science, Machine Learning, Distributed Systems, Full
Stack Development, Blockchain and Enterprise
Architecture
● Passionate involvement in Diversity and Inclusion
● STEM advocate for young people (Middle and High School)
● Teaching technology (CSSE, AWS and Microservices)
● Spending time with his family, including his dog (Bolillo),
running and camping
3. Event-Driven Data Architecture in 2019
■ Event-driven architectures are increasingly part of a complete data
transformation solution
■ This talk covers Apache Kafka, NATS, Amazon SQS, and other message
queueing technologies:
● the details of each
● their advantages and disadvantages
● how to select the best for your company’s needs
6. Amazon Simple Queue Service
■ Fully managed message queuing service
■ Decouples and scales microservices, distributed systems, and
serverless applications, moving from synchronous to asynchronous
communication
■ Eliminates the complexity and overhead of managing and operating
message-oriented middleware
7. SQS: Two types of message queues
■ Standard queues: maximum throughput, best-effort ordering,
and at-least-once delivery
■ SQS FIFO queues: guarantee that messages are processed exactly
once, in the exact order that they are sent
8. SQS Functionality
■ Unlimited queues and messages
■ Payload
● Up to 256KB of text in any format
● Each 64KB ‘chunk’ of payload is billed as 1 request
● (E.g. 256KB payload is billed as four requests)
● Use Amazon SQS Extended Client Library for Java to send messages >256KB
● Extended Client Library uses Amazon S3 to store the message payload
■ Batches
● Send, receive, or delete messages in batches of up to 10 messages or 256KB
● Batches cost the same amount as single messages
● More cost effective for customers
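The billing arithmetic on this slide can be sketched directly (a back-of-the-envelope helper, not an AWS API call):

```python
import math

def billed_requests(payload_bytes: int, chunk_bytes: int = 64 * 1024) -> int:
    """Each 64KB chunk of payload is billed as one request (minimum one)."""
    return max(1, math.ceil(payload_bytes / chunk_bytes))

print(billed_requests(256 * 1024))  # 4: a full 256KB payload bills as four requests
print(billed_requests(1_000))       # 1: small payloads still bill as one request
```

This is also why batching pays off: a batch call carrying up to 10 messages or 256KB costs the same as a single request.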
9. SQS Functionality (cont’d)
■ Long polling
● Reduce extraneous polling to minimize cost while receiving new messages as
quickly as possible.
● When your queue is empty, long-poll requests wait up to 20 seconds for the next
message to arrive
● Long poll requests cost the same amount as regular requests.
■ Retain messages in queues for up to 14 days.
■ Send and read messages simultaneously
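The long-polling idea can be illustrated with a plain in-process queue (a sketch of the concept only, not the SQS `ReceiveMessage` call; SQS caps the wait at 20 seconds):

```python
import queue
import threading

def long_poll(q: queue.Queue, wait_seconds: float):
    """Block up to `wait_seconds` for the next message instead of
    returning empty immediately, as a short poll would."""
    try:
        return q.get(timeout=wait_seconds)
    except queue.Empty:
        return None

inbox = queue.Queue()
# A producer delivers a message 0.1s after the poll starts; the poller
# receives it as soon as it arrives rather than waiting out the timeout.
threading.Timer(0.1, inbox.put, args=("hello",)).start()
print(long_poll(inbox, wait_seconds=2.0))  # hello
```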
10. Functionality
■ Message locking
● Messages are locked while being processed; if processing fails, the
lock expires and the message becomes available again
■ Queue sharing
● Anonymously
● With specific AWS accounts
■ Server-side encryption (SSE)
● Keys managed in AWS Key Management Service (AWS KMS)
■ Dead Letter Queues (DLQ)
● Must be the same queue type as the source queue (standard or FIFO)
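The locking and dead-letter mechanics can be modeled in pure Python (a toy model of the behavior described above, not the SQS API; `ToyQueue` and its parameters are invented for illustration):

```python
import time
from collections import deque

class ToyQueue:
    """Toy model of SQS message locking (visibility timeout) and DLQ redrive."""

    def __init__(self, visibility_timeout=1.0, max_receives=3, dlq=None):
        self.messages = deque()
        self.locked = {}          # message -> time its lock expires
        self.receive_count = {}
        self.visibility_timeout = visibility_timeout
        self.max_receives = max_receives
        self.dlq = dlq

    def send(self, body):
        self.messages.append(body)

    def receive(self):
        now = time.monotonic()
        # An expired lock means processing failed: requeue the message,
        # or move it to the DLQ once max_receives is exceeded.
        for msg, expires in list(self.locked.items()):
            if expires <= now:
                del self.locked[msg]
                if self.receive_count[msg] >= self.max_receives and self.dlq:
                    self.dlq.send(msg)
                else:
                    self.messages.append(msg)
        if not self.messages:
            return None
        msg = self.messages.popleft()
        self.receive_count[msg] = self.receive_count.get(msg, 0) + 1
        self.locked[msg] = now + self.visibility_timeout  # lock while processing
        return msg

    def delete(self, msg):
        self.locked.pop(msg, None)  # successful processing: remove for good

dlq = ToyQueue()
q = ToyQueue(visibility_timeout=0.0, max_receives=2, dlq=dlq)
q.send("order-1")
q.receive(); q.receive()   # two failed receives (message never deleted)
q.receive()                # third attempt moves it to the DLQ
print(dlq.receive())  # order-1
```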
11. Publish-Subscribe for Application Integration
● Exchange data asynchronously
● Be independent and fault-tolerant
● Allow systems to run in different environments (OS, language)
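The decoupling behind these properties can be sketched with a minimal in-memory broker (delivery here is synchronous for brevity; a real broker such as NATS or SQS delivers asynchronously):

```python
from collections import defaultdict

class MiniBroker:
    """Publishers and subscribers share only a subject name, never a
    direct connection, so either side can be replaced independently."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, subject, handler):
        self.handlers[subject].append(handler)

    def publish(self, subject, message):
        for handler in self.handlers[subject]:
            handler(message)

broker = MiniBroker()
received = []
broker.subscribe("orders.created", received.append)
broker.publish("orders.created", {"id": 1})
print(received)  # [{'id': 1}]
```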
14. NATS
■ A high-performance, cloud-native messaging system
■ Provides an entire foundational communication layer
■ Can build both synchronous and asynchronous, reliable, highly
available systems
■ The 2.0 release adds major features for both high availability and
security
● not to be confused with NGS, the Synadia commercial version
Let’s cover the details of how we plan to deploy and configure NATS with
special focus on HA and security.
15. High Availability
■ Deploy a NATS cluster as a global entity with NATS gateways used to
connect multiple regions. Both NATS and the system proper will be
deployed active/active.
■ It is assumed that there is a geographically pinned single point of
entry into each cluster in all of these scenarios as per standard AWS
practices.
■ In "classic" active-active scenarios, you have two or more completely
isolated mirrors.
16. Sharing Streams and Services
■ The NATS account model comes with an explicit, secure-by-default
means of allowing communication between accounts.
● Account owners can export either a stream (write-only from the account, read-only
to subscribers)
● or a service (read/write)
■ A service or stream can be exported two ways:
● Public export (any authorized account can import that subject)
● Private export (requires an explicit, out-of-band delivery of an activation token)
17.
18.
19. Security and Multi-Tenancy
■ Main considerations / concerns in a multi-tenant system that sits on top of a
central messaging system
● Security of clients and the message traffic
● Configuration maintenance
● Complexity of multiple multi-tenant systems running in the same
cluster (e.g. K8s tenants co-existing with ECS tenants)
■ In a decentralized model, clients authenticate to NATS with signed user JWTs.
There is a hierarchy that goes from Operator to Account to User.
■ In NATS, an account is a unit of isolation and a user is a unit of client
authentication and authorization.
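The Operator to Account to User hierarchy can be sketched as a chain of signed claims. This is a toy model only: real NATS uses Ed25519 nkeys and JWTs, and its verification does not require sharing secrets the way this hash-based stand-in does; the seed and claim strings are invented for illustration.

```python
import hashlib

def sign(issuer_seed: str, claims: str) -> str:
    # Stand-in for a real signature (NATS uses Ed25519 nkeys).
    return hashlib.sha256(f"{issuer_seed}:{claims}".encode()).hexdigest()

def verify(claims: str, signature: str, issuer_seed: str) -> bool:
    return signature == sign(issuer_seed, claims)

# Each level of the hierarchy signs the claims of the level below it.
operator_seed = "operator-seed"                       # hypothetical values
account_claims = "account:tenant-a"
account_sig = sign(operator_seed, account_claims)

account_seed = "tenant-a-seed"
user_claims = "user:alice pub=orders.> sub=orders.>"  # per-user pub/sub limits
user_sig = sign(account_seed, user_claims)

# A server that trusts the operator can validate the whole chain,
# with no central database of users to maintain.
print(verify(account_claims, account_sig, operator_seed))  # True
print(verify(user_claims, user_sig, account_seed))         # True
print(verify("user:mallory", user_sig, account_seed))      # False
```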
22. RabbitMQ
■ Messages are published to queues (through exchanges).
■ Multiple consumers can connect to a queue.
■ The message broker distributes messages across all available consumers.
■ Messages are redelivered if a consumer fails.
■ Delivery order is guaranteed only for queues with a single consumer
(not when the queue has multiple consumers).
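The dispatch and redelivery behavior above can be simulated in a few lines (a toy model, not the AMQP client API; note how redelivery breaks ordering once there are multiple consumers):

```python
from collections import deque

def dispatch(messages, consumers):
    """Round-robin messages to consumers; a failed delivery is requeued
    and redelivered. Simplification: a message that always fails would
    loop forever here, whereas a real broker would dead-letter it."""
    q, results, i = deque(messages), [], 0
    while q:
        msg = q.popleft()
        consumer = consumers[i % len(consumers)]
        i += 1
        try:
            results.append(consumer(msg))
        except Exception:
            q.append(msg)  # requeue for redelivery
    return results

calls = {"n": 0}
def flaky(msg):            # crashes on its first delivery only
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("consumer died")
    return f"flaky:{msg}"

steady = lambda msg: f"steady:{msg}"

# m1 fails on the flaky consumer, gets requeued, and completes after m2:
print(dispatch(["m1", "m2"], [flaky, steady]))  # ['steady:m2', 'flaky:m1']
```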
23. Architecture Considerations
■ Performance:
● RabbitMQ handles around 20,000 messages/second
■ Processing:
● Consumers are FIFO-based, reading from the head of the queue and
processing messages one by one
■ HA
● Provides high-availability support
■ Open Source
● RabbitMQ is open source under the Mozilla Public License
26. Kafka
■ We use Apache Kafka to enable communication between producers and
consumers using message-based topics. Apache Kafka is a fast,
scalable, fault-tolerant, publish-subscribe messaging system.
■ It is designed as a platform for large-scale, next-generation
distributed applications, and it supports a large number of permanent
or ad-hoc consumers.
27. Architecture
■ Kafka Producer API
● Permits an application to publish a stream of records to one or more Kafka topics.
■ Kafka Consumer API
● Lets an application subscribe to one or more topics and process the
stream of records produced to them
■ Kafka Streams API
● Allows an application to act as a stream processor
● Consumes an input stream from one or more topics
● Produces an output stream to one or more output topics
● Effectively transforming the input streams into output streams
■ Kafka Connector API
● Allows building and running reusable producers or consumers that connect Kafka
topics to existing applications or data systems
● Example: connector to a relational database might capture every change to a table
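The model behind these APIs is an append-only log read by offset; a minimal in-memory sketch (not the real Kafka client API, and ignoring partitions, brokers, and durability):

```python
class ToyTopic:
    """An append-only log; each consumer tracks its own offset into it."""

    def __init__(self):
        self.log = []

    def produce(self, record):
        self.log.append(record)
        return len(self.log) - 1          # the record's offset

    def consume(self, offset):
        """Return (records, next_offset) from `offset` onward."""
        return self.log[offset:], len(self.log)

topic = ToyTopic()
topic.produce("r0")
topic.produce("r1")

# Two independent consumers at different offsets read the same log,
# which is what lets Kafka support many permanent or ad-hoc consumers.
records_a, next_a = topic.consume(0)
records_b, next_b = topic.consume(1)
print(records_a)  # ['r0', 'r1']
print(records_b)  # ['r1']
```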
32. Architectural Message Review Example
We follow a process to decide which technology and patterns are going
to be applied, based on the specific requirements of the system.
We perform the following steps:
■ System Requirements
■ ASR (Architecturally Significant Requirements)
■ ADR (Architecturally Decisions Record)
■ System Context and Data Flow
■ PoC
■ MVPx
33. Architecturally Significant Requirements
Architecturally Significant Requirements (ASR) have a measurable effect on a system's
architecture, which includes application and infrastructure.
ASR Criteria
Requirements that have wide effects, are strict, or are difficult to achieve are often ASRs. Per the Wikipedia article
on ASRs, some common indicators for a requirement being an ASR are:
■ The requirement is associated with high business value and/or technical risk.
■ The requirement is a concern of a particularly important (influential, that is) stakeholder.
■ The requirement has a first-of-a-kind character, e.g. none of the responsibilities of already existing
components in the architecture addresses it.
■ The requirement has QoS/SLA characteristics that deviate from those already satisfied by the
evolving architecture.
■ The requirement has caused budget overruns or client dissatisfaction in a previous project with a similar
context.
34. Architecturally Significant Requirements
Categories
We have split our ASRs up into categories to make them easier to read and to allow us to
provide more detail for each requirement. These categories are:
■ Availability
■ Maintainability
■ Observability
■ Performance
■ Resiliency
■ Testability
■ Usability
35. Architecturally Decision Record
■ NATS is an open source, powerful, lightweight, secure-by-default
messaging system.
■ It gives the same kind of delivery control as Kafka consumer groups,
but without the maintenance overhead and operations cost.
■ NATS is essentially self-managing: it doesn’t need anyone to create
new partitions to scale up or down.
■ Clusters form themselves and self-heal, and clients are immediately
notified of cluster topology changes.
■ NATS supports traditional request/reply, pub/sub, fanout, and many
more messaging patterns.
36. Why did we need a message broker?
Our ASRs lean heavily toward:
■ Resiliency,
■ Stability, and
■ Performance
When doing traditional point-to-point communications you have to do a number of things
that introduce points of failure, possible performance degradation, and loss of stability:
■ Service discovery (what's the address for a service?)
■ Retries and Failure Responses
■ Coping with slow connections and intermittent failure
■ Exponential back-off to avoid cascading failures
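The last point can be sketched as exponential back-off with "full jitter", a common variant; the parameter values here are illustrative:

```python
import random

def backoff_delays(attempts, base=0.1, cap=10.0, seed=None):
    """Each retry waits a random amount up to base * 2**attempt (capped),
    so failing clients spread out instead of retrying in lockstep and
    cascading the failure downstream."""
    rng = random.Random(seed)
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]

for n, d in enumerate(backoff_delays(5, seed=42)):
    print(f"attempt {n}: wait {d:.3f}s")
```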
37. Why not Kafka?
Once we decided that we wanted to take advantage of a message
broker and utilize all of the asynchronous power that comes with it,
we needed to pick which broker.
■ We require low operations burden.
■ Ability to scale without having delicate reconfiguration
■ Fast request-response performance
38. Why not RabbitMQ?
Rabbit has a reputation for reliability and speed, and some of the team
members had used it before. One of the main reasons we disliked the
use of Rabbit was because of the explicit nature of fanout exchanges.
■ Requires explicit definition of queues and subscriptions
■ Not recommended for multi-tenant systems
■ No way to add instances / subscribers without reconfiguration
39. NATS Security
Neither Rabbit nor Kafka gave us the kind of security support we
needed. We need the ability to explicitly control which clients can
publish to which topics and which clients can subscribe to those
topics.
■ Ability to inject security information without taking the broker
down.
■ Flexibility to work with nkeys
■ Asymmetric encryption key system
42. Thank you Stay in touch
Any questions?
Glen Gomez Zuazo
g_gomez_zuazo@hotmail.com
@ZuazoGlen
Editor's Notes
Event-driven architectures are increasingly part of a complete data transformation solution. Learn how to employ Apache Kafka, Cloud Native Computing Foundation’s NATS, Amazon SQS, or other message queueing technologies. This talks covers the details of each, their advantages and disadvantages and how to select the best for your company’s needs.
Notes: Lightbend Akka is beyond the analysis scope of this presentation for Capital One applications, but I know that at least one other presenter, Alexandros Bantis from Tubi.tv, is going to be speaking about Akka/Scala. Even though it may have been beyond Capital One's consideration, you may wish to mention it in a roundup of popular solutions.
Extra Notes:
Send, store, and receive messages between software components at any volume, without losing messages or requiring other services to be available.
AWS console, Command Line Interface or SDK of your choice, and three simple commands.
Message locking: When a message is received, it becomes “locked” while being processed. This keeps other computers from processing the message simultaneously. If the message processing fails, the lock will expire and the message will be available again.
Queue sharing: Securely share Amazon SQS queues anonymously or with specific AWS accounts. Queue sharing can also be restricted by IP address and time-of-day.
Server-side encryption (SSE): Protect the contents of messages in Amazon SQS queues using keys managed in the AWS Key Management Service (AWS KMS). SSE encrypts messages as soon as Amazon SQS receives them. The messages are stored in encrypted form and Amazon SQS decrypts messages only when they are sent to an authorized consumer.
Dead Letter Queues (DLQ): Handle messages that have not been successfully processed by a consumer with Dead Letter Queues. When the maximum receive count is exceeded for a message it will be moved to the DLQ associated with the original queue. Set up separate consumer processes for DLQs which can help analyze and understand why messages are getting stuck. DLQs must be of the same type as the source queue (standard or FIFO).
In a solution where every service requires NATS to be available in order to function, we clearly need to ensure that NATS meets or exceeds our Top Resiliency Tier level SLAs. To do this, we'll deploy a NATS cluster as a global entity with NATS gateways used to connect east and west. Both NATS and System proper will be deployed active/active.
It is assumed that there is a geographically pinned single point of entry into each cluster in all of these scenarios as per standard AWS practices.
In "classic" active-active scenarios, you have two or more completely isolated mirrors. These two geolocated clusters are completely unaware of each other. Independent component failure is isolated within a region, and in the case of an entire region failure, routes are updated to direct all traffic to the other remaining regions.
The NATS account model also comes with an explicit and secure by default means of allowing communication between accounts. As an account owner, you can export either a stream (write-only from the account, read-only to subscribers) or a service (read/write).
When you export your service or stream, you can choose to do so as a public or a private export. A public export allows any authorized account to import that subject. A private export requires an explicit, out of band delivery of an activation token to the account wishing to import. Without this token, an account cannot import a private export.
What this boils down to is that, with some facilitation by a service to generate keys and tokens, tenants can manage their own topic namespaces, their own users (connected clients), and their own imports/exports with no manual operations overhead. We get security by default, decentralized configuration, self-service secure message exchange, and a "service marketplace" where account (tenant) owners can browse exported subjects and add requests like a shopping cart.
In a multi-tenant system that sits on top of a central messaging system, one of our main concerns was not just the security of clients and the message traffic, but in maintaining configuration. If we had to re-write a configuration file and send an update signal to a server every time we added or removed a tenant, this would become a maintenance nightmare. This would be compounded even more with two multi-tenant systems running in the same cluster (e.g. K8s tenants co-existing with ECS tenants).
In a decentralized model, clients authenticate to NATS with signed user JWTs. There is a hierarchy that goes from Operator to Account to User. In NATS, an account is a unit of isolation and a user is a unit of client authentication and authorization. This decentralized security model actually solves a number of other problems we would have inevitably run into.
RabbitMQ is an open-source message-broker software (sometimes called message-oriented middleware) that originally implemented the Advanced Message Queuing Protocol (AMQP) and has since been extended with a plug-in architecture to support Streaming Text Oriented Messaging Protocol (STOMP), Message Queuing Telemetry Transport (MQTT), and other protocols.
One of the best features of Kafka is, it is highly available and resilient to node failures and supports automatic recovery. This feature makes Apache Kafka ideal for communication and integration between components of large-scale data systems in real-world data systems.
Point to point operations are generally synchronous, though you can accomplish some decent asynchronous operations with gRPC streaming.
Finally, point-to-point means that no interested parties can become aware of communications unless the sender goes out of its way to make multiple P2P connections or emit secondary events. Our thought is if you're going to emit secondary events, why not build the entire substrate out of asynchronous messaging, skipping point to point altogether?
Service discovery, especially explicit discovery requiring a discovery broker like Netflix Eureka, introduces a new single point of failure to the entire system and, even when working perfectly, introduces the latency cost of at least one more network hop (if you're caching, then you have to deal with the consequences of outdated discovery data).
Because of the history and precedent of using Kafka within Capital One, including its role as the backbone behind the Streaming Data Platform (SDP), we considered using Kafka for our broker.
Once we decided that we wanted to take advantage of a message broker and utilize all of the asynchronous power that comes with it, we needed to pick which broker.
There are a number of critical reasons why we chose against Kafka. First and foremost, we wanted a low operations burden and Kafka is anything but that. Further, we need the ability to scale our services and to dynamically add new topics and new subscribers live, at runtime, in production, without having to perform delicate reconfiguration.
Because of the way Kafka works, we would have to reconfigure partitions and topics manually or through some form of potentially brittle automation. You can't simply scale up and down subscribers and publishers without altering Kafka configuration accordingly.
We also needed incredibly fast request-response performance. We wanted the flexibility of an asynchronous substrate without sacrificing synchronous point-to-point performance. We could not get that with Kafka and NATS outperformed Kafka for non-durable messages in every benchmark.
Because of the history and precedent of using Kafka within Capital One, including its role as the backbone behind the Streaming Data Platform (SDP), we considered using Kafka for our broker.
With Rabbit, clients must explicitly define the queues and subscriptions and exchanges in use when they connect. This can be problematic and create problems in multi-tenant systems. We needed a system where we could dynamically scale the number of instances of a queue subscriber AND add more subscribers to the same queue without negatively impacting existing service or requiring a reconfiguration (manual or automatic) of the message broker.
Because the client list is external to the message broker (a 1:1 correlation with tenant services), this security information needs to be injectable into the broker cluster, no matter how many instances of the broker are running, without ever taking the broker down in production.
NATS security not only gives us this, but lets us work with nkeys, an incredibly powerful asymmetric encryption key system that is less vulnerable to attack than traditional SSH keys and can allow security information to easily flow from a Kubernetes secret to tenant services and the broker configuration.