For this upcoming meetup, we welcome Patrick Eaton, PhD, Systems Architect at Stackdriver, and Joey Imbasciano, Cloud Platform Engineer at Stackdriver.
What You'll Learn At This Meetup:
• Why Stackdriver chose Cassandra over other DB offerings
• Stackdriver's data pipeline into Cassandra
• Operating Cassandra on AWS
• Stackdriver's approach to disaster recovery
Patrick and Joey will present their use of Apache Cassandra at Stackdriver, share lessons learned and technical tips, and end the evening with a Q&A.
What are the challenges of running Apache Cassandra on Amazon EC2? Is it a good idea?
In this presentation, we explore reasons for and against running the distributed database Cassandra on EC2. We look at the I/O performance of EC2 and
(BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second (Amazon Web Services)
With the introduction of Amazon Elastic Block Store (EBS) GP2 and recent stability improvements, EBS has gained credibility in the Cassandra world for high-performance workloads. By running Cassandra on Amazon EBS, you can run denser, cheaper Cassandra clusters with just as much availability as ephemeral storage instances. This talk walks through a highly detailed use case and configuration guide for a multi-petabyte, million-writes-per-second cluster that needs to be high-performing and cost-efficient. We explore the instance type choices, configuration, and low-level tuning that allowed us to hit 1.3 million writes per second with a replication factor of 3 on just 60 nodes.
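As a rough sanity check on the headline figure, here is the per-node load. This assumes (the abstract does not say) that the 1.3 million writes per second are client writes, that each one is applied on all 3 replicas, and that load is spread evenly across the 60 nodes:

```latex
\[
\frac{1.3\times10^{6}\ \text{writes/s} \times 3\ \text{replicas}}{60\ \text{nodes}}
= \frac{3.9\times10^{6}\ \text{replica writes/s}}{60\ \text{nodes}}
= 6.5\times10^{4}\ \text{replica writes/s per node}
\]
```

If the 1.3 million figure already counts replica writes, the per-node rate would instead be about 21,700 writes per second.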
Mesosphere and Contentteam: A New Way to Run Cassandra (DataStax Academy)
We, Ben Whitehead and Robert Stupp, will show you how to run Cassandra on Mesos. We will go through all the technical steps of planning, setting up, and operating even large-scale Cassandra clusters on Mesos. We will also illustrate how the Cassandra-on-Mesos framework helps you set up Cassandra on Mesos, schedule regular maintenance tasks, and manage hardware failures in the heart of your data center.
Critical Attributes for a High-Performance, Low-Latency Database (ScyllaDB)
When low latency (P99) and high performance are core requirements, what NoSQL database attributes should you consider, and what tradeoffs are key? While we live in a world of multi-CPU, multi-core servers capable of storing tens of terabytes of data, if your database isn’t architected to take advantage of this, you’re being penalized on performance or cost.
Join this webinar to learn about the critical elements of a high-performance, low-latency NoSQL database. ScyllaDB’s engineers will discuss how they addressed core database performance challenges, including the pros and cons of each approach, and provide a detailed explanation of the architectural principles they applied to achieve their performance objectives.
We’ll take a deep dive into the strategies applied to:
• Achieve precise control over I/O and compute-intensive workloads
• Avoid locks and contention at the CPU level
• Bypass kernel bottlenecks
• Squeeze the most out of modern multi-core hardware
• Satisfy SLAs while maintaining system stability
1 Million Writes per second on 60 nodes with Cassandra and EBS (Jim Plush)
EBS has long been taboo in the Cassandra world for high-performance workloads. That line of thinking has started to change with the introduction of EBS GP2 and the recent stability improvements made by the EBS team, which is why we have multiple petabytes of data relying on EBS every day. Running Cassandra on EBS now lets you run denser, cheaper Cassandra clusters with just as much availability as ephemeral storage instances. This talk will walk through a highly detailed use case and configuration guide for a multi-petabyte, million-writes-per-second cluster that needs to be highly performant and cost-efficient. We will dive into the instance type choices, configuration, and low-level tuning that allowed us to hit 1.3 million writes per second with a replication factor of 3 on just 60 nodes. We will also go into the details of why we chose the latest DateTieredCompactionStrategy and why it is a perfect fit for high-volume time series workloads.
Leveraging Docker and CoreOS to provide always available Cassandra at Instacl... (DataStax)
With a growing customer base and Cassandra clusters running on top of a number of the world’s largest cloud and bare-metal hosting providers, Instaclustr is at the forefront of always-on Cassandra hosting. Instaclustr leverages the power of Docker, a modern containerization solution for Linux, and CoreOS, a lightweight Linux distribution tailored to running software inside containers, to build a stable and adaptable Cassandra hosting platform.
Summary of past Cassandra benchmarks performed by Netflix and a description of how Netflix uses Cassandra, interspersed with a live demo, automated using Jenkins and JMeter, that created two 12-node Cassandra clusters from scratch on AWS, one with regular disks and one with SSDs. Both clusters were scaled up to 24 nodes each during the demo.
Large Scale Data Analytics with Spark and Cassandra on the DSE Platform (DataStax Academy)
In this talk we will show how large-scale data analytics can be done with Spark and Cassandra on the DataStax Enterprise platform. First we will give an overview of the Spark Cassandra Connector and how it enables working with large data sets. Then we will use the Spark Notebook to show live, in-browser examples of interacting with the data. The example will load a large movies database from Cassandra into Spark and then show how that data can be transformed and analyzed using Spark.
How we got to 1 millisecond latency in 99% under repair, compaction, and flus... (ScyllaDB)
Scylla is an open source reimplementation of Cassandra that performs up to 10X better, with drop-in replacement compatibility. At ScyllaDB, performance matters, but stable performance under any circumstances matters even more.
A key factor in our consistent performance is our reliance on userspace schedulers. Scheduling in userspace allows the application (in our case, the database) to have better control over the priority of each task and to provide an SLA to selected operations. Scylla already had an I/O scheduler and recently gained a CPU scheduler.
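To make "control over the priority of each task" concrete, here is a toy proportional-share scheduler in Python. It is not Scylla's or Seastar's scheduler; the class name, the task classes ("query", "compaction"), and the share values are hypothetical. Each class accumulates virtual runtime equal to cost divided by its shares, and the scheduler always runs the ready class with the lowest virtual runtime, so a class with four times the shares gets roughly four times the work.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Tuple


class ShareScheduler:
    """Toy proportional-share scheduler for task classes (hypothetical API).

    Every task a class runs adds cost / shares to that class's virtual
    runtime. The scheduler picks the non-empty class with the lowest
    virtual runtime, so over time each class receives work roughly in
    proportion to its shares.
    """

    def __init__(self, shares: Dict[str, int]) -> None:
        self.shares = dict(shares)
        self.queues: Dict[str, Deque[Tuple[Any, float]]] = {c: deque() for c in self.shares}
        self.vruntime: Dict[str, float] = {c: 0.0 for c in self.shares}

    def submit(self, cls: str, task: Any, cost: float = 1.0) -> None:
        """Queue a task (FIFO within its class) with an estimated cost."""
        self.queues[cls].append((task, cost))

    def run(self, budget: int) -> List[Tuple[str, Any]]:
        """Execute up to `budget` tasks and return them in execution order."""
        executed: List[Tuple[str, Any]] = []
        for _ in range(budget):
            ready = [c for c, q in self.queues.items() if q]
            if not ready:
                break
            # Ties go to the class registered first (dict insertion order).
            cls = min(ready, key=lambda c: self.vruntime[c])
            task, cost = self.queues[cls].popleft()
            self.vruntime[cls] += cost / self.shares[cls]
            executed.append((cls, task))
        return executed
```

For example, with `ShareScheduler({"query": 400, "compaction": 100})` and both queues full, query tasks take about four of every five slots. A production scheduler would also need to stop an idle class from banking credit and then monopolizing the CPU when it wakes up.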
At ScyllaDB, we make architectural decisions that provide not only low latencies but consistently low latencies at higher percentiles. This begins with our choice of language and key architectural decisions, such as not using the Linux page cache, and is fulfilled by autonomous database control: a set of algorithms that guarantees the system will adapt to changes in the workload. Over the last year, we have made changes to Scylla that provide consistent latencies at every percentile. In this talk, Dor Laor will recap those changes and discuss ScyllaDB's plans for the future.
Adam Zegelin is Instaclustr's founding software engineer. This presentation will investigate how using micro-batching for submitting writes to Cassandra can improve throughput and reduce client application CPU load. Micro-batching combines writes for the same partition key into a single network request and ensures they hit the “fast path” for writes on a Cassandra node.
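The grouping step behind micro-batching can be sketched as follows. This is a minimal illustration, not Instaclustr's code: the function name `micro_batch`, the `(partition_key, row)` tuple shape, and the default `max_batch_size` of 50 are assumptions. In a real client, each returned group would typically be sent as one unlogged batch through the driver.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, List, Tuple

# A pending write: (partition key, column values). Shape is hypothetical.
Write = Tuple[Hashable, Dict[str, Any]]


def micro_batch(writes: Iterable[Write], max_batch_size: int = 50) -> List[List[Write]]:
    """Group writes by partition key, then split each group into chunks.

    Every returned batch targets exactly one partition, so it can travel in
    a single network request, which is the single-partition case the talk
    describes as hitting the write fast path.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")
    by_partition: Dict[Hashable, List[Write]] = defaultdict(list)
    for key, row in writes:
        by_partition[key].append((key, row))
    batches: List[List[Write]] = []
    for group in by_partition.values():
        # Cap batch size so one hot partition doesn't produce a huge request.
        for start in range(0, len(group), max_batch_size):
            batches.append(group[start:start + max_batch_size])
    return batches
```

Grouping only by partition key, rather than batching unrelated writes together, matters here. A batch that spans many partitions forces the coordinator to fan out to many replicas, which tends to hurt throughput rather than help it.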
Scylla Summit 2018: Consensus in Eventually Consistent Databases (ScyllaDB)
Eventually consistent databases choose to remain available under failure, allowing conflicting data to be stored in different replicas (later repaired by background processes). Weakening the consistency guarantees improves not only availability but also performance, as the number of replicas involved in a given operation can be minimized. There are, however, use cases that require the opposite trade-off. Indeed, Apache Cassandra and Scylla provide Lightweight Transactions (LWT), which allow single-key linearizable updates. The mechanism underlying LWT is asynchronous consensus. In this talk, we'll describe the characteristics and requirements of Scylla's consensus implementation, and how it enables strongly consistent updates. We will also cover how consensus can be applied to other aspects of the system, such as schema changes, node membership, and range movements, in order to improve their reliability and safety. We will thus show that an eventually consistent database can leverage consensus without compromising either availability or performance.
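Cassandra's lightweight transactions are built on Paxos. To show why a single-key linearizable update needs a majority round trip, here is textbook single-decree Paxos over in-memory acceptors. It is not Scylla's or Cassandra's implementation: the class and function names are hypothetical, and networking, retries, and failure handling are omitted.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                           # highest ballot promised so far
    accepted: Optional[Tuple[int, Any]] = None   # last (ballot, value) accepted

    def prepare(self, ballot: int) -> Tuple[bool, Optional[Tuple[int, Any]]]:
        """Phase 1b: promise to ignore lower ballots and report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot: int, value: Any) -> bool:
        """Phase 2b: accept unless a higher ballot has already been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: Any) -> Optional[Any]:
    """Run one Paxos round; return the chosen value, or None if no quorum."""
    quorum = len(acceptors) // 2 + 1
    replies = [a.prepare(ballot) for a in acceptors]
    granted = [prev for ok, prev in replies if ok]
    if len(granted) < quorum:
        return None
    # Safety rule: adopt the value from the highest-ballot prior acceptance.
    prior = [prev for prev in granted if prev is not None]
    if prior:
        value = max(prior, key=lambda bv: bv[0])[1]
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= quorum else None
```

Once a value has been chosen by a majority, a later proposer with a higher ballot is forced to re-propose that same value, and a proposer with a stale ballot is rejected outright. That is what makes a conditional update on one key linearizable, at the cost of extra round trips compared with a plain eventually consistent write.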
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar (DataStax Academy)
We have seen rapid adoption of C* at eBay in the past two years. We have made tremendous efforts to integrate C* into existing database platforms, including Oracle, MySQL, Postgres, MongoDB, XMP, etc. We have also scaled C* to meet business requirements and encountered technical challenges you only see at eBay scale: 100TB of data on hundreds of nodes. We will share our experience with deployment automation, management, monitoring, and reporting for both Apache Cassandra and DataStax Enterprise.
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne... (DataStax)
Customizing JVM settings for the needs of an application can be tricky, especially when running externally developed software such as Cassandra. In this talk I will share our experiences and the procedure we have used to test and validate Java tuning changes. We'll explore two recent experiences: changing and monitoring G1 garbage collection, and moving buffer objects off the heap.
For the talk, I'll discuss our tuning process at Knewton. I will share some of the challenges that we faced while identifying what we expected to learn. I'll discuss how we isolated and minimized variables across tests, the importance of the duration of these tests, and how we try to separate correlation from causation. I will demonstrate how to use and interpret the results of the custom scripts that we were driven to develop to gain visibility into our G1GC processes; these scripts will be open sourced.
About the Speaker
Carlos Monroy, Senior Software Engineer, Knewton
Carlos Monroy is a senior engineer on the database team at Knewton, an education company that created an adaptive learning platform. Carlos has been developing software professionally since 1998. His experience holding multiple roles in the software lifecycle gives him a holistic approach. Having used over half a dozen relational database engines, he has recently come over to the NoSQL side, first working with HBase and, for the last three years, Cassandra.
Seastar is a modern, open source server application framework written in C++ that presents a future/promise-based API to the user while delivering top-of-the-line performance: more than five times that of the nearest competitor, with 7 million requests per second served on a single machine.
Cassandra Summit 2014: Active-Active Cassandra Behind the Scenes (DataStax Academy)
Presenter: Roopa Tangirala, Senior Cloud Data Architect at Netflix
High availability is an important requirement for any online business, and the key to success is architecting around failures: expecting infrastructure to fail and still remaining highly available. One such effort here at Netflix was the Active-Active implementation, which provided region resiliency. This presentation will give a brief overview of the Active-Active implementation and how it leveraged Cassandra’s architecture in the backend to achieve its goal. It will cover our journey through A-A from Cassandra’s perspective and the data validation we did to prove the backend would work without impacting customer experience. The various problems we faced, like long repair times and gc_grace settings, plus lessons learned and what we would do differently next time around, will also be discussed.
Ben Bromhead, Co-Founder and CTO of Instaclustr, gave this presentation at Cassandra Summit 2016 in San Jose.
This presentation will show how to create truly elastic Cassandra deployments on AWS that let you scale your large Cassandra deployments up and down multiple times a day. By leveraging a combination of EBS-backed disks, JBOD, token pinning, and our previous work on bootstrapping from backups, you will be able to dramatically reduce costs per cluster by scaling to match your daily workloads.
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ... (DataStax)
Traditionally, machines were statically partitioned across the different services at Uber. In an effort to increase machine utilization, Uber has recently started transitioning most of its services, including the storage services, to run on top of Mesos. This presentation will describe the initial experience building and operating a framework for running Cassandra on Mesos across multiple datacenters at Uber. This framework automates several Cassandra operations, such as node repairs, addition of new nodes, and backup/restore. It improves efficiency by co-locating CPU-intensive services as well as multiple Cassandra nodes on the same Mesos agent. It handles failure and restart of Mesos agents by using persistent volumes and dynamic reservations. This talk includes statistics about the number of Cassandra clusters in production; the time taken to start a new cluster, add a new node, and detect a node failure; and the observed Cassandra query throughput and latency.
About the Speaker
Abhishek Verma, Software Engineer, Uber
Dr. Abhishek Verma is currently working on running Cassandra on top of Mesos at Uber. Prior to this, he worked on BorgMaster at Google and was the first author of the Borg paper published at EuroSys 2015. He received an MS in 2010 and a PhD in 2012 in Computer Science from the University of Illinois at Urbana-Champaign, during which time he authored more than 20 publications in conferences, journals, and books and presented tens of talks.
Scylla Summit 2018: Keeping Your Latency SLAs No Matter What! (ScyllaDB)
For a real-time big data database, few things are more important than keeping latencies low and bounded. Scylla has delivered great tail latencies since day one, but the job of making them better never ends, and there is always more to do. In this talk we will explore some of the changes made to Scylla in the past few releases to help keep latencies down.
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ... (Redis Labs)
Dynomite is a thin, distributed Dynamo layer for different storage engines and protocols. Currently at Netflix, we are focusing on using Redis as the storage engine. Dynomite supports multi-datacenter replication and is designed for high availability. In the age of high scalability and big data, Dynomite’s design goal is to turn single-server datastore solutions into peer-to-peer, linearly scalable, clustered systems while still preserving the native client/server protocols of the datastores, e.g., the Redis protocol. In this talk, we are going to present Dynomite’s recent features and the Dyno client. Both projects are open source and available to the community.
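Dynamo-style systems usually spread keys across peers with a token ring. The sketch below is a generic illustration of that idea, not Dynomite's code: the node names, the token values, and the MD5-derived 32-bit hash are all assumptions made for determinism. It shows how a key is mapped to the node that owns it within one ring.

```python
import bisect
import hashlib
from typing import Dict


class TokenRing:
    """Minimal Dynamo-style token ring (hypothetical API).

    Each node owns one token. A key belongs to the first node whose token
    is >= the key's hash, wrapping around to the lowest token at the end.
    """

    def __init__(self, tokens: Dict[str, int]) -> None:
        if not tokens:
            raise ValueError("ring needs at least one node")
        # Sort by token so ownership can be found with a binary search.
        self._entries = sorted((tok, node) for node, tok in tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    @staticmethod
    def hash_key(key: str) -> int:
        # 32-bit hash taken from MD5; chosen only so the sketch is deterministic.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

    def owner(self, key: str) -> str:
        idx = bisect.bisect_left(self._tokens, self.hash_key(key))
        # idx == len(tokens) means the hash is past the last token: wrap around.
        return self._entries[idx % len(self._entries)][1]
```

Adding a node with a new token only moves the keys between its predecessor's token and its own, which is what lets this kind of cluster grow roughly linearly without reshuffling the whole keyspace.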
Google Cloud Platform monitoring with Zabbix (Max Kuzkin)
This presentation describes how to configure Zabbix (https://zabbix.com/) to monitor Google Cloud Platform events through its Monitoring API, using the gcpmetrics (https://github.com/odin-public/gcpmetrics/) command line tool.
GumGum relies heavily on Cassandra for storing different kinds of metadata. Currently GumGum reaches 1 billion unique visitors per month using 3 Cassandra datacenters in Amazon Web Services spread across the globe.
This presentation will detail how we scaled out from one local Cassandra datacenter to a multi-datacenter Cassandra cluster and all the problems we encountered and choices we made while implementing it.
How did we architect multi-region Cassandra in AWS? What were our experiences in implementing multi-datacenter Cassandra? How did we achieve low latency with multi-region Cassandra and the DataStax driver? What are the different Cassandra use cases at GumGum? How did we integrate Cassandra with Spark?
Cloud Connect 2013- Lock Stock and x Smoking EC2's (Harish Ganesan)
These slides were presented at Cloud Connect 2013. "Lock, Stock and X Smoking EC2's" was inspired by Guy Ritchie's movies. It describes how we put Amazon EMR and Spot EC2 instances to use for a customer and achieved cost savings while solving a big data problem.
Apache Cassandra is a popular choice for a wide variety of application persistence needs. There are many design choices that can affect uptime and performance. In this talk we'll look at some of the many things to consider, from a single server to multiple data centers. A basic understanding of Cassandra features, coupled with client driver features, can be a very powerful combination. This talk is an introduction, but it will dive deep into the technical details of how Cassandra works.
Disaster Recovery Planning using Azure Site Recovery (Nitin Agarwal)
Disaster recovery and business continuity solutions have historically been expensive and time-consuming. Microsoft Azure Site Recovery (ASR) makes disaster recovery (DR) planning and implementation simpler and more affordable for all types of organizations.
Join our team of cloud experts for a walk through of DR and ASR basics. We'll highlight best practices for ASR deployments and help you get a sense of the costs for implementing a solution.
Introduction to DataStax Enterprise Graph Database (DataStax Academy)
DataStax Enterprise (DSE) Graph is built to manage, analyze, and search highly connected data. DSE Graph, built on the NoSQL database Apache Cassandra, delivers continuous uptime along with predictable performance, and scales for modern systems dealing with complex and constantly changing data.
Download DataStax Enterprise: Academy.DataStax.com/Download
Start free training for DataStax Enterprise Graph: Academy.DataStax.com/courses/ds332-datastax-enterprise-graph
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...DataStax Academy
We have the challenge of how to reliably store massive quantities of data that are available even in the face of infrastructure failures. We have similar challenges on the application side. The most successful cloud architectures break applications down into microservices. How then do we deploy, upgrade and manage the scale of those microservices? This session will illustrate how to tackle these challenges by taking advantage of both Cassandra and Microsoft's next generation PaaS infrastructure called Azure Service Fabric.
"Conceptually, a data lake is a flat data store to collect data in its original form, without the need to enforce a predefined schema. Instead, new schemas or views are created “on demand”, providing a far more agile and flexible architecture while enabling new types of analytical insights. AWS provides many of the building blocks required to help organizations implement a data lake. In this session, we will introduce key concepts for a data lake and present aspects related to its implementation. We will discuss critical success factors, pitfalls to avoid as well as operational aspects such as security, governance, search, indexing and metadata management. We will also provide insight on how AWS enables a data lake architecture.
A data lake is a flat data store to collect data in its original form, without the need to enforce a predefined schema. Instead, new schemas or views are created ""on demand"", providing a far more agile and flexible architecture while enabling new types of analytical insights. AWS provides many of the building blocks required to help organizations implement a data lake. In this session, we introduce key concepts for a data lake and present aspects related to its implementation. We discuss critical success factors and pitfalls to avoid, as well as operational aspects such as security, governance, search, indexing, and metadata management. We also provide insight on how AWS enables a data lake architecture. Attendees get practical tips and recommendations to get started with their data lake implementations on AWS."
Amazon AWS basics needed to run a Cassandra Cluster in AWSJean-Paul Azar
There is a lot of advice on how to configure a Cassandra cluster on AWS. Not every configuration meets every use case.
Best way to know how to deploy Cassandra on AWS is to know the basics of AWS. Part 1: We start covering AWS (as it applies to Cassandra). Later we go into detail with AWS Cassandra specifics.
NetflixOSS Meetup S3 E1, covering latest components in Distributed Databases, Telemetry systems, Big Data tools and more. Speakers from Netflix, IBM Watson, Pivotal and Nike Digital
Introducing the ultimate MariaDB cloud, SkySQLMariaDB plc
SkySQL is the first and only database-as-a-service (DBaaS) engineered for MariaDB by MariaDB, to use a state-of-the-art multi-cloud architecture built on Kubernetes and ServiceNow, and to deploy databases and data warehouses for transactional, analytical and hybrid transactional/analytical workloads.
In this session, we’ll lay out the vision for SkySQL, provide an overview of its capabilities, take a tour of its architecture, and discuss the long-term roadmap. We’ll wrap things up with a live demo of SkySQL, including a preview of its deep learning–based workload analysis and visualization interface.
How we have used ansible for real-time industry use cases and Integration with enterprise tools. Infra provisioning and config management using ansible and automating routine tasks.
Netflix Container Scheduling and Execution - QCon New York 2016aspyker
Scheduling a Fuller House: Container Management At Netflix
Customers from over all over the world streamed Forty Two Billion hours of Netflix content last year. Various Netflix batch jobs and an increasing number of service applications use containers for their processing. In this talk Netflix will present a deep dive on the motivations and the technology powering container deployment on top of the AWS EC2 service. The talk will cover our approach to cloud resource management and scheduling with the open source Fenzo library, along with details on docker execution engine as a part of project Titus. As well, the talk will share some of the results so far, lessons learned, and end with a brief look at the developer experience for containers.
A Comprehensive Introduction to Apache Cassandra.
Agenda:
- What is NoSQL?
- What is Cassandra?
- Architecture
- Data Model
- Key Features and Benefits
- Cassandra Tools
-- CQL
-- Nodetool
-- DataStax Opscenter
- Who’s using Cassandra?
We hear a lot about lambda architectures and how Cassandra and Spark can help us crunch our data both in batch and real-time. After a year in the trenches, I'll share how we at The Weather Company built a general purpose, weather-scale event processing pipeline to make sense of billions of events each day. If you want to avoid much of the pain learning how to get it right, this talk is for you.
[RightScale Webinar] Architecting Databases in the cloud: How RightScale Doe...RightScale
Your database is the foundation of your application. With cloud comes new advantages and considerations for architecting and deployment. Find out how RightScale uses SQL and NoSQL databases such as MySQL, MongoDB, and Cassandra to provide a scalable, distributed, and highly available service around the globe.
AWS Big Data Demystified #1: Big data architecture lessons learned Omid Vahdaty
AWS Big Data Demystified #1: Big data architecture lessons learned . a quick overview of a big data techonoligies, which were selected and disregard in our company
The video: https://youtu.be/l5KmaZNQxaU
dont forget to subcribe to the youtube channel
The website: https://amazon-aws-big-data-demystified.ninja/
The meetup : https://www.meetup.com/AWS-Big-Data-Demystified/
The facebook group : https://www.facebook.com/Amazon-AWS-Big-Data-Demystified-1832900280345700/
Orchestrating Cassandra with Kubernetes: Challenges and OpportunitiesRaghavendra Prabhu
This is a talk about orchestration of Cassandra with cassandra operator, kubernetes and Yelp PaaSTA (https://github.com/Yelp/paasta).
The talk was presented at Computer Laboratory, University of Cambridge as part of the Engineering, Science and Technology Event (https://www.careers.cam.ac.uk/recruiting/event2Tech.asp) in November 2019.
Robert Bates, SVP Sales Engineering of Crunchy Data explains how you can tackle Data Gravity, Kubernetes, and strategies/best practices to run, scale, and leverage stateful containers in production.
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
Companies today are innovating with real-time data to deliver truly amazing customer experiences in the moment. Real-time data management for real-time customer experience is core to staying ahead of competition and driving revenue growth. Join Trays to learn how Comcast is differentiating itself from it's own historical reputation with Customer Experience strategies.
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
DataStax Enterprise Advanced Replication supports one-way distributed data replication from remote database clusters that might experience periods of network or internet downtime. Benefiting use cases that require a 'hub and spoke' architecture.
Learn more at http://www.datastax.com/2016/07/stay-100-connected-with-dse-advanced-replication
Advanced Replication docs – https://docs.datastax.com/en/latest-dse/datastax_enterprise/advRep/advRepTOC.html
Data Modeling is the one of the first things to sink your teeth into when trying out a new database. That's why we are going to cover this foundational topic in enough detail for you to get dangerous. Data Modeling for relational databases is more than a touch different than the way it's approached with Cassandra. We will address the quintessential query-driven methodology through a couple of different use cases, including working with time series data for IoT. We will also demo a new tool to get you bootstrapped quickly with MovieLens sample data. This talk should give you the basics you need to get serious with Apache Cassandra.
Hear about how Coursera uses Cassandra as the core of its scalable online education platform. I'll discuss the strengths of Cassandra that we leverage, as well as some limitations that you might run into as well in practice.
In the second part of this talk, we'll dive into how best to effectively use the Datastax Java drivers. We'll dig into how the driver is architected, and use this understanding to develop best practices to follow. I'll also share a couple of interesting bug we've run into at Coursera.
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
This talk covers scaling Cassandra to a fast growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the playstation community.
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
This talk covers scaling Cassandra to a fast growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the playstation community.
This is a two part talk in which we'll go over the architecture that enables Apache Cassandra’s linear scalability as well as how DataStax Drivers are able to take full advantage of it to provide developers with nicely designed and speedy clients extendable to the core.
To view the full-length video and tutorial, visit: https://academy.datastax.com/demos/getting-started-graph-databases
Getting Started with Graph Databases contains a brief overview of RDBMS architecture in comparison to graph, basic graph terminology, a real-world use case for graph, and an overview of Gremlin, the standard graph query language found in TinkerPop.
1. Running Cassandra in AWS
Patrick Eaton, PhD
patrick@stackdriver.com
@PatrickREaton
Joey Imbasciano
joey@stackdriver.com
@_joeyi
2. Stackdriver at a Glance
Stackdriver's hosted intelligent monitoring service helps SaaS companies innovate more by reducing the burden of day-to-day operations
● Cloud-native and cloud-aware
● Designed for complex distributed applications
● Founded by cloud/infrastructure industry veterans (Microsoft, VMware, EMC, Endeca, Red Hat) with deep systems and DevOps expertise
● Team of ~25, based in downtown Boston
3. Intelligent Monitoring
Discover customers' cloud-hosted applications
● Infrastructure inventory
● Logical units, like groups/clusters
● Services, hosted and self-managed
● Elastic resources
Monitor
● Various data sources
○ Provider metrics
○ Host metrics
○ Custom metrics
○ Endpoints
○ Events
○ Health
● Rich visualizations
Analyze
● Integrate data sources
● Aggregate metrics
● Report utilization, cost, etc.
● Detect policy violations
● Recommend actions
4. Lambda Architecture
● Typical of modern architectures for online applications
● Formalized by Nathan Marz
● Composed of "batch", "speed", and "serving" layers
● Batch layer
○ Store of record
○ Compute arbitrary views
● Speed layer
○ Low-latency updates
○ Streaming algorithms
● Serving layer
○ Combine data from batch and speed layers to answer queries
[Diagram: Data -> Batch and Speed layers -> Serving layer]
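The following is a minimal Python sketch, not Stackdriver's code, of how a serving layer can answer a query by combining a batch view with a speed view. It assumes both views are plain dictionaries keyed by (metric, minute). It also assumes `batch_horizon`, a hypothetical name, marks the first minute the batch layer has not yet processed, so all newer data comes from the speed layer.

```python
def serve_query(metric, batch_view, speed_view, batch_horizon):
    """Answer a per-minute query for one metric by merging two views.

    batch_view and speed_view map (metric, minute) -> value.
    Minutes before batch_horizon come from the batch view; minutes at or
    after it come from the speed view, which covers data not yet batched.
    """
    result = {}
    for (name, minute), value in batch_view.items():
        if name == metric and minute < batch_horizon:
            result[minute] = value
    for (name, minute), value in speed_view.items():
        if name == metric and minute >= batch_horizon:
            result[minute] = value
    return dict(sorted(result.items()))


batch = {("cpu", 0): 1.0, ("cpu", 1): 2.0, ("mem", 0): 5.0}
speed = {("cpu", 1): 9.9, ("cpu", 2): 3.0}  # minute 1 is already covered by batch
print(serve_query("cpu", batch, speed, batch_horizon=2))  # {0: 1.0, 1: 2.0, 2: 3.0}
```

The batch view wins for every minute it covers. Speed-layer values are therefore only a stopgap until the batch layer catches up, which is the usual lambda trade-off.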
5. Stackdriver Architecture
● Shares characteristics of lambda architecture
● Indexing (speed) path
○ Make "live" data available "pre-analysis"
● Analysis (batch) path
○ Compute aggregations
○ Create recommendations
● Query (serving) layer
○ Combine "live" and analyzed data to answer queries
○ May require on-the-fly analysis
● Alerting (speed) path (not discussed here)
○ Stream processing to detect policy-based anomalies
[Diagram components: Data; Indexing (Speed); Analysis (Batch); Alerting (Speed); Database; Query (Serving); Notification (Serving)]
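The slide only names the alerting path. As a hypothetical illustration of checking a stream against a policy, the sketch below fires when a series stays above a threshold for a number of consecutive points. The policy shape, the `ThresholdPolicy` class, and the series names are all assumptions, not Stackdriver's alerting rules.

```python
from collections import defaultdict


class ThresholdPolicy:
    """Hypothetical policy: fire once when a series has stayed above
    `threshold` for `window` consecutive points."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self._streak = defaultdict(int)  # series name -> consecutive points above threshold

    def observe(self, series, value):
        """Process one streamed point; return True only on the point that trips the policy."""
        if value > self.threshold:
            self._streak[series] += 1
        else:
            self._streak[series] = 0
        return self._streak[series] == self.window


policy = ThresholdPolicy(threshold=90, window=3)
stream = [95, 96, 80, 91, 92, 93, 94]
print([policy.observe("web1:cpu", v) for v in stream])
# [False, False, False, False, False, True, False]
```

Per-series state is one counter, so each point costs constant work. That cost profile is what makes this kind of check suitable for a speed path.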
6. Database Options
● We chose Cassandra!
○ True P2P architecture
○ Good support for write-heavy workloads
○ Compatible data model for time series data
■ Column per metric type, timestamps as columns
● Why not MySQL?
○ Experience with operating large, sharded deployments
○ Relational data model not a good match
● Why not HBase?
○ Operational complexity - zk, hadoop, hdfs, ...
○ Special "Master" role
● Why not DynamoDB?
○ Avoid vendor lock-in and high cost
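In a "timestamps as columns" model, one row holds a series and each timestamp is a column. Cassandra keeps a row's columns sorted by column name, so a time-range query becomes a contiguous slice of one row. The toy in-memory model below illustrates that layout in Python. It assumes one row per series. It is not a Cassandra client, and the `host:metric` row-key format is hypothetical.

```python
import bisect


class WideRowStore:
    """Toy model of a wide row per series: columns are timestamps kept in sorted order."""

    def __init__(self):
        self._rows = {}  # row key -> (sorted list of timestamps, parallel list of values)

    def insert(self, row_key, ts, value):
        timestamps, values = self._rows.setdefault(row_key, ([], []))
        i = bisect.bisect_left(timestamps, ts)
        if i < len(timestamps) and timestamps[i] == ts:
            values[i] = value  # same column name: the later write replaces the value
        else:
            timestamps.insert(i, ts)
            values.insert(i, value)

    def slice(self, row_key, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end, oldest first."""
        timestamps, values = self._rows.get(row_key, ([], []))
        lo = bisect.bisect_left(timestamps, start)
        hi = bisect.bisect_left(timestamps, end)
        return list(zip(timestamps[lo:hi], values[lo:hi]))


store = WideRowStore()
for ts, v in [(1380000120, 0.7), (1380000000, 0.5), (1380000060, 0.6)]:
    store.insert("web1:cpu", ts, v)  # arrival order does not matter
print(store.slice("web1:cpu", 1380000000, 1380000120))
# [(1380000000, 0.5), (1380000060, 0.6)]
```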
7. Stackdriver Architecture ++
● Archival pipeline stores all data
○ Very small surface area, battle-tested
○ Critical for disaster recovery
○ S3 considered durable enough
○ Replicated for availability
● Archive means Cassandra is "soft state"
● C* consolidates analysis and indexing results
● Properties of data in C*
○ Immutable data
○ Append-only
○ Read-1, write-1 consistency
● Scales out easily
○ Indexers, archivers, analyzers, query servers
[Diagram components: Data, Index, Archive, S3, Analyze, Analysis, Cassandra, Data Series, Roll-ups, Recs, Inventory, Query]
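The "soft state" point depends on write order. Every point reaches the archive before it is indexed, and points are immutable, so the database can always be rebuilt by replaying the archive. Below is a minimal Python sketch of that idea. The list standing in for S3 and the dict standing in for Cassandra are assumptions for illustration.

```python
class ArchiveFirstPipeline:
    """Toy model: archive every point first, then index it into a rebuildable store."""

    def __init__(self):
        self.archive = []  # stands in for the S3 archive: an append-only log
        self.db = {}       # stands in for Cassandra: derived, "soft" state

    def ingest(self, series, ts, value):
        self.archive.append((series, ts, value))  # the durable copy is written first
        self._index(series, ts, value)

    def _index(self, series, ts, value):
        # Points are immutable, so indexing the same point again yields the
        # same state: replaying the archive is idempotent.
        self.db[(series, ts)] = value

    def rebuild(self):
        """Throw away the database and repopulate it from the archive."""
        self.db = {}
        for series, ts, value in self.archive:
            self._index(series, ts, value)


pipeline = ArchiveFirstPipeline()
pipeline.ingest("web1:cpu", 0, 0.5)
pipeline.ingest("web1:cpu", 60, 0.6)
before = dict(pipeline.db)
pipeline.db.clear()  # simulate losing the whole cluster
pipeline.rebuild()
print(pipeline.db == before)  # True
```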
8. Cassandra at Stackdriver: Cluster Configuration
● Version: DataStax Community Edition 1.2.10
● Replication Factor: 3
● Vnodes
● Murmur3Partitioner
● Ec2Snitch
○ Aids in request efficiency
○ Enables Cassandra to ensure replicas are in different Availability Zones
● phi_convict_threshold: 8 -> 12
○ Used to determine when nodes are down
○ AWS network can be spotty
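`phi_convict_threshold` sets how suspicious the phi accrual failure detector must be before it marks a node down. Assume, as Cassandra's detector approximates, that heartbeat inter-arrival times are exponentially distributed. Under that model phi grows linearly with the time since the last heartbeat. The sketch below computes this approximation and the silence needed to reach a given threshold. The one-second mean interval is an assumption based on gossip's default one-second round.

```python
import math


def phi(seconds_since_last_heartbeat, mean_interval):
    """Suspicion level if heartbeat gaps are modeled as exponentially distributed.

    P(no heartbeat yet after t seconds) = exp(-t / mean), and
    phi = -log10(that probability) = t / (mean * ln 10).
    """
    return seconds_since_last_heartbeat / (mean_interval * math.log(10))


def seconds_to_convict(threshold, mean_interval):
    """Silence after which phi reaches `threshold` and the node is marked down."""
    return threshold * mean_interval * math.log(10)


for threshold in (8, 12):
    print(threshold, round(seconds_to_convict(threshold, mean_interval=1.0), 1))
# 8 18.4
# 12 27.6
```

With a one-second mean, raising the threshold from 8 to 12 stretches the silence needed to convict a node from about 18 to about 28 seconds. A spotty network calls for exactly that kind of tolerance.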
9. Cassandra Topology in AWS
Where we started...
Where we are...
[Diagram: Cassandra nodes across us-east-1a, us-east-1b, and us-east-1c, in the starting and current topologies]
Keep it balanced!
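Ec2Snitch reports each node's Availability Zone as its rack. A rack-aware strategy such as NetworkTopologyStrategy then prefers to place replicas in distinct racks. The simplified Python sketch below is not Cassandra's exact algorithm. It walks the token ring clockwise from a key's token and picks the first node from each new AZ, reusing an AZ only when there are fewer AZs than replicas. It also suggests why balance matters. With RF 3 over three AZs, each AZ holds one replica of every range, so an AZ with fewer nodes packs a full copy of the data onto fewer machines. Node names and tokens are hypothetical.

```python
import bisect


def place_replicas(ring, key_token, rf):
    """Simplified rack-aware placement: prefer one replica per AZ.

    ring: list of (token, node, az) tuples sorted by token; with vnodes a
    node appears several times. Returns up to rf distinct node names.
    """
    tokens = [token for token, _, _ in ring]
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    replicas, seen_azs, skipped = [], set(), []
    # Walk clockwise from the first token >= key_token, wrapping around.
    for i in range(len(ring)):
        _, node, az = ring[(start + i) % len(ring)]
        if node in replicas:
            continue
        if az in seen_azs:
            skipped.append(node)  # same AZ as an earlier replica: keep as fallback
            continue
        replicas.append(node)
        seen_azs.add(az)
        if len(replicas) == rf:
            return replicas
    # Fewer AZs than rf: fill up with skipped nodes in ring order.
    for node in skipped:
        if len(replicas) == rf:
            break
        if node not in replicas:
            replicas.append(node)
    return replicas


ring = [(0, "n1", "us-east-1a"), (10, "n2", "us-east-1a"), (20, "n3", "us-east-1b"),
        (30, "n4", "us-east-1c"), (40, "n5", "us-east-1b"), (50, "n6", "us-east-1c")]
print(place_replicas(ring, key_token=5, rf=3))  # ['n2', 'n3', 'n4']
```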
10. Cassandra EC2 Node Configuration
● m1.xlarge
○ 4 cores
○ 15 GB RAM
○ 4 ephemeral disks available
● 4 disks RAID-0 for Data Volume and CommitLog
○ ext4 - defaults,noatime
○ mdadm RAID-0
○ Compactions
○ Heavy read/write I/O
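The slide lists the parts of the data volume but not the commands that build it. The hedged Python sketch below only builds, and does not run, commands that an automation script might issue for a four-disk RAID-0 array formatted as ext4 and mounted with `noatime`. The device names, `/dev/md0`, and the mount point are hypothetical and depend on the instance's block-device mapping.

```python
def raid0_setup_commands(devices, md_device="/dev/md0", mount_point="/var/lib/cassandra"):
    """Build, but do not run, shell commands for a RAID-0 ext4 data volume.

    Device names, md_device, and mount_point are hypothetical defaults; this
    is an illustration, not a complete provisioning script.
    """
    return [
        # Stripe all ephemeral disks into one array for data and commit log.
        f"mdadm --create {md_device} --level=0 --raid-devices={len(devices)} {' '.join(devices)}",
        f"mkfs.ext4 {md_device}",
        f"mkdir -p {mount_point}",
        # noatime skips the access-time update that every read would otherwise cause.
        f"echo '{md_device} {mount_point} ext4 defaults,noatime 0 0' >> /etc/fstab",
        f"mount {mount_point}",
    ]


for cmd in raid0_setup_commands(["/dev/xvdb", "/dev/xvdc", "/dev/xvdd", "/dev/xvde"]):
    print(cmd)
```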
11. Cassandra Automation and Operations
● Combination of Boto, Fabric, & Puppet
○ Boto for AWS API
○ Fabric + Puppet for Bootstrapping
○ Fabric for Operations
● One command to:
○ Launch a new cluster
○ Upsize a cluster
○ Replace a dead node
○ Remove existing nodes
○ List nodes in a cluster
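A one-command "upsize" is a natural place to enforce the keep-AZs-balanced rule. Below is a hypothetical helper, not part of Boto or Fabric, that plans which AZ each new node should go to by always filling the least-populated zone. It leaves out the actual instance launch through the AWS API.

```python
from collections import Counter


def plan_upsize(current_azs, zones, count):
    """Pick an AZ for each of `count` new nodes, always filling the emptiest zone.

    current_azs: AZ of every existing node; zones: AZs the cluster may use.
    Ties go to the zone listed first in `zones`.
    """
    counts = Counter({zone: 0 for zone in zones})
    counts.update(az for az in current_azs if az in counts)
    plan = []
    for _ in range(count):
        target = min(zones, key=lambda zone: counts[zone])
        counts[target] += 1
        plan.append(target)
    return plan


zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
print(plan_upsize(["us-east-1a", "us-east-1a", "us-east-1b"], zones, 3))
# ['us-east-1c', 'us-east-1b', 'us-east-1c']
```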
13. Cassandra Backups using S3
● No Cassandra-powered backups
● Restore from S3
● Useful for major version upgrades
[Diagram: Data -> S3 -> Bulk Loader -> MapReduce -> Cassandra]
1. Data is archived when it is received
2. Bulk loader reads from S3
3. M/R re-analyzes data
4. Cassandra is repopulated
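Step 3 re-derives aggregates from raw archived data. Below is a minimal Python sketch of that map/shuffle/reduce shape. It assumes archived points are `(metric, timestamp, value)` tuples and that roll-ups are per-minute min/max/mean. The record format and the choice of statistics are assumptions, not Stackdriver's roll-up definitions.

```python
from collections import defaultdict


def map_point(point, bucket_seconds=60):
    """Map: emit ((metric, bucket_start), value) for one archived point."""
    metric, ts, value = point
    yield (metric, ts - ts % bucket_seconds), value


def reduce_bucket(values):
    """Reduce: roll one bucket's raw values up into summary statistics."""
    return {"min": min(values), "max": max(values), "mean": sum(values) / len(values)}


def recompute_rollups(archived_points, bucket_seconds=60):
    """Re-derive roll-ups from raw archived points, as a restore would."""
    groups = defaultdict(list)  # shuffle: collect every value emitted for a key
    for point in archived_points:
        for key, value in map_point(point, bucket_seconds):
            groups[key].append(value)
    return {key: reduce_bucket(values) for key, values in sorted(groups.items())}


points = [("web1:cpu", 0, 1.0), ("web1:cpu", 30, 3.0), ("web1:cpu", 60, 5.0)]
print(recompute_rollups(points))
```

The roll-ups are computed from the raw archive rather than copied from the old cluster. A restore therefore also works across major version upgrades, where on-disk formats may differ.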
14. Disaster Recovery in the Wild
● On October 23, Stackdriver suffered a total loss of its C* cluster
○ Exhausted memory due to number of open file descriptors (see graph)
● We did not notice the problem until it was too late
○ Nodes began crashing, resulting in an inconsistent view of the ring
● Attempted to restart the cluster unsuccessfully for ~2 hours
● Provisioned new 36-node cluster in ~2 hours
● Directed "live" data to new cluster
● Started bulk restore operation from archive
○ Full-fidelity data and aggregations
● No data loss due to archival pipeline
● See http://www.stackdriver.com/post-mortem-october-23-stackdriver-outage/
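The problem went unnoticed until nodes were already crashing. One hypothetical early-warning check is to compare a process's open file descriptor count with its limit. The sketch below counts descriptors through Linux's `/proc/<pid>/fd` and classifies the usage. The 80% warning ratio is an arbitrary assumption.

```python
import os


def count_open_fds(pid):
    """Count a process's open file descriptors via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_alert(open_fds, limit, warn_ratio=0.8):
    """Classify descriptor usage against a limit as 'ok', 'warn', or 'critical'."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    usage = open_fds / limit
    if usage >= 1.0:
        return "critical"
    if usage >= warn_ratio:
        return "warn"
    return "ok"
```

A monitoring agent could call `fd_alert(count_open_fds(cassandra_pid), limit)` on a schedule. Here `cassandra_pid` and `limit` are hypothetical inputs, taken for example from the process manager and from `/proc/<pid>/limits`.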