Diagnosing Problems in Production involves first preparing monitoring tools like OpsCenter, Server monitoring, Application metrics, and Log aggregation. Common issues include incorrect server times causing data inconsistencies, tombstone overhead slowing queries, not using the right snitch, version mismatches breaking functionality, and disk space not being reclaimed properly. Diagnostic tools like htop, iostat, vmstat, dstat, strace, jstack, tcpdump and nodetool can help narrow down issues like performance bottlenecks, garbage collection problems, and compaction issues.
Webinar: Diagnosing Apache Cassandra Problems in ProductionDataStax Academy
This session covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course to basic JVM garbage collection tuning. Viewers will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster.
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
This talk covers scaling Cassandra to a fast growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the playstation community.
Diagnosing Problems in Production - CassandraJon Haddad
This presentation covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course to basic JVM garbage collection tuning. Readers will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster. This presentation is intended for people with a general understanding of Cassandra, but it not required to have experience running it in production.
Cassandra Summit 2014: Diagnosing Problems in ProductionDataStax Academy
This sessions covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course to basic JVM garbage collection tuning. Attendees will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster.
Webinar: Diagnosing Apache Cassandra Problems in ProductionDataStax Academy
This session covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course to basic JVM garbage collection tuning. Viewers will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster.
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
This talk covers scaling Cassandra to a fast growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the playstation community.
Diagnosing Problems in Production - CassandraJon Haddad
This presentation covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course to basic JVM garbage collection tuning. Readers will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster. This presentation is intended for people with a general understanding of Cassandra, but it not required to have experience running it in production.
Cassandra Summit 2014: Diagnosing Problems in ProductionDataStax Academy
This sessions covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course to basic JVM garbage collection tuning. Attendees will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster.
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
This talk covers scaling Cassandra to a fast growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the playstation community.
Using Time Window Compaction Strategy For Time Series WorkloadsJeff Jirsa
Cassandra is a great fit for high write use cases, which makes it a popular choice for storing time series and sensor-collection workloads. At Crowdstrike, we've been using Cassandra for just that purpose, collecting petabytes of expiring time series data. In this talk, I'll discuss compaction in time series workloads, and the TimeWindowCompactionStrategy we developed specifically for this purpose. I'll detail TWCS specific configuration properties, some lesser known compaction sub-properties that apply to all compaction strategies, and also cover other general tricks and tuning that are useful for very large time-series workloads.
Azure + DataStax Enterprise Powers Office 365 Per User StoreDataStax Academy
We will present our O365 use case scenarios, why we chose Cassandra + Spark, and walk through the architecture we chose for running DataStax Enterprise on azure.
Webinar: Getting Started with Apache CassandraDataStax
Would you like to learn how to use Cassandra but don’t know where to begin? Want to get your feet wet but you’re lost in the desert? Longing for a cluster when you don’t even know how to set up a node? Then look no further! Rebecca Mills, Junior Evangelist at Datastax, will guide you in the webinar “Getting Started with Apache Cassandra...”
You'll get an overview of Planet Cassandra’s resources to get you started quickly and easily. Rebecca will take you down the path that's right for you, whether you are a developer or administrator. Join if you are interested in getting Cassandra up and working in the way that suits you best.
This sessions covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course to basic JVM garbage collection tuning. Attendees will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster. This talk is intended for people with a general understanding of Cassandra, but it not required to have experience running it in production.
PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) |...DataStax
After reading the topic you are probably asking yourself: “Why I’ve never heard about Cassandra Streams?”. The reason is because Cassandra didn’t have any streams support. Until now.
The project started as a preparation to multi regional deployment, so we needed to test Cassandra and answer several simple questions:
· what will be the replication lag, or how long will it take for each mutation to propagate?
· will we be losing any mutation?
· will we see any additional load and/or other problems?
· and zillions of other questions
To answer them, we decided to integrate Cassandra with Amazon Kinesis, so we can track any individual mutation and analyze replication stats. That is how our Cassandra Streams integration was born. Currently it supports several stream platforms, like Kinesis and Kafka.
During this talk you will learn how we did it, how we used it to test Cassandra in Multi-regional setup and what are other possible applications of this concept.
About the Speaker
Dustin Pham, Sony
Dustin is part of the small team who built the core infrastructure which delivered the PlayStation 3 store and then subsequently core services for the PlayStation 4. He has been with Sony for over 4 years and continues to focus on providing entertainment experiences to Sony customers. Dustin is an avid gamer and finds enjoyment in solving large scale problems.
Cassandra Summit 2015: Real World DTCS For OperatorsJeff Jirsa
Real World DTCS For Operators
The introduction of DateTieredCompactionStrategy in late 2014 was a significant step forward in providing a viable compaction strategy for time series data, especially time series data that will be TTL'd out. DateTieredCompactionStrategy's introduction was met with genuine excitement, and its rapid adoption is testament to developers' and operators' desire to have data compacted in a way that better matches their write patterns.
However, DateTieredCompactionStrategy's features come with significant limitations. This talk will review our real world benchmarking and use cases for DTCS as a vehicle to discuss the implications of DateTieredCompactionStrategy on operational tasks such as repair, read-repair, bootstrapping, and especially DR recovery scenarios, and it will also discuss how those various limitations lead us to proposing an operations-friendly alternative to DateTieredCompactionStrategy.
Beginning Operations: 7 Deadly Sins for Apache Cassandra OpsDataStax Academy
The internal battle has been fought, and Cassandra is your group's NoSQL platform of choice! Hooray! But now what? This talk will introduce you to all the basic operations concepts you need to know to start your foray into the wonderful world of Cassandra off right. Or even if you have already started but are looking for a solid holistic overview... this is the talk for you!
Tsinghua University: Two Exemplary Applications in ChinaDataStax Academy
In this talk, we will share the experiences of applying Cassandra with two real customers in China. In the first use case, we deployed Cassandra at Sany Group, a leading company of Machinery manufacturing, to manage the sensor data generated by construction machinery. By designing a specific schema and optimizing the write process, we successfully managed over 1.5 billion historical data records and achieved the online write throughput of 10k write operations per second with 5 servers. MapReduce is also used on Cassandra for valued-added services, e.g. operations management, machine failure prediction, and abnormal behavior mining. In the second use case, Cassandra is deployed in the China Meteorological Administration to manage the Meteorological data. We design a hybrid schema to support both slice query and time window based query efficiently. Also, we explored the optimized compaction and deletion strategy for meteorological data in this case.
PlayStation and Searchable Cassandra Without Solr (Dustin Pham & Alexander Fi...DataStax
When we were preparing for PlayStation4 launch we faced 1 hard problem: how to make our games and videos storage system super-fast, highly available and fault tolerant. Moving from relation database to Cassandra is not easy, and it even harder if you want to support different search and query use cases. Join our talk to learn how we managed to build highly available platform, that supports tens of millions of active users and can execute multiple user specific queries in less than a millisecond. And all of it without using Solr or ElasticSearch.
About the Speaker
Alexander Filipchik, Sony
Alex spent last 4 years of his life building the next generation of PlayStation Network. He is honored to be a part of a small team of engineers who managed to build from scratch a platform that scaled from 0 users to 1 million PS4s in just 1 day and have being landing 1.5 million of new devices per month since and now reached tenth of millions of active users. He is passionate about technology, innovations, walking his dog and building scalable software using Cassandra.
QCon NYC: Distributed systems in practice, in theoryAysylu Greenberg
Modern systems in production rely on decades of computer science research. Over time, new architectural patterns emerge that enable more resilient and robust systems. In this talk, we'll discuss some of these patterns from systems I've worked on at Google and the related work that provide insights into the motivations behind them.
An introduction to core concepts in Apache Cassandra. We cover the evolution of database architecture as you try to scale a relational database to solve big data problems, and explain how Cassandra handles these problems efficiently.
Webinar: Diagnosing Apache Cassandra Problems in ProductionDataStax Academy
This session covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course to basic JVM garbage collection tuning. Viewers will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster.
Diagnosing Problems in Production: Cassandra Summit 2014Jon Haddad
At the 2014 Cassandra summit we covered how to ensure that your production experience with Cassandra is top notch by identifying the proper tools that should be put in place beforehand, and what tools you need to identify problems in real time.
Presented by Jon Haddad & Blake Eggleston
Joel Jacobson (Datastax) - Diagnosing Cassandra Problems in ProductionOutlyer
This sessions covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course to basic JVM garbage collection tuning. Attendees will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster.
Video: https://www.youtube.com/watch?v=9XrHoAxd0Is
Join DevOps Exchange London here: http://www.meetup.com/DevOps-Exchange-London
Follow DOXLON on twitter http://www.twitter.com/doxlon
Cassandra Summit 2014: Diagnosing Problems in ProductionDataStax Academy
Presenters: Jon Haddad
This sessions covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course to basic JVM garbage collection tuning. Attendees will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster.
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)srisatish ambati
Cache & Concurrency considerations for a high performance Cassandra deployment.
SriSatish Ambati
Cassandra has hit it's stride as a distributed java NoSQL database! It's fast, it's in-memory, it's scalable, it's seda; It's eventually consistent model makes it practical for the large & growing volumes of unstructured data usecases. It is also time to run it through the filters of performance analysis. For starters it runs on the java virtual machine and inherits the capabilities and culpabilities of the platform. This presentation reviews the runtime architecture, cache behavior & performance of a real-world workload on Cassandra. We blend existing system & jvm tools to get a quick overview & a breakdown of hotspots in the get, put & update operations. We highlight the role played by garbage collection & fragmentation due to long lived objects; We investigate lock contention in the data structures under concurrent usage. Cassandra uses UDP for management & TCP for data: we look at robustness of the communication patterns during high spikes and cluster-wide events. We review Non-Blocking Hashmap modifications to Cassandra that improve concurrency & amplify performance of this frontrunner in the NoSQL space
ApacheCon2010 NA
Wed, 03 November 2010 15:00
cassandra
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
This talk covers scaling Cassandra to a fast growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the playstation community.
Using Time Window Compaction Strategy For Time Series WorkloadsJeff Jirsa
Cassandra is a great fit for high write use cases, which makes it a popular choice for storing time series and sensor-collection workloads. At Crowdstrike, we've been using Cassandra for just that purpose, collecting petabytes of expiring time series data. In this talk, I'll discuss compaction in time series workloads, and the TimeWindowCompactionStrategy we developed specifically for this purpose. I'll detail TWCS specific configuration properties, some lesser known compaction sub-properties that apply to all compaction strategies, and also cover other general tricks and tuning that are useful for very large time-series workloads.
Azure + DataStax Enterprise Powers Office 365 Per User StoreDataStax Academy
We will present our O365 use case scenarios, why we chose Cassandra + Spark, and walk through the architecture we chose for running DataStax Enterprise on azure.
Webinar: Getting Started with Apache CassandraDataStax
Would you like to learn how to use Cassandra but don’t know where to begin? Want to get your feet wet but you’re lost in the desert? Longing for a cluster when you don’t even know how to set up a node? Then look no further! Rebecca Mills, Junior Evangelist at Datastax, will guide you in the webinar “Getting Started with Apache Cassandra...”
You'll get an overview of Planet Cassandra’s resources to get you started quickly and easily. Rebecca will take you down the path that's right for you, whether you are a developer or administrator. Join if you are interested in getting Cassandra up and working in the way that suits you best.
This sessions covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course to basic JVM garbage collection tuning. Attendees will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster. This talk is intended for people with a general understanding of Cassandra, but it not required to have experience running it in production.
PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) |...DataStax
After reading the topic you are probably asking yourself: “Why I’ve never heard about Cassandra Streams?”. The reason is because Cassandra didn’t have any streams support. Until now.
The project started as a preparation to multi regional deployment, so we needed to test Cassandra and answer several simple questions:
· what will be the replication lag, or how long will it take for each mutation to propagate?
· will we be losing any mutation?
· will we see any additional load and/or other problems?
· and zillions of other questions
To answer them, we decided to integrate Cassandra with Amazon Kinesis, so we can track any individual mutation and analyze replication stats. That is how our Cassandra Streams integration was born. Currently it supports several stream platforms, like Kinesis and Kafka.
During this talk you will learn how we did it, how we used it to test Cassandra in Multi-regional setup and what are other possible applications of this concept.
About the Speaker
Dustin Pham, Sony
Dustin is part of the small team who built the core infrastructure which delivered the PlayStation 3 store and then subsequently core services for the PlayStation 4. He has been with Sony for over 4 years and continues to focus on providing entertainment experiences to Sony customers. Dustin is an avid gamer and finds enjoyment in solving large scale problems.
Cassandra Summit 2015: Real World DTCS For OperatorsJeff Jirsa
Real World DTCS For Operators
The introduction of DateTieredCompactionStrategy in late 2014 was a significant step forward in providing a viable compaction strategy for time series data, especially time series data that will be TTL'd out. DateTieredCompactionStrategy's introduction was met with genuine excitement, and its rapid adoption is testament to developers' and operators' desire to have data compacted in a way that better matches their write patterns.
However, DateTieredCompactionStrategy's features come with significant limitations. This talk will review our real world benchmarking and use cases for DTCS as a vehicle to discuss the implications of DateTieredCompactionStrategy on operational tasks such as repair, read-repair, bootstrapping, and especially DR recovery scenarios, and it will also discuss how those various limitations lead us to proposing an operations-friendly alternative to DateTieredCompactionStrategy.
Beginning Operations: 7 Deadly Sins for Apache Cassandra OpsDataStax Academy
The internal battle has been fought, and Cassandra is your group's NoSQL platform of choice! Hooray! But now what? This talk will introduce you to all the basic operations concepts you need to know to start your foray into the wonderful world of Cassandra off right. Or even if you have already started but are looking for a solid holistic overview... this is the talk for you!
Tsinghua University: Two Exemplary Applications in ChinaDataStax Academy
In this talk, we will share the experiences of applying Cassandra with two real customers in China. In the first use case, we deployed Cassandra at Sany Group, a leading company of Machinery manufacturing, to manage the sensor data generated by construction machinery. By designing a specific schema and optimizing the write process, we successfully managed over 1.5 billion historical data records and achieved the online write throughput of 10k write operations per second with 5 servers. MapReduce is also used on Cassandra for valued-added services, e.g. operations management, machine failure prediction, and abnormal behavior mining. In the second use case, Cassandra is deployed in the China Meteorological Administration to manage the Meteorological data. We design a hybrid schema to support both slice query and time window based query efficiently. Also, we explored the optimized compaction and deletion strategy for meteorological data in this case.
PlayStation and Searchable Cassandra Without Solr (Dustin Pham & Alexander Fi...DataStax
When we were preparing for PlayStation4 launch we faced 1 hard problem: how to make our games and videos storage system super-fast, highly available and fault tolerant. Moving from relation database to Cassandra is not easy, and it even harder if you want to support different search and query use cases. Join our talk to learn how we managed to build highly available platform, that supports tens of millions of active users and can execute multiple user specific queries in less than a millisecond. And all of it without using Solr or ElasticSearch.
About the Speaker
Alexander Filipchik, Sony
Alex spent last 4 years of his life building the next generation of PlayStation Network. He is honored to be a part of a small team of engineers who managed to build from scratch a platform that scaled from 0 users to 1 million PS4s in just 1 day and have being landing 1.5 million of new devices per month since and now reached tenth of millions of active users. He is passionate about technology, innovations, walking his dog and building scalable software using Cassandra.
QCon NYC: Distributed systems in practice, in theoryAysylu Greenberg
Modern systems in production rely on decades of computer science research. Over time, new architectural patterns emerge that enable more resilient and robust systems. In this talk, we'll discuss some of these patterns from systems I've worked on at Google and the related work that provide insights into the motivations behind them.
An introduction to core concepts in Apache Cassandra. We cover the evolution of database architecture as you try to scale a relational database to solve big data problems, and explain how Cassandra handles these problems efficiently.
Webinar: Diagnosing Apache Cassandra Problems in ProductionDataStax Academy
This session covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course to basic JVM garbage collection tuning. Viewers will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster.
Diagnosing Problems in Production: Cassandra Summit 2014Jon Haddad
At the 2014 Cassandra summit we covered how to ensure that your production experience with Cassandra is top notch by identifying the proper tools that should be put in place beforehand, and what tools you need to identify problems in real time.
Presented by Jon Haddad & Blake Eggleston
Joel Jacobson (Datastax) - Diagnosing Cassandra Problems in ProductionOutlyer
This sessions covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course to basic JVM garbage collection tuning. Attendees will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster.
Video: https://www.youtube.com/watch?v=9XrHoAxd0Is
Join DevOps Exchange London here: http://www.meetup.com/DevOps-Exchange-London
Follow DOXLON on twitter http://www.twitter.com/doxlon
Cassandra Summit 2014: Diagnosing Problems in ProductionDataStax Academy
Presenters: Jon Haddad
This sessions covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course to basic JVM garbage collection tuning. Attendees will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster.
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)srisatish ambati
Cache & Concurrency considerations for a high performance Cassandra deployment.
SriSatish Ambati
Cassandra has hit it's stride as a distributed java NoSQL database! It's fast, it's in-memory, it's scalable, it's seda; It's eventually consistent model makes it practical for the large & growing volumes of unstructured data usecases. It is also time to run it through the filters of performance analysis. For starters it runs on the java virtual machine and inherits the capabilities and culpabilities of the platform. This presentation reviews the runtime architecture, cache behavior & performance of a real-world workload on Cassandra. We blend existing system & jvm tools to get a quick overview & a breakdown of hotspots in the get, put & update operations. We highlight the role played by garbage collection & fragmentation due to long lived objects; We investigate lock contention in the data structures under concurrent usage. Cassandra uses UDP for management & TCP for data: we look at robustness of the communication patterns during high spikes and cluster-wide events. We review Non-Blocking Hashmap modifications to Cassandra that improve concurrency & amplify performance of this frontrunner in the NoSQL space
ApacheCon2010 NA
Wed, 03 November 2010 15:00
cassandra
Using Riak for Events storage and analysis at Booking.comDamien Krotkine
At Booking.com, we have a constant flow of events coming from various applications and internal subsystems. This critical data needs to be stored for real-time, medium and long term analysis. Events are schema-less, making it difficult to use standard analysis tools.This presentation will explain how we built a storage and analysis solution based on Riak. The talk will cover: data aggregation and serialization, Riak configuration, solutions for lowering the network usage, and finally, how Riak's advanced features are used to perform real-time data crunching on the cluster nodes.
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
Companies today are innovating with real-time data to deliver truly amazing customer experiences in the moment. Real-time data management for real-time customer experience is core to staying ahead of competition and driving revenue growth. Join Trays to learn how Comcast is differentiating itself from it's own historical reputation with Customer Experience strategies.
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
DataStax Enterprise (DSE) Graph is a built to manage, analyze, and search highly connected data. DSE Graph, built on NoSQL Apache Cassandra delivers continuous uptime along with predictable performance and scales for modern systems dealing with complex and constantly changing data.
Download DataStax Enterprise: Academy.DataStax.com/Download
Start free training for DataStax Enterprise Graph: Academy.DataStax.com/courses/ds332-datastax-enterprise-graph
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
DataStax Enterprise Advanced Replication supports one-way distributed data replication from remote database clusters that might experience periods of network or internet downtime. Benefiting use cases that require a 'hub and spoke' architecture.
Learn more at http://www.datastax.com/2016/07/stay-100-connected-with-dse-advanced-replication
Advanced Replication docs – https://docs.datastax.com/en/latest-dse/datastax_enterprise/advRep/advRepTOC.html
Data Modeling is the one of the first things to sink your teeth into when trying out a new database. That's why we are going to cover this foundational topic in enough detail for you to get dangerous. Data Modeling for relational databases is more than a touch different than the way it's approached with Cassandra. We will address the quintessential query-driven methodology through a couple of different use cases, including working with time series data for IoT. We will also demo a new tool to get you bootstrapped quickly with MovieLens sample data. This talk should give you the basics you need to get serious with Apache Cassandra.
Hear about how Coursera uses Cassandra as the core of its scalable online education platform. I'll discuss the strengths of Cassandra that we leverage, as well as some limitations that you might run into as well in practice.
In the second part of this talk, we'll dive into how best to effectively use the Datastax Java drivers. We'll dig into how the driver is architected, and use this understanding to develop best practices to follow. I'll also share a couple of interesting bug we've run into at Coursera.
This is a two part talk in which we'll go over the architecture that enables Apache Cassandra’s linear scalability as well as how DataStax Drivers are able to take full advantage of it to provide developers with nicely designed and speedy clients extendable to the core.
To view the full-length video and tutorial, visit: https://academy.datastax.com/demos/getting-started-graph-databases
Getting Started with Graph Databases contains a brief overview of RDBMS architecture in comparison to graph, basic graph terminology, a real-world use case for graph, and an overview of Gremlin, the standard graph query language found in TinkerPop.
Most people hear "Spark" and think "Analytics". But the ability of Spark to efficiently distribute and manage a full-table traversal while functionally transforming the data make it perfectly suited to executing "Big Data" maintenance job
Apache Cassandra is a leading open-source distributed database capable of amazing feats of scale, but its data model requires a bit of planning for it to perform well. Of course, the nature of ad-hoc data exploration and analysis requires that we be able to ask questions we hadn’t planned on asking—and get an answer fast. Enter Apache Spark.
Spark is a distributed computation framework optimized to work in-memory, and heavily influenced by concepts from functional programming languages. It’s exactly what a Cassandra cluster needs to deliver real-time, ad-hoc querying of operational data at scale.
In this talk, we’ll explore Spark and see how it works together with Cassandra to deliver a powerful open-source big data analytic solution.
You’ve heard all of the hype, but how can SMACK work for you? In this all-star lineup, you will learn how to create a reactive, scaling, resilient and performant data processing powerhouse. Bringing Akka, Kafka and Mesos together provides a foundation to develop and operate an elastically scalable actor system. We will go through the basics of Akka, Kafka and Mesos and then deep dive into putting them together in an end2end (and back again) distrubuted transaction. Distributed transactions mean producers waiting for one or more of consumers to respond. We'll also go through automated ways to failure induce these systems (using LinkedIn Simoorg) and trace them from start to stop through each component (using Twitters Zipkin). Finally, you will see how Apache Cassandra and Spark can be combined to add the incredibly scaling storage and data analysis needed in fast data pipelines. With these technologies as a foundation, you have the assurance that scale is never a problem and uptime is default.
Cassandra is pretty awesome, sure I am biased, but it rocks. Always on, tuneable consistency and multi-master architecture? Let’s get our web scale on and build a highly available app that never goes down!
Hold on a second. There is one key piece of the puzzle that has a massive impact on your applications availability: the client driver.
In this talk we will go through the how to best configure your clients to make the most of failure handling and tuneable consistency in Cassandra.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
3. DataStax OpsCenter
• Will help with 90% of problems you
encounter
• Should be first place you look when
there's an issue
• Community version is free
• Enterprise version has additional
features
4. Server Monitoring & Alerts
• Monit
• monitor processes
• monitor disk usage
• send alerts
• Munin / collectd
• system perf statistics
• Nagios / Icinga
• Various 3rd party services
• Use whatever works for
you
6. Log Aggregation
• Hosted - Splunk, Loggly
• OSS - Logstash + Kibana, Greylog
• Many more…
• For best results all logs should be
aggregated here
• Oh yeah, and log your errors.
8. Incorrect Server Times
• Everything is written with a timestamp
• Last write wins
• Usually supplied by coordinator
• Can also be supplied by client
• What if your timestamps are wrong
because your clocks are off?
• Always install ntpd!
server
time: 10
server
time: 20
INSERT
real time: 12
DELETE
real time: 15
insert:20
delete:10
9. Tombstones
• Tombstones are a marker that data
no longer exists
• Tombstones have a timestamp just
like normal data
• They say "at time X, this no longer
exists"
10. Tombstone Hell
• Queries on partitions with a lot of tombstones require a lot of filtering
• This can be reaaaaaaally slow
• Consider:
• 100,000 rows in a partition
• 99,999 are tombstones
• How long to get a single row?
• Cassandra is not a queue!
read 99,999 tombstones
finally get the
right data
11. Not using a Snitch
• Snitch lets us distribute data in a fault tolerant way
• Changing this with a large cluster is time
consuming
• Dynamic Snitching
• use the fastest replica for reads
• RackInferring (uses IP to pick replicas)
• DC aware
• PropertyFileSnitch (cassandra-topology.properties)
• EC2Snitch & EC2MultiRegion
• GoogleCloudSnitch
• GossipingPropertyFileSnitch (recommended)
12. Version Mismatch
• SSTable format changed between
versions, making streaming
incompatible
• Version mismatch can break bootstrap,
repair, and decommission
• Introducing new nodes? Stick w/ the
same version
• Upgrade nodes in place
• One at a time
• One rack / AZ at a time (requires proper snitch)
13. Disk Space not Reclaimed
• When you add new nodes, data is
streamed from existing nodes
• … but it's not deleted from them after
• You need to run a nodetool cleanup
• Otherwise you'll run out of space just by
adding nodes
14. Using Shared Storage
• Single point of failure
• High latency
• Expensive
• Performance is about latency
• Can increase throughput with more
disks
• In general avoid EBS, SAN, NAS
15. Compaction
• Compaction merges SSTables
• Too much compaction?
• Opscenter provides insight into compaction
cluster wide
• nodetool
• compactionhistory
• getcompactionthroughput
• Leveled vs Size Tiered vs Date Tiered
• Leveled on SSD + Read Heavy
• Size tiered on Spinning rust
• Size tiered is great for write heavy time series workloads
• Date tiered is new and is showing HUGE promise
24. nodetool tpstats
• What's blocked?
• MemtableFlushWriter? - Slow
disks!
• also leads to GC issues
• Dropped mutations?
• need repair!
25. Histograms
• proxyhistograms
• High level read and write times
• Includes network latency
• cfhistograms <keyspace> <table>
• reports stats for single table on a single
node
• Used to identify tables with
performance problems
28. JVM GC Overview
• What is garbage collection?
• Manual vs automatic memory management
• Generational garbage collection (ParNew & CMS)
• New Generation
• Old Generation
29. New Generation
• New objects are created in the new gen (eden)
• Comprised of Eden & 2 survivor spaces (SurvivorRatio)
• Space identified by HEAP_NEWSIZE in cassandra-env.sh
• Historically limited to 800MB
30. Minor GC
• Occurs when Eden fills up
• Stop the world
• Dead objects are removed
• Copy current survivor to empty survivor
• Live objects are promoted into survivor (S0 & S1) then old gen
• Some survivor objects promoted to old gen (MaxTenuringThreshold)
• Spillover promoted to old gen
• Removing objects is fast, promoting objects is slow
31. Old Generation
• Objects are promoted to new gen from old gen
• Major GC
• Mostly concurrent
• 2 short stop the world pauses
32. Full GC
• Occurs when old gen fills up or
objects can’t be promoted
• Stop the world
• Collects all generations
• Defragments old gen
• These are bad!
• Massive pauses
33. Workload 1: Write Heavy
• Objects promoted: Memtables
• New gen too big
• Remember: promoting objects is slow!
• Huge new gen = potentially a lot of promotion
new gen old gen
too much promotion
34. Workload 2: Read Heavy
• Short lived objects being promoted into old gen
• Lots of minor GCs
• Read heavy workloads on SSD
• Results in frequent full GC
new gen old gen (full of short lived objects)
early promotion
fills up quickly
35. GC Profiling
• Opscenter gc stats
• Look for correlations between gc spikes
and read/write latency
• Cassandra GC Logging
• Can be activated in cassandra-env.sh
• jstat
• prints gc activity
38. Narrow Down the Problem
• Is it even Cassandra? Check your
metrics!
• Nodes flapping / failing
• Check ops center
• Dig into system metrics
• Slow queries
• Find your bottleneck
• Check system stats
• JVM GC
• Compaction
• Histograms
• Tracing