Scylla Summit 2022: Scylla 5.0 New Features, Part 1ScyllaDB
Discover the new features and capabilities of Scylla Open Source 5.0 directly from the engineers who developed it. This second block of lightning talks will cover the following topics:
- New IO Scheduler and Disk Parallelism
- Per-Service-Level Timeouts
- Better Workload Estimation for Backpressure and Out-of-Memory Conditions
- Large Partition Handling Improvements
- Optimizing Reverse Queries
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScyllaDB
In this talk, we will discuss Happn's war story about migrating a Cassandra 2.1 cluster containing more than 68 Billion records in a counter table to ScyllaDB Open Source.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Scylla allows us to create highly performant and scalable systems. However, to achieve good results and prevent our Scylla cluster from being overloaded, we need to properly write our client application and configure the driver. Join this session to learn some practical tips that can help you make your applications faster and more available.
No matter how resilient your database infrastructure is, backups are still needed to defend against catastrophic failures. Be it the unlikely hardware failure of all data centers, or the more likely and all-too-human user error. Acknowledging the importance of good backup procedures, the Scylla Manager now natively supports backup and restore operations. In this talk, we will learn more about how that works and the guarantees provided, as well as how to set it up to guarantee maximum resiliency to your cluster.
Lightweight Transactions in Scylla versus Apache CassandraScyllaDB
Lightweight transactions (LWT) has been a long anticipated feature for Scylla. Join Scylla VP of Product Tzach Livyatan and Software Team Lead Konstantin Osipov for a webinar introducing the Scylla implementation of LWT, a feature that brings strong consistency to our NoSQL database.
In this webinar we will cover the tradeoffs typically made between database consistency, availability and latency; how to use lightweight transactions in Scylla; the similarities and differences between Scylla’s Paxos implementation and Cassandra’s, and what it all means to users.
From attending this live webinar you’ll learn…
The advantages and disadvantages of various consistency options
Scylla lightweight transactions: syntax and semantics
A design and implementation overview, changes in Paxos
Performance comparisons with Apache Cassandra
Scylla’s future roadmap for LWT beyond Paxos
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...DataStax
In this presentation, we will look into JIRAs, JavaDocs and system log entries to gain a deeper understanding on how LCS works under the hood. We will explain what scenarios don't work well for LCS and (more importantly) why. We will leverage legacy TRACE/DEBUG level log for compaction related objects as well as some newer compaction logging information introduced in C* 3.6 (CASSANDRA-10805) to gain better insights.
About the Speakers
Wei Deng Solutions Architect, DataStax
Solutions Architect for DataStax. I have a strong interest in big data, cloud application and distributed computing practices.
ClickHouse Monitoring 101: What to monitor and howAltinity Ltd
Webinar. Presented by Robert Hodges and Ned McClain, April 1, 2020
You are about to deploy ClickHouse into production. Congratulations! But what about monitoring? In this webinar we will introduce how to track the health of individual ClickHouse nodes as well as clusters. We'll describe available monitoring data, how to collect and store measurements, and graphical display using Grafana. We'll demo techniques and share sample Grafana dashboards that you can use for your own clusters.
Scylla Summit 2022: Scylla 5.0 New Features, Part 1ScyllaDB
Discover the new features and capabilities of Scylla Open Source 5.0 directly from the engineers who developed it. This second block of lightning talks will cover the following topics:
- New IO Scheduler and Disk Parallelism
- Per-Service-Level Timeouts
- Better Workload Estimation for Backpressure and Out-of-Memory Conditions
- Large Partition Handling Improvements
- Optimizing Reverse Queries
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScyllaDB
In this talk, we will discuss Happn's war story about migrating a Cassandra 2.1 cluster containing more than 68 Billion records in a counter table to ScyllaDB Open Source.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Scylla allows us to create highly performant and scalable systems. However, to achieve good results and prevent our Scylla cluster from being overloaded, we need to properly write our client application and configure the driver. Join this session to learn some practical tips that can help you make your applications faster and more available.
No matter how resilient your database infrastructure is, backups are still needed to defend against catastrophic failures. Be it the unlikely hardware failure of all data centers, or the more likely and all-too-human user error. Acknowledging the importance of good backup procedures, the Scylla Manager now natively supports backup and restore operations. In this talk, we will learn more about how that works and the guarantees provided, as well as how to set it up to guarantee maximum resiliency to your cluster.
Lightweight Transactions in Scylla versus Apache CassandraScyllaDB
Lightweight transactions (LWT) has been a long anticipated feature for Scylla. Join Scylla VP of Product Tzach Livyatan and Software Team Lead Konstantin Osipov for a webinar introducing the Scylla implementation of LWT, a feature that brings strong consistency to our NoSQL database.
In this webinar we will cover the tradeoffs typically made between database consistency, availability and latency; how to use lightweight transactions in Scylla; the similarities and differences between Scylla’s Paxos implementation and Cassandra’s, and what it all means to users.
From attending this live webinar you’ll learn…
The advantages and disadvantages of various consistency options
Scylla lightweight transactions: syntax and semantics
A design and implementation overview, changes in Paxos
Performance comparisons with Apache Cassandra
Scylla’s future roadmap for LWT beyond Paxos
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...DataStax
In this presentation, we will look into JIRAs, JavaDocs and system log entries to gain a deeper understanding on how LCS works under the hood. We will explain what scenarios don't work well for LCS and (more importantly) why. We will leverage legacy TRACE/DEBUG level log for compaction related objects as well as some newer compaction logging information introduced in C* 3.6 (CASSANDRA-10805) to gain better insights.
About the Speakers
Wei Deng Solutions Architect, DataStax
Solutions Architect for DataStax. I have a strong interest in big data, cloud application and distributed computing practices.
ClickHouse Monitoring 101: What to monitor and howAltinity Ltd
Webinar. Presented by Robert Hodges and Ned McClain, April 1, 2020
You are about to deploy ClickHouse into production. Congratulations! But what about monitoring? In this webinar we will introduce how to track the health of individual ClickHouse nodes as well as clusters. We'll describe available monitoring data, how to collect and store measurements, and graphical display using Grafana. We'll demo techniques and share sample Grafana dashboards that you can use for your own clusters.
Building Event Streaming Architectures on Scylla and KafkaScyllaDB
Event streaming architectures require high-throughput, low-latency components to consistently and smoothly transfer data between heterogenous transactional and analytical systems. Join us and Confluent's Tim Berglund to learn how the Scylla and Confluent Kafka interoperate as a foundation upon which you can build enterprise-grade, event-driven applications, plus a use case from Numberly.
Take advantage of ScyllaDB’s wide column NoSQL features such as workload prioritization to balance the needs of OLTP and OLAP in the same cluster. Plus learn about the different compaction strategies and which one would be right for your workload. With additional insights on properly sizing your database and using open source tools for observability.
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...DataStax
Deleting data from Cassandra has several challenges, and existing solutions (tombstones or TTLs) have limitations that make them unusable or untenable in certain circumstances. We'll explore the cases where existing deletion options fail or are inadequate, then describe a solution we developed which deletes data from Cassandra during standard or user-defined compaction, but without resorting to tombstones or TTL's.
About the Speaker
Eric Stevens Principal Architect, ProtectWise, Inc.
Eric is the principal architect, and day one employee of ProtectWise, Inc., specializing in massive real time processing and scalability problems. The team at ProtectWise processes, analyzes, optimizes, indexes, and stores billions of network packets each second. They look for threats in real time, but also store full fidelity network data (including PCAP), and when new security intelligence is received, automatically replay existing network history through that new intelligence.
Running Scylla on Kubernetes with Scylla OperatorScyllaDB
Kubernetes has quickly become the ad-hoc standard for managing software deployments. The Scylla team recently released the beta version of the Scylla Operator for Kubernetes.
Join us to learn how Kubernetes can be used to automate the deployment, scaling, and various operations of Scylla NoSQL. In this webinar we will:
- Present the mapping between Scylla and Kubernetes entities, and explain their reasoning
- Give insight into the Operator Open Source project, and how you can get involved
- Explain and demo common procedures with Scylla Operator
- Discuss the challenges of running a high performance, persistent application on Kubernetes, and the trade offs we considered
- Share our plans for Scylla on Kubernetes
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
Data Orchestration Summit
www.alluxio.io/data-orchestration-summit-2019
November 7, 2019
Apache Iceberg - A Table Format for Hige Analytic Datasets
Speaker:
Ryan Blue, Netflix
For more Alluxio events: https://www.alluxio.io/events/
Modeling Data and Queries for Wide Column NoSQLScyllaDB
Discover how to model data for wide column databases such as ScyllaDB and Apache Cassandra. Contrast the differerence from traditional RDBMS data modeling, going from a normalized “schema first” design to a denormalized “query first” design. Plus how to use advanced features like secondary indexes and materialized views to use the same base table to get the answers you need.
Scylla on Kubernetes: Introducing the Scylla OperatorScyllaDB
How can Kubernetes be best used to automate the deployment, scaling, and various operations of a Scylla database?
Enter Kubernetes Operators, the way to combine domain-specific knowledge about Scylla with the automation framework of Kubernetes.
In this presentation, we will quickly explore what Kubernetes is and why it works so well, highlight the pain points of running Scylla with just Kubernetes primitives, and show how we extended Kubernetes so that it can correctly operate a Scylla database.
Finally, we will show the Scylla Operator in action and show how easily you can spin up a Scylla cluster with just one command.
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
Flink Forward San Francisco 2022.
In modern data platform architectures, stream processing engines such as Apache Flink are used to ingest continuous streams of data into data lakes such as Apache Iceberg. Streaming ingestion to iceberg tables can suffer by two problems (1) small files problem that can hurt read performance (2) poor data clustering that can make file pruning less effective. To address those two problems, we propose adding a shuffling stage to the Flink Iceberg streaming writer. The shuffling stage can intelligently group data via bin packing or range partition. This can reduce the number of concurrent files that every task writes. It can also improve data clustering. In this talk, we will explain the motivations in details and dive into the design of the shuffling stage. We will also share the evaluation results that demonstrate the effectiveness of smart shuffling.
by
Gang Ye & Steven Wu
Apache Kafka becoming the message bus to transfer huge volumes of data from various sources into Hadoop.
It's also enabling many real-time system frameworks and use cases.
Managing and building clients around Apache Kafka can be challenging. In this talk, we will go through the best practices in deploying Apache Kafka
in production. How to Secure a Kafka Cluster, How to pick topic-partitions and upgrading to newer versions. Migrating to new Kafka Producer and Consumer API.
Also talk about the best practices involved in running a producer/consumer.
In Kafka 0.9 release, we’ve added SSL wire encryption, SASL/Kerberos for user authentication, and pluggable authorization. Now Kafka allows authentication of users, access control on who can read and write to a Kafka topic. Apache Ranger also uses pluggable authorization mechanism to centralize security for Kafka and other Hadoop ecosystem projects.
We will showcase open sourced Kafka REST API and an Admin UI that will help users in creating topics, re-assign partitions, Issuing
Kafka ACLs and monitoring Consumer offsets.
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...GetInData
Did you like it? Check out our E-book: Apache NiFi - A Complete Guide
https://ebook.getindata.com/apache-nifi-complete-guide
Apache NiFi is one of the most popular services for running ETL pipelines otherwise it’s not the youngest technology. During the talk, there are described all details about migrating pipelines from the old Hadoop platform to the Kubernetes, managing everything as the code, monitoring all corner cases of NiFi and making it a robust solution that is user-friendly even for non-programmers.
Author: Albert Lewandowski
Linkedin: https://www.linkedin.com/in/albert-lewandowski/
___
Getindata is a company founded in 2014 by ex-Spotify data engineers. From day one our focus has been on Big Data projects. We bring together a group of best and most experienced experts in Poland, working with cloud and open-source Big Data technologies to help companies build scalable data architectures and implement advanced analytics over large data sets.
Our experts have vast production experience in implementing Big Data projects for Polish as well as foreign companies including i.a. Spotify, Play, Truecaller, Kcell, Acast, Allegro, ING, Agora, Synerise, StepStone, iZettle and many others from the pharmaceutical, media, finance and FMCG industries.
https://getindata.com
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016DataStax
Large partitions shall no longer be a nightmare. That is the goal of CASSANDRA-11206.
100MB and 100,000 cells per partition is the recommended limit for a single partition in Cassandra up to 3.5. Exceeding these limits can cause a lot of trouble. Repairs and compactions could fail and reads cause out-of-memory failures.
This talk provides a deep-dive of the reasons for the previous limitations, why exceeding these limitations caused trouble, how the improvements in Cassandra 3.6 helps with big partitions and why you should not blindly let your partitions get huge.
About the Speaker
Robert Stupp Solution Architect, DataStax
Robert is working as a Solutions Architect at DataStax and is also a Committer to Apache Cassandra. Before joining DataStax he worked with his customers to architect and build distributed systems using Cassandra and has a long experience in building distributed backend systems mostly using Java as the preferred language of choice.
Should I use more, smaller instances, or fewer, bigger instances? Is 1Gbps enough for my network cards? Should I use batches? Can I have a collection with 3GB in size? Those are just some of the many questions we see users asking themselves on a daily basis over our mailing list, slack, and corporate ticket requests. In this talk, I will explore the answers to these common questions and help you make sure that your deployment is up to the highest standards.
The Parquet Format and Performance Optimization OpportunitiesDatabricks
The Parquet format is one of the most widely used columnar storage formats in the Spark ecosystem. Given that I/O is expensive and that the storage layer is the entry point for any query execution, understanding the intricacies of your storage format is important for optimizing your workloads.
As an introduction, we will provide context around the format, covering the basics of structured data formats and the underlying physical data storage model alternatives (row-wise, columnar and hybrid). Given this context, we will dive deeper into specifics of the Parquet format: representation on disk, physical data organization (row-groups, column-chunks and pages) and encoding schemes. Now equipped with sufficient background knowledge, we will discuss several performance optimization opportunities with respect to the format: dictionary encoding, page compression, predicate pushdown (min/max skipping), dictionary filtering and partitioning schemes. We will learn how to combat the evil that is ‘many small files’, and will discuss the open-source Delta Lake format in relation to this and Parquet in general.
This talk serves both as an approachable refresher on columnar storage as well as a guide on how to leverage the Parquet format for speeding up analytical workloads in Spark using tangible tips and tricks.
The talk will cover most of the performance enhancement introduced to Scylla over the past 12 months. As the throughput was very good before, we focused on Scylla’s behaviour under all types of workloads and data models. Scylla improved its latency under all scenarios, improving the behaviours of data models such as large partitions and time series, improvement of the I/O scheduler and behaviour of streaming and repair.
How Scylla Make Adding and Removing Nodes Faster and SaferScyllaDB
When a new node is added or removed, Scylla has to transfer part of the existing data from some nodes to their neighbors. When a node fails, Scylla has to repopulate its data with data from the surviving replicas. Those operations are collectively referred to as "streaming" operations, since they simply stream data from one node to another, without using this opportunity to also fix discrepancies in the data. This is in contrast with the repair operation, that looks into all existing replicas and reconcile their contents. Scylla is moving towards unifying those two operations. In this talk we will discuss why this is considered beneficial, and what other possibilities this opens to users.
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
Recently, a set of modern table formats such as Delta Lake, Hudi, Iceberg spring out. Along with Hive Metastore these table formats are trying to solve problems that stand in traditional data lake for a long time with their declared features like ACID, schema evolution, upsert, time travel, incremental consumption etc.
Talk for PerconaLive 2016 by Brendan Gregg. Video: https://www.youtube.com/watch?v=CbmEDXq7es0 . "Systems performance provides a different perspective for analysis and tuning, and can help you find performance wins for your databases, applications, and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes six important areas of Linux systems performance in 50 minutes: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events), static tracing (tracepoints), and dynamic tracing (kprobes, uprobes), and much advice about what is and isn't important to learn. This talk is aimed at everyone: DBAs, developers, operations, etc, and in any environment running Linux, bare-metal or the cloud."
Organizations continue to adopt Solr because of its ability to scale to meet even the most demanding workflows. Recently, LucidWorks has been leading the effort to identify, measure, and expand the limits of Solr. As part of this effort, we've learned a few things along the way that should prove useful for any organization wanting to scale Solr. Attendees will come away with a better understanding of how sharding and replication impact performance. Also, no benchmark is useful without being repeatable; Tim will also cover how to perform similar tests using the Solr-Scale-Toolkit in Amazon EC2.
Building Event Streaming Architectures on Scylla and KafkaScyllaDB
Event streaming architectures require high-throughput, low-latency components to consistently and smoothly transfer data between heterogenous transactional and analytical systems. Join us and Confluent's Tim Berglund to learn how the Scylla and Confluent Kafka interoperate as a foundation upon which you can build enterprise-grade, event-driven applications, plus a use case from Numberly.
Take advantage of ScyllaDB’s wide column NoSQL features such as workload prioritization to balance the needs of OLTP and OLAP in the same cluster. Plus learn about the different compaction strategies and which one would be right for your workload. With additional insights on properly sizing your database and using open source tools for observability.
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...DataStax
Deleting data from Cassandra has several challenges, and existing solutions (tombstones or TTLs) have limitations that make them unusable or untenable in certain circumstances. We'll explore the cases where existing deletion options fail or are inadequate, then describe a solution we developed which deletes data from Cassandra during standard or user-defined compaction, but without resorting to tombstones or TTL's.
About the Speaker
Eric Stevens Principal Architect, ProtectWise, Inc.
Eric is the principal architect, and day one employee of ProtectWise, Inc., specializing in massive real time processing and scalability problems. The team at ProtectWise processes, analyzes, optimizes, indexes, and stores billions of network packets each second. They look for threats in real time, but also store full fidelity network data (including PCAP), and when new security intelligence is received, automatically replay existing network history through that new intelligence.
Running Scylla on Kubernetes with Scylla OperatorScyllaDB
Kubernetes has quickly become the ad-hoc standard for managing software deployments. The Scylla team recently released the beta version of the Scylla Operator for Kubernetes.
Join us to learn how Kubernetes can be used to automate the deployment, scaling, and various operations of Scylla NoSQL. In this webinar we will:
- Present the mapping between Scylla and Kubernetes entities, and explain their reasoning
- Give insight into the Operator Open Source project, and how you can get involved
- Explain and demo common procedures with Scylla Operator
- Discuss the challenges of running a high performance, persistent application on Kubernetes, and the trade offs we considered
- Share our plans for Scylla on Kubernetes
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
Data Orchestration Summit
www.alluxio.io/data-orchestration-summit-2019
November 7, 2019
Apache Iceberg - A Table Format for Hige Analytic Datasets
Speaker:
Ryan Blue, Netflix
For more Alluxio events: https://www.alluxio.io/events/
Modeling Data and Queries for Wide Column NoSQLScyllaDB
Discover how to model data for wide column databases such as ScyllaDB and Apache Cassandra. Contrast the differerence from traditional RDBMS data modeling, going from a normalized “schema first” design to a denormalized “query first” design. Plus how to use advanced features like secondary indexes and materialized views to use the same base table to get the answers you need.
Scylla on Kubernetes: Introducing the Scylla OperatorScyllaDB
How can Kubernetes be best used to automate the deployment, scaling, and various operations of a Scylla database?
Enter Kubernetes Operators, the way to combine domain-specific knowledge about Scylla with the automation framework of Kubernetes.
In this presentation, we will quickly explore what Kubernetes is and why it works so well, highlight the pain points of running Scylla with just Kubernetes primitives, and show how we extended Kubernetes so that it can correctly operate a Scylla database.
Finally, we will show the Scylla Operator in action and show how easily you can spin up a Scylla cluster with just one command.
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
Flink Forward San Francisco 2022.
In modern data platform architectures, stream processing engines such as Apache Flink are used to ingest continuous streams of data into data lakes such as Apache Iceberg. Streaming ingestion to iceberg tables can suffer by two problems (1) small files problem that can hurt read performance (2) poor data clustering that can make file pruning less effective. To address those two problems, we propose adding a shuffling stage to the Flink Iceberg streaming writer. The shuffling stage can intelligently group data via bin packing or range partition. This can reduce the number of concurrent files that every task writes. It can also improve data clustering. In this talk, we will explain the motivations in details and dive into the design of the shuffling stage. We will also share the evaluation results that demonstrate the effectiveness of smart shuffling.
by
Gang Ye & Steven Wu
Apache Kafka becoming the message bus to transfer huge volumes of data from various sources into Hadoop.
It's also enabling many real-time system frameworks and use cases.
Managing and building clients around Apache Kafka can be challenging. In this talk, we will go through the best practices in deploying Apache Kafka
in production. How to Secure a Kafka Cluster, How to pick topic-partitions and upgrading to newer versions. Migrating to new Kafka Producer and Consumer API.
Also talk about the best practices involved in running a producer/consumer.
In Kafka 0.9 release, we’ve added SSL wire encryption, SASL/Kerberos for user authentication, and pluggable authorization. Now Kafka allows authentication of users, access control on who can read and write to a Kafka topic. Apache Ranger also uses pluggable authorization mechanism to centralize security for Kafka and other Hadoop ecosystem projects.
We will showcase open sourced Kafka REST API and an Admin UI that will help users in creating topics, re-assign partitions, Issuing
Kafka ACLs and monitoring Consumer offsets.
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...GetInData
Did you like it? Check out our E-book: Apache NiFi - A Complete Guide
https://ebook.getindata.com/apache-nifi-complete-guide
Apache NiFi is one of the most popular services for running ETL pipelines otherwise it’s not the youngest technology. During the talk, there are described all details about migrating pipelines from the old Hadoop platform to the Kubernetes, managing everything as the code, monitoring all corner cases of NiFi and making it a robust solution that is user-friendly even for non-programmers.
Author: Albert Lewandowski
Linkedin: https://www.linkedin.com/in/albert-lewandowski/
___
Getindata is a company founded in 2014 by ex-Spotify data engineers. From day one our focus has been on Big Data projects. We bring together a group of best and most experienced experts in Poland, working with cloud and open-source Big Data technologies to help companies build scalable data architectures and implement advanced analytics over large data sets.
Our experts have vast production experience in implementing Big Data projects for Polish as well as foreign companies including i.a. Spotify, Play, Truecaller, Kcell, Acast, Allegro, ING, Agora, Synerise, StepStone, iZettle and many others from the pharmaceutical, media, finance and FMCG industries.
https://getindata.com
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016DataStax
Large partitions shall no longer be a nightmare. That is the goal of CASSANDRA-11206.
100MB and 100,000 cells per partition is the recommended limit for a single partition in Cassandra up to 3.5. Exceeding these limits can cause a lot of trouble. Repairs and compactions could fail and reads cause out-of-memory failures.
This talk provides a deep-dive of the reasons for the previous limitations, why exceeding these limitations caused trouble, how the improvements in Cassandra 3.6 helps with big partitions and why you should not blindly let your partitions get huge.
About the Speaker
Robert Stupp Solution Architect, DataStax
Robert is working as a Solutions Architect at DataStax and is also a Committer to Apache Cassandra. Before joining DataStax he worked with his customers to architect and build distributed systems using Cassandra and has a long experience in building distributed backend systems mostly using Java as the preferred language of choice.
Should I use more, smaller instances, or fewer, bigger instances? Is 1Gbps enough for my network cards? Should I use batches? Can I have a collection with 3GB in size? Those are just some of the many questions we see users asking themselves on a daily basis over our mailing list, slack, and corporate ticket requests. In this talk, I will explore the answers to these common questions and help you make sure that your deployment is up to the highest standards.
The Parquet Format and Performance Optimization OpportunitiesDatabricks
The Parquet format is one of the most widely used columnar storage formats in the Spark ecosystem. Given that I/O is expensive and that the storage layer is the entry point for any query execution, understanding the intricacies of your storage format is important for optimizing your workloads.
As an introduction, we will provide context around the format, covering the basics of structured data formats and the underlying physical data storage model alternatives (row-wise, columnar and hybrid). Given this context, we will dive deeper into specifics of the Parquet format: representation on disk, physical data organization (row-groups, column-chunks and pages) and encoding schemes. Now equipped with sufficient background knowledge, we will discuss several performance optimization opportunities with respect to the format: dictionary encoding, page compression, predicate pushdown (min/max skipping), dictionary filtering and partitioning schemes. We will learn how to combat the evil that is ‘many small files’, and will discuss the open-source Delta Lake format in relation to this and Parquet in general.
This talk serves both as an approachable refresher on columnar storage as well as a guide on how to leverage the Parquet format for speeding up analytical workloads in Spark using tangible tips and tricks.
The talk will cover most of the performance enhancement introduced to Scylla over the past 12 months. As the throughput was very good before, we focused on Scylla’s behaviour under all types of workloads and data models. Scylla improved its latency under all scenarios, improving the behaviours of data models such as large partitions and time series, improvement of the I/O scheduler and behaviour of streaming and repair.
How Scylla Make Adding and Removing Nodes Faster and SaferScyllaDB
When a new node is added or removed, Scylla has to transfer part of the existing data from some nodes to their neighbors. When a node fails, Scylla has to repopulate its data with data from the surviving replicas. Those operations are collectively referred to as "streaming" operations, since they simply stream data from one node to another, without using this opportunity to also fix discrepancies in the data. This is in contrast with the repair operation, that looks into all existing replicas and reconcile their contents. Scylla is moving towards unifying those two operations. In this talk we will discuss why this is considered beneficial, and what other possibilities this opens to users.
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
Recently, a set of modern table formats such as Delta Lake, Hudi, Iceberg spring out. Along with Hive Metastore these table formats are trying to solve problems that stand in traditional data lake for a long time with their declared features like ACID, schema evolution, upsert, time travel, incremental consumption etc.
Talk for PerconaLive 2016 by Brendan Gregg. Video: https://www.youtube.com/watch?v=CbmEDXq7es0 . "Systems performance provides a different perspective for analysis and tuning, and can help you find performance wins for your databases, applications, and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes six important areas of Linux systems performance in 50 minutes: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events), static tracing (tracepoints), and dynamic tracing (kprobes, uprobes), and much advice about what is and isn't important to learn. This talk is aimed at everyone: DBAs, developers, operations, etc, and in any environment running Linux, bare-metal or the cloud."
Organizations continue to adopt Solr because of its ability to scale to meet even the most demanding workflows. Recently, LucidWorks has been leading the effort to identify, measure, and expand the limits of Solr. As part of this effort, we've learned a few things along the way that should prove useful for any organization wanting to scale Solr. Attendees will come away with a better understanding of how sharding and replication impact performance. Also, no benchmark is useful without being repeatable; Tim will also cover how to perform similar tests using the Solr-Scale-Toolkit in Amazon EC2.
Performance Analysis: new tools and concepts from the cloudBrendan Gregg
Talk delivered at SCaLE10x, Los Angeles 2012.
Cloud Computing introduces new challenges for performance
analysis, for both customers and operators of the cloud. Apart from
monitoring a scaling environment, issues within a system can be
complicated when tenants are competing for the same resources, and are
invisible to each other. Other factors include rapidly changing
production code and wildly unpredictable traffic surges. For
performance analysis in the Joyent public cloud, we use a variety of
tools including Dynamic Tracing, which allows us to create custom
tools and metrics and to explore new concepts. In this presentation
I'll discuss a collection of these tools and the metrics that they
measure. While these are DTrace-based, the focus of the talk is on
which metrics are proving useful for analyzing real cloud issues.
Talk for YOW! by Brendan Gregg. "Systems performance studies the performance of computing systems, including all physical components and the full software stack to help you find performance wins for your application and kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes the topic for everyone, touring six important areas: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events) and tracing (ftrace, bcc/BPF, and bpftrace/BPF), advice about what is and isn't important to learn, and case studies to see how it is applied. This talk is aimed at everyone: developers, operations, sysadmins, etc, and in any environment running Linux, bare metal or the cloud.
"
Talk for QConSF 2015: "Broken benchmarks, misleading metrics, and terrible tools. This talk will help you navigate the treacherous waters of system performance tools, touring common problems with system metrics, monitoring, statistics, visualizations, measurement overhead, and benchmarks. This will likely involve some unlearning, as you discover tools you have been using for years, are in fact, misleading, dangerous, or broken.
The speaker, Brendan Gregg, has given many popular talks on operating system performance tools. This is an anti-version of these talks, to focus on broken tools and metrics instead of the working ones. Metrics can be misleading, and counters can be counter-intuitive! This talk will include advice and methodologies for verifying new performance tools, understanding how they work, and using them successfully."
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsJulien Anguenot
iland has built a global data warehouse across multiple data centers, collecting and aggregating data from core cloud services including compute, storage and network as well as chargeback and compliance. iland's warehouse brings actionable intelligence that customers can use to manipulate resources, analyze trends, define alerts and share information.
In this session, we would like to present the lessons learned around Cassandra, both at the development and operations level, but also the technology and architecture we put in action on top of Cassandra such as Redis, syslog-ng, RabbitMQ, Java EE, etc.
Finally, we would like to share insights on how we are currently extending our platform with Spark and Kafka and what our motivations are.
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...DataStax Academy
iland has built a global data warehouse across multiple data centers, collecting and aggregating data from core cloud services including compute, storage and network as well as chargeback and compliance. iland's warehouse brings actionable intelligence that customers can use to manipulate resources, analyze trends, define alerts and share information.
In this session, we would like to present the lessons learned around Cassandra, both at the development and operations level, but also the technology and architecture we put in action on top of Cassandra such as Redis, syslog-ng, RabbitMQ, Java EE, etc.
Finally, we would like to share insights on how we are currently extending our platform with Spark and Kafka and what our motivations are.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1N4GN6z.
Brendan Gregg focuses on broken tools and metrics instead of the working ones. Metrics can be misleading, and counters can be counter-intuitive. Gregg includes advice and methodologies for verifying new performance tools, understanding how they work, and using them successfully. Filmed at qconsf.com.
Brendan Gregg is a senior performance architect at Netflix, where he does large scale computer performance design, analysis, and tuning. He is the author of multiple technical books including Systems Performance published by Prentice Hall, and received the USENIX LISA Award for Outstanding Achievement in System Administration.
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak PROIDEA
Speaker: Andrzej Dyjak
Language: English
In recent years security industry started to grow fond of Apple’s iOS and OS X platforms. This talk will cover one of XNU's flagship debugging utilities: DTrace, a dynamic tracing framework for troubleshooting kernel and application problems on production systems in real time. It will be shown how it can be used in order to ease various tasks within the realm of dynamic binary analysis and beyond.
CONFidence: http://confidence.org.pl/
From USENIX LISA 2010, San Jose.
Visualizations that include heat maps can be an effective way to present performance data: I/O latency, resource utilization, and more. Patterns can emerge that would be difficult to notice from columns of numbers or line graphs, which are revealing previously unknown behavior. These visualizations are used in a product as a replacement for traditional metrics such as %CPU and are allowing end users to identify more issues much more easily (and some issues are becoming nearly impossible to identify with tools such as vmstat(1)). This talk covers what has been learned, crazy heat map discoveries, and thoughts for future applications beyond performance analysis.
Analyzing OS X Systems Performance with the USE MethodBrendan Gregg
Talk for MacIT 2014. This talk is about systems performance on OS X, and introduces the USE Method to check for common performance bottlenecks and errors. This methodology can be used by beginners and experts alike, and begins by constructing a checklist of the questions we’d like to ask of the system, before reaching for tools to answer them. The focus is resources: CPUs, GPUs, memory capacity, network interfaces, storage devices, controllers, interconnects, as well as some software resources such as mutex locks. These areas are investigated by a wide variety of tools, including vm_stat, iostat, netstat, top, latency, the DTrace scripts in /usr/bin (which were written by Brendan), custom DTrace scripts, Instruments, and more. This is a tour of the tools needed to solve our performance needs, rather than understanding tools just because they exist. This talk will make you aware of many areas of OS X that you can investigate, which will be especially useful for the time when you need to get to the bottom of a performance issue.
Broken benchmarks, misleading metrics, and terrible tools. This talk will help you navigate the treacherous waters of Linux performance tools, touring common problems with system tools, metrics, statistics, visualizations, measurement overhead, and benchmarks. You might discover that tools you have been using for years, are in fact, misleading, dangerous, or broken.
The speaker, Brendan Gregg, has given many talks on tools that work, including giving the Linux PerformanceTools talk originally at SCALE. This is an anti-version of that talk, to focus on broken tools and metrics instead of the working ones. Metrics can be misleading, and counters can be counter-intuitive! This talk will include advice for verifying new performance tools, understanding how they work, and using them successfully.
Similar to Performance Monitoring: Understanding Your Scylla Cluster (20)
Optimizing NoSQL Performance Through ObservabilityScyllaDB
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. But before you squeeze, make sure you know what to monitor!
Watch our experienced Postgres developer work through monitoring and performance strategies that help him understand what mistakes he’s made moving to NoSQL. And learn with him as our database performance expert offers friendly guidance on how to use monitoring and performance tuning to get his sample Rust application on the right track.
This webinar focuses on using monitoring and performance tuning to discover and correct mistakes that commonly occur when developers move from SQL to NoSQL. For example:
- Common issues getting up and running with the monitoring stack
- Using the CQL optimizations dashboard
- Common issues causing high latency in a node
- Common issues causing replica imbalance
- What a healthy system looks like in terms of memory
- Key metrics to keep an eye on
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingScyllaDB
Discuss the core tradeoffs and considerations involved in order-free and ordered stream processing. Brian Taylor walks through the pros and cons of three different approaches: no data dependency, deferred inter-event data dependency, and streaming inter-event data dependency.
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...ScyllaDB
We start by setting up a common ground introducing why relational databases fall short, addressing common EDA characteristics such as the need for real-time response times and schemaless approaches to address recurring changes to adapt and on-board new use cases. Next, interact with a sample Rust-based application: a social network app demonstrating an integration of both ScyllaDB and Redpanda.
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...ScyllaDB
Discover how to avoid common pitfalls when shifting to an event-driven architecture (EDA) in order to boost system recovery and scalability. We cover Kafka Schema Registry, in-broker transformations, event sourcing, and more.
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
See where an RDBMS-pro’s intuition leads him astray – and learn practical tips for the data modeling transition
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning.
Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts.
This webinar focuses on common data modeling and querying mistakes that occur when developers move from SQL to NoSQL. For example:
- Understanding query first design principles
- Planning for schema evolution
- Steering clear of common pitfalls and anti-patterns
- Assessing data access patterns
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
See where an RDBMS-pro’s intuition leads him astray – and learn practical tips for the transition
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning.
Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts.
Our first webinar of this series will cover common mistakes with practices such as:
- Translating the data model to NoSQL
- Optimizing table design
- Optimizing query performance
- Planning for partitioning
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
Expert tips on how to maximize your database performance at scale
Untangle the complexity of achieving database performance at scale. Join this webinar to discover commonly overlooked ways to get predictable low latency, even at extreme scale. Our Solution Architects will walk you through the strategies and pitfalls learned by working on thousands of real-world distributed database projects, many reaching 1M OPS with single-digit MS latencies.
In addition to offering clear recommendations, we’ll also explain the process behind how we arrived at them – so you can benefit from the lessons learned by other teams.
We’ll cover how to:
- Design and deploy a large-scale distributed database cluster
- Optimize your clients’ interactions with it
- Expand the cluster horizontally and globally
- Ensure it survives whatever disasters the world throws at it
Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments.
Join us for an interactive session where we dissect four specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma, we'll:
- Examine the context and technical requirements
- Talk about potential solutions and cover the pros and cons of each
- Disclose what approach the team took, and how it worked out
About the speaker:
Felipe is an IT specialist with years of experience on distributed systems and open-source technologies. He is one of the co-authors of "Database Performance at Scale", an Open Access, freely available publication for individuals interested on improving database performance. At ScyllaDB, he works as a Solution Architect.
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
Linear scaling (sometimes near linear scaling) is often mentioned in several benchmarks, articles and product comparisons as proof that a given technology and algorithmic optimizations perform better than another. But is that really what performance is all about, and should you even care?
This webinar discusses performance beyond linear scalability, including what typically matters more when running high throughput and low latency workloads at scale. We'll cover how ScyllaDB offers unparalleled performance and share our insights on:
- The hidden aspects of linear scaling
- When linear scaling matters most and when it’s simply irrelevant
- Often overlooked considerations for optimizing and measuring distributed systems performance
Watch now to learn from our experience (and lessons learned) in building the fastest NoSQL database in the world.
Navigating Complex Database Performance Hurdles
Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments.
Join us for an interactive session where we dissect 4 specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma:
- The presenters will describe the context and technical requirements
- Together, we’ll talk about potential solutions and cover the pros and cons of each
- Finally, we’ll disclose what approach the team took, and how it worked out
Throughout the event, we’ll have opportunities to win ScyllaDB swag and prizes! Come prepared to engage in lively discussions and gain valuable insight into database performance strategies.
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...ScyllaDB
Felipe Cardeneti Mendes, Solutions Architect at ScyllaDB
Navigating workload-specific performance challenges and tradeoffs.
Felipe Mendes covers how to navigate the top performance challenges and tradeoffs that you’re likely to face with your project’s specific workload characteristics and technical/business requirements.
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
Pavel Emelyanov, Principal Engineer at ScyllaDB
Botond Denes, C++ Developer at ScyllaDB
What performance-minded engineers need to know.
Hear from Pavel Emelyanov and Botond Dénes on the impact of database internals – specifically, what to look for if you need latency and/or throughput improvements.
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
Piotr Sarna, Software Engineer at Turso
Understanding and tapping your driver’s performance potential.
Piotr Sarna discusses how to get the most out of a driver, particularly from the performance perspective, and select a driver that’s a good fit for your needs.
Technical risks of putting a cache in front of your database– and what to do instead
Teams experiencing subpar latency commonly turn to an external cache to meet the required SLAs. Placing a cache in front of your database might seem like a fast and easy fix, but it often ends up introducing unanticipated complexity, costs, and risks. External caches can be one of the more problematic components of distributed application architecture.
Join this webinar for a technical discussion of the risks associated with using an external cache and a look at how ScyllaDB’s cache implementation simplifies your architecture without compromising latency. We’ll cover:
- Different approaches to caching (pre-caching vs. caching, side cache vs. transparent cache)
- 7 specific reasons why external caching ia a bad choice
- Why Linux’s default caching doesn’t work well for databases
- The advantages & architecture of ScyllaDB's specialized row-based cache
- Real-world examples of why and how teams eliminated their external cache with ScyllaDB
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
Discover how your team can achieve low latency at the extreme scale that your data-intensive applications require. We’ll walk you through an example of how ScyllaDB scales linearly to achieve 1M and then 2M OPS – with <1ms P99 latency. We’ll cover how this works on a sample realtime app (an ML feature store), share best practices for performance, and talk about the most important tradeoffs you’ll need to negotiate.
Join us to learn:
- Why and how to ensure your database takes full advantage of your cloud infrastructure
- What architectural considerations matter most for high throughput and low latency
- Key factors to consider when selecting a high-performance database
7 Reasons Not to Put an External Cache in Front of Your Database.pptxScyllaDB
Teams experiencing subpar latency commonly turn to an external cache to meet the required SLAs. Placing a cache in front of your database might seem like a fast and easy fix, but it often ends up introducing unanticipated complexity, costs, and risks. Caches can be one of the more problematic components of distributed application architecture.
Join this webinar for a technical discussion of the risks associated with using an external cache and a look at an alternative strategy that simplifies your architecture without compromising latency. We’ll cover:
- Different approaches to caching (pre-caching vs. caching, side cache vs. transparent cache)
- 7 specific reasons why external caching can be a bad choice
- Why Linux’s default caching doesn’t work well for databases
- The advantages & architecture of specialized row-based caches
- Real-world examples of why and how teams eliminated their external cache
Expert tips on how to maximize your database potential
If you’re considering or getting started with ScyllaDB, you’re probably intrigued by its potential to achieve high throughput and predictable low latency at a reasonable cost. So how do you ensure that you’re maximizing that potential for your team’s specific workloads and use case?
This webinar offers practical advice for navigating the various decision points you’ll face as you assess whether ScyllaDB is a good fit for your team and later roll it out into production. We’ll cover the most critical considerations, tradeoffs, and recommendations related to:
- Infrastructure selection
- ScyllaDB configuration
- Client-side setup
- Data modeling
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
In this talk, Felipe Mendes, Solutions Architect at ScyllaDB, shares how 4 companies managed their migration. He covers:
Disney+ – No migration needed!
Discord – Shadow cluster
OpenWeb – TTL expiration, cover Load and Stream
MyHeritage – Counters
ShareChat – Bonus: A bit of everything
In this talk, Lubos discusses tools and methods for a successful migration. He covers:
Methods
Data (re)modeling
APIs
Spark Migrator
DS bulk
Tuning
Testing/monitoring
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesScyllaDB
In this talk, Jon discusses practical strategies and issues to consider. He covers:
Reasons for Migrations
DB Functionality
Cost/Licensing
Outdated Technology
Scaling Problems
Technology Evolution
SQL to NoSQL
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
2. Our Agenda for today
• Basics of Monitoring Scylla
• Monitoring Infrastructure
• Understanding Scylla metrics
3. Linux tools
• Linux tools are familiar, widely available, no setup needed
▪iostat, top, sar, netstat, etc.
•Good for tier-1 analysis and overviews
▪but often don’t tell the whole story,
▪and are limited to a node only.
4. The top example
• Scylla uses a polling architecture
▪Scylla running at < 100 % CPU -> definitely underloaded.
▪Scylla running at = 100 % CPU -> impossible to determine.
CPU in use CPU idle
request
poll
period
5. The top example
• Scylla uses a polling architecture
▪Scylla running at < 100 % CPU -> definitely underloaded.
▪Scylla running at = 100 % CPU -> impossible to determine.
CPU in use
poll
period
poll
period
poll
period
9. Not all issues are database issues
• Client can introduce latencies as well
▪most notably, cassandra-stress will do.
▪JHiccup - client instrumentation for client-side hiccups.
10. Our Agenda for today
• Basics of Monitoring Scylla
• Monitoring Infrastructure
• Understanding Scylla metrics
13. How to use those metrics?
• your own infrastructure
▪Whatever works for collectd, works for Scylla
• scyllatop
• prometheus + grafana
14. scyllatop
• easy to use, top-like interface.
• very high resolution
• good for ad-hoc probing
▪not very good for cluster-wide view or time progression
15. List of metrics available
• RESTful API:
$ curl http://scylla-server:10000/collectd | json_reformat
[
…
{
"enable": true,
"id": {
"plugin_instance": "#cpu",
"type_instance": "load",
"type": "gauge",
"plugin": "reactor"
}
},
• scyllatop -l:
▪ includes host metrics
# scylla running with --smp 1
$ scyllatop -l | wc -l
145
16. prometheus + grafana
•easy cluster-wide view, with pre-configured dashboards
•easy system progression view
•easy metric correlation
•adding composite metrics
•harder to setup,
-but we try to make it easier, docker images, pre-loaded dashboards.
-https://github.com/scylladb/scylla-grafana-monitoring
19. Our Agenda for today
• Basics of Monitoring Scylla
• Monitoring Infrastructure
• Understanding Scylla metrics
20. Naming of metrics
Collectd naming scheme:
{host}/{plugin}-{plugin instance}/{type}-{type instance}
• plugin - name of the component
• plugin instance - instance of the component
• type - type of metric’s value
• type instance - name of the metric of given component
22. Naming of metrics
• plugin instances usually correspond to shard numbers.
▪ Example --smp 3:
node1/reactor-0/gauge-load
node1/reactor-1/gauge-load
node1/reactor-2/gauge-load
23. • GAUGE - value as is
▪ collectd types: gauge, bytes, pending_operations, ...
▪ reactor-*/gauge-load, lsa-*/bytes-total_space, ...
• DERIVE - change over time
▪ collectd types: total_operations, derive, ...
▪ database-*/total_operations-total_reads
Data source types
24. Naming of metrics
When exported to prometheus:
collectd_{plugin}_{type} { {plugin}={plugin instance},type={type instance},instance={host} }
E.g.:
collectd_reactor_gauge{reactor=”0”,type=”load”,instance=”node1”}
28. Best reflected by reactor-*/gauge-load
• percentage of time Scylla was executing tasks
▪ excludes busy polling, execution of on-idle tasks, sleeping
▪ Updated every second and reflects past 5 seconds.
• 100 means the server is CPU-bound
CPU Utilization
30. Memory utilization metrics
total memory
standard
allocations
(non-LSA)
LSA free
memtables
(dirty)
cache
lsa-*/bytes-non_lsa_used_space
memory-*/memory-total_memory
lsa-*/bytes-total_space
memory-*/bytes-dirty cache-*/bytes-total
31. Memory utilization metrics
• Useful for detecting:
▪cache getting shrunk down due to pressure from std allocations
▪requests blocking
-only 50 % of memory is allowed to be dirty.
-Requests will block if we can’t clean fast enough.