This document provides an overview of Cassandra's read and write paths. It describes the core components involved, including memtables, SSTables, commitlog, cache service, column family store, and more. It explains how writes are applied to the commitlog and memtable and how reads merge data from memtables and SSTables using the collation controller.
Cassandra by example - the path of read and write requests
This article describes how Cassandra handles and processes requests. It will help you gain a better understanding of Cassandra's internals and architecture. Both the path of a single read request and the path of a single write request are described in detail.
3. Core Components
• Memtable – data in memory (R/W)
• SSTable – data on disk (immutable, R/O)
• CommitLog – data on disk (W/O)
• CacheService (Row Cache and Key Cache) – in-memory caches
• ColumnFamilyStore – logical grouping of “table” data
• DataTracker and View – provide atomicity and grouping of memtable/sstable data
• ColumnFamily – Collection of Cells
• Cell – Name, Value, TS
• Tombstone – Deletion marker indicating TS and deleted cell(s)
4. MemTable
• In-memory data structure consisting of:
• Memory pools (on-heap, off-heap)
• Allocators for each pool
• Size and limit tracking and CommitLog sentinels
• Map of Key → AtomicBTreeColumns
• Atomic copy-on-write semantics for row-data
• Flush to disk logic is triggered when pool passes ratio of usage relative
to user-configurable threshold
• Memtable w/largest ratio of used space (either on or off heap) is flushed
to disk
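The "atomic copy-on-write semantics" bullet above is the heart of memtable writes: a writer never mutates the current row in place, it builds a modified copy and publishes it with a compare-and-swap. A minimal Java sketch of that pattern (CopyOnWriteRow is a hypothetical class using a TreeMap copy; Cassandra's real AtomicBTreeColumns uses a persistent BTree instead):

import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Simplified stand-in for one partition's columns inside a memtable.
class CopyOnWriteRow {
    private final AtomicReference<NavigableMap<String, String>> ref =
            new AtomicReference<>(new TreeMap<>());

    void addColumn(String name, String value) {
        while (true) {
            NavigableMap<String, String> current = ref.get();
            NavigableMap<String, String> updated = new TreeMap<>(current); // copy
            updated.put(name, value);                                      // modify the copy
            if (ref.compareAndSet(current, updated))                       // publish atomically
                return;                                                    // success
            // lost the race with a concurrent writer: retry against the new snapshot
        }
    }

    NavigableMap<String, String> snapshot() {
        return ref.get(); // readers always see a complete, consistent view
    }
}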
5. On heap vs. Off heap Memtables: an overview
• http://www.datastax.com/dev/blog/off-heap-memtables-in-cassandra-2-1
• https://issues.apache.org/jira/browse/CASSANDRA-6689
• https://issues.apache.org/jira/browse/CASSANDRA-6694
• memtable_allocation_type
• offheap_buffers moves the cell name and value to DirectBuffer objects. The values are still
“live” Java buffers. This mode only reduces heap significantly when you are storing large
strings or blobs
• offheap_objects moves the entire cell off heap, leaving only the NativeCell reference
containing a pointer to the native (off-heap) data. This makes it effective for small values
like ints or uuids as well, at the cost of having to copy it back on-heap temporarily when
reading from it.
• Default in 2.1 is heap buffers
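For reference, the allocation mode is picked in cassandra.yaml. A hedged 2.1-style fragment (setting names as shipped in the 2.1 yaml, values purely illustrative):

# one of: heap_buffers (2.1 default) | offheap_buffers | offheap_objects
memtable_allocation_type: offheap_objects
# optional explicit pool limits; when unset, Cassandra derives them from the heap size
# memtable_heap_space_in_mb: 2048
# memtable_offheap_space_in_mb: 2048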
6. On heap vs. Off heap: continued
• Why?
• Reduces sizes of objects in memory – no more ByteBuffer overhead
• More data fitting in memory == better performance
• Code changes that support it:
• MemtablePools allow on vs. off-heap allocation (and Slab, for that matter)
• MemtableAllocators to allow differentiating between on-heap and off-heap
allocation
• DecoratedKey and *Cells changed to interfaces to have different allocation
implementations based on native vs. heap
7. SSTable
• Ordered-map of KVP
• Immutable
• Consists of 3 files:
• Bloom Filter: optimization to determine if the Partition Key you’re
looking for is (probably) in this sstable
• Index file: contains offset into data file, generally memory mapped
• Data file: contains data, generally compressed
• Read by SSTableReader
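A toy Java sketch of how those three files cooperate on a read (this is just the idea, not SSTableReader; ToySSTable and its in-memory maps are invented for illustration): the Bloom filter cheaply rules the key out, the index supplies an offset, and only then is the data file touched.

import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

class ToySSTable {
    private final BitSet bloom = new BitSet(1 << 16);        // "Filter" component
    private final Map<String, Long> index = new HashMap<>(); // key -> offset ("Index" component)
    private final Map<Long, String> data = new HashMap<>();  // offset -> row ("Data" component)

    private int[] hashes(String key) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return new int[] { Math.abs(h1 % (1 << 16)), Math.abs(h2 % (1 << 16)) };
    }

    void put(String key, long offset, String row) {          // done once at write time
        for (int h : hashes(key)) bloom.set(h);
        index.put(key, offset);
        data.put(offset, row);
    }

    String read(String key) {
        for (int h : hashes(key))
            if (!bloom.get(h)) return null;                   // definitely not here: skip this sstable
        Long offset = index.get(key);                         // index lookup (may still miss: false positive)
        return offset == null ? null : data.get(offset);      // "seek" into the data file
    }
}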
8. CommitLog
• Append-only file structure – provides interim durability for writes while they're living in Memtables and haven't been flushed to sstables
• Has sync logic to determine the level of durability to disk you want - either
PeriodicCommitLogService or BatchCommitLogService
• Periodic: (default) checks to see if it hit window limit, if so, block and wait for sync to catch up
• Batch: no ack until fsync to disk. Waits for a specific window before hitting fsync to coalesce
• Singleton – façade for commit log operations
• Consists of multiple components
• CommitLog.java: interface to subsystem
• CommitLogSegmentManager.java: segment allocation and management
• CommitLogArchiver.java: user-defined commands pre/post flush
• CommitLogMetrics.java
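The periodic vs. batch choice maps directly to cassandra.yaml. A hedged example of each mode (2.x setting names, illustrative values):

# periodic (default): ack writes immediately, fsync every N ms
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
# batch: do not ack until fsync, coalescing writes inside a small window
# commitlog_sync: batch
# commitlog_sync_batch_window_in_ms: 2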
9. CacheService.java
• In-memory caching service to optimize lookups of hot data
• Contains three caches:
• keyCache
• rowCache
• counterCache
• See:
• AutoSavingCache.java
• InstrumentingCache.java
• Tunable per table; global limits in cassandra.yaml (keys to cache, key cache size in MB, rows to cache, row cache size in MB)
• Defaults to keys only, can enable row cache via CQL
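The per-table tuning mentioned above is done through CQL. An illustrative 2.1-style statement for a hypothetical table (my_keyspace.my_table is made up; the global size limits still live in cassandra.yaml as key_cache_size_in_mb / row_cache_size_in_mb):

-- cache all keys plus the first 100 rows of each partition
ALTER TABLE my_keyspace.my_table
WITH caching = { 'keys' : 'ALL', 'rows_per_partition' : '100' };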
10. ColumnFamilyStore.java
• Contains logic for a “table”
• Holds DataTracker
• Creating and removing sstables on disk
• Writing / reading data
• Cache initialization
• Secondary index(es)
• Flushing memtables to sstables
• Snapshots
• And much more
11. CFS: DataTracker and View
• DataTracker allows for atomic operations on a “view” of a Table (ColumnFamilyStore)
• Contains various logic surrounding Memtables and flushing, SSTables and
compaction, and notification for subscribers on changes to SSTableReaders
• 1 DataTracker per CFS, 1 AtomicReference<View> per DataTracker
• View consists of current Memtable, Memtables pending flush, SSTables for the CFS,
and SSTables being actively compacted
• Currently active Memtable is atomically switched out in:
• DataTracker.switchMemtable(boolean truncating)
12. ColumnFamily.java
• A sorted map of columns
• Abstract class, extended by:
• ArrayBackedSortedColumns
• Array backed
• Non-thread-safe
• Good for iteration, adding cells (especially if in sorted order)
• AtomicBTreeColumns (memtable only)
• Btree backed
• Thread-safe w/atomic CAS
• Logarithmic complexity on operations
• Logic to add / retrieve columns, counters, tombstones, atoms
15. Overview – the Read Path
(flow diagram) Coordinator → MessagingService → Keyspace → ColumnFamilyStore → check Row Cache. On a hit, the cached ColumnFamily is returned; on a miss, the CollationController reads and merges data from the Memtable and SSTables (using the Key Cache to seek to a cached index position on a hit, or binary-scanning the index and updating the cache on a miss), then updates the Row Cache and returns the results.
16. Read-specific primitive: QueryFilter
• Wraps IDiskAtomFilter
• IDiskAtomFilter: used to get columns from Memtable, SSTable, or SuperColumn
• IdentityQueryFilter, NamesQueryFilter, SliceQueryFilter
• Contains a variety of iterators to collate on-disk contents, gather tombstones, reduce (merge) Cells with the same name, etc.
• See:
• collateColumns(…)
• gatherTombstones(…)
• getReducer(final Comparator<Cell> comparator)
17. Read-specific class: SSTableReader
• Has 2 SegmentedFiles, ifile and dfile, for index and data respectively
• Contains a Key Cache, caching positions of keys in the SSTR
• Contains an IndexSummary w/sampling of the keys that are in the table
• Binary search used to narrow down location in file via IndexSummary
• getIndexScanPosition(RowPosition key)
• Short running operations guarded by ColumnFamilyStore.readOrdering
• See OpOrder.java – producer/consumer synchronization primitive to coordinate readers
w/flush operations
• Access is reference counted via acquireReference() and releaseReference() for long
running operations (See CASSANDRA-7705 re: moving away from this)
• Provides methods to retrieve an SSTableScanner which gives you access to OnDiskAtoms
via iterators and holds RandomAccessReaders on the raw files on disk
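The IndexSummary lookup is essentially a binary search over a sorted sample of partition keys: find the greatest sampled key that sorts at or before the requested one and start scanning the on-disk index from its recorded position. A small hypothetical sketch of that idea (ToyIndexSummary is invented, not the real class):

import java.util.Arrays;

// Simplified IndexSummary: every Nth partition key plus its offset into the index file.
class ToyIndexSummary {
    private final String[] sampledKeys;   // sorted
    private final long[] indexPositions;  // position of sampledKeys[i] in the index file

    ToyIndexSummary(String[] sampledKeys, long[] indexPositions) {
        this.sampledKeys = sampledKeys;
        this.indexPositions = indexPositions;
    }

    // Returns the index-file position to start scanning from, or -1 if the key
    // sorts before every sampled key (i.e. it cannot be in this sstable's range).
    long getIndexScanPosition(String key) {
        int i = Arrays.binarySearch(sampledKeys, key);
        if (i < 0) i = -i - 2;            // insertion point minus 1 = greatest sample <= key
        return i < 0 ? -1 : indexPositions[i];
    }
}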
18. Overview – the Read Path
(read-path flow diagram, repeated from slide 15)
19. ReadVerbHandler and ReadCommands
• Messages are received by the MessagingService and passed to the ReadVerbHandler for appropriate
verbs
• ReadCommands:
• SliceFromReadCommand
• Relies on SliceQueryFilter, uses a range of columns defined by a ColumnSlice
• SliceByNamesReadCommand
• Relies on NamesQueryFilter, uses a column name to retrieve a single column
• Both diverge in calls and converge back into implementers of ColumnFamily
• ArrayBackedSortedColumns, AtomicBTreeColumns
// Keyspace.java (simplified): route the filter to the right ColumnFamilyStore
public Row getRow(QueryFilter filter)
{
    ColumnFamilyStore cfStore = getColumnFamilyStore(filter.getColumnFamilyName());
    ColumnFamily columnFamily = cfStore.getColumnFamily(filter);
    return new Row(filter.key, columnFamily);
}
20. Overview – the Read Path
(read-path flow diagram, repeated from slide 15)
21. RowCache
• CFS.getThroughCache(UUID cfId, QueryFilter filter)
• After retrieving our CFS, the first thing we check is our Row Cache to see if the row is
already merged, in memory, and ready to go
• If we get a cache hit on the key, we’ll:
• Confirm it’s not just a sentinel of someone else in flight. If so, we query w/out caching
• If the data for the key is valid, we filter it down to the query we have in flight and return
those results as it’ll have >= the count of Cells we’re looking for
• On cache miss:
• Eventually cache all top level columns for the key queried if configured to do so (after
Collation)
• Cache results of user query if it satisfies the cache config params
• Extend the results of the query to satisfy the caching requirements of the system
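The sentinel mentioned above is the usual read-through-cache trick: claim the key as "being loaded" so racing readers don't all populate the cache, and only publish real data over your own sentinel. A simplified, hypothetical sketch (ToyRowCache and Loader are invented names, not CFS.getThroughCache):

import java.util.concurrent.ConcurrentHashMap;

// Simplified read-through cache with an in-flight sentinel per key.
class ToyRowCache {
    private static final Object SENTINEL = new Object();
    private final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();

    interface Loader { Object loadAndMerge(String key); } // stands in for the collation read

    Object get(String key, Loader loader) {
        Object cached = cache.get(key);
        if (cached != null && cached != SENTINEL)
            return cached;                                   // cache hit with real, merged data
        boolean weAreLoading = (cached == null)
                && cache.putIfAbsent(key, SENTINEL) == null; // try to claim the load
        Object row = loader.loadAndMerge(key);               // merge memtables + sstables
        if (weAreLoading && row != null)
            cache.replace(key, SENTINEL, row);               // publish only over our own sentinel
        return row;                                          // others read without populating the cache
    }
}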
22. Overview – the Read Path
(read-path flow diagram, repeated from slide 15)
23. CollationController.collect*Data (…)
• The data we’re looking for may be in a Memtable, an SSTable, multiple of either, or a
combination of all of them.
• The logic to query this data and merge our results exists in CollationController.java:
• collectAllData
• collectTimeOrderedData
• High level flow:
1. Get data from memtables for the QueryFilter we’re processing
2. Get data from sstables for the QueryFilter we’re processing
3. Merge all the data together, keeping the most recent
4. If we iterated across enough sstables, “hoist up” the now defragmented data into a memtable,
bypassing CommitLog and Index update (collectTimeOrderedData only)
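Stripped of the Cassandra plumbing, step 3 is simply "for every cell name, keep the copy with the newest timestamp". A minimal hypothetical Java sketch of the collect-and-merge flow (ToyCollation is an invented name; real reconciliation also folds in tombstones and counters):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ToyCollation {
    // One named cell with a write timestamp.
    record Cell(String name, String value, long timestamp) {}

    // sources = the ColumnFamily fragments found for this key in each memtable and sstable.
    static Map<String, Cell> collate(List<Map<String, Cell>> sources) {
        Map<String, Cell> result = new HashMap<>();
        for (Map<String, Cell> source : sources) {             // steps 1 + 2: per-memtable/sstable data
            for (Cell candidate : source.values()) {
                Cell existing = result.get(candidate.name());
                if (existing == null || candidate.timestamp() > existing.timestamp())
                    result.put(candidate.name(), candidate);   // step 3: keep the most recent write
            }
        }
        return result;
    }
}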
24. Overview – the Read Path
(read-path flow diagram, repeated from slide 15)
25. CollationController merging: memtables
• Fairly straightforward operations on memtables in the view:
• Check all memtables to see if they have a ColumnFamily that matches our filter.key
• Add all columns to our result ColumnFamily that match
• Keep a running tally of the mostRecentRowTombstone for use in next step.
26. Overview – the Read Path
(read-path flow diagram, repeated from slide 15)
27. CollationController merging: sstables
• We have a few optimizations available for merging in data from sstables:
• Sort the collection of SSTables by the max timestamp present
• Iterate across the SSTables
• Skipping any that are older than the most recent tombstone we’ve seen
• Create a “reduced” name filter by removing columns from our filter where we
have fresher data than the SSTR’s max Timestamp
• Get iterator from SSTR for Atoms matching that reduced name filter
• Add any matching OnDiskAtoms to our result set (BloomFilter excludes via
iterator with SSTR.getPosition() call)
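Those optimizations amount to ordering sstables by freshness and bailing out as early as possible. A rough, hypothetical sketch of the loop (ToySSTableRead and Source are invented; the real collectTimeOrderedData works on OnDiskAtom iterators):

import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ToySSTableRead {
    record Cell(String name, String value, long timestamp) {}

    // Minimal view of one sstable for this sketch.
    interface Source {
        long maxTimestamp();                          // newest write this sstable can contain
        List<Cell> read(Set<String> names);           // bloom filter + index seek happen in here
    }

    static Map<String, Cell> collect(List<Source> sstables, Set<String> wanted,
                                     long mostRecentRowTombstone) {
        Map<String, Cell> best = new HashMap<>();
        sstables.sort(Comparator.comparingLong(Source::maxTimestamp).reversed()); // freshest first
        for (Source s : sstables) {
            if (s.maxTimestamp() < mostRecentRowTombstone)
                break;                                // whole sstable predates the row tombstone: stop
            Set<String> reduced = new HashSet<>();    // drop names we already have fresher data for
            for (String name : wanted) {
                Cell have = best.get(name);
                if (have == null || have.timestamp() < s.maxTimestamp()) reduced.add(name);
            }
            if (reduced.isEmpty())
                break;                                // nothing this (or any older) sstable can still add
            for (Cell c : s.read(reduced)) {
                Cell have = best.get(c.name());
                if (have == null || c.timestamp() > have.timestamp()) best.put(c.name(), c);
            }
        }
        return best;
    }
}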
28. Overview – the Read Path
(read-path flow diagram, repeated from slide 15)
31. Overview – the Write Path
(flow diagram) MessagingService → Keyspace → is the CommitLog enabled for this mutation? If yes, write to the CommitLog; if no, skip it. Then write to the Memtable, apply the SecondaryIndexManager.Updater, and invalidate the Row Cache.
32. MutationVerbHandler, Mutation.apply
• Contains Keyspace name
• DecoratedKey
• Map of cfId to ColumnFamily of modifications to perform
• MutationVerbHandler → Mutation.apply() → Keyspace.apply() → ColumnFamilyStore.apply()
33. Overview – the Write Path
(write-path flow diagram, repeated from slide 31)
34. The CommitLog ecosystem
• CommitLogSegmentManager: allocation and recycling of CommitLogSegments
• CommitLogSegment: file on disk
• CommitLogArchiver: allows user-defined archive and restore commands to be run
• Reference conf/commitlog_archiving.properties
• An AbstractCommitLogService, one of either:
• BatchCommitLogService – writer waits on sync to complete before returning
• PeriodicCommitLogService – Check if sync is behind, if so, register w/signal and
block until lastSyncedAt catches up
35. CommitLogSegmentManager (CLSM): overview
• Contains 2 collections of CommitLogSegments
• availableSegments: Segments ready to be used
• activeSegments: Segments that are “active” and contain unflushed data
• Only 1 active CommitLogSegment is in use at any given time
• Manager thread is responsible for maintaining active vs. available
CommitLogSegments and can be woken up by other contexts when maintenance is
needed
36. CLSM: allocation on the write path
• During CommitLog.add(…), a writer asks for allocated space for their mutation from
the CommitLogSegmentManager
• This is passed to the active CommitLogSegment’s allocate(…) method
• CommitLogSegment.allocate(int size) spins non-blocking until the space in the
segment is allocated, at which time it marks it dirty
• If the allocate(…) call returns null indicating we need a new CommitLogSegment:
• CommitLogSegment.advanceAllocatingFrom(CommitLogSegment old)
• Goal is to move CLS from available to active segments so we have more CLS to work with
• If it fails to get an available segment, the manager thread is woken back up to do some
maintenance, be it recycling or allocating a new CLS
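The non-blocking "spin until allocated" behaviour is a bump-the-pointer allocation guarded by CAS. A simplified hypothetical sketch (ToyCommitLogSegment is invented; the real CommitLogSegment also hands back a buffer slice and tracks dirty intervals per column family):

import java.util.concurrent.atomic.AtomicInteger;

// Simplified commit log segment: hand out [offset, offset + size) slices of a fixed-size file.
class ToyCommitLogSegment {
    private final int capacity;
    private final AtomicInteger allocatedTo = new AtomicInteger(0);

    ToyCommitLogSegment(int capacity) { this.capacity = capacity; }

    // Returns the start offset for the mutation, or -1 if the segment is full
    // (the caller then asks the segment manager to advance to a fresh segment).
    int allocate(int size) {
        while (true) {
            int current = allocatedTo.get();
            int next = current + size;
            if (next > capacity) return -1;              // doesn't fit: segment exhausted
            if (allocatedTo.compareAndSet(current, next))
                return current;                          // won the race: space is ours, mark it dirty
            // lost the CAS to a concurrent writer: spin and retry
        }
    }
}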
37. CLSM: manager thread, new segments, recycling
• Constructor creates a runnable that blocks on segmentManagementTasks
• Task can either be null indicating we’re out of space (allocate path) or a segment that’s
flushed and ready for recycle
• If there’s no available segments, we create new CommitLogSegments and add them to
availableSegments
• hasAvailableSegmentsWaitQueue is signaled by this to wake any blocked writes waiting for allocation
• When our CommitLog usage is approaching our allowable "limit":
• If our total used size is greater than the size allowed
• CommitLogSegmentManager.flushDataFrom on a list of activeSegments
• Force flush on any CFS that’s dirty
• Which switches Memtables and flushes to SSTable – more on this later
38. Overview – the Write Path
(write-path flow diagram, repeated from slide 31)
39. Memtable writes
• We attempt to get the partition for the given key if it exists
• If not, we allocate space for a new key and put an empty entry in the memtable for it,
backing that out if we race and someone else got there first on allocation
• Once we have space allocated, we call addAllWithSizeDelta
• Add the record to a new BTree and CAS it into the existing Holder
• Updates secondary indexes
• Finalize some heap tracking in the ColumnUpdater used by the BTree to perform updates
• Further reading:
• AtomicBTreeColumns.java (specifically addAllWithSizeDelta)
• BTree.java
40. MemtablePool
• Single MEMORY_POOL instance across entire DB
• Get an allocator to the memory pool during construction of a memtable
• Interface covering management of an on-heap and off-heap pool via SubPool
• HeapPool: On heap ByteBuffer allocations and release, subject to GC w/object overhead
• NativePool: Blend of on and off heap based on limits passed in
• Off heap allocations and release through NativeAllocator, calls to Unsafe
• SlabPool: Blend of on and off heap based on limits passed in
• Allocated in large chunks by SlabAllocator (1024*1024)
• MemtablePool.SubPool / SubAllocator:
• Contains various atomically updated longs tracking:
• Limits on allocation
• Currently allocated amounts
• Currently reclaiming amounts
• Threshold for when to run Cleaner thread
• Spin and CAS for updates on the above on allocator calls in addAllWithSizeDelta
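The sub-pool bookkeeping is a handful of atomically updated counters plus a threshold test. A hypothetical sketch of the idea (ToySubPool is invented; the ratio corresponds to memtable_cleanup_threshold and the trigger mirrors the check described on the MemtableCleanerThread slide below):

import java.util.concurrent.atomic.AtomicLong;

// Simplified sub-pool: tracks how much memory memtables own and when to ask for a flush.
class ToySubPool {
    private final long limitBytes;          // e.g. the memtable heap/off-heap space limit
    private final double cleanupThreshold;  // e.g. memtable_cleanup_threshold
    private final AtomicLong allocated = new AtomicLong();   // owned by live memtables
    private final AtomicLong reclaiming = new AtomicLong();  // owned by memtables already being flushed

    ToySubPool(long limitBytes, double cleanupThreshold) {
        this.limitBytes = limitBytes;
        this.cleanupThreshold = cleanupThreshold;
    }

    // Called from the write path as cells are added (cf. addAllWithSizeDelta).
    void acquired(long bytes) {
        long nowUsed = allocated.addAndGet(bytes);
        // same shape as the cleaner trigger: used > reclaiming + limit * threshold
        if (nowUsed > reclaiming.get() + limitBytes * cleanupThreshold)
            requestFlushOfLargestMemtable();
    }

    void startedReclaiming(long bytes) { reclaiming.addAndGet(bytes); }  // memtable picked for flush
    void finishedReclaiming(long bytes) {                                // flush done, memory released
        reclaiming.addAndGet(-bytes);
        allocated.addAndGet(-bytes);
    }

    private void requestFlushOfLargestMemtable() {
        // in Cassandra this wakes the cleaner, which runs ColumnFamilyStore.FlushLargestColumnFamily
    }
}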
41. Overview – the Write Path
(write-path flow diagram, repeated from slide 31)
42. Secondary Indexes: an overview
• Essentially a separate table stored on disk / in memtable
• Contains a ConcurrentNavigableMap of ByteBuffer → SecondaryIndex
• There are quite a few SecondaryIndex implementations in the code base, ex:
• PerRowSecondaryIndex
• PerColumnSecondaryIndex
• KeysIndex
• On Write Path:
• SecondaryIndex updater passed down through to ColumnUpdater ctor
• On ColumnUpdater.apply(), insert for secondary index is called
• Essentially amounts to a 2nd write on another “table”
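The "2nd write on another table" can be pictured as maintaining a reverse map alongside the base data. A toy, hypothetical sketch (ToyIndexedTable is invented; the real implementation keeps index entries in a separate, internal column family):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy illustration of a keys-style secondary index: indexed value -> partition keys.
class ToyIndexedTable {
    private final Map<String, String> base = new HashMap<>();        // key -> indexed column value
    private final Map<String, Set<String>> index = new HashMap<>();  // value -> keys (the "2nd table")

    void write(String key, String indexedValue) {
        String previous = base.put(key, indexedValue);               // 1st write: the base table
        if (previous != null && index.containsKey(previous))
            index.get(previous).remove(key);                         // clean up the stale index entry
        index.computeIfAbsent(indexedValue, v -> new HashSet<>()).add(key); // 2nd write: the index
    }

    Set<String> lookupByValue(String indexedValue) {                 // what "WHERE col = value" uses
        return index.getOrDefault(indexedValue, Set.of());
    }
}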
43. Overview – the Write Path
(write-path flow diagram, repeated from slide 31)
45. CLSM.activeSegments
(flow diagram) Flushing a Memtable in a ColumnFamilyStore runs an SSTableWriter that produces the SSTable and opens an SSTableReader for it. The flush then calls CommitLog.discardCompletedSegments(cfId, lastReplayPosition), which stops at the position of the flush and walks the active CommitLogSegments: the segment currently being allocated from is skipped, a segment still dirty for other column families just has the flushed cfId removed, and a segment whose last dirty cfId was removed is recycled.
46. MemtableCleanerThread: starting a flush
• When the MemtableAllocator adjusts the size of the data it has acquired, the MemtablePool checks whether or not we need to flush to free up space in memory
• If our used memory is greater than the total reclaiming memory + the limit * ratio defined in conf.memtable_cleanup_threshold, a memtable needs to be cleaned
• Cleaner thread is currently: ColumnFamilyStore.FlushLargestColumnFamily()
• We find the memtable with the largest Ownership ratio as determined by the currently
owned memory vs. limit, taking the max of either on or off heap
• Signals to CommitLog to discard completed segments on PostFlush stage of flush
47. Memtable Flushing
• Reference ColumnFamilyStore$Flush
• 1st, switch out memtables in CFS.DataTracker.View so new ops go to new memtable
• Sets lifecycle in memtable to discarding
• Runs the FlushRunnable in the Memtable
• Memtable.writeSortedContents
• Uses SSTableWriter to write sorted contents to disk
• Returns SSTableReader created by SSTableWriter.closeAndOpenReader
• Memtable.setDiscarded() → MemtableAllocator.setDiscarded()
• Lifecycle to Discarded
• Free up all memory from the allocator for this memtable
48. Memtable Flushing: the commit log
• ColumnFamilyStore$PostFlush
• All relative to a timestamp of the most recent data in the flushed memtable
• Record sentinel for when this cf was cleaned (to be used later if it was active and we
couldn’t purge at time of flush)
• Walk through CommitLogSegments and remove dirty cfid
• Unless it’s actively being allocated from
• If the CLS is no longer in use:
• Remove it from our activeSegments
• Queue a task for Management thread to wake up and recycle the segment
49. Switching out memtables
• CFS.switchMemtableIfCurrent / CFS.switchMemtable
• There’s some complex non-blocking write-barrier operations on
Keyspace.writeOrder to allow us to wait for writes to finish in this context before
swapping out with new memtables regardless of dirty status
• Reference: OpOrder.java, OpOrder.Barrier
• Write sorted contents to disk (Memtable.FlushRunnable.runWith(File sstableDirectory))
• cfs.replaceFlushed, swapping the memtable with the new SSTableReader returned
from writeSortedContents