The document summarizes a workshop on Cassandra data modeling. It discusses four use cases: (1) modeling clickstream data by storing sessions and clicks in separate column families, (2) modeling a rolling time window of data points by storing each point in a column with a TTL, (3) modeling rolling counters by storing counts in columns indexed by time bucket, and (4) using transaction logs to achieve eventual consistency when modeling many-to-many relationships by serializing transactions and deleting logs after commit. The document provides recommendations and alternatives for each use case.
2. Overview
● Hopefully interactive
● Use cases submitted via Google Moderator, email, IRC, etc.
● Interesting and/or common requests in the slides to get us started
● Bring up others if you have them!
3. Data Modeling Goals
● Keep data queried together on disk together
● In a more general sense, think about the efficiency of querying your data and work backward from there to a model in Cassandra
● Don't try to normalize your data (contrary to many use cases in relational databases)
● Usually better to keep a record that something happened as opposed to changing a value (not always advisable or possible)
4. ClickStream Data
(use case #1)
● A ClickStream (in this context) is the sequence of actions a user of an application performs
● Usually this refers to clicking links in a WebApp
● Useful for ad selection, error recording, UI/UX improvement, A/B testing, debugging, et cetera
● Not a lot of detail in the Google Moderator request on what the purpose of collecting the ClickStream data was, so I made some up
5. ClickStream Data Defined
● Record actions of a user within a session for debugging purposes if the app/browser/page/server crashes
6. Recording Sessions
● CF for sessions a user has had
  ● Row Key is user name/id
  ● Column Name is session id (TimeUUID)
  ● Column Value is empty (or length of session, or some aggregated details about the session after it ended)
● CF for actual sessions
  ● Row Key is TimeUUID session id
  ● Column Name is timestamp/TimeUUID of each click
  ● Column Value is details about that click (serialized; see the write-path sketch below)
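To make the write path concrete, here is a minimal, driver-free sketch using plain Python dicts as stand-ins for the two column families; the helper names, user id, and JSON click payload are hypothetical, not from the talk.

import json
import time
import uuid

# In-memory stand-ins for the two column families: a row is a dict of
# column name -> column value, as on the slide.
user_sessions = {}   # UserSessions CF: user id -> {session TimeUUID: empty/agg}
sessions = {}        # Sessions CF: session TimeUUID -> {click TimeUUID: blob}

def start_session(user_id):
    session_id = uuid.uuid1()                  # TimeUUID column name
    user_sessions.setdefault(user_id, {})[session_id] = b""   # empty value
    sessions[session_id] = {}
    return session_id

def record_click(session_id, details):
    click_id = uuid.uuid1()                    # TimeUUIDs preserve click order
    sessions[session_id][click_id] = json.dumps(details)      # serialized click

sid = start_session("user42")
record_click(sid, {"page": "/home", "ts": time.time()})
record_click(sid, {"page": "/cart", "ts": time.time()})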
7. UserSessions Column Family

         Session_01 (TimeUUID) | Session_02 (TimeUUID) | Session_03 (TimeUUID)
userId   (empty/agg)           | (empty/agg)           | (empty/agg)
● Most recent session
● All sessions for a given time period
8. Sessions Column Family

                       timestamp_01   | timestamp_02   | timestamp_03
SessionId (TimeUUID)   ClickData      | ClickData      | ClickData
                       (json/xml/etc) | (json/xml/etc) | (json/xml/etc)
● Retrieve entire session's ClickStream (row)
● Order of clicks/events preserved
● Retrieve ClickStream for a slice of time within the session
● First action taken in a session
● Most recent action taken in a session
● Why JSON/XML/etc?
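A sketch of these slice queries over the in-memory Sessions row from the previous sketch; in Cassandra each would be a get_slice with TimeUUID start/finish bounds. The time_slice helper is hypothetical.

import json
import uuid

def time_slice(session_row, start=None, finish=None, reverse=False):
    """Return (click TimeUUID, click) pairs ordered by the embedded timestamp."""
    cols = sorted(session_row.items(), key=lambda kv: kv[0].time, reverse=reverse)
    return [(cid, json.loads(blob)) for cid, blob in cols
            if (start is None or cid.time >= start.time)
            and (finish is None or cid.time <= finish.time)]

session_row = {uuid.uuid1(): json.dumps({"page": p}) for p in ("/home", "/cart")}
print(time_slice(session_row)[0])                  # first action in the session
print(time_slice(session_row, reverse=True)[0])    # most recent action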
10. Of Course
(depends on what you want to do)
● Secondary Indexes
● All Sessions in one row
● Track by time of activity instead of session
11. Secondary Indexes Applied
● Drop the UserSessions CF and use secondary indexes
● Uses a “well known” column to record the user in the row; a secondary index is created on that column
● Doesn't work so well when storing aggregates about sessions in the UserSessions CF
● Better when you want to retrieve all sessions a user has had
12. All Sessions In One Row Applied
● Row Key is userId
● Column Name is a composite of timestamp and sessionId
● Can efficiently request activity of a user across all sessions within a specific time range (see the sketch below)
● Rows could potentially grow quite large, so be careful
● Reads will almost always require at least two seeks on disk
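A sketch of the composite column ordering, assuming Cassandra CompositeType-style comparison: a Python tuple (timestamp, session id) sorts the same way, so a sorted list stands in for the one wide row per user. All names here are made up for illustration.

import bisect
import uuid

# One wide row per user: sorted list of ((timestamp, session id), click data).
row = []

def record(ts, session_id, click_data):
    bisect.insort(row, ((ts, session_id), click_data))

def activity_between(t1, t2):
    """All clicks across all sessions with t1 <= timestamp <= t2."""
    lo = bisect.bisect_left(row, ((t1,),))                         # before any sid at t1
    hi = bisect.bisect_right(row, ((t2, uuid.UUID(int=(1 << 128) - 1)),))
    return row[lo:hi]

s1, s2 = uuid.uuid1(), uuid.uuid1()
record(1.0, s1, "click-a"); record(2.0, s2, "click-b"); record(3.0, s1, "click-c")
print(activity_between(1.5, 3.0))   # click-b and click-c, in time order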
13. Time Period Partitioning Applied
● Row Key is a composite of userId and time “bucket”
  ● e.g. jan_2011 or jan_01_2011 for month or day buckets respectively
● Column Name is TimeUUID of click
● Column Value is serialized click data
● Avoids always requiring multiple seeks when the user has old data but only recent data is requested
● Easy to lazily aggregate old activity
● Can still efficiently request activity of a user across all sessions within a specific time range (bucket-key sketch below)
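A sketch of building and enumerating month-bucket row keys; the jan_2011 format follows the slide, while the function names and user id are hypothetical.

import datetime

def bucket_key(user_id, day):
    return f"{user_id}:{day.strftime('%b_%Y').lower()}"   # e.g. "user42:jan_2011"

def bucket_keys_for_range(user_id, start, end):
    """Row keys covering [start, end] at month granularity."""
    keys, cur = [], datetime.date(start.year, start.month, 1)
    while cur <= end:
        keys.append(bucket_key(user_id, cur))
        # jump to the first day of the next month
        cur = (cur.replace(day=28) + datetime.timedelta(days=4)).replace(day=1)
    return keys

print(bucket_keys_for_range("user42",
                            datetime.date(2011, 1, 15), datetime.date(2011, 3, 2)))
# ['user42:jan_2011', 'user42:feb_2011', 'user42:mar_2011']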
14. Rolling Time Window Of Data Points
(use case #2)
● The example given was similar to RRDTool
● Essentially, store a series of data points within a rolling window
● A common request from Cassandra users, for this and/or similar
15. Data Points Defined
● Each data point has a value (or multiple values)
● Each data point corresponds to a specific point in time or an interval/bucket (e.g. the 5th minute of the 17th hour on some date)
16. Time Window Model

                             TimeUUID0 | TimeUUID1 | TimeUUID2
s7:rt (System7:RenderTime)   0.051     | 0.014     | 0.173

(some request took 0.014 seconds to render)

● Row Key is the id of the time window data you are tracking (e.g. server7:render_time)
● Column Name is the timestamp (or TimeUUID) the event occurred at
● Column Value is the value of the event (e.g. 0.051)
17. The Details
● Cassandra TTL values are key here
  ● When you insert each data point, set the TTL to the max time range you will ever request; there is very little overhead to expiring columns
● When querying, construct TimeUUIDs for the min/max of the time range in question and use them as the start/end in your get_slice call (see the sketch below)
● Consider partitioning the rows by a known time period (e.g. “year”) if you plan on keeping a long history of data (NB: requires slightly more complex logic in the app if a time range spans such a period)
● Very efficient queries for any window of time
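A sketch of constructing those min/max TimeUUID bounds. The 0x01B21DD213814000 offset is the standard count of 100 ns ticks between the UUID epoch (1582-10-15) and the Unix epoch; picking extreme clock-seq/node values for the bounds is an assumption that matches TimeUUIDs comparing by embedded timestamp first.

import time
import uuid

# 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def timeuuid_bound(unix_ts, lowest=True):
    """Version-1 UUID sorting before (lowest=True) or after (lowest=False)
    all real TimeUUIDs generated at unix_ts; usable as get_slice start/finish."""
    ticks = int(unix_ts * 10**7) + UUID_EPOCH_OFFSET
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000       # version 1
    if lowest:
        clock_seq_hi, clock_seq_low, node = 0x80, 0x00, 0     # variant bits 10
    else:
        clock_seq_hi, clock_seq_low, node = 0xBF, 0xFF, 0xFFFFFFFFFFFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi, clock_seq_low, node))

now = time.time()
start = timeuuid_bound(now - 3600)          # window: the last hour
finish = timeuuid_bound(now, lowest=False)
# get_slice(row_key, start, finish) returns exactly the points in the window;
# each point was inserted with a TTL of the max window you will ever request.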
18. Rolling Window Of Counters
(use case #3)
● “How to model rolling time window that contains counters with time buckets of monthly (12 months), weekly (4 weeks), daily (7 days), hourly (24 hours)? Example would be; how many times user logged into a system in last 24 hours, last 7 days ...”
● Timezones and the “rolling window” are what make this interesting
19. Rolling Time Window Details
● One row for every granularity you want to track (e.g. day, hour)
● Row Key consists of the granularity, metric, user and system
● Column Name is a “fixed” time bucket on UTC time
● Column Values are counts of the logins in that bucket
● get_slice calls return multiple counters, which are then summed up
20. Rolling Time Window Counter Model

user3:system5:logins:by_day
            20110107 | ... | 20110523
U3:S5:L:D   2        | ... | 7

(2 logins on Jan 7th 2011 for user 3 on system 5;
 7 logins on May 23rd 2011 for user 3 on system 5)

user3:system5:logins:by_hour
            2011010710 | ... | 2011052316
U3:S5:L:H   1          | ... | 2

(one login for user 3 on system 5 on Jan 7th 2011 in the 10th hour;
 2 logins for user 3 on system 5 on May 23rd 2011 in the 16th hour)
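A sketch of maintaining these fixed UTC buckets in plain Python; the defaultdicts stand in for counter rows, which in Cassandra would be counter columns incremented in place. The row-key layout follows the slide; the function name is hypothetical.

import collections
import datetime

# row key (granularity:metric:user:system) -> {UTC bucket: count}
counters = collections.defaultdict(lambda: collections.defaultdict(int))

def record_login(user, system, when_utc):
    base = f"{user}:{system}:logins"
    counters[f"{base}:by_day"][when_utc.strftime("%Y%m%d")] += 1    # day bucket
    counters[f"{base}:by_hour"][when_utc.strftime("%Y%m%d%H")] += 1 # hour bucket

record_login("user3", "system5", datetime.datetime(2011, 5, 23, 16, 4))
print(dict(counters["user3:system5:logins:by_hour"]))   # {'2011052316': 1}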
21. Rolling Time Window Queries
● The time window is rolling and there are other timezones besides UTC, so:
  ● one get_slice for the “middle” counts
  ● one get_slice for the “left end”
  ● one get_slice for the “right end”
22. Example: logins for the past 7 days
● Determine date/time boundaries
● Determine UTC days that are wholly contained within your boundaries to select and sum
● Select and sum counters for the remaining hours on either side of the UTC days
● O(1) queries (3 in this case, as sketched below); they can be requested from C* in parallel
● NB: some timezones are annoying (e.g. 15 or 30 minute offsets); I try to ignore them
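A sketch of the boundary computation, under one simplifying assumption (the window starts on an hour boundary, which matches hour-granularity buckets); each returned list names the columns for one contiguous get_slice.

import datetime

HOUR = datetime.timedelta(hours=1)
DAY = datetime.timedelta(days=1)

def _steps(lo, hi, step):
    t = lo
    while t < hi:
        yield t
        t += step

def window_slices(now_utc, days=7):
    """Split [now - days, now] into edge hour buckets and whole UTC days."""
    start = now_utc.replace(minute=0, second=0, microsecond=0) - days * DAY
    first_day = start.replace(hour=0)
    if first_day < start:
        first_day += DAY                       # first wholly contained UTC day
    last_day = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    left = [t.strftime("%Y%m%d%H") for t in _steps(start, first_day, HOUR)]
    middle = [t.strftime("%Y%m%d") for t in _steps(first_day, last_day, DAY)]
    right = [t.strftime("%Y%m%d%H") for t in _steps(last_day, now_utc, HOUR)]
    return left, middle, right

left, middle, right = window_slices(datetime.datetime(2011, 5, 23, 16, 30))
# sum by_hour counters named in left+right, by_day counters named in middle
print(len(left), len(middle), len(right))      # 8 6 17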
23. Alternatives?
(of course)
● If you're counting logins and each user doesn't log in hundreds of times a day, just have one row per user with a TimeUUID column name for the time the login occurred
● Supports any timezone/range/granularity easily
● More expensive for large ranges (e.g. a year) regardless of granularity, so cache results (in C*) lazily
● NB: caching results for rolling windows is not usually helpful (because, well, it's rolling and always changes)
24. Eventually Atomic
(use case #4)
● “When there are many to many or one to many relations involved how to model that and also keep it atomic? for eg: one user can upload many pictures and those pictures can somehow be related to other users as well.”
● Attempting full ACID compliance in distributed systems is a bad idea (and impossible in the general sense)
● However, consistency is important and can certainly be achieved in C*
● Many approaches/alternatives
● I like the transaction log approach, especially in the context of C*
25. Transaction Logs
(in this context)
● Records what is going to be performed before it is actually performed
● Performs the actions that need to be atomic (in the indivisible sense, not the all-at-once sense)
● Marks that the actions were performed
26. In Cassandra
● Serialize all actions that need to be performed in a single column: JSON, XML, YAML (yuck!), cpickle, JSO, et cetera
● Row Key = randomly chosen C* node token
● Column Name = TimeUUID
● Perform actions
● Delete Column (see the sketch below)
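A sketch of that sequence, with an in-memory dict standing in for the XACT_LOG column family; the token list, action format, and atomic() helper are hypothetical.

import json
import random
import uuid

node_tokens = ["token-a", "token-b", "token-c"]    # hypothetical ring tokens
xact_log = {t: {} for t in node_tokens}            # row key -> {TimeUUID: blob}

def atomic(actions, perform):
    row = random.choice(node_tokens)               # spread log rows across nodes
    col = uuid.uuid1()
    xact_log[row][col] = json.dumps(actions)       # 1. record intent (at QUORUM)
    for action in actions:                         # 2. perform; must be idempotent
        perform(action)
    del xact_log[row][col]                         # 3. mark done by deleting

atomic([{"credit": 100, "acct": "a"}, {"debit": 100, "acct": "b"}], print)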
27. Configuration Details
● Short GC_Grace on the XACT_LOG Column Family (e.g. 1 hour)
● Write to XACT_LOG at CL.QUORUM or CL.LOCAL_QUORUM for durability (if it fails with an unavailable exception, pick a different node token and/or node and try again; same semantics as a traditional relational DB)
● 1M memtable ops, 1 hour memtable flush time
28. Failures
● Before insert into the XACT_LOG
● After insert, before actions
● After insert, in middle of actions
● After insert, after actions, before delete
● After insert, after actions, after delete
29. Recovery
● Each C* node has a cron job, offset from every other by some time period
● Each job runs the same code: multiget_slice for all node tokens, for all columns older than some time period
● Any columns found need to be replayed in their entirety and are deleted after replay (normally there are no columns, because normally things are working normally); see the sketch below
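A sketch of that recovery pass over the xact_log stand-in from the previous sketch; the age check recovers the Unix time embedded in each version-1 UUID column name. The recover() helper and its grace period are hypothetical.

import json
import time

UUID_EPOCH_OFFSET = 0x01B21DD213814000          # 100 ns ticks, 1582 -> 1970

def uuid1_unix_time(u):
    """Unix timestamp embedded in a version-1 UUID."""
    return (u.time - UUID_EPOCH_OFFSET) / 10**7

def recover(xact_log, perform, max_age_seconds=600):
    """Replay, then delete, any log column older than max_age_seconds."""
    cutoff = time.time() - max_age_seconds
    for row in xact_log.values():               # multiget_slice over all tokens
        stale = [col for col in row if uuid1_unix_time(col) < cutoff]
        for col in stale:
            for action in json.loads(row[col]): # replay in its entirety
                perform(action)
            del row[col]                        # delete after replay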
30. XACT_LOG Comments
● Idempotent writes are awesome (that's why this works so well)
● Doesn't work so well for counters (they're not idempotent)
● Clients must be able to deal with temporarily inconsistent data (they have to do this anyway)
● Could use a reliable queuing service (e.g. SQS) instead of polling: push to SQS first, then the XACT log