This document discusses the experiences of Diego Pacheco, Jackson Oliveira, and Marcelo Serpa in building a Cassandra operations orchestrator called CM across multiple AWS regions. It describes the problem CM solves in automating Cassandra cluster operations, its architecture, design principles of self-healing and self-operating, and how the team practiced agile development with techniques like Kanban, documentation, testing, and refactoring. Challenges discussed include outages, issues migrating from Cassandra 2.1 to 2.2, and lessons learned around complexity avoidance, estimation, testing coverage, and observability.
Over the years, DevOps has evolved in many aspects, from best practices, to the responsibilities of an operations engineer, to culture. The ever-changing landscape of infrastructure brings new, exciting discoveries… but also some never-before-seen issues! I always keep a careful record of all the problems we’ve troubleshot because you never know when a solution will come in handy. This session tells a couple of tales of dealing with databases and how troubleshooting has evolved over time.
Dmytro Patkovskyi "Practical tips regarding build optimization for those who ..." - Fwdays
This talk is about build optimization mechanisms available in three developer tools that are often used together (Gitlab, Gradle, and Docker). Dmytro will describe the possibilities of each instrument and advise which functions you should use and how. Additional attention will be paid to the most common pitfalls, along with handy tips and tricks. The talk will also be useful for those who use just one or two out of the tools.
"High-load is at the intersection of DevOps and PHP development", Fwdays
Let's talk about how software solutions and an understanding of business needs can make life easier:
CQRS-architecture in action.
The story of a seamless move to another DC.
Cross-DC infrastructure and data processing.
Software solutions for cross-DC interaction.
Technical details of the transit, methods, and technical stack in terms of software engineering: MySQL (Galera Cluster + StandAlone), RabbitMQ, Cache Warming Strategy, Redis Replication, RabbitMQ-tools.
Cloud Driven Development: a better workflow, less worries, and more power - Marzee Labs
Platform-as-a-service (PaaS) solutions have recently sprung up for Drupal, with Pantheon and Acquia Dev Cloud leading the race. The advantages are plentiful: zero set-up costs, instant upscaling, the use of powerful services such as Apache Solr, Varnish, Redis/Memcached, automated Drupal core updates, site profiling tools, etc.
In this session, I’ll make Drupal developers familiar with PaaS, and show the concepts of “Cloud-driven development” to speed up development and deployment processes. I will show how to use your local, development, test and production environments to organize your Drupal development, and push changes back and forth using Git, Features and Drush, eliminating the need to share the database and pushing changes exclusively via code. Finally, Drush will make your deployment a breeze.
With the free developer subscription of Pantheon and a series of Drush commands and scripts, you will be able to start developing and deploying your own Drupal projects in the cloud, and never again worry about your server. After all, you are a Drupal Developer, not a System Administrator!
Responding rapidly when you have 100+ GB data sets in Java - Peter Lawrey
One way to speed up your application is to bring more of your data into memory. But how do you handle hundreds of GB of data in a JVM, and what tools can help you?
Mentions: Speedment, Azul, Terracotta, Hazelcast and Chronicle.
How We Made Scylla Maintenance Easier, Safer and Faster - ScyllaDB
Many Scylla maintenance operations require significant data movement between database nodes in a cluster. It is not an easy task to make the management operations efficient while keeping the impact on the workload to a minimum at all times. In this talk, we will share how we made those maintenance operations easier, safer and faster with new Scylla features and improvements, e.g., seedless, repair-based node operations, smarter off-strategy compaction, an I/O bandwidth limiter for repair and compaction, parallel repair in Scylla Manager and more.
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr... - DataStax Academy
Netflix has updated and added new tools and benchmarks for Cassandra in the last year. In this talk we will cover the latest additions and recipes for the Astyanax Java client, updates to Priam to support Cassandra 1.2 Vnodes, plus newly released and upcoming tools that are all part of the NetflixOSS platform. Following on from the Cassandra on SSD on AWS benchmark that was run live during the 2012 Summit, we've been benchmarking a large write intensive multi-region cluster to see how far we can push it. Cassandra is the data storage and global replication foundation for the Cloud Native architecture that runs Netflix streaming for 36 Million users. Netflix is also offering a Cloud Prize for open source contributions to NetflixOSS, and there are ten categories including Best Datastore Integration and Best Contribution to Performance Improvements, with $10K cash and $5K of AWS credits for each winner. We'd like to pay you to use our free software!
Teads is #1 in Video Ads. Read how Teads handles up to ~1 million requests/s with Apache Cassandra: how we tuned Cassandra servers and clients, what issues we faced during the last year, how we provision our clusters, which tools we use (Datadog for monitoring and alerting, Cassandra Reaper, Rundeck, Sumologic, cassandra_snapshotter), and why we need a fork.
An introduction to Netty. A powerful framework to develop networking applications.
This is supposed to be followed as hands-on training, as the exercises on the slides imply, but it can also be used as an introductory guide.
Microservices for performance - GOTO Chicago 2016 - Peter Lawrey
How do Microservices and Trading Systems overlap?
How can one area learn from the other?
How can we test components of microservices?
Is there a library which helps us implement and test these services?
Scylla Summit 2018: Consensus in Eventually Consistent Databases - ScyllaDB
Eventually consistent databases choose to remain available under failure, allowing for conflicting data to be stored in different replicas (later repaired by background processes). Weakening the consistency guarantees improves not only availability, but also performance, as the number of replicas involved in a given operation can be minimized. There are, however, use-cases that require the opposite trade-off. Indeed, Apache Cassandra and Scylla provide Lightweight Transactions (LWT), which allow single-key linearizable updates. The mechanism underlying LWT is asynchronous consensus. In this talk, we'll describe the characteristics and requirements of Scylla's consensus implementation, and how it enables strongly consistent updates. We will also cover how consensus can be applied to other aspects of the system, such as schema changes, node membership, and range movements, in order to improve their reliability and safety. We will thus show that an eventually consistent database can leverage consensus without compromising either availability or performance.
Orchestrating Cassandra with Kubernetes: Challenges and Opportunities - Raghavendra Prabhu
This is a talk about orchestrating Cassandra with the Cassandra operator, Kubernetes and Yelp PaaSTA (https://github.com/Yelp/paasta).
The talk was presented at Computer Laboratory, University of Cambridge as part of the Engineering, Science and Technology Event (https://www.careers.cam.ac.uk/recruiting/event2Tech.asp) in November 2019.
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides - Severalnines
Galera is a MySQL replication technology that can simplify the design of a high availability application stack. With a true multi-master MySQL setup, an application can now read and write from any database instance without worrying about master/slave roles, data integrity, slave lag or other drawbacks of asynchronous replication.
And that all sounds great until it’s time to go into production. Throw in a live migration from an existing database setup and devops life just got a bit more interesting ...
So if you are in devops, then this webinar is for you!
Operations is not so much about specific technologies, but about the techniques and tools you use to deploy and manage them. Monitoring, managing schema changes and pushing them in production, performance optimizations, configurations, version upgrades, backups; these are all aspects to consider – preferably before going live.
Let us guide you through 9 key tips to consider before taking Galera Cluster into production.
Software-as-a-Service has become a very popular software delivery method due to its inherent advantages to both the service provider and the consumer. Startups are emerging businesses that usually provide innovative products to win market share. In the recent past, many information technology startups have adopted SaaS as a way to quickly deliver their products to customers.
This talk discusses the software engineering challenges in a SaaS startup environment, so that software practitioners who do not have experience in such an environment can foresee what to expect.
Every company likes to brag about their successes, but not many are willing to talk about their failures. At PagerDuty we have been rigorously tracking downtime in order to analyze it and learn from our mistakes - we even blog about these failures publicly.
Despite being a highly available system, we have had three outages caused by problems with our production Cassandra clusters over the past year. We'll take a look at each of these outages: what we saw from the inside, the actions we took to recover, and most importantly the procedures and monitoring that will help prevent it from happening to you.
1. If it’s not SQL, it’s not a database.
2. It takes 5+ years to build a database.
3. Listen to your users.
4. Too much magic is a bad thing.
5. It’s the cloud, stupid.
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Be... - Codemotion
A vast volume of our processed data is time series data, and once you start working with distributed systems, you start tackling many scale and performance problems: How to handle missing data? Should I handle both serving and backend processing, or separate them out? Best performance for the money? In the talk we will tell the tale of all the transformations we’ve made to our data model at Windward and some of the problems we’ve handled, and review multiple data persistence layers such as S3, MongoDB, Apache Cassandra and MySQL. And I’ll try my best NOT to answer the question “Which one of them is the Best?”
Webinar slides: 9 DevOps Tips for Going in Production with Galera Cluster for... - Severalnines
Galera Cluster for MySQL / MariaDB is easy to deploy, but how does it behave under real workload, scale, and during long term operation? Proof of concepts and lab tests usually work great for Galera, until it’s time to go into production. Throw in a live migration from an existing database setup and devops life just got a bit more interesting ...
If this scenario sounds familiar, then this webinar replay is for you!
AGENDA
101 Sanity Check
Operating System
Backup Strategies
Replication & Sync
Query Performance
Schema Changes
Security / Encryption
Reporting
Managing from disaster
SPEAKER
Johan Andersson, CTO, Severalnines - Johan's technical background and interest are in high performance computing as demonstrated by the work he did on main-memory clustered databases at Ericsson as well as his research on parallel Java Virtual Machines at Trinity College Dublin in Ireland. Prior to co-founding Severalnines, Johan was Principal Consultant and lead of the MySQL Clustering & High Availability consulting group at MySQL / Sun Microsystems / Oracle, where he designed and implemented large-scale MySQL systems for key customers. Johan is a regular speaker at MySQL User Conferences as well as other high profile community gatherings with popular talks and tutorials around architecting and tuning MySQL Clusters.
This is part one of my Monitoring Distributed Apps series.
Here we explore the premises of distributed application monitoring, focusing on metrics and why we need them, and gradually introduce Prometheus as a solution.
The video recording is available here: https://youtu.be/lvogDmRN-Hs
In file systems, large sequential writes are more beneficial than small random writes, and hence many storage systems implement a log-structured file system. In the same way, the cloud favors large objects over small objects. Cloud providers place throttling limits on PUTs and GETs, so it takes significantly longer to upload a bunch of small objects than one large object of the aggregate size. Moreover, there are per-PUT calls associated with uploading smaller objects.
At Netflix, many media assets and their relevant metadata are generated and pushed to the cloud.
We would like to propose a strategy to compact these small objects into larger blobs before uploading them to the cloud. We will discuss how to select relevant smaller objects and manage the indexing of these objects within the blob, along with the changes this requires in reads, overwrites and deletes.
Finally, we will showcase the potential impact of such a strategy on Netflix assets in terms of cost and performance.
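The compaction idea in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the approach actually presented in the talk: the names `pack_objects` and `read_object`, and the in-memory index of (offset, length) pairs, are hypothetical and chosen only for the example.

```python
from typing import Dict, Tuple

# Hypothetical index layout: object key -> (offset, length) inside the blob.
Index = Dict[str, Tuple[int, int]]


def pack_objects(objects: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate small objects into one blob and record where each one lives."""
    blob = bytearray()
    index: Index = {}
    for key, payload in objects.items():
        # The object starts at the current end of the blob.
        index[key] = (len(blob), len(payload))
        blob.extend(payload)
    return bytes(blob), index


def read_object(blob: bytes, index: Index, key: str) -> bytes:
    """Recover a single object from the blob using its index entry."""
    offset, length = index[key]
    return blob[offset:offset + length]


# Three small objects become one blob, i.e. one upload instead of three.
small = {"a.json": b'{"id": 1}', "b.json": b'{"id": 2}', "c.txt": b"hello"}
blob, index = pack_objects(small)
print(read_object(blob, index, "c.txt"))
```

In a real object store the slice in `read_object` would presumably become a ranged read of the stored blob, and overwrites and deletes would need extra bookkeeping in the index (for example, marking entries dead and rewriting the blob later), which this sketch leaves out.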
So you've been deploying Java in the cloud and are wondering how to handle the new world of containers, microservices, and memory constraints. Cold starts got you down? Come to this session to learn how OpenJ9 and the JVM in general can help you on your Cloud Native journey.
Galaxy Big Data with MariaDB 10 by Bernard Garros, Sandrine Chirokoff and Stéphane Varoqui.
Presented 26.6.2014 at the MariaDB Roadshow in Paris, France.
Dyn delivers exceptional Internet Performance. Enabling high quality services requires data centers around the globe. In order to manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple datacenters to enable sub 50 ms query responses for hundreds of billions of data points. From granular DNS traffic data, to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shined through upgrades, data center migrations, DDoS attacks and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements which led them to choose DSE as their go-to Big Data solution, the path which led to SPARK, and the lessons that we’ve learned in the process.
Operating and Supporting Delta Lake in Production - Databricks
Delta Lake is widely adopted, and there are things to be aware of when dealing with petabytes of data in it. Smart decisions can give the best efficiency and increase the adoption of Delta. Best practices like OPTIMIZE and ZORDER have to be chosen wisely. We have support stories where we successfully resolved performance issues by applying the right performance strategy. There is a set of common issues and repeated questions that our strategic customers face when using Delta; in this session we cover them and how to address them.
Netflix Open Source Meetup Season 4 Episode 2 - aspyker
In this episode, we will take a close look at 2 different approaches to high-throughput/low-latency data stores, developed by Netflix.
The first, EVCache, is a battle-tested distributed memcached-backed data store, optimized for the cloud. You will also hear about the road ahead for EVCache as it evolves into an L1/L2 cache over RAM and SSDs.
The second, Dynomite, is a framework to make any non-distributed data-store, distributed. Netflix's first implementation of Dynomite is based on Redis.
Come learn about the products' features and hear from Thomson and Reuters, Diego Pacheco from Ilegra and other third party speakers, internal and external to Netflix, on how these products fit in their stack and roadmap.
Similar to Experiences building a multi region cassandra operations orchestrator on aws (20)
Encryption Deep Dive: Randomness, Entropy, RNG, PRNG, AES, AES Operational Modes, Data Rotations, Java Encryption APIs, Tradeoffs, challenges, Envelope Encryption, KMS, and much more on all things encryption.
Design is Not Subjective! Software design, Lean UX, UX and Design Thinking are not that different after all. UML was headed in the right direction; the problem was where we applied it. In this video, I will explain why design is not subjective. Video https://www.youtube.com/watch?v=ijGR6Tbhr54
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
UiPath Test Automation using UiPath Test Suite series, part 3 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf - Peter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Key Trends Shaping the Future of Infrastructure.pdf - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by Rik Marselis and me from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We closed with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
12. CM Features
❏ Support for Cassandra 2.2.x and 3.1.x
❏ Backups and point in time restores
❏ Seeds / Token Management
❏ Full AWS Automation (SG, LC and ASG)
❏ Automated node Replacement
❏ Automated Node-by-node repairs
❏ Multi-dc support
❏ REST interfaces
❏ CM Internal state durability / Recovery (local disk and S3)
❏ 100% automated operations for:
❏ Cluster: creation, search, shutdown
15. CM use cases
❏ Source of Truth of Most microservices
❏ Single Region Cluster
❏ Batch/Streaming Application (Previously with HBase)
❏ Multi-Region Cluster
❏ API Gateway (Kong)
❏ Authentication Microservice
20. Step Framework
❏ One task has multiple steps
❏ Order
❏ Run a list of steps for cassandra nodes
❏ Track the step currently running on each node
❏ Skip steps
❏ If a step fails, send a message to the Slack channel
❏ SignalFX
BACKUP
1- Create directories
2- Copy data
3- Send to S3
RESTORE
1- Download backups
2- Copy data
3- Restart Cassandra
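The step framework described on this slide could look roughly like the sketch below. It is not CM's actual code: the names Step, Task, run, skip and notify are hypothetical, and the notify callback stands in for whatever Slack or SignalFx integration CM uses.

```python
from typing import Callable, Dict, List, Optional, Set


class Step:
    """One unit of work, run against a single node (hypothetical type)."""

    def __init__(self, name: str, action: Callable[[str], None]) -> None:
        self.name = name
        self.action = action  # called with the node address


class Task:
    """A task is an ordered list of steps, run node by node."""

    def __init__(self, name: str, steps: List[Step],
                 notify: Callable[[str], None] = print) -> None:
        self.name = name
        self.steps = steps
        self.notify = notify               # stand-in for a Slack or SignalFx hook
        self.current: Dict[str, str] = {}  # node -> step currently running

    def run(self, nodes: List[str], skip: Optional[Set[str]] = None) -> bool:
        skip = skip or set()
        for node in nodes:
            for step in self.steps:        # steps run in the declared order
                if step.name in skip:
                    continue
                self.current[node] = step.name
                try:
                    step.action(node)
                except Exception as exc:
                    # Stop at the first failure and report which step broke.
                    self.notify(
                        f"{self.name}: step '{step.name}' failed on {node}: {exc}"
                    )
                    return False
            self.current[node] = "done"
        return True


# The backup flow from the slide, with placeholder actions.
backup = Task("backup", [
    Step("create_directories", lambda node: None),
    Step("copy_data", lambda node: None),
    Step("send_to_s3", lambda node: None),
])
```

Calling backup.run(["10.0.0.1", "10.0.0.2"], skip={"copy_data"}) would run the remaining steps on each node in order, and backup.current shows where each node is at any moment.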
22. Recovery old and new model
❏ OLD way
❏ Disk first
❏ S3 every minute
❏ Flaky: did not cover all corner cases
❏ New way
❏ Disk
❏ Send to all Cass nodes
❏ In case of failure call all cass nodes
❏ Get the highest TIMESTAMP and use it.
❏ More reliable
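A minimal sketch of the "highest TIMESTAMP wins" rule in the new recovery model. The function name recover_state and the shape of a state copy (a dict with a "timestamp" key) are assumptions made for illustration; the slides do not show how CM stores its state.

```python
from typing import Dict, Iterable, Optional


def recover_state(copies: Iterable[Optional[Dict]]) -> Optional[Dict]:
    """Return the most recent state copy among those the nodes sent back.

    A node that is down or holds no copy is represented by None.
    """
    valid = [c for c in copies if c is not None and "timestamp" in c]
    if not valid:
        return None  # nothing to recover from
    return max(valid, key=lambda c: c["timestamp"])
```

Because every Cassandra node holds a copy, recovery still works as long as at least one node answers, which is what makes this approach more tolerant than a single S3 upload per minute.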
24. Multi-Region Design
❏ CM Topology
❏ Dedicated: 1-1
❏ Shared: 1-N
❏ Infrastructure details:
❏ CM instances in both regions exchange information
❏ CM internode communication with EIP
❏ Public IP + PEM -> VPC Peering
❏ Cassandra:
❏ 2 seeds in US, 1 seed in EU
❏ Seeds boot up first
❏ Replication is async between regions
26. How the team works? Practices.
❏ Clients are: Developers and Cloud Operators
❏ Planning per quarter
❏ Tech Lead / Coach
❏ Retro every month
❏ Coaching Sessions - 101
❏ Design session
❏ Reviews
❏ Refactoring
❏ Kanban + Google Sheets + Trello
❏ DevOps Principles - e.g. Immutable Infrastructure
27. How the team works? Tracking.
❏ Tell me an engineer who likes JIRA? Only PMs like JIRA.
❏ We were not using issue tracking at first
❏ Issues lost
❏ Look for emails
❏ Ask several times about issues
❏ Repeat same design over and over
❏ Come up in a retrospective
❏ Github as issue tracking
❏ Log issues: bugs and enhancements
❏ Github release tracking
28. How the team works? Kanban + Predictability
❏ Simple Google Sheets
❏ Items / Weeks
❏ Check every week whether you are on track
❏ 100% accuracy for features
❏ 100% WRONG estimate for BUGS (2 weeks ~ 2 months)
❏ Different Nature: Microservices VS Data Layer
❏ Very hard to estimate bugs - Solution?
❏ You can't automate what you don't know
❏ Stability Mindset
❏ Don't introduce bugs == Developer Checklists
❏ Force you to know what to automate later
29. How the team works? Releases. Stabilization Windows
❏ 4 Quarters
❏ ~Monthly releases
❏ Looks like waterfall or buffering
❏ Avoid shipping bugs to customers
❏ Avoid downtime
❏ Avoid losing data
❏ It's a must in the data layer
❏ The data layer needs to be more reliable than microservices
❏ How did we do it?
❏ Single Region - Stabilization window 1
❏ Multi-DC - Stabilization Window 2
30. How the team works? Documentation and Scalability
❏ About our customer: an organization spanning 42 countries
❏ Meetings are a bottleneck for scalability
❏ Jenkins DSL (Code in General) kills scalability
❏ Self-service kills tickets
❏ Documentation kills meetings
❏ Documentation matters
❏ Time Zones
❏ English
❏ Avoid Repetition
31. How the team works? Tests! Stability + Checklists
❏ Unit Tests
❏ Integration Tests
❏ Exploratory Tests
❏ Release 1 - 30 issues (mostly bugs)
❏ Release 2 - 20 issues (mostly enhancements)
❏ Stability Mindset / Principles
❏ Exploratory tests are a MUST
❏ Try to maximize coverage spectrum
❏ Developer checklists work very well
32. How the team works? Refactorings.
❏ Strategic VS Tactical Programming
❏ Several important refactorings (re-designs) like:
❏ Thread Model
❏ Tasks Responsibility
❏ Utils
❏ And much more…
❏ Easy to do in Java with good tooling such as Eclipse.
❏ Pays off in the long run
❏ Kills you if you don't do it.
33. Flaky Tests
❏ Integration tests
❏ ~20 minutes
❏ Cassandra 3x and Cassandra 2x
❏ Hard to maintain
❏ Async AWS APIs (SG, LC and ASG)
❏ Fixed timeout == unstable tests
❏ Solution: Progressive timeout
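The slides do not define "progressive timeout", so the sketch below shows one plausible reading: instead of sleeping a fixed time and then asserting, the test polls the async AWS resource and waits a bit longer after each attempt, up to an overall deadline. The function wait_until and all of its parameters are hypothetical.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               first_wait: float = 1.0,
               factor: float = 2.0,
               max_wait: float = 30.0,
               deadline: float = 300.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `condition`, waiting progressively longer between attempts.

    Returns True as soon as the condition holds, or False once `deadline`
    seconds have passed without success.
    """
    start = clock()
    wait = first_wait
    while True:
        if condition():
            return True
        if clock() - start >= deadline:
            return False
        sleep(min(wait, max_wait))  # cap each individual wait
        wait *= factor              # next wait is longer
```

A test that creates an ASG could then call wait_until(lambda: asg_is_ready(name)), where asg_is_ready is whatever check the test already performs. Fast runs finish quickly and slow runs still pass, which a single fixed timeout cannot offer.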
35. Remediation
❏ Why Remediate?
❏ Manual Steps are dangerous
❏ Bad time == Lots of pressure
❏ Started with Dynomite
❏ Scale Up
❏ AMI Patch
❏ Refactor to support Cassandra and CM
❏ Calls DM and CM Health Checkers
❏ Procedural process
❏ Relies on: DM cold bootstrap and CM node_replace + repair.
37. Downtime VS No Downtime: Forklift + Dual Write
❏ Downtime
❏ Dump data to file
❏ Dump Keyspace/Schema to file
❏ Upload to S3
❏ Import in new cluster
❏ No-Downtime
❏ Forklift + Dual write pattern
❏ Requires code in the microservices
❏ Requires orchestration in Spinnaker.
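The no-downtime path can be illustrated with a small in-memory sketch. DictStore, DualWriter and forklift are hypothetical names, and the dict-backed store only stands in for a Cassandra cluster. A real migration would also rely on write timestamps and on the Spinnaker orchestration mentioned above.

```python
from typing import Any, Callable, Dict, Iterator, Optional, Tuple


class DictStore:
    """In-memory stand-in for a cluster (illustration only)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._rows[key] = value

    def get(self, key: str) -> Optional[Any]:
        return self._rows.get(key)

    def items(self) -> Iterator[Tuple[str, Any]]:
        return iter(list(self._rows.items()))


class DualWriter:
    """Microservice-side writer: every write goes to both clusters."""

    def __init__(self, old: DictStore, new: DictStore,
                 on_error: Callable[[str], None] = print) -> None:
        self.old, self.new, self.on_error = old, new, on_error

    def write(self, key: str, value: Any) -> None:
        self.old.put(key, value)       # old cluster stays the source of truth
        try:
            self.new.put(key, value)   # best effort during the migration
        except Exception as exc:
            self.on_error(f"dual write to new cluster failed for {key}: {exc}")

    def read(self, key: str) -> Optional[Any]:
        return self.old.get(key)       # reads switch over only after cutover


def forklift(old: DictStore, new: DictStore) -> int:
    """Copy historical rows, skipping keys the dual writer already wrote."""
    copied = 0
    for key, value in old.items():
        if new.get(key) is None:
            new.put(key, value)
            copied += 1
    return copied
```

Dual writes start first, so nothing written during the migration is missed. The forklift then backfills the rows that existed before dual writing began.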
41. Troubleshooting / Police Forensic Skills
Remediation killed too many nodes and replace did not happen... why?
A) AWS Ec2
B) Jenkins
C) CM Java Code
D) Python Daemon
E) Java Remediation code
F) AWS S3
G) Cassandra Node
H) Cassandra Cluster
I) Time
J) None of the above
42. Troubleshooting / Police Forensic Skills
[Diagram: Remediation, CM, and Cassandra nodes US 2A, US 2B, US 2C, EU 1A, EU 2B, EU 2C. Annotations: "Cluster activity?", "Cass US 2A", "ASG (kill box)", "New IP?"]
43. Fast Vs Slow Issue!
❏ Only with Theories
❏ EVIDENCE to back up our theories/assumptions
❏ Simulations
❏ Solution:
❏ AWS Chaos service :-)
❏ < 1 min = FAST
❏ > 3 min = SLOW
❏ At the end of the day it's all about a 90s internal TTL
❏ Wait for the replace so that the state reflects the REAL world
❏ Wait for the HC (health check) so that it captures the real world
52. Cass 2.1.x to cass 2.2.x issues
❏ Node replace stopped working
❏ We generate cass config files
❏ Position and parameters changed from 2.1 to 2.2
❏ Our code broke
❏ Big changes on migration from Cass 2.1.x to 2.2.x
❏ Improvements
❏ Improved repair performance.
❏ The commit log is compressed to save disk space.
❏ Fixes
❏ Fix repair hang when snapshot failed (CASSANDRA-10057)
❏ Fix potential NPE on ORDER BY queries with IN (CASSANDRA-10955)
❏ Fix handling of nulls and unsets in IN conditions (CASSANDRA-12981)
❏ https://github.com/apache/cassandra/blob/cassandra-2.2/CHANGES.txt
54. CASS Stress/Load Tests
❏ Some bugs only appear when testing with volume
❏ Adding volume can be tricky and time consuming
❏ Latency (do not run scripts from your local env)
❏ Filling up a table with a few text files will take too much time
❏ Parallelization is needed
❏ The cassandra-stress tool comes in handy in such scenarios
❏ Customize how many rows are written and how many parallel threads write
❏ It uses tables with blobs
❏ Customize schema, replication factors and consistency level while running scripts
55. OOM Outage! EBS vs Instance Store
❏ EBS is a SPOF
❏ EBS is more expensive
❏ EBS has lower performance
❏ EBS is more flexible
❏ Disk space was critical to us
❏ You don’t want to run out of disk, believe us.
❏ Dynamic Disk space definition while launching a cluster
❏ Disk space validations before starting a backup
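A disk space validation before a backup might look like the sketch below. The function names and the 1.2 safety factor are assumptions, not CM's real values. The only library calls used are os.walk, os.path.getsize and shutil.disk_usage from the Python standard library.

```python
import os
import shutil


def directory_size(path: str) -> int:
    """Total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def has_room_for_backup(data_bytes: int, free_bytes: int,
                        safety_factor: float = 1.2) -> bool:
    """True when free space covers the data plus a safety margin."""
    return free_bytes >= data_bytes * safety_factor


def check_backup_space(data_dir: str, backup_dir: str,
                       safety_factor: float = 1.2) -> bool:
    """Validate that `backup_dir` can hold a local copy of `data_dir`."""
    free = shutil.disk_usage(backup_dir).free
    return has_room_for_backup(directory_size(data_dir), free, safety_factor)
```

Running this check as the first step of a backup lets the task fail early with a clear message instead of filling the disk halfway through the copy.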
56. Side note on Cass 4.x
❏ Cass 3.x is better than cass 2.x right now
❏ Cass 4.x will be awesome
❏ Netflix work on incremental repairs
❏ Bug fixes - like gossip threads and restart issue
❏ Way more stable - everybody should migrate.
❏ Having fewer Cassandra versions reduces complexity
❏ Different configurations
❏ Bugs that were fixed but never reach you: lack of backports (old versions)