Aleksandr Tavgen from Playtech, the world’s largest online gambling software supplier, will share how they are using InfluxDB 2.0, Flux, and the OpenTracing API to gain full observability of their platform. In addition, he will share how InfluxDB has served as the glue to cope with multiple sets of time series data.
It covers the general problem of creating monitoring and observability without killing your Ops team's motivation with false positives and unexplained alerts.
Problems in this area, pitfalls, anti-patterns, and how to get it right.
How to manage a monitoring zoo. Spaghettification of dashboards. Why Uber needs 9 billion metrics (¯\_(ツ)_/¯) and why this is an anti-pattern. Metrics as a stream of data. We talk about the new Flux language from InfluxDB. A bit of time series analysis and defining pipelines in Flux for metrics data. A drunkard's walk on your metrics, or why to measure randomness.
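The "why measure randomness" point is easy to make concrete: a metric that is essentially noise has near-zero lag-1 autocorrelation, while a trending one does not, and a fixed alert threshold means very different things in the two cases. A minimal sketch in Python rather than Flux, with synthetic data:

```python
import random

def lag1_autocorrelation(series):
    """Lag-1 autocorrelation: near 0 suggests the metric behaves like noise,
    while values near 1 suggest a strong trend or persistence."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0
    cov = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    return cov / var

random.seed(42)
noise = [random.gauss(0, 1) for _ in range(500)]            # pure noise
trend = [0.1 * i + random.gauss(0, 1) for i in range(500)]  # clear drift

print(round(lag1_autocorrelation(noise), 2))   # close to 0
print(round(lag1_autocorrelation(trend), 2))   # close to 1
```

In Flux the same reasoning would be expressed as a pipeline over the stored series; the point here is only the statistic itself.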
Observability - The Good, the Bad and the Ugly, XP Days 2019, Kiev, Ukraine, Aleksandr Tavgen
A talk about approaches to observability. Do we need millions of metrics? Anomalies vs. regularities? Can machine learning help us? Some capabilities of the Flux language by InfluxData.
Monitoring services is easy, right? Set up a notification that goes out when a certain number increases past a certain threshold to let you know that there’s a problem. But if that’s the case, why are so many teams drowning in alerts and dreading their time on call? The reason is that we tend to monitor the wrong things: reactive alerts, metrics whose impact on our service we don’t completely understand, and capacity alerts. We look at our own view of the service and fail to consider that our customers have a different view.
Come learn to let go of what does not help, and explore how to monitor for what truly matters: what the customer sees. This starts with defining our agreements with our customers, continues through building applications intelligently and instrumenting all the things, and finishes with picking the right signals out of that instrumentation to generate alerts that are actionable, not ones that introduce confusion and noise. We will also touch on capacity planning, and how it should never wake you up. You’ll find it’s possible to assure that you meet your service level objectives while still maximizing your sleep level objectives.
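The "actionable alerts on what the customer sees" idea is commonly implemented as SLO burn-rate alerting: page only when the error budget is being consumed fast enough to matter. A minimal sketch, with the 99.9% target and paging threshold chosen purely for illustration:

```python
def burn_rate(error_ratio, slo_target):
    """How fast the error budget is being consumed.
    A burn rate of 1.0 means the budget lasts exactly the SLO window."""
    budget = 1.0 - slo_target
    return error_ratio / budget

def should_page(error_ratio, slo_target, threshold=10.0):
    """Page only on fast burn; slow burns become tickets, not pages."""
    return burn_rate(error_ratio, slo_target) >= threshold

# 99.9% availability SLO; 0.5% of requests failed over the last hour
rate = burn_rate(error_ratio=0.005, slo_target=0.999)
print(round(rate, 2))  # 5.0 -- burning the monthly budget 5x too fast
```

The design point is that the alert is phrased in the customer's terms (fraction of failed requests against the agreed objective), not in terms of CPU or queue depth.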
Code Yellow: Helping Operations Top-Heavy Teams the Smart Way (Todd Palino)
All engineering teams run into trouble from time to time. Alert fatigue, caused by technical debt or a failure to plan for growth, can quickly burn out SREs, overloading both development and operations with reactive work. Layer in the potential for communication problems between teams, and we can find ourselves in a place so troublesome we cannot easily see a path out. At times like this, our natural instinct as reliability engineers is to double down and fight through the issues. Often, however, we need to step back, assess the situation, and ask for help to put the team back on the road to success.
We will look at the process for Code Yellow, the term we use for this process of “righting the ship”, and discuss how to identify teams that are struggling. Through a look at three separate experiences, we will examine some of the root causes, what steps were taken, and how the engineering organization as a whole supports the process.
The more we are connected and the more others are connected to us, the more important the reliability of your sites becomes. Site Reliability Engineering is an engineering discipline devoted to helping an organization sustainably achieve the appropriate level of reliability in their systems, services, and products. But what does this mean, and how do you get started? In this session I will talk about the concepts of Site Reliability Engineering and use Microsoft Azure to implement some of the concepts and practices.
How to Implement Disaster Recovery in the Cloud (Bluelock)
Learn how disaster recovery in the cloud makes DR easy, efficient and affordable. Cloud-based Recovery-as-a-Service is the latest in disaster recovery technology. Recovery-as-a-Service (RaaS) is the ideal on-ramp to the cloud when you need to recover quickly, easily and efficiently after a disaster strikes.
Learn how four companies architected their RaaS solutions and protected entire applications in the cloud while reducing costs.
LEARN HOW TO:
- Protect and recover applications quickly
- Lower costs while improving RTO and RPOs
- Implement easy and affordable testing
- Recover into an enterprise-grade cloud environment
- Reduce downtime and business risk
http://www.bluelock.com/cloud-services/raas/
Site Reliability Engineering (SRE) - Tech Talk by Keet Sugathadasa
When it comes to Site Reliability Engineering (SRE for short), the resources available online are largely limited to the books published by Google itself. Those books share some useful case studies that help us understand what SRE is and how to apply its concepts, but they do not clearly explain how to build your own SRE team for your organization. The concept of SRE was cooked up within the walls of Google and later released to the general public as a practice for anyone to follow.
In this presentation I give a brief introduction to SRE and why it is important to any software engineering organization. It is based on my experience leading Site Reliability Engineering teams for leading organizations in the US and Norway.
I delivered this presentation as a Tech Talk while an Associate Technical Lead at Creative Software, Sri Lanka.
Presentation of skyWATS.com, a software solution to automate data collection and reporting from electronics manufacturing test.
skyWATS gives you a high-level overview of your test results, including dashboard reporting of true first-pass yield, failure Pareto, CPK, SPC, repair data, Gage R&R, OEE and more.
Visit skyWATS.com for more info, and a free trial.
Plan Your IaaS Environment for Optimal Performance (RISC Networks)
This presentation helps you establish a performance baseline you can use to ensure that you select the appropriate server move groups and address any server or application dependencies in your strategy. Our goal is to give you best practices for planning and executing your move to cloud IaaS and getting the best performance out of it.
To watch the live webinar for this presentation, please go to https://www.brighttalk.com/webcast/10539/97643
This presentation is based on Anatoly Tarazevich's talk at the Vitebsk Miniq #27 meetup, held on July 30, 2020:
https://community-z.com/events/miniq-vitebsk-27
About the talk:
Estimates are, as a rule, a necessary evil in software development. Unfortunately, people tend to believe that writing new software is just like building a house or repairing a car, and that the contractor or mechanic involved should be able to provide a reliable and accurate estimate of the scope of work and the time needed to complete it. But this is not always the case, and to reach a shared understanding on the matter, it is important that everyone involved in the development process, whether a developer or a business analyst, understands the main characteristics and laws of estimation. That is what we will talk about.
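One of the recurring laws of estimation is that a single number is a poor estimate; a range is more honest. A minimal illustration using the classic three-point (PERT) formula, with invented task numbers:

```python
def pert_estimate(optimistic, most_likely, pessimistic):
    """Three-point (PERT) estimate: a weighted mean that tempers optimism,
    plus a rough standard deviation to communicate uncertainty."""
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return expected, std_dev

# A task estimated at best 3 days, most likely 5, worst 13:
expected, sd = pert_estimate(optimistic=3, most_likely=5, pessimistic=13)
print(expected)       # 6.0
print(round(sd, 2))   # 1.67
```

Reporting "about 6 days, give or take 1.7" sets very different expectations than a bare "5 days", which is the kind of shared understanding the talk argues for.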
Overview of Site Reliability Engineering (SRE) & best practices (Ashutosh Agarwal)
In any software organization, stability and innovation are always at loggerheads: the faster you move, the more things will break. This talk describes what an SRE org looks like at high-tech organizations (Google, Uber).
Orangescrum Enterprise edition is widely popular as it offers end-to-end project management capabilities across industries. Its plugins can be seamlessly installed and offer a perpetual license for unlimited users.
Independently from the DevOps movement, but starting from the same problems, Google developed its own strategy, defining a new specific role called SRE (Site Reliability Engineer). This introduction explains the history and concepts of this methodology and compares it with the DevOps manifesto, to understand what it means to adopt DevOps, what it means to be an SRE, what the two share, and where they diverge.
Using InfluxDB for Full Observability of a SaaS Platform by Aleksandr Tavgen, ... (InfluxData)
Aleksandr Tavgen from Playtech, the world’s largest online gambling software supplier, will share how they are using InfluxDB 2.0, Flux, and the OpenTracing API to gain full observability of their platform. In addition, he will share how InfluxDB has served as the glue to cope with multiple sets of time series data, especially in the case of understanding online user activity — a use case that is normally difficult without the math functions now available with Flux.
Azure architecture design patterns - proven solutions to common challenges (Ivo Andreev)
Building reliable, scalable, secure applications can happen either by following verified design patterns or the hard way, through trial and error. Azure architecture patterns are tested and accepted solutions to common challenges, reducing technical risk to the project by not having to employ a new and untested design. However, most of the patterns are relevant to any distributed system, whether hosted on Azure or on other cloud platforms.
Performance doesn’t have the same definition for system administrators, developers and business teams. What is performance? High CPU usage, a web site that doesn't scale, a low business transaction rate per second, slow response time, … This presentation is about maths, code performance, load testing, web performance, best practices, and more. Working on performance optimization is a very broad topic. It’s important to really understand the main concepts and to have a clean and strong methodology, because it can be a very time-consuming activity. Happy reading!
Brian Schouten, Director of Technical Presales for PROSTEP INC describes the requirements, risks, strategy, and technical considerations of "do-it-yourself" PLM Migrations for ENOVIA 3D EXPERIENCE.
A Practical Guide to Selecting a Stream Processing Technology (Confluent)
Presented by Michael Noll, Product Manager, Confluent.
Why are there so many stream processing frameworks that each define their own terminology? Are the components of each comparable? Why do you need to know about spouts or DStreams just to process a simple sequence of records? Depending on your application’s requirements, you may not need a full framework at all.
Processing and understanding your data to create business value is the ultimate goal of a stream data platform. In this talk we will survey the stream processing landscape, the dimensions along which to evaluate stream processing technologies, and how they integrate with Apache Kafka. Particularly, we will learn how Kafka Streams, the built-in stream processing engine of Apache Kafka, compares to other stream processing systems that require a separate processing infrastructure.
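The "simple sequence of records" point above can be made concrete: stripped of any framework, the core of most windowed stream aggregations is just bucketing records by time window and key. A minimal Python sketch of the idea (not the Kafka Streams API itself, and with invented records):

```python
from collections import defaultdict

def tumbling_window_counts(records, window_ms):
    """Group (timestamp_ms, key) records into fixed, non-overlapping windows
    and count events per key per window -- the core operation that windowed
    aggregations in stream processors provide."""
    counts = defaultdict(int)
    for ts, key in records:
        window_start = (ts // window_ms) * window_ms  # tumbling window bucket
        counts[(window_start, key)] += 1
    return dict(counts)

records = [(10, "a"), (40, "b"), (60, "a"), (110, "a")]
print(tumbling_window_counts(records, window_ms=50))
# {(0, 'a'): 1, (0, 'b'): 1, (50, 'a'): 1, (100, 'a'): 1}
```

What a full framework adds on top of this kernel is exactly what the talk surveys: fault tolerance, state management, out-of-order handling, and scaling, which may or may not be worth the operational cost for your use case.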
Machine learning has become an important tool in the modern software toolbox, and high-performing organizations are increasingly coming to rely on data science and machine learning as a core part of their business. eBay introduced machine learning to its commerce search ranking and drove double-digit increases in revenue. Stitch Fix built a multibillion dollar clothing retail business in the US by combining the best of machines with the best of humans. And WeWork is bringing machine-learned approaches to the physical office environment all around the world. In all cases, algorithmic techniques started simple and slowly became more sophisticated over time.

This talk will use these examples to derive an agile approach to machine learning, and will explore that approach across several different dimensions. We will set the stage by outlining the kinds of problems that are most amenable to machine-learned approaches as well as describing some important prerequisites, including investments in data quality, a robust data pipeline, and experimental discipline. Next, we will choose the right (algorithmic) tool for the right job, and suggest how to incrementally evolve the algorithmic approaches we bring to bear. Most fancy cutting-edge recommender systems in the real world, for example, started out with simple rules-based techniques or basic regression. Finally, we will integrate machine learning into the broader product development process, and see how it can help us to accelerate business results.
A three-hour lecture I gave at the Jyväskylä Summer School. The talk goes through important details about the use of data science in real businesses. These include data deployment, data processing, practical issues with data solutions, and emerging trends in data science.
See also Part 1 of the lecture: Introduction to Data Science. You can find it in my profile (click the face)
Lessons Learned Replatforming A Large Machine Learning Application To Apache ... (Databricks)
Morningstar’s Risk Model project is created by stitching together statistical and machine learning models to produce risk and performance metrics for millions of financial securities. Previously, we were running a single version of this application, but needed to expand it to allow for customizations based on client demand. With the goal of running hundreds of custom Risk Model runs at once at an output size of around 1TB of data each, we had a challenging technical problem on our hands! In this presentation, we’ll talk about the challenges we faced replatforming this application to Spark, how we solved them, and the benefits we saw.
Some things we’ll touch on include how we created customized models, the architecture of our machine learning application, how we maintain an audit trail of data transformations (for rigorous third party audits), and how we validate the input data our model takes in and output data our model produces. We want the attendees to walk away with some key ideas of what worked for us when productizing a large scale machine learning platform.
Capgemini: Observability within the Dutch government (Elasticsearch)
The digital landscape within Dutch government is a complex and heterogeneous mix of technologies. Within this scenario, Capgemini is tasked with continuous integration and maintenance of key infrastructure. The results connect major organizational parts of the country with a large volume of daily traffic. To keep the lights on in operation and allow for quick turn-around times, Elastic is the dominant choice for generating reliable insight. It facilitates a thorough insight into the inner workings of modern amalgamated java deployments, databases and legacy systems spanning a multitude of decades.
Recent Gartner and Capgemini studies predict only around 25% of data science projects are successful and only around 15% make it to full-scale production. Of these, many degrade in performance and produce disappointing results within months of implementation. How can focusing on the desired business outcomes and business use cases throughout a data science project help overcome the odds?
Integration strategies best practices - MuleSoft meetup April 2018 (Rohan Rasane)
Abstract for the Mulesoft meetup in April 2018
If your organization is in one of the following phases of integration:
- Looking to integrate or connect with other applications via a platform dedicated to integrations
- Already has an integration platform, and has realized that the integrations are point-to-point or highly unorganized and uncontrollable
Then this session will help you identify and explore the way to build highly scalable integrations. This session will also cover the best practices that should be followed while maintaining the platform. There will be a sneak peek at the resiliency patterns that I love - circuit breaker and bulkheads - an inspiration from Netflix OSS.
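Since the abstract names the circuit-breaker pattern explicitly, a minimal sketch of the idea may help (a deliberately simplified illustration, not the MuleSoft or Netflix implementation): after a run of consecutive failures the breaker "opens" and calls fail fast instead of hammering a struggling downstream service.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: after `max_failures` consecutive failures the
    circuit opens and calls fail fast until `reset_timeout` seconds pass."""

    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip the breaker
            raise
        self.failures = 0  # any success closes the circuit again
        return result
```

Bulkheads complement this by partitioning resources (for example, separate thread or connection pools per downstream dependency) so that one failing dependency cannot exhaust capacity for the rest.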
SCM Transformation Challenges and How to Overcome Them (Compuware)
If your enterprise is focused on continuously improving quality, velocity and efficiency, you’re going to win against those that aren’t. Driving improvements on the mainframe, and in turn throughout the business, requires the transformation of three things: culture, processes and tools. In other words, changing mindsets, implementing modern practices (Agile, DevOps, CI/CD) and replacing outdated technology.
Mainframe source code management is currently a critical area in need of modernization and should be one of the initial tooling changes organizations make when setting out to improve mainframe systems delivery.
During this session, Compuware specialist Lars-Erik Berglund shares the challenges organizations face with mainframe source code management and what you can do to overcome them.
Modernizing on IBM Z Made Easier With Open Source Software (DevOps.com)
In the past decade, IDC has seen IBM Z evolve first from a siloed platform to what they call a "connected" platform, and then to a "transformative" platform. This transition has been driven by IBM, by the IBM Z software vendors, like Rocket Software, and by businesses themselves.
IDC research shows that businesses that choose to modernize IBM Z achieve higher satisfaction than re-platformers, and many are using open source software (OSS) in their modernization initiatives. Employing OSS makes it possible to crack the platform open and enable it to connect to the rest of the datacenter and the outside world. Join IDC guest speaker Al Gillen and Peter Fandel as they take a deeper look at the value proposition associated with using commercially supported OSS in mission-critical environments like IBM Z. In this webinar we’ll discuss:
- How OSS can neutralize the disparity between seasoned IBM Z and emerging developers
- The modernization initiatives that involve OSS
- What to consider before bringing OSS to IBM Z
- How Rocket Software is delivering commercially supported OSS to IBM Z
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla... (DevOps.com)
With the growing adoption of Kubernetes, organizations want to take advantage of containerized Microsoft SQL Server 2019 to optimize transactional performance and accelerate time-to-insights from their business-critical data. However, as enterprises embrace hybrid cloud strategy, they need to consider several aspects based on the performance, cost and data protection requirements for running enterprise-grade SQL Server databases.
In this webinar, we will compare and contrast various cloud-native platforms for SQL Server that would help CIOs, DevOps engineers, database administrators and applications architects to determine the most suitable platform that fits their business needs.
Join us as we explore some exciting results from a recent performance benchmark study conducted by McKnight Consulting Group, an independent consulting firm, to compare the performance of Microsoft SQL Server 2019 on the best possible configurations of the following Kubernetes platforms:
- Diamanti Enterprise Kubernetes Platform
- Amazon Web Services Elastic Kubernetes Service (AWS EKS)
- Azure Kubernetes Service (AKS)
Topics will include:
- Platform considerations and requirements for running Microsoft SQL Server 2019
- Performance comparison and analysis of running SQL Server on various platforms
- Best practices for running containerized SQL Server databases in Kubernetes environments
Next Generation Vulnerability Assessment Using Datadog and Snyk (DevOps.com)
Vulnerability assessment for teams can often be overwhelming. The dependency graph can contain thousands of packages, depending on the application. Triaging vulnerability data and prioritizing actions has historically been a very manual process, until now. With Datadog and Snyk, learn how to trace security and performance issues by leveraging continuous profiling capabilities for actionable insights that help developers remediate problems.
Join us on Thursday, January 21 for a unique opportunity to learn more about continuous profiling, vulnerability management, and the benefit to customers from using both of these products. In this webinar, you will:
- Bust some myths around continuous profiling and learn how Datadog differentiates itself
- See decorated traces in action for sample Java applications and understand how Snyk + Datadog reduce time to triage supply chain vulnerabilities
- Learn roadmap information for upcoming public announcements from both partners
In the era of the cloud generation, the constant activity around workloads and containers creates more vulnerabilities than an organization can keep up with. Relying on legacy security vendors doesn't set you up for success in the cloud. You’re likely spending undue hours chasing, triaging and patching a countless stream of cloud vulnerabilities with little prioritization.
Join us for this live webinar as we detail how to streamline host and container vulnerability workflows for your software teams wanting to build fast in the cloud. We'll be covering how to:
- Get visibility into active packages and associated vulnerabilities
- Reduce false positives by 98%
- Reduce investigation time by 30%
- Spot a legacy vendor looking to do some cloud washing
2021 Open Source Governance: Top Ten Trends and Predictions (DevOps.com)
If you work in software development, jumpstart your engineering team in 2021—get ahead of the engineering curve and your competitors—by attending this must-watch open source trends and predictions webinar.
Alex Rybak, Director of Product Management at Revenera, and Russ Eling, founder and CEO of OSS Engineering Consultants, share their top 10 open source usage, license compliance and security insights for the new year.
Just a few hints at what you’ll learn more about:
- Where the adoption of shift-left is headed and the decisions you’ll face going forward
- The impact of a lack of software developer security training relative to pandemic fallout
- The broader role of the engineering team in open source management and governance
- The expanding role and impact of open source marketplaces such as GitHub
Don’t miss the discussion for valuable insight and learning for software engineering teams
2020 was a brutal year for ransomware. Cybercriminals operated without any human decency, targeting the most vulnerable and at-risk parties, such as hospitals, scientists, and global manufacturers. The approach has become more sophisticated and life-threatening, shifting from individual targets to global enterprises, destroying backups, blackmailing victims with public leakage of exfiltrated data, and paralyzing critical systems and infrastructure.
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)DevOps.com
As containers and Kubernetes are adopted in production, security is a critical concern and DevOps teams need to go beyond image scanning. Use cases such as runtime security, network visibility and segmentation, incident response and compliance become priorities as your Kubernetes security framework matures.
In this talk, we’ll share an overview of runtime security, discuss approaches used by open source and commercial tools, and hear how users are getting started quickly without impacting developer productivity.
In any fast-paced engineering environment, unexpected incidents can arise and escalate without warning. Without strong leadership within teams, you get chaotic, stressful, and tiring situations that waste valuable engineering time, slow down resolution, and most importantly, impact your customers.
Operationally mature organisations use proven incident response systems led by Incident Commanders. Incident Commanders provide the leadership needed to help stabilize major incidents fast.
In this webinar, we’ll take lessons learned from formalized incident response, such as those used by first responders, and show you how to apply those same practices to your organization. By utilising these methods you’ll improve both the speed and effectiveness of your team’s response, reducing the amount of downtime experienced.
In this workshop, attendees will:
Be introduced to the Incident Command System and learn how it can be adapted to their organisation
Walk through the basics of incident response best practices
Discuss examples of formal incident response from multiple organisations
Creating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's CultureDevOps.com
Chaos engineering is becoming a critical part of the DevOps toolchain when adopting Site Reliability Engineering (SRE) practices. Every system is becoming a distributed system, and chaos engineering promises many advantages for them.
It improves infrastructure automation, increases reliability and transforms incident management. However, an often-overlooked benefit of chaos engineering and SRE involves culture transformation. Culture is often touched upon when talking about chaos engineering and SRE but not as often as skills and process.
In this webinar, we will discuss how you can build out a chaos engineering practice and how you can adopt a true blameless culture and maximize the potential of your team.
You will learn how to:
Hold blameless postmortems
Share postmortems with other teams
Run regular fire drills and game days
Automate chaos experiments for continuous validation
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with TeleportDevOps.com
Enterprises are best served by leveraging an RBAC system to manage access to their SSH and Kubernetes resources. With Teleport, open source software, employers are able to provide granular access controls to developers based on the access they need and when they need it. This makes it possible for employers to maintain secure access without getting in the way of their developers’ daily operations.
Join Steven Martin, solution engineer at Teleport, as he demonstrates how to assign access to developers and SREs across environments with Teleport, through roles mapped from enterprises’ identity providers or SSOs.
Monitoring Serverless Applications with DatadogDevOps.com
Join Datadog for a webinar on monitoring serverless applications with AWS Lambda. You'll learn how to get the most out of Datadog's platform, as well as the following key takeaways:
Learn how to set up a Twitter bot that makes API calls with Node.js
Deploying Serverless Applications
What does observability look like with less infrastructure?
Deliver your App Anywhere … Publicly or PrivatelyDevOps.com
Developers are increasingly adopting a microservices approach for their apps in order to gain the rapid iteration capabilities required for delivering new services faster. However, delivering the app still requires multiple steps such as allocation of virtual IPs, provisioning the front load balancer, configuring firewall rules, configuring a public domain, and DDoS protection. At present, each of these steps requires coordination across multiple teams with multiple iterations per team. The time efficiencies gained by adopting microservices and cloud-native technologies are negated by the time taken to deliver the app.
In this session, Pranav Dharwadkar, VP of products at Volterra, and Jakub Pavlik, director of engineering, will help you understand these challenges and introduce a distributed proxy architecture that can alleviate the challenges across different cloud environments. This webinar will include a live demo using a distributed proxy architecture to advertise an App publicly and privately.
In this webinar, you will learn:
The steps required to deliver an App using the current approaches
How a distributed proxy architecture can be used to deliver the app publicly and privately
The operational benefits of a distributed proxy architecture for delivering new services
Securing medical apps in the age of covid finalDevOps.com
The COVID-19 pandemic has drastically altered the connected healthcare landscape, accelerating the usage of telemedicine and other remote healthcare delivery systems by as much as 11,000% for some populations. How has this unprecedented push affected healthcare and medical device application security? The security team at Intertrust recently analyzed 100 Android and iOS medical apps to find out.
In this webinar, we'll discuss:
Medical application and device threat trends
The top mHealth security vulnerabilities uncovered in our analysis
Strategies to keep your mHealth apps safe
Future advances in digital healthcare and how your security can evolve with it
Raise your hand if you enjoy being buried in alerts or woken up at 2 a.m. — yeah … thought so. Ever-rising customer expectations around high availability and performance put massive pressure on the teams who develop and support SaaS products. And teams are literally losing sleep over it. Until outages and other incidents are a thing of the past, organizations need to invest in a way of dealing with them that won’t lead to burn-out.
In this session, you’ll learn how to combine the latest tooling with DevOps practices in the pursuit of a sustainable incident response workflow. It’s all about transparency, actionable alerts, resilience and learning from each incident.
The Evolving Role of the Developer in 2021DevOps.com
The role of the developer continues to change as they sit on the front line of application and even cloud infrastructure security. Today, developers are focused on innovating fast and improving security, but how do high-performing teams accomplish this? They commit code frequently, release often and update dependencies regularly (608x faster than others).
In this webinar, we'll discuss the key traits of high-performing teams and how that impacts the role of the developer.
Key Takeaways:
Choose the best third party dependencies
Determine the lowest effort upgrades between open source versions
Solve for issues in both direct and transitive dependencies with a single-click
Block and quarantine suspicious open source components
Service Mesh: Two Big Words But Do You Need It?DevOps.com
Today, one of the big concepts buzzing in the app development world is service mesh. A service mesh is a configurable infrastructure layer for microservices application that makes communication flexible, reliable and fast. Let’s take a step back, though, and answer this question: Do you need a service mesh?
Join this webinar to learn:
What a service mesh is; when and why you need it — or when and why you may not
App modernization journey and traffic management approaches for microservices-based apps
How to make an informed decision based on cost and complexity before adopting service mesh
Learn about NGINX Service Mesh in a live demo, and how it provides the best service mesh option for container-based L7 traffic management
Secure Data Sharing in OpenShift EnvironmentsDevOps.com
Red Hat OpenShift is enabling quicker adoption of DevOps practices. Containers are an essential component of DevOps and the OpenShift Kubernetes Container Platform is integral for orchestration within these environments. Data security is now challenged to keep pace with the size and scope of container usage. The migration from legacy in-house deployments to hybrid-cloud installations has created new attack surfaces as data is shared more freely in Kubernetes deployments.
Protecting data at rest and in motion is a necessity. Learn how you can keep data protected and securely share data in OpenShift environments with real-time data protection solutions.
How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...DevOps.com
Managing access permissions in the public cloud can be a very complex process. In fact, by 2023, 75% of cloud security failures will result from the inadequate management of identities, access and privileges, according to Gartner.
Join us as Guy Flechter, CISO of AppsFlyer, presents a real-world case of how his company works to enforce least-privilege and to govern identities in their cloud. This webinar will also provide an overview of how to govern access and achieve least privilege by analyzing the access permissions and activity in your public cloud environment. With thousands of human and machine identities, roles, policies and entitlements, this webinar will give you the tools to examine the access open to people and services in your public cloud, and determine whether that access is necessary.
In this workshop, you will learn about:
The risks of IAM misconfiguration and excessive entitlements in cloud environments
The challenges in identifying and mitigating Identity and access risks for both human and machine identities
How to automate cloud identity governance and entitlement management with Ermetic
Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...DevOps.com
Open-source machine learning can be transformative, but without the proper tools in place, enterprises struggle to balance IT security and governance requirements with the need to deliver these powerful tools into the hands of their developers and modelers.
How can organizations get the latest technology from the open-source brain trust, while ensuring enterprise-grade management and security? In this webinar, we will discuss how Anaconda Team Edition, available on RedHat Marketplace, enables IT departments to mirror a curated set of packages into their organization in a safe and governed way.
Join Michael Grant, VP of services at Anaconda, to discuss:
How IT organizations are using Anaconda Team Edition to curate, govern and secure Python and R packages
Tips for how development and data science teams can get the most out of Team Edition, from uploading your own packages to building custom channels for groups or projects
How to distribute conda environments to desktops, servers and clusters:
GUI-based installers for desktop users
“Conda packs” for automated delivery to remote servers and distributed computing clusters
Conda-enabled Docker containers for application deployment
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a PASSION for making things work, along with a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Using Time Series for Full Observability of a SaaS Platform
1. Using Time Series for Full
Observability of a SaaS Platform
Aleksandr Tavgen, Playtech; co-founder of Timetrix
2. About me
More than 19 years of professional experience
FinTech and Data Science background
From Developer to Site Reliability Engineer
Solved and automated problems in Operations at scale
4. Overall problem
• Zoo of monitoring solutions in large enterprises, often distributed over the world
• M&A transactions or distributed teams make central management impossible or ineffective
• For small enterprises or startups, the key question is finding the best solution
• A lot of companies have failed this way
• A lot of anti-patterns have developed
5. Managing a Zoo
• A lot of independent teams
• Everyone has some sort of solution
• It is hard to get an overall picture of operations
• It is hard to orchestrate and make changes
7. Common Anti-patterns
It is tempting to keep everything recorded just in case
The amount of metrics in monitoring grows exponentially
Nobody understands such a huge bunch of metrics
Engineering complexity grows as well
8. Uber case – 9 billion metrics / 1000+ instances for the monitoring solution
9. Dashboards problem
• A proliferating amount of metrics leads to unusable dashboards
• How can one observe 9 billion metrics?
• Quite often it looks like spaghetti
• It is common to pursue this anti-pattern for approx. 1.5 years
• GitLab Dashboards are a good example
10. IF YOU NEED 9 BILLION METRICS, YOU ARE PROBABLY WRONG
15. Actually not
• Dashboards are very useful when you know where and when to watch
• Our brain can recognize and process visual patterns more effectively
• But only when you know what you are looking for, and when
16. Queries vs. Dashboards
Querying your data requires more cognitive effort than a quick look at dashboards
Metrics are a low-resolution view of your system’s dynamics
Metrics should not replace logs
It is not necessary to have millions of them
17. What are Incidents
• Something that has an impact on the operational/business level
• Incidents are expensive
• Incidents come with credibility costs
18. COST OF AN HOUR OF DOWNTIME, 2017-2018
https://www.statista.com/statistics/753938/worldwide-enterprise-server-hourly-downtime-cost/
19. Causes of outage
• Change
• Network Failure
• Bug
• Human Factor
• Hardware Failure
• Unspecified
22. What is it all about?
• Any reduction of the outage/incident timeline results in significant positive financial impact
• It is about credibility as well
• And your DevOps teams feel less pain and toil on their way
24. Metrics
• It is almost impossible to operate on billions of metrics
• Even under normal system behavior there will always be outliers in real production data
• Therefore, not all outliers should be flagged as anomalous incidents
• The Etsy Kale project is a case in point
26. Paradigm Shift
• The main paradigm shift comes from the fields of infrastructure and architecture
• Cloud architectures, microservices, Kubernetes, and immutable infrastructure have changed the way companies build and operate systems
• Virtualization, containerization, and orchestration frameworks abstract away the infra level
• Moving towards abstraction from the underlying hardware and networking means that we must focus on ensuring that our applications work as intended in the context of our business processes
27. KPI monitoring
• KPI metrics are related to the core business operations
• They could be logins, active sessions, or any domain-specific operations
• Heavily seasonal
• Static thresholds can't help here
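Why static thresholds fail on seasonal KPIs can be sketched in a few lines. This is a hypothetical Python illustration (the toy data, slot layout, and `seasonal_alert` helper are invented): instead of a fixed number, the current value is compared to the same time slot in previous cycles, so the threshold follows the seasonality.

```python
def seasonal_alert(history, current, slot, tolerance=0.5):
    """history: past cycles, each a list of values indexed by time slot.
    Alert if `current` falls more than `tolerance` below the slot's mean."""
    baseline = sum(cycle[slot] for cycle in history) / len(history)
    return current < baseline * (1 - tolerance)

# Three past days of a toy 4-slot login rate (night, morning, day, evening).
past_days = [[20, 400, 900, 300], [25, 380, 950, 310], [22, 410, 880, 290]]

print(seasonal_alert(past_days, current=15, slot=0))   # quiet night: no alert
print(seasonal_alert(past_days, current=300, slot=2))  # daytime collapse: alert
```

A single static threshold low enough to stay silent at night (baseline ~22) could never catch a daytime drop from ~900 to 300, which the per-slot baseline flags immediately.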
56. Overwhelming results
• Red area – Customer Detection
• Blue area – Own Observation (toil)
• Orange line – Central Grafana introduced
• Green line – ML-based solution in prod
Customer Detection has dropped to low percentage points
57. General view
• Finding anomalies on metrics
• Finding regularities on a higher level
• Combining events from organization internals (changes/deployments)
• Stream processing architectures
58. Why do we need time-series storage?
• We have unpredictable delays on networking
• Operating worldwide is a problem
• CAP theorem
• You can receive signals from the past
• But you should look into the future too
• How long should this window into the future be?
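The "signals from the past" point can be sketched as a lateness window. This is a hypothetical Python illustration (not the talk's implementation; the 30-second window and `emit_ready` helper are invented): signals are buffered and a slot is only emitted once it is older than the allowed lateness, so late arrivals still land in order.

```python
def emit_ready(buffer, now, lateness_s=30):
    """Split buffered (timestamp, value) signals into ready and pending.

    A signal is ready once it is older than the lateness window, i.e. no
    later-arriving signal for that time range is expected anymore.
    """
    cutoff = now - lateness_s
    ready = sorted(s for s in buffer if s[0] <= cutoff)   # emit in time order
    pending = [s for s in buffer if s[0] > cutoff]        # keep waiting
    return ready, pending

# A signal stamped 95 arrived after the one stamped 100 (network delay).
buf = [(100, "a"), (95, "late"), (125, "b")]
ready, pending = emit_ready(buf, now=130)
print(ready)    # [(95, 'late'), (100, 'a')] -- late signal still ordered
print(pending)  # [(125, 'b')] -- still inside the lateness window
```

The slide's closing question is exactly the trade-off in `lateness_s`: a longer window tolerates slower networks but delays every emission by that much.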
59. Why not Kafka and all that classical streaming?
• Frameworks like Storm and Flink are oriented on tuples, not time-ordered events
• We do not want to process everything
• A lot of events are needed on demand
• It is ok to lose some signals in favor of performance
• And we still have signals from the past
60. Why InfluxDB v2.0
• Flux
• Better isolation
• Central storage for metrics, events, and traces
• Streaming paradigm
61. Taking a higher picture
• Finding anomalies on a lower level
• Tracing
• Event logs
• Finding regularities between them
• Building a topology
• We can call it AIOps as well
62. OpenTracing
• Tracing is a higher resolution of your system’s dynamics
• Distributed tracing can show you unknown-unknowns
• It reduces the Investigation part of the Incident Timeline
• There is a good OSS implementation: Jaeger
• InfluxDB v2.0 is a supported backend storage
63. Jaeger with InfluxDB v2.0 as backend storage
• Real prod case
• Approx. 8,000 traces every minute
• Performance issue with a limitation on I/O ops connections
• Bursts of context switches on the kernel level
64. Impact on a particular execution flow
• DB query time is quite constant
• Processing time in the normal case: 1-3 ms
• After a process context switch: more than 40 ms
65. Flux
• Multi-source joining
• Same functional composition paradigm
• Easy to test hypotheses
• You can combine metrics, event logs, and traces
• Data transformation based on conditions
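What a multi-source join buys you can be sketched conceptually in Python (in Flux itself this would be a `join()` across buckets; the data, field names, and `correlate` helper below are invented for illustration): anomaly timestamps from a metrics stream are paired with deployment events from an internal event log that happened shortly before them.

```python
def correlate(anomalies, deployments, window_s=600):
    """Pair each anomaly timestamp with deployments in the preceding window."""
    return {
        t: [d for d in deployments if 0 <= t - d["ts"] <= window_s]
        for t in anomalies
    }

deploys = [{"ts": 1000, "service": "wallet"}, {"ts": 5000, "service": "login"}]
anomaly_ts = [1300, 9000]

result = correlate(anomaly_ts, deploys)
print(result[1300])  # wallet deploy 300 s earlier -> likely cause
print(result[9000])  # no recent deploy -> investigate elsewhere
```

The payoff is the slide's "transformation based on conditions": an alert that carries its probable cause (a deploy minutes earlier) is actionable, while a bare threshold breach is just noise.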
68. • Let’s check relations between them
• Looks more like stationary time – series
• Easier to model
• Let’s check relations between them
• Looks more like stationary time – series
• Easier to model
69. Random Walk
• Processes have a lot of random factors
• Random Walk modelling
• X(t) = X(t-1) + Er(t)
• Er(t) = X(t) - X(t-1)
• A stationary time series is very easy to model
• No need for statistical models
• Just a reservoir with variance
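The slide's formulas can be turned into a short sketch. This is a hypothetical Python illustration (the data and `abnormal_steps` helper are invented): differencing the walk, Er(t) = X(t) - X(t-1), yields a roughly stationary step series, and a plain variance "reservoir" on those steps is enough to flag an abnormal jump, with no heavier statistical model needed.

```python
def abnormal_steps(walk, k=2.0):
    """Difference the walk and flag steps more than k std devs from the mean."""
    diffs = [b - a for a, b in zip(walk, walk[1:])]  # Er(t) = X(t) - X(t-1)
    mean = sum(diffs) / len(diffs)
    std = (sum((d - mean) ** 2 for d in diffs) / len(diffs)) ** 0.5
    return [i for i, d in enumerate(diffs) if abs(d - mean) > k * std]

# Walk observed at 8 points; the jump from 13 to 52 is the anomaly.
walk = [10, 11, 9, 10, 12, 13, 52, 53]
print(abnormal_steps(walk))  # [5] -> the 13 -> 52 step
```

Note that thresholding the raw walk values would also flag 53, even though the step from 52 to 53 is perfectly ordinary; working on the differenced series isolates the single abnormal transition.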
76. • It is all about semantics
• Datacenters, sites, services
• Graph topology based on time-series data
77. Timetrix
• Since a lot of people from different companies are involved
• We decided to open-source the core engine
• Integrations specific to particular domains and companies can easily be added
• We plan to launch in Q3/Q4 2019
• The core engine is written in Java
• Great kudos to the bonitoo.io team for the great drivers