Netflix accounts for more than a third of all traffic heading into American homes at peak hours. Making sure users get the best possible experience at all times is no simple feat, and performance is at the core of that experience. To ensure performance and maintain development agility in a highly decentralized environment, Netflix employs a multitude of strategies, such as production canary analysis, fully automated performance tests, simple zero-downtime deployments and rollbacks, auto-scaling clusters and a fault-tolerant stateless service architecture. We will present a set of use cases that demonstrate how and why different groups employ different strategies to achieve a common goal: great performance and stability. We will also detail how these strategies are incorporated into development, test and DevOps with minimal overhead.
Honest Performance Testing with "NDBench" (Vinay Chella, Netflix) | Cassandra... - DataStax
Apache Cassandra makes it possible to execute millions of operations per second in scalable fashion. Harnessing the power of C* leaves many developers pondering about the following:
- Is my data model appropriate and not going to end up as wide partition(s) causing heap pressure and other issues?
- How do I tune my connection pool configuration? What are the optimal settings for my environment?
- What is my C* cluster capacity in terms of number of IOPs for a given 95th and 99th percentile latency?
- How do I perf-test my data access layer?
In this talk, Vinay Chella, Cloud Data Architect @ Netflix, will share the open source tools, techniques and platform (NDBench) that Netflix uses to perf-test its C* fleet by simulating millions of operations per second.
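The capacity question above (how many operations per second a cluster sustains at a given 95th or 99th percentile latency) can be illustrated with a small sketch. This is not NDBench's API; the function names, the nearest-rank percentile definition, and the sample latencies are all assumptions made for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def max_rate_within_slo(runs, pct, slo_ms):
    """Highest tested ops/sec whose pct-th percentile latency stays within slo_ms, or None."""
    passing = [rate for rate, latencies in runs.items()
               if percentile(latencies, pct) <= slo_ms]
    return max(passing) if passing else None


# Hypothetical latency samples (milliseconds), keyed by the offered load in ops/sec.
runs = {
    1000: [1, 2, 2, 3, 3, 3, 4, 4, 5, 9],
    2000: [2, 3, 3, 4, 4, 5, 5, 6, 8, 15],
    4000: [3, 5, 6, 8, 9, 12, 15, 20, 30, 60],
}

print(max_rate_within_slo(runs, 95, 20))  # prints 2000
```

A real load test would collect far more samples per load level, but the shape of the question stays the same: fix a percentile and a latency budget, then find the highest load that still meets it.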
About the Speaker
Vinay Chella, Cloud Data Architect, Netflix Inc.
About Vinay Chella: a Cloud Data Architect at Netflix with a deep understanding of Cassandra and relational databases. As an engineer and architect, he works extensively on data modeling, performance tuning and guiding best practices for various persistence stores, and helps teams at Netflix build next-generation data access layers.
Monitorama 2015 talk by Brendan Gregg, Netflix. With our large and ever-changing cloud environment, it can be vital to debug instance-level performance quickly. There are many instance monitoring solutions, but few come close to meeting our requirements, so we've been building our own and open sourcing them. In this talk, I will discuss our real-world requirements for instance-level analysis and monitoring: not just the metrics and features we desire, but the methodologies we'd like to apply. I will also cover the novel solutions we have been developing ourselves to meet these needs, which include the use of advanced Linux performance technologies (e.g., ftrace, perf_events) and on-demand self-service analysis (Vector).
Application development has come a long way. From client-server, to desktop, to web-based applications served by monolithic application servers, the need to serve billions of users and hundreds of devices has become crucial to today's business. Typesafe Reactive Platform helps you to modernize your applications by transforming the most critical parts into microservice-style architectures which support extremely high workloads and allow you to serve millions of end-users.
Your Guide to Streaming - The Engineer's Perspective - Ilya Ganelin
It feels like every week there's a new open-source streaming platform out there. Yet, if you only look at the descriptions, performance metrics, or even the architecture, they all start to look exactly the same! In short, nothing really differentiates itself, whether it be Storm, Flink, Apex, Gearpump, Samza, Kafka Streams, Akka Streams, or any of the other myriad technologies. So if they all look the same, how do you really pick a streaming platform to solve the problem that YOU have? This talk is about how to really compare these platforms, and it turns out that they do have their key differences; they're just not the ones you usually think about. If you're building something to last, a well-engineered system, the way to compare these systems is to look at how they handle durability and availability, how easy they are to install and use, and how they deal with failures.
Measure and Increase Developer Productivity with Help of Serverless at JCON 2... - Vadym Kazulkin
The goal of Serverless is to focus on writing the code that delivers business value and offload everything else to your trusted partners (like Cloud providers or SaaS vendors). You want to iterate quickly, and today’s code quickly becomes tomorrow’s technical debt. In this talk we will show why Serverless adoption increases developer productivity and how to measure it. We will also go through AWS Serverless architectures where you only glue together different Serverless managed services, relying solely on configuration and minimizing the amount of code written.
Title: Sista: Improving Cog’s JIT performance
Speaker: Clément Béra
Thu, August 21, 9:45am – 10:30am
Video Part1
https://www.youtube.com/watch?v=X4E_FoLysJg
Video Part2
https://www.youtube.com/watch?v=gZOk3qojoVE
Description
Abstract: Although recent improvements to the Cog VM's performance have made it one of the fastest available Smalltalk virtual machines, the overhead compared to optimized C code remains significant. Efficient industrial object-oriented virtual machines, such as the V8 JavaScript engine in Google Chrome and Oracle's Java HotSpot, can match the performance of optimized C code on many benchmarks thanks to adaptive optimizations performed by their JIT compilers. The VM thus becomes smarter: after executing the same portion of code many times, it pauses execution, inspects what the code is doing, and recompiles the critical portions into faster code based on the current environment and previous executions.
Bio: Clément Béra and Eliot Miranda have been working together on Cog's JIT performance for the last year. Clément Béra is a young engineer and has been working in the Pharo team for the past two years. Eliot Miranda is a Smalltalk VM expert who, among other things, implemented Cog's JIT and the Spur Memory Manager for Cog.
A Journey to Reactive Function Programming - Ahmed Soliman
A gentle introduction to functional reactive programming that highlights the Reactive Manifesto and ends with a demo in RxJS: https://github.com/AhmedSoliman/rxjs-test-cat-scope
With the advent of “big data”, it has become essential to analyze huge volumes of data in real time to make sense of it. For this to happen seamlessly, that data must be streamed. This is where Reactive Streams steps in.
Akka Streams is built on top of the Reactive Streams interface. This webinar will be an introduction to Akka Streams and how it simplifies the aspect of back-pressure in real-time streaming.
Here’s an outline of the webinar -
~ Introduction to the problem set
~ How do Akka Streams help simplify the problem of back-pressure?
~ Basic terminologies of Akka Streams
~ Live demo of a real-life problem being solved with Akka Streams
This talk is from Distributed Data Summit SF 2018 - http://distributeddatasummit.com/2018-sf/sessions#netflix2
Operating C* can require a lot of manpower, complex automation, or both. Some of this complexity comes from operating and configuring the underlying kernel and hardware, but much of it is operational complexity stemming from C* itself. Examples include restarting the database safely, reliably backing up and restoring snapshots, monitoring the health of the datastore, and even ensuring eventual consistency through repair. As a result, C* operators end up with complicated operational setups, which are expensive to build, manage and monitor. As part of this talk, we will share lessons learned in managing such complexity via our Priam sidecar, including recent innovations in how our sidecar ensures the highest possible uptime and correctness of Cassandra. We then use this to motivate building the management sidecar directly into C* itself (CASSANDRA-14395).
Moving to the cloud isn’t easy, so transforming your engineering team to adapt to the cloud and services lifestyle is crucial. It all starts with creating a common understanding of the engineering and development principles that matter in the cloud, which differ from those for building regular applications. This session will take you on a road trip based on the presenter's experience developing and, more importantly, operating Azure Active Directory, SQL Server Azure and most recently the Xbox Live Services to support Xbox One.
NetflixOSS Meetup S3 E1, covering latest components in Distributed Databases, Telemetry systems, Big Data tools and more. Speakers from Netflix, IBM Watson, Pivotal and Nike Digital
An Open Source Notebook based tool that supports Data ingestion, collaboration and Analytics. Offers visualization with modern UI and other interesting features including GitHub integration, cron schedules, importing and exporting notebooks and many more.
Netflix: From Zero to Production-Ready in Minutes (QCon 2017) - Tim Bozarth
Slides from Tim Bozarth's (@timbozarth) QCon 2017 presentation (https://qconnewyork.com/ny2017/presentation/zero-production-ready-minutes)
Abstract:
The fabric of Netflix's approach to building new highly-available services is evolving. The Runtime Platform Team is focused on improving developer productivity while simultaneously making it simpler to build and maintain the high-availability services that Netflix expects. Starting with application generation, and leveraging a new approach to communication between services (RPC), we're simplifying what's needed to build a fast, reliable, and optimized service capable of delivering a fantastic customer experience.
We'll be sharing how Netflix is enabling engineers to go from "zero" to "production ready" in minutes - incorporating best-practices learned through years in the cloud. We will also share the story of transitioning from our home-grown RPC machinery to open-source standards, how we recognized when it was the right time to walk away from our own creations, and how our new approach is improving team velocity across Netflix engineering.
Reactive Streams 1.0.0 is now live, and so are our implementations in Akka Streams 1.0 and Slick 3.0.
Reactive Streams is an engineering collaboration between heavy hitters in the area of streaming data on the JVM. With the Reactive Streams Special Interest Group, we set out to standardize a common ground for achieving statically-typed, high-performance, low latency, asynchronous streams of data with built-in non-blocking back pressure—with the goal of creating a vibrant ecosystem of interoperating implementations, and with a vision of one day making it into a future version of Java.
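The core of non-blocking back pressure is that a subscriber signals demand and the publisher never emits more than was requested. The sketch below is a deliberately simplified, synchronous illustration of that idea in Python. It mirrors the `request(n)` / `onNext` / `onComplete` vocabulary of Reactive Streams, but the class names, the snake_case method names, and the `drain` helper are assumptions made for this example; the real specification is asynchronous and has many more rules.

```python
class Subscription:
    """Hands out items only in response to explicit demand from the subscriber."""

    def __init__(self, items, subscriber):
        self._items = iter(items)
        self._subscriber = subscriber
        self._done = False

    def request(self, n):
        """Emit at most n items; signal completion when the source runs dry."""
        if n <= 0:
            raise ValueError("demand must be positive")
        for _ in range(n):
            if self._done:
                return
            try:
                item = next(self._items)
            except StopIteration:
                self._done = True
                self._subscriber.on_complete()
                return
            self._subscriber.on_next(item)

    def cancel(self):
        """Stop emitting, even if demand is outstanding."""
        self._done = True


class CollectingSubscriber:
    """Records what it receives; it never gets more than it asked for."""

    def __init__(self):
        self.received = []
        self.completed = False

    def on_next(self, item):
        self.received.append(item)

    def on_complete(self):
        self.completed = True


def drain(items, batch_size):
    """Pull everything from items in batches, returning (received, number_of_requests)."""
    subscriber = CollectingSubscriber()
    subscription = Subscription(items, subscriber)
    requests = 0
    while not subscriber.completed:
        subscription.request(batch_size)  # the subscriber controls the pace
        requests += 1
    return subscriber.received, requests


print(drain(range(5), 2))  # prints ([0, 1, 2, 3, 4], 3)
```

Because the publisher side only moves when `request` is called, a slow consumer simply asks less often, and nothing upstream has to buffer unboundedly or block a thread waiting for it.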
Akka (recent winner of “Most Innovative Open Source Tech in 2015”) is a toolkit for building message-driven applications. With Akka Streams 1.0, Akka has incorporated a graphical DSL for composing data streams, an execution model that decouples the stream’s staged computation, its “blueprint”, from its execution (allowing for actor-based, single-threaded and fully distributed and clustered execution), type-safe stream composition, an implementation of the Reactive Streams specification that enables back-pressure, and more than 20 predefined stream “processing stages” that provide common streaming transformations that developers can tap into (for splitting streams, transforming streams, merging streams, and more).
Slick is a relational database query and access library for Scala that enables loose-coupling, minimal configuration requirements and abstraction of the complexities of connecting with relational databases. With Slick 3.0, Slick now supports the Reactive Streams API for providing asynchronous stream processing with non-blocking back-pressure. Slick 3.0 also allows elegant mapping across multiple data types, static verification and type inference for embedded SQL statements, compile-time error discovery, and JDBC support for interoperability with all existing drivers.
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT - OpenStack
Audience: Advanced
About: Real world lessons and war stories about Catalyst IT’s experience in rolling out an OpenStack based public cloud in New Zealand.
This presentation will provide tips and advice that may save you a lot of time, money and nights of sleep if you are planning to run OpenStack in the future. It may also bring some insights to people that are already running OpenStack in production.
Topics covered will include: selection of hardware for optimal costs, techniques that drive quality and service levels up, common deployment mistakes, in place upgrades, how to identify the maturity level of each project and decide what is ready for production, and much more!
Speaker Bio: Bruno Lago – Entrepreneur, Catalyst IT Limited
Bruno Lago is a solutions architect that has been involved with the Catalyst Cloud (New Zealand’s first public cloud based on OpenStack) from its inception. He is passionate about open source software, cloud computing and disruptive technologies.
OpenStack Australia Day - Sydney 2016
https://events.aptira.com/openstack-australia-day-sydney-2016/
Triangle Devops Meetup covering Netflix open source, cloud architecture, and what Andrew did in his first year working as a senior software engineer in the cloud platform group.
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber... - StormForge .io
Complimentary Live Webinar
Sponsored by StormForge
Analyzing the performance and behavior of applications run on Kubernetes is often challenging, making the need to optimize prior to production something that you must have. However, a problem has reared its head in the form of a question: How do you get an accurate measurement of application performance or other behavior without accurate testing or an accurate representation of how it will run in production? In this webinar, we will present and discuss a new fully Open Source tool for creating the needed tests with which to accurately measure your applications. We hope you will join us to learn more about this tool, and find out how you can help contribute.
This webinar is sponsored by StormForge and hosted by The Linux Foundation.
Speaker
Noah Abrahams, Open Source Advocate
Noah is an Open Source Advocate for StormForge, merging Open Source Strategy with Developer Advocacy. He has been involved in cloud for over 12 years, has been contributing to the Kubernetes ecosystem for 5 years, and has been up and down the business stack from DevOps and Architecture to Sales, Enablement, and Education. You will find him running meetups in Las Vegas and attending conferences, once those are both happening again.
Antifragility and testing for distributed systems failure - DiUS
Failure is inevitable. In our modern world filled with continuously delivered and increasingly complex distributed architectures (looking at you micro-services), it is important to be able to test and improve our systems under a range of failure conditions.
In this talk, Matt discusses these complexities and the forces they exert on development teams, presenting some simple strategies and practical advice to deal with them.
Security in CI/CD Pipelines: Tips for DevOps Engineers - DevOps.com
While DevOps is becoming the new norm for most companies, security typically still lags behind. The new architectures create a number of new process considerations and technical issues. In this practical talk, we will present an overview of the practical issues that go into making security a part of DevOps processes. We will cover incorporating security into existing CI/CD pipelines and the tools DevOps professionals need to know to implement automation and adhere to secure coding practices.
Join Stepan Ilyin, Chief Product Officer at Wallarm for an engaging conversation where you’ll learn:
Methodologies and tooling for dynamic and static security testing
Composite and OSS license analysis benefits
Secrets analysis and secrets management approaches in distributed applications
Security automation and integration in CI/CD
Apps, APIs and workloads protection in cloud-native K8s enabled environments
Performance Test Automation With Gatling - Knoldus Inc.
Gatling is a lightweight DSL written in Scala that lets you treat performance tests as production code, so you can write readable code to test an application's performance. It is a framework based on Scala, Akka and Netty.
Automating Speed: A Proven Approach to Preventing Performance Regressions in ... - HostedbyConfluent
"Regular performance testing is one of the pillars of Kafka Streams’ reliability and efficiency. Beyond ensuring dependable releases, regular performance testing supports engineers in new feature development with the ability to easily test the performance impact of their features, compare different approaches, etc.
In this session, Alex and John share their experience from developing, using, and maintaining a performance testing framework for Kafka Streams that has prevented multiple performance regressions over the last 5 years. They cover guiding principles and architecture, how to ensure statistical significance and stability of results, and how to automate regression detection for actionable notifications.
This talk sheds light on how Apache Kafka is able to foster a vibrant open-source community while maintaining a high performance bar across many years and releases. It also empowers performance-minded engineers to avoid common pitfalls and bring high-quality performance testing to their own systems."
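One common way to make regression detection statistically grounded is a permutation test on benchmark results. The sketch below is an illustration only, not the framework described in the talk: the function names, the one-sided test on mean latency, the significance threshold, and the sample numbers are all assumptions.

```python
import random
from statistics import mean


def permutation_p_value(baseline, candidate, trials=10_000, seed=0):
    """One-sided p-value for 'candidate is slower than baseline' via random relabeling."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    observed = mean(candidate) - mean(baseline)
    pooled = list(baseline) + list(candidate)
    n = len(baseline)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        # Difference in means under a random split of the pooled measurements.
        diff = mean(pooled[n:]) - mean(pooled[:n])
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


def is_regression(baseline, candidate, alpha=0.01):
    """Flag a regression only when the slowdown is unlikely to be noise."""
    return permutation_p_value(baseline, candidate) < alpha


# Hypothetical latency measurements (ms) from repeated benchmark runs.
before = [100, 101, 99, 102, 98, 100, 101, 99]
after = [120, 119, 121, 122, 118, 120, 121, 119]

print(is_regression(before, after))
print(is_regression(before, before))
```

Repeating each benchmark several times and testing the difference, rather than comparing two single numbers, is what keeps notifications actionable: run-to-run noise alone rarely clears the significance threshold.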
Engineering Netflix Global Operations in the Cloud - Josh Evans
Delivered at re:Invent 2015.
Operating a massively scalable, constantly changing, distributed global service is a daunting task. We innovate at breakneck speed to attract new customers and stay ahead of the competition. This means more features, more experiments, more deployments, more engineers making changes in production environments, and ever-increasing complexity. Simultaneously improving service availability and accelerating rate of change seems impossible on the surface. At Netflix, operations engineering is both a technical and organizational construct designed to accomplish just that by integrating disciplines like continuous delivery, fault injection, regional traffic management, crisis response, best practice automation, and real-time analytics. In this talk, designed for technical leaders seeking a path to operational excellence, we'll explore these disciplines in depth and how they integrate and create competitive advantages.
(ISM301) Engineering Netflix Global Operations In The Cloud - Amazon Web Services
Operating a massively scalable, constantly changing, distributed global service is a daunting task. We innovate at breakneck speed to attract new customers and stay ahead of the competition. This means more features, more experiments, more deployments, more engineers making changes in production environments, and ever-increasing complexity. Simultaneously improving service availability and accelerating rate of change seems impossible on the surface. At Netflix, operations engineering is both a technical and organizational construct designed to accomplish just that by integrating disciplines like continuous delivery, fault injection, regional traffic management, crisis response, best practice automation, and real-time analytics. In this talk, designed for technical leaders seeking a path to operational excellence, we'll explore these disciplines in depth and how they integrate and create competitive advantages.
Applying the power of Continuous Delivery to performance testing. Process, techniques, best practices. This talk describes a pragmatic approach to building a robust performance testing strategy.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2qoUklo.
Mark Price talks about techniques for making performance testing a first-class citizen in a Continuous Delivery pipeline. He covers a number of war stories experienced by the team building one of the world's most advanced trading exchanges. Filmed at qconlondon.com.
Mark Price is a Senior Performance Engineer at Improbable.io, working on optimizing and scaling reality-scale simulations. Previously, he worked as Lead Performance Engineer at LMAX Exchange, where he helped to optimize the platform to become one of the world's fastest FX exchanges.
Ensuring Performance in a Fast-Paced Environment (CMG 2014)
1. Ensuring Performance in a Fast-Paced Environment
Martin Spier
Performance Engineering @ Netflix
@spiermar
mspier@netflix.com
Performance & Capacity 2014 by CMG
2. Martin Spier
● Performance Engineer @ Netflix
● Previously @ Expedia and Dell
● Performance
o Architecture, Tuning and Profiling
o Testing and Frameworks
o Tool Development
● Blog @ http://overloaded.io
● Twitter @spiermar
3. ● World's leading Internet television network
● ⅓ of all traffic heading into American homes at
peak hours
● > 50 million members
● > 40 countries
● > 1 billion hours of TV shows and movies per
month
● > 100s of different client devices
4. Agenda
● How Netflix Works
o Culture, Development Model, High-level
Architecture, Platform
● Ensuring Performance
o Auto-Scaling, Squeeze Tests, Simian Army, Hystrix,
Redundancy, Canary Analysis, Performance Test
Framework, Large Scale Tests
5. Freedom and Responsibility
● Culture deck* is TRUE
o 9M+ views
● Minimal process
● Context over control
● Root access to everything
● No approvals required
● Only Senior Engineers
* http://www.slideshare.net/reed2001/culture-1798664
6. Independent Development Teams
● Highly aligned, loosely coupled
● Free to define release cycles
● Free to choose any methodology
● But it’s an agile environment
● And there is a “paved road”
7. Development Agility
● Continuous innovation cycle
● Shorter development cycles
● Automate everything!
● Self-service deployments
● A/B Tests
● Failure cost close to zero
● Lower time to market
● Innovation > Risk
8.
9. Architecture
● Scalable and Resilient
● Micro-services
● Stateless
● Assume Failure
● Backwards Compatible
● Service Discovery
10. Zuul & Dynamic Routing
● Zuul, the front door for all requests from devices and
websites to the backend of the Netflix streaming
application
● Dynamic Routing
● Monitoring
● Resiliency and Security
● Region and AZ Failure
* https://github.com/Netflix/zuul
12. Performance Engineering
● Not a part of any development team
● Not a shared service
● Improve and maintain performance and reliability
through consultation
● Provide self-service performance analysis utilities
● Disseminate performance best practices
● And we’re hiring!
15. Squeeze Tests
● Stress Test, with Production Load
● Steering Production Traffic
● Understand the Upper Limits of Capacity
● Adjust Auto-Scaling Policies
● Automated Squeeze Tests
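A minimal sketch of how squeeze-test results might feed an auto-scaling policy. The sample data, the 100 ms latency limit, the 25% headroom and both function names are illustrative assumptions, not Netflix tooling.

```python
import math

def max_sustainable_rps(samples, latency_limit_ms):
    """Return the highest per-instance request rate whose latency stayed within the limit.

    samples: iterable of (requests_per_second, p99_latency_ms) pairs observed
    while production traffic was progressively steered onto one instance.
    """
    ok = [rps for rps, latency in samples if latency <= latency_limit_ms]
    if not ok:
        raise ValueError("no sample met the latency limit")
    return max(ok)

def instances_needed(total_rps, per_instance_rps, headroom=0.25):
    """Size the cluster so each instance runs below its measured limit."""
    usable = per_instance_rps * (1.0 - headroom)
    return math.ceil(total_rps / usable)

# Hypothetical squeeze-test observations for a single instance.
squeeze_samples = [(200, 45), (400, 52), (600, 70), (800, 140), (1000, 900)]
limit = max_sustainable_rps(squeeze_samples, latency_limit_ms=100)
print(limit, instances_needed(30_000, limit))
```

The measured per-instance limit, not a guess, becomes the input to the scaling threshold.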
16. Red/Black Pushes
● New builds are rolled out as new
Auto-Scaling Groups (ASGs)
● Elastic Load Balancers (ELBs)
control the traffic going to each
ASG
● Fast and simple rollback if issues
are found
● Canary Clusters are used to test
builds before a full rollout
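The push sequence above can be sketched with a toy in-memory model. `LoadBalancer`, `red_black_push` and the `healthy` callback are hypothetical names for illustration; a real push would call the cloud provider's APIs instead.

```python
from dataclasses import dataclass, field

@dataclass
class LoadBalancer:
    """Toy stand-in for an ELB: tracks which ASGs currently receive traffic."""
    active: set = field(default_factory=set)

    def enable(self, asg):
        self.active.add(asg)

    def disable(self, asg):
        self.active.discard(asg)

def red_black_push(lb, old_asg, new_asg, healthy):
    """Bring up new_asg next to old_asg, then switch traffic or roll back.

    healthy: callable that inspects the new ASG and returns True or False.
    The old ASG is only disabled, never destroyed, so rollback is one call.
    """
    lb.enable(new_asg)            # both builds take traffic briefly
    if healthy(new_asg):
        lb.disable(old_asg)       # new build takes all traffic
        return new_asg
    lb.disable(new_asg)           # rollback: old build keeps serving
    return old_asg

lb = LoadBalancer({"api-v100"})
serving = red_black_push(lb, "api-v100", "api-v101", healthy=lambda asg: True)
print(serving, lb.active)
```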
17. Monitoring: Atlas
● Humongous, 1.2 billion distinct time
series
● Integrated with all systems, production
and test
● 1 minute resolution, quick roll ups
● 12-month persistence
● API and querying UI
● System and Application Level
● Servo (github.com/Netflix/servo)
● Custom dashboards
18. Vector
● 1 second Resolution
● No Persistence
● Leverages Performance Co-Pilot (PCP)
● System-level Metrics
● Java Metrics (parfait)
● ElasticSearch, Cassandra
● Flame Graphs (Brendan Gregg)
19. Mogul
● ASG and Instance Level
● Resource Demand;
● Performance
Characteristics;
● And Downstream
Dependencies.
21. Canary Release
“Canary release is a technique to reduce the risk
of introducing a new software version in
production by slowly rolling out the change to a
small subset of users before rolling it out to the
entire infrastructure and making it available to
everybody.”
22. Automated Canary Analysis (ACA)
Exactly what the name implies. An automated
way of analyzing a canary release.
23. ACA: Use Case
● You are a service owner and have finished
implementing a new feature into your application.
● You want to determine if the new build, v1.1, is
performing analogously to the existing build.
● The new build is deployed automatically to a canary
cluster
● A small percentage of production traffic is steered to the
canary cluster
● After a short period of time, canary analysis
is triggered
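One possible way to steer a small, stable slice of traffic to the canary cluster is to hash a request identifier. This is a sketch under assumptions: the slides do not say how Netflix selects canary traffic, and `pick_cluster` and the `req-N` ids are hypothetical.

```python
import hashlib

def pick_cluster(request_id, canary_percent=1.0):
    """Route a stable slice of requests to the canary cluster.

    Hashing the request id (rather than drawing a random number) keeps the
    decision deterministic, so the same id always lands on the same cluster.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000  # 0..9999
    return "canary" if bucket < canary_percent * 100 else "baseline"

counts = {"canary": 0, "baseline": 0}
for i in range(100_000):
    counts[pick_cluster(f"req-{i}", canary_percent=1.0)] += 1
print(counts)  # canary should receive roughly 1% of the requests
```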
24. Automated Canary Analysis
● For a given set of metrics, ACA will compare
samples from baseline and canary;
● Determine if they are analogous;
● Identify any metrics that deviate from the
baseline;
● And generate a score that indicates the overall
similarity of the canary.
25. Automated Canary Analysis
● The score will be associated
with a Go/No-Go decision;
● And the new build will be
rolled out (or not) to the rest
of the production
environment.
● No workload definitions
● No synthetic load
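The compare, score and decide steps can be sketched as follows. The comparison used here (relative difference of medians with a 10% tolerance) and the 90-point threshold are stand-in assumptions; the slides do not describe the statistics ACA actually applies, and all names and sample values are hypothetical.

```python
from statistics import median

def metric_is_analogous(baseline, canary, tolerance=0.10):
    """Treat a metric as analogous when the medians differ by at most `tolerance`."""
    base = median(baseline)
    if base == 0:
        return median(canary) == 0
    return abs(median(canary) - base) / abs(base) <= tolerance

def canary_score(baseline_metrics, canary_metrics, tolerance=0.10):
    """Return (score, deviating) where score is the percentage of analogous metrics."""
    deviating = [
        name for name in baseline_metrics
        if not metric_is_analogous(baseline_metrics[name], canary_metrics[name], tolerance)
    ]
    score = 100.0 * (len(baseline_metrics) - len(deviating)) / len(baseline_metrics)
    return score, deviating

def decide(score, threshold=90.0):
    """Map the score to a Go/No-Go decision."""
    return "GO" if score >= threshold else "NO-GO"

baseline = {
    "latency_ms": [50, 52, 51, 49],
    "error_rate": [0.010, 0.011, 0.009, 0.010],
    "cpu_pct": [40, 42, 41, 39],
}
canary = {
    "latency_ms": [51, 53, 50, 52],
    "error_rate": [0.010, 0.010, 0.011, 0.009],
    "cpu_pct": [55, 57, 56, 54],
}
score, deviating = canary_score(baseline, canary)
print(round(score, 1), deviating, decide(score))
```

Because both clusters serve real production traffic at the same time, no workload model or synthetic load is needed for the comparison.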
28. Remember the short release cycles?
With the short time span between production builds,
pre-production tests don’t warn us much sooner.
(And there’s ACA)
29. So when?
When it brings value. Not just because it is
part of a process.
30. When? Use Cases
● New Services
● Large Code Refactoring
● Architecture Changes
● Workload Changes
● Proof of Concept
● Initial Cluster Sizing
● Instance Type Migration
31. Use Cases, cont.
● Troubleshooting
● Tuning
● Teams that release less frequently
o Intermediary Builds
● Base Components (Paved Road)
o Amazon Machine Images (AMIs)
o Platform
o Common Libraries
32. Who?
● Push “tests” to development teams
● Development understands the product; they
developed it
● Performance Engineering knows the tools
and techniques (so we help!)
● Easier to scale the effort!
33. How? Environment
● Free to create any environment configuration
● Integration stack
● Full production-like or scaled-down environment
● Hybrid model
o Performance + integration stack
● Production testing
39. Large Scale Tests
● > 100k req/s
● > 100 load generators
● High Throughput Components
o In-Memory Caches
● Component scaling
● Full production tests
40. Large Scale Tests: Problems
● Your test client is likely the first bottleneck
● Components are (often) not designed to
scale
o Great performance per node;
o But they don’t scale horizontally.
o Controller, data feeder, load generator*, result
collection, result analysis, monitoring
* often the exception
41. Large Scale Tests: Single Controller
● Single controller, multiple load generators
● Controller also serves as data feeder
● Controller collects all results synchronously
● Controller aggregates monitoring data
● Batch and async might alleviate the problem
● Analysis of large result sets is heavy (think
percentiles)
42. Large Scale Tests: Distributed Model
● Data Feeding and Load Generation
o No Controller
o Independent Load Generators
● Data Collection and Monitoring
o Decentralized Monitoring Platform
● Data Analysis
o Aggregation at node level
o Hive/Pig
o ElasticSearch
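A minimal sketch of aggregation at node level: each load generator reduces its raw samples to a small histogram, and only the histograms are merged for percentile analysis. The 10 ms bucket width and the function names are assumptions for illustration.

```python
from collections import Counter

BUCKET_MS = 10  # assumed histogram bucket width

def node_histogram(latencies_ms):
    """Run on each load generator: reduce raw samples to bucket counts."""
    return Counter((int(v) // BUCKET_MS) * BUCKET_MS for v in latencies_ms)

def merge(histograms):
    """Combine per-node histograms; counters add, so order does not matter."""
    total = Counter()
    for h in histograms:
        total.update(h)
    return total

def percentile(histogram, pct):
    """Upper bound of the bucket containing the requested percentile."""
    target = pct / 100.0 * sum(histogram.values())
    seen = 0
    for bucket in sorted(histogram):
        seen += histogram[bucket]
        if seen >= target:
            return bucket + BUCKET_MS
    raise ValueError("empty histogram")

node_a = node_histogram([12, 15, 18, 22, 95])
node_b = node_histogram([11, 14, 31, 33, 240])
merged = merge([node_a, node_b])
print(percentile(merged, 50), percentile(merged, 99))
```

The trade-off is precision: percentiles are only accurate to the bucket width, but the data shipped per node stays small no matter how many requests were sent.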
43. Takeaways
● Canary analysis
● Testing only when it brings VALUE
● Leveraging cloud for tests
● Automated test analysis
● Pushing execution to development teams
● Open source tools
47. Simian Army
● Ensures cloud handles failures
through regular testing
● The Monkeys
o Chaos Monkey: Resiliency
o Latency: Artificial Delays
o Conformity: Best-practices
o Janitor: Unused Instances
o Doctor: Health checks
o Security: Security Violations
o Chaos Gorilla: AZ Failure
o Chaos Kong: Region Failure
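The idea behind Chaos Monkey can be sketched in a few lines. This is not the Simian Army code: `chaos_monkey`, the `terminate` hook, the probability parameter and the rule that single-instance groups are skipped are all assumptions for illustration.

```python
import random

def chaos_monkey(groups, terminate, probability=0.5, rng=None):
    """For each group, maybe terminate one randomly chosen instance.

    groups: dict mapping group name -> list of instance ids.
    terminate: callable invoked with the chosen instance id (hypothetical hook).
    Returns the list of instance ids that were terminated.
    """
    rng = rng or random.Random()
    victims = []
    for name, instances in groups.items():
        # Assumption: leave groups with a single instance alone.
        if len(instances) > 1 and rng.random() < probability:
            victim = rng.choice(instances)
            terminate(victim)
            victims.append(victim)
    return victims

killed = []
victims = chaos_monkey(
    {"api": ["i-1", "i-2"], "solo": ["i-9"]},
    terminate=killed.append,
    probability=1.0,
    rng=random.Random(0),
)
print(victims)
```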
48. Hystrix
“... is a latency and fault
tolerance library designed to
isolate points of access to
remote systems ...”
● Stop cascading failures.
● Fallbacks and graceful degradation
● Fail fast and rapid recovery
● Thread and semaphore isolation with
circuit breakers
● Real-time monitoring and
configuration changes
* https://github.com/Netflix/Hystrix
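The circuit breaker and fallback ideas can be illustrated with a small sketch. Hystrix itself is a Java library; the Python class below is not its API, and the class name, thresholds and injectable clock are assumptions chosen to keep the example short.

```python
import time

class CircuitBreaker:
    """Fail fast to a fallback after repeated failures, then retry later."""

    def __init__(self, max_failures=3, reset_after_s=5.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, command, fallback):
        # While open, skip the remote call until the reset window passes.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback()
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = command()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # a success closes the circuit again
        return result

breaker = CircuitBreaker(max_failures=2, reset_after_s=5.0)
print(breaker.call(lambda: "live response", fallback=lambda: "cached response"))
```

Once the circuit is open, callers get the fallback immediately instead of waiting on a failing dependency, which is what stops a failure from cascading.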
49. Real-time Analytics Platform (RTA)
● ACA runs on top of RTA
● Compute Engines
o OpenCPU (R)
o OpenPY (Python)
● Data Sources
o Real-time Monitoring Systems
o Big Data Platforms
● Reporting, Scheduling, Persistence
50. Slow Performance Regression
● Deviation => “acceptable” regression
● Small performance regressions might sneak in
● Short release cycle = many releases
● Many releases = cumulative regression
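The arithmetic behind the last two bullets, as a short sketch. The 2% per-release figure is an assumed example of a deviation small enough to pass as "acceptable"; it is not taken from the slides.

```python
def cumulative_regression(per_release, releases):
    """Compound a per-release slowdown (0.02 = 2%) over a number of releases."""
    return (1.0 + per_release) ** releases - 1.0

for n in (1, 10, 25, 50):
    print(n, f"{cumulative_regression(0.02, n):.1%}")
```

A 2% regression that passes every individual canary compounds to roughly 22% after ten releases, which is why each canary comparing only against the previous build is not enough on its own.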
52. Testing Lower Level Components
● Base AMIs
o OS (Linux), tools and agents
● Common Application Platform
● Common Libraries
● Reference Application
o Leverages a common architecture (front, middle,
data, memcache, jar clients, Hystrix)
o Implements functions that stress
specific resources (cpu, service, db)