A presentation on the Netflix Cloud Architecture and NetflixOSS open source. For the All Things Open 2015 conference in Raleigh 2015/10/19. #ATO2015 #NetflixOSS
DataStax: Backup and Restore in Cassandra and OpsCenterDataStax Academy
Cassandra and OpsCenter has a range of backup and restore topics. I will start with a basic overview of Cassandra backup/restore, walking through the operational steps to provide the understanding required to perform an on disk backup and restore. Expanding on this overview, I'll cover the limitations (including schema requirements) and their impact on the restore process. Further, I'll discuss commit log archiving and point in time restore operations. After covering the underlying operations, I'll wrap up with a discussion of how OpsCenter automates this process and leverages S3.
In the last few years, Apache Kafka has been used extensively in enterprises for real-time data collecting, delivering, and processing. In this presentation, Jun Rao, Co-founder, Confluent, gives a deep dive on some of the key internals that help make Kafka popular.
- Companies like LinkedIn are now sending more than 1 trillion messages per day to Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
- Many companies (e.g., financial institutions) are now storing mission critical data in Kafka. Learn how Kafka supports high availability and durability through its built-in replication mechanism.
- One common use case of Kafka is for propagating updatable database records. Learn how a unique feature called compaction in Apache Kafka is designed to solve this kind of problem more naturally.
For a long time in order to achieve mutual TLS between Kafka brokers and its clients we had to use long-lived certificates which is a nightmare to manage at large scale. At TransferWise, we have around 300 microservices and most of them use Kafka for the async communication, stream processing, event sourcing, etc. We wanted to implement Kafka security in a way that reduced the maintenance burden on platform teams, while making migration of diverse clients as simple as possible. In this talk we will describe how we have achieved that goal using SPIFFE with SPIRE and Envoy, requiring zero code changes on the client side.
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DBYugabyteDB
Slides for Amey Banarse's, Principal Data Architect at Yugabyte, "Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB" webinar recorded on Oct 30, 2019 at 11 AM Pacific.
Playback here: https://vimeo.com/369929255
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Flink Forward
Flink Forward San Francisco 2022.
Being in the payments space, Stripe requires strict correctness and freshness guarantees. We rely on Flink as the natural solution for delivering on this in support of our Change Data Capture (CDC) infrastructure. We heavily rely on CDC as a tool for capturing data change streams from our databases without critically impacting database reliability, scalability, and maintainability. Data derived from these streams is used broadly across the business and powers many of our critical financial reporting systems totalling over $640 Billion in payment volume annually. We use many components of Flink’s flexible DataStream API to perform aggregations and abstract away the complexities of stream processing from our downstreams. In this talk, we’ll walk through our experience from the very beginning to what we have in production today. We’ll share stories around the technical details and trade-offs we encountered along the way.
by
Jeff Chao
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco SlotCitus Data
Citus is a sharding extension for postgres that can efficiently distribute a wide range of SQL queries. It uses postgres' planner hook to transparently intercept and plan queries on "distributed" tables. Citus then executes the queries in parallel across many servers, in a way that delegates most of the heavy lifting back to postgres.
Within Citus, we distinguish between several types of SQL queries, which each have their own planning logic:
Local-only queries
Single-node “router” queries
Multi-node “real-time” queries
Multi-stage queries
Each type of query corresponds to a different use case, and Citus implements several planners and executors using different techniques to accommodate the performance requirements and trade-offs for each use case.
This talk will discuss the internals of the different types of planners and executors for distributing SQL on top of postgres, and how they can be applied to different use cases.
DataStax: Backup and Restore in Cassandra and OpsCenterDataStax Academy
Cassandra and OpsCenter has a range of backup and restore topics. I will start with a basic overview of Cassandra backup/restore, walking through the operational steps to provide the understanding required to perform an on disk backup and restore. Expanding on this overview, I'll cover the limitations (including schema requirements) and their impact on the restore process. Further, I'll discuss commit log archiving and point in time restore operations. After covering the underlying operations, I'll wrap up with a discussion of how OpsCenter automates this process and leverages S3.
In the last few years, Apache Kafka has been used extensively in enterprises for real-time data collecting, delivering, and processing. In this presentation, Jun Rao, Co-founder, Confluent, gives a deep dive on some of the key internals that help make Kafka popular.
- Companies like LinkedIn are now sending more than 1 trillion messages per day to Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
- Many companies (e.g., financial institutions) are now storing mission critical data in Kafka. Learn how Kafka supports high availability and durability through its built-in replication mechanism.
- One common use case of Kafka is for propagating updatable database records. Learn how a unique feature called compaction in Apache Kafka is designed to solve this kind of problem more naturally.
For a long time in order to achieve mutual TLS between Kafka brokers and its clients we had to use long-lived certificates which is a nightmare to manage at large scale. At TransferWise, we have around 300 microservices and most of them use Kafka for the async communication, stream processing, event sourcing, etc. We wanted to implement Kafka security in a way that reduced the maintenance burden on platform teams, while making migration of diverse clients as simple as possible. In this talk we will describe how we have achieved that goal using SPIFFE with SPIRE and Envoy, requiring zero code changes on the client side.
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DBYugabyteDB
Slides for Amey Banarse's, Principal Data Architect at Yugabyte, "Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB" webinar recorded on Oct 30, 2019 at 11 AM Pacific.
Playback here: https://vimeo.com/369929255
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Flink Forward
Flink Forward San Francisco 2022.
Being in the payments space, Stripe requires strict correctness and freshness guarantees. We rely on Flink as the natural solution for delivering on this in support of our Change Data Capture (CDC) infrastructure. We heavily rely on CDC as a tool for capturing data change streams from our databases without critically impacting database reliability, scalability, and maintainability. Data derived from these streams is used broadly across the business and powers many of our critical financial reporting systems totalling over $640 Billion in payment volume annually. We use many components of Flink’s flexible DataStream API to perform aggregations and abstract away the complexities of stream processing from our downstreams. In this talk, we’ll walk through our experience from the very beginning to what we have in production today. We’ll share stories around the technical details and trade-offs we encountered along the way.
by
Jeff Chao
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco SlotCitus Data
Citus is a sharding extension for postgres that can efficiently distribute a wide range of SQL queries. It uses postgres' planner hook to transparently intercept and plan queries on "distributed" tables. Citus then executes the queries in parallel across many servers, in a way that delegates most of the heavy lifting back to postgres.
Within Citus, we distinguish between several types of SQL queries, which each have their own planning logic:
Local-only queries
Single-node “router” queries
Multi-node “real-time” queries
Multi-stage queries
Each type of query corresponds to a different use case, and Citus implements several planners and executors using different techniques to accommodate the performance requirements and trade-offs for each use case.
This talk will discuss the internals of the different types of planners and executors for distributing SQL on top of postgres, and how they can be applied to different use cases.
Increasingly, organizations are relying on Kafka for mission critical use-cases where high availability and fast recovery times are essential. In particular, enterprise operators need the ability to quickly migrate applications between clusters in order to maintain business continuity during outages. In many cases, out-of-order or missing records are entirely unacceptable. MirrorMaker is a popular tool for replicating topics between clusters, but it has proven inadequate for these enterprise multi-cluster environments. Here we present MirrorMaker 2.0, an upcoming all-new replication engine designed specifically to provide disaster recovery and high availability for Kafka. We describe various replication topologies and recovery strategies using MirrorMaker 2.0 and associated tooling.
10 Things Every Developer Using RabbitMQ Should KnowVMware Tanzu
RabbitMQ is the most popular open-source message broker. It’s a de facto standard for message-based architectures. And yet, despite the abundant documentation and usage, developers and operators can still get tripped up on configuration and usage patterns.
Let’s face it: some of these best practices are hard to capture in docs. There’s a subtle difference between what RabbitMQ *can* do, and *how* you should use it in different scenarios. Now is your chance to hear from seasoned RabbitMQ whisperers, Jerry Kuch and Wayne Lund.
Join Pivotal’s Jerry, Senior Principal Software Engineer, and Wayne, Advisory Data Engineer, as they share their top ten RabbitMQ best practices. You’ll learn:
- How and when—and when *not*—to cluster RabbitMQ
- How to optimize resource consumption for better performance
- When and how to persist messages
- How to do performance testing
- And much more!
Speakers:
Jerry Kuch, Senior Principal Software Engineer
Wayne Lund, Advisory Data Engineer, Pivotal
Uber has one of the largest Kafka deployment in the industry. To improve the scalability and availability, we developed and deployed a novel federated Kafka cluster setup which hides the cluster details from producers/consumers. Users do not need to know which cluster a topic resides and the clients view a "logical cluster". The federation layer will map the clients to the actual physical clusters, and keep the location of the physical cluster transparent from the user. Cluster federation brings us several benefits to support our business growth and ease our daily operation. In particular, Client control. Inside Uber there are a large of applications and clients on Kafka, and it's challenging to migrate a topic with live consumers between clusters. Coordinations with the users are usually needed to shift their traffic to the migrated cluster. Cluster federation enables much control of the clients from the server side by enabling consumer traffic redirection to another physical cluster without restarting the application. Scalability: With federation, the Kafka service can horizontally scale by adding more clusters when a cluster is full. The topics can freely migrate to a new cluster without notifying the users or restarting the clients. Moreover, no matter how many physical clusters we manage per topic type, from the user perspective, they view only one logical cluster. Availability: With a topic replicated to at least two clusters we can tolerate a single cluster failure by redirecting the clients to the secondary cluster without performing a region-failover. This also provides much freedom and alleviates the risks for us to carry out important maintenance on a critical cluster. Before the maintenance, we mark the cluster as a secondary and migrate off the live traffic and consumers. We will present the details of the architecture and several interesting technical challenges we overcame.
Presentation at Strata Data Conference 2018, New York
The controller is the brain of Apache Kafka. A big part of what the controller does is to maintain the consistency of the replicas and determine which replica can be used to serve the clients, especially during individual broker failure.
Jun Rao outlines the main data flow in the controller—in particular, when a broker fails, how the controller automatically promotes another replica as the leader to serve the clients, and when a broker is started, how the controller resumes the replication pipeline in the restarted broker.
Jun then describes recent improvements to the controller that allow it to handle certain edge cases correctly and increase its performance, which allows for more partitions in a Kafka cluster.
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...Databricks
Near real-time analytics has become a common requirement for many data teams as the technology has caught up to the demand. One of the hardest aspects of enabling near-realtime analytics is making sure the source data is ingested and deduplicated often enough to be useful to analysts while writing the data in a format that is usable by your analytics query engine. This is usually the domain of many tools since there are three different aspects of the problem: streaming ingestion of data, deduplication using an ETL process, and interactive analytics. With Spark, this can be done with one tool. This talk with walk you through how to use Spark Streaming to ingest change-log data, use Spark batch jobs to perform major and minor compaction, and query the results with Spark.SQL. At the end of this talk you will know what is required to setup near-realtime analytics at your organization, the common gotchas including file formats and distributed file systems, and how to handle data the unique data integrity issues that arise from near-realtime analytics.
Apache Kafka becoming the message bus to transfer huge volumes of data from various sources into Hadoop.
It's also enabling many real-time system frameworks and use cases.
Managing and building clients around Apache Kafka can be challenging. In this talk, we will go through the best practices in deploying Apache Kafka
in production. How to Secure a Kafka Cluster, How to pick topic-partitions and upgrading to newer versions. Migrating to new Kafka Producer and Consumer API.
Also talk about the best practices involved in running a producer/consumer.
In Kafka 0.9 release, we’ve added SSL wire encryption, SASL/Kerberos for user authentication, and pluggable authorization. Now Kafka allows authentication of users, access control on who can read and write to a Kafka topic. Apache Ranger also uses pluggable authorization mechanism to centralize security for Kafka and other Hadoop ecosystem projects.
We will showcase open sourced Kafka REST API and an Admin UI that will help users in creating topics, re-assign partitions, Issuing
Kafka ACLs and monitoring Consumer offsets.
Scylla Summit 2022: Scylla 5.0 New Features, Part 1ScyllaDB
Discover the new features and capabilities of Scylla Open Source 5.0 directly from the engineers who developed it. This second block of lightning talks will cover the following topics:
- New IO Scheduler and Disk Parallelism
- Per-Service-Level Timeouts
- Better Workload Estimation for Backpressure and Out-of-Memory Conditions
- Large Partition Handling Improvements
- Optimizing Reverse Queries
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Big Data means big hardware, and the less of it we can use to do the job properly, the better the bottom line. Apache Kafka makes up the core of our data pipelines at many organizations, including LinkedIn, and we are on a perpetual quest to squeeze as much as we can out of our systems, from Zookeeper, to the brokers, to the various client applications. This means we need to know how well the system is running, and only then can we start turning the knobs to optimize it. In this talk, we will explore how best to monitor Kafka and its clients to assure they are working well. Then we will dive into how to get the best performance from Kafka, including how to pick hardware and the effect of a variety of configurations in both the broker and clients. We’ll also talk about setting up Kafka for no data loss.
Common Patterns of Multi Data-Center Architectures with Apache Kafkaconfluent
Whether you know you want to run Apache Kafka in multiple data centers and need practical advice or you are wondering why some organizations even need more than one cluster, this online talk is for you.
In this short session, we’ll discuss the basic patterns of multi-datacenter Kafka architectures, explore some of the use-cases enabled by each architecture and show how Confluent Enterprise products make these patterns easy to implement.
Visit www.confluent.io for more information.
Introducing Apache Kafka - a visual overview. Presented at the Canberra Big Data Meetup 7 February 2019. We build a Kafka "postal service" to explain the main Kafka concepts, and explain how consumers receive different messages depending on whether there's a key or not.
On Friday 5 June 2015 I gave a talk called Cluster Management with Kubernetes to a general audience at the University of Edinburgh. The talk includes an example of a music store system with a Kibana front end UI and an Elasticsearch based back end which helps to make concrete concepts like pods, replication controllers and services.
Processing IoT Data from End to End with MQTT and Apache Kafka confluent
(Kai Waehner, Confluent) Kafka Summit SF 2018
This session discusses end-to-end use cases such as connected cars, smart home or healthcare sensors, where you integrate Internet of Things (IoT) devices with enterprise IT using open source technologies and standards. MQTT is a lightweight messaging protocol for IoT. However, MQTT is not built for high scalability, longer storage or easy integration to legacy systems. Apache Kafka is a highly scalable distributed streaming platform, which ingests, stores, processes and forwards high volumes of data from thousands of IoT devices.
This session discusses the Apache Kafka open source ecosystem as a streaming platform to process IoT data. See a live demo of how MQTT brokers like Mosquitto or RabbitMQ integrate with Kafka, and how you can even integrate MQTT clients to Kafka without MQTT Broker. Learn how to analyze the IoT data either natively on Kafka with Kafka Streams/KSQL or on an external big data cluster like Spark, Flink or Elasticsearch leveraging Kafka Connect.
Increasingly, organizations are relying on Kafka for mission critical use-cases where high availability and fast recovery times are essential. In particular, enterprise operators need the ability to quickly migrate applications between clusters in order to maintain business continuity during outages. In many cases, out-of-order or missing records are entirely unacceptable. MirrorMaker is a popular tool for replicating topics between clusters, but it has proven inadequate for these enterprise multi-cluster environments. Here we present MirrorMaker 2.0, an upcoming all-new replication engine designed specifically to provide disaster recovery and high availability for Kafka. We describe various replication topologies and recovery strategies using MirrorMaker 2.0 and associated tooling.
10 Things Every Developer Using RabbitMQ Should KnowVMware Tanzu
RabbitMQ is the most popular open-source message broker. It’s a de facto standard for message-based architectures. And yet, despite the abundant documentation and usage, developers and operators can still get tripped up on configuration and usage patterns.
Let’s face it: some of these best practices are hard to capture in docs. There’s a subtle difference between what RabbitMQ *can* do, and *how* you should use it in different scenarios. Now is your chance to hear from seasoned RabbitMQ whisperers, Jerry Kuch and Wayne Lund.
Join Pivotal’s Jerry, Senior Principal Software Engineer, and Wayne, Advisory Data Engineer, as they share their top ten RabbitMQ best practices. You’ll learn:
- How and when—and when *not*—to cluster RabbitMQ
- How to optimize resource consumption for better performance
- When and how to persist messages
- How to do performance testing
- And much more!
Speakers:
Jerry Kuch, Senior Principal Software Engineer
Wayne Lund, Advisory Data Engineer, Pivotal
Uber has one of the largest Kafka deployment in the industry. To improve the scalability and availability, we developed and deployed a novel federated Kafka cluster setup which hides the cluster details from producers/consumers. Users do not need to know which cluster a topic resides and the clients view a "logical cluster". The federation layer will map the clients to the actual physical clusters, and keep the location of the physical cluster transparent from the user. Cluster federation brings us several benefits to support our business growth and ease our daily operation. In particular, Client control. Inside Uber there are a large of applications and clients on Kafka, and it's challenging to migrate a topic with live consumers between clusters. Coordinations with the users are usually needed to shift their traffic to the migrated cluster. Cluster federation enables much control of the clients from the server side by enabling consumer traffic redirection to another physical cluster without restarting the application. Scalability: With federation, the Kafka service can horizontally scale by adding more clusters when a cluster is full. The topics can freely migrate to a new cluster without notifying the users or restarting the clients. Moreover, no matter how many physical clusters we manage per topic type, from the user perspective, they view only one logical cluster. Availability: With a topic replicated to at least two clusters we can tolerate a single cluster failure by redirecting the clients to the secondary cluster without performing a region-failover. This also provides much freedom and alleviates the risks for us to carry out important maintenance on a critical cluster. Before the maintenance, we mark the cluster as a secondary and migrate off the live traffic and consumers. We will present the details of the architecture and several interesting technical challenges we overcame.
Presentation at Strata Data Conference 2018, New York
The controller is the brain of Apache Kafka. A big part of what the controller does is to maintain the consistency of the replicas and determine which replica can be used to serve the clients, especially during individual broker failure.
Jun Rao outlines the main data flow in the controller—in particular, when a broker fails, how the controller automatically promotes another replica as the leader to serve the clients, and when a broker is started, how the controller resumes the replication pipeline in the restarted broker.
Jun then describes recent improvements to the controller that allow it to handle certain edge cases correctly and increase its performance, which allows for more partitions in a Kafka cluster.
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...Databricks
Near real-time analytics has become a common requirement for many data teams as the technology has caught up to the demand. One of the hardest aspects of enabling near-realtime analytics is making sure the source data is ingested and deduplicated often enough to be useful to analysts while writing the data in a format that is usable by your analytics query engine. This is usually the domain of many tools since there are three different aspects of the problem: streaming ingestion of data, deduplication using an ETL process, and interactive analytics. With Spark, this can be done with one tool. This talk with walk you through how to use Spark Streaming to ingest change-log data, use Spark batch jobs to perform major and minor compaction, and query the results with Spark.SQL. At the end of this talk you will know what is required to setup near-realtime analytics at your organization, the common gotchas including file formats and distributed file systems, and how to handle data the unique data integrity issues that arise from near-realtime analytics.
Apache Kafka becoming the message bus to transfer huge volumes of data from various sources into Hadoop.
It's also enabling many real-time system frameworks and use cases.
Managing and building clients around Apache Kafka can be challenging. In this talk, we will go through the best practices in deploying Apache Kafka
in production. How to Secure a Kafka Cluster, How to pick topic-partitions and upgrading to newer versions. Migrating to new Kafka Producer and Consumer API.
Also talk about the best practices involved in running a producer/consumer.
In Kafka 0.9 release, we’ve added SSL wire encryption, SASL/Kerberos for user authentication, and pluggable authorization. Now Kafka allows authentication of users, access control on who can read and write to a Kafka topic. Apache Ranger also uses pluggable authorization mechanism to centralize security for Kafka and other Hadoop ecosystem projects.
We will showcase open sourced Kafka REST API and an Admin UI that will help users in creating topics, re-assign partitions, Issuing
Kafka ACLs and monitoring Consumer offsets.
Scylla Summit 2022: Scylla 5.0 New Features, Part 1ScyllaDB
Discover the new features and capabilities of Scylla Open Source 5.0 directly from the engineers who developed it. This second block of lightning talks will cover the following topics:
- New IO Scheduler and Disk Parallelism
- Per-Service-Level Timeouts
- Better Workload Estimation for Backpressure and Out-of-Memory Conditions
- Large Partition Handling Improvements
- Optimizing Reverse Queries
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Big Data means big hardware, and the less of it we can use to do the job properly, the better the bottom line. Apache Kafka makes up the core of our data pipelines at many organizations, including LinkedIn, and we are on a perpetual quest to squeeze as much as we can out of our systems, from Zookeeper, to the brokers, to the various client applications. This means we need to know how well the system is running, and only then can we start turning the knobs to optimize it. In this talk, we will explore how best to monitor Kafka and its clients to assure they are working well. Then we will dive into how to get the best performance from Kafka, including how to pick hardware and the effect of a variety of configurations in both the broker and clients. We’ll also talk about setting up Kafka for no data loss.
Common Patterns of Multi Data-Center Architectures with Apache Kafkaconfluent
Whether you know you want to run Apache Kafka in multiple data centers and need practical advice or you are wondering why some organizations even need more than one cluster, this online talk is for you.
In this short session, we’ll discuss the basic patterns of multi-datacenter Kafka architectures, explore some of the use-cases enabled by each architecture and show how Confluent Enterprise products make these patterns easy to implement.
Visit www.confluent.io for more information.
Introducing Apache Kafka - a visual overview. Presented at the Canberra Big Data Meetup 7 February 2019. We build a Kafka "postal service" to explain the main Kafka concepts, and explain how consumers receive different messages depending on whether there's a key or not.
On Friday 5 June 2015 I gave a talk called Cluster Management with Kubernetes to a general audience at the University of Edinburgh. The talk includes an example of a music store system with a Kibana front end UI and an Elasticsearch based back end which helps to make concrete concepts like pods, replication controllers and services.
Processing IoT Data from End to End with MQTT and Apache Kafka confluent
(Kai Waehner, Confluent) Kafka Summit SF 2018
This session discusses end-to-end use cases such as connected cars, smart home or healthcare sensors, where you integrate Internet of Things (IoT) devices with enterprise IT using open source technologies and standards. MQTT is a lightweight messaging protocol for IoT. However, MQTT is not built for high scalability, longer storage or easy integration to legacy systems. Apache Kafka is a highly scalable distributed streaming platform, which ingests, stores, processes and forwards high volumes of data from thousands of IoT devices.
This session discusses the Apache Kafka open source ecosystem as a streaming platform to process IoT data. See a live demo of how MQTT brokers like Mosquitto or RabbitMQ integrate with Kafka, and how you can even integrate MQTT clients to Kafka without MQTT Broker. Learn how to analyze the IoT data either natively on Kafka with Kafka Streams/KSQL or on an external big data cluster like Spark, Flink or Elasticsearch leveraging Kafka Connect.
Triangle Devops Meetup covering Netflix open source, cloud architecture, and what Andrew did in his first year working as a senior software engineer in the cloud platform group.
Zero to 1000+ Applications - Large Scale CD Adoption at Cisco with Spinnaker ...DevOps.com
As part of its Cloud-native transformation, Cisco needed to modernize its software delivery process. Scalability, multi-cloud deployment to its OpenShift environment and public clouds, and the ability to support Cisco’s extensive policy, compliance, and security requirements made open source Spinnaker a logical choice for a modern continuous delivery platform.
As one of the world’s top technology providers with one of the largest and most diverse software development organizations, Cisco had to overcome some unique challenges to be able to onboard 10,000+ developers, 1000+ monolithic and non-cloud native applications, and achieve the high availability and reliability needed to support mission-critical production applications.
Join us for this new webinar as Balaji Siva, VP of Products at OpsMx engages Anil Anaberumutt, IT architect at Cisco, and Red Hat Sr. Solutions Architect, Vikas Grover, in a discussion about Cisco’s CD challenges and the lessons learned, best practices implemented, and key results achieved on their CD transformation journey from zero to over 1000 applications.
For the Computer Measurement Group workshop in San Diego November 2013. Also presented to a student class at UC Santa Barbara. What is Cloud Native. Capacity and Performance benchmarks. Cost Optimization Techniques - content co-developed with Jinesh Varia of AWS.
Today, it is critical that IT teams are able to easily, consistently deploy to production. Running Docker containers on Amazon Web Services makes it possible to engineer a compliant and DevOps-friendly environment from the ground up. Spring Venture Group successfully migrated to AWS with Docker containers and leveraged Logicworks to migrate to AWS and automate infrastructure build-out and deployment. Join our webinar to learn how Spring Venture Group, an innovative insurance brokerage, reduced risk and improved deployment velocity with Logicworks, AWS, and Docker.
Simplify and Scale Enterprise Spring Apps in the Cloud | March 23, 2023VMware Tanzu
Event Slides: Simplify and Scale Enterprise Spring Apps in the Cloud
Date: March 23, 2023
Speakers:
Adib Saikali, Principal Solutions Engineer, VMware Tanzu
Asir Selvasingh, Principal Architect, Java on Azure, Microsoft
Modernizing Testing as Apps Re-ArchitectDevOps.com
Applications are moving to cloud and containers to boost reliability and speed delivery to production. However, if we use the same old approaches to testing, we'll fail to achieve the benefits of cloud. But what do we really need to change? We know we need to automate tests, but how do we keep our automation assets from becoming obsolete? Automatically provisioning test environments seems close, but some parts of our applications are hard to move to cloud.
[Capitole du Libre] #serverless - mettez-le en oeuvre dans votre entreprise...Ludovic Piot
Tout comme le Cloud IaaS avant lui, le serverless promet de faciliter le succès de vos projets en accélérant le Time to Market et en fluidifiant les relations entre Devs et Ops.
Mais sa mise en œuvre au sein d’une entreprise reste complexe et coûteuse.
Après 2 ans à mettre en place des plateformes managées de ce type, nous partagons nos expériences de ce qu’il faut faire pour mettre en œuvre du serverless en entreprise, en évitant les douleurs et en limitant les contraintes au maximum.
Tout d’abord l’architecture technique, avec 2 implémentations très différentes : Kubernetes et Helm d’un côté, Clever Cloud on-premise de l’autre.
Ensuite, la mise en place et l’utilisation d’OpenFaaS. Comment tester et versionner du Function as a Service. Mais aussi les problématiques de blue/green deployment, de rolling update, d’A/B testing. Comment diagnostiquer rapidement les dépendances et les communications entre services.
Enfin, en abordant les sujets chers à la production : * vulnerability management et patch management, * hétérogénéïté du parc, * monitoring et alerting, * gestion des stacks obsolètes, etc.
AWS re:Invent 2016: Develop Your Migration Toolkit (ENT312)Amazon Web Services
Learn about some of the most useful and popular tools that you can leverage at various stages of a migration project. These tools will allow your teams to focus on coordinating the migration and automating as many migration activities as possible.
Enterprise DevOps is different then DevOps in startups and smaller companies. This session how AWS/CSC address this. How AWS IaaS level automation via CloudFormation, UserData, Console, APIS and some PaaS OpsWorks/Beanstalk is complimented by CSC Agility Platform. CSC Agility adds application compliance and security to the AWS infrastructure compliance and security. CSC Agility allows for the creation of architecture blueprints for predefined application offerings.
Business and IT agility through DevOps and microservice architecture powered ...Lucas Jellema
IT needs to run in production in order to generate business value. DevOps is among other things a way of thinking focusing on production software. A business application requires a tailor made platform to generate business value. The combination of application and its platform is a DevOps product. The DevOps team has full responsibility for that product through its entire lifecycle.
The microservices architecture promises flexibility, scalability, and optimal use of compute resources. Via independent components with well-defined scope and responsibility, interface, and ownership that are evolved and managed in an automated DevOps process, this architecture leverages current technologies and hard-learned insights from past decades.
This session defines the objectives of Business with IT, of microservices and DevOps and introduces Containers and the container platform Kubernetes as crucial ingredients for making DevOps happen.
Adrian Cockcroft on his top predictions for the cloud computing industry in 2015 and beyond, as well as how cloud-native applications, continuous-delivery and DevOps techniques, will speed the pace of innovation and disruption.
For more about Adrian be sure to check out his page on Battery Ventures:
https://www.battery.com/our-team/member/adrian-cockcroft/
Follow Adrian on Twitter: @adrianco
Building Cloud-Native App Series - Part 5 of 11
Microservices Architecture Series
Microservices Architecture,
Monolith Migration Patterns
- Strangler Fig
- Change Data Capture
- Split Table
Infrastructure Design Patterns
- API Gateway
- Service Discovery
- Load Balancer
Data Streaming with Apache Kafka & MongoDBconfluent
Explore the use-cases and architecture for Apache Kafka, and how it integrates with MongoDB to build sophisticated data-driven applications that exploit new sources of data.
Similar to Netflix Cloud Architecture and Open Source (20)
Herding Kats - Netflix’s Journey to Kubernetes Publicaspyker
An update from Netflix Compute's container management platform, Titus, covering the work to move from Mesos to Kubernetes. Lessons learned, next steps, and challenges.
Season 7 Episode 1 - Tools for Data Scientistsaspyker
Metaflow (Ville Tuulos)
Data scientists at Netflix are expected to develop and operate large machine learning workflows autonomously. However, we do not expect that all our scientists are deeply experienced with distributed systems and data engineering. Metaflow was created to make it delightfully easy to build and operate ML workflows in the cloud using idiomatic Python and off-the-shelf ML libraries, covering the whole lifecycle of an ML project from prototype to production.
Polynote (Jeremy Smith)
Polynote is a new notebook tool we created from scratch to address some of the pain points we've run into while using Scala in machine-learning notebooks at Netflix. It provides essential code editing features other tools lack like interactive auto-completes, support for mixing multiple languages and sharing data between them within a single notebook, and encourages reproducible notebooks with its immutable data model.
Papermill (Matthew Seal)
Nteract is an open source organization under which there are several libraries and applications that Netflix and many other companies and individuals contribute to. One of these libraries is Papermill, a library used to programmatically parameterize and execute Jupyter Notebooks. Papermill provides a CLI and Python interface that we'll explore during the session to see how it can be used and what value it adds. Using this pattern we'll also briefly talk about how we've integrated papermill at Netflix and how it interfaces with other Jupyter and nteract services.
CMP376 - Another Week, Another Million Containers on Amazon EC2aspyker
Netflix’s container management platform, Titus, powers critical aspects of the Netflix business, including video streaming, recommendations, machine learning, big data, content encoding, studio technology, internal engineering tools, and other Netflix workloads. Titus offers a convenient model for managing compute resources, enables developers to maintain just their application artifacts, and provides a consistent developer experience from a developer’s laptop to production by leveraging Netflix container-focused engineering tools.
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemonsaspyker
Disenchantment is a Netflix show following the medieval misadventures of a hard-drinking princess, her feisty elf, and her personal demon. In this talk, we will follow the story of Netflix’s container management platform, Titus, which powers critical aspects of the Netflix business (video encoding & streaming, big data, recommendations & machine learning, and other workloads). We’ll cover the challenges growing Titus from 10’s to 1000’s of workloads. We’ll talk about our feisty team’s work across container runtimes, scheduling & control plane, and cloud infrastructure integration. We’ll talk about the demons we’ve found on this journey covering operability, security, reliability and performance.
In this episode, we will focus on continuous delivery and how Netflix uses Spinnaker and Kayenta to safely deliver changes to the cloud and beyond. Kayenta is a platform for Automated Canary Analysis (ACA). It is used by Spinnaker to enable automated canary deployments. We will also discuss how Spinnaker is used at Netflix to deploy targets beyond cloud VMs and containers --- batch jobs, CDNs, fast properties and Open Connect appliances.
NetflixOSS Meetup S6E1 - Titus & Containersaspyker
Come hear about our container management platform, Titus. Titus launches over 2 millions containers per week for service and batch workloads. Come to learn what applications are powered by Titus and what values the developers are getting from containers. Also, we will cover some of the Titus unique aspects of reliability, control plane, scheduling, and container runtime technologies. We will also cover our integrations with Netflix systems such as Spinnaker as well as Amazon concepts such as VPC and IAM.
https://www.meetup.com/Netflix-Open-Source-Platform/events/247776324/
Running Containers at Scale at Netflix. An update on the usage of containers at Netflix. Technical discussions on new features and concepts we've added across container scheduling and execution.
Topics:
• RepoKid
Netflix’s Open-source Strategy to Rightsizing Cloud Permissions at Scale
• BetterTLS
A test suite for HTTPS clients implementing verification of the Name Constraints certificate extension
• Authorization at Netflix
Netflix’s architecture for implementing Authorization at scale
• Open Policy Agent
An open source, general-purpose policy engine that enables unified, context-aware policy enforcement across the entire stack. (www.openpolicyagent.org)
• Introducing PADME (Policy Access Decision Management Engine)
A modern policy management for distributed heterogenous systems. (www.padme.io)
Demo Stations:
• Stethoscope
Personalized, user-focused recommendations for employee information security.
• HubCommander
Slack bot for GitHub organization management -- and other things too!
• Open Policy Agent
An open source, general-purpose policy engine that enables unified, context-aware policy enforcement across the entire stack.
Series of Unfortunate Netflix Container Events - QConNYC17aspyker
Project Titus is Netflix's container runtime on top of Amazon EC2. Titus powers algorithm research through massively parallel model training, media encoding, data research notebooks, ad hoc reporting, NodeJS UI services, stream processing and general micro-services. As an update from last year's talk, we will focus on the lessons learned operating one of the largest container runtimes on a public cloud. We'll cover the migration we've seen of applications and frameworks from VM's to containers. We will cover the operational issues with containers that only showed after we reached the large scale (1000's of container hosts, 100's of thousands of containers launched weekly) we are currently supporting. We'll touch base on the unique features we have added to help both batch and microservices run across a variety of runtimes (Java, R, NodeJS, Python, etc) and how higher level frameworks have taken advantage of Titus's scheduling capabilities.
In this episode, we will focus on open sourcing how we run Netflix's open source program. Netflix has been using and contributing to open source for several years. Over the years, Netflix has released over one hundred Netflix Open Source (aka NetflixOSS) libraries, servers, and technologies. Netflix engineers benefit by accepting contributions and gathering feedback with key collaborators around the world. Users of NetflixOSS from many industries benefit from our solutions including Big Data, Build and Delivery Tools, Runtime Services and Libraries, Data Persistence, Insight, Reliability and Performance, Security and User Interface. With such a large and mature open source program, Netflix has worked on approaches and tools that help manage and improve the NetflixOSS source offerings and communities. Netflix has taken a different approach to building support for open source as compared to other Internet scale companies. Come to this session to learn about the unique approaches Netflix has taken to both distribute and automate the responsibilities of building a world-class open source program.
Re:invent 2016 Container Scheduling, Execution and AWS Integrationaspyker
Members from over all over the world streamed over forty-two billion hours of Netflix content last year. Various Netflix batch jobs and an increasing number of service applications use containers for their processing. In this session, Netflix presents a deep dive on the motivations and the technology powering container deployment on top of Amazon Web Services. The session covers our approach to resource management and scheduling with the open source Fenzo library, along with details of how we integrate Docker and Netflix container scheduling running on AWS. We cover the approach we have taken to deliver AWS platform features to containers such as IAM roles, VPCs, security groups, metadata proxies, and user data. We want to take advantage of native AWS container resource management using Amazon ECS to reduce operational responsibilities. We are delivering these integrations in collaboration with the Amazon ECS engineering team. The session also shares some of the results so far, and lessons learned throughout our implementation and operations.
Netflix and Containers: Not A Stranger Thingaspyker
Customers from over all over the world streamed Forty Two Billion hours of Netflix content last year. The Netflix streaming service had been powered by the Amazon cloud with virtual machines for over five years, blazing a trail for similar architectures. In the last year, it invested in containers for batch-style jobs and service-style applications. Andrew Spyker will explain the potential containers have to help Netflix create a more productive development experience while simultaneously deepening its control over resource management. Join Andrew to see why Netflix is moving forward with containers, how it can leverage its existing operational machinery, and how it’s running containers with a similar guarantee of high availability as current Netflix infrastructure provides.
Netflix Open Source: Building a Distributed and Automated Open Source Programaspyker
Netflix has been using and contributing to open source for several years. Over the years, Netflix has released over one hundred Netflix Open Source (aka NetflixOSS) libraries, servers, and technologies. Netflix engineers benefit by accepting contributions and gathering feedback with key collaborators around the world. Users of NetflixOSS from many industries benefit from our solutions including Big Data, Build and Delivery Tools, Runtime Services and Libraries, Data Persistence, Insight, Reliability and Performance, Security and User Interface. With such a large and mature open source program, Netflix has worked on approaches and tools that help manage and improve the NetflixOSS source offerings and communities. Netflix has taken a different approach to building support for open source as compared to other Internet scale companies. Come to this session to learn about the unique approaches Netflix has taken to both distribute and automate the responsibilities of building a world-class open source program.
Netflix Open Source Meetup Season 4 Episode 3aspyker
In this episode, we will focus on security in the cloud at scale. We’ll have Netflix speakers discussing existing and upcoming security-related OSS releases, and we’ll also have external speakers from organizations that are using and contributing to Netflix security OSS.
First, Patrick Kelley from Netflix’s Security Operations team will speak about RepoMan, an upcoming OSS release designed to right-size AWS permissions. Then, Wes Miaw from Netflix’s Security Engineering team will discuss MSL (Message Security Layer).
We have two external speakers for this event - Chris Dorros from OpenDNS/Cisco will talk about his use of and contributions to Lemur, and Ryan Lane from Lyft will talk about their use of BLESS.
After the talks, we’ll have OSS authors at demo stations to answer questions and provide demos of Netflix security OSS, including Lemur, MSL, and Security Monkey.
Netflix Container Scheduling and Execution - QCon New York 2016aspyker
Scheduling a Fuller House: Container Management At Netflix
Customers from over all over the world streamed Forty Two Billion hours of Netflix content last year. Various Netflix batch jobs and an increasing number of service applications use containers for their processing. In this talk Netflix will present a deep dive on the motivations and the technology powering container deployment on top of the AWS EC2 service. The talk will cover our approach to cloud resource management and scheduling with the open source Fenzo library, along with details on docker execution engine as a part of project Titus. As well, the talk will share some of the results so far, lessons learned, and end with a brief look at the developer experience for containers.
Netflix Open Source Meetup Season 4 Episode 2aspyker
In this episode, we will take a close look at 2 different approaches to high-throughput/low-latency data stores, developed by Netflix.
The first, EVCache, is a battle-tested distributed memcached-backed data store, optimized for the cloud. You will also hear about the road ahead for EVCache it evolves into an L1/L2 cache over RAM and SSDs.
The second, Dynomite, is a framework to make any non-distributed data-store, distributed. Netflix's first implementation of Dynomite is based on Redis.
Come learn about the products' features and hear from Thomson and Reuters, Diego Pacheco from Ilegra and other third party speakers, internal and external to Netflix, on how these products fit in their stack and roadmap.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
The Metaverse and AI: how can decision-makers harness the Metaverse for their...Jen Stirrup
The Metaverse is popularized in science fiction, and now it is becoming closer to being a part of our daily lives through the use of social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™UiPathCommunity
In questo evento online gratuito, organizzato dalla Community Italiana di UiPath, potrai esplorare le nuove funzionalità di Autopilot, il tool che integra l'Intelligenza Artificiale nei processi di sviluppo e utilizzo delle Automazioni.
📕 Vedremo insieme alcuni esempi dell'utilizzo di Autopilot in diversi tool della Suite UiPath:
Autopilot per Studio Web
Autopilot per Studio
Autopilot per Apps
Clipboard AI
GenAI applicata alla Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
2. About Netflix
69M members
2000+ employees (1400 tech)
80+ countries
> 100M hours watch per day
> ⅓ NA internet download traffic
500+ Microservices
Many 10’s of thousands VM’s
3 regions across the world
3. About the Speaker
Cloud platform technologies
Distributed configuration, service discovery, RPC, application
frameworks, non-Java sidecar
Container cloud
Resource management and scheduling, making Docker containers
operational in Amazon EC2/ECS
Open Source
Organize @NetflixOSS meetups & internal group
Performance
Assist across Netflix, but focused mainly on cloud platform perf
With Netflix for ~ 1 year. Previously at IBM here in Raleigh/Durham
(RTP)
@aspyker
ispyker.blog
spot.com
5. Why does Netflix open source?
Allows engineers to gather feedback
Openly talk, through code, on our approach
Collaboration on key projects with the world
Happily use proven outside open source
And improve it for Netflix scale and availability
Netflix culture of freedom and responsibility
Want to open source?
Go for it, be responsible!
Recruiting and Retention
Candidates know exactly what they can work on
NetflixOSS engineers choose to stay at Netflix
6. NetflixOSS is widely used
The architecture has shaped public cloud usage
Immutability, Red/Black Deploys, Chaos,
Regional and worldwide high availability
Offerings
Pivotal Spring Cloud
Large usage
IBM Watson as a Service (on IBM Cloud)
Nike Digital is hiring NetflixOSS experts
Interesting usage
“To help locate new troves of data claiming to be the files stolen from
AshleyMadison, the company’s forensics team has been using a tool
that Netflix released last year called Scumblr”
8. Key aspects of NetflixOSS website
Show how the pieces fit together
Projects now discussed with each other in context
OSS categories mirror internal teams
No artificial categories, focal points for each area
Focus on projects that are core to Netflix
Projects mentioned are core and strategic
11. Elastic, Web and Hyper Scale
Front end
API
Another
Microservice
Temporal
caching
Durable
Storage
Load
Balancers
Strategy Benefit
Make deployments automated Without automation impossible
Expose well designed API to users Offloads presentation complexity to clients
Remove state for mid tier services Allows easy elastic scale out
Push temporal state to client and caching tier Leverage clients, avoids data tier overload
Use partitioned data storage Data design and storage scales with HA
Recommendation
Microservice
13. Micro service
Implementation
Call microservice #2
Highly Available Service Runtime Recipe
Ribbon REST client
with Eureka
Microservice #1
(REST services)
App Service
Microservice #2
Execute
call
Hystrix
Eureka
Server(s)
Eureka
Server(s)
Eureka
Server(s)
Karyon
Fallback
Implementation
Implementation Detail Benefits
Decompose into micro services
• Key user path always available
• Failure does not propagate across service boundaries
Karyon /w automatic Eureka registration
• New instances are quickly found
• Failing individual instances disappear
Ribbon client with Eureka awareness
• Load balances & retries across instances with “smarts”
• Handles temporal instance failure
Hystrix as dependency circuit breaker
• Allows for fast failure
• Provides graceful cross service degradation/recovery
14. IaaS High Availability
Region (us-east-1)
us-east-1e
us-east-1c
Eureka
Web App Service1 Service2
Cluster Auto Recovery and Scaling Services (Auto Scaling Groups)
ELB’s
Rule Why?
Always > 2 of everything 1 is SPOF, 2 doesn’t web scale and slow DR recovery
Including IaaS and cloud services You’re only as strong as your weakest dependency
Use auto scaler/recovery monitoring Clusters guarantee availability and service latency
Use application level health checks Instance on the network != healthy
Worldwide availability Data replication, global front-end routing, cross region traffic
us-east-1d
15. A truly global service
Replicate data across
regions
Be able to redirect traffic
from region to region
Be able to migrate regional
traffic to other regions
Have automated control
across regions
Flux Demo
16. Testing is only way to prove HA
Chaos Monkey
Kill instances in production - runs regularly
Chaos Gorilla
Kills availability zones (single datacenter)
Also testing for split brain important
Chaos Kong
Kill entire region and shift traffic globally
Run frequently but with prior scheduling
18. v
Continuous Delivery
Cluster v1 Canary v2 Cluster V2
Step Technology
Developers test locally Unit test frameworks
Continuous build Continuous build server based on gradle builds
Build “bakes” full instance image Aminator and deployment pipeline bake images from build artifacts
Developer work across dev and test Archaius allows for environment based context
Developers do canary tests, red/black
deployments in prod
Asgard console provides app cluster common devops approach,
security patterns, and visibility
Continuous
Build Server
Baked to images
(AMI’s)
19. From Asgard to Spinnaker
Spinnaker is our CI/CD solution
CI/CD solution including baking and Jenkins integration
Workflow engine for the continuous delivery
Pipeline based deployment including baking
Global visibility across all of our AWS regions
Provides an API first design
A microservices runtime HA architecture
More flexible cloud model so the community can contribute back
improvements not related to AWS
Asgard continues to work side-by-side
Spinnaker is this new end to end CI/CD tool
22. Operational Visibility
Microservice #1 Microservice #2
Visibility Point Technology
Basic IaaS instance monitoring Not enough (not scalable, not app specific)
User like external monitoring SaaS offerings or OSS like Uptime
Targeted performance, sampling Vector performance and app level metrics
Service to service interconnects Hystrix streams ➔Turbine aggregation ➔Hystrix dashboard
Application centric metrics Servo/Spectator gauges, counters, timers sent to metrics store like Atlas
Remote logging Logstash/Kibana or similar log aggregation and analysis frameworks
Threshold monitoring and alerts Services like Atlas and PagerDuty for incident management
Servo/
Spectator
Hystrix/Turbine
External
Uptime
Monitoring Metric/Event
Repositories
LogStash/Elastic
Search/Kibana
Incidents
Atlas
Vector
24. Dynamic, Web Scale & Simpler Security
Security Monkey
Monitors security policies, tracks changes, alerts on situations
Scumblr
Searches internet for security “nuggets” (credentials, hacking discussions)
Sketchy
A safe way to collect text and screenshots from websites
FIDO
Automated event detection, analysis, enrichment & and enforcement
Sleepy Puppy
Delayed cross site scripting propagation testing framework
Lemur
x.509 certificate orchestration framework
25. What did we not cover?
Over 50 github projects
NetflixOSS is “Technical indigestion as a service”
Big Data, Data Persistence and UI Engineering
Big Data tools used well beyond Netflix
Ephemeral, semi and fully persistent data systems
Recent addition of UI OSS and Falcor
27. How do I get started?
All of the previous slides shows NetflixOSS components
Code: http://netflix.github.io
Announcements: http://techblog.netflix.com/
Want to get running a bit faster?
ZeroToCloud
Workshop for getting started with build/bake/deploy in Amazon EC2
ZeroToDocker
Docker images that containing running Netflix technologies (not production
ready, but easy to understand)
28. ZeroToDocker Demo
Mac OS X
Virtual Box
Ubuntu 14.04
single kernel
Container#1
Filesystem+
process
Eureka
Container
ZuulContainer
Another
Container
...
Docker running instances
Single kernel
Contained processes
Zookeeper and Exhibitor
A Microservices app and
surrounding NetflixOSS
services (Zuul to Karyon
with Eureka)