Slides from a presentation by Monal Daxini at Disney, Glendale, CA, about Netflix Open Source Software, Cloud Data Persistence, and Cassandra best practices.
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016 - Monal Daxini
Keystone processes over 700 billion events per day (1 petabyte) with at-least-once processing semantics in the cloud. We will explore in detail how we leverage Kafka, Samza, Docker, and Linux at scale to implement a multi-tenant pipeline in the AWS cloud within a year. We will also share our plans to offer Stream Processing as a Service for use across all of Netflix.
Netflix Keystone Pipeline at Samza Meetup 10-13-2015 - Monal Daxini
The Netflix Keystone Pipeline processes 600 billion events a day. A detailed treatise on how Samza was modified and used for real-time routing of events, including Docker.
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015 - Monal Daxini
Keystone: processing over half a trillion events per day, with peaks of 8 million events and 17 GB per second, and at-least-once processing semantics. We will explore in detail how we employ Kafka, Samza, and Docker at scale to implement a multi-tenant pipeline. We will also look at the evolution to its current state and where the pipeline is headed next: offering a self-service stream processing infrastructure atop the Kafka-based pipeline and supporting Spark Streaming.
The need for gleaning answers from data in real time is moving from a nicety to a necessity. There are few options for analyzing the never-ending stream of unbounded data at scale. Let’s compare and contrast the core principles and technologies of the different open source solutions available to help with this endeavor, and look at where processing engines need to evolve to meet processing needs at scale. These findings are based on the experience of continuing to build a scalable solution in the cloud to process over 700 billion events at Netflix, and on how we are embarking on the next journey to evolve unbounded data processing engines.
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec - Peter Bakas
Talk on Netflix Keystone by Peter Bakas at SF Data Engineering Meetup on 2/23/2016.
Topics covered:
- Architectural design and principles for Keystone
- Technologies that Keystone is leveraging
- Best practices
http://www.meetup.com/SF-Data-Engineering/events/228293610/
The need for gleaning answers from unbounded data streams is moving from a nicety to a necessity. Netflix is a data-driven company and needs to process over 1 trillion events a day, amounting to 3 PB of data, to derive business insights.
To ease extracting insight, we are building a self-serve, scalable, fault-tolerant, multi-tenant "Stream Processing as a Service" platform so the user can focus on data analysis. I'll share our experience using Flink to help build the platform.
A talk given on 2018-06-16 at the HK Open Source Conference 2018.
The rise of Apache Kafka started a new generation of data pipeline: the stream-processing pipeline.
In this talk, Dr. Mole Wong will walk you through the concept of the stream-processing data pipeline, and how this data pipeline can be set up. He will also discuss the use cases of such a data pipeline.
http://www.oreilly.com/pub/e/3764
Keystone processes over 700 billion events per day (1 petabyte) with at-least-once processing semantics in the cloud. Monal Daxini details how they used Kafka, Samza, Docker, and Linux at scale to implement a multi-tenant pipeline in the AWS cloud within a year. He'll also share plans to offer Stream Processing as a Service for use across all of Netflix.
(BDT318) How Netflix Handles Up To 8 Million Events Per Second - Amazon Web Services
In this session, Netflix provides an overview of Keystone, their new data pipeline. The session covers how Netflix migrated from Suro to Keystone, including the reasons behind the transition and the challenges of zero loss while processing over 400 billion events daily. The session covers in detail how they deploy, operate, and scale Kafka, Samza, Docker, and Apache Mesos in AWS to manage 8 million events & 17 GB per second during peak.
Strategies and techniques for optimizing Kafka brokers and producers to minimize data loss under huge traffic volume, limited configuration options, and a less-than-ideal, constantly changing environment, balanced against cost.
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303... - Amazon Web Services
"This is a technical architect's case study of how Loggly has employed the latest social-media-scale technologies as the backbone ingestion processing for our multi-tenant, geo-distributed, and real-time log management system. This presentation describes design details of how we built a second-generation system fully leveraging AWS services including Amazon Route 53 DNS with heartbeat and latency-based routing, multi-region VPCs, Elastic Load Balancing, Amazon Relational Database Service, and a number of pro-active and re-active approaches to scaling computational and indexing capacity.
The talk includes lessons learned in our first generation release, validated by thousands of customers; speed bumps and the mistakes we made along the way; various data models and architectures previously considered; and success at scale: speeds, feeds, and an unmeltable log processing engine."
Netflix recently changed its data pipeline architecture to use Kafka as the gateway for data collection for all applications, processing hundreds of billions of messages daily. This session will discuss the motivation for moving to Kafka, the architecture, and the improvements we have added to make Kafka work in AWS. We will also share the lessons learned and future plans.
The Netflix Way to deal with Big Data Problems - Monal Daxini
Netflix is a data-driven company with a unique culture. Come take a holistic tour of the Big Data ecosystem, and see how Netflix culture catalyzes the development of systems. Then ogle at how we quickly evolved and scaled the event pipeline to 1 trillion events per day and over 1.4 PB of event data without service disruption, with a small team.
Beaming flink to the cloud @ netflix ff 2016-monal-daxini - Monal Daxini
Netflix is a data-driven company and we process over 700 billion streaming events per day with at-least-once processing semantics in the cloud. To make it easy to extract intelligence from this unbounded stream, we are building a Stream Processing as a Service (SPaaS) infrastructure so that the user can focus on extracting value and not have to worry about boilerplate infrastructure and scale.
We will share our experience in building a scalable SPaaS using Flink, Apache Beam and Kafka as the foundation layer to process over 1.3 PB of event data without service disruption.
Apache Kafka, Apache Cassandra and Kubernetes are open source big data technologies enabling applications and business operations to scale massively and rapidly. While Kafka and Cassandra underpin the data layer of the stack, providing the capability to stream, disseminate, store and retrieve data at very low latency, Kubernetes is a container orchestration technology that helps automate application deployment and the scaling of application clusters. In this presentation, we will reveal how we architected a massive-scale deployment of a streaming data pipeline with Kafka and Cassandra to serve an example anomaly detection application running on a Kubernetes cluster and generating and processing massive numbers of events. Anomaly detection is a method used to detect unusual events in an event stream. It is widely used in a range of applications such as financial fraud detection, security, threat detection, website user analytics, sensors, IoT, system health monitoring, etc. When such applications operate at massive scale, generating millions or billions of events, they impose significant computational, performance and scalability challenges on anomaly detection algorithms and data layer technologies. We will demonstrate the scalability, performance and cost effectiveness of Apache Kafka, Cassandra and Kubernetes, with results from our experiments allowing the anomaly detection application to scale to 19 billion anomaly checks per day.
Deploying Kafka at Dropbox, Mark Smith, Sean Fellows - confluent
At Dropbox we are currently handling approximately 10,000,000 messages per second at peak across our handful of Kafka clusters. The largest of these has hit throughputs of 7,000,000 per second (~30 Gbps) on only 20 nodes. We’ll walk you through the steps we took to get where we are, the design that works for us, and those that didn’t. We’ll talk about the tooling we had to build and what we want to see exist.
We’ll dive deeper into configuration and provide a blueprint you can follow. We’ll talk about the trials and tribulations of using Kafka, including ways we’ve set our clusters on fire, ways we’ve lost data, ways we’ve turned our hairs gray, and ways we’ve heroically saved the day for our users. Finally, we’ll spend time on some of the work we’re doing to handle consumer coordination across our many different systems and to integrate Kafka into a well-established corporate infrastructure (i.e., making Kafka “play nice” with everybody).
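As a rough sanity check on the throughput figures quoted above, the implied average bytes per message can be worked out with a few lines of arithmetic. This is only a sketch under stated assumptions: that "~30 Gbps" means decimal gigabits per second, and that it was measured at the same moment as the 7,000,000 messages per second peak. The abstract does not say whether the bandwidth figure includes protocol overhead or replication traffic, so the result is an order-of-magnitude estimate, not a measured message size. The function name is hypothetical.

```python
def average_message_bytes(gigabits_per_second: float, messages_per_second: float) -> float:
    """Estimate the mean bytes per message from aggregate bandwidth and message rate.

    Assumes decimal units (1 Gb = 1e9 bits) and 8 bits per byte.
    """
    bytes_per_second = gigabits_per_second * 1e9 / 8
    return bytes_per_second / messages_per_second


if __name__ == "__main__":
    # Figures quoted in the abstract: ~30 Gbps at 7,000,000 messages per second.
    estimate = average_message_bytes(30, 7_000_000)
    print(f"about {estimate:.0f} bytes per message")
```

Under those assumptions the estimate lands at a little over half a kilobyte per message.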
Jay Kreps is a Principal Staff Engineer at LinkedIn where he is the lead architect for online data infrastructure. He is among the original authors of several open source projects including a distributed key-value store called Project Voldemort, a messaging system called Kafka, and a stream processing system called Samza. This talk gives an introduction to Apache Kafka, a distributed messaging system. It will cover both how Kafka works, as well as how it is used at LinkedIn for log aggregation, messaging, ETL, and real-time stream processing.
Arc305 how netflix leverages multiple regions to increase availability an i... - Ruslan Meshenberg
Learn how to make your services more resilient and available by embracing principles of isolation and redundancy. See details of two projects, Isthmus and Active/Active, to learn how Netflix architects for availability in a multi-regional environment.
Interest in highly scalable stream processing engines has risen recently, and many projects have appeared as a result. Apache Samza is a distributed stream-processing framework that uses Apache Kafka for messaging and Apache Hadoop YARN to provide fault tolerance and resource management. It is one of the most popular stream processing engines and is used by many high-profile companies. Amazon Kinesis, on the other hand, is a fully managed service for real-time processing of streaming data, which allows users to scale the amount of data ingested without worrying about infrastructure details. This presentation gives a brief introduction to the very popular Samza-Kafka integration, then focuses on the new Samza-Kinesis integration and explains the new opportunities it opens up for users.
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019 - confluent
Data stream processing is built on the core concept of time. However, understanding time semantics and reasoning about time is not simple, especially if deterministic processing is expected. In this talk, we explain the difference between processing, ingestion, and event time and what their impact is on data stream processing. Furthermore, we explain how Kafka clusters and stream processing applications must be configured to achieve specific time semantics. Finally, we deep dive into the time semantics of the Kafka Streams DSL and KSQL operators, and explain in detail how the runtime handles time. Apache Kafka offers many ways to handle time on the storage layer, i.e., the brokers, allowing users to build applications with different semantics. Time semantics in the processing layer, i.e., Kafka Streams and KSQL, are even richer and more powerful, but also more complicated. Hence, it is paramount for developers to understand the different time semantics and to know how to configure Kafka to achieve them. This talk therefore enables developers to design applications with their desired time semantics, helps them reason about runtime behavior with regard to time, and allows them to understand processing/query results.
Streaming in Practice - Putting Apache Kafka in Production - confluent
This presentation focuses on how to integrate all these components into an enterprise environment and what things you need to consider as you move into production.
We will touch on the following topics:
- Patterns for integrating with existing data systems and applications
- Metadata management at enterprise scale
- Tradeoffs in performance, cost, availability and fault tolerance
- Choosing which cross-datacenter replication patterns fit with your application
- Considerations for operating Kafka-based data pipelines in production
Overview of the JSON-RPC mechanism.
JSON-RPC is a simple RPC (Remote Procedure Call) mechanism, similar to XML-RPC.
Unlike XML-RPC, which is a client-server protocol, JSON-RPC is a peer-to-peer protocol.
It uses JSON (JavaScript Object Notation, RFC 4627) as the serialization format and plain TCP streams or HTTP as the transport mechanism.
JSON-RPC defines three message types: Request, Response and Notification. There is no direct mapping of JSON-RPC messages to HTTP requests. HTTP and plain TCP are merely transport protocols that carry JSON-RPC messages.
JSON-RPC is a simple protocol and therefore lacks most of the features that big web services like SOAP/WSDL and the WS-* standards provide. JSON-RPC may be suited to web service applications that need bidirectional (peer-to-peer) interaction but do not require the complexity of SOAP.
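To make the three message types concrete, here is a minimal Python sketch that builds a Request, a Notification and a Response and tells them apart after a round trip through JSON. The overview above does not say which protocol version it describes, so the field layout here ("jsonrpc", "method", "params", "id", "result", "error") is an assumption taken from the JSON-RPC 2.0 specification. The helper names are hypothetical and no transport is included; the encoded bytes could be carried over a TCP stream or an HTTP body.

```python
import json


def make_request(method, params, request_id):
    """A Request carries an id, so the peer is expected to answer it."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def make_notification(method, params):
    """A Notification has no id (2.0 layout), so no Response is sent back."""
    return {"jsonrpc": "2.0", "method": method, "params": params}


def make_response(result, request_id):
    """A Response echoes the id of the Request it answers."""
    return {"jsonrpc": "2.0", "result": result, "id": request_id}


def classify(message):
    """Return 'request', 'notification' or 'response' for a decoded message."""
    if "method" in message:
        return "request" if "id" in message else "notification"
    if "result" in message or "error" in message:
        return "response"
    raise ValueError("not a JSON-RPC message")


def encode(message):
    """Serialize a message to UTF-8 JSON bytes, ready for any transport."""
    return json.dumps(message).encode("utf-8")


def decode(raw):
    """Parse UTF-8 JSON bytes back into a Python dict."""
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    for msg in (
        make_request("add", [1, 2], 1),
        make_notification("log", ["started"]),
        make_response(3, 1),
    ):
        wire = encode(msg)
        print(classify(decode(wire)), wire.decode("utf-8"))
```

Because either peer can send a Request or a Notification, both sides of a connection can run the same `classify` step on incoming messages, which is what makes the protocol peer-to-peer and not strictly client-server.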
GumGum relies heavily on Cassandra for storing different kinds of metadata. Currently GumGum reaches 1 billion unique visitors per month using 3 Cassandra datacenters in Amazon Web Services spread across the globe.
This presentation will detail how we scaled out from one local Cassandra datacenter to a multi-datacenter Cassandra cluster and all the problems we encountered and choices we made while implementing it.
How did we architect multi-region Cassandra in AWS? What were our experiences in implementing multi-datacenter Cassandra? How did we achieve low latency with multi-region Cassandra and the Datastax Driver? What are the different Cassandra use cases at GumGum? How did we integrate our Cassandra with Spark?
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ... - Amazon Web Services
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and Accelerated Computing (GPU and FPGA) instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
A novel approach to Artificial Intelligence On-Board
New generations of spacecraft are required to perform tasks with an increased level of autonomy. Space exploration, Earth Observation, space robotics, etc. are all growing fields in Space that require more sensors and more computational power to perform these missions.
Sensors, embedded processors, and hardware in general have evolved hugely in the last decade, equipping embedded systems with large numbers of sensors that produce data at rates that have not been seen before, while simultaneously providing computing power capable of large-scale data processing on-board. Near-future spacecraft will be equipped with large numbers of sensors that produce data at high rates in space, and their data processing power will be significantly increased.
Future missions such as Active Debris Removal will rely on novel high-performance avionics to support image processing and Artificial Intelligence algorithms with large workloads. Similar requirements come from Earth Observation applications, where on-board data processing can be critical to providing reliable real-time information to Earth. This new scenario has brought new challenges with it: low determinism, excessive power needs, data losses and large response latency.
In this project, Klepsydra AI is used as a novel approach to on-board artificial intelligence. It provides a sophisticated threading model that combines pipelining and parallelization techniques applied to deep neural networks, making AI applications much more efficient and reliable. This new approach has been validated with several DNN models and two different computer architectures. The results show that the data processing rate and power saving of the applications increase substantially with respect to standard AI solutions.
Cassandra is the dominant data store used at Netflix and its health is critical to many of its services. In this talk we will share details of the recent redesign of our health monitoring system and how we leveraged a reactive stream processing system to give us a real-time view of our entire fleet while dramatically improving accuracy and reducing false alarms in our alerting.
About the Speaker
Jason Cacciatore, Senior Software Engineer, Netflix
Jason Cacciatore is a Senior Software Engineer at Netflix, where he's been working for the past several years. He's interested in stateful distributed systems and has a diverse background in technology. In his spare time he enjoys spending time with his wife and two sons, reading non-fiction, and watching Netflix documentaries.
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all of your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing, scale-out architecture, and columnar direct-attached storage to minimize I/O time and maximize performance. Learn how you can gain deeper business insights and save money and time by migrating to Amazon Redshift. Take away strategies for migrating from on-premises data warehousing solutions, tuning schema and queries, and utilizing third party solutions.
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardPaul Brebner
DeveloperWeek Management 2022 Conference Presentation https://www.developerweek.com/global/conference/management/schedule/
In the last decade, the development of modern horizontally scalable open-source Big Data technologies such as Apache Cassandra (for data storage), and Apache Kafka (for data streaming) enabled cost-effective, highly scalable, reliable, low-latency applications, and made these technologies increasingly ubiquitous. To enable reliable horizontal scalability, both Cassandra and Kafka utilize partitioning (for concurrency) and replication (for reliability and availability) across clustered servers. But building scalable applications isn’t as easy as just throwing more servers at the clusters, and unexpected speed humps are common. Consequently, you also need to understand the performance impact of partitions, replication, and clusters; monitor the correct metrics to have an end-to-end view of applications and clusters; conduct careful benchmarking, and scale and tune iteratively to take into account performance insights and optimizations. In this presentation, I will explore some of the performance goals, challenges, solutions, and results I discovered over the last 5 years building multiple realistic demonstration applications. The examples will include trade-offs with elastic Cassandra auto-scaling, scaling a Cassandra and Kafka anomaly detection application to 19 Billion checks per day, and building low-latency streaming data pipelines using Kafka Connect for multiple heterogeneous source and sink systems.
Five Steps to Creating a Secure Hybrid Cloud ArchitectureAmazon Web Services
A hybrid Architecture is one of the easiest ways to securely address new application requirements and cloud-first development initiatives. This approach allows you to start small and expand as your requirements change while maintaining a strong security posture. In this session, you will learn the 5 key steps to building a hybrid architecture on AWS using the VM-Series next-generation firewall.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. It’s here, custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
Experience our free, in-depth three-part Tendenci Platform Corporate Membership Management workshop series! In Session 1 on May 14th, 2024, we began with an Introduction and Setup, mastering the configuration of your Corporate Membership Module settings to establish membership types, applications, and more. Then, on May 16th, 2024, in Session 2, we focused on binding individual members to a Corporate Membership and Corporate Reps, teaching you how to add individual members and assign Corporate Representatives to manage dues, renewals, and associated members. Finally, on May 28th, 2024, in Session 3, we covered questions and concerns, addressing any queries or issues you may have.
For more Tendenci AMS events, check out www.tendenci.com/events
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge on how to organize and improve your code review proces
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntekBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Large Language Models and the End of ProgrammingMatt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Enterprise Resource Planning System includes various modules that reduce any business's workload. Additionally, it organizes the workflows, which drives towards enhancing productivity. Here are a detailed explanation of the ERP modules. Going through the points will help you understand how the software is changing the work dynamics.
To know more details here: https://blogs.nyggs.com/nyggs/enterprise-resource-planning-erp-system-modules/
Accelerate Enterprise Software Engineering with PlatformlessWSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
8. Micro Services
Micro services DOES NOT mean better availability
Need Fault Tolerant Architecture
Service Dependency View
Distributed Tracing (Dapper inspired)
17. Encoding PaaS
Master - Worker Pattern
Decoupled by Priority Queues with message lease
State in Cassandra
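The master - worker decoupling above can be sketched as a priority queue whose messages are leased rather than removed when a worker receives them. This is a minimal in-memory illustration, not the Encoding PaaS implementation: the class name `LeasedPriorityQueue`, the `Task` type, the "lower number is more urgent" convention, and the explicit `nowMillis` clock parameter are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.PriorityQueue;

/** In-memory stand-in for a priority queue whose messages are leased on receive. */
final class LeasedPriorityQueue {
    /** A unit of work; a lower priority number means more urgent (assumption of this sketch). */
    static final class Task {
        final String id;
        final int priority;
        Task(String id, int priority) { this.id = id; this.priority = priority; }
    }

    private final PriorityQueue<Task> ready =
            new PriorityQueue<>((a, b) -> Integer.compare(a.priority, b.priority));
    // Tasks currently held by a worker, and when each lease runs out.
    private final Map<String, Task> leased = new HashMap<>();
    private final Map<String, Long> leaseExpiry = new HashMap<>();
    private final long leaseMillis;

    LeasedPriorityQueue(long leaseMillis) { this.leaseMillis = leaseMillis; }

    void submit(Task task) { ready.add(task); }

    /** Hands out the most urgent ready task and starts its lease. */
    Optional<Task> lease(long nowMillis) {
        requeueExpired(nowMillis);
        Task task = ready.poll();
        if (task == null) return Optional.empty();
        leased.put(task.id, task);
        leaseExpiry.put(task.id, nowMillis + leaseMillis);
        return Optional.of(task);
    }

    /** Worker reports success; returns false if the lease is unknown or has lapsed. */
    boolean complete(String taskId, long nowMillis) {
        Long expiry = leaseExpiry.get(taskId);
        if (expiry == null || expiry <= nowMillis) return false;
        leaseExpiry.remove(taskId);
        leased.remove(taskId);
        return true;
    }

    /** Tasks whose worker went silent become visible to other workers again. */
    private void requeueExpired(long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : leaseExpiry.entrySet()) {
            if (e.getValue() <= nowMillis) expired.add(e.getKey());
        }
        for (String id : expired) {
            leaseExpiry.remove(id);
            ready.add(leased.remove(id));
        }
    }
}
```

The lease is what gives at-least-once processing: a crashed worker never calls `complete`, so its task reappears once the lease lapses. Durable state (which the slide keeps in Cassandra) is outside the scope of this sketch.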
18. Oracle >> Cassandra
Data Model & Lack of ACID
Client Cluster Symbiosis
Embrace Eventual Consistency
Data Migration
Shadow Write / Reads
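One way to picture shadow writes and reads during a migration: the existing store keeps serving traffic while every operation is mirrored to the new store and any divergence is counted. The sketch below is an assumption about the general pattern, not the migration code used at Netflix; `KeyValueStore`, `InMemoryStore`, and `ShadowingStore` are hypothetical names, and real stores would replace the in-memory maps.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Minimal key-value view of a data store (hypothetical interface for this sketch). */
interface KeyValueStore {
    void put(String key, String value);
    String get(String key); // returns null when the key is absent
}

/** Trivial in-memory store so the sketch runs without a database. */
final class InMemoryStore implements KeyValueStore {
    private final Map<String, String> data = new HashMap<>();
    @Override public void put(String key, String value) { data.put(key, value); }
    @Override public String get(String key) { return data.get(key); }
}

/** Serves callers from the primary store while mirroring traffic to the shadow store. */
final class ShadowingStore implements KeyValueStore {
    private final KeyValueStore primary; // e.g. the existing Oracle-backed store
    private final KeyValueStore shadow;  // e.g. the new Cassandra-backed store
    private long mismatches;

    ShadowingStore(KeyValueStore primary, KeyValueStore shadow) {
        this.primary = primary;
        this.shadow = shadow;
    }

    @Override public void put(String key, String value) {
        primary.put(key, value);         // the primary write decides success
        try {
            shadow.put(key, value);      // shadow failures must never reach callers
        } catch (RuntimeException e) {
            mismatches++;
        }
    }

    @Override public String get(String key) {
        String expected = primary.get(key);
        try {
            if (!Objects.equals(expected, shadow.get(key))) mismatches++;
        } catch (RuntimeException e) {
            mismatches++;
        }
        return expected;                 // callers only ever see the primary's answer
    }

    long mismatches() { return mismatches; }
}
```

A mismatch counter that stays at zero under production traffic is the kind of evidence that would justify switching reads over to the new store.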
19. Object To Cassandra Mapping
/**
 * @author mdaxini
 */
@CColumnFamily(name = "Sequence", shared = true)
@Audited(columnFamily = "sequence_audit")
public class SequenceBean {
    @CId(name = "id")
    private String sequenceName;

    @CColumn(name = "sequenceValue")
    private Long sequenceValue;

    @CColumn(name = "updated")
    @TemporalAutoUpdate
    @JsonProperty("updated")
    private Date updated;
}
20. Object To Cassandra Mapping
@JsonAutoDetect(JsonMethod.NONE)
@JsonIgnoreProperties(ignoreUnknown = true)
@CColumnFamily(name = "task")
public class Job {
    @CId
    private JobKey jobKey;
}

public final class TaskKey {
    @CId(order = 0)
    private Long packageId;

    @CId(order = 1)
    private UUID taskId;
}
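The slides show only the annotated beans, not the mapper that reads them. As a rough idea of how annotation-driven mapping can work, the sketch below defines its own stand-in annotations (`ColumnFamily`, `RowKey`, `Column`) and a reflective `RowMapper`. These are hypothetical and are not the definitions behind `@CColumnFamily`, `@CId`, or `@CColumn`, whose library is not shown in the deck.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for the slide's annotations; retained at runtime for reflection.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface ColumnFamily { String name(); }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface RowKey { String name(); }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface Column { String name(); }

/** Example bean, loosely modelled on the slide's SequenceBean. */
@ColumnFamily(name = "Sequence")
final class SequenceRow {
    @RowKey(name = "id") private final String sequenceName;
    @Column(name = "sequenceValue") private final Long sequenceValue;

    SequenceRow(String sequenceName, Long sequenceValue) {
        this.sequenceName = sequenceName;
        this.sequenceValue = sequenceValue;
    }
}

/** Turns an annotated bean into the pieces a storage client would need. */
final class RowMapper {
    private RowMapper() {}

    static String columnFamily(Object bean) {
        ColumnFamily cf = bean.getClass().getAnnotation(ColumnFamily.class);
        if (cf == null) throw new IllegalArgumentException("missing @ColumnFamily");
        return cf.name();
    }

    static Object rowKey(Object bean) {
        for (Field f : bean.getClass().getDeclaredFields()) {
            if (f.isAnnotationPresent(RowKey.class)) return read(f, bean);
        }
        throw new IllegalArgumentException("missing @RowKey field");
    }

    /** Column name -> value for every field marked with @Column. */
    static Map<String, Object> columns(Object bean) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Field f : bean.getClass().getDeclaredFields()) {
            Column c = f.getAnnotation(Column.class);
            if (c != null) out.put(c.name(), read(f, bean));
        }
        return out;
    }

    private static Object read(Field f, Object bean) {
        try {
            f.setAccessible(true); // fields are private in the bean
            return f.get(bean);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Composite keys such as the slide's `@CId(order = 0)` / `@CId(order = 1)` pair would need an ordering attribute on the key annotation; that is left out here to keep the sketch short.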
21. Priority-Scheduling Queue
Evolution:
One SQS Queue per priority range
Store and forward (rate-adaptive) to SQS Queue
Rule based priority, leases, RDBMS based with prefetch
22. Encoding PaaS Farm
One command deployment and upgrade
Self Serve
Homogeneous View of Windows and Linux
Pioneered Ubuntu - production since 2011
23. Innovate Fast
Build for Pragmatic Scale
Innovate for Business
Standardize Later*
25. Platform Big Data/Caching & Services
Cassandra: Astyanax, Priam, CassJMeter
Hadoop Platform as a Service: Genie, Lipstick, Inviso*
Caching
Adapted from a slide by @stonse
26. CDE Charter
Dynomite*
Redis
ElasticSearch
Spark*
Solr*
* Under Construction
Cassandra (1.2.x >> 2.0.x)
Priam
Astyanax
Skynet*
30. Use RandomPartitioner
Have at least 3 replicas (quorum)
Same number of replicas - simpler operations
create keyspace oracle
with placement_strategy = 'NetworkTopologyStrategy'
and strategy_options = {us-west-2 : 3, us-east : 3};
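The "at least 3 replicas" advice follows from quorum arithmetic: a quorum is a majority of the replicas, so with 2 replicas a single node outage already blocks quorum operations, while 3 replicas tolerate one. A small sketch of that arithmetic (the class and method names are illustrative only):

```java
/** Quorum arithmetic for a given replication factor. */
final class Quorum {
    private Quorum() {}

    /** Smallest number of replicas that forms a majority. */
    static int size(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** Replicas that can be down while quorum operations still succeed. */
    static int tolerableFailures(int replicationFactor) {
        return replicationFactor - size(replicationFactor);
    }

    /** True when every read set must overlap every acknowledged write set. */
    static boolean readsOverlapWrites(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }
}
```

For the keyspace above (3 replicas in each of two regions), `Quorum.size(3)` gives 2 for a quorum local to one region, and `Quorum.size(6)` gives 4 for a quorum counted across both regions.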
31. Move to CQL3 from thrift
Codifies best practices
Leverage Collections (albeit restricted cardinality)
Use Key Caching
As a default turn off Row Caching
Rename all composite columns in one ALTER TABLE statement.
32. Watch length of column names
Use “COMPACT STORAGE” wisely
Cannot use collections - depends on CompositeType
Non compact storage uses 2 bytes per internal cell, but preferred.
* Image courtesy Datastax blog
34. Prefer CL_ONE
data replication within 500ms across the region
If using quorum reads and writes, set read_repair_chance to 0.0 or a very low value.
Make sure repairs are run often
Eventual Consistency does not mean hopeful consistency
35. Avoid secondary indexes for high cardinality values
In most cases we set gc_grace_seconds = 10 days
Avoid hot rows
detect using node level latency metrics
36. Avoid heavy rows
Avoid too wide rows (< 100K columns if smaller)
Don’t use C* as a Queue
Tombstones will bite you
39. Guesstimate and then validate sstable_size_in_mb
Hint: based on write rate and size
160mb for LeveledCompactionStrategy
SizeTieredCompactionStrategy - C* default 50mb
40. Atomic batches
no isolation, only atomic for row within partition key
no automatic rollback
Lightweight transactions
42. If your C* cluster footprint is significant
must have good automation
at least a C* semi-expert
Use cstar_perf to validate your initial clusters
We don’t use vnodes
On each node size disk to have 2x of expected data - ephemeral SSDs, no EBS
43. Monitoring and alerting
read write latency - co-ordinator & node level
Compaction stats
Heap Usage
Network
Max & Min Row sizes
44. Fixed tokens, double the cluster to expand
Important to size the cluster for app needs initially
benefits of fixed tokens outweigh vnodes
Take backups of all the nodes to allow for eventual consistency on restores
Note: commitlog by default fsyncs only every 10 seconds
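Doubling works well with fixed tokens because each new node can take the midpoint of an existing range, so no existing node has to change its token. The sketch below computes evenly spaced tokens for RandomPartitioner, whose tokens span 0 to 2^127 - 1. It assumes a single ring with equal spacing; the class name and helpers are illustrative and ignore any per-region token offsets a real deployment may use.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Evenly spaced fixed tokens for a ring that uses RandomPartitioner. */
final class FixedTokens {
    // RandomPartitioner tokens span 0 .. 2^127 - 1, so the ring size is 2^127.
    private static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    private FixedTokens() {}

    /** Token for node i of n: floor(i * 2^127 / n). */
    static BigInteger token(int i, int n) {
        return RING_SIZE.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n));
    }

    /** Tokens for all n nodes, in ring order. */
    static List<BigInteger> ring(int n) {
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < n; i++) tokens.add(token(i, n));
        return tokens;
    }

    /** Tokens for the n new nodes when an n-node ring is doubled to 2n. */
    static List<BigInteger> tokensForDoubling(int n) {
        List<BigInteger> added = new ArrayList<>();
        for (int i = 1; i < 2 * n; i += 2) added.add(token(i, 2 * n));
        return added;
    }
}
```

Because node `i` of `n` and node `2i` of `2n` get the same token, `ring(12)` contains every token of `ring(6)`, and `tokensForDoubling(6)` returns exactly the six tokens in between.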
45. Run repairs before GCGraceSeconds expires
Throttle compactions and repairs
Repairs can take a long time
run a primary range and a Keyspace at a time to avoid performance impact.
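Since repairs can take a long time, the repair schedule has to account for repair duration as well as frequency if every repair is to finish inside gc_grace_seconds. A simplified check, assuming one fixed interval between repair starts and a known worst-case duration (both are inputs you would measure; the class and method names are hypothetical):

```java
import java.time.Duration;

/** Checks that a repair schedule fits inside the tombstone grace period. */
final class RepairWindow {
    private RepairWindow() {}

    /**
     * A tombstone written just after one repair starts is only guaranteed to be
     * propagated when the next repair finishes, i.e. after interval + duration.
     * That must be shorter than gc_grace_seconds.
     */
    static boolean isSafe(Duration interval, Duration repairDuration, Duration gcGrace) {
        return interval.plus(repairDuration).compareTo(gcGrace) < 0;
    }

    /** Longest interval between repair starts that still leaves the given headroom. */
    static Duration maxInterval(Duration repairDuration, Duration gcGrace, Duration headroom) {
        Duration max = gcGrace.minus(repairDuration).minus(headroom);
        return max.isNegative() ? Duration.ZERO : max;
    }
}
```

With gc_grace_seconds at 10 days and repairs that take up to 2 days, a weekly schedule passes the check while a 9-day schedule does not.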
46. Schema disagreements - pick the nodes with the older date and restart them one at a time.
nodetool resetlocalschema not persistent on 1.2
Recycle nodes in AWS to prevent staleness
Expanding to new region
Launch nodes in new region without bootstrapping
Change Keyspace replication
Run nodetool rebuild on nodes in new region.
47. More Info
http://techblog.netflix.com/
http://netflix.github.io/
http://slideshare.net/netflix
https://www.youtube.com/user/NetflixOpenSource
https://www.youtube.com/user/NetflixIR $$$