Surge 2013 presentation covering how Netflix maximizes engineering velocity while keeping risks to scalability, reliability, and performance in check.
goto; London: Keeping your Cloud Footprint in Check - Coburn Watson
Presented on the "Lean" track at goto; London September 17th, 2015. Covers how Netflix manages cloud cost efficiency in light of innovation and reliability drivers.
#lspe Q1 2013: Dynamically Scaling Netflix in the Cloud - Coburn Watson
Meetup presentation on how Netflix dynamically scales in the cloud. It covers topics primarily related to AWS autoscaling and provides some "day-in-the-life" data.
All Things Open 2014 - Day 1
Wednesday, October 22nd, 2014
Mark Hinkle
Senior Director, Open Source Business Office at Citrix
Cloud
Crash Course in Cloud Computing
Find more of Mark's talks here: http://www.slideshare.net/socializedsoftware
(BDT318) How Netflix Handles Up To 8 Million Events Per Second - Amazon Web Services
In this session, Netflix provides an overview of Keystone, their new data pipeline. The session covers how Netflix migrated from Suro to Keystone, including the reasons behind the transition and the challenge of achieving zero loss while processing over 400 billion events daily. It details how they deploy, operate, and scale Kafka, Samza, Docker, and Apache Mesos in AWS to handle 8 million events and 17 GB per second at peak.
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013 - Amazon Web Services
Providing a great media consumption experience to customers is crucial to maximizing audience engagement. To do that, it is important that you make content available for consumption anytime, anywhere, on any device, with a personalized and interactive experience. This session explores the power of big data log analytics (both real-time and batch), using technologies like Spark, Shark, Kafka, Amazon Elastic MapReduce, Amazon Redshift, and other AWS services. Such analytics are useful for content personalization, recommendations, personalized dynamic ad insertions, interactivity, and streaming quality.
This session also includes a discussion from Netflix, which explores personalized content search and discovery with the power of metadata.
In this session, Kevin will dive into the unique challenges of keeping your Kubernetes workloads highly available while keeping costs low. You will learn how to leverage cloud-native autoscaling, right-sizing of pod resource requests, resource buffer definition, cost allocation, and more.
Should you read Kafka as a stream or in batch? Should you even care? | Ido Na... - HostedbyConfluent
Should you consume Kafka as a stream or in batch? When should you choose each one? Which is more efficient and more cost-effective?
In this talk we'll give you the tools and metrics to decide which solution to apply when, and show you a real-life example with cost and time comparisons.
To highlight the differences, we’ll dive into a project we’ve done, transitioning from reading Kafka in a stream to reading it in batch.
By turning conventional thinking on its head and reading our multi-petabyte Kafka stream in batch using Spark and Airflow, we’ve achieved a huge cost reduction of 65% while at the same time getting a more scalable and resilient solution.
We’ll explore the tradeoffs and give you the metrics and intuition you’ll need to make such decisions yourself.
We’ll cover:
Costs of processing in stream compared to batch
Scaling up for bursts and reprocessing
Making the tradeoff between wait times and costs
Recovering from outages
And much more…
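The stream-versus-batch cost tradeoff the list above alludes to can be sketched with a toy Python model: an always-on streaming cluster is billed around the clock, while a batch job pays only for the hours it actually runs. All node counts and prices below are hypothetical placeholders, not the inputs behind the talk's 65% figure.

```python
# Toy cost model for the stream-vs-batch decision. An always-on streaming
# cluster is billed every hour of the month; a periodic batch job is billed
# only for its run time. All figures are illustrative placeholders.

HOURS_PER_MONTH = 730

def streaming_cost(nodes, price_per_node_hour):
    """Monthly cost of an always-on streaming cluster."""
    return nodes * price_per_node_hour * HOURS_PER_MONTH

def batch_cost(runs_per_day, hours_per_run, nodes, price_per_node_hour):
    """Monthly cost of a batch job that only runs periodically (30-day month)."""
    return runs_per_day * 30 * hours_per_run * nodes * price_per_node_hour

stream = streaming_cost(nodes=10, price_per_node_hour=0.50)
batch = batch_cost(runs_per_day=24, hours_per_run=0.5, nodes=10,
                   price_per_node_hour=0.50)

print(f"streaming: ${stream:,.0f}/month")
print(f"batch:     ${batch:,.0f}/month")
print(f"saving:    {1 - batch / stream:.0%}")
```

The model ignores reprocessing bursts and wait-time costs, which the talk covers; it only shows why idle capacity dominates the streaming bill.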
http://www.oreilly.com/pub/e/3764
Keystone processes over 700 billion events per day (1 petabyte) with at-least-once processing semantics in the cloud. Monal Daxini details how they used Kafka, Samza, Docker, and Linux at scale to implement a multi-tenant pipeline in the AWS cloud within a year. He'll also share plans for offering Stream Processing as a Service to all of Netflix.
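At-least-once semantics, as in the Keystone pipeline above, mean consumers can see the same event redelivered, so processing must be idempotent. A minimal Python sketch of consumer-side deduplication (the event IDs and in-memory "seen" set are illustrative, not Keystone's actual mechanism):

```python
# Minimal sketch of why at-least-once delivery pushes deduplication onto the
# consumer: redeliveries are allowed, so the handler must run exactly once
# per logical event. The event-ID scheme here is a hypothetical example.

def process_at_least_once(events, handler):
    seen = set()
    for event_id, payload in events:
        if event_id in seen:      # duplicate redelivery: skip it
            continue
        handler(payload)
        seen.add(event_id)        # mark handled only after the handler succeeds

out = []
# event 2 is delivered twice, as at-least-once semantics permit
process_at_least_once([(1, "a"), (2, "b"), (2, "b"), (3, "c")], out.append)
print(out)  # each payload handled exactly once
```

A production pipeline would bound the `seen` state (e.g. with a TTL) rather than grow it forever; this sketch only shows the semantics.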
Going from three nines to four nines using Kafka | Tejas Chopra, Netflix - HostedbyConfluent
Many organizations have chosen to go with a hybrid cloud architecture to give them the best of both worlds: the scalability and ease of deployment of cloud, and the security, latency & egress benefits of local storage.
Persistence of data on such an architecture can follow a write-back mode, where data is first written to local storage and then uploaded to the cloud asynchronously. However, this means that applications cannot utilize the availability and durability guarantees of the cloud: the effective availability is the SLA of the on-premises storage, which is almost always lower than the SLA of the cloud.
By switching the order, i.e. performing uploads to cloud, and then hydrating on-premise storage, applications get the benefit of availability SLAs of cloud. In our case, this allowed us to move from three 9’s of availability (99.9%) of local storage to four 9’s (99.99%).
Instead of uploading in write-back mode, we duplicated the incoming stream to upload to both cloud and on-premise. For on-premise uploads that failed, we leveraged Kafka’s event processing to queue up objects that need to be egressed out of Cloud into the local storage.
This architecture allowed us to hydrate the local storage with objects uploaded to Cloud. Furthermore, since local storage space is limited, we periodically purged data out of local storage and created a secondary copy of the data on cloud by leveraging Kafka event processing.
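The nines quoted in the abstract translate directly into downtime budgets, which a quick back-of-the-envelope Python check makes concrete (only the 99.9% and 99.99% figures above are used; the rest is plain arithmetic):

```python
# Downtime budget implied by an availability SLA. In write-back mode the write
# path depends on local storage first, so its SLA caps the system; uploading
# to cloud first makes the cloud SLA the cap, per the abstract above.

def downtime_minutes_per_year(availability):
    """Expected unavailable minutes per year for a given availability fraction."""
    return (1 - availability) * 365 * 24 * 60

local = 0.999    # three nines: on-premises storage
cloud = 0.9999   # four nines: cloud object storage

print(f"write-back (local first): {downtime_minutes_per_year(local):.0f} min/yr")
print(f"cloud-first:              {downtime_minutes_per_year(cloud):.1f} min/yr")
```

Roughly 526 minutes of budgeted downtime per year shrinks to about 53, which is the practical payoff of reordering the write path.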
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar - confluent
Siphon is a highly available and reliable distributed pub/sub system built using Apache Kafka. It is used to publish, discover, and subscribe to near-real-time data streams for operational and product intelligence. Siphon is used as a "Databus" by a variety of producers and subscribers in Microsoft and is compliant with security and privacy requirements. It has built-in auditing and quality control. This session will provide an overview of the use of Kafka at Microsoft and then deep dive into Siphon. We will describe an important business scenario and talk about the technical details of the system in the context of that scenario. We will also cover the design and implementation of the service, its scale, and real-world production experiences from operating the service in the Microsoft cloud environment.
Flink Forward Berlin 2017: Steffen Hausmann - Build a Real-time Stream Proces... - Flink Forward
The increasing number of data sources in today's application stacks has created a demand to continuously capture and process data from various sources, quickly turning high-volume streams of raw data into actionable insights. Apache Flink addresses many of the challenges faced in this domain, as it is specifically tailored to distributed computations over streams. While Flink provides all the necessary capabilities to process streaming data, provisioning and maintaining a Flink cluster still requires considerable effort and expertise. We will discuss how cloud services can remove most of the burden of running the clusters underlying your Flink jobs and explain how to build a real-time processing pipeline on top of AWS by integrating Flink with Amazon Kinesis and Amazon EMR. We will furthermore illustrate how to leverage the reliable, scalable, and elastic nature of the AWS cloud to effectively create and operate your real-time processing pipeline with little operational overhead.
Flink Forward SF 2017: Bill Liu & Haohui Mai - AthenaX: Uber's streaming pro... - Flink Forward
The mission of Uber is to make transportation as reliable as running water. The business is fundamentally driven by real-time data -- more than half of the employees in Uber, many of whom are non-technical, use SQL on a regular basis to analyze data and power their business decisions. We are building AthenaX, a stream processing platform built on top of Apache Flink to enable our users to write SQL to process real-time data efficiently and reliably at Uber's scale. Using Apache Calcite as its query parser, AthenaX compiles the SQL down to Flink jobs. Leveraging Flink's unique streaming capabilities, AthenaX supports (1) consistent computations reliably thanks to at-least-once guarantees, (2) nontrivial analytics (e.g., windowing and joins) on multiple data sources, and (3) efficient and cost-effective executions in production through code generation and elastic scaling.
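The windowed analytics that platforms like AthenaX compile SQL into can be illustrated with a hand-rolled tumbling-window count in Python; this is a toy sketch of the concept, not Flink or Calcite code:

```python
# Toy tumbling-window aggregation: count events per key within fixed-size,
# non-overlapping time windows. This is a hand-rolled illustration of what a
# streaming "GROUP BY window, key" query computes, with made-up event data.

from collections import defaultdict

def tumbling_window_counts(events, window_size):
    """events: (timestamp, key) pairs; returns {(window_start, key): count}."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_size) * window_size  # window the event falls in
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(1, "sf"), (3, "sf"), (7, "nyc"), (12, "sf")]
print(tumbling_window_counts(events, window_size=10))
# → {(0, 'sf'): 2, (0, 'nyc'): 1, (10, 'sf'): 1}
```

A real streaming engine additionally handles out-of-order events, watermarks, and incremental emission, which is exactly the machinery a platform like this hides behind SQL.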
Netflix viewing data architecture evolution - EBJUG Nov 2014 - Philip Fisher-Ogden
Netflix's architecture for viewing data has evolved as streaming usage has grown. Each generation was designed for the next order of magnitude, and was informed by learnings from the previous. From SQL to NoSQL, from data center to cloud, from proprietary to open source, look inside to learn how this system has evolved. (slides from a talk given at the East Bay Java Users Group MeetUp in Nov 2014)
Business Continuity with Microservices-Based Apps and DevOps: Learnings from ... - DevOps.com
In this new world of remote work, the demand for real-time cloud-based communication services has skyrocketed.
8x8, a company that provides a global cloud communications platform for its customers, moved its SaaS application from the data center to public cloud using Kubernetes and a microservices-based architecture to meet increased demand.
In this webinar, we’ll share the firsthand experience of the DevOps team at 8x8, covering key considerations and learnings from successfully migrating a VoIP offering from an on-premises environment to AWS at scale. Learnings will cover designing application delivery infrastructure, automation, analytics, security, observability, and rapid scaling.
GCPLA Meetup Workshop - Migration from a Legacy Infrastructure to the Cloud - Samuel Chow
Interactive workshop discussing the target architecture and migration plan for moving a legacy, on-premises IT infrastructure to Google Cloud Platform (GCP).
How to Enable Industrial Decarbonization with Node-RED and InfluxDB - InfluxData
Graphite Energy’s thermal energy storage (TES) platform encourages clients to offset their traditional energy consumption with low-cost renewable energy sources. Their customers include manufacturers, mines, steelmakers and aluminum plants. IIoT data is collected about energy usage, fuel consumption, temperatures, solar panels, wind farms, process steam and air dryers. Discover how Graphite Energy uses InfluxDB to monitor their zero-emission energy solution.
In this webinar, Byron Ross will dive into:
Graphite Energy’s approach to reducing their clients’ carbon footprint
Their methodology for collecting the sensor data used to make their operations greener
Why they chose a time series database over a data historian
Beaming Flink to the Cloud @ Netflix, Flink Forward 2016 - Monal Daxini
Netflix is a data-driven company, and we process over 700 billion streaming events per day with at-least-once processing semantics in the cloud. To make it easy to extract intelligence from this unbounded stream, we are building Stream Processing as a Service (SPaaS) infrastructure so that users can focus on extracting value and not have to worry about boilerplate infrastructure and scale.
We will share our experience building a scalable SPaaS using Flink, Apache Beam, and Kafka as the foundation layer to process over 1.3 PB of event data without service disruption.
Putting Kafka Together with the Best of Google Cloud Platform - confluent
(Kir Titievsky, Google) Kafka Summit SF 2018
In this talk we will share some stories and patterns from customers who have built streaming pipelines and event-driven systems using Confluent Cloud in combination with Google Cloud Platform-native analytics tools, such as BigQuery and Dataflow. We will discuss what Confluent Cloud enables for hybrid deployments and how and why to mix and match platform-native and platform-neutral tools.
Over 100 million subscribers from over 190 countries enjoy the Netflix service. This leads to over a trillion events, amounting to 3 PB, flowing through the Keystone infrastructure to help improve customer experience and glean business insights. The self-serve Keystone stream processing service processes these messages in near real-time with at-least-once semantics in the cloud. This enables users to focus on extracting insights and not worry about building out scalable infrastructure. I'll share the details of this platform and our experience building it.
Originally presented at Angelbeat, this talk explains how hackers gather data about your organization and how you can do the same sort of reconnaissance to eliminate risk before it becomes a breach.
Check out the deck and then get your own free risk scorecard here: https://www.normshield.com/get-risk-scorecard/
From Code to the Monkeys: Continuous Delivery at Netflix - Dianne Marsh
At Netflix, we continue to improve upon our continuous delivery process. We thrive in a hybrid environment, where every developer is able to deploy code, and with that freedom comes the responsibility for ensuring that our customers are not negatively impacted. We have built open source tools toward a continuous delivery solution. In this presentation, from QCon SF 2013, you will learn about our tool chain so that you can determine which tools make sense in your environment.
QCon SF 2014 talk on Netflix Mantis, a stream processing system - Danny Yuan
Justin and I gave this talk at QCon SF 2014 about Mantis, a stream processing system that features a reactive programming API, autoscaling, and stream locality.
Using the new extended Berkeley Packet Filter capabilities in Linux to improve the performance of auditing security-relevant kernel events around network, file, and process actions.
With ad growth thrown into the mix, it’s apparent that every facet of the OTT market is expanding: advertising opportunities; popularity of OTT devices like Apple TV and Roku; and the amount of OTT content and services geared to break into the market.
Continuous Delivery at Netflix, and beyond - Mike McGarr
A talk I gave on how Netflix delivers code to production, some of the enabling factors and recommendations for how to implement continuous delivery in your organization.
(SPOT302) Availability: The New Kind of Innovator's Dilemma - Amazon Web Services
Successful companies, while focusing on their current customers' needs, often fail to embrace disruptive technologies and business models. This phenomenon, known as the "Innovator's Dilemma," eventually leads to many companies' downfall and is especially relevant in the fast-paced world of online services. In order to protect its leading position and grow its share of the highly competitive global digital streaming market, Netflix has to continuously increase the pace of innovation by constantly refining recommendation algorithms and adding new product features, while maintaining a high level of service uptime. The Netflix streaming platform consists of hundreds of microservices that are constantly evolving, and even the smallest production change may cause a cascading failure that can bring the entire service down. We face a new kind of Innovator's Dilemma, where product changes may not only disrupt the business model but also cause production outages that deny customers service access. This talk will describe various architectural, operational and organizational changes adopted by Netflix in order to reconcile rapid innovation with service availability.
NormShield Cyber Threat & Vulnerability Orchestration Overview - NormShield, Inc.
NormShield is at the forefront of orchestrated cyber security operations and reporting, a transformative new category that Gartner calls SOAR. The NormShield cloud platform automates finding vulnerabilities, prioritizes them and provides actionable intelligence. A key differentiation is the company’s combination of advanced automation and human intelligence for reliability unparalleled in the industry. NormShield CISOs receive letter-grade risk scorecards. Their teams manage risk, not data. The results are measurable: informed decisions and swift action that reduces risk as never before possible and at an affordable price.
Benchmarking Elastic Cloud Big Data Services under SLA Constraints - Nicolas Poggi
We introduce an extension for TPC benchmarks addressing the requirements of big data processing in cloud environments. We characterize it as the Elasticity Test and evaluate it under TPCx-BB (BigBench). First, the Elasticity Test incorporates an approach to generating real-world query submission patterns with distinct data scale factors, based on major industrial cluster logs. Second, a new metric is introduced based on Service Level Agreements (SLAs) that takes the quality-of-service requirements of each query into consideration.
Experiments with Apache Hive and Spark on the cloud platforms of three major vendors validate our approach by comparing to the current TPCx-BB metric.
Results show how systems that fail to meet SLAs under concurrency, due to queuing or degraded performance, are penalized by the new metric. On the other hand, elastic systems meet a higher percentage of SLAs and are thus rewarded in the new metric. Such systems have the ability to scale compute workers up and down according to the demands of a varying workload and can thus save dollar costs.
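An SLA-based score of the kind described above can be sketched in a few lines of Python; the query names, runtimes, and SLA limits below are made-up examples, not the TPCx-BB workload or its actual metric formula:

```python
# Sketch of an SLA-compliance score in the spirit of the Elasticity Test:
# rate a benchmark run by the fraction of queries that finished within their
# agreed time limit. All queries, runtimes, and SLAs are illustrative.

def sla_compliance(runtimes, slas):
    """runtimes/slas: dicts of query name -> seconds; returns fraction meeting SLA."""
    met = sum(1 for q, t in runtimes.items() if t <= slas[q])
    return met / len(runtimes)

runtimes = {"q1": 30, "q2": 95, "q3": 12, "q4": 240}
slas     = {"q1": 60, "q2": 90, "q3": 30, "q4": 180}

print(f"SLA compliance: {sla_compliance(runtimes, slas):.0%}")
# → SLA compliance: 50%
```

Under such a score, a system that queues queries under concurrency (inflating q2 and q4 above their limits) is penalized even if its total throughput looks fine, which is the behavior the paper's metric is designed to expose.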
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL... - NoSQLmatters
Ted Dunning – Very High Bandwidth Time Series Database Implementation
This talk will describe our work in creating time series databases with very high ingest rates (over 100 million points/second) on very small clusters. Starting with OpenTSDB and the off-the-shelf version of MapR-DB, we were able to accelerate ingest by >1000x. I will describe our techniques in detail and talk about the architectural changes required. We are also working to allow access to OpenTSDB data using SQL via Apache Drill. In addition, I will talk about how this work has implications for the much-fabled Internet of Things, and tell some stories about the origins of open source big data in the 19th century at sea.
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste... - DataStax Academy
An earthquake occurs in the Sea of Japan. A tsunami is likely to hit the coast. The population must be warned by SMS. A datacenter has been damaged by the earthquake. Will the alerting system still work?
Building this simple alerting system is a great way to start with Cassandra, as we discovered teaching a hands-on big data class at a French university.
What were the reasons that led a majority of students to choose Cassandra to implement a fast, resilient, highly available big data system to be deployed on AWS?
What were the common pitfalls, the modeling alternatives, and their performance impact?
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica... - Paul Brebner
Apache Kafka, Apache Cassandra, and Kubernetes are open source big data technologies enabling applications and business operations to scale massively and rapidly. While Kafka and Cassandra underpin the data layer of the stack, providing the capability to stream, disseminate, store, and retrieve data at very low latency, Kubernetes is a container orchestration technology that helps automate application deployment and scaling of application clusters.
In this presentation, Paul will reveal how he architected a massive-scale deployment of a streaming data pipeline with Kafka and Cassandra to serve an example anomaly detection application running on a Kubernetes cluster, generating and processing a massive number of events. Anomaly detection is a method used to detect unusual events in an event stream.
It is widely used in a range of applications such as financial fraud detection, security and threat detection, website user analytics, sensors, IoT, and system health monitoring. When such applications operate at massive scale, generating millions or billions of events, they impose significant computational, performance, and scalability challenges on anomaly detection algorithms and data layer technologies. Paul will demonstrate the scalability, performance, and cost-effectiveness of Apache Kafka, Cassandra, and Kubernetes, with results from his experiments that allowed the anomaly detection application to scale to 19 billion anomaly checks per day.
Melbourne Big Data Meetup, March 5 2020
https://www.eventbrite.com/e/melbourne-big-data-meetup-realtime-anomaly-detection-with-cassandra-kafka-tickets-93028445585
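The sliding-window check behind this kind of anomaly detection can be sketched in a few lines. The window size and the three-sigma threshold below are illustrative choices, not the values used in the talk:

```python
from collections import deque
from statistics import mean, stdev

def make_anomaly_checker(window=50, threshold=3.0):
    """Return a closure that flags values more than `threshold`
    standard deviations away from the rolling mean of recent events."""
    history = deque(maxlen=window)

    def check(value):
        # Need some history before we can judge anything.
        if len(history) < 10:
            history.append(value)
            return False
        mu, sigma = mean(history), stdev(history)
        is_anomaly = sigma > 0 and abs(value - mu) > threshold * sigma
        history.append(value)
        return is_anomaly

    return check

check = make_anomaly_checker()
stream = [10.0, 11.0, 9.5, 10.2, 10.8, 9.9, 10.1, 10.4, 9.7, 10.3, 50.0]
flags = [check(v) for v in stream]   # only the final spike is flagged
```

At the scale the talk describes, the same per-key check would run inside a Kafka consumer with recent history fetched from Cassandra rather than held in memory.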
Why test automation is getting more difficult, and what can be done about it. These slides are from a presentation by Gordon McKeown, Group Director of Product Management at TestPlant, given at the Northern Lights conference in Manchester in April 2016.
Log Monitoring and Anomaly Detection at Scale at ORNLElasticsearch
See how Oak Ridge National Laboratory transitioned from using COTS toolset to a more cost-effective and flexible open source model by employing NiFi, Kafka, and the Elastic Stack.
Streaming Analytics with Spark, Kafka, Cassandra and AkkaHelena Edelson
This talk will address how a new architecture is emerging for analytics, based on Spark, Mesos, Akka, Cassandra and Kafka (SMACK). Popular architectures like Lambda separate layers of computation and delivery and require many technologies with overlapping functionality. Some of this results in duplicated code, untyped processes, or high operational overhead, not to mention the cost (i.e. ETL). I will discuss the problem domain and what is needed in terms of strategies, architecture, and application design and code to begin leveraging simpler data flows. We will cover how this particular set of technologies addresses common requirements and how, collaboratively, they work together to enrich and reinforce each other.
It’s one thing to support many data sources with megabytes of data. It’s a completely different problem to support thousands of data sources with terabytes of data every day. How do you create systems that scale infinitely?
The answer is: you don’t. You cannot design for infinite scalability. Rather, consider a pod approach where each pod supports a defined capacity; scalability results from deploying multiple cooperating pods.
Systems handling extremely large data sources with significant processing requirements are difficult at best to validate. Attempting to deploy such a system without well-understood capacity limits is destined for failure.
This was first presented at Cloud Expo NYC.
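The pod idea above can be shown in a minimal sketch: each pod has a hard capacity of data sources, and the system scales out by provisioning another pod when all existing pods are full. The class names and capacities here are illustrative, not from the talk:

```python
class Pod:
    """A deployment unit with a hard, well-understood capacity."""
    def __init__(self, pod_id, capacity):
        self.pod_id = pod_id
        self.capacity = capacity
        self.sources = []

    def has_room(self):
        return len(self.sources) < self.capacity

class PodRouter:
    """Scale by adding pods of known capacity instead of
    growing one system toward 'infinite' scalability."""
    def __init__(self, pod_capacity):
        self.pod_capacity = pod_capacity
        self.pods = []

    def assign(self, source_id):
        for pod in self.pods:
            if pod.has_room():
                pod.sources.append(source_id)
                return pod.pod_id
        # All pods full: provision a new pod (scale out).
        pod = Pod(len(self.pods), self.pod_capacity)
        pod.sources.append(source_id)
        self.pods.append(pod)
        return pod.pod_id

router = PodRouter(pod_capacity=100)
pods_used = {router.assign(f"source-{i}") for i in range(250)}
# 250 sources at 100 per pod -> three cooperating pods
```

Because each pod's limit is validated independently, capacity planning reduces to counting pods rather than load-testing one ever-larger system.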
How the Internet of Things is Turning the Internet Upside DownTed Dunning
This is a wide-ranging talk covering how the internet is architected, how that architecture is changing as a result of the Internet of Things, how the Internet of Things worked in the 19th century, the open-source big data community, and how to build time-series databases to make this all possible.
Really.
Concurrency at Scale: Evolution to Micro-ServicesRandy Shoup
Most large-scale web companies have evolved their system architecture from a monolithic application and monolithic database to a set of loosely coupled micro-services. Using examples from Google, eBay, and KIXEYE, this talk outlines the pros and cons of these different stages of evolution, and makes practical suggestions about when and how other organizations should consider migrating to micro-services. It concludes with some more advanced implications of a micro-services architecture, including SLAs, cost-allocation, and vendor-customer relationships within the organization.
This tutorial gives a brief and interesting introduction to modern stream computing technologies. Participants can learn the essential concepts and methodologies for designing and building an advanced stream processing system. The tutorial unveils the key fundamentals behind various kinds of design choices. Some forecast of technology developments in this domain is also introduced in the last section of the tutorial.
“Spikey Workloads”:
Emergency Management in the Cloud
One of the best use cases for the cloud involves websites with surges in computing needs. This session will feature organizations that have leveraged the cloud to handle their unique burst workloads without breaking the bank:
Speaker: , Solutions Architect, Amazon Web Services
2. Netflix, Inc.
• World's leading internet television network
• ~38 million subscribers in 40+ countries
• Over a billion hours streamed per month
• Approximately 33% of all US Internet traffic at night
• Recent notables
• Increased Originals catalog
• Large open source contribution
• OpenConnect (homegrown CDN)
3. About Me
• Manage Cloud Performance Engineering Team
• Sub-team of Cloud Solutions Organization
• Focus on performance since 2000
• Large-scale billing applications, eCommerce, datacenter mgmt., etc.
• Genentech, McKesson, Amdocs, Mercury Int., HP, etc.
• Passion for tackling performance at cloud-scale
• Looking for great performance engineers
• cwatson@netflix.com
4. Freedom and Responsibility
• Culture deck: a great read
• Good performers: 2x, Top performers: 10x
• What engineers dislike
• cumbersome processes
• deployment inefficiency
• restricted access
• restricted technical freedom
• lack of trust
• If removed… maximize:
• Engineering velocity
• Engineer satisfaction
6. How
• Implementation freedom
• SCM, libraries, language
• that said, platform benefits exist
• Deployment freedom
• Service team owns
• push schedule, functionality, performance
• operational activities (being paged)
• On-demand cloud capacity
• Thousands of instances at the push of a button
14. Fear (Revere) the Monkeys
• Simulate
• Latency
• Errors
• Initiate
• Instance Termination
• Availability Zone Failure
• Identify
• Configuration Drift
… in Test and Production
17. Automated Canary Analysis
• Identify regression between new and existing code
• Point ACA to baseline (prod) and canary ASG
• Typically analyze an hour's worth of time series data
• Compare ratio of averages between canary and baseline
• Evaluate range and noise; determine quality of signal
• Bucket: Hot, Cold, Noisy, or OK
• Multiple classifiers available
• Multiple metric collections (e.g. hand-picked by service, general)
• Rollup
• Constrained: along metric dimensions
• Final: Score the canary
• Implementation: R-based analysis
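The classification step on this slide (ratio of averages, noise evaluation, Hot/Cold/Noisy/OK buckets, final rollup score) can be sketched roughly as follows. The thresholds and the simple rollup are illustrative stand-ins, not Netflix's actual R-based classifiers:

```python
from statistics import mean, pstdev

def classify_metric(canary, baseline, hot=1.25, cold=0.8, noise=0.5):
    """Bucket one metric by comparing the ratio of averages between
    the canary and baseline series; thresholds are hypothetical."""
    ratio = mean(canary) / mean(baseline)
    # Coefficient of variation as a crude noise estimate.
    cv = pstdev(canary) / mean(canary)
    if cv > noise:
        return "NOISY"
    if ratio > hot:
        return "HOT"
    if ratio < cold:
        return "COLD"
    return "OK"

def score_canary(results):
    """Final rollup: fraction of metrics that look OK."""
    return sum(1 for r in results if r == "OK") / len(results)

baseline = [100, 102, 98, 101, 99]
latency_canary = [160, 158, 163, 161, 159]   # regressed metric
error_canary = [100, 101, 99, 100, 100]      # unchanged metric
labels = [classify_metric(latency_canary, baseline),
          classify_metric(error_canary, baseline)]   # HOT, OK
```

A real system would run this per metric dimension and gate the deployment on the final score, as the slide's "Rollup" step describes.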
21. Dynamic Scaling
EC2 footprint autoscales 2500-3500 instances per day
• order of tens of thousands of EC2 instances
• Larger ASG spans 200-900 m2.4xlarge daily
Why:
• Improved scalability during unexpected workloads
• Absorb variance in service performance profile
• Reactive chain of dependencies
• Creates "reserved instance troughs" for batch activity
22. Dynamic Scaling, cont.
Example covers 3 services
• 2 edge (A, B), 1 mid-tier (C)
• C has more upstream services than simply A and B
Multiple autoscaling policies
• (A) System load average
• (B, C) Request-rate based
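A request-rate-based policy like the one used for B and C boils down to sizing the group from measured throughput. This sketch assumes a target requests-per-second per instance and min/max group bounds; it is not the actual AWS policy configuration:

```python
import math

def desired_instances(request_rate, target_rps_per_instance,
                      min_size, max_size):
    """Request-rate-based policy: size the autoscaling group so each
    instance handles roughly `target_rps_per_instance`, clamped to
    the group's configured bounds. Numbers below are illustrative."""
    needed = math.ceil(request_rate / target_rps_per_instance)
    return max(min_size, min(max_size, needed))

# Traffic triples over the day; the group scales within its bounds.
morning = desired_instances(150_000, 500, 200, 900)   # 300 instances
peak    = desired_instances(420_000, 500, 200, 900)   # 840 instances
trough  = desired_instances(60_000, 500, 200, 900)    # floor of 200
```

Scaling down to the floor overnight is what creates the "reserved instance troughs" the previous slide mentions: the idle reserved capacity becomes available for batch activity.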
24. Dynamic Scaling, cont.
• Response time variability greatest during scaling events
• Average response time primarily between 75-150 msec
25. Dynamic Scaling, cont.
• Instance counts 3x, Aggregate requests 4.5x (not shown)
• Average CPU utilization per instance: ~25-55%
26. Cassandra Performance
Study performed:
• 24-node C* SSD-based cluster (hi1.4xlarge)
• mid-tier service load application
• Targeting 2x production rates
• Increase read ops from 30k to 70k in ~3 minutes
• Increase write ops from 750 to 1500 in ~3 minutes
Results:
• 95th pctl response time increase: ~17 msec to 45 msec
• 99th pctl response time increase: ~35 msec to 80 msec
27. EVcache (memcached) Scalability
• Response times consistent during 4x increase in load *
* Due to upstream code change
28. Cloud-scale Load Testing
• Ad-hoc or CI-based load test model
• (CI) Run-over-run comparison; email on rule violation
1. Jenkins initiates job
2. JMeter instances apply load
3. Results written to S3
4. Instance metrics published to Atlas
5. Raw data fetched and processed
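The run-over-run comparison with "email on rule violation" could look something like the check below; the metric names and allowed deltas are hypothetical:

```python
def check_run(current, previous, rules):
    """Run-over-run comparison: flag any metric whose change versus
    the previous load-test run exceeds its allowed increase.
    Metric names and thresholds here are illustrative."""
    violations = []
    for metric, max_increase_pct in rules.items():
        prev, cur = previous[metric], current[metric]
        change_pct = (cur - prev) / prev * 100
        if change_pct > max_increase_pct:
            violations.append((metric, round(change_pct, 1)))
    return violations

previous = {"p99_latency_ms": 80, "error_rate_pct": 0.1}
current  = {"p99_latency_ms": 96, "error_rate_pct": 0.1}
rules    = {"p99_latency_ms": 10, "error_rate_pct": 50}  # allowed % increase
violations = check_run(current, previous, rules)
# A CI job would email the team whenever `violations` is non-empty.
```

In the pipeline on this slide, `previous` would come from the prior run's results in S3 and `current` from the metrics published to Atlas.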
29. Conclusions
• Continually accelerate engineering velocity
• Evolve architecture and processes to mitigate risks
• Stateless micro-service architectures win!
• Remove barriers for engineers
• Last option should be to reduce rate of change
• Exercise failure and “thundering herd” scenarios
• Cloud native scaling and resiliency are key factors
• Leverage pre-existing OSS PaaS when possible
30. Netflix Open Source
Our open source software simplifies management at scale
Great projects, stunning colleagues: jobs.netflix.com