It’s impossible to understand the health of your system without real-time logging. Our logging platform is one of our favorite and most popular features, and currently handles millions of requests and gigabytes per second of traffic. It’s stateless, real-time, and can provide insight and ship data to a multitude of destinations. Fastly has consistently iterated to keep up with the growth of the platform, and we’ve learned many lessons along the way. In this talk, you’ll get a peek into the system, and how we’ve developed it.
Presented at Stream Processing Meetup (7/19/2018)(https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/251481797/).
At Uber, we operate 20+ Kafka clusters to collect system and application logs as well as event data from rider and driver apps. We need a Kafka replication solution to replicate data between Kafka clusters across multiple data centers for different purposes. This talk will introduce the history behind uReplicator and the high level architecture. As the original uReplicator ran into scalability challenges and operational overhead as the scale of Kafka clusters increased, we built the Federated uReplicator which addressed above issues and provide an extensible architecture for further scaling.
Building data product requires having lambda architecture to bridge the batch and streaming processing. AirStream is a framework built on top of HBase to allow users to easily build data products at Airbnb. It proved HBase is impactful and useful in the production for mission critical data products.
In the talk, we will present the applications to leverage HBase to compute moving average, distinct count, window based join and etc. in the streaming computation.
Also, we will talk about how to leverage HBase to bridge the gap between batch and streaming queries, including building presto-hbase connector to serve near real time ad-hoc query.
by Liyin Tang of AirBnB
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per DayAnkur Bansal
Building data pipelines is pretty hard! Building a multi-datacenter active-active real time data pipeline for multiple classes of data with different durability, latency and availability guarantees is much harder.
Real time infrastructure powers critical pieces of Uber (think Surge) and in this talk we will discuss our architecture, technical challenges, learnings and how a blend of open source infrastructure (Apache Kafka and Samza) and in-house technologies have helped Uber scale.
Flink Forward San Francisco 2018 keynote: Srikanth Satya - "Stream Processin...Flink Forward
Stream Processing in conjunction with a Consistent, Durable, Reliable stream storage is kicking the revolution up a notch in Big Data processing. This modern paradigm is enabling a new generation of data middleware that delivers on the streaming promise of a simplified and unified programming model. From data ingest, transformation, and messaging to search, time series and more, a robust streaming data ecosystem means we’ll all be able to more quickly build applications that solve problems we could not solve before.
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...Flink Forward
Apache Mesos allows operators to run distributed applications across an entire datacenter and is attracting ever increasing interest. As much as distributed applications see increased use enabled by Mesos, Mesos also sees increasing use due to a growing ecosystem of well integrated applications. One of the latest additions to the Mesos family is Apache Flink. Flink is one of the most popular open source systems for real-time high scale data processing and allows users to deal with low-latency streaming analytical workloads on Mesos. In this talk we explain the challenges solved while integrating Flink with Mesos, including how Flink’s distributed architecture can be modeled as a Mesos framework, and how Flink was integrated with Fenzo. Next, we describe how Flink was packaged to easily run on DC/OS.
Apache Kafka, Apache Cassandra and Kubernetes are open source big data technologies enabling applications and business operations to scale massively and rapidly. While Kafka and Cassandra underpins the data layer of the stack providing capability to stream, disseminate, store and retrieve data at very low latency, Kubernetes is a container orchestration technology that helps in automated application deployment and scaling of application clusters. In this presentation, we will reveal how we architected a massive scale deployment of a streaming data pipeline with Kafka and Cassandra to cater to an example Anomaly detection application running on a Kubernetes cluster and generating and processing massive amount of events. Anomaly detection is a method used to detect unusual events in an event stream. It is widely used in a range of applications such as financial fraud detection, security, threat detection, website user analytics, sensors, IoT, system health monitoring, etc. When such applications operate at massive scale generating millions or billions of events, they impose significant computational, performance and scalability challenges to anomaly detection algorithms and data layer technologies. We will demonstrate the scalability, performance and cost effectiveness of Apache Kafka, Cassandra and Kubernetes, with results from our experiments allowing the Anomaly detection application to scale to 19 Billion anomaly checks per day.
Presented at Stream Processing Meetup (7/19/2018)(https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/251481797/).
At Uber, we operate 20+ Kafka clusters to collect system and application logs as well as event data from rider and driver apps. We need a Kafka replication solution to replicate data between Kafka clusters across multiple data centers for different purposes. This talk will introduce the history behind uReplicator and the high level architecture. As the original uReplicator ran into scalability challenges and operational overhead as the scale of Kafka clusters increased, we built the Federated uReplicator which addressed above issues and provide an extensible architecture for further scaling.
Building data product requires having lambda architecture to bridge the batch and streaming processing. AirStream is a framework built on top of HBase to allow users to easily build data products at Airbnb. It proved HBase is impactful and useful in the production for mission critical data products.
In the talk, we will present the applications to leverage HBase to compute moving average, distinct count, window based join and etc. in the streaming computation.
Also, we will talk about how to leverage HBase to bridge the gap between batch and streaming queries, including building presto-hbase connector to serve near real time ad-hoc query.
by Liyin Tang of AirBnB
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per DayAnkur Bansal
Building data pipelines is pretty hard! Building a multi-datacenter active-active real time data pipeline for multiple classes of data with different durability, latency and availability guarantees is much harder.
Real time infrastructure powers critical pieces of Uber (think Surge) and in this talk we will discuss our architecture, technical challenges, learnings and how a blend of open source infrastructure (Apache Kafka and Samza) and in-house technologies have helped Uber scale.
Flink Forward San Francisco 2018 keynote: Srikanth Satya - "Stream Processin...Flink Forward
Stream Processing in conjunction with a Consistent, Durable, Reliable stream storage is kicking the revolution up a notch in Big Data processing. This modern paradigm is enabling a new generation of data middleware that delivers on the streaming promise of a simplified and unified programming model. From data ingest, transformation, and messaging to search, time series and more, a robust streaming data ecosystem means we’ll all be able to more quickly build applications that solve problems we could not solve before.
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...Flink Forward
Apache Mesos allows operators to run distributed applications across an entire datacenter and is attracting ever increasing interest. As much as distributed applications see increased use enabled by Mesos, Mesos also sees increasing use due to a growing ecosystem of well integrated applications. One of the latest additions to the Mesos family is Apache Flink. Flink is one of the most popular open source systems for real-time high scale data processing and allows users to deal with low-latency streaming analytical workloads on Mesos. In this talk we explain the challenges solved while integrating Flink with Mesos, including how Flink’s distributed architecture can be modeled as a Mesos framework, and how Flink was integrated with Fenzo. Next, we describe how Flink was packaged to easily run on DC/OS.
Apache Kafka, Apache Cassandra and Kubernetes are open source big data technologies enabling applications and business operations to scale massively and rapidly. While Kafka and Cassandra underpins the data layer of the stack providing capability to stream, disseminate, store and retrieve data at very low latency, Kubernetes is a container orchestration technology that helps in automated application deployment and scaling of application clusters. In this presentation, we will reveal how we architected a massive scale deployment of a streaming data pipeline with Kafka and Cassandra to cater to an example Anomaly detection application running on a Kubernetes cluster and generating and processing massive amount of events. Anomaly detection is a method used to detect unusual events in an event stream. It is widely used in a range of applications such as financial fraud detection, security, threat detection, website user analytics, sensors, IoT, system health monitoring, etc. When such applications operate at massive scale generating millions or billions of events, they impose significant computational, performance and scalability challenges to anomaly detection algorithms and data layer technologies. We will demonstrate the scalability, performance and cost effectiveness of Apache Kafka, Cassandra and Kubernetes, with results from our experiments allowing the Anomaly detection application to scale to 19 Billion anomaly checks per day.
Flink Forward Berlin 2017: Steffen Hausmann - Build a Real-time Stream Proces...Flink Forward
The increasing number of available data sources in today's application stacks created a demand to continuously capture and process data from various sources to quickly turn high volume streams of raw data into actionable insights. Apache Flink addresses many of the challenges faced in this domain as it's specifically tailored to distributed computations over streams. While Flink provides all the necessary capabilities to process streaming data, provisioning and maintaining a Flink cluster still requires considerable effort and expertise. We will discuss how cloud services can remove most of the burden of running the clusters underlying your Flink jobs and explain how to build a real-time processing pipeline on top of AWS by integrating Flink with Amazon Kinesis and Amazon EMR. We will furthermore illustrate how to leverage the reliable, scalable, and elastic nature of the AWS cloud to effectively create and operate your real-time processing pipeline with little operational overhead.
How to Improve the Observability of Apache Cassandra and Kafka applications...Paul Brebner
As distributed cloud applications grow more complex, dynamic, and massively scalable, “observability” becomes more critical.
Observability is the practice of using metrics, monitoring and distributed tracing to understand how a system works.
We’ll explore two complementary Open Source technologies:
Prometheus for monitoring application metrics, and
OpenTracing and Jaeger for distributed tracing.
We’ll discover how they improve the observability of
an Anomaly Detection application, deployed on AWS Kubernetes, and using Instaclustr managed Apache Cassandra and Kafka clusters.
Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce...Flink Forward
Flink is a great stream processor, Python is a great programming language, Apache Beam is a great programming model and portability layer. Using all three together is a great idea! We will demo and discuss writing Beam Python pipelines and running them on Flink. We will cover Beam's portability vision that led here, what you need to know about how Beam Python pipelines are executed on Flink, and where Beam's portability framework is headed next (hint: Python pipelines reading from non-Python connectors)
Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time m...Flink Forward
“Customer experience is the next big battle ground for telcos,” proclaimed recently Amit Akhelikar, Global Director of Lynx Analytics at TM Forum Live! Asia in Singapore. But, how to fight in this battle? A common approach has been to keep “under control” some well-known network quality indicators, like dropped calls, radio access congestion, availability, and so on; but this has proven not to be enough to keep customers happy, like a siege weapon is not enough to conquer a city. But, what if it were possible to know how customers perceive services, at least most demanded ones, like web browsing or video streaming? That would be like a squad of archers ready to battle. And even having that, how to extract value of it and take actions in no time, giving our skilled archers the right targets? Meet CANVAS (Customer And Network Visualization and AnaltyticS), one of the first LATAM implementations of a Flink-based stream processing use case for a telco, which successfully combines leading and innovative technologies like Apache Hadoop, YARN, Kafka, Nifi, Druid and advanced visualizations with Flink core features like non-trivial stateful stream processing (joins, windows and aggregations on event time) and CEP capabilities for alarm generation, delivering a next-generation tool for SOC (Service Operation Center) teams.
Kafka for Real-Time Event Processing in Serverless Environmentsconfluent
(Jeff Sharpe + Alex Srisuwan, Capital One) Kafka Summit SF 2018
Using Kafka as a platform messaging bus is common, but bridging communication between real-time and asynchronous components can become complicated, especially when dealing with serverless environments. This has become increasingly common in modern banking where events need to be processed at near-real-time speed. Serverless environments are well-suited to address these needs, and Kafka remains an excellent solution for providing the reliable, resilient communication layer between serverless components and dedicated stream processing services.
In this talk, we will examine some of the strengths and weaknesses of using Kafka for real-time communication, some tips for efficient interactions with Kafka and AWS Lambda, and a number of useful patterns for maximizing the strengths of Kafka and serverless components.
Building data pipelines is pretty hard! Building a multi-datacenter active-active real time data pipeline for multiple classes of data with different durability, latency and availability guarantees is much harder. Real time infrastructure powers critical pieces of Uber (think Surge) and in this talk we will discuss our architecture, technical challenges, learnings and how a blend of open source infrastructure (Apache Kafka and Flink) and in-house technologies have helped Uber scale.
Improving Logging Ingestion Quality At Pinterest: Fighting Data Corruption An...HostedbyConfluent
Logging ingestion infrastructure at Pinterest is built around Apache Kafka to support thousands of pipelines with over 1 trillion (1PB) new messages generated by hundreds of services (written in 5 different languages) and transported to data lake (AWS S3) every day. In the past, we have focused on scalability and auto operation of the infrastructure to help internal teams quickly onboard new pipelines (Kafka Summit 2018, 2020). However, we had constantly observed data loss and data corruption due to the design decisions we made to favor scalability and availability over durability and consistency.
To tackle these problems, we designed and implemented logging auditing framework which consists of (1) audit client library integrated into every component of the infrastructure to detect data corruption for every message and send out audit events for randomly picked messages, (2) Kafka clusters receiving audit events, and (3) realtime and batch application processing audit events to generate insights for alerting and reporting.
Focusing on zero negative impact to existing ingestion pipelines, scalability and cost efficiency led us to make various design decisions to eventually achieve auditing rollout to every pipeline with zero downtime and fundamentally improve the data ingestion quality at Pinterest in general by tracking data loss and removing data corruption which in the past can block downstream applications for hours and often lead to severe incidents.
The need for gleaning answers from unbounded data streams is moving from nicety to a necessity. Netflix is a data driven company, and has a need to process over 1 trillion events a day amounting to 3 PB of data to derive business insights.
To ease extracting insight, we are building a self-serve, scalable, fault-tolerant, multi-tenant "Stream Processing as a Service" platform so the user can focus on data analysis. I'll share our experience using Flink to help build the platform.
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...Flink Forward
The streaming platform team at Lyft has been running Flink jobs in production for more than a year now, powering critical use cases like improving pickup ETA accuracy, dynamic pricing, generating machine learning features for fraud detection, real-time analytics among many others. Broadly, the jobs fall into two abstraction layers: applications (Flink jobs that run on the native platform) and analytics (that leverage Dryft, Lyft’s fully managed data processing engine). This talk will give an overview of the platform architecture, deployment model and user experience. The talk will also dive deeper into some of the challenges and the lessons that were learnt, running Flink jobs at scale, specifically around scaling Flink connectors, dealing with event time skew (source synchronization) and highlight common patterns of problems observed across several Flink jobs. Finally, the talk will give insights into how we are re-architecting the streaming platform @ Lyft using a Kubernetes based deployment.
Jingwei Lu and Jason Zhang (Airbnb)
AirStream is a realtime stream computation framework built on top of Spark Streaming and HBase that allows our engineers and data scientists to easily leverage HBase to get real-time insights and build real-time feedback loops. In this talk, we will introduce AirStream, and then go over a few production use cases.
(Krunal Vora, Tinder) Kafka Summit San Francisco 2018
At Tinder, we have been using Kafka for streaming and processing events, data science processes and many other integral jobs. Forming the core of the pipeline at Tinder, Kafka has been accepted as the pragmatic solution to match the ever increasing scale of users, events and backend jobs. We, at Tinder, are investing time and effort to optimize the usage of Kafka solving the problems we face in the dating apps context. Kafka forms the backbone for the plans of the company to sustain performance through envisioned scale as the company starts to grow in unexplored markets. Come, learn about the implementation of Kafka at Tinder and how Kafka has helped solve the use cases for dating apps. Engage in the success story behind the business case of Kafka at Tinder.
Should you read Kafka as a stream or in batch? Should you even care? | Ido Na...HostedbyConfluent
Should you consume Kafka in a stream OR batch? When should you choose each one? What is more efficient, and cost effective?
In this talk we’ll give you the tools and metrics to decide which solution you should apply when, and show you a real life example with cost & time comparisons.
To highlight the differences, we’ll dive into a project we’ve done, transitioning from reading Kafka in a stream to reading it in batch.
By turning conventional thinking on its head and reading our multi-petabyte Kafka stream in batch using Spark and Airflow, we’ve achieved a huge cost reduction of 65% while at the same time getting a more scalable and resilient solution.
We’ll explore the tradeoffs and give you the metrics and intuition you’ll need to make such decisions yourself.
We’ll cover:
Costs of processing in stream compared to batch
Scaling up for bursts and reprocessing
Making the tradeoff between wait times and costs
Recovering from outages
And much more…
NetflixOSS Meetup S6E1 - Titus & Containersaspyker
Come hear about our container management platform, Titus. Titus launches over 2 millions containers per week for service and batch workloads. Come to learn what applications are powered by Titus and what values the developers are getting from containers. Also, we will cover some of the Titus unique aspects of reliability, control plane, scheduling, and container runtime technologies. We will also cover our integrations with Netflix systems such as Spinnaker as well as Amazon concepts such as VPC and IAM.
https://www.meetup.com/Netflix-Open-Source-Platform/events/247776324/
On Tuesday, June 12th at 1pm EDT, ChronoLogic Developer Anthony Adegbemi and Community Manager Sean Morgan will host a Livestream to unveil new ChronoLogic Tools.
You can view the recording at https://youtu.be/uXcy-xIngMw
The LiveStream included:
The Latest Development Updates
The Electron Dapp Demo and Discussion
The Token Distribution Allocator Demo and Discussion
Community Questions
Sean Morgan addressed the communities most pressing questions and interviewed Anthony about the implications of ChronoLogic's most recent developments.
If you want to find out the latest ChronoLogic information, you will not want to miss this LiveStream.
Administrative techniques to reduce Kafka costs | Anna Kepler, ViasatHostedbyConfluent
When your Kafka clusters start growing so is the cost associated with them. As administrators we have to ensure that the service we support is operating in the most reliable way to satisfy the customers. However, for our business it is as important that we ensure the same service is also cost-efficient. There are two ways we can optimize the cost of service – tuning broker machines and tuning the data transfers. Minimizing data transfer is the largest return on investment since that is what accounts for the most spend. With the use of Kafka administrative tools and metrics we can find multiple ways to reduce the data transfers in the clusters.
The presentation will cover various techniques administrators of Kafka service can employ to reduce the data transfers and to save the operational costs. Reducing cross-AZ traffic, optimizing batching with use of DumpLogSegment script, utilizing Kafka metrics to shut down unused data streams and more.
With an objective of making our Kafka deployment as cost effective as possible, we have gained money saving tricks. And we would love to share them with the community.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2l2Rr6L.
Doug Daniels discusses the cloud-based platform they have built at DataDog and how it differs from a traditional datacenter-based analytics stack. He walks through the decisions they have made at each layer, covers the pros and cons of these decisions and discusses the tooling they have built. Filmed at qconsf.com.
Doug Daniels is a Director of Engineering at Datadog, where he works on high-scale data systems for monitoring, data science, and analytics. Prior to joining Datadog, he was CTO at Mortar Data and an architect and developer at Wireless Generation, where he designed data systems to serve more than 4 million students in 49 states.
Serverless is great for web applications and APIs, but this does not mean it cannot be used successfully for other use cases. In this talk, we will discuss a successful application of serverless in the field of High Performance Computing. Specifically we will discuss how Lambda, Fargate, Kinesis and other serverless technologies are being used to run sophisticated financial models at one of the major reinsurance companies in the World. We we learn about the architecture, the tradeoffs, some challenges and some unresolved pain points. Most importantly, we'll find out if serverless can be a great fit for HPC and if we can finally stop managing those boring EC2 instances!
Flink Forward Berlin 2017: Steffen Hausmann - Build a Real-time Stream Proces...Flink Forward
The increasing number of available data sources in today's application stacks created a demand to continuously capture and process data from various sources to quickly turn high volume streams of raw data into actionable insights. Apache Flink addresses many of the challenges faced in this domain as it's specifically tailored to distributed computations over streams. While Flink provides all the necessary capabilities to process streaming data, provisioning and maintaining a Flink cluster still requires considerable effort and expertise. We will discuss how cloud services can remove most of the burden of running the clusters underlying your Flink jobs and explain how to build a real-time processing pipeline on top of AWS by integrating Flink with Amazon Kinesis and Amazon EMR. We will furthermore illustrate how to leverage the reliable, scalable, and elastic nature of the AWS cloud to effectively create and operate your real-time processing pipeline with little operational overhead.
How to Improve the Observability of Apache Cassandra and Kafka applications...Paul Brebner
As distributed cloud applications grow more complex, dynamic, and massively scalable, “observability” becomes more critical.
Observability is the practice of using metrics, monitoring and distributed tracing to understand how a system works.
We’ll explore two complementary Open Source technologies:
Prometheus for monitoring application metrics, and
OpenTracing and Jaeger for distributed tracing.
We’ll discover how they improve the observability of
an Anomaly Detection application, deployed on AWS Kubernetes, and using Instaclustr managed Apache Cassandra and Kafka clusters.
Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce...Flink Forward
Flink is a great stream processor, Python is a great programming language, Apache Beam is a great programming model and portability layer. Using all three together is a great idea! We will demo and discuss writing Beam Python pipelines and running them on Flink. We will cover Beam's portability vision that led here, what you need to know about how Beam Python pipelines are executed on Flink, and where Beam's portability framework is headed next (hint: Python pipelines reading from non-Python connectors)
Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time m...Flink Forward
“Customer experience is the next big battle ground for telcos,” proclaimed recently Amit Akhelikar, Global Director of Lynx Analytics at TM Forum Live! Asia in Singapore. But, how to fight in this battle? A common approach has been to keep “under control” some well-known network quality indicators, like dropped calls, radio access congestion, availability, and so on; but this has proven not to be enough to keep customers happy, like a siege weapon is not enough to conquer a city. But, what if it were possible to know how customers perceive services, at least most demanded ones, like web browsing or video streaming? That would be like a squad of archers ready to battle. And even having that, how to extract value of it and take actions in no time, giving our skilled archers the right targets? Meet CANVAS (Customer And Network Visualization and AnaltyticS), one of the first LATAM implementations of a Flink-based stream processing use case for a telco, which successfully combines leading and innovative technologies like Apache Hadoop, YARN, Kafka, Nifi, Druid and advanced visualizations with Flink core features like non-trivial stateful stream processing (joins, windows and aggregations on event time) and CEP capabilities for alarm generation, delivering a next-generation tool for SOC (Service Operation Center) teams.
Kafka for Real-Time Event Processing in Serverless Environmentsconfluent
(Jeff Sharpe + Alex Srisuwan, Capital One) Kafka Summit SF 2018
Using Kafka as a platform messaging bus is common, but bridging communication between real-time and asynchronous components can become complicated, especially when dealing with serverless environments. This has become increasingly common in modern banking where events need to be processed at near-real-time speed. Serverless environments are well-suited to address these needs, and Kafka remains an excellent solution for providing the reliable, resilient communication layer between serverless components and dedicated stream processing services.
In this talk, we will examine some of the strengths and weaknesses of using Kafka for real-time communication, some tips for efficient interactions with Kafka and AWS Lambda, and a number of useful patterns for maximizing the strengths of Kafka and serverless components.
Building data pipelines is pretty hard! Building a multi-datacenter active-active real time data pipeline for multiple classes of data with different durability, latency and availability guarantees is much harder. Real time infrastructure powers critical pieces of Uber (think Surge) and in this talk we will discuss our architecture, technical challenges, learnings and how a blend of open source infrastructure (Apache Kafka and Flink) and in-house technologies have helped Uber scale.
Improving Logging Ingestion Quality At Pinterest: Fighting Data Corruption An...HostedbyConfluent
Logging ingestion infrastructure at Pinterest is built around Apache Kafka to support thousands of pipelines with over 1 trillion (1PB) new messages generated by hundreds of services (written in 5 different languages) and transported to data lake (AWS S3) every day. In the past, we have focused on scalability and auto operation of the infrastructure to help internal teams quickly onboard new pipelines (Kafka Summit 2018, 2020). However, we had constantly observed data loss and data corruption due to the design decisions we made to favor scalability and availability over durability and consistency.
To tackle these problems, we designed and implemented logging auditing framework which consists of (1) audit client library integrated into every component of the infrastructure to detect data corruption for every message and send out audit events for randomly picked messages, (2) Kafka clusters receiving audit events, and (3) realtime and batch application processing audit events to generate insights for alerting and reporting.
Focusing on zero negative impact to existing ingestion pipelines, scalability and cost efficiency led us to make various design decisions to eventually achieve auditing rollout to every pipeline with zero downtime and fundamentally improve the data ingestion quality at Pinterest in general by tracking data loss and removing data corruption which in the past can block downstream applications for hours and often lead to severe incidents.
The need for gleaning answers from unbounded data streams is moving from nicety to a necessity. Netflix is a data driven company, and has a need to process over 1 trillion events a day amounting to 3 PB of data to derive business insights.
To ease extracting insight, we are building a self-serve, scalable, fault-tolerant, multi-tenant "Stream Processing as a Service" platform so the user can focus on data analysis. I'll share our experience using Flink to help build the platform.
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...Flink Forward
The streaming platform team at Lyft has been running Flink jobs in production for more than a year now, powering critical use cases like improving pickup ETA accuracy, dynamic pricing, generating machine learning features for fraud detection, real-time analytics among many others. Broadly, the jobs fall into two abstraction layers: applications (Flink jobs that run on the native platform) and analytics (that leverage Dryft, Lyft’s fully managed data processing engine). This talk will give an overview of the platform architecture, deployment model and user experience. The talk will also dive deeper into some of the challenges and the lessons that were learnt, running Flink jobs at scale, specifically around scaling Flink connectors, dealing with event time skew (source synchronization) and highlight common patterns of problems observed across several Flink jobs. Finally, the talk will give insights into how we are re-architecting the streaming platform @ Lyft using a Kubernetes based deployment.
Jingwei Lu and Jason Zhang (Airbnb)
AirStream is a realtime stream computation framework built on top of Spark Streaming and HBase that allows our engineers and data scientists to easily leverage HBase to get real-time insights and build real-time feedback loops. In this talk, we will introduce AirStream, and then go over a few production use cases.
(Krunal Vora, Tinder) Kafka Summit San Francisco 2018
At Tinder, we have been using Kafka for streaming and processing events, data science processes and many other integral jobs. Forming the core of the pipeline at Tinder, Kafka has been accepted as the pragmatic solution to match the ever increasing scale of users, events and backend jobs. We, at Tinder, are investing time and effort to optimize the usage of Kafka solving the problems we face in the dating apps context. Kafka forms the backbone for the plans of the company to sustain performance through envisioned scale as the company starts to grow in unexplored markets. Come, learn about the implementation of Kafka at Tinder and how Kafka has helped solve the use cases for dating apps. Engage in the success story behind the business case of Kafka at Tinder.
Should you read Kafka as a stream or in batch? Should you even care? | Ido Na...HostedbyConfluent
Should you consume Kafka in a stream OR batch? When should you choose each one? What is more efficient, and cost effective?
In this talk we’ll give you the tools and metrics to decide which solution you should apply when, and show you a real life example with cost & time comparisons.
To highlight the differences, we’ll dive into a project we’ve done, transitioning from reading Kafka in a stream to reading it in batch.
By turning conventional thinking on its head and reading our multi-petabyte Kafka stream in batch using Spark and Airflow, we’ve achieved a huge cost reduction of 65% while at the same time getting a more scalable and resilient solution.
We’ll explore the tradeoffs and give you the metrics and intuition you’ll need to make such decisions yourself.
We’ll cover:
Costs of processing in stream compared to batch
Scaling up for bursts and reprocessing
Making the tradeoff between wait times and costs
Recovering from outages
And much more…
NetflixOSS Meetup S6E1 - Titus & Containersaspyker
Come hear about our container management platform, Titus. Titus launches over 2 millions containers per week for service and batch workloads. Come to learn what applications are powered by Titus and what values the developers are getting from containers. Also, we will cover some of the Titus unique aspects of reliability, control plane, scheduling, and container runtime technologies. We will also cover our integrations with Netflix systems such as Spinnaker as well as Amazon concepts such as VPC and IAM.
https://www.meetup.com/Netflix-Open-Source-Platform/events/247776324/
On Tuesday, June 12th at 1pm EDT, ChronoLogic Developer Anthony Adegbemi and Community Manager Sean Morgan will host a Livestream to unveil new ChronoLogic Tools.
You can view the recording at https://youtu.be/uXcy-xIngMw
The LiveStream included:
The Latest Development Updates
The Electron Dapp Demo and Discussion
The Token Distribution Allocator Demo and Discussion
Community Questions
Sean Morgan addressed the communities most pressing questions and interviewed Anthony about the implications of ChronoLogic's most recent developments.
If you want to find out the latest ChronoLogic information, you will not want to miss this LiveStream.
Administrative techniques to reduce Kafka costs | Anna Kepler, ViasatHostedbyConfluent
When your Kafka clusters start growing so is the cost associated with them. As administrators we have to ensure that the service we support is operating in the most reliable way to satisfy the customers. However, for our business it is as important that we ensure the same service is also cost-efficient. There are two ways we can optimize the cost of service – tuning broker machines and tuning the data transfers. Minimizing data transfer is the largest return on investment since that is what accounts for the most spend. With the use of Kafka administrative tools and metrics we can find multiple ways to reduce the data transfers in the clusters.
The presentation will cover various techniques administrators of Kafka service can employ to reduce the data transfers and to save the operational costs. Reducing cross-AZ traffic, optimizing batching with use of DumpLogSegment script, utilizing Kafka metrics to shut down unused data streams and more.
With an objective of making our Kafka deployment as cost effective as possible, we have gained money saving tricks. And we would love to share them with the community.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2l2Rr6L.
Doug Daniels discusses the cloud-based platform they have built at DataDog and how it differs from a traditional datacenter-based analytics stack. He walks through the decisions they have made at each layer, covers the pros and cons of these decisions and discusses the tooling they have built. Filmed at qconsf.com.
Doug Daniels is a Director of Engineering at Datadog, where he works on high-scale data systems for monitoring, data science, and analytics. Prior to joining Datadog, he was CTO at Mortar Data and an architect and developer at Wireless Generation, where he designed data systems to serve more than 4 million students in 49 states.
Serverless is great for web applications and APIs, but this does not mean it cannot be used successfully for other use cases. In this talk, we will discuss a successful application of serverless in the field of High Performance Computing. Specifically we will discuss how Lambda, Fargate, Kinesis and other serverless technologies are being used to run sophisticated financial models at one of the major reinsurance companies in the World. We we learn about the architecture, the tradeoffs, some challenges and some unresolved pain points. Most importantly, we'll find out if serverless can be a great fit for HPC and if we can finally stop managing those boring EC2 instances!
This talk focuses on how we used Amazon Kinesis to build the pub-sub infra at Lyft, that ingests more than a 100 billion events per day. We'll review the strengths and weaknesses of Kinesis as a choice for streaming events in realtime, at Lyft's scale; as well as the best practices and lessons learnt over time.
Speaker: Hafiz Hamid (Lyft)
Hafiz Hamid is a software engineer on the Pub-Sub/Streaming Platform team at Lyft. He has built some of the key pieces in the messaging & streaming infrastructure at Lyft. Previously, Hafiz was a technical lead at Bing Search where he worked on data pipelines, relevance and web crawlers.
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...Landon Robinson
The Spark Listener interface provides a fast, simple and efficient route to monitoring and observing your Spark application - and you can start using it in minutes. In this talk, we'll introduce the Spark Listener interfaces available in core and streaming applications, and show a few ways in which they've changed our world for the better at SpotX. If you're looking for a "Eureka!" moment in monitoring or tracking of your Spark apps, look no further than Spark Listeners and this talk!
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringDatabricks
The Spark Listener interface provides a fast, simple and efficient route to monitoring and observing your Spark application - and you can start using it in minutes. In this talk, we'll introduce the Spark Listener interfaces available in core and streaming applications, and show a few ways in which they've changed our world for the better at SpotX. If you're looking for a "Eureka!" moment in monitoring or tracking of your Spark apps, look no further than Spark Listeners and this talk!
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteStreamNative
In this talk, Till Rohrmann and Addison Higham discuss how Flink allows for ambitious stream processing workflows and how Pulsar and Flink enable new capabilities that push forward the state-of-the-art in streaming. They will also share upcoming features and new capabilities in the integrations between Flink and Pulsar and how these two communities are working together to truly advance the power of stream processing.
Building an Observability Platform in 389 Difficult StepsDigitalOcean
Watch this Tech Talk: https://do.co/video_dworth
Dave Worth, Engineering Manager at Strava, lays out a strategy for choosing the right tech stack depending on your business and team need. Watch as he guides you through tool sets that navigate around business constraints and regulatory concerns.
About the Presenter
Dave Worth’s professional life consists of being a web and backend engineer who developed specialization in observability through building reliable distributed systems at Strava, and previously DigitalOcean. In his spare time, Dave loves cycling, jiu jitsu, and searching for another great math book to only read the first 50 pages of.
New to DigitalOcean? Get US $100 in credit when you sign up: https://do.co/deploytoday
To learn more about DigitalOcean: https://www.digitalocean.com/
Follow us on Twitter: https://twitter.com/digitalocean
Like us on Facebook: https://www.facebook.com/DigitalOcean
Follow us on Instagram: https://www.instagram.com/thedigitalocean/
We're hiring: http://do.co/careers
BDA403 How Netflix Monitors Applications in Real-time with Amazon KinesisAmazon Web Services
Thousands of services work in concert to deliver millions of hours of video streams to Netflix customers every day. These applications vary in size, function, and technology, but they all make use of the Netflix network to communicate. Understanding the interactions between these services is a daunting challenge both because of the sheer volume of traffic and the dynamic nature of deployments. In this talk, we’ll first discuss why Netflix chose Amazon Kinesis Streams over other streaming data solutions like Kafka to address these challenges at scale. We’ll then dive deep into how Netflix uses Amazon Kinesis Streams to enrich network traffic logs and identify usage patterns in real time. Lastly, we will cover how Netflix uses this system to build comprehensive dependency maps, increase network efficiency, and improve failure resiliency. From this talk, you’ll take away techniques and processes that you can apply to your large-scale networks and derive real-time, actionable insights.
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
ndependent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can me make sure that all these event are accepted and forwarded in an efficient and reliable way? This is where Apache Kafaka comes into play, a distirbuted, highly-scalable messaging broker, build for exchanging huge amount of messages between a source and a target.
This session will start with an introduction into Apache and presents the role of Apache Kafka in a modern data / information architecture and the advantages it brings to the table. Additionally the Kafka ecosystem will be covered as well as the integration of Kafka in the Oracle Stack, with products such as Golden Gate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
Azure tales: a real world CQRS and ES Deep Dive - Andrea SaltarelloITCamp
Both CQRS and Event Sourcing are by no means “new stuff” anymore, yet a lot can be told about how to use Azure’s PaaS to implement such patterns and unleash their power. The ingredients are: DocumentDB as the event storage, Service Bus as the events’ dispatcher, Could Services/Service Fabric as the scalable, fault tolerant business logic container, SQL Azure as the read model and ASP .NET Core as the application framework used to implement views and back-end services. Eager to know the recipe? Don’t miss this talk then.
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...Soroosh Khodami
Session Recording on Youtube
https://www.youtube.com/watch?v=uWPZQ_HMy10
- Session Description
Do you find yourself bombarded with buzzwords and overwhelmed by the rapid emergence of new technologies? "Stream Processing" is a tech buzzword that has been around for some time but is still unfamiliar to many. Join this session to discover its potential in software systems. I will share insights from Apache Flink, Apache Beam, Google Dataflow, and my experiences at Bol.com (the biggest e-commerce platform in the Netherlands) as we cover:
- Stream Processing overview: main concepts and features
- Apache Beam vs. Spring Boot comparison
- Key Considerations for Using Stream Processing
- Learning strategies to navigate this evolving landscape.
RFC 7540 was ratified over 2 years ago and, today, all major browsers, servers, and CDNs support the next generation of HTTP. Just over a year ago, at Velocity, we discussed the protocol, looked at some real world implications of its deployment and use, and what realistic expectations we should have from its use. Now that adoption is ramped up and the protocol is being regularly used on the Internet, it's a good time to revisit the protocol and its deployment. Has it evolved? Have we learned anything? Are all the features providing the benefits we were expecting? What's next?In this session, we'll review protocol basics and try to answer some of these questions based on real-world use of it. We'll dig into the core features like interaction with TCP, server push, priorities and dependencies, and HPACK. We'll look at these features through the lens of experience and see if good practice patterns have emerged. We'll also review available tools and discuss what protocol enhancements are in the near and not-so-near horizon.
Altitude San Francisco 2018: Preparing for Video Streaming Events at ScaleFastly
CBS Interactive streams some of the largest video streaming events on the planet, including SuperBowl in 2019. This talk will focus on all the work that goes in ahead of time to prepare and plan for game day. From architecture design to capacity reservations to operational visibility and building playbooks we will explore how we build, test and prepare for these large events. We will also explore how some of Fastly's unique features such as MediaShield and VCL are becoming critical to these workflows.
Altitude San Francisco 2018: Building the Souther Hemisphere of the InternetFastly
As a global organization, Fastly carefully selects and deploys POP locations to service the greater audience of the Internet. Fastly currently has 52 global POPs across the Internet, 13 of which are located in the Southern Hemisphere. Another 3 are outside North America, Europe, and Asia. During this talk, VP of Infrastructure Tom Daly will share our experience in building Fastly's network of POPs south of the equator, where, in some cases, the Internet we know here in San Francisco, is much different. Tom will explore the physical datacenter infrastructure, network topology, and network policy that pose of unique challenges when operating in these parts of the world.
Altitude San Francisco 2018: The World Cup StreamFastly
FuboTV’s recent offering of the 2018 FIFA World Cup broke all of our previous records for viewership and put our systems to the test as we delivered all 64 matches live. Coverage for a majority of games was spread out across ~150 regional sports networks, local FOX affiliates, owned and operated regional stations and other local FOX offerings, with a few early matches broadcasted on national channels. Running a successful World Cup required us to pay close attention to our caching strategies, delivery mechanisms, content edge-case handling and more. An event at this scale, spread out over a month, also gave us an excellent test bed to run experiments. We were able to augment our last-mile delivery, test/tweak our solution for CDN decisioning/priority, and even stand up a set of UHD HDR10 feeds to give our users their first glimpse of live OTT UHD offerings. We’ll run through this whole event from a scale and technology perspective and share our takeaways as we prepare for the upcoming NFL season and beyond.
Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...Fastly
Braze is a customer engagement platform that delivers more than a billion messaging experiences across push, email, apps and more each day. In this session, Jon Hyman will describe the company's challenges during an inflection point in 2015 when the company reached the limitation of their physical networking equipment, and how Braze has since grown more than 7x on Fastly. Jon will also discuss how Braze uses Fastly's Layer 7 load balancing to improve stability and uptime of its APIs.
Altitude San Francisco 2018: Moving Off the Monolith: A Seamless MigrationFastly
In this talk, Jeff Valeo from Grubhub will talk about how they leveraged Fastly to slowly migrate user traffic from a legacy monolith to a new, service-based architecture. This solution allowed Grubhub to shift millions of users as new functionality was built with zero downtime.
Altitude San Francisco 2018: Bringing TLS to GitHub PagesFastly
Sam Kottler, SRE Engineering Manager at GitHub will dig into how they rearchitected Pages, so that custom domains now support HTTPS, meaning over a million GitHub Pages sites will be served over HTTPS.
Altitude San Francisco 2018: HTTP Invalidation WorkshopFastly
One of the most powerful tools that Fastly offers is worldwide, instant purge. Come learn the ins and outs of how HTTP invalidation works in general and how purge and surrogate keys can be used to improve your site's delivery and get even more value from Fastly.
This talk will also cover the purge blast radius
Surrogate Keys are an amazing way to purge your content from cache, but they can be a bit scary when you aren't sure how many URLs this surrogate key is tied to or what kind of affect this will have on origin. Join the USA Today Network as we explain how we leverage big data tools, Go APIs, New Relic, and Sumo Logic to provide our users a suite of tools for purging content from Fastly. Developers love knowing the blast radius of their surrogate keys, while our engineers love the real-time metrics and notifications we get when developers are hard-purging content.
Altitude San Francisco 2018: How Magento moved to the cloud while maintaining...Fastly
Magento Commerce was first released by a small web development agency over ten years when they saw first-hand what a challenge it was for companies like them to build unique eCommerce sites. They created an open source platform that gives developers the flexibility to create meaningful shopping experiences while building a global community that drives down merchant costs and fosters innovation. Amid the rise of cloud-based software Magento needed to keep pace with more complex merchant needs and heightened shopper expectations. In this session learn how Magento, with the help of Partners like Fastly, evolved into a cloud-based platform without sacrificing their commitment to open software, flexibility, and the community.
Altitude San Francisco 2018: Scaling Ethereum to 10B requests per dayFastly
ConsenSys is a venture production studio building decentralized applications and developer and end-user tools for blockchains. Their Infura platform is a core infrastructure pillar of Ethereum, enabling decentralized applications of all kinds to scale to accommodate their users.
Infura went from 20 million requests a day at the beginning of 2017 to over 10 billion requests today. This staggering 500x increase naturally lead to questions of scale.
In this talk, co-founder Michael Wuehler will discuss the technical challenges encountered while building and scaling the Infura platform, and the infrastructure decisions that led to their adoption of Fastly and other pivotal technologies.
Altitude San Francisco 2018: Authentication at the EdgeFastly
Turning away unwanted traffic close to the source is a common and key use case for edge networks like Fastly, but identity, authentication, and authorization at the edge can go far beyond blocking DDoS. The unique way that you identify your site’s users can probably move to the edge too, allowing you to cut response times in your critical path, offload more origin traffic, and make smarter routing decisions at the edge.
In this talk we’ll cover a number of patterns in use by real Fastly customers. Whether you prefer token authentication, pre-shared keys, OAuth, HTTP auth, JSON web tokens, or a complex paywall, learn how you can potentially make your authentication decisions at the edge.
Altitude San Francisco 2018: Testing with Fastly WorkshopFastly
A crucial step for continuous integration and continuous delivery with Fastly is testing the service configuration to provide confidence in changes. This workshop will cover unit-testing VCL, component testing a service as a black box, systems testing a service end-to-end and stakeholder acceptance testing.
Altitude San Francisco 2018: Fastly Purge Control at the USA TODAY NETWORKFastly
One of the most powerful tools that Fastly offers is worldwide, instant purge. Come learn the ins and outs of how HTTP invalidation works in general and how purge and surrogate keys can be used to improve your site's delivery and get even more value from Fastly.
This talk will also cover the purge blast radius
Surrogate Keys are an amazing way to purge your content from cache, but they can be a bit scary when you aren't sure how many URLs this surrogate key is tied to or what kind of affect this will have on origin. Join the USA Today Network as we explain how we leverage big data tools, Go APIs, New Relic, and Sumo Logic to provide our users a suite of tools for purging content from Fastly. Developers love knowing the blast radius of their surrogate keys, while our engineers love the real-time metrics and notifications we get when developers are hard-purging content.
In this hands-on workshop you will attack a vulnerable web application while defending your own web service behind a Fastly WAF. Attendees will depart understanding how common web application attacks can be exploited as well defended against. They will experience WAF logging and analytics via sumologic to detect attacks realtime. For mitigation you will use a preview version of our newly built WAF rule management UI. We will close off the workshop by deep diving on how our security team analyzed and mitigated some of this summer major vulnerabilities.
Altitude San Francisco 2018: Logging at the Edge Fastly
Fastly delivers more than a million log events per second. Our Real-Time Log Streaming is easy to set up, but there are many features you might not be using to their full extent.
This workshop will cover setting up logging to various endpoints, dealing with structured data, and getting real-time insights into your customers’ behavior.
Altitude San Francisco 2018: Video Workshop DocsFastly
Fastly delivers more than a million log events per second. Our Real-Time Log Streaming is easy to set up, but there are many features you might not be using to their full extent.
This workshop will cover setting up logging to various endpoints, dealing with structured data, and getting real-time insights into your customers’ behavior.
- - - - - - - - - - -
Live streaming and on-demand video can provide a powerful way to connect with customers, but viewers expect seamless pixel-perfect streams without common video delivery inconveniences, such as downtime or lags. This workshop will demonstrate how anyone can deliver live video at scale. We’ll thoroughly explain key video delivery optimizations and more importantly, demonstrate their efficacy using the data collected from both Fastly Log Streaming/Sumo Logic and the Mux quality of experience service.
Altitude San Francisco 2018: Programming the EdgeFastly
Programming the edge
Second floor
Andrew Betts
Principal Developer Advocate, Fastly
Hide abstract
Through our support for running your own code on our edge servers, Fastly's network offers you a platform of unparalleled speed, reliability and efficiency to which you can delegate a surprising amount of logic that has traditionally been in the application layer. In this workshop, you'll implement a series of advanced edge solutions, and learn how to apply these patterns to your own applications to reduce your origin load, dramatically improve performance, and make your applications more secure.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
12. But Why?
This pipeline is one of the oldest systems at Fastly
Born out of our dissatisfaction w the status quo
We wanted something that would send you logs
extremely fast (stream them near realtime) to
anywhere you want (many endpoints)
22. Logging pipeline is Stateless
We don’t batch your logs
We don’t store your logs
We stream your logs in
near real-time to your
defined endpoints
We really don’t want your
logs on disk
27. Logging pipeline is Best Effort
We try our best to send logs to
your defined endpoint
Your endpoint must be up &
healthy in order for us to be
able to send data to it
We have minimal buffering
Pipeline optimized for log
streaming speed
28. Logging Endpoints
We don’t limit the number
of endpoints or log lines
per request
~8.6K active endpoints
Ecosystem of endpoints in
different stages of
evolution
Aggregators
Endpoints
s3
syslog
gcs
sumologic
bigquery
ftp
papertrail
…
36. We send a lot of data continuously to
our supported endpoints
Syslog continues to be our most
popular endpoint but S3 & GCS have
the highest volume
The 70's are still alive with a very
respectable 13 MBps to ftp and 74
kBps to sftp*
* for the non-millennials
Logging Endpoints
39. Volume Challenges
No hard limits to what you
can log, this can be
challenging
System is multi-tenant. Noisy
neighbors can affect delivery
Consider sampling for high
volume logging
40. Burden of many
endpoints
Classic integrations
challenges (each endpoint is
a downstream dependency)
Standard endpoint clients
often don’t meet our needs
Having our own clients
affords us extra optimizations
41. Endpoints & Health
Some endpoints have known
limitations (infamous
examples: S3, BigQuery, GCS)
Difficult to infer if an
endpoint is working or not
(Hard to test setup too)
Structured logging (JSON via
VCL) is challenging
42. Service Isolation
Prioritize delivery of content over
log retention
An aggregator discards the oldest
logs it has when it can’t deliver
them fast enough
In a cache node we are our own
customers so senders do the
same when they can’t reach
aggregators fast enough
43. Expectation Mismatch
Burden of a system that works so well is that it
makes you believe you have strong guarantees
Design constraints determine the SLA of the
pipeline
General advice: Understand the design choices of
the systems you use because they limit what is
possible to guarantee *
45. The team have been Busy bees
H2
H1
Platform performance
& addressing the
challenges of
individual endpoints
We are getting fancy!
46. Platform Performance
Reducing lock contention & CPU usage
Smarter memory allocation &
management
Overhauling all endpoints
Halving the time it takes for a log line to
be processed (from sender read to
aggregator line preparation)
51. Want more endpoints?
Want metrics?
Want easier structured logging?
Want VCL counters + secondly
aggregation + a higher SLA?
Dom Fee
Want More?
52. Want more endpoints?
Want metrics?
Want easier structured logging?
Want VCL counters + secondly
aggregation + a higher SLA?
Dom Fee
Want More?
53. tl;dr LOGGING
Fastly lets you extend the
visibility of your system to the
edge & gain meaningful insights
in near real-time
Is a pipeline with very specific
constraints & guarantees
Exciting things are coming!