This document discusses delivering APIs to production and the challenges of operating APIs in a production environment. It emphasizes designing APIs with production operations in mind from the start by following best practices for logging, monitoring, and error handling. The document demonstrates how Splunk can help with production monitoring and troubleshooting by enabling full-text search of logs, alerting, event correlation, and visualization of log data and metrics.
Common Patterns of Multi Data-Center Architectures with Apache Kafka | confluent
Whether you know you want to run Apache Kafka in multiple data centers and need practical advice or you are wondering why some organizations even need more than one cluster, this online talk is for you.
In this short session, we’ll discuss the basic patterns of multi-datacenter Kafka architectures, explore some of the use-cases enabled by each architecture and show how Confluent Enterprise products make these patterns easy to implement.
Visit www.confluent.io for more information.
In this session, Neil Avery covers the planning and operation of your KSQL deployment, including under-the-hood architectural details. You will learn about the various deployment models, how to track and monitor your KSQL applications, how to scale in and out and how to think about capacity planning. This is part 3 out of 3 in the Empowering Streams through KSQL series.
Event Driven Architectures with Apache Kafka on Heroku | Heroku
Apache Kafka is the backbone for building architectures that deal with billions of events a day. Chris Castle, Developer Advocate, will show you where it might fit in your roadmap.
- What Apache Kafka is and how to use it on Heroku
- How Kafka enables you to model your data as immutable streams of events, introducing greater parallelism into your applications
- How you can use it to solve scale problems across your stack such as managing high throughput inbound events and building data pipelines
Learn more at https://www.heroku.com/kafka
Reveal.js version of slides: http://slides.com/christophercastle/deck#/
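The idea of modeling data as immutable streams of events, mentioned above, can be sketched in a few lines of plain Python. This is an illustrative toy (the class and field names are our own, not Heroku's or Kafka's API): an append-only log that is never mutated, read by multiple consumers that each track their own offset, which is exactly what lets independent consumers process the same stream in parallel.

```python
# Illustrative sketch: an append-only event log with independent consumer
# offsets -- the core idea behind modeling data as immutable event streams.
# Plain Python for clarity; not a real Kafka client.

class EventLog:
    def __init__(self):
        self._events = []          # events are appended, never mutated

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1   # the event's offset

    def read(self, offset):
        """Return events from `offset` onward; the log itself never changes."""
        return self._events[offset:]

log = EventLog()
log.append({"type": "page_view", "user": "alice"})
log.append({"type": "click", "user": "bob"})

# Two consumers read the same immutable stream in parallel,
# each tracking its own position independently.
analytics_offset, audit_offset = 0, 1
assert len(log.read(analytics_offset)) == 2
assert len(log.read(audit_offset)) == 1
```

Because the log never changes, adding another consumer is free: it simply starts reading from offset 0 without affecting anyone else.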
Best Practices for Building Hybrid-Cloud Architectures | Hans Jespersen | confluent
Afternoon opening presentation during Confluent’s streaming event in Paris, presented by Hans Jespersen, VP WW Systems Engineering at Confluent.
The Road Most Traveled: A Kafka Story | Heikki Nousiainen, Aiven | HostedbyConfluent
When moving to a cloud native architecture, Moogsoft knew they needed more scale than RabbitMQ could provide. Moogsoft moved to Kafka, which is known for fast writes and driving heavy event-driven workloads, on top of niceties such as replayability. Choosing the tool was easy; finding a vendor that ticked all their boxes was not. They needed to ensure scalability, upgradability, builds via existing IaC pipelines, and observability via existing tools. When Moogsoft found Aiven, they were impressed with their offering and ability to scale on demand. During this presentation we will explore how Moogsoft used Aiven for Kafka to manage and scale their data in the cloud.
Kafka makes so many things easier to do, from managing metrics to processing streams of data. Yet it seems that so much of what we have done to this point in configuring and managing it has been an object lesson in how to make our lives, as the plumbers who keep the data flowing, more difficult than they have to be. What are some of our favorites?
* Kafka without access controls
* Multitenant clusters with no capacity controls
* Worrying about message schemas
* MirrorMaker inefficiencies
* Hope and pray log compaction
* Configurations as shared secrets
* One-way upgrades
We’ve made a lot of progress over the last few years improving the situation, in part by focusing some of this incredibly talented community on operational concerns. We’ll talk about the big mistakes you can avoid when setting up multi-tenant Kafka, and some that you still can’t. And we will talk about how to continue down the path of marrying hot new features with operational stability so we can all continue to come back here every year to talk about it.
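One concrete antidote to the "multitenant clusters with no capacity controls" pitfall above is per-client quotas. The quota property names below (`producer_byte_rate`, `consumer_byte_rate`, `request_percentage`) are Kafka's standard client quota configs; the tenant names and byte/percentage values are hypothetical examples, and the helper that renders them is our own sketch, not a Kafka tool.

```python
# Hedged sketch: per-client quotas as capacity controls for a multi-tenant
# cluster. Property names are Kafka's client quota configs; the tenants
# and values are illustrative.

tenant_quotas = {
    "team-payments": {
        "producer_byte_rate": 10_485_760,   # 10 MB/s produce cap
        "consumer_byte_rate": 20_971_520,   # 20 MB/s fetch cap
        "request_percentage": 50,           # cap on broker request-handler time
    },
    "team-analytics": {
        "producer_byte_rate": 1_048_576,
        "consumer_byte_rate": 52_428_800,
        "request_percentage": 25,
    },
}

def quota_flags(client_id):
    """Render one tenant's quotas in kafka-configs --add-config syntax."""
    q = tenant_quotas[client_id]
    return ",".join(f"{k}={v}" for k, v in sorted(q.items()))

assert "producer_byte_rate=10485760" in quota_flags("team-payments")
```

The rendered string is what you would hand to `kafka-configs` with `--entity-type clients`; keeping the quotas in versioned data like this, rather than applied ad hoc, is what turns them from tribal knowledge into policy.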
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ... | HostedbyConfluent
From migrations between Apache Kafka clusters to multi-region deployments across datacenters, the introduction of MirrorMaker2 has expanded the possibilities for Apache Kafka deployments and use cases. In this session you will learn about patterns, best practices, and learnings compiled from running MirrorMaker2 in production at every scale.
Protecting your data at rest with Apache Kafka by Confluent and Vormetric | confluent
Learn how data in motion is secure within Apache Kafka and the broader Confluent Platform, while data at rest can be secured by solutions like Vormetric Data Security Manager.
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix | HostedbyConfluent
The business systems of an organization are a continuous source of events. Each system also needs to know about events happening in the other systems. Exchanging these events through direct API calls creates a web of inter-dependencies, is fragile, and fails to scale. We examine how this problem can be solved through the use of the right integration patterns, implemented as a lightweight event hub that leverages the power of Kafka and Confluent to operate at enterprise scale. We demonstrate how JavaScript, with its event-driven programming model, can be a good fit for implementing an event hub that ensures guaranteed message delivery in the face of failures within the individual subscriber systems.
Many organizations have large engineering teams skilled in Node.js and a multitude of Node.js applications. We show how these teams can easily leverage the power of Kafka and scale their applications with the right architectural building blocks. We also offer insights from our own experience of building Node.js-based Kafka applications.
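The "guaranteed delivery despite subscriber failures" claim above usually comes down to one pattern: commit the consumer offset only after processing succeeds, so a failure causes redelivery rather than loss. Here is a pure-Python simulation of that at-least-once loop (the function and variable names are our own illustration, not the talk's actual code):

```python
# Illustrative at-least-once delivery loop: the subscriber commits its
# offset only AFTER processing succeeds, so a crash mid-processing causes
# redelivery, never loss. Pure-Python simulation of the pattern.

def deliver(events, handler, committed_offset=0):
    """Process events from committed_offset; return the new committed offset."""
    for i in range(committed_offset, len(events)):
        try:
            handler(events[i])
        except Exception:
            return i          # commit stops here; event i will be redelivered
        committed_offset = i + 1
    return committed_offset

events = ["a", "b", "c"]
seen = []

def flaky(e):
    if e == "b" and "b" not in seen:
        seen.append(e)        # record the attempt, then fail once
        raise RuntimeError("transient subscriber failure")
    seen.append(e)

off = deliver(events, flaky)            # fails at "b"; commit stays at 1
off = deliver(events, flaky, off)       # retries from "b" and completes
assert off == 3
assert seen == ["a", "b", "b", "c"]     # "b" delivered at least once
```

Note the trade-off this makes explicit: guaranteed delivery here means at-least-once, so subscribers must tolerate the occasional duplicate.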
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan | PivotalOpenSourceHub
Pivoting Spring XD to Spring Cloud Data Flow: A microservice based architecture for stream processing
Microservice based architectures are not just for distributed web applications! They are also a powerful approach for creating distributed stream processing applications. Spring Cloud Data Flow enables you to create and orchestrate standalone executable applications that communicate over messaging middleware such as Kafka and RabbitMQ that when run together, form a distributed stream processing application. This allows you to scale, version and operationalize stream processing applications following microservice based patterns and practices on a variety of runtime platforms such as Cloud Foundry, Apache YARN and others.
About Sabby Anandan
Sabby Anandan is a Product Manager at Pivotal. Sabby is focused on building products that eliminate the barriers between application development, cloud, and big data.
Spark Compute as a Service at PayPal with Prabhu Kasinathan | Databricks
Apache Spark is a gift to the big data community, adding tons of new features with every release. However, it’s difficult to manage petabyte-scale Hadoop clusters with hundreds of edge nodes and multiple Spark releases while demonstrating operational efficiency and standardization. To address these challenges, PayPal has developed and deployed a REST-based Spark platform: Spark Compute as a Service (SCaaS), which provides improved application development, execution, logging, security, workload management and tuning.
This session will walk through the top challenges faced by PayPal administrators, developers and operations teams, and describe how PayPal’s SCaaS platform overcomes them by leveraging open source tools and technologies like Livy, Jupyter, SparkMagic, Zeppelin, SQL Tools, Kafka and Elastic. You’ll also hear about the improvements PayPal has added, which enable it to run more than 10,000 Spark applications in production effectively.
Common issues with Apache Kafka® Producer | confluent
Badai Aqrandista, Confluent, Senior Technical Support Engineer
This session will be about a common issue in the Kafka Producer: producer batch expiry. We will be discussing the Kafka Producer internals, its common causes, such as a slow network or small batching, and how to overcome them. We will also be sharing some examples along the way!
https://www.meetup.com/apache-kafka-sydney/events/279651982/
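A quick sketch of the settings that govern batch expiry, the issue this session covers. The config keys below are standard Kafka producer properties; the values are illustrative only. A batch expires when it cannot be delivered within `delivery.timeout.ms`, so that budget must cover the linger time plus at least one request timeout, a constraint Kafka itself validates:

```python
# Hedged sketch of producer settings relevant to batch expiry. Config keys
# are standard Kafka producer properties; values are illustrative.

producer_config = {
    "linger.ms": 100,               # wait up to 100 ms to fill a batch
    "batch.size": 131072,           # 128 KB batches amortize per-request cost
    "request.timeout.ms": 30000,
    "delivery.timeout.ms": 120000,  # total budget before the batch expires
}

def batch_expiry_budget_ok(cfg):
    """Kafka requires delivery.timeout.ms >= linger.ms + request.timeout.ms."""
    return cfg["delivery.timeout.ms"] >= cfg["linger.ms"] + cfg["request.timeout.ms"]

assert batch_expiry_budget_ok(producer_config)
```

The two causes the abstract names map directly onto these knobs: a slow network eats into the `delivery.timeout.ms` budget, while too-small batching (tiny `batch.size`, zero `linger.ms`) multiplies the number of requests competing for it.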
A Streaming Platform Architecture Based on Apache Kafka | confluent
Presentation for the 4/11/17 Apache Kafka Bay Area, hosted by Uber.
What happens if you take everything that is happening in your company -- every click, every database change, every application log -- and make it all available as a real-time stream of well-structured data? This session will explain how to combine the full Apache Kafka toolkit to accomplish this and shift from batch-oriented data integration and data processing to real-time streams and real-time processing.
We will explain how the design and implementation of Kafka enables it to act as a scalable platform for streams of event data. The Kafka Connect API is a tool for scalable, fault-tolerant data import and export and turns Kafka into a hub for all your real-time data and bridges the gap between real-time and batch systems. The Kafka Streams API is a new library built right into Kafka that provides the corresponding processing support. It is built leveraging Kafka's existing low-level clients, and provides a very low barrier to entry, easy operationalization, and a natural DSL for writing stream processing applications. These three components provide all the components you need for a data pipeline: storage, import/export, and processing.
Finally, we'll describe an architecture for a stream data platform that combines these tools to react to all your inbound streams of state. This architecture only requires tools that ship with Apache Kafka, is lightweight in terms of deployment and management, and yet can scale to support large organizations with massive data pipelines.
Presentation by Gwen Shapira, Product Manager, Confluent.
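The kind of stateful processing the Kafka Streams API provides can be sketched without the library itself. The canonical Streams example is a running count over a keyed stream; here is the same idea in plain Python (a conceptual stand-in, not the Streams DSL): consume records, group by key, maintain state, and emit an updated result per input record, changelog-style.

```python
from collections import defaultdict

# Plain-Python sketch of the stateful processing the Kafka Streams API
# provides: consume a stream of records, group by key, and maintain a
# running count, emitting an updated (key, count) on every input record.

def running_count(stream):
    counts = defaultdict(int)
    for key, _value in stream:
        counts[key] += 1
        yield key, counts[key]     # changelog-style output stream

clicks = [("alice", "/home"), ("bob", "/cart"), ("alice", "/buy")]
assert list(running_count(clicks)) == [("alice", 1), ("bob", 1), ("alice", 2)]
```

What Streams adds on top of this ten-line idea is the operational machinery: the state lives in fault-tolerant changelog topics and the work is partitioned across instances by key.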
With the rapid increase of Apache Kafka use within organizations, issues of data governance and data quality take center stage. When more and more disparate departments and teams depend on the data in Apache Kafka, it’s important to provide a way to make sure "bad data" does not make its way into critical topics. Every organization that uses Kafka at large scale realizes it needs a way to deliver these guarantees.
In this talk, Kafka committer Gwen Shapira will review the benefits of a schema registry for large-scale Kafka deployments and give a high-level overview of how the Confluent Schema Registry is being used in enterprise architectures across the industry.
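The guarantee a schema registry provides can be illustrated with a toy gate: records are checked against a registered schema before they reach a critical topic. The schema format below is a deliberately simplified stand-in (field name to Python type), not the actual Confluent Schema Registry API or its Avro/JSON Schema formats:

```python
# Illustrative sketch of schema-gated writes: validate a record against a
# registered schema before it reaches a critical topic. Simplified stand-in,
# not the Confluent Schema Registry API.

registry = {
    "orders-value": {"order_id": int, "amount": float},  # "registered" schema
}

def validate(topic, record):
    schema = registry[f"{topic}-value"]
    return (set(record) == set(schema)
            and all(isinstance(record[f], t) for f, t in schema.items()))

assert validate("orders", {"order_id": 42, "amount": 9.99})
assert not validate("orders", {"order_id": "42"})        # bad data rejected
```

A real registry does much more, notably enforcing compatibility rules between schema versions so producers can evolve a schema without breaking existing consumers.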
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013 | Amazon Web Services
Providing a great media consumption experience to customers is crucial to maximizing audience engagement. To do that, it is important that you make content available for consumption anytime, anywhere, on any device, with a personalized and interactive experience. This session explores the power of big data log analytics (real-time and batched), using technologies like Spark, Shark, Kafka, Amazon Elastic MapReduce, Amazon Redshift and other AWS services. Such analytics are useful for content personalization, recommendations, personalized dynamic ad-insertions, interactivity, and streaming quality.
This session also includes a discussion from Netflix, which explores personalized content search and discovery with the power of metadata.
Stream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data Platform | confluent
Many enterprises have a large technical debt in legacy applications hosted in on-premises data centers. There is a strong desire to modernize and move to a cloud-based infrastructure, but the world won’t stop for you to transition. Existing applications need to be supported and enhanced; data from legacy platforms is required to make decisions that drive the business. On the other hand, data from cloud-based applications does not exist in a vacuum. Legacy applications need access to these cloud data sources and vice versa.
Can an enterprise have it both ways? Can new applications be built in the cloud while existing applications are maintained in a private data center?
Monsanto has adopted a cloud-first mentality—today most new development is focused on the cloud. However, this transition did not happen overnight.
Chrix Finne and Bob Lehmann share their experience building and implementing a Kafka-based cross-data-center streaming platform to facilitate the move to the cloud—in the process, kick-starting Monsanto’s transition from batch to stream processing. Details include an overview of the challenges involved in transitioning to the cloud and a deep dive into the cross-data-center stream platform architecture, including best practices for running this architecture in production and a summary of the benefits seen after deploying this architecture.
Distributed Enterprise Monitoring and Management of Apache Kafka (William McL...) | HostedbyConfluent
Managing a distributed system like Apache Kafka can be extremely challenging, especially when you approach monitoring and management from a single centralized GUI. In this talk, see a demo of a more decoupled approach to Kafka management and monitoring, where data is centralized but access is distributed to scale to enterprise deployments, CI/CD pipelines, and much more.
Watch this talk here: https://www.confluent.io/online-talks/how-apache-kafka-works-on-demand
Pick up best practices for developing applications that use Apache Kafka, beginning with a high level code overview for a basic producer and consumer. From there we’ll cover strategies for building powerful stream processing applications, including high availability through replication, data retention policies, producer design and producer guarantees.
We’ll delve into the details of delivery guarantees, including exactly-once semantics, partition strategies and consumer group rebalances. The talk will finish with a discussion of compacted topics, troubleshooting strategies and a security overview.
This session is part 3 of 4 in our Fundamentals for Apache Kafka series.
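The exactly-once discussion above rests partly on idempotent-producer deduplication, which is easy to model: the broker remembers the highest sequence number appended per producer and silently drops retried duplicates. The class below is a simplified conceptual model of that mechanism, not broker code (real Kafka tracks sequences per producer-id and partition, with epochs and batching):

```python
# Sketch of the idea behind idempotent-producer deduplication (one building
# block of exactly-once semantics): the broker remembers the last sequence
# number per producer and drops duplicates caused by retries. Simplified
# model, not broker code.

class Partition:
    def __init__(self):
        self.log = []
        self.last_seq = {}     # producer_id -> highest sequence appended

    def append(self, producer_id, seq, record):
        if seq <= self.last_seq.get(producer_id, -1):
            return False       # duplicate retry, silently dropped
        self.log.append(record)
        self.last_seq[producer_id] = seq
        return True

p = Partition()
assert p.append("prod-1", 0, "order-created")
assert not p.append("prod-1", 0, "order-created")   # retried send deduped
assert p.append("prod-1", 1, "order-shipped")
assert p.log == ["order-created", "order-shipped"]
```

This is why a producer can retry aggressively on timeouts without creating duplicates: the retry carries the same sequence number, so the broker recognizes and discards it.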
So You’ve Inherited Kafka? Now What? (Alon Gavra, AppsFlyer) Kafka Summit Lon... | confluent
Kafka is often just a piece of the stack that lives in production, one that no one wants to touch – because it just works. At AppsFlyer, a mobile attribution and analysis platform that generates a constant “storm” of 70B+ events (HTTP requests) daily, Kafka sits at the core of our infrastructure. Recently I inherited the daunting task of managing our Kafka operation and discovered a lot of technical debt we needed to recover from if we wanted to be able to sustain our next phase of growth. This talk will dive into how to safely migrate from outdated versions, how to gain developers’ trust to migrate their production services, how to manage and monitor the right metrics and build resiliency into the architecture, and how to plan for continued improvements through paradigms such as sleep-driven design, and much more.
Running Apache Kafka in production is only the first step in the Kafka operations journey. Professional Kafka users are ready to handle all possible disasters - because for most businesses having a disaster recovery plan is not optional.
In this session, we’ll discuss disaster scenarios that can take down entire Kafka clusters and share advice on how to plan, prepare and handle these events. This is a technical session full of best practices - we want to make sure you are ready to handle the worst mayhem that nature and auditors can cause.
Visit www.confluent.io for more information.
SFBigAnalytics_20190724: Monitor Kafka like a Pro | Chester Chen
Kafka operators need to provide guarantees to the business that Kafka is working properly and delivering data in real time, and they need to identify and triage problems so they can solve them before end users notice them. This elevates the importance of Kafka monitoring from a nice-to-have to an operational necessity. In this talk, Kafka operations experts Xavier Léauté and Gwen Shapira share their best practices for monitoring Kafka and the streams of events flowing through it. How to detect duplicates, catch buggy clients, and triage performance issues – in short, how to keep the business’s central nervous system healthy and humming along, all like a Kafka pro.
Speakers: Gwen Shapira, Xavier Léauté (Confluent)
Gwen is a software engineer at Confluent working on core Apache Kafka. She has 15 years of experience working with code and customers to build scalable data architectures. She currently specializes in building real-time, reliable data processing pipelines using Apache Kafka. Gwen is an author of “Kafka: The Definitive Guide” and “Hadoop Application Architectures”, and a frequent presenter at industry conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.
Xavier Léauté was one of the first engineers on the Confluent team, where he is responsible for analytics infrastructure, including real-time analytics in Kafka Streams. He was previously a quantitative researcher at BlackRock. Prior to that, he held various research and analytics roles at Barclays Global Investors and MSCI.
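The single most-watched metric in the monitoring practice described above is consumer lag: the gap between each partition's log-end offset and the consumer group's committed offset. Computing it is trivial once you have both offset maps (the topic name and offset values below are hypothetical sample data, not output from a real cluster):

```python
# Sketch of the core Kafka health metric: consumer lag per partition,
# i.e. log-end offset minus the group's committed offset. The offsets
# here are hypothetical sample data.

log_end_offsets = {("clicks", 0): 1500, ("clicks", 1): 980}
committed       = {("clicks", 0): 1450, ("clicks", 1): 980}

def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag; a partition with no committed offset counts from 0."""
    return {tp: end_offsets[tp] - committed_offsets.get(tp, 0)
            for tp in end_offsets}

lag = consumer_lag(log_end_offsets, committed)
assert lag[("clicks", 0)] == 50     # partition 0 is 50 records behind
assert sum(lag.values()) == 50      # total lag, useful for alert thresholds
```

What matters operationally is the trend, not the snapshot: steady lag means the consumer is keeping pace; monotonically growing lag means it is falling behind and someone should be paged before end users notice.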
Advanced DevOps governance with Terraform | James Counts
DevOps project sprawl is real! Large organizations with many teams need to support a variety of configurations, from infrastructure governance to domain-specific app deployments, all while enforcing good security practices like least privilege for each team. Maintaining these controls by hand leads to complexity, stagnation, and insecure shortcuts. In this session, you'll learn how Terraform can automate this configuration and make doing the right thing easy!
Siphon - Near Real Time Databus Using Kafka | Eric Boyd, Nitin Kumar | confluent
Siphon is a highly available and reliable distributed pub/sub system built using Apache Kafka. It is used to publish, discover and subscribe to near real-time data streams for operational and product intelligence. Siphon is used as a “Databus” by a variety of producers and subscribers in Microsoft, and is compliant with security and privacy requirements. It has a built-in Auditing and Quality control. This session will provide an overview of the use of Kafka at Microsoft, and then deep dive into Siphon. We will describe an important business scenario and talk about the technical details of the system in the context of that scenario. We will also cover the design and implementation of the service, the scale, and real world production experiences from operating the service in the Microsoft cloud environment.
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...HostedbyConfluent
From migrations between Apache Kafka clusters to multi-region deployments across datacenters, the introduction of MirrorMaker2 has expanded the possibilities for Apache Kafka deployments and use cases. In this session you will learn about patterns, best practices, and learnings compiled from running MirrorMaker2 in production at every scale.
Protecting your data at rest with Apache Kafka by Confluent and Vormetricconfluent
Learn how data in motion is secure within Apache Kafka and the broader Confluent Platform, while data at rest can be secured by solutions like Vormetric Data Security Manager.
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, NutanixHostedbyConfluent
The business systems of an organization are a continuous source of events. Each system also needs to know about events happening in the other systems. Exchanging these events through direct API calls creates a web of inter-dependencies, is fragile and fails to scale. We examine how this problem can be solved through the use of right integration patterns implemented as a light-weight event hub that leverages the power of Kafka and Confluent to operate at enterprise scale. We demonstrate how JavaScript with its event-driven programming model can be a good fit for implementing an event hub that ensures guaranteed message delivery in the face of failures within the individual subscriber systems.
Many organizations having large engineering teams skilled in NodeJS and a multitude of NodeJs applications. We show how these teams can easily leverage the power of Kafka and scale their applications with the right architectural building blocks. We also offer insights from our own experience of building NodeJS based Kafka applications.
Pivoting Spring XD to Spring Cloud Data Flow with Sabby AnandanPivotalOpenSourceHub
Pivoting Spring XD to Spring Cloud Data Flow: A microservice based architecture for stream processing
Microservice based architectures are not just for distributed web applications! They are also a powerful approach for creating distributed stream processing applications. Spring Cloud Data Flow enables you to create and orchestrate standalone executable applications that communicate over messaging middleware such as Kafka and RabbitMQ that when run together, form a distributed stream processing application. This allows you to scale, version and operationalize stream processing applications following microservice based patterns and practices on a variety of runtime platforms such as Cloud Foundry, Apache YARN and others.
About Sabby Anandan
Sabby Anandan is a Product Manager at Pivotal. Sabby is focused on building products that eliminate the barriers between application development, cloud, and big data.
Spark Compute as a Service at Paypal with Prabhu KasinathanDatabricks
Apache Spark is a gift to the big data community, which adds tons of new features on every release. However, it’s difficult to manage petabyte-scale Hadoop clusters with hundreds of edge nodes, multiple Spark releases and demonstrate operational efficiencies and standardization. In order to address these challenges, Paypal has developed and deployed a REST0based Spark platform: Spark Compute as a Service (SCaaS),which provides improved application development, execution, logging, security, workload management and tuning.
This session will walk through the top challenges faced by PayPal administrators, developers and operations and describe how Paypal’s SCaaS platform overcomes them by leveraging open source tools and technologies, like Livy, Jupyter, SparkMagic, Zeppelin, SQL Tools, Kafka and Elastic. You’ll also hear about the improvements PayPal has added, which enable it to run greater than 10,000 Spark applications in production effectively.
Common issues with Apache Kafka® Producerconfluent
Badai Aqrandista, Confluent, Senior Technical Support Engineer
This session will be about a common issue in the Kafka Producer: producer batch expiry. We will be discussing the Kafka Producer internals, its common causes, such as a slow network or small batching, and how to overcome them. We will also be sharing some examples along the way!
https://www.meetup.com/apache-kafka-sydney/events/279651982/
A Streaming Platform Architecture Based on Apache Kafkaconfluent
Presentation for the 4/11/17 Apache Kafka Bay Area, hosted by Uber.
What happens if you take everything that is happening in your company -- every click, every database change, every application log -- and make it all available as a real-time stream of well structure data? This session will explain how to combine the full Apache Kafka toolkit to accomplish this and shift from batch-oriented data integration and data processing to real-time streams and real-time processing.
We will explain how the design and implementation of Kafka enables it to act as a scalable platform for streams of event data. The Kafka Connect API is a tool for scalable, fault-tolerant data import and export and turns Kafka into a hub for all your real-time data and bridges the gap between real-time and batch systems. The Kafka Streams API is a new library built right into Kafka that provides the corresponding processing support. It is built leveraging Kafka's existing low-level clients, and provides a very low barrier to entry, easy operationalization, and a natural DSL for writing stream processing applications. These three components provide all the components you need for a data pipeline: storage, import/export, and processing.
Finally, we'll describe an architecture for a stream data platform that combines these tools to react to all your inbound streams of state. This architecture only requires tools that ship with Apache Kafka, is lightweight in terms of deployment and management, and yet can scale to support large organizations with massive data pipelines.
Presentation by Gwen Shapira, Product Manager, Confluent.
With the rapid increase of Apache Kafka use within organizations, issues of data governance and data quality take center stage. When more and more disparate departments and teams depend on the data in Apache Kafka, it’s important to provide a way to make sure "bad data" does not make its way into critical topics. Every organization that uses Kafka at large scale realize they need a way to deliver these guarantees.
In this talk, Kafka committer, Gwen Shapira will review the benefits of a schema registry for large-scale Kafka deployments and will give high-level overview of how the Confluent schema registry is being used in enterprise architectures across industry.
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Amazon Web Services
Providing a great media consumption experience to customers is crucial to maximizing audience engagement. To do that, it is important that you make content available for consumption anytime, anywhere, on any device, with a personalized and interactive experience. This session explores the power of big data log analytics (real-time and batched), using technologies like Spark, Shark, Kafka, Amazon Elastic MapReduce, Amazon Redshift and other AWS services. Such analytics are useful for content personalization, recommendations, personalized dynamic ad-insertions, interactivity, and streaming quality.
This session also includes a discussion from Netflix, which explores personalized content search and discovery with the power of metadata.
Stream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data Platformconfluent
Many enterprises have a large technical debt in legacy applications hosted in on-premises data centers. There is a strong desire to modernize and move to a cloud-based infrastructure, but the world won’t stop for you to transition. Existing applications need to be supported and enhanced; data from legacy platforms is required to make decisions that drive the business. On the other hand, data from cloud-based applications does not exist in a vacuum. Legacy applications need access to these cloud data sources and vice versa.
Can an enterprise have it both ways? Can new applications be built in the cloud while existing applications are maintained in a private data center?
Monsanto has adopted a cloud-first mentality—today most new development is focused on the cloud. However, this transition did not happen overnight.
Chrix Finne and Bob Lehmann share their experience building and implementing a Kafka-based cross-data-center streaming platform to facilitate the move to the cloud—in the process, kick-starting Monsanto’s transition from batch to stream processing. Details include an overview of the challenges involved in transitioning to the cloud and a deep dive into the cross-data-center stream platform architecture, including best practices for running this architecture in production and a summary of the benefits seen after deploying this architecture.
Distributed Enterprise Monitoring and Management of Apache Kafka (William McL...HostedbyConfluent
Managing a distributed system like Apache Kafka can be extremely challenging, especially when you try to approach monitoring and managing from a single centralized GUI approach. In this talk come here and see a demo of a more decoupled approach to Kafka management and Kafka Monitoring where data is centralized but access is is distributed to scale to enterprise deployments, CICD pipelines and much much more.
Watch this talk here: https://www.confluent.io/online-talks/how-apache-kafka-works-on-demand
Pick up best practices for developing applications that use Apache Kafka, beginning with a high level code overview for a basic producer and consumer. From there we’ll cover strategies for building powerful stream processing applications, including high availability through replication, data retention policies, producer design and producer guarantees.
We’ll delve into the details of delivery guarantees, including exactly-once semantics, partition strategies and consumer group rebalances. The talk will finish with a discussion of compacted topics, troubleshooting strategies and a security overview.
This session is part 3 of 4 in our Fundamentals for Apache Kafka series.
So You’ve Inherited Kafka? Now What? (Alon Gavra, AppsFlyer) Kafka Summit Lon...confluent
Kafka is often just a piece of the stack that lives in production and that no one wants to touch, because it just works. At AppsFlyer, a mobile attribution and analysis platform that generates a constant “storm” of 70B+ events (HTTP requests) daily, Kafka sits at the core of our infrastructure. Recently I inherited the daunting task of managing our Kafka operation and discovered a lot of technical debt we needed to recover from if we wanted to sustain our next phase of growth. This talk will dive into how to safely migrate from outdated versions, how to gain the trust of developers to migrate their production services, how to manage and monitor the right metrics and build resiliency into the architecture, and how to plan for continued improvements through paradigms such as sleep-driven design, and much more.
Running Apache Kafka in production is only the first step in the Kafka operations journey. Professional Kafka users are ready to handle all possible disasters - because for most businesses having a disaster recovery plan is not optional.
In this session, we’ll discuss disaster scenarios that can take down entire Kafka clusters and share advice on how to plan, prepare and handle these events. This is a technical session full of best practices - we want to make sure you are ready to handle the worst mayhem that nature and auditors can cause.
Visit www.confluent.io for more information.
SFBigAnalytics_20190724: Monitor kafka like a ProChester Chen
Kafka operators need to provide guarantees to the business that Kafka is working properly and delivering data in real time, and they need to identify and triage problems so they can solve them before end users notice them. This elevates the importance of Kafka monitoring from a nice-to-have to an operational necessity. In this talk, Kafka operations experts Xavier Léauté and Gwen Shapira share their best practices for monitoring Kafka and the streams of events flowing through it. How to detect duplicates, catch buggy clients, and triage performance issues – in short, how to keep the business’s central nervous system healthy and humming along, all like a Kafka pro.
Speakers: Gwen Shapira, Xavier Leaute (Confluent)
Gwen is a software engineer at Confluent working on core Apache Kafka. She has 15 years of experience working with code and customers to build scalable data architectures. She currently specializes in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an author of “Kafka - the Definitive Guide”, "Hadoop Application Architectures", and a frequent presenter at industry conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.
One of the first engineers on the Confluent team, Xavier is responsible for analytics infrastructure, including real-time analytics in Kafka Streams. He was previously a quantitative researcher at BlackRock. Prior to that, he held various research and analytics roles at Barclays Global Investors and MSCI.
Advanced dev ops governance with terraformJames Counts
DevOps project sprawl is real! Large organizations with many teams need to support a variety of configurations, from infrastructure governance to domain-specific app deployments, all while enforcing good security practices like least privilege for each team. Maintaining these controls by hand leads to complexity, stagnation, and insecure shortcuts. In this session, you'll learn how Terraform can automate this configuration and make doing the right thing easy!
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumarconfluent
Siphon is a highly available and reliable distributed pub/sub system built using Apache Kafka. It is used to publish, discover and subscribe to near real-time data streams for operational and product intelligence. Siphon is used as a “Databus” by a variety of producers and subscribers in Microsoft, and is compliant with security and privacy requirements. It has built-in auditing and quality control. This session will provide an overview of the use of Kafka at Microsoft, and then deep dive into Siphon. We will describe an important business scenario and talk about the technical details of the system in the context of that scenario. We will also cover the design and implementation of the service, the scale, and real-world production experiences from operating the service in the Microsoft cloud environment.
A whirlwind tour of Event Driven Architecture, extensibility, Domain Driven Design, Command and Query Responsibility Segregation (CQRS) and Complex Event Processing
Building Event-Driven Systems with Apache KafkaBrian Ritchie
Event-driven systems provide simplified integration, easy notifications, inherent scalability and improved fault tolerance. In this session we'll cover the basics of building event-driven systems and then dive into utilizing Apache Kafka for the infrastructure. Kafka is a fast, scalable, fault-tolerant publish/subscribe messaging system developed by LinkedIn. We will cover the architecture of Kafka and demonstrate code that utilizes this infrastructure, including C#, Spark, ELK and more.
Sample code: https://github.com/dotnetpowered/StreamProcessingSample
This presentation includes a comprehensive introduction to Apache Spark, from an explanation of its rapid ascent to its performance and developer advantages over MapReduce. We also explore its built-in functionality for application types involving streaming, machine learning, and Extract, Transform and Load (ETL).
Alpine academy apache spark series #1 introduction to cluster computing wit...Holden Karau
Alpine academy apache spark series #1 introduction to cluster computing with python & a wee bit of scala. This is the first in the series and is aimed at the intro level, the next one will cover MLLib & ML.
This talk will address new architectures emerging for large-scale streaming analytics, some based on Spark, Mesos, Akka, Cassandra and Kafka (SMACK) and others on newer streaming analytics platforms and frameworks using Apache Flink or GearPump. Popular architectures like Lambda separate layers of computation and delivery and require many technologies with overlapping functionality. This can result in duplicated code, untyped processes, or high operational overhead, let alone the cost (e.g. ETL).
I will discuss the problem domain and what is needed in terms of strategies, architecture and application design and code to begin leveraging simpler data flows. We will cover how the particular set of technologies addresses common requirements and how collaboratively they work together to enrich and reinforce each other.
Since 2014, Typesafe has been actively contributing to the Apache Spark project, and has become a certified development support partner of Databricks, the company started by the creators of Spark. Typesafe and Mesosphere have forged a partnership in which Typesafe is the official commercial support provider of Spark on Apache Mesos, along with Mesosphere’s Datacenter Operating Systems (DCOS).
In this webinar with Iulian Dragos, Spark team lead at Typesafe Inc., we reveal how Typesafe supports running Spark in various deployment modes, along with the improvements we made to Spark to help integrate backpressure signals into the underlying technologies, making it a better fit for Reactive Streams. He also shows you the functionality at work, and how simple it is to deploy Spark on Mesos with Typesafe.
We will introduce:
Various deployment modes for Spark: Standalone, Spark on Mesos, and Spark with Mesosphere DCOS
Overview of Mesos and how it relates to Mesosphere DCOS
Deeper look at how Spark runs on Mesos
How to manage coarse-grained and fine-grained scheduling modes on Mesos
What to know about a client vs. cluster deployment
A demo running Spark on Mesos
Everyone in the Scala world is using or looking into using Akka for low-latency, scalable, distributed or concurrent systems. I'd like to share my story of developing and productionizing multiple Akka apps, including low-latency ingestion and real-time processing systems, and Spark-based applications.
When does one use actors vs futures?
Can we use Akka with, or in place of, Storm?
How did we set up instrumentation and monitoring in production?
How does one use VisualVM to debug Akka apps in production?
What happens if the mailbox gets full?
What is our Akka stack like?
I will share best practices for building Akka and Scala apps, pitfalls and things we'd like to avoid, and a vision of where we would like to go for ideal Akka monitoring, instrumentation, and debugging facilities. Plus backpressure and at-least-once processing.
NOTE: This was converted to Powerpoint from Keynote. Slideshare does not play the embedded videos. You can download the powerpoint from slideshare and import it into keynote. The videos should work in the keynote.
Abstract:
In this presentation, we will describe the "Spark Kernel" which enables applications, such as end-user facing and interactive applications, to interface with Spark clusters. It provides a gateway to define and run Spark tasks and to collect results from a cluster without the friction associated with shipping jars and reading results from peripheral systems. Using the Spark Kernel as a proxy, applications can be hosted remotely from Spark.
Reactive app using actor model & apache sparkRahul Kumar
Developing applications with big data is really challenging work; scaling, fault tolerance and responsiveness are some of the biggest challenges. A real-time big data application with self-healing features is a dream these days. Apache Spark is a fast in-memory data processing system that provides a good backend for real-time applications. In this talk I will show how to use a reactive platform, the actor model and the Apache Spark stack to develop a system that is responsive, resilient, fault-tolerant and message-driven.
Real-Time Anomaly Detection with Spark MLlib, Akka and CassandraNatalino Busa
We present a solution for streaming anomaly detection, named “Coral”, based on Spark, Akka and Cassandra. In the system presented, Spark runs the data analytics pipeline for anomaly detection. By running Spark on the latest events and data, we make sure that the model is always up-to-date and that the number of false positives is kept low, even under changing trends and conditions. Our machine learning pipeline uses Spark decision tree ensembles and k-means clustering. Once the model is trained by Spark, the model’s parameters are pushed to the Streaming Event Processing Layer, implemented in Akka. The Akka layer then scores thousands of events per second against the last model provided by Spark. Spark and Akka communicate with each other using Cassandra as a low-latency data store. By doing so, we make sure that every element of this solution is resilient and distributed. Spark performs micro-batches to keep the model up-to-date while Akka detects new anomalies by using the latest Spark-generated data model. The project is currently hosted on GitHub. Have a look at: http://coral-streaming.github.io
Reactive dashboard’s using apache sparkRahul Kumar
A tutorial talk on Apache Spark. In this talk I explain how to start working with Apache Spark, the features of Apache Spark, and how to compose a data platform with Spark. This talk also covers the reactive platform and tools and frameworks like Play and Akka.
Spark Streaming makes it easy to build scalable fault-tolerant streaming applications. In this webinar, developers will learn:
*How Spark Streaming works - a quick review.
*Features in Spark Streaming that help prevent potential data loss.
*Complementary tools in a streaming pipeline - Kafka and Akka.
*Design and tuning tips for Reactive Spark Streaming applications.
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Anton Kirillov
This talk is about architecture designs for data processing platforms based on SMACK stack which stands for Spark, Mesos, Akka, Cassandra and Kafka. The main topics of the talk are:
- SMACK stack overview
- storage layer layout
- fixing NoSQL limitations (joins and group by)
- cluster resource management and dynamic allocation
- reliable scheduling and execution at scale
- different options for getting the data into your system
- preparing for failures with proper backup and patching strategies
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
Regardless of the meaning we are searching for over our vast amounts of data, whether we are in science, finance, technology, energy, health care…, we all share the same problems that must be solved: How do we achieve that? What technologies best support the requirements? This talk is about how to leverage fast access to historical data with real time streaming data for predictive modeling for lambda architecture with Spark Streaming, Kafka, Cassandra, Akka and Scala. Efficient Stream Computation, Composable Data Pipelines, Data Locality, Cassandra data model and low latency, Kafka producers and HTTP endpoints as akka actors...
Splunk Sales Presentation Imagemaker 2014Urena Nicolas
Splunk provides operational intelligence for everyone
Splunk is the industry-leading real-time operational intelligence platform. It is an easy, fast and secure way to search, analyze and visualize the massive streams of machine data generated by your IT systems and technology infrastructure (physical, virtual and cloud).
Splunk Enterprise 6 is the latest version and provides:
- Powerful analytics for all users at astonishing speeds
- A completely redesigned user experience
- A richer developer environment for easily extending the platform
Splunk Enterprise 6 is available now. Download it and try it for yourself.
More info: https://cnfl.io/cloud-native-experience-for-kafka-in-cloud | Neha Narkhede is co-founder and CTO at Confluent, a company backing the popular Apache Kafka messaging system. Prior to founding Confluent, Neha led streams infrastructure at LinkedIn, where she was responsible for LinkedIn’s streaming infrastructure built on top of Apache Kafka and Apache Samza. She is one of the initial authors of Apache Kafka and a committer and PMC member on the project.
Monitor OpenStack Environments from the bottom up and front to backIcinga
Talk given by Thomas Stocking at Icinga Camp San Francisco 2016 - https://www.icinga.org/community/events/archive/2016-archive/icinga-camp-san-francisco/
If you missed the SpringOne Conference this year, don't fret! In this session you'll get the opportunity to get the highlights of the trip Jeroen and Tim made to Las Vegas and they'll show you the coolest stuff from Spring and CloudFoundry!
An overview of Splunk Enterprise 6.3. Presented by Splunk's Jim Viegas at GTRI's Splunk Tech Day, December 8, 2015.
Visit http://www.gtri.com/ for more information.
Current & Future Use-Cases of OpenDaylightabhijit2511
OpenDaylight Overview and Architecture
• OpenDaylight Use Cases (Partial List)
I. Network Abstraction
II. ONAP
III. Network Virtualization
IV. AI/ML with OpenDaylight
V. ODL in OSS
• OpenDaylight: Getting Involved
If you're like most of the world, you're on an aggressive race to implement machine learning applications and on a path to get to deep learning. If you can give better service at a lower cost, you will be the winners in 2030. But infrastructure is a key challenge to getting there. What does the technology infrastructure look like over the next decade as you move from Petabytes to Exabytes? How are you budgeting for more colossal data growth over the next decade? How do your data scientists share data today and will it scale for 5-10 years? Do you have the appropriate security, governance, back-up and archiving processes in place? This session will address these issues and discuss strategies for customers as they ramp up their AI journey with a long term view.
This presentation provides an objective approach to make your legacy and custom-built applications agile and infused with intelligence. This allows your apps to utilize new and more substantial data sets as well as apply artificial intelligence and machine learning to take in-the-moment actions.
Get the most out of your systems with Oracle Cloud Observability and Management Platform.
Companies are going through an accelerated evolution of their systems and applications. Traditional environments mix with virtualized ones and cloud technologies, and it is necessary to get the best performance out of all of them.
Do you know the details of all your systems and the relationships between the different technologies well enough to resolve potential problems?
In this new edition of our Tech Dates, avanttic and Oracle present an introduction to Oracle Cloud Observability and Management Platform, a global solution for managing complex and dynamic systems that covers these new needs, in both on-premise and cloud environments.
We will also review avanttic's experience with this tool, which maximizes the performance and availability of the most critical systems, and look at the advantages our customers are already seeing after deploying it.
Security Requires Visibility-Turn Data Into Security InsightAmazon Web Services
Security and visibility are critical considerations for any AWS deployment.
In this webinar, you’ll learn how EnerNOC, a leading provider of energy management solutions, seamlessly transforms data from their AWS environment (including AWS CloudTrail, AWS Config and VPC Flow Logs) into real-time security insights. You will gain insight into the pre-built dashboards and reports delivered by the free Splunk App for AWS, and how EnerNOC has identified and resolved security risks using this app. You will also hear from AWS and Splunk about the importance of utilizing these solutions for optimizing AWS security.
Join us to learn:
• How to help ensure your AWS deployment is secure
• How to instantly take advantage of data from AWS CloudTrail and AWS Config
• Real-world best practices from an experienced AWS customer
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
4. Design with Production in Mind
Scalability & Fault Tolerance
• REST-based architecture
• Stateless
• Load-Balancing
• Versioning
How will you troubleshoot in production?
• Only production acts like production
• You need operational visibility
Pressure to increase velocity and deliver business value
• Limited insights into behavior and performance from application logs
• Building comprehensive management tools takes time
6. Coding the service
• REST-based API
• Dependency Injection
• ORM
• Database
• Logging Framework
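The pieces listed above can be wired together in many ways; the minimal, framework-free Python sketch below shows one shape for it, with dependency injection via the constructor and a dict-backed stand-in where the ORM/database would sit. All class, field, and event names here are illustrative assumptions, not from the original slides.

```python
import json
import logging

class OrderRepository:
    """Stand-in for the ORM/database layer (a real service would persist)."""
    def __init__(self):
        self._orders = {}

    def save(self, order_id, order):
        self._orders[order_id] = order

    def get(self, order_id):
        return self._orders.get(order_id)

class OrderService:
    """REST-style service with its dependencies injected via the constructor."""
    def __init__(self, repo, logger):
        self.repo = repo
        self.logger = logger

    def create_order(self, order_id, payload):
        # Persist the order, log a structured event, return a REST-style response.
        self.repo.save(order_id, payload)
        self.logger.info("event=order_created order_id=%s", order_id)
        return {"status": 201, "body": json.dumps({"id": order_id})}

service = OrderService(OrderRepository(), logging.getLogger("orders"))
response = service.create_order("o1", {"item": "widget"})
```

Injecting the repository and logger rather than constructing them inside the service is what keeps each piece independently testable and swappable in production.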
7. Logging Best Practices
• Write log data to a local file
• Institute a log rotation policy
• Begin each event with a timestamp
• Generate a unique identifier that is assigned to related events
• Use key-value pairs to describe the properties of events
o Standardize field names across the application
• Avoid excessively long events
• Avoid spamming the log
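As a minimal sketch of these practices using Python's standard logging module (the field names and the "request_id" correlation key are illustrative assumptions, not from the slides):

```python
import logging
import uuid

logging.basicConfig(
    filename="service.log",            # write log data to a local file
    level=logging.INFO,
    format="%(asctime)s %(message)s",  # begin each event with a timestamp
)

def log_event(request_id, **fields):
    """Emit one event as standardized key=value pairs tied to a request."""
    # Sorting keys keeps field order stable across the application.
    pairs = " ".join(f"{k}={v}" for k, v in sorted(fields.items()))
    line = f"request_id={request_id} {pairs}"
    logging.info(line)
    return line

# One unique identifier groups all events for a single request.
rid = uuid.uuid4().hex
log_event(rid, event="order_created", user="u42", status=201)
log_event(rid, event="payment_charged", user="u42", amount_usd=19.99)
```

Log rotation would be layered on with a handler such as `logging.handlers.RotatingFileHandler`, left out here for brevity.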
11. Q: How many programmers does it take to change a light bulb?
A: None, they just make darkness standard and tell everyone "this behavior is by design"
One way to deal with errors…
One way to deal with errors…
12. Or, you can design for production:
You have a live system, tons of log data, and you need:
Monitoring & Trending
Alerting
Event Correlation
Troubleshooting across multiple systems
Billing
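Event correlation, for example, comes down to tying together every log line that shares an identifier so one request can be traced across multiple systems. A hypothetical sketch, assuming the key=value log format recommended earlier (the specific field names are illustrative):

```python
from collections import defaultdict

def correlate(lines):
    """Group parsed key=value log events by their shared request_id field."""
    groups = defaultdict(list)
    for line in lines:
        # Parse "k=v" tokens into a dict; ignore tokens without "=".
        fields = dict(part.split("=", 1) for part in line.split() if "=" in part)
        request_id = fields.get("request_id")
        if request_id:
            groups[request_id].append(fields)
    return dict(groups)

logs = [
    "request_id=a1 service=api event=request_received",
    "request_id=a1 service=billing event=invoice_created",
    "request_id=b2 service=api event=request_received",
]
grouped = correlate(logs)
# grouped["a1"] now holds the api and billing events for that one request
```

A log platform like Splunk performs this kind of grouping at scale across systems; this sketch only illustrates the underlying idea.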
14. Splunk to the Rescue
Looking at logs can be like this:
Two threads walk into a bar. The barkeeper looks up and yells, "hey, I want don't any conditions race like time last!"
15. Splunk to the Rescue
But Splunk makes it easy to find & correlate information in your logs:
16. Splunk to the Rescue
Analysis & Reporting
Security & Compliance
Infrastructure & Operations
Application Management
Splunk brings value to your machine data, making it accessible to the enterprise.