EDA Governance Model: a multicloud approach based on GitOps | Alejandro Alija... | Hosted by Confluent
The first question that arises when you start a new EDA project is how to govern the system. An entire ecosystem of applications, backends, events, and APIs must co-exist under the same architecture. The architecture team must balance guaranteeing the reliability, coherence, and security of the system against the development teams' ability to create new apps in an agile way. In this work, we present a general governance framework we designed based on our experience with one of the largest insurance companies in Spain. Our framework rests on the catalog and the cataloging process; the deployment and provisioning process; and the operational and exploitation model. The framework's main engine is the catalog, and a catalog-enforcement approach rules the entire system. The catalog implementation is based on GitOps practices. Along with the catalog, we envisioned an event portal (as a UI) and a set of hands-on labs to help users train their skills on the new EDA architecture.
Kafka makes so many things easier to do, from managing metrics to processing streams of data. Yet it seems that so many things we have done to this point in configuring and managing it have been object lessons in how to make our lives, as the plumbers who keep the data flowing, more difficult than they have to be. What are some of our favorites?
* Kafka without access controls
* Multitenant clusters with no capacity controls
* Worrying about message schemas
* MirrorMaker inefficiencies
* Hope and pray log compaction
* Configurations as shared secrets
* One-way upgrades
We’ve made a lot of progress over the last few years improving the situation, in part by focusing some of this incredibly talented community towards operational concerns. We’ll talk about the big mistakes you can avoid when setting up multi-tenant Kafka, and some that you still can’t. And we will talk about how to continue down the path of marrying the hot, new features with operational stability so we can all continue to come back here every year to talk about it.
Leveraging services in stream processor apps at Ticketmaster (Derek Cline, Ticketmaster) | Confluent
Is your organization adopting Kafka as its messaging bus, but you've found that it will take too long to migrate your existing service-oriented architecture to a log-oriented architecture? One of the biggest challenges in building a new stream processor is implementing all the business logic again. It has become increasingly common for companies with high-throughput source streams and change-data-capture logs to want to build systems fast. At Ticketmaster, we have solved this problem by leveraging the business logic in our existing services and calling them from our Java-based Kafka Streams processor applications in an efficient manner. In this talk, we will examine the initial challenges we faced in our transition, then explore the solutions we built to address the use cases at Ticketmaster. The primary focus will be our workflow around calling services to bring stream processor applications to market fast. We will review our challenges and share tips for success.
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transportation | Hosted by Confluent
The Ohio Department of Transportation has adopted Confluent as the event-driven enabler of DriveOhio, a modern Intelligent Transportation System. DriveOhio digitally links sensors, cameras, speed monitoring equipment, and smart highway assets in real time to dynamically adjust the surface road network and maximize safety and efficiency for travelers. Over the past 24 months the team has increased the number and types of devices within the DriveOhio environment, while also working to have their vendors adopt Kafka to better participate in data sharing.
My team at Zalando fell in love with Kafka Streams and its programming model straight out of the gate. However, as a small team of developers, building out and supporting our infrastructure while still trying to deliver solutions for our business has not always been a smooth journey. Can a small team of a couple of developers run their own Kafka infrastructure confidently and still spend most of their time developing code? In this talk, we will dive into some of the problems we experienced while running Kafka brokers and Kafka Streams applications, as well as the consultations we had with other teams on this matter. We will outline some of the pragmatic decisions we made regarding backups, monitoring, and operations to minimize the time spent administering our Kafka brokers and various stream applications.
Monitoring and Troubleshooting a Real Time Pipeline | Apache Apex
Alan Ngai, CTO/Co-Founder, OpsClarity
OpsClarity is a performance monitoring solution for stream processing applications. In addition to providing deep component monitoring, it leverages data science to proactively identify anomalies across the entire data pipeline and correlates issues across the data and app tiers to identify common concerns that impact the business. OpsClarity automatically discovers the entire app and data topology, and is years ahead of anything else in how it leverages the rich metadata and network dependency context captured through the topology to provide rich analysis and the fastest correlated troubleshooting. This talk will additionally cover integration with Apache Apex.
Agile Data Integration: How is it possible? | Confluent
In this talk, we are going to tell you the story of building the Connection Platform (CoPa), an endeavor undertaken at Generali Switzerland over the course of the last year in collaboration with Innovation Process Technology. The goal was to design a general-purpose, state-of-the-art integration platform which covers all integration needs of the enterprise. The central data distribution and integration layer is powered by Confluent Kafka. We will throw a spotlight on three different aspects of this platform that, all in their own right, are essential for agile data integration.
First of all, the platform is hosted on the container platform Red Hat OpenShift. Everything is set up in flexible Docker containers. Automated pipelines are used to build, provision, and deploy everything on the platform, from infrastructure to data pipelines.
Kubernetes as Orchestrator for A10 Lightning Controller | Akshay Mathur
A10 Lightning Application Delivery System (ADS) supports hybrid environments by providing secure application services and advanced analytics across the entire deployment, from traditional on-premises data centers to public and/or private clouds, or any combination thereof. A10 Lightning employs a controller-based architecture that can be self-managed on-premises or in a private cloud, or utilized as a SaaS offering managed by A10, to enable management of heterogeneous workloads across physical hardware-based environments as well as public, private, and hybrid clouds.
This presentation talks about our journey from a VM-based controller to a Kubernetes-based controller.
Azure Cosmos DB Kafka Connectors | Abinav Rameesh, Microsoft | Hosted by Confluent
Kafka Connectors are used extensively in data migration solutions, serving as a middle tier when migrating data across databases. In addition, microservice architectures also use Kafka Connectors heavily when communicating with one another while still operating independently on their own data stores. In this talk, we cover these use cases in more detail along with a deep dive into the architecture of the source and sink Kafka Connectors for Cosmos DB.
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset | Hosted by Confluent
Streaming data systems have been growing rapidly in importance in the modern data stack. Kafka's ksqlDB provides an interface for analytic tools that speak SQL. Apache Superset, the most popular modern open-source visualization and analytics solution, plugs into nearly any data source that speaks SQL, including Kafka. Here, we review and compare methods for connecting Kafka to Superset to enable streaming analytics use cases including anomaly detection, operational monitoring, and online data integration.
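For a concrete feel of that integration, here is a minimal sketch (not from the talk) of pulling rows from ksqlDB's REST /query endpoint in Python with the requests library. The server address and the pageviews stream are placeholders, and the loose handling of the streamed JSON array framing is an assumption about the v1 response format.

import json
import requests

def stream_query(ksql, url="http://localhost:8088/query"):
    # POST a push query to ksqlDB and yield rows as they stream back.
    payload = {
        "ksql": ksql,
        "streamsProperties": {"ksql.streams.auto.offset.reset": "earliest"},
    }
    with requests.post(url, json=payload, stream=True) as resp:
        resp.raise_for_status()
        for raw in resp.iter_lines():
            # The body is one long JSON array; strip the framing so that
            # each line parses on its own.
            text = raw.decode("utf-8").strip().strip(",").lstrip("[").rstrip("]")
            if text:
                yield json.loads(text)

for row in stream_query("SELECT * FROM pageviews EMIT CHANGES;"):
    print(row)

A dashboarding layer like Superset sits on the same idea: it issues SQL through a connector and renders the returned rows, so anything that speaks this protocol can feed streaming charts.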
Taming a massive fleet of Python-based Kafka apps at Robinhood | Chandra Kuch... | Hosted by Confluent
Robinhood uses Kafka in every line of its business, from stock and crypto trading to clearing and data analytics. One interesting aspect of our architecture is that many of our microservices leveraging Kafka are written in Python. When you combine Python's relatively slow performance, its reliance on process-based parallelism, and Robinhood's scale, the result is a massive fleet of application processes producing to and consuming from our Kafka clusters. This fleet generates an atypical workload on Kafka that warrants a deeper investment in scalability and reliability.
This talk discusses our investments in Kafka infrastructure for a large-scale Python-based environment:
kafkahood: our librdkafka-based client library wrapper that codifies best practices, sane defaults and deep client-side observability.
kafkaproxy: a Rust-based sidecar proxy that reduces connection fan-in from Python gunicorn worker pools to our Kafka clusters.
We'll also present challenges we encountered along the way and share our learnings with the audience.
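kafkahood itself is internal to Robinhood, so purely as an illustration, here is a minimal sketch of what a thin wrapper codifying defaults on top of the confluent-kafka (librdkafka) Producer might look like. The specific settings are common community choices, not Robinhood's actual configuration.

from confluent_kafka import Producer

SANE_DEFAULTS = {
    "enable.idempotence": True,       # safe retries without duplicates
    "acks": "all",                    # wait for the full in-sync replica set
    "compression.type": "lz4",
    "linger.ms": 5,                   # small batching window for throughput
    "statistics.interval.ms": 60000,  # librdkafka stats for observability
}

class WrappedProducer:
    # Producer with team-wide defaults baked in; apps only pass overrides.

    def __init__(self, bootstrap_servers, overrides=None):
        conf = {"bootstrap.servers": bootstrap_servers,
                **SANE_DEFAULTS, **(overrides or {}),
                "stats_cb": self._on_stats}
        self._producer = Producer(conf)

    def _on_stats(self, stats_json):
        # Hook for shipping client-side statistics to a metrics pipeline.
        pass

    def produce(self, topic, key, value):
        self._producer.produce(topic, key=key, value=value)
        self._producer.poll(0)  # serve delivery callbacks promptly

    def flush(self):
        self._producer.flush()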
URP? Excuse You! The Three Metrics You Have to Know | Confluent
(Todd Palino, LinkedIn) Kafka Summit SF 2018
What do you really know about how to monitor a Kafka cluster for problems? Is your most reliable monitoring your users telling you there’s something broken? Are you capturing more metrics than the actual data being produced? Sure, we all know how to monitor disk and network, but when it comes to the state of the brokers, many of us are still unsure of which metrics we should be watching, and what their patterns mean for the state of the cluster. Kafka has hundreds of measurements, from the high-level numbers that are often meaningless to the per-partition metrics that stack up by the thousands as our data grows.
We will thoroughly explore three key monitoring concepts in the broker that will leave you an expert in identifying problems with the least amount of pain:
-Under-replicated Partitions: The mother of all metrics
-Request Latencies: Why your users complain
-Thread pool utilization: How could 80% be a problem?
We will also discuss the necessity of availability monitoring and how to use it to get a true picture of what your users see, before they come beating down your door!
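As a taste of the first metric, here is a small sketch (not from the talk) that derives an under-replicated partition count from cluster metadata using confluent-kafka's AdminClient. In practice you would alert on the broker's UnderReplicatedPartitions JMX metric; the bootstrap address here is a placeholder.

from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # assumed address
metadata = admin.list_topics(timeout=10)

urp = 0
for topic in metadata.topics.values():
    for pid, part in topic.partitions.items():
        # A partition is under-replicated when its in-sync replica set is
        # smaller than its full replica set.
        if len(part.isrs) < len(part.replicas):
            urp += 1
            print(f"{topic.topic}[{pid}] isr={part.isrs} replicas={part.replicas}")

print(f"under-replicated partitions: {urp}")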
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac... | Hosted by Confluent
Apache Hudi is a data lake platform that provides streaming primitives (upserts/deletes/change streams) on top of data lake storage. Hudi powers very large data lakes at Uber, Robinhood, and other companies, and comes pre-installed on four major cloud platforms.
Hudi supports exactly-once, near-real-time data ingestion from Apache Kafka to cloud storage and is typically used in place of an S3/HDFS sink connector to gain transactions and mutability. While this approach is scalable and battle-tested, it can only ingest data in mini-batches, leading to lower data freshness. In this talk, we introduce a Kafka Connect sink connector for Apache Hudi, which writes data straight into Hudi's log format, making the data immediately queryable, while Hudi's table services (indexing, compaction, clustering) work behind the scenes to further reorganize the data for better query performance.
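For orientation, sink connectors of this kind are registered through Kafka Connect's standard REST interface. The sketch below shows that step in Python; the connector class and Hudi-specific config keys are placeholders, since the real names live in the Hudi connector's documentation, while the /connectors endpoint itself is standard Kafka Connect.

import requests

connector = {
    "name": "hudi-sink-demo",
    "config": {
        "connector.class": "<hudi-sink-connector-class>",  # placeholder
        "topics": "events",
        "tasks.max": "4",
        # Hudi-specific settings (table path, record key fields, ...) go here.
    },
}

# POST /connectors is the standard Kafka Connect registration endpoint.
resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json())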
Feed Your SIEM Smart with Kafka Connect (Vitalii Rudenskyi, McKesson Corp) Ka... | Hosted by Confluent
SIEM platforms are essential to the new cybersecurity paradigm, and the data collection layer is a very important piece of them.
When you deliver a new platform, you can easily get lost in the variety of vendors and solutions, and there are many challenges to face. What if I change vendors, will I keep my data? How do I feed multiple tools with the same data? How do I collect data from custom apps and services? How do I pay less for an expensive platform? How do I keep data without huge cost?
Join us if you are looking for the answers. In this session, you will learn how we replaced the vendor-provided data collection layer with Kafka Connect, and the lessons we learnt. After the talk you will know:
- architecture and real-life examples of the flexible and highly available data collection platform
- custom connectors that do most of the work for us, and how to extend them to consume new data; we have open sourced them
- easy way to receive data from thousands of servers and many cloud services
- how to archive data at low cost
You will leave armed with a set of free tools and recipes to build a truly vendor-agnostic data collection platform. It will allow you to bring your SIEM costs under control. You will feed your analytics tools with what they need and archive the rest at low cost. You will feed your SIEM smart!
Migrating from One Cloud Provider to Another (Without Losing Your Data or You... | Hosted by Confluent
If you’re considering -- or planning -- a cloud migration, you may be concerned about risks to your data and your mental health. Migrations at scale are fraught with risk. You absolutely can’t lose data, compromise its integrity, or suffer downtime, so you want to be slow and careful. On the other hand, you’re paying two providers for every day the migration goes on, so you need to move as fast as possible.
Unity Technologies accumulates lots of data. We recently moved our data infrastructure as part of a major cloud migration from Amazon Web Services (AWS) to Google Cloud Platform (GCP).
To minimize risk and costs, our team used Apache Kafka and Confluent Platform, while engaging Confluent Platform Professional Services to help ensure a speedy and seamless migration. Kafka was already serving as the backbone of our data infrastructure, which handles over half a million events per second, and during the migration it also served as the bridge between AWS and GCP.
Join us at this session to learn about the processes and tools used, the challenges faced, and the lessons learned as we moved our operations and petabytes of data from AWS to GCP with zero downtime.
DEVNET-1106 Upcoming Services in OpenStack | Cisco DevNet
There are several new upcoming OpenStack projects/services that are built upon the core OpenStack infrastructure services. This session will first briefly discuss the new changes introduced to the project governance structure in OpenStack. Subsequently, the focus of the presentation will be to provide feature and architecture details on a few of the new projects and services in OpenStack. These will include Trove (Database Service), Sahara (Data Processing Service), Congress (Policy Service) and Magnum (Container Service). A summary of other OpenStack-related services will also be provided.
Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J... | Hosted by Confluent
As cyber threats continuously grow in sophistication and frequency, companies need to quickly acclimate to effectively detect, respond, and protect their environments. At Intel, we’ve addressed this need by implementing a modern, scalable Cyber Intelligence Platform (CIP) based on Splunk and Apache Kafka. We believe that CIP positions us for the best defense against cyber threats well into the future.
Our CIP ingests tens of terabytes of data each day and transforms it into actionable insights through stream processing, context-smart applications, and advanced analytics techniques. Kafka serves as a massive data pipeline within the platform. It achieves economies of scale by acquiring data once and consuming it many times. It reduces technical debt by eliminating custom point-to-point connections for producing and consuming data. At the same time, it provides the ability to operate on data in-stream, enabling us to reduce Mean Time to Detect (MTTD) and Mean Time to Respond (MTTR). Faster detection and response ultimately lead to better prevention.
In our session, we'll discuss the details described in the IT@Intel white paper of the same title published in November 2020. We'll share some stream processing techniques, such as filtering and enriching in Kafka, to deliver contextually rich data to Splunk and many of our security controls.
The Road Most Traveled: A Kafka Story | Heikki Nousiainen, Aiven | Hosted by Confluent
When moving to a cloud-native architecture, Moogsoft knew they needed more scale than RabbitMQ could provide. Moogsoft moved to Kafka, which is known for fast writes and driving heavy event-driven workloads, on top of niceties such as replayability. Choosing the tool was easy; finding a vendor that ticked all their boxes was not. They needed to ensure scalability, upgradability, builds via existing IaC pipelines, and observability via existing tools. When Moogsoft found Aiven, they were impressed with their offering and ability to scale on demand. During this presentation we will explore how Moogsoft used Aiven for Kafka to manage and scale their data in the cloud.
In the Internet of Things, data and commands between things and servers are sent as streams of events, which are often aggregated and processed to provide up-to-date information to end users. Because of this, the CQRS and Event Sourcing patterns are a natural fit for IoT applications. In this presentation we provide an overview of these patterns, how they apply to IoT applications, and their benefits. A prototype application of Event Sourcing is then demonstrated using the Sense Tecnic FRED platform based on Node-RED, a data flow programming tool for wiring up the Internet of Things.
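To make the pattern concrete, here is a toy Python sketch of Event Sourcing, independent of the FRED platform: state is never stored directly, only derived by folding over an append-only event log. The event types and fields are invented for illustration.

from dataclasses import dataclass

@dataclass(frozen=True)
class SensorEvent:
    device_id: str
    kind: str      # e.g. "reading" or "threshold_changed"
    value: float

def apply(state, event):
    # Pure function: old state + event -> new state.
    if event.kind == "reading":
        return {**state, "last_reading": event.value}
    if event.kind == "threshold_changed":
        return {**state, "threshold": event.value}
    return state

log = [
    SensorEvent("t-1", "threshold_changed", 30.0),
    SensorEvent("t-1", "reading", 27.5),
    SensorEvent("t-1", "reading", 31.2),
]

state = {}
for ev in log:   # replaying the log rebuilds state as of any point in time
    state = apply(state, ev)
print(state)     # {'threshold': 30.0, 'last_reading': 31.2}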
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, VillageMD | Hosted by Confluent
In this talk, we'll discuss how VillageMD is able to use Kafka topic compaction for rapidly scaling our reprocessing pipelines to encompass hundreds of feeds. Within healthcare data ecosystems, privacy and data minimalism are key design priorities. Being able to handle data deletion in a reliable, timely manner within event-driven architectures is becoming more and more necessary with key governance frameworks like the GDPR and HIPAA.
We'll be giving an overview of the building and governance of dead-letter queues for streaming data processing.
We'll discuss:
1. How to architect a data sink for failed records.
2. How topic compaction can reduce duplicate data and enable idempotency.
3. Building a tombstoning system for removing successfully reprocessed records from the queues (see the sketch after this list).
4. Considerations for monitoring a reprocessing system in production -- what metrics, dataops, and SLAs are useful?
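As a sketch of point 3, assuming a dead-letter topic configured with cleanup.policy=compact and keyed by a stable record ID (topic and ID below are hypothetical), producing a null value, a tombstone, for a key marks all earlier records with that key for removal at the next compaction pass:

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed address

def mark_reprocessed(dead_letter_topic, record_id):
    # value=None is the tombstone; compaction later drops the failed record.
    producer.produce(dead_letter_topic, key=record_id.encode("utf-8"), value=None)

mark_reprocessed("claims-dlq", "record-42")  # hypothetical topic and ID
producer.flush()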
Istio Mesh – Managing Container Deployments at Scale | Mofizur Rahman
The service mesh is an infrastructure component that helps manage services running within our clusters. Without any changes to service or application code, solutions like Istio and Linkerd provide features to manage container deployments at scale. With Istio we get traffic management, security, rate limiting, monitoring, and many more things out of the box. We will discuss these solutions and some of their features at a high level, then roll in some specific demonstrations of using a service mesh to route and shift service traffic, easily manage deployments and test our services with micro benchmarks and fault injection. We will also look at some of the error scenarios you may encounter and how to deal with some of them while keeping your sanity.
How to build "AutoScale and AutoHeal" systems using DevOps practices and modern technologies.
A complete build pipeline and the process of architecting a nearly unbreakable system were part of the presentation.
These slides were presented at 2018 DevOps conference in Singapore. http://claridenglobal.com/conference/devops-sg-2018/
Amazon EKS and Service Mesh
Kubernetes is the most popular orchestration platform for companies adopting container services. In this session, we introduce EKS, the managed Kubernetes service that Amazon officially launched in June, explain how it differs from the open-source version and what advantages it brings, and give an introduction to and demo of Linkerd, which implements a service mesh for more advanced microservices.
Containers as Infrastructure for New Gen Apps | Khalid Ahmed
Khalid will share on emerging container technologies and their role in supporting an agile cloud-native application development model. He will discuss the basics of containers compared to traditional virtualization, review use cases, and explore the open-source container management ecosystem.
The adoption of container-native and cloud-native development practices presents new operational challenges. Today's microservice environments are polyglot, distributed, container-based, highly scalable, and ephemeral. To understand your system, you need to be able to follow the life of a request across numerous components distributed in multiple environments. Without the proper tools, it can feel impossible to determine the root cause of an issue. This requires a new approach to operations. We will review a series of open source observability tools for logging, monitoring, and tracing to help developers achieve operational excellence for running container-based workloads.
1. Overview of DevOps
2. Infrastructure as Code (IaC) and Configuration as Code
3. Identity and security protection in a CI/CD environment
4. Monitoring the health of the infrastructure/application
5. Open Source Software (OSS) and third-party tools, such as Chef, Puppet, Ansible, and Terraform, to achieve DevOps
6. Future of DevOps applications
CAM (Cloud Automation Manager) is a centralized, modular, microservices- and plugin-based framework for DevOps and for automating enterprise software development and service provider cloud operations through a single pane of glass, built on an open-source architecture.
DEVNET-1169 CI/CT/CD on a Microservices Application using Docker, Salt & Ni... | Cisco DevNet
Nowadays we hear a lot about microservices and DevOps, but what are the impacts on application development, and how do you actually achieve this? The demo will show the benefits of using Docker (and related tools and technologies) for a microservices application, together with a continuous integration / test / deployment workflow on CCS/Nimbus.
The evolution of microservices architecture: mainframe, midrange, client-server, SOA. Best practices for microservices: load balancing, big data, design patterns. When and why to use microservices.
How Docker EE is Finnish Railway's Ticket to App Modernization | Docker, Inc.
VR Group-Finnish Railways is responsible for 118 million passenger rides and moving 41 million tons of cargo a year and is seeing overall growth in rail transit throughout Finland. A priority for the organization is to provide improved customer services, including an improved seat reservation system and bringing modern experiences like next generation mobile apps to their passengers. These improvements require looking at their application portfolio and deciding to either:
Revise: Transform legacy applications to more cost efficient solutions
Redesign: Redesign and rewrite mainframe-based solutions to microservices
In this session, Markus Niskanen, Integration Manager at VR Group, and Oscar Renalias, Sr. Technology Architect at Accenture, will discuss how they leveraged Docker EE and the public cloud as the common platform for these different application modernization projects. They will cover how they are using Docker and the cloud to renew and optimize their application portfolio for greater ROI, leading to organization-wide adoption of DevOps principles and cultural change in an industry that is over 150 years old.
Azure Service Fabric for building microservice-based applications. A comparison of a monolithic application with a cloud-based microservice application, hosted in cloud containers such as Docker.
In this training webinar, Samantha Wang will walk you through the basics of Telegraf. Telegraf is the open source server agent which is used to collect metrics from your stacks, sensors and systems. It is InfluxDB’s native data collector that supports nearly 300 inputs and outputs. Learn how to send data from a variety of systems, apps, databases and services in the appropriate format to InfluxDB. Discover tips and tricks on how to write your own plugins. The know-how learned here can be applied to a multitude of use cases and sectors. This one-hour session will include the training and time for live Q&A.
In this training webinar, we will walk you through the basics of InfluxDB – the purpose-built time series database. InfluxDB has everything you need from a time series platform in a single binary – a multi-tenanted time series database, UI and dashboarding tools, background processing and monitoring agent. This one-hour session will include the training and time for live Q&A.
What you will learn
Core concepts of time series databases
An overview of the InfluxDB platform
How to ingest and query data in InfluxDB
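As a preview of what the session covers, here is a minimal sketch of writing and querying a point with the influxdb-client Python package for InfluxDB 2.x; the URL, token, org, and bucket are placeholders.

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")

# Write one measurement point.
write_api = client.write_api(write_options=SYNCHRONOUS)
write_api.write(bucket="my-bucket",
                record=Point("cpu").tag("host", "server01").field("usage", 12.5))

# Read it back with a Flux query.
tables = client.query_api().query(
    'from(bucket: "my-bucket") |> range(start: -1h) '
    '|> filter(fn: (r) => r._measurement == "cpu")')
for table in tables:
    for record in table.records:
        print(record.get_time(), record.get_field(), record.get_value())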
Key Trends Shaping the Future of Infrastructure | Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
PHP Frameworks: I want to break free (IPC Berlin 2024) | Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
GraphRAG is All You Need? LLM & Knowledge Graph | Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Accelerate your Kubernetes clusters with Varnish Caching | Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... | UiPath Community
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Neuro-symbolic is not enough, we need neuro-*semantic* | Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Connector Corner: Automate dynamic content and events by pushing a button | DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Epistemic Interaction - tuning interfaces to provide information for AI support | Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview | Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
2. Values
• Infrastructure as Code (IaC)
• Test then deploy
• Deploy once, run anywhere (don't depend on proprietary services/cloud)
• Everything should be documented
• Everything open source and free to use
• Hiring, KT, and onboarding of new developers should be seamless and easy
• Distributed, highly scalable, fault tolerant, resilient
3. Microservices Architecture
• Cloud native is a term used to describe container-based environments. Cloud-native technologies are used to develop applications built with services packaged in containers, deployed as microservices, and managed on elastic infrastructure through agile DevOps processes and continuous delivery workflows.
• 10 Commandments of Microservices Architecture
• Clean separation of stateless and stateful services
• Do not share libraries or SDKs
• Avoid host affinity
• Focus on services with one task in mind
• Use a lightweight messaging protocol for communication
• Design a well-defined entry point and exit point
• Implement a self-registration and discovery mechanism
• Explicitly check for rules and constraints
• Prefer polyglot over single stack
• Maintain independent revisions and build environments
4. Technologies

Area | Current Tools | New Tools
Infrastructure Provisioning | - | Terraform / Ansible
CI/CD Pipeline | AWS CodePipeline / Jenkins | Jenkins
Server / Container Orchestration | EC2 instances | Kubernetes
Service Mesh | - | Istio
Monitoring | New Relic / AWS CloudWatch | Prometheus, Alertmanager, Grafana
Logging | - | Elasticsearch, Fluentd, Kibana
Job Orchestrator | Cron jobs / GCP Cron Scheduler | Airflow
Environment | Native deployments | Docker
Data Pipeline (ETL) | Python scripts / cron jobs | -
Databases | MySQL / Redshift | -
5. Terraform / Ansible
• Declarative programming tool for automating infrastructure resource creation
• Key Features
• Infrastructure as Code
• Execution Plans
• Resource Graph
• Change Automation
• Creating new infrastructure is a code change (commit, PR, merge)
• Ansible – tool for managing a fleet of servers
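A minimal sketch of how a pipeline can drive Terraform: the snippet below wraps the standard CLI workflow (init, plan, apply) from Python. The infra/ directory holding the .tf files is a placeholder.

import subprocess

def tf(*args):
    # Run a Terraform subcommand non-interactively in the config directory.
    subprocess.run(["terraform", *args], check=True, cwd="infra/")

tf("init", "-input=false")
tf("plan", "-input=false", "-out=tfplan")  # reviewable execution plan
tf("apply", "-input=false", "tfplan")      # change automation after merge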
6. Jenkins (CI/CD Tool)
• Jenkins is a continuous integration tool which enables software teams to build the integration pipelines for their projects.
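A minimal sketch, assuming the python-jenkins package and placeholder URL, credentials, and job name, of triggering a pipeline and reading its result programmatically:

import jenkins

server = jenkins.Jenkins("http://localhost:8080",
                         username="admin", password="api-token")

server.build_job("my-pipeline")             # queue a build
info = server.get_job_info("my-pipeline")
last = info["lastBuild"]["number"]
print(server.get_build_info("my-pipeline", last)["result"])  # e.g. SUCCESS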
8. Kubernetes
• Software tools to manage and coordinate containers
• Key Features
• Automatic Binpacking
• Horizontal Scaling
• Automated rollouts and rollbacks
• Storage Orchestration
• Self-healing
• Service discovery and load balancing
• Secret and Configuration Management
• Batch Execution
9. Other Features
• Blue/green deployment, canary deployment
• Long-running services, but also batch (one-off) jobs
• Overcommit our cluster and evict low-priority jobs
• Run services with stateful data (databases etc.)
• Fine-grained access control defining what can be done by whom on which resources
• Integrating third-party services (service catalog)
• Automating complex tasks (operators)
• CronJobs
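A minimal sketch of interacting with these Kubernetes features from Python using the official kubernetes client, listing pods and scaling a deployment; the namespace and deployment name are placeholders.

from kubernetes import client, config

config.load_kube_config()  # local kubeconfig; in-cluster code would use
                           # config.load_incluster_config()

v1 = client.CoreV1Api()
for pod in v1.list_namespaced_pod(namespace="default").items:
    print(pod.metadata.name, pod.status.phase)

# Horizontal scaling: patch the deployment's replica count.
apps = client.AppsV1Api()
apps.patch_namespaced_deployment_scale(
    name="my-app", namespace="default",
    body={"spec": {"replicas": 3}})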
11. Istio
• Istio is an open platform for providing a uniform way to integrate microservices, manage traffic flow across microservices, enforce policies, and aggregate telemetry data. Istio's control plane provides an abstraction layer over the underlying cluster management platform, such as Kubernetes, Mesos, etc.
• Key Features
• Code independent (polyglot)
• Intelligent routing and load balancing
• A/B tests
• Smarter canary releases
• Chaos: fault injection
• Resilience
• Circuit breakers
• Retries, failovers
• Single authentication and authorization service, user management (Keycloak)
• Fleet-wide policy enforcement
• A pluggable policy layer and configuration API supporting access controls, rate limits, and quotas
• Automatic load balancing for HTTP, gRPC, WebSocket, and TCP traffic
• Automatic metrics, logs, and traces for all traffic within a cluster, including cluster ingress and egress
14. Monitoring (Prometheus, Alertmanager, Grafana)
• Prometheus, a CNCF (Cloud Native Computing Foundation) project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.
• The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, Slack, PagerDuty, or OpsGenie. It also takes care of silencing and inhibition of alerts.
• Key Features
• Grouping
• Inhibition
• Silences
• Grafana is the open platform for beautiful analytics and monitoring (open source software for time series analytics)
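A minimal sketch of the application side of this stack, exposing metrics for Prometheus to scrape with the official prometheus_client package (metric names are invented); alert rules on these series would then be routed by Alertmanager and charted in Grafana.

import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests handled")
LATENCY = Histogram("app_request_seconds", "Request latency in seconds")

@LATENCY.time()              # observe the duration of each call
def handle_request():
    time.sleep(random.uniform(0.01, 0.1))
    REQUESTS.inc()

if __name__ == "__main__":
    start_http_server(8000)  # metrics are served at :8000/metrics
    while True:
        handle_request()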
19. Logging (Elasticsearch, Fluentd, Kibana)
• Elasticsearch is a distributed, scalable, real-time search and analytics engine. It enables you to search, analyze, and explore your data. It exists because raw data sitting on a hard drive is just not useful.
• Fluentd is an open source data collector for a unified logging layer.
• Kibana is a visualization layer that works on top of Elasticsearch.
• Other features
• Heartbeats
• Metrics / APM (Application Performance Monitoring)
• ElastAlert (alerting over logs)
• spike
• frequency
• flatline
• new_term
• change
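A minimal sketch of indexing and searching a log document with the official Elasticsearch Python client (8.x style); the host, index name, and document fields are placeholders.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.index(index="app-logs", document={
    "level": "ERROR",
    "service": "checkout",
    "message": "payment gateway timeout",
})

es.indices.refresh(index="app-logs")  # make the document searchable now
hits = es.search(index="app-logs",
                 query={"match": {"level": "ERROR"}})["hits"]["hits"]
for hit in hits:
    print(hit["_source"]["message"])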
21. Job Orchestrator (Airflow)
• Airflow is a platform to programmatically author, schedule, and monitor workflows.
• Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
• When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative.
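A minimal sketch of a DAG as described above: two Python tasks with an explicit dependency, scheduled daily (Airflow 2.x style; task bodies are placeholders).

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling source data...")

def load():
    print("loading into the warehouse...")

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_extract >> t_load  # extract must finish before load starts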
24. Docker
• Docker is a tool for deploying isolated, or containerized, applications. Docker containers are similar to virtual machines in a sense, but much more lightweight both in size and resource consumption.
• Code once, run everywhere
• Doesn't depend on environment
• Every dependency is packed inside an image
• Easy to scale horizontally
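A minimal sketch using the Docker SDK for Python, the programmatic equivalent of docker run --rm alpine echo ...; the image tag is a placeholder.

import docker

client = docker.from_env()  # connects to the local Docker daemon

output = client.containers.run(
    image="alpine:3.19",
    command=["echo", "code once, run everywhere"],
    remove=True,            # clean the container up afterwards
)
print(output.decode("utf-8"))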
26. Onboarding Applications
• Steps
1. Dockerizing the application
2. Creating a Jenkins pipeline
3. Deploying in the staging environment
4. Deploying in production after QA