If your business is heavily dependent on the Internet, you may be facing an unprecedented level of network traffic analytics data. How to make the most of that data is the challenge. This presentation from Kentik VP Product and former EMA analyst Jim Frey explores the evolving need, the architecture and key use cases for BGP and NetFlow analysis based on scale-out cloud computing and Big Data technologies.
Kentik Detect Engine - Network Field Day 2017gvillain
Dan Ellis (CTO@Kentik) presents and discusses the technology and platform behind Kentik Detect Engine.
Links to the video of the presentation: https://kentik.com/nfd14
The Network Knows—Avi Freedman, CEO & Co-Founder of Kentik Outlyer
Apps generate the traffic, but the network delivers it. Many devops and netops stacks are completely separate, but it doesn't have to be that way!
In this talk we'll cover network traffic telemetry (sources, tools, and methods) and show how that data can be linked to metric, log, and APM systems.
PCAP Graphs for Cybersecurity and System TuningDr. Mirko Kämpf
Cybersecurity is a broad topic and many commercial products are related to it. We demonstrate a fundamental concept in network analysis: reconstruction and visualization of temporal networks. Furthermore, we apply the method to describe operational conditions of a Hadoop cluster. Our experiments provide first results and allow a classification of the cluster state related to current workloads. The temporal networks show significant differences for different operation modes. In reality we would expect mixed workloads. If such workload parameters are known, we are able to handle atypical events accordingly, which means we are able to create alerts based on context information rather than only on the packet content. We show an end-to-end example: (1) data collection is done via Python, using the sniffer script; (2) using Apache Hive and Apache Spark, we analyze the network traffic data and create the temporal network; (3) finally, we visualize the results using Gephi. As a next step, we plan to contribute to the Apache Spot project.
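The idea of reconstructing a temporal network from captured traffic can be illustrated with a minimal sketch. It is not the talk's sniffer script or its Hive/Spark pipeline; the record layout `(timestamp, src, dst, size)`, the function name `build_temporal_network`, and the sample addresses are all assumptions made for illustration. Each time window becomes one snapshot whose edges are weighted by the bytes exchanged.

```python
from collections import defaultdict

def build_temporal_network(packets, window_seconds):
    """Group (timestamp, src, dst, size) records into per-window weighted edge lists.

    Returns {window_start: {(src, dst): total_bytes}}, ordered by window start.
    The record layout is a hypothetical stand-in for parsed PCAP data.
    """
    snapshots = defaultdict(lambda: defaultdict(int))
    for ts, src, dst, size in packets:
        # Assign the packet to the fixed-size window that contains its timestamp.
        window_start = int(ts // window_seconds) * window_seconds
        snapshots[window_start][(src, dst)] += size
    return {w: dict(edges) for w, edges in sorted(snapshots.items())}

# Hypothetical sample records, as they might look after parsing a capture.
packets = [
    (0.5, "10.0.0.1", "10.0.0.2", 100),
    (3.0, "10.0.0.1", "10.0.0.2", 50),
    (7.2, "10.0.0.2", "10.0.0.3", 400),
    (12.9, "10.0.0.1", "10.0.0.3", 60),
]
network = build_temporal_network(packets, window_seconds=5)
for window_start, edges in network.items():
    print(window_start, edges)
```

Each snapshot could then be exported as an edge list for a graph tool; comparing snapshots across windows is what exposes differences between operation modes.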
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®confluent
Watch this talk here: https://www.confluent.io/online-talks/siem-modernization-build-a-situationally-aware-organization-with-apache-kafka
Of all security breaches, 85% are conducted with compromised credentials, often at the administration level or higher. A lot of IT groups think “security” means authentication, authorization and encryption (AAE), but these are often tick-boxes that rarely stop breaches. The internal threat surfaces of data streams or disk drives in a raidset in a data centre are not the threat surface of interest.
Cyber or Threat organizations must conduct internal investigations of IT, subcontractors and supply chains without implicating the innocent. Therefore, they are organizationally air-gapped from IT. Some surveys indicate up to 10% of IT is under investigation at any given time.
Deploying a signal processing platform, such as Confluent Platform, allows organizations to evaluate data as soon as it becomes available, enabling them to assess and mitigate risk before it arises. In Cyber or Threat Intelligence, events can be considered signals, and when analysts are hunting for threat actors, these don't appear as a single needle in a haystack, but as a series of needles. In this paradigm, streams of signals aggregate into signatures. This session shows how various sub-systems in Apache Kafka can be used to aggregate, integrate and attribute these signals into signatures of interest.
In this talk you will learn:
-The current threat landscape
-The difference between Security and Threat Intelligence
-The value of Confluent Platform as an ideal complement to hardware endpoint detection systems and batch-based SIEM warehouses
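The notion that streams of signals aggregate into signatures can be sketched in a few lines. This is a plain-Python illustration and not Kafka Streams code or anything shown in the talk; the class `SignatureDetector`, the signal names, and the ten-minute window are hypothetical. It assumes events arrive in timestamp order and reports an actor once all required signals have been seen within the window.

```python
from collections import defaultdict, deque

class SignatureDetector:
    """Flags an actor when every required signal occurs within a sliding time window."""

    def __init__(self, required_signals, window_seconds):
        self.required = frozenset(required_signals)
        self.window = window_seconds
        self.recent = defaultdict(deque)  # actor -> deque of (timestamp, signal)

    def observe(self, ts, actor, signal):
        """Record one signal; return True if the actor's recent signals complete the signature."""
        events = self.recent[actor]
        events.append((ts, signal))
        # Drop signals that have fallen out of the window (assumes in-order timestamps).
        while events and ts - events[0][0] > self.window:
            events.popleft()
        seen = {s for _, s in events}
        return self.required <= seen

# Hypothetical signature: three signal types from one actor within 600 seconds.
detector = SignatureDetector(
    {"failed_login", "privilege_change", "bulk_download"}, window_seconds=600
)
stream = [
    (0, "alice", "failed_login"),
    (100, "bob", "failed_login"),
    (200, "alice", "privilege_change"),
    (450, "alice", "bulk_download"),
    (2000, "bob", "bulk_download"),
]
alerts = [(ts, actor) for ts, actor, signal in stream if detector.observe(ts, actor, signal)]
print(alerts)
```

No single event here is alarming on its own; only the combination for one actor inside the window produces an alert, which is the "series of needles" idea.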
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...HostedbyConfluent
Apache Druid is a high-performance distributed analytics store for modern analytics applications. It supports ingesting millions of events per second and sub-second query processing. Druid supports various types of data sources for ingestion, including Apache Kafka. You can query stream events as soon as they are ingested into Druid. Since Kafka provides scalable and robust data delivery while Druid supports advanced complex analysis on streams, Kafka and Druid are widely used together for BI and operational analytics use cases that require interactivity, scalability, real-time response, and performance.
This talk is based on our real-world experiences building out streaming analytics stacks powering production use cases across many industries.
Kafka Migration for Satellite Event Streaming Data | Eric Velte, ASRC FederalHostedbyConfluent
ASRC Federal created the Mission Operator Assist (MOA) tool to extend human capabilities through AI/ML for NOAA. MOA ingests system log data from on-orbit satellite constellations and applies machine learning to greatly improve real-time situational awareness. MOA uses a collection of tools, including Kafka for multi-subscriber communications, all hosted through AWS Cloud Services and Kubernetes Containers for microservices. Like many traditional on-premises systems, satellite ground station operations are undergoing a renaissance as they increasingly become enabled by cloud.
During this session, the audience will learn about the satellite communications chain, and best practices and lessons learned in creating a data pipeline with Kafka for high throughput and scalability while displaying high quality situational awareness to mission operators. We will discuss our goals centered around establishing event-driven streaming for satellite logs so our machine learning becomes real-time and supporting a multi-subscriber approach for various Kafka topics. Listeners will also learn how a multi-subscriber approach using Kafka helped us auto-scale Logstash and other microservices based on how many messages are in the queue.
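Scaling consumers on queue depth can be sketched as a small calculation. This is an assumed policy for illustration and not the MOA team's actual autoscaler; the function names, the per-replica lag target, and the replica bounds are hypothetical. Lag is taken as the gap between each partition's end offset and the group's committed offset.

```python
import math

def consumer_lag(end_offsets, committed_offsets):
    """Total unread messages across partitions: end offset minus committed offset."""
    return sum(
        max(0, end_offsets[p] - committed_offsets.get(p, 0)) for p in end_offsets
    )

def desired_replicas(total_lag, target_lag_per_replica, min_replicas=1, max_replicas=10):
    """Replica count that keeps per-replica lag near the target, clamped to bounds."""
    if target_lag_per_replica <= 0:
        raise ValueError("target_lag_per_replica must be positive")
    wanted = math.ceil(total_lag / target_lag_per_replica)
    return max(min_replicas, min(max_replicas, wanted))

# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 1200, 1: 900, 2: 400}
committed = {0: 1000, 1: 100, 2: 400}
lag = consumer_lag(end_offsets, committed)
replicas = desired_replicas(lag, target_lag_per_replica=300)
print(lag, replicas)
```

Within one consumer group, each partition is read by at most one consumer, so setting `max_replicas` above the topic's partition count would only add idle instances.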
Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...HostedbyConfluent
We often need to build applications that analyze Kafka data to unlock the most value from event streams, so how can organizations build these real-time analytics applications? In this talk, we examine an indexing approach that enables fast SQL analytics on data from Kafka, without data flattening or denormalization. Rockset is the real-time indexing database that builds an inverted index, a columnar index and a row index on all fields of your Kafka messages, including nested fields and arrays. This Converged Index accelerates various types of analytic queries–search, aggregations and joins–without the need to denormalize or transform data for performance reasons. With indexing delivering significant gains in query performance, we also need to index new data in a timely manner. We discuss several strategies used for efficient ingestion and indexing from Kafka, including rollups, write optimizations on the underlying RocksDB storage engine, and the disaggregation of ingest and query compute.
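To make the three-index idea concrete, here is a toy in-memory sketch. It is not Rockset's implementation or API; the class `ToyConvergedIndex`, the dotted field-path convention, and the sample documents are assumptions. Every scalar field, including those nested in objects and arrays, is written to a row store, a columnar store, and an inverted index at ingest time.

```python
from collections import defaultdict

def flatten(doc, prefix=""):
    """Yield (field_path, scalar_value) pairs, descending into nested dicts and lists."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(doc, list):
        for item in doc:
            yield from flatten(item, prefix)
    else:
        yield prefix, doc

class ToyConvergedIndex:
    def __init__(self):
        self.row = {}                                           # doc_id -> full document
        self.columnar = defaultdict(dict)                       # field -> {doc_id: [values]}
        self.inverted = defaultdict(lambda: defaultdict(set))   # field -> value -> {doc_ids}

    def add(self, doc_id, doc):
        """Index one document in all three structures."""
        self.row[doc_id] = doc
        for field, value in flatten(doc):
            self.columnar[field].setdefault(doc_id, []).append(value)
            self.inverted[field][value].add(doc_id)

    def search(self, field, value):
        """Point lookup served by the inverted index."""
        return sorted(self.inverted.get(field, {}).get(value, set()))

    def total(self, field):
        """Aggregation served by scanning a single column."""
        return sum(v for values in self.columnar.get(field, {}).values() for v in values)

index = ToyConvergedIndex()
index.add(1, {"user": {"id": "u1"}, "items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}]})
index.add(2, {"user": {"id": "u2"}, "items": [{"sku": "a", "qty": 5}]})
print(index.search("items.sku", "a"), index.total("items.qty"))
```

The trade-off the abstract points at is visible even here: each write touches three structures, which is why efficient ingestion matters once queries are fast.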
The process of streaming real-time data from a wide variety of machine data sources and entities can be very complex and unwieldy. Using an agent-based approach, Informatica has invented a new technique and open access product that makes this process much more user friendly and efficient, even when dealing with multiple environments such as Hadoop, Cassandra, Storm, Amazon Kinesis and Complex Event Processing.
How a Data Mesh is Driving our Platform | Trey Hicks, GlooHostedbyConfluent
At Gloo.us, we face a challenge in providing platform data to heterogeneous applications in a way that eliminates access contention, avoids high-latency ETLs, and ensures consistency for many teams. We're solving this problem by adopting Data Mesh principles and leveraging Kafka, Kafka Connect, and Kafka Streams to build an event-driven architecture to connect applications to the data they need. A domain-driven design keeps the boundaries between specialized process domains and singularly focused data domains clear, distinct, and disciplined. Applying the principles of a Data Mesh, process domains assume the responsibility of transforming, enriching, or aggregating data rather than relying on these changes at the source of truth -- the data domains. Architecturally, we've broken centralized big data lakes into smaller data stores that can be consumed into storage managed by process domains.
This session covers how we’re applying Kafka tools to enable our data mesh architecture. This includes how we interpret and apply the data mesh paradigm, the role of Kafka as the backbone for a mesh of connectivity, the role of Kafka Connect to generate and consume data events, and the use of KSQL to perform minor transformations for consumers.
Power Your Delta Lake with Streaming Transactional ChangesDatabricks
Organizations are adopting data digitization, and data-driven decision making is at the heart of this transformation. Cloud data lakes and data warehouses provide great flexibility to prototype and roll out applications continuously at much lower costs.
Transactional databases are optimized for processing huge volumes of transactions in real time, whereas the cloud data lake needs to be optimized for analyzing huge volumes of data quickly. This creates a challenge: building a streamlined data flow that captures real-time transactions into a cloud data warehouse to drive real-time insights in a scalable and cost-effective manner.
In this session, we’ll show how organizations can easily overcome that challenge by adopting a robust platform with StreamSets and Delta Lake. StreamSets provides a no-code framework to automate ingestion of transactional data and data processing on Spark, while Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing.
Kafka in the Enterprise—A Two-Year Journey to Build a Data Streaming Platform...confluent
(Benny Lee + Christopher Arthur, Bank of Australia) Kafka Summit SF 2018
Commonwealth Bank of Australia (CBA) is Australia’s largest bank with over 15m customers, 50,000 employees and over USD700 billion in assets. We started the journey two years ago to transform our existing enterprise architecture into an “event driven” architecture. Since then, Kafka has become a mission critical platform in the Bank and it is the core component in our “event driven” architecture strategy.
In this talk, we will walk you through the journey of how we stood up the initial Kafka clusters, the challenges we encountered (both technical and organisational) and how we overcame those challenges.
We will also deep dive into one of the use cases for Kafka (with Kafka Streams and Connectors) in our new real time payment system that was introduced in Australia early this year. We will discuss why we think Kafka was the perfect solution for this use case, and the lessons learned.
Key Takeaways:
-Lessons learned from our experiences (that we think other companies could benefit from)
-Our use cases for Kafka with a particular focus on the new real time payment systems (NPP) initiative in Australia
Stream processing IoT time series data with Kafka & InfluxDB | Al Sargent, In...HostedbyConfluent
Time series data is everywhere -- connected IoT devices, application monitoring & observability platforms, and more. What makes time series datastreams challenging is that they often have orders of magnitude more data than other workloads, with millions of time series datapoints being quite common. Given its ability to ingest high volumes of data, Kafka is a natural part of any data architecture handling large volumes of time series telemetry, specifically as an intermediate buffer before that data is persisted in InfluxDB for processing, analysis, and use in other applications. In this session, we will show you how you can stream time series data to your IoT application using Kafka queues and InfluxDB, drawing upon deployments done at Hulu and Wayfair that allow both to ingest 1 million metrics per second. Once this session is complete, you’ll be able to connect a Kafka queue to an InfluxDB instance as the beginning of your own time series data pipeline.
Nordstrom's Event-Sourced Architecture and Kafka-as-a-Service | Adam Weyant a...HostedbyConfluent
As a 120-year-old company, Nordstrom was facing numerous challenges as a result of an aging, service-oriented architecture. Developers needing to implement reporting for analytics separately from core functionality resulted in questionable data quality for analytical purposes. Scaling dependent services in harmony so they did not overwhelm each other was a struggle faced by many, if not most, teams. Several years into a company-wide transition to an event-sourced architecture, Nordstrom has solved these and various other problems. By leveraging the capabilities of Apache Kafka and Confluent, combined with a deep organizational focus on well-defined business event schemas, a single event can be used for analytical, functional, operational, and model building purposes. This session will describe this architecture and the lessons learned while building it, with a focus on the internally built, multi-tenant, multi-cluster, Kafka-as-a-Service platform that enables it.
Digital transformation: Highly resilient streaming architecture and strategie...HostedbyConfluent
Failure is inevitable in any distributed system, but anticipating failures and building systems that recover from them instantaneously makes the system highly resilient. At Capital One we process billions of events every day, and we leverage cloud, microservices, streaming and machine learning technologies to solve customer problems and provide the best customer experience.
In this session I will talk about the highly resilient streaming architecture that supports processing billions of events every day, then cover some of the strategies and best practices for building highly available and fault-tolerant systems using Kafka and cloud environments.
The data that organizations are required to analyze in order to make informed decisions is growing at an unprecedented rate. Companies have to capture the window of opportunity and become not just data driven, but event driven. In this talk, we will discuss how to address these issues and look into ways to bridge on-premises Kafka deployments with the GCP stack for different use cases and personas. This will be followed by architecture examples showing how to deploy Kafka and integrate it with the rest of the GCP stack.
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...HostedbyConfluent
Apache Kafka is used as the primary message bus for propagating events and logs across Uber. In particular, it pairs with Apache Pinot, a real-time distributed OLAP datastore, to deliver real-time insights seconds after the messages are produced to Kafka.
One challenge we faced was updating existing data in Pinot from the changelog in Kafka while delivering an accurate view in the real-time analytical results. For example, the financial dashboard can report gross bookings with corrected ride fares, and restaurant owners can analyze UberEats orders with their latest delivery status.
Implementing upserts in an immutable real-time OLAP store like Pinot is nontrivial. We need to make architectural changes in how data is distributed via Kafka amongst the server nodes, how it's indexed and queried in a distributed fashion. In this talk I will discuss how we leveraged Kafka's partition-by-key feature to this end and how we added this ability in Pinot without any performance degradation.
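Why partition-by-key makes upserts tractable can be shown with a small sketch. It is not Pinot's implementation; the class `PartitionUpsertTable`, the CRC32 hash (a stand-in, since Kafka's default partitioner uses a different hash function), and the sample fares are assumptions. Because every record for a key lands on the same partition, one server can track the latest row per key locally while the stored rows stay append-only.

```python
import zlib

def partition_for(key, num_partitions):
    """Map a key to a partition. CRC32 is a stand-in hash chosen for illustration."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

class PartitionUpsertTable:
    """One server's view of one partition: append-only rows plus a key -> latest-row map."""

    def __init__(self):
        self.rows = []     # rows are never modified in place
        self.latest = {}   # primary key -> index of the newest row for that key

    def ingest(self, key, record):
        self.rows.append(record)
        self.latest[key] = len(self.rows) - 1  # older rows for this key become invisible

    def query(self):
        """Return only the newest row per key, in ingestion order."""
        return [self.rows[i] for i in sorted(self.latest.values())]

NUM_PARTITIONS = 4
tables = [PartitionUpsertTable() for _ in range(NUM_PARTITIONS)]

# Hypothetical changelog: ride-1's fare is corrected by a later message.
changelog = [
    ("ride-1", {"ride": "ride-1", "fare": 10.0}),
    ("ride-2", {"ride": "ride-2", "fare": 7.5}),
    ("ride-1", {"ride": "ride-1", "fare": 12.0}),
]
for key, record in changelog:
    tables[partition_for(key, NUM_PARTITIONS)].ingest(key, record)

gross = sum(r["fare"] for t in tables for r in t.query())
stored = sum(len(t.rows) for t in tables)
print(gross, stored)
```

All three messages remain stored, yet the aggregate reflects only the corrected fare, and no cross-server coordination was needed to decide which row is current.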
Successful AI/ML Projects with End-to-End Cloud Data EngineeringDatabricks
Trusted, high-quality data and efficient use of data engineers’ time are critical success factors for AI/ML projects. Enterprise data is complex—it comes from several sources, in a variety of formats, and at varied speeds. For your machine learning projects on Apache Spark, you need a holistic approach to data engineering: finding & discovering, ingesting & integrating, server-less processing at scale, and data governance. Stop by this session for an overview on how to set up AI/ML projects for success while Informatica takes the heavy lifting out of your data engineering.
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...HostedbyConfluent
Are you looking for a cloud-based architecture that includes best-of-breed streaming and database technologies? In this session you will learn how to set up and configure Confluent Cloud with MongoDB Atlas. We'll start the journey by learning about the basic connectivity between the two cloud services and end with a brief look at what you can do with data once it is in MongoDB Atlas. By the end of this session you will know how to securely set up and configure the MongoDB Atlas connectors in Confluent Cloud in both a source and a sink configuration.
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...HostedbyConfluent
Netflix will spend roughly 19B+ on content in 2021 over 100 countries to entertain over 200M subscribers. With this scale of investment in internationalized productions, it is essential to intelligently invest and track the movement of cash, in near-real-time, from pre-production to launch.
This involves complex near-real-time financial calculations, data movement, and consistent views of present transactional data and future cash predictions. It needs exactly-once processing semantics and idempotency while guaranteeing a low-latency serving layer. We leverage Kafka to build a scalable and fault-tolerant event-based processing engine that provides the right SLA guarantees.
We also embrace event-driven data materialization to provide low latency lookups at scale. The aim of this talk is to provide an insight into how we built such a scalable system while embracing a full blown Kappa architecture, with Kafka at the heart of it all.
Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...confluent
(Dmitry Milman + Ankur Kaneria, Express Scripts) Kafka Summit SF 2018
Building cloud-based microservices can be a challenge when the system of record is a relational database residing on an on-premise mainframe. The challenge lies in the ability to efficiently and cost-effectively access the ever-increasing amount of data. Express Scripts is reimagining its data architecture to bring best-in-class user experience and provide the foundation of next-generation applications.
This talk will showcase how Kafka plays a key role within Express Scripts’ transformation from mainframe to a microservice-based ecosystem, ensuring data integrity between two worlds. It will discuss how change data capture (CDC) is leveraged to stream data changes to Kafka, allowing us to build a low-latency data sync pipeline. We will describe how we achieve transactional consistency by collapsing all events that belong together onto a single topic, yet have the ability to scale out to meet the real time SLAs and low-latency requirements through means of partitions. We will share our Kafka Streams configuration to handle the data transformation workload. We will discuss our overall Kafka cluster footprint, configuration and security measures.
Express Scripts Holding Company is an American Fortune 100 company. As of 2018, the company is the 25th largest in the U.S. as well as one of the largest pharmacy benefit management organizations in the U.S. Customers rely on 24/7 access to our services, and need the ability to interact with our systems in real time via various channels such as web and mobile. Sharing our mainframe to microservices migration journey, our experiences and lessons learned would be beneficial to other companies venturing on a similar path.
Your LAN can now be delivered as a cloud-orchestrated service. Learn about key benefits, differentiators and business outcomes, while we deep dive into the Network as a Service (NaaS) concept, end-user experience and analytics.
- What is and Why NaaS?
- NaaS Utility Model
- Analytics to Optimize Your Cloud Network
- Live Q&A
Kafka Migration for Satellite Event Streaming Data | Eric Velte, ASRC FederalHostedbyConfluent
ASRC Federal created the Mission Operator Assist (MOA) tool to extend human capabilities through AI/ML for NOAA. MOA ingests system log data from on-orbit satellite constellations and applies machine learning to greatly improve real-time situational awareness. MOA uses a collection of tools, including Kafka for multi-subscriber communications, all hosted through AWS Cloud Services and Kubernetes Containers for microservices. Like many traditional on-premises systems, satellite ground station operations are undergoing a renaissance as they increasingly become enabled by cloud.
During this session, the audience will learn about the satellite communications chain, and best practices and lessons learned in creating a data pipeline with Kafka for high throughput and scalability while displaying high quality situational awareness to mission operators. We will discuss our goals centered around establishing event-driven streaming for satellite logs so our machine learning becomes real-time and supporting a multi-subscriber approach for various Kafka topics. Listeners will also learn how a multi-subscriber approach using Kafka, helped us auto scale logstash based on how many messages are in the queue and other microservices.
Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...HostedbyConfluent
We often need to build applications that analyze Kafka data to unlock the most value from event streams, so how can organizations build these real-time analytics applications? In this talk, we examine an indexing approach that enables fast SQL analytics on data from Kafka, without data flattening or denormalization. Rockset is the real-time indexing database that builds an inverted index, a columnar index and a row index on all fields of your Kafka messages, including nested fields and arrays. This Converged Index accelerates various types of analytic queries–search, aggregations and joins–without the need to denormalize or transform data for performance reasons. With indexing delivering significant gains in query performance, we also need to index new data in a timely manner. We discuss several strategies used for efficient ingestion and indexing from Kafka, including rollups, write optimizations on the underlying RocksDB storage engine, and the disaggregation of ingest and query compute.
The process of streaming real-time data from a wide variety of machine data sources and entities can be very complex and unwieldy. Using an agent-based approach, Informatica has invented a new technique and open access product that makes this process much more user friendly and efficient, even when dealing with multiple environments such as Hadoop, Cassandra, Storm, Amazon Kinesis and Complex Event Processing.
How a Data Mesh is Driving our Platform | Trey Hicks, GlooHostedbyConfluent
At Gloo.us, we face a challenge in providing platform data to heterogeneous applications in a way that eliminates access contention, avoids high latency ETLs, and ensures consistency for many teams. We're solving this problem by adopting Data Mesh principles and leveraging Kafka, Kafka Connect, and Kafka streams to build an event driven architecture to connect applications to the data they need. A domain driven design keeps the boundaries between specialized process domains and singularly focused data domains clear, distinct, and disciplined. Applying the principles of a Data Mesh, process domains assume the responsibility of transforming, enriching, or aggregating data rather than relying on these changes at the source of truth -- the data domains. Architecturally, we've broken centralized big data lakes into smaller data stores that can be consumed into storage managed by process domains.
This session covers how we’re applying Kafka tools to enable our data mesh architecture. This includes how we interpret and apply the data mesh paradigm, the role of Kafka as the backbone for a mesh of connectivity, the role of Kafka Connect to generate and consume data events, and the use of KSQL to perform minor transformations for consumers.
Power Your Delta Lake with Streaming Transactional ChangesDatabricks
Organizations are adopting data digitization and data-driven decision making is at the heart of this transformation. Cloud Data Lakes and Datawarehouses provide great flexibility to proto-type and roll out applications continuously at much lower costs.
Transactional databases are optimized for processing huge volumes of transactions in real-time, whereas the cloud data lake needs to be optimized for analyzing huge volumes of data quickly. This brings about a challenge in creating a streamlined data flow process from capturing realtime transactions into a cloud datawarehouse to drive realtime insights in a scalable and cost effective manner.
In this session, we’ll show how organizations can easily overcome that challenge by adopting a robust platform with StreamSets and Delta Lake. StreamSets provides a no-code framework to automate ingestion of transactional data and data processing on Spark, while Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing.
Kafka in the Enterprise—A Two-Year Journey to Build a Data Streaming Platform...confluent
(Benny Lee + Christopher Arthur, Bank of Australia) Kafka Summit SF 2018
Commonwealth Bank of Australia (CBA) is Australia’s largest bank with over 15m customers, 50,000 employees and over USD700 billion in assets. We started the journey two years ago to transform our existing enterprise architecture into an “event driven” architecture. Since then, Kafka has become a mission critical platform in the Bank and it is the core component in our “event driven” architecture strategy.
In this talk, we will walk you through the journey of how we stood up the initial Kafka clusters, the challenges we encountered (both technical and organisational) and how we overcame those challenges.
We will also deep dive into one of the use cases for Kafka (with Kafka Streams and Connectors) in our new real time payment system that was introduced in Australia early this year. We will discuss why we think Kafka was the perfect solution for this use case, and the lessons learned.
Key Takeaways:
-Lessons learned from our experiences (that we think other companies could be able to benefit from)
-Our use cases for Kafka with a particular focus on the new real time payment systems (NPP) initiative in Australia
Stream processing IoT time series data with Kafka & InfluxDB | Al Sargent, In...HostedbyConfluent
Time series data is everywhere -- connected IoT devices, application monitoring & observability platforms, and more. What makes time series datastreams challenging is that they often have orders of magnitude more data than other workloads, with millions of time series datapoints being quite common. Given its ability to ingest high volumes of data, Kafka is a natural part of any data architecture handling large volumes of time series telemetry, specifically as an intermediate buffer before that data is persisted in InfluxDB for processing, analysis, and use in other applications. In this session, we will show you how you can stream time series data to your IoT application using Kafka queues and InfluxDB, drawing upon deployments done at Hulu and Wayfair that allow both to ingest 1 million metrics per second. Once this session is complete, you’ll be able to connect a Kafka queue to an InfluxDB instance as the beginning of your own time series data pipeline.
Nordstrom's Event-Sourced Architecture and Kafka-as-a-Service | Adam Weyant a...HostedbyConfluent
As a 120-year-old company, Nordstrom was facing numerous challenges as a result of an aging, service-oriented architecture. Developers needing to implement reporting for analytics separately from core functionality resulted in questionable data quality for analytical purposes. Scaling dependent services in harmony to not overwhelm each other was a struggle faced by many, if not most, teams. Several years into a company-wide transition to an event-sourced architecture, Nordstrom has solved these and various other problems. By leveraging the capabilities of Apache Kafka and Confluent, combined with a deep organizational focus on well-defined business event schemas, a singular event can be used for analytical, functional, operational, and model building purposes. This session will describe this architecture and the lessons learned while building it, with a focus on the internally built, multi-tenant, multi-cluster, Kafka-as-a-Service platform that enables it.
Digital transformation: Highly resilient streaming architecture and strategie...HostedbyConfluent
Failure is inevitable in any distributed system, but anticipating failures and building systems that recover from them instantaneously makes the system highly resilient. At Capital One we process billions of events every day and we leverage cloud, microservices, streaming and machine learning technologies to solve customer problems and provide the best customer experience.
In this session I will talk about the highly resilient streaming architecture that supports processing billions of events every day, and then cover some of the strategies and best practices for building highly available and fault-tolerant systems using Kafka and cloud environments.
The data that organizations are required to analyze in order to make informed decisions is growing at an unprecedented rate. Companies have to capture the window of opportunity and become not just data driven, but event driven. In this talk, we will discuss how to address these issues and look into ways to bridge on-premise Kafka deployments with the GCP stack for different use cases and personas. This will be followed by architecture examples showing how to deploy Kafka and integrate it with the rest of the GCP stack.
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...HostedbyConfluent
Apache Kafka is used as the primary message bus for propagating events and logs across Uber. In particular, it pairs with Apache Pinot, a real-time distributed OLAP datastore, to deliver real-time insights seconds after the messages are produced to Kafka.
One challenge we faced was to update existing data in Pinot with the changelog in Kafka, and deliver an accurate view in the real-time analytical results. For example, the financial dashboard can report gross booking with the corrected Ride fares. And restaurant owners can analyze the UberEats orders with their latest delivery status.
Implementing upserts in an immutable real-time OLAP store like Pinot is nontrivial. We need to make architectural changes in how data is distributed via Kafka amongst the server nodes, how it's indexed and queried in a distributed fashion. In this talk I will discuss how we leveraged Kafka's partition-by-key feature to this end and how we added this ability in Pinot without any performance degradation.
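To illustrate why partitioning by key helps with upserts, here is a minimal Python sketch. It is not Uber's or Pinot's implementation: the partition count, the CRC32 hash (Kafka's default partitioner uses a different hash function), and the record shape are all assumptions made for illustration. The point it shows is that every record for a given key lands in the same partition, so a single consumer can keep the latest value for that key without coordinating with other nodes.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (a stand-in for Kafka's partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def apply_changelog(records, num_partitions: int = NUM_PARTITIONS):
    """Replay (key, value) records in order; each partition keeps only the latest value per key."""
    tables = defaultdict(dict)
    for key, value in records:
        tables[partition_for(key, num_partitions)][key] = value
    return tables


# Hypothetical changelog: the third record corrects the fare of ride-1.
changelog = [
    ("ride-1", {"fare": 10.0}),
    ("ride-2", {"fare": 7.5}),
    ("ride-1", {"fare": 12.0}),
]
tables = apply_changelog(changelog)

# Merge the per-partition views; each key appears in exactly one partition.
latest = {key: value for table in tables.values() for key, value in table.items()}
print(latest["ride-1"])
```

Because both `ride-1` records hash to the same partition, the correction overwrites the earlier fare locally, which is the property an upsert-capable store can build on.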
Successful AI/ML Projects with End-to-End Cloud Data EngineeringDatabricks
Trusted, high-quality data and efficient use of data engineers’ time are critical success factors for AI/ML projects. Enterprise data is complex—it comes from several sources, in a variety of formats, and at varied speeds. For your machine learning projects on Apache Spark, you need a holistic approach to data engineering: finding & discovering, ingesting & integrating, server-less processing at scale, and data governance. Stop by this session for an overview on how to set up AI/ML projects for success while Informatica takes the heavy lifting out of your data engineering.
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...HostedbyConfluent
Are you looking for a cloud-based architecture that includes the best of breed streaming and database technologies? In this session you will learn how to set up and configure the Confluent Cloud with MongoDB Atlas. We'll start the journey learning about the basic connectivity between the two cloud services and end with a brief discovery of what you can do with data once it is in MongoDB Atlas. By the end of this session you will know how to securely set up and configure the MongoDB Atlas connectors in the Confluent Cloud in both a source and sink configuration.
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...HostedbyConfluent
Netflix will spend roughly 19B+ on content in 2021 over 100 countries to entertain over 200M subscribers. With this scale of investment in internationalized productions, it is essential to intelligently invest and track the movement of cash, in near-real-time, from pre-production to launch.
This involves complex near real time financial calculations, data movement, consistent data views of transactional data of the present and the cash predictions of the future. It needs exactly once processing semantics and idempotency while guaranteeing a low latency serving layer. We leverage Kafka to build a scalable and fault-tolerant event-based processing engine to provide the right SLA guarantees.
We also embrace event-driven data materialization to provide low latency lookups at scale. The aim of this talk is to provide an insight into how we built such a scalable system while embracing a full blown Kappa architecture, with Kafka at the heart of it all.
Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...confluent
(Dmitry Milman + Ankur Kaneria, Express Scripts) Kafka Summit SF 2018
Building cloud-based microservices can be a challenge when the system of record is a relational database residing on an on-premise mainframe. The challenge lies in the ability to efficiently and cost-effectively access the ever-increasing amount of data. Express Scripts is reimagining its data architecture to bring best-in-class user experience and provide the foundation of next-generation applications.
This talk will showcase how Kafka plays a key role within Express Scripts’ transformation from mainframe to a microservice-based ecosystem, ensuring data integrity between two worlds. It will discuss how change data capture (CDC) is leveraged to stream data changes to Kafka, allowing us to build a low-latency data sync pipeline. We will describe how we achieve transactional consistency by collapsing all events that belong together onto a single topic, yet have the ability to scale out to meet the real time SLAs and low-latency requirements through means of partitions. We will share our Kafka Streams configuration to handle the data transformation workload. We will discuss our overall Kafka cluster footprint, configuration and security measures.
Express Scripts Holding Company is an American Fortune 100 company. As of 2018, the company is the 25th largest in the U.S. as well as one of the largest pharmacy benefit management organizations in the U.S. Customers rely on 24/7 access to our services, and need the ability to interact with our systems in real time via various channels such as web and mobile. Sharing our mainframe-to-microservices migration journey, our experiences and lessons learned would be beneficial to other companies venturing on a similar path.
Your LAN can now be delivered as a cloud-orchestrated service. Learn about key benefits, differentiators and business outcomes, while we deep dive into the Network as a Service (NaaS) concept, end-user experience and analytics.
- What is and Why NaaS?
- NaaS Utility Model
- Analytics to Optimize Your Cloud Network
- Live Q&A
High Scalability Network Performance Management for EnterprisesCA Technologies
CA Performance Management is a big data collection, warehousing and analytics solution that helps enterprises maximize return on their network infrastructure investments and lower the cost of network operations.
Learn more about CA Performance Management here: http://bit.ly/1vrQPJB
General discussions
Why cloud?
The terminology: relating virtualization and cloud
Types of Virtualization and Cloud deployment model
Decisive factors in migration
Hands-on cloud deployment
Cloud for banks
Cloud computing is a general term used to describe a new class of network-based computing that takes place over the Internet: a collection of integrated and networked hardware, software and Internet infrastructure that provides hardware, software and networking services to clients.
These platforms (networked hardware) hide their complexity to provide a very simple service.
With all the hype around Cloud and SDN, business decision makers are finding themselves trying to navigate through many new concepts and consequently needing to change the way they have traditionally selected their IT infrastructure. Technologies are now becoming more integrated and it is more important than ever to help your business be agile enough to keep up with the demands of your users and your customers. Come hear from Lisa Guess to learn how organizations can embrace Cloud technologies such as automation, SDN and Orchestration platforms to help you build next-generation networks.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
2. The Cloud is a Digital Supply Chain
• SaaS, PaaS, IaaS are major suppliers for your users
• Enterprises are offering more cloud-based services
• Mobile apps
• E-commerce
• Which function and depend on Web APIs
• Maps, Search, Ads, etc.
• The Internet is the global freight routing system
• Must be high performing
3. Cloud-Aware Net Mgmt: Strategic Considerations
• Assures delivery of performance and user experience
• Deals with reality of Internet security
• Particularly DDoS because it is as much an operational availability issue as a security challenge
• Leverages redundancy via multi-homing and CDN infrastructure
4. Cloud-Aware Net Mgmt: Tactics
• Collect detailed traffic flow information
• Instrument key nexus servers with performance metrics collection
• Utilize advanced analytics
• Deploy synthetic testing to understand availability
• Limited reliance on traditional deep packet capture techniques, which are cumbersome for cloud networking
5. Elements of Cloud Network Management
• NetFlow, sFlow, IPFIX traffic flow data export
• Sampled flows are fine
• Passive BGP peering
• Cost-effective server-side network instrumentation
• Granular, tune-able alerts for anomalies & attacks
• Deep analytical visibility
• Automated remediation
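As a rough illustration of why sampled flows can be good enough, the sketch below scales byte counts from 1-in-N sampled flow records back up to an estimate of total traffic per destination. The record fields, the sampling rate of 1000, and the assumption that the exporter has not already scaled its counters are all hypothetical; real NetFlow, sFlow and IPFIX exporters differ in how they report sampling.

```python
from collections import Counter


def estimate_bytes_by_dst(sampled_flows, sampling_rate):
    """Estimate total bytes per destination by multiplying sampled bytes by the sampling rate."""
    estimates = Counter()
    for flow in sampled_flows:
        estimates[flow["dst_ip"]] += flow["bytes"] * sampling_rate
    return estimates


# Hypothetical records exported with 1-in-1000 packet sampling.
sampled = [
    {"dst_ip": "203.0.113.10", "bytes": 1500},
    {"dst_ip": "203.0.113.10", "bytes": 500},
    {"dst_ip": "198.51.100.7", "bytes": 1000},
]
estimates = estimate_bytes_by_dst(sampled, sampling_rate=1000)
for dst_ip, total in estimates.most_common():
    print(dst_ip, total)
```

The estimate is statistical, so it is most trustworthy for large aggregates such as top talkers, which is typically what traffic analysis needs.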
6. Monitoring Considerations
• Global visibility
• Top-down visibility
• Full details for drill-downs
• More than just summaries
• Not siloed
• Integrate with other tools, dashboards, etc.
• Data/views easily shared with many functional teams
• Supports fully hybrid environments
7. Alerting Considerations
• Network-wide
• Scalable with detail
• Host-level capable
• Dynamic anomaly detection (self-learning what is normal behavior)
• Flexible integration with your choice of notification as well as automated remediation
• E.g. DDoS scrubbers, load balancers, network orchestration
• Alerting & detection needs to be complemented by deep analytics
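One simple way to picture "self-learning what is normal behavior" is an exponentially weighted baseline that flags values far from the learned mean. The sketch below is only an illustration under stated assumptions: the class name, the smoothing factor, the threshold of three standard deviations and the warm-up length are hypothetical, and it is not the detection method of any particular product.

```python
import math


class EwmaBaseline:
    """Learns a running mean and variance and flags values that deviate strongly from them."""

    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha          # weight given to each new sample
        self.threshold = threshold  # allowed deviation, in standard deviations
        self.warmup = warmup        # samples to observe before alerting
        self.mean = None
        self.var = 0.0
        self.count = 0

    def observe(self, value):
        """Return True if value looks anomalous; otherwise fold it into the baseline."""
        self.count += 1
        if self.mean is None:
            self.mean = float(value)
            return False
        deviation = value - self.mean
        std = math.sqrt(self.var)
        is_anomaly = (
            self.count > self.warmup
            and std > 0
            and abs(deviation) > self.threshold * std
        )
        if not is_anomaly:
            # Learn only from samples considered normal, so an attack does not become the baseline.
            self.mean += self.alpha * deviation
            self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return is_anomaly


# Hypothetical per-minute traffic volumes (Mbps) followed by a sudden spike.
series = [100, 104, 96, 103, 97, 102, 98, 101, 99, 100, 1000]
baseline = EwmaBaseline()
flags = [baseline.observe(v) for v in series]
print(flags[-1])
```

In practice the flagged value would be handed to a notification or remediation hook; that integration is left out here.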
8. Reality of Network Big Data
• Network data is big data
• Commonplace to generate hundreds of millions of data records per day
• Traditional approaches very limited
• Only produced roll-up summaries
• Okay for top-level views
• Useless for real action
• Compute/storage scale means big data analytics are now relevant
• Recent announcement by Cisco on Tetration Analytics is major signal
• Key is to go past BI and have operational speed
9. Big Data Challenges for Network Analytics
• Ingest speed
• Latency to query
• Time to query response
• Pre-computed cubes
• On the fly
10. Advanced (Big Data) Network Analytics
• Need to enable engineers to leverage their technical and institutional knowledge effectively
• Ad-hoc queries across massive datasets in a timely manner
• Multi-dimensional analytics
• Combine and visualize multiple fields
• Like a massive pivot table
• Complemented by automated analyses that reveal complex relationships
• Practically speaking, turning insightful ad-hoc queries into dashboards
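The "massive pivot table" idea can be sketched in a few lines of Python: group raw flow records by two fields chosen at query time and sum a third. The field names and sample records are hypothetical, and a real system would run this kind of query over a distributed store rather than an in-memory list.

```python
from collections import defaultdict


def pivot(records, row_field, col_field, value_field):
    """Sum value_field for every (row_field, col_field) combination found in the records."""
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        table[rec[row_field]][rec[col_field]] += rec[value_field]
    return {row: dict(cols) for row, cols in table.items()}


# Hypothetical raw flow records kept at full detail.
flows = [
    {"src_country": "US", "protocol": "TCP", "dst_port": 443, "bytes": 400},
    {"src_country": "US", "protocol": "UDP", "dst_port": 53, "bytes": 100},
    {"src_country": "DE", "protocol": "UDP", "dst_port": 53, "bytes": 300},
    {"src_country": "DE", "protocol": "UDP", "dst_port": 53, "bytes": 50},
]

# The same raw data answers different ad-hoc questions without a pre-built summary.
by_country_and_protocol = pivot(flows, "src_country", "protocol", "bytes")
by_port_and_country = pivot(flows, "dst_port", "src_country", "bytes")
print(by_country_and_protocol)
print(by_port_and_country)
```

Keeping the raw records is what makes the second question answerable; a roll-up computed only by country would have discarded the port dimension.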
22. Case Example: Summary
- Unusual traffic patterns from suspect Geo
- Turned out to be DNS Amplification targeting a specific dest IP
- But main attack was hiding other attacks/exploits
- Data harvested for mitigation
- Time required to complete this analysis: 3 minutes!
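A query of the kind that could surface a DNS amplification pattern in flow data might look like the sketch below. It is an assumption-laden illustration, not the analysis performed in the case example: the field names, the sample records and both thresholds are hypothetical. It relies only on the general fact that DNS responses arrive over UDP from source port 53, and that an amplification attack shows many responders sending large volumes to one destination.

```python
from collections import defaultdict


def find_dns_amplification_targets(flows, min_sources=3, min_bytes=10_000):
    """Return destination IPs receiving heavy DNS response traffic from many distinct sources."""
    sources = defaultdict(set)
    total_bytes = defaultdict(int)
    for flow in flows:
        # DNS responses are sent over UDP with source port 53.
        if flow["protocol"] == "UDP" and flow["src_port"] == 53:
            sources[flow["dst_ip"]].add(flow["src_ip"])
            total_bytes[flow["dst_ip"]] += flow["bytes"]
    return sorted(
        dst
        for dst in sources
        if len(sources[dst]) >= min_sources and total_bytes[dst] >= min_bytes
    )


# Hypothetical flow records: three resolvers flooding one host, plus ordinary traffic.
flows = [
    {"src_ip": "192.0.2.1", "dst_ip": "203.0.113.10", "protocol": "UDP", "src_port": 53, "bytes": 4000},
    {"src_ip": "192.0.2.2", "dst_ip": "203.0.113.10", "protocol": "UDP", "src_port": 53, "bytes": 4000},
    {"src_ip": "192.0.2.3", "dst_ip": "203.0.113.10", "protocol": "UDP", "src_port": 53, "bytes": 4000},
    {"src_ip": "192.0.2.1", "dst_ip": "198.51.100.7", "protocol": "UDP", "src_port": 53, "bytes": 200},
    {"src_ip": "192.0.2.9", "dst_ip": "198.51.100.7", "protocol": "TCP", "src_port": 443, "bytes": 90000},
]
targets = find_dns_amplification_targets(flows)
print(targets)
```

The list of source addresses collected per target is the sort of data that could then be passed on for mitigation.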