Twitter is powered by thousands of microservices running on an internal cloud platform consisting of a suite of multitenant platform services that offer compute, storage, messaging, monitoring, etc. as a service. These platforms have thousands of tenants and run atop hundreds of thousands of servers, both on-premises and in the public cloud. The scale and diversity of Twitter's multitenant infrastructure services make it extremely difficult to effectively forecast capacity, compute resource utilization, and cost, and to drive efficiency.
Vinu Charanya explains how she and her team are building a system that captures, defines, provisions, meters, and charges infrastructure resources, redefining how systems are built atop Twitter infrastructure. The infrastructure resources include primitive bare metal servers and VMs in the public cloud and abstract resources offered by multitenant services such as a compute platform (powered by Apache Aurora and Mesos), storage (Manhattan for key-value, cache, RDBMS), and observability. Along the way, Vinu shares how Twitter used this data to better plan capacity and drive a cultural change in engineering that helped improve overall resource utilization and led to significant savings in infrastructure spend.
Changing landscapes in data integration - Kafka Connect for near real-time da...HostedbyConfluent
1. The document discusses Kafka Connect and its evolution for managing near real-time streaming pipelines. It describes how Kafka Connect can be used for data integration across different systems and challenges around identity when deploying Kafka Connect.
2. It introduces the concept of a managed Kafka Connect which deploys Kafka Connect on the customer's own Kubernetes namespace to avoid security and identity issues. The managed Connect is configured and managed through a centralized control plane.
3. It details how the managed Kafka Connect control plane can be used to provision Kafka Connect clusters, deploy connectors between different systems, override connector configurations, and monitor tasks.
Apache Kafka® is the technology behind event streaming which is fast becoming the central nervous system of flexible, scalable, modern data architectures. Customers want to connect their databases, data warehouses, applications, microservices and more, to power the event streaming platform. To connect to Apache Kafka, you need a connector!
This online talk dives into the new Verified Integrations Program and the integration requirements, the Connect API and sources and sinks that use Kafka Connect. We cover the verification steps and provide code samples created by popular application and database companies. We will discuss the resources available to support you through the connector development process.
This is Part 2 of 2 in Building Kafka Connectors - The Why and How
Confluent real time_acquisition_analysis_and_evaluation_of_data_streams_20190...confluent
Speaker: Perry Krol, Senior Sales Engineer, Confluent Germany GmbH
Title of Talk:
Introduction to Apache Kafka as Event-Driven Open Source Streaming Platform
Abstract:
Apache Kafka is the de facto standard event streaming platform: it is widely deployed as a messaging system and provides a robust data integration framework (Kafka Connect) and a stream processing API (Kafka Streams) to meet the needs that commonly attend real-time, event-driven data processing.
The open source Confluent Platform adds further components such as KSQL, Schema Registry, REST Proxy, clients for different programming languages, and connectors for various technologies and databases. This session explains the concepts, architecture, and technical details, including live demos.
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...confluent
Application teams in JPMC have started shifting toward event-driven architectures and real-time streaming pipelines, and Kafka has been at the core of this journey. As application teams adopted Kafka rapidly, the need for a centrally managed Kafka as a service emerged. We started delivering Kafka as a service in early 2018 and have been running it in production for more than a year, operating 80+ clusters (and growing) across all environments. One of the key requirements is to provide a truly segregated, secure multi-tenant environment with an RBAC model while satisfying financial regulations and controls at the same time. Operating clusters at large scale requires scalable self-service capabilities and cluster management orchestration. In this talk we will present:
- Our experiences delivering and operating secure, multi-tenant, resilient Kafka clusters at scale.
- Internals of our service framework/control plane, which enables self-service capabilities for application teams, plus cluster build/patch orchestration and capacity management capabilities for TSE/admin teams.
- Our approach to enabling automated cross-datacenter failover for application teams using the service framework and Confluent Replicator.
KSQL Performance Tuning for Fun and Profit ( Nick Dearden, Confluent) Kafka S...confluent
Ever wondered just how many CPU cores of KSQL Server you need to provision to handle your planned stream processing workload? Or how many gigabits of aggregate network bandwidth, spread across some number of processing threads, you'll need to handle the combined peak throughput of multiple queries? In this talk we'll first explore the basic drivers of KSQL throughput and hardware requirements, building up to more advanced query plan analysis and capacity-planning techniques, and review some real-world testing results along the way. Finally we will recap how and what to monitor to know you got it right!
High Available Task Scheduling Design using Kafka and Kafka Streams | Naveen ...HostedbyConfluent
In any enterprise or cloud application, task scheduling is a key requirement. A highly available, fault-tolerant task scheduler helps us meet our business goals.
A classic task scheduling infrastructure is typically backed by a database: the service that performs the scheduling loads task definitions from the database into memory and executes them on schedule.
This kind of infrastructure creates issues such as stateful services, an inability to scale the services horizontally, and proneness to frequent failures. If the state of these services is not maintained well, it can lead to inconsistency and integrity issues.
To mitigate these issues, we will explore a highly available and fault-tolerant task scheduling infrastructure using Kafka, Kafka Streams, and State Store.
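The talk itself builds on Kafka Streams and its state stores; as a rough plain-client sketch of the same idea (topic name, JSON task format, and group id are assumptions, not details from the talk), a compacted Kafka topic can act as the durable task store, and any replacement scheduler instance can rebuild its in-memory schedule by replaying that topic:

```python
# Hedged sketch: a task scheduler whose source of truth is a compacted Kafka
# topic rather than a database (pip install confluent-kafka).
import json
import time

from confluent_kafka import Consumer, Producer

TASKS_TOPIC = "task-definitions"  # assumed: compacted topic, keyed by task id
BROKERS = {"bootstrap.servers": "localhost:9092"}

producer = Producer(BROKERS)

def define_task(task_id: str, run_at: float, payload: dict) -> None:
    """Publish (or overwrite) a task definition; producing a None value
    (a tombstone) would cancel it once the topic compacts."""
    producer.produce(TASKS_TOPIC, key=task_id,
                     value=json.dumps({"run_at": run_at, "payload": payload}))
    producer.flush()

def run_scheduler() -> None:
    consumer = Consumer({**BROKERS,
                         "group.id": "scheduler",
                         "enable.auto.commit": False,   # always replay on start
                         "auto.offset.reset": "earliest"})
    consumer.subscribe([TASKS_TOPIC])
    pending = {}  # task_id -> (run_at, payload): the in-memory materialization
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is not None and msg.error() is None:
            task = json.loads(msg.value())
            pending[msg.key().decode()] = (task["run_at"], task["payload"])
        now = time.time()
        for task_id, (run_at, payload) in list(pending.items()):
            if run_at <= now:
                print(f"firing {task_id}: {payload}")  # stand-in for real work
                del pending[task_id]
```

In the Kafka Streams version the talk describes, a changelog-backed state store plays the role of `pending`, and punctuators replace the polling loop.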
Introducing KSML: Kafka Streams for low code environments | Jeroen van Dissel...HostedbyConfluent
Kafka Streams has captured the hearts and minds of many developers who want to build streaming applications on top of Kafka. But as powerful as the framework is, Kafka Streams has had a hard time getting around the requirement of writing Java code and setting up build pipelines. There have been some attempts to rebuild Kafka Streams, but until now popular languages like Python did not receive equally powerful (and maintained) stream processing frameworks. In this session we will present a new declarative approach to unlock Kafka Streams, called KSML. After this session you will be able to write streaming applications yourself, using only a few simple rules and Python snippets.
Kickstart your Kafka with Faker Data | Francesco Tisiot, Aiven.ioHostedbyConfluent
We all love to play with the shiny toys, but an event stream with no events is a sorry sight. In this session you’ll see how to create your own streaming dataset for Apache Kafka using Python and the Faker library. You’ll learn how to create a random data producer and define the structure and rate of its message delivery. Randomly-generated data is often hilarious in its own right, and it adds just the right amount of fun to any Kafka and its integrations!
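A minimal sketch along the lines the abstract describes: Faker generates the fields, a Kafka producer ships them, and a sleep sets the message rate. The topic name and event shape are illustrative assumptions (pip install confluent-kafka Faker):

```python
import json
import time

from confluent_kafka import Producer
from faker import Faker

fake = Faker()
producer = Producer({"bootstrap.servers": "localhost:9092"})

while True:
    # Structure of the message: whichever Faker providers you like.
    event = {
        "user": fake.user_name(),
        "ip": fake.ipv4_public(),
        "path": fake.uri_path(),
        "ts": fake.iso8601(),
    }
    producer.produce("pageviews", key=event["user"], value=json.dumps(event))
    producer.poll(0)   # serve delivery callbacks
    time.sleep(0.5)    # rate of message delivery: ~2 events per second
```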
(Randall Hauch, Confluent) Kafka Summit SF 2018
The Kafka Connect framework makes it easy to move data into and out of Kafka, and you want to write a connector. Where do you start, and what are the most important things to know? This is an advanced talk that will cover important aspects of how the Connect framework works and best practices of designing, developing, testing and packaging connectors so that you and your users will be successful. We’ll review how the Connect framework is evolving, and how you can help develop and improve it.
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...HostedbyConfluent
In a real-time data ingestion pipeline for analytical processing, efficient and fast data loading into a columnar database such as ClickHouse favors large blocks over individual rows. Therefore, applications often rely on a buffering mechanism such as Kafka to store data temporarily, with a message processing engine that aggregates Kafka messages into large blocks which then get loaded into the backend database. Due to various failures in this pipeline, a naive block aggregator that forms blocks without additional measures would cause data duplication or data loss. We have developed a solution that avoids these issues, thereby achieving exactly-once delivery from Kafka to ClickHouse. Our solution utilizes Kafka's metadata to keep track of blocks that we intend to send to ClickHouse, and later uses this metadata to deterministically reproduce ClickHouse blocks for retries in case of failures. Identical blocks are guaranteed to be deduplicated by ClickHouse. We have also developed a runtime verification tool that monitors Kafka's internal metadata topic and raises alerts when the required invariants for exactly-once delivery are violated. Our solution has been developed and deployed to production clusters that span multiple datacenters at eBay.
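To make the determinism idea concrete, here is a hedged toy sketch (not eBay's implementation): if block boundaries are a pure function of partition offsets, a crashed aggregator that replays the same offset range re-forms identical blocks, which ClickHouse then deduplicates:

```python
BLOCK_SIZE = 10_000  # records per block, aligned to offset multiples

def block_boundaries(start_offset: int, end_offset: int):
    """Yield (lo, hi) offset ranges that depend only on the offsets
    themselves, never on wall-clock time or arrival batching."""
    lo = (start_offset // BLOCK_SIZE) * BLOCK_SIZE
    while lo <= end_offset:
        yield max(lo, start_offset), min(lo + BLOCK_SIZE - 1, end_offset)
        lo += BLOCK_SIZE

# Replaying offsets 23_500..61_200 after a failure yields exactly the same
# blocks as the first attempt, so re-inserting them is safe.
for lo, hi in block_boundaries(23_500, 61_200):
    print(f"form block [{lo}, {hi}] and insert into ClickHouse")
```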
Securing Kafka At Zendesk (Joy Nag, Zendesk) Kafka Summit 2020confluent
Kafka is one of the most important foundational services at Zendesk. It became even more crucial with the introduction of Global Event Bus, which my team built to propagate events between Kafka clusters hosted in different parts of the world and between different products. As part of its rollout, we had to add mTLS support to all of our Kafka clusters (we have quite a few of them) to secure event propagation between them. It was quite a journey, but we eventually built a solution that is working well for us.
Things I will be sharing as part of the talk (a minimal broker-side mTLS config sketch follows the list):
1. Establishing the use case/problem we were trying to solve (why we needed mTLS)
2. Building a Certificate Authority with open source tools (with self-signed Root CA)
3. Building helper components to generate certificates automatically and regenerate them before they expire, which makes it practical to use a shorter TTL (time to live), a good security practice, for both Kafka clients and brokers
4. Hot reloading regenerated certificates on Kafka brokers without downtime
5. What we built to rotate the self-signed root CA without downtime as well across the board
6. Monitoring and alerts on TTL of certificates
7. Performance impact of using TLS (along with why TLS affects Kafka's performance)
8. What we are doing to drive adoption of mTLS for existing Kafka clients using PLAINTEXT protocol by making onboarding easier
9. How this will become a base for other features we want, e.g., ACLs and rate limiting (by using the principal from the TLS certificate as the client's identity)
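For context on items 1-4, broker-side mTLS in Kafka comes down to a few standard server.properties settings; the hostnames, paths, and passwords below are placeholders:

```properties
listeners=SSL://broker1.example.com:9093
security.inter.broker.protocol=SSL
ssl.keystore.location=/var/private/ssl/broker.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit
ssl.truststore.location=/var/private/ssl/broker.truststore.jks
ssl.truststore.password=changeit
# Require clients to present a certificate signed by the internal CA (mTLS).
ssl.client.auth=required
```

Hot-reloading regenerated keystores (item 4) builds on Kafka's support for dynamically updatable broker configs rather than on these static entries.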
My new industry acronym: PSTL
The Parallelized Streaming Transformation Loader (pronounced "PiSToL") is an architecture for highly scalable, reliable data ingestion pipelines.
While there is guidance on using Apache Kafka™ for streaming (or non-streaming), Apache Spark™ for transformations, and loading data (e.g., via COPY) into an HP Vertica™ columnar data warehouse, there is very little prescriptive guidance on how to truly parallelize a unified data pipeline - until now.
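As a hedged sketch of that unified pipeline shape (topic, table, and connection details are placeholders; a production PSTL would bulk-load Vertica via COPY rather than generic JDBC):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("pstl-sketch").getOrCreate()

# Streaming source: Spark parallelizes reads across Kafka partitions.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "events")
       .load())

# Transformation stage: parse/clean; real pipelines validate and enrich here.
parsed = raw.select(col("value").cast("string").alias("payload"))

def load_batch(df, epoch_id):
    # Loading stage: stand-in for a parallel COPY into the warehouse.
    (df.write.format("jdbc")
       .option("url", "jdbc:vertica://vertica:5433/warehouse")
       .option("dbtable", "events")
       .option("driver", "com.vertica.jdbc.Driver")
       .mode("append")
       .save())

parsed.writeStream.foreachBatch(load_batch).start().awaitTermination()
```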
Connect at Twitter-scale | Jordan Bull and Ryanne Dolan, TwitterHostedbyConfluent
Twitter has one of the largest Kafka fleets in the world, handling hundreds of millions of events per second. In order to operate Kafka Connect at this scale, we've had to get creative. In this talk we'll present some of the problems we've run into with Kafka Connect, and how we've engineered around them.
Introducing Events and Stream Processing into Nationwide Building Society (Ro...confluent
Facing Open Banking regulation, rapidly increasing transaction volumes and increasing customer expectations, Nationwide took the decision to take load off their back-end systems through real-time streaming of data changes into Kafka. Hear about how Nationwide started their journey with Kafka, from their initial use case of creating a real-time data cache using Change Data Capture, Kafka and Microservices to how Kafka allowed them to build a stream processing backbone used to reengineer the entire banking experience including online banking, payment processing and mortgage applications. See a working demo of the system and what happens to the system when the underlying infrastructure breaks. Technologies covered include: Change Data Capture, Kafka (Avro, partitioning and replication) and using KSQL and Kafka Streams Framework to join topics and process data.
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...confluent
Getting Kafka running on Kubernetes is only step one of a journey to create a production-ready Kafka cluster. This talk walks through the other steps: 1) Monitoring and remediating faults. 2) Updates to Kubernetes nodes for clusters not using shared storage. 3) Automating Kafka updates and restarts. We present how to create fault-tolerant Kafka clusters on Kubernetes without sacrificing availability, durability, or latency. Learn about Lyft's overlay-free Kubernetes networking driver and how we use it to keep performance on par with non-Kubernetes clusters.
Watch this webcast here: https://www.confluent.io/online-talks/whats-new-in-confluent-platform-55/
Join the Confluent Product Marketing team as we provide an overview of Confluent Platform 5.5, which makes Apache Kafka and event streaming more broadly accessible to developers with enhancements to data compatibility, multi-language development, and ksqlDB.
Building an event-driven architecture with Apache Kafka allows you to transition from traditional silos and monolithic applications to modern microservices and event streaming applications. With these benefits has come an increased demand for Kafka developers from a wide range of industries. The Dice Tech Salary Report recently ranked Kafka as the highest-paid technological skill of 2019, a year removed from ranking it second.
With Confluent Platform 5.5, we are making it even simpler for developers to connect to Kafka and start building event streaming applications, regardless of their preferred programming languages or the underlying data formats used in their applications.
This session will cover the key features of this latest release, including:
- Support for Protobuf and JSON schemas in Confluent Schema Registry and throughout our entire platform
- Exactly-once semantics for non-Java clients
- Admin functions in REST Proxy (preview)
- ksqlDB 0.7 and ksqlDB Flow View in Confluent Control Center
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)confluent
Presenter: Tim Berglund, Senior Director of Developer Experience, Confluent
It has become a truism in the past decade that building systems at scale, using non-relational databases, requires giving up on the transactional guarantees afforded by the relational databases of yore. ACID transactional semantics are fine, but we all know you can’t have them all in a distributed system. Or can we?
In this talk, I will argue that by designing our systems around a distributed log like Apache Kafka®, we can in fact achieve ACID semantics at scale. We can ensure that distributed write operations can be applied atomically, consistently, in isolation between services, and of course with durability. What seems to be a counterintuitive conclusion ends up being straightforwardly achievable using existing technologies, as an elusive set of properties becomes relatively easy to achieve with the right architectural paradigm underlying the application.
Kafka Streams is a new stream processing library natively integrated with Kafka. It has a very low barrier to entry, easy operationalization, and a natural DSL for writing stream processing applications. As such it is the most convenient yet scalable option to analyze, transform, or otherwise process data that is backed by Kafka. We will provide the audience with an overview of Kafka Streams including its design and API, typical use cases, code examples, and an outlook of its upcoming roadmap. We will also compare Kafka Streams' light-weight library approach with heavier, framework-based tools such as Spark Streaming or Storm, which require you to understand and operate a whole different infrastructure for processing real-time data in Kafka.
This document provides an overview of Kafka Connect and how it can be used to stream data between Kafka and other data systems. It discusses key Kafka Connect concepts like connectors, converters, transforms, deployment modes, and troubleshooting. The document contains configuration examples for connectors and transforms.
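As a flavor of the configuration examples such an overview contains, here is a source connector definition using the FileStreamSource connector that ships with Apache Kafka (file path and topic are placeholders), in the JSON form accepted by the Connect REST API:

```json
{
  "name": "local-file-source",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/var/log/app/events.log",
    "topic": "file-events"
  }
}
```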
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...HostedbyConfluent
This talk is aimed at developers who are interested in scaling their streaming applications with exactly-once (EOS) guarantees. Since the original release, EOS processing has received wide adoption as a much-needed feature inside the community, and has also exposed various scalability and usability issues when applied in production systems.
To address those issues, we improved on the existing EOS model by integrating static Producer transaction semantics with dynamic Consumer group semantics. We will deep-dive into the newly added features (KIP-447), from which the audience will gain more insight into the tradeoffs between scalability and semantic guarantees, and into how Kafka Streams specifically leveraged them to help scale EOS streaming applications written in this library. We will also present how EOS code can be simplified with a plain Producer and Consumer. Come learn more if you wish to adopt this improved EOS feature and get started on building your own EOS application today!
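A hedged sketch of the consume-transform-produce loop this improvement targets, using the plain confluent-kafka Python client (topic names and ids are illustrative). The key KIP-447 point is passing the consumer group metadata into the transaction so the broker can fence zombie instances:

```python
from confluent_kafka import Consumer, Producer, TopicPartition

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "eos-app",
    "isolation.level": "read_committed",  # don't read aborted input
    "enable.auto.commit": False,          # offsets commit via the transaction
})
consumer.subscribe(["input"])

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "eos-app-1",      # static across restarts
})
producer.init_transactions()

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    producer.begin_transaction()
    producer.produce("output", value=msg.value().upper())  # the "transform"
    # Commit the input offset atomically with the output records; the group
    # metadata enables the broker-side fencing introduced by KIP-447.
    producer.send_offsets_to_transaction(
        [TopicPartition(msg.topic(), msg.partition(), msg.offset() + 1)],
        consumer.consumer_group_metadata())
    producer.commit_transaction()
```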
Live Event Debugging With ksqlDB at Reddit | Hannah Hagen and Paul Kiernan, R...HostedbyConfluent
Convincing developers to write tests for new code is hard; convincing developers to write tests for new event data is even harder. At Reddit, engineers have often deployed new app versions, only to find out later that the event wasn’t firing at all, or it was missing critical fields. So this begs the question, “How can engineers at Reddit be confident that the events they instrument are accurate and complete?”
In this session, we will learn about an internal tool developed at Reddit to QA events in real-time. This KSQL-powered web app streams events from our pipeline, allowing developers to filter events they care about using criteria like User ID, Device ID or the type of user interaction. With a backbone of KSQL and Kafka Streams, engineers can get real-time feedback on how accurate (or how erroneous) their event data is.
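By way of illustration (the stream schema is invented, not Reddit's), the kind of per-developer filter such a tool applies maps directly onto a ksqlDB push query:

```sql
-- Declare the raw event topic as a stream (hypothetical schema).
CREATE STREAM events (user_id VARCHAR, device_id VARCHAR, action VARCHAR)
  WITH (KAFKA_TOPIC = 'events', VALUE_FORMAT = 'JSON');

-- Push query: tail only the events from one test device, in real time.
SELECT * FROM events WHERE device_id = 'qa-device-123' EMIT CHANGES;
```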
ksqlDB: A Stream-Relational Database Systemconfluent
Speaker: Matthias J. Sax, Software Engineer, Confluent
ksqlDB is a distributed event streaming database system that allows users to express SQL queries over relational tables and event streams. The project was released by Confluent in 2017 and is hosted on GitHub and developed with an open-source spirit. ksqlDB is built on top of Apache Kafka®, a distributed event streaming platform. In this talk, we discuss ksqlDB's architecture, which is influenced by Apache Kafka and its stream processing library, Kafka Streams. We explain how ksqlDB executes continuous queries while achieving fault tolerance and high availability. Furthermore, we explore ksqlDB's streaming SQL dialect and the different types of supported queries.
Matthias J. Sax is a software engineer at Confluent working on ksqlDB. He mainly contributes to Kafka Streams, Apache Kafka's stream processing library, which serves as ksqlDB's execution engine. Furthermore, he helps evolve ksqlDB's "streaming SQL" language. In the past, Matthias also contributed to Apache Flink and Apache Storm and he is an Apache committer and PMC member. Matthias holds a Ph.D. from Humboldt University of Berlin, where he studied distributed data stream processing systems.
https://db.cs.cmu.edu/events/quarantine-db-talk-2020-confluent-ksqldb-a-stream-relational-database-system/
Getting Started with Confluent Schema Registryconfluent
Getting started with Confluent Schema Registry, Patrick Druley, Senior Solutions Engineer, Confluent
Meetup link: https://www.meetup.com/Cleveland-Kafka/events/272787313/
Administrative techniques to reduce Kafka costs | Anna Kepler, ViasatHostedbyConfluent
When your Kafka clusters start growing, so does the cost associated with them. As administrators, we have to ensure that the service we support operates reliably enough to satisfy our customers. However, for our business it is just as important to ensure that the same service is cost-efficient. There are two ways we can optimize the cost of the service: tuning broker machines and tuning the data transfers. Minimizing data transfer is the largest return on investment, since that is what accounts for the most spend. With the use of Kafka administrative tools and metrics, we can find multiple ways to reduce the data transfers in the clusters.
The presentation will cover various techniques Kafka administrators can employ to reduce data transfers and save operational costs: reducing cross-AZ traffic, optimizing batching with the help of the DumpLogSegments tool, utilizing Kafka metrics to shut down unused data streams, and more.
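For reference, the tool mentioned above ships with Apache Kafka; invoked against a segment file (the path below is a placeholder), it reveals how full each producer batch actually is:

```sh
bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
  --files /var/kafka-logs/events-0/00000000000000000000.log \
  --deep-iteration
```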
In working to make our Kafka deployment as cost-effective as possible, we have accumulated money-saving tricks, and we would love to share them with the community.
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019confluent
Data stream processing is built on the core concept of time. However, understanding time semantics and reasoning about time is not simple, especially if deterministic processing is expected. In this talk, we explain the difference between processing, ingestion, and event time and what their impact is on data stream processing. Furthermore, we explain how Kafka clusters and stream processing applications must be configured to achieve specific time semantics. Finally, we deep-dive into the time semantics of the Kafka Streams DSL and KSQL operators, and explain in detail how the runtime handles time. Apache Kafka offers many ways to handle time on the storage layer, i.e., the brokers, allowing users to build applications with different semantics. Time semantics in the processing layer, i.e., Kafka Streams and KSQL, are even richer and more powerful, but also more complicated. Hence, it is paramount for developers to understand different time semantics and to know how to configure Kafka to achieve them. Therefore, this talk enables developers to design applications with their desired time semantics, helps them reason about the runtime behavior with regard to time, and allows them to understand processing/query results.
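One concrete example of the broker-side configuration the talk alludes to: a topic's record timestamps can be switched between event time (CreateTime, producer-assigned) and ingestion time (LogAppendTime, broker-assigned). The topic name is a placeholder:

```sh
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name clicks \
  --add-config message.timestamp.type=LogAppendTime
```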
Exactly-once Data Processing with Kafka Streams - July 27, 2017confluent
This document discusses exactly-once processing in stream processing systems. It begins by defining exactly-once processing and describing some of the challenges in achieving it. It then outlines three options for achieving exactly-once processing with Kafka: at-least-once processing with deduplication, using Kafka's idempotent producer and transactions, and using Kafka Streams. The document focuses on Kafka Streams, describing how it provides exactly-once guarantees through transactional processing of data in batches across the processing topology.
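In Kafka Streams itself, opting into these guarantees is a single configuration switch, set on the application's StreamsConfig:

```properties
processing.guarantee=exactly_once
```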
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...Vinu Charanya
Twitter is powered by thousands of microservices that run on our internal Cloud platform which consists of a suite of multi-tenant platform services that offer Compute, Storage, Messaging, Monitoring, etc as a service. These platforms have thousands of tenants and run atop hundreds of thousands of servers, across on-prem & the public cloud. The scale & diversity in multi-tenant infrastructure services make it extremely difficult to effectively forecast capacity, compute resource utilization & cost and drive efficiency.
In this talk, I would like to share how my team is building a system (Kite - A unified service manager) to help define, model, provision, meter & charge infrastructure resources. The infrastructure resources include primitive bare metal servers / VMs on the public cloud and abstract resources offered by multi-tenant services such as our Compute platform (powered by Apache Aurora/Mesos), Storage (Manhattan for key/val, Cache, RDBMS), Observability. Along with how we solved this problem, I also intend to share a few case-studies on how we were able to use this data to better plan capacity & drive a cultural change in engineering that helped improve overall resource utilization & drive significant savings in infrastructure spending.
SignalFx is an advanced monitoring and alerting system for cloud applications delivered as SaaS. It provides real-time metrics, analytics, and tagging to monitor microservices architectures. Traditional monitoring approaches are noisy and reactive, while SignalFx aims to provide guided triage and correlate events using time series analytics to identify patterns and anomalies.
Keynote 1 the rise of stream processing for data management & micro serv...Sabri Skhiri
This keynote describes the three waves of stream processing, from the Lambda architecture to stateful stream processing. We show that the rise of stateful stream processing, event-driven architecture, the Kappa architecture, and microservice architecture leads us to rethink how we implement data architectures and microservice architectures.
Cloud Computing for Business - The Road to IT-as-a-ServiceJames Urquhart
This document discusses the transition from infrastructure-centric IT operations to service-centric operations driven by cloud computing. It explains that cloud computing asks IT to deliver computing resources as a set of services rather than managing individual servers, networks, and storage. This new model of service operations is comprised of application operations, service operations, and infrastructure operations. Application operations focus on consuming services through tools like multi-cloud management. Service operations identify and deliver key services with supporting capabilities. Infrastructure operations provide the common foundation through APIs and automation.
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...Karthik Ramasamy
Twitter generates billions and billions of events per day. Analyzing these events in real time presents a massive challenge. In order to meet this challenge, Twitter designed an end-to-end real-time stack consisting of DistributedLog, a distributed and replicated messaging system, and Heron, a streaming system for real-time computation. DistributedLog is a replicated log service built on top of Apache BookKeeper, providing infinite, ordered, append-only streams that can be used for building robust real-time systems. It is the foundation of Twitter's publish-subscribe system. Twitter Heron is the next-generation streaming system built from the ground up to address our scalability and reliability needs. Both systems have been in production for nearly two years and are widely used at Twitter in a range of diverse applications such as the search ingestion pipeline, ad analytics, image classification, and more. These slides describe Heron and DistributedLog in detail, covering a few use cases in depth and sharing the operating experiences and challenges of running large-scale real-time systems at scale.
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...Sumeet Singh
This document discusses lessons learned from building a scalable, self-serve, real-time, multi-tenant monitoring service at Yahoo. It describes transitioning from a classical architecture to one based on real-time big data technologies like Storm and Kafka. Key lessons include properly handling producer-consumer problems at scale, challenges of debugging skewed data, strategically managing multi-tenancy and resources, issues optimizing asynchronous systems, and not neglecting assumptions outside the application.
Stephen Cantrell, kdb+ Developer at Kx Systems “Kdb+: How Wall Street Tech c...Dataconomy Media
Stephen Cantrell, kdb+ Developer at Kx Systems
“Kdb+: How Wall Street Tech can Speed up the World"
You can see some additional notes here:
https://github.com/cantrells/berlin_kdb_demo?files=1
In this tutorial we walk through state-of-the-art streaming systems, algorithms, and deployment architectures, cover the typical challenges in modern real-time big data platforms, and offer insights on how to address them. We also discuss how advances in technology might impact the streaming architectures and applications of the future. Along the way, we explore the interplay between storage and stream processing and discuss future developments.
Designing Modern Streaming Data ApplicationsArun Kejariwal
Many industry segments have been grappling with fast data (high-volume, high-velocity data). The enterprises in these industry segments need to process this fast data just in time to derive insights and act upon it quickly. Such tasks include but are not limited to enriching data with additional information, filtering and reducing noisy data, enhancing machine learning models, providing continuous insights on business operations, and sharing these insights just in time with customers. In order to realize these results, an enterprise needs to build an end-to-end data processing system, from data acquisition, data ingestion, data processing, and model building to serving and sharing the results. This presents a significant challenge, due to the presence of multiple messaging frameworks and several streaming computing frameworks and storage frameworks for real-time data.
In this tutorial we lead a journey through the landscape of state-of-the-art systems for each stage of an end-to-end data processing pipeline: messaging frameworks, streaming computing frameworks, storage frameworks for real-time data, and more. We also share case studies from the IoT, gaming, and healthcare, as well as our experience operating these systems at internet scale at Twitter and Yahoo. We conclude by offering our perspectives on how advances in hardware technology and the emergence of new applications will impact the evolution of messaging systems, streaming systems, storage systems for streaming data, and reinforcement learning-based systems that will power fast processing and analysis of a large (potentially on the order of hundreds of millions) set of data streams.
Topics include:
* An introduction to streaming
* Common data processing patterns
* Different types of end-to-end stream processing architectures
* How to seamlessly move data across data different frameworks
* Case studies: Healthcare and the IoT
* Data sketches for mining insights from data streams
CQRS and Event Sourcing: A DevOps perspectiveMaria Gomez
This document discusses challenges of deploying, monitoring, and debugging systems using CQRS and event sourcing from a DevOps perspective. It describes using a blue/green deployment approach, implementing consistent and usable logging, monitoring key metrics and data streams, and employing distributed tracing to identify the origin of requests in order to quickly debug problems. The overall goal is to build scalable, resilient, and automated systems while facilitating operational tasks through iterative improvements to tools and processes.
The hidden engineering behind machine learning products at Helixa - Alluxio, Inc.
Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
The hidden engineering behind machine learning products at Helixa
Gianmario Spacagna, (Helixa)
About Alluxio: alluxio.io
Engage with the open source community on slack: alluxio.io/slack
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia - GoDataDriven
Matei Zaharia is an assistant professor of computer science at Stanford University and Chief Technologist and co-founder of Databricks. He started the Spark project at UC Berkeley and continues to serve as the project's vice president at the Apache Software Foundation. Matei also co-started the Apache Mesos project and is a committer on Apache Hadoop. Matei's research work on datacenter systems was recognized through two Best Paper awards and the 2014 ACM Doctoral Dissertation Award.
Revolutionary container-based hybrid cloud solution for ML Platform
Ness' data science platform, NextGenML, puts the entire machine learning process (modelling, execution, and deployment) in the hands of data science teams.
The platform is built around collaboration on AI/ML and is implemented with full respect for best practices and a commitment to innovation.
Infrastructure: Kubernetes (on-prem) + Docker, Azure Kubernetes Service (AKS), Nexus, Azure Container Registry (ACR), GlusterFS
Workflow: Argo -> Kubeflow
DevOps: Helm, ksonnet, Kustomize, Azure DevOps
Code Management & CI/CD: Git, TeamCity, SonarQube, Jenkins
Security: MS Active Directory, Azure VPN, Dex (K8s) integrated with GitLab
Machine Learning: TensorFlow (model training, TensorBoard, serving), Keras, Seldon
Storage (Azure): Storage Gen1 & Gen2, Data Lake, File Storage
ETL (Azure): Databricks, Spark on K8s, Data Factory (ADF), HDInsight (Kafka and Spark), Service Bus (ASB), Functions & VMs, Cache for Redis
Monitoring and Logging: Grafana, Prometheus, Graylog
Tooling Up for Efficiency: DIY Solutions @ Netflix - ABD319 - re:Invent 2017 - Amazon Web Services
At Netflix, we have traditionally approached cloud efficiency from a human standpoint, whether it be in-person meetings with the largest service teams or manually flipping reservations. Over time, we realized that these manual processes are not scalable as the business continues to grow. Therefore, in the past year, we have focused on building out tools that allow us to make more insightful, data-driven decisions around capacity and efficiency. In this session, we discuss the DIY applications, dashboards, and processes we built to help with capacity and efficiency. We start at the ten thousand foot view to understand the unique business and cloud problems that drove us to create these products, and discuss implementation details, including the challenges encountered along the way. Tools discussed include Picsou, the successor to our AWS billing file cost analyzer; Libra, an easy-to-use reservation conversion application; and cost and efficiency dashboards that relay useful financial context to 50+ engineering teams and managers.
From zero to hero with the actor model - Tamir Dresher - Odessa 2019
My talk from Odessa .NET User Group - http://www.usergroup.od.ua/2019/02/microsoft-net-user-group.html
Code can be found here: https://github.com/tamirdresher/FromZeroToTheActorModel
There's nothing new about the actor model; in fact, it was invented in the early seventies. So how come it's now the hottest buzzword? In this session you will learn what the Actor Model is and why it helps make your system Reactive: scalable, responsive, and resilient. You will get to know the Akka.NET library, which makes the Actor Model a piece of cake.
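For readers new to the idea, a toy actor in Python (illustrative only; Akka.NET provides a far richer implementation of the same concept) shows the core ingredients: private state, a mailbox, and strictly sequential message processing:

```python
import queue
import threading

class Actor:
    """Toy actor: private state, a mailbox, and sequential message
    processing. Only the actor's own thread touches its state, so no
    locks are needed."""
    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0                      # private, unshared state
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def tell(self, message):                 # asynchronous, fire-and-forget
        self._mailbox.put(message)

    def stop(self):
        self._mailbox.put(None)              # poison pill
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is None:
                break
            self._count += 1
            print(f"processed {message!r} (message #{self._count})")

greeter = Actor()
greeter.tell("hello")
greeter.tell("world")
greeter.stop()
```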
Proactive ops for container orchestration environments - Docker, Inc.
This document discusses different approaches to monitoring systems from manual and reactive to proactive monitoring using container orchestration tools. It provides examples of metrics to monitor at the host/hardware, networking, application, and orchestration layers. The document emphasizes applying the principles of observability including structured logging, events and tracing with metadata, and monitoring the monitoring systems themselves. Speakers provide best practices around failure prediction, understanding failure modes, and using chaos engineering to build system resilience.
Telefonica: Automating network management with graphs - Neo4j
The document discusses automation of network management through graphs. It covers the evolution of technologies and challenges, modeling networks with graphs, use cases demonstrating the theory in practice, and ongoing projects. Key points include using graphs for network inventories as the basis for automation of processes like network creation, service fulfillment, and assurance. Case studies demonstrate how graph databases support these functions.
Transform Your Telecom Operations with Graph Technologies - Neo4j
The telco industry faces an ever-increasing expectation from their customers on quality and availability of the services offered; any interruption or degradation of the service has a tremendously negative impact on their business and can lead to customer churn.
It’s no wonder this industry was one of the first to realize the power of graphs, especially in the areas of network and service management. Two of the three largest Telcos in the world, three of the five largest telco equipment vendors, and leading OSS vendors and major MVPDs have been using Neo4j in mission-critical solutions for years.
Join us to hear how graph technology is helping differentiate a primary OSS domain: Network and service assurance. We’ll look at why graph technology is used to help optimize network and service performance in critical areas:
• Performance management
• Fault and event management
• Service quality management
• Discovery and reconciliation
The performance, flexibility, and expressivity of a native graph platform are truly transformative for these challenging disciplines. Register and learn how you can leverage graph technology for your next generation service assurance solution.
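As an illustration of why reachability queries make fault and event management natural on a graph, here is a hypothetical sketch; the topology, node names, and the use of the networkx library (standing in for a graph database) are all assumptions for demonstration:

```python
import networkx as nx

# Hypothetical topology: traffic flows from core routers down to the
# services that depend on them. Modelling the network as a directed
# graph makes fault impact analysis a one-line reachability query.
topology = nx.DiGraph()
topology.add_edges_from([
    ("core-router-1", "edge-switch-a"),
    ("core-router-1", "edge-switch-b"),
    ("edge-switch-a", "bts-017"),
    ("edge-switch-b", "bts-018"),
    ("bts-017", "voice-service"),
    ("bts-018", "data-service"),
])

# Fault and event management: a failing element impacts everything
# reachable from it (its downstream dependencies).
failed = "edge-switch-a"
impacted = nx.descendants(topology, failed)
print(f"{failed} down -> impacted: {sorted(impacted)}")
# edge-switch-a down -> impacted: ['bts-017', 'voice-service']
```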
Accelerating analytics on Sensor and IoT Data - Keshav Murthy
Informix Warehouse Accelerator (IWA) has dramatically improved traditional data warehousing performance. Now, IWA accelerates analytics over sensor data stored in relational and time-series form.
This document discusses Danaos' use of BigDataStack data services for real-time shipping data. It describes how streaming data is handled at the edge and within the data center using Complex Event Processing (CEP). It also discusses how Danaos utilizes a CEP-integrated LXS database, a seamless component for single access to data, and a data skipping technique to improve query performance. The document recaps how these services enable predictive maintenance, data quality assessment, and an integrated Danaos platform for real-time shipping data analytics.
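The data skipping technique mentioned above can be illustrated with a toy zone map: per-block min/max metadata lets a query skip blocks whose value range cannot match the predicate. Field names and numbers below are invented for illustration:

```python
# Minimal sketch of data skipping: keep min/max metadata per block so
# queries can skip blocks whose range cannot match the predicate.
blocks = [
    {"min_speed": 2.0, "max_speed": 9.5, "rows": [...]},   # rows elided
    {"min_speed": 10.1, "max_speed": 18.0, "rows": [...]},
    {"min_speed": 18.2, "max_speed": 24.0, "rows": [...]},
]

def blocks_for_query(blocks, low, high):
    """Return only blocks whose [min, max] range overlaps [low, high]."""
    return [b for b in blocks
            if not (b["max_speed"] < low or b["min_speed"] > high)]

# Query: vessel speed between 11 and 15 knots -> scan 1 block of 3.
candidates = blocks_for_query(blocks, 11.0, 15.0)
print(f"scanning {len(candidates)} of {len(blocks)} blocks")
```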
9. INFRASTRUCTURE & DATACENTER MANAGEMENT (stack diagram)
• Core Application Services: Tweets, Users, Social Graph
• Platform Services: Search, Messaging & Queues, Cache, Monitoring and Alerting, Ingress & Proxy
• Framework/Libraries: Finagle (RPC), Scalding (MapReduce in Scala), Heron (Streaming Compute), JVM
• Management Tools: Self Serve, Service Directory, Chargeback, Config Mgmt, Deploy (Workflows)
• Data & Analytics Platform: Interactive Query, Data Discovery, Workflow Management
• Infrastructure Services: Storage (Manhattan, Blobstore, Graphstore, TimeseriesDB); Compute (Mesos/Aurora, Hadoop); DB/DW (MySQL, Vertica, Postgres)
16-17. Chargeback @Twitter
Ability to meter allocation & utilization of resources per service, per project, and per engineering team, to improve visibility and enable accountability.
19-21. Chargeback @Twitter
Support diverse infrastructure and platform services:
1. Resource Catalog: a consistent way to inventory infrastructure resources
• Resource fluidity: support primitive resources (CPU) and abstract resources ("Tweets / second"), and extend existing resources
2. Resource <> Client Identifier ownership: map each client identifier to an owner to enable accountability
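A hypothetical sketch of what a catalog entry supporting such resource fluidity might look like; the record shape, service names, and unit prices are invented for illustration and are not Twitter's actual schema:

```python
from dataclasses import dataclass

# The same record shape covers a primitive resource (CPU cores) and an
# abstract one ("tweets/second"), each priced per unit so metering
# stays uniform across diverse services.
@dataclass(frozen=True)
class ResourceOffering:
    service: str        # owning infrastructure/platform service
    name: str           # what is being metered
    unit: str           # unit of measure
    unit_price: float   # $ per unit (illustrative numbers)

catalog = [
    ResourceOffering("aurora", "cpu", "core-day", 0.25),
    ResourceOffering("manhattan", "storage", "GB-day", 0.01),
    ResourceOffering("tweetypie", "tweet-serving", "tweets/second", 2.00),
]

def price(catalog, service, name, volume):
    offering = next(o for o in catalog
                    if o.service == service and o.name == name)
    return volume * offering.unit_price

print(price(catalog, "aurora", "cpu", 120))  # 120 core-days -> 30.0
```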
30-34. Total Cost of Ownership for Aurora ($X core-day)
The TCO build-up (diagram) divides the total available cores of a physical server (cost: $X / day) into:
• Total used cores: production used cores + non-prod used cores
• Container size buffer (underutilized reservation)
• Quota buffer (underutilized quota)
• Excess cores (incl. DR, spikes, overallocation)
• Headroom
• Operational overhead: cores used by the platform for operations & maintenance
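To make the arithmetic concrete, a small sketch with invented numbers: spreading the full server cost over only the used cores is one way such a fully loaded core-day rate could be derived. The slides do not specify the exact formula, so treat this as an assumption:

```python
# Hypothetical numbers: derive a fully loaded $/core-day rate by
# spreading the whole server cost over only the *used* cores, so idle
# buffers, headroom, and overhead are priced into every used core.
server_cost_per_day = 50.0          # "$X / day" (illustrative)
total_available_cores = 64

prod_used = 28
nonprod_used = 6
container_size_buffer = 10          # underutilized reservation
quota_buffer = 8                    # underutilized quota
excess = 6                          # DR, spikes, overallocation
headroom = 4
operational_overhead = 2            # platform operations & maintenance

used = prod_used + nonprod_used
assert (used + container_size_buffer + quota_buffer
        + excess + headroom + operational_overhead) == total_available_cores

raw_rate = server_cost_per_day / total_available_cores
loaded_rate = server_cost_per_day / used
print(f"raw: ${raw_rate:.3f}/core-day, loaded: ${loaded_rate:.3f}/core-day")
```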
38-44. Chargeback @Twitter: Metering Pipeline (ETL Job)
Pipeline (diagram): Infrastructure Service 1, Infrastructure Service 2, ... -> Ingest Metrics -> Raw Fact -> Transformer -> Resolved Fact -> Data Fidelity -> Reports
Raw facts carry the schema (client_identifier, offering_measure, volume, metadata, timestamp).
The Transformer consults the Resource Catalog and the Identifier Ownership Mapping to:
1. Resolve ownership
2. Compute cost
The Data Fidelity stage then:
1. Verifies data integrity & fidelity
2. Alerts when things don't look the way they should
before the reports are generated.
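A minimal sketch of the transformer stage under the stated schema; the ownership map, price table, and the tuple shape of offering_measure are assumptions for illustration:

```python
# Raw-fact schema from the slides: (client_identifier, offering_measure,
# volume, metadata, timestamp). OWNERS and PRICES stand in for the
# identifier ownership mapping and the resource catalog.
OWNERS = {
    "tweetypie.prod.tweetypie": ("tweetypie", "TWEETYPIE", "INFRASTRUCTURE"),
}
PRICES = {("aurora", "cpu"): 0.25}   # $ per unit (illustrative)

raw_facts = [
    {"client_identifier": "tweetypie.prod.tweetypie",
     "offering_measure": ("aurora", "cpu"),
     "volume": 120.0, "metadata": {}, "timestamp": "2017-10-01"},
]

def transform(fact):
    # 1. Resolve ownership via the identifier ownership mapping.
    project, team, org = OWNERS[fact["client_identifier"]]
    # 2. Cost computation via the resource catalog's unit price.
    cost = fact["volume"] * PRICES[fact["offering_measure"]]
    return {**fact, "project": project, "team": team, "org": org, "cost": cost}

resolved_facts = [transform(f) for f in raw_facts]
print(resolved_facts[0]["cost"])  # 30.0
```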
49. Chargeback @Twitter: Reports
Customers and the reports they consume:
• Infrastructure & Platform Operators: overall cluster growth; allocation vs. utilization of resources by client/tenant
• Finance & Execs: budget vs. spend per org; infrastructure PnL; overall efficiency & trends
• Service Owners & Developers: team bill; per-service allocation vs. utilization of resources
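These reports are different roll-ups of the same resolved facts; a toy aggregation (fields and numbers are illustrative, continuing the sketch above) makes that concrete:

```python
from collections import defaultdict

# Resolved facts roll up differently per audience: by team for the
# "team bill", by org for budget vs. spend per org.
resolved_facts = [
    {"team": "TWEETYPIE", "org": "INFRASTRUCTURE", "cost": 30.0},
    {"team": "ADS PREDICTION", "org": "REVENUE", "cost": 55.0},
]

def roll_up(facts, key):
    """Aggregate cost by any dimension of the resolved fact."""
    totals = defaultdict(float)
    for fact in facts:
        totals[fact[key]] += fact["cost"]
    return dict(totals)

print(roll_up(resolved_facts, "team"))  # team bill
print(roll_up(resolved_facts, "org"))   # spend per org
```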
53-57. Learnings
Chargeback @Twitter
1. Invest in data fidelity
• Trust in the data is most important
• Invest in monitoring & alerting for data inconsistencies
• Leverage this to detect abnormal increases/decreases and notify users
2. Accurate ownership mapping
• Static mappings go out of date quickly
• Invest in systems (e.g., Kite) for users to manage the mappings themselves
3. Logical grouping of resources
• Identifiers were too granular and teams were too broad
• Find a good middle ground and invest in a system (e.g., Kite) to track, understand, and maintain it
4. Change history
• Unit prices change over time; orgs/teams change over time; resources get added and removed
• Change history is essential for consistency and is used for capacity planning
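The "detect abnormal increases/decreases" point might look like the following toy check on day-over-day metered spend; the threshold and data are invented, an illustration rather than the production alerting logic:

```python
# Flag day-over-day swings in a tenant's metered spend beyond a
# relative threshold, then notify the owning team.
def abnormal_changes(daily_spend, threshold=0.5):
    """daily_spend: list of (day, dollars); flags |change|/prev > threshold."""
    alerts = []
    for (_, prev), (day, cur) in zip(daily_spend, daily_spend[1:]):
        if prev > 0 and abs(cur - prev) / prev > threshold:
            alerts.append((day, prev, cur))
    return alerts

spend = [("10-01", 100.0), ("10-02", 104.0), ("10-03", 210.0)]
for day, prev, cur in abnormal_changes(spend):
    print(f"{day}: spend jumped {prev} -> {cur}; notify the owning team")
```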
59. Kite architecture (diagram): a Dashboard (single pane of glass) with Reporting sits atop Service Lifecycle Workflows; core components include the Service Identity Manager, Resource Provisioning Manager, Metadata, Resource Quota Management, Metering & Chargeback, and Client Identity, which connect to the underlying infrastructure & platform services through Provider APIs & Adapters.
61. Kite @Twitter
Client Identifier Management
Identity System: built a consistent way to group the client identifiers of different infrastructure services into a project, with explicit ownership
• Capture org structure: support org-structure changes and project-transfer workflows to keep identifier ownership up to date
• Unify the client-identifier provisioning workflow: enables a single source of truth and reduces operator pain around provisioning and managing client identifiers
63-65. IDENTITY ENTITY MODEL
Business Owner -> Team -> Project -> Service/System Account -> <Infra, ClientID>, with a 1:N relationship at each level. Examples:
• Infrastructure -> team TWEETYPIE -> project tweetypie -> service tweetypie -> <Aurora, tweetypie.prod.tweetypie>
• Revenue -> team ADS PREDICTION -> project prediction -> service ads-prediction -> <Aurora, ads-prediction.prod.campaign-x>
Entities are time-varying dimensions.
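One way to make "entities are time-varying dimensions" concrete is to attach validity intervals to ownership edges, so historical bills resolve against the owner at the time. A hedged sketch; the team move in the history below is hypothetical:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class OwnershipEdge:
    child: str                       # e.g. project "prediction"
    parent: str                      # e.g. team "ADS PREDICTION"
    valid_from: date
    valid_to: Optional[date] = None  # None = still current

# Hypothetical history: the "prediction" project moves between teams.
history = [
    OwnershipEdge("prediction", "ADS PREDICTION",
                  date(2016, 1, 1), date(2017, 6, 30)),
    OwnershipEdge("prediction", "REVENUE SCIENCE", date(2017, 7, 1)),
]

def owner_on(history, child, day):
    """Resolve ownership as of a given day, so past bills stay correct."""
    for edge in history:
        if (edge.child == child and edge.valid_from <= day
                and (edge.valid_to is None or day <= edge.valid_to)):
            return edge.parent
    return None

print(owner_on(history, "prediction", date(2017, 5, 1)))   # ADS PREDICTION
print(owner_on(history, "prediction", date(2017, 10, 1)))  # REVENUE SCIENCE
```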
73. Impact & Future Work
Future Work:
1. Capacity planning: provide historic trends and help with forecasting capacity
2. Extend Quota Manager: onboard Hadoop, storage, and other systems
3. Enable project deprecation: detect unused resources, notify users, and trigger the deprecation process based on policy
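The project-deprecation idea could be sketched as a simple policy check over last-usage timestamps; the window, project names, and dates below are invented for illustration:

```python
from datetime import date, timedelta

# Flag projects whose metered utilization has been zero for a
# policy-defined window, as candidates for the deprecation process.
IDLE_WINDOW = timedelta(days=90)   # illustrative policy

projects = {
    "tweetypie":      {"last_nonzero_usage": date(2017, 9, 28)},
    "old-experiment": {"last_nonzero_usage": date(2017, 5, 2)},
}

def deprecation_candidates(projects, today):
    return [name for name, info in projects.items()
            if today - info["last_nonzero_usage"] > IDLE_WINDOW]

for name in deprecation_candidates(projects, date(2017, 10, 1)):
    print(f"{name}: unused for >{IDLE_WINDOW.days} days; notify owner, "
          "then trigger deprecation per policy")
```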