Project Frankenstein: A multitenant, horizontally scalable Prometheus as a service


In this talk we present a prototype solution for multitenant, scale-out Prometheus. Don't worry, it's open source! Our solution turns a lot of the Prometheus architectural assumptions on their head, by marrying a scale-out PromQL query engine with a storage layer based on DynamoDB & S3. We have disaggregated the Prometheus binary into a microservices-style architecture, with separate services for distribution, ingest and storage. By designing all these services as fungible replicas, this solution can be scaled out with ease and failure of any individual replica can be dealt with gracefully. This multitenant, scale-out Prometheus service forms a core component of Weave Cloud, a hosted management, monitoring and visualisation platform for microservice & containerised applications. This platform is built from 100% open source components, and we're working with the Prometheus community to contribute all the changes we've made back to Prometheus.

Project Frankenstein
A multi-tenant, horizontally scalable Prometheus as a Service
Tom Wilkie (& Julius Volz)
Weaveworks, August 2016

“the best way to visualise, manage & monitor your cloud native application”

why not just run my own Prometheus?
• the as-a-service bit provides authentication and access control
• virtually infinite retention; all the state is managed for you, by us
• provide a different story around durability, HA and scalability
• (eventually) better query performance, especially for long queries

requirements:
1. API compatible with Prometheus
2. easy to operate and manage
3. tens of thousands of users, tens of millions of samples/s
4. cost effective to run
5. reuse as much of Prometheus as possible
… so we can sell it

Aim: build proof of concept as quickly as possible
16/06 started design doc
22/06 circulated on list
22/06 initial commit
26/07 launch jobs
25/08 give talk!
http://goo.gl/prdUYV

[Architecture diagram: in your DC, a Retriever scrapes your jobs and sends samples to Weave Cloud, where a Frontend / Authenticator passes them to Distributors, which forward them to Ingesters; the Ingesters write to DynamoDB and S3.]

Retriever
/bin/prometheus -retrieval-only -storage.remote.generic-url=...
Does scraping and relabelling.
Is a vanilla Prometheus plus:
• Brian Brazil’s generic write PR (#1487)
• Some modification to prevent local storage + indexing

Distributor
• Uses consistent hashing to assign timeseries to Ingesters
• Input to hash is (user ID, metric name)
• Tokens stored in Consul
• Also currently handles queries
http://goo.gl/U9u1U2
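
The Distributor's token-based assignment can be sketched roughly as below. This is a minimal illustration, not the project's code: the `Ring` class, the `token_for` helper, the use of SHA-256, and the 64 tokens per ingester are all assumptions made for the example, and the ring is held in memory here rather than in Consul.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a string to a 32-bit ring position (the hash choice is an assumption)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:4], "big")


class Ring:
    """Toy consistent-hash ring; the slides say the real tokens are stored in Consul."""

    def __init__(self, ingesters, tokens_per_ingester=64):
        # Each ingester owns several tokens so that load spreads more evenly.
        self._ring = sorted(
            (token_for(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._tokens = [token for token, _ in self._ring]

    def ingester_for(self, user_id: str, metric_name: str) -> str:
        # The hash input is (user ID, metric name), as on the slide.
        key = token_for(f"{user_id}:{metric_name}")
        # Pick the first token clockwise from the key, wrapping at the end.
        idx = bisect.bisect_right(self._tokens, key) % len(self._tokens)
        return self._ring[idx][1]


ring = Ring(["ingester-1", "ingester-2", "ingester-3"])
owner = ring.ingester_for("user-1", "http_requests_total")
```

Because only the user ID and metric name feed the hash, every series of one metric for one user lands on the same ingester, which is what makes a single hot ingester possible.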

Ingester
• Heavily modified MemorySeriesStorage
• Uses same chunk format as Prometheus
• Keeps everything in memory (for up to an hour)
• Also stores an in-memory inverted index for queries
• Flushes chunks to S3 and indexes them in DynamoDB
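
An in-memory inverted index of the kind the Ingester slide mentions might look something like this. It is a sketch under assumptions: the `InvertedIndex` class and its methods are hypothetical names, only equality matchers are handled, and the actual MemorySeriesStorage-derived code is not shown in the slides.

```python
from collections import defaultdict


class InvertedIndex:
    """Maps each (label name, label value) pair to the series that carry it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, series_id, labels):
        # Register the series under every one of its label pairs.
        for name, value in labels.items():
            self._postings[(name, value)].add(series_id)

    def lookup(self, matchers):
        """Return the IDs of series that have all the given label=value pairs."""
        sets = [self._postings.get(pair, set()) for pair in matchers.items()]
        if not sets:
            return set()
        # A series matches only if it appears in every posting set.
        return set.intersection(*sets)


index = InvertedIndex()
index.add(1, {"__name__": "up", "job": "api"})
index.add(2, {"__name__": "up", "job": "db"})
matching = index.lookup({"__name__": "up", "job": "db"})
```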

DynamoDB & S3
External inverted index maintained in DynamoDB, chunks stored in S3
Item in DynamoDB looks like:
{
  hash key: "{user ID}:{metric name}:{hour}",
  range key: "{label name}:{label value}:{chunk ID}",
  metric: ...,
  from, through: ...,
  ID: ...,
}
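
To make the key layout concrete, here is a rough sketch of how index entries for one flushed chunk could be built. Only the hash key and range key formats come from the slide; the `index_items` function, the dictionary field names, the contents of `metric`, the use of Unix seconds divided by 3600 for `{hour}`, and the choice to write one entry per label pair per hour are assumptions for illustration.

```python
def index_items(user_id, metric_name, labels, chunk_id, from_ts, through_ts):
    """Build one index entry per (hour bucket, label pair) for a chunk.

    Timestamps are assumed to be Unix seconds.
    """
    items = []
    # Assumption: a chunk is indexed under every hour bucket it overlaps.
    for hour in range(from_ts // 3600, through_ts // 3600 + 1):
        for name, value in sorted(labels.items()):
            items.append({
                "hash_key": f"{user_id}:{metric_name}:{hour}",
                "range_key": f"{name}:{value}:{chunk_id}",
                # Assumption: the slide elides these values; placeholders only.
                "metric": dict(labels, __name__=metric_name),
                "from": from_ts,
                "through": through_ts,
                "ID": chunk_id,
            })
    return items


items = index_items("u1", "up", {"job": "api", "instance": "a"}, "c1", 7200, 7300)
```

With this layout a lookup has to supply the full hash key, which includes the metric name; that is consistent with the limitation that queries not involving a metric name cannot be served.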

The Good
• It works! And in ~2 months.
• Seems pretty scalable, handling two clusters right now
• Query performance better than expected
The Bad
• Hashing scheme means can’t do queries that don’t involve metric names.
• Possible to hotspot an ingester
The Ugly: the code…

Lots left to do…
Features:
• Recording rules
• Alerting & Alertmanager
Reliability:
• Replication between ingesters, commit log etc
• Ingester lifecycle
• Separate query service?
Performance:
• Query parallelisation
• Background chunk coalescing
Code:
• Code cleanup
• Upstream appropriate changes

Questions?
Try it out!
Email help@weave.works for instructions and to get on the whitelist
https://github.com/tomwilkie/prometheus


Flux’s Security & Scalability with OCI & Helm Slides.pdf

Weaveworks

Flux Security & Scalability using VS Code GitOps Extension

Weaveworks

Recently Flux has released two new features (OCI and Cosign) for scalable and secure GitOps. Juozas Gaigalas, a Developer Experience Engineer at Weaveworks, will demonstrate how developers and platform engineers can quickly create scalable and Cosign-verified GitOps configurations using VS Code GitOps Tools extension. New and experienced Flux users can learn about Flux’s OCI and Cosign support through this demo.

Deploying Stateful Applications Securely & Confidently with Ondat & Weave GitOps

Weaveworks

Deploying secure, cloud native stateful applications requires a high level of performance across hybrid and multi-cloud environments. Using the scalable, highly performant storage provided by Ondat in combination with Weave GitOps Trusted Delivery, you can shift left security and accelerate software development. Watch this on-demand webinar as we demonstrate how: - All changes to application configuration are managed through Git workflows GitOps provides an extra layer of security by removing the need for direct access to Kubernetes clusters. - Policy-as-Code guarantees security, resilience and coding standards compliance. - To dynamically provision highly available persistent volumes by simply deploying Ondat anywhere with a simple operator profile. - All data services such as replication, compression and encryption, are optimized and accelerated to scale on any platform with Ondat’s low latency data plane.

More from Weaveworks (20)

Weave AI Controllers (Weave GitOps Office Hours)

Flamingo: Expand ArgoCD with Flux (Office Hours)

Webinar: Capabilities, Confidence and Community – What Flux GA Means for You

Six Signs You Need Platform Engineering

SRE and GitOps for Building Robust Kubernetes Platforms.pdf

Webinar: End to End Security & Operations with Chainguard and Weave GitOps

Flux Beyond Git Harnessing the Power of OCI

Automated Provisioning, Management & Cost Control for Kubernetes Clusters

How to Avoid Kubernetes Multi-tenancy Catastrophes

Building internal developer platform with EKS and GitOps

GitOps Testing in Kubernetes with Flux and Testkube.pdf

Intro to GitOps with Weave GitOps, Flagger and Linkerd

Implementing Flux for Scale with Soft Multi-tenancy

Accelerating Hybrid Multistage Delivery with Weave GitOps on EKS

The Story of Flux Reaching Graduation in the CNCF

Shift Deployment Security Left with Weave GitOps & Upbound’s Universal Crossp...

Securing Your App Deployments with Tunnels, OIDC, RBAC, and Progressive Deliv...

Flux’s Security & Scalability with OCI & Helm Slides.pdf

Flux Security & Scalability using VS Code GitOps Extension

Deploying Stateful Applications Securely & Confidently with Ondat & Weave GitOps

Project Frankenstein: A multitenant, horizontally scalable Prometheus as a service

1. Project Frankenstein A multi-tenant, horizontally scalable Prometheus as a Service Tom Wilkie (& Julius Volz) Weaveworks, August 2016

4. “the best way to visualise, manage & monitor your cloud native application”

7. Design

8. why not just run my own Prometheus?
• the as-a-service bit provides authentication and access control
• virtually infinite retention; all the state is managed for you, by us
• provide a different story around durability, HA and scalability
• (eventually) better query performance, especially for long queries

9. requirements:
   1. API compatible with Prometheus
   2. easy to operate and manage
   3. tens of thousands of users, tens of millions of samples/s
   4. cost effective to run
   5. reuse as much of Prometheus as possible
   … so we can sell it

10. Aim: build proof of concept as quickly as possible
16/06 started design doc
22/06 circulated on list
22/06 initial commit
26/07 launch jobs
25/08 give talk!
http://goo.gl/prdUYV

11. [Architecture diagram. Your DC: Retriever, scraping your jobs. Weave Cloud: Frontend/Authenticator, Distributors, Ingesters, DynamoDB, S3.]

12. Retriever
/bin/prometheus -retrieval-only -storage.remote.generic-url=...
Does scraping and relabelling. Is a vanilla Prometheus plus:
• Brian Brazil’s generic write PR (#1487)
• Some modification to prevent local storage + indexing

13. Distributor
• Uses consistent hashing to assign timeseries to Ingesters
• Input to hash is (user ID, metric name)
• Tokens stored in Consul
• Also currently handles queries
http://goo.gl/U9u1U2
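
The Distributor's assignment of timeseries to Ingesters can be illustrated with a toy consistent-hash ring. This is a minimal sketch, not the project's code: the `Ring` class, the SHA-256-derived 32-bit tokens, and the 64 tokens per ingester are all assumptions made for illustration, and tokens are held in memory here rather than in Consul.

```python
import bisect
import hashlib


def _hash32(data: str) -> int:
    """Map a string to a 32-bit token (the hash function is an assumption)."""
    return int.from_bytes(hashlib.sha256(data.encode("utf-8")).digest()[:4], "big")


class Ring:
    """Toy consistent-hash ring; tokens live in memory, not in Consul."""

    def __init__(self, tokens_per_ingester: int = 64):
        self.tokens_per_ingester = tokens_per_ingester
        self._tokens = []   # sorted list of tokens on the ring
        self._owners = {}   # token -> ingester name

    def add_ingester(self, name: str) -> None:
        for i in range(self.tokens_per_ingester):
            token = _hash32(f"{name}-{i}")
            if token in self._owners:
                continue  # skip the (very unlikely) token collision
            self._owners[token] = name
            bisect.insort(self._tokens, token)

    def remove_ingester(self, name: str) -> None:
        self._tokens = [t for t in self._tokens if self._owners[t] != name]
        self._owners = {t: o for t, o in self._owners.items() if o != name}

    def lookup(self, user_id: str, metric_name: str) -> str:
        """Return the ingester owning the first token at or after the key."""
        if not self._tokens:
            raise LookupError("ring is empty")
        key = _hash32(f"{user_id}:{metric_name}")
        idx = bisect.bisect_left(self._tokens, key) % len(self._tokens)
        return self._owners[self._tokens[idx]]
```

Because the hash input is only (user ID, metric name), every series of one metric for one user lands on the same ingester, and removing an ingester only moves the keys that it owned.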

14. Ingester
• Heavily modified MemorySeriesStorage
• Uses the same chunk format as Prometheus
• Keeps everything in memory (for up to an hour)
• Also stores an in-memory inverted index for queries
• Flushes chunks to S3 and indexes them in DynamoDB
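
To make the in-memory inverted index concrete, here is a small sketch of the general technique: each (label name, label value) pair maps to the set of series carrying it, and a query intersects those sets. The `InvertedIndex` class and its methods are hypothetical names, only equality matchers are handled, and the real modified MemorySeriesStorage is certainly more involved than this.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy per-user inverted index over label pairs (illustrative only)."""

    def __init__(self):
        # (user ID, label name, label value) -> set of series IDs
        self._postings = defaultdict(set)
        # series ID -> label set
        self._series = {}

    def add(self, user_id: str, labels: dict) -> tuple:
        """Register a series and index each of its label pairs."""
        series_id = (user_id, tuple(sorted(labels.items())))
        self._series[series_id] = dict(labels)
        for name, value in labels.items():
            self._postings[(user_id, name, value)].add(series_id)
        return series_id

    def select(self, user_id: str, matchers: dict) -> list:
        """Return label sets of series matching all equality matchers."""
        if not matchers:
            return []
        sets = [self._postings.get((user_id, n, v), set())
                for n, v in matchers.items()]
        ids = set.intersection(*sets)
        return sorted((self._series[i] for i in ids),
                      key=lambda labels: sorted(labels.items()))
```

Keying the postings by user ID keeps tenants apart: a query for one user can never return another user's series.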

15. DynamoDB, S3
External inverted index maintained in DynamoDB, chunks stored in S3.
Item in DynamoDB looks like:
{
  hash key: “{user ID}:{metric name}:{hour}”,
  range key: “{label name}:{label value}:{chunk ID}”,
  metric: ...,
  from, through: ...,
  ID: ...,
}
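
The key layout above can be sketched as a function that builds the index items for one chunk. Several things here are assumptions rather than facts from the slides: that `{hour}` is the number of hours since the Unix epoch, that a chunk gets one item per label per hour bucket it overlaps, and the function name `index_entries`. The elided `metric` attribute is left out rather than guessed.

```python
SECONDS_PER_HOUR = 3600


def index_entries(user_id, metric_name, labels, chunk_id, from_ts, through_ts):
    """Build index items for one chunk: one per (hour bucket, label pair).

    Timestamps are Unix seconds; the hour bucket is hours since the epoch
    (an assumption, the slide only says "{hour}").
    """
    items = []
    first_hour = from_ts // SECONDS_PER_HOUR
    last_hour = through_ts // SECONDS_PER_HOUR
    for hour in range(first_hour, last_hour + 1):
        for name, value in sorted(labels.items()):
            items.append({
                "hash_key": f"{user_id}:{metric_name}:{hour}",
                "range_key": f"{name}:{value}:{chunk_id}",
                "from": from_ts,
                "through": through_ts,
                "ID": chunk_id,
            })
    return items


def range_key_prefix(label_name, label_value):
    """Prefix that selects all chunks carrying a given label pair."""
    return f"{label_name}:{label_value}:"
```

With this layout a lookup needs the user ID, metric name and hour to form the hash key, and can then narrow by a range-key prefix such as `range_key_prefix("job", "api")`. That is consistent with the limitation that queries must involve a metric name.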

16. Evaluation

17. The Good
• It works! And in ~2 months.
• Seems pretty scalable, handling two clusters right now
• Query performance better than expected
The Ugly: the code…
The Bad
• Hashing scheme means can’t do queries that don’t involve metric names.
• Possible to hotspot an ingester


19. Demo

20. Lots left to do…
Features:
• Recording rules
• Alerting & Alertmanager
Reliability:
• Replication between ingesters, commit log etc
• Ingester lifecycle
• Separate query service?
Performance:
• Query parallelisation
• Background chunk coalescing
Code:
• Code cleanup
• Upstream appropriate changes

21. Questions?
Try it out! Email help@weave.works for instructions and to get on the whitelist.
https://github.com/tomwilkie/prometheus
