Team Collaboration in Kafka Clusters With Maria Berinde-Tampanariu | Current 2022
When different teams start to use the same Kafka clusters, it opens up opportunities and challenges. During this talk, we will look at different architectures and team structures to explore ways in which to set up authorization in a granular and maintainable way for real-world users, as well as for producing or consuming clients.
What options does the built-in Kafka Authorizer offer, how can the Authorizer be customized, and how are integrations with external systems built to provide group-based or role-based access control? Confluent Cloud and Confluent Platform provide predefined roles as part of the Role-based Access Control (RBAC) feature. We will look at the permissions included in these roles, the scope on which they can be used, and the components for which they are available. Role-based Access Control and Access Control Lists can be used together; let's explore the options, best practices, and order of precedence.
We will put the capabilities into action by looking at the practices used by an imaginary company where the central Platform Team provisions clusters for its internal customers and lets teams self-manage their domains. What's the best approach to granting team members access to their team's resources, and what needs to happen when one team collaborates with another? What happens when a team member temporarily works on two teams?
We will close the session by looking at how the authorization mechanisms can be combined with different authentication options, and at automation options that make these actions predictable and repeatable.
3. Journey
My first Kafka cluster
● a foreseeable number of applications
● the core team with full access
Central Nervous System
● many different types of clients
● many users with different access levels
The ability to work without getting in each other's way:
➔ scalable & repeatable actions
➔ predictability
➔ self-service capabilities
➔ isolation
➔ manageability
7. Client Authentication
• process of establishing the client identity and verifying client & server authenticity
• the authenticated identity applies throughout the lifetime of the connection
• KafkaPrincipal used to represent client identity (e.g. Username: maria)
• principal used to:
- grant access to resources
- allocate quotas
- log details
• different authentication mechanisms
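To make the principal concept concrete, here is a minimal Python sketch. It is a simplified stand-in, not Kafka's Java `KafkaPrincipal` class: it only mirrors the `<type>:<name>` string form (e.g. `User:maria`) in which Kafka renders principals, and shows the same value being reused for an ACL check, a quota lookup and a log line. The quota key and log format are hypothetical shapes chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    """Simplified stand-in for a Kafka principal: a type plus a name."""
    principal_type: str
    name: str

    def __str__(self) -> str:
        # Kafka renders principals as "<type>:<name>", e.g. "User:maria"
        return f"{self.principal_type}:{self.name}"

    @classmethod
    def parse(cls, text: str) -> "Principal":
        # Split on the first colon only, so names (e.g. TLS DNs) may contain colons
        principal_type, sep, name = text.partition(":")
        if not sep or not principal_type or not name:
            raise ValueError(f"expected '<type>:<name>', got {text!r}")
        return cls(principal_type, name)


maria = Principal.parse("User:maria")
# The identity established at authentication is reused for every decision
acl_subject = str(maria)              # matched against ACL principals
quota_key = ("user", maria.name)      # hypothetical quota lookup key
log_line = f"principal={maria} operation=Read resource=Topic:orders"
```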
8. Authentication Methods
Confluent Cloud
API Keys
• Cloud keys
• resource-specific keys
  - Kafka
  - Schema Registry
  - ksqlDB
• all keys owned by an account
  - user & service accounts
• key rotation
OAuth
• delegated authentication
• JSON Web Token (JWT)
• OpenID Connect (OIDC)
• identity provider & identity pools
Single Sign-On
• SAML-based Identity Provider (IdP)
• enabled at Confluent Cloud organization level
• SSO users vs. local users
Confluent Cloud is a fully-managed Apache Kafka service available on all three major clouds.
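On the client side, the API key and OAuth options differ mainly in their SASL settings. The sketch below assumes the librdkafka-based `confluent-kafka` Python client, whose configuration property names it uses. The bootstrap address, token endpoint, client credentials, cluster ID and identity pool ID are made-up placeholders, and the functions only build dictionaries, so nothing connects to a cluster.

```python
def api_key_config(bootstrap: str, api_key: str, api_secret: str) -> dict:
    """Settings for authenticating with a Kafka API key (SASL/PLAIN over TLS)."""
    return {
        "bootstrap.servers": bootstrap,
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": api_key,     # the API key acts as the username
        "sasl.password": api_secret,  # the API secret acts as the password
    }


def oauth_config(bootstrap: str, token_url: str, client_id: str,
                 client_secret: str, cluster_id: str, pool_id: str) -> dict:
    """Settings for OAuth/OIDC: the client obtains a JWT from the identity provider."""
    return {
        "bootstrap.servers": bootstrap,
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "OAUTHBEARER",
        "sasl.oauthbearer.method": "oidc",
        "sasl.oauthbearer.token.endpoint.url": token_url,
        "sasl.oauthbearer.client.id": client_id,
        "sasl.oauthbearer.client.secret": client_secret,
        # Tells Confluent Cloud which cluster and identity pool the token is for
        "sasl.oauthbearer.extensions": f"logicalCluster={cluster_id},identityPoolId={pool_id}",
    }


# Placeholder values only (hypothetical endpoint, credentials and IDs)
BOOTSTRAP = "pkc-example.us-east-1.aws.confluent.cloud:9092"
key_cfg = api_key_config(BOOTSTRAP, "MY_KEY", "MY_SECRET")
oidc_cfg = oauth_config(BOOTSTRAP, "https://idp.example.com/oauth2/token",
                        "my-client", "my-secret", "lkc-abc123", "pool-xyz")
```

Either dictionary could then be passed to a `confluent_kafka.Producer` or `Consumer`. With API keys, rotation means issuing and distributing a new key; with OAuth, the client fetches short-lived tokens from the identity provider instead.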
10. Access Control List (ACL)
• general format:
"Principal P is [Allowed/Denied] Operation O From Host H On Resource R"
• wildcard & prefix matching supported
ACL elements by platform:
• Principal P (wildcard)
  - Apache Kafka® (standard authorizer): individual principals
  - Confluent Platform: individual & group principals
  - Confluent Cloud: user & service accounts
• is [Allowed / Denied]: “Deny” always trumps “Allow”.
• Operation O: supported operations are based on the resource (see docs)
• From Host H
  - Apache Kafka® & Confluent Platform: supported
  - Confluent Cloud: not supported
• On Resource R (wildcard & prefix)
  - Apache Kafka® & Confluent Platform: Cluster, Delegation Token, Group, Topic, Transactional ID
  - Confluent Cloud: Cluster, Consumer Group, Topic, Transactional ID
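The ACL format and the "Deny always trumps Allow" rule can be modeled in a few lines. This is a simplified Python sketch, not Kafka's implementation: it supports the `User:*` and `*` wildcards plus literal and prefixed resource patterns, but it ignores implied operations (e.g. READ implying DESCRIBE), group principals and the broker setting `allow.everyone.if.no.acl.found`. All principals, hosts and topic names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    principal: str        # e.g. "User:maria", or "User:*" for any user
    permission: str       # "ALLOW" or "DENY"
    operation: str        # e.g. "READ", "WRITE", or "ALL"
    host: str             # client host, or "*" for any host
    resource_type: str    # e.g. "TOPIC", "GROUP"
    resource_name: str    # literal name, "*", or a prefix
    pattern_type: str = "LITERAL"  # "LITERAL" or "PREFIXED"


def _matches(acl, principal, operation, host, resource_type, resource_name):
    if acl.resource_type != resource_type:
        return False
    if acl.pattern_type == "PREFIXED":
        resource_ok = resource_name.startswith(acl.resource_name)
    else:
        resource_ok = acl.resource_name in ("*", resource_name)
    return (resource_ok
            and acl.principal in (principal, "User:*")
            and acl.operation in (operation, "ALL")
            and acl.host in (host, "*"))


def authorize(acls, principal, operation, host, resource_type, resource_name):
    hits = [a for a in acls
            if _matches(a, principal, operation, host, resource_type, resource_name)]
    # Any matching DENY wins over every matching ALLOW
    if any(a.permission == "DENY" for a in hits):
        return False
    # No matching ALLOW means no access (deny by default)
    return any(a.permission == "ALLOW" for a in hits)


acls = [
    Acl("User:*", "ALLOW", "READ", "*", "TOPIC", "team-a-", "PREFIXED"),
    Acl("User:bob", "DENY", "READ", "*", "TOPIC", "team-a-payroll"),
]
```

Here everyone may read any `team-a-` topic through one prefixed rule, while a single literal DENY carves out `team-a-payroll` for `User:bob`.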
11. Authorizer
• customizable server plugin
• authorizes an operation based on the principal and the resource being accessed
Apache Kafka®
• pluggable Authorizer
• out-of-the-box implementations:
  - AclAuthorizer (since v2.4)
  - SimpleAclAuthorizer (before v2.4)
  - StandardAuthorizer (KRaft)
• Access Control Lists (ACL) stored in ZooKeeper (ZK) or in the metadata topic
Confluent Platform
• AclAuthorizer (since v5.4.0)
• SimpleAclAuthorizer (before v5.4.0)
• Confluent Server Authorizer with LDAP group-based & role-based access control (RBAC)
• Access Control Lists (ACL) stored in ZooKeeper (ZK) or centrally on the Metadata Service (MDS)
Confluent Cloud
• subset of Kafka Access Control Lists (ACL)
• predefined role-based access control (RBAC) roles
• ACL & RBAC can be used together
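Because the Authorizer is pluggable, group-based access can come from an external system. In Kafka the real plugin contract is the Java interface `org.apache.kafka.server.authorizer.Authorizer`; the Python below only mirrors the idea with a hypothetical one-method interface. The in-memory `directory` dictionary stands in for an LDAP lookup, and all names are made up.

```python
from abc import ABC, abstractmethod
from typing import Dict, Set, Tuple


class Authorizer(ABC):
    """Hypothetical plugin contract: one decision per (principal, operation, resource)."""

    @abstractmethod
    def authorize(self, principal: str, operation: str, resource: str) -> bool:
        ...


class GroupAuthorizer(Authorizer):
    """Resolves group membership from an external directory, then checks grants."""

    def __init__(self, directory: Dict[str, Set[str]],
                 grants: Dict[str, Set[Tuple[str, str]]]):
        self.directory = directory  # principal -> group names (stand-in for LDAP)
        self.grants = grants        # "User:x" or "Group:y" -> {(operation, resource)}

    def authorize(self, principal: str, operation: str, resource: str) -> bool:
        # The principal acts as itself and as every group it belongs to
        identities = {principal}
        identities |= {f"Group:{g}" for g in self.directory.get(principal, set())}
        return any((operation, resource) in self.grants.get(i, set())
                   for i in identities)


authorizer = GroupAuthorizer(
    directory={"User:maria": {"team-a"}},
    grants={"Group:team-a": {("Read", "Topic:orders")}},
)
```

Since access follows group membership, moving someone between teams becomes a directory change rather than a change to per-user rules.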
12. Role-based Access Control (RBAC)
• serves as an additional authorization layer on top of ACLs
• predefined roles & role bindings
• Metadata Service (MDS) used to configure and manage RBAC
• only “Allow” rules (“Deny” not supported)
• benefits:
  + manage security access across the platform (Kafka, ksqlDB, Connect, Schema Registry, Confluent Control Center)
  + delegation of permission management is possible (ResourceOwner role)
  + centrally manage multiple clusters
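The "only Allow" property is what sets RBAC apart from ACLs: a role binding can grant access but never take it away. The Python sketch below assumes simplified, illustrative permission sets for four developer roles. The authoritative role definitions are in the Confluent documentation, and `ManageRoleBindings` is a hypothetical stand-in for ResourceOwner's ability to delegate access.

```python
from dataclasses import dataclass

# Simplified, illustrative permission sets (not the official role definitions).
# "ManageRoleBindings" is a hypothetical stand-in for delegation by ResourceOwner.
ROLE_OPERATIONS = {
    "DeveloperRead": {"Read", "Describe"},
    "DeveloperWrite": {"Write", "Describe"},
    "DeveloperManage": {"Create", "Delete", "Alter", "AlterConfigs",
                        "Describe", "DescribeConfigs"},
    "ResourceOwner": {"Read", "Write", "Create", "Delete", "Alter", "AlterConfigs",
                      "Describe", "DescribeConfigs", "ManageRoleBindings"},
}


@dataclass(frozen=True)
class RoleBinding:
    principal: str
    role: str
    resource: str          # e.g. "Topic:team-a-"
    prefixed: bool = False


def rbac_allows(bindings, principal, operation, resource):
    # RBAC has only "Allow" rules: access exists if at least one binding grants it
    for b in bindings:
        if b.principal != principal or operation not in ROLE_OPERATIONS[b.role]:
            continue
        if resource == b.resource or (b.prefixed and resource.startswith(b.resource)):
            return True
    return False


bindings = [
    RoleBinding("User:maria", "DeveloperRead", "Topic:team-a-", prefixed=True),
    RoleBinding("User:sam", "ResourceOwner", "Topic:team-a-orders"),
]
```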
13. RBAC on Confluent Cloud
[Diagram: RBAC authorization, managed via CLI, GUI and API]
Org Admin
  Env Admin | Env Admin
    Cluster 1 Admin | Cluster 2 Admin
      Topic 1 Resource Owner | Topic 2 Resource Owner
        Dev Read Only (Topic 1) | Dev Write (Topic 2)
Admin Roles (control access to organizations, environments and clusters):
● OrganizationAdmin
● EnvironmentAdmin
● CloudClusterAdmin
Developer Roles (control CRUD operations within Kafka resources):
● ResourceOwner
● DeveloperRead
● DeveloperWrite
● DeveloperManage
Operator Roles:
● Operator
● MetricsViewer
Note: A single user can have multiple roles.
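The hierarchy in the diagram means that a role binding at a higher scope also covers everything beneath it. In the minimal Python sketch below, scopes are modeled as paths from the organization down to a resource. The IDs are hypothetical and only loosely follow the diagram, and real Confluent Cloud scopes are expressed differently.

```python
from typing import List, Set, Tuple

# A scope is a path from the organization down to a resource (hypothetical IDs)
Scope = Tuple[str, ...]


def covers(binding_scope: Scope, target: Scope) -> bool:
    # A role binding applies at its own scope and to everything nested beneath it
    return target[:len(binding_scope)] == binding_scope


def roles_for(bindings: List[Tuple[str, str, Scope]], user: str, target: Scope) -> Set[str]:
    # A single user can hold several bindings; collect every role that applies here
    return {role for (u, role, scope) in bindings if u == user and covers(scope, target)}


bindings = [
    ("alice", "OrganizationAdmin", ("org-1",)),
    ("bob", "EnvironmentAdmin", ("org-1", "env-1")),
    ("carol", "CloudClusterAdmin", ("org-1", "env-1", "lkc-1")),
    ("dave", "DeveloperRead", ("org-1", "env-1", "lkc-1", "Topic:topic-1")),
    ("dave", "DeveloperWrite", ("org-1", "env-2", "lkc-2", "Topic:topic-2")),
]
```

For example, `roles_for(bindings, "alice", ("org-1", "env-2", "lkc-2"))` yields `{"OrganizationAdmin"}`, while the same query for `bob` yields an empty set because his binding is scoped to `env-1`.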
15. Naming Conventions
• RBAC & ACLs can be used together
  - use RBAC as the default way to grant access
  - use ACLs in specific cases to deny access
• both support prefixed rules
• governance
- visual attribution
- stream governance functionality
• choose names unlikely to change over time
• think about how naming conventions can be enforced (e.g. CI/CD pipeline)
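A naming convention is only useful if it is enforced. The Python sketch below shows one way a CI/CD step could reject non-conforming topic names. The `<team>-<domain>-<name>` pattern and the list of approved teams are hypothetical examples, not a recommended standard.

```python
import re

# Hypothetical convention: <team>-<domain>-<name>, lowercase, with an approved team prefix
APPROVED_TEAMS = {"payments", "logistics"}
TOPIC_NAME = re.compile(r"^(?P<team>[a-z0-9]+)-(?P<domain>[a-z0-9]+)-(?P<name>[a-z0-9.-]+)$")


def naming_violations(topics):
    """Return one human-readable problem per non-conforming topic name."""
    problems = []
    for topic in topics:
        match = TOPIC_NAME.match(topic)
        if match is None:
            problems.append(f"{topic}: does not match <team>-<domain>-<name>")
        elif match.group("team") not in APPROVED_TEAMS:
            problems.append(f"{topic}: unknown team prefix {match.group('team')!r}")
    return problems


def main(argv):
    # Returns a process exit code: non-zero fails the pipeline step
    found = naming_violations(argv)
    for problem in found:
        print(problem)
    return 1 if found else 0
```

A pipeline step could run `sys.exit(main(sys.argv[1:]))` with the topic names proposed in a change. Because the team name is the first segment, the same names line up with prefixed access rules.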
16. Demo: Role Bindings with Prefixed Rules in Confluent Cloud
• Authentication: Confluent Cloud local users
• Authorization: RBAC prefixed role bindings
• Naming Convention: team name used as prefix
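With the team name as prefix, the role bindings can be derived mechanically from team membership, including the case of a person who temporarily works on two teams. The Python sketch below produces one prefixed binding per (member, team). The dictionary shape of a binding is hypothetical and is not the Confluent CLI or API format; applying the list would be left to automation.

```python
from typing import Dict, List


def prefixed_bindings(memberships: Dict[str, List[str]],
                      role: str = "DeveloperRead") -> List[dict]:
    """One prefixed role binding per (member, team); the team name is the prefix."""
    bindings = []
    for user in sorted(memberships):
        for team in sorted(set(memberships[user])):
            bindings.append({
                "principal": f"User:{user}",
                "role": role,
                # Trailing "-" keeps "team-a-" from also matching "team-ab-..."
                "resource": f"Topic:{team}-",
                "pattern": "PREFIXED",
            })
    return bindings


# sam temporarily works on two teams, so sam gets two bindings
memberships = {"maria": ["payments"], "sam": ["payments", "logistics"]}
bindings = prefixed_bindings(memberships)
```

If the list is applied declaratively, ending sam's temporary assignment only means removing `"logistics"` from the membership data and re-applying, which drops the corresponding binding.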
20. Platform Limits
• given by the infrastructure on which Kafka is deployed
• Do you know the limits for your deployment?
• Confluent Cloud
- hard limits & soft limits
- different types of clusters (basic, standard & dedicated)
- some limits depend on the cluster type
- examples of limits:
• RBAC role-bindings
• ACLs
• throughput
22. Client Quotas
• applied to (user, client-ID), user, or client-ID groups
• defined at different levels with an order of precedence
• quotas:
  - network bandwidth
  - request rate
• early access feature on Confluent Cloud
Quota parameter     Cloud Client Quotas               Apache Kafka Quotas
Apply to            Service Accounts                  User or Client ID
Managed by          Calling the Confluent Cloud API   API interacting with Kafka directly
Level enforced at   Cluster level                     Broker level
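For Apache Kafka, the documentation lists the order of precedence from most to least specific: user and client ID, user with default client ID, user, default user with client ID, default user with default client ID, default user, client ID, and default client ID. The Python sketch below resolves a quota in that order. The dictionary keys are a hypothetical representation of those configuration entries, and the example values are made-up `producer_byte_rate` settings in bytes per second.

```python
from typing import Dict, Optional, Tuple

DEFAULT = "<default>"
# (user, client_id) -> quota; None means "this level is not part of the entry"
QuotaKey = Tuple[Optional[str], Optional[str]]


def resolve_quota(quotas: Dict[QuotaKey, float], user: str, client_id: str) -> Optional[float]:
    """Return the most specific configured quota, following Kafka's documented precedence."""
    precedence = [
        (user, client_id),        # /config/users/<user>/clients/<client-id>
        (user, DEFAULT),          # /config/users/<user>/clients/<default>
        (user, None),             # /config/users/<user>
        (DEFAULT, client_id),     # /config/users/<default>/clients/<client-id>
        (DEFAULT, DEFAULT),       # /config/users/<default>/clients/<default>
        (DEFAULT, None),          # /config/users/<default>
        (None, client_id),        # /config/clients/<client-id>
        (None, DEFAULT),          # /config/clients/<default>
    ]
    for key in precedence:
        if key in quotas:
            return quotas[key]
    return None  # nothing configured: the client is not throttled


# Hypothetical producer_byte_rate values in bytes per second
quotas = {
    ("maria", None): 2_000_000,
    (DEFAULT, None): 1_000_000,
    (None, "billing-app"): 500_000,
}
```

Note that a default user quota outranks a client-ID quota, so `bob` running `billing-app` gets 1,000,000 bytes per second rather than 500,000.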
27. Confluent Control Center
• self-managed deployment
• can be connected to Confluent Cloud
• can be used to monitor a local Connect cluster
• allows custom notifications
30. Chargeback
• charging individual cost centers for their share of Kafka cluster usage
- flat rate
- consumption based
• chargeback vs. showback
• start with a simple model, which can evolve over time
• Confluent Control Center insights
• Metrics grouped by Principal ID
• content about cost effectiveness by Lyndon Hedderly, Confluent Principal Business Value Consultant
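The flat-rate and consumption-based models can be combined into one simple formula, which can evolve over time as suggested above. In the Python sketch below, a chosen fraction of the bill is split equally across principals, and the rest is split by each principal's share of usage (for example, bytes in plus bytes out grouped by principal ID). The numbers are hypothetical. For showback, the same figures would be reported to teams without being invoiced.

```python
from typing import Dict


def chargeback(total_cost: float, usage: Dict[str, float],
               flat_fraction: float = 0.0) -> Dict[str, float]:
    """Split a cluster bill across principals.

    flat_fraction of the cost is shared equally (flat rate); the remainder is
    split in proportion to each principal's usage (consumption based)."""
    if not usage:
        return {}
    flat_each = total_cost * flat_fraction / len(usage)
    variable = total_cost * (1 - flat_fraction)
    total_usage = sum(usage.values())
    shares = {}
    for principal, amount in usage.items():
        # Fall back to an equal split if no usage was recorded at all
        ratio = amount / total_usage if total_usage else 1 / len(usage)
        shares[principal] = flat_each + variable * ratio
    return shares


# Hypothetical monthly figures: usage per service account (principal ID)
bill = chargeback(1000.0, {"sa-payments": 300.0, "sa-logistics": 100.0}, flat_fraction=0.2)
```

With these inputs, each account pays 100 of the flat part, and the remaining 800 is split 600/200 by usage, giving 700 and 300.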
31. Active Connection Count Example
Total client connections (Basic & Standard clusters): max 1000
Number of TCP connections to the cluster that can be open at one time. Available in the Metrics API as active_connection_count.
If you are self-managing Kafka, you can look at the broker metrics kafka.server:type=socket-server-metrics,listener={listener_name},networkProcessor={#},name=connection-count to understand how many connections you are using.
This value can vary widely based on several factors, including the number of producer clients, the number of consumer clients, the partition keying strategy, and the produce and consume patterns per client.
To reduce usage on this dimension, you can reduce the total number of clients connecting to the cluster.
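For a self-managed broker, the per-listener, per-network-processor connection counts have to be added up. The Python sketch below assumes the samples were already collected (for example by a JMX exporter) into a dictionary keyed by the MBean name quoted above. That input shape is a hypothetical choice, and it covers one broker only, so a cluster total means summing across brokers. The default limit of 1000 mirrors the Basic & Standard figure; other deployments would pass their own limit.

```python
import re
from collections import defaultdict
from typing import Dict

MBEAN = re.compile(
    r"kafka\.server:type=socket-server-metrics,"
    r"listener=(?P<listener>[^,]+),networkProcessor=(?P<processor>\d+)"
)


def connections_by_listener(samples: Dict[str, float]) -> Dict[str, int]:
    """Sum connection-count samples across network processors, per listener."""
    totals: Dict[str, int] = defaultdict(int)
    for mbean, value in samples.items():
        match = MBEAN.search(mbean)
        # Skip unrelated socket-server metrics such as connection rates
        if match and mbean.endswith("name=connection-count"):
            totals[match.group("listener")] += int(value)
    return dict(totals)


def usage_ratio(samples: Dict[str, float], limit: int = 1000) -> float:
    # Fraction of the connection limit currently in use
    return sum(connections_by_listener(samples).values()) / limit


PREFIX = "kafka.server:type=socket-server-metrics"
samples = {  # hypothetical readings from one broker
    f"{PREFIX},listener=EXTERNAL,networkProcessor=0,name=connection-count": 120,
    f"{PREFIX},listener=EXTERNAL,networkProcessor=1,name=connection-count": 95,
    f"{PREFIX},listener=INTERNAL,networkProcessor=0,name=connection-count": 12,
}
```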
35. Terraform Considerations
• starting a new project vs. migrating existing clusters
• decide whether to support all possible options or provide T-shirt-sized templates
• the lifecycle meta-argument, e.g. to make Terraform reject any plan that would destroy a resource:
  lifecycle {
    prevent_destroy = true
  }
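T-shirt-sized templates trade flexibility for predictability: teams choose a size instead of setting every option themselves. The Python sketch below expands sizes into concrete topic settings and writes them as JSON that Terraform could read as variable values. The sizes, their numbers, the `topics` variable and the module that would consume it are all hypothetical.

```python
import json
from typing import Dict, List

DAY_MS = 24 * 60 * 60 * 1000

# Hypothetical sizes; the actual numbers are for the platform team to decide
SIZES: Dict[str, Dict[str, int]] = {
    "S": {"partitions": 3, "retention_ms": 1 * DAY_MS},
    "M": {"partitions": 6, "retention_ms": 3 * DAY_MS},
    "L": {"partitions": 12, "retention_ms": 7 * DAY_MS},
}


def topic_request(team: str, name: str, size: str) -> dict:
    """Expand a T-shirt size into concrete topic settings."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; choose one of {sorted(SIZES)}")
    # Team name as prefix, in line with the naming convention
    return {"topic_name": f"{team}-{name}", **SIZES[size]}


def to_tfvars_json(requests: List[dict]) -> str:
    # Terraform can read variable values from .tfvars.json files
    return json.dumps({"topics": requests}, indent=2)


requests = [topic_request("payments", "orders", "M"),
            topic_request("logistics", "shipments", "S")]
tfvars = to_tfvars_json(requests)
```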