Dynamic
Authorization and
Policy Control for
Docker Container
Environments
TORIN SANDALL
Engineer, Styra
@sometorin
JUSTIN CORMACK
Engineer, Docker
@justincormack
Why Policy?
● Heterogeneity: many languages, protocols,
systems
● Dynamism: system changing, policies
changing
● How do you enforce and audit policy?
● Correctness, performance, cost
Challenges
Today we are going to talk about tools and
approaches for solving this problem!
Computers can help!
Real life example
Example Scenario: AcmeCorp
● Pet camera & home monitoring devices
● Watch your cute pets while you travel
● Always on, always connected
AcmeCorp's Architecture
Portal
PaymentsAccounts Device UpdatesDevice Streams
Stream Archiving
HTTP HTTP gRPCHLS
S3
MySQL
Alice
(Customer)
AcmeCorp VPC
Janet
(Tech support)
AcmeCorp's Problem
Portal
PaymentsAccounts Device UpdatesDevice Streams
Stream Archiving
HTTP HTTP gRPCHLS
S3
MySQL
Alice
(Customer)
AcmeCorp VPC
Janet
(Tech support)
"Tech support specialists accessing customer
data must be assigned to an open ticket for the
customer they are assisting."
Example Policy
Example Implementation
Portal
PaymentsAccounts Device UpdatesDevice Streams
Stream Archiving
HTTP HTTP gRPCHLS
S3
MySQL
Alice
(Customer)
AcmeCorp VPC
Janet
(Tech support)
authzauthzauthzauthz
authz
authz
Example Implementation
business logic
authorization logic
def get_user_account(req):
if not authorized(req):
raise status(403)
return db.read_user_account(req.id)
def authorized(req):
if "support" in req.subject.groups:
for ticket in get_open_tickets(req.id):
if req.subject.user == ticket.assignee:
return True
return False
# other authorization logic...
Obvious questions...
def get_user_account(req):
if not authorized(req):
raise status(403)
return db.read_user_account(req.id)
def authorized(req):
if "support" in req.subject.groups:
for ticket in get_open_tickets(req.id):
if req.subject.user == ticket.assignee:
return True
return False
# other authorization logic...
What happens when the policy changes...
What if the policy requires additional context...
What if the customer requires control of the policy...
What if the policy was not implemented correctly...
What if you have 100+ services written in N langs...
What is Open Policy Agent?
OPA: general-purpose policy engine
Inception
Project started in 2016 at
Styra.
Goal
Unify policy enforcement
across the stack.
Use Cases
Admission control
Authorization
ACLs
RBAC
IAM
ABAC
Risk management
Data Protection
Data Filtering
Users
Netflix
Chef
Medallia
Cloudflare
State Street
Pinterest
Intuit
...and many more.
Today
CNCF project (Sandbox)
36 contributors
1.5K stars
400 slack members
20+ integrations
Open Policy Agent
OPA: general-purpose policy engine
Service
OPA
Policy
(Rego)
Data
(JSON)
Policy
Query
Policy
Decision
Enforcement
Request
OPA: general-purpose policy engine
Accounts
OPA
Policy
(Rego)
Data
(JSON)
Policy
Query
Policy
Decision
Enforcement
Request GET /accounts/alice HTTP/1.1
Authorization: janet
OPA: general-purpose policy engine
Accounts
OPA
Policy
(Rego)
Data
(JSON)
Policy
Query
Policy
Decision
Enforcement
Request GET /accounts/alice HTTP/1.1
Authorization: janet
{
"method": "GET",
"path": ["accounts", "alice"],
"user": "janet"
}
true or false
OPA: general-purpose policy engine
Service
OPA
Policy
(Rego)
Data
(JSON)
Policy
Query
Policy
Decision
Enforcement
Request
Service refers to any one of:
● Custom service
● Kubernetes API server
● Message broker
● SSH daemon
● CI/CD pipeline script
OPA: general-purpose policy engine
Service
OPA
Policy
(Rego)
Data
(JSON)
Policy
Query
Policy
Decision
Enforcement
Request
Service refers to any one of:
● Custom service
● Kubernetes API server
● Message broker
● SSH daemon
● CI/CD pipeline script
Input can be any JSON value:
"alice"
["v1", "users", "bob"]
{"kind": "Pod", "spec": …}
Output can be any JSON value:
true
"request rejected"
{"servers": ["web1", "web2"]}
Getting hands on
"Tech support specialists accessing customer
data must be assigned to an open ticket for the
customer they are assisting."
Example Policy
Conclusions
● write rules not code
● re-use same policy across all applications and
languages
● support audit and testing
A better way to build policy
OPA: Integrations
Data Filtering
Admission Control “Restrict ingress hostnames for payments team.”
“Ensure container images come from corporate repo.”
API Authorization
“Deny test scripts access to production services.”
“Allow analysts to access APIs serving anonymized data.”
Data Protection
Linux PAM
SSH & sudo “Only allow on-call engineers to SSH into production servers.”
"Trades exceeding $10M must be executed between 9AM and
5PM and require MFA."
"Users can access files for past 6 months related to the region
they licensed."
● allow developers to see effect of policy earlier
● show effect of policy as soon as possible
● have ability to move point of enforcement
around if it improves code
● test policy was enforced in production as well
“Shift left”
● https://www.openpolicyagent.org/
● https://slack.openpolicyagent.org/
● https://github.com/open-policy-agent
● Examples from talk
○ https://github.com/tsandall/dockercon-eu-2018
Questions?
I want more!
Blank slide
OPA: general-purpose policy engine
Service
OPA
Policy
(rego)
Data
(json)
Policy
Query
Policy
Decision
Enforcement
● Declarative Policy Language (Rego)
○ Can identity I do operation O on resource R?
■ ACLs, RBAC, IAM, ABAC
○ What invariants does workload W violate?
■ Enforce, audit, dry-run
○ Which records should bob be allowed to see?
■ Constraints on data
OPA: general-purpose policy engine
Service
OPA
Policy
(rego)
Data
(json)
Policy
Query
Policy
Decision
Enforcement
● Declarative Policy Language (Rego)
○ Can identity I do operation O on resource R?
■ ACLs, RBAC, IAM, ABAC
○ What invariants does workload W violate?
■ Enforce, audit, dry-run
○ Which records should bob be allowed to see?
■ Constraints on data
● Library, sidecar, host-level daemon
○ Policy and data are kept in-memory
○ Zero decision-time dependencies
OPA: general-purpose policy engine
Service
OPA
Policy
(rego)
Data
(json)
Policy
Query
Policy
Decision
Enforcement
● Declarative Policy Language (Rego)
○ Can identity I do operation O on resource R?
■ ACLs, RBAC, IAM, ABAC
○ What invariants does workload W violate?
■ Enforce, audit, dry-run
○ Which records should bob be allowed to see?
■ Constraints on data
● Library, sidecar, host-level daemon
○ Policy and data are kept in-memory
○ Zero decision-time dependencies
● Management APIs for control & observability
○ Bundle service API for sending policy & data to OPA
○ Status service API for receiving status from OPA
○ Log service API for receiving audit log from OPA
OPA: general-purpose policy engine
Service
OPA
Policy
(rego)
Data
(json)
Policy
Query
Policy
Decision
Enforcement
● Declarative Policy Language (Rego)
○ Can identity I do operation O on resource R?
■ ACLs, RBAC, IAM, ABAC
○ What invariants does workload W violate?
■ Enforce, audit, dry-run
○ Which records should bob be allowed to see?
■ Constraints on data
● Library, sidecar, host-level daemon
○ Policy and data are kept in-memory
○ Zero decision-time dependencies
● Management APIs for control & observability
○ Bundle service API for sending policy & data to OPA
○ Status service API for receiving status from OPA
○ Log service API for receiving audit log from OPA
● Tooling to build, test, and debug policy
○ opa run, opa test, opa fmt, opa deps, opa check, etc.
○ VS Code plugin, Tracing, Profiling, etc.
Hands on example with OPA
Accounts
OPA Bundle
Server
Policy + Tickets
GET /accounts/alice HTTP/1.1
Authorization: Bearer ...
Input
{
method: GET
path: [accounts, alice]
subject: {
user: janet
groups: [support]
}
}
Result
true (allow) or false (deny)
Data
{
tickets: {
alice: [
{assignee: janet},
{assignee: bob}
],
ken: [
{assignee: janet}
]
}
}
Hands on example with OPA
Input
{
method: GET
path: [accounts, alice]
subject: {
user: janet
groups: [support]
}
}
Data
{
tickets: {
alice: [
{assignee: janet}
]
}
}
package acmecorp.authz
default allow = false
allow = true {
input.method = "GET"
input.path = ["accounts", id]
input.subject.groups[_] = "support"
input.subject.user = data.tickets[id][_].assignee
}
Example Policy++
"Tech support specialists should only perform
device updates during core business hours
(10AM to 3PM GMT)."
allow {
input.action = "UpdateDevice"
input.subject.groups[_] = "support"
inside_business_hours
}
inside_business_hours {
[hour, minute, second] = time.clock(time.now_ns())
time_of_day = (hour * seconds_per_hour) + (minute * seconds_per_minute) + second
time_of_day >= 10 * seconds_per_hour
time_of_day <= 15 * seconds_per_hour
}
seconds_per_hour = 60 * 60
seconds_per_minute = 60
Example Policy++
For example:
● valuable data on PVs must be retained 6 months after app is undeployed
● default reclaim policy is "delete" because clean-up is annoying
● admins want to grant exceptions for specific apps to use "retain" policy
Beyond the app
Kubernetes Admission Policy
package kubernetes.admission
import data.kubernetes.storageclasses
reclaim_exceptions = {"payments", "clickstream"}
# Generate an admission control violation error if...
deny["invalid reclaim policy requested by app"] {
# Input object is a PVC and ...
input.kind = "PersistentVolumeClaim"
# StorageClass specifies the "Retain" reclaim policy and...
name = input.spec.storageClassName
storageclasses[name].reclaimPolicy = "Retain"
# The app that owns the PVC is not whitelisted.
not reclaim_exceptions[input.labels["app"]]
}

Dynamic Authorization & Policy Control for Docker Environments

  • 1.
    Dynamic Authorization and Policy Controlfor Docker Container Environments
  • 2.
    TORIN SANDALL Engineer, Styra @sometorin JUSTINCORMACK Engineer, Docker @justincormack
  • 3.
  • 6.
    ● Heterogeneity: manylanguages, protocols, systems ● Dynamism: system changing, policies changing ● How do you enforce and audit policy? ● Correctness, performance, cost Challenges
  • 7.
    Today we aregoing to talk about tools and approaches for solving this problem! Computers can help!
  • 8.
  • 9.
    Example Scenario: AcmeCorp ●Pet camera & home monitoring devices ● Watch your cute pets while you travel ● Always on, always connected
  • 10.
    AcmeCorp's Architecture Portal PaymentsAccounts DeviceUpdatesDevice Streams Stream Archiving HTTP HTTP gRPCHLS S3 MySQL Alice (Customer) AcmeCorp VPC Janet (Tech support)
  • 11.
    AcmeCorp's Problem Portal PaymentsAccounts DeviceUpdatesDevice Streams Stream Archiving HTTP HTTP gRPCHLS S3 MySQL Alice (Customer) AcmeCorp VPC Janet (Tech support)
  • 12.
    "Tech support specialistsaccessing customer data must be assigned to an open ticket for the customer they are assisting." Example Policy
  • 13.
    Example Implementation Portal PaymentsAccounts DeviceUpdatesDevice Streams Stream Archiving HTTP HTTP gRPCHLS S3 MySQL Alice (Customer) AcmeCorp VPC Janet (Tech support) authzauthzauthzauthz authz authz
  • 14.
    Example Implementation business logic authorizationlogic def get_user_account(req): if not authorized(req): raise status(403) return db.read_user_account(req.id) def authorized(req): if "support" in req.subject.groups: for ticket in get_open_tickets(req.id): if req.subject.user == ticket.assignee: return True return False # other authorization logic...
  • 15.
    Obvious questions... def get_user_account(req): ifnot authorized(req): raise status(403) return db.read_user_account(req.id) def authorized(req): if "support" in req.subject.groups: for ticket in get_open_tickets(req.id): if req.subject.user == ticket.assignee: return True return False # other authorization logic... What happens when the policy changes... What if the policy requires additional context... What if the customer requires control of the policy... What if the policy was not implemented correctly... What if you have 100+ services written in N langs...
  • 16.
    What is OpenPolicy Agent?
  • 17.
    OPA: general-purpose policyengine Inception Project started in 2016 at Styra. Goal Unify policy enforcement across the stack. Use Cases Admission control Authorization ACLs RBAC IAM ABAC Risk management Data Protection Data Filtering Users Netflix Chef Medallia Cloudflare State Street Pinterest Intuit ...and many more. Today CNCF project (Sandbox) 36 contributors 1.5K stars 400 slack members 20+ integrations Open Policy Agent
  • 18.
    OPA: general-purpose policyengine Service OPA Policy (Rego) Data (JSON) Policy Query Policy Decision Enforcement Request
  • 19.
    OPA: general-purpose policyengine Accounts OPA Policy (Rego) Data (JSON) Policy Query Policy Decision Enforcement Request GET /accounts/alice HTTP/1.1 Authorization: janet
  • 20.
    OPA: general-purpose policyengine Accounts OPA Policy (Rego) Data (JSON) Policy Query Policy Decision Enforcement Request GET /accounts/alice HTTP/1.1 Authorization: janet { "method": "GET", "path": ["accounts", "alice"], "user": "janet" } true or false
  • 21.
    OPA: general-purpose policyengine Service OPA Policy (Rego) Data (JSON) Policy Query Policy Decision Enforcement Request Service refers to any one of: ● Custom service ● Kubernetes API server ● Message broker ● SSH daemon ● CI/CD pipeline script
  • 22.
    OPA: general-purpose policyengine Service OPA Policy (Rego) Data (JSON) Policy Query Policy Decision Enforcement Request Service refers to any one of: ● Custom service ● Kubernetes API server ● Message broker ● SSH daemon ● CI/CD pipeline script Input can be any JSON value: "alice" ["v1", "users", "bob"] {"kind": "Pod", "spec": …} Output can be any JSON value: true "request rejected" {"servers": ["web1", "web2"]}
  • 23.
  • 24.
    "Tech support specialistsaccessing customer data must be assigned to an open ticket for the customer they are assisting." Example Policy
  • 25.
  • 26.
    ● write rulesnot code ● re-use same policy across all applications and languages ● support audit and testing A better way to build policy
  • 27.
    OPA: Integrations Data Filtering AdmissionControl “Restrict ingress hostnames for payments team.” “Ensure container images come from corporate repo.” API Authorization “Deny test scripts access to production services.” “Allow analysts to access APIs serving anonymized data.” Data Protection Linux PAM SSH & sudo “Only allow on-call engineers to SSH into production servers.” "Trades exceeding $10M must be executed between 9AM and 5PM and require MFA." "Users can access files for past 6 months related to the region they licensed."
  • 28.
    ● allow developersto see effect of policy earlier ● show effect of policy as soon as possible ● have ability to move point of enforcement around if it improves code ● test policy was enforced in production as well “Shift left”
  • 29.
    ● https://www.openpolicyagent.org/ ● https://slack.openpolicyagent.org/ ●https://github.com/open-policy-agent ● Examples from talk ○ https://github.com/tsandall/dockercon-eu-2018 Questions? I want more!
  • 30.
  • 31.
    OPA: general-purpose policyengine Service OPA Policy (rego) Data (json) Policy Query Policy Decision Enforcement ● Declarative Policy Language (Rego) ○ Can identity I do operation O on resource R? ■ ACLs, RBAC, IAM, ABAC ○ What invariants does workload W violate? ■ Enforce, audit, dry-run ○ Which records should bob be allowed to see? ■ Constraints on data
  • 32.
    OPA: general-purpose policyengine Service OPA Policy (rego) Data (json) Policy Query Policy Decision Enforcement ● Declarative Policy Language (Rego) ○ Can identity I do operation O on resource R? ■ ACLs, RBAC, IAM, ABAC ○ What invariants does workload W violate? ■ Enforce, audit, dry-run ○ Which records should bob be allowed to see? ■ Constraints on data ● Library, sidecar, host-level daemon ○ Policy and data are kept in-memory ○ Zero decision-time dependencies
  • 33.
    OPA: general-purpose policyengine Service OPA Policy (rego) Data (json) Policy Query Policy Decision Enforcement ● Declarative Policy Language (Rego) ○ Can identity I do operation O on resource R? ■ ACLs, RBAC, IAM, ABAC ○ What invariants does workload W violate? ■ Enforce, audit, dry-run ○ Which records should bob be allowed to see? ■ Constraints on data ● Library, sidecar, host-level daemon ○ Policy and data are kept in-memory ○ Zero decision-time dependencies ● Management APIs for control & observability ○ Bundle service API for sending policy & data to OPA ○ Status service API for receiving status from OPA ○ Log service API for receiving audit log from OPA
  • 34.
    OPA: general-purpose policyengine Service OPA Policy (rego) Data (json) Policy Query Policy Decision Enforcement ● Declarative Policy Language (Rego) ○ Can identity I do operation O on resource R? ■ ACLs, RBAC, IAM, ABAC ○ What invariants does workload W violate? ■ Enforce, audit, dry-run ○ Which records should bob be allowed to see? ■ Constraints on data ● Library, sidecar, host-level daemon ○ Policy and data are kept in-memory ○ Zero decision-time dependencies ● Management APIs for control & observability ○ Bundle service API for sending policy & data to OPA ○ Status service API for receiving status from OPA ○ Log service API for receiving audit log from OPA ● Tooling to build, test, and debug policy ○ opa run, opa test, opa fmt, opa deps, opa check, etc. ○ VS Code plugin, Tracing, Profiling, etc.
  • 35.
    Hands on examplewith OPA Accounts OPA Bundle Server Policy + Tickets GET /accounts/alice HTTP/1.1 Authorization: Bearer ... Input { method: GET path: [accounts, alice] subject: { user: janet groups: [support] } } Result true (allow) or false (deny) Data { tickets: { alice: [ {assignee: janet}, {assignee: bob} ], ken: [ {assignee: janet} ] } }
  • 36.
    Hands on examplewith OPA Input { method: GET path: [accounts, alice] subject: { user: janet groups: [support] } } Data { tickets: { alice: [ {assignee: janet} ] } } package acmecorp.authz default allow = false allow = true { input.method = "GET" input.path = ["accounts", id] input.subject.groups[_] = "support" input.subject.user = data.tickets[id][_].assignee }
  • 37.
    Example Policy++ "Tech supportspecialists should only perform device updates during core business hours (10AM to 3PM GMT)."
  • 38.
    allow { input.action ="UpdateDevice" input.subject.groups[_] = "support" inside_business_hours } inside_business_hours { [hour, minute, second] = time.clock(time.now_ns()) time_of_day = (hour * seconds_per_hour) + (minute * seconds_per_minute) + second time_of_day >= 10 * seconds_per_hour time_of_day <= 15 * seconds_per_hour } seconds_per_hour = 60 * 60 seconds_per_minute = 60 Example Policy++
  • 39.
    For example: ● valuabledata on PVs must be retained 6 months after app is undeployed ● default reclaim policy is "delete" because clean-up is annoying ● admins want to grant exceptions for specific apps to use "retain" policy Beyond the app
  • 40.
    Kubernetes Admission Policy packagekubernetes.admission import data.kubernetes.storageclasses reclaim_exceptions = {"payments", "clickstream"} # Generate an admission control violation error if... deny["invalid reclaim policy requested by app"] { # Input object is a PVC and ... input.kind = "PersistentVolumeClaim" # StorageClass specifies the "Retain" reclaim policy and... name = input.spec.storageClassName storageclasses[name].reclaimPolicy = "Retain" # The app that owns the PVC is not whitelisted. not reclaim_exceptions[input.labels["app"]] }