How to encourage software engineers to adhere to* standards
Tokarev Alexander
* But not limited to
Who am I
• Head of the Sberbank Cloud Center of Excellence
• AWS Certified Solutions Architect
• Many completed AWS and GCP projects
Agenda
• Standards in software development
• Open Policy Agent
• Rego
• Tools
• OPA for policy based control
• OPA use cases
• Q&A
Current state
• 100 OpenShift applications in production
• Monthly releases
• Microservices
Result
• Reinventing the wheel
• Overloaded security software
Solution
• Create requirements and recommendations
• Create containerization standards
• Create cloud best practices
Risks
• Manual compliance checks
• Software delivery pace is slow
Compliance maturity levels
1. Base validation
2. Version control integration
3. Reusable validations
4. Automatic validation
5. Validation-based mutations
6. Advanced validations
Kubetest
https://github.com/garethr/kubetest
Kyverno
https://github.com/nirmata/kyverno
https://kyverno.io/
1. CRD is too verbose
2. Tied to K8S
3. There is no DSL
4. All validations are regexps
5. No debug option
Kyverno
CPU and memory limit validation
Sonar custom rules
A lot of stuff!
And don’t forget to compile!
What is OPA
• Policy engine
• Written in Go
• In-memory speed
• Can validate anything
• Declarative Rego language
• SQL-like
• External data processing
• Perfect integration
• Customizable response format
Simplest policy “Limits are populated”
3 lines!
Get all K8S containers
“For each” validation
Prepare output
Package name
Rule name
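The Rego source from this slide did not survive extraction. As a rough illustration only, the three steps named above (get all containers, check each one, prepare the output) can be sketched in Python. The function name and the Deployment-style manifest layout are assumptions made for this sketch, not the policy shown in the talk.

```python
def containers_missing_limits(manifest):
    """Return the names of containers that declare no resource limits."""
    # Step 1: get all K8S containers (assumes a Deployment-style manifest).
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    containers = pod_spec.get("containers", [])
    # Step 2 and 3: "for each" validation, then prepare the output list.
    return [
        c.get("name", "<unnamed>")
        for c in containers
        if not c.get("resources", {}).get("limits")
    ]


# Hypothetical input used only to exercise the sketch.
deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "web", "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
        {"name": "sidecar", "resources": {}},
    ]}}},
}

print(containers_missing_limits(deployment))
```

An empty result means the manifest passes; any returned name is a violation to report.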
K8S advanced policy “Readiness probe exists”
Another type of probes
Only deployments should be checked
All attributes are populated
“For each” validation
Prepare output
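The slide's Rego is not available here either, so the following Python sketch only mirrors the callouts: skip anything that is not a Deployment, require a readiness probe on each container, and require that its attributes are populated. Which attributes the original policy required is not stated; `REQUIRED_PROBE_FIELDS` is a placeholder assumption.

```python
# Assumption: the attributes the policy insists on; the talk does not list them.
REQUIRED_PROBE_FIELDS = ("initialDelaySeconds", "periodSeconds")


def readiness_probe_violations(manifest):
    """Return human-readable violations for a single K8S manifest."""
    # Only deployments should be checked.
    if manifest.get("kind") != "Deployment":
        return []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    violations = []
    # "For each" validation over every container.
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        probe = container.get("readinessProbe")
        if not probe:
            violations.append(f"{name}: readinessProbe is missing")
            continue
        missing = [f for f in REQUIRED_PROBE_FIELDS if f not in probe]
        if missing:
            violations.append(f"{name}: readinessProbe lacks {', '.join(missing)}")
    return violations


# Hypothetical manifests used only to exercise the sketch.
deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "web", "readinessProbe": {
            "httpGet": {"path": "/health", "port": 8080},
            "initialDelaySeconds": 5, "periodSeconds": 10}},
        {"name": "worker"},
        {"name": "cache", "readinessProbe": {"tcpSocket": {"port": 6379}}},
    ]}}},
}
service = {"kind": "Service", "spec": {}}

print(readiness_probe_violations(deployment))
```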
What else
• Maven
• NPM
• Terraform
• ER diagrams – Power Designer XML inside
JSON only!
Sberbank OPA
+ =
Any configuration:
K8S YAML
Pom.xml
.properties
.ini
…………………………………..
+ UI
Plugin system
Sberbank OPA
Policy example
Policy example
Required library exists
Policy
• Check permitted image list:
fluentbit2, envoy3, nginx:1.7.9
• Output prohibited image list
Examples
Invalid
Valid
Not related to the search condition at all
Let’s try this one! We should trust AWS!
Policy implementation example
19 lines!!!
WTF?!
Object comprehension:
Translates the array as-is into an object
key
Condition
Policy implementation example “Sberbank naive”
13 lines!
Still too complicated!
Object!!!
Policy implementation example “Sberbank native”
The best!
7 lines!!!
Get all K8S containers
All containers not in the permitted list
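The Rego versions compared on these slides were not captured. As a hedged illustration of the logic only, the permitted-image policy (allow fluentbit2, envoy3, nginx:1.7.9 and output everything else) might look like this in Python. The function name and manifest layout are assumptions for the sketch.

```python
# Permitted image list taken from the "Policy" slide.
PERMITTED_IMAGES = {"fluentbit2", "envoy3", "nginx:1.7.9"}


def prohibited_images(manifest):
    """Return the sorted, de-duplicated images that are not permitted."""
    # Get all K8S containers (assumes a Deployment-style manifest).
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    images = [c["image"] for c in pod_spec.get("containers", []) if "image" in c]
    # Keep only the containers whose image is outside the permitted set.
    return sorted({image for image in images if image not in PERMITTED_IMAGES})


# Hypothetical input used only to exercise the sketch.
deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "web", "image": "nginx:1.7.9"},
        {"name": "proxy", "image": "envoy3"},
        {"name": "cache", "image": "redis:6"},
    ]}}},
}

print(prohibited_images(deployment))
```

Holding the permitted list in a set rather than a list keeps each membership check constant-time, which is the same idea as preferring objects over arrays in policy data.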
Tools
https://play.openpolicyagent.org/
Tools
That’s it
https://github.com/tsandall/vscode-opa
1. Syntax check
2. Highlighting
3. Evaluation
4. Trace
5. Profile
6. Unit tests
OPA integration modes
• Push – OPA API
1. curl -X PUT http://localhost:8181/v1/policies/checks --data-binary @check_packages.rego
2. curl -X PUT http://localhost:8181/v1/data/checks/packages --data-binary @permitted_packages.json
• Pull – OPA bundle server
Load data
OPA server
Bundle server
http GET Request
*.gzip
Rego + data
ETag cache header
• Run validation
• Process results via Jenkins, admission controller, UI, whatever…
Validation
curl -X POST http://localhost:8181/v1/data/checks/npm/valid_package --data-binary \
'{ "input": { "private": true, "dependencies": { "clickstream": "^4.6.1" } } }'
An object to check
Package name Rule name
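For callers that are not shell scripts, the same validation call can be sketched with the Python standard library. The URL path and payload come from the curl command above; the helper name `build_validation_request` is made up for this sketch. The object to check is wrapped under `"input"`, and OPA returns the rule value under `"result"`.

```python
import json
import urllib.request


def build_validation_request(package_json, base_url="http://localhost:8181"):
    """Build the POST request that asks OPA to evaluate checks/npm/valid_package."""
    # The object to check goes under the "input" key of the request body.
    body = json.dumps({"input": package_json}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/data/checks/npm/valid_package",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_validation_request(
    {"private": True, "dependencies": {"clickstream": "^4.6.1"}}
)

# Sending requires a running OPA server, so it is left commented out here:
# with urllib.request.urlopen(request) as response:
#     decision = json.load(response).get("result")
```

The path segments after `/v1/data/` are the package name (`checks/npm`) followed by the rule name (`valid_package`).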
Features nobody uses
• Unit tests
• Tracing, profiling and benchmarking
• Conditional evaluation
• HTTP requests
• JWT
• Nobody explains what the features are intended for
• No chance of finding them with Google
Search result
Proper trace
Found by accident!
CNCF project
It’s alive
Conftest
• OPA-based tool to check config files
• Data format conversion plugins
• Plugins are auto-applied based on file extension
Conftest
Conftest
+
• Data conversion plugins: YAML, INI, TOML, HOCON, HCL, HCL1, CUE, Dockerfile, EDN, VCL, XML
• Full of Rego samples
-
• Plugins are written in Go
• Validation rules are stored in Docker registry
• Not stable
• Single-threaded
• Command-line tool
Gatekeeper
1. Admission controller
2. Mutating admission controller
Examples
Pros
• Validation reuse through templates
• Huge library of examples
• Tight integration with K8S
Cons
• K8S cluster is required
• No unit tests for templates and constraints
• Rather verbose CRD
• UI is absent
• No option to invoke external services
• No option to use a bundle server
• Mutating admission controller is still under development
Solution architecture
Diagram labels: Gatekeeper; Open Policy Agent; bundle server; BitBucket "Governance as a code" repository; BitBucket "Software" repository; BitBucket "Whatever we need" repository; Jenkins; K8S; Governance as a code UI (Sber-made); policies; unit tests; templates; constraints; rules metadata; K8S config; Java configs; CI/CD artifacts; any artifact; tar; recommended validations + non-K8S validations; mandatory validations
Results
1. 24 mandatory rules
2. 12 recommended rules
3. 120 rules are planned
4. 10 seconds – 1 project validation
“Open source or not open source, that is the question”
Goldman Sachs
• 12 shared clusters on VMs
• 150 namespaces per cluster
• 1800 namespaces total
Inventory
Per-namespace management:
1. Security inventory: Roles, RoleBindings, ClusterRoles,
ClusterRoleBindings
2. Capacity inventory: CPU and memory for ResourceQuotas and
LimitRanges
3. NFS inventory: Persistent volumes and Persistent volume claims
Solution design
• Pull-mode for inventory objects using bundle server
• K8S objects are created based on OPA validation output
• Custom-built mutating admission controller
Monitoring
1. Goroutine and thread counts
2. Memory in use (stack vs heap)
3. Memory allocated (stack vs heap)
4. GC stats
5. Pointer lookup count
6. Roundtrip time by http method
7. Percentage of requests under 500ms, 200ms, 50ms
8. Mean API request latency
9. Recommendations for alerting
10. Number of OPA instances up at any given time
11. OPA responding under 200ms for 95% of requests
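The last alerting target ("under 200ms for 95% of requests") can be checked from raw latency samples. This is a minimal sketch with made-up names and sample data; how Goldman Sachs actually computes the metric is not described in the slides.

```python
def meets_latency_slo(latencies_ms, threshold_ms=200.0, required_percent=95):
    """True when at least required_percent of requests finished under threshold_ms."""
    if not latencies_ms:
        # Assumption: no traffic means nothing has violated the target.
        return True
    fast = sum(1 for latency in latencies_ms if latency < threshold_ms)
    # Integer comparison avoids floating-point edge cases at exactly 95%.
    return fast * 100 >= required_percent * len(latencies_ms)


# Hypothetical samples: 19 fast requests and 1 slow one (exactly 95% fast).
samples_ok = [20.0] * 19 + [450.0]
# Hypothetical samples: 18 fast requests and 2 slow ones (90% fast).
samples_bad = [20.0] * 18 + [450.0, 900.0]

print(meets_latency_slo(samples_ok), meets_latency_slo(samples_bad))
```

The same function covers the "percentage of requests under 500ms, 200ms, 50ms" item by varying `threshold_ms`.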
Miguel.Uzcategui@ny.email.gs.com @tlhinrichs
Provisioning
Diagram labels: Controller Manager; Policy (Rego); GS Inventory; Provision; Request data; Kube-mgmt; GIT; Cluster state; Bundle server; Notify changes
Results
• 24 validations
• 1 MB of security reference data – 3500 rules
• 2 MB of PV, CPU and memory quotas – 8000 rules
• Cluster reaches its target state 2-5 minutes after any change
Fugue
• Governance as a code as a service
• Multi-cloud validation - Amazon, Azure, GCP
• Implemented as a mutating admission controller
• Rollback process for erroneous configuration
• Presets for PCI DSS, HIPAA, SOC
• OPA-based engine
• Self-made Rego interpreter
• https://www.fugue.co/
• https://github.com/fugue/fregot
• https://github.com/fugue/custom-rules
Fugue
RDS HA rule
Elastic web UI limitation rule
Fregot
• Simplified debug
• Breakpoints
• Watch variables
• Extended troubleshooting
What should I do?!
Validation toolkit
1. Base validations
2. Version control integration
3. Reusable validations
4. Automatic validation
5. Validation-based mutations
6. Advanced validations
OPA use cases
• Any structured data validation
• Mutating validated data
• Authorization
• Database row level security – SQL databases, ElasticSearch
• And games: https://medium.com/@KevinHoffman/corrupting-the-open-policy-agent-to-run-my-game-711f340adb5a
Pinterest
1. Zero-trust security
2. OPA-based authorization
3. K8S + VM
4. Kafka
5. Envoy
6. 4.1M QPS avg
7. 8.5M QPS peak
8. Authorization result cache – 5 min TTL
9. 204K QPS – OPA
10. 437K QPS peak – OPA
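The gap between total traffic and the QPS that reaches OPA is explained by the authorization result cache with a 5-minute TTL. A minimal sketch of such a cache follows; the class name, key shape and injectable clock are assumptions for illustration, not Pinterest's implementation.

```python
import time


class TTLCache:
    """Caches authorization decisions and re-evaluates them after ttl_seconds."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries = {}   # key -> (stored_at, decision)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh cached decision, no policy evaluation
        decision = compute()  # e.g. a call to OPA
        self._entries[key] = (now, decision)
        return decision


# Usage with a fake clock and a stand-in for the OPA call.
fake_now = [0.0]
opa_calls = []


def ask_opa():
    opa_calls.append(1)
    return True


cache = TTLCache(ttl_seconds=300.0, clock=lambda: fake_now[0])
key = ("alice", "GET", "/pins")
cache.get_or_compute(key, ask_opa)   # evaluated
cache.get_or_compute(key, ask_opa)   # served from cache
fake_now[0] = 301.0
cache.get_or_compute(key, ask_opa)   # expired, evaluated again
print(len(opa_calls))
```

The trade-off is that a revoked permission can stay effective for up to one TTL, so the TTL bounds how stale a decision may be.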
Performance
• Network footprint
• OPA used as a library is single-threaded
• Use the OPA server instead – it is multi-threaded
• Memory for data – 20x the raw data size
• Partial evaluation – ms to ns
• Extra memory consumption for partial evaluation cache
• Beware arrays
• Use objects instead
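The "beware arrays, use objects" advice is about lookup cost: finding a value in an array means scanning it, while an object keyed by that value is a direct lookup. The Python sketch below shows the same contrast with a list and a dict. It is an analogy under that assumption, not a measurement of OPA itself.

```python
# Reference data as an array: membership means scanning element by element.
permitted_list = ["fluentbit2", "envoy3", "nginx:1.7.9"]

# The same data reshaped as an object keyed by image name: a direct lookup.
permitted_index = {image: True for image in permitted_list}


def is_permitted_scan(image):
    """Linear scan, cost grows with the size of the reference data."""
    return any(candidate == image for candidate in permitted_list)


def is_permitted_lookup(image):
    """Keyed lookup, cost stays roughly constant as the data grows."""
    return image in permitted_index


print(is_permitted_scan("envoy3"), is_permitted_lookup("envoy3"))
```

Both functions return the same answers. Only the shape of the data changes, which is why reshaping reference data before loading it is usually the cheaper fix.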
OPA as a sidecar
Policy delivery
Service
Sidecar
Container
S3
ZooKeeper
Bitbucket
K8S cluster
Commit hook
OPA authorization
Gloo enterprise edition only!
Envoy
1. L3/4/7 proxy
2. C++
3. Filter chains
4. HTTP/2 support
5. Dynamic configuration update
6. Cloud native patterns implementation
1. Service mesh
2. Envoy fleet
3. Configuration distribution
4. Good CRD
5. Observability and security
OPA authorization
Penetration testing
Conclusions
The results of this Cure53 security assessment of the OPA compound are positive. Having said that, Cure53 specifically finds that the provided examples of implementations were very minimal and straightforward, which resulted in a small attack surface and absence of security-relevant issues
OPA is perfect for authorization purposes!
Penetration testing
Identified Vulnerabilities
OPA-01-001 Server: Insecure Default Config allows to bypass Policies (Medium)
OPA-01-005 Server: OPA Query Interface is vulnerable to XSS (High)
Miscellaneous Issues
OPA-01-002 Server: Query Interface can be abused for SSRF (Medium)
OPA-01-003 Server: Unintended Behavior due to unclear Documentation (Medium)
OPA-01-004 Server: Denial of Service via GZip Bomb in Bundle (Info)
OPA-01-006 Server: Path Mismatching via HTTP Redirects (Info)
Penetration testing
What is more, the shared documentation was unclear and misleading at times (see OPA-01-001), so that arriving at a secure configuration and integration would require a user to have an extensive and nearly-internal-level of knowledge. As people normally cannot be expected “to know what to look for”, this poses a risk of insecure configurations.
Conclusion
• Extremely hard to find proper code samples
• Even though documentation is huge
• Active development phase
• A lot of use cases not limited to configuration checks
• Safety confirmed by penetration tests
• UI is absent
• Native language is perfect for expressing policies
• Try to learn Rego
• Fregot is perfect for debugging
Q&A
We are hiring!
Mail: shtock@mail.ru
Socials: https://www.linkedin.com/in/alexander-tokarev-14bab230/

Open Policy Agent for governance as a code
