At the Italian National Institute for Nuclear Physics (INFN), an effort is under way to leverage modern cloud-native paradigms to build the scientific analysis infrastructure of the future. The talk focuses on the adopted storage platform, which is based on MinIO with a fine-grained authorization model obtained by combining the AWS STS authentication flow with a native integration with Open Policy Agent. Moreover, a set of tools has been developed to let users access data in different modes, ranging from the canonical S3 APIs to a POSIX-like experience.
stackconf 2021 | Setup MinIO and Open Policy Agent for a multi-purpose scientific platform
1. Setup MinIO and Open Policy Agent for
a multi-purpose scientific platform
D. Ciangottini, INFN
2. stackconf-2021 $ > whoami
● IT Researcher at Istituto Nazionale di Fisica
Nucleare (INFN)
○ Translated: National Institute of Nuclear Physics
● Involved in R&D activities to deploy cloud-native
solutions for the next generation of data-analysis
infrastructure for INFN/LHC users
7. Computing @INFN
A long tradition of supporting experiments.
For the last 10 years this has mostly meant supporting the LHC
communities.
Recently, it has been quickly widening to many other use cases.
8. On-demand computing resources
for the INFN communities
● Easy access to on-demand solutions for scientific data analysis
● Composable services to extend and customize the environment
● Provide INFN users with a set of core tools centrally managed
○ E.g. JupyterHub-aaS, object storage, sync&share, ...
● Federating the resources from several centers at national level
● Becoming the hub of reference for most of the activities and projects @ INFN
The INFN-Cloud initiative
9. The INFN-Cloud infrastructure
A backbone composed of the main
computing centers for central services
+ a federation of smaller sites
providing resources for user deployments
10. Computing challenges
Data storage for multiple communities
Providing a cloud storage hosted on the backbone infrastructure means:
● Geo-distributed storage federation
● Heterogeneous set of requirements
○ Object size (from a few MBs to tens of GBs)
○ Workflow (imaging, columnar analysis, ...) and data access (POSIX, WebDAV, S3, etc.)
but it also means providing the tools for:
● “F.A.I.R.” data
○ Findable, Accessible, Interoperable, Reusable
○ Make it intuitive, or ideally transparent, for the end user
● Focus on the “R.”! Allow sustainable reuse of data
11. Wrapping up...
Requirements
● Dynamic user registration/ACLs integrated with Indigo-IAM/OIDC
● Fine-grained authZ (read-only, read-write, per file / per user group)
● Easy and robust ops
○ gitOps, ideally
● Accessible via POSIX
● WebUI access
● Vendor neutral
● Open source
12. A quick look at the solution
The components
● MinIO has been chosen as the cloud storage solution
○ S3 compliance
○ Powerful WebUI
○ Proven scalability
● Native integration with AWS STS credentials
○ External OIDC IdPs (e.g. Indigo IAM)
● Support for customizable authZ policies with OpenPolicyAgent
13. User management
Indigo-IAM
● Authentication via SAML IdPs or identity federations, OpenID Connect providers and X.509 certificates
● Enrollment and registration functionalities
○ so that users can join groups/collaborations according to well-defined flows
○ provides services to manage group membership
○ attribute assignment and account-linking functionality
● Integrable as an IdP with any OIDC-compliant service
14. Cloud storage AuthN:
AWS STS credentials
● An endpoint service that enables clients to request temporary credentials for MinIO resources
● The AWS AssumeRoleWithWebIdentity flow is supported out of the box
○ allowing integration with any OpenID Connect-compatible identity provider ⇒ our IAM service
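Concretely, AssumeRoleWithWebIdentity is a plain HTTP POST of form-encoded parameters against the MinIO endpoint. A minimal sketch with the Python standard library (the endpoint URL and token below are placeholders):

```python
from urllib.parse import urlencode

def build_sts_request(endpoint, web_identity_token, duration=3600):
    """Build the URL and form body of an AssumeRoleWithWebIdentity call.

    `endpoint` is the MinIO server URL; `web_identity_token` is the OIDC
    access token obtained from the IdP (e.g. Indigo IAM).
    """
    params = {
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",            # AWS STS API version string
        "WebIdentityToken": web_identity_token,
        "DurationSeconds": str(duration),   # lifetime of the temporary credentials
    }
    return endpoint, urlencode(params)

url, body = build_sts_request("https://minio.example.infn.it", "<access-token>")
# POSTing `body` to `url` returns temporary credentials
# (AccessKeyId / SecretAccessKey / SessionToken) for subsequent S3 calls.
```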
15. Cloud storage AuthZ:
OpenPolicyAgent integration
● A lightweight general-purpose policy engine that can be co-located with the MinIO server
● The OPA HTTP API is used to authorize MinIO STS credentials
○ Fine-grained ACLs
■ Every token claim from authN can be selected for policy checking
○ Dynamic configuration
○ Decoupled from the storage configuration
16. Example of an e2e AuthZ flow
[Diagram: a JWT-authenticated request reaches the MinIO API; the OPA server checks its custom policies against the input and, on a match, the request is authorized. The list of operation permissions is defined on OPA.]
Policy example
# Allow users to manage their own data.
allow {
    username := split(lower(input.claims.email), "@")[0]
    input.bucket == username
    input.claims.aud == "minio-auth"
    permissions := rl_permissions["user"]
    p := permissions[_]
    p == {"action": input.action}
}
MinIO STS auth data (the claims passed to OPA)
"claims": {
    "accessKey": "VP43M6DO1N53U2LUBTZ3",
    "aud": "https://wlcg.cern.ch/jwt/v1/any",
    "client_id": "5c38c020-b753-4115-a5f4-3f48595e4c1b",
    "exp": "1621714730",
    "iat": 1621713801,
    "iss": "https://login.cloud.infn.it",
    "scope": "openid profile email",
    "email": "ciangottini@infn.it"
}
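To make the decision logic explicit, here is the same rule sketched in plain Python (the permission table and claim values are hypothetical stand-ins for the data shipped to OPA):

```python
# Hypothetical stand-in for the rl_permissions document loaded into OPA.
RL_PERMISSIONS = {
    "user": [{"action": "s3:GetObject"}, {"action": "s3:PutObject"}],
}

def allow(input_doc):
    """Mirror of the Rego rule: users may act only on their own bucket."""
    claims = input_doc["claims"]
    # Derive the username from the local part of the email claim.
    username = claims["email"].lower().split("@")[0]
    return (
        input_doc["bucket"] == username
        and claims["aud"] == "minio-auth"
        and {"action": input_doc["action"]} in RL_PERMISSIONS["user"]
    )

request = {
    "bucket": "ciangottini",
    "action": "s3:GetObject",
    "claims": {"email": "ciangottini@infn.it", "aud": "minio-auth"},
}
print(allow(request))  # True: the bucket matches the local part of the email
```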
17. Managing policies with OPA bundles
the gitOps way
● OPA can periodically download bundles of policies and data from remote HTTP servers
○ allowing for gitOps-based policy management
● Policies and data are loaded on the fly, without requiring a restart of OPA
○ they are applied immediately
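As a sketch, the corresponding OPA configuration could look like the following (service name, URL and bundle path are placeholders; the real endpoints would be those of the central bundle repo):

```yaml
services:
  gitops:
    url: https://bundles.example.infn.it   # HTTP server fed by CI from a git repo

bundles:
  authz:
    service: gitops
    resource: bundles/minio-authz.tar.gz   # tarball with .rego policies + data
    polling:
      min_delay_seconds: 60                # poll interval for hot reloads
      max_delay_seconds: 120
```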
18. So far so good…
Let’s get our hands on some user tools now!
19. Managing temporary credentials:
OIDC-agent
● A set of tools to manage OpenID Connect access tokens and make them easily usable from the command line
○ follows the ssh-agent design, so users can handle OIDC tokens much as they do SSH keys
● Secures sensitive information (long-lived credentials) while exposing only short-lived ones (e.g. the access token)
● Integrable via API libraries for Python, Go and C++
20. POSIX access:
RClone + OIDC-Agent integration
To provide POSIX access we make use of RClone’s mount capability.
A small patch has been applied to add a dedicated S3 provider integrated with OIDC-Agent.
Once oidc-agent is configured on their VM, users can mount their own bucket as a folder with no further actions or authentication steps.
Backups via Restic are enabled through this patched version of RClone.
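For reference, a minimal sketch of a standard rclone S3 remote for MinIO (remote name and endpoint are placeholders; in the patched provider the keys are refreshed through oidc-agent rather than stored statically):

```ini
[infn-minio]
type = s3
provider = Minio
endpoint = https://minio.example.infn.it
```

The bucket is then mounted as a folder with `rclone mount infn-minio:<bucket> ~/bucket`.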
21. Make it easier:
STS-wire
For cases where the user does not/cannot run oidc-agent:
● a tool has been created to manage both credential renewal and the rclone mount in a guided/integrated/opinionated way
We found it to be the preferred solution for mounting a bucket’s content on a laptop, for instance.
22. What about python?
boto3+STS+OIDC-Agent = boto3STS
Access a MinIO bucket through the integration of the boto library with temporary credentials:
- AWS STS token via IAM
- IAM access token obtained via the oidc-agent API
Instantiate an S3 session with a single line of code.
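As an illustration of the glue involved (stdlib only; boto3STS’s actual API may differ), the temporary STS credentials map onto the keyword arguments of a boto3 S3 client like this:

```python
def sts_to_boto3_kwargs(credentials, endpoint):
    """Map the Credentials block of an STS response to boto3 client kwargs.

    `credentials` comes from AssumeRoleWithWebIdentity; `endpoint` is the
    MinIO URL (a placeholder in the example below).
    """
    return {
        "service_name": "s3",
        "endpoint_url": endpoint,
        "aws_access_key_id": credentials["AccessKeyId"],
        "aws_secret_access_key": credentials["SecretAccessKey"],
        "aws_session_token": credentials["SessionToken"],  # expiring token
    }

kwargs = sts_to_boto3_kwargs(
    {"AccessKeyId": "AK", "SecretAccessKey": "SK", "SessionToken": "ST"},
    "https://minio.example.infn.it",
)
# With boto3 installed, the single line would be: s3 = boto3.client(**kwargs)
```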
24. Deployment models
● Generic centrally maintained service for each INFN user
○ HA K8s cluster on infrastructure backbone
○ FluxCD for gitOps operations
○ Central repo for OPA bundles
● On-demand cloud storage
○ Deploy the solution for a dedicated experiment/group of people
○ An on-prem or public-cloud k8s instance as the ONLY requirement
○ Helm chart configurable via WebUI thanks to Kubeapps
26. Self-managed k8s
- Ansible to bring up Kubeapps pointing to the supported INFN Helm charts
- A catalogue of pre-configured apps is already included
- Minio-Operator to deploy a MinIO Tenant with STS credentials and an OPA server
- Custom OPA bundle endpoints and other similar configurations can be specified
27. Wrapping up:
Summary and plans
● In production, supporting physics and beyond
○ e.g. pandemic-related research (P.L.A.N.E.T.)
R&D continues toward:
● testing/scaling multi-cloud
● improving the tools dedicated to data access and reuse
● trying out MinIO gateway cache instances to reduce latency