2. Outline
• Cloud Validation and Testing: Why?
• Timeline & Context
• TestSuite & Scientific workload deployments
• EOSC Roadmap & Benefits for the Community
3. Cloud Validation & Testing: Why?
• Cloud commodity services? Perhaps, but they still need validation
for research use cases across multiple domains
• Which types of GPU do you provide? Do you provide quantum hardware?
• Which libraries do you offer for ML? Do you offer fast interconnects?
• At what scale do you offer resources? In which regions are they available? At
what speed, and over which path, do you interconnect to the GÉANT network?
• Several research domains, several cloud platforms available
• Running tests manually is not feasible
• Validation needs to scale in type, cloud stack and number of services
5. CERN contribution - OCRE Test Suite
• Automated
○ Deployment with open technologies: Ansible, Terraform, Docker, K8s
• Heterogeneous
○ Containerised tests are deployed with all their dependencies, providing researchers with working examples of how to deploy applications
• Central Repository
○ Collects tests and validation results in a structured manner (CERN S3 service)
• Results Dashboard
○ Gives organisations an overview of the results; the dashboard consumes JSON files from the CERN S3 bucket
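As a rough illustration of how a dashboard could build an overview from JSON result files, the Python sketch below aggregates a few in-memory records. The field names (`provider`, `test`, `status`) and the provider names are hypothetical and are not taken from the Test Suite; the real result schema is described in its documentation, and fetching the files from the S3 bucket is left out here.

```python
import json
from collections import defaultdict

# Hypothetical result documents, standing in for JSON files read from the bucket.
SAMPLE_RESULTS = [
    '{"provider": "provider-a", "test": "perfsonar", "status": "passed"}',
    '{"provider": "provider-a", "test": "cpu-benchmark", "status": "failed"}',
    '{"provider": "provider-b", "test": "perfsonar", "status": "passed"}',
]


def summarise(raw_documents):
    """Count test outcomes per provider from JSON result documents."""
    overview = defaultdict(lambda: defaultdict(int))
    for raw in raw_documents:
        record = json.loads(raw)
        overview[record["provider"]][record["status"]] += 1
    # Convert the nested defaultdicts to plain dicts for printing or serialising.
    return {provider: dict(counts) for provider, counts in overview.items()}


if __name__ == "__main__":
    for provider, counts in sorted(summarise(SAMPLE_RESULTS).items()):
        print(provider, counts)
```

Keeping the aggregation separate from storage access means the same function could be fed from local files during development and from the bucket in production.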
6. Process to include tests
• Discussion with developers to include the use case
• Assessment of the work to be done and of the requirements to include
• Collection of information
○ Documentation, contact person and applicable license
• Licensing established
○ Test Suite: developed by CERN, FOSS under AGPL
○ Test license: responsibility of the test owners
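The information collected for each contributed test can be pictured as a small metadata record. The Python sketch below is purely illustrative: the `ContributedTest` class, its field names and the example values are assumptions, not part of the Test Suite.

```python
from dataclasses import dataclass, fields


@dataclass
class ContributedTest:
    """Hypothetical record of the information gathered before a test is included."""
    name: str
    documentation_url: str
    contact_person: str
    license: str  # chosen by the test owner, separately from the Test Suite's AGPL


def missing_fields(meta):
    """Return the names of fields that are still empty."""
    return [f.name for f in fields(meta) if not getattr(meta, f.name).strip()]


if __name__ == "__main__":
    # Example values are placeholders only.
    draft = ContributedTest("example-test", "https://example.org/docs", "", "")
    print(missing_fields(draft))
```

A check of this kind could flag incomplete submissions before the licensing step.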
7. OCRE - Deployments
Each entry lists: Benchmark / Test, Description and details, Run on, Domain covered, Contributor.
Data Repatriation
• Description and details: Simple Data Repatriation: exporting data from the public cloud to Zenodo.
• Run on: Single-node, 8 cores and ~30GB memory.
• Domain covered: Accessibility and Network Connectivity
• Contributor: CERN
DODAS
• Description and details: Validation of generation of public cloud clusters on-demand for batch workload execution.
• Run on: Single-node, 8 cores and ~30GB memory.
• Domain covered: Compute (CPU)
• Contributor: INFN
HEP CPU Benchmarks
• Description and details: CPU benchmarking based on reduced versions of several real-world physics workloads. Default configurations.
• Run on: Single-node, 8 cores and ~30GB memory.
• Domain covered: Compute (CPU)
• Contributor: CERN
Networking performance tests with perfSONAR
• Description and details: End to end network measurements using perfSONAR.
○ Buyer-side endpoint was pse01-gva.cern.ch (used IPv4)
○ Latency measured with ping
○ Trace measured with traceroute
○ Throughput measured with iperf3, single stream
• Run on: Provider-side endpoint ran on an 8-core and ~30GB VM.
• Domain covered: Networking and Connectivity
• Contributor: ESnet, GÉANT, Indiana U., Internet2, U. of Michigan, RNP, CERN
Single node GAN training (ProGAN)
• Description and details: Satellite image analysis and generation using Progressive Growing GANs. Configurations:
○ images_amount: 100
○ kimg: 300
• Run on: Single-node cluster with 1 NVIDIA V100 card.
• Domain covered: Compute (GPU) and machine learning services
• Contributor: UNOSAT & CERN Openlab
Distributed GAN training (NNLO)
• Description and details: Distributed training of Generative Adversarial Networks. Configurations:
○ epochs: 10
○ benchmark: nnlo
○ datasetSize: 30
• Run on: Six-node cluster with 1 NVIDIA V100 per node, hence totalling 6 GPUs.
• Domain covered: Compute (GPU), machine learning services and distributed computing
• Contributor: CERN Openlab
COSBench and S3 validation
• Description and details: Cloud Object Storage Benchmarking and testing. Default configurations.
• Run on: No VM utilised in this case, COSBench’s source server ran on the CERN Openstack cloud.
• Domain covered: Connectivity, storage, and APIs
• Contributor: Intel
Additional Information: https://eosc-testsuite.readthedocs.io/en/latest/testsCatalog.html
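For the single-stream throughput measurement, iperf3 can emit its report as JSON (`--json`). The Python sketch below shows one way such a report could be reduced to a single figure. The sample report is a made-up excerpt with invented values, and the helper name is hypothetical; the Test Suite's own post-processing is not described here.

```python
import json

# Made-up excerpt of an `iperf3 --json` TCP report; only the fields used below are kept.
SAMPLE_OUTPUT = """
{
  "end": {
    "sum_sent": {"seconds": 10.0, "bits_per_second": 9400000000.0},
    "sum_received": {"seconds": 10.0, "bits_per_second": 9300000000.0}
  }
}
"""


def received_gbits_per_second(raw_json):
    """Return receiver-side throughput in Gbit/s from an iperf3 JSON report."""
    report = json.loads(raw_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


if __name__ == "__main__":
    print(f"{received_gbits_per_second(SAMPLE_OUTPUT):.2f} Gbit/s")
```

The receiver-side figure is used because it reflects the data that actually arrived, which is usually the more conservative number.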
9. Validation categories defined in OCRE
Category 1
The cloud provider offers an extended range of mature services, with integration of software, hardware and service options, that can cover a wide range of research applications at scale. This includes, for example, Machine Learning, HPC,
and even Quantum Computing.
User interfaces are offered in different modes (Console, API, CLI), and are straightforward and intuitive for users ranging
from beginner to advanced.
Category 2
The cloud provider offers a range of mature services, with a good level of integration of software and hardware; in some
cases, supporting service options that can cover several research applications including, for example, Machine Learning.
User interfaces are offered in different modes (Console, API, CLI), and their configuration is straightforward and intuitive for
users ranging from beginner to advanced.
Category 3
The cloud provider offers a limited number of services, with very little integration of software and hardware.
User interfaces in different modes (Console, API, CLI) are not always present and require users to be familiar with aspects
of cloud architectures and resource-provisioning methods.
Category 4
The cloud provider offers a very limited number of services where only a small number of applications can be deployed with
virtually no integration across the hardware and software stacks.
User interfaces in different modes (Console, API, CLI) are not always present and require users to be familiar with aspects
of cloud architectures and resource-provisioning methods.
Category 5
The cloud provider offers a customised limited type of service where only specific applications can be deployed at a limited
scale.
Criteria: ease of access, service maturity, scale, integration of h/w & s/w stacks
10. Test Suite: Benefits for the community
Working examples of technical deployments
○ Raising awareness of cloud technology/interfaces/costing/optimization
○ Build skills to make informed choices about the best cloud solution to solve a given research problem
Accumulated practical technical experience with 20+ cloud providers
○ Ability to quickly run small samples of real representative workloads
○ Documented recommendations and guidance, based on the experience
○ Lessons learned from the testing and validation activity: types of resources supported, software stacks, network connectivity, etc.
Test Suite framework ready to use for cloud procurements
○ Adapted to be used during a market survey of a procurement exercise
○ Cloud offers are commodity, but vary in capabilities and technological implementation
○ Test Suite exposes those differences to research organisations for their benefit
11. References
● Repository
○ https://github.com/cern-it-efp/EOSC-Testsuite
● Documentation
○ https://eosc-testsuite.readthedocs.io/en/latest/
○ OCRE Deliverable D4.2 - Lessons learned, recommendations and guidance for
research organisations
● Recorded Demos of the Test Suite
○ EGI conference 2021
o https://www.youtube.com/watch?v=KENk4KnFmhs
○ EGI conference 2020
o https://www.youtube.com/watch?v=ZznFp9IlGR0
12. EOSC Service Operation Model Proposal
EOSC Compliance Testing
• Neutral validation and testing of cloud provider capabilities, providing working examples of deployments.
• To be used either before a procurement action, feeding realistic information into the tender exercise, or during contract execution, for effective contract monitoring.
• Monitoring of cloud providers' regions, to verify the use of EC member states for data sovereignty reasons.
• The EOSC Test Suite interface shall follow the EOSC branding.
Support Roles
• Breakdown of responsibilities between the Test Suite team and cloud vendors:
○ Test runs are executed by the Test Suite team, end researchers, or cloud vendor architects.
○ Researchers can access validation results for all platforms; each service provider can access only its own results.
○ Cloud service providers offer direct support lines covering multiple time zones and are contractually responsible for ensuring that services remain available to be validated, manageable and secure.
Documentation & Code Repositories
• Documentation is reviewed to allow the tool to be deployed independently.
• Extensive documentation is maintained in reStructuredText (RST); its source is available in the Test Suite’s GitHub repository.
• The documentation source compiles into a set of HTML files, which are available online.
• As an open-source project, the repositories must remain accessible, with README files kept up to date, branches managed correctly, and issues and pull requests monitored.
Development & Maintenance Effort
• 1.5 FTE with a software development or similar background
EOSC Procurement recently announced (35M): https://ted.europa.eu/udl?uri=TED:NOTICE:234660-2022:TEXT:EN:HTML