SlideShare a Scribd company logo
1 of 17
Download to read offline
DAOS Adventures at CERN openlab
Miguel F. Medeiros on behalf of openlab team
miguel.fontes.medeiros@cern.ch
19/11/2020
DAOS User Group 2020 (DUG20)
• CERN’s IT branch for cutting-edge computing technologies.
• Active collaboration with Intel on several projects: https://openlab.cern/members/intel
CERN openlab
Miguel F. Medeiros | DAOS Adventures at CERN openlab 2
https://openlab.cern/
“CERN openlab is a unique public-private partnership that works to accelerate the
development of cutting-edge ICT solutions for the worldwide LHC community and wider
scientific research. Through CERN openlab, CERN collaborates with leading ICT companies
and research institutes.”
19/11/2020
• Work performed under CERN’s openlab umbrella
• CERN openlab comprises many projects and collaborations which we provide technical support.
• The specific openlab DAOS use-cases will not be covered on this talk.
• This talk will only focus on the sysadmin/technical aspects
• We will present our experience and process on commissioning, testing and benchmarking DAOS.
• We will try to provide insights, findings and hopefully valuable feedback for DAOS developers.
Disclaimer
Miguel F. Medeiros | DAOS Adventures at CERN openlab 319/11/2020
Releasing it to our users: the process
Miguel F. Medeiros | DAOS Adventures at CERN openlab 4
• Intel Workshop at CERN on February 2020 (right on time…)
• Benchmark and test it ourselves (with the valuable support of Intel experts)
• Release DAOS with socket configuration
• Allow all users to get acquainted with the system.
• Test the functionality, development and integration aspects.
• Release DAOS with PSM2 configuration
• Exclusive cluster access.
• Allow the users with performance requirements to test their use cases.
19/11/2020
Cluster Hardware
Miguel F. Medeiros | DAOS Adventures at CERN openlab 5
• 4x Cascade Lakes
• 4x SkyLakes (only 2x with DAOS)
19/11/2020
Benchmarking considerations
Miguel F. Medeiros | DAOS Adventures at CERN openlab 6
• Validated the Omnipath cluster with MPI tests & benchmarks
• OSU Micro-Benchmarks, IntelMPI
• Benchmark was based on IOR [1] with DAOS API
• All benchmarks were performed with DAOS v0.9.4
• Topology
• 3x DAOS Servers
• 2x DAOS Clients
[1]: https://github.com/hpc/ior
19/11/2020
Benchmarking with IOR: DAOS Sockets
Miguel F. Medeiros | DAOS Adventures at CERN openlab 7
• Functional tests with Sockets configuration
19/11/2020
Each point represents the average of 20 iterations. Error bars are standard deviation.
Benchmarking with IOR: DAOS PSM2
Miguel F. Medeiros | DAOS Adventures at CERN openlab 8
• We also tested with PSM2 → performance gains
19/11/2020
Each point represents the average of 20 iterations. Error bars are standard deviation.
Benchmarking with IOR: DAOS PSM2 scaling
11/18/2020 Miguel F. Medeiros | DAOS Adventures at CERN openlab 9
• Two node test
Each point represents the average of 20 iterations. Error bars are standard deviation.
Performance limitations
Miguel F. Medeiros | DAOS Adventures at CERN openlab 10
• Finding the missing performance…
• Cascade Lake with half performance.
• We suspect the riser card → HFI card is PCIe 16x but our riser card only provides 8x elec, 8x mech.
19/11/2020
Each point represents the average of 20 iterations. Error bars are standard deviation.
A SysAdmin experience – feedback for developers
Miguel F. Medeiros | DAOS Adventures at CERN openlab 11
DISCLAIMER:
• Please note that all comments provided here are based on a DAOS v0.9.4 experience. If something was improved in
the meantime please ignore its mention.
19/11/2020
A SysAdmin experience – feedback for developers 1
Miguel F. Medeiros | DAOS Adventures at CERN openlab 12
• Installation in our environment was challenging
• Dependency resolution with conflicts → we rely on specific software versions for our internal software.
• Warn users about “sanity/pre-flight” checks before compiling.
• Error reporting
• Difficult to troubleshoot some issues.
• We rely on error messages for troubleshooting and not all errors were mapped on https://daos-
stack.github.io/admin/troubleshooting/#daos-errors
DISCLAIMER: Please note that all comments provided here are based on a DAOS v0.9.4 experience. If something was
improved in the meantime please ignore its mention.
19/11/2020
A SysAdmin experience – feedback for developers 2
Miguel F. Medeiros | DAOS Adventures at CERN openlab 13
• What about a “--detail” option for admins?
NVMe size SCM size Creation date Creator?…
DISCLAIMER: Please note that all comments provided here are based on a DAOS v0.9.4 experience. If something was
improved in the meantime please ignore its mention.
19/11/2020
--detail
Miguel F. Medeiros | DAOS Adventures at CERN openlab 14
• DAOS Servers with constant 30% CPU utilization.
A SysAdmin experience – feedback for developers 3
DISCLAIMER: Please note that all comments provided here are based on a DAOS v0.9.4 experience. If something was
improved in the meantime please ignore its mention.
19/11/2020
Miguel F. Medeiros | DAOS Adventures at CERN openlab 15
• System metrics
• Nice to have these metrics in handy formats (e.g: json).
• Unpack the “message” section.
→ Facilitate the integration on monitoring solutions without extra parsers.
A SysAdmin experience – feedback for developers 4
DISCLAIMER: Please note that all comments provided here are based on a DAOS v0.9.4 experience. If something was
improved in the meantime please ignore its mention.
19/11/2020
Final thoughts
Miguel F. Medeiros | DAOS Adventures at CERN openlab 16
• System commissioning was challenging and interesting!
• Required some debugging in our environment.
• There is still some room for a performance increase
• Issue with riser cards, one HFI card per socket, etc.
• Scalability testing
• We needed more nodes to fully evaluate the system.
• The DAOS server configuration is a plus
• Quite user friendly!
• Works well with configuration management tools (Puppet).
• The difficulty was mostly to understand which settings suited best for our cluster.
• On a personal note, it was a good experience with several learning opportunities.
Thank you Intel for all the support during this Benchmark process!
19/11/2020
home.cern

More Related Content

Similar to DUG'20: 06 - DAOS Adventures at CERN Openlab

Managing ScaleIO as Software on Mesos
Managing ScaleIO as Software on MesosManaging ScaleIO as Software on Mesos
Managing ScaleIO as Software on MesosDavid vonThenen
 
Rakuten Ichiba_Rakuten Technology Conference 2016
Rakuten Ichiba_Rakuten Technology Conference 2016Rakuten Ichiba_Rakuten Technology Conference 2016
Rakuten Ichiba_Rakuten Technology Conference 2016Rakuten Group, Inc.
 
OVS and DPDK - T.F. Herbert, K. Traynor, M. Gray
OVS and DPDK - T.F. Herbert, K. Traynor, M. GrayOVS and DPDK - T.F. Herbert, K. Traynor, M. Gray
OVS and DPDK - T.F. Herbert, K. Traynor, M. Grayharryvanhaaren
 
Docker & aPaaS: Enterprise Innovation and Trends for 2015
Docker & aPaaS: Enterprise Innovation and Trends for 2015Docker & aPaaS: Enterprise Innovation and Trends for 2015
Docker & aPaaS: Enterprise Innovation and Trends for 2015WaveMaker, Inc.
 
A295 nodejs-knowledge-accelerator
A295   nodejs-knowledge-acceleratorA295   nodejs-knowledge-accelerator
A295 nodejs-knowledge-acceleratorMichael Dawson
 
Deploying Containers in Production and at Scale
Deploying Containers in Production and at ScaleDeploying Containers in Production and at Scale
Deploying Containers in Production and at ScaleMesosphere Inc.
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSSteve Wong
 
SDNs: hot topics, evolution & research opportunities
SDNs: hot topics, evolution & research opportunitiesSDNs: hot topics, evolution & research opportunities
SDNs: hot topics, evolution & research opportunitiesDiego Kreutz
 
"Portrait of the developer as The Artist" Lockheed Architect Workshop
"Portrait of the developer as The Artist" Lockheed Architect Workshop"Portrait of the developer as The Artist" Lockheed Architect Workshop
"Portrait of the developer as The Artist" Lockheed Architect WorkshopPatrick Chanezon
 
engage 2015 - - 2015 - Infrastructure Assessment - Analyze, Visualize and Op...
engage 2015 -  - 2015 - Infrastructure Assessment - Analyze, Visualize and Op...engage 2015 -  - 2015 - Infrastructure Assessment - Analyze, Visualize and Op...
engage 2015 - - 2015 - Infrastructure Assessment - Analyze, Visualize and Op...Christoph Adler
 
Docker Enterprise Deployment Planning
Docker Enterprise Deployment PlanningDocker Enterprise Deployment Planning
Docker Enterprise Deployment PlanningStephane Woillez
 
Strategy, planning and governance for enterprise deployments of containers - ...
Strategy, planning and governance for enterprise deployments of containers - ...Strategy, planning and governance for enterprise deployments of containers - ...
Strategy, planning and governance for enterprise deployments of containers - ...The Incredible Automation Day
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications OpenEBS
 
Montreal Kubernetes Meetup: Developer-first workflows (for microservices) on ...
Montreal Kubernetes Meetup: Developer-first workflows (for microservices) on ...Montreal Kubernetes Meetup: Developer-first workflows (for microservices) on ...
Montreal Kubernetes Meetup: Developer-first workflows (for microservices) on ...Ambassador Labs
 
Tips for Installing Cognos Analytics 11.2.1x
Tips for Installing Cognos Analytics 11.2.1xTips for Installing Cognos Analytics 11.2.1x
Tips for Installing Cognos Analytics 11.2.1xSenturus
 
Container Attached Storage (CAS) with OpenEBS - SDC 2018
Container Attached Storage (CAS) with OpenEBS -  SDC 2018Container Attached Storage (CAS) with OpenEBS -  SDC 2018
Container Attached Storage (CAS) with OpenEBS - SDC 2018OpenEBS
 
ISBG 2015 - Infrastructure Assessment - Analyze, Visualize and Optimize
ISBG 2015 - Infrastructure Assessment - Analyze, Visualize and OptimizeISBG 2015 - Infrastructure Assessment - Analyze, Visualize and Optimize
ISBG 2015 - Infrastructure Assessment - Analyze, Visualize and OptimizeChristoph Adler
 
VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...
VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...
VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...VMworld
 
Node.js Service - Best practices in 2019
Node.js Service - Best practices in 2019Node.js Service - Best practices in 2019
Node.js Service - Best practices in 2019Olivier Loverde
 

Similar to DUG'20: 06 - DAOS Adventures at CERN Openlab (20)

Managing ScaleIO as Software on Mesos
Managing ScaleIO as Software on MesosManaging ScaleIO as Software on Mesos
Managing ScaleIO as Software on Mesos
 
Rakuten Ichiba_Rakuten Technology Conference 2016
Rakuten Ichiba_Rakuten Technology Conference 2016Rakuten Ichiba_Rakuten Technology Conference 2016
Rakuten Ichiba_Rakuten Technology Conference 2016
 
OVS and DPDK - T.F. Herbert, K. Traynor, M. Gray
OVS and DPDK - T.F. Herbert, K. Traynor, M. GrayOVS and DPDK - T.F. Herbert, K. Traynor, M. Gray
OVS and DPDK - T.F. Herbert, K. Traynor, M. Gray
 
Docker & aPaaS: Enterprise Innovation and Trends for 2015
Docker & aPaaS: Enterprise Innovation and Trends for 2015Docker & aPaaS: Enterprise Innovation and Trends for 2015
Docker & aPaaS: Enterprise Innovation and Trends for 2015
 
A295 nodejs-knowledge-accelerator
A295   nodejs-knowledge-acceleratorA295   nodejs-knowledge-accelerator
A295 nodejs-knowledge-accelerator
 
Deploying Containers in Production and at Scale
Deploying Containers in Production and at ScaleDeploying Containers in Production and at Scale
Deploying Containers in Production and at Scale
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OS
 
SDNs: hot topics, evolution & research opportunities
SDNs: hot topics, evolution & research opportunitiesSDNs: hot topics, evolution & research opportunities
SDNs: hot topics, evolution & research opportunities
 
"Portrait of the developer as The Artist" Lockheed Architect Workshop
"Portrait of the developer as The Artist" Lockheed Architect Workshop"Portrait of the developer as The Artist" Lockheed Architect Workshop
"Portrait of the developer as The Artist" Lockheed Architect Workshop
 
engage 2015 - - 2015 - Infrastructure Assessment - Analyze, Visualize and Op...
engage 2015 -  - 2015 - Infrastructure Assessment - Analyze, Visualize and Op...engage 2015 -  - 2015 - Infrastructure Assessment - Analyze, Visualize and Op...
engage 2015 - - 2015 - Infrastructure Assessment - Analyze, Visualize and Op...
 
Docker Enterprise Deployment Planning
Docker Enterprise Deployment PlanningDocker Enterprise Deployment Planning
Docker Enterprise Deployment Planning
 
Strategy, planning and governance for enterprise deployments of containers - ...
Strategy, planning and governance for enterprise deployments of containers - ...Strategy, planning and governance for enterprise deployments of containers - ...
Strategy, planning and governance for enterprise deployments of containers - ...
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications
 
Montreal Kubernetes Meetup: Developer-first workflows (for microservices) on ...
Montreal Kubernetes Meetup: Developer-first workflows (for microservices) on ...Montreal Kubernetes Meetup: Developer-first workflows (for microservices) on ...
Montreal Kubernetes Meetup: Developer-first workflows (for microservices) on ...
 
Tips for Installing Cognos Analytics 11.2.1x
Tips for Installing Cognos Analytics 11.2.1xTips for Installing Cognos Analytics 11.2.1x
Tips for Installing Cognos Analytics 11.2.1x
 
Container Attached Storage (CAS) with OpenEBS - SDC 2018
Container Attached Storage (CAS) with OpenEBS -  SDC 2018Container Attached Storage (CAS) with OpenEBS -  SDC 2018
Container Attached Storage (CAS) with OpenEBS - SDC 2018
 
ISBG 2015 - Infrastructure Assessment - Analyze, Visualize and Optimize
ISBG 2015 - Infrastructure Assessment - Analyze, Visualize and OptimizeISBG 2015 - Infrastructure Assessment - Analyze, Visualize and Optimize
ISBG 2015 - Infrastructure Assessment - Analyze, Visualize and Optimize
 
Node-RED Installer, Standalone Installer using Electron
Node-RED Installer, Standalone Installer using ElectronNode-RED Installer, Standalone Installer using Electron
Node-RED Installer, Standalone Installer using Electron
 
VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...
VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...
VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...
 
Node.js Service - Best practices in 2019
Node.js Service - Best practices in 2019Node.js Service - Best practices in 2019
Node.js Service - Best practices in 2019
 

More from Andrey Kudryavtsev

DUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution PlansDUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution PlansAndrey Kudryavtsev
 
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterDUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterAndrey Kudryavtsev
 
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...Andrey Kudryavtsev
 
DUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage ArchitecturesDUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage ArchitecturesAndrey Kudryavtsev
 
DUG'20: 09 - DAOS Middleware Update
DUG'20: 09 - DAOS Middleware UpdateDUG'20: 09 - DAOS Middleware Update
DUG'20: 09 - DAOS Middleware UpdateAndrey Kudryavtsev
 
DUG'20: 08 - DAOS-SEGY Mapping
DUG'20: 08 - DAOS-SEGY MappingDUG'20: 08 - DAOS-SEGY Mapping
DUG'20: 08 - DAOS-SEGY MappingAndrey Kudryavtsev
 
DUG'20: 07 - Storing High-Energy Physics data in DAOS
DUG'20: 07 - Storing High-Energy Physics data in DAOSDUG'20: 07 - Storing High-Energy Physics data in DAOS
DUG'20: 07 - Storing High-Energy Physics data in DAOSAndrey Kudryavtsev
 
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS TestbedDUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS TestbedAndrey Kudryavtsev
 
DUG'20: 04 - DAOS Feature Update
DUG'20: 04 - DAOS Feature UpdateDUG'20: 04 - DAOS Feature Update
DUG'20: 04 - DAOS Feature UpdateAndrey Kudryavtsev
 
DUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOSDUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOSAndrey Kudryavtsev
 
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraDUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraAndrey Kudryavtsev
 
DUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS UpdateDUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS UpdateAndrey Kudryavtsev
 

More from Andrey Kudryavtsev (13)

DUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution PlansDUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution Plans
 
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterDUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
 
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
 
DUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage ArchitecturesDUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage Architectures
 
DUG'20: 09 - DAOS Middleware Update
DUG'20: 09 - DAOS Middleware UpdateDUG'20: 09 - DAOS Middleware Update
DUG'20: 09 - DAOS Middleware Update
 
DUG'20: 08 - DAOS-SEGY Mapping
DUG'20: 08 - DAOS-SEGY MappingDUG'20: 08 - DAOS-SEGY Mapping
DUG'20: 08 - DAOS-SEGY Mapping
 
DUG'20: 07 - Storing High-Energy Physics data in DAOS
DUG'20: 07 - Storing High-Energy Physics data in DAOSDUG'20: 07 - Storing High-Energy Physics data in DAOS
DUG'20: 07 - Storing High-Energy Physics data in DAOS
 
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS TestbedDUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
 
DUG'20: 04 - DAOS Feature Update
DUG'20: 04 - DAOS Feature UpdateDUG'20: 04 - DAOS Feature Update
DUG'20: 04 - DAOS Feature Update
 
DUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOSDUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOS
 
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraDUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
 
DUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS UpdateDUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS Update
 
DAOS Middleware overview
DAOS Middleware overviewDAOS Middleware overview
DAOS Middleware overview
 

Recently uploaded

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

DUG'20: 06 - DAOS Adventures at CERN Openlab

  • 1. DAOS Adventures at CERN openlab Miguel F. Medeiros on behalf of openlab team miguel.fontes.medeiros@cern.ch 19/11/2020 DAOS User Group 2020 (DUG20)
  • 2. • CERN’s IT branch for cutting-edge computing technologies. • Active collaboration with Intel on several projects: https://openlab.cern/members/intel CERN openlab Miguel F. Medeiros | DAOS Adventures at CERN openlab 2 https://openlab.cern/ “CERN openlab is a unique public-private partnership that works to accelerate the development of cutting-edge ICT solutions for the worldwide LHC community and wider scientific research. Through CERN openlab, CERN collaborates with leading ICT companies and research institutes.” 19/11/2020
  • 3. • Work performed under CERN’s openlab umbrella • CERN openlab comprises many projects and collaborations which we provide technical support. • The specific openlab DAOS use-cases will not be covered on this talk. • This talk will only focus on the sysadmin/technical aspects • We will present our experience and process on commissioning, testing and benchmarking DAOS. • We will try to provide insights, findings and hopefully valuable feedback for DAOS developers. Disclaimer Miguel F. Medeiros | DAOS Adventures at CERN openlab 319/11/2020
  • 4. Releasing it to our users: the process Miguel F. Medeiros | DAOS Adventures at CERN openlab 4 • Intel Workshop at CERN on February 2020 (right on time…) • Benchmark and test it ourselves (with the valuable support of Intel experts) • Release DAOS with socket configuration • Allow all users to get acquainted with the system. • Test the functionality, development and integration aspects. • Release DAOS with PSM2 configuration • Exclusive cluster access. • Allow the users with performance requirements to test their use cases. 19/11/2020
  • 5. Cluster Hardware Miguel F. Medeiros | DAOS Adventures at CERN openlab 5 • 4x Cascade Lakes • 4x SkyLakes (only 2x with DAOS) 19/11/2020
  • 6. Benchmarking considerations Miguel F. Medeiros | DAOS Adventures at CERN openlab 6 • Validated the Omnipath cluster with MPI tests & benchmarks • OSU Micro-Benchmarks, IntelMPI • Benchmark was based on IOR [1] with DAOS API • All benchmarks were performed with DAOS v0.9.4 • Topology • 3x DAOS Servers • 2x DAOS Clients [1]: https://github.com/hpc/ior 19/11/2020
  • 7. Benchmarking with IOR: DAOS Sockets Miguel F. Medeiros | DAOS Adventures at CERN openlab 7 • Functional tests with Sockets configuration 19/11/2020 Each point represents the average of 20 iterations. Error bars are standard deviation.
  • 8. Benchmarking with IOR: DAOS PSM2 Miguel F. Medeiros | DAOS Adventures at CERN openlab 8 • We also tested with PSM2 → performance gains 19/11/2020 Each point represents the average of 20 iterations. Error bars are standard deviation.
  • 9. Benchmarking with IOR: DAOS PSM2 scaling 11/18/2020 Miguel F. Medeiros | DAOS Adventures at CERN openlab 9 • Two node test Each point represents the average of 20 iterations. Error bars are standard deviation.
  • 10. Performance limitations Miguel F. Medeiros | DAOS Adventures at CERN openlab 10 • Finding the missing performance… • Cascade Lake with half performance. • We suspect the riser card → HFI card is PCIe 16x but our riser card only provides 8x elec, 8x mech. 19/11/2020 Each point represents the average of 20 iterations. Error bars are standard deviation.
  • 11. A SysAdmin experience – feedback for developers Miguel F. Medeiros | DAOS Adventures at CERN openlab 11 DISCLAIMER: • Please note that all comments provided here are based on a DAOS v0.9.4 experience. If something was improved in the meantime please ignore its mention. 19/11/2020
  • 12. A SysAdmin experience – feedback for developers 1 Miguel F. Medeiros | DAOS Adventures at CERN openlab 12 • Installation in our environment was challenging • Dependency resolution with conflicts → we rely on specific software versions for our internal software. • Warn users about “sanity/pre-flight” checks before compiling. • Error reporting • Difficult to troubleshoot some issues. • We rely on error messages for troubleshooting and not all errors were mapped on https://daos- stack.github.io/admin/troubleshooting/#daos-errors DISCLAIMER: Please note that all comments provided here are based on a DAOS v0.9.4 experience. If something was improved in the meantime please ignore its mention. 19/11/2020
  • 13. A SysAdmin experience – feedback for developers 2 Miguel F. Medeiros | DAOS Adventures at CERN openlab 13 • What about a “--detail” option for admins? NVMe size SCM size Creation date Creator?… DISCLAIMER: Please note that all comments provided here are based on a DAOS v0.9.4 experience. If something was improved in the meantime please ignore its mention. 19/11/2020 --detail
  • 14. Miguel F. Medeiros | DAOS Adventures at CERN openlab 14 • DAOS Servers with constant 30% CPU utilization. A SysAdmin experience – feedback for developers 3 DISCLAIMER: Please note that all comments provided here are based on a DAOS v0.9.4 experience. If something was improved in the meantime please ignore its mention. 19/11/2020
  • 15. Miguel F. Medeiros | DAOS Adventures at CERN openlab 15 • System metrics • Nice to have these metrics in handy formats (e.g: json). • Unpack the “message” section. → Facilitate the integration on monitoring solutions without extra parsers. A SysAdmin experience – feedback for developers 4 DISCLAIMER: Please note that all comments provided here are based on a DAOS v0.9.4 experience. If something was improved in the meantime please ignore its mention. 19/11/2020
  • 16. Final thoughts Miguel F. Medeiros | DAOS Adventures at CERN openlab 16 • System commissioning was challenging and interesting! • Required some debugging in our environment. • There is still some room for a performance increase • Issue with riser cards, one HFI card per socket, etc. • Scalability testing • We needed more nodes to fully evaluate the system. • The DAOS server configuration is a plus • Quite user friendly! • Works well with configuration management tools (Puppet). • The difficulty was mostly to understand which settings suited best for our cluster. • On a personal note, it was a good experience with several learning opportunities. Thank you Intel for all the support during this Benchmark process! 19/11/2020