This session explains the practicalities of several server developments with different OEMs that focus on OCP equipment and follow the OCP immersion guidelines. It showcases several actual server platforms built to the Open Cassette specification and shows how they follow the principles outlined in the "Design Guidelines for Immersion-Cooled IT Equipment" whitepaper. The session covers work and equipment by Asperitas and by OEMs and system integrators from the OCP ecosystem.
What can you expect to learn?
The practical implications of optimizing IT equipment for immersion: the challenges faced when retrofitting IT equipment, the opportunities for different applications, and how collaboration helps achieve common goals. The session also covers the value of data collection and material compatibility studies as part of a certification process.
3. Asperitas Immersion Technology
Asperitas AIC24-15/21”
Passive immersion technology
Shell Immersion Cooling Fluid S5 X (Hydrocarbon)
Power density:
• Compute density: 45 kW/m² at 32°C
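As a rough worked example of the density figure, the heat load a tank supports scales with its floor footprint. The sketch below assumes the 45 kW/m² rating applies uniformly over a hypothetical tank footprint. The function name and the 2.0 m² footprint are illustrative and are not Asperitas data.

```python
# Compute density from the slide: 45 kW per m² of footprint at 32°C.
# Treating it as a uniform per-area rating is an assumption for illustration.
COMPUTE_DENSITY_KW_PER_M2 = 45.0


def tank_capacity_kw(footprint_m2: float,
                     density_kw_per_m2: float = COMPUTE_DENSITY_KW_PER_M2) -> float:
    """Return the IT heat load (kW) a footprint supports at the given density."""
    if footprint_m2 <= 0:
        raise ValueError("footprint must be positive")
    return footprint_m2 * density_kw_per_m2


# Hypothetical 2.0 m² tank footprint.
print(tank_capacity_kw(2.0))  # prints 90.0
```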
High availability immersion solution
• Dual power and cooling integration
• Self-contained and self-sustained (gravity-driven)
• Full monitoring and autonomous safety (monitoring, alarming, control)
Thermally optimized for platform focus
• Immersion optimized IT platforms
• Serviceable IT equipment
• Fully warranted solutions by OEMs
ADVANCED
COOLING
SOLUTIONS
SERVER
4. Asperitas certification
• Process facilitates and executes OCP “Design Guidelines for Immersion-Cooled IT Equipment”
• Standardized on Asperitas OCP Open Cassette design
• OEM focused collaborations, facilitating full warranted solutions
Certification stages
• Requirement: Density, Performance, Thermal
• System specifications, Thermal design, System design
• Level 1, Feasibility: System build, Material analysis, Thermal performance
• Level 2, Prototype: Validated system design, Thermal certification, Duration test
• Level 3, Supported OEM/vendor platform: Platform optimisation, OEM support alignment, OEM redesign, OEM product lifecycle
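The staged levels can be modelled as a simple checklist in which a level counts as reached only when all of its items, and all items of the earlier levels, are complete. This gating rule is an assumption for illustration, not a documented Asperitas procedure. The sketch covers only the three numbered levels and uses the item names from the slide.

```python
from typing import List, Optional, Set, Tuple

# Checklist items per level, as listed on the certification slide.
# Treating each level as a gate that requires every earlier level is an
# assumption for illustration, not a documented Asperitas rule.
LEVELS: List[Tuple[str, List[str]]] = [
    ("Level 1: Feasibility",
     ["System build", "Material analysis", "Thermal performance"]),
    ("Level 2: Prototype",
     ["Validated system design", "Thermal certification", "Duration test"]),
    ("Level 3: Supported OEM/vendor platform",
     ["Platform optimisation", "OEM support alignment", "OEM redesign",
      "OEM product lifecycle"]),
]


def highest_level_reached(completed: Set[str]) -> Optional[str]:
    """Return the highest level whose items, and all earlier levels' items, are done."""
    reached: Optional[str] = None
    for name, items in LEVELS:
        if not all(item in completed for item in items):
            break  # a missing item blocks this level and every later one
        reached = name
    return reached


done = {"System build", "Material analysis", "Thermal performance", "Duration test"}
print(highest_level_reached(done))  # prints "Level 1: Feasibility"
```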
5. OCP Asperitas Open Cassette SPEC (AOC) Virtual Summit 2020
https://www.opencompute.org/documents/20200227-open-cassettes-specification-v1-0-pub-pdf
Asperitas Open Cassette SPEC
6. Engineering optimized platforms
Edge platform
• High cooling temperature tolerance (48°C)
• Medium CPU density (4x AMD EPYC/1U)
• Minimized footprint (15” chassis)
Enterprise mainstream platform
• High availability implementation
• High overall efficiency
• Highly serviceable
HPC platform
• High performance implementation
• CPU & GPU dense
• High overall efficiency
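| | Edge platform | Enterprise mainstream platform | HPC platform |
|---|---|---|---|
| Target | Data processing | High availability | AI/Machine learning |
| Dimensions | 1U/15" | 2U/21" | 1U/21" |
| Platform | SMC/AMD | Dell/AMD/Intel | Penguin/Gigabyte/Intel/NVIDIA |
| Original platform | 2124BT-HTR (BigTwin) | Dell C6525/C6420 | Relion XO1114GTS |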
7. Edge thermal and fluid optimization
1U 15" Open Cassette implementation
• High liquid flow abilities
• Limited space for components
• One board “upside down” (no upside down)
• Small-form-factor PSU
Main thermal sources
• CPUs (180 W)
• 80 PLUS PSU (10% loss margin included in calculations)
Design decisions
• Fixed board mounting instead of sleds
• Reliability: PSU thermal tolerance of 60°C, so the PSU is positioned at the bottom, where the fluid is coolest
• Performance: CPUs positioned as low as possible above the PSU
• Accelerate thermal shadowing to minimize impact
• Custom-designed power delivery infrastructure by Supermicro
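The main thermal sources listed above can be combined into a quick heat-budget estimate per 1U cassette. The sketch uses the slide's figures: 4 AMD EPYC CPUs per 1U, 180 W each, and a 10% PSU loss margin. It assumes the loss margin applies to the CPU load only and ignores memory, storage and other loads. With these figures the estimate comes to roughly 720 W from the CPUs plus about 72 W of PSU loss.

```python
# Figures from the slides: 4 AMD EPYC CPUs per 1U, 180 W each, and a 10%
# PSU loss margin. Applying the margin to the CPU load only (ignoring memory,
# storage and other loads) is a simplifying assumption.
CPUS_PER_1U = 4
CPU_POWER_W = 180.0
PSU_LOSS_FRACTION = 0.10


def main_heat_sources_w(cpus: int = CPUS_PER_1U,
                        cpu_power_w: float = CPU_POWER_W,
                        psu_loss_fraction: float = PSU_LOSS_FRACTION) -> dict:
    """Estimate heat dissipated by the CPUs and by PSU conversion losses."""
    cpu_w = cpus * cpu_power_w
    psu_loss_w = cpu_w * psu_loss_fraction  # loss taken as a share of delivered power
    return {"cpu_w": cpu_w, "psu_loss_w": psu_loss_w, "total_w": cpu_w + psu_loss_w}


print(main_heat_sources_w())
```

Because the PSU sits at the bottom, its roughly 72 W of loss heats the fluid that then reaches the CPUs above it. This is one reason the PSU placement and the CPU height are decided together.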
8. Enterprise system optimization
2U 21" OCP Open Cassette adaptation
• High liquid flow abilities
• Flexible space for components
• Adapted built-in blade slots
Main component sources
• Dell C6525 and C6420 blades
• Original Dell PSUs
Design decisions
• Single-side servicing: drive bays removed (on-board storage)
• Extended blade designs for lower positioning
• Original backplane designs repositioned
• Support for off-the-shelf blades
• Custom firmware/system management by Dell
9. HPC design optimization
1U 21" Open Cassette adaptation
• High liquid flow abilities
• Flexible space for components
Performance optimization
• Focus on GPU performance: GPUs in the lowest position in the chassis
• CPUs as low as possible in the remaining space
Density optimizations
• Power shelf replaced by PSUs integrated in the chassis (17% of tank space)
• 1 OU converted to 1U to increase density (8% of tank space)
Design decisions
• Custom PDB by Gigabyte
• Custom cabling by Gigabyte
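The 8% figure appears consistent with the pitch ratio between the OCP Open Rack OpenU (48 mm) and the standard EIA rack unit (44.45 mm): 48 / 44.45 ≈ 1.08. In other words, about 8% more systems fit in the same length. The sketch below checks this arithmetic. The 960 mm usable tank length is hypothetical.

```python
# Standard pitches: OCP Open Rack OpenU (OU) = 48 mm, EIA-310 rack unit (U) = 44.45 mm.
OPEN_U_MM = 48.0
EIA_U_MM = 44.45


def density_gain(old_pitch_mm: float, new_pitch_mm: float) -> float:
    """Fractional increase in systems per unit length when the pitch shrinks."""
    return old_pitch_mm / new_pitch_mm - 1.0


def slots(available_mm: float, pitch_mm: float) -> int:
    """Whole systems of the given pitch that fit in the available length."""
    return int(available_mm // pitch_mm)


print(f"OU -> U gain: {density_gain(OPEN_U_MM, EIA_U_MM):.1%}")  # prints "OU -> U gain: 8.0%"
# Hypothetical 960 mm of usable tank length:
print(slots(960, OPEN_U_MM), slots(960, EIA_U_MM))  # prints "20 21"
```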
10. Material compatibility study
Mainboard and PSU focused
• Only compatible cabling used
• Capacitors (potential rubber seals)
• Thermal compounds (potential to dissolve)
• Labels (fluid may dissolve glue or ink)
• Fan simulator applied (fans removed)
Test methods
• Material data sheets not always available
• Visual (high res photo) analysis before and after immersion
• Dielectric thermal bath testing, liquid analysis after test (Shell)
• 100x rapid power switching (PSU relay)
• Visual check confirmation before and after immersion
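The 100x rapid power-switching test can be sketched as a loop that cuts and restores the PSU input through a relay and checks the PSU after each cycle. The hooks `set_relay` and `psu_ok` are hypothetical placeholders for whatever relay driver and power-good check a given bench uses. They are not a real library API. The settle time is also an assumed value.

```python
import time
from typing import Callable, List


def power_cycle_test(set_relay: Callable[[bool], None],
                     psu_ok: Callable[[], bool],
                     cycles: int = 100,
                     settle_s: float = 1.0) -> List[int]:
    """Switch the PSU feed off and on `cycles` times; return failing cycle indices.

    `set_relay` and `psu_ok` are hypothetical hooks for the relay driver and
    the PSU health check (e.g. a power-good signal); they are not a real API.
    """
    failures: List[int] = []
    for cycle in range(cycles):
        set_relay(False)          # cut input power
        time.sleep(settle_s)
        set_relay(True)           # restore input power
        time.sleep(settle_s)      # let the PSU output stabilise
        if not psu_ok():
            failures.append(cycle)
    return failures
```

An empty result after 100 cycles, together with the before/after visual check, is the kind of evidence this step is meant to produce.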
Duration testing
• Continuous logging of thermal properties
• Continuous logging of system performance
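Continuous logging of thermal properties and system performance during a duration test can be as simple as appending timestamped rows to a CSV file. In this sketch, `read_sample` is a hypothetical hook that returns a dict of metrics, such as CPU temperature and a benchmark score. The metric names and the 10 s interval are assumptions.

```python
import csv
import time
from typing import Callable, Dict, TextIO


def log_samples(out: TextIO,
                read_sample: Callable[[], Dict[str, float]],
                n_samples: int,
                interval_s: float = 10.0,
                clock: Callable[[], float] = time.time) -> None:
    """Append timestamped rows (thermal and performance metrics) to a CSV stream.

    `read_sample` is a hypothetical hook returning e.g. temperatures and a
    benchmark score; the CSV columns are whatever keys it returns.
    """
    writer = None
    for i in range(n_samples):
        row = {"timestamp": clock(), **read_sample()}
        if writer is None:
            # Header is taken from the first sample's keys.
            writer = csv.DictWriter(out, fieldnames=list(row))
            writer.writeheader()
        writer.writerow(row)
        out.flush()  # push each row out immediately so an interrupted run keeps its data
        if i < n_samples - 1:
            time.sleep(interval_s)
```

For a 24-hour run at a 10 s interval, `n_samples` would be 8640. Following the `csv` module's guidance, the output file should be opened with `newline=""`.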
11. Liquid analysis
Compatibility test experiments:
• Immersing plastics, elastomers, metals and parts at controlled temperatures (room temperature, 50, 80 and 100°C)
• Over several weeks or months (accelerated test)
• Measuring component weight and volume behaviour
Figure panels: ICP, Dielectric Breakdown Voltage
Integrated fluid analytics:
• ICP: identify metals and elements not originally in the fluid
• FTIR: comparison of material spectra / impurities
• Dielectric breakdown: thermal fluid performance
Source: Shell Technology Centres SST/STCHa
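The weight and volume measurements reduce to a relative change before and after immersion. Typically, a mass or volume gain suggests fluid uptake or swelling, and a loss suggests that material was extracted into the fluid. The sketch flags a sample when either change exceeds a limit. The 5% default limit is a hypothetical acceptance criterion, not a Shell or Asperitas threshold.

```python
from typing import Dict


def percent_change(before: float, after: float) -> float:
    """Relative change in percent; positive means uptake or swelling."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100.0


def assess_sample(mass_before_g: float, mass_after_g: float,
                  volume_before_ml: float, volume_after_ml: float,
                  limit_pct: float = 5.0) -> Dict[str, object]:
    """Flag a sample whose mass or volume changed by more than `limit_pct`.

    The 5% default limit is a hypothetical acceptance criterion.
    """
    dm = percent_change(mass_before_g, mass_after_g)
    dv = percent_change(volume_before_ml, volume_after_ml)
    # abs() catches both swelling (gain) and extraction (loss).
    return {"mass_change_pct": dm, "volume_change_pct": dv,
            "pass": abs(dm) <= limit_pct and abs(dv) <= limit_pct}
```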
12. Thermal optimization
Thermal shadowing
• Focus on GPU workloads: all GPUs placed in parallel
• PSU as a critical component: placed parallel to the GPUs
• CPUs as secondary workload focus: placed in the GPU shadow
Other critical components
• High-utilization SSDs: placed at intermediate temperatures; GPU shadow acceptable
• InfiniBand network adapters: high temperature tolerance; placed in high-temperature layers; ports extended for service access
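Thermal shadowing follows from an energy balance. Fluid that has passed a heat source arrives at the next component warmer by ΔT = Q / (ṁ · c_p). The sketch computes the fluid temperature each layer sees, from bottom to top, under strong simplifying assumptions: a fixed mass flow through every layer in series and full mixing between layers. In a passive tank the flow is actually buoyancy-driven. The layer powers, flow rate and heat capacity used below are hypothetical and are not Shell S5 X data.

```python
from typing import List, Tuple


def shadowed_temps(inlet_c: float, layer_power_w: List[float],
                   mass_flow_kg_s: float, cp_j_per_kg_k: float) -> Tuple[List[float], float]:
    """Fluid temperature seen by each layer, bottom to top, plus the outlet temperature.

    Simplified model: all fluid passes every layer in series and is fully
    mixed between layers, so each layer's heat Q raises the temperature seen
    by the layers above it by Q / (m_dot * cp).
    """
    if mass_flow_kg_s <= 0 or cp_j_per_kg_k <= 0:
        raise ValueError("flow and heat capacity must be positive")
    seen: List[float] = []
    t = inlet_c
    for q in layer_power_w:
        seen.append(t)                              # temperature arriving at this layer
        t += q / (mass_flow_kg_s * cp_j_per_kg_k)   # heat picked up in this layer
    return seen, t


# Hypothetical stack (W per layer): GPUs+PSU, CPUs, SSDs, InfiniBand adapters.
seen, outlet = shadowed_temps(32.0, [1200.0, 400.0, 50.0, 30.0], 0.1, 2000.0)
print(seen, outlet)
```

The model shows why temperature-tolerant parts such as the InfiniBand adapters can sit in the upper, warmer layers, while the GPUs and PSU take the coolest fluid at the bottom.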
15. CPU variations: boost vs base clock
Figure: CPU temperature fluctuations (min/max) over a 24-hour test cycle using Stresslinux, full boost vs base clock (-20 W)
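The min/max comparison behind this figure can be reproduced from logged temperature samples, for example samples read from a 24-hour duration-test log. The sketch summarises each run by minimum, maximum and spread. The mode labels are taken from the figure, and the function names are illustrative.

```python
from typing import Dict, List


def fluctuation_summary(samples_c: List[float]) -> Dict[str, float]:
    """Min, max and spread (max - min) of CPU temperature samples over a test cycle."""
    if not samples_c:
        raise ValueError("no samples")
    lo, hi = min(samples_c), max(samples_c)
    return {"min_c": lo, "max_c": hi, "spread_c": hi - lo}


def compare_modes(runs: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Summarise each run, e.g. {"full boost": [...], "base clock (-20W)": [...]}."""
    return {mode: fluctuation_summary(samples) for mode, samples in runs.items()}
```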
17. Other OCP platform optimizations
• Tioga Pass compute platform
• Compute platform
• Olympus platform
18. Conclusion
Platform optimization unlocks improved system capabilities
Optimization of platforms benefits from a collaborative approach
Differentiating between workloads allows optimization toward the desired goals:
• Thermal
• HPC/High Density
• High availability
Thorough material compatibility studies improve predictability
Tests at scale help identify further optimization steps
19. Call to Action
• Join the ACS Immersion, Server and ACF groups and contribute immersion-related content
• Collaborate on optimization of IT platforms for immersion
• Cross-pollination within OCP builds an effective ecosystem
• Sharing knowledge helps increase the immersion potential
• More information: Rolf.Brink@Asperitas.com
• Design Guidelines for Immersion-Cooled IT Equipment: keep an eye on the immersion wiki and mailing list!
• Open cassette spec: https://www.opencompute.org/documents/20200227-open-cassettes-specification-v1-0-pub-pdf
• ACS Immersion:
• Project Wiki with latest information:
https://www.opencompute.org/wiki/Rack_%26_Power/Advanced_Cooling_Solutions_Immersion_Cooling
• Mailing list: http://lists.opencompute.org/mailman/listinfo/opencompute-acsimmersion