Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Optimized HPC/AI cloud with OpenStack acceleration service and composable hardware

337 views

Published on

Today data scientist is turning to cloud for AI and HPC workloads. However, AI/HPC applications require high computational throughput where generic cloud resources would not suffice. There is a strong demand for OpenStack to support hardware accelerated devices in a dynamic model.
In this session, we will introduce OpenStack Acceleration Service – Cyborg, which provides a management framework for accelerator devices (e.g. FPGA, GPU, NVMe SSD). We will also discuss Rack Scale Design (RSD) technology and explain how physical hardware resources can be dynamically aggregated to meet the AI/HPC requirements. The ability to “compose on the fly” with workload-optimized hardware and accelerator devices through an API allow data center managers to manage these resources in an efficient automated manner.
We will also introduce an enhanced telemetry solution with Gnnochi, bandwidth discovery and smart scheduling, by leveraging RSD technology, for efficient workloads management in HPC/AI cloud.

Published in: Engineering
  • Writing a good research paper isn't easy and it's the fruit of hard work. For help you can check writing expert. Check out, please ⇒ www.WritePaper.info ⇐ I think they are the best
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Be the first to like this

Optimized HPC/AI cloud with OpenStack acceleration service and composable hardware

  1. 1. / / / / . . , . . . . , , . Vancouver 2018 . ,
  2. 2. Agenda • . 2 7 9 C • &H H 99 9 D 7 D D /0& • A 0D D 0 F H 9 • EDE . E7 D AH 9 DI E7 9 DC C F 7 LP S 01/ 1 & 1 TORN
  3. 3. OpenStack for Scientific Research • HPC/AI workloads make cloud acceleration a requirement rather than an interesting option. • Hardware acceleration isn’t new. GPU, ASIC, NVMe, FPGA, etc. • Current issues to use accelerators in OpenStack 5G Wireless and Optical Networking Cloud Processing and Analytics 50B Connected Devices by 20201 Hyper-connected World Deluge of Data
  4. 4. Benefits increase over timeBetter Utilization Greater Flexibility More Vendor choice Lower cost of ownership Implement Standards Enable Solutions Resource Pooling Compute and Rack API Storage Management API Network Device Management API Partner with OEMs & ISVs NVMe over PCIe NVMe over Fabrics FPGA Accelerators Interface with Orchestrators Build Validated Solution Stacks Intel® RSD Vision
  5. 5. Any forward looking information provided here is subject to change without notice 2016 2017 2018 * Other names and brands may be claimed as property of others. ** Dates above refer to delivery of Intel RSD reference code. OEM delivery of solutions are typically launched 1-2 quarters later Intel RSD v1.2 MODERN MANAGABILITY • Open and Modern HW Management APIs (Redfish*) • Pod-Level Architecture APIs • Networked-Storage Services APIs Intel RSD v2.1 NVMe POOLING OVER PCIe • Physical Storage Disaggregation and Composability APIs • Storage Pooling over PCIe (Direct-Attach) • Pooled Node Controller Discovery Intel RSD v2.3 NVMe POOLING OVER ETHERNET • NVMe over Fabrics* (Ethernet, RDMA) • Standards-Based Storage Mgmt. (SNIA Swordfish*) • Telemetry for NVMe over Fabrics* Available Now Available Now Features under investigation: • Intel persistent memory support • FPGA Accelerator Pooling over PCIe • Network card pooling • Standards-Based Network Mgmt. (Yang-to-Redfish*) In Development Target: Q2 2018 Intel RSD v2.2 ADVANCED MANAGABILITY • Intel Xeon Scalable Processor Support • Out-of-Band Telemetry APIs • TPM Support • FPGA Discovery over PCIe Available Now Intel® RSD Roadmap
  6. 6. Storage Drawer Compute Compute Compute Compute Network Accelerator Drawer Disaggregation Spend less up front and save $$ over time Storage Pooling over PCIe NVMe over Fabrics (Ethernet) Two implementation options for hardware disaggregation Lower Latency Greater Scalability Intel® RSD - Disaggregation
  7. 7. Storage Pooling – PCIe Direct Attach 7 Pod Manager § § § Storage TrayCompute Servers Management Management Re-timer1Gb Eth. PSME Re-timer1Gb Eth. PSME Re-timer1Gb Eth. PSME Network Data Data PCIe Switch 1Gb Eth. PSME PCIe PCIe PCIe Storage Pool 16 drives typically Compute Radix Generally 8-16 x4 Latency Add ~1 μSec per I/O Bandwidth <50% oversubscribed Use Case Hottest Tier Storage Cost Low Shipping NOW with v2.1 Direct Attach PCIe Storage Pooling requires PCIe Re-timer cards in Compute Servers 7 0 1 7 9 2 7 0 9 2 91 19 10 AE F . ID C
  8. 8. Storage Pooling – NVMe over Fabric 8 Storage Pool Unlimited Compute Radix Unlimited Latency Add ~20 μSec per I/O Bandwidth May be oversubscribed Use Case Warm Tier Storage Cost Medium Coming in v2.3 Software-based Storage Pooling over Ethernet using “NVMe over Fabrics” protocol Pod Manager § § § Storage ServersCompute Servers § § § Management Management RDMA1Gb Eth. PSME Data RDMA1Gb Eth. PSME RDMA1Gb Eth. PSME Network RDMA PCIe Switch 1Gb Eth. x86 CPU PSME RDMA PCIe Switch 1Gb Eth. x86 CPU PSME RDMA PCIe Switch 1Gb Eth. x86 CPU PSME 0 8 1 72 89 A 8 1 70 2 2 21 FCL I . . . E D
  9. 9. Storage Pooling – NVMe over Fabric 9 Storage Pool Unlimited Compute Radix Unlimited Latency Add ~2 μSec per I/O Bandwidth May be oversubscribed Use Case Warm Tier Storage Cost Low Hardware-based Storage Pooling over Ethernet using “NVMe over Fabrics” protocol Pod Manager § § § Storage AppliancesCompute Servers § § § Management Management RDMA1Gb Eth. PSME Data RDMA1Gb Eth. PSME RDMA1Gb Eth. PSME Network RDMA PCIe Switch 1Gb Eth. PSME HW Bridge RDMA PCIe Switch 1Gb Eth. PSME HW Bridge RDMA PCIe Switch 1Gb Eth. PSME HW Bridge 7 0 1 7 9 2 7 0 9 2 91 19 10 AE F . ID C
  10. 10. OpenStack Acceleration Service – Cyborg Cyborg is an OpenStack project that aims to provide a general purpose management framework for acceleration resources (i.e. various types of accelerators such as Crypto cards, GPU, FPGA, NVMe/NOF SSDs, ODP, DPDK/SPDK and so on). So Cyborg will be a good choice to manage NVMe high-speed storage devices in Intel RSD rack, by considering it as one kind of accelerator. 10 Cyborg-api Cyborg-dbCyborg-conductor Cyborg-agent local cache vendor-X driver cyborg-generic-driver vendor-Y driver vendor-X FPGA vendor-Z ipsec vendor-Y NVMe SSD Control Node Compute Node
  11. 11. The Workflow of Cyborg Services Driver Agent Database/ Conductor Api Detect Accelerators Hardware / Software updates Install drivers Monitor Utilization Accelerators list Usage statistics Attach / Detach for Nova Manage per host software Provides schedule hints to Nova User requests instance Operator manages hardware
  12. 12. C2 0 A 9 C 7 D 9 99 7 A EIFON L . 1 MH
  13. 13. Thank You www.99cloud.net 1 2 @ 0 0 . . - 6701 49 5 5

×