CHIPS Alliance OmniXtend Overview


  1. OmniXtend: Open Source Cache Coherence over Ethernet
     Dr. Zvonimir Bandic, Next Gen Platforms Technologies, Western Digital; Chairman, CHIPS Alliance
  2. Agenda
     ▸ Who Are We?
     ▸ OmniXtend Details
     ▸ Next Steps
  3. CHIPS Alliance: Who Are We?
  4. What Is CHIPS Alliance?
     ▸ An organization that develops and hosts:
       – Open source hardware code (IP cores) -> think open source CPUs
       – Open source software design tools -> fastest growing component
       – Interconnect IP (PHY and logical protocols) -> attracts the most interest
     ▸ A barrier-free environment for collaboration:
       – Standards organization framework for collaboration and development
       – Legal framework: Apache v2 license
     ▸ Shared resources ($ and time) that lower the cost of hardware development:
       – For IP and tools
  5. CHIPS Alliance: Who Are We? (member organizations, plus individual contributors including Wilson Snyder and Olof Kindgren)
  6. Workgroups
     ▸ Tools-WG: Verilator, FuseSOC, Cocotb-verilator
     ▸ Cores-WG: SweRV Core™
     ▸ Chisel-WG: Rocket SoC, AI accelerator
     ▸ Interconnect-WG: OmniXtend™, TileLink 2.0, AIB (Chiplets)
  7. OmniXtend Details
  8. Why OmniXtend?
     ▸ The processor acts as a control point within the datacenter, limiting customer flexibility
     ▸ Memory is blocked behind the processor:
       – Limited number of DIMMs per socket
       – Limited CPU memory address space
       – Limited access to the fast memory bus
     ▸ Analytics and machine learning are driven by accelerators, which have limited access to the coherency bus:
       – Access to future fast-I/O storage attach points may also be constrained
     (Diagram: GPU, FPGA, ML accelerator, and I/O reach main memory only through the CPU and its L1 cache)
  9. OmniXtend vs. Other Memory-centric Concepts
     "Memory fabric" may mean different things to different people. Earlier approaches have a context-switch cost comparable to the memory access latency and require software/kernel support and/or rewriting of applications:
     ▸ RDMA-based: a page fault trap leads to an RDMA request (incurring a context switch and SW overhead); path: CPU and cache -> DRAM -> NIC DMA -> fabric
     ▸ SW-managed load/store: global address translation is managed in SW via tables, leading to LD/ST across a global memory fabric
     This is OmniXtend: no rewriting of software, and it scales like the algorithm:
     ▸ The coherence protocol itself is scaled out, with global page management and no context switching
     ▸ No impact on the application system-call interface (only boot changes are needed)
  10. OmniXtend: Open Unified Memory Fabric
      Memory is the center of the architecture. (Diagram: a network fabric connects RISC-V, CPU, GPU, FPGA, AI accelerator, and other devices to a shared memory fabric)
  11. OmniXtend Details
      ▸ OmniXtend is based on TileLink:
        – TileLink is an open, coherent bus used to connect cores with memory
        – OmniXtend encapsulates TileLink messages and serializes them over Ethernet (see the channel sketch below)
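
To make the encapsulation concrete, here is a minimal Python sketch of the five TileLink channels a TileLink-over-Ethernet link has to carry. The channel names follow the TileLink spec; the one-line summaries are informal glosses, not spec text.

    from enum import Enum

    # The five channels of cached (TL-C) TileLink; OmniXtend carries these
    # same messages inside Ethernet frames instead of over on-chip wires.
    class TLChannel(Enum):
        A = "requests from a master (e.g. Acquire, Get, Put)"
        B = "probes from the coherence manager toward caches"
        C = "probe responses and releases from caches"
        D = "grants and access acknowledgements back to the master"
        E = "final grant acknowledgement, completing the handshake"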
  12. Example OmniXtend Implementation
      (Diagram: four L1 caches behind an internal cache-coherence switch; a cache-coherence serializer, L2 cache, and coherence manager front an 802.3 PHY, with peripherals and DRAM attached; the Ethernet link carries cache coherency via OmniXtend)
  13. Benefits of OmniXtend
      ▸ Completely unleashes memory from the CPU
      ▸ No need to rewrite application software
      ▸ The only completely open cache-coherent fabric standard
      ▸ Based on low-cost Ethernet
      ▸ Already implemented in FPGAs
      ▸ Enables new data-centric architectures and decouples compute from memory
  14. OmniXtend System Block Diagram Example
      The only Ethernet-based fabric that supports cache coherency and is open
  15. OmniXtend Architecture Overview
      ▸ A RISC-V node attaches to a programmable switch pipeline: Parser -> Match-Action Units 0..11 -> Deparser, with packets carried between stages in a Packet Header Vector (PHV); each Match-Action Unit (MAU) holds a match table of keys, actions, and parameters
      ▸ TLoE (TileLink over Ethernet) frame structure, in wire order (packing sketched below):
        – Ethernet Preamble/SFD (8 bytes)
        – Ethernet MAC Header (14 bytes)
        – TLoE Frame Header (8 bytes)
        – TileLink messages 1..m
        – Padding (P×8 bytes)
        – TLoE Frame Mask (8 bytes)
        – Ethernet FCS (4 bytes)
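
As a rough illustration of the layout above, the following Python sketch packs TileLink messages into a TLoE frame. The EtherType value, header contents, and mask encoding are placeholders (the real encodings are defined by the OmniXtend specification); the preamble/SFD and FCS are added by the MAC/PHY hardware, so they are omitted here.

    import struct

    OMNIXTEND_ETHERTYPE = 0xAAAA  # placeholder, not necessarily the registered value

    def pack_tloe_frame(dst_mac: bytes, src_mac: bytes,
                        tloe_header: bytes, tl_messages: list) -> bytes:
        """Pack: MAC header (14 B) | TLoE header (8 B) | TileLink messages
        | padding (P x 8 B) | TLoE frame mask (8 B)."""
        assert len(dst_mac) == len(src_mac) == 6 and len(tloe_header) == 8
        mac_header = dst_mac + src_mac + struct.pack("!H", OMNIXTEND_ETHERTYPE)
        body = b"".join(tl_messages)
        body += b"\x00" * ((-len(body)) % 8)   # pad to an 8-byte boundary
        frame_mask = struct.pack("!Q", 0)      # placeholder mask encoding
        return mac_header + tloe_header + body + frame_mask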
  16. FPGA "Real World" Measurements
      ▸ RISC-V SoC with OmniXtend running in an FPGA
      ▸ Tofino switch programmed with P4 code to support OmniXtend
  17. FPGA Time Measurements
      ▸ 16-CPU (FPGA) system and Tofino switch
        – CPUs run at 100 MHz in the FPGA
      (Chart: OmniXtend latency on the 100 MHz FPGA system)
  18. Software Support
      ▸ Requires kernel-level memory management changes
      ▸ Option 1: Single kernel instance
        – All nodes controlled under a single kernel instance
        – NUMA SMP-like system, for small-scale systems
        – Expose each node's memory as a NUMA node
      ▸ Option 2: Independent kernel instances
        – Independent kernel instance per node, for large-scale systems
        – Applications can share memory through an FS-like interface (memory-mapped files; see the sketch after this list)
      (Diagrams: Option 1 joins physical nodes 0 and 1 over the OmniXtend fabric into one logical system whose memories appear as NUMA nodes 0 and 1; Option 2 gives each physical node its own kernel with a shared-memory region across the fabric)
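
A sketch of what Option 2's FS-like sharing could look like from user space, assuming a hypothetical mount point /mnt/omnixtend that exports fabric-attached shared memory as files. The Python below is standard mmap usage, not an OmniXtend-specific API.

    import mmap
    import os

    PATH = "/mnt/omnixtend/region0"   # hypothetical shared-memory file
    SIZE = 1 << 20                    # 1 MiB region

    fd = os.open(PATH, os.O_RDWR | os.O_CREAT, 0o660)
    os.ftruncate(fd, SIZE)            # size the backing region
    with mmap.mmap(fd, SIZE) as region:
        region[0:5] = b"hello"        # ordinary stores; visible to other nodes
        print(bytes(region[0:5]))
    os.close(fd)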
  19. Software Support (continued)
      ▸ Single kernel image prototype implemented
        – Mainly OpenSBI modifications; very few kernel changes needed
        – Device tree description of memory
        – Boots on FPGA prototype hardware with up to 4 nodes
      ▸ Independent-kernels implementation under study
        – Interface and implementation of shared memory setup and access control
      ▸ QEMU-based OmniXtend emulation under development (see the raw-socket sketch below)
        – Facilitates software development and verification
        – Also allows access to physical compute nodes
      (Diagram: a QEMU process runs the guest user space and kernel; reads/writes to non-cacheable remote memory hit a memory-ops callback with an address checker, LRU cache, and an MMIO OmniXtend device that implements the OmniXtend protocol, alongside virtio net/disk devices and a memory backend; on the host, Ethernet packets are accessed through a raw socket on the Ethernet PHY)
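
A minimal sketch of the raw-socket path the QEMU backend relies on, in Python rather than QEMU's C: Linux AF_PACKET sockets send and receive whole Ethernet frames, which is all the emulation needs to exchange TLoE traffic. The interface name and frame bytes are placeholders.

    import socket

    ETH_P_ALL = 0x0003                # match every EtherType (Linux)
    IFACE = "eth0"                    # placeholder interface name

    # Requires CAP_NET_RAW (e.g. run as root).
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    sock.bind((IFACE, 0))

    frame = b"\xff" * 6 + b"\x02" * 6 + b"\xaa\xaa" + b"\x00" * 50  # dummy frame
    sock.send(frame)                  # transmit a raw Ethernet frame
    reply = sock.recv(2048)           # raw frames, including any TLoE responses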
  20. Next Steps
  21. OmniXtend Reference Design
      ▸ A memory fabric innovation platform
      ▸ Standardize a RISC-V coherency bus leveraging OmniXtend
  22. Open Source Collaboration to Drive Development
      ▸ CHIPS Alliance is open to all organizations
      ▸ OmniXtend is well positioned for growth
      ▸ OmniXtend reference design Allegro files are available now in CHIPS Alliance: https://github.com/chipsalliance/omnixtend
  23. Summary
      ▸ OmniXtend is an open, unified memory fabric
      ▸ Joint workgroups with RISC-V International are standardizing TileLink 2.0 and OmniXtend for multicore systems
      ▸ Adopt the technology in your next SoC
      See more: www.chipsalliance.org
  24. Thank you.
  25. Backup
  26. OmniXtend Architecture Overview (backup: RISC-V node diagram)
  27. CHIPS Alliance: Organizational Structure
      ▸ Board of Directors: Zvonimir Bandic (Chairman), Richard Ho (Vice-chairman), Xiaoning Qi, Dave Ditzel, Yunsup Lee, David Kehlet, Prof. Borivoje Nikolic; Ted Marena (Interim Director)
      ▸ Technology: Henry Cook (Technical Committee), workgroup chairs, project maintainers 1-3, SW Engineer 2, Verif. Engineer 1
      ▸ Visibility, growth + operations: Michael Gielda (Outreach Committee), Brian Warner (Operations), Linux Foundation (community manager, events, finance/operations, legal), elected staff, agency (future), advocacy + outreach
  28. Single Kernel Model (backup diagram: physical nodes 0 and 1, each with four CPUs and local memory, joined by the OmniXtend fabric into one logical system whose memories appear as NUMA nodes 0 and 1)
  29. Independent Kernel Model (backup diagram: physical nodes 0 and 1, each with four CPUs and local memory, sharing a memory region across the OmniXtend fabric)
  30. OmniXtend System Block Diagram Example
      (Diagram: two SoCs, each with eight L1 caches behind an internal cache-coherence switch plus a cache-coherence serializer, L2 cache, coherence manager, 802.3 PHY, and local DRAM; both connect through a programmable (P4) Tofino™ switch carrying Ethernet with cache coherency, alongside an ML accelerator and NVM main-memory nodes with their own 802.3 PHYs)
      The only Ethernet-based fabric that supports cache coherency and is open
  31. OmniXtend vs. Other Protocols

      Interface  | Physical I/O | Connection            | Standard | Coherence? | Reference Design
      -----------|--------------|-----------------------|----------|------------|-----------------
      CCIX       | PCIe         | Point to Point        | Yes      | No         | No
      Gen-Z      | Custom       | P2P, Switched, Fabric | Yes      | No         | Yes
      CXL        | PCIe         | Point to Point        | Yes      | Partial    | No
      OpenCAPI   | Custom       | Point to Point        | Open     | Partial    | Yes
      OmniXtend  | Ethernet     | Fabric, P2P           | Open     | Yes        | Yes

Editor's Notes




  • Tooling to support use of Linux in Safety Critical Systems
    A complete and consistent set of techniques mapped to a set of open source tools provided by the project:
    - Kernel and software stack lifecycle in these systems (extending LTS models)
    - Provisions for sustaining Linux throughout the product lifetime
    - Support for certification activities
    Incident and Hazard Monitoring
    - Monitoring critical components in member-system-specific contexts and reporting the impact of updates
    - Establish best practices for member response teams
    Reference Documentation and Use Cases
    - Safety concepts and building blocks
    - Kernel selection and how to configure it (how/why certain flags)
    - Reference sample system to guide use of measures and techniques
    Education and Evangelism
    - Workshops and opportunities for knowledge sharing
    - Course on safety engineering best practices
    - Preexisting element selection and integration
    - Safety concept/safety case based on pre-existing elements
    - Use of analysis tools, and gaps that require manual intervention
    Continuous feedback to the FLOSS community
    - Process and traceability improvements
    - Automation of bug scanning and quality metrics
    - Building awareness of safety and its relation to reliability and availability
    Interaction with the safety community
    - Present and critically discuss FLOSS in safety
    - Present methods and tools for peer review
    - Establish acceptance for FLOSS in the safety community
    - Submit amendments and/or full standards to relevant committees
  • Changed Integration/Miniaturization to Low-cost Low-power SoCs;
    changed Storage Architecture Control Points to Memory Architecture Control Points
  • Page fault trap leading to an RDMA request (incurs a context switch and SW overhead)
    Global address translation management in SW, leading to LD/ST across a global memory fabric

    Coherence protocol scaled out, with global page management and no context switching
  • Completely unleashes memory from the CPU
    Main memory can be shared equally by CPUs, GPUs, ML accelerators, FPGAs, etc.
    No need to rewrite application software
    The only completely open cache-coherent fabric standard
    Based on low-cost Ethernet
    Already implemented in FPGAs
    Enables new data-centric architectures and decouples compute from memory
  • Kernel level memory management changes
    Option 1: Single kernel instance
    All nodes controlled under a single kernel instance
    NUMA SMP like system
    Small scale systems
    Expose each node's memory as NUMA nodes
    Option 2: Independent kernel instances
    Independent kernel instance per node
    Large scale systems
    Applications can share memory through an FS-like interface
    Memory mapped files
  • Single kernel image prototype implemented
    OpenSBI modifications mainly, very few kernel changes needed
    Device tree description of memory
    Boots on FPGA prototype hardware, up to 4 nodes
    Independent kernels implementation under study
    Interface and implementation of shared memory setup and access control
    QEMU based OmniXtend emulation under development
    Facilitate software development and verification
    Allows access to physical compute nodes too
  • So how do we make amazing data centric architectures happen? We believe it is possible via open standards. RISC-V affords us a unique opportunity to create an open standard memory coherency bus for heterogeneous computing architectures. Today, the existing CPUs all have coherency buses that are closed. We can lead the way with an open coherency bus. The CHIPS Alliance organization along with RISC-V International has been discussing this. We are asking you to join us in creating a standard for a RISC-V coherency bus. We think OmniXtend is a great foundation to build upon. OmniXtend has a reference design board and has been shown to perform well in initial system build outs. Contribute to our efforts in creating an open standard which will enable data centric solutions to thrive.
  • OmniXtend is an open, data centric memory fabric
    Joint workgroups with RISC-V International to standardize on TileLink 2.0 and OmniXtend for multicore systems
    Adopt the technology in your next SoC

