From Rack scale computers to Warehouse scale computers


A survey report on rack scale computers and warehouse scale computers



  1. From Rack scale computers to Warehouse scale computers
     AIST, Information Technology Research Institute
     Ryousei Takano
     2014/7/31 (Revised 8/6)
  2. Overview
     • From simple scale-out to the introduction of discontinuous (disruptive) technologies
       – Keyword: "disaggregation"
     • Rack scale computer
       – Case #1: Open Compute Project
       – Case #2: Intel Rack Scale Architecture
     • Warehouse scale computer
       – Case #1: HP The Machine
       – Case #2: UCB ASPIRE FireBox
  3. Rack scale computer
     • A step slightly further out than HP Moonshot
     • Moonshot for extreme efficiency: converged infrastructure for extreme scale
       – Shared power, storage, fabric, management, chassis, and cooling
       – A rich set of application-specific cartridges co-designed for extreme efficiency
       – The new metric: Gflops/Watt
       – At extreme scale there is no way to escape specialization and heterogeneity
  4. Open Compute Project
     • From using commodity products to user-driven design
     • In April 2011, Facebook open-sourced the specifications of the servers and facilities in its data centers
     • Goal: higher density and better energy efficiency in large-scale data centers
       – Industry standard: 1.9 PUE
       – Open Compute Project: 1.07 PUE
     • Published specifications cover servers, storage, racks, network switches, data center design, and more
     • Commercial products now shipping: Quanta Rackgo X series, GIGABYTE DataCenter Solution series
     (PUE: Power Usage Effectiveness)
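The PUE figures above translate directly into facility overhead. A minimal sketch of the calculation; the 1 MW IT load is a hypothetical example, and only the two PUE values (1.9 vs. 1.07) come from the slide:

```python
# PUE (Power Usage Effectiveness) = total facility power / IT equipment power.
# A PUE of 1.0 would mean every watt drawn goes to IT gear.

def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Ratio of total data-center power draw to power consumed by IT equipment."""
    return total_facility_kw / it_equipment_kw

# Hypothetical 1 MW IT load under each PUE figure from the slide:
it_load_kw = 1000.0
industry_total = it_load_kw * 1.9    # 1900 kW drawn from the grid
ocp_total = it_load_kw * 1.07        # 1070 kW drawn from the grid

# Overhead (cooling, power conversion, lighting) saved per MW of IT load:
savings_kw = industry_total - ocp_total
print(f"PUE 1.9 -> {industry_total:.0f} kW, PUE 1.07 -> {ocp_total:.0f} kW")
print(f"Overhead saved: {savings_kw:.0f} kW per MW of IT load")
```

At this difference, an OCP-style facility spends roughly 830 kW less on non-IT overhead for every megawatt of servers.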
  5. Open Compute Rack v2: Open Rack
     • Well-defined "mechanical API" between the server and the rack
     • Accepts any size of equipment from 1U to 10U
     • Wide 21-inch equipment bay for maximum space efficiency
     • Shared 12 V DC power system
     • Available now from Delta Electronics (more suppliers coming soon)
  6. (figure only)
  7. Intel Rack Scale Architecture: Reference Architecture
     • Accelerating rack-scale innovation by delivering a suite of interoperable technologies
       – Network platform: flexible and cost-effective
       – Increased utilization through storage aggregation
       – Extreme compute and network bandwidth
       – Platform flexibility: increased useful life and capacity
     • Building blocks: CPU/memory modules; Atom and Xeon silicon; photonics and switch fabric; PCIe SSD storage and caching; Open Network Platform; orchestration
     • Efficiency through granularity at the physical and logical level
       – Intel technologies optimized for flexibility, performance, and cost
       – Open rack-scale reference architecture to simplify adoption
       – Driving alignment on common standards with a broad range of users (end users, Scorpio, and OCP) and OEM implementations
  8. Silicon Photonics for Disaggregation
     • Mezzanine options: Intel Ethernet controller with Intel Silicon Photonics, or optical PCIe via Intel Silicon Photonics
     • Intel Xeon processor-based tray and Intel Atom micro-server tray, connected by mezzanine fiber
     • 100 Gb in the rack enables flexible topologies and distributed switching
  9. Optical Rack: Choice of Logical Architecture
     • Interoperable and programmable systems based on standard platforms
     • Choice of platform subsystems and logical architecture: "composability"
       – Network and storage move into the TOR switch, or the TOR switch is distributed into the servers
       – Remote storage / I/O appliance reachable over 100G silicon-photonics (SiPh) links to the spine switches
     • The architecture offers flexible solutions and multiple value propositions
     (figure: server trays with DDR-attached CPU/memory, Xeon over PCIe and Atom over Ethernet, joined by a SiPh fabric to compute, network, and storage resources)
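The "composability" idea on this slide is that a logical server is carved on demand out of rack-wide resource pools rather than being a fixed box. A toy sketch of that notion; all class and field names here are hypothetical, and real rack-scale systems expose this through their own orchestration APIs:

```python
# Toy model of rack-level "composability": logical nodes assembled on demand
# from shared pools of CPU, memory, and SSD modules in one rack.
from dataclasses import dataclass


@dataclass
class RackPool:
    """Uncommitted resources available across the rack (hypothetical units)."""
    cpus: int
    mem_gb: int
    ssd_tb: int

    def compose(self, cpus: int, mem_gb: int, ssd_tb: int) -> dict:
        """Carve a logical node out of the shared pools, if capacity remains."""
        if cpus > self.cpus or mem_gb > self.mem_gb or ssd_tb > self.ssd_tb:
            raise ValueError("insufficient pooled resources")
        self.cpus -= cpus
        self.mem_gb -= mem_gb
        self.ssd_tb -= ssd_tb
        return {"cpus": cpus, "mem_gb": mem_gb, "ssd_tb": ssd_tb}


rack = RackPool(cpus=512, mem_gb=8192, ssd_tb=64)
node = rack.compose(cpus=16, mem_gb=256, ssd_tb=2)
print(node)           # the logical node just composed
print(rack.cpus)      # 496 CPUs remain uncommitted in the pool
```

The point is only the shape of the abstraction: resources are decoupled from enclosures, and the fabric makes any partitioning of the pools a valid "server".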
  10. Example Usages
     • Public cloud, private cloud, big data, IMDB (future), CSP's SW
     • A range of end-user usage models driving innovation
     • OEMs delivering a range of implementations
     • Industry delivering common building blocks with flexible configurations
     • A range of emerging solution stacks with "composability"
  11. Warehouse scale computer
     (excerpt from "The Datacenter as a Computer")
     Figure 1.1 depicts some of the more popular building blocks for WSCs. A set of low-end servers, typically in a 1U or blade enclosure format, are mounted within a rack and interconnected using a local Ethernet switch. These rack-level switches, which can use 1- or 10-Gbps links, have a number of uplink connections to one or more cluster-level (or datacenter-level) Ethernet switches. This second-level switching domain can potentially span more than ten thousand individual servers.
     Storage: Disk drives are connected directly to each individual server and managed by a global distributed file system (such as Google's GFS [31]), or they can be part of Network Attached Storage (NAS) devices that are directly connected to the cluster-level switching fabric. A NAS tends to be a simpler solution to deploy initially because it pushes the responsibility for data management and integrity to a NAS appliance vendor. In contrast, using the collection of disks directly attached to server nodes requires a fault-tolerant file system at the cluster level. This is difficult to implement but can lower hardware costs (the disks leverage the existing server enclosure) and networking fabric utilization.
     Figure 1.1: Typical elements in warehouse-scale systems: 1U server (left), 7-foot rack with Ethernet switch (middle), and diagram of a small cluster with a cluster-level Ethernet switch/router (right).
     Storage Hierarchy: Figure 1.2 shows a programmer's view of the storage hierarchy of a typical WSC. A server consists of a number of processor sockets, each with a multicore CPU and its internal cache hierarchy, local shared and coherent DRAM, and a number of directly attached disk drives. The DRAM and disk resources within the rack are accessible through the first-level rack switches (assuming some sort of remote procedure call API to them), and all resources in all racks are accessible via the cluster-level switch.
     Quantifying Latency, Bandwidth, and Capacity: The excerpt quantifies the latency, bandwidth, and capacity characteristics of a WSC, assuming a system with 2,000 servers, each with 8 GB of DRAM and four 1-TB disks. Each group of 40 servers is connected through a 1-Gbps link to a rack-level switch that has an additional eight 1-Gbps ports used for connecting the rack to the cluster-level switch.
     (Figure 1.2: Storage hierarchy of a WSC.)
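The example system in the excerpt can be checked with back-of-the-envelope arithmetic; all inputs below are the figures stated in the excerpt itself:

```python
# The example WSC from the excerpt: 2,000 servers, 8 GB DRAM and four 1-TB
# disks each, 40 servers per rack, 1-Gbps server links, and eight 1-Gbps
# uplinks per rack switch.
SERVERS = 2000
DRAM_PER_SERVER_GB = 8
DISKS_PER_SERVER = 4
DISK_TB = 1
SERVERS_PER_RACK = 40
SERVER_LINK_GBPS = 1
UPLINKS_PER_RACK = 8
UPLINK_GBPS = 1

total_dram_tb = SERVERS * DRAM_PER_SERVER_GB / 1000          # aggregate DRAM
total_disk_pb = SERVERS * DISKS_PER_SERVER * DISK_TB / 1000  # aggregate disk

# Rack-to-cluster oversubscription: 40 Gbps of server-facing bandwidth
# funnels into 8 Gbps of uplink bandwidth.
oversub = (SERVERS_PER_RACK * SERVER_LINK_GBPS) / (UPLINKS_PER_RACK * UPLINK_GBPS)

print(f"{total_dram_tb} TB DRAM, {total_disk_pb} PB disk, {oversub}:1 oversubscription")
```

So the cluster holds 16 TB of DRAM and 8 PB of disk, and traffic leaving a rack sees a 5:1 bandwidth reduction, which is exactly the non-uniformity that the disaggregation proposals later in this deck try to attack.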
  12. HP "The Machine"
     • Special-purpose cores (SoC) + universal memory pool + photonics (+ fabric)
  13. Universal memory (HP Nanostores/Memristor)
     • A drastic reduction of the memory stack's complexity and cost
     • But requires a complete software-stack redesign to exploit the full potential of the new architecture
     • 10–100× improvement in performance per watt
  14. Photonics technology (figure only)
  15. Architecture
     • Photonic interconnect linking compute, memory, NV-memory, and storage elements
     • Architecture evolution/revolution: a "computing ensemble", bigger than a server, smaller than a datacenter, with built-in system software
       – Disaggregated pools of uncommitted compute, memory, and storage elements
       – Optical interconnects enable dynamic, on-demand composition
       – Ensemble OS software using virtualization for composition and management
       – Management and programming virtual appliances add value for IT and application developers
  16. Example Usage (1) (figure only)
  17. Example Usage (2) (figure only)
  18. Performance Estimation
     • 6× faster than the K computer, with 80× the energy efficiency
     • HPC Challenge's RandomAccess benchmark
  19. Performance Estimation
     • Performance on par with BG/Q, with 20× the energy efficiency
     • Graph 500 benchmark
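Reading "energy efficiency" as performance per watt, the two estimation slides together imply how much power the proposed machine would draw relative to each reference system. A small check of that arithmetic, using only the ratios quoted on the slides:

```python
# If a machine delivers perf_ratio times the reference performance at
# efficiency_ratio times the reference performance-per-watt, its power draw
# relative to the reference is perf_ratio / efficiency_ratio.

def implied_power_ratio(perf_ratio: float, efficiency_ratio: float) -> float:
    """Relative power draw implied by relative performance and perf/watt."""
    return perf_ratio / efficiency_ratio

# RandomAccess: 6x the K computer's performance at 80x its perf/watt
print(implied_power_ratio(6, 80))   # 7.5% of K's power
# Graph 500: BG/Q-equivalent performance at 20x BG/Q's perf/watt
print(implied_power_ratio(1, 20))   # 5% of BG/Q's power
```

Both claims thus amount to an order-of-magnitude-plus reduction in power for the same or better throughput.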
  20. Roadmap (figure only)
  21. FireBox Overview (UC Berkeley)
     • 1 Terabit/sec optical fibers and high-radix switches
     • Up to 1,000 SoCs plus high-bandwidth memory (100,000 cores total)
     • Up to 1,000 non-volatile memory (NVM) modules (100 PB total)
     • Inter-Box network: many short paths through high-radix switches
     • A figure almost identical to The Machine's
  22. Photonic Switches (UC Berkeley)
     • Monolithically integrated silicon photonics with Wave-Division Multiplexing (WDM)
       – A fiber carries 32 wavelengths, each at 32 Gb/s, in each direction
       – Off-chip laser optical supply; on-chip modulators and detectors
     • Multiple radix-1000 photonic switch chips arranged as the middle stage of a Clos network (first and last Clos stages inside the sockets)
     • 2K endpoints can be configured as either SoC or NVM modules
     • Within a Box, all paths are two fiber hops:
       – Electrical-to-photonic conversion at the socket, one fiber hop socket-to-switch, photonic-to-electrical at the switch
       – Electrical packet routing in the switch
       – Electrical-to-photonic at the switch, one fiber hop switch-to-socket, photonic-to-electrical at the socket
     • In the end, packet switching is still done electrically
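The WDM parameters on this slide line up with the "1 Terabit/sec optical fibers" headline on the FireBox overview slide, which a one-line calculation confirms:

```python
# Per-fiber bandwidth under the stated WDM design:
# 32 wavelengths, each carrying 32 Gb/s, in each direction.
wavelengths = 32
gbps_per_wavelength = 32

per_fiber_gbps = wavelengths * gbps_per_wavelength
print(per_fiber_gbps)  # 1024 Gb/s per direction, i.e. the ~1 Tb/s figure
```

So each fiber delivers 1,024 Gb/s per direction, roughly the quoted terabit.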
  23. Summary
     • A wave of disruptive technologies is reaching the data center
       – In the near future, data centers will be composable at rack granularity rather than server granularity
       – Further out, data centers will be composable datacenter-wide
     • Major impact on architecture and system software
     • Photonics technology is the key
       – How do we build it into a system?
         • Without a breakthrough in scaling up photonic switches, fatter nodes are the only path forward, at the cost of a lower bytes-per-flop (B/F) ratio
       – Will the various communication interface standards converge?
       – Will memory really be disaggregated?
       – How far can electrical I/O bandwidth improve?
         • 25 Gbps (100 GbE) and 28 Gbps (HMC) are roughly the current state of the art
  24. Sources
     • Intel rack scale architecture overview, Interop 2013
       – vegas/2013/free-sessions---keynote-presentations/download/463
     • New technologies that disrupt our complete ecosystem and their limits in the race to Zettascale, HPC 2014
     • "The future server technology HP showed at Tech Power Club" (in Japanese)