  • IBM is proud of the success of our POWER processor. Many industries have adopted its extremely scalable technology, used in servers, desktop computing, and embedded in games. From consumer electronics to supercomputers, the POWER architecture has been a winner since its inception, and you can rely on IBM to continue delivering technological advancements through POWER technology.

Presentation Transcript

  • IBM POWER6™ Based Servers, May 27, 2008. Jeff Stuecheli, IBM
    Agenda
    • Historical Background
    • POWER6™ Chip Components
    • Interconnect Topology
    • System Innovations
    • POWER6™ Systems
    • Benchmark Results
    Copyright © 2007 by IBM. All rights reserved.
  • IBM POWER development team: collaboration across Unix servers, games, SoC, and mainframes
  • IBM Power Systems, leading up to POWER6™: POWER4 (180 nm, 2001), POWER4+ (130 nm, 2003), POWER5 (130 nm, 2004), POWER5+ (90 nm, 2006), POWER6 (65 nm, 2007)
  • POWER6: Innovations in all areas of system design
    • Technology
    • Performance
    • Flexibility
    • RAS
    • Features and Function
    • System Energy Management
    • POWER6™ System Components
  • POWER6™ Chip Overview
    • Ultra-high frequency dual-core chip
      • 7-way superscalar, 2-way SMT core
      • 9 execution units
        • 2 LS, 2 FP, 2 FX, 1 BR, 1 VMX, 1 DFU
      • 790M transistors
      • Up to 64-core SMP systems
      • 2x4MB on-chip L2
      • On-chip directory and controller for 32 MB (off-chip) L3
      • Two memory controllers on-chip
    • Technology
      • CMOS 65nm lithography, SOI
    • High-speed elastic bus interface at 2:1 freq
      • I/Os: 1953 signal, 5399 Power/Gnd
    [Chip floorplan: two cores, L2 control and data arrays, L3 controller, dual memory controllers, I/O controller, SMP interconnect]
  • eDRAM L3 cache chip
    • 32 MB eDRAM
      • 8 x 36 Mb macros
      • 16 banks per macro
    • Cache Data controller
      • 8 entry read queue
      • 8 entry write queue
      • 8x128 Byte Write buffering
    • 3 GHz max interface
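The macro arithmetic above can be sanity-checked: eight 36 Mb macros hold more raw bits than 32 MB of data, and the surplus is consistent with ECC storage (an assumption; the slide does not say how the extra bits are used).

```python
# Sanity-check the eDRAM L3 capacity figures from the slide.
MACROS = 8
MBIT_PER_MACRO = 36                      # 36 Mb per eDRAM macro
raw_mbit = MACROS * MBIT_PER_MACRO       # raw capacity in Mb
data_mbit = 32 * 8                       # 32 MB of usable data = 256 Mb
surplus_mbit = raw_mbit - data_mbit      # bits beyond the data capacity
print(raw_mbit, data_mbit, surplus_mbit) # 288 256 32
# The 32 Mb surplus (12.5% overhead) would fit ECC, though the slide
# does not state what the extra bits are used for.
```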
  • POWER5 vs. POWER6 Pipeline Comparison
    [Pipeline diagram: instruction fetch, instruction buffer/decode, instruction dispatch/issue, data fetch/execute; FXU-dependent and load-dependent execution paths]
    • Pipeline Comparison
      • POWER6 processor is ~2X frequency of POWER5
      • POWER6 instruction pipeline depth equivalent to POWER5
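The comparison above can be made concrete: if the cycle-count depth stays equal while frequency roughly doubles, the pipeline's wall-clock latency halves. The frequencies and stage count below are illustrative assumptions (POWER5 shipped near 1.9 GHz, POWER6 up to ~4.7 GHz), not figures from this slide.

```python
# Hypothetical figures: 1.9 GHz POWER5, 4.7 GHz POWER6, 13-stage pipe.
p5_ghz, p6_ghz = 1.9, 4.7
stages = 13                        # same cycle-count depth on both designs
p5_ns = stages / p5_ghz            # end-to-end pipeline latency, ns
p6_ns = stages / p6_ghz
# Same depth in cycles, but the latency ratio tracks the clock ratio:
print(round(p5_ns, 2), round(p6_ns, 2), round(p5_ns / p6_ns, 2))
```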
  • POWER6 Core Pipeline
    [Core pipeline diagram: instruction fetch, instruction dispatch, BR/FX/load, floating-point, and checkpoint-recovery pipelines, with stages for pre-decode, fetch/branch, delayed/transmit, decode, dispatch/issue, operand access/execution, write-back, completion, and checkpoint, plus FX, load, and float result bypasses]
    • Interconnect Topology
  • POWER6™: Extreme Bandwidth Capability
  • POWER6™ Interconnect
    • Memory DIMMs
      • 8 channels
      • Point-to-point elastic interface at 4x DRAM frequency
      • Custom on-DIMM buffer chip
    • L3 data
      • 32 MB capacity
      • Dual 16 byte (8 read + 8 write) interface at ½ CPU frequency
    • SMP interconnect
      • Three Intra-Node SMP buses for 8-way Node
      • Two Inter-Node SMP buses for up to 8 Nodes
    [Diagram: POWER6™ chip with eight memory DIMM channels (DDR2/3 DRAM behind custom on-DIMM buffers), dual custom eDRAM L3 data chips, intra-node and inter-node SMP links]
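The L3 figures above imply a concrete bandwidth number. A 4.7 GHz core clock is an assumption for illustration; at half that rate the interface runs at 2.35 GHz, under the 3 GHz maximum quoted earlier.

```python
# Back-of-envelope L3 bandwidth from the slide's figures.
CPU_GHZ = 4.7                    # assumed core clock, not from this slide
bus_ghz = CPU_GHZ / 2            # interface runs at 1/2 CPU frequency
read_bytes = 2 * 8               # dual interfaces, 8 B of read each
write_bytes = 2 * 8              # dual interfaces, 8 B of write each
read_gbs = read_bytes * bus_ghz      # GB/s of read bandwidth
write_gbs = write_bytes * bus_ghz    # GB/s of write bandwidth
print(read_gbs, write_gbs, read_gbs + write_gbs)  # ~37.6 each way, ~75.2 total
```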
  • Example System Topology
    • 64-way SMP: 8 nodes x 4 chips
    • Dual L3 data chips per POWER6™ chip
    • Dual memory controllers: 4 channels each
    • Node is logical: system design drives physical embodiment
    [Diagram: nodes of four POWER6 chips with attached L3 data chips and memory]
    • System Innovation
  • System Innovation
    • Bulletproof Computing
        • Error detection and recovery
    • System Level Power
        • Datacenter centric control
    • Partition Mobility
        • Datacenter virtualization
    • Decimal Floating Point
    • Water Cooling
    • Coherence
        • Scalable SMP
    • Prefetch
  • POWER6™ Cache Coherence
    • SubSpace Snooping
        • Hybrid approach: Combination of cache snooping and directory lookup
        • Design principle: provide the near-ideal latency of cache snooping with the scalability of full directories.
    • Design Components
        • Cache States: MESI enhanced with line location residue.
        • Memory Directory: Low cost coherence information stored with cache line data
        • Coherence Predictor: Prediction of cache line system state.
        • Two Tiered: Node local and System scope
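The components above can be sketched as a toy two-tier lookup: a memory-directory bit records whether a line has ever left its home node, so most requests resolve with a node-scope snoop and escalate to system scope only when the directory says the line may be remote. Class and method names are illustrative, not the actual POWER6 protocol.

```python
# Toy model of two-tiered snoop scoping; illustrative only.
class TwoTierSnoop:
    def __init__(self, n_nodes):
        self.nodes = [set() for _ in range(n_nodes)]  # lines cached per node
        self.mem_dir = {}  # line -> True if it may be cached off its home node

    def cache(self, node, line, home):
        self.nodes[node].add(line)
        if node != home:
            self.mem_dir[line] = True  # remember the line escaped its home node

    def lookup(self, requester, line, home):
        """Return (scope_used, supplier). Node scope first; system scope
        only when the memory directory says the line may be remote."""
        if line in self.nodes[requester]:
            return "node", requester
        if self.mem_dir.get(line, False):     # line may be anywhere: broadcast
            for n, cached in enumerate(self.nodes):
                if line in cached:
                    return "system", n
        return "memory", home                 # clean in home memory: no broadcast

sys_ = TwoTierSnoop(n_nodes=4)
sys_.cache(node=2, line=0x80, home=0)        # line 0x80 cached away from home
print(sys_.lookup(requester=0, line=0x80, home=0))  # ('system', 2)
print(sys_.lookup(requester=0, line=0x40, home=0))  # ('memory', 0)
```

The payoff modeled here is the slide's design principle: lines that never leave their home node are served without a system-wide broadcast.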
  • Sustained system memory bandwidth running HPC loop
  • Prefetch
    • Core Centric Control
    • Design Details
      • 16 entry stream detection table
      • Differentiation of load, store, and load+store streams
      • Hardware/software synergy
        • Prefetch table exposed to software via enhanced dcbt instructions
        • Software controlled stream characteristics and initiation
      • Multi-line prefetch
        • Multi-cacheline bus memory prefetch operation
        • Power and resource efficient
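The stream-detection table above can be sketched as a minimal model: each entry tracks the next cache line a candidate stream is expected to touch, a sequential touch confirms the stream and triggers a prefetch, and the oldest entry is evicted when all 16 slots are full. This is a sketch of ascending streams only; the real POWER6 logic (descending streams, depth ramping, load/store typing, dcbt hints) is considerably richer.

```python
from collections import OrderedDict

LINE = 128          # POWER6 cache-line size in bytes
TABLE_SIZE = 16     # 16-entry stream detection table (per the slide)

class StreamPrefetcher:
    """Illustrative stream detector, not the actual POWER6 design."""
    def __init__(self):
        self.table = OrderedDict()  # next expected line -> confirmation count

    def access(self, addr):
        """Record a demand access; return addresses to prefetch."""
        line = addr // LINE
        if line in self.table:                 # stream confirmed: advance it
            hits = self.table.pop(line)
            self.table[line + 1] = hits + 1
            return [(line + 1) * LINE]         # prefetch the next line
        if len(self.table) >= TABLE_SIZE:      # table full: evict oldest entry
            self.table.popitem(last=False)
        self.table[line + 1] = 0               # guess an ascending stream
        return []

pf = StreamPrefetcher()
pf.access(0x0000)            # first touch: allocate an entry, no prefetch yet
print(pf.access(0x0080))     # sequential touch confirms the stream -> [256]
```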
    • POWER6 Systems
  • The New Power Server Family
    • BladeCenter JS12 (1 socket) and JS22 (2 sockets): 112 LP sockets/rack
    • Power 520: 2 processor sockets (20 sockets/rack)
    • Power 550: 4 processor sockets (40 sockets/rack)
    • Power 570: 2, 4, 6, 8 sockets; scalable size; high RAS (20 sockets/rack)
    • Power 595: 32 sockets; highest-bandwidth design; highest RAS
    • Power 575: 224 sockets (14 x 16-socket 2U servers); direct water cooling
  • The P6-p575, compared to the P5-p575+ for reference
    • P6-p575: 602 GFLOPs per node, 8.42 TFLOPs per rack
    • P5-p575+: 121.6 GFLOPs per node, 1.46 TFLOPs per rack
    • Less than 1/5th the space per compute op, in the same size rack as the P5-575+
    • ~1/3rd the energy per compute op
    • ~6x the density of P6 rack servers; ~3x the density of P6 blade servers
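The rack figures can be cross-checked: 8.42 TFLOPs at 602 GFLOPs per node implies 14 nodes per rack, and the per-rack compute ratio comes out near 5.8x, consistent with the "less than 1/5th the space per compute op" claim. The per-node/per-rack grouping is inferred from this arithmetic, not labeled on the slide.

```python
# Cross-check the P6-p575 vs. P5-p575+ figures from the slide.
p6_node_gflops = 602.0
p6_rack_tflops = 8.42
p5_rack_tflops = 1.46

nodes_p6 = p6_rack_tflops * 1000 / p6_node_gflops  # nodes per P6 rack
rack_ratio = p6_rack_tflops / p5_rack_tflops       # compute per rack, P6/P5
print(round(nodes_p6), round(rack_ratio, 1))       # 14 5.8
```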
  • A Greener Alternative to 1U/2U Servers for Technical or Commercial Compute Intense Tasks
    • Save:
      • Energy
      • Space
      • Management Effort
    • Replace:
      • 3 Racks of High Performance 1U Servers
      • 224 Ethernet Links and Switch ports
      • 112 Operating System Images
      • Associated on-Floor PDU Power capacity
      • Associated on-Floor CRAC Cooling capacity
    • With:
      • 1 p6-575 rack (completely factory pre-assembled & pre-tested)
      • 48 ports of 10Gb Ethernet
      • 14 Operating System Images
      • ~15% of the on-floor CRAC capacity the 1U servers require
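The replace/with lists above reduce to a few simple ratios; the two-links-per-server figure is an inference from 224 links across 112 servers, not stated on the slide.

```python
# Ratios behind the consolidation claim above.
servers_replaced = 112    # 1U servers across 3 racks
ethernet_links = 224      # Ethernet links and switch ports removed
os_images_before = 112
os_images_after = 14      # one OS image per p6-575 node
print(os_images_before // os_images_after)  # 8 -> "1/8th the OS images"
print(ethernet_links // servers_replaced)   # 2 links per replaced server
```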
  • Efficiency in Consolidation
    • P6-p575 Gains by Eliminating Components
      • The larger SMP of a P6-p575 is 8x larger than that of the typical 2-socket server
        • 1/8 th the I/O
        • 1/8 th the Ethernet switch ports & cables
        • 1Gb & 10Gb Ethernet are fully integrated into chipset
        • Less Memory is required due to 1/8 th the OS images
        • Shared memory usage across 8x the compute power yields more efficient processing (virtualization)
  • Energy Efficient 575 design
    • P6-p575 Reduces Power & Cooling Overhead through Advanced Design Techniques
      • Processors are directly water cooled, allowing them to run at under 50°C versus the typical 90°C
        • Leakage reduction
      • 2 Larger 3-phase electronically commutated fans are used in place of many small fans typically used
      • Efficient Voltage Regulators with lower “on-resistance” are used to convert power more efficiently
        • Regulators are precisely adjusted for each P6 chip
      • AC-to-DC Conversion is moved out of the server package
        • Logic is not heated by AC-to-DC conversion process
        • AC-to-DC Centralization
        • Centralized AC-to-DC enables better facility power efficiency
          • Ability to accept higher voltages (480V in North America)
          • True balanced 3-phase power input with active PFC
          • Fewer server power feeds to manage
      • Water Conditioning Units in the rack – with full redundancy
        • System accepts standard building chilled water – no preconditioning
  • NCAR bluefire 575 installation (76.4 TeraFLOPs)