WHAT IS BLUE GENE
• Blue Gene is an ambitious project to expand the
horizons of supercomputing, with the ultimate goal
of creating a system that can perform one quadrillion
calculations per second, or one *petaflop.
• A massively parallel supercomputer using thousands of
embedded PowerPC processors supporting a large
memory space.
CONTD…
• With standard compilers and a message-passing
environment.
• Blue Gene is an IBM project aimed at designing
supercomputers that can reach operating speeds in the
PFLOPS (petaFLOPS) range, with low power
consumption.
• Not to be confused with Deep Blue, the IBM chess computer that beat world champion Garry Kasparov in 1997.
*A petaflop is a measure of a computer's processing speed: a thousand trillion (10^15) floating point operations per second.
HISTORY
In December 1999, IBM announced a research initiative to build a massively
parallel computer for studying biomolecular phenomena such as protein
folding.
Major areas of investigation included:
• Using this novel platform to meet scientific goals
• Making massively parallel machines more usable
• Achieving performance targets at reasonable cost through a
novel machine architecture
RESULTS
• LINPACK results from the TOP500 list of supercomputers
BLUE GENE PROJECTS
• Four Blue Gene projects:
  • BlueGene/L
  • BlueGene/C
  • BlueGene/P
  • BlueGene/Q
BLUE GENE/L
• The first computer in the Blue Gene series.
• Designed to deliver the most performance per kilowatt of
power consumed.
• A 16-rack system, with each rack holding 1024
compute nodes, reached a LINPACK performance of 70.72
*TFLOPS.
• Theoretical peak performance of 360 TFLOPS for the full 65,536-node system.
*TFLOPS: one teraFLOPS (teraflop) is a processing speed of one trillion
floating point operations per second.
MAJOR FEATURES
• Trading processor speed for lower power
consumption.
• Dual processors per node with two working
modes: co-processor mode, where one processor
handles computation and the other handles
communication, and virtual-node mode, where both
processors run application code.
• System-on-a-chip design. All node components
are embedded on one chip, with the exception
of the 512 MB external DRAM.
BLUE GENE/L ARCHITECTURE
• Can be scaled up to 65,536 compute or I/O
nodes, with 131,072 processors
• Each node is a single ASIC with associated
DRAM memory chips
• Each ASIC has two 700 MHz IBM PowerPC
processors
• PowerPC processors
  • Low-frequency, low-power embedded processors
  that deliver better performance per watt than contemporary
  high-frequency, high-power microprocessors by a factor of 2 or more
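The node counts and clock speed above are enough for a back-of-the-envelope peak performance estimate. The sketch below is not an official IBM formula. It assumes each processor completes 4 floating point operations per cycle (two FPU pipelines, each doing a fused multiply-add); that figure is an assumption and is not stated on the slides. Under that assumption the full system lands close to the 360 TFLOPS peak quoted earlier, and the 16-rack LINPACK result can be compared with its own peak.

```python
# Rough peak-performance estimate for Blue Gene/L.
CLOCK_HZ = 700e6           # 700 MHz PowerPC processors (from the slides)
PROCESSORS_PER_NODE = 2    # dual processors per node (from the slides)
FLOPS_PER_CYCLE = 4        # ASSUMPTION: 2 FPU pipelines x fused multiply-add


def peak_tflops(nodes: int) -> float:
    """Theoretical peak, in TFLOPS, for a machine with `nodes` compute nodes."""
    flops = nodes * PROCESSORS_PER_NODE * CLOCK_HZ * FLOPS_PER_CYCLE
    return flops / 1e12


full_system = peak_tflops(65_536)        # largest configuration
sixteen_racks = peak_tflops(16 * 1024)   # 16 racks of 1024 nodes each
linpack_efficiency = 70.72 / sixteen_racks  # measured LINPACK / estimated peak

print(f"Full system peak: {full_system:.1f} TFLOPS")
print(f"16-rack peak:     {sixteen_racks:.1f} TFLOPS")
print(f"LINPACK efficiency on 16 racks: {linpack_efficiency:.0%}")
```

The gap between measured LINPACK and estimated peak is expected: peak assumes every FPU is busy on every cycle, which no real workload achieves.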
BLUE GENE/L ARCHITECTURE CONTD…
• Double-pipeline, double-precision Floating Point Unit
• A cache sub-system with built-in DRAM controller
• Node CPUs are not cache coherent with one another
• FPUs and CPUs are designed for low power consumption
BLUE GENE/L ARCHITECTURE CONTD…
[Figure: System Overview (label: 1024 nodes)]
BLUE GENE/L NETWORKS
• Each node is attached to 3 main parallel
communication networks
  • 3D torus network: point-to-point communication between compute
  nodes
  • Collective network: collective and global
  communication
  • Ethernet network: I/O and management (such as access
  to any node for configuration, booting and diagnostics)
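To make the 3D torus idea concrete, here is a minimal sketch of how nearest neighbors and hop counts work when every dimension wraps around. The function names and the 8 x 8 x 16 shape (1024 nodes) are hypothetical choices for illustration, not the actual Blue Gene/L wiring or routing logic.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six nearest neighbors of `node` in a 3D torus of shape `dims`."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            # The modulo makes each dimension a ring: stepping off one edge
            # re-enters at the opposite edge.
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimum hop count between two nodes, going the shorter way around each ring."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


DIMS = (8, 8, 16)  # hypothetical shape for a 1024-node partition
print(torus_neighbors((0, 0, 0), DIMS))
print(torus_hops((0, 0, 0), (7, 4, 15), DIMS))
```

Because of the wraparound links, a node on the "edge" of the machine has the same number of neighbors as any other node, and the worst-case distance along each dimension is half the ring length.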
BLUE GENE/L SYSTEM SOFTWARE
• System software supports efficient execution of parallel
applications
• Compiler support for *DFPU (C, C++, Fortran)
• Compute nodes use a minimal operating system called
the "BlueGene/L compute node kernel"
  • A lightweight, single-user operating system
  • Supports execution of a single dual-threaded application compute process
  • Kernel provides a single, static virtual address space to the running
  process
  • Because only one process runs, no context switching is required
*DFPU: Double Floating Point Unit
BLUE GENE/L SYSTEM SOFTWARE
CONTD…
• To allow multiple programs to run concurrently
  • The Blue Gene/L system can be partitioned into electrically isolated sets of
  nodes
  • The number of nodes in a partition must be a positive integer power of 2
• To run a program, a partition must first be reserved
  • No other program can use the partition until the current program has finished
• With so many nodes, component failures are inevitable. The system is
able to electrically isolate faulty hardware to allow the machine to
continue to run
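The partitioning rules can be illustrated with a toy model. This is only a sketch: the `Machine` class, its methods, and the contiguous, size-aligned allocation policy are hypothetical and do not describe the real Blue Gene/L control system. It shows the two rules from the slide: partition sizes must be powers of 2, and a reserved partition is exclusive until released.

```python
from typing import Dict


class PartitionError(Exception):
    """Raised when a partition request cannot be satisfied."""


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ... (this sketch counts 2**0 = 1 as valid)."""
    return n > 0 and (n & (n - 1)) == 0


class Machine:
    """Toy model: nodes are numbered 0..total-1 and reserved in aligned blocks."""

    def __init__(self, total_nodes: int) -> None:
        self.total_nodes = total_nodes
        self.reserved: Dict[str, range] = {}  # job name -> node ids it owns

    def reserve(self, job: str, size: int) -> range:
        if job in self.reserved:
            raise PartitionError(f"{job} already holds a partition")
        if not is_power_of_two(size):
            raise PartitionError(f"partition size {size} is not a power of 2")
        # Try each size-aligned block and take the first one that is free.
        for start in range(0, self.total_nodes - size + 1, size):
            block = range(start, start + size)
            if all(self._disjoint(block, other) for other in self.reserved.values()):
                self.reserved[job] = block
                return block
        raise PartitionError(f"no free partition of {size} nodes")

    def release(self, job: str) -> None:
        """Free the partition so another program can reserve those nodes."""
        del self.reserved[job]

    @staticmethod
    def _disjoint(a: range, b: range) -> bool:
        return a.stop <= b.start or b.stop <= a.start


machine = Machine(1024)
print(machine.reserve("job-a", 512))
print(machine.reserve("job-b", 256))
```

Two jobs run side by side here because each owns a disjoint set of nodes; a third request for 512 nodes would fail until one of them calls `release`.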
BLUE GENE/L SYSTEM SOFTWARE
CONTD…
• Parallel programming model
  • Message passing, supported through an
  implementation of MPI
• Only a subset of POSIX calls is supported
• Green threads are also used to simulate local
concurrency
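Green threads are threads scheduled in user space rather than by the operating system, which suits a kernel that does no context switching of its own. The sketch below illustrates the general idea with Python generators and a round-robin scheduler. It is an illustration of cooperative scheduling only, not the Blue Gene/L runtime; the `worker` and `run_round_robin` names are hypothetical.

```python
from collections import deque
from typing import Deque, Generator, List

GreenThread = Generator[None, None, None]


def worker(name: str, steps: int, log: List[str]) -> GreenThread:
    """A green thread: does one step of work, then yields control voluntarily."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # cooperative switch point; nothing preempts the thread


def run_round_robin(threads: List[GreenThread]) -> None:
    """Run all green threads on a single OS thread until every one has finished."""
    ready: Deque[GreenThread] = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)      # resume the thread until its next yield
        except StopIteration:
            continue          # thread finished; do not reschedule it
        ready.append(thread)  # still alive; goes to the back of the queue


log: List[str] = []
run_round_robin([worker("A", 2, log), worker("B", 3, log)])
print(log)
```

The two workers interleave even though only one of them runs at any moment, which is what "simulating local concurrency" means here.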
BLUE GENE/C
• Renamed to Cyclops64
• Massively parallel, supercomputer-on-a-chip
cellular architecture
• Cellular architecture gives the programmer the
ability to run large numbers of concurrent
threads within a single processor.
ARCHITECTURE OVERVIEW
• Each 64-bit Cyclops64 chip will run
at 500 megahertz and contain 80 processors.
• Each processor will have
two thread units and
a floating point unit.
• Five processors share a
32 kB instruction cache.
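The per-chip totals implied by these figures can be worked out directly. The short sketch below only does that arithmetic; the derived totals follow from the numbers on the slide and are not quoted from an IBM specification.

```python
# Figures taken from the slide
PROCESSORS_PER_CHIP = 80
THREAD_UNITS_PER_PROCESSOR = 2
FPUS_PER_PROCESSOR = 1
PROCESSORS_PER_ICACHE = 5
ICACHE_SIZE_KB = 32

# Derived per-chip totals
thread_units = PROCESSORS_PER_CHIP * THREAD_UNITS_PER_PROCESSOR
fpus = PROCESSORS_PER_CHIP * FPUS_PER_PROCESSOR
instruction_caches = PROCESSORS_PER_CHIP // PROCESSORS_PER_ICACHE
total_icache_kb = instruction_caches * ICACHE_SIZE_KB

print(f"Thread units per chip:       {thread_units}")
print(f"Floating point units:        {fpus}")
print(f"Instruction caches per chip: {instruction_caches}")
print(f"Total instruction cache:     {total_icache_kb} kB")
```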
BLUE GENE/P
• Architecturally similar to BlueGene/L.
• Expected to operate around one petaflop.
• Unveiled by IBM in June 2007.
• Unlike in BlueGene/L, the cores are cache coherent and the
chip can operate as a 4-way symmetric multiprocessor.
• The memory subsystem on the chip consists of
small private L2 caches, a central shared 8
MB L3 cache, and dual DDR2 memory
controllers.
BLUE GENE/Q
• Third and last known supercomputer design in
the Blue Gene series.
• Expected to reach 20 petaflops in 2012.
• Enhancement of the Blue Gene/L and Blue Gene/P
architectures.

Blue Gene: IBM's Supercomputer
