This presentation covers IBM's Blue Gene supercomputer project. Blue Gene is a massively parallel supercomputer that uses tens of thousands of embedded PowerPC processors and supports a large memory space. The name combines "Blue", IBM's corporate color, with "Gene", for the intended use in computational biology and protein folding. The project began in December 1999 as a US$100M effort to build a petaflop-scale supercomputer. There have been four Blue Gene projects; the first, Blue Gene/L, sustained over 280 TFLOPS on the Linpack benchmark and set performance records.
2. Content
What is Blue Gene
Why the name “Blue Gene”?
History
Results
Blue Gene Projects
References
3. What is Blue Gene
A massively parallel supercomputer using tens of
thousands of embedded PowerPC processors
Supports a large memory space
Uses standard compilers and a message-passing
environment
4. Why the name “Blue Gene”?
“Blue”: The corporate color of IBM
“Gene”: The intended use of the Blue Gene
clusters – Computational biology, specifically,
protein folding
5. History
Dec. 1999: IBM Research announced a US$100M effort to build
a petaflop-scale supercomputer.
Two goals of the Blue Gene project:
Massively parallel machine architecture and software
Biomolecular simulation: advance the state of the art by orders of magnitude
Nov. 2001: Partnership with Lawrence Livermore
National Laboratory (LLNL)
7. Blue Gene Projects
Four Blue Gene projects :
BlueGene/L
BlueGene/C
BlueGene/P
BlueGene/Q
8. Blue Gene/L
The first computer in the Blue Gene
series
On Sept. 29, 2004, IBM announced that a Blue
Gene/L prototype had become the fastest
computer in the world
Final configuration was launched in
October 2005
9. Blue Gene/L - Unsurpassed Performance
Designed to deliver the most performance per
kilowatt of power consumed
Theoretical peak performance of 360 TFLOPS
Final configuration (Oct. ‘05) scored over 280
TFLOPS sustained on the Linpack benchmark.
On Nov. 14, ‘06, at Supercomputing 2006, Blue
Gene/L won the prize in all HPC Challenge
award classes.
10. Blue Gene/L Architecture
Can be scaled up to 65,536 compute or I/O
nodes, with 131,072 processors
Each node is a single ASIC with associated
DRAM memory chips
Each ASIC has two 700 MHz IBM PowerPC
processors
PowerPC processors
Low-frequency, low-power embedded processors,
superior in performance per watt to today's high-frequency,
high-power microprocessors by a factor of 2 or more
11. Blue Gene/L Architecture contd…
Double-pipeline-double-precision Floating Point Unit
A cache sub-system with built-in DRAM controller
Node CPUs are not cache coherent with one another
FPUs and CPUs are designed for low power consumption
Using transistors with low leakage current
Local clock gating
Putting the FPU or CPU/FPU pair to sleep
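As a rough check, the theoretical peak quoted on the performance slide can be approximated from the processor count and clock rate given above. The sketch below assumes each processor completes 4 floating-point operations per cycle (two FPU pipelines, each performing a fused multiply-add); that factor is an assumption, not a number stated on these slides.

```python
# Back-of-the-envelope peak performance estimate for the full Blue Gene/L.
# PROCESSORS and CLOCK_HZ come from the architecture slide;
# FLOPS_PER_CYCLE is an assumption (2 pipelines x fused multiply-add).
PROCESSORS = 131_072
CLOCK_HZ = 700e6
FLOPS_PER_CYCLE = 4  # assumed, not stated in the slides


def peak_tflops(processors: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Return theoretical peak performance in TFLOPS."""
    return processors * clock_hz * flops_per_cycle / 1e12


peak = peak_tflops(PROCESSORS, CLOCK_HZ, FLOPS_PER_CYCLE)
linpack_sustained = 280.0  # TFLOPS, lower bound quoted for the Oct. '05 run

print(f"Estimated peak: {peak:.1f} TFLOPS")
print(f"Linpack efficiency (at least): {linpack_sustained / peak:.0%}")
```

Under that assumption the estimate lands within a few percent of the 360 TFLOPS figure, which suggests the quoted peak is a rounded value.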
13. Blue Gene/L Architecture contd…
1 rack holds 1,024 nodes, i.e. 2,048 processors
Nodes optimized for low power consumption
ASIC based on System-on-a-chip technology
Using large numbers of low-power system-on-a-chip nodes
allows it to outperform commodity clusters while saving on
power
Aggressive packaging of processors, memory and
interconnect
Power Efficient & Space Efficient
Allows for latencies and bandwidths that are significantly
better than those of nodes typically used in ASC-scale
supercomputers
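The rack and node figures on these slides fit together with simple arithmetic. A minimal sketch, using only the numbers quoted in the deck (1,024 nodes per rack, 2 processors per node, 65,536 nodes at full scale); the function name is illustrative:

```python
# Rack-level arithmetic from the figures quoted in the slides.
NODES_PER_RACK = 1024
PROCESSORS_PER_NODE = 2


def racks_needed(nodes: int) -> int:
    """Number of racks required to hold `nodes` nodes (rounded up)."""
    return -(-nodes // NODES_PER_RACK)  # ceiling division


full_system_nodes = 65_536
print("Racks:", racks_needed(full_system_nodes))
print("Processors:", full_system_nodes * PROCESSORS_PER_NODE)
```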
14. Blue Gene/L Networks
Each node is attached to three main parallel
communication networks
3D torus network: peer-to-peer communication between compute nodes
Collective network: collective and global communication
Ethernet network: I/O and management (such as access to
any node for configuration, booting and diagnostics)
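To illustrate what a 3D torus means for peer-to-peer traffic, the sketch below computes a node's six nearest neighbors with wraparound and the minimum hop count between two nodes. The torus dimensions used here (8 x 8 x 16) and the function names are hypothetical, chosen only for illustration; the slides do not give the actual machine dimensions.

```python
# Sketch of 3D torus addressing. DIMS is a hypothetical shape, not the
# real Blue Gene/L configuration.
DIMS = (8, 8, 16)


def torus_neighbors(coord, dims=DIMS):
    """Return the six neighbors of `coord`; each axis wraps around."""
    x, y, z = coord
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]


def torus_hops(a, b, dims=DIMS):
    """Minimum number of links between nodes a and b on the torus."""
    # On each axis, take the shorter way around the ring.
    return sum(min(abs(p - q), n - abs(p - q)) for p, q, n in zip(a, b, dims))


print(torus_neighbors((0, 0, 0)))
print(torus_hops((0, 0, 0), (7, 0, 0)))  # wraparound makes this one hop
```

Because of the wraparound links, the worst-case distance on each axis is half the axis length rather than the full length, which is the main advantage of a torus over a plain mesh.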
15. Blue Gene/L System Software
System software supports efficient execution of
parallel applications
Compiler support for DFPU (C, C++, Fortran)
Compute nodes use a minimal operating system
called “BlueGene/L compute node kernel”
A lightweight, single-user operating system
Supports execution of a single dual-threaded application
compute process
Kernel provides a single, static virtual address space to
the one running compute process
Because of its single-process nature, no context switching
is required
16. Blue Gene/L System Software contd…
To allow multiple programs to run concurrently, a
Blue Gene/L system can be partitioned into electronically
isolated sets of nodes
The number of nodes in a partition must be a positive integer
power of 2
To run a program, a partition is first reserved
No other program can use the partition until the current
program has finished
With so many nodes, component failures are inevitable. The
system is able to electrically isolate faulty hardware to allow
the machine to continue to run
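The partition-size rule above can be expressed as a short check. This is a minimal sketch that reads "positive integer power of 2" as 2^k with k >= 1; the function name is hypothetical, and any further minimum size the real scheduler may enforce is not modeled.

```python
# Sketch of the partition-size rule: the node count must be 2**k, k >= 1.
def is_valid_partition_size(n: int) -> bool:
    """True if n is a power of two with a positive exponent (2, 4, 8, ...)."""
    # n & (n - 1) clears the lowest set bit; the result is 0 only for
    # powers of two.
    return n >= 2 and (n & (n - 1)) == 0


# Valid sizes up to one rack of 1,024 nodes.
print([n for n in range(1, 1025) if is_valid_partition_size(n)])
```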
17. Blue Gene/L System Software contd…
Parallel Programming model
Message Passing – supported through an implementation of
MPI
Only a subset of POSIX calls is supported
Green threads are also used to simulate local concurrency