The Cell Processor: Computing of tomorrow or yesterday? Open Systems Design and Development 2007-04-12 | Heiko J Schick <schickhj@de.ibm.com> © 2007 IBM Corporation
Agenda Introduction Limiters to Processor Performance Cell Architecture Cell Platform Cell Applications Cell Programming Appendix
1 Introduction
Cell History IBM, SCEI / Sony and Toshiba Alliance formed in 2000  Design Center opened in March 2001  (Based in Austin, Texas) Single Cell BE operational Spring 2004  2-way SMP operational Summer 2004  February 7, 2005: First technical disclosures  November 9, 2005: Open Source SDK Published
The problem is… … the view from the computer room!
Outlook Source: Kurzweil. “Computer performance has been increasing exponentially for 100 years!”
But what could you do if all  objects  were   intelligent… … and connected?
What could you do with  unlimited computing power…   for pennies? Could you predict the path of a storm  down to the square kilometer? Could you identify another 20% of proven oil reserves without drilling one hole?
2 Limiters to Processor  Performance
Power Wall / Voltage Wall Power components: Active Power Passive Power Gate leakage Sub-threshold leakage (source-drain leakage) Source: Tom’s Hardware Guide 1
Memory Wall Main memory now nearly 1000 cycles from the processor  Situation worse with (on-chip) SMP  Memory latency penalties drive inefficiency in the design  Expensive and sophisticated hardware to try and deal with it  Programmers that try to gain control of cache content are hindered by the hardware mechanisms  Latency induced bandwidth limitations  Much of the bandwidth to memory in systems can only be used speculatively 2
Frequency Wall Increasing frequencies and deeper pipelines have reached diminishing returns on performance  Returns are negative if power is taken into account  Results of studies depend on the issue width of the processor  The wider the processor, the slower it wants to be  Simultaneous Multithreading helps to use issue slots efficiently  Results depend on the number of architected registers and the workload  More registers tolerate a deeper pipeline  Fewer random branches in the application tolerate deeper pipelines 3
Microprocessor Efficiency Gelsinger’s law  1.4x more performance for 2x more   Hofstee’s corollary   1/1.4x efficiency loss in every generation  Examples: Cache size, Out-of-Order, Super-scalar, etc. Source: Tom’s Hardware Guide Increasing performance requires increasing efficiency !!!
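Taken at face value (the slide leaves the "2x more" resource unnamed; transistors or power are the usual readings), this means per-resource efficiency shrinks by a factor of 1.4 / 2 = 0.7 each generation, so after five such generations only about 0.7^5 ≈ 17% of the original efficiency remains. That is the arithmetic behind the slide's conclusion: further performance gains must come from better efficiency, not from spending ever more transistors and power on caches, out-of-order execution and wider superscalar issue.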
Attacking the Performance Walls Multi-Core Non-Homogeneous Architecture  Control Plane vs. Data Plane processors  Attacks Power Wall  3-level Model of Memory  Main Memory, Local Store, Registers  Attacks Memory Wall  Large Shared Register File & SW Controlled Branching  Allows deeper pipelines (11FO4 helps power)  Attacks Frequency Wall
3 Cell Architecture
Cell BE Processor ~250M transistors ~235mm² Top frequency >3GHz 9 cores, 10 threads 200+ GFlops (SP) @3.2 GHz 20+ GFlops (DP) @3.2 GHz Up to 25.6GB/s memory B/W Up to 76.8GB/s I/O B/W ~$400M (US) design investment
Key Attributes of Cell Cell is Multi-Core  Contains 64-bit Power Architecture TM  Contains 8 Synergistic Processor Elements (SPE)  Cell is a Flexible Architecture  Multi-OS support (including Linux) with Virtualization technology  Path for OS, legacy apps, and software development  Cell is a Broadband Architecture  SPE is RISC architecture with SIMD organization and Local Store  128+ concurrent transactions to memory per processor  Cell is a Real-Time Architecture  Resource allocation (for Bandwidth Measurement)  Locking Caches (via Replacement Management Tables)  Cell is a Security Enabled Architecture  SPE dynamically reconfigurable as secure processors
Power Processor Element (PPE) 64-bit Power Architecture™ with VMX In-order, 2-way hardware Multi-threading Coherent Load/Store with 32KB I & D L1 and 512KB L2 Controls the SPEs
Synergistic Processor Elements (SPEs) SPE provides computational performance Dual issue, up to 16-way 128-bit SIMD Dedicated resources: 128-entry 128-bit register file, 256KB Local Store Each can be dynamically configured to protect resources Dedicated DMA engine: Up to 16 outstanding requests Memory Flow Controller for DMA 25 GB/s DMA data transfer “I/O Channels” for IPC Separate cores Simple implementation (e.g. no branch prediction) No caches No protected instructions
SPE Block Diagram: Permute Unit, Load-Store Unit, Floating-Point Unit, Fixed-Point Unit, Branch Unit and Channel Unit with result forwarding and staging; Register File; Local Store (256kB single-port SRAM, 128B read / 128B write); DMA Unit; Instruction Issue Unit / Instruction Line Buffer. Internal data paths of 8, 16, 64 and 128 Bytes/cycle; connected to the on-chip coherent bus.
Element Interconnect Bus Four 16 byte data rings, supporting multiple transfers 96B/cycle peak bandwidth Over 100 outstanding requests 300+ GByte/sec @ 3.2 GHz Element Interconnect Bus (EIB)
Element Interconnect Bus (EIB) Four 16B data rings connecting 12 bus elements (SPE0–SPE7, PPE, MIC, BIF/IOIF0, IOIF1)  Two clockwise / two counter-clockwise  Physically overlaps all processor elements  Central arbiter supports up to three concurrent transfers per data ring  Two-stage, dual round-robin arbiter  Each element port simultaneously supports a 16B-in and 16B-out data path  Ring topology is transparent to the element data interface
Example of eight concurrent transactions on the EIB: the central data arbiter controls four rings (Ring0–Ring3); ramp controllers (Ramp 0–Ramp 11) attach MIC, PPE, SPE0–SPE7 and the BIF/IOIF0 and IOIF1 interfaces to the rings.
I/O and Memory Interfaces provide wide bandwidth: Dual XDR™ controller (25.6GB/s @ 3.2Gbps)  Two configurable interfaces (76.8GB/s @ 6.4Gbps)  Configurable number of bytes  Coherent or I/O protection  Allows for multiple system configurations
4 Cell Platform
Game console systems, blades, HDTV, home media servers, supercomputers… ? The Cell processor can support many systems. Configuration diagram: a single Cell BE processor with dual XDR™ memory and IOIF0/IOIF1; two Cell BE processors coupled over the BIF, each with dual XDR™ memory and an IOIF; and four Cell BE processors, each with dual XDR™ memory and an IOIF, connected over the BIF through a switch (SW).
QS20 Hardware Description
Chassis: standard IBM BladeCenter with 7 blades (2 slots each) at full performance, 2 switches (1Gb Ethernet) with 4 external ports each, updated Management Module firmware, and external InfiniBand switches with optional FC ports.
Blade (400 GFLOPs): game processor and support logic in a dual-processor configuration, single SMP OS image, 1GB XDRAM, optionally a PCI Express attached standard graphics adapter.
BladeCenter interface (based on IBM JS20): new blade power system and sense logic control, firmware to connect processor & support logic to the H8 service processor, signal level converters for processor & support logic, 2 InfiniBand (IB) host adapters with 2x IB 4X each, physical link drivers (GbE Phy etc.), chassis connection 2x (+12V, RS-485, USB, GbEn).
Block diagram: two Cell BE processors, each with Rambus XDR DRAM (1/2GB) and a South Bridge, IB 4X links, GbE Phy, H8 service processor, blade input power & sense, and level converters on the BladeCenter interface.
QS20 Blade (w/o heatsinks)
QS20 Blade Assembly ATA Disk Service Proc.  South Bridges InfiniBand Cards Blade Bezel
Design Options - InfiniBand: Up to 2 InfiniBand cards can be attached. Standard PC InfiniBand card with special bezel (MHEA28-1TCSB dual-port HCA): PCI Express x8 interface, dual 10 Gb/s InfiniBand 4X ports, 128 MB local memory, IBTA v1.1 compatible.
Cell Software Stack (bottom to top): Cell Broadband Engine hardware; firmware (SLOF, low-level FW, RTAS, secondary boot loader); Linux kernel with powerpc architecture-dependent code (pSeries, PMac and cell platforms), powerpc- and cell-specific code, and common code (memory management, scheduler, device drivers); user space with glibc and gcc (ppc64 and spu backends); applications on top.
Cell BE Development Platform
Developer workstation stack: Cell BE hardware with graphics and standard devices; Cell BE firmware; Cell Linux kernel; lower-level programming interface; basic Cell runtime (lib_spe, spelibc, …) and basic Cell toolchain (gcc, binutils, gdb, oprofile, …) for Cell enablement; Cell-optimized libraries, Cell-specialized compilers, higher-level programming interface and Cell-aware tooling for Cell exploitation; application-level programming interface and (segment-specific) application frameworks, all on a standard ppc64 Linux development environment.
Cell is an exotic platform and hard to program. Exploiting the SPEs is challenging: limited local memory (256 KB) means data and code fragments must be DMAed back and forth; multi-level parallelism (8 SPEs, 128-bit wide SIMD units in each SPE). If done right, the result is impressive performance…
Make Cell easier to program: hide complexity in critical libraries; compiler support for standard tasks, e.g. overlays, global data access, SW-managed cache, auto-vectorization, auto-parallelization, …; smart tooling.
Make Cell a standard platform: middleware and frameworks provide architecture-specific components and hide Cell specifics from the application developer.
SDK 1.0 (11/2005): alpha-quality SDK hosted on FC4 / x86: initial Linux Cell 2.6.14 patches, SPE Threads runtime, XLC Cell C compiler, SPE gdb debugger, Cell coding sample source, documentation, installation scripts, Cell hardware specs, programming docs; GCC tools from SCEA (gcc 3.0 for Cell, binutils for Cell). Execution platform: Cell Simulator. Hosting platform: Linux/x86 (FC4).
SDK 1.0.1 (2/2006, refresh): execution platform: Cell Simulator, Cell Blade 1 rev 2. Hosting platform: Linux/x86 (FC4), Linux/Cell (FC4)*, Linux/Power (FC4)*.
SDK 1.1 (7/2006): alpha-quality SDK hosted on FC5 / x86: critical Linux Cell performance enhancements, Cell enhanced functions, critical Cell RAS functions (machine check, system error), performance analysis tools, Oprofile (PPU cycle-only profiling, no SPU), GNU toolchain updates, Mambo updates, Julia set sample. Execution platform: Cell Simulator, Cell Blade 1 rev 3. Hosting platform: Linux/x86 (FC5), Linux/Cell (FC5)*, Linux/Power (FC5)*.
SDK 1.1.1 (9/2006, refresh): documentation, Mambo updates for CB1 and 64-bit hosting, ISO image update. Same execution and hosting platforms as SDK 1.1.
SDK 2.0 (12/2006): XL C/C++ (Linux/x86, LoP) with overlay prototype and auto-SIMD enhancements; Linux kernel updates (performance enhancements, RAS/debug support, SPE runtime extensions, interrupt controller enhancements); GNU toolchain updates (FSF integration, GDB multi-thread support, newlib library optimization, programming model support for overlay); programming model preview (overlay support, Accelerated Libraries Framework); library enhancements (Vector Math Library phase 1, MASS library for PPU, MASSV library for PPU/SPU); IDE (tool integration, remote tool support); performance analysis (visualization tools; bandwidth, latency and lock analyzers; performance debug tools; Oprofile as in SDK 1.1 plus PPU event-based profiling; Mambo performance model correlation and visualization).
* Subset of tools
Cell library content (source), ~156k LOC: Standard SPE C library subset - optimized SPE C functions including stdlib, the C library, math, etc.; audio resample - resampling audio signals; FFT - 1D and 2D FFT functions; gmath - math functions optimized for gaming environments; image - convolution functions; intrinsics - generic intrinsic conversion functions; large-matrix - functions performing large matrix operations; matrix - basic matrix operations; mpm - multi-precision math functions; noise - noise generation functions; oscillator - basic sound generation functions; sim - I/O channels to simulated environments; surface - a set of Bézier curve and surface functions; sync - synchronization library; vector - vector operation functions. http://www.alphaworks.ibm.com/tech/cellsw
5 Cell Applications
Peak GFLOPs comparison (chart): Freescale DC 1.5 GHz, PPC 970 2.2 GHz, AMD DC 2.2 GHz, Intel SC 3.6 GHz, Cell 3.0 GHz
Cell Processor Example Application Areas Cell is a processor that excels at processing of rich media content in the context of broad connectivity Digital content creation (games and movies)  Game playing and game serving  Distribution of (dynamic, media rich) content  Imaging and image processing  Image analysis (e.g. video surveillance)  Next-generation physics-based visualization  Video conferencing (3D?)  Streaming applications (codecs etc.)  Physical simulation & science
Opportunities for the Cell BE Blade  Aerospace & Defense: signal & image processing, security, surveillance, simulation & training, …  Petroleum industry: seismic computing, reservoir modeling, …  Communications equipment: LAN/MAN routers, access, converged networks, security, …  Medical imaging: CT scan, ultrasound, …  Consumer / digital media: digital content creation, media platform, video surveillance, …  Public sector / government & higher education: signal & image processing, computational chemistry, …  Finance: trade modeling  Industrial: semiconductor / LCD, video conferencing
Since 2000, Folding@Home (FAH) has led to a major jump in the capabilities of molecular simulation of protein folding and related diseases, including Alzheimer's Disease, Huntington's Disease, and certain forms of cancer. By joining together hundreds of thousands of PCs throughout the world, calculations which were previously considered impossible have now become routine. Folding@Home utilizes the new Cell processor in Sony's PLAYSTATION 3 (PS3) to achieve performance previously only possible on supercomputers. 14,000 PlayStation 3s are literally outperforming 159,000 Windows computers by more than double! In fact, they outperform all the other clients combined. http://folding.stanford.edu/FAQ-PS3.html Dr. V. S. Pande, folding@home, Distributed Computing Project, Stanford University
Multigrid Finite Element Solver on Cell, ported by www.digitalmedics.de and ls7-www.cs.uni-dortmund.de using the free SDK. 235,584 tetrahedra, 48,000 nodes, 28 iterations in the NKMG solver in 3.8 seconds. Sustained performance for large objects: 52 GFLOP/s.
Computational Fluid Dynamics Solver on Cell, ported by www.digitalmedics.de and ls7-www.cs.uni-dortmund.de using the free SDK. Sustained performance for large objects: not yet benchmarked (3/2007).
Computational Fluid Dynamics Solver on Cell: a Lattice-Boltzmann solver developed by Fraunhofer ITWM http://www.itwm.fraunhofer.de/
Terrain Rendering Engine (TRE) and IBM Blades, Systems and Technology Group: a commodity Cell BE blade (QS20 in a BladeCenter-1 chassis) combines aircraft and field data and renders it, adding live video, aerial information and combat situational awareness for the next-generation GCS.
Example: Medical Computer Tomography (CT) scans. Image the whole heart in 1 rotation; 4D CT includes time. Current CT products: 2, 4, 8, 16, 32, 64 slices. Future CT products: 128, 256 slices.
“Image Registration” using Cell: the moving image is aligned to the fixed image as the registration process proceeds (fixed image + moving image -> registration process).
6 Cell Programming
Small single-SPE models - a sample

#include <stdio.h>    /* for printf */

/* spe_foo.c:
 * A C program to be compiled into an executable called "spe_foo"
 */
int main(int speid, addr64 argp, addr64 envp)
{
    char i;

    /* do something intelligent here */
    i = func_foo(argp);

    /* when the syscall is supported */
    printf("Hello world! my result is %d \n", i);

    return i;
}
Small single-SPE models - PPE controlling program

extern spe_program_handle_t spe_foo;    /* the SPE image handle from CESOF */

int main()
{
    int rc, status;
    speid_t spe_id;

    /* load & start the spe_foo program on an allocated SPE */
    spe_id = spe_create_thread(0, &spe_foo, 0, NULL, -1, 0);

    /* wait for the SPE program to complete and return the final status */
    rc = spe_wait(spe_id, &status, 0);

    return status;
}
Using SPEs
(1) Simple Function Offload: Remote Procedure Call style; the SPE working set fits in the Local Store; the PPE initiates DMA data/code transfers. Could easily be supported by a programming environment, e.g. an RPC-style IDL compiler, compiler directives (pragmas), libraries, or even automatic scheduling of code/data to SPEs. (Diagram: the PPE puts text, static data and parameters into the SPE Local Store; the SPE executes and puts results back.)
(2) Typical (Complex) Function Offload: the SPE working set is larger than the Local Store; the PPE initially loads the SPE LS with small startup code; the SPE initiates DMAs (code/data staging) to stream data through code or stream code through data. Latency hiding is required in most cases and "high locality of reference" characteristics are needed. Can be extended to a “services offload model”. (Diagram: the PPE puts initial text, static data and parameters; the SPE independently stages text & intermediate data transfers from system memory while executing and puts results back.)
Using SPEs
(3) Pipelining for complex functions: functions are split up into processing stages; direct LS-to-LS communication is possible, including LS-to-LS DMA; avoids PPE / system memory bottlenecks. (Diagram: a multi-stage pipeline of SPUs, each with Local Store and MFC, fed by the PPE from system memory.)
(4) Parallel stages for very compute-intense functions: the PPE partitions and distributes work to multiple SPEs. (Diagram: parallel stages of SPUs, each with Local Store and MFC.)
Large single-SPE programming models: the data or code working set cannot fit completely into a local store. The PPE controlling process, kernel and libspe runtime set up the system memory mapping as the SPE's secondary memory store. The SPE program accesses the secondary memory store via its software-controlled SPE DMA engine, the Memory Flow Controller (MFC). (Diagram: the PPE controller maps system memory for SPE DMA transactions between system memory and the Local Store.)
Large single-SPE programming models - I/O data: system memory holds the large input/output data, e.g. in a streaming model. (Diagram: system memory holds int g_ip[512*1024] and int g_op[512*1024]; the local store holds int ip[32] and int op[32]; the SPE program computes op = func(ip), DMAing each input chunk in and each output chunk out.)
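To make the streaming model concrete, here is a minimal SPE-side sketch using the MFC intrinsics from spu_mfcio.h. The chunk size, the func() body, and passing the g_ip/g_op effective addresses in via argp and envp are illustrative assumptions, not the slide's actual code.

#include <spu_mfcio.h>

#define CHUNK 32                      /* ints per DMA chunk, matching the slide's ip[32]/op[32] */

volatile int ip[CHUNK] __attribute__((aligned(128)));    /* local-store input buffer  */
volatile int op[CHUNK] __attribute__((aligned(128)));    /* local-store output buffer */

/* stand-in for the real per-chunk computation */
static void func(volatile int *out, volatile int *in, int n)
{
    int i;
    for (i = 0; i < n; i++)
        out[i] = 2 * in[i];
}

int main(unsigned long long speid, unsigned long long argp, unsigned long long envp)
{
    unsigned long long g_ip = argp;   /* EA of g_ip[] in system memory (assumed passed via argp) */
    unsigned long long g_op = envp;   /* EA of g_op[] in system memory (assumed passed via envp) */
    unsigned int i, tag = 1;
    unsigned int chunks = (512 * 1024) / CHUNK;

    mfc_write_tag_mask(1 << tag);
    for (i = 0; i < chunks; i++) {
        /* DMA one input chunk from system memory into the local store and wait for it */
        mfc_get(ip, g_ip + (unsigned long long)i * sizeof(ip), sizeof(ip), tag, 0, 0);
        mfc_read_tag_status_all();

        func(op, ip, CHUNK);          /* op = func(ip) */

        /* DMA the result chunk back out to system memory and wait for completion */
        mfc_put(op, g_op + (unsigned long long)i * sizeof(op), sizeof(op), tag, 0, 0);
        mfc_read_tag_status_all();
    }
    return 0;
}

This synchronous version stalls on every transfer; the double-buffering sketch further below overlaps the DMAs with computation.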
Large single-SPE programming models: system memory as secondary memory store. Either manual management of data buffers, or an automatic software-managed data cache (software cache framework libraries, compiler runtime support). (Diagram: global objects in system memory are mirrored as SW cache entries in the local store next to the SPE program.)
Large single-SPE programming models: system memory as secondary memory store. Either manual loading of plug-ins into a code buffer (plug-in framework libraries), or automatic software-managed code overlay (compiler-generated overlay code). (Diagram: SPE plug-ins a–f reside in system memory; a subset, e.g. plug-ins a, b and e, is loaded into the local store at a time.)
Large single-SPE programming models - Job Queue: code and data are packaged together as inputs to an SPE kernel program; a multi-tasking model (more discussion later). (Diagram: a job queue in system memory holds code/data n, n+1, n+2, …; the SPE kernel DMAs code n and data n into the local store.)
Large single-SPE programming models - DMA: DMA latency handling is critical to overall performance for SPE programs moving large data or code. Data pre-fetching is a key technique to hide DMA latency, e.g. double buffering. (Timeline: while the SPE executes Func(input n) out of input buffer 1, the DMA engine fetches input n+1 into input buffer 2 and writes output n-1 from output buffer 2; the buffers then swap for the next iteration.)
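A hedged sketch of that double-buffering pattern, again with the spu_mfcio.h intrinsics: two buffers and two DMA tag groups let the fetch of chunk n+1 and the write-back of chunk n-1 overlap the computation on chunk n. The buffer size, the compute() body, and the effective-address layout are assumptions for illustration.

#include <spu_mfcio.h>

#define N 4096                                         /* bytes per chunk (illustrative) */

volatile char in_buf[2][N]  __attribute__((aligned(128)));
volatile char out_buf[2][N] __attribute__((aligned(128)));

/* stand-in for the real per-chunk work */
static void compute(volatile char *dst, volatile char *src, int n)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = src[i] ^ 0xff;
}

/* wait until all DMA commands issued with this tag have completed */
static void wait_tag(unsigned int tag)
{
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();
}

void process(unsigned long long ea_in, unsigned long long ea_out, int nchunks)
{
    int i, cur = 0, nxt = 1;

    /* prime the pipeline: start fetching chunk 0 */
    mfc_get(in_buf[cur], ea_in, N, cur, 0, 0);

    for (i = 0; i < nchunks; i++) {
        /* start fetching the next chunk before blocking on the current one */
        if (i + 1 < nchunks)
            mfc_get(in_buf[nxt], ea_in + (unsigned long long)(i + 1) * N, N, nxt, 0, 0);

        wait_tag(cur);                                 /* current input (and the put issued two
                                                          iterations ago on this tag) are done */
        compute(out_buf[cur], in_buf[cur], N);

        /* write the result back; completion is checked the next time this tag is waited on */
        mfc_put(out_buf[cur], ea_out + (unsigned long long)i * N, N, cur, 0, 0);

        cur ^= 1;                                      /* swap buffers and tags */
        nxt ^= 1;
    }
    wait_tag(0);                                       /* drain any outstanding transfers */
    wait_tag(1);
}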
Large single-SPE programming models - CESOF: the Cell Embedded SPE Object Format (CESOF) and the PPE/SPE toolchains support the resolution of SPE references to global system memory objects in the effective-address space. (Diagram: an _EAR_g_foo structure in the local store is resolved by CESOF EAR symbol resolution to char g_foo[512] in the effective address space; DMA transactions copy it to char local_foo[512] in the local store.)
Parallel programming models - Job Queue: a large set of jobs is fed through a group of SPE programs. Streaming is a special case of the job queue with regular and sequential data. Each SPE program locks the shared job queue to obtain the next job. For uneven jobs, workloads are self-balanced among the available SPEs. (Diagram: the PPE and the SPE0–SPE7 kernels pull inputs I0…In from and write outputs O0…On to queues in system memory.)
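A sketch of the SPE-side worker loop for this job-queue model. The atomic "take the next job" step is reduced to a hypothetical helper, atomic_take_job(), standing in for whatever primitive the application actually uses (e.g. the SDK sync library or MFC lock-line commands); the job descriptor layout and the kernel() body are likewise illustrative.

#include <spu_mfcio.h>

struct job {                          /* illustrative job descriptor, padded to 32 bytes for DMA */
    unsigned long long ea_in;         /* EA of this job's input data                             */
    unsigned long long ea_out;        /* EA of this job's output buffer                          */
    unsigned int       size;          /* bytes to process (assumed DMA-legal: multiple of 16, <= 16KB) */
    unsigned int       pad[3];
};

/* Hypothetical helper: atomically pops the next job index from the shared queue in
 * system memory and returns it, or -1 when the queue is empty.                      */
extern int atomic_take_job(unsigned long long ea_queue);

/* stand-in for the real per-job work, done in place in the local-store buffer */
static void kernel(volatile char *buf, unsigned int size)
{
    unsigned int i;
    for (i = 0; i < size; i++)
        buf[i] += 1;
}

volatile struct job j __attribute__((aligned(16)));          /* job array in system memory
                                                                assumed 16-byte aligned      */
volatile char buf[16 * 1024] __attribute__((aligned(128)));

void worker(unsigned long long ea_queue, unsigned long long ea_jobs)
{
    int idx;
    unsigned int tag = 0;

    mfc_write_tag_mask(1 << tag);
    while ((idx = atomic_take_job(ea_queue)) >= 0) {
        /* fetch the job descriptor, then its input data */
        mfc_get(&j, ea_jobs + (unsigned long long)idx * sizeof(struct job),
                sizeof(struct job), tag, 0, 0);
        mfc_read_tag_status_all();

        mfc_get(buf, j.ea_in, j.size, tag, 0, 0);
        mfc_read_tag_status_all();

        kernel(buf, j.size);

        mfc_put(buf, j.ea_out, j.size, tag, 0, 0);
        mfc_read_tag_status_all();
    }
}

Because every SPE runs the same loop and only takes work when it is free, long and short jobs balance themselves across the available SPEs, as the slide notes.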
Parallel programming models - Pipeline / Streaming: uses LS-to-LS DMA bandwidth rather than system memory bandwidth; flexibility in connecting pipeline functions; larger collective code size per pipeline; load balancing is harder. (Diagram: SPE0 Kernel0() through SPE7 Kernel7() are chained by LS-to-LS DMA; only the first and last stages touch the input/output queues in system memory.)
Multi-tasking SPEs - LS-resident multi-tasking: the simplest multi-tasking programming model; no memory protection among tasks; co-operative, non-preemptive, event-driven scheduling. (Diagram: tasks a–d and x reside in the local store of SPE n together with an event dispatcher that pulls task ids from an event queue.)
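A minimal sketch of such a co-operative, event-driven dispatcher in plain C. The task bodies, the event-queue layout, and how the queue gets filled (PPE mailbox, DMA'd descriptors, ...) are all assumptions for illustration; the essential properties are that everything lives in the local store and that each task runs to completion before the next event is dispatched.

#define N_TASKS   5
#define QUEUE_LEN 16

typedef void (*task_fn)(void);

/* stand-ins for the LS-resident task bodies */
static void task_a(void) { /* ... */ }
static void task_b(void) { /* ... */ }
static void task_c(void) { /* ... */ }
static void task_d(void) { /* ... */ }
static void task_x(void) { /* ... */ }

static task_fn tasks[N_TASKS] = { task_a, task_b, task_c, task_d, task_x };

/* event queue: each entry names the task to run next; filled asynchronously,
 * e.g. by a mailbox handler or by descriptors DMAed in from the PPE          */
static volatile unsigned char event_queue[QUEUE_LEN];
static volatile int head, tail;

void dispatcher(void)
{
    unsigned char id;

    for (;;) {
        while (head == tail)
            ;                                  /* idle until an event arrives */
        id = event_queue[head];
        head = (head + 1) % QUEUE_LEN;
        tasks[id]();                           /* run the task to completion: no preemption,
                                                  no memory protection between tasks        */
    }
}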
Multi-tasking SPEs - Self-managed multi-tasking: non-LS-resident; a blocked job context is swapped out of the LS and scheduled back to the job queue later, once unblocked. (Diagram: tasks n, n+1, n+2, … wait in a job/task queue in system memory; the SPE kernel holds code n and data n in the local store and swaps blocked task n' back out.)
libspe sample code

#include <libspe.h>

int main(int argc, char *argv[], char *envp[])
{
    spe_program_handle_t *binary;
    speid_t spe_thread;
    int status;

    binary = spe_open_image(argv[1]);
    if (!binary)
        return 1;

    spe_thread = spe_create_thread(0, binary, argv + 1, envp, -1, 0);
    if (!spe_thread)
        return 2;

    spe_wait(spe_thread, &status, 0);
    spe_close_image(binary);

    return status;
}
Linux on Cell/B.E. kernel components Platform abstraction arch/powerpc/platforms/{cell,ps3,beat} Integrated Interrupt Handling I/O Memory Management Unit Power Management Hypervisor abstractions South Bridge drivers SPU file system
SPU file system: a virtual file system mounted at /spu holds SPU contexts as directories; files are the primary user interfaces. New system calls: spu_create and spu_run. SPU contexts are abstracted from real SPUs. Preemptive context switching (work in progress).
PPE on Cell is a 100% compliant ppc64! A solid base… Everything in a distribution, all middleware runs out of the box All tools available BUT: not optimized to exploit Cell Toolchain needs to cover Cell aspects Optimized, critical “middleware” for Cell needed Depending on workload requirements
Using SPEs: Task Based Abstraction    APIs provided by user space libraries SPE programs controlled via PPE-originated thread function calls spe_create_thread(), ... Calls on PPE and SPE Mailboxes DMA Events Simple runtime support (local store heap management, etc.) Lots of library extensions Encryption, signal processing, math operations
spu_create int spu_create(const char *pathname, int flags, mode_t mode); creates a new context at pathname returns an open file descriptor the context gets destroyed when the fd is closed
spu_run uint32_t spu_run(int fd, uint32_t *npc, uint32_t *status); transfers flow of control to the SPU context fd returns when the context has stopped for some reason, e.g. exit, forceful abort, or a callback from the SPU to the PPU can be interrupted by signals
PPE programming interfaces Asynchronous SPE thread API (“libspe 1.x”) spe_create_thread spe_wait spe_kill . . .
spe_create_thread implementation Allocate a virtual SPE (spu_create) Load the SPE application code into the context Start a PPE thread using pthread_create The new thread calls spu_run
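A conceptual sketch of those four steps. It is not libspe's actual implementation: the spufs syscalls have no glibc wrappers, so they are invoked through syscall() (assuming a powerpc kernel that defines __NR_spu_create and __NR_spu_run), and loading the SPE ELF image into the context is reduced to a hypothetical load_spe_image() helper.

#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <asm/unistd.h>        /* __NR_spu_create / __NR_spu_run, powerpc only (assumption) */

struct spe_ctx {
    int fd;                    /* open fd of the spufs context directory         */
    unsigned int npc;          /* next program counter: entry point of the image */
};

/* Hypothetical helper: writes the SPE ELF image into the context's files
 * (local store, registers, ...) and returns the entry point.              */
extern unsigned int load_spe_image(int ctx_fd, const void *spe_elf_image);

static void *spe_runner(void *arg)
{
    struct spe_ctx *ctx = arg;
    unsigned int status = 0;

    /* spu_run blocks this PPE thread until the SPU context stops */
    syscall(__NR_spu_run, ctx->fd, &ctx->npc, &status);
    return (void *)(long)status;
}

int my_spe_create_thread(const char *ctx_path, const void *image,
                         pthread_t *thread, struct spe_ctx *ctx)
{
    /* 1. allocate a virtual SPE by creating a spufs context */
    ctx->fd = syscall(__NR_spu_create, ctx_path, 0, 0755);
    if (ctx->fd < 0)
        return -1;

    /* 2. load the SPE application code into the context */
    ctx->npc = load_spe_image(ctx->fd, image);

    /* 3. + 4. start a PPE thread whose body calls spu_run */
    return pthread_create(thread, NULL, spe_runner, ctx);
}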
More libspe interfaces Event notification: int spe_get_event(struct spe_event *, int nevents, int timeout); Message passing: spe_read_out_mbox(speid_t speid); spe_write_in_mbox(speid_t speid); spe_write_signal(speid_t speid, unsigned reg, unsigned data); Local store access: void *spe_get_ls(speid_t speid);
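A small hedged example of the mailbox calls on the PPE side of the libspe 1.x API. It assumes a CESOF-embedded SPE program (handle hello_spu) that reads one value from its inbound mailbox, processes it, and writes a reply to its outbound mailbox; spe_stat_out_mbox() is used here to poll for that reply.

#include <libspe.h>
#include <stdio.h>

extern spe_program_handle_t hello_spu;     /* assumed CESOF-embedded SPE image */

int main(void)
{
    speid_t spe;
    int status;

    spe = spe_create_thread(0, &hello_spu, NULL, NULL, -1, 0);
    if (!spe)
        return 1;

    /* pass a parameter to the SPE through its inbound mailbox */
    spe_write_in_mbox(spe, 42);

    /* poll until the SPE has written its reply to the outbound mailbox */
    while (spe_stat_out_mbox(spe) <= 0)
        ;
    printf("SPE answered: %u\n", spe_read_out_mbox(spe));

    spe_wait(spe, &status, 0);
    return status;
}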
GNU tool chain PPE support Just another PowerPC variant. . . SPE support Just another embedded processor. . . Cell/B.E. support More than just PPE + SPE!
Object file format PPE: regular ppc/ppc64 ELF binaries SPE: new ELF flavour EM_SPU, 32-bit big-endian, no shared libraries, manipulated via cross-binutils; new: code overlay support Cell/B.E.: combined object files embedspu: link into one binary, .rodata.spuelf section in the PPE object CESOF: SPE->PPE symbol references
gcc on the PPE handled by “rs6000” back end Processor-specific tuning pipeline description
gcc on the SPE Merged Jan 3rd Built as cross-compiler Handles vector data types, intrinsics Middle-end support: branch hints, aggressive if-conversion GCC 4.1 port exploiting auto-vectorization No Java
Existing proprietary applications Games Volume rendering Real-time Raytracing Digital Video Monte Carlo simulation
Obviously missing ffmpeg, mplayer, VLC VDR, mythTV Xorg acceleration OpenSSL Your project here !!!
Questions! Thank you very much for your attention.
7 Appendix
Documentation  (new or recently updated) Cell Broadband Engine  Cell Broadband Engine Architecture V1.0  Cell Broadband Engine Programming Handbook V1.0  Cell Broadband Engine Registers V1.3 SPU C/C++ Language Extensions V2.1  Synergistic Processor Unit (SPU) Instruction Set Architecture V1.1  SPU Application Binary Interface Specification V1.4  SPU Assembly Language Specification V1.3  Cell Broadband Engine Programming using the SDK  Cell Broadband Engine SDK Installation and User's Guide V1.1   Cell Broadband Engine Programming Tutorial V1.1  Cell Broadband Engine Linux Reference Implementation ABI V1.0  SPE Runtime Management library documentation V1.1  SDK Sample Library documentation V1.1  IDL compiler documentation V1.1 New developerWorks Articles Maximizing the power of the Cell Broadband Engine processor Debugging Cell Broadband Engine systems
Documentation  (new or recently updated) IBM Cell Broadband Engine Full-System Simulator IBM Full-System Simulator Users Guide IBM Full-System Simulator Command Reference Performance Analysis with the IBM Full-System Simulator IBM Full-System Simulator BogusNet HowTo PowerPC Architecture Book Book I: PowerPC User Instruction Set Architecture Version 2.02 Book II: PowerPC Virtual Environment Architecture Version 2.02 Book III: PowerPC Operating Environment Architecture Version 2.02 Vector/SIMD Multimedia Extension Technology Programming Environments Manual Version 2.06c
Links Cell Broadband Engine http://www-306.ibm.com/chips/techlib/techlib.nsf/products/Cell_Broadband_Engine IBM BladeCenter QS20 http://www-03.ibm.com/technology/splash/qs20/ Cell Broadband Engine resource center http://www-128.ibm.com/developerworks/power/cell/ Cell Broadband Engine resource center - Documentation archive http://www-128.ibm.com/developerworks/power/cell/docs_documentation.html Cell Broadband Engine technology http://www.alphaworks.ibm.com/topics/cell Power.org's Cell Developers Corner http://www.power.org/resources/devcorner/cellcorner Barcelona Supercomputer Center - Linux on Cell http://www.bsc.es/projects/deepcomputing/linuxoncell/ Barcelona Supercomputer Center - Documentation http://www.bsc.es/plantillaH.php?cat_id=262 Heiko J Schick's Cell Bookmarks http://del.icio.us/schihei/Cell

Editor's Notes

  • #20 VMX AltiVec SIMD instructions on IBM PowerPC processors Less speculative logic
  • #23 VMX AltiVec SIMD instructions on IBM PowerPC processors Less speculative logic
  • #29 The switch is not yet available
  • #42 Dr. V. S. Pande, Distributed Computing Project, Stanford University (permission given for showing the video as well). Folding@Home on the PS3: the Cure@PS3 project. INTRODUCTION: Since 2000, Folding@Home (FAH) has led to a major jump in the capabilities of molecular simulation. By joining together hundreds of thousands of PCs throughout the world, calculations which were previously considered impossible have now become routine. FAH has targeted the study of protein folding and protein folding disease, and numerous scientific advances have come from the project. Now in 2006, we are looking forward to another major advance in capabilities. This advance utilizes the new Cell processor in Sony's PLAYSTATION 3 (PS3) to achieve performance previously only possible on supercomputers. With this new technology (as well as new advances with GPUs), we will likely be able to attain performance on the 100 gigaflop scale per computer. With about 10,000 such machines, we would be able to achieve performance on the petaflop scale. With software from Sony, the PlayStation 3 will now be able to contribute to the Folding@Home project, pushing Folding@Home a major step forward. Our goal is to apply this new technology to push Folding@Home into a new level of capabilities, applying our simulations to further study of protein folding and related diseases, including Alzheimer's Disease, Huntington's Disease, and certain forms of cancer. With these computational advances, coupled with new simulation methodologies to harness the new techniques, we will be able to address questions previously considered impossible to tackle computationally, and make even greater impacts on our knowledge of folding and folding-related diseases. ADVANCED FEATURES FOR THE PS3: The PS3 client will also support some advanced visualization features. While the Cell microprocessor does most of the calculation processing of the simulation, the graphics chip of the PLAYSTATION 3 system (the RSX) displays the actual folding process in real time using new technologies such as HDR and ISO-surface rendering. It is possible to navigate the 3D space of the molecule using the interactive controller of the PS3, allowing us to look at the protein from different angles in real time. For a preview of a prototype of the GUI for the PS3 client, check out a screenshot or one of these videos (355K avi, 866K avi, 6MB avi, 6MB avi -- more videos and formats to come). There is also a "bootleg" video of Sony's presentation on FAH that is now on YouTube (although the audio and video quality is pretty bad). http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats
  • #46 Cell Blade systems compute and compress images. These images are then delivered via the network to clients for decompression and display. The GPStream framework can be used to deliver the images to mobile clients via wireless. This is really an example of situational awareness. In this specific case, the Predator Unmanned Aerial Vehicle has a small camera mounted in the nose (the blue circle would be live video), and the surroundings would be rendered for the remote pilot to help them avoid turning into a mountain or a no-fly zone. We think this is also valid for commercial aircraft at night, in poor weather, etc.
  • #47 An experienced physician can read a great deal from cross-sectional images, but three-dimensional, dynamic images, i.e. including the factor time, open up entirely new diagnostic possibilities. Medical imaging is another area that is progressing rapidly and creating a new, more demanding workload. Today an average exam generates 1 GByte of data; you can't go to the future, adding time-dependent analysis, without an application-optimized system. An average exam generates 1 GByte of data (for one digital X-ray or simple CT scan - much more for complicated CT or MRI studies). We estimate that 10^2-10^4 floating point operations are used to capture, process and analyze a byte of medical data, so a typical exam requires 10^11-10^13 operations. Assume an exam must be completed in "real time" (5 minutes?) to be of diagnostic use; this requires 0.3-33 GF/s of compute power - delivered today by single-processor Intel workstations. Scanner technology will rapidly evolve to generate 10-20x the amount of data in the same scan time. Sixteen-slice CT scanner: 600-2000 slices per exam -> 300 MB - 1 GB per exam. CT scan workflow - typical helical-scan multi-slice acquisition: Stage 1: interpolate data to generate equivalent "step-and-shoot" slices. Stage 2: filtered back-projection to generate the 2D slice view (Fourier filter + numerical integration). Stage 3: volume rendering (optional - many radiologists prefer to look at slices, but with increasing resolution/slice count it may become mandatory). Note (1): Stage 2 should be trivially parallelizable (scale out). Note (2): an increase in the number of slices acquired simultaneously means increased computational cost for "cone-effect" corrections. Note (3): there are claims that improved algorithms can reduce the computational burden enormously (UIUC Technology Licensing Office). Example: 313 MB of raw scan data -> 5 x 1 MB images (cross-sections?). Each image takes 19 seconds to process on a 3 GHz Wintel box. A high-resolution 3000-slice run (from machines like the new Siemens Somatom 64) might take ~16 hours to process on such a commodity system. Note that the 3 GB of 2D image data can be accommodated within main memory. PV-4D (www.pv-4d.com), showcased at Supercomputing 2005 / CeBIT 2006: about 4 times faster than Opteron with the same algorithm; if fully optimized, projected to be > 6 times faster than Opteron. Last-minute prototype running on four Cell blades; stereo display using shutter glasses at 8-10 frames per second - achieving this frame rate using two blades at a time, four blades required for the data set size. Data sets about 1.6 GB in size: beating heart (400x400x400 voxels, 6 samples), CFD simulation (~600x200x100 voxels, 40 samples).
  • #50 Handling large data Handling large code SIMD aspect?
  • #51 Q: What are the parameters to spe_create_thread…
  • #54 Handling large data Handling large code SIMD aspect?
  • #60 Handling large data Handling large code SIMD aspect?
  • #70 Middleware / libraries likely to be optimized: media, e.g. mplayer; encryption, e.g. OpenSSH. PPE = Power Processor Element