Beyond the GFLOPS

Dominic Mallinson
Vice President, US R&D
Sony Computer Entertainment Inc.

“Why not go out on a limb?
That’s where the fruit is.”
(Will Rogers, cowboy, actor, philanthropist)
© 2007 SCE

The Cell Broadband Engine
(Cell/B.E.) Processor

The Cell/B.E. Processor
Leading the industry in heterogeneous multi-core
200+ GFLOPS high performance computing
But what lies beyond the GFLOPS statistics?
Why does an application need Cell/B.E.’s power?
How can we make Cell/B.E.’s performance more accessible?
What part do you and the Cell/B.E.’s software community play?

Why does SCE need
Cell/B.E. performance?

Games and Virtual World
GBytes of data streaming through the CPU in real-time
100s of animating 3D characters on screen
True HD 3D Graphics with millions of vertices visible
Complex Artificial Intelligence techniques
Physical Simulation, cloth, fluids, soft and rigid bodies
Real-time spatial audio processing and encode
Millions of simultaneous users
Potential for client and server to use Cell/B.E. processor

Demo Time

Media Processing
Blu-ray movie playback
1080p video decode in AVC, VC1 or MPEG2
Simultaneous 480p “picture in picture” decode
7.1 multi-channel audio decode and mixing
… and a Java™ VM
Remote Play function of PLAYSTATION®3 (PS3™)
Realtime AV encoding and streaming to a PlayStation®Portable
Multi-person AV Chat
1 encode plus up to 5 decodes, AEC noise reduction

Folding@home™ on PS3
A distributed computing project from Stanford University
Research into protein misfolding to help understand and find
treatments for diseases such as Alzheimer’s and cancer.
PS3 Client launched in March 2007
Over 250,000 unique PS3 users in the first month
488 TFLOPS (Stanford metrics from June 14th 2007)
26,961 Active Cell/B.E. CPUs
More than doubled previous PC/GPU contributions
DEMO

Accessing the power of Cell/B.E.

Accessing the power of Cell/B.E.
The Cell/B.E. is designed for performance
Maximum performance requires complex software
The upper quartile of engineers already achieve it
The lower quartile currently cannot
Research and Industry must bridge this gap
Many programming models are emerging
How does SCE tackle this problem?

SCE’s SPURS Environment
A flexible, cooperative SPE management layer
SPE-centric scheduling (minimal PPU overhead)
Low or zero context switch overhead
Application control for scheduling priorities
Supports sharing SPE with 3rd party middleware
Built on top of OS SPE Threads
Policy manager allows multiple models

Duck Demo SPE Usage

Old Code – no machine vision – 6 SPEs
SPE0 – Surface water physics
SPE1 – Splash physics
SPE2 – Boat 1 physics
SPE3 – Boat 2 physics
SPE4 – Collision physics
SPE5 – Graphics

Old Code – machine vision – 8 SPEs
SPE0-SPE5 UNCHANGED
Added machine vision, particle water
SPE6 – Particle water physics
SPE7 – Machine vision

Goal: Everything on 6 SPEs

Naïve use of SPURS
Just try to move work around
Water + Boat 2 is over time
Graphics + Machine vision
Fits but no room to flex

Refactor with SPURS
Refactor machine vision
Refactor particle water
Use SPURS to share SPEs
Room to ‘breathe’

The “Tasks” policy module
Similar to threads but cooperative scheduling
SPEs pull tasks from a shared memory pool
Good for mid to high complexity programs

The “Jobs” policy module
Stateless execution kernels (specify all input/output)
SPEs pull from a shared queue of jobs
Good for low to mid complexity programs
Ideal for stream processing
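
The "Jobs" idea above (stateless kernels pulled from a shared queue) can be illustrated with a small host-side sketch. This is not SPURS code and uses no SPURS API: Python threads stand in for SPEs, and the names `run_jobs` and `scale` are hypothetical, chosen only for this illustration.

```python
import queue
import threading


def run_jobs(jobs, num_workers=6):
    """Run stateless (kernel, data) jobs pulled from one shared queue."""
    pending = queue.Queue()
    for index, job in enumerate(jobs):
        pending.put((index, job))
    results = [None] * len(jobs)

    def worker():
        # Each worker stands in for one SPE (an assumption of this sketch):
        # it pulls the next job itself, so no central scheduler assigns work.
        while True:
            try:
                index, (kernel, data) = pending.get_nowait()
            except queue.Empty:
                return
            # A job carries all of its input and writes only its own output slot.
            results[index] = kernel(data)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


def scale(block):
    """Hypothetical stateless kernel: doubles every value in its input block."""
    return [2 * x for x in block]


blocks = [[1, 2], [3, 4], [5, 6]]
doubled = run_jobs([(scale, b) for b in blocks])
```

Because each job names all of its input and output, the jobs can run in any order on any worker, which is what makes the model a natural fit for stream processing.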

Job Streaming
Divide a program and data into pieces (called Jobs)
Define dependencies between groups of jobs
Build Job Lists
SPEs grab Jobs and execute them in parallel
[Diagram: PPE threads divide “Program and Data” into Jobs collected in a Job List]
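
The steps above (split the work into jobs, declare dependencies between groups, build a job list, run jobs in parallel) might look roughly like the following host-side sketch. It is an illustration under stated assumptions, not the SPURS job-list format: a thread pool stands in for the SPEs, a job list is modelled as an ordered list of groups, and `run_job_list`, `make_sum_job` and `combine_job` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job_list(job_list, num_workers=6):
    """Run groups of jobs in order; jobs inside one group run in parallel."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for group in job_list:
            # list() drains the iterator, so every job in this group has
            # finished before the next group starts (the group dependency).
            list(pool.map(lambda job: job(), group))


# Divide the data into pieces; each piece becomes one job.
data = list(range(8))
pieces = [data[i:i + 2] for i in range(0, len(data), 2)]
partial = [0] * len(pieces)
total = []


def make_sum_job(slot, piece):
    def job():
        partial[slot] = sum(piece)  # each job writes only its own slot
    return job


def combine_job():
    total.append(sum(partial))  # valid only after every partial sum is done


job_list = [
    [make_sum_job(i, p) for i, p in enumerate(pieces)],  # group 1: independent
    [combine_job],                                       # group 2: needs group 1
]
run_job_list(job_list)
```

Expressing dependencies between groups rather than between individual jobs keeps the synchronization points few, while leaving the jobs inside a group free to run on whichever worker is idle.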

Job Streaming Pipeline
[Diagram: a job moves between RAM and the SPE in four stages: “prefetch”, “input”, “execute”, “output”. Labels: JD Address; Code*, Parameters, I/O addresses, I/O sizes, etc.; Input Data; SPU Execute; Output Data.]

Multi-Buffering
Job stages are interleaved so that DMA memory transfers will be
in progress during job execution.
Each color represents a different job.
[Diagram: timeline (TIME) of five jobs, each with staggered prefetch, Input, Exec and Output stages]
Potentially, there is no stalling for memory transfers!
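
The interleaving described above can be made concrete with a small schedule calculation. This sketch assumes, purely for illustration, that every stage takes one equal time step and that a new job enters the pipeline at each step; real stage durations vary. `pipeline_schedule` is a hypothetical helper name.

```python
STAGES = ["prefetch", "input", "exec", "output"]


def pipeline_schedule(num_jobs):
    """Return, for each time step, the (job, stage) pairs that are active."""
    steps = []
    for t in range(num_jobs + len(STAGES) - 1):
        active = []
        for job in range(num_jobs):
            stage = t - job  # job j starts j steps after job 0
            if 0 <= stage < len(STAGES):
                active.append((job, STAGES[stage]))
        steps.append(active)
    return steps


schedule = pipeline_schedule(5)
for t, active in enumerate(schedule):
    print(t, ", ".join(f"job{j}:{s}" for j, s in active))
```

Once the pipeline is full, each time step has one job executing while the transfers for its neighbours (output of the previous job, input and prefetch of the next ones) are in flight, which is why execution need not wait on memory.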

SPURS solves part of the problem
Allows effective sharing of the SPE resources
Simplifies the programming and synchronization
But it still doesn’t bridge the gap
We need higher level models which provide…
Automatic DMA for large code and data on SPE
Parallel programming abstractions
Scalable synchronization methods
Full debug and performance analysis

The Cell/B.E. Software Community

The Importance of the CoC
The Center of Competence is a focal point
To bring together researchers and industry
To help develop optimized ‘standard’ libraries for Cell/B.E.
Research new programming languages/models
Research new compiler techniques
General multi-core / parallel programming research
Dealing with distributed memory hierarchies
Research scalability of synchronization methods
Develop tools that can help visualize parallel software

Industry Support
Terra Soft Solutions – Yellow Dog Linux for PS3
Mercury Systems
RapidMind
Cmpware, Inc.
Reservoir Labs
Gedae
allinea

Concluding Thoughts
The Cell/B.E. has amazing performance
It’s available now in consumer and HPC markets
We need more software targeting Cell/B.E.
We need Cell/B.E.’s power to be more accessible
We need more research into Cell/B.E. and multi-core

We need YOU to help us go...
Beyond the GFLOPS
