Beyond the GFLOPS
Dominic Mallinson
Vice President, US R & D
Sony Computer Entertainment Inc.
“Why not go out on a limb? That’s where the fruit is.”
(Will Rogers, cowboy, actor, philanthropist)
© 2007 SCE

The Cell Broadband Engine (Cell/B.E.) Processor

The Cell/B.E. Processor
Leading the industry in heterogeneous multi-core
200+ GFLOPS high performance computing
But what lies beyond the GFLOPS statistics?
Why does an application need Cell/B.E.’s power?
How can we make Cell/B.E.’s performance more accessible?
What part do you and the Cell/B.E.’s software community play?

Why does SCE need Cell/B.E. performance?

Games and Virtual World
GBytes of data streaming through the CPU in real-time
100s of animating 3D characters on screen
True HD 3D Graphics with millions of vertices visible
Complex Artificial Intelligence techniques
Physical Simulation, cloth, fluids, soft and rigid bodies
Real-time spatial audio processing and encode
Millions of simultaneous users
Potential for client and server to use Cell/B.E. processor

Demo Time

Media Processing
Blu-ray movie playback
1080p video decode in AVC, VC1 or MPEG2
Simultaneous 480p “picture in picture” decode
7.1 multi-channel audio decode and mixing
… and a Java™ VM
Remote Play function of PLAYSTATION®3 (PS3™)
Realtime AV encoding and streaming to a PlayStation®Portable
Multi-person AV Chat
1 encode plus up to 5 decodes, AEC noise reduction

Folding@home™ on PS3
A distributed computing project from Stanford University
Research into protein misfolding to help understand and find treatments for diseases such as Alzheimer’s and cancer.
PS3 Client launched in March 2007
Over 250,000 unique PS3 users in the first month
488 TFLOPS (Stanford metrics from June 14th 2007)
26,961 Active Cell/B.E. CPUs
More than doubled previous PC/GPU contributions

DEMO

Accessing the power of Cell/B.E.
The Cell/B.E. is designed for performance
Maximum performance requires complex software
The upper quartile of engineers already achieve it
The lower quartile currently cannot
Research and Industry must bridge this gap
Many programming models are emerging
How does SCE tackle this problem?

SCE’s SPURS Environment
A flexible, cooperative SPE management layer
SPE-centric scheduling (minimal PPU overhead)
Low or zero context switch overhead
Application control for scheduling priorities
Supports sharing SPE with 3rd party middleware
Built on top of OS SPE Threads
Policy manager allows multiple models
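
As a rough illustration of SPE-centric scheduling with application-controlled priorities, the sketch below lets each simulated SPE choose its own next workload from a shared table, with no central dispatcher. It is not the SPURS API: `Workload`, `pick_workload`, the priority convention (a lower number is more urgent, `None` means the workload may not run on that SPE) and the demo workloads are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    # One entry per SPE: lower number = more urgent, None = never run there.
    priority: list
    pending: int = 0  # units of work waiting to be picked up


def pick_workload(workloads, spe_id):
    """Return the most urgent workload with pending work for this SPE, or None."""
    ready = [w for w in workloads
             if w.pending > 0 and w.priority[spe_id] is not None]
    if not ready:
        return None
    return min(ready, key=lambda w: w.priority[spe_id])


def run(workloads, num_spes):
    """Cooperative loop: each SPE in turn takes one unit of work until none is left."""
    log = []
    progressed = True
    while progressed:
        progressed = False
        for spe_id in range(num_spes):
            w = pick_workload(workloads, spe_id)
            if w is not None:
                w.pending -= 1
                log.append((spe_id, w.name))
                progressed = True
    return log


# Hypothetical setup: the application's physics may use both SPEs,
# while a middleware workload is confined to SPE 1 at lower urgency.
workloads = [
    Workload("physics", priority=[1, 1], pending=3),
    Workload("middleware_audio", priority=[None, 2], pending=2),
]
schedule = run(workloads, num_spes=2)
```

In this toy run SPE 1 only turns to the middleware workload once no physics work is left, and SPE 0 never runs it at all, which is the kind of sharing policy the application would control through its priorities.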

Duck Demo SPE Usage
Old Code – no machine vision – 6 SPEs
SPE0 – Surface water physics
SPE1 – Splash physics
SPE2 – Boat 1 physics
SPE3 – Boat 2 physics
SPE4 – Collision physics
SPE5 – Graphics
Old Code – machine vision – 8 SPEs
SPE0-SPE5 UNCHANGED
Added machine vision, particle water
SPE6 – Particle water physics
SPE7 – Machine vision

Goal: Everything on 6 SPEs
Refactor with SPURS
Refactor machine vision
Refactor particle water
Use SPURS to share SPEs
Room to ‘breathe’
Naïve use of SPURS
Just try to move work around
Water + Boat 2 is over time
Graphics + Machine vision
Fits but no room to flex

SCE’s SPURS Environment
The “Tasks” policy module
Similar to threads but cooperative scheduling
SPEs pull tasks from a shared memory pool
Good for mid to high complexity programs
The “Jobs” policy module
Stateless execution kernels (specify all input/output)
SPEs pull from a shared queue of jobs
Good for low to mid complexity programs
Ideal for stream processing
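
The “Jobs” model can be pictured as workers pulling self-contained work items from one shared queue. The sketch below is a Python analogy, not SPURS code: threads stand in for SPEs, `queue.Queue` stands in for the shared job queue, and `scale_kernel` is a hypothetical stateless kernel whose whole input and output are named in the job itself.

```python
import queue
import threading


def scale_kernel(data, factor):
    """Stateless kernel: the result depends only on the arguments."""
    return [x * factor for x in data]


def worker(jobs, results):
    """Stand-in for one SPE: keep pulling jobs until the queue is empty."""
    while True:
        try:
            job_id, kernel, args = jobs.get_nowait()
        except queue.Empty:
            return
        results[job_id] = kernel(*args)  # each job writes only its own slot


def run_jobs(job_list, num_workers=6):
    """Fill the shared queue, then let the workers drain it in parallel."""
    jobs = queue.Queue()
    for job in job_list:
        jobs.put(job)
    results = [None] * len(job_list)
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Split one stream of numbers into four independent jobs.
stream = list(range(8))
job_list = [(i, scale_kernel, (stream[2 * i:2 * i + 2], 10)) for i in range(4)]
outputs = run_jobs(job_list)
```

Because each job carries everything it needs and touches only its own output slot, the order in which workers pick jobs up does not affect the result.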

Job Streaming
Divide a program and data into pieces (called Jobs)
Define dependencies between groups of jobs
Build Job Lists
SPEs grab Jobs and execute them in parallel
[Diagram labels: PPE thread, Program and Data, Job List, Job (repeated)]
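
One way to read the job streaming steps: jobs inside a group are independent and may run in parallel, while a group only starts once the group it depends on has finished. The sketch below assumes the simplest dependency shape, a linear chain of groups. `run_job_list` and the example jobs are hypothetical, and a thread pool stands in for the SPEs.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job_list(job_list, num_workers=6):
    """Run groups in order; jobs within one group run in parallel.

    job_list is a list of groups, each group a list of zero-argument callables.
    Finishing a group before starting the next acts as the dependency barrier.
    """
    results = []
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for group in job_list:
            futures = [pool.submit(job) for job in group]
            results.append([f.result() for f in futures])  # wait for the group
    return results


# Divide the data into pieces; each piece becomes one job.
data = list(range(12))
pieces = [data[i:i + 4] for i in range(0, len(data), 4)]
partial_sums = [0] * len(pieces)


def make_sum_job(index, piece):
    def job():
        partial_sums[index] = sum(piece)
        return partial_sums[index]
    return job


def combine_job():
    # Depends on every job in the first group having finished.
    return sum(partial_sums)


job_list = [
    [make_sum_job(i, p) for i, p in enumerate(pieces)],  # group 1: parallel
    [combine_job],                                       # group 2: after group 1
]
group_results = run_job_list(job_list)
```

The second group sees complete partial sums only because the first group is fully waited on before it is submitted.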

Job Streaming Pipeline
[Diagram: data moves between RAM and the SPE in four stages, “prefetch”, “input”, “execute” and “output”. Labels: JD Address; Code*, Parameters; Parameters, I/O addresses, I/O sizes, etc.; CODE; Input Data; SPU Execute; Output Data.]
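
The four pipeline stages can be walked through for a single job as follows. This is a sketch under assumptions: “JD” is read here as “job descriptor”, `JobDescriptor` and its fields are hypothetical names modelled on the diagram labels, a `bytearray` stands in for RAM, and plain slice copies stand in for DMA transfers.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class JobDescriptor:
    code: Callable[[bytes, int], bytes]  # stands in for the Code* pointer
    param: int                           # job parameter
    in_addr: int                         # input address in RAM
    in_size: int                         # input size in bytes
    out_addr: int                        # output address in RAM


def run_job(ram, descriptors, jd_index):
    # "prefetch": fetch the descriptor (and with it the code) for this job
    jd = descriptors[jd_index]
    # "input": copy the input data from RAM into local storage
    local_in = bytes(ram[jd.in_addr:jd.in_addr + jd.in_size])
    # "execute": run the kernel on local data only
    local_out = jd.code(local_in, jd.param)
    # "output": copy the result back to RAM
    ram[jd.out_addr:jd.out_addr + len(local_out)] = local_out


def add_constant(data, value):
    """Hypothetical kernel: add value to every byte, wrapping at 256."""
    return bytes((b + value) % 256 for b in data)


ram = bytearray(16)
ram[0:4] = bytes([1, 2, 3, 4])
descriptors = [JobDescriptor(add_constant, 10, in_addr=0, in_size=4, out_addr=8)]
run_job(ram, descriptors, 0)
```

The kernel never touches `ram` directly; everything it reads or writes is named in the descriptor, which is what makes the job stateless.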

Multi-Buffering
Job stages are interleaved so that DMA memory transfers will be in progress during job execution.
Each color represents a different job.
[Diagram: five jobs, each passing through prefetch, Input, Exec and Output; successive jobs are staggered along the TIME axis so that their stages overlap.]
Potentially, there is no stalling for memory transfers!
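
The interleaving can be illustrated with a small timeline model. The sketch assumes every stage takes exactly one time step and that a job may enter a stage as soon as the previous job has left it; real transfer and execution times vary, which is why the claim is only “potentially” true. `pipelined_timeline` is a hypothetical helper, not part of any SDK.

```python
STAGES = ["prefetch", "input", "exec", "output"]


def pipelined_timeline(num_jobs, stages=STAGES):
    """Return, per time step, the list of (job, stage) pairs active in that step."""
    total_steps = num_jobs + len(stages) - 1
    timeline = []
    for t in range(total_steps):
        active = []
        for s, stage in enumerate(stages):
            job = t - s  # the job that reaches stage s at time t
            if 0 <= job < num_jobs:
                active.append((job, stage))
        timeline.append(active)
    return timeline


def serial_steps(num_jobs, stages=STAGES):
    """Steps needed if each job finished all stages before the next one started."""
    return num_jobs * len(stages)


timeline = pipelined_timeline(5)
for t, active in enumerate(timeline):
    print(t, " ".join(f"job{job}:{stage}" for job, stage in active))

pipelined = len(timeline)
serial = serial_steps(5)
```

Under these assumptions five jobs finish in 8 steps instead of 20, because the transfer stages of neighbouring jobs run while another job is executing.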

SCE’s SPURS Environment
SPURS solves part of the problem
Allows effective sharing of the SPE resources
Simplifies the programming and synchronization
But it still doesn’t bridge the gap
We need higher level models which provide…
Automatic DMA for large code and data on SPE
Parallel programming abstractions
Scalable synchronization methods
Full debug and performance analysis

The Cell/B.E. Software Community

The Importance of the CoC
The Center of Competence is a focal point
To bring together researchers and industry
To help develop optimized ‘standard’ libraries for Cell/B.E.
Research new programming languages/models
Research new compiler techniques
General multi-core / parallel programming research
Dealing with distributed memory hierarchies
Research scalability of synchronization methods
Develop tools that can help visualize parallel software

Industry Support
Terra Soft Solutions – Yellow Dog Linux for PS3
Mercury Systems
RapidMind
Cmpware, Inc.
Reservoir Labs
Gedae
allinea

Concluding Thoughts
The Cell/B.E. has amazing performance
It’s available now in consumer and HPC markets
We need more software targeting Cell/B.E.
We need Cell/B.E.’s power to be more accessible
We need more research into Cell/B.E. and multi-core
We need YOU to help us go...
Beyond the GFLOPS

“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
SAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloudSAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloud
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 

Beyond the GFLOPS

  • 1. Beyond the GFLOPS. Dominic Mallinson, Vice President, US R&D, Sony Computer Entertainment Inc.
  • 2. “Why not go out on a limb? That’s where the fruit is.” (Will Rogers, cowboy, actor, philanthropist) © 2007 SCE
  • 3. The Cell Broadband Engine (Cell/B.E.) Processor
  • 4. The Cell/B.E. Processor: Leading the industry in heterogeneous multi-core; 200+ GFLOPS high performance computing. But what lies beyond the GFLOPS statistics? Why does an application need Cell/B.E.’s power? How can we make Cell/B.E.’s performance more accessible? What part do you and the Cell/B.E.’s software community play?
  • 5. Why does SCE need Cell/B.E. performance?
  • 6. Games and Virtual World: GBytes of data streaming through the CPU in real-time; 100s of animating 3D characters on screen; True HD 3D graphics with millions of vertices visible; complex Artificial Intelligence techniques; physical simulation: cloth, fluids, soft and rigid bodies; real-time spatial audio processing and encode; millions of simultaneous users. Potential for client and server to use Cell/B.E. processor.
  • 7. Demo Time
  • 8. Media Processing. Blu-ray movie playback: 1080p video decode in AVC, VC1 or MPEG2; simultaneous 480p “picture in picture” decode; 7.1 multi-channel audio decode and mixing; … and a Java™ VM. Remote Play function of PLAYSTATION®3 (PS3™): realtime AV encoding and streaming to a PlayStation®Portable. Multi-person AV Chat: 1 encode plus up to 5 decodes, AEC noise reduction.
  • 9. Folding@home™ on PS3: A distributed computing project from Stanford University. Research into protein misfolding to help understand and find treatments for diseases such as Alzheimer’s and cancer. PS3 client launched in March 2007; over 250,000 unique PS3 users in the first month; 488 TFLOPS (Stanford metrics from June 14th 2007); 26,961 active Cell/B.E. CPUs; more than doubled previous PC/GPU contributions. DEMO
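As a rough reading of the two Folding@home figures on the slide above, the snippet below divides the reported throughput by the number of active CPUs. It assumes both numbers describe the same set of PS3 clients at the same moment, which the slide does not state explicitly; the function name is made up for this illustration.

```python
def gflops_per_cpu(total_tflops, active_cpus):
    """Average reported throughput per CPU, in GFLOPS (1 TFLOPS = 1000 GFLOPS)."""
    return total_tflops * 1000.0 / active_cpus


# Figures quoted on the slide: 488 TFLOPS across 26,961 active Cell/B.E. CPUs.
print(round(gflops_per_cpu(488, 26961), 1))  # prints 18.1
```

Under that assumption the average is about 18 GFLOPS per active console as measured by the project, well below the 200+ GFLOPS peak quoted earlier in the deck.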
  • 10. Accessing the power of Cell/B.E.
  • 11. Accessing the power of Cell/B.E.: The Cell/B.E. is designed for performance; maximum performance requires complex software; the upper quartile of engineers already achieve it; the lower quartile currently cannot; research and industry must bridge this gap; many programming models are emerging. How does SCE tackle this problem?
  • 12. SCE’s SPURS Environment: A flexible, cooperative SPE management layer; SPE-centric scheduling (minimal PPU overhead); low or zero context switch overhead; application control for scheduling priorities; supports sharing SPE with 3rd party middleware; built on top of OS SPE Threads; policy manager allows multiple models.
  • 13. Duck Demo SPE Usage. Old Code – no machine vision – 6 SPEs: SPE0 – Surface water physics; SPE1 – Splash physics; SPE2 – Boat 1 physics; SPE3 – Boat 2 physics; SPE4 – Collision physics; SPE5 – Graphics. Old Code – machine vision – 8 SPEs: SPE0-SPE5 unchanged; added machine vision and particle water: SPE6 – Particle water physics; SPE7 – Machine vision.
  • 14. Goal: Everything on 6 SPEs. Naïve use of SPURS: just try to move work around; Water + Boat 2 is over time; Graphics + Machine vision fits but has no room to flex. Refactor with SPURS: refactor machine vision; refactor particle water; use SPURS to share SPEs; room to ‘breathe’.
  • 15. SCE’s SPURS Environment. The “Tasks” policy module: similar to threads but with cooperative scheduling; SPEs pull tasks from a shared memory pool; good for mid to high complexity programs. The “Jobs” policy module: stateless execution kernels (specify all input/output); SPEs pull from a shared queue of jobs; good for low to mid complexity programs; ideal for stream processing.
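The “Jobs” idea on the slide above (stateless kernels, workers pulling from one shared queue) can be sketched in plain Python. This is not the SPURS API: threads stand in for SPEs, and `run_jobs` and `scale` are hypothetical names chosen for this illustration.

```python
import queue
import threading


def run_jobs(jobs, num_workers=6):
    """Run stateless jobs pulled from one shared queue by worker threads.

    Each job is a (kernel, args) pair: everything the kernel needs is in
    args, and everything it produces is its return value.
    """
    pending = queue.Queue()
    for index, job in enumerate(jobs):
        pending.put((index, job))

    results = [None] * len(jobs)

    def worker():
        # Each worker (standing in for an SPE) pulls work until the queue is empty.
        while True:
            try:
                index, (kernel, args) = pending.get_nowait()
            except queue.Empty:
                return
            results[index] = kernel(*args)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


def scale(values, factor):
    # A stateless kernel: the output depends only on the explicit inputs.
    return [v * factor for v in values]


if __name__ == "__main__":
    chunks = [[1, 2], [3, 4], [5, 6]]
    print(run_jobs([(scale, (chunk, 10)) for chunk in chunks]))
    # prints [[10, 20], [30, 40], [50, 60]]
```

Because the kernels keep no state between calls, any worker can take any job, which is what makes the pull-from-a-queue model suit stream processing.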
  • 16. Job Streaming: Divide a program and data into pieces (called Jobs); define dependencies between groups of jobs; build Job Lists; SPEs grab Jobs and execute them in parallel. (Diagram labels: PPE thread; Program and Data; Job List; Jobs.)
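The four steps on the slide above can be sketched as follows, assuming that a dependency between two groups means every job in the earlier group finishes before any job in the later group starts. The names `run_job_lists`, `groups` and `depends_on` are invented for this sketch, a thread pool stands in for the SPEs, and `graphlib` requires Python 3.9 or later.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library since Python 3.9


def run_job_lists(groups, depends_on, num_workers=6):
    """Run groups of independent jobs, honouring dependencies between groups.

    groups:     {group name: [zero-argument callables]}
    depends_on: {group name: set of group names that must finish first}
    """
    graph = {name: depends_on.get(name, set()) for name in groups}
    results = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # static_order() yields every group after all of its prerequisites.
        for name in TopologicalSorter(graph).static_order():
            # Jobs inside one group are independent, so they run in parallel;
            # list() waits for the whole group before the next one starts.
            results[name] = list(pool.map(lambda job: job(), groups[name]))
    return results


if __name__ == "__main__":
    groups = {
        "graphics": [lambda: "frame drawn"],
        "physics": [lambda: "water stepped", lambda: "boats stepped"],
    }
    print(run_job_lists(groups, {"graphics": {"physics"}}))
```

A real job system would likely track dependencies at a finer grain so that workers do not idle at group boundaries; the barrier per group is kept here only for brevity.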
  • 17. Job Streaming Pipeline. Stages: “prefetch”, “input”, “execute”, “output”. (Diagram labels: RAM; SPE; SPU; JD Address; Code*; Parameters, I/O addresses, I/O sizes, etc.; Input Data; Execute; Output Data.)
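One way to picture the four stages named on the slide above is a single job running against a simulated main memory. Everything here is an assumption made for illustration: the `JobDescriptor` fields are loosely modelled on the diagram labels (code, I/O addresses, I/O sizes), a `bytearray` stands in for RAM, and a dictionary lookup stands in for fetching the descriptor by its address.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class JobDescriptor:
    kernel: Callable[[bytes], bytes]  # stands in for the job's code
    input_addr: int                   # where the input lives in main memory
    input_size: int                   # how many bytes to bring in
    output_addr: int                  # where the result is written back


def run_job(ram: bytearray, descriptors: dict, jd_addr: int) -> None:
    # "prefetch": fetch the descriptor (and with it the code) by its address
    jd = descriptors[jd_addr]
    # "input": copy the input data from main memory into a local buffer
    local_in = bytes(ram[jd.input_addr:jd.input_addr + jd.input_size])
    # "execute": run the kernel on local data only
    local_out = jd.kernel(local_in)
    # "output": copy the result back to main memory
    ram[jd.output_addr:jd.output_addr + len(local_out)] = local_out


if __name__ == "__main__":
    ram = bytearray(b"abcd\x00\x00\x00\x00")
    descriptors = {0x10: JobDescriptor(bytes.upper, input_addr=0, input_size=4, output_addr=4)}
    run_job(ram, descriptors, 0x10)
    print(ram)  # prints bytearray(b'abcdABCD')
```

The kernel only ever touches its local copies, which is the property that lets the transfer stages of one job overlap with the execute stage of another.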
  • 18. Multi-Buffering: Job stages are interleaved so that DMA memory transfers will be in progress during job execution. Potentially, there is no stalling for memory transfers! (Diagram: a timeline of prefetch, input, exec and output stages for successive jobs; each color represents a different job.)
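The interleaving described on the slide above can be illustrated with an idealized schedule. The sketch assumes every stage takes one equal time step and that the stages of different jobs overlap perfectly, which real DMA transfers and kernels will not do exactly; `interleaved_schedule` is a name made up for this example.

```python
STAGES = ["prefetch", "input", "exec", "output"]


def interleaved_schedule(num_jobs):
    """Return, for each time step, the (job, stage) pairs in flight.

    Job j enters stage s at time step j + s, so while one job executes,
    the next job's input and the previous job's output are in progress.
    """
    steps = []
    for t in range(num_jobs + len(STAGES) - 1):
        active = []
        for s, stage in enumerate(STAGES):
            job = t - s
            if 0 <= job < num_jobs:
                active.append((job, stage))
        steps.append(active)
    return steps


if __name__ == "__main__":
    for t, active in enumerate(interleaved_schedule(5)):
        print(t, active)
```

Under these assumptions N jobs finish in N + 3 steps instead of the 4N steps needed when each job runs its stages back to back, so five jobs take 8 steps rather than 20.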
  • 19. SCE’s SPURS Environment. SPURS solves part of the problem: allows effective sharing of the SPE resources; simplifies the programming and synchronization. But it still doesn’t bridge the gap. We need higher level models which provide: automatic DMA for large code and data on SPE; parallel programming abstractions; scalable synchronization methods; full debug and performance analysis.
  • 20. The Cell/B.E. Software Community
  • 21. The Importance of the CoC. The Center of Competence is a focal point: to bring together researchers and industry; to help develop optimized ‘standard’ libraries for Cell/B.E.; research new programming languages/models; research new compiler techniques; general multi-core / parallel programming research; dealing with distributed memory hierarchies; research scalability of synchronization methods; develop tools that can help visualize parallel software.
  • 22. Industry Support: Terra Soft Solutions – Yellow Dog Linux for PS3; Mercury Systems; RapidMind; Cmpware, Inc.; Reservoir Labs; Gedae; allinea.
  • 24. Concluding Thoughts: The Cell/B.E. has amazing performance. It’s available now in consumer and HPC markets. We need more software targeting Cell/B.E. We need Cell/B.E.’s power to be more accessible. We need more research into Cell/B.E. and multi-core.
  • 25. We need YOU to help us go... Beyond the GFLOPS