SlideShare a Scribd company logo
Introduction
Sony Corporation develops technologies for products used in
graphics operations and scientific calculation using the Cell
Broadband EngineTM
(Cell/B.E.) and the RSX employed in the
PLAYSTATION®
3, which are developed by Sony Computer
Entertainment Incorporated (SCEI). This document explains
the principal technologies.
Trend toward multi-core processors
Drastic improvements in performance in recent years have
enabled microprocessors to handle large amounts of data
at high speed. The target applications are being extended,
and the amount of computation required in each field is
increasing. In addition to conventional general-purpose
processors, recently developed application-specific processors
exploit the full potential of their performance for a given
application. For example, the Graphics Processing Unit (GPU)
is a special processor for processing 2D/3D graphics at high
speed. This provides very high computational performance
that cannot be achieved by conventional general-purpose
processors.
The trend in the microprocessor market is moving to
asymmetric multi-core processors, where different types of
processors optimized for a specific use coexist. Considering
this technology trend, Sony Corporation has developed the
“Cell Computing Unit”, which uses the high computational
performance provided by the Cell/B.E. and the RSX
processors.
This technology provides solutions to multimedia computing,
which requires large number of computations, such as
those encountered in image-processing applications, like
computer graphics, and scientific computations for non-image-
processing applications, such as encryption technology for
security (Fig. 1).
Multi-core processor with powerful
computational performance - Cell/B.E. -
The Cell/B.E. is an asymmetric multi-core processor jointly
developed by Sony/SCEI, Toshiba Corporation, and IBM
Corporation. This processor alone provides high floating-point
arithmetic operations at 230 GFLOPS. In anticipation of real-
time operation on 4K x 2K or High Definition images, the Cell
processor is optimally designed for multimedia processing
and usage in distributed computing environments.
Architecture of the Cell/B.E.
Two types of processor cores
Two types of processor cores are mounted in the Cell/B.E.:
PowerPC Processor Element (PPE), a control-intensive
processor core for general-purpose processing; and Synergistic
Processor Element (SPE), a processor core for high-speed
data processing. The coordinated operation of these processor
cores give rise to the high level of computational performance
of the Cell/B.E..
The Basics of Cell Computing
Technology
GPU
Target area
Cell/B.E.
3D Volume
Rendering
Game
CG
Rendering
Image
Processing
Search
AV Codec
Vision
Computing
Security
(AES/DES)
Science
Computation
Science
Visualization
CAD
GUI
Web
Document
Editor
File
Sysrem Network
Protocol
GPU
Non-Graphic Processing
Monolithic
Graphic Processing
Parallel
Processing
Fig. 1 Target area of Cell Computing Technology
1
WHITE PAPER
2
WHITE PAPER The Basics of Cell Computing Technology
High-speed internal bus
Four ring-shaped broadband buses called Element
Interconnect Bus (EIB) connect the cores of the Cell/B.E. and
peripheral chips. PPE, SPE, main memory (XDR™ DRAM),
and external I/O devices (the RSX, SouthBridge, etc.) are also
connected to the EIB (Fig. 2).
General-purpose processor core - PPE -
PPE is a general-purpose processor core with a 64-bit Power
Architecture™ and can run existing software compatible with
the PowerPC chip. One PPE is mounted in the Cell/B.E.. The
PPE not only executes the operating system (input/output
control to main memory, external devices, etc.) but also
controls the SPEs.
L1/L2 caches
The PPE is equipped with 32 KB of L1 cache for instructions
and data, respectively, and 512 KB of unified L2 cache for
both instructions and data.
Hardware multithreading
Simultaneous multithreading technology is implemented in
the PPE to provide two-way hardware multithreading.
Vector operation unit
A 128-bit vector operation unit supports the Single Instruction
Multiple Data (SIMD) operations by Vector/SIMD Multimedia
Extensions (VMX) technology.
Computation-intensive processor core - SPE -
SPE is a computationally intensive processor core that excels
at multimedia processing, especially for repeated calculations.
The PPE, on the other hand is optimized for complicated
program control. Eight SPEs are employed in the Cell/B.E.,
Each SPE operates independently of the PPE and other SPEs
and provides high computational performance.
Vector oriented RISC processor
The SPU instruction set architecture (ISA) supports 128-bit
SIMD operations. All basic data processing is performed by
this operation. Because of its very simple architecture, the
SPE can realize high-speed processing where calculations can
be predicted. It is very suitable for execution of an application
where real time processing is required.
Abundant register files
As many as 128 of the 128-bit registers are on the SPE
because of the large amount of data processing is likely to
be performed by this device. By taking advantage of the
abundant register files, high-speed arithmetic operations can
be performed on a large volume of data.
SPE-dedicated local memory
Each SPE is equipped with 256 KB of scratch pad type
memory called Local Store (LS). With no cache mounted, an
SPE accesses LS directly at high speed. DMA transfer is used
for data transfer between main memory and LS.
Memory flow controller
Access from SPE to main memory, PPE and other external
I/O devices is performed through a unit called the Memory
Flow Controller (MFC). The MFC is equipped with a DMA
Controller to exchange data with the main memory. Because
the MFC operates independently of program execution in an
SPE, it can transfer data to external devices in parallel.
Graphics engine with high-speed I/O function
- RSX -
The RSX is a graphics engine jointly developed by NVIDIA
Corporation and SCEI. This device can be connected to the
Cell/B.E. using a 128-bit memory interface called FlexIO™.
Programmable shader engine
The RSX processes graphics with vertex shader programs
created by the user and computes fragment colors with
fragment shader programs. The RSX has multiple fragment
shader engines for precise graphics operation without
overloading the Cell/B.E., which is difficult in fixed pipelined
graphics engines.
Rendering pipeline
The rendering pipeline in the RSX consists of the host/
frontend, geometry, raster, fragmentation shader/texture,
L2 texture cache, 2D raster, and Raster Operation (ROP). It
also includes a frame buffer to control memory access and a
display block to scan out from local memory (Fig. 3).
Fig. 2 Overview of the Cell/B.E. and Peripheral Chips
IOIF1/FlexIO™
MIC/XIO IOIF0/FlexIO™
PPE
PPU
L1
L2
SPE1
SPU
LS
MFC
SPE3
SPU
LS
MFC
SPE5
SPU
LS
MFC
SPE7
SPU
LS
MFC
RSX®
XDR™
DRAM
SPE0
SPU
LS
MFC
SPE2
SPU
LS
MFC
SPE4
SPU
LS
MFC
SPE6
SPU
LS
MFC
South
Bridge
EIB
3
The Basics of Cell Computing Technology WHITE PAPER
High-speed I/O function
The Cell/B.E. and the RSX are connected via a high-speed
bus called FlexIO for efficient graphics operations. The RSX
also features an external I/O function with performance
exceeding 4 GB/s. Therefore, it can easily handle the I/O
of high-resolution video signals, such as 4K x 2K and High
Definition. It can also be used in I/F for professional use by
taking advantage of this high-speed I/O performance.
Sony's New Development
-"Cell Computing Unit"
Features of Cell Computing Unit
The newly developed “Cell Computing Unit” is equipped with
the high-performance Cell/B.E. processor with high floating-
point arithmetic operation of 230 GFLOPS. Additional use
of the RSX graphics engine enables much faster operation.
An open system is offered by adopting the Linux operating
system. Its eight SPEs can be freely used to support high-level
of calculations in multimedia applications.
Software configuration
Cell Computing Unit is equipped with two systems: the main
system for general applications and the mini system for
maintenance. Linux is the operating system so that they can
be switched at system startup. Under normal operation, the
main system executes user applications. The mini system for
maintenance is mainly used to update firmware and drivers.
Graphics API
Graphics programs for the RSX support the subset of OpenGL
1.5, which is an industry standard for graphics APIs. They
also support the Cg shader language developed by NVIDIA
Corporation for RSX-native graphics control. Therefore, the
RSX acceleration can be programmed at a low level.
Software development
Applications running on the main system of the Cell
Computing Unit can be developed under Linux on Intel
architecture (x86). A cross-platform toolchain (binutils,
compilers, etc.), debuggers, performance analysis tools, and
integrated development environments are used.
The SPE runtime management library for the Cell/B.E. and the
graphics library for RSX hardware acceleration are currently
being prepared.
Porting and optimization of the existing programs
In optimized porting of existing programs, it is important
to understand how multiple SPEs are used. The basic
optimization process of existing programs with the Cell/B.E.
consist of three processes: (1) porting to PPE, (2) performance
measurement and profiling, and (3) offloading to SPEs
(Fig. 4). SPE offloading ports part of the processing in a
program to the various SPEs. In the optimization process,
a spiral type process is generally used, where profiling is
performed again after optimization. Then, further optimization
in performance is possible.
Faster programs
One can drastically improve the processing performance of
programs with the Cell/B.E. by using a variety of optimization
techniques. The most important optimization technique
is how parallel they are. Programs are designed and
implemented so that the following three parallelisms may be
improved in every aspect.
Host/FE
Geometry
Raster 2D Raster
Fragment Shader
& Texture
L2 Texture
Cache
FB
ROP
(Raster Operation)
IOIF/FlexIO™
Cell/B.E.
Fig. 3 Major blocks in rendering pipeline
Fig. 4 Basic process of optimization
Existing Programs
Porting to PPE
Performance Analysis/Profiling
Decide optimazation strategies
SPE Offloading
Other Architectures
Power Architecture™
Cell/B.E. Architecture
4
WHITE PAPER The Basics of Cell Computing Technology
Parallelism at the thread level
The PPE can execute two threads in parallel with
simultaneous multithreading. When eight SPEs are used, it
can execute up to ten threads in parallel. This is because,
in general, the PPE thread can directly access external
I/O devices and handle such interruptions as hardware
exceptions, it controls the entire system, while the SPE thread
mainly handles data calculations. What kind of processing is
performed in each thread must be considered to distribute the
load of the PPE and SPE threads.
Parallelism at the data level
The streaming data operation repeats the same processing
for multiple data. In the operation, one SIMD instruction
can process multiple data in parallel (Fig. 5). All instructions
in the SPE, including
arithmetic, comparison,
and bit operation
instructions, support
SIMD operation. The
SIMD operation is also
available in the PPE by
using VMX instructions
extended from AltiVec
technology.
Parallelism at the instruction level
An SPE has two long pipelines (Even/Odd) with 26 stages.
The instructions executed in these two pipelines are
predetermined: arithmetic operations are executed in the
Even pipeline, and load/store instructions are executed in the
Odd pipeline. By optimizing the instruction sequence of a
program, two instructions can be issued simultaneously per
cycle (Dual Issue).
Conclusion
Sony Corporation will continue to develop technologies that
enable large-scale calculation at high speed, which could not
be obtained from traditional servers and workstations. We will
strive for development of state-of-the-art technologies in high
performance computing, such as CG rendering and scientific
calculations by utilizing the maximum performance of the
high-end Cell/B.E. processor and the high-speed GPU RSX.
Trademarks
Sony is a registered trademark of Sony Corporation.
PLAYSTATION is a registered trademark of Sony Computer Entertainment Inc.
mental images, mental ray and mental mill are registered trademarks and
MetaSL is a trademark of mental images GmbH.
All other trademarks are the properties of their respective owners.
Disclaimer
© 2008 Sony Corporation, All rights reserved.
Reproduction in whole or in part without written permission of Sony
Corporation is prohibited.
Features and specifications are subject to change without notice.
Fig. 5 SIMD operation (ADD instruction)
a3a2a1a0
++++
b3b2b1b0
a3+b3a2+b2a1+b1a0+b0

More Related Content

What's hot

Virtual Memory In Contemporary Microprocessors And 64-Bit Microprocessors Arc...
Virtual Memory In Contemporary Microprocessors And 64-Bit Microprocessors Arc...Virtual Memory In Contemporary Microprocessors And 64-Bit Microprocessors Arc...
Virtual Memory In Contemporary Microprocessors And 64-Bit Microprocessors Arc...
Anurag Deb
 
Computer Organization Lecture Notes
Computer Organization Lecture NotesComputer Organization Lecture Notes
Computer Organization Lecture Notes
FellowBuddy.com
 
Avionics Paperdoc
Avionics PaperdocAvionics Paperdoc
Avionics Paperdoc
Falascoj
 
Processors and its Types
Processors and its TypesProcessors and its Types
Processors and its Types
Nimrah Shahbaz
 
Cliff sugerman
Cliff sugermanCliff sugerman
Cliff sugerman
clifford sugerman
 
Computer processing
Computer processingComputer processing
Computer processing
Shakila Mahjabin
 
parallel processing
parallel processingparallel processing
parallel processing
Sufiyana Mujawar
 
Bus Standards and Networking
Bus Standards and NetworkingBus Standards and Networking
Bus Standards and Networking
Prabu U
 
Computer Organization (Unit-1)
Computer Organization (Unit-1)Computer Organization (Unit-1)
Computer Organization (Unit-1)
Harsh Pandya
 
Coa
CoaCoa
Sybsc cs sem 3 physical computing and iot programming unit 1
Sybsc cs sem 3 physical computing and iot programming unit 1Sybsc cs sem 3 physical computing and iot programming unit 1
Sybsc cs sem 3 physical computing and iot programming unit 1
WE-IT TUTORIALS
 
System on chip architectures
System on chip architecturesSystem on chip architectures
System on chip architectures
A B Shinde
 
iPhone Architecture - Review
iPhone Architecture - ReviewiPhone Architecture - Review
iPhone Architecture - Review
Abdelrahman Hosny
 
Introduction to Microprocessors
Introduction to MicroprocessorsIntroduction to Microprocessors
Introduction to Microprocessors
76 Degree Creative
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
GIKI
 
generations of computer
generations of computergenerations of computer
generations of computer
abdulraufsiddiqui1
 
Coa presentation4
Coa presentation4Coa presentation4
Coa presentation4
rickypatel151
 
Computer Architecture and organization
Computer Architecture and organizationComputer Architecture and organization
Computer Architecture and organization
Badrinath Kadam
 
Ca lecture 03
Ca lecture 03Ca lecture 03
Ca lecture 03
Haris456
 
computer organization and architecture notes
computer organization and architecture notescomputer organization and architecture notes
computer organization and architecture notes
Upasana Talukdar
 

What's hot (20)

Virtual Memory In Contemporary Microprocessors And 64-Bit Microprocessors Arc...
Virtual Memory In Contemporary Microprocessors And 64-Bit Microprocessors Arc...Virtual Memory In Contemporary Microprocessors And 64-Bit Microprocessors Arc...
Virtual Memory In Contemporary Microprocessors And 64-Bit Microprocessors Arc...
 
Computer Organization Lecture Notes
Computer Organization Lecture NotesComputer Organization Lecture Notes
Computer Organization Lecture Notes
 
Avionics Paperdoc
Avionics PaperdocAvionics Paperdoc
Avionics Paperdoc
 
Processors and its Types
Processors and its TypesProcessors and its Types
Processors and its Types
 
Cliff sugerman
Cliff sugermanCliff sugerman
Cliff sugerman
 
Computer processing
Computer processingComputer processing
Computer processing
 
parallel processing
parallel processingparallel processing
parallel processing
 
Bus Standards and Networking
Bus Standards and NetworkingBus Standards and Networking
Bus Standards and Networking
 
Computer Organization (Unit-1)
Computer Organization (Unit-1)Computer Organization (Unit-1)
Computer Organization (Unit-1)
 
Coa
CoaCoa
Coa
 
Sybsc cs sem 3 physical computing and iot programming unit 1
Sybsc cs sem 3 physical computing and iot programming unit 1Sybsc cs sem 3 physical computing and iot programming unit 1
Sybsc cs sem 3 physical computing and iot programming unit 1
 
System on chip architectures
System on chip architecturesSystem on chip architectures
System on chip architectures
 
iPhone Architecture - Review
iPhone Architecture - ReviewiPhone Architecture - Review
iPhone Architecture - Review
 
Introduction to Microprocessors
Introduction to MicroprocessorsIntroduction to Microprocessors
Introduction to Microprocessors
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
generations of computer
generations of computergenerations of computer
generations of computer
 
Coa presentation4
Coa presentation4Coa presentation4
Coa presentation4
 
Computer Architecture and organization
Computer Architecture and organizationComputer Architecture and organization
Computer Architecture and organization
 
Ca lecture 03
Ca lecture 03Ca lecture 03
Ca lecture 03
 
computer organization and architecture notes
computer organization and architecture notescomputer organization and architecture notes
computer organization and architecture notes
 

Similar to The Basics of Cell Computing Technology

Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...
Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...
Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...
Slide_N
 
Synergistic processing in cell's multicore architecture
Synergistic processing in cell's multicore architectureSynergistic processing in cell's multicore architecture
Synergistic processing in cell's multicore architecture
Michael Gschwind
 
An area and power efficient on chip communication architectures for image enc...
An area and power efficient on chip communication architectures for image enc...An area and power efficient on chip communication architectures for image enc...
An area and power efficient on chip communication architectures for image enc...
eSAT Publishing House
 
UNIT 1 SONCA.pptx
UNIT 1 SONCA.pptxUNIT 1 SONCA.pptx
UNIT 1 SONCA.pptx
mohan134666
 
computer processors intel and amd
computer processors intel and amdcomputer processors intel and amd
computer processors intel and amd
Rohit Gada
 
The Cell Processor
The Cell ProcessorThe Cell Processor
The Cell Processor
Heiko Joerg Schick
 
Michael Gschwind et al, "An Open Source Environment for Cell Broadband Engine...
Michael Gschwind et al, "An Open Source Environment for Cell Broadband Engine...Michael Gschwind et al, "An Open Source Environment for Cell Broadband Engine...
Michael Gschwind et al, "An Open Source Environment for Cell Broadband Engine...
Michael Gschwind
 
Designing of telecommand system using system on chip soc for spacecraft contr...
Designing of telecommand system using system on chip soc for spacecraft contr...Designing of telecommand system using system on chip soc for spacecraft contr...
Designing of telecommand system using system on chip soc for spacecraft contr...
IAEME Publication
 
Glossary of terms (assignment...)
Glossary of terms (assignment...)Glossary of terms (assignment...)
Glossary of terms (assignment...)
gordonpj96
 
System_on_Chip_SOC.ppt
System_on_Chip_SOC.pptSystem_on_Chip_SOC.ppt
System_on_Chip_SOC.ppt
zahixdd
 
Assembly Language for x86 Processors 7th Edition Chapter 2 : x86 Processor Ar...
Assembly Language for x86 Processors 7th Edition Chapter 2 : x86 Processor Ar...Assembly Language for x86 Processors 7th Edition Chapter 2 : x86 Processor Ar...
Assembly Language for x86 Processors 7th Edition Chapter 2 : x86 Processor Ar...
ssuser65bfce
 
UNIT 1.docx
UNIT 1.docxUNIT 1.docx
UNIT 1.docx
Nagendrababu Vasa
 
Glossary of terms (assignment...)
Glossary of terms (assignment...)Glossary of terms (assignment...)
Glossary of terms (assignment...)
gordonpj96
 
Embedded System basic and classifications
Embedded System basic and classificationsEmbedded System basic and classifications
Embedded System basic and classifications
rajkciitr
 
Electronics Engineer Portfolio
Electronics Engineer PortfolioElectronics Engineer Portfolio
Electronics Engineer Portfolio
Anupama Sujith
 
The Best Programming Practice for Cell/B.E.
The Best Programming Practice for Cell/B.E.The Best Programming Practice for Cell/B.E.
The Best Programming Practice for Cell/B.E.
Slide_N
 
Implementation of RISC-Based Architecture for Low power applications
Implementation of RISC-Based Architecture for Low power applicationsImplementation of RISC-Based Architecture for Low power applications
Implementation of RISC-Based Architecture for Low power applications
IOSR Journals
 
Physical computing and iot programming final with cp sycs sem 3
Physical computing and iot programming final with cp sycs sem 3Physical computing and iot programming final with cp sycs sem 3
Physical computing and iot programming final with cp sycs sem 3
WE-IT TUTORIALS
 
chameleon chip
chameleon chipchameleon chip
chameleon chip
Sucharita Bohidar
 
An introduction to digital signal processors 1
An introduction to digital signal processors 1An introduction to digital signal processors 1
An introduction to digital signal processors 1
Hossam Hassan
 

Similar to The Basics of Cell Computing Technology (20)

Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...
Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...
Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...
 
Synergistic processing in cell's multicore architecture
Synergistic processing in cell's multicore architectureSynergistic processing in cell's multicore architecture
Synergistic processing in cell's multicore architecture
 
An area and power efficient on chip communication architectures for image enc...
An area and power efficient on chip communication architectures for image enc...An area and power efficient on chip communication architectures for image enc...
An area and power efficient on chip communication architectures for image enc...
 
UNIT 1 SONCA.pptx
UNIT 1 SONCA.pptxUNIT 1 SONCA.pptx
UNIT 1 SONCA.pptx
 
computer processors intel and amd
computer processors intel and amdcomputer processors intel and amd
computer processors intel and amd
 
The Cell Processor
The Cell ProcessorThe Cell Processor
The Cell Processor
 
Michael Gschwind et al, "An Open Source Environment for Cell Broadband Engine...
Michael Gschwind et al, "An Open Source Environment for Cell Broadband Engine...Michael Gschwind et al, "An Open Source Environment for Cell Broadband Engine...
Michael Gschwind et al, "An Open Source Environment for Cell Broadband Engine...
 
Designing of telecommand system using system on chip soc for spacecraft contr...
Designing of telecommand system using system on chip soc for spacecraft contr...Designing of telecommand system using system on chip soc for spacecraft contr...
Designing of telecommand system using system on chip soc for spacecraft contr...
 
Glossary of terms (assignment...)
Glossary of terms (assignment...)Glossary of terms (assignment...)
Glossary of terms (assignment...)
 
System_on_Chip_SOC.ppt
System_on_Chip_SOC.pptSystem_on_Chip_SOC.ppt
System_on_Chip_SOC.ppt
 
Assembly Language for x86 Processors 7th Edition Chapter 2 : x86 Processor Ar...
Assembly Language for x86 Processors 7th Edition Chapter 2 : x86 Processor Ar...Assembly Language for x86 Processors 7th Edition Chapter 2 : x86 Processor Ar...
Assembly Language for x86 Processors 7th Edition Chapter 2 : x86 Processor Ar...
 
UNIT 1.docx
UNIT 1.docxUNIT 1.docx
UNIT 1.docx
 
Glossary of terms (assignment...)
Glossary of terms (assignment...)Glossary of terms (assignment...)
Glossary of terms (assignment...)
 
Embedded System basic and classifications
Embedded System basic and classificationsEmbedded System basic and classifications
Embedded System basic and classifications
 
Electronics Engineer Portfolio
Electronics Engineer PortfolioElectronics Engineer Portfolio
Electronics Engineer Portfolio
 
The Best Programming Practice for Cell/B.E.
The Best Programming Practice for Cell/B.E.The Best Programming Practice for Cell/B.E.
The Best Programming Practice for Cell/B.E.
 
Implementation of RISC-Based Architecture for Low power applications
Implementation of RISC-Based Architecture for Low power applicationsImplementation of RISC-Based Architecture for Low power applications
Implementation of RISC-Based Architecture for Low power applications
 
Physical computing and iot programming final with cp sycs sem 3
Physical computing and iot programming final with cp sycs sem 3Physical computing and iot programming final with cp sycs sem 3
Physical computing and iot programming final with cp sycs sem 3
 
chameleon chip
chameleon chipchameleon chip
chameleon chip
 
An introduction to digital signal processors 1
An introduction to digital signal processors 1An introduction to digital signal processors 1
An introduction to digital signal processors 1
 

More from Slide_N

Efficient Usage of Compute Shaders on Xbox One and PS4
Efficient Usage of Compute Shaders on Xbox One and PS4Efficient Usage of Compute Shaders on Xbox One and PS4
Efficient Usage of Compute Shaders on Xbox One and PS4
Slide_N
 
Future Commodity Chip Called CELL for HPC
Future Commodity Chip Called CELL for HPCFuture Commodity Chip Called CELL for HPC
Future Commodity Chip Called CELL for HPC
Slide_N
 
Common Software Models and Platform for Cell and SpursEngine
Common Software Models and Platform for Cell and SpursEngineCommon Software Models and Platform for Cell and SpursEngine
Common Software Models and Platform for Cell and SpursEngine
Slide_N
 
Towards Cell Broadband Engine - Together with Playstation
Towards Cell Broadband Engine  - Together with PlaystationTowards Cell Broadband Engine  - Together with Playstation
Towards Cell Broadband Engine - Together with Playstation
Slide_N
 
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
Slide_N
 
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdfParallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Slide_N
 
Experiences with PlayStation VR - Sony Interactive Entertainment
Experiences with PlayStation VR  - Sony Interactive EntertainmentExperiences with PlayStation VR  - Sony Interactive Entertainment
Experiences with PlayStation VR - Sony Interactive Entertainment
Slide_N
 
SPU-based Deferred Shading for Battlefield 3 on Playstation 3
SPU-based Deferred Shading for Battlefield 3 on Playstation 3SPU-based Deferred Shading for Battlefield 3 on Playstation 3
SPU-based Deferred Shading for Battlefield 3 on Playstation 3
Slide_N
 
Filtering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-AliasingFiltering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-Aliasing
Slide_N
 
Chip Multiprocessing and the Cell Broadband Engine.pdf
Chip Multiprocessing and the Cell Broadband Engine.pdfChip Multiprocessing and the Cell Broadband Engine.pdf
Chip Multiprocessing and the Cell Broadband Engine.pdf
Slide_N
 
Cell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology GroupCell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology Group
Slide_N
 
New Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiNew Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - Kutaragi
Slide_N
 
Sony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSony Transformation 60 - Kutaragi
Sony Transformation 60 - Kutaragi
Slide_N
 
Sony Transformation 60
Sony Transformation 60 Sony Transformation 60
Sony Transformation 60
Slide_N
 
Moving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomMoving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living Room
Slide_N
 
The Technology behind PlayStation 2
The Technology behind PlayStation 2The Technology behind PlayStation 2
The Technology behind PlayStation 2
Slide_N
 
Cell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationCell Technology for Graphics and Visualization
Cell Technology for Graphics and Visualization
Slide_N
 
Industry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignIndustry Trends in Microprocessor Design
Industry Trends in Microprocessor Design
Slide_N
 
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotTranslating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
Slide_N
 
Cellular Neural Networks: Theory
Cellular Neural Networks: TheoryCellular Neural Networks: Theory
Cellular Neural Networks: Theory
Slide_N
 

More from Slide_N (20)

Efficient Usage of Compute Shaders on Xbox One and PS4
Efficient Usage of Compute Shaders on Xbox One and PS4Efficient Usage of Compute Shaders on Xbox One and PS4
Efficient Usage of Compute Shaders on Xbox One and PS4
 
Future Commodity Chip Called CELL for HPC
Future Commodity Chip Called CELL for HPCFuture Commodity Chip Called CELL for HPC
Future Commodity Chip Called CELL for HPC
 
Common Software Models and Platform for Cell and SpursEngine
Common Software Models and Platform for Cell and SpursEngineCommon Software Models and Platform for Cell and SpursEngine
Common Software Models and Platform for Cell and SpursEngine
 
Towards Cell Broadband Engine - Together with Playstation
Towards Cell Broadband Engine  - Together with PlaystationTowards Cell Broadband Engine  - Together with Playstation
Towards Cell Broadband Engine - Together with Playstation
 
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
 
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdfParallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
 
Experiences with PlayStation VR - Sony Interactive Entertainment
Experiences with PlayStation VR  - Sony Interactive EntertainmentExperiences with PlayStation VR  - Sony Interactive Entertainment
Experiences with PlayStation VR - Sony Interactive Entertainment
 
SPU-based Deferred Shading for Battlefield 3 on Playstation 3
SPU-based Deferred Shading for Battlefield 3 on Playstation 3SPU-based Deferred Shading for Battlefield 3 on Playstation 3
SPU-based Deferred Shading for Battlefield 3 on Playstation 3
 
Filtering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-AliasingFiltering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-Aliasing
 
Chip Multiprocessing and the Cell Broadband Engine.pdf
Chip Multiprocessing and the Cell Broadband Engine.pdfChip Multiprocessing and the Cell Broadband Engine.pdf
Chip Multiprocessing and the Cell Broadband Engine.pdf
 
Cell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology GroupCell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology Group
 
New Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiNew Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - Kutaragi
 
Sony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSony Transformation 60 - Kutaragi
Sony Transformation 60 - Kutaragi
 
Sony Transformation 60
Sony Transformation 60 Sony Transformation 60
Sony Transformation 60
 
Moving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomMoving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living Room
 
The Technology behind PlayStation 2
The Technology behind PlayStation 2The Technology behind PlayStation 2
The Technology behind PlayStation 2
 
Cell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationCell Technology for Graphics and Visualization
Cell Technology for Graphics and Visualization
 
Industry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignIndustry Trends in Microprocessor Design
Industry Trends in Microprocessor Design
 
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotTranslating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
 
Cellular Neural Networks: Theory
Cellular Neural Networks: TheoryCellular Neural Networks: Theory
Cellular Neural Networks: Theory
 

Recently uploaded

Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
Shinana2
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
saastr
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
Pravash Chandra Das
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
GDSC PJATK
 

Recently uploaded (20)

Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
 

The Basics of Cell Computing Technology

  • 1. Introduction Sony Corporation develops technologies for products used in graphics operations and scientific calculation using the Cell Broadband EngineTM (Cell/B.E.) and the RSX employed in the PLAYSTATION® 3, which are developed by Sony Computer Entertainment Incorporated (SCEI). This document explains the principal technologies. Trend toward multi-core processors Drastic improvements in performance in recent years have enabled microprocessors to handle large amounts of data at high speed. The target applications are being extended, and the amount of computation required in each field is increasing. In addition to conventional general-purpose processors, recently developed application-specific processors exploit the full potential of their performance for a given application. For example, the Graphics Processing Unit (GPU) is a special processor for processing 2D/3D graphics at high speed. This provides very high computational performance that cannot be achieved by conventional general-purpose processors. The trend in the microprocessor market is moving to asymmetric multi-core processors, where different types of processors optimized for a specific use coexist. Considering this technology trend, Sony Corporation has developed the “Cell Computing Unit”, which uses the high computational performance provided by the Cell/B.E. and the RSX processors. This technology provides solutions to multimedia computing, which requires large number of computations, such as those encountered in image-processing applications, like computer graphics, and scientific computations for non-image- processing applications, such as encryption technology for security (Fig. 1). Multi-core processor with powerful computational performance - Cell/B.E. - The Cell/B.E. is an asymmetric multi-core processor jointly developed by Sony/SCEI, Toshiba Corporation, and IBM Corporation. This processor alone provides high floating-point arithmetic operations at 230 GFLOPS. In anticipation of real- time operation on 4K x 2K or High Definition images, the Cell processor is optimally designed for multimedia processing and usage in distributed computing environments. Architecture of the Cell/B.E. Two types of processor cores Two types of processor cores are mounted in the Cell/B.E.: PowerPC Processor Element (PPE), a control-intensive processor core for general-purpose processing; and Synergistic Processor Element (SPE), a processor core for high-speed data processing. The coordinated operation of these processor cores give rise to the high level of computational performance of the Cell/B.E.. The Basics of Cell Computing Technology GPU Target area Cell/B.E. 3D Volume Rendering Game CG Rendering Image Processing Search AV Codec Vision Computing Security (AES/DES) Science Computation Science Visualization CAD GUI Web Document Editor File Sysrem Network Protocol GPU Non-Graphic Processing Monolithic Graphic Processing Parallel Processing Fig. 1 Target area of Cell Computing Technology 1 WHITE PAPER
  • 2. 2 WHITE PAPER The Basics of Cell Computing Technology High-speed internal bus Four ring-shaped broadband buses called Element Interconnect Bus (EIB) connect the cores of the Cell/B.E. and peripheral chips. PPE, SPE, main memory (XDR™ DRAM), and external I/O devices (the RSX, SouthBridge, etc.) are also connected to the EIB (Fig. 2). General-purpose processor core - PPE - PPE is a general-purpose processor core with a 64-bit Power Architecture™ and can run existing software compatible with the PowerPC chip. One PPE is mounted in the Cell/B.E.. The PPE not only executes the operating system (input/output control to main memory, external devices, etc.) but also controls the SPEs. L1/L2 caches The PPE is equipped with 32 KB of L1 cache for instructions and data, respectively, and 512 KB of unified L2 cache for both instructions and data. Hardware multithreading Simultaneous multithreading technology is implemented in the PPE to provide two-way hardware multithreading. Vector operation unit A 128-bit vector operation unit supports the Single Instruction Multiple Data (SIMD) operations by Vector/SIMD Multimedia Extensions (VMX) technology. Computation-intensive processor core - SPE - SPE is a computationally intensive processor core that excels at multimedia processing, especially for repeated calculations. The PPE, on the other hand is optimized for complicated program control. Eight SPEs are employed in the Cell/B.E., Each SPE operates independently of the PPE and other SPEs and provides high computational performance. Vector oriented RISC processor The SPU instruction set architecture (ISA) supports 128-bit SIMD operations. All basic data processing is performed by this operation. Because of its very simple architecture, the SPE can realize high-speed processing where calculations can be predicted. It is very suitable for execution of an application where real time processing is required. Abundant register files As many as 128 of the 128-bit registers are on the SPE because of the large amount of data processing is likely to be performed by this device. By taking advantage of the abundant register files, high-speed arithmetic operations can be performed on a large volume of data. SPE-dedicated local memory Each SPE is equipped with 256 KB of scratch pad type memory called Local Store (LS). With no cache mounted, an SPE accesses LS directly at high speed. DMA transfer is used for data transfer between main memory and LS. Memory flow controller Access from SPE to main memory, PPE and other external I/O devices is performed through a unit called the Memory Flow Controller (MFC). The MFC is equipped with a DMA Controller to exchange data with the main memory. Because the MFC operates independently of program execution in an SPE, it can transfer data to external devices in parallel. Graphics engine with high-speed I/O function - RSX - The RSX is a graphics engine jointly developed by NVIDIA Corporation and SCEI. This device can be connected to the Cell/B.E. using a 128-bit memory interface called FlexIO™. Programmable shader engine The RSX processes graphics with vertex shader programs created by the user and computes fragment colors with fragment shader programs. The RSX has multiple fragment shader engines for precise graphics operation without overloading the Cell/B.E., which is difficult in fixed pipelined graphics engines. Rendering pipeline The rendering pipeline in the RSX consists of the host/ frontend, geometry, raster, fragmentation shader/texture, L2 texture cache, 2D raster, and Raster Operation (ROP). It also includes a frame buffer to control memory access and a display block to scan out from local memory (Fig. 3). Fig. 2 Overview of the Cell/B.E. and Peripheral Chips IOIF1/FlexIO™ MIC/XIO IOIF0/FlexIO™ PPE PPU L1 L2 SPE1 SPU LS MFC SPE3 SPU LS MFC SPE5 SPU LS MFC SPE7 SPU LS MFC RSX® XDR™ DRAM SPE0 SPU LS MFC SPE2 SPU LS MFC SPE4 SPU LS MFC SPE6 SPU LS MFC South Bridge EIB
  • 3. 3 The Basics of Cell Computing Technology WHITE PAPER High-speed I/O function The Cell/B.E. and the RSX are connected via a high-speed bus called FlexIO for efficient graphics operations. The RSX also features an external I/O function with performance exceeding 4 GB/s. Therefore, it can easily handle the I/O of high-resolution video signals, such as 4K x 2K and High Definition. It can also be used in I/F for professional use by taking advantage of this high-speed I/O performance. Sony's New Development -"Cell Computing Unit" Features of Cell Computing Unit The newly developed “Cell Computing Unit” is equipped with the high-performance Cell/B.E. processor with high floating- point arithmetic operation of 230 GFLOPS. Additional use of the RSX graphics engine enables much faster operation. An open system is offered by adopting the Linux operating system. Its eight SPEs can be freely used to support high-level of calculations in multimedia applications. Software configuration Cell Computing Unit is equipped with two systems: the main system for general applications and the mini system for maintenance. Linux is the operating system so that they can be switched at system startup. Under normal operation, the main system executes user applications. The mini system for maintenance is mainly used to update firmware and drivers. Graphics API Graphics programs for the RSX support the subset of OpenGL 1.5, which is an industry standard for graphics APIs. They also support the Cg shader language developed by NVIDIA Corporation for RSX-native graphics control. Therefore, the RSX acceleration can be programmed at a low level. Software development Applications running on the main system of the Cell Computing Unit can be developed under Linux on Intel architecture (x86). A cross-platform toolchain (binutils, compilers, etc.), debuggers, performance analysis tools, and integrated development environments are used. The SPE runtime management library for the Cell/B.E. and the graphics library for RSX hardware acceleration are currently being prepared. Porting and optimization of the existing programs In optimized porting of existing programs, it is important to understand how multiple SPEs are used. The basic optimization process of existing programs with the Cell/B.E. consist of three processes: (1) porting to PPE, (2) performance measurement and profiling, and (3) offloading to SPEs (Fig. 4). SPE offloading ports part of the processing in a program to the various SPEs. In the optimization process, a spiral type process is generally used, where profiling is performed again after optimization. Then, further optimization in performance is possible. Faster programs One can drastically improve the processing performance of programs with the Cell/B.E. by using a variety of optimization techniques. The most important optimization technique is how parallel they are. Programs are designed and implemented so that the following three parallelisms may be improved in every aspect. Host/FE Geometry Raster 2D Raster Fragment Shader & Texture L2 Texture Cache FB ROP (Raster Operation) IOIF/FlexIO™ Cell/B.E. Fig. 3 Major blocks in rendering pipeline Fig. 4 Basic process of optimization Existing Programs Porting to PPE Performance Analysis/Profiling Decide optimazation strategies SPE Offloading Other Architectures Power Architecture™ Cell/B.E. Architecture
  • 4. 4 WHITE PAPER The Basics of Cell Computing Technology Parallelism at the thread level The PPE can execute two threads in parallel with simultaneous multithreading. When eight SPEs are used, it can execute up to ten threads in parallel. This is because, in general, the PPE thread can directly access external I/O devices and handle such interruptions as hardware exceptions, it controls the entire system, while the SPE thread mainly handles data calculations. What kind of processing is performed in each thread must be considered to distribute the load of the PPE and SPE threads. Parallelism at the data level The streaming data operation repeats the same processing for multiple data. In the operation, one SIMD instruction can process multiple data in parallel (Fig. 5). All instructions in the SPE, including arithmetic, comparison, and bit operation instructions, support SIMD operation. The SIMD operation is also available in the PPE by using VMX instructions extended from AltiVec technology. Parallelism at the instruction level An SPE has two long pipelines (Even/Odd) with 26 stages. The instructions executed in these two pipelines are predetermined: arithmetic operations are executed in the Even pipeline, and load/store instructions are executed in the Odd pipeline. By optimizing the instruction sequence of a program, two instructions can be issued simultaneously per cycle (Dual Issue). Conclusion Sony Corporation will continue to develop technologies that enable large-scale calculation at high speed, which could not be obtained from traditional servers and workstations. We will strive for development of state-of-the-art technologies in high performance computing, such as CG rendering and scientific calculations by utilizing the maximum performance of the high-end Cell/B.E. processor and the high-speed GPU RSX. Trademarks Sony is a registered trademark of Sony Corporation. PLAYSTATION is a registered trademark of Sony Computer Entertainment Inc. mental images, mental ray and mental mill are registered trademarks and MetaSL is a trademark of mental images GmbH. All other trademarks are the properties of their respective owners. Disclaimer © 2008 Sony Corporation, All rights reserved. Reproduction in whole or in part without written permission of Sony Corporation is prohibited. Features and specifications are subject to change without notice. Fig. 5 SIMD operation (ADD instruction) a3a2a1a0 ++++ b3b2b1b0 a3+b3a2+b2a1+b1a0+b0