Design Issues of IBM Cell Architecture
Vitthal Gutthe MEIT 1326
Pravin kumar Yadav MEIT 1338
Vyanktesh Dorlikar MEIT 1324
Contents
• General introduction
• History of development
• Technical overview of architecture
• Detailed technical discussion of components
• Design choices
• Cell programming issues
History of Development
• Sony PlayStation 2
  • Released March 2000 in Japan
  • 128-bit “Emotion Engine”
  • MIPS CPU clocked at 294 MHz
  • Capable of 6.2 GFLOPS (giga floating-point operations per second)
History Continued
• Partnership between Sony, Toshiba, and IBM formed in summer 2000
• Initial goal: 1000 × PS2 processing power in a single machine
• March 2001: Sony-IBM-Toshiba design center opened with an investment of $400 million
Overall Goals for Cell
• High performance in multimedia applications
• Real-time performance
• Minimal power consumption
• Cost as low as possible
• Available by 2005
• Avoid memory latency issues associated with control structures
The Cell Itself
• PowerPC-based main core (PPE)
• Multiple Synergistic Processing Elements (SPEs)
• On-die memory controller
• Inter-core transport bus
• High-speed I/O
Cell Die Layout
Cell Implementation
• Cell is an architecture
• Preliminary implementation
  • 1 PPE
  • 8 SPEs (1 disabled to improve yield in the PS3, leaving 7 usable)
  • 221 mm² die on a 90 nm process
  • Clocked at 3-4 GHz
  • 256 GFLOPS single precision at 4 GHz
Why a Cell Architecture
• Follows a trend in computing architecture
• Natural extension of dual-core and multi-core designs
• Extremely low hardware overhead
• Software controllable
• Specialized hardware is more useful for multimedia
Possible Uses
• PlayStation 3 (obviously)
• Blade servers (IBM)
  • Amazing single-precision FP performance
  • Scientific applications
• Toshiba HDTV products
Power Processing Element
• PowerPC instruction set with AltiVec
• Used for general-purpose computing and for controlling the SPEs
• Simultaneous multithreading
• Separate 32 KB L1 instruction and data caches and a unified 512 KB L2 cache
PPE (cont.)
• Slow but power-efficient implementation of the PowerPC instruction set
• Two-issue, in-order instruction fetch
• Conspicuous lack of an instruction window
• Compare to conventional PowerPC implementations (G5)
• Performance depends on SPE utilization
Synergistic Processing Element (SPE)
• Specialized hardware
• Meant to be used in parallel
  • (7 on the PS3 implementation)
• On-chip memory (256 KB)
• No branch prediction
• In-order execution
• Dual issue
SPE Architecture
• 0.99 µm² on a 90 nm process
• 128 registers (128 bits wide)
  • Instructions assumed to be 4 × 32-bit
• Variant of the VMX instruction set
  • Modified for 128 registers
• On-chip memory is NOT a cache
SPE Execution
• Dual issue, in-order
• Seven execution units
• Vector logic
• 8 single-precision operations per cycle
• Significant performance hit for double precision
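The per-cycle figure above connects to the 256 GFLOPS at 4 GHz quoted for the preliminary implementation. The sketch below multiplies it out. It assumes that figure counts eight SPEs and ignores any PPE contribution, which is an inference from the arithmetic, not something the slides state. The function name is illustrative only.

```python
def peak_sp_gflops(num_spes: int, sp_ops_per_cycle: int, clock_ghz: float) -> float:
    """Peak single-precision GFLOPS: SPEs x operations per cycle x clock in GHz."""
    return num_spes * sp_ops_per_cycle * clock_ghz

# Eight SPEs at 8 operations per cycle and 4 GHz reproduce the quoted 256 GFLOPS.
full_chip = peak_sp_gflops(num_spes=8, sp_ops_per_cycle=8, clock_ghz=4.0)
# With one SPE disabled for yield, seven remain at the same clock (assumed).
seven_spes = peak_sp_gflops(num_spes=7, sp_ops_per_cycle=8, clock_ghz=4.0)
print(full_chip, seven_spes)
```

These are peak numbers only. Sustained throughput depends on keeping every SPE fed with data.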
SPE Execution Diagram
SPE Local Storage Area
• NOT a cache
• 256 KB: 4 × 64 KB ECC single-port SRAM
• Completely private to each SPE
• Directly addressable by software
• Can be used as a cache, but only under software control
• No tag bits or any other extra hardware
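To make "a cache only under software control" concrete, here is a minimal sketch of a direct-mapped cache whose tags are ordinary program data, not hardware bits. Everything in it is an assumption for illustration: the class name, the 128-byte line size, the 16-line capacity, and the byte-slice copy that stands in for a DMA transfer. It is not how any particular Cell library implements caching.

```python
LINE_SIZE = 128   # bytes per cached line (illustrative choice)
NUM_LINES = 16    # lines set aside in the local store for caching (illustrative)

class SoftwareCache:
    """Direct-mapped cache whose tags live in ordinary software data."""

    def __init__(self, main_memory: bytes):
        self.main_memory = main_memory
        self.tags = [None] * NUM_LINES   # software replacement for hardware tag bits
        self.lines = [b""] * NUM_LINES   # line data held in the local store
        self.hits = 0
        self.misses = 0

    def read_byte(self, address: int) -> int:
        line_number = address // LINE_SIZE
        slot = line_number % NUM_LINES
        if self.tags[slot] != line_number:   # the tag check is plain code
            self.misses += 1
            start = line_number * LINE_SIZE
            # Stand-in for a DMA transfer from main memory into the local store.
            self.lines[slot] = self.main_memory[start:start + LINE_SIZE]
            self.tags[slot] = line_number
        else:
            self.hits += 1
        return self.lines[slot][address % LINE_SIZE]

memory = bytes(range(256)) * 64   # 16 KB of sample data
cache = SoftwareCache(memory)
values = [cache.read_byte(a) for a in (0, 1, 127, 128, 0, 2048, 0)]
print(values, cache.hits, cache.misses)
```

Addresses 0 and 2048 map to the same slot, so the last two reads evict each other. A hardware cache would hide that conflict handling. Here it is visible in the code and costs instructions on every access.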
SPE LS Scheduling
• Software-controlled DMA
• DMA to and from main memory
• Scheduling is a HUGE problem
  • Done primarily in software
  • IBM predicts 80-90% utilization ideally
• Request queue handles 16 simultaneous requests
  • Up to 16 KB per transfer
  • Priority: DMA, load/store, fetch
• Fetch/execute parallelism
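The two limits on this slide (16 requests in flight, 16 KB per transfer) mean that software has to split larger transfers itself. The following sketch plans such a split. The function name and the 100 KB example size are hypothetical, and real DMA alignment and size rules are ignored. Batches reflect queue depth only, not how much of the local store is free.

```python
MAX_DMA_BYTES = 16 * 1024   # largest single transfer named on the slide
QUEUE_DEPTH = 16            # simultaneous requests named on the slide

def plan_dma(total_bytes: int) -> list[list[tuple[int, int]]]:
    """Split a transfer into (offset, size) requests, grouped into queue-sized batches."""
    requests = [
        (offset, min(MAX_DMA_BYTES, total_bytes - offset))
        for offset in range(0, total_bytes, MAX_DMA_BYTES)
    ]
    return [requests[i:i + QUEUE_DEPTH] for i in range(0, len(requests), QUEUE_DEPTH)]

batches = plan_dma(100 * 1024)   # a 100 KB buffer, chosen only for illustration
print(len(batches), [len(b) for b in batches], batches[-1][-1])
```

Issuing the next batch while the SPE computes on data from the previous one is one way software could overlap transfer and execution.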
SPE Control Logic
• Very little in comparison
• Represents a shift in focus
• Complete lack of branch prediction
  • Software branch prediction
  • Loop unrolling
  • 18-cycle penalty
• Software-controlled DMA
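A rough cost model shows why loop unrolling matters when a branch can cost 18 cycles. It assumes the worst case, in which every loop branch pays the full penalty. Software branch hints would lower the real cost, so treat the numbers as an upper bound. The function name and the iteration count are made up for the example.

```python
import math

BRANCH_PENALTY_CYCLES = 18   # penalty quoted on the slide

def loop_branch_overhead(iterations: int, unroll_factor: int) -> int:
    """Worst-case cycles lost to loop branches, assuming each one pays the penalty."""
    branches = math.ceil(iterations / unroll_factor)
    return branches * BRANCH_PENALTY_CYCLES

for factor in (1, 4, 8):
    print(factor, loop_branch_overhead(1000, factor))
```

Under this assumption, unrolling a 1000-iteration loop by 8 cuts the branch overhead from 18000 cycles to 2250. The large 128-entry register file is what makes aggressive unrolling practical.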
SPE Pipeline
• Little ILP, and thus little control logic
• Dual issue
• Simple commit unit (no reorder buffer or other complexities)
• Same execution unit for FP/int
SPE Summary
• Essentially a small vector computer
• Based on the AltiVec/VMX ISA
  • Extensions for DMA and LS management
  • Extended for a 128 × 128-bit register file
• Uniquely suited for real-time applications
• Extremely fast for certain FP operations
• Offloads a large amount of work onto the compiler and software
Element Interconnect Bus
• 4 concentric rings connecting all Cell elements
• 128-bit-wide interconnects
EIB (cont.)
• Designed to minimize coupling noise
• Rings carry data in alternating directions
• Buffers and repeaters at each SPE boundary
• Architecture can be scaled up at the cost of increased bus latency
EIB (cont.)
• Total bandwidth of ~200 GB/s
• EIB controller located physically in the center of the chip, between the SPEs
• Controller reserves channels for each individual data transfer request
• Implementation allows for horizontal SPE extension
Memory Interface
• Rambus XDR memory to keep Cell at full utilization
• 3.2 Gbps data rate per pin on each device connected to the XDR interface
• Cell uses dual-channel XDR with four devices and 16-bit-wide buses to achieve 25.6 GB/s total memory bandwidth
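Multiplying out the figures on this slide (four devices, 16-bit buses, 3.2 Gbps, taken here as a per-pin rate) gives 25.6 GB/s. The short calculation below shows the steps. The function and parameter names are illustrative.

```python
def xdr_bandwidth_gbytes(devices: int, bus_width_bits: int, gbps_per_pin: float) -> float:
    """Aggregate bandwidth in GB/s: devices x data pins x per-pin rate, at 8 bits per byte."""
    return devices * bus_width_bits * gbps_per_pin / 8

total = xdr_bandwidth_gbytes(devices=4, bus_width_bits=16, gbps_per_pin=3.2)
print(round(total, 1))
```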
Input / Output Bus
• Rambus FlexIO bus
• I/O interface consists of 12 unidirectional byte lanes
• Each lane supports 6.4 GB/s of bandwidth
• 7 outbound lanes and 5 inbound lanes
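Taking the per-lane figure on this slide at face value, the lane counts translate into aggregate bandwidth as follows. This is arithmetic on the slide's own numbers, not a measured result, and the constant names are illustrative.

```python
LANE_GBYTES_PER_S = 6.4            # per-lane figure from the slide
OUTBOUND_LANES, INBOUND_LANES = 7, 5

outbound = OUTBOUND_LANES * LANE_GBYTES_PER_S
inbound = INBOUND_LANES * LANE_GBYTES_PER_S
print(round(outbound, 1), round(inbound, 1), round(outbound + inbound, 1))
```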
Design Choices
• In-order execution
  • Abandoning ILP
  • ILP: 10-20% increase per generation
  • Reducing control logic
  • Real-time responsiveness
• Cache design
  • Software configuration on SPE
  • Standard L2 cache on PPE
Cell Programming Issues
• No Cell compiler exists that manages utilization of the SPEs at compile time
• SPEs do not natively support context switching; it must be managed by the OS
• SPEs are vector processors and are not efficient for general-purpose computation
• The PPE and the SPEs use different instruction sets
Cell Programming (cont.)
• Functional Offload Model
• Simplest model for Cell programming
• Optimize existing libraries for SPE computation
• Requires no rebuild of the main application logic, which runs on the PPE
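The shape of the functional offload model can be sketched with a thread pool standing in for the SPEs. This is only an analogy: the worker count, the function names, and the use of `concurrent.futures` are all assumptions for illustration. Real SPE code would be compiled separately for the SPE instruction set and would move its data by DMA.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 7   # stand-in for the seven usable SPEs in the PS3 implementation

def _scale_chunk(chunk: list[float], factor: float) -> list[float]:
    """Work that would run on one SPE: scale a slice of the input."""
    return [x * factor for x in chunk]

def scale(values: list[float], factor: float) -> list[float]:
    """Library entry point; callers see a plain function and never touch the workers."""
    if not values:
        return []
    size = -(-len(values) // NUM_WORKERS)   # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        parts = list(pool.map(_scale_chunk, chunks, [factor] * len(chunks)))
    return [x for part in parts for x in part]

# The main application logic (the PPE side) is unchanged: it just calls the library.
print(scale([1.0, 2.0, 3.0, 4.0], 2.5))
```

The point of the model is the boundary. Only the body of `scale` changes when the library is optimized, so the calling application does not need to be rewritten.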

More Related Content

What's hot (20)

Network protocals
Network protocalsNetwork protocals
Network protocals
 
Process management os concept
Process management os conceptProcess management os concept
Process management os concept
 
Memory management
Memory managementMemory management
Memory management
 
Single &Multi Core processor
Single &Multi Core processorSingle &Multi Core processor
Single &Multi Core processor
 
Processes and threads
Processes and threadsProcesses and threads
Processes and threads
 
Inter Process Communication Presentation[1]
Inter Process Communication Presentation[1]Inter Process Communication Presentation[1]
Inter Process Communication Presentation[1]
 
Gopher Protocol
Gopher ProtocolGopher Protocol
Gopher Protocol
 
Operating Systems: Process Scheduling
Operating Systems: Process SchedulingOperating Systems: Process Scheduling
Operating Systems: Process Scheduling
 
IPV6 ADDRESS
IPV6 ADDRESSIPV6 ADDRESS
IPV6 ADDRESS
 
Disk structure
Disk structureDisk structure
Disk structure
 
Cache memory
Cache memoryCache memory
Cache memory
 
Osi model
Osi modelOsi model
Osi model
 
Comuputer processor
Comuputer processorComuputer processor
Comuputer processor
 
Transmission Modes in Computer Networks
Transmission Modes in Computer Networks Transmission Modes in Computer Networks
Transmission Modes in Computer Networks
 
Networking devices
Networking devicesNetworking devices
Networking devices
 
IPv4
IPv4IPv4
IPv4
 
OS - Process Concepts
OS - Process ConceptsOS - Process Concepts
OS - Process Concepts
 
Ethernet - Networking presentation
Ethernet - Networking presentationEthernet - Networking presentation
Ethernet - Networking presentation
 
OSI Model
OSI ModelOSI Model
OSI Model
 
Embedded systems basics
Embedded systems basicsEmbedded systems basics
Embedded systems basics
 

Viewers also liked

Encryptioon and key management introduction
Encryptioon and key management introductionEncryptioon and key management introduction
Encryptioon and key management introductionVyanktesh Dorlikar
 
Lec Jan12 2009
Lec Jan12 2009Lec Jan12 2009
Lec Jan12 2009Ravi Soni
 
Présentation IBM InfoSphere MDM 11.3
Présentation IBM InfoSphere MDM 11.3Présentation IBM InfoSphere MDM 11.3
Présentation IBM InfoSphere MDM 11.3IBMInfoSphereUGFR
 

Viewers also liked (6)

Encryptioon and key management introduction
Encryptioon and key management introductionEncryptioon and key management introduction
Encryptioon and key management introduction
 
Lec Jan12 2009
Lec Jan12 2009Lec Jan12 2009
Lec Jan12 2009
 
9/27 PPT RSS
9/27 PPT RSS9/27 PPT RSS
9/27 PPT RSS
 
wireless biometric system
wireless biometric systemwireless biometric system
wireless biometric system
 
Object oriented data model
Object oriented data modelObject oriented data model
Object oriented data model
 
Présentation IBM InfoSphere MDM 11.3
Présentation IBM InfoSphere MDM 11.3Présentation IBM InfoSphere MDM 11.3
Présentation IBM InfoSphere MDM 11.3
 

Similar to Ibm cell

3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdfhellobank1
 
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Danielle Womboldt
 
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Community
 
Ceph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph clusterCeph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph clusterCeph Community
 
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster Ceph Community
 
Ceph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph clusterCeph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph clusterCeph Community
 
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster Ceph Community
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Community
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Slide_N
 
Power 7 Overview
Power 7 OverviewPower 7 Overview
Power 7 Overviewlambertt
 
Ceph Day Tokyo -- Ceph on All-Flash Storage
Ceph Day Tokyo -- Ceph on All-Flash StorageCeph Day Tokyo -- Ceph on All-Flash Storage
Ceph Day Tokyo -- Ceph on All-Flash StorageCeph Community
 
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...Odinot Stanislas
 
Shak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-finalShak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-finalTommy Lee
 
Os Madsen Block
Os Madsen BlockOs Madsen Block
Os Madsen Blockoscon2007
 
Ceph Day Taipei - Ceph on All-Flash Storage
Ceph Day Taipei - Ceph on All-Flash Storage Ceph Day Taipei - Ceph on All-Flash Storage
Ceph Day Taipei - Ceph on All-Flash Storage Ceph Community
 
Ceph Day KL - Ceph on All-Flash Storage
Ceph Day KL - Ceph on All-Flash Storage Ceph Day KL - Ceph on All-Flash Storage
Ceph Day KL - Ceph on All-Flash Storage Ceph Community
 
1 emc vs_compellent
1 emc vs_compellent1 emc vs_compellent
1 emc vs_compellentjyoti_j2
 

Similar to Ibm cell (20)

3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf
 
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
 
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
 
The Cell Processor
The Cell ProcessorThe Cell Processor
The Cell Processor
 
Ceph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph clusterCeph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph cluster
 
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster
 
Ceph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph clusterCeph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph cluster
 
LUG 2014
LUG 2014LUG 2014
LUG 2014
 
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
 
Power 7 Overview
Power 7 OverviewPower 7 Overview
Power 7 Overview
 
Ceph Day Tokyo -- Ceph on All-Flash Storage
Ceph Day Tokyo -- Ceph on All-Flash StorageCeph Day Tokyo -- Ceph on All-Flash Storage
Ceph Day Tokyo -- Ceph on All-Flash Storage
 
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
 
Shak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-finalShak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-final
 
Os Madsen Block
Os Madsen BlockOs Madsen Block
Os Madsen Block
 
Ceph Day Taipei - Ceph on All-Flash Storage
Ceph Day Taipei - Ceph on All-Flash Storage Ceph Day Taipei - Ceph on All-Flash Storage
Ceph Day Taipei - Ceph on All-Flash Storage
 
Ceph Day KL - Ceph on All-Flash Storage
Ceph Day KL - Ceph on All-Flash Storage Ceph Day KL - Ceph on All-Flash Storage
Ceph Day KL - Ceph on All-Flash Storage
 
1 emc vs_compellent
1 emc vs_compellent1 emc vs_compellent
1 emc vs_compellent
 
CLFS 2010
CLFS 2010CLFS 2010
CLFS 2010
 

Recently uploaded

Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 

Recently uploaded (20)

Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 

Ibm cell

  • 1. Design issues of IBM CellDesign issues of IBM Cell ArchitectureArchitecture Vitthal Gutthe MEIT 1326Vitthal Gutthe MEIT 1326 Pravin kumar Yadav MEIT 1338Pravin kumar Yadav MEIT 1338 Vyanktesh Dorlikar MEIT 1324Vyanktesh Dorlikar MEIT 1324
  • 2. Contents: General introduction; history of development; technical overview of architecture; detailed technical discussion of components; design choices; Cell programming issues
  • 3. History of Development: Sony PlayStation 2. Released March 2000 in Japan; 128-bit “Emotion Engine”; MIPS CPU clocked at 294 MHz; capable of 6.2 GFLOPS (giga floating-point operations per second)
  • 4. History Continued: Partnership between Sony, Toshiba, and IBM formed in the summer of 2000; initial goal of 1000 times the power of the PS2 in a single machine; in March 2001 the Sony-IBM-Toshiba design center opened with an investment of $400 million
  • 5. Overall Goals for Cell: High performance in multimedia applications; real-time performance; minimal power consumption; cost as low as possible; available by 2005; avoid the memory latency issues associated with control structures
  • 6. The Cell Itself: PowerPC-based main core (PPE); multiple Synergistic Processing Elements (SPEs); on-die memory controller; inter-core transport bus; high-speed I/O
  • 7. Cell Die Layout
  • 8. Cell Implementation: Cell is an architecture. Preliminary implementation: 1 PPE; 8 SPEs, of which 1 is disabled to improve yield (7 usable on the PS3); 221 mm² die size on a 90 nm process; clocked at 3-4 GHz; 256 GFLOPS single precision at 4 GHz
  • 9. Why a Cell Architecture: Follows a trend in computing architecture; natural extension of dual-core and multi-core designs; extremely low hardware overhead; software controllable; specialized hardware is more useful for multimedia
  • 10. Possible Uses: PlayStation 3 (obviously); blade servers from IBM (amazing single-precision FP performance; scientific applications); Toshiba HDTV products
  • 11. Power Processing Element: PowerPC instruction set with AltiVec; used for general-purpose computing and for controlling the SPEs; simultaneous multithreading; separate 32 KB L1 caches and a unified 512 KB L2 cache
  • 12. PPE (cont.): Slow but power-efficient implementation of the PowerPC instruction set; two-issue, in-order instruction fetch; conspicuous lack of an instruction window; compare to conventional PowerPC implementations (G5); performance depends on SPE utilization
  • 13. Synergistic Processing Element (SPE): Specialized hardware; meant to be used in parallel (7 on the PS3 implementation); on-chip memory (256 KB); no branch prediction; in-order execution; dual issue
  • 14. SPE Architecture: 0.99 µm² on a 90 nm process; 128 registers, each 128 bits wide (instructions assumed to be 4 × 32-bit); variant of the VMX instruction set, modified for 128 registers; on-chip memory is NOT a cache
  • 15. SPE Execution: Dual issue, in-order; seven execution units; vector logic; 8 single-precision operations per cycle; significant performance hit for double precision
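The 256 GFLOPS figure quoted for the preliminary implementation can be reproduced from the per-SPE rate on this slide. The following is a minimal sketch, assuming all 8 SPEs on the die are counted and each sustains 8 single-precision operations per cycle; the function name is illustrative, not part of any Cell toolchain.

```python
def peak_gflops(num_spes: int, flops_per_cycle: int, clock_ghz: float) -> float:
    """Peak single-precision GFLOPS: SPEs x operations per cycle x clock in GHz."""
    return num_spes * flops_per_cycle * clock_ghz


# Assumption: all 8 SPEs active at 4 GHz, as the 256 GFLOPS figure implies.
full_die = peak_gflops(8, 8, 4.0)

# With one SPE disabled (7 usable) at the same clock, the peak drops accordingly.
seven_spes = peak_gflops(7, 8, 4.0)
```

Under these assumptions `full_die` is 256.0 and `seven_spes` is 224.0. These are theoretical peaks only; sustained throughput depends on how well the SPEs are kept busy.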
  • 16. SPE Execution Diagram
  • 17. SPE Local Storage Area: NOT a cache; 256 KB, built from 4 × 64 KB ECC single-port SRAM; completely private to each SPE; directly addressable by software; can be used as a cache, but only under software control; no tag bits or any other extra hardware
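Since the local store has no tag bits, any cache behaviour has to be built in software. The sketch below shows one way this could look: a direct-mapped cache whose tags are ordinary program state. It is an illustration only; the class name, the line size, and the use of a Python slice in place of a DMA transfer are all assumptions, not Cell APIs.

```python
class SoftwareCache:
    """Direct-mapped cache whose tags are kept by software, not hardware.

    Hypothetical sketch: `main_memory` stands in for main memory, and slicing
    it stands in for a DMA transfer into the local store.
    """

    def __init__(self, main_memory: bytes, num_lines: int = 64, line_size: int = 128):
        self.main_memory = main_memory
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines   # one software-maintained tag per line
        self.lines = [b""] * num_lines   # cached data held in the "local store"
        self.hits = 0
        self.misses = 0

    def read_byte(self, address: int) -> int:
        line_addr = address // self.line_size
        index = line_addr % self.num_lines
        if self.tags[index] != line_addr:
            # Miss: the program itself fetches the line and updates the tag.
            start = line_addr * self.line_size
            self.lines[index] = self.main_memory[start:start + self.line_size]
            self.tags[index] = line_addr
            self.misses += 1
        else:
            self.hits += 1
        return self.lines[index][address % self.line_size]


memory = bytes(range(256)) * 64           # 16 KB of sample data
cache = SoftwareCache(memory, num_lines=4, line_size=128)
value = cache.read_byte(5)                # first access to this line is a miss
```

Every lookup costs instructions here, which is the trade-off the slide points at: the hardware stays simple, and the tag check moves into the program.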
  • 18. SPE LS Scheduling: Software-controlled DMA; DMA to and from main memory; scheduling is a HUGE problem (done primarily in software; IBM predicts 80-90% usage ideally); request queue handles 16 simultaneous requests (up to 16 KB per transfer; priority: DMA, load/store, fetch); fetch/execute parallelism
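The two limits on this slide (16 outstanding requests, up to 16 KB each) shape how software has to break up a large transfer. A minimal sketch of that bookkeeping follows; the function names are hypothetical, and real hardware constraints such as alignment are deliberately not modelled.

```python
MAX_DMA_BYTES = 16 * 1024   # per-request limit taken from the slide
QUEUE_DEPTH = 16            # simultaneous requests taken from the slide


def split_into_dma_requests(total_bytes, max_bytes=MAX_DMA_BYTES):
    """Return (offset, size) pairs covering total_bytes, each at most max_bytes."""
    requests = []
    offset = 0
    while offset < total_bytes:
        size = min(max_bytes, total_bytes - offset)
        requests.append((offset, size))
        offset += size
    return requests


def batches_for_queue(requests, depth=QUEUE_DEPTH):
    """Group requests into batches small enough to be queued at the same time."""
    return [requests[i:i + depth] for i in range(0, len(requests), depth)]


# A 40 KB transfer needs three requests: two full 16 KB pieces and one 8 KB piece.
pieces = split_into_dma_requests(40 * 1024)
```

With these limits, one full queue covers 16 × 16 KB = 256 KB, which happens to equal the size of the local store.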
  • 19. SPE Control Logic: Very little in comparison; represents a shift in focus; complete lack of hardware branch prediction (software branch prediction; loop unrolling; 18-cycle penalty); software-controlled DMA
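A rough cost model shows why loop unrolling matters when a branch can cost 18 cycles. The sketch below assumes a deliberately simplified case in which every taken loop-closing branch pays the full penalty (that is, no software branch hints are used); the function name and the cycle counts in the example are illustrative, not measured.

```python
import math

BRANCH_PENALTY_CYCLES = 18  # penalty figure taken from the slide


def loop_cycles(iterations, body_cycles, unroll=1, penalty=BRANCH_PENALTY_CYCLES):
    """Estimate cycles for a loop whose taken back-branches each pay the penalty."""
    if iterations <= 0:
        return 0
    trips = math.ceil(iterations / unroll)   # passes through the unrolled body
    taken_branches = trips - 1               # the final pass falls through
    return iterations * body_cycles + taken_branches * penalty


# 1000 iterations of a 4-cycle body, without and with 8x unrolling.
rolled = loop_cycles(1000, 4)
unrolled = loop_cycles(1000, 4, unroll=8)
```

In this model the rolled loop spends most of its time on branch penalties, while unrolling by 8 removes seven of every eight back-branches. The large 128-entry register file is what makes such aggressive unrolling practical.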
  • 20. SPE Pipeline: Little ILP, and thus little control logic; dual issue; simple commit unit (no reorder buffer or other complexities); same execution unit for FP and integer
  • 21. SPE Summary: Essentially a small vector computer; based on the AltiVec/VMX ISA (extensions for DMA and LS management; extended for a 128 × 128-bit register file); uniquely suited for real-time applications; extremely fast for certain FP operations; offloads a large amount of work onto the compiler and software
  • 22. Element Interconnect Bus: 4 concentric rings connecting all Cell elements; 128-bit-wide interconnects
  • 23. EIB (cont.): Designed to minimize coupling noise; rings carry data in alternating directions; buffers and repeaters at each SPE boundary; architecture can be scaled up at the cost of increased bus latency
  • 24. EIB (cont.): Total bandwidth of ~200 GB/s; EIB controller physically located in the center of the chip, between the SPEs; controller reserves channels for each individual data transfer request; implementation allows SPEs to be added horizontally
  • 25. Memory Interface: Rambus XDR memory to keep Cell at full utilization; 3.2 Gbps data rate per pin on each device connected to the XDR interface; Cell uses dual-channel XDR with four devices and 16-bit-wide buses to achieve 25.6 GB/s total memory bandwidth
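The total memory bandwidth follows from multiplying the signalling rate by the bus width and the device count. A rough sketch of that arithmetic, assuming the 3.2 Gbps figure applies to each data pin (the function and parameter names are illustrative):

```python
def xdr_bandwidth_gbytes(gbits_per_pin: float, bus_width_bits: int, devices: int) -> float:
    """Aggregate bandwidth in GB/s: per-pin rate x pins per device x devices / 8."""
    total_gbits = gbits_per_pin * bus_width_bits * devices
    return total_gbits / 8  # 8 bits per byte


# Four 16-bit devices at 3.2 Gbps per pin: 204.8 Gbps in total.
total = xdr_bandwidth_gbytes(3.2, 16, 4)
```

Under that assumption the result is 25.6 GB/s, or 12.8 GB/s for each of the two channels.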
  • 26. Input/Output Bus: Rambus FlexIO bus; I/O interface consists of 12 unidirectional byte lanes; each lane supports 6.4 GB/s bandwidth; 7 outbound lanes and 5 inbound lanes
  • 27. Design Choices: In-order execution (abandoning ILP; ILP gives a 10-20% increase per generation; reduced control logic; real-time responsiveness). Cache design (software configuration on the SPEs; standard L2 cache on the PPE)
  • 28. Cell Programming Issues: No Cell compiler exists that can manage utilization of the SPEs at compile time; SPEs do not natively support context switching, so it must be managed by the OS; SPEs are vector processors and are not efficient for general-purpose computation; the PPE and the SPEs use different instruction sets
  • 29. Cell Programming (cont.): Functional offload model. The simplest model for Cell programming; optimize existing libraries for SPE computation; requires no rebuild of the main application logic, which runs on the PPE
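The functional offload model can be pictured as a library whose interface stays the same while its internals hand work to other processors. The sketch below is only an analogy: a thread pool stands in for the SPEs, and `scale_vector`, `_scale_chunk`, and `NUM_WORKERS` are hypothetical names, not part of any Cell SDK.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 7  # assumption: one worker per usable SPE on the PS3


def _scale_chunk(chunk, factor):
    """Kernel that would be rewritten for the SPE in a real port."""
    return [x * factor for x in chunk]


def scale_vector(values, factor, workers=NUM_WORKERS):
    """Library entry point: callers see one function, the work is farmed out."""
    if not values:
        return []
    chunk_size = -(-len(values) // workers)  # ceiling division
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the results inside the block so all work finishes here.
        results = list(pool.map(_scale_chunk, chunks, [factor] * len(chunks)))
    return [x for part in results for x in part]


def main_application():
    """Application logic that stays on the PPE and is not rebuilt."""
    return scale_vector(list(range(10)), 2)
```

The point of the design is that `main_application` never changes: only the library behind `scale_vector` knows that the work was split up and offloaded.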
  • 30. References • "Synergistic Processing in Cell's Multicore Architecture" (PDF). IEEE. Retrieved 2007-03-22. • "Cell Designer talks about PS3 and IBM Cell Processors". Retrieved 2007-03-22. • "Cell Broadband Engine Interconnect and Memory Interface" (PDF). IBM. Retrieved 2007-03-22. • http://en.wikipedia.org/wiki/Cell_(microprocessor)