Kish Hirani
Sebastien Schertenleib
Sony Computer Entertainment Europe
Research & Development Division
Slide 2
• PlayStation® Development Resources
• Architecture of the PlayStation®3 (PS3™)
• Case studies
– Cover techniques used by 1st party titles for both the PSP®
(PlayStation®Portable) and PS3™
• PlayStation® Advanced Programming
– Graphics performance
– PS3™ SPE utilisation
What We Will Be Covering
Kish Hirani
Head of Developer Services
Sony Computer Entertainment Europe
(SCEE)
Developer Services
SCEE - Research & Development Division
Slide 4
Developer Services
• Great Marlborough Street in
Central London
• Cover ‘PAL’ Region
Developers
Slide 5
• Support – SCE DevNet
• Field Engineering
• Technical and Strategic Consultancy
• Technical Training
• TRC Consultancy
• Projects
What We Cover
Slide 6
PlayStation® DevNet
Slide 7
PS3 DevKit
Latest additions to Dev Tools
PS3 Test Kit
Slide 8
Existing PSP DevKit
Latest additions to Dev Tools
PSPgo
Commander Arm
Slide 9
• PS3 DevKit: €1,700
• PS3 TestKit: €900
• PSP DevKit: €1,200
• PSP Commander: €300
Dev Tools Prices
Slide 10
• Option for Digital Distribution via
PlayStation® Store as well as disc
• Same with full PSP titles
• And now minis
Digital Distribution Options
Slide 11
• PSN available in 58 countries
– 12 languages, 22 currencies
– Including Russia! 90% of all titles released are now localised.
• 700+ games and demos available
• 600+ million items downloaded worldwide
• More than 130K registered in Russia
• Cumulative Russia PSN Sales €650K
– as of FY2009 Q3
PlayStation®Network Vital Statistics
Slide 12
• 52.9 million globally (as of end June 2009)
– 17 million in SCEE Region
– More than 850K in Russia
• Russia Tie Ratio: 2:1
PSP Vital Statistics
Slide 13
• 23.7 million PS3s (as of end of June 2009)
– One million of the new model sold in three weeks in
September
• More than 190K in Russia so far
– PlayStation and PS2 both sold more than 2 million in
Russia during their lifespan (PS2 still selling well)
• 68% of PS3 users in Russia go online
• Russia Tie Ratio 5:1
PS3 Vital Statistics
Slide 14
https://www.tpr.scee.net/
Where to start if you would like to become a registered PlayStation Developer
Slide 15
http://www.worldwidestudios.net/xdev
For registered developers who wish to be considered as a SCEE 2nd Party Developer
Dr Sebastien Schertenleib
Developer Services - SCEE Research & Development
PS3™ Hardware Overview
Slide 17
• Console vs PC Development
• Main Memory
• RSX™
– Bandwidth, Pipeline and Data Placement
• Cell Broadband Engine™
– Element Interconnect Bus, PPE and SPE
• Benefits of SPU Programming
• Data Storage
• Peripherals
PS3™ Hardware Overview
Slide 18
PC vs Console Game Development
PC
• Open Platform
• Limitless Resources
• Generic API
• IHV Drivers
Console
• Closed Platform
• Fixed Hardware
• Tiny HW Abstraction
• Direct access
Slide 19
• High performance on a budget
• Fixed hardware target with a long life cycle
– Games have to get better every year
– Waiting for hardware to catch up with software is not an option
• Have to understand the strengths and weaknesses of the
hardware
Why Consoles Are Different
Slide 20
• Performance benefits from understanding of how the
hardware works
– Does not mean coding in assembly
– Need to have an awareness of how software design may affect
performance
– Generally, code optimised for consoles also runs faster on PC
Why Consoles Are Different
Slide 21
• Memory is a very finite resource
• Memory access is slow compared to processor speed
– This gulf is getting larger on all platforms
– Cache is trying to insulate the application from memory latency
– But caches on consoles are generally smaller than on a PC
Why Consoles Are Different
Slide 22
System Architecture
Slide 23
Main Memory
Slide 24
• Traditionally, heap function is called (malloc/new)
– Has to be contiguous
– Fragmentation possible
• Recommend acquiring memory through mmapper
– Build heap on top of this
– Non-contiguous pages mapped to contiguous address space
– Not subject to fragmentation
– More control: page sizes, access rights
Memory Management
Slide 25
• Games will typically have their own memory manager
– Small amount of memory compared to PC
– Conservative with data structures and memory allocations
• Different strategies for different uses
– Memory pools, custom heaps, memory tracking
• Understand the fragmentation, performance and space implications
for each tool and apply them appropriately
Memory Management
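One of the strategies named above, the memory pool, can be sketched in portable C++. This is only an illustration under stated assumptions: FixedPool and its members are hypothetical names, the backing store is an ordinary std::vector rather than pages obtained through the mmapper, and thread safety is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Illustrative fixed-size block pool; FixedPool and its members are
// hypothetical names, not part of any SDK.
class FixedPool {
public:
    FixedPool(std::size_t blockSize, std::size_t blockCount)
        : blockSize_(roundUp(blockSize)),
          storage_(blockSize_ * blockCount) {
        // Thread every block onto the free list once, up front.
        // Pushing in reverse makes the first acquire() return block 0.
        for (std::size_t i = blockCount; i-- > 0;) {
            release(storage_.data() + i * blockSize_);
        }
    }

    // The free list points into storage_, so copying is not supported.
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // O(1): pop the head of the free list; nullptr when the pool is empty.
    void* acquire() {
        if (freeList_ == nullptr) return nullptr;
        Node* node = freeList_;
        freeList_ = node->next;
        --freeCount_;
        return node;
    }

    // O(1): push the block back. All blocks are one size, so the pool
    // itself cannot fragment.
    void release(void* block) {
        assert(block != nullptr);
        freeList_ = new (block) Node{freeList_};
        ++freeCount_;
    }

    std::size_t freeCount() const { return freeCount_; }

private:
    struct Node { Node* next; };

    // Each block must be able to hold a Node and keep the next block aligned.
    static std::size_t roundUp(std::size_t size) {
        const std::size_t align = alignof(std::max_align_t);
        if (size < sizeof(Node)) size = sizeof(Node);
        return (size + align - 1) / align * align;
    }

    std::size_t blockSize_;
    std::vector<unsigned char> storage_;
    Node* freeList_ = nullptr;
    std::size_t freeCount_ = 0;
};
```

The trade-off is the one the slide asks you to understand: allocation and release are constant time and fragmentation-free, at the cost of wasted space whenever an object is smaller than the block size.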
Slide 26
• Fetching data from memory into a processor can easily dominate
performance
– L1 cache miss 10~30 cycles
– L2 cache miss ~300 cycles
• Most applications exhibit locality of code or data
– A loop executes the same small set of instructions on data located
contiguously in memory
• Caches aim to increase overall system performance by taking advantage
of this
– Fast on die memory is very expensive so they are limited in size
Caches
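The locality point can be illustrated with two loops that compute the same sum over a 2D grid stored row by row. This is a generic sketch, not PS3-specific code, and the function names are made up for the example; only the order in which memory is touched differs.

```cpp
#include <cstddef>
#include <vector>

// Walks memory in address order, so every cache line that is fetched
// gets fully used before the loop moves on.
long long sumRowMajor(const std::vector<int>& grid,
                      std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += grid[r * cols + c];
    return total;
}

// Same result, but consecutive accesses are `cols` elements apart.
// On a large grid each access may land on a different cache line.
long long sumColumnMajor(const std::vector<int>& grid,
                         std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += grid[r * cols + c];
    return total;
}
```

Both functions return the same value; on hardware with a small cache the first is typically the faster one once the grid no longer fits in cache.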
Slide 27
[Diagram: RSX™ pipeline (Vertex Shader, Setup/Rasteriser, Fragment Shader, Raster Operations) attached to GDDR memory]
RSX™
Slide 28
• RSX™ has access to two memories
• Local Memory
– 249MB usable (256MB – 7MB reserved by OS)
• System Memory
– 213MB usable (256MB – 43MB reserved by OS)
– RSX™ usable through memory mapping by application in 1MB blocks
Memory Setup
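The budget arithmetic above can be written down as compile-time checks. This sketch assumes the slide's "MB" means 2^20 bytes; the helper names are hypothetical and do not come from the SDK.

```cpp
#include <cstddef>

// Assumption: "MB" on the slide is 2^20 bytes.
constexpr std::size_t kMB = 1024 * 1024;

// Bytes left for the title once the OS reservation is taken out.
constexpr std::size_t usableBytes(std::size_t totalMB, std::size_t reservedMB) {
    return (totalMB - reservedMB) * kMB;
}

// System memory is mapped for RSX™ in 1MB blocks, so a request is
// rounded up to the next whole block.
constexpr std::size_t roundUpToMappingBlock(std::size_t bytes) {
    return (bytes + kMB - 1) / kMB * kMB;
}

static_assert(usableBytes(256, 7) == 249 * kMB, "local memory budget");
static_assert(usableBytes(256, 43) == 213 * kMB, "system memory budget");
```

For example, a 3.2MB vertex buffer placed in system memory would occupy four 1MB mapping blocks under this rounding.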
Slide 29
Memory Architecture
[Diagram: Main Memory attached to Cell; Local Memory attached to RSX™]
Slide 30
System Bandwidth
[Diagram: Cell to Main Memory at 25.6GB/s; GPU to Local Memory at 20.8GB/s; Cell to GPU at 20GB/s and GPU to Cell at 15GB/s through the Cell I/F; Audio I/F to Audio Out; Video I/F to Video Out]
Slide 31
[Diagram: Vertex/Index Data in Main Memory and Local Memory feeds the Vertex Shader, followed by Triangle Setup, Rasteriser, Fragment Shader and Raster Operations]
RSX™ Pipeline
Slide 32
[Diagram: Textures in Main Memory and Local Memory reach the Fragment Shader through the Main Texture Cache and Local Texture Cache; pipeline stages: Vertex Shader, Triangle Setup, Rasteriser, Fragment Shader, Raster Operations]
RSX™ Pipeline
Slide 33
[Diagram: pipeline stages Vertex Shader, Triangle Setup, Rasteriser, Fragment Shader and Raster Operations, with Z-Cull Feedback; Texture/Framebuffer surfaces in both Main Memory and Local Memory]
RSX™ Pipeline
Slide 34
• Able to utilise graphics data in either memory
– Vertex attributes, Textures
• Frame Buffer stores colour & depth surfaces
– Can also be in either memory
– Local memory supports colour & Z compression for MSAA modes
• Saves bandwidth rather than space
• Generally local memory accesses are faster
• If local memory bandwidth is bottleneck → consider moving
assets to system memory
Data Placement
Slide 35
How triangles end up on screen
[Diagram: Cell with Main Memory and RSX™ with Local Memory; Verts, Textures and Shaders appear in both memories, together with a Cmd Buffer]
Slide 36
Cell Broadband Engine™
Slide 37
Cell Broadband Engine™
• Processor core of PS3™
– Clocked at 3.2GHz
• Contains 7 processing cores.
– PowerPC Processor Element (PPE)
• 512k cache memory
– 6 Synergistic Processing Elements (SPE)
• 256k local memory store
Slide 38
Inputs and Outputs
Slide 39
• Memory Interface Controller (MIC)
– Controls the 2 Rambus XDR channels.
– Theoretical 25.6GB/s transfer
• IOIF 0 - Connection to RSX™
– 20 GB/s to RSX™
– 15 GB/s from RSX™
• IOIF1 – Connection to South Bridge
– 2.5GB/s to/from devices
Inputs and Outputs
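To put the theoretical figures above into frame terms, a small helper can convert a bandwidth into a per-frame byte budget. This is a back-of-the-envelope sketch: it assumes 1GB means 10^9 bytes and that the peak rate is sustained for the whole frame, which real workloads rarely achieve. The function name is hypothetical.

```cpp
// Upper bound on the bytes one link can move during a single frame.
// Assumes 1GB = 1e9 bytes and a fully sustained theoretical rate.
constexpr double bytesPerFrame(double gigabytesPerSecond,
                               double framesPerSecond) {
    return gigabytesPerSecond * 1.0e9 / framesPerSecond;
}
```

At 60 frames per second, the 25.6GB/s main memory interface gives roughly 427MB per frame, and the 20GB/s path to RSX™ gives roughly 333MB per frame, before any contention is taken into account.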
Slide 40
Element Interconnect Bus
Slide 41
• This connects all units in the CBE
• Consists of four 128-bit buses
• All data transfers are 128-byte packets
– Smaller transfers sent as partial packets
Element Interconnect Bus
Slide 42
Element Interconnect Bus
Slide 43
Element Interconnect Bus
Slide 44
PowerPC Processing Element
Slide 45
• 64 bit CPU core
• 25.6 GFLOPs FPU
• In order execution
• Execution units include:
– IEEE double precision FPU
– Integer
– Float vector unit (VMX)
PowerPC Processing Element
Slide 46
[Diagram: PPU connected to the EIB]
PowerPC Processing Element
Slide 47
• Intrinsics are supported, preferable to inline assembly
– Scheduling and optimisation
• Two hardware threads
– 2 or more active software threads act to mitigate stalls
• Good use of caches can make a big difference in
performance
PPE Performance
Slide 48
• PowerPC issue that can severely affect performance
• PPU has a store queue, that queues up stores waiting to
get sent to the cache
• If a load occurs that conflicts with data in store queue, the
load will stall until the data is stored
– Pipeline stall, around 40 cycles
• Find using Tuner
– LHS performance counter
Load Hit Store
Slide 49
• Type conversion
– Integer, floating point or vector registers
– Moves have to go through memory as there are no asm
instructions to move data between register sets
• Stack access - when calling a function that has:
– Large parameters - will be passed on the stack
– Many parameters - some may end up on the stack
– Few instructions - with stack based parameters
LHS Cases
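The type conversion case can be sketched with two loops that fill an array of positions. The code is portable C++ with hypothetical function names; it shows the source-level pattern only, since whether a load hit store actually occurs depends on the compiler and target.

```cpp
#include <cstddef>
#include <vector>

// Converts the integer loop counter on every iteration. On a core with no
// direct move between integer and float registers, each conversion goes
// through memory: a store followed at once by a load of the same address.
std::vector<float> positionsConvertEachStep(std::size_t count, float spacing) {
    std::vector<float> out(count);
    for (std::size_t i = 0; i < count; ++i)
        out[i] = static_cast<float>(i) * spacing;
    return out;
}

// Keeps a parallel float counter, so the arithmetic in the loop body
// never needs an integer-to-float conversion.
std::vector<float> positionsFloatCounter(std::size_t count, float spacing) {
    std::vector<float> out(count);
    float index = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = index * spacing;
        index += 1.0f;
    }
    return out;
}
```

The two versions produce identical values as long as the count stays below 2^24, the point up to which a float counter incremented by one remains exact.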
Slide 50
Synergistic Processor Elements
Slide 51
SPE Architecture
• 3.2GHz processor
• 128 vector registers
– 128 bit
– No scalar registers
• 256kB Local Store
– Very fast, like L1 cache
• Runs asynchronously from rest of system
– Independent Memory Flow Controller (MFC)
• SIMD (single instruction multiple data)
• No Operating System, very few system calls
– Performance is high and deterministic
[Diagram: SPE made up of SPU, Local Store and MFC, attached to the EIB]
Slide 52
• SPE means Synergistic Processing Element
– Mutual dependence between PPE and SPEs
• PPE runs operating system and performs top level control
– General Purpose Hardware (PowerPC)
• SPEs provide bulk of application performance
– Simple hardware
– Simple pipeline
– Simple memory architecture
– Massive computational horsepower
SPEs
Slide 53
• Simple, predictable instruction set
– Has full SIMD support
– Intrinsics are supported for all instructions
• Optimised compiler
– Handles scalar conversion
– Often as good as PPE for scalar code, can be quicker
• Easy to program; can be programmed in C/C++
• Core power of system, key to performance
• Frees up RSX™, frees up PPE
Synergistic Processor Elements
Slide 54
• Ported PPU code usually faster for optimised & non-optimised code
– Port your code
• Easy to program; with some limitations can be programmed in C/C++
• Scalar code can be run
– Full SIMD instruction set
• Multithreaded engine analogy - can move threads to SPUs
SPU Development
Slide 55
• Raw
– You own the hardware
– Complete control, direct hardware access
– Once loaded – stays resident
• SPU Threads
– The OS owns the hardware
– Must belong to a thread group
– Scheduled by Cell OS Lv-2 kernel – based on group priority
• SPURS
– Higher level abstraction
– Plays well with middleware
Ways of Using SPUs
Slide 56
• Runs on-top of SPU Threads
– Small kernel on SPU
• Thread switching more efficient
– Requires no PPU resources
– PPU SPURS Handler: only to restore thread group when necessary
• Easier to synchronise threads
• Easier to balance load on multiple SPUs
– SPUs grab work when available
SPURS (SPU Run Time System)
Slide 57
SPURS (SPU Run Time System)
[Diagram: the PPU runs the SPURS Handler; SPU0, SPU1 and SPU2 each run a SPURS Kernel, a Policy Module and a Workload, together forming the Thread Group for SPURS]
• Implemented as an SPU Thread Group
• Kernel, Policy Module, Workload
• Minimal PPU Overhead
Slide 58
• Jobs
– A chain of execution units
– Suited to short execution times
– Automatic DMA transfer of input & output data
– Executed with associated data
– No context switching – run complete
• Tasks
– Different data model, larger programs
– Can yield - synchronisation mechanisms
– Optimised context switch
– Similar to threads
SPURS Workload Types
Slide 59
• SPURS JobChain uses manual control of the execution sequence
• SPURS JobQueue is designed for dynamic job submission in parallel
SPURS Job Queue
[Diagram: a PPU Thread, a PPU Fiber and a SPURS Task submit jobs in parallel through a Port into the SPURS Job Queue; SPUs execute the submitted jobs in parallel]
Slide 60
Example Application 1
Slide 61
• Problem: Update an array of objects and submit the visible
ones for rendering
• Object.update() method was bottleneck on PPE
– Determined using SN Tuner
– This function generates the world space matrices for each
object
– Embarrassingly parallel
• No data dependencies between objects
• If we had 1 processor for each object, all the updates could run at the same time
Function off load example using a SPURS Task
Slide 62
SPURS Tasks
//--- Conditionally calling SPU code
if (!usingSPU) //--- generic version: update every object on the PPU
{
    for (int i = 0; i < OBJECT_COUNT; i++)
    {
        testObject[i].update();
    }
}
else //--- SPU version: hand the whole array to a SPURS task
{
    // Start the task; the returned value identifies it for the later wait
    int test = spurs.ThrowTaskRunComplete(taskUpdateObject_elf_start,
                                          (int)testObject,
                                          OBJECT_COUNT, 0, 0);
    // could do something useful here…
    // Block until the task has run to completion
    int result = spurs.CatchTaskRunComplete(test);
}
Slide 63
• Getting data in and out of SPU is first problem
• Get this working before worrying about actual processing
– Brute force often works just fine
• DMA entire object array into SPE memory
• Run update method
• DMA entire object array back to main memory
• List DMA allows you to get fancy
– Gather and Scatter operation
• If the data set is really big or if you need every last drop of performance
– Streaming model
• Overlap DMA transfers with processing
• Double or quad buffering
SPURS Task
Slide 64
SPURS Task: Getting Data in and out
int cellSpuMain(.....)
{
    int sourceAddr = ....; // source address in main memory
    int count = ....;      // number of data elements
    int dataSize = count * sizeof(gfxObject); // amount of memory to get
    // 128-byte aligned local store buffer for the whole array
    gfxObject *buf = (gfxObject*)memalign(128, dataSize);
    DmaGet((void*)buf, sourceAddr, dataSize); // main memory -> local store
    DmaWait(....);                            // wait for the get to complete
    //--- data is loaded at this point so do something interesting
    DmaPut((void*)buf, sourceAddr, dataSize); // local store -> main memory
    DmaWait(....);                            // wait for the put to complete
    free(buf);
    return(0);
}
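The brute-force get and put above move the whole array in one call. A single DMA command on the Cell MFC is generally limited to 16KB, so a wrapper such as DmaGet presumably has to split larger requests. The sketch below shows one way that split could be computed; DmaChunk, kMaxDmaBytes and splitTransfer are hypothetical names, and the 16KB ceiling is stated as an assumption rather than taken from the slides.

```cpp
#include <cstddef>
#include <vector>

// One piece of a larger transfer: byte offset into the array and its length.
struct DmaChunk {
    std::size_t offset;
    std::size_t size;
};

// Assumed ceiling for a single DMA command (not stated on the slides).
constexpr std::size_t kMaxDmaBytes = 16 * 1024;

// Splits totalBytes into consecutive chunks of at most kMaxDmaBytes.
// Only the last chunk can be shorter than the ceiling.
std::vector<DmaChunk> splitTransfer(std::size_t totalBytes) {
    std::vector<DmaChunk> chunks;
    for (std::size_t offset = 0; offset < totalBytes; offset += kMaxDmaBytes) {
        const std::size_t remaining = totalBytes - offset;
        chunks.push_back(
            {offset, remaining < kMaxDmaBytes ? remaining : kMaxDmaBytes});
    }
    return chunks;
}
```

Each chunk would then be issued as its own get or put, all under the same tag so that one wait covers the whole array.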
Slide 65
• Keep code the same as PPE Version
– Conditional compilation based on processor type
– Might have to split code into a separate file
SPURS Task: Executing the Code
//--- SPU code to call the update method
//--- data is loaded at this point so do something interesting
//--- step through array of objects and update
for (int i = 0; i < count; i++)
{
    buf[i].update(); // same method as on PPE, compiled for SPU
}
Slide 66
• 5x Speed up in this instance
– Brute force solution
– Could add more SPUs
– Problem is embarrassingly parallel
• Note that the same source code for the method is being
used on PPU and SPU
– Code stays in sync
– No fancy SPU specific optimisations
SPURS Task: Results
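Because the objects are independent, adding more SPUs comes down to giving each one its own slice of the array. A minimal sketch of that split follows, in portable C++ with hypothetical names; how each slice is then handed to an SPU is left out.

```cpp
#include <cstddef>
#include <vector>

// Half-open range [begin, end) of object indices for one worker.
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Divides objectCount objects as evenly as possible between workerCount
// workers. When the count does not divide exactly, the first
// (objectCount % workerCount) workers take one extra object each.
std::vector<Range> partition(std::size_t objectCount, std::size_t workerCount) {
    std::vector<Range> ranges;
    if (workerCount == 0) return ranges;
    const std::size_t base = objectCount / workerCount;
    const std::size_t extra = objectCount % workerCount;
    std::size_t begin = 0;
    for (std::size_t w = 0; w < workerCount; ++w) {
        const std::size_t size = base + (w < extra ? 1 : 0);
        ranges.push_back({begin, begin + size});
        begin += size;
    }
    return ranges;
}
```

With 3200 objects and 5 workers, for instance, every worker receives a contiguous run of 640 objects, which also keeps each worker's DMA transfers contiguous.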
Slide 67
• PPU Stalled while waiting for SPU to Process
• SPU Data get and put were synchronous
– Not using hardware to its fullest extent
– Easy to get more if we need it
SPURS Task: Time Waster
[Timeline: the PPU waits while the SPU runs GET, Exec and PUT one after another]
Slide 68
Example Application 2
Slide 69
Physics Solver
• PPU: 512 boxes in 22ms
• Parallel SPU x5: 3200 boxes in 23ms
Slide 70
PPU Solver
• Collision detection and response is a big, involved process
[Chart: PPU solver, 3200 boxes. Time (ms) axis from 0 to 250, horizontal axis from 0 to 600; labels: Step, Integrate, Collision, Response]
Slide 71
Development History
[Chart: total time (ms), axis from 0 to 250, for PPU x1, SPU x1, SPU x5 and SPU x5 (Optimised)]
Slide 72
SPU solver
[Chart: Time (ms) axis from 0 to 25, horizontal axis from 0 to 600; labels: Step, Integrate, Collision, Response]
Slide 73
• Simple pipeline
– No stalls like LHS
• No Operating System, very few system calls
– Performance is high and deterministic
• Memory is very fast like L1 cache
– Memory transfers are asynchronous so can be hidden
• Even if performance is comparable, frees up the PPE to do other stuff
SPEs are faster than the PPE
Slide 74
• Double buffer
• Align data and transfers optimally
• Vectorise data: SOA instead of AOS
• Use SIMD
– 4 operations at once = 4x speed up
• Use intrinsics
• Program directives - branch hinting
Further SPU Optimisations
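The "SOA instead of AOS" item can be shown with a small particle example. The types and functions below are hypothetical and written in portable C++; no SPU intrinsics are used, the point is only the data layout.

```cpp
#include <cstddef>
#include <vector>

// Array of structures: x, y, z and mass of one particle sit together,
// so the x values of neighbouring particles are 16 bytes apart.
struct ParticleAoS {
    float x, y, z, mass;
};

// Structure of arrays: all x values are contiguous, so four of them
// fill one 128-bit register with no shuffling.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

// Moves every particle along x; touches one float in every 16 bytes.
void advanceAoS(std::vector<ParticleAoS>& particles, float velocityX, float dt) {
    for (ParticleAoS& p : particles)
        p.x += velocityX * dt;
}

// Same update over a dense float array, the form that maps directly
// onto 4-wide SIMD operations.
void advanceSoA(ParticlesSoA& particles, float velocityX, float dt) {
    for (float& x : particles.x)
        x += velocityX * dt;
}
```

Both functions compute the same positions; the second simply presents the data in the shape a 4-wide vector unit wants.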
Slide 75
Double-Buffering
[Timeline: DMA and computation alternate between Buffer0 and Buffer1 over time]
• Allows overlapping of DMA and computation
• Transfer to/from one buffer while using data from another
• Use different tag group for each buffer
• Depends on space available in LS
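The control flow of double-buffering can be sketched without any SPU-specific calls. In the example below fakeDmaGet and fakeDmaPut are hypothetical stand-ins implemented with memcpy, so they complete immediately and nothing truly overlaps; the comments mark where real code would wait on each buffer's tag group.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t kChunk = 4; // elements per buffer; tiny for illustration

// Stand-in for an asynchronous DMA get (main memory -> local buffer).
static void fakeDmaGet(float* local, const float* mainMem, std::size_t n) {
    std::memcpy(local, mainMem, n * sizeof(float));
}

// Stand-in for an asynchronous DMA put (local buffer -> main memory).
static void fakeDmaPut(float* mainMem, const float* local, std::size_t n) {
    std::memcpy(mainMem, local, n * sizeof(float));
}

// Scales every element in place, streaming the data through two buffers.
void scaleStreamed(std::vector<float>& data, float factor) {
    const std::size_t total = data.size();
    if (total == 0) return;
    float buffers[2][kChunk];
    std::size_t current = 0;
    // Prime the pipeline with the first chunk.
    fakeDmaGet(buffers[0], data.data(), std::min(kChunk, total));
    for (std::size_t offset = 0; offset < total; offset += kChunk) {
        const std::size_t n = std::min(kChunk, total - offset);
        const std::size_t next = offset + kChunk;
        // Start fetching the next chunk into the other buffer. Real code
        // first waits for that buffer's previous put to finish.
        if (next < total)
            fakeDmaGet(buffers[current ^ 1], data.data() + next,
                       std::min(kChunk, total - next));
        // Real code waits here on the tag of buffers[current].
        for (std::size_t i = 0; i < n; ++i)
            buffers[current][i] *= factor;
        fakeDmaPut(data.data() + offset, buffers[current], n);
        current ^= 1; // swap roles for the next iteration
    }
}
```

With genuinely asynchronous transfers, the fetch of chunk N+1 runs while chunk N is being processed, which is the overlap the slide describes.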
Slide 76
• Graphics API
• Multistream – Audio
• Bullet – Physics
• PlayStation®Edge – Geometry/Compression/Animation
• FIOS – Optimised disc access
• Various Codecs
Our Libraries that run on SPUs
Slide 77
• Game engine including
– Modular run time
– Samples, docs and whitepapers
– Full source and art work
• Optimised for multi-core platforms especially PS3™
• Extractable and reusable components
PhyreEngine™
Slide 78
• A lot of 3rd party middleware partners use the SPUs
– Physics
– Audio
– AI
– Game Engines
– Many more…
• Examples by guest speakers at the end of the talk
– NVIDIA® PhysX®, Epic Games Unreal® Engine 3
and Crytek CryENGINE® 3
In Addition
Slide 79
• Animation
• AI Threat prediction
• AI Line of fire
• AI Obstacle avoidance
• Collision detection
• Physics
• Particle simulation
• Particle rendering
• Scene graph
Killzone®2 SPU Usage
• Display list building
• IBL Light probes
• Graphics post-processing
• Dynamic music system
• Skinning
• MP3 Decompression
• Edge Zlib
• etc.
44 Job types
Slide 80
• Effects done on SPUs
– Motion Blur, Bloom, Depth of Field
– Quarter-res image on XDR
• SPUs assist RSX™
1. RSX™ prepares low-res image buffers in XDR
2. RSX™ triggers interrupt to start SPUs
3. SPUs perform image operations
4. RSX™ already starts on next frame
5. Result of SPUs processed by RSX™ early in next frame
Post Processing
Slide 81
Resolve to quarter resolution
Image generated on the SPUs
(bloom, DoF, motion blur)
Final image by the RSX™
(Blending the out-of-focus and blurry areas in)
XDR DDR
Slide 82
South Bridge
Slide 83
• Faster than Blu-ray
– Over 20MB/s vs. 9MB/s of Blu-ray
– Cache data to HDD for faster subsequent load times
– Install titles
– Stream data
• Virtual memory
• Decrease usage of system memory
Storage – Hard drive
Slide 84
• High capacity
– 25GB single layer
– 50GB dual layer
• Constant linear velocity
– Constant read speed over disc - 9 MB/s
– Optimise layout, duplicate and pack data
• Use compression
– Lowers bandwidth cost and lowers read time
– Use SPUs to decompress – Edge Zlib
• Can emulate on Devkit
Blu-ray
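The effect of compression on read time is simple arithmetic. The helper below is a sketch with a hypothetical name; it assumes the drive sustains its quoted rate and that SPU decompression keeps up with the drive, so the read itself is the only cost.

```cpp
// Seconds needed to read `megabytes` of game data stored at a
// compressionRatio : 1 ratio from a device that sustains `readMBps`.
// Assumes decompression is never the bottleneck.
constexpr double loadSeconds(double megabytes, double readMBps,
                             double compressionRatio) {
    return megabytes / compressionRatio / readMBps;
}
```

For example, 90MB of level data takes 10 seconds at 9MB/s uncompressed and 5 seconds if it compresses 2:1; the compression ratio itself depends entirely on the data.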
Slide 85
• Can access both Blu-ray drive (~9 MB/s) and
HDD (~20MB/s) simultaneously
– On HDD, data come from the system cache and
game data partitions
• SPU Zlib decompression
– Increase the throughput
• libFIOS
– Optimize I/O transfers
– Ease background HDD caching implementation
PS3™ I/O Performance
[Diagram: HDD partitions: System Cache (2GB) and Game Data (up to 5GB per title), 25 ms apart]
Slide 86
Any questions so far?

 
Roadrunner and hybrid computing - Conference on High-Speed Computing
Roadrunner and hybrid computing - Conference on High-Speed ComputingRoadrunner and hybrid computing - Conference on High-Speed Computing
Roadrunner and hybrid computing - Conference on High-Speed Computing
 
Roadrunner Tutorial: An Introduction to Roadrunner and the Cell Processor
Roadrunner Tutorial: An Introduction to Roadrunner and the Cell ProcessorRoadrunner Tutorial: An Introduction to Roadrunner and the Cell Processor
Roadrunner Tutorial: An Introduction to Roadrunner and the Cell Processor
 
Deferred Pixel Shading on the PlayStation 3
Deferred Pixel Shading on the PlayStation 3Deferred Pixel Shading on the PlayStation 3
Deferred Pixel Shading on the PlayStation 3
 
POWER9: IBM’s Next Generation POWER Processor
POWER9: IBM’s Next Generation POWER ProcessorPOWER9: IBM’s Next Generation POWER Processor
POWER9: IBM’s Next Generation POWER Processor
 
IBM POWER8 Systems Technology Group Development
IBM POWER8 Systems Technology Group DevelopmentIBM POWER8 Systems Technology Group Development
IBM POWER8 Systems Technology Group Development
 
IBM POWER8: The first OpenPOWER processor
IBM POWER8: The first OpenPOWER processorIBM POWER8: The first OpenPOWER processor
IBM POWER8: The first OpenPOWER processor
 
Efficient Usage of Compute Shaders on Xbox One and PS4
Efficient Usage of Compute Shaders on Xbox One and PS4Efficient Usage of Compute Shaders on Xbox One and PS4
Efficient Usage of Compute Shaders on Xbox One and PS4
 
Future Commodity Chip Called CELL for HPC
Future Commodity Chip Called CELL for HPCFuture Commodity Chip Called CELL for HPC
Future Commodity Chip Called CELL for HPC
 
Common Software Models and Platform for Cell and SpursEngine
Common Software Models and Platform for Cell and SpursEngineCommon Software Models and Platform for Cell and SpursEngine
Common Software Models and Platform for Cell and SpursEngine
 
Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...
Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...
Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...
 
Towards Cell Broadband Engine - Together with Playstation
Towards Cell Broadband Engine  - Together with PlaystationTowards Cell Broadband Engine  - Together with Playstation
Towards Cell Broadband Engine - Together with Playstation
 
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
 
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdfParallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
 
Experiences with PlayStation VR - Sony Interactive Entertainment
Experiences with PlayStation VR  - Sony Interactive EntertainmentExperiences with PlayStation VR  - Sony Interactive Entertainment
Experiences with PlayStation VR - Sony Interactive Entertainment
 
SPU-based Deferred Shading for Battlefield 3 on Playstation 3
SPU-based Deferred Shading for Battlefield 3 on Playstation 3SPU-based Deferred Shading for Battlefield 3 on Playstation 3
SPU-based Deferred Shading for Battlefield 3 on Playstation 3
 
Filtering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-AliasingFiltering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-Aliasing
 
Chip Multiprocessing and the Cell Broadband Engine.pdf
Chip Multiprocessing and the Cell Broadband Engine.pdfChip Multiprocessing and the Cell Broadband Engine.pdf
Chip Multiprocessing and the Cell Broadband Engine.pdf
 

Recently uploaded

Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Neo4j
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 

Recently uploaded (20)

Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 

Sony Computer Entertainment Europe Research & Development Division

  • 1. Kish Hirani Sebastien Schertenleib Sony Computer Entertainment Europe Research & Development Division
  • 2. Slide 2 • PlayStation® Development Resources • Architecture of the PlayStation®3 (PS3™) • Case studies – Cover techniques used by 1st party titles for both the PSP® (PlayStation®Portable) and PS3™ • PlayStation® Advanced Programming – Graphics performance – PS3™ SPE utilisation What We Will Be Covering
  • 3. Kish Hirani Head of Developer Services Sony Computer Entertainment Europe (SCEE) Developer Services SCEE - Research & Development Division
  • 4. Slide 4 Developer Services • Great Marlborough Street in Central London • Cover ‘PAL’ Region Developers
  • 5. Slide 5 • Support – SCE DevNet • Field Engineering • Technical and Strategic Consultancy • Technical Training • TRC Consultancy • Projects What We Cover
  • 7. Slide 7 PS3 DevKit Latest additions to Dev Tools PS3 Test Kit
  • 8. Slide 8 Existing PSP DevKit Latest additions to Dev Tools PSPgo Commander Arm
  • 9. Slide 9 • PS3 DevKit: €1,700 • PS3 TestKit: €900 • PSP DevKit: €1,200 • PSP Commander: €300 Dev Tools Prices
  • 10. Slide 10 • Option for Digital Distribution via PlayStation® Store as well as disc • Same with full PSP titles • And now minis Digital Distribution Options
  • 11. Slide 11 • PSN available in 58 countries – 12 languages, 22 currencies – Including Russia! 90% of all titles released are now localised. • 700+ games and demos available • 600+ million items downloaded worldwide • More than 130K registered in Russia • Cumulative Russia PSN Sales €650K – as of FY2009 Q3 PlayStation®Network Vital Statistics
  • 12. Slide 12 • 52.9 million globally (as of end June 2009) – 17 million in SCEE Region – More than 850K in Russia • Russia Tie Ratio: 2:1 PSP Vital Statistics
  • 13. Slide 13 • 23.7 million PS3s (as of end of June 2009) – One million of the new model sold in three weeks in September • More than 190K in Russia so far – PlayStation and PS2 both sold more than 2 million in Russia during their lifespan (PS2 still selling well) • 68% of PS3 users in Russia go online • Russia Tie Ratio 5:1 PS3 Vital Statistics
  • 14. Slide 14 https://www.tpr.scee.net/ Where to start if you would like to become a registered PlayStation Developer
  • 15. Slide 15 http://www.worldwidestudios.net/xdev For registered developers who wish to be considered as a SCEE 2nd Partner Developer
  • 16. Dr Sebastien Schertenleib Developer Services - SCEE Research & Development PS3™ Hardware Overview
  • 17. Slide 17 • Console vs PC Development • Main Memory • RSX™ – Bandwidth, Pipeline and Data Placement • Cell Broadband Engine™ – Element Interconnect Bus, PPE and SPE • Benefits of SPU Programming • Data Storage • Peripherals PS3™ Hardware Overview
  • 18. Slide 18 PC vs Console Game Development PC • Open Platform • Limitless Resources • Generic API • IHV Drivers Console • Closed Platform • Fixed Hardware • Tiny HW Abstraction • Direct access
  • 19. Slide 19 • High performance on a budget • Fixed hardware target with a long life cycle – Games have to get better every year – Waiting for hardware to catch up with software is not an option • Have to understand the strengths and weaknesses of the hardware Why Consoles Are Different
  • 20. Slide 20 • Performance benefits from understanding of how the hardware works – Does not mean coding in assembly – Need to have an awareness of how software design may affect performance – Generally, code optimized for consoles also runs faster on PC Why Consoles Are Different
  • 21. Slide 21 • Memory is a very finite resource • Memory access is slow compared to processor speed – This gulf is getting larger on all platforms – Cache is trying to insulate the application from memory latency – But caches on consoles are generally smaller than on a PC Why Consoles Are Different
  • 24. Slide 24 • Traditionally, heap function is called (malloc/new) – Has to be contiguous – Fragmentation possible • Recommend acquiring memory through mmapper – Build heap on top of this – Non-contiguous pages mapped to contiguous address space – Not subject to fragmentation – More control: page sizes, access rights Memory Management
  • 25. Slide 25 • Games will typically have their own memory manager – Small amount of memory compared to PC – Conservative with data structures and memory allocations • Different strategies for different uses – Memory pools, custom heaps, memory tracking • Understand the fragmentation, performance and space implications for each tool and apply them appropriately Memory Management
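As one illustration of the "memory pools" strategy named on slide 25, the following is a minimal C++ sketch of a fixed-size block pool. The class name `FixedPool` and its methods are hypothetical and are not part of any PlayStation SDK; the sketch assumes single-threaded use and a single block size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size block pool (illustrative only, not an SDK API).
// All storage is reserved once, so later allocations cannot fragment it.
class FixedPool {
public:
    FixedPool(std::size_t blockSize, std::size_t blockCount)
        : blockSize_(roundUp(blockSize)),
          storage_(blockSize_ * blockCount) {
        freeList_.reserve(blockCount);
        // Push blocks in reverse so the lowest address is handed out first.
        for (std::size_t i = blockCount; i > 0; --i) {
            freeList_.push_back(storage_.data() + (i - 1) * blockSize_);
        }
    }

    // Returns a block, or nullptr when the pool is exhausted.
    void* allocate() {
        if (freeList_.empty()) return nullptr;
        unsigned char* block = freeList_.back();
        freeList_.pop_back();
        return block;
    }

    // Hands a block previously obtained from allocate() back to the pool.
    void release(void* block) {
        assert(block != nullptr);
        freeList_.push_back(static_cast<unsigned char*>(block));
    }

    std::size_t freeBlocks() const { return freeList_.size(); }

private:
    // Round the block size up so every block keeps fundamental alignment.
    static std::size_t roundUp(std::size_t size) {
        const std::size_t a = alignof(std::max_align_t);
        return (size + a - 1) / a * a;
    }

    std::size_t blockSize_;                 // declared first: initialised before storage_
    std::vector<unsigned char> storage_;    // one contiguous slab
    std::vector<unsigned char*> freeList_;  // blocks currently available
};
```

Allocation and release are both constant time, and the pool's footprint is known up front, which suits the fixed memory budget described above. The trade-off is that every block has the same size, so a game would typically keep one pool per object type or size class.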
  • 26. Slide 26 • Fetching data from memory into a processor can easily dominate performance – L1 cache miss 10~30 cycles – L2 cache miss ~300 cycles • Most applications exhibit locality of code or data – A loop executes the same small set of instructions on data located contiguously in memory • Caches aim to increase overall system performance by taking advantage of this – Fast on die memory is very expensive so they are limited in size Caches
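To see why the miss costs on slide 26 can dominate, here is a rough worked estimate. The rates are assumptions chosen for illustration, not measurements: 4% of loads miss L1 but hit L2 (taking 20 cycles, the middle of the 10~30 range), and 1% of loads miss L2 (about 300 cycles).

```latex
\[
\text{average stall per load}
  \approx 0.04 \times 20 + 0.01 \times 300
  = 0.8 + 3.0
  = 3.8\ \text{cycles}
\]
```

Under these assumed rates, the 1 load in 100 that goes all the way to main memory accounts for most of the stall time, which is why data layout and locality matter more than the raw instruction count.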
  • 28. Slide 28 • RSX™ has access to two memories • Local Memory –249MB usable (256MB – 7MB reserved by OS) • System Memory –213MB usable (256MB – 43MB reserved by OS) –RSX™ usable through memory mapping by application in 1MB blocks Memory Setup
  • 29. Slide 29 Memory Architecture Main Memory Cell RSX™ Local Memory
  • 30. Slide 30 System Bandwidth [Diagram: Cell ↔ Main Memory 25.6GB/s; Cell → GPU 20GB/s and GPU → Cell 15GB/s via the Cell I/F; GPU ↔ Local Memory 20.8GB/s; GPU drives Audio Out and Video Out through the Audio I/F and Video I/F]
  • 31. Slide 31 [Diagram: Vertex Shader → Triangle Setup → Rasteriser → Fragment Shader → Raster Operations, with Vertex/Index Data read from both Local Memory and Main Memory] RSX™ Pipeline
  • 32. Slide 32 [Diagram: Vertex Shader → Triangle Setup → Rasteriser → Fragment Shader → Raster Operations, with Textures read from Main Memory through the Main Texture Cache and from Local Memory through the Local Texture Cache] RSX™ Pipeline
  • 34. Slide 34 • Able to utilise graphics data in either memory – Vertex attributes, Textures • Frame Buffer stores colour & depth surfaces – Can also be in either memory – Local memory supports colour & Z compression for MSAA modes • Saves bandwidth rather than space • Generally local memory accesses are faster • If local memory bandwidth is bottleneck → consider moving assets to system memory Data Placement
  • 35. Slide 35 How triangles end up on screen [Diagram: RSX™ with Local Memory holding Verts, Textures and Shaders; Cell with Main Memory holding Verts, Textures, Shaders and the Cmd Buffer]
  • 37. Slide 37 Cell Broadband Engine™ • Processor core of PS3™ – Clocked at 3.2GHz • Contains 7 processing cores. – PowerPC Processor Element (PPE) • 512k cache memory – 6 Synergistic Processing Elements (SPE) • 256k local memory store
  • 39. Slide 39 • Memory Interface Controller (MIC) – Controls the 2 Rambus XDR channels. – Theoretical 25.6GB/s transfer • IOIF 0 - Connection to RSX™ – 20 GB/s to RSX™ – 15 GB/s from RSX™ • IOIF1 – Connection to South Bridge – 2.5GB/s to/from devices Inputs and Outputs
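For a rough feel for the peak figures on slide 39, dividing each by an assumed 60 frames per second gives the most data that could cross each link in one frame. The frame rate is an assumption, 1 GB is taken as 10^9 bytes, and sustained rates will be lower than these theoretical peaks.

```latex
\[
\frac{25.6\ \text{GB/s}}{60\ \text{frames/s}} \approx 427\ \text{MB per frame},\qquad
\frac{20\ \text{GB/s}}{60\ \text{frames/s}} \approx 333\ \text{MB per frame},\qquad
\frac{15\ \text{GB/s}}{60\ \text{frames/s}} = 250\ \text{MB per frame}
\]
```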
  • 41. Slide 41 • This connects all units in the CBE • Consists of four 128-bit buses • All data transfers are 128 byte packets – Smaller transfers sent as partial packets Element Interconnect Bus
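Because transfers move in 128-byte packets, it can help to size and align buffers to that granularity. The helpers below are a small C++ sketch of that bookkeeping; the names `kPacketBytes`, `roundUpToPacket` and `isPacketAligned` are hypothetical and not taken from any SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kPacketBytes = 128;  // packet size quoted on the slide

// Rounds a byte count up to a whole number of 128-byte packets.
constexpr std::size_t roundUpToPacket(std::size_t bytes) {
    return (bytes + kPacketBytes - 1) / kPacketBytes * kPacketBytes;
}

// True when the address starts on a 128-byte boundary.
inline bool isPacketAligned(const void* address) {
    return reinterpret_cast<std::uintptr_t>(address) % kPacketBytes == 0;
}

// A 129-byte payload occupies two full packets.
static_assert(roundUpToPacket(129) == 256, "129 bytes round up to 256");
```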
  • 45. Slide 45 • 64 bit CPU core • 25.6 GFLOPs FPU • In order execution • Execution units include: – IEEE double precision FPU – Integer – Float vector unit (VMX) PowerPC Processing Element
  • 47. Slide 47 • Intrinsics are supported, preferable to inline assembly –Scheduling and optimisation • Two hardware threads –2 or more active software threads act to mitigate stalls • Good use of caches can make big difference in performance PPE Performance
  • 48. Slide 48 • PowerPC issue that can severely affect performance • PPU has a store queue that holds stores waiting to be sent to the cache • If a load conflicts with data in the store queue, the load will stall until the data is stored – Pipeline stall, around 40 cycles • Find using Tuner – LHS performance counter Load Hit Store
  • 49. Slide 49 • Type conversion – Integer, floating point or vector registers – Moves have to go through memory as there are no asm instructions to move data between register sets • Stack access - when calling a function that has: – Large parameters - will be passed on the stack – Many parameters - some may end up on the stack – Few instructions - with stack based parameters LHS Cases
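The type-conversion case on slide 49 can be sketched in C++ as follows. The first function converts float to int on every iteration, so on a processor with no direct move between register sets each conversion is a store followed by a dependent load. The second stays in floating point and converts once. Both function names are hypothetical, and whether the second version actually avoids the stall depends on how the compiler implements truncation, so it should be confirmed with a profiler.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Converts every element to int inside the loop: one float-to-int
// round trip through memory per element on the hardware described above.
int sumTruncatedPerElement(const float* values, std::size_t count) {
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += static_cast<int>(values[i]);
    }
    return sum;
}

// Stays in floating point inside the loop and converts once at the end.
// Only exact while the running sum stays below 2^24 in magnitude.
int sumTruncatedOnce(const float* values, std::size_t count) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        sum += std::trunc(values[i]);
    }
    return static_cast<int>(sum);
}
```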
  • 51. Slide 51 SPE Architecture • 3.2GHz processor • 128 Vector registers • 128 bit • No scalar registers • 256kB Local Store • Very fast, like L1 cache • Runs Asynchronously from rest of system • Independent Memory Flow Controller (MFC) • SIMD (single instruction multiple data) • No Operating System, Very few system calls • Performance is high and deterministic SPE EIB MFC SPU Local Store
  • 52. Slide 52 • SPE means Synergistic Processing Element – Mutual dependence between PPE and SPEs • PPE runs operating system and performs top level control – General Purpose Hardware (PowerPC) • SPEs provide bulk of application performance – Simple hardware – Simple pipeline – Simple memory architecture – Massive computational horsepower SPEs
  • 53. Slide 53 • Simple, predictable instruction set – Has full SIMD support – Intrinsics are supported for all instructions • Optimised compiler – Handles scalar conversion – Often as good as PPE for scalar code, can be quicker • Easy to program; can be programmed in C/C++ • Core power of system, key to performance • Frees up RSX™, frees up PPE Synergistic Processor Elements
  • 54. Slide 54 • Ported PPU code usually faster for optimised & non-optimised code – Port your code • Easy to program; with some limitations can be programmed in C/C++ • Scalar code can be run – Full SIMD instruction set • Multithreaded engine analogy - can move threads to SPUs SPU Development
  • 55. Slide 55 • Raw – You own the hardware – Complete control, direct hardware access – Once loaded – stays resident • SPU Threads – The OS owns the hardware – Must belong to a thread group – Scheduled by Cell OS Lv-2 kernel – based on group priority • SPURS – Higher level abstraction – Plays well with middleware Ways of Using SPUs
  • 56. Slide 56 • Runs on top of SPU Threads – Small kernel on SPU • Thread switching more efficient – Requires no PPU resources – PPU SPURS Handler: only to restore thread group when necessary • Easier to synchronise threads • Easier to balance load on multiple SPUs – SPUs grab work when available SPURS (SPU Run Time System)
  • 57. Slide 57 SPURS (SPU Run Time System) • Implemented as an SPU Thread Group • Kernel, Policy Module, Workload • Minimal PPU Overhead [Diagram: SPURS Handler on the PPU; Thread Group for SPURS on SPU0, SPU1 and SPU2, each running a SPURS Kernel, Policy Module and Work Load]
  • 58. Slide 58 • Jobs – A chain of execution units – Suited to short execution times – Automatic DMA transfer of input & output data – Executed with associated data – No context switching – run complete • Tasks – Different data model, larger programs – Can yield - synchronisation mechanisms – Optimised context switch – Similar to threads SPURS Workload Types
  • 59. Slide 59 • SPURS JobChain uses manual control of the execution sequence • SPURS JobQueue is designed for dynamic job submission in parallel SPURS Job Queue [Diagram: a PPU Thread, PPU Fiber, SPURS Task and SPU submitting jobs in parallel through Ports to the SPURS Job Queue; SPUs executing the submitted jobs in parallel]
  • 61. Slide 61 • Problem: Update an array of objects and submit the visible ones for rendering • Object.update() method was bottleneck on PPE – Determined using SN Tuner – This function generates the world space matrices for each object – Embarrassingly parallel • No data dependencies between objects • If we had 1 processor for each object we could do that Function offload example using a SPURS Task
  • 62. Slide 62 SPURS Tasks
        //---Conditionally calling SPU Code
        if (usingSPU==false) //---generic version
        {
            for(int i=0; i<OBJECT_COUNT; i++)
            {
                testObject[i].update();
            }
        }
        else //---SPU version
        {
            int test=spurs.ThrowTaskRunComplete(taskUpdateObject_elf_start, (int)testObject, OBJECT_COUNT,0,0);
            //could do something useful here…
            int result = spurs.CatchTaskRunComplete(test);
        }
  • 63. Slide 63 • Getting data in and out of SPU is first problem • Get this working before worrying about actual processing – Brute force often works just fine • DMA entire object array into SPE memory • Run update method • DMA entire object array back to main memory • List DMA allows you to get fancy – Gather and Scatter operation • If the data set is really big or if you need every last drop of performance – Streaming model • Overlap DMA transfers with processing • Double or quad buffering SPURS Task
  • 64. Slide 64 SPURS Task: Getting Data in and out
        int cellSpuMain(.....)
        {
            int sourceAddr = ....; //source address
            int count = ....; //number of data elements
            int dataSize = count * sizeof(gfxObject); //amount of memory to get
            gfxObject *buf = (gfxObject*)memalign(128,dataSize); //alloc mem
            DmaGet((void*)buf,sourceAddr,dataSize);
            DmaWait(....);
            //---data is loaded at this point so do something interesting
            DmaPut((void*)buf,sourceAddr,dataSize);
            DmaWait(....);
            return(0);
        }
  • 65. Slide 65 • Keep code the same as PPE Version – Conditional compilation based on processor type – Might have to split code into a separate file SPURS Task: Executing the Code
        //--- SPU code to call the update method
        //--- data is loaded at this point so do something interesting
        //--- step through array of objects and update
        gfxObject *tp;
        for(int i=0; i<count; i++)
        {
            tp = &buf[i];
            tp->update(); //same method as on PPE compiled for SPU
        }
  • 66. Slide 66 • 5x Speed up in this instance – Brute force solution – Could add more SPUS – Problem is embarrassingly parallel • Note that the same source code for the method is being used on PPU and SPU – Code stays in sync – No fancy SPU specific optimisations SPURS Task: Results
  • 67. Slide 67 • PPU Stalled while waiting for SPU to Process • SPU Data get and put were synchronous – Not using hardware to its fullest extent – Easy to get more if we need it SPURS Task: Time Waster [Timeline diagram: the PPU waits while the SPU performs GET, Exec and PUT one after another]
  • 69. Slide 69 Physics Solver Parallel SPU x5 PPU 512 22ms 3200 23ms
  • 70. Slide 70 PPU Solver • Collision detection and response is a big, involved process [Chart: Time (ms) per Step for Integrate, Collision and Response; 3200 boxes]
  • 71. Slide 71 Development History [Chart: Total Time (ms) for PPU x1, SPU x1, SPU x5 and SPU x5 (Optimised)]
  • 72. Slide 72 SPU solver [Chart: Time (ms) per Step for Integrate, Collision and Response]
  • 73. Slide 73 • Simple pipeline – No stalls like LHS • No Operating System, very few system calls – Performance is high and deterministic • Memory is very fast like L1 cache – Memory transfers are asynchronous so can be hidden • Even if performance is comparable, frees up the PPE to do other stuff SPEs are faster than the PPE
  • 74. Slide 74 • Double buffer • Align data and transfers optimally • Vectorise data: SOA instead of AOS • Use SIMD – 4 operations at once = 4x speed up • Use intrinsics • Program directives - branch hinting Further SPU Optimisations
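The "SOA instead of AOS" item on slide 74 can be illustrated with a small C++ sketch. The type and function names (`PositionAoS`, `PositionsSoA`, `toSoA`, `translateX`) are hypothetical. The point is only the layout: in the structure-of-arrays form, four consecutive `x` values sit next to each other and can fill one 128-bit register.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures: the fields of one object are adjacent in memory.
struct PositionAoS {
    float x, y, z;
};

// Structure of arrays: the same field of every object is adjacent.
struct PositionsSoA {
    std::vector<float> x, y, z;
};

// Converts from the AoS layout to the SoA layout.
PositionsSoA toSoA(const std::vector<PositionAoS>& in) {
    PositionsSoA out;
    out.x.reserve(in.size());
    out.y.reserve(in.size());
    out.z.reserve(in.size());
    for (const PositionAoS& p : in) {
        out.x.push_back(p.x);
        out.y.push_back(p.y);
        out.z.push_back(p.z);
    }
    return out;
}

// A uniform loop over contiguous floats, which maps naturally onto SIMD.
void translateX(PositionsSoA& positions, float dx) {
    for (std::size_t i = 0; i < positions.x.size(); ++i) {
        positions.x[i] += dx;
    }
}
```

This sketch uses scalar code and leaves vectorisation to the compiler or to intrinsics. The 4x figure quoted on the slide is an upper bound that assumes all four lanes do useful work.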
  • 75. Slide 75 Double-Buffering • Allows overlapping of DMA and computation • Transfer to/from one buffer while using data from another • Use different tag group for each buffer • Depends on space available in LS [Timeline diagram: DMA and Computation alternating between Buffer0 and Buffer1 over TIME]
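The double-buffering scheme on slide 75 can be sketched in portable C++ as below. `dmaGet`, `dmaPut` and `dmaWait` are hypothetical stand-ins for asynchronous DMA calls (here they copy immediately), and the chunk size of four elements is an assumption chosen to keep the example small. The loop structure is the part that matters: fetch chunk c+1 into one buffer while computing on chunk c in the other, with one tag per buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for asynchronous DMA calls. Here they copy
// immediately, so dmaWait has nothing to wait for; on real hardware the
// transfer would proceed while the caller keeps computing.
void dmaGet(void* local, const void* remote, std::size_t bytes, int /*tag*/) {
    std::memcpy(local, remote, bytes);
}
void dmaPut(const void* local, void* remote, std::size_t bytes, int /*tag*/) {
    std::memcpy(remote, local, bytes);
}
void dmaWait(int /*tag*/) {}

constexpr std::size_t kChunk = 4;  // elements per buffer (assumed size)

// Multiplies every element by factor, streaming through two local buffers.
void scaleStreamed(std::vector<float>& mainMemory, float factor) {
    float buffer[2][kChunk];
    const std::size_t total = mainMemory.size();
    const std::size_t chunks = (total + kChunk - 1) / kChunk;
    if (chunks == 0) return;

    // Number of elements in a given chunk (the last one may be short).
    auto length = [total](std::size_t chunk) {
        return std::min(kChunk, total - chunk * kChunk);
    };

    float* remote = mainMemory.data();
    dmaGet(buffer[0], remote, length(0) * sizeof(float), 0);

    for (std::size_t c = 0; c < chunks; ++c) {
        const int cur = static_cast<int>(c & 1);  // buffer index doubles as tag
        const int next = cur ^ 1;

        if (c + 1 < chunks) {
            dmaWait(next);  // the earlier put from this buffer must be done
            dmaGet(buffer[next], remote + (c + 1) * kChunk,
                   length(c + 1) * sizeof(float), next);
        }

        dmaWait(cur);  // data for the current chunk has arrived
        for (std::size_t i = 0; i < length(c); ++i) {
            buffer[cur][i] *= factor;
        }
        dmaPut(buffer[cur], remote + c * kChunk, length(c) * sizeof(float), cur);
    }

    dmaWait(0);  // drain outstanding puts before returning
    dmaWait(1);
}
```

Using the buffer index as the tag means a wait on one buffer never blocks on the other buffer's transfer, which is the reason the slide recommends a different tag group for each buffer.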
  • 76. Slide 76 • Graphics API • Multistream – Audio • Bullet – Physics • PlayStation®Edge – Geometry/Compression/Animation • FIOS – Optimised disc access • Various Codecs Our Libraries that run on SPUs
  • 77. Slide 77 • Game engine including – Modular run time – Samples, docs and whitepapers – Full source and art work • Optimised for multi-core platforms especially PS3™ • Extractable and reusable components PhyreEngine™
  • 78. Slide 78 • A lot of 3rd party middleware partners use the SPUs – Physics – Audio – AI – Game Engines – Many more… • Examples by guest speakers at the end of the talk – NVIDIA® PhysX®, Epic Games Unreal® Engine 3 and Crytek CryENGINE® 3 In Addition
  • 79. Slide 79 • Animation • AI Threat prediction • AI Line of fire • AI Obstacle avoidance • Collision detection • Physics • Particle simulation • Particle rendering • Scene graph Killzone®2 SPU Usage • Display list building • IBL Light probes • Graphics post-processing • Dynamic music system • Skinning • MP3 Decompression • Edge Zlib • etc. 44 Job types
  • 80. Slide 80 • Effect done on SPU – Motion Blur, Bloom, Depth of Field – Quarter-res image on XDR • SPUs assist RSX™ 1. RSX™ prepares low-res image buffers in XDR 2. RSX™ triggers interrupt to start SPUs 3. SPUs perform image operations 4. RSX™ already starts on next frame 5. Result of SPUs processed by RSX™ early in next frame Post Processing
  • 81. Slide 81 Resolve to quarter resolution Image generated on the SPUs (bloom, DoF, motion blur) Final image by the RSX™ (Blending the out-of-focus and blurry areas in) XDR DDR
  • 83. Slide 83 • Faster than Blu-ray – Over 20MB/s vs. 9MB/s of Blu-ray – Cache data to HDD for faster subsequent load times – Install titles – Stream data • Virtual memory • Decrease usage of system memory Storage – Hard drive
  • 84. Slide 84 • High capacity – 25GB single layer – 50GB dual layer • Constant linear velocity – Constant read speed over disc - 9 MB/s – Optimise layout, duplicate and pack data • Use compression – Lowers bandwidth cost and lowers read time – Use SPUs to decompress – Edge Zlib • Can emulate on Devkit Blu-ray
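As a rough worked example of why compression lowers read time at the 9 MB/s quoted on slide 84, assume 180 MB of level data and a 2:1 compression ratio. Both figures are assumptions for illustration, and decompression is assumed to be hidden on the SPUs.

```latex
\[
t_{\text{raw}} = \frac{180\ \text{MB}}{9\ \text{MB/s}} = 20\ \text{s},
\qquad
t_{\text{compressed}} = \frac{180\ \text{MB} / 2}{9\ \text{MB/s}} = 10\ \text{s}
\]
```

Under these assumptions the load time halves, because the disc read rate is the bottleneck rather than the decompression.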
  • 85. Slide 85 • Can access both Blu-ray drive (~9 MB/s) and HDD (~20MB/s) simultaneously – On HDD, data come from the system cache and game data partitions • SPU Zlib decompression – Increase the throughput • libFIOS – Optimize I/O transfers – Ease background HDD caching implementation PS3TM I/O Performance System Cache (2GB) 25 ms apart Game Data (up to 5GB per title)