•No API in detail
•No code (my own)
CPU vs GPU
Instruction stream sharing. While the programming model permits each shader invocation to follow a unique stream of control, in practice shader execution on nearby stream elements often results in the same dynamic control-flow decisions. As a result, multiple shader invocations can likely share an instruction stream. Although GPUs must accommodate situations where this is not the case, instruction stream sharing across multiple shader invocations is a key optimization in the design of GPU processing cores and is accounted for in algorithms for pipeline scheduling.
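A toy model can make instruction stream sharing concrete. The C sketch below treats a group of eight shader invocations as lanes that share one instruction stream; when the lanes disagree on a branch, both sides are issued and a per-lane mask decides which lanes keep the result. The group width, the toy shader, and the name `run_group` are hypothetical, and real hardware implements the mask in dedicated logic rather than in loops like these.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical group size: invocations sharing one instruction stream. */
#define GROUP_WIDTH 8

/* Toy shader run by every invocation in the group:
       if (x > 0) y = 2 * x; else y = -x;
   Both sides come from one shared instruction stream; a per-lane mask
   decides which lanes commit results. Returns how many branch sides the
   group issued: 1 if all lanes agreed, 2 if they diverged. */
int run_group(const float x[GROUP_WIDTH], float y[GROUP_WIDTH])
{
    assert(x && y);
    bool take_then[GROUP_WIDTH];
    int n_then = 0;

    /* Every lane evaluates the branch condition in lockstep. */
    for (int lane = 0; lane < GROUP_WIDTH; ++lane) {
        take_then[lane] = x[lane] > 0.0f;
        n_then += take_then[lane];
    }

    int sides_issued = 0;

    /* "then" side: issued only if at least one lane needs it. */
    if (n_then > 0) {
        ++sides_issued;
        for (int lane = 0; lane < GROUP_WIDTH; ++lane)
            if (take_then[lane])
                y[lane] = 2.0f * x[lane];
    }

    /* "else" side: issued only if at least one lane needs it. */
    if (n_then < GROUP_WIDTH) {
        ++sides_issued;
        for (int lane = 0; lane < GROUP_WIDTH; ++lane)
            if (!take_then[lane])
                y[lane] = -x[lane];
    }
    return sides_issued;
}
```

When every lane takes the same path, the group issues only one side of the branch; mixing positive and negative inputs forces both sides, which is why coherent control flow among nearby elements pays off.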
Even higher performance is possible by populating each core with multiple floating-point ALUs. This is done efficiently with SIMD processing, which uses each ALU to perform the same operation on a different piece of data. The most common implementation of SIMD processing is via explicit short-vector instructions, similar to those provided by the x86 SSE or PowerPC AltiVec ISA extensions. These extensions provide a SIMD width of four, with instructions that control the operation of four ALUs. Alternative implementations, such as NVIDIA's 8-series architecture, perform SIMD execution by implicitly sharing an instruction stream across threads with identical program counters (PCs).
Type  Processor               Cores/Chip  ALUs/Core(3)  SIMD width  Max T(4)
GPUs  AMD Radeon HD 2900           4           80            64          48
      NVIDIA GeForce 8800         16            8            32          96
CPUs  Intel Core 2 Quad(1)         4            8             4           1
      STI Cell BE(2)               8            4             4           1
      Sun UltraSPARC T2            8            1             1           4
(1) SSE processing only; does not account for the x86 FPU.
(2) Stream processing (SPE) cores only; does not account for PPU cores.
(3) 32-bit floating point (all ALUs are multiply-add except on the Intel Core 2 Quad).
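For the explicit short-vector style, here is a minimal C sketch using the SSE intrinsics from `<xmmintrin.h>`, where each `_mm_*` operation works on four floats at once (SIMD width four, as in the Intel Core 2 Quad row). Base SSE has no fused multiply-add, so the multiply and the add are separate instructions. The name `madd_simd4` and the requirement that `n` be a multiple of 4 are assumptions of this sketch; a scalar fallback keeps it compilable on non-SSE targets.

```c
#include <assert.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE: one instruction drives four float lanes */
#endif

/* out[i] = a[i] * b[i] + c[i], four floats per step.
   Assumption of this sketch: n is a multiple of 4 (the SSE width). */
void madd_simd4(const float *a, const float *b, const float *c,
                float *out, int n)
{
    assert(n % 4 == 0);
    for (int i = 0; i < n; i += 4) {
#if defined(__SSE__)
        /* Explicit short-vector path: each _mm_* call is one 4-wide op. */
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_mul_ps(va, vb), vc));
#else
        /* Scalar fallback for non-SSE targets: same result, lane by lane. */
        for (int lane = 0; lane < 4; ++lane)
            out[i + lane] = a[i + lane] * b[i + lane] + c[i + lane];
#endif
    }
}
```

The programmer sees the width of four directly in the code; in the implicit style the same lockstep execution happens without vector types appearing in the shader.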
Why do we need a driver?
•GPU runs asynchronously
•Different address space
•Display is updated by frame
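Frame-by-frame display updates are commonly implemented with double buffering: the GPU draws into an off-screen back buffer while the display shows the front buffer, and the two are swapped once the frame is complete (in OpenGL ES with EGL, the swap is requested with `eglSwapBuffers`). A minimal C model, assuming a hypothetical four-pixel `SwapChain` type:

```c
#include <assert.h>

#define FB_PIXELS 4   /* hypothetical tiny framebuffer */

typedef struct {
    unsigned buffers[2][FB_PIXELS];
    int front;        /* index of the buffer the display is showing */
} SwapChain;

unsigned *back_buffer(SwapChain *sc) { return sc->buffers[1 - sc->front]; }
const unsigned *front_buffer(const SwapChain *sc) { return sc->buffers[sc->front]; }

/* Draw a full frame off-screen, then swap so the display only sees
   finished frames. */
void render_frame(SwapChain *sc, unsigned color)
{
    assert(sc->front == 0 || sc->front == 1);
    unsigned *back = back_buffer(sc);
    for (int i = 0; i < FB_PIXELS; ++i)
        back[i] = color;          /* the visible buffer is never touched */
    sc->front = 1 - sc->front;    /* "present": back becomes front */
}
```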
•(Think about what to draw)
•Choose brushes and paints
•Paint with the brush
•(Crumple it up or hang it on the wall)
•Spread out a fresh sheet of paper
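Drawing a picture / Graphics app
•Spread out the drawing paper / Framebuffer setup
•(Think about what to draw) / Data setup
•Choose brushes and paints / State setup
•Paint with the brush / Draw call
•(Crumple it up or hang it on the wall) / Update a frame
•Spread out a fresh sheet of paper / Framebuffer clear
The graphics driver provides an API for every one of these steps.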
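The analogy can be written down as the order in which an app issues these steps. The C sketch only records steps from a hypothetical `Step` enum; the comments name OpenGL ES / EGL calls that typically perform each step, and treating framebuffer and data setup as one-time work is a simplification of this sketch.

```c
#include <assert.h>

/* Hypothetical labels for the slide's steps; comments name OpenGL ES / EGL
   calls that typically perform each one. */
typedef enum {
    FRAMEBUFFER_SETUP, /* eglCreateWindowSurface, eglMakeCurrent */
    DATA_SETUP,        /* glBufferData, glTexImage2D */
    STATE_SETUP,       /* glUseProgram, glVertexAttribPointer */
    DRAW_CALL,         /* glDrawArrays, glDrawElements */
    UPDATE_FRAME,      /* eglSwapBuffers */
    FRAMEBUFFER_CLEAR  /* glClear */
} Step;

/* Records the order an app would issue the steps for n_frames frames.
   Returns the number of steps written to log. */
int record_app(Step *log, int cap, int n_frames)
{
    int n = 0;
    assert(cap >= 2 + 4 * n_frames);
    log[n++] = FRAMEBUFFER_SETUP;      /* once: spread out the paper */
    log[n++] = DATA_SETUP;             /* once: decide what to draw */
    for (int f = 0; f < n_frames; ++f) {
        log[n++] = STATE_SETUP;        /* choose brush and paint */
        log[n++] = DRAW_CALL;          /* paint */
        log[n++] = UPDATE_FRAME;       /* hang the picture up */
        log[n++] = FRAMEBUFFER_CLEAR;  /* fresh sheet for the next frame */
    }
    return n;
}
```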
Layered structure of the graphics driver
Command Queue Management
Why is it expensive?
What the graphics driver does
■ Confirming API usage is valid
■ Encoding API state to hardware state
■ Run-time generation of shader machine code
■ Interactions between state and shaders
•Sending work to the GPU
■ Managing resource residency
■ Batching commands
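A hypothetical sketch of three of these jobs inside a single draw call: confirming the call is valid, encoding it into a hardware packet, and batching packets so the GPU is handed work once per full buffer rather than once per draw. The packet layout, the 8-entry batch, and all names (`CommandBuffer`, `driver_draw`, `driver_flush`) are invented for illustration; real drivers use vendor-specific formats.

```c
#include <assert.h>
#include <stdint.h>

#define BATCH_CAPACITY 8   /* hypothetical batch size */

enum { PRIM_TRIANGLES = 0, PRIM_LINES = 1 };

typedef struct {
    uint32_t packets[BATCH_CAPACITY]; /* encoded commands not yet sent */
    int count;
    int submits;    /* times a batch was handed to the GPU */
    int rejected;   /* API calls that failed validation */
} CommandBuffer;

/* Hand the current batch to the GPU (simulated: count it and empty it). */
void driver_flush(CommandBuffer *cb)
{
    if (cb->count > 0) {
        cb->submits++;
        cb->count = 0;
    }
}

/* draw(prim, first, vertex_count) as a driver might process it. */
int driver_draw(CommandBuffer *cb, int prim, int first, int vertex_count)
{
    assert(cb->count <= BATCH_CAPACITY);

    /* 1. Confirm API usage is valid (and fits the packet fields). */
    if ((prim != PRIM_TRIANGLES && prim != PRIM_LINES) ||
        first < 0 || first > 0x3FFF ||
        vertex_count <= 0 || vertex_count > 0xFFFF) {
        cb->rejected++;
        return -1;
    }

    /* 2. Encode API state into a made-up 32-bit hardware packet:
          bits 31..30 primitive, 29..16 first vertex, 15..0 vertex count. */
    uint32_t packet = ((uint32_t)prim << 30) |
                      ((uint32_t)first << 16) |
                      (uint32_t)vertex_count;

    /* 3. Batch: submit only when the buffer is full. */
    if (cb->count == BATCH_CAPACITY)
        driver_flush(cb);
    cb->packets[cb->count++] = packet;
    return 0;
}
```

Twenty valid draws cost two submissions here instead of twenty; the remaining four packets go out with the next explicit flush.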
Are you kidding?
•No standard for pre-built shaders
•No standard for a shader binary format
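Because there is no standard pre-built shader format, the driver has to turn GLSL source into machine code at run time. One common way to soften that cost is a shader cache keyed by a hash of the source, so each distinct shader is compiled only once. A minimal C sketch, assuming a 64-bit FNV-1a hash, a fixed 16-slot table, and a hypothetical `compile_to_machine_code` stub standing in for the real compiler:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a hash of a NUL-terminated shader source string. */
uint64_t fnv1a64(const char *s)
{
    uint64_t h = 14695981039346656037ull;   /* FNV offset basis */
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ull;               /* FNV prime */
    }
    return h;
}

#define CACHE_SLOTS 16

typedef struct {
    uint64_t keys[CACHE_SLOTS];
    int used[CACHE_SLOTS];
    int compiles;   /* how many times the expensive compiler ran */
} ShaderCache;

/* Hypothetical stand-in for the driver's GLSL -> machine code compiler. */
static int compile_to_machine_code(const char *src) { (void)src; return 1; }

/* Returns the cache slot for src, compiling only on a miss; -1 if full.
   Note: this sketch trusts the hash; a real cache would also guard
   against collisions (e.g. by comparing the full source). */
int get_shader(ShaderCache *cache, const char *src)
{
    assert(src != NULL);
    uint64_t key = fnv1a64(src);
    int start = (int)(key % CACHE_SLOTS);
    for (int probe = 0; probe < CACHE_SLOTS; ++probe) {   /* linear probing */
        int i = (start + probe) % CACHE_SLOTS;
        if (cache->used[i] && cache->keys[i] == key)
            return i;                            /* hit: reuse */
        if (!cache->used[i]) {
            compile_to_machine_code(src);        /* miss: pay the cost once */
            cache->compiles++;
            cache->used[i] = 1;
            cache->keys[i] = key;
            return i;
        }
    }
    return -1;
}
```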
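int Init(ESContext *esContext)
{
    UserData *userData = esContext->userData;

    /* Vertex shader: passes the vertex position through unchanged */
    GLbyte vShaderStr[] =
        "attribute vec4 vPosition;   \n"
        "void main()                 \n"
        "{                           \n"
        "   gl_Position = vPosition; \n"
        "}                           \n";
    /* Fragment shader: colors every fragment opaque red */
    GLbyte fShaderStr[] =
        "precision mediump float;                    \n"
        "void main()                                 \n"
        "{                                           \n"
        "  gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);  \n"
        "}                                           \n";
    /* ... rest of Init not shown: the driver compiles these strings at run time ... */
}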
•A shader "shades" the object: it paints it darker
courtesy of 西川善司
Sending work to the GPU
•Batching commands and committing them
•Transferring data and textures
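Because the GPU runs asynchronously, committing usually means publishing a write position in a command ring that the GPU reads from, and the CPU later checks a fence to learn how far the GPU has gotten. The C sketch below simulates both sides on one thread; the ring size, the `Ring` type, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8   /* hypothetical number of command slots */

typedef struct {
    uint32_t slots[RING_SIZE];
    uint64_t write;       /* next slot the CPU (driver) will fill */
    uint64_t committed;   /* commands the GPU is allowed to read */
    uint64_t read;        /* next slot the GPU will execute */
} Ring;

/* CPU: record one command; -1 means the ring is full and the CPU must wait. */
int ring_push(Ring *r, uint32_t cmd)
{
    if (r->write - r->read == RING_SIZE)
        return -1;
    r->slots[r->write % RING_SIZE] = cmd;
    r->write++;
    return 0;
}

/* CPU: publish everything written so far (a real driver would typically
   also notify the GPU, e.g. via a register write). The return value acts
   as a fence for this batch. */
uint64_t ring_commit(Ring *r)
{
    r->committed = r->write;
    assert(r->read <= r->committed);
    return r->committed;
}

/* GPU (simulated): execute up to max_cmds committed commands by summing them. */
int gpu_step(Ring *r, int max_cmds, uint32_t *sum)
{
    int done = 0;
    while (done < max_cmds && r->read < r->committed) {
        *sum += r->slots[r->read % RING_SIZE];
        r->read++;
        done++;
    }
    return done;
}

/* CPU: has the GPU finished everything up to this fence? */
int fence_signaled(const Ring *r, uint64_t fence)
{
    return r->read >= fence;
}
```

Until `ring_commit` runs, the simulated GPU sees nothing, even though commands have been written; that gap is what lets the driver batch work before paying for a submission.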
•Low CPU overhead
•More predictable performance
•Create and validate state up-front
•Shader can be compiled offline
•Enable versatile multi-threading
•Shared memory for CPU & GPU
•Handle synchronisation explicitly
•Tile-based deferred rendering
•C++11-based language
•No legacy baggage
But: A7 only. What the x
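Creating and validating state up-front means paying the validation cost once, when a state object is built, instead of on every draw (in Metal, objects such as `MTLRenderPipelineState` play this role). The C model below counts how often a hypothetical `validate` check runs under each approach; the descriptor fields and all names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of the state a draw depends on. */
typedef struct {
    int  color_format;        /* 0 = RGBA8, 1 = BGRA8 (made-up codes) */
    bool has_vertex_shader;
    bool has_fragment_shader;
} PipelineDesc;

typedef struct {
    PipelineDesc desc;
    bool valid;               /* result of the one-time validation */
} PipelineState;

int validations = 0;          /* how many times the expensive check ran */

static bool validate(const PipelineDesc *d)
{
    validations++;
    return d->has_vertex_shader && d->has_fragment_shader &&
           (d->color_format == 0 || d->color_format == 1);
}

/* Up-front: validate once when the state object is created. */
PipelineState create_pipeline(PipelineDesc d)
{
    PipelineState p = { d, validate(&d) };
    return p;
}

/* Loose-state style: state may have changed, so check on every draw. */
int draw_loose(const PipelineDesc *d)
{
    return validate(d) ? 0 : -1;
}

/* Prebuilt style: the draw just uses the already-validated object. */
int draw_prebuilt(const PipelineState *p)
{
    assert(p != NULL);
    return p->valid ? 0 : -1;
}
```

With 100 draws, the loose style runs the check 100 times while the prebuilt style runs it once, which is the CPU-overhead argument in miniature.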