
OSXDEV Open Seminar - Catching Up with WWDC

Published in: Technology, Art & Photos


  1. 1. Metal girl+sk8er osxdev.org
  2. 2. Recap WWDC 2014 •Swift •Yosemite •Metal
  3. 3. I’m so happy that I was too lazy to learn Objective-C
  4. 4. Maybe or not Game Industry Trend C++ OOP Design Pattern TDD - C FP / PP / DOP - Fast Iteration Immutability
  5. 5. Maybe or not Game Industry Trend C++ / Objective-C OOP Design Pattern TDD C / Swift FP / PP / DOP ! Fast Iteration Immutability
  6. 6. seen season one before? Explaining Metal
  7. 7. Season one BAAM! Explaining Metal
  8. 8. SubTitle A word from the boss http://www.bloter.net/archives/195819
  9. 9. This talk •No API in detail •No code(my own) •No demo
  10. 10. CPU vs GPU [Diagram labels: Control, Cache, ALU, DRAM] Excerpt shown on the slide: "Even higher performance is possible by populating each core with multiple floating-point ALUs. This is done efficiently with SIMD processing, which uses each ALU to perform the same operation on a different piece of data. The most common implementation of SIMD processing is via explicit short-vector instructions, similar to those provided by the x86 SSE or PowerPC Altivec ISA extensions. These extensions provide a SIMD width of four, with instructions that control the operation of four ALUs. Alternative implementations, such as NVIDIA’s 8-series architecture, perform SIMD execution by implicitly shar-"
      TABLE 1
      Type  Processor              Cores/Chip  ALUs/Core (3)  SIMD width  MaxT (4)
      GPU   AMD Radeon HD 2900     4           80             64          48
      GPU   NVIDIA GeForce 8800    16          8              32          96
      CPU   Intel Core 2 Quad (1)  4           8              4           1
      CPU   STI Cell BE (2)        8           4              4           1
      CPU   Sun UltraSPARC T2      8           1              1           4
      (1) SSE processing only, does not account for x86 FPU.
      (2) Stream processing (SPE) cores only, does not account for PPU cores.
      (3) 32-bit, floating point (all ALUs are multiply-add except the Intel Core 2 Quad)
  11. 11. Apple A7 http://www.anandtech.com/show/8116/some-thoughts-on-apples-metal-api
  12. 12. Why should we use a driver?
  13. 13. Why should we use a driver? •GPU runs asynchronously •Different address space •Different ISA •Display is updated frame by frame
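  14. 14. Painting a picture •Spread out the drawing paper •(Think about what to paint) •Choose brushes and paints •Paint with the brush •(Crumple it up or hang it on the wall) •Spread out a new sheet of paper
  15. 15. Painting a picture / Graphics App. •Spread out the drawing paper / Framebuffer setup •(Think about what to paint) / Data setup •Choose brushes and paints / State setup •Paint with the brush / Draw call •(Crumple it up or hang it on the wall) / Update a frame •Spread out a new sheet of paper / Framebuffer clear The graphics driver provides the API for this entire process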
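The mapping on slide 15 can be sketched as a frame loop. This is only an illustration: the step names, `SkLog`, and `sk_run_frames` are hypothetical and record the order of operations instead of calling a real graphics API.

```c
#include <stddef.h>

/* Hypothetical step names, one per line of the painting analogy. */
typedef enum {
    STEP_FRAMEBUFFER_SETUP,  /* spread out the drawing paper     */
    STEP_DATA_SETUP,         /* think about what to paint        */
    STEP_STATE_SETUP,        /* choose brushes and paints        */
    STEP_DRAW_CALL,          /* paint with the brush             */
    STEP_UPDATE_FRAME,       /* hang the finished picture        */
    STEP_FRAMEBUFFER_CLEAR   /* spread out a new sheet of paper  */
} SkStep;

#define SK_LOG_CAPACITY 64

/* Records the order in which steps were issued. */
typedef struct {
    SkStep steps[SK_LOG_CAPACITY];
    size_t count;
} SkLog;

static void sk_record(SkLog *log, SkStep step)
{
    if (log->count < SK_LOG_CAPACITY) {
        log->steps[log->count++] = step;
    }
}

/* Issues the steps for `frames` frames: one framebuffer setup,
 * then the same five steps repeated for every frame. */
void sk_run_frames(SkLog *log, int frames)
{
    sk_record(log, STEP_FRAMEBUFFER_SETUP);
    for (int i = 0; i < frames; i++) {
        sk_record(log, STEP_DATA_SETUP);
        sk_record(log, STEP_STATE_SETUP);
        sk_record(log, STEP_DRAW_CALL);
        sk_record(log, STEP_UPDATE_FRAME);
        sk_record(log, STEP_FRAMEBUFFER_CLEAR);
    }
}
```

In a real application every one of these steps goes through the driver, which is why the per-frame part of the loop is where driver overhead accumulates.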
  16. 16. Layered structure of the graphics driver API Interface State Management Command Queue Management I/O Controller Shader Compiler
  17. 17. Why is it expensive? What the graphics driver does •State validation ■ Confirming API usage is valid ■ Encoding API state to hardware state •Shader compilation ■ Run-time generation of shader machine code ■ Interactions between state and shaders •Sending work to GPU ■ Managing resource residency ■ Batching commands
  18. 18. OpenGL State validation void glTexImage2D( GLenum target, GLint level, GLint internalFormat, GLsizei width, GLsizei height, GLint border, GLenum format, GLenum type, const GLvoid * data);
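To make "state validation" concrete, here is a minimal sketch of the kind of argument checking a driver might run on every call to a function shaped like `glTexImage2D`. It is not the actual driver code: the `Sk*` types, `SK_MAX_TEXTURE_SIZE`, and `sk_validate_tex_image_2d` are hypothetical stand-ins so the example builds without GL headers, and the rules are only loosely modeled on the error conditions documented for OpenGL ES 2.0.

```c
/* Hypothetical stand-ins for GL enums and error codes. */
typedef enum {
    SK_NO_ERROR,
    SK_INVALID_ENUM,
    SK_INVALID_VALUE,
    SK_INVALID_OPERATION
} SkError;

typedef enum { SK_TEXTURE_2D, SK_TEXTURE_OTHER } SkTarget;
typedef enum { SK_RGB, SK_RGBA } SkFormat;

#define SK_MAX_TEXTURE_SIZE 4096  /* assumed implementation limit */

/* Checks the arguments of one texture upload and returns the first
 * error found. A real driver runs checks like these on every call
 * before it can encode anything for the hardware. */
SkError sk_validate_tex_image_2d(SkTarget target, int level,
                                 SkFormat internal_format,
                                 int width, int height, int border,
                                 SkFormat format)
{
    if (target != SK_TEXTURE_2D) {
        return SK_INVALID_ENUM;
    }
    if (level < 0 || width < 0 || height < 0) {
        return SK_INVALID_VALUE;
    }
    if (width > SK_MAX_TEXTURE_SIZE || height > SK_MAX_TEXTURE_SIZE) {
        return SK_INVALID_VALUE;
    }
    if (border != 0) {
        return SK_INVALID_VALUE;      /* border must be zero */
    }
    if (internal_format != format) {
        return SK_INVALID_OPERATION;  /* formats must match */
    }
    return SK_NO_ERROR;
}
```

Each check is cheap on its own, but the cost repeats for every call in every frame, which is the overhead the slide is pointing at.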
  19. 19. Are you kidding? Shader Compilation •No standard for pre-built shader •No standard for shader binary format
      int Init(ESContext *esContext)
      {
          UserData *userData = esContext->userData;
          /* Vertex shader source, compiled by the driver at run time */
          GLbyte vShaderStr[] =
              "attribute vec4 vPosition;    \n"
              "void main()                  \n"
              "{                            \n"
              "   gl_Position = vPosition;  \n"
              "}                            \n";
          /* Fragment shader source: fills every pixel with red */
          GLbyte fShaderStr[] =
              "precision mediump float;                     \n"
              "void main()                                  \n"
              "{                                            \n"
              "   gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);  \n"
              "}                                            \n";
          GLuint vertexShader;
          GLuint fragmentShader;
          GLuint programObject;
          GLint linked;
  20. 20. Shading (陰影) Shader •A shader paints objects darker courtesy of 西川善司
  21. 21. Copy-paste Sending work to GPU •Batching commands and committing •Transferring data and texture
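"Batching commands and committing" on slide 21 can be sketched as a small queue that collects commands and hands them to the GPU in groups. This is a simplified illustration, not how any particular driver works: `SkCommandQueue`, the capacity of 8, and the integer command IDs are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define SK_BATCH_CAPACITY 8  /* assumed batch size */

/* Collects commands and counts how often a batch is handed to the GPU. */
typedef struct {
    int    commands[SK_BATCH_CAPACITY];
    size_t count;               /* commands waiting in the current batch */
    int    submissions;         /* batches sent so far                   */
    int    submitted_commands;  /* commands sent so far                  */
} SkCommandQueue;

void sk_queue_init(SkCommandQueue *q)
{
    assert(q != NULL);
    q->count = 0;
    q->submissions = 0;
    q->submitted_commands = 0;
}

/* Sends the pending batch in one submission; does nothing if empty. */
void sk_queue_commit(SkCommandQueue *q)
{
    assert(q != NULL);
    if (q->count == 0) {
        return;
    }
    q->submissions++;
    q->submitted_commands += (int)q->count;
    q->count = 0;
}

/* Appends a command, committing first when the batch is already full. */
void sk_queue_encode(SkCommandQueue *q, int command)
{
    assert(q != NULL);
    if (q->count == SK_BATCH_CAPACITY) {
        sk_queue_commit(q);
    }
    q->commands[q->count++] = command;
}
```

With this sketch, encoding 20 commands and committing once at the end results in 3 submissions (8 + 8 + 4) instead of 20, which is the saving batching is meant to provide.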
  22. 22. Design target Metal •Low CPU overhead •More predictable performance •Better programmability
  23. 23. Key ideas Metal •Create and validate state up-front •Shader can be compiled offline •Enable versatile multi-threading •Shared memory for CPU & GPU •Handle synchronisation explicitly •Tile-based deferred rendering •C++11 based language •No legacy baggage •Compute shader But, A7 only - What the x
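The first key idea, "create and validate state up-front", can be illustrated by counting validations. The sketch below is not Metal's API: `SkStateDesc`, `SkPipeline`, and the `sk_*` functions are hypothetical, and the validity rule is made up for the example. It contrasts re-checking mutable state on every draw with checking once when an immutable state object is created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical render state description. */
typedef struct {
    int blend_enabled;  /* 0 or 1           */
    int depth_test;     /* 0 or 1           */
    int shader_id;      /* must be positive */
} SkStateDesc;

/* State object that is validated once and treated as immutable afterwards. */
typedef struct {
    SkStateDesc desc;
    bool        valid;
} SkPipeline;

static int g_validation_count = 0;

void sk_reset_validation_count(void) { g_validation_count = 0; }
int  sk_validation_count(void)       { return g_validation_count; }

/* The (assumed) expensive check; counts how often it runs. */
static bool sk_validate(const SkStateDesc *d)
{
    g_validation_count++;
    return d->shader_id > 0
        && (d->blend_enabled == 0 || d->blend_enabled == 1)
        && (d->depth_test == 0 || d->depth_test == 1);
}

/* Old style: state may have changed, so every draw validates again. */
bool sk_draw_validate_each_time(const SkStateDesc *current)
{
    assert(current != NULL);
    return sk_validate(current);
}

/* Up-front style: pay for validation once, at creation. */
SkPipeline sk_pipeline_create(const SkStateDesc *d)
{
    SkPipeline p;
    assert(d != NULL);
    p.desc  = *d;
    p.valid = sk_validate(d);
    return p;
}

/* Draws with a prebuilt pipeline only read the stored result. */
bool sk_draw_prevalidated(const SkPipeline *p)
{
    assert(p != NULL);
    return p->valid;
}
```

For 100 draws with the same state, the first style runs the check 100 times and the second runs it once, which is where the lower and more predictable CPU cost comes from.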
  24. 24. Multi-threading
  25. 25. Metal vs OpenGL ES Code comparison
  26. 26. Low CPU overhead enables So what •more draw calls •more objects •better physics •better AI •more complex logic •low battery usage
  27. 27. Use an engine or forget it How do I start? •Unity 5 (next year) - free / $4,500 •Unreal 4 (maybe this year) - $19/month •Cocos2D - free •Xcode template
  28. 28. Proprietary API •Apple is a promoter of Khronos Group •OpenCL story •The stage has collapsed, so kick away the ladder? ■ But Google is no fool (Expansion Pack)
  29. 29. No harm in not knowing Conclusion •Low CPU overhead •Can do something more •A7 only (either you can't, or it's a hassle) •Game-changer? maybe or not