4. Features
• Texture filtering, rasterization, depth testing and alpha blending entirely in software
• Binned renderer to increase parallelism
• Reduced memory bandwidth requirements
• Parallel processing for image processing, physical simulation, and medical and financial analysis
• GDDR5 RAM support
• Each core delivers up to 32 GFLOPS at a 1 GHz clock, which puts a many-core chip in the teraflops range
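The 32 GFLOPS per-core figure can be reproduced with simple arithmetic. The sketch below assumes each of the 16 vector lanes completes one multiply-add per cycle, counted as two floating-point operations; that accounting and the function name `peak_gflops` are illustrative assumptions, not something stated on the slide.

```cpp
// Peak single-precision throughput in GFLOPS.
//   cores          : number of cores on the chip (hypothetical parameter)
//   lanes          : SIMD lanes per core (16 on the slides)
//   flops_per_lane : 2 assumes one multiply-add per lane per cycle (assumption)
//   clock_ghz      : core clock in GHz
constexpr double peak_gflops(int cores, int lanes, int flops_per_lane,
                             double clock_ghz) {
    return cores * lanes * flops_per_lane * clock_ghz;
}
```

With these assumptions, `peak_gflops(1, 16, 2, 1.0)` gives 32 GFLOPS for one core, and a hypothetical 32-core part at the same clock gives 1024 GFLOPS, roughly one teraflop.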
5. Differences from a CPU
• In-order cores (no out-of-order execution)
• Vector processing unit operates on 16 single-precision floating-point numbers at a time
• Texture sampling units for trilinear/anisotropic filtering and texture decompression
• 1024-bit ring bus between cores
• Cache control instructions
• 4-way multithreading per core
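To make the 16-wide vector unit concrete, here is a minimal scalar emulation of one lane-wise multiply-add. The type `Vec16` and the function `vec_madd` are hypothetical names for illustration; they are not Larrabee instructions or intrinsics.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 16;  // 16 single-precision lanes, as on the slide
using Vec16 = std::array<float, kLanes>;

// Lane-wise multiply-add: r[i] = a[i] * b[i] + c[i].
// A vector unit would do this for all lanes at once; this is a plain loop.
Vec16 vec_madd(const Vec16& a, const Vec16& b, const Vec16& c) {
    Vec16 r{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        r[i] = a[i] * b[i] + c[i];
    }
    return r;
}
```

A scalar CPU pipeline would need 16 iterations for the same work, which is the point of the comparison on this slide.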
6. Differences from a GPU
• x86 instruction set with Larrabee-specific extensions
• Cache coherency across all cores
• Z-buffering, clipping, and blending without using graphics hardware
14. Fixed Function Logic
• Microcode in place of fixed-function logic for post-shader alpha blending, rasterization and interpolation
• Includes fixed-function texture filter logic
• Virtual memory for textures
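As an illustration of what replacing a fixed-function blend stage with software can look like, the sketch below implements the common "source over" blend on one pixel. The `Rgba` struct and `blend_over` function are hypothetical, and the choice of blend equation is an assumption; the slides do not say which blend modes are used.

```cpp
struct Rgba {
    float r, g, b, a;
};

// "Source over" alpha blend: result = src * src.a + dst * (1 - src.a).
// Because this is ordinary code, a different blend equation is a code change,
// not a hardware change.
Rgba blend_over(const Rgba& src, const Rgba& dst) {
    const float inv = 1.0f - src.a;
    return Rgba{src.r * src.a + dst.r * inv,
                src.g * src.a + dst.g * inv,
                src.b * src.a + dst.b * inv,
                src.a + dst.a * inv};
}
```

Blending half-transparent red over opaque blue, for example, yields equal parts red and blue with full opacity.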
15. Larrabee’s Binning Renderer
• Binning pipeline
  – Reduces synchronization
  – Front end processes vertex & geometry shading
  – Back end processes pixel shading, stencil testing, blending
  – Bin FIFO between them
• Multi-tasking by cores
  – Each orange box in the diagram is a core
  – Cores run independently
  – Other cores can run other tasks, e.g. physics
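The front-end binning step can be sketched as follows: each triangle is assigned to every screen tile that its bounding box overlaps, and the back end later processes one bin at a time. This is a minimal sketch; the `Triangle` struct, the `bin_triangles` function, and the tile size parameter are all assumptions for illustration, and bounding-box binning is a conservative simplification.

```cpp
#include <algorithm>
#include <vector>

struct Triangle {
    float x[3];
    float y[3];
};

// Returns one list of triangle indices per tile, tiles in row-major order.
// Binning is conservative: it uses the triangle's screen-space bounding box.
std::vector<std::vector<int>> bin_triangles(const std::vector<Triangle>& tris,
                                            int width, int height, int tile) {
    const int tiles_x = (width + tile - 1) / tile;
    const int tiles_y = (height + tile - 1) / tile;
    std::vector<std::vector<int>> bins(tiles_x * tiles_y);

    for (int i = 0; i < static_cast<int>(tris.size()); ++i) {
        const Triangle& t = tris[i];
        const float min_x = std::min({t.x[0], t.x[1], t.x[2]});
        const float max_x = std::max({t.x[0], t.x[1], t.x[2]});
        const float min_y = std::min({t.y[0], t.y[1], t.y[2]});
        const float max_y = std::max({t.y[0], t.y[1], t.y[2]});

        // Skip triangles that lie entirely outside the screen.
        if (max_x < 0 || max_y < 0 || min_x >= width || min_y >= height) {
            continue;
        }

        // Convert the bounding box to a clamped range of tile coordinates.
        const int tx0 = std::max(0, static_cast<int>(min_x) / tile);
        const int tx1 = std::min(tiles_x - 1, static_cast<int>(max_x) / tile);
        const int ty0 = std::max(0, static_cast<int>(min_y) / tile);
        const int ty1 = std::min(tiles_y - 1, static_cast<int>(max_y) / tile);

        for (int ty = ty0; ty <= ty1; ++ty) {
            for (int tx = tx0; tx <= tx1; ++tx) {
                bins[ty * tiles_x + tx].push_back(i);
            }
        }
    }
    return bins;
}
```

Because each bin is independent of the others, different cores can render different tiles without synchronizing on shared pixels, which is the benefit the slide points to.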
16. Back-end Rendering of a Tile
• Orange boxes represent work on separate threads
• Three work threads do Z, pixel shader, and blending
• Setup thread reads from bins and does pre-processing
• Combines task-parallel, data-parallel, and sequential work
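The order of back-end stages for one tile (Z test, pixel shader, blending) can be sketched in a single-threaded form. On the slide these stages run on separate threads; the version below only shows the data flow. The `Fragment` and `Tile` types, the single-channel color buffer, and the `render_tile` function are hypothetical simplifications.

```cpp
#include <functional>
#include <vector>

struct Fragment {
    int x, y;     // position inside the tile
    float z;      // depth, smaller is closer
    float alpha;  // opacity used by the blend stage
};

struct Tile {
    int size;
    std::vector<float> depth;  // starts at 1.0 (far plane)
    std::vector<float> color;  // one channel only, for brevity
    explicit Tile(int s) : size(s), depth(s * s, 1.0f), color(s * s, 0.0f) {}
};

// Runs the three back-end stages on each fragment of one tile, in order.
// Returns how many fragments passed the Z test.
int render_tile(Tile& tile, const std::vector<Fragment>& frags,
                const std::function<float(const Fragment&)>& shader) {
    int passed = 0;
    for (const Fragment& f : frags) {
        const int idx = f.y * tile.size + f.x;
        // Stage 1: Z test rejects fragments behind what is already stored.
        if (f.z >= tile.depth[idx]) {
            continue;
        }
        tile.depth[idx] = f.z;
        // Stage 2: pixel shader computes the fragment's color.
        const float src = shader(f);
        // Stage 3: blend the shaded color into the tile's color buffer.
        tile.color[idx] = src * f.alpha + tile.color[idx] * (1.0f - f.alpha);
        ++passed;
    }
    return passed;
}
```

The tile's depth and color buffers are small and private to the tile, so a real implementation could keep them in cache while the work threads iterate over the bin.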
17. Pipeline Can Be Changed
• Parts can move between front end & back end
  – Vertex shading, tessellation, rasterization, etc.
  – Allows balancing computation vs. bandwidth
• New features
  – Transparency, shadowing, ray tracing, etc.
  – Each of these needs irregular data structures
  – Also helps to be able to “repack” the data
19. Examples of Using Tasks
• Applications
  – Scene traversal and culling
  – Procedural geometry synthesis
  – Physics contact group solve
  – Data-parallel strand groups
  – Distribute across threads/cores using the task system
  – Exploit core resources with SIMD
• Larrabee can submit work to itself!
  – Tasks can spawn other tasks
  – Exposed in the Larrabee Native programming interface (C/C++ compiler)
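The idea that tasks can spawn other tasks can be sketched with a small work queue. This is not the Larrabee Native interface; `TaskQueue`, `submit`, `run_all`, and `traverse` are hypothetical names, and the queue runs on one thread only, whereas a real task system would distribute tasks across threads and cores.

```cpp
#include <deque>
#include <functional>
#include <utility>

class TaskQueue {
public:
    // A task receives the queue so that it can submit follow-up work.
    using Task = std::function<void(TaskQueue&)>;

    void submit(Task t) { tasks_.push_back(std::move(t)); }

    // Runs tasks until the queue is empty and returns how many were executed.
    int run_all() {
        int executed = 0;
        while (!tasks_.empty()) {
            Task t = std::move(tasks_.front());
            tasks_.pop_front();
            t(*this);
            ++executed;
        }
        return executed;
    }

private:
    std::deque<Task> tasks_;
};

// Hypothetical scene traversal: visiting a node spawns one task per child
// (two children per node here) until the remaining depth reaches zero.
void traverse(TaskQueue& q, int depth, int& visited) {
    ++visited;
    if (depth == 0) {
        return;
    }
    for (int child = 0; child < 2; ++child) {
        q.submit([depth, &visited](TaskQueue& inner) {
            traverse(inner, depth - 1, visited);
        });
    }
}
```

Submitting a single root task for a tree of depth 2 ends up executing seven tasks in total, one per node, without the caller ever enumerating the nodes.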
24. Conclusion
The Larrabee architecture opens up a rich set of opportunities for both graphics rendering and throughput computing, and is an appropriate platform for the convergence of GPU and CPU.
25. References
• Larry Seiler, Doug Carmean, Toni Juan (Intel Corporation), Jeremy Sugerman, Pat Hanrahan (Stanford University), "Larrabee: a many-core x86 architecture for visual computing", IEEE Digital Library
• IEEE Spectrum, January 2008
• ACM Transactions on Graphics, Article 18
• www.intel.com
• www.wikipedia.com