Intel’s Larrabee

Larrabee is a new processor from Intel. It combines the features of both a CPU and a GPU.

Intel’s Larrabee
Vipin.p.nair
S7-EC
Roll no: 24
CEK

Introduction
• Larrabee is a multicore general-purpose graphics processing unit (GPGPU) that combines the functions of a multi-core CPU and a GPU.
• Larrabee is based on Intel’s x86 architecture.
• Architectural convergence

Features
• Rasterization, depth testing and alpha blending done entirely in software (texture filtering keeps fixed-function hardware)
• A binned renderer to increase parallelism
• Reduced memory bandwidth
• Parallel processing for image processing, physical simulation, and medical and financial analysis
• GDDR5 memory support
• Each core delivers 32 GFLOPS at a 1 GHz clock, giving several TFLOPS of peak throughput across all cores

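The 32 GFLOPS per-core figure follows if each of the 16 single-precision vector lanes completes one fused multiply-add (counted as two floating-point operations) per clock at 1 GHz. A minimal sketch of that arithmetic, assuming one vector FMA issued per core per clock; the core counts in the loop are hypothetical configurations, not figures from the slides:

```python
# Peak throughput = cores x lanes x FLOPs per lane per clock x clock rate.
# Assumption: one 16-lane fused multiply-add (2 FLOPs per lane) per core per clock.

VECTOR_LANES = 16   # single-precision lanes in the vector unit
FLOPS_PER_FMA = 2   # a fused multiply-add counts as one multiply plus one add


def peak_gflops(cores, clock_ghz, lanes=VECTOR_LANES):
    """Peak single-precision GFLOPS if every core issues one vector FMA per clock."""
    return cores * lanes * FLOPS_PER_FMA * clock_ghz


print(peak_gflops(1, 1.0))  # one core at 1 GHz -> 32.0

for cores in (16, 32, 48):  # hypothetical chip sizes
    print(cores, "cores at 1 GHz:", peak_gflops(cores, 1.0) / 1000, "TFLOPS")
```

These are peak numbers only; sustained throughput also depends on keeping data in the L1 cache, where the vector unit runs at full speed.
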
Differences with CPU
• In-order execution rather than out-of-order execution
• Vector processing unit operating on 16 single-precision floating-point numbers at a time
• Texture sampling units for trilinear/anisotropic filtering and texture decompression
• 1024-bit ring bus between cores
• Cache control instructions
• 4-way multithreading per core

Differences with GPU
• x86 instruction set with Larrabee-specific extensions
• Cache coherency across all cores
• Z-buffering, clipping and blending done in software, without dedicated graphics hardware

Larrabee – Block Diagram

Architecture
• Cores communicate on a 1024-bit wide ring bus
  – Fast access to memory, I/O interfaces and fixed-function blocks
  – Fast access for cache coherency
• L2 cache is partitioned among the cores
  – Provides high aggregate bandwidth
  – Allows data replication and sharing
• Optimized for highly parallel workloads using the vector processor
• In-order CPU core
  – Separate scalar and vector units with separate registers
  – Vector unit: 16 32-bit operations per clock
  – In-order instruction execution
  – Fast access to the 64 KB L1 cache
  – Direct connection to the core’s own 256 KB subset of the L2 cache
  – Prefetch instructions load the L1 and L2 caches

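The partitioned L2 can be pictured with a small model: every core contributes its 256 KB subset, and each cache line lives in one subset that other cores reach over the ring. A sketch under loudly stated assumptions: the modulo "home core" hash, the 64-byte line size and walking the ring in either direction are hypothetical choices for illustration, not details given in the slides:

```python
# Toy model of an L2 cache split into one 256 KB subset per core, joined by a ring.
# HYPOTHETICAL: the modulo home-core hash, the 64-byte line size and
# bidirectional ring traversal are illustrative assumptions.

L2_SUBSET_BYTES = 256 * 1024  # per-core L2 subset (from the slides)
LINE_BYTES = 64               # assumed cache-line size


def aggregate_l2_bytes(cores):
    """Total L2 capacity when every core contributes one subset."""
    return cores * L2_SUBSET_BYTES


def home_core(address, cores):
    """Hypothetical mapping from a byte address to the core whose subset holds it."""
    return (address // LINE_BYTES) % cores


def ring_hops(src, dst, cores):
    """Fewest ring stops between two cores if the ring can be walked either way."""
    d = abs(src - dst) % cores
    return min(d, cores - d)


cores = 16
print(aggregate_l2_bytes(cores) // 1024, "KB of L2 in total")
owner = home_core(0x12345, cores)
print("address 0x12345 maps to core", owner, "-", ring_hops(0, owner, cores), "hops from core 0")
```

The aggregate capacity and bandwidth grow with the core count, which is the point of the "high aggregate bandwidth" bullet; the cost is that a line owned by a distant core takes several ring hops to reach.
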
Vector Unit
• Vector-complete instruction set
  – Scatter/gather for vector load/store
  – Mask registers select which lanes to write, which allows data-parallel flow control
  – Masks also support data compaction
• Vector instructions support
  – Full speed when data is in the L1 cache
  – Fused multiply-add (three arguments)
  – Int32, Float32 and Float64 data
  – Reading 8-bit unorm, 8-bit uint, 16-bit sint and 16-bit float data and converting it to 32-bit floats/integers

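The masking, scatter/gather and compaction bullets can be emulated lane by lane to show what they mean. A minimal scalar sketch in which the function names (vfma_masked, gather, scatter, compress, unorm8_to_float) are hypothetical helpers, not Larrabee instruction mnemonics or intrinsics. Python floats are 64-bit and `a * b + c` here rounds twice, whereas a hardware fused multiply-add rounds once:

```python
# Scalar emulation of a 16-lane vector unit with a per-lane write mask.
# All helper names are hypothetical; they only mimic the behaviour described.

LANES = 16


def vfma_masked(mask, a, b, c, old):
    """Per lane: a*b + c where the mask bit is set, otherwise keep the old value."""
    return [a[i] * b[i] + c[i] if mask[i] else old[i] for i in range(LANES)]


def gather(memory, indices):
    """Vector load in which each lane reads from its own index."""
    return [memory[j] for j in indices]


def scatter(memory, indices, values, mask):
    """Vector store in which each enabled lane writes to its own index."""
    for i in range(LANES):
        if mask[i]:
            memory[indices[i]] = values[i]


def compress(values, mask):
    """Data compaction: pack the enabled lanes together, preserving order."""
    return [v for v, m in zip(values, mask) if m]


def unorm8_to_float(byte):
    """Convert an 8-bit unorm value (0..255) to a float in [0.0, 1.0]."""
    return byte / 255.0


# Data-parallel flow control for "if x > 0: y = 2*x + 1" across 16 lanes at once.
x = [float(i - 8) for i in range(LANES)]  # -8.0 .. 7.0
mask = [v > 0 for v in x]                 # lanes that take the branch
y = vfma_masked(mask, [2.0] * LANES, x, [1.0] * LANES, [0.0] * LANES)
print(y)
print(compress(x, mask))                  # only the positive inputs survive
```

When lanes disagree on a branch, masked code typically evaluates both sides under complementary masks; lanes whose bit is clear simply keep their previous contents.
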
Fixed Function Logic
• Software running on the cores replaces fixed-function logic for post-shader alpha blending, rasterization and interpolation
• Includes fixed-function texture filter logic
• Virtual memory for textures

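Rasterization in software amounts to deciding which pixels a triangle covers. One common technique is the half-space (edge-function) test sketched below; the slides do not say which algorithm Larrabee's software rasterizer uses. This minimal version assumes vertices with positive signed area and omits the top-left fill rule, so pixels lying exactly on a shared edge would be drawn by both triangles:

```python
# Minimal half-space (edge-function) rasterizer: each pixel centre inside the
# triangle's bounding box is tested against the triangle's three edges.


def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of triangle (a, b, p); >= 0 when p is left of a->b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(v0, v1, v2, width, height):
    """Return the (x, y) pixels whose centres lie inside the triangle."""
    xs = [v0[0], v1[0], v2[0]]
    ys = [v0[1], v1[1], v2[1]]
    x0, x1 = max(int(min(xs)), 0), min(int(max(xs)) + 1, width)
    y0, y1 = max(int(min(ys)), 0), min(int(max(ys)) + 1, height)
    covered = []
    for y in range(y0, y1):
        for x in range(x0, x1):
            px, py = x + 0.5, y + 0.5  # sample at the pixel centre
            w0 = edge(*v1, *v2, px, py)
            w1 = edge(*v2, *v0, px, py)
            w2 = edge(*v0, *v1, px, py)
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                covered.append((x, y))
    return covered


pixels = rasterize((0, 0), (4, 0), (0, 4), 8, 8)
print(len(pixels), "pixels covered")
```

The three edge values are also (unnormalized) barycentric weights, which is why the same arithmetic can feed attribute interpolation, another job the slide moves into software.
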
Larrabee’s Binning Renderer
• Binning pipeline
  – Reduces synchronization
  – Front end processes vertex and geometry shading
  – Back end processes pixel shading, stencil testing and blending
  – Bin FIFO between them
• Multi-tasking by cores
  – Each orange box is a core
  – Cores run independently
  – Other cores can run other tasks, e.g. physics

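In a binning renderer the front end sorts primitives into per-tile bins before any pixel work starts. A minimal sketch, assuming a 64-pixel square tile and a conservative bounding-box overlap test (both illustrative choices, not values from the slides):

```python
# Front-end binning sketch: each triangle index is appended to the bin of every
# screen tile that its bounding box overlaps. TILE is an assumed value.

from collections import defaultdict

TILE = 64  # tile edge in pixels (assumption)


def bin_triangles(triangles, width, height, tile=TILE):
    """Map (tile_x, tile_y) -> list of indices of triangles overlapping that tile."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = defaultdict(list)
    for idx, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Clamp the bounding box to the tiles that exist on screen.
        tx0 = max(int(min(xs)) // tile, 0)
        tx1 = min(int(max(xs)) // tile, tiles_x - 1)
        ty0 = max(int(min(ys)) // tile, 0)
        ty1 = min(int(max(ys)) // tile, tiles_y - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(idx)
    return bins


triangles = [
    ((10, 10), (50, 10), (10, 50)),   # fits inside tile (0, 0)
    ((60, 60), (130, 60), (60, 70)),  # straddles six tiles
]
bins = bin_triangles(triangles, 256, 128)
for key in sorted(bins):
    print(key, bins[key])
```

Because each bin covers a disjoint screen region, back-end cores working on different tiles never write the same pixels, which is one way to read the "reduces synchronization" bullet.
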
Back-end Rendering a Tile
• Orange boxes represent work on separate threads
• Three work threads do Z, pixel shading and blending
• A setup thread reads from the bins and does pre-processing
• Combines task-parallel, data-parallel and sequential work

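The three back-end stages can be shown on one tile with a tile-local colour and depth buffer. In this minimal sketch the stages run sequentially rather than on separate threads, and the fragment format, the stand-in `shade` function and the "only opaque fragments write Z" rule are hypothetical simplifications:

```python
# Back-end sketch for one tile: Z test, a stand-in pixel shader, then "over"
# alpha blending into tile-local buffers. Stages run sequentially here.


def shade(fragment):
    """Hypothetical pixel shader: just returns the fragment's (r, g, b, alpha)."""
    return fragment["color"]


def blend_over(src, dst):
    """Standard 'over' blend per colour channel: src * alpha + dst * (1 - alpha)."""
    r, g, b, a = src
    return tuple(s * a + d * (1.0 - a) for s, d in zip((r, g, b), dst))


def render_tile(fragments, size):
    depth = [[float("inf")] * size for _ in range(size)]
    color = [[(0.0, 0.0, 0.0)] * size for _ in range(size)]
    for f in fragments:  # processed in submission order
        x, y, z = f["x"], f["y"], f["z"]
        if z >= depth[y][x]:  # Z test: discard fragments behind stored depth
            continue
        rgba = shade(f)
        color[y][x] = blend_over(rgba, color[y][x])
        if rgba[3] == 1.0:  # simplification: only opaque fragments write Z
            depth[y][x] = z
    return color, depth


fragments = [
    {"x": 0, "y": 0, "z": 0.5, "color": (1.0, 0.0, 0.0, 1.0)},  # opaque red
    {"x": 0, "y": 0, "z": 0.8, "color": (0.0, 1.0, 0.0, 1.0)},  # behind red: rejected
    {"x": 1, "y": 0, "z": 0.3, "color": (0.0, 0.0, 1.0, 0.5)},  # 50% blue over black
]
color, depth = render_tile(fragments, 4)
print(color[0][:2])
```

Translucent fragments are blended in the order they arrive, so the result depends on submission order; that ordering problem is what the transparency slide's pre-resolve comparison is about.
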
Pipeline can be changed
• Parts can move between the front end and the back end
  – Vertex shading, tessellation, rasterization, etc.
  – Allows balancing computation against bandwidth
• New features
  – Transparency, shadowing, ray tracing, etc.
  – Each of these needs irregular data structures
  – It also helps to be able to “repack” the data

Transparency
• Transparency with and without pre-resolve effects

Examples of using Tasks
• Applications
  – Scene traversal and culling
  – Procedural geometry synthesis
  – Physics contact group solve
  – Data-parallel strand groups
  – Distribute across threads/cores using the task system
  – Exploit core resources with SIMD
• Larrabee can submit work to itself!
  – Tasks can spawn other tasks
  – Exposed in the Larrabee Native programming interface (C/C++ compiler)

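"Tasks can spawn other tasks" can be illustrated with a generic work queue whose running tasks may enqueue more work. A minimal sketch using Python's standard `threading` and `queue` modules; the `TaskSystem` class and the `sum_range` task are hypothetical illustrations, not the Larrabee Native programming interface, and Python threads here show the control flow rather than parallel speedup:

```python
# Minimal task system in which a running task can spawn further tasks.
# TaskSystem and sum_range are hypothetical names used only for illustration.

import queue
import threading


class TaskSystem:
    def __init__(self, workers=4):
        self.tasks = queue.Queue()
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def spawn(self, fn, *args):
        """Queue a task; safe to call from inside another running task."""
        self.tasks.put((fn, args))

    def _run(self):
        while True:
            fn, args = self.tasks.get()
            try:
                fn(self, *args)  # a task may call self.spawn() before finishing
            finally:
                self.tasks.task_done()

    def wait(self):
        """Block until every task, including spawned children, has finished."""
        self.tasks.join()


results = []
lock = threading.Lock()


def sum_range(ts, lo, hi):
    """Split large ranges into two child tasks; sum small ranges directly."""
    if hi - lo > 1000:
        mid = (lo + hi) // 2
        ts.spawn(sum_range, lo, mid)
        ts.spawn(sum_range, mid, hi)
    else:
        with lock:
            results.append(sum(range(lo, hi)))


ts = TaskSystem()
ts.spawn(sum_range, 0, 10_000)
ts.wait()
print(sum(results))  # 49995000
```

`wait()` cannot return early because a parent spawns its children before its own `task_done()` call, so the queue's count of unfinished tasks never drops to zero while work remains.
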
Application scaling studies

Scalability Studies
• Based on memory bandwidth and texture filtering speed
• Performance breakdowns

Binning & Bandwidth Studies
• Bandwidth: immediate-mode rendering uses more bandwidth than binning
  – 2.4 to 7 times more for F.E.A.R.
  – 1.5 to 2.6 times more for Gears of War
  – 1.6 to 1.8 times more for Half-Life 2 Episode 2

Overall performance

Conclusion
The Larrabee architecture opens a rich set of opportunities for both graphics rendering and throughput computing, and is the appropriate platform for the convergence of GPU and CPU.

Reference
• Larry Seiler, Doug Carmean, Toni Juan (Intel Corporation), Jeremy Sugerman and Pat Hanrahan (Stanford University), “Larrabee: A Many-Core x86 Architecture for Visual Computing”, IEEE Digital Library
• IEEE Spectrum, January 2008
• ACM Transactions on Graphics, Article 18
• www.intel.com
• www.wikipedia.com

Questions

Thank You
