Intel's Larrabee
Vipin P. Nair, S7-EC, Roll no: 24, CEK
Introduction
– Larrabee is a many-core architecture from Intel designed for visual computing
– Larrabee is based on Intel's x86 architecture
Features
– Rasterization, depth testing and alpha blending performed entirely in software (texture filtering uses fixed-function logic)
– Binned renderer to increase parallelism
– Reduced memory bandwidth
– Parallel processing for image processing, physical simulation, and medical & financial analysis
– GDDR5 memory support
– Each core can execute 32 GFLOPS at a 1 GHz clock, yielding several teraflops chip-wide
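The per-core figure can be sanity-checked with simple arithmetic: a 16-lane vector unit issuing one fused multiply-add (2 flops) per lane per cycle at 1 GHz gives 32 GFLOPS per core. The 32-core count below is an illustrative assumption, not a published Larrabee specification.

```python
# Back-of-the-envelope check of the 32 GFLOPS/core claim.
# Assumptions: 16 SIMD lanes, one fused multiply-add (2 flops) per lane
# per cycle, 1 GHz clock; the 32-core count is illustrative only.
lanes = 16
flops_per_fma = 2
clock_hz = 1e9

gflops_per_core = lanes * flops_per_fma * clock_hz / 1e9
print(gflops_per_core)                  # 32.0 GFLOPS per core

cores = 32                              # hypothetical core count
print(cores * gflops_per_core / 1000)   # 1.024 -> about one teraflop
```

Any multi-core configuration in the tens of cores lands in the "several teraflops" range the slide claims.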
Differences from a CPU
– In-order execution (conventional desktop CPUs rely on out-of-order execution)
– Vector processing unit operates on 16 single-precision floating-point numbers at a time
– Texture sampling units: trilinear/anisotropic filtering & texture decompression
– 1024-bit ring bus between cores
– Explicit cache-control instructions
– 4-way multithreading per core
Differences from a GPU
– x86 instruction set with Larrabee-specific extensions
– Cache coherency across all its cores
– Z-buffering, clipping, and blending performed without dedicated graphics hardware
Larrabee – Block Diagram
Architecture
– Ring network
    - Fast access to memory, I/O interfaces and fixed-function blocks
    - Fast access for cache coherency
– Shared L2 cache
    - Provides high aggregate bandwidth
    - Allows data replication & sharing
– In-order x86 cores
    - Vector unit: 16 32-bit ops/clock
    - In-order instruction execution
    - Fast access from the 64 KB L1 cache
    - Direct connection to each core's subset of the 256 KB L2 cache
Vector Unit
– Vector memory operations
    - Scatter/gather for vector load/store
    - Mask registers select lanes to write, which allows data-parallel flow control
    - Masks also support data compaction
– Vector arithmetic
    - Full speed when data is in the L1 cache
    - Fused multiply-add (three arguments)
    - Int32, Float32 and Float64 data
    - Can read 8-bit unorm, 8-bit uint, 16-bit sint, and 16-bit float data and convert it to 32-bit floats/integers
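The gather, masked-write, compaction, and fused multiply-add concepts above can be sketched with NumPy standing in for a 16-wide vector unit. This is a conceptual illustration only; Larrabee's actual vector ISA (LRBni) and its register-level semantics are not modeled, and the index values are arbitrary.

```python
import numpy as np

# Conceptual sketch of Larrabee-style 16-wide vector operations, with
# NumPy arrays standing in for vector registers (not the real LRBni ISA).
lanes = 16
data = np.arange(100, dtype=np.float32)

# Gather: load 16 elements from arbitrary indices in one operation.
idx = np.array([3, 7, 1, 0, 42, 9, 9, 55, 2, 8, 13, 21, 34, 5, 6, 99])
v = data[idx]

# Masked write: a predicate mask selects which lanes are updated,
# giving data-parallel flow control without branches.
mask = v > 10
result = np.zeros(lanes, dtype=np.float32)
result[mask] = v[mask] * 2.0 + 1.0   # fused multiply-add on active lanes

# Compaction: pack only the active lanes together.
compacted = v[mask]
print(len(compacted))                # 6 lanes active
```

The same mask drives both the selective write and the compaction, which is how predicated SIMD code avoids per-lane branching.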
Fixed Function Logic
– Microcode in place of fixed-function logic for post-shader alpha blending, rasterization and interpolation
– Includes fixed-function texture filter logic
– Virtual memory for textures
Larrabee's Binning Renderer
– Binning pipeline
    - Reduces synchronization
    - Front end processes vertex & geometry shading
    - Back end processes pixel shading, stencil testing, blending
    - Bin FIFO between them
– Multi-tasking by cores
    - Each orange box is a core
    - Cores run independently
    - Other cores can run other tasks, e.g. physics
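The front end's core job can be sketched briefly: each triangle's screen-space bounding box determines which tile bins receive it, and the back end later processes each bin independently. The tile size and screen dimensions here are illustrative assumptions, not Larrabee specifics.

```python
# Minimal sketch of a binning front end: a triangle's screen-space
# bounding box decides which tile bins it is written into.
# TILE, WIDTH, and HEIGHT are illustrative values, not Larrabee specs.
TILE = 64
WIDTH, HEIGHT = 256, 256

def bins_for_triangle(verts):
    """Return the (tx, ty) tile coordinates a triangle's bbox overlaps."""
    xs = [x for x, _ in verts]
    ys = [y for _, y in verts]
    x0, x1 = max(min(xs), 0), min(max(xs), WIDTH - 1)
    y0, y1 = max(min(ys), 0), min(max(ys), HEIGHT - 1)
    return [(tx, ty)
            for ty in range(y0 // TILE, y1 // TILE + 1)
            for tx in range(x0 // TILE, x1 // TILE + 1)]

# A triangle spanning pixels (10,10)-(100,70) lands in 4 tile bins.
tri = [(10, 10), (100, 20), (50, 70)]
print(bins_for_triangle(tri))
```

Because each tile's bin holds every primitive that touches it, a back-end core can render its tile to completion without synchronizing with other tiles.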
Back-end Rendering of a Tile
– Orange boxes represent work on separate threads
– Three work threads do Z, pixel shading, and blending
– A setup thread reads from bins and does pre-processing
– Combines task-parallel, data-parallel, and sequential work
The Pipeline Can Be Changed
– Parts can move between front end & back end
    - Vertex shading, tessellation, rasterization, etc.
    - Allows balancing computation vs. bandwidth
– New features
    - Transparency, shadowing, ray tracing, etc.
    - Each of these needs irregular data structures
    - Also helps to be able to "repack" the data
Transparency
– Transparency with & without pre-resolve effects
Examples of Using Tasks
– Applications
    - Scene traversal and culling
    - Procedural geometry synthesis
    - Physics contact-group solve
    - Data-parallel strand groups
– Distribute across threads/cores using the task system
– Exploit core resources with SIMD
– Larrabee can submit work to itself!
    - Tasks can spawn other tasks
    - Exposed in the Larrabee Native programming interface (C/C++ compiler)
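The "tasks can spawn other tasks" idea can be sketched with a Python thread pool standing in for Larrabee's cores; the recursive tree-sum workload is an illustrative assumption, not something from the Larrabee Native API.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of a task system where tasks spawn subtasks, in the spirit of
# "Larrabee can submit work to itself". Python threads stand in for
# cores; the tree-sum workload is purely illustrative.
pool = ThreadPoolExecutor(max_workers=8)

def tree_sum(values):
    """Sum a list by recursively spawning subtasks for each half."""
    if len(values) <= 4:             # small enough: do it inline
        return sum(values)
    mid = len(values) // 2
    left = pool.submit(tree_sum, values[:mid])    # a task spawns a task
    right = pool.submit(tree_sum, values[mid:])
    return left.result() + right.result()

print(tree_sum(list(range(32))))     # 0+1+...+31 = 496
```

Letting running tasks enqueue new work is what distinguishes a general task system from a fixed dispatch model, and it is why irregular workloads such as scene traversal map onto it naturally.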
Application scaling studies
Scalability Studies
Binning & Bandwidth Studies
– Binning reduces memory bandwidth relative to an immediate-mode renderer:
    - 2.4 to 7 times less for F.E.A.R.
    - 1.5 to 2.6 times less for Gears of War
    - 1.6 to 1.8 times less for Half-Life 2: Episode Two
Overall performance
Conclusion
The Larrabee architecture opens a rich set of opportunities for both graphics rendering and throughput computing, and is an appropriate platform for the convergence of the GPU & CPU.
References
– Larry Seiler, Doug Carmean, Toni Juan (Intel Corporation), Jeremy Sugerman & Pat Hanrahan (Stanford University), "Larrabee: A Many-Core x86 Architecture for Visual Computing," ACM Transactions on Graphics, Article 18 (IEEE Digital Library)
– IEEE Spectrum, January 2008
– www.intel.com
– www.wikipedia.com