SIMD Processing Using Compiler Intrinsics
Richard Thomson
legalize@xmission.com
@LegalizeAdulthd
github.com/LegalizeAdulthood
SIMD: Single Instruction, Multiple Data
SIMD Exploits Data Parallelism
- Image processing
- Array processing
- Scientific computing
- 3D graphics
Brief History of CPU SIMD
Year  Extension  Register size
1997  MMX        64 bits
1999  SSE        128 bits
2001  SSE2       128 bits
2004  SSE3       128 bits
2006  SSE4       128 bits
2008  AVX        256 bits
2015  AVX-512    512 bits
Data Types
- 8-bit integers
- 16-bit integers
- 32-bit integers
- 64-bit integers
- 16-bit floats
- 32-bit floats
- 64-bit floats
- Multiple smaller quantities are packed into one register ("multiple data")
- Operands have alignment requirements
- Older extensions do not support all data types
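To make the packing concrete, here is a minimal sketch, assuming an x86-64 target where SSE2 is always available; the helper name add_bytes is hypothetical. One 128-bit register holds sixteen 8-bit integers, so each loop iteration adds sixteen pairs at once. It also shows why the data type matters: _mm_add_epi8 wraps modulo 256, while _mm_adds_epu8 saturates at 255. (128-bit integer operations like these arrived with SSE2; the original SSE handled only floats in XMM registers.)

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2: __m128i, _mm_add_epi8, _mm_adds_epu8

// Hypothetical helper: out[i] = a[i] + b[i] for unsigned bytes, 16 lanes per
// instruction. wrap == true uses modular arithmetic (_mm_add_epi8);
// wrap == false clamps each lane at 255 (_mm_adds_epu8).
void add_bytes(const std::uint8_t* a, const std::uint8_t* b,
               std::uint8_t* out, std::size_t n, bool wrap)
{
    assert(n % 16 == 0 && "this sketch handles whole registers only");
    for (std::size_t i = 0; i < n; i += 16)
    {
        // Unaligned loads/stores, so the byte arrays need no special alignment.
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        __m128i sum = wrap ? _mm_add_epi8(va, vb) : _mm_adds_epu8(va, vb);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), sum);
    }
}
```

With every element of a set to 200 and of b to 100, the wrapping variant yields 44 in each lane and the saturating variant yields 255.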
Alignment in C++11
#include <iostream>
struct alignas(16) foo
{
    int i;                // 4 bytes
    int j;                // 4 bytes
    alignas(4) char s[3]; // 3 bytes
    short q;              // 2 bytes
};
int main()
{
    // outputs 16:
    std::cout << alignof(foo) << '\n';
}
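The byte counts in foo can be checked by the compiler. A sketch, assuming a typical ABI where int is 4 bytes and short is 2 (true on mainstream x86 compilers, but not guaranteed by the standard); is_aligned is a hypothetical helper, not a standard function. The members occupy 14 bytes; alignas(16) rounds sizeof(foo) up to 16 and places every foo, including each element of an array, on a 16-byte boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct alignas(16) foo
{
    int i;                 // offset 0, 4 bytes
    int j;                 // offset 4, 4 bytes
    alignas(4) char s[3];  // offset 8, 3 bytes
    short q;               // offset 12 (one padding byte before it), 2 bytes
};                         // 14 bytes of members, padded to 16

// Assumes a common ABI where int is 4 bytes and short is 2.
static_assert(alignof(foo) == 16, "alignas(16) raises the type's alignment");
static_assert(sizeof(foo) % alignof(foo) == 0, "size is a multiple of alignment");

// Hypothetical helper: true if p is a multiple of alignment (a power of two).
bool is_aligned(const void* p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```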
Alignment in C++03
// pre-C++11
// MSVC:
struct __declspec(align(16)) foo
{
    // ...
};
// GCC and Clang: the attribute goes right after the struct keyword
// (or after the closing brace)
struct __attribute__((aligned(16))) foo
{
    // ...
};
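Code that must build with both compilers usually hides the two spellings behind a macro. A minimal sketch, where SIMD_ALIGN is a hypothetical name; both expansions sit between struct and the tag name, a position both compilers accept.

```cpp
// Hypothetical portability macro: picks the pre-C++11 spelling per compiler.
#if defined(_MSC_VER)
#  define SIMD_ALIGN(n) __declspec(align(n))
#elif defined(__GNUC__)  // also defined by Clang
#  define SIMD_ALIGN(n) __attribute__((aligned(n)))
#else
#  error "SIMD_ALIGN: unknown compiler"
#endif

// Four floats fill exactly one 128-bit SSE register.
struct SIMD_ALIGN(16) vec4
{
    float x, y, z, w;
};
```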
Boost.Align
- Handles heap allocation of aligned memory
- Queries the alignment requirements of a type
- Declares alignment to the compiler portably
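For comparison, C++17 also covers over-aligned heap allocation in the standard library, through an std::align_val_t overload of operator new. This sketch uses that standard facility, not Boost.Align's API; the function names are hypothetical and the code needs a C++17 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>  // std::align_val_t (C++17)

// Hypothetical helper: raw storage for n floats starting on an
// `alignment`-byte boundary (e.g. 32 for AVX). Throws std::bad_alloc on failure.
float* allocate_aligned_floats(std::size_t n, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);  // power of two
    void* p = ::operator new(n * sizeof(float), static_cast<std::align_val_t>(alignment));
    assert(reinterpret_cast<std::uintptr_t>(p) % alignment == 0);
    return static_cast<float*>(p);
}

// Must be paired with allocate_aligned_floats, passing the same alignment.
void free_aligned_floats(float* p, std::size_t alignment)
{
    ::operator delete(p, static_cast<std::align_val_t>(alignment));
}
```

In C++17 a std::vector of an over-aligned type gets the same guarantee automatically, because std::allocator forwards to the aligned operator new for such types.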
Compiler Intrinsics
- A function whose implementation is handled directly by the compiler
- SIMD registers are exposed as data types: __m64, __m128, __m128d, __m128i, etc.
- SIMD instructions are exposed as intrinsic functions: _m_paddb, _m_paddd, _m_paddsb, etc.
- Register allocation, instruction scheduling and addressing modes are handled by the compiler
- Proper alignment of operands is assumed: aligned loads and stores such as _mm_load_ps require 16-byte-aligned addresses (unaligned variants such as _mm_loadu_ps exist)
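A minimal sketch of the intrinsic style, assuming an x86 target with SSE (always present on x86-64); add_arrays is a hypothetical name. The code only names __m128 values and operations; the compiler picks the registers. Because _mm_load_ps and _mm_store_ps assume 16-byte alignment, that precondition is asserted, and a scalar loop finishes any elements left over when n is not a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE: __m128, _mm_load_ps, _mm_add_ps, _mm_store_ps

// Hypothetical helper: out[i] = a[i] + b[i], four floats per instruction.
// _mm_load_ps/_mm_store_ps require 16-byte-aligned addresses; a misaligned
// pointer may fault, so the precondition is checked in debug builds.
void add_arrays(const float* a, const float* b, float* out, std::size_t n)
{
    auto aligned16 = [](const void* p) {
        return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
    };
    assert(aligned16(a) && aligned16(b) && aligned16(out));

    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)  // i stays a multiple of 4, so a + i stays aligned
    {
        __m128 va = _mm_load_ps(a + i);
        __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i)  // scalar tail for the last n % 4 elements
        out[i] = a[i] + b[i];
}
```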
Options Available
- Assembly: direct control, but hard to program
- Intrinsics: pure C/C++, but hard to program
- Class library: easier to program, but less control
- Automatic vectorization: very little control
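At the other end of the spectrum, automatic vectorization needs no SIMD code at all. A sketch, assuming an optimizing compiler: saxpy is an ordinary loop that GCC, Clang and MSVC may vectorize at higher optimization levels, but nothing in the source guarantees it. The report flags named in the comment are how you find out what the compiler actually did.

```cpp
#include <cassert>
#include <cstddef>

// Plain scalar loop: y[i] = a * x[i] + y[i]. No intrinsics appear here; the
// compiler may or may not emit SIMD instructions, depending on compiler,
// flags and target. Vectorization reports: GCC -fopt-info-vec,
// Clang -Rpass=loop-vectorize, MSVC /Qvec-report:2.
void saxpy(float a, const float* x, float* y, std::size_t n)
{
    assert(n == 0 || (x != nullptr && y != nullptr));
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```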
Proposed Boost.Simd
- https://github.com/NumScale/boost.simd
- Seems promising: easier to program without loss of control?
- I had problems using it on Windows (issue #189)
- Abstracts away the different register sizes as packs
- Provides facilities to deal with alignment
- Provides natural syntax for manipulating packs, e.g. a+b adds two packs together
- A single code base can target multiple extensions
- Templates expand to calls to intrinsics
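The class-library idea can be illustrated without Boost.Simd. The following is a hypothetical four-float pack written only for this sketch, assuming SSE; it is not Boost.Simd's interface, just a demonstration of how operator overloads can expand to intrinsic calls so that a + b reads naturally.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>

// Hypothetical minimal pack of four floats (NOT Boost.Simd's API).
class pack4f
{
public:
    explicit pack4f(__m128 v) : v_(v) {}
    // _mm_setr_ps puts its first argument in lane 0 (memory order).
    pack4f(float a, float b, float c, float d) : v_(_mm_setr_ps(a, b, c, d)) {}

    friend pack4f operator+(pack4f l, pack4f r) { return pack4f(_mm_add_ps(l.v_, r.v_)); }
    friend pack4f operator*(pack4f l, pack4f r) { return pack4f(_mm_mul_ps(l.v_, r.v_)); }

    float lane(std::size_t i) const
    {
        assert(i < 4);
        std::array<float, 4> tmp;
        _mm_storeu_ps(tmp.data(), v_);  // unaligned store: no alignment needed
        return tmp[i];
    }

private:
    __m128 v_;
};
```

Because the operators are tiny inline functions, an optimizing compiler will typically reduce x + y to a single addps, which is the property the slide attributes to Boost.Simd's templates.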
Group Exercise
- Convert BasicMandel to use intrinsics
- AVX packs eight 32-bit floats into a single 256-bit register
- AVX intrinsics (#include <immintrin.h>):
  __m256 _mm256_add_ps(__m256 a, __m256 b)
  __m256 _mm256_mul_ps(__m256 a, __m256 b)
  __m256 _mm256_sub_ps(__m256 a, __m256 b)
  __m256 _mm256_load_ps(float const *c) (requires a 32-byte-aligned address)
  __m256 _mm256_cmp_ps(__m256 a, __m256 b, const int compOp)
  __m256i _mm256_castps_si256(__m256 a)
- Intel Intrinsics Guide
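One possible shape for the exercise's inner loop is sketched below, assuming AVX and evaluating eight points per call. BasicMandel's own code is not shown here, so mandel_scalar and mandel8 are hypothetical names. Besides the intrinsics listed above, it uses _mm256_set1_ps, _mm256_setzero_ps, _mm256_and_ps, _mm256_movemask_ps, _mm256_loadu_ps and _mm256_storeu_ps, all part of AVX. Iteration counts are kept as floats because 256-bit integer arithmetic only arrived with AVX2; ANDing the compare mask with 1.0f increments only the lanes that have not escaped yet.

```cpp
#include <cassert>
#include <immintrin.h>

// Hypothetical scalar reference: iterations until |z|^2 exceeds 4, capped at max_iter.
int mandel_scalar(float cr, float ci, int max_iter)
{
    float zr = 0.0f, zi = 0.0f;
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0f)
    {
        float t = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = t;
        ++n;
    }
    return n;
}

// Hypothetical AVX version for eight points c = cr[k] + i*ci[k].
void mandel8(const float cr[8], const float ci[8], int max_iter, int counts[8])
{
    assert(max_iter >= 0);
#if defined(__AVX__)
    const __m256 c_re = _mm256_loadu_ps(cr);
    const __m256 c_im = _mm256_loadu_ps(ci);
    const __m256 four = _mm256_set1_ps(4.0f);
    const __m256 two = _mm256_set1_ps(2.0f);
    const __m256 one = _mm256_set1_ps(1.0f);
    __m256 zr = _mm256_setzero_ps();
    __m256 zi = _mm256_setzero_ps();
    __m256 n = _mm256_setzero_ps();  // per-lane iteration counts, as floats

    for (int i = 0; i < max_iter; ++i)
    {
        __m256 zr2 = _mm256_mul_ps(zr, zr);
        __m256 zi2 = _mm256_mul_ps(zi, zi);
        // All-ones lanes where |z|^2 <= 4, i.e. the point has not escaped.
        __m256 active = _mm256_cmp_ps(_mm256_add_ps(zr2, zi2), four, _CMP_LE_OQ);
        if (_mm256_movemask_ps(active) == 0)
            break;  // every lane has escaped
        n = _mm256_add_ps(n, _mm256_and_ps(active, one));  // +1 in active lanes only
        __m256 new_zr = _mm256_add_ps(_mm256_sub_ps(zr2, zi2), c_re);
        zi = _mm256_add_ps(_mm256_mul_ps(_mm256_mul_ps(two, zr), zi), c_im);
        zr = new_zr;
    }
    float tmp[8];
    _mm256_storeu_ps(tmp, n);
    for (int k = 0; k < 8; ++k)
        counts[k] = static_cast<int>(tmp[k]);
#else
    // Not compiled for AVX: fall back to the scalar reference.
    for (int k = 0; k < 8; ++k)
        counts[k] = mandel_scalar(cr[k], ci[k], max_iter);
#endif
}
```

Compile with -mavx (GCC/Clang) or /arch:AVX (MSVC) to enable the AVX path; without it the function falls back to the scalar loop, which also serves as a reference for checking the vector results.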
