A Quantitative Study of Irregular Programs on GPUs




                  By
           Prashant Momale
              IIT Kanpur

              Guided By
         Prof. S. K. Aggarwal
Introduction

Regular vs Irregular Algorithms
- Regular Programs
(i) operate on large vectors or matrices
(ii) access them in statically predictable ways


- These codes often have high computational demands
- They exhibit extensive data parallelism
- They access memory in a streaming fashion and require little synchronization
e.g. Matrix Multiplication
Introduction (Continued...)

Irregular Programs

- build, traverse, and update irregular data structures such as trees, graphs, and priority
   queues
e.g. domains like n-body simulation, data mining, decision problems that use Boolean
   satisfiability, optimization theory, and social networks
- more difficult to parallelize
- more challenging to map to GPUs than regular programs
Introduction (Continued...)

Many Questions to Be Answered

- Several GPU implementations of irregular programs have been published, but very little
   is known about their behavior
- Some questions do not have clear answers, such as:
(i) Does irregularity really manifest itself as a binary property?
(ii) How is the irregularity behavior of an application influenced by its input, if at all?
(iii) Does an increase in irregularity necessarily degrade performance, or might it help in
     certain cases?
- Answering these questions is important for understanding the behavior of irregular
   programs
Irregularity

Regular Programs
- Control flow and memory accesses are not data dependent
Ex. In matrix multiplication, knowing the source code, the starting addresses, and the input size
   is enough to predict the behavior, without knowing any matrix elements



Irregular Programs
- Control flow and memory accesses are data dependent
- Input values determine the program's behavior
Ex. Binary search tree implementation
   The values and the order in which they are processed affect the control flow and
   memory references
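
The contrast between the two examples can be sketched in a few lines of Python. This is only an illustration running on the CPU, not code from the paper; the helper names `matmul_trace` and `bst_insert_trace` are made up for this sketch. The first function records which matrix indices are read, the second records which tree nodes are visited.

```python
def matmul_trace(a, b):
    """Multiply two n x n matrices and record which indices are read."""
    n = len(a)
    trace = []
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                trace.append(("A", i, k))
                trace.append(("B", k, j))
                c[i][j] += a[i][k] * b[k][j]
    return c, trace


def bst_insert_trace(values):
    """Insert values into a binary search tree and record the keys visited."""
    root = None  # each node is a list: [key, left, right]
    trace = []
    for v in values:
        if root is None:
            root = [v, None, None]
            continue
        node = root
        while True:
            trace.append(node[0])            # key compared against v
            side = 1 if v < node[0] else 2   # data-dependent branch
            if node[side] is None:
                node[side] = [v, None, None]
                break
            node = node[side]
    return trace


# Same size, different element values: the access trace is identical.
_, t1 = matmul_trace([[1, 2], [3, 4]], [[5, 6], [7, 8]])
_, t2 = matmul_trace([[9, 0], [0, 9]], [[1, 1], [1, 1]])
print(t1 == t2)                      # True

# Same values, different order: the traversal trace differs.
print(bst_insert_trace([2, 1, 3]))   # [2, 2]
print(bst_insert_trace([1, 2, 3]))   # [1, 1, 2]
```

The matrix trace depends only on the size n, while the tree trace changes as soon as the insertion order changes.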
Irregularity (Continued....)

Warp Concept

- A GPU contains processing elements (PEs); tightly coupled PEs form a streaming
 multiprocessor (SM).
- Each PE in an SM can run an independent thread of instructions.
- The PEs in each SM execute vector instructions that conditionally operate on 32 data
   items.
- A set of 32 threads that run together in this fashion is called a warp.
Irregularity (Continued....)

Control Flow Irregularity
- Sometimes the threads of a warp cannot all execute the same instruction.
- The threads are automatically subdivided into sets.
- All threads within a set execute the same instruction.
- The sets are executed serially, one after another, until they re-converge.


The situation in which not all threads of a warp follow the same control flow is called thread
   divergence.
This is control-flow irregularity.
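
As a rough model of this serialization, the Python sketch below counts how many sets a warp splits into at a single branch. It is a simplification written for these notes, assuming one pass per distinct path; the name `divergent_passes` is hypothetical and real hardware scheduling is more involved.

```python
WARP_SIZE = 32


def divergent_passes(branch_targets):
    """Return how many serialized passes a warp needs for one branch.

    branch_targets holds, per thread, the path that thread takes
    (for example True/False for an if/else).
    """
    if len(branch_targets) != WARP_SIZE:
        raise ValueError("expected one entry per thread in the warp")
    # Threads taking the same path form one set; the sets run one after another.
    return len(set(branch_targets))


uniform = [True] * WARP_SIZE                            # all threads agree
by_parity = [tid % 2 == 0 for tid in range(WARP_SIZE)]  # even/odd threads split

print(divergent_passes(uniform))    # 1
print(divergent_passes(by_parity))  # 2
```

A result of 1 means the warp stays converged; any larger value means the branch diverged and the paths are executed serially.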
Irregularity (Continued....)

Memory Access Irregularity
- Coalesced memory transaction: when the threads of a warp access nearby addresses, the
   hardware can combine their accesses into a single memory transaction.
- When memory accesses are not coalesced, the hardware has to perform many memory
   transactions, one after the other, instead of one.
This is how memory-access irregularity can lower performance.
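
The effect of coalescing can be estimated by counting how many aligned memory segments a warp touches. The sketch below assumes a 128-byte segment, which is an assumption of this example (the actual transaction size depends on the GPU generation and cache configuration); `memory_transactions` is a hypothetical helper, not a CUDA API.

```python
SEGMENT_BYTES = 128  # assumed transaction size; varies between GPUs


def memory_transactions(addresses, segment_bytes=SEGMENT_BYTES):
    """Count the distinct aligned segments touched by a warp's byte addresses."""
    return len({addr // segment_bytes for addr in addresses})


# 32 threads reading consecutive 4-byte words from an aligned start address.
coalesced = [4 * tid for tid in range(32)]
# 32 threads reading 4-byte words that lie 128 bytes apart.
strided = [128 * tid for tid in range(32)]

print(memory_transactions(coalesced))  # 1
print(memory_transactions(strided))    # 32
```

Under this model the strided pattern needs 32 transactions where the consecutive pattern needs one.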


- Bank conflict: a warp can simultaneously access 32 words in shared memory as long
   as they reside in different banks. If more than one word is touched within a bank,
   a bank conflict occurs.
Bank conflicts are another source of memory-access irregularity.
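
A bank conflict can be modeled by mapping each accessed word to a bank and counting the distinct words per bank. The sketch below assumes 32 banks of 4-byte words with consecutive words in consecutive banks; these parameters and the name `bank_conflict_degree` are assumptions of this example.

```python
NUM_BANKS = 32   # assumed number of shared-memory banks
WORD_BYTES = 4   # assumed bank width in bytes


def bank_conflict_degree(addresses):
    """Return the largest number of distinct words touched in any one bank."""
    words_per_bank = {}
    for addr in addresses:
        word = addr // WORD_BYTES
        words_per_bank.setdefault(word % NUM_BANKS, set()).add(word)
    return max(len(words) for words in words_per_bank.values())


stride_1 = [4 * tid for tid in range(32)]     # one word per bank
stride_2 = [8 * tid for tid in range(32)]     # two words in each even bank
stride_32 = [128 * tid for tid in range(32)]  # all 32 words in bank 0

print(bank_conflict_degree(stride_1))   # 1
print(bank_conflict_degree(stride_2))   # 2
print(bank_conflict_degree(stride_32))  # 32
```

A degree of 1 means no conflict; a degree of k means the access is serialized into roughly k steps under this model. Threads that read the same word count once, since only distinct words within a bank conflict.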
Metrics of Irregularity


(i) Control Flow Irregularity


           CFI = (divergent branches) / (executed instructions)


(ii) Memory-Access Irregularity


           MAI = (replayed instructions) / (issued instructions)
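
Both ratios are simple to compute once the four counts are available. The following sketch expresses them as percentages; the function and parameter names are placeholders chosen for this example, and the counter values in the usage lines are invented purely to show the arithmetic.

```python
def cfi(divergent_branches, executed_instructions):
    """Control-flow irregularity as a percentage."""
    if executed_instructions <= 0:
        raise ValueError("executed_instructions must be positive")
    return 100.0 * divergent_branches / executed_instructions


def mai(replayed_instructions, issued_instructions):
    """Memory-access irregularity as a percentage."""
    if issued_instructions <= 0:
        raise ValueError("issued_instructions must be positive")
    return 100.0 * replayed_instructions / issued_instructions


# Invented counter values, for illustration only.
print(cfi(divergent_branches=20, executed_instructions=1000))    # 2.0
print(mai(replayed_instructions=300, issued_instructions=1200))  # 25.0
```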
Metrics of Irregularity (Continued...)

- Both metrics range from 0% to 100%
- The higher the value, the higher the irregularity
- CFI is usually low
- They are independent of runtime
- Both metrics measure irregularity at the warp level


These metrics do not classify a program as regular or irregular. Rather, they
  measure the degree of irregularity.
Results and Analysis
- Presents an analysis of the irregularity observed in various CUDA kernels
- Investigates the effect of different program inputs
- Investigates the effect of optimizations on the programs
- Examines the variability of the results between different runs
 (i) on the same GPU
 (ii) on different GPUs
Benchmarks used:
Irregular - BFS, Barnes Hut, Data Compression, Delaunay Mesh Refinement,
            Points-to Analysis, Survey Propagation, Single Source Shortest Path, TSP
Regular - Black Scholes, Histogram, Monte Carlo, Matrix Multiplication, N-Body
Results and Analysis (Continued....)
Amount of Irregularity

- CFI is usually very low. For the above benchmarks it is less than 4.1%
- Most of the programs cannot be strictly classified as regular or irregular
- The two kinds of irregularity appear to be independent of each other
- Irregular control flow generally implies irregular memory access
Results and Analysis (Continued....)

Input Sensitivity
- Input sensitivity is very difficult to predict
- It is difficult to predict in an application-independent way


(i) Input Oblivious - Irregularity remains largely constant across different inputs
(ii) Input-type Dependent - Irregularity varies mainly across different types of inputs
                            rather than within a single type
(iii) Input Dependent - Irregularity varies as the size of the input varies
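
One way to make the three categories concrete is to compare the spread of a metric within each input type against its spread across all inputs. The sketch below is a possible reading of the categories, not the procedure used in the paper; the function name, the `tolerance` threshold, the input-type labels, and all sample values are invented for illustration.

```python
def classify_input_sensitivity(measurements, tolerance):
    """Classify a kernel from irregularity values (in percent).

    measurements maps an input type to a list of values, one per input size.
    tolerance is the largest spread still treated as "largely constant".
    """
    all_values = [v for values in measurements.values() for v in values]
    if max(all_values) - min(all_values) <= tolerance:
        return "input oblivious"
    # Largest spread observed inside any single input type.
    within = max(max(values) - min(values) for values in measurements.values())
    if within <= tolerance:
        return "input-type dependent"
    return "input dependent"


# Invented sample values, for illustration only.
flat = {"type_a": [1.0, 1.1], "type_b": [1.2, 1.0]}
by_type = {"type_a": [1.0, 1.1], "type_b": [3.0, 3.2]}
by_size = {"type_a": [1.0, 2.5], "type_b": [1.2, 3.0]}

print(classify_input_sensitivity(flat, tolerance=0.5))     # input oblivious
print(classify_input_sensitivity(by_type, tolerance=0.5))  # input-type dependent
print(classify_input_sensitivity(by_size, tolerance=0.5))  # input dependent
```

The choice of tolerance is arbitrary here, and a different threshold can move a kernel between categories.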
Results and Analysis (Continued....)


(iii) Arithmetic Precision -
    Changing from single precision to double precision increases CFI and MAI for small
      inputs but decreases both for medium and large inputs.
    However, the change is very small.
    - This indicates that arithmetic precision has little effect on the irregularity of a
        program.
Results and Analysis (Continued....)

Variability

- Observed for several kernels across multiple runs on the same GPU and on different GPUs


Irregularity is quite stable across runs on the same GPU and varies somewhat between distinct
   GPUs
Conclusion

- Programs do not fall cleanly into two types, regular and irregular
- Irregularity is not necessarily bad for performance
- By definition, irregular programs are data dependent, but different inputs yield similar
   degrees of irregularity
- Irregularity does not vary much between distinct GPUs


These conclusions are expected to hold across a broad range of CUDA-capable GPUs,
     and it is hoped that they will increase the understanding of the behavior of irregular GPU
     applications.
References

Paper: A Quantitative Study of Irregular Programs on GPUs
       By: Rupesh Nasre, Keshav Pingali, Martin Burtscher
           Texas State University
       Published in: IEEE International Symposium on
                     Workload Characterization (IISWC '13)
Results and Analysis (Continued....)
Effect of Optimizations and Arithmetic Precision

(i) The unoptimized version of one program reads records from global memory, whereas the optimized
    version calculates the record values on the fly.
  - This actually increases the control-flow irregularity
  - But performance improves because the computations are cheaper than reading the values
   from global memory.


(ii) In the optimized Single Source Shortest Path algorithm, nodes that are logically close
    to each other are kept close in memory.
   - This increases the memory-access irregularity but also increases the spatial locality
