A High-Level Programming Approach for using FPGAs in HPC using Functional Description, Vector Type-Transformations and Cost-Modelling

Delivered at School of Informatics, University of Edinburgh, 25 Feb 2016


  1. 1. A HIGH-LEVEL PROGRAMMING APPROACH FOR USING FPGAS IN HPC USING FUNCTIONAL DESCRIPTION, VECTOR TYPE-TRANSFORMATIONS AND COST-MODELLING. S WAQAR NABI & WIM VANDERBAUWHEDE, www.tytra.org.uk. School of Informatics, University of Edinburgh, 25 Feb 2016
  2. 2. Using Safe Transformations and a Cost-model For HPC On FPGAs • The TyTra project context • Our approach, blue-sky target, down-to-earth target, where we are now, how we are different • Key contributions • (1) Type transformations to create design-variants, (2) a new Intermediate Language, and (3) an FPGA Cost model • The cost model • Performance and resource-usage estimates, some results Using safe transformations and an associated light-weight cost-model opens the route to a fully automated design-space exploration flow
  3. 3. THE CONTEXT Our approach, blue-sky target, down-to-earth target, where we are now, how we are different
  4. 4. Blue Sky Target
  5. 5. Blue Sky Target [Diagram labels: Legacy Scientific Code, Heterogeneous HPC Target Description, Cost Model, Optimized HPC solution!] The goal that keeps us motivated! (The pragmatic target is somewhat more modest…)
  6. 6. A performance-portable code-base that builds on a purely software programming paradigm. The Cunning Plan…
  7. 7. The Cunning Plan… 1. Use a functional programming paradigm and (auto-)generate correct-by-construction program-variants through vector-transformations • which translate to design-variants on the FPGA. 2. Create an Intermediate Language that: • captures the design-space • supports a light-weight cost-model • is a target for the front-end compiler 3. Create a fast and accurate cost-model that can estimate the performance and resource-utilization of each variant. A performance-portable code-base that builds on a purely software programming paradigm.
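The idea of generating correct-by-construction program-variants through vector transformations can be illustrated with a small sketch. This is not the TyTra front-end; the names `reshape_to`, `baseline` and `variant` are hypothetical, and Python lists stand in for sized vector types. Reshaping a flat vector into `k` lanes and mapping over each lane gives the same result as the original map, while suggesting a design with `k` parallel lanes.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def reshape_to(k: int, xs: List[A]) -> List[List[A]]:
    """Split a flat vector into k equal lanes (length must be a multiple of k)."""
    if k <= 0 or len(xs) % k != 0:
        raise ValueError("vector length must be a positive multiple of k")
    m = len(xs) // k
    return [xs[i * m:(i + 1) * m] for i in range(k)]


def baseline(f: Callable[[A], B], xs: List[A]) -> List[B]:
    """Reference program: a single map over the whole vector."""
    return [f(x) for x in xs]


def variant(f: Callable[[A], B], xs: List[A], k: int) -> List[B]:
    """Variant: an outer map over k lanes, an inner map within each lane."""
    lanes = reshape_to(k, xs)
    mapped = [[f(x) for x in lane] for lane in lanes]
    # Flatten back so the variant has the same type as the baseline.
    return [y for lane in mapped for y in lane]


def kernel(x: int) -> int:
    # Hypothetical per-element computation.
    return 2 * x + 1


data = list(range(8))
# Every legal lane count yields the same result as the baseline.
assert all(variant(kernel, data, k) == baseline(kernel, data) for k in (1, 2, 4, 8))
```

Because the reshape only changes how the data is grouped, each variant is functionally identical to the baseline; only its cost on the target differs.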
  9. 9. And You May Very Well Ask… The jury is still out…
  10. 10. Where We Are Now Working with small but real scientific code
  11. 11. Where We Are Now Legacy Fortran Scientific Code Working with small but real scientific code
  12. 12. VECTOR TYPE TRANSFORMATIONS Wim’s slides
  13. 13. IR AND COST MODEL (1) A custom Intermediate Language, and (2) a fast and accurate Cost Model
  14. 14. Pre-requisite: Models Of Abstraction 1. Platform model 2. Memory hierarchy model 3. Execution model 4. Design-space and cost-space model 5. Memory execution model 6. Data access pattern model
  15. 15. Pre-requisite: Models Of Abstraction 1. Platform model 2. Memory hierarchy model 3. Execution model 4. Design-space model 5. Memory execution model 6. Data access pattern model (More or less) based on OpenCL standard
  16. 16. Platform And Memory Model
  18. 18. Design Space
  20. 20. Performance Estimate Dependence On Memory Execution Model [Timeline figure, Activity vs Time. Activities: Host → Device-DRAM; Device-DRAM → Device-Buffers; Device-Buffers → Offset-Buffers; Kernel Pipeline Execution]
  22. 22. Performance Estimate Dependence On Memory Execution Model [Timeline figure, Activity vs Time. Activities: Host → Device-DRAM; Device-DRAM → Device-Buffers; Device-Buffers → Offset-Buffers; Kernel Pipeline Execution] Work-Instance Iterations, Form A: all iterations
  23. 23. Performance Estimate Dependence On Memory Execution Model [Timeline figure, Activity vs Time. Activities: Host → Device-DRAM; Device-DRAM → Device-Buffers; Device-Buffers → Offset-Buffers; Kernel Pipeline Execution] Work-Instance Iterations, Form B: first iteration only, last iteration only, all other iterations
  24. 24. Performance Estimate Dependence On Memory Execution Model [Timeline figure, Activity vs Time. Activities: Host → Device-DRAM; Device-DRAM → Device-Buffers; Device-Buffers → Offset-Buffers; Kernel Pipeline Execution] Work-Instance Iterations, Form C: first iteration only, last iteration only, all other iterations. Once a design-variant is categorized, performance can be estimated accordingly
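A rough sketch of why the memory-execution form matters for the estimate follows. It is not the expression from the slides (those were images and did not survive in the transcript). The assumptions are: activities run back to back with no overlap, the buffer-to-offset-buffer step is folded into the kernel time, and a transfer that is not repeated every iteration is paid once in total. Which transfers are repeated in Forms B and C is left as a flag, because the transcript does not say. `ActivityTimes` and `total_time` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityTimes:
    """Time in seconds for one occurrence of each activity (illustrative)."""
    host_transfer: float      # host <-> device DRAM
    dram_to_buffers: float    # device DRAM <-> on-chip buffers
    kernel: float             # kernel pipeline execution for one work-instance


def total_time(t: ActivityTimes, n_iter: int,
               host_every_iter: bool, dram_every_iter: bool) -> float:
    """Total time for n_iter work-instance iterations under a given form.

    Assumption: no overlap between activities; a transfer that is not
    repeated every iteration is charged exactly once.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    host_count = n_iter if host_every_iter else 1
    dram_count = n_iter if dram_every_iter else 1
    return (host_count * t.host_transfer
            + dram_count * t.dram_to_buffers
            + n_iter * t.kernel)


times = ActivityTimes(host_transfer=4.0, dram_to_buffers=2.0, kernel=1.0)
every_iteration = total_time(times, 10, True, True)     # all activities, all iterations
host_once = total_time(times, 10, False, True)          # host transfer paid once
all_on_chip = total_time(times, 10, False, False)       # both transfers paid once
```

With these made-up numbers the three cases cost 70, 34 and 16 seconds, which shows how strongly the categorization of a design-variant can change its estimate.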
  26. 26. Pre-requisite: Models Of Abstraction 1. Platform model 2. Memory hierarchy model 3. Execution model 4. Design-space model 5. Memory execution model 6. Data access pattern model: (a) contiguous access, (b) (fixed) strided access
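The two access patterns named in the data access pattern model can be made concrete with a short sketch. The function names are hypothetical, and addresses are counted in elements rather than bytes.

```python
from typing import List


def contiguous(base: int, n: int) -> List[int]:
    """Element addresses for n consecutive accesses starting at base."""
    return [base + i for i in range(n)]


def strided(base: int, n: int, stride: int) -> List[int]:
    """Element addresses for n accesses separated by a fixed stride."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return [base + i * stride for i in range(n)]


# A 4x4 array stored in row-major order: walking a row is contiguous,
# walking a column is a fixed-stride access with stride equal to the row length.
row0 = contiguous(0, 4)       # [0, 1, 2, 3]
col0 = strided(0, 4, 4)       # [0, 4, 8, 12]
```

Contiguous access is the special case of strided access with a stride of one, which is why a single stride parameter is enough to describe both patterns.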
  27. 27. The Back-end Approach • Use (or design) an IR that can capture all these models • We ended up using LLVM and modifying it to fit our purpose, effectively creating a custom IR we call the “TyTra-IR”. • Develop a cost-model that can evaluate the variants expressed in the IR
  28. 28. The IR
  29. 29. The Tytra IR • Strongly and statically typed • Largely based on the LLVM-IR • All computations expressed in SSA (Static Single Assignment) form • Keywords pipe, par, seq and comb indicate the type of parallelism; nested functions of these types build architectural configurations • Manage-IR: memory objects, streams, offset streams • Compute-IR: streaming datapath model, SSA instructions
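How nested pipe, par, seq and comb functions build up a configuration can be pictured as a tree. The sketch below is only a data-structure illustration, not TyTra-IR syntax (the syntax slide did not survive in the transcript); `Node`, `render` and `count_instructions` are hypothetical names, and the example configuration is invented.

```python
from dataclasses import dataclass, field
from typing import List

KINDS = ("pipe", "par", "seq", "comb")


@dataclass
class Node:
    """One function in a configuration tree, tagged with its parallelism kind."""
    kind: str
    name: str
    children: List["Node"] = field(default_factory=list)
    instructions: int = 0  # SSA instructions directly inside this function

    def __post_init__(self) -> None:
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind}")


def render(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented text lines, one per function."""
    lines = ["  " * depth + f"{node.kind} {node.name}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


def count_instructions(node: Node) -> int:
    """Total SSA instructions in this function and everything nested in it."""
    return node.instructions + sum(count_instructions(c) for c in node.children)


# Invented example: two parallel lanes, each a pipeline with a combinatorial block.
tree = Node("par", "main", children=[
    Node("pipe", "lane0", children=[Node("comb", "f0", instructions=2)], instructions=3),
    Node("pipe", "lane1", children=[Node("comb", "f1", instructions=2)], instructions=3),
])
```

A cost model can then walk such a tree, combining per-instruction costs according to the kind of each enclosing function.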
  30. 30. Tytra-IR Syntax
  31. 31. A Typical Tytra-IR Configuration Tree
  32. 32. The Cost-model
  33. 33. The Cost-model Use-case. A set of standardized experiments feeds target-specific empirical data to the cost model, and the rest comes from the IR description.
  34. 34. Resource Estimates - Example [Plots: Integer Division, Integer Multiplication] Light-weight cost expressions are associated with every legal SSA instruction in the TyTra-IR
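Attaching a light-weight cost expression to every instruction could look like the sketch below. The expressions in `COST` are placeholders invented for illustration; they are not the fitted expressions from the slides, which would come from target-specific experiments. The operation names and the `estimate` function are likewise hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Resources = Dict[str, int]

# Placeholder cost expressions: operation -> function of operand bit-width.
# Real expressions would be fitted to empirical data for a specific target.
COST: Dict[str, Callable[[int], Resources]] = {
    "add": lambda w: {"luts": w, "dsps": 0},
    "mul": lambda w: {"luts": 0, "dsps": (w + 17) // 18},
    "udiv": lambda w: {"luts": w * w, "dsps": 0},
}


def estimate(instructions: List[Tuple[str, int]]) -> Resources:
    """Sum the per-instruction costs for a list of (operation, bit-width) pairs."""
    total: Resources = {"luts": 0, "dsps": 0}
    for op, width in instructions:
        if op not in COST:
            raise ValueError(f"no cost expression for instruction: {op}")
        for resource, amount in COST[op](width).items():
            total[resource] += amount
    return total


usage = estimate([("add", 32), ("mul", 32), ("udiv", 32)])
```

Because each expression is a cheap closed-form function, the whole estimate is a single pass over the instruction list, which is what makes this kind of model fast compared with running synthesis.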
  35. 35. Performance Estimate
  36. 36. Performance Estimate • Effective Work-Instance Throughput (EWIT) ◦ Work-Instance = executing the kernel over the entire index-space • Key determinants ◦ Memory execution model ◦ Sustained memory bandwidth for the target architecture and design-variant (depends on the data-access pattern) ◦ Design configuration of the FPGA ◦ Operating frequency of the FPGA ◦ Compute-bound or IO-bound? The performance model is trickier, especially calculating estimates of sustained memory bandwidth.
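The compute-bound versus IO-bound question can be sketched as a simple bottleneck calculation. This is a simplification and not the expression from the slides: it assumes computation and memory traffic overlap perfectly, so the slower of the two sets the time per work-instance. The function name and all numbers are hypothetical.

```python
from typing import Tuple


def ewit(cycles_per_work_instance: float, freq_hz: float,
         bytes_per_work_instance: float,
         sustained_bw_bytes_per_s: float) -> Tuple[float, str]:
    """Return (work-instances per second, limiting factor).

    Assumption: compute and IO overlap fully, so the larger of the two
    times determines the throughput.
    """
    if min(cycles_per_work_instance, freq_hz,
           bytes_per_work_instance, sustained_bw_bytes_per_s) <= 0:
        raise ValueError("all parameters must be positive")
    compute_s = cycles_per_work_instance / freq_hz
    io_s = bytes_per_work_instance / sustained_bw_bytes_per_s
    bound = "compute" if compute_s >= io_s else "io"
    return 1.0 / max(compute_s, io_s), bound


# One million cycles at 200 MHz, moving 8 MB per work-instance.
fast_memory = ewit(1_000_000, 200e6, 8_000_000, 4e9)   # compute takes longer
slow_memory = ewit(1_000_000, 200e6, 8_000_000, 1e9)   # memory takes longer
```

With these figures the first case is compute-bound at roughly 200 work-instances per second and the second is IO-bound at roughly 125, which is why the sustained-bandwidth estimate matters so much.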
  37. 37. Platform And Memory Model
  39. 39. Forms of Memory Execution
  41. 41. Effect of Access Pattern with Different Array Sizes
  42. 42. Effect of using Vector-Access Optimizations with Different Array Sizes
  43. 43. Performance Estimates Parameters that Make up the Expression
  44. 44. Performance Estimates The Expressions
  49. 49. Performance Estimates Experimental Results (Type C) Estimated vs actual cost and throughput (CPWI = cycles per work instance)
  50. 50. Does The Tytra Approach Work?
  51. 51. How Fast Is The Cost Model? [Bar chart: time taken to generate an estimate (sec). Xilinx SDAccel tools: 70; TyTra: 0.3] 200x faster
  52. 52. Design-space Exploration?
  53. 53. CONCLUSION
  54. 54. The Route To Automated Design Space Exploration On FPGAs For HPC Applications • The larger aim is to create a turn-key compiler for: Legacy scientific code → Heterogeneous HPC Platform ◦ Current focus is on FPGAs, and on using a Functional Language for design entry • Our main contributions are: ◦ Type transformations to create design-variants, ◦ a new Intermediate Language, and ◦ an FPGA Cost model • Our FPGA Cost Model ◦ Works on the TyTra-IR, is light-weight, accurate (enough), and allows us to evaluate design-variants. Using safe transformations on a functional language paradigm together with a light-weight cost-model brings us closer to a turn-key HPC compiler for legacy code
  55. 55. The woods are lovely, dark and deep, But I have promises to keep, And lines to code before I sleep, And lines to code before I sleep. Acknowledgement: We wish to acknowledge support by EPSRC through grant EP/L00058X/1.