AN LLVM BACKEND
FOR GHC
David Terei & Manuel Chakravarty
University of New South Wales

Haskell Symposium 2010
Baltimore, USA, 30th Sep
What is ‘An LLVM Backend’?
• Low Level Virtual Machine

• Backend Compiler framework

• Designed to be used by compiler developers, not by software developers

• Open Source

• Heavily sponsored by Apple
Motivation
• Simplify
  • Reduce ongoing work
  • Outsource!


• Performance
  • Improve run-time
Example
import Data.Word (Word32)

collat :: Int -> Word32 -> Int
collat c 1 = c
collat c n | even n =
             collat (c+1) $ n `div` 2
           | otherwise =
             collat (c+1) $ 3 * n + 1

pmax x n = x `max` (collat 1 n, n)

main = print $ foldl pmax (1,1)
                [2..1000000]

[Figure: bar chart of run-time (ms) for the LLVM, C and NCG backends; axis 0 to 2000]
Note: these run-times have been updated and differ from those in the paper.
Competitors
C Backend (C)
• GNU C Dependency
  • Badly supported on platforms such as Windows
• Use a Mangler on assembly code
• Slow compilation speed
  • Takes twice as long as the NCG

Native Code Generator (NCG)
• Huge amount of work
• Very limited portability
• Does very little optimisation work
GHC’s Compilation Pipeline
   Core → STG → Cmm → { C, NCG, LLVM }

Language, Abstract Machine, Machine Configuration

Heap, Stack, Registers
Compiling to LLVM
Won’t be covering, see paper for full details:
• Why from Cmm and not from STG/Core
• LLVM, C-- & Cmm languages
• Dealing with LLVM’s SSA form
• LLVM type system


Will be covering:
• Handling the STG Registers
• Handling GHC’s Tables-Next-To-Code optimisation
Handling the STG Registers
Implement either by:
• In memory
• Pin to hardware registers

             STG Register    X86 Register
             Base            ebx
             Heap Pointer    edi
             Stack Pointer   ebp
             R1              esi

NCG?
• Register allocator permanently stores STG registers in
  hardware

C Backend?
• Uses GNU C extension (global register variables) to
  also permanently store STG registers in hardware
Handling the STG Registers
The LLVM backend handles this by implementing a new calling convention:
             STG Register    X86 Register
             Base            ebx
             Heap Pointer    edi
             Stack Pointer   ebp
             R1              esi


  define f ghc_cc (Base, Hp, Sp, R1) {
    …
    tail call ghc_cc g(Base, Hp’, Sp’, R1’);
    return void;
  }
Handling the STG Registers
• Issue: If implemented naively, all the STG registers have a live range of the entire function.

• Some of the STG registers can never be scratched (e.g. Sp, Hp…) but many can (e.g. R2, R3…).

• We need to tell LLVM when we no longer care about an STG register; otherwise it will spill and reload the register across, for example, calls into C land.
Handling the STG Registers
• We handle this by storing undef into an STG register once it is no longer needed. In effect, we scratch the registers manually.
define f ghc_cc (Base, Hp, Sp, R1, R2, R3, R4) {
  …
  store undef %R2
  store undef %R3
  store undef %R4
  call c_cc sin(double %f22);
  …
  tail call ghc_cc g(Base,Hp’,Sp’,R1’,R2’,R3’,R4’);
  return void;
}
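As a rough illustration of the rule above, the following Haskell sketch picks the STG registers that may be scratched before a call into C and prints the corresponding pseudo-LLVM lines in the slide's notation. The `StgReg` type, the `pinned` list (Base is assumed to be pinned alongside Hp and Sp) and the function names are hypothetical and are not taken from GHC's source.

```haskell
-- Hypothetical model of the STG registers passed under the calling convention.
data StgReg = Base | Hp | Sp | R1 | R2 | R3 | R4
  deriving (Eq, Show, Enum, Bounded)

-- Registers that must always keep their value.
-- Assumption: Base is treated like Hp and Sp here.
pinned :: [StgReg]
pinned = [Base, Hp, Sp]

-- Registers that are neither pinned nor still live may be scratched.
scratchable :: [StgReg] -> [StgReg]
scratchable live =
  [ r | r <- [minBound .. maxBound], r `notElem` pinned, r `notElem` live ]

-- Pseudo-LLVM lines in the notation used on the slide (not valid LLVM IR).
scratchLines :: [StgReg] -> [String]
scratchLines live = [ "store undef %" ++ show r | r <- scratchable live ]
```

With only R1 live, `scratchLines [R1]` yields the three `store undef` lines shown on the slide for R2, R3 and R4.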
Handling Tables-Next-To-Code
[Figure: closure memory layout, Un-optimised Layout vs Optimised Layout]

How to implement in LLVM?
Handling Tables-Next-To-Code
Use GNU Assembler sub-section feature.
• Allows code/data to be put into numbered sub-section
• Sub-sections are appended together in order
• Table in <n>, entry code in <n+1>
                    .text 12
                    sJ8_info:
                        movl ...
                        movl ...
                        jmp ...

                    [...]
                    .text 11
                    sJ8_info_itable:
                        .long ...
                        .long 0
                        .long 327712
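The sub-section trick can be sketched as a small Haskell function that emits assembly text: the entry code goes into sub-section n+1 and the info table into sub-section n, so the assembler places the table immediately before the code once it appends the sub-sections in order. The `Closure` record and `emitTNTC` are hypothetical names for illustration only, not part of GHC.

```haskell
-- Hypothetical description of one closure's info table and entry code.
data Closure = Closure
  { entryLabel :: String    -- label of the entry code, e.g. "sJ8_info"
  , tableWords :: [String]  -- operands of the table's .long directives
  , entryCode  :: [String]  -- instructions of the entry code
  }

-- Emit the entry code in sub-section n+1 and the table in sub-section n.
-- The textual order does not matter: GNU as appends sub-sections by number.
emitTNTC :: Int -> Closure -> [String]
emitTNTC n c =
     [ ".text " ++ show (n + 1), entryLabel c ++ ":" ]
  ++ map ("    " ++) (entryCode c)
  ++ [ ".text " ++ show n, entryLabel c ++ "_itable:" ]
  ++ map ("    .long " ++) (tableWords c)
```

For example, `mapM_ putStrLn (emitTNTC 11 (Closure "sJ8_info" ["0", "327712"] ["jmp ..."]))` prints a listing with the same shape as the one above.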
Handling Tables-Next-To-Code
LLVM Mangler
• 180 lines of Haskell (half is documentation)
• Needed only for OS X

C Mangler
• 2,000 lines of Perl
• Needed for every platform
Evaluation: Simplicity
[Figure: bar chart of code size in lines for LLVM, C and NCG; axis 0 to 25000]

LLVM
• Half of code is representation of LLVM language

C
• Compiler: 1,100 lines
• C Headers: 2,000 lines
• Perl Mangler: 2,000 lines

NCG
• Shared component: 8,000 lines
• Platform specific: 4,000 – 5,000 for X86, SPARC, PowerPC
Evaluation: Performance
Nofib:
• Egalitarian benchmark suite, everything is equal
• Memory bound, little room for optimisation once at Cmm stage

[Figure: bar chart of run-time against LLVM (%) for NCG and C; axis -0.5 to 3]
Repa Performance
[Figure: bar chart of run-times (s), LLVM vs NCG, for Matrix Mult, Laplace and FFT; axis 0 to 16]
Compile Times, Object Sizes
[Figure: two bar charts for NCG and C. Compile Times Vs LLVM (axis -80 to 60) and Object Sizes Vs LLVM (axis -14 to 0)]
Result
• LLVM Backend is simpler.


• LLVM Backend is as fast or faster.


• LLVM developers now work for GHC!
Get It
• LLVM
  • Our calling convention has been accepted upstream!
  • Included in LLVM since version 2.7
  • http://llvm.org

• GHC
  • In HEAD
  • Should be released in GHC 7.0

• Send me any programs that are slower!
Questions?
Why from Cmm?
A lot less work than from STG/Core

But…

Couldn’t you do a better job from STG/Core?

Doubtful…

Easier to fix any deficiencies in Cmm representation and
code generator
Dealing with SSA
The LLVM language is in SSA form:
• Each variable can only be assigned to once
• Immutable


How do we handle converting mutable Cmm variables?
• Allocate a stack slot for each Cmm variable
• Use loads and stores for reads and writes
• Use the ‘mem2reg’ LLVM optimisation pass
 • This converts our stack allocations into LLVM variables that
   properly obey the SSA requirement
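The stack-slot approach can be sketched in Haskell as a tiny code generator: every mutable variable gets one `alloca`, reads become loads, writes become stores, and each LLVM temporary is defined exactly once. The `Expr` and `Stmt` types and the function names are hypothetical, only 32-bit integers are handled, and the emitted text uses the opaque-pointer (`ptr`) syntax of recent LLVM releases rather than the typed pointers of LLVM 2.7.

```haskell
-- Hypothetical fragment of a Cmm-like language with mutable variables.
data Expr = Lit Int | Var String | Add Expr Expr
data Stmt = Assign String Expr

-- Returns (instructions, operand holding the value, next fresh-name counter).
genExpr :: Int -> Expr -> ([String], String, Int)
genExpr n (Lit i) = ([], show i, n)
genExpr n (Var v) =
  let t = "%t" ++ show n
  in ([t ++ " = load i32, ptr %" ++ v], t, n + 1)   -- a read is a load
genExpr n (Add a b) =
  let (ia, oa, n1) = genExpr n a
      (ib, ob, n2) = genExpr n1 b
      t = "%t" ++ show n2
  in (ia ++ ib ++ [t ++ " = add i32 " ++ oa ++ ", " ++ ob], t, n2 + 1)

genStmt :: Int -> Stmt -> ([String], Int)
genStmt n (Assign v e) =
  let (is, o, n') = genExpr n e
  in (is ++ ["store i32 " ++ o ++ ", ptr %" ++ v], n')  -- a write is a store

-- One stack slot per variable, followed by the statements in order.
genBlock :: [String] -> [Stmt] -> [String]
genBlock vars stmts = allocas ++ go 0 stmts
  where
    allocas = [ "%" ++ v ++ " = alloca i32" | v <- vars ]
    go _ [] = []
    go n (s : rest) = let (is, n') = genStmt n s in is ++ go n' rest
```

For `genBlock ["x"] [Assign "x" (Lit 1), Assign "x" (Add (Var "x") (Lit 2))]`, the variable `x` is written twice, yet `%x`, `%t0` and `%t1` are each defined once; promoting the slot to a register is then left to `mem2reg`.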
Type Systems?
The LLVM language has a fairly high-level type system
• Strings, Arrays, Pointers…


When combined with SSA form, great for development
• 15 bug fixes required after backend finished to get test
  suite to pass
• 10 of those were motivated by type system errors
• Some could have been fairly difficult (e.g. returning pointer
  instead of value)
