SlideShare a Scribd company logo
1 of 24
Download to read offline
AN LLVM BACKEND
FOR GHC
David Terei & Manuel Chakravarty
University of New South Wales

Haskell Symposium 2010
Baltimore, USA, 30th Sep
What is ‘An LLVM Backend’?
• Low Level Virtual Machine

• Backend Compiler framework

• Designed to be used by compiler developers, not

 by software developers

• Open Source

• Heavily sponsored by Apple
Motivation
• Simplify
  • Reduce ongoing work
  • Outsource!


• Performance
  • Improve run-time
Example
collat :: Int -> Word32 -> Int                          Run-time (ms)
collat c 1 = c                                   2000
collat c n | even n =                            1800
             collat (c+1) $ n `div` 2            1600
           | otherwise =                         1400
             collat (c+1) $ 3 * n + 1
                                                 1200
                                                 1000
pmax x n = x `max` (collat 1 n, n)
                                                 800
                                                 600
main = print $ foldl pmax (1,1)
                                                 400
                [2..1000000]
                                                 200
                                                   0
                                                         LLVM   C   NCG
Different, updated run-times compared to paper
Competitors
         C Backend (C)            Native Code Generator (NCG)

• GNU C Dependency               • Huge amount of work
  • Badly supported on           • Very limited portability
    platforms such as Windows
                                 • Does very little
• Use a Mangler on
                                  optimisation work
  assembly code
• Slow compilation speed
  • Takes twice as long as the
   NCG
GHC’s Compilation Pipeline
                                                     C

   Core            STG             Cmm              NCG

                                                    LLVM
Language, Abstract Machine, Machine Configuration



                          Heap, Stack, Registers
Compiling to LLVM
Won’t be covering, see paper for full details:
• Why from Cmm and not from STG/Core
• LLVM, C-- & Cmm languages
• Dealing with LLVM’s SSA form
• LLVM type system


Will be covering:
• Handling the STG Registers
• Handling GHC’s Table-Next-To-Code optimisation
Handling the STG Registers
                               STG Register    X86 Register
Implement either by:           Base            ebx
• In memory                    Heap Pointer    edi
• Pin to hardware registers    Stack Pointer   ebp
                               R1              esi
NCG?
• Register allocator permanently stores STG registers in
  hardware

C Backend?
• Uses GNU C extension (global register variables) to
  also permanently store STG registers in hardware
Handling the STG Registers
LLVM handles by implementing a new calling convention:
             STG Register    X86 Register
             Base            ebx
             Heap Pointer    edi
             Stack Pointer   ebp
             R1              esi


  define f ghc_cc (Base, Hp, Sp, R1) {
    …
    tail call g ghc_cc (Base, Hp’, Sp’, R1’);
    return void;
  }
Handling the STG Registers
• Issue: If implemented naively then all the STG registers
 have a live range of the entire function.

• Some of the STG registers can never be scratched (e.g
 Sp, Hp…) but many can (e.g R2, R3…).

• We need to somehow tell LLVM when we no longer care
 about an STG register, otherwise it will spill and reload the
 register across calls to C land for example.
Handling the STG Registers
• We handle this by storing undef into the STG register
 when it is no longer needed. We manually scratch them.
define f ghc_cc (Base, Hp, Sp, R1, R2, R3, R4) {
  …
  store undef %R2
  store undef %R3
  store undef %R4
  call c_cc sin(double %f22);
  …
  tail call ghc_cc g(Base,Hp’,Sp’,R1’,R2’,R3’,R4’);
  return void;
}
Handling Tables-Next-To-Code
   Un-optimised Layout        Optimised Layout




                         How to implement in LLVM?
Handling Tables-Next-To-Code
Use GNU Assembler sub-section feature.
• Allows code/data to be put into numbered sub-section
• Sub-sections are appended together in order
• Table in <n>, entry code in <n+1>
                    .text 12
                    sJ8_info:
                        movl ...
                        movl ...
                        jmp ...

                    [...]
                    .text 11
                    sJ8_info_itable:
                        .long ...
                        .long 0
                        .long 327712
Handling Tables-Next-To-Code
        LLVM Mangler                     C Mangler

• 180 lines of Haskell (half   • 2,000 lines of Perl
  is documentation)            • Needed for every platform
• Needed only for OS X
Evaluation: Simplicity
         Code Size         LLVM
25000                      • Half of code is representation of
                             LLVM language
20000
                           C
15000                      • Compiler: 1,100 lines
                           • C Headers: 2,000 lines

10000                      • Perl Mangler: 2,000 lines


 5000                      NCG
                           • Shared component: 8,000 lines
   0                       • Platform specific: 4,000 – 5,000
        LLVM   C     NCG     for X86, SPARC, PowerPC
Evaluation: Performance
                                      Run-time against LLVM (%)
Nofib:                           3
• Egalitarian benchmark
  suite, everything is equal   2.5

• Memory bound, little           2
  room for optimisation
  once at Cmm stage            1.5

                                 1

                               0.5

                                 0
                                           NCG            C
                               -0.5
Repa Performance
                   runtimes (s)
16

14

12

10

 8                                      LLVM
                                        NCG
 6

 4

 2

 0
     Matrix Mult   Laplace        FFT
Compile Times, Object Sizes
      Compile Times Vs         Object Sizes Vs LLVM
           LLVM           0
60                                 NCG         C
                          -2
40
                          -4
20
                          -6
 0
        NCG         C     -8
-20
                         -10
-40
                         -12
-60

-80                      -14
Result
• LLVM Backend is simpler.


• LLVM Backend is as fast or faster.


• LLVM developers now work for GHC!
Get It
• LLVM
   • Our calling convention has been accepted upstream!
   • Included in LLVM since version 2.7


                             http://llvm.org
• GHC
  • In HEAD
  • Should be released in GHC 7.0




• Send me any programs that are slower!
Questions?
Why from Cmm?
A lot less work then from STG/Core

But…

Couldn’t you do a better job from STG/Core?

Doubtful…

Easier to fix any deficiencies in Cmm representation and
code generator
Dealing with SSA
LLVM language is SSA form:
• Each variable can only be assigned to once
• Immutable


How do we handle converting mutable Cmm variables?
• Allocate a stack slot for each Cmm variable
• Use load and stores for reads and writes
• Use ‘mem2reg’ llvm optimisation pass
 • This converts our stack allocation to LLVM variables instead that
   properly obeys the SSA requirement
Type Systems?
LLVM language has a fairly high level type system
• Strings, Arrays, Pointers…


When combined with SSA form, great for development
• 15 bug fixes required after backend finished to get test
  suite to pass
• 10 of those were motivated by type system errors
• Some could have been fairly difficult (e.g returning pointer
  instead of value)

More Related Content

What's hot

ONNC - 0.9.1 release
ONNC - 0.9.1 releaseONNC - 0.9.1 release
ONNC - 0.9.1 releaseLuba Tang
 
JVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir IvanovJVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir IvanovZeroTurnaround
 
한컴MDS_Virtual Target Debugging with TRACE32
한컴MDS_Virtual Target Debugging with TRACE32한컴MDS_Virtual Target Debugging with TRACE32
한컴MDS_Virtual Target Debugging with TRACE32HANCOM MDS
 
MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...
MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...
MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...Masashi Shibata
 
Programming Languages & Tools for Higher Performance & Productivity
Programming Languages & Tools for Higher Performance & ProductivityProgramming Languages & Tools for Higher Performance & Productivity
Programming Languages & Tools for Higher Performance & ProductivityLinaro
 
Porting and Optimization of Numerical Libraries for ARM SVE
Porting and Optimization of Numerical Libraries for ARM SVEPorting and Optimization of Numerical Libraries for ARM SVE
Porting and Optimization of Numerical Libraries for ARM SVELinaro
 
Evaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI SupercomputerEvaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI SupercomputerGeorge Markomanolis
 
LCU14 209- LLVM Linux
LCU14 209- LLVM LinuxLCU14 209- LLVM Linux
LCU14 209- LLVM LinuxLinaro
 
Jvm Performance Tunning
Jvm Performance TunningJvm Performance Tunning
Jvm Performance Tunningguest1f2740
 
Moving NEON to 64 bits
Moving NEON to 64 bitsMoving NEON to 64 bits
Moving NEON to 64 bitsChiou-Nan Chen
 
Qtpvbscripttrainings
QtpvbscripttrainingsQtpvbscripttrainings
QtpvbscripttrainingsRamu Palanki
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapGeorge Markomanolis
 
RISC-V Linker Relaxation and LLD
RISC-V Linker Relaxation and LLDRISC-V Linker Relaxation and LLD
RISC-V Linker Relaxation and LLDRay Song
 
DBMS benchmarking overview and trends for Moscow ACM SIGMOD Chapter
DBMS benchmarking overview and trends for Moscow ACM SIGMOD ChapterDBMS benchmarking overview and trends for Moscow ACM SIGMOD Chapter
DBMS benchmarking overview and trends for Moscow ACM SIGMOD ChapterAndrei Nikolaenko
 
Debugging With GNU Debugger GDB
Debugging With GNU Debugger GDBDebugging With GNU Debugger GDB
Debugging With GNU Debugger GDBkyaw thiha
 
New Process/Thread Runtime
New Process/Thread Runtime	New Process/Thread Runtime
New Process/Thread Runtime Linaro
 
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...Anne Nicolas
 

What's hot (20)

ONNC - 0.9.1 release
ONNC - 0.9.1 releaseONNC - 0.9.1 release
ONNC - 0.9.1 release
 
Lustre Best Practices
Lustre Best Practices Lustre Best Practices
Lustre Best Practices
 
JVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir IvanovJVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir Ivanov
 
Getting started with AMD GPUs
Getting started with AMD GPUsGetting started with AMD GPUs
Getting started with AMD GPUs
 
한컴MDS_Virtual Target Debugging with TRACE32
한컴MDS_Virtual Target Debugging with TRACE32한컴MDS_Virtual Target Debugging with TRACE32
한컴MDS_Virtual Target Debugging with TRACE32
 
MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...
MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...
MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...
 
Programming Languages & Tools for Higher Performance & Productivity
Programming Languages & Tools for Higher Performance & ProductivityProgramming Languages & Tools for Higher Performance & Productivity
Programming Languages & Tools for Higher Performance & Productivity
 
Porting and Optimization of Numerical Libraries for ARM SVE
Porting and Optimization of Numerical Libraries for ARM SVEPorting and Optimization of Numerical Libraries for ARM SVE
Porting and Optimization of Numerical Libraries for ARM SVE
 
Evaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI SupercomputerEvaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI Supercomputer
 
LCU14 209- LLVM Linux
LCU14 209- LLVM LinuxLCU14 209- LLVM Linux
LCU14 209- LLVM Linux
 
Jvm Performance Tunning
Jvm Performance TunningJvm Performance Tunning
Jvm Performance Tunning
 
Moving NEON to 64 bits
Moving NEON to 64 bitsMoving NEON to 64 bits
Moving NEON to 64 bits
 
eBPF/XDP
eBPF/XDP eBPF/XDP
eBPF/XDP
 
Qtpvbscripttrainings
QtpvbscripttrainingsQtpvbscripttrainings
Qtpvbscripttrainings
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmap
 
RISC-V Linker Relaxation and LLD
RISC-V Linker Relaxation and LLDRISC-V Linker Relaxation and LLD
RISC-V Linker Relaxation and LLD
 
DBMS benchmarking overview and trends for Moscow ACM SIGMOD Chapter
DBMS benchmarking overview and trends for Moscow ACM SIGMOD ChapterDBMS benchmarking overview and trends for Moscow ACM SIGMOD Chapter
DBMS benchmarking overview and trends for Moscow ACM SIGMOD Chapter
 
Debugging With GNU Debugger GDB
Debugging With GNU Debugger GDBDebugging With GNU Debugger GDB
Debugging With GNU Debugger GDB
 
New Process/Thread Runtime
New Process/Thread Runtime	New Process/Thread Runtime
New Process/Thread Runtime
 
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
 

Similar to Haskell Symposium 2010: An LLVM backend for GHC

Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationBigstep
 
Better Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, AzulBetter Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, AzulHostedbyConfluent
 
Better Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, AzulBetter Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, AzulHostedbyConfluent
 
AVX512 assembly language in FFmpeg
AVX512 assembly language in FFmpegAVX512 assembly language in FFmpeg
AVX512 assembly language in FFmpegKieran Kunhya
 
Clr jvm implementation differences
Clr jvm implementation differencesClr jvm implementation differences
Clr jvm implementation differencesJean-Philippe BEMPEL
 
TiDB vs Aurora.pdf
TiDB vs Aurora.pdfTiDB vs Aurora.pdf
TiDB vs Aurora.pdfssuser3fb50b
 
Get Lower Latency and Higher Throughput for Java Applications
Get Lower Latency and Higher Throughput for Java ApplicationsGet Lower Latency and Higher Throughput for Java Applications
Get Lower Latency and Higher Throughput for Java ApplicationsScyllaDB
 
Efficient execution of quantized deep learning models a compiler approach
Efficient execution of quantized deep learning models a compiler approachEfficient execution of quantized deep learning models a compiler approach
Efficient execution of quantized deep learning models a compiler approachjemin lee
 
Scale Out Your Graph Across Servers and Clouds with OrientDB
Scale Out Your Graph Across Servers and Clouds  with OrientDBScale Out Your Graph Across Servers and Clouds  with OrientDB
Scale Out Your Graph Across Servers and Clouds with OrientDBLuca Garulli
 
LCA14: LCA14-412: GPGPU on ARM SoC session
LCA14: LCA14-412: GPGPU on ARM SoC sessionLCA14: LCA14-412: GPGPU on ARM SoC session
LCA14: LCA14-412: GPGPU on ARM SoC sessionLinaro
 
Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustEvan Chan
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kevin Lynch
 
Five cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterFive cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterTim Ellison
 
Вячеслав Блинов «Java Garbage Collection: A Performance Impact»
Вячеслав Блинов «Java Garbage Collection: A Performance Impact»Вячеслав Блинов «Java Garbage Collection: A Performance Impact»
Вячеслав Блинов «Java Garbage Collection: A Performance Impact»Anna Shymchenko
 

Similar to Haskell Symposium 2010: An LLVM backend for GHC (20)

Os Lattner
Os LattnerOs Lattner
Os Lattner
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and Virtualization
 
Better Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, AzulBetter Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, Azul
 
Better Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, AzulBetter Kafka Performance Without Changing Any Code | Simon Ritter, Azul
Better Kafka Performance Without Changing Any Code | Simon Ritter, Azul
 
AVX512 assembly language in FFmpeg
AVX512 assembly language in FFmpegAVX512 assembly language in FFmpeg
AVX512 assembly language in FFmpeg
 
Onnc intro
Onnc introOnnc intro
Onnc intro
 
Clr jvm implementation differences
Clr jvm implementation differencesClr jvm implementation differences
Clr jvm implementation differences
 
TiDB vs Aurora.pdf
TiDB vs Aurora.pdfTiDB vs Aurora.pdf
TiDB vs Aurora.pdf
 
Get Lower Latency and Higher Throughput for Java Applications
Get Lower Latency and Higher Throughput for Java ApplicationsGet Lower Latency and Higher Throughput for Java Applications
Get Lower Latency and Higher Throughput for Java Applications
 
Efficient execution of quantized deep learning models a compiler approach
Efficient execution of quantized deep learning models a compiler approachEfficient execution of quantized deep learning models a compiler approach
Efficient execution of quantized deep learning models a compiler approach
 
Scale Out Your Graph Across Servers and Clouds with OrientDB
Scale Out Your Graph Across Servers and Clouds  with OrientDBScale Out Your Graph Across Servers and Clouds  with OrientDB
Scale Out Your Graph Across Servers and Clouds with OrientDB
 
Værktøjer udviklet på AAU til analyse af SCJ programmer
Værktøjer udviklet på AAU til analyse af SCJ programmerVærktøjer udviklet på AAU til analyse af SCJ programmer
Værktøjer udviklet på AAU til analyse af SCJ programmer
 
LCA14: LCA14-412: GPGPU on ARM SoC session
LCA14: LCA14-412: GPGPU on ARM SoC sessionLCA14: LCA14-412: GPGPU on ARM SoC session
LCA14: LCA14-412: GPGPU on ARM SoC session
 
Code GPU with CUDA - SIMT
Code GPU with CUDA - SIMTCode GPU with CUDA - SIMT
Code GPU with CUDA - SIMT
 
Java Memory Model
Java Memory ModelJava Memory Model
Java Memory Model
 
Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to Rust
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
 
Five cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterFive cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark faster
 
Lev
LevLev
Lev
 
Вячеслав Блинов «Java Garbage Collection: A Performance Impact»
Вячеслав Блинов «Java Garbage Collection: A Performance Impact»Вячеслав Блинов «Java Garbage Collection: A Performance Impact»
Вячеслав Блинов «Java Garbage Collection: A Performance Impact»
 

Recently uploaded

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 

Recently uploaded (20)

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 

Haskell Symposium 2010: An LLVM backend for GHC

  • 1. AN LLVM BACKEND FOR GHC David Terei & Manuel Chakravarty University of New South Wales Haskell Symposium 2010 Baltimore, USA, 30th Sep
  • 2. What is ‘An LLVM Backend’? • Low Level Virtual Machine • Backend Compiler framework • Designed to be used by compiler developers, not by software developers • Open Source • Heavily sponsored by Apple
  • 3. Motivation • Simplify • Reduce ongoing work • Outsource! • Performance • Improve run-time
  • 4. Example collat :: Int -> Word32 -> Int Run-time (ms) collat c 1 = c 2000 collat c n | even n = 1800 collat (c+1) $ n `div` 2 1600 | otherwise = 1400 collat (c+1) $ 3 * n + 1 1200 1000 pmax x n = x `max` (collat 1 n, n) 800 600 main = print $ foldl pmax (1,1) 400 [2..1000000] 200 0 LLVM C NCG Different, updated run-times compared to paper
  • 5. Competitors C Backend (C) Native Code Generator (NCG) • GNU C Dependency • Huge amount of work • Badly supported on • Very limited portability platforms such as Windows • Does very little • Use a Mangler on optimisation work assembly code • Slow compilation speed • Takes twice as long as the NCG
  • 6. GHC’s Compilation Pipeline C Core STG Cmm NCG LLVM Language, Abstract Machine, Machine Configuration Heap, Stack, Registers
  • 7. Compiling to LLVM Won’t be covering, see paper for full details: • Why from Cmm and not from STG/Core • LLVM, C-- & Cmm languages • Dealing with LLVM’s SSA form • LLVM type system Will be covering: • Handling the STG Registers • Handling GHC’s Table-Next-To-Code optimisation
  • 8. Handling the STG Registers STG Register X86 Register Implement either by: Base ebx • In memory Heap Pointer edi • Pin to hardware registers Stack Pointer ebp R1 esi NCG? • Register allocator permanently stores STG registers in hardware C Backend? • Uses GNU C extension (global register variables) to also permanently store STG registers in hardware
  • 9. Handling the STG Registers LLVM handles by implementing a new calling convention: STG Register X86 Register Base ebx Heap Pointer edi Stack Pointer ebp R1 esi define f ghc_cc (Base, Hp, Sp, R1) { … tail call g ghc_cc (Base, Hp’, Sp’, R1’); return void; }
  • 10. Handling the STG Registers • Issue: If implemented naively then all the STG registers have a live range of the entire function. • Some of the STG registers can never be scratched (e.g Sp, Hp…) but many can (e.g R2, R3…). • We need to somehow tell LLVM when we no longer care about an STG register, otherwise it will spill and reload the register across calls to C land for example.
  • 11. Handling the STG Registers • We handle this by storing undef into the STG register when it is no longer needed. We manually scratch them. define f ghc_cc (Base, Hp, Sp, R1, R2, R3, R4) { … store undef %R2 store undef %R3 store undef %R4 call c_cc sin(double %f22); … tail call ghc_cc g(Base,Hp’,Sp’,R1’,R2’,R3’,R4’); return void; }
  • 12. Handling Tables-Next-To-Code Un-optimised Layout Optimised Layout How to implement in LLVM?
  • 13. Handling Tables-Next-To-Code Use GNU Assembler sub-section feature. • Allows code/data to be put into numbered sub-section • Sub-sections are appended together in order • Table in <n>, entry code in <n+1> .text 12 sJ8_info: movl ... movl ... jmp ... [...] .text 11 sJ8_info_itable: .long ... .long 0 .long 327712
  • 14. Handling Tables-Next-To-Code LLVM Mangler C Mangler • 180 lines of Haskell (half • 2,000 lines of Perl is documentation) • Needed for every platform • Needed only for OS X
  • 15. Evaluation: Simplicity Code Size LLVM 25000 • Half of code is representation of LLVM language 20000 C 15000 • Compiler: 1,100 lines • C Headers: 2,000 lines 10000 • Perl Mangler: 2,000 lines 5000 NCG • Shared component: 8,000 lines 0 • Platform specific: 4,000 – 5,000 LLVM C NCG for X86, SPARC, PowerPC
  • 16. Evaluation: Performance Run-time against LLVM (%) Nofib: 3 • Egalitarian benchmark suite, everything is equal 2.5 • Memory bound, little 2 room for optimisation once at Cmm stage 1.5 1 0.5 0 NCG C -0.5
  • 17. Repa Performance runtimes (s) 16 14 12 10 8 LLVM NCG 6 4 2 0 Matrix Mult Laplace FFT
  • 18. Compile Times, Object Sizes Compile Times Vs Object Sizes Vs LLVM LLVM 0 60 NCG C -2 40 -4 20 -6 0 NCG C -8 -20 -10 -40 -12 -60 -80 -14
  • 19. Result • LLVM Backend is simpler. • LLVM Backend is as fast or faster. • LLVM developers now work for GHC!
  • 20. Get It • LLVM • Our calling convention has been accepted upstream! • Included in LLVM since version 2.7 http://llvm.org • GHC • In HEAD • Should be released in GHC 7.0 • Send me any programs that are slower!
  • 22. Why from Cmm? A lot less work then from STG/Core But… Couldn’t you do a better job from STG/Core? Doubtful… Easier to fix any deficiencies in Cmm representation and code generator
  • 23. Dealing with SSA LLVM language is SSA form: • Each variable can only be assigned to once • Immutable How do we handle converting mutable Cmm variables? • Allocate a stack slot for each Cmm variable • Use load and stores for reads and writes • Use ‘mem2reg’ llvm optimisation pass • This converts our stack allocation to LLVM variables instead that properly obeys the SSA requirement
  • 24. Type Systems? LLVM language has a fairly high level type system • Strings, Arrays, Pointers… When combined with SSA form, great for development • 15 bug fixes required after backend finished to get test suite to pass • 10 of those were motivated by type system errors • Some could have been fairly difficult (e.g returning pointer instead of value)