These slides describe the basic concepts of industrial-strength compiler design, including static single-assignment (SSA) form and optimizations such as dead code elimination, global value numbering, and constant propagation. They are intended for a 150-minute undergraduate compiler class.
2. LOGAN CHIEN
Software engineer at MediaTek
Amateur LLVM/Clang developer
Integrated LLVM/Clang into Android NDK
3. WHY I LOVE COMPILERS?
I have been a faithful reader of Jserv's blog for ten years.
I was inspired by the compiler and the virtual machine
technologies mentioned in his blog.
Since then, I have made compiler technologies my
research topic.
5. MY PROFESSOR SAID ...
“We spent many lectures discussing the parser.
However, when people say they are doing compiler
research, they are most likely not referring to
parsing techniques.”
6. I AM HERE TO ...
Reintroduce compiler technologies,
Give a lightning talk on industrial-strength compiler
design,
Explain the connection between compiler technologies and
the industry.
9. WHAT IS A COMPILER?
Compilers are tools that translate programmers'
thoughts into programs that a computer can run.
ANALOGY: Translators who render one language into
another, such as those who translate Chinese into English.
10. WHAT HAVE WE LEARNED IN
UNDERGRADUATE COMPILER
COURSE?
11. LEXER
Reads the input source code (as a sequence of bytes) and
converts it into a stream of tokens.
unsigned background(unsigned foreground) {
  if ((foreground >> 16) > 0x80) {
    return 0;
  } else {
    return 0xffffff;
  }
}
unsigned background ( unsigned foreground ) { if ( ( foreground >>
16 ) > 128 ) { return 0 ; } else { return 16777215 ; } }
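As a rough illustration of the lexing step, here is a minimal sketch of a hand-written lexer that handles only what this example needs: identifiers, decimal and hexadecimal integers, the `>>` operator, and single-character punctuation. The names `Token`, `TokenKind`, `next_token`, and `count_tokens` are hypothetical and not taken from the slides. Keywords are treated as plain identifiers here, which a real lexer would not do.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token kinds, just enough for the `background` example. */
typedef enum { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_PUNCT } TokenKind;

typedef struct {
  TokenKind kind;
  char text[32];       /* spelling of the token */
  unsigned long value; /* numeric value, valid for TOK_NUMBER only */
} Token;

/* Reads one token starting at *cursor and advances *cursor past it. */
Token next_token(const char **cursor) {
  const char *p = *cursor;
  Token tok = {TOK_EOF, "", 0};
  size_t len = 0;

  while (isspace((unsigned char)*p)) ++p; /* skip whitespace */

  if (*p == '\0') {
    /* End of input: tok already has kind TOK_EOF. */
  } else if (isalpha((unsigned char)*p) || *p == '_') {
    tok.kind = TOK_IDENT; /* keywords are left as identifiers here */
    while (isalnum((unsigned char)p[len]) || p[len] == '_') ++len;
  } else if (isdigit((unsigned char)*p)) {
    char *end;
    tok.kind = TOK_NUMBER;
    tok.value = strtoul(p, &end, 0); /* base 0 accepts both 0x80 and 16 */
    len = (size_t)(end - p);
  } else {
    tok.kind = TOK_PUNCT;
    len = (p[0] == '>' && p[1] == '>') ? 2 : 1; /* ">>" is one token */
  }

  assert(len < sizeof tok.text);
  memcpy(tok.text, p, len);
  tok.text[len] = '\0';
  *cursor = p + len;
  return tok;
}

/* Counts the tokens in src; a stand-in for handing them to a parser. */
int count_tokens(const char *src) {
  int n = 0;
  while (next_token(&src).kind != TOK_EOF) ++n;
  return n;
}
```

Calling `next_token` repeatedly on the source text yields the same stream as shown above, with `0x80` already converted to the value 128.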
12. PARSER
Reads the tokens and builds an AST according to the syntax.
unsigned background ( unsigned foreground ) { if ( ( foreground >>
16 ) > 128 ) { return 0 ; } else { return 16777215 ; } }
(procedure background
  (args '(foreground))
  (compound-stmt
    (if-stmt
      (bin-expr GT (bin-expr RSHIFT foreground 16) 128)
      (return-stmt 0)
      (return-stmt 16777215))))
13. CODE GENERATOR
Generates the machine code (or assembly) from the
AST. In the undergraduate course, we usually just do
syntax-directed translation.
(procedure background
  (args '(foreground))
  (compound-stmt
    (if-stmt
      (bin-expr GT (bin-expr RSHIFT foreground 16) 128)
      (return-stmt 0)
      (return-stmt 16777215))))
  lsr w8, w0, #16
  cmp w8, #128
  b.ls .Lelse
  mov w0, wzr
  ret
.Lelse:
  orr w0, wzr, #0xffffff
  ret
14. WHAT'S MISSING?
Can a person who can only lex and parse sentences translate
articles well?
17. PROGRAMMING LANGUAGE THEORY
Essential components of a programming language: type
theory, variable scoping, language semantics, etc.
How do people reason about and compose a program?
Create an abstraction that is understandable to humans and
tractable for computers.
18. EXAMPLE: SUBTYPE AND MUTABLE
RECORDS
Why can't you perform the following conversion in C++?
void test(int *ptr) {
  int **p = &ptr;
  const int **a = p;  // Compiler rejects this conversion
  // ...
}
This is related to covariant and contravariant types. From PLT, we know that we can only choose
two of (a) covariant types, (b) mutable records, and (c) type consistency.
void test(int *ptr) {
  const int c = 0;
  int **p = &ptr;
  const int **a = p;  // If it were allowed, bad programs would pass.
  *a = &c;            // ptr now points to the const object c.
  **p = 5;            // No warning here, yet it writes to c.
}
19. EXAMPLE: DYNAMIC SCOPING IN
BASH
#!/bin/bash
v=1  # Initialize v with 1
foo() {
  echo "foo:v=${v}"  # Which v is referred to?
  v=2  # Which v is assigned?
}
bar() {
  local v=3
  foo
  echo "bar:v=${v}"  # What will be printed?
}
v=4  # Assign 4 to v
bar
echo "v=${v}"  # What will be printed?
Ans: foo:v=3, bar:v=2, v=4. Surprisingly, foo is
accessing the local v in bar instead of the global v.
20. EXAMPLE: NON-LEXICAL SCOPING IN
JAVASCRIPT
// JavaScript, the bad part
function bad(v) {
  var sum;
  with (v) {
    sum = a + b;
  }
  return sum;
}
console.log(bad({a: 5, b: 10}));
console.log(bad({a: 5, b: 10, sum: 100}));
Ans: The second console.log() prints undefined.
22. QUIZ: DO YOU REALLY KNOW C?
Is it guaranteed that v will always be loaded after pred?
int pred;
int v;
int get(int a, int b) {
  int res;
  if (pred > 0) {
    res = v * a - v / b;
  } else {
    res = v * a + v / b;
  }
  return res;
}
Ans: No. Independent reads/writes can be reordered. The standard only requires that the result
be the same as running from top to bottom (in a single thread).
23. COMPILER ANALYSIS
Data-flow analysis: Analyze value ranges, check
conditions or constraints, figure out modifications to
variables, etc.
Control-flow analysis: Analyze the structure of the
program, such as control dependencies and loop structure.
Memory dependency analysis: Analyze the memory
access patterns of array element accesses and pointer
dereferences, e.g. alias analysis.
24. ALIAS ANALYSIS
Determines whether two pointers can refer to the same
object or not.
void move(char *dst, const char *src, int n) {
  for (int i = 0; i < n; ++i) {
    dst[i] = src[i];
  }
}
int sum(int *ptr, const int *val, int n) {
  int res = 0;
  for (int i = 0; i < n; ++i) {
    res += *val;
    *ptr++ = 10;
  }
  return res;
}
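To see why the answer matters, consider `sum`: the load of `*val` can only be hoisted out of the loop if the compiler can show that `ptr` and `val` never alias. The sketch below repeats `sum` from the slide and adds a hypothetical variant, `sum_restrict`, which uses the C99 `restrict` qualifier to hand that promise to the compiler. The `assert` is only a partial debug check of the promise and is an addition of this sketch.

```c
#include <assert.h>

/* Same function as on the slide: ptr and val may alias. */
int sum(int *ptr, const int *val, int n) {
  int res = 0;
  for (int i = 0; i < n; ++i) {
    res += *val; /* must be reloaded: the store below may change *val */
    *ptr++ = 10;
  }
  return res;
}

/* Hypothetical variant: `restrict` promises that the two pointers never
 * refer to the same object, so the compiler may load *val only once. */
int sum_restrict(int *restrict ptr, const int *restrict val, int n) {
  assert(ptr != val); /* partial check: covers the first element only */
  int res = 0;
  for (int i = 0; i < n; ++i) {
    res += *val;
    *ptr++ = 10;
  }
  return res;
}
```

With `int buf[3] = {1, 1, 1}`, the call `sum(buf, buf, 3)` returns 21 rather than 3, because the first store overwrites the value that later iterations read. Passing aliasing pointers to `sum_restrict` would be undefined behavior.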
25. QUIZ: DO YOU KNOW C++?
class QMutexLocker {
public:
  union {
    QMutex *mtx_;
    uintptr_t val_;
  };
  void unlock() {
    if (val_) {
      if ((val_ & (uintptr_t)1) == (uintptr_t)1) {
        val_ &= ~(uintptr_t)1;
        mtx_->unlock();
      }
    }
  }
};
Pitfall: Reading from union fields that were not written
previously results in undefined behavior. Type-Based Alias
Analysis (TBAA) exploits this rule.
26. COMPILER OPTIMIZATION
Scalar optimization: Fold constants, remove
redundancies, rewrite expressions using identities, etc.
Vector optimization: Convert several scalar operations
into one vector operation, e.g. combining four add instructions
into one vector add.
Interprocedural optimization: Function inlining,
devirtualization, cross-function analysis, etc.
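As a small illustration of scalar optimization, here is a minimal sketch of constant folding plus two algebraic identities (`x + 0` and `x * 1`) on a toy expression tree. The `Expr` type and the `fold` function are hypothetical and much simpler than the IR a real compiler works on. Values are unsigned so that folding wraps around instead of overflowing.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EXPR_CONST, EXPR_VAR, EXPR_ADD, EXPR_MUL } ExprKind;

typedef struct Expr {
  ExprKind kind;
  unsigned value;         /* used by EXPR_CONST only */
  struct Expr *lhs, *rhs; /* used by EXPR_ADD and EXPR_MUL only */
} Expr;

/* Simplifies the tree rooted at e bottom-up and returns the new root. */
Expr *fold(Expr *e) {
  assert(e != NULL);
  if (e->kind != EXPR_ADD && e->kind != EXPR_MUL) return e; /* leaf */

  e->lhs = fold(e->lhs);
  e->rhs = fold(e->rhs);

  /* Constant folding: both operands are known at compile time. */
  if (e->lhs->kind == EXPR_CONST && e->rhs->kind == EXPR_CONST) {
    unsigned a = e->lhs->value, b = e->rhs->value;
    e->value = (e->kind == EXPR_ADD) ? a + b : a * b;
    e->kind = EXPR_CONST;
    e->lhs = e->rhs = NULL;
    return e;
  }

  /* Algebraic identities: x + 0 == x and x * 1 == x. */
  unsigned neutral = (e->kind == EXPR_ADD) ? 0u : 1u;
  if (e->rhs->kind == EXPR_CONST && e->rhs->value == neutral) return e->lhs;

  return e;
}
```

For the tree `x + (2 * 3)`, `fold` rewrites the right operand into the constant 6 and keeps the addition, since `x` is unknown.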
29. What's the difference between your
final project and an industrial-strength
compiler?
30. KEY DIFFERENCE
Analysis: Reasons about program structure and changes of
values.
Optimization: Applies several provably correct
transformations that should make the program run faster.
Intermediate Representation: The data structure on which
analyses and optimizations are based.
31. INTERMEDIATE REPRESENTATION 1/4
A data structure for program analyses and optimizations.
High-level enough to capture important properties and
encapsulate hardware limitations.
Low-level enough to be analyzed by analyses and
manipulated by transformations.
An abstraction layer for multiple front-ends and back-ends.
34. INTERMEDIATE REPRESENTATION 4/4
GCC — GENERIC, GIMPLE, Tree SSA, and RTL.
LLVM — LLVM IR, Selection DAG, and Machine Instructions.
Java HotSpot — HIR, LIR, and MIR.
35. CONTROL FLOW GRAPH 1/3
Basic block — A sequence of instructions that will only be
entered from the top and exited from the end.
Edge — If the basic block s may branch to t, then we add a
directed edge (s, t).
Predecessor/Successor — If there is an edge (s, t), then s is a
predecessor of t and t is a successor of s.
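The three definitions above map directly onto a small data structure. The following is a minimal sketch, assuming fixed-size arrays and blocks identified by integer index; `Cfg`, `BasicBlock`, `cfg_add_edge`, and `cfg_is_pred` are hypothetical names, and the instructions inside each block are left out.

```c
#include <assert.h>

#define MAX_BLOCKS 16
#define MAX_EDGES 4 /* per-block limit, an assumption of this sketch */

typedef struct {
  int succ[MAX_EDGES], nsucc; /* blocks this block may branch to */
  int pred[MAX_EDGES], npred; /* blocks that may branch to this block */
} BasicBlock;

typedef struct {
  BasicBlock block[MAX_BLOCKS];
  int nblocks;
} Cfg;

/* Adds the directed edge (s, t): t becomes a successor of s,
 * and s becomes a predecessor of t. */
void cfg_add_edge(Cfg *g, int s, int t) {
  assert(s >= 0 && s < g->nblocks && t >= 0 && t < g->nblocks);
  BasicBlock *from = &g->block[s], *to = &g->block[t];
  assert(from->nsucc < MAX_EDGES && to->npred < MAX_EDGES);
  from->succ[from->nsucc++] = t;
  to->pred[to->npred++] = s;
}

/* Returns 1 if s is a predecessor of t, 0 otherwise. */
int cfg_is_pred(const Cfg *g, int s, int t) {
  for (int i = 0; i < g->block[t].npred; ++i)
    if (g->block[t].pred[i] == s) return 1;
  return 0;
}
```

Keeping both successor and predecessor lists costs some memory, but analyses tend to need both directions: forward problems follow successors, while dominance and reaching definitions read predecessors.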
36. CONTROL FLOW GRAPH 2/3
// y = a * x + b;
void matmul(double *restrict y,
            unsigned long m, unsigned long n,
            const double *restrict a,
            const double *restrict x,
            const double *restrict b) {
  for (unsigned long r = 0; r < m; ++r) {
    double t = b[r];
    for (unsigned long i = 0; i < n; ++i) {
      t += a[r * n + i] * x[i];
    }
    y[r] = t;
  }
}
Input source program
37. CONTROL FLOW GRAPH 3/3
B0 (ENTRY):
  r = 0
  cmp = icmp lt r, m
  br cmp, B1, B4
B1:
  bp = getelementptr b, r
  t = load bp
  i = 0
  cmp = icmp lt i, n
  br cmp, B2, B3
B2:
  rn = mul r, n
  ai = add rn, i
  ap = getelementptr a, ai
  ad = load ap
  xp = getelementptr x, i
  xd = load xp
  ax = mul ad, xd
  t = add t, ax
  i = add i, 1
  cmp = icmp lt i, n
  br cmp, B2, B3
B3:
  yp = getelementptr y, r
  store t, yp
  r = add r, 1
  cmp = icmp lt r, m
  br cmp, B1, B4
B4:
  ret void
38. VARIABLE DEFINITION
B0 (ENTRY):
  r = 0                      ; definition of r
  cmp = icmp lt r, m
  br cmp, B1, B4
B1:
  bp = getelementptr b, r
  t = load bp
  i = 0
  cmp = icmp lt i, n
  br cmp, B2, B3
B2:
  rn = mul r, n
  ai = add rn, i
  ap = getelementptr a, ai
  ad = load ap
  xp = getelementptr x, i
  xd = load xp
  ax = mul ad, xd
  t = add t, ax
  i = add i, 1
  cmp = icmp lt i, n
  br cmp, B2, B3
B3:
  yp = getelementptr y, r
  store t, yp
  r = add r, 1               ; definition of r
  cmp = icmp lt r, m
  br cmp, B1, B4
B4:
  ret void
The place where a variable is assigned or defined.
39. VARIABLE USE
B0 (ENTRY):
  r = 0                      ; definition of r
  cmp = icmp lt r, m         ; use of r
  br cmp, B1, B4
B1:
  bp = getelementptr b, r    ; use of r
  t = load bp
  i = 0
  cmp = icmp lt i, n
  br cmp, B2, B3
B2:
  rn = mul r, n              ; use of r
  ai = add rn, i
  ap = getelementptr a, ai
  ad = load ap
  xp = getelementptr x, i
  xd = load xp
  ax = mul ad, xd
  t = add t, ax
  i = add i, 1
  cmp = icmp lt i, n
  br cmp, B2, B3
B3:
  yp = getelementptr y, r    ; use of r
  store t, yp
  r = add r, 1               ; definition and use of r
  cmp = icmp lt r, m         ; use of r
  br cmp, B1, B4
B4:
  ret void
The places where a variable is referred to or used.
40. REACHING DEFINITION 1/2
B0 (ENTRY):
  r = 0                      ; d0
  cmp = icmp lt r, m         ; reaching: {d0}
  br cmp, B1, B4
B1:
  bp = getelementptr b, r    ; reaching: {d0, d1}
  t = load bp
  i = 0
  cmp = icmp lt i, n
  br cmp, B2, B3
B2:
  rn = mul r, n              ; reaching: {d0, d1}
  ai = add rn, i
  ap = getelementptr a, ai
  ad = load ap
  xp = getelementptr x, i
  xd = load xp
  ax = mul ad, xd
  t = add t, ax
  i = add i, 1
  cmp = icmp lt i, n
  br cmp, B2, B3
B3:
  yp = getelementptr y, r    ; reaching: {d0, d1}
  store t, yp
  r = add r, 1               ; d1, reaching: {d0, d1}
  cmp = icmp lt r, m         ; reaching: {d1}
  br cmp, B1, B4
B4:
  ret void
Definitions that reach a use.
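The sets on this slide can be computed with a classic iterative data-flow analysis. The sketch below tracks only the variable r and hard-codes the edges of the example CFG (B0 to B1 and B4, B1 to B2 and B3, B2 to B2 and B3, B3 to B1 and B4). The bit-set encoding and the names `gen_set`, `kill_set`, and `reaching_definitions` are choices of this sketch, not something the slides prescribe.

```c
#include <assert.h>

enum { B0, B1, B2, B3, B4, NBLOCKS };
enum { D0 = 1u << 0,   /* d0: r = 0        (in B0) */
       D1 = 1u << 1 }; /* d1: r = add r, 1 (in B3) */

/* Predecessor lists of the example CFG; -1 terminates each list. */
static const int preds[NBLOCKS][3] = {
  [B0] = {-1},
  [B1] = {B0, B3, -1},
  [B2] = {B1, B2, -1},
  [B3] = {B1, B2, -1},
  [B4] = {B0, B3, -1},
};

/* gen_set: definitions of r created in the block.
 * kill_set: definitions of r overwritten by the block. */
static const unsigned gen_set[NBLOCKS]  = {[B0] = D0, [B3] = D1};
static const unsigned kill_set[NBLOCKS] = {[B0] = D1, [B3] = D0};

/* Iterates  IN[b]  = union of OUT[p] over all predecessors p
 *           OUT[b] = gen[b] | (IN[b] & ~kill[b])
 * until no set changes any more. */
void reaching_definitions(unsigned in[NBLOCKS], unsigned out[NBLOCKS]) {
  assert(in && out);
  for (int b = 0; b < NBLOCKS; ++b) in[b] = out[b] = 0;
  for (int changed = 1; changed;) {
    changed = 0;
    for (int b = 0; b < NBLOCKS; ++b) {
      unsigned new_in = 0;
      for (int k = 0; preds[b][k] >= 0; ++k) new_in |= out[preds[b][k]];
      unsigned new_out = gen_set[b] | (new_in & ~kill_set[b]);
      if (new_in != in[b] || new_out != out[b]) changed = 1;
      in[b] = new_in;
      out[b] = new_out;
    }
  }
}
```

On this CFG the loop settles after three passes: `{d0}` leaves B0, `{d1}` leaves B3, and `{d0, d1}` enters B1, B2, B3, and B4, which agrees with the annotations above.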
41. REACHING DEFINITION 2/2
Constant propagation is a good example of the
usefulness of reaching definitions.
void test(int cond) {
  int a = 1;  // d0
  int b = 2;  // d1
  int c;
  if (cond) {
    c = 3;  // d2
  } else {
    // ReachDef[a] = {d0}
    // ReachDef[b] = {d1}
    c = a + b;  // d3
  }
  // ReachDef[c] = {d2, d3}
  use(c);
}
42. DOMINANCE RELATION 1/2
A basic block s dominates t iff every path from the
entry to t passes through s.
Every basic block in a CFG except the entry has an immediate
dominator, and these relations form a dominator tree.
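The definition above leads to a simple fixed-point computation: the dominators of a block are the block itself plus whatever dominates all of its predecessors. The sketch below applies this to the five-block CFG of the matmul example, using one bit per block. The function names and the bit-set representation are assumptions of this sketch, and production compilers typically use faster algorithms.

```c
#include <assert.h>

enum { B0, B1, B2, B3, B4, NBLOCKS };

/* Predecessor lists of the matmul CFG; -1 terminates each list.
 * B0 is the entry block. */
static const int preds[NBLOCKS][3] = {
  [B0] = {-1},
  [B1] = {B0, B3, -1},
  [B2] = {B1, B2, -1},
  [B3] = {B1, B2, -1},
  [B4] = {B0, B3, -1},
};

/* dom[b] is a bit set: bit s is set iff block s dominates block b.
 * Iterates dom[b] = {b} | (intersection of dom[p] over predecessors p). */
void compute_dominators(unsigned dom[NBLOCKS]) {
  const unsigned all = (1u << NBLOCKS) - 1;
  dom[B0] = 1u << B0;                             /* entry: only itself */
  for (int b = 1; b < NBLOCKS; ++b) dom[b] = all; /* optimistic start */
  for (int changed = 1; changed;) {
    changed = 0;
    for (int b = 1; b < NBLOCKS; ++b) {
      unsigned meet = all;
      for (int k = 0; preds[b][k] >= 0; ++k) meet &= dom[preds[b][k]];
      unsigned next = meet | (1u << b);
      if (next != dom[b]) { dom[b] = next; changed = 1; }
    }
  }
}

/* Returns the immediate dominator of b, i.e. the strict dominator whose
 * own dominator set equals all strict dominators of b; -1 for the entry. */
int immediate_dominator(const unsigned dom[NBLOCKS], int b) {
  assert(b >= 0 && b < NBLOCKS);
  unsigned strict = dom[b] & ~(1u << b);
  for (int s = 0; s < NBLOCKS; ++s)
    if (((strict >> s) & 1u) && dom[s] == strict) return s;
  return -1;
}
```

For this CFG the immediate dominators come out as B0 for B1 and B4, and B1 for B2 and B3, so the dominator tree has B0 at the root with children B1 and B4.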
44. DOMINANCE FRONTIER
A basic block t is in the dominance frontier of a basic block s if
s dominates a predecessor of t but does not strictly
dominate t.
[Figure: example CFG with blocks B0 to B6; the edges are not recoverable from this transcript.]
DF[B2] = {B3, B5}
DF[B4] = {B4}
DF[B1] = {B3, B5}
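Since the edges of the figure above are lost, the following sketch computes dominance frontiers for the matmul CFG instead. It walks from each predecessor of a join block up the dominator tree, which is one common way to implement the definition. The `idom` table is written out by hand here on the assumption that dominators were computed beforehand, and all names are hypothetical.

```c
#include <assert.h>

enum { B0, B1, B2, B3, B4, NBLOCKS };

/* Predecessor lists of the matmul CFG; -1 terminates each list. */
static const int preds[NBLOCKS][3] = {
  [B0] = {-1},
  [B1] = {B0, B3, -1},
  [B2] = {B1, B2, -1},
  [B3] = {B1, B2, -1},
  [B4] = {B0, B3, -1},
};

/* Immediate dominators of the CFG above, assumed to be already known;
 * -1 marks the entry block. */
static const int idom[NBLOCKS] = {
  [B0] = -1, [B1] = B0, [B2] = B1, [B3] = B1, [B4] = B0,
};

/* df[s] is a bit set: bit t is set iff t is in the dominance
 * frontier of s. */
void compute_dominance_frontiers(unsigned df[NBLOCKS]) {
  for (int b = 0; b < NBLOCKS; ++b) df[b] = 0;
  for (int t = 0; t < NBLOCKS; ++t) {
    /* Only join points (two or more predecessors) appear in frontiers. */
    if (preds[t][0] < 0 || preds[t][1] < 0) continue;
    for (int k = 0; preds[t][k] >= 0; ++k) {
      /* Every block from the predecessor up to, but excluding, idom[t]
       * dominates a predecessor of t without strictly dominating t. */
      for (int runner = preds[t][k]; runner != idom[t];
           runner = idom[runner]) {
        assert(runner >= 0 && runner < NBLOCKS);
        df[runner] |= 1u << t;
      }
    }
  }
}
```

For this CFG the result is DF[B1] = {B1, B4}, DF[B2] = {B2, B3}, and DF[B3] = {B1, B4}, while B0 and B4 have empty frontiers. A loop header such as B1 sits in its own frontier because of the back edge from B3.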
45. STATIC SINGLE-ASSIGNMENT FORM
static: Refers to a static analysis of the program (not of its execution).
single-assignment: Every variable can be assigned only
once.
In recent years, SSA form has become the most popular
intermediate representation.
It is adopted by a wide range of compilers, such as GCC,
LLVM, Java HotSpot, Android ART, etc.
46. SSA PROPERTIES
Every variable can be defined only once.
Every use can refer to only one definition.
Use phi functions to handle merged control flow.
47. PHI FUNCTIONS
define void @foo(i1 cond,
                 i32 a, i32 b) {
ent:
  br cond, b1, b2
b1:
  t0 = mul a, 4
  br b3
b2:
  t1 = mul b, 5
  br b3
b3:
  t2 = phi (t0), (t1)
  use(t2)
  ret
}

define void @foo(i32 n) {
ent:
  br loop
loop:
  i0 = phi (0), (i1)
  cmp = icmp ge i0, n
  br cmp, end, cont
cont:
  use(i0)
  i1 = add i0, 1
  br loop
end:
  ret
}
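One way to read a phi function is "pick the operand that belongs to the edge we arrived on". The sketch below interprets the loop example block by block in C and keeps track of the predecessor block to make that choice explicit. The interpreter structure and the name `run_loop_example` are inventions of this sketch. Note that `i0` is assigned at a single place in the text even though that place executes many times, which is what "static" single assignment refers to.

```c
#include <assert.h>

/* Block labels of the loop example; names follow the slide. */
typedef enum { ENT, LOOP, CONT, END } Block;

/* Interprets the loop example block by block and returns how many
 * times use(i0) would be executed. */
int run_loop_example(int n) {
  Block pred = ENT, cur = ENT; /* pred: the block we arrived from */
  int i0 = 0, i1 = 0, uses = 0;
  for (;;) {
    switch (cur) {
    case ENT:
      pred = ENT; cur = LOOP;       /* br loop */
      break;
    case LOOP:
      i0 = (pred == ENT) ? 0 : i1;  /* i0 = phi (0), (i1) */
      pred = LOOP;
      cur = (i0 >= n) ? END : CONT; /* br cmp, end, cont */
      break;
    case CONT:
      ++uses;                       /* use(i0) */
      i1 = i0 + 1;                  /* i1 = add i0, 1 */
      pred = CONT; cur = LOOP;      /* br loop */
      break;
    case END:
      assert(i0 >= n);              /* the exit condition held */
      return uses;                  /* ret */
    }
  }
}
```

Real hardware has no phi instruction; a back end usually removes phis by inserting copies on the incoming edges.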
48. ADVANTAGE OF SSA FORM
Compact: Shortens the def-use chains.
Referential transparency: The properties associated with a
variable never change, i.e. they are context-free.
49. REDUCED DEF-USE CHAIN
void foo(int cond1, int cond2,
         int a, int b) {
  int t;
  if (cond1) {
    t = a * 4;  // d0
  } else {
    t = b * 5;  // d1
  }
  if (cond2) {
    // reach-def: {d0, d1}
    use(t);
  } else {
    // reach-def: {d0, d1}
    use(t);
  }
}

void foo(int cond1, int cond2,
         int a, int b) {
  if (cond1) {
    t.0 = a * 4;
  } else {
    t.1 = b * 5;
  }
  t.2 = phi(t.0, t.1);
  if (cond2) {
    use(t.2);
  } else {
    use(t.2);
  }
}
50. REFERENTIAL TRANSPARENCY 1/2
void foo() {
  int r = 5;  // d0
  // ... SOME CODE ...
  // We can only assume
  // "r == 5" if d0 is the
  // only reaching definition.
  use(r);
}
void foo() {
  r.0 = 5;
  // ... SOME CODE ...
  // No matter what code is
  // skipped above, it is safe
  // to replace the following r.0
  // with 5.
  use(r.0);
}
51. REFERENTIAL TRANSPARENCY 2/2
void foo(int a, int b) {
  int c = a + b;
  int r = a + b;
  // Can we simply replace
  // all occurrences of r with
  // c? (NO)
  // ... SOME CODE ...
  use(r);
}
void foo(int a, int b) {
  c.0 = a + b;
  r.0 = a + b;
  // ... SOME CODE ...
  // No matter what code is
  // skipped above, it is safe
  // to replace the following r.0
  // with c.0.
  use(r.0);
}
52. BUILDING SSA FORM
1. Compute the dominance relationships between basic blocks
and build the dominator tree.
2. Compute the dominance frontiers.
3. Insert phi functions at the dominance frontiers.
4. Traverse the dominator tree and rename the variables.
53. BEFORE SSA CONSTRUCTION
B0 (ENTRY):
  r = 0
  cmp = icmp lt r, m
  br cmp, B1, B4
B1:
  bp = getelementptr b, r
  t = load bp
  i = 0
  cmp = icmp lt i, n
  br cmp, B2, B3
B2:
  rn = mul r, n
  ai = add rn, i
  ap = getelementptr a, ai
  ad = load ap
  xp = getelementptr x, i
  xd = load xp
  ax = mul ad, xd
  t = add t, ax
  i = add i, 1
  cmp = icmp lt i, n
  br cmp, B2, B3
B3:
  yp = getelementptr y, r
  store t, yp
  r = add r, 1
  cmp = icmp lt r, m
  br cmp, B1, B4
B4:
  ret void
54. AFTER SSA CONSTRUCTION
B0 (ENTRY):
  cmp = icmp lt 0, m
  br cmp, B1, B4
B1:
  r.0 = phi (0, B0) (r.1, B3)
  bp = getelementptr b, r.0
  t.0 = load bp
  cmp = icmp lt 0, n
  br cmp, B2, B3
B2:
  t.1 = phi (t.0, B1) (t.2, B2)
  i.0 = phi (0, B1) (i.1, B2)
  rn = mul r.0, n
  ai = add rn, i.0
  ap = getelementptr a, ai
  ad = load ap
  xp = getelementptr x, i.0
  xd = load xp
  ax = mul ad, xd
  t.2 = add t.1, ax
  i.1 = add i.0, 1
  cmp = icmp lt i.1, n
  br cmp, B2, B3
B3:
  t.3 = phi (t.0, B1) (t.2, B2)
  yp = getelementptr y, r.0
  store t.3, yp
  r.1 = add r.0, 1
  cmp = icmp lt r.1, m
  br cmp, B1, B4
B4:
  ret void
56. CONSTANT PROPAGATION 1/3
Constant propagation, usually combined with constant
folding, evaluates instructions whose operands are all
constants and propagates the constant results to their uses.
a = add 2, 3
b = a
c = mul a, b
converts to
a = 5
b = 5
c = 25
57. CONSTANT PROPAGATION 2/3
Why do we need constant propagation?
struct queue *create_queue() {
  return (struct queue *)malloc(sizeof(struct queue *) * 16);
}
int process_data(int a, int b, int c) {
  int k[] = { 0x13, 0x17, 0x19 };
  if (DEBUG) {
    verify_data(k, data);
  }
  return (a * k[1] * k[2] + b * k[0] * k[2] + c * k[0] * k[1]);
}
58. CONSTANT PROPAGATION 3/3
For each basic block b of the CFG in reverse postorder:
  For each instruction i in basic block b from top to bottom:
    If all of its operands are constants, the operation has no
    side effects, and we know how to evaluate it at compile time,
    then evaluate the result, remove the instruction, and replace
    all of its uses with the result.
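The loop above can be sketched for a single straight-line block as follows. This is a minimal illustration under stated assumptions: the `Inst` layout and opcode names are hypothetical, arithmetic is unsigned so that overflow wraps, and instead of deleting a folded instruction the sketch rewrites it in place into a constant, which has the same effect on every use.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { OP_CONST, OP_PARAM, OP_ADD, OP_MUL } Opcode;

/* Hypothetical SSA instruction: operands are indices of earlier
 * instructions, so every operand is defined before it is used. */
typedef struct {
    Opcode op;
    int lhs, rhs;    /* operand indices, used by OP_ADD and OP_MUL only */
    uint32_t value;  /* meaningful only when op == OP_CONST */
} Inst;

/* Folds constant additions and multiplications from top to bottom.
 * Returns the number of instructions that were folded. */
int fold_constants(Inst *code, int n) {
    int folded = 0;
    for (int i = 0; i < n; ++i) {
        Inst *in = &code[i];
        if (in->op != OP_ADD && in->op != OP_MUL) {
            continue; /* nothing to evaluate */
        }
        assert(in->lhs >= 0 && in->lhs < i && in->rhs >= 0 && in->rhs < i);
        const Inst *a = &code[in->lhs];
        const Inst *b = &code[in->rhs];
        if (a->op != OP_CONST || b->op != OP_CONST) {
            continue; /* at least one operand is unknown at compile time */
        }
        in->value = (in->op == OP_ADD) ? a->value + b->value
                                       : a->value * b->value;
        in->op = OP_CONST; /* later uses now see a constant operand */
        ++folded;
    }
    return folded;
}
```

Because earlier instructions are folded first, a chain such as `add 2, 3` followed by a multiplication of that result with itself folds completely in one pass.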
59. GLOBAL VALUE NUMBERING 1/2
Global value numbering assigns a number to each computed
expression and maps a newly visited expression to an
equivalent one that was visited earlier.
a = c * d;  // [(*, c, d) -> a]
e = c;      // [(*, c, d) -> a]
f = e * d;  // query: is (*, c, d) available?
use(a, e, f);
converts to
a = c * d;
use(a, c, a);
60. GLOBAL VALUE NUMBERING 2/2
Traverse the basic blocks in depth-first order on the dominator
tree.
Maintain a stack of hash tables. When returning from
a child node on the dominator tree, pop the top of the
stack.
Visit the instructions of the form t = op a, b in the basic block
and compute the hash for (op, a, b).
If (op, a, b) is already in the hash table, replace
t = op a, b with t = hash_tab[(op, a, b)].
Otherwise, insert (op, a, b) -> t into the hash table.
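The table lookup described above can be sketched as follows. This is a simplified illustration, not a full pass: a linear array stands in for the hash table, and the stack of tables is modelled by remembering the table size when a dominator-tree child is entered and restoring it on return. All names (`ValueTable`, `number_value`, the integer value ids) are hypothetical.

```c
#include <assert.h>

#define MAX_EXPRS 64

/* One remembered expression: result = op a, b (a and b are value ids). */
typedef struct {
    char op;
    int a, b;
    int result;
} Expr;

typedef struct {
    Expr seen[MAX_EXPRS];
    int count;
} ValueTable;

/* Entering a child on the dominator tree: remember the current size. */
int enter_scope(const ValueTable *t) { return t->count; }

/* Returning from the child: forget everything it inserted. */
void leave_scope(ValueTable *t, int mark) { t->count = mark; }

/* Processes "result = op a, b". Returns the id of an earlier value that
 * computes the same expression, or result itself if the expression is new. */
int number_value(ValueTable *t, char op, int a, int b, int result) {
    for (int i = 0; i < t->count; ++i) {
        const Expr *e = &t->seen[i];
        if (e->op == op && e->a == a && e->b == b) {
            return e->result;
        }
    }
    assert(t->count < MAX_EXPRS);
    t->seen[t->count++] = (Expr){ op, a, b, result };
    return result;
}
```

With c = 0, d = 1, a = 2 and f = 3 as value ids, processing `a = c * d` and then `f = c * d` (after the copy `e = c` has been resolved to c) returns 2 for the second instruction, so f can be replaced by a.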
61. DEAD CODE ELIMINATION 1/6
Dead code elimination (DCE) removes unreachable
instructions and instructions whose results are ignored.
Constant propagation might reveal more dead code, since
branch conditions may become constant values.
Conversely, DCE can expose more constants for
constant propagation, because it removes definitions
from the program.
62. DEAD CODE ELIMINATION 2/6
Conditional statements with constant conditions
if (kIsDebugBuild) {  // Dead code
  check_invariant(a, b, c, d);  // Dead code
}
63. DEAD CODE ELIMINATION 3/6
Platform-specific constants
void hash_ent_set_val(struct hash_ent *h, int v) {
  if (sizeof(int) <= sizeof(void *)) {
    h->p = (void *)(uintptr_t)(v);
  } else {
    h->p = malloc(sizeof(int));  // Dead code
    *(int *)(h->p) = v;  // Dead code
  }
}
64. DEAD CODE ELIMINATION 4/6
Computed result ignored
int compute_sum(int a, int b) {
  int sum = (a + b) * (a - b + 1) / 2;  // Dead code: Not used
  return 0;
}
Dead code after dead store optimization
void test(int *p, bool cond, int a, int b, int c) {
  if (cond) {
    int t = a + b;  // Dead code
    *p = t;  // Will be removed by DSE
  }
  *p = c;
}
65. DEAD CODE ELIMINATION 5/6
Dead code after code specialization
void matmul(double *restrict y, unsigned long m, unsigned long n,
            const double *restrict a,
            const double *restrict x,
            const double *restrict b) {
  if (n == 0) {
    for (unsigned long r = 0; r < m; ++r) {
      double t = b[r];
      for (unsigned long i = 0; i < n; ++i) {  // Dead code
        t += a[r * n + i] * x[i];  // Dead code
      }  // Dead code
      y[r] = t;
    }
  } else {
    // ... skipped ...
  }
}
66. DEAD CODE ELIMINATION 6/6
Traverse the CFG from the entry block in reverse postorder.
Only traverse the successors that may be visited, i.e. if the
branch condition is a constant, ignore the untaken side.
While traversing a basic block, remove the dead instructions
within it.
After the traversal, remove the unvisited basic blocks and
the uses that refer to variables defined in those
blocks.
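The reachability part of this traversal can be sketched as follows. It is a simplification under stated assumptions: a worklist replaces the reverse postorder walk (which is enough to find the visited blocks), the removal of dead instructions inside a block is left out, and the `Block` layout and `BranchCond` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 16

typedef enum { BR_ALWAYS_TAKEN, BR_NEVER_TAKEN, BR_UNKNOWN } BranchCond;

/* Hypothetical block terminator: up to two successors, -1 if absent. */
typedef struct {
    int taken;      /* successor when the condition is true */
    int not_taken;  /* successor when the condition is false */
    BranchCond cond;
} Block;

/* Marks the blocks that may execute, starting from block 0 (the entry).
 * A successor behind a constant branch condition is never followed. */
void mark_reachable(const Block *blocks, int n, bool reachable[]) {
    int worklist[MAX_BLOCKS];
    int top = 0;
    assert(n <= MAX_BLOCKS);
    for (int i = 0; i < n; ++i) {
        reachable[i] = false;
    }
    if (n == 0) {
        return;
    }
    reachable[0] = true;
    worklist[top++] = 0;
    while (top > 0) {
        const Block *b = &blocks[worklist[--top]];
        int succ[2] = {
            b->cond == BR_NEVER_TAKEN ? -1 : b->taken,
            b->cond == BR_ALWAYS_TAKEN ? -1 : b->not_taken,
        };
        for (int k = 0; k < 2; ++k) {
            int s = succ[k];
            if (s >= 0 && !reachable[s]) {
                reachable[s] = true; /* each block is pushed at most once */
                worklist[top++] = s;
            }
        }
    }
}
```

Blocks left unmarked after the call are the unvisited blocks that the last step of the slide removes.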
67. LOOP OPTIMIZATIONS
It is reasonable to assume that a program spends most of its time
in loop bodies. Thus, loop optimization is an important part
of the compiler.
68. LOOP INVARIANT CODE MOTION 1/3
Loop invariant code motion (LICM) is an optimization which
moves loop invariants or loop constants out of the loop.
int test(int n, int a, int b, int c) {
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    sum += i * a * b * c;  // a * b * c is loop invariant
  }
  return sum;
}
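For comparison, the loop above and a hand-hoisted version can be written side by side. This sketch assumes unsigned arithmetic, which wraps on overflow, so that regrouping `i * a * b * c` as `i * (a * b * c)` is always valid; with signed `int`, a compiler has to reason about overflow before regrouping. The function names are hypothetical.

```c
#include <assert.h>

/* The loop from the slide, with unsigned operands. */
unsigned test_original(unsigned n, unsigned a, unsigned b, unsigned c) {
    unsigned sum = 0;
    for (unsigned i = 0; i < n; ++i) {
        sum += i * a * b * c;
    }
    return sum;
}

/* The same loop after moving the invariant product out of the loop. */
unsigned test_hoisted(unsigned n, unsigned a, unsigned b, unsigned c) {
    unsigned abc = a * b * c; /* computed once, before the loop */
    unsigned sum = 0;
    for (unsigned i = 0; i < n; ++i) {
        sum += i * abc;
    }
    return sum;
}

/* Checks that both versions agree for one set of inputs. */
void check_hoisting(unsigned n, unsigned a, unsigned b, unsigned c) {
    assert(test_original(n, a, b, c) == test_hoisted(n, a, b, c));
}
```

The hoisted version performs one multiplication per iteration instead of three.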
69. LOOP INVARIANT CODE MOTION 2/3
for (unsigned long r = 0; r < m; ++r) {
  double t = b[r];
  for (unsigned long i = 0; i < n; ++i) {
    t += a[(r * n) + i] * x[i];  // "r * n" is an inner loop invariant
  }
  y[r] = t;
}
converts to
for (unsigned long r = 0; r < m; ++r) {
  double t = b[r];
  unsigned long k = r * n;  // "r * n" moved out of the inner loop
  for (unsigned long i = 0; i < n; ++i) {
    t += a[k + i] * x[i];
  }
  y[r] = t;
}
70. LOOP INVARIANT CODE MOTION 3/3
How do we know whether a value is a loop invariant?
A value can be treated as a loop invariant if its computation
does not depend, directly or transitively, on either of the
following:
Phi instructions associated with the loop being
considered
Instructions with side effects or impure instructions,
e.g. function calls
71. INDUCTION VARIABLE
If x is the trip count (iteration count) of a loop, values of the
form a*x+b, with loop-invariant a and b, are called induction variables.
for (unsigned long r = 0; r < m; ++r) {
  double t = b[r];
  unsigned long k = r * n;  // outer loop IV
  for (unsigned long i = 0; i < n; ++i) {
    unsigned long m = k + i;  // inner loop IV
    t += a[m] * x[i];
  }
  y[r] = t;
}
k and r are induction variables of the outer loop.
i and m are induction variables of the inner loop.
72. LOOP STRENGTH REDUCTION 1/2
Strength reduction is an optimization that replaces the
multiplication in induction variables with an addition.
for (unsigned long r = 0; r < m; ++r) {
  double t = b[r];
  unsigned long k = r * n;  // outer loop IV
  for (unsigned long i = 0; i < n; ++i) {
    unsigned long m = k + i;  // inner loop IV
    t += a[m] * x[i];
  }
  y[r] = t;
}
converts to
for (unsigned long r = 0, k = 0; r < m; ++r, k += n) {  // k
  double t = b[r];
  for (unsigned long i = 0, m = k; i < n; ++i, ++m) {  // m
    t += a[m] * x[i];
  }
  y[r] = t;
}
73. LOOP STRENGTH REDUCTION 2/2
We can even rewrite the range to eliminate the index
computation.
int sum(const int *a, int n) {
  int res = 0;
  for (int i = 0; i < n; ++i) {
    res += a[i];  // implicit *(a + sizeof(int) * i)
  }
  return res;
}
converts to
int sum(const int *a, int n) {
  int res = 0;
  const int *end = a + n;  // implicit a + sizeof(int) * n
  for (const int *p = a; p != end; ++p) {  // range updated
    res += *p;
  }
  return res;
}
74. LOOP UNROLLING 1/2
Replicate the loop body multiple times per iteration.
Purpose: Reduce the amortized loop iteration overhead.
Purpose: Reduce load/store stalls and exploit instruction-level
parallelism, e.g. software pipelining.
Purpose: Prepare for vectorization, e.g. SIMD.
75. LOOP UNROLLING 2/2
for (unsigned long r = 0, k = 0; r < m; ++r, k += n) {
  double t = b[r];
  switch (n & 0x3) {  // Duff's device
  case 3: t += a[k + 2] * x[2];
  case 2: t += a[k + 1] * x[1];
  case 1: t += a[k] * x[0];
  }
  for (unsigned long i = n & 0x3; i < n; i += 4) {
    t += a[k + i] * x[i];
    t += a[k + i + 1] * x[i + 1];
    t += a[k + i + 2] * x[i + 2];
    t += a[k + i + 3] * x[i + 3];
  }
  y[r] = t;
}
76. INSTRUCTION SELECTION 1/5
Instruction selection is the process of mapping IR into machine
instructions.
Compiler back-ends use pattern matching to select the
best instructions according to a heuristic.
77. INSTRUCTION SELECTION 2/5
Complex operations, e.g. shift-and-add or multiply-accumulate.
Array load/store instructions, which can be translated to one
shift-add-and-load on some architectures.
IR instructions which are not natively supported by the target
machine.
78. INSTRUCTION SELECTION 3/5
define i64 @mac(i64 %a, i64 %b, i64 %c) {
ent:
  %0 = mul i64 %a, %b
  %1 = add i64 %0, %c
  ret i64 %1
}
mac:
  madd x0, x1, x0, x2
  ret
79. INSTRUCTION SELECTION 4/5
define i64 @load_shift(i64* %a, i64 %i) {
ent:
  %0 = getelementptr i64, i64* %a, i64 %i
  %1 = load i64, i64* %0
  ret i64 %1
}
load_shift:
  ldr x0, [x0, x1, lsl #3]
  ret
80. INSTRUCTION SELECTION 5/5
%struct.A = type { i64*, [16 x i64] }
define i64 @get_val(%struct.A* %p, i64 %i, i64 %j) {
ent:
  %0 = getelementptr %struct.A, %struct.A* %p, i64 %i, i32 1, i64 %j
  %1 = load i64, i64* %0
  ret i64 %1
}
get_val:
  movz w8, #0x88
  madd x8, x1, x8, x0
  add x8, x8, x2, lsl #3
  ldr x0, [x8, #8]
  ret
81. SSA DESTRUCTION 1/5
How do we deal with the phi functions?
Replace each phi function with copy assignments placed at the
end of the corresponding predecessor blocks.
(This must be done carefully.)
82. SSA DESTRUCTION 2/5
Example to show the assignment copying
B0:
  cmp = icmp eq cond, 0
  br cmp, B1, B2
B1:
  br B3
B2:
  br B3
B3:
  a = phi (3, B1) (7, B2)
  print(a)
converts to
B0:
  cmp = icmp eq cond, 0
  br cmp, B1, B2
B1:
  a = 3
  br B3
B2:
  a = 7
  br B3
B3:
  print(a)
83. SSA DESTRUCTION 3/5
Example to show the assignment copying with a loop
B0:
  br B1
B1:
  i.0 = phi (0, B0) (i.1, B2)
  cmp = icmp lt i.0, 100
  br cmp, B2, B3
B2:
  print(i.0)
  i.1 = add i.0, 1
  br B1
B3:
  ret
converts to
B0:
  i.0 = 0
  br B1
B1:
  cmp = icmp lt i.0, 100
  br cmp, B2, B3
B2:
  print(i.0)
  i.1 = add i.0, 1
  i.0 = i.1
  br B1
B3:
  ret
84. SSA DESTRUCTION 4/5
B0:
  br B1
B1:
  i.0 = phi (0, B0) (i.1, B2)
  cmp = icmp lt i.0, 100
  br cmp, B2, B3
B2:
  print(i.0)
  i.1 = add i.0, 1
  br B1
B3:
  print(i.0)
  ret
converts to (doesn't work!)
B0:
  i.0 = 0
  br B1
B1:
  cmp = icmp lt i.0, 100
  br cmp, B2, B3
B2:
  print(i.0)
  i.1 = add i.0, 1
  i.0 = i.1
  br B1
B3:
  print(i.0)
  ret
Lost copy problem: the naive de-SSA algorithm doesn't work
because of live range conflicts. Extra assignments and
renaming are needed to avoid the conflicts.
85. SSA DESTRUCTION 5/5
B0:
  x = 1
  y = 2
  br B1
B1:
  t = x
  x = y
  y = t
  br cmp, B2, B1
B2:
  print(x)
  print(y)
converts to
B0:
  x0 = 1
  y0 = 2
  br B1
B1:
  x1 = phi (x0, y1)
  y1 = phi (y0, x1)
  br cmp, B2, B1
B2:
  print(x1)
  print(y1)
naively converts to
B0:
  x0 = 1
  y0 = 2
  x1 = x0
  y1 = y0
  br B1
B1:
  x1 = y1
  y1 = x1
  br cmp, B2, B1
B2:
  print(x1)
  print(y1)
Swap problem: conflicts arise from the parallel assignment
semantics of phi instructions. A correct algorithm should
detect the cycle and implement the parallel assignment with
swap instructions.
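One way to honour the parallel assignment semantics is to order the copies so that no value is overwritten before it is read, and to break each cycle explicitly. The sketch below is an illustration under stated assumptions: it breaks cycles with a scratch register rather than the swap instructions mentioned above, registers are plain integers, all destinations are distinct, and the names `Move`, `sequentialize` and `apply_moves` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_COPIES 16

typedef struct { int dst, src; } Move;

/* Returns true if a pending copy other than `self` still reads register r. */
static bool is_pending_source(const Move *copies, const bool *pending,
                              int n, int self, int r) {
    for (int j = 0; j < n; ++j) {
        if (j != self && pending[j] && copies[j].src == r) {
            return true;
        }
    }
    return false;
}

/* Turns the parallel copy {dst[i] <- src[i]} into a sequence of moves.
 * tmp is a scratch register that is not a destination of any copy.
 * out needs room for 2 * n moves. copies is modified. Returns the count. */
int sequentialize(Move *copies, int n, int tmp, Move *out) {
    bool pending[MAX_COPIES];
    int remaining = 0;
    int emitted = 0;
    assert(n <= MAX_COPIES);
    for (int i = 0; i < n; ++i) {
        pending[i] = copies[i].dst != copies[i].src; /* drop self-copies */
        if (pending[i]) {
            ++remaining;
        }
    }
    while (remaining > 0) {
        bool progress = false;
        for (int i = 0; i < n; ++i) {
            /* A copy is safe once nobody else still reads its destination. */
            if (!pending[i] ||
                is_pending_source(copies, pending, n, i, copies[i].dst)) {
                continue;
            }
            out[emitted++] = copies[i];
            pending[i] = false;
            --remaining;
            progress = true;
        }
        if (progress) {
            continue;
        }
        /* Only cycles remain: save one destination in tmp and make its
         * readers read tmp instead, which unblocks that copy. */
        int victim = 0;
        while (!pending[victim]) {
            ++victim;
        }
        out[emitted++] = (Move){ tmp, copies[victim].dst };
        for (int j = 0; j < n; ++j) {
            if (pending[j] && copies[j].src == copies[victim].dst) {
                copies[j].src = tmp;
            }
        }
    }
    return emitted;
}

/* Executes a move sequence on a register file, for checking the result. */
void apply_moves(int *regs, const Move *moves, int n) {
    for (int k = 0; k < n; ++k) {
        regs[moves[k].dst] = regs[moves[k].src];
    }
}
```

For the swap in B1, the parallel copy {x1 <- y1, y1 <- x1} becomes three moves through the scratch register, the same shape as the `t = x; x = y; y = t` sequence in the original program.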
86. REGISTER ALLOCATION 1/2
We have to replace an unbounded number of virtual registers with
a finite set of machine registers.
Register allocation: make sure that at most k registers are
in use simultaneously.
Register assignment: assign a register to each variable, given
that enough registers are always available.
The classical solution is to compute the live range of each
variable, build the interference graph, and spill variables to
the stack if the interference graph is not k-colorable.
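The colouring step can be sketched with a simple greedy pass. This is a heuristic illustration only, not the classical algorithm itself: virtual registers are coloured in index order with the lowest free colour, and a register that finds no free colour among k is marked as spilled. The `InterferenceGraph` layout and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VREGS 16
#define SPILLED (-1)

/* edge[u][v] is true when virtual registers u and v are live at the
 * same time and therefore cannot share a machine register. */
typedef struct {
    int num_vregs;
    bool edge[MAX_VREGS][MAX_VREGS];
} InterferenceGraph;

void add_interference(InterferenceGraph *g, int u, int v) {
    g->edge[u][v] = true;
    g->edge[v][u] = true;
}

/* Assigns a colour in [0, k) to each virtual register, or SPILLED.
 * Returns the number of spilled registers. */
int color_registers(const InterferenceGraph *g, int k, int color[]) {
    int spilled = 0;
    assert(k <= MAX_VREGS);
    for (int v = 0; v < g->num_vregs; ++v) {
        bool used[MAX_VREGS] = { false };
        /* Collect the colours taken by already coloured neighbours. */
        for (int u = 0; u < v; ++u) {
            if (g->edge[v][u] && color[u] != SPILLED) {
                used[color[u]] = true;
            }
        }
        color[v] = SPILLED;
        for (int c = 0; c < k; ++c) {
            if (!used[c]) {
                color[v] = c;
                break;
            }
        }
        if (color[v] == SPILLED) {
            ++spilled;
        }
    }
    return spilled;
}
```

A triangle of mutually interfering registers needs three colours, so with k = 2 one of its members is spilled, while k = 3 colours it without spills.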
87. REGISTER ALLOCATION 2/2
mov a, #1
mov b, #1
mov c, #2
add d, a, b
add e, a, c
add f, d, e
add g, a, f
add y, c, g
ret y
[Figure: live ranges of a, b, c, d, e, f, g, y and the resulting interference graph]
88. INSTRUCTION SCHEDULING 1/2
Reorder the instructions according to their cycle counts in order
to reduce the execution time of the critical path.
Constraints: (a) Data dependence, (b) Functional units, (c)
Issue width, (d) Datapath forwarding, (e) Instruction cycle
time, etc.
89. INSTRUCTION SCHEDULING 2/2
0: add r0, r0, r1
1: add r1, r2, r2
2: ldr r4, [r3]
3: sub r4, r4, r0
4: str r4, [r1]
converts to
2: ldr r4, [r3]  # load instruction needs more cycles
0: add r0, r0, r1
1: add r1, r2, r2
3: sub r4, r4, r0
4: str r4, [r1]
93. BROWSERS
JavaScript-related
JavaScript engines, e.g. V8, IonMonkey, JavaScriptCore,
Chakra
Regular Expression Engine
WebAssembly
WebGL — OpenGL binding for the Web (Graphics)
WebCL — OpenCL binding for the Web (Computing)
95. VIRTUAL MACHINES AND EMULATORS
Simulate a different architecture (using dynamic
recompilation), e.g. QEMU
Simulate special instructions that are not supported by
the hypervisor.
96. DEVELOPMENT ENVIRONMENT
C/C++ compiler, e.g. MSVC, GCC, LLVM
Java compiler, e.g. OpenJDK
DSP compilers for digital signal processors.
VHDL compilers for IC design team.
Profiling and/or instrumentation tools, e.g. Valgrind, Dr.
Memory, Address Sanitizer, Memory Sanitizer
97. WHAT DOES A COMPILER TEAM DO?
Develop the compiler toolchain for in-house processors,
e.g. DSP, GPU, CPU, etc.
Tune the performance of the existing compiler or virtual
machines.
Develop a new programming language or model (e.g. CUDA
from NVIDIA).
98. SOME RESEARCH TOPICS ...
Memory model for concurrency: designing a good
memory model at the programming language level, one that is
intuitive to humans while leaving room for
compiler and architecture optimizations, is still a challenging
problem.
Both Java and C++ have made efforts to standardize one. However,
some unintuitive cases are still allowed and
some desirable cases are ruled out.