This document discusses GCC's use of instruction patterns to port code to new target architectures. It explains that GCC uses three main intermediate representation formats: GENERIC, GIMPLE, and RTL. Optimization passes are performed on GIMPLE to do target-independent optimizations and on RTL to incorporate target features. Instruction patterns define operands, constraints, and assembly output to validate RTL and emit target code. Patterns can be split during optimization passes to better optimize constant values. Targets can define their own predicates and constraints for operands.
Jserv gave a talk offering a conceptual introduction to LLVM. The session covered the evolution of compiler technologies, the paradigm shift they represent, LLVM as a promising open-source project, and how LLVM is changing the IT world.
CMake is an open-source cross-platform build system. It is increasingly becoming the build system of choice for open source projects. The Qt project recently announced that Qbs, the replacement build system for qmake, will no longer be supported and future efforts will focus on CMake. It may become the default build system for Qt version 6.
CMake has offered support for building Qt applications for some time, and is supported within the Qt Creator IDE. In this webinar we will:
-Introduce you to CMake
-Cover its basic features and how to use it
-Show some CMake configurations including Qt-based applications
-Prove how easy it is to use CMake with Qt so you'll be ready to use it for your C++ and Qt-based applications!
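As a taste of what such a configuration looks like, here is a minimal sketch of a CMakeLists.txt for a Qt Widgets application (the project name, source file, and Qt5 module choice are illustrative assumptions, not taken from the webinar):

```cmake
# Minimal CMake build for a hypothetical Qt Widgets application.
cmake_minimum_required(VERSION 3.16)
project(HelloQt LANGUAGES CXX)

set(CMAKE_AUTOMOC ON)                          # run moc on Qt classes automatically
find_package(Qt5 REQUIRED COMPONENTS Widgets)  # locate an installed Qt5

add_executable(hello main.cpp)
target_link_libraries(hello PRIVATE Qt5::Widgets)
```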
BUD17-302: LLVM Internals #2
Speaker: Renato Golin, Peter Smith, Diana Picus, Omair Javaid, Adhemerval Zanella
Track: Toolchain
★ Session Summary ★
Continuing from LAS16 and, if we have time, introducing global isel that we’re working on.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/bud17/bud17-302/
Presentation:
Video:
---------------------------------------------------
★ Event Details ★
Linaro Connect Budapest 2017 (BUD17)
6-10 March 2017
Corinthia Hotel, Budapest,
Erzsébet krt. 43-49,
1073 Hungary
---------------------------------------------------
http://www.linaro.org
http://connect.linaro.org
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://twitter.com/linaroorg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961
"
Debug Information And Where They Come From (Min-Yih Hsu)
(Presented at COSCUP 2022)
Debug information is a mapping between the original source code and low-level binary locations. It gives developers powerful insights for diagnosing problems in their code (via debuggers) and is one of the most important foundations of modern software development. Furthermore, in recent years we have seen increasing demand for high-quality debug information for highly optimized applications that are otherwise "un-debuggable". For instance, debugging unoptimized games is generally not feasible, since they are likely to miss every single frame. In this talk, we introduce how debug information works and how compilers generate proper debug info even with extensive levels of optimization enabled. By the end of this talk, you will have gained insights into the structure of debug information and learned key compiler engineering knowledge for generating high-quality debug info for critical, highly optimized software.
Introduce Brainf*ck, another Turing-complete programming language. Then, try to implement the following from scratch: an interpreter, a compiler [x86_64 and ARM], and a JIT compiler.
Presentation slides about the internals of the GCC C++ compiler. They cover the transformation from source code to output binary, compiler optimizations, register transfer language, and more.
Build a fully functional virtual machine for Brainfuck from scratch. Covers basic concepts of interpreters, optimization techniques, language specialization, and platform-specific tweaks.
While CMake has become the de facto standard build system for C++, its siblings CTest and CPack are less well known. This talk gives a lightning introduction to these three tools and then focuses on best practices for building, testing, and packaging.
Graph-Based Source Code Analysis of JavaScript Repositories (Dániel Stein)
A graph-based approach to analyzing JavaScript source code, using Neo4j as the graph database backend and ShapeSecurity Shift as the parser.
Hungarian version (presented at a Neo4j meetup): http://www.slideshare.net/steindani/forrskdtrak-grfalap-statikus-analzise
New abstractions for concurrency make writing programs easier by moving away from threads and locks, but debugging such programs becomes harder. The call stack, an essential tool for understanding why and how control flow reached a certain point in the program, loses meaning when inspected in traditional debuggers. Futures, actors, or iteratees make code easier to write and reason about, and in this talk I'll show a simple solution to make them easier to debug. The tool I present integrates well with the Eclipse plugin for Scala, and shows what a "reactive debugger" might look like.
Talk for SCaLE13x. Video: https://www.youtube.com/watch?v=_Ik8oiQvWgo . Profiling can show what your Linux kernel and applications are doing in detail, across all software stack layers. This talk shows how we are using Linux perf_events (aka "perf") and flame graphs at Netflix to understand CPU usage in detail, to optimize our cloud usage, solve performance issues, and identify regressions. This will be more than just an intro: profiling difficult targets, including Java and Node.js, will be covered, including ways to resolve JITed symbols and broken stacks. Included are the easy examples, the hard, and the cutting edge.
20145-5SumII_CSC407_assign1.html
CSC 407: Computer Systems II: 2015 Summer II, Assignment #1
Last Modified 2015 July 21
Purpose:
To go over issues related to how the compiler and the linker serve you, the programmer.
Computing
Please ssh into ctilinux1.cstcis.cti.depaul.edu, or use your own Linux machine.
Compiler optimization (45 Points)
Consider the following program.
/* q1.c
*/
#include <stdlib.h>
#include <stdio.h>
typedef unsigned int uint;
#define LENGTH ((uint) 512*64)
void initializeArray (uint len,
int* intArray
)
{
uint i;
for (i = 0; i < len; i++)
intArray[i] = (rand() % 64);
}
uint countAdjacent (int maxIndex,
int* intArray,
int direction
)
{
uint i;
uint sum = 0;
for (i = 0; i < maxIndex; i++)
if ( ( intArray[i] == (intArray[i+1] + direction) ) &&
( intArray[i] == (intArray[i+2] + 2*direction) )
)
sum++;
return(sum);
}
uint funkyFunction (uint len,
int* intArray
)
{
uint i;
uint sum = 0;
for (i = 0; i < len-1; i++)
if ( (i % 8) == 0x3 )
sum += 7*countAdjacent(len-2,intArray,+1);
else
sum += 17*countAdjacent(len-2,intArray,-1);
return(sum);
}
int main ()
{
int* intArray = (int*)calloc(LENGTH,sizeof(int));
initializeArray(LENGTH,intArray);
printf("funkyFunction() == %d\n",funkyFunction(LENGTH,intArray));
free(intArray);
return(EXIT_SUCCESS);
}
(8 Points) Compile it for profiling but with no extra optimization with:
$ gcc -o q1None -pg q1.c # Compiles q1.c to write q1None to make profile info
$ ./q1None # Runs q1None
$ gprof q1None # Gives profile info on q1None
Be sure to scroll all the way to the top of gprof output!
What are the number of self seconds taken by:
Function             Self seconds
initializeArray()    __________
countAdjacent()      __________
funkyFunction()      __________
(8 Points)
How did it do the operation (i % 8) == 0x3?
Was it done as a modulus (the same as an expensive division, but returns the remainder instead of the quotient) or something else?
Show the assembly language for this C code by using gdb to disassemble funkyFunction() of q1None.
Hint: do:
$ gdb q1None
. . .
(gdb) disass funkyFunction
Dump of assembler code for function funkyFunction:
. . .
and then look for the code that sets up the calls to countAdjacent().
The (i % 8) == 0x3 test is done before either countAdjacent() call.
(8 Points) Compile it for profiling but with optimization with:
$ gcc -o q1Compiler -O1 -pg q1.c # Compiles q1.c to write q1Compiler to make profile info
$ ./q1Compiler # Runs q1Compiler
$ gprof q1Compiler # Gives profile info on q1Compiler
What are the number of self seconds taken by:
Function             Self seconds
initializeArray()    __________
countAdjacent()      __________
funkyFunction()      __________
(8 Points) Use gdb to disassemble countAdjacent() of both q1None and q1Compiler.
The GlobalISel framework was introduced with the intention of replacing SelectionDAG, aiming to provide advantages in terms of performance, granularity, and modularity. This tutorial will provide everything you need to know about using this framework for a new target, focusing on RISC-V as an example and working through some specific examples of challenging cases.
(c) European LLVM Developers' Meeting 2023
Glasgow, United Kingdom
May 10 - 11, 2023
https://llvm.swoogo.com/2023eurollvm/
https://www.youtube.com/playlist?list=PL_R5A0lGi1AD-bqRaY61l5Q-EozbfyLZr
Clojure is a new dialect of LISP that runs on the Java Virtual Machine (JVM). As a functional language, it offers great benefits in terms of programmer productivity; as a language that runs on the JVM, it also offers the opportunity to reuse existing Java libraries. Simon’s interest is in using Clojure to build desktop applications with the Java Swing GUI library. In this presentation Simon discusses how the power of Clojure can be applied to Swing, and whether it hits the sweet spot.
2. Outline
Compiler structure
Intermediate languages in GCC
Optimization passes in GCC
Define instruction pattern
Operand constraints
Match instruction pattern
Strict RTL
Target-defined constraints
Emit assembly code
Target information usage
Reserved words to describe instruction patterns
Examples of instruction patterns
Split instruction pattern
Instruction attributes
Peephole patterns
Instruction scheduling
4. Three main intermediate language formats in GCC
GENERIC
Language-independent representation generated by each front end
A common representation for all the languages supported by GCC
GIMPLE
Used to perform language-independent and target-independent optimizations
RTL
Used to perform the optimizations that must be aware of target features, as described by the porting code
7. Why divide the optimization passes into GIMPLE passes and RTL passes?
GIMPLE passes retain higher-level semantics
Ex: switch, array, structure, variable
Some optimizations are easier to design while the high-level semantics still exist
However, GIMPLE passes lack target information
Ex: instruction length (size), supported ISA
Therefore, we also need RTL optimization passes
8. Define instruction pattern
All RTL patterns must match the target ISA
How do we tell GCC to generate RTL that matches the ISA?
Instruction patterns
Use define_expand and define_insn to describe the instruction patterns the target supports
(define_insn "addsi3"
  [
   (set (match_operand:SI 0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                 (match_operand:SI 2 "nonmemory_operand" "r,i")))
  ] ... )
9. Define instruction pattern
GCC already defines the names and semantics of several instruction patterns
addsi3
Add semantics with three SI-mode operands
GCC does not know the target's operand constraints
How do we tell GCC our target's operand constraints for each instruction?
Predicates
Constraints
10. Operand Constraints
Multiple alternative constraints
(define_insn "addsi3"
  [
   (set (match_operand:SI 0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                 (match_operand:SI 2 "nonmemory_operand" "r,i")))
  ] ... )
Predicates: register_operand, nonmemory_operand
Constraints: r, i
A predicate should cover every constraint of its operand
For operand 2 with SI mode:
r (reg) belongs to nonmemory_operand
i (immediate) belongs to nonmemory_operand
11. Operand Constraints
GCC already has predicates to restrict operands
Why do we need the constraint field?
It gives optimization passes the opportunity to change an operand
Ex:
movi $r0, 4;
add $r1, $r1, $r0 {addsi3}
Constant propagation
=> addi $r1, $r1, 4 {addsi3}
12. Operand Constraints
GCC uses a two-level operand constraint scheme
It groups instructions with the same semantics under a single instruction pattern (addsi3)
Many ISAs have several assembly instructions with the same semantics but different operand constraints
This reduces the number of instruction patterns needed when porting
13. Operand Constraints
Instruction patterns are used for ISA-support checking whenever GCC generates a new RTL pattern
Check whether the back end defines the pattern via define_insn
Check whether the operand types are supported via the predicates
Check which alternative each operand belongs to via the constraints
14. Operand Constraints
Multiple alternative constraints
(define_insn "addsi3"
  [
   (set (match_operand:SI 0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                 (match_operand:SI 2 "nonmemory_operand" "r,i")))
  ] ... )
The first alternative's constraints match "add"
The second alternative's constraints match "addi"
17. Strict RTL
Does a newly generated RTL pattern always satisfy the constraints?
GCC allows certain kinds of constraint mismatches, which reload can fix later
Predicates must always be satisfied
(Diagram) Two paths from RTL1:
without optimization1: RTL1 -> reload -> RTL3
with optimization1: RTL1 -> RTL2 (RTL2 does not satisfy the constraints) -> reload -> RTL4
1. Both RTL3 and RTL4 satisfy the constraints
2. RTL4 is better than RTL3
18. Strict RTL
Constraints may allow certain mismatches before reload, in the hope that reload will fix them
Ex: if the constraint is m (memory) but the current operand is a constant, GCC will allow it before reload
The reload phase runs after register allocation
In fact, during register allocation GCC calls reload repeatedly whenever an operand does not fit its constraints
After reload, every operand must satisfy one of its operand constraints (strict RTL)
19. Strict RTL
(define_insn "movsi"
  [
   (set (match_operand:SI 0 "register_operand" "=r,m")
        (match_operand:SI 1 "register_operand" "r,r"))
  ] ... )
Assume:
(set (reg/f:SI 47)
     (reg:SI 60))
After register allocation, pseudo register r60 is assigned to hardware register r3:
(set (reg/f:SI 47)
     (reg:SI 3))
If the hardware registers are exhausted, reload spills reg 47 to a stack slot:
(set (mem:SI (plus (sp) (const)))
     (reg:SI 3))
20. Target defined constraints
A target can define its own predicates and constraints
Target-defined predicate:
(define_predicate "index_operand"
  (ior (match_operand 0 "register_operand")
       (and (match_operand 0 "const_int_operand")
            (match_test "INTVAL (op) < 4096
                         && INTVAL (op) > -4096"))))
21. Target defined constraints
Target-defined constraints:
(define_register_constraint "l"
  "LO_REGS"
  "registers r0->r7.")
(define_memory_constraint "Uv"
  "@internal In ARM/Thumb-2 state a valid VFP load/store address."
  (and (match_code "mem")
       (match_test "TARGET_32BIT
                    && arm_coproc_mem_operand (op, FALSE)")))
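Besides register and memory constraints, a target can also define constraints for immediate ranges. A minimal sketch, assuming a hypothetical target whose immediate operands encode a signed 12-bit value (the constraint letter "Is" and the range are illustrative, not from a real port):

```lisp
;; Hypothetical constraint: a signed 12-bit immediate.
;; Inside define_constraint, `ival` is bound to the const_int's value.
(define_constraint "Is"
  "A signed 12-bit immediate operand."
  (and (match_code "const_int")
       (match_test "IN_RANGE (ival, -2048, 2047)")))
```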
23. Target information usage
When does GCC use the target information obtained from the instruction patterns?
RTL instruction pattern generation
insn-emit.c is generated, when building GCC, by parsing the instruction patterns
RTL instruction validation (checking target support)
insn-recog.c is generated, when building GCC, by parsing the instruction patterns
Emitting target assembly code
insn-output.c is generated, when building GCC, by parsing the instruction patterns
24. Reserved words to describe instruction patterns
define_insn "naming pattern": RTL generation, RTL validation, emit assembly
define_expand "naming pattern": RTL generation
define_insn "*..": RTL validation, emit assembly
GCC defines several "naming patterns" and their semantics; they are used to generate RTL patterns during the RTL expand phase
ex: addsi3, subsi3, movsi, movhi ...
Some target instructions have semantics not covered by any GCC naming pattern, but their RTL can be produced by some optimization
ex: add_slli can be generated by the combine phase
Defining an un-named pattern makes such an instruction validate
define_insn "*add_slli"
A define_insn name with a * prefix is identified as an un-named pattern
25. Example of instruction pattern
;; These control RTL generation for conditional jump insns
(define_expand "cbranchsi4"
  [(set (pc)
        (if_then_else (match_operator 0 "ordered_comparison_operator"
                        [(match_operand:SI 1 "nonmemory_nonsymbol_operand" "")
                         (match_operand:SI 2 "nonmemory_nonsymbol_operand" "")])
                      (label_ref (match_operand 3 "" ""))
                      (pc)))]
  ""
{
  sh_expand_cbranchsi4 (operands);
  DONE;
})
Semantics of "cbranchsi4":
compare operand 1 and operand 2 with operator 0
branch to label 3 if the comparison result is true
The predicate "ordered_comparison_operator" includes EQ, NE, LT, LTU, LE, LEU, GT, GTU, GE, GEU.
The porting function sh_expand_cbranchsi4 generates the RTL pattern
26. Example of instruction pattern
(define_insn "*bcondz"
  [(set (pc)
        (if_then_else (match_operator 0 "bcondz_operator"
                        [(match_operand:SI 1 "register_operand" "r")
                         (const_int 0)])
                      (label_ref (match_operand 2 "" ""))
                      (pc)))]
  ""
{
  switch (GET_CODE (operands[0]))
    {
    case EQ:
      return "beqz %1, %2";
    case NE:
      return "bnez %1, %2";
    case LT:
      return "bltz %1, %2";
    case LE:
      return "blez %1, %2";
    case GT:
      return "bgtz %1, %2";
    case GE:
      return "bgez %1, %2";
    default:
      gcc_unreachable ();
    }
})
The un-named pattern "*bcondz" is used to validate RTL and emit assembly code for branches that compare with zero
27. Example of instruction pattern
(define_insn "one_cmplsi2"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (not:SI (match_operand:SI 1 "register_operand" "r")))]
  ""
  "nor\t%0, %1, %1")
Semantics of "one_cmplsi2":
bitwise-not operand 1 and set the result into operand 0
The naming pattern "one_cmplsi2" is used to generate RTL, validate RTL, and output assembly code
It outputs the assembly "nor ra, rb, rb" to match the semantics
28. Split instruction pattern
When do we need to split an instruction pattern?
When a const_int value is too big for a single assembly instruction to encode
Split the const_int into a high part and a low part
The constant could be split during define_expand
But that is not good enough. Why?
Splitting the constant too early loses the opportunity to optimize the RTL pattern
29. Split instruction pattern
The optimization phase "move2add" can do the following (assembly code is used to present the RTL semantics for convenience):
move $r0, 123456
move $r1, 123457
move $r2, 123458
=>
move $r0, 123456
addi $r1, $r0, 1
addi $r2, $r0, 2
But if the const_int is split into high/low parts too early:
sethi $r0, hi20(123456)
ori $r0, lo12(123456)
sethi $r1, hi20(123457)
ori $r1, lo12(123457)
sethi $r2, hi20(123458)
ori $r2, lo12(123458)
move2add fails to transform the moves into adds
30. Split instruction pattern
How do we split an instruction pattern outside the RTL expand phase?
Use define_split or define_insn_and_split
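As a concrete sketch of this late-split approach: a hypothetical define_insn_and_split that keeps a large-constant move as a single insn through the RTL optimizers (so move2add still sees plain moves) and only splits it into a high/low pair after reload. The pattern name, the sethi/ori-style split, and the 12-bit low mask are illustrative assumptions, not taken from a real port:

```lisp
;; Hypothetical: keep "move reg, big-const" whole until after reload,
;; then split it into a high-part set followed by an OR of the low part.
(define_insn_and_split "*movsi_large_const"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (match_operand:SI 1 "const_int_operand" "i"))]
  ""
  "#"                      ; no single instruction encodes this constant
  "reload_completed"       ; split only after register allocation/reload
  [(set (match_dup 0) (match_dup 2))
   (set (match_dup 0) (ior:SI (match_dup 0) (match_dup 3)))]
{
  operands[2] = GEN_INT (INTVAL (operands[1]) & ~0xfff); /* high part */
  operands[3] = GEN_INT (INTVAL (operands[1]) & 0xfff);  /* low part */
})
```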
37. Instruction attribute
The "type" attribute divides instructions into several instruction groups
It helps when writing the instruction scheduling porting code
The "length" attribute gives the encoded size of each instruction so that GCC can calculate branch distances correctly
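A minimal sketch of how such attributes are typically declared and attached to a pattern (the attribute values and the addsi3 body are illustrative, not taken from a specific port):

```lisp
;; Hypothetical attribute declarations: "type" classifies insns for the
;; scheduler; "length" records the encoded size in bytes.
(define_attr "type" "alu,mult,load,store,branch" (const_string "alu"))
(define_attr "length" "" (const_int 4))

(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "r")
                 (match_operand:SI 2 "register_operand" "r")))]
  ""
  "add\t%0, %1, %2"
  [(set_attr "type" "alu")
   (set_attr "length" "4")])
```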
39. Instruction scheduling
Instruction scheduling is an optimization pass in GCC
It reorders instructions without changing the semantics of the code
It reduces pipeline stalls to improve performance
Instruction scheduling belongs to the RTL phase
41. Instruction scheduling
GCC has two scheduling passes
Sched1
Does interblock scheduling before register allocation
Tries to find the innermost loop as a region
Schedules the instructions in the region
Improves the performance of hot spots (innermost loops)
Extends the scope to a region to find more scheduling opportunities
Sched2
Does single-basic-block scheduling after register allocation
Register allocation may produce spill code (loads/stores)
So the code needs to be scheduled again
42. Instruction scheduling
Instruction scheduling resolves the following hazards to prevent pipeline stalls
Structural hazards
A structural hazard occurs when two or more instructions need the same function unit at the same time
Data hazards
RAW (read after write): a true dependency
WAR (write after read): an anti-dependency
WAW (write after write): an output dependency
43. Instruction scheduling
GCC provides several interfaces for describing the pipeline model
After parsing the pipeline description in the porting code, GCC generates an automaton that serves as a pipeline hazard recognizer
It determines whether the processor can issue a given instruction on a given simulated cycle
(define_automaton "name")
44. Instruction scheduling
(define_automaton "a1")
(define_cpu_unit "decode1,decode2" "a1")
(define_cpu_unit "div" "a1")
(define_insn_reservation "alu_class" 1
  (eq_attr "type" "alu")
  "decode + alu")
(define_insn_reservation "mult_class" 1
  (eq_attr "type" "mult")
  "decode + mult")
a1: the automaton name
decode1, decode2, div: the CPU units (function units) in the processor
define_insn_reservation: describes the pipeline rule for each instruction class
alu_class, mult_class: the insn names (insn classes)
(eq_attr "type" "alu"): matches when the "type" attribute of the instruction pattern is alu
"decode + alu": a regular expression describing the function unit usage
1 is the default latency in cycles when a data dependency occurs
46. Instruction scheduling
(define_automaton "a1")
(define_cpu_unit "decode1,decode2" "a1")
(define_cpu_unit "alu" "a1")
(define_cpu_unit "mult" "a1")
(define_insn_reservation "alu_class" 1
  (eq_attr "type" "alu")
  "decode + alu")
(define_insn_reservation "mult_class" 1
  (eq_attr "type" "mult")
  "decode + mult")
(Diagram) The automaton tracks the function unit usage of the current CPU cycle and of the next cycle. From the "nothing" state, issuing an alu_class insn moves to the "decode + alu" state and issuing a mult_class insn moves to the "decode + mult" state; advancing to the next cycle returns to "nothing".
A state transition:
1. occupies some function units
2. releases some function units
47. Instruction scheduling
(define_insn_reservation "alu_class" 1
  (eq_attr "type" "alu")
  "decode + alu")
(define_insn_reservation "mult_class" 1
  (eq_attr "type" "mult")
  "decode + mult")
(define_bypass 2 "alu_class" "alu_class")
(define_bypass 3 "mult_class" "mult_class")

producer     consumer      t: 1 2 3 4 5
alu_class    alu_class        1 0 0 0 0
             mult_class       0 0 0 0 0
mult_class   alu_class        0 0 0 0 0
             mult_class       1 1 0 0 0

A 1 means the consumer will stall at cycle t, where t counts cycles after the producer issues