LLVM is currently finalizing the migration from typed pointers (i32*) to opaque pointers (ptr) -- the likely largest intermediate representation change in LLVM's history. In this talk, we'll discuss the motivations for the change, how it will affect developers working on/with LLVM in practice, and why this migration took such a long time. We'll also briefly cover possible future IR changes based on opaque pointers.
Handling inline assembly in Clang and LLVMMin-Yih Hsu
The document discusses how inline assembly is processed in Clang and LLVM. It outlines the target-specific logic involved at each stage, including operand constraint validation in Clang, lowering operands to LLVM IR, and printing different operand types in the assembler. Target-specific callbacks for inline assembly handling are scattered throughout Clang and LLVM. The goals are to learn the inline assembly workflow and provide documentation for backend developers to add support.
This document discusses debugging the ACPI subsystem in the Linux kernel. It provides an overview of the ACPI subsystem and components like ACPICA and the namespace. It describes how to enable ACPI debug logging via the acpi.debug_layer and acpi.debug_level kernel parameters. It also covers overriding ACPI definition blocks tables and tracing ACPI temperature as a debugging case study.
The document discusses various compiler data structure nodes used in the Java HotSpot VM compiler such as Node, RegionNode, PhiNode, and LoopNode. It describes how nodes are connected and how methods like Ideal, Value, and Identity are used to optimize nodes. The nodes represent an intermediate representation of the code during compilation and are manipulated throughout the various phases of compilation.
How to create SystemVerilog verification environment?Sameh El-Ashry
Basic knowledge for the verification engineer to learn the art of creating SystemVerilog verification environment.
Starting from the specifications extraction till coverage closure.
This document contains code and documentation related to RISC-V target machine configuration in LLVM. It discusses RISC-V architecture types (32-bit and 64-bit), feature strings, code generation options and relocation models for RISC-V. Code snippets show the RISC-VTargetMachine constructor and Triple class for specifying RISC-V configuration.
The document discusses Inter-Integrated Circuit (I2C) drivers in Linux. It covers I2C protocol basics like start/stop conditions and transactions. It then describes the I2C subsystem in Linux including I2C adapter, client and core drivers. It also discusses registering I2C devices without and with device tree, and provides examples of I2C client drivers.
Handling inline assembly in Clang and LLVMMin-Yih Hsu
The document discusses how inline assembly is processed in Clang and LLVM. It outlines the target-specific logic involved at each stage, including operand constraint validation in Clang, lowering operands to LLVM IR, and printing different operand types in the assembler. Target-specific callbacks for inline assembly handling are scattered throughout Clang and LLVM. The goals are to learn the inline assembly workflow and provide documentation for backend developers to add support.
This document discusses debugging the ACPI subsystem in the Linux kernel. It provides an overview of the ACPI subsystem and components like ACPICA and the namespace. It describes how to enable ACPI debug logging via the acpi.debug_layer and acpi.debug_level kernel parameters. It also covers overriding ACPI definition blocks tables and tracing ACPI temperature as a debugging case study.
The document discusses various compiler data structure nodes used in the Java HotSpot VM compiler such as Node, RegionNode, PhiNode, and LoopNode. It describes how nodes are connected and how methods like Ideal, Value, and Identity are used to optimize nodes. The nodes represent an intermediate representation of the code during compilation and are manipulated throughout the various phases of compilation.
How to create SystemVerilog verification environment?Sameh El-Ashry
Basic knowledge for the verification engineer to learn the art of creating SystemVerilog verification environment.
Starting from the specifications extraction till coverage closure.
This document contains code and documentation related to RISC-V target machine configuration in LLVM. It discusses RISC-V architecture types (32-bit and 64-bit), feature strings, code generation options and relocation models for RISC-V. Code snippets show the RISC-VTargetMachine constructor and Triple class for specifying RISC-V configuration.
The document discusses Inter-Integrated Circuit (I2C) drivers in Linux. It covers I2C protocol basics like start/stop conditions and transactions. It then describes the I2C subsystem in Linux including I2C adapter, client and core drivers. It also discusses registering I2C devices without and with device tree, and provides examples of I2C client drivers.
Debian Linux on Zynq (Xilinx ARM-SoC FPGA) Setup Flow (Vivado 2015.4)Shinya Takamaeda-Y
The document describes the process to set up Debian Linux on a Zynq FPGA board using a Zybo board as a reference platform. The key steps include:
1. Developing the hardware design in Vivado, including adding a CPU, GPIO for LEDs and switches, and generating a bitstream;
2. Compiling U-boot and the Linux kernel, as well as creating a device tree and root filesystem;
3. Setting up an SD card and booting the system from the SD card.
This document discusses adding support for PCI Express and new chipset emulation to Qemu. It introduces a new Q35 chipset emulator with support for 64-bit BAR, PCIe MMCONFIG, multiple PCI buses and slots. Future work includes improving PCIe hotplug, passthrough and power management as well as switching the BIOS to SeaBIOS and improving ACPI table support. The goal is to modernize Qemu's emulation of PCI features to match capabilities of newer hardware.
Profiling the ACPICA Namespace and Event HandingSUSE Labs Taipei
This document summarizes a presentation about profiling the ACPI namespace and event handling. It discusses the key components of ACPICA and how it interacts with the Linux kernel. It describes how definition blocks are parsed to build the ACPI namespace, and how fixed events and general purpose events (GPEs) are handled through event detection and dispatching to handlers in the kernel or by evaluating control methods.
Join this video course on udemy . Click here :
https://www.udemy.com/course/mastering-microcontroller-with-peripheral-driver-development/?couponCode=SLIDESHARE
Learn bare metal driver development systems using Embedded C: Writing drivers for STM32 GPIO,I2C,SPI,USART from scratch
Software/Hardware used:
In this course, the code is developed such a way that, It can be ported to any MCU you have at your hand.
If you need any help in porting these codes to different MCUs you can always reach out to me!
The course is strictly not bound to any 1 type of MCU. So, if you already have any Development board which runs with ARM-Cortex M3/M4 processor,
then I recommend you to continue using it.
But if you don’t have any Development board, then check out the below Development boards.
SFO15-202: Towards Multi-Threaded Tiny Code Generator (TCG) in QEMULinaro
This document discusses moving QEMU's Tiny Code Generator (TCG) to a multi-threaded model to take advantage of multi-core systems. It describes the current single-threaded TCG process model and global state. Approaches considered for multi-threading include using threads/locks, processes/IPC, or rewriting TCG from scratch. Key challenges addressed are protecting code generation globals and implementing atomic memory operations and memory barriers in a multi-threaded context. Patches have been contributed to address these issues and enable multi-threaded TCG. Further work remains to fully enable it across all QEMU backends and architectures.
Kernel Recipes 2015: Representing device-tree peripherals in ACPIAnne Nicolas
Platforms using ACPI firmware are becoming increasingly interesting to embedded developers. This presentation will demonstrate the new features in the ACPI 5.1 specification which make it possible for ACPI to transparently represent devices using existing device-tree bindings, and for Linux to use existing device drivers which should automatically work for both ACPI and device-tree.
David Woodhouse, Intel
This document discusses eBPF (extended Berkeley Packet Filter), which allows tracing from the Linux kernel to userspace using BPF programs. It provides an overview of eBPF including extended registers, verification, maps, and probes. Examples are given of using eBPF for tracing functions like kfree_skb() and the C library function malloc. The Berkeley Compiler Collection (BCC) makes it easy to write eBPF programs in C and Python.
The document provides an introduction to the C programming language and algorithms. It begins with an overview of C and its history. It then defines key concepts like keywords, data types, qualifiers, loops, storage classes, decision statements, and jumps. Examples of algorithms are provided for common problems like adding two numbers. Pattern printing algorithms are given as homework exercises. The document discusses where C is used and explains what a programming language and algorithms are. It emphasizes the importance of understanding requirements before implementation.
The document discusses virtual machines and JavaScript engines. It provides a brief history of virtual machines from the 1970s to today. It then explains how virtual machines work, including the key components of a parser, intermediate representation, interpreter, garbage collection, and optimization techniques. It discusses different approaches to interpretation like switch statements, direct threading, and inline threading. It also covers compiler optimizations and just-in-time compilation that further improve performance.
JVM Mechanics: Understanding the JIT's TricksDoug Hawkins
In this talk, we'll walkthrough how the JIT optimizes a piece Java code step-by-step. In doing so, you'll learn some of the amazing feats of optimization that JVMs can perform, but also some surprisingly simple things that prevent your code from running fast.
This document discusses Linux kernel debugging. It provides an overview of debugging techniques including collecting system information, handling failures, and using printk(), KGDB, and debuggers. Key points covered are the components of a debugger, how KGDB can be used with gdb to debug interactively, analyzing crash data, and some debugging tricks and print functions.
Actor model which is currently trending in software development is a popular design approach when it comes to complex applications. Many systems written in Erlang (Akka framework) are designed with actor model in their core. But Erlang and Akka are managed environments and safe programming languages. Is it worth using actor model in C++? If yes, where to look and what to use? The talk will cover all these topics.
Performance Wins with eBPF: Getting Started (2021)Brendan Gregg
This document provides an overview of using eBPF (extended Berkeley Packet Filter) to quickly get performance wins as a sysadmin. It recommends installing BCC and bpftrace tools to easily find issues like periodic processes, misconfigurations, unexpected TCP sessions, or slow file system I/O. A case study examines using biosnoop to identify which processes were causing disk latency issues. The document suggests thinking like a sysadmin first by running tools, then like a programmer if a problem requires new tools. It also outlines recommended frontends depending on use cases and provides references to learn more about BPF.
The embedded Linux boot process involves multiple stages beginning with ROM code that initializes hardware and loads the first stage bootloader, X-Loader. The X-Loader further initializes hardware and loads the second stage bootloader, U-Boot, which performs additional initialization and loads the Linux kernel. The kernel then initializes drivers and mounts the root filesystem to launch userspace processes. Booting can occur from flash memory, an eMMC/SD card, over a network using TFTP/NFS, or locally via UART/USB depending on the boot configuration and available devices.
This document discusses how Qemu works to translate guest binaries to run on the host machine. It first generates an intermediate representation called TCG-IR from the guest binary code. It then translates the TCG-IR into native host machine code. To achieve high performance, it chains translated blocks together by patching jump targets. Key techniques include just-in-time compilation, translation block finding, block chaining, and helper functions to emulate unsupported guest instructions.
Here is a bpftrace program to measure scheduler latency for ICMP echo requests:
#!/usr/local/bin/bpftrace
kprobe:icmp_send {
@start[tid] = nsecs;
}
kprobe:__netif_receive_skb_core {
@diff[tid] = hist(nsecs - @start[tid]);
delete(@start[tid]);
}
END {
print(@diff);
clear(@diff);
}
This traces the time between the icmp_send kernel function (when the packet is queued for transmit) and the __netif_receive_skb_core function (when the response packet is received). The
Exploiting the Linux Kernel via Intel's SYSRET Implementationnkslides
Intel handles SYSRET instructions weirdly and might throw around exceptions while still being in ring0. When the kernel is not being extra careful when returning to userland after being signaled with a syscall bad things can happen. Like root shells.
Debian Linux on Zynq (Xilinx ARM-SoC FPGA) Setup Flow (Vivado 2015.4)Shinya Takamaeda-Y
The document describes the process to set up Debian Linux on a Zynq FPGA board using a Zybo board as a reference platform. The key steps include:
1. Developing the hardware design in Vivado, including adding a CPU, GPIO for LEDs and switches, and generating a bitstream;
2. Compiling U-boot and the Linux kernel, as well as creating a device tree and root filesystem;
3. Setting up an SD card and booting the system from the SD card.
This document discusses adding support for PCI Express and new chipset emulation to Qemu. It introduces a new Q35 chipset emulator with support for 64-bit BAR, PCIe MMCONFIG, multiple PCI buses and slots. Future work includes improving PCIe hotplug, passthrough and power management as well as switching the BIOS to SeaBIOS and improving ACPI table support. The goal is to modernize Qemu's emulation of PCI features to match capabilities of newer hardware.
Profiling the ACPICA Namespace and Event HandingSUSE Labs Taipei
This document summarizes a presentation about profiling the ACPI namespace and event handling. It discusses the key components of ACPICA and how it interacts with the Linux kernel. It describes how definition blocks are parsed to build the ACPI namespace, and how fixed events and general purpose events (GPEs) are handled through event detection and dispatching to handlers in the kernel or by evaluating control methods.
Join this video course on udemy . Click here :
https://www.udemy.com/course/mastering-microcontroller-with-peripheral-driver-development/?couponCode=SLIDESHARE
Learn bare metal driver development systems using Embedded C: Writing drivers for STM32 GPIO,I2C,SPI,USART from scratch
Software/Hardware used:
In this course, the code is developed such a way that, It can be ported to any MCU you have at your hand.
If you need any help in porting these codes to different MCUs you can always reach out to me!
The course is strictly not bound to any 1 type of MCU. So, if you already have any Development board which runs with ARM-Cortex M3/M4 processor,
then I recommend you to continue using it.
But if you don’t have any Development board, then check out the below Development boards.
SFO15-202: Towards Multi-Threaded Tiny Code Generator (TCG) in QEMULinaro
This document discusses moving QEMU's Tiny Code Generator (TCG) to a multi-threaded model to take advantage of multi-core systems. It describes the current single-threaded TCG process model and global state. Approaches considered for multi-threading include using threads/locks, processes/IPC, or rewriting TCG from scratch. Key challenges addressed are protecting code generation globals and implementing atomic memory operations and memory barriers in a multi-threaded context. Patches have been contributed to address these issues and enable multi-threaded TCG. Further work remains to fully enable it across all QEMU backends and architectures.
Kernel Recipes 2015: Representing device-tree peripherals in ACPIAnne Nicolas
Platforms using ACPI firmware are becoming increasingly interesting to embedded developers. This presentation will demonstrate the new features in the ACPI 5.1 specification which make it possible for ACPI to transparently represent devices using existing device-tree bindings, and for Linux to use existing device drivers which should automatically work for both ACPI and device-tree.
David Woodhouse, Intel
This document discusses eBPF (extended Berkeley Packet Filter), which allows tracing from the Linux kernel to userspace using BPF programs. It provides an overview of eBPF including extended registers, verification, maps, and probes. Examples are given of using eBPF for tracing functions like kfree_skb() and the C library function malloc. The Berkeley Compiler Collection (BCC) makes it easy to write eBPF programs in C and Python.
The document provides an introduction to the C programming language and algorithms. It begins with an overview of C and its history. It then defines key concepts like keywords, data types, qualifiers, loops, storage classes, decision statements, and jumps. Examples of algorithms are provided for common problems like adding two numbers. Pattern printing algorithms are given as homework exercises. The document discusses where C is used and explains what a programming language and algorithms are. It emphasizes the importance of understanding requirements before implementation.
The document discusses virtual machines and JavaScript engines. It provides a brief history of virtual machines from the 1970s to today. It then explains how virtual machines work, including the key components of a parser, intermediate representation, interpreter, garbage collection, and optimization techniques. It discusses different approaches to interpretation like switch statements, direct threading, and inline threading. It also covers compiler optimizations and just-in-time compilation that further improve performance.
JVM Mechanics: Understanding the JIT's TricksDoug Hawkins
In this talk, we'll walkthrough how the JIT optimizes a piece Java code step-by-step. In doing so, you'll learn some of the amazing feats of optimization that JVMs can perform, but also some surprisingly simple things that prevent your code from running fast.
This document discusses Linux kernel debugging. It provides an overview of debugging techniques including collecting system information, handling failures, and using printk(), KGDB, and debuggers. Key points covered are the components of a debugger, how KGDB can be used with gdb to debug interactively, analyzing crash data, and some debugging tricks and print functions.
Actor model which is currently trending in software development is a popular design approach when it comes to complex applications. Many systems written in Erlang (Akka framework) are designed with actor model in their core. But Erlang and Akka are managed environments and safe programming languages. Is it worth using actor model in C++? If yes, where to look and what to use? The talk will cover all these topics.
Performance Wins with eBPF: Getting Started (2021)Brendan Gregg
This document provides an overview of using eBPF (extended Berkeley Packet Filter) to quickly get performance wins as a sysadmin. It recommends installing BCC and bpftrace tools to easily find issues like periodic processes, misconfigurations, unexpected TCP sessions, or slow file system I/O. A case study examines using biosnoop to identify which processes were causing disk latency issues. The document suggests thinking like a sysadmin first by running tools, then like a programmer if a problem requires new tools. It also outlines recommended frontends depending on use cases and provides references to learn more about BPF.
The embedded Linux boot process involves multiple stages beginning with ROM code that initializes hardware and loads the first stage bootloader, X-Loader. The X-Loader further initializes hardware and loads the second stage bootloader, U-Boot, which performs additional initialization and loads the Linux kernel. The kernel then initializes drivers and mounts the root filesystem to launch userspace processes. Booting can occur from flash memory, an eMMC/SD card, over a network using TFTP/NFS, or locally via UART/USB depending on the boot configuration and available devices.
This document discusses how Qemu works to translate guest binaries to run on the host machine. It first generates an intermediate representation called TCG-IR from the guest binary code. It then translates the TCG-IR into native host machine code. To achieve high performance, it chains translated blocks together by patching jump targets. Key techniques include just-in-time compilation, translation block finding, block chaining, and helper functions to emulate unsupported guest instructions.
Here is a bpftrace program to measure scheduler latency for ICMP echo requests:
#!/usr/local/bin/bpftrace
kprobe:icmp_send {
@start[tid] = nsecs;
}
kprobe:__netif_receive_skb_core {
@diff[tid] = hist(nsecs - @start[tid]);
delete(@start[tid]);
}
END {
print(@diff);
clear(@diff);
}
This traces the time between the icmp_send kernel function (when the packet is queued for transmit) and the __netif_receive_skb_core function (when the response packet is received). The
Exploiting the Linux Kernel via Intel's SYSRET Implementationnkslides
Intel handles SYSRET instructions weirdly and might throw around exceptions while still being in ring0. When the kernel is not being extra careful when returning to userland after being signaled with a syscall bad things can happen. Like root shells.
Optimizing Parallel Reduction in CUDA : NOTESSubhajit Sahu
Highlighted notes on Optimizing Parallel Reduction in CUDA
While doing research work under Prof. Dip Banerjee, Prof. Kishore Kothapalli.
Interesting optimizations, i should try these soon as PageRank is basically lots of sums.
The document discusses optimizing parallel reduction in CUDA. It presents 7 versions of a parallel reduction kernel with optimizations including interleaved vs sequential addressing to avoid bank conflicts, unrolling the last warp, adding the first reduction during global memory loads, and completely unrolling the reduction using C++ templates. Performance is improved from 2 GB/s to over 30 GB/s for a 4 million element reduction, reaching over 15x speedup by eliminating instruction overhead. Templates allow specializing the kernel for different block sizes known only at launch time.
This document discusses optimizing computer vision algorithms on mobile platforms. It recommends first optimizing the algorithm itself before pursuing technical optimizations. Using SIMD instructions can provide a performance boost of up to 4x by processing multiple data elements simultaneously. Libraries can help with vectorization but may not be fully optimized; intrinsics provide more control but require platform-specific code. Handcrafting SIMD assembly code can yield the best performance but is also the most difficult. GPUs via OpenGL ES can provide over an order of magnitude speedup for tasks like image processing but come with limitations on mobile.
The document discusses pointers in C programming. It defines pointers as variables that store memory addresses rather than values. Pointers have several useful applications like accessing variables outside functions, passing information between functions, and efficiently handling data tables. The document then explains basic pointer concepts like declaring pointer variables, assigning memory addresses to pointers using the '&' operator, dereferencing pointers using the '*' operator, and passing pointers to functions. It also discusses pointer arithmetic and relationships between arrays and pointers.
This document provides an overview of pointers in C programming. It begins by explaining where a program's data can be stored, such as the heap, data segment, code segment, and stack segment. It then defines pointers as variables that contain the memory address of another variable. The document outlines how to declare pointer variables, where pointers can be used, and pointer operators like & and *. It also covers pointer arithmetic, comparisons, passing pointers as function parameters, and dynamically allocating memory using pointers. The overall summary is an introduction to pointers, their uses, and basic operations in C.
- Python uses reference counting and a cyclic garbage collector to manage memory and free objects that are no longer referenced. It allocates memory for objects in blocks and assigns them to size classes based on their size.
- Objects in Python have a type and attributes. They are created via type constructors and can have specific attributes like __dict__, __slots__, and descriptors.
- At the Python virtual machine level, code is represented as code objects that contain details like the bytecode, constants, and variable names. Code objects are compiled from Python source files.
Contiguous buffer is what makes code run fast. With the regular and compact layout, data can go through the memory hierarchy in a quick way to the processor. Numpy makes use of it to achieve the runtime almost as fast as C.
However, not all algorithms can be easily maintained or designed by using array operations. A more general ways is to use the contiguous buffer as an interface. The dynamic typing system of Python makes it easy to use but hard to optimize. The typed, fixed-shape buffer allows efficient data sharing between the fast code written in C++ and the Python application. In the end, the system is as easy to use as but much faster than Python. The runtime is the same as C++.
In this talk, I will show how to design a simple array system in C++ and make it available in Python. It may be made as a general-purpose library, or embedded for ad hoc optimization.
LAS16-501: Introduction to LLVM - Projects, Components, Integration, InternalsLinaro
LAS16-501: Introduction to LLVM - Projects, Components, Integration, Internals
Speakers: Renato Golin
Date: September 30, 2016
★ Session Description ★
Deep dive into LLVM internals, middle/back-ends, libraries, sanitizers, linker, debugger and overall compilation process. The focus is to show how LLVM works under the hood, which is useful for GCC compiler engineers getting into LLVM development, as well as for other engineers to learn more about parts of the toolchain they’re not familiar with. This presentation also touches on frequent LLVM-specific errors, so GCC users may find useful, if they’re moving to LLVM.
★ Resources ★
Etherpad: pad.linaro.org/p/las16-501
Presentations & Videos: http://connect.linaro.org/resource/las16/las16-501/
★ Event Details ★
Linaro Connect Las Vegas 2016 – #LAS16
September 26-30, 2016
http://www.linaro.org
http://connect.linaro.org
This document summarizes a presentation on TensorFlow quantization. It discusses:
1. Quantizing models for mobile and IoT devices by converting floats to integers like int8 to reduce size and latency.
2. Post-training quantization which quantizes weights and biases of a trained model. This can reduce size by 70-90% with tools in TensorFlow Lite.
3. Quantization-aware training which trains models directly in a quantized format. Full integer quantization of weights and activations can further reduce size.
Pragmatic Optimization in Modern Programming - Demystifying the CompilerMarina Kolpakova
This document discusses compiler optimizations. It begins with an outline of topics including compilation trajectory, intermediate languages, optimization levels, and optimization techniques. It then provides more details on each phase of compilation, how compilers use intermediate representations to perform optimizations, and specific optimizations like common subexpression elimination, constant propagation, and instruction scheduling.
This document provides an overview of a 5-day training course on C basics for data structures. The course will cover topics such as arrays, pointers, and structures. Specific topics for arrays include how they are stored in memory, static versus dynamic allocation, and pointers to arrays. Pointers will cover what they are, pointer arithmetic, and pointers to pointers. The document includes example code for allocating memory dynamically using malloc and free.
MuVM: Higher Order Mutation Analysis Virtual Machine for CSusumu Tokumoto
Mutation analysis is a method for evaluating the effectiveness of a test suite by seeding faults artificially and measuring the fraction of seeded faults detected by the test suite. The major limitation of mutation analysis is its lengthy execution time because it involves generating, compiling and running large numbers of mutated programs, called mutants. Our tool MuVM achieves a significant runtime improvement by performing higher order mutation analysis using four techniques, metamutation, mutation on virtual machine, higher order split-stream execution, and online adaptation technique. In order to obtain the same behavior as mutating the source code directly, metamutation preserves the mutation location information which may potentially be lost during bitcode compilation and optimization. Mutation on a virtual machine reduces the compilation and testing cost by compiling a program once and invoking a process once. Higher order split-stream execution also reduces the testing cost by executing common parts of the mutants together and splitting the execution at a seeded fault. Online adaptation technique reduces the number of generated mutants by omitting infeasible mutants. Our comparative experiments indicate that our tool is significantly superior to an existing tool, an existing technique (mutation schema generation), and no-split-stream execution in higher order mutation.
Specializing the Data Path - Hooking into the Linux Network StackKernel TLV
Ever needed to add your custom logic into the network stack?
Ever hacked the network stack but wasn't certain you're doing it right?
Shmulik Ladkani talks about various mechanisms for customizing packet processing logic to the network stack's data path.
He covers covering topics such as packet sockets, netfilter hooks, traffic control actions and ebpf. We will discuss their applicable use-cases, advantages and disadvantages.
Shmulik Ladkani is a Tech Lead at Ravello Systems.
Shmulik started his career at Jungo (acquired by NDS/Cisco) implementing residential gateway software, focusing on embedded Linux, Linux kernel, networking and hardware/software integration.
51966 coffees and billions of forwarded packets later, with millions of homes running his software, Shmulik left his position as Jungo’s lead architect and joined Ravello Systems (acquired by Oracle) as tech lead, developing a virtual data center as a cloud service. He's now focused around virtualization systems, network virtualization and SDN.
This is a toy compiler for the course Compiler 2016 at ACM Class, SJTU. The source language is Mx*. The target is MIPS assembly (in SPIM format).
You can refer to my presentation slides to know something about this compiler and also what I've learnt during the course.
Github: https://github.com/abcdabcd987/compiler2016
When debugging this compiler, I wrote another project LLIRInterpreter(https://github.com/abcdabcd987/LLIRInterpreter) which reads text IR and does interpretation.
This document provides an overview and introduction to GlobalISel, including:
- What GlobalISel is and how it differs from SelectionDAG
- The key stages in the GlobalISel compilation flow
- Examples of basic arithmetic instruction selection and type legalization in GlobalISel
- Tips for building GlobalISel support in a backend, such as handling constants and register selection
1. The document summarizes key concepts about C/C++ including program structure, data types like arrays and structures, pointers, dynamic memory allocation, and linked lists.
2. Pointers allow accessing and modifying the value at a memory address, and are used with dynamic memory allocation functions like malloc() to allocate and free memory dynamically at runtime.
3. Linked lists provide dynamic memory structures by linking nodes together using pointers, with doubly-linked lists using two pointers in each node - one to the next node and one to the previous.
PHP 8.0 comes with many long-awaited features: A just-in-time compiler, attributes, union types, and named arguments are just a small part of the list. As a major version, it also includes some backward-incompatible changes, which are centered around stricter error handling and enhanced type safety. Let's have an overview of the important changes in PHP 8.0 and how they might affect you!
The document discusses the Just-In-Time (JIT) compiler that was introduced in PHP 8. It provides a brief history of JIT in PHP, explaining that early prototypes showed that the rest of PHP was too slow to benefit from JIT. It then discusses how optimizations from JIT were integrated into opcache without needing a full JIT. It provides information on configuring and using the JIT compiler, and shows performance improvements on benchmarks. It also provides an example of how a function is compiled to machine code by the JIT compiler.
PHP 8.0 is expected to be released by the end of the year, so it’s time to take a first look at the next major version of PHP. Attributes, union types, and a just-in-time compiler are likely the flagship features of this release, but there are many more improvements to be excited about. As PHP 8.0 is a major version, this release also includes backwards-incompatible changes, many of which are centered around stricter error handling and more type safety.
Presentation from phpfwdays 2020.
This talk discusses various issues of low-level PHP performance, such as: When is it more efficient to use arrays or objects? What causes catastrophic garbage collection? Does adding type annotations make PHP faster or slower?
I will answer these types of question with a (shallow) dive into PHP internals, touching on various topics like value representation, bytecode optimization and GC.
Typed Properties and more: What's coming in PHP 7.4?Nikita Popov
The document summarizes new features coming in PHP 7.4, including typed properties, arrow functions, the nullsafe operator, and array spread syntax. It also discusses future language features like property accessors and generics. Some deprecations are noted, such as changes to ternary operator and concatenation precedence to avoid ambiguity.
Static Optimization of PHP bytecode (PHPSC 2017)Nikita Popov
This document discusses static optimization of PHP bytecode. It describes optimizations like constant propagation, dead code elimination, inlining, and specialization that have been implemented in PHP. It also discusses challenges to optimization from features like references, eval(), and variable variables. Type inference using static single assignment form is explained. Metrics on performance improvements from optimizations in libraries and applications like WordPress are provided. Current and future work on additional optimizations in PHP is mentioned.
PHP 7 – What changed internally? (Forum PHP 2015)Nikita Popov
One of the main selling points of PHP 7 is greatly improved performance, with many real-world applications now running twice as fast… But where do these improvements come from?
At the core of PHP 7 lies an engine rewrite with focus on improving memory usage and performance. This talk provides an overview of the most significant changes, briefly covering everything from data structure changes, over enhancements in the executor, to the new compiler implementation.
PHP 7 – What changed internally? (PHP Barcelona 2015)Nikita Popov
The document discusses optimizations made in PHP 7 to improve performance. Some key points:
- PHP 7 optimized how values and objects are stored in memory by removing unnecessary overhead like reference counting. This reduces memory usage and indirection.
- Arrays were optimized to reduce the number of allocations, size, and lookup indirections compared to PHP 5.
- Objects saw reduced allocations, size and indirection for property values in PHP 7.
- Other optimizations included using 64-bit integers, improving immutable array performance, and enhancing the virtual machine to better manage function call stacks.
As a result of an engine rewrite with focus on more efficient data structures, PHP 7 offers much improved performance and memory usage. This session describes important aspects of the new implementation and how it compares to PHP 5. A particular focus will be on the representation of values, arrays and objects.
Mobile App Development Company In Noida | Drona InfotechDrona Infotech
Drona Infotech is a premier mobile app development company in Noida, providing cutting-edge solutions for businesses.
Visit Us For : https://www.dronainfotech.com/mobile-application-development/
Microservice Teams - How the cloud changes the way we workSven Peters
A lot of technical challenges and complexity come with building a cloud-native and distributed architecture. The way we develop backend software has fundamentally changed in the last ten years. Managing a microservices architecture demands a lot of us to ensure observability and operational resiliency. But did you also change the way you run your development teams?
Sven will talk about Atlassian’s journey from a monolith to a multi-tenanted architecture and how it affected the way the engineering teams work. You will learn how we shifted to service ownership, moved to more autonomous teams (and its challenges), and established platform and enablement teams.
Measures in SQL (SIGMOD 2024, Santiago, Chile)Julian Hyde
SQL has attained widespread adoption, but Business Intelligence tools still use their own higher level languages based upon a multidimensional paradigm. Composable calculations are what is missing from SQL, and we propose a new kind of column, called a measure, that attaches a calculation to a table. Like regular tables, tables with measures are composable and closed when used in queries.
SQL-with-measures has the power, conciseness and reusability of multidimensional languages but retains SQL semantics. Measure invocations can be expanded in place to simple, clear SQL.
To define the evaluation semantics for measures, we introduce context-sensitive expressions (a way to evaluate multidimensional expressions that is consistent with existing SQL semantics), a concept called evaluation context, and several operations for setting and modifying the evaluation context.
A talk at SIGMOD, June 9–15, 2024, Santiago, Chile
Authors: Julian Hyde (Google) and John Fremlin (Google)
https://doi.org/10.1145/3626246.3653374
Project Management: The Role of Project Dashboards.pdfKarya Keeper
Project management is a crucial aspect of any organization, ensuring that projects are completed efficiently and effectively. One of the key tools used in project management is the project dashboard, which provides a comprehensive view of project progress and performance. In this article, we will explore the role of project dashboards in project management, highlighting their key features and benefits.
WWDC 2024 Keynote Review: For CocoaCoders AustinPatrick Weigel
Overview of WWDC 2024 Keynote Address.
Covers: Apple Intelligence, iOS18, macOS Sequoia, iPadOS, watchOS, visionOS, and Apple TV+.
Understandable dialogue on Apple TV+
On-device app controlling AI.
Access to ChatGPT with a guest appearance by Chief Data Thief Sam Altman!
App Locking! iPhone Mirroring! And a Calculator!!
What to do when you have a perfect model for your software but you are constrained by an imperfect business model?
This talk explores the challenges of bringing modelling rigour to the business and strategy levels, and talking to your non-technical counterparts in the process.
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesQuickdice ERP
Explore the seamless transition to e-invoicing with this comprehensive guide tailored for Saudi Arabian businesses. Navigate the process effortlessly with step-by-step instructions designed to streamline implementation and enhance efficiency.
A neural network is a machine learning program, or model, that makes decisions in a manner similar to the human brain, by using processes that mimic the way biological neurons work together to identify phenomena, weigh options and arrive at conclusions.
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemPeter Muessig
Learn about the latest innovations in and around OpenUI5/SAPUI5: UI5 Tooling, UI5 linter, UI5 Web Components, Web Components Integration, UI5 2.x, UI5 GenAI.
Recording:
https://www.youtube.com/live/MSdGLG2zLy8?si=INxBHTqkwHhxV5Ta&t=0
Odoo releases a new update every year. The latest version, Odoo 17, came out in October 2023. It brought many improvements to the user interface and user experience, along with new features in modules like accounting, marketing, manufacturing, websites, and more.
The Odoo 17 update has been a hot topic among startups, mid-sized businesses, large enterprises, and Odoo developers aiming to grow their businesses. Since it is now already the first quarter of 2024, you must have a clear idea of what Odoo 17 entails and what it can offer your business if you are still not aware of it.
This blog covers the features and functionalities. Explore the entire blog and get in touch with expert Odoo ERP consultants to leverage Odoo 17 and its features for your business too.
An Overview of Odoo ERP
Odoo ERP was first released as OpenERP software in February 2005. It is a suite of business applications used for ERP, CRM, eCommerce, websites, and project management. Ten years ago, the Odoo Enterprise edition was launched to help fund the Odoo Community version.
When you compare Odoo Community and Enterprise, the Enterprise edition offers exclusive features like mobile app access, Odoo Studio customisation, Odoo hosting, and unlimited functional support.
Today, Odoo is a well-known name used by companies of all sizes across various industries, including manufacturing, retail, accounting, marketing, healthcare, IT consulting, and R&D.
The latest version, Odoo 17, has been available since October 2023. Key highlights of this update include:
Enhanced user experience with improvements to the command bar, faster backend page loading, and multiple dashboard views.
Instant report generation, credit limit alerts for sales and invoices, separate OCR settings for invoice creation, and an auto-complete feature for forms in the accounting module.
Improved image handling and global attribute changes for mailing lists in email marketing.
A default auto-signature option and a refuse-to-sign option in HR modules.
Options to divide and merge manufacturing orders, track the status of manufacturing orders, and more in the MRP module.
Dark mode in Odoo 17.
Now that the Odoo 17 announcement is official, let’s look at what’s new in Odoo 17!
What is Odoo ERP 17?
Odoo 17 is the latest version of one of the world’s leading open-source enterprise ERPs. This version has come up with significant improvements explained here in this blog. Also, this new version aims to introduce features that enhance time-saving, efficiency, and productivity for users across various organisations.
Odoo 17, released at the Odoo Experience 2023, brought notable improvements to the user interface and added new functionalities with enhancements in performance, accessibility, data analysis, and management, further expanding its reach in the market.
INTRODUCTION TO AI CLASSICAL THEORY TARGETED EXAMPLESanfaltahir1010
Image: Include an image that represents the concept of precision, such as a AI helix or a futuristic healthcare
setting.
Objective: Provide a foundational understanding of precision medicine and its departure from traditional
approaches
Role of theory: Discuss how genomics, the study of an organism's complete set of AI ,
plays a crucial role in precision medicine.
Customizing treatment plans: Highlight how genetic information is used to customize
treatment plans based on an individual's genetic makeup.
Examples: Provide real-world examples of successful application of AI such as genetic
therapies or targeted treatments.
Importance of molecular diagnostics: Explain the role of molecular diagnostics in identifying
molecular and genetic markers associated with diseases.
Biomarker testing: Showcase how biomarker testing aids in creating personalized treatment plans.
Content:
• Ethical issues: Examine ethical concerns related to precision medicine, such as privacy, consent, and
potential misuse of genetic information.
• Regulations and guidelines: Present examples of ethical guidelines and regulations in place to safeguard
patient rights.
• Visuals: Include images or icons representing ethical considerations.
Content:
• Ethical issues: Examine ethical concerns related to precision medicine, such as privacy, consent, and
potential misuse of genetic information.
• Regulations and guidelines: Present examples of ethical guidelines and regulations in place to safeguard
patient rights.
• Visuals: Include images or icons representing ethical considerations.
Content:
• Ethical issues: Examine ethical concerns related to precision medicine, such as privacy, consent, and
potential misuse of genetic information.
• Regulations and guidelines: Present examples of ethical guidelines and regulations in place to safeguard
patient rights.
• Visuals: Include images or icons representing ethical considerations.
Real-world case study: Present a detailed case study showcasing the success of precision
medicine in a specific medical scenario.
Patient's journey: Discuss the patient's journey, treatment plan, and outcomes.
Impact: Emphasize the transformative effect of precision medicine on the individual's
health.
Objective: Ground the presentation in a real-world example, highlighting the practical
application and success of precision medicine.
Data challenges: Address the challenges associated with managing large sets of patient data in precision
medicine.
Technological solutions: Discuss technological innovations and solutions for handling and analyzing vast
datasets.
Visuals: Include graphics representing data management challenges and technological solutions.
Objective: Acknowledge the data-related challenges in precision medicine and highlight innovative solutions.
Data challenges: Address the challenges associated with managing large sets of patient data in precision
medicine.
Technological solutions: Discuss technological innovations and solutions
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...kalichargn70th171
In today's fiercely competitive mobile app market, the role of the QA team is pivotal for continuous improvement and sustained success. Effective testing strategies are essential to navigate the challenges confidently and precisely. Ensuring the perfection of mobile apps before they reach end-users requires thoughtful decisions in the testing plan.
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...The Third Creative Media
"Navigating Invideo: A Comprehensive Guide" is an essential resource for anyone looking to master Invideo, an AI-powered video creation tool. This guide provides step-by-step instructions, helpful tips, and comparisons with other AI video creators. Whether you're a beginner or an experienced video editor, you'll find valuable insights to enhance your video projects and bring your creative ideas to life.
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdfBaha Majid
IBM watsonx Code Assistant for Z, our latest Generative AI-assisted mainframe application modernization solution. Mainframe (IBM Z) application modernization is a topic that every mainframe client is addressing to various degrees today, driven largely from digital transformation. With generative AI comes the opportunity to reimagine the mainframe application modernization experience. Infusing generative AI will enable speed and trust, help de-risk, and lower total costs associated with heavy-lifting application modernization initiatives. This document provides an overview of the IBM watsonx Code Assistant for Z which uses the power of generative AI to make it easier for developers to selectively modernize COBOL business services while maintaining mainframe qualities of service.
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...kalichargn70th171
In today's business landscape, digital integration is ubiquitous, demanding swift innovation as a necessity rather than a luxury. In a fiercely competitive market with heightened customer expectations, the timely launch of flawless digital products is crucial for both acquisition and retention—any delay risks ceding market share to competitors.
7. Pointer element types carry no semantics
define void @test(i32* %p)
⇒ Does not imply that %p is 4-byte aligned
define void @test(i32* aligned 4 %p)
7
8. Pointer element types carry no semantics
define void @test(i32* %p)
⇒ Does not imply that %p is 4-byte aligned
define void @test(i32* aligned 4 %p)
⇒ Does not imply that %p is 4-byte dereferenceable
define void @test(i32* dereferenceable(4) %p)
8
9. Pointer element types carry no semantics
define void @test(i32* %p)
⇒ Does not imply that %p is 4-byte aligned
define void @test(i32* aligned 4 %p)
⇒ Does not imply that %p is 4-byte dereferenceable
define void @test(i32* dereferenceable(4) %p)
⇒ Does not imply any aliasing semantics (TBAA metadata does)
9
10. Pointer element types carry no semantics
define void @test(i32* %p)
⇒ Does not imply that %p is 4-byte aligned
define void @test(i32* aligned 4 %p)
⇒ Does not imply that %p is 4-byte dereferenceable
define void @test(i32* dereferenceable(4) %p)
⇒ Does not imply any aliasing semantics (TBAA metadata does)
⇒ Does not imply it will be accessed as i32!
10
11. Pointers can be arbitrarily bitcasted
define i64 @test(double* byval(double) %p) {
%p.p0i8 = bitcast double* %p to i8**
store i8* null, i8** %p.p0i8
%p.i64 = bitcast double* %p to i64*
%x = load i64, i64** %p.i64
ret i64 %x
}
11
12. Only types at certain uses matter
define i64 @test(double* byval(double) %p) {
%p.p0i8 = bitcast double* %p to i8**
store i8* null, i8** %p.p0i8
%p.i64 = bitcast double* %p to i64*
%x = load i64, i64** %p.i64
ret i64 %x
}
12
13. Only types at certain uses matter
define i64 @test(ptr byval(double) %p) {
store ptr 0.0, ptr %p
%x = load i64, ptr %p
ret i64 %x
}
13
14. Why?
● Memory usage: Don’t need to store bitcasts
● Compile-time: Don’t need to skip bitcasts in optimizations
14
18. Why?
● Memory usage: Don’t need to store bitcasts
● Compile-time: Don’t need to skip bitcasts in optimizations
● Performance:
○ Optimizations should ignore pointer bitcasts
18
19. Why?
● Memory usage: Don’t need to store bitcasts
● Compile-time: Don’t need to skip bitcasts in optimizations
● Performance:
○ Optimizations should ignore pointer bitcasts
○ …and many do (e.g. cost models says they’re free)
19
20. Why?
● Memory usage: Don’t need to store bitcasts
● Compile-time: Don’t need to skip bitcasts in optimizations
● Performance:
○ Optimizations should ignore pointer bitcasts
○ …and many do (e.g. cost models says they’re free)
○ …but many don’t (e.g. limited instruction/use walks)
20
21. Why?
● Memory usage: Don’t need to store bitcasts
● Compile-time: Don’t need to skip bitcasts in optimizations
● Performance:
○ Bitcasts can’t affect optimization if they don’t exist
21
22. Equivalence modulo pointer type
define i32* @test(i8** %p) {
store i8* null, i8** %p
%p.i32 = bitcast i8** %p to i32**
%v = load i32*, i32** %p.i32
ret i32* %v
}
EarlyCSE can’t optimize this!
(But full GVN can.)
22
23. Equivalence modulo pointer type
define ptr @test(ptr %p) {
store ptr null, ptr %p
%v = load ptr, ptr %p
ret ptr %v
}
EarlyCSE can optimize this!
23
24. Equivalence modulo pointer type
define ptr @test(ptr %p) {
store ptr null, ptr %p
%v = load ptr, ptr %p
ret ptr %v
}
; RUN: opt -S -early-cse < %s
define ptr @test(ptr %p) {
store ptr null, ptr %p
ret ptr null
}
24
25. Why?
● Memory usage: Don’t need to store bitcasts
● Compile-time: Don’t need to skip bitcasts in optimizations
● Performance:
○ Bitcasts can’t affect optimization if they don’t exist
○ Pointer element type difference cannot prevent CSE / forwarding / etc.
25
34. Offset-based reasoning
define internal i32 @add({ i32, i32 }* %p) {
%p0 = getelementptr { i32, i32 }, { i32, i32 }* %p, i64 0, i32 0
%v0 = load i32, i32* %p0 ← Load of i32 at offset 0
%p1 = getelementptr { i32, i32 }, { i32, i32 }* %p, i64 0, i32 1
%v1 = load i32, i32* %p1 ← Load of i32 at offset 4
%add = add i32 %v0, %v1
ret i32 %add
}
define i32 @caller({ i32, i32 }* %p) {
%res = call i32 @add({ i32, i32 }* %p)
ret i32 %res
}
Derive “struct type” from access pattern,
rather than IR type information
34
35. Why?
● Memory usage: Don’t need to store bitcasts
● Compile-time: Don’t need to skip bitcasts in optimizations
● Performance:
○ Bitcasts can’t affect optimization if they don’t exist
○ Pointer element type difference cannot prevent CSE / forwarding / etc.
○ Opaque pointers require/encourage generic offset-based reasoning
35
36. Why?
● Memory usage: Don’t need to store bitcasts
● Compile-time: Don’t need to skip bitcasts in optimizations
● Performance: …
● Implementation simplification:
○ Don’t need to insert bitcasts all over the place
36
38. Type-System recursion
%ty = type { %ty* }
⇒ This requires struct types to be mutable
%ty = type { ptr }
⇒ All types can be immutable
38
39. Why?
● Memory usage: Don’t need to store bitcasts
● Compile-time: Don’t need to skip bitcasts in optimizations
● Performance: …
● Implementation simplification:
○ Don’t need to insert bitcasts all over the place
○ Removes recursion from the type system
39
40. Why?
● Memory usage: Don’t need to store bitcasts
● Compile-time: Don’t need to skip bitcasts in optimizations
● Performance: …
● Implementation simplification:
○ Don’t need to insert bitcasts all over the place
○ Removes recursion from the type system
○ Enables follow-up IR changes to remove more types
40
48. Code changes
PointerType::get(ElemTy, AS) still works in opaque pointer mode!
The element type is simply ignored.
⇒ As long as getPointerElementType() is not called,
code usually “just works” in opaque pointer mode.
48
49. Pointer equality does not imply access type equality
define ptr @test(ptr %p) {
store i32 0, ptr %p
%v = load i64, ptr %p
ret ptr %v
}
49
50. Pointer equality does not imply access type equality
define ptr @test(ptr %p) {
store i32 0, ptr %p
%v = load i64, ptr %p
ret ptr %v
}
Need to explicitly check that load type == store type.
Not implied by same pointer operand anymore!
50
51. Frontends
Need to track pointer element types in their own structures now – can’t rely on
LLVM PointerType!
51
52. Frontends
Need to track pointer element types in their own structures now – can’t rely on
LLVM PointerType!
Clang: Address, LValue, RValue store pointer element type now.
52
54. Opaque pointer mode
Automatically enabled if you use ptr in IR or bitcode.
Manually enabled with -opaque-pointers.
define i32* @test(i32* %p)
ret i32* %p
}
54
55. Opaque pointer mode
Automatically enabled if you use ptr in IR or bitcode.
Manually enabled with -opaque-pointers.
define i32* @test(i32* %p)
ret i32* %p
}
; RUN: opt -S -opaque-pointers < %s
define ptr @test(ptr %p)
ret ptr %p
}
55
56. Opaque pointer mode
Automatically enabled if you use ptr in IR or bitcode.
Manually enabled with -opaque-pointers.
Upgrading (very old) bitcode to opaque pointers is supported!
56
58. Migration
● Proposed by David Blaikie in 2015
● Massive migration scope, with many hundreds of direct and indirect pointer
element type uses across the code base.
○ Some trivial to remove.
○ Some require IR changes or full transform rewrites.
58
59. Migration
● Proposed by David Blaikie in 2015
● Massive migration scope, with many hundreds of direct and indirect pointer
element type uses across the code base.
○ Some trivial to remove.
○ Some require IR changes or full transform rewrites.
● Opaque pointer type ptr only introduced in 2021 by Arthur Eubanks
59
60. Migration
● Proposed by David Blaikie in 2015
● Massive migration scope, with many hundreds of direct and indirect pointer
element type uses across the code base.
○ Some trivial to remove.
○ Some require IR changes or full transform rewrites.
● Opaque pointer type ptr only introduced in 2021 by Arthur Eubanks
● Now (end of March 2022): All pointer element type accesses in LLVM and
Clang eradicated.
60
61. Migration
● Proposed by David Blaikie in 2015
● Massive migration scope, with many hundreds of direct and indirect pointer
element type uses across the code base.
○ Some trivial to remove.
○ Some require IR changes or full transform rewrites.
● Opaque pointer type ptr only introduced in 2021 by Arthur Eubanks
● Now (end of March 2022): All pointer element type accesses in LLVM and
Clang eradicated.
● Next step: Enable opaque pointers by default.
○ Caveat: Requires updating ~7k tests.
61
62. Migration
● Proposed by David Blaikie in 2015
● Massive migration scope, with many hundreds of direct and indirect pointer
element type uses across the code base.
○ Some trivial to remove.
○ Some require IR changes or full transform rewrites.
● Opaque pointer type ptr only introduced in 2021 by Arthur Eubanks
● Now (end of March 2022): All pointer element type accesses in LLVM and
Clang eradicated.
● Next step: Enable opaque pointers by default.
○ Caveat: Requires updating ~7k tests.
● Typed pointers expected to be removed after LLVM 15 branch.
62
63. Future: Type-less GEP
All of these are equivalent:
getelementptr { [1 x i32], i32 }, ptr %p, i64 0, i32 1
getelementptr { [1 x i32], i32 }, ptr %p, i64 0, i32 0, i64 1
getelementptr { [1 x i32], i32 }, ptr %p, i64 1, i32 0, i64 -1
63
64. Future: Type-less GEP
All of these are equivalent:
getelementptr { [1 x i32], i32 }, ptr %p, i64 0, i32 1
getelementptr { [1 x i32], i32 }, ptr %p, i64 0, i32 0, i64 1
getelementptr { [1 x i32], i32 }, ptr %p, i64 1, i32 0, i64 -1
getelementptr i32, ptr %p, i64 1
getelementptr i8, ptr %p, i64 4
64
65. Future: Type-less GEP
All of these are equivalent:
getelementptr { [1 x i32], i32 }, ptr %p, i64 0, i32 1
getelementptr { [1 x i32], i32 }, ptr %p, i64 0, i32 0, i64 1
getelementptr { [1 x i32], i32 }, ptr %p, i64 1, i32 0, i64 -1
getelementptr i32, ptr %p, i64 1
getelementptr i8, ptr %p, i64 4
Offset-based algorithms will realize these are equivalent, but…
65
66. Future: Type-less GEP
Nothing in the -O3 pipeline realizes that %p1 and %p2 can be CSEd:
define void @test(ptr %p) {
%p1 = getelementptr i8, ptr %p, i64 4
%p2 = getelementptr i32, ptr %p, i64 1
call void @use(ptr %p1, ptr %p2)
ret void
}
66