SlideShare a Scribd company logo
© 2018 NETRONOME
Jiong Wang
Linux Plumbers Conference
Vancouver, Nov, 2018
Efficient JIT to 32-bit Arches
1
© 2018 NETRONOME
Background
 ISA specification and impact on JIT compiler
 Default code-gen use 64-bit register, ALU64, JMP64
 No big impact on 64-bit backends (REX header for x86-64)
 Requires extra regs and insns on 32-bit arches.
 Programs in real world are 32-bit friendly
 Nearly all non-pointer arithmetic are 32-bit and all will be if pointer is 32-bit
2
38%
62%
test_l4lb_noinline.c
32-bit ALU 64-bit ALU + JMP
31%
69%
test_xdp_noinline.c
32-bit ALU 64-bit ALU + JMP
Arch BPF_ALU/BPF_JMP
arm emit_alu_r(rd[1], rs[1], true, false, op, ctx);
emit_alu_r(rd[0], rs[0], true, true, op, ctx)
emit(ARM_CMP_R(rd, rm), ctx);
emit(ARM_COND_EQ, ARM_CMP_R(rt, rn), ctx);
nfp emit_alu(…reg_both(dst), reg_a(dst), alu_op, reg_b(src));
emit_alu(…reg_both(dst + 1), reg_a(dst + 1), alu_op, reg_b(src +
1));
…
© 2018 NETRONOME
Solution A - 32-bit subreg ISA
 eBPF ISA already defined 32-bit subreg and ALU32 insns, could pass 32-bit semantics from c types down to assembly
 A subreg read is always 32-bit read, but write implicitly zeros high 32-bit. Compilers or hand written assembly could be
using this
 32-bit arches must model implicitly zero extension using extra insns to guarantee correctness. This applies to all ALU32
insn. How could we avoid these extra code-gen? 3
void cal(u32 *a, u32 *b, u64 *c) {
u32 sum = *a + *b;
*c = sum;
}
-mattr=+alu32
cal:
w1 = *(u32 *)(r1 + 0)
w2 = *(u32 *)(r2 + 0)
w2 += w1
*(u64 *)(r3 + 0) = r2
exit
ALU32 insn but there is exploit of implicit zero
extension on w2 by the following store
Reference a 32-bit subreg as a 64-bit definition
-mattr=+alu32 (since LLVM 7.0)
void cal(u32 *a, u32 *b, u32 *c) {
u32 sum = *a + *b;
*c = sum;
}
cal:
w1 = *(u32 *)(r1 + 0)
w2 = *(u32 *)(r2 + 0)
w2 += w1
*(u32 *)(r3 + 0) = w2
exit
cal:
r1 = *(u32 *)(r1 + 0)
r2 = *(u32 *)(r2 + 0)
r2 += r1
*(u32 *)(r3 + 0) = r2
exit
default
© 2018 NETRONOME
Solution A - 32-bit subreg ISA
 Just don’t do zero extension on the DST_REG of ALU32 at all, do them the first time DST_REG is used as 64-bit. Mark the
reg to save on later use
 Insns have 64-bit register read including: cond BPF_JMP, BPF_ALU, BPF_STX | BPF_DW
 cond JMP overhead could be reduced by introducing BPF_JMP32 insn. X86_64 and AArch64 ISA do support this
 A large portion of BPF_ALU comes from address calculation, for example calculating address of local variables.
Could potentially avoid this using verifier register type info
 Without flow analysis, when a instruction is jump destination (start of basic block), need to clear register mark. This could
possibly cause quite a few unnecessary code-gen
Benchmark Total insn Total cond JMP JMP32 Percentage
test_xdp_noinline 999 84 52 61.9%
test_l4lb_noinline 569 40 24 60.0%
© 2018 NETRONOME
Solution A - 32-bit subreg ISA
 LLVM might have better knowledge on all these, so another choice is stopping LLVM exploiting implicit zero extension and
generate it explicitly
 For explicit zero extension, we can’t use existing ALU32 MOV as otherwise we can’t differentiate it with normally MOV
which we don’t want to clear high 32bits. We will need new explicit zero-extension instruction BPF_UEXT
 But how could verifier safely know the input sequences is compiled from LLVM compiler conforming to such convention is
an issue. Could generate ELF tag, however which could be faked, therefore have potential impact on security
© 2018 NETRONOME
Solution B – Standalone DF Analyzer
6
 Work with any input sequence, for example pure ALU64 sequence. Can’t leverage LLVM’s work
 Optimistic
 Assume all ALU instructions are 32-bit safe initially
 Once find one 64-bit register use, pollute all instructions contributed to the value of this register
 Pessimistic
 Assume all ALU instructions are 64-bit initially
 Mark one ALU insn as 32-bit safe only when all the use of its definition are 32-bit
© 2018 NETRONOME
Solution B – Standalone DF Analyzer
 Amount of 32-bit use info decides amount of opportunities
 Without +alu32 code-gen, use info come from
 B/H/W STORE
 CMP-JMP if reg is zero extended
 With +alu32, all ALU32 insn could also produce use info, and analysis could finished quicker
r1 r1
r2 r2
r2 r2
r3r2
r0
r1
define use0 use1
r1 = *(u32 *)(r1 + 0)
r2 = *(u32 *)(r2 + 0)
r2 += r1
*(u32 *)(r3 + 0) = r2
exit
w2 r2
w1 r1
w1 w1
w2 …w1
r0
w2
define use0 use1
w2 = *(u32 *)(r2 + 0)
w1 = *(u32 *)(r1 + 0)
w1 += w2
call foo
exit
void bar(u32 *a, u32 *b, u32 *c) {
u32 sum = *a + *b;
}
*c = sum
*c = foo(sum)
7
© 2018 NETRONOME
Solution B – Standalone DF Analyzer
 A stand-alone, drop-and-work implementation organized as a kernel lib
 Implementation available as RFC
 Could be enhanced with previous CFG infrastructure
https://lwn.net/Articles/753724 to become a global DF analyzer
struct bpf_du_chain {
struct bpf_du_insn *insn;
struct bpf_du_chain *next;
unsigned int flags;
};
r2 r2
r1 r1
r1 r1
w2 …r1
r0
r2
define use0 use1
r2 = *(u32 *)(r2 + 0)
r1 = *(u32 *)(r1 + 0)
r1 += r2
*(u32 *)(r3 + 0) = r2
exit
struct bpf_du_insn {
struct bpf_du_chain *chain[2];
unsigned int flags;
};
bpf_df_init
run DF passes
(bpf_df_pass_32bit_safe)
bpf_df_init
copy DF result
to JIT backend
• Partition scope, mark places
introducing unknown def and use
• Scan insn sequence from start to
end, build DU chain
• Run pre-requisite pass if any
• Mark seed 32-bit insn
• Iteratively backward propagate
until reached fix-point
• Copy DF result kept as flags
• Release resources
8
© 2018 NETRONOME
Solution B – Standalone DF Analyzer
 Algorithm pseudo code for local analyzer
ext_defs[MAX_BPF_REG] = false;
// insn define the current value of R
act_defs[MAX_BPF_REG] = NULL;
bpf_df_init:
for_each_insn_in_the_sequence
if (cur_insn == jmp or call)
dst_insn.flag |= FLAG_IS_JMP_DST
for_each_insn_in_the_sequence
if (cur_insn.flag & FLAG_IS_JMP_DST)
ext_defs[0..MAX_BPF_REG] = true
if (cur_insn == call or exit or jmp )
add_unknown_use(act_def[arg_regs])
or act_defs[0..MAX_BPF_REG] = NULL
cur_insn.u2d[0] = act_defs[cur_insn.dst_reg]
act_defs[cur_insn.dst_reg].d2u.next = cur_insn;
cur_insn.u2d[1] = act_defs[cur_insn.src_reg]
act_defs[cur_insn.src_reg].d2u.next = cur_insn;
bpf_df_pass_32bit_safe:
for_each_insn_in_the_sequence
if (cur_insn == ST_B/H/W && has_single_use(cur_insn.u2d[1]))
cur_insn.u2d[1].flag |= FLAG_IS_32BIT_SAFE
else if (cur_insn == ALU32) {
if (has_single_use(cur_insn.u2d[0])
cur_insn.u2d[0].flag |= FLAG_IS_32BIT_SAFE
if (BPF_X && has_single_use(cur_insn.u2d[1])
cur_insn.u2d[1].flag |= FLAG_IS_32BIT_SAFE
}
do {
changed = false
for_each_insn_in_the_sequence
changed |= propagate_32bit_safe(cur_insn)
while (changed)
propagate_32bit_safe:
changed = false
if (!(insn.flag & FLAG_IS_32BIT_SAFE))
return false
if (insn == alu64 or alu32 except right shift) {
if (has_single_use(insn.u2d[0])
insn.u2d[0].flag |= FLAG_IS_32BIT_SAFE
if (BPF_X && has_single_use(insn.u2d[1])
insn.u2d[1].flag |= FLAG_IS_32BIT_SAFE
}
return changed;
9
© 2018 NETRONOME
Solution B – Standalone DF Analyzer
10
 Build global analyser on top of CFG infrastructure
 Split each register into hi/lo, hi0 ~ lo10, lo0 ~ lo10, do classic live variable analysis
 Calculate LiveIn and LiveOut for hi registers, meaning whether high 32-bit are live (used later) when entering and
leaving one basic block. For 32-bit safety, we care about Out(s) which could tell us the use in successor blocks
 Gen(s) function is complex than classic Gen(s) as we need insn DU chain to propagate from def to use. Can’t simply
finish initialize Gen(s) set using insn scan
Out(s) = ∪s′ ∊ succ(s)
In(s′)
In(s) = Gen(s) ∪ (Out(s) - Kill(s))
© 2018 NETRONOME
Solution B – Standalone DF Analyzer
KERNEL/lib/bpf_core_cfg.c
KERNEL/lib/bpf_core_df.c
KERNEL/lib/bpf_core_opt_32bit.c
driver/nfp driver/arm tool/bpftool Any other places inside
kernel tree needs
eBPF insn analysis
Ideally should compile for both userspace and kernel space
userspace compilation mode
11
© 2018 NETRONOME
Solution C - Enhance Verifier DF analyzer
 Verifier has a light “DF analyzer”, designed for releasing more path prune opportunities
 It is integrated with path walker, collects and processes information while walking insn
 Fit scenarios which do not need memorize historical information, but doesn’t fit well otherwise
12
10: r6 = *(u32 *)(r10 - 4)
11: r7 = *(u32 *)(r10 - 8)
12: *(u64 *)(r10 - 16) = r6
13: *(u64 *)(r10 - 24) = r7
14: r6 += 1
15: r7 += r6
16: *(u32 *)(r10 - 28) = r7
17: r3 += r7
18: r4 = r6
19: *(u64 *)(r10 - 32) = r3
20: *(u64 *)(r10 - 40) = r4
State A
State B
State C
R4 Propagation
R3 Propagation
reg read propagation
10: r6 = *(u32 *)(r10 - 4)
11: r7 = *(u32 *)(r10 - 8)
12: *(u64 *)(r10 - 16) = r6
13: *(u64 *)(r10 - 24) = r7
14: r6 += 1
15: r7 += r6
16: *(u32 *)(r10 - 28) = r7
17: r3 += r7
18: r4 = r6
19: *(u64 *)(r10 - 32) = r3
20: *(u64 *)(r10 - 40) = r4
64-bit usage propagation
R4 Propagation
R3 Propagation
R7 Propagation
R6 Propagation
© 2018 NETRONOME
Solution C - Enhance Verifier DF analyzer
 32-bit analysis algorithm based on verifier DF analyzer
 On solution B, one insn is 32-bit safe if its definition has 32-bit use only, this requires def<->use dual link
 Verifier DF analyzer is based on code path walking, we don’t know whether all uses has been visited, so can only
build use->def singular chain
 Therefore, the algorithm is using 64-bit “polluting”. Initially all instructions are considered as 32-bit safe, then
whenever there is one 64-bit use of one definition, that insn is polluted as 64-bit
 64-bit polluting could propagate from definition to use. For example, A = B + C, definition A has 64-bit use, insn A is
polluted, uses B and C must also be 64-bit as they are forming A, therefore definition insn of B and C should be
polluted as well
13
r7 += r6
r3 += r7
*(u64 *)(r10 - 28) = r3
insn_mark_stack[stack_idx++] = insn_idx;
while (stack_idx) {
def_idx = insn_mark_stack[--stack_idx];
aux[def_idx].full_ref = true;
u2d0 = aux[def_idx].u2d[0];
if (u2d0 >= 0 && !aux[u2d0].full_ref)
insn_mark_stack[stack_idx++] = u2d0;
u2d1 = aux[def_idx].u2d[1];
if (u2d1 >= 0 && !aux[u2d1].full_ref)
insn_mark_stack[stack_idx++] = u2d1;
}
Need to pollute insns that
define r3, r6 and r7 and any
one further backward
iteratively.
© 2018 NETRONOME
Solution C - Enhance Verifier DF analyzer
 64-bit read in pruned path need to be propagated upward to parent verifier state
 But 64-bit read propagation is more complex, it won’t be screened off by a simple write and such propagation needs to be
done iteratively on all relevant registers
10: r6 = *(u32 *)(r10 - 4)
11: r4+=r3
12: *(u64 *)(r10 - 16) = r6
13: *(u64 *)(r10 - 24) = r7
14: r4 = 1
14: r6 += r4
16: r5 += r6
17: r4 += r5
18: r5 = 1
19: r4 += r5
20: *(u64 *)(r10 - 40) = r4
normal reg read
propagation
64-bit reg read
propagation
when propagation start when walking the insn when walking the insn,
but could start later when
one relevant reg identified
as 64-bit
need historical info no yes
need to record all insns
relevant to one reg
screen off any write to the reg write to the reg, but
source shouldn’t be
overlapping with the dest
affect other regs no yes
instructions involved generating r4, registers in these insns are all relevant
14
© 2018 NETRONOME
Solution C - Enhance Verifier DF analyzer
 Historical information
 Insns involved with value generation of the reg
 Because reg state is changing alone with insn walking, so should keep the state when walking the insn
 For example, r4 in insn 14 contributed to the final value of r4 used in insn 20 which is 64-bit, so it should be
propagated to parent state. Insn 15 however screened off r14, if we don’t record the reg state when walking insn 14,
we will miss this propagation
15
14: r6 += r4
15: r4 = 1
16: r5 += r6
17: r4 += r5
18: r5 = 1
19: r4 += r5
20: *(u64 *)(r10 - 40) = r4
State A
State B
struct bpf_insn_aux_data {
…
struct bpf_reg_state_lite reg_state_lite[MAX_BPF_REG];
struct bpf_verifier_state *parent_vstate;
s16 u2d[2];
}
struct bpf_reg_state_lite {
struct bpf_reg_state *parent;
u32 live;
};
insn walker could visit one insn recursively (disallowed now), or repeatedly, both would break the chain!
© 2018 NETRONOME
Solution C - Enhance Verifier DF analyzer
 Algorithm pseudo code
16
enum reg_arg_type {
SRC_OP_0,
SRC_OP64_0,
SRC_OP_1,
SRC_OP64_1,
SRC_OP64_IMP,
DST_OP,
U_DST_OP,
DST_OP_NO_MARK,
U_DST_OP_NO_MARK
};
check_reg_arg across verifier need to pass finer arg_type
depending on the position of the src and whether it is 64bit
check_reg_arg(env, insn->src_reg, SRC_OP|64_0, insn_idx);
check_reg_arg(env, insn->dst_reg, SRC_OP|64_1, insn_idx);
dst_type = no overlap between dst_reg and any src ?
U_DST_OP_NO_MARK : DST_OP_NO_MARK;
check_reg_arg(env, insn->dst_reg, dst_type, insn_idx);
check_reg_arg(reg, type, insn_idx):
rstate = cur_frame->regs + insn_idx
if (type == SRC_*) {
link_use_to_def(rstate, type, insn_idx)
mark_reg_read(… type == SRC_*64_*);
} else {
rstate->live |= REG_LIVE_WRITTEN;
if (t == U_DST_OP || t == U_DST_OP_NO_MARK)
rstate->live |= REG_LIVE_WRITTEN_UNIQUE;
…
rstate->def_insn_idx = insn_idx;
if (insn_idx != INVALID_INSN_IDX)
copy_reg_state_lite(insn_aux[insn_idx].regs_lite, regs);
}
link_reg_to_def(rstate, type, insn_idx):
slot_idx = 0 or 1 depending on type
def_idx = rstate->def_insn_idx;
insn_aux_data[insn_idx].u2d[slot_idx] = def_idx;
mark_reg_read:
…
if (!64bit_read)
return;
return mark_reg_read64(…)
mark_reg_read64:
while (parent) {
if (writes &&
state->live &
REG_LIVE_WRITTEN_UNIQUE)
break;
parent->live |= REG_LIVE_READ64;
state = parent;
parent = state->parent;
writes = true;
}
if (no_mark_insn)
return;
return mark_insn_64bit(def_idx);
mark_insn_64bit:
insn_mark_stack[stack_idx++] = insn_idx;
while (stack_idx) {
def_idx = insn_mark_stack[--stack_idx];
aux[def_idx].full_ref = true;
u2d0 = aux[def_idx].u2d[0];
if (u2d0 >= 0 && !aux[u2d0].full_ref) {
insn_mark_stack[stack_idx++] = u2d0;
mark_reg_read64(aux[def_idx].u2d0.dst_reg,
no_mark_insn)
}
u2d1 = aux[def_idx].u2d[1];
if (u2d1 >= 0 && !aux[u2d1].full_ref) {
insn_mark_stack[stack_idx++] = u2d1;
mark_reg_read64(aux[def_idx].u2d1.dst_reg,
no_mark_insn)
}
}
© 2018 NETRONOME
Which approach to go?
 Verifier
 Better to focus on verification, offer reliable and fast verification, no bothering of optimization
 DF analysis based on dynamic insn walking requires recording historical information
 Path prune however make it very difficult to collect such information on such scope
 Classic CFG + DF analysis
 Reliable and has sophisticated algorithms
 Redo LLVM’s work, heavy for kernel space
 Reuse information passed down from LLVM through 32-bit subreg ISA
 Throw the heavy lift work to user space static compiler who is also really good at
 Lack of some instructions (JMP32 etc) is causing trouble
 Best to follow this approach and enhance eBPF ISA ?
17
Thank you!

More Related Content

What's hot

BKK16-503 Undefined Behavior and Compiler Optimizations – Why Your Program St...
BKK16-503 Undefined Behavior and Compiler Optimizations – Why Your Program St...BKK16-503 Undefined Behavior and Compiler Optimizations – Why Your Program St...
BKK16-503 Undefined Behavior and Compiler Optimizations – Why Your Program St...
Linaro
 
Code GPU with CUDA - SIMT
Code GPU with CUDA - SIMTCode GPU with CUDA - SIMT
Code GPU with CUDA - SIMT
Marina Kolpakova
 
8086 labmanual
8086 labmanual8086 labmanual
8086 labmanualiravi9
 
Code GPU with CUDA - Memory Subsystem
Code GPU with CUDA - Memory SubsystemCode GPU with CUDA - Memory Subsystem
Code GPU with CUDA - Memory Subsystem
Marina Kolpakova
 
Pragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization ApproachesPragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Marina Kolpakova
 
Code GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flowCode GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flow
Marina Kolpakova
 
Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...
Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...
Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...
Hsien-Hsin Sean Lee, Ph.D.
 
Online test program generator for RISC-V processors
Online test program generator for RISC-V processorsOnline test program generator for RISC-V processors
Online test program generator for RISC-V processors
RISC-V International
 
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
Hsien-Hsin Sean Lee, Ph.D.
 
Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...
Marina Kolpakova
 
Code GPU with CUDA - Identifying performance limiters
Code GPU with CUDA - Identifying performance limitersCode GPU with CUDA - Identifying performance limiters
Code GPU with CUDA - Identifying performance limiters
Marina Kolpakova
 
15CS44 MP & MC Module 1
15CS44 MP & MC Module 115CS44 MP & MC Module 1
15CS44 MP & MC Module 1
RLJIT
 
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Hsien-Hsin Sean Lee, Ph.D.
 
Happy To Use SIMD
Happy To Use SIMDHappy To Use SIMD
Happy To Use SIMD
Wei-Ta Wang
 
EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5PRADEEP
 
Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining
Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- PipeliningLec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining
Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining
Hsien-Hsin Sean Lee, Ph.D.
 
BPF - All your packets belong to me
BPF - All your packets belong to meBPF - All your packets belong to me
BPF - All your packets belong to me
_xhr_
 

What's hot (20)

BKK16-503 Undefined Behavior and Compiler Optimizations – Why Your Program St...
BKK16-503 Undefined Behavior and Compiler Optimizations – Why Your Program St...BKK16-503 Undefined Behavior and Compiler Optimizations – Why Your Program St...
BKK16-503 Undefined Behavior and Compiler Optimizations – Why Your Program St...
 
Code GPU with CUDA - SIMT
Code GPU with CUDA - SIMTCode GPU with CUDA - SIMT
Code GPU with CUDA - SIMT
 
8086 labmanual
8086 labmanual8086 labmanual
8086 labmanual
 
Code GPU with CUDA - Memory Subsystem
Code GPU with CUDA - Memory SubsystemCode GPU with CUDA - Memory Subsystem
Code GPU with CUDA - Memory Subsystem
 
Pragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization ApproachesPragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization Approaches
 
Code GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flowCode GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flow
 
Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...
Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...
Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...
 
Online test program generator for RISC-V processors
Online test program generator for RISC-V processorsOnline test program generator for RISC-V processors
Online test program generator for RISC-V processors
 
Ch2 arm-1
Ch2 arm-1Ch2 arm-1
Ch2 arm-1
 
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
 
Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...
 
Code GPU with CUDA - Identifying performance limiters
Code GPU with CUDA - Identifying performance limitersCode GPU with CUDA - Identifying performance limiters
Code GPU with CUDA - Identifying performance limiters
 
3D-DRESD Lorenzo Pavesi
3D-DRESD Lorenzo Pavesi3D-DRESD Lorenzo Pavesi
3D-DRESD Lorenzo Pavesi
 
DSP_Assign_1
DSP_Assign_1DSP_Assign_1
DSP_Assign_1
 
15CS44 MP & MC Module 1
15CS44 MP & MC Module 115CS44 MP & MC Module 1
15CS44 MP & MC Module 1
 
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
 
Happy To Use SIMD
Happy To Use SIMDHappy To Use SIMD
Happy To Use SIMD
 
EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5
 
Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining
Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- PipeliningLec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining
Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining
 
BPF - All your packets belong to me
BPF - All your packets belong to meBPF - All your packets belong to me
BPF - All your packets belong to me
 

Similar to Efficient JIT to 32-bit Arches

Berkeley Packet Filters
Berkeley Packet FiltersBerkeley Packet Filters
Berkeley Packet Filters
Kernel TLV
 
Chapter Eight(3)
Chapter Eight(3)Chapter Eight(3)
Chapter Eight(3)bolovv
 
Multiplatform JIT Code Generator for NetBSD by Alexander Nasonov
Multiplatform JIT Code Generator for NetBSD by Alexander NasonovMultiplatform JIT Code Generator for NetBSD by Alexander Nasonov
Multiplatform JIT Code Generator for NetBSD by Alexander Nasonov
eurobsdcon
 
Arm architecture
Arm architectureArm architecture
PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...
PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...
PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...
Andrey Karpov
 
PVS-Studio, a solution for resource intensive applications development
PVS-Studio, a solution for resource intensive applications developmentPVS-Studio, a solution for resource intensive applications development
PVS-Studio, a solution for resource intensive applications development
OOO "Program Verification Systems"
 
A Close Look at ARM Code Size
A Close Look at ARM Code SizeA Close Look at ARM Code Size
A Close Look at ARM Code Size
Samsung Open Source Group
 
BPF Hardware Offload Deep Dive
BPF Hardware Offload Deep DiveBPF Hardware Offload Deep Dive
BPF Hardware Offload Deep Dive
Netronome
 
The ARM Architecture: ARM : ARM Architecture
The ARM Architecture: ARM : ARM ArchitectureThe ARM Architecture: ARM : ARM Architecture
The ARM Architecture: ARM : ARM Architecture
sreea4
 
Exploring the x64
Exploring the x64Exploring the x64
Exploring the x64FFRI, Inc.
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Akihiro Hayashi
 
ARM Introduction
ARM IntroductionARM Introduction
ARM Introduction
Ramasubbu .P
 
64bit SMP OS for TILE-Gx many core processor
64bit SMP OS for TILE-Gx many core processor64bit SMP OS for TILE-Gx many core processor
64bit SMP OS for TILE-Gx many core processor
Toru Nishimura
 
Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCC
Kernel TLV
 
Arm teaching material
Arm teaching materialArm teaching material
Arm teaching materialJohn Williams
 
Share the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development BoardShare the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development Board
Jian-Hong Pan
 
arm-intro.ppt
arm-intro.pptarm-intro.ppt
arm-intro.ppt
MostafaParvin1
 
C programming session10
C programming  session10C programming  session10
C programming session10
Keroles karam khalil
 

Similar to Efficient JIT to 32-bit Arches (20)

Berkeley Packet Filters
Berkeley Packet FiltersBerkeley Packet Filters
Berkeley Packet Filters
 
Chapter Eight(3)
Chapter Eight(3)Chapter Eight(3)
Chapter Eight(3)
 
Multiplatform JIT Code Generator for NetBSD by Alexander Nasonov
Multiplatform JIT Code Generator for NetBSD by Alexander NasonovMultiplatform JIT Code Generator for NetBSD by Alexander Nasonov
Multiplatform JIT Code Generator for NetBSD by Alexander Nasonov
 
Arm architecture
Arm architectureArm architecture
Arm architecture
 
PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...
PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...
PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...
 
PVS-Studio, a solution for resource intensive applications development
PVS-Studio, a solution for resource intensive applications developmentPVS-Studio, a solution for resource intensive applications development
PVS-Studio, a solution for resource intensive applications development
 
A Close Look at ARM Code Size
A Close Look at ARM Code SizeA Close Look at ARM Code Size
A Close Look at ARM Code Size
 
BPF Hardware Offload Deep Dive
BPF Hardware Offload Deep DiveBPF Hardware Offload Deep Dive
BPF Hardware Offload Deep Dive
 
The ARM Architecture: ARM : ARM Architecture
The ARM Architecture: ARM : ARM ArchitectureThe ARM Architecture: ARM : ARM Architecture
The ARM Architecture: ARM : ARM Architecture
 
Arm
ArmArm
Arm
 
Exploring the x64
Exploring the x64Exploring the x64
Exploring the x64
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
 
ARM Introduction
ARM IntroductionARM Introduction
ARM Introduction
 
64bit SMP OS for TILE-Gx many core processor
64bit SMP OS for TILE-Gx many core processor64bit SMP OS for TILE-Gx many core processor
64bit SMP OS for TILE-Gx many core processor
 
Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCC
 
Arm teaching material
Arm teaching materialArm teaching material
Arm teaching material
 
Arm teaching material
Arm teaching materialArm teaching material
Arm teaching material
 
Share the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development BoardShare the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development Board
 
arm-intro.ppt
arm-intro.pptarm-intro.ppt
arm-intro.ppt
 
C programming session10
C programming  session10C programming  session10
C programming session10
 

More from Netronome

Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
Netronome
 
LFSMM AF XDP Queue I-DS
LFSMM AF XDP Queue I-DSLFSMM AF XDP Queue I-DS
LFSMM AF XDP Queue I-DS
Netronome
 
LFSMM Verifier Optimizations and 1 M Instructions
LFSMM Verifier Optimizations and 1 M InstructionsLFSMM Verifier Optimizations and 1 M Instructions
LFSMM Verifier Optimizations and 1 M Instructions
Netronome
 
Using Network Acceleration for an Optimized Edge Cloud Server Architecture
Using Network Acceleration for an Optimized Edge Cloud Server ArchitectureUsing Network Acceleration for an Optimized Edge Cloud Server Architecture
Using Network Acceleration for an Optimized Edge Cloud Server Architecture
Netronome
 
Offloading TC Rules on OVS Internal Ports
Offloading TC Rules on OVS Internal Ports Offloading TC Rules on OVS Internal Ports
Offloading TC Rules on OVS Internal Ports
Netronome
 
Quality of Service Ingress Rate Limiting and OVS Hardware Offloads
Quality of Service Ingress Rate Limiting and OVS Hardware OffloadsQuality of Service Ingress Rate Limiting and OVS Hardware Offloads
Quality of Service Ingress Rate Limiting and OVS Hardware Offloads
Netronome
 
ODSA Sub-Project Launch
 ODSA Sub-Project Launch ODSA Sub-Project Launch
ODSA Sub-Project Launch
Netronome
 
Flexible and Scalable Domain-Specific Architectures
Flexible and Scalable Domain-Specific ArchitecturesFlexible and Scalable Domain-Specific Architectures
Flexible and Scalable Domain-Specific Architectures
Netronome
 
Unifying Network Filtering Rules for the Linux Kernel with eBPF
Unifying Network Filtering Rules for the Linux Kernel with eBPFUnifying Network Filtering Rules for the Linux Kernel with eBPF
Unifying Network Filtering Rules for the Linux Kernel with eBPF
Netronome
 
Massively Parallel RISC-V Processing with Transactional Memory
Massively Parallel RISC-V Processing with Transactional MemoryMassively Parallel RISC-V Processing with Transactional Memory
Massively Parallel RISC-V Processing with Transactional Memory
Netronome
 
Offloading Linux LAG Devices Via Open vSwitch and TC
Offloading Linux LAG Devices Via Open vSwitch and TCOffloading Linux LAG Devices Via Open vSwitch and TC
Offloading Linux LAG Devices Via Open vSwitch and TC
Netronome
 
eBPF Debugging Infrastructure - Current Techniques
eBPF Debugging Infrastructure - Current TechniqueseBPF Debugging Infrastructure - Current Techniques
eBPF Debugging Infrastructure - Current Techniques
Netronome
 
eBPF & Switch Abstractions
eBPF & Switch AbstractionseBPF & Switch Abstractions
eBPF & Switch Abstractions
Netronome
 
eBPF Tooling and Debugging Infrastructure
eBPF Tooling and Debugging InfrastructureeBPF Tooling and Debugging Infrastructure
eBPF Tooling and Debugging Infrastructure
Netronome
 
Demystify eBPF JIT Compiler
Demystify eBPF JIT CompilerDemystify eBPF JIT Compiler
Demystify eBPF JIT Compiler
Netronome
 
eBPF/XDP
eBPF/XDP eBPF/XDP
eBPF/XDP
Netronome
 
P4 Introduction
P4 Introduction P4 Introduction
P4 Introduction
Netronome
 
Host Data Plane Acceleration: SmartNIC Deployment Models
Host Data Plane Acceleration: SmartNIC Deployment ModelsHost Data Plane Acceleration: SmartNIC Deployment Models
Host Data Plane Acceleration: SmartNIC Deployment Models
Netronome
 
The Power of SmartNICs
The Power of SmartNICsThe Power of SmartNICs
The Power of SmartNICs
Netronome
 
DPDK Support for New HW Offloads
DPDK Support for New HW OffloadsDPDK Support for New HW Offloads
DPDK Support for New HW Offloads
Netronome
 

More from Netronome (20)

Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
 
LFSMM AF XDP Queue I-DS
LFSMM AF XDP Queue I-DSLFSMM AF XDP Queue I-DS
LFSMM AF XDP Queue I-DS
 
LFSMM Verifier Optimizations and 1 M Instructions
LFSMM Verifier Optimizations and 1 M InstructionsLFSMM Verifier Optimizations and 1 M Instructions
LFSMM Verifier Optimizations and 1 M Instructions
 
Using Network Acceleration for an Optimized Edge Cloud Server Architecture
Using Network Acceleration for an Optimized Edge Cloud Server ArchitectureUsing Network Acceleration for an Optimized Edge Cloud Server Architecture
Using Network Acceleration for an Optimized Edge Cloud Server Architecture
 
Offloading TC Rules on OVS Internal Ports
Offloading TC Rules on OVS Internal Ports Offloading TC Rules on OVS Internal Ports
Offloading TC Rules on OVS Internal Ports
 
Quality of Service Ingress Rate Limiting and OVS Hardware Offloads
Quality of Service Ingress Rate Limiting and OVS Hardware OffloadsQuality of Service Ingress Rate Limiting and OVS Hardware Offloads
Quality of Service Ingress Rate Limiting and OVS Hardware Offloads
 
ODSA Sub-Project Launch
 ODSA Sub-Project Launch ODSA Sub-Project Launch
ODSA Sub-Project Launch
 
Flexible and Scalable Domain-Specific Architectures
Flexible and Scalable Domain-Specific ArchitecturesFlexible and Scalable Domain-Specific Architectures
Flexible and Scalable Domain-Specific Architectures
 
Unifying Network Filtering Rules for the Linux Kernel with eBPF
Unifying Network Filtering Rules for the Linux Kernel with eBPFUnifying Network Filtering Rules for the Linux Kernel with eBPF
Unifying Network Filtering Rules for the Linux Kernel with eBPF
 
Massively Parallel RISC-V Processing with Transactional Memory
Massively Parallel RISC-V Processing with Transactional MemoryMassively Parallel RISC-V Processing with Transactional Memory
Massively Parallel RISC-V Processing with Transactional Memory
 
Offloading Linux LAG Devices Via Open vSwitch and TC
Offloading Linux LAG Devices Via Open vSwitch and TCOffloading Linux LAG Devices Via Open vSwitch and TC
Offloading Linux LAG Devices Via Open vSwitch and TC
 
eBPF Debugging Infrastructure - Current Techniques
eBPF Debugging Infrastructure - Current TechniqueseBPF Debugging Infrastructure - Current Techniques
eBPF Debugging Infrastructure - Current Techniques
 
eBPF & Switch Abstractions
eBPF & Switch AbstractionseBPF & Switch Abstractions
eBPF & Switch Abstractions
 
eBPF Tooling and Debugging Infrastructure
eBPF Tooling and Debugging InfrastructureeBPF Tooling and Debugging Infrastructure
eBPF Tooling and Debugging Infrastructure
 
Demystify eBPF JIT Compiler
Demystify eBPF JIT CompilerDemystify eBPF JIT Compiler
Demystify eBPF JIT Compiler
 
eBPF/XDP
eBPF/XDP eBPF/XDP
eBPF/XDP
 
P4 Introduction
P4 Introduction P4 Introduction
P4 Introduction
 
Host Data Plane Acceleration: SmartNIC Deployment Models
Host Data Plane Acceleration: SmartNIC Deployment ModelsHost Data Plane Acceleration: SmartNIC Deployment Models
Host Data Plane Acceleration: SmartNIC Deployment Models
 
The Power of SmartNICs
The Power of SmartNICsThe Power of SmartNICs
The Power of SmartNICs
 
DPDK Support for New HW Offloads
DPDK Support for New HW OffloadsDPDK Support for New HW Offloads
DPDK Support for New HW Offloads
 

Recently uploaded

Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 

Recently uploaded (20)

Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 

Efficient JIT to 32-bit Arches

  • 1. © 2018 NETRONOME Jiong Wang Linux Plumbers Conference Vancouver, Nov, 2018 Efficient JIT to 32-bit Arches 1
  • 2. © 2018 NETRONOME Background  ISA specification and impact on JIT compiler  Default code-gen use 64-bit register, ALU64, JMP64  No big impact on 64-bit backends (REX header for x86-64)  Requires extra regs and insns on 32-bit arches.  Programs in real world are 32-bit friendly  Nearly all non-pointer arithmetic are 32-bit and all will be if pointer is 32-bit 2 38% 62% test_l4lb_noinline.c 32-bit ALU 64-bit ALU + JMP 31% 69% test_xdp_noinline.c 32-bit ALU 64-bit ALU + JMP Arch BPF_ALU/BPF_JMP arm emit_alu_r(rd[1], rs[1], true, false, op, ctx); emit_alu_r(rd[0], rs[0], true, true, op, ctx) emit(ARM_CMP_R(rd, rm), ctx); emit(ARM_COND_EQ, ARM_CMP_R(rt, rn), ctx); nfp emit_alu(…reg_both(dst), reg_a(dst), alu_op, reg_b(src)); emit_alu(…reg_both(dst + 1), reg_a(dst + 1), alu_op, reg_b(src + 1)); …
  • 3. © 2018 NETRONOME Solution A - 32-bit subreg ISA  eBPF ISA already defined 32-bit subreg and ALU32 insns, could pass 32-bit semantics from c types down to assembly  A subreg read is always 32-bit read, but write implicitly zeros high 32-bit. Compilers or hand written assembly could be using this  32-bit arches must model implicitly zero extension using extra insns to guarantee correctness. This applies to all ALU32 insn. How could we avoid these extra code-gen? 3 void cal(u32 *a, u32 *b, u64 *c) { u32 sum = *a + *b; *c = sum; } -mattr=+alu32 cal: w1 = *(u32 *)(r1 + 0) w2 = *(u32 *)(r2 + 0) w2 += w1 *(u64 *)(r3 + 0) = r2 exit ALU32 insn but there is exploit of implicit zero extension on w2 by the following store Reference a 32-bit subreg as a 64-bit definition -mattr=+alu32 (since LLVM 7.0) void cal(u32 *a, u32 *b, u32 *c) { u32 sum = *a + *b; *c = sum; } cal: w1 = *(u32 *)(r1 + 0) w2 = *(u32 *)(r2 + 0) w2 += w1 *(u32 *)(r3 + 0) = w2 exit cal: r1 = *(u32 *)(r1 + 0) r2 = *(u32 *)(r2 + 0) r2 += r1 *(u32 *)(r3 + 0) = r2 exit default
  • 4. © 2018 NETRONOME Solution A - 32-bit subreg ISA  Just don’t do zero extension on the DST_REG of ALU32 at all, do them the first time DST_REG is used as 64-bit. Mark the reg to save on later use  Insns have 64-bit register read including: cond BPF_JMP, BPF_ALU, BPF_STX | BPF_DW  cond JMP overhead could be reduced by introducing BPF_JMP32 insn. X86_64 and AArch64 ISA do support this  A large portion of BPF_ALU comes from address calculation, for example calculating address of local variables. Could potentially avoid this using verifier register type info  Without flow analysis, when a instruction is jump destination (start of basic block), need to clear register mark. This could possibly cause quite a few unnecessary code-gen Benchmark Total insn Total cond JMP JMP32 Percentage test_xdp_noinline 999 84 52 61.9% test_l4lb_noinline 569 40 24 60.0%
  • 5. © 2018 NETRONOME Solution A - 32-bit subreg ISA  LLVM might have better knowledge on all these, so another choice is stopping LLVM exploiting implicit zero extension and generate it explicitly  For explicit zero extension, we can’t use existing ALU32 MOV as otherwise we can’t differentiate it with normally MOV which we don’t want to clear high 32bits. We will need new explicit zero-extension instruction BPF_UEXT  But how could verifier safely know the input sequences is compiled from LLVM compiler conforming to such convention is an issue. Could generate ELF tag, however which could be faked, therefore have potential impact on security
  • 6. © 2018 NETRONOME Solution B – Standalone DF Analyzer 6  Work with any input sequence, for example pure ALU64 sequence. Can’t leverage LLVM’s work  Optimistic  Assume all ALU instructions are 32-bit safe initially  Once find one 64-bit register use, pollute all instructions contributed to the value of this register  Pessimistic  Assume all ALU instructions are 64-bit initially  Mark one ALU insn as 32-bit safe only when all the use of its definition are 32-bit
  • 7. © 2018 NETRONOME Solution B – Standalone DF Analyzer  Amount of 32-bit use info decides amount of opportunities  Without +alu32 code-gen, use info come from  B/H/W STORE  CMP-JMP if reg is zero extended  With +alu32, all ALU32 insn could also produce use info, and analysis could finished quicker r1 r1 r2 r2 r2 r2 r3r2 r0 r1 define use0 use1 r1 = *(u32 *)(r1 + 0) r2 = *(u32 *)(r2 + 0) r2 += r1 *(u32 *)(r3 + 0) = r2 exit w2 r2 w1 r1 w1 w1 w2 …w1 r0 w2 define use0 use1 w2 = *(u32 *)(r2 + 0) w1 = *(u32 *)(r1 + 0) w1 += w2 call foo exit void bar(u32 *a, u32 *b, u32 *c) { u32 sum = *a + *b; } *c = sum *c = foo(sum) 7
  • 8. © 2018 NETRONOME Solution B – Standalone DF Analyzer  A stand-alone, drop-and-work implementation organized as a kernel lib  Implementation available as RFC  Could be enhanced with previous CFG infrastructure https://lwn.net/Articles/753724 to become a global DF analyzer struct bpf_du_chain { struct bpf_du_insn *insn; struct bpf_du_chain *next; unsigned int flags; }; r2 r2 r1 r1 r1 r1 w2 …r1 r0 r2 define use0 use1 r2 = *(u32 *)(r2 + 0) r1 = *(u32 *)(r1 + 0) r1 += r2 *(u32 *)(r3 + 0) = r2 exit struct bpf_du_insn { struct bpf_du_chain *chain[2]; unsigned int flags; }; bpf_df_init run DF passes (bpf_df_pass_32bit_safe) bpf_df_init copy DF result to JIT backend • Partition scope, mark places introducing unknown def and use • Scan insn sequence from start to end, build DU chain • Run pre-requisite pass if any • Mark seed 32-bit insn • Iteratively backward propagate until reached fix-point • Copy DF result kept as flags • Release resources 8
  • 9. © 2018 NETRONOME Solution B – Standalone DF Analyzer  Algorithm pseudo code for local analyzer ext_defs[MAX_BPF_REG] = false; // insn define the current value of R act_defs[MAX_BPF_REG] = NULL; bpf_df_init: for_each_insn_in_the_sequence if (cur_insn == jmp or call) dst_insn.flag |= FLAG_IS_JMP_DST for_each_insn_in_the_sequence if (cur_insn.flag & FLAG_IS_JMP_DST) ext_defs[0..MAX_BPF_REG] = true if (cur_insn == call or exit or jmp ) add_unknown_use(act_def[arg_regs]) or act_defs[0..MAX_BPF_REG] = NULL cur_insn.u2d[0] = act_defs[cur_insn.dst_reg] act_defs[cur_insn.dst_reg].d2u.next = cur_insn; cur_insn.u2d[1] = act_defs[cur_insn.src_reg] act_defs[cur_insn.src_reg].d2u.next = cur_insn; bpf_df_pass_32bit_safe: for_each_insn_in_the_sequence if (cur_insn == ST_B/H/W && has_single_use(cur_insn.u2d[1])) cur_insn.u2d[1].flag |= FLAG_IS_32BIT_SAFE else if (cur_insn == ALU32) { if (has_single_use(cur_insn.u2d[0]) cur_insn.u2d[0].flag |= FLAG_IS_32BIT_SAFE if (BPF_X && has_single_use(cur_insn.u2d[1]) cur_insn.u2d[1].flag |= FLAG_IS_32BIT_SAFE } do { changed = false for_each_insn_in_the_sequence changed |= propagate_32bit_safe(cur_insn) while (changed) propagate_32bit_safe: changed = false if (!(insn.flag & FLAG_IS_32BIT_SAFE)) return false if (insn == alu64 or alu32 except right shift) { if (has_single_use(insn.u2d[0]) insn.u2d[0].flag |= FLAG_IS_32BIT_SAFE if (BPF_X && has_single_use(insn.u2d[1]) insn.u2d[1].flag |= FLAG_IS_32BIT_SAFE } return changed; 9
  • 10. © 2018 NETRONOME Solution B – Standalone DF Analyzer 10  Build global analyser on top of CFG infrastructure  Split each register into hi/lo, hi0 ~ lo10, lo0 ~ lo10, do classic live variable analysis  Calculate LiveIn and LiveOut for hi registers, meaning whether high 32-bit are live (used later) when entering and leaving one basic block. For 32-bit safety, we care about Out(s) which could tell us the use in successor blocks  Gen(s) function is complex than classic Gen(s) as we need insn DU chain to propagate from def to use. Can’t simply finish initialize Gen(s) set using insn scan Out(s) = ∪s′ ∊ succ(s) In(s′) In(s) = Gen(s) ∪ (Out(s) - Kill(s))
  • 11. © 2018 NETRONOME Solution B – Standalone DF Analyzer KERNEL/lib/bpf_core_cfg.c KERNEL/lib/bpf_core_df.c KERNEL/lib/bpf_core_opt_32bit.c driver/nfp driver/arm tool/bpftool Any other places inside kernel tree needs eBPF insn analysis Ideally should compile for both userspace and kernel space userspace compilation mode 11
  • 12. © 2018 NETRONOME Solution C - Enhance Verifier DF analyzer  Verifier has a light “DF analyzer”, designed for releasing more path prune opportunities  It is integrated with path walker, collects and processes information while walking insn  Fit scenarios which do not need memorize historical information, but doesn’t fit well otherwise 12 10: r6 = *(u32 *)(r10 - 4) 11: r7 = *(u32 *)(r10 - 8) 12: *(u64 *)(r10 - 16) = r6 13: *(u64 *)(r10 - 24) = r7 14: r6 += 1 15: r7 += r6 16: *(u32 *)(r10 - 28) = r7 17: r3 += r7 18: r4 = r6 19: *(u64 *)(r10 - 32) = r3 20: *(u64 *)(r10 - 40) = r4 State A State B State C R4 Propagation R3 Propagation reg read propagation 10: r6 = *(u32 *)(r10 - 4) 11: r7 = *(u32 *)(r10 - 8) 12: *(u64 *)(r10 - 16) = r6 13: *(u64 *)(r10 - 24) = r7 14: r6 += 1 15: r7 += r6 16: *(u32 *)(r10 - 28) = r7 17: r3 += r7 18: r4 = r6 19: *(u64 *)(r10 - 32) = r3 20: *(u64 *)(r10 - 40) = r4 64-bit usage propagation R4 Propagation R3 Propagation R7 Propagation R6 Propagation
  • 13. © 2018 NETRONOME Solution C - Enhance Verifier DF analyzer  32-bit analysis algorithm based on verifier DF analyzer  On solution B, one insn is 32-bit safe if its definition has 32-bit use only, this requires def<->use dual link  Verifier DF analyzer is based on code path walking, we don’t know whether all uses has been visited, so can only build use->def singular chain  Therefore, the algorithm is using 64-bit “polluting”. Initially all instructions are considered as 32-bit safe, then whenever there is one 64-bit use of one definition, that insn is polluted as 64-bit  64-bit polluting could propagate from definition to use. For example, A = B + C, definition A has 64-bit use, insn A is polluted, uses B and C must also be 64-bit as they are forming A, therefore definition insn of B and C should be polluted as well 13 r7 += r6 r3 += r7 *(u64 *)(r10 - 28) = r3 insn_mark_stack[stack_idx++] = insn_idx; while (stack_idx) { def_idx = insn_mark_stack[--stack_idx]; aux[def_idx].full_ref = true; u2d0 = aux[def_idx].u2d[0]; if (u2d0 >= 0 && !aux[u2d0].full_ref) insn_mark_stack[stack_idx++] = u2d0; u2d1 = aux[def_idx].u2d[1]; if (u2d1 >= 0 && !aux[u2d1].full_ref) insn_mark_stack[stack_idx++] = u2d1; } Need to pollute insns that define r3, r6 and r7 and any one further backward iteratively.
  • 14. © 2018 NETRONOME Solution C - Enhance Verifier DF analyzer  64-bit read in pruned path need to be propagated upward to parent verifier state  But 64-bit read propagation is more complex, it won’t be screened off by a simple write and such propagation needs to be done iteratively on all relevant registers 10: r6 = *(u32 *)(r10 - 4) 11: r4+=r3 12: *(u64 *)(r10 - 16) = r6 13: *(u64 *)(r10 - 24) = r7 14: r4 = 1 14: r6 += r4 16: r5 += r6 17: r4 += r5 18: r5 = 1 19: r4 += r5 20: *(u64 *)(r10 - 40) = r4 normal reg read propagation 64-bit reg read propagation when propagation start when walking the insn when walking the insn, but could start later when one relevant reg identified as 64-bit need historical info no yes need to record all insns relevant to one reg screen off any write to the reg write to the reg, but source shouldn’t be overlapping with the dest affect other regs no yes instructions involved generating r4, registers in these insns are all relevant 14
  • 15. © 2018 NETRONOME Solution C - Enhance Verifier DF analyzer  Historical information  Insns involved with value generation of the reg  Because reg state is changing alone with insn walking, so should keep the state when walking the insn  For example, r4 in insn 14 contributed to the final value of r4 used in insn 20 which is 64-bit, so it should be propagated to parent state. Insn 15 however screened off r14, if we don’t record the reg state when walking insn 14, we will miss this propagation 15 14: r6 += r4 15: r4 = 1 16: r5 += r6 17: r4 += r5 18: r5 = 1 19: r4 += r5 20: *(u64 *)(r10 - 40) = r4 State A State B struct bpf_insn_aux_data { … struct bpf_reg_state_lite reg_state_lite[MAX_BPF_REG]; struct bpf_verifier_state *parent_vstate; s16 u2d[2]; } struct bpf_reg_state_lite { struct bpf_reg_state *parent; u32 live; }; insn walker could visit one insn recursively (disallowed now), or repeatedly, both would break the chain!
  • 16. © 2018 NETRONOME Solution C - Enhance Verifier DF analyzer  Algorithm pseudo code 16 enum reg_arg_type { SRC_OP_0, SRC_OP64_0, SRC_OP_1, SRC_OP64_1, SRC_OP64_IMP, DST_OP, U_DST_OP, DST_OP_NO_MARK, U_DST_OP_NO_MARK }; check_reg_arg across verifier need to pass finer arg_type depending on the position of the src and whether it is 64bit check_reg_arg(env, insn->src_reg, SRC_OP|64_0, insn_idx); check_reg_arg(env, insn->dst_reg, SRC_OP|64_1, insn_idx); dst_type = no overlap between dst_reg and any src ? U_DST_OP_NO_MARK : DST_OP_NO_MARK; check_reg_arg(env, insn->dst_reg, dst_type, insn_idx); check_reg_arg(reg, type, insn_idx): rstate = cur_frame->regs + insn_idx if (type == SRC_*) { link_use_to_def(rstate, type, insn_idx) mark_reg_read(… type == SRC_*64_*); } else { rstate->live |= REG_LIVE_WRITTEN; if (t == U_DST_OP || t == U_DST_OP_NO_MARK) rstate->live |= REG_LIVE_WRITTEN_UNIQUE; … rstate->def_insn_idx = insn_idx; if (insn_idx != INVALID_INSN_IDX) copy_reg_state_lite(insn_aux[insn_idx].regs_lite, regs); } link_reg_to_def(rstate, type, insn_idx): slot_idx = 0 or 1 depending on type def_idx = rstate->def_insn_idx; insn_aux_data[insn_idx].u2d[slot_idx] = def_idx; mark_reg_read: … if (!64bit_read) return; return mark_reg_read64(…) mark_reg_read64: while (parent) { if (writes && state->live & REG_LIVE_WRITTEN_UNIQUE) break; parent->live |= REG_LIVE_READ64; state = parent; parent = state->parent; writes = true; } if (no_mark_insn) return; return mark_insn_64bit(def_idx); mark_insn_64bit: insn_mark_stack[stack_idx++] = insn_idx; while (stack_idx) { def_idx = insn_mark_stack[--stack_idx]; aux[def_idx].full_ref = true; u2d0 = aux[def_idx].u2d[0]; if (u2d0 >= 0 && !aux[u2d0].full_ref) { insn_mark_stack[stack_idx++] = u2d0; mark_reg_read64(aux[def_idx].u2d0.dst_reg, no_mark_insn) } u2d1 = aux[def_idx].u2d[1]; if (u2d1 >= 0 && !aux[u2d1].full_ref) { insn_mark_stack[stack_idx++] = u2d1; mark_reg_read64(aux[def_idx].u2d1.dst_reg, no_mark_insn) } }
  • 17. © 2018 NETRONOME Which approach to go?  Verifier  Better to focus on verification, offer reliable and fast verification, no bothering of optimization  DF analysis based on dynamic insn walking requires recording historical information  Path prune however make it very difficult to collect such information on such scope  Classic CFG + DF analysis  Reliable and has sophisticated algorithms  Redo LLVM’s work, heavy for kernel space  Reuse information passed down from LLVM through 32-bit subreg ISA  Throw the heavy lift work to user space static compiler who is also really good at  Lack of some instructions (JMP32 etc) is causing trouble  Best to follow this approach and enhance eBPF ISA ? 17 Thank you!