# Certifying (RISC) Machine Code Safe from Aliasing (OpenCert 2013)

Slide presentation for Certifying (RISC) Machine Code Safe from Aliasing, presented at OpenCert 2013, Madrid. See http://www.academia.edu/3244313/Certifying_Machine_Code_Safe_from_Hardware_Aliasing_RISC_is_not_necessarily_risky.

Published in: Technology, Education

### Certifying (RISC) Machine Code Safe from Aliasing (OpenCert 2013)

1. 1. Certifying (RISC) Machine Code Safe from Aliasing Peter T. Breuer University of Birmingham, UK Jonathan P. Bowen London South Bank University, UK
2. 2. Little and Large Problem ● Small arithmetic unit, embedded processor – 40 bit arithmetic ● Large memory unit – 64 bit addressing ● What do we do with the extra wires?
3. 3. Hardware Aliasing ● What happens to the extra wires? – Depends on the hardware ● 4 + 0xfffffffffffffffc = 0x0000000000000000 or 0xfffff00000000000 ? ● Both mean 0 – If you use arithmetic to calculate address 0 ● Sometimes you get the 0 you want ● Sometimes not!
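The two outcomes on slide 3 can be reproduced with a small Python sketch. The function `alu_add`, its parameters, and both "hardware" behaviours are assumptions made for illustration; the slides do not say how any particular processor treats the extra wires. With 44 low bits the second behaviour yields the `0xfffff00000000000` value shown on the slide; other widths give a different upper-bit pattern.

```python
ADDRESS_BITS = 64  # width of the memory unit's addresses (from slide 2)


def alu_add(a, b, alu_bits, carry_into_upper):
    """Add two address words on an ALU that only computes alu_bits low bits.

    Both behaviours below are hypothetical models of the 'extra wires'.
    """
    full_mask = (1 << ADDRESS_BITS) - 1
    low_mask = (1 << alu_bits) - 1
    if carry_into_upper:
        # Model A (assumed): the carry ripples through all 64 wires.
        return (a + b) & full_mask
    # Model B (assumed): only the low bits are added, and the upper bits
    # of the second operand pass through unchanged.
    low = (a + b) & low_mask
    upper = b & full_mask & ~low_mask
    return upper | low


a, b = 4, 0xFFFFFFFFFFFFFFFC
result_a = alu_add(a, b, 44, carry_into_upper=True)
result_b = alu_add(a, b, 44, carry_into_upper=False)
print(hex(result_a), hex(result_b))
```

Both results agree on the low bits, so the arithmetic unit sees 0 in each case, but a memory unit that looks at all 64 bits sees two different addresses.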
4. 4. Also happens in KPU ● A KPU is an encrypted processor – Instead of 4 - 4 = 0 – Does 99900 - 99900 = 78763298 ● Homomorphism conditions on encrypted arithmetic guarantee correct behaviour – Real encryption is always 1-many ● The encoding of 0 is 9896861 ● 99900 - 99900 = 78763298 ≠ 9896861 ● Another encoding of 0 is 78763298 – Encrypted arithmetic gives different result ● Depending on how you do the calculation
5. 5. Problem ● How to check a program is safe from hardware aliasing ● Where 'hardware aliasing' means that arithmetic on addresses does not always give the same result. – Trust only exactly the same calculation – Because 4 - 4 != 0 – It's 'equivalent' to 0, not identical!
6. 6. Can imagine in both cases ... ● Values have invisible extra bits ● 42.1101101 ● Represent different encodings of '42' ● Arithmetic ignores but mutates the extra bits ● 42.1101101 + 42.1100001 = 84.0110110 ● Memory unit is sensitive to invisible extra bits ● Can't see just '42'. ● Needs loving care from programmer
7. 7. How to deal with hardware aliasing ● The bad program (left) returns a different alias of SP to the caller ● Bad: Subroutine foo: SP -= 32 # 8 local vars …code ... SP += 32 # destroy frame return ● Good: Subroutine foo: GP = SP SP -= 32 …code ... SP = GP return
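The difference between the two versions of `foo` on slide 7 can be sketched in Python using the "invisible extra bits" picture from slide 6. Everything here is a hypothetical model: the `Word` and `Memory` classes, the 7 hidden bits, and in particular the scrambling formula inside `add` are assumptions chosen only so that arithmetic changes the hidden bits.

```python
from dataclasses import dataclass

HIDDEN_BITS = 7  # assumed number of invisible extra bits


@dataclass(frozen=True)
class Word:
    value: int   # the part the programmer sees, e.g. 42
    hidden: int  # the invisible extra bits

    def add(self, n):
        # Assumed behaviour: the visible value is correct, but the hidden
        # bits are scrambled in some hardware-dependent way.
        scrambled = (self.hidden * 5 + n + 1) % (1 << HIDDEN_BITS)
        return Word(self.value + n, scrambled)


class Memory:
    """Memory keyed on the whole word, so it is sensitive to hidden bits."""

    def __init__(self):
        self.cells = {}

    def store(self, addr, data):
        self.cells[(addr.value, addr.hidden)] = data

    def load(self, addr):
        # A miss returns None here; real hardware would read another cell.
        return self.cells.get((addr.value, addr.hidden))


def foo_bad(sp):
    sp = sp.add(-32)  # SP -= 32
    sp = sp.add(32)   # SP += 32: same visible value, different alias
    return sp


def foo_good(sp):
    gp = sp           # GP = SP
    sp = sp.add(-32)  # SP -= 32
    sp = gp           # SP = GP: a copy, so exactly the original alias
    return sp


mem = Memory()
sp = Word(1000, 0b1101101)
mem.store(sp, "caller data")
print(mem.load(foo_good(sp)), mem.load(foo_bad(sp)))
```

In this model the caller finds its data again after `foo_good`, but not after `foo_bad`, even though both return a stack pointer whose visible value is 1000.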
8. 8. Regard machine code as compiled from Stack Machine control language ● Good code: cspt GP # copy stack pointer to GP push 32 # make 32B space on stack … rspf GP # restore stack pointer from GP return
9. 9. What makes that SM code safe? ● No access outside the current frame – The stack access commands are ● get 10 gp # 10th stack cell contents.. ● put 10 gp # .. transfer to/from reg gp – If all access offsets are within the current frame's range ● Only one way to access stack content.. ● By offset from current stack pointer – Can only make new frame, not shift sp ● push 32 – Can only return sp to value saved earlier ● cspt gp … rspf gp
10. 10. Heap access ● Deal with that later! – Look for array and string treatment in text
11. 11. Verifying SM code ● Means verifying that all stack accesses are within the current frame boundary ● That's so easy! Check n in 'get n r'. ● But we have machine code, not SM code!
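The check described on slide 11 can be sketched as a single pass over the stack machine code. This is a minimal Python sketch under stated assumptions: the tuple encoding of commands is invented for illustration, `push n` is taken to open a current frame of size `n`, offsets and frame sizes are assumed to use the same units, and the width of each access is ignored.

```python
def check_frame_accesses(program):
    """Return the (index, command) pairs whose offset leaves the current frame.

    `program` is a list of tuples such as ("push", 32) or ("get", 10, "gp").
    The encoding is hypothetical; only the command names come from the slides.
    """
    frame_size = 0  # no frame has been made yet
    errors = []
    for i, (op, *args) in enumerate(program):
        if op == "push":
            frame_size = args[0]  # a new current frame of this size
        elif op in ("get", "put"):
            offset = args[0]
            if not 0 <= offset < frame_size:
                errors.append((i, (op, *args)))
    return errors


safe = [("cspt", "gp"), ("push", 32), ("get", 10, "gp"),
        ("put", 10, "gp"), ("rspf", "gp"), ("return",)]
unsafe = [("cspt", "gp"), ("push", 32), ("get", 40, "gp"),
          ("rspf", "gp"), ("return",)]
print(check_frame_accesses(safe), check_frame_accesses(unsafe))
```

The sketch flags `get 40 gp` in the second program because offset 40 lies outside the 32B frame.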
12. 12. Machine code looks like this ● mov gp sp # cspt gp addi sp sp -32 # push 32 … mov sp gp # rspf gp jr ra # return ● Is it compiled from safe SM code?
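The correspondence between machine instructions and stack machine commands on slide 12 can be sketched as a pattern match. This Python sketch assumes a tuple encoding of instructions and assumes that `sp` is the stack pointer register and `ra` the return address register; the function `disassemble` is hypothetical and covers only the four instructions shown on the slide.

```python
def disassemble(instr, sp="sp"):
    """Map one machine instruction to a stack machine command, or None.

    `instr` is a tuple such as ("mov", "gp", "sp"), destination first.
    The choice of stack pointer register is an assumption passed in as `sp`.
    """
    op, *args = instr
    if op == "mov" and args[1] == sp and args[0] != sp:
        return ("cspt", args[0])   # copy stack pointer to a register
    if op == "mov" and args[0] == sp and args[1] != sp:
        return ("rspf", args[1])   # restore stack pointer from a register
    if op == "addi" and args[0] == sp and args[1] == sp and args[2] < 0:
        return ("push", -args[2])  # make a new frame of that many bytes
    if op == "jr" and args == ["ra"]:
        return ("return",)
    return None                    # no safe stack machine reading found


machine_code = [("mov", "gp", "sp"), ("addi", "sp", "sp", -32),
                ("mov", "sp", "gp"), ("jr", "ra")]
print([disassemble(i) for i in machine_code])
```

An instruction such as `addi sp sp 32` gets no reading in this sketch, which mirrors the restriction that safe code can only make a new frame or restore a saved stack pointer, not shift `sp` arithmetically.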
13. 13. To prove m/c safe ● Apply Hoare-like rules of reasoning – Whose names are the SM code that the m/c is supposed to be compiled from ● Requires a human being to choose the rule – Or an automaton to search the solution space – Either way, it's deduction-guided disassembly
14. 14. Example ● Think about a 32B current frame { sp=c32 !10; (10)=x } ld gp 10(sp) [get 10 gp] {sp=c32 !10; (10)=gp=x} ● 'c32 !10' means pointer to 32B – Already written at offset 10 ● (10)=x means stack cell 10 has an x-thing ● Machine code is 'ld gp 10(sp)' – Load reg gp from offset 10 from stack ptr ● Name of the rule is 'get 10 gp'
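The example triple on slide 14 can be mimicked by a function that maps a precondition to a postcondition. This Python sketch is an illustration only: the dictionary layout of the state and the function `rule_get` are hypothetical, and the side conditions (offset inside the frame, cell already written) are one reading of the 'c32 !10' annotation rather than the authors' exact rule.

```python
def rule_get(state, offset, reg):
    """Apply a 'get offset reg' style rule; return the postcondition.

    state["sp"] is (frame size, offsets already written), modelling 'c32 !10'.
    state["stack"] and state["regs"] map offsets and registers to their contents.
    """
    size, written = state["sp"]
    if not 0 <= offset < size:
        raise ValueError(f"offset {offset} outside {size}B frame")
    if offset not in written:
        raise ValueError(f"stack cell {offset} has not been written")
    post = {
        "sp": state["sp"],                # sp keeps its type
        "stack": dict(state["stack"]),    # stack contents are unchanged
        "regs": dict(state["regs"]),
    }
    post["regs"][reg] = state["stack"][offset]  # reg now holds the x-thing
    return post


# { sp=c32 !10; (10)=x }  ld gp 10(sp)  { sp=c32 !10; (10)=gp=x }
pre = {"sp": (32, frozenset({10})), "stack": {10: "x"}, "regs": {}}
post = rule_get(pre, 10, "gp")
print(post)
```

Choosing which rule to apply to an instruction amounts to choosing its disassembly, so a failed side condition here corresponds to rejecting that reading of the machine code.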
15. 15. Types ● Logic is based on stack machine model – manipulates types in register/stack/heap ● c32 – pointer to stack frame of size 32 – Only access by bounded offset from ptr ● u10 – array of size 10 on heap – Can only access by offset from fixed base ● c1 – string accessed in increments of 1 – String is like a stack of frames size 1 – Stepping up 'pops one off the stack' – Access within 'current frame' only
16. 16. Typing ● Milner typing – Assign type variables to every register and stack position within current frame – Calculate effect of instructions – Ambiguous modulo assignment of rule ● Equals dis-assembly of instruction ● Proved – soundness – Assigned types say what really happens
17. 17. Other Proved Things ● Termination – Milner algorithm terminates – With a typing, if one exists, errors if not ● Uniqueness – The type found is unique most general ● For a given annotation ● There are at most 32 valid annotations – Differ in position of stack pointer register
18. 18. Conclusion 1. Disassemble machine code • Human activity 2. Apply Milner typing • Includes stack machine bounds verification • Automated activity 3. Certify m/c as hardware alias safe ● Steps 1 & 2 can be mixed/simultaneous ● Inference-guided disassembly 4. Apply to assembler in Linux kernel