1
Embedded Career Day#2:
ARM Architecture and Meltdown/Spectre
Andrew Lukin
2018-02-10
2
In general
3
RISC -- Reduced Instruction Set Computer
● Small set of simple and general instructions
● Fixed-length instructions
● Simpler processor core logic
● Harvard architecture -- physically separate storage and
signal pathways for instructions and data
● Load/Store architecture -- separate instructions for memory access
● Many general-purpose registers, or even register files
4
Evolution of the ARM architecture
5
Registers ARMv7
6
Registers AArch64 (ARMv8)
7
PSTATE at AArch32
8
OS-specific
9
ARMv7 exceptions
10
AArch64 exceptions model
11
Exceptions table
12
Exceptions
● An exception is synchronous if it is generated as a result of execution or attempted
execution of the instruction stream, and the return address provides
details of the instruction that caused it.
● An asynchronous exception is not generated by executing instructions, and the
return address might not always provide details of what caused the exception.
● In the ARMv7-A architecture, the Prefetch Abort, Data Abort and Undefined
Instruction exceptions are separate items.
● In AArch64, all of these events generate a Synchronous abort. The exception
handler may then read the syndrome (ESR_ELn) and fault address (FAR_ELn)
registers to obtain the information needed to distinguish between them.
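As a rough illustration of the last bullet, the sketch below shows how a handler might tell these events apart from the syndrome value alone. It assumes the syndrome has already been read into a 32-bit variable; the Exception Class (EC) field position and the listed EC values follow the ARMv8-A syndrome register layout, while the names classify_esr, esr_make and enum fault_kind are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* The Exception Class (EC) field of ESR_ELx occupies bits [31:26]. */
#define ESR_EC_SHIFT 26u
#define ESR_EC_MASK  0x3Fu

/* A few architecturally defined EC values (not an exhaustive list). */
enum esr_ec {
    EC_UNKNOWN          = 0x00, /* unknown reason, includes undefined instructions */
    EC_SVC_AARCH64      = 0x15, /* SVC executed in AArch64 state */
    EC_INSN_ABORT_LOWER = 0x20, /* instruction abort from a lower exception level */
    EC_INSN_ABORT_SAME  = 0x21, /* instruction abort from the same exception level */
    EC_DATA_ABORT_LOWER = 0x24, /* data abort from a lower exception level */
    EC_DATA_ABORT_SAME  = 0x25  /* data abort from the same exception level */
};

/* Hypothetical classification mirroring the three separate ARMv7-A exceptions. */
enum fault_kind { FAULT_UNDEF, FAULT_PREFETCH_ABORT, FAULT_DATA_ABORT, FAULT_OTHER };

static inline uint32_t esr_ec(uint32_t esr)
{
    return (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
}

static inline enum fault_kind classify_esr(uint32_t esr)
{
    switch (esr_ec(esr)) {
    case EC_UNKNOWN:
        return FAULT_UNDEF;
    case EC_INSN_ABORT_LOWER:
    case EC_INSN_ABORT_SAME:
        return FAULT_PREFETCH_ABORT;
    case EC_DATA_ABORT_LOWER:
    case EC_DATA_ABORT_SAME:
        return FAULT_DATA_ABORT;
    default:
        return FAULT_OTHER;
    }
}

/* Hypothetical helper: build a syndrome value for host-side experiments.
 * The ISS field occupies bits [24:0]. */
static inline uint32_t esr_make(uint32_t ec, uint32_t iss)
{
    assert(ec <= ESR_EC_MASK);
    return (ec << ESR_EC_SHIFT) | (iss & 0x1FFFFFFu);
}
```

In a real handler the value would come from the syndrome register of the exception level that took the exception, and the faulting address for aborts would be read from the corresponding FAR register; neither access is shown here because it requires privileged code.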
13
Interrupts
14
Execution states
15
Execution states - Registers mapping
16
MMU
17
MMU ARMv7
18
AArch64 MMU Support
19
MMU - Caches
● Point of Coherency (PoC) -- the point at which all observers that can
access memory (for example, cores, DSPs, or DMA engines) are guaranteed to
see the same copy of a memory location. Typically, this is the main external
system memory.
● Point of Unification (PoU) -- the point at which the instruction and data
caches and translation table walks of the core are guaranteed to see the
same copy of a memory location.
20
MMU + ASID
21
MMU - Normal memory
● Normal memory -- The processor can re-order, repeat, and merge accesses
to it.
Furthermore, address locations that are marked as Normal can be accessed
speculatively by the processor, so that data or instructions can be read from
memory without being explicitly referenced in the program, or in advance of
the actual execution of an explicit reference. Such speculative accesses can
occur as a result of branch prediction, speculative cache linefills, out-of-order
data loads, or other hardware optimizations.
22
MMU - Device memory
● Device memory types:
○ Device-nGnRnE -- most restrictive (equivalent to Strongly Ordered
memory in the ARMv7 architecture)
○ Device-nGnRE
○ Device-nGRE
○ Device-GRE -- least restrictive
● Gathering or non-Gathering (G or nG) -- whether multiple accesses can be
merged into a single bus transaction for this memory region.
● Re-ordering (R or nR) -- whether accesses to the same device can be
re-ordered with respect to each other.
● Early Write Acknowledgement (E or nE) -- whether an intermediate write
buffer between the processor and the slave device being accessed is allowed
to send an acknowledgement of a write completion.
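The four Device types can be made concrete with a small sketch of how they are encoded in an 8-bit memory attribute, as held in the eight attribute slots of the MAIR_ELx register. The encodings (0x00, 0x04, 0x08, 0x0C) follow the ARMv8-A attribute format; the function names are hypothetical, and the code only manipulates values on the host rather than writing any system register.

```c
#include <assert.h>
#include <stdint.h>

/* Device memory attribute encodings: 0b0000dd00, where dd selects the type. */
enum device_type {
    DEVICE_nGnRnE = 0x00, /* no gathering, no re-ordering, no early ack */
    DEVICE_nGnRE  = 0x04, /* early write acknowledgement allowed */
    DEVICE_nGRE   = 0x08, /* re-ordering and early ack allowed */
    DEVICE_GRE    = 0x0C  /* gathering, re-ordering and early ack allowed */
};

/* An attribute byte describes Device memory when bits [7:4] are zero. */
static inline int attr_is_device(uint8_t attr)
{
    return (attr & 0xF0u) == 0;
}

static inline int device_allows_gathering(uint8_t attr)
{
    assert(attr_is_device(attr));
    return attr == DEVICE_GRE;
}

static inline int device_allows_reordering(uint8_t attr)
{
    assert(attr_is_device(attr));
    return attr == DEVICE_nGRE || attr == DEVICE_GRE;
}

static inline int device_allows_early_ack(uint8_t attr)
{
    assert(attr_is_device(attr));
    return attr != DEVICE_nGnRnE;
}

/* Place an attribute byte into one of the eight slots of a MAIR-style value. */
static inline uint64_t mair_set_attr(uint64_t mair, unsigned index, uint8_t attr)
{
    assert(index < 8);
    mair &= ~((uint64_t)0xFF << (index * 8));
    return mair | ((uint64_t)attr << (index * 8));
}

static inline uint8_t mair_get_attr(uint64_t mair, unsigned index)
{
    assert(index < 8);
    return (uint8_t)(mair >> (index * 8));
}
```

A translation table entry then refers to one of the eight slots by index, so changing the Device type of a region is a matter of pointing its entries at a different slot.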
23
Instructions
24
Instruction changes
25
Instruction changes
26
Instruction changes
27
Multiplication instructions in assembly language
28
ABI
29
ABI for AArch64
30
PCS
31
Meltdown and Spectre
32
Conspiracy theory
Intel:
The vulnerability has been there for 10 to 20 years
But “Flush+Reload” has been known since at least 2014
ARM:
The vulnerability was introduced in the latest, most powerful designs
33
Vulnerable processors
34
Microarchitecture Cortex-A75
35
Microarchitecture Cortex-A75
36
Side-channel attacks: Timing attack
37
Variant 1: bypassing software checking of untrusted values
struct array {
    unsigned long length;
    unsigned char data[];
};
struct array *arr1 = ...; /* small array */
struct array *arr2 = ...; /* array of size 0x400 */
unsigned long untrusted_offset_from_user = ...;
if (untrusted_offset_from_user < arr1->length) {
    unsigned char value;
    value = arr1->data[untrusted_offset_from_user];
    unsigned long index2 = ((value & 1) * 0x100) + 0x200;
    if (index2 < arr2->length) {
        unsigned char value2 = arr2->data[index2];
    }
}
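To see why the second load leaks information, it may help to isolate the index computation from the code above. The sketch below is a host-side illustration only: probe_index reproduces the expression for index2, and leaked_bit is a hypothetical helper showing how an attacker who learns which of the two arr2 offsets was cached recovers one bit of the speculatively read value. No speculation or cache timing is performed here.

```c
#include <stdint.h>

/* Same expression as index2 in the slide: the low bit of the speculatively
 * read byte selects one of two offsets, 0x200 or 0x300. The offsets are 0x100
 * bytes apart, so they fall in different cache lines. */
static inline unsigned long probe_index(unsigned char value)
{
    return ((value & 1ul) * 0x100) + 0x200;
}

/* Hypothetical attacker-side step: given the arr2 offset that was observed
 * to be cached (for example, by timing accesses), recover the leaked bit. */
static inline int leaked_bit(unsigned long cached_index)
{
    return cached_index == 0x300;
}
```

Repeating the attack with a different bit of value selected would, under the same assumptions, recover the remaining bits one at a time.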
38
Variant 2: branch target injection
39
Variant 3: using speculative reads of inaccessible data
The perturbation of the cache by the LDR X5, [X6,X3] (line 7) can subsequently be measured by the EL0 code for
different values of the shift amount imm (line 5). This gives a mechanism to establish the value of the EL1 data at
the address pointed to by X4, so leaking data that should not be accessible to EL0 code.
1 LDR X1, [X2] ; arranged to miss in the cache
2 CBZ X1, over ; This will be taken but
3 ; is predicted not taken
4 LDR X3, [X4] ; X4 points to some EL1 memory
5 LSL X3, X3, #imm
6 AND X3, X3, #0xFC0
7 LDR X5, [X6,X3] ; X6 is an EL0 base address
8 over
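The role of the shift amount and the 0xFC0 mask in lines 5 and 6 can be sketched on the host as plain arithmetic. The mask keeps bits [11:6], which selects one of 64 offsets spaced 64 bytes apart; treating each offset as a separate cache line assumes 64-byte lines. The names probe_offset and probe_line are hypothetical, and the code models only the address computation, not the speculative execution or the timing measurement.

```c
#include <assert.h>
#include <stdint.h>

#define PROBE_MASK       0xFC0u /* bits [11:6], as in the AND instruction */
#define CACHE_LINE_SHIFT 6u     /* assumption: 64-byte cache lines */

/* Offset added to the EL0 base address X6 for a given secret and shift. */
static inline uint64_t probe_offset(uint64_t secret, unsigned imm)
{
    assert(imm < 64);
    return (secret << imm) & PROBE_MASK;
}

/* Which of the 64 probe lines the offset falls into. */
static inline unsigned probe_line(uint64_t offset)
{
    return (unsigned)(offset >> CACHE_LINE_SHIFT);
}
```

With imm set to 6 the line number equals the low six bits of the secret, and with imm set to 0 it equals bits [11:6]; this is why measuring the cache for different values of imm exposes different parts of the EL1 data.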
40
Variant 3a: using speculative reads of system registers
In much the same way as with the main Variant 3, in a small number of Arm implementations, a processor that
speculatively performs a read of a system register that is not accessible at the current exception level, will actually
access the associated system register (provided that it is a register that can be read without side-effects).
    LDR X1, [X2]        ; arranged to miss in the cache
    CBZ X1, over        ; This will be taken
    MRS X3, TTBR0_EL1
    LSL X3, X3, #imm
    AND X3, X3, #0xFC0
    LDR X5, [X6,X3]     ; X6 is an EL0 base address
over
This can be used to read cryptographic keys from system registers if the ARM Pointer
Authentication feature (ARMv8.3) is in use.
41
Links
1. Programmer’s Guide for ARMv8-A (DEN0024A)
2. ARM® Architecture Reference Manual, ARMv8, for ARMv8-A architecture profile (DDI0487B_a_armv8.pdf)
3. ARM® Architecture Reference Manual, ARMv7-A and ARMv7-R edition (DDI0406C_C_arm_architecture_reference_manual.pdf)
4. http://meltdownattack.com
5. http://spectreattack.com
6. https://developer.arm.com/support/security-update
42
Questions?
43
Thank you
Andrew Lukin
Sr Embedded Developer
andrii.lukin@globallogic.com
+380-95-303-43-76
