Meltdown & Spectre
Mitigate the unmitigable
Based on
● Meltdown: Reading Kernel Memory from User Space
● Spectre Attacks: Exploiting Speculative Execution
HELLO!
We are
2
Sergiy
Shevchenko
Marco
Cipriano
● What are they?
○ Vulnerabilities in most modern CPUs, due to poorly understood interactions between speculative execution and side effects
● When?
○ Discovered in June/July 2017; made public on 3 January 2018, when Google’s Project Zero (among others) published the details
● Effects?
○ Need to modify:
■ CPU Memory
■ OS Virtual memory handling
■ Compilers
■ Hypervisors
■ Browsers
○ Real solution: change CPU DESIGN
Widely considered among the most serious hardware vulnerabilities in computer history
Today’s Agenda
1. Speculative Execution
2. Meltdown attack
3. Spectre attack
4
1.
Speculative
Execution
Let’s start from the beginning
Presented by
Sergiy Shevchenko
Basic memory model of modern CPU
❑ Each CPU has 3 levels of cache*, backed by DRAM:
❏ L1 with access latency ~ 5 cycles (1.2 ns)
❏ L2 with access latency ~ 10 cycles (4.2 ns)
❏ L3 with access latency ~ 40 cycles (20 ns)
❏ DRAM (100 ns)
*Modern Core i7 / Xeon (server edition)
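The gap between these latencies is what the attacks later in the deck measure. As a rough sketch, the expected access time can be estimated from the slide's latencies plus per-level hit rates. The hit rates used in the usage note are assumptions for illustration only, and the model treats each listed latency as the total time for a request served at that level.

```python
# Latencies in nanoseconds, taken from the slide above.
LATENCY_NS = {"L1": 1.2, "L2": 4.2, "L3": 20.0, "DRAM": 100.0}


def average_access_time_ns(hit_rates):
    """Expected latency when the levels are tried in order L1, L2, L3, DRAM.

    hit_rates maps "L1", "L2", "L3" to the probability that a request
    reaching that level hits there; DRAM always answers.
    """
    expected = 0.0
    reach = 1.0  # probability that a request gets this far down
    for level in ("L1", "L2", "L3"):
        hit = hit_rates[level]
        expected += reach * hit * LATENCY_NS[level]
        reach *= 1.0 - hit
    return expected + reach * LATENCY_NS["DRAM"]
```

With assumed hit rates of 90% (L1), 50% (L2) and 50% (L3), `average_access_time_ns({"L1": 0.9, "L2": 0.5, "L3": 0.5})` comes out at about 4.29 ns, while a single access that goes all the way to DRAM costs 100 ns. That difference is easy to detect with a timer.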
6
Fetch-decode-execute parallelism
❑ On a non-pipelined CPU, while an instruction is being processed at one stage, the other stages sit idle
❑ On a pipelined CPU, all the stages work in parallel:
❏ While the 1st instruction is being decoded by the Decoder Unit, the 2nd instruction is already being fetched by the Fetch Unit. It takes only 5 clock cycles to execute 2 instructions on a pipelined CPU.
Image source: https://stackpointer.io/hardware/how-pipelining-improves-cpu-performance/113/
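The cycle count above follows from a simple formula. This is a minimal sketch assuming a 4-stage pipeline (the slide does not state the stage count; 4 is the value that reproduces its "5 cycles for 2 instructions"), one cycle per stage and no stalls.

```python
def cycles_non_pipelined(instructions, stages):
    """Each instruction runs through every stage before the next one starts."""
    return instructions * stages


def cycles_pipelined(instructions, stages):
    """The first instruction fills the pipeline; each later one finishes one cycle after."""
    if instructions == 0:
        return 0
    return stages + instructions - 1
```

Under these assumptions, `cycles_pipelined(2, 4)` gives 5 cycles against 8 for `cycles_non_pipelined(2, 4)`, and the advantage grows with longer instruction streams.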
7
Intel Skylake CPU Microarchitecture
❑ Simplified illustration of a single core of Intel’s Skylake microarchitecture
❑ Instructions are decoded into µOPs and executed out of order in the execution engine by individual execution units
Figure from the Meltdown paper by Lipp et al.
https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-lipp.pdf
8
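Out-of-order processing
1 op per cycle (6 clock cycles):
❑ t = a + b
❑ u = c + d
❑ v = e + f
❑ w = v + g
❑ x = h + i
❑ y = j + k
2 ops per cycle (3 clock cycles, ideal case):
❑ t = a + b, u = c + d
❑ v = e + f, w = v + g
❑ x = h + i, y = j + k
Catch: w = v + g needs the result of v = e + f, so that pair cannot really issue in the same cycle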
Out-of-order processing
In order, 2 ops per cycle (4 clock cycles, because w has to wait for v):
❑ t = a + b, u = c + d
❑ v = e + f
❑ w = v + g, x = h + i
❑ y = j + k
Out of order, 2 ops per cycle (3 clock cycles):
❑ t = a + b, u = c + d
❑ v = e + f, y = j + k
❑ w = v + g, x = h + i
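The two schedules can be reproduced with a toy issue model. This is only a sketch: it assumes every operation takes one cycle, that two execution units are available, and that the out-of-order engine picks the oldest ready instructions first. The function and variable names are invented for the illustration.

```python
# Each entry: (destination, source operands), in program order.
PROGRAM = [
    ("t", ("a", "b")),
    ("u", ("c", "d")),
    ("v", ("e", "f")),
    ("w", ("v", "g")),
    ("x", ("h", "i")),
    ("y", ("j", "k")),
]


def schedule(program, width=2, out_of_order=False):
    """Return a list of cycles, each a list of the destinations issued in it."""
    produced = {dest for dest, _ in program}
    done = set()            # results available from earlier cycles
    pending = list(program)
    cycles = []
    while pending:
        issued = []
        for instr in list(pending):
            _, sources = instr
            # A source is ready if it is a plain input or was computed earlier.
            ready = all(s in done or s not in produced for s in sources)
            if ready and len(issued) < width:
                issued.append(instr)
            elif not out_of_order:
                break       # in-order issue stalls at the first blocked instruction
        if not issued:
            raise ValueError("a dependency can never be satisfied")
        for instr in issued:
            pending.remove(instr)
        done.update(dest for dest, _ in issued)
        cycles.append([dest for dest, _ in issued])
    return cycles
```

`schedule(PROGRAM)` needs 4 cycles and `schedule(PROGRAM, out_of_order=True)` needs 3. The out-of-order pairing differs slightly from the slide (oldest-first pairs v with x rather than y), but the cycle count is the same.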
Fast in a straight line
not so good on corners
Branch prediction
13
❑ Pipelining, superscalar and out-of-order execution only help if you know which instructions are coming next
❑ Conditionals are a problem: we don’t know what to load into the pipeline until the condition of the IF is resolved
Branch prediction helps: let's guess
❑ If the guess is right, great
❑ If the guess is wrong, flush the pipeline
Branch Prediction Is Not A Solved Problem: Measurements, Opportunities, and Future Directions
https://arxiv.org/pdf/1906.08170.pdf
[Figure: a branch with two possible successor blocks, B1 and B2]
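One common textbook way to "guess" is a 2-bit saturating counter per branch. The slides do not say which scheme real CPUs use, and modern predictors are far more elaborate, so treat this as an illustration of the idea only. The class and function names are invented.

```python
class TwoBitPredictor:
    """Saturating counter: states 0-1 predict not taken, states 2-3 predict taken."""

    def __init__(self, state=1):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step towards the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor=None):
    """Feed a sequence of real branch outcomes and count the wrong guesses."""
    predictor = predictor or TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses
```

A loop branch taken nine times and then not taken, `count_mispredictions([True] * 9 + [False])`, is mispredicted only twice. Because the predictor learns from history, it can also be trained deliberately, which is what the setup phase of Spectre relies on.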
Branch prediction + Speculative execution
14
❑ Cache misses cause long delays while data is fetched
❑ Speculative execution:
❏ Execute instructions on the predicted branch
❏ If the prediction was right - great!
❏ If the prediction was wrong - undo all architectural effects of running in speculative mode and flush the pipeline
Branch Prediction Is Not A Solved Problem: Measurements, Opportunities, and Future Directions
https://arxiv.org/pdf/1906.08170.pdf
Example of branch prediction with speculation
    C = A + B;
    E = C + D;
    G = E + F;
    if (G == 0) {
        J = H + I;
        L = J + K;
        N = L + M;
    }
Without speculation, nothing can be reordered: the body of the if cannot start until G is known
Example of branch prediction with speculation
Original code:
    C = A + B;
    E = C + D;
    G = E + F;
    if (G == 0) {
        J = H + I;
        L = J + K;
        N = L + M;
    }
With speculation:
    ALU 1              ALU 2
    C = A + B;         _J = H + I;
    E = C + D;         _L = _J + K;
    G = E + F;         _N = _L + M;
    if (G == 0) {                       // ALU 1
        J = _J; L = _L; N = _N;
    }
Only make results available if G equals 0
Virtual memory
❑ Virtual memory is divided into kernel space and user space
❑ The page table handles the mappings to physical memory
❑ Page-table entries carry a permission bit that marks kernel-space pages as privileged
❑ If user code tries to access kernel space, a trap occurs
[Figure: kernel space and user space mapped to physical memory through the page table]
2.
Meltdown attack
Explained
Presented by
Sergiy Shevchenko
Meltdown Abstract
19
Meltdown exploits side effects of out-of-order execution on modern processors to read arbitrary kernel-memory locations, including personal data and passwords.
... it allows an adversary to read memory of other processes or virtual machines in the cloud without any permissions or privileges.
CVE-2017-5754
Meltdown attack: side channel
Original code:
    C = A + B;
    E = C + D;
    G = E + F;
    Z = G + Y;
    if (Z == 0) {                     // the predictor thinks this is true
        J = kernel_mem[addr];         // normally a segmentation fault, but not when run speculatively
        L = J & 0x01;
        N = L * 4096;
        M = user_mem[N];
    }
As executed with speculation:
    ALU 1              ALU 2
    C = A + B;         _J = kernel_mem[addr];
    E = C + D;         _L = _J & 0x01;
    G = E + F;         _N = _L * 4096;
    Z = G + Y;         _M = user_mem[_N];
    if (Z == 0) {
        J = _J; L = _L; N = _N; M = _M;   // never executed, so _J, _L and _N are never revealed
    }
    t1 = get_time();
    V = user_mem[0];
    t2 = get_time();
    delta = t2 - t1;
If delta is < 10 cycles, user_mem[0] was already cached by the speculative load, so the lowest bit of kernel_mem[addr] is 0
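The mechanism can be mimicked with a toy model that keeps architectural state (registers) and microarchitectural state (which lines are cached) apart. This is a simulation only, not an exploit: the `ToyCPU` class, the 5 and 300 cycle latencies and the contents of `kernel_mem` are all invented for the illustration.

```python
PAGE = 4096
HIT_CYCLES, MISS_CYCLES = 5, 300   # illustrative latencies, not measurements


class ToyCPU:
    def __init__(self, kernel_mem):
        self.kernel_mem = kernel_mem
        self.cached_lines = set()   # microarchitectural state
        self.registers = {}         # architectural state

    def load_user(self, offset):
        """Architectural load from user_mem; returns the simulated latency."""
        latency = HIT_CYCLES if offset in self.cached_lines else MISS_CYCLES
        self.cached_lines.add(offset)
        return latency

    def speculate_gadget(self, addr):
        """Run the body of the if transiently, then squash it."""
        shadow = {}
        shadow["J"] = self.kernel_mem[addr]   # no fault while speculative
        shadow["L"] = shadow["J"] & 0x01
        shadow["N"] = shadow["L"] * PAGE
        self.cached_lines.add(shadow["N"])    # user_mem[N] is pulled into the cache
        # Misprediction detected: the shadow registers are dropped and nothing
        # reaches self.registers, but the cache line stays where it is.


def leak_low_bit(cpu, addr):
    cpu.cached_lines.clear()        # start from a flushed cache
    cpu.speculate_gadget(addr)
    delta = cpu.load_user(0)        # time the access to user_mem[0]
    return 0 if delta < 10 else 1
```

For `cpu = ToyCPU([0x42, 0x43])`, `leak_low_bit(cpu, 0)` yields 0 and `leak_low_bit(cpu, 1)` yields 1, while `cpu.registers` stays empty: the secret never becomes architecturally visible, yet its lowest bit is recovered from timing alone.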
Meltdown attack performance
21
❑ Core i7-8700K:
❏ 582 KB/s, error rate 0.003%
❑ Core i7-6700K:
❏ 569 KB/s, error rate 0.002%
❑ Intel Xeon E5-1630:
❏ 491 KB/s, error rate 10.7%
* Meltdown Main Paper
https://meltdownattack.com/meltdown.pdf
Meltdown mitigation
22
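❑ Why do operating systems map kernel memory into the address space of every process?
❏ ... which is what makes Meltdown possible ...
❑ ... because of the TLB (translation lookaside buffer), the cache for page-table translations: keeping the kernel mapped avoids switching page tables, and losing cached translations, on every system call
❏ The page table itself is stored in memory, so accessing it costs a lot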
* Meltdown Main Paper
https://meltdownattack.com/meltdown.pdf
TLB
23
❑ A translation lookaside buffer (TLB) is a memory cache used to reduce the time taken to access a user memory location; it is part of the MMU
❑ If you access a page that is not in the TLB, the page table has to be walked to find where the physical page is
❏ Cost is roughly 300-600 cycles!
* Meltdown Main Paper
https://meltdownattack.com/meltdown.pdf
[Figure: kernel-space and user-space addresses translated to physical memory by the MMU, with the TLB as its cache]
KPTI - Kernel Page Table Isolation
24
❑ While running in user mode, keep only user-space memory mappings in the page table
❏ So Meltdown “physically” cannot read kernel memory: there is no mapping to read through
❑ Some additional kernel code handles switching between the user-space and kernel-space page tables
KPTI - Kernel Page Table Isolation
25
❑ Completely mitigates Meltdown
❑ Changing page tables requires flushing the TLB
❑ Big impact on performance (this happens on every system call!)
❑ In some cases > 50% slowdown
❑ Newer Intel CPUs tag TLB entries with a Process-Context Identifier (PCID)
❏ This allows flushing only part of the TLB
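Why tagging helps can be seen in a toy TLB that counts page walks. This is a rough sketch under strong assumptions: the TLB has unlimited capacity, each side touches a single page, and the PCID values and page names are hypothetical.

```python
class ToyTLB:
    def __init__(self, use_pcid):
        self.use_pcid = use_pcid
        self.entries = set()      # (pcid, virtual_page) pairs
        self.current_pcid = 0
        self.misses = 0

    def switch_page_table(self, pcid):
        if self.use_pcid:
            self.current_pcid = pcid   # entries survive, told apart by their tag
        else:
            self.entries.clear()       # untagged TLB: everything is flushed
            self.current_pcid = 0

    def access(self, virtual_page):
        key = (self.current_pcid, virtual_page)
        if key not in self.entries:
            self.misses += 1           # a page walk is needed
            self.entries.add(key)


USER, KERNEL = 1, 2   # hypothetical PCID values


def run_syscalls(tlb, count):
    """Alternate between user code and a system call, as KPTI forces on each call."""
    for _ in range(count):
        tlb.switch_page_table(USER)
        tlb.access("user_page")
        tlb.switch_page_table(KERNEL)
        tlb.access("kernel_page")
    return tlb.misses
```

Over 100 simulated system calls, `run_syscalls(ToyTLB(use_pcid=False), 100)` counts 200 page walks, while `run_syscalls(ToyTLB(use_pcid=True), 100)` counts only 2. Real workloads sit somewhere in between, since a real TLB is small and shared.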
3.
Spectre attack
The really hard part
Presented by
Marco Cipriano
Spectre Abstract
27
❏ Spectre Variant 1: Bounds Check Bypass
❏ Exploiting Conditional Branches
❏ CVE-2017-5753
❏ Spectre Variant 2: Branch Target Injection
❏ Exploiting Indirect Branches
❏ CVE-2017-5715
Spectre attacks involve inducing a victim to speculatively perform operations that would not occur during correct program execution and which leak the victim’s confidential information via a side channel to the adversary
Differences
28
Concepts
29
Spectre attack: Overview
30
❑ Setup phase: mistrain the CPU’s branch predictor
❑ Speculative execution of transient instructions that leak sensitive information into a side channel
❑ Recovering the sensitive information from the side channel using:
❏ Flush + Reload
❏ Evict + Reload
Yarom, Y., and Falkner, K. FLUSH+RELOAD: A High Resolution, Low Noise, L3 Cache Side-Channel Attack. In USENIX Security Symposium (2014).
Variant 1: Bounds Check Bypass
    array a = …;                    // size 400
    array b = …;                    // size 512
    offset = …;                     // user input
    if (offset < size(a)) {
        v = a[offset];
        i = (v & 0x01) * 4096;
        x = b[i];
    }
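The three phases (mistrain, transient out-of-bounds read, recovery through the cache) can be walked through in a toy model. This is a simulation, not an exploit: the `ToyVictim` class, the "two in-bounds calls train the predictor" rule and the secret bytes are assumptions made for the illustration.

```python
PAGE = 4096


class ToyVictim:
    def __init__(self, a, secret):
        self.size_a = len(a)
        self.memory = list(a) + list(secret)   # the secret sits right after a
        self.cached_lines = set()              # lines of b currently in the cache
        self.taken_streak = 0                  # crude branch predictor state

    def predicts_in_bounds(self):
        return self.taken_streak >= 2          # assumption: two hits train it

    def call(self, offset):
        in_bounds = offset < self.size_a
        if in_bounds or self.predicts_in_bounds():
            # Runs architecturally when in bounds, only transiently otherwise.
            v = self.memory[offset]
            self.cached_lines.add((v & 0x01) * PAGE)   # x = b[i]
        self.taken_streak = self.taken_streak + 1 if in_bounds else 0


def attack_low_bit(victim, target_offset):
    for _ in range(3):
        victim.call(0)                 # setup: mistrain with valid offsets
    victim.cached_lines.clear()        # flush b from the cache
    victim.call(target_offset)         # out of bounds, but predicted in bounds
    if 0 in victim.cached_lines:       # reload: which line of b is hot?
        return 0
    if PAGE in victim.cached_lines:
        return 1
    return None                        # nothing leaked
```

With `victim = ToyVictim([0] * 400, [0x53, 0x50])`, `attack_low_bit(victim, 400)` recovers 1 and `attack_low_bit(victim, 401)` recovers 0, the lowest bits of the two secret bytes. On an untrained victim the same out-of-bounds call touches nothing.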
Variant 2 : Poisoning Indirect Branches
❏ Indirect branch instructions can jump to more than two possible target addresses
❏ x86 example
❏ jmp eax: address in a register
❏ jmp [eax]: address in a memory location
❏ jmp dword ptr [0x12345678]: address stored at a fixed memory location
❏ Address from the stack (“ret”)
❏ MIPS example
❏ jr $ra
32
How can we use this?
❏ Find unsafe user code [unlikely]
❏ Use a JIT compiler to run attacker-supplied code [highly likely]
❏ Google’s PoC uses BPF (the JIT-compiled packet filter) in the Linux kernel
❏ Microsoft PoC uses JavaScript
❏ Potentially, any website you visit could read memory it should have no access to
33
Attack Variations
34
Attack Variation 1 : Evict + Time
35
❏ Select the bit to analyze
❏ Measure the access time
❏ If the data is in the CPU cache, save it
Mitigation Options
36
Preventing Speculative Execution
Inserting instructions that block speculative execution
❏ Degrades performance if used too extensively
❏ Use static analysis to find the optimal placement of the blocking instructions
❏ Requires code recompilation
❏ Intel processors:
❏ add lfence after the bounds check, before the dependent load (stops speculation past the check)
❏ ARM processors:
❏ use the no-speculate compiler builtin proposed by Arm, __builtin_load_no_speculate(), for bounds-checked loads
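The effect of a speculation barrier on the bounds-check pattern can be sketched in the same toy style. Python cannot emit lfence, so the `fence` flag below only models its effect: nothing after the check runs until the check has actually resolved. The function name and its parameters are invented for the illustration.

```python
def bounds_checked_read(memory, size_a, offset, predicted_in_bounds, fence):
    """Return the probe-array lines touched, including transient accesses."""
    touched = set()
    in_bounds = offset < size_a
    if fence:
        # Barrier after the check: the prediction no longer matters.
        runs_body = in_bounds
    else:
        # No barrier: a wrong prediction lets the body run transiently.
        runs_body = in_bounds or predicted_in_bounds
    if runs_body:
        v = memory[offset]
        touched.add((v & 0x01) * 4096)
    return touched
```

With `memory = [0] * 400 + [0x53]`, an out-of-bounds call with a mistrained predictor touches line 4096 when `fence=False` and touches nothing when `fence=True`. In-bounds calls behave the same either way, so on real hardware the price is paid in stalled pipelines, not in changed results.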
Conclusion
❏ Software isolation techniques are widely deployed
❏ A fundamental security assumption underpinning all of them is that the CPU will faithfully execute software, including its safety checks
❏ Speculative execution violates this assumption, allowing adversaries to determine the contents of memory and registers
❏ Trade-offs between security and performance
38
THANKS!
Any questions?
Sergiy Shevchenko
s.shevchenko@studenti.unisa.it
Marco Cipriano
m.cipriano12@studenti.unisa.it
