Meltdown, Spectre and Apache Spark™ Performance
Chris Stevens
June 5, 2018
Analyzing the Performance Effects of Meltdown + Spectre on Apache Spark Workloads with Chris Stevens

Meltdown and Spectre are two security vulnerabilities disclosed in early 2018 that expose systems to cross-VM and cross-process attacks. They were the first of their kind and opened up a new class of exploits that allow one program to scan another program’s memory. The kernel and VM patches released to address these vulnerabilities have been shown to degrade the performance of Apache Spark workloads in the cloud by 2-5%.

This talk will dive deep into the exploits and their patches in order to help explain the origin of this decline in performance.

  1. Meltdown, Spectre and Apache Spark™ Performance. Chris Stevens, June 5, 2018
  2. Databricks Performance on AWS: 3-5% performance degradation
  3. Overview. Goal: understand the 3-5% degradation. Steps: TPC-DS benchmarks; system analysis; break down the exploits and patches (Meltdown; Spectre V1, Bounds Check Bypass; Spectre V2, Branch Target Injection)
  4. TPC-DS: q1-v2.4. WITH customer_total_return AS (SELECT sr_customer_sk AS ctr_customer_sk, sr_store_sk AS ctr_store_sk, sum(sr_return_amt) AS ctr_total_return FROM store_returns, date_dim WHERE sr_returned_date_sk = d_date_sk AND d_year = 2000 GROUP BY sr_customer_sk, sr_store_sk) SELECT c_customer_id FROM customer_total_return ctr1, store, customer WHERE ctr1.ctr_total_return > (SELECT avg(ctr_total_return)*1.2 FROM customer_total_return ctr2 WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk) AND s_store_sk = ctr1.ctr_store_sk AND s_state = 'TN' AND ctr1.ctr_customer_sk = c_customer_sk ORDER BY c_customer_id LIMIT 100
  5. TPC-DS: q1-v2.4
  6. TPC-DS: q1-v2.4
  7. q1-v2.4 CPU Utilization. Command: mpstat -P ALL
  8. TPC-DS: q1-v2.4. Command: strace -f -c -p PID
  9. Exploits Background: Out-of-Order Execution + Side Channel Attacks
  10. In-order Execution. Code: x = a + b; y = c + d; z = x + y. [Timeline diagram: three instructions (Add A + B, Store X; Add C + D, Store Y; Add X + Y, Store Z), each passing through Decode, Add, Retire.]
  11. Out-of-order Execution. Code: x = a + b; y = c + d; z = x + y. [Timeline diagram: the same three instructions (Add A + B, Store X; Add C + D, Store Y; Add X + Y, Store Z), each passing through Decode, Add, Retire.]
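The difference between slides 10 and 11 can be sketched with a toy timing model. This is only an illustration under stated assumptions: every instruction takes exactly one cycle, and the function names `in_order` and `out_of_order` are hypothetical, not from the talk.

```python
# Toy model of slides 10 and 11. Assumption: every instruction takes one cycle.
# Each entry is (destination, source operands), listed in program order.
PROGRAM = [
    ("x", ("a", "b")),
    ("y", ("c", "d")),
    ("z", ("x", "y")),
]


def in_order(program):
    """Each instruction finishes before the next one starts."""
    return {dest: position + 1 for position, (dest, _) in enumerate(program)}


def out_of_order(program):
    """An instruction starts as soon as all of its operands are ready."""
    done = {}
    for dest, sources in program:
        # Program inputs (a, b, c, d) are ready at cycle 0.
        done[dest] = 1 + max(done.get(source, 0) for source in sources)
    return done


print(in_order(PROGRAM))      # {'x': 1, 'y': 2, 'z': 3}
print(out_of_order(PROGRAM))  # {'x': 1, 'y': 1, 'z': 2}
```

In this model x and y have no dependency on each other, so they complete in the same cycle, and only z has to wait for both.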
  12. Side-Channel Attacks: FLUSH+RELOAD. Cache entries 0-3 start as 0xABAB, 0xCDCD, 0xEFEF, 0x1212. Flush: the clflush instruction invalidates the cache (entries 0-3 INVALID). Attack: speculative execution fills cache entry 2 (entry 2 = 0xEFEF, the others INVALID). Reload: measures the time to read each cache entry (entries 0-3 = 0xABAB, 0xCDCD, 0xEFEF, 0x1212).
  13. Side-Channel Attacks. Reference: https://meltdownattack.com/meltdown.pdf
  14. Meltdown
  15. Meltdown. Code: kernelByte = *kernelAddr; probeArray[kernelByte * 4096];
  16. Meltdown. Code: kernelByte = *kernelAddr; probeArray[kernelByte * 4096]; [Timeline diagram. Memory Read: Decode, get value of kernelAddr, read 1 byte at kernelAddr, Retire. Multiply: Decode, multiply kernel byte by 4096, Retire. Array Access: Decode, get value of probeArray, read probeArray at offset, Retire. Callouts: "Permission Check Here" and "Side Channel Attack".]
  17. Side-Channel Attack: FLUSH+RELOAD. Flush: flush each page in probeArray (pages 0*4096 to 3*4096 INVALID). Attack: speculative execution reads probeArray at kernelByte * 4096 (page 2*4096 = 0xEFEF, the others INVALID). Reload: measures the time to read each page in probeArray (pages 0*4096 to 3*4096 = 0xABAB, 0xCDCD, 0xEFEF, 0x1212). Sees that page 2 was the fastest => kernelByte = 0x2
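The Flush+Reload steps on slide 17 can be sketched as a small simulation. It does not touch real hardware: `ToyCache`, its latency numbers (10 and 200 cycles), and `leak_byte` are hypothetical stand-ins for timing a real cache, and the speculative read is modeled as an ordinary access.

```python
PAGE = 4096  # probeArray stride used on the slides


class ToyCache:
    """Tracks which addresses are cached; a stand-in for real cache timing."""

    def __init__(self):
        self.lines = set()

    def flush(self, addr):
        self.lines.discard(addr)

    def access(self, addr):
        """Return a simulated latency in cycles, then cache the address."""
        latency = 10 if addr in self.lines else 200  # made-up numbers
        self.lines.add(addr)
        return latency


def leak_byte(kernel_byte):
    cache = ToyCache()
    # FLUSH: evict every page of probeArray.
    for value in range(256):
        cache.flush(value * PAGE)
    # ATTACK: the speculative read of probeArray[kernelByte * 4096] caches one page.
    cache.access(kernel_byte * PAGE)
    # RELOAD: time every page; the fastest one reveals the byte.
    timings = [cache.access(value * PAGE) for value in range(256)]
    return timings.index(min(timings))


print(hex(leak_byte(0x2)))  # 0x2
```

The 4096-byte stride puts every candidate value on its own page, so one fast page identifies one byte value.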
  18. Kernel Page-Table Isolation. Process page tables without KPTI: combined page tables map kernel virtual memory (sensitive data) and user virtual memory (Meltdown code). Process page tables with KPTI: kernel page tables map kernel virtual memory (sensitive data) and user virtual memory (Meltdown code); user-mode page tables map user virtual memory (Meltdown code). Same physical memory.
  19. Meltdown. Code: kernelByte = *kernelAddr; probeArray[kernelByte * 4096]; [Timeline diagram. Memory Read: Decode, get value of kernelAddr, read 1 byte at kernelAddr, Retire. Multiply: Decode, multiply kernel byte by 4096, Retire. Array Access: Decode, get value of probeArray, read probeArray at offset, Retire.]
  20. Kernel Page-Table Isolation. Process page tables without KPTI: combined page tables map kernel virtual memory (sensitive data) and user virtual memory (Meltdown code). Process page tables with KPTI: kernel page tables map kernel virtual memory (sensitive data) and user virtual memory (Meltdown code); user-mode page tables map user virtual memory (Meltdown code). Same physical memory.
  21. TLB before KPTI. Code: while True: print *UserAddress; sys_clock_gettime. The TLB (virtual address -> physical address) starts empty. print *UserAddress: TLB MISS; the TLB now holds UserAddress -> Page 1. sys_clock_gettime: TLB MISS; the TLB now holds UserAddress -> Page 1 and KernelTime -> Page 2. print *UserAddress: TLB HIT.
  22. TLB with KPTI. Code: while True: print *UserAddress; sys_clock_gettime. The TLB (virtual address -> physical address) is shown empty at every step. print *UserAddress: TLB MISS. sys_clock_gettime: TLB MISS. print *UserAddress: TLB MISS.
  23. Meltdown TLB Misses. Command: perf stat -e dtlb-load-misses
  24. TLB with KPTI and PCID. Code: while True: print *UserAddress; sys_clock_gettime. The TLB (virtual address, PCID -> physical address) starts empty. print *UserAddress: TLB MISS; the TLB now holds UserAddress, PCID 1 -> Page 1. sys_clock_gettime: TLB MISS; the TLB now holds UserAddress, PCID 1 -> Page 1 and KernelTime, PCID 0 -> Page 2. print *UserAddress: TLB HIT.
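Slides 21, 22, and 24 can be compared with a toy TLB model. The sketch below assumes an unbounded TLB, a full flush on every user/kernel transition under KPTI without PCID, and PCID tags 1 (user) and 0 (kernel) as on slide 24. The function name `count_tlb_misses` and the mode strings are hypothetical.

```python
def count_tlb_misses(mode, iterations):
    """Count TLB misses for the loop: print *UserAddress; sys_clock_gettime.

    mode is one of "no-kpti", "kpti", or "kpti+pcid" (labels made up here).
    """
    tlb = set()  # entries are (pcid, virtual address) pairs
    misses = 0
    # Only the PCID mode gives user and kernel mappings different tags.
    user_tag, kernel_tag = (1, 0) if mode == "kpti+pcid" else (0, 0)

    def touch(pcid, vaddr):
        nonlocal misses
        if (pcid, vaddr) not in tlb:
            misses += 1
            tlb.add((pcid, vaddr))

    for _ in range(iterations):
        touch(user_tag, "UserAddress")   # print *UserAddress
        if mode == "kpti":
            tlb.clear()                  # page-table switch on kernel entry
        touch(kernel_tag, "KernelTime")  # sys_clock_gettime
        if mode == "kpti":
            tlb.clear()                  # page-table switch on return to user
    return misses


for mode in ("no-kpti", "kpti", "kpti+pcid"):
    print(mode, count_tlb_misses(mode, 100))
```

For 100 iterations the model reports 2 misses without KPTI, 200 with KPTI alone, and 2 with KPTI plus PCID, which matches the MISS/MISS/HIT versus MISS/MISS/MISS pattern on the slides. Real TLBs have limited capacity and other behavior this sketch ignores.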
  25. Meltdown Runtime
  26. Spectre V1: Bounds Check Bypass
  27. Bounds Check Bypass. Code: if (index < size) { val = array[index]; probeArray[val]; }
  28. Bounds Check Bypass. Code: if (index < size) { val = array[index]; probeArray[val]; } [Timeline diagram. If: Decode, read size, "Is (index < size)?", Retire. The predictor answers "Was (index < size)?" with yes, so "Read array at index" and "Read probeArray" proceed, each retiring once "Is (index < size)?" is yes. The probeArray read is labeled "Side Channel Attack".]
  29. Bounds Check Bypass. Code: if (index < size) { val = array[index]; probeArray[val]; } TRAIN: the code runs with size = 10, index = 5, and the predictor entry "Was (index < size)?" goes from "-" to "yes". ATTACK: the code runs with size = 10, index = 1000.
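The train-then-attack sequence on slide 29 can be sketched with a one-entry branch predictor. Everything here is a simplified model: the `Victim` class, the flat `memory` list, and the secret value 0x2A placed at index 1000 are hypothetical, and "speculation" is modeled as a load that runs before the bounds check is evaluated.

```python
SIZE = 10
# A 10-element array followed by other memory; the secret sits at index 1000.
memory = list(range(SIZE)) + [0] * 990 + [0x2A]


class Victim:
    def __init__(self):
        self.predict_taken = False  # one-entry predictor: "Was (index < size)?"
        self.cached = set()         # probeArray entries touched, even speculatively

    def run(self, index):
        if self.predict_taken:
            # Speculation: the loads run before the bounds check resolves.
            self.cached.add(memory[index])
        in_bounds = index < SIZE
        self.predict_taken = in_bounds  # remember the outcome for next time
        if in_bounds:
            val = memory[index]
            self.cached.add(val)
            return val
        return None  # architecturally, the out-of-bounds read never happened


victim = Victim()
victim.run(5)          # TRAIN: size = 10, index = 5
victim.cached.clear()  # attacker flushes probeArray
result = victim.run(1000)  # ATTACK: size = 10, index = 1000
print(result, victim.cached)  # None {42}
```

The attack call returns nothing architecturally, yet the cache state still records the secret, which is what Flush+Reload then reads out.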
  30. Observable Speculation Barrier. Protects against Spectre V1 (Bounds Check Bypass); ~14 in the Ubuntu kernel and drivers; stops speculative array access with the LFENCE barrier. Before: if (index < size) { val = array[index]; probeArray[val]; } After: if (index < size) { osb(); val = array[index]; probeArray[val]; }
  31. Observable Speculation Barrier. Code: if (index < size) { osb(); val = array[index]; probeArray[val]; } [Timeline diagram. If: Decode, read size, "Is (index < size)?", Retire. The predictor answers "Was (index < size)?" with yes, but LFENCE holds "Read array at index" until "Is (index < size)?" is yes.]
  32. Spectre V2: Branch Target Injection
  33. Branch Target Injection. Code: def call_func(func, arg): func(arg)
  34. Branch Target Injection. Code: def call_func(func, arg): func(arg) [Timeline diagram. Call: Decode, read func from memory, Retire. The predictor asks "What was func last time?" and calls the previous func. When func == previous func is yes, the call retires; when it is no, func is called.]
  35. Branch Target Injection. Code: def call_func(func, arg): func(arg) TRAIN: call_func(attack_widget, 1000); the predictor entry "What was func?" goes from "-" to attack_widget. ATTACK: call_func(innocent, 10). [Overlay labels: innocent, 10; attack_widget, 1000.]
  36. Intel Microcode Updates. Protect against Spectre V2 (Branch Target Injection). Indirect Branch Restricted Speculation (IBRS): stops attacks from code running at lower privilege levels; stops attacks from code running on the sibling hyperthread (STIBP). Indirect Branch Prediction Barrier (IBPB): stops attacks from code running at the same privilege level; inserted at user-to-user and guest-to-guest transitions.
  37. IBPB and IBRS Patches. Code: def call_func(func, arg): func(arg) TRAIN: call_func(attack_widget, 1000). IBPB, IBRS: the predictor entry "What was func?" stays "-". ATTACK: call_func(innocent, 10). [Overlay labels: innocent, 10; attack_widget, 1000.]
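The effect of the barrier on slides 35 and 37 can be sketched with a one-entry branch target buffer. This is a toy model: `ToyBranchTargetBuffer`, its `ibpb` method, and the `speculated` log are hypothetical names, and a real predictor is indexed by branch address and holds many entries.

```python
class ToyBranchTargetBuffer:
    """One-entry predictor that answers "What was func?"."""

    def __init__(self):
        self.last_target = None

    def predict(self):
        return self.last_target

    def update(self, target):
        self.last_target = target

    def ibpb(self):
        # Modeled barrier: forget everything learned before this point.
        self.last_target = None


def innocent(arg):
    return arg


def attack_widget(arg):
    return arg


def call_func(btb, func, arg, speculated):
    predicted = btb.predict()
    if predicted is not None and predicted is not func:
        # The stale target would run speculatively before func is known.
        speculated.append(predicted.__name__)
    btb.update(func)
    return func(arg)


def run(with_barrier):
    btb, speculated = ToyBranchTargetBuffer(), []
    call_func(btb, attack_widget, 1000, speculated)  # TRAIN
    if with_barrier:
        btb.ibpb()
    call_func(btb, innocent, 10, speculated)         # ATTACK
    return speculated


print(run(with_barrier=False))  # ['attack_widget']
print(run(with_barrier=True))   # []
```

With the barrier the victim call has no prediction to follow, which is also why slide 38 shows the call waiting on memory.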
  38. Branch Target Injection. Code: def call_func(func, arg): func(arg) [Timeline diagram. Call: Decode, read func from memory (cache miss), call func, Retire. The predictor asks "What was func last time?" and func == previous func is no.]
  39. Spectre V2 Branch Misses. Command: perf stat -e branch-misses
  40. Spectre V2 Runtime
  41. Retpoline. Protects indirect branches/calls from speculative execution; uses an infinite loop to do it. Before: jmp rax. After: call set_up_target; speculative_loop: pause; lfence; jmp speculative_loop; set_up_target: mov [rsp], rax; ret. Return Stack Buffer top: speculative_loop. Stack top: speculative_loop.
  42. Retpoline. Protects indirect branches/calls from speculative execution; uses an infinite loop to do it. Before: jmp rax. After: call set_up_target; speculative_loop: pause; lfence; jmp speculative_loop; set_up_target: mov [rsp], rax; ret. Return Stack Buffer top: speculative_loop. Stack top: rax.
  43. Retpoline. Protects indirect branches/calls from speculative execution; uses an infinite loop to do it. Before: jmp rax. After: call set_up_target; speculative_loop: pause; lfence; jmp speculative_loop; set_up_target: mov [rsp], rax; ret. Return Stack Buffer top: speculative_loop. Stack top: rax.
  44. Retpoline. Protects indirect branches/calls from speculative execution; uses an infinite loop to do it. Before: jmp rax. After: call set_up_target; speculative_loop: pause; lfence; jmp speculative_loop; set_up_target: mov [rsp], rax; ret. Return Stack Buffer top: speculative_loop. Stack top: rax.
  45. Retpoline. Protects indirect branches/calls from speculative execution; uses an infinite loop to do it. Before: jmp rax. After: call set_up_target; speculative_loop: pause; lfence; jmp speculative_loop; set_up_target: mov [rsp], rax; ret. Return Stack Buffer top: speculative_loop. Stack top: rax.
  46. Retpoline (Improved). Check against a known target and direct branch. Before: jmp rax. After: cmp rax, known_target; jne retpoline; jmp known_target (direct branch, fast); retpoline: call set_up_target; speculative_loop: pause; jmp speculative_loop; set_up_target: mov [rsp], rax; ret.
  47. Retpoline Runtime
  48. Retpoline CPU Utilization. Command: mpstat -P ALL
  49. Retpoline System Calls. Command: strace -f -c -p PID
  50. A Tale of Two Queries. q1-v2.4: WITH customer_total_return AS (SELECT sr_customer_sk AS ctr_customer_sk, sr_store_sk AS ctr_store_sk, sum(sr_return_amt) AS ctr_total_return FROM store_returns, date_dim WHERE sr_returned_date_sk = d_date_sk AND d_year = 2000 GROUP BY sr_customer_sk, sr_store_sk) SELECT c_customer_id FROM customer_total_return ctr1, store, customer WHERE ctr1.ctr_total_return > (SELECT avg(ctr_total_return)*1.2 FROM customer_total_return ctr2 WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk) AND s_store_sk = ctr1.ctr_store_sk AND s_state = 'TN' AND ctr1.ctr_customer_sk = c_customer_sk ORDER BY c_customer_id LIMIT 100. ss_max-v2.4: SELECT count(*) AS total, count(ss_sold_date_sk) AS not_null_total, count(DISTINCT ss_sold_date_sk) AS unique_days, max(ss_sold_date_sk) AS max_ss_sold_date_sk, max(ss_sold_time_sk) AS max_ss_sold_time_sk, max(ss_item_sk) AS max_ss_item_sk, max(ss_customer_sk) AS max_ss_customer_sk, max(ss_cdemo_sk) AS max_ss_cdemo_sk, max(ss_hdemo_sk) AS max_ss_hdemo_sk, max(ss_addr_sk) AS max_ss_addr_sk, max(ss_store_sk) AS max_ss_store_sk, max(ss_promo_sk) AS max_ss_promo_sk FROM store_sales
  51. q1-v2.4 vs ss_max-v2.4. Command: perf stat -e raw_syscalls:sys_enter -a -I 1000
  52. Thank You. chris.stevens@databricks.com, https://www.linkedin.com/in/chriscstevens/
  53. Backup Slides
  54. array_index_nospec(). Protects against Spectre V1 (Bounds Check Bypass); ~50 in the Linux kernel and drivers (not in Ubuntu); stops speculative array access by clamping the index. Before: if (index < size) { val = array[index]; probeArray[val]; } After: if (index < size) { index = array_index_nospec(index, size); val = array[index]; probeArray[val]; }
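The clamping idea on slide 54 can be illustrated with branch-free mask arithmetic. The sketch below emulates 64-bit unsigned math in Python and is written from memory of the general masking technique, so treat the exact formula as an assumption and not as the kernel's code. Python itself offers no protection against speculation; only the arithmetic is shown.

```python
BITS = 64
MASK64 = (1 << BITS) - 1


def array_index_mask_nospec(index, size):
    """Return all ones when index < size, else 0, without branching on the compare.

    Assumes 0 <= index < 2**63 and 0 <= size < 2**63.
    """
    # (size - 1 - index) wraps around and sets the top bit when index >= size.
    combined = (index | ((size - 1 - index) & MASK64)) & MASK64
    top_bit = combined >> (BITS - 1)  # 0 when in bounds, 1 when out of bounds
    return (top_bit - 1) & MASK64     # 0 -> all ones, 1 -> 0


def array_index_nospec(index, size):
    """Clamp an out-of-bounds index to 0; leave an in-bounds index unchanged."""
    return index & array_index_mask_nospec(index, size)


print(array_index_nospec(5, 10))     # 5
print(array_index_nospec(1000, 10))  # 0
```

Because the clamp is a data dependency and not a predicted branch, a mispredicted bounds check can at worst read element 0 of the array.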
  55. Speculative Store Bypass. CVE-2018-3639; “Variant 4”; public date: May 21, 2018; details: https://access.redhat.com/security/vulnerabilities/ssbd; Ubuntu command line parameter: spec_store_bypass_disable=on
  56. TPC-DS: ss_max-v2.4
  57. TPC-DS: q1-v2.4. Baselines: Pre-Skylake 2%, Skylake+ 11%
  58. TPC-DS: ss_max-v2.4. Baselines: Pre-Skylake 0%, Skylake+ 1%
  59. q1-v2.4 Runtime Box Plots
  60. ss_max-v2.4 Runtime Box Plots
  61. Single Node TPC-DS Setup. Intel Core i7-4820k @ 3.70GHz (Ivy Bridge, ca. 2012); 8 CPUs; 8GB of RAM; Disk: Western Digital Blue, WDC10EZEX-08M2NA0, 1TB, 7200 rpm, 146MB/s sequential read (http://hdd.userbenchmark.com/); Ubuntu 16.04.4 LTS (Xenial Xerus), 64-bit server image, 4.4.0-116-generic Linux kernel
  62. Repositories. https://github.com/apache/spark.git; https://github.com/databricks/spark-sql-perf.git; https://github.com/databricks/tpcds-kit; https://github.com/speed47/spectre-meltdown-checker; https://github.com/brendangregg/pmc-cloud-tools; https://github.com/brendangregg/perf-tools