Preventing CPU side-channel attacks with kernel tracking

Marian Marinov
mm@siteground.com
Chief System Architect
Head of the DevOps department

❖ Who am I?
- Chief System Architect of SiteGround.com
- Sysadmin since 1996
- Organizer of OpenFest, BG Perl Workshops, LUG-BG and others
- Teaching Network Security and Linux System Administration courses in Sofia University and SoftUni

❖ Disclaimer
What I'm proposing is NOT a general purpose solution!

We are a shared hosting provider, so we consider all code hostile.

We haven't seen Meltdown attempts on our infrastructure.

A little bit of intro
Intel's microcode updates and KPTI were supposed to result in a 10-15% performance DEGRADATION

Needless to say...
I was not a big fan of implementing both across all of our servers

10-15% on a single machine is not a problem
on 1000s of machines... that is a bit different

There are 4* different caches inside the CPU
- L1 instruction cache
- L1 data cache
- L2 cache
- L3 cache
[Figure: several single cores, each with its own L1 I cache, L1 D cache and L2 cache, plus an L3 cache]
* In some architectures, there is even an L4 cache

Sharing the cache
- L1 and L2 caches are shared between hyper-threads in a single core
- L2 cache is shared between the different execution engines inside the core (ALU, FMA, ADD, etc.)
- L3 cache is shared between all cores

Some CPU architecture intro :)
[Figure: AMD Bulldozer block diagram. Each single core shows an L1 instruction cache, instruction fetch with branch prediction, instruction decoder, dispatch, pipeline(s), integer clusters 1 and 2 each with an L1 data cache, an FPU, a W.C. cache, a shared L2 data cache and a core interface. The cores connect through synchronization to the shared L3 cache (LLC).]

Cache Side-Channel Attacks
➢ 2013 Flush + Reload
➢ 2016 Flush + Flush

Flush + Reload
[Figure: single-core block diagram with L1 instruction cache, L1 data caches and the shared L2 data cache]
1. Find a shared library location in memory
2. Clear the cache
3. Check if the victim has accessed it or not by
comparing the time it takes to execute the code
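
The three steps above can be sketched in userspace C. This is a minimal illustration, not the code from the original papers: it assumes an x86 CPU and a GCC or Clang toolchain that provides `<x86intrin.h>`. The function names, the `victim` callback and the threshold parameter are hypothetical, and a real threshold has to be calibrated per machine.

```c
#include <stdint.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, __rdtscp (x86 only) */

/* Time a single read of *addr, in TSC cycles. */
uint64_t time_access(const volatile uint8_t *addr)
{
    unsigned int aux;
    _mm_mfence();
    uint64_t start = __rdtscp(&aux);
    (void)*addr;                       /* the timed load */
    uint64_t end = __rdtscp(&aux);
    _mm_mfence();
    return end - start;
}

/* One Flush+Reload round on a line the attacker shares with the victim. */
uint64_t flush_reload_round(const volatile uint8_t *addr, void (*victim)(void))
{
    _mm_clflush((const void *)addr);   /* step 2: evict the line */
    _mm_mfence();
    if (victim)
        victim();                      /* give the victim a chance to run */
    return time_access(addr);          /* step 3: time the reload */
}

/* A fast reload means someone brought the line back into the cache. */
int was_accessed(uint64_t cycles, uint64_t threshold)
{
    return cycles < threshold;
}
```

In practice the attacker repeats the round in a loop and compares each result against a threshold measured beforehand on the same machine.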

Flush + Flush
[Figure: single-core block diagram with L1 instruction cache, L1 data caches and the shared L2 data cache]
1. Find a shared library location in memory
2. Clear the cache
3. Clear the cache again and observe the timing: if the victim has
accessed the code, clflush will take longer to finish
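
A minimal sketch of the Flush+Flush measurement, under the same assumptions as any userspace timing probe: an x86 CPU, a GCC or Clang toolchain with `<x86intrin.h>`, and hypothetical function names. Unlike Flush+Reload, the attacker never loads the line; only the duration of `clflush` is observed.

```c
#include <stdint.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, __rdtscp (x86 only) */

/* Time one clflush of the cache line that contains addr, in TSC cycles. */
uint64_t time_clflush(const void *addr)
{
    unsigned int aux;
    _mm_mfence();
    uint64_t start = __rdtscp(&aux);
    _mm_clflush(addr);
    _mm_mfence();                      /* wait for the flush to complete */
    uint64_t end = __rdtscp(&aux);
    return end - start;
}

/* One Flush+Flush round: flush, let the victim run, then time a second flush. */
uint64_t flush_flush_round(const void *addr, void (*victim)(void))
{
    _mm_clflush(addr);                 /* step 2: clear the line */
    _mm_mfence();
    if (victim)
        victim();
    return time_clflush(addr);         /* step 3: a slower flush hints at a cached line */
}
```

The timing difference between flushing a cached and an uncached line is small, so a real measurement would average many rounds.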

More architecture...
[Figure: hyper-threaded core pipeline: BTB and I-TLB, decoder, trace cache, μop ROM, BTB, rename/alloc, μop queues, schedulers, integer and floating point units, L1 D-cache and D-TLB, L2 cache and control, bus. Thread 1: floating point]

More architecture...
[Figure: the same pipeline with two threads. Thread 1: integer, Thread 2: floating point]

So we will look at protections from Meltdown

Monitoring and analysis
➢ KPTI was already in the making
➢ Capsule 8 wrote on Jan 5
➢ EndGame shared their research on Jan 08
➢ I found out about EndGame and Capsule 8 on Jan 11, after we had already started on our work

Capsule 8 approach
Use kernel tracepoints and monitor for:
exceptions/page_fault_user
Kernel perf counters:
– PERF_COUNT_HW_CACHE_OP_READ
– PERF_COUNT_HW_CACHE_RESULT_ACCESS
– PERF_COUNT_HW_CACHE_RESULT_MISS
https://github.com/capsule8/capsule8/tree/master/examples
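
As a rough illustration of how these perf constants are combined, the sketch below opens a hardware cache counter through the `perf_event_open` system call. It is not Capsule 8's code. It assumes Linux with the kernel UAPI headers installed, and it assumes the last-level cache (`PERF_COUNT_HW_CACHE_LL`) is the cache of interest. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* PERF_TYPE_HW_CACHE config layout: cache id | (op << 8) | (result << 16). */
uint64_t hw_cache_config(uint64_t cache, uint64_t op, uint64_t result)
{
    return cache | (op << 8) | (result << 16);
}

/* Open a counter for LLC read accesses or misses of `pid` on any CPU.
 * `result` is PERF_COUNT_HW_CACHE_RESULT_ACCESS or ..._RESULT_MISS.
 * Returns a file descriptor, or -1 (for example when perf is restricted). */
int open_llc_read_counter(pid_t pid, uint64_t result)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HW_CACHE;
    attr.size = sizeof(attr);
    attr.config = hw_cache_config(PERF_COUNT_HW_CACHE_LL,
                                  PERF_COUNT_HW_CACHE_OP_READ, result);
    attr.exclude_kernel = 1;           /* count userspace activity only */
    attr.exclude_hv = 1;

    /* perf_event_open(attr, pid, cpu = -1, group_fd = -1, flags = 0) */
    return (int)syscall(SYS_perf_event_open, &attr, pid, -1, -1, 0UL);
}
```

Reading 8 bytes from the returned descriptor yields the current count; a monitor would sample the access and miss counters periodically and compare their ratio.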

Capsule 8 approach
This is nice, but not enough...
since Flush+Reload can be replaced by Flush+Flush to achieve the same result without an actual cache miss

EndGame
They did not provide code examples...
However, they explained a lot about the statistics and about using the CPU performance counters.
https://www.endgame.com/blog/technical-blog/detecting-spectre-and-meltdown-using-hardware-performance-counters

Both examples by Capsule 8 and EndGame provide detection, but little to no countermeasures.

Fight the requirements, not the attacks
➢ Successful Meltdown exploitation prefers that both the SIGSEGV children and the victim are on the same CPU
➢ so we simply LIE to sched_setaffinity
  ➢ effectively we do nothing
  ➢ we save the requested affinity in the task_struct as cpumask_t cpus_allowed;
  ➢ we have patched sched_getaffinity to report only the CPU mask already stored for the current process
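
The idea can be modelled in a few lines of plain C. This is a hypothetical userspace model, not the actual kernel patch: the struct, its two fields and the function names are invented for illustration, and a 64-bit integer stands in for `cpumask_t`.

```c
#include <stdint.h>

/* Hypothetical model of one task; not the kernel's task_struct. */
struct task_model {
    uint64_t cpus_requested;   /* mask the task asked for (what gets stored) */
    uint64_t cpus_effective;   /* mask the scheduler really uses */
};

/* Patched "setaffinity": remember the request, leave scheduling untouched. */
int model_setaffinity(struct task_model *t, uint64_t mask)
{
    if (mask == 0)
        return -1;             /* an empty mask is rejected, as in the real call */
    t->cpus_requested = mask;
    return 0;
}

/* Patched "getaffinity": report the stored request, not the real placement. */
uint64_t model_getaffinity(const struct task_model *t)
{
    return t->cpus_requested;
}
```

From the point of view of the calling process the pinning appears to have succeeded, while the scheduler keeps placing it wherever it likes.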

Fight the requirements, not the attacks
➢ Successful Meltdown exploitation requires that a process has one of the following:
  ➢ SIGSEGV children or grandchildren
  ➢ SIGSEGV threads
  ➢ TSX instructions that do not finish successfully

Fight the requirements, not the attacks
➢ On our infrastructure, no customer software has a valid reason to have SIGSEGV children or threads
➢ our CPUs do not support TSX instructions :)

So...
we decided to forbid SIGSEGV processes

What we did?
Kernel module
– detecting processes that had more than 1 child dying with SIGSEGV
– when such a process is detected it is STOPPED, not KILLED
– only root on the host machine can send any type of signal to this process

What we had to change?
– Introduced a per-process counter of its SIGSEGV children
unsigned int pids[MAX_PID];
– implemented a workqueue to check for abusers
create_singlethread_workqueue()
– implemented a /proc interface to monitor and change the max segfaults
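
The counting logic can be illustrated with a userspace analogue. This is a sketch only, not the kernel module: the value of `MAX_PID`, the `max_segfaults` variable (standing in for the /proc tunable) and both function names are assumptions, and the status argument is the value a parent would get from `waitpid()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

#define MAX_PID 32768                      /* assumption: default Linux pid_max */

static unsigned int pids[MAX_PID];         /* SIGSEGV children seen per parent */
static unsigned int max_segfaults = 1;     /* hypothetical tunable limit */

/* Account one child's wait status against its parent.
 * Returns 1 when the parent has exceeded the limit and should be stopped. */
int note_child_exit(pid_t parent, int status)
{
    if (parent <= 0 || parent >= MAX_PID)
        return 0;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
        pids[parent]++;
    return pids[parent] > max_segfaults;
}

/* Stop, rather than kill, the offending parent so it can be inspected later. */
int stop_abuser(pid_t parent)
{
    return kill(parent, SIGSTOP);
}
```

Stopping instead of killing keeps the process and its memory around for analysis. Note that in userspace a stopped process can be resumed with SIGCONT by its owner, which is why the restriction on who may signal it has to live in the kernel.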

Similar attack pattern?
➢ Foreshadow-OS
➢ Foreshadow-VMM
➢ Foreshadow-SGX
➢ All of the above require the generation of page faults, which is essentially the same side effect that we see with Meltdown

Cache Line Flush
– Limiting clflush and clflushopt effectively stops Flush+Reload and Flush+Flush attacks
– a cache flush can be indirectly called when an invalid instruction is issued
– This greatly limits the options for executing Meltdown, by leaving only TSX instructions

– both clflush and clflushopt are unprivileged instructions. Trapping them is not directly possible
– We discussed different approaches:
● Inspecting the instructions of each binary before it is executed and marking it clean
● Inspecting the binary in parallel while the program is executing
● Virtualizing the system and actually trapping the instructions after they have been evaluated by the guest kernel

– adding noclflush on the kernel cmdline does NOT disable clflush!!!
– Events you can monitor for clflush:
● L2_LINES_OUT.DEMAND_CLEAN
● MEM_LOAD_UOPS_RETIRED.L3_MISS
● PERF_COUNT_HW_CACHE_LL
● PERF_COUNT_HW_CACHE_OP_READ
● PERF_COUNT_HW_CACHE_RESULT_ACCESS
● PERF_COUNT_HW_CACHE_RESULT_MISS

TSX
Transactional Synchronization eXtensions (TSX)
– Because of issues with the implementation, TSX instructions should be disabled on Haswell CPUs
– However, if the microcode is not applied, your Haswell CPUs support TSX :)
– TSX instructions are supported on Skylake...

TSX
– One thing that EndGame showed us is that TSX instructions can be counted
– RTM_RETIRED.ABORTED

TSX
– I considered lying to userspace by reporting that TSX is not supported by the CPU...
● but the cpuid instruction is unprivileged, so trapping it is a non-trivial job
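
To show why hiding TSX from userspace is hard, here is a minimal sketch of how any unprivileged program can ask the CPU directly. It assumes an x86 CPU and the `<cpuid.h>` helper shipped with GCC and Clang; the function names are hypothetical. The feature bits are the documented ones in CPUID leaf 7, sub-leaf 0, register EBX (bit 4 for HLE, bit 11 for RTM).

```c
#include <cpuid.h>      /* __get_cpuid_count (GCC/Clang, x86 only) */
#include <stdint.h>

#define CPUID7_EBX_HLE (1u << 4)    /* Hardware Lock Elision */
#define CPUID7_EBX_RTM (1u << 11)   /* Restricted Transactional Memory */

/* Decode the TSX feature bits from CPUID.(EAX=7,ECX=0):EBX. */
int ebx_has_rtm(uint32_t ebx) { return (ebx & CPUID7_EBX_RTM) != 0; }
int ebx_has_hle(uint32_t ebx) { return (ebx & CPUID7_EBX_HLE) != 0; }

/* Ask the CPU directly; returns 1 if RTM is advertised, 0 otherwise. */
int cpu_has_rtm(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;       /* leaf 7 is not supported on this CPU */
    return ebx_has_rtm(ebx);
}
```

No system call is involved here, so the kernel never sees the query and has no simple place to substitute a different answer.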

TSX
– So relying on CPU counters is currently the only sensible way of detecting if TSX is being used for Meltdown exploits
– The legitimate use of TSX instructions is very limited, and on shared hosting we will most likely not see such software

TSX events
– RTM_RETIRED.START: Number of times we entered an RTM region. Does not count nested transactions
– RTM_RETIRED.ABORTED: Number of times an RTM execution aborted due to any reasons (multiple categories may count as one)
– RTM_RETIRED.ABORTED_TIMER: Number of times an RTM execution aborted due to uncommon conditions
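
A hedged sketch of how such events could be counted from C with a raw perf event. The event select and unit mask values below (0xC9 with 0x01 for START and 0x04 for ABORTED) are assumptions taken from Haswell and Skylake era event tables and should be verified against the event list of the actual CPU. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed encodings; check them against your CPU's event list. */
#define RTM_RETIRED_EVENT  0xC9
#define UMASK_RTM_START    0x01
#define UMASK_RTM_ABORTED  0x04

/* x86 raw event encoding used by perf: (umask << 8) | event select. */
uint64_t raw_config(uint8_t event, uint8_t umask)
{
    return ((uint64_t)umask << 8) | event;
}

/* Open a raw PMU counter for `pid` on any CPU. Returns an fd, or -1. */
int open_raw_counter(pid_t pid, uint64_t config)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_RAW;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.exclude_kernel = 1;           /* count userspace transactions only */
    return (int)syscall(SYS_perf_event_open, &attr, pid, -1, -1, 0UL);
}

/* Read the current count from a counter fd. Returns 0 on success. */
int read_counter(int fd, uint64_t *value)
{
    return read(fd, value, sizeof(*value)) == (ssize_t)sizeof(*value) ? 0 : -1;
}
```

A monitor could open one counter for starts and one for aborts per suspicious process; a process whose transactions almost always abort is a candidate for closer inspection.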

TSX events
You can find the list of Performance Monitoring Unit (PMU) events by running:
# perf list
Perf can be built from the Linux kernel source tree in tools/perf:
# make
# mv perf /usr/bin

Glossary
ALU          Arithmetic Logic Unit
AGU          Address Generation Unit
TLB          Translation Lookaside Buffer
BTP          Branch Target Predictor
BP           Branch Predictor
BTB          Branch Target Buffer
LLC          Last Level Cache
WB Cache     Writeback cache
W.C. Cache   Write combining cache
Trace Cache  Execution trace cache

Links
Flush + Reload paper
Flush + Flush paper
EndGame research
Capsule 8 meltdown detection
Spectre & Meltdown attacks
Meltdown PoC
Collection of Speculation bugs info
Foreshadow attacks
TLBleed attack

Marian Marinov
mm@siteground.com
