[若渴計畫] Studying Concurrency

2017.1.22
<ajblane0612@gmail.com>
AJMachine
From Lost to Converged
Outline
• Why is writing concurrent code hard?
• Programmer-observable behavior
• Some examples of concurrency performance techniques
• Some concurrency security examples
Why Is Writing Concurrent Code Hard?
• Hardware optimizations
• Compiler optimizations
Both can make program behavior unpredictable
Hardware Optimizations
- Write Buffer
• On a write, a processor simply inserts the write operation into the
write buffer and proceeds without waiting for the write to complete
• This effectively hides the latency of write operations
• If a read may bypass a buffered write, P1 and P2 can each read the other's flag
as still unset, so both P1 and P2 end up in the critical section at the same time
Sarita V. Adve, Kourosh Gharachorloo, “Shared Memory Consistency Models: A Tutorial”
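The write-buffer scenario above can be sketched as a small C++ test. This is only an illustration under assumptions: the names `flag1`, `flag2`, `both_entered_once` and `count_violations` are made up here, and the code uses C++11 `std::atomic` with `memory_order_seq_cst`, which forbids the outcome in which both threads read 0.

```cpp
#include <atomic>
#include <thread>

// One trial of a Dekker-style entry protocol: each thread raises its own
// flag and then checks the other thread's flag.
// Returns true if BOTH threads read the other's flag as 0, i.e. both would
// have entered the critical section.
bool both_entered_once() {
    std::atomic<int> flag1{0}, flag2{0};
    int r1 = -1, r2 = -1;

    std::thread p1([&] {
        flag1.store(1, std::memory_order_seq_cst);   // write own flag
        r1 = flag2.load(std::memory_order_seq_cst);  // then read the other flag
    });
    std::thread p2([&] {
        flag2.store(1, std::memory_order_seq_cst);
        r2 = flag1.load(std::memory_order_seq_cst);
    });
    p1.join();
    p2.join();
    return r1 == 0 && r2 == 0;
}

// Counts how often the "both in the critical section" outcome shows up.
int count_violations(int trials) {
    int violations = 0;
    for (int i = 0; i < trials; ++i) {
        if (both_entered_once()) ++violations;
    }
    return violations;
}
```

With sequentially consistent atomics `count_violations` always returns 0. If both stores and loads were weakened to `std::memory_order_relaxed`, the language would allow a non-zero count, which is the software-visible form of the read bypassing the buffered write.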
Hardware Optimizations
- Overlapped Writes
• Assume the Data and Head variables reside in different memory modules
• The write to Head may be injected into the network before the write to Data
has reached its memory module
• Therefore, it is possible for another processor to observe the new value of Head
and yet obtain the old value of Data
• The effect is a reordering of write operations
Sarita V. Adve, Kourosh Gharachorloo, “Shared Memory Consistency Models: A Tutorial”
(coalesced write)
Hardware Optimizations
- Non-blocking Reads
• If P2 is allowed to issue its read operations in an overlapped
fashion, the read of Data may arrive at its memory module
before the write from P1, while the read of Head reaches its
memory module after the write from P1
• Result: P2 observes the new value of Head together with the old value of Data
Sarita V. Adve, Kourosh Gharachorloo, “Shared Memory Consistency Models: A Tutorial”
(coalesced read)
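A minimal C++ sketch of how the Data/Head handoff can be made safe at the language level, assuming C++11 atomics. The function name `handoff_once` and the variable layout are assumptions for illustration, and the value 2000 follows the slide. The release store to `head` keeps the earlier write to `data` from becoming visible after it, and the acquire load keeps the later read of `data` from being performed before it.

```cpp
#include <atomic>
#include <thread>

// Runs the Data/Head handoff once and returns the value of data that the
// consumer (P2) observed after it saw head become 1.
int handoff_once() {
    int data = 0;                  // plain variable, published through head
    std::atomic<int> head{0};
    int observed = -1;

    std::thread p1([&] {
        data = 2000;                                   // write Data
        head.store(1, std::memory_order_release);      // then publish Head
    });
    std::thread p2([&] {
        while (head.load(std::memory_order_acquire) == 0) {
            std::this_thread::yield();                 // wait for Head
        }
        observed = data;                               // guaranteed to see 2000
    });
    p1.join();
    p2.join();
    return observed;
}
```

Without the release/acquire pair (for example with relaxed operations on `head` and an atomic `data`), the overlapped-write and non-blocking-read effects described above could let P2 observe the old value of Data.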
For a closer look at how this works, see
Paul E. McKenney, “Memory Barriers: a Hardware View for
Software Hackers”
So what can we do? Ideally
• Sequential Consistency (single-core operation order =
multi-core operation order)
– The result of any execution is the same as if the
operations of all the processors were executed in
some sequential order, and the operations of each
individual processor appear in this sequence in
the order specified by its program
• There is no local reordering
• Each write becomes visible to all threads
Sarita V. Adve, Kourosh Gharachorloo, “Shared Memory Consistency Models: A Tutorial”
Luc Maranget et al., “A Tutorial Introduction to the ARM and POWER Relaxed Memory Models”
In practice, SC is not guaranteed
Memory model                 Architecture   Local ordering   Multiple-copy atomic
Total store ordering (TSO)   Intel x86      X                O
Relaxed memory model         ARM            X                X
(O: guaranteed, X: not guaranteed)
Luc Maranget et al., “A Tutorial Introduction to the ARM and POWER Relaxed Memory Models”
Developers have to write code that manages the ordering of memory operations themselves
With so many hardware optimizations, how do I
know how a program actually behaves
(programmer-observable behavior)?
• Mathematically rigorous architecture
definitions
– Luc Maranget et al., “A Tutorial Introduction to the
ARM and POWER Relaxed Memory Models”
• Hardware semantics
– Shaked Flur et al., “Modelling the ARMv8
Architecture, Operationally: Concurrency and ISA”
• C/C++11 memory model
• …?
Mathematically Rigorous Architecture
Definitions – For Example
• Message Passing (MP)
Luc Maranget et al., “A Tutorial Introduction to the ARM and POWER Relaxed Memory Models”
Thread 0: x=1; y=1;
Thread 1: r1=y; r2=x;
Outcome of interest: r1=1 ∧ r2=0
x86-TSO: forbidden
ARM: allowed
Partial-order Propagation?
Does partial-order propagation always
affect program behavior? Not necessarily
• MP test harness
• m is the number of times that the final outcome
of r1=1 ∧ r2=0 was observed in n trials
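A rough C++ sketch of such a harness, counting m occurrences of r1=1 and r2=0 in n trials. It is an illustration only: `MpResult` and `run_mp` are hypothetical names, the memory orders on the accesses to `y` are parameters so that a relaxed and a release/acquire variant can be compared, and starting fresh threads for every trial makes the weak outcome far less likely to show up than in a purpose-built litmus tool.

```cpp
#include <atomic>
#include <thread>

// Result of a harness run: m observations of (r1 == 1 && r2 == 0) in n trials.
struct MpResult {
    int m;
    int n;
};

// Runs the message-passing (MP) test n times.
// store_order is used for the write to y, load_order for the read of y;
// the accesses to x are always relaxed.
MpResult run_mp(int n, std::memory_order store_order, std::memory_order load_order) {
    int m = 0;
    for (int i = 0; i < n; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = 0, r2 = 0;

        std::thread t0([&] {
            x.store(1, std::memory_order_relaxed);  // x = 1
            y.store(1, store_order);                // y = 1
        });
        std::thread t1([&] {
            r1 = y.load(load_order);                // r1 = y
            r2 = x.load(std::memory_order_relaxed); // r2 = x
        });
        t0.join();
        t1.join();

        if (r1 == 1 && r2 == 0) ++m;                // the outcome of interest
    }
    return MpResult{m, n};
}
```

Calling `run_mp(n, std::memory_order_relaxed, std::memory_order_relaxed)` may or may not report m > 0 depending on the machine and timing, which matches the point that an allowed behavior is not necessarily observed. With `std::memory_order_release` and `std::memory_order_acquire`, m is always 0.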
Hardware Semantics
Shaked Flur et al., “Modelling the ARMv8 Architecture, Operationally: Concurrency and ISA”
Web Site of Hardware Semantics
http://www.cl.cam.ac.uk/~sf502/popl16/help.html
Result of Hardware Semantics
http://www.cl.cam.ac.uk/~sf502/popl16/help.html
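If a location is accessed concurrently (for example, because a lock was written incorrectly), the result output makes it possible to spot this early.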
C/C++11 Memory Model
• At the language level, the standard defines keywords so that every
hardware target has to conform to the language memory model.
– https://www.youtube.com/watch?v=S-x-23lrRnc
• This talk mentions that, to satisfy the C11 memory model on ARM,
the compiler can end up emitting double barriers
• Reinoud Elhorst, “Lowering C11 Atomics for ARM in
LLVM”
– Torvald Riegel, “Modern C/C++ concurrency”
• Semantics
– Mark Batty, “Mathematizing C++ concurrency”
Mathematizing C++ Concurrency
• Uses Isabelle/HOL to write down the semantics of the C++ memory
model
• For example: the definition of a release sequence
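To make the release-sequence notion concrete, here is a small C++ sketch (an illustration, not taken from the paper; `release_sequence_demo` and the thread names are made up). The producer's release store heads the release sequence, a second thread continues it with a relaxed read-modify-write, and the consumer's acquire load reads the value written by that read-modify-write. Because that write belongs to the release sequence, the consumer still synchronizes with the producer and must see the payload.

```cpp
#include <atomic>
#include <thread>

// Returns the payload value observed by the consumer.
int release_sequence_demo() {
    int payload = 0;               // plain data published through count
    std::atomic<int> count{0};
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        count.store(1, std::memory_order_release);   // head of the release sequence
    });
    std::thread bumper([&] {
        // Relaxed read-modify-write 1 -> 2: it continues the release sequence
        // even though it carries no release semantics of its own.
        int expected = 1;
        while (!count.compare_exchange_weak(expected, 2, std::memory_order_relaxed)) {
            expected = 1;
            std::this_thread::yield();
        }
    });
    std::thread consumer([&] {
        // Reads the value written by the bumper's read-modify-write.
        while (count.load(std::memory_order_acquire) != 2) {
            std::this_thread::yield();
        }
        seen = payload;            // ordered after the producer's write
    });

    producer.join();
    bumper.join();
    consumer.join();
    return seen;
}
```

If the bumper used a plain `store(2, std::memory_order_relaxed)` instead of a read-modify-write, its write would not continue the release sequence, and reading `payload` in the consumer would be a data race.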
Some Examples of Concurrency Performance Techniques
• LMAX
• RCU
• Concurrent malloc(3)
• An Analysis of Linux Scalability to Many Cores
LMAX: New Financial Trading Platform
https://martinfowler.com/articles/lmax.html
LMAX Lock-free技巧
http://mechanitis.blogspot.tw/2011/06/dissecting-disruptor-how-do-i-read-from.html
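• Applying barriers turns the original lock into a lock-free design; lock-free can be thought of as
locking that is managed by the hardware. The implementation idea is basically the same as a lock: queueing.
• RingBuffer: improves response time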
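The ring-buffer idea can be sketched in C++ as a single-producer/single-consumer queue. This is not the Disruptor's Java API; `SpscRing`, `try_push`, `try_pop` and `transfer_sum` are hypothetical names, and the sketch assumes exactly one producer thread and one consumer thread. The two sequence counters are published with release stores and read with acquire loads, so no lock is needed.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <thread>

// Single-producer / single-consumer ring buffer with N slots.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Called only by the producer thread. Returns false if the ring is full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail - head_.load(std::memory_order_acquire) == N) return false;
        slots_[tail & (N - 1)] = value;
        tail_.store(tail + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Called only by the consumer thread. Returns false if the ring is empty.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = slots_[head & (N - 1)];
        head_.store(head + 1, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<T, N> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to read
    std::atomic<std::size_t> tail_{0};  // next slot to write
};

// Sends 0..count-1 through the ring and returns the sum seen by the consumer.
long long transfer_sum(int count) {
    SpscRing<int, 8> ring;
    long long sum = 0;
    std::thread producer([&] {
        for (int i = 0; i < count; ++i) {
            while (!ring.try_push(i)) std::this_thread::yield();
        }
    });
    std::thread consumer([&] {
        int value = 0;
        for (int i = 0; i < count; ++i) {
            while (!ring.try_pop(value)) std::this_thread::yield();
            sum += value;
        }
    });
    producer.join();
    consumer.join();
    return sum;
}
```

The buffer is preallocated and reused, so the hot path performs no allocation and no locking, which is one plausible reading of why a ring buffer helps response time.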
Read Copy Update (RCU)
• Read-mostly situations
• Typical RCU: split an update into removal and reclamation phases
– Removing and replacing references to data items can run concurrently with readers
– Removing pointers to a data structure ensures that subsequent readers cannot gain a
reference to it
– RCU provides implicit low-overhead communication between readers and reclaimers
(synchronize_rcu())
https://www.kernel.org/doc/Documentation/RCU/whatisRCU.txt
https://lwn.net/Articles/262464/
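The read / copy / update pattern can be approximated in user-space C++ as follows. This is deliberately not kernel RCU: it swaps in `std::shared_ptr` reference counting as a stand-in for the grace period, so readers pay for an atomic reference-count update that real RCU readers avoid. `Config` and `RcuLikeBox` are hypothetical names, a single updater is assumed, and the free functions `std::atomic_load_explicit` / `std::atomic_store_explicit` for `shared_ptr` are C++11 facilities that C++20 deprecates in favor of `std::atomic<std::shared_ptr<T>>`.

```cpp
#include <atomic>
#include <memory>
#include <string>
#include <utility>

struct Config {
    int version;
    std::string name;
};

class RcuLikeBox {
public:
    explicit RcuLikeBox(Config initial)
        : current_(std::make_shared<const Config>(std::move(initial))) {}

    // Reader side: take a reference to whichever version is current.
    // The snapshot stays valid even if an update happens afterwards.
    std::shared_ptr<const Config> read() const {
        return std::atomic_load_explicit(&current_, std::memory_order_acquire);
    }

    // Updater side (single updater assumed): copy the current version,
    // modify the copy, then publish it by replacing the pointer.
    void update_name(const std::string& name) {
        std::shared_ptr<const Config> old = read();
        auto fresh = std::make_shared<Config>(*old);   // copy
        fresh->name = name;                            // update the copy
        fresh->version = old->version + 1;
        std::atomic_store_explicit(&current_,
                                   std::shared_ptr<const Config>(std::move(fresh)),
                                   std::memory_order_release);  // removal/replacement
        // Reclamation: the old version is freed when the last reader drops its
        // snapshot. This plays the role that waiting for a grace period
        // (synchronize_rcu()) plays in the kernel.
    }

private:
    std::shared_ptr<const Config> current_;
};
```

A reader that called `read()` before the update keeps seeing the old version, while later readers see the new one, which mirrors the property that subsequent readers cannot gain a reference to the removed data.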
Is the Grace Period Too Long?
https://lwn.net/Articles/253651/
There are many RCU variants; those will have to wait for another time
https://lwn.net/Articles/264090/
Concurrent malloc(3)
• How to avoid false cache sharing
– Modern multi-processor systems preserve a coherent
view of memory on a per-cache-line basis
• How to reduce lock contention
Jason Evans, “A Scalable Concurrent malloc(3) Implementation for FreeBSD”
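As an illustration of false cache sharing and the padding remedy, the sketch below gives every thread its own counter aligned to a full cache line. The 64-byte line size, `PaddedCounter`, `kThreads` and `count_in_parallel` are assumptions made for this example and do not come from the paper.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

constexpr std::size_t kCacheLine = 64;  // assumed cache-line size in bytes
constexpr int kThreads = 4;             // assumed number of worker threads

// alignas pads the struct to a whole cache line, so two counters never share
// a line and one thread's updates do not invalidate another thread's line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};

// Each thread increments only its own counter; returns the grand total.
long count_in_parallel(long per_thread) {
    PaddedCounter counters[kThreads];
    std::thread workers[kThreads];

    for (int t = 0; t < kThreads; ++t) {
        workers[t] = std::thread([&counters, t, per_thread] {
            for (long i = 0; i < per_thread; ++i) {
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();

    long total = 0;
    for (const auto& c : counters) total += c.value.load(std::memory_order_relaxed);
    return total;
}
```

Removing `alignas(kCacheLine)` keeps the result identical but packs the counters into one cache line, so the threads contend for that line even though they never touch the same variable. The cost of padding is the wasted space inside each line.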
jemalloc
• Phk-malloc was specially optimized to minimize the working set of pages, whereas jemalloc
must be more concerned with cache locality
• jemalloc first tries to minimize memory usage, and tries to allocate contiguously
(weaker security)
• One way of fixing false cache sharing is to pad allocations, but padding is in direct opposition
to the goal of packing objects as tightly as possible; it can cause severe internal
fragmentation. jemalloc instead relies on multiple allocation arenas to reduce the
problem
• One of the main goals for this allocator was to reduce lock contention for
multi-threaded applications. An earlier approach pushed locks down: rather than using
a single allocator lock, each free list had its own lock
• The solution was to use multiple
arenas for allocation, and assign threads
to arenas via hashing of the thread identifiers
Jason Evans, “A Scalable Concurrent malloc(3) Implementation for FreeBSD”
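The arena scheme described above (multiple arenas, threads mapped to arenas by hashing the thread identifier) can be sketched like this. It is not jemalloc's code: `Arena`, `kArenas`, `arena_index_for` and the allocation counter standing in for real allocator state are all hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

constexpr std::size_t kArenas = 4;  // assumed number of arenas

// A stand-in for an allocation arena: its own lock plus some state.
struct Arena {
    std::mutex lock;
    std::size_t allocations = 0;
};

Arena g_arenas[kArenas];

// Maps a thread identifier to an arena index by hashing it.
std::size_t arena_index_for(std::thread::id id) {
    return std::hash<std::thread::id>{}(id) % kArenas;
}

// Simulates one allocation: only the calling thread's arena lock is taken,
// so threads mapped to different arenas do not contend.
void record_allocation() {
    Arena& arena = g_arenas[arena_index_for(std::this_thread::get_id())];
    std::lock_guard<std::mutex> guard(arena.lock);
    ++arena.allocations;
}

// Sums the per-arena counters.
std::size_t total_allocations() {
    std::size_t total = 0;
    for (Arena& arena : g_arenas) {
        std::lock_guard<std::mutex> guard(arena.lock);
        total += arena.allocations;
    }
    return total;
}

// Runs `threads` workers that each perform `per_thread` simulated allocations.
std::size_t allocate_from_threads(int threads, int per_thread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) record_allocation();
        });
    }
    for (auto& w : workers) w.join();
    return total_allocations();
}
```

Hashing gives no guarantee of an even spread: several busy threads can land in the same arena and contend there, which is the main weakness of this assignment policy.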
Scalability Collapse Caused by Non-
scalable Locks
Linux Scalability to Many Cores -
Per-core Mount Caches
Silas Boyd-Wickizer et al., “An Analysis of Linux Scalability to Many Cores”
• Observation: mount table is
rarely modified
• Common case: cores access
per-core tables
• Modify mount table: invalidate
per-core tables
Linux Scalability to Many Cores -
Sloppy Counters
• Motivation: a reference count shared by many cores is slow to access, because every update moves its cache line between cores
Silas Boyd-Wickizer et al., “An Analysis of Linux Scalability to Many Cores”
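A simplified C++ sketch of the sloppy-counter idea: a central shared counter plus per-core pools of spare references, so that most get/put operations touch only core-local state. The names (`SloppyCounter`, `get`, `put`, `in_use`, `exercise`), the threshold of 8 and the use of one thread per "core" are assumptions for illustration, not the kernel implementation.

```cpp
#include <atomic>
#include <thread>

constexpr int kCores = 4;        // assumed number of cores (one thread each)
constexpr long kThreshold = 8;   // assumed maximum of spare references per core

class SloppyCounter {
public:
    // Acquire one reference on behalf of `core`.
    void get(int core) {
        long& spare = local_[core].spare;
        if (spare > 0) {          // fast path: reuse a spare, no shared write
            --spare;
            return;
        }
        central_.fetch_add(1, std::memory_order_relaxed);
    }

    // Release one reference on behalf of `core`.
    void put(int core) {
        long& spare = local_[core].spare;
        if (++spare > kThreshold) {   // too many spares: return them centrally
            central_.fetch_sub(spare, std::memory_order_relaxed);
            spare = 0;
        }
    }

    // Exact number of references in use: central value minus all spares.
    // Expensive, and only valid while no thread is inside get() or put().
    long in_use() const {
        long spares = 0;
        for (const Local& l : local_) spares += l.spare;
        return central_.load(std::memory_order_relaxed) - spares;
    }

private:
    struct alignas(64) Local {    // one cache line per core (64 bytes assumed)
        long spare = 0;
    };
    std::atomic<long> central_{0};
    Local local_[kCores];
};

// Every core performs `gets` acquisitions followed by `puts` releases.
long exercise(int gets, int puts) {
    SloppyCounter counter;
    std::thread workers[kCores];
    for (int c = 0; c < kCores; ++c) {
        workers[c] = std::thread([&counter, c, gets, puts] {
            for (int i = 0; i < gets; ++i) counter.get(c);
            for (int i = 0; i < puts; ++i) counter.put(c);
        });
    }
    for (auto& w : workers) w.join();
    return counter.in_use();
}
```

The trade-off is that the central value alone is only approximate ("sloppy"); getting the exact count means visiting every core's spare pool, which is acceptable when exact reads are rare.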
Some Concurrency Security Examples
• Concurrency fuzzer
– Sebastian Burckhardt et al., “A Randomized
Scheduler with Probabilistic Guarantees of Finding
Bugs”
• Timing side channel attack
– Yeongjin Jang et al., “Breaking Kernel Address
Space Layout Randomization with Intel TSX”
Concurrency Fuzzer-
Randomized Scheduler
Sebastian Burckhardt et al., “A Randomized Scheduler with Probabilistic
Guarantees of Finding Bugs”
Randomized Scheduler
Essentially, read/write reordering in hardware is not simulated
Find Violation (Order/ Atomicity)
The following slide deck surveys several fuzzers:
Ralf-Philipp Weinmann et al., “Concurrency: A problem and
opportunity in the exploitation of
memory corruptions”
Intel Transactional Synchronization
Extensions
• The assembly instruction xbegin can return various
results that represent the hardware's suggestions for
how to proceed and reasons for failure: success, a
suggestion to retry, or a potential cause for the abort
• To use TSX effectively, it is imperative to understand its
implementation and limitations. TSX is implemented
using the cache coherence protocol, which x86
machines already implement. When a transaction
begins, the processor starts tracking the read and write
sets of cache lines that have been brought into the L1
cache. If at any point during a logical core's execution
of a transaction another core modifies a cache line in
the read or write set, then the transaction is aborted.
Nick Stanley, “Hardware Transactional Memory with Intel’s TSX”
Intel Transactional Synchronization
Extensions - Suppressing exceptions
• A transaction aborts when a hardware exception occurs during the
execution of the transaction. However, unlike normal situations where the
OS intervenes and handles these exceptions gracefully, TSX instead
invokes a user-specified abort handler, without informing the underlying
OS. More precisely, TSX treats these exceptions in a synchronous
manner—immediately executing an abort handler while suppressing the
exception itself. In other words, the exception inside the transaction will
not be communicated to the underlying OS. This allows us to engage in
abnormal behavior (e.g., attempting to access privileged, i.e., kernel,
memory regions) without worrying about crashing the program. In DrK,
we break KASLR by turning this surprising behavior into a timing channel
that leaks the status (e.g., mapped or unmapped) of all kernel pages.
Timing Side Channel Attack
• TSX instead invokes a user-specified
abort handler, without
informing the underlying OS
• In other words, from user space I can
learn kernel addresses even though they are randomized
(!!!)
Yeongjin Jang et al., “Breaking Kernel Address Space Layout
Randomization with Intel TSX”
Reference
• Sarita V. Adve, Kourosh Gharachorloo, “Shared Memory Consistency
Models: A Tutorial”
• Luc Maranget et al., “A Tutorial Introduction to the ARM and POWER
Relaxed Memory Models”
• Shaked Flur et al., “Modelling the ARMv8 Architecture, Operationally:
Concurrency and ISA”
• https://www.youtube.com/watch?v=6QU37TwRO4w
• http://www.cl.cam.ac.uk/~sf502/popl16/help.html
• Jade Alglave et al., “The Semantics of Power and ARM Multiprocessor
Machine Code”
• Paul E. McKenney, “Memory Barriers: a Hardware View for Software
Hackers”
C/C++11 memory model
• https://www.youtube.com/watch?v=S-x-23lrRnc
• Reinoud Elhorst, “Lowering C11 Atomics for ARM in LLVM”
• Torvald Riegel, “Modern C/C++ concurrency”
• Mark Batty, “Mathematizing C++ concurrency”
LMAX
• https://github.com/LMAX-Exchange/disruptor
• https://martinfowler.com/articles/lmax.html
• http://mechanitis.blogspot.tw/2011/06/dissecting-disruptor-how-do-i-read-from.html
RCU
• https://www.kernel.org/doc/Documentation/RCU/whatisRCU.txt
• https://lwn.net/Articles/262464/
• https://lwn.net/Articles/253651/
• https://lwn.net/Articles/264090/
Concurrent malloc(3)
• Jason Evans, “A Scalable Concurrent malloc(3) Implementation
for FreeBSD”
Concurrency security
• Sebastian Burckhardt et al., “A Randomized Scheduler with
Probabilistic Guarantees of Finding Bugs”
• Ralf-Philipp Weinmann et al., “Concurrency: A problem and
opportunity in the exploitation of memory corruptions”
• Yeongjin Jang et al., “Breaking Kernel Address Space Layout
Randomization with Intel TSX”
• Nick Stanley, “Hardware Transactional Memory with Intel’s
TSX” (includes recommended ways to write concurrent code on Intel)
1 of 38

Recommended

Optimizing VM images for OpenStack with KVM/QEMU by
Optimizing VM images for OpenStack with KVM/QEMUOptimizing VM images for OpenStack with KVM/QEMU
Optimizing VM images for OpenStack with KVM/QEMUOpenStack Foundation
14.5K views24 slides
ceph optimization on ssd ilsoo byun-short by
ceph optimization on ssd ilsoo byun-shortceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortNAVER D2
2.4K views49 slides
Open stack cinder by
Open stack cinderOpen stack cinder
Open stack cinderYong Luo
413 views24 slides
OpenStack Cinder by
OpenStack CinderOpenStack Cinder
OpenStack CinderRenuka Apte
5.3K views14 slides
The kvm virtualization way by
The kvm virtualization wayThe kvm virtualization way
The kvm virtualization wayFrancisco Gonçalves
8.3K views22 slides
OpenStack Cinder Best Practices - Meet Up by
OpenStack Cinder Best Practices - Meet UpOpenStack Cinder Best Practices - Meet Up
OpenStack Cinder Best Practices - Meet UpAaron Delp
2.2K views29 slides

More Related Content

What's hot

KVM tools and enterprise usage by
KVM tools and enterprise usageKVM tools and enterprise usage
KVM tools and enterprise usagevincentvdk
5K views93 slides
Sheepdog Status Report by
Sheepdog Status ReportSheepdog Status Report
Sheepdog Status ReportLiu Yuan
1.1K views20 slides
Laying OpenStack Cinder Block Services by
Laying OpenStack Cinder Block ServicesLaying OpenStack Cinder Block Services
Laying OpenStack Cinder Block ServicesKenneth Hui
2.5K views33 slides
RHEVM - Live Storage Migration by
RHEVM - Live Storage MigrationRHEVM - Live Storage Migration
RHEVM - Live Storage MigrationRaz Tamir
1.2K views25 slides
Openstack HA by
Openstack HAOpenstack HA
Openstack HAYong Luo
487 views55 slides
Building your own NSQL store by
Building your own NSQL storeBuilding your own NSQL store
Building your own NSQL storeEdward Capriolo
1.1K views57 slides

What's hot(20)

KVM tools and enterprise usage by vincentvdk
KVM tools and enterprise usageKVM tools and enterprise usage
KVM tools and enterprise usage
vincentvdk5K views
Sheepdog Status Report by Liu Yuan
Sheepdog Status ReportSheepdog Status Report
Sheepdog Status Report
Liu Yuan1.1K views
Laying OpenStack Cinder Block Services by Kenneth Hui
Laying OpenStack Cinder Block ServicesLaying OpenStack Cinder Block Services
Laying OpenStack Cinder Block Services
Kenneth Hui2.5K views
RHEVM - Live Storage Migration by Raz Tamir
RHEVM - Live Storage MigrationRHEVM - Live Storage Migration
RHEVM - Live Storage Migration
Raz Tamir1.2K views
Openstack HA by Yong Luo
Openstack HAOpenstack HA
Openstack HA
Yong Luo487 views
OpenStack Cinder Overview - Havana Release by Avishay Traeger
OpenStack Cinder Overview - Havana ReleaseOpenStack Cinder Overview - Havana Release
OpenStack Cinder Overview - Havana Release
Avishay Traeger9K views
Cinder Live Migration and Replication - OpenStack Summit Austin by Ed Balduf
Cinder Live Migration and Replication - OpenStack Summit AustinCinder Live Migration and Replication - OpenStack Summit Austin
Cinder Live Migration and Replication - OpenStack Summit Austin
Ed Balduf3.4K views
Libvirt/KVM Driver Update (Kilo) by Stephen Gordon
Libvirt/KVM Driver Update (Kilo)Libvirt/KVM Driver Update (Kilo)
Libvirt/KVM Driver Update (Kilo)
Stephen Gordon4.7K views
Kubernetes for HCL Connections Component Pack - Build or Buy? by Martin Schmidt
Kubernetes for HCL Connections Component Pack - Build or Buy?Kubernetes for HCL Connections Component Pack - Build or Buy?
Kubernetes for HCL Connections Component Pack - Build or Buy?
Martin Schmidt43 views
Virtualization Architecture & KVM by Pradeep Kumar
Virtualization Architecture & KVMVirtualization Architecture & KVM
Virtualization Architecture & KVM
Pradeep Kumar33.4K views
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화 by OpenStack Korea Community
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
Monitor PowerKVM using Ganglia, Nagios by Pradeep Kumar
Monitor PowerKVM using Ganglia, NagiosMonitor PowerKVM using Ganglia, Nagios
Monitor PowerKVM using Ganglia, Nagios
Pradeep Kumar462 views
Play With Android by Champ Yen
Play With AndroidPlay With Android
Play With Android
Champ Yen1.1K views
Linux Integrity Mechanisms - Protecting Container Runtime as an example by Clay (Chih-Hao) Chang
Linux Integrity Mechanisms - Protecting Container Runtime as an exampleLinux Integrity Mechanisms - Protecting Container Runtime as an example
Linux Integrity Mechanisms - Protecting Container Runtime as an example
OSS Presentation VMWorld 2011 by Andy Bennett & Craig Morgan by OpenStorageSummit
OSS Presentation VMWorld 2011 by Andy Bennett & Craig MorganOSS Presentation VMWorld 2011 by Andy Bennett & Craig Morgan
OSS Presentation VMWorld 2011 by Andy Bennett & Craig Morgan
OpenStorageSummit419 views
Cinder - status of replication by Ed Balduf
Cinder - status of replicationCinder - status of replication
Cinder - status of replication
Ed Balduf825 views
Symmetric Crypto for DPDK - Declan Doherty by harryvanhaaren
Symmetric Crypto for DPDK - Declan DohertySymmetric Crypto for DPDK - Declan Doherty
Symmetric Crypto for DPDK - Declan Doherty
harryvanhaaren2K views
One-click Hadoop Cluster Deployment on OpenPOWER Systems by Pradeep Kumar
One-click Hadoop Cluster Deployment on OpenPOWER SystemsOne-click Hadoop Cluster Deployment on OpenPOWER Systems
One-click Hadoop Cluster Deployment on OpenPOWER Systems
Pradeep Kumar470 views

Similar to [若渴計畫] Studying Concurrency

Memory model by
Memory modelMemory model
Memory modelYi-Hsiu Hsu
4.6K views45 slides
CPU Caches by
CPU CachesCPU Caches
CPU Cachesshinolajla
4.6K views44 slides
Exploiting Modern Microarchitectures: Meltdown, Spectre, and other Attacks by
Exploiting Modern Microarchitectures: Meltdown, Spectre, and other AttacksExploiting Modern Microarchitectures: Meltdown, Spectre, and other Attacks
Exploiting Modern Microarchitectures: Meltdown, Spectre, and other Attacksinside-BigData.com
940 views184 slides
Cassandra and drivers by
Cassandra and driversCassandra and drivers
Cassandra and driversBen Bromhead
446 views56 slides
Cpu Cache and Memory Ordering——并发程序设计入门 by
Cpu Cache and Memory Ordering——并发程序设计入门Cpu Cache and Memory Ordering——并发程序设计入门
Cpu Cache and Memory Ordering——并发程序设计入门frogd
50.7K views81 slides
Beneath the Linux Interrupt handling by
Beneath the Linux Interrupt handlingBeneath the Linux Interrupt handling
Beneath the Linux Interrupt handlingBhoomil Chavda
595 views20 slides

Similar to [若渴計畫] Studying Concurrency(20)

CPU Caches by shinolajla
CPU CachesCPU Caches
CPU Caches
shinolajla4.6K views
Exploiting Modern Microarchitectures: Meltdown, Spectre, and other Attacks by inside-BigData.com
Exploiting Modern Microarchitectures: Meltdown, Spectre, and other AttacksExploiting Modern Microarchitectures: Meltdown, Spectre, and other Attacks
Exploiting Modern Microarchitectures: Meltdown, Spectre, and other Attacks
inside-BigData.com940 views
Cassandra and drivers by Ben Bromhead
Cassandra and driversCassandra and drivers
Cassandra and drivers
Ben Bromhead446 views
Cpu Cache and Memory Ordering——并发程序设计入门 by frogd
Cpu Cache and Memory Ordering——并发程序设计入门Cpu Cache and Memory Ordering——并发程序设计入门
Cpu Cache and Memory Ordering——并发程序设计入门
frogd50.7K views
Beneath the Linux Interrupt handling by Bhoomil Chavda
Beneath the Linux Interrupt handlingBeneath the Linux Interrupt handling
Beneath the Linux Interrupt handling
Bhoomil Chavda595 views
POWER ISA introduction and what’s new in ISA V3.1 (Overview) by Ganesan Narayanasamy
POWER ISA introduction and what’s new in ISA V3.1 (Overview)POWER ISA introduction and what’s new in ISA V3.1 (Overview)
POWER ISA introduction and what’s new in ISA V3.1 (Overview)
Windows Internals for Linux Kernel Developers by Kernel TLV
Windows Internals for Linux Kernel DevelopersWindows Internals for Linux Kernel Developers
Windows Internals for Linux Kernel Developers
Kernel TLV1.2K views
Andy Parsons Pivotal June 2011 by Andy Parsons
Andy Parsons Pivotal June 2011Andy Parsons Pivotal June 2011
Andy Parsons Pivotal June 2011
Andy Parsons508 views
Automating the Hunt for Non-Obvious Sources of Latency Spreads by ScyllaDB
Automating the Hunt for Non-Obvious Sources of Latency SpreadsAutomating the Hunt for Non-Obvious Sources of Latency Spreads
Automating the Hunt for Non-Obvious Sources of Latency Spreads
ScyllaDB269 views
Client Drivers and Cassandra, the Right Way by DataStax Academy
Client Drivers and Cassandra, the Right WayClient Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right Way
DataStax Academy7.5K views
Pune-Cocoa: Blocks and GCD by Prashant Rane
Pune-Cocoa: Blocks and GCDPune-Cocoa: Blocks and GCD
Pune-Cocoa: Blocks and GCD
Prashant Rane1.3K views
Solaris vs Linux by Grigale LTD
Solaris vs LinuxSolaris vs Linux
Solaris vs Linux
Grigale LTD27K views
Application Profiling for Memory and Performance by pradeepfn
Application Profiling for Memory and PerformanceApplication Profiling for Memory and Performance
Application Profiling for Memory and Performance
pradeepfn1.7K views
LMAX Disruptor - High Performance Inter-Thread Messaging Library by Sebastian Andrasoni
LMAX Disruptor - High Performance Inter-Thread Messaging LibraryLMAX Disruptor - High Performance Inter-Thread Messaging Library
LMAX Disruptor - High Performance Inter-Thread Messaging Library
Sebastian Andrasoni4.6K views
Using the big guns: Advanced OS performance tools for troubleshooting databas... by Nikolay Savvinov
Using the big guns: Advanced OS performance tools for troubleshooting databas...Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...
Nikolay Savvinov192 views
Synchronization linux by Susant Sahani
Synchronization linuxSynchronization linux
Synchronization linux
Susant Sahani4.1K views

More from Aj MaChInE

An Intro on Data-oriented Attacks by
An Intro on Data-oriented AttacksAn Intro on Data-oriented Attacks
An Intro on Data-oriented AttacksAj MaChInE
301 views18 slides
A Study on .NET Framework for Red Team - Part I by
A Study on .NET Framework for Red Team - Part IA Study on .NET Framework for Red Team - Part I
A Study on .NET Framework for Red Team - Part IAj MaChInE
493 views28 slides
A study on NetSpectre by
A study on NetSpectreA study on NetSpectre
A study on NetSpectreAj MaChInE
211 views27 slides
Introduction to Adversary Evaluation Tools by
Introduction to Adversary Evaluation ToolsIntroduction to Adversary Evaluation Tools
Introduction to Adversary Evaluation ToolsAj MaChInE
1.2K views45 slides
[若渴] A preliminary study on attacks against consensus in bitcoin by
[若渴] A preliminary study on attacks against consensus in bitcoin[若渴] A preliminary study on attacks against consensus in bitcoin
[若渴] A preliminary study on attacks against consensus in bitcoinAj MaChInE
345 views46 slides
[RAT資安小聚] Study on Automatically Evading Malware Detection by
[RAT資安小聚] Study on Automatically Evading Malware Detection[RAT資安小聚] Study on Automatically Evading Malware Detection
[RAT資安小聚] Study on Automatically Evading Malware DetectionAj MaChInE
795 views71 slides

More from Aj MaChInE(19)

An Intro on Data-oriented Attacks by Aj MaChInE
An Intro on Data-oriented AttacksAn Intro on Data-oriented Attacks
An Intro on Data-oriented Attacks
Aj MaChInE301 views
A Study on .NET Framework for Red Team - Part I by Aj MaChInE
A Study on .NET Framework for Red Team - Part IA Study on .NET Framework for Red Team - Part I
A Study on .NET Framework for Red Team - Part I
Aj MaChInE493 views
A study on NetSpectre by Aj MaChInE
A study on NetSpectreA study on NetSpectre
A study on NetSpectre
Aj MaChInE211 views
Introduction to Adversary Evaluation Tools by Aj MaChInE
Introduction to Adversary Evaluation ToolsIntroduction to Adversary Evaluation Tools
Introduction to Adversary Evaluation Tools
Aj MaChInE1.2K views
[若渴] A preliminary study on attacks against consensus in bitcoin by Aj MaChInE
[若渴] A preliminary study on attacks against consensus in bitcoin[若渴] A preliminary study on attacks against consensus in bitcoin
[若渴] A preliminary study on attacks against consensus in bitcoin
Aj MaChInE345 views
[RAT資安小聚] Study on Automatically Evading Malware Detection by Aj MaChInE
[RAT資安小聚] Study on Automatically Evading Malware Detection[RAT資安小聚] Study on Automatically Evading Malware Detection
[RAT資安小聚] Study on Automatically Evading Malware Detection
Aj MaChInE795 views
[若渴] Preliminary Study on Design and Exploitation of Trustzone by Aj MaChInE
[若渴] Preliminary Study on Design and Exploitation of Trustzone[若渴] Preliminary Study on Design and Exploitation of Trustzone
[若渴] Preliminary Study on Design and Exploitation of Trustzone
Aj MaChInE281 views
[若渴]Study on Side Channel Attacks and Countermeasures by Aj MaChInE
[若渴]Study on Side Channel Attacks and Countermeasures [若渴]Study on Side Channel Attacks and Countermeasures
[若渴]Study on Side Channel Attacks and Countermeasures
Aj MaChInE858 views
[若渴計畫] Challenges and Solutions of Window Remote Shellcode by Aj MaChInE
[若渴計畫] Challenges and Solutions of Window Remote Shellcode[若渴計畫] Challenges and Solutions of Window Remote Shellcode
[若渴計畫] Challenges and Solutions of Window Remote Shellcode
Aj MaChInE981 views
[若渴計畫] Introduction: Formal Verification for Code by Aj MaChInE
[若渴計畫] Introduction: Formal Verification for Code[若渴計畫] Introduction: Formal Verification for Code
[若渴計畫] Introduction: Formal Verification for Code
Aj MaChInE718 views
[若渴計畫] Studying ASLR^cache by Aj MaChInE
[若渴計畫] Studying ASLR^cache[若渴計畫] Studying ASLR^cache
[若渴計畫] Studying ASLR^cache
Aj MaChInE430 views
[若渴計畫] Black Hat 2017之過去閱讀相關整理 by Aj MaChInE
[若渴計畫] Black Hat 2017之過去閱讀相關整理[若渴計畫] Black Hat 2017之過去閱讀相關整理
[若渴計畫] Black Hat 2017之過去閱讀相關整理
Aj MaChInE434 views
閱讀文章分享@若渴 2016.1.24 by Aj MaChInE
閱讀文章分享@若渴 2016.1.24閱讀文章分享@若渴 2016.1.24
閱讀文章分享@若渴 2016.1.24
Aj MaChInE1.3K views
[若渴計畫2015.8.18] SMACK by Aj MaChInE
[若渴計畫2015.8.18] SMACK[若渴計畫2015.8.18] SMACK
[若渴計畫2015.8.18] SMACK
Aj MaChInE1.2K views
[SITCON2015] 自己的異質多核心平台自己幹 by Aj MaChInE
[SITCON2015] 自己的異質多核心平台自己幹[SITCON2015] 自己的異質多核心平台自己幹
[SITCON2015] 自己的異質多核心平台自己幹
Aj MaChInE2.6K views
[MOSUT20150131] Linux Runs on SoCKit Board with the GPGPU by Aj MaChInE
[MOSUT20150131] Linux Runs on SoCKit Board with the GPGPU[MOSUT20150131] Linux Runs on SoCKit Board with the GPGPU
[MOSUT20150131] Linux Runs on SoCKit Board with the GPGPU
Aj MaChInE1.3K views
[若渴計畫]由GPU硬體概念到coding CUDA by Aj MaChInE
[若渴計畫]由GPU硬體概念到coding CUDA[若渴計畫]由GPU硬體概念到coding CUDA
[若渴計畫]由GPU硬體概念到coding CUDA
Aj MaChInE4.8K views
[若渴計畫]64-bit Linux Return-Oriented Programming by Aj MaChInE
[若渴計畫]64-bit Linux Return-Oriented Programming[若渴計畫]64-bit Linux Return-Oriented Programming
[若渴計畫]64-bit Linux Return-Oriented Programming
Aj MaChInE2.2K views
[MOSUT] Format String Attacks by Aj MaChInE
[MOSUT] Format String Attacks[MOSUT] Format String Attacks
[MOSUT] Format String Attacks
Aj MaChInE2.6K views

Recently uploaded

Tunable Laser (1).pptx by
Tunable Laser (1).pptxTunable Laser (1).pptx
Tunable Laser (1).pptxHajira Mahmood
24 views37 slides
【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院 by
【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院
【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院IttrainingIttraining
52 views8 slides
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf by
STKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdfSTKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdf
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdfDr. Jimmy Schwarzkopf
19 views29 slides
20231123_Camunda Meetup Vienna.pdf by
20231123_Camunda Meetup Vienna.pdf20231123_Camunda Meetup Vienna.pdf
20231123_Camunda Meetup Vienna.pdfPhactum Softwareentwicklung GmbH
41 views73 slides
Business Analyst Series 2023 - Week 3 Session 5 by
Business Analyst Series 2023 -  Week 3 Session 5Business Analyst Series 2023 -  Week 3 Session 5
Business Analyst Series 2023 - Week 3 Session 5DianaGray10
248 views20 slides
Microsoft Power Platform.pptx by
Microsoft Power Platform.pptxMicrosoft Power Platform.pptx
Microsoft Power Platform.pptxUni Systems S.M.S.A.
53 views38 slides

Recently uploaded(20)

【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院 by IttrainingIttraining
【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院
【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf by Dr. Jimmy Schwarzkopf
STKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdfSTKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdf
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf
Business Analyst Series 2023 - Week 3 Session 5 by DianaGray10
Business Analyst Series 2023 -  Week 3 Session 5Business Analyst Series 2023 -  Week 3 Session 5
Business Analyst Series 2023 - Week 3 Session 5
DianaGray10248 views
Special_edition_innovator_2023.pdf by WillDavies22
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdf
WillDavies2217 views
AMAZON PRODUCT RESEARCH.pdf by JerikkLaureta
AMAZON PRODUCT RESEARCH.pdfAMAZON PRODUCT RESEARCH.pdf
AMAZON PRODUCT RESEARCH.pdf
JerikkLaureta26 views
Piloting & Scaling Successfully With Microsoft Viva by Richard Harbridge
Piloting & Scaling Successfully With Microsoft VivaPiloting & Scaling Successfully With Microsoft Viva
Piloting & Scaling Successfully With Microsoft Viva
STPI OctaNE CoE Brochure.pdf by madhurjyapb
STPI OctaNE CoE Brochure.pdfSTPI OctaNE CoE Brochure.pdf
STPI OctaNE CoE Brochure.pdf
madhurjyapb14 views
Data Integrity for Banking and Financial Services by Precisely
Data Integrity for Banking and Financial ServicesData Integrity for Banking and Financial Services
Data Integrity for Banking and Financial Services
Precisely21 views
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive by Network Automation Forum
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLiveAutomating a World-Class Technology Conference; Behind the Scenes of CiscoLive
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive

[若渴計畫] Studying Concurrency

  • 3. Outline • 為什麼寫concurrency不容易? • Programmer-observable behavior • 來點concurrency performance 撰寫技巧例子 • 來點concurrency security 例子
  • 4. 為什麼寫Concurrency不容易? • Hardware optimizations • Compiler optimizations 無法預期行為
  • 5. Hardware Optimizations - Write Buffer • On a write, a processor simply inserts the write operation into the write buffer and proceeds without waiting for the write to complete • In order to effectively hide the latency of write operations • Therefore, P1, P2 are all in critical sections Sarita V. Adve, Kourosh Gharachorloo, “Shared Memory Consistency Models: A Tutorial”
  • 6. Hardware Optimizations - Overlapped Writes • Assume the Data and Head variables reside in different memory modules • Since the write to Head may be injected into the network before the write to Data has reached its memory module • Therefore, it is possible for another processor to observe the new value of Head and yet obtain the old value of Data • Reordering of write operations Sarita V. Adve, Kourosh Gharachorloo, “Shared Memory Consistency Models: A Tutorial” (coalesced write)
  • 7. Hardware Optimizations - Non−blocking Reads • If P2 is allowed to issue its read operations in an overlapped fashion, there is the possibility for the read of Data to arrive at its memory module before the write from P1 while the read of Head reaches its memory module after the write from P1 => P2.Data =2000/ P2.Head = 0 Sarita V. Adve, Kourosh Gharachorloo, “Shared Memory Consistency Models: A Tutorial” (coalesced read)
  • 9. 所以怎麼辦? 理想上 • Sequential Consistency (單核operations順序= 多核operation順序) – The result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program • There is no local reordering • Each write becomes visible to all threads Sarita V. Adve, Kourosh Gharachorloo, “Shared Memory Consistency Models: A Tutorial” Luc Maranget, etc., “A Tutorial Introduction to the ARM and POWER Relaxed Memory Models”
  • 10. 事實上,不保證SC Memory model Local ordering Multiple-copy atomic model Total store ordering Intel x86 X O Relaxed memory model ARM X X Luc Maranget, etc., “A Tutorial Introduction to the ARM and POWER Relaxed Memory Models” Developers需自己寫code管理記憶體操作順序
  • 11. Hardware Optimizations這麼多,我要 怎知道程式的運作行為(Programmer- observable Behavior)? • Mathematically rigorous architecture definitions – Luc Maranget, etc., “A Tutorial Introduction to the ARM and POWER Relaxed Memory Models” • Hardware semantics – Shaked Flur, etc., “Modelling the ARMv8 Architecture, Operationally Concurrency and ISA” • C/C++11 memory model • …?
  • 12. Mathematically Rigorous Architecture Definitions – For Example • Message Passing (MP) Luc Maranget, etc., “A Tutorial Introduction to the ARM and POWER Relaxed Memory Models” Y=1; r1=y; r2=x; x=1 r1=1 ∧ r2=0 x86-TSO : forbidden ARM: allowed Partial-order Propagation ?
  • 13. Does Partial-order Propagation Always Affect Program Behavior? Not Necessarily • MP test harness • m is the number of times that the final outcome of r1=1 ∧ r2=0 was observed in n trials
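The m-out-of-n idea can be sketched in C++11 as below. This is not the harness used in the cited tutorial. `MpResult` and `run_mp_harness` are hypothetical names, and relaxed atomics stand in for plain machine loads and stores so that the program has no data race. The value of m depends on the hardware, the compiler, and timing, so no particular count is implied.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical result type: m relaxed outcomes observed in n trials.
struct MpResult {
    int m;
    int n;
};

// Hypothetical harness for the message-passing (MP) test.
MpResult run_mp_harness(int n) {
    int m = 0;
    for (int i = 0; i < n; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = 0, r2 = 0;

        std::thread t0([&] {
            x.store(1, std::memory_order_relaxed);   // x=1
            y.store(1, std::memory_order_relaxed);   // y=1
        });
        std::thread t1([&] {
            r1 = y.load(std::memory_order_relaxed);  // r1=y
            r2 = x.load(std::memory_order_relaxed);  // r2=x
        });
        t0.join();
        t1.join();

        if (r1 == 1 && r2 == 0) {
            ++m;  // the outcome forbidden on x86-TSO but allowed on ARM
        }
    }
    return {m, n};
}
```

Starting fresh threads for every trial leaves only a narrow window for the two threads to overlap, so m will often be 0 even on hardware that allows the outcome. Dedicated litmus tools keep the threads running and synchronize each iteration to widen that window.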
  • 14. Hardware Semantics Shaked Flur et al., “Modelling the ARMv8 Architecture, Operationally: Concurrency and ISA”
  • 15. Web Site of Hardware Semantics http://www.cl.cam.ac.uk/~sf502/popl16/help.html
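  • 16. Result of Hardware Semantics http://www.cl.cam.ac.uk/~sf502/popl16/help.html • If a location is accessed concurrently (for example, because a lock is implemented incorrectly), the result output makes it visible early.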
  • 17. C/C++11 Memory Model • At the language level, keywords are defined so that every hardware target must conform to the language memory model. – https://www.youtube.com/watch?v=S-x-23lrRnc • This video mentions that, to satisfy the C11 memory model on ARM, the compiler may emit double barriers • Reinoud Elhorst, “Lowering C11 Atomics for ARM in LLVM” – Torvald Riegel, “Modern C/C++ concurrency” • Semantics – Mark Batty, “Mathematizing C++ concurrency”
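As a small illustration of those language-level keywords, the sketch below revisits the Data/Head example from the overlapped-writes slide using C++11 release/acquire ordering. `message_passing_once` is a hypothetical name. The assumption is only what the C++11 model states: a release store that is read by an acquire load makes the writes before the store visible after the load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: passes one value from producer to consumer and
// returns what the consumer saw in the payload.
int message_passing_once() {
    int data = 0;                // plain, non-atomic payload
    std::atomic<int> head{0};    // flag that publishes the payload
    int observed = -1;

    std::thread producer([&] {
        data = 2000;                                // write the payload first
        head.store(1, std::memory_order_release);   // then publish it
    });
    std::thread consumer([&] {
        // Spin until the flag is seen; the acquire load pairs with the
        // release store above.
        while (head.load(std::memory_order_acquire) == 0) {
            std::this_thread::yield();
        }
        observed = data;                            // must see 2000
    });
    producer.join();
    consumer.join();
    return observed;
}
```

With this pairing, `message_passing_once()` returns 2000 on any conforming implementation. The compiler chooses whatever barriers the target hardware needs to provide that.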
  • 18. Mathematizing C++ Concurrency • The semantics of the C++ memory model are written in Isabelle/HOL • For example: the definition of a release sequence
  • 19. Some Examples of Concurrency Performance Techniques • LMAX • RCU • Concurrent malloc(3) • An Analysis of Linux Scalability to Many Cores
  • 20. LMAX: New Financial Trading Platform https://martinfowler.com/articles/lmax.html
  • 22. Read Copy Update (RCU) • Suited to read-mostly situations • A typical RCU update is split into removal and reclamation phases (disrupt) – Removing and replacing references to data items can run concurrently with readers – Pointers to a data structure are removed so that subsequent readers cannot gain a reference to it – RCU provides implicit low-overhead communication between readers and reclaimers (synchronize_rcu()) https://www.kernel.org/doc/Documentation/RCU/whatisRCU.txt https://lwn.net/Articles/262464/
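The removal/reclamation split can be sketched in user-space C++11 as follows. This is only an analogy, not the Linux kernel implementation. `RcuLikeCell` and `Config` are hypothetical names, a single updater is assumed, and the shared reader counter is a deliberately naive stand-in for the kernel's much cheaper grace-period detection.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Config {
    int value;
};

// Hypothetical RCU-like cell; assumes at most one updater at a time.
class RcuLikeCell {
public:
    explicit RcuLikeCell(int v) : current_(new Config{v}) {}
    ~RcuLikeCell() { delete current_.load(); }

    // Reader side: mark entry, dereference the current version, mark exit.
    int read() {
        active_readers_.fetch_add(1);      // analogue of rcu_read_lock()
        int v = current_.load()->value;
        active_readers_.fetch_sub(1);      // analogue of rcu_read_unlock()
        return v;
    }

    // Updater side: removal first, reclamation only after a grace period.
    void update(int v) {
        Config* fresh = new Config{v};
        Config* old = current_.exchange(fresh);  // removal: new readers see fresh
        while (active_readers_.load() != 0) {    // analogue of synchronize_rcu()
            std::this_thread::yield();
        }
        delete old;                              // reclamation
    }

private:
    std::atomic<Config*> current_;
    std::atomic<int> active_readers_{0};
};
```

All operations use the default sequentially consistent ordering, so a reader that still holds the old pointer is always counted before the updater can observe zero readers. A steady stream of readers can delay `update()` indefinitely in this sketch, which is one reason real RCU tracks grace periods differently.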
  • 25. Concurrent malloc(3) • How to avoid false cache sharing – Modern multi-processor systems preserve a coherent view of memory on a per-cache-line basis • How to reduce lock contention Jason Evans, “A Scalable Concurrent malloc(3) Implementation for FreeBSD”
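Because coherence is tracked per cache line, two unrelated counters that share a line make their writers contend. A common remedy, sketched here in C++11, is to give each counter its own line. The 64-byte line size and the names `PaddedCounter`, `kAssumedCacheLine`, and `count_with_padded_counters` are assumptions for illustration and are not taken from the cited paper.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

constexpr std::size_t kAssumedCacheLine = 64;  // assumption: typical line size
constexpr int kThreads = 4;

// Each counter is aligned and padded to a full (assumed) cache line,
// so neighbouring counters never share a line.
struct alignas(kAssumedCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};

static_assert(sizeof(PaddedCounter) % kAssumedCacheLine == 0,
              "each counter should occupy whole cache lines");

// Hypothetical helper: every thread increments only its own counter.
long count_with_padded_counters(long per_thread) {
    PaddedCounter counters[kThreads];
    std::thread workers[kThreads];

    for (int t = 0; t < kThreads; ++t) {
        workers[t] = std::thread([&counters, t, per_thread] {
            for (long i = 0; i < per_thread; ++i) {
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }

    long total = 0;
    for (int t = 0; t < kThreads; ++t) {
        workers[t].join();
        total += counters[t].value.load();
    }
    return total;
}
```

Removing the `alignas` gives the same total but lets all four counters sit in one line, which is the false-sharing case. The padding costs memory, the trade-off the jemalloc slide discusses next.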
  • 26. jemalloc • Whereas phkmalloc was specially optimized to minimize the working set of pages, jemalloc must be more concerned with cache locality • jemalloc first tries to minimize memory usage, and tries to allocate contiguously (weaker security) • One way of fixing false cache sharing is to pad allocations, but padding is in direct opposition to the goal of packing objects as tightly as possible; it can cause severe internal fragmentation. jemalloc instead relies on multiple allocation arenas to reduce the problem • One of the main goals for this allocator was to reduce lock contention for multi-threaded applications: rather than using a single allocator lock, each free list had its own lock • The solution was to use multiple arenas for allocation, and assign threads to arenas via hashing of the thread identifiers Jason Evans, “A Scalable Concurrent malloc(3) Implementation for FreeBSD”
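The "assign threads to arenas via hashing of the thread identifiers" step can be sketched as below. This is not jemalloc's code. `Arena`, `g_arenas`, `kArenaCount`, `arena_index_for`, and `record_allocation` are hypothetical, and the arena holds only a counter as a stand-in for real allocator state.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>

constexpr std::size_t kArenaCount = 4;  // hypothetical number of arenas

// Hypothetical arena: one lock per arena instead of one global lock.
struct Arena {
    std::mutex lock;
    std::size_t allocations = 0;  // stand-in for real per-arena state
};

Arena g_arenas[kArenaCount];

// Map a thread to an arena by hashing its identifier.
std::size_t arena_index_for(std::thread::id tid) {
    return std::hash<std::thread::id>{}(tid) % kArenaCount;
}

// Hypothetical allocation path: lock only the calling thread's arena.
std::size_t record_allocation() {
    std::size_t idx = arena_index_for(std::this_thread::get_id());
    std::lock_guard<std::mutex> guard(g_arenas[idx].lock);
    ++g_arenas[idx].allocations;
    return idx;
}
```

Threads that hash to different arenas never touch the same lock. Hashing gives no guarantee of an even spread, so several busy threads can still land on one arena.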
  • 27. Scalability Collapse Caused by Non-scalable Locks
  • 28. Linux Scalability to Many Cores - Per-core Mount Caches Silas Boyd-Wickizer et al., “An Analysis of Linux Scalability to Many Cores” • Observation: the mount table is rarely modified • Common case: cores access per-core tables • Modifying the mount table: invalidate the per-core tables
  • 29. Linux Scalability to Many Cores - Sloppy Counters • Because reading the reference count is slow Silas Boyd-Wickizer et al., “An Analysis of Linux Scalability to Many Cores”
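A heavily simplified sketch of the idea behind sloppy counters follows: most updates touch only a per-core slot, and the shared central counter is touched in batches. This is not the scheme from the cited paper, which tracks spare references per core. `SloppyCounter`, `Slot`, `kSlots`, and `kThreshold` are hypothetical, and each thread is assumed to use its own slot index.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

constexpr int kSlots = 4;        // hypothetical: one slot per core/thread
constexpr long kThreshold = 64;  // hypothetical batch size

// One slot per cache line (64 bytes assumed) to avoid false sharing.
struct alignas(64) Slot {
    std::atomic<long> local{0};
};

class SloppyCounter {
public:
    // Common case: update only the caller's slot. Once the slot drifts
    // past the threshold, move its contents to the central counter.
    void add(int slot, long delta) {
        long v = slots_[slot].local.fetch_add(delta, std::memory_order_relaxed) + delta;
        if (v >= kThreshold || v <= -kThreshold) {
            long moved = slots_[slot].local.exchange(0, std::memory_order_relaxed);
            central_.fetch_add(moved, std::memory_order_relaxed);
        }
    }

    // Slow path: exact only while no updates are in flight.
    long read_exact() const {
        long total = central_.load();
        for (const Slot& s : slots_) {
            total += s.local.load();
        }
        return total;
    }

private:
    std::atomic<long> central_{0};
    Slot slots_[kSlots];
};
```

Updates stay cheap because they rarely touch the shared line, while an exact read must visit every slot. That trade-off fits counters that are updated far more often than they are read.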
  • 30. Some Concurrency Security Examples • Concurrency fuzzer – Sebastian Burckhardt et al., “A Randomized Scheduler with Probabilistic Guarantees of Finding Bugs” • Timing side channel attack – Yeongjin Jang et al., “Breaking Kernel Address Space Layout Randomization with Intel TSX”
  • 31. Concurrency Fuzzer - Randomized Scheduler Sebastian Burckhardt et al., “A Randomized Scheduler with Probabilistic Guarantees of Finding Bugs” • Finds violations (order/atomicity) • Essentially, read/write reordering in hardware is not simulated
  • 32. These Slides Survey Several Fuzzers: “Concurrency: A problem and opportunity in the exploitation of memory corruptions”
  • 33. Intel Transactional Synchronization Extensions • The assembly instruction xbegin can return various results that represent the hardware's suggestions for how to proceed and reasons for failure: success, a suggestion to retry, a potential cause for the abort • To use TSX effectively, it is imperative to understand its implementation and limitations. TSX is implemented using the cache coherence protocol, which x86 machines already implement. When a transaction begins, the processor starts tracking read and write sets of cache lines which have been brought into the L1 cache. If at any point during a logical core's execution of a transaction another core modifies a cache line in the read or write set, then the transaction is aborted. Nick Stanley, “Hardware Transactional Memory with Intel’s TSX”
  • 34. Intel Transactional Synchronization Extensions - Suppressing exceptions • A transaction aborts when a hardware exception occurs during its execution. However, unlike normal situations where the OS intervenes and handles these exceptions gracefully, TSX instead invokes a user-specified abort handler, without informing the underlying OS. More precisely, TSX treats these exceptions in a synchronous manner, immediately executing an abort handler while suppressing the exception itself. In other words, the exception inside the transaction will not be communicated to the underlying OS. This allows us to engage in abnormal behavior (e.g., attempting to access privileged, i.e., kernel, memory regions) without worrying about crashing the program. In DrK, we break KASLR by turning this surprising behavior into a timing channel that leaks the status (e.g., mapped or unmapped) of all kernel pages.
  • 35. Timing Side Channel Attack • TSX instead invokes a user-specified abort handler, without informing the underlying OS • In other words, from user space I can learn kernel addresses despite randomization (!!!) Yeongjin Jang et al., “Breaking Kernel Address Space Layout Randomization with Intel TSX”
  • 36. Reference • Sarita V. Adve, Kourosh Gharachorloo, “Shared Memory Consistency Models: A Tutorial” • Luc Maranget et al., “A Tutorial Introduction to the ARM and POWER Relaxed Memory Models” • Shaked Flur et al., “Modelling the ARMv8 Architecture, Operationally: Concurrency and ISA” • https://www.youtube.com/watch?v=6QU37TwRO4w • http://www.cl.cam.ac.uk/~sf502/popl16/help.html • Jade Alglave et al., “The Semantics of Power and ARM Multiprocessor Machine Code” • Paul E. McKenney, “Memory Barriers: a Hardware View for Software Hackers”
  • 37. Reference C/C++11 memory model • https://www.youtube.com/watch?v=S-x-23lrRnc • Reinoud Elhorst, “Lowering C11 Atomics for ARM in LLVM” • Torvald Riegel, “Modern C/C++ concurrency” • Mark Batty, “Mathematizing C++ concurrency” LMAX • https://github.com/LMAX-Exchange/disruptor • https://martinfowler.com/articles/lmax.html • http://mechanitis.blogspot.tw/2011/06/dissecting-disruptor-how-do-i-read-from.html RCU • https://www.kernel.org/doc/Documentation/RCU/whatisRCU.txt • https://lwn.net/Articles/262464/ • https://lwn.net/Articles/253651/ • https://lwn.net/Articles/264090/
  • 38. Reference Concurrent malloc(3) • Jason Evans, “A Scalable Concurrent malloc(3) Implementation for FreeBSD” Concurrency security • Sebastian Burckhardt et al., “A Randomized Scheduler with Probabilistic Guarantees of Finding Bugs” • Ralf-Philipp Weinmann et al., “Concurrency: A problem and opportunity in the exploitation of memory corruptions” • Yeongjin Jang et al., “Breaking Kernel Address Space Layout Randomization with Intel TSX” • Nick Stanley, “Hardware Transactional Memory with Intel’s TSX” (includes recommended practices for writing concurrent code on Intel)