Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2mAUqqs.
Kavya Joshi discusses the internals of the Go race detector and delves into the compiler instrumentation of the program, and the runtime module that detects data races. She also talks about the optimizations that make the dynamic race detector practical for use in the real world, and evaluates how practical it really is. Filmed at qconsf.com.
Kavya Joshi is a Software Engineer at Samsara. She particularly enjoys architecting and building highly concurrent, highly scalable systems.
2. InfoQ.com: News & Community Site
⢠Over 1,000,000 software developers, architects and CTOs read the site world-
wide every month
⢠250,000 senior developers subscribe to our weekly newsletter
⢠Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
⢠Post content from our QCon conferences
⢠2 dedicated podcast channels: The InfoQ Podcast, with a focus on
Architecture and The Engineering Culture Podcast, with a focus on building
⢠96 deep dives on innovative topics packed as downloadable emags and
minibooks
⢠Over 40 new content items per week
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
go-race-detector
3. Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
6. data races
âwhen two+ threads concurrently access a shared memory
location, at least one access is a write.â
R R R
W R R
R W W
!W W W
count = 1 count = 2 count = 2
!concurrent concurrent concurrent
// Shared variable
var count = 0
func incrementCount() {
if count == 0 {
count ++
}
}
func main() {
// Spawn two âthreadsâ
go incrementCount()
go incrementCount()
}
data race
âg2â
âg1â
7. data races
âwhen two+ threads concurrently access a shared memory
location, at least one access is a write.â
Thread 1 Thread 2
lock(l) lock(l)
count=1 count=2
unlock(l) unlock(l)
!data race
// Shared variable
var count = 0
func incrementCount() {
if count == 0 {
count ++
}
}
func main() {
// Spawn two âthreadsâ
go incrementCount()
go incrementCount()
}
data race
8. ⢠relevant
⢠elusive
⢠have undeďŹned consequences
⢠easy to introduce in languages â¨
like Go
Panic messages from
unexpected program
crashes are often reported
on the Go issue tracker.
An overwhelming number of
these panics
are caused by data races,
and an
overwhelming number of
those reports
centre around Goâs built in
map type.
â Dave Cheney
9. given we want to write multithreaded programs,
how may we protect our systems from the
unknown consequences of the
difďŹcult-to-track-down data race bugsâŚ
in a manner that is reliable and scalable?
10. read by goroutine 7
at incrementCount()
created at main()
race detectors
12. ⢠Go v1.1 (2013)â¨
⢠Integrated with the Go tool chain â
> go run -race counter.goâ¨
⢠Based on C/ C++ ThreadSanitizerâ¨
dynamic race detection library
⢠As of August 2015,
1200+ races in Googleâs codebase,
~100 in the Go stdlib,â¨
100+ in Chromium,â¨
+ LLVM, GCC, OpenSSL, WebRTC, Firefox
go race detector
15. concurrency in go
The unit of concurrent execution : goroutines
user-space threadsâ¨
use as you would threads â¨
> go handle_request(r)
Go memory model speciďŹed in terms of goroutines
within a goroutine: reads + writes are ordered
with multiple goroutines: shared data must be
synchronizedâŚelse data races!
17. ââŚgoroutines concurrently access a shared memory
location, at least one access is a write.â
?
concurrency
var count = 0
func incrementCount() {
if count == 0 {
count ++
}
}
func main() {
go incrementCount()
go incrementCount()
}
âg2â
âg1â
R R R
W R R
R W W
W W W
count = 1 count = 2 count = 2
!concurrent concurrent concurrent
18. how can we determine
âconcurrentâ
memory accesses?
19. var count = 0
func incrementCount() {
if count == 0 {
count++
}
}
func main() {
incrementCount()
incrementCount()
}
not concurrent â same goroutine
20. not concurrent â â¨
lock draws a âdependency edgeâ
var count = 0
func incrementCount() {
mu.Lock()
if count == 0 {
count ++
}
mu.Unlock()
}
func main() {
go incrementCount()
go incrementCount()
}
21. happens-before
memory accesses â¨
i.e. reads, writes
a := b
synchronization â¨
via locks or lock-free sync
mu.Unlock()
ch <â a
X âş Y IF one of:
â same goroutine
â are a synchronization-pair
â X âş E âş Y
across goroutines
IF X not âş Y and Y not âş X ,
concurrent!
orders events
27. (0, 0) (0, 0)
(1, 0)
(3, 0)
(4, 0)
(4, 1) C
(4, 2) D
A âş D ?
(3, 0) < (4, 2) ?
so yes.
L
U
R
WA
B
L
R
U
g1 g2
28. CR
W W
R
g1 g2
(1, 0)A
(2, 0)B
(0, 1)
(0, 2) D
B âş C ?
(2, 0) < (0, 1) ?
no.
C âş B ?
no.
so, concurrent
29. pure happens-before detection
Determines if the accesses to a memory location can be
ordered by happens-before, using vector clocks.
This is what the Go Race Detector does!
31. go run -race
to implement happens-before detection, need to:
create vector clocks for goroutinesâ¨
âŚat goroutine creationâ¨
update vector clocks based on memory access,â¨
synchronization eventsâ¨
âŚwhen these events occurâ¨
compare vector clocks to detect happens-before â¨
relations.â¨
âŚwhen a memory access occurs
33. do we have to modify
our programs then,
to generate the events?
memory accesses
synchronizations
goroutine creation
nope.
34. var count = 0
func incrementCount() {
if count == 0 {
count ++
}
}
func main() {
go incrementCount()
go incrementCount()
}
35. -race
var count = 0
func incrementCount() {
raceread()
if count == 0 {â¨
racewrite()
count ++
}â¨
racefuncexit()
}
func main() {
go incrementCount()
go incrementCount()
36. the gc compiler instruments memory accesses
adds an instrumentation pass over the IR.
go tool compile -race
func compile(fn *Node)
{
...
order(fn)
walk(fn)
if instrumenting {
instrument(Curfn)
}
...
}
37. This is awesome.
We donât have to modify our programs to track memory accesses.
package sync
import âinternal/race"
func (m *Mutex) Lock() {
if race.Enabled {
race.Acquire(âŚ)
}
...
}
raceacquire(addr)
mutex.go
package runtime
func newproc1() {
if race.Enabled {
newg.racectx =
racegostart(âŚ)
}
...
}
proc.go
What about synchronization events, and goroutine creation?
39. TSan implements the happens-before race detection:â¨
creates, updates vector clocks for goroutines -> ThreadStateâ¨
keeps track of memory access, synchronization events ->
Shadow State, Meta Mapâ¨
compares vector clocks to detect data races.
threadsanitizer
40. go incrementCount()
struct ThreadState {
ThreadClock clock;
}
contains a ďŹxed-size vector clock
(size == max(# threads))
func newproc1() {
if race.Enabled {
newg.racectx =
racegostart(âŚ)
}
...
}
proc.go
count == 0
raceread(âŚ)
by compiler instrumentation
1. data race with a previous access?
2. store information about this access â¨
for future detections
41. stores information about memory accesses.
8-byte shadow word for an access:
TID clock pos wr
TID: accessor goroutine IDâ¨
clock: scalar clock of accessor ,
optimized vector clock
pos: offset, size in 8-byte word
wr: IsWrite bit
shadow state
directly-mapped:
0x7fffffffffff
0x7f0000000000
0x1fffffffffff
0x180000000000
application
shadow
42. N shadow cells per application word (8-bytes)
gx read
When shadow words are ďŹlled, evict one at random.
Optimization 1
clock_1 0:2 0gx
gy write
clock_2 4:8 1gy
45. race detection
compare:
<accessorâs vector clock,
new shadow word>
g2 0 0:8 0
0 0
ââŚwhen two+ threads concurrently access a shared
memory location, at least one access is a write.â
g1 1 0:8 1
with: each existing shadow word
46. race detection
compare:
<accessorâs vector clock,
new shadow word>
do the access locations overlap?
are any of the accesses a write?
are the TIDS different?
are they concurrent (no happens-before)?
g2âs vector clock: (0, 0)
existing shadow wordâs clock: (1, ?)
g1 1 0:8 1g2 0 0:8 0
0 0
â
â
â
â
with: each existing shadow word
47. do the access locations overlap?
are any of the accesses a write?
are the TIDS different?
are they concurrent (no happens-before)?
race detection
g1 1 0:8 1g2 0 0:8 0
compare (accessorâs threadState, new shadow word) with
each existing shadow word:
0 0
RACE!
â
â
â
â
49. sync vars
mu := sync.Mutex{} struct SyncVar {
}
stored in the meta map region.
struct SyncVar {
SyncClock clock;
}
contains a vector clock
SyncClock
mu.Unlock()
3 0
g1 g2
mu.Lock() max(
SyncClock)
0 1
50. TSan tracks ďŹle descriptors, memory allocations etc. too
TSan can track your custom sync primitives too, via dynamic
annotations!
a note (or two)âŚ
52. evaluation
âis it reliable?â âis it scalable?â
program slowdown = 5x-15x
memory usage = 5x-10x
no false positives
(only reports âreal racesâ,
but can be benign)
can miss races!
depends on execution trace
â¨
As of August 2015,
1200+ races in Googleâs codebase,
~100 in the Go stdlib,â¨
100+ in Chromium,â¨
+ LLVM, GCC, OpenSSL, WebRTC, Firefox
53. with
go run -race =
gc compiler instrumentation +
TSan runtime library for
data race detection
happens-before using
vector clocks
55. alternatives
I. Static detectors
analyze the programâs source code.â¨
⢠typically have to augment the source with race annotations (-)
⢠single detection pass sufďŹcient to determine all possible â¨
races (+)
⢠too many false positives to be practical (-)â¨
II. Lockset-based dynamic detectors
uses an algorithm based on locks heldâ¨
⢠more performant than pure happens-before (+)
⢠may not recognize synchronization via non-locks,â¨
like channels (would report as races) (-)
56. III. Hybrid dynamic detectors
combines happens-before + locksets.â¨
(TSan v1, but it was hella unscalable)â¨
⢠âbest of both worldsâ (+)
⢠false positives (-)
⢠complicated to implement (-)â¨
â¨
â¨
57. requirements
I. Go speciďŹcs
v1.1+
gc compiler
gccgo does not support as per:
https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01828.html
x86_64 required
Linux, OSX, Windows
II. TSan speciďŹcs
LLVM Clang 3.2, gcc 4.8
x86_64
requires ASLR, so compile/ ld with -fPIE, -pie
maps (using mmap but does not reserve) virtual address space;
tools like top/ ulimit may not work as expected.
58. fun facts
TSan
maps (by mmap but does not reserve) tons of virtual address
space; tools like top/ ulimit may not work as expected.
need: gdb -ex 'set disable-randomization off' --args ./a.outâ¨
due to ASLR requirement.â¨
â¨
Deadlock detection?
Kernel TSan?