Understanding Real-World Concurrency Bugs in Go (fixed)

Published in: Technology

  1. 1. Understanding Real-World Concurrency Bugs in Go @kakashi
  2. 2. Hello! I am kakashi - Infra lead @UmboCV - Co-organizer @ Golang Taipei Gathering @kakashiliu @kkcliu
  3. 3. Learning Camera Smart Cloud Neural A.I.
  4. 4. Agenda ● Introduction ● Concurrency in Go ● Go Concurrency Bugs ○ Blocking ○ Non-Blocking ● Conclusion
  5. 5. Introduction ● A systematic study of 6 popular Go projects
  6. 6. Concurrency in Go 1. Making threads (goroutines) lightweight and easy to create 2. Using explicit messaging (via channels) to communicate across threads
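The two points on this slide can be sketched in a few lines. This is a minimal illustration only; `square` and `sumSquares` are hypothetical names, not from the talk. Each input gets its own goroutine, and the results come back over a channel rather than through shared variables.

```go
package main

import "fmt"

// square sends n*n on out; it runs as a goroutine in sumSquares.
func square(n int, out chan<- int) {
	out <- n * n
}

// sumSquares starts one goroutine per input and collects the results
// over an unbuffered channel, one receive per goroutine started.
func sumSquares(nums []int) int {
	out := make(chan int)
	for _, n := range nums {
		go square(n, out)
	}
	total := 0
	for range nums {
		total += <-out
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3})) // 1 + 4 + 9
}
```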
  7. 7. Beliefs about Go: ● Makes concurrent programming easier and less error-prone ● Makes heavy use of message passing via channels, which is less error-prone than shared memory ● Has fewer concurrency bugs ● The built-in deadlock and data race detectors can catch any bug
  8. 8. Go Concurrency Usage Patterns: a surprising finding is that shared-memory synchronization operations are still used more often than message passing
  9. 9. Thread safety makes programmers rich
  10. 10. Go Concurrency Bugs 1. Blocking: one or more goroutines are unintentionally stuck in their execution and cannot move forward. 2. Non-blocking: all goroutines can finish their tasks, but their behavior is not what was intended.
  11. 11. Blocking Bug Causes: message passing operations are even more likely to cause blocking bugs than shared-memory operations
  12. 12. Blocking Bug caused by Mutex
        faultMutex.Lock()
        if faultDomain == nil {
            var err error
            faultDomain, err = fetchFaultDomain()
            if err != nil {
                // bug: returns while faultMutex is still locked
                return cloudprovider.Zone{}, err
            }
        }
        zone := cloudprovider.Zone{}
        faultMutex.Unlock()
        return zone, nil
  13. 13. Blocking Bug caused by Mutex
        faultMutex.Lock()
        defer faultMutex.Unlock() // fix: unlocks on every return path
        if faultDomain == nil {
            var err error
            faultDomain, err = fetchFaultDomain()
            if err != nil {
                return cloudprovider.Zone{}, err
            }
        }
        zone := cloudprovider.Zone{}
        // faultMutex.Unlock() is dropped here: the deferred call unlocks
        return zone, nil
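A runnable sketch of the deferred-unlock pattern follows. It assumes a small `zoneCache` type and two demo fetchers (`failingFetch`, `okFetch`); all of these are hypothetical stand-ins for the state and `fetchFaultDomain` call on the slide. The second call would hang if the error path leaked the lock.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// zoneCache is a hypothetical stand-in for the package-level state on the slide.
type zoneCache struct {
	mu     sync.Mutex
	domain *string
}

// zone returns the cached domain, fetching it on first use.
// The deferred Unlock runs on every return path, including the error one.
func (c *zoneCache) zone(fetch func() (string, error)) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.domain == nil {
		d, err := fetch()
		if err != nil {
			return "", err
		}
		c.domain = &d
	}
	return *c.domain, nil
}

// failingFetch and okFetch are hypothetical fetchers used only for the demo.
func failingFetch() (string, error) { return "", errors.New("fetch failed") }
func okFetch() (string, error)      { return "fd-1", nil }

func main() {
	c := &zoneCache{}
	_, err := c.zone(failingFetch) // error path: the lock is still released
	z, _ := c.zone(okFetch)        // would block forever if the lock had leaked
	fmt.Println(err, z)
}
```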
  14. 14. Blocking Bug caused by WaitGroup
        var group sync.WaitGroup
        group.Add(len(pm.plugins))
        for _, p := range pm.plugins {
            go func(p *plugin) {
                defer group.Done()
            }(p)
            group.Wait() // bug: waits inside the loop, before the remaining goroutines are started
        }
  15. 15. Blocking Bug caused by WaitGroup
        var group sync.WaitGroup
        group.Add(len(pm.plugins))
        for _, p := range pm.plugins {
            go func(p *plugin) {
                defer group.Done()
            }(p)
            // group.Wait() // blocking: removed from the loop body
        }
        group.Wait() // fixed: wait once, after all goroutines are started
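The fixed shape can be run as follows. This is a sketch under assumptions: plugins are reduced to plain strings, and `runAll` is a hypothetical name. The single `Wait` after the loop returns only once every goroutine has called `Done`.

```go
package main

import (
	"fmt"
	"sync"
)

// runAll starts one goroutine per plugin name and waits once, after the loop.
// It returns the names that were processed (in no particular order).
func runAll(plugins []string) []string {
	var (
		group sync.WaitGroup
		mu    sync.Mutex
		done  []string
	)
	group.Add(len(plugins))
	for _, p := range plugins {
		go func(p string) {
			defer group.Done()
			mu.Lock() // done is shared, so appends are guarded
			done = append(done, p)
			mu.Unlock()
		}(p)
	}
	group.Wait() // outside the loop: all goroutines have been started
	return done
}

func main() {
	fmt.Println(len(runAll([]string{"a", "b", "c"})))
}
```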
  16. 16. Blocking Bug caused by Channel
        func finishReq(timeout time.Duration) (r ob) {
            ch := make(chan ob) // unbuffered
            go func() {
                result := fn()
                ch <- result // bug: blocks forever if the timeout case was taken
            }()
            select {
            case result := <-ch:
                return result
            case <-time.After(timeout):
                return nil
            }
        }
  17. 17. Blocking Bug caused by Channel
        func finishReq(timeout time.Duration) (r ob) {
            ch := make(chan ob, 1) // fix: buffer of 1
            go func() {
                result := fn()
                ch <- result // blocked here before the fix; now the send always completes
            }()
            select {
            case result := <-ch:
                return result
            case <-time.After(timeout):
                return nil
            }
        }
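A self-contained version of the buffered-channel fix, as a sketch: the result type is simplified to `string`, `fn` is passed in as a parameter, and `fastFn`, `slowFn`, and the two timeout constants are hypothetical demo values. With capacity 1 the worker goroutine can always finish its send, even when the caller has already given up.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical demo timeouts.
const (
	shortTimeout = 5 * time.Millisecond
	longTimeout  = time.Second
)

// finishReq runs fn in a goroutine and returns its result,
// or ("", false) if the timeout expires first.
func finishReq(fn func() string, timeout time.Duration) (string, bool) {
	ch := make(chan string, 1) // capacity 1: the send below never blocks
	go func() { ch <- fn() }()
	select {
	case result := <-ch:
		return result, true
	case <-time.After(timeout):
		return "", false // the goroutine still exits once fn returns
	}
}

func fastFn() string { return "done" }

func slowFn() string {
	time.Sleep(100 * time.Millisecond)
	return "late"
}

func main() {
	fmt.Println(finishReq(fastFn, longTimeout))
	fmt.Println(finishReq(slowFn, shortTimeout))
}
```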
  18. 18. Blocking Bug: Mistakenly using channel and mutex
  19. 19. Blocking Bug: Mistakenly using channel and mutex
        func goroutine1() {
            m.Lock()
            ch <- request // blocking: waits for goroutine2 to receive while holding m
            m.Unlock()
        }
        func goroutine2() {
            for {
                m.Lock() // blocking: m is held by goroutine1
                m.Unlock()
                request = <-ch
            }
        }
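The slides do not show a fix for this case. One possible way out, assuming it is acceptable to drop the request when no receiver is ready, is to make the send non-blocking with `select` and a `default` case, so the mutex is never held across a blocked channel operation. `trySend` and `demo` are hypothetical names used only for this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// trySend attempts to send while holding m, but never blocks:
// if the send cannot proceed immediately, the default case runs
// and the deferred Unlock releases the lock.
func trySend(m *sync.Mutex, ch chan<- int, request int) bool {
	m.Lock()
	defer m.Unlock()
	select {
	case ch <- request:
		return true
	default:
		return false // request dropped; the caller may retry later
	}
}

// demo tries one send with no receiver and one into a buffered channel.
func demo() (sentNoReceiver, sentBuffered bool) {
	var m sync.Mutex
	noReceiver := make(chan int)  // nobody ever receives from this
	buffered := make(chan int, 1) // has room for one value
	return trySend(&m, noReceiver, 1), trySend(&m, buffered, 2)
}

func main() {
	fmt.Println(demo())
}
```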
  20. 20. Non-Blocking Bug Causes: far fewer non-blocking bugs are caused by message passing than by shared memory accesses.
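  21. 21. Non-Blocking Bug caused by select and channel
        ticker := time.NewTicker(interval) // interval is a placeholder; the slide omits the argument
        for {
            f() // bug: may run once more after stopCh is signaled
            select {
            case <-stopCh:
                return
            case <-ticker.C: // if both cases are ready, select picks one at random
            }
        }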
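  22. 22. Non-Blocking Bug caused by select and channel
        ticker := time.NewTicker(interval) // interval is a placeholder; the slide omits the argument
        for {
            select { // fix: check stopCh before every call to f
            case <-stopCh:
                return
            default:
            }
            f()
            select {
            case <-stopCh:
                return
            case <-ticker.C:
            }
        }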
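The extra stop check can be exercised deterministically. In the sketch below, `runUntilStopped` and `callsAfterStop` are hypothetical names, and the tick source is passed in as a plain channel instead of a `time.Ticker`. With the stop channel already closed and a tick already pending, `f` must not run at all.

```go
package main

import (
	"fmt"
	"time"
)

// runUntilStopped calls f once per tick until stopCh is closed.
// The first select gives stopCh priority over any pending tick.
func runUntilStopped(stopCh <-chan struct{}, tick <-chan time.Time, f func()) {
	for {
		select {
		case <-stopCh:
			return
		default:
		}
		f()
		select {
		case <-stopCh:
			return
		case <-tick:
		}
	}
}

// callsAfterStop counts how often f runs when stop is already signaled
// and a tick is already waiting.
func callsAfterStop() int {
	stop := make(chan struct{})
	close(stop)
	tick := make(chan time.Time, 1)
	tick <- time.Now()
	n := 0
	runUntilStopped(stop, tick, func() { n++ })
	return n
}

func main() {
	fmt.Println(callsAfterStop())
}
```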
  23. 23. Non-Blocking Bug caused by Timer
        timer := time.NewTimer(0) // bug: a zero-duration timer fires immediately
        if dur > 0 {
            timer = time.NewTimer(dur)
        }
        select {
        case <-timer.C: // taken right away when dur <= 0
        case <-ctx.Done():
            return nil
        }
  24. 24. Non-Blocking Bug caused by Timer
        // timer := time.NewTimer(0) is replaced by a nil channel
        var timeout <-chan time.Time // nil: receiving from it blocks forever
        if dur > 0 {
            timeout = time.NewTimer(dur).C
        }
        select {
        case <-timeout: // replaces case <-timer.C
        case <-ctx.Done():
            return nil
        }
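The nil-channel idiom behind this fix can be checked in isolation. In this sketch, `wait`, `demoNoTimeout`, and `demoTimeout` are hypothetical names, and the return values are strings purely so the chosen branch is visible. When `dur` is not positive, `timeout` stays nil, a receive from it never becomes ready, and only `ctx.Done()` can end the wait.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// wait blocks until dur elapses or ctx is done.
// dur <= 0 means "no timeout": timeout stays nil and never fires.
func wait(ctx context.Context, dur time.Duration) string {
	var timeout <-chan time.Time
	if dur > 0 {
		timeout = time.NewTimer(dur).C
	}
	select {
	case <-timeout:
		return "timeout"
	case <-ctx.Done():
		return "cancelled"
	}
}

// demoNoTimeout waits with no timeout on an already cancelled context.
func demoNoTimeout() string {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return wait(ctx, 0)
}

// demoTimeout waits 10ms on a context that is never cancelled.
func demoTimeout() string {
	return wait(context.Background(), 10*time.Millisecond)
}

func main() {
	fmt.Println(demoNoTimeout(), demoTimeout())
}
```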
  25. 25. A data race caused by anonymous function
        for i := 17; i <= 21; i++ { // write
            go func() {
                apiVersion := fmt.Sprintf("v1.%d", i) // read of the shared loop variable
            }()
        }
  26. 26. A data race caused by anonymous function
        for i := 17; i <= 21; i++ { // write
            go func(i int) { // fix: each goroutine gets its own copy of i
                apiVersion := fmt.Sprintf("v1.%d", i)
            }(i)
        }
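A runnable form of the fixed loop, as a sketch: `apiVersions` is a hypothetical wrapper that collects the strings so the result can be inspected. Passing `i` as an argument copies it when the goroutine is created. Note that since Go 1.22 each loop iteration has its own `i`, so the unfixed form no longer shares the variable on newer toolchains; the explicit argument stays correct on all versions.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// apiVersions builds "v1.17" through "v1.21" concurrently and
// returns them sorted.
func apiVersions() []string {
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		out []string
	)
	for i := 17; i <= 21; i++ {
		wg.Add(1)
		go func(i int) { // i is copied at goroutine creation
			defer wg.Done()
			v := fmt.Sprintf("v1.%d", i)
			mu.Lock() // out is shared, so appends are guarded
			out = append(out, v)
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(apiVersions())
}
```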
  27. 27. A data race caused by passing reference through channel
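The transcript carries no code for this slide. As a hypothetical illustration (the `stats` type and both function names are invented here), the race arises when a pointer is sent over a channel and the sender keeps writing through it. One way to avoid that, sketched below, is to send the value itself so the receiver works on its own copy.

```go
package main

import "fmt"

// stats is a hypothetical payload type.
type stats struct{ count int }

// produce sends s by value, then keeps modifying its own copy.
// Had it sent &s instead, the write below would race with the
// receiver's read.
func produce(ch chan<- stats) {
	s := stats{count: 1}
	ch <- s      // the value is copied into the channel
	s.count = 99 // touches only the sender's copy
	close(ch)
}

// received returns the count seen on the receiving side.
func received() int {
	ch := make(chan stats, 1)
	go produce(ch)
	return (<-ch).count
}

func main() {
	fmt.Println(received())
}
```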
  28. 28. Conclusion 1. Contrary to the common belief that message passing is less error-prone, more blocking bugs in the studied Go applications are caused by wrong message passing than by wrong shared memory protection. 2. Message passing causes fewer non-blocking bugs than shared memory synchronization. 3. Misusing Go libraries can cause both blocking and non-blocking bugs.
  29. 29. Q&A