
Java Concurrent Optimization: Concurrent Queue

Optimizing a BlockingQueue step by step, raising throughput from about 3 million to about 110 million ops.

Published in: Technology
  • Hardcore. I thought the optimizations in http://psy-lob-saw.blogspot.com/2013/03/single-producerconsumer-lock-free-queue.html were already very strong, but this one even brings in Unsafe...

Java Concurrent Optimization: Concurrent Queue

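  1. 1. Concurrent Queues
     Author: Zhou Chen (周忱) | Data Platform, DXP
     Weibo: @MinZhou
     Email: zhouchen.zm@taobao.com
  2. 2. Java Concurrent Programming Optimization: Blocking Queues
     About me
     • Alias: Zhou Chen (周忱, chén)
     • Real name: Zhou Min (周敏)
     • Weibo: @MinZhou
     • Twitter: @minzhou
     • Joined Taobao in June 2010
     • Former leader of Taobao's Hadoop & Hive R&D team
     • Currently a "temp worker" on Yunti (云梯) cross-datacenter work
     • Hive Contributor
     • Free and open-source software enthusiast
     Data eXchange Platform | zhouchen.zm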
  4. 4. What is a queue?
  6. 6. Uses of queues
  7. 7. ArrayBlockingQueue & LinkedBlockingQueue
     • BlockingQueue
     • ArrayBlockingQueue: array-based implementation
     • LinkedBlockingQueue: linked-list-based implementation
     • Throughput: about 3 million ops
  8. 8. Performance problems of queues
     • Linked lists are the EVIL of performance
     • Write contention on three variables: head, tail, and size
     • Coarse-grained locks around put/take and offer/poll
     • GC pressure
  9. 9. The Single Writer Principle
     Time needed to increment one variable 500,000,000 times:
     Method                                Time (ms)
     Single thread, long                         300
     Single thread, volatile long              4,700
     Single thread, AtomicLong (CAS)           5,700
     Two threads, AtomicLong (CAS)            18,000
     Single thread, synchronized + long       10,000
     Two threads, synchronized + long        118,000
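
The comparison on slide 9 can be approximated with a small timing harness. This is a rough sketch, not the author's benchmark: the class name `IncrementCost` and its methods are made up for illustration, only the single-threaded rows are covered, and a plain loop like this is easily distorted by JIT optimizations, so treat the numbers as indicative at best.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical micro-benchmark; every name here is illustrative.
final class IncrementCost {
    static long plainCounter;
    static volatile long volatileCounter;
    static final AtomicLong atomicCounter = new AtomicLong();
    static final Object lock = new Object();
    static long lockedCounter;

    // Each method resets its counter, increments it n times, and returns elapsed nanoseconds.
    static long timePlain(long n) {
        plainCounter = 0;
        long start = System.nanoTime();
        for (long i = 0; i < n; i++) {
            plainCounter++;
        }
        return System.nanoTime() - start;
    }

    static long timeVolatile(long n) {
        volatileCounter = 0;
        long start = System.nanoTime();
        for (long i = 0; i < n; i++) {
            volatileCounter++; // not atomic; correct only because one thread writes
        }
        return System.nanoTime() - start;
    }

    static long timeAtomic(long n) {
        atomicCounter.set(0);
        long start = System.nanoTime();
        for (long i = 0; i < n; i++) {
            atomicCounter.incrementAndGet();
        }
        return System.nanoTime() - start;
    }

    static long timeSynchronized(long n) {
        lockedCounter = 0;
        long start = System.nanoTime();
        for (long i = 0; i < n; i++) {
            synchronized (lock) {
                lockedCounter++;
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long n = 50_000_000L; // the slide used 500,000,000 iterations
        System.out.printf("long                %d ms%n", timePlain(n) / 1_000_000);
        System.out.printf("volatile long       %d ms%n", timeVolatile(n) / 1_000_000);
        System.out.printf("AtomicLong          %d ms%n", timeAtomic(n) / 1_000_000);
        System.out.printf("synchronized + long %d ms%n", timeSynchronized(n) / 1_000_000);
    }
}
```

For numbers you intend to rely on, a harness such as JMH would be a safer choice than hand-rolled timing loops.
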
  10. 10. Step 1: Ring buffer
     • No write contention, so no locks and not even CAS are needed
     • Use the volatile keyword to make writes visible to the other thread
     • No need to maintain size
     • Throughput: about 11 million ops
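
A minimal sketch of the idea in Step 1, assuming exactly one producer thread and one consumer thread. The class name `SpscRingQueue` and its layout are illustrative and not taken from the author's repository. Each counter has a single writer, and the volatile write of a counter is what publishes the slot contents to the other thread.

```java
// Hypothetical single-producer/single-consumer ring queue; names are illustrative.
final class SpscRingQueue<E> {
    private final E[] buffer;
    private volatile long head; // next slot to read, written only by the consumer
    private volatile long tail; // next slot to write, written only by the producer

    @SuppressWarnings("unchecked")
    SpscRingQueue(int capacity) {
        buffer = (E[]) new Object[capacity];
    }

    // Producer side: returns false when the queue is full.
    boolean offer(E e) {
        if (e == null) {
            throw new NullPointerException("null elements are not supported");
        }
        long currentTail = tail;
        if (currentTail - head >= buffer.length) {
            return false;
        }
        buffer[(int) (currentTail % buffer.length)] = e;
        tail = currentTail + 1; // volatile write publishes the element
        return true;
    }

    // Consumer side: returns null when the queue is empty.
    E poll() {
        long currentHead = head;
        if (currentHead >= tail) {
            return null;
        }
        int index = (int) (currentHead % buffer.length);
        E e = buffer[index];
        buffer[index] = null; // release the reference for GC
        head = currentHead + 1; // volatile write frees the slot for the producer
        return e;
    }

    public static void main(String[] args) throws InterruptedException {
        SpscRingQueue<Integer> queue = new SpscRingQueue<>(1024);
        int count = 1_000_000;
        Thread producer = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                while (!queue.offer(i)) {
                    Thread.yield(); // queue full, let the consumer run
                }
            }
        });
        producer.start();
        long sum = 0;
        for (int i = 0; i < count; i++) {
            Integer value;
            while ((value = queue.poll()) == null) {
                Thread.yield(); // queue empty, let the producer run
            }
            sum += value;
        }
        producer.join();
        System.out.println("sum = " + sum);
    }
}
```

The size is never stored: it is derived from `tail - head`, which is why no third shared variable is needed.
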
  11. 11. Memory barriers
     • Load Buffer
     • Store Buffer
     • CPU serializing instructions
       – CPUID
       – SFENCE
       – LFENCE
       – MFENCE
     • LOCK-prefixed instructions
  12. 12. Step 2: lazySet
     • AtomicXXX.lazySet() guarantees StoreStore ordering
     • But it does not guarantee StoreLoad ordering
     • Guarantees eventual consistency
     • A lightweight volatile write
     • Unsafe.putOrderedXXX
     • Throughput: about 17 million ops
     "This is a niche method that is sometimes useful when fine-tuning code using non-blocking data structures. The semantics are that the write is guaranteed not to be re-ordered with any previous write, but may be reordered with subsequent operations (or equivalently, might not be visible to other threads) until some other volatile write or synchronizing action occurs)."
     -- Doug Lea
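
As a sketch of Step 2, the single-producer/single-consumer ring queue can keep its counters in `AtomicLong` fields and publish them with `lazySet()` instead of a full volatile write. The class name `LazySetQueue` is hypothetical, and the code assumes exactly one producer and one consumer; it is not the author's implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical SPSC ring queue that publishes its counters with lazySet().
final class LazySetQueue<E> {
    private final E[] buffer;
    private final AtomicLong head = new AtomicLong(); // written only by the consumer
    private final AtomicLong tail = new AtomicLong(); // written only by the producer

    @SuppressWarnings("unchecked")
    LazySetQueue(int capacity) {
        buffer = (E[]) new Object[capacity];
    }

    // Producer side: returns false when the queue is full.
    boolean offer(E e) {
        if (e == null) {
            throw new NullPointerException("null elements are not supported");
        }
        long currentTail = tail.get();
        if (currentTail - head.get() >= buffer.length) {
            return false;
        }
        buffer[(int) (currentTail % buffer.length)] = e;
        // Ordered store: the element write above cannot be reordered after it,
        // but no StoreLoad barrier is issued.
        tail.lazySet(currentTail + 1);
        return true;
    }

    // Consumer side: returns null when the queue is empty.
    E poll() {
        long currentHead = head.get();
        if (currentHead >= tail.get()) {
            return null;
        }
        int index = (int) (currentHead % buffer.length);
        E e = buffer[index];
        buffer[index] = null;
        head.lazySet(currentHead + 1);
        return e;
    }

    public static void main(String[] args) {
        LazySetQueue<String> queue = new LazySetQueue<>(4);
        queue.offer("first");
        queue.offer("second");
        System.out.println(queue.poll());
        System.out.println(queue.poll());
    }
}
```

The trade-off is that the other thread may observe the new counter value slightly later, which is acceptable here because both sides simply retry when the queue looks full or empty.
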
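  13. 13. Step 3: Modulo optimization
     • Replace % with & (capacity - 1), where capacity is a power of two
     • Throughput: about 22 million ops
     Before:
         public boolean offer(final E e) {
             …
             // index computed with a modulo
             buffer[(int) (currentTail % buffer.length)] = e;
             …
         }
     After:
         public boolean offer(final E e) {
             …
             // index computed with a bit mask (mask = buffer.length - 1)
             buffer[(int) currentTail & mask] = e;
             …
         }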
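
The replacement in Step 3 is only valid when the buffer length is a power of two. The sketch below checks that equivalence; `MaskIndex` and its method names are hypothetical helpers, not part of the slides.

```java
// Hypothetical helper showing that masking equals modulo for power-of-two capacities.
final class MaskIndex {
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    static int indexByModulo(long sequence, int capacity) {
        return (int) (sequence % capacity);
    }

    static int indexByMask(long sequence, int capacity) {
        // capacity - 1 has all low bits set when capacity is a power of two
        return (int) (sequence & (capacity - 1));
    }

    public static void main(String[] args) {
        int capacity = 1024;
        if (!isPowerOfTwo(capacity)) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        for (long sequence = 0; sequence < 1_000_000; sequence++) {
            if (indexByModulo(sequence, capacity) != indexByMask(sequence, capacity)) {
                throw new AssertionError("mismatch at " + sequence);
            }
        }
        System.out.println("mask and modulo agree");
    }
}
```

For a capacity such as 1000 the two methods diverge, so a queue using the mask should round its capacity up to the next power of two or reject other sizes.
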
  14. 14. False Sharing
  15. 15. Step 4: Eliminate false sharing
     • Throughput: about 40 million ops
         import java.util.concurrent.atomic.AtomicLong;

         public class PaddedAtomicLong extends AtomicLong {
             private static final long serialVersionUID = 1L;

             public PaddedAtomicLong() {
             }

             public PaddedAtomicLong(final long initialValue) {
                 super(initialValue);
             }

             // Padding fields that keep other hot values off this cache line
             public long p1, p2, p3, p4, p5, p6;
         }
  16. 16. CPU Cache
  17. 17. How memory layout affects performance
     • Tests
       – Read memory sequentially
       – Access randomly within one memory page, then move to another page and access randomly there
       – Fully random access
     • https://gist.github.com/coderplay/4453283
  18. 18. Cache Line
     cat /sys/devices/system/cpu/cpu0/cache/index0/*
  19. 19. Step 5: Optimize memory layout
     • Use a direct ByteBuffer
     • Use Unsafe to align to page boundaries
     • Keep memory contiguous
     • Throughput: about 68 million ops
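
One way to picture the storage side of Step 5 is a ring of fixed-size slots held in a direct `ByteBuffer`, so the payload sits in one contiguous off-heap region instead of in scattered heap objects. This is only a sketch under assumptions: the class `DirectLongRing` is hypothetical, it stores `long` values only, it does not perform the page alignment via Unsafe mentioned on the slide, and cross-thread visibility still has to come from the head and tail counters described in Steps 1 and 2.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical off-heap slot storage; publication between threads is NOT handled here.
final class DirectLongRing {
    private final ByteBuffer slots;
    private final int mask;

    DirectLongRing(int capacity) {
        if (capacity <= 0 || (capacity & (capacity - 1)) != 0) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        // One contiguous off-heap block, 8 bytes per slot, in the platform's byte order.
        slots = ByteBuffer.allocateDirect(Math.multiplyExact(capacity, Long.BYTES))
                .order(ByteOrder.nativeOrder());
        mask = capacity - 1;
    }

    // Absolute put: does not move the buffer's position.
    void set(long sequence, long value) {
        slots.putLong(offset(sequence), value);
    }

    // Absolute get: does not move the buffer's position.
    long get(long sequence) {
        return slots.getLong(offset(sequence));
    }

    boolean isDirect() {
        return slots.isDirect();
    }

    private int offset(long sequence) {
        return ((int) (sequence & mask)) * Long.BYTES;
    }

    public static void main(String[] args) {
        DirectLongRing ring = new DirectLongRing(8);
        for (long sequence = 0; sequence < 8; sequence++) {
            ring.set(sequence, sequence * 10);
        }
        System.out.println(ring.get(3));
    }
}
```
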
  20. 20. Step 6: yield() vs LockSupport.parkNanos(1)
     • Fewer StoreLoad barriers
     • Less cache-coherence noise, which improves the cache hit rate
     • Throughput: about 110 million ops
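
The two waiting styles contrasted in Step 6 can be put side by side in a small helper that a consumer calls while the queue is empty. The slide does not show the surrounding loop, so the class `IdleWait`, the `Mode` enum, and `takeWith` are all hypothetical names used only for this sketch.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.LockSupport;
import java.util.function.Supplier;

// Hypothetical helper that retries a non-blocking poll with a selectable idle strategy.
final class IdleWait {
    enum Mode { YIELD, PARK }

    // Calls poll until it returns a non-null element, idling between attempts.
    static <E> E takeWith(Supplier<E> poll, Mode mode) {
        E element;
        while ((element = poll.get()) == null) {
            if (mode == Mode.YIELD) {
                Thread.yield(); // stay runnable and retry as soon as scheduled again
            } else {
                LockSupport.parkNanos(1L); // briefly deschedule this thread
            }
        }
        return element;
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
        Thread producer = new Thread(() -> queue.offer("hello"));
        producer.start();
        System.out.println(takeWith(queue::poll, Mode.PARK));
        producer.join();
    }
}
```

Parking keeps the waiting thread from hammering the shared counters while nothing is available, at the cost of a wake-up latency that depends on the operating system's timer granularity.
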
  21. 21. Other optimizations
     • Preallocate the ring buffer for zero GC
     • Produce and consume in batches
     • Wait-free
     • Throughput can reach 220 million ops!
     • CPU affinity
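
Batch consumption, one of the items on slide 21, can be sketched as a `drain` method that reads the producer's counter once, hands over every element available up to a limit, and publishes the consumer's counter once for the whole batch. The class `BatchQueue` is hypothetical and assumes a single producer and a single consumer; it is not the author's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical SPSC ring queue with a batch drain on the consumer side.
final class BatchQueue<E> {
    private final E[] buffer;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // written only by the consumer
    private final AtomicLong tail = new AtomicLong(); // written only by the producer

    @SuppressWarnings("unchecked")
    BatchQueue(int capacity) {
        if (capacity <= 0 || (capacity & (capacity - 1)) != 0) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        buffer = (E[]) new Object[capacity];
        mask = capacity - 1;
    }

    // Producer side: returns false when the queue is full.
    boolean offer(E e) {
        if (e == null) {
            throw new NullPointerException("null elements are not supported");
        }
        long currentTail = tail.get();
        if (currentTail - head.get() >= buffer.length) {
            return false;
        }
        buffer[(int) (currentTail & mask)] = e;
        tail.lazySet(currentTail + 1);
        return true;
    }

    // Consumer side: passes up to limit elements to handler and returns how many were passed.
    int drain(Consumer<? super E> handler, int limit) {
        long currentHead = head.get();
        long available = tail.get() - currentHead; // one read of tail per batch
        int n = (int) Math.min(available, limit);
        for (int i = 0; i < n; i++) {
            int index = (int) ((currentHead + i) & mask);
            E e = buffer[index];
            buffer[index] = null;
            handler.accept(e);
        }
        if (n > 0) {
            head.lazySet(currentHead + n); // one publication per batch
        }
        return n;
    }

    public static void main(String[] args) {
        BatchQueue<Integer> queue = new BatchQueue<>(8);
        for (int i = 1; i <= 5; i++) {
            queue.offer(i);
        }
        List<Integer> batch = new ArrayList<>();
        int drained = queue.drain(batch::add, 3);
        System.out.println(drained + " elements: " + batch);
    }
}
```

Amortizing the counter reads and writes over a batch reduces the number of cache-line transfers between the two cores per element.
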
  22. 22. Food for thought
     • Multiple consumers
     • Multiple producers
  23. 23. Tools
     • top
     • vmstat
     • lscpu
     • perf
     • Valgrind tools suite
     • OProfile
     • SystemTap
     • numactl
     • Intel VTune
     • Intel PTU
     • Intel PCM + ksysguard
     • MAT
  24. 24. Code
     $ git clone https://github.com/coderplay/javaopt.git
     $ java -cp bin javaopt.queue.QueuePerfTest n
  25. 25. Recommended reading
     • What Every Programmer Should Know About Memory
     • Intel® 64 and IA-32 Architectures Software Developer Manuals
     • The Art of Multiprocessor Programming
     • The JSR-133 Cookbook for Compiler Writers (Java Memory Model)
     • The author's blog: http://coderplay.javaeye.com
  26. 26. Q & A
