Java Concurrent Optimization: Concurrent Queue


Step-by-step optimization of a BlockingQueue, raising throughput from about 3 million to about 110 million ops.

Published in: Technology

Java Concurrent Optimization: Concurrent Queue

  1. Concurrent Queues
     Author: Zhou Chen (周忱) | Data Platform, DXP
     Weibo: @MinZhou
     Email: zhouchen.zm@taobao.com
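  2. Java Concurrent Programming Optimization: Blocking Queues
     About me
     • Alias: Zhou Chen (周忱, chén)
     • Real name: Zhou Min (周敏)
     • Weibo: @MinZhou
     • Twitter: @minzhou
     • Joined Taobao in June 2010
     • Former leader of Taobao's Hadoop & Hive development team
     • Currently a "temp worker" on the cross-data-center effort for Yunti (云梯)
     • Hive Contributor
     • Free and open-source software enthusiast
     Data eXchange Platform | zhouchen.zm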
  4. What is a queue?
  6. Uses of queues
  7. ArrayBlockingQueue & LinkedBlockingQueue
     • BlockingQueue
     • ArrayBlockingQueue: array-backed implementation
     • LinkedBlockingQueue: linked-list implementation
     • Ops: about 3 million
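A baseline like the one on this slide can be measured with a small harness. The sketch below is only an illustration: the class name, queue capacity, and element count are assumptions, not taken from the slides or the javaopt repository, and the numbers it prints depend heavily on hardware and JVM warm-up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical harness; names and sizes are illustrative assumptions.
final class BlockingQueueBaseline {

    /** Moves count integers from one producer thread to the caller; returns their sum. */
    static long transfer(BlockingQueue<Integer> queue, int count) {
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    queue.put(i); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        long sum = 0;
        try {
            for (int i = 0; i < count; i++) {
                sum += queue.take(); // blocks while the queue is empty
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during transfer", e);
        }
        return sum;
    }

    /** Times one transfer and prints a rough operations-per-second figure. */
    static void report(String name, BlockingQueue<Integer> queue, int count) {
        long start = System.nanoTime();
        transfer(queue, count);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%s: %.0f ops/s%n", name, count / seconds);
    }

    public static void main(String[] args) {
        int count = 5_000_000;
        report("ArrayBlockingQueue", new ArrayBlockingQueue<>(1024), count);
        report("LinkedBlockingQueue", new LinkedBlockingQueue<>(1024), count);
    }
}
```

A single un-warmed run like this is noisy; a serious comparison would repeat the measurement several times or use a benchmark framework.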
  8. Queue performance problems
     • Linked lists are the EVIL of performance
     • Write contention on three variables: head, tail, and size
     • Coarse-grained locks on put/take and offer/poll
     • GC pressure
  9. The Single-Writer Principle
     Time needed to increment one variable 500,000,000 times:

     Method                                  Time (ms)
     Single thread, long                           300
     Single thread, volatile long                4,700
     Single thread, AtomicLong (CAS)             5,700
     Two threads, AtomicLong (CAS)              18,000
     Single thread, synchronized + long         10,000
     Two threads, synchronized + long          118,000
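The comparison in the table can be approximated with a sketch like the following. It is not the author's benchmark: the class and method names are hypothetical, the iteration count is reduced, and the JIT may optimize the plain-long loop aggressively, so absolute timings will differ from the slide.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical micro-benchmark sketch; names are illustrative assumptions.
final class CounterCost {
    static long plain;                       // no ordering or visibility guarantees
    static volatile long vol;                // safe here only because one thread writes it
    static final AtomicLong atomic = new AtomicLong();
    static long guarded;                     // protected by LOCK
    private static final Object LOCK = new Object();

    static void incrementPlain(long n)    { for (long i = 0; i < n; i++) plain++; }
    static void incrementVolatile(long n) { for (long i = 0; i < n; i++) vol++; }
    static void incrementAtomic(long n)   { for (long i = 0; i < n; i++) atomic.incrementAndGet(); }
    static void incrementLocked(long n) {
        for (long i = 0; i < n; i++) {
            synchronized (LOCK) { guarded++; }
        }
    }

    /** Runs the same work on the given number of threads and returns elapsed milliseconds. */
    static long timeMillis(int threads, Runnable work) {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(work);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while timing", e);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long n = 50_000_000L; // smaller than the slide's 500,000,000 to keep the run short
        System.out.println("1 thread,  long:          " + timeMillis(1, () -> incrementPlain(n)) + " ms");
        System.out.println("1 thread,  volatile long: " + timeMillis(1, () -> incrementVolatile(n)) + " ms");
        System.out.println("1 thread,  AtomicLong:    " + timeMillis(1, () -> incrementAtomic(n)) + " ms");
        System.out.println("2 threads, AtomicLong:    " + timeMillis(2, () -> incrementAtomic(n / 2)) + " ms");
        System.out.println("1 thread,  synchronized:  " + timeMillis(1, () -> incrementLocked(n)) + " ms");
        System.out.println("2 threads, synchronized:  " + timeMillis(2, () -> incrementLocked(n / 2)) + " ms");
    }
}
```

The two-thread rows are the point of the exercise: once a second writer contends for the same variable, the cost rises sharply, which is the motivation for giving each variable exactly one writer.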
  10. Step 1: Ring buffer queue
     • No write contention, so no locks and not even CAS are needed
     • The volatile keyword makes each thread's writes visible to the other thread
     • No need to maintain size
     • Ops: about 11 million
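The bullets above can be sketched as a single-producer, single-consumer ring buffer. This is a minimal illustration under stated assumptions, not the code from the talk: the class name is hypothetical, and it is only correct when exactly one thread calls offer and exactly one thread calls poll.

```java
// Minimal SPSC ring buffer sketch; the class name is a hypothetical example.
final class SpscRingQueue<E> {
    private final E[] buffer;
    private volatile long head; // next slot to read; written only by the consumer
    private volatile long tail; // next slot to write; written only by the producer

    @SuppressWarnings("unchecked")
    SpscRingQueue(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        buffer = (E[]) new Object[capacity];
    }

    /** Producer side: returns false when the queue is full. */
    boolean offer(E e) {
        if (e == null) {
            throw new NullPointerException("null elements are not allowed");
        }
        long currentTail = tail;
        if (currentTail - head >= buffer.length) {
            return false; // full
        }
        buffer[(int) (currentTail % buffer.length)] = e;
        tail = currentTail + 1; // volatile write publishes the element to the consumer
        return true;
    }

    /** Consumer side: returns null when the queue is empty. */
    E poll() {
        long currentHead = head;
        if (currentHead >= tail) {
            return null; // empty
        }
        int index = (int) (currentHead % buffer.length);
        E e = buffer[index];
        buffer[index] = null;   // release the reference for GC
        head = currentHead + 1; // volatile write frees the slot for the producer
        return e;
    }
}
```

Because head and tail each have a single writer, neither needs a lock or CAS, and the occupancy is derived from tail minus head instead of a shared size field.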
  11. Memory barriers
     • Load buffer
     • Store buffer
     • CPU serializing instructions
       – CPUID
       – SFENCE
       – LFENCE
       – MFENCE
     • LOCK-prefixed instructions
  12. Step 2: lazySet
     • AtomicXXX.lazySet() guarantees StoreStore ordering
     • It does not guarantee StoreLoad ordering
     • It provides eventual consistency
     • In effect, a lightweight volatile write
     • Unsafe.putOrderedXXX
     • Ops: about 17 million

     "This is a niche method that is sometimes useful when fine-tuning code using non-blocking data structures. The semantics are that the write is guaranteed not to be re-ordered with any previous write, but may be reordered with subsequent operations (or equivalently, might not be visible to other threads) until some other volatile write or synchronizing action occurs)."
     -- Doug Lea
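As a small illustration of the lazySet idea, the sketch below publishes a plain field through an AtomicLong flag. The class is a hypothetical one-shot handoff, not the queue from the talk; it assumes one producer calling publish once and one consumer calling await.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical one-shot handoff; illustrates ordering with lazySet, not the talk's queue.
final class LazySetHandoff {
    private long payload;                                // plain field, written before the flag
    private final AtomicLong ready = new AtomicLong(0);  // 0 = not published, 1 = published

    /** Producer: write the data first, then publish with an ordered (StoreStore) write. */
    void publish(long value) {
        payload = value;
        ready.lazySet(1); // cannot be reordered before the payload write; no StoreLoad fence
    }

    /** Consumer: spin until the flag is visible, then read the data. */
    long await() {
        while (ready.get() == 0) {
            Thread.yield();
        }
        return payload;
    }
}
```

In the ring buffer, the same substitution would apply to the write that advances tail or head: an AtomicLong updated with lazySet in place of a volatile field, since the owning thread never needs its own store to be ordered before its later loads.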
  13. Step 3: Modulo optimization
     • Replace % with & (2^k - 1), which requires a power-of-two capacity
     • Ops: about 22 million

     Before:
         public boolean offer(final E e) {
             …
             buffer[(int) (currentTail % buffer.length)] = e;
             …
         }

     After:
         public boolean offer(final E e) {
             …
             buffer[(int) currentTail & mask] = e;
             …
         }
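The equivalence behind this step holds only when the capacity is a power of two and the sequence is non-negative. The helper below is a sketch with hypothetical names that rounds a requested capacity up and computes the index both ways.

```java
// Illustrative helper; names are assumptions, not from the slides or the javaopt repository.
final class IndexMask {

    /** Smallest power of two that is >= n, for 1 <= n <= 2^30. */
    static int nextPowerOfTwo(int n) {
        return n <= 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    /** Index via remainder: works for any positive capacity. */
    static int byModulo(long sequence, int capacity) {
        return (int) (sequence % capacity);
    }

    /** Index via bit mask: valid only when mask == capacity - 1 and capacity is a power of two. */
    static int byMask(long sequence, int mask) {
        return (int) sequence & mask; // the cast keeps the low 32 bits, which is all the mask needs
    }

    public static void main(String[] args) {
        int capacity = nextPowerOfTwo(1000); // 1024
        int mask = capacity - 1;
        long sequence = 5_000_000_123L;
        System.out.println(byModulo(sequence, capacity) + " == " + byMask(sequence, mask));
    }
}
```

The mask is computed once at construction time, so the hot path replaces an integer division with a single AND.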
  14. False Sharing
  15. Step 4: Eliminate false sharing
     • Ops: about 40 million

         import java.util.concurrent.atomic.AtomicLong;

         public class PaddedAtomicLong extends AtomicLong {
             private static final long serialVersionUID = 1L;

             public PaddedAtomicLong() {
             }

             public PaddedAtomicLong(final long initialValue) {
                 super(initialValue);
             }

             // Padding fields so that two counters are less likely to share a cache line.
             public long p1, p2, p3, p4, p5, p6;
         }
  16. CPU Cache
  17. How memory layout affects performance
     • Tests
       – Read memory sequentially
       – Access randomly within one memory page, then move to another page and access randomly there
       – Fully random access
     • https://gist.github.com/coderplay/4453283
  18. Cache Line

         cat /sys/devices/system/cpu/cpu0/cache/index0/*
  19. Step 5: Optimize memory layout
     • Use a direct ByteBuffer
     • Use Unsafe to align to page boundaries
     • Keep the memory contiguous
     • Ops: about 68 million
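The contiguous-storage part of this step can be sketched with a direct ByteBuffer holding fixed-size long slots. This is an assumption-laden illustration: the class name is hypothetical, it stores only long values, it leaves out the Unsafe-based page alignment the slide mentions, and it is a storage layout only, with no thread-safety of its own.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical off-heap slot storage; page alignment via Unsafe is deliberately left out.
final class DirectLongRing {
    private final ByteBuffer buffer; // one contiguous off-heap region
    private final int mask;

    DirectLongRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        buffer = ByteBuffer.allocateDirect(capacity * Long.BYTES).order(ByteOrder.nativeOrder());
        mask = capacity - 1;
    }

    /** Byte offset of the slot that a sequence number maps to. */
    private int offset(long sequence) {
        return ((int) sequence & mask) * Long.BYTES;
    }

    void put(long sequence, long value) {
        buffer.putLong(offset(sequence), value); // absolute write, no position bookkeeping
    }

    long get(long sequence) {
        return buffer.getLong(offset(sequence));
    }
}
```

Compared with an array of object references, the values sit next to each other in memory, so sequential access walks cache lines in order and creates no garbage.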
  20. Step 6: yield() vs LockSupport.parkNanos(1)
     • Fewer StoreLoad barriers
     • Less cache-coherence noise between CPUs, and therefore a higher cache hit rate
     • Ops: about 110 million
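The two idle strategies named in the slide title can be placed side by side as below. The class and method names are hypothetical, and the slide text does not spell out exactly how either call was used in the author's queue, so treat this as a sketch of the two options only.

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

// Hypothetical wait helpers; how the talk's queue applies them is not shown on the slide.
final class IdleWait {

    /** Re-check the condition as fast as the scheduler allows, hinting that others may run. */
    static void awaitWithYield(BooleanSupplier condition) {
        while (!condition.getAsBoolean()) {
            Thread.yield();
        }
    }

    /** Back off briefly between checks, so the shared variable is polled less often. */
    static void awaitWithPark(BooleanSupplier condition) {
        while (!condition.getAsBoolean()) {
            LockSupport.parkNanos(1L); // actual pause length depends on the OS timer granularity
        }
    }
}
```

A producer facing a full queue might call `IdleWait.awaitWithPark(() -> queue.offer(item))`, where `queue` and `item` are placeholders. Polling less often means fewer reads of the counter the other thread is writing, which fits the slide's point about reduced coherence traffic.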
  21. Other optimizations
     • Preallocate the ring buffer for zero GC
     • Batch production and consumption
     • Wait-free operation
     • Ops can reach 220 million!
     • CPU affinity
  22. Food for thought
     • Multiple consumers
     • Multiple producers
  23. Tools
     • top
     • vmstat
     • lscpu
     • perf
     • Valgrind tools suite
     • OProfile
     • SystemTap
     • numactl
     • Intel VTune
     • Intel PTU
     • Intel PCM + ksysguard
     • MAT
  24. Code

         $ git clone https://github.com/coderplay/javaopt.git
         $ java -cp bin javaopt.queue.QueuePerfTest n
  25. Recommended reading
     • What Every Programmer Should Know About Memory
     • Intel® 64 and IA-32 Architectures Software Developer Manuals
     • The Art of Multiprocessor Programming
     • The JSR-133 Cookbook for Compiler Writers (Java Memory Model)
     • The author's blog: http://coderplay.javaeye.com
  26. Q & A
