[Sitcon2018] Analysis and Improvement of IOTA PoW Implementation

Slides for the session “Analysis and Improvement of IOTA PoW Implementation” at SITCON2018

Published in: Engineering

  1. Analysis and Improvement of IOTA PoW Implementation
     chenwei (魏禛) <zhenwei.tw@gmail.com>
     AndyYang (楊子賢) <kukry5566@gmail.com>
     March 10, 2018 / SITCON2018
  2. chenwei (魏禛)
     ● From Tainan, Taiwan
     ● Master's student at National Taiwan University
     ● Recent work
       ○ Learning how to implement an interpreter
       ○ Learning Golang
       ○ Optimizing neural networks on multiple GPUs
     ● GitHub <https://github.com/chenwei-tw>
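  3. AndyYang (楊子賢)
     ● From Taipei
     ● First-year graduate student in Computer Science and Information Engineering at National Taiwan University
     ● Research areas:
       ○ Machine learning
       ○ Computer architecture
     ● Recent work:
       ○ ReRAM-based accelerator for convolutional neural networks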
  4. Brief Introduction to IOTA
     Figure from: “Iota Tangle Visualization” <https://simulation1.tangle.works/>
  5. Brief Introduction to IOTA
     ● IRI (IOTA Reference Implementation)
       ○ Provides a RESTful API for participating in the Tangle
       ○ Exchanges transactions with other nodes
       ○ Maintains a database for storing transactions
     Referenced: “IOTA 輕量錢包、完整錢包與 IOTA Node 的關係” (The relationship between the IOTA light wallet, full wallet, and IOTA node) <https://blog.louie.lu/2017/12/06/relationship-between-iota-light-wallet-full-wallet-and-full-node/>
     Referenced: “IOTA API Reference” <https://iota.readme.io/v1.2.0/reference>
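  6. Brief Introduction to IOTA
     ● (Light) Wallet
       ○ Checks balances, receives payments, and sends transfers
       ○ Because it does not run a full node, the wallet must obtain all of its information from a full node through the RESTful API described earlier
       ○ Before doing any operation with your wallet, check that the host it is connected to is available
     Referenced: “IOTA 輕量錢包、完整錢包與 IOTA Node 的關係” (The relationship between the IOTA light wallet, full wallet, and IOTA node) <https://blog.louie.lu/2017/12/06/relationship-between-iota-light-wallet-full-wallet-and-full-node/>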
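The availability check mentioned on the wallet slide can be done with the IRI API's getNodeInfo command. The sketch below is illustrative only: the node address is hypothetical, and `build_api_request` and `node_is_available` are helper names introduced here, not part of any IOTA library. It assumes the node accepts JSON POST requests that carry an `X-IOTA-API-Version` header.

```python
import json
import urllib.request

# Hypothetical node address (assumption); replace with the node your wallet uses.
NODE_URL = "http://localhost:14265"


def build_api_request(command, node_url=NODE_URL, **params):
    """Build (but do not send) a POST request for an IRI API command."""
    body = json.dumps({"command": command, **params}).encode("utf-8")
    return urllib.request.Request(
        node_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-IOTA-API-Version": "1",
        },
        method="POST",
    )


def node_is_available(node_url=NODE_URL, timeout=5.0):
    """Return True if the node answers getNodeInfo within the timeout."""
    request = build_api_request("getNodeInfo", node_url=node_url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, timeouts and refused connections are OSErrors
        return False
```

Separating request construction from sending keeps the payload easy to inspect without a running node.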
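  7. Brief Introduction to IOTA
     ● How is a transaction issued?
       ○ The node selects two transactions to validate
       ○ It checks whether the two transactions conflict (e.g. an account balance would become negative)
       ○ It solves a cryptographic puzzle (PoW), which costs computing power
     Referenced: “Tangle 白皮書” (Tangle white paper) <https://hackmd.io/s/ryriSgvAW>
     Further Reading: “深入理解 IOTA 交易方式” (An in-depth look at IOTA transactions) <https://blog.louie.lu/2018/01/10/in-depth-explain-iota-transaction/>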
  8. How I Got Involved
     ● <attachToTangle> in IRI
     Referenced: “iotaledger/iri” <https://github.com/iotaledger/iri>
  9. How I Got Involved
     ● Many IOTA PoW implementations are hidden in these libraries
       ○ curl.lib.js <https://github.com/iotaledger/curl.lib.js>
       ○ gIOTA <https://github.com/iotaledger/gIOTA>
       ○ ccurl <https://github.com/iotaledger/ccurl>
       ○ iota-pearldiver <https://github.com/mlouielu/iota-pearldiver>
  10. Why choose gIOTA?
      ● gIOTA collects several PoW implementations (C, SSE, AVX, OpenCL)
        ○ Most of them are embedded in Golang as C code
      ● So once we build IOTA's low-level trinary structure in C, we can quickly port these implementations over
  11. Trinary Structure
      ● Trinary: an alternative to binary, trinary is a base-3 numeral system
      ● Trits: analogous to bits, a ternary digit is a trit. A trit may have the value 1, 0, or -1
      ● Trytes: a tryte consists of 3 trits, which can represent 27 values
        ○ In IOTA, trytes are represented as the characters '9,A-Z'
      Referenced: “IOTA Glossary” <https://iota.readme.io/docs/glossary>
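The trit and tryte definitions on the Trinary Structure slide can be made concrete with a small conversion routine. This is a minimal sketch, not the dcurl data structure: it assumes IOTA's usual convention that the alphabet `9ABCDEFGHIJKLMNOPQRSTUVWXYZ` maps to the values 0..13 followed by -13..-1, and that the three trits of a tryte are stored least significant first. The function names are introduced here for illustration.

```python
TRYTE_ALPHABET = "9ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def int_to_trits(value, length=3):
    """Encode an integer as balanced-ternary trits, least significant first."""
    trits = []
    for _ in range(length):
        remainder = value % 3          # 0, 1 or 2
        if remainder == 2:             # 2 is written as -1 plus a carry
            remainder = -1
        trits.append(remainder)
        value = (value - remainder) // 3
    return trits


def tryte_to_trits(tryte):
    """Convert one tryte character into its three trits."""
    index = TRYTE_ALPHABET.index(tryte)            # position 0..26
    value = index if index <= 13 else index - 27   # balanced value -13..13
    return int_to_trits(value)


def trits_to_tryte(trits):
    """Convert three trits back into a tryte character."""
    value = sum(trit * 3 ** i for i, trit in enumerate(trits))
    return TRYTE_ALPHABET[value % 27]
```

For example, `tryte_to_trits("9")` gives `[0, 0, 0]` and `tryte_to_trits("Z")` gives `[-1, 0, 0]`.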
  12. Our Trinary Structure
      Source Code: “chenwei-tw/dcurl” <https://github.com/chenwei-tw/dcurl/blob/dev/src/trinary/trinary.h>
  13. What is PoW (Proof of Work)?
      ● 9 in tryte = {0,0,0} in trits
      Figure labels: Hash, MWM, “...0000...0”
      Referenced: “The Anatomy of a Transaction” <https://domschiener.gitbooks.io/iota-guide/content/chapter1/transactions-and-bundles.html>
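In IOTA, a transaction's PoW is valid when the trailing MWM (Minimum Weight Magnitude) trits of its hash are all zero; the “...0000...0” label on the PoW slide appears to depict this. Below is a minimal sketch of that check, assuming the hash is given as a tryte string and that each tryte expands to three trits, least significant first. The helper names are introduced here for illustration and do not come from dcurl or gIOTA.

```python
TRYTE_ALPHABET = "9ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def tryte_to_trits(tryte):
    """Convert one tryte character into three balanced-ternary trits."""
    index = TRYTE_ALPHABET.index(tryte)
    value = index if index <= 13 else index - 27
    trits = []
    for _ in range(3):
        remainder = value % 3
        if remainder == 2:
            remainder = -1
        trits.append(remainder)
        value = (value - remainder) // 3
    return trits


def trailing_zero_trits(hash_trytes):
    """Count how many trits at the end of the hash are zero."""
    trits = [t for tryte in hash_trytes for t in tryte_to_trits(tryte)]
    count = 0
    for trit in reversed(trits):
        if trit != 0:
            break
        count += 1
    return count


def satisfies_mwm(hash_trytes, mwm):
    """Return True if the hash ends in at least `mwm` zero trits."""
    return trailing_zero_trits(hash_trytes) >= mwm
```

Since '9' is {0,0,0}, every trailing '9' in the hash contributes three zero trits.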
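  14. How do we find the nonce?
      ● The multi-threaded implementations collected in giota do not actually divide the computation among threads. They take a brute-force approach: run several copies of the same function at once and see which one finds an answer first
      ● Each thread starts from a different seed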
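The race described on the nonce slide can be sketched as follows. This is not the giota code: SHA-256 stands in for IOTA's Curl hash, the difficulty is expressed in zero bits rather than zero trits, and giving each worker its own start offset with a stride equal to the number of workers is an assumption made here so that the workers never test the same nonce. Python threads are also limited by the interpreter lock, so the example shows the structure of the race, not a real speedup.

```python
import hashlib
import threading


def is_valid(message, nonce, zero_bits):
    """Stand-in check: the low `zero_bits` bits of the SHA-256 digest are zero."""
    digest = hashlib.sha256(message + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % (1 << zero_bits) == 0


def search_nonce(message, zero_bits, n_threads=4):
    """Run identical workers from different start points; first hit wins."""
    found = threading.Event()
    lock = threading.Lock()
    result = []

    def worker(start):
        nonce = start
        while not found.is_set():          # stop once any worker has won
            if is_valid(message, nonce, zero_bits):
                with lock:
                    if not found.is_set():
                        result.append(nonce)
                        found.set()
                return
            nonce += n_threads             # skip nonces owned by other workers

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return result[0]
```

Which worker wins can differ from run to run, so callers should rely only on the returned nonce being valid, not on it being the smallest one.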
  15. Testing the correctness of giota
      ● The C, Go, and SSE implementations work correctly
      Referenced: “用 C 開發 IOTA PoW 的各種實作” (Developing various IOTA PoW implementations in C) <https://hackmd.io/s/HyNw4VM-z>
  16. Testing the correctness of giota
      ● The AVX and OpenCL implementations, however, did not pass
        pow_avx_test.go:47: pow is illegal J9QTUNNMONCMIR9JBNMRC9SC9QTBRKBUVCBYBUITBHEICYVQ9HXEXSPWPU9KACTSDRSQBDOJPOOEAFVMP
        pow_cl_test.go:46: pow is illegal IIHYVX9VHSMQWSNDJYWZOJBCBTPVQBLVBF9UYIYSTEKJVEFVY9JPJJMRLFWOJFKNWKAANSZKLXDBWMALI
      ● We later found that iotaledger/ccurl and gIOTA use the same OpenCL kernel function, yet ccurl produces correct results, so we suspect the problem lies in how gIOTA launches the kernel
      ● The subsequent GPU performance evaluation and design work are therefore based on a modified version of iotaledger/ccurl
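  17. Measuring the performance of the PoW implementations
      ● We first measured three PoW implementations using a single trytes input
      ● We then found that the time needed to find the nonce differs from one input to another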
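  18. Measuring the performance of the PoW implementations
      ● Measure with a large number of trytes inputs and plot the distribution to compare the implementations
      ● Results for 30 trytes, 200 samples
      Figure annotations: 47 samples took about 10 seconds to run; this is the price of repeatedly initializing the OpenCL context
      Source Code: “chenwei-tw/iota-pow-in-c” <https://github.com/chenwei-tw/iota-pow-in-c>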
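Because the time to find a nonce depends on the input, a fair comparison needs many inputs and a look at the whole distribution. The sketch below shows one way to collect such timings. It is not the measurement code from iota-pow-in-c: `pow_fn` is a placeholder for whatever PoW function is under test, and the helper names and the input length are assumptions made for the example.

```python
import random
import statistics
import time

TRYTE_ALPHABET = "9ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def random_trytes(length, rng):
    """Generate a random tryte string of the given length."""
    return "".join(rng.choice(TRYTE_ALPHABET) for _ in range(length))


def measure(pow_fn, inputs):
    """Return the wall-clock time in seconds that pow_fn takes on each input."""
    timings = []
    for trytes in inputs:
        start = time.perf_counter()
        pow_fn(trytes)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to a few distribution statistics."""
    return {
        "samples": len(timings),
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }
```

The raw `timings` list can be passed to any plotting library to draw the frequency-versus-time histogram; seeding the random generator keeps the inputs identical across the implementations being compared.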
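  19. Why is OpenCL slow?
      ● Question: why does the GPU-based OpenCL implementation perform so poorly?
      ● Possible causes:
        ○ Does the kernel function that searches for the nonce take a long time to compute?
        ○ Is the communication overhead between device and host too large?
        ○ Or is one of the OpenCL API calls at fault?
      ● A further problem:
        ○ The GPU in our test environment is from Nvidia, and Nvidia does not provide a profiling tool for its OpenCL implementation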
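  20. Let's roll our own CUDA version!
      ● The most intuitive approach is to rewrite the OpenCL implementation in CUDA and then inspect it with nvprof, one of the tools in the CUDA Toolkit
      ● The nvprof results (figure on the slide) do not directly reveal the cause of the slowdown
      Further Reading: “Profiler :: CUDA Toolkit Documentation” <http://docs.nvidia.com/cuda/profiler-users-guide/index.html>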
  21. Trying another profiling tool
      ● We then found another profiling tool on GitHub, uftrace, which reports information such as:
        ○ Duration
        ○ TID
        ○ Number of function calls
        ○ Total time
      ● Although uftrace cannot profile GPU activity, the information it provides still shows where the performance bottleneck is
      Referenced: “namhyung/uftrace” <https://github.com/namhyung/uftrace>
  22. uftrace measurement results
      ● record: runs a program and saves the trace data
      ● graph: shows the function call graph in the trace data
      $ uftrace record pow_cl
      $ uftrace graph main
  23. uftrace measurement results
      ● The GPU initialization phase accounts for nearly 70% of the total time
      Function                Time
      total time              1.938
      init_clcontext          1.354 s
      init_cl_kernel          14.362 us
      write_cl_buffer         1.541 ms
      clEnqueueWriteBuffer    1.538 ms
      clWaitForEvents         569.901 ms
      clEnqueueReadBuffer     84.981 us
      Hash                    5.502 ms
      Figure labels: OpenCL context Initialization; OpenCL searching nonce
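The "nearly 70%" figure can be checked directly from the uftrace numbers. The short calculation below assumes that the total time of 1.938, which appears without a unit on the slide, is in seconds; the variable names are chosen here for the example.

```python
# Durations taken from the uftrace table, converted to seconds.
# Assumption: the unit-less total (1.938) is in seconds.
total_time = 1.938
init_clcontext = 1.354
cl_wait_for_events = 569.901e-3

init_share = init_clcontext / total_time        # context initialization
wait_share = cl_wait_for_events / total_time    # waiting on the kernel

print(f"init_clcontext:  {init_share:.1%}")
print(f"clWaitForEvents: {wait_share:.1%}")
```

Under that assumption, context initialization takes about 69.9% of the run and waiting for the kernel about 29.4%, so the remaining calls are negligible by comparison.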
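  24. How to fix the problems in the OpenCL version
      ● Find a way to avoid initializing the OpenCL context repeatedly
        ○ ccurl's solution is to run only one PoW task at a time and reuse the same context
      ● After reading the ccurl source, we believe its data structures were also designed with multi-threaded PoW tasks in mind, but when we launched several <ccurl_pow> calls concurrently in the same address space, the resulting hashes were wrong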
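The "one task at a time, same context" approach attributed to ccurl can be sketched as a small wrapper that initializes the context on first use and serializes all later calls. This is an illustration of the pattern only, not ccurl's code: `FakeClContext` and `PowService` are hypothetical names, and the placeholder context returns a dummy value instead of launching a GPU kernel.

```python
import threading


class FakeClContext:
    """Hypothetical stand-in for an initialized OpenCL context and kernel."""

    def search_nonce(self, trytes):
        # Placeholder result; a real implementation would launch the kernel here.
        return "9" * 27


class PowService:
    """Create the expensive context once, then reuse it for every PoW task."""

    def __init__(self, context_factory=FakeClContext):
        self._context_factory = context_factory
        self._context = None
        self._lock = threading.Lock()
        self.init_count = 0

    def pow(self, trytes):
        with self._lock:                   # only one PoW task runs at a time
            if self._context is None:      # pay the initialization cost once
                self._context = self._context_factory()
                self.init_count += 1
            return self._context.search_nonce(trytes)
```

Calling `pow` any number of times on one `PowService` leaves `init_count` at 1, which is the property that removes the repeated initialization cost seen in the measurements.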
  25. New IOTA PoW Library - dcurl
      ● Goal
        ○ Make PoW run as fast as possible on the given hardware
        ○ Integrate it into IRI and verify whether performance improves
      ● Our ideas
        ○ PoW tasks can be executed by multiple threads
        ○ Integrate powerful IOTA PoW implementations
  26. New IOTA PoW Library - dcurl
      ● Hardware Environment
        ○ Ubuntu 16.04
        ○ Intel(R) Xeon(R) CPU E5-2650 v4 @2.2GHz 48 cores
        ○ Nvidia Titan Xp
        ○ 94.2 GB RAM
  27. New IOTA PoW Library - dcurl
  28. New IOTA PoW Library - dcurl
      It’s important to find the respective lock
  29. Does multi-threading really bring a speedup?
      Figure axes: Frequency, Time (s)
  30. Does multi-threading really bring a speedup?
      Figure axes: Frequency, Time (s)
  31. Compare dcurl with other PoW Libraries
      Figure axes: Frequency, Time (s)
  32. Integrate dcurl into IRI
  33. Integrate dcurl into IRI
      ● Use javah to generate the header file for the C program
      $ javah com.iota.iri.hash.PearlDiver
  34. Integrate dcurl into IRI
      ● <jni.h> provides many functions for converting between Java objects and C objects, such as:
        ○ GetIntArrayElements() takes a Java int array and returns a C int array
        ○ SetIntArrayRegion() copies a C int array into a Java int array
      Further Reading: “JNI Functions” <https://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/functions.html>
      Further Reading: “Java Programming Tutorial Java Native Interface (JNI)” <https://www.ntu.edu.sg/home/ehchua/programming/java/JavaNativeInterface.html>
  35. Integrate dcurl into IRI
      ● Reminder
        ○ Give the compiler the include path of OpenJDK
        ○ Set the Java library path before launching the JVM
      ● Let's compile it!
        ○ The result is a shared library for the JVM to load
        ○ Done!
      Source code: “chenwei-tw/iri” <https://github.com/chenwei-tw/iri/tree/task/integrate_dcurl>
  36. Performance comparison between IRI and dcurl
      Figure: <attachToTangle> Performance Comparison (axes: Frequency, Time (s))
      Different Hardware Platform
      ● Intel(R) Core(™) i7-8700K Processor
      ● Nvidia GeForce GTX 1080 Ti
      ● 32 GB Memory
  37. Something in progress ...
      ● Fix the AVX implementation
      ● Let dcurl configure its environment and support multiple GPUs
      ● dcurl crashes if there is not enough GPU memory
      ● Have dcurl choose a suitable parameter set automatically
  38. Future Work
      ● Add a new interface for PearlDiver in IRI, so that everyone can load a PoW implementation suited to their hardware environment
      ● Search for other bottlenecks in IRI and try to improve them
