HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Matrix Factorization, by Huming Zhu

Presentation HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Matrix Factorization, by Huming Zhu at the AMD Developer Summit (APU13) November 11-13, 2013.

  1. 1. Complex Network Clustering Using GPU-based Parallel Non-negative Matrix Factorization. Huming Zhu, Maoguo Gong, Baolin Huang, Xidian University. zhuhum@mail.xidian.edu.cn. 2013.11
  2. 2. OpenCL course. ID: 0222277, 0242277. OpenCL programming and practice. 2011, 2012, 2013
  3. 3. Contents: 1. Complex Network Clustering with NMF; 2. Parallel Bayesian NMF on GPU; 3. Sparse BNMF on GPU; 4. Experiment; 5. Conclusion
  4. 4. Complex Network Clustering (* All pictures are from the Internet.) 12/7/13 Xidian University
  5. 5. Complex Network Clustering
     • Network clustering aims to divide a network into several communities, such that the number of edges linking nodes of the same community is higher than the number of edges joining nodes that belong to different communities.
     • Network clustering is essential for understanding how a network is organized and how it functions.
  6. 6. Non-negative Matrix Factorization (NMF)
     • The NMF problem is defined as searching for an approximation of a matrix A, with respect to some metric (e.g., a matrix norm), by factoring A into the product W × H of two reduced (lower-rank) non-negative matrices W and H.
     • NMF has been applied in many areas, such as image processing.
     • It offers strong interpretability and a close relationship to clustering methods.
     • It needs a lot of computing power.
     [1] D. D. Lee, H. S. Seung: Learning the parts of objects by non-negative matrix factorization. Nature 401, pp. 788–791 (1999).
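To make the factorization A ≈ W × H on slide 6 concrete, here is a minimal pure-Python sketch of the classic multiplicative-update rule from Lee and Seung. It is not the Bayesian variant the deck goes on to describe; the helper names (`matmul`, `transpose`, `frobenius_error`, `nmf`), the iteration count, and the toy graph are all assumptions made for illustration.

```python
import random

def matmul(X, Y):
    """Dense matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def frobenius_error(A, W, H):
    """Frobenius norm of A - W*H."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5

def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Plain multiplicative-update NMF: A (n x m) ~= W (n x k) * H (k x m)."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * p / (q + eps) for h, p, q in zip(rh, rp, rq)]
             for rh, rp, rq in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * p / (q + eps) for w, p, q in zip(rw, rp, rq)]
             for rw, rp, rq in zip(W, num, den)]
    return W, H

# Toy adjacency matrix: two triangles (nodes 0-2 and 3-5) joined by one edge.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
W, H = nmf(A, k=2)
# One common (assumed) way to read clusters off W: the largest entry per row.
labels = [max(range(2), key=lambda c: row[c]) for row in W]
print(frobenius_error(A, W, H), labels)
```

Every update multiplies by a non-negative ratio, so W and H stay non-negative without any explicit projection step. The products `matmul(Wt, A)` and `matmul(W, H)` are the dense operations that dominate the cost.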
  7. 7. Bayesian NMF
     Input: nonnegative data (observation) matrix A, fixed hyperparameters a, b.
     Output: nonnegative matrices W and H.
     Step 1: Initialize W and H to nonnegative values.
     Step 5: If converged, stop; otherwise, go to Step 2.
  8. 8. Contents: 1. Complex Network Clustering with NMF; 2. Parallel Bayesian NMF on GPU; 3. Sparse BNMF on GPU; 4. Experiment; 5. Conclusion
  9. 9. Parallel Bayesian NMF
     • P-BNMF
     • Sparse-BNMF
  10. 10. P-BNMF kernels: matrix multiplication; matrix sum of squares
  11. 11. Matrix multiplication
      • Update matrix: W*H
      • Kernel: mat_mult_AB
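Slide 11 only names the kernel (mat_mult_AB), and the CodeXL table later in the deck lists it with a 16×16 work-group. A common way to structure such a kernel is blocked (tiled) multiplication, where each block of the output is accumulated from matching tiles of the inputs. The Python sketch below shows that loop structure on the CPU; it is an assumption about the general technique, not a transcription of the authors' kernel, and `tiled_matmul` is a hypothetical name.

```python
def tiled_matmul(A, B, tile=16):
    """Blocked matrix product C = A * B; each (i0, j0) block plays the role of one work-group."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # block row of C
        for j0 in range(0, m, tile):      # block column of C
            for k0 in range(0, k, tile):  # walk tiles along the shared dimension
                for i in range(i0, min(i0 + tile, n)):
                    for kk in range(k0, min(k0 + tile, k)):
                        a = A[i][kk]
                        for j in range(j0, min(j0 + tile, m)):
                            C[i][j] += a * B[kk][j]
    return C

A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C = tiled_matmul(A, B, tile=2)
print(C)  # [[58.0, 64.0], [139.0, 154.0]]
```

On a GPU the two innermost tiles would typically be staged in local memory so that each element is fetched from global memory once per tile rather than once per multiply.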
  12. 12. Sum of squares of a matrix
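The kernel names that appear later in the deck (mat_squ_sum_row, mat_squ_sum_col) suggest that the sum of squares on slide 12 is taken per row and per column. The sketch below assumes that reading and also mimics, in plain Python, how a work-group of 64 lanes might reduce one row: strided partial sums followed by pairwise halving. The function names and the reduction layout are illustrative assumptions.

```python
def tree_reduce_sum(values, group_size=64):
    """Sum values as a work-group reduction would (group_size must be a power of two)."""
    partial = [0.0] * group_size
    for i, v in enumerate(values):
        partial[i % group_size] += v      # each lane accumulates a strided slice
    stride = group_size // 2
    while stride > 0:                     # pairwise halving, one barrier per step on a GPU
        for lane in range(stride):
            partial[lane] += partial[lane + stride]
        stride //= 2
    return partial[0]

def squared_sum_rows(M, group_size=64):
    """Sum of squared entries of each row."""
    return [tree_reduce_sum([x * x for x in row], group_size) for row in M]

def squared_sum_cols(M, group_size=64):
    """Sum of squared entries of each column."""
    return [tree_reduce_sum([x * x for x in col], group_size) for col in zip(*M)]

M = [[1, 2], [3, 4]]
print(squared_sum_rows(M))  # [5.0, 25.0]
print(squared_sum_cols(M))  # [10.0, 20.0]
```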
  13. 13. Contents: 1. Complex Network Clustering with NMF; 2. Parallel Bayesian NMF on GPU; 3. Sparse BNMF on GPU; 4. Experiment; 5. Conclusion
  14. 14. Sparse-BNMF
      • Problem: the GPU has only 1 GB of memory, which limits the network scale that P-BNMF can handle.
      • Solution: store the matrix in a sparse storage format (CSR) and present Sparse-BNMF.
  15. 15. Sparse-BNMF storage
      • CSR: Aj, Av, Ap
      • CSR by column: Aj_column, Av_column, Ap_column
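Slide 15 lists the array names but not their roles. Judging from how the kernel on slide 17 indexes them (`Ap[row]`, `Ap[row+1]`, `Aj[index]`), a reasonable reading is that Ap holds row pointers, Aj column indices, and Av the nonzero values, which is the standard CSR layout. The sketch below builds those arrays under that assumption, and treats the `_column` variant as the same layout applied to the transpose; `dense_to_csr` is a hypothetical helper.

```python
def dense_to_csr(A):
    """Return (Ap, Aj, Av): row pointers, column indices, nonzero values."""
    Ap, Aj, Av = [0], [], []
    for row in A:
        for j, v in enumerate(row):
            if v != 0:
                Aj.append(j)
                Av.append(v)
        Ap.append(len(Aj))  # row r occupies Aj[Ap[r]:Ap[r+1]]
    return Ap, Aj, Av

A = [
    [0, 1, 0],
    [1, 0, 2],
    [0, 0, 0],
]
Ap, Aj, Av = dense_to_csr(A)
print(Ap, Aj, Av)  # [0, 1, 3, 3] [1, 0, 2] [1, 1, 2]

# Column-wise variant: the same layout built from the transpose.
At = [list(col) for col in zip(*A)]
Ap_column, Aj_column, Av_column = dense_to_csr(At)
print(Ap_column, Aj_column, Av_column)  # [0, 1, 2, 3] [1, 0, 1] [1, 1, 2]
```

Storage grows with the number of edges plus the number of vertices instead of the square of the number of vertices, which is what lets a sparse network fit in 1 GB of device memory.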
  16. 16. (Figure only; no text.)
  17. 17. Pseudo-code for A_WH_csr kernel

        uint row = globalidy;
        if (row < row_num) {
            uint rowStart = Ap[row];      // start position of this row in Aj
            uint rowEnd   = Ap[row + 1];  // end position of this row
            // Position of this PE (processing element); the work-group size is 16*1.
            int index = rowStart + groupidx * 16 + localid;
            int col = Aj[index];          // column index of the nonzero handled by this PE
            int aStart = widthA * groupidy;
            int aEnd   = aStart + widthA - 1;
            int aStep  = 16;
            float Csub = 0. + 0.000001;
            int bStart = col;
            int bStep  = 16 * widthB;
            for (int a = aStart, b = bStart; a < aEnd; a += aStep, b += bStep) {
                if (rowStart + groupidx * 16 < rowEnd) {  // this group holds at least one nonzero
                    As[localid] = W[a + localid];
                    barrier(CLK_LOCAL_MEM_FENCE);
                }
                if (rowStart + groupidx * 16 + localid < rowEnd) {  // this PE corresponds to a nonzero
                    for (int k = 0; k < 16; k++) Bs[k * 16 + localid] = H[b + k * widthB];
                    for (int k = 0; k < 16; k++) Csub += Bs[k * 16 + localid] * As[k];
                }
            }
            if (rowStart + groupidx * 16 + localid < rowEnd)
                Av_result[index] = 1.0 / Csub;
        }
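Read sequentially, the pseudo-code on slide 17 appears to compute, for every stored nonzero (row, col) of A, the reciprocal of the matching entry of W*H, with a small constant (0.000001) added to avoid division by zero. The Python reference below spells out that reading for checking a GPU result on small inputs; it is an interpretation of the pseudo-code rather than the authors' implementation, and it assumes W and H are given as lists of rows.

```python
def a_wh_csr_reference(Ap, Aj, W, H, eps=1e-6):
    """For each stored nonzero (row, col) return 1 / (eps + sum_k W[row][k] * H[k][col])."""
    out = [0.0] * len(Aj)
    for row in range(len(Ap) - 1):
        for index in range(Ap[row], Ap[row + 1]):  # one GPU work-item per index
            col = Aj[index]
            acc = eps
            for k in range(len(H)):                # the kernel walks this in tiles of 16
                acc += W[row][k] * H[k][col]
            out[index] = 1.0 / acc
    return out

# 2 x 2 pattern with nonzeros at (0, 1), (1, 0) and (1, 1).
Ap, Aj = [0, 1, 3], [1, 0, 1]
W = [[1.0, 2.0], [3.0, 4.0]]
H = [[1.0, 0.0], [0.0, 1.0]]   # identity, so W*H equals W
result = a_wh_csr_reference(Ap, Aj, W, H)
print(result)
```

Only the entries of W*H that line up with nonzeros of A are ever formed, so the dense product is never materialized.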
  18. 18. (Figure only; no text.)
  19. 19. Contents: 1. Complex Network Clustering with NMF; 2. Parallel Bayesian NMF on GPU; 3. Sparse BNMF on GPU; 4. Experiment; 5. Conclusion
  20. 20. Machine
      Host
        Product Name:         HP xw9400 workstation
        OS:                   Windows 7 x64 Edition
        CPU:                  4× Dual-Core AMD Opteron 2220, 2.80 GHz
        Memory:               32 GB
      Device
        Product Name:         AMD Radeon HD 7770
        Engine Speed:         1000 MHz
        Processing Elements:  640
        Memory:               1 GB GDDR5
        Memory Bandwidth:     72 GB/s
        PCI:                  PCI Express® 3.0 x16
      • AMD Accelerated Parallel Processing (APP) SDK v2.7, OpenCL 1.2
      • Microsoft Visual Studio 2010
  21. 21. Evaluation: Modularity (Q) [1]

      $$Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(C_i, C_j)$$

      A higher Q indicates a better network structure.

      Synthetic networks
        Data        Vertex   Edges    Q
        Benchmark   128      1024     0.450
        LFR         500      5135     0.813
        LFR         1000     9582     0.904
        LFR         5000     38007    0.908
        LFR         10000    148470   0.860
        LFR         50000    748337   0.900

      Real-world networks
        Data        Vertex   Edges    Q
        Facebook    324      4436     0.620
        Email       1133     5451     0.531
        Netscience  1461     2742     0.905
        Power       4941     6594     0.599
        Scientists  6650     59870    0.647
        Hep         7610     15751    0.772

      [1] M. E. J. Newman, M. Girvan: Finding and evaluating community structure in networks. Phys. Rev. E 69 (2), 026113 (2004).
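The modularity formula on slide 21 can be evaluated directly from an adjacency matrix and a community label per node. In the standard Newman-Girvan definition, A_ij is the adjacency matrix, k_i the degree of node i, m the number of edges, and δ(C_i, C_j) is 1 when nodes i and j share a community and 0 otherwise. The sketch below is a straightforward O(n²) evaluation on a toy graph; the function name and the example are illustrative.

```python
def modularity(adj, labels):
    """Q = (1 / 2m) * sum_ij (A_ij - k_i * k_j / 2m) * delta(C_i, C_j)."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)               # each undirected edge is counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

# Two triangles (nodes 0-2 and 3-5) joined by the edge 2-3; m = 7.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
q_split = modularity(adj, [0, 0, 0, 1, 1, 1])  # 5/14, about 0.357
q_single = modularity(adj, [0] * 6)            # 0 (up to rounding) for one big community
print(q_split, q_single)
```

Splitting the graph along the bridge edge scores higher than lumping every node together, which matches the "higher Q is better" reading on the slide.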
  22. 22. Network demo: Netscience (part) and Facebook
      • The Netscience network is a co-authorship network of scientists working on network theory and experiment.
  23. 23. Speedup

        Data        Vertex   K     BNMF(s)    P-BNMF(s)   Sparse-BNMF(s)   P-Ratio   Sparse-Ratio
        Benchmark   128      64    4.165      0.166       0.226            4.37      3.1
        LFR         500      128   109.9      0.823       1.096            67.63     51.35
        LFR         1000     128   712.5      2.98        2.798            187.58    181.6
        LFR         5000     128   31031.5    109.96      71.167           279.39    417.21
        LFR         10000    128   186321.7   615.09      334.23           302.92    556.2
        LFR         50000    128   *          *           8250.28          *         *
        Facebook    324      128   46.25      1.328       1.656            34.82     27.93
        Email       1133     128   774.4      3.901       3.042            162.24    189.33
        Netscience  1461     128   1253.2     6.725       4.628            166.11    215.81
        Power       4941     128   26202.4    108.30      61.787           239.29    404.38
        Hep         7610     128   76827.2    271.28      152.66           281.75    491.85
        Scientists  6650     128   63254.5    208.2       125.55           303.81    503.84

      K is the number of clusters; BNMF(s) is the serial time; P-Ratio is the speedup of P-BNMF over BNMF; Sparse-Ratio is the speedup of Sparse-BNMF over BNMF.
  24. 24. Speedup
      • Network: Netscience
      • Cluster number K: 64 to 256
      • Sparse-BNMF achieves the better speedup.
  25. 25. Using CodeXL to analyze OpenCL kernels on AMD GPUs
  26. 26. Kernel information provided by CodeXL

      Table 1. P-BNMF kernels
        Method            GlobalWorkSize   WorkGroupSize   Time
        Update_H          {1472 128 1}     {16 16 1}       6.12726
        mat_mult_AB       {1472 1472 1}    {16 16 1}       10.73615
        mat_dot_div       {1472 1472 1}    {16 16 1}       3.70267
        mat_mult_AtB      {1472 128 1}     {16 16 1}       9.72355
        mat_dot_mult      {1472 128 1}     {16 16 1}       0.30133
        mat_squ_sum_row   {1472 128 1}     {64 1 1}        0.55304
        mat_squ_sum_col   {128 1472 1}     {1 64 1}        7.27985
        update_invbeta    {128 1 1}        {4 1 1}         0.03763
        Update_W          {128 1472 1}     {16 16 1}       6.25437
        mat_mult_AB       {1472 1472 1}    {16 16 1}       10.75037
        mat_dot_div       {1472 1472 1}    {16 16 1}       3.64148
        mat_mult_ABt      {128 1472 1}     {16 16 1}       9.04222
        mat_dot_mult      {128 1472 1}     {16 16 1}       0.27763

      Table 2. Sparse-BNMF kernels
        Method            GlobalWorkSize   WorkGroupSize   Time
        Update_H          {1472 128 1}     {16 16 1}       6.11407
        A_WH_csr_col      {1472 1472 1}    {1 16 1}        7.76119
        mat_mult_A_s_col  {1461 2048 1}    {1 16 1}        5.36341
        mat_dot_mult      {1472 128 1}     {16 16 1}       0.2917
        mat_squ_sum_row   {1472 128 1}     {64 1 1}        0.5483
        mat_squ_sum_col   {128 1472 1}     {1 64 1}        6.99467
        update_invbeta    {128 1 1}        {4 1 1}         0.03748
        Update_W          {128 1472 1}     {16 16 1}       6.17718
        A_WH_csr          {1472 1472 1}    {16 1 1}        6.29185
        mat_mult_s_Bt     {2048 1461 1}    {16 1 1}        5.37615
        mat_dot_mult      {128 1472 1}     {16 16 1}       0.2843

      • Table 1, kernels in bold: W*H, dot multiply, AtB.
      • Table 2, sparse kernels: A_WH_csr_col and mat_mult_A_s_col.
      • CSR is better.
  27. 27. P-BNMF vs. Sparse-BNMF

                  P-BNMF            Sparse-BNMF
        Size      small (<10000)    big
        Speedup   low               high

      • The Sparse-BNMF algorithm effectively solves the memory-limit problem, which enables it to deal with larger-scale networks.
  28. 28. Contents: 1. Complex Network Clustering with NMF; 2. Parallel Bayesian NMF on GPU; 3. Sparse BNMF on GPU; 4. Experiment; 5. Conclusion
  29. 29. Our work
      • We present P-BNMF and Sparse-BNMF:
        • P-BNMF;
        • Sparse-BNMF, using CSR storage;
      • Speedup.
      Future work
      • Portability.
  30. 30. Thank You!
