8 PC Setup for Parallel Computing

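Takeshi Takaishi, Hiroshima Kokusai Gakuin University
MMDS Introductory Seminar on Numerical Simulation
FreeFem++ Finite Element Software Workshop: Intermediate Course
Center for Mathematical Modeling and Data Science, Osaka University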

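Today's Goals
1. Build a FreeFem++ parallel computing environment on ubuntu / Windows10
a. LEVEL 1: run parallel solvers with MPI
b. LEVEL 2: make hpddm available
i. also get the samples under examples++-hpddm working
2. Writing scripts for parallel computing
a. Matrix formulation: problem => varf + matrix
b. Handling the right-hand side of the linear system
c. Using parallel solvers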

1. Installation for Parallel Computing
● ubuntu on Windows10 64bit: start from 1.1
● Native ubuntu: start from 1.2
※Verified on ubuntu 16.04 LTS

ubuntu on Windows10 only
1.1 Preparing ubuntu on Windows10 64bit

Install the Windows Subsystem for Linux (WSL) and ubuntu
1. Programs and Features -> Turn Windows features on or off
1. Enable Windows Subsystem for Linux (Beta)
2. Restart
2. Download ubuntu from the Microsoft Store (installation takes a while)
1. Enter a username and password
3. Run "ubuntu" to open a terminal
$ sudo dpkg-reconfigure tzdata # fix the time zone
$ sudo apt-get update
$ sudo apt-get upgrade
$ export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin # drop Windows paths containing spaces
$ echo "export DISPLAY=localhost:0.0" >> ~/.bashrc # display for the X server

For ubuntu on Windows10 / ubuntu
1.2 Installing FreeFem++

Part 1: Installing packages and obtaining FreeFem++
Install the required packages
$ sudo apt-get install build-essential gfortran gnuplot openmpi-bin libopenmpi-dev libatlas-dev m4 bison flex freeglut3-dev libopenblas-dev
$ sudo apt-get remove liblapack3
$ sudo apt-get install git cmake python libfftw3-bin libfftw3-dev
Obtain FreeFem++
$ wget http://www.freefem.org/ff++/ftp/freefem++-3.58.tar.gz
$ tar zxvf freefem++-3.58.tar.gz
$ cd freefem++-3.58
$ ./configure --enable-download

Part 2: Building FreeFem++
LEVEL 1:
$ download/getall -a # download all required files in advance
LEVEL 2 (with hpddm):
$ cd download/ff-petsc
$ make petsc-slepc SUDO=sudo # takes over 2 hours on a notebook PC!!
$ cd ../..
$ ./reconfigure
$ download/getall -a # download all required files in advance
Both levels:
$ make # just under an hour on a notebook PC
$ sudo make install

Part 3: Examples
● DDM-Schwarz-Lame-3d.edp (MPI)
$ cd examples++-mpi
$ ff-mpirun -np 4 DDM-Schwarz-Lame-3d.edp
● LEVEL 2 only: diffusion-2d-PETSc.edp (hpddm)
$ cd examples++-hpddm
$ ff-mpirun -np 4 diffusion-2d-PETSc.edp

Notes
1. If /usr/local/ff-petsc exists, delete it first
2. On ubuntu on Windows10, reconfigure fails if PATH contains spaces; reset it with
$ export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
3. $ sudo apt-get remove liblapack3 # may not be necessary

[ Options for ubuntu on Windows10 ]
1. Install an X server
a. VcXsrv (https://sourceforge.net/projects/vcxsrv/)
2. Copy and paste with the mouse
a. Right-click the window title bar
b. Choose Edit from the menu

2. Parallel Computing with FreeFem++

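Parallel computing and MPI (Message Passing Interface)
Benefits of parallel computing
1. Large-scale numerical computation
2. Possible speedup
MPI: a standardized specification for parallel computing
1. Defines how data are sent and received between CPUs (processes)
2. FreeFem++ implements several MPI commands
3. Frequently used variables:
a. mpisize: the total number of processes
b. mpirank: the id number of the current process, in (0, ..., mpisize-1)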
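Every MPI process runs the same script; mpisize and mpirank are what let each process pick its own share of the work. A minimal sketch in plain Python: no MPI is involved (the loop over mpirank only simulates the processes), and the helper local_range with its block-splitting rule is our own illustration, not part of FreeFem++.

```python
def local_range(n, rank, size):
    """Half-open index range [start, end) of n items owned by process `rank`.

    Items are split into contiguous blocks; the first n % size ranks get one extra item.
    """
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


if __name__ == "__main__":
    n, mpisize = 10, 4
    # Each loop iteration stands in for one MPI process with its own mpirank.
    for mpirank in range(mpisize):
        print(f"mpirank={mpirank}: items {local_range(n, mpirank, mpisize)}")
```

With n = 10 and mpisize = 4 this prints the ranges (0, 3), (3, 6), (6, 8) and (8, 10): every item is owned by exactly one process.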

Parallel solver and DDM
[Figure: two ways of splitting the computation of u = A^-1 * b into sub-matrices: a parallel solver and DDM]

Parallel Solver
Replace the matrix solver with a parallel one
1. Direct solvers
a. MUMPS (MUltifrontal Massively Parallel sparse direct Solver)
b. SuperLU (Supernodal LU)
2. Krylov (iterative) solvers
a. pARMS (parallel Algebraic Recursive Multilevel Solvers)
b. HIPS (Hierarchical Iterative Parallel Solver)
c. HYPRE (parallel solvers for sparse linear systems featuring multigrid methods)
● Pro: the program stays almost unchanged
● Con: most of the main program is not parallelized
[Figure: only the matrix computation runs in parallel; the main program does not]
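How much "only the matrix computation" limits the gain can be estimated with Amdahl's law, a standard rule of thumb that is not part of the original slides. If a fraction p of the run time is spent in the linear solver and only that part is spread over N processes (assuming the solver itself scales ideally), the speedup is at most

```latex
S(N) = \frac{1}{(1 - p) + p / N} \le \frac{1}{1 - p}
```

For example, with p = 0.8 and N = 4, S = 1 / (0.2 + 0.2) = 2.5, and no number of processes can push it past 1 / 0.2 = 5.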

Name | Function | Library (load) | Default set for real | Default set for complex | Type | Notes
MUMPS (MUltifrontal Massively Parallel Solver) | defaulttoMUMPS() | MUMPS_FreeFem | mumps | mumps | direct | a direct method based on a multifrontal approach
SuperLU distributed | realdefaulttoSuperLUDist() | real_SuperLU_DIST_FreeFem | SuperLU_DIST | previous solver | direct | LU factorization
SuperLU distributed | complexdefaulttoSuperLUDist() | complex_SuperLU_DIST_FreeFem | previous solver | SuperLU_DIST | direct | LU factorization
Pastix (Parallel Sparse matrix package) | realdefaulttopastix() | real_pastix_FreeFem | pastix | previous solver | direct | direct and block ILU(k) iterative methods
Pastix (Parallel Sparse matrix package) | complexdefaulttopastix() | complex_pastix_FreeFem | previous solver | pastix | direct | direct and block ILU(k) iterative methods
HIPS (Hierarchical Iterative Parallel Solver) | defaulttohips() | hips_FreeFem | hips | previous solver | iterative/direct | multilevel ILU
HYPRE (High Level Preconditioner) | defaulttohypre() | hypre_FreeFem | hypre | previous solver | iterative | AMG (Algebraic MultiGrid) and Parasails (Parallel Sparse Approximate Inverse)
pARMS (parallel Algebraic Multilevel Solver) | defaulttoparms() | parms_FreeFem | parms | previous solver | iterative | RAS (Restricted Additive Schwarz) and Schur complement type preconditioner

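DDM (Domain Decomposition Method)
Split the domain and run the computation on the pieces
1. hpddm (high-performance unified framework for domain decomposition methods)
2. PETSc (Portable, Extensible Toolkit for Scientific Computation)
● Pro: almost everything runs in parallel
● Con: the program must be rewritten
[Figure: each process handles its own share of the computational domain, main program included]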

3. Parallel Computation of a Reaction-Diffusion Model
Let's parallelize the activator-inhibitor reaction-diffusion model that appears in the FreeFem++ book

Formulation with problem
fespace Vh(Th,Pk);
Vh u,uu,uold;
Vh v,vv,vold;
// Definition of weak-form of RD eqs. (init=0: the matrix is rebuilt every time RD1/RD2 is solved)
problem RD1 (u,uu,solver=CG,init=0)
= int2d(Th)(u * uu + dt * du * (grad(u)' * grad(uu)))
- int2d(Th)(uold * uu + dt * f(uold, vold) * uu);
problem RD2 (v,vv,solver=CG,init=0)
= int2d(Th)(v * vv + dt * dv * (grad(v)' * grad(vv)))
- int2d(Th)(vold * vv + dt * g(uold, vold) * vv);
// Main loop
for (it=1;it<=itmax;it++)
{
t=dt*it;
real cpu0=clock();
uold=u; vold=v;
RD1; RD2;
real cpu1=clock()-cpu0;
cpuTotal += cpu1;
…
}

Formulation with matrices (varf + matrix)
fespace Vh(Th,Pk);
Vh u,uu,uold;
Vh v,vv,vold;
Vh b1, b2; // RHS
// Definition of weak-form of RD eqs.
varf RD1(u,uu)
= int2d(Th)(u * uu + dt * du * (grad(u)' * grad(uu)));
varf RHS1(unused,uu) = int2d(Th)(uold * uu + dt * f(uold, vold) * uu);
varf RD2(v,vv)
= int2d(Th)(v * vv + dt * dv * (grad(v)' * grad(vv)));
varf RHS2(unused,vv) = int2d(Th)(vold * vv + dt * g(uold, vold) * vv);
matrix a1=RD1(Vh,Vh,solver=CG,init=0);
matrix a2=RD2(Vh,Vh,solver=CG,init=0);
/// Main loop
for (it=1;it<=itmax;it++)
{
t=dt*it;
real cpu0=clock();
uold=u; vold=v;
b1[] = RHS1(0,Vh);
b2[] = RHS2(0,Vh);
u[] = a1^-1*b1[];
v[] = a2^-1*b2[];
real cpu1=clock()-cpu0;
cpuTotal += cpu1;
…
}
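The gain of the matrix version comes from doing the expensive part once. The matrices a1 and a2 do not depend on the time step, so they are assembled (and, with a direct solver, factorized) before the loop, and only the right-hand sides b1, b2 are rebuilt each step. The sketch below shows the same pattern in plain Python for a 1D finite-difference analogue of RD1 with a single species. The grid, the zero-flux boundary treatment and the reaction term reaction(u) = u(1 - u) are hypothetical stand-ins, not the model from the slides.

```python
def build_matrix(n, dt, d, h):
    """Tridiagonal I + dt*d*L for 1D diffusion on n grid points with zero-flux ends.

    Returned as (a, b, c): sub-diagonal, diagonal, super-diagonal (a[0], c[-1] unused).
    """
    r = dt * d / (h * h)
    a = [-r] * n
    b = [1.0 + 2.0 * r] * n
    c = [-r] * n
    # Zero-flux ends via a mirrored ghost node: each end row couples to its
    # single neighbour with weight 2r.
    c[0] = -2.0 * r
    a[-1] = -2.0 * r
    a[0] = 0.0
    c[-1] = 0.0
    return a, b, c


def factorize(a, b, c):
    """Thomas-algorithm LU factorization of the tridiagonal matrix (a, b, c)."""
    n = len(b)
    piv = [0.0] * n  # pivots (diagonal of U)
    cp = [0.0] * n   # normalized super-diagonal
    piv[0] = b[0]
    cp[0] = c[0] / piv[0]
    for i in range(1, n):
        piv[i] = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / piv[i]
    return a, piv, cp


def solve(factors, rhs):
    """Forward and backward substitution reusing a stored factorization."""
    a, piv, cp = factors
    n = len(rhs)
    y = [0.0] * n
    y[0] = rhs[0] / piv[0]
    for i in range(1, n):
        y[i] = (rhs[i] - a[i] * y[i - 1]) / piv[i]
    x = [0.0] * n
    x[-1] = y[-1]
    for i in range(n - 2, -1, -1):
        x[i] = y[i] - cp[i] * x[i + 1]
    return x


def reaction(u):
    # Hypothetical reaction term standing in for f(uold, vold).
    return u * (1.0 - u)


def run(n=50, dt=0.01, d=0.1, steps=100, reuse=True):
    h = 1.0 / (n - 1)
    u = [1.0 if i < n // 4 else 0.0 for i in range(n)]
    factors = factorize(*build_matrix(n, dt, d, h))   # like "matrix a1 = RD1(Vh,Vh)": once
    for _ in range(steps):
        rhs = [ui + dt * reaction(ui) for ui in u]     # like "b1[] = RHS1(0,Vh)": every step
        if not reuse:
            # like "problem ... init=0": rebuild and refactor the matrix every step
            factors = factorize(*build_matrix(n, dt, d, h))
        u = solve(factors, rhs)
    return u


if __name__ == "__main__":
    print(run(reuse=True) == run(reuse=False))
```

Both variants produce the same numbers; the reuse=True path simply skips the per-step assembly and factorization, which is the cost the matrix version of the .edp script avoids.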

Handling the right-hand side
problem RD1 (u,uu,solver=CG,init=0)
= int2d(Th)(u * uu + dt * du * (grad(u)' * grad(uu)))
- int2d(Th)(uold * uu + dt * f(uold, vold) * uu);
->
varf RD1(u,uu)
= int2d(Th)(u * uu + dt * du * (grad(u)' * grad(uu)));
varf RHS1(unused,uu) = int2d(Th)(uold * uu + dt * f(uold, vold) * uu);
b1[] = RHS1(0,Vh);
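Written out, RD1 is one backward-Euler step for u. With Δt = dt, d_u = du, u^n = uold, v^n = vold and basis functions φ_i of Vh, the problem form asks for u^{n+1} in Vh such that

```latex
\int_\Omega \big( u^{n+1}\varphi + \Delta t\, d_u \nabla u^{n+1}\cdot\nabla\varphi \big)\,dx
  = \int_\Omega \big( u^{n} + \Delta t\, f(u^{n}, v^{n}) \big)\,\varphi\,dx
  \qquad \forall \varphi \in V_h .
```

Expanding u^{n+1} = Σ_j U_j φ_j gives the linear system A_1 U = b_1 with

```latex
(A_1)_{ij} = \int_\Omega \big( \varphi_j \varphi_i + \Delta t\, d_u \nabla\varphi_j\cdot\nabla\varphi_i \big)\,dx,
\qquad
(b_1)_i = \int_\Omega \big( u^{n} + \Delta t\, f(u^{n}, v^{n}) \big)\,\varphi_i\,dx .
```

A_1 is the varf RD1 and does not change between steps; b_1 is RHS1, which depends on uold and vold and must be rebuilt every step. The minus sign in front of the second int2d in the problem version disappears in RHS1 because problem writes everything as "bilinear part minus linear part = 0", while b1[] holds the linear part itself.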

Problem 1
Build the matrix version and compare the speed of the problem version and the matrix version
$ FreeFem++ RD2-ai-bs-prob.edp
it = 1000, t = 1, Averaged Time=XXXXX(ms)
$ FreeFem++ RD2-ai-bs-matrix.edp

load "MUMPS_FreeFem"
real ttgv=1e10; string ssparams="nprow=1, npcol="+mpisize; // MUMPS
verbosity=0; // message level
...
matrix a1=RD1(Vh,Vh,tgv=ttgv);
matrix a2=RD2(Vh,Vh,tgv=ttgv);
set(a1,solver=sparsesolver,sparams=ssparams);
set(a2,solver=sparsesolver,sparams=ssparams);

Problem 2
Build the MUMPS version and measure its speed
$ ff-mpirun -np 4 RD2-ai-bs-MUMPS.edp -glut ffglut
※ "-np 4" => 4 processes: try changing it!

Problem 3
Change the solver and measure the speed
1. SuperLU_DIST
load "real_SuperLU_DIST_FreeFem"
real ttgv=-1;string ssparams="nprow=1, npcol="+mpisize; // SuperLU_DIST
$ ff-mpirun -np 4 RD2-ai-bs-SuperLU_DIST.edp -glut ffglut
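The ttgv value controls how Dirichlet conditions written with on(...) are imposed; the RD1/RD2 varfs shown here have no on() term, so it only matters once one is added. As we understand FreeFem++'s convention, a positive tgv puts tgv on the diagonal and tgv*g in the right-hand side (a penalty), while tgv=-1 replaces the row by a unit row and sets the right-hand side to g. The plain-Python sketch below compares the two on a small system; the 5x5 matrix, the boundary value g and the helper names are hypothetical.

```python
def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix, copied
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def impose_dirichlet(A, b, i, g, tgv):
    """Return modified copies of (A, b) that enforce x[i] = g.

    tgv > 0  : penalty, A[i][i] = tgv and b[i] = tgv * g (other row entries kept).
    tgv == -1: exact, row i becomes the unit row and b[i] = g.
    """
    A = [row[:] for row in A]
    b = list(b)
    if tgv > 0:
        A[i][i] = tgv
        b[i] = tgv * g
    elif tgv == -1:
        A[i] = [0.0] * len(b)
        A[i][i] = 1.0
        b[i] = g
    else:
        raise ValueError("this sketch only covers tgv > 0 and tgv == -1")
    return A, b


if __name__ == "__main__":
    n = 5
    A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
         for i in range(n)]
    b = [1.0] * n
    x_pen = gauss_solve(*impose_dirichlet(A, b, 0, 1.0, 1e10))
    x_ex = gauss_solve(*impose_dirichlet(A, b, 0, 1.0, -1))
    print("penalty:", x_pen)
    print("exact:  ", x_ex)
```

The penalty solution matches the exact one up to a relative error of roughly 1/tgv, which is why a large value such as 1e10 is used when the penalty form is chosen.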

Trying DDM: PETSc version
Splitting the domain (claiming territory) feels kind of cool

Trying DDM: PETSc version (DDM-1) mesh generation
int[int] arrayIntersection; // ranks of neighboring subdomains
int[int][int] restrictionIntersection(0); // local-to-neighbors renumbering
real[int] D; // partition of unity
{
meshN ThGlobal = cube(10 * getARGV("-global", 5), getARGV("-global", 5),
getARGV("-global", 5), [10 * x, y, z], label = LL); // generate the global mesh
build(Th, ThBorder, ThGlobal, fakeInterface, s, overlap, D, arrayIntersection,
restrictionIntersection, Wh, Pk, comm, excluded); // generate the local mesh
}
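In the build call above, D holds the partition of unity: each process stores its own overlapping piece of the vector, and D weights the overlapping entries so that the weighted pieces add back up to the global vector, that is Σ_i R_i^T D_i R_i = I. A plain-Python sketch with two hypothetical 1D subdomains; the multiplicity-based weights (1 / number of owners) are one simple choice and not necessarily the one build computes.

```python
def partition_of_unity(n, subdomains):
    """Weights D_i with sum_i R_i^T D_i R_i = I, using 1 / (number of owners) per dof."""
    owners = [0] * n
    for sub in subdomains:
        for g in sub:
            owners[g] += 1
    return [[1.0 / owners[g] for g in sub] for sub in subdomains]


def restrict(u, sub):
    """R_i u: pick the entries of the global vector u that live on subdomain `sub`."""
    return [u[g] for g in sub]


def glue(n, subdomains, D, local_vectors):
    """sum_i R_i^T (D_i * local_i): weight each local vector and add it back globally."""
    out = [0.0] * n
    for sub, d, loc in zip(subdomains, D, local_vectors):
        for k, g in enumerate(sub):
            out[g] += d[k] * loc[k]
    return out


if __name__ == "__main__":
    n = 10
    subdomains = [list(range(0, 6)), list(range(4, 10))]  # overlap on dofs 4 and 5
    D = partition_of_unity(n, subdomains)
    u = [float(i * i) for i in range(n)]
    pieces = [restrict(u, sub) for sub in subdomains]     # what each process stores
    print(glue(n, subdomains, D, pieces) == u)            # True
```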

(DDM-2) Further option settings
set(A, sparams = "-hpddm_schwarz_method ras -hpddm_schwarz_coarse_correction balanced -hpddm_variant right -hpddm_verbosity 1 -hpddm_geneo_nu 10");
//
// -hpddm_schwarz_method : ras / oras / soras / asm / osm / none
// -hpddm_schwarz_coarse_correction : deflated / additive / balanced
// -hpddm_variant : left / right / flexible preconditioning
// -hpddm_verbosity : verbosity level of HPDDM
// -hpddm_geneo_nu : the number of coarse degrees of freedom per subdomain of the GenEO coarse space
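The option -hpddm_schwarz_method ras selects the restricted additive Schwarz (RAS) method. To show what "restricted" means, here is a one-level RAS iteration in plain Python for the 1D Poisson matrix tridiag(-1, 2, -1): each hypothetical subdomain solves its own overlapping local Dirichlet problem, but only the entries it owns are kept (a boolean D_i). HPDDM uses RAS as a preconditioner inside a Krylov method and can add a coarse space (the GenEO options above); this sketch instead uses it as a plain stationary iteration with no coarse space, so it only illustrates the idea.

```python
def apply_A(u):
    """y = A u for A = tridiag(-1, 2, -1) (1D Poisson, zero Dirichlet ends)."""
    n = len(u)
    return [2.0 * u[i]
            - (u[i - 1] if i > 0 else 0.0)
            - (u[i + 1] if i < n - 1 else 0.0) for i in range(n)]


def thomas(d):
    """Solve tridiag(-1, 2, -1) x = d, i.e. a local Dirichlet problem A_i x = d."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = -0.5, d[0] / 2.0
    for i in range(1, n):
        m = 2.0 + cp[i - 1]               # b_i - a_i * c'_{i-1} with a_i = -1
        cp[i] = -1.0 / m
        dp[i] = (d[i] + dp[i - 1]) / m    # (d_i - a_i * d'_{i-1}) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def ras_solve(f, n_sub, overlap, iters):
    """Restricted additive Schwarz as a stationary iteration (no coarse space)."""
    n = len(f)
    size = n // n_sub
    owned = [(k * size, n if k == n_sub - 1 else (k + 1) * size) for k in range(n_sub)]
    subs = [range(max(0, lo - overlap), min(n, hi + overlap)) for lo, hi in owned]
    u = [0.0] * n
    history = []
    for _ in range(iters):
        r = [fi - yi for fi, yi in zip(f, apply_A(u))]   # global residual
        history.append(max(abs(ri) for ri in r))
        du = [0.0] * n
        for (lo, hi), sub in zip(owned, subs):           # one pass per subdomain / process
            local = thomas([r[g] for g in sub])           # A_i^{-1} R_i r
            for k, g in enumerate(sub):
                if lo <= g < hi:                         # restricted: keep owned rows only
                    du[g] += local[k]
        u = [ui + di for ui, di in zip(u, du)]
    return u, history


if __name__ == "__main__":
    f = [1.0] * 40
    u, history = ras_solve(f, n_sub=4, overlap=2, iters=200)
    print(history[0], history[-1])
```

Without a coarse space, information crosses only one subdomain per iteration, so convergence slows down as the number of subdomains grows; this is what the coarse correction options are meant to address.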

(DDM-3)
// using the partition information of matrix A, compute b = M * u[]
dmv(A, M, u[], b); // distributed matrix vector product
// plot from distributed data (custom macro)
plotDDM1(Th, u[], Pk, def, real,
{cmm="u t="+t,value=true,fill=1,wait=0,ps="RD2-ai-bs_u"+it+".eps"});
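The definition of dmv is not shown in these slides. One common way to compute y = A u with overlapping subdomains is: each process multiplies its local matrix by its local piece of u, keeps only the rows it owns, and the owned pieces together form the global product. A plain-Python sketch for the 1D matrix tridiag(-1, 2, -1) with hypothetical subdomains; it is exact only when the overlap is at least one node, because an owned row needs all of its neighbors locally.

```python
def global_matvec(u):
    """Reference product y = A u with A = tridiag(-1, 2, -1)."""
    n = len(u)
    return [2.0 * u[i]
            - (u[i - 1] if i > 0 else 0.0)
            - (u[i + 1] if i < n - 1 else 0.0) for i in range(n)]


def distributed_matvec(u, n_sub, overlap):
    """y = A u computed subdomain by subdomain, keeping only owned rows."""
    n = len(u)
    size = n // n_sub
    y = [0.0] * n
    for k in range(n_sub):                    # one iteration per (simulated) process
        lo, hi = k * size, (n if k == n_sub - 1 else (k + 1) * size)
        sub = list(range(max(0, lo - overlap), min(n, hi + overlap)))
        # Local matrix A_i = R_i A R_i^T times the local vector R_i u.
        # In an MPI code, the overlap entries of R_i u come from neighboring processes.
        y_local = global_matvec([u[g] for g in sub])
        for j, g in enumerate(sub):
            if lo <= g < hi:                  # boolean partition of unity: owned rows only
                y[g] = y_local[j]
    return y


if __name__ == "__main__":
    u = [float(i * i) for i in range(20)]
    print(distributed_matvec(u, 4, 1) == global_matvec(u))  # True: overlap 1 suffices
    print(distributed_matvec(u, 4, 0) == global_matvec(u))  # False: block-edge rows miss a neighbor
```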

Problem 4
Measure the speed of the PETSc version
$ ff-mpirun -np 4 RD2-ai-bs-PETSc.edp

Final exercise
Write a 3D version of the code
1. Mesh generation
load "msh3"
mesh Th2=square(20,20); mesh3 Th=buildlayers(Th2,20);
2. Gradient: grad(u) becomes [dx(u), dy(u), dz(u)]
3. Integrals: int2d -> int3d
4. Initial value: u=1.0 * bool((x*x+y*y+z*z) < r0*r0); // if(r<r0) u=1; else u=0
