Some things you need to know
Jongsu Kim
Fortran
Fortran….
• Still using Fortran 77, 90, or 95?
• Fortran 2003 and 2008 are already here, and Fortran 2015 is on the way.
• Some older features will be deleted or declared obsolescent.
• We are using Fortran the wrong way.
What you shouldn’t use
Labeled Do Loops
do 100 ii=istart,ilast,istep
  isum = isum + ii
100 continue
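EQUIVALENCE
Specifies that two or more objects in a scoping unit share the same storage units
character (len=3) :: C(2)
character (len=4) :: A,B
equivalence (A,C(1)), (B,C(2))
[Figure: seven character storage units, numbered 1 to 7. A occupies units 1-4, B occupies units 4-7, C(1) occupies units 1-3, and C(2) occupies units 4-6.]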
COMMON
Blocks of physical storage accessed by any of
the scoping units in a program
COMMON /BLOCKA/ A,B,C(10,30)
COMMON I, J, K
ENTRY
Extra entry points (subroutine-like things) inside a subroutine
FIXED FORM SOURCE
Fortran 77 style (statements confined to columns 7-72 of an 80-column line)
CHARACTER* form
Replaced with CHARACTER(LEN=?)
NON-BLOCK DO CONSTRUCT
A DO loop whose range does not end in a CONTINUE or END DO
What you shouldn’t use
Labeled Do Loops
Labels are unnecessary, and it is hard to remember what each number means. Moreover, we have the END DO and CYCLE statements.
EQUIVALENCE
EQUIVALENCE is also error-prone. It is hard to keep track of every storage location that each variable shares.
Since COMMON and EQUIVALENCE are discouraged, BLOCK DATA (which exists to initialize COMMON blocks) should be avoided too.
COMMON
Sharing lots of variables across the whole program is dangerous and error-prone.
ENTRY
It complicates the program, and modules and subroutines make it unnecessary.
NON-BLOCK DO CONSTRUCT
It is hard to see where the DO loop ends.
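A minimal sketch of the modern replacements for the features listed above. The module name shared_data and the variable n_calls are hypothetical (they are not from the slides): a module takes the place of a COMMON block, a block DO with END DO replaces the labeled loop, and CHARACTER(LEN=4) replaces CHARACTER*4.

```fortran
! Hypothetical module standing in for a COMMON block
module shared_data
  implicit none
  integer :: n_calls = 0            ! shared state, visible only where the module is used
end module shared_data

program modern_style
  use shared_data, only: n_calls
  implicit none
  integer, parameter :: istart = 1, ilast = 9, istep = 2
  integer :: ii, isum
  character(len=4) :: tag           ! instead of CHARACTER*4

  isum = 0
  do ii = istart, ilast, istep      ! block DO: no label, closed by END DO
    isum = isum + ii
  end do

  n_calls = n_calls + 1
  tag = 'done'
  print *, tag, isum, n_calls

  ! 1 + 3 + 5 + 7 + 9 = 25
  if (isum /= 25) error stop 'unexpected sum'
end program modern_style
```

Because the shared variable lives in a module, the compiler can check its type at every use, which a COMMON block cannot offer.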
What you might want to use – CYCLE , EXIT
• Avoid the GOTO statement
• Use the CYCLE or EXIT statement instead
• CYCLE : skip the rest of the current iteration and continue with the next one
• EXIT : leave the loop
do i=1, 100
  x = real(i)
  y = sin(x)
  if (i == 20) exit
  z = cos(x)
enddo
With EXIT: 19 iterations complete in full. In the 20th iteration y = sin(x) is executed, then the loop exits.
do i=1, 100
  x = real(i)
  y = sin(x)
  if (i == 20) cycle
  z = cos(x)
enddo
With CYCLE: all 100 iterations run, but at i=20, z = cos(x) is not executed.
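The difference can be checked by counting how often each line runs. This is only a sketch: the counters n_sin and n_cos are additions for illustration and are not part of the slide code.

```fortran
program exit_vs_cycle
  implicit none
  integer :: i, n_sin, n_cos
  real :: x, y, z

  ! EXIT: leave the loop during the 20th iteration
  n_sin = 0
  n_cos = 0
  do i = 1, 100
    x = real(i)
    y = sin(x)
    n_sin = n_sin + 1
    if (i == 20) exit
    z = cos(x)
    n_cos = n_cos + 1
  end do
  print *, 'exit : sin evaluated', n_sin, 'times, cos', n_cos, 'times'
  if (n_sin /= 20 .or. n_cos /= 19) error stop 'unexpected EXIT counts'

  ! CYCLE: skip only the remainder of the 20th iteration
  n_sin = 0
  n_cos = 0
  do i = 1, 100
    x = real(i)
    y = sin(x)
    n_sin = n_sin + 1
    if (i == 20) cycle
    z = cos(x)
    n_cos = n_cos + 1
  end do
  print *, 'cycle: sin evaluated', n_sin, 'times, cos', n_cos, 'times'
  if (n_sin /= 100 .or. n_cos /= 99) error stop 'unexpected CYCLE counts'
end program exit_vs_cycle
```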
What you might want to use – CYCLE , EXIT
• Avoid the GOTO statement
• Use CYCLE or EXIT with a construct name in nested loops
• Constructs (DO, IF, CASE, etc.) may have names
outer: do j=1, 100
  inner: do i=1, 100
    x = real(i)
    y = sin(x)
    if (i > 20) exit outer
    z = cos(x)
  enddo inner
enddo outer
EXIT outer: leaves both loops at i=21 (while j=1).
outer: do j=1, 100
  inner: do i=1, 100
    x = real(i)
    y = sin(x)
    if (i > 20) cycle outer
    z = cos(x)
  enddo inner
enddo outer
CYCLE outer: at i=21, abandons the rest of the inner loop and starts the next iteration of the outer loop, so z = cos(x) is never executed for i > 20.
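A small sketch that counts how many times the statement after the test would run in each case. The counter n_cos and the loop names outer_exit, inner_exit, outer_cycle, and inner_cycle are made up for this illustration.

```fortran
program named_loops
  implicit none
  integer :: i, j, n_cos

  ! EXIT with the outer name: both loops end the first time i reaches 21
  n_cos = 0
  outer_exit: do j = 1, 100
    inner_exit: do i = 1, 100
      if (i > 20) exit outer_exit
      n_cos = n_cos + 1             ! stands in for z = cos(x)
    end do inner_exit
  end do outer_exit
  print *, 'exit outer : statement ran', n_cos, 'times'
  if (n_cos /= 20) error stop 'unexpected count for EXIT'

  ! CYCLE with the outer name: each outer iteration runs i = 1..20 only
  n_cos = 0
  outer_cycle: do j = 1, 100
    inner_cycle: do i = 1, 100
      if (i > 20) cycle outer_cycle
      n_cos = n_cos + 1
    end do inner_cycle
  end do outer_cycle
  print *, 'cycle outer: statement ran', n_cos, 'times'
  if (n_cos /= 2000) error stop 'unexpected count for CYCLE'
end program named_loops
```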
What you might want to use – WHERE
real, dimension(4) :: &
  x = [ -1, 0, 1, 2 ], &
  a = [ 5, 6, 7, 8 ]
...
where (x < 0)
  a = -1.
end where
where (x /= 0)
  a = 1. / a
elsewhere
  a = 0.
end where
After the first WHERE : a = {-1.0, 6.0, 7.0, 8.0}
After the second WHERE : a = {-1.0, 0.0, 1.0/7.0, 1.0/8.0}
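The two WHERE blocks can be run back to back as a complete program. This sketch wraps the slide code in a program unit (the name where_demo is arbitrary) and checks the final values against the result stated above.

```fortran
program where_demo
  implicit none
  real, dimension(4) :: x = [-1., 0., 1., 2.], &
                        a = [ 5., 6., 7., 8.]

  ! Only the element where x is negative is changed
  where (x < 0)
    a = -1.
  end where
  print *, a

  ! Elements with x /= 0 are inverted, the remaining one is set to zero
  where (x /= 0)
    a = 1. / a
  elsewhere
    a = 0.
  end where
  print *, a

  if (any(abs(a - [-1., 0., 1./7., 1./8.]) > 1.e-6)) error stop 'unexpected result'
end program where_demo
```

Note that the second WHERE works on the array already modified by the first one, which is why the first element ends up as 1/(-1) = -1.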
What you might want to use – ANY
integer, parameter :: n = 100
real, dimension(n,n) :: a, b, c1, c2
c1 = my_matmul(a, b) ! home-grown function
c2 = matmul(a, b) ! built-in function
if (any(abs(c1 - c2) > 1.e-4)) then
  print *, 'There are significant differences'
endif
• ANY and WHERE make explicit DO loops unnecessary
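A self-contained sketch of the comparison above. The slides do not show my_matmul, so the triple-loop version here is only an assumed stand-in, and the random input data is likewise an assumption.

```fortran
program any_demo
  implicit none
  integer, parameter :: n = 100
  real, dimension(n,n) :: a, b, c1, c2

  call random_number(a)
  call random_number(b)

  c1 = my_matmul(a, b)   ! home-grown function (assumed implementation below)
  c2 = matmul(a, b)      ! built-in function

  ! One ANY replaces a double loop over all n*n elements
  if (any(abs(c1 - c2) > 1.e-4)) then
    print *, 'There are significant differences'
  else
    print *, 'Results agree within the tolerance'
  end if

contains

  ! Plain triple-loop matrix product, written only for this illustration
  function my_matmul(p, q) result(r)
    real, intent(in) :: p(:,:), q(:,:)
    real :: r(size(p,1), size(q,2))
    integer :: i, j, k

    r = 0.
    do j = 1, size(q,2)
      do k = 1, size(p,2)
        do i = 1, size(p,1)
          r(i,j) = r(i,j) + p(i,k) * q(k,j)
        end do
      end do
    end do
  end function my_matmul

end program any_demo
```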
What you might want to use – DO CONCURRENT
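• Vectorization : a simple form of auto-parallelization
• Definition : processes one operation on multiple pairs of operands at once
do concurrent (i=1:m)
  call dosomething()
end do
Example of what vectorization does to a loop:
DO i=1,1024
  C(i) = A(i) * B(i)
END DO
becomes
DO i=1,1024,4
  C(i:i+3) = A(i:i+3) * B(i:i+3)
END DO
• DO CONCURRENT allows (requests) vectorization or parallelization; it does not force it. With ifort, enable the -parallel option if you want the loop auto-parallelized.
• Restrictions: no data dependencies between iterations, no EXIT out of the loop, no CYCLE to an outer loop, no RETURN statement, and any procedure called inside must be PURE.
• Can be used together with OpenMP.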
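A minimal DO CONCURRENT sketch that satisfies the restrictions above. The helper product_of is hypothetical and exists only to show that a procedure referenced inside the loop has to be PURE.

```fortran
program do_concurrent_demo
  implicit none
  integer, parameter :: n = 1024
  real :: a(n), b(n), c(n)
  integer :: i

  call random_number(a)
  call random_number(b)

  ! Each iteration reads a(i), b(i) and writes only c(i),
  ! so the iterations may run in any order or all at once.
  do concurrent (i = 1:n)
    c(i) = product_of(a(i), b(i))
  end do

  if (any(abs(c - a * b) > 1.e-6)) error stop 'mismatch'
  print *, 'all', n, 'products match'

contains

  ! PURE: no side effects, so it may be referenced inside DO CONCURRENT
  pure function product_of(p, q) result(r)
    real, intent(in) :: p, q
    real :: r
    r = p * q
  end function product_of

end program do_concurrent_demo
```

Whether the compiler actually vectorizes or parallelizes this loop depends on the compiler and its options; the construct only tells it that doing so is safe.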
For More..
• Read the Fortran 2008 standard
• http://www.j3-fortran.org/doc/year/10/10-007.pdf
• A more recent working draft for Fortran 2015 (still being worked on)
• http://j3-fortran.org/doc/year/15/15-007.pdf
• Easy-to-read documents
• The new features of Fortran 2008 : ftp://ftp.nag.co.uk/sc22wg5/N1801-N1850/N1828.pdf
• Modern Programming Languages: Fortran90/95/2003/2008 : https://www.tacc.utexas.edu/documents/13601/162125/fortran_class.pdf
Build System (MakeFile)
Build?
• The process that turns source code into executable files is called the build.
• Compiler : the tool that compiles. Linker : the tool that links.
• ifort, gcc, gfortran, and so on are combined tools that both compile and link.
[Figure: Source Code1.f, Source Code2.f, Source Code3.f (readable) -> Compile -> Source Code1.o, Source Code2.o, Source Code3.o (unreadable) -> Link, together with Libraries (FFTW..) -> a.out]
Makefile?
• make does all of the compile and link jobs automatically. A Makefile is the build script.
• make (actually gmake) is only one of many such tools, which are called build systems.
• Visual Studio has its own build system, so it does not use a Makefile.
1. Command-line
$ gcc -o hellomake hellomake.c hellofunc.c -I.
2. Simple Makefile (1)
hellomake: hellomake.c hellofunc.c
	gcc -o hellomake hellomake.c hellofunc.c -I.
• "hellomake:" : rule name (the target)
• "hellomake.c hellofunc.c" : dependencies
• "gcc …" : the actual command
• Running just "make" executes the first rule defined in the Makefile
Command line for the Makefile:
$ make
or
$ make hellomake
Makefile?
3. Simple Makefile (3)
CC=gcc
CFLAGS=-I.
hellomake: hellomake.o hellofunc.o
	$(CC) -o hellomake hellomake.o hellofunc.o -I.
Add variables (constants)
• "CC=gcc" : the C compiler
• "CFLAGS" : the list of flags to pass to the compilation command
• For Fortran, use "FC" instead of "CC" and "FFLAGS" instead of "CFLAGS"
• The indent before the command line ("$(CC)") must be a tab. This is important!
$ make
or
$ make hellomake
Makefile?
4. Simple Makefile (4)
CC=gcc
CFLAGS=-I.
DEPS = hellomake.h
hellomake: hellomake.o hellofunc.o
	$(CC) -o hellomake hellomake.o hellofunc.o -I.
%.o: %.c $(DEPS)
	$(CC) -c -o $@ $< $(CFLAGS)
The pattern rule "%.o: %.c" builds every .o file from the .c file of the same name. $@ and $< are special macros (automatic variables) in a Makefile.
• Rule %.o : rule for compilation. Rule hellomake : rule for linking.
• $@ : the name of the file to be made (e.g. hellomake for rule hellomake)
• $< : the name of the first prerequisite (hellomake.o is the first prerequisite of rule hellomake)
• $^ : the names of all the prerequisites, with spaces between them
• $* : the stem matched by % in a pattern rule (e.g. hellomake when hellomake.o is built from hellomake.c)
$ make
or
$ make hellomake
Compiler & Linker Options
FFLAGS=-O3 -r8 -openmp -I /home/astromece/usr/fftw/include
LIBS=-L/home/astromeca/usr/lib -lfftw3 -lm
Compiler Options and Linker Options
• -O3 : optimization level (-O1 : code-size optimization, -O2 : general optimization (default), -O3 : aggressive optimization)
• -r8 : makes the default REAL type double precision (8 bytes = 64 bits)
• -openmp : enables OpenMP directives
• -I : specifies an include directory. Include files : .h files (declarations)
• -L : specifies a library directory. Library files : .so or .a
• -lfftw3 : links with the fftw3 library
• -lm : links with the math library (needed for several math functions)
Compiler & Linker Options
Recommended options
• -heap-arrays [size] : puts automatic arrays and temporary arrays larger than [size] KB on the heap instead of the stack. The effect is the same as using the ALLOCATE statement.
• -ax[code] : generates code specialized for the given CPU instruction set. DGIST, Boolt : AVX, CSE Server(OMP) : SSE4.1, CSE Server(SMP) : SSE4.2
• -O2 : before enabling -O3, compare the results of -O2 and -O3. Sometimes -O3 gives different results.
• -parallel : enables auto-parallelization. Turn it on if you use DO CONCURRENT.
• -free : free-form source (f90 style). By default ifort compiles a .f file as fixed-form (Fortran 77 style) source. If you want a .f file compiled as free-form Fortran 90 or later, enable this option.
• $ man ifort gives a lot of additional information.
Debug vs Release
• The -g (for the debugger) and -check (checks array bounds and so on) options help reduce errors. However, they add extra code, which slows the program down and turns off optimization automatically.
• If you are sure that you have no errors and just want results, enable optimization and remove the -g and -check options.
MKL BLAS & CG Method
Intel MKL (Math Kernel Library) and BLAS
Intel MKL
• A library of optimized math routines for science, engineering, and financial applications.
• Includes the basic matrix and vector functions.
• No separate installation is needed; just link the library.
BLAS
• Basic Linear Algebra Subprograms
• A set of low-level routines for common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication.
• The interface is the same everywhere, but there are various implementations: ATLAS, MKL, OpenBLAS, GotoBLAS, and so on.
• I will use MKL BLAS because it is easy to compile and well documented.
• It is already parallelized, so turning on one option gives you parallelism without writing any OpenMP yourself. (MPI parallelism is not implemented.)
I will show how to build the CG method using MKL BLAS, line by line.
Sparse Matrix Format
• Before starting with the BLAS library functions, we need to consider how to store the 𝐴 matrix in 𝐴𝑥 = 𝑏.
1 7 0 0
0 2 8 0
5 0 3 9
0 6 0 4
• Walk through the matrix row by row and record every non-zero entry (9 non-zero entries):
row offsets    : 1 3 5 8 10
column indices : 1 2 2 3 1 3 4 2 4
values         : 1 7 2 8 5 3 9 6 4
• Each row offset is the position in the other two arrays where that row starts. The last offset (10) indicates the end.
Sparse matrix
• If we store the A matrix with its zeros, 16 * 8 bytes are required.
• The sparse (CSR) form requires 23 * 8 bytes (5 row offsets + 9 column indices + 9 values).
• Inefficient? No. If you have a large A matrix, such as (𝑛𝑥 ⋅ 𝑛𝑦) × (𝑛𝑥 ⋅ 𝑛𝑦), CSR is far more efficient.
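The construction can be sketched as a short program that builds the three CSR arrays from the dense 4 × 4 example and then computes 𝐴𝑥 from the CSR arrays alone. The array names row_off, col_idx, and values are chosen here for illustration; they correspond to ia, ja, and a in the MKL call, and the hand-written loop only stands in for what the library routine does.

```fortran
program csr_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0), n = 4
  real(dp) :: dense(n,n), x(n), y(n)
  real(dp), allocatable :: values(:)
  integer, allocatable :: col_idx(:)
  integer :: row_off(n+1), i, j, k, nnz

  ! reshape fills column by column, so transpose to enter the matrix row by row
  dense = transpose(reshape(real([1,7,0,0, 0,2,8,0, 5,0,3,9, 0,6,0,4], dp), [n, n]))

  nnz = count(dense /= 0.0_dp)
  allocate(values(nnz), col_idx(nnz))

  ! Scan row by row and record each non-zero entry (one-based indexing)
  k = 0
  do i = 1, n
    row_off(i) = k + 1              ! position where row i starts
    do j = 1, n
      if (dense(i,j) /= 0.0_dp) then
        k = k + 1
        values(k)  = dense(i,j)
        col_idx(k) = j
      end if
    end do
  end do
  row_off(n+1) = nnz + 1            ! end marker

  print '(a,5i3)',   'row offsets   :', row_off
  print '(a,9i3)',   'column indices:', col_idx
  print '(a,9f4.0)', 'values        :', values

  if (any(row_off /= [1,3,5,8,10])) error stop 'unexpected row offsets'
  if (any(col_idx /= [1,2,2,3,1,3,4,2,4])) error stop 'unexpected column indices'

  ! y = A*x using only the CSR arrays
  x = 1.0_dp
  y = 0.0_dp
  do i = 1, n
    do k = row_off(i), row_off(i+1) - 1
      y(i) = y(i) + values(k) * x(col_idx(k))
    end do
  end do

  if (any(abs(y - matmul(dense, x)) > 1.0e-12_dp)) error stop 'CSR product mismatch'
  print '(a,4f5.0)', 'A*x =', y
end program csr_demo
```

The product loop touches only the 9 stored entries instead of all 16, which is where the saving comes from once the matrix is large and mostly zero.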
Which BLAS Library Functions Are Required?
• mkl_dcsrgemv : computes the matrix-vector product of a sparse general matrix stored in the CSR format (3-array variation) with one-based indexing, in double precision. Used for the 𝐴𝑥 computation.
  • call mkl_dcsrgemv(transa, m, a, ia, ja, x, y)
  • transa : selects 𝐴𝑥 (transa='N' or 'n') or 𝐴'𝑥 (transa='T', 't', 'C', or 'c')
  • m : # of rows of A
  • a : values array of A in CSR format
  • ia : row offset array of A in CSR format
  • ja : column indices array of A in CSR format
  • x : 𝑥 vector
  • y : output (𝐴𝑥)
• dcopy : copies vector 𝑥 to vector 𝑦. 𝑦 = 𝑥
  • call dcopy(n, x, incx, y, incy) (Fortran 95 interface: call copy(x, y))
  • n : # of elements in vectors 𝑥 and 𝑦
  • x : input, 𝑥 vector
  • y : output, 𝑦 vector
  • incx, incy : strides between consecutive elements of 𝑥 and 𝑦 (1 for contiguous vectors)
• ddot : computes a vector-vector dot product. 𝑥 ⋅ 𝑦
  • It is a function, not a subroutine.
  • ddot(n, x, incx, y, incy) (Fortran 95 interface: dot(x, y))
  • x, y : 𝑥, 𝑦 vectors
• daxpy : computes a vector-scalar product and adds the result to a vector. It is the double-precision version of SAXPY (Single-precision A·X Plus Y).
  • 𝑦 = 𝑎 ⋅ 𝑥 + 𝑦
  • call daxpy(n, a, x, incx, y, incy) (Fortran 95 interface: call axpy(x, y, a))
  • n : # of elements in vectors 𝑥 and 𝑦
  • a : scalar 𝑎
  • x : input, 𝑥 vector
  • y : input and output, 𝑦 vector
• dnrm2 : computes the Euclidean norm of a vector. ‖𝑥‖ = sqrt(𝑥 ⋅ 𝑥)
  • It is a function, not a subroutine.
  • dnrm2(n, x, incx) (Fortran 95 interface: nrm2(x))
  • n : # of elements in vector 𝑥
  • x : input, 𝑥 vector
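To make the four vector operations concrete, here is a sketch that uses Fortran intrinsics as stand-ins so that it compiles without MKL. It does not call the BLAS routines themselves; each comment only names the routine that performs the same operation.

```fortran
program blas_like_ops
  implicit none
  integer, parameter :: dp = kind(1.0d0), n = 3
  real(dp) :: x(n), y(n), a, d, nrm

  x = [1.0_dp, 2.0_dp, 2.0_dp]
  a = 2.0_dp

  y = x                      ! same operation as dcopy : y = x
  d = dot_product(x, y)      ! same operation as ddot  : x . y = 1 + 4 + 4 = 9
  y = a * x + y              ! same operation as daxpy : y = a*x + y = [3, 6, 6]
  nrm = norm2(x)             ! same operation as dnrm2 : sqrt(x . x) = 3

  print *, 'dot  =', d
  print *, 'axpy =', y
  print *, 'norm =', nrm

  if (abs(d - 9.0_dp) > 1.0e-12_dp) error stop 'unexpected dot product'
  if (any(abs(y - [3.0_dp, 6.0_dp, 6.0_dp]) > 1.0e-12_dp)) error stop 'unexpected axpy result'
  if (abs(nrm - 3.0_dp) > 1.0e-12_dp) error stop 'unexpected norm'
end program blas_like_ops
```

With MKL linked, the four marked lines would be replaced by the corresponding dcopy, ddot, daxpy, and dnrm2 calls listed above.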

KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
Victor Morales
 

Recently uploaded (20)

CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTCHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
 
The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
 
Hematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood CountHematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood Count
 
Casting-Defect-inSlab continuous casting.pdf
Casting-Defect-inSlab continuous casting.pdfCasting-Defect-inSlab continuous casting.pdf
Casting-Defect-inSlab continuous casting.pdf
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
 
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
 
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdfBPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
 
gray level transformation unit 3(image processing))
gray level transformation unit 3(image processing))gray level transformation unit 3(image processing))
gray level transformation unit 3(image processing))
 
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have oneISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
 
Material for memory and display system h
Material for memory and display system hMaterial for memory and display system h
Material for memory and display system h
 
Engine Lubrication performance System.pdf
Engine Lubrication performance System.pdfEngine Lubrication performance System.pdf
Engine Lubrication performance System.pdf
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
 
Textile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdfTextile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdf
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 

Fortran & Link with Library & Brief Explanation of MKL BLAS

  • 1. Some things you need to know (Jongsu Kim)
  • 3. Fortran
      • Still using Fortran 77, 90, or 95?
      • Fortran 2003 and 2008 are already here, and Fortran 2015 is on the way.
      • Some features will be deleted or marked obsolescent.
      • We are using Fortran the wrong way.
  • 4. What you shouldn’t use
      Labeled DO loops
          do 100 ii=istart,ilast,istep
             isum = isum + ii
          100 continue
      EQUIVALENCE: specifies the sharing of storage units by two or more objects in a scoping unit
          character (len=3) :: C(2)
          character (len=4) :: A,B
          equivalence (A,C(1)), (B,C(2))
          (Diagram: seven character storage units, numbered 1 to 7. A occupies units 1-4, C(1) units 1-3, C(2) units 4-6, and B units 4-7.)
      COMMON: blocks of physical storage accessed by any of the scoping units in a program
          COMMON /BLOCKA/ A,B,C(10,30)
          COMMON I, J, K
      ENTRY: subroutine-like entry points inside a subroutine
      FIXED FORM SOURCE: Fortran 77 style (80 column restriction)
      CHARACTER* form: replaced with CHARACTER(LEN=?)
      NON-BLOCK DO CONSTRUCT: the DO range doesn't end in a CONTINUE or END DO
  • 5. What you shouldn’t use
      Labeled DO loops: the label is unnecessary, and it is hard to remember what each number means. Moreover, we have the END DO and CYCLE statements.
      EQUIVALENCE: error-prone. It is hard to keep track of every storage location these variables refer to. Since COMMON and EQUIVALENCE are discouraged, the BLOCK DATA program unit is discouraged as well.
      COMMON: sharing many variables across a whole program is dangerous and error-prone.
      ENTRY: it complicates the program, and modules and subroutines already cover its uses.
      NON-BLOCK DO CONSTRUCT: it is hard to see where the DO loop ends.
  • 6. What you might want to use – CYCLE , EXIT
      • Avoid the GOTO statement.
      • Use the CYCLE or EXIT statement instead.
      • CYCLE : skip to the end of the current iteration
      • EXIT : leave the loop
      With EXIT:
          do i=1, 100
            x = real(i)
            y = sin(x)
            if (i == 20) exit
            z = cos(x)
          enddo
      19 iterations complete in full; in the 20th iteration, y = sin(x) is executed and then the loop exits.
      With CYCLE:
          do i=1, 100
            x = real(i)
            y = sin(x)
            if (i == 20) cycle
            z = cos(x)
          enddo
      All 100 iterations run, but at i=20, z = cos(x) is not executed.
  • 7. What you might want to use – CYCLE , EXIT
      • Avoid the GOTO statement.
      • Use the CYCLE or EXIT statement with nested loops.
      • Constructs (DO, IF, CASE, etc.) may have names.
      With EXIT:
          outer: do j=1, 100
            inner: do i=1, 100
              x = real(i)
              y = sin(x)
              if (i > 20) exit outer
              z = cos(x)
            enddo inner
          enddo outer
      Both loops are left at i=21 (during the first outer iteration, j=1).
      With CYCLE:
          outer: do j=1, 100
            inner: do i=1, 100
              x = real(i)
              y = sin(x)
              if (i > 20) cycle outer
              z = cos(x)
            enddo inner
          enddo outer
      At i=21 the rest of the inner loop, including z = cos(x), is skipped and the next iteration of the outer loop starts.
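The difference between `exit outer` and `cycle outer` is easy to check by counting how often the inner loop body runs to completion. The program below is a minimal sketch of that check; the counter names `n_exit` and `n_cycle` and the loop names are illustrative and do not come from the slides.

```fortran
program named_loops
  implicit none
  integer :: i, j, n_exit, n_cycle

  ! EXIT outer: leaves both loops the first time i > 20 (j = 1, i = 21).
  n_exit = 0
  outer1: do j = 1, 100
    inner1: do i = 1, 100
      if (i > 20) exit outer1
      n_exit = n_exit + 1
    end do inner1
  end do outer1

  ! CYCLE outer: abandons the inner loop at i = 21 and starts the next j.
  n_cycle = 0
  outer2: do j = 1, 100
    inner2: do i = 1, 100
      if (i > 20) cycle outer2
      n_cycle = n_cycle + 1
    end do inner2
  end do outer2

  print *, 'exit  outer: inner body completed', n_exit, 'times'
  print *, 'cycle outer: inner body completed', n_cycle, 'times'

  ! 20 completions for EXIT, 20 per outer iteration (20*100) for CYCLE.
  if (n_exit /= 20 .or. n_cycle /= 2000) error stop 'unexpected count'
end program named_loops
```

With EXIT the body completes 20 times in total; with CYCLE it completes 20 times for each of the 100 outer iterations.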
  • 8. What you might want to use – WHERE
          real, dimension(4) :: &
            x = [ -1, 0, 1, 2 ], &
            a = [ 5, 6, 7, 8 ]
          ...
          where (x < 0)
            a = -1.
          end where
      Result: a : {-1.0, 6.0, 7.0, 8.0}
          where (x /= 0)
            a = 1. / a
          elsewhere
            a = 0.
          end where
      Result: a : {-1.0, 0.0, 1.0/7.0, 1.0/8.0}
  • 9. What you might want to use – ANY
          integer, parameter :: n = 100
          real, dimension(n,n) :: a, b, c1, c2
          c1 = my_matmul(a, b) ! home-grown function
          c2 = matmul(a, b)    ! built-in function
          if (any(abs(c1 - c2) > 1.e-4)) then
            print *, 'There are significant differences'
          endif
      • ANY and WHERE remove redundant DO loops.
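The WHERE example from slide 8 can be run as a complete program, with ANY doing the final comparison in place of an explicit loop. This is a sketch only: the `expected` array and the tolerance `1.e-6` are assumptions added for the check, not part of the slides.

```fortran
program where_any_demo
  implicit none
  real, dimension(4) :: x = [-1., 0., 1., 2.]
  real, dimension(4) :: a = [ 5., 6., 7., 8.]
  real, dimension(4) :: expected

  ! Masked assignment: only elements where x < 0 are changed.
  where (x < 0)
    a = -1.
  end where

  ! Invert a where x is nonzero; set it to zero everywhere else.
  where (x /= 0)
    a = 1. / a
  elsewhere
    a = 0.
  end where

  ! Values stated on the slide: {-1.0, 0.0, 1.0/7.0, 1.0/8.0}
  expected = [-1., 0., 1./7., 1./8.]
  print *, a

  ! ANY reduces the element-wise comparison to a single logical value.
  if (any(abs(a - expected) > 1.e-6)) error stop 'WHERE result differs'
end program where_any_demo
```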
  • 10. What you might want to use – DO CONCURRENT
      • Vectorization: processes one operation on multiple pairs of operands at once. It is a simple example of auto-parallelization.
      Scalar loop:
          DO i=1,1024
            C(i) = A(i) * B(i)
          END DO
      Vectorized equivalent (four elements per step):
          DO i=1,1024,4
            C(i:i+3) = A(i:i+3) * B(i:i+3)
          END DO
      DO CONCURRENT form:
          do concurrent (i=1:m)
            call dosomething() ! must be a pure procedure
          end do
      • DO CONCURRENT allows (requests) vectorization; it does not force it. If you need it, enable the -parallel option.
      • Requirements: no data dependencies between iterations, no EXIT statement, no RETURN statement, and no CYCLE that targets an outer loop.
      • Can be used together with OpenMP.
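As a minimal sketch of the loop form above, the program below writes the element-wise product C(i) = A(i) * B(i) with DO CONCURRENT and compares it against the equivalent whole-array expression. The array size matches the slide; the input values are arbitrary test data. Whether the loop is actually vectorized or parallelized depends on the compiler and its options.

```fortran
program do_concurrent_demo
  implicit none
  integer, parameter :: n = 1024
  real :: a(n), b(n), c(n)
  integer :: i

  a = [(real(i), i = 1, n)]   ! arbitrary test data: 1, 2, ..., n
  b = 2.0

  ! Each iteration touches only its own element, so there are no
  ! data dependencies and the iterations may run in any order.
  do concurrent (i = 1:n)
    c(i) = a(i) * b(i)
  end do

  print *, c(1), c(n)

  ! The whole-array expression computes the same products.
  if (any(c /= a * b)) error stop 'DO CONCURRENT result differs'
end program do_concurrent_demo
```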
  • 11. For more
      • Read the Fortran 2008 standard: http://www.j3-fortran.org/doc/year/10/10-007.pdf
      • A more recent working document for Fortran 2015 (still in progress): http://j3-fortran.org/doc/year/15/15-007.pdf
      • Easy-to-read documents:
          • The new features of Fortran 2008 : ftp://ftp.nag.co.uk/sc22wg5/N1801-N1850/N1828.pdf
          • Modern Programming Languages: Fortran90/95/2003/2008 : https://www.tacc.utexas.edu/documents/13601/162125/fortran_class.pdf
  • 13. Build?
      • Build: the process that turns source code into an executable file.
      • Compiler: the tool that compiles. Linker: the tool that links.
      • ifort, gcc, gfortran, and so on are combined tools that both compile and link.
      (Diagram: the human-readable sources Source Code1.f, Source Code2.f, and Source Code3.f are compiled into the object files Source Code1.o, Source Code2.o, and Source Code3.o, which are not human-readable. The object files are then linked with libraries (FFTW..) to produce a.out.)
  • 14. Makefile?
      • make runs all of the compile and link jobs automatically. A Makefile is a build script.
      • make (actually gmake) is one of many such tools, which are called build systems.
      • Visual Studio has its own build system, so it does not use a Makefile.
      1. Command-line
          $ gcc -o hellomake hellomake.c hellofunc.c -I.
      2. Simple Makefile (1)
          hellomake: hellomake.c hellofunc.c
                  gcc -o hellomake hellomake.c hellofunc.c -I.
      • “hellomake:” : rule name (the target)
      • “hellomake.c hellofunc.c” : dependencies
      • “gcc …” : the actual command
      • Running plain “make” executes the first rule defined in the Makefile.
      Command-line:
          $ make
          or
          $ make hellomake
  • 15. Makefile?
      3. Simple Makefile (3): add constants
          CC=gcc
          CFLAGS=-I.
          hellomake: hellomake.o hellofunc.o
                  $(CC) -o hellomake hellomake.o hellofunc.o -I.
      • “CC=gcc” : the C compiler
      • “CFLAGS” : the list of flags to pass to the compilation command
      • For Fortran, use “FC” instead of “CC” and “FFLAGS” instead of “CFLAGS”.
      • The command line (“$(CC) …”) must be indented with a tab. This is important!
      Command-line:
          $ make
          or
          $ make hellomake
  • 16. Makefile?
      4. Simple Makefile (4): a pattern rule that compiles each .c file into its .o file
          CC=gcc
          CFLAGS=-I.
          DEPS = hellomake.h
          hellomake: hellomake.o hellofunc.o
                  $(CC) -o hellomake hellomake.o hellofunc.o -I.
          %.o: %.c $(DEPS)
                  $(CC) -c $< $(CFLAGS)
      $@ and $< are special macros in a Makefile.
      • Rule %.o is the rule for compilation; rule hellomake is the rule for linking.
      • $@ : the name of the file to be made (e.g. hellomake for rule hellomake)
      • $< : the name of the first prerequisite (hellomake.o is the first prerequisite of rule hellomake)
      • $^ : the names of all the prerequisites, with spaces between them
      • $* : the stem shared by the target and its prerequisite in a pattern rule (hellomake when hellomake.o is built from hellomake.c)
      Command-line:
          $ make
          or
          $ make hellomake
  • 17. Compiler & Linker Options
          FFLAGS=-O3 -r8 -openmp -I /home/astromece/usr/fftw/include
          LIBS=-L/home/astromeca/usr/lib -lfftw3 -lm
      Compiler options and linker options
      • -O3 : optimization level (O1: code-size optimization, O2: general optimization (default), O3: aggressive optimization)
      • -r8 : makes the default REAL type double precision (8 bytes = 64 bits)
      • -I : specifies an include directory. Include files: .h files (declarations)
      • -L : specifies a library directory. Library files: .so or .a
      • -lfftw3 : link with the fftw3 library
      • -lm : link with the math library (to use several math intrinsic functions)
  • 18. Compiler & Linker Options
      Recommended options
      • -heap-arrays [numbers] : puts automatic arrays, and temporary arrays larger than [numbers] KB, on the heap instead of the stack. Same effect as using the ALLOCATE statement.
      • -ax[code] : specifies the CPU architecture. DGIST, Boolt : AVX, CSE Server(OMP) : SSE4.1, CSE Server(SMP) : SSE4.2
      • -O2 : before enabling -O3, compare the results obtained with -O2 and -O3. Sometimes -O3 produces different results.
      • -parallel : enables auto-parallelized code. Turn it on if you use DO CONCURRENT.
      • -free : free-form source (f90 style). ifort automatically compiles a .f file as Fortran 77; enable this option to compile files with the .f suffix as Fortran 90 or later.
      • $ man ifort gives a lot of additional information.
      Debug vs Release
      • The -g (to use a debugger) and -check (checks array bounds and so on) options help reduce errors. However, they add extra code, so the program runs slower, and optimization is turned off automatically.
      • If you are sure that the code has no errors and you just want results, enable optimization and remove the -g and -check options.
  • 19. MKL BLAS & CG Method
  • 20. Intel MKL (Math Kernel Library) and BLAS
      Intel MKL
      • A library of optimized math routines for science, engineering, and financial applications.
      • Includes the basic functions for matrices and vectors.
      • No separate installation is needed; just link the library.
      BLAS
      • Basic Linear Algebra Subprograms
      • A set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication.
      • One interface, many implementations: ATLAS, MKL, OpenBLAS, GotoBLAS, and so on.
      • I will use MKL BLAS because it is easy to compile and well documented.
      • It is already parallelized, so turning on a single option gives you parallelism without using OpenMP (MPI parallelism is not implemented).
      I will show how to build the CG method using MKL BLAS, line by line.
  • 21-30. Sparse Matrix Format
      • Before starting with the BLAS library functions, we need to consider how to construct the matrix 𝐴 in 𝐴𝑥 = 𝑏.
      Example matrix (9 nonzero entries):
          1 7 0 0
          0 2 8 0
          5 0 3 9
          0 6 0 4
      Slides 21 to 30 fill in the three CSR arrays one nonzero entry at a time, row by row. The completed arrays are:
          row offsets    : 1 3 5 8 10
          column indices : 1 2 2 3 1 3 4 2 4
          values         : 1 7 2 8 5 3 9 6 4
      The last row offset (10) indicates the end: it is one past the final entry.
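  • 31. Sparse matrix
      • Storing the matrix A densely, zeros included, requires 16 * 8 bytes.
      • Storing it as a sparse (CSR) matrix requires 23 * 8 bytes: 5 row offsets + 9 column indices + 9 values.
      • Inefficient? No. For a large A matrix, such as (𝑛𝑥 ⋅ 𝑛𝑦) × (𝑛𝑥 ⋅ 𝑛𝑦), CSR is far more efficient: dense storage grows with the square of 𝑛𝑥 ⋅ 𝑛𝑦, while CSR storage grows only with the number of nonzero entries.
      Example matrix:
          1 7 0 0
          0 2 8 0
          5 0 3 9
          0 6 0 4
      CSR arrays:
          row offsets    : 1 3 5 8 10
          column indices : 1 2 2 3 1 3 4 2 4
          values         : 1 7 2 8 5 3 9 6 4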
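The row-by-row construction of the three CSR arrays can be sketched in a few lines of Fortran. The program below builds them from the 4x4 example matrix with one-based indices, then uses them for a matrix-vector product that is checked against the intrinsic `matmul`. The names `row_ptr`, `col_idx`, and `values` are illustrative choices for the row offsets, column indices, and values arrays.

```fortran
program csr_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4
  real(dp) :: dense(n, n), x(n), y(n)
  real(dp), allocatable :: values(:)
  integer, allocatable :: col_idx(:)
  integer :: row_ptr(n + 1)
  integer :: i, j, k, nnz

  ! The example matrix, listed row by row. RESHAPE fills column by
  ! column, so TRANSPOSE restores the intended layout.
  dense = transpose(reshape([ 1._dp, 7._dp, 0._dp, 0._dp, &
                              0._dp, 2._dp, 8._dp, 0._dp, &
                              5._dp, 0._dp, 3._dp, 9._dp, &
                              0._dp, 6._dp, 0._dp, 4._dp ], [n, n]))

  nnz = count(dense /= 0._dp)
  allocate(values(nnz), col_idx(nnz))

  ! Walk the matrix row by row, recording each nonzero entry.
  k = 0
  do i = 1, n
    row_ptr(i) = k + 1              ! position of the first entry of row i
    do j = 1, n
      if (dense(i, j) /= 0._dp) then
        k = k + 1
        values(k) = dense(i, j)
        col_idx(k) = j
      end if
    end do
  end do
  row_ptr(n + 1) = nnz + 1          ! one past the last entry

  print '(a, 5i3)',   'row offsets    :', row_ptr
  print '(a, 9i3)',   'column indices :', col_idx
  print '(a, 9f4.0)', 'values         :', values

  ! y = A*x using only the CSR arrays.
  x = 1._dp
  y = 0._dp
  do i = 1, n
    do k = row_ptr(i), row_ptr(i + 1) - 1
      y(i) = y(i) + values(k) * x(col_idx(k))
    end do
  end do

  if (any(row_ptr /= [1, 3, 5, 8, 10])) error stop 'row offsets differ'
  if (any(abs(y - matmul(dense, x)) > 1.e-12_dp)) error stop 'matvec differs'
end program csr_demo
```

The inner loop of the matrix-vector product visits only the stored entries of each row, which is why the cost scales with the number of nonzeros rather than with the full matrix size.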
  • 32. Which BLAS library functions are required?
      • mkl_dcsrgemv : computes the matrix-vector product of a sparse general matrix stored in the CSR format (3-array variation) with one-based indexing, in double precision. Used for the 𝐴𝑥 computation.
          call mkl_dcsrgemv(transa, m, a, ia, ja, x, y)
          • transa : selects 𝐴𝑥 (transa = 'N' or 'n') or 𝐴'𝑥 (transa = 'T', 't', 'C', or 'c')
          • m : number of rows of A
          • a : values array of A in CSR format
          • ia : row offset array of A in CSR format
          • ja : column indices array of A in CSR format
          • x : the 𝑥 vector
          • y : output (𝐴𝑥)
      • dcopy : copies vector 𝑥 to vector 𝑦 (𝑦 = 𝑥).
          call dcopy(n, x, incx, y, incy)
          • n : number of elements in vectors 𝑥 and 𝑦
          • x : input, the 𝑥 vector
          • y : output, the 𝑦 vector
          • incx, incy : strides between elements of 𝑥 and 𝑦 (1 for contiguous vectors)
  • 33. Which BLAS library functions are required?
      • ddot : computes a vector-vector dot product, 𝑥 ⋅ 𝑦.
          • It is a function, not a subroutine.
          res = ddot(n, x, incx, y, incy)    (Fortran 95 interface: dot(x, y))
          • x, y : the 𝑥 and 𝑦 vectors
      • daxpy : computes a vector-scalar product and adds the result to a vector, 𝑦 = 𝑎 ⋅ 𝑥 + 𝑦. The name follows SAXPY (Single-precision A·X Plus Y); daxpy is the double-precision version.
          call daxpy(n, a, x, incx, y, incy)
          • n : number of elements in vectors 𝑥 and 𝑦
          • a : the scalar 𝑎
          • x : input, the 𝑥 vector
          • y : input and output, the 𝑦 vector
      • dnrm2 : computes the Euclidean norm of a vector, ‖𝑥‖.
          • It is a function, not a subroutine.
          res = dnrm2(n, x, incx)    (Fortran 95 interface: nrm2(x))
          • n : number of elements in vector 𝑥
          • x : input, the 𝑥 vector
      • incx, incy : strides between vector elements (1 for contiguous vectors)
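To show where each of these routines fits, here is a sketch of a conjugate gradient (CG) loop. So that it compiles without linking MKL, it uses stand-ins: a hand-written `csr_matvec` in place of mkl_dcsrgemv, array assignment in place of dcopy and daxpy, and the intrinsics `dot_product` and `norm2` in place of ddot and dnrm2. The comments mark which BLAS call each line corresponds to. The test matrix is a hypothetical symmetric positive definite tridiagonal matrix chosen for illustration, because CG requires a symmetric positive definite matrix and the 4x4 example from the earlier slides is not symmetric. The tolerance and iteration limit are likewise assumptions.

```fortran
program cg_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4, nnz = 10
  ! Hypothetical SPD matrix (4 on the diagonal, -1 beside it),
  ! stored in one-based CSR.
  real(dp), parameter :: a(nnz) = [ 4._dp, -1._dp,         &
                                   -1._dp,  4._dp, -1._dp, &
                                   -1._dp,  4._dp, -1._dp, &
                                   -1._dp,  4._dp ]
  integer, parameter :: ia(n + 1) = [1, 3, 6, 9, 11]
  integer, parameter :: ja(nnz)   = [1, 2,  1, 2, 3,  2, 3, 4,  3, 4]
  real(dp) :: x(n), b(n), r(n), p(n), ap(n)
  real(dp) :: alpha, rs_old, rs_new
  integer :: iter

  b = 1._dp
  x = 0._dp
  r = b                                   ! r = b - A*x with x = 0  (dcopy)
  p = r                                   ! first search direction  (dcopy)
  rs_old = dot_product(r, r)              ! (ddot)

  do iter = 1, 100
    call csr_matvec(a, ia, ja, p, ap)     ! ap = A*p  (mkl_dcsrgemv)
    alpha = rs_old / dot_product(p, ap)   ! step length  (ddot)
    x = x + alpha * p                     ! (daxpy)
    r = r - alpha * ap                    ! (daxpy)
    if (norm2(r) < 1.e-12_dp) exit        ! converged?  (dnrm2)
    rs_new = dot_product(r, r)            ! (ddot)
    p = r + (rs_new / rs_old) * p         ! new search direction
    rs_old = rs_new
  end do

  ! Verify the solution by recomputing the residual b - A*x.
  call csr_matvec(a, ia, ja, x, ap)
  print *, 'x =', x
  if (norm2(b - ap) > 1.e-10_dp) error stop 'CG did not converge'

contains

  ! Stand-in for mkl_dcsrgemv with transa = 'N': y = A*v, one-based CSR.
  subroutine csr_matvec(val, row_ptr, col_idx, v, y)
    real(dp), intent(in)  :: val(:), v(:)
    integer,  intent(in)  :: row_ptr(:), col_idx(:)
    real(dp), intent(out) :: y(:)
    integer :: i, k
    do i = 1, size(y)
      y(i) = 0._dp
      do k = row_ptr(i), row_ptr(i + 1) - 1
        y(i) = y(i) + val(k) * v(col_idx(k))
      end do
    end do
  end subroutine csr_matvec
end program cg_sketch
```

Swapping the stand-ins for the real routines should leave the structure of the loop unchanged; only the calls and the link line differ.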