Fortran & Link with Library & Brief Explanation of MKL BLAS
Some things you need to know
Jongsu Kim
• Still Fortran 77, 90, or 95?
• Fortran 2003 and 2008 are already here, and Fortran 2015 is on the way.
• Some features will be deleted or declared obsolescent.
• We are using Fortran the wrong way.
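What you shouldn’t use
• Labeled DO loops
      do 100 ii = 1, n      ! loop control was cut off on the slide; 1..n is assumed
        isum = isum + ii
  100 continue
• EQUIVALENCE: specifies the sharing of storage units by two or more objects in a scoping unit
      character (len=3) :: C(2)
      character (len=4) :: A, B
      equivalence (A, C(1)), (B, C(2))
  Storage units 1 2 3 4 5 6 7: C(1) occupies 1-3, C(2) occupies 4-6, A occupies 1-4, B occupies 4-7 (A and B overlap in unit 4)
• COMMON blocks: blocks of physical storage accessed by any of the scoping units in a program
• ENTRY statement: subroutine-like entry points inside a subroutine
• Fixed-form source: Fortran 77 style (column layout inherited from 80-column punched cards, statements confined to columns 7 to 72)
• CHARACTER* declarations: replaced with CHARACTER(LEN=?)
• DO termination on another statement: the DO range doesn't end in a CONTINUE or END DO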
What you shouldn’t use
• Labeled DO loops: the label is not needed, and it is hard to remember what each number means. Moreover, we have the END DO and CYCLE statements.
• EQUIVALENCE: also error-prone. It is hard to keep track of every position where these variables share storage.
• BLOCK DATA: since COMMON and EQUIVALENCE are discouraged, BLOCK DATA is discouraged as well.
• COMMON blocks: sharing lots of variables across a program is dangerous and error-prone.
• ENTRY statement: it complicates the program, and modules and subroutines already cover the need.
• DO termination on another statement: it is hard to keep track of where the DO loop ends.
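A minimal sketch of the modern replacements for two of the features above: a block DO with END DO instead of a labeled loop, and a module instead of a COMMON block. The module name sim_params and its variables nx, ny and dt are hypothetical and do not come from the slides.

```fortran
! Hypothetical module that plays the role a COMMON block used to play.
module sim_params
  implicit none
  integer :: nx = 64, ny = 64
  real    :: dt = 0.01
end module sim_params

program modern_replacements
  use sim_params, only: nx, ny, dt   ! explicit, compiler-checked access to shared data
  implicit none
  integer :: ii, isum

  ! Block DO closed by END DO: no statement label is needed.
  isum = 0
  do ii = 1, 10
    isum = isum + ii
  end do

  print *, 'isum =', isum            ! sum of 1..10, i.e. 55
  print *, 'grid points =', nx * ny, ' dt =', dt
end program modern_replacements
```

Because every program unit states what it imports with USE ... ONLY, the compiler can check names and types, which a COMMON block cannot offer.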
What you might want to use – CYCLE, EXIT
• Avoid the GOTO statement
• Use the CYCLE or EXIT statement
• CYCLE: skip the rest of the current iteration and continue with the next one
• EXIT: leave the loop
EXIT example:
do i = 1, 100
  x = real(i)
  y = sin(x)
  if (i == 20) exit
  z = cos(x)
end do
19 iterations complete in full; in the 20th iteration y = sin(x) is executed and then the loop is left.
CYCLE example:
do i = 1, 100
  x = real(i)
  y = sin(x)
  if (i == 20) cycle
  z = cos(x)
end do
All 100 iterations run, but at i = 20 the statement z = cos(x) is not executed.
What you might want to use – CYCLE, EXIT
• Avoid the GOTO statement
• Use the CYCLE or EXIT statement with nested loops
• Constructs (DO, IF, CASE, etc.) may have names
EXIT with a construct name:
outer: do j = 1, 100
  inner: do i = 1, 100
    x = real(i)
    y = sin(x)
    if (i > 20) exit outer
    z = cos(x)
  end do inner
end do outer
Both loops are left at i = 21 (during j = 1).
CYCLE with a construct name:
outer: do j = 1, 100
  inner: do i = 1, 100
    x = real(i)
    y = sin(x)
    if (i > 20) cycle outer
    z = cos(x)
  end do inner
end do outer
At i = 21 the rest of the inner loop is abandoned and the next j iteration starts, so z = cos(x) is executed only for i ≤ 20.
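The difference between the two named-loop forms can be checked by counting how often the statement after the test is reached. This is a small sketch; the counter names n_exit and n_cycle are made up for the illustration.

```fortran
program named_loops
  implicit none
  integer :: i, j, n_exit, n_cycle

  ! EXIT outer: the first time i exceeds 20, both loops are left.
  n_exit = 0
  outer_e: do j = 1, 100
    inner_e: do i = 1, 100
      if (i > 20) exit outer_e
      n_exit = n_exit + 1
    end do inner_e
  end do outer_e

  ! CYCLE outer: the inner loop is abandoned at i = 21 and the next j starts.
  n_cycle = 0
  outer_c: do j = 1, 100
    inner_c: do i = 1, 100
      if (i > 20) cycle outer_c
      n_cycle = n_cycle + 1
    end do inner_c
  end do outer_c

  print *, 'exit  outer:', n_exit    ! 20   (i = 1..20, j = 1 only)
  print *, 'cycle outer:', n_cycle   ! 2000 (i = 1..20 for each of 100 j values)
end program named_loops
```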
What you might want to use – WHERE
real, dimension(4) :: &
  x = [ -1, 0, 1, 2 ], &
  a = [ 5, 6, 7, 8 ]
where (x < 0)
  a = -1.
end where
a : {-1.0, 6.0, 7.0, 8.0}
where (x /= 0)
  a = 1. / a
elsewhere
  a = 0.
end where
a : {-1.0, 0.0, 1.0/7.0, 1.0/8.0}
What you might want to use – ANY
integer, parameter :: n = 100
real, dimension(n,n) :: a, b, c1, c2
c1 = my_matmul(a, b) ! home-grown function
c2 = matmul(a, b)    ! built-in function
if (any(abs(c1 - c2) > 1.e-4)) then
  print *, 'There are significant differences'
end if
• ANY and WHERE remove redundant DO loops
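A self-contained version of the ANY check might look like the sketch below. The body of my_matmul is an assumption (a plain triple loop); the slides only name the function and do not show it.

```fortran
program any_check
  implicit none
  integer, parameter :: n = 100
  real, dimension(n, n) :: a, b, c1, c2

  call random_number(a)
  call random_number(b)

  c1 = my_matmul(a, b)   ! home-grown function (assumed implementation below)
  c2 = matmul(a, b)      ! built-in function

  ! ANY reduces the whole logical array to one value: no explicit loops needed.
  if (any(abs(c1 - c2) > 1.e-4)) then
    print *, 'There are significant differences'
  else
    print *, 'c1 and c2 agree within 1.e-4'
  end if

contains

  ! Assumed triple-loop matrix product; the inner loop runs down a column
  ! to follow Fortran's column-major storage.
  function my_matmul(x, y) result(z)
    real, intent(in) :: x(:, :), y(:, :)
    real :: z(size(x, 1), size(y, 2))
    integer :: i, j, k

    z = 0.
    do j = 1, size(y, 2)
      do k = 1, size(x, 2)
        do i = 1, size(x, 1)
          z(i, j) = z(i, j) + x(i, k) * y(k, j)
        end do
      end do
    end do
  end function my_matmul

end program any_check
```

Which branch prints depends on rounding: in single precision the two products are summed in different orders, so small differences are expected and the tolerance has to be chosen with that in mind.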
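What you might want to use – DO CONCURRENT
• Simple example of auto-parallelization
do concurrent (i = 1:m)
  call dosomething()   ! must be a PURE procedure
end do
• Vectorization
• Definition: processes one operation on multiple pairs of operands at once
Scalar loop:
do i = 1, 1024
  C(i) = A(i) * B(i)
end do
Vectorized loop (four elements per step):
do i = 1, 1024, 4
  C(i:i+3) = A(i:i+3) * B(i:i+3)
end do
• DO CONCURRENT allows (requests) vectorization or parallelization; it does not force it. If you need the loop auto-parallelized, enable the -parallel option.
• No data dependencies between iterations, no EXIT out of the loop, no CYCLE to an outer loop, no RETURN statement.
• Can be used together with OpenMP.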
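As a minimal sketch, the element-wise product from the vectorization example can be written with DO CONCURRENT as follows. The array names follow the slide; the random input data and the final check are added for illustration only.

```fortran
program do_concurrent_demo
  implicit none
  integer, parameter :: n = 1024
  real :: a(n), b(n), c(n)
  integer :: i

  call random_number(a)
  call random_number(b)

  ! Each iteration touches only its own element, so the iterations are
  ! independent and the compiler is free to vectorize or parallelize them.
  do concurrent (i = 1:n)
    c(i) = a(i) * b(i)
  end do

  ! Compare with the whole-array expression.
  if (any(abs(c - a * b) > epsilon(1.0))) error stop 'mismatch'
  print *, 'max c =', maxval(c)
end program do_concurrent_demo
```

Whether the loop actually runs in parallel is up to the compiler and its options; the construct only states that it is safe to do so.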
For more
• Read the Fortran 2008 standard
• More recent documents on Fortran 2015 (or later revisions, work in progress)
• Easy-to-read documents
• The new features of Fortran 2008 :
• Modern Programming Languages: Fortran90/95/2003/2008 :
Build System (Makefile)
• The process from source code to executable file is called the build.
• Compiler: the tool that compiles. Linker: the tool that links.
• ifort, gcc, gfortran, and so on are combined tools that both compile and link.
Source Code1.f, Source Code2.f, Source Code3.f (readable) → Compile → Source Code1.o, Source Code2.o, Source Code3.o (unreadable) → Link
• make does all of the compile and link jobs automatically. The Makefile is its build script.
• make (actually gmake) is only one of many such tools, the so-called build tools.
• Visual Studio has its own build system, hence it does not use a Makefile.
1. Command-line
$ gcc -o hellomake hellomake.c hellofunc.c -I.
2. Simple Makefile (1)
Makefile:
hellomake: hellomake.c hellofunc.c
	gcc -o hellomake hellomake.c hellofunc.c -I.
• “hellomake:” : rule name (the target)
• “hellomake.c hellofunc.c” : dependencies
• “gcc …” : actual command
• Running plain “make” executes the first rule defined in the Makefile
Command-line:
$ make
or
$ make hellomake
3. Simple Makefile (3)
Makefile:
CC = gcc
CFLAGS = -I.
hellomake: hellomake.o hellofunc.o
	$(CC) -o hellomake hellomake.o hellofunc.o -I.
Add constants
• “CC=gcc” : the C compiler
• “CFLAGS” : list of flags to pass to the compilation command
• For Fortran, use “FC” instead of “CC” and “FFLAGS” instead of “CFLAGS”
• The command line (“$(CC) …”) must be indented with a tab, not with spaces!
Command-line:
$ make
or
$ make hellomake
4. Simple Makefile (4)
Makefile:
DEPS = hellomake.h
hellomake: hellomake.o hellofunc.o
	$(CC) -o hellomake hellomake.o hellofunc.o -I.
%.o: %.c $(DEPS)
	$(CC) -c $< $(CFLAGS)
The pattern rule matches every .c file and compiles it to a .o file. $@ and $< are special macros (automatic variables) in a Makefile.
• Rule %.o: rule for compilation. Rule hellomake: rule for linking.
• $@ : the name of the file to be made (e.g. hellomake for rule hellomake)
• $< : the name of the first prerequisite (hellomake.o is the first prerequisite of rule hellomake)
• $^ : the names of all the prerequisites, with spaces between them
• $* : the stem that % matched in a pattern rule (hellomake when %.o: %.c builds hellomake.o from hellomake.c)
Command-line:
$ make
or
$ make hellomake
Compiler & Linker Options
FFLAGS=-O3 -r8 -openmp -I /home/astromece/usr/fftw/include
LIBS=-L/home/astromeca/usr/lib -lfftw3 -lm
Compiler Options and Linker Options
• -O3 : optimization level (O1: code size optimization, O2: general optimization (default), O3: aggressive optimization)
• -r8 : the default real type becomes double precision (8 bytes = 64 bits per real)
• -I : specify an include directory. Include files: .h files (declarations)
• -L : specify a library directory. Library files: .so or .a
• -lfftw3 : link with the fftw3 library
• -lm : link with the math library (needed for several math functions)
Compiler & Linker Options
Recommended options
• -heap-arrays [numbers] : puts automatic arrays and arrays created for temporary computations that are larger than [numbers] KB on the heap instead of the stack. The effect is similar to using the ALLOCATE statement.
• -ax[code] : specify the CPU architecture. DGIST, Boolt : AVX, CSE Server(OMP) : SSE4.1, CSE Server(SMP) :
• -O2 : before enabling -O3, compare the results obtained with -O2 and with -O3. Sometimes -O3 produces different results.
• -parallel : enable auto-parallelized code. Turn it on if you use DO CONCURRENT.
• -free : free-form source (f90 style). ifort automatically compiles .f files as fixed-form Fortran 77. If you want to compile a file with the .f suffix as Fortran 90 or higher, enable this option.
• $ man ifort gives a lot of additional information.
Debug vs Release
• -g (to use a debugger) and -check (checks array bounds and so on) help reduce errors. However, they add extra code, so the program runs slower, and -g turns off optimization unless an -O level is given explicitly.
• If you are sure that there are no errors and you just want results, enable optimization and remove the -g and -check options.
MKL BLAS & CG Method
Intel MKL(Math Kernel Library) and BLAS
Intel MKL
• A library of optimized math routines for science, engineering, and financial applications.
• Includes the basic functions for matrices and vectors.
• No separate installation is needed; just link the library.
BLAS
• Basic Linear Algebra Subprograms
• A set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication.
• One common interface with various implementations: ATLAS, MKL, OpenBLAS, GotoBLAS, and so on.
• I will use MKL BLAS because it is easy to compile and well documented.
• It is already parallelized, so turning on one option gives parallel execution without writing any OpenMP code. (MPI parallelism is not implemented.)
I will show how to build the CG method using MKL BLAS, line by line.
Sparse Matrix Format
• Before starting with the BLAS library functions, we need to consider how to construct the matrix 𝐴 in 𝐴𝑥 = 𝑏.
A =
1 7 0 0
0 2 8 0
5 0 3 9
0 6 0 4
CSR storage, filled row by row:
values         : 1 7 2 8 5 3 9 6 4   (9 non-zero entries)
column indices : 1 2 2 3 1 3 4 2 4
row offsets    : 1 3 5 8 10          (the last entry, 10, indicates the end)
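Sparse matrix
• Storing 𝐴 densely, zeros included, requires 16 × 8 bytes.
• The sparse (CSR) format requires 23 × 8 bytes: 9 values + 9 column indices + 5 row offsets.
• Inefficient? No. For a large matrix such as (𝑛𝑥 ⋅ 𝑛𝑦) × (𝑛𝑥 ⋅ 𝑛𝑦), CSR is far more efficient, because dense storage grows with the square of 𝑛𝑥 ⋅ 𝑛𝑦 while CSR grows only with the number of non-zero entries.
A =
1 7 0 0
0 2 8 0
5 0 3 9
0 6 0 4
values         : 1 7 2 8 5 3 9 6 4
column indices : 1 2 2 3 1 3 4 2 4
row offsets    : 1 3 5 8 10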
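The row-by-row construction can be sketched as a short program that converts the dense 4 × 4 example into the three CSR arrays with one-based indices. The variable names (dense, values, col_idx, row_off) are chosen for this illustration and are not taken from the slides.

```fortran
program dense_to_csr
  implicit none
  integer, parameter :: n = 4
  real :: dense(n, n)
  real, allocatable :: values(:)
  integer, allocatable :: col_idx(:)
  integer :: row_off(n + 1)
  integer :: i, j, k, nnz

  ! The literal is written row by row; RESHAPE fills column-major,
  ! so TRANSPOSE puts the rows where they belong.
  dense = transpose(reshape([ 1., 7., 0., 0.,  &
                              0., 2., 8., 0.,  &
                              5., 0., 3., 9.,  &
                              0., 6., 0., 4. ], [n, n]))

  nnz = count(dense /= 0.)
  allocate(values(nnz), col_idx(nnz))

  k = 0
  do i = 1, n
    row_off(i) = k + 1              ! position of the first entry of row i
    do j = 1, n
      if (dense(i, j) /= 0.) then
        k = k + 1
        values(k)  = dense(i, j)
        col_idx(k) = j
      end if
    end do
  end do
  row_off(n + 1) = nnz + 1          ! one past the last entry: marks the end

  print '(a, 9f5.1)', 'values         :', values    ! 1 7 2 8 5 3 9 6 4
  print '(a, 9i5)',   'column indices :', col_idx   ! 1 2 2 3 1 3 4 2 4
  print '(a, 5i5)',   'row offsets    :', row_off   ! 1 3 5 8 10

  if (any(row_off /= [1, 3, 5, 8, 10])) error stop 'unexpected row offsets'
end program dense_to_csr
```

Row i occupies positions row_off(i) to row_off(i+1) - 1 of values and col_idx, which is why the extra final offset is needed.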
What BLAS Library Functions Are Required?
• mkl_dcsrgemv : computes the matrix-vector product of a sparse general matrix stored in the CSR format (3-array variation) with one-based indexing, in double precision. Used for the 𝐴𝑥 computation.
• call mkl_dcsrgemv(transa, m, a, ia, ja, x, y)
• transa : selects 𝐴𝑥 (transa = ‘N’ or ‘n’) or 𝐴ᵀ𝑥 (transa = ‘T’, ‘t’, ‘C’ or ‘c’)
• m : number of rows of 𝐴
• a : values array of 𝐴 in CSR format
• ia : row offset array of 𝐴 in CSR format
• ja : column indices array of 𝐴 in CSR format
• x : input, 𝑥 vector
• y : output (𝐴𝑥)
• dcopy : copies vector 𝑥 to vector 𝑦. 𝑦 = 𝑥
• call dcopy(n, x, incx, y, incy) (Fortran 95 interface: call copy(x, y))
• n : number of elements in vectors 𝑥 and 𝑦
• x : input, 𝑥 vector
• y : output, 𝑦 vector
• incx, incy : strides between consecutive elements of 𝑥 and 𝑦 (1 for contiguous vectors)
What BLAS Library Functions Are Required?
• ddot : computes a vector-vector dot product. 𝑥 ⋅ 𝑦
• Not a subroutine; it is a function.
• ddot(n, x, incx, y, incy) (Fortran 95 interface: dot(x, y))
• x, y : 𝑥, 𝑦 vectors
• daxpy : computes a vector-scalar product and adds the result to a vector. The name follows SAXPY (Single-precision A·X Plus Y); DAXPY is the double-precision version.
• 𝑦 = 𝑎 ⋅ 𝑥 + 𝑦
• call daxpy(n, a, x, incx, y, incy) (Fortran 95 interface: call axpy(x, y, a))
• n : number of elements in vectors 𝑥 and 𝑦
• a : scalar 𝑎
• x : input, 𝑥 vector
• y : input and output, 𝑦 vector
• dnrm2 : computes the Euclidean norm of a vector. ‖𝑥‖₂ = √(𝑥 ⋅ 𝑥)
• Not a subroutine; it is a function.
• dnrm2(n, x, incx) (Fortran 95 interface: nrm2(x))
• n : number of elements in vector 𝑥
• x : input, 𝑥 vector
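A rough sketch of how these operations fit together in a conjugate gradient loop is shown below. To keep it runnable without MKL, a hand-written CSR product (csr_matvec, a hypothetical helper) stands in for mkl_dcsrgemv, and Fortran intrinsics and array assignments stand in for ddot, daxpy and dcopy. CG needs a symmetric positive definite matrix, which the 4 × 4 example above is not, so the sketch assumes a small tridiagonal test matrix (2 on the diagonal, -1 next to it) with a right-hand side whose exact solution is all ones.

```fortran
program cg_csr_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4, nnz = 10
  ! Assumed SPD test matrix in one-based CSR form.
  real(dp) :: a(nnz) = [ 2.0_dp, -1.0_dp,            &
                        -1.0_dp,  2.0_dp, -1.0_dp,   &
                        -1.0_dp,  2.0_dp, -1.0_dp,   &
                        -1.0_dp,  2.0_dp ]
  integer  :: ja(nnz)  = [1, 2,  1, 2, 3,  2, 3, 4,  3, 4]
  integer  :: ia(n+1)  = [1, 3, 6, 9, 11]
  real(dp) :: b(n)     = [1.0_dp, 0.0_dp, 0.0_dp, 1.0_dp]
  real(dp) :: x(n), r(n), p(n), ap(n)
  real(dp) :: alpha, beta, rr_old, rr_new
  integer  :: iter

  x = 0.0_dp
  r = b                          ! r = b - A*x with x = 0   (dcopy)
  p = r                          !                          (dcopy)
  rr_old = dot_product(r, r)     !                          (ddot)

  do iter = 1, n
    if (sqrt(rr_old) < 1.0e-12_dp) exit   ! residual norm   (dnrm2)
    call csr_matvec(n, a, ia, ja, p, ap)  ! ap = A*p        (mkl_dcsrgemv)
    alpha = rr_old / dot_product(p, ap)   !                 (ddot)
    x = x + alpha * p                     !                 (daxpy)
    r = r - alpha * ap                    !                 (daxpy)
    rr_new = dot_product(r, r)
    beta = rr_new / rr_old
    p = r + beta * p                      ! new search direction
    rr_old = rr_new
  end do

  print '(a, 4f8.4)', 'x =', x
  if (maxval(abs(x - 1.0_dp)) > 1.0e-8_dp) error stop 'CG did not converge'

contains

  ! y = A*x for a matrix stored in one-based CSR format.
  subroutine csr_matvec(m, a, ia, ja, x, y)
    integer,  intent(in)  :: m
    real(dp), intent(in)  :: a(:), x(:)
    integer,  intent(in)  :: ia(:), ja(:)
    real(dp), intent(out) :: y(:)
    integer :: i, k

    do i = 1, m
      y(i) = 0.0_dp
      do k = ia(i), ia(i + 1) - 1
        y(i) = y(i) + a(k) * x(ja(k))
      end do
    end do
  end subroutine csr_matvec

end program cg_csr_sketch
```

The comments on the right name the library routine that each statement would be replaced by when linking against MKL; the loop structure stays the same.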

Comparative analysis between traditional aquaponics and reconstructed aquapon...
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning

Fortran & Link with Library & Brief Explanation of MKL BLAS

  • 1. Some things you need to know Jongsu Kim
  • 3. Fortran
      • Still using Fortran 77, 90, or 95?
      • Fortran 2003 and 2008 are already here, and Fortran 2015 is on the way.
      • Some features will be deleted or declared obsolescent.
      • We are using Fortran the wrong way.
  • 4. What you shouldn’t use
      • Labeled DO loops
            do 100 ii=istart,ilast,istep
              isum = isum + ii
        100 continue
      • EQUIVALENCE: specifies the sharing of storage units by two or more objects in a scoping unit
            character (len=3) :: C(2)
            character (len=4) :: A,B
            equivalence (A,C(1)), (B,C(2))
        Storage units 1 to 7: A occupies 1-4, B occupies 4-7, C(1) occupies 1-3, C(2) occupies 4-6.
      • COMMON: blocks of physical storage accessed by any of the scoping units in a program
            COMMON /BLOCKA/ A,B,C(10,30)
            COMMON I, J, K
      • ENTRY: subroutine-like entry points inside a subroutine
      • Fixed-form source: Fortran 77 style (80-column card layout, with statements confined to columns 7 to 72)
      • CHARACTER* form: replaced with CHARACTER(LEN=?)
      • Non-block DO construct: the DO range doesn't end in a CONTINUE or END DO
  • 5. What you shouldn’t use
      • Labeled DO loops: the label is unnecessary, and it is hard to remember what each number means. Moreover, we have the END DO and CYCLE statements.
      • EQUIVALENCE: error-prone. It is hard to keep track of every storage position that each variable aliases.
      • BLOCK DATA: since COMMON and EQUIVALENCE are discouraged, BLOCK DATA should be avoided as well.
      • COMMON: sharing many variables across a program is dangerous and error-prone.
      • ENTRY: it complicates the program, and modules and subroutines already cover its uses.
      • Non-block DO construct: it is hard to see where the DO loop ends.
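As an illustration of the alternatives named above, here is a minimal sketch in which a module variable takes the place of a COMMON block and a block DO with END DO takes the place of a labeled loop. The names `shared_data`, `accumulate`, and `modern_demo` are hypothetical and do not come from the slides.

```fortran
! Sketch only: module, procedure, and program names are illustrative assumptions.
module shared_data
  implicit none
  integer :: isum = 0            ! shared state that would otherwise live in a COMMON block
contains
  subroutine accumulate(istart, ilast, istep)
    integer, intent(in) :: istart, ilast, istep
    integer :: ii
    do ii = istart, ilast, istep ! block DO: no statement label, closed by END DO
      isum = isum + ii
    end do
  end subroutine accumulate
end module shared_data

program modern_demo
  use shared_data, only: isum, accumulate
  implicit none
  call accumulate(1, 10, 1)
  print *, 'isum =', isum        ! sum of 1..10, i.e. 55
end program modern_demo
```

Because the variable is reached through `use`, the compiler checks its type and name at every point of use, which is the check that COMMON and EQUIVALENCE do not provide.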
  • 6. What you might want to use – CYCLE, EXIT
      • Avoid the GOTO statement.
      • Use the CYCLE or EXIT statement instead.
      • CYCLE: skip the rest of the current iteration and continue with the next one.
      • EXIT: leave the loop.
      Example with EXIT:
            do i=1, 100
              x = real(i)
              y = sin(x)
              if (i == 20) exit
              z = cos(x)
            enddo
        The first 19 iterations run completely. In the 20th iteration, y = sin(x) is executed and then the loop exits.
      Example with CYCLE:
            do i=1, 100
              x = real(i)
              y = sin(x)
              if (i == 20) cycle
              z = cos(x)
            enddo
        All 100 iterations run, but at i=20, z = cos(x) is not executed.
  • 7. What you might want to use – CYCLE, EXIT
      • Avoid the GOTO statement.
      • Use the CYCLE or EXIT statement with nested loops.
      • Constructs (DO, IF, CASE, etc.) may have names.
      EXIT with a construct name:
            outer: do j=1, 100
              inner: do i=1, 100
                x = real(i)
                y = sin(x)
                if (i > 20) exit outer
                z = cos(x)
              enddo inner
            enddo outer
        Both loops are exited at i=21.
      CYCLE with a construct name:
            outer: do j=1, 100
              inner: do i=1, 100
                x = real(i)
                y = sin(x)
                if (i > 20) cycle outer
                z = cos(x)
              enddo inner
            enddo outer
        At i=21 the rest of the inner loop is skipped and the next iteration of the outer loop starts, so z = cos(x) is never executed for i > 20.
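To make the difference between `exit outer` and `cycle outer` concrete, the following sketch counts how often the statement after the test is reached in each variant. The counters `n_exit` and `n_cycle` and the loop names are additions for illustration only.

```fortran
! Sketch: counts how many times z = cos(x) runs in each named-loop variant.
program named_loops
  implicit none
  integer :: i, j, n_exit, n_cycle
  real :: x, y, z

  n_exit = 0
  outer1: do j = 1, 100
    inner1: do i = 1, 100
      x = real(i)
      y = sin(x)
      if (i > 20) exit outer1    ! leaves both loops at i = 21, j = 1
      z = cos(x)
      n_exit = n_exit + 1
    end do inner1
  end do outer1

  n_cycle = 0
  outer2: do j = 1, 100
    inner2: do i = 1, 100
      x = real(i)
      y = sin(x)
      if (i > 20) cycle outer2   ! abandons the inner loop, moves on to the next j
      z = cos(x)
      n_cycle = n_cycle + 1
    end do inner2
  end do outer2

  print *, n_exit, n_cycle       ! 20 and 2000 (20 per outer iteration, 100 outer iterations)
end program named_loops
```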
  • 8. What you might want to use – WHERE
            real, dimension(4) :: &
              x = [ -1, 0, 1, 2 ], &
              a = [ 5, 6, 7, 8 ]
            ...
            where (x < 0)
              a = -1.
            end where
        Result: a = {-1.0, 6.0, 7.0, 8.0}
            where (x /= 0)
              a = 1. / a
            elsewhere
              a = 0.
            end where
        Result: a = {-1.0, 0.0, 1.0/7.0, 1.0/8.0}
  • 9. What you might want to use – ANY
            integer, parameter :: n = 100
            real, dimension(n,n) :: a, b, c1, c2
            c1 = my_matmul(a, b) ! home-grown function
            c2 = matmul(a, b)    ! built-in function
            if (any(abs(c1 - c2) > 1.e-4)) then
              print *, 'There are significant differences'
            endif
      • ANY and WHERE remove the need for explicit DO loops.
  • 10. What you might want to use – DO CONCURRENT
            do concurrent (i=1:m)
              call dosomething()
            end do
      • Vectorization: a simple example of auto-parallelization.
      • Definition: processing one operation on multiple pairs of operands at once. For example, the loop
            DO i=1,1024
              C(i) = A(i) * B(i)
            END DO
        is executed as
            DO i=1,1024,4
              C(i:i+3) = A(i:i+3) * B(i:i+3)
            END DO
      • DO CONCURRENT only allows (requests) vectorization or parallelization; it does not force it. With ifort, enable the -parallel option to have such loops auto-parallelized.
      • Restrictions: no data dependencies between iterations, no EXIT or RETURN statement, no CYCLE that targets an outer loop, and any procedure called in the body must be PURE.
      • Use with OpenMP.
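A minimal runnable sketch of the element-wise product from the slide, written with DO CONCURRENT. The array size and the initial values are illustrative assumptions; whether the compiler actually vectorizes or parallelizes the loop depends on the compiler and its options.

```fortran
! Sketch: element-wise product with DO CONCURRENT (values chosen for illustration).
program do_concurrent_demo
  implicit none
  integer, parameter :: n = 1024
  real :: a(n), b(n), c(n)
  integer :: i

  a = 2.0
  b = [(real(i), i = 1, n)]

  ! Each iteration touches only its own c(i), so there are no data dependencies.
  do concurrent (i = 1:n)
    c(i) = a(i) * b(i)
  end do

  print *, c(1), c(n)   ! 2.0 and 2048.0
end program do_concurrent_demo
```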
  • 11. For more
      • Read the Fortran 2008 standard.
      • A more recent document covers Fortran 2015 (or later; work in progress).
      • Easy-to-read documents:
          • The new features of Fortran 2008
          • Modern Programming Languages: Fortran90/95/2003/2008
  • 13. Build?
      • The process that turns source code into an executable file is called a build.
      • Compiler: the tool that compiles. Linker: the tool that links.
      • ifort, gcc, gfortran, and so on are combined tools that both compile and link.
      [Diagram: readable source files (Source Code1.f, Source Code2.f, Source Code3.f) are compiled into unreadable object files (Source Code1.o, Source Code2.o, Source Code3.o), which are linked with libraries (FFTW..) into a.out]
  • 14. Makefile?
      • make runs all of the compile and link jobs automatically. A Makefile is a build script.
      • make (actually gmake) is only one of many such tools, which are called build systems.
      • Visual Studio has its own build system, so it doesn’t use a Makefile.
      1. Command line
            $ gcc -o hellomake hellomake.c hellofunc.c -I.
      2. Simple Makefile (1)
            hellomake: hellomake.c hellofunc.c
                gcc -o hellomake hellomake.c hellofunc.c -I.
          • “hellomake:” : rule name
          • “hellomake.c hellofunc.c” : dependencies
          • “gcc …” : actual command
          • Running plain “make” executes the first rule defined in the Makefile.
      Command line: $ make or $ make hellomake
  • 15. Makefile?
      3. Simple Makefile (3): add constants
            CC=gcc
            CFLAGS=-I.
            hellomake: hellomake.o hellofunc.o
                $(CC) -o hellomake hellomake.o hellofunc.o -I.
          • “CC=gcc” : C compiler
          • “CFLAGS” : list of flags to pass to the compilation command
          • For Fortran, use “FC” instead of “CC” and “FFLAGS” instead of “CFLAGS”.
          • The command line (“$(CC) ...”) must be indented with a tab. This is important!
      Command line: $ make or $ make hellomake
  • 16. Makefile?
      4. Simple Makefile (4): automatically match .c files and make a rule for compilation (.o)
            CC=gcc
            CFLAGS=-I.
            DEPS = hellomake.h
            hellomake: hellomake.o hellofunc.o
                $(CC) -o hellomake hellomake.o hellofunc.o -I.
            %.o: %.c $(DEPS)
                $(CC) -c $< $(CFLAGS)
          • $@ and $< are special macros in a Makefile.
          • Rule %.o is the rule for compilation; rule hellomake is the rule for linking.
          • $@ : the name of the file to be made (e.g. hellomake for rule hellomake)
          • $< : the name of the first prerequisite (hellomake.o is the first prerequisite of rule hellomake)
          • $^ : the names of all the prerequisites, with spaces between them
          • $* : the stem shared by the target and its prerequisite (for hellomake.o built from hellomake.c, $* is hellomake)
      Command line: $ make or $ make hellomake
  • 17. Compiler & Linker Options
            FFLAGS=-O3 -r8 -openmp -I /home/astromece/usr/fftw/include
            LIBS=-L/home/astromeca/usr/lib -lfftw3 -lm
      Compiler options and linker options:
      • -O3 : optimization level (O1: code-size optimization; O2: general optimization, the default; O3: aggressive optimization)
      • -r8 : the default REAL type is double precision (8 bytes = 64 bits)
      • -I : specifies an include directory. Include files: .h files (declarations)
      • -L : specifies a library directory. Library files: .so or .a
      • -lfftw3 : link with the fftw3 library
      • -lm : link with the math library (needed for several math intrinsic functions)
  • 18. Compiler & Linker Options
      Recommended options:
      • -heap-arrays [numbers] : puts automatic arrays and temporary arrays larger than [numbers] KB on the heap instead of the stack. Same effect as using the ALLOCATE statement.
      • -ax[code] : specifies the CPU architecture. DGIST, Boolt: AVX; CSE Server (OMP): SSE4.1; CSE Server (SMP): SSE4.2
      • -O2 : before enabling -O3, compare the results obtained with -O2 and -O3. Sometimes -O3 produces different results.
      • -parallel : enables auto-parallelized code. Turn it on if you use DO CONCURRENT.
      • -free : free-form source (f90 style). ifort automatically compiles .f files as Fortran 77. To compile a file with the .f suffix as Fortran 90 or higher, enable this option.
      • $ man ifort gives a lot of additional information.
      Debug vs Release:
      • The -g option (to use a debugger) and the -check option (checks array bounds and so on) help reduce errors. However, they add extra code, which slows the program down, and they turn off optimization automatically.
      • If you are sure that you have no errors and just want results, enable optimization and remove the -g and -check options.
  • 19. MKL BLAS & CG Method
  • 20. Intel MKL (Math Kernel Library) and BLAS
      Intel MKL
      • A library of optimized math routines for science, engineering, and financial applications.
      • It includes basic functions for matrices and vectors.
      • No separate installation is needed; just link the library.
      BLAS
      • Basic Linear Algebra Subprograms
      • A set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication.
      • It has a single interface but various implementations: ATLAS, MKL, OpenBLAS, GotoBLAS, and so on.
      • I will use MKL BLAS because it is easy to compile with and well documented.
      • It is already parallelized, so turning on one option gives parallelism without writing any OpenMP code. (MPI parallelism is not implemented.)
      I will show how to build the CG method using MKL BLAS, line by line.
  • 21-30. Sparse Matrix Format
      • Before turning to the BLAS library functions, we need to consider how to construct the matrix 𝐴 in 𝐴𝑥 = 𝑏.
      • Example matrix (4 × 4, 9 nonzero entries):
            1 7 0 0
            0 2 8 0
            5 0 3 9
            0 6 0 4
      • CSR arrays, filled in row by row:
            row offsets    : 1 3 5 8 10
            column indices : 1 2 2 3 1 3 4 2 4
            values         : 1 7 2 8 5 3 9 6 4
      • Each row offset is the position in the column-index and value arrays at which that row starts. The last row offset (10) indicates the end: it is one past the 9 nonzero entries.
  • 31. Sparse matrix
      • Storing the 4 × 4 matrix 𝐴 densely, zeros included, requires 16 * 8 bytes.
      • Storing it as a sparse (CSR) matrix requires 23 * 8 bytes: 5 row offsets, 9 column indices, and 9 values.
      • Inefficient? No. For a large 𝐴 matrix, such as (𝑛𝑥 ⋅ 𝑛𝑦) × (𝑛𝑥 ⋅ 𝑛𝑦), CSR is far more efficient, because dense storage grows with the square of 𝑛𝑥 ⋅ 𝑛𝑦 while CSR storage grows only with the number of nonzero entries.
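The CSR layout can be checked with a short pure-Fortran matrix-vector product. This is a sketch, not the MKL routine: it uses the one-based row offsets, column indices, and values from the slides, and the input vector of ones is an assumption chosen so that the result equals the row sums of 𝐴.

```fortran
! Sketch: y = A*x for the 4x4 example matrix stored in one-based CSR form.
program csr_matvec
  implicit none
  integer, parameter :: m = 4, nnz = 9
  integer :: ia(m+1) = [1, 3, 5, 8, 10]               ! row offsets
  integer :: ja(nnz) = [1, 2, 2, 3, 1, 3, 4, 2, 4]    ! column indices
  double precision :: a(nnz) = [1d0, 7d0, 2d0, 8d0, 5d0, 3d0, 9d0, 6d0, 4d0]  ! values
  double precision :: x(m), y(m)
  integer :: i, k

  x = 1d0                          ! illustrative input: all ones
  do i = 1, m
    y(i) = 0d0
    do k = ia(i), ia(i+1) - 1      ! nonzero entries of row i
      y(i) = y(i) + a(k) * x(ja(k))
    end do
  end do

  print *, y                       ! row sums of A: 8, 10, 17, 10
end program csr_matvec
```

The inner loop bounds show why the extra last row offset is needed: `ia(i+1) - 1` must be defined for the last row as well.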
  • 32. Which BLAS Library Functions Are Required?
      • mkl_dcsrgemv : computes the matrix-vector product of a sparse general matrix stored in the CSR format (3-array variation) with one-based indexing, in double precision. Used in the 𝐴𝑥 computation.
          • call mkl_dcsrgemv(transa, m, a, ia, ja, x, y)
          • transa : selects 𝐴𝑥 (transa = 'N' or 'n') or 𝐴’𝑥 (transa = 'T', 't', 'C', or 'c')
          • m : number of rows of A
          • a : values array of A in CSR format
          • ia : row offset array of A in CSR format
          • ja : column indices array of A in CSR format
          • x : 𝑥 vector
          • y : output (𝐴𝑥)
      • dcopy : copies vector 𝑥 to vector 𝑦 (𝑦 = 𝑥).
          • call dcopy(n, x, incx, y, incy)
          • n : number of elements in vectors 𝑥 and 𝑦
          • x : input, 𝑥 vector
          • y : output, 𝑦 vector
          • incx, incy : increments (strides) between elements of 𝑥 and 𝑦; 1 for contiguous vectors
  • 33. Which BLAS Library Functions Are Required?
      • ddot : computes a vector-vector dot product, 𝑥 ⋅ 𝑦.
          • It is a function, not a subroutine.
          • ddot(n, x, incx, y, incy), or dot(x, y) with the Fortran 95 interface
          • x, y : 𝑥, 𝑦 vectors
      • daxpy : computes a vector-scalar product and adds the result to a vector. DAXPY : Double-precision A·X Plus Y
          • 𝑦 = 𝑎 ⋅ 𝑥 + 𝑦
          • call daxpy(n, a, x, incx, y, incy)
          • n : number of elements in vectors 𝑥 and 𝑦
          • a : scalar 𝑎
          • x : input, 𝑥 vector
          • y : input and output, 𝑦 vector
      • dnrm2 : computes the Euclidean norm of a vector, ‖𝑥‖.
          • It is a function, not a subroutine.
          • dnrm2(n, x, incx), or nrm2(x) with the Fortran 95 interface
          • n : number of elements in vector 𝑥
          • x : input, 𝑥 vector
      • In all of these, incx and incy are the increments (strides) between elements; use 1 for contiguous vectors.
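The four Level 1 routines can be tried together in a short program. This sketch uses the standard Fortran 77 BLAS calling sequences with explicit increments. The vector values are illustrative, and the build line is an assumption about the local setup (for example `gfortran demo.f90 -lblas` where a reference BLAS is installed, or the MKL link line used on your system).

```fortran
! Sketch: dcopy, ddot, dnrm2, and daxpy through the Fortran 77 BLAS interface.
program blas_level1_demo
  implicit none
  integer, parameter :: n = 3
  double precision :: x(n), y(n), alpha
  double precision, external :: ddot, dnrm2   ! these two are functions, not subroutines

  x = [3d0, 0d0, 4d0]

  call dcopy(n, x, 1, y, 1)                   ! y = x
  print *, 'x.y   =', ddot(n, x, 1, y, 1)     ! 9 + 0 + 16 = 25
  print *, '||x|| =', dnrm2(n, x, 1)          ! sqrt(25) = 5

  alpha = 2d0
  call daxpy(n, alpha, x, 1, y, 1)            ! y = alpha*x + y = 3*x
  print *, 'y     =', y                       ! 9, 0, 12
end program blas_level1_demo
```

Declaring `ddot` and `dnrm2` as `double precision, external` matters under `implicit none`; without a declared return type the compiler cannot interpret the function results correctly.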