Hudson Mendes

Lead Java Software Engineer @ AIQUDO

twitter.com/hudsonmendes

linkedin.com/in/hudsonmendes

medium.com/@hudsonmendes
SIMD (VECTORISATION)
SINGLE INSTRUCTION
MULTIPLE DATA
SOUJAVA & BELFASTJUG
WHAT IS AN IMAGE FOR NEURAL NETS / AI?
▸ WHAT ARE IMAGES FOR NEURAL NETS / AI?
▸ WHY DOES IT MATTER FOR US / FOR JAVA?
▸ CERN’S COLT, JNI AND PANAMA
SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION)
WHAT IS AN IMAGE FOR NEURAL NETS / AI?
Hudson Mendes, SouJava & Belfast JUG
Figure: the image shown as three matrices of pixel intensities, one per colour channel (RED, GREEN, BLUE). Sample rows: 255 255 255 244 230 …, 230 210 200 130 130 …, 210 200 200 130 130 …
WHAT IS AN IMAGE FOR NEURAL NETS / AI?
▸ Humans: colours & other "features"
▸ AI: numbers
▸ Which numbers? RGB matrices
Figure: the RED, GREEN and BLUE matrices are unrolled, one after another, into a single column vector (255, 255, 230, 230, 210, 200, 130, 130, 130, 120, … 89, 60, 65, 50, 20, 20, 20, … 20, 0, 12, 12, 0).
▸ 64 x 64 x 3 = 12,288
▸ 1 Image = 1 Feature Vector
Figure: a 64 x 64 pixel image with RED, GREEN and BLUE channels unrolls into one vector of 64 x 64 x 3 = 12,288 values.
▸ 1000 Images = 1000 Feature Vectors
64X64 IMAGE = ARRAY OF 12,288
1024X1024 IMAGE = ARRAY OF 3,145,728
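The arithmetic above (width x height x 3 channels) can be sketched in Java. This is only an illustration, not code from the talk: the class name `FeatureVectors`, the method `flatten` and the `channels[channel][row][column]` layout are assumptions made for the example.

```java
class FeatureVectors {
    /**
     * Flattens channels[c][row][col] into one vector:
     * all of channel 0 first, then channel 1, then channel 2.
     */
    static double[] flatten(int[][][] channels) {
        int height = channels[0].length;
        int width = channels[0][0].length;
        double[] vector = new double[channels.length * height * width];
        int k = 0;
        for (int[][] channel : channels) {
            for (int[] row : channel) {
                for (int value : row) {
                    vector[k++] = value;
                }
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        int[][][] rgb = new int[3][64][64]; // RED, GREEN, BLUE planes (hypothetical image)
        rgb[0][0][0] = 255;
        System.out.println(flatten(rgb).length); // 64 x 64 x 3 = 12288
    }
}
```

The same call on a `new int[3][1024][1024]` input yields a vector of 3,145,728 values, which is why the per-element cost of every later operation matters.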
CNN, or Convolutional Neural Nets

σ(X * W.T + b)

σ(x) => 1 / (1 + Math.pow(Math.E, -x))

X => feature vector of the image

X, W, b: not numbers, but vectors

X, W, b: LARGE VECTORS

=> Vectorisation <=
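To make the formula concrete, here is a minimal scalar sketch for a single output unit, using the standard sigmoid 1 / (1 + e^(-z)). The class name `LogisticUnit` and the method names are hypothetical, and treating `w` as one weight vector and `b` as one bias is a simplification chosen for the example.

```java
class LogisticUnit {
    /** Standard sigmoid: 1 / (1 + e^(-z)). */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Computes sigmoid(w . x + b) one multiply-add at a time (scalar loop). */
    static double activate(double[] x, double[] w, double b) {
        if (x.length != w.length) {
            throw new IllegalArgumentException("x and w must have the same length");
        }
        double z = b;
        for (int i = 0; i < x.length; i++) {
            z += x[i] * w[i]; // one element per iteration
        }
        return sigmoid(z);
    }
}
```

With a 12,288-element feature vector, this loop body runs 12,288 times per unit and per image, which is the workload that vectorisation targets.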
SIMD

Single Instruction, Multiple Data

How much faster?
EXAMPLE WITH NUMPY
Figure: chart of elapsed time in ms (axis 0 to 500) for SiSD vs SiMD on inputs of 500 x 5, 5000 x 5 and 50000 x 5.
WELL, I DON’T DO AI OR NNS. DOES IT MATTER TO ME?
▸ WHAT ARE IMAGES FOR NEURAL NETS / AI?
▸ WHY DOES IT MATTER FOR US / FOR JAVA?
▸ CERN’S COLT, JNI AND PANAMA
SIMD

Single "Instruction"?

Add, Subtract, Multiply, Divide

but also Log, Exp, Sqrt, etc

Multiple "Data"?

Numerical data:

Int, Long, Float, Double, etc
SISD IMPLEMENTATIONS: SUM STREAM OF DOUBLES
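The code shown on this slide did not survive extraction. As a stand-in, here is a minimal sketch of summing doubles one element at a time, both as a plain loop and with `DoubleStream`. The class and method names are hypothetical and this is not the author's benchmark code.

```java
import java.util.stream.DoubleStream;

class ScalarSum {
    /** Plain loop: one addition per element. */
    static double sumLoop(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    /** Same result through the streams API. */
    static double sumStream(double[] values) {
        return DoubleStream.of(values).sum();
    }
}
```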
SISD IMPLEMENTATIONS: ADDING ELEMENTS OF 2 ARRAYS
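The original slide code is missing from this transcript. A hedged sketch of element-wise addition in plain Java could look like the following; `ScalarAdd` and `add` are names invented for the illustration.

```java
class ScalarAdd {
    /** Returns a new array where out[i] = a[i] + b[i]. */
    static double[] add(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i]; // one pair of elements per iteration
        }
        return out;
    }
}
```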
SISD IMPLEMENTATIONS: EUCLIDEAN DISTANCE BETWEEN DOUBLES IN 2 ARRAYS
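The slide's code is not in the transcript, so the following is only a sketch of the usual scalar approach: subtract, square, accumulate, then take the square root. The names `ScalarDistance` and `euclidean` are assumptions.

```java
class ScalarDistance {
    /** Euclidean distance: sqrt(sum over i of (a[i] - b[i])^2). */
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares);
    }
}
```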
SISD IMPLEMENTATIONS: LOGS OF MATRICES
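Again the slide's code is missing. One plausible scalar version, assuming the matrix is a `double[][]` and the natural logarithm is intended, is sketched below; the class and method names are hypothetical.

```java
class ScalarLog {
    /** Returns a new matrix where out[r][c] = ln(m[r][c]). */
    static double[][] log(double[][] m) {
        double[][] out = new double[m.length][];
        for (int r = 0; r < m.length; r++) {
            out[r] = new double[m[r].length];
            for (int c = 0; c < m[r].length; c++) {
                out[r][c] = Math.log(m[r][c]); // one element per call
            }
        }
        return out;
    }
}
```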
SISD (OR SINGLE INSTRUCTION SINGLE DATA) IN BYTECODE
▸ Given a vector N of size n
▸ Each element N[i] receives its own instruction (Single Instruction, Single Data)
▸ Both are O(n), but SiSD costs about 2n versus about n for SiMD
Figure: chart of elapsed time in ms (axis 0 to 500) for SiSD vs SiMD on inputs of 500 x 5, 1000 x 5, 5000 x 5, 10000 x 5, 50000 x 5 and 100000 x 5.
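The bytecode listing from the slide is not in the transcript. To see scalar bytecode first-hand, a small class such as the one below can be compiled and inspected with `javap -c BytecodeDemo`. The class is a hypothetical example; the comments name the JVM instructions that operate on one value at a time.

```java
class BytecodeDemo {
    /** javac compiles this body to: iload_0, iload_1, iadd, ireturn. */
    static int add(int a, int b) {
        return a + b;
    }

    /**
     * Each loop iteration loads one array element (daload) and performs
     * one scalar addition (dadd); the JVM instruction set has no vector form.
     */
    static double sum(double[] values) {
        double total = 0.0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }
}
```

Any vectorisation therefore has to happen below the bytecode level, either in the JIT compiler or in native code.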
HOW TO DO SIMD IN JAVA THEN?
▸ WHAT ARE IMAGES FOR NEURAL NETS / AI?
▸ WHY DOES IT MATTER FOR US / FOR JAVA?
▸ CERN’S COLT, JNI AND PANAMA
LET’S ASK THE DATA SCIENTISTS?
CLASSIC SCIENTIFIC COMPUTING REFERENCES POINT TO…
CLASSIC SCIENTIFIC COMPUTING: CERN’S COLT
1 GFLOPS = 1,000,000,000 floating-point operations per second
That looks pretty fast!
CERN’s Colt, Java SIMD Library?
Not Faster than Serial, but WHY?
Digging Source Code: Not SIMD!
Fast, but not SIMD
DEEP LEARNING DATA SCIENTISTS SAY…
DEEP LEARNING DATA SCIENTIST: DEEPLEARNING4J
Faster than all of THEM!
Digging Source Code:
Yes, SIMD! done with JNI
SO, IS JNI (JAVA NATIVE INTERFACE) THE ONLY WAY TO DO SIMD?
WAYS TO DO SIMD?

JAVA: 1 + 2 = 3
BYTECODE:
iconst_1
iconst_2
iadd
ASSEMBLER:
MOV AX,@DATA
MOV DS,AX
MOV AX,OPR1
MOV BX,OPR2
CLC
ADD AX,BX
MOV DI,OFFSET RESULT
MOV [DI], AX
MOV AH,09H
MOV DX,OFFSET RESULT
INT 21H
MOV AH,4CH
INT 21H
END

JAVA: VECTOR.SUM()
BYTECODE (illustrative only, the JVM has no vector bytecodes):
ivector_1
vector_add
ASSEMBLER:
EXPORT XCORR_KERNEL
xcorr_kernel PROC
VMOV.I32 q0, #0
CMP r3, #0
BLE xcorr_kernel_done
VLD1.16 {d3}, [r2]!
SUBS r3, r3, #4
BLE xcorr_kernel_process4_done
(…)

SO, YES, AT THE MINUTE JNI IS PRETTY MUCH THE ONLY WAY…
IS THERE ANYTHING BETTER THAN JNI IN THE NEAR FUTURE?
YES!
JAVA9+ SUPERWORD
Source http://prestodb.rocks/code/simd/
"Exploiting Superword Level Parallelism with Multimedia Instruction Sets", Samuel Larsen and Saman Amarasinghe, MIT
HTTP://GROUPS.CSAIL.MIT.EDU/CAG/SLP/SLP-PLDI-2000.PDF
ON BY DEFAULT ON JAVA 9+
FOR SUPERWORD, MUST NOT HAVE:
•AN OR CONDITION AS THE LOOP CONDITION
•A NON-INLINED METHOD INSIDE THE LOOP
•AN ARBITRARY METHOD AS THE LOOP CONDITION
•MANUAL UNROLLING OF THE LOOP
•A LONG AS THE LOOP VARIABLE
•MULTIPLE EXIT POINTS
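A loop that respects every item on that list might look like the sketch below: an `int` counter, a simple bound as the only condition, one exit point, no method calls and no manual unrolling. The class name `SuperwordFriendly` and the multiply-add kernel are assumptions chosen for illustration. Whether the JIT actually emits SIMD instructions for it depends on the JVM version, its flags and the CPU, so treat it as a candidate for auto-vectorisation rather than a guarantee.

```java
class SuperwordFriendly {
    /** out[i] = a[i] * b[i] + c[i] for every index of out. */
    static void multiplyAdd(float[] a, float[] b, float[] c, float[] out) {
        if (a.length != out.length || b.length != out.length || c.length != out.length) {
            throw new IllegalArgumentException("all arrays must have the same length");
        }
        // int loop variable, single condition, single exit, no calls in the body
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] * b[i] + c[i];
        }
    }
}
```

One way to check the effect on HotSpot is to benchmark the method (for example with JMH) once normally and once with `-XX:-UseSuperWord`, then compare the timings.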
PROJECT PANAMA
Source http://openjdk.java.net/projects/panama/
BETTER NATIVE API THAN JNI
Hudson Mendes

Lead Java Software Engineer @ AIQUDO

twitter.com/hudsonmendes

linkedin.com/in/hudsonmendes

medium.com/@hudsonmendes
THANK YOU!
JMH CODE AVAILABLE AT

HTTPS://GITHUB.COM/HUDSONMENDES/BELFASTJUG-SAMPLE-3
SOUJAVA & BELFASTJUG

More Related Content

What's hot

Student manual
Student manualStudent manual
Student manual
ec931657
 

What's hot (17)

Capitulo 5 Soluciones Purcell 9na Edicion
Capitulo 5 Soluciones Purcell 9na EdicionCapitulo 5 Soluciones Purcell 9na Edicion
Capitulo 5 Soluciones Purcell 9na Edicion
 
14 mecv14 dvd
14 mecv14 dvd14 mecv14 dvd
14 mecv14 dvd
 
Sistemas de múltiples grados de libertad
Sistemas de múltiples grados de libertadSistemas de múltiples grados de libertad
Sistemas de múltiples grados de libertad
 
POTENCIAS Y RADICALES
POTENCIAS Y RADICALESPOTENCIAS Y RADICALES
POTENCIAS Y RADICALES
 
Expectation propagation for latent Dirichlet allocation
Expectation propagation for latent Dirichlet allocationExpectation propagation for latent Dirichlet allocation
Expectation propagation for latent Dirichlet allocation
 
第7回 大規模データを用いたデータフレーム操作実習(1)
第7回 大規模データを用いたデータフレーム操作実習(1)第7回 大規模データを用いたデータフレーム操作実習(1)
第7回 大規模データを用いたデータフレーム操作実習(1)
 
Final_Presentation
Final_PresentationFinal_Presentation
Final_Presentation
 
Solutions manual for calculus an applied approach brief international metric ...
Solutions manual for calculus an applied approach brief international metric ...Solutions manual for calculus an applied approach brief international metric ...
Solutions manual for calculus an applied approach brief international metric ...
 
Introduction to Machine Learning @ Mooncascade ML Camp
Introduction to Machine Learning @ Mooncascade ML CampIntroduction to Machine Learning @ Mooncascade ML Camp
Introduction to Machine Learning @ Mooncascade ML Camp
 
Data visualization with Python and SVG
Data visualization with Python and SVGData visualization with Python and SVG
Data visualization with Python and SVG
 
Integral table
Integral tableIntegral table
Integral table
 
Realibity design
Realibity designRealibity design
Realibity design
 
Student manual
Student manualStudent manual
Student manual
 
Modeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential EquationModeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential Equation
 
Hideitsu Hino
Hideitsu HinoHideitsu Hino
Hideitsu Hino
 
Diffusion kernels on SNP data embedded in a non-Euclidean metric
Diffusion kernels on SNP data embedded in a non-Euclidean metricDiffusion kernels on SNP data embedded in a non-Euclidean metric
Diffusion kernels on SNP data embedded in a non-Euclidean metric
 
Ch16s
Ch16sCh16s
Ch16s
 

Similar to Belfast JUG, SIMD (Vectorial) Operations

232 md5-considered-harmful-slides
232 md5-considered-harmful-slides232 md5-considered-harmful-slides
232 md5-considered-harmful-slides
Dan Kaminsky
 

Similar to Belfast JUG, SIMD (Vectorial) Operations (20)

Multichannel IoT CAUSAL digital twin
Multichannel IoT CAUSAL digital twinMultichannel IoT CAUSAL digital twin
Multichannel IoT CAUSAL digital twin
 
Black Hat Europe 2015 - Time and Position Spoofing with Open Source Projects
Black Hat Europe 2015 - Time and Position Spoofing with Open Source ProjectsBlack Hat Europe 2015 - Time and Position Spoofing with Open Source Projects
Black Hat Europe 2015 - Time and Position Spoofing with Open Source Projects
 
Wim Remes SOURCE Boston 2011
Wim Remes SOURCE Boston 2011 Wim Remes SOURCE Boston 2011
Wim Remes SOURCE Boston 2011
 
Solr and Machine Vision - Scott Cote, Lucidworks & Trevor Grant, IBM
Solr and Machine Vision - Scott Cote, Lucidworks & Trevor Grant, IBMSolr and Machine Vision - Scott Cote, Lucidworks & Trevor Grant, IBM
Solr and Machine Vision - Scott Cote, Lucidworks & Trevor Grant, IBM
 
Next Top Data Model by Ian Plosker
Next Top Data Model by Ian PloskerNext Top Data Model by Ian Plosker
Next Top Data Model by Ian Plosker
 
Extending Structured Streaming Made Easy with Algebra with Erik Erlandson
Extending Structured Streaming Made Easy with Algebra with Erik ErlandsonExtending Structured Streaming Made Easy with Algebra with Erik Erlandson
Extending Structured Streaming Made Easy with Algebra with Erik Erlandson
 
Analysis of an OSS supply chain attack - How did 8 millions developers downlo...
Analysis of an OSS supply chain attack - How did 8 millions developers downlo...Analysis of an OSS supply chain attack - How did 8 millions developers downlo...
Analysis of an OSS supply chain attack - How did 8 millions developers downlo...
 
Blur Filter - Hanpo
Blur Filter - HanpoBlur Filter - Hanpo
Blur Filter - Hanpo
 
How I learned to stop worrying and love the dark silicon apocalypse.pdf
How I learned to stop worrying and love the dark silicon apocalypse.pdfHow I learned to stop worrying and love the dark silicon apocalypse.pdf
How I learned to stop worrying and love the dark silicon apocalypse.pdf
 
Image Recognition on Streaming Data
Image Recognition  on Streaming DataImage Recognition  on Streaming Data
Image Recognition on Streaming Data
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Is writing performant code too expensive?
Is writing performant code too expensive? Is writing performant code too expensive?
Is writing performant code too expensive?
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtionNÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
 
Cyberspectrum Sydney 0x01 Introduction to SDR
Cyberspectrum Sydney   0x01 Introduction to SDRCyberspectrum Sydney   0x01 Introduction to SDR
Cyberspectrum Sydney 0x01 Introduction to SDR
 
Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)
 
[第34回 WBA若手の会勉強会] Microsoft AI platform
[第34回 WBA若手の会勉強会] Microsoft AI platform[第34回 WBA若手の会勉強会] Microsoft AI platform
[第34回 WBA若手の会勉強会] Microsoft AI platform
 
232 md5-considered-harmful-slides
232 md5-considered-harmful-slides232 md5-considered-harmful-slides
232 md5-considered-harmful-slides
 
DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann
DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumannDSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann
DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann
 
Reading2018 kikuta
Reading2018 kikutaReading2018 kikuta
Reading2018 kikuta
 

Recently uploaded

"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
mphochane1998
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
jaanualu31
 

Recently uploaded (20)

DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Learn the concepts of Thermodynamics on Magic Marks
Learn the concepts of Thermodynamics on Magic MarksLearn the concepts of Thermodynamics on Magic Marks
Learn the concepts of Thermodynamics on Magic Marks
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
Air Compressor reciprocating single stage
Air Compressor reciprocating single stageAir Compressor reciprocating single stage
Air Compressor reciprocating single stage
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 

Belfast JUG, SIMD (Vectorial) Operations

  • 1. Hudson Mendes
 Lead Java Software Engineer @ AIQUDO
 twitter.com/hudsonmendes
 linkedin.com/in/hudsonmendes
 medium.com/@hudsonmendes SIMD (VECTORISATION) SINGLE INSTRUCTION MULTIPLE DATA SOUJAVA & BELFASTJUG
  • 2. WHAT IS AN IMAGE FOR
 NEURAL NETS / AI? WHAT ARE IMAGES FOR
 NEURAL NETS / AI? WHY DOES IT MATTER
 FOR US / FOR JAVA? CERN’S COLT
 JNI AND PANAMA
  • 3. SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) WHAT IS AN IMAGE FOR NEURAL NETS / AI? Hudson MendesSouJava & Belfast JUG
  • 4. WHAT IS AN IMAGE FOR NEURAL NETS / AI? Hudson Mendes SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) SouJava & Belfast JUG
  • 5. WHAT IS AN IMAGE FOR NEURAL NETS / AI? Hudson Mendes 255 255 255 244 230 … 255 255 255 230 230 … 255 255 230 210 200 … 230 210 200 130 130 … 210 200 200 200 130 130 … 210 200 200 130 130 … … … … … … … 210 200 130 130 130 … RED GREEN BLUE 255 255 255 244 230 … 255 255 255 230 230 … 255 255 230 210 200 … 230 210 200 130 130 … 210 200 200 130 130 … 210 200 200 130 130 … … … … … … … 89 60 65 50 20 20 255 255 255 244 230 … 255 255 255 230 230 … 255 255 230 210 200 … 230 210 200 130 130 … 210 200 200 130 130 … 210 200 200 130 130 … … … … … … … … 20 0 12 12 0 SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) SouJava & Belfast JUG
  • 6. WHAT IS AN IMAGE FOR NEURAL NETS / AI? Hudson Mendes ▸ Humans: colours
 & other "features" ▸ AI: numbers ▸ Which numbers?
 RGB matrices 255 255 255 244 230 … 255 255 255 230 230 … 255 255 230 210 200 … 230 210 200 130 130 … 210 200 200 200 130 130 … 210 200 200 130 130 … … … … … … … 210 200 130 130 130 … RED GREEN BLUE 255 255 255 244 230 … 255 255 255 230 230 … 255 255 230 210 200 … 230 210 200 130 130 … 210 200 200 130 130 … 210 200 200 130 130 … … … … … … … 89 60 65 50 20 20 255 255 255 244 230 … 255 255 255 230 230 … 255 255 230 210 200 … 230 210 200 130 130 … 210 200 200 130 130 … 210 200 200 130 130 … … … … … … … … 20 0 12 12 0 255 255 230 230 210 200 130 130 130 120 … 89 60 65 50 20 20 20 … 20 0 12 12 0 SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) SouJava & Belfast JUG
  • 7. WHAT IS AN IMAGE FOR NEURAL NETS / AI? Hudson Mendes ▸ Humans: colours
 & other "features" ▸ AI: numbers ▸ Which numbers?
 RGB matrices ▸ 64 x 64 x 3 = 12,288 ▸ 1 Image =
 1 Feature Vector 255 255 230 230 210 200 130 130 130 120 … 89 60 65 50 20 20 20 … 20 0 12 12 0 64 pixels 64pixels 64x64x3=12,288 255 255 255 244 230 … 255 255 255 230 230 … 255 255 230 210 200 … 230 210 200 130 130 … 210 200 200 200 130 130 … 210 200 200 130 130 … … … … … … … 210 200 130 130 130 … RED GREEN BLUE 255 255 255 244 230 … 255 255 255 230 230 … 255 255 230 210 200 … 230 210 200 130 130 … 210 200 200 130 130 … 210 200 200 130 130 … … … … … … … 89 60 65 50 20 20 255 255 255 244 230 … 255 255 255 230 230 … 255 255 230 210 200 … 230 210 200 130 130 … 210 200 200 130 130 … 210 200 200 130 130 … … … … … … … … 20 0 12 12 0 SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) SouJava & Belfast JUG
  • 8. WHAT IS AN IMAGE FOR NEURAL NETS / AI? Hudson Mendes ▸ Humans: colours
 & other "features" ▸ AI: numbers ▸ Which numbers?
 RGB matrices ▸ 64 x 64 x 3 = 12,288 ▸ 1000 Images =
 1000 Feature Vectors 255 255 230 230 210 200 130 130 130 120 … 89 60 65 50 20 20 20 … 20 0 12 12 0 64 pixels 64pixels 64x64x3=12,288 255 255 255 244 230 … 255 255 255 230 230 … 255 255 230 210 200 … 230 210 200 130 130 … 210 200 200 200 130 130 … 210 200 200 130 130 … … … … … … … 210 200 130 130 130 … RED GREEN BLUE 255 255 255 244 230 … 255 255 255 230 230 … 255 255 230 210 200 … 230 210 200 130 130 … 210 200 200 130 130 … 210 200 200 130 130 … … … … … … … 89 60 65 50 20 20 255 255 255 244 230 … 255 255 255 230 230 … 255 255 230 210 200 … 230 210 200 130 130 … 210 200 200 130 130 … 210 200 200 130 130 … … … … … … … … 20 0 12 12 0 SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) SouJava & Belfast JUG
  • 9. 64X64 IMAGE = ARRAY OF 12,288
 
 1024X1024 =
 ARRAY OF 3,145,728
  • 10. CNN, or Convolutional Neural Nets
 ς(X * W.T + b)
 ς(x) => 1 / (1 + Math.pow(Math.E, x) Hudson MendesSouJava & Belfast JUG
  • 11. CNN, or Convolutional Neural Nets
 ς(X * W.T + b)
 X => feature vector of the image Hudson MendesSouJava & Belfast JUG
  • 12. CNN, or Convolutional Neural Nets
 ς(X * W.T + b)
 X, W, b: not numbers, but vectors Hudson MendesSouJava & Belfast JUG
  • 13. CNN, or Convolutional Neural Nets
 ς(X * W.T + b)
 X, W, b: LARGE VECTORS Hudson MendesSouJava & Belfast JUG
  • 14. CNN, or Convolutional Neural Nets
 => Vectorisation <= Hudson MendesSouJava & Belfast JUG
  • 15. SIMD
 Single Instruction, Multiple Data
 Hudson Mendes SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) SouJava & Belfast JUG
  • 16. SIMD
 How much faster?
 Hudson Mendes SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) SouJava & Belfast JUG
  • 17. Hudson Mendes SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) EXAMPLE WITH NUMPY elapsedinms 0 125 250 375 500 500 X 5 5000 X 5 50000 X 5 SiSD SiMD SouJava & Belfast JUG
  • 18. WELL, I DON’T DO AI OR NNS.
 DOES IT MATTER TO ME? WHAT ARE IMAGES FOR
 NEURAL NETS / AI? WHY DOES IT MATTER
 FOR US / FOR JAVA? CERN’S COLT
 JNI AND PANAMA
  • 19. Hudson Mendes SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) SIMD
 Single “Instruction"? SouJava & Belfast JUG
  • 20. Hudson Mendes SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) SIMD
 Single “Instruction"?
 Add, Subtract, Multiply, Divide
 but also Log, Exp, Sqrt, etc SouJava & Belfast JUG
  • 21. Hudson Mendes SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) SIMD
 Multiple “Data"? SouJava & Belfast JUG
  • 22. Hudson Mendes SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) SIMD
 Multiple “Data”?
 Numerical data:
 Int, Long, Float, Double, etc SouJava & Belfast JUG
  • 23. Hudson Mendes SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) SISD IMPLEMENTATIONS: SUM STREAM OF DOUBLES SouJava & Belfast JUG
  • 24. Hudson Mendes SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) SISD IMPLEMENTATIONS: ADDING ELEMENTS OF 2 ARRAYS SouJava & Belfast JUG
  • 25. Hudson Mendes SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) SISD IMPLEMENTATIONS: EUCLIDIAN DISTANCE BTW DOUBLE IN 2 ARRAYS SouJava & Belfast JUG
  • 26. Hudson Mendes SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) SISD IMPLEMENTATIONS: LOGS OF MATRICES SouJava & Belfast JUG
  • 27. Hudson Mendes SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) SISD (OR SINGLE INSTRUCTION SINGLE DATA) IN BYTECODE SouJava & Belfast JUG
  • 28. Hudson Mendes SIMD, SINGLE INSTRUCTION MULTIPLE DATA (VECTORISATION) SISD (OR SINGLE INSTRUCTION SINGLE DATA) IN BYTECODE ▸ Given Vector of size N ▸ Each 1 N (N[i]) receives 1 instruction (Single Instruction, Single Data) ▸ SiSd O(n) = 2n x > SiMd O(n) = n elapsedinms 0 250 500 500 X 5 1000 X 5 5000 X 5 10000 X 5 50000 X 5 100000 X 5 SiSD SiMD SouJava & Belfast JUG
  • 29. HOW TO DO SIMD IN JAVA THEN?
 WHAT ARE IMAGES FOR NEURAL NETS / AI?
 WHY DOES IT MATTER FOR US / FOR JAVA?
 CERN’S COLT
 JNI AND PANAMA
  • 30. LET’S ASK THE DATA SCIENTISTS?
  • 31. CLASSIC SCIENTIFIC COMPUTING REFERENCES POINT TO…
  • 32. CLASSIC SCIENTIFIC COMPUTING: CERN’S COLT
 1 GFLOPS = 1,000,000,000 floating-point operations per second
  • 33. CLASSIC SCIENTIFIC COMPUTING: CERN’S COLT
 1 GFLOPS = 1,000,000,000 floating-point operations per second
 That looks pretty fast!
  • 34. CLASSIC SCIENTIFIC COMPUTING: CERN’S COLT
 1 GFLOPS = 1,000,000,000 floating-point operations per second
 That looks pretty fast!
 CERN’s Colt, Java SIMD Library?
  • 35. CLASSIC SCIENTIFIC COMPUTING: CERN’S COLT
  • 36. CLASSIC SCIENTIFIC COMPUTING: CERN’S COLT
  • 37. CLASSIC SCIENTIFIC COMPUTING: CERN’S COLT
 Not faster than serial, but WHY?
  • 38. CLASSIC SCIENTIFIC COMPUTING: CERN’S COLT
  • 39. CLASSIC SCIENTIFIC COMPUTING: CERN’S COLT
 Digging Source Code: Not SIMD!
  • 40. CLASSIC SCIENTIFIC COMPUTING: CERN’S COLT
 1 GFLOPS = 1,000,000,000 floating-point operations per second
 Fast, but not SIMD
 CERN’s Colt, Java SIMD Library?
  • 41. DEEP LEARNING DATA SCIENTISTS SAY…
  • 42. DEEP LEARNING DATA SCIENTIST: DEEPLEARNING4J
  • 43. DEEP LEARNING DATA SCIENTIST: DEEPLEARNING4J
  • 44. DEEP LEARNING DATA SCIENTIST: DEEPLEARNING4J
 Faster than all of THEM!
  • 45. DEEP LEARNING DATA SCIENTIST: DEEPLEARNING4J
 Digging Source Code:
  • 46. DEEP LEARNING DATA SCIENTIST: DEEPLEARNING4J
 Digging Source Code:
  • 47. DEEP LEARNING DATA SCIENTIST: DEEPLEARNING4J
 Digging Source Code: Yes, SIMD! Done with JNI
  • 48. SO, IS JNI (JAVA NATIVE INTERFACE) THE ONLY WAY TO DO SIMD?
  • 49. WAYS TO DO SIMD?
 JAVA: 1 + 2 = 3
 BYTECODE: iconst_1 iconst_2 iadd
 ASSEMBLER: MOV AX,@DATA; MOV DS,AX; MOV AX,OPR1; MOV BX,OPR2; CLC; ADD AX,BX; MOV DI,OFFSET RESULT; MOV [DI], AX; MOV AH,09H; MOV DX,OFFSET RESULT; INT 21H; MOV AH,4CH; INT 21H; END
  • 50. WAYS TO DO SIMD?
 JAVA: 1 + 2 = 3
 BYTECODE: iconst_1 iconst_2 iadd
 ASSEMBLER: MOV AX,@DATA; MOV DS,AX; MOV AX,OPR1; MOV BX,OPR2; CLC; ADD AX,BX; MOV DI,OFFSET RESULT; MOV [DI], AX; MOV AH,09H; MOV DX,OFFSET RESULT; INT 21H; MOV AH,4CH; INT 21H; END
 JAVA: VECTOR.SUM()
 BYTECODE: ivector_1 vector_add
 ASSEMBLER: EXPORT XCORR_KERNEL; xcorr_kernel PROC; VMOV.I32 q0, #0; CMP r3, #0; BLE xcorr_kernel_done; VLD1.16 {d3}, [r2]!; SUBS r3, r3, #4; BLE xcorr_kernel_process4_done (…)
  • 51. WAYS TO DO SIMD?
 JAVA: VECTOR.SUM()
 BYTECODE: ivector_1 vector_add
 ASSEMBLER: EXPORT XCORR_KERNEL; xcorr_kernel PROC; VMOV.I32 q0, #0; CMP r3, #0; BLE xcorr_kernel_done; VLD1.16 {d3}, [r2]!; SUBS r3, r3, #4; BLE xcorr_kernel_process4_done (…)
 SO, YES - AT THE MINUTE JNI IS PRETTY MUCH THE ONLY WAY…
  • 52. IS THERE ANYTHING BETTER THAN JNI IN THE NEAR FUTURE?
  • 53. IS THERE ANYTHING BETTER THAN JNI IN THE NEAR FUTURE?
 YES!
  • 54. JAVA 9+ SUPERWORD
 Source http://prestodb.rocks/code/simd/
  • 55. JAVA 9+ SUPERWORD
 Source http://prestodb.rocks/code/simd/
 "Exploiting Superword Level Parallelism with Multimedia Instruction Sets", Samuel Larsen and Saman Amarasinghe, MIT
 HTTP://GROUPS.CSAIL.MIT.EDU/CAG/SLP/SLP-PLDI-2000.PDF
  • 56. JAVA 9+ SUPERWORD
 Source http://prestodb.rocks/code/simd/
 FOR SUPERWORD, MUST NOT HAVE:
 ▸ AN OR CONDITION AS THE LOOP CONDITION
 ▸ A NON-INLINED METHOD INSIDE THE LOOP
 ▸ AN ARBITRARY METHOD AS THE LOOP CONDITION
 ▸ MANUAL UNROLLING OF THE LOOP
 ▸ A LONG AS THE LOOP VARIABLE
 ▸ MULTIPLE EXIT POINTS
  • 57. JAVA 9+ SUPERWORD
 Source http://prestodb.rocks/code/simd/
 ON BY DEFAULT ON JAVA 9+
 FOR SUPERWORD, MUST NOT HAVE:
 ▸ AN OR CONDITION AS THE LOOP CONDITION
 ▸ A NON-INLINED METHOD INSIDE THE LOOP
 ▸ AN ARBITRARY METHOD AS THE LOOP CONDITION
 ▸ MANUAL UNROLLING OF THE LOOP
 ▸ A LONG AS THE LOOP VARIABLE
 ▸ MULTIPLE EXIT POINTS
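The checklist above can be read as a loop shape. The sketch below contrasts a loop that follows every rule with one that breaks the "long as the loop variable" rule. Whether the JIT actually vectorises the first loop depends on the JVM build, the CPU and the flags in use, and is not verified here. The names `SuperwordFriendly`, `scale` and `scaleWithLongIndex` are hypothetical.

```java
// Hypothetical illustration of loop shapes; it makes no claim about the machine code produced.
final class SuperwordFriendly {

    // Follows the checklist: int index, a single simple bound,
    // one exit point, no method calls and no manual unrolling in the body.
    static void scale(float[] in, float[] out, float factor) {
        if (out.length < in.length) {
            throw new IllegalArgumentException("out is too short");
        }
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i] * factor;
        }
    }

    // Same arithmetic, but the loop variable is a long,
    // which the checklist names as an obstacle to SuperWord.
    static void scaleWithLongIndex(float[] in, float[] out, float factor) {
        if (out.length < in.length) {
            throw new IllegalArgumentException("out is too short");
        }
        for (long i = 0; i < in.length; i++) {
            out[(int) i] = in[(int) i] * factor;
        }
    }

    public static void main(String[] args) {
        float[] in = {1f, 2f, 3f};
        float[] out = new float[3];
        scale(in, out, 2f);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

Both methods return identical results; the difference, if any, shows up only in the generated machine code and in benchmark timings.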
  • 58. PROJECT PANAMA
 Source http://openjdk.java.net/projects/panama/
  • 59. PROJECT PANAMA
 Source http://openjdk.java.net/projects/panama/
 BETTER NATIVE API THAN JNI
  • 60. Hudson Mendes
 Lead Java Software Engineer @ AIQUDO
 twitter.com/hudsonmendes
 linkedin.com/in/hudsonmendes
 medium.com/@hudsonmendes
 THANK YOU! JMH CODE AVAILABLE AT
 HTTPS://GITHUB.COM/HUDSONMENDES/BELFASTJUG-SAMPLE-3