MULTI-CORE K-MEANS
BÖHM C.; PERDACHER M.; PLANT C.
SPEAKER: MARTIN PERDACHER
MULTI-CORE K-MEANS
INTRODUCTION
• K-means is a highly relevant use case for knowledge discovery on big data
• We maximise the performance of K-means by applying two types of parallelism:
  • MIMD (Multiple Instruction Multiple Data)
  • SIMD (Single Instruction Multiple Data)
• Avoid branching operations such as if-then:
  • Encode cluster IDs and distances in joint variables
MIMD VS SIMD
IN A SHARED ENVIRONMENT
INTRODUCTION
• Coarse-grained parallelism
  • OpenMP
• Fine-grained parallelism
  • Advanced Vector Extensions (AVX2)
  • Auto-vectorization exists, but is far from efficient.
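For reference, the coarse-grained level on its own can be sketched as a plain OpenMP loop over the data points. This is a minimal sketch, not the authors' code: the name `assign_points` and the row-major `std::vector<double>` layout are assumptions. It deliberately keeps the classical if-then comparison in the inner loop, i.e. the data-dependent branch that the cluster ID coding later in the deck is meant to avoid.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: assigns each of n points to its nearest of k centroids
// (both stored row-major with d columns). Coarse-grained (MIMD) parallelism:
// OpenMP splits the outer loop over points across threads. Compile with
// -fopenmp to enable it; otherwise the pragma is ignored and the loop runs serially.
std::vector<int> assign_points(const std::vector<double>& data,
                               const std::vector<double>& centroids,
                               std::size_t n, std::size_t k, std::size_t d) {
    assert(k > 0 && data.size() == n * d && centroids.size() == k * d);
    std::vector<int> labels(n);
    #pragma omp parallel for schedule(static)
    for (long long ii = 0; ii < static_cast<long long>(n); ++ii) {
        const std::size_t i = static_cast<std::size_t>(ii);
        double best = std::numeric_limits<double>::infinity();
        int best_j = 0;
        for (std::size_t j = 0; j < k; ++j) {
            double dist = 0.0;  // squared Euclidean distance to centroid j
            for (std::size_t t = 0; t < d; ++t) {
                const double diff = data[i * d + t] - centroids[j * d + t];
                dist += diff * diff;
            }
            if (dist < best) {  // data-dependent branch
                best = dist;
                best_j = static_cast<int>(j);
            }
        }
        labels[i] = best_j;
    }
    return labels;
}
```

Each thread writes only its own entries of `labels`, so no synchronization is needed inside the loop.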
AVX REGISTERS
INTRODUCTION
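• 16 registers YMM0, YMM1, …, YMM15, each 256 bit wide
• Each register holds four IEEE-754 doubles of 64 bit: sign (1 bit), exponent (11 bit), fraction (52 bit), representing $\pm 2^{\text{exponent}} \cdot \text{fraction}$
• Operations work lane by lane, e.g. YMM2 := YMM0 + YMM1 performs four additions at once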
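The field layout of a double can be checked directly in code. This is a minimal sketch assuming normal (finite, non-zero, non-denormal) values; `DoubleFields`, `decompose` and `recompose` are hypothetical names. Note that IEEE-754 stores the fraction with an implicit leading 1, so the "fraction" of the slide's formula corresponds to $1 + f \cdot 2^{-52}$ for the stored 52-bit field $f$.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Fields of an IEEE-754 double: 1 sign bit, 11 exponent bits, 52 fraction bits.
struct DoubleFields {
    unsigned sign;           // 0 for +, 1 for -
    int exponent;            // unbiased exponent (stored value minus 1023)
    std::uint64_t fraction;  // the 52 stored fraction bits
};

// Splits a normal double into its three fields.
DoubleFields decompose(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined copy of the bit pattern
    DoubleFields f;
    f.sign = static_cast<unsigned>(bits >> 63);
    f.exponent = static_cast<int>((bits >> 52) & 0x7FF) - 1023;
    f.fraction = bits & ((std::uint64_t{1} << 52) - 1);
    return f;
}

// Rebuilds (-1)^sign * 2^exponent * (1 + fraction / 2^52).
double recompose(const DoubleFields& f) {
    assert(f.exponent >= -1022 && f.exponent <= 1023);  // normal numbers only
    const double mantissa = 1.0 + std::ldexp(static_cast<double>(f.fraction), -52);
    const double magnitude = std::ldexp(mantissa, f.exponent);
    return f.sign ? -magnitude : magnitude;
}
```

For example, $-6.5 = -1.625 \cdot 2^{2}$ decomposes into sign 1, exponent 2 and a fraction field of $0.625 \cdot 2^{52}$.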
AVX OPERATIONS
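• _mm256_add_pd: lane-wise addition of four doubles
• _mm256_sub_pd: lane-wise subtraction
• _mm256_mul_pd: lane-wise multiplication
• _mm256_min_pd: lane-wise minimum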
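A minimal sketch (not taken from the deck) that applies each of the four intrinsics to two vectors of four doubles; `LaneResults` and `lanewise` are hypothetical names. The AVX path is compiled only when `__AVX__` is defined (e.g. GCC or Clang with `-mavx`); otherwise an equivalent scalar loop produces the same values, so the example also builds on machines without AVX.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Results of the four lane-wise operations on two vectors of 4 doubles.
struct LaneResults {
    double add[4], sub[4], mul[4], minimum[4];
};

LaneResults lanewise(const double* a, const double* b) {
    assert(a != nullptr && b != nullptr);
    LaneResults r;
#if defined(__AVX__)
    // Each intrinsic is one 256-bit operation covering all 4 lanes.
    const __m256d va = _mm256_loadu_pd(a);
    const __m256d vb = _mm256_loadu_pd(b);
    _mm256_storeu_pd(r.add, _mm256_add_pd(va, vb));
    _mm256_storeu_pd(r.sub, _mm256_sub_pd(va, vb));
    _mm256_storeu_pd(r.mul, _mm256_mul_pd(va, vb));
    _mm256_storeu_pd(r.minimum, _mm256_min_pd(va, vb));
#else
    // Scalar fallback computing the same lane-wise results.
    for (std::size_t l = 0; l < 4; ++l) {
        r.add[l] = a[l] + b[l];
        r.sub[l] = a[l] - b[l];
        r.mul[l] = a[l] * b[l];
        r.minimum[l] = a[l] < b[l] ? a[l] : b[l];
    }
#endif
    return r;
}
```

With `a = {1, 2, 3, 4}` and `b = {4, 3, 2, 1}`, the minimum is `{1, 2, 2, 1}`, computed without any comparison-driven branch in the AVX path.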
CLASSICAL VARIANT
K-MEANS
LOOP TRAVERSAL
MULTI-CORE K-MEANS
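[Figure: traversal of the n × d data matrix and the k centroids: the points are split between threads (Thread1, Thread2), groups of points are processed with SIMD, and the remaining loops run sequentially]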
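One plausible reading of this figure, written as a scalar sketch: the data are rearranged into blocks of 4 points stored dimension by dimension (one register's worth of doubles per dimension), blocks are distributed across threads, the 4 points of a block form the SIMD lanes, and the loops over k and d run sequentially. The names `to_blocks_of_4` and `assign_blocked`, the padding of the last block, and the layout are assumptions, not the authors' implementation; the lane loop is what an AVX register would replace.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical layout transform: row-major n x d data -> blocks of 4 points.
// Dimension t of the 4 points in block b lives at out[(b * d + t) * 4 + 0..3].
// The last block is padded by repeating the last point.
std::vector<double> to_blocks_of_4(const std::vector<double>& data,
                                   std::size_t n, std::size_t d) {
    assert(n > 0 && data.size() == n * d);
    const std::size_t blocks = (n + 3) / 4;
    std::vector<double> out(blocks * 4 * d);
    for (std::size_t b = 0; b < blocks; ++b)
        for (std::size_t t = 0; t < d; ++t)
            for (std::size_t l = 0; l < 4; ++l) {
                const std::size_t i = (b * 4 + l < n) ? b * 4 + l : n - 1;
                out[(b * d + t) * 4 + l] = data[i * d + t];
            }
    return out;
}

// Blocks are distributed across threads (MIMD); within a block the 4 lanes
// are the SIMD dimension; the loops over centroids and dimensions are sequential.
std::vector<int> assign_blocked(const std::vector<double>& blocked,
                                const std::vector<double>& centroids,
                                std::size_t n, std::size_t k, std::size_t d) {
    assert(k > 0 && centroids.size() == k * d);
    const std::size_t blocks = (n + 3) / 4;
    assert(blocked.size() == blocks * 4 * d);
    std::vector<int> labels(blocks * 4);
    #pragma omp parallel for schedule(static)
    for (long long bb = 0; bb < static_cast<long long>(blocks); ++bb) {
        const std::size_t b = static_cast<std::size_t>(bb);
        double best[4];
        int best_j[4] = {0, 0, 0, 0};
        for (std::size_t l = 0; l < 4; ++l) best[l] = std::numeric_limits<double>::infinity();
        for (std::size_t j = 0; j < k; ++j) {          // sequential over k
            double acc[4] = {0.0, 0.0, 0.0, 0.0};
            for (std::size_t t = 0; t < d; ++t)        // sequential over d
                for (std::size_t l = 0; l < 4; ++l) {  // SIMD lanes
                    const double diff = blocked[(b * d + t) * 4 + l] - centroids[j * d + t];
                    acc[l] += diff * diff;
                }
            for (std::size_t l = 0; l < 4; ++l)
                if (acc[l] < best[l]) { best[l] = acc[l]; best_j[l] = static_cast<int>(j); }
        }
        for (std::size_t l = 0; l < 4; ++l) labels[b * 4 + l] = best_j[l];
    }
    labels.resize(n);  // drop the padded lanes
    return labels;
}
```

The dimension-major block layout makes each group of 4 values that a 256-bit load would read contiguous in memory.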
AVX
INTELLIGENT REUSE OF REGISTERS
• The 16 registers YMM0–YMM15 are partitioned as follows:
  - 16 distance calculations between 4 data points and 4 centroids
  - 4 dimensions of the data points
  - 4 dimensions of the centroids
  - reserved for intermediate results
  - minimum distance for the assignment of the 4 points
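One way to obtain this kind of reuse is a 4 × 4 register tile: per dimension, one register holding that dimension of 4 points is combined with 4 centroids, so 16 squared distances accumulate in 4 registers. This is a sketch under assumptions, not the authors' kernel: register assignment is left to the compiler, centroid coordinates are broadcast with `_mm256_set1_pd` rather than held as in the slide's register plan, and the name `tile_4x4` and both layouts are hypothetical. Without `__AVX__` a scalar loop computes the same 16 values.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// pts[t * 4 + l]  = dimension t of point l (dimension-major block of 4 points)
// cen[j * d + t]  = dimension t of centroid j (row-major, 4 centroids)
// out[j * 4 + l]  = squared distance of point l to centroid j
void tile_4x4(const double* pts, const double* cen, std::size_t d, double* out) {
    assert(pts != nullptr && cen != nullptr && out != nullptr);
#if defined(__AVX__)
    __m256d acc[4] = {_mm256_setzero_pd(), _mm256_setzero_pd(),
                      _mm256_setzero_pd(), _mm256_setzero_pd()};
    for (std::size_t t = 0; t < d; ++t) {
        const __m256d p = _mm256_loadu_pd(pts + 4 * t);  // loaded once, reused for 4 centroids
        for (std::size_t j = 0; j < 4; ++j) {
            const __m256d diff = _mm256_sub_pd(p, _mm256_set1_pd(cen[j * d + t]));
            acc[j] = _mm256_add_pd(acc[j], _mm256_mul_pd(diff, diff));
        }
    }
    for (std::size_t j = 0; j < 4; ++j) _mm256_storeu_pd(out + 4 * j, acc[j]);
#else
    double acc[16] = {0.0};
    for (std::size_t t = 0; t < d; ++t)
        for (std::size_t j = 0; j < 4; ++j)
            for (std::size_t l = 0; l < 4; ++l) {
                const double diff = pts[4 * t + l] - cen[j * d + t];
                acc[j * 4 + l] += diff * diff;
            }
    for (std::size_t m = 0; m < 16; ++m) out[m] = acc[m];
#endif
}
```

Each point register is loaded once per dimension and used for all 4 centroids, which is the kind of reuse this slide describes.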
AVOID BRANCHING
BACKPACKED CLUSTER ID CODING
• How to determine $\arg\min_j \lVert x_i - \mu_j \rVert$ efficiently?
• AVX has primitives for min but not for argmin
• Idea: store the current cluster ID j in the 8 least significant bits of the current distance
[Bit layout: sign | exponent | fraction (52 bit), with the cluster ID in the lowest fraction bits]
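The packing itself is a few bit operations on the double's representation. A minimal sketch, assuming non-negative distances and IDs below 256; `pack_id` and `unpack_id` are hypothetical names. One consequence worth keeping in mind: two distances that agree in all but their last 8 bits may be ordered by their IDs instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Overwrites the 8 least significant fraction bits of a non-negative
// distance with the cluster ID (0..255).
double pack_id(double dist, unsigned cluster_id) {
    assert(dist >= 0.0 && cluster_id < 256);
    std::uint64_t bits;
    std::memcpy(&bits, &dist, sizeof bits);
    bits = (bits & ~std::uint64_t{0xFF}) | cluster_id;
    double packed;
    std::memcpy(&packed, &bits, sizeof packed);
    return packed;
}

// Reads the cluster ID back from the 8 least significant bits.
unsigned unpack_id(double packed) {
    std::uint64_t bits;
    std::memcpy(&bits, &packed, sizeof bits);
    return static_cast<unsigned>(bits & 0xFF);
}
```

`pack_id(9.5, 2)` still compares as roughly 9.5, and `unpack_id` recovers 2 from it.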
AVOID BRANCHING
BACKPACKED CLUSTER ID CODING
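• With this coding, the min operation automatically copies the cluster ID of the smaller distance, because the ID occupies only the least significant bits and barely affects the comparison
• This works even with SIMD primitives:
[Bit layout: sign | exponent | fraction (52 bit), with the cluster ID in the lowest fraction bits]
YMM15 := _mm256_min_pd(YMM14, YMM15)
YMM15 (current)   YMM14 (new)    YMM15 (new)
9.5  [ID 2]       10.9 [ID 4]    9.5  [ID 2]
16.3 [ID 3]       18.7 [ID 4]    16.3 [ID 3]
12.8 [ID 2]       16.5 [ID 4]    12.8 [ID 2]
15.0 [ID 1]       12.3 [ID 4]    12.3 [ID 4]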
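A minimal sketch of the branch-free argmin for 4 points, assuming the distances to each centroid are already available; `argmin_packed_4`, `pack_id`, `unpack_id` and the layout `dists[j * 4 + l]` are hypothetical. For readability the packing is done per lane in scalar code (an optimized kernel could use vector bit operations instead). With AVX enabled (`-mavx`), the running minimum stays in one register and is updated with `_mm256_min_pd`, playing the roles of YMM15 and YMM14 above; otherwise a scalar min stands in for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Overwrites the 8 least significant bits of a non-negative double with an ID.
static double pack_id(double dist, unsigned id) {
    assert(dist >= 0.0 && id < 256);
    std::uint64_t bits;
    std::memcpy(&bits, &dist, sizeof bits);
    bits = (bits & ~std::uint64_t{0xFF}) | id;
    std::memcpy(&dist, &bits, sizeof bits);
    return dist;
}

static unsigned unpack_id(double packed) {
    std::uint64_t bits;
    std::memcpy(&bits, &packed, sizeof bits);
    return static_cast<unsigned>(bits & 0xFF);
}

// dists[j * 4 + l] = distance of point l to centroid j. Each row is packed
// with its centroid ID and folded into the running minimum, which carries
// the winning ID along; ids[l] receives the nearest centroid of point l.
void argmin_packed_4(const double* dists, std::size_t k, unsigned* ids) {
    assert(k > 0 && k <= 256);
    double row[4];
    for (std::size_t l = 0; l < 4; ++l) row[l] = pack_id(dists[l], 0);
#if defined(__AVX__)
    __m256d best = _mm256_loadu_pd(row);                   // role of YMM15
    for (std::size_t j = 1; j < k; ++j) {
        for (std::size_t l = 0; l < 4; ++l)
            row[l] = pack_id(dists[j * 4 + l], static_cast<unsigned>(j));
        best = _mm256_min_pd(_mm256_loadu_pd(row), best);  // YMM14 vs. YMM15
    }
    _mm256_storeu_pd(row, best);
#else
    for (std::size_t j = 1; j < k; ++j)
        for (std::size_t l = 0; l < 4; ++l) {
            const double cand = pack_id(dists[j * 4 + l], static_cast<unsigned>(j));
            row[l] = cand < row[l] ? cand : row[l];  // scalar stand-in for _mm256_min_pd
        }
#endif
    for (std::size_t l = 0; l < 4; ++l) ids[l] = unpack_id(row[l]);
}
```

With current minima 9.5, 16.3, 12.8, 15.0 (IDs 2, 3, 2, 1) and new distances 10.9, 18.7, 16.5, 12.3 for centroid 4, the resulting IDs are 2, 3, 2, 4, matching the table above.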
INFLUENCE ON THE DISTANCE?
BACKPACKED CLUSTER ID CODING
• How much does a backpacked cluster ID change the distance?
• Not much: if the true distance is 1.0 and the cluster ID is 255, the stored value is $1 + 255 \cdot 2^{-52} \approx 1.000000000000057$, a deviation below $10^{-13}$
• Not significantly: Euclidean distance involves a square root, so roughly half of the fraction bits are numerically insignificant anyway
[Bit layout: sign | exponent | fraction (52 bit), split into the part that is numerically significant in $\lVert x_i - \mu_j \rVert$ and 26 bits available for the cluster ID]
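The slide's example extends to a relative bound for any distance, assuming it is a positive normal double. Write $d = 2^{e}\,(1 + f \cdot 2^{-52})$, where $f$ is the 52-bit integer fraction field. Replacing the 8 least significant bits of $f$ with a cluster ID changes $f$ by at most $2^{8} - 1 = 255$, and $d \ge 2^{e}$, so the stored value $\tilde d$ satisfies

$$
|\tilde d - d| \le 255 \cdot 2^{e-52} < 2^{e-44} \le 2^{-44}\, d \approx 5.7 \cdot 10^{-14}\, d .
$$

For $d = 1$ ($e = 0$, $f = 0$) and cluster ID 255 this gives exactly the slide's value $1 + 255 \cdot 2^{-52} \approx 1.000000000000057$.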
SETTING
PERFORMANCE EVALUATION
• 2 quad-core CPUs at 2.4 GHz
  - Intel Xeon E5-2609
  - Sandy Bridge micro-architecture
  - AVX1
• Cache
  - 4 × 32 kB L1 data cache
  - 4 × 256 kB L2 cache
  - 10 MB (shared) L3 cache
• Software
  - C++ (GNU g++)
• 5 iterations
• Synthetic data
  - n up to 64 million
  - k up to 20
  - d up to 100
• Real data from UCI
  - Forest Covertype (n = 580,000, d = 54)
  - Household data (n = 2 million, d = 7)
REAL DATA
RUN UNTIL CONVERGENCE
[Figure: runtime until convergence on Synthetic (12D), CoverType (54D) and Household (7D) for No Vect., Autovect. and MKM, each on 1 core and 8 cores; bar labels 51.2, 39.1 and 55.3]
SYNTHETIC DATA
DASHED LINE SHOWS IDEAL CURVE
Chart data (new experiments for the SDM final version): runtime for 5 iterations (s); n = 32 million, k = 40, d = 20
Measured runtimes:
Threads  Autovect.   BLAS-KM     no ID coding  MKM
1        134.313     43.873      60.915        31.18
2        68.03       28.856      25.569        18.896
3        46.871      19.408      18.228        12.501
4        36.031      15.39       13.843        9.155
5        29.411      12.296      13.888        7.64
6        25.081      13.858      10.583        6.554
7        21.914      11.896      10.923        5.533
8        19.758      10.392      8.519         5.017
Ideal curve (dashed line): 1-thread runtime divided by the number of threads:
Threads  Autovect.   BLAS-KM     no ID coding  MKM
1        134.313     43.873      60.915        31.18
2        67.1565     21.9365     30.4575       15.59
3        44.771      14.6243333  20.305        10.3933333
4        33.57825    10.96825    15.22875      7.795
5        26.8626     8.7746      12.183        6.236
6        22.3855     7.31216667  10.1525       5.19666667
7        19.1875714  6.26757143  8.70214286    4.45428571
8        16.789125   5.484125    7.614375      3.8975
[Figure: runtime for 5 iterations (s) vs. number of threads (1–8) in four panels; legend: Autovect., BLAS-KM, no ID coding, MKM]
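Since the ideal curve is the single-thread runtime divided by the number of threads, speedup and parallel efficiency follow directly from the measured columns. A small sketch; the names `Scaling` and `scaling` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Speedup S(p) = T(1) / T(p) and parallel efficiency E(p) = S(p) / p,
// from measured runtimes where runtimes[p - 1] = T(p).
struct Scaling {
    double speedup;
    double efficiency;
};

std::vector<Scaling> scaling(const std::vector<double>& runtimes) {
    assert(!runtimes.empty());
    std::vector<Scaling> out;
    for (std::size_t p = 1; p <= runtimes.size(); ++p) {
        const double s = runtimes[0] / runtimes[p - 1];
        out.push_back({s, s / static_cast<double>(p)});
    }
    return out;
}
```

For the MKM column, the speedup at 8 threads is 31.18 / 5.017 ≈ 6.21, i.e. a parallel efficiency of about 0.78; the ideal curve corresponds to a speedup of exactly p.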
SCALABILITY
IN N, D AND K
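Runtime for 5 iterations (s); d = 20, k = 40, 8 cores (c = 8)
n (millions)  Autovect. (8 cores)  MKM (8 cores)  Autovect./MKM  No vect. (1 core)
1             0.887                0.147          6.034          6.113
16            13.748               2.534          5.425          95.532
32            26.865               5.036          5.335
48            43.191               8.274          5.220
64            59.179               9.306          6.359          2408
At n = 64 million, MKM on 8 cores is about 259 times faster than the non-vectorized single-core run (2408 / 9.306 ≈ 258.76).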
[Figure: runtime for 5 iterations (s) of Autovect. and MKM vs. number of objects (millions), dimensionality, and number of clusters]
Source code available at:
https://informatik.univie.ac.at/dm/downloads/
PaperId: 031_115
