SlideShare a Scribd company logo
1 of 25
Heterogeneous Computing at USC
Dept. of Computer Science and Engineering
University of South Carolina
Dr. Jason D. Bakos
Assistant Professor
Heterogeneous and Reconfigurable Computing Lab (HeRC)
This material is based upon work supported
by the National Science Foundation under
Grant Nos. CCF-0844951 and CCF-0915608.
Heterogeneous Computing
• Subfield of computer architecture
• Mix general-purpose CPUs with “specialized processors” for high-
performance computing
• Specialized processors include:
– Field Programmable Gate Arrays (FPGAs)
– Graphical Processing Units (GPUs)
• Our goals:
– Adapt scientific and engineering applications to heterogeneous
programming and execution models
– Leverage our experience to build development tools for these models
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 2
Heterogeneous Computing
initialization
0.5% of run time
“hot” loop
99% of run time
clean up
0.5% of run time
49% of
code
49% of
code
2% of code
co-processor
Kernel
speedup
Application
speedup
Execution
time
50 34 5.0 hours
100 50 3.3 hours
200 67 2.5 hours
500 83 2.0 hours
1000 91 1.8 hours
• Example:
– Application requires a week
of CPU time
– Offload computation
consumes 99% of
execution time
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 3
My Group
• Applications work
– Computational biology:
• Computational phylogeny reconstruction (FPGA)
• Sequence alignment (GPU)
– Numerical linear algebra
• Sparse matrix-vector multiply (FPGA)
– Data mining:
• Frequent itemset mining (GPU)
– Electronic design automation:
• Logic minimization heuristics (GPU)
• Tools
– Automatic CPU/coprocessor partitioning for legacy code
– Performance modeling
– Bandwidth-constrained high-level synthesis
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 4
Field Programmable Gate Arrays
• Programmable logic device
• Contains:
– Programmable logic gates, RAMs, multipliers, I/O interfaces
– Programmable interconnect
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 5
Programming FPGAs
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 6
FPGA Platforms
Annapolis Micro
Systems
WILDSTAR 2
PRO
GiDEL
PROCSTAR III
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 7
Convey HC-1
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 8
Convey HC-1
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 9
GPU Platforms
NVIDIA Tesla S1070
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 10
GPU Acceleration of Data Mining
2-itemsets:
<ABC>, <ABE>, <ACE>, <BCE>
2-itemsets with
threshold 2:
3-itemsets:
3-itemsets with
threshold 2:
<BCE>
• Key enabling techniques:
– GPU-mappable data structures
• Our GPU accelerated implementation achieves a 20X speedup
over state-of-the-art serial implementations
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 11
Automated Task Partitioning
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 12
Phylogenic Reconstruction
genus
Drosophila
654,729,075
possible trees
with 12 leaves
200 trillion
possible trees
for 16 leaves
2.2 x 1020
possible trees
for 20 leaves
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 13
Our Projects
• FPGA-based co-processors for computational biology:
1000X speedup! 10X speedup!
GRAPPA: MP reconstruction of whole
genome data based on gene-
rearrangements
MrBayes: Monte Carlo-based
reconstruction based on likelihood
model for sequence data
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 14
Sparse Matrix Arithmetic
• Sparse matrices are large matrices that contain mostly zero-
values
– Common in many scientific and engineering applications
• Often represent a linear system and are thus multiplied by a
vector when using an iterative linear solver
• Compressed Storage Row (CSR) representation:
1 -1 0 -3 0
-2 5 0 0 0
0 0 4 6 4
-4 0 2 7 0
0 8 0 0 -5
val = (1 -1 -3 -2 5 4 6 4 -4 2 7 8 -5)
col = (0 1 3 0 1 2 3 4 0 2 3 1 4)
ptr = (0 3 5 8 11 13)
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 15
Sparse Matrix-Vector Multiply
• Code for Ax = b
– A is matrix stored in val, col, ptr
row = 0
for i = 0 to number_of_nonzero_elements do
if i = ptr[row+1] then row=row+1, b[row]=0.0
b[row] = b[row] + val[i] * x[col[i]]
end
recurrence (reduction)
non-affine (indirect) indexing
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 16
Indirect Addressing
• Technique:
• Can scale up the number of these processing elements until
you run out of memory bandwidth
S
x
RAM
CSR stream
val
col
Processing element (PE)
val
vec
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 17
segmented
local cache
Double Precision Accumulation
Mem Mem
Control
Partial sums
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 18
Problem:
New values arrive every clock cycle, but adders are deeply pipelined
Causes a data dependency
Reduction Rules
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 19
Sparse Matrix-Vector Multiply
• 32 PEs on the Convey HC-1
– Each PE can achieve up to 300 MFLOPs/s
– 32 PE gives an upper bound of 9.6 GFLOPs/s
• The HC-1 coprocessor has 80 GB/s of memory bandwidth
– Gives a performance upper bound of ~7.1 GFLOPs/s
• In our implementation, we achieved up to 50% of this
peak, depending on the matrix tested
– Depends on:
• Vector cache performance
• On-chip contention for memory interfaces
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 20
Maximizing Memory Bandwidth
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 21
…
8 x 128 bit
memory
channels
64 x 1024 bit
onchip memory
4096 bit, 42 x 96
bit shift register
128
1024 96 (val/col)
PE
Summary
• Manually accelerated several applications on using FPGA
and GPU-based coprocessors
• Working to develop tools for to make it easier to take
advantage of heterogeneous platforms
Heterogeneous Computing at USC | USC HPC Workshop| 4/14/11 22
GPU Acceleration of Sequence Alignment
• DNA/protein sequence, e.g.
– TGAGCTGTAGTGTTGGTACCC => TGACCGGTTTGGCCC
• Goal: align the two sequences against substitutions and
deletions:
– TGAGCTGTAGTGTTGGTACCC
– TGAGCTGT----TTGGTACCC
• Used for sequence comparison and database search
• Our work focuses on pairwise alignment of large databases
for noise removal in meta-genomic sequencing
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 23
High-Level Synthesis
• Bandwidth-constrained high-level synthesis
• Example: 16-input expression:
out = (AA1 * A1 + AC1 * C1 + AG1 * G1 + AT1 * T1) *
(AG2 * A2 + AC2 * C2 + AG2 * G2 + AT2 * T2)
* * * * * * * *
+ + + +
+ +
*
A
B
C
D
A
BC
D
mux mux
*
*
+
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 24
GPU Acceleration of Two Level Logic Minimization
A B C D out
0 0 0 0 1
0 0 1 0 1
0 1 1 1 1
0 1 1 0 1
1 1 1 1 0
1 0 1 1 0
0 1 0 1 0
anything else X
A’B’D’
A’BC
(ACD)’
(A’BC’D)’
A’B’CD
A’B’C’D A’B’
A’B’CD
A’B’CD’
A’C
• Key enabling techniques:
– Novel reduction algorithms optimized for GPU execution
• Achieves 10X speedup over single-thread software
Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 25

More Related Content

Similar to Heterogeneous Computing at USC

Harnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern CoprocessorsHarnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern CoprocessorsUnai Lopez-Novoa
 
Introduction to FPGA acceleration
Introduction to FPGA accelerationIntroduction to FPGA acceleration
Introduction to FPGA accelerationMarco77328
 
Mauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscteMauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-isctembreternitz
 
Designing HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale SystemsDesigning HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale Systemsinside-BigData.com
 
The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...NECST Lab @ Politecnico di Milano
 
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesPerformance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesDr. Fabio Baruffa
 
High performance computing for research
High performance computing for researchHigh performance computing for research
High performance computing for researchEsteban Hernandez
 
Modern Computing: Cloud, Distributed, & High Performance
Modern Computing: Cloud, Distributed, & High PerformanceModern Computing: Cloud, Distributed, & High Performance
Modern Computing: Cloud, Distributed, & High Performanceinside-BigData.com
 
Survey_Report_Deep Learning Algorithm
Survey_Report_Deep Learning AlgorithmSurvey_Report_Deep Learning Algorithm
Survey_Report_Deep Learning AlgorithmSahil Kaw
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale SupercomputerSagar Dolas
 
Distributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learnedDistributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learnedWee Hyong Tok
 
Swetha Jayachandran resume
Swetha Jayachandran resumeSwetha Jayachandran resume
Swetha Jayachandran resumeswetha_chandran
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsHPCC Systems
 
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale SystemsDesigning HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systemsinside-BigData.com
 

Similar to Heterogeneous Computing at USC (20)

Harnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern CoprocessorsHarnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern Coprocessors
 
Available HPC resources at CSUC
Available HPC resources at CSUCAvailable HPC resources at CSUC
Available HPC resources at CSUC
 
Introduction to FPGA acceleration
Introduction to FPGA accelerationIntroduction to FPGA acceleration
Introduction to FPGA acceleration
 
Mauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscteMauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscte
 
Available HPC resources at CSUC
Available HPC resources at CSUCAvailable HPC resources at CSUC
Available HPC resources at CSUC
 
Available HPC resources at CSUC
Available HPC resources at CSUCAvailable HPC resources at CSUC
Available HPC resources at CSUC
 
Designing HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale SystemsDesigning HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale Systems
 
The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...
 
Available HPC resources at CSUC
Available HPC resources at CSUCAvailable HPC resources at CSUC
Available HPC resources at CSUC
 
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesPerformance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
 
High performance computing for research
High performance computing for researchHigh performance computing for research
High performance computing for research
 
Modern Computing: Cloud, Distributed, & High Performance
Modern Computing: Cloud, Distributed, & High PerformanceModern Computing: Cloud, Distributed, & High Performance
Modern Computing: Cloud, Distributed, & High Performance
 
Survey_Report_Deep Learning Algorithm
Survey_Report_Deep Learning AlgorithmSurvey_Report_Deep Learning Algorithm
Survey_Report_Deep Learning Algorithm
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
Distributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learnedDistributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learned
 
Manycores for the Masses
Manycores for the MassesManycores for the Masses
Manycores for the Masses
 
Swetha Jayachandran resume
Swetha Jayachandran resumeSwetha Jayachandran resume
Swetha Jayachandran resume
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
 
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale SystemsDesigning HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems
 
NSCC Training - Introductory Class
NSCC Training - Introductory ClassNSCC Training - Introductory Class
NSCC Training - Introductory Class
 

Recently uploaded

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 

Recently uploaded (20)

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 

Heterogeneous Computing at USC

  • 1. Heterogeneous Computing at USC Dept. of Computer Science and Engineering University of South Carolina Dr. Jason D. Bakos Assistant Professor Heterogeneous and Reconfigurable Computing Lab (HeRC) This material is based upon work supported by the National Science Foundation under Grant Nos. CCF-0844951 and CCF-0915608.
  • 2. Heterogeneous Computing • Subfield of computer architecture • Mix general-purpose CPUs with “specialized processors” for high- performance computing • Specialized processors include: – Field Programmable Gate Arrays (FPGAs) – Graphical Processing Units (GPUs) • Our goals: – Adapt scientific and engineering applications to heterogeneous programming and execution models – Leverage our experience to build development tools for these models Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 2
  • 3. Heterogeneous Computing initialization 0.5% of run time “hot” loop 99% of run time clean up 0.5% of run time 49% of code 49% of code 2% of code co-processor Kernel speedup Application speedup Execution time 50 34 5.0 hours 100 50 3.3 hours 200 67 2.5 hours 500 83 2.0 hours 1000 91 1.8 hours • Example: – Application requires a week of CPU time – Offload computation consumes 99% of execution time Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 3
  • 4. My Group • Applications work – Computational biology: • Computational phylogeny reconstruction (FPGA) • Sequence alignment (GPU) – Numerical linear algebra • Sparse matrix-vector multiply (FPGA) – Data mining: • Frequent itemset mining (GPU) – Electronic design automation: • Logic minimization heuristics (GPU) • Tools – Automatic CPU/coprocessor partitioning for legacy code – Performance modeling – Bandwidth-constrained high-level synthesis Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 4
  • 5. Field Programmable Gate Arrays • Programmable logic device • Contains: – Programmable logic gates, RAMs, multipliers, I/O interfaces – Programmable interconnect Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 5
  • 6. Programming FPGAs Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 6
  • 7. FPGA Platforms Annapolis Micro Systems WILDSTAR 2 PRO GiDEL PROCSTAR III Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 7
  • 8. Convey HC-1 Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 8
  • 9. Convey HC-1 Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 9
  • 10. GPU Platforms NVIDIA Tesla S1070 Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 10
  • 11. GPU Acceleration of Data Mining 2-itemsets: <ABC>, <ABE>, <ACE>, <BCE> 2-itemsets with threshold 2: 3-itemsets: 3-itemsets with threshold 2: <BCE> • Key enabling techniques: – GPU-mappable data structures • Our GPU accelerated implementation achieves a 20X speedup over state-of-the-art serial implementations Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 11
  • 12. Automated Task Partitioning Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 12
  • 13. Phylogenic Reconstruction genus Drosophila 654,729,075 possible trees with 12 leaves 200 trillion possible trees for 16 leaves 2.2 x 1020 possible trees for 20 leaves Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 13
  • 14. Our Projects • FPGA-based co-processors for computational biology: 1000X speedup! 10X speedup! GRAPPA: MP reconstruction of whole genome data based on gene- rearrangements MrBayes: Monte Carlo-based reconstruction based on likelihood model for sequence data Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 14
  • 15. Sparse Matrix Arithmetic • Sparse matrices are large matrices that contain mostly zero- values – Common in many scientific and engineering applications • Often represent a linear system and are thus multiplied by a vector when using an iterative linear solver • Compressed Storage Row (CSR) representation: 1 -1 0 -3 0 -2 5 0 0 0 0 0 4 6 4 -4 0 2 7 0 0 8 0 0 -5 val = (1 -1 -3 -2 5 4 6 4 -4 2 7 8 -5) col = (0 1 3 0 1 2 3 4 0 2 3 1 4) ptr = (0 3 5 8 11 13) Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 15
  • 16. Sparse Matrix-Vector Multiply • Code for Ax = b – A is matrix stored in val, col, ptr row = 0 for i = 0 to number_of_nonzero_elements do if i = ptr[row+1] then row=row+1, b[row]=0.0 b[row] = b[row] + val[i] * x[col[i]] end recurrence (reduction) non-affine (indirect) indexing Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 16
  • 17. Indirect Addressing • Technique: • Can scale up the number of these processing elements until you run out of memory bandwidth S x RAM CSR stream val col Processing element (PE) val vec Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 17 segmented local cache
  • 18. Double Precision Accumulation Mem Mem Control Partial sums Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 18 Problem: New values arrive every clock cycle, but adders are deeply pipelined Causes a data dependency
  • 19. Reduction Rules Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 19
  • 20. Sparse Matrix-Vector Multiply • 32 PEs on the Convey HC-1 – Each PE can achieve up to 300 MFLOPs/s – 32 PE gives an upper bound of 9.6 GFLOPs/s • The HC-1 coprocessor has 80 GB/s of memory bandwidth – Gives a performance upper bound of ~7.1 GFLOPs/s • In our implementation, we achieved up to 50% of this peak, depending on the matrix tested – Depends on: • Vector cache performance • On-chip contention for memory interfaces Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 20
  • 21. Maximizing Memory Bandwidth Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 21 … 8 x 128 bit memory channels 64 x 1024 bit onchip memory 4096 bit, 42 x 96 bit shift register 128 1024 96 (val/col) PE
  • 22. Summary • Manually accelerated several applications on using FPGA and GPU-based coprocessors • Working to develop tools for to make it easier to take advantage of heterogeneous platforms Heterogeneous Computing at USC | USC HPC Workshop| 4/14/11 22
  • 23. GPU Acceleration of Sequence Alignment • DNA/protein sequence, e.g. – TGAGCTGTAGTGTTGGTACCC => TGACCGGTTTGGCCC • Goal: align the two sequences against substitutions and deletions: – TGAGCTGTAGTGTTGGTACCC – TGAGCTGT----TTGGTACCC • Used for sequence comparison and database search • Our work focuses on pairwise alignment of large databases for noise removal in meta-genomic sequencing Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 23
  • 24. High-Level Synthesis • Bandwidth-constrained high-level synthesis • Example: 16-input expression: out = (AA1 * A1 + AC1 * C1 + AG1 * G1 + AT1 * T1) * (AG2 * A2 + AC2 * C2 + AG2 * G2 + AT2 * T2) * * * * * * * * + + + + + + * A B C D A BC D mux mux * * + Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 24
  • 25. GPU Acceleration of Two Level Logic Minimization A B C D out 0 0 0 0 1 0 0 1 0 1 0 1 1 1 1 0 1 1 0 1 1 1 1 1 0 1 0 1 1 0 0 1 0 1 0 anything else X A’B’D’ A’BC (ACD)’ (A’BC’D)’ A’B’CD A’B’C’D A’B’ A’B’CD A’B’CD’ A’C • Key enabling techniques: – Novel reduction algorithms optimized for GPU execution • Achieves 10X speedup over single-thread software Heterogeneous Computing at USC | USC HPC Workshop | 4/14/11 25