GraphBLAS and Emus
Jason Riedy (all opinions my own, no planning guarantees)
GraphBLAS BoF at IEEE HPEC, 22 September 2020
Lucata / Emu Technology
Lucata’s PGAS Architecture
[Figure: block diagram. One nodelet contains Gossamer Core 1 through Gossamer Core 4, a memory-side processor, and a migration engine. There are 8 nodelets per node and 64 nodelets per Chick. Other labeled components: stationary core, disk I/O, RapidIO.]
• Cacheless multithreaded multicore
• Memory-side “processor” at narrow-channel DRAM
• Stationary core for OS
• Physically distributed memory
• Threads migrate in hardware on reads!
GraphBLAS and Emus — 22 Sep 2020 2/8
Pointer-Chasing Benchmark
Data-dependent loads, fine-grained access [1]
[Figure: three traversal orders over a 16-element list (indices 0 to 15). Ordered; intra-block shuffle (weak locality); full block shuffle (weak locality).]
[1] Eric Hein, Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Vuduc, Riedy. “An Initial Characterization of the Emu Chick,” Workshop on Accelerators and Hybrid Exascale Systems (AsHES) 2018.
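The access patterns named on this slide can be sketched in a few lines of Python. This is only an illustration, not the benchmark code from [1]: the function names `build_chain` and `chase`, the seed, and the exact meaning given to each mode name (shuffle elements inside each block, shuffle the order of blocks, or both) are assumptions made for the example.

```python
import random


def build_chain(n, block_size, mode, seed=0):
    """Build a linked chain over n elements split into fixed-size blocks.

    Hypothetical helper: returns (start, next_index), where following
    next_index from start visits every element exactly once.
    mode is one of "ordered", "block_shuffle", "intra_block_shuffle",
    "full_block_shuffle" (interpretation assumed, see lead-in).
    """
    assert n % block_size == 0, "n must be a multiple of block_size"
    rng = random.Random(seed)
    blocks = [list(range(b, b + block_size)) for b in range(0, n, block_size)]
    if mode in ("intra_block_shuffle", "full_block_shuffle"):
        for blk in blocks:
            rng.shuffle(blk)      # permute elements inside each block
    if mode in ("block_shuffle", "full_block_shuffle"):
        rng.shuffle(blocks)       # permute the order in which blocks are visited
    order = [i for blk in blocks for i in blk]
    next_index = [0] * n
    for pos, i in enumerate(order):
        next_index[i] = order[(pos + 1) % n]   # last element links back to the first
    return order[0], next_index


def chase(start, next_index):
    """Follow the chain once; each load depends on the previous one."""
    visited = []
    i = start
    for _ in range(len(next_index)):
        visited.append(i)
        i = next_index[i]
    return visited


if __name__ == "__main__":
    for mode in ("ordered", "intra_block_shuffle", "full_block_shuffle"):
        start, nxt = build_chain(16, 4, mode)
        print(mode, chase(start, nxt))
```

Because every index comes from the previous load, the traversal cannot be prefetched or vectorized in the usual way, which is what makes the pattern a useful probe of fine-grained memory access.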
Selected Results: x86 Pointer-Chasing Benchmark
[Figure: two panels, 56 threads and 112 threads. x-axis: block size (number of 16B elements), 1 to 4M. y-axis: memory bandwidth (GB/s), 0 to 100, with peak STREAM bandwidth marked. Series: block_shuffle, intra_block_shuffle, full_block_shuffle.]
Haswell results, every pattern is different. [2]
[2] Eric Hein, Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Riedy, Vuduc, Conte. “A Microbenchmark Characterization of the Emu Chick.” Parallel Computing, 10.1016/j.parco.2019.04.012
Selected Results: Emu Pointer-Chasing Benchmark
[Figure: two panels, 2048 threads and 4096 threads. x-axis: block size (number of 16B elements), 1 to 4M. y-axis: memory bandwidth (GB/s), 0 to 12, with peak STREAM bandwidth marked. Series: block_shuffle, intra_block_shuffle, full_block_shuffle.]
Mostly flat performance, high utilization. [2]
Selected Results: BFS on a Dynamic Data Structure
[Figure: BFS performance versus scale (15 to 21). Axes: MTEPS, 0 to 100; edge bandwidth (MB/s), 0 to 1500. Series: Emu single node - Cilk, Emu multi-node - Cilk, x86 Haswell - STINGER, x86 Haswell - Cilk.]
Note: Streaming data structure, not statically optimized. [3]
[3] Hein, Eswar, Abdurrahman Yasar, Prasanth Chatarasi, Li, Young, Conte, Ümit Çatalyürek, Vuduc, Riedy, Bora Uçar. “Programming Strategies for Irregular Algorithms on the Emu Chick.” ACM ToPC, to appear. https://arxiv.org/abs/1901.02775
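For readers unfamiliar with the MTEPS axis: it is millions of traversed edges per second. The sketch below shows one way the number could be computed for a BFS. The function names and the edge-counting convention (every adjacency entry examined counts once) are assumptions for illustration; the cited paper may count edges differently.

```python
from collections import deque


def bfs_edges_traversed(adj, source):
    """Plain BFS over an adjacency dict; returns (distances, edges examined).

    Hypothetical helper: counts every adjacency entry inspected, which is
    one possible convention for "traversed edges".
    """
    dist = {source: 0}
    queue = deque([source])
    edges = 0
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            edges += 1
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist, edges


def mteps(edges_traversed, seconds):
    """Millions of traversed edges per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return edges_traversed / seconds / 1e6


if __name__ == "__main__":
    # Tiny undirected graph stored as two directed entries per edge.
    adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
    dist, edges = bfs_edges_traversed(adj, 0)
    print(dist, edges)
    print(mteps(2_000_000, 0.5))  # 2M edges in half a second
```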
Implications for a GraphBLAS Implementation
• We can be more flexible in data organization.
• Not tied to CSR / CSC / COO.
• NCDIMM: No cache line issues
• Stride between vertices, values can be arbitrary.
• Can incorporate more semantic information.
• Targeting “streaming” use.
• High rate of change in a massive graph.
• Linked list of blocks... (STINGER, HORNET)
• But must remember graphs live in a separate memory space.
• Gossamer side calls stay there.
• Stationary core calls must transfer input and output.
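The “linked list of blocks” idea can be sketched as follows. This is a minimal illustration of the general technique only: the class names, field names, and block capacity are invented for the example and do not reproduce the actual STINGER or HORNET layouts.

```python
class EdgeBlock:
    """Fixed-capacity block of edges; blocks for one vertex form a linked list."""

    def __init__(self, capacity):
        self.dst = [None] * capacity
        self.weight = [None] * capacity
        self.count = 0       # number of slots in use
        self.next = None     # next block for the same source vertex


class BlockedAdjacency:
    """Hypothetical streaming-friendly adjacency structure (illustrative only)."""

    def __init__(self, block_capacity=4):
        self.block_capacity = block_capacity
        self.head = {}       # source vertex -> first EdgeBlock

    def insert_edge(self, src, dst, weight=1):
        blk = self.head.get(src)
        free = None
        while blk is not None:
            for k in range(blk.count):
                if blk.dst[k] == dst:        # edge exists: update in place
                    blk.weight[k] = weight
                    return
            if free is None and blk.count < self.block_capacity:
                free = blk                   # remember first block with room
            blk = blk.next
        if free is None:                     # all blocks full: prepend a new one
            free = EdgeBlock(self.block_capacity)
            free.next = self.head.get(src)
            self.head[src] = free
        free.dst[free.count] = dst
        free.weight[free.count] = weight
        free.count += 1

    def remove_edge(self, src, dst):
        blk = self.head.get(src)
        while blk is not None:
            for k in range(blk.count):
                if blk.dst[k] == dst:
                    last = blk.count - 1     # fill the hole with the last entry
                    blk.dst[k], blk.weight[k] = blk.dst[last], blk.weight[last]
                    blk.dst[last] = blk.weight[last] = None
                    blk.count = last
                    return True
            blk = blk.next
        return False

    def neighbors(self, src):
        blk = self.head.get(src)
        while blk is not None:
            for k in range(blk.count):
                yield blk.dst[k], blk.weight[k]
            blk = blk.next
```

Insertions and deletions touch only one vertex's blocks and never rebuild a global index array, which is why this kind of layout suits a high rate of change better than CSR / CSC / COO.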
Experiences “Porting” Existing Apps & Bindings
Capabilities nice to have:
• Allocating memory to hold k entries w/o knowing the type
• Converting the support to a bool GxB_Matrix (T → bool)
• Eases operating on masks of different types
• Execution context: SC-GC, GC-GC, SC-SC
• A sized blob type that is not a UDT
• Sometimes used to hold keys with no GB meaning
• Selects and ops with bools still useful
• Users want “iterators”
• Some uses are horrible to replace without a relational join-type operation
• Still coming up with more...
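To make the “support to bool” item concrete, here is a small sketch that uses a plain dictionary keyed by (row, column) as a stand-in for a sparse matrix. It does not use the GraphBLAS API; `support_as_bool` and `apply_structural_mask` are hypothetical names chosen for the example.

```python
def support_as_bool(entries):
    """Map every stored position to True, regardless of the stored value or type.

    Explicitly stored zeros (or opaque keys) are part of the support too.
    """
    return {index: True for index in entries}


def apply_structural_mask(values, mask):
    """Keep only the entries of values whose position is present in mask."""
    return {index: v for index, v in values.items() if index in mask}


if __name__ == "__main__":
    # Stored values of mixed types: a float, an explicit zero, an opaque key.
    a = {(0, 1): 2.5, (1, 2): 0.0, (2, 0): b"key"}
    s = support_as_bool(a)
    print(s)

    b = {(0, 1): 10, (1, 1): 20, (1, 2): 30}
    print(apply_structural_mask(b, s))
```

Once the support is a bool matrix, the same masking code works no matter what type the original matrix held, which is the convenience the slide points at.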