SlideShare a Scribd company logo
1 of 10
Download to read offline
Introducing Lucata’s GraphBLAS
v1.0 is tagged! And tested!
Jason Riedy and Shannon Kuntz
13 Oct 2021
Lucata Corporation
Parallel Graph Analysis
Known issues:
• Scattered memory access w/ small(?) seq. bursts
• Cache lines provide fraction of avail. BW
• Prefetechers fire up then mis-predict
• Large bursts (high degree) ⇒ load imbalance
• Combined with 90% diameter ≤ 8.
• Plenty of load-balancing pre-processing...
• Streaming: The graph is changing.
• Pre-processing can hurt where and when changes
are interesting.
CPU+cache systems have one set of coping mechanisms.
GPGPUs / flex. vectors another. And Lucata has yet more...
Lucata — Riedy, Kuntz — 13 Oct 2021 2/10
The Lucata Pathfinder PGAS architecture
NCDRAM: 64GiB
Memory-Side Proc.
Cores: 24 or so
Migration, I/O, etc.
SC
Node 0
Node 7
Chassis 0
Lucata System
Four chassis system is a 2 TiB NSF
CCRI resource at crnch.gatech.edu
• Optimized for weak locality
• Scattered jumps and seq. access
• Stationary core for OS per node + SSDs
• Hardware partitioned global address space
(PGAS) with a twist
• Plenty of network BW, low latency
• Details in a moment...
• Multithreaded multicore LCE (or GC)
• Currently 1536 threads per node, 12k
per chassis, 50k per 4 chassis.
• “Helps” with load balance
• No cache.
Lucata — Riedy, Kuntz — 13 Oct 2021 3/10
Lucata’s PGAS Twist for Weak Locality
NCDRAM: 64GiB
Memory-Side Proc.
Cores: 24 or so
Migration, I/O, etc.
SC
Node 0
Node 7
Chassis 0
Lucata System
Four chassis system is a 2 TiB NSF
CCRI resource at crnch.gatech.edu
• Threads write remotely, always read locally
• Writes: 8 Memory-side processors (MSPs)
• Writes+ don’t touch the cores.
• Handle some arithmetic ops. (FPADD)
• Deep queue, no control flow
• Reads ⇒migrate⇐.
• Hardware: Remote read ⇒ package
and send the thread context
• Read latency is local up to migration.
• Control flow depends on reads.
• Contrast with Tera MTA / Cray XMT:
Need far fewer threads, far less
network bandwidth.
Lucata — Riedy, Kuntz — 13 Oct 2021 4/10
Programming the not-Beast: Not painful.
NCDRAM: 64GiB
Memory-Side Proc.
Cores: 24 or so
Migration, I/O, etc.
SC
Node 0
Node 7
Chassis 0
Lucata System
Four chassis system is a 2 TiB NSF
CCRI resource at crnch.gatech.edu
• PGAS: Read and write directly.
• Memory views implemented in hardware
• Intra-node malloc
• Node-striped mw_malloc1d
• Node-blocked mw_malloc2d ...
• Implemented by pointer bitfields
• Fork/join parallelism: Cilk+ + extensions
• Yes, Cilk+ is alive: OpenCilk
• Fast: Spawning a thread ≈ function
• Composes: “Serial elision.”
• Collectives? In progress.
• Some Cilk+ reducers map
perfectly.
Lucata — Riedy, Kuntz — 13 Oct 2021 5/10
Challenges in Implementing the GraphBLAS
• Naïve kernels may migrate for every edge.
• Gustavson’s SpGEMM
• Treat with a form of message aggregation
• Collectives like prefix-scan...
• Fork/join is cheap, but not free.
• Prefix-scan must join, (re-fork, re-join,)2 join
• (e.g. Existing code from Arch Robinson.)
• Separate memory spaces: SC ∩ Lucata = ∅
• From SC: Data structures fully opaque
• From LCE/GC: Trigger but not fulfill transfer
• Anything that operates edge-by-edge via or outside the
GraphBLAS: NO
Lucata — Riedy, Kuntz — 13 Oct 2021 6/10
Current LGB Data Structure Perspective
Lucata — Riedy, Kuntz — 13 Oct 2021 7/10
Current LGB Data Structure before Operating
Lucata — Riedy, Kuntz — 13 Oct 2021 8/10
Future Aspects (Student Opportunities?)
• Add a fast, fixed-size block pool per matrix
• Fast de-allocation for temporaries
• Or ignore the pool, CSR-ish for marked temps
• Can alloc. consecutive blocks, next = count
• Opportunities for large-degree vertices!
• Can stripe across nodes, views are transparent
• Composing parallelism ⇒ more options than current
big/small splits (e.g. MTGL)
• Feed back into deeper Lucata spawning / flow
control decisions
• Extending CMU/SEI’s GBTLX (C++ with SPIRAL) for
PGAS, migratory threads
Lucata — Riedy, Kuntz — 13 Oct 2021 9/10
And Streaming! (NSF SBIR 2105977)
• HW supports multi-Gig network ingest per node
• Large-scale locking or snapshotting is a no-go
• Starting current streaming/updating algs...
• No locking: “Valid” algs reading edges atomically1
• Starting graph + some subset of concurrent changes
• BFS, connected components, linear algebra centralities
(PageRank, Katz), triangle counting
• Copying subgraphs also... Seed set expansion
• In turn can be updated “validly.”
• Other approaches: Aspen (already in Cilk+)
1
Chunxing Yin and J.R. Concurrent Katz Centrality for Streaming Graphs. HPEC 2019. DOI 10.1109/HPEC.2019.8916572
Lucata — Riedy, Kuntz — 13 Oct 2021 10/10

More Related Content

What's hot

Data Center Networking in the Era of Overlays
Data Center Networking in the Era of OverlaysData Center Networking in the Era of Overlays
Data Center Networking in the Era of OverlaysOpen Networking Summits
 
OpenNebula and StorPool: Building Powerful Clouds
OpenNebula and StorPool: Building Powerful CloudsOpenNebula and StorPool: Building Powerful Clouds
OpenNebula and StorPool: Building Powerful CloudsOpenNebula Project
 
Cloud2Ground xtream hyper converged cloud platform
Cloud2Ground xtream hyper converged cloud platformCloud2Ground xtream hyper converged cloud platform
Cloud2Ground xtream hyper converged cloud platformcloud2groundtech
 
Automatic generation of platform architectures using open cl and fpga roadmap
Automatic generation of platform architectures using open cl and fpga roadmapAutomatic generation of platform architectures using open cl and fpga roadmap
Automatic generation of platform architectures using open cl and fpga roadmapManolis Vavalis
 
MPLS in DC and inter-DC networks: the unified forwarding mechanism for networ...
MPLS in DC and inter-DC networks: the unified forwarding mechanism for networ...MPLS in DC and inter-DC networks: the unified forwarding mechanism for networ...
MPLS in DC and inter-DC networks: the unified forwarding mechanism for networ...Dmitry Afanasiev
 
ODSA PoC: Network Flow Processor Overview
ODSA PoC: Network Flow Processor OverviewODSA PoC: Network Flow Processor Overview
ODSA PoC: Network Flow Processor Overviewjennimenni
 
OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan Kooman
OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan KoomanOpenNebula Conf 2014 | ONE BIT to rule them all - Stefan Kooman
OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan KoomanNETWAYS
 
Dotnet microarchitecture of a coarse-grain out-of-order superscalar processor
Dotnet  microarchitecture of a coarse-grain out-of-order superscalar processorDotnet  microarchitecture of a coarse-grain out-of-order superscalar processor
Dotnet microarchitecture of a coarse-grain out-of-order superscalar processorEcwaytech
 
Microarchitecture of a coarse grain out-of-order superscalar processor
Microarchitecture of a coarse grain out-of-order superscalar processorMicroarchitecture of a coarse grain out-of-order superscalar processor
Microarchitecture of a coarse grain out-of-order superscalar processorecway
 
Introduction to CloudStack Storage Subsystem
Introduction to CloudStack Storage SubsystemIntroduction to CloudStack Storage Subsystem
Introduction to CloudStack Storage Subsystembuildacloud
 
Play With Android
Play With AndroidPlay With Android
Play With AndroidChamp Yen
 
Cloud Computing as Innovation Hub - Mohammad Fairus Khalid
Cloud Computing as Innovation Hub - Mohammad Fairus KhalidCloud Computing as Innovation Hub - Mohammad Fairus Khalid
Cloud Computing as Innovation Hub - Mohammad Fairus KhalidOpenNebula Project
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephScyllaDB
 
Throughput oriented aarchitectures
Throughput oriented aarchitecturesThroughput oriented aarchitectures
Throughput oriented aarchitecturesNomy059
 
9 ec606a.36(finished)
9 ec606a.36(finished)9 ec606a.36(finished)
9 ec606a.36(finished)myrajendra
 
Linux kernel development chapter 10
Linux kernel development chapter 10Linux kernel development chapter 10
Linux kernel development chapter 10huangachou
 

What's hot (17)

Data Center Networking in the Era of Overlays
Data Center Networking in the Era of OverlaysData Center Networking in the Era of Overlays
Data Center Networking in the Era of Overlays
 
OpenNebula and StorPool: Building Powerful Clouds
OpenNebula and StorPool: Building Powerful CloudsOpenNebula and StorPool: Building Powerful Clouds
OpenNebula and StorPool: Building Powerful Clouds
 
Factored Operating Systems paper review
Factored Operating Systems paper reviewFactored Operating Systems paper review
Factored Operating Systems paper review
 
Cloud2Ground xtream hyper converged cloud platform
Cloud2Ground xtream hyper converged cloud platformCloud2Ground xtream hyper converged cloud platform
Cloud2Ground xtream hyper converged cloud platform
 
Automatic generation of platform architectures using open cl and fpga roadmap
Automatic generation of platform architectures using open cl and fpga roadmapAutomatic generation of platform architectures using open cl and fpga roadmap
Automatic generation of platform architectures using open cl and fpga roadmap
 
MPLS in DC and inter-DC networks: the unified forwarding mechanism for networ...
MPLS in DC and inter-DC networks: the unified forwarding mechanism for networ...MPLS in DC and inter-DC networks: the unified forwarding mechanism for networ...
MPLS in DC and inter-DC networks: the unified forwarding mechanism for networ...
 
ODSA PoC: Network Flow Processor Overview
ODSA PoC: Network Flow Processor OverviewODSA PoC: Network Flow Processor Overview
ODSA PoC: Network Flow Processor Overview
 
OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan Kooman
OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan KoomanOpenNebula Conf 2014 | ONE BIT to rule them all - Stefan Kooman
OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan Kooman
 
Dotnet microarchitecture of a coarse-grain out-of-order superscalar processor
Dotnet  microarchitecture of a coarse-grain out-of-order superscalar processorDotnet  microarchitecture of a coarse-grain out-of-order superscalar processor
Dotnet microarchitecture of a coarse-grain out-of-order superscalar processor
 
Microarchitecture of a coarse grain out-of-order superscalar processor
Microarchitecture of a coarse grain out-of-order superscalar processorMicroarchitecture of a coarse grain out-of-order superscalar processor
Microarchitecture of a coarse grain out-of-order superscalar processor
 
Introduction to CloudStack Storage Subsystem
Introduction to CloudStack Storage SubsystemIntroduction to CloudStack Storage Subsystem
Introduction to CloudStack Storage Subsystem
 
Play With Android
Play With AndroidPlay With Android
Play With Android
 
Cloud Computing as Innovation Hub - Mohammad Fairus Khalid
Cloud Computing as Innovation Hub - Mohammad Fairus KhalidCloud Computing as Innovation Hub - Mohammad Fairus Khalid
Cloud Computing as Innovation Hub - Mohammad Fairus Khalid
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Throughput oriented aarchitectures
Throughput oriented aarchitecturesThroughput oriented aarchitectures
Throughput oriented aarchitectures
 
9 ec606a.36(finished)
9 ec606a.36(finished)9 ec606a.36(finished)
9 ec606a.36(finished)
 
Linux kernel development chapter 10
Linux kernel development chapter 10Linux kernel development chapter 10
Linux kernel development chapter 10
 

Similar to LAGraph 2021-10-13

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFJason Riedy
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFJason Riedy
 
Linac Coherent Light Source (LCLS) Data Transfer Requirements
Linac Coherent Light Source (LCLS) Data Transfer RequirementsLinac Coherent Light Source (LCLS) Data Transfer Requirements
Linac Coherent Light Source (LCLS) Data Transfer Requirementsinside-BigData.com
 
数据中心网络研究:机遇与挑战
数据中心网络研究:机遇与挑战数据中心网络研究:机遇与挑战
数据中心网络研究:机遇与挑战Weiwei Fang
 
GraphBLAS and Emus
GraphBLAS and EmusGraphBLAS and Emus
GraphBLAS and EmusJason Riedy
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageKernel TLV
 
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...DataStax
 
Network support for resource disaggregation in next-generation datacenters
Network support for resource disaggregation in next-generation datacentersNetwork support for resource disaggregation in next-generation datacenters
Network support for resource disaggregation in next-generation datacentersSangjin Han
 
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...Ontico
 
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...balmanme
 
CPU Caches - Jamie Allen
CPU Caches - Jamie AllenCPU Caches - Jamie Allen
CPU Caches - Jamie Allenjaxconf
 
DPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles ShiflettDPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles ShiflettJim St. Leger
 
HPCS16 - Frederick Lefebvre - Bridging the last mile
HPCS16 - Frederick Lefebvre - Bridging the last mileHPCS16 - Frederick Lefebvre - Bridging the last mile
HPCS16 - Frederick Lefebvre - Bridging the last mileFrédérick Lefebvre
 
MayaData Datastax webinar - Operating Cassandra on Kubernetes with the help ...
MayaData  Datastax webinar - Operating Cassandra on Kubernetes with the help ...MayaData  Datastax webinar - Operating Cassandra on Kubernetes with the help ...
MayaData Datastax webinar - Operating Cassandra on Kubernetes with the help ...MayaData Inc
 
Using Ceph for Large Hadron Collider Data
Using Ceph for Large Hadron Collider DataUsing Ceph for Large Hadron Collider Data
Using Ceph for Large Hadron Collider DataRob Gardner
 
Seastar / ScyllaDB, or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB,  or how we implemented a 10-times faster CassandraSeastar / ScyllaDB,  or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB, or how we implemented a 10-times faster CassandraTzach Livyatan
 
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...OpenEBS
 

Similar to LAGraph 2021-10-13 (20)

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
Linac Coherent Light Source (LCLS) Data Transfer Requirements
Linac Coherent Light Source (LCLS) Data Transfer RequirementsLinac Coherent Light Source (LCLS) Data Transfer Requirements
Linac Coherent Light Source (LCLS) Data Transfer Requirements
 
数据中心网络研究:机遇与挑战
数据中心网络研究:机遇与挑战数据中心网络研究:机遇与挑战
数据中心网络研究:机遇与挑战
 
GraphBLAS and Emus
GraphBLAS and EmusGraphBLAS and Emus
GraphBLAS and Emus
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast Storage
 
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
 
Network support for resource disaggregation in next-generation datacenters
Network support for resource disaggregation in next-generation datacentersNetwork support for resource disaggregation in next-generation datacenters
Network support for resource disaggregation in next-generation datacenters
 
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
 
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
 
CPU Caches - Jamie Allen
CPU Caches - Jamie AllenCPU Caches - Jamie Allen
CPU Caches - Jamie Allen
 
Cpu Caches
Cpu CachesCpu Caches
Cpu Caches
 
DPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles ShiflettDPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles Shiflett
 
19th Session.pptx
19th Session.pptx19th Session.pptx
19th Session.pptx
 
HPCS16 - Frederick Lefebvre - Bridging the last mile
HPCS16 - Frederick Lefebvre - Bridging the last mileHPCS16 - Frederick Lefebvre - Bridging the last mile
HPCS16 - Frederick Lefebvre - Bridging the last mile
 
MayaData Datastax webinar - Operating Cassandra on Kubernetes with the help ...
MayaData  Datastax webinar - Operating Cassandra on Kubernetes with the help ...MayaData  Datastax webinar - Operating Cassandra on Kubernetes with the help ...
MayaData Datastax webinar - Operating Cassandra on Kubernetes with the help ...
 
Using Ceph for Large Hadron Collider Data
Using Ceph for Large Hadron Collider DataUsing Ceph for Large Hadron Collider Data
Using Ceph for Large Hadron Collider Data
 
Seastar / ScyllaDB, or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB,  or how we implemented a 10-times faster CassandraSeastar / ScyllaDB,  or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB, or how we implemented a 10-times faster Cassandra
 
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...
 
Postgres clusters
Postgres clustersPostgres clusters
Postgres clusters
 

More from Jason Riedy

Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architecturesJason Riedy
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureJason Riedy
 
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...Jason Riedy
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureJason Riedy
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisJason Riedy
 
Novel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondNovel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondJason Riedy
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksJason Riedy
 
CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateJason Riedy
 
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Jason Riedy
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesJason Riedy
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsJason Riedy
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsJason Riedy
 
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisA New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisJason Riedy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs Jason Riedy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsJason Riedy
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsJason Riedy
 
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsScalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsJason Riedy
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraJason Riedy
 
Network Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity AnalysisNetwork Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity AnalysisJason Riedy
 
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014Jason Riedy
 

More from Jason Riedy (20)

Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architectures
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to Architecture
 
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
Novel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondNovel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and Beyond
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with Microbenchmarks
 
CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery Update
 
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New Architectures
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisA New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming Graphs
 
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsScalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
 
Network Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity AnalysisNetwork Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity Analysis
 
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
 

Recently uploaded

High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 

Recently uploaded (20)

High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 

LAGraph 2021-10-13

  • 1. Introducing Lucata’s GraphBLAS v1.0 is tagged! And tested! Jason Riedy and Shannon Kuntz 13 Oct 2021 Lucata Corporation
  • 2. Parallel Graph Analysis Known issues: • Scattered memory access w/ small(?) seq. bursts • Cache lines provide fraction of avail. BW • Prefetechers fire up then mis-predict • Large bursts (high degree) ⇒ load imbalance • Combined with 90% diameter ≤ 8. • Plenty of load-balancing pre-processing... • Streaming: The graph is changing. • Pre-processing can hurt where and when changes are interesting. CPU+cache systems have one set of coping mechanisms. GPGPUs / flex. vectors another. And Lucata has yet more... Lucata — Riedy, Kuntz — 13 Oct 2021 2/10
  • 3. The Lucata Pathfinder PGAS architecture NCDRAM: 64GiB Memory-Side Proc. Cores: 24 or so Migration, I/O, etc. SC Node 0 Node 7 Chassis 0 Lucata System Four chassis system is a 2 TiB NSF CCRI resource at crnch.gatech.edu • Optimized for weak locality • Scattered jumps and seq. access • Stationary core for OS per node + SSDs • Hardware partitioned global address space (PGAS) with a twist • Plenty of network BW, low latency • Details in a moment... • Multithreaded multicore LCE (or GC) • Currently 1536 threads per node, 12k per chassis, 50k per 4 chassis. • “Helps” with load balance • No cache. Lucata — Riedy, Kuntz — 13 Oct 2021 3/10
  • 4. Lucata’s PGAS Twist for Weak Locality NCDRAM: 64GiB Memory-Side Proc. Cores: 24 or so Migration, I/O, etc. SC Node 0 Node 7 Chassis 0 Lucata System Four chassis system is a 2 TiB NSF CCRI resource at crnch.gatech.edu • Threads write remotely, always read locally • Writes: 8 Memory-side processors (MSPs) • Writes+ don’t touch the cores. • Handle some arithmetic ops. (FPADD) • Deep queue, no control flow • Reads ⇒migrate⇐. • Hardware: Remote read ⇒ package and send the thread context • Read latency is local up to migration. • Control flow depends on reads. • Contrast with Tera MTA / Cray XMT: Need far fewer threads, far less network bandwidth. Lucata — Riedy, Kuntz — 13 Oct 2021 4/10
  • 5. Programming the not-Beast: Not painful. NCDRAM: 64GiB Memory-Side Proc. Cores: 24 or so Migration, I/O, etc. SC Node 0 Node 7 Chassis 0 Lucata System Four chassis system is a 2 TiB NSF CCRI resource at crnch.gatech.edu • PGAS: Read and write directly. • Memory views implemented in hardware • Intra-node malloc • Node-striped mw_malloc1d • Node-blocked mw_malloc2d ... • Implemented by pointer bitfields • Fork/join parallelism: Cilk+ + extensions • Yes, Cilk+ is alive: OpenCilk • Fast: Spawning a thread ≈ function • Composes: “Serial elision.” • Collectives? In progress. • Some Cilk+ reducers map perfectly. Lucata — Riedy, Kuntz — 13 Oct 2021 5/10
  • 6. Challenges in Implementing the GraphBLAS • Naïve kernels may migrate for every edge. • Gustavson’s SpGEMM • Treat with a form of message aggregation • Collectives like prefix-scan... • Fork/join is cheap, but not free. • Prefix-scan must join, (re-fork, re-join,)2 join • (e.g. Existing code from Arch Robinson.) • Separate memory spaces: SC ∩ Lucata = ∅ • From SC: Data structures fully opaque • From LCE/GC: Trigger but not fulfill transfer • Anything that operates edge-by-edge via or outside the GraphBLAS: NO Lucata — Riedy, Kuntz — 13 Oct 2021 6/10
  • 7. Current LGB Data Structure Perspective Lucata — Riedy, Kuntz — 13 Oct 2021 7/10
  • 8. Current LGB Data Structure before Operating Lucata — Riedy, Kuntz — 13 Oct 2021 8/10
  • 9. Future Aspects (Student Opportunities?) • Add a fast, fixed-size block pool per matrix • Fast de-allocation for temporaries • Or ignore the pool, CSR-ish for marked temps • Can alloc. consecutive blocks, next = count • Opportunities for large-degree vertices! • Can stripe across nodes, views are transparent • Composing parallelism ⇒ more options than current big/small splits (e.g. MTGL) • Feed back into deeper Lucata spawning / flow control decisions • Extending CMU/SEI’s GBTLX (C++ with SPIRAL) for PGAS, migratory threads Lucata — Riedy, Kuntz — 13 Oct 2021 9/10
  • 10. And Streaming! (NSF SBIR 2105977) • HW supports multi-Gig network ingest per node • Large-scale locking or snapshotting is a no-go • Starting current streaming/updating algs... • No locking: “Valid” algs reading edges atomically1 • Starting graph + some subset of concurrent changes • BFS, connected components, linear algebra centralities (PageRank, Katz), triangle counting • Copying subgraphs also... Seed set expansion • In turn can be updated “validly.” • Other approaches: Aspen (already in Cilk+) 1 Chunxing Yin and J.R. Concurrent Katz Centrality for Streaming Graphs. HPEC 2019. DOI 10.1109/HPEC.2019.8916572 Lucata — Riedy, Kuntz — 13 Oct 2021 10/10