SlideShare a Scribd company logo
Novel Architectures for Applications in
Data Science and Beyond
Jason Riedy, Jeffrey Young, Tom Conte
Center for Research into Novel Computing Hierarchies at Georgia Tech
1 March 2019
Apps: Massive+-scale data analysis
Cyber-security Identify anomalies, malicious actors
Health care Find outbreaks, population epidemiology, similar
patient association
Social networks Advertising, searching, grouping
Intelligence Decisions at scale, regulating markets, smart &
sustainable cities
Systems biology Understanding interactions, drug design
Power grid / Smart cities Disruptions, conservation, prediction
Irregular data access. Changing data.
Rogues Gallery — 1 Mar 2019 2/23
High-Performance Data Analysis (HPDA)
Novel applications:
• Data at scale and speed needs new ideas for
computing analysis.
• “Big data” platforms fare poorly v. a single thread
plus large SSD even for static data sets. (McSherry,
Isard, Murray. “Scalability! But at what COST?” HotOS
XV, 2015.)
• Many high-level codes are written and re-written to
answer one question: need flexibility.
• Some primitives may be tuned and re-used.
Rogues Gallery — 1 Mar 2019 3/23
HPDA and Architectures
• Current architectures are hitting limits on
manufacturing, heat dissipation, memory latency...
• New architecture proposals are difficult to evaluate
via simulation and modeling alone.
• What happens when novel prototypes hit reality?
• Architects/designers need rapid feedback on new
ideas.
• New ideas only become successful with a
community: software ecosystem, trained students.
Need bridges between apps and architects.
Rogues Gallery — 1 Mar 2019 4/23
Introducing the CRNCH Rogues Gallery
A physical & virtual space for hosting novel computing
architectures, systems, and accelerators.
Emu Chick FPGAs & HMC/HBM
FPAA
Amortize effort and cost of trying novel architectures.
Break the “but it’s too much work” barrier.
http://crnch.gatech.edu/rogues-gallery
Rogues Gallery — 1 Mar 2019 5/23
Emu Technology’s PGAS Architecture
1 nodelet
Gossamer
Core 1
Memory-Side Processor
Gossamer
Core 4
...
Migration Engine
RapidIODisk I/O
8 nodelets
per node
64 nodelets
per Chick
RapidIO
Stationary
Core
• Multithreaded multicore
• Memory-side “processor” for
operations in
narrow-channel DRAM
• Stationary core for OS
• Threads migrate in
hardware on reads!
• Optimize for weak locality
Rogues Gallery — 1 Mar 2019 6/23
3D Stacked Memory and FPGAs
FPGAs enable flexible
“near-memory” research for both
hardware and programming
models:
• Micron/Pico EX700 & HMC
• Nallatech 385s (Arria 10)
• Intel Arria10 DevKit
• Nallatech 520N (Stratix 10)
• Xilinx MpSOC
Rogues Gallery — 1 Mar 2019 7/23
Neuromorphic Systems
• Field-Programmable Analog Array (FPAA) System-On
Chip, designed in the lab of Dr. Jennifer Hasler.
• Analog arrays can achieve unprecedented power and
size reductions.
FPAA “driven” by a RPi Neuromorphic Workshop,
27 Apr 2018
Rogues Gallery — 1 Mar 2019 8/23
Flexible Infrastructure
login /
notebook
rg-adm
Slurm Ctl
toolbox
(NFS)
Scheduling,
Tools, and
Admin
Key:
Schedulable Resource
Physical Resource
VM
USB device
User
Resources
fpaa-host
power-host
nvidia-tegra-N
nvidia-tegra-1
fpaa-dev
rg-db
Slurm DBD
emu-dev emu-chick
..Nfpga-dev-1
fpga-hmcfpga-intel
Powell, Riedy, Young, and Conte. “Wrangling Rogues: Managing
Experimental Post-Moore Architectures.”
https://arxiv.org/abs/1808.06334
• Available. Plans to
integrate with NSF
XSEDE.
• Scheduler being
deployed.
• Incorporates
Singularity and virtual
machines for
OS/library versioning.
• Fun tools: usbip for
FPAA... Analog inputs?
Rogues Gallery — 1 Mar 2019 9/23
Rogues Gallery Press
On The Next Platform, a partner of The Register:
https://www.nextplatform.com/2018/08/27/a-rogues-gallery-of-post-moores-law-options/
Rogues Gallery — 1 Mar 2019 10/23
Neuromorphic Workshop - April 2018
FPAA-focused workshop led by Dr.
Hasler introduced the wider
community to FPAA programming
for neuromorphic systems.
• Tutorial-style class
• Lightning talks from GT and
DoE researchers
• http://crnch.gatech.
edu/neuro-workshop18
Rogues Gallery — 1 Mar 2019 11/23
Rogues Gallery ASPLOS Tutorial
Programming Novel Architectures in the Post-Moore Era
with The Rogues Gallery
https://crnch-rg.gitlab.io/rg/asplos-2019/
• To be held at ASPLOS 2019 in Providence, RI.
• 14 April 2019, 8.30am – 12.00pm
• Architecture detailed descriptions
• Hands-on with the Emu Chick and (hopefully) the
FPAA
Rogues Gallery — 1 Mar 2019 12/23
Rogues Gallery VIP Team
Rogues Gallery VIP team has started in Spring 2019. This
course allows undergraduates to get research experience
working with software for novel architectures.
http://www.vip.gatech.edu/teams/
new-team-rogues-gallery
Rogues Gallery — 1 Mar 2019 13/23
Selected Results: Emu Pointer-Chasing Benchmark
Data-dependent loads, fine-grained access1
Ordered
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Intra-block shuffle: weak locality
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Full block shuffle: weak locality
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1
Eric Hein, Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Vuduc, Riedy. “An Initial Characterization of the Emu
Chick,” Workshop on Accelerators and Hybrid Exascale Systems (AsHES) 2018.
Rogues Gallery — 1 Mar 2019 14/23
Selected Results: x86 Pointer-Chasing Benchmark
1
4
16
64
256
1K
4K
16K
64K
256K
1M
4M
Block size (number of 16B elements)
0
20
40
60
80
100Memorybandwidth(GBs)
peak STREAM bandwidth
56 threads
1
4
16
64
256
1K
4K
16K
64K
256K
1M
4M
Block size (number of 16B elements)
peak STREAM bandwidth
112 threads
block_shuffle intra_block_shuffle full_block_shuffle
Haswell results, every pattern is different.2
2
Eric Hein, Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Riedy, Vuduc, Conte. “A Microbenchmark Characterization
of the Emu Chick.” https://arxiv.org/abs/1809.07696
Rogues Gallery — 1 Mar 2019 15/23
Selected Results: Emu Pointer-Chasing Benchmark
1
4
16
64
256
1K
4K
16K
64K
256K
1M
4MBlock size (number of 16B elements)
0
2
4
6
8
10
12
Memorybandwidth(GBs)
peak STREAM bandwidth
2048 threads
1
4
16
64
256
1K
4K
16K
64K
256K
1M
4M
Block size (number of 16B elements)
peak STREAM bandwidth
4096 threads
block_shuffle intra_block_shuffle full_block_shuffle
Mostly flat performance, high utilization.2
Rogues Gallery — 1 Mar 2019 16/23
Selected Results: BFS on a Dynamic Data Structure
15 16 17 18 19 20 21
scale
0
20
40
60
80
100
MTEPS
Emu single node - Cilk
Emu multi-node - Cilk
x86 Haswell - STINGER
x86 Haswell - Cilk
0
500
1000
1500
EdgeBandwidth(MB/s)
Note: Streaming data structure, not statically optimized. 3
3
Hein, Eswar, Abdurrahman Yasar, Prasanth Chatarasi, Li, Young, Conte, Ümit Çatalyürek, Vuduc, Riedy, Bora Uçar.
“Programming Strategies for Irregular Algorithms on the Emu Chick.” https://arxiv.org/abs/1901.02775
Rogues Gallery — 1 Mar 2019 17/23
Selected Results: Labeled Subgraph Alignment
1 2 4 8 16 32 64 128
Number of Threads
0
10
20
30
40
50
Speedup
Multi-BLK
Multi-HCB
Single-BLK
Single-HCB
gsaNA, the first parallel algorithm, strong scaling on DBLP
graph (2048 vertices)3
Rogues Gallery — 1 Mar 2019 18/23
Selected Results: FPGAs
Hadidi, Asgari, Young, Mudassar, Garg, Krishna, Kim. “Performance
Implications of NoCs on 3D-Stacked Memories: Insights from the
Hybrid Memory Cube (HMC),” ISPASS 2018
• Characterizations
with FPGA and Hybrid
Memory Cube show
latency/bandwidth
tradeoff.
• Other FPGA work is
focused on compilers,
HPC prototyping, and
sparse algorithms for
Intel and Xilinx FPGAS.
Rogues Gallery — 1 Mar 2019 19/23
Lessons Learned, Emu Chick i
• Finding appropriate metrics is difficult, e.g. for the
Emu:
• Comparing ASICs (e.g. x86) to FPGA-based prototypes
can be unfair either way.
• Fraction of peak bandwidth for the idealized
problem?
• SpMV: FLOP/s ∝ BW, level 2 sparse BLAS op.
• Graph500 BFS: TEPS ∝ BW
Rogues Gallery — 1 Mar 2019 20/23
Lessons Learned, Emu Chick ii
• Distilling observations on architecture ↔
programming model:
• Program data location for load (BW) balance.
• Remote memory operations v. migration exposes the
architecture.
• Migrations cost more than it appears. Computation?
• Stack spills/access can cause ping-ponging.
• How does HW support for top-down (Cilk-ish) affect
bottom-up (UPC/SHMEM) PGAS programming?
• Memory allocation similar to UPC, SHMEM
• UPC++ rpc_ff v. Emu thread migration?
Rogues Gallery — 1 Mar 2019 21/23
Acknowledgments
Fantastic students and colleagues:
• Srinivas Eswar (GT CSE)
• Dr. Eric Hein (GT ECE ⇒ Emu)
• Patrick Lavin (GT CSE)
• Dr. Jiajia Li (GT CSE ⇒ PNNL)
• Abdurrahman Yaşar (GT CSE)
• Dr. Ümit Çatalürek (GT CSE)
• Dr. Tom Conte (GT CS/ECE)
• Dr. Bora Uçar (ENS Lyon CNRS)
• Dr. Rich Vuduc (GT CSE)
• Dr. Jeffrey S. Young (GT CS)
Code (ideally will have links from crnch.gatech.edu):
• https://gitlab.com/crnch-rg (soon)
• https://github.com/ehein6/emu-microbench
Other testbeds:
• ORNL: ExCL
• PNNL: CENATE
• Argonne
• Sandia
• Berkeley: AQCT
• (others?)
Rogues Gallery — 1 Mar 2019 22/23
Rogues Gallery: Active and Growing
• Added FPGA resources, integrating FPAAs
• Tight development loop with Emu
• Active research projects and publications
• Community outreach and education underway
• One GT student has made the leap to industry
already...
CRNCH Rogues Gallery connects researchers and
students with novel architectures and architects with
upcoming applications.
Let us host / manage your neat stuff!
http://crnch.gatech.edu/rogues-gallery
Rogues Gallery — 1 Mar 2019 23/23

More Related Content

Similar to Novel Architectures for Applications in Data Science and Beyond

CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery Update
Jason Riedy
 
Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​
Larry Smarr
 
OpenACC and Hackathons Monthly Highlights: April 2023
OpenACC and Hackathons Monthly Highlights: April  2023OpenACC and Hackathons Monthly Highlights: April  2023
OpenACC and Hackathons Monthly Highlights: April 2023
OpenACC
 
KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.
Kyong-Ha Lee
 
The Pacific Research Platform
The Pacific Research PlatformThe Pacific Research Platform
The Pacific Research Platform
Larry Smarr
 
Challenges in end-to-end performance
Challenges in end-to-end performanceChallenges in end-to-end performance
Challenges in end-to-end performance
Jisc
 
Implementing AI: Running AI at the Edge: Adapting AI to available resource in...
Implementing AI: Running AI at the Edge: Adapting AI to available resource in...Implementing AI: Running AI at the Edge: Adapting AI to available resource in...
Implementing AI: Running AI at the Edge: Adapting AI to available resource in...
KTN
 
Acceleration of XML Parsing through Prefetching
Acceleration of XML  Parsing through PrefetchingAcceleration of XML  Parsing through Prefetching
Acceleration of XML Parsing through PrefetchingRohit Deshpande
 
Wireless FasterData and Distributed Open Compute Opportunities and (some) Us...
 Wireless FasterData and Distributed Open Compute Opportunities and (some) Us... Wireless FasterData and Distributed Open Compute Opportunities and (some) Us...
Wireless FasterData and Distributed Open Compute Opportunities and (some) Us...
Larry Smarr
 
Andrew Wiedlea - Wireless FasterData and Distributed Open Compute Opportuniti...
Andrew Wiedlea - Wireless FasterData and Distributed Open Compute Opportuniti...Andrew Wiedlea - Wireless FasterData and Distributed Open Compute Opportuniti...
Andrew Wiedlea - Wireless FasterData and Distributed Open Compute Opportuniti...
Larry Smarr
 
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
Geoffrey Fox
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
Jason Riedy
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with Microbenchmarks
Jason Riedy
 
[150824]symposium v4
[150824]symposium v4[150824]symposium v4
[150824]symposium v4
yyooooon
 
Multicore series-1-0223
Multicore series-1-0223Multicore series-1-0223
Multicore series-1-0223Aysha Khan
 
FPGA-based soft-processors: 6G nodes and post-quantum security in space
 FPGA-based soft-processors: 6G nodes and post-quantum security in space FPGA-based soft-processors: 6G nodes and post-quantum security in space
FPGA-based soft-processors: 6G nodes and post-quantum security in space
Facultad de Informática UCM
 
Priorities Shift In IC Design
Priorities Shift In IC DesignPriorities Shift In IC Design
Priorities Shift In IC Design
Abacus Technologies
 
An octa core processor with shared memory and message-passing
An octa core processor with shared memory and message-passingAn octa core processor with shared memory and message-passing
An octa core processor with shared memory and message-passing
eSAT Journals
 
Gridcomputing
GridcomputingGridcomputing
Gridcomputingpchengi
 
PRP, NRP, GRP & the Path Forward
PRP, NRP, GRP & the Path ForwardPRP, NRP, GRP & the Path Forward
PRP, NRP, GRP & the Path Forward
Larry Smarr
 

Similar to Novel Architectures for Applications in Data Science and Beyond (20)

CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery Update
 
Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​
 
OpenACC and Hackathons Monthly Highlights: April 2023
OpenACC and Hackathons Monthly Highlights: April  2023OpenACC and Hackathons Monthly Highlights: April  2023
OpenACC and Hackathons Monthly Highlights: April 2023
 
KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.
 
The Pacific Research Platform
The Pacific Research PlatformThe Pacific Research Platform
The Pacific Research Platform
 
Challenges in end-to-end performance
Challenges in end-to-end performanceChallenges in end-to-end performance
Challenges in end-to-end performance
 
Implementing AI: Running AI at the Edge: Adapting AI to available resource in...
Implementing AI: Running AI at the Edge: Adapting AI to available resource in...Implementing AI: Running AI at the Edge: Adapting AI to available resource in...
Implementing AI: Running AI at the Edge: Adapting AI to available resource in...
 
Acceleration of XML Parsing through Prefetching
Acceleration of XML  Parsing through PrefetchingAcceleration of XML  Parsing through Prefetching
Acceleration of XML Parsing through Prefetching
 
Wireless FasterData and Distributed Open Compute Opportunities and (some) Us...
 Wireless FasterData and Distributed Open Compute Opportunities and (some) Us... Wireless FasterData and Distributed Open Compute Opportunities and (some) Us...
Wireless FasterData and Distributed Open Compute Opportunities and (some) Us...
 
Andrew Wiedlea - Wireless FasterData and Distributed Open Compute Opportuniti...
Andrew Wiedlea - Wireless FasterData and Distributed Open Compute Opportuniti...Andrew Wiedlea - Wireless FasterData and Distributed Open Compute Opportuniti...
Andrew Wiedlea - Wireless FasterData and Distributed Open Compute Opportuniti...
 
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with Microbenchmarks
 
[150824]symposium v4
[150824]symposium v4[150824]symposium v4
[150824]symposium v4
 
Multicore series-1-0223
Multicore series-1-0223Multicore series-1-0223
Multicore series-1-0223
 
FPGA-based soft-processors: 6G nodes and post-quantum security in space
 FPGA-based soft-processors: 6G nodes and post-quantum security in space FPGA-based soft-processors: 6G nodes and post-quantum security in space
FPGA-based soft-processors: 6G nodes and post-quantum security in space
 
Priorities Shift In IC Design
Priorities Shift In IC DesignPriorities Shift In IC Design
Priorities Shift In IC Design
 
An octa core processor with shared memory and message-passing
An octa core processor with shared memory and message-passingAn octa core processor with shared memory and message-passing
An octa core processor with shared memory and message-passing
 
Gridcomputing
GridcomputingGridcomputing
Gridcomputing
 
PRP, NRP, GRP & the Path Forward
PRP, NRP, GRP & the Path ForwardPRP, NRP, GRP & the Path Forward
PRP, NRP, GRP & the Path Forward
 

More from Jason Riedy

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
Jason Riedy
 
LAGraph 2021-10-13
LAGraph 2021-10-13LAGraph 2021-10-13
LAGraph 2021-10-13
Jason Riedy
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
Jason Riedy
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architectures
Jason Riedy
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to Architecture
Jason Riedy
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
Jason Riedy
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
Jason Riedy
 
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Jason Riedy
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New Architectures
Jason Riedy
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
Jason Riedy
 
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisA New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
Jason Riedy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
Jason Riedy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
Jason Riedy
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming Graphs
Jason Riedy
 
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsScalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Jason Riedy
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
Jason Riedy
 
Network Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity AnalysisNetwork Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity Analysis
Jason Riedy
 
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Jason Riedy
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
Jason Riedy
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
Jason Riedy
 

More from Jason Riedy (20)

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
LAGraph 2021-10-13
LAGraph 2021-10-13LAGraph 2021-10-13
LAGraph 2021-10-13
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architectures
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to Architecture
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New Architectures
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisA New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming Graphs
 
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsScalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
 
Network Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity AnalysisNetwork Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity Analysis
 
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
 

Recently uploaded

一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 

Recently uploaded (20)

一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 

Novel Architectures for Applications in Data Science and Beyond

  • 1. Novel Architectures for Applications in Data Science and Beyond Jason Riedy, Jeffrey Young, Tom Conte Center for Research into Novel Computing Hierarchies at Georgia Tech 1 March 2019
  • 2. Apps: Massive+-scale data analysis Cyber-security Identify anomalies, malicious actors Health care Find outbreaks, population epidemiology, similar patient association Social networks Advertising, searching, grouping Intelligence Decisions at scale, regulating markets, smart & sustainable cities Systems biology Understanding interactions, drug design Power grid / Smart cities Disruptions, conservation, prediction Irregular data access. Changing data. Rogues Gallery — 1 Mar 2019 2/23
  • 3. High-Performance Data Analysis (HPDA) Novel applications: • Data at scale and speed needs new ideas for computing analysis. • “Big data” platforms fare poorly v. a single thread plus large SSD even for static data sets. (McSherry, Isard, Murray. “Scalability! But at what COST?” HotOS XV, 2015.) • Many high-level codes are written and re-written to answer one question: need flexibility. • Some primitives may be tuned and re-used. Rogues Gallery — 1 Mar 2019 3/23
  • 4. HPDA and Architectures • Current architectures are hitting limits on manufacturing, heat dissipation, memory latency... • New architecture proposals are difficult to evaluate via simulation and modeling alone. • What happens when novel prototypes hit reality? • Architects/designers need rapid feedback on new ideas. • New ideas only become successful with a community: software ecosystem, trained students. Need bridges between apps and architects. Rogues Gallery — 1 Mar 2019 4/23
  • 5. Introducing the CRNCH Rogues Gallery A physical & virtual space for hosting novel computing architectures, systems, and accelerators. Emu Chick FPGAs & HMC/HBM FPAA Amortize effort and cost of trying novel architectures. Break the “but it’s too much work” barrier. http://crnch.gatech.edu/rogues-gallery Rogues Gallery — 1 Mar 2019 5/23
  • 6. Emu Technology’s PGAS Architecture 1 nodelet Gossamer Core 1 Memory-Side Processor Gossamer Core 4 ... Migration Engine RapidIODisk I/O 8 nodelets per node 64 nodelets per Chick RapidIO Stationary Core • Multithreaded multicore • Memory-side “processor” for operations in narrow-channel DRAM • Stationary core for OS • Threads migrate in hardware on reads! • Optimize for weak locality Rogues Gallery — 1 Mar 2019 6/23
  • 7. 3D Stacked Memory and FPGAs FPGAs enable flexible “near-memory” research for both hardware and programming models: • Micron/Pico EX700 & HMC • Nallatech 385s (Arria 10) • Intel Arria10 DevKit • Nallatech 520N (Stratix 10) • Xilinx MpSOC Rogues Gallery — 1 Mar 2019 7/23
  • 8. Neuromorphic Systems • Field-Programmable Analog Array (FPAA) System-On Chip, designed in the lab of Dr. Jennifer Hasler. • Analog arrays can achieve unprecedented power and size reductions. FPAA “driven” by a RPi Neuromorphic Workshop, 27 Apr 2018 Rogues Gallery — 1 Mar 2019 8/23
  • 9. Flexible Infrastructure login / notebook rg-adm Slurm Ctl toolbox (NFS) Scheduling, Tools, and Admin Key: Schedulable Resource Physical Resource VM USB device User Resources fpaa-host power-host nvidia-tegra-N nvidia-tegra-1 fpaa-dev rg-db Slurm DBD emu-dev emu-chick ..Nfpga-dev-1 fpga-hmcfpga-intel Powell, Riedy, Young, and Conte. “Wrangling Rogues: Managing Experimental Post-Moore Architectures.” https://arxiv.org/abs/1808.06334 • Available. Plans to integrate with NSF XSEDE. • Scheduler being deployed. • Incorporates Singularity and virtual machines for OS/library versioning. • Fun tools: usbip for FPAA... Analog inputs? Rogues Gallery — 1 Mar 2019 9/23
  • 10. Rogues Gallery Press On The Next Platform, a partner of The Register: https://www.nextplatform.com/2018/08/27/a-rogues-gallery-of-post-moores-law-options/ Rogues Gallery — 1 Mar 2019 10/23
  • 11. Neuromorphic Workshop - April 2018 FPAA-focused workshop led by Dr. Hasler introduced the wider community to FPAA programming for neuromorphic systems. • Tutorial-style class • Lightning talks from GT and DoE researchers • http://crnch.gatech. edu/neuro-workshop18 Rogues Gallery — 1 Mar 2019 11/23
  • 12. Rogues Gallery ASPLOS Tutorial Programming Novel Architectures in the Post-Moore Era with The Rogues Gallery https://crnch-rg.gitlab.io/rg/asplos-2019/ • To be held at ASPLOS 2019 in Providence, RI. • 14 April 2019, 8.30am – 12.00pm • Architecture detailed descriptions • Hands-on with the Emu Chick and (hopefully) the FPAA Rogues Gallery — 1 Mar 2019 12/23
  • 13. Rogues Gallery VIP Team Rogues Gallery VIP team has started in Spring 2019. This course allows undergraduates to get research experience working with software for novel architectures. http://www.vip.gatech.edu/teams/ new-team-rogues-gallery Rogues Gallery — 1 Mar 2019 13/23
  • 14. Selected Results: Emu Pointer-Chasing Benchmark Data-dependent loads, fine-grained access1 Ordered 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Intra-block shuffle: weak locality 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Full block shuffle: weak locality 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1 Eric Hein, Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Vuduc, Riedy. “An Initial Characterization of the Emu Chick,” Workshop on Accelerators and Hybrid Exascale Systems (AsHES) 2018. Rogues Gallery — 1 Mar 2019 14/23
  • 15. Selected Results: x86 Pointer-Chasing Benchmark 1 4 16 64 256 1K 4K 16K 64K 256K 1M 4M Block size (number of 16B elements) 0 20 40 60 80 100Memorybandwidth(GBs) peak STREAM bandwidth 56 threads 1 4 16 64 256 1K 4K 16K 64K 256K 1M 4M Block size (number of 16B elements) peak STREAM bandwidth 112 threads block_shuffle intra_block_shuffle full_block_shuffle Haswell results, every pattern is different.2 2 Eric Hein, Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Riedy, Vuduc, Conte. “A Microbenchmark Characterization of the Emu Chick.” https://arxiv.org/abs/1809.07696 Rogues Gallery — 1 Mar 2019 15/23
  • 16. Selected Results: Emu Pointer-Chasing Benchmark 1 4 16 64 256 1K 4K 16K 64K 256K 1M 4MBlock size (number of 16B elements) 0 2 4 6 8 10 12 Memorybandwidth(GBs) peak STREAM bandwidth 2048 threads 1 4 16 64 256 1K 4K 16K 64K 256K 1M 4M Block size (number of 16B elements) peak STREAM bandwidth 4096 threads block_shuffle intra_block_shuffle full_block_shuffle Mostly flat performance, high utilization.2 Rogues Gallery — 1 Mar 2019 16/23
  • 17. Selected Results: BFS on a Dynamic Data Structure 15 16 17 18 19 20 21 scale 0 20 40 60 80 100 MTEPS Emu single node - Cilk Emu multi-node - Cilk x86 Haswell - STINGER x86 Haswell - Cilk 0 500 1000 1500 EdgeBandwidth(MB/s) Note: Streaming data structure, not statically optimized. 3 3 Hein, Eswar, Abdurrahman Yasar, Prasanth Chatarasi, Li, Young, Conte, Ümit Çatalyürek, Vuduc, Riedy, Bora Uçar. “Programming Strategies for Irregular Algorithms on the Emu Chick.” https://arxiv.org/abs/1901.02775 Rogues Gallery — 1 Mar 2019 17/23
  • 18. Selected Results: Labeled Subgraph Alignment 1 2 4 8 16 32 64 128 Number of Threads 0 10 20 30 40 50 Speedup Multi-BLK Multi-HCB Single-BLK Single-HCB gsaNA, the first parallel algorithm, strong scaling on DBLP graph (2048 vertices)3 Rogues Gallery — 1 Mar 2019 18/23
  • 19. Selected Results: FPGAs Hadidi, Asgari, Young, Mudassar, Garg, Krishna, Kim. “Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube (HMC),” ISPASS 2018 • Characterizations with FPGA and Hybrid Memory Cube show latency/bandwidth tradeoff. • Other FPGA work is focused on compilers, HPC prototyping, and sparse algorithms for Intel and Xilinx FPGAS. Rogues Gallery — 1 Mar 2019 19/23
  • 20. Lessons Learned, Emu Chick i • Finding appropriate metrics is difficult, e.g. for the Emu: • Comparing ASICs (e.g. x86) to FPGA-based prototypes can be unfair either way. • Fraction of peak bandwidth for the idealized problem? • SpMV: FLOP/s ∝ BW, level 2 sparse BLAS op. • Graph500 BFS: TEPS ∝ BW Rogues Gallery — 1 Mar 2019 20/23
  • 21. Lessons Learned, Emu Chick ii • Distilling observations on architecture ↔ programming model: • Program data location for load (BW) balance. • Remote memory operations v. migration exposes the architecture. • Migrations cost more than it appears. Computation? • Stack spills/access can cause ping-ponging. • How does HW support for top-down (Cilk-ish) affect bottom-up (UPC/SHMEM) PGAS programming? • Memory allocation similar to UPC, SHMEM • UPC++ rpc_ff v. Emu thread migration? Rogues Gallery — 1 Mar 2019 21/23
  • 22. Acknowledgments Fantastic students and colleagues: • Srinivas Eswar (GT CSE) • Dr. Eric Hein (GT ECE ⇒ Emu) • Patrick Lavin (GT CSE) • Dr. Jiajia Li (GT CSE ⇒ PNNL) • Abdurrahman Yaşar (GT CSE) • Dr. Ümit Çatalürek (GT CSE) • Dr. Tom Conte (GT CS/ECE) • Dr. Bora Uçar (ENS Lyon CNRS) • Dr. Rich Vuduc (GT CSE) • Dr. Jeffrey S. Young (GT CS) Code (ideally will have links from crnch.gatech.edu): • https://gitlab.com/crnch-rg (soon) • https://github.com/ehein6/emu-microbench Other testbeds: • ORNL: ExCL • PNNL: CENATE • Argonne • Sandia • Berkeley: AQCT • (others?) Rogues Gallery — 1 Mar 2019 22/23
  • 23. Rogues Gallery: Active and Growing • Added FPGA resources, integrating FPAAs • Tight development loop with Emu • Active research projects and publications • Community outreach and education underway • One GT student has made the leap to industry already... CRNCH Rogues Gallery connects researchers and students with novel architectures and architects with upcoming applications. Let us host / manage your neat stuff! http://crnch.gatech.edu/rogues-gallery Rogues Gallery — 1 Mar 2019 23/23