Dec 7, 2013
CARL2013@Davis, CA

PyCoRAM
Yet Another Implementation of CoRAM Memory
Architecture for Modern FPGA-based Computing

Shinya Takamaeda-Yamazaki†‡, Kenji Kise†, James C. Hoe*
†Tokyo Institute of Technology
‡JSPS Research Fellow
*Carnegie Mellon University

Agenda

n  Background
n  PyCoRAM Overview
n  PyCoRAM Microarchitecture
n  Evaluation
n  Conclusion

Dec 7, 2013

Shinya T-Y. Tokyo Tech

Background

FPGA as SoC
n  Put together various components on a single FPGA
l  CPU core

FPGA

•  Microblaze (Soft-macro)
•  Cortex-A9 (Hard-macro)

CPU

HW
Acc

HW
Acc

l  Hardware accelerator logic
Interconnect

•  Modeled in traditional RTL
–  Verilog HDL, VHDL

•  Modeled in new modeling tool

Ether

DRAM
I/F

PCI-E

–  Bluespec, AutoESL, Chisel, …

l  DDRx DRAM interface
l  PCI-express
l  Ethernet, …
Dec 7, 2013

Shinya T-Y. Tokyo Tech

4
Portability Issue of Application Design
n  How to support various FPGA platforms?
l  Different logic size,
memory interface,
peripherals and I/O propertyL

Digilent Atlys
(Xilinx Spartan-6 LX45)

ScalableCore System (our FPGA system)
(Xilinx Spartan-6 LX16 × 128-node)
Dec 7, 2013

Shinya T-Y. Tokyo Tech

Xilinx ML605
(Xilinx Virtex-6 LX240T)
5
IP-core Based System Development
n  To build a system, add IP-cores and connect themJ
l  IP-cores are connected through a standard on-chip interconnect
l  EDK automatically generates an on-chip interconnection and
(some) device-dependent interfaces
•  No (or few) annoying steps!
IP-core List
FPGA
CPU
IP-core
Instances

HW
Acc

HW
Acc

Interconnect
Ether

Interconnect

DRAM
I/F

PCI-E

DRAM
Dec 7, 2013

Xilinx Platform Studio (XPS)

Shinya T-Y. Tokyo Tech

6
Abstract Memory System for FPGAs
n  CoRAM (Connected RAM) [FPGA’11]
l  High-level abstraction for memory management
•  Decoupling computing logics and memory access behaviors
•  Memory access patterns in software model (C language)
Read/Write

Communication
FIFOs (Registers)

CoRAM
Channel

Read/Write

Read
Write

CoRAM
Memory

Abstracted
On-chip Memories

HW Kernels
(Computing Logics)

Dec 7, 2013

Off-chip
Memory

Shinya T-Y. Tokyo Tech

Manage
Control Threads
(Memory Access
Pattern in C)

7
What "Runs" CoRAM?
(From the CoRAM Tutorial @FPGA'13)
[Figure: the CoRAM stack. Architecture level: core logic, SRAM, and control thread programs. Microarchitecture level on the FPGA: fabric with control logic and cluster DMA, a network-on-chip, memory translation (TLBs), and memory interfaces and caches. Annotations: RTL conversion; high-level synthesis from C to RTL using LLVM; CONNECT NoC generator.]

PyCoRAM Overview

Motivation: CoRAM for EDK
n  Integration of CoRAM memory architecture for modern
EDK-based development flow with standard IP-cores
Portable application
design with CoRAM

Cooperation with standard IP-cores

Accelerator logic
Standard IP-core

CPU core

CoRAM
Abstraction
Standard On-chip Interconnect
Device-dependent Interfaces

Dec 7, 2013

Shinya T-Y. Tokyo Tech

10
PyCoRAM
n  Python-based implementation of CoRAM memory
architecture for modern FPGA EDKs
l  CoRAM memory abstraction for EDK development flow

n  Key features
l  Control Thread in Python
•  We developed Python-to-Verilog HLS Compiler from scratch

l  AMBA AXI4 Interconnect for on-chip interconnect
•  For IP-core based development on Xilinx Platform Studio (XPS)

l  Parameterized RTL Design Support for User-logic
•  Generate-statement and Parameter-statement analyzed by our
original Verilog analysis tool-chain (Pyverilog)

Dec 7, 2013

Shinya T-Y. Tokyo Tech

11
Comparison with Original CoRAM
Item                                        CoRAM                                 PyCoRAM
Language for control thread                 C                                     Python
Supported memory operations                 (Blocking/Non-blocking) Read/Write    (Blocking/Non-blocking) Read/Write
On-chip interconnect                        CONNECT NoC [FPGA'12]                 AMBA AXI4
FSM granularity in control thread           LLVM-IR                               Python AST node
Generate statement support for user logic   No                                    Yes
Supported FPGAs                             Xilinx ML605, Altera Terasic DE-4     Any FPGAs supporting AXI bus
# Lines of code                             11,682 lines (w/o CONNECT)            4,922 lines (w/o Pyverilog)

FSM: Finite State Machine
LLVM-IR: Low Level Virtual Machine Intermediate Representation
AST: Abstract Syntax Tree

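The "FSM granularity: Python AST node" row can be pictured as giving each statement-level node of the control thread's syntax tree its own FSM state. The following is only a rough sketch of that idea using Python's standard `ast` module; it is not PyCoRAM's compiler, and `list_states` and the sample thread are hypothetical names made up for illustration.

```python
import ast

# Hypothetical control-thread-like source, used only as input to the sketch.
SOURCE = """
def calc_sum(times):
    addr = 0
    total = 0
    for i in range(times):
        total += addr
        addr += 512
    print(total)
"""

def list_states(source):
    """Return (state_id, node_type, line_number) for every statement node."""
    tree = ast.parse(source)
    stmts = [n for n in ast.walk(tree) if isinstance(n, ast.stmt)]
    # Order by source position so state numbers follow the program text.
    stmts.sort(key=lambda n: (n.lineno, n.col_offset))
    return [(i, type(n).__name__, n.lineno) for i, n in enumerate(stmts)]

states = list_states(SOURCE)
for state_id, kind, lineno in states:
    print(f"state {state_id}: {kind} (source line {lineno})")
```

A real compiler would also add transitions (loop back-edges, waits on blocking operations); the sketch only enumerates candidate states.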
PyCoRAM Development Flow
n  PyCoRAM generates an IP-core package from user-logic
RTLs and control thread scripts in Python
l  Each part can be replaced with the original CoRAM’s component
RTL Conversion

User-logic
(Verilog HDL)

Control
Threads
(Python)

Portable
Application
Design

Dec 7, 2013

Logic
Hierarchy
Analysis

Python-toVerilog
Compilation

Control
Signal
Insertion

IP-core
generation
with AXI4
Interface
IP-core
Packing

Control
Signal Port
Addition

PyCoRAM Tool-chain
Python-to-Verilog HLS
Shinya T-Y. Tokyo Tech

(RTL,
.mpd,
and
.pao)

Top design
synthesis with
AXI4
IP-core
Integration
on EDK

Synthesis

FPGA
Bit
File

Vendor EDA Flow
13
FPGA Accelerator with PyCoRAM IP-core
[Figure: an FPGA containing the PyCoRAM IP (application) and another AXI IP-core or CPU. Inside the PyCoRAM IP, HW kernels (computing logic) connect to CoRAM memories (grouped into DMA clusters), CoRAM streams, and a CoRAM channel. The control thread (FSM) drives the DMACs, each of which has its own AXI interface. The AXI interfaces connect through the AXI4 interconnect to the DRAM controller and the off-chip DRAM.]

PyCoRAM Microarchitecture

PyCoRAM Microarchitecture (Logical View)
[Figure: logical view. User logic (modeled in RTL, Verilog HDL), with user I/O through GPIO, is linked to the control thread (memory access pattern in Python, realized as an FSM) through a CoRAM register and a CoRAM channel. The CoRAM memory and the CoRAM stream are each attached to a DMAC.]

PyCoRAM Microarchitecture (Physical View)
[Figure: physical view. On the FPGA, the PyCoRAM IP contains the user logic (with parameterized RTL design support) and its user I/O through GPIO, the CoRAM register, CoRAM channel, CoRAM memory, and CoRAM stream, and the control thread (written in Python, realized as an FSM). The CoRAM memory and the CoRAM stream each have a DMAC with an AXI interface (AXI4 master interface) that connects to the AXI4 interconnect and the DRAM controller.]

Control Thread in Python
n  Operations for CoRAM objects
l  To/from CoRAM Memory

User
I/O

User Logic

•  Data movement pattern with DMA operations
between on-chip CoRAM memory and DRAM

l  To/from CoRAM Channel
•  Token communication action
between user-logic and control thread

Control
Thread

CoRAM
Channel
CoRAM
Memory

FSM

DMAC

def calc_sum(times):
    ram = CoramMemory(idx=0, datawidth=32, size=1024)
    channel = CoramChannel(idx=0, datawidth=32)
    addr = 0
    sum = 0
    for i in range(times):
        ram.write(0, addr, 128)  # Transfer (off-chip DRAM to BRAM)
        channel.write(addr)      # Notification to user logic
        sum += channel.read()    # Wait for notification from user logic
        addr += 128 * (32 // 8)  # Advance by 128 words of 32 bits (byte address)
    print('sum=', sum)           # $display Verilog system task

calc_sum(8)

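To try the handshake in the control thread above on an ordinary PC, the CoRAM objects can be replaced by small software stand-ins. This is a sketch under stated assumptions: the two classes below are hypothetical mock-ups that only mimic the calls used on the slide, they are not the PyCoRAM classes, and `DRAM` and the lambda playing the role of user logic are invented for the example.

```python
from collections import deque

DRAM = list(range(1024))   # hypothetical DRAM contents: word i holds the value i
WORD_BYTES = 32 // 8       # 32-bit words, byte-addressed DRAM

class CoramMemory:
    """Mock on-chip memory; write() copies words from DRAM into it."""
    def __init__(self, idx, datawidth, size):
        self.data = [0] * size

    def write(self, ram_addr, dram_addr, length):
        start = dram_addr // WORD_BYTES
        self.data[ram_addr:ram_addr + length] = DRAM[start:start + length]

class CoramChannel:
    """Mock channel; write() wakes the user logic, read() returns its reply."""
    def __init__(self, idx, datawidth):
        self.replies = deque()
        self.user_logic = None

    def write(self, token):
        self.user_logic(token)

    def read(self):
        return self.replies.popleft()

def calc_sum(times):
    ram = CoramMemory(idx=0, datawidth=32, size=1024)
    channel = CoramChannel(idx=0, datawidth=32)
    # Stand-in for the user logic: sum the 128 words that were just loaded.
    channel.user_logic = lambda token: channel.replies.append(sum(ram.data[:128]))
    addr = 0
    total = 0
    for i in range(times):
        ram.write(0, addr, 128)     # DRAM -> on-chip memory
        channel.write(addr)         # notify user logic
        total += channel.read()     # collect the partial sum
        addr += 128 * WORD_BYTES
    return total

result = calc_sum(8)
print("sum =", result)
```

In hardware the channel read blocks until the user logic enqueues a value; the mock makes that synchronous by running the user logic inside `write()`.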
CoRAM objects in User Logic
n  CoRAM objects as standard BRAM or FIFO
l  Very similar interface to the standard memory components
l  User-logic can use their contents in them in the same way

n  Essential parameters to define object characteristics
l  Thread name, ID, data width, address length, …

(a) CoRAM Memory

CoramMemory1P
#(
  .CORAM_THREAD_NAME("thread_name"),
  .CORAM_ID(0),
  .CORAM_ADDR_LEN(ADDR_LEN),
  .CORAM_DATA_WIDTH(DATA_WIDTH)
)
inst_memory
(.CLK(CLK),
 .ADDR(mem_addr),
 .D(mem_d),
 .WE(mem_we),
 .Q(mem_q)
);

(b) CoRAM Channel

CoramChannel
#(
  .CORAM_THREAD_NAME("thread_name"),
  .CORAM_ID(0),
  .CORAM_ADDR_LEN(CHANNEL_ADDR_LEN),
  .CORAM_DATA_WIDTH(CHANNEL_DATA_WIDTH)
)
inst_channel
(.CLK(CLK),
 .RST(RST),
 .D(comm_d),
 .ENQ(comm_enq),
 .FULL(comm_full),
 .Q(comm_q),
 .DEQ(comm_deq),
 .EMPTY(comm_empty)
);

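The channel ports above (ENQ, FULL, DEQ, EMPTY) follow the usual FIFO handshake. As a rough behavioral illustration only, the sketch below models such a FIFO in Python; `ChannelFifo` and its `depth` parameter are hypothetical, and how PyCoRAM derives the real depth from its parameters is not stated on the slide.

```python
from collections import deque

class ChannelFifo:
    """Behavioral model of a FIFO with full/empty status flags."""
    def __init__(self, depth):
        self.depth = depth
        self.items = deque()

    @property
    def full(self):
        return len(self.items) >= self.depth

    @property
    def empty(self):
        return not self.items

    def enq(self, value):
        """Enqueue if there is room; report whether the value was accepted."""
        if self.full:
            return False
        self.items.append(value)
        return True

    def deq(self):
        """Dequeue the oldest value, or None when the FIFO is empty."""
        if self.empty:
            return None
        return self.items.popleft()

fifo = ChannelFifo(depth=2)
accepted = [fifo.enq(v) for v in (10, 20, 30)]   # third enqueue hits FULL
drained = [fifo.deq(), fifo.deq(), fifo.deq()]   # third dequeue hits EMPTY
print(accepted, drained)
```

User logic would check the full flag before asserting ENQ and the empty flag before asserting DEQ, which is what the two guards in the model stand for.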
AXI4 Master Interface
n  DMA controller works as AXI4 master IP-core interface
WrData

Enque

AlmFull

WrData

Enque

AlmFull

Empty
Deque

RdData

FSM

Addr

Size

RdEn

RdEn

Ready

WrData

Enque

AlmFull

Control
Thread

RdEn
Busy
Ready

RdData

・・・

WrEn

DMA Controller

Deque

RdData

Empty
Deque

WrEn

WrData

BramAddr
DramAddr
Size

Empty

CoRAM
Channel

DMA
Cluster

WrEn

Addr

WrData

・・・

RdData

CoRAM
Memory
(BRAM)

WrEn

WrData

RdData

Addr

CoRAM
Memory
(BRAM)

RdData

Addr

WrEn

RdData

WrData

Addr

HW Kernels
(Computing Logic)

Write Address
Channel

Write Data
Channel

Read Address
Channel

RDATA

RREADY

RVALID

ARADDR

ARLEN

ARVALID

ARREADY

WDATA

BVALID

WVALID

WREADY

AWADDR

AWLEN

AWVALID

AWREADY

AXI Master Interface
(Protocol Conversion)

Read Data
Channel

AXI4 Interconnect

Dec 7, 2013

Shinya T-Y. Tokyo Tech

22
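One job of an AXI4 master is to turn a DMA request (address and size) into legal bursts. The sketch below shows the general idea under two AXI4 rules: an incrementing burst carries at most 256 beats, and a burst must not cross a 4 KB address boundary. It is not taken from the PyCoRAM DMA controller, whose actual burst policy is not described on the slide; `split_into_bursts` is a hypothetical helper.

```python
def split_into_bursts(byte_addr, num_words, word_bytes,
                      max_beats=256, boundary=4096):
    """Split a transfer into (start_address, beats) bursts.

    Each burst has at most `max_beats` beats and stays inside one
    `boundary`-byte aligned region.
    """
    if byte_addr % word_bytes or boundary % word_bytes:
        raise ValueError("address and boundary must be word-aligned")
    bursts = []
    while num_words > 0:
        # Words that still fit before the next boundary.
        room = (boundary - byte_addr % boundary) // word_bytes
        beats = min(num_words, max_beats, room)
        bursts.append((byte_addr, beats))
        byte_addr += beats * word_bytes
        num_words -= beats
    return bursts

# 128 words of 4 bytes starting 256 bytes below a 4 KB boundary.
for addr, beats in split_into_bursts(0x0F00, 128, 4):
    # AxLEN encodes the number of beats minus one.
    print(f"addr=0x{addr:04X} beats={beats} axlen={beats - 1}")
```

The example request is split in two because the first 64 words reach the 4 KB boundary at 0x1000.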
For Parameterized RTL design support
n  Generate-statement support by advanced RTL analyzer
l  Not supported by the original CoRAM compiler
Dataflow

n  Pyverilog: Python-based Tool-chain
for Verilog HDL Design
l  Parser
l  Dataflow Analysis
l  Optimization
l  RTL Code Generation
l  Control flow Analysis
l  Graphical Output
Dec 7, 2013

State Machine
Shinya T-Y. Tokyo Tech

23
Evaluation

Evaluation
n  Point: Maximum memory bandwidth utilization
l  PyCoRAM is a memory abstraction framework

n  Setup
l  2 FPGA boards
•  Digilent Atlys
–  Spartan-6 LX45
–  DDR2-800 DRAM 128MB (1.2GB/s*)

*Due to 300MHz operation

–  AXI4 128-bit, 100MHz (1.6GB/s)

•  Xilinx ML605

Digilent Atlys
(Xilinx Spartan-6 LX45)

–  Virtex-6 LX240T
–  DDR3-800 DRAM 512MB (6.4GB/s)
–  AXI4 256-bit, 200MHz (6.4GB/s)

l  EDK
•  Xilinx Platform Studio (14.6)
Dec 7, 2013

Shinya T-Y. Tokyo Tech

Xilinx ML605
(Xilinx Virtex-6 LX240T)

25
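The AXI4 peak figures in the setup follow from bus width times clock frequency. As a quick check, the sketch below recomputes them and takes the smaller of the AXI4 and DRAM peaks as the limiting bandwidth per board. The DRAM peaks are copied from the slide rather than derived, and treating the minimum as the reference for utilization is an assumption of this sketch, not something the slide states.

```python
def axi_peak_gb_per_s(width_bits, freq_mhz):
    """Peak AXI4 data bandwidth: one beat of width_bits per clock cycle."""
    return width_bits / 8 * freq_mhz * 1e6 / 1e9

# Values taken from the setup slide.
boards = {
    "Atlys": {"axi_bits": 128, "axi_mhz": 100, "dram_gb_per_s": 1.2},
    "ML605": {"axi_bits": 256, "axi_mhz": 200, "dram_gb_per_s": 6.4},
}

limits = {}
for name, b in boards.items():
    axi = axi_peak_gb_per_s(b["axi_bits"], b["axi_mhz"])
    limits[name] = min(axi, b["dram_gb_per_s"])
    print(f"{name}: AXI4 peak {axi:.1f} GB/s, DRAM peak "
          f"{b['dram_gb_per_s']:.1f} GB/s, limit {limits[name]:.1f} GB/s")
```

On the Atlys the DRAM is the narrower of the two, while on the ML605 both peaks coincide.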
Evaluation: Application
n  Array-sum: calculate summation value of an array
l  Two CoRAM memories as Double-buffered
l  Varying SIMD width (=# simultaneous ops) to check the effect to
the memory bandwidth utilization
•  4, 8, 16, 32, 64 (bytes)
sum

Output

+

MUX

s3

s2

s1

s0

+

+

+

+

D[3]

D[2]

D[1]

D[0]

MUX
CoRAM
Memory
0

Dec 7, 2013

D[3
]

D[2
]

D[1
]

D[0
]

from DMA Controller 0

D[3
]

D[2
]

D[1
]

D[0
]

from DMA Controller 1

Shinya T-Y. Tokyo Tech

CoRAM
Memory
1

26
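The double-buffering scheme above can be sketched in software: while one buffer is being summed, the other is refilled with the next block. This is a sequential model for illustration only; in hardware the refill and the summation would overlap in time. `array_sum_double_buffered` and `block_words` are hypothetical names, not part of the evaluated design.

```python
def array_sum_double_buffered(dram, block_words):
    """Sum `dram` block by block using two alternating buffers."""
    num_blocks = (len(dram) + block_words - 1) // block_words
    buffers = [[], []]
    buffers[0] = dram[0:block_words]          # prefetch the first block
    total = 0
    for blk in range(num_blocks):
        cur = blk % 2                         # buffer being consumed
        nxt = 1 - cur                         # buffer being refilled
        if blk + 1 < num_blocks:
            start = (blk + 1) * block_words
            # In hardware this transfer would run concurrently with the sum.
            buffers[nxt] = dram[start:start + block_words]
        total += sum(buffers[cur])
    return total

data = list(range(1000))
print(array_sum_double_buffered(data, 128))
```

The last block may be shorter than `block_words`; slicing handles that case without padding.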
Memory Bandwidth Utilization
n  Good bandwidth utilization
l  Atlys: 85.5% (at 16-byte)
l  ML605: 84.9% (at 64-byte)

n  Degradation reasons
l  Sequential (single) transaction for each DMA controller

1

Atlys (Spartan-6)
Bandwidth Utilization

Bandwidth Utilization

•  Memory latency directly affects the performance adversely

0.8
0.6
0.4
0.2
0
4

Dec 7, 2013

8
16
SIMD size [byte]

1

ML605 (Virtex-6)

0.8
0.6
0.4
0.2

32
Shinya T-Y. Tokyo Tech

0
4

8
16
32
SIMD size [byte]

64

27
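Why a single outstanding transaction per DMA controller costs bandwidth can be seen with a simple first-order model: if each transaction pays a fixed latency before its data streams at one beat per cycle, the bus is busy only for the data part. The sketch below encodes that model. The latency of 40 cycles and the burst lengths are hypothetical values chosen for illustration; they are not measurements from the evaluation.

```python
def utilization(beats_per_transaction, latency_cycles):
    """Fraction of cycles carrying data with one transaction in flight."""
    return beats_per_transaction / (beats_per_transaction + latency_cycles)

HYPOTHETICAL_LATENCY = 40  # cycles, made up for the example

for beats in (16, 64, 256):
    u = utilization(beats, HYPOTHETICAL_LATENCY)
    print(f"{beats:3d} beats per transaction -> utilization {u:.3f}")
```

Longer transactions amortize the latency better, and overlapping several transactions would hide it; the model only covers the sequential case named on the slide.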
Conclusion and …

Conclusion
n  PyCoRAM: Python-based implementation of CoRAM
memory architecture for modern FPGA EDKs
n  Future work
l  Further evaluation on more realistic applications
l  AXI4 slave feature for control thread
l  Tutorial slideJ
Portable application
design with CoRAM

Cooperation with standard IP-cores

Accelerator logic
Standard IP-core

CPU core

CoRAM
Abstraction
Standard On-chip Interconnect
Device-dependent Interfaces

Dec 7, 2013

Automatically managed by EDK
Shinya T-Y. Tokyo Tech

29
PyCoRAM and Pyverilog are publicly available!

■ PyCoRAM (0.7.0-public)
● https://github.com/shtaxxx/PyCoRAM

■ Pyverilog (0.6.0-public)
● https://github.com/shtaxxx/Pyverilog

Thanks!

More Related Content

What's hot

Klessydra-T: Designing Configurable Vector Co-Processors for Multi-Threaded E...
Klessydra-T: Designing Configurable Vector Co-Processors for Multi-Threaded E...Klessydra-T: Designing Configurable Vector Co-Processors for Multi-Threaded E...
Klessydra-T: Designing Configurable Vector Co-Processors for Multi-Threaded E...
RISC-V International
 
An open flow for dn ns on ultra low-power RISC-V cores
An open flow for dn ns on ultra low-power RISC-V coresAn open flow for dn ns on ultra low-power RISC-V cores
An open flow for dn ns on ultra low-power RISC-V cores
RISC-V International
 
ゆるふわコンピュータ (IPSJ-ONE2017)
ゆるふわコンピュータ (IPSJ-ONE2017)ゆるふわコンピュータ (IPSJ-ONE2017)
ゆるふわコンピュータ (IPSJ-ONE2017)
Shinya Takamaeda-Y
 
RISC-V 30908 patra
RISC-V 30908 patraRISC-V 30908 patra
RISC-V 30908 patra
RISC-V International
 
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Anne Nicolas
 
Debian Linux on Zynq (Xilinx ARM-SoC FPGA) Setup Flow (Vivado 2015.4)
Debian Linux on Zynq (Xilinx ARM-SoC FPGA) Setup Flow (Vivado 2015.4)Debian Linux on Zynq (Xilinx ARM-SoC FPGA) Setup Flow (Vivado 2015.4)
Debian Linux on Zynq (Xilinx ARM-SoC FPGA) Setup Flow (Vivado 2015.4)
Shinya Takamaeda-Y
 
FPGAs for Supercomputing: The Why and How
FPGAs for Supercomputing: The Why and HowFPGAs for Supercomputing: The Why and How
FPGAs for Supercomputing: The Why and How
DESMOND YUEN
 
AI is Impacting HPC Everywhere
AI is Impacting HPC EverywhereAI is Impacting HPC Everywhere
AI is Impacting HPC Everywhere
inside-BigData.com
 
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
RISC-V International
 
Improve Vectorization Efficiency
Improve Vectorization EfficiencyImprove Vectorization Efficiency
Improve Vectorization Efficiency
Intel® Software
 
A Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersA Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing Clusters
Intel® Software
 
RISC-V Linker Relaxation and LLD
RISC-V Linker Relaxation and LLDRISC-V Linker Relaxation and LLD
RISC-V Linker Relaxation and LLD
Ray Song
 
Reverse Engineering of Rocket Chip
Reverse Engineering of Rocket ChipReverse Engineering of Rocket Chip
Reverse Engineering of Rocket Chip
RISC-V International
 
An Open Discussion of RISC-V BitManip, trends, and comparisons _ Cuff
 An Open Discussion of RISC-V BitManip, trends, and comparisons _ Cuff An Open Discussion of RISC-V BitManip, trends, and comparisons _ Cuff
An Open Discussion of RISC-V BitManip, trends, and comparisons _ Cuff
RISC-V International
 
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
AMD Developer Central
 
A compiler approach_to_fast_hardware_design_exploration_in_fpga-based-systems
A compiler approach_to_fast_hardware_design_exploration_in_fpga-based-systemsA compiler approach_to_fast_hardware_design_exploration_in_fpga-based-systems
A compiler approach_to_fast_hardware_design_exploration_in_fpga-based-systems
Takefumi MIYOSHI
 
Andes open cl for RISC-V
Andes open cl for RISC-VAndes open cl for RISC-V
Andes open cl for RISC-V
RISC-V International
 
RISC-V assembly
RISC-V assemblyRISC-V assembly
RISC-V assembly
Peter Cheung
 
RISC-V Zce Extension
RISC-V Zce ExtensionRISC-V Zce Extension
RISC-V Zce Extension
RISC-V International
 
LEGaTO Integration
LEGaTO IntegrationLEGaTO Integration
LEGaTO Integration
LEGATO project
 

What's hot (20)

Klessydra-T: Designing Configurable Vector Co-Processors for Multi-Threaded E...
Klessydra-T: Designing Configurable Vector Co-Processors for Multi-Threaded E...Klessydra-T: Designing Configurable Vector Co-Processors for Multi-Threaded E...
Klessydra-T: Designing Configurable Vector Co-Processors for Multi-Threaded E...
 
An open flow for dn ns on ultra low-power RISC-V cores
An open flow for dn ns on ultra low-power RISC-V coresAn open flow for dn ns on ultra low-power RISC-V cores
An open flow for dn ns on ultra low-power RISC-V cores
 
ゆるふわコンピュータ (IPSJ-ONE2017)
ゆるふわコンピュータ (IPSJ-ONE2017)ゆるふわコンピュータ (IPSJ-ONE2017)
ゆるふわコンピュータ (IPSJ-ONE2017)
 
RISC-V 30908 patra
RISC-V 30908 patraRISC-V 30908 patra
RISC-V 30908 patra
 
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
 
Debian Linux on Zynq (Xilinx ARM-SoC FPGA) Setup Flow (Vivado 2015.4)
Debian Linux on Zynq (Xilinx ARM-SoC FPGA) Setup Flow (Vivado 2015.4)Debian Linux on Zynq (Xilinx ARM-SoC FPGA) Setup Flow (Vivado 2015.4)
Debian Linux on Zynq (Xilinx ARM-SoC FPGA) Setup Flow (Vivado 2015.4)
 
FPGAs for Supercomputing: The Why and How
FPGAs for Supercomputing: The Why and HowFPGAs for Supercomputing: The Why and How
FPGAs for Supercomputing: The Why and How
 
AI is Impacting HPC Everywhere
AI is Impacting HPC EverywhereAI is Impacting HPC Everywhere
AI is Impacting HPC Everywhere
 
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
 
Improve Vectorization Efficiency
Improve Vectorization EfficiencyImprove Vectorization Efficiency
Improve Vectorization Efficiency
 
A Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersA Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing Clusters
 
RISC-V Linker Relaxation and LLD
RISC-V Linker Relaxation and LLDRISC-V Linker Relaxation and LLD
RISC-V Linker Relaxation and LLD
 
Reverse Engineering of Rocket Chip
Reverse Engineering of Rocket ChipReverse Engineering of Rocket Chip
Reverse Engineering of Rocket Chip
 
An Open Discussion of RISC-V BitManip, trends, and comparisons _ Cuff
 An Open Discussion of RISC-V BitManip, trends, and comparisons _ Cuff An Open Discussion of RISC-V BitManip, trends, and comparisons _ Cuff
An Open Discussion of RISC-V BitManip, trends, and comparisons _ Cuff
 
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
 
A compiler approach_to_fast_hardware_design_exploration_in_fpga-based-systems
A compiler approach_to_fast_hardware_design_exploration_in_fpga-based-systemsA compiler approach_to_fast_hardware_design_exploration_in_fpga-based-systems
A compiler approach_to_fast_hardware_design_exploration_in_fpga-based-systems
 
Andes open cl for RISC-V
Andes open cl for RISC-VAndes open cl for RISC-V
Andes open cl for RISC-V
 
RISC-V assembly
RISC-V assemblyRISC-V assembly
RISC-V assembly
 
RISC-V Zce Extension
RISC-V Zce ExtensionRISC-V Zce Extension
RISC-V Zce Extension
 
LEGaTO Integration
LEGaTO IntegrationLEGaTO Integration
LEGaTO Integration
 

Viewers also liked

Veriloggen: Pythonによるハードウェアメタプログラミング(第3回 高位合成友の会 @ドワンゴ)
Veriloggen: Pythonによるハードウェアメタプログラミング(第3回 高位合成友の会 @ドワンゴ)Veriloggen: Pythonによるハードウェアメタプログラミング(第3回 高位合成友の会 @ドワンゴ)
Veriloggen: Pythonによるハードウェアメタプログラミング(第3回 高位合成友の会 @ドワンゴ)
Shinya Takamaeda-Y
 
マルチパラダイム型高水準ハードウェア設計環境の検討
マルチパラダイム型高水準ハードウェア設計環境の検討マルチパラダイム型高水準ハードウェア設計環境の検討
マルチパラダイム型高水準ハードウェア設計環境の検討
Shinya Takamaeda-Y
 
PyCoRAMによるPythonを用いたポータブルなFPGAアクセラレータ開発 (チュートリアル@ESS2014)
PyCoRAMによるPythonを用いたポータブルなFPGAアクセラレータ開発 (チュートリアル@ESS2014)PyCoRAMによるPythonを用いたポータブルなFPGAアクセラレータ開発 (チュートリアル@ESS2014)
PyCoRAMによるPythonを用いたポータブルなFPGAアクセラレータ開発 (チュートリアル@ESS2014)
Shinya Takamaeda-Y
 
PythonとPyCoRAMでお手軽にFPGAシステムを開発してみよう
PythonとPyCoRAMでお手軽にFPGAシステムを開発してみようPythonとPyCoRAMでお手軽にFPGAシステムを開発してみよう
PythonとPyCoRAMでお手軽にFPGAシステムを開発してみよう
Shinya Takamaeda-Y
 
Zynq+PyCoRAM(+Debian)入門
Zynq+PyCoRAM(+Debian)入門Zynq+PyCoRAM(+Debian)入門
Zynq+PyCoRAM(+Debian)入門
Shinya Takamaeda-Y
 
8051 memory
8051 memory8051 memory
8051 memory
Mayank Garg
 
PyCoRAM (高位合成友の会@ドワンゴ, 2015年1月16日)
PyCoRAM (高位合成友の会@ドワンゴ, 2015年1月16日)PyCoRAM (高位合成友の会@ドワンゴ, 2015年1月16日)
PyCoRAM (高位合成友の会@ドワンゴ, 2015年1月16日)
Shinya Takamaeda-Y
 
Fpga 02-memory-and-pl ds
Fpga 02-memory-and-pl dsFpga 02-memory-and-pl ds
Fpga 02-memory-and-pl ds
Malik Tauqir Hasan
 
Direct memory access
Direct memory accessDirect memory access
Direct memory access
WBUTTUTORIALS
 
Pythonによる高位設計フレームワークPyCoRAMでFPGAシステムを開発してみよう
Pythonによる高位設計フレームワークPyCoRAMでFPGAシステムを開発してみようPythonによる高位設計フレームワークPyCoRAMでFPGAシステムを開発してみよう
Pythonによる高位設計フレームワークPyCoRAMでFPGAシステムを開発してみよう
Shinya Takamaeda-Y
 
memory 8051
memory  8051memory  8051
memory 8051
VJ Aiswaryadevi
 
8051 Inturrpt
8051 Inturrpt8051 Inturrpt
8051 Inturrpt
Ramasubbu .P
 
PyCoRAM: Python-Verilog高位合成とメモリ抽象化によるFPGAアクセラレータ向けIPコア開発フレームワーク (FPGAX #05)
PyCoRAM: Python-Verilog高位合成とメモリ抽象化によるFPGAアクセラレータ向けIPコア開発フレームワーク (FPGAX #05)PyCoRAM: Python-Verilog高位合成とメモリ抽象化によるFPGAアクセラレータ向けIPコア開発フレームワーク (FPGAX #05)
PyCoRAM: Python-Verilog高位合成とメモリ抽象化によるFPGAアクセラレータ向けIPコア開発フレームワーク (FPGAX #05)
Shinya Takamaeda-Y
 
Interrupt programming with 8051 microcontroller
Interrupt programming with 8051  microcontrollerInterrupt programming with 8051  microcontroller
Interrupt programming with 8051 microcontroller
Ankit Bhatnagar
 
DL Hacks輪読 Semi-supervised Learning with Deep Generative Models
DL Hacks輪読 Semi-supervised Learning with Deep Generative ModelsDL Hacks輪読 Semi-supervised Learning with Deep Generative Models
DL Hacks輪読 Semi-supervised Learning with Deep Generative Models
Yusuke Iwasawa
 
8 interrupt 8051
8 interrupt 80518 interrupt 8051
8 interrupt 8051
daniemol
 
DPA
DPADPA
8086 Interrupts & With DOS and BIOS by vijay
8086 Interrupts &  With DOS and BIOS  by vijay8086 Interrupts &  With DOS and BIOS  by vijay
8086 Interrupts & With DOS and BIOS by vijay
Vijay Kumar
 
Interaction Networks for Learning about Objects, Relations and Physics
Interaction Networks for Learning about Objects, Relations and PhysicsInteraction Networks for Learning about Objects, Relations and Physics
Interaction Networks for Learning about Objects, Relations and Physics
Ken Kuroki
 
Conditional Image Generation with PixelCNN Decoders
Conditional Image Generation with PixelCNN DecodersConditional Image Generation with PixelCNN Decoders
Conditional Image Generation with PixelCNN Decoders
suga93
 

Viewers also liked (20)

Veriloggen: Pythonによるハードウェアメタプログラミング(第3回 高位合成友の会 @ドワンゴ)
Veriloggen: Pythonによるハードウェアメタプログラミング(第3回 高位合成友の会 @ドワンゴ)Veriloggen: Pythonによるハードウェアメタプログラミング(第3回 高位合成友の会 @ドワンゴ)
Veriloggen: Pythonによるハードウェアメタプログラミング(第3回 高位合成友の会 @ドワンゴ)
 
マルチパラダイム型高水準ハードウェア設計環境の検討
マルチパラダイム型高水準ハードウェア設計環境の検討マルチパラダイム型高水準ハードウェア設計環境の検討
マルチパラダイム型高水準ハードウェア設計環境の検討
 
PyCoRAMによるPythonを用いたポータブルなFPGAアクセラレータ開発 (チュートリアル@ESS2014)
PyCoRAMによるPythonを用いたポータブルなFPGAアクセラレータ開発 (チュートリアル@ESS2014)PyCoRAMによるPythonを用いたポータブルなFPGAアクセラレータ開発 (チュートリアル@ESS2014)
PyCoRAMによるPythonを用いたポータブルなFPGAアクセラレータ開発 (チュートリアル@ESS2014)
 
PythonとPyCoRAMでお手軽にFPGAシステムを開発してみよう
PythonとPyCoRAMでお手軽にFPGAシステムを開発してみようPythonとPyCoRAMでお手軽にFPGAシステムを開発してみよう
PythonとPyCoRAMでお手軽にFPGAシステムを開発してみよう
 
Zynq+PyCoRAM(+Debian)入門
Zynq+PyCoRAM(+Debian)入門Zynq+PyCoRAM(+Debian)入門
Zynq+PyCoRAM(+Debian)入門
 
8051 memory
8051 memory8051 memory
8051 memory
 
PyCoRAM (高位合成友の会@ドワンゴ, 2015年1月16日)
PyCoRAM (高位合成友の会@ドワンゴ, 2015年1月16日)PyCoRAM (高位合成友の会@ドワンゴ, 2015年1月16日)
PyCoRAM (高位合成友の会@ドワンゴ, 2015年1月16日)
 
Fpga 02-memory-and-pl ds
Fpga 02-memory-and-pl dsFpga 02-memory-and-pl ds
Fpga 02-memory-and-pl ds
 
Direct memory access
Direct memory accessDirect memory access
Direct memory access
 
Pythonによる高位設計フレームワークPyCoRAMでFPGAシステムを開発してみよう
Pythonによる高位設計フレームワークPyCoRAMでFPGAシステムを開発してみようPythonによる高位設計フレームワークPyCoRAMでFPGAシステムを開発してみよう
Pythonによる高位設計フレームワークPyCoRAMでFPGAシステムを開発してみよう
 
memory 8051
memory  8051memory  8051
memory 8051
 
8051 Inturrpt
8051 Inturrpt8051 Inturrpt
8051 Inturrpt
 
PyCoRAM: Python-Verilog高位合成とメモリ抽象化によるFPGAアクセラレータ向けIPコア開発フレームワーク (FPGAX #05)
PyCoRAM: Python-Verilog高位合成とメモリ抽象化によるFPGAアクセラレータ向けIPコア開発フレームワーク (FPGAX #05)PyCoRAM: Python-Verilog高位合成とメモリ抽象化によるFPGAアクセラレータ向けIPコア開発フレームワーク (FPGAX #05)
PyCoRAM: Python-Verilog高位合成とメモリ抽象化によるFPGAアクセラレータ向けIPコア開発フレームワーク (FPGAX #05)
 
Interrupt programming with 8051 microcontroller
Interrupt programming with 8051  microcontrollerInterrupt programming with 8051  microcontroller
Interrupt programming with 8051 microcontroller
 
DL Hacks輪読 Semi-supervised Learning with Deep Generative Models
DL Hacks輪読 Semi-supervised Learning with Deep Generative ModelsDL Hacks輪読 Semi-supervised Learning with Deep Generative Models
DL Hacks輪読 Semi-supervised Learning with Deep Generative Models
 
8 interrupt 8051
8 interrupt 80518 interrupt 8051
8 interrupt 8051
 
DPA
DPADPA
DPA
 
8086 Interrupts & With DOS and BIOS by vijay
8086 Interrupts &  With DOS and BIOS  by vijay8086 Interrupts &  With DOS and BIOS  by vijay
8086 Interrupts & With DOS and BIOS by vijay
 
Interaction Networks for Learning about Objects, Relations and Physics
Interaction Networks for Learning about Objects, Relations and PhysicsInteraction Networks for Learning about Objects, Relations and Physics
Interaction Networks for Learning about Objects, Relations and Physics
 
Conditional Image Generation with PixelCNN Decoders
Conditional Image Generation with PixelCNN DecodersConditional Image Generation with PixelCNN Decoders
Conditional Image Generation with PixelCNN Decoders
 

Similar to PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern FPGA-based Computing (CARL2013 co-located with MICRO-46)

11 Synchoricity as the basis for going Beyond Moore
11 Synchoricity as the basis for going Beyond Moore11 Synchoricity as the basis for going Beyond Moore
11 Synchoricity as the basis for going Beyond Moore
RCCSRENKEI
 
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAsScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
Shinya Takamaeda-Y
 
Digital Systems Design
Digital Systems DesignDigital Systems Design
Digital Systems Design
Reza Sameni
 
P4_tutorial.pdf
P4_tutorial.pdfP4_tutorial.pdf
P4_tutorial.pdf
PramodhN3
 
Summer training vhdl
Summer training vhdlSummer training vhdl
Summer training vhdl
Arshit Rai
 
Summer training vhdl
Summer training vhdlSummer training vhdl
Summer training vhdl
Arshit Rai
 
6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final
Yutaka Kawai
 
FPGA_prototyping proccesing with conclusion
FPGA_prototyping proccesing with conclusionFPGA_prototyping proccesing with conclusion
FPGA_prototyping proccesing with conclusion
PersiPersi1
 
FIELD PROGRAMMABLE GATE ARRAYS AND THEIR APPLICATIONS
FIELD PROGRAMMABLE GATE ARRAYS AND THEIR APPLICATIONSFIELD PROGRAMMABLE GATE ARRAYS AND THEIR APPLICATIONS
FIELD PROGRAMMABLE GATE ARRAYS AND THEIR APPLICATIONS
Professor at RYM Engineering College, Ballari
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
ijceronline
 
LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack Runtimes
LEGATO project
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Intel® Software
 
0.FPGA for dummies: Historical introduction
0.FPGA for dummies: Historical introduction0.FPGA for dummies: Historical introduction
0.FPGA for dummies: Historical introduction
Maurizio Donna
 
DATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceDATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe Conference
LEGATO project
 
Fixed-point Multi-Core DSP Platform
Fixed-point Multi-Core DSP PlatformFixed-point Multi-Core DSP Platform
Fixed-point Multi-Core DSP Platform
Sundance Multiprocessor Technology Ltd.
 
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesPerformance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Dr. Fabio Baruffa
 
Fundamentals of FPGA
Fundamentals of FPGAFundamentals of FPGA
Fundamentals of FPGA
velamakuri
 
An Introduction to Field Programmable Gate Arrays
An Introduction to Field Programmable Gate ArraysAn Introduction to Field Programmable Gate Arrays
An Introduction to Field Programmable Gate Arrays
KingshukDas35
 
CASFPGA1.ppt
CASFPGA1.pptCASFPGA1.ppt
CASFPGA1.ppt
AswiniSamantray2
 
FPGA @ UPB-BGA
FPGA @ UPB-BGAFPGA @ UPB-BGA
FPGA @ UPB-BGA
Jose Pinilla
 

PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern FPGA-based Computing (CARL2013 co-located with MICRO-46)

  • 1. PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern FPGA-based Computing. Dec 7, 2013, CARL2013 @ Davis, CA. Shinya Takamaeda-Yamazaki†‡, Kenji Kise†, James C. Hoe*. †Tokyo Institute of Technology, ‡JSPS Research Fellow, *Carnegie Mellon University
  • 2. Agenda: Background; PyCoRAM Overview; PyCoRAM Microarchitecture; Evaluation; Conclusion. Dec 7, 2013 Shinya T-Y. Tokyo Tech
  • 3. Background
  • 4. FPGA as SoC. Put together various components on a single FPGA: a CPU core (MicroBlaze as a soft macro, Cortex-A9 as a hard macro); hardware accelerator logic, modeled in traditional RTL (Verilog HDL, VHDL) or in a newer modeling tool (Bluespec, AutoESL, Chisel, …); a DDRx DRAM interface; PCI-express; Ethernet, … Diagram: an FPGA containing a CPU, two HW accelerators, Ethernet, a DRAM I/F, and PCI-E, joined by an interconnect.
  • 5. Portability Issue of Application Design. How to support various FPGA platforms? They differ in logic size, memory interface, peripherals, and I/O properties. Example platforms: Digilent Atlys (Xilinx Spartan-6 LX45); ScalableCore System, our FPGA system (Xilinx Spartan-6 LX16 × 128 nodes); Xilinx ML605 (Xilinx Virtex-6 LX240T).
  • 6. IP-core Based System Development. To build a system, add IP-cores and connect them. IP-cores are connected through a standard on-chip interconnect. EDK automatically generates the on-chip interconnect and (some) device-dependent interfaces, so there are no (or few) annoying steps. Diagram (Xilinx Platform Studio, XPS): an IP-core list and IP-core instances; an FPGA with a CPU, two HW accelerators, Ethernet, a DRAM I/F, and PCI-E on an interconnect, attached to DRAM.
  • 7. Abstract Memory System for FPGAs. CoRAM (Connected RAM) [FPGA’11] is a high-level abstraction for memory management: it decouples computing logic from memory access behavior, and memory access patterns are written in a software model (C language). Diagram: HW kernels (computing logic) read and write CoRAM memories (abstracted on-chip memories) and communicate over CoRAM channels (communication FIFOs and registers); control threads (memory access patterns in C) manage the transfers to and from off-chip memory.
  • 8. What “Runs” CoRAM? (Figure from the CoRAM Tutorial @ FPGA’13, dated 6/19/2013.) Architecture labels: core logic, SRAM, control thread programs. Microarchitecture labels: FPGA; clusters of fabric, DMA, and control logic; network-on-chip; memory translation (TLBs); memory interfaces and caches. Tool labels: RTL conversion; high-level synthesis from C to RTL using LLVM; CONNECT NoC generator.
  • 9. PyCoRAM Overview
  • 10. Motivation: CoRAM for EDK. Integrate the CoRAM memory architecture into the modern EDK-based development flow with standard IP-cores. Goals: portable application design with CoRAM, and cooperation with standard IP-cores. Diagram layers: accelerator logic, standard IP-core, and CPU core; CoRAM abstraction; standard on-chip interconnect; device-dependent interfaces.
  • 11. PyCoRAM. A Python-based implementation of the CoRAM memory architecture for modern FPGA EDKs, providing the CoRAM memory abstraction in the EDK development flow. Key features: (1) control threads in Python (we developed a Python-to-Verilog HLS compiler from scratch); (2) AMBA AXI4 as the on-chip interconnect, for IP-core based development on Xilinx Platform Studio (XPS); (3) parameterized RTL design support for user logic: generate statements and parameter statements are analyzed by our original Verilog analysis tool-chain (Pyverilog).
  • 12. Comparison with Original CoRAM
        Item                                       | CoRAM                                | PyCoRAM
        Language for control thread                | C                                    | Python
        Supported memory operations                | (Blocking/Non-Blocking) Read/Write   | (Blocking/Non-Blocking) Read/Write
        On-chip interconnect                       | CONNECT NoC [FPGA’12]                | AMBA AXI4
        FSM granularity in control thread          | LLVM-IR                              | Python AST node
        Generate-statement support for user logic  | No                                   | Yes
        Supported FPGAs                            | Xilinx ML605, Altera Terasic DE-4    | Any FPGA supporting the AXI bus
        # Lines of code                            | 11,682 lines (w/o CONNECT)           | 4,922 lines (w/o Pyverilog)
        FSM: Finite State Machine. LLVM-IR: Low Level Virtual Machine Intermediate Representation. AST: Abstract Syntax Tree.
  • 13. PyCoRAM Development Flow. PyCoRAM generates an IP-core package from user-logic RTL and control thread scripts in Python. Each part can be replaced with the original CoRAM’s component. Inputs (the portable application design): user logic (Verilog HDL) and control threads (Python). PyCoRAM tool-chain stages: RTL conversion, Python-to-Verilog HLS, and IP-core packing, with the steps logic hierarchy analysis, Python-to-Verilog compilation, control signal insertion, control signal port addition, and IP-core generation with an AXI4 interface (RTL, .mpd, and .pao). Vendor EDA flow: IP-core integration on EDK, top design synthesis with the AXI4 IP-core, and the FPGA bit file.
  • 14. FPGA Accelerator with PyCoRAM IP-core. Diagram: inside the FPGA, the PyCoRAM IP (application) contains HW kernels (computing logic), CoRAM memories, a CoRAM channel, CoRAM streams, control threads (FSMs), and DMA clusters whose DMACs each have an AXI I/F. The PyCoRAM IP and other AXI IP-cores or a CPU connect through the AXI4 interconnect to the DRAM controller and the off-chip DRAM.
  • 16. PyCoRAM Microarchitecture (Logical View). Diagram labels: User I/O (GPIO), User Logic, Control Thread (FSM), CoRAM Register, CoRAM Channel, CoRAM Memory with DMAC, CoRAM Stream with DMAC.
  • 17. PyCoRAM Microarchitecture (Logical View), annotated. The user logic is modeled in RTL (Verilog HDL); the control thread holds the memory access pattern in Python. Diagram labels: User I/O (GPIO), User Logic, Control Thread (FSM), CoRAM Register, CoRAM Channel, CoRAM Memory with DMAC, CoRAM Stream with DMAC.
  • 18. PyCoRAM Microarchitecture (Physical View). Diagram labels: User I/O (GPIO); PyCoRAM IP containing User Logic, Control Thread (FSM), CoRAM Register, CoRAM Channel, and CoRAM Memory and CoRAM Stream, each with a DMAC and an AXI I/F; AXI4 Interconnect; DRAM Controller; FPGA.
  • 19. PyCoRAM Microarchitecture (Physical View), annotated. Highlighted features: control thread in Python, parameterized RTL design support, and the AXI4 master interface. Diagram labels: User I/O (GPIO); PyCoRAM IP containing User Logic, Control Thread (FSM), CoRAM Register, CoRAM Channel, and CoRAM Memory and CoRAM Stream, each with a DMAC and an AXI I/F; AXI4 Interconnect; DRAM Controller; FPGA.
  • 20. Control Thread in Python. Operations for CoRAM objects: (1) to/from CoRAM memory: data movement patterns with DMA operations between on-chip CoRAM memory and DRAM; (2) to/from CoRAM channel: token communication between user logic and the control thread. Diagram labels: User I/O, User Logic, Control Thread (FSM), CoRAM Channel, CoRAM Memory, DMAC. Example control thread:
        def calc_sum(times):
            ram = CoramMemory(idx=0, datawidth=32, size=1024)
            channel = CoramChannel(idx=0, datawidth=32)
            addr = 0
            sum = 0
            for i in range(times):
                ram.write(0, addr, 128)    # Transfer (off-chip DRAM to BRAM)
                channel.write(addr)        # Notification to user logic
                sum += channel.read()      # Wait for notification from user logic
                addr += 128 * (32/8)
            print('sum=', sum)             # $display Verilog system task

        calc_sum(8)
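The control thread on slide 20 only runs inside the PyCoRAM tool-chain, so the following is a minimal sketch of the same access pattern in ordinary Python. The `CoramMemory` and `CoramChannel` classes here are hypothetical software stand-ins written for this illustration, not PyCoRAM's implementation: the mock `DRAM` list, the extra `memory` argument of the channel, and the "user logic" that replies with the sum of 128 words are all assumptions.

```python
from collections import deque

# Mock off-chip DRAM holding the 32-bit words 0, 1, 2, ... (assumption for the demo).
DRAM = list(range(8 * 128))


class CoramMemory:
    """Hypothetical stand-in for an on-chip CoRAM memory."""

    def __init__(self, idx, datawidth, size):
        self.idx = idx
        self.word_bytes = datawidth // 8
        self.data = [0] * size

    def write(self, ram_addr, dram_addr, size):
        # Models a DMA transfer: DRAM (byte address) to on-chip memory (word address).
        start = dram_addr // self.word_bytes
        self.data[ram_addr:ram_addr + size] = DRAM[start:start + size]


class CoramChannel:
    """Hypothetical stand-in for a CoRAM channel plus the user logic behind it."""

    def __init__(self, idx, datawidth, memory):
        self.idx = idx
        self.memory = memory      # extra argument, only needed by this mock
        self.replies = deque()

    def write(self, value):
        # Mock user logic: when notified, sum the 128 words just transferred.
        self.replies.append(sum(self.memory.data[:128]))

    def read(self):
        # The real channel would block until user logic replies; the mock never waits.
        return self.replies.popleft()


def calc_sum(times):
    ram = CoramMemory(idx=0, datawidth=32, size=1024)
    channel = CoramChannel(idx=0, datawidth=32, memory=ram)
    addr = 0
    total = 0
    for _ in range(times):
        ram.write(0, addr, 128)     # transfer one 128-word block
        channel.write(addr)         # notify user logic
        total += channel.read()     # collect the partial sum
        addr += 128 * (32 // 8)     # advance by 128 words of 4 bytes
    return total


print("sum=", calc_sum(8))
```

The loop structure is unchanged from the slide: transfer a block, notify, wait for the reply, advance the DRAM byte address. Only the objects behind it are replaced by mocks.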
  • 21. CoRAM Objects in User Logic. CoRAM objects appear as standard BRAMs or FIFOs: their interfaces are very similar to those of the standard memory components, so user logic can access their contents in the same way. Essential parameters define the object characteristics: thread name, ID, data width, address length, …
        (a) CoRAM Memory
        CoramMemory1P #(
          .CORAM_THREAD_NAME("thread_name"),
          .CORAM_ID(0),
          .CORAM_ADDR_LEN(ADDR_LEN),
          .CORAM_DATA_WIDTH(DATA_WIDTH)
        ) inst_memory (
          .CLK(CLK),
          .ADDR(mem_addr),
          .D(mem_d),
          .WE(mem_we),
          .Q(mem_q)
        );

        (b) CoRAM Channel
        CoramChannel #(
          .CORAM_THREAD_NAME("thread_name"),
          .CORAM_ID(0),
          .CORAM_ADDR_LEN(CHANNEL_ADDR_LEN),
          .CORAM_DATA_WIDTH(CHANNEL_DATA_WIDTH)
        ) inst_channel (
          .CLK(CLK),
          .RST(RST),
          .D(comm_d),
          .ENQ(comm_enq),
          .FULL(comm_full),
          .Q(comm_q),
          .DEQ(comm_deq),
          .EMPTY(comm_empty)
        );
  • 22. AXI4 Master Interface. The DMA controller works as an AXI4 master IP-core interface. Diagram: HW kernels (computing logic) access the CoRAM memories (BRAM) through Addr, WrEn, WrData, and RdData ports, and the CoRAM channel through WrData/Enque/AlmFull and RdData/Deque/Empty ports. The control thread (FSM) drives the DMA controller in the DMA cluster (BramAddr, DramAddr, Size, RdEn, WrEn, Busy, Ready). The AXI master interface (protocol conversion) connects to the AXI4 interconnect with these signals: write address channel (AWADDR, AWLEN, AWVALID, AWREADY); write data channel (WDATA, BVALID, WVALID, WREADY); read address channel (ARADDR, ARLEN, ARVALID, ARREADY); read data channel (RDATA, RVALID, RREADY).
  • 23. For Parameterized RTL Design Support. Generate statements are supported by an advanced RTL analyzer; the original CoRAM compiler does not support them. Pyverilog is a Python-based tool-chain for Verilog HDL design, comprising a parser, dataflow analysis, optimization, RTL code generation, control-flow analysis, and graphical output. Figures: a dataflow graph and a state machine.
  • 24. Evaluation
  • 25. Evaluation. Point: maximum memory bandwidth utilization, since PyCoRAM is a memory abstraction framework. Setup: two FPGA boards. Digilent Atlys (Xilinx Spartan-6 LX45): DDR2-800 DRAM, 128 MB (1.2 GB/s, due to 300 MHz operation); AXI4 128-bit at 100 MHz (1.6 GB/s). Xilinx ML605 (Xilinx Virtex-6 LX240T): DDR3-800 DRAM, 512 MB (6.4 GB/s); AXI4 256-bit at 200 MHz (6.4 GB/s). EDK: Xilinx Platform Studio (14.6).
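The peak figures on slide 25 follow from bus width times transfer rate. The sketch below reproduces them; the AXI4 widths and clocks are taken from the slide, while the DRAM data-bus widths (16-bit on the Atlys, 64-bit on the ML605) are assumptions based on the boards' usual configurations and are not stated in the slides.

```python
def peak_bandwidth_gbps(width_bits, mega_transfers_per_s):
    """Peak bandwidth in GB/s for a bus moving width_bits per transfer."""
    return width_bits / 8 * mega_transfers_per_s / 1000


# AXI4 interconnect: one transfer per clock cycle (widths and clocks from the slide).
axi_atlys = peak_bandwidth_gbps(128, 100)
axi_ml605 = peak_bandwidth_gbps(256, 200)

# DRAM: double data rate, so two transfers per clock cycle.
# Assumed bus widths (not in the slides): 16-bit DDR2 on Atlys, 64-bit DDR3 on ML605.
dram_atlys = peak_bandwidth_gbps(16, 2 * 300)   # 300 MHz clock, as noted on the slide
dram_ml605 = peak_bandwidth_gbps(64, 800)       # DDR3-800: 800 MT/s

for name, value in [("Atlys AXI4", axi_atlys), ("Atlys DRAM", dram_atlys),
                    ("ML605 AXI4", axi_ml605), ("ML605 DRAM", dram_ml605)]:
    print(f"{name}: {value:.1f} GB/s")
```

Under these assumptions the Atlys is limited by its DRAM (1.2 GB/s against 1.6 GB/s on the AXI4 side), while on the ML605 the two limits coincide at 6.4 GB/s.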
  • 26. Evaluation: Application. Array-sum: calculate the sum of an array. Two CoRAM memories are used as a double buffer. The SIMD width (the number of simultaneous operations) is varied to check its effect on memory bandwidth utilization: 4, 8, 16, 32, 64 (bytes). Diagram: DMA controllers 0 and 1 fill CoRAM memory 0 and CoRAM memory 1 with words D[0] to D[3]; a MUX selects one memory; per-lane adders accumulate s0 to s3; a final adder and MUX produce the output sum.
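As a rough software model of the array-sum kernel on slide 26, the sketch below alternates between two buffers and accumulates one partial sum per SIMD lane. It is an illustration only: the function name, the 128-word block size, and the 4-byte word size are assumptions, and in hardware the refill of the idle buffer overlaps with the computation instead of running before it.

```python
def array_sum_double_buffered(dram, block_words=128, simd_bytes=16, word_bytes=4):
    """Sum `dram` block by block using two ping-pong buffers and SIMD lanes."""
    lanes = simd_bytes // word_bytes           # words processed per step
    lane_sums = [0] * lanes                    # one accumulator per lane (s0, s1, ...)
    blocks = [dram[i:i + block_words] for i in range(0, len(dram), block_words)]
    if not blocks:
        return 0

    buffers = [blocks[0], []]                  # prefetch the first block into buffer 0
    for k in range(len(blocks)):
        current = k % 2
        if k + 1 < len(blocks):
            # In hardware, the other DMA controller fills this buffer while
            # the kernel is still consuming buffers[current].
            buffers[1 - current] = blocks[k + 1]
        buf = buffers[current]
        for base in range(0, len(buf), lanes):
            for lane, value in enumerate(buf[base:base + lanes]):
                lane_sums[lane] += value
    return sum(lane_sums)                      # final reduction across lanes


data = list(range(1000))
print(array_sum_double_buffered(data, simd_bytes=16))
```

Changing `simd_bytes` changes only how many words are consumed per step; the result stays the same, which is why the slides can vary the SIMD width purely to probe bandwidth utilization.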
  • 27. Memory Bandwidth Utilization. Good bandwidth utilization: Atlys 85.5% (at 16 bytes), ML605 84.9% (at 64 bytes). Degradation reason: each DMA controller issues sequential (single) transactions, so memory latency directly degrades performance. Plots: bandwidth utilization (0 to 1) versus SIMD size in bytes, for Atlys (Spartan-6) at 4, 8, 16, 32 and for ML605 (Virtex-6) at 4, 8, 16, 32, 64.
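To make the degradation reason on slide 27 concrete, here is a deliberately simplified model, not taken from the slides: if a DMA controller keeps only one transaction in flight, every burst pays the full memory latency before the next one starts. The burst size and latency used below are made-up illustrative values; the slides do not report them. The second helper converts the reported utilization into an absolute bandwidth, assuming it is measured against the DRAM peak figures from slide 25.

```python
def single_transaction_utilization(burst_bytes, peak_gbps, latency_ns):
    """Utilization when each burst waits for the previous one to finish.

    burst_bytes / peak_gbps is the pure transfer time in nanoseconds,
    because 1 GB/s equals 1 byte per nanosecond.
    """
    transfer_ns = burst_bytes / peak_gbps
    return transfer_ns / (transfer_ns + latency_ns)


def achieved_gbps(utilization, peak_gbps):
    """Absolute bandwidth implied by a utilization figure."""
    return utilization * peak_gbps


# Illustrative only: a 512-byte burst and 100 ns latency are assumptions.
for peak in (1.2, 6.4):
    u = single_transaction_utilization(burst_bytes=512, peak_gbps=peak, latency_ns=100)
    print(f"peak {peak} GB/s -> modeled utilization {u:.2f}")

# Reported utilizations from the slide, against the assumed peak figures.
print(f"Atlys: {achieved_gbps(0.855, 1.2):.2f} GB/s")
print(f"ML605: {achieved_gbps(0.849, 6.4):.2f} GB/s")
```

In this model a faster memory system loses a larger share of its peak to the same fixed latency, which matches the slide's point that latency matters when transactions are not overlapped.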
  • 28. Conclusion and …
  • 29. Conclusion. PyCoRAM: a Python-based implementation of the CoRAM memory architecture for modern FPGA EDKs. Future work: further evaluation on more realistic applications; an AXI4 slave feature for the control thread; tutorial slides. Diagram: portable application design with CoRAM and cooperation with standard IP-cores; layers of accelerator logic, standard IP-core, and CPU core; CoRAM abstraction; standard on-chip interconnect; device-dependent interfaces (automatically managed by EDK).
  • 30. PyCoRAM and Pyverilog are ready for public! PyCoRAM (0.7.0-public): https://github.com/shtaxxx/PyCoRAM ; Pyverilog (0.6.0-public): https://github.com/shtaxxx/Pyverilog ; Thanks!