2. ASIC & VLSI
• Time-to-market: Some large ASICs can take a year or
more to design.
• Design issues: considerable time is needed to handle
mapping, placement, routing, and timing.
• The FPGA design flow eliminates the complex and
time-consuming floorplanning, place and route, and timing
analysis.
6. Why HDL?
• To allow the designer to implement and verify complex
hardware functionality at a high level, without needing
to know the details of the low-level design
implementation.
• Advantage:
• FPGAs have lower prototyping costs
• FPGAs have shorter production times
• Synthesis: The process that translates VHDL code
into a complete circuit of logic elements (gates,
flip-flops, etc.).
8. Loop Unrolling
• Arrays a[i], b[i] and c[i] are mapped to RAMs.
• Rolled loop: This implementation takes four clock cycles and one multiplier, and each RAM can be
single-port.
• Unrolled loop: The entire loop operation can be performed in a single clock cycle. It requires four
multipliers and the ability to perform four reads and four writes in the same clock cycle, which may
require the arrays to be implemented as register arrays rather than RAMs.
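The two forms can be sketched in C. This is a minimal illustration, assuming the loop body is an element-wise multiply c[i] = a[i] * b[i] over four elements (the slide does not show the loop itself); the function names and the constant N are hypothetical. An HLS tool would typically apply the unrolling through a directive rather than by hand.

```c
#include <stdint.h>

#define N 4  /* assumed trip count, matching the four cycles / four multipliers */

/* Rolled form: one multiply per iteration, so a single multiplier can be
   reused and each array is accessed once per iteration. */
void mul_rolled(const int32_t a[N], const int32_t b[N], int32_t c[N])
{
    for (int i = 0; i < N; i++) {
        c[i] = a[i] * b[i];
    }
}

/* Unrolled form, written out by hand: the four multiplies are independent,
   so hardware could perform them together, but only if all four elements of
   a and b can be read and all four elements of c written at the same time. */
void mul_unrolled(const int32_t a[N], const int32_t b[N], int32_t c[N])
{
    c[0] = a[0] * b[0];
    c[1] = a[1] * b[1];
    c[2] = a[2] * b[2];
    c[3] = a[3] * b[3];
}
```

Both functions compute the same result; they differ only in how much work is exposed per step, which is the trade-off between multipliers and memory ports described above.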
12. Pipelining
• Function pipelining is only possible when there is no resource contention or data dependency that
prevents it. Here the input array "m[2]" is implemented with a single-port RAM, so the function
cannot be pipelined: the two read operations on input "m[2]" ("op_Read_m[0]" and
"op_Read_m[1]") cannot be performed in the same clock cycle.
• Solution: The resource contention can be resolved either by using a dual-port RAM for array
"m[2]", which allows both reads to be performed in the same clock cycle, or by increasing the
pipeline initiation interval.
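The contention can be illustrated with a small C sketch. The function sum_pair is a hypothetical stand-in for the slide's function (its real body is not shown), and min_interval is a simplified model that considers only the ports of this one memory: the smallest possible initiation interval is the number of reads per call divided by the number of ports, rounded up.

```c
#include <stdint.h>

/* Hypothetical function: reads both elements of m on every call. */
int32_t sum_pair(const int32_t m[2])
{
    int32_t first  = m[0];  /* corresponds to op_Read_m[0] */
    int32_t second = m[1];  /* corresponds to op_Read_m[1] */
    return first + second;
}

/* Simplified lower bound on the initiation interval imposed by one memory:
   ceil(reads_per_call / ports). Other constraints are ignored here. */
unsigned min_interval(unsigned reads_per_call, unsigned ports)
{
    return (reads_per_call + ports - 1) / ports;
}
```

Under this model, min_interval(2, 1) is 2 for the single-port RAM and min_interval(2, 2) is 1 for a dual-port RAM, which matches the two solutions listed above.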
14. Array Optimizations
• Mapping: When there are many small arrays, mapping them into a single
large array reduces the storage overhead.
• Partitioning: If each small array gets a separate memory, a lot of memory
space is potentially wasted, and the design becomes larger and consequently
consumes more power.
• Horizontal mapping: this corresponds to creating a new array by
concatenating the original arrays. Physically, this gets implemented as a
single array with more elements.
• Vertical mapping: this corresponds to creating a new array by
concatenating the original words in the array. Physically, this gets
implemented by a single array with a larger bit-width.
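The two mappings can be sketched in C as data layouts. This is an illustration only, assuming two 8-bit arrays of four elements each; the names array1, array2, LEN, and both functions are hypothetical, and an HLS tool would normally perform the mapping itself rather than copy data.

```c
#include <stdint.h>

#define LEN 4  /* assumed length of each original array */

/* Horizontal mapping: array2 is placed after array1, giving one array
   with more elements (2 * LEN) and the same 8-bit word width. */
void map_horizontal(const uint8_t array1[LEN], const uint8_t array2[LEN],
                    uint8_t merged[2 * LEN])
{
    for (int i = 0; i < LEN; i++) {
        merged[i]       = array1[i];
        merged[LEN + i] = array2[i];
    }
}

/* Vertical mapping: the words at the same index are concatenated, giving
   one array with the same number of elements and a wider 16-bit word.
   Placing array2 in the upper byte is an arbitrary choice for this sketch. */
void map_vertical(const uint8_t array1[LEN], const uint8_t array2[LEN],
                  uint16_t merged[LEN])
{
    for (int i = 0; i < LEN; i++) {
        merged[i] = (uint16_t)(((uint16_t)array2[i] << 8) | array1[i]);
    }
}
```

In the horizontal layout, reading array1[i] and array2[i] touches two different addresses of the same memory; in the vertical layout, one read at index i returns both values.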
16. Horizontal mapping
• Although horizontal mapping can reduce the number of RAM
components used and hence improve area, it can reduce
throughput and performance.
• In the previous example both the accesses to "array1" and "array2"
can be performed in the same clock cycle.
• If both arrays are mapped to the same RAM this will now require a
separate access, and clock cycle, for each read operation.
18. Array Partitioning
• Arrays can also be partitioned into smaller arrays. A memory has a limited
number of read and write ports, which can limit the throughput of a
load/store-intensive algorithm.
• The bandwidth can sometimes be improved by splitting up the original
array (a single memory resource) into multiple smaller arrays (multiple
memories), effectively increasing the number of ports.
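A minimal C sketch of the idea follows, assuming an eight-element array split into two banks by even and odd index (one possible partitioning scheme; the slides do not say which is used). The names SIZE, partition_cyclic, and sum_banks are hypothetical.

```c
#include <stdint.h>

#define SIZE 8  /* assumed size of the original array */

/* Split src into two smaller arrays: even indices go to bank0 and odd
   indices to bank1. Each bank can then be a separate memory with its
   own ports. */
void partition_cyclic(const int32_t src[SIZE],
                      int32_t bank0[SIZE / 2], int32_t bank1[SIZE / 2])
{
    for (int i = 0; i < SIZE / 2; i++) {
        bank0[i] = src[2 * i];
        bank1[i] = src[2 * i + 1];
    }
}

/* Each iteration reads one element from each bank, so two neighbouring
   elements of the original array are fetched without sharing a port. */
int32_t sum_banks(const int32_t bank0[SIZE / 2], const int32_t bank1[SIZE / 2])
{
    int32_t sum = 0;
    for (int i = 0; i < SIZE / 2; i++) {
        sum += bank0[i] + bank1[i];
    }
    return sum;
}
```

With the original single array, the same sum would need two reads from one memory per pair of elements; after the split, the two reads go to different memories.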
19. Array Partitioning
• If the elements of an array are accessed one at a time, an efficient
implementation in hardware is to keep them grouped together and
mapped into a RAM.
• If multiple elements of an array are required simultaneously, it may
be more advantageous for performance to implement them as
individual registers: allowing parallel access to the data.
• Implementing an array of storage elements as individual registers
may help performance, but it consumes a large area and increases
power consumption.
20. xa7a100tfgg484-2i
2-D for size N = 128*128
Input Array    Dual port    Independent Registers
LUT            1642         10778
FF             835          9548
Power          246          2031