Memories

High-Level Synthesis (HLS) allows arrays to be mapped to different types of memory resources, such as block RAM, distributed RAM, registers, and FIFOs. The address, data, and control signals needed to access the memories are created automatically. HLS includes several pre-built FPGA and ASIC memory libraries for the supported technologies. In addition to these libraries, custom libraries can be created using Memory Generator, which reads in a VHDL or Verilog model of a memory and generates vendor-specific custom memory libraries.

Sadhvi Praveen

Memories Overview
Memories are storage components. Arrays in HLS C++ are mapped to them based on the size and usage of each array. The logic needed to access the memory is added automatically during synthesis. HLS provides pre-built RAM and ROM libraries for supported technologies. These components support a user-defined latency, which adds register banks to the inputs and outputs of the memory, and a choice of read/write resolutions (if supported). In addition to these memory libraries, custom memory libraries can be generated using Memory Generator in HLS. This utility is useful for creating ASIC memories from IP groups or foundries.
Types of Memories:
1. Block RAM: Block RAMs are dedicated, centralized memory blocks used for storing large arrays. By default, HLS maps larger arrays to the Block RAM components available in the library. The types of Block RAM are:
  Single Port RAM: Single port RAMs have one read/write port and support only one read operation or one write operation in a single clock cycle. HLS may map an array to a single port RAM if its read and write operations cannot be performed concurrently in the same clock cycle.
  Simple Dual Port RAM: Simple dual port (1R1W) RAMs have one read port and one write port. They can perform a single read or a single write in a clock cycle, or one read and one write concurrently in the same clock cycle. They have two separate addresses, one for read operations and one for write operations.
  True Dual Port RAM: True dual port RAMs have two read/write ports connected to the same memory array. Since both ports can perform reads or writes, HLS can schedule two read operations or two write operations in the same clock cycle. This differs from the simple dual port RAM, which cannot perform two operations of the same kind (e.g. two reads or two writes) in one clock cycle.
2. Distributed RAM: Distributed RAM is smaller than Block RAM and is distributed across the fabric. This allows the memory to be placed close to the logic it supports, and this proximity makes data access faster. Distributed RAM is suitable for storing smaller arrays or data sets and intermediate results.
3. Read Only Memory (ROM): ROMs are used to store constant data that never changes while the design runs. HLS maps constant arrays to ROM components.
4. FIFO: A First In, First Out (FIFO) memory acts as an intermediate storage element between two hierarchical blocks and synchronizes the data between the blocks. The first data element to enter the FIFO is the first to be read.
5. Register: Registers are storage elements for holding intermediate results. Variables are mapped to registers. Registers are used mainly in tandem with control logic as they store the output of one clock cycle to be used as the input of the next one. HLS also allows arrays to be mapped to registers.
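As a rough illustration of how these memory types relate to C++ arrays, the sketch below uses a hypothetical 4-tap FIR filter. The names (`Fir`, `fir_block`, `COEFFS`, `TAPS`, `BLOCK_LEN`) and the coefficient values are invented for this example, and the comments only name the memory type each array is a plausible candidate for. The actual mapping depends on the tool, the selected library, and the user's settings.

```cpp
#include <cassert>
#include <cstdint>

const int TAPS = 4;          // hypothetical filter length
const int BLOCK_LEN = 1024;  // hypothetical block size

// Constant array that is never written: a candidate for a ROM.
static const int16_t COEFFS[TAPS] = {1, 2, 3, 4};

struct Fir {
    // Small array whose elements are all touched on every call:
    // a candidate for mapping to registers (a shift register).
    int16_t taps[TAPS] = {0, 0, 0, 0};

    int32_t step(int16_t sample) {
        // Shift the delay line by one position and insert the new sample.
        for (int i = TAPS - 1; i > 0; --i) {
            taps[i] = taps[i - 1];
        }
        taps[0] = sample;

        // Multiply-accumulate over all taps.
        int32_t acc = 0;
        for (int i = 0; i < TAPS; ++i) {
            acc += COEFFS[i] * taps[i];
        }
        return acc;
    }
};

// Large arrays accessed one element per iteration: candidates for Block RAM.
// `in` is only read and `out` is only written, one access each per iteration.
void fir_block(const int16_t in[BLOCK_LEN], int32_t out[BLOCK_LEN]) {
    Fir fir;
    for (int i = 0; i < BLOCK_LEN; ++i) {
        out[i] = fir.step(in[i]);
    }
}
```

Feeding `fir_block` an impulse (a single 1 followed by zeros) returns the coefficients in order, which is a quick way to check the model in a C++ testbench before synthesis.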
Design Challenges with Memories in HLS
Memory accesses tend to be the bottleneck in a design. Because memory components have a fixed number of ports, memory accesses can limit the ability to pipeline or negate the benefits gained from loop unrolling. Over-constraining a loop that contains memories may lead to scheduling failures caused by resource competition.
A resource competition failure occurs when the constraints are impossible to meet, for example when a design is pipelined so that it attempts to read a single port memory more than once in a clock period. This failure may be resolved either by changing the design constraints or by modifying the source code:
- Relaxing the initiation interval until the design schedules and a datapath is obtained. This may reduce performance.
- Using a memory with a greater number of ports, if available.
- Increasing the word width of the memory so that multiple data words can be read through a single port in one access.
- Reshaping the arrays into different physical structures, in one of two ways:
  Interleaving: Interleaving an array creates smaller arrays by distributing the elements of the original array cyclically among them.
  Banking: Banking an array splits the original memory into multiple banks of the specified size. Consecutive elements are kept together.
- Rewriting array accesses based on the access pattern or order to reduce the number of memory or I/O accesses, for example by holding a reused value in a register.
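The two reshaping schemes in the list above differ only in how an original array index is translated into a memory number and an address within that memory. The following sketch shows that index arithmetic in plain C++. The names `Location`, `interleave`, and `bank` are hypothetical and are not part of any HLS tool; in practice the tool performs this transformation itself when the reshaping is requested.

```cpp
#include <cassert>

// Where one element of the original array ends up after reshaping.
struct Location {
    int memory;  // which of the smaller memories holds the element
    int offset;  // address of the element inside that memory
};

// Interleaving: elements are dealt out cyclically, so neighbors
// i and i+1 land in different memories.
Location interleave(int index, int num_memories) {
    assert(index >= 0 && num_memories > 0);
    return Location{index % num_memories, index / num_memories};
}

// Banking: consecutive elements stay together, and each bank
// holds bank_size elements.
Location bank(int index, int bank_size) {
    assert(index >= 0 && bank_size > 0);
    return Location{index / bank_size, index % bank_size};
}
```

For an 8-element array split into two memories, interleaving places elements 0, 2, 4, 6 in memory 0 and elements 1, 3, 5, 7 in memory 1, while banking with a bank size of 4 places elements 0 to 3 in bank 0 and elements 4 to 7 in bank 1. A loop unrolled by two that reads elements `i` and `i + 1` therefore hits two different memories under interleaving, whereas banking suits code that works on separate contiguous regions in parallel.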
Applying these techniques to memories in HLS may significantly improve the area, performance, and power of the generated RTL.
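As a minimal sketch of rewriting array accesses to use a register, consider a loop that adds each pair of neighboring elements. The function names and the array length `PAIR_LEN` are made up for this illustration. The first version reads the input array twice per iteration, which competes for the port of a single port memory if the loop is pipelined with an initiation interval of 1. The second version reads each element once and carries the previous value in a local variable, which would be expected to map to a register.

```cpp
#include <cassert>

const int PAIR_LEN = 8;  // hypothetical array length

// Straightforward version: two reads of `in` per iteration
// (in[i] and in[i + 1]).
void pair_sum_two_reads(const int in[PAIR_LEN], int out[PAIR_LEN - 1]) {
    for (int i = 0; i < PAIR_LEN - 1; ++i) {
        out[i] = in[i] + in[i + 1];
    }
}

// Rewritten version: one read of `in` per iteration. The value read in
// the previous iteration is kept in `prev` instead of being read again.
void pair_sum_one_read(const int in[PAIR_LEN], int out[PAIR_LEN - 1]) {
    int prev = in[0];
    for (int i = 1; i < PAIR_LEN; ++i) {
        int cur = in[i];
        out[i - 1] = prev + cur;
        prev = cur;
    }
}
```

Both functions produce the same output, so the rewrite changes only the number of memory reads per iteration, not the result.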
