1. Datapath Overview

    Datapath Introduction

    A datapath is a collection of functional units (adders, multipliers, shifters) and storage elements (RAMs, ROMs, registers) that together implement the user's specification. The datapath is one of the most important parts of a hardware design because it carries out the functional behavior the designer intended.

    Blocks in Datapath

    • Functional Units: All blocks that implement the computation described by the HLS specification. They consist mainly of arithmetic operators, such as:
      • Adders
      • Multipliers
      • Subtractors
      • Dividers
      • Shifters
      • Comparators
    • Memory Blocks: The storage elements of the datapath:
      • Registers
      • RAM
      • ROM

    In HLS, arrays are mainly mapped to RAMs and ROMs, while scalar variables are mapped to registers. Registers work in tandem with the control logic: they hold the result of one clock cycle so that it can be used as an input in the next.

    Memories can be classified by the number and type of their ports, for example single-port, dual-port, or 1R1W (one read port and one write port). On FPGAs they can also be classified by the resource used to implement them: block RAMs (dedicated RAM blocks) or LUT RAMs (built from look-up tables), while registers are built from the flip-flops in the logic slices.

    • Interconnects: Buses, arbiters, and multiplexers (muxes) are also part of the datapath.
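
To make the mapping from source constructs to datapath blocks concrete, here is a minimal C++ sketch of an HLS-style kernel. The function name `weighted_sum`, the length `N`, and the coefficient values are hypothetical, and the comments describe what an HLS tool would typically infer; the actual mapping depends on the tool, the constraints, and the target technology.

```cpp
#include <cassert>
#include <cstdint>

constexpr int N = 4;  // hypothetical array length, chosen for illustration

// Constant table: an HLS tool would typically map this to a ROM
// (or fold it into logic, since it is very small).
static const std::int16_t coeff[N] = {1, 2, 3, 4};

// 'samples' is an array argument: typically mapped to a RAM or a memory interface.
std::int32_t weighted_sum(const std::int16_t samples[N]) {
    assert(samples != nullptr);  // software-model check only, not hardware
    std::int32_t acc = 0;        // scalar carried across iterations: typically a register
    for (int i = 0; i < N; ++i) {
        // One multiplier and one adder (functional units); the loop test
        // i < N needs a comparator and ++i needs an incrementer.
        acc += static_cast<std::int32_t>(samples[i]) * coeff[i];
    }
    return acc;
}
```

Calling `weighted_sum` on the samples {10, 20, 30, 40} exercises every block named above: one ROM read, one RAM read, one multiply, and one accumulate per iteration.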

    Datapath in High-Level Synthesis

    In High-Level Synthesis (HLS), the datapath is generated in a multi-step process:

    1. Compilation: The input source file containing the design is analyzed and compiled. During compilation, initial optimizations such as dead-code elimination, constant propagation, constant folding, and common subexpression elimination are performed.
    2. Library: The library holds the components that will be used in the datapath in later stages: functional units (adders, multipliers, dividers, shifters), memory units (registers, ROMs, RAMs), and interconnects (muxes). Components are differentiated by input/output bit width and by speed grade, both of which affect area and delay.
    3. Allocation and scheduling: Based on the user-specified clock period, the components needed for the datapath are selected from the library and assigned to clock steps such that:
      1. The components placed in a clock step fit within one clock cycle.
      2. Operations that do not fit within one clock cycle are spread across multiple clock cycles, with a register in between.
      3. Data dependencies are respected, so the data flow between components is not disrupted.
    4. Sharing and reallocation: Once allocation and scheduling are complete, further optimization can save area or improve latency/performance, depending on the user's choice. Sharing reuses a physical operator based on the lifetime of the operations mapped to it. For example, suppose an addition is mapped to a carry-save adder in cycle 1 and occupies it for 3 clock periods. If another addition is scheduled in cycle 4, the same physical adder can be reused, saving the area of an additional adder. Reallocation replaces a component with a different library implementation: if an adder that completes in two cycles is available, it can replace the original adder to save a cycle of latency.
    5. DPFSM: A datapath finite state machine (DPFSM) is created, which sequences the entire datapath according to the generated schedule.
    6. Binding: The components from the steps above are bound to actual RTL implementations. An adder can have several RTL/hardware implementations (carry-propagate adder, carry-lookahead adder, etc.), and this step maps each adder component to one of them based on the clock period, data dependencies, and user constraints. Array elements and variables are similarly mapped to physical memory components.
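
The scheduling rules in step 3 can be sketched as a simple as-soon-as-possible (ASAP) scheduler with operator chaining. This is an illustrative model, not how any particular HLS tool works: the `Op` and `Slot` types and the function `schedule_asap` are hypothetical, operations are assumed to be listed in dependency order, register setup time is ignored, and operations slower than one clock period are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Op {
    double delay_ns;         // combinational delay of the chosen library component
    std::vector<int> preds;  // indices of the operations whose results this one consumes
};

struct Slot {
    int step;       // clock step the operation is placed in (0-based)
    double finish;  // time within that step at which its result is stable
};

// ASAP scheduling with chaining. An operation is chained after its latest
// predecessor if the accumulated delay still fits in the clock period;
// otherwise it moves to the next clock step, which implies a register.
std::vector<Slot> schedule_asap(const std::vector<Op>& ops, double clock_ns) {
    std::vector<Slot> slots(ops.size());
    for (std::size_t i = 0; i < ops.size(); ++i) {
        assert(ops[i].delay_ns <= clock_ns);  // multi-cycle operators are not modeled
        int step = 0;
        double start = 0.0;
        for (int p : ops[i].preds) {
            assert(p >= 0 && static_cast<std::size_t>(p) < i);  // dependency order
            const Slot& ps = slots[p];
            if (ps.step > step) {
                step = ps.step;      // must wait for the latest predecessor
                start = ps.finish;
            } else if (ps.step == step) {
                start = std::max(start, ps.finish);
            }
        }
        if (start + ops[i].delay_ns > clock_ns) {
            ++step;       // does not fit: register the inputs and start a new step
            start = 0.0;
        }
        slots[i] = Slot{step, start + ops[i].delay_ns};
    }
    return slots;
}
```

For a chain of a 3 ns multiplier followed by two 2 ns adders, a 5 ns clock lets the multiplier and the first adder share clock step 0 while the second adder moves to step 1; tightening the clock to 4 ns pushes the first adder to step 1 as well.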

    These are the steps for generating the datapath in HLS.
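
The sharing step can be illustrated with a small greedy assignment in the style of the left-edge algorithm: operations are visited in order of start cycle, and each one reuses the first physical adder that is already free. The names `AddOp` and `share_adders` are hypothetical, and the sketch deliberately ignores the cost of the muxes that sharing adds in front of the shared operator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct AddOp {
    int start_cycle;  // cycle in which the addition starts
    int busy_cycles;  // number of cycles the physical adder is occupied
};

// Returns, for each operation, the index of the physical adder it is bound to.
std::vector<int> share_adders(const std::vector<AddOp>& ops) {
    // Visit operations in order of start cycle.
    std::vector<std::size_t> order(ops.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::stable_sort(order.begin(), order.end(),
                     [&ops](std::size_t a, std::size_t b) {
                         return ops[a].start_cycle < ops[b].start_cycle;
                     });

    std::vector<int> free_at;  // first cycle in which each physical adder is free
    std::vector<int> unit(ops.size(), -1);
    for (std::size_t idx : order) {
        const AddOp& op = ops[idx];
        assert(op.busy_cycles > 0);
        int chosen = -1;
        for (std::size_t u = 0; u < free_at.size(); ++u) {
            if (free_at[u] <= op.start_cycle) {  // lifetimes do not overlap
                chosen = static_cast<int>(u);
                break;
            }
        }
        if (chosen < 0) {  // every existing adder is busy: allocate a new one
            free_at.push_back(0);
            chosen = static_cast<int>(free_at.size()) - 1;
        }
        free_at[chosen] = op.start_cycle + op.busy_cycles;
        unit[idx] = chosen;
    }
    return unit;
}
```

With the example from step 4, an addition that occupies an adder in cycles 1 to 3 and another that starts in cycle 4 end up on the same physical adder, whereas an addition that starts in cycle 2 overlaps the first one and needs a second adder.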

    Challenges in Designing the Datapath in HLS

    • Runtime Issues: HLS tool runtime can become a problem because of the constraints the user provides. Long runtimes typically arise when the design contains a large number of loops that are pipelined with an initiation interval (II) of 1 and unrolled. A very high target clock frequency also makes the problem harder, so the HLS tool takes longer to generate the datapath and the schedule.
    • Scheduling Failures: Scheduling failures are one of the main reasons datapath generation fails: the HLS tool is unable to produce a schedule, and therefore a datapath. They can occur for the following reasons:
      • Resource Competition (Memories): Occurs when the constraints are impossible to satisfy, e.g. pipelining a design so that it attempts to read a single-port memory more than once in a clock cycle. It can be resolved by changing the design constraints or by modifying the code.
      • Fixing this through design constraints can be done by:
        • Relaxing the initiation interval so that the design schedules and a datapath is obtained, at the cost of reduced performance.
        • Using a memory with more ports.
        • Splitting the memory into sub-memories, or increasing the word width of the memory so that multiple data words can be read through a single port.
      • Fixing this by changing the source code can be done by:
        • Looking for multiple I/O accesses and reducing them to a single access.
        • Rewriting array accesses based on their access pattern or order to reduce memory or I/O accesses, for example by holding reused values in a register.
      • Data Feedback Dependency Scheduling Failures: Scheduling fails if the design contains feedback and the feedback timing path is longer than the initiation interval. This arises especially when loops are pipelined with II=1 and unrolled, combined with poor coding style. A few examples are:
        • Using saturation and rounding on accumulators.
        • Conditional break, return, and continue statements.
        • An example of a data feedback dependency is:

    In the image above, fb is written in the 4th cycle of iteration 1, while the same fb is read in the 2nd cycle of iteration 2. With II=1, fb is therefore read by the 2nd iteration before the 1st iteration has written it.

    This is a common feedback dependency in the datapath, where the feedback takes 2 cycles and the initiation interval is 1 cycle. It can be fixed by using a temporary variable for fb to avoid the conflict, as seen in the image below.

    A variable fb_old can be introduced, and fb's value is written to it once the read of fb_old has completed. This way there is no feedback dependency error.
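
One way to read both the memory resource competition and the feedback dependency failures is as lower bounds on the initiation interval, as used in modulo scheduling: a resource bound (accesses per iteration divided by available ports) and a recurrence bound (feedback latency divided by the dependency distance in iterations). The sketch below computes both. The function names are hypothetical, and treating fb_old as a value that is read one iteration later than fb (a dependency distance of 2) is an assumption about the fix described above, not something the text states.

```cpp
#include <algorithm>
#include <cassert>

// Integer ceiling of a / b for positive operands.
int ceil_div(int a, int b) {
    assert(a > 0 && b > 0);
    return (a + b - 1) / b;
}

// Resource bound: a memory with 'ports' ports can serve at most 'ports'
// accesses per cycle, so an iteration that needs 'accesses_per_iteration'
// accesses cannot start more often than every ceil(accesses / ports) cycles.
int resource_min_ii(int accesses_per_iteration, int ports) {
    return ceil_div(accesses_per_iteration, ports);
}

// Recurrence bound: a value that takes 'feedback_latency_cycles' to produce
// and is consumed 'distance_iterations' iterations later limits II to
// ceil(latency / distance).
int recurrence_min_ii(int feedback_latency_cycles, int distance_iterations) {
    return ceil_div(feedback_latency_cycles, distance_iterations);
}

// The smallest II that satisfies both bounds.
int min_ii(int accesses_per_iteration, int ports,
           int feedback_latency_cycles, int distance_iterations) {
    return std::max(resource_min_ii(accesses_per_iteration, ports),
                    recurrence_min_ii(feedback_latency_cycles, distance_iterations));
}
```

Under these assumptions, a 2-cycle feedback consumed by the very next iteration gives a recurrence bound of 2, so a request for II=1 cannot be met; consuming the value two iterations later brings the bound down to 1. Likewise, two reads per iteration from a single-port memory give a resource bound of 2, and a second port brings it down to 1.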

    Memory Read/Write Dependencies: This issue occurs when a memory has no defined read/write resolution (it may read before write or write before read) and the design is pipelined with II=1 such that the same address can be read and written in the same clock cycle. The HLS tool cannot prove that the addresses are different, so this is another scenario in which datapath scheduling fails.
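
A histogram update is a common illustration of this kind of dependency, because the address comes from the input data and the tool cannot know whether two consecutive iterations touch the same location. The sketch below is a software model with hypothetical names (`BINS`, `histogram_direct`, `histogram_cached`). The rewritten version keeps the most recent bin and its running count in registers, so that the read and the write within one iteration always go to different addresses. Whether a given tool then reaches II=1 still depends on the memory latency and may require tool-specific directives, which are not shown here.

```cpp
#include <cassert>

constexpr int BINS = 8;  // hypothetical number of histogram bins

// Direct form: every iteration reads hist[b] and writes it back. If two
// consecutive inputs select the same bin, the second read needs the result
// of the first write.
void histogram_direct(const int bins[], int n, int hist[BINS]) {
    for (int i = 0; i < n; ++i) {
        assert(bins[i] >= 0 && bins[i] < BINS);
        hist[bins[i]] += 1;
    }
}

// Rewritten form: the count of the most recent bin lives in a register.
void histogram_cached(const int bins[], int n, int hist[BINS]) {
    int last = -1;  // address register; -1 means nothing is cached yet
    int acc = 0;    // running count for bin 'last'
    for (int i = 0; i < n; ++i) {
        const int b = bins[i];
        assert(b >= 0 && b < BINS);
        if (b == last) {
            acc += 1;  // same address as before: update the register only
        } else {
            if (last >= 0) hist[last] = acc;  // write back the previous bin
            acc = hist[b] + 1;                // read the new bin (b != last)
            last = b;
        }
    }
    if (last >= 0) hist[last] = acc;  // flush the final cached count
}
```

Both functions produce the same histogram for the same input; the difference lies only in which memory accesses each iteration performs.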

    Conclusion

    The datapath is the architecture of functional units, memory blocks, and interconnects that implements the design specification. Together with the control logic and FSM, it forms the final RTL generated from the HLS source code. This section discussed the components of the datapath, how HLS tools generate it, and the challenges faced in datapath creation.