    C++ Modeling Intro

    Many HLS designs can be efficiently modeled using untimed C++, which features a simple coding style and very fast simulation performance. With this modeling approach, hardware blocks are modeled as C++ classes, and design hierarchy and block interconnect are modeled using function calls between classes. Fixed-point and floating-point datatypes can be used to accurately model aspects such as limited numerical precision, saturation, and rounding. C++ libraries for HLS enable blocks to model math functions, DSP functions, and more.

    When using C++ for HLS models, most of the constructs within the C++ language can be used freely, including:

    • classes, templates, inheritance
    • control constructs such as: while, for, if/else, switch/case
    • C datatypes such as int, short, char, uint64_t, arrays, structs, etc.

    C++ constructs that generally cannot be used within HLS models, because they cannot be synthesized to HW, include:

    • malloc/free, new/delete
    • function recursion
    • dynamically allocated strings
    • C++ try/catch/throw
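
    As a rough illustration of working within these restrictions, the sketch below replaces heap allocation with a fixed-size array and recursion with a loop whose bound is known at compile time. The names sum_window and N_TAPS are hypothetical and chosen only for this example.

```cpp
#include <cstdint>

// Hypothetical block parameter: number of samples summed per call.
// A compile-time size lets storage be a plain array instead of new/malloc.
const int N_TAPS = 8;

// A recursive form such as
//   int sum(const int16_t *x, int n) { return n ? x[0] + sum(x + 1, n - 1) : 0; }
// is the kind of code the list above rules out.

// Equivalent loop form with a static bound.
int32_t sum_window(const int16_t x[N_TAPS]) {
  int32_t acc = 0;
  for (int i = 0; i < N_TAPS; ++i) {
    acc += x[i];
  }
  return acc;
}
```

    Because the loop bound is a constant, the amount of hardware needed to implement the loop can be determined from the source alone.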

    With the untimed C++ modeling approach, HW modules are represented as classes, and design hierarchy is modeled by having a parent class instantiate a child class and call a member function of the child class to model that block's behavior. HW reset values are modeled within the class constructor by assigning reset values to variables.
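
    The structure described above can be sketched as follows. This is a minimal example using plain C++ integer types; the class names Accumulator and Top, and the run() member function name, are assumptions made for illustration rather than names required by any tool.

```cpp
#include <cstdint>

// Child block: a running accumulator.
class Accumulator {
public:
  // The constructor assigns the reset value of the state variable.
  Accumulator() : acc_(0) {}

  // One call models one invocation of the block's behavior.
  int32_t run(int16_t sample) {
    acc_ += sample;
    return acc_;
  }

private:
  int32_t acc_;  // state held between calls
};

// Parent block: instantiates the child as a member and calls it.
class Top {
public:
  Top() : gain_(2) {}

  int32_t run(int16_t sample) {
    return gain_ * child_.run(sample);
  }

private:
  int16_t gain_;
  Accumulator child_;  // hierarchy: Top contains Accumulator
};
```

    Calling run(3) and then run(4) on a freshly constructed Top returns 6 and then 14, since the child's accumulator starts from its reset value of 0 and keeps its state between calls.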

    In the C++ code, the interconnect between modules is modeled in several ways:

    • Case 1) By passing a reference to ac_channel, which models hardware interconnect using message passing (typically the rdy/vld/dat protocol).
    • Case 2) By passing a C++ variable, which models hardware interconnect like a Verilog input wire.
    • Case 3) By passing a C++ array, which models hardware interconnect that is usually implemented as an interface to a RAM.
    • Case 4) By passing a reference to ac_sync, which models a HW synchronization primitive equivalent to a software "barrier".
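
    The first three cases can be sketched in one member function signature. To keep the example compilable with only the standard library, it uses a hypothetical SimChannel class as a stand-in for ac_channel; the real class has a richer interface, and a real design would include the library header instead. ScaleAndOffset and LUT_SIZE are likewise illustrative names, and the ac_sync case is not shown.

```cpp
#include <cstdint>
#include <deque>

// Stand-in for ac_channel (assumption: only read/write are mirrored here).
template <typename T>
class SimChannel {
public:
  void write(const T &v) { q_.push_back(v); }
  T read() {
    T v = q_.front();
    q_.pop_front();
    return v;
  }
  bool empty() const { return q_.empty(); }

private:
  std::deque<T> q_;
};

const int LUT_SIZE = 4;

// Hypothetical block showing three of the argument styles.
class ScaleAndOffset {
public:
  void run(SimChannel<int16_t> &in,          // Case 1: message-passing channel
           int16_t gain,                     // Case 2: plain variable (wire-like)
           const int16_t offsets[LUT_SIZE],  // Case 3: array (typically a RAM)
           SimChannel<int32_t> &out) {
    for (int i = 0; i < LUT_SIZE; ++i) {
      int16_t x = in.read();
      out.write(static_cast<int32_t>(x) * gain + offsets[i]);
    }
  }
};
```

    Note that nothing in the signature states how the channel, wire, or memory is implemented; that choice is left to HLS.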

    For the first case above, the C++ source code does not specify the detailed HW implementation of the message passing interface. HLS can implement the interface in a variety of different ways, including:

    • Using a simple rdy/vld/dat signal-level handshake.
    • Using the same protocol as above but adding either a skid-buffer or fifo storage for additional buffering capacity.
    • By implementing the protocol as only vld/dat (i.e. no rdy signal).
    • By mapping the interconnect to a simple FIFO, or a FIFO implemented with a ping-pong RAM, or other FIFO interface approaches.
    • By mapping to interfaces such as AXI4-stream.

    For the second case above, HLS implements the variable as an input wire to the HW block. When the HW block is pipelined during HLS, no pipeline registers are added on the path that reads the input wire; this minimizes HW area.

    For the third case above, HLS usually implements the array that was passed into the child block as an interface to a RAM. HLS will automatically implement the detailed RAM interface, including IO protocols, timing behavior, pipelined memory access behavior, etc.

    To improve HW QOR, HLS may make several transformations to the way the RAM is organized and accessed:

    • HLS may build separate RAMs for each of the individual fields of an array whose element type is a struct.
    • HLS may merge or split array accesses that appear in the C++ source.
    • HLS may reorder RAM accesses when it pipelines designs to improve QOR.
    • HLS may build a banked memory structure.
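
    These transformations are chosen by the HLS tool and do not appear in the source. To give a feel for what a banked memory structure means, the sketch below models one possible scheme in plain C++: a logical array of DEPTH elements interleaved across N_BANKS smaller memories by the low address bits. The scheme, the BankedMem name, and both constants are assumptions for illustration and do not describe what any particular tool generates.

```cpp
#include <cstdint>

const int DEPTH = 16;    // logical array size (hypothetical)
const int N_BANKS = 4;   // number of physical banks (hypothetical)

// Logical view: one array indexed by addr, as written in the source model.
// Physical view: N_BANKS memories of DEPTH / N_BANKS entries each.
struct BankedMem {
  int16_t bank[N_BANKS][DEPTH / N_BANKS];

  // Low address bits select the bank, high bits select the row.
  void write(int addr, int16_t v) { bank[addr % N_BANKS][addr / N_BANKS] = v; }
  int16_t read(int addr) const { return bank[addr % N_BANKS][addr / N_BANKS]; }
};
```

    With this mapping, consecutive logical addresses fall in different banks, so a loop that touches addr, addr+1, addr+2, and addr+3 could in principle access all four banks at once.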

    Another implementation option during HLS is to implement the array interface as a bus interface using an address-mapped bus such as AXI4.

    In the pre-HLS untimed C++ simulation, all ac_channels are modeled as FIFOs with infinite depth. The testbench for the design repeatedly calls the member function of the top class that models the DUT behavior. This function in turn calls the member functions of all child modules in the HW hierarchy. In this manner the concurrent execution of the HW processes is modeled within a single software thread in the pre-HLS simulation. After HLS, all ac_channels will be implemented with finite depth, and the functionality in the C++ DUT will be implemented with fully concurrent HW in the RTL.
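
    A minimal sketch of this simulation style is shown below. It again uses a hypothetical SimChannel class (an unbounded queue) as a stand-in for ac_channel so that it compiles with only the standard library; Doubler, AddOne, Top, and run_testbench are illustrative names, not part of any library.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Stand-in for ac_channel: a FIFO with no depth limit, as in pre-HLS simulation.
template <typename T>
class SimChannel {
public:
  void write(const T &v) { q_.push_back(v); }
  T read() {
    T v = q_.front();
    q_.pop_front();
    return v;
  }
  bool empty() const { return q_.empty(); }

private:
  std::deque<T> q_;
};

class Doubler {
public:
  void run(SimChannel<int16_t> &in, SimChannel<int32_t> &out) {
    if (!in.empty()) out.write(2 * static_cast<int32_t>(in.read()));
  }
};

class AddOne {
public:
  void run(SimChannel<int32_t> &in, SimChannel<int32_t> &out) {
    if (!in.empty()) out.write(in.read() + 1);
  }
};

// Top-level DUT: each call invokes every child once, in dataflow order.
class Top {
public:
  void run(SimChannel<int16_t> &in, SimChannel<int32_t> &out) {
    stage1_.run(in, mid_);
    stage2_.run(mid_, out);
  }

private:
  Doubler stage1_;
  AddOne stage2_;
  SimChannel<int32_t> mid_;  // interconnect between the two children
};

// Testbench: a single software thread repeatedly calls the top-level function.
std::vector<int32_t> run_testbench(int n_samples) {
  Top dut;
  SimChannel<int16_t> in;
  SimChannel<int32_t> out;
  std::vector<int32_t> results;
  for (int i = 0; i < n_samples; ++i) {
    in.write(static_cast<int16_t>(i));
    dut.run(in, out);
    while (!out.empty()) results.push_back(out.read());
  }
  return results;
}
```

    The two stages run one after the other in software, whereas in the generated RTL they would operate concurrently and the channel between them would have a finite depth.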

    Libraries for C++ HLS

    There is a large set of open-source C++ HLS libraries that can be used when modeling in untimed C++. These include:

    • ac_types: This package provides parameterized classes to model integer, fixed-point, and floating-point data types, and is designed to provide very high-speed C++ simulation performance and optimal HW QOR.
    • ac_math: This package provides parameterized classes to model a large variety of math functions such as cosine, tangent, square_root, etc., and to provide optimal HW QOR.
    • ac_dsp: This package provides parameterized classes to model various DSP filters like FFTs and FIRs.
    • ac_ml: This package contains IP and reference designs for machine learning.
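
    To illustrate the kind of behavior that fixed-point datatypes capture, the sketch below quantizes a value to a signed fixed-point format with rounding and saturation using only the standard library. It is a plain C++ stand-in written for this example; it does not use the ac_types API, and quantize_sat_rnd is a hypothetical helper that is not claimed to be bit-exact with any library mode.

```cpp
#include <cmath>
#include <cstdint>

// Quantize x to a signed fixed-point value with W total bits, of which I are
// integer bits (including the sign bit), leaving F = W - I fractional bits.
// Rounds to nearest (ties toward +infinity) and saturates on overflow.
// The result is returned as a double for easy comparison.
template <int W, int I>
double quantize_sat_rnd(double x) {
  const int F = W - I;
  const double scale = std::ldexp(1.0, F);  // 2^F
  const int64_t max_raw = (int64_t(1) << (W - 1)) - 1;
  const int64_t min_raw = -(int64_t(1) << (W - 1));
  double r = std::floor(x * scale + 0.5);   // rounding step
  if (r > static_cast<double>(max_raw)) r = static_cast<double>(max_raw);
  if (r < static_cast<double>(min_raw)) r = static_cast<double>(min_raw);
  return r / scale;
}
```

    For an 8-bit format with 4 integer bits, quantize_sat_rnd<8, 4>(0.1) yields 0.125 (limited precision), while an input of 100.0 saturates to 7.9375, the largest representable value.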

    Designs Suited for Untimed C++ Modeling

    The untimed C++ modeling approach is well-suited for dataflow designs such as:

    • DSP filters
    • Time-domain signal processing
    • Image signal processors
    • ML accelerators implemented using a dataflow HW architecture

    For these types of designs, multiple separate HW blocks are typically created, with ac_channels as the primary means of communication between blocks. Because the untimed C++ flow does not model time in the pre-HLS simulation, timing-dependent behavior cannot be verified until RTL is generated. For this reason, designs whose behavior depends on control flow or timing may be better suited to the SystemC flow for HLS, since it is usually desirable to verify these aspects in the pre-HLS simulation.

    Untimed C++ Development Tools and Verification

    One of the advantages of using C++ for HW modeling is the large variety of open-source development tools available. Some development tools that are commonly used with C++ HLS models include:

    • make, cmake for build management
    • VSCode, Eclipse for integrated development environments
    • gdb/ddd for debugging C++ code
    • AddressSanitizer, MemorySanitizer for detecting C++ coding errors
    • gcov/lcov for C++ style line coverage metrics

    When using the untimed C++ modeling approach, a wide variety of verification techniques can be applied to the C++ HLS model:

    • CCOV can be used to measure code coverage on the C++ model that takes into consideration the HW structure of the model.
    • CDesignChecker can be used to perform HW-aware static analysis of the C++ model.
    • The standard C++ assert() statement can be used to detect errors, and when the model is synthesized by HLS the assertion will appear in the RTL as an SVA or PSL assertion.
    • The C++ model can be imported into environments such as MATLAB or Python to provide stimulus and response checking.
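
    As a small example of the assert() technique mentioned above, the sketch below guards a table lookup with an index check. The Lookup class and TABLE_SIZE constant are hypothetical; the example only shows where such a check would sit in the C++ model.

```cpp
#include <cassert>
#include <cstdint>

const int TABLE_SIZE = 16;  // hypothetical table depth

class Lookup {
public:
  // Fill the table with squares so the result is easy to check.
  Lookup() {
    for (int i = 0; i < TABLE_SIZE; ++i) {
      table_[i] = static_cast<int16_t>(i * i);
    }
  }

  int16_t run(uint8_t index) {
    // Fails in C++ simulation if a caller passes an out-of-range index.
    assert(index < TABLE_SIZE);
    return table_[index];
  }

private:
  int16_t table_[TABLE_SIZE];
};
```

    In the C++ simulation, an out-of-range index stops the run at the failing line, which makes the error much easier to locate than a corrupted output several blocks downstream.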

    After HLS generates RTL, a variety of verification techniques can be applied:

    • HLS can automatically run the RTL DUT "side by side" with the pre-HLS C++ DUT and apply stimulus and compare the results.
    • Random stall injection can be automatically applied to message passing channels in the RTL DUT, stressing the model so that bugs can be found.
    • SystemVerilog UVM can be used to stress test the RTL DUT and to measure and meet functional coverage and RTL code coverage metrics.