1. HLS Basics

    1. HLS Basics - Left

      HLS 101

      HLS typically uses C++ or SystemC to raise abstraction above that of RTL. For hardware design (HLS) and verification (HLV) there are considerable advantages to using this methodology to deliver high quality RTL, be it VHDL or Verilog.

      It is important to understand that HLS still involves hardware design skills that the RTL designer will be experienced with. What you write in the source will materially impact the resulting RTL. This is especially true when it comes to memory architecture, interface bandwidth limits, and bit-accurate mathematical tradeoffs.

      Loop Unrolling

      Loop unrolling is a key mechanism for driving parallelism in the HLS process. Designers must be aware that dependencies from iteration to iteration can limit absolute parallelism when the loop is partially or even fully unrolled.
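
      As a minimal sketch of the dependency point, the plain C++ below contrasts a loop with independent iterations against one with a loop-carried dependency. The function names and the length N are hypothetical, and the tool-specific pragmas or directives that actually request unrolling are deliberately not shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 8;  // hypothetical vector length

// Independent iterations: each out[i] depends only on a[i] and b[i],
// so a fully unrolled version can use N adders working in parallel.
std::array<int32_t, N> vec_add(const std::array<int32_t, N>& a,
                               const std::array<int32_t, N>& b) {
    std::array<int32_t, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}

// Loop-carried dependency: iteration i needs the acc produced by
// iteration i-1, so unrolling alone yields a chain of N dependent adders.
int32_t sum_chain(const std::array<int32_t, N>& a) {
    int32_t acc = 0;
    for (std::size_t i = 0; i < N; ++i) {
        acc += a[i];
    }
    return acc;
}

// Manually unrolled by 2 with two partial sums: the two accumulators are
// independent of each other, so each chain is only N/2 additions long,
// followed by one final addition.
int32_t sum_unroll2(const std::array<int32_t, N>& a) {
    static_assert(N % 2 == 0, "unroll factor must divide N");
    int32_t acc0 = 0, acc1 = 0;
    for (std::size_t i = 0; i < N; i += 2) {
        acc0 += a[i];
        acc1 += a[i + 1];
    }
    return acc0 + acc1;
}
```

      Both sum functions return the same value; only the shape of the dependency chain differs, and that shape is what limits how much parallelism unrolling can expose.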

      Pipelining

      Pipelining specifies how often, in clock cycles, a loop or function body starts a new iteration; this is the Initiation Interval (II). An II of 1 is common and means a new iteration starts every clock cycle, giving continuous throughput. An II greater than 1 may be used in some cases, often because of resource sharing or feedback considerations.
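
      The effect of the Initiation Interval on total run time can be sketched with a simple cycle count. This is an idealized model that assumes no stalls; the function and parameter names are illustrative and do not come from any tool.

```cpp
#include <cassert>
#include <cstdint>

// Cycles needed to run `iterations` loop iterations through a pipeline.
// `latency` is the cycle count of one iteration from start to finish;
// `ii` is the initiation interval (cycles between successive starts).
uint64_t pipelined_cycles(uint64_t iterations, uint64_t latency, uint64_t ii) {
    assert(ii >= 1 && latency >= 1);
    if (iterations == 0) return 0;
    // The first iteration takes `latency` cycles; each later iteration
    // starts `ii` cycles after the previous one and finishes `ii` cycles later.
    return latency + (iterations - 1) * ii;
}

// Without pipelining, an iteration starts only after the previous one ends.
uint64_t sequential_cycles(uint64_t iterations, uint64_t latency) {
    return iterations * latency;
}
```

      For 100 iterations with a latency of 4 cycles, this model gives 103 cycles at an II of 1, 202 cycles at an II of 2, and 400 cycles with no pipelining.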

      Hierarchy

      Hierarchy is a key concept in both RTL and HLS for managing design complexity and reuse. In SystemC, we can be explicit, creating hierarchy from SC_MODULE definitions that specify exact interfaces, along with clocks and resets. In C++, hierarchy and parallelism are implicitly modeled and transformed to explicit implementations in RTL.

    2. HLS Basics - Right

      Design Partitioning

      Partitioning enables the designer to better manage complexity and to optimize mutually exclusive elements of a design for better reuse. Conceptually, a partition is a small cluster or scope of operators kept together to ease sharing or optimization decisions.

      Partitions are often combinational in nature, though sequentially pipelined datapath elements can also be used when higher frequencies or a greater number of operators are required.

      Datatypes

      Bit-accurate datatypes provide a mechanism to model exact hardware bit-width and arithmetic precision. From simple signed and unsigned integer representations that are familiar to RTL designers, through fixed point datatypes with rounding and saturation modes, all the way to complete IEEE compliant user-specified exponent and mantissa floating point types, HLS ensures that there is no misunderstanding between the C++/SystemC numerical behavior and the resulting RTL created by HLS.
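
      To illustrate what rounding and saturation modes mean for a fixed point value, here is a minimal sketch in plain C++. It does not use the AC Types or SystemC datatype APIs; the function names, and the choice of round-to-nearest with ties away from zero, are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantize a real value to a signed fixed-point code with `width` total bits
// and `frac_bits` fractional bits, rounding to nearest and saturating on
// overflow. Returns the raw integer code; the represented value is
// code / 2^frac_bits.
int64_t quantize_sat(double x, int width, int frac_bits) {
    assert(width >= 2 && width <= 32);
    assert(frac_bits >= 0 && frac_bits < width);
    assert(!std::isnan(x));
    const int64_t max_code = (int64_t{1} << (width - 1)) - 1;
    const int64_t min_code = -(int64_t{1} << (width - 1));
    const double scaled = std::ldexp(x, frac_bits);  // x * 2^frac_bits
    // Saturate before converting so out-of-range inputs never overflow.
    if (scaled >= static_cast<double>(max_code)) return max_code;
    if (scaled <= static_cast<double>(min_code)) return min_code;
    return std::llround(scaled);  // round to nearest, ties away from zero
}

// Convert a raw code back to the real value it represents.
double to_real(int64_t code, int frac_bits) {
    return std::ldexp(static_cast<double>(code), -frac_bits);
}
```

      With 8 total bits and 4 fractional bits, 1.3 is stored as code 21 (1.3125), while 10.0 saturates to code 127 (7.9375) and -10.0 saturates to code -128 (-8.0).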

      Some companies may have their own home-grown types for modeling bit-accuracy. Two publicly available options are the AC Types and the SystemC types. Both can be obtained without cost, and both support C++/SystemC design styles and provide integer and fixed point modeling. The AC Types have become particularly popular due to their true unlimited length, consistent behavior, and faster simulation speed in executables.

      Interfaces

      Interfaces can be as simple as unsynchronized wires coming from a set of Control/Status Registers, through complex streaming or SRAM interfaces, to complex bus standards. Moving data and the impact that movement has on bandwidth and system-level performance is key. Common interfaces used in HLS might start with “channels” or “connections” for clear rdy/vld/data signaling.
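
      The rdy/vld/data signaling mentioned above can be sketched as a cycle-by-cycle trace. The struct and function names are hypothetical; the only rule modeled is that a transfer occurs in a cycle where both vld and rdy are high.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One clock cycle's view of a rdy/vld/data interface.
struct Beat {
    bool vld;       // producer has data on `data` this cycle
    bool rdy;       // consumer can accept data this cycle
    uint32_t data;  // payload; only meaningful when vld is true
};

// Returns the payloads actually transferred: a transfer happens only in
// cycles where vld and rdy are both high.
std::vector<uint32_t> transferred(const std::vector<Beat>& trace) {
    std::vector<uint32_t> out;
    for (const Beat& b : trace) {
        if (b.vld && b.rdy) out.push_back(b.data);
    }
    return out;
}
```

      Dividing the number of transfers by the number of cycles in the trace gives the achieved bandwidth of the interface, which is the quantity that back-pressure (rdy low) and starvation (vld low) reduce.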

  2. HLS Building Blocks & IPs

    1. HLS Building Blocks & IPs - Left

      Arithmetic Functions

      Many designs are heavy in mathematical computation, which makes arithmetic functions a critical component of HLS design. This holds across application domains and for both FPGA and ASIC targets. Arithmetic functions range from basic operators such as adders, multipliers, and dividers to more complicated functions such as trigonometric functions, squares, roots, exponents, and logarithms.

      HLS leverages C++/SystemC datatypes, so it is important to design arithmetic functions specifically for these datatypes so that they synthesize efficiently into hardware. In FPGAs, these arithmetic functions are mapped to DSPs and LUTs, while for ASIC design they are built from the basic standard cell libraries targeting a particular technology node.

      Datapath

      In hardware, input data traverses muxes and arithmetic and logic circuits to produce the output data. This path of traversal, or this collection of functional units, is called the Datapath, and it represents the functional behavior the designer intended.

      In HLS, the Datapath operations are first allocated from the selected FPGA or ASIC library and then scheduled into clock steps based on the clock period, with registers inserted where required and a Finite State Machine sequencing the steps. The Datapath carries the data flow information, in other words the “data dependencies”. The Datapath must be scheduled so that the data dependencies, including feedback paths, can all be satisfied; otherwise, the final RTL is not generated. This makes the Datapath one of the most critical components of an HLS design.
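
      As a rough illustration of scheduling against a clock period, the sketch below places operations into clock steps as soon as their inputs are available, chaining operators within a step when their combined delay fits. It is a simplified as-soon-as-possible scheduler written for this example, not the algorithm of any particular HLS tool; it ignores resource limits and multi-cycle operators, and all names and delay values are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Op {
    double delay_ns;                 // combinational delay of the operator
    std::vector<std::size_t> preds;  // indices of ops whose results it uses
};

struct Slot {
    int cycle;         // clock step the op is scheduled in
    double finish_ns;  // time within that clock step when its result is stable
};

// ASAP scheduling with operator chaining. `ops` must be in topological
// order (every predecessor index is smaller than the op's own index).
std::vector<Slot> schedule_asap(const std::vector<Op>& ops, double clock_ns) {
    std::vector<Slot> slots;
    slots.reserve(ops.size());
    for (std::size_t i = 0; i < ops.size(); ++i) {
        assert(ops[i].delay_ns <= clock_ns);  // no multi-cycle ops in this sketch
        int cycle = 0;
        for (std::size_t p : ops[i].preds) {
            assert(p < i);
            cycle = std::max(cycle, slots[p].cycle);
        }
        // Inputs from earlier clock steps come from registers (offset 0);
        // inputs produced in the same step arrive when their producer finishes.
        double start = 0.0;
        for (std::size_t p : ops[i].preds) {
            if (slots[p].cycle == cycle) start = std::max(start, slots[p].finish_ns);
        }
        if (start + ops[i].delay_ns > clock_ns) {  // does not fit: register and defer
            ++cycle;
            start = 0.0;
        }
        slots.push_back({cycle, start + ops[i].delay_ns});
    }
    return slots;
}
```

      For a 3 ns multiply feeding two chained 2 ns adds, a 5 ns clock fits the multiply and the first add in step 0 and defers the second add to step 1, while a 4 ns clock pushes both adds into step 1.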

      Control Logic & FSM

      HLS automatically infers control logic and finite state machine (FSM) for the scheduled operations. Control logic is responsible for the data movement in the data path. It manages pipeline stages, facilitates resource sharing, and decides the next state in the FSM. The FSM controls state transitions and sequencing of operations in the data path by generating control signals based on the scheduled design. Various FSM encoding techniques are available for optimization in HLS.

      Memories

      HLS allows arrays to be mapped to different memory type resources such as block RAM, distributed RAM, registers, ROM, and FIFO. The necessary addresses, data, and control signals needed to access the memories are automatically created. Several pre-built FPGA and ASIC memory libraries are included in HLS for the supported technologies. In addition to these existing libraries, custom libraries can also be created using Memory Generator which allows users to read in a VHDL or Verilog model of a memory and generate vendor specific custom memory libraries.

    2. HLS Building Blocks & IPs - Right

      Channels & Bus Interfaces

      HLS uses channels to model data streaming interfaces. A channel is templatized to accept any data type. Channels are used both as I/O interfaces and between blocks for exchanging data. Communication protocols for channels can range from a simple FIFO to complicated bus-protocol-driven masters and slaves. When a bus protocol is applied, it enables bus interface behavior in the channel, allowing streaming of data, control signals, and synchronization within the system. Various bus protocols are available in HLS to facilitate easy design integration within a system.
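
      A templatized channel with simple FIFO behavior might look like the sketch below. The Channel class, its method names, and the Pixel payload are hypothetical stand-ins written for this example; they are not the channel API of any HLS library.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a streaming channel, templatized on the payload type.
template <typename T>
class Channel {
public:
    explicit Channel(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }
    bool can_write() const { return fifo_.size() < capacity_; }
    bool can_read() const { return !fifo_.empty(); }

    // Non-blocking write: returns false and stores nothing when full.
    bool try_write(const T& value) {
        if (!can_write()) return false;
        fifo_.push_back(value);
        return true;
    }
    // Non-blocking read: returns false and leaves `out` untouched when empty.
    bool try_read(T& out) {
        if (!can_read()) return false;
        out = fifo_.front();
        fifo_.pop_front();
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<T> fifo_;
};

struct Pixel { unsigned char r, g, b; };  // any copyable type can be a payload

// A block sitting between two channels: reads one pixel and writes a
// rough luma value. Returns false (stalls) if it cannot make progress.
bool luma_step(Channel<Pixel>& in, Channel<unsigned>& out) {
    if (!in.can_read() || !out.can_write()) return false;
    Pixel p{};
    in.try_read(p);
    out.try_write((p.r + 2u * p.g + p.b) / 4u);
    return true;
}
```

      The same block code works whether the channel is an I/O interface or an internal link; only the channel's capacity and the protocol behind it change.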

      IP

      IP cores in HLS are reusable functional blocks of code designed in a high-level language. They are licensed and sold to other vendors to be used as standalone designs or as a block within their own design. Because they are reused across technologies and libraries, ASIC or FPGA, they are designed to be as efficient as possible in terms of power, performance, and area. The IPs are usually verified behaviorally and formally, with both functional and code coverage.

      The IP cores can be something as simple as a normalization function or something as complicated as an entire image processing algorithm. Designers can leverage well-designed, verified IP cores which, together with the ease of designing with HLS, can reduce time to market. This makes IP cores an important aspect of HLS designs.

      Open-Source Foundation libraries for HLS

      The open-source foundational libraries for HLS are implemented in standard C++ for bit-accurate hardware and software design. The goal of these libraries is to create an open community for exchange of knowledge and IP for HLS (High-Level Synthesis) that can be used to accelerate both research and design. The libraries are targeted to enable a faster path to hardware acceleration by providing easy-to-understand, high-quality fundamental building blocks that can be synthesized into both FPGA and ASIC. These libraries are delivered as an open-source project on GitHub under the Apache 2.0 license, and contributions are welcome.

  3. Languages

    1. Languages - Left

      C++ Modeling

      Many HLS designs can be efficiently modeled using untimed C++, which features a simple coding style and very fast simulation performance. With this modeling approach, hardware blocks are modeled as C++ classes, and design hierarchy and block interconnect can be modeled using function calls between classes. Fixed-point and floating-point datatypes can be used to accurately model aspects such as limited numerical precision and saturation and rounding. C++ libraries for HLS enable blocks to model math functions, DSP functions, and more.
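
      The class-based modeling style described above can be sketched as follows. The block names (Scaler, Accumulator, Top) and the run() method are hypothetical, and plain int32_t stands in for the bit-accurate types a real design would likely use.

```cpp
#include <cassert>
#include <cstdint>

// Leaf block: stateless multiply by a fixed gain.
class Scaler {
public:
    explicit Scaler(int32_t gain) : gain_(gain) {}
    int32_t run(int32_t x) const { return x * gain_; }

private:
    int32_t gain_;
};

// Leaf block with state: the member variable persists between calls,
// much like a register holds its value between clock cycles.
class Accumulator {
public:
    int32_t run(int32_t x) {
        acc_ += x;
        return acc_;
    }

private:
    int32_t acc_ = 0;
};

// Top-level block: sub-blocks are members (the design hierarchy) and are
// connected through function calls; the local variable `scaled` plays the
// role of the wire between them.
class Top {
public:
    explicit Top(int32_t gain) : scaler_(gain) {}
    int32_t run(int32_t x) {
        const int32_t scaled = scaler_.run(x);
        return accum_.run(scaled);
    }

private:
    Scaler scaler_;
    Accumulator accum_;
};
```

      Calling run() on a Top built with a gain of 3 for the inputs 1, 2, and -3 returns 3, 9, and 0, which a testbench can check directly because the model is untimed.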

    2. Languages - Right

      SystemC Modeling

      Using SystemC to model HLS designs enables a wide range of hardware design types to be modeled and efficiently verified before HLS. SystemC enables hardware aspects to be modeled such as modules and module hierarchy, ports, signals, concurrent processes, reset signals, and time. All major HDL simulators support simulation of mixed testbench and design hierarchies that may contain SystemC, SystemVerilog, and VHDL modules. Within SystemC HLS models, libraries such as MatchLib can be used to enable transaction interfaces to be modeled and to provide building block IP models. SystemC HLS models can use fixed-point and floating-point datatypes to accurately model aspects such as limited numerical precision and saturation and rounding. C++ libraries for HLS enable blocks to model math functions, DSP functions, and more.

  4. Verification

    1. Verification - Left

      High-Level Verification (HLV)

      HLV is the application of known and trusted verification techniques to HLS design source. HLV enables verification to start sooner, at the C++ and SystemC HLS level of abstraction, without waiting for RTL. Because it operates at a higher level of abstraction, efficiencies are gained: simulations can run 30x, 100x, or even up to 500x faster than comparable RTL simulations. A guiding principle of HLV is that it uses methods that Design Verification (DV) teams are already familiar with and trust. For dynamic verification, examples include metrics-driven code and functional coverage closure and test plan integration. On the static and formal side, this means applying both static lint checks and deep formal property checks to the HLS design source. All of this takes place prior to running HLS to produce RTL. While each team or product will likely have its own verification requirements, HLV must provide rigorous and thorough techniques, up to and including ASIC-quality signoff on HLS design source. Another way to think of HLV is that its goal is to ensure there are no bug escapes into RTL.
      With proper planning and forethought, HLV also facilitates re-use when it comes time to perform sign-off on the post-HLS RTL.
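
      A source-level testbench with functional coverage can be sketched in a few lines. The design under test (an 8-bit saturating adder), the coverage bins, and all names are hypothetical; commercial HLV flows provide their own coverage instrumentation, which this example does not attempt to reproduce.

```cpp
#include <cassert>
#include <cstdint>

// Design under test (hypothetical): 8-bit signed saturating adder.
int8_t sat_add8(int8_t a, int8_t b) {
    const int wide = a + b;  // operands promote to int, so this cannot overflow
    if (wide > INT8_MAX) return INT8_MAX;
    if (wide < INT8_MIN) return INT8_MIN;
    return static_cast<int8_t>(wide);
}

// Functional coverage bins for the behaviors a test plan might call out.
struct Coverage {
    unsigned pos_sat = 0;   // result clipped at +127
    unsigned neg_sat = 0;   // result clipped at -128
    unsigned in_range = 0;  // no clipping
    bool closed() const { return pos_sat && neg_sat && in_range; }
};

// Applies exhaustive stimulus, checks the DUT against an independent
// reference, and records which bins were hit. Returns the mismatch count.
unsigned run_testbench(Coverage& cov) {
    unsigned mismatches = 0;
    for (int a = INT8_MIN; a <= INT8_MAX; ++a) {
        for (int b = INT8_MIN; b <= INT8_MAX; ++b) {
            const int exact = a + b;  // reference model in plain int
            const int expected = exact > 127 ? 127 : (exact < -128 ? -128 : exact);
            if (exact > 127) ++cov.pos_sat;
            else if (exact < -128) ++cov.neg_sat;
            else ++cov.in_range;
            const int got = sat_add8(static_cast<int8_t>(a), static_cast<int8_t>(b));
            if (got != expected) ++mismatches;
        }
    }
    return mismatches;
}
```

      Because the stimulus and checker are ordinary C++, the same vectors and expected results can be replayed against the post-HLS RTL, which is one form of the re-use mentioned above.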

    2. Verification - Right

      High-Level Static and Formal Verification

      Sequential Formal Verification techniques can be applied to C-source written for HLS. A C-Level verification suite includes simulation, coverage metrics, and formal analysis very similar to direct RTL design flows. Assertions, covers and reachability analysis from C-based formal analysis all combine to give the designer and verifier feedback to close coverage and functionality efficiently.

  5. Methodologies

    1. Methodologies - Left

      HLS FPGA

      For an optimal implementation of algorithmic designs in an FPGA, it is important to understand the FPGA architecture and how it functions. Most FPGAs contain specialized resources such as RAMs, DSPs, and shift registers. Every vendor offers these blocks in multiple size configurations and with differing features, such as simple dual-port or true dual-port RAMs, synchronous versus asynchronous modes, floating-point versus fixed-point arithmetic operations, and various stages of pipeline registers. Therefore, it is critical to choose an FPGA device whose features are best suited to your design.

      Processor Accelerators

      HLS enables developers to create domain specific accelerators to deliver higher performance and greater efficiency. Designers can no longer rely on silicon scaling to deliver these improvements. By offloading computationally complex algorithms from the processor to hardware accelerators, performance, cost, power, and area can all be improved. Algorithms running on general purpose processors can be migrated to either co-processors or bus-based accelerators. This can enable the design to use a smaller, more efficient processor and can increase parallelism. HLS makes the transition fast and easy, and minimizes execution and schedule risks.

    2. Methodologies - Right

      MathWorks

      MATLAB and Simulink are integrated in the HLS flow to enable a seamless algorithm-to-RTL design process. The process begins with algorithm development in MATLAB/Simulink. The floating-point MATLAB or Simulink model is translated to C++ and instantiated in the original MATLAB or Simulink testbench for verification and validation purposes. The C++ model is further refined for High-Level Synthesis and continuously validated in the MATLAB or Simulink testbench. The High-Level Verification framework provides HLS model instrumentation for coverage analysis that can be run inside the MATLAB testbench. The coverage information can be used to improve the DUT quality and the stimulus generator, which can be used later for RTL verification. Together, MATLAB/Simulink, High-Level Verification and Synthesis, and the traditional RTL verification flow enable seamless implementation and continuous verification from MATLAB/Simulink to RTL.

  6. Applications

    1. Applications - Left

      AI/ML Accelerators & Design

      Machine learning algorithms are ideal for implementation in hardware using HLS. They are computationally complex and highly parallel, and they often perform poorly on general purpose CPUs. HLS enables designers to explore a wide range of architectures to achieve an optimal implementation. Bespoke AI accelerators can outperform general purpose accelerators, such as GPUs or TPUs, by an order of magnitude or more.
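
      The kind of kernel that dominates many ML workloads is a multiply-accumulate loop, sketched here for one neuron with 8-bit operands. The function name, the input count kLen, and the choice of 8-bit quantization with a ReLU output are assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLen = 16;  // hypothetical number of inputs per neuron

// One output of a fully connected layer with 8-bit weights and activations.
// The kLen multiplies are independent of each other, which is the
// parallelism a hardware implementation can exploit. The 32-bit accumulator
// has ample headroom for 16 products of magnitude at most 2^14.
int32_t neuron_int8(const std::array<int8_t, kLen>& act,
                    const std::array<int8_t, kLen>& weight,
                    int32_t bias) {
    int32_t acc = bias;
    for (std::size_t i = 0; i < kLen; ++i) {
        acc += static_cast<int32_t>(act[i]) * static_cast<int32_t>(weight[i]);
    }
    return acc > 0 ? acc : 0;  // ReLU activation
}
```

      Architectural exploration then amounts to choosing how many of these multiplies run in parallel, how wide the operands and accumulator need to be, and how weights are fetched from memory.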

    2. Applications - Right

      Image/Video Processors & Design

      The future is now! Deep Fakes, Self-Driving Cars, Virtual Reality. What is this based on? What is next? Image and video processing is both an old and new industry. Modern technologies allow for AI to recognize objects and generate them. All these modern solutions start with the fundamentals of signal processing for digital images. How is an image recognized? How is an image processed? This topic will cover the basics of image and video processing, how High-Level Synthesis can be used for these basics, and more.