MathWorks
MATLAB and Simulink are integrated into the HLS flow to enable a seamless algorithm-to-RTL design process. The process begins with algorithm development in MATLAB/Simulink. The floating-point MATLAB or Simulink model is translated to C++ and instantiated in the original MATLAB or Simulink testbench for verification and validation. The C++ model is further refined for High-Level Synthesis and continuously validated in the MATLAB or Simulink testbench. The High-Level Verification framework provides HLS model instrumentation for coverage analysis that can be run inside the MATLAB testbench. The coverage information can be used to improve the quality of both the DUT and the stimulus generator, which can be reused later for RTL verification. Together, MATLAB/Simulink, High-Level Verification and Synthesis, and the traditional RTL verification flow enable seamless implementation and continuous verification from MATLAB/Simulink to RTL.

High-Level Synthesis Flow from MATLAB/Simulink Model
Introduction
When MATLAB and Simulink became the de facto standard in algorithm development, many research teams started to look at possibilities of generating RTL directly from MATLAB code, especially for FPGA design. Many of those solutions were targeted to certain applications or technologies. The approach is promising, but there is a huge abstraction-level chasm between the expressive, dynamic MATLAB or Simulink model and static, bit-level Verilog or VHDL. These generators require a modeling style that limits the freedom of algorithm developers to use the full power of the MATLAB language or Simulink models.
High-Level Synthesis opened new possibilities to model and develop HW at a much higher abstraction level using C++ or SystemC as the modeling language. The MATLAB-HLS-RTL flow introduces a new intermediate language, but it allows each language to be used at its optimal abstraction level. This reduces the verification complexity and bridges the abstraction-level chasm between MATLAB and RTL. Algorithm developers can use MATLAB as they are used to, which maintains their productivity. The abstraction-level difference between MATLAB and C++ is small, and HLS enables architecture exploration without changing the source code. HLS-to-RTL synthesis is a reliable technology today.
Model translation from MATLAB to C++ or SystemC has been available for a long time. However, that code generation has been targeted and optimized for simulation and embedded software, which is not necessarily optimal for HLS. Editing the generated C++ code manually breaks the validation flow. HLS tools require different coding styles, which ties a fully automated flow to a specific HLS tool. Furthermore, translating dynamic MATLAB features and operator-based complex operations to HLS-optimized C++ requires a certain level of synthesis. Therefore, manual translation of the MATLAB or Simulink model to HLS C++ code is still one of the best methods.
Limitations of the existing design processes
There are several problems to be solved before an abstract MATLAB algorithm model is converted to an efficient HW implementation. The main issue is the structural difference between abstract, sequential simulation code with operator-based vector and matrix operations in MATLAB and a HW architecture with parallel processing and distributed memories. A widely used method is to write the MATLAB code at a very low level, almost RTL, which makes the translation easy but slows down the simulation dramatically and limits the freedom of the algorithm developer. Replacing the MATLAB vector and matrix operations with C++ linear algebra library functions is not possible, because their coding style is optimized for software execution and is not synthesizable with HLS tools.
Another problem is the required block-level architecture in HW. MATLAB code may have a function hierarchy or a System Object based class hierarchy, but it doesn’t define the parallelism or the block hierarchy of the HW. Most MATLAB designs use multiple toolbox functions or complex operations like matrix multiplication that may have to be handled as individual modules in the HW. Furthermore, the MATLAB language doesn’t have a communication channel like ac_channel, which is needed to define the communication between two concurrent clocked modules in HW. In Simulink the design can be partitioned graphically using subsystems that communicate with each other through channels. This model hierarchy can be translated directly into a HW architecture.
Reusability of the translated models and tool dependency are also significant problems. If the models are just translated and not optimized for parameterization nor templatized for feature configurability, reusing the models from previous projects may be difficult. An earlier version of the methodology introduced in this paper had a reusability problem too. Automatically generated models are typically not reusable.
From MATLAB/Simulink to HLS Design Methodology
MATLAB and Simulink models can describe any algorithm, and they can be built in very different ways. A MATLAB model can be anything between a single MATLAB script and a hierarchical object-oriented design. Simulink models can be constructed by using only library primitives, which is not a very efficient methodology for large designs, or by using custom MATLAB functions or System Objects and parameterizable subsystems. That makes automated design translation very difficult and requires different methods in the manual translation process too.
The MATLAB/Simulink-to-HLS design methodology has multiple phases and bidirectional communication. The translated models, whether produced manually or automatically, must be validated against the original higher-abstraction-level model to ensure that the functionality is correctly translated. A proper functional validation requires a test coverage close to 100%. Otherwise, some features or corner cases may not be tested. Figure 1 shows the multiple steps and validation paths of the design process. This design methodology covers both MATLAB and Simulink designs, as well as MATLAB functions or System Objects in a Simulink block diagram.
The process begins with a manual or semi-automatic model translation. The five translation options introduced in this paper describe the handling of the different model types. Using an HLS model library makes the translation of matrix and vector operations easier. Later, the validation steps are needed to ensure the correctness of the translated model and the algorithmic performance of the bit-accurate implementation.
Usually, to reach the optimal HW architecture, the design must be optimized for HLS, which requires structural changes to the code, like repartitioning function call trees, adding data buffers and sometimes even recoding some parts of the algorithm. Therefore, a coverage analysis is required to ensure that the code modifications do not break the algorithm in any use case. Once the model is thoroughly validated and verified, it can be synthesized to RTL and verified using the standard RTL verification methodologies like UVM.
Several translation projects have shown that MATLAB and Simulink models should be maintained with double-precision numbers, which is the MATLAB default data type. Using a fixed-point representation increases the number of lines of code and adds fixed-point specific functions, making the model more difficult to translate and especially difficult to validate. In a double-precision model, a deviation of about 10^{-15} between the MATLAB and C++ models is normal background noise caused by differences in the order of operations and the floating-point mantissa size. A significant difference is in the range of 10^{-3} to 10^{-6}, depending on the design and the fixed-point word length used. Yet, the acceptable difference between the double-precision and fixed-point implementations depends on the algorithmic performance requirements of the overall system.
Analyzing block-level architecture
First, the MATLAB design must be analyzed for computational resource needs and data transfers between the functions. Based on this information, the HW architecture can be sketched.
The computational resource needs of a code segment or function can be analyzed by using MATLAB’s profiler, manual estimation, or simply by measuring the simulation time of the part of the algorithm in relation to the total simulation time. This estimate is not accurate, but it gives an indication of which parts of the simulation model may need increased parallelism, and possibly multiple parallel instances of a certain block.
Data transfer rate is an even more important metric. In the HW the data must be stored in a memory or streamed through the signal path. If a module interface carries a large data set, like a high-resolution image, a ping-pong memory interface requires a RAM that is twice as big as the data set itself. Optimizing the data sizes by defining the module structure in the MATLAB model saves time and effort later in the translation.
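The memory cost of a ping-pong interface can be illustrated with a small sketch. The `PingPong` class below is a hypothetical plain C++ stand-in, not an HLS library component: it holds two banks of N elements, so its storage is 2N elements, and producer and consumer swap banks once per frame.

```cpp
#include <array>
#include <cstddef>

// Hypothetical ping-pong buffer: two banks of N elements each.
// The producer fills one bank while the consumer reads the other.
template <typename T, std::size_t N>
class PingPong {
public:
    // Bank currently owned by the producer.
    std::array<T, N>& write_bank() { return banks_[sel_]; }

    // Bank currently owned by the consumer (the previously written frame).
    const std::array<T, N>& read_bank() const { return banks_[1 - sel_]; }

    // Called once per frame, after the producer has filled its bank.
    void swap() { sel_ = 1 - sel_; }

    // Total storage in elements: twice the frame size.
    static constexpr std::size_t storage_elements() { return 2 * N; }

private:
    std::array<std::array<T, N>, 2> banks_{};
    std::size_t sel_ = 0;
};
```

For a frame of N elements the buffer always allocates 2N, which is why reducing the frame size at the module boundary pays off directly in RAM.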
Model analysis in Simulink is easier, because the design is already modular, and Simulink has several helpful features for analyzing the design. In many cases the HW block-level architecture can be extracted from the Simulink subsystem hierarchy, as demonstrated in Figure 2. In a deep hierarchy all workspace parameters that should be configurable in the HW implementation must be brought up to the top-level interface.
Translating MATLAB scalar code to HLS C++
Translation of a MATLAB scalar model is mainly a syntax conversion from MATLAB to C++. Some details must be added manually, such as variable declarations, which MATLAB needs only in special cases, and the declaration and initialization of static variables. Loop syntax and array indexing are typical sources of errors: in MATLAB array indexing begins with 1 and in C++ with 0. These types of coding errors can be found with an HLS linting tool.
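The indexing pitfall can be seen in a small sketch. The function name `dot_reversed` and the MATLAB fragment in the comment are hypothetical examples, not taken from the paper's figures.

```cpp
#include <cstddef>

// MATLAB original (1-based indexing, inclusive upper bound), for comparison:
//   acc = 0;
//   for k = 1:N
//       acc = acc + x(k) * h(N - k + 1);
//   end
//
// C++ translation (0-based indexing, exclusive upper bound).
// A MATLAB index i always maps to the C++ index i - 1.
template <std::size_t N>
double dot_reversed(const double (&x)[N], const double (&h)[N]) {
    double acc = 0.0;  // must be declared explicitly in C++
    for (std::size_t k = 0; k < N; ++k) {
        acc += x[k] * h[N - 1 - k];
    }
    return acc;
}
```

Forgetting the `- 1` in `h[N - 1 - k]` reads one element past the array, which is exactly the kind of error a linting tool flags.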
Data type definitions should be made in a separate type definition file. This enables using a floating-point type such as ac_ieee_float64 in the conversion phase. Alternatively, a very long fixed-point data type with 64 or 128 bits can be used. When the functionality is validated, the C++ model can be quantized and the data types changed to the final fixed-point types.
Arbitrary-length fixed-point data types are needed in HLS. Two fixed-point data types are available as open source under the Apache 2.0 license: the Algorithmic C fixed-point library (ac_fixed.h) and the SystemC fixed-point data types. SystemC data types are slower to simulate and clumsier to use than ac_fixed types, but functionally both data types are almost identical.
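A separate type definition file might look like the following sketch. The file name, the `USE_FIXED_POINT` switch, and the word lengths are hypothetical; plain `double` stands in for the floating-point conversion type so that the default build needs only the standard library.

```cpp
// hls_types.h (hypothetical file name)
#ifndef HLS_TYPES_H
#define HLS_TYPES_H

#ifdef USE_FIXED_POINT
// Final quantized types. The word lengths are placeholders that would
// come from the quantization step, not values from the paper.
#include <ac_fixed.h>
typedef ac_fixed<16, 2, true> in_t;    // 16 bits total, 2 integer bits, signed
typedef ac_fixed<40, 8, true> accu_t;
typedef ac_fixed<16, 2, true> out_t;
#else
// Conversion phase: floating point keeps the focus on functionality.
typedef double in_t;
typedef double accu_t;
typedef double out_t;
#endif

#endif  // HLS_TYPES_H
```

Because the model code refers only to `in_t`, `accu_t`, and `out_t`, switching from the validation build to the quantized build is a compile-time option rather than a source edit.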
In the class-based example shown in Figure 3, the data types in_t, accu_t and out_t are defined in the class template, which makes the model more configurable and independent of the global type names. Furthermore, the shift register vector SReg is declared as a private data member of the class.
The C++ function interface uses the ac_channel type, which allows the model to be used as a concurrent clocked process with a register or FIFO interface in HW. Using ac_channel requires a data availability test in the C++ code to avoid simulation problems. In HW, the if-statement and the read operation are replaced by the corresponding handshake signaling. The local accumulator variables, which are only initialized in the MATLAB model, must also be declared in the C++ code.
The functional code inside the for-loop needs only a syntax conversion from MATLAB to C++. The output needs a write operation in C++ if the ac_channel interface is used. With a standard C++ function interface, assignment to the output variable is enough.
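The structure described above can be sketched as follows. This is not the code of Figure 3: `Channel` is a minimal standard-library stand-in that mimics the `available`/`read`/`write` calls of ac_channel, and the `Fir` class, its `TAPS` parameter, and the use of `in_t` for the coefficients are assumptions made for illustration.

```cpp
#include <cstddef>
#include <deque>

// Stand-in for ac_channel, so the sketch compiles without the AC headers.
template <typename T>
class Channel {
public:
    bool available(std::size_t n) const { return fifo_.size() >= n; }
    T read() { T v = fifo_.front(); fifo_.pop_front(); return v; }
    void write(const T& v) { fifo_.push_back(v); }
private:
    std::deque<T> fifo_;
};

// Data types are template parameters, independent of global type names.
template <typename in_t, typename accu_t, typename out_t, std::size_t TAPS>
class Fir {
public:
    explicit Fir(const in_t (&coeffs)[TAPS]) {
        for (std::size_t i = 0; i < TAPS; ++i) {
            coeffs_[i] = coeffs[i];
            SReg[i] = in_t(0);
        }
    }

    void run(Channel<in_t>& in, Channel<out_t>& out) {
        if (in.available(1)) {  // data availability test
            // Shift the register and read the new input sample.
            for (std::size_t i = TAPS - 1; i > 0; --i) SReg[i] = SReg[i - 1];
            SReg[0] = in.read();

            accu_t acc = 0;  // local accumulator, declared explicitly
            for (std::size_t i = 0; i < TAPS; ++i) {
                acc += accu_t(SReg[i]) * accu_t(coeffs_[i]);
            }
            out.write(out_t(acc));  // write operation on the output channel
        }
    }

private:
    in_t SReg[TAPS];     // shift register as a private data member
    in_t coeffs_[TAPS];
};
```

Calling `run` on an empty input channel does nothing, which mirrors a block waiting on its handshake in HW.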
Translating MATLAB matrix or vector model to HLS C++
Operator-based matrix and vector operations are more complicated. In MATLAB the statement A = X * Y’ is just one line of code, but the matrix multiplier implementation in C++ is a large function. Many helpful function libraries can be found at hlslibs.org [12], e.g., ac_math, ac_dsp, and a matrix library that has C++ function equivalents for most MATLAB operator-based matrix and vector operations.
In MATLAB the user can write multiple matrix and vector operations in a single statement. Because the C++ implementations are functions, there can be only one function call in each statement, so longer statements must be split into multiple single-operation statements. As a useful side effect, MATLAB reports the sizes of the intermediate array variables, which can be used to declare the array sizes in C++. Translation of a simple MATLAB matrix operation is shown in Figure 4.
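As a rough illustration of the splitting step, the sketch below translates the hypothetical MATLAB statement `B = X * Y' + C` into two single-operation calls. The function names `mat_mult_transposed`, `mat_add`, and `xyt_plus_c` are invented for this example and are not the hlslibs API; an HLS matrix library would supply its own equivalents.

```cpp
#include <array>
#include <cstddef>

template <typename T, std::size_t R, std::size_t C>
using Matrix = std::array<std::array<T, C>, R>;

// A = X * Y' with X of size MxK, Y of size NxK, and A of size MxN.
template <typename T, std::size_t M, std::size_t K, std::size_t N>
void mat_mult_transposed(const Matrix<T, M, K>& X, const Matrix<T, N, K>& Y,
                         Matrix<T, M, N>& A) {
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            T acc = T(0);
            for (std::size_t k = 0; k < K; ++k) acc += X[i][k] * Y[j][k];
            A[i][j] = acc;
        }
    }
}

// S = P + Q, element by element.
template <typename T, std::size_t R, std::size_t C>
void mat_add(const Matrix<T, R, C>& P, const Matrix<T, R, C>& Q,
             Matrix<T, R, C>& S) {
    for (std::size_t i = 0; i < R; ++i)
        for (std::size_t j = 0; j < C; ++j) S[i][j] = P[i][j] + Q[i][j];
}

// MATLAB: B = X * Y' + C;   with X: 2x3, Y: 2x3, C: 2x2
inline Matrix<double, 2, 2> xyt_plus_c(const Matrix<double, 2, 3>& X,
                                       const Matrix<double, 2, 3>& Y,
                                       const Matrix<double, 2, 2>& C) {
    Matrix<double, 2, 2> T1{};  // intermediate result, size taken from MATLAB
    Matrix<double, 2, 2> B{};
    mat_mult_transposed(X, Y, T1);  // first operation
    mat_add(T1, C, B);              // second operation
    return B;
}
```

The explicit intermediate `T1` is where the array size reported by MATLAB is needed.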
Translating Simulink design hierarchy to HLS C++
The properly assembled Simulink model has a useful design hierarchy that can be transferred into the HLS model as is. Most HLS tools do not accept any functional operations in the block hierarchy level, so the design must be “cleaned up”. The hierarchy should be kept as simple as possible to enable exploration of different HW implementations.
The module architecture can be implemented as a C++ function or class hierarchy, or as an SC_MODULE hierarchy in SystemC. The hierarchical blocks may contain both functional blocks and further hierarchical blocks. Figure 5 demonstrates how the block hierarchy of the DUT can be extracted from a Simulink model that has different block implementations. The hierarchical subsystem becomes a hierarchical block in the HLS model, and the functional subsystems are translated to functional blocks.
Translation of a functional model depends on the model implementation in Simulink. If the Simulink model uses MATLAB function blocks or System Objects, MATLAB model translation can be used. Library component based schematic implementations in Simulink need a different approach.
Translating Simulink leaf-level block diagram to HLS C++
The leaf-level block diagram using primitive library elements, also called a Simulink schematic, is more difficult to translate. Figure 6 demonstrates a typical Simulink schematic implementation.
This particular example is not difficult to translate to C++. The model has five multiplication operations, four additions, and two registers. It can be written as a for-loop, which simplifies the structure, or it can be flattened. The translation begins from the second register. The outputs of both multipliers connected to it are stored in local variables. Once these multiply-accumulate operations are computed, the second register gets the value of the first register.
Now the multiplications connected to the first register can be calculated and summed with the results of the previous stage. The result of the input multiplication can be added to the sum of the feedback path and assigned to the first register. The output is the sum of the first register input and the sum of the feed-forward path. An example implementation with a for-loop is shown in Figure 7.
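The translation order described above can be sketched as a for-loop that walks from the last register to the first. This is an illustrative reconstruction, not the code of Figure 7: the class name `IirDf2`, the coefficient names, and the assumption that the feedback coefficients already carry their sign (so the feedback sum is added, as in the text) are all hypothetical.

```cpp
#include <cstddef>

// Direct-form II style filter with ORDER registers. For ORDER = 2 it has
// five multiplications, four additions between products, and two registers.
template <typename data_t, std::size_t ORDER>
class IirDf2 {
public:
    IirDf2(data_t gain, const data_t (&a)[ORDER], const data_t (&b)[ORDER])
        : gain_(gain) {
        for (std::size_t i = 0; i < ORDER; ++i) {
            a_[i] = a[i];       // feedback coefficients (sign included)
            b_[i] = b[i];       // feed-forward coefficients
            reg_[i] = data_t(0);
        }
    }

    data_t step(data_t x) {
        data_t fb = data_t(0);  // sum of the feedback path
        data_t ff = data_t(0);  // sum of the feed-forward path
        // Begin from the last register; once its taps are consumed,
        // it takes over the value of the register before it.
        for (std::size_t i = ORDER; i-- > 0;) {
            fb += a_[i] * reg_[i];
            ff += b_[i] * reg_[i];
            if (i > 0) reg_[i] = reg_[i - 1];
        }
        const data_t w = gain_ * x + fb;  // input of the first register
        reg_[0] = w;
        return w + ff;                    // output
    }

private:
    data_t gain_;
    data_t a_[ORDER];
    data_t b_[ORDER];
    data_t reg_[ORDER];
};
```

Processing the registers from last to first means each register is read before it is overwritten, so no extra temporary copy of the state is needed.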
Simulink library blocks can be more complicated to translate. They can be filters, FFTs, matrix operations, memories, or other complex mathematical functions. Open-source HLS function libraries can be used to implement Simulink library blocks in the same way as toolbox functions in MATLAB.
Validation and verification flow
Multi-language design flows must have a seamless verification and validation process to ensure design integrity throughout the process. Using C++ as an intermediate language in the design flow increases the number of verification steps, but it provides a comprehensive set of tools that make the verification easier.
This design flow has three validation and verification steps as shown in Figure 1:
 Functional validation of the handwritten C++ code against the original MATLAB model in the MATLAB testbench.
 Validation of the algorithmic performance of the quantized C++ model in the MATLAB testbench.
 Coverage analysis of the C++ model in the MATLAB testbench.
Verification of the synthesized RTL code against the C++ model is a standard part of the Catapult framework.
Validation of the handwritten C++ model against the MATLAB model is the most critical phase. It is implemented by using the MATLAB mex external C API. In some HLS tool frameworks, the C++ model is automatically wrapped into a mex wrapper and compiled to a shared object, and an S-function block for Simulink is generated. The shared object is instantiated manually in the MATLAB or Simulink testbench, in parallel with the original MATLAB reference model. The functional difference of the C++ model output should be in the range of ±10^{-15} compared to the output of the MATLAB reference model.
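The comparison against the reference can be reduced to a simple scoreboard check, sketched below in plain C++. The function names are hypothetical, and the absolute tolerance assumes signals of roughly unit magnitude; for larger signal ranges a relative tolerance may be more appropriate.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute difference between DUT and reference outputs,
// computed over the samples both vectors have in common.
inline double max_abs_diff(const std::vector<double>& dut,
                           const std::vector<double>& ref) {
    const std::size_t n = std::min(dut.size(), ref.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        worst = std::max(worst, std::fabs(dut[i] - ref[i]));
    }
    return worst;
}

// Pass/fail check: same number of samples and every sample within tol.
inline bool within_tolerance(const std::vector<double>& dut,
                             const std::vector<double>& ref,
                             double tol = 1e-15) {
    return dut.size() == ref.size() && max_abs_diff(dut, ref) <= tol;
}
```

The same check can serve the fixed-point performance analysis by passing a looser tolerance derived from the system requirements.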
The same testbench can be used for the performance analysis of the fixed-point HLS model with minor changes to the scoreboard. For a MATLAB testbench no modifications are needed, but in Simulink the input and output data types must be matched to the corresponding data types in the HLS model. Figure 8 shows the instantiation of the HLS DUT in the Simulink testbench. Some additional analysis components are needed to measure the difference between the Simulink and HLS model outputs.
Coverage analysis of the C++ model provides information about the test quality. C-level code coverage tools instrument the DUT with probes that collect line, decision, and expression coverage information during the simulation and store it in a coverage database. The instrumented C++ model can be wrapped into a mex wrapper, instantiated in the MATLAB testbench, and simulated in MATLAB. The coverage information gives feedback that helps the test engineer improve the MATLAB test setup to cover all corner cases. The same test scenarios can be reused later as stimulus in a UVM environment.
C-level verification reduces the verification time dramatically, with up to 1000 times faster execution. This allows running more functional tests, resulting in better design quality. A comprehensive study by the University of Oulu reports that their C-level simulations ran 191 times faster than equivalent RTL simulations in a UVM environment.
Verification of the generated RTL code is a standard procedure in most HLS tools. The most common verification method is RTL co-simulation, which simulates both the generated RTL code and the HLS C++ code in parallel and compares their outputs. If the results are equal, the synthesis was correct, and the hardware works as specified. In addition to RTL co-simulation, there are formal methods available. The same RTL co-simulation setup can be reused in a UVM environment.
Quantization of translated HLS
Validation of the translated model is done with floating-point or very wide fixed-point data types to keep the focus of the validation on the functionality, without introducing quantization effects. When the model functionality is completely validated against the original MATLAB or Simulink model, it can be optimized for HLS. The optimization has two major parts: structural optimization, covering module and loop optimizations, and fixed-point optimization, also known as quantization.
During the quantization process, all variables in the HLS model are analyzed and declared as optimal-length fixed-point or integer data types, which are synthesized to RTL bit vectors of the same length as the C++ variables. The quantization process has been the subject of many studies, and there are very comprehensive Signal-to-Noise-Ratio (SNR) based methodologies that analyze the effects of quantization noise in the system context, but usually the effort is far too high compared to the benefit. A simpler methodology may require more simulation cycles, but the results are comparable to the SNR-based quantization results.
Value Range Analysis (VRA) based quantization is a simple methodology for finding an optimal fixed-point data type for each variable in the design. VRA is a simulation-based methodology that analyzes how the limited accuracy of the input values propagates through the step sizes of the internal variables and recommends word lengths that can represent the minimum step size. In this methodology, the input signals must be “pseudo-quantized”, i.e., limited to a finite set of possible values. VRA requires an instrumented data type, e.g., ac_fixed<>, that collects and stores the maximum and minimum values, signedness, minimum nonzero absolute value, and minimum nonzero absolute difference between two consecutive samples during the simulation. These values are used to calculate the required number of integer and fractional bits:
int_bits = ceil(log2(maxval)) + signed;
frac_bits1 = -floor(log2(minval));
frac_bits2 = -floor(log2(mindiff));
The number of integer bits includes the sign bit, and it can be used as it is. The minimum absolute value is not necessarily the smallest value the variable must be able to represent. For example, a sine wave has a relatively large step size around zero, but a small step size around the peak. To get a better estimate of the required number of fractional bits, both the minimum absolute difference and the minimum nonzero absolute value are needed, and the bigger of the two fractional bit numbers should be used. A type declaration of ac_fixed using these numbers looks like:
typedef ac_fixed<(int_bits+frac_bits2), int_bits, signed> fixed_t;
With SystemC fixed-point data types, the signedness determines whether the sc_fixed<> or the sc_ufixed<> type must be used. Otherwise, the type definition is the same.
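The bookkeeping behind these formulas can be sketched in plain C++. The `RangeTracker` struct below is a hypothetical stand-in for an instrumented data type, not the instrumentation built into any HLS tool. It assumes that the fractional bit count is the negated exponent of the smallest step, since log2 of a value below 1 is negative, and it takes the bigger of the two fractional estimates as the text recommends.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Collects the statistics that VRA needs while samples stream through.
struct RangeTracker {
    double maxval = 0.0;  // largest absolute value seen
    double minval = std::numeric_limits<double>::infinity();   // smallest nonzero |v|
    double mindiff = std::numeric_limits<double>::infinity();  // smallest nonzero step
    bool is_signed = false;
    bool has_prev = false;
    double prev = 0.0;

    void observe(double v) {
        const double a = std::fabs(v);
        if (v < 0.0) is_signed = true;
        maxval = std::max(maxval, a);
        if (a > 0.0) minval = std::min(minval, a);
        if (has_prev) {
            const double d = std::fabs(v - prev);
            if (d > 0.0) mindiff = std::min(mindiff, d);
        }
        prev = v;
        has_prev = true;
    }

    // int_bits = ceil(log2(maxval)) + signed
    int int_bits() const {
        const int sign = is_signed ? 1 : 0;
        if (maxval <= 0.0) return sign;  // only zeros observed
        return static_cast<int>(std::ceil(std::log2(maxval))) + sign;
    }

    // Bigger of the two fractional-bit estimates.
    int frac_bits() const {
        return std::max(frac_from(minval), frac_from(mindiff));
    }

private:
    static int frac_from(double step) {
        if (!std::isfinite(step)) return 0;  // nothing observed yet
        return static_cast<int>(-std::floor(std::log2(step)));
    }
};
```

The total word length of the recommended type is then `int_bits() + frac_bits()`, with `int_bits()` as the integer width.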
When the Value Range Analysis feature is activated, it collects the variable values during the SystemC simulation and stores the results in a file that can be analyzed in a spreadsheet tool or in a Design Analyzer that supports the Value Range Analysis perspective. Figure 9 shows part of the initial simulation results for the IIR filter block. All internal variable types are defined as 64-bit fixed-point with 32 integer bits. Only the input and output variables are declared as 16-bit variables based on the specification.
This table contains all information collected during the simulation. In addition to the presented columns, there are columns displaying the measured minimum and maximum values and other relevant information that can be used for further analysis. The modified declaration column contains the recommended data types for each variable. For the arrays, the worst-case values are used.
The VRA results may need some fine-tuning based on the performance analysis simulation with MATLAB. As with all simulation-based analysis, the VRA results are only as good as the stimulus used for the analysis. The stimulus should cover the whole value range of the inputs and be long enough to generate enough value changes in each variable. Otherwise, the measured results may not be statistically significant enough to produce reliable fixed-point type recommendations.
Conclusions
The HLS-based MATLAB-to-RTL design process is a good alternative to direct synthesis. Using C++ or SystemC as an intermediate language between MATLAB and RTL moves the HW-specific design from the MATLAB level to the C++ level, giving algorithm developers freedom to use the full power of the MATLAB language. Model translation from MATLAB to C++ takes place at the same abstraction level, making the validation and debugging of the translated model easy. A strong validation methodology that reuses the MATLAB testbench, combined with coverage analysis, ensures the functional correctness of the HLS model.
HW-related modifications are made to the C++ model only. The C++ model is quantized and optimized for High-Level Synthesis without any modifications to the MATLAB model. The validation environment developed during the model translation phase provides a powerful regression environment that can be used to ensure the model correctness after every modification. HLS tools are usually well connected to RTL verification. Therefore, the verification in this methodology focuses on the MATLAB-C++ level.
Translating a MATLAB or Simulink model to an efficient RTL model is not fully automated yet. The manual conversion process is fast and easy when HLS-optimized library elements are available. Open-source library development enables translating more complex MATLAB models in a short time.
The design methodology described in this paper has been used in many different types of projects. It has proven to be a flexible, easy-to-deploy, and efficient methodology for all kinds of HW designs. Manual language translation requires some knowledge about the coding style requirements of the HLS tool being used, but by using class-based C++ with templates or SystemC, many of the coding rules can be integrated into the templates.