Data Types
Bit-accurate data types offer a reliable mechanism to model exact hardware bit-widths and arithmetic precision in High-Level Synthesis (HLS) workflows. These types range from simple signed and unsigned integer representations, familiar to RTL designers, to fixed point data types with support for rounding and saturation modes, and even fully IEEE-compliant floating-point types with user-defined exponent and mantissa widths. This level of control ensures bit-exact behavior, maintaining alignment between C++/SystemC numerical models and the final RTL generated by the HLS tool.
Some organizations use proprietary or custom-developed bit-accurate types for modeling. However, two widely available open-source options are AC types and SystemC types. Both support robust C++/SystemC design styles and offer comprehensive integer and fixed-point modeling capabilities. AC types have become especially popular due to their support for unlimited bit-length, consistent simulation semantics, and faster execution performance in compiled environments.
-
Tracks
-
Using Bit-Accurate Datatypes
In this video, the edge detection algorithm is converted to use bit-accurate data types.
-
Neural Network Quantization for Low-Power
This webinar describes how to use QKeras and High-Level Synthesis to produce a bespoke quantized CNN accelerator, and compares accuracy, power, performance, and area of different quantizations.
What you will learn:
• Determine the optimal operand sizing for a hardware accelerator deploying a neural network using QKeras
• Determine the area, performance, and energy of a neural network accelerator
• Compare software performance against hardware-accelerated performance and make informed trade-off decisions
-
Quantization of High-Level Synthesis Designs Using Value Range Analysis
Algorithm developers use double-precision data types so that they can focus on the algorithm's mathematical functionality. Converting a floating-point algorithm to a bit-level optimized model is complicated. This paper introduces a simple, robust quantization methodology for HLS designs based on value range analysis.
-
-
Introduction
Hardware design is bit-accurate by nature, because the designer has control over every bit. The purpose of High-Level Synthesis is to generate this hardware logic from C/C++. The issue with C++ is that its native data types do not provide the control or features that a hardware designer gets from writing RTL. This is where bit-accurate data types come into play: they support exact arithmetic modelling, just as an RTL designer would specify it. From a design perspective, bit-accurate data types help reduce area by removing unnecessary bits. From a simulation perspective, they reduce simulation time compared with native C++ data types, which would carry large numbers of unused bits.
Companies might have their own bit-accurate data types, but the two publicly available options are the AC and SystemC data types, which are supported in both C/C++ and SystemC. Both are bit-accurate and arbitrary in length. The rest of this page covers the AC data types in depth and briefly describes the SystemC data types.
-
AC Datatypes:
-
The table below shows the classes provided by the AC data types, which designers can use for accurate HLS modelling.

| File | Description | Class |
|---|---|---|
| ac_int.h | Integer | ac_int<W,S> |
| ac_fixed.h | Fixed-Point | ac_fixed<W,I,S,Q,O> |
| ac_float.h | Floating Point | ac_float<W,I,E,Q> |
| ac_std_float.h | Standard Floating Point | ac_std_float<W,E> |
| ac_complex.h | Complex | ac_complex<T> |
The parameters for the classes are:
• W: integer representing the width of the type. For ac_float, it is the width of the mantissa.
• S: bool parameter representing the signedness of an ac_int or ac_fixed type.
• I: integer representing the integer width.
• E: integer representing the exponent width (ac_float and ac_std_float).
• Q, O: enumeration parameters for the quantization (rounding) and overflow modes.
• T: type. For ac_complex, T is restricted to AC numerical types and C++ integer and floating-point types.
Let us now look at each of these data types in detail.
-
ac_int:
ac_int is a bit-accurate integer type. Its syntax and range are given below:

| Type | Description | Numerical Range | Quantum |
|---|---|---|---|
| ac_int<W,false> | Unsigned Integer | 0 to (2^W)-1 | 1 |
| ac_int<W,true> | Signed Integer | -2^(W-1) to 2^(W-1)-1 | 1 |
ac_int<W,S> cnt;
The template parameters are:
- W: width of the integer
- S: signedness
Usage:
Example 1:
#include <ac_int.h>
ac_int<8,false> val;
- val will accept values from 0 to 255.
Example 2:
#include <ac_int.h>
ac_int<8,true> val;
- val will now accept values from -128 to 127.
Example 1 declares an unsigned integer, while Example 2 declares a signed integer.
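The ranges quoted in the two examples follow directly from the width and the signedness. As a rough illustration, the sketch below computes the range of a W-bit integer in plain C++ without the AC headers. `IntRange`, `int_range` and `check_int_range_examples` are hypothetical names introduced here; they are not part of the AC library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: inclusive range of a W-bit integer.
struct IntRange {
    std::int64_t min;
    std::int64_t max;
};

// Range of a W-bit unsigned or two's-complement signed integer.
// Assumes 1 <= W <= 62 so that the shifts stay inside int64_t.
IntRange int_range(int W, bool is_signed) {
    const std::int64_t one = 1;
    if (is_signed) {
        return {-(one << (W - 1)), (one << (W - 1)) - 1};
    }
    return {0, (one << W) - 1};
}

// Checks the two 8-bit cases from the examples above.
void check_int_range_examples() {
    assert(int_range(8, false).min == 0 && int_range(8, false).max == 255);
    assert(int_range(8, true).min == -128 && int_range(8, true).max == 127);
}
```

Calling `check_int_range_examples()` from a test or from `main` confirms the 0 to 255 and -128 to 127 ranges given for the 8-bit cases.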
The following table shows the ac_int type that corresponds to each C++ integer type and specifier (the widths shown for int and long int assume a platform where both are 32 bits):

| C++ Datatype | ac_int | Range |
|---|---|---|
| short int | ac_int<16,true> | -32,768 to 32,767 |
| unsigned short int | ac_int<16,false> | 0 to 65,535 |
| unsigned int | ac_int<32,false> | 0 to 4,294,967,295 |
| int | ac_int<32,true> | -2,147,483,648 to 2,147,483,647 |
| long int | ac_int<32,true> | -2,147,483,648 to 2,147,483,647 |
| unsigned long int | ac_int<32,false> | 0 to 4,294,967,295 |
| long long int | ac_int<64,true> | -(2^63) to (2^63)-1 |
| unsigned long long int | ac_int<64,false> | 0 to (2^64)-1 |
ac_fixed:
The Algorithmic C fixed point data types are declared as:
-
ac_fixed<int W, int I, bool S, ac_q_mode Q, ac_o_mode O>
- W: total width of the fixed-point value
- I: integer width, which sets the position of the binary point within the value
- S: signedness
- Q (ac_q_mode): quantization mode
- O (ac_o_mode): overflow mode
The quantization mode and overflow mode are optional template parameters.
The range and quantum are given below:
-
| Type | Description | Numerical Range | Quantum |
|---|---|---|---|
| ac_fixed<W,I,false> | Unsigned fixed-point | 0 to (1 - 2^(-W)) * 2^I | 2^(I-W) |
| ac_fixed<W,I,true> | Signed fixed-point | (-0.5) * 2^I to (0.5 - 2^(-W)) * 2^I | 2^(I-W) |
Usage:
#include <ac_fixed.h>
ac_fixed<8,2,true> val1;
- val1 is 8 bits wide, with a 2-bit integer part and a 6-bit fractional part, and is signed.
ac_fixed<8,2,false> val2;
- val2 is 8 bits wide, with a 2-bit integer part and a 6-bit fractional part, and is unsigned.
ac_fixed<4,4,true> val3;
- val3 is a 4-bit integer, equivalent to ac_int<4,true>.
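To make the range and quantum formulas concrete, the following sketch evaluates them in plain C++ for the three declarations above. It is only a numerical model and does not use ac_fixed itself; the function names `fixed_quantum`, `fixed_min`, `fixed_max` and `check_fixed_examples` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Quantum (weight of the LSB) for W total bits and I integer bits: 2^(I-W).
double fixed_quantum(int W, int I) {
    return std::ldexp(1.0, I - W);
}

// Smallest representable value: 0 for unsigned, (-0.5) * 2^I for signed.
double fixed_min(int I, bool is_signed) {
    return is_signed ? -std::ldexp(0.5, I) : 0.0;
}

// Largest representable value:
//   unsigned: (1   - 2^(-W)) * 2^I
//   signed:   (0.5 - 2^(-W)) * 2^I
double fixed_max(int W, int I, bool is_signed) {
    const double top = is_signed ? 0.5 : 1.0;
    return (top - std::ldexp(1.0, -W)) * std::ldexp(1.0, I);
}

// Evaluates the formulas for val1, val2 and val3 from the usage example.
void check_fixed_examples() {
    // ac_fixed<8,2,true>: steps of 1/64 from -2 up to 2 - 1/64.
    assert(fixed_quantum(8, 2) == 0.015625);
    assert(fixed_min(2, true) == -2.0);
    assert(fixed_max(8, 2, true) == 1.984375);
    // ac_fixed<8,2,false>: steps of 1/64 from 0 up to 4 - 1/64.
    assert(fixed_max(8, 2, false) == 3.984375);
    // ac_fixed<4,4,true>: an integer from -8 to 7.
    assert(fixed_quantum(4, 4) == 1.0);
    assert(fixed_min(4, true) == -8.0);
    assert(fixed_max(4, 4, true) == 7.0);
}
```

The last three assertions match the statement that ac_fixed<4,4,true> behaves like ac_int<4,true>.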
-
Quantization and Overflow:
Quantization:
- Quantization: Bits to the right of the LSB of the target type are lost. The value can be adjusted by rounding or truncation:
- Rounding: Choose the nearest quantization level; a value exactly halfway between two levels is resolved according to the specific rounding mode.
- Truncation: Choose the closest quantization level such that the result (quantized value) is less than or equal to the source value (truncation toward minus infinity), or such that the absolute value of the result is less than or equal to the absolute value of the source value (truncation toward zero).
The quantization mode is set by the optional 4th template parameter. The default is AC_TRN, which is truncation toward minus infinity.
The table here shows the various supported quantization modes.
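The difference between the modes is easiest to see on a negative value. The sketch below models three behaviours in plain C++ on a grid with a step of 2^-frac_bits. It is not the ac_fixed implementation: the function names are hypothetical, and the tie rule used for AC_RND (exact halves go toward plus infinity) is an assumption that should be checked against the AC datatypes documentation.

```cpp
#include <cassert>
#include <cmath>

// AC_TRN-style: truncate toward minus infinity (the default mode).
double quantize_trn(double x, int frac_bits) {
    return std::ldexp(std::floor(std::ldexp(x, frac_bits)), -frac_bits);
}

// AC_TRN_ZERO-style: truncate toward zero.
double quantize_trn_zero(double x, int frac_bits) {
    return std::ldexp(std::trunc(std::ldexp(x, frac_bits)), -frac_bits);
}

// AC_RND-style: nearest level; exact halves go toward plus infinity
// (assumed tie rule, see the note above).
double quantize_rnd(double x, int frac_bits) {
    return std::ldexp(std::floor(std::ldexp(x, frac_bits) + 0.5), -frac_bits);
}

// -1.375 lies exactly halfway between the levels -1.5 and -1.25
// when two fractional bits are kept (step 0.25).
void check_quantization_examples() {
    assert(quantize_trn(-1.375, 2) == -1.5);        // toward minus infinity
    assert(quantize_trn_zero(-1.375, 2) == -1.25);  // toward zero
    assert(quantize_rnd(-1.375, 2) == -1.25);       // half goes up
    assert(quantize_rnd(1.375, 2) == 1.5);
}
```

For positive inputs the two truncation modes agree; they only differ for negative values.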
Overflow:
- Overflow occurs when the value after quantization is outside the range of the target type. The exception is the AC_SAT_SYM overflow mode, where the range is symmetric, [-2^(W-1)+1, 2^(W-1)-1], so the most negative number -2^(W-1) also triggers an overflow.
- The overflow mode is set by the optional 5th template parameter of ac_fixed. The default is AC_WRAP.
- AC_WRAP simply drops the bits to the left of the MSB.
- The table below shows the different overflow modes supported for ac_fixed.
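As a rough model of these overflow behaviours, the sketch below applies wrapping, saturation and symmetric saturation to the raw integer behind a signed W-bit value. It uses plain C++ rather than ac_fixed, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// AC_WRAP-style: keep the low W bits and reinterpret them as two's
// complement. Assumes 1 <= W <= 62.
std::int64_t wrap_signed(std::int64_t v, int W) {
    const std::uint64_t mask = (std::uint64_t{1} << W) - 1;
    const std::uint64_t low = static_cast<std::uint64_t>(v) & mask;
    const std::uint64_t sign_bit = std::uint64_t{1} << (W - 1);
    if (low & sign_bit) {
        return static_cast<std::int64_t>(low) - (std::int64_t{1} << W);
    }
    return static_cast<std::int64_t>(low);
}

// AC_SAT-style: clamp to [-2^(W-1), 2^(W-1)-1].
std::int64_t saturate_signed(std::int64_t v, int W) {
    const std::int64_t hi = (std::int64_t{1} << (W - 1)) - 1;
    const std::int64_t lo = -hi - 1;
    return v > hi ? hi : (v < lo ? lo : v);
}

// AC_SAT_SYM-style: clamp to the symmetric range [-2^(W-1)+1, 2^(W-1)-1].
std::int64_t saturate_sym_signed(std::int64_t v, int W) {
    const std::int64_t hi = (std::int64_t{1} << (W - 1)) - 1;
    return v > hi ? hi : (v < -hi ? -hi : v);
}

void check_overflow_examples() {
    assert(wrap_signed(130, 8) == -126);          // bits left of the MSB dropped
    assert(saturate_signed(130, 8) == 127);       // clamped to the maximum
    assert(saturate_signed(-129, 8) == -128);     // clamped to the minimum
    assert(saturate_sym_signed(-128, 8) == -127); // -2^(W-1) counts as overflow
}
```

The last assertion shows the special case described above: under symmetric saturation, the most negative two's-complement value is itself treated as an overflow.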
Operators and Methods:
- Binary Arithmetic and Logical Operators:
- Arithmetic operators such as "+", "-", "/", "*" and "%" and bitwise logical operators such as "&", "|" and "^" are supported.
- The result is an ac_fixed if either operand is an ac_fixed; otherwise it is an ac_int.
- If either operand is signed, the result type is signed as well.
- The "-" operator always returns a signed ac_int/ac_fixed.
- Arithmetic operators return a type with full bit precision.
- Relational Operators:
- Relational operators !=, ==, >, >=, < and <= are also binary operations and have some of the same characteristics described for arithmetic and logical operations.
- Return type is bool.
- Shifting Operators:
- Both Left shift (<<) and right shift (>>) are supported.
- A left shift can lose bits: it behaves like a hardware shift register, irrespective of data type and bit width.
- The left shift operator shifts in zeros.
- The right shift operator shifts in the MSB for signed ac_int/ac_fixed values, and 0 for unsigned ac_int/ac_fixed values.
- Unary Operators:
- "+", "-", "~" and "!" are supported.
- +x returns x, while -x returns 0-x.
- ~ returns the one's complement.
- !x returns true if x is zero, and false otherwise.
- Increment and Decrement Operator:
- ++ and -- increase and decrease the value by 1, respectively.
- Both prefix and postfix forms are supported.
- Bit Select Operator:
- Bits of ac_int/ac_fixed can be selected using the [] operator.
- E.g.
ac_int<3> val;
if (val[1] == 1) val = 2;
- Slicing:
- Read Slice:
- Used to read slices of bits from ac_int/ac_fixed values.
- Out-of-bounds reads are allowed; the result is sign-extended for signed values and zero-padded for unsigned values.
- Syntax: slc<W>(int lsb), where W is the width of the slice and lsb is the index of its least significant bit.
- Write Slice:
- The set_slc method writes values to slices of ac_int/ac_fixed.
- Usage: set_slc(int lsb, const ac_int<W,S> &slc), where W is the width of the slice, S is its signedness, and lsb is the starting position of the slice in the value.
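The effect of a slice read and a slice write can be modelled with shifts and masks on a native unsigned integer. The sketch below is such a model, assuming an unsigned value and in-bounds indices; `read_slice`, `write_slice` and `check_slice_examples` are hypothetical helpers and not the AC library's slc/set_slc methods.

```cpp
#include <cassert>
#include <cstdint>

// Model of a slice read: return `width` bits of `value` starting at bit
// `lsb`. Assumes 1 <= width <= 31 and lsb + width <= 32.
std::uint32_t read_slice(std::uint32_t value, int lsb, int width) {
    const std::uint32_t mask = (1u << width) - 1u;
    return (value >> lsb) & mask;
}

// Model of a slice write: overwrite `width` bits of `value` starting at
// bit `lsb` with the low bits of `slice`, leaving the other bits alone.
std::uint32_t write_slice(std::uint32_t value, int lsb, int width,
                          std::uint32_t slice) {
    const std::uint32_t mask = ((1u << width) - 1u) << lsb;
    return (value & ~mask) | ((slice << lsb) & mask);
}

void check_slice_examples() {
    // 0xB6 is 1011 0110 in binary; bits 5..2 are 1101.
    assert(read_slice(0xB6u, 2, 4) == 0xDu);
    // Replacing the upper nibble with 0011 gives 0011 0110 = 0x36.
    assert(write_slice(0xB6u, 4, 4, 0x3u) == 0x36u);
}
```

With the AC types, the corresponding operations would be written with slc<W>(lsb) and set_slc(lsb, slc) as described above.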
- Conversions:
- Implicit Conversion:
- ac_int supports a few implicit conversions to C integer types.
- ac_fixed does not have any implicit conversion to C types.
- Explicit Conversion:
The following functions are supported for explicit conversion:
-
ac_float:
- Represents a wider range of values than ac_fixed with the same number of bits.
- Has a mantissa and an exponent: the mantissa is a fixed-point value, while the exponent shifts the binary point, making it a floating-point value.
- The value of floating point is:
-
Value = m * (2^e)
where m is the mantissa and e is the exponent.
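As a small numerical illustration of this formula, the sketch below evaluates m * 2^e in plain C++. It assumes the mantissa is a signed fixed-point number with W total bits and I integer bits, stored as the raw integer `mant_bits`, so that m = mant_bits * 2^(I-W). The function names are hypothetical, and the sketch makes no claim about how the ac_float class stores or normalizes its fields.

```cpp
#include <cassert>
#include <cmath>

// Value = m * 2^e, where m = mant_bits * 2^(I-W) is the fixed-point
// mantissa and e is the exponent.
double float_value(long mant_bits, int W, int I, int e) {
    const double m = std::ldexp(static_cast<double>(mant_bits), I - W);
    return std::ldexp(m, e);
}

void check_float_value_examples() {
    // With W=10 and I=1 the mantissa lies in [-1, 1) in steps of 1/512.
    // Raw mantissa 471 with exponent -2 gives 471/2048, close to 0.23.
    assert(float_value(471, 10, 1, -2) == 0.22998046875);
    // The same raw mantissa with exponent 0 is four times larger.
    assert(float_value(471, 10, 1, 0) == 0.919921875);
}
```

Changing only the exponent moves the binary point, which is how a floating-point type covers a wider range than a fixed-point type of the same width.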
The usage in Catapult is:
-
ac_float<W,I,E,Q>
where the first two parameters, W and I, define the mantissa as an ac_fixed<W,I,true>, E defines the exponent as an ac_int<E,true>, and Q defines the rounding mode.
E.g.:
#include <ac_float.h>
ac_float<10,1,4,AC_RND> x = 0.23; // W=10, I=1, E=4, Q=AC_RND
- ac_float does not have an implied "1" bit.
- For this reason, infinity (+inf and -inf) and NaN are not represented.
- The mantissa is encoded as two's complement instead of the standard sign-magnitude representation.
- Saturation is performed by default (AC_SAT).
- The default quantization parameter is AC_TRN.
- The range and quantum are:
-
| Numerical Range | Quantum |
|---|---|
| (-0.5) * 2^(I+max_exp) to (0.5 - 2^(-W)) * 2^(I+max_exp), where max_exp = 2^(E-1) - 1 | 2^(I-W+min_exp), where min_exp = -2^(E-1) |

- Standard float is represented as ac_float<25,2,8> and double is represented as ac_float<54,2,11>.
Operators and Functions:
- Multiplication (*): An arithmetic operator with no loss of precision. The mantissas are multiplied (following the ac_fixed rules for multiplication) and the exponents are added (following the ac_int rules for addition).
- Division (/): The mantissas are divided following the ac_fixed rules, while the exponents are subtracted following the ac_int rules.
- add(op1,op2), sub(op1,op2): Add op2 to op1, or subtract op2 from op1.
- Shifters (<<, >>): Bidirectional. The mantissa is unchanged, and the shift value is added to (<<) or subtracted from (>>) the exponent.
- Assignment (=): Quantization as specified by the target, followed by saturation if the value is outside the numerical range.
- mantissa(), exp(): Methods to get the mantissa and exponent values.
- Explicit conversion to other types: to_double(), to_float(), to_ac_fixed(), to_ac_int(), to_int(), to_uint(), to_long(), to_ulong(), to_int64() and to_uint64()
-
-
Standard Floating-Point Data types:
-
- The standard floating-point datatypes consist of the following:
-
| Type | Bit Widths | Description |
|---|---|---|
| ac_ieee_float | 16, 32, 64, 128, 256 | Binary-format IEEE floats |
| bfloat16 | 16 | Bfloat16 from Google |
| ac_std_float<W,E> | Arbitrary | W is the overall width; E is the exponent width |
They are different from ac_float in the following ways:
- The result type of an arithmetic operator is the same as the operand type; there is no bit growth.
- The mantissa is encoded with an implied "1" for normal numbers.
- Special values such as NaN, +inf and -inf are encoded with an exponent of all ones.
- The mantissa is encoded as sign-magnitude instead of two's complement.
-
ac_std_float:
- Has an overall width W and exponent width “E”.
- Used to implement ac_ieee_float and bfloat16.
- Provides conversions to and from ac_float.
Usage:
#include <ac_std_float.h>
ac_std_float<8,2> a = …;
-
Operators:
- Arithmetic binary operators such as +, -, /, * are supported, using the default rounding mode AC_RND_CONV (IEEE roundTiesToEven) with subnormal support.
- Arithmetic assignment operators such as +=, -=, /=, *= are supported, using the default rounding mode AC_RND_CONV with subnormal support.
- Relational operators such as ==, !=, <, >=, >, <= are supported. All of them except != return false if either operand is NaN.
- Unary operators such as !, +, - are supported.
Member Functions:
-
| Member Function | Description |
|---|---|
| x.abs() | Returns the absolute value |
| x.copysign(ac_std_float &op2) | Returns a value whose absolute value matches that of x, but whose sign bit matches that of op2 |
| x.fpclassify() | FP_NAN if x is not a number; FP_INFINITE if x is +Inf or -Inf; FP_ZERO if x is 0; FP_SUBNORMAL if x is not zero but smaller than the smallest normal value; FP_NORMAL if none of the above |
| x.isfinite() | x.fpclassify() != FP_NAN && x.fpclassify() != FP_INFINITE |
| x.isnormal() | x.fpclassify() == FP_NORMAL |
| x.isnan() | x.fpclassify() == FP_NAN |
| x.isinf() | x.fpclassify() == FP_INFINITE |
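The classification categories listed above carry the same names as the FP_* macros in the C++ standard header <cmath>. As an illustration on a native float (not on ac_std_float), the sketch below maps each category to its name with std::fpclassify; `classify` and `check_classify_examples` are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <string>

// Returns the name of the floating-point category of a native float.
std::string classify(float x) {
    switch (std::fpclassify(x)) {
        case FP_NAN:       return "FP_NAN";
        case FP_INFINITE:  return "FP_INFINITE";
        case FP_ZERO:      return "FP_ZERO";
        case FP_SUBNORMAL: return "FP_SUBNORMAL";
        case FP_NORMAL:    return "FP_NORMAL";
        default:           return "unknown";
    }
}

void check_classify_examples() {
    using limits = std::numeric_limits<float>;
    assert(classify(1.0f) == "FP_NORMAL");
    assert(classify(0.0f) == "FP_ZERO");
    assert(classify(limits::infinity()) == "FP_INFINITE");
    assert(classify(limits::quiet_NaN()) == "FP_NAN");
    // denorm_min() is the smallest positive subnormal float; this check
    // assumes the build does not flush subnormals to zero.
    assert(classify(limits::denorm_min()) == "FP_SUBNORMAL");
}
```

The isfinite, isnormal, isnan and isinf rows in the table are then simple comparisons against these categories.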
Explicit Conversion Methods:
-
| Member Function | Description |
|---|---|
| to_ac_float() | Conversion to the closest equivalent ac_float. The closest equivalent to ac_std_float<W,E> is ac_float<W-E+1, 2, E, AC_RND_CONV> |
| to_float() | Conversion to float |
| to_double() | Conversion to double |
| convert_to_ac_fixed<W,I,S,Q,O>(bool map_inf=false) | Converts to a specific ac_fixed type. Values +/-Inf are mapped to +/-max() if map_inf is true; otherwise the conversion asserts. NaN values assert. |
| convert_to_ac_int<W,S>(bool map_inf=false) | Equivalent to convert_to_ac_fixed<W,W,S,AC_TRN_ZERO,AC_WRAP>(map_inf).to_ac_int() |
| convert_to_int(bool map_inf=false) | Equivalent to convert_to_ac_int<32,true>(map_inf).to_int() |
| convert_to_int64(bool map_inf=false) | Equivalent to convert_to_ac_int<64,true>(map_inf).to_int64() |
ac_ieee_float:
- The ac_ieee_float provides support for standard IEEE binary floating-point numbers. It is implemented using ac_std_float with the corresponding W and E settings as shown below.
- The same functions and operators present in ac_std_float are supported here as well.
-
Bfloat16:
-
- The class ac::bfloat16 provides support for the tensorflow::bfloat16 from Google.
- The two types can be used interchangeably, but it is important to set the rounding mode to round toward zero when using the TensorFlow version: assert(!std::fesetround(FE_TOWARDZERO));
- Most operators and methods are the same as for ac_std_float and ac_ieee_float.
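For intuition about the format itself: bfloat16 has the same sign and 8-bit exponent layout as an IEEE binary32 float but keeps only the top 7 mantissa bits, so its bit pattern is the upper 16 bits of the corresponding float. The sketch below models a round-toward-zero conversion by dropping the lower 16 bits. It is an illustration in plain C++ with hypothetical function names, and it does not claim to reproduce how ac::bfloat16 or TensorFlow perform the conversion.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Keep the upper 16 bits of a binary32 float. Dropping the low mantissa
// bits rounds the magnitude toward zero. NaN inputs are not handled
// specially in this sketch.
std::uint16_t float_to_bf16_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<std::uint16_t>(bits >> 16);
}

// Expand a bfloat16 bit pattern back to a float by zero-filling the
// lower 16 bits.
float bf16_bits_to_float(std::uint16_t b) {
    const std::uint32_t bits = static_cast<std::uint32_t>(b) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

void check_bf16_examples() {
    // 1.0f is 0x3F800000, so its bfloat16 pattern is 0x3F80.
    assert(float_to_bf16_bits(1.0f) == 0x3F80);
    // 3.14159f truncates to 3.140625, the nearest bfloat16 value below it.
    assert(bf16_bits_to_float(float_to_bf16_bits(3.14159f)) == 3.140625f);
    // Negative values also move toward zero.
    assert(bf16_bits_to_float(float_to_bf16_bits(-3.14159f)) == -3.140625f);
}
```

The coarse step between neighbouring values (0.015625 in this range) shows how little mantissa precision the format keeps in exchange for the full float exponent range.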
ac_complex:
- The algorithmic datatype ac_complex is a templatized class for representing complex numbers. The template argument defines the type of the real and imaginary numbers and can be any of the following:
- Algorithmic C integer type: ac_int<W,S>
- Algorithmic C fixed-point type: ac_fixed<W,I,S,Q,O>
- Native C integer types: bool, (un)signed char, short, int, long and long long
- Native C floating-point types: float and double
- An important feature of the ac_complex type is that its operators return types according to the rules of the underlying type.
- For example, operators on ac_complex types based on ac_int and ac_fixed will return results for the operators ‘+’, ‘-’ and ‘*’ with no loss of precision, similarly for native float and double as well.
- Binary operators are defined for ac_complex types that are based on different types, provided the underlying types have the necessary operators defined.
Usage and Example:
#include <ac_complex.h>
ac_complex<ac_fixed<16,8,true> > x(2.0, -3.0);
ac_complex<ac_int<5,true> > i(2, 1);
ac_complex<unsigned short> s(1, 0);
ac_complex<double> d(3.5, 3.14);
-
Operators:
The following operators are defined for ac_complex:
- Arithmetic operators such as +,-,/,* where the 1st or 2nd argument may be C int or ac_fixed.
- Assignment operator (=)
- Unary arithmetic such as +,-.
- !x, Equivalent to x==0.
- ==,!=; comparison of ac_int with float/double is not supported.
Methods:
-
| Methods | Description |
|---|---|
| r(), real() | Return the real part of the ac_complex |
| i(), imag() | Return the imaginary part of the ac_complex |
| set_r(const T2 &r) | Set the real value of the ac_complex |
| set_i(const T2 &i) | Set the imaginary value of the ac_complex |
| conj() | Complex conjugate |
| sign_conj() | Returns (sign(real), sign(imag)) as an ac_complex<ac_int<2,true>> |
| mag_sqr() | Returns sqr(real) + sqr(imag) |
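To show what conj() and mag_sqr() compute, the sketch below implements the same two operations on a minimal stand-in type in plain C++. `ComplexModel` is a hypothetical struct introduced for illustration only. Unlike ac_complex on AC types, it does not grow the result width, so its results stay in the element type T.

```cpp
#include <cassert>

// Hypothetical stand-in for a complex number with real part `re`
// and imaginary part `im`.
template <typename T>
struct ComplexModel {
    T re;
    T im;

    // Complex conjugate: same real part, negated imaginary part.
    ComplexModel conj() const {
        return ComplexModel{re, static_cast<T>(-im)};
    }

    // Squared magnitude: sqr(real) + sqr(imag).
    T mag_sqr() const {
        return re * re + im * im;
    }
};

void check_complex_examples() {
    const ComplexModel<int> z{2, -3};
    assert(z.conj().re == 2);
    assert(z.conj().im == 3);
    assert(z.mag_sqr() == 13);  // 2*2 + (-3)*(-3)
}
```

With the real ac_complex class, the same values would be obtained from the conj() and mag_sqr() methods listed in the table.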
SC Datatypes:
- All C++ native datatypes are supported in SystemC.
sc_bit:
- Is either 0 or 1.
- Supports the following operators:
- Bitwise and, or, not ,xor
- And,or,xor assignment
- Comparison
sc_logic:
- Takes 4 possible values:
- 0: false
- 1: true
- X: unknown
- Z: high impedance (floating)
- Supports the following operators:
- Bitwise and, or, not ,xor
- And,or,xor assignment
- Comparison
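The four values change how the bitwise operators behave compared with a plain two-valued bit. The sketch below models a four-valued AND in plain C++ following the usual convention for such logic, where a 0 on either input forces 0 and any other combination involving X or Z yields X. `Logic` and `logic_and` are hypothetical names; this is not the SystemC implementation, and the exact sc_logic truth tables should be taken from the SystemC standard.

```cpp
#include <cassert>

// Hypothetical four-valued logic type: 0, 1, unknown, high impedance.
enum class Logic { L0, L1, X, Z };

// Four-valued AND: 0 dominates; 1 AND 1 is 1; anything else is unknown.
Logic logic_and(Logic a, Logic b) {
    if (a == Logic::L0 || b == Logic::L0) return Logic::L0;
    if (a == Logic::L1 && b == Logic::L1) return Logic::L1;
    return Logic::X;
}

void check_logic_examples() {
    assert(logic_and(Logic::L1, Logic::L1) == Logic::L1);
    assert(logic_and(Logic::L0, Logic::X) == Logic::L0);  // 0 dominates
    assert(logic_and(Logic::L1, Logic::Z) == Logic::X);   // Z acts as unknown
    assert(logic_and(Logic::X, Logic::X) == Logic::X);
}
```

A two-valued type such as sc_bit only needs the first rule set, since X and Z never occur.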
sc_bv:
- A bit vector holding multiple bits, whereas sc_bit holds a single bit.
- E.g.: sc_bv<8> val;
- The same operators as sc_bit are supported.
- The range function and the reduction functions are additionally supported.
sc_lv:
- A logic vector holding multiple bits, whereas sc_logic holds a single bit.
- E.g.: sc_lv<8> val;
- The same operators as sc_bit are supported.
- The range function and the reduction functions are additionally supported.
sc_int:
- sc_int is a signed integer of fixed precision.
- It holds up to 64 bits and is stored in two's complement.
- It is declared as sc_int<N>, where N is the number of bits, from 1 to 64.
- Bit N-1 is the sign bit.
- Supported operators and functions:
- Arithmetic
- Shifters
- Unary
- Binary Operators
- Range, concatenation, bit select
sc_uint:
- Unsigned version of sc_int.
- Has all the supported operators and functions as sc_int
sc_bigint:
- Supports greater than 64 bits.
sc_biguint:
- Unsigned version of sc_bigint, supporting greater than 64 bits.
sc_fixed:
- Signed fixed point with finite precision.
- Takes in the following parameters:
- Total bitwidth
- Integer width (position of decimal point)
- Quantization mode (default is SC_TRN)
- Overflow mode (default is SC_WRAP)
- N_BITS (default is 0)
sc_ufixed:
- Unsigned fixed point with finite precision
- Takes in the following parameters:
- Total bitwidth
- Integer width (position of decimal point)
- Quantization mode (default is SC_TRN)
- Overflow mode (default is SC_WRAP)
- N_BITS (default is 0)
-
-
-
Conclusion
High-Level Synthesis involves designing hardware in C++, and native C++ data types would be inefficient in terms of area and power. Bit-accurate data types are therefore essential, and this is where the AC data types come into play. They are easy to use, modular, and simulate quickly, making them well suited to hardware design using High-Level Synthesis.