Data Types

Bit-accurate data types offer a reliable mechanism to model exact hardware bit-widths and arithmetic precision in High-Level Synthesis (HLS) workflows. These types range from simple signed and unsigned integer representations, familiar to RTL designers, to fixed point data types with support for rounding and saturation modes, and even fully IEEE-compliant floating-point types with user-defined exponent and mantissa widths. This level of control ensures bit-exact behavior, maintaining alignment between C++/SystemC numerical models and the final RTL generated by the HLS tool.
Some organizations use proprietary or custom-developed bit-accurate types for modeling. However, two widely available open-source options are AC types and SystemC types. They support robust C++/SystemC design styles and offer comprehensive integer and fixed-point modeling capabilities. AC types have become especially popular due to their support for unlimited bit-length, consistent simulation semantics, and faster execution performance in compiled environments.

Stuart Clubb

Tracks
- Using Bit-Accurate Datatypes
  
  In this video the edge detection algorithm is converted to use bit-accurate data types.
  Webinar May 17, 2021 by Michael Fingeroff
  
- Neural Network Quantization for Low-Power
  
  This webinar describes how to use QKeras and High-Level Synthesis to produce a bespoke quantized CNN accelerator, and compares accuracy, power, performance, and area of different quantizations. What you will learn: determine the optimal operand sizing for a hardware accelerator deploying a neural network using QKeras; determine the area, performance, and energy of a neural network accelerator; compare software performance against hardware-accelerated performance to make informed trade-off decisions.
  Webinar Russell Klein Ajay Mishra
  
- Quantization of High-Level Synthesis Designs Using Value Range Analysis
  
  Algorithm developers use double-precision datatypes so that they can focus on the algorithm's mathematical functionality. Converting a floating-point algorithm to a bit-level optimized model is complicated. This paper introduces a simple, robust quantization methodology for HLS designs based on value range analysis.
  PDF Petri Solanti
  
- Reference material on AC Datatypes: Algorithmic C (AC) Datatypes Reference Manual
  PDF
- Reference material on AC Datatypes: HLSLibs
  LINK
Introduction

Hardware design is bit accurate: the designer has control over every bit. The purpose of High-Level Synthesis is to generate this hardware logic from C/C++. The issue with C++ is that its native data types do not provide the control or features that a hardware designer gets when writing RTL. This is where bit-accurate data types come into play: they support exact mathematical modelling, just as an RTL designer would do it. From a design perspective, bit-accurate data types improve area by removing unnecessary bits. From a simulation perspective, using bit-accurate data types reduces simulation time compared with native C++ data types, which would contain large numbers of unused bits.
Companies might have their own bit-accurate data types, but the two publicly available options are the AC and SystemC datatypes, which are supported in both C/C++ and SystemC. They are bit accurate and of arbitrary length. The rest of this page takes a deep dive into the AC datatypes and gives a brief description of the SystemC datatypes.

AC Datatypes:

The table below shows the classes provided by the AC data types, which designers can use for accurate HLS modelling.

File	Description	Class
ac_int.h	Integer	ac_int<W,S>
ac_fixed.h	Fixed-Point	ac_fixed<W,I,S,Q,O>
ac_float.h	Floating Point	ac_float<W,I,E,Q>
ac_std_float.h	Standard Floating Point	ac_std_float<W,E>
ac_complex.h	Complex	ac_complex<T>

The parameters for the classes are:
• W: integer representing the width of the type. For ac_float it is the width of the mantissa.
• S: bool parameter representing the signedness of an ac_int or ac_fixed type.
• I: integer representing the integer width.
• E: integer representing the width of the exponent of an ac_float or ac_std_float type.
• Q,O: enumeration parameters for the quantization (rounding) and overflow modes.
• T: type. For ac_complex, T is restricted to AC numerical types and C++ integer and floating-point types.
Let us now look at each of these data types in detail.
ac_int:

The ac_int datatype is a bit-accurate “integer” type. Its syntax and range are given below:

Type	Description	Numerical Range	Quantum
ac_int<W,false>	Unsigned Integer	0 to (2^W)-1	1
ac_int<W,true>	Signed Integer	-2^(W-1) to 2^(W-1)-1	1

The template parameters are:
- W- Width of integer
- S- Signedness
Usage:
Example 1:
```
#include <ac_int.h> 

ac_int <8,false> val;
```
- val will accept values from 0 to 255.
Example 2:
```
#include <ac_int.h> 

ac_int <8,true > val; 
```
- val will now accept values from -128 to 127.
Example 1 declares an unsigned integer, while Example 2 declares a signed integer.
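The ranges quoted for the two examples follow directly from W and S. As a minimal sketch in plain C++ (it deliberately does not include ac_int.h, and the helper names ac_int_min and ac_int_max are hypothetical, not part of the AC library), the limits could be computed like this:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not part of the AC library: numerical limits of an
// ac_int<W,S>-style integer. This sketch only handles 1 <= W <= 62.
inline std::int64_t ac_int_min(int w, bool is_signed) {
    assert(w >= 1 && w <= 62);
    return is_signed ? -(std::int64_t{1} << (w - 1)) : 0;  // -2^(W-1) or 0
}

inline std::int64_t ac_int_max(int w, bool is_signed) {
    assert(w >= 1 && w <= 62);
    return is_signed ? (std::int64_t{1} << (w - 1)) - 1    // 2^(W-1) - 1
                     : (std::int64_t{1} << w) - 1;         // 2^W - 1
}
```

For W = 8 this gives 0 to 255 for the unsigned case and -128 to 127 for the signed case, matching the two examples.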
The following table shows the ac_int equivalents of the native C++ integer types, assuming a data model in which int and long int are both 32 bits wide:

C++ Datatype	ac_int	Range
short int	ac_int<16,true>	-32,768 to 32,767
unsigned short int	ac_int<16,false>	0 to 65,535
unsigned int	ac_int<32,false>	0 to 4,294,967,295
int	ac_int<32,true>	-2,147,483,648 to 2,147,483,647
long int	ac_int<32,true>	-2,147,483,648 to 2,147,483,647
unsigned long int	ac_int<32,false>	0 to 4,294,967,295
long long int	ac_int<64,true>	-(2^63) to (2^63)-1
unsigned long long int	ac_int<64,false>	0 to (2^64)-1

ac_fixed:

The Algorithmic C fixed point data types are declared as:

ac_fixed<int W, int I, bool S, ac_q_mode Q, ac_o_mode O>

W - Total width of the entire fixed value
I - The width of the integer part, which sets the position of the binary point in the fixed-point value
S - Signedness
ac_q_mode Q - Quantization mode
ac_o_mode O - Overflow mode

The quantization mode and overflow mode are optional template parameters.
The range and quantum are given below:

Type	Description	Numerical Range	Quantum
ac_fixed<W,I,false>	Unsigned fixed-point	0 to (1-2^(-W))·(2^I)	2^(I-W)
ac_fixed<W,I,true>	Signed fixed-point	(-0.5)·(2^I) to (0.5-2^(-W))·(2^I)	2^(I-W)

Usage:
```
#include <ac_fixed.h>

ac_fixed <8,2,true> val1;
```
- val1 is a signed value 8 bits wide, with 2 integer bits and 6 fractional bits.
```
ac_fixed <8,2,false> val2;
```
- val2 is an unsigned value 8 bits wide, with 2 integer bits and 6 fractional bits.
```
ac_fixed <4,4,true> val3;
```
- val3 is a 4-bit signed integer, equivalent to ac_int<4,true>.
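The range and quantum formulas can be checked against these three declarations. The following is a minimal sketch in plain C++ that does not use ac_fixed.h; the struct FixedLimits and the function ac_fixed_limits are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper struct, not part of the AC library.
struct FixedLimits {
    double min;
    double max;
    double quantum;
};

// Limits of an ac_fixed<W,I,S>-style type, using the range and quantum
// formulas given for ac_fixed on this page.
inline FixedLimits ac_fixed_limits(int w, int i, bool is_signed) {
    assert(w >= 1);
    const double scale = std::ldexp(1.0, i);        // 2^I
    const double quantum = std::ldexp(1.0, i - w);  // 2^(I-W)
    if (is_signed) {
        // (-0.5)*2^I to (0.5 - 2^(-W))*2^I
        return FixedLimits{-0.5 * scale, 0.5 * scale - quantum, quantum};
    }
    // 0 to (1 - 2^(-W))*2^I
    return FixedLimits{0.0, scale - quantum, quantum};
}
```

Under these formulas, the <8,2,true> case spans -2 to 1.984375 in steps of 0.015625, the <8,2,false> case spans 0 to 3.984375, and the <4,4,true> case spans -8 to 7 in steps of 1, which is the integer behaviour noted above.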
Quantization and Overflow:
Quantization:
- Quantization occurs when bits to the right of the LSB of the target type are lost. The value can be adjusted by:
  Rounding: Choose the nearest quantization level; a value exactly halfway between two levels is resolved according to the specific rounding mode.
  Truncation: Choose the closest quantization level such that the result (quantized value) is less than or equal to the source value (truncation towards minus infinity), or such that the absolute value of the result is less than or equal to that of the source value (truncation towards zero).
- The quantization mode is set by the fourth template parameter, which is optional. The default is AC_TRN, which is truncation towards negative infinity.
The table here shows the various supported quantization modes.
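To make the difference between truncation and rounding concrete, here is a minimal sketch in plain C++ that does not use ac_fixed.h. The enum QMode and the function quantize are hypothetical names; the three cases are intended to mirror AC_TRN, AC_TRN_ZERO and AC_RND, under the assumption that AC_RND resolves exact halves towards plus infinity.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for AC_TRN, AC_TRN_ZERO and AC_RND.
enum class QMode { Trn, TrnZero, Rnd };

// Quantize x onto a grid with frac_bits fractional bits (quantum = 2^-frac_bits).
inline double quantize(double x, int frac_bits, QMode mode) {
    assert(frac_bits >= 0);
    const double scaled = std::ldexp(x, frac_bits);  // x expressed in quanta
    double steps = 0.0;
    switch (mode) {
        case QMode::Trn:     steps = std::floor(scaled);       break;  // towards -infinity
        case QMode::TrnZero: steps = std::trunc(scaled);       break;  // towards zero
        case QMode::Rnd:     steps = std::floor(scaled + 0.5); break;  // nearest, halves up
    }
    return std::ldexp(steps, -frac_bits);
}
```

With two fractional bits, -1.3 becomes -1.5 under truncation towards minus infinity but -1.25 under truncation towards zero or rounding, while the exact half 0.625 becomes 0.5 when truncated and 0.75 when rounded.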
Overflow:
- Overflow occurs when the value after quantization is outside the range of the target type. With AC_SAT_SYM the range is symmetric, [-2^(W-1)+1, 2^(W-1)-1], so the most negative number -2^(W-1) also triggers an overflow.
- AC_WRAP is the default; the overflow mode is set by the fifth template parameter of ac_fixed.
- AC_WRAP simply drops the bits to the left of the MSB.
- The below table shows the different overflow modes supported for ac_fixed.
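The contrast between wrapping and saturating can be sketched in plain C++ on a signed W-bit integer target. This does not use the AC headers; wrap_signed and saturate_signed are hypothetical helpers meant to mimic the AC_WRAP, AC_SAT and AC_SAT_SYM behaviour described above.

```cpp
#include <cassert>
#include <cstdint>

// AC_WRAP-style: keep only the low W bits and reinterpret them as two's complement.
inline std::int64_t wrap_signed(std::int64_t v, int w) {
    assert(w >= 2 && w <= 32);
    const std::uint64_t mask = (std::uint64_t{1} << w) - 1;
    const std::uint64_t low = static_cast<std::uint64_t>(v) & mask;
    const std::uint64_t sign = std::uint64_t{1} << (w - 1);
    // If the sign bit of the W-bit field is set, the value is low - 2^W.
    return (low & sign) ? static_cast<std::int64_t>(low) - (std::int64_t{1} << w)
                        : static_cast<std::int64_t>(low);
}

// AC_SAT-style: clamp to [-2^(W-1), 2^(W-1)-1].
// With symmetric == true the lower bound is -2^(W-1)+1, as for AC_SAT_SYM.
inline std::int64_t saturate_signed(std::int64_t v, int w, bool symmetric = false) {
    assert(w >= 2 && w <= 32);
    const std::int64_t hi = (std::int64_t{1} << (w - 1)) - 1;
    const std::int64_t lo = symmetric ? -hi : -hi - 1;
    return v > hi ? hi : (v < lo ? lo : v);
}
```

For an 8-bit target, the value 130 wraps to -126 but saturates to 127, and -128 is left alone by plain saturation but clamped to -127 in the symmetric case.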
Operators and Methods:
- Binary Arithmetic and Logical Operators:
  The arithmetic operators “+”, “-”, “*”, “/” and “%” and the bitwise operators “&”, “|” and “^” are supported.
  The result is an ac_fixed if either operand is an ac_fixed; otherwise it is an ac_int.
  If either operand is signed, the result type is signed as well.
  The “-” operator always returns a signed ac_int/ac_fixed.
  Arithmetic operators return a type with full bit precision, so no bits are lost in the result.
- Relational Operators:
  The relational operators !=, ==, >, >=, < and <= are also binary operations and share some of the characteristics described for the arithmetic and logical operators.
  The return type is bool.
- Shifting Operators:
  Both left shift (<<) and right shift (>>) are supported.
  A left shift behaves like a hardware shift register: bits shifted past the MSB are lost, irrespective of data type and bit width.
  The left shift operator shifts in zeros.
  The right shift operator shifts in the MSB for a signed ac_int/ac_fixed and 0 for an unsigned ac_int/ac_fixed.
- Unary Operators:
  “+”, “-”, “~” and “!” are supported.
  +x returns x, while -x returns 0-x.
  ~x returns the one’s complement of x.
  !x returns true if x is equal to 0, otherwise false.
- Increment and Decrement Operators:
  ++ and -- increase and decrease the value by 1 respectively.
  Both the prefix and postfix forms are supported.
- Bit Select Operator:
  Bits of an ac_int/ac_fixed can be selected using the [] operator.
  E.g.
```
ac_int<3> val;             // 3-bit integer; S defaults to true (signed)
if (val[1] == 1) val = 2;  // operator[] selects bit 1 of val
```
- Slicing:
  Read Slice:
  The slc method reads a slice of bits from an ac_int/ac_fixed value.
  Out-of-bounds reads are allowed; the result is sign-extended for signed values and zero-padded for unsigned values.
  Syntax: slc<W>(int lsb), where W is the width of the slice and lsb is its starting index.
  E.g. x.slc<4>(2) reads the four bits of x starting at bit 2.
- Write Slice:
  The set_slc method writes a value to a slice of an ac_int/ac_fixed.
  Usage: set_slc(int lsb, const ac_int<W,S> &slc), where W is the width of the slice, S is its signedness, and lsb is the starting position of the slice in the value.
  E.g. x.set_slc(2, y) overwrites the bits of x starting at bit 2 with the value of y.
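As a rough model of the two slice operations, the following plain C++ sketch applies the same idea to an unsigned 32-bit word. It does not use the AC headers, the helper names read_slice and write_slice are hypothetical, and unlike slc it does not handle out-of-bounds reads.

```cpp
#include <cassert>
#include <cstdint>

// Read `width` bits of x starting at bit `lsb`, in the spirit of x.slc<width>(lsb).
inline std::uint32_t read_slice(std::uint32_t x, int lsb, int width) {
    assert(lsb >= 0 && width >= 1 && lsb + width <= 32);
    const std::uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (x >> lsb) & mask;
}

// Overwrite `width` bits of x starting at bit `lsb` with the low bits of
// `value`, in the spirit of x.set_slc(lsb, value). Returns the updated word.
inline std::uint32_t write_slice(std::uint32_t x, int lsb, int width, std::uint32_t value) {
    assert(lsb >= 0 && width >= 1 && lsb + width <= 32);
    const std::uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (x & ~(mask << lsb)) | ((value & mask) << lsb);
}
```

For the word 0xB6 (binary 1011 0110), reading four bits from bit 2 yields binary 1101, and writing binary 0011 into the same position yields 0x8E (binary 1000 1110).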
- Conversions:
  Implicit Conversion:
  ac_int supports a few implicit conversions to C integer types.
  ac_fixed does not have any implicit conversion to C types.
- Explicit Conversion:
  The following functions are supported for explicit conversion:
ac_float:
- Represents a higher range of values than ac_fixed with the same number of bits.
- Has a mantissa and an exponent; the mantissa is a fixed-point value, while the exponent shifts the binary point, making it a floating-point value.
- The value of floating point is:
Value = m * (2^e)

Where m is the mantissa and e is the exponent.
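The formula can be evaluated directly once the mantissa is viewed as a fixed-point number. The sketch below is plain C++ and does not use ac_float.h; ac_float_value is a hypothetical helper that assumes the mantissa is held as a signed integer bit pattern with W total bits and I integer bits, so that m equals that integer times 2^(I-W).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: value of a number whose mantissa is the signed integer
// `mant_bits` (interpreted with W total bits and I integer bits) and whose
// exponent is `exp`.
inline double ac_float_value(std::int64_t mant_bits, int w, int i, int exp) {
    assert(w >= 1 && w <= 53);  // keep the mantissa exactly representable in a double
    const double m = std::ldexp(static_cast<double>(mant_bits), i - w);  // m = bits * 2^(I-W)
    return std::ldexp(m, exp);                                           // value = m * 2^e
}
```

With W = 10 and I = 1, the mantissa pattern 256 is m = 0.5, so an exponent of 3 gives 4.0. The pattern 128 (m = 0.25) with an exponent of 4 gives the same value, which illustrates that a mantissa without an implied leading bit can encode one value in more than one way.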
The usage in Catapult is:

ac_float<W,I,E,Q>

where the first two parameters W and I define the mantissa as an ac_fixed<W,I,true>, the E defines the exponent as an ac_int<E,true> and Q defines the rounding mode.

For example:
```
#include <ac_float.h>

ac_float<10,1,4,AC_RND> x = 0.23; // W=10, I=1, E=4, Q=AC_RND
```
- ac_float does not have an implied “1” bit.
- Infinity (+inf and -inf) and NaN are not represented.
- The mantissa is encoded as two’s complement instead of the standard sign-magnitude representation.
- Saturation is performed by default (AC_SAT).
- The default quantization parameter is AC_TRN.
- The range and quantum are:

Numerical Range	Quantum
(-0.5)·2^(I+max_exp) to (0.5-2^(-W))·2^(I+max_exp), where max_exp = (2^(E-1))-1	2^(I-W+min_exp), where min_exp = -2^(E-1)

- Standard float is represented as ac_float<25,2,8> and double is represented as ac_float<54,2,11>.
Operators and Functions:
- Multiplication (*): An arithmetic operator with no loss of precision; the mantissas are multiplied (following the ac_fixed rules for multiplication) and the exponents are added (following the ac_int rules for addition).
- Division (/): The mantissas are divided following the ac_fixed rules, while the exponents are subtracted following the ac_int rules.
- add(op1,op2), sub(op1,op2): Add op2 to op1, or subtract op2 from op1.
- Shifters (<<,>>): Bidirectional shifts in which the mantissa is unchanged and the shift value is added to (<<) or subtracted from (>>) the exponent.
- Assignment (=): Quantization as specified by the target type, followed by saturation if the value is outside the numerical range.
- mantissa(), exp(): Methods that return the mantissa and the exponent.
- Explicit conversion to other types: to_double(), to_float(), to_ac_fixed(), to_ac_int(), to_int(), to_uint(), to_long(), to_ulong(), to_int64() and to_uint64()

Standard Floating-Point Data types:

- The standard floating-point datatypes consist of the following:

Type	Bit Widths	Description
ac_ieee_float	16,32,64,128,256	Binary format IEEE Floats
bfloat16	16	Bfloat16 from Google
ac_std_float<W,E>	Arbitrary	W is the overall width; E is the exponent width

They are different from ac_float in the following ways:
- The result type of an arithmetic operator is the same as the operand type; there is no bit growth.
- The mantissa is encoded with an implied “1” for normal numbers.
- Special values such as NaN, +inf and -inf are encoded with an exponent field of all ones.
- The mantissa is encoded as sign-magnitude instead of two's complement.
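The implied “1”, the sign-magnitude layout and the all-ones exponent can be observed on the native C++ float, assuming the platform stores it as IEEE 754 binary32. The sketch below is plain C++ rather than AC library code; Binary32Fields, split_binary32 and value_of_normal are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical container for the three fields of a binary32 value.
struct Binary32Fields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by 127
    std::uint32_t fraction;  // 23 stored bits; the leading 1 is not stored
};

inline Binary32Fields split_binary32(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &f, sizeof bits);  // well-defined way to read the bit pattern
    return Binary32Fields{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}

// Rebuild a normal number as (-1)^sign * 1.fraction * 2^(exponent - 127).
inline double value_of_normal(const Binary32Fields& p) {
    assert(p.exponent != 0 && p.exponent != 0xFFu);  // normal numbers only
    const double significand = 1.0 + std::ldexp(static_cast<double>(p.fraction), -23);
    const double magnitude = std::ldexp(significand, static_cast<int>(p.exponent) - 127);
    return p.sign ? -magnitude : magnitude;
}
```

For -6.5f the fields are sign 1, exponent 129 and fraction 0x500000, and adding the implied 1 back reproduces -6.5; for infinity the exponent field is 0xFF with a zero fraction.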
ac_std_float:
- Has an overall width W and exponent width “E”.
- Used to implement ac_ieee_float and bfloat16.
- Provides conversions to and from ac_float.
Usage:
```
#include <ac_std_float.h>

ac_std_float<8,2> a = …;
```
Operators:
- Arithmetic binary operators such as +, -, /, * are supported using the default rounding of AC_RND_CONV (IEEE: roundTiesToEven) with subnormal support.
- Arithmetic assignment operators such as +=, -=, /=, *= are supported using the default rounding of AC_RND_CONV with subnormal support.
- Relational operators such as ==, !=, <, >=, >, <= are supported; all of them except != return false if either operand is NaN.
- Unary operators such as !, +, -.
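The NaN rule for relational operators is the usual IEEE 754 behaviour, so it can be tried out with the native float type. This is a small sketch in plain C++, assuming an IEEE 754 float and default compiler floating-point settings; is_unordered is a hypothetical helper, not an ac_std_float method.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: true when every equality and ordering comparison
// between a and b is false, which happens when at least one operand is NaN.
inline bool is_unordered(float a, float b) {
    return !(a == b) && !(a < b) && !(a > b) && !(a <= b) && !(a >= b);
}

inline void nan_comparison_examples() {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    assert(is_unordered(nan, 1.0f));    // ==, <, >, <=, >= are all false
    assert(nan != 1.0f);                // != is the only comparison that is true
    assert(!is_unordered(2.0f, 1.0f));  // ordinary values compare normally
}
```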
Member Functions:

Member Function	Description
x.abs()	Return the absolute value
x.copysign(ac_std_float &op2)	Return value whose absolute value matches that of x, but whose sign bit matches that of op2.
x.fpclassify()	FP_NAN: if x is not a number; FP_INFINITE: if x is +Inf or -Inf; FP_ZERO: if x is 0; FP_SUBNORMAL: if x is not zero, but smaller than the smallest normal value; FP_NORMAL: if none of the above
x.isfinite()	x.fpclassify() != FP_NAN && x.fpclassify() != FP_INFINITE
x.isnormal()	x.fpclassify() == FP_NORMAL
x.isnan()	x.fpclassify() == FP_NAN
x.isinf()	x.fpclassify() == FP_INFINITE
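
The category names in this table are the same macros that the C++ standard library defines in <cmath>. As an illustration on the native float type (not on ac_std_float itself), the following sketch uses std::fpclassify; the function classify is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <string>

// Hypothetical helper: name the std::fpclassify category of a float.
inline std::string classify(float x) {
    switch (std::fpclassify(x)) {
        case FP_NAN:       return "FP_NAN";
        case FP_INFINITE:  return "FP_INFINITE";
        case FP_ZERO:      return "FP_ZERO";
        case FP_SUBNORMAL: return "FP_SUBNORMAL";
        case FP_NORMAL:    return "FP_NORMAL";
        default:           return "unknown";
    }
}

inline void classify_examples() {
    assert(classify(1.0f) == "FP_NORMAL");
    assert(classify(0.0f) == "FP_ZERO");
    assert(classify(std::numeric_limits<float>::infinity()) == "FP_INFINITE");
}
```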

Explicit Conversion Methods:

Member Function	Description
to_ac_float()	Conversion to closest equivalent ac_float: The closest equivalent to ac_std_float<W,E> is ac_float<W-E+1, 2, E, AC_RND_CONV>
to_float()	Conversion to float
to_double()	Conversion to double
convert_to_ac_fixed<W,I,S,Q,O>( bool map_inf=false)	Convert to specific ac_fixed type. Values +/-Inf are mapped to +/-max() if map_inf is true; otherwise asserts. Values +/-NaN assert.
convert_to_ac_int<W,S>(bool map_inf=false)	Equivalent to: convert_to_ac_fixed<W,W,S,AC_TRN_ZERO,AC_WRAP>(map_inf).to_ac_int()
convert_to_int(bool map_inf=false)	Equivalent to: convert_to_ac_int<32,true>(map_inf).to_int()
convert_to_int64(bool map_inf=false)	Equivalent to: convert_to_ac_int<64,true> (map_inf).to_int64()

ac_ieee_float:
- The ac_ieee_float provides support for standard IEEE binary floating-point numbers. It is implemented using ac_std_float with the corresponding W and E settings as shown below.
- The same functions and operators present in ac_std_float are supported here as well.

Bfloat16:

- The class ac::bfloat16 provides support for the tensorflow::bfloat16 from Google.
- The two types can be used interchangeably, but it is important to set the rounding mode to round towards zero when using the tensorflow version: assert(!std::fesetround(FE_TOWARDZERO));
- Most operators and methods are the same as for ac_std_float and ac_ieee_float
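For orientation, the bfloat16 format keeps the upper 16 bits of an IEEE binary32 value: 1 sign bit, 8 exponent bits and 7 fraction bits. The sketch below is plain C++ and is not the ac::bfloat16 implementation; it assumes a 32-bit IEEE float and shows a conversion that simply drops the low 16 bits, which for finite values is rounding towards zero. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: keep the upper 16 bits of a binary32 value.
// Dropping the low fraction bits rounds finite values towards zero.
inline std::uint16_t float_to_bfloat16_bits(float f) {
    static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical helper: widen a bfloat16 bit pattern back to float by
// appending 16 zero bits.
inline float bfloat16_bits_to_float(std::uint16_t b) {
    const std::uint32_t bits = static_cast<std::uint32_t>(b) << 16;
    float f = 0.0f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Converting 3.14159f this way and widening it again gives 3.140625, which shows how few fraction bits remain after the conversion.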
ac_complex:
- The algorithmic datatype ac_complex is a templatized class for representing complex numbers. The template argument defines the type of the real and imaginary numbers and can be any of the following:
  Algorithmic C integer type: ac_int<W,S>
  Algorithmic C fixed-point type: ac_fixed<W,I,S,Q,O>
  Native C integer types: bool, (un)signed char, short, int, long and long long
  Native C floating-point types: float and double
- An important feature of the ac_complex type is that operators return types according to the rules of the underlying type.
- For example, operators on ac_complex types based on ac_int and ac_fixed will return results for the operators ‘+’, ‘-’ and ‘*’ with no loss of precision, similarly for native float and double as well.
- Binary operators are defined for ac_complex types that are based on different types, provided the underlying types have the necessary operators defined.
Usage and Example:
```
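#include <ac_complex.h>

ac_complex<ac_fixed<16,8,true> > x(2.0, -3.0);  // fixed-point real and imaginary parts
ac_complex<ac_int<5,true> > i(2, 1);             // 5-bit signed integer parts
ac_complex<unsigned short> s(1, 0);              // native C integer parts
ac_complex<double> d(3.5, 3.14);                 // native C floating-point parts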
```
Operators:
The following operators are defined for ac_complex:
- Arithmetic operators such as +,-,/,* where the 1st or 2nd argument may be C int or ac_fixed.
- Assignment operator (=)
- Unary arithmetic such as +,-.
- !x, Equivalent to x==0.
- ==,!=; comparison of ac_int with float/double is not supported.
Methods:

Methods	Description
r(),real()	Return real part of ac_complex
i(),imag()	Return imaginary part of ac_complex
set_r(const T2&r)	Set the real value of ac_complex
set_i(const T2&i)	Set the imaginary value of ac_complex
conj()	Complex Conjugate
sign_conj()	Returns (sign(real), sign(imag)) as an ac_complex<ac_int<2,true>>
mag_sqr()	returns sqr(real)+sqr(imag)
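
The arithmetic behind conj() and mag_sqr() is simple enough to sketch with a stand-in type. The struct ComplexInt below is hypothetical and is not the ac_complex class; it uses plain int members, so unlike ac_complex over ac_int it does not grow the result width.

```cpp
#include <cassert>

// Hypothetical stand-in for a complex number with integer parts.
struct ComplexInt {
    int re;
    int im;
};

// Complex conjugate: negate the imaginary part.
inline ComplexInt conj(ComplexInt z) { return ComplexInt{z.re, -z.im}; }

// Squared magnitude: sqr(real) + sqr(imag).
inline int mag_sqr(ComplexInt z) { return z.re * z.re + z.im * z.im; }

inline void complex_examples() {
    const ComplexInt z{3, -4};
    assert(conj(z).re == 3 && conj(z).im == 4);
    assert(mag_sqr(z) == 25);
}
```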

SC Datatypes:
- All C++ native datatypes are supported in SystemC.
sc_bit:
- Is either 0 or 1.
- Supports the following operators:
  Bitwise and, or, not ,xor
  And,or,xor assignment
  Comparison
sc_logic:
- Takes 4 possible values
  0 – false
  1- True
  X- Unknown
  Z- floating value
- Supports the following operators:
  Bitwise and, or, not ,xor
  And,or,xor assignment
  Comparison
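To show what a four-valued type means for the bitwise operators, here is a minimal sketch in plain C++ that does not use SystemC. The enum Logic and the functions logic_and and logic_not are hypothetical, and the truth tables assume the usual four-valued convention in which Z is treated like X by logic operators.

```cpp
#include <cassert>

// Hypothetical four-valued logic type: 0, 1, unknown (X) and floating (Z).
enum class Logic { Zero, One, X, Z };

// AND: a 0 on either input forces 0; 1 AND 1 is 1; anything else is unknown.
inline Logic logic_and(Logic a, Logic b) {
    if (a == Logic::Zero || b == Logic::Zero) return Logic::Zero;
    if (a == Logic::One && b == Logic::One) return Logic::One;
    return Logic::X;
}

// NOT: inverts 0 and 1; X and Z both give unknown.
inline Logic logic_not(Logic a) {
    if (a == Logic::Zero) return Logic::One;
    if (a == Logic::One) return Logic::Zero;
    return Logic::X;
}

inline void logic_examples() {
    assert(logic_and(Logic::Zero, Logic::X) == Logic::Zero);
    assert(logic_and(Logic::One, Logic::Z) == Logic::X);
    assert(logic_not(Logic::Z) == Logic::X);
}
```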
sc_bv:
- Bit vector for multi-bit. sc_bit is for one bit.
- E.g.: sc_bv <8> val;
- The same operators as sc_bit are supported.
- Additional supported function is the range function and the reduction functions.
sc_lv:
- sc_logic is for one bit while this is multi-bit.
- E.g. :sc_lv<8> val;
- The same operators as sc_bit are supported.
- Additional supported function is the range function and the reduction functions.
sc_int:
- sc_int is a signed integer of fixed precision.
- It holds up to 64 bits and is stored in two's complement.
- It is declared as sc_int<W>, where W is the number of bits, from 1 to 64.
- Bit W-1 is the sign bit.
- Supported operators and functions:
  Arithmetic
  Shifters
  Unary
  Binary Operators
  Range, concatenation, bit select
sc_uint:
- Unsigned version of sc_int.
- Has all the supported operators and functions as sc_int
sc_bigint:
- Supports greater than 64 bits.
sc_biguint:
- Unsigned version of sc_bigint, supporting greater than 64 bits.
sc_fixed:
- Signed fixed point with finite precision.
- Takes in the following parameters:
  Total bitwidth
  Integer width (position of decimal point)
  Quantization mode (default is SC_TRN)
  Overflow mode (default is SC_WRAP)
  N_BITS (default is 0)
sc_ufixed:
- Unsigned fixed point with finite precision
- Takes in the following parameters:
  Total bitwidth
  Integer width (position of decimal point)
  Quantization mode (default is SC_TRN)
  Overflow mode (default is SC_WRAP)
  N_BITS (default is 0)

Conclusion

High-Level Synthesis involves designing with C++. However, because the result is hardware, native C++ data types would be inefficient in terms of area and power. Using bit-accurate data types is therefore very important, and this is where the AC datatypes come into play. They are easy to use, modular, and simulate quickly, which makes them well suited to hardware design using High-Level Synthesis.
