Hierarchy
Hierarchy is a fundamental concept in both Register Transfer Level (RTL) and High-Level Synthesis (HLS) for managing design complexity, promoting modular design, and enabling IP reuse. In SystemC, hierarchy is defined explicitly using SC_MODULE constructs that clearly specify interfaces, along with clock and reset signals. In C++-based HLS, hierarchy and parallelism are modeled implicitly and later transformed into an explicit RTL structure, ensuring accurate, efficient hardware implementation.
Hierarchy Webinar Featured Content
Hierarchical Design Using a Pure C++ Class-Based Approach
Untimed C++ synthesis has required the use of C-style "wrapper" functions to instantiate a design block and define its interfaces. This coding style is acceptable but limits what can be done using a pure C++ class-based design style. Class-based hierarchical design using HLS allows users to define and interconnect modules using C++ classes, and more.
What you will learn:
- Creating design modules from a C++ class
- Defining the top-level hardware interface
- Creating a multi-block design using C++ classes
Optimizing SystemC/C++ Hardware Architectures Through High-Level Synthesis
Hardware architecture has a huge impact on RTL "quality of results" when deploying High-Level Synthesis (HLS). We will cover how to code different hardware architectures in C++ or SystemC to achieve optimal results in the output RTL.
What you will learn:
- Fundamental filter architectures and HLS coding style
- Windowing for efficient image processing
- Delay line implementation with a single-port RAM
Hierarchy Featured Content
Catapult HLS Design Analyzer Analyzes MultiBlock Designs
Catapult HLS' Design Analyzer is used to navigate and analyze multi-block designs in this video.
Multi-block Concurrency for Highest Performance
In this video the design is recoded to further improve performance by coding each design class so that it will be synthesized as a separate concurrent process.
Google develops WebM video decompression hardware IP using High-Level Synthesis
The WebM project defines an open file format designed for the distribution of compressed media content across the web. Google is a major contributor to the WebM project, having recently undertaken the design and development of the first hardware decoder IP for WebM, otherwise known as the VP9 G2 decoder.
Introduction
RTL is modular in nature: a top-level module contains a number of sub-modules, and the top module is synthesized to produce the final gate-level logic. The sub-modules are defined separately and then instantiated throughout the top-level module. In other words, there is a hierarchy in which the top-level module is the first level and the sub-modules form the inner levels.
Similarly, in High-Level Synthesis written in C++, there is a top-level function which is synthesized to produce the RTL. This function calls sub-functions, which are defined separately. The top-level function is the first level of hierarchy, while the sub-functions form the inner levels. The term “function” here can mean a C++ function or a class; more on this below.
For SystemC, just as in RTL, modules are defined using “SC_MODULE”. A “top” SC_MODULE forms the first level of hierarchy, and the sub-level SC_MODULEs form the inner levels. In RTL the hierarchies are connected with wires. In SystemC this is done with Matchlib connections or sc_signal. In C++, ac_channel is used to stream data between hierarchies; ac_channel behaves as a FIFO. Instead of ac_channel, a shared memory can be used for inter-block communication, in which case some form of arbitration between the blocks and the shared memory is needed so that no data is lost or overwritten. Another alternative is a ping-pong memory, which allows better parallelism between hierarchical blocks.
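To make the ping-pong idea concrete, here is a minimal sketch in plain C++. The `PingPong` class, its method names and `ping_pong_demo` are hypothetical and not part of any HLS library; in practice an HLS tool would normally infer or configure such a memory rather than have it hand-coded this way.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical ping-pong memory: two banks, one filled by the producer
// while the other is drained by the consumer.
template <typename T, std::size_t N>
class PingPong {
    std::array<std::array<T, N>, 2> bank_{};
    int write_sel_ = 0;  // index of the bank the producer currently owns
public:
    std::array<T, N> &write_bank() { return bank_[write_sel_]; }
    const std::array<T, N> &read_bank() const { return bank_[1 - write_sel_]; }
    void swap() { write_sel_ = 1 - write_sel_; }  // hand the filled bank over
};

// Usage sketch: a new frame can be written without disturbing the one being read.
inline void ping_pong_demo() {
    PingPong<int, 2> mem;
    mem.write_bank() = {1, 2};  // producer fills bank 0
    mem.swap();                 // consumer now sees bank 0
    mem.write_bank() = {3, 4};  // producer fills bank 1 in the meantime
    assert(mem.read_bank()[0] == 1 && mem.read_bank()[1] == 2);
}
```

Because the producer and consumer never touch the same bank between swaps, no arbitration is needed, at the cost of doubling the storage.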
Hierarchy in C++:
Before we get into hierarchy let us look at how a top module is defined in C++ using both function and class.
How to define a top module in HLS?
Function:
- With a function, the arguments in the function definition form the interfaces of the hardware.
- The HLS tool can infer the bit width and direction of each port.
- The ports can be input, output or inout.
- Outputs need to be passed by reference so that the value can be modified.
- Different functions can be defined and called to form hierarchies.
E.g.
    #pragma design top
    void top(int a, int b, int &c) {
        c = a + b;
    }
The corresponding RTL module would be:
    module top (
      clk, rst, a, b, c
    );
      input clk;
      input rst;
      input [31:0] a;
      input [31:0] b;
      output [31:0] c;
This is a simple example. With a ready/valid handshake interface, the corresponding a_rdy/b_rdy/c_rdy and a_vld/b_vld/c_vld signals would also be present.
This single top function is the starting point which can be expanded to have multiple levels of hierarchies.
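As a rough illustration of expanding a top function, the sketch below has `top` call two sub-functions. The names `add_stage` and `scale_stage` are made up for this example, and whether a tool inlines such sub-functions or turns them into separate blocks depends on the tool and the pragmas applied, so treat this only as a picture of the C++ call structure.

```cpp
// Hypothetical sub-functions; each one is a candidate for an inner level
// of hierarchy, or may simply be inlined by the tool.
void add_stage(int a, int b, int &sum) {
    sum = a + b;
}

void scale_stage(int x, int &y) {
    y = x * 2;
}

// Top-level function: its arguments become the hardware ports.
#pragma hls_design top
void top(int a, int b, int &c) {
    int sum;                 // internal wire between the two stages
    add_stage(a, b, sum);
    scale_stage(sum, c);
}
```

Calling `top(1, 2, c)` in a C++ testbench leaves `c` equal to 6, which is the same check a designer would run before synthesis.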
Classes:
- Some HLS tools support classes to create modules and thereby HLS designs.
- They are the best for hierarchies due to ease of use and modularity.
- A top-level pragma is defined at the beginning of the class.
- An interface pragma is defined for a public member function. The arguments inside the member function form the interfaces in hardware. Only one public member function is allowed as an interface.
- Each class can be defined separately, and objects of it can be instantiated to form hierarchies.
- Multiple objects of the same class can also be used to form hierarchies.
The same code written as a class would be:
    #pragma design top
    class add {
    public:
        add() {}

        #pragma design interface
        void top(int a, int b, int &c) {
            c = a + b;
        }
    };
The corresponding RTL module for this is:
    module add_top (
      clk, rst, a, b, c
    );
      input clk;
      input rst;
      input [31:0] a;
      input [31:0] b;
      output [31:0] c;
      // ...
    endmodule

    module add (
      clk, rst, a, b, c
    );
      input clk;
      input rst;
      input [31:0] a;
      input [31:0] b;
      output [31:0] c;

      add_top add_top_inst (
        .clk(clk),
        .rst(rst),
        .a(a),
        .b(b),
        .c(c)
      );
    endmodule
The class name forms a wrapper around the function interface, which aids the interconnect declarations for component instantiations. The components are named <class_name>_<interface_function_name>, as with add_top in the RTL above.
The top function and interfaces can be defined as pragmas or in the GUI. It varies according to the HLS tool.
Design Block:
- In this design block, ‘din’ and ‘dout’ are streamed using ac_channels. Other streaming interfaces such as axistream can be used as well.
- run is the name of the interface function, which is a clocked process.
- SimpleOneBlock is the name of the class, which forms a single hierarchy.
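The single-block design described in these bullets might look like the sketch below. Two things here are assumptions: `fifo_channel` is a small stand-in written for this example so that it compiles without the AC datatypes headers (a real design would use `ac_channel`), and the accumulator behaviour of `run` is invented for illustration, since the bullets only name the class, the interface function and its two channels.

```cpp
#include <cassert>
#include <deque>

// Stand-in for ac_channel (assumption): mimics only FIFO read()/write().
template <typename T>
class fifo_channel {
    std::deque<T> q_;
public:
    void write(const T &v) { q_.push_back(v); }
    T read() {
        assert(!q_.empty() && "read from empty channel");
        T v = q_.front();
        q_.pop_front();
        return v;
    }
};

#pragma hls_design top
class SimpleOneBlock {
    int acc;  // state kept between calls; a register in hardware
public:
    SimpleOneBlock() : acc(0) {}

    // Interface function: its channel arguments become the streaming ports.
    #pragma hls_design interface
    void run(fifo_channel<int> &din, fifo_channel<int> &dout) {
        acc += din.read();   // consume one input sample
        dout.write(acc);     // produce the running sum
    }
};
```

In a testbench, each call to `run` consumes one value from `din` and pushes one value to `dout`, so the channels must be pre-loaded before the call.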
Concurrent Processes:
- C++ designs containing “rolled” sequential loops that are not automatically merged will have lower performance.
- Loop merging will not happen when there is:
  - Out-of-order array access between loops
  - Complex control
  - Non-deterministic data exchange between loops
E.g:
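    // BLOCK0 writes the array in ascending order.
    void BLOCK0(int din[3], int dout[3]) {
        WRITE: for (int i = 0; i < 3; i++) {
            dout[i] = din[i];
        }
    }

    // BLOCK1 reads it in descending order, so the two loops cannot be merged.
    void BLOCK1(int din[3], int dout[3]) {
        READ: for (int i = 2; i >= 0; i--) {
            dout[i] = din[i];
        }
    }

    void top(int din[3], int dout[3]) {
        int tmp[3];
        BLOCK0(din, tmp);
        BLOCK1(tmp, dout);
    }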
This single design block can be extended further to multiple hierarchies. The hierarchy blocks help in providing some form of concurrency as seen in the diagram below.
Design Block:
- Here block0 and block1 are both clocked processes. They are C++ member functions mapped to design blocks, both present within a hierarchy called TwoBlockHierOneMod.
- They are connected using some form of streaming interface or memory. More on this in the next section.
- Class member functions mapped to design blocks can run concurrently. This is one level of hierarchy, and an example of how hierarchy can be used to improve the performance of designs in HLS.
E.g.
    class TwoBlockHierOneMod {
        ac_channel<uint10> connect;  // Interconnect channel between block0 and block1
        uint10 acc0;
        uint20 acc1;

        #pragma hls_design
        void block0(ac_channel<uint4> &din) {
            acc0 += din.read();
            connect.write(acc0);
        }

        #pragma hls_design
        void block1(ac_channel<uint20> &dout) {  // width matches dout of run()
            acc1 += connect.read();
            dout.write(acc1);
        }

    public:
        TwoBlockHierOneMod() {
            acc0 = 0;  // Initialize in constructor
            acc1 = 0;  // Initialize in constructor
        }

        #pragma hls_design interface
        void run(ac_channel<uint4> &din, ac_channel<uint20> &dout) {
            block0(din);
            block1(dout);
        }
    };
Multi Block with Multi-Class (Another level of hierarchy)
- The previous design has one level of hierarchy, with the blocks present as functions within a single class and connected through an interconnect such as an ac_channel.
- An additional level of hierarchy is added by defining the functions inside separate classes, as seen in the diagram below.
- More functions can be added inside each of these blocks, but only one can be an interface. The rest will be design blocks, as in the previous design.
Source code:
    class Block0 {
        uint10 acc;
    public:
        Block0() {
            acc = 0;  // Initialize in constructor
        }

        #pragma hls_design interface
        void run(ac_channel<uint4> &din, ac_channel<uint10> &dout) {
            acc += din.read();
            dout.write(acc);
        }
    };

    class Block1 {
        uint20 acc;
    public:
        Block1() {
            acc = 0;  // Initialize in constructor
        }

        #pragma hls_design interface
        void run(ac_channel<uint10> &din, ac_channel<uint20> &dout) {
            acc += din.read();
            dout.write(acc);
        }
    };

    class TwoBlockHierTwoMod {
        ac_channel<uint10> connect;  // Interconnect channel; type matches Block0 output and Block1 input
        Block0 inst0;                // Module instances
        Block1 inst1;
    public:
        TwoBlockHierTwoMod() {}

        #pragma hls_design interface
        void run(ac_channel<uint4> &din, ac_channel<uint20> &dout) {
            inst0.run(din, connect);
            inst1.run(connect, dout);
        }
    };
- ac_channel must be used to interconnect classes or class member functions mapped to design blocks.
- Arrays mapped to memory must be passed through ac_channel for shared memories between design blocks.
  - The required coding style must be followed.
- Arrays mapped to memories on the top-level interface are allowed.
- Design blocks and glue logic cannot be mixed.
  - The interconnect for classes or functions mapped to design blocks cannot be mixed with inlined C++ code.
  - Doing so results in a compilation error.
- There are two ways to share a memory between classes/member functions mapped to design blocks:
  - Using an ac_channel and the required coding style
  - Explicitly coding a separate design block that contains a memory
- The diagram below explains them.
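The first option, passing an array through a channel, can be sketched as follows. This is only an outline of the idea under stated assumptions: `fifo_channel` is a stand-in written for this example in place of `ac_channel`, and `Frame`, `Producer`, `Consumer` and `SharedMemTop` are hypothetical names. The exact coding style a tool requires, and how it maps the channel to a memory, is tool-specific and should be taken from the tool documentation.

```cpp
#include <cassert>
#include <deque>

// Stand-in for ac_channel (assumption): mimics only FIFO read()/write().
template <typename T>
class fifo_channel {
    std::deque<T> q_;
public:
    void write(const T &v) { q_.push_back(v); }
    T read() {
        assert(!q_.empty() && "read from empty channel");
        T v = q_.front();
        q_.pop_front();
        return v;
    }
};

// The array is wrapped in a struct so a whole buffer travels as one item.
struct Frame {
    int data[4];
};

class Producer {
public:
    void run(fifo_channel<int> &din, fifo_channel<Frame> &mem) {
        Frame f;
        for (int i = 0; i < 4; i++) f.data[i] = din.read();  // fill in order
        mem.write(f);  // hand the complete buffer to the next block
    }
};

class Consumer {
public:
    void run(fifo_channel<Frame> &mem, fifo_channel<int> &dout) {
        Frame f = mem.read();
        for (int i = 3; i >= 0; i--) dout.write(f.data[i]);  // read out of order
    }
};

// Top level contains only the interconnect and the block calls, no glue logic.
class SharedMemTop {
    fifo_channel<Frame> mem;
    Producer prod;
    Consumer cons;
public:
    void run(fifo_channel<int> &din, fifo_channel<int> &dout) {
        prod.run(din, mem);
        cons.run(mem, dout);
    }
};
```

The consumer deliberately reads in reverse order, the same access pattern that prevented loop merging in the BLOCK0/BLOCK1 example, to show why a whole-buffer transfer is needed instead of element-by-element streaming.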
Reasons to Use Hierarchy
- Allowing Blocks to Run in Parallel: The primary reason to add hierarchy is to pipeline the design so the blocks can run independently. A common example is a chain of digital filters that implement decimation or interpolation
- Reducing Synthesis Runtime on HLS tools
- Improving Regularity
What can be designed with this methodology?
- Sub-systems with bus or chip interfaces
- Point-to-point communication.
- Dataflow in a single direction through a channel.
Hierarchy in SystemC:
SystemC hierarchies are created with the help of SC_MODULE. Just like C++, they need to be interconnected with the help of some form of channel. This is done with the help of Matchlib connections.
A simple example of an sc_module is shown below. The DUT is designed by inheriting from the sc_module class.
    #pragma hls_design top
    class flop : public sc_module {
    public:
        sc_in<bool>      CCS_INIT_S1(clk);
        sc_in<bool>      CCS_INIT_S1(rst_bar);
        sc_in<uint32_t>  CCS_INIT_S1(in1);
        sc_out<uint32_t> CCS_INIT_S1(out1);

        SC_CTOR(flop) {
            SC_THREAD(process);
            sensitive << clk.pos();
            async_reset_signal_is(rst_bar, false);
        }

        void process() {
            // this is the reset state:
            out1 = 0;
            wait();  // WAIT
            // this is the non-reset state:
            while (1) {
                out1 = in1.read();
                wait();  // WAIT
            }
        }
    };
The corresponding RTL for this is:
    module flop_process (
      clk, rst_bar, in1, out1
    );
      input clk;
      input rst_bar;
      input [31:0] in1;
      output [31:0] out1;
      reg [31:0] out1;

      // Interconnect Declarations for Component Instantiations
      always @(posedge clk or negedge rst_bar) begin
        if ( ~ rst_bar ) begin
          out1 <= 32'b00000000000000000000000000000000;
        end
        else begin
          out1 <= in1;
        end
      end
    endmodule

    // ------------------------------------------------------------------
    //  Design Unit:    flop
    // ------------------------------------------------------------------

    module flop (
      clk, rst_bar, in1, out1
    );
      input clk;
      input rst_bar;
      input [31:0] in1;
      output [31:0] out1;

      // Interconnect Declarations for Component Instantiations
      flop_process flop_process_inst (
        .clk(clk),
        .rst_bar(rst_bar),
        .in1(in1),
        .out1(out1)
      );
    endmodule
- The actual design is present in a module named <module_name>_<thread_name>. The top module is called <module_name>.
- For parallel access, multiple threads can be used.
- The threads are connected using Connections::Combinational channels or memories. They are synchronized using a SyncChannel placed between the connections.
- Hierarchy is created using multiple sc_modules connected with the Connections library or sc_signal.
Conclusion
Hierarchy in HLS helps with parallel processing, better modularity and faster HLS tool runtime. More complex requirements can be handled more easily with the help of hierarchy. Hierarchy in C++ is achieved with functions and, most importantly, classes. In SystemC, threads and sc_module are used. Communication between hierarchies is modeled with channels, signals or memories, which together help in achieving the final design.