iFlow
1. Introduction to the Backend Design Flow
1.1 Chip Design Flow (From Function Definition to Tape-out)
The basic chip design process is shown in the following figure:
Digital IC backend design process: (From the post-synthesis netlist to GDS)
1.2 Logic Synthesis
1.2.1 Synthesis Basics
Logic synthesis maps the front-end RTL code to a specific technology library under the given constraints, optimizes the logic, and generates a gate-level netlist.
Synthesis tool: yosys
Synthesis process: Translation + Optimization + Mapping
- Translation: yosys converts the RTL code into a netlist built from its internal generic cell library, applying structural and logical optimizations along the way. This netlist plays the role of the GTECH netlist in commercial tools and is independent of the technology library.
- Optimization: Perform structural optimization of the cells based on constraint information (timing, area, power consumption constraints).
- Mapping: Map the cells into the corresponding gate-level circuits in the technology library.
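As a rough illustration, the three stages can be written as a Yosys command script. The sketch below assembles such a script in Python. The file names (`design.v`, `cells.lib`, `netlist.v`) and the top module name `top` are placeholders, and the command sequence is one common arrangement, not necessarily the script that iFlow itself runs.

```python
def build_yosys_script(rtl_files, top, liberty, out_netlist):
    """Return a Yosys script (as text) covering translation, optimization, mapping."""
    lines = []
    # Translation: read the RTL and elaborate it into generic internal cells.
    for rtl in rtl_files:
        lines.append(f"read_verilog {rtl}")
    lines.append(f"hierarchy -check -top {top}")
    lines.append("proc")
    # Optimization: technology-independent simplification.
    lines.append("opt")
    lines.append("techmap")
    lines.append("opt")
    # Mapping: bind flip-flops and combinational logic to library cells.
    lines.append(f"dfflibmap -liberty {liberty}")
    lines.append(f"abc -liberty {liberty}")
    lines.append("clean")
    lines.append(f"write_verilog -noattr {out_netlist}")
    return "\n".join(lines) + "\n"


# Placeholder inputs, chosen only for this example.
script = build_yosys_script(["design.v"], "top", "cells.lib", "netlist.v")
print(script)
```

The generated text could be saved to a file and passed to `yosys -s`, assuming the placeholder files exist.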
Basic strategies of synthesis:
- Top-down: Take the top-level module as the current design module and complete the synthesis of the entire design at once.
- Advantage: Good optimization effect for medium-scale designs, and no additional processing is required for module boundaries.
- Disadvantage: Slow synthesis speed for very large-scale designs and may even fail to converge.
- Bottom-up: Synthesize the bottom-level modules first, and then the top-level module calls the sub-modules generated by the synthesis to complete the entire synthesis process.
- Advantage: Reduces the memory requirement and is suitable for very large-scale designs.
- Disadvantage: Additional processing is required for module boundaries.
1.2.2 Input Files
- RTL code (including Verilog, VHDL).
- Library files (.lib files), containing information of all standard cells and macro cells:
- Cell information: Function, area, power consumption, etc.
- Wire load model: Resistance, capacitance.
- Working environment: Technology, voltage, temperature.
- Constraint rules involved: Maximum and minimum capacitance, maximum and minimum transition time, maximum and minimum fanout.
- Constraint files (.sdc files), containing all timing constraints in the design: operating conditions (PVT, usually the worst case), input drive strength, transition times, capacitive output loads, and internal parasitic RC (wire load model):
- Environmental conditions PVT (process, voltage, temperature): Process, voltage, and temperature all affect device delay, and libraries are characterized at fast, typical, and slow corners. In general, the higher the temperature, the slower the device; the higher the voltage, the faster the device.
> Path1: Input port to register
Assume the external input circuit delay is 4ns, and the clock cycle $T_{clk}$ is 10ns, then the maximum delay from the input end to the register (internal logic) is 10 - 4 - $T_{setup}$ (ns), where $T_{setup}$ is the setup time.
> Path2: Register to register
The delay should satisfy $T_{comp}$ < $T_{clk}$ - $T_{ck2q}$ - $T_{setup}$, where $T_{comp}$ is the combinational logic delay, and $T_{ck2q}$ is the delay from the CK end to the Q end of the register.
> Path3: Register to output port
Assume the external output path delay is 4ns and the clock cycle $T_{clk}$ is 10ns; then the maximum internal logic delay is 10 - 4 - $T_{ck2q}$ (ns).
> Path4: Input port to output port
Maximum combinational logic delay: $T_{clk}$ - $T_{input\_delay}$ - $T_{output\_delay}$
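The four budgets above are simple subtractions, so they can be checked with a few lines of Python. This is a minimal sketch: the 10 ns period and the 4 ns external delays come from the text, while the 0.5 ns values for $T_{setup}$ and $T_{ck2q}$ are assumed purely for illustration.

```python
def path_budgets(t_clk, t_input_delay, t_output_delay, t_setup, t_ck2q):
    """Maximum internal delay (ns) allowed on each of the four path types."""
    return {
        "in2reg": t_clk - t_input_delay - t_setup,          # Path1
        "reg2reg": t_clk - t_ck2q - t_setup,                # Path2
        "reg2out": t_clk - t_output_delay - t_ck2q,         # Path3
        "in2out": t_clk - t_input_delay - t_output_delay,   # Path4
    }


# t_setup and t_ck2q are assumed example values, not library data.
budgets = path_budgets(t_clk=10.0, t_input_delay=4.0, t_output_delay=4.0,
                       t_setup=0.5, t_ck2q=0.5)
for name, limit in budgets.items():
    print(f"{name}: {limit:.1f} ns")
```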
1.3 Formal Verification
1.3.1 Formal Verification
Formal verification compares two designs through logical abstraction to ensure that they are functionally equivalent (it compares logic only and does not check timing). As shown in Figure 9, it covers RTL vs. post-synthesis netlist and post-synthesis netlist vs. post-P&R netlist. Because yosys synthesis does not produce the svf file required by the commercial tool Formality, the RTL vs. post-synthesis netlist comparison cannot be performed.
1.3.2 Comparison Principle
- Divide the design into multiple pairs of logic cone and compare point. A logic cone is the cone-shaped block of logic whose inputs converge on one compare point; its inputs can be register outputs, input ports, or black-box outputs. A compare point is the point being compared: a register input, an output port, or a black-box input.
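The idea of a logic cone can be sketched as a backward traversal from a compare point that stops at register outputs, input ports, and black-box outputs. The netlist below is hypothetical, and the dictionary representation is an assumption made for this example, not the data structure of any particular equivalence checker.

```python
def logic_cone(fanin, compare_point, start_points):
    """Collect every node in the fan-in cone of compare_point.

    fanin maps each node to the nodes that drive it. Traversal stops at
    start_points (register outputs, input ports, black-box outputs).
    """
    cone, stack = set(), [compare_point]
    while stack:
        node = stack.pop()
        if node in cone:
            continue
        cone.add(node)
        if node in start_points:
            continue  # cone boundary: do not look behind a start point
        stack.extend(fanin.get(node, []))
    return cone


# Hypothetical netlist: reg2.D = (a AND reg1.Q) OR b
fanin = {
    "reg2.D": ["or1"],
    "or1": ["and1", "b"],
    "and1": ["a", "reg1.Q"],
    "reg1.Q": ["reg1.D"],  # lies behind a start point, so outside this cone
}
cone = logic_cone(fanin, "reg2.D", start_points={"a", "b", "reg1.Q"})
print(sorted(cone))
```

Two designs are then compared cone by cone: each compare point in the reference is matched with one in the implementation, and only the logic inside the two cones has to be proven equivalent.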
1.3.3 Reasons Leading to Unmatch
Table 1 Reasons and Solutions for Unmatch in Formal Verification
| Symptom | Possible Reasons | Solutions |
|---|---|---|
| The number of unmatched points in ref and imp is different | The design has been renamed | Manually set user match; turn on the signature analysis option |
| The number of unmatched points in ref is greater than in imp | Redundant registers have been optimized away during synthesis | No special processing is required |
| | Some missing cells have generated black boxes | Read in the missing cells |
| The number of unmatched points in ref is less than in imp | Extra logic has been generated during synthesis | Check the logical mapping |
1.4 Placement and Routing
1.4.1 Placement and Routing
Placement and routing is the process of converting the circuit netlist into a physical layout. The design process is shown in Figure 13:
1.4.2 Init
Input data:
- Gate-level netlist after synthesis or DFT.
- Physical library: techlef and cell lef.
- Timing library: .lib; commercial tools also use .db.
1.4.3 Floorplan
- Area Planning
- Die area: The area occupied by the entire layout.
- Core area: The area available for placing cells.
- Standard cell utilization = Total area of standard cells / (Core area - Area of macro cells). An initial empirical value is 70%-80%. Because the placement and routing capabilities of open-source EDA tools are still immature, a utilization that is too high may make routing difficult; lowering the utilization usually resolves this.
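The utilization formula can be evaluated directly, and it can also be inverted to size the core for a target utilization. The following is a small sketch with made-up areas (any consistent area unit works, for example µm²).

```python
def std_cell_utilization(std_cell_area, core_area, macro_area):
    """Standard cell utilization = std cell area / (core area - macro area)."""
    placeable = core_area - macro_area
    if placeable <= 0:
        raise ValueError("macro cells leave no room for standard cells")
    return std_cell_area / placeable


def core_area_for_target(std_cell_area, macro_area, target_util):
    """Core area needed so that the utilization equals target_util."""
    return std_cell_area / target_util + macro_area


# Example areas are invented for illustration.
util = std_cell_utilization(std_cell_area=300_000.0,
                            core_area=500_000.0,
                            macro_area=100_000.0)
print(f"utilization = {util:.0%}")
# Relaxing the target to 50% requires a larger core.
relaxed_core = core_area_for_target(300_000.0, 100_000.0, 0.5)
print(f"core area for 50% utilization = {relaxed_core}")
```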
Planning of Macro Cell Placement Positions
Issues to be considered: Optimal timing (iterative), no routing congestion (iterative), power supply feasibility, narrow channels caused by macro cell placement, and the Port positions of macro cells.
The narrow channels reserved between macro cells serve two purposes: they can hold standard cells, and they ease the routing of macro cell Ports and reduce congestion.
Planning of Port Placement Positions
Generally, they are grouped and placed according to the Port functions and signal directions.
Power Supply Planning
As shown in Figure 15, the odd-numbered layers of the power supply lines are used for horizontal routing, and the even-numbered layers are used for vertical routing. TM1 and TM2 are used to design the main power supply network, M2-M8 are used for the secondary power supply network, and M1 is the power supply network of the standard cell library.
Power supply capacity meets the requirements:
In the formulas, $P_{total}$ is the total power consumption of the entire design, $P_{std}$ is the total power consumption of the standard cells, $P_{macro}$ is the total power consumption of the macro cells, and $V_{DD}$ is the supply voltage.
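As a hedged illustration of the quantities named above, the sketch below assumes the simplest possible relations: total power is the sum of standard cell and macro cell power, and the average current the power network must deliver is power divided by supply voltage. These relations and the example numbers are assumptions for illustration; they are not taken from the original formulas.

```python
def supply_current(p_std_mw, p_macro_mw, vdd_v):
    """Return (total power in mW, average supply current in mA)."""
    p_total_mw = p_std_mw + p_macro_mw  # assumed: total = std cells + macros
    i_total_ma = p_total_mw / vdd_v     # I = P / V
    return p_total_mw, i_total_ma


# Invented example values.
p_total, i_total = supply_current(p_std_mw=60.0, p_macro_mw=30.0, vdd_v=1.5)
print(f"P_total = {p_total} mW, I_total = {i_total} mA")
```

The resulting current is the figure that the power straps and rings have to carry within their electromigration and IR-drop limits.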
Factors to be considered in power supply planning:
- Routing resources: Metal layers available for implementing the power supply network and the maximum power supply capacity.
- Power supply requirements: The maximum current demand under a given voltage.
- Component power PINs: It is necessary to understand the PINs of VDD and VSS of macro cells and standard cells and their approximate connection methods to the power supply network.
- Narrow channels: Special attention needs to be paid to the power supply of standard cells in narrow channels.
1.4.4 Clock Tree Synthesis (CTS)
Clock tree synthesis grows a tree of clock buffers/inverters from the clock root to every sink, so that the skew (the difference in arrival time of the clock signal at the clock terminals of the registers) is as small as possible.
As shown in Figure 16, before clock tree synthesis, a clock source is fanned out to the clock terminals of many registers. After clock tree synthesis, a clock tree is composed of multiple levels of buffers.
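Skew and latency can be computed from the clock arrival time at each sink. The sketch below uses invented arrival times and hypothetical pin names to show the typical effect of CTS: latency may grow because buffers are inserted, while skew shrinks.

```python
def clock_tree_metrics(sink_arrival_ns):
    """Return (latency, skew): longest root-to-sink delay and max-min spread."""
    arrivals = list(sink_arrival_ns.values())
    latency = max(arrivals)
    skew = max(arrivals) - min(arrivals)
    return latency, skew


# Invented arrival times (ns) at three register clock pins.
before_cts = {"ff_a/CK": 0.25, "ff_b/CK": 1.0, "ff_c/CK": 0.5}
after_cts = {"ff_a/CK": 1.0, "ff_b/CK": 1.125, "ff_c/CK": 1.0}

latency_before, skew_before = clock_tree_metrics(before_cts)
latency_after, skew_after = clock_tree_metrics(after_cts)
print(f"before CTS: latency={latency_before} ns, skew={skew_before} ns")
print(f"after CTS:  latency={latency_after} ns, skew={skew_after} ns")
```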
Clock Source
External crystal oscillator + internal clock generator + high-frequency clock generated by internal PLL + various frequencies of clocks generated by internal frequency division.
First, generate a clock of a certain frequency (such as 25 MHz) from the crystal oscillator or clock generator, then generate a frequency-multiplied clock (high-frequency clock) through the PLL, and finally generate clocks of various frequencies through the frequency divider and send them to each functional module.
Number of Phase-Locked Loops (PLL)
PLL occupies a large area, so the number of PLLs should be as small as possible. First, count the clock frequency requirements of each functional module, design the frequency divider, and finally calculate the number of PLLs.
Location of PLL
The location of the PLL determines the length of the clock tree (clock tree latency). It is necessary to clarify how the clocks are multiplexed, which modules the PLL's frequency-multiplied clock drives, and where those modules are located.
Clock Constraints
- The first part is crystal oscillator -> PLL
- The second part is PLL -> clock gen module (generating divided clock signals)
- The third part is the output of the frequency divider -> each functional module
CTS Steps
- Grow the clock tree
- Optimize the clock tree and timing
- Route the clock tree
- Manually adjust the clock tree
- View the clock tree report and repeat the previous four processes
1.4.5 Route
- Track: Yellow and blue dashed lines, without width. Routing based on the grid requires all metal traces to be on the track.
- Pitch: The distance between two tracks.
- Trace: The actual metal trace on the track, with width.
- Grid point: The intersection of two tracks.
- The height and width of the standard cell are integer multiples of the pitch, and the pins of the standard cell are placed on the grid points during placement.
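The grid terms above reduce to integer arithmetic on coordinates. The following sketch checks that a cell's dimensions are multiples of the pitch and snaps an off-grid pin to the nearest grid point. The pitch values, cell size, and pin location are hypothetical, and coordinates are integers in an arbitrary unit such as nm.

```python
def is_on_grid(value, pitch, offset=0):
    """True if a coordinate lies exactly on a track."""
    return (value - offset) % pitch == 0


def snap_to_grid(value, pitch, offset=0):
    """Move a coordinate to the nearest track."""
    return offset + round((value - offset) / pitch) * pitch


H_PITCH, V_PITCH = 200, 200      # hypothetical pitches
cell_w, cell_h = 800, 1800       # hypothetical standard cell size
cell_ok = is_on_grid(cell_w, H_PITCH) and is_on_grid(cell_h, V_PITCH)
print("cell size is a multiple of the pitch:", cell_ok)

pin = (430, 1290)                # an off-grid pin location
snapped = (snap_to_grid(pin[0], H_PITCH), snap_to_grid(pin[1], V_PITCH))
print("pin snapped to grid point:", snapped)
```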
Steps of Route:
Global Routing
Global routing plans the routing paths and determines the general position and direction of each connection; it does not create actual wires.
Track Assignment
Assign each wire to a track and perform the actual routing for the connection. During routing, keep metal segments as long as possible and minimize the number of vias. This stage does not run design rule checks (DRC).
Detailed Routing
Use the paths generated during global routing and track assignment to route the wires and insert vias. Because track assignment only tries to make wires as long as possible, it leaves many DRC violations. Detailed routing fixes them using fixed-size sboxes, which are small grids that evenly divide the entire layout. Violations inside an sbox are fixed, but DRC violations on sbox boundaries cannot be fixed and are left to the next step.
Search and Repair
Repair the DRC violations that detailed routing could not eliminate. In this step, the sbox size is gradually increased to find and repair the remaining DRC violations.
Note: The clock tree routing has the highest priority.
1.4.6 Insert fillers
Filler cells connect the N-wells of each row of standard cells and improve the stability of the power supply network.
Insert redundant vias: Replace single vias with double vias as much as possible to improve the yield.
1.4.7 Export Files
Export the layout gds file and the Verilog gate-level netlist for use in subsequent processes.
1.5 Static Timing Analysis (STA)
Static timing analysis is a method of verifying the timing validity of a circuit by checking the timing information of all paths. Its principle is shown in Figure 18.
- Divide the design into several paths
- Calculate the delay of each path separately
- Check whether the delay of each path meets the requirements
1.5.1 Setup Time and Hold Time
Setup Time
The time during which the data must remain stable before the rising edge of the clock.
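The arrival time of the data at the D terminal of UFF1:
$T_{arrival}$ = $T_{launch}$ + $T_{ck2q}$ + $T_{comp}$
where $T_{launch}$ is the clock delay to the launching register, $T_{ck2q}$ is the CK-to-Q delay of the register, and $T_{comp}$ is the combinational logic delay.
The latest arrival time that still meets the setup requirement:
$T_{required}$ = $T_{clk}$ + $T_{capture}$ - $T_{setup}$
where $T_{capture}$ is the clock delay to the capturing register.
$T_{slack}$ = $T_{required}$ - $T_{arrival}$ > 0, that is, $T_{clk}$ + $T_{capture}$ - $T_{setup}$ - $T_{launch}$ - $T_{ck2q}$ - $T_{comp}$ > 0
Let $T_{capture}$ - $T_{launch}$ = $T_{skew}$, and after rearrangement:
$T_{clk}$ + $T_{skew}$ > $T_{ck2q}$ + $T_{comp}$ + $T_{setup}$
Methods to fix timing violations:
- Increase $T_{clk}$: Decrease the frequency.
- Decrease $T_{comp}$: Optimize combinational logic, divide the pipeline, and reduce the load on the critical path.
- Decrease $T_{ck2q}$: Replace with a faster sequential cell, such as HVT->LVT.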
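The setup check can be written as a short function: slack is the required time minus the arrival time. In this sketch the parameter names (`t_launch` and `t_capture` for the clock delays to the launching and capturing registers, `t_comp` for the combinational delay) and all numeric values are assumptions chosen for illustration.

```python
def setup_slack(t_clk, t_launch, t_capture, t_ck2q, t_comp, t_setup):
    """Setup slack in ns; a negative value is a setup violation."""
    arrival = t_launch + t_ck2q + t_comp        # data reaches the D pin
    required = t_clk + t_capture - t_setup      # latest allowed arrival
    return required - arrival


# Invented example: a 9.5 ns combinational path in a 10 ns clock period.
violating = setup_slack(t_clk=10.0, t_launch=1.0, t_capture=1.25,
                        t_ck2q=0.5, t_comp=9.5, t_setup=0.5)
# Fix by shortening the combinational path to 9.0 ns.
fixed = setup_slack(t_clk=10.0, t_launch=1.0, t_capture=1.25,
                    t_ck2q=0.5, t_comp=9.0, t_setup=0.5)
print(f"slack before fix: {violating} ns, after fix: {fixed} ns")
```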
Hold Time
The time during which the data must remain stable after the rising edge of the clock.
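The arrival time of the data at the D terminal of DFF1:
$T_{arrival}$ = $T_{launch}$ + $T_{ck2q}$ + $T_{comp}$
where $T_{launch}$ is the clock delay to the launching register, $T_{ck2q}$ is the CK-to-Q delay of the register, and $T_{comp}$ is the combinational logic delay.
The earliest time at which the data may change while still meeting the hold requirement:
$T_{required}$ = $T_{capture}$ + $T_{hold}$
where $T_{capture}$ is the clock delay to the capturing register.
$T_{slack}$ = $T_{arrival}$ - $T_{required}$ > 0, that is, $T_{launch}$ + $T_{ck2q}$ + $T_{comp}$ - $T_{capture}$ - $T_{hold}$ > 0
Let $T_{capture}$ - $T_{launch}$ = $T_{skew}$, and after rearrangement:
$T_{skew}$ + $T_{hold}$ < $T_{ck2q}$ + $T_{comp}$
Methods to fix timing violations:
- Increase $T_{comp}$: Increase the combinational path delay and insert buffers.
- Decrease $T_{skew}$: Even use a negative skew.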
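The hold check mirrors the setup check with the roles reversed: slack is the arrival time minus the required time, and the clock period does not appear. As before, the parameter names and numbers in this sketch are assumptions for illustration.

```python
def hold_slack(t_launch, t_capture, t_ck2q, t_comp, t_hold):
    """Hold slack in ns; a negative value is a hold violation."""
    arrival = t_launch + t_ck2q + t_comp   # new data reaches the D pin
    required = t_capture + t_hold          # old data must stay stable until here
    return arrival - required


# Invented example: a very short path with positive clock skew.
violating = hold_slack(t_launch=1.0, t_capture=1.5,
                       t_ck2q=0.25, t_comp=0.25, t_hold=0.25)
# Fix by inserting buffers that add 0.5 ns to the data path.
fixed = hold_slack(t_launch=1.0, t_capture=1.5,
                   t_ck2q=0.25, t_comp=0.75, t_hold=0.25)
print(f"slack before fix: {violating} ns, after fix: {fixed} ns")
```

Because the clock period is absent from the check, lowering the frequency cannot fix a hold violation.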
1.5.2 Input Files
db file: The same db file used for synthesis; libraries for multiple corners, such as ss and ff, are required
Gate-level netlist
Constraint file: .sdc
Back-annotation files: sdf, spef
SDF (Standard Delay Format): Describes the timing information of the design, including pin-to-pin delays of modules, clock-to-data delays, and interconnect delays. The sdf file can be used directly for post-layout simulation of the circuit.
SPEF (Standard Parasitic Exchange Format): Contains the RC values extracted for the nets of the design. It is the file format used to pass RC information from the extraction tool to the timing verification tool. Because SPEF provides the RC information itself, the resulting delay calculation is more accurate.
SDF back-annotation supplies precomputed cell delays and wire delays, whereas SPEF back-annotation supplies RC parameters from which delays are calculated. SDF back-annotation therefore runs faster than SPEF back-annotation.
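To show how RC parasitics turn into a wire delay, the sketch below applies the Elmore delay model to a simple RC chain. Elmore delay is used here only as a well-known first-order model; it is an assumption for illustration and not necessarily the delay calculation used by the STA tool in iFlow. The resistance and capacitance values are invented.

```python
def elmore_delay_fs(segments):
    """Elmore delay of an RC chain, in femtoseconds.

    segments is a list of (resistance_ohm, capacitance_fF) pairs ordered
    from driver to sink. Each resistor charges all capacitance downstream
    of it, and ohm * fF = fs.
    """
    delay = 0
    downstream_cap = sum(cap for _, cap in segments)
    for res, cap in segments:
        delay += res * downstream_cap
        downstream_cap -= cap
    return delay


wire = [(100, 10), (200, 20)]  # invented two-segment net
delay_fs = elmore_delay_fs(wire)
print(f"Elmore delay = {delay_fs / 1000} ps")
```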
2. Introduction to the Open Source EDA Process iFlow
2.1 Build iFlow
System environment: iFlow is supported on Ubuntu 20.04; versions earlier than 20.04 are not recommended.
Install dependent tools and libraries:
Tools
- build-essential 12.8
- cmake 3.16.3
- clang 10.0
- bison 3.5.1
- flex 2.6.4
- swig 4.0
- klayout 0.26
Library
- libeigen3-dev 3.3.7-2
- libbo