#### **RTL Synthesis**

- RTL = Register Transfer Level
- RTL code (Verilog, VHDL, or another HDL) completely specifies
  - All registers
  - Logic operations
  - Arithmetic operations
- Synthesis converts these into a gate-level netlist that meets some combination of area and delay constraints
  - Boolean minimization techniques are used to improve both area and speed
  - Different area/speed constraints produce different gate-level netlists, but the netlists are functionally equivalent

BR 6/01

## Random Logic vs Arithmetic RTL

- General Boolean minimization techniques work well for random logic to meet area/speed constraints
- This is not true for arithmetic operations (addition, multiplication, etc.)
  - The design space is too large, and the resulting netlists are usually sub-optimal compared to structured netlists


#### An Example

- What should be synthesized for 'y <= a + b', where y, a, and b are 32-bit values?
- Many different adder structures to choose from:
  - Ripple carry: slow, but area efficient
  - Carry select: faster than ripple, but more gates
  - Carry lookahead (CLA): fastest of these architectures when built from general logic gates, but requires many gates
- Need a methodology that the RTL synthesis tool can use to choose between various architectures for an arithmetic operation based on speed/area constraints
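
As a minimal sketch, the RTL for this example might look like the following. The entity name `add32` and the use of `ieee.numeric_std` are assumptions made for illustration. The point is that the `+` operator states only what to compute and leaves the choice of adder structure to the synthesis tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity name; only the "+" on 32-bit operands comes from the slide.
entity add32 is
  port ( a, b : in  std_logic_vector(31 downto 0);
         y    : out std_logic_vector(31 downto 0) );
end entity;

architecture rtl of add32 is
begin
  -- No adder structure is described here; the tool must infer one
  -- (ripple, carry select, CLA, ...) from the area/speed constraints.
  y <= std_logic_vector(unsigned(a) + unsigned(b));
end architecture;
```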




- For example, in one technology a 10-bit ripple adder might be faster than a 10-bit CLA, while in a different technology the opposite is true.


# Example Technology Mapping

- In just about any ASIC technology (i.e., standard cell or gate array), a 12-bit adder is faster as a CLA structure than as a ripple structure
- In the LUT4 (4-input lookup table) FPGA technologies from Xilinx and Altera, the opposite is true
  - The basic programmable cell can implement a two-bit sum and has fast carry logic as part of the cell
  - The delay through a LUT4 plus programmable routing is much greater than the delay through the dedicated carry logic and routing between cells
  - This means that ripple chains stay more effective than CLA structures up to higher values of N than in other technologies




## Synopsys DesignWare

- *Design Compiler* is the basic RTL synthesis tool from Synopsys
- DesignWare components and libraries are the mechanism by which a user can define custom implementations and technology mappings for arithmetic operations
- The DesignWare Foundation Basic library already has architectures that trade off area/speed for many arithmetic operations
  - These architectures (e.g., ripple vs. CLA) are based on generic logic gates; the tool uses timing information from the technology library plus area/time constraints to pick an architecture




# Unsigned vs Signed Types

- For addition, unsigned and signed addition use the same hardware, so it does not matter which we use
- For other operations, like multiplication, it makes a difference
  - Signed means 2's complement representation
  - Different hardware is required for signed vs. unsigned multiply
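
The difference can be checked in simulation. The sketch below is only an illustration: it assumes the `ieee.numeric_std` types rather than the `std_logic_arith` package used elsewhere in these slides, and the entity name and 4-bit operand values are hypothetical. The same bit patterns give identical sums but different products.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity signed_vs_unsigned is
end entity;

architecture sim of signed_vs_unsigned is
begin
  process
    constant a : std_logic_vector(3 downto 0) := "1111";  -- 15 unsigned, -1 signed
    constant b : std_logic_vector(3 downto 0) := "0010";  -- 2 in both views
    variable sum_u, sum_s   : std_logic_vector(3 downto 0);
    variable prod_u, prod_s : std_logic_vector(7 downto 0);
  begin
    -- Same-width addition: both interpretations give the same bits.
    sum_u := std_logic_vector(unsigned(a) + unsigned(b));
    sum_s := std_logic_vector(signed(a) + signed(b));
    assert sum_u = sum_s report "sums differ" severity failure;

    -- Multiplication: the full-width products differ.
    prod_u := std_logic_vector(unsigned(a) * unsigned(b));  -- 15 * 2 = 30
    prod_s := std_logic_vector(signed(a) * signed(b));      -- -1 * 2 = -2
    assert prod_u = "00011110" report "unexpected unsigned product" severity failure;
    assert prod_s = "11111110" report "unexpected signed product" severity failure;
    wait;
  end process;
end architecture;
```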


#### Dware Cache

- When Design Compiler builds a DesignWare component of a particular type, architecture, and size (e.g., adder, ripple, 8 bits), the result is cached so that the next build is faster
  - Caches both structure and timing information
  - Cache resides under ~/synopsys\_cache\_\* (exact directory name is version dependent)
- Choosing a particular architecture means that DC has to build architectures of different types at the required word size, then evaluate each against the area/time constraints


#### Controlling the Architecture Choice

- Can directly control which architecture is used for a particular operator by using a *Synopsys pragma* to specify the architecture
  - A pragma is a control directive embedded in a comment
- This can be useful if a particular implementation is required as a starting point for optimization
  - NOTE: the chosen architecture is only a starting point; it will still be modified according to synthesis options
  - E.g., if you start with a ripple adder and synthesize for speed, synthesis transformations will modify it beyond recognition, and the result will probably not be as good as one that started from a CLA structure






#### Component Instantiation

Can also instantiate a component directly instead of using operator inference.

<pre>
library ieee, synopsys, DW01;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use synopsys.attributes.all;
use DW01.DW01_components.all;
-- instantiate DW01 component directly
entity adder is
  generic ( N : integer := 16 );
  port ( a, b : in std_logic_vector(N-1 downto 0);
         sum : out std_logic_vector(N-1 downto 0) );
end adder;
architecture a of adder is
  attribute implementation : STRING;
  -- [one line illegible in the source; it appears to apply the implementation attribute to U1]
  signal l0 : std_logic;
begin
  l0 &lt;= '0';
  U1: DW01_add generic map (width =&gt; N)
      port map (CI =&gt; l0, A =&gt; a, B =&gt; b, SUM =&gt; sum,
                CO =&gt; open);
end a;
</pre>



#### Tutorial on Creating Dware libraries

- The Synopsys software resides at /opt/ecad/synopsy/default (call this \$synopsys)
- *\$synopsys/doc/online/dw/dwdg/dwdg\_2.pdf* contains a tutorial on creating a custom Dware library
- The tutorial creates an adder that has an ov (overflow) output and two architectures: ripple ("rpl") and carry lookahead ("cla")
  - I will not attempt to repeat the entire tutorial here, just hit the high points
  - You will need to read this tutorial in order to complete the next assignment












#### Carry Select comments

- Critical path is still the carry
- The goal is to match the carry-path delay arriving at the select input of a stage's sum mux to the delay of that stage's rpl adder
  - Can increase the ripple size at each stage because the carry delay to the mux select gets longer
  - Exact choice of sizes for each stage depends on gate delays
- In your implementation, choose your own stage sizes
  - You CANNOT make them all the same size; you must choose some scheme for gradual increase
  - You know that N will be a maximum of 32, so just pick some progression of sizes (like 4-5-7-7-9 or 4-5-7-8-8 or whatever)
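
The select idea can be sketched behaviorally for a fixed width. This is a minimal illustration only, not the parameterized structural design built from library cells that the assignment asks for. The entity name `csel8_sketch`, the 3 + 5 stage split, and the use of `ieee.numeric_std` are all assumptions made for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 8-bit example: a 3-bit first stage and a 5-bit select stage.
entity csel8_sketch is
  port ( a, b : in  std_logic_vector(7 downto 0);
         ci   : in  std_logic;
         sum  : out std_logic_vector(7 downto 0);
         co   : out std_logic );
end entity;

architecture behv of csel8_sketch is
  signal ci_v     : unsigned(0 downto 0);  -- carry in as a 1-bit vector
  signal lo       : unsigned(3 downto 0);  -- bits 2..0 plus carry out
  signal hi0, hi1 : unsigned(5 downto 0);  -- bits 7..3 plus carry out
  signal c3       : std_logic;             -- carry into bit 3
begin
  ci_v(0) <= ci;

  -- Stage 0: ordinary addition of the low bits with the external carry in.
  lo <= ('0' & unsigned(a(2 downto 0))) + unsigned(b(2 downto 0)) + ci_v;
  c3 <= lo(3);

  -- Stage 1: both possible results are computed in parallel,
  -- one assuming carry in = 0 and one assuming carry in = 1.
  hi0 <= ('0' & unsigned(a(7 downto 3))) + unsigned(b(7 downto 3));
  hi1 <= ('0' & unsigned(a(7 downto 3))) + unsigned(b(7 downto 3)) + 1;

  -- The stage-0 carry only drives the mux select, not a ripple chain.
  sum(2 downto 0) <= std_logic_vector(lo(2 downto 0));
  sum(7 downto 3) <= std_logic_vector(hi1(4 downto 0)) when c3 = '1'
                     else std_logic_vector(hi0(4 downto 0));
  co <= hi1(5) when c3 = '1' else hi0(5);
end architecture;
```

The second stage is wider than the first because its two candidate sums have until the stage-0 carry arrives to settle, which is the reason for the gradually increasing stage sizes.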


#### Archive dware.zip

Unpacks a directory *dware\_tut.students*. Important files:

- *gcmos.lib*, *gcmos.db*: the GCMOS library
- *DWSL\_addov.vhd*, *DWSL\_addov\_cla.vhd*, *DWSL\_addov\_rpl.vhd*, *DWSL\_addov\_csel.vhd*: you must modify the 'csel' architecture
- *analyze\_dwsl.script*: pass this script to *dc\_shell* to compile all of the DWSL\*.vhd files. You must rerun it after any change to the DWSL\*.vhd files
- *rtl/{adder\_cla.vhd, adder\_rpl.vhd, adder\_csel.vhd}*: VHDL files that have manual component instantiations of addov for the three respective architectures


#### Archive dware.zip

- *adder\_sample.script*: a sample *dc\_shell* script for synthesizing one of the 'rtl/adder\*.vhd' designs for a particular bit width. Modify this script or use the Perl script below. The synthesized design is written to the 'gate/' directory.
- *make\_design.pl*: a Perl script that helps generate designs for different architectures and N values, so you do not have to write a separate *dc\_shell* script for each case. A sample run is:

make\_design.pl adder.template %arch%=rpl %dly%=0 %N%=16

will substitute the values shown for the corresponding strings in the adder.template file to create a new *dc\_shell* script.

If you don't feel comfortable using this script, simply write your own dc\_shell script for each case you want to test.




Makefile.dw test is included in this directory


## Testbench in *dw\_test*

- Files *tb16.vhd* and *tb28.vhd* are two testbenches for testing 16-bit and 28-bit adder implementations
  - Each generates 100 pairs of random numbers, computes the sum using the *addov* component, and prints the result
  - Configurations are included in each testbench file for the three gate-level architectures
- After generating a gate-level implementation (e.g., *dware\_tut/gate/adder16\_rpl.vhd*):
  - Copy it to the *dw\_test* directory and edit the file to remove the entity declaration for adder
  - Make sure the architecture name and file name for the gate-level architecture match what is expected by the configurations in the testbench files, and also by the makefile
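
A random-vector testbench of this general kind could look like the sketch below. It is not the contents of *tb16.vhd*: the names `adder_ref` and `tb_sketch` are hypothetical, the device under test is a behavioral stand-in rather than a gate-level architecture bound through a configuration, and the sketch checks each sum with an assertion instead of printing it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical stand-in for the device under test.
entity adder_ref is
  generic ( N : integer := 16 );
  port ( a, b : in  std_logic_vector(N-1 downto 0);
         sum  : out std_logic_vector(N-1 downto 0) );
end entity;

architecture behv of adder_ref is
begin
  sum <= std_logic_vector(unsigned(a) + unsigned(b));
end architecture;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;

entity tb_sketch is
end entity;

architecture sim of tb_sketch is
  constant N : integer := 16;
  signal a, b, sum : std_logic_vector(N-1 downto 0);
begin
  dut : entity work.adder_ref
    generic map ( N => N )
    port map ( a => a, b => b, sum => sum );

  stimulus : process
    variable seed1  : positive := 1;
    variable seed2  : positive := 2;
    variable r      : real;
    variable av, bv : unsigned(N-1 downto 0);
  begin
    for i in 1 to 100 loop
      -- uniform returns a pseudo-random real strictly between 0.0 and 1.0
      uniform(seed1, seed2, r);
      av := to_unsigned(integer(floor(r * real(2**N))), N);
      uniform(seed1, seed2, r);
      bv := to_unsigned(integer(floor(r * real(2**N))), N);
      a <= std_logic_vector(av);
      b <= std_logic_vector(bv);
      wait for 10 ns;
      -- Compare against the N-bit (wrapping) reference sum.
      assert unsigned(sum) = av + bv
        report "mismatch at vector " & integer'image(i)
        severity error;
    end loop;
    wait;
  end process;
end architecture;
```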


#### Approach

- Read through the DW tutorial referenced previously
  - You do not have to make the file modifications mentioned in the tutorial; I have already made the changes and converted the files to use the GCMOS library
- Try generating a couple of different-sized adders for the "rpl" and "cla" architectures
- Make sure you can simulate these using the *dw\_test* library (you might even want to write a testbench for a different-sized adder, like N=20, to ensure that you understand the files)


## Approach (cont.)

- Look at the code in *DWSL\_addov\_rpl.vhd* and *DWSL\_addov\_cla.vhd* to understand the GENERATE approach for creating the adder structure
  - Look at the PLD model we covered for more examples of GENERATE statements
- Fill in the architecture of *DWSL\_addov\_csel.vhd* to create a parameterized carry-select adder
  - Should primarily use the fa (full adder) and mux2to1 cells
  - The function of the full adder (fa) cell is not specified in gcmos.lib, so Synopsys treats it as a black box. This means the gate-level structure will not be modified by synthesis constraints, which makes it easier to debug.
- Generate designs of size N=16 and N=28 and test them via the *dw\_test* testbench; your adder should generate the same results as the other architectures
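
For readers new to GENERATE, a parameterized ripple structure can be sketched as follows. This is not the code in *DWSL\_addov\_rpl.vhd*: the entity names `fa_sketch` and `rpl_sketch` and their port names are assumptions, and the full adder is modeled behaviorally here instead of being taken from gcmos.lib.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical full adder cell; the real fa cell's port names may differ.
entity fa_sketch is
  port ( a, b, ci : in  std_logic;
         s, co    : out std_logic );
end entity;

architecture behv of fa_sketch is
begin
  s  <= a xor b xor ci;
  co <= (a and b) or (a and ci) or (b and ci);
end architecture;

library ieee;
use ieee.std_logic_1164.all;

entity rpl_sketch is
  generic ( N : integer := 16 );
  port ( a, b : in  std_logic_vector(N-1 downto 0);
         ci   : in  std_logic;
         sum  : out std_logic_vector(N-1 downto 0);
         co   : out std_logic );
end entity;

architecture structural of rpl_sketch is
  signal c : std_logic_vector(N downto 0);  -- c(i) is the carry into bit i
begin
  c(0) <= ci;

  -- One full adder per bit; the carry out of bit i feeds bit i+1.
  bits : for i in 0 to N-1 generate
    fa_i : entity work.fa_sketch
      port map ( a => a(i), b => b(i), ci => c(i),
                 s => sum(i), co => c(i+1) );
  end generate;

  co <= c(N);
end architecture;
```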


## How to get help in Synopsys

Within *dc\_shell*, you can type "man *command\_name*" to bring up the man page for that command.

Extensive PDF documents are at \$synopsys/doc/online:

- *synth/* contains all documents for the synthesis tools
  - *synth/dcrm* has the *dc\_shell* reference manual
  - *synth/dcug* has the *dc\_shell* user guide. Both of these are good places to look for answers to questions about *dc\_shell*
- *dw/dwug* has the user guide for DesignWare (basic concepts, usage examples)
- *dw/dwdg* has notes for creating custom libraries, including the tutorial


### Before you ask questions

- Have you looked at all of the files/examples?
  - Have you looked inside the files and attempted to understand the particular VHDL or *dc\_shell* commands being used?
  - Have you looked at the input files required by the script and output files produced by it?
- Have you looked at the Synopsys PDF documentation?
- Have you used the 'man' facility in dc\_shell?
