An Introduction to the Concepts of Timing and Delays in Verilog

The concepts of timing and delays within circuit simulations are very important because they allow a degree of realism to be incorporated into the modelling process. In Verilog, without explicit specification of such constraints, the outputs of pre-defined primitives and user-defined modules are all assumed to resolve instantaneously (or at least, within one simulator timestep). This, clearly, is not enough for a designer to work with, especially since the time taken for changes to propagate through a module may lead to race conditions in other modules. Some designs, such as high speed microprocessors, may have very tight timing requirements that must be met. Failure to meet these constraints may result in the design failing to work at all, or possibly even producing invalid outputs. Thus, while the obvious aim of the designer may be to produce a circuit that functions correctly, it is equally important that the circuit also conforms to any timing constraints required of it.

This page aims to provide an introduction to these concepts, and in particular, to the Verilog conventions and techniques for dealing with delays and timing.

Delays

Delays can be modelled in a variety of ways, depending on the overall design approach that has been adopted. These correspond neatly to the different levels of modelling that have already been introduced, namely gate-level modelling, dataflow modelling and behavioural modelling.

Gate level modelling

At this level, the delays to be considered are propagation delay through the gate, and the time taken for the output to actually change state. These changes of state are grouped into four categories based on the transition occurring. Each category of change of state has an associated delay, three of which can be specified by the designer, the fourth being computed from the other three. The three delays which can be specified, and the transitions for which they are relevant, are detailed below :

Table 1 : Transition Delays
Delay            Transitions
Rise Delay       0, x, z -> 1
Fall Delay       1, x, z -> 0
Turn-Off Delay   0, 1, x -> z

0, 1, x and z take their usual meanings of logic low, logic high, unknown and high impedance. (Reminder : High impedance means that the net is not directly being driven by anything and so is floating. Thus it has neither a high nor a low logic value.) Any or all of these delays can be specified for each gate by use of the delay token `#'. If only one value is specified, it is used for all three delays. If two are given, they are used for the rise- and fall-delays respectively. The turn-off delay (the time taken for the output to go to a high impedance state) is taken to be the minimum of these values. Alternatively, all three values can be explicitly set. The use of delays is illustrated below for the 2-input multiplexor given in an earlier example.

module multiplexor_2_to_1(out, cnt, a, b);

   /*
    *  A 2-1 1-bit multiplexor
    */
   
   output out;
   input  cnt, a, b;
   wire	  not_cnt, a0_out, a1_out;
   
   not # 2    n0(not_cnt, cnt);        /* Rise=2, Fall=2, Turn-Off=2 */
   and #(2,3) a0(a0_out, a, not_cnt);  /* Rise=2, Fall=3, Turn-Off=2 */
   and #(2,3) a1(a1_out, b, cnt);  
   or  #(3,2) o0(out, a0_out, a1_out); /* Rise=3, Fall=2, Turn-Off=2 */
   
endmodule /* multiplexor_2_to_1 */
    

(Since none of the gates used above are tri-state devices, the value for the Turn-Off delay should not be specified, and the internally calculated value for this delay will never be used in such gates.)

The fourth category of transitions is for a change of state to an unknown value (i.e. 0, 1, z -> x), and its delay value is taken to be the minimum of the above three.
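The multiplexor above never uses its turn-off delay, so a tri-state gate is needed to see all three values in action. The following is a minimal sketch rather than part of the original design : the module name tri_delay_demo and the delay values are assumptions chosen purely for illustration.

```verilog
module tri_delay_demo;

   /*
    *  Hypothetical test module : a tri-state buffer with all three
    *  delays given explicitly.
    */

   reg  data, enable;
   wire out;

   bufif1 #(2,3,4) b0(out, data, enable);  /* Rise=2, Fall=3, Turn-Off=4 */

   initial begin
      $monitor($time, " : data = %b  enable = %b  out = %b",
	       data, enable, out);
      data = 0; enable = 1;
      #10 data   = 1;    /* out should rise after the rise delay       */
      #10 data   = 0;    /* out should fall after the fall delay       */
      #10 enable = 0;    /* out should float after the turn-off delay  */
      #10 $finish;
   end

endmodule /* tri_delay_demo */
```

With these values, out would be expected to go high 2 time units after data rises, low 3 time units after data falls, and to z 4 time units after enable is removed.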

Dataflow modelling

As dataflow modelling does not use the concept of gates, but instead has the concept of signals or values, the approach taken to allow modelling of delays is slightly different. The delays are instead associated with the net (e.g. a wire) along which the value is transmitted. Since values can be assigned to a net in a number of ways, there are corresponding methods of specifying the appropriate delays.

Net Declaration Delay
The delay to be attributed to a net can be associated when the net is declared. Thereafter any changes of the signals being assigned to the net will only be propagated after the specified delay.

e.g. wire #10 out; assign out = in1 & in2;

If either of the values of in1 or in2 should happen to change before the assignment to out has taken place, then the assignment will not be carried out, as input pulses shorter than the specified delay are filtered out. This is known as inertial delay.

Regular Assignment Delay
This is used to introduce a delay onto a net that has already been declared.

e.g. wire out; assign #10 out = in1 & in2;

This has a similar effect to the code above, computing the value of in1 & in2 at the time that the assign statement is executed, and then storing that value for the specified delay (in this case 10 time units), before assigning it to the net out.

Implicit Continuous Assignment
Since a net can be implicitly assigned a value at its declaration, it is possible to introduce a delay then, before that assignment takes place.

e.g. wire #10 out = in1 & in2;

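It should be easy to see that this is effectively the declaration of the net and a regular assignment delay, rolled into one statement. The delay here belongs to the continuous assignment, rather than being a net declaration delay.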
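The filtering of short pulses can be seen with a small test module. This is only a sketch : the module name inertial_demo and the stimulus times are assumptions, and the regular assignment delay form is used.

```verilog
module inertial_demo;

   /*
    *  Hypothetical test module : a 3 unit pulse on in2 is shorter than
    *  the 10 unit assignment delay, so it should not appear on out.
    */

   reg  in1, in2;
   wire out;

   assign #10 out = in1 & in2;

   initial begin
      $monitor($time, " : in1 = %b  in2 = %b  out = %b", in1, in2, out);
      in1 = 1; in2 = 0;
      #20 in2 = 1;    /* start of a pulse shorter than the delay     */
      #3  in2 = 0;    /* pulse ends before out has been updated      */
      #20 in2 = 1;    /* this change is held for longer than 10      */
      #20 $finish;
   end

endmodule /* inertial_demo */
```

Here out should stay low throughout the short pulse, and only go high 10 time units after the final change to in2.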

Behavioural modelling

At this level of abstraction, the circuit is modelled by assigning values to variables, some of which correspond to the inputs and outputs of the module in question. Again, there are a number of different types of delay associated with this style of programming :

Regular Delay Control
This is the most common delay used - sometimes also referred to as inter-assignment delay control.

e.g. #10 q = x + y;

It simply waits for the appropriate number of timesteps before executing the command.

Intra-Assignment Delay Control
With this kind of delay, the value of x + y is stored at the time that the assignment is executed, but this value is not assigned to q until after the delay period, regardless of whether or not x or y have changed during that time.

e.g. q = #10 x + y;

This is similar to the delays used in dataflow modelling.
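The difference between the two forms only shows up when an operand changes during the delay period. The sketch below makes that happen; the module name delay_control_demo and the register names q_regular and q_intra are assumptions introduced for this illustration.

```verilog
module delay_control_demo;

   /*
    *  Hypothetical test module comparing regular and intra-assignment
    *  delay control.  x changes at time 5, part way through both delays.
    */

   reg [7:0] x, y, q_regular, q_intra;

   initial begin
      x = 1; y = 2;
      #5 x = 10;
   end

   /*  Regular delay : wait until time 11, then evaluate x + y.  */
   initial #11 q_regular = x + y;

   /*  Intra-assignment delay : evaluate x + y at time 1, assign at 11.  */
   initial #1 q_intra = #10 x + y;

   initial #12 begin
      $display("q_regular = %d, q_intra = %d", q_regular, q_intra);
      $finish;
   end

endmodule /* delay_control_demo */
```

q_regular sees the new value of x and so should end up as 12, whereas q_intra was computed from the old value and should end up as 3.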

Examples using delays

Given the earlier information on delays, it is now time to look at some designs that incorporate delays, and examine the effect that they have on their outputs.

The Full-Adder

The design below is for a full-adder, written using gate-level modelling techniques. (Note : The generate and propagate signals, G and P from the diagram, are not given as outputs here. However, some designs which attempt to improve on the overall data rate may make use of them, thus requiring them to be added to the list of module outputs - see the carry skip adder later on.) The code given specifies some of the delays described above - the first of the two graphs shows the output of an identical circuit but without any delays, while the second shows the actual output from the code below.

module full_adder(sum_out, carry_out, a, b, carry_in);
   
   /*
    * A gate-level model of a 1-bit full-adder
    */
   
   output carry_out, sum_out;
   input  carry_in, a, b;
   wire	  one_high, gen, prop;  /* generate is a reserved word in Verilog-2001 */
   
   xor #(3,2) x0(one_high, a, b);
   xor #(3,2) x1(sum_out, one_high, carry_in);
   and #(2,4) a0(gen, a, b);
   and #(2,4) a1(prop, one_high, carry_in);
   or  #(3)   o0(carry_out, gen, prop);
   
endmodule /* full_adder */
    

Note : The clk signal in the graphs below is not required for the operation of the circuit, and is provided purely to illustrate the delay in the output signals.

[Figure : Full Adder : Output - no delays]

[Figure : Full Adder : Output - delays as specified]

As can be seen from the output graphs, the effect of the delays on the timing of the outputs can be quite significant, possibly even resulting in the correct output for the sum not being available until after the inputs have changed again. This could lead to race conditions or worse, so the rate at which the inputs are allowed to change must be controlled. (See the section on setup and hold times.) Since this is usually governed by a clock, the clock period chosen must be longer than the maximum delay time between the inputs changing and the outputs settling - but this may differ depending on the actual inputs. For example, with the rise and fall delays for all of the gates given as above in the code for the full_adder, the output of logic high on the carry_out line will take between 5 and 8 time units to appear, if the module had previously been outputting a logic low. However, the transition to a logic low from a logic high, for the same output, will take between 7 and 9 time units.

Exercise 1 : Calculate the best and worst delays for both rising and falling transitions on the sum output. Answers

The timing constraints imposed upon each full adder must allow for the worst case of each of these transitions, so the inputs must stay constant for at least a period of 9 time units.
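The figures quoted for carry_out can be checked by simulation. The sketch below is self-contained, so it carries its own copy of the adder under the assumed name fa_delays (with the internal wires called gen and prop); the test module name and the stimulus are likewise assumptions made for illustration.

```verilog
module fa_delays(sum_out, carry_out, a, b, carry_in);

   /*  Copy of the full-adder above, renamed for this sketch.  */

   output carry_out, sum_out;
   input  carry_in, a, b;
   wire   one_high, gen, prop;

   xor #(3,2) x0(one_high, a, b);
   xor #(3,2) x1(sum_out, one_high, carry_in);
   and #(2,4) a0(gen, a, b);
   and #(2,4) a1(prop, one_high, carry_in);
   or  #(3)   o0(carry_out, gen, prop);

endmodule /* fa_delays */

module fa_delays_test;

   reg  a, b, carry_in;
   wire sum_out, carry_out;

   fa_delays dut(sum_out, carry_out, a, b, carry_in);

   initial begin
      $monitor($time, " : a = %b  b = %b  cin = %b  sum = %b  cout = %b",
	       a, b, carry_in, sum_out, carry_out);
      a = 0; b = 0; carry_in = 0;
      #20 a = 1; b = 1;   /* carry generated through a0 and o0   */
      #20 a = 0;          /* carry removed through a0 and o0     */
      #20 $finish;
   end

endmodule /* fa_delays_test */
```

With this stimulus, carry_out should rise 5 time units after the change at time 20 (2 for the and gate, 3 for the or gate), and fall 7 time units after the change at time 40 (4 for the and gate, 3 for the or gate). These are the shortest of the rising and falling cases quoted above.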

The Ripple Carry Adder

When these adders are combined, as in the 4-bit ripple carry adder below, the delays become cumulative, since the maximum delay for each carry_out to ripple (propagate) to the next unit must be allowed for in the overall design.

Exercise 2 : What delay is required before ALL of the outputs of a 4-bit ripple carry adder can be guaranteed to have settled? Answers

The code below uses the full_adder module defined earlier. The graphs show sample sections of the output signals, which illustrate the differences between a circuit using full_adders with no delays, and one using full_adders with delays as specified earlier.

module ripple_carry_4_bit(sum_out, carry_out, a, b, carry_in);

   /*
    *  A gate-level model of a 4-bit ripple carry adder
    */

   output [3:0]	sum_out;
   output	carry_out;
   input [3:0]	a, b;
   input	carry_in;
   wire [2:0]	ripple;
		
   full_adder f_a0(sum_out[0], ripple[0], a[0], b[0], carry_in);
   full_adder f_a1(sum_out[1], ripple[1], a[1], b[1], ripple[0]);
   full_adder f_a2(sum_out[2], ripple[2], a[2], b[2], ripple[1]);
   full_adder f_a3(sum_out[3], carry_out, a[3], b[3], ripple[2]);

endmodule /* ripple_carry_4_bit */
    

Note : The clk signal in the graphs below is not required for the operation of the circuit, and is provided purely to illustrate the delay in the output signals.

[Figure : 4-bit Ripple Carry Adder : Output - no delays]

[Figure : 4-bit Ripple Carry Adder : Output - delays as specified]

The delay required could be determined from the output graphs, if the worst case input vectors were used. The worst case input vectors are the ones that generate the longest overall delay through the design. For many complex designs, there may be no easy way of determining these vectors, but for the adder used in this example, it can be seen that the worst case vectors will be the ones that cause each full_adder module to propagate a carry from one stage to the next, as this has the longest critical path through the module.

Exercise 3 : Work out the worst case input vectors (i.e. a, b and carry_in) for the 4-bit ripple carry adder. Answers

Knowing the worst case vectors allows tests to be run to confirm the minimum period for which the inputs must be stationary. This is important as it determines the maximum data rate through that part of the circuit - often a crucial consideration in many modern designs. Such an analysis may result in an alternative solution, with a higher data rate, being required.

The Carry Skip Adder

The carry skip adder offers a significant speed improvement over the ripple carry adder, if the propagate signals from the individual full-adders are available. Combining these (using and gates) allows a propagate signal for the block to be generated. This extra signal means that in some cases, blocks will not need to wait for an earlier carry to ripple all the way through each of the earlier blocks. Rather, if it can be determined that a particular block (e.g. bits 4 to 7) will propagate any carry into that block, and the carry_in is already known, then that carry can skip around the block, and be passed into the next block (i.e. bits 8 to 11). This gives a considerable saving in time as the carry signal need now only pass through two gates - the AND and the OR - rather than the eight it would otherwise have to negotiate in the ripple_carry_4_bit module. For this to work, however, it is necessary to be able to set the carry_in of each of the blocks to LOW each time any of the inputs a or b are changed.

Exercise 4 : What happens if this is not done?
(Hint : Look at what happens when a block does not generate a carry.)
Answers

Exercise 5 : How could this be overcome?
(Hint : Consider changing the combinational logic between blocks.)
Answers

There are many other improved adder designs that are even faster than this, but they are beyond the scope of these examples.

Blocking and Non-blocking Assignments

As has just been seen, the two main types of delay used in behavioural model code are regular delays and intra-assignment delays. Although the differences in their actions may not be immediately obvious, they are perhaps best illustrated by the use of blocking and non-blocking assignments. Regular delays are most often used with blocking assignments, and intra-assignment delays are most often used with non-blocking assignments.
Blocking Assignments
Blocking assignments are the most basic of the assignment operations, and simply copy the value of the expression at the right hand side of the = operator to the variable on the left hand side. However, if two assignments that depend on each other are scheduled at the same time, e.g. an attempt to swap two variables, such as :

always @(posedge clk) a = b;

always @(posedge clk) b = a;

then a race condition occurs, and both a and b will end up holding the same value, namely one of the two original values. The value that they are both left with will depend on which of the assignments was scheduled first.

Non-blocking Assignments
Non-blocking assignments eliminate the possibility of race conditions in situations like this, as at the time that the assignment operation is executed the expression on the right hand side of the <= operator is copied to an internal temporary variable, which is then copied to the variable on the left hand side. All of the `reads' for a particular timestep are carried out before any of the `writes', and so values can be safely swapped as below :

always @(posedge clk) a <= b;

always @(posedge clk) b <= a;

This time, the code has the intended effect.
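A complete test module for the non-blocking swap might look like the following sketch. The module name swap_demo, the initial values and the clock timing are assumptions made for illustration.

```verilog
module swap_demo;

   /*
    *  Hypothetical test module : swaps a and b on a single rising
    *  clock edge using non-blocking assignments.
    */

   reg       clk;
   reg [7:0] a, b;

   always @(posedge clk) a <= b;

   always @(posedge clk) b <= a;

   initial begin
      clk = 0; a = 1; b = 2;
      #5 clk = 1;                   /* the only rising edge          */
      #1 $display("a = %d, b = %d", a, b);
      $finish;
   end

endmodule /* swap_demo */
```

Both right hand sides are read before either register is written, so a should finish as 2 and b as 1, whichever always block the simulator happens to run first.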

Without any explicit delays, all assignments take place in the same simulator time step, but this does not mean that they all execute simultaneously. The order of their execution is still important.

Below are four separate modules. Each uses a different combination of assignment type and delay type.

Exercise 6 : Look at the code below and, for each of the different modules, write out the time and the values of all of the registers, each time any of them changes value. Answers

Note : In the following examples, the event queuing system is assumed to behave as a first-in, first-out queue, with later events being added to the end of the queue, and events being read from the front. However, the implementation of the queuing system is not specified in the Verilog language specification, so this need not necessarily be the case. Hence, the order in which events scheduled for the same time step in separate blocks will occur is non-deterministic (i.e. cannot be predicted) and will depend on the particular implementation of the queuing system for the specific version of Verilog that you are running.
(On our system, this first-in, first-out ordering appears to be used.)

module blocking;

   reg[7:0] a, b, c, d, e;

   
   initial begin
      $monitor($time, " :\ta = %d\t", a,
	       "b = %d\tc = %d\t", b, c,
	       "d = %d\te = %d", d, e);
      #50 $finish;
   end

   initial begin
         a = 2;
         b = 5;
      #1 a = c;
      #1 a = d;
      #2 a = 4;
      #2 a = 7;
         b = 6;
      #2 a = d;
      $display("a, b - done");
   end

   initial begin
         c = 1;
         d = c;
         e = a;
      #2 e = d;
         c = 0;      
         d = 3;
      #5 c = a;
         d = 1;
         d = 2;
      $display("c, d, e - done");
   end
    
endmodule /* blocking */
 	    
module non_blocking;

   reg[7:0] a, b, c, d, e;

   
   initial begin
      $monitor($time, " :\ta = %d\t", a,
	       "b = %d\tc = %d\t", b, c,
	       "d = %d\te = %d", d, e);
      #50 $finish;
   end

   initial begin
         a <= 2;
         b <= 5;
      #1 a <= c;
      #1 a <= d;
      #2 a <= 4;
      #2 a <= 7;
         b <= 6;
      #2 a <= d;
      $display("a, b - done");
   end

   initial begin
         c <= 1;
         d <= c;
         e <= a;
      #2 e <= d;
         c <= 0;      
         d <= 3;
      #5 c <= a;
         d <= 1;
         d <= 2;
      $display("c, d, e - done");
   end
    
endmodule /* non_blocking */
 	    
module blocking_intra;

   reg[7:0] a, b, c, d, e;

   
   initial begin
      $monitor($time, " :\ta = %d\t", a,
	       "b = %d\tc = %d\t", b, c,
	       "d = %d\te = %d", d, e);
      #50 $finish;
   end

   initial begin
      a = 2;
      b = 5;
      a = #1 c;
      a = #1 d;
      a = #2 4;
      a = #2 7;
      b = 6;
      a = #2 d;
      $display("a, b - done");
   end

   initial begin
      c = 1;
      d = c;
      e = a;
      e = #2 d;
      c = 0;      
      d = 3;
      c = #5 a;
      d = 1;
      d = 2;
      $display("c, d, e - done");
   end
    
endmodule /* blocking_intra */
 	    
module non_blocking_intra;

   reg[7:0] a, b, c, d, e;

   
   initial begin
      $monitor($time, " :\ta = %d\t", a,
	       "b = %d\tc = %d\t", b, c,
	       "d = %d\te = %d", d, e);
      #50 $finish;
   end

   initial begin
      a <= 2;
      b <= 5;
      a <= #1 c;
      a <= #1 d;
      a <= #2 4;
      a <= #2 7;
      b <= 6;
      a <= #2 d;
      $display("a, b - done");
   end

   initial begin
      c <= 1;
      d <= c;
      e <= a;
      e <= #2 d;
      c <= 0;      
      d <= 3;
      c <= #5 a;
      d <= 1;
      d <= 2;
      $display("c, d, e - done");
   end
    
endmodule /* non_blocking_intra */
 	    

Delay Models

When modelling circuit delays, there are a number of options available to the modeller in terms of how to deal with attributing the delays around the circuit model. The three most commonly used techniques are distributed delay, lumped delay and pin-to-pin delay.

Distributed Delay
The distributed delay method requires delays to be assigned to every element of the circuit - then the delay between any two points can be calculated by adding together the delays of the components through which the signal being monitored passes.

Lumped Delay
This is similar to the distributed delay approach, except that it is only modules (rather than their component parts) that are assigned delays. Normally, the delay assigned to the module is the longest path through it, to ensure that the model reflects the worst case performance.

Pin-to-pin Delay
(This technique is sometimes also referred to as the path delay method.) Delays are specified for each input to output pin pairing, rather than being associated with specific elements. This can be advantageous as it means that details of the internals of the module need not be known for the analysis to be carried out.
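The first two approaches can be compared directly in code. The sketch below models the same logic, out = (a & b) | c, once with distributed delays and once with a lumped delay; the module names and delay values are assumptions chosen for illustration.

```verilog
module distributed(out, a, b, c);

   /*  Every gate carries its own delay.  */

   output out;
   input  a, b, c;
   wire   ab;

   and #4 a0(ab, a, b);
   or  #3 o0(out, ab, c);

endmodule /* distributed */

module lumped(out, a, b, c);

   /*  The longest path (4 + 3) is lumped onto the output gate.  */

   output out;
   input  a, b, c;
   wire   ab;

   and    a0(ab, a, b);
   or  #7 o0(out, ab, c);

endmodule /* lumped */

module delay_model_test;

   reg  a, b, c;
   wire out_d, out_l;

   distributed d0(out_d, a, b, c);
   lumped      l0(out_l, a, b, c);

   initial begin
      $monitor($time, " : a = %b  b = %b  c = %b  out_d = %b  out_l = %b",
	       a, b, c, out_d, out_l);
      a = 0; b = 0; c = 0;
      #20 c = 1;          /* short path : only the or gate         */
      #20 c = 0;
      #20 a = 1; b = 1;   /* long path : and gate, then or gate    */
      #20 $finish;
   end

endmodule /* delay_model_test */
```

A change on c should reach out_d after 3 time units but out_l only after 7, while a change through the and gate should take 7 time units in both models. This is the sense in which the lumped model reflects only the worst case.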

The behavioural modelling techniques mentioned earlier allow for the distributed delay and the lumped delay methods to be implemented without any further special commands. However, in order to use the pin-to-pin method, some way to specify the timings to use is required.

Pin to Pin Timing Specifications

Verilog provides a set of commands for just this purpose. These timing-related commands can only be used within a block delimited by the keywords specify and endspecify, which appears within a module definition in the same way that behavioural modelling code does in an initial begin...end or an always begin...end block. The specify blocks allow the timing for single- or multi-bit path delays to be configured, and also provide a convenient notation for simplifying any changes that may need to be made to a particular timing delay.



Parallel Connections

specify
  (a => out) = 9;
  (b => out) = 7;
endspecify

The => notation can only be used when the source and destination ports, a and out respectively in this case, are of the same (bit-)width. Hence a and out could both be single-bit ports, or both be multi-bit vectors of the same width. (e.g. input a; output out; or input [3:0] a; output [3:0] out;)

Full Connections

specify
  (a *> out) = 9;
endspecify

The *> notation may be used when every bit of the source port is to be associated with every bit of the destination port. The two ports need not be the same width. (e.g. input [3:0] a; output [7:0] out;)

"specparam" Statements

specify
  specparam a_to_out = 9;

  (a => out) = a_to_out;
endspecify

specparam statements are local definitions (i.e. local to this specify...endspecify block) that may simplify the task of changing values for a large set of delays. The use of these statements for all timing specifications is recommended. Should any of the delay values assigned to a set of connections change, it is now only necessary to change the value in the specparam statement, rather than all of the parallel or full connections.

Conditional Path Delays

specify
  specparam a_high = 2;
  specparam a_low  = 4;

  if  (a) (a => out) = a_high;
  if (~a) (a => out) = a_low;
endspecify

Conditional path delays (or state dependent path delays) can be used to set up different delays through a module according to the state of one or more control signals. The keyword if is the only one that can be used - unusually, there is no corresponding else. The control clause can be any normal expression.

Pin-to-pin timings can also be expressed in terms of rise-, fall- and turn-off times. (See the earlier section on gate level modelling.) Different delays can be specified for each possible signal transition, but only in certain combinations, and the order in which they are to be declared must be strictly observed. The allowable combinations limit the number of values that may be specified in any one statement to be 1, 2, 3, 6 or 12 only. The permitted combinations are as follows :

Table 2 : Pin-to-pin Transition Timings
Number of parameters   Used for...
1                      All transitions.
2                      Rise and Fall times.
                       Rise : 0 -> 1, 0 -> z, z -> 1
                       Fall : 1 -> 0, 1 -> z, z -> 0
3                      Rise, Fall and Turn-Off times.
                       Rise : 0 -> 1, z -> 1
                       Fall : 1 -> 0, z -> 0
                       Turn-Off : 0 -> z, 1 -> z
6                      The following transitions in this order :
                       0 -> 1, 1 -> 0, 0 -> z, z -> 1, 1 -> z, z -> 0
12                     The following transitions in this order :
                       0 -> 1, 1 -> 0, 0 -> z, z -> 1, 1 -> z, z -> 0,
                       0 -> x, x -> 1, 1 -> x, x -> 0, x -> z, z -> x

If the x transitions are not specified, a pessimistic approach is taken to ensure worst case timings. Any transition from an unknown (x) to a known (0, 1 or z) state will take the maximum of the specified times, while a transition from a known state to an unknown state will take the minimum of the specified times. (e.g. if 6 values have been specified, a 0 -> x transition will take the minimum of the delays specified for a 0 -> 1 or a 0 -> z transition.)
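As an illustration of the two-parameter form, the sketch below gives a simple buffer separate rise and fall path delays. The module names buf_path and buf_path_test and the specparam values are assumptions; note also that some simulators only honour specify blocks when an option to enable them is given.

```verilog
module buf_path(out, a);

   /*
    *  Hypothetical module : a zero-delay buffer whose timing is given
    *  entirely by a pin-to-pin path delay.
    */

   output out;
   input  a;

   buf b0(out, a);

   specify
      specparam t_rise = 3, t_fall = 5;

      (a => out) = (t_rise, t_fall);
   endspecify

endmodule /* buf_path */

module buf_path_test;

   reg  a;
   wire out;

   buf_path dut(out, a);

   initial begin
      $monitor($time, " : a = %b  out = %b", a, out);
      a = 0;
      #10 a = 1;    /* out should follow after the rise time  */
      #10 a = 0;    /* out should follow after the fall time  */
      #10 $finish;
   end

endmodule /* buf_path_test */
```

Where the specify block is taken into account, out should go high 3 time units after a rises, and low 5 time units after a falls.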

Setup and Hold Times

All of the examples that have been seen so far have been of combinatorial logic. However, timing is equally important in sequential logic, if not more so. Sequential elements, such as flip-flops, have set timing constraints that must be observed if they are to work correctly. Two of these, the setup and hold times, specify the amount of time for which the data input must not change before and after the rising clock edge, respectively. Failure to observe these constraints may result in unexpected behaviour from the element.

To facilitate checking for violations of these (and other) timing constraints, Verilog has a number of system tasks (identified by the `$' prefix). The two relevant calls here are $setup and $hold.

$setup(data_line, clk_line, limit); data_line is the name of the signal which is to be monitored for constraint violations, clk_line is the event (name and transition of the signal) with reference to which the timing constraints are measured, and limit is the period before the event on the clk_line (normally a rising edge) during which the data_line signal is not allowed to change. If the signal breaks this constraint, an error is generated.
$hold(clk_line, data_line, limit); $hold is very similar to the $setup system task, except that its first two arguments are in the opposite order, and that the period it specifies is after an event on the clk_line.

These (and the other timing-related functions) can only be called from within specify blocks. Such functions are not restricted to use with sequential circuits - they may be used on any circuit where events can be seen to occur with respect to some other event. Use of the $setup and $hold tasks is probably best illustrated by the examples after the next section.

Timescales

Up until now, all of the timing and delay values have been measured in terms of simulator timesteps, with no reference to real time. Verilog allows different timescales (mappings from simulator timesteps to real time) to be assigned to each module. The `timescale directive is used for this :

`timescale reference_time_units / time_precision

where reference_time_units and time_precision are values with a measurement - the two values need not use the same measurement (e.g. `timescale 10 us / 100 ns ), but can only be specified to the nearest 1, 10 or 100 units. The reference_time_units is the value attributed to the delay (#) operator, and the time_precision is the accuracy to which reported times are rounded during simulations.

`timescale directives can be given before each module to set up the timings for that module, and remain in force until overridden by the next such directive.
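The effect of the two values can be seen in a short sketch. The module name timescale_demo and the particular delay are assumptions chosen for illustration.

```verilog
`timescale 10 ns / 1 ns

module timescale_demo;

   /*
    *  Hypothetical test module : each # unit is 10 ns, and delays are
    *  rounded to the nearest 1 ns.
    */

   reg r;

   initial begin
      r = 0;
      #1.57 r = 1;    /* 15.7 ns requested, rounded to 16 ns */
      $display("r changed at %g time units", $realtime);
      #2 $finish;
   end

endmodule /* timescale_demo */
```

The delay of 1.57 units corresponds to 15.7 ns, which should be rounded to 16 ns by the 1 ns precision, so the time reported should be 1.6 units.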

Setup and Hold Example

The example chosen to illustrate the use of the $setup and $hold system tasks is an implementation of an iterative solution to the Towers of Hanoi problem, details of which can be found elsewhere. This solution also incorporates a Start button, which can be pressed and held down for as long as desired. Upon release of the button, the circuit will output the sequence of moves required to solve the puzzle for the set number of disks (in this case, 5).

The basic design of the system is illustrated below.

[Figure : The Towers of Hanoi Move Generator]

The code for this is also presented here :

/****************************************************************************\
 *                                                                          *
 *                            The Towers of Hanoi                           *
 *                                                                          *
\****************************************************************************/

/*
 *  Set up some global parameters, for ease of change.
 */

`define clk_period 20
`define setup_time 4
`define hold_time  1


/****************************************************************************\
 *                                                                          *
 *               'Basic building block' module definitions                  *
 *                                                                          *
\****************************************************************************/

module toggle(q, qbar, clk, toggle, reset);

   /*
    *  A mixed style model of a T-type (toggle) flip-flop,
    *  with a reset line and delays on the outputs.
    *  This first part is behavioural code.
    */

   output q, qbar;
   input  clk, toggle, reset;
   reg	  q;

   always @(posedge clk)
      if (reset == 1)
	 #5 q = 0;
      else if (toggle == 1)
	 #6 q = ~q;

   /*  This part is dataflow-style  */
   
   assign #1 qbar = ~q;

endmodule /* toggle */

module effr(q, clk, enable, reset, d);

   /*
    *  A behavioural model of an E-type (enable) flip-flop
    *  with a reset signal, and delays on the outputs.
    */

   output q;
   input  clk, enable, reset, d;
   reg	  q;

   /*
    *  This next block checks for timing violations of the 
    *  flip-flop's setup and hold times.
    */
   
   specify
      $setup(d, posedge clk, `setup_time);
      $hold(posedge clk, d, `hold_time);
   endspecify

   /*
    *  This is the actual code for the E-type.
    */
   
   always @(posedge clk)
      if (reset == 1)
	 #5 q = 0;
      else if (enable == 1)
	 #6 q = d;
   
endmodule /* effr */

module effs(q, clk, enable, set, d);

   /*
    *  Another behavioural model of an E-type, this time with
    *  a set line, and delays on the outputs.  The same timing
    *  checks as before are implemented here, too.
    */
   
   output q;
   input  clk, enable, set, d;
   reg	  q;
   
   specify
      $setup(d, posedge clk, `setup_time);
      $hold(posedge clk, d, `hold_time);
   endspecify
   
   always @(posedge clk)
      if (set == 1)
	 #5 q = 1;
      else if (enable == 1)
	 #6 q = d;
   
endmodule /* effs */



/****************************************************************************\
 *                                                                          *
 *    Now, the more complex modules for implementing the actual solution    *
 *                                                                          *
\****************************************************************************/

module evenSlice(bus, oneOut, zeroOut, clk, init, oneIn, zeroIn);

   /*
    *  A dataflow model of one bit slice of the full moves generator.
    *  The only differences between this module and the oddSlice one
    *  are in the initialisation values.  (Note the types of the
    *  flip-flops used.)
    */
   
   inout [3:0] bus;
   output      oneOut, zeroOut;
   input       clk, init, oneIn, zeroIn;
   
   wire	       enable, tq, tqbar;
   wire [1:0]  toPeg, fromPeg, new;
   
   toggle tog (tq, tqbar, clk, oneIn, init);
   effr   to0 (toPeg[0], clk, enable, init, new[0]);
   effs   to1 (toPeg[1], clk, enable, init, new[1]);
   effs from0 (fromPeg[0], clk, enable, init, toPeg[0]);
   effr from1 (fromPeg[1], clk, enable, init, toPeg[1]);

   assign #2 oneOut  =  oneIn & tq;
   assign #2 zeroOut = zeroIn & tqbar;
   assign #2 enable  = zeroIn & tq;
      
   assign #2 new[1] = ~(toPeg[1] & fromPeg[1]);
   assign #2 new[0] = ~(toPeg[0] & fromPeg[0]);
   
   assign bus = (enable == 1) ? {fromPeg, toPeg} : 4'bz;
   
endmodule /* evenSlice */

module oddSlice(bus, oneOut, zeroOut, clk, init, oneIn, zeroIn);

   /*
    *  See the comments for the evenSlice module.
    */
   
   inout [3:0] bus;
   output      oneOut, zeroOut;
   input       clk, init, oneIn, zeroIn;
   
   wire	       enable, tq, tqbar;
   wire [1:0]  toPeg, fromPeg, new;
   
   toggle tog (tq, tqbar, clk, oneIn, init);   
   effs   to0 (toPeg[0], clk, enable, init, new[0]);
   effs   to1 (toPeg[1], clk, enable, init, new[1]);
   effs from0 (fromPeg[0], clk, enable, init, toPeg[0]);
   effr from1 (fromPeg[1], clk, enable, init, toPeg[1]);

   assign #2 oneOut  =  oneIn & tq;
   assign #2 zeroOut = zeroIn & tqbar;
   assign #2 enable  = zeroIn & tq;
   
   assign #2 new[1] = ~(toPeg[1] & fromPeg[1]);
   assign #2 new[0] = ~(toPeg[0] & fromPeg[0]);
   
   assign bus = (enable == 1) ? {fromPeg, toPeg} : 4'bz;
   
endmodule /* oddSlice */

module start_button(go, clk, press);

   /*
    *  A gate level model of the start button, with the functionality
    *  as described elsewhere.
    */

   output  go;
   input   clk, press;
   wire	   e_out, not_press;
   supply1 vdd;

   /*
    *  This block checks that the pulse width on the input line is
    *  at least 3, otherwise it is invalid.
    */
   
   specify
      specparam min_time = 3;
      $width(posedge press, min_time);
   endspecify
   
   effs       st_0 (e_out, clk, vdd, press, vdd);
   not #(1)    n_0 (not_press, press);
   and #(2,1)  a_0 (go, e_out, not_press);

endmodule /* start_button */
	   
module tower(from_peg, to_peg, done, clk, start);

   /*
    *  This is a dataflow model of the actual move generator - to 
    *  be thought of as a stack (or tower) of modules, each of which
    *  deals with one disk of the puzzle.
    * 
    *  It brings together all of the other modules, and presents a
    *  clean interface to the outside world, taking a 'start' signal
    *  and returning a 'done' signal, once the sequence has been 
    *  completed.
    */
   
   output [1:0]	from_peg, to_peg;
   output	done;	
   input	clk, start;
   wire [4:0]	oneOut, zeroOut;
   wire [3:0]	bus;
   wire		init;		
   supply1	vdd;

   start_button  st_0(init, clk, start);
   oddSlice  rung0 (bus, oneOut[0], zeroOut[0], clk, ~init, vdd, vdd);
   evenSlice rung1 (bus, oneOut[1], zeroOut[1], clk, ~init,
		         oneOut[0], zeroOut[0]);   
   oddSlice  rung2 (bus, oneOut[2], zeroOut[2], clk, ~init,
		         oneOut[1], zeroOut[1]);
   evenSlice rung3 (bus, oneOut[3], zeroOut[3], clk, ~init,
		         oneOut[2], zeroOut[2]);
   oddSlice  rung4 (bus, oneOut[4], zeroOut[4], clk, ~init,
		         oneOut[3], zeroOut[3]);

   assign from_peg = bus[3:2];
   assign to_peg   = bus[1:0];
   assign done     = oneOut[4];

endmodule /* tower */


/****************************************************************************\
 *                                                                          *
 *  The final stimulus module is used to check that the tower module works  *
 *  properly                                                                *
 *                                                                          *
\****************************************************************************/

module stimulus;

   /*
    *  This is a behavioural model.  It simply instantiates the tower
    *  module, provides it with inputs and monitors its outputs.
    */

   reg	      clk, button;
   wire [1:0] from, to;
   wire	      done;
   
   tower t_0(from, to, done, clk, button);
   
   initial begin
      clk = 0;
      forever #(`clk_period / 2) clk = ~clk;
   end
   
   initial begin
      button = 0;      
      #40 button = 1;
      #50 button = 0;
   end

   always @(posedge clk)
      #(`clk_period - 1) $display($time, "  From peg %d To peg %d", 
				  from, to);

   always @(posedge clk)
      if (done == 1) #`clk_period $stop;
   
endmodule /* stimulus */
	   
    

The design has been implemented using bit slice techniques to allow for easy extension to different numbers of bits (which correspond to the number of disks in the problem). Each new disk to be catered for requires the counter and the disk selector to be extended, and a new move generator with its tri-state outputs to be attached to the bus. This is a very good methodology to adopt when designing circuits that may be extended. In this case, another point to be borne in mind is that the move generators need to be initialised to different values depending on whether the number of disks is even or odd.

The code makes use of many of the delay techniques covered earlier, as well as the setup and hold checks.
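
As a reminder of what such checks look like in isolation, the following is a minimal sketch of a D-type flip-flop with setup and hold checks, together with a stimulus that deliberately changes the data input too close to a clock edge. The module names (dff_checked, dff_checked_stimulus) and the limits of 2 and 1 time units are assumptions made for illustration; they are not taken from the design above.

```verilog
module dff_checked(q, clk, d);

   /*
    *  Hypothetical example: a behavioural D-type flip-flop with
    *  setup and hold checks.  The limits are assumed values.
    */

   output q;
   input  clk, d;
   reg    q;

   specify
      specparam t_setup = 2, t_hold = 1;
      $setup(d, posedge clk, t_setup);   /* d stable before the edge */
      $hold(posedge clk, d, t_hold);     /* d stable after the edge  */
   endspecify

   always @(posedge clk)
      q <= #1 d;

endmodule /* dff_checked */

module dff_checked_stimulus;

   reg  clk, d;
   wire q;

   dff_checked f_0(q, clk, d);

   initial begin
      clk = 0;
      forever #10 clk = ~clk;   /* rising edges at 10, 30, ... */
   end

   initial begin
      d = 0;
      #5  d = 1;    /* time 5  : 5 units before the edge at 10 */
      #24 d = 0;    /* time 29 : 1 unit before the edge at 30  */
      #11 $finish;  /* time 40 */
   end

endmodule /* dff_checked_stimulus */
```

The change of d at time 29 falls inside the setup window of the rising edge at time 30, so a simulator that implements timing checks should report a setup violation. The exact wording of the message depends on the simulator.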

Exercise 7 : Copy the above code to a file, and run it through the Verilog compiler. Now change the clock period to half of its current value, and run the code again. What happens? Answers

Setup and hold violations allow the maximum rate at which data can be clocked through the sequential elements of a circuit to be determined. However, there may be other constraints on the circuit which affect the overall data rate.
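
One example of such a constraint is that a signal launched from one register must pass through any intervening logic and still meet the setup time of the next register before the following clock edge. A minimal sketch of how this might be expressed as a $period check is given below. The module name and all three delay values are hypothetical, chosen only to make the arithmetic visible, and do not describe the design above.

```verilog
module period_checked(q, clk, d);

   /*
    *  Hypothetical example: the minimum clock period is taken to be
    *  the sum of an assumed clock-to-output delay, an assumed logic
    *  delay and an assumed setup time.
    */

   output q;
   input  clk, d;
   reg    q;

   specify
      specparam t_clk_to_q = 1, t_logic = 4, t_setup = 2;
      specparam t_min_period = t_clk_to_q + t_logic + t_setup;

      /*  Report any two rising edges closer than t_min_period,
       *  which is 7 with the values assumed here.
       */
      $period(posedge clk, t_min_period);
   endspecify

   always @(posedge clk)
      q <= #(t_clk_to_q) d;

endmodule /* period_checked */
```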

Exercise 8 : Change the clock period to 15, and re-run the code. What happens this time? Why does this occur? Answers

Exercise 9 : Determine the maximum clock frequency with which the circuit will function correctly. Answers

This example should have given a good idea of the sort of techniques employed in modelling circuits, making use of delays and timing checks. Obviously, it has not covered all of the concepts presented earlier, but has shown a typical use of many of them.

Negative Edge Triggering

As a final note, the issue of clocking on both positive and negative edges should be addressed. While this may seem to be an attractive option, for example clocking data signals on the rising edge and control signals on the falling edge, it normally does not have the intended effect of doubling the maximum clock frequency of the circuit in question.

Exercise 10 : Why not? Answers

Consequently, while there may not be any directly adverse effects of using both positive and negative edges of a clock, common synchronous design practices tend to shy away from this, preferring to keep the design 'clean' by using only one edge of the clock signal to latch all values.
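
For reference, the arrangement described above, with data clocked on the rising edge and control on the falling edge, might look like the sketch below. The module and signal names are hypothetical and the delays of 1 are assumed values.

```verilog
module two_edge(data_q, ctrl_q, clk, data_d, ctrl_d);

   /*
    *  Hypothetical example: a data register clocked on the rising
    *  edge, gated by a control register clocked on the falling edge.
    */

   output data_q, ctrl_q;
   input  clk, data_d, ctrl_d;
   reg    data_q, ctrl_q;

   /*  Data is latched on the rising edge, if the control bit is set.  */
   always @(posedge clk)
      if (ctrl_q) data_q <= #1 data_d;

   /*  Control is latched on the falling edge.  */
   always @(negedge clk)
      ctrl_q <= #1 ctrl_d;

endmodule /* two_edge */
```

Tracing how long ctrl_q has to settle before the rising edge that samples it is a useful way into Exercise 10.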


Last modified: Mon Oct 27 11:43:15 GMT 1997 by
Gerard M. Blair