EE 8273 Test 2 - Fall '02 Solutions - Reese

Work all problems. Closed book, closed notes.

1. (6 pts) What is the most important factor in a register file design: speed, area, or power? JUSTIFY your answer.

SPEED. The register file must provide operands for new instructions each clock cycle and must also retire values from finished instructions – it is directly in the critical path.

2. (6 pts) When are sense-amplifiers required in an SRAM design?

Sense Amps are needed when bitlines do not have full swing – they must convert limited swing signals to full swing 0, 1 values.

3. (7 pts) Give two advantages for using a hierarchical decoding scheme. You must JUSTIFY your answers.

Speed – smaller blocks mean lower word line loading.

*Power – only the block being accessed has to be enabled, which saves power.* 

4. (6 pts) Assume I have 1M x 32 SRAM and a hierarchical decoding scheme that limits each block to 512 rows x 64 columns. How many blocks do I need?

 $1M \times 32 = 2^{20} \times 2^5 = 2^{25}$  bits. Each block has  $512 \times 64 = 2^9 \times 2^6 = 2^{15}$  bits.

Number of blocks needed is  $2^{25}/2^{15} = 2^{10} = 1024$  blocks.
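The arithmetic above can be checked with a short Python sketch. The function names and the address split are illustrative assumptions, not part of the original solution; the split assumes each 64-column row holds two 32-bit words and that all sizes are powers of two.

```python
# Sketch of the block-count arithmetic for question 4.
# Names below (block_count, address_split) are hypothetical helpers.

WORDS = 2**20       # 1M words
WORD_BITS = 32      # bits per word
BLOCK_ROWS = 512    # rows per block
BLOCK_COLS = 64     # columns (bits) per row in a block


def block_count(words, word_bits, rows, cols):
    """Number of blocks needed to hold words * word_bits bits."""
    total_bits = words * word_bits
    block_bits = rows * cols
    return -(-total_bits // block_bits)  # ceiling division


def address_split(words, word_bits, rows, cols):
    """Assumed split of the word address into block, row, and column fields.

    Only valid when every size is a power of two.
    """
    words_per_row = cols // word_bits
    blocks = block_count(words, word_bits, rows, cols)
    return {
        "block_bits": (blocks - 1).bit_length(),
        "row_bits": (rows - 1).bit_length(),
        "column_bits": (words_per_row - 1).bit_length(),
    }


if __name__ == "__main__":
    print(block_count(WORDS, WORD_BITS, BLOCK_ROWS, BLOCK_COLS))
    print(address_split(WORDS, WORD_BITS, BLOCK_ROWS, BLOCK_COLS))
```

Under these assumptions the three address fields add up to 20 bits, which matches the 1M-word address space.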

5. (6 pts) What is the most important factor for DRAM design – speed, area, or power? JUSTIFY your answer.

AREA. DRAMs are meant for main memory, where density is all-important. That is why each cell uses only one bit line, one capacitor, and one access transistor.

6. (6 pts) Draw the basic storage cell for a DRAM design.



7. (6 pts) What lines are typically most heavily loaded in an SRAM design – word lines or bit lines? Explain your answer.

*Wordlines* – *they drive gates of access transistors; Bitlines only have source/drain diffusion capacitance.* 

8. (6 pts) The folded bitline architecture in DRAMs is basically a common-mode noise rejection scheme. Where does this noise come from? Be very SPECIFIC.

Noise source is wordline to bitline coupling. Folded bitline has wordline crossing both bitlines that go to sense-amp, so same noise is injected into both bitlines, and appears as common mode noise which is rejected.
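The common-mode argument can be illustrated numerically. This is a minimal sketch with made-up voltages (all values are hypothetical), modeling the sense amp as an ideal subtractor of the two bitline voltages.

```python
# Sketch: why noise coupled equally onto both bitlines cancels at the sense amp.
# All voltages are hypothetical illustration values, in volts.

def sensed_difference(v_bit, v_ref, noise_on_bit, noise_on_ref):
    """Differential input seen by an ideal sense amp."""
    return (v_bit + noise_on_bit) - (v_ref + noise_on_ref)


V_PRECHARGE = 1.25   # assumed bitline precharge level
SIGNAL = 0.10        # assumed swing from cell charge sharing
COUPLED_NOISE = 0.05  # assumed wordline-to-bitline coupling noise

# Folded bitline: the wordline crosses both bitlines, so both see the noise.
folded = sensed_difference(V_PRECHARGE + SIGNAL, V_PRECHARGE,
                           COUPLED_NOISE, COUPLED_NOISE)

# For contrast: if the wordline crossed only one of the two bitlines,
# the noise would appear as a differential error.
one_sided = sensed_difference(V_PRECHARGE + SIGNAL, V_PRECHARGE,
                              COUPLED_NOISE, 0.0)

if __name__ == "__main__":
    print(f"folded:    {folded:.3f} V")
    print(f"one-sided: {one_sided:.3f} V")
```

In the folded case the sensed difference equals the cell signal alone; in the one-sided case the coupling noise adds directly to it.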

9. (6 pts) In a DRAM, why do successive accesses to locations on different rows take longer than accesses to successive locations on the same row? Use your knowledge of DRAM architecture to justify your answer.

When accessing a row, all bits in the row are latched at the outputs of the sense amps. Changing the column address on the same row just accesses the outputs of different sense amps, which is fast. Before accessing a different row, the sense amps have to be re-biased and the bitlines precharged before the next row access can start – this is slow.
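The same-row versus different-row cost can be sketched with a toy timing model. The three timing constants are hypothetical round numbers chosen for illustration, not values from any datasheet, and the model assumes the bitlines are already precharged before the first access.

```python
# Toy model of DRAM access time with one open row held in the sense amps.
# Timing constants are hypothetical, in nanoseconds.

T_PRECHARGE = 15  # precharge bitlines / re-bias sense amps (assumed)
T_ACTIVATE = 15   # open a row into the sense amps (assumed)
T_COLUMN = 5      # select a column from the latched row (assumed)


def total_access_time(accesses):
    """Total time for a sequence of (row, column) accesses."""
    open_row = None
    total = 0
    for row, _col in accesses:
        if row != open_row:
            if open_row is not None:
                total += T_PRECHARGE  # close the previously open row
            total += T_ACTIVATE       # sense the new row
            open_row = row
        total += T_COLUMN             # column access on the open row
    return total


same_row = [(3, col) for col in range(4)]
different_rows = [(row, 0) for row in range(4)]

if __name__ == "__main__":
    print(total_access_time(same_row))
    print(total_access_time(different_rows))
```

With these assumed numbers, four accesses on one row cost a single activation plus four column accesses, while four accesses on four rows pay the precharge and activation penalty on every row change.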

10. (6 pts) What does lateral shielding mean in the discussion of a clock network?

Vdd/Gnd metal runs on each side of a clock line.

11. (6 pts) What is easier to deal with from a designer's viewpoint – a gridded global clock or a hierarchical clock network? Justify your answer.

Gridded global clock is easier because the designer can use one skew value everywhere, due only to wire delay. With a hierarchical clock network, detailed timing analysis must be done on all clock end points, which have different loading and different buffering delays – timing is much more complicated.

12. (6 pts) What was the primary reason for going from a gridded global clock to a hierarchical clock network: speed, area, or power? JUSTIFY your answer.

Both power and speed are nearly equally weighted. For speed, it is easier to control skew in local regions because data arrival times can be matched with clock arrival times from buffered clocks. The local clock can also be de-skewed against the global clock, as done in the IA-64.

For power, local clocks can be conditionally shut off. Also, loading is more tightly controlled, so the clock network does not have to be overdesigned as in a gridded global clock.

13. (10 pts) Describe the basic operation of the active deskewing buffer in the Intel IA-64 processor. Draw a diagram and label blocks to help support your description.

Note programmable delay buffer



Fig. 7. Deskew buffer (DSK) architecture.

14. (10 pts) Assume I need to implement the logic equation
Y = (A xnor B) or (not (A or (B and not(C))))
in a Domino logic pipeline stage. Show the necessary domino logic gates at the transistor level. Assume that you have dual rail inputs (of course!!!)

Y = (A xnor B) or (not (A or (B and not(C))))
  = (A xnor B) or (not(A) and not(B and not(C)))
  = (A xnor B) or (not(A) and (not(B) or C))
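The De Morgan steps above can be checked by brute force over all eight input combinations. This is only a verification sketch; the function names are illustrative. The final form matters for domino logic because domino gates are non-inverting, so every inversion has to be pushed back to the (dual rail) inputs.

```python
# Brute-force check that the three forms of Y in question 14 are equivalent.
from itertools import product


def xnor(a, b):
    return a == b


def y_original(a, b, c):
    return xnor(a, b) or (not (a or (b and not c)))


def y_step1(a, b, c):
    # De Morgan applied to the outer OR
    return xnor(a, b) or ((not a) and not (b and not c))


def y_final(a, b, c):
    # De Morgan applied to the inner AND; inversions only on inputs
    return xnor(a, b) or ((not a) and ((not b) or c))


def forms_agree():
    return all(
        y_original(a, b, c) == y_step1(a, b, c) == y_final(a, b, c)
        for a, b, c in product([False, True], repeat=3)
    )


def true_rows():
    """Input combinations (A, B, C) for which Y is 1."""
    return [
        (int(a), int(b), int(c))
        for a, b, c in product([False, True], repeat=3)
        if y_final(a, b, c)
    ]


if __name__ == "__main__":
    print(forms_agree())
    print(true_rows())
```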



15. (7 pts) Draw a rough diagram that shows how a global clock network is distributed/buffered across a chip. What is the primary goal of this distribution network?

Primary goal is to minimize skew across the chip, or at least to bound it tightly.



Fig. 2. Global clock distribution network.