VHDL Simulation #3



Introduction

Simulation is performed to give insight into problems which cannot be efficiently analyzed via an analytical approach. The objective of this assignment is to use our 'arbiter' and 'cpu' entities to provide quantitative answers to issues in an 8 CPU system that utilizes different features of the arbiter. This lab is worth 150 pts.

Setup

Unpack the file sim3.zip in the directory in which your 'src' and 'obj' directories are located for your VHDL assignments. This will unpack a 'sim3.student' directory that contains all of the source files needed by this assignment. You should rename this directory to 'sim3', and create a sim3 library. The Makefile for this assignment is included in 'sim3.student/Makefile.sim3'. Among other things, the source files contain altered versions of the cpu and arbiter provided in simulation #2. The arbiter component has two GENERICS:
   ROUND_ROBIN :boolean := TRUE; -- do round_robin priority if true
                                 -- else do Fixed priority

   OVERLAP_GRANT :boolean := TRUE; -- overlap grant requests if true
                                   -- else do NON-OVERLAPPED grant request.

In a fixed priority scheme, the priorities assigned to bus request lines to resolve simultaneous requests never change. In this arbiter, bus request #0 has the highest priority, bus request #7 has the lowest. In a round robin scheme, the priorities are rotated after every bus transfer with the goal of giving each request line equal time at being the highest priority.

The OVERLAP_GRANT generic controls whether or not the arbiter will perform overlapped bus requests. An overlapped bus request is when the arbiter grants the bus to a CPU while a different CPU still has ownership of the bus. Using overlapped grants minimizes the bus ownership transfer time between CPUs (minimizes the number of cycles that BBUSY goes to a 'H' during bus ownership transfer).

Definitions

  1. IO Transaction : refers to an IO transaction by a CPU (the CPU requests the bus; the CPU is granted the bus by the arbiter; and then the CPU assumes bus mastery by asserting the BBUSY line for some number of clock cycles).
  2. Transfer Size: The number of clocks that the BBUSY line is asserted by a CPU during an IO transaction.
  3. Total Clocks : the total number of clocks during the simulation. This number applies to the entire simulation.
  4. Bus Latency for a CPU is the number of clocks from assertion of bus request to assertion of BBUSY. Note that this will be at least 2 clocks if there is no competition for the bus because it takes 1 clock for the arbiter to respond to the bus request, and 1 more clock for the CPU to respond to the bus grant.
  5. Bus utilization is the number of clocks in which the 'bbusy' line is asserted (had a '0' value) divided by the total number of clocks. Note that this number can never be 100% because there are always some cycles in which BBUSY is high during bus ownership transfer from one CPU to another.

Questions

Use an 8 CPU simulation, and answer the following questions:

  1. With transfer size = 8 clocks and using a fixed priority/overlapped bus grant arbiter, at what %bus utilization does the difference between IO transfers for the lowest (CPU #7) and highest priority CPUs (CPU #0) exceed 20% of the IO transfers of the highest priority CPU? If the priority scheme is changed to round robin, what is the %difference in IO transfers between CPU #0 and CPU #7 at the same bus utilization?

  2. With transfer size = 8 clocks and for a fixed priority/overlapped bus grant arbiter, what is the average bus latency for CPU#0 and CPU#7 at the bus utilization discovered in question #1? If the priority scheme is changed to round robin, what is the average bus latency for CPU#0 and CPU#7 at the same bus utilization?

  3. Repeat question #2 for a transfer size of 16 clocks (remember that question #2 uses the bus utilization obtained from question #1).

  4. For a round robin priority/overlapped bus grant arbiter, at what %bus utilization does the average access latency become twice the transfer size? Answer for transfer sizes = 8 and transfer sizes = 16.

Discovering the Answers

To determine the answers to the above questions, you will need to do an 8-CPU+Arbiter simulation in which the CPUs make IO transfers. There must be some method of controlling the number of requests that a CPU makes in a period of time (the 'rate' of IO requests) in order to create different levels of bus utilization (the more requests the CPUs make within a period of time, the higher the bus utilization). The exact bus utilization will depend on the number of CPUs, the IO activity by the CPUs, and the size of the transfers when a CPU assumes control of the bus. You will also need to track statistics for each CPU and on a global basis. One statistic you will need to keep for each CPU is the number of IO transfers made by the CPU over the simulation, and also the average bus latency per transfer. An example of a global statistic is the total number of clocks or the total number of clocks in which the BBUSY line is asserted.

I have included in the ZIP archive a CPU model that follows the bus protocol defined in simulation #2. Various generics have been added to the CPU model for the purposes of this simulation. Please use this model instead of the one you wrote for simulation #2 so that I know that everybody is starting from the same point. The generics on the CPU model are explained below:

  1. To control the number of IO transfers that a CPU makes, a generic called 'REQUEST_RATE' (type 'natural') has been added to the 'CPU' entity. Assume the request rate can vary between 0 to 1999 with the goal being that the higher the REQUEST_RATE, the more IO requests a CPU will make in a given period of time. Modify the CPU code such that random requests (uniform distribution) are generated from a CPU component to match the request rate. This is done by keeping a boolean array of 2000 elements (call this array REQ_ARRAY). A counter (LCNTR), which is incremented every clock cycle between IO requests, is used to index into the REQ_ARRAY. If REQ_ARRAY[LCNTR] is TRUE, then a bus request is generated. Each time the counter reaches the end of the REQ_ARRAY (LCNTR = 2000), the array values are first reset to FALSE, then new random elements in the REQ_ARRAY are set to TRUE to be used to generate IO requests over the next 2000 local clocks. The REQUEST_RATE generic is used to control how many of the 2000 elements in REQ_ARRAY are set to TRUE -- a REQUEST_RATE value of '10' means that 10 of the 2000 values in the REQ_ARRAY are set to TRUE. Note that for this CPU model, the clocks spent waiting between IO requests are in the LOCAL state, so the code for accomplishing the above actions should be placed in the LOCAL state.

  2. A generic called 'RND_SEED' (type natural) has been added to the 'CPU' entity. This value can vary between 1 and 50 and specifies the array index of the starting random seed as specified in the 'rnd2' package. Assign each CPU component a DIFFERENT rnd_seed value (I don't care what values you use). The 'rnd2' package is included in the 'sim3/rnd.vhd' file in the ZIP archive attached to this lab. The same ZIP archive also contains a file called 'rnd_test.vhd' that shows you how to make use of the functions in the rnd2 package to generate random numbers.

  3. A generic called 'CPU_ID' (type natural) has been added to the 'CPU' entity. This value will be the ID number of the CPU and should be put on the address bus anytime the CPU has the bus. CPU_ID numbers should be assigned sequentially in the simulation (0 to 7 if the simulation contains 8 CPUs).

  4. A generic called 'CLK_MAX' (type natural) has been added to the 'CPU' entity. The CPU should compare CLK_MAX to the total number of clocks it has seen thus far; when the total clocks equals CLK_MAX, the CPU should set its 'active' output to 'Z' and not generate any more bus requests. The 'active' output should be a '0' value otherwise.
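The request-generation scheme described in item #1 can be sketched as the fragment below. This is only an illustrative sketch: the variable names and the random-number helper 'rnd_natural' are assumptions, not the actual rnd2 interface -- check 'rnd_test.vhd' for the real function names before adapting it.

```vhdl
-- Illustrative sketch of the LOCAL-state request logic.
-- 'rnd_natural' is a placeholder for whatever uniform-random
-- function the rnd2 package actually provides (see rnd_test.vhd).
constant ARRAY_LEN : natural := 2000;
type req_array_t is array (0 to ARRAY_LEN - 1) of boolean;
variable req_array : req_array_t := (others => false);
variable lcntr     : natural := ARRAY_LEN;   -- forces a refill on first use
variable slot      : natural;

-- Executed once per clock while the CPU is in the LOCAL state:
if lcntr = ARRAY_LEN then
   req_array := (others => false);           -- reset all entries to FALSE
   for i in 1 to REQUEST_RATE loop           -- mark REQUEST_RATE slots TRUE
      slot := rnd_natural(seed, 0, ARRAY_LEN - 1);
      req_array(slot) := true;
   end loop;
   lcntr := 0;
end if;
if req_array(lcntr) then
   -- assert the bus request and leave the LOCAL state
end if;
lcntr := lcntr + 1;
```

Note that two random draws can land on the same slot, slightly lowering the effective request rate; if that matters to you, redraw until an unused slot is found.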

The testbench file 'tb_cpu8' instantiates an arbiter with 8 CPU components. Make whatever modifications are necessary to the provided models in order to produce the statistical information required in the following section.

Data Collection

If implemented as specified above, you should find that varying the request rate from 0 to 60 is more than adequate to achieve 0 to maximum bus utilization. How you choose to vary the request rate and in what steps is up to you.

You can answer the required questions by plotting:

  1. Average IO Latency (y-axis) versus Bus Utilization (x-axis) for CPUs #0 and #7, for transfer sizes 8 and 16, for round-robin versus fixed priority.
  2. IO Transfers (y-axis) versus Bus Utilization (x-axis) for CPUs #0 and #7, for transfer sizes 8 and 16, for round-robin versus fixed priority.

To reduce the number of separate graphs, plot CPUs #0 and #7 on the same graph along with the data for transfer sizes 8 and 16 for a given priority scheme (either fixed or round robin). This means there will be 4 lines per graph, and a total of 4 graphs. Use these graphs and/or numerical simulation results to determine the answers to the required questions. Produce your plots with CLK_MAX=10000 in the simulation.

Model Output

After all CPUs have set their 'active' outputs to 'Z', which causes the 'active' line to go to an 'H', statistics in the following format should be printed (you don't have to follow this format exactly, but I want to see the equivalent information):

# CPU #0 Total Clks: 10000, Total IOs: 25, Total Latency: 57, LatencyPerIO: 2.280000e+00
# CPU #1 Total Clks: 10000, Total IOs: 24, Total Latency: 78, LatencyPerIO: 3.250000e+00
# CPU #2 Total Clks: 10000, Total IOs: 24, Total Latency: 55, LatencyPerIO: 2.291667e+00
# CPU #3 Total Clks: 10000, Total IOs: 24, Total Latency: 64, LatencyPerIO: 2.666667e+00
# CPU #4 Total Clks: 10000, Total IOs: 24, Total Latency: 59, LatencyPerIO: 2.458333e+00
# CPU #5 Total Clks: 10000, Total IOs: 22, Total Latency: 107, LatencyPerIO: 4.863636e+00
# CPU #6 Total Clks: 10000, Total IOs: 24, Total Latency: 63, LatencyPerIO: 2.625000e+00
# CPU #7 Total Clks: 10000, Total IOs: 24, Total Latency: 57, LatencyPerIO: 2.375000e+00
# TransferSize: 8 ReqRate: 5 %BusUtil: 15% AvgIOs: 23 AvgTotalLatency: 67 AvgLatencyPerIO: 2.913043e+00

One way to keep the statistics is to define a package that has some global counters in it for each CPU (you can assume that there will be only 8 CPUs). After the 'active' line goes to an 'H', the stimulus module can call a 'report' procedure which you have placed in this package; the 'report' procedure prints the statistics to standard output (use a 'writeline' function to file OUTPUT).
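One possible shape for such a package is sketched below. This is a sketch under assumptions, not provided code: it uses VHDL-93 shared variables, and the names 'stats_pkg', 'total_ios', 'total_latency', and 'report_stats' are illustrative.

```vhdl
use std.textio.all;

package stats_pkg is
   type nat_array is array (0 to 7) of natural;
   -- Global counters, updated by each CPU model as it runs
   -- (VHDL-93 shared variables shown; signals would also work).
   shared variable total_ios     : nat_array := (others => 0);
   shared variable total_latency : nat_array := (others => 0);
   procedure report_stats (total_clks : in natural);
end package stats_pkg;

package body stats_pkg is
   procedure report_stats (total_clks : in natural) is
      variable l   : line;
      variable lat : real;
   begin
      for i in 0 to 7 loop
         if total_ios(i) > 0 then
            lat := real(total_latency(i)) / real(total_ios(i));
         else
            lat := 0.0;                      -- avoid divide-by-zero
         end if;
         write(l, string'("CPU #"));
         write(l, i);
         write(l, string'(" Total Clks: "));
         write(l, total_clks);
         write(l, string'(", Total IOs: "));
         write(l, total_ios(i));
         write(l, string'(", Total Latency: "));
         write(l, total_latency(i));
         write(l, string'(", LatencyPerIO: "));
         write(l, lat);
         writeline(output, l);
      end loop;
   end procedure report_stats;
end package body stats_pkg;
```

The stimulus module simply calls 'report_stats' (passing the total clock count) after the 'active' line goes to 'H'.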

Sanity Checks

Please do some simple sanity checking on your statistics. For example, you can't get a bus utilization > 100%. The maximum bus utilization for overlapped grant operation is ((transfer_size)/(transfer_size+1))*100%. For very low request rates where there is little bus contention, the Latency per IO should be 2 or very close to it. For very low request rates, you can estimate the number of IOs made by each CPU as (Total Clocks)/2000 * Req_Rate. For very low bus request rates, the %bus utilization will be close to (Number_of_IOs * Transfer_size * Number_of_CPUs)/Total_Clocks * 100%.
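As a worked check against the sample output above (REQUEST_RATE = 5, CLK_MAX = 10000, transfer size 8, 8 CPUs), these approximations give:

```
IOs per CPU      ~ (10000 / 2000) * 5            = 25    (sample shows 22-25)
%bus utilization ~ (25 * 8 * 8) / 10000 * 100%   = 16%   (sample shows 15%)
```

Agreement this close is what you should expect at low request rates; at higher rates, contention makes the estimates progressively optimistic.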

To Turn In

From the directory above your 'sim3' directory (you should be in the 'src' directory of your VHDL development tree), execute the script submit_sim3.pl. This will create a compressed tar file of your 'sim3' directory and will mail it to me.

Two of the files I have placed in the ZIP archive are called 'sim3/cfg_tb.template' and 'sim3/cfg_tb.vhd'. The 'cfg_tb.vhd' file uses the testbench provided in 'tb_cpu8.vhd'.

The perl script 'sim3_sol.pl' in the ZIP archive uses the 'cfg_tb.template' file to run a series of VHDL simulations, varying the GENERICS above and recording your model output to a file. Look at the comments in the Perl script to determine how to run the perl script, and specify the parameters for each run. Once you have run the simulations, you can extract the data from the log file and produce your graphs (you might want to write another perl script that extracts this data and produces the plots automatically - I will leave this up to you).

When you submit your simulation, name your Makefile as 'Makefile.sim3' and place it in your sim3 submission directory. I do not care how many new VHDL packages/entities/configurations you add to the original files I give you or how you modify the entities/packages which are already there. I just want your code to be compatible with the 'cfg_tb.vhd' configuration I have provided, and I want to be able to execute your 'Makefiles/Makefile.sim3' to recompile your code. If you do not provide this, I can't grade your simulation and you will receive no credit. Be sure your makefile runs from a 'clean compile' - i.e., no timestamps present in the obj/qhdl/sim3 directory.

I will use your models with my own parameters for REQUEST_RATE, TRANSFER_SIZE, etc. and look at your model output to see if it makes sense. I do not expect your numbers to match my numbers exactly - however, I do expect approximate agreement.

Report

In your submission include a report file ('sim3/report.pdf') that has the requested graphs and the answers to the questions. You will need to justify the answers to your questions via either the graphs or numerical output from your simulation. If you need to produce additional graphs with expanded views of the areas of interest to justify your answers, then do so. If you do not justify an answer, it will be counted as wrong. I expect answers significant to only one decimal place (i.e., 3.6 is ok, but "about 4" is not acceptable).