辅导案例-MIPS 5

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

Programming Assignment (Simulating Pipelined Machine)
Ontime Due: August 20, 2020
Late Due: August 22, 2020
In this assignment, you will become familiar with how a basic MIPS 5-stage pipeline works. You
will be given a simulator that models an unpipelined processor that implements a small MIPS-like
instruction set. Your assignment is to create a cycle-accurate simulator of a pipelined version of this
processor. Your simulator will perform (1) data forwarding, (2) a simple branch prediction scheme,
and (3) the pipeline interlocks to stall the pipeline when necessary.
You will need a Linux environment to complete this assignment. There are multiple ways to
set up your work environment. If you use a Mac or a Linux OS, you may already have a working
environment or can set up without too much pain (e.g., installing XCode on a Mac or using a
package manager like HomeBrew or MacPort). If you use a Windows OS, you can try Windows
Subsystem for Linux (link available in the Resource tab in the course homepage) or consider running
a Virtual Machine as a guest OS. If you do not want to set up a work environment locally on your
computer, you can choose to work on the grace cluster as you did before when you took CMSC216.
The course space is already created in the cluster and you can login with your directory ID and
password (ssh [email protected]). If you work on the grace cluster and use Visual
Studio Code (VSCode) as your code editor, you can consider installing a package that supports
remote SSH connection. It allows you to work on a remote file system in VSCode.
Note that there can be subtle differences in machine configurations in all the options above. It
is possible that your work run on your work environment but fail on the submit server – I believe
that you experienced the symptoms before when you took CMSC 216. It is obvious that your work
should pass tests on the submit server. A common reason is the use of uninitialized variables.
1 Files
The first thing you should do is to check the files downloaded from ELMS. There are 8 files:
• asm.c
• Makefile
• mips-small-pipe.c
• mips-small-pipe.h
• mips-small.c
• mult.s
• simple.output
• simple.s
asm.c is an assembler for the reduced MIPS ISA which your simulator will implement (more about
the assembler and the ISA later). mips-small.c is the C source file for a fully functional un-
pipelined simulator. mips-small-pipe.c is a C source code template that has some useful data
structures and routines for a pipelined simulator. You can use this template to get started on the
assignment. mips-small-pipe.h is a header file included by mips-small-pipe.c, and Makefile
1
Name Format Type Opcode Func
LW I-type 0x23
SW I-type 0x2B
BEQZ I-type 0x4
ADDI I-type 0x8
ADD R-type 0x0 0x20
SUB R-type 0x0 0x22
SLL R-type 0x0 0x4
SRL R-type 0x0 0x6
AND R-type 0x0 0x24
OR R-type 0x0 0x25
HALT J-type 0x3F
Table 1: Instruction encodings for a reduced MIPS ISA.
is a unix make file which will produce two binaries asm, sim, and sim-pipe from the source files,
asm.c, mips-small.c, and mips-small-pipe.c, respectively.
In the remaining files, we have provided two example assembly programs. simple.s performs an
arithmetic operation and mult.s multiplies two numbers. simple.output is the result of running
simple.s through our pipelined simulator, and can be used to verify your simulator’s output.
2 A Scaled-Down MIPS ISA
You will be simulating the MIPS ISA from Hennessy & Patterson, with some key differences. First,
instead of a 64-bit architecture, you will implement a 32-bit architecture. In other words, all registers
and data paths are 32 bits wide, and all instructions will operate on 32-bit operands – an integer
variable should be big enough to hold an instruction on the grace cluster. Second, to keep your
simulator simple, you will only be required to support a scaled-down version of the MIPS ISA
consisting of 11 instructions. These instructions, along with their encoding, are given in Table 1.
We will adhere to the MIPS instruction formats presented in Figure A.22 (Instruction layout for
MIPS) of Hennessy & Patterson, with the exception that there is no “shamt” field in the R-type
format, and instead the “func” field is 11-bits wide. Finally, although all immediate values in
branch and jump instructions are already left-shifted by 2, so your BEQZ instruction SHOULD
NOT perform the left shift in the implementation. A simple implementation of MIPS in pages C-31
through C-33 and Figure C.23 would give you some hints on pipeline implementation.)
Notice that all the instructions in Table 1 except for one exist in the normal MIPS ISA. These
instructions behave exactly as described in the text (except they are 32-bit versions rather than 64-
bit versions). The instruction we’ve added is the HALT instruction. As its name implies, when your
simulator executes a HALT instruction, it should terminate the simulation. For more information
about MIPS ISA, consult Section A.9 of Hennessy & Patterson.
3 asm: An Assembler for the Reduced MIPS ISA
We have provided an assembler, asm.c, so that you can assemble programs for your simulator. The
asm.c file is fully functional, and you will not need to make any modifications to this file. Simply
use the Makefile to make the binary asm from the asm.c source file – i.e., You would work on a
Linux environment and type make to compile everything using the Makefile.
The format for assembly programs is very simple. A valid assembly program is an ASCII file in
which each line of the file represents a single instruction, or a data constant. The format for a line
of assembly code is:
labelinstructionfield0field1field2comments
The leftmost field on a line is the label field which indicates a symbolic address. Valid labels
contain a maximum of 6 characters and can consist of letters and numbers. The label is optional
2
(the tab following the label field is not. After the optional label is a tab – i.e., – the tab character
indicates if there is a label or not. Then follows the instruction field, where the instruction can
be any of the assembly-language mnemonics listed in Table 1. After another tab comes a series of
fields. All fields are given as decimal numbers. The number of fields depends on the instruction.
The following describes the instructions and how they are specified in assembly code:
lw rd rs1 imm Reg[rd] <- Mem[Reg[rs1] + imm]
sw rd rs1 imm Reg[rd] -> Mem[Reg[rs1] + imm]
beqz rd rs1 imm if (Reg[rs1] == 0) PC <- PC+4+imm
addi rd rs1 imm Reg[rd] <- Reg[rs1] + imm
add rd rs1 rs2 Reg[rd] <- Reg[rs1] + Reg[rs2]
sub rd rs1 rs2 Reg[rd] <- Reg[rs1] - Reg[rs2]
sll rd rs1 rs2 Reg[rd] <- Reg[rs1] << Reg[rs2]
srl rd rs1 rs2 Reg[rd] <- Reg[rs1] >> Reg[rs2]
and rd rs1 rs2 Reg[rd] <- Reg[rs1] & Reg[rs2]
or rd rs1 rs2 Reg[rd] <- Reg[rs1] | Reg[rs2]
halt stop simulation
Note that in the case of the beqz instruction, PC-relative addressing is used (and again, your
simulator should not perform the left-shift when computing the PC-relative branch target). For the
lw, sw, and beqz instructions, the imm field can either be a decimal value, or a label can be used. In
the case of a label, the assembler performs a different action depending on whether the instruction
is a lw / sw instruction, or a beqz instruction. For lw and sw instructions, the assembler inserts
the absolute address corresponding to the label. For beqz instructions, the assembler computes a
PC-relative offset with respect to the label. After the last field is another tab, then any comments.
The comments end at the end of the line.
In addition to instructions, lines of assembly code can also include directives for the assembler.
The only directive we will use is .fill. The .fill directive tells the assembler to put a number
into the place where the instruction would normally be stored. The .fill directive uses one field,
which can be either a numeric value or a symbolic address. For example, “.fill 32” puts the value
32 where the instruction would normally be stored. In the following example, “.fill start” will
store the value 8, because the label “start” refers to address 8 (remember that the MIPS architecture
uses byte addresses).
addi 1 0 5 load reg1 with 5
addi 2 0 -1 load reg2 with -1
start add 1 1 2 decrement reg1
lw 3 0 var1 loads reg3 with value stored in var1
addi 3 3 -1 decrement reg3
sw 3 0 var1 put reg3 back (thus var1 is decremented)
beqz 0 1 done goto done when reg1==0
beqz 0 0 start back to start
add 0 0 0
done halt
.fill start will contain start address (8)
var1 .fill 32 Declare a variable, initialized to 32
Try taking the above example and running the assembler on it. Enter the above assembly code
into a file called “ex.s”. Then type “asm ex.s ex.out”. The assembler will generate a file “ex.out”
which should contain:
(address 0x0): 20010005
(address 0x4): 2002ffff
(address 0x8): 00220820
(address 0xc): 8c03002c
(address 0x10): 2063ffff
(address 0x14): ac03002c
3
(address 0x18): 10200008
(address 0x1c): 1000ffe8
(address 0x20): 00000020
(address 0x24): fc000000
(address 0x28): 00000008
(address 0x2c): 00000020
The assembler assumes that all programs will be loaded into memory beginning at address 0x0.
4 Simulator without Pipelining
As described above, we have provided you with a simulator that simulates the same ISA, but which
is not cycle accurate because it does not account for pipelining. The source code for this simulator
is in mips-small.c. You can use this simulator to run assembly language programs prior to getting
the pipelined simulator working.
Build the unpipelined simulator by typing “make sim” in the directory where you’ve copied the
files we have provided. This will produce an executable sim which you can run. Try running this
simulator on the program we have provided, mult.s. This program multiplies two numbers, specified
in memory at the labels mcand and mplier, and stores the result at the label answer. Be sure to
assemble the program before running the simulator. The sequence of commands you should give
are:
make sim
asm mult.s mult.out
sim mult.out
When you run the simulator, a ton of messages will spew onto your terminal. The simulator we
have provided dumps the state of the machine (including the contents of registers and memory) at
every simulated cycle. Note that mips-small.c implements an unpipelined, single-cycle machine.
So, each instruction completes in a cycle, which is long enough to process everything needed from
fetching an instruction through updating the register file.
You can examine this output to see what the simulator is doing. To put these messages in a
file so that you can actually read them, redirect the output of the simulator to a file. – e.g., on a
C shell, by saying “sim mult.out >! sim.output”.1 Your pipelined simulator will spew similar
messages, except it will also include the state of pipeline registers.
In addition to using the unpipelined simulator to run assembly programs, you should also look
at the code in mips-small.c. The code is very simple, but you should make sure you understand it
before moving on to building the pipelined simulator.
5 Assignment
Your assignment is to build a cycle-accurate simulator that accounts for pipelining. We
have provided starter code in mips-small-pipe.c and mips-small-pipe.h. This code serves two
purposes: it will make your life a bit easier since you won’t have to write the whole simulator from
scratch, and it is meant to enforce some coding disciplines that ensure you actually simulate the
details of the processor pipeline.
The following sections describe the assignment in more detail.
5.1 Simulate Pipelining
You will be simulating the basic MIPS pipeline depicted in Figure C.22 of Hennessy & Patterson. We
have provided pipeline register data structures specified in the file, mips-small-pipe.h. To enforce
1You may search Google with the keywords input output redirection if you have not used the input/output
redirection on a Linux environment.
4
an implementation of your simulator that simulates pipelining, you are required to use
these structures in their unmodified form. In other words, each cycle simulated by your
simulator must produce the correct values for all the fields in the pipeline register data structures.
typedef struct IFIDStruct {
int instr;
int pcPlus1;
} IFID_t;
typedef struct IDEXStruct {
int instr;
int pcPlus1;
int readRegA;
int readRegB;
int offset;
} IDEX_t;
typedef struct EXMEMStruct {
int instr;
int aluResult;
int readRegB;
} EXMEM_t;
typedef struct MEMWBStruct {
int instr;
int writeData;
} MEMWB_t;
typedef struct WBENDStruct {
int instr;
int writeData;
} WBEND_t;
typedef struct stateStruct {
int pc;
int instrMem[MAXMEMORY];
int dataMem[MAXMEMORY];
int reg[NUMREGS];
int numMemory;
IFID_t IFID;
IDEX_t IDEX;
EXMEM_t EXMEM;
MEMWB_t MEMWB;
WBEND_t WBEND;
int cycles; /* number of cycles run so far */
} state_t, *Pstate;
There are three small modifications in the above pipeline register data structures as compared
with Figure C.22 in Hennessy & Patterson. First, we don’t latch the “branch taken” signal in the
EXMEM register. Instead, when your simulator computes the direction of the branch in the execute
(EX) stage, you can directly modify the PC register from the execute (EX) stage. Second, the MUX
in the WB stage has been moved to the MEM stage. Therefore, the MEMWB pipeline registers
only latch two values, instr and writeData, instead of three values as indicated by Figure C.22
in the textbook. Finally, we have added an extra pipeline register after the WB stage. Instead
of communicating internally through the register file, your simulator will always forward from the
5
pipeline registers. For instance, in Figure C.7 of the textbook, there is a communication path from
the DADD instruction to the OR instruction internally through the register file (this is achieved
by writing the register file on the first half of the clock cycle, and reading the register file on the
second half of the clock cycle). In your simulator, this register communication will be replaced by
forwarding between the WBEND pipeline registers to the execute stage of the OR instruction in
cycle CC6.
As mentioned above, your simulator must produce all the values for the pipeline register data
structures on a cycle-by-cycle basis. In addition, your simulator must also use the following
simulator loop code (which is in mips-small-pipe.c):
while (1) {
printState(state);
/* copy everything so all we have to do is make changes.
(this is primarily for the memory and reg arrays) */
memcpy(&new, state, sizeof(state_t));
new.cycles++;
/* --------------------- IF stage --------------------- */
/* --------------------- ID stage --------------------- */
/* --------------------- EX stage --------------------- */
/* --------------------- MEM stage --------------------- */
/* --------------------- WB stage --------------------- */
/* transfer new state into current state */
memcpy(state, &new, sizeof(state_t));
}
You will insert the necessary code between the comments that perform the functions at each
pipeline stage, but do not need to change any other part of this loop. Especially, be careful not
to change the output format of printState.2 Within each pipeline stage, the code will compute
the new values of the pipeline register data structures (stored in the “new” structure) as a function
of the old values (stored in the “state” structure). Then, after the code in all the stages have
computed their values, the “memcpy” at the end of the simulator loop copies the values from the
new structure into the state structure. This corresponds to the rising edge of the clock latching
values into the pipeline registers for the next clock cycle.
5.2 Dealing with Data Hazards
Your simulator should deal with data hazards by bypassing (or forwarding) values from the
pipeline register data structures. You can implement such data forwarding by checking for data
dependences in the execute stage. For every instruction that enters the execute stage, you will have
to check against the results produced by the previous 3 instructions (whose results can be found
in the EXMEM, MEMWB, and WBEND pipeline registers). If there are no data dependences,
then the execute stage should take its operands from the readRegA and readRegB IDEX pipeline
registers. If there are data dependences, then the execute stage should instead copy the results from
the pipeline register of the dependent instruction.
2You are allowed to write helper functions in the C file if needed.
6
Most data hazards can be resolved using bypassing. However, there is one case that requires
stalling the pipeline. As seen in Figure C.9 in the textbook, if an instruction uses a source register
that is the destination register of an immediately preceeding load instruction, then the pipeline
must be stalled for a single cycle. This can be implemented by checking for such a dependence in
the decode stage and inserting a bubble (NOP instruction - see Section 5.4.2) instead of allowing
the instruction to proceed. One special case is when a load from a location is immediately followed
by a store (Figure C.8 in the textbook). The depending store can be bypassed by forwarding from
the MEM/WB latch (loaded value) to the data memory write port of the memory stage (store
instruction). However, for your implementation, you will not need to implement this; simply stall
the pipeline one clock cycle for ALL load instructions that are followed by a depending instruction,
even if the depending instruction is a store.
5.3 Dealing with Control Hazards
The pipeline you will be simulating will have a branch delay of 2 cycles since the branch direction
is not resolved until the end of the execute stage. Instead of stalling the pipeline for 2 cycles after
every fetched branch, your simulator should implement a “hybrid” static branch prediction scheme.
In this scheme, you should predict taken for backward branches, and predict not-taken for forward
branches. To implement this hybrid scheme, test the immediate value in the fetch stage
of every fetched branch instruction. If the immediate value is positive, then fetch the next
consecutive instruction on the following cycle. If the immediate value is negative, then fetch the
branch target (computed as a PC-relative address using the offset) on the following cycle.
Make sure that your simulator validates the branch direction computed in the execute stage
against the predicted direction. If the prediction was correct, no action should be taken. If the
prediction was incorrect, then the simulator must squash the two instructions in the pipeline, which
at this point will be in the IFID and IDEX pipeline registers. This squashing can be accomplished
by replacing the incorrectly fetched instructions with NOP instructions and clearing all the other
state in the IFID and IDEX pipeline registers.
5.4 Miscellaneous
5.4.1 Memory
As indicated by the state t structure described in Section 5.1, there are two arrays that implement
memory, an instruction memory array instrMem and a data memory array dataMem. This simulates
the separate instruction and memory structures in the pipeline in Figure C.22 in the textbook. The
code we have provided initializes both of these arrays to the same values read in from the assembly
code file. In your simulator, you should read instructions from the instrMem array in the fetch stage,
and write (read) values to (from) the dataMem array in the memory stage.
In the simulator code we have provided for you, the amount of memory simulated is determined
by the size of the assembly program file that you read into the simulator. The numMemory field of
the state t structure is set to this size during initialization and is fixed for the duration of the
simulation. This means that you must allocate all the memory your program will need statically in
the assembly code file. The maximum size of the assembly program allowed is set by the MAXMEMORY
constant declared in mips-small-pipe.h and is currently 16K words (64K bytes).
5.4.2 NOP Instruction
The MIPS ISA does not include an explicit NOP (null operation) instruction, and neither does our
reduced MIPS ISA. However, the NOP instruction is needed to initialize all the instruction fields of
the pipeline register data structures, and to replace or “squash” those instructions in the pipeline
following a branch misprediction. In your simulator, you will use the “add 0 0 0” instruction as a
NOP instruction (since this instruction does not effect any state). We have already declared this as
NOPINSTRUCTION and used it in the initialization code.
7
5.4.3 Halting
When your simulator encounters the HALT instruction, it should end the simulation. Notice, however,
that it is incorrect to halt as soon as the HALT instruction is fetched since this code may be
speculative (dependent on a yet to be verified branch). If the branch was mispredicted, then you
definitely do not want to halt. Another issue is that any SW instructions which are farther along
the pipeline than the HALT instruction must be allowed to complete before the machine is stopped.
To solve these problems, halt the machine when the HALT instruction reaches the MEMWB pipeline
register. This ensures that all previously executed instructions have completed.
5.5 Grading
Your work will be graded based on the correctness of the required functionality. In other words,
for any valid reduced MIPS ISA program, your simulator should accurately simulate the pipeline
and produce the correct values for all the pipeline registers at each cycle. We will be using our own
programs to validate your simulator, so make sure you test your simulator on several test cases (in
other words, you should write a few extra assembly programs to test your simulator). To assist in
grading, use the simulator loop template and the printState routine that we have given you in
their unmodified form. Each time you call printState, it should print in the format that we have
provided, and you should only call printState once every iteration of the simulator loop (you are
free to use printState as much as you want when debugging, but take all the extra printState calls
out of your code before handing in your assignment). Finally, compare the output of your simulator
on the simple.s program to output given in simple.output. There should be no differences (i.e.,
running diff simple.output should yield no differences).
5.6 Academic Honesty
Do not allow any other student to see any of your code. You may however discuss the assignment in
general terms, with other students, only for the clarification purpose. The line between clarification
purpose discussion and academic dishonesty. You should never show your code to others or reuse
code you see on any on-/off-line sources. Also, do not discuss too much implementation details
during discussions. Both of you could write very similar code in the end. If copying or excessive
collaboration is detected in your submissions, the matter will be referred to the Office of Student
Conduct.
6 Submission
You will submit mips-small-pipe.h and mips-small-pipe.c to the submit server.
8