
COMP3231 Computer Architecture
Problem-Set Assignment Two
(version 1.0)
Please answer all questions. You should hand in the assignment through the Moodle system.

Question 1 (3 points)
This exercise examines the accuracy of various branch predictors for the following repeating pattern
(e.g., in a loop) of branch outcomes: NT, NT, T, T, NT, T.
a) (0.5) What is the accuracy of always-taken and always-not-taken predictors for this sequence of
branch outcomes?
b) (1) What is the accuracy of the 2-bit predictor for the five branches in this pattern, assuming that
the predictor starts off in the 00 state (predict not taken) of Figure C.15?
c) (1.5) What is the accuracy of the 2-bit predictor if this pattern is repeated forever?
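
Hand-worked answers to this question can be cross-checked with a short simulation. The following is a minimal Python sketch, not part of the assignment. It assumes the 2-bit predictor is a plain saturating counter (states 0 to 3, predicting taken in states 2 and 3), which may differ from the transitions drawn in Figure C.15. The `TRANSITIONS` table and all function names are illustrative; edit the table to match the figure before relying on the result.

```python
# Illustrative sketch: the names and the saturating-counter table are assumptions,
# not taken from Figure C.15. Adjust TRANSITIONS to match the figure.
PATTERN = ["NT", "NT", "T", "T", "NT", "T"]

# TRANSITIONS[state][taken] -> next state, for states 0b00..0b11.
TRANSITIONS = {
    0: {False: 0, True: 1},
    1: {False: 0, True: 2},
    2: {False: 1, True: 3},
    3: {False: 2, True: 3},
}


def static_accuracy(outcomes, predict_taken):
    """Accuracy of an always-taken (True) or always-not-taken (False) predictor."""
    hits = sum((o == "T") == predict_taken for o in outcomes)
    return hits / len(outcomes)


def two_bit_run(outcomes, state=0):
    """Run the 2-bit predictor; return (correct predictions, final state)."""
    correct = 0
    for o in outcomes:
        taken = o == "T"
        predicted_taken = state >= 2  # states 10 and 11 predict taken
        correct += int(predicted_taken == taken)
        state = TRANSITIONS[state][taken]
    return correct, state


def steady_state_accuracy(pattern, state=0, warmup=4, measure=12):
    """Long-run accuracy: discard `warmup` passes, then average over `measure` passes.

    With four states, the start-of-pass state repeats with period 1 to 4,
    so averaging over 12 passes covers a whole number of periods.
    """
    for _ in range(warmup):
        _, state = two_bit_run(pattern, state)
    correct = 0
    for _ in range(measure):
        hits, state = two_bit_run(pattern, state)
        correct += hits
    return correct / (measure * len(pattern))
```

Calling `two_bit_run(PATTERN[:k])` covers a finite prefix of the pattern, and `steady_state_accuracy(PATTERN)` covers the repeated-forever case under the same assumed state machine.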




Question 2 (5 points)
a) (2.5) In this exercise, we are going to examine the single-issue Tomasulo RISC-V pipeline when
running the following loop:
      addi x4,x1,800       //x4 = x1 + 800, upper bound for X
foo:  fld f2,0(x1)         //f2 = X[i]
      fmul.d f4,f2,f0      //f4 = a*X[i]
      fld f6,0(x2)         //f6 = Y[i]
      fld f8,0(x3)         //f8 = Z[i]
      fdiv.d f8,f8,f0      //f8 = Z[i] / a
      fadd.d f6,f4,f6      //f6 = a*X[i] + Y[i]
      fadd.d f8,f8,f6      //f8 = a*X[i] + Y[i] + Z[i] / a
      fsd f8,0(x3)         //Z[i] = a*X[i] + Y[i] + Z[i] / a
      addi x1,x1,8         //increment X index
      addi x2,x2,8         //increment Y index
      addi x3,x3,8         //increment Z index
      bltu x1,x4,foo       //continue loop?


The functional units (FUs) are described in the following table.

FU type         Cycles in EX   Number of FUs   Number of RSs
Integer         1              1               5
FP adder        10             1               3
FP multiplier   15             1               2
FP divider      20             1               2


Here are a few assumptions about the microarchitecture:
• Functional units are not pipelined, i.e., if one instruction is using the functional unit,
another instruction cannot enter it.
• No forwarding between functional units, i.e., the results are communicated by the CDB.
• There are five load buffer slots and five store buffer slots.
• The execution stage (EX) does both the effective address calculation and the memory
access for loads and stores. Load/store operations contend with other integer operations
for the use of the integer functional unit.
• The pipeline is modelled as IF/ID/IS/EX/WB.
• The issue (IS) and Write-back (WB) result stages each require one clock cycle.
• The EX stage of load/store requires one clock cycle.
• The bltu branch instruction requires one clock cycle.
• The system has a branch target buffer which caches the branch address of the bltu
instruction.
Perform a dry run of three iterations of the loop and show when each instruction is
issued, starts execution, and writes back. Report your answer in the form of a table as shown below.

Iteration   Instruction        Issue at   EX/MEM start at   Write CDB at   Comment
1           fld f2,0(x1)       1          2                 3
1           fmul.d f4,f2,f0    2          4                                Wait for f2
1           fld f6,0(x2)       3

The three “at” columns show the cycle in which each event happens. Use the “Comment”
column to describe any event or hazard that causes the instruction to wait.
Show three iterations of the loop in the table. Please ignore the first instruction, which precedes the loop.
We have completed the first instruction of the first iteration for you.
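
The rows already filled in are consistent with a simple timing rule, which can be sketched in code as a starting point for the dry run. This is a hedged sketch under stated assumptions, not the official solution. It assumes an instruction can begin EX no earlier than the cycle after it issues, no earlier than the cycle after its last source operand is written on the CDB, and no earlier than the cycle in which its functional unit becomes free. It also assumes the CDB write happens in the cycle after the last EX cycle. Reservation-station availability and CDB contention are not modelled, and the names `FU_LATENCY` and `schedule` are hypothetical.

```python
# Hypothetical helper: latencies come from the FU table above,
# while the timing rule itself is an assumption of this sketch.
FU_LATENCY = {"int": 1, "fadd": 10, "fmul": 15, "fdiv": 20}


def schedule(issue, fu, operand_ready=(), fu_free_at=0):
    """Return (ex_start, write_cdb) for one instruction.

    issue         -- cycle in which the instruction issues
    fu            -- key into FU_LATENCY (loads and stores use "int")
    operand_ready -- CDB write cycles of source operands still in flight
    fu_free_at    -- first cycle in which the non-pipelined FU is free
    """
    earliest = issue + 1
    for ready in operand_ready:
        earliest = max(earliest, ready + 1)  # no forwarding: wait for the CDB
    ex_start = max(earliest, fu_free_at)
    write_cdb = ex_start + FU_LATENCY[fu]    # EX occupies `latency` cycles
    return ex_start, write_cdb


# First two rows of the table: the load, then the multiply that waits for f2.
fld_f2 = schedule(issue=1, fu="int")  # (2, 3)
fmul_f4 = schedule(issue=2, fu="fmul", operand_ready=[fld_f2[1]])
```

Under these assumptions `fmul_f4[0]` is 4, matching the EX start shown for `fmul.d`. Tracking when each functional unit becomes free across instructions is left to the caller through `fu_free_at`.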
b) (2.5) Repeat the analysis with a two-issue Tomasulo pipeline with speculation.
Assume:
• Fully pipelined FP functional units.
• There are 20 ROB slots.
• Two instructions of any type can commit per clock.
• Store operations perform the memory address calculation at the Execution stage and
commit the memory storage at the Commit stage.
Report your answer in the form of a table as shown below.

Iteration   Instruction        Issue at   EX/MEM start at   Write CDB at   Commit at   Comment
1           fld f2,0(x1)       1          2                 3              4
1           fmul.d f4,f2,f0    1          4                                            Wait for f2
1           fld f6,0(x2)       2




Question 3 (2 points)
Consider the following code, which multiplies two vectors that contain single-precision complex
(real parts saved in re, imaginary parts saved in im) values:
for (i = 0; i < 200; i++) {
    c_re[i] = a_re[i] * b_re[i] - a_im[i] * b_im[i];
    c_im[i] = a_re[i] * b_im[i] + a_im[i] * b_re[i];
}
Assume that the vector processor has a maximum vector length of 64. The base addresses of the
arrays a_re[], a_im[], b_re[], b_im[], c_re[], and c_im[] are placed in the registers x1, x2, x3, x4, x5,
and x6, respectively.
Assume we have used vsetdcfg 6*FP32 to configure the first six vector registers to hold 32-bit FP
data, and the vector registers are v1, v2, v3, v4, v5, and v6. Convert the above loop into RV64V
code. Note that 200 is not divisible by 64.
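
Because 200 is not a multiple of the maximum vector length, the loop has to be strip-mined. The Python sketch below illustrates only the chunking and the per-strip arithmetic; it is not RV64V code. It assumes each pass processes `min(MVL, remaining)` elements, which is how a set-vector-length instruction is usually described, and the function names are illustrative.

```python
MVL = 64  # maximum vector length from the question


def strip_lengths(n, mvl=MVL):
    """Vector lengths of successive strips, assuming vl = min(mvl, remaining)."""
    lengths = []
    while n > 0:
        vl = min(mvl, n)
        lengths.append(vl)
        n -= vl
    return lengths


def complex_multiply(a_re, a_im, b_re, b_im, mvl=MVL):
    """Strip-mined element-wise complex multiply; returns (c_re, c_im)."""
    n = len(a_re)
    c_re, c_im = [0.0] * n, [0.0] * n
    start = 0
    for vl in strip_lengths(n, mvl):
        for i in range(start, start + vl):  # stands in for one set of vector operations
            c_re[i] = a_re[i] * b_re[i] - a_im[i] * b_im[i]
            c_im[i] = a_re[i] * b_im[i] + a_im[i] * b_re[i]
        start += vl
    return c_re, c_im


print(strip_lengths(200))  # [64, 64, 64, 8]
```

In the assembly version, each strip corresponds to one pass of the loop body, with the six base registers advanced by the number of bytes processed in that pass.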



