程序代写案例-EE524/CPTS561

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

EE524/CPTS561 Make up exam
Friday, 4 December 2020. Submit by 8:00 am, 5 December 2020
Name: ID:
FOR FULL CREDIT:
• Please show your work, justify the claims you make and point to a source if you are using any.
You may lose points if you use a resource without appropriate references.
• Please DO NOT collaborate on your answers. You should work on the exam questions by
yourself.
• DO NOT post the questions to any forum online
Please type or hand-write your answers and upload them to Blackboard. Please convert your files or
images to PDF format and upload the PDF. It makes annotations and returning feedback to you
much easier.
Submit your exams by 8:00 am on Saturday, 5 December 2020. Late submissions will be
penalized at 10 points/hour for each hour of delay.
Name: EE524/CPTS561, Fall 2020
Question 1 [20 points]
For the memory reference stream given in the following, compare the cost of executing it on a
bus-based machine that supports the MESI and MSI protocols. All of the references in the streams
are to the same location: r/w indicates read or write, and the digit refers to the processor issuing the
references. Assume that all caches are initially empty, and use the following cost model: read/write
cache hit—1cycle; misses requiring simple transaction on bus (e.g. upgrade from E/S to M) - 60
cycles; and misses requiring whole cache block transfer - 90 cycles. Assume all caches are write
allocated, i.e. data at the missed-write location is loaded to cache, followed by a write-hit operation.
Furthermore, the cache is write-back, i.e. data is written into the memory only when it is evicted or
invalidated from the cache. The state transition diagrams for both protocols are below:
PrRd/—
PrRd/—
PrWr/BusRdX
BusRd/—
PrWr/—
S
M
I
BusRdX/Flush
BusRdX/—
BusRd/FlushPrWr/BusRdX
PrRd/BusRd
PrWr/—
BusRd/Flush
PrRd/
BusRdX/Flush
PrWr/BusRdX
PrWr/—
PrRd/—
PrRd/—
BusRd/Flush’¢
E
M
I
S
PrRd
BusRd(S)
BusRdX/Flush’¢
BusRdX/Flush
BusRd/
Flush
PrWr/BusRdX
PrRd/
BusRd (S )
MSI MESI
Complete the following tables while paying particular attention to the state in each processor.
After completing the table describe any differences you see in the coherence patterns of MESI and
MSI.
2
Name: EE524/CPTS561, Fall 2020
MESI
P1 P2 P3 Bus Action Cycles
Read 1
Read 1
Read 1
Write 1
Write 1
Read 2
Write 2
Write 2
Read 2
Read 1
Read 3
Write 3
Write 1
Read 2
Latency
MSI
P1 P2 P3 Bus Action Cycles
Read 1
Read 1
Read 1
Write 1
Write 1
Read 2
Write 2
Write 2
Read 2
Read 1
Read 3
Write 3
Write 1
Read 2
Latency
3
Name: EE524/CPTS561, Fall 2020
4
Name: EE524/CPTS561, Fall 2020
Question 2 [20/20 points]
In each part, 14 points are for filling the table and 6 points are for the reasoning.
Consider the following snippet of code which has three branches. A branch with an if statement is
taken, if code inside the curly brackets is not executed.
i f ( va l % 2 == 0) { /∗ B1 ∗/
sum += val ; /∗ Not taken path f o r B1 ∗/
}
i f ( va l % 5 == 0) { /∗ B2 ∗/
sum += val ; /∗ Not taken path f o r B2 ∗/
}
i f ( va l % 10 == 0) { /∗ B3 ∗/
sum += val ; /∗ Not taken path f o r B3 ∗/
}
a) Consider a correlating predictor that has one bit of history. Also consider that each branch has
a 2 bit branch predictor that needs to mispredictions to change its prediction (the one shown in
lectures). With these assumptions, complete the following table with the branch prediction behavior.
Assume that all predictors are initialized to Not taken/not taken state and the global predictor is
also not taken initially. G denotes the global predictor.
Val G B1pred
B1
act.
B1
new G
B2
pred
B2
act.
B2
new G
B3
pred
B3
act.
B3
new
2 NT NT/NT NT/NT NT/NT
5
20
7
15
What is the total accuracy of branch predictions for the sequence of values? What are some of
the reasons the correlating branch predictor is mispredicting?
5
Name: EE524/CPTS561, Fall 2020
6
Name: EE524/CPTS561, Fall 2020
b) Now, consider that we increase the number of global predictor bits to two. Complete the table
below assuming that all the predictors are initialized to not taken again. G denotes the global
predictor. Note that the last taken branch in the global history is updated at the least significant bit,
i.e. if B1 is taken and B2 is not taken, the global history will be T/NT.
Val G B1pred
B1
act.
B1
new G
B2
pred
B2
act.
B2
new G
B3
pred
B3
act.
B3
new
2 NT/NT NT/NT NT/NT NT/NT
5
20
7
15
Does the prediction accuracy improve with the two bit global history? Why or why not?
7
Name: EE524/CPTS561, Fall 2020
Question 3 [10 points each]
(a) You are proposing to add explicit register renaming support to the processors designed by
the your company. Other designers in the team ask you about the requirements and support
needed to implement explicit register renaming. Please list the reasons that you will provide
your fellow designers below:
(b) Briefly describe the issue of branch divergence in graphical processing units. How can we
ensure that the program function correctly when branch divergence occurs? In addition, list
any potential negative effects of branch divergence.
8
Name: EE524/CPTS561, Fall 2020
Question 4 (Short answer questions) [5 points each]
(a) In a processor with dynamic scheduling and speculative execution, an issued instruction may
be waiting in reservation station. List the two causes of the waiting.
(b) Assume that you designed a speculative processor that is able to issue three instructions per
cycle. You also include Tomasulo’s algorithm, branch prediction and the necessary hardware
(e.g. multiple common data buses) to support three issues per cycle. However, you observe that
for real applications the hardware does not always issue three instructions per cycle. What
are the reasons that prevent the processor from sustaining three instructions per cycle? Can
simultaneous multithreading help in alleviating this issue?
9
Name: EE524/CPTS561, Fall 2020
(c) List four possible methods that help in reducing control flow penalty in a uniprocessor. You
may include both software and hardware solutions.
(d) What are the advantages of using chaining in vector processors? Provide an equivalent analogy
for optimization in scalar processors.
10

欢迎咨询51作业君