EE524/CPTS561 Make up exam Friday, 4 December 2020. Submit by 8:00 am, 5 December 2020 Name: ID: FOR FULL CREDIT: • Please show your work, justify the claims you make and point to a source if you are using any. You may lose points if you use a resource without appropriate references. • Please DO NOT collaborate on your answers. You should work on the exam questions by yourself. • DO NOT post the questions to any forum online Please type or hand-write your answers and upload them to Blackboard. Please convert your files or images to PDF format and upload the PDF. It makes annotations and returning feedback to you much easier. Submit your exams by 8:00 am on Saturday, 5 December 2020. Late submissions will be penalized at 10 points/hour for each hour of delay. Name: EE524/CPTS561, Fall 2020 Question 1 [20 points] For the memory reference stream given in the following, compare the cost of executing it on a bus-based machine that supports the MESI and MSI protocols. All of the references in the streams are to the same location: r/w indicates read or write, and the digit refers to the processor issuing the references. Assume that all caches are initially empty, and use the following cost model: read/write cache hit—1cycle; misses requiring simple transaction on bus (e.g. upgrade from E/S to M) - 60 cycles; and misses requiring whole cache block transfer - 90 cycles. Assume all caches are write allocated, i.e. data at the missed-write location is loaded to cache, followed by a write-hit operation. Furthermore, the cache is write-back, i.e. data is written into the memory only when it is evicted or invalidated from the cache. The state transition diagrams for both protocols are below: PrRd/— PrRd/— PrWr/BusRdX BusRd/— PrWr/— S M I BusRdX/Flush BusRdX/— BusRd/FlushPrWr/BusRdX PrRd/BusRd PrWr/— BusRd/Flush PrRd/ BusRdX/Flush PrWr/BusRdX PrWr/— PrRd/— PrRd/— BusRd/Flush’¢ E M I S PrRd BusRd(S) BusRdX/Flush’¢ BusRdX/Flush BusRd/ Flush PrWr/BusRdX PrRd/ BusRd (S ) MSI MESI Complete the following tables while paying particular attention to the state in each processor. After completing the table describe any differences you see in the coherence patterns of MESI and MSI. 2 Name: EE524/CPTS561, Fall 2020 MESI P1 P2 P3 Bus Action Cycles Read 1 Read 1 Read 1 Write 1 Write 1 Read 2 Write 2 Write 2 Read 2 Read 1 Read 3 Write 3 Write 1 Read 2 Latency MSI P1 P2 P3 Bus Action Cycles Read 1 Read 1 Read 1 Write 1 Write 1 Read 2 Write 2 Write 2 Read 2 Read 1 Read 3 Write 3 Write 1 Read 2 Latency 3 Name: EE524/CPTS561, Fall 2020 4 Name: EE524/CPTS561, Fall 2020 Question 2 [20/20 points] In each part, 14 points are for filling the table and 6 points are for the reasoning. Consider the following snippet of code which has three branches. A branch with an if statement is taken, if code inside the curly brackets is not executed. i f ( va l % 2 == 0) { /∗ B1 ∗/ sum += val ; /∗ Not taken path f o r B1 ∗/ } i f ( va l % 5 == 0) { /∗ B2 ∗/ sum += val ; /∗ Not taken path f o r B2 ∗/ } i f ( va l % 10 == 0) { /∗ B3 ∗/ sum += val ; /∗ Not taken path f o r B3 ∗/ } a) Consider a correlating predictor that has one bit of history. Also consider that each branch has a 2 bit branch predictor that needs to mispredictions to change its prediction (the one shown in lectures). With these assumptions, complete the following table with the branch prediction behavior. Assume that all predictors are initialized to Not taken/not taken state and the global predictor is also not taken initially. G denotes the global predictor. Val G B1pred B1 act. B1 new G B2 pred B2 act. B2 new G B3 pred B3 act. B3 new 2 NT NT/NT NT/NT NT/NT 5 20 7 15 What is the total accuracy of branch predictions for the sequence of values? What are some of the reasons the correlating branch predictor is mispredicting? 5 Name: EE524/CPTS561, Fall 2020 6 Name: EE524/CPTS561, Fall 2020 b) Now, consider that we increase the number of global predictor bits to two. Complete the table below assuming that all the predictors are initialized to not taken again. G denotes the global predictor. Note that the last taken branch in the global history is updated at the least significant bit, i.e. if B1 is taken and B2 is not taken, the global history will be T/NT. Val G B1pred B1 act. B1 new G B2 pred B2 act. B2 new G B3 pred B3 act. B3 new 2 NT/NT NT/NT NT/NT NT/NT 5 20 7 15 Does the prediction accuracy improve with the two bit global history? Why or why not? 7 Name: EE524/CPTS561, Fall 2020 Question 3 [10 points each] (a) You are proposing to add explicit register renaming support to the processors designed by the your company. Other designers in the team ask you about the requirements and support needed to implement explicit register renaming. Please list the reasons that you will provide your fellow designers below: (b) Briefly describe the issue of branch divergence in graphical processing units. How can we ensure that the program function correctly when branch divergence occurs? In addition, list any potential negative effects of branch divergence. 8 Name: EE524/CPTS561, Fall 2020 Question 4 (Short answer questions) [5 points each] (a) In a processor with dynamic scheduling and speculative execution, an issued instruction may be waiting in reservation station. List the two causes of the waiting. (b) Assume that you designed a speculative processor that is able to issue three instructions per cycle. You also include Tomasulo’s algorithm, branch prediction and the necessary hardware (e.g. multiple common data buses) to support three issues per cycle. However, you observe that for real applications the hardware does not always issue three instructions per cycle. What are the reasons that prevent the processor from sustaining three instructions per cycle? Can simultaneous multithreading help in alleviating this issue? 9 Name: EE524/CPTS561, Fall 2020 (c) List four possible methods that help in reducing control flow penalty in a uniprocessor. You may include both software and hardware solutions. (d) What are the advantages of using chaining in vector processors? Provide an equivalent analogy for optimization in scalar processors. 10
欢迎咨询51作业君