Computer Architecture Instructor: Gedare Bloom Project 2

You must do this assignment in your assigned group. You may not use code downloaded from the Internet without prior permission. Always watch the course website for updates on the assignments.

Read all instructions carefully. Start early. I recommend completing one Part each week. Due Saturday, December 9.

It’s your first day on the job at the Foo Company. You are part of a design team that needs to evaluate the RV32I machine that management has decided to consider. Foo is creating some scientific computing software and needs to know that RV32I is worthwhile. The computational kernel of the software is a function that averages the columns of a matrix using integer truncation (round down) of the result. The following is the high-level source code for this computational kernel:

unsigned int mm[nrow][ncol];
unsigned int avg[nrow];

void avg_columns(unsigned int **matrix, int nrow, int ncol) {
    for (int i = 0; i < ncol; i++) {
        unsigned int sum = 0;
        for (int k = 0; k < nrow; k++) {
            sum += matrix[k][i];
        }
        avg[i] = sum / nrow;
    }
}

To make management happy you need to implement this functionality in RISC-V assembly without using a divide (div/divu) instruction, as the RV32I machine lacks support for those instructions. Then evaluate the performance of your program on the RV32I machine and consider several possible optimizations as detailed below. The machine is a fully pipelined processor with hazard detection, data forwarding, and cache. It predicts branches and jumps as never taken, and implements branch and jump instructions in the MEM stage. You have a simulator for the machine, which is the QtRVSim platform.
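As a starting point for thinking about division without div/divu, here is a minimal C sketch of shift-and-subtract (restoring) division. It is one possible approach, not the required one. The function name udiv_shift_sub is hypothetical, and the sketch assumes a nonzero divisor smaller than 2^31, which holds for any realistic row count. Every operation in the loop (shift, AND, OR, compare, subtract) has a direct RV32I counterpart.

```c
#include <stdint.h>

/* Restoring (shift-and-subtract) division.
 * Returns floor(n / d). Assumes d != 0 and d < 2^31 so that the
 * running remainder never overflows when shifted left. */
uint32_t udiv_shift_sub(uint32_t n, uint32_t d)
{
    uint32_t q = 0;   /* quotient, built one bit at a time */
    uint32_t r = 0;   /* running remainder, always < d after each step */

    for (int bit = 31; bit >= 0; bit--) {
        /* Bring down the next dividend bit, most significant first. */
        r = (r << 1) | ((n >> bit) & 1u);
        if (r >= d) {
            /* Divisor fits: subtract it and set this quotient bit. */
            r -= d;
            q |= (1u << bit);
        }
    }
    return q;
}
```

For example, udiv_shift_sub(7, 2) yields 3, matching the truncating behavior of the high-level code. The loop always runs 32 iterations; other strategies (repeated subtraction, or a shift when nrow is a power of two) trade generality for fewer instructions.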

Set the cache parameters for both i$ (Program Cache) and d$ (Data Cache) to be direct-mapped with block size of 16 bytes and 64 blocks in each cache. Hence, each cache size is 1 KiB.
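To make the arithmetic behind these parameters concrete, the following C sketch splits a 32-bit byte address into offset, index, and tag for a direct-mapped cache with 16-byte blocks and 64 blocks. The function and constant names are illustrative assumptions, not part of QtRVSim.

```c
#include <stdint.h>

enum {
    BLOCK_BYTES = 16,  /* block size from the handout */
    NUM_BLOCKS  = 64,  /* number of blocks from the handout */
    OFFSET_BITS = 4,   /* log2(16) */
    INDEX_BITS  = 6    /* log2(64) */
};

/* Total capacity: 16 bytes * 64 blocks = 1024 bytes = 1 KiB. */
uint32_t cache_size_bytes(void)
{
    return BLOCK_BYTES * NUM_BLOCKS;
}

/* Byte position within the block (low 4 bits). */
uint32_t cache_offset(uint32_t addr)
{
    return addr & (BLOCK_BYTES - 1);
}

/* Which of the 64 blocks the address maps to (next 6 bits). */
uint32_t cache_index(uint32_t addr)
{
    return (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1);
}

/* Remaining high bits, stored to tell apart addresses that share an index. */
uint32_t cache_tag(uint32_t addr)
{
    return addr >> (OFFSET_BITS + INDEX_BITS);
}
```

Two addresses that differ by a multiple of 1024 bytes share an index and differ only in tag, so in a direct-mapped cache they evict each other.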

Group Formation

See the Project2_Groups in Canvas (People → Groups) to find your group members and make contact with them. Do this ASAP.

I require that you use git source version control. You may use bitbucket or github, with a private git repository that is shared among your team members. You may like to use branches and pull requests to synchronize your work with each other, or you may also work directly on the same repository (e.g., the first team member’s) and push directly to a branch (or even just use the main branch) there by giving write access to the other team member. Commit and push frequently as you work with your team.

Part 1: Unsigned Column Average (20 points)

Your first task is to finish the implementation of avg_columns so that it works on unsigned (non-negative) integers. A simple test case could be

A = [ 0 1 2
      1 2 3 ]

avg_columns(A, 2, 3);

avg = [0 1 2]

Management would like your software implementation to work on test cases up to at least nrow = 32 and ncol = 32.
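A host-side reference model can help check the assembly output for larger test cases. The sketch below is an assumption-laden convenience, not part of the assignment: the name avg_columns_ref is hypothetical, the matrix is passed as a flat row-major array rather than the unsigned int ** of the handout, and it uses the host's C division since it never runs on the RV32I machine.

```c
#include <stddef.h>

/* Reference model: average each column of a row-major nrow x ncol
 * matrix, truncating the result. Writes ncol values into avg. */
void avg_columns_ref(const unsigned int *m, int nrow, int ncol,
                     unsigned int *avg)
{
    for (int i = 0; i < ncol; i++) {
        unsigned int sum = 0;
        for (int k = 0; k < nrow; k++) {
            /* Element (k, i) of a row-major matrix. */
            sum += m[(size_t)k * (size_t)ncol + (size_t)i];
        }
        avg[i] = sum / (unsigned int)nrow;
    }
}
```

Calling it on the 2x3 test matrix above with nrow = 2 and ncol = 3 reproduces avg = [0 1 2]; the same function can generate expected values for test cases up to 32x32 and beyond.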

Part 2: Memory-Aware Implementation (20 points)

Rewrite the assembly code you generated for avg_columns to use the following high-level source code structure:

void avg_columns(int **matrix, int nrow, int ncol) {
    for (int i = 0; i < nrow; i++) {
        avg[i] = matrix[i][0];
        for (int k = 1; k < ncol; k++) {
            avg[i] += matrix[i][k];
        }
        avg[i] /= nrow;
    }
}

Part 3: Performance Analysis (20 points)

1. Compare your implementations from Part 1 and Part 2 using speedup. Consider a variety of matrix sizes from 8x8 up to 256x256. Does one implementation do better than the other? Why do you think that is?

2. Replace the software division in your Part 1 and Part 2 implementations with the divu instruction. Repeat your analysis comparing these two implementations with hardware division. Do your conclusions change? Why do you think that is?

3. Repeat the analysis of steps 1 and 2 with four times as much data cache. Consider quadrupling the block size, quadrupling the number of blocks, or doubling the block size and doubling the number of blocks. Do your conclusions change? Does one of the cache parameters help more than the other? Why do you think that is?

If you had to choose between implementing division in hardware and quadrupling the size of the data cache, which would you do? Are you surprised by that conclusion?
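The steps above involve two small calculations: the three data cache configurations that each give four times the baseline capacity, and speedup as a ratio of cycle counts. The C sketch below spells both out. The struct, table, and function names are illustrative assumptions, and the cycle counts passed to speedup are whatever your simulator runs report; no measured values are implied here.

```c
#include <stdint.h>

typedef struct {
    const char *label;
    uint32_t block_bytes;
    uint32_t num_blocks;
} cache_cfg;

/* Baseline from the handout, then the three 4x options from step 3. */
static const cache_cfg CONFIGS[] = {
    { "baseline",             16,  64 },
    { "4x block size",        64,  64 },
    { "4x number of blocks",  16, 256 },
    { "2x block, 2x blocks",  32, 128 },
};

/* Capacity in bytes: block size times number of blocks. */
uint32_t cfg_size_bytes(const cache_cfg *c)
{
    return c->block_bytes * c->num_blocks;
}

/* Speedup of a candidate over a baseline: baseline cycles divided by
 * candidate cycles. Values above 1.0 mean the candidate is faster. */
double speedup(uint64_t cycles_baseline, uint64_t cycles_candidate)
{
    return (double)cycles_baseline / (double)cycles_candidate;
}
```

The baseline entry works out to 1024 bytes and each of the other three to 4096 bytes. Whichever implementation you treat as the baseline, state it in the report so the direction of each ratio is unambiguous.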

Part 4: Report and Log (40 points)

Determine how best to represent the results of your analysis from Part 3 using tables or graphs. Management is most impressed by fancy graphics and gets bored by lots of text.

Create a written report in PDF format that explains:

• Your team composition and roles of each member

• Your method of implementation of unsigned column average

• Your analyses of performance

• Your recommendation for adding more cache versus implementing division in hardware

• Anything else you would like to say to management

Add the report and its source (e.g., docx file) to your repository. Be careful to avoid merge conflicts with binary format files such as Word or PDF because they can be difficult to resolve.

In addition, generate a text file with the contents of your git log using the command:

$ git log --pretty=format:"%h %an (%ad): %s" --date=short > log.txt

Submit the log.txt file for 10 points.

Submission Instructions (read carefully)

Make a tar and gzip file with your group members’ UCCS usernames (email address without the @uccs.edu part) as the name.

– tar -zcvf ${USERNAME1}-${USERNAME2}-project2.tgz ${USERNAME1}-${USERNAME2}-project2/

All your files should be in the ${USERNAME1}-${USERNAME2}-project2 directory. For example, I would replace ${USERNAME1} above with gbloom, and ${USERNAME2} with the second member of my team. If your team has a third member, add their username as well. Do not include compiled output in your submission! Include the PDF report, your source code, and the log.txt file. Upload the tgz file to Project 2 on Canvas.