代写辅导接单- CMPEN 431

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

Project 1: Dynamic (trace) scheduling

For the first project based on CMPEN 431 material, you will explore how to

a) programmatically perform the dataflow scheduling you have previously only seen done by hand

b) be able to use this scheduler to develop intuitions about the marginal benefits of OoO core parameters in the service of developing heuristics for project part 3

In a programming language of your choice among {C, Java, Python, C++} [note, your makefile must ensure that the correct version of python is invoked on CSE machines, i.e. be explicit in whether to invoke the python3 or python2.7 interpreter], implement a program with the following specification that performs dynamic (OoO) scheduling with conservative load-store ordering on a restricted set of simplified instructions:

Your program will take one command line input, a filename, and output its results in a file named

out.txt

Format of input file:

The input file will consist of a first line with two comma separated (no whitespace) positive integers followed by between 1-256 lines of a format described below. The two integers on the first line will specify the number of physical registers in the system and the issue width of the machine. All machine resources will match issue width, and the number of physical registers will always be greater than 32. If not, the program should produce an empty output file.

Each subsequent line will contain one of the following "instructions" R,<REG>,<REG>,<REG>

I,<REG>,<REG>,<IMM>

L,<REG>,<IMM>,<REG>

S,<REG>,<IMM>,<REG>

where {R,I,L,S} are the capital letters R, I, L, and S, {,} is comma, <REG> is a positive integer value between 0 and 31, inclusive, and <IMM> is a POSITIVE integer value between 0 and 65535, with both <REG> and <IMM> encoded as decimals. The first <REG> is the destination for R, I, and L. Memory (not modeled) is the destination for S.

Assuming:

Pipeline stages = <FE>,<DE>,<RE>,<DI>,<IS>,<WB>,<CO>

No fetch/decode stalls (equivalent to unbounded queue between DE and RE)

The first instruction is always fetched in cycle 0

The initial architectural to physical mappings are A0->P0, A1->P1...A31->P31 and all other physical registers are on the free list in increasing register order (i.e. the next register consumed in renaming will always be P32)

All memory operations hit in the cache, and all instructions writeback in the cycle following their issue.

produce the following output in out.txt:

For each instruction in the input file, produce a corresponding line in out.txt of the form <FE>,<DE>,<RE>,<DI>,<IS>,<WB>,<CO>

where all comma-separated fields are non-negative integers encoded as decimals and represent the cycle in which the associated instruction completes the specified stage. For simplicity, assume that S- class instructions occupy WB in the cycle after they issue.

Note that the following restrictions with regard to register freeing and conservative load-store scheduling apply: registers freed at commit in a cycle (N) are not available on the free list until the following cycle (N+1), and that potentially dependent memory operations cannot issue in the same cycle.

Reminder: You are not allowed to copy large pieces (or the entirety) of OoO simulator code from the web wholesale (yes, people have done this. No, submitting a random MIPS simulator doesn't actually fulfill the assignment specifications and is a base score of 0 in addition to an AI violation...) That said, you are welcome to look over the SimpleScalar documentation for inspiration, noting for instance, the structure of its primary simulation loop.

Example input-output pairs will be provided in the modules directory for your testing benefit.

An autograder on Gradescope will evaluate these examples in the testing environment and report compliance with expected behavior.

Submit the following ( to both Canvas & Gradescope):

One compressed tarball {.tgz} file containing both A and B below that extracts both A and B into the

same (current) directory (i.e. does not contain any directory hierarchy itself):

A) Your program source code, in one of the above listed languages.

B) A makefile that, when run on a Gradescope Ubuntu image in a directory containing only your makefile, your submitted program code, and a file named test.in, when given the target "test" (i.e. the command line "make test" is executed) will perform any necessary compilation and invoke your program on test.in to produce out.txt

Please note the following -- while, as a project-type assignment, you are strictly prohibited from collaborating on the contents of A) above, you may freely assist each other in developing or making copies of B) -- the point of this exercise is not to force you to read the manual pages for gmake if you are not familiar with make environments; however, a common build and invocation method is required for logistically plausible testing.

Suggestion 0: Please try copying your tarball into a clean directory, unpacking, and running "make test" in a linux environment before turning in this assignment. (If you do not attempt to discover if your code compiles before turning it in, we are not going to be charitably inclined to figure out how to fix your build for you after the deadline.) Once this basic sanity check passes, uses the autograder output to correct any errors with the provided test cases.

Suggestion 1: note that implementing dynamic scheduling involves performing associative search - for the scales considered, this can be readily implemented (inefficiently) via linear search -- substantially more efficient data structures e.g. dictionaries/maps/etc. can also be used, but may not be worthwhile to create from scratch if you are implementing your solution in a language like C that neither supports them natively or through standard libraries. In !C, it is often simplest to use the supported associative search structures.

Suggestion 2: Do not attempt to program a closed form solution. It isn't going to work. Really, just don't. You want to do a cycle by cycle simulation.

Suggestion 3: No, really, you want to do a cycle by cycle simulation. Once you've parsed the input file into an internal data structure, the heart of your solution should be wrapped inside a loop that iterates over all of the structures that need to be updated in a given cycle and then repeats until all instructions have committed.

Consider the following as a reasonable guideline for your top level function:

int main(int argc, char** argv){

init(argc,argv);

unsigned int committedInsts=0;

unsigned int fetchIndex=0;

while(committedInsts<icount){

committedInsts=commit(committedInsts);

WB();

Issue();

Dispatch();

Rename();

Decode();

fetchIndex=fetch(fetchIndex);

cyclecount++;

}

emitOutput();

return 0;

}

Suggestion 4: You're going to want to simulate the relevant structures. The in-order portions of the pipeline effectively function as queues - so, use queues to model them. The free list can be similarly modeled. The map table is a table, as is the ready table - both can be modeled with simple arrays.

The IQ needs associative lookups - you can be cleverly efficient with mapping structures or you can just search the entire structure linearly every cycle, your choice.

Suggestion 5: Excluding your input/output parsing/formatting code (which may vary greatly in verbosity between languages) the functions implementing your primary simulator loop should probably be no more than 100-200 lines of code, give or take. Depending on implementation approach, most pipeline stages should be relatively straightforward to implement. Expect RENAME and ISSUE to be somewhat more complicated.

Suggestion 6: You will want to keep track of all of the intermediate data associated with your instruction as it goes through each stage. It's probably a good idea to define a data structure for each instruction that collects both its architectural properties and its transient microarchitectural state properties (i.e. at the least, what an ROB entry would need to hold)

Suggestion 7: Once all of the provided example files work for you, consider creating some input files of your own to see how the simulation output responds -- the real goal of this assignment is to get you thinking about how you would build (one of the components of) a tool to help explore the many possible implementations and policy options that are possible in processors. How might you need to change your code to model asymmetric numbers of functional units? Variable functional unit latency/pipelining? Cache misses? Control instructions?