Simple Processor Pipeline Simulator


Due: Monday April 11 (Recommended). Extended Deadline: April 18

In this assignment, your group will implement a simple in-order processor pipeline simulator, similar to the concepts that we covered in class. Your simulator will read an instruction trace with a simplified format. It should be cycle-accurate, modeling how each instruction moves through the pipeline. You will model the simple 5-stage in-order pipeline described in class. You will also design and implement data structures to track instruction progress until instructions retire and exit the pipeline. You need to print periodic and final statistics about total execution time and an instruction type histogram. The details of the system are described below.

System to Simulate:

As discussed in class, the processor has a 5-stage pipeline with the following stages: Instruction Fetch (IF), Instruction Decode and Read Operands (ID), Instruction Issue and Execute (EX), Memory Access (MEM), and Writeback Results/Retire (WB). The processor is W-wide superscalar (i.e., W pipelines running in parallel). Each instruction has to go through all five stages in this order (IF-ID-EX-MEM-WB). The next W instructions in program order should proceed to the next pipeline stage as long as they have no dependences. The simulated processor has a single integer ALU, a single floating point unit, a single branch execution unit, a single read port from the L1 data cache, and a single write port into the L1 data cache.

We make the following assumptions about the simulated system:

1. No branch prediction.
2. All instruction fetches hit in the L1 instruction cache and don't need to read from memory.
3. All memory operations (loads and stores) hit in the L1 data cache and don't need to read/write from/to memory.
4. Integer and floating point operations execute in 1 cycle.
5. Any branch instruction delays instruction fetch until after the branch executes.
   The next instruction(s) to be fetched will go to the IF stage in the cycle after the branch finishes the EX stage.
6. All loads and stores access the L1 data cache in 1 cycle.

Input Trace:

Your simulator will consume a trace with the following format: each line represents one instruction, with comma-separated values representing the following:

1. Instruction program counter (PC): A hexadecimal value representing the instruction address.
2. Instruction type: A value between 1 and 5:
   1. Integer instruction: An instruction that uses the integer ALU.
   2. Floating point instruction: An instruction that uses the floating point unit.
   3. Branch: An instruction that transfers control to the next instruction in the trace.
   4. Load: An instruction that reads a value from memory.
   5. Store: An instruction that writes a value to memory.
3. A list of PC values for instructions that the current instruction depends on. Some instructions don't have any dependences, so this list will be empty. Other instructions depend on 1-4 other instructions.

We provide three sample traces with this format in the assignment file directory (https://canvas.sfu.ca/courses/67328/files/folder/Proj2). These traces are simplified versions of traces from the first value prediction championship (https://www.microarch.org/cvp1/): one mainly integer trace, one floating point trace, and one server trace. Each trace has approximately 30 million instructions. You should decompress each trace before using it in your simulation (run gunzip file_name to decompress each trace).

Tracking Dependences

Each instruction can execute as long as its dependences are satisfied. Dependences (hazards) you need to track for each instruction fall into the following three categories:

1. Structural hazards: Another instruction in the same cycle is using the same functional unit. An integer instruction cannot execute in the same cycle as another integer instruction. A floating point instruction cannot execute in the same cycle as another floating point instruction.
   A branch instruction cannot execute in the same cycle as another branch instruction. A load instruction cannot go to the MEM stage in the same cycle as another load instruction. A store instruction cannot go to the MEM stage in the same cycle as another store instruction.
2. Control hazards: A branch instruction halts instruction fetch until the cycle after the branch executes (finishes the EX stage).
3. Data hazards: An instruction cannot go to EX until all its data dependences are satisfied. A dependence on an integer or floating point instruction is satisfied after that instruction finishes the EX stage. A dependence on a load or store is satisfied after that instruction finishes the MEM stage.

Running Replications

To get different estimates for runtime, you need to run six different replications for each configuration. Each replication contains 1 million instructions. The first replication starts at instruction 1 in the trace. The second replication starts at instruction 5M. The third starts at instruction 10M. The fourth starts at instruction 15M. The fifth starts at instruction 20M. The sixth starts at instruction 25M. For each replication after the first, all data dependences on instructions before the replication's starting instruction should be ignored (that is, if instruction #10M is data-dependent on an instruction before it, that dependence should not be considered in the simulation).

Experimental Design

You need to run a two-factor experimental design in which you measure the impact of the pipeline width (W) and the workload trace (T). The pipeline width has four levels: 1, 2, 3, or 4. The workload has three levels, one per provided trace. The pipeline width determines how many instructions can go to IF, ID, EX, MEM and WB in parallel in any given cycle (as long as their dependences are satisfied). You need to compute the overall mean runtime across all 4 levels of W and 3 levels of T, the impact of each level, and the allocation of variation to W, T, their interaction, and experimental error.
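The allocation of variation described above could be computed along the following lines. This is only a sketch, and it assumes the textbook two-factor full-factorial model with replications, y_ijk = mean + effectW_i + effectT_j + interaction_ij + error_ijk, with the same number of replications in every cell. The names `Cube`, `Allocation`, and `allocateVariation` are hypothetical and not part of the assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y[i][j][k]: runtime for W level i, trace j, replication k (assumed layout).
using Cube = std::vector<std::vector<std::vector<double>>>;

struct Allocation {
    double mean = 0.0;            // overall mean runtime
    std::vector<double> effectW;  // impact of each W level
    std::vector<double> effectT;  // impact of each trace
    double ssW = 0.0, ssT = 0.0, ssWT = 0.0, ssE = 0.0, ssTotal = 0.0;
};

// Assumes every cell holds the same number of replications.
Allocation allocateVariation(const Cube& y) {
    assert(!y.empty() && !y[0].empty() && !y[0][0].empty());
    const std::size_t a = y.size(), b = y[0].size(), r = y[0][0].size();
    std::vector<std::vector<double>> cell(a, std::vector<double>(b, 0.0));
    std::vector<double> meanW(a, 0.0), meanT(b, 0.0);
    Allocation out;

    // Cell means, per-level means, and the overall mean.
    for (std::size_t i = 0; i < a; ++i) {
        for (std::size_t j = 0; j < b; ++j) {
            for (double v : y[i][j]) cell[i][j] += v;
            cell[i][j] /= static_cast<double>(r);
            meanW[i] += cell[i][j] / static_cast<double>(b);
            meanT[j] += cell[i][j] / static_cast<double>(a);
            out.mean += cell[i][j] / static_cast<double>(a * b);
        }
    }

    // Main effects and their sums of squares.
    for (std::size_t i = 0; i < a; ++i) {
        const double e = meanW[i] - out.mean;
        out.effectW.push_back(e);
        out.ssW += static_cast<double>(b * r) * e * e;
    }
    for (std::size_t j = 0; j < b; ++j) {
        const double e = meanT[j] - out.mean;
        out.effectT.push_back(e);
        out.ssT += static_cast<double>(a * r) * e * e;
    }

    // Interaction and experimental error.
    for (std::size_t i = 0; i < a; ++i) {
        for (std::size_t j = 0; j < b; ++j) {
            const double g = cell[i][j] - meanW[i] - meanT[j] + out.mean;
            out.ssWT += static_cast<double>(r) * g * g;
            for (double v : y[i][j]) {
                const double err = v - cell[i][j];
                out.ssE += err * err;
            }
        }
    }
    out.ssTotal = out.ssW + out.ssT + out.ssWT + out.ssE;
    return out;
}
```

For this assignment the input would be a 4 x 3 x 6 array of measured runtimes. The fraction of variation attributed to W is then `ssW / ssTotal`, and similarly for T, the interaction, and the error.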
You need to run 6 replications (explained above) for each of the 4x3 configurations to estimate the impact of each level and the allocation of variation.

Performance Metrics:

You need to keep track of the following performance metrics for the duration of the simulation:

• Simulation clock (in cycles). Each simulation starts in cycle 0 and ends when the last simulated instruction (the millionth, in a 1M-instruction replication) finishes its WB stage.
• Total execution time (in cycles) at the end of simulation.
• A histogram containing the breakdown of retired instructions by instruction type. For example: 30% are integer instructions, 5% are floating point, 20% are branches, 30% are loads and 15% are stores.

At the end of each simulation, you need to print a report with all the above performance metrics.

Submission instructions:

You need to submit a single tarball "proj2.tar.gz" that contains the following:

1. Source code of your simulation program (including all header files, c or cpp files and appropriate comments) and a Makefile that we can use to compile your code on CSIL machines. Your executable should run using the following command line parameter format:

   ./proj2 trace_file_name start_inst inst_count W

   where trace_file_name is the complete path to the trace file; start_inst is the instruction in the trace at which to start the simulation; inst_count is the number of instructions to simulate starting from start_inst; and W is the pipeline width. For example:

   ./proj2 srv_0 10000000 1000000 2

   should simulate a 2-wide processor using the srv_0 trace, starting from instruction 10M and simulating 1M instructions. Please note that when we grade your assignment code, we will run the following commands:

   tar xvzf proj2.tar.gz
   make proj2
   ./proj2 srv_0 10000000 1000000 2

   So please make sure the tarball you submit can successfully pass these commands on CSIL machines.
2. A pdf file that contains the results of your two-factor analysis. You should include the results of all replications for each of the 4 levels of W and 3 traces, the overall mean, the impact of each level, and the allocation of variation results.
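As a starting point for the input handling, here is a minimal sketch of how the command line, a single trace line, and the instruction type histogram might be handled. It is not a reference implementation. The names `Config`, `TraceInst`, `Histogram`, `parseArgs`, and `parseTraceLine` are hypothetical. The sketch assumes that type codes run from 1 to 5 and that each trace line has the shape `PC,type,dep1,dep2,...` with hexadecimal PC values, so check both against the provided traces before relying on it.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct Config {
    std::string trace;
    std::uint64_t startInst = 0, instCount = 0;
    unsigned width = 0;
};

// Expects: ./proj2 trace_file_name start_inst inst_count W
bool parseArgs(int argc, const char* const argv[], Config& cfg) {
    if (argc != 5) return false;
    try {
        cfg.trace = argv[1];
        cfg.startInst = std::stoull(argv[2]);
        cfg.instCount = std::stoull(argv[3]);
        cfg.width = static_cast<unsigned>(std::stoul(argv[4]));
    } catch (const std::exception&) {
        return false;  // a numeric argument could not be parsed
    }
    return cfg.width >= 1 && cfg.instCount >= 1;
}

struct TraceInst {
    std::uint64_t pc = 0;
    int type = 0;                     // assumed encoding: 1..5
    std::vector<std::uint64_t> deps;  // PCs this instruction depends on
};

// Parses one comma-separated trace line; returns false if it is malformed.
bool parseTraceLine(const std::string& line, TraceInst& out) {
    std::istringstream in(line);
    std::string field;
    out.deps.clear();
    try {
        if (!std::getline(in, field, ',')) return false;
        out.pc = std::stoull(field, nullptr, 16);
        if (!std::getline(in, field, ',')) return false;
        out.type = std::stoi(field);
        if (out.type < 1 || out.type > 5) return false;
        while (std::getline(in, field, ',')) {
            if (!field.empty()) out.deps.push_back(std::stoull(field, nullptr, 16));
        }
    } catch (const std::exception&) {
        return false;  // non-numeric field
    }
    return true;
}

// Counts retired instructions by type; index 0 is unused.
struct Histogram {
    std::array<std::uint64_t, 6> count{};
    void record(int type) {
        assert(type >= 1 && type <= 5);
        ++count[static_cast<std::size_t>(type)];
    }
    double percent(int type) const {
        std::uint64_t total = 0;
        for (std::uint64_t c : count) total += c;
        return total == 0 ? 0.0 : 100.0 * static_cast<double>(count[static_cast<std::size_t>(type)]) / static_cast<double>(total);
    }
};
```

A simulator's `main` could call `parseArgs(argc, argv, cfg)`, skip to `cfg.startInst` in the trace, and call `Histogram::record` each time an instruction finishes WB. Returning `false` on malformed input, rather than throwing, leaves it to the caller to decide whether to skip the line or abort.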