2023/12/8 18:06 Final Project: Microarchitectural Design-space Exploration Using SimpleScalar
Final Project: Microarchitectural Design-space
Exploration Using SimpleScalar
FAQ: Design Project FAQ (https://psu.instructure.com/courses/2277465/pages/design-project-faq?
wrap=1)
Final Project: Design Space Exploration
(Note, this, like other assignments, can utilize grace hours)
In this project, you are going to use SimpleScalar as the evaluation engine to perform a design space
exploration, using the provided framework code, over an 18- dimensional processor pipeline and
memory hierarchy design space. You will use a 5-benchmark suite as the workload. This is an
individual project, and submitted artifacts include BOTH a) your code implementations of
replacements for two stub functions within the provided framework and b) a project report discussing
your chosen heuristics for exploring the design space that summarizes your findings and how they
confirm or run counter to intuitions developed in the discussion of material in class and assigned
readings. The project report, the data contained within and their analysis will be the primary means of
assessing this project, but a submission of your code modifications in YOURCODEHERE.cpp is
required.
The provided framework already handles the invocation of SimpleScalar, the recording of evaluated
points, and the collection of data from each evaluation. The set of possible points within the design
space to be considered are constrained by the provided shell script wrapper used to employ
SimpleScalar as a design point evaluation tool. All allowed configuration parameters for each
dimension of the design space are described in the provided shell script. The shell script takes 18
integer arguments, one for each configuration dimension, which expresses the index of the selected
parameter for each dimension. All reported results should be normalized against a baseline design
with configuration
parameters: 0, 0, 0, 0, 0, 0, 5, 0, 5, 0, 2, 2, 2, 3, 0, 0, 3, 0 (This baseline is hard-coded into the
framework).
Note that not all possible parameter settings represent a valid combination. One of your tasks
will be to write a configuration validation function based upon restrictions described later in this
document. Further, note that this design space is too large to efficiently search in an exhaustive
manner, so a key project goal will be to develop heuristics in prioritizing which design points to
explore first.
2023/12/8 18:06 Final Project: Microarchitectural Design-space Exploration Using SimpleScalar
Your assignment is to, with an evaluation count limit of 1000 design points, explore the design space
in order to select the best performing design under a set of two different optimization functions.
These are
1) the “best” performance-oriented design (in term of the minimum geometric mean of normalized
energy-delay-squared product across all benchmarks),
2) the most energy-efficient design (as measured by the lowest geometric mean of normalized
energy-delay product [units of energy delay product are joule-seconds] across all benchmarks).
The code to perform this exploration is incomplete in the following two ways:
The validation function that checks if a given configuration complies with
project/simplescalar/logical constraints is not fully implemented (it currently only checks that each
dimension is within bounds independently, but does not check their interactions). You will need to
write the validation function to enforce these interdimensional checks (i.e. the L2 must be bigger
than the L1, etc.)
The proposal function that generates the next configuration to explore (i.e. moves to the next of
the 1000 sample points) currently generates a completely random next configuration, thereby not
exploring the space of possible configurations with any intelligence or efficiency. Use your
knowledge of performance and efficiency tradeoffs to heuristically narrow the search space such
that the performance (ED^2P) and efficiency (EDP) options explore a different (overlapping is
fine) set of 1000 designs
Modeling Considerations:
The IC (instruction count) for each benchmark is a constant. Thus, for performance, you will be trying
to optimize sim_IPC and the clock cycle time. We will use the following very simplistic model for clock
cycle time: The clock cycle time is determined by the fetch width and whether the machine is in-order,
or dynamic as follows:
Dynamic, fetch width = 1 means a 115 ps clock cycle
Static, fetch width = 1 means a 100 ps clock cycle
Dynamic, fetch width = 2 means a 125 ps clock cycle
Static, fetch width = 2 means a 120 ps clock cycle
Dynamic, fetch width = 4 means a 150 ps clock cycle
Static, fetch width = 4 means a 140 ps clock cycle
Dynamic, fetch width = 8 means a 175 ps clock cycle
Static, fetch width = 8 means a 165 ps clock cycle
Power and energy settings are as follows:
Core Leakage power:
Dynamic, fetch width = 1 : 1.5 mW
Static, fetch width = 1 : 1mW
Dynamic, fetch width = 2 : 2 mW
Static, fetch width = 2 : 1.5 mW
2023/12/8 18:06 Final Project: Microarchitectural Design-space Exploration Using SimpleScalar
Dynamic, fetch width = 4 : 8 mW
Static, fetch width = 4 : 7 mW
Dynamic, fetch width = 8 : 32 mW
Static, fetch width = 8 : 30 mW
Cache and memory access energy, leakage/refresh power (don’t forget instruction fetch when
calculating access energy!):
8KB : 20pJ, 0.125mW
16KB : 28pJ, 0.25mW
32KB : 40pJ, 0.5mW
64KB : 56pJ, 1mW
128KB : 80pJ, 2mW
256KB : 112pJ, 4mW
512KB : 160pJ, 8mW
1024KB : 224pJ, 16mW
2048KB : 360pJ, 32mW
Main Memory : 2nJ, 512mW
Energy per committed instruction:
Dynamic, fetch width = 1 : 10pJ
Static, fetch width = 1 : 8pJ
Dynamic, fetch width = 2 : 12pJ
Static, fetch width = 2 : 10pJ
Dynamic, fetch width = 4 : 18pJ
Static, fetch width = 4 : 14pJ
Dynamic, fetch width = 8 : 27pJ
Static, fetch width = 8 : 20pJ
Area cost values:
Static scheduled processor (base): fetch width /2 * 1 sq-mm
Dynamically scheduled processor (base): 4 + fetch width /3 * 1 sq-mm
Cache: 1 sq-mm/32 KB capacity, all cache levels
LSQ: (size/2) * 1/128 sq-mm
RUU: (size/4) * 1/128 sq-mm
Branch predictors:
Perfect: 1234567890 sq-mm
Not-taken : 0 sq-mm
Bimodal: 0.25 sq-mm
2-level: 0.5 sq-mm
comb: 0.75 sq-mm
Other setting constraints (for validation) are as follows:
2023/12/8 18:06 Final Project: Microarchitectural Design-space Exploration Using SimpleScalar
1. The il1 block size must match the ifq size (e.g., for the baseline machine the ifqsize is set to 1
word (8B) then the il1 block size is also set to 8B). The dl1 should have the same block size as
your il1.
2. The ul2 block size must be at least twice your il1 (dl1) block size with a maximum block size of
128B. Your ul2 must be at least as large as il1+dl1 in order to be inclusive.
3. The il1 sizes and il1 latencies are linked as follows (the same linkages hold for the dl1 size and
dl1 latency):
il1 = 8 KB (baseline, minimum size) means il1lat = 1
il1 = 16 KB means il1lat = 2
il1 = 32 KB means il1lat = 3
il1 = 64 KB (maximum size) means il1lat = 4
The above are for direct mapped caches. For 2-way set associative add an additional cycle of
latency to each of the above; for 4-way (maximum) add two additional cycles.
4. The ul2 sizes and ul2 latencies are linked as follows:
ul2 = 128 KB (minimum size) means ul2lat = 7
ul2 = 256 KB (baseline) means ul2lat = 8
ul2 = 512 KB means ul2 lat = 9
ul2 = 1024 KB (1 MB) means ul2lat = 10
ul2 = 2 MB (maximum size) means ul2lat = 11
The above are for 4-way set associative caches. For 8-way set associative add one additional
cycle of latency to each of the above; for 16 way (maximum) add two additional cycles; for 2-way
set associative subtract 1 cycle; for direct mapped subtract 2 cycles.
5. Miscellaneous
mplat is fixed at 3
fetch:speed ratio of no more than 2
ifqsize can be set to a maximum of 8 words (64B)
decode:width and issue:width equal to your fetch:ifqsize
mem:width is fixed at 8B (memory bus width)
memport can be set to a maximum of 2
mem:lat is fixed at 51 + 7 cycles for 8 word
tlb:lat is fixed at 30, maximum tlb size of 512 entries for a 4-way set associative tlb
ruu:size maximum of 128 (assume must be a power of two)
lsq:size maximum of 32 (assume must be a power of two)
The framework code will evaluate a fixed number of design points per run. This parameter cannot
be changed. The key part of your task in this project is to develop the (set of/parameterized) heuristic
search function(s) that select(s) the next design point to consider, given either a performance, or an
energy efficiency goal, etc. When generating plots for the final report, the output for each plot should
correspond to the framework code being run only once for each of the optimization function options.
Your report for the course Project must be submitted via Canvas. (PDF only)
2023/12/8 18:06 Final Project: Microarchitectural Design-space Exploration Using SimpleScalar
The report should be a minimum of 4, maximum of 10, typed, single space pages in 12 point font for
text and tables. The report should include, at minimum, the following four plots:
1 line plot of the normalized geomean ED^2P (y axis) for each considered design point vs. number of
designs considered (x axis);
1 line plot of the normalized geomean of energy-delay product (y axis) vs number of designs
considered;
1 bar chart showing normalized per-benchmark ED^2P and geomean normalized ED^2P for the best
performance-optimized design;
1 bar chart showing per-benchmark normalized energy-delay product and geomean normalized
energy delay product for the most energy-efficient design found;
You will not be penalized for not finding the absolute best design points. However, you must produce
a distinct design space sequencing heuristic for both optimization metrics, and defend/explain both
the heuristic and its results in your report (i.e. if you claim that the highest performing configuration is
the one with minimum resources in all dimensions, and list “magic” as the reason, you will not receive
full points). For clarity in the written report, when listing the best design points, please do not
represent them in terms of their index representations (e.g. 1 0 0 5 2 ...) and instead describe the
actual value used for each dimension in a table or similar presentation. Points will also be assigned
for following the guidelines and adhering to appropriate levels of clarity, and style (and spelling,
grammar, etc.) for a technical document (e.g. where possible, avoid tangential discourse on whether
a task was difficult or whether a result was, for instance, exciting – focus on analysis within the scope
of the design space explored, how past information was used to generate the next configuration, the
reasons that particular configuration combinations are sensible, etc.). Reports that do not cover
both of the specified optimization criteria will receive a maximum of 50% as their score.
You are allowed to edit the entirety of YOURCODEHERE.cpp, not just the specified functions (i.e. you
may add as many auxiliary functions and data structures as you see fit.) If you would prefer to avoid
using the C++ framework, you may rewrite your own framework in a language of your choosing -
however, it is advised that this will be time consuming and the use of the standard framework is
strongly advised.
Submit the project writeup in this drop box. You will submit the "YOURCODEHERE.cpp" file in a
separate dropbox as a partial audit of your implementation efforts. (note - if choosing to re-implement
the framework in another language, for CANVAS simplicity, please submit a .zip file containing your
entire framework codebase and any associated build scripts instead).