辅导案例-CSE 130-Assignment 2

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

Assignment 2
CSE 130: Principles of Computer System Design, Fall 2020
Due: Friday, November 20 at 9:00AM
Goals
The goal for Assignment 2 is to modify your single-threaded RPC server from Assignment 1 to provide multi-threading and
to provide a simple key-value store for mathematical operations. It must still provide all of the original functionality from
Assignment 1, and will support the following additional features:
1. Multiple threads, one per client, up to 16 simultaneous threads serving clients.
2. Support for variables in math operations. This requires that values be stored in a key-value store.
3. Sharing of the key-value store across all server threads, and synchronization between the server threads for reading and writing
key-value pairs.
As usual, you must have a design document and writeup along with your README.md in your git repository. Your code
must build rpcserver using make.
Programming assignment: Multi-threaded RPC server with simple key-value store
Design document
Before writing code for this assignment, as with every other assignment, you must write up a design document. Your design
document must be called DESIGN.pdf, and must be in PDF.
Your design should describe the design of your code in enough detail that a knowledgeable programmer could duplicate your
work. This includes descriptions of the data structures you use, non-trivial algorithms and formulas, and a description of each
function with its purpose, inputs, outputs, and assumptions it makes about inputs or outputs.
Write your design document before you start writing code. It’ll make writing code a lot easier. Also, if you want help with
your code, the first thing we’re going to ask for is your design document. We’re happy to help you with the design, but we can’t
debug code without a design any more than you can.
You must commit your design document before you commit the code specified by the design document. You’re welcome
to do the design in pieces (e.g., overall design and detailed design of marshaling functions but not of the full server), as long as
you don’t write code for the parts that aren’t well-specified. We expect you to commit multiple versions of the design document;
your commit should specify why you changed the design document if you do this (e.g., “original approach had flaw X”, “detailed
design for module Y”, etc.). If you commit code before it’s designed, or you commit your design a few minutes before the
working code that the design describes, you will lose points. We want you to get in the habit of designing components before
you build them.
Because this assignment builds on the previous one, you may base your design document for Assignment 2 on that from your
previous assignment. Of course, this assumes that your overall design is similar to that from your prior assignment.
Program functionality
This assignment incorporates all of Assignment 1, and builds on it, so you should be familiar with the material in Assignment 1
and apply it here as well. We’re not going to repeat parts of the RPC protocol described for the first assignment.
You may only use system calls (read(), write(), send(), recv()) for input or output of user data. You may use string
functions to manipulate user data, and you may use fprintf() or similar calls for error messages. Note that string functions
like sprintf() and sscanf() don’t perform user input or output, and are thus allowed.
Your code may be either C or C++, but all source files must have a .cpp suffix and be compiled by clang++ with no errors
or warnings using the following flags: -std=gnu++11 -Wall -Wextra -Wpedantic -Wshadow
Variables and math operations
Assignment 2 builds on your prior assignment that implemented a simple RPC protocol, and follows that protocol except for the
changes listed below. These changes are necessary to support the use of either variables or integers for math operations. We also
want to support more math operations, including integer division and integer remainder. The protocol for file operations hasn’t
Assignment 2, CSE 130, Fall 2020 - 1 - © 2020 Ethan L. Miller
changed from Assignment 1, but you must still support the file operations from that assignment, and must support having them
multi-threaded.
The full set of arithmetic operators for this assignment is:
Operator Opcode Number of operands
add (+) 0x0101 2–3
sub (-) 0x0102 2–3
mul (*) 0x0103 2–3
div (/) 0x0104 2–3
mod (%) 0x0105 2–3
del 0x010f 1
The basic math operators all take either 2 or 3 operands. The first two operands are the inputs to the math operation, and
are either integers or variable names. The third operand, if present, is the name of the variable into which the result is stored, in
addition to returning the value as the result of the RPC call.
Each opcode is modified with zero or more flags to indicate whether the first operand and/or second operand is a variable,
and whether the result should be stored into a variable, in which case there’s a third operand, which is always a variable name.
The flags are OR’ed together with the opcode by the client, so the server knows what arguments to expect. The flags are:
VARIABLE_A: 0x10 VARIABLE_B: 0x20 VARIABLE_RESULT: 0x40
For example, the math expression y = x - 2 would have base opcode 0x0102, which would be OR’ed with 0x10 and
with 0x40 because the first input is a variable, and the result is to be stored into a variable. The full 16-bit operation code sent to
the server is thus: 0x0102 | 0x10 | 0x40 = 0x0152. Numbers are still 64-bit signed integers, but variable names are
sent as a single byte for the length of the name (at most 31 characters long), followed by the actual bytes that make up the name.
The single byte indicating the length of the name must therefore have a value between 1–31, inclusive; any other value is an error
(EINVAL). The third operand is only sent if necessary—if the VARIABLE_RESULT flag is set in the opcode.
Because we have more operations and variable names, there are more ways that an RPC can fail. This table summarizes the
errors that might occur, and the error code that your server must return for each error:
Overflow EOVERFLOW (75) Divide by zero EINVAL (22)
Undefined variable ENOENT (2) Undefined operation ENOTSUPP (95)
The del operator doesn’t do any math. Instead, it deletes an existing variable name from the key-value store (see below). If
the variable name doesn’t exist, it returns ENOENT, but returns zero otherwise unless the name itself is invalid (see below).
Key-value store
Those variable values have to be maintained somewhere, so your RPC server must maintain a table of keys and their associated
values. As described above, a key is like a variable name: it’s from 1–31 characters long, and must start with a letter (upper or
lower case). All other characters in a variable name must be a letter, number, or underscore (_). Any key that doesn’t match these
requirements will cause the operation to fail with EINVAL.
Each key has an associated value: a signed 64-bit integer (int64_t). The keys and values are stored in an in-memory
chained hash table, where the number of linked lists may be specified on the command line with the -H size option. If it
isn’t specified, assume -H 32. For example, specifying -H 64 would mean that there are 64 slots in the base table, each of
which may have a linked list coming off it. Because names are at most 31 characters long, we recommend that a linked list node
be a struct that contains a fixed-size array for the name and a 64-bit signed integer for the value—no need for variable-size
structures. Of course, you’ll also need to have other fields to support the chained hash table that you’re building.
As with most hash tables, your hash table must support insertion, replacement (replacing an old value with a new one),
deletion, and lookup. Hash tables were covered in CSE 101, so we won’t go into detail here, but please ask during office hours if
you need a refresher. Keys and values can be inserted into the hash table as the result of a math operation, or can be updated if the
variable name already exists. Keys are looked up to get a value to put into a math operation if an operand is a variable. Deletion
is done explicitly by the del operator, as described above. You may use any reasonable hash function you like, but you must
cite your source if you get one from the Interwebs. cityhash is considered a good hash function to use, as is murmurhash,
but you may pick another one if you like. The key-value store must be written by you, using no C++ STL types. Write the
hash table and linked list code yourself. This shouldn’t take long, and writing it yourself will give you flexibility to provide
synchronization as needed.
The hash table is shared among all threads running in the server, and persists across all connections the server accepts. Thus,
access to the key-value store must be synchronized between threads, since it could cause corruption if multiple threads accessed
the key-value store without coordination. You should use either semaphores or locks and condition variables to implement this.
Assignment 2, CSE 130, Fall 2020 - 2 - © 2020 Ethan L. Miller
The last thing that the key-value store supports is the ability to dump (opcode 0x0301) or load (opcode 0x0302) the entire
key-value store. Dump writes keys and values, one per line, in the format varname=-1234 (no spaces!), and load reads keys
and values in the same format. These opcodes thus each take a single argument—a file name—using the same format as other file
names. Load and dump may fail in the same way that any read or write operation may fail. Variables loaded into the key-value
store don’t clear existing values, but they do replace a value already present for a variable that’s in the file. For example, if x=5 in
the key-value store and you load x=8, it would replace the original value for x. However, y=8 would be unaffected if it’s not in
the file. You may use dprintf() to print out the values and fscanf() and fdopen(), respectively, to write and read
the keys and values. There is also an operator (0x0310) that clears all values from the key-value store; it takes exactly one
argument: the 32-bit unsigned integer 0x0badbad0. This is done to ensure that the key-value store isn’t accidentally erased. If
this “magic number” isn’t passed as a parameter, return EINVAL.
Multi-threading
Your previous RPC server could only handle a single request at a time, limiting throughput. In Assignment 2, you’ll use multi-
threading to improve throughput. This is done by having each request processed in its own thread. The typical way to do this
is to have a “pool” of “worker” threads available for use. The server creates N threads when it starts; N = 4 by default, but
the argument -N nthreads tells the server to use N = nthreads . For example, -N 6 would tell the server to use 6 worker
threads.
Each thread waits until there is work to do, does the work, and then goes back to waiting. Worker threads may not “busy wait”
or “spin lock”; they must actually sleep by waiting on a lock, condition variable, or semaphore. Each worker thread does the same
basic thing that your server did for Assignment 1 to handle client requests, so you can likely reuse much of the code for it. You’ll
have to add code for the dispatcher to pass the necessary information to the worker thread. Worker threads should be independent
of one another; if they ever need to access shared resources, they must use synchronization mechanisms for correctness. Each
thread may allocate up to 8 KiB of buffer space for its own use.
A single “dispatch” thread listens to the connection and, when a connection is made, alerts (using a synchronization method
such as a semaphore or condition variable) one thread in the pool to handle the connection. Once it has done this, it assumes that
the worker thread will handle the connection itself, and goes back to listening for the next connection. The dispatcher thread is a
small loop (likely in a single function) that contacts one of the threads in the pool when a request needs to be handled. If there are
no threads available, it must wait until one is available to hand off the request. You’ll need to use a synchronization mechanism
for this.
You’ll be using the POSIX threads library (libpthreads) to implement multithreading. There are a lot of calls in this
library, but you only need a few for this assignment. You’ll need:
• pthread_create (...)
• Semaphores: sem init, sem overview, sem wait, and sem post.
• Locks & condition variables: pthread mutex init, pthread mutex lock, pthread mutex unlock,
pthread cond wait, pthread cond signal, pthread cond init
Since your server will never exit, you don’t need pthread_exit, and you likely won’t need pthread_join since
your thread pool never gets smaller. You may not use any synchronization primitives other than (basic) semaphores, locks, and
condition variables. You are, of course, welcome to write code for your own higher-level primitives from locks and semaphores
if you want.
There are several pthread tutorials available on the internet, including this one and this one. We’ll also cover some basic
pthreads techniques in section.
If you prefer, you may use the C++ STL threading library, but only the classes in , , and
.
Testing your code
You should test your code on your own system. You can run the server on localhost using a port number above 1024 (e.g.,
8912). Come up with requests you can make of your server, and try them using the Python program we’ve posted on Canvas.
When you’re ready to submit your code, you should consider cloning a new copy of your repository (from gitlab.soe.ucsc.edu)
to a clean directory to see if it builds properly, and runs as you expect. That’s an easy way to tell if your repository has all of the
right files in it. You can then delete the newly-cloned copy of the directory on your local machine once you’re done with it. As
with other assignments, there will be an automated service on gitlab.soe.ucsc.edu that will allow you to run a subset of
the tests that we’ll run on your code for each assignment. See earlier assignments for details.
Assignment 2, CSE 130, Fall 2020 - 3 - © 2020 Ethan L. Miller
README and Writeup
As for previous assignments, your repository must include (README.md) and (WRITEUP.pdf). The README.md file should
be short, and contain any instructions necessary for running your code. You should also list limitations or issues in README.md,
telling a user if there are any known issues with your code.
Your WRITEUP.pdf is where you’ll describe the testing you did on your program and answer any short questions the
assignment might ask. The testing can be unit testing (testing of individual functions or smaller pieces of the program) or
whole-system testing, which involves running your code in particular scenarios.
For Assignment 2, please answer the following questions:
• Which parts of the server can run in parallel? About what fraction of the time must a thread doing a math
problem with three variables run inside a critical section? How does this impact parallelism?
• Does your multi-threaded design return a response to an individual client faster, slower, or at the same speed?
Justify your answer.
Submitting your assignment
All of your files for Assignment 2 must be in the asgn2 directory in your git repository. Your repository must meet the
Assignment Acceptance Criteria described on Canvas when you submit a commit for grading. It’s OK to commit and push
a repository that doesn’t meet minimum requirements for grading. However, we’ll only grade a commit that meets these
minimum requirements. You must submit the commit ID you want us to grade via Google Form, linked to the assignment
page on Canvas. This must be done before the assignment deadline.
Hints
• Start early on the design. As you saw with Assignment 1, RPC servers are difficult to write and debug. You
should reuse much of your design and code from Assignment 1, particularly if you got it working well.
• Attend section the weeks of November 2nd and 9th for help on multithreading.
• The rules on system calls for Assignment 1 apply for Assignment 2 as well, with exceptions noted above. You
should also get familiar with libpthread.
• Consider using getopt(3) to parse command-line options.
• Implement your chained hash table separately, outside of your server, and then integrate it into your server
code.
• Test your server using the RPC client we’re providing on Canvas. Make sure you test error conditions as well
as “normal” operation. Use options on rpcclient to dump out network data if it helps. Also, run multiple
instances of the client to ensure that you get parallelism. You’ll need version 2 of rpcclient.
• If you need help, use online documentation such as man pages and documentation on Makefiles. You’ll
need to install glibc-doc to get the man pages for pthreads. If you still need help, ask the course staff.
You should be familiar with the rules on academic integrity before you start the assignment.
• Read and follow the Assignment Acceptance Criteria on Canvas.
Grading
As with all of the assignments in this class, we will be grading you on all of the material you turn in, with the approximate
distribution of points as follows: design document (35%); coding practices (15%); functionality (40%); writeup (10%).
If you submit a commit ID without a green checkmark next to it or modify .gitlab-ci.yml in any way, your
maximum grade is 5%. Make sure you submit a commit ID with a green checkmark.
Assignment 2, CSE 130, Fall 2020 - 4 - © 2020 Ethan L. Miller

欢迎咨询51作业君