CptS 570 Machine Learning, Fall 2020

Homework #4

Due Date: Tue, Dec 8 midnight via Blackboard

NOTE 1: Please use word processing software (e.g., Microsoft Word or LaTeX) to write your answers and submit them via Blackboard by the due date. The rationale is that hand-written answers are sometimes hard to read and understand.

NOTE 2: Please ensure that all the graphs are appropriately labeled (x-axis, y-axis, and each curve). The caption or heading of each graph should be informative and self-contained.

1. (Finite-Horizon MDPs.) Our basic definition of an MDP in class defined the reward function R(s) to be a function of just the state, which we will call a state reward function. It is also common to define a reward function to be a function of the state and action, written as R(s, a), which we will call a state-action reward function. The meaning is that the agent gets a reward of R(s, a) when it takes action a in state s. While this may seem to be a significant difference, it does not fundamentally extend our modeling power, nor does it fundamentally change the algorithms that we have developed.
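
For reference, here is a minimal sketch of finite-horizon value iteration with a state reward function R(s), the algorithm that parts b) and c) ask you to adapt. It is written in Python with assumed tabular inputs (T as an array of shape (|S|, |A|, |S|) and R of shape (|S|,)) and one common convention for the base case; the exact notation from class may differ slightly.

    import numpy as np

    def finite_horizon_value_iteration(T, R, H):
        # T[s, a, s2] = Pr(s2 | s, a); R[s] = reward for being in state s; H = horizon.
        # Returns V[h, s], the optimal value with h steps to go, and a greedy policy
        # pi[h, s], the action to take when h + 1 steps remain.
        S, A, _ = T.shape
        V = np.zeros((H + 1, S))
        pi = np.zeros((H, S), dtype=int)
        V[0] = R  # assumed base case: with 0 steps to go, only the state reward is collected
        for h in range(1, H + 1):
            # Q[s, a] = R[s] + sum over s2 of T[s, a, s2] * V[h - 1, s2]
            Q = R[:, None] + np.einsum('sap,p->sa', T, V[h - 1])
            V[h] = Q.max(axis=1)
            pi[h - 1] = Q.argmax(axis=1)
        return V, pi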

a) Describe a real-world problem where the corresponding MDP is more naturally modeled using a state-action reward function than a state reward function.

b) Modify the finite-horizon value iteration algorithm so that it works for state-action reward functions. Do this by writing out the new update equation that is used in each iteration and explaining the modification from the equation given in class for state rewards.

c) Any MDP with a state-action reward function can be transformed into an “equivalent” MDP with just a state reward function. Show how any MDP with a state-action reward function R(s, a) can be transformed into a different MDP with a state reward function R(s), such that the optimal policies in the new MDP correspond exactly to the optimal policies in the original MDP; that is, an optimal policy in the new MDP can be mapped to an optimal policy in the original MDP. Hint: It will be necessary for the new MDP to introduce new “bookkeeping” states that are not in the original MDP.

2. (k-th Order MDPs.) A standard MDP is described by a set of states S, a set of actions A, a transition function T, and a reward function R, where T(s, a, s′) gives the probability of transitioning to s′ after taking action a in state s, and R(s) gives the immediate reward of being in state s.

A k-th order MDP is described in the same way, with one exception: the transition function T depends on the current state s and also on the previous k − 1 states. That is, T(s_{k−1}, . . . , s_1, s, a, s′) = Pr(s′ | a, s, s_1, . . . , s_{k−1}) gives the probability of transitioning to state s′ given that action a was taken in state s and the previous k − 1 states were (s_{k−1}, . . . , s_1).
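
For example, with k = 2 the transition function has the form T(s_1, s, a, s′) = Pr(s′ | a, s, s_1): the distribution over the next state depends on the action a, the current state s, and the single previous state s_1.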

Given a k-th order MDP M = (S, A, T, R), describe how to construct a standard (first-order) MDP M′ = (S′, A′, T′, R′) that is equivalent to M. Here, equivalent means that a solution to M′ can be easily converted into a solution to M. Be sure to describe S′, A′, T′, and R′. Give a brief justification for your construction.

3. Some MDP formulations use a reward function R(s, a) that depends on the action taken in a state, or a reward function R(s, a, s′) that also depends on the resulting state s′ (we get reward R(s, a, s′) when we take action a in state s and then transition to s′). Write the Bellman optimality equation with discount factor β for each of these two formulations.
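
For reference, with a state reward function the Bellman optimality equation is commonly written as V*(s) = R(s) + β max_a Σ_{s′} T(s, a, s′) V*(s′); the question asks how this equation changes under each of the two reward formulations above.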


4. Consider a trivially simple MDP with two states S = {s0, s1} and a single action A = {a}. The reward function is R(s0) = 0 and R(s1) = 1. The transition function is T(s0, a, s1) = 1 and T(s1, a, s1) = 1. Note that there is only a single policy π for this MDP, which takes action a in both states.

a) Using a discount factor β = 1 (i.e., no discounting), write out the linear equations for evaluating the policy and attempt to solve the linear system. What happens and why?

b) Repeat the previous question using a discount factor of β = 0.9.
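
A useful numerical cross-check for policy evaluation is to solve the linear system (I − βP)V = R directly, where P is the state-to-state transition matrix under the (single) policy. Below is a minimal Python sketch under the convention V(s) = R(s) + β Σ_{s′} P(s, s′) V(s′); compare its output with your hand-derived equations.

    import numpy as np

    # Transition matrix under the only policy (always take action a):
    # from s0 go to s1, from s1 stay in s1.
    P = np.array([[0.0, 1.0],
                  [0.0, 1.0]])
    R = np.array([0.0, 1.0])  # R(s0) = 0, R(s1) = 1
    beta = 0.9

    # Solve (I - beta * P) V = R; note that for beta = 1 this matrix is singular,
    # which is exactly the issue part a) asks you to explain.
    V = np.linalg.solve(np.eye(2) - beta * P, R)
    print(V)  # [V(s0), V(s1)]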

5. Please read the following paper and write a brief summary (at most one page) of the main points.

Matthew Zook, Solon Barocas, danah boyd, Kate Crawford, Emily Keller, Seeta Peña Gangadharan, Alyssa Goodman, Rachelle Hollander, Barbara König, Jacob Metcalf, Arvind Narayanan, Alondra Nelson, Frank Pasquale: Ten simple rules for responsible big data research. PLoS Computational Biology 13(3) (2017)

https://www.microsoft.com/en-us/research/wp-content/uploads/2017/10/journal.pcbi_.1005399.pdf

6. Please read the following paper and write a brief summary (at most one page) of the main points.

D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, Dan Dennison: Hidden Technical Debt in Machine Learning Systems. NIPS 2015: 2503-2511

7. Please read the following paper and write a brief summary (at most one page) of the main points.

Eric Breck, Shanqing Cai, Eric Nielsen, Michael Salib, D. Sculley: The ML test score: A rubric for ML production readiness and technical debt reduction. BigData 2017: 1123-1132

8. Please go through the excellent talk given by Kate Crawford at the NIPS-2017 conference on the topic of “Bias in Data Analysis” and write a brief summary (at most one page) of the main points.

Kate Crawford: The Trouble with Bias. Invited Talk at the NIPS Conference, 2017. Video: https://www.youtube.com/watch?v=fMym_BKWQzk

9. Please go through the following program on societal impacts of AI and write a brief summary (at most one page) of the main points.

Video: https://www.pbs.org/wgbh/frontline/film/in-the-age-of-ai/


Grading Rubric

Each question in the student's work will be assigned a letter grade of either A, B, C, D, or F by the Instructor and TAs. This five-point (discrete) scale is described as follows:

• A) Exemplary (=100%).
Solution presented solves the problem stated correctly and meets all requirements of the problem.
Solution is clearly presented.
Assumptions made are reasonable and are explicitly stated in the solution.
Solution represents an elegant and effective way to solve the problem and is not more complicated than necessary.

• B) Capable (=75%).
Solution is mostly correct, satisfying most of the above criteria under the exemplary category, but contains some minor pitfalls, errors/flaws or limitations.

• C) Needs Improvement (=50%).
Solution demonstrates a viable approach toward solving the problem but contains some major pitfalls, errors/flaws or limitations.

• D) Unsatisfactory (=25%).
Critical elements of the solution are missing or significantly flawed.
Solution does not demonstrate sufficient understanding of the problem and/or any reasonable directions to solve the problem.

• F) Not attempted (=0%).
No solution provided.

The points awarded for a given homework question will be equal to the percentage assigned (given by the letter grades shown above) multiplied by the maximum number of points that question is worth. For example, if a question is worth 6 points and the answer is awarded a B grade, then that implies 4.5 points out of 6.

