辅导案例-BBAZ16602

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

BBAZ16602 - MIDTERM (Full Mark: 110, Bonus: 10)
48 + 3 HOURS (Nov 1, 2020, 6:00pm -- Nov 3, 2020, 8:59pm)
Q1 (15 marks)
time: 10 seconds
Howard is a very famous detective in Macau. Every time there is a crime,
Howard can quickly catch many suspects, tho the problem is ... really too
many suspects ...
So, you, as a junior detective, you need to help Howard to narrow down the
number of suspects.
Assume Howard caught suspects. for each suspect, Howard allows him/her
to say only 1 sentence, either "i am innocent" or "a, b or c is guilty". In the
second sentence, the suspect can finger out any amount of "guilty" people.
Also, all the suspects are numbered from to .
God knows that you have a tough job, it always send you a message in your
dream to tell you the number of suspects that are telling truth to make your life
easier.
Hopefully, together will all these information, you can narrow down the
suspect list.
input: the first lines are the sentences from number suspect to number
suspect respectively. The line is the number , it is
the number of suspects are telling the truth.
output: shows the list of possible suspects
n
1 n
n 1 n
n+ 1 k ∈ {1, 2,⋯ , n− 1}
file name: midterm_Q1.py
input: midterm_Q1_input.txt
output: midterm_Q1_output.txt
sample input:
1 i am innocent
2 3 is guilty
3 i am innocent
4 1 or 2 is guilty
3
sample output: (if unable to narrow down, just print the empty list)
[2]
In [ ]:

Q2 (25 marks)
We store data in bits in our computers, data compression is a way to use fewer
bits than the original representation, so the size of the data can be
reduced.
Compression can be either lossy or lossless. Lossless compression reduces
bits by identifying and eliminating statistical redundancy. No information is
lost in lossless compression, for example, the file format ".zip'' is one of the
example of lossless compression. Lossy compression reduces bits by
removing unnecessary or less important information, for example,".jpg'' is one
of the lossy compression for images.
In this question, you will be focused in lossless compression. For example,
say we have a mapping for encoding as follows:
Using the mapping above, we can therefore encode a message, for
example:
Huffman code is a type of encoding method that is commonly used for
lossless data compression in our computers. It has variable length of its
codes, that is, the size of each code is not necessary the same. To construct
Huffman code, we first need to count the occurrence or frequency of each
character. Using the same example from above, we have
BADHEADFEED⇒ 001000011111100000011101100100011
Then, we can simplify the table by removing those occurrence0
Now we construct a tree using the following rule:
- All the characters are leaf nodes
- Form a sub-tree by taking the least probable sub-trees
One may illustrate the process by the following,

The Huffman code for each character is determined by its leaf position of the
tree. That is, counting from the root, left corresponds to and right
corresponds to . So, is the right most leaf, and hence the corresponding
code is .
0
1 A
111
Therefore, using the Huffman coding, we have
So, compare to the fixed length code, Huffman code of
uses only bits, which is bits less than the fixed length code.
BADHEADFEED⇒ 000111011101011101001101001
BADHEADFEED
27 6
ABOUT THE PROGRAM BODY, JUST USE THE FOLLOWING,
REMEMBER TO MODIFY THE FILENAMES YOURSELF:
In [ ]:
#execute line by line
fout = open('output filename here', 'w')
with open('filename here', 'r') as fp:
for line in fp:
exec(line)
fout.close()
Q2.(a) [6 marks] Write a function to return a
dictionary that stores the characters and the corresponding occurrence given
an input string, e.g. .
NOTE: DO write a function to do so, as you will need to combine all the
functions later.
time: 3 seconds
ToDict(some_input_string)
{'A':5, 'B':10, ...}
filename: midterm_Q2_a.py
input file: midterm_Q2_a_input.py
output file: midterm_Q2_a_output.py
sample input:
tx = 'This is the brand new world.'
print(ToDict(tx), file = fout)
sample output:
{' ': 5, '.': 1, 'T': 1, 'd': 1, 'e': 3, 'h': 3, 'i': 2, 'l': 2, 'n': 1, 'o': 2, 'r': 1, 's': 2, 't': 1, 'w': 3}
In [ ]:

Q2.(b) [8 marks] Use the tree data structure, and the node
data structure from lecture notes to stores the corresponding
characters.
Then, write a function that
outputs a Huffman tree given an input dictionary. (here, we output the preorder
of the tree)
time: 3 seconds
Tree() Node()
HuffmanTree(some_input_dictionary)
filename: midterm_Q2_b.py
input file: midterm_Q2_b_input.py
output file: midterm_Q2_b_output.py
sample input:
tx = 'BADHEADFEED'
dic = ToDict(tx)
tr = HuffmanTree(dic)
print(tr.preorder(tr.root, []), file = fout)
sample output:
['BFDEHA', 'BFD', 'BF', 'B', 'F', 'D', 'EHA', 'E', 'HA', 'H', 'A']
In [ ]:

Q2.(c) [6 marks] Write a function in the tree data structure
that returns a dictionary, which stores the characters and the
corresponding Huffman code given an input tree.
time: 3 seconds
ToHuffman()
Tree()
filename: midterm_Q2_c.py
input file: midterm_Q2_c_input.py
output file: midterm_Q2_c_output.py
sample input:
tx = 'BADHEADFEED'
dic = ToDict(tx)
tr = HuffmanTree(dic)
print(tr.ToHuffman(), file = fout)
sample output:
{'A': '111', 'B': '000', 'D': '01', 'E': '10', 'F': '001', 'H': '110'}
In [ ]:

Q2.(d) [2 marks] Write a Huffman encoding function
to return a binary
sequence that corresponds to your input, e.g. .
time: 10 seconds
encHuffman(some_input_string, Huffman_dict)
11001...
filename: midterm_Q2_d.py
input file: midterm_Q2_d_input.py
output file: midterm_Q2_d_output.py
sample input:
tx = 'BADHEADFEED'
dic = ToDict(tx)
tr = HuffmanTree(dic)
Huff_dic = tr.ToHuffman()
print(encHuffman('BADHEADFEED', Huff_dic), file = fout)
sample output:
000111011101011101001101001
In [ ]:

Q2.(e) [3 marks] Write a function
to decode a binary
sequence with input a binary sequence and a Huffman dictionary.
time: 20 seconds
decHuffman(some_binary_seq, Huffman_dict)
filename: midterm_Q2_e.py
input file: midterm_Q2_e_input.py
output file: midterm_Q2_e_output.py
sample input:
tx = 'BADHEADFEED'
dic = ToDict(tx)
tr = HuffmanTree(dic)
Huff_dic = tr.ToHuffman()
enc = encHuffman('BADHEADFEED', Huff_dic)
print(decHuffman(enc, Huff_dic), file = fout)
sample output:
BADHEADFEED
In [ ]:

Q3 (20 marks)
Q3.(a) [8 marks] Haloween party is just finished. You and your friends are still
very excited and hope to have another party on Nov 2, 2020. Now, you come
up with an idea -- social dance party, and you are the coordinator of this party.
Assuming there are men and women. In this party, each man need to pair
with a woman, and you know, each of you have partner preference in your
mind. So, you, as a coordinator, you need to pair them so that the 'pair rate' is
maximized.
men is numbered from to , women is also numbered from to . For
each man, it has preference of choosing women, similar for each woman.
For example, if man and woman is paired, the pair rate equals if woman
is in man 's preference list or vice versa, and otherwise.
The overall pair rate is the sum of each pair's pair rate.
n n
n 1 n n 1 n
b g 1 g
b 0
filename: midterm_Q3_a.py
input file: midterm_Q3_a_input.py
output file: midterm_Q3_a_output.py
sample input: (the first line is the number of men, it is also the number of
women, the coming lines corresponds to the preference list of men from
man to , the last lines corresponds to the women's preference list, if a
line is , that means the corresponding man or woman has no
preference.)
3
1 2
0
3
0
3
0
sample output:
2
time: 45 seconds
n
1 n n
0
In [ ]:

Q3.(b) [12 marks] Now, assuming each man and woman will set their
preference ranking from to , if a man and a woman is paired, their score
will the the multiplication of the rankings. It is obvious that, the sum of the
pairing score need to be minimized, what is the minimized value?
sample input: (the first line is the number of men, it is also the number of
women, the coming lines corresponds to the priority preference rating of
men from man to , the last lines corresponds to the women's priority
preference)
3
1 2 3
3 1 2
1 3 2
2 1 3
1 2 3
1 2 3
sample output (man 1 - woman 3, man 2 - woman 2, man 3 - woman 1):
8
time: 60 seconds
1 n b g
n
1 n n
In [ ]:

Q4 (50 marks)
We have a set of movies, with the properties like type(s), director(s), actor(s),
etc. We also have a set of users and the users rating for some movies.
Now, can you use graph to recommend me a movie that I might like?
Q4.(a) [2 marks] Which is the most voted movie? Please show the first most
voted movies.
filename: midterm_Q4_a.py
input file: midterm_Q4_a_input.py
output file: midterm_Q4_a_output.py
sample input (the first 10 most voted movies):
10
ti 60 d
n
In [ ]:

Q4.(b) [2 marks] Which movie has the highest rate? Please show the first
highest rate movies.
In fact, we can use the average ratings of the movie as the score but using this
won't be fair enough since a movie with 4.3 average rating and only 3 votes
cannot be considered better than the movie with 4.1 as as average rating but
300 votes. So, we use a more fair rating scheme for each movie:
where is the number of votes for the movie;
is the number of votes that 90% of the movies have;
is the average rating of the movie; And
is the mean ratings across the entire ratings
filename: midterm_Q4_b.py
input file: midterm_Q4_b_input.py
output file: midterm_Q4_b_output.py
sample input (the first 10 highest rate movies):
10
ti 60 d
n
WR (weighted rating) = R+ C
v
v+m
m
v+m
v
m
R
C
In [ ]:

Q4.(c) [4 marks] For those who give a movie score greater than or equal to ,
they are the potential fans of the movie. So, given a movie name, can you
show the what other movies that these potential fans are also potential fans?
List the first 10 of them.
filename: midterm_Q4_c.py
input file: midterm_Q4_c_input.py
output file: midterm_Q4_c_output.py
sample input (the movie name):
Independence Day (a.k.a. ID4) (1996)
time: 120 seconds
4
In [ ]:

Q4.(d) [4 marks] Based on (c), you build a naive recommender system.
Suppose a reviewer give a bunch of scores to different movies, for those
movies he/she gives at least , you should be able to find the others who also
give these movies at least . Based on these reviewers, calculate the scores
for the movies that they have rated, and sort them by scores. List the first 10 of
them.
filename: midterm_Q4_d.py
input file: midterm_Q4_d_input.py
output file: midterm_Q4_d_output.py
sample input (userId):
12345
time: 120 seconds
4
4
In [ ]:

Q4.(e) [8 marks] Improving your recommender system. A person rates 5 to a
movie, if another person also rates 5 to the same movie, it might means that
these two people share a common interest. Now, we define a 'distance' metric
between a person and a movie, mathematically,
where the distance is integer value. More specifically, If a person rates 1.5 to a
movie, then the distance between them is 8.
Now, based on this distance metric, we can calculate the average distance
between two people among the movies they both rated.
Therefore, given the average distance between person A and person B, and
the number of common movies that they both rated. You may recommend
person A the movies that person B have rated with at least rate 4. So, if person
A is fixed, you will be able to find all possible people and sorted by the number
of movies rated in common with descendant order, and distance with
ascendant order. List the first 20 of them.
filename: midterm_Q4_e.py
input file: midterm_Q4_e_input.py
output file: midterm_Q4_e_output.py
sample input (userId):
12345
time: 180 seconds
distance = 11 − 2 × rate,
In [ ]:

Q4.(f) [8 marks] Improving your recommender system. Following (d), you will
be able to identify the movies that you haven't watched from their rated movie
list (probably those you haven't rate). Now, we define a new metric call
'distance per person of a movie' by
You see that the smaller value of distance per person of a movie meaning that
the more people having watched that movie, which means such movies should
recommend to you. List the first 20 movies.
filename: midterm_Q4_f.py
input file: midterm_Q4_f_input.py
output file: midterm_Q4_f_output.py
sample input (userId):
12345
time: 240 seconds
distance per person of a movie = ,
average distance from you to those people
number of people having rated this movie
In [ ]:

Q4.(g) [10 marks, essay + code] Based on (e), one should be able to calculate
the minimum distance from a person thru another person (rated at least a
movie in common) to a movie that haven't rated, simply calculate the average
distance of different movies and sort them in ascendant order for the
recommender system, and list the first 20 movies.
essay filename: midterm_Q4_g.pdf
Compare the difference between (f) and (g), and analyze which method is
better, in terms of?
filename: midterm_Q4_g.py
input file: midterm_Q4_g_input.py
output file: midterm_Q4_g_output.py
sample input (userId):
12345
time: 240 seconds
In [ ]:

Q4.(h) [12 marks, OPEN QUESTION, essay + code] In fact, we can also rate a
person's flavor using the movie types. Think of a metric that can merge the
movie types, ratings, and similarity. Then, write a recommender system and
list the first 20 movies.
essay filename: midterm_Q4_h.pdf
You should provide the reason of such metric, and show some stats that
supports your thought.
filename: midterm_Q4_h.py
input file: midterm_Q4_h_input.py
output file: midterm_Q4_h_output.py
sample input (userId):
12345
time: 300 seconds
In [ ]:

BONUS
print the following:
BBAZ16602 is fun!
B i f |
B s u |
A n .
Z
1
6
6
0
2
output file: midterm_bonus_output.txt
In [ ]:

欢迎咨询51作业君