School of Computing
COMP3300 Assignment Project
Assignment marks: 18% of overall unit marks.
Due date: see iLearn page.
Objective: To gain experience in evaluating a dataset for privacy and utility so that the relevant
privacy law is respected; select a privacy preserving technique to protect identified vulnerabilities.
Please note: This assignment specification aims to provide as complete a description of this
assessment task as possible. However, as with any specification, there will always be things we
should have said that we have left out and areas in which we could have done a better job of
explanation. As a result, you are strongly encouraged to ask any questions of clarification you might
have, either by raising them during a lecture or by posting them on the iLearn discussion forum
devoted to this assignment.
Transport data release
[Figure: map of the Westeros Rail Network]
The Westeros Rail Network, illustrated in the map above, serves the Westeros region. To improve
its services, e.g. to decide whether to upgrade station services, it collects information about
journeys via the “tap on/tap off” information it logs every time a passenger goes through a barrier
at a station. There are, of course, a multitude of journeys taking place every day, but because a
journey is often unique, journeys could still be used to identify particular passengers. To serve the
needs of transparency (so that the station upgrade can be explained to the public), the Westeros
Rail Network is planning to release a “de-identified” dataset.
The Westeros Rail Company would like to know whether their proposed de-identified dataset
satisfies the Westeros Privacy Law:
“No individual can be re-identified in any publicly-released dataset. Re-identified means
that a record can be linked with very high likelihood to an individual using information which could
be plausibly obtained.”
It has hired a team of privacy experts to determine whether or not there are still vulnerabilities in
the dataset that could be construed as a breach. If any vulnerabilities are discovered the team has
been asked to make recommendations for how the data should be changed by applying an
appropriate privacy technique, to ensure that the core utility of the dataset is not destroyed.
The Valar Morghulis Travel Card
The Valar Morghulis Card (or VM Card) is what passengers
use to go through the electronic barriers at a station by
tapping on or tapping off. Each time they do so, information
about the journey and the card is collected by the Westeros
Train Network.
Data collected about the card
When a card is issued, it is assigned a Card Id and a card type. The Card Id is a four-digit number
and is unique to the card. It is used by the Westeros Rail Company in its internal accounting system
by, for example, linking it to a customer’s credit card. This also makes it easy to issue a customer
with a new card in case of loss: they simply create a new card with a new Card Id and link it to the
customer’s internal record. The separation of the Card Id and the customer record is how the Rail
Company hopes to abide by the Westeros Privacy Laws.
Each card also has a type, such as Adult or Child. The complete set
of card types is shown in the table on the left.
Each individual who has a card is actually quasi-identified by their
Card Id. However, individuals do not know what their Card Id is, so
the Westeros Rail Company assumes that the separation between the
real passenger and the Card Id is sufficient to de-identify
the data in the transport dataset (described below).
Data collected about the journey
Whenever a passenger taps on or taps off, the station and the tap on/tap off
time and day are logged, as well as the card type and the Card Id. This is communicated
to the Westeros Travel Dataset to create a dataset consisting of records, where each record is a
complete journey including the tap on time and station and the tap off time and station.
The Westeros Transport Dataset
Below is an example of a few records in the transport dataset that the Westeros Rail Company is
planning on releasing:
CardID, CardType, Touch On: Day, Time, Station, Touch off: Day, Time, Station
7378, #6, Thursday, 1121, Stadium, Thursday, 1131, Braavos
7378, #6, Wednesday, 1843, University, Wednesday, 1902, Braavos
6150, #3, Wednesday, 227, Police, Wednesday, 235, Braavos
6150, #3, Wednesday, 1856, Casterly Rock Shopping, Wednesday, 1918, King's Landing
3090, #1, Tuesday, 1411, Law Courts, Tuesday, 1455, Braavos
This shows five journeys of three different passengers (identified by the CardID).
The first record, for example, shows that a passenger with CardID 7378 tapped on at 11:21am on
Thursday at the Stadium Station, and then tapped off at 11:31am at Braavos Station. We know from
the card type (#6) that this passenger is travelling under a Senior Card.
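As a minimal sketch of how such records might be read for analysis, the Python below parses the five sample rows shown above and groups them by card. The field names in FIELDS are assumptions made for this illustration; the real WesterosTravel.csv may use different headers.

```python
import csv
import io
from collections import defaultdict

# The five sample records quoted in the text.
SAMPLE = """\
7378, #6, Thursday, 1121, Stadium, Thursday, 1131, Braavos
7378, #6, Wednesday, 1843, University, Wednesday, 1902, Braavos
6150, #3, Wednesday, 227, Police, Wednesday, 235, Braavos
6150, #3, Wednesday, 1856, Casterly Rock Shopping, Wednesday, 1918, King's Landing
3090, #1, Tuesday, 1411, Law Courts, Tuesday, 1455, Braavos
"""

# Assumed field names (hypothetical, chosen for this sketch only).
FIELDS = ["card_id", "card_type", "on_day", "on_time", "on_station",
          "off_day", "off_time", "off_station"]


def load_journeys(text):
    """Parse comma-separated journey rows into a list of dicts."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    journeys = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        record = dict(zip(FIELDS, (cell.strip() for cell in row)))
        record["on_time"] = int(record["on_time"])
        record["off_time"] = int(record["off_time"])
        journeys.append(record)
    return journeys


def journeys_by_card(journeys):
    """Group journeys by CardID, the attribute that links a passenger's trips."""
    grouped = defaultdict(list)
    for journey in journeys:
        grouped[journey["card_id"]].append(journey)
    return dict(grouped)


journeys = load_journeys(SAMPLE)
grouped = journeys_by_card(journeys)
```

Grouping by CardID makes it visible that all journeys on one card can be read together, which is the reason the Card Id acts as a quasi-identifier.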
The first task of the privacy experts is to determine whether the records can be re-identified. (See
below for details.)
A Public Dataset: Twitter
The privacy experts have pointed out to the Westeros Rail Network that they must check to see
whether any record in the Westeros Transport Dataset can be linked to individuals in a public
dataset. They have proposed using Twitter posts to do this as they are easily available, and include
posting times as a potential attribute that can be linked.
An example of a Twitter post is shown below:
TheHighSparrow@TheApprenticeWesteros — Just got to the Business District with interviewing for the new
season of The Apprentice Westeros!! Looking forward to hiring and firing. Posted 15:43, Friday.
The Assignment
In this assignment the privacy experts first carry out an assessment of privacy vulnerabilities of the
Westeros Transport Dataset which can be downloaded from iLearn in a file called
WesterosTravel.csv. It consists of journeys logged during a single week, and each record is as
described above. Notice however that the times appear simply as a number between 0 and 2359, to
make it easier for you to use filtering in your analysis. This means that midnight is represented by 0,
and 5 minutes past midnight is simply 5. For all other times of 2 or more digits, the last two digits
represent the minutes after the hour and any digits preceding those represent the hour. So 946 is
09:46 and 1327 is 13:27.
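The time encoding described above can be decoded with a few lines of Python. This is a small sketch; the function names decode_time and to_minutes are hypothetical helpers, not part of the assignment material.

```python
def decode_time(value: int) -> str:
    """Convert a dataset time such as 946 into an 'HH:MM' string."""
    hours, minutes = divmod(value, 100)  # last two digits are the minutes
    if not (0 <= hours <= 23 and 0 <= minutes <= 59):
        raise ValueError(f"not a valid time encoding: {value}")
    return f"{hours:02d}:{minutes:02d}"


def to_minutes(value: int) -> int:
    """Convert a dataset time into minutes after midnight, for arithmetic."""
    hours, minutes = divmod(value, 100)
    return hours * 60 + minutes
```

Converting to minutes after midnight is convenient when computing journey durations, because subtracting the raw encoded values directly gives wrong answers across an hour boundary (1002 - 958 is 44, but the real gap is 4 minutes).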
Task 1: Role Description
In this task, the privacy experts have asked for volunteers from the public to help them. Each
volunteer travels regularly on the train. They also often travel with a friend or colleague.
In Task 1, you will take on the role of the volunteer. To find out who you are, click on the Character
Specification under the Assignment tab in iLearn. The specification contains a short description of
your journeys. Your journey description will also include some information about a friend or
colleague that sometimes shares all or parts of your journey.
You are also provided with a famous individual’s identity who has a Twitter feed, and you are
provided with a public post from that person.
Task 1 (a) : Re-identify your journeys (10 marks)
For this task, use what you know about your journeys described in Task 1 to identify your Card Id.
Task 1 (b) : Re-identify your friend's or colleague’s journeys (10 marks)
For this task, use what you know about your friend or colleague’s journeys to identify their Card Id.
Task 1 (c) : Learn a new fact about your friend or colleague (10 marks)
For this task, identify an additional destination travelled to by your friend or colleague that is not
included in the description in Task 1, i.e. it should be about a different journey that is not part of
their regular routine.
Task 1 (d) : Perform a linkage attack with Twitter (10 marks)
For this task, link the information in your given Twitter post from Task 1 to some journeys in the
dataset to identify the Card Id of the Twitter poster.
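One way a linkage of this kind might be sketched in Python is shown below: it looks for cards that tapped off at the station mentioned in a post, on the same day, shortly before the posting time. The function name candidate_cards, the 15-minute default window, and the post used in the usage note (19:05 on Wednesday at Braavos) are all assumptions for illustration; the records are the five sample rows from the specification, reduced to the tap-off fields.

```python
# (card_id, off_day, off_time, off_station) taken from the sample records.
SAMPLE = [
    ("7378", "Thursday", 1131, "Braavos"),
    ("7378", "Wednesday", 1902, "Braavos"),
    ("6150", "Wednesday", 235, "Braavos"),
    ("6150", "Wednesday", 1918, "King's Landing"),
    ("3090", "Tuesday", 1455, "Braavos"),
]


def to_minutes(t):
    """Convert an encoded time such as 1902 into minutes after midnight."""
    return (t // 100) * 60 + t % 100


def candidate_cards(journeys, station, day, posted, window=15):
    """Return card ids that tapped off at `station` on `day` at most
    `window` minutes before the posting time `posted` (assumed window)."""
    posted_min = to_minutes(posted)
    matches = set()
    for card_id, off_day, off_time, off_station in journeys:
        gap = posted_min - to_minutes(off_time)
        if off_station == station and off_day == day and 0 <= gap <= window:
            matches.add(card_id)
    return matches


# Hypothetical post: "Just arrived at Braavos", posted 19:05 on Wednesday.
suspects = candidate_cards(SAMPLE, "Braavos", "Wednesday", 1905)
```

On the sample rows only card 7378 matches. On the full dataset a single post may match several cards, so further posts or known habits would be needed to narrow the candidates down.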
Task 1 (e) : Summarise your findings (15 marks)
Summarise the vulnerabilities that you have discovered by answering the following:
(a) Describe what you did to re-identify yourself and your friend/colleague. (3 marks)
(b) Describe what you did to find out a new fact about your friend or colleague. (3 marks)
(c) Describe what you did to re-identify the famous person. (3 marks)
(d) Explain whether these attacks constitute a breach of the Westeros Privacy Law. (6 marks)
Your summaries for each part (a)-(d) above should be 1-2 sentences each, and contain sufficient
information to show your understanding of the re-identification attacks you performed.
Task 2 (15 marks)
After reporting the vulnerabilities discovered by the volunteers in Task 1 the privacy experts are
told by the Westeros Rail Network that the most useful part of the data is the number of tap on/tap
off times within a time period at any given station. This is their stated utility.
(a) Select which attributes of the transport dataset are not needed for this utility to be still
achievable. (5 marks)
(b) If the attributes you identified were to be removed, explain whether the vulnerabilities
identified in Task 1 are still present. (5 marks)
(c) What would you (now in the role of the privacy expert) report to the Westeros Rail Company
about whether the modification suggested at part (b) above preserves their stated utility?
What would you report regarding the modification’s ability to abide by the Westeros privacy
law? Briefly state and explain your overall recommendation regarding this modification. (5
marks).
Task 3 (15 marks)
(a) Choose the most appropriate privacy preserving technique from the choices (i) and (ii)
below. Your technique must preserve the following utility: it must enable the number of tap on/tap
offs within each 10 minute period to be preserved approximately.
(i) Generalise the tap on/tap off timings to intervals of 5 minutes by rounding to the nearest 5-
minute period. So a time of 09:12 would be reported as 09:10 and a time of 09:14 would be
reported as 09:15.
(ii) Suppress the hour in the tap on/tap off timings, i.e. preserve only the minutes, e.g. turn 09:45
into 45, and turn 18:45 into 45.
Indicate in your answer on iLearn which method you have chosen. (1 mark).
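The two candidate techniques (i) and (ii) can be sketched in Python as follows, operating on the encoded times used in the dataset. The function names are hypothetical, and the behaviour at midnight (23:58 rounding to 00:00 without changing the day) is an assumption, since the specification does not say how that case should be handled.

```python
def round_to_five(t: int) -> int:
    """Technique (i): round an encoded time to the nearest 5-minute mark."""
    total = (t // 100) * 60 + t % 100     # minutes after midnight
    rounded = 5 * ((total + 2) // 5)      # nearest multiple of 5 (no ties with whole minutes)
    rounded %= 24 * 60                    # assumption: wrap past midnight to 00:00
    return (rounded // 60) * 100 + rounded % 60


def suppress_hour(t: int) -> int:
    """Technique (ii): keep only the minutes of an encoded time."""
    return t % 100
```

For example, round_to_five(912) gives 910 and round_to_five(914) gives 915, matching the examples in (i), while suppress_hour maps both 945 and 1845 to 45, matching (ii).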
For the choice you made above, apply the generalisation technique to the dataset and then answer
the following:
(b) In the dataset you have just created:
1. At station Stadium, what is the average number of tap-ons between 4pm and 6pm on
weekdays? (i.e. count the total number of tap-ons at Stadium between 4pm and 6pm from Monday
to Friday and divide by 5). (3 marks)
2. At station Braavos, what is the average number of tap-ons between 9am and 11am on
weekdays? (3 marks)
3. At station Winterfell what is the average number of tap-ons between 10am and 2pm on
weekdays? (3 marks)
(c) Referring to the Westeros privacy law, and by comparing your answers from part (b) to the
corresponding results on the original dataset, would you recommend the approach ((i) or (ii))
you took to balance the trade-off between privacy and utility? Explain your answer. (5 marks)
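The weekday averages asked for in part (b) follow the recipe given there: count the matching tap-ons from Monday to Friday and divide by 5. A minimal Python sketch is below. The function name is hypothetical, and treating the window as including its start time but excluding its end time is an assumption, since the specification does not state how boundary times are counted. The records are the tap-on fields of the five sample rows.

```python
WEEKDAYS = {"Monday", "Tuesday", "Wednesday", "Thursday", "Friday"}

# (on_day, on_time, on_station) taken from the sample records.
SAMPLE = [
    ("Thursday", 1121, "Stadium"),
    ("Wednesday", 1843, "University"),
    ("Wednesday", 227, "Police"),
    ("Wednesday", 1856, "Casterly Rock Shopping"),
    ("Tuesday", 1411, "Law Courts"),
]


def average_weekday_tap_ons(tap_ons, station, start, end):
    """Average number of tap-ons per weekday at `station` with
    start <= time < end (boundary handling is an assumption)."""
    count = sum(
        1
        for day, time, on_station in tap_ons
        if on_station == station and day in WEEKDAYS and start <= time < end
    )
    return count / 5


average = average_weekday_tap_ons(SAMPLE, "Stadium", 1100, 1200)
```

The encoded times can be compared directly because the HHMM encoding preserves chronological order within a day. Running the same function on the original and the modified dataset gives the two figures to compare in part (c).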
Task 4 (15 marks)
An alternative approach is to apply bounded noise with B=5 (selected uniformly) to the tap on/tap
off times. For example, if the time is 09:45 then the possible outputs are 09:40, 09:41, … 09:50,
each with equal probability.
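A sketch of this perturbation in Python is shown below. With B=5 there are 11 possible offsets (-5 to +5), so each output has probability 1/11. The function name perturb and the wrap-around at midnight are assumptions; the specification does not say what happens to times within 5 minutes of midnight.

```python
import random


def perturb(t: int, bound: int = 5, rng=random) -> int:
    """Add uniform integer noise in [-bound, +bound] minutes to an encoded time."""
    total = (t // 100) * 60 + t % 100          # minutes after midnight
    noisy = total + rng.randint(-bound, bound)  # randint includes both endpoints
    noisy %= 24 * 60                            # assumption: wrap around midnight
    return (noisy // 60) * 100 + noisy % 60
```

Working in minutes after midnight keeps the result a valid time: perturbing 958 can yield 1003, never the invalid encoding 963.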
(a) Assume that this perturbation technique will be applied to the dataset. For the first two
scenarios described at Task 3 (b) (i.e. 1 and 2), calculate the probability of incorrectly reporting
the average count. (Hint: How many people may be incorrectly excluded/included in the
count? What is the probability of excluding/including these people?) (8 marks)
(b) By referring to the corresponding scenarios in Task 3 or otherwise, what method of
anonymisation would you recommend so that the Westeros Privacy Law is respected? Explain
your answer. (7 marks)
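The hint in Task 4 (a) turns on a per-record question: how likely is it that noise moves a single tap-on across the edge of the counting window? The Python sketch below computes that probability exactly for one record. It is only a building block, not the full answer, which depends on how many records in the dataset lie within 5 minutes of a boundary. The function name and the half-open window (start included, end excluded) are assumptions, and midnight wrap-around is ignored.

```python
from fractions import Fraction


def to_minutes(t):
    """Convert an encoded time such as 1558 into minutes after midnight."""
    return (t // 100) * 60 + t % 100


def crossing_probability(true_time, start, end, bound=5):
    """Probability that uniform noise in [-bound, +bound] minutes changes
    whether `true_time` is counted in the window start <= time < end."""
    t, lo, hi = to_minutes(true_time), to_minutes(start), to_minutes(end)
    truly_inside = lo <= t < hi
    flips = sum(
        1
        for k in range(-bound, bound + 1)
        if (lo <= t + k < hi) != truly_inside
    )
    return Fraction(flips, 2 * bound + 1)
```

For example, under these assumptions a tap-on at 15:58 is wrongly included in a 4pm to 6pm window with probability 4/11 (offsets +2 to +5), a tap-on at exactly 16:00 is wrongly excluded with probability 5/11, and a tap-on at 17:00 is never affected.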
The Assessment
This assignment is structured to allow you to decide how much effort you want to expend for the
return in marks that you might hope for. You can choose which parts of the privacy assessment to
complete. Note, however, that a later Task cannot be completed unless you have completed the
Tasks that precede it. You can decide upfront whether you are shooting for a pass or a high
distinction and know exactly how much work will be required to obtain that mark.
Here is what is required to obtain marks in one of the performance bands for this assignment:
Pass: Completion of Task 1.
Credit: the P level implementation + completion of Task 2.
Distinction: the Cr level implementation + completion of Task 3.
High Distinction: the D level implementation + completion of Task 4.
What you have to do
Download the dataset WesterosTravel.csv from iLearn; view the Character Specification to find out
your identity, partner and famous twitter user and information about their journeys. Open the file
WesterosTravel.csv using a spreadsheet program such as Excel (or any program of your choosing)
and use its filtering function (or other means) to help you find privacy vulnerabilities in the dataset.
What you must hand in
In the submission page on iLearn for this assignment you must input the answers to your analysis
above by the due date. This is in the form of an iLearn Quiz. Some of these answers will be
automatically marked and so it is important to abide by the input instructions.
You have exactly two attempts to submit the quiz, so please only submit when you are happy with
your answers.
Late penalty
It is important that you hand in your work on time. Please note that a late penalty of 10% per day
will be applied unless a valid Special Consideration is lodged.