程序代写案例-FIT 5124-Assignment 1

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

FIT 5124 Advance Topics in Security (S1 2021)
Assignment 1: Total marks 100
Due on 3 April 11:59:59 am (Firm!)
1 Overview
The objective of this assignment is to assess the learning outcomes in cryptographic
techniques and protocols for searching over encrypted data. Specifically, the tasks will
evaluate your knowledge of security analysis of searchable symmetric encryption (SSE),
count attacks against SSE, countermeasures for SSE schemes, and inference attacks
against property-preserving encryption.
2 Submission Policy
You need to submit a report (one single PDF file) to describe what you have done and what
you have observed with screen shots whenever necessary; you also need to provide
explanation or codes to the observations that are related to the tasks. In your report, you are
expected to answer all the questions listed in this manual. Typeset your report into .pdf
format (make sure it can be opened with Adobe Reader) and name it as the format:
[Your Name]-[Student ID]-FIT5124-Assignment1,
HarryPotter-12345678-FIT5124-Assignment1.pdf.
All source code if required should be embedded in your report. Please upload the PDF file to
Moodle. Note that the assignment is due on 3 April, Saturday, 11:59:59 am (No
extension).
Late submission penalty: 10-point deduction per day. If you require a special
consideration, the application should be submitted at least three days in advance via
Monash Connect (https://www.monash.edu/connect). Zero tolerance on plagiarism: If
you are found cheating, penalties will be applied, i.e., a zero grade for the unit. The
demonstration video is also used to detect/avoid plagiarism. University policies can be found
at https://www.monash.edu/students/academic/
policies/academic-integrity
3 Security Analysis of Searchable Symmetric Encryption (30 Marks)
Perform the formal security analysis for the scheme in the paper of “Structured Encryption
and Controlled Disclosure” https://eprint.iacr.org/2011/010.pdf (see the screenshot below).
More detailed construction and notations can be found in the paper.
FIT 5124 Advance Topics in Security (S1 2021)
3.1 Security Definition (10 Marks)
Please give the simulation-based security definition for the above scheme regarding
non-adaptive security.
3.2 Security Analysis (20 Marks)
Please perform the formal security analysis based on the security definition and the
construction. Hint: leakage function definition (5 marks), steps on how to simulate the
security game (10 marks), explanation on how the simulated view is
indistinguishable from the real view (5 marks).
4 Count Attacks against Searchable Symmetric Encryption (40 marks)
4.1 Case study I: given a document set {d1, d2, d3, d4, d5, d6}, and d1 = {w1, w2,
w3, w4, w6}, d2 = {w1, w2, w3, w6}, d3 = {w1, w2, w4, w6}, d4 = {w1, w4, w6}, d5 =
{w4, w5, w6}, d6 = {w4, w6}, and assume that the adversary knows the above
information about the document set. Then the adversary starts to launch count
attacks for the document set encrypted by an SSE scheme. He observed the
following query response from the server.
FIT 5124 Advance Topics in Security (S1 2021)
q eids
q1 e1 e2 e3 e5 e6
q2 e3 e4
q3 e1 e2 e3 e4
q4 e2 e3 e4
q5 e1 e2 e3 e4 e5 e6
q6 e6
Note that e1,..., e6 are the permuted document IDs and the adversary does not know
the mapping between them and the original ones.
Q: Please explain how to recover the underlying keyword of each query step by step,
and write down the mapping between queries and keywords (10 marks).
4.2 Case study II: given a document set {d1, d2, d3, d4, d5, d6}, and d1 = {w1, w2,
w3, w4, w6}, d2 = {w2, w3, w6}, d3 = {w1, w3, w4, w6}, d4 = {w1, w4, w6}, d5 = {w4,
w5, w6}, d6 = {w5}, and assume that the adversary knows the above information
about the document set. Then the adversary starts to launch count attacks for the
document set encrypted by an SSE scheme. He observed the following query
response from the server.
q eids
q1 e2 e3 e4 e5 e6
q2 e2 e3 e5 e6
q3 e1 e3
FIT 5124 Advance Topics in Security (S1 2021)
q4 e4 e5
q5 e2 e5 e6
q6 e4 e5 e6
Note that e1,..., e6 are the permuted document IDs and the adversary does not know
the mapping between them and the original ones.
Q: Please explain how to recover the underlying keyword of each query step by step,
and write down the mapping between queries and keywords. Can the count attack
recover all queries? If not, please explain the reason. (15 marks)
4.3 Case study III: given a document set {d1, d2, d3, d4, d5, d6}, and d1 = {w1, w2,
w3, w6}, d2 = {w2, w3, w6}, d3 = {w1, w2, w3, w4, w6}, d4 = {w1, w3, w4, w6}, d5 =
{w5, w6}, d6 = {w5, w6}, and assume that the adversary knows the above
information about the document set. Then the adversary starts to launch count
attacks for the document set encrypted by an SSE scheme. He observed the
following query response from the server.
q eids
q1 e1 e5 e6
q2 e1 e2 e6
q3 e3 e4
q4 e1 e2 e3 e4 e5 e6
q5 e2 e5 e6
q6 e1 e2 e5 e6
FIT 5124 Advance Topics in Security (S1 2021)
Q: Please explain how to recover the underlying keyword of each query step by step,
and write down the mapping between queries and keywords. Can the count attack
recover all queries? If not, please explain the reason. (15 marks)
5 Padding Countermeasures in Searchable Symmetric Encryption (20 Marks)
The primary countermeasure for the above count attack is padding solution. In this
task, you are given the attached source code and data:
● Inverted_index_5000: This binary file contains the inverted index of the top
5000 most frequent keywords in the real Enron email dataset
(https://www.cs.cmu.edu/~enron/).
● frequency_5000.csv: This .csv file details the (keyword, frequency) in the
above index. This .csv file is already sorted in the descending order.
● app.py and utilities.py. The current app.py script reads the above inverted
index into a Python dictionary data structure, namedly inverted_dict. Then,
using inverted_dict[keyword] will give you the list of document identifiers
mapping to that keyword. You can review these two scripts for more details.
The first padding approach (Approach #1) is to add more padding (w,id’) pairs for
each keyword w such that all the keywords have the same number of real and
padding pairs, where id’ is a padding document identifier. In this way, all keywords
will have the result lengths the same as the result length of the most frequent
keyword when the untrusted server executes the search operation in SSE scheme.
The second padding solution (Approach #2) is to add more padding (w,id’) pairs to w
such that the query’s result length of w is a multiplication of an integer. For example,
the current frequency of w is 6, then it needs 4 padding (w,id’) pairs if the
multiplication of 5 is used.
The above two approaches can be found in [4].
The third padding solution (Approach #3) is to only pad keywords that have the
similar frequencies into the same result length. As a result, this approach will
minimise the padding overhead. However, this approach first requires defining the
clusters of keywords in advance. Then, we can identify the most frequent keyword in
each cluster and pad the remaining keywords in the same cluster to that length.
More details about this approach can be found in [5].
Q1: Please develop the padding solution in Approach #1 and identify the padding
overhead. Note that the padding overhead is the ratio between the total number of
FIT 5124 Advance Topics in Security (S1 2021)
real and padding pairs in the inverted index over the real number of pairs. Add the
screenshot of your python code and the result into your report (5 marks)
Q2: Please develop the padding solution in Approach #2 and identify the padding
overhead if the multiplication of 100 is used. Add the screenshot of your python code
and the result into your report(5 marks)
Q3: Please develop the padding solution in Approach #3 and identify the padding
overhead. Note that you are given the cluster configuration in the app.py. Add the
screenshot of your python code and the result into your report (5 marks)
For example,
cluster_points_256 =
[392,648,904,1160,1416,1672,1928,2184,2440,2696,2952,3208,3464,3720
,3976,4232,4488,4744]
denotes that the 1st keyword to the 392nd keyword (inclusive) in
frequency_5000.csv will be in the first cluster, and the 4745th keyword to the
5000th keyword (inclusive) will be in the last cluster. There are 19 clusters in
this setting. With this configuration, there are at least 256 keywords in each
cluster (i.e., cluster size).
Another configuration is that,
cluster_points_512 = [904,1416,1928,2440,2952,3464,3976,4488]
will have at least 512 keywords in the same cluster.
Q4: Compare the padding overhead between the above Approach #1, Approach #2,
and Approach #3 with cluster sizes 256 and 512. Add the screenshot of your python
code and the result into your report (5 marks)
6 Inference Attacks against Order-preserving Encryption (10 Marks)
As introduced in Week 3, sorting attacks is applicable to uncover dense
OPE-encrypted column ( ). In that setting, a ciphertext contains at least aδ = 1
fraction of plaintext(s) to perform a direct mapping on the same sorting order.δ = 1
However, the attack becomes less applicable for low-density OPE-encrypted
columns if the fraction is less than 1. Intuitively, the plaintext space (i.e., auxiliary
dataset) known by the attacker who is launching the inference attack might be
smaller than the underlying plaintext space of the ciphertext space.
FIT 5124 Advance Topics in Security (S1 2021)
To address the limitation of the sorting attacks when dealing with low-density
OPE-encrypted columns , Naveed et. al. [1] introduced cumulative attack. In short,
the attack combines the ordering with the frequencies to increase the ability to match
a plaintext to a ciphertext. For example, if a ciphertext matches 90% of its ciphertext
in the encrypted column, then the mapping should match that ciphertext to a
plaintext that also matches 90% of its plaintext. To do this, the attacker uses
empirical cumulative distribution function [2] (ECDF, or simply called CDF) to
evaluate the probability of a real-valued ciphertext. Formally, the attack is defined as:
Cumulative-Atk(c, z): Given an OPE-encrypted column c over the ciphertext space
C and the auxiliary plaintext dataset z over the plaintext space M, the adversary
computes:
1. compute andψ←() φ←()
2. compute andπ←() µ←()
3. output
∈

=1
||
∑ (|ψ

−

∙π| + |φ

−

∙µ| )
where and are the histograms of c and z, respectively; and are the CDFs of cψ π φ µ
and z, respectively; is the set of permutation matrices. || × ||
The step 3 can be formulated as a linear sum assignment problem (LSAP) as below:
minimise
=1

∑
=1

∑

subject to //sum of all weights on a column =1
=1

∑

= 1, 1≤ ≤||
//sum of all weights on a row =1
=1

∑

= 1, 1≤ ≤ ||
,

∈{0, 1} 1≤ , ≤ |

|
where the cost matrix gives the cost for mapping a plaintext to a cipher text as

the sum of the mismatch in the frequencies plus the mismatch in cumulative
frequencies:

= |ψ

− π

|2 + |φ

− µ

|2
In the attached material, you are given an OPE-encrypted column in
ope_enc_covid_age.csv, and the auxiliary plaintext dataset in auxiliary.csv. The
dataset is extracted from the Mexican government’s COVID19 dataset [3]. The
cumulative_attack.py contains the instructions how to perform Steps 1-3.
FIT 5124 Advance Topics in Security (S1 2021)
Q1. Complete the Step 1 by calculating the histograms of the cipher c and the
plaintext z. Please add your script screenshot to your report (2.5 marks)
Q2. Complete the Step 2 by calculating the empirical CDFs of c and z. Please add
your script screenshot to your report. (2.5 marks)
Q3. In Step 3, compute the cost matrix C and apply the LSAP resolver to identify the
optimal weight matrix X. Please add your script screenshot to your report (2.5
marks)
Q4. Print the inference mapping between the cipher and the plaintext, and copy it
into your assignment report.(2.5 marks)
Reference:
[1] Muhammad Naveed, Seny Kamara, and Charles V. Wright. 2015. Inference Attacks on
Property-Preserving Encrypted Databases. In Proceedings of the 22nd ACM SIGSAC
Conference on Computer and Communications Security (CCS '15). Association for Computing
Machinery, New York, NY, USA, 644–655. DOI:https://doi.org/10.1145/2810103.2813651
[2] CDF and Empirical CDF,
https://en.wikipedia.org/wiki/Cumulative_distribution_function#Empirical_distribution_function
[3] COVID-19 patient pre-condition dataset,
https://www.kaggle.com/tanmoyx/covid19-patient-precondition-dataset
[4] Cash, D., Grubbs, P., Perry, J., & Ristenpart, T. (2015, October). Leakage-abuse attacks
against searchable encryption. In Proceedings of the 22nd ACM SIGSAC conference on
computer and communications security (pp. 668-679).
[5] Bost, Raphael, and Pierre-Alain Fouque. "Thwarting Leakage Abuse Attacks against
Searchable Encryption-A Formal Approach and Applications to Database Padding." IACR
Cryptol. ePrint Arch. 2017 (2017): 1060.

欢迎咨询51作业君