The University of New South Wales - COMP9312 - 23T2 - Data Analytics for Graphs
Exam Questions
Important updates for the exam paper will be listed here.
Summary
SubmissionSubmit your files on Moodle (Only the last submission will be used)
Required FilesOnly .pdf file is accepted. The file name should be exam_Zid.pdf
Duration2pm--5pm Tue 22 Aug (Sydney Time)
Marks100 marks (50% toward your total mark for this course)
By attending this exam, you consent to the following policy:
I acknowledge that all of the work I submit for this exam will be completed by me without assistance from anyone else. I will not copy the questions
to any site outside CSE except onto my home computer.
No late penalty policy is applied. Anything submitted after 5pm Tuesday 22 Aug will be ignored (unless you have adjustments based on an
Equitable Learning Plan).
We will be running plagiarism-checking on your submissions. Plagiarism, especially in exams, will be prosecuted as Student Misconduct.
"Tutoring" sites like chegg.com will be monitored during the exam. Use of such sites will also be prosecuted as serious Student Misconduct.
You might want to keep this page open in a browser tab in case you need to refer to it. You should also have a mail reader open to receive any major
updates on the exam (which will also be shown on the top) and an EdForum page.
To a s k fo r c l a r i f i c a t i o n o n a ny q u e s t i o n , c re a t e a private post on the EdForum.
General Instructions:
Questions are not worth equal marks.
Questions may be answered in any order.
Do not leave submissions to the last minute.
Check that you submitted all of your answers.
If you believe that insufficient information has been provided to answer a given question, then you should write any assumptions that you think are
necessary to complete the question and continue work from there. If the assumptions are reasonable, you can still obtain full marks for the question.
START OF QUESTIONS
Q1 (8 marks)
Figure 1.1 Figure 1.2
1.1 Consider the graph in Figure 1.1, provide the result of a DFS travesal starting from A. Ensure that neighbors of each vertex are visited in alphabetical
order.
1.2 Show the minimum spanning tree of the graph in Figure 1.2.
Q2 (12 marks)
Figure 2 Figure 2.1 Figure 2.2
Consider the DAG given in Figure 2 and answer the following questions about tree cover index for reachability queries:
2.1 Given the tree cover T1 in Figure 2.1, show the intervals of each vertex in T1.
2.2 Given the tree cover T2 in Figure 2.2, show the intervals of each vertex in T2.
2.3 Which index is better? Why?
Q3 (14 marks)
3.1 Given an adjacency list representation of an undirected unweighted graph and a vertex ID named start, the following Python code aims to compute the
shortest distance from start to all other vertices. Please analyze its time complexity, and present how you can revise the code to improve its efficiency.
def SSSD(graph, start):
.
&
visited = [False] * len(graph)
distance = [-1] * len(graph)
distance[start] = 0
queue = list()
queue.append(start)
visited[start] = True
while queue:
current_node = queue.pop(0)
for neighbor in graph[current_node]:
if not visited[neighbor]:
queue.append(neighbor)
visited[neighbor] = True
distance[neighbor] = distance[current_node]+1
return distance
3.2 Given an undirected unweighted graph with n vertices and m edges, represented by an adjacency list, there may exist multiple paths with the same
distance between a pair of vertices. Design an algorithm to efficiently compute the number of shortest paths between two query vertices u and v in the
graph. Please provide the pseudocode of your algorithm and analyze its time complexity (some description for your pseudocode is helpful). A template is
shown as follows.
def NSP(graph, u, v):
sp = 0
/* write your pseudocode here */
return sp
Q4 (12 marks)
Figure 4
We use the adjacency list to represent an undirected weighted connected graph with n vertices and m edges. You need to compute the shortest distance
between two query vertices u and v. It is obvious that the shortest distance can be easily computed by the Dijkstra algorithm. To improve the query
efficiency, we have precomputed the shortest distance from a specific vertex to all other vertices. You should consider how to utilize the precomputed
data and improve your query efficiency. Please provide the pseudocode and analyze its time complexity (some description for your pseudocode is
helpful). A template is shown below.
Ta ke t h e g r a p h i n F i g u r e 4 a s a n ex a m p l e . T h e p r e c o m p u t e d i n d ex i s S = [ 1, [ 4,0,5,1 2 ,7,1 3,1 0 ] ] . T h e f i r s t i t e m i s a ve r t ex I D w h i c h i n d i c a t e s we h ave
precomputed the shortest distance from 1 to all other vertices. The second item of S is the shortest distance array.
def SD(graph, u, v, S):
/* write your pseudocode here */
Q5 (12 marks)
5.1 Please determine whether the following statements are TRUE or FALSE, and justify your answer (one or two sentences for each FALSE statement).
a. GraphSAGE and GCN utilize the same method to aggregate information from neighboring nodes.
b. GAT employs normalization during the computation of attention values.
c. GAT is more robust compared to GCN in terms of over-smoothing issue.
d. Adding skip connection to GCN can help prevent overfitting.
5.2 Please determine whether the following statements are TRUE or FALSE, and justify your answer (one or two sentences for each FALSE statement).
a. When size of dataset is large, GNN tends to suffer over-smoothing issue.
b. The stability of GNN training are mainly affected by the size of graph.
c. Assume there is no dropout layer, given a dataset and a fixed hidden dimension, a GCN layer will have more learnable parameters compared to a
GraphSAGE layer.
d. Supervised training computes loss by minimizing the discrepency between predictions and labels, while unsupervised training does not necessarily
require any loss function.
Figure 5.3
5.3 Given the graph G shown in Figure 5.3 with 6 nodes where each node has 3 neighbors. The dimension of each node feature is 5. Suppose one
GraphSAGE layer is trained to produce updated embedding of nodes, where the dimension of updated embedding is 10. Calculate the total number of
learnable parameters within this GraphSAGE layer. You can find the aggregation function of GraphSAGE on the topic 7.2.
Q6 (12 marks)
Consider processing an undirected graph G in the distributed graph system Pregel. Provide the pseudocode to efficiently compute the clustering
coefficient of each vertex. The definition of clustering coefficient can be found in p.23 of topic 6.
Assume we run your algorithm for the graph in Figure 5.3 in three machines. Vertices A and D are in machine #1. Vertices B and E are in machine #2.
-
Vertices C and F are in machine #3. Please calculate how many messages are sent in the algorithm.
Can the combiner optimization be used in this case? If yes, please show your implementation of combiner, and the number of messages if the combiner is
used.
Compute(MessageIterator* msgs){
/* write your pseudocode here */
}
Combine(MessageIterator* msgs){
/* write your pseudocode here */
}
Q7 (15 marks)
Figure 7
Consider a graph G with n vertices and m edges, represented by an adjacency list. Given an arbitrary integer k, design an index structure for efficiently
computing the vertex set of all k-cores and present the corresponding query algorithm. You are expected to achieve high query efficiency while reduce
the index size. Analyze the index space complexity and the query time complexity of your solution. You may describe your index structure textually or draw
figures by taking Figure 7 as an example. Note that only the index structure and the query algorithm are required. The algorithm for index construction is
not necessary.
Ta ke t h e g r a p h G s h ow n i n F i g u r e 7 a s a n ex a m p l e . G i ve n k = 3, yo u r q u e r y a l g o r i t h m s h o u l d r e t u r n a t wo - d i m e n s i o n a l a r r ay : [ [ 3,4,5,6 ] , [ 8,0,1 0,1 1,1 2 ] ] . N o t e
that the vertex sets and vertices in each subgraph can be in any order.
Q8 (15 marks)
Figure 8.1 Figure 8.2
Consider a graph G with n vertices and m edges, represented by an adjacency list. Given a list of new edges E', design an algorithm to find all new
triangles. Please provide the pseudocode of your algorithm and analyze its time complexity.
Ta ke F i g u r e 8.1 a n d F i g u r e 8.2 a s a n ex a m p l e , a l l n e w e d g e s a r e m a r ke d w i t h r e d c o l o r a n d a r e g o i n g t o b e a d d e d i n G , i .e . , E ' = [ [ 1,4 ] , [ 4,5 ] , [ 4,9 ] , [ 9,7 ] , [ 9,8 ] ,
[6,0]].
Your algorithm should retur n a two-dimensional ar ray containing all new triangles from G to G' (i.e.,triangles that are not par t of the original graph G but
have emerged in G'). That is [[1,2,4],[1,4,6],[1,4,5],[4,6,9],[6,9,0],[6,7,9],[7,8,9]]. Note that triangles and vertices in each triangle can be in any order.
def newT(graph, E'):
/* write your pseudocode here */
END OF QUESTIONS