COMP2521: Assignment 2

Social Network Analysis

A notice on the class web page will be posted after each major revision. Please check the class notice board and this assignment page

frequently. The specification may change.

Change log:

[08/04 16:50] Removed some incorrect comments and preprocessor instructions from the provided .c files.

Objectives

to implement graph based data analysis functions to mine a given social network

to give you further practice with C and data structures

Admin

Marks 20 marks

Individual

Assignment This assignment is an individual assignment.

Due 11:59:59pm Friday 23 April 2021 (Friday of Week 10)

Late Penalty 2 marks per day off the ceiling. The latest you can submit the assignment (with late penalty) is 8pm Monday 26April 2021.

Submit Read instructions in the Submission section below.

Aim

In this assignment, your task is to implement graph-based data analysis functions to mine a given social network. You should start by

reading the Wikipedia entries on these topics:

Floyd-Warshall algorithm

Edge betweenness and community structure

The main focus of this assignment is to calculate measures that could identify "influencers", "followers", etc., and also discover possible

"communities" in a given social network.

Dos and Don'ts!

Please note that:

For this assignment you can use source code that is available as part of the course material (lectures, exercises, tutes and labs).

However, you must properly acknowledge it in your solution.

All the required code for each part must be in the respective *.c file. You may not create additional C files.

You may implement additional helper functions.

Helper functions should be declared as static.

After implementing the functions in FloydWarshall.h and CentralityMeasures.h, you may use these functions in other tasks.

You must not modify any .h files.

None of your submitted files should contain a "main" function.

If you have not implemented any part, you must still submit an empty file with the corresponding file name.

Provided Files

We have provided an implementation of a directed graph ADT that you should use. However, you must not modify Graph.h or Graph.c.

Further, you should only interact with a graph via the functions and data structures in Graph.h.

Also note:

you may assume that all edge weights will be greater than zero.

10/4/21 下午1:11

⻚码：1/4

you may assume that graphs will not contain parallel edges or self-loops.

Download files:

Ass2_files.zip - after you have downloaded the files, you should remove _template from the name of all .c files (the _template was

added as a measure to prevent you from overwriting your work if you redownload the files). If you are connected to CSE, you can

download the files from the command-line using the command:

Testing files will be released later this week.

Part 1: Floyd-Warshall algorithm

In order to discover "influencers", we need to repeatedly find shortest paths between all pairs of nodes. In this section, you need to

implement the Floyd-Warshall algorithm, which is a shortest-path algorithm for graphs. Unlike Dijkstra's algorithm, which is a single-

source, shortest-path algorithm, the Floyd-Warshall algorithm computes the shortest distances between every pair of vertices in a graph.

The algorithm begins with the following observation: If the shortest path from to passes through , then the partial paths from to and

to must be minimal as well. A similar idea is adopted in the Warshall algorithm (in Week 7 lecture) for constructing the transitive closure

matrix of a graph.

Consider a graph with vertices numbered 1 through (note: in the provided files, vertices are numbered from 0 to ). In the

-th step, let be a function that returns the shortest path from to that only uses vertices from the set

as intermediate vertices along the way in the graph. In the -th step, the algorithm will then have to find the shortest

paths between all pairs using only the vertices from .

For all pairs of vertices it holds that the shortest path must either only contain vertices in the set , or otherwise must be a

path that goes from to via vertex . This implies that in the -th step, the shortest path from to either remains

or is being improved to , depending on which of

these paths is shorter. Therefore, each shortest path remains the same, or contains the node whenever it is improved. In each iteration,

all pairs of nodes are assigned the cost for the shortest path found so far:

This formula is the heart of the Floyd-Warshall algorithm. The algorithm works by first computing for all pairs

for , then , and so on. This process continues until , and we have found the shortest path for all pairs using any

intermediate vertices.

The Floyd-Warshall algorithm typically only provides the lengths of the shortest paths between all pairs of vertices. We can construct the

shortest path between any two endpoint vertices and by recording the "next vertex" of along the shortest path to . The idea of

recording the "next vertex" is similar to the idea of recording the "predecessor vertex" in Dijkstra's Algorithm. Note that multiple shortest

paths may exist between two vertices. For simplicity, in this assignment, we will only consider the first found shortest path between two

vertices. In the lectures, we will discuss how to construct all of the shortest paths between two vertices in Dijkstra's Algorithm.

Your task: In this section, you need to implement the following file:

FloydWarshall.c

See FloydWarshall.h for details on the data structure that needs to be returned. Note: if there is no path from vertex to vertex , the

distance from to should be set to infinity, defined in FloydWarshall.h, and the next vertex of vertex along the (non-existant) path

to should be set to -1. For all vertices , the next vertex along the path from to itself should be set to -1.

Part 2: Edge Betweenness Centrality

Centrality measures play a very important role in analysing a social network. For example, vertices with higher "betweenness" measures

often correspond to "influencers" in the given social network. In this part you will implement a well known centrality measure, edge

betweenness, for a given graph.

Edge betweenness is an indicator of highly central edges in networks. Mathematically, betweenness centrality of an edge is the sum of

the fraction of all-pairs shortest paths that pass through :

cp /web/cs2521/21T1/assigns/ass2/Ass2_files.zip .

i j k i k

k j

G N N − 1

(k− 1) shortestPath(i, j, k− 1) i j

{1, 2, . . . , k− 1} k

(i, j) {1, 2, . . . , k}

{1, . . . , k− 1}

i j k k i j

shortestPath(i, j, k− 1) shortestPath(i, k, k− 1) + shortestPath(k, j, k− 1)

k

shortestPath(i, j, k) =

min[shortestPath(i, j, k− 1), shortestPath(i, k, k− 1) + shortestPath(k, j, k− 1)]

shortestPath(i, j, k) (i, j)

k = 1 k = 2 k = N (i, j)

i j i j

v w

v w v

w v v

e

e

cB(e) = ∑

s,t∈V , t is reachable from s

δ(s, t|e)

δ(s, t)

( , ) ( , | )

10/4/21 下午1:11

⻚码：2/4

where is the set of nodes, is the number of shortest paths from to , and is the number of those paths passing

through edge .

In this assignment, as we have made an assumption that there is at most one shortest path between each pair of vertices, the formula can

be simplified to the following:

Note that due to our assumption, will be 1 if the shortest path from to passes through , and 0 otherwise.

Your task: In this section, you need to implement the following file:

CentralityMeasures.c

See CentralityMeasures.h for details on the data structure that needs to be returned. It is highly recommended that you use the

Floyd-Warshall algorithm from Part 1 to compute the shortest paths between vertices. Note: the edge betweenness of a non-existant edge

should be set to -1.

Part 3: Discovering Community

Discovering communities in a network, known as community detection/discovery, is a fundamental problem in network science. A network

is said to have community structure if the nodes of the network can be easily grouped into sets of nodes such that each set of nodes is

densely connected internally.

The Girvan-Newman algorithm detects communities by progressively removing edges from the original network. The connected

components of the remaining network are the communities. The Girvan-Newman algorithm focuses on edges that are most likely

"between" communities. It adopts the "edge betweenness" centrality introduced in Part 2. The algorithm's steps for community detection

are summarized below:

1. Calculate the edge betweenness of all edges in the network.

2. Remove the edge(s) with the highest edge betweenness.

3. Recalculate the edge betweenness of all edges affected by the removal.

4. Repeat Steps 2 and 3 until no edges remain.

After the removal of each edge, we only have to recalculate the betweennesses of those edges that were affected by the removal, which is

at most only those in the same component as the removed edge.

The end result of the Girvan-Newman algorithm is a dendrogram. As the Girvan-Newman algorithm runs, the dendrogram is produced from

the top down (i.e. the network splits up into different communities with the successive removal of links). The leaves of the dendrogram are

individual nodes.

Your task: In this section, you need to implement the following file:

GirvanNewman.c

See GirvanNewman.h for details on the data structure that needs to be returned. It is highly recommended that you use the edge

betweenness algorithm from Part 2 to help you calculate the edge betweenness of each edge.

Assessment Criteria

Part 1: Floyd-Warshall Algorithm (25%)

Part 2: Edge Betweenness Centrality (25%)

Part 3: Discovering Community (30%)

Style: (20%)

Submission

You need to submit the following files and only these files:

FloydWarshall.c

CentralityMeasures.c

GirvanNewman.c

V δ(s, t) s t δ(s, t|e)

e

cB(e) = ∑

s,t∈V , t is reachable from s

δ(s, t|e)

δ(s, t|e) s t e

10/4/21 下午1:11

⻚码：3/4

You can submit from the command-line using the following command:

or you can submit via WebCMS.

Note: Currently, submission does not perform any dryrun testing. Thus, if you submit now, it is your responsibility to ensure that your code

compiles and behaves as expected.

As mentioned earlier, please note that:

For this assignment you can use source code that is available as part of the course material (lectures, exercises, tutes and labs).

However, you must properly acknowledge it in your solution.

All the required code for each part must be in the respective *.c file. You may not create additional C files.

You may implement additional helper functions.

Helper functions should be declared as static.

After implementing the functions in FloydWarshall.h and CentralityMeasures.h, you may use these functions in other tasks.

You must not modify any .h files.

None of your submitted files should contain a "main" function.

If you have not implemented any part, you must still submit an empty file with the corresponding file name.

Plagiarism

This is an individual assignment. Each student will have to develop their own solution without help from other people. You are not permitted

to exchange code or pseudocode. If you have questions about the assignment, ask your tutor. All work submitted for assessment must be

entirely your own work. We regard unacknowledged copying of material, in whole or part, as an extremely serious offence. For further

information, read the Student Conduct.

give cs2521 ass2 FloydWarshall.c CentralityMeasures.c GirvanNewman.c

10/4/21 下午1:11

⻚码：4/4

欢迎咨询51作业君