CSCC46 Social and Information Networks, Fall 2020
University of Toronto Scarborough

CSCC46 Assignment 2

Instructions

• Your work must be your own.

• These problems require thought, but do not require long answers. Please be as concise as possible.

• This assignment is due by 2pm on Thursday, October 29. You may use up to two flex days to submit the assignment late, meaning the hard deadline after which no late assignments will be accepted is 2pm on Saturday, October 31.

• Your assignment must be typed; handwritten assignments will not be marked. You may use any word-processing software you like (many academics use LaTeX). Figures can either be drawn electronically, or you may hand-draw them, take pictures, and include them as images in your submission. Whichever word processor you choose, please submit in PDF format.

• All assignments must be submitted electronically on MarkUs (see the course webpage for the MarkUs link). Well before the due date, try submitting with MarkUs. You can submit an empty file as a placeholder, and then submit a new version of the file later by using the "Replace" column. Check that you have submitted the correct version of your file by downloading it from MarkUs.

Useful Mathematical Facts

This assignment is more rigorous than the previous one. Here are some mathematical facts that might be useful.

• The Union Bound. This incredibly simple and intuitive fact comes in handy more often than you might expect. It says that for any finite set of events, the probability that at least one of them happens is no greater than the sum of the probabilities of the individual events:

    P(⋃_i A_i) ≤ ∑_i P(A_i)

The union of a collection of events simply means "the event that at least one of these events happens". Its probability cannot exceed the sum of the individual probabilities: equality holds in the extreme case that the events are mutually exclusive (disjoint), and the probability is strictly smaller whenever some of the events overlap.
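As a quick numeric illustration of the bound (a Python sketch, not part of the assignment), compare the exact union probability of two events against the union-bound value:

```python
# Illustrative sketch: checking the union bound P(A1 or A2) <= P(A1) + P(A2)
# for two events by direct computation.
p1, p2 = 0.4, 0.25
union_bound = p1 + p2  # the right-hand side of the union bound

# If the events are independent, inclusion-exclusion gives the exact union
# probability, which falls below the bound because the events can overlap.
p_union_independent = p1 + p2 - p1 * p2
assert p_union_independent <= union_bound

# If the events are mutually exclusive (disjoint), the bound is tight.
p_union_disjoint = p1 + p2
assert p_union_disjoint == union_bound
```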
For example, say that A1 is the event that the Blue Jays win, with P(A1) = 0.4, and say that A2 is the event that it rains, with P(A2) = 0.25. The probability that at least one of A1 and A2 happens (either it rains or they win) cannot be greater than P(A1) + P(A2) = 0.4 + 0.25 = 0.65. It could be smaller, though (for example, if it only rains when the Blue Jays win).

• A Useful Limit. Oftentimes you want to reason about the probability that a collection of events doesn't happen. If the probability of one of the events happening can be expressed as 1/x, and there are x independent events, then you can use:

    lim_{x→∞} (1 − 1/x)^x = 1/e

• Change of Base Formula. To change the base of a logarithm from α to β, use:

    log_α x = (log_β x) / (log_β α)

Question 1: Signed Networks [25 pts]

1.1 [10 pts] In this question, we will analyze a simple generative model of signed networks to see if balance occurs randomly. Consider the following simple model for constructing random signed networks, which we'll call the G+ model. Start with a complete graph on n nodes. For each edge e, mark its sign as positive with probability p (and thus negative with probability 1 − p). All edges are undirected. Let GB denote the event that a graph G is balanced. In this question, we'll show that P(GB) → 0 as n → ∞ for graphs generated according to the G+ model. Assume that p = 1/8.

(a) [2 pts] Let T be a maximum set of edge-disjoint triangles in G, that is, a set of triangles in which no two triangles share an edge. Give a simple lower bound on |T|, the number of triangles in such a set (the bound should be an increasing function of n).

(b) [2 pts] For any triangle in G, what is the probability that it is balanced?

(c) [4 pts] Using the simple lower bound from part (a), give an upper bound on the probability that all of the triangles in T are balanced.
Show that this probability approaches 0 as n → ∞.

(d) [2 pts] Explain why the previous part implies that P(GB) → 0 as n → ∞.

1.2 [5 pts] If balanced signed networks don't show up by chance, as we showed in the previous question, how do they arise? One class of mechanisms that researchers have proposed and studied are dynamic processes, in which signed networks evolve over time to become more balanced. The following describes a very simple example of such a dynamic process:

I. Pick a triad at random.
II. If it's balanced, do nothing.
III. Otherwise, arbitrarily choose one of its edges and flip its sign so that the triad becomes balanced.

Consider the following claim: in this process, the number of balanced triads can never decrease. Is this true? If so, give a proof; otherwise, give a counterexample.

1.3 (a) [5 pts] One way signed networks can evolve over time is for new nodes to join the network and create new signed edges to nodes already in the network. Consider the simple network shown below. Is it possible to add a node D such that it forms signed edges with all existing nodes (A, B, and C), but isn't itself part of any unbalanced triangles? Justify your answer.

[Figure: a triangle on nodes A, B, and C with two positive edges and one negative edge.]

(b) [5 pts] Using your answer to the previous sub-problem, consider the following question. Take any complete signed network, on any number of nodes, that is unbalanced. When (if ever) is it possible for a new node X to join the network and form edges to all existing nodes in such a way that it does not become involved in any unbalanced triangles? If it's possible, give an example. If it's not, give an argument explaining why.

Question 2: Decentralized Search [50 pts]

In class, we saw a decentralized search algorithm based on "geography" that seeks a path between two nodes on a grid by successively taking the edge towards the node closest to the destination.
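That grid-based greedy rule can be sketched in a few lines (an illustrative Python sketch, not part of the assignment; for brevity this grid has only lattice edges, without the long-range links of the full small-world model):

```python
# Greedy geographic routing on a grid: at each step, move to the lattice
# neighbor that is closest to the destination in Manhattan distance.

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def greedy_route(start, target):
    path = [start]
    x, y = start
    while (x, y) != target:
        neighbors = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        # Move to the neighbor closest to the target.
        x, y = min(neighbors, key=lambda n: manhattan(n, target))
        path.append((x, y))
    return path

route = greedy_route((0, 0), (2, 1))
# On a bare grid, greedy routing takes exactly the Manhattan distance in steps.
assert len(route) - 1 == manhattan((0, 0), (2, 1))
```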
Here we will examine an example of a decentralized search algorithm on a network where the edges are created according to an underlying hierarchical tree structure. In particular, the nodes of our network will be the leaves of a tree, and the edge probability between two nodes will depend on their proximity in this underlying tree structure. The tree may, for instance, be interpreted as representing the hierarchical organization of a university, where one is more likely to have friends inside the same department, a bit less likely within the same school, and least likely across schools.

Let us organize students at U of T into a tree hierarchy, where the root is the University of Toronto and the second level contains the different schools (engineering, arts and sciences, etc.). The third level represents the departments, and the final level (i.e., the leaves) contains the U of T students. Fatima, a student from the computer science department, wants to hang out with Max, who is in sociology. If Fatima does not know Max, she could ask a friend in the sociology department to introduce them. If Fatima does not know anybody in the sociology department, she may seek a friend in the U of T humanities school instead. In general, she will try to find a friend who is "close" to Max in the tree.

One important thing to keep in mind: in this problem, the network nodes are only the leaf nodes of the tree. The other nodes of the tree are virtual and exist only to determine the edge-creation probabilities between the nodes of the network. So there are two networks: one is the observed network (i.e., "edges between students"), and the other is the underlying tree structure that is used to generate the edges in the observed network ("the hierarchy of U of T").

Figure 1: Illustration of the graph in Question 2. Black nodes and edges are used to illustrate the hierarchy and structure, but are not part of our network. Red nodes (leaf nodes) and red edges are the ones in our network.
The lowest common ancestor of s and t is the root of the tree. The decentralized search proceeds as follows. Denote the starting node by s and the destination by t. At each step, the algorithm looks at the neighbors of the current node and moves to the one "closest" to t, that is, the neighbor whose lowest common ancestor with t is deepest in the tree. In this graph, from s we move to u.

2.1 This part covers some basic facts about the setting of the hierarchical model and defines a notion of tree-based "distance" between the nodes.

Consider a complete, perfectly balanced b-ary tree T (each non-leaf node has b children, and b ≥ 2), and a network whose nodes are the leaves of T (the red nodes in Figure 1). Let the number of network nodes (equivalently, the number of leaves in the tree) be N, and let h(T) denote the height of T (see Figure 1 for an example). Recall that a tree with one node has height zero.

Note: When answering questions in this section, feel free to cite well-known properties of trees (e.g., the number of nodes in a complete binary tree of a certain height), but please provide evidence of reasoning as well. In other words, we don't expect you to prove basic tree properties by induction; just provide some sound reasoning.

(a) (2 points) Write h(T) in terms of N.

(b) (3 points) Next we want to define the notion of "tree distance." The intuition we want to capture is that students who share the same department are closer than, for example, students who only share the same school. For instance, in the tree in Figure 1, nodes u and t are "closer" than nodes u and s. We formalize the notion of "distance" as follows. Given two network nodes (leaf nodes) v and w, let L(v, w) denote the subtree of T rooted at the lowest common ancestor of v and w, and let h(v, w) denote its height (that is, h(L(v, w))). In Figure 1, L(u, t) is the tree in the circle and h(u, t) = 2.
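The tree distance h(v, w) can be computed directly once we fix a leaf numbering (an illustrative Python sketch; the numbering convention below is ours, not part of the assignment):

```python
# Assumed convention: number the leaves of a complete b-ary tree of height H
# as 0 .. b**H - 1, left to right. Two leaves then share an ancestor at height
# d exactly when their indices agree after integer division by b**d, which
# gives a direct way to compute the tree distance h(v, w).

def tree_distance(v: int, w: int, b: int) -> int:
    """Height of the subtree rooted at the lowest common ancestor of leaves v, w."""
    d = 0
    while v != w:
        v //= b
        w //= b
        d += 1
    return d

# Example: b = 2, height 3, leaves 0..7.
assert tree_distance(0, 0, b=2) == 0  # a leaf is at distance 0 from itself
assert tree_distance(0, 1, b=2) == 1  # siblings meet at a height-1 ancestor
assert tree_distance(0, 7, b=2) == 3  # opposite ends only meet at the root
```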
Note that we can think of h(v, w) as a "distance" between nodes v and w. For a given node v, what is the maximum possible value of h(v, w)?

(c) (5 points) Given a value d and a network node v, show that there are b^d − b^(d−1) nodes w satisfying h(v, w) = d.

2.2 This part helps you design a decentralized search algorithm on the network. We will generate a random network on the leaf nodes in a way that models the observation that a node is more likely to know "close" nodes than "distant" nodes, according to our university organizational hierarchy captured by the tree T. For a node v, we define a probability distribution over node v creating an edge to any other node w:

    p_v(w) = (1/Z) b^(−h(v,w))

where Z = ∑_{w≠v} b^(−h(v,w)) is a normalizing constant. By symmetry, all nodes v have the same normalizing constant.

Next, we set some parameter k and ensure that every node v has exactly k outgoing edges. We do this with the following procedure. For each node v, we repeatedly sample a random node w according to p_v and create the edge (v, w) in the network. We continue until v has exactly k neighbors. Equivalently, after we add an edge from v to w, we can set p_v(w) to 0 and renormalize with a new Z to ensure that ∑_w p_v(w) = 1. This results in a k-regular directed network.

(a) (7 points) Show that Z ≤ log_b N. (Hint: use the results in parts 2.1(a) and 2.1(c), and group nodes by their distance. For each distance d, you know exactly how many nodes are at distance d, and you have a lower bound on the probability of linking to them.)

(b) (8 points) For a node v and the target t, let T′ be the subtree of L(v, t) satisfying:

• T′ is of height h(v, t) − 1,
• T′ contains t,
• T′ does not contain v.

For instance, in Figure 1, the T′ of L(s, t) is the tree in the circle. Consider an edge e from v to a random node u sampled from p_v. We say that e points to T′ if u is one of the leaf nodes of T′. Show that the probability of e pointing to T′ is no less than 1/(b log_b N).
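The k-edge generation procedure from 2.2 can be sketched as follows (an illustrative Python sketch; the left-to-right leaf numbering and the tree_distance helper are our own conventions, not part of the assignment, and rejecting duplicate samples has the same effect as the renormalization described above):

```python
import random

def tree_distance(v, w, b):
    # Leaves of a complete b-ary tree of height H are numbered 0 .. b**H - 1;
    # indices agree after division by b**d exactly when the LCA has height d.
    d = 0
    while v != w:
        v, w, d = v // b, w // b, d + 1
    return d

def sample_out_edges(v, b, H, k, rng):
    """Sample k distinct neighbors of leaf v with P(w) proportional to b**-h(v, w)."""
    N = b ** H
    others = [w for w in range(N) if w != v]
    weights = [b ** -tree_distance(v, w, b) for w in others]
    neighbors = set()
    while len(neighbors) < k:
        # random.choices samples with replacement; retrying duplicates is
        # equivalent to zeroing p_v(w) and renormalizing.
        neighbors.add(rng.choices(others, weights=weights, k=1)[0])
    return neighbors

rng = random.Random(46)
nbrs = sample_out_edges(v=0, b=2, H=5, k=4, rng=rng)
assert len(nbrs) == 4 and 0 not in nbrs  # exactly k out-edges, no self-loop
```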
(Hint: Note that all of the nodes in T′ are the same distance away from v, by the notion of tree distance defined in 2.1(b). How many nodes are in T′?)

2.3 (15 points) Let the out-degree k of each node be c · (log_b N)^2, where c and b are constants. Show that as N grows very large, the probability that v has no edge pointing to T′ is asymptotically no more than N^(−θ), where θ is a positive constant which you need to compute. (Hints: use the result in part 2.2(b) and recall that each of the k edges is created independently. Also, use lim_{x→∞} (1 − 1/x)^x = 1/e.)

Argue why the above result indicates that for any node v, we can, with high probability, find an edge to a (leaf) node u satisfying h(u, t) < h(v, t).

2.4 (10 points) Show that starting from any (leaf) node s, we can reach any (leaf) node t within O(log_b N) steps. You do not need to give a strict probabilistic argument. You can simply assume that from any (leaf) node v you can always get to a (leaf) node u satisfying h(u, t) < h(v, t), and argue why you can then reach t in O(log_b N) steps.

Question 3: Power Laws [25 pts]

3.1 [10 pts] The main summary statistics of distributions are called moments. For example, the mean is the first moment, and the variance is closely related to the second moment. In Lecture 6, we calculated the mean of a power law distribution. In this question, calculate the third moment of a power law distribution, which is the quantity underlying the skewness. That is, calculate

    ∫_{x_min}^∞ x^3 p(x) dx,   where p(x) = ((α − 1)/x_min) (x/x_min)^(−α)

3.2 [15 pts] As we learned in class, the power law can be a great fit for the heavy-tailed data we encounter in real-world networks. But one problem we have to overcome is: which parameters are the best fit? The power law is characterized by the minimum x-value x_min and the slope α. In this question, we'll learn how to estimate α from data.

Say you have a dataset {x_i | i ∈ {1, . . . , n}} of n numbers, and you suspect these numbers come from a power law distribution and want to find the best-fit α. The strategy we're going to use to estimate α is called the maximum likelihood approach. We're going to compute the log-likelihood of the data given a power law model, then calculate which value of α maximizes this log-likelihood (we take the log of the likelihood to simplify the math).

(a) [5 pts] Let L(α) = ln(∏_{i=1}^n p(x_i)) be the log-likelihood of the data as a function of α. Plug in the definition of a power law for p(x_i) (shown above in Question 3.1) and simplify the expression for L(α).

(b) [10 pts] Now take the derivative of L(α) with respect to α and set it to zero:

    dL(α)/dα = 0

This is now a simple single-variable calculus problem. Solve this equation to get the value of α that maximizes the log-likelihood of the data. That is the value of α that best fits the data.
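Once you have your answer to part (b), you can sanity-check it numerically (an illustrative Python sketch, not the derivation the question asks for; the inverse-CDF sampler and the grid search below are our own constructions):

```python
import math
import random

# Generate synthetic power-law data by inverse-CDF sampling, then locate the
# alpha that maximizes the log-likelihood by brute-force grid search. Your
# closed-form answer from part (b) should agree with the grid maximizer.

def sample_power_law(n, alpha, x_min, rng):
    # For p(x) = ((alpha - 1)/x_min) * (x/x_min)**(-alpha), the tail is
    # P(X > x) = (x/x_min)**(1 - alpha); inverting a uniform u in (0, 1]:
    return [x_min * (1 - rng.random()) ** (-1 / (alpha - 1)) for _ in range(n)]

def log_likelihood(alpha, xs, x_min):
    # Sum of ln p(x_i) with p as defined in 3.1 (no algebraic simplification).
    return sum(
        math.log((alpha - 1) / x_min * (x / x_min) ** (-alpha)) for x in xs
    )

rng = random.Random(0)
xs = sample_power_law(n=50_000, alpha=2.5, x_min=1.0, rng=rng)

grid = [1.01 + 0.01 * i for i in range(300)]  # candidate alphas in (1, 4]
best = max(grid, key=lambda a: log_likelihood(a, xs, x_min=1.0))
assert abs(best - 2.5) < 0.1  # the maximizer should land near the true alpha
```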