CS 4320/5320 - Homework 7

Part 1. Distributed Data Processing.

We have seen distributed query processing methods in class. In particular, we have seen distributed joins and how to estimate their processing costs.

Describe a scenario in which data is distributed across at least three sites. Describe how the data is distributed, the data size, and all data properties relevant for the following calculations.

Now, describe a query and at least two alternative query plans for it. Both query plans must contain at least one distributed join, and the two plans must use different join operators. Estimate processing costs for both plans according to the method seen in class. You are allowed to choose constants freely and to adopt a cost unit of your choice. You are also allowed to make additional assumptions about the data or system if needed for the cost calculations (as long as they are consistent with the cost calculations seen in class). Those assumptions must be clearly stated. Conclude which plan is cheaper to execute.

Part 2. Graph Data Processing.

We have seen graph processing using the Cypher language via the Neo4j system in class (www.Neo4j.com).

Describe a scenario of your choice. Next, describe a sequence of Cypher commands that create a graph representing the situation you describe. The graph must contain at least five nodes and at least five relationships. Furthermore, each node and relationship should have a label and at least one property. For each command, show the Cypher code and describe its semantics in one sentence.

Next, write five Cypher queries analyzing a graph. Those queries may refer to the graph you created in the first step. Alternatively, they may refer to one of the graphs offered in the Neo4j online demo (https://neo4j.com/try-neo4j). For each query, describe precisely its intended semantics. Also, describe the query result (either verbally or, if using the online demo, by including a screenshot of the result plot).
The five queries must cover the following cases:
• At least one query must use aggregation.
• At least two queries must match a pattern with at least two nodes.
• At least one query must contain a non-empty WHERE clause.
• At least one query must reference a property of a node or relationship in the MATCH clause.

Submit your solution for all steps as one single .pdf file. You receive up to 40 points for Part 1 and 50 points for Part 2.