COMP5338 Exam Main 2020
Due Nov 25 at 23:59 Points 100 Questions 18 Available Nov 9 at 18:00 - Nov 25 at 23:59 (16 days)
Time Limit 130 Minutes Allowed Attempts Unlimited
Instructions
This is an open book exam; you are allowed to consult course material and your own notes while attempting the questions.
This exam contains 12 multiple choice questions and 6 short answer questions.
Questions 1-7 are independent multiple choice questions.
Questions 8-12 refer to a common scenario. The scenario description is given before Question 8.
All short answer questions have a single text field for you to type the answer. Some short answer questions have multiple parts. The
answer to each part should be clearly labelled in the answer text field.
You are required to answer all questions in your own words. Copying a large amount of text from course material or the Internet may result in
a plagiarism allegation.
Question 1 (2 pts)
The timestamp compression algorithm used in Facebook's Gorilla might not be suitable for
Multiple related time series data
Unevenly spaced time series data Correct Answer
Long running time series data
High frequency time series data
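For context, Gorilla encodes each timestamp as a delta-of-delta with a variable-length code, so evenly spaced points cost about one bit each. The following is a minimal Python sketch of that bit cost. It assumes the bucket ranges described in the Gorilla paper and ignores the block header and the first delta; `dod_bits` and `timestamp_bits` are hypothetical helper names, and the sample series are made up.

```python
def dod_bits(d: int) -> int:
    """Bits used to store one delta-of-delta value (control bits + payload)."""
    if d == 0:
        return 1            # single '0' bit
    if -63 <= d <= 64:
        return 2 + 7        # '10' + 7-bit value
    if -255 <= d <= 256:
        return 3 + 9        # '110' + 9-bit value
    if -2047 <= d <= 2048:
        return 4 + 12       # '1110' + 12-bit value
    return 4 + 32           # '1111' + 32-bit value


def timestamp_bits(timestamps):
    """Total bits for all delta-of-delta values in a series (header ignored)."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return sum(dod_bits(d) for d in dods)


regular = list(range(0, 600, 60))      # one point every 60 seconds
irregular = [0, 60, 500, 510, 4000]    # unevenly spaced points

print(timestamp_bits(regular))    # every delta-of-delta is 0
print(timestamp_bits(irregular))  # every delta-of-delta needs a long code
```

In this sketch the ten evenly spaced points need fewer bits in total than the five unevenly spaced ones.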
Question 2 (2 pts)
Which one of the following is TRUE about the R-Tree index of spatial data?
The top level MBRs should not overlap with each other
R-Tree queries may need to examine multiple siblings of a node at any level Correct Answer
One MBR may be covered by many upper level node’s MBR and associated with all of them
The top level MBRs should cover the whole space
Question 3 (2 pts)
Which of the following statements is NOT TRUE about InfluxDB data model?
A time series may have no tag.
A time series contains many data points with the same series key.
A measurement may include multiple time series.
A time series may include multiple fields. Correct Answer
Question 4 (2 pts)
Which of the following statements is NOT TRUE about Bigtable design?
The metadata table usually has a higher replication factor than the replication factor of regular tables. Correct Answer
Bigtable is not designed to support general purpose queries
Bigtable is designed to optimize write query performance
Bigtable does not support join operation
Question 5 (2 pts)
What is the Z-order value of the point (2,5), if the X and Y coordinates can each be represented as a 3-bit
binary number?
25 Correct Answer
7
10
30
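The Z-order value can be checked by interleaving the coordinate bits. The sketch below is a minimal Python version that assumes the X bit is placed before the Y bit at each position, which is the convention that reproduces the marked answer; `z_order` is a hypothetical helper name.

```python
def z_order(x: int, y: int, bits: int = 3) -> int:
    """Interleave the bits of x and y, most significant first, X bit before Y bit."""
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 1) | ((x >> i) & 1)  # next bit of x
        z = (z << 1) | ((y >> i) & 1)  # next bit of y
    return z


# x = 2 is 010 and y = 5 is 101, so the interleaved bits are 011001.
print(z_order(2, 5))  # 25
```

Placing the Y bit first instead would give 100110 (38), which is not among the options.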
Question 6 (2 pts)
Which of the following is TRUE about Grid file index of spatial data?
Queries looking for a specific point only need to check a single space region (bucket) Correct Answer
There should be equal number of grid lines in each dimension
The space regions segmented by grid lines may overlap with each other
The spacing between adjacent grid lines in the same dimension should be the same.
Question 7 (2 pts)
Which one of the following is NOT an example of the self-describing feature of semi-structured data?
including property names in each node in a property graph model
including header row in a csv file Correct Answer
including field names in JSON document
including element tags in XML document
Questions 8-12 refer to a collection named stores, with the following five documents:
{_id:1, name: "Java Hut", description: "Coffee and cakes"},
{_id:2, name: "Burger Buns", description: "Gourmet hamburgers"},
{_id:3, name: "Coffee Shop", description: "Just coffee"},
{_id:4, name: "Clothes Clothes Clothes", description: "Discount clothing"},
{_id:5, name: "Java Shopping", description: "Indonesian goods"}

The collection has an index created by the following command:
db.stores.createIndex( { name: "text", description: "text" } )
The collection is stored in a replica set with three members. All members have the same data at
the beginning. During a period between t0 and t10, one write query and one or more concurrent
read queries are sent to the collection. No other write query is sent in the period. The respective
write and read queries are as follows.
db.stores.updateMany(
   {description:{$regex: ".*Coffee.*", $options:'i'}},
   {$set: {category: "restaurant"}}
)
db.stores.find( { $text: { $search: "java coffee shop" } } )

Below are a series of events triggered by the update query:
The write query is sent at t0;
The write is applied on primary at t1;
It arrives at secondary 1 at t2 and is applied at secondary 1 at t3;
The primary is aware of successful replication to secondary 1 at t4;
The write arrives at secondary 2 at t5 and is applied at secondary 2 at t6;
Secondary 1 receives notice to update its snapshot of its most recent majority write at t7;
The primary is aware of successful replication to secondary 2 at t8;
Secondary 2 receives notice to update its snapshot of its most recent majority write at t9.
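The scenario can be traced with a small Python sketch. It first applies the update filter to the five documents using Python's `re` module as a stand-in for MongoDB's `$regex`, then records the time index at which the write becomes visible on each member for the 'local' and 'majority' read concerns, as read from the event list. The names `VISIBLE_AT` and `sees_update` are hypothetical, and treating a read at exactly the event time as seeing the write is a simplifying assumption.

```python
import re

stores = [
    {"_id": 1, "name": "Java Hut", "description": "Coffee and cakes"},
    {"_id": 2, "name": "Burger Buns", "description": "Gourmet hamburgers"},
    {"_id": 3, "name": "Coffee Shop", "description": "Just coffee"},
    {"_id": 4, "name": "Clothes Clothes Clothes", "description": "Discount clothing"},
    {"_id": 5, "name": "Java Shopping", "description": "Indonesian goods"},
]

# Case-insensitive, unanchored match on description, mirroring the update filter.
pattern = re.compile(r".*Coffee.*", re.IGNORECASE)
updated_ids = [d["_id"] for d in stores if pattern.search(d["description"])]
print(updated_ids)  # documents that receive category: "restaurant"

# Time index at which the write becomes visible, taken from the event list.
VISIBLE_AT = {
    "local": {"primary": 1, "secondary1": 3, "secondary2": 6},
    "majority": {"primary": 4, "secondary1": 7, "secondary2": 9},
}


def sees_update(member: str, read_concern: str, t: int) -> bool:
    """True if a read on `member` at time index t observes the write."""
    return t >= VISIBLE_AT[read_concern][member]


# Earliest time at which 'local' reads agree on every member.
print(max(VISIBLE_AT["local"].values()))
```

Under these assumptions only the documents with _id 1 and 3 are modified, and every member returns the updated data for 'local' reads from t6 onward.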
Question 8 (2 pts)
Assume concurrent read requests are sent from multiple clients, and all read requests set the read
concern level to ‘local’. They have different read preference settings. What is the earliest time for all
read requests to return the same results?
t9
t4
t6 Correct Answer
t8
Question 9 (2 pts)
Assume the read preference is set to secondary and the read concern is set to ‘majority’, and the
read request arrives at secondary 1 after t7. Which of the following would be the result of the
query?

{_id:1, name: "Java Hut", description: "Coffee and cakes"},
{_id:3, name: "Coffee Shop", description: "Just coffee", category: "restaurant"},
{_id:5, name: "Java Shopping", description: "Indonesian goods"}

{_id:1, name: "Java Hut", description: "Coffee and cakes", category: "restaurant"},
{_id:3, name: "Coffee Shop", description: "Just coffee"},
{_id:5, name: "Java Shopping", description: "Indonesian goods"}

{_id:1, name: "Java Hut", description: "Coffee and cakes", category: "restaurant"},
{_id:3, name: "Coffee Shop", description: "Just coffee", category: "restaurant"},
{_id:5, name: "Java Shopping", description: "Indonesian goods"}
Correct Answer

{_id:1, name: "Java Hut", description: "Coffee and cakes"},
{_id:3, name: "Coffee Shop", description: "Just coffee"},
{_id:5, name: "Java Shopping", description: "Indonesian goods"}
Question 10 (2 pts)
Assume the read preference is set to primary and the read concern is set to ‘majority’, and the
read request arrives at the primary between t2 and t3. Which of the following would be a possible
result of the query?

{_id:1, name: "Java Hut", description: "Coffee and cakes", category: "restaurant"},
{_id:3, name: "Coffee Shop", description: "Just coffee", category: "restaurant"},
{_id:5, name: "Java Shopping", description: "Indonesian goods", category: "restaurant"}

{_id:1, name: "Java Hut", description: "Coffee and cakes", category: "restaurant"},
{_id:3, name: "Coffee Shop", description: "Just coffee"},
{_id:5, name: "Java Shopping", description: "Indonesian goods"}

{_id:1, name: "Java Hut", description: "Coffee and cakes"},
{_id:3, name: "Coffee Shop", description: "Just coffee"},
{_id:5, name: "Java Shopping", description: "Indonesian goods"}
Correct Answer

{_id:1, name: "Java Hut", description: "Coffee and cakes", category: "restaurant"},
{_id:3, name: "Coffee Shop", description: "Just coffee", category: "restaurant"},
{_id:5, name: "Java Shopping", description: "Indonesian goods"}
Question 11 (2 pts)
Assume the write concern is set to ‘majority’; which member will send the acknowledgement to the
client and at what time?
Secondary 2; at t6
Primary; at t4 Correct Answer
Secondary 1; at t3
Primary; at t1
Question 12 (2 pts)
Assume the read preference is set to secondary and the read concern is set to ‘local’, and the read
request arrives at secondary 2 before t6. Which of the following would not be a possible result of
the query?

{_id:1, name: "Java Hut", description: "Coffee and cakes"},
{_id:3, name: "Coffee Shop", description: "Just coffee", category: "restaurant"},
{_id:5, name: "Java Shopping", description: "Indonesian goods"}

{_id:1, name: "Java Hut", description: "Coffee and cakes"},
{_id:3, name: "Coffee Shop", description: "Just coffee"},
{_id:5, name: "Java Shopping", description: "Indonesian goods"}


{_id:1, name: "Java Hut", description: "Coffee and cakes", category: "restaurant"},
{_id:3, name: "Coffee Shop", description: "Just coffee"},
{_id:5, name: "Java Shopping", description: "Indonesian goods"}

{_id:1, name: "Java Hut", description: "Coffee and cakes", category: "restaurant"},
{_id:3, name: "Coffee Shop", description: "Just coffee", category: "restaurant"},
{_id:5, name: "Java Shopping", description: "Indonesian goods"}
Correct Answer
Question 13 (6 pts)
Organisations dealing with a variety of data may adopt the polyglot persistence option or the
multi-model database option to handle data with different features.
Use your own words to describe each option [2 points].
Also describe the advantages and disadvantages of each option [4 points].
Question 14 (6 pts)
Scaling out is the preferred mechanism databases use to handle scalability issues. There are two
options to scale out. Use a system you have learned about to describe each option and the scalability
issue each can solve.
Question 15 (9 pts)
Most real-world entities have complex relationships with each other. A traditional RDBMS relies on
JOIN operations to query or analyse related data. When the data size grows, join operations can
easily become the performance bottleneck. NoSQL systems use various mechanisms to avoid
expensive join operations. Describe at least three such mechanisms and how each is used in a
particular system.
Question 16 (15 pts)
All parts of this question refer to a MongoDB collection revisions. This collection is used to store
revisions of Wikipedia pages. Each document in the collection represents a revision. Below is a
sample document showing only fields relevant to the question:
{
"_id": 11890891,
"title": "US Election",
"user": "Bot1",
"timestamp": "2016-09-30T08:18:54Z"
}
The “_id” field stores the unique revision ID of a revision. The “title” field stores the title of the
Wikipedia page the revision belongs to. The “user” field stores the name of the user who made the
revision. The “timestamp” field stores the time when this revision was made. For convenience, the
value of the timestamp field is shown in String format, but should be considered as having the proper
Date type. Assume the current revisions collection contains revision documents belonging to 20
unique titles made by 100 unique users. Indexes have been set up using the following two
commands:
db.revisions.createIndex({title: 1, timestamp:1})
db.revisions.createIndex({user:1})
An aggregation query is used to create another collection revsummary:
db.revisions.aggregate([
{$group: {_id: {title: "$title", user: "$user"},
first: {$min:"$timestamp"},
last: {$max:"$timestamp"}}},
{$project: {_id: 0,
title: "$_id.title",
user: "$_id.user",
age: {$subtract: ["$last", "$first"]}}},
{$match: {age: {$gt:0}}},
{$group: {_id: "$title",
users: {$push: {user:"$user", age:"$age"}}}},
{$out: "revsummary"}
])
1. [2 points] Show a sample output document of the project stage.
2. [2 points] Which stage(s) may have more output documents than input documents.
3. [2.5 points] Write a query to find the user that has made the smallest number of revisions in
page “US Election” in 2020. You can run the query on revisions or revsummary collection.
4. [2.5 points] Write a query to find which page has been edited by the smallest number of users.
You can run the query on revisions or revsummary collection.
5. [6 points] Describe all possible query plans for the following query.
db.revisions.find({user:"aUser", title: "US Election"}).sort({timestamp:-1})
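To see what each stage of the aggregation pipeline does, it can help to trace it by hand on a few documents. The Python sketch below mimics the $group, $project, $match and second $group stages on three made-up revisions. The sample data and variable names are hypothetical, and the age is expressed in milliseconds because $subtract on two dates returns a millisecond difference.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical sample revisions (not taken from the real collection).
revisions = [
    {"title": "US Election", "user": "Bot1", "timestamp": datetime(2016, 9, 30, 8, 18, 54)},
    {"title": "US Election", "user": "Bot1", "timestamp": datetime(2016, 9, 30, 8, 18, 55)},
    {"title": "US Election", "user": "Ann", "timestamp": datetime(2016, 10, 1, 0, 0, 0)},
]

# Stage 1, $group by (title, user): keep the first and last timestamp.
spans = {}
for r in revisions:
    key = (r["title"], r["user"])
    first, last = spans.get(key, (r["timestamp"], r["timestamp"]))
    spans[key] = (min(first, r["timestamp"]), max(last, r["timestamp"]))

# Stage 2, $project: flatten the key and compute age in milliseconds.
projected = [
    {"title": t, "user": u, "age": (last - first) // timedelta(milliseconds=1)}
    for (t, u), (first, last) in spans.items()
]

# Stage 3, $match: drop users with a single revision time (age == 0).
matched = [d for d in projected if d["age"] > 0]

# Stage 4, $group by title: push one {user, age} entry per remaining user.
summary = defaultdict(list)
for d in matched:
    summary[d["title"]].append({"user": d["user"], "age": d["age"]})

print(projected)
print(dict(summary))
```

In this trace the user with only one revision gets an age of 0 and is filtered out before the final grouping.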
Question 17 (15 pts)
Suppose we have a Dynamo cluster with 5 nodes: n0~n4. They are of different capacities. The
tokens assigned to each node are as follows:
n0: 5, 35; n1: 15; n2: 24, 72; n3: 49; n4: 61, 90
The ring space range is [0,99]. The cluster has a replication factor of 3; the size of a key’s
preference list is 4; both read and write requests need to get responses from at least two replica
nodes to be considered successful. A table about web pages is stored in this cluster, with keys
equal to a page’s URL.
1. [4 points] Suppose the hash value of key “www.cnn.com” is 50. What
is the preference list of this key?
2. [4 points] Suppose the hash value of key “www.bbc.co.uk” is 20. It
has two different versions with the following vector clocks: ([n2, 3],[n0,1]) and ([n2, 2],[n0,1]). A
client request to update this record is coordinated by n2. What would be the vector clock of the
updated version? Which nodes will be contacted to complete the write request?
3. [4 points] Key “www.amazon.com” has two different versions
with the following vector clocks: ([n4, 5],[n3,1],[n0,1]) and ([n4,4],[n3,2]). What is the common
ancestor of the two vector clocks? Explain what happens on each branch.
4. [3 points] Assume the data size on node n3 has reached its maximum capacity, while the data
size on node n2 has significantly decreased. Describe the steps that can be taken to re-balance the
data distribution among nodes.
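The ring walk behind a preference list can be sketched in Python. The sketch assumes the usual Dynamo rule: start at the first token whose position is larger than the key's hash, move clockwise, and skip tokens that belong to a physical node already in the list. `TOKENS` restates the token assignment above, and `preference_list` is a hypothetical helper name.

```python
from bisect import bisect_right

# Token assignment from the question: node name -> tokens on the [0, 99] ring.
TOKENS = {"n0": [5, 35], "n1": [15], "n2": [24, 72], "n3": [49], "n4": [61, 90]}


def preference_list(key_hash: int, tokens=TOKENS, size: int = 4):
    """Walk the ring clockwise from key_hash and collect distinct physical nodes."""
    ring = sorted((tok, node) for node, toks in tokens.items() for tok in toks)
    positions = [tok for tok, _ in ring]
    start = bisect_right(positions, key_hash)  # first token larger than the hash
    result = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]  # wrap around the ring
        if node not in result:                      # skip repeated physical nodes
            result.append(node)
        if len(result) == size:
            break
    return result


print(preference_list(50))
print(preference_list(20))
```

Under this rule the first three entries of each list would be the replica nodes for a replication factor of 3, and the fourth is the extra node kept in the preference list.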
Question 18 (25 pts)
All parts of this question are based on the sample Neo4j property graph shown below. The graph
models a Twitter-like system. It contains two types of nodes: User and Tweet. There are four types
of relationships between nodes in the graph. A user can post many tweets. A user can follow
other users. A tweet may reply to or retweet other tweets. In the sample graph, assume the number
inside each Tweet node represents the node id; the number on each relationship edge represents
the relationship id as well as the creation order of that relationship. The ids of the User nodes are as
follows:
u1:1; u2: 2; u3: 3.
1. [3 points] Write a query to find out the average number of retweets and replies a tweet has.
2. [6 points] Write a query to find out the number of tweets posted by a user with name ‘u2’ as
RETWEETS to tweets posted by all users followed by ‘u2’.
3. [2 points] Write down the value of byte 1~4 in the node record representing node u1.
4. [6 points] For the POSTS relationship record with id 3, write down the value of
byte 1~4
byte 5~8
byte 13~16
byte 17~20
byte 21~24
byte 25~28
5. [8 points] Describe the query plan of the following query. You do not need to use the exact
operator name when describing the plan. The ProduceResult operator can be omitted. The
description should contain enough information such as the pattern to be matched, filter to be
applied.
MATCH (u:User{name:"u3"})-[f:FOLLOWS]->(fu:User)-[:POSTS]->(t:Tweet)
WITH fu, count(t) as ts
ORDER BY ts DESC LIMIT 1
RETURN fu, ts
