# 程序代写案例-CITS5504

/CITS5504-Data-Warehousing, Sem1, 2020, Final Exam
Date and Time of Exam Creation: 05/20/2020 6:19PM AWST | Total Exam Points: 50 | Est. Completion Time: 0mins Avg. Point Biserial: 0.00 |
Upper 27%: 0.00 | Lower 27%: 0.00 | Disc. Index: 0.00 | Difficulty: 0.00 | Total Questions: 7 | Multiple Choice: 0 | Fill in the Blank: 1 | True/False:
0 | Essay: 6
1.
(3 marks) Suppose the balance of Jim and Lucy before March 20 is zero. The transactional table of 3 days (20th-22nd
March) is shown below. Assume that a daily snapshot is taken at the end of each day. What is the corresponding daily
snapshot table for this transactional table? (You may write the answer in the format of comma separated rows.)
2.
(5 marks) Suppose you are asked to design a data warehouse for the Fitness First gyms with two fact tables and three
dimension tables. The first fact table Visits is to keep track of the  check-in and check-out information of the customers to
the club, which can be used to analyse the duration of each visit. The second fact table Members is to store the customer
membership information for club income analysis.
Given three dimensions Time, Location and Gender, and a measure duration in the Visits fact table, a query across all
three dimensions could be "the average time of each visit for WA branches by female gym goers".
1. If we roll up along the Gender dimension, what would be a sensible query based on the original query?
2. Similarly, given a measure fee in the Members fact table, what would be a sensible query?
3. What schema would you use for this data warehouse?
3.
(6 marks) Given the table below, and assume that a customer can be uniquely identified by the date-of-birth and name;
the city a customer lives in is a slowly changing dimension, answer the following questions.
1. Given the history preserving strategy depicted in the table below, briefly explain how to compute the total expenditure
of each customer using SQL.
2. Choose an alternative strategy for handling slowly changing dimensions, and explain how to compute the total
expenditure of each customer.
3. Discuss the pros and cons of each strategy.

Question ID: 4873 | Point Value: | Categories:
Question ID: 4921 | Point Value: | Categories:
Question ID: 4689 | Point Value: | Categories:
/4.
(6 marks) Given a 3-D cuboid ABC evenly divided into 64 chunks, the cardinality of A is 1600, B is 400 and C is 40. The
total number of memory units to store the 2-D cuboid BC is 400x40, AB is 1600x400 and AC is 1600x40, respectively.
A goal in multi-way array aggregation is to minimise the memory consumption. Suppose
only one chunk of ABC can be stored in the main memory at any time, and
the ABC cuboid can only be scanned once.
1. How many memory units in total are needed to store the temporary aggregated results of the AB, BC and AC cuboids?
2. Briefly explain how to compute the total memory units needed to store the temporary aggregated results.
5.
(10 marks) A supermarket has 5 transactions shown in the table below. Suppose the minimum support of a frequent
pattern is 3.

List two of the closed frequent patterns:
List a max-pattern:
Hints:
A frequent pattern  is closed if none of its immediate super-patterns has the same support as .
A frequent pattern is a max-pattern if is frequent and there exists no frequent super-pattern ⊃.
Question ID: 5043 | Point Value: | Categories:
Question ID: 3100 | Point Value: | Categories:
1 2
3
/6.
(10 marks) Given the data set in the table below, we like to train a decision tree to predict who is likely to buy a computer.

The formula of gini index is shown below.
where D is the data set and  p  is the relative frequency of class j in D.  If a data set  D is split on the attribute A into two
subsets  D  and  D , the gini index  gini (D) is defined as:
1. Explain how to calculate the information gain of the attribute "Student".
2. What is the reduction in impurity if the root node uses the Student attribute?
7.
(10 marks) A university would like to cluster undergraduate students into groups based on the following attributes:
Age (numerical)
Gender (categorical)
Residency (categorical)
Degree (categorical)
ATAR or other University Entry Test Score (numerical)
WAM (Current Weighted Course Average) (numerical)
1. Given two clustering algorithms k-Means and DBScan, which would be more suitable for this clustering task? Explain
why.
2. Suggest distance measures to be used to calculate the similarity between two students.