Instructions
1. This notebook contains instructions for COMP9318-Lab2.
2. You are required to complete your implementation in the
file submission.py provided along with this notebook.
3. You are only allowed to use Python 3.6 for implementation.
4. You are not allowed to print unnecessary output. We will not consider any output
printed to the screen. All results should be returned in the appropriate data structures
by the corresponding functions.
5. You need to submit the code for Lab2 via the following
link: https://kg.cse.unsw.edu.au/submit/
6. For each question, we have provided you with detailed instructions along with
question headings. If you run into any problem, you can post your query on Piazza.
7. If you choose to skip a question, leave the corresponding function body as it is (i.e.,
keep the pass line), otherwise it may affect your mark for other questions.
8. You are allowed to add other functions and/or import additional modules (you may
have to in this lab), but you are not allowed to define global variables. Only functions
are allowed in submission.py.
9. You should not import unnecessary modules/libraries. A module that cannot be imported
at test time will lead to errors.
10. We will provide immediate feedback on your submission. You can access your scores
using the online submission portal on the same day.
11. For the final evaluation, we will use a different dataset, so your final scores may
vary.
12. You are allowed to submit as many times as you want before the deadline, but ONLY
the latest version will be kept and marked.
13. Submission deadline for this assignment is 20:59:59 on 23rd March, 2020 (Sydney
Time). We will not accept any late submissions.
Question 1: Optimized BUC algorithm (100 points)
You need to implement the full buc_rec_optimized algorithm with the single-tuple
optimization (as described below). Given an input dataframe:
A B M
1 2 100
2 1 20
Invoking buc_rec_optimized on this data will result in the following dataframe:
A B M
1 2 100
1 ALL 100
2 1 20
2 ALL 20
ALL 1 20
ALL 2 100
ALL ALL 120
We have pre-defined the function buc_rec_optimized in the file submission.py, and
its helper functions are defined in the file helper.py.
Note: You should use the functions defined in the file helper.py. You are not allowed to
change this file; we will provide it in the test environment.
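When testing, it can help to have an independent reference to compare against. The sketch below is not BUC: it simply aggregates over every subset of dimensions with groupby. It assumes the measure is aggregated by sum (which matches the totals in the example above) and that rolled-up dimensions are written as the string ALL; the function name brute_force_cube is hypothetical.

```python
import pandas as pd

def brute_force_cube(df):
    """Reference cube via plain groupby over every subset of dimensions (not BUC)."""
    dims = list(df.columns[:-1])      # all columns except the last are dimensions
    measure = df.columns[-1]          # the last column is the measure
    parts = []
    for mask in range(2 ** len(dims)):
        # Bit i of mask set -> dimension i is rolled up to 'ALL'.
        kept = [c for i, c in enumerate(dims) if not (mask >> i) & 1]
        if kept:
            part = df.groupby(kept)[measure].sum().reset_index()
        else:
            # Every dimension rolled up: a single grand-total row.
            part = pd.DataFrame({measure: [df[measure].sum()]})
        for c in dims:
            if c not in kept:
                part[c] = 'ALL'
        # Restore column order; store dimension values as strings (assumption).
        parts.append(part[dims + [measure]].astype({c: str for c in dims}))
    return pd.concat(parts, ignore_index=True)

example = pd.DataFrame({'A': [1, 2], 'B': [2, 1], 'M': [100, 20]})
reference = brute_force_cube(example)
```

Row order is not guaranteed to match your own output, so sort both dataframes (or compare them as sets of rows) before checking equality.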
Input and output
Both input and output are dataframes.
The input dataframe (i.e., the base cuboid) is directly generated from the input file. If the
base cuboid has d dimensions, each row has the form:
v_1 v_2 ... v_d m
where v_i is the cell's value on the i-th dimension, and m is the measure value.
The output dataframe contains n rows, one for each non-empty cell in the computed data cube
derived from the input base cuboid. Each row is formatted like the input:
v_1 v_2 ... v_d m
where v_i is the cell's value on the i-th dimension, and m is the measure value.
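As a small illustration of this layout, the sketch below parses a base cuboid from an in-memory string and separates the d dimension columns from the measure column. The file contents are made up for the example, and the tab delimiter is an assumption about the input file.

```python
import io
import pandas as pd

# Hypothetical file contents: tab-delimited, header row first, measure last.
raw = "A\tB\tM\n1\t2\t100\n2\t1\t20\n"

df = pd.read_csv(io.StringIO(raw), sep='\t')

dims = list(df.columns[:-1])   # names of the dimensions v_1 ... v_d
measure = df.columns[-1]       # name of the measure column m
d = len(dims)                  # dimensionality of the base cuboid
```

Treating the last column as the measure means the same code works for any number of dimensions.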
The single-tuple optimization
Consider the naive recursive implementation of the BUC algorithm. You will notice
that it makes several recursive calls to compute all the derived results even when the input
consists of only one tuple. This is certainly a waste of computation.
For example, if we are asked to compute the cube given the following input:
B C M
1 2 100
We can immediately output the following (where * denotes ALL), without using any recursive calls:
1 2 100
* 2 100
1 * 100
* * 100
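One possible way to produce these 2^d rows directly is to enumerate bitmasks, where each bit decides whether a dimension keeps its value or is rolled up. This is only a sketch under stated assumptions: the function name expand_single_tuple is hypothetical, rolled-up dimensions are written as the string ALL (matching the expected output format rather than the * shorthand), and dimension values are stored as strings.

```python
import pandas as pd

def expand_single_tuple(df):
    """Return all 2^d cube cells for a one-row dataframe, with no recursion."""
    assert len(df) == 1, "this shortcut only applies to single-tuple inputs"
    dims = list(df.columns[:-1])
    measure = df.columns[-1]
    d = len(dims)
    values = [str(v) for v in df.iloc[0, :d]]   # dimension values as strings
    m = df.iloc[0, d]                           # the single measure value
    rows = []
    for mask in range(2 ** d):
        # Bit (d - 1 - i) of mask set -> dimension i is rolled up to 'ALL'.
        cell = ['ALL' if (mask >> (d - 1 - i)) & 1 else values[i]
                for i in range(d)]
        rows.append(cell + [m])
    return pd.DataFrame(rows, columns=dims + [measure])

single = pd.DataFrame({'B': [1], 'C': [2], 'M': [100]})
cells = expand_single_tuple(single)
```

Every cell carries the same measure value because only one tuple contributes to each aggregate.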
Note: For Lab2, you are allowed to use only two libraries: pandas and numpy.
In [1]:
import pandas as pd
import numpy as np
In [2]:
##============================================================
# Data file format:
# * tab-delimited input file
# * 1st line: dimension names and the last dimension is assumed to be the measure
# * rest of the lines: data values.

def read_data(filename):
    df = pd.read_csv(filename, sep='\t')
    return df

# helper functions
def project_data(df, d):
    # Return only the d-th column of INPUT
    return df.iloc[:, d]

def select_data(df, d, val):
    # SELECT * FROM INPUT WHERE input.d = val
    col_name = df.columns[d]
    return df[df[col_name] == val]

def remove_first_dim(df):
    # Remove the first dim of the input
    return df.iloc[:, 1:]

def slice_data_dim0(df, v):
    # syntactic sugar to get R_{ALL} in a less verbose way
    df_temp = select_data(df, 0, v)
    return remove_first_dim(df_temp)
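To see what the helpers return, the sketch below applies them to the two-row example from Question 1. Three helpers are repeated in compact form so the snippet runs on its own; in your submission you would import them from helper.py instead. The comments on how the results feed into the recursion describe one common way to structure BUC and are an assumption, not a requirement.

```python
import pandas as pd

# Helpers repeated from the cell above so this snippet runs on its own.
def select_data(df, d, val):
    return df[df[df.columns[d]] == val]

def remove_first_dim(df):
    return df.iloc[:, 1:]

def slice_data_dim0(df, v):
    return remove_first_dim(select_data(df, 0, v))

df = pd.DataFrame({'A': [1, 2], 'B': [2, 1], 'M': [100, 20]})

# Rows with A == 1, with column A dropped: the sub-problem over (B, M).
sub = slice_data_dim0(df, 1)

# Dropping A without filtering gives the input for the A = ALL branch;
# rows that agree on the remaining dimensions still have to be aggregated.
rolled_up = remove_first_dim(df)
```

Both results keep the measure as the last column, so the same functions can be applied again at the next level.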
In [3]:
def buc_rec_optimized(df):  # do not change the heading of the function
    pass  # **replace** this line with your code
In [4]:
## You can test your implementation using the following code...
import helper
import submission as submission
input_data = read_data('./asset/a_.txt')
output = submission.buc_rec_optimized(input_data)
output
Out[4]:
A B M
0 1 2 100
1 1 ALL 100
2 2 1 20
3 2 ALL 20
4 ALL 1 20
5 ALL 2 100
6 ALL ALL 120