EE4211 Data Science for IoT, Project Description, Oct. 2019 Page 1 of 3
Instructions:
In this project, you are given a dataset collected by an actual IoT system (see description
below) and asked to use the dataset to build a forecasting model. You have to answer a set
of questions, as well as propose your own interesting questions.
1. Form teams in groups of 4 students and select a name for your team. Be creative! Please
email me your group members and team name by Monday 07 October 2019, 12 noon.
2. Interim Report due Tuesday 22 October 2019
(a) Start with Question 1 on exploring the data. Use a Jupyter notebook (ipynb file)
to do the analysis and generate a PDF file of your Python notebook. Prepare
an interim report answering all the parts of Question 1. Submit (i) PDF of your
interim report answering all of the parts of Question 1, (ii) PDF file/Print preview
of your Jupyter notebook, and (iii) the Jupyter notebook (ipynb file).
(b) Then propose additional analysis using the dataset given, justifying why this additional
analysis is useful and interesting. Write a brief 1-2 page proposal describing
your proposed work.
(c) Zip all four files into one zip file named as Group Name Interim.zip and upload to
the appropriate LumiNUS folder by Tuesday 22 October 2019, 23:59.
3. Final Report due Tuesday 12 November 2019
(a) Complete Questions 2 and 3. Use a Jupyter notebook (ipynb file) to do the analysis
and generate a PDF file of your Python notebook. Prepare a final report answering
all the parts of Questions 2 and 3. Submit (i) PDF of your final report answering
all of the parts of Questions 2 and 3, (ii) PDF file/Print preview of your Jupyter
notebook, and (iii) the Jupyter notebook (ipynb file).
(b) Zip all three files into one zip file named as Group Name Final.zip and upload to
the appropriate LumiNUS folder by Tuesday 12 November 2019, 23:59.
Data File:
The data file is available in the LumiNUS Files under the directory "Class Project".
Data Description:
In this project, we will consider natural gas consumption data from residential consumers.
The smart gas meter data used for this project was obtained from the Pecan Street project
(https://www.pecanstreet.org/). The data comes from homes in the Mueller neighborhood
of Austin, Texas, USA. The homes in this neighborhood are primarily newly constructed,
and include single-family homes, apartments, and town homes. Itron Centron SR smart gas
meters are deployed in these homes and these meters send their information to a gateway
inside the home. The gateway uses the home’s Internet connection to send the data to the
meter data management system (MDMS) or the processing center. The gas meters measure
the cumulative gas consumption at a frequency of 15 seconds. The meters report a reading
(in terms of the cumulative consumption) each time a further 2 cubic feet (or more) of
natural gas has passed through the meter. Data from a six-month interval (1 Oct 2015 to 31 Mar
2016) has been provided. The data has the following format:

The timestamp provides the date as well as the hour and minute values when each reading
was taken. Each meter has a unique identifier (MeterID). Recall that the meter readings
are cumulative and not generated at periodic intervals.
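Because the readings are cumulative and arrive at irregular times, they usually have to be put on a regular grid before any analysis. The following is a minimal sketch of one way to do that with pandas, not a prescribed method. The column names (timestamp, meter_id, cumulative) and the sample values are hypothetical, since the actual file layout is not reproduced here, and forward-filling empty hours is only one possible choice.

```python
import pandas as pd

# Hypothetical column names and made-up values; the real file layout may differ.
raw = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2015-10-01 00:05",
                "2015-10-01 00:40",
                "2015-10-01 03:10",
                "2015-10-01 00:20",
                "2015-10-01 01:30",
            ]
        ),
        "meter_id": [35, 35, 35, 44, 44],
        "cumulative": [1000, 1002, 1006, 500, 504],
    }
)


def to_hourly(df: pd.DataFrame) -> pd.DataFrame:
    """Return one cumulative reading per meter per hour (rows: hours, columns: meters)."""
    wide = df.pivot_table(
        index="timestamp", columns="meter_id", values="cumulative", aggfunc="last"
    )
    # Keep the last reading reported within each hour; empty hours become NaN.
    hourly = wide.resample("h").last()
    # Assumption: a cumulative counter is unchanged while nothing is reported,
    # so the previous value is carried forward into empty hours.
    return hourly.ffill()


hourly = to_hourly(raw)
# Hour-to-hour differences give the consumption within each hour.
hourly_consumption = hourly.diff()
print(hourly)
```

Carrying the last value forward treats a silent hour as zero consumption. Interpolating between readings spreads the consumption over the gap instead, and which of the two is more appropriate is a judgment the analysis has to justify.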
Questions: (30 pts)
1. Exploring the Data (10 pts)
1.1 How many houses are included in the measurement study? Are there any malfunctioning
meters? If so, identify them and the time periods when they were
malfunctioning. The information below regarding data collection may be useful.
1.2 Generate hourly readings from the raw data. Select one month from the 6-month
study interval and plot the hourly readings (time-series) for that month. Hint:
You will have to decide what to do if there are no readings for a certain hour.
1.3 Intuitively, we expect gas consumption from different homes to be correlated.
For example, many homes would experience higher consumption levels in
the evening when meals are cooked. For each home, find the top five homes with
which it shows the highest correlation.
2. Forecasting (10 pts)
2.1 In this part, you will be asked to build a model to forecast the hourly readings in
the future (next hour). Can you explain why you may want to forecast the gas
consumption in the future? Who would find this information valuable? What can
you do if you have a good forecasting model?
2.2 Build a linear regression model to forecast the hourly readings in the future (next
hour). Generate two plots: (i) Time series plot of the actual and predicted hourly
meter readings and (ii) Scatter plot of actual vs predicted meter readings (along
with the line showing how good the fit is).
2.3 Do the same as Question 2.2 above but use support vector regression (SVR).
3. Student Proposal (10 pts)
3.1 At this point, you understand the data quite well. Propose and carry out additional
analysis using the dataset given. Please be sure to justify why this additional
analysis is useful and interesting.
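As a rough illustration of the next-hour forecasting setup in Question 2, the sketch below builds lagged features from an hourly series and fits both a linear regression and an SVR with scikit-learn. It is one possible arrangement under stated assumptions, not a reference solution: the series is synthetic, and the number of lags (24), the 80/20 chronological split, and the SVR hyperparameters are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_lagged(series, n_lags):
    """Build (X, y) where each row of X holds the n_lags values that precede y."""
    X = np.column_stack(
        [series[i : len(series) - n_lags + i] for i in range(n_lags)]
    )
    y = series[n_lags:]
    return X, y


# Synthetic stand-in for one meter's hourly series: 30 days, daily cycle plus noise.
rng = np.random.default_rng(0)
hours = np.arange(24 * 30)
series = 5 + 3 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.3, hours.size)

X, y = make_lagged(series, n_lags=24)

# Chronological split: the test set lies strictly after the training set.
split = int(0.8 * len(y))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

models = {
    "linear": LinearRegression(),
    # SVR is sensitive to feature scale, so the inputs are standardized first.
    "svr": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1)),
}

predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    mae = np.mean(np.abs(predictions[name] - y_test))
    print(f"{name}: MAE = {mae:.3f}")
```

The arrays y_test and predictions[name] are what the two requested plots would be drawn from: both against time for the time-series plot, and against each other for the scatter plot.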
Additional Information about Data Collection:
1. Gas flow meters have a sensor that measures the volume of gas that passes
through a pipe. Different meters use different sensors (e.g. ultrasonic sensors or a synthetic
diaphragm with a rotating valve). The meters check the sensors periodically to
get a reading of the current consumption value. This is what is meant by the sentence
above: "The gas meters measure the cumulative gas consumption at a frequency of 15
seconds."
2. Just because the meter has obtained a reading from the sensors, it does not have
to send that reading to the meter data management system (MDMS). Imagine
1.3 million households in Singapore sending gas readings every 15 seconds to Singapore
Power. The processing and bandwidth requirements may be too high for Singapore
Power, so it may want the meters to report at a lower frequency or only
when the consumption exceeds a certain threshold. However, the smart meter manufacturer
does not know its users' reporting criteria, so it builds meters that
can read every 15 seconds, a frequency it considers high enough
for all potential customers. The "reporting" frequency to the MDMS (as opposed to the
"measuring" frequency) can be set by the user of the meter, such as Singapore
Power.
3. So when are the meters supposed to "report" to the MDMS? The documentation that
came with the data says "once the marginal consumption exceed 2 cubic meters". As
you may observe in the data, this is not necessarily the case in some of the readings.
So is that an anomaly? That is for you to decide and justify. If you were Singapore
Power, under what circumstances would you think that a meter reading is suspicious
and decide to investigate? Remember that there are two sides to the story. If you do
not receive a reading from a meter for a really long time, would you think that the
meter is defective? So would that justify sending a reading even if the consumption has
not increased?
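The considerations above can be turned into simple per-reading checks. The sketch below is one hedged way to do so with pandas and does not define what counts as a malfunction. The column names and sample values are hypothetical, the threshold of 2 is taken from the description in whatever volume unit the data uses, and the two-day silence limit is an arbitrary placeholder.

```python
import pandas as pd

# Hypothetical column names and made-up values for a single meter.
readings = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2015-10-01 00:00",
                "2015-10-01 01:00",
                "2015-10-01 02:00",
                "2015-10-04 02:00",
                "2015-10-04 03:00",
            ]
        ),
        "meter_id": [35] * 5,
        "cumulative": [1000, 1002, 1002, 1004, 998],
    }
)

REPORT_THRESHOLD = 2  # reporting threshold quoted in the data description
MAX_SILENCE = pd.Timedelta(days=2)  # assumed cut-off for "a really long time"


def flag_readings(df: pd.DataFrame) -> pd.DataFrame:
    """Add per-reading flags for three kinds of possibly suspicious behaviour."""
    out = df.sort_values(["meter_id", "timestamp"]).copy()
    grouped = out.groupby("meter_id")
    # Change since the same meter's previous reading (NaN / NaT for its first one).
    out["increment"] = out["cumulative"] - grouped["cumulative"].shift()
    out["gap"] = out["timestamp"] - grouped["timestamp"].shift()
    # A cumulative counter should never go down.
    out["decreasing"] = out["increment"] < 0
    # A report was sent although less than the threshold had been consumed.
    out["below_threshold"] = (out["increment"] >= 0) & (
        out["increment"] < REPORT_THRESHOLD
    )
    # The meter stayed silent for longer than the assumed limit.
    out["long_silence"] = out["gap"] > MAX_SILENCE
    return out


flags = flag_readings(readings)
print(flags[["timestamp", "increment", "gap", "decreasing", "below_threshold", "long_silence"]])
```

A flag marks a reading as worth a closer look and nothing more. For instance, a below-threshold report may simply be a periodic heartbeat from a healthy meter, which is exactly the ambiguity raised above.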