Module title: ACS341
Assignment name: Coursework 2
Person responsible and contact details: Dr John Oyekan
Assignment weighting: 60%
Assignment released: 21st of April 2021
Hand-in date: 14th of May 2021


Assignment due date: Submit your assignment by 23:59 on the 14th of May 2021; this
coursework makes up 60% of your total module mark. Submit your report on
Blackboard as a PDF file. Also include your Orange file (.ows) and your Matlab code as
part of your submission.
Extenuating Circumstances: If you have any extenuating circumstances (medical or special
circumstances) that might have affected your performance on the assignment, please complete
an extenuating circumstances form.
Unfair means: All work must be completed as individuals. References should be used to
support your domain analysis research and cited in the appropriate manner. Suspected unfair
means will be investigated and will lead to penalties. For more information on the University's
guidance on unfair means, please check: http://www.shef.ac.uk/ssid/exams/plagiarism.
The challenge: You have been approached by an energy company. They want to be able to
make better predictions of energy usage and demand in Spain. They have provided you with a
dataset that contains 4 years of electrical consumption, electricity generation, pricing, and
weather data for various cities in Spain. The dataset is made up of two parts in one file.
• The first part of the file contains the weather data from the 5 largest Spanish cities
(Barcelona, Valencia, Bilbao, Madrid and Seville). The features of the weather data are
largely self-explanatory and include maximum temperature, minimum temperature,
pressure, humidity, snowfall and rainfall, among other features.

• The second part of the file contains the amount of power generated by the Electricity
Transmission Service Operator using various energy sources (nuclear, renewable, coal,
biomass, solar, etc.) in order to meet the demand from the Spanish cities. It also contains
price predictions (price day ahead), predictions of the energy required (total load forecast), as
well as the actual price and the actual energy used (total load actual).
Your challenge is to investigate the following:
1. Can you build machine learning models that offer better predictions than the current
estimates (price day ahead and total load forecast)? After building your model, how
different is your prediction from the current predictions (e.g. price day ahead and total load
forecast) as well as from the actual values?
2. Which city draws the most energy, and which energy source is used most when demand
rises?
3. Which energy source is used most at different times and under different conditions (e.g. when
very cold, very hot, evening, morning or afternoon)?
4. Which weather features contribute most to energy usage?
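
Question 1 amounts to scoring two sets of predictions against the same actual values. The following is a minimal sketch in Python (used here for illustration only), assuming the three series have already been aligned hour by hour. The numbers are made up, and the variable model_prediction is hypothetical; it stands for whatever your own model produces.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Toy values only (not taken from the dataset).
total_load_actual = [25000.0, 24000.0, 23000.0, 26000.0]
total_load_forecast = [25500.0, 23500.0, 23500.0, 25500.0]  # existing estimate
model_prediction = [25100.0, 24200.0, 22900.0, 25800.0]     # hypothetical model output

# Error of the existing forecast, error of the model, and the gap between the two.
baseline_error = rmse(total_load_forecast, total_load_actual)
model_error = rmse(model_prediction, total_load_actual)
model_vs_forecast = rmse(model_prediction, total_load_forecast)

print(f"forecast vs actual: {baseline_error:.1f}")
print(f"model vs actual:    {model_error:.1f}")
print(f"model vs forecast:  {model_vs_forecast:.1f}")
```

A model only improves on the current estimate if its error against the actual values is lower than the error of the existing forecast on the same held-out period.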

In order to solve this challenge:
Clean and pre-process the data (25%): There are some missing data samples in the
dataset. You will need to come up with a strategy to address this. Decide on an approach and
explain how and why you followed your approach.
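
One possible strategy, sketched in Python for illustration only, is to fill short gaps in a time series with the last observed value and to drop rows that are still incomplete afterwards. This is an assumption about what might suit hourly data, not the required approach; the helper names, the max_gap threshold and the toy values are all hypothetical.

```python
def forward_fill(values, max_gap=2):
    """Fill runs of None with the last seen value, for runs of at most max_gap.

    Longer runs and leading gaps are left as None so that they can be
    dropped or handled separately.
    """
    filled = list(values)
    last = None
    i = 0
    n = len(filled)
    while i < n:
        if filled[i] is not None:
            last = filled[i]
            i += 1
            continue
        # Find the end of this run of missing values.
        j = i
        while j < n and filled[j] is None:
            j += 1
        if last is not None and (j - i) <= max_gap:
            for k in range(i, j):
                filled[k] = last
        i = j
    return filled

def drop_incomplete(rows):
    """Keep only rows (dicts) in which no value is None."""
    return [row for row in rows if all(v is not None for v in row.values())]

# Toy hourly series (made-up numbers).
humidity = [60, None, None, 63, None, None, None, 70]
filled_humidity = forward_fill(humidity, max_gap=2)
print(filled_humidity)  # the run of three missing values stays unfilled

rows = [
    {"temp": 280.1, "total_load_actual": 25000.0},
    {"temp": None, "total_load_actual": 24000.0},
    {"temp": 281.0, "total_load_actual": None},
]
complete_rows = drop_incomplete(rows)
print(len(complete_rows))
```

Whichever strategy you choose, report how many samples it changes or removes, since that is part of justifying the approach.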
Follow the steps for creating an effective machine learning pipeline (25%): Understand the
domain by doing a domain analysis (Hint: you can use some knowledge from coursework 1,
but make sure you reference your sources in your report), perform feature engineering, etc.
Justify what you did in each of these steps.
Decide what machine learning methodologies to use and justify your choice (25%): In no
specific order, make use of PCA to reduce the dimension of your dataset, and linear correlation to
see which variables correlate most with energy demand and which weather feature is most
important to this challenge. Apply Decision Trees (or Random Forest), Regression and Neural
Networks to develop machine learning prediction models. (Hint: you can use Decision Trees to
investigate which energy source is used when. You can use both Regression and Neural
Networks to develop a model to predict energy demand and price.)
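
The linear correlation step can be sketched as follows, in Python for illustration only. The sketch computes the Pearson coefficient of each feature against demand and ranks the features by absolute value. The feature names and numbers are toy values, not taken from the dataset, and treating a constant column as having zero correlation is a choice made for this sketch.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, reported as 0 here
    return cov / math.sqrt(var_x * var_y)

# Toy data (hypothetical feature names, made-up values).
features = {
    "temp": [1.0, 2.0, 3.0, 4.0, 5.0],
    "humidity": [5.0, 3.0, 4.0, 1.0, 2.0],
    "snow": [0.0, 0.0, 0.0, 0.0, 0.0],
}
demand = [2.0, 4.0, 6.0, 8.0, 10.0]

# Rank features by the strength (absolute value) of their correlation with demand.
ranking = sorted(
    ((name, pearson(column, demand)) for name, column in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranking:
    print(f"{name}: {r:+.2f}")
```

Pearson correlation only captures linear relationships, so a feature with a U-shaped effect on demand (for example temperature, if both very hot and very cold days raise demand) can score low here and still matter.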
Use and explain the appropriate metrics to evaluate your model (25%): Discuss how you
used cross-validation in your pipeline (show your training and testing curves) to build a model
that can generalise. Discuss the effects of various training and testing data split ratios on your
prediction results and how the ratios could help you to prevent overfitting and underfitting.
What steps did you take to reduce overfitting or prevent underfitting using the degree of
complexity of your polynomial model, the number of hidden layers in the Neural Network or the
depth of the Decision Trees (Random Forest)? (Hint: You might decide to
compare the Neural Network model with the Regression model using RMSE as a metric to
decide which of them to take forward in your pipeline.)
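
The cross-validation loop can be sketched as follows, in Python for illustration only. It uses a one-variable least-squares line as a stand-in model and reports the training and testing RMSE for each fold. The function names are hypothetical, the data are synthetic, and the use of contiguous, unshuffled folds is an assumption made here because the dataset is a time series.

```python
import math

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds (no shuffling)."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    b = sxy / sxx if sxx else 0.0
    return mean_y - b * mean_x, b

def rmse(predicted, actual):
    """Root mean squared error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def cross_validate(x, y, k):
    """Return a list of (train_rmse, test_rmse) pairs, one per fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(x), k):
        x_train = [x[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        x_test = [x[i] for i in test_idx]
        y_test = [y[i] for i in test_idx]
        a, b = fit_line(x_train, y_train)
        train_rmse = rmse([a + b * v for v in x_train], y_train)
        test_rmse = rmse([a + b * v for v in x_test], y_test)
        scores.append((train_rmse, test_rmse))
    return scores

# Synthetic data: y = 3 + 2x exactly, so both errors should be close to zero.
x = [float(i) for i in range(10)]
y = [3.0 + 2.0 * v for v in x]
scores = cross_validate(x, y, k=5)
for fold, (train_rmse, test_rmse) in enumerate(scores, start=1):
    print(f"fold {fold}: train RMSE {train_rmse:.3g}, test RMSE {test_rmse:.3g}")
```

On real data, a training error well below the testing error across folds suggests overfitting, while high errors on both suggest underfitting. Repeating the loop for each model complexity gives the training and testing curves the brief asks for.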
Write up your results in a report of no more than 15 pages. Make sure your report has a table
of contents, sections, discussion and conclusions. The table of contents, cover page and Appendix
will not count towards the 15 pages.
In addition to the above, answer the following questions:
Using PCA, which data features capture the most variability in the dataset? (Hint: Perform
PCA first and extract the Principal Components (PCs) that capture the highest variability in the
dataset. Then see which features contribute to those PCs.) Highlight the PCs that capture the most
variability, together with the features that contribute most to them.
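
The hint above can be sketched as follows, in Python for illustration only. The sketch standardises the features, builds their covariance matrix, and finds the first principal component by power iteration, then reads off the share of variance it explains and the loading of each feature. Power iteration is a technique chosen for this self-contained sketch and returns only the first component; in practice a library PCA routine would return all of them. The feature names and values are made up.

```python
import math

def standardise(columns):
    """Scale each column to zero mean and unit variance (constant columns become zeros)."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        out.append([(v - mean) / std if std else 0.0 for v in col])
    return out

def covariance_matrix(centred_columns):
    """Covariance matrix of columns that are already centred."""
    n = len(centred_columns[0])
    p = len(centred_columns)
    return [
        [sum(a * b for a, b in zip(centred_columns[i], centred_columns[j])) / n
         for j in range(p)]
        for i in range(p)
    ]

def first_principal_component(cov, iterations=500):
    """Power iteration: returns (largest eigenvalue, unit eigenvector)."""
    p = len(cov)
    # Non-uniform start, to reduce the chance of starting orthogonal to the answer.
    vec = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0:
            break
        vec = [v / norm for v in nxt]
    eigenvalue = sum(
        vec[i] * sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)
    )
    return eigenvalue, vec

# Toy data: two strongly related temperature features and one unrelated feature.
names = ["temp_max", "temp_min", "pressure"]
data = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.1, 1.9, 3.2, 3.8, 5.1, 5.9],
    [2.0, 1.0, 3.0, 3.0, 1.0, 2.0],
]

cov = covariance_matrix(standardise(data))
eigenvalue, vector = first_principal_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained_ratio = eigenvalue / total_variance
loadings = dict(zip(names, vector))

print(f"PC1 explains {explained_ratio:.1%} of the variance")
for name, weight in sorted(loadings.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name}: {weight:+.3f}")
```

Features with the largest absolute loadings on a component are the ones that contribute most to it. In this toy case the two temperature features dominate the first component and the unrelated feature contributes almost nothing.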
Using the features you identified in the PCA step, can you generate a Neural Network and
Decision Tree (Random Forest) model that predicts energy demand? Can you explain in plain
English how the features contribute to the energy demand? How does this result make sense?
You must create Matlab code and an Orange pipeline design for your solution(s).
Support your report with an Orange pipeline design and Matlab code. Make sure you provide
comments in your Matlab code as well as instructions on how to run it.
Hand in your report (.pdf) and software (Orange and Matlab) via MOLE by 12 midnight on the 14th
of May 2021. This coursework makes up 60% of your total module mark.
