ST3MVA
Assignment 1 – Hand-in date: 12 noon, 6th Nov 2019
• Late work will be subject to University policy.
• You can complete this assignment either individually or in pairs. If working in a
pair, you must email me your names before 12 noon on 25th October.
• The datasets are available to download from Blackboard (from the
Assessments page).
• Note the page limits: any work outside of these limits will not be read! R
output, including graphs, is not included in the page limit, so please submit
sensibly sized output.
Question 1:
Mid-infrared spectroscopy (MIR) is a method in which infrared light is beamed
at a sample of matter and the absorption of the light is then measured: different
absorptions are seen for different wavelengths of light. It is expected that
different samples will respond slightly differently depending on their makeup. For
example, if samples of fruit were being tested, measurements from one
wavelength might vary with the degree of ripeness, while measurements from
another wavelength might vary with the concentration of juice.
Therefore, such data can be useful in modelling variables for which it is otherwise
hard to obtain measurements directly.
You have the MIR absorption values for 30 different wavelengths (variables wl1 to
wl30) for 29 samples of manure. The aim is to see if there are clear patterns in
the data. Furthermore, it is of interest to see which of the wavelengths are best
for uncovering any patterns in the data.
a)
In R (or RStudio), apply principal components analysis to this dataset. Include all
relevant R output in your submission.
In your answer you should include ONLY R output (graphs, and results) that you
think are RELEVANT even if they are not specifically referred to in the questions
below (for example if PCs 8 and 9 are not important then don’t include
loadings/scores plots of PC8 vs PC9).
Parts b) to d) cover the interpretation; as such, you should not include any
discussion or comments amongst the output submitted for part a).
[30 marks]
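As a rough illustration of the kind of R output part a) refers to, the sketch below runs a principal components analysis with base R's prcomp(). It is only a sketch under stated assumptions: the data are simulated stand-ins (29 rows, columns wl1 to wl30), not the Blackboard dataset, the object names mir and pca are hypothetical, and standardising the variables with scale. = TRUE is one possible choice rather than a requirement of the question.

```r
# Simulated stand-in for the MIR data: 29 samples x 30 wavelengths.
# These values are made up; the real dataset comes from Blackboard.
set.seed(1)
mir <- as.data.frame(matrix(rnorm(29 * 30), nrow = 29, ncol = 30))
names(mir) <- paste0("wl", 1:30)

# PCA on centred, standardised variables (standardising is an assumption).
pca <- prcomp(mir, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

summary(pca)                                   # sd and cumulative proportion
screeplot(pca, type = "lines",
          main = "Scree plot (simulated data)")
biplot(pca, choices = 1:2)                     # scores and loadings, PC1 vs PC2
head(pca$rotation[, 1:2])                      # loadings for PC1 and PC2
head(pca$x[, 1:2])                             # scores for PC1 and PC2
```

With more variables (30) than samples (29), prcomp() returns at most 29 components, and after centring the last of these carries essentially no variance.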

Department of
Mathematics and Statistics


The page limit for the following written parts of this question is ¾ of a page of A4.
b)
How many principal components do you think should be interpreted and why?
[5 marks]
c)
Using the principal component output interpret your chosen number of principal
components.
[15 marks]
d)
Interpret your scores plot/s: do there appear to be any groups of samples? If
so, what properties do these groups have?
[15 marks]
Question 2:
A dog toy manufacturer has conducted a small-scale piece of market research to
investigate customer reactions to their latest product.
The manufacturer asked 15 dog owners to use the new toy to play with their dog
for a week. Each dog owner then completed a questionnaire rating 8 different
aspects of the toy, from ‘durability’ to ‘ease of play’ to ‘cost’; these are variables
A1 – A8 in the dataset. Each aspect has been given an integer rating from 0 to 9,
where 0 is the worst rating and 9 the best.
Before you read the dataset into R you need to add yourself/yourselves as extra
individual/s! The rating you provide for each aspect is based on your unique
8-digit student number/s. For example, if my student number was 28374615, I
would add myself to the dataset as follows: ID of 28374615 and ratings of A1 = 2,
A2 = 8, A3 = 3, A4 = 7, A5 = 4, A6 = 6, A7 = 1, and A8 = 5.
If you are working in a pair you should add both of you as two new individuals in
the dataset (resulting in 17 individuals in total). If you have any questions
regarding this please speak to/email me ([email protected]).
You should upload your individualised dog toy dataset, or provide a screen shot!
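The digit-to-rating rule described above can be sketched in R as follows. This is a minimal illustration, assuming a data frame with columns ID and A1 to A8: the 15 existing rows are simulated stand-ins for the supplied data, and the names toys and student_row are hypothetical. The only value taken from the text is the example student number 28374615.

```r
# Made-up stand-in for the 15 supplied respondents (ratings 0-9).
set.seed(2)
toys <- as.data.frame(matrix(sample(0:9, 15 * 8, replace = TRUE), nrow = 15))
names(toys) <- paste0("A", 1:8)
toys <- data.frame(ID = as.character(1:15), toys, stringsAsFactors = FALSE)

# Split an 8-digit student number into one rating per aspect (A1..A8).
# Pass the number as a string so that any leading zero is preserved.
student_row <- function(student_number) {
  id <- as.character(student_number)
  stopifnot(nchar(id) == 8, grepl("^[0-9]+$", id))
  digits <- as.integer(strsplit(id, "")[[1]])
  ratings <- as.data.frame(
    matrix(digits, nrow = 1, dimnames = list(NULL, paste0("A", 1:8)))
  )
  data.frame(ID = id, ratings, stringsAsFactors = FALSE)
}

me <- student_row("28374615")   # example number from the text
toys <- rbind(toys, me)         # now 16 individuals
print(me)
```

A pair would call student_row() once per student number and bind both rows, giving 17 individuals.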
a)
Produce a set of star or segment glyphs for the dog toy data, including
yourself/yourselves as the additional individual/s, using R (or RStudio).
Your plot should have a sensible title which includes your student number/s.
[10 marks]
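One way such glyphs might be drawn is with the stars() function from base R graphics, sketched here on simulated ratings. The matrix ratings is a made-up stand-in for the individualised dataset, and the student number in the titles is the example number from the question. Dividing by 9 and setting scale = FALSE is a design choice assumed here so that every aspect is drawn on the same fixed 0 to 9 scale; with the default scale = TRUE each column is instead rescaled to its own observed range.

```r
# Made-up stand-in for the 16 individuals x 8 aspects (ratings 0-9).
set.seed(3)
ratings <- matrix(sample(0:9, 16 * 8, replace = TRUE), nrow = 16,
                  dimnames = list(as.character(1:16), paste0("A", 1:8)))

# Star glyphs: one star per individual, one ray per aspect.
# stars() needs values in [0, 1] when scale = FALSE, hence ratings / 9.
loc <- stars(ratings / 9, scale = FALSE, labels = rownames(ratings),
             main = "Star glyphs of dog toy ratings (student 28374615)")

# Segment glyphs: the same data drawn as coloured segments.
stars(ratings / 9, scale = FALSE, draw.segments = TRUE,
      labels = rownames(ratings),
      main = "Segment glyphs of dog toy ratings (student 28374615)")
```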
b)
In R (or RStudio), produce two dendrograms of your dog toy data, including
yourself/yourselves as the additional individual/s.
The distance measure you should use for both plots is Manhattan.
The two clustering algorithms you should consider are furthest
neighbour/complete linkage and nearest neighbour/single linkage.
Your plots should have sensible titles including the clustering algorithm and your
student number/s.
(It is OK for your additional individuals to be displayed as 16 and 17 in your plots).
[10 marks]
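The combination of Manhattan distance with complete and single linkage can be sketched using dist() and hclust() from base R. The ratings below are simulated stand-ins for the individualised dataset, the object names are hypothetical, and the student number in the titles is the example number from the question.

```r
# Made-up stand-in for the 16 individuals x 8 aspects (ratings 0-9).
set.seed(4)
ratings <- matrix(sample(0:9, 16 * 8, replace = TRUE), nrow = 16,
                  dimnames = list(as.character(1:16), paste0("A", 1:8)))

# Manhattan (city-block) distance: sum of absolute rating differences.
d <- dist(ratings, method = "manhattan")

# Furthest neighbour (complete linkage) and nearest neighbour (single linkage).
hc_complete <- hclust(d, method = "complete")
hc_single   <- hclust(d, method = "single")

plot(hc_complete, xlab = "Individual", sub = "",
     main = "Complete linkage, Manhattan distance (student 28374615)")
plot(hc_single, xlab = "Individual", sub = "",
     main = "Single linkage, Manhattan distance (student 28374615)")
```

Both trees are built from the same distance object, so any difference between the two plots comes from the linkage rule alone.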

You are required to write no more than a couple of sentences each for c), d) and
e); as such, the page limit for the following written parts of this question is ½ page
of A4, with minimum font size 12 and standard margins.
c)
Specify one clear example of agreement between your two dendrograms and
one clear example of how they differ.
[5 marks]
d)
State your preferred dendrogram (there is no right or wrong answer for this), and
suggest a height at which you would cut it. You should justify why you would cut
at your chosen height.
[5 marks]
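Cutting a dendrogram at a chosen height corresponds to cutree() in base R, sketched below on simulated ratings. Everything here is an assumption for illustration: the data are made up, complete linkage is used only as an example of a preferred tree, and the cut height (midway between the last two merges) is an arbitrary example rather than a recommended value.

```r
# Made-up stand-in for the 16 individuals x 8 aspects (ratings 0-9).
set.seed(5)
ratings <- matrix(sample(0:9, 16 * 8, replace = TRUE), nrow = 16,
                  dimnames = list(as.character(1:16), paste0("A", 1:8)))
hc <- hclust(dist(ratings, method = "manhattan"), method = "complete")

# Example cut height: midway between the last two merge heights.
cut_height <- mean(tail(hc$height, 2))
groups_h <- cutree(hc, h = cut_height)   # membership from cutting at a height

# Alternatively, ask for a fixed number of groups instead of a height.
groups_k <- cutree(hc, k = 3)

plot(hc, xlab = "Individual", sub = "",
     main = "Complete linkage with example cut")
abline(h = cut_height, lty = 2)          # dashed line marks the cut
table(groups_h)                          # group sizes at the chosen height
```

A large vertical gap between successive merge heights is one common argument for placing the cut inside that gap.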
e)
Consider both your glyphs and your chosen preferred dendrogram together.
Briefly discuss similarities in the conclusions that can be made from the two
analyses; you should highlight an example in your discussion. What additional
information is provided in your glyphs?
[5 marks]
Further guidance:
Be succinct – these are not trick questions. Focus on exactly what the question is
asking and answer it directly.
What to upload:
You can upload a single Word or PDF file containing all of your R output and
answers; if you prefer, you can upload one file per question (but not one for each
part of a question).
You should upload your individualised dog toy dataset, or provide a screen shot!
(I will be reproducing your plots to mark them so I need to be able to check that
the data you are entering is correct).
You do not need to submit any R code.