This R worksheet consists solely of assessed questions. The material for these questions recaps all of the R Worksheets on the course so far. The deadline for submitting your solutions to these questions is Thursday 14 December at 1700 – note that this is an earlier deadline than usual.
Answers to the following five assessed questions should be submitted via this Microsoft form.
This work will be marked automatically by computer, so make sure your answers are accurate – the computer does not “know what you meant”; only what you actually enter into the form.
So that (most) students get different numbers to work with, you will work from files whose names are based on your student ID number. Your student ID number is (usually) a 9-digit number starting 201. Your datasets will be CSV files at addresses like https://mpaldridge.github.io/math1710/data/R3-xy.csv where xy is replaced by the last two digits of your student ID number. So if your student ID is 201623429, then you would use the file at https://mpaldridge.github.io/math1710/data/R3-29.csv; if your student ID number is 201491200 then you should use the file https://mpaldridge.github.io/math1710/data/R3-00.csv and so on. Take care to check you get this correct: if you get your student ID number wrong and/or use the wrong data file, the computer is likely to award you 0 marks.
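As a rough sketch of the naming rule, the address can be assembled in R from the ID itself, which avoids typing the two digits by hand. The variable names `student_id`, `worksheet`, `xy` and `data_url` are illustrative choices, not part of the worksheet; the ID is the example one quoted in the text.

```r
# Example student ID from the text, stored as a string so that
# trailing digits such as "00" are kept exactly as written.
student_id <- "201623429"

# Extract the last two characters of the ID.
n <- nchar(student_id)
xy <- substr(student_id, n - 1, n)

# 'worksheet' is a hypothetical helper holding the file prefix (R3, R8 or R10).
worksheet <- "R3"
data_url <- paste0("https://mpaldridge.github.io/math1710/data/",
                   worksheet, "-", xy, ".csv")
print(data_url)
```

The resulting string could then be passed to `read.csv(data_url)` to load the file into a data frame.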
For the first two questions, you should read in the data set at https://mpaldridge.github.io/math1710/data/R3-xy.csv where xy is the last two digits of your Student ID. Note that the file name is R3-xy.csv, with a number 3 after the R.
Assessed Question 1. What is the correlation between the stock prices of Amaflix and Netazon? Give your answer to 4 significant figures.
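The general technique here is `cor()` followed by `signif()` for significant figures. The sketch below uses a small made-up data frame; the column names `amaflix` and `netazon` and all the values are assumptions for illustration, so check the real names with `names()` after reading in your own file.

```r
# Toy data standing in for the real file; names and values are made up.
stocks <- data.frame(
  amaflix = c(10, 12, 11, 15, 14),
  netazon = c(20, 23, 21, 30, 27)
)

# Sample (Pearson) correlation between the two columns.
r <- cor(stocks$amaflix, stocks$netazon)

# Round to 4 significant figures (not 4 decimal places).
r_4sf <- signif(r, 4)
print(r_4sf)
```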
Assessed Question 2. The 10th entry of the Amaflix stock price was wrongly recorded – the actual value should be double the recorded value. Correct this entry, and then calculate the mean of the first 20 days of the Amaflix data. Give this mean to 2 decimal places.
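One possible way to carry out this kind of correction is to overwrite a single entry by index, then take the mean of a slice. The vector `price` below is made-up data, not the real Amaflix column.

```r
# Made-up price vector of 25 days (not the real data).
price <- c(5, 7, 6, 8, 9, 7, 6, 8, 7, 4,
           6, 7, 8, 9, 7, 6, 5, 7, 8, 9,
           10, 11, 9, 8, 7)

# Correct the 10th entry by doubling the recorded value.
price[10] <- 2 * price[10]

# Mean of the first 20 entries, rounded to 2 decimal places.
mean_first20 <- round(mean(price[1:20]), 2)
print(mean_first20)
```

Doing the correction before taking the mean matters, since the 10th entry lies inside the first 20 days.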
For the next question, you should read in the data set at https://mpaldridge.github.io/math1710/data/R10-xy.csv where xy is the last two digits of your Student ID. Note that the file name is R10-xy.csv, with a number 10.
Assessed Question 3. Draw a scatterplot from this R10 data, where the x-axis is the column x from this data and the y-axis is the other column (whose name you should find out using R). You should see a clear outlier point, to the top-left, top-right, bottom-left, or bottom-right. Which?
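A minimal sketch of finding an unknown column name and plotting against it, using made-up data. The second column name `w` and all the values are hypothetical; in your own file the name will need to be looked up in the same way.

```r
# Made-up two-column data; 'w' is a hypothetical name for the other column.
dat <- data.frame(x = c(1, 2, 3, 4, 5, 1),
                  w = c(2, 3, 4, 5, 6, 20))

# List the column names, then pick out the one that is not "x".
col_names <- names(dat)
other <- setdiff(col_names, "x")
print(other)

# Scatterplot with x on the horizontal axis and the other column on the vertical.
plot(dat$x, dat[[other]], xlab = "x", ylab = other)
```

In this toy data the point with small x and large w sits away from the rest, in the top-left of the plot.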
For the next question, you do not need to read in any data.
Assessed Question 4. Let \(X \sim \mathrm{Po}(100)\). What is \(\mathbb P(X \leq 103 \mid X \geq 94)\)? Give your answer to 3 decimal places.
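Conditional probabilities of this shape follow from the definition: for integers \(a \leq b\), \(\mathbb P(X \leq b \mid X \geq a) = \mathbb P(a \leq X \leq b) / \mathbb P(X \geq a)\). A sketch using `ppois()` is below; the function name `cond_prob_pois` is a hypothetical helper, and the parameter values are illustrative ones that differ from the question.

```r
# P(X <= b | X >= a) for X ~ Po(lambda), with integers a <= b.
# cond_prob_pois is a hypothetical helper name, not a built-in function.
cond_prob_pois <- function(lambda, a, b) {
  # P(X >= a) = P(X > a - 1), the upper tail of the CDF.
  p_ge_a <- ppois(a - 1, lambda, lower.tail = FALSE)
  # P(a <= X <= b) = F(b) - F(a - 1).
  p_between <- ppois(b, lambda) - ppois(a - 1, lambda)
  p_between / p_ge_a
}

# Illustrative values only.
p <- cond_prob_pois(lambda = 5, a = 3, b = 6)
print(round(p, 3))
```

Note the `a - 1`: because \(X\) is discrete, \(\mathbb P(X \geq a)\) includes the point \(a\) itself.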
For the final question, you should read in the data set at https://mpaldridge.github.io/math1710/data/R8-xy.csv where xy is the last two digits of your Student ID. Note that the file name is R8-xy.csv, with a number 8.
Assessed Question 5. The R8 data represents a discrete random variable \(X\) with range given by the column x and PMF given by the column pmf_x. What is \(\mathbb E\sqrt{X}\), the expected value of the square-root of \(X\)? Give your answer to one decimal place.
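The expectation of a function of a discrete random variable is \(\mathbb E\, g(X) = \sum_x g(x)\, p(x)\), which in R is a single vectorised sum. The sketch below uses a made-up PMF; only the column names `x` and `pmf_x` come from the question, and the data frame name `rv` is an illustrative choice.

```r
# Made-up range and PMF (not the real R8 data).
rv <- data.frame(x = c(0, 1, 4, 9),
                 pmf_x = c(0.1, 0.2, 0.3, 0.4))

# Sanity check: a PMF should sum to 1.
stopifnot(isTRUE(all.equal(sum(rv$pmf_x), 1)))

# E sqrt(X) = sum over x of sqrt(x) * p(x).
e_sqrt <- sum(sqrt(rv$x) * rv$pmf_x)
print(round(e_sqrt, 1))
```

This is generally not the same as `sqrt(sum(rv$x * rv$pmf_x))`, which would be \(\sqrt{\mathbb E X}\) instead.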
Remember to submit your answers via this Microsoft form before 1700 on Thursday 14 December.