This R worksheet consists solely of assessed questions. The material for these questions recaps all of the R Worksheets on the course so far. The deadline for submitting your solutions to these questions is Thursday 14 December at 1700 – note that this is an earlier deadline than usual.

Answers to the following five assessed questions should be submitted via this Microsoft form.

This work will be marked automatically by computer, so make sure your answers are accurate – the computer does not “know what you meant”; only what you actually enter into the form.

So that (most) students get a different numbers to work with, you will work from files whose names are based on your student ID number. Your student ID number is (usually) a 9-digit number starting 201. Your datasets will be CSV files at addresses like

https://mpaldridge.github.io/math1710/data/R3-xy.csv

where xy is replaced by the last two digits of your student ID number. So if your student ID is 201623429, then you would use the file at https://mpaldridge.github.io/math1710/data/R3-29.csv; if your student ID number is 201491200 then you should use the file https://mpaldridge.github.io/math1710/data/R3-00.csv and so on. Take care to check you get this correct: if you get your student ID number wrong and/or use the wrong data file, the computer is likely to award you 0 marks.


For the first two questions, you should read in the data set at

https://mpaldridge.github.io/math1710/data/R3-xy.csv

where xy is the last two digits of your Student ID. Note that the file name is R3-xy.csv, with a number 3 after the R.

Assessed Question 1. What is the correlation between the stock prices of Amaflix and Netazon? Give your answer to 4 significant figures.

Assessed Question 2. The 10th entry of the Amaflix stock price was wrongly recorded – the actual value should be double the recorded value. Correct this entry, and then calculate the mean of the first 20 days of the Amaflix data. Give this mean to 2 decimal places.

For the next question, you should read in the data set at

https://mpaldridge.github.io/math1710/data/R10-xy.csv

where xy is the last two digits of your Student ID. Note that the file name is R10-xy.csv, with a number 10.

Assessed Question 3. Draw a scatterplot from this R10 data, where the x-axis is the column x from this data and the y-axis is the other column (whose name you should find out using R). You should see a clear outlier point, to the top-left, top-right, bottom-left, or bottom-right. Which?

For the next question, you do not need to read in any data.

Assessed Question 4. Let \(X \sim \mathrm{Po}(100)\). What is \(\mathbb P(X \leq 103 \mid X \geq 94)\)? Give your answer to 3 decimal places.

For the final question, you should read in the data set at

https://mpaldridge.github.io/math1710/data/R8-xy.csv

where xy is the last two digits of your Student ID. Note that the file name is R8-xy.csv, with a number 8.

Assessed Question 5. The R8 data represents a discrete random variable \(X\) with range given by the column x and PMF given by the column pmf_x. What is \(\mathbb E\sqrt{X}\), the expected value of the square-root of \(X\)? Give your answer to one decimal place.


Remember to submit your answers via this Microsoft form before 1700 on Thursday 14 December.