This R worksheet does include assessed questions at the bottom. The deadline for submitting your solutions to these questions is Monday 4 December at 1400. If you have difficulty with this worksheet, you can get help at the office hours drop-in sessions.
When we looked at discrete distributions, we saw the dxxx(), pxxx(), qxxx() and rxxx() functions (where xxx could be binom, geom or pois). These gave, respectively, the PMF, CDF, quantile function, and random samples. The same works with the continuous distributions norm (for the normal distribution), exp (for the exponential distribution), or beta (for the Beta distribution, which we shall see later in Lecture 20).
We will concentrate in this worksheet just on the functions for the normal distribution:
dnorm() gives the probability density function \(f(x)\) of a normal distribution \(X \sim \mathrm{N}(\mu, \sigma^2)\). The format is dnorm(x, mu, sigma), where x is the point at which the density is evaluated, mu is the expectation \(\mu\) of the normal distribution, and sigma is the standard deviation \(\sigma\) of the normal distribution. Note that the third parameter is the standard deviation \(\sigma\) (that is, the square root \(\sigma = \sqrt{\sigma^2}\) of the variance), not the variance \(\sigma^2\) itself: this is the easiest mistake to make when working with the normal distribution in R!
pnorm() gives the CDF \(F(x) = \mathbb P(X \leq x)\) of a normal distribution. The format is pnorm(x, mu, sigma), in the same way. pnorm() also takes an optional argument lower.tail. By default this is set to TRUE, which gives the CDF \(F(x) = \mathbb P(X \leq x)\); but when set to FALSE, we instead get the upper-tail probability \(1 - F(x) = \mathbb P(X > x)\). So a full command for that is pnorm(x, mu, sigma, lower.tail = FALSE).
qnorm() gives the quantile function \(F^{-1}(q)\). That is, qnorm(q, mu, sigma) gives the value \(x\) such that \(F(x) = \mathbb P(X \leq x) = q\). It takes the same optional lower.tail = FALSE argument, which gives the \(x\) such that \(1 - F(x) = \mathbb P(X > x) = q\).
rnorm() generates random IID samples from a normal distribution. That is, rnorm(n, mu, sigma) gives \(n\) such random samples.
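As a rough illustration of the four functions, here is a short sketch. The distribution \(\mathrm{N}(50, 16)\), the evaluation points and the seed are all chosen just for this example and do not come from the lectures or the exercises. Note in particular that the variance 16 is converted to the standard deviation 4 before being passed to R.

```r
# Illustrative distribution (an assumption for this sketch): X ~ N(50, 16)
mu    <- 50
sigma <- sqrt(16)   # pass the standard deviation 4, NOT the variance 16

dens  <- dnorm(50, mu, sigma)                        # PDF f(50)
cdf   <- pnorm(54, mu, sigma)                        # F(54) = P(X <= 54)
upper <- pnorm(54, mu, sigma, lower.tail = FALSE)    # P(X > 54) = 1 - F(54)
x95   <- qnorm(0.95, mu, sigma)                      # x with P(X <= x) = 0.95
x95u  <- qnorm(0.05, mu, sigma, lower.tail = FALSE)  # x with P(X > x) = 0.05 (the same point)

set.seed(1710)                   # arbitrary seed, only so the samples are reproducible
samples <- rnorm(5, mu, sigma)   # five IID samples from N(50, 16)
```

Here cdf and upper add up to 1, and x95 and x95u are two ways of asking for the same point.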
In all these functions, if mu and sigma are omitted, it is assumed you are dealing with a standard normal distribution, with \(\mu = 0\) and \(\sigma^2 = \sigma = 1\).
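A quick way to convince yourself of this default is to compare the short and long forms of the same call. The point 1.5 is an arbitrary choice for illustration. The last line uses the names mean and sd, which are what R itself calls the arguments we have been writing as mu and sigma.

```r
# All three calls evaluate the standard normal CDF at 1.5
p_default  <- pnorm(1.5)                     # mu and sigma omitted
p_explicit <- pnorm(1.5, 0, 1)               # mu = 0, sigma = 1 given by position
p_named    <- pnorm(1.5, mean = 0, sd = 1)   # the same, using R's argument names

all.equal(p_default, p_explicit)
all.equal(p_default, p_named)
```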
The pnorm() function is the most important of these functions and is the one we use most often. Unlike with discrete random variables, it's rarely useful to use the PDF dnorm(); instead, we use the CDF pnorm(). In particular, the probability \(\mathbb P(a \leq X \leq b)\) that \(X\) lies in some interval \([a,b]\) is given by pnorm(b, mu, sigma) - pnorm(a, mu, sigma).
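The interval formula can be sketched as follows. The distribution \(\mathrm{N}(50, 16)\) and the interval \([46, 54]\) are again assumptions made only for this illustration. Because the interval runs from one standard deviation below the expectation to one above, the answer should match the same calculation on the standard normal.

```r
# Illustrative distribution (an assumption for this sketch): X ~ N(50, 16)
mu    <- 50
sigma <- sqrt(16)   # standard deviation 4
a <- 46
b <- 54

# P(a <= X <= b) = F(b) - F(a)
p_interval <- pnorm(b, mu, sigma) - pnorm(a, mu, sigma)

# Check by standardising: (46 - 50)/4 = -1 and (54 - 50)/4 = 1
p_standard <- pnorm(1) - pnorm(-1)

p_interval   # roughly 0.683
```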
In Lecture 16 we gave many examples of how to perform calculations using the pnorm() function, and one example using the qnorm() function. Your main task for this worksheet is to revise what was in that lecture.
Exercise 9.1. What is the value of the standard normal density at 0?
Exercise 9.2. Let \(X_2 \sim \mathrm{N}(10, 25)\). What is \(\mathbb P(X_2 > 16 \text{ or } X_2 \leq 8)\)?
Exercise 9.3. Let \(X_3 \sim \mathrm{N} (100, 10^2)\). Give an interval, symmetric around the expectation, such that \(X_3\) has a 50:50 chance of landing in that interval.
Exercise 9.4. Let \(X_4 \sim \mathrm{N}(42, 17^2)\). Draw 10,000 samples from the distribution of \(X_4\). Calculate the sample mean and sample variance of your samples.
We briefly mention that the pexp() variation takes the format pexp(x, lambda), where \(x\) is where the CDF should be evaluated and lambda is the rate parameter \(\lambda\).
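As a small sanity check on the format, the sketch below compares pexp() with the closed-form exponential CDF \(F(x) = 1 - \mathrm{e}^{-\lambda x}\). The rate 0.5 and the point 2 are arbitrary values chosen for this illustration.

```r
# Illustrative values (assumptions for this sketch): X ~ Exp(0.5), evaluated at x = 2
lambda <- 0.5
x      <- 2

p_cdf     <- pexp(x, lambda)                       # P(X <= 2) from R
p_formula <- 1 - exp(-lambda * x)                  # the same value from the formula
p_upper   <- pexp(x, lambda, lower.tail = FALSE)   # P(X > 2); lower.tail works here too

all.equal(p_cdf, p_formula)
```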
Exercise 9.5. Let \(X_5 \sim \mathrm{Exp}(0.2)\). Calculate \(\mathbb P(4 \leq X_5 \leq 7)\) using the pexp() function.
The pbeta() variation takes the format pbeta(x, alpha, beta), which will make sense to you after Lecture 20.
The following five assessed questions should be submitted via this Microsoft Form. I recommend you do this in Week 9, but the official deadline is Monday 4 December at 1400.
This work will be marked automatically by computer, so make sure your answers are accurate – the computer does not “know what you meant”; only what you actually enter into the form.
So that (most) students get a different distribution to work with, you will work from a file whose name is based on your student ID number. Your student ID number is (usually) a 9-digit number starting 201. Your dataset is the CSV file at https://mpaldridge.github.io/math1710/data/R8-xy.csv where xy is replaced by the last two digits of your student ID number. So if your student ID is 201623429, then you should use the file at https://mpaldridge.github.io/math1710/data/R8-29.csv; if your student ID number is 201491200 then you should use the file https://mpaldridge.github.io/math1710/data/R8-00.csv and so on. Take care to check you get this correct: if you get your student ID number wrong and/or use the wrong data file, the computer is likely to award you 0 marks.
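If you would rather have R build the file name than type it by hand, one possible approach is sketched below. The variable names student_id, xy and data_url are made up for this illustration, and the ID shown is the example one from the paragraph above. Storing the ID as text (in quotes) keeps a leading zero in endings such as 00 or 07.

```r
# Example ID from the text above; replace it with your own, keeping the quotes
student_id <- "201623429"

# Last two characters of the ID
xy <- substr(student_id, nchar(student_id) - 1, nchar(student_id))

# Put those two digits into the address of the data file
data_url <- paste0("https://mpaldridge.github.io/math1710/data/R8-", xy, ".csv")
data_url

# my_data <- read.csv(data_url)   # uncomment to read the file; needs an internet connection
```

It is still worth reading the printed address and checking it against your student ID before going any further.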
The first two questions are based on a discrete random variable \(X\). Read the CSV file at https://mpaldridge.github.io/math1710/data/R8-xy.csv into R. Remember that xy should be replaced by the last 2 digits of your student ID number. This data set has two columns, called x and pmf_x. The column x represents the range of the discrete random variable \(X\), and the column pmf_x represents the values of the PMF \(p_X(x) = \mathbb P(X = x)\) of \(X\).
Assessed Question 1. What is \(\mathbb P(X \leq 234.5)\)? Round your answer to three significant figures.
Assessed Question 2. Calculate the variance of \(X\). Round your answer to two decimal places.
The final three questions are based on the continuous normal distribution \(Y \sim \mathrm{N}(100, i+1)\), where \(i\) is the final single digit of your student ID. That is, \(Y\) has the normal distribution with expectation \(\mu = 100\) and with variance \(\sigma^2\) equal to the last digit of your student ID plus 1.
Assessed Question 3. What is the probability \(\mathbb P(Y \geq 103)\)? Round your answer to three significant figures.
Assessed Question 4. What is the conditional probability \(\mathbb P(Y < 97 \mid Y < 99)\)? Round your answer to three significant figures.
Assessed Question 5. What is the value of the PDF \(f_Y(y)\) evaluated at the 95th percentile \(y\) of \(Y\); that is, evaluated at \(y\), where \(y\) is the point below which 95% of the probability lies? Round your answer to three significant figures.
Remember to submit your answers via the Microsoft Form.