R Worksheet 8: Normal distribution

This R worksheet does include assessed questions at the bottom. The deadline for submitting your solutions to these questions is Monday 4 December at 1400. If you have difficulty with this worksheet, you can get help at the office hours drop-in sessions.

When we looked a discrete distributions, we saw the dxxx(), pxxx(), qxxx() and rxxx() functions (where xxx could be binom, geom or pois). These gave, respectively the PMF, CDF, quantile function, and random samples. The same works with continuous distributions norm (for the normal distribution), exp (for the exponential distribution), or beta (for the Beta distribution, which we shall see later in Lecture 20).

We will concentrate in this worksheet just on the functions for the normal distribution:

dnorm() gives the probability density function \(f(x)\) of a normal distribution \(X \sim \mathrm{N}(\mu, \sigma^2)\). The format is dnorm(x, mu, sigma), where x is the point at which the density is evaluated, mu is the expectation \(\mu\) of the normal distribution, and sigma is the standard deviation \(\sigma\) of the normal distribution. Note that the third parameter is the standard deviation \(\sigma\) – that is, the square-root \(\sigma = \sqrt{\sigma^2}\) of the variance – not the variance \(\sigma^2\) itself: this is the easiest mistake to make when working with the normal distribution in R!
pnorm() gives the CDF \(F(x) = \mathbb P(X \leq x)\) of a normal distribution. The format is pnorm(x, mu, sigma) in the same way.
- There is an optional argument lower.tail. By default this is set to TRUE, which gives the CDF \(F(x) = \mathbb P(X \leq x)\); but when set to FALSE, we instead get the upper-tail probability \(1 - F(x) = \mathbb P(X > x)\). So a full command for that is pnorm(x, mu, sigma, lower.tail = FALSE).
qnorm() gives the quantile function \(F^{-1}(q)\). That is, qnorm(q, mu, sigma) gives the value \(x\) such that \(F(x) = \mathbb P(X \leq x ) = q\).
- Similarly, there is a lower.tail = FALSE argument that gives the \(x\) such that \(1 - F(x) = \mathbb P(X > x) = q\).
rnorm() generates random IID samples from a normal distribution. That is, rnorm(n, mu, sigma) gives \(n\) such random samples.

In all these functions, if mu and sigma are omitted, it is assumed you are dealing with a standard normal distribution \(\mu = 0, \sigma^2 = \sigma = 1\).

The pnorm() function is the most important of these functions and is the one we use most often. Unlike with discrete random variables, it’s rarely useful to use the PDF dnorm(); instead, we use the CDF pnorm(). In particular, the probabiltiy \(\mathbb P(a \leq X \leq b)\) that \(X\) lies in some interval \([a,b]\) is given by pnorm(b, mu, sigma) - pnorm(a, mu, sigma).

In Lecture 16 we gave many examples of how to perform calculations using the pnorm() and one example qnorm() functions. Your main task for this worksheet is to revise what was in that lecture.

Exercise 9.1. What is the value of the standard normal density at 0?

Exercise 9.2. Let \(X_2 \sim \mathrm{N}(10, 25)\). What is \(\mathbb P(X > 16 \text{ or } X \leq 8)\)?

Exercise 9.3. Let \(X_3 \sim \mathrm{N} (100, 10^2)\). Give an interval, symmetric around the expectation, such that \(X_3\) has a 50:50 chance of landing in that interval.

Exercise 9.4. Let \(X_4 \sim \mathrm{N}(42, 17^2)\). Sample 10,000 samples from the distribution \(X_4\). Calculate the sample mean and sample variance of your samples.

We briefly mention that the pexp() variation takes the format pexp(x, lambda), where \(x\) is where the CDF should be evaluated and lambda is the rate parameter \(\lambda\).

Exercise 9.5. Let \(X_4 \sim \mathrm{Exp}(0.2)\). Calculate \(\mathbb P(4 \leq X_4 \leq 7)\) using the pexp() function.

The pbeta() variation takes the format pbeta(x, alpha, beta), which will make sense to you after Lecture 20.

Assessed questions

The following five assessed questions should be submitted via this Microsoft Form. I recommend you do this in Week 9, but the official deadline is Monday 4 December at 1400.

This work will be marked automatically by computer, so make sure your answers are accurate – the computer does not “know what you meant”; only what you actually enter into the form.

So that (most) students get a different distribution to work with, you will work from a file whose name is based on your student ID number. Your student ID number is (usually) a 9-digit number starting 201. Your dataset is the CSV file at

https://mpaldridge.github.io/math1710/data/R8-xy.csv

where xy is replaced by the last two digits of your student ID number. So if your student ID is 201623429, then you should use the file at https://mpaldridge.github.io/math1710/data/R8-29.csv; if your student ID number is 201491200 then you should use the file https://mpaldridge.github.io/math1710/data/R8-00.csv and so on. Take care to check you get this correct: if you get your student ID number wrong and/or use the wrong data file, the computer is likely to award you 0 marks.

The first two questions are based on a discrete random variable \(X\). Read the CSV file at https://mpaldridge.github.io/math1710/data/R8-xy.csv into R. Remember that xy should be replaced by the last 2 digits of your student ID number. This data set has two columns, called x and pmf_x. The column x represents the range of the discrete random variable \(X\), and the column pmf_x represents the values of the PMF \(p_X(x) = \mathbb P(X = x)\) of \(X\).

Assessed Question 1. What is \(\mathbb P(X \leq 234.5)\)? Round your answer to three significant figures.*

Assessed Question 2. Calculate the variance of \(X\). Round your answer to two decimal places.

The final three questions are based on the continuous normal distribution \(Y \sim \mathrm{N}(100, i+1)\), where \(i\) is the final single digit of your student ID. that is, the normal distribution with expectation \(\mu = 100\), and whose variance \(\sigma^2\) is equal to the last digit of your student ID plus 1.

Assessed Question 3. What is the probability \(\mathbb P(Y \geq 103)\)? Round your answer to three significant figures.

Assessed Question 4. What is the conditional probability \(\mathbb P(Y < 97 \mid Y < 99)\)? Round your answer to three significant figures.

Assessed Question 5. What is the value of the PDF \(f_Y(y)\) evaluated at the 95th percentile \(y\) of \(Y\); that is, evaluated at \(y\), where \(y\) is the point below which 95% of the probability lies? Round your answer to three significant figures.

Remember to submit your answers via the Microsoft Form.

R Worksheet 8: Normal distribution

MATH1710 Probability and Statistcs I

University of Leeds, 2023–24

Assessed questions