Lecture 21 All questions answered

21.1 Distributions

Can you summarise the distributions we need to know?

The main random variables we have covered in the module are the following.

First, the discrete random variables:

| Distribution | Range | PMF | Expectation | Variance |
|---|---|---|---|---|
| Bernoulli: \(\text{Bern}(p)\) | \(\{0,1\}\) | \(p(0) = 1- p\), \(p(1) = p\) | \(p\) | \(p(1-p)\) |
| Binomial: \(\text{Bin}(n,p)\) | \(\{0,1,\dots,n\}\) | \(\displaystyle\binom{n}{x} p^x (1-p)^{n-x}\) | \(np\) | \(np(1-p)\) |
| Geometric: \(\text{Geom}(p)\) | \(\{1,2,\dots\}\) | \((1-p)^{x-1}p\) | \(\displaystyle\frac{1}{p}\) | \(\displaystyle\frac{1-p}{p^2}\) |
| Poisson: \(\text{Po}(\lambda)\) | \(\{0,1,\dots\}\) | \(\mathrm{e}^{-\lambda} \displaystyle\frac{\lambda^x}{x!}\) | \(\lambda\) | \(\lambda\) |
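As a quick sanity check on the table, the expectation and variance can be recomputed numerically from the PMF. The sketch below does this for \(\text{Bin}(n,p)\) using dbinom(); the values n = 10 and p = 0.3 are arbitrary choices for illustration, not taken from the module.

```r
# Illustrative parameters (arbitrary, not from the lecture)
n <- 10
p <- 0.3

x     <- 0:n               # range of Bin(n, p)
pmf_x <- dbinom(x, n, p)   # PMF values p(x)

mu     <- sum(x * pmf_x)            # expectation from the definition
sigma2 <- sum((x - mu)^2 * pmf_x)   # variance from the definition

c(mu, n * p)                 # both entries should equal np
c(sigma2, n * p * (1 - p))   # both entries should equal np(1-p)
```

The same pattern should work for any distribution with a finite range; for the geometric and Poisson distributions the range is infinite, so the sum would have to be truncated.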

Then the continuous random variables:

| Distribution | Range | PDF | Expectation | Variance |
|---|---|---|---|---|
| Exponential: \(\text{Exp}(\lambda)\) | \(\mathbb R_+\) | \(\lambda \mathrm e^{-\lambda x}\) | \(\displaystyle\frac{1}{\lambda}\) | \(\displaystyle\frac{1}{\lambda^2}\) |
| Normal: \(\mathrm N(\mu,\sigma^2)\) | \(\mathbb R\) | \({\displaystyle{\small \frac{1}{\sqrt{2\pi\sigma^2}} \exp \left( - \frac{(x - \mu)^2}{2\sigma^2} \right)}}\) | \(\mu\) | \(\sigma^2\) |
| Beta: \(\text{Beta}(\alpha, \beta)\) | \([0,1]\) | \(\propto x^{\alpha - 1}(1-x)^{\beta - 1}\) | \(\displaystyle\frac{\alpha}{\alpha + \beta}\) | \({\displaystyle{\small \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}}}\) |
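For continuous random variables the sums become integrals, and R's integrate() can check the table entries numerically. Here is a minimal sketch for the Beta distribution, using dbeta() for the (properly normalised) PDF; the parameter values \(\alpha = 2\), \(\beta = 3\) are arbitrary choices for illustration.

```r
# Illustrative parameters (arbitrary, not from the lecture)
alpha <- 2
beta  <- 3

# E X and E X^2 as integrals of x f(x) and x^2 f(x) over [0, 1],
# where f is the Beta(alpha, beta) PDF
EX  <- integrate(function(x) x   * dbeta(x, alpha, beta), 0, 1)$value
EX2 <- integrate(function(x) x^2 * dbeta(x, alpha, beta), 0, 1)$value

c(EX, alpha / (alpha + beta))   # both entries should agree
c(EX2 - EX^2,
  alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1)))   # likewise
```

Because integrate() is a numerical method, the two entries in each pair agree only up to a small numerical error.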

21.2 Discrete random variables in R

How do you work with discrete random variables in R?

If we are dealing with a famous distribution like the binomial, geometric or Poisson distribution, we can use functions like pbinom(), pgeom(), ppois() and their relatives, as on R Worksheet 7. However, for arbitrary random variables, we need to do more of the work ourselves. This was discussed on R Worksheet 8, but when questions on it came up on R Worksheet 9, lots of people got them wrong, so perhaps I didn’t explain it well.
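As a brief reminder of how the built-in functions are called, here is a short sketch; the parameter values are arbitrary choices for illustration. One thing to watch: R's geometric distribution counts the number of failures before the first success, so its range is \(\{0,1,\dots\}\), whereas the \(\text{Geom}(p)\) in Section 21.1 has range \(\{1,2,\dots\}\).

```r
# P(X = 3) and P(X <= 3) for X ~ Bin(10, 0.5)
dbinom(3, size = 10, prob = 0.5)
pbinom(3, size = 10, prob = 0.5)

# P(Y <= 4) for Y ~ Po(2.5)
ppois(4, lambda = 2.5)

# R's geometric distribution counts failures before the first success,
# so for a Geom(p) with range {1, 2, ...} we shift the argument by 1.
p <- 0.2
dgeom(3 - 1, prob = p)   # P(X = 3)  = (1 - p)^2 * p
pgeom(3 - 1, prob = p)   # P(X <= 3) = 1 - (1 - p)^3
```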

Let’s do an example with a simple discrete random variable. Its PMF is as follows:

| \(x\) | 1 | 2 | 3 | 5.5 | 7 | 8 |
|---|---|---|---|---|---|---|
| \(p(x)\) | 0.1 | 0.2 | 0.1 | 0.2 | 0.3 | 0.1 |

If we want to work with this in R, we need two vectors: one, which I’ll call x, to hold the values that the random variable can take, and another, which I’ll call pmf_x, to hold the corresponding values of the PMF \(p(x) = \mathbb P(X = x)\).

x     <- c(  1,   2,   3, 5.5,   7,   8)
pmf_x <- c(0.1, 0.2, 0.1, 0.2, 0.3, 0.1)

I can check that the PMF sums to 1, as it must.

sum(pmf_x)
[1] 1

Suppose I wanted to calculate the expectation of this random variable. When doing this “by hand”, we know that this is \[ \mathbb EX = \sum_x x \,p(x) . \] So in R, this is

mu <- sum(x * pmf_x)
mu
[1] 4.8

Suppose I wanted to find the variance. Using the definitional formula, we know this is \[ \operatorname{Var}(X) = \mathbb E(X - \mu)^2 = \sum_x (x-\mu)^2\,p(x) . \] In R, I’ve already saved the expectation as the R object mu. So now I can use

sum((x - mu)^2 * pmf_x)
[1] 5.91

Alternatively, we have the computational formula \(\operatorname{Var}(X) = \mathbb EX^2 - \mu^2\), where \[ \mathbb EX^2 = \sum_x x^2 \, p(x) .\] In R, this is

EX2 <- sum(x^2 * pmf_x)
EX2 - mu^2
[1] 5.91

which gives the same answer.

When calculating a variance, especially with the computational formula and especially when dealing with a discrete random variable with a large range, it’s important to keep plenty of accuracy in \(\mu = \mathbb EX\). In the above, I did this by saving this as an R object mu. This is much better and more accurate than just writing down a few decimal places and writing it out by hand again.

Remember that the function var() is used for calculating the sample variance of some data – that’s no use for us here when we want to find the variance of a random variable.
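To see the difference concretely, the sketch below compares the variance of the random variable from the example above with what var() returns when applied to the vector of values.

```r
x     <- c(  1,   2,   3, 5.5,   7,   8)
pmf_x <- c(0.1, 0.2, 0.1, 0.2, 0.3, 0.1)

mu    <- sum(x * pmf_x)
var_X <- sum((x - mu)^2 * pmf_x)   # variance of the random variable
var_X

# var(x) treats the six values as equally weighted data and ignores
# pmf_x entirely, so it answers a different question and gives a
# different number.
var(x)
```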

See R Worksheet 8 for more on this, including how to use the “step function” function stepfun() to work with the CDF \(F(x) = \mathbb P(X \leq x)\).
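As a reminder, here is one way the CDF of the example above could be built with stepfun(). This is a minimal sketch; the worksheet may set things up slightly differently, and the name cdf_x is just my choice.

```r
x     <- c(  1,   2,   3, 5.5,   7,   8)
pmf_x <- c(0.1, 0.2, 0.1, 0.2, 0.3, 0.1)

# stepfun() needs one more height than jump point: the value to the left
# of the first jump (0), followed by the running totals of the PMF.
cdf_x <- stepfun(x, c(0, cumsum(pmf_x)))

cdf_x(3)     # P(X <= 3) = 0.1 + 0.2 + 0.1
cdf_x(4)     # no probability between 3 and 5.5, so the same as F(3)
cdf_x(0.5)   # below the range, so 0
cdf_x(8)     # at the top of the range, so 1
```

With the default settings, the resulting function is right-continuous, so cdf_x(3) includes the probability at 3 itself, as a CDF should.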

21.3 Statistical tables

Can you go over the rules for how to use statistical tables?

In Lecture 17, we discussed how to do calculations with the normal distribution using R and using statistical tables.

We need to be able to do this when working with a normal distribution \(X \sim \mathrm{N}(\mu, \sigma^2)\). We also need to be able to do this when approximating another distribution by the normal distribution – for example, \(\mathrm{Bin}(n,p) \approx \mathrm{N}(np, np(1-p))\) when \(n\) is large and \(p\) is close to neither 0 nor 1, or \(\mathrm{Po}(\lambda) \approx \mathrm{N}(\lambda,\lambda)\) when \(\lambda\) is large. (Remember to use a continuity correction when approximating a discrete distribution by the normal.)
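To illustrate the continuity correction, here is a small sketch comparing the exact value of \(\mathbb P(X \leq 55)\) for \(X \sim \mathrm{Bin}(100, 0.5)\) with the normal approximation, with and without the correction. The numbers 100, 0.5 and 55 are arbitrary choices for illustration.

```r
# Illustrative parameters (arbitrary, not from the lecture)
n <- 100
p <- 0.5
mu    <- n * p                    # mean of the approximating normal
sigma <- sqrt(n * p * (1 - p))    # its standard deviation

exact       <- pbinom(55, n, p)          # exact P(X <= 55)
corrected   <- pnorm(55.5, mu, sigma)    # with continuity correction
uncorrected <- pnorm(55,   mu, sigma)    # without continuity correction

c(exact, corrected, uncorrected)
```

In this example the corrected value should be noticeably closer to the exact one.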

The first thing we need to do to use statistical tables is to standardise. That is, we can convert \(X \sim \mathrm{N}(\mu, \sigma^2)\) to \(Z = (X - \mu)/\sigma \sim \mathrm{N}(0,1)\) by subtracting the mean and dividing by the standard deviation. So, for example if \(X \sim \mathrm{N}(10, 4^2)\) and we want to calculate \(\mathbb P(8 \leq X \leq 13)\), then standardising gives \[ \mathbb P(8 \leq X \leq 13) = \mathbb P \left( \frac{8 - 10}{4} \leq \frac{X - 10}{4} \leq \frac{13 - 10}{4} \right) = \mathbb P(-0.5 \leq Z \leq 0.75) . \]

Statistical tables give values of \(\Phi(z) = \mathbb P(Z \leq z)\) for \(z \geq 0\). We can use these values in a few different ways. In what follows, we always assume \(z \geq 0\). (Remember, too, that for continuous distributions it’s irrelevant whether an inequality is \(<\) or \(\leq\).)

First, if we have a standard “less than” problem, so want to know \(\mathbb P(Z \leq z) = \Phi(z)\), we can just look this up in the table.

Second, if we have a “greater than” problem, so want to know \(\mathbb P(Z > z)\), we can use the complement rule to write \(\mathbb P(Z > z) = 1 - \mathbb P(Z \leq z) = 1 - \Phi(z)\). We then look up \(\Phi(z)\) in the table, and our answer is \(1 - \Phi(z)\).

Third, if we have a negative number, we can use the fact that the standard normal distribution is symmetric about 0. So we swap to the positive value of the number, and switch the inequality. So \[\begin{align*} \mathbb P(Z \geq -z) &= \mathbb P(Z \leq z) = \Phi(z) \\ \mathbb P(Z \leq -z) &= \mathbb P(Z \geq z) = 1 - \mathbb P(Z < z) = 1 - \Phi(z) . \end{align*}\] In the second of these, we used rule 2 again in the second equality.

To summarise: for \(z \geq 0\):

  1. \(\mathbb P(Z \leq z) = \Phi(z)\)
  2. \(\mathbb P(Z > z) = 1 - \Phi(z)\)
  3. \(\mathbb P(Z \geq -z) = \Phi(z)\)
  4. \(\mathbb P(Z < -z) = 1 - \Phi(z)\)
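If you have R to hand, the four rules can be checked numerically with pnorm(), which computes \(\Phi\) directly (including for negative arguments). A small sketch, with \(z = 1.3\) as an arbitrary choice:

```r
z   <- 1.3        # any z >= 0 works; 1.3 is an arbitrary choice
Phi <- pnorm(z)   # the value a table would give for Phi(z)

# lower.tail = FALSE asks pnorm() for the upper tail P(Z > z)
all.equal(pnorm(z),                      Phi)       # rule 1
all.equal(pnorm(z, lower.tail = FALSE),  1 - Phi)   # rule 2
all.equal(pnorm(-z, lower.tail = FALSE), Phi)       # rule 3
all.equal(pnorm(-z),                     1 - Phi)   # rule 4
```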

So to go back to our example, we’ve already standardised to get \[ \mathbb P(8 \leq X \leq 13) = \mathbb P(-0.5 \leq Z \leq 0.75) = \mathbb P(Z \leq 0.75) - \mathbb P(Z \leq -0.5) . \] The first term can be looked up directly in the table (rule 1): \[ \mathbb P(Z \leq 0.75) = \Phi(0.75) = 0.7734 . \] For the second term, we can use rules 2 and 3 to get \[ \mathbb P(Z \leq -0.5) = \mathbb P(Z \geq 0.5) = 1 - \mathbb P(Z < 0.5) = 1 - \Phi(0.5) = 1 - 0.6915 = 0.3085 , \] where we found \(\Phi(0.5) = 0.6915\) in the table. To put this all together, we get \[ \mathbb P(8 \leq X \leq 13) = \mathbb P(Z \leq 0.75) - \mathbb P(Z \leq -0.5) = 0.7734 - 0.3085 = 0.4649 . \]
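We can check this answer in R with pnorm(), either directly or after standardising. Since tables are rounded to four decimal places, the table answer and the R answer may differ slightly in the last digit.

```r
# Direct calculation for X ~ N(10, 4^2); note that pnorm() takes the
# standard deviation (4), not the variance (16).
direct <- pnorm(13, mean = 10, sd = 4) - pnorm(8, mean = 10, sd = 4)

# The same calculation after standardising to Z ~ N(0, 1)
standardised <- pnorm(0.75) - pnorm(-0.5)

c(direct, standardised)   # the two entries should agree
```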

21.4 “Edge cases”

21.5 End-of-semester survey

21.6 Other modules