Problem Sheet 5

This is Problem Sheet 5. This problem sheet covers Lectures 15 to 18. You should work through all the questions on this problem sheet in preparation for your tutorial in Week 10. The problem sheet contains two assessed questions, which are due in by 2pm on Monday 11 December.

A: Short questions

A1. Consider the continuous random variable \(X\) with PDF \[ f(x) = \begin{cases} \tfrac12x & \text{for $0 \leq x \leq 1$} \\ \tfrac12 & \text{for $1 < x \leq 2$} \\ \tfrac32 - \tfrac12x & \text{for $2 < x \leq 3$} \end{cases} \] and \(f(x) = 0\) otherwise.

(a) Calculate the CDF for \(X\).

Solution. We treat the different cases separately.

For \(x < 0\), we have \(F(x) = 0\).

For \(0 \leq x \leq 1\), we have \[ F(x) = \int_0^x \tfrac12 y \, \mathrm dy = \left[\tfrac14y^2\right]_0^x = \tfrac14 x^2 .\] In particular, \(F(1) = \frac14\).

For \(1 < x \leq 2\), we have \[ F(x) = \int_0^x f(y)\, \mathrm dy = F(1) + \int_1^x \tfrac12 \, \mathrm dy = \tfrac14 + \left[ \tfrac 12 y\right]_1^x = \tfrac12 x - \tfrac14 .\] In particular, \(F(2) = \frac34\).

For \(2 < x \leq 3\), we have \[ F(x) = \int_0^x f(y)\, \mathrm dy = F(2) + \int_2^x \left(\tfrac32 - \tfrac12y\right) \, \mathrm dy = \tfrac34 + \left[ \tfrac 32 y - \tfrac14 y^2\right]_2^x = \tfrac32 x - \tfrac14 x^2 - \tfrac 54 .\] In particular, \(F(3) = 1\).

For \(x > 3\), we have \(F(x) = 1\).

Hence, \[ F(x) = \begin{cases} 0 & \text{for $x < 0$} \\ \tfrac14 x^2 & \text{for $0 \leq x \leq 1$} \\ \tfrac12 x - \tfrac14 & \text{for $1 < x \leq 2$} \\ \tfrac32 x - \tfrac14 x^2 - \tfrac 54 & \text{for $2 < x \leq 3$} \\ 1 & \text{for $x > 3$.} \end{cases} \]

(b) What is \(\mathbb P(\tfrac32 \leq X \leq \tfrac52)\)?

Solution. This is \[ F\big(\tfrac52\big) - F\big(\tfrac32\big) = \tfrac{15}{16} - \tfrac12 = \tfrac{7}{16} .\] (Here, it was useful to note that \(x = \tfrac52\) is in the \(2 < x \leq 3\) range and \(x = \tfrac32\) is in the \(1 < x \leq 2\) range.)

(c) Calculate the expectation \(\mathbb EX\).

Solution. We have \[\begin{align*} \mathbb EX &= \int_{-\infty}^{\infty} x\,f(x)\, \mathrm dx \\ &= \int_0^1 x\times\tfrac12x\, \mathrm dx + \int_1^2 x \times \tfrac12 \, \mathrm dx + \int_2^3 x \times \left(\tfrac32 - \tfrac12 x\right)\, \mathrm dx \\ &= \left[\tfrac16 x^3 \right]_0^1 + \left[\tfrac14 x^2 \right]_1^2 + \left[\tfrac34 x^2 - \tfrac16 x^3 \right]_2^3 \\ &= \tfrac16 - 0 + 1 - \tfrac14 + \tfrac94 - \tfrac53 \\ &= \tfrac32 . \end{align*}\]
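The CDF values and the expectation above can be sanity-checked numerically. The sheet's computations elsewhere use R; the sketch below is a stand-alone Python cross-check (the helper name riemann is made up for illustration) that integrates the PDF with a midpoint rule.

```python
def f(x):
    """PDF from A1 (zero outside [0, 3])."""
    if 0 <= x <= 1:
        return 0.5 * x
    if 1 < x <= 2:
        return 0.5
    if 2 < x <= 3:
        return 1.5 - 0.5 * x
    return 0.0

def riemann(g, a, b, n=200_000):
    """Midpoint Riemann sum approximating the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

F_52 = riemann(f, 0, 2.5)                # F(5/2) = 15/16
F_32 = riemann(f, 0, 1.5)                # F(3/2) = 1/2
EX = riemann(lambda x: x * f(x), 0, 3)   # expectation

print(round(F_52 - F_32, 4))  # 0.4375 = 7/16
print(round(EX, 4))           # 1.5
```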

A2. Let \(X\) be a continuous random variable with PDF \[ f(x) = \frac{k}{x^3} \qquad \text{for $x \geq 1$} \] and \(f(x) = 0\) otherwise.

(a) What value of \(k\) makes this into a true PDF?

Solution. We need the PDF to integrate to 1. So \[ 1 = \int_{-\infty}^\infty f(x) \, \mathrm dx = \int_1^\infty kx^{-3} \, \mathrm dx = \left[-\tfrac 12 kx^{-2}\right]_1^\infty = - 0 + \tfrac12k . \] So \(k = 2\).

(b) What is \(\mathbb P(X \geq 3)\)?

Solution. This is \[ \mathbb P(X \geq 3) = \int_3^\infty 2x^{-3}\,\mathrm dx = \left[-x^{-2}\right]_3^\infty = \tfrac19 . \]

(c) What is the expected value \(\mathbb EX\)?

Solution. This is \[ \mathbb EX = \int_1^\infty x\times 2x^{-3} \mathrm dx = \left[2x^{-1}\right]_1^\infty = 2 . \]
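The three answers to A2 can likewise be cross-checked numerically. Since the PDF has an infinite range, the Python sketch below (again with a made-up riemann helper) truncates the integrals at a large upper limit; the neglected tails decay like powers of \(x\) and are negligible.

```python
def riemann(g, a, b, n=100_000):
    """Midpoint Riemann sum approximating the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: 2.0 * x ** -3  # the PDF with k = 2

total = riemann(f, 1, 1000)                    # whole mass; tail beyond 1000 is ~1e-6
p_tail = riemann(f, 3, 1000)                   # P(X >= 3)
mean = riemann(lambda x: x * f(x), 1, 10_000)  # EX; integrand decays like x^-2

print(round(total, 3))   # 1.0
print(round(p_tail, 3))  # 0.111
print(round(mean, 2))    # 2.0
```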

A3. Let \(X \sim \text{Exp}(\frac12)\).

(a) What is \(\mathbb EX\)?

Solution. \(\mathbb EX = \displaystyle\frac{1}{\frac12} = 2\).

(b) What is \(\mathbb P(1 \leq X \leq 3)\)?

Solution. We have \[ \mathbb P(1 \leq X \leq 3) = F(3) - F(1) = (1 - \mathrm e^{-3/2}) - (1 - \mathrm e^{-1/2}) = 0.383 . \]
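As a quick check of A3 (in Python rather than the sheet's R, using only the standard library):

```python
import math

lam = 0.5  # rate of Exp(1/2)

F = lambda x: 1 - math.exp(-lam * x)  # exponential CDF
p = F(3) - F(1)

print(1 / lam)      # 2.0, the expectation
print(round(p, 3))  # 0.383
```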

A4. Let \(Z \sim \mathrm{N}(0,1)\). Calculate the following (a) using statistical tables; (b) using R. (For part (a), you should show enough working to convince a reader that you really did use the tables.)

(i) \(\mathbb P(Z \leq -1.2)\)

Solution. Using statistical tables, \[ \Phi(-1.2) = 1 - \Phi(1.20) = 1 - 0.8849 = 0.1151 .\]

Using R: pnorm(-1.2) gives 0.1150697.

(ii) \(\mathbb P(-1.2 \leq Z \leq 0.8)\)

Solution. Using statistical tables, and part (i), \[ \Phi(0.80) - \Phi(-1.2) = 0.7881 - 0.1151 = 0.6730 . \]

Using R: pnorm(0.8) - pnorm(-1.2) gives 0.6730749.

(iii) \(\mathbb P(Z \leq 0.27)\) (using interpolation for part (a))

Solution. We can interpolate between \(\Phi(0.25) = 0.5987\) and \(\Phi(0.30) = 0.6179\), to get \[ \Phi(0.27) \approx 0.6 \,\Phi(0.25) + 0.4\, \Phi(0.30) = 0.6064 . \]

Using R: pnorm(0.27) gives 0.6064199.
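The R values in A4 can also be reproduced without R: the standard normal CDF can be written with the error function as \(\Phi(x) = \frac12\big(1 + \operatorname{erf}(x/\sqrt 2)\big)\). A Python sketch:

```python
import math

def Phi(x):
    """Standard normal CDF via the error function (equivalent to R's pnorm)."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

print(round(Phi(-1.2), 7))             # 0.1150697
print(round(Phi(0.8) - Phi(-1.2), 7))  # 0.6730749
print(round(Phi(0.27), 7))             # 0.6064199
```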

A5. Let \(X \sim \mathrm{Po}(25)\). Calculate the following (a) exactly, using R; (b) approximately, using a normal approximation with a continuity correction and statistical tables. (For part (b), you should show enough working to convince a reader that you really did use the tables.)

(i) \(\mathbb P(X \leq 27)\)

Solution. Using R: ppois(27, 25) gives 0.7001861.

The approximation is \(X \approx Y \sim \mathrm N(25, 25) = \mathrm{N}(25, 5^2)\). With a continuity correction, we expand the interval \((-\infty,27]\) outwards to \((-\infty, 27.5]\), and get \[ \mathbb P(X \leq 27) \approx \mathbb P(Y \leq 27.5) = \mathbb P \left(\frac{Y - 25}{5} \leq \frac{27.5 - 25}{5}\right) = \Phi(0.50) = 0.6915. \]

(ii) \(\mathbb P(X \geq 28 \mid X \geq 27)\)

Solution. By the definition of conditional probability, we have \[ \mathbb P(X \geq 28 \mid X \geq 27) = \frac{\mathbb P(X \geq 28 \text{ and } X \geq 27)}{\mathbb P(X \geq 27)} = \frac{\mathbb P(X \geq 28)}{\mathbb P(X \geq 27)} ,\] since if \(X \geq 28\) it’s automatically the case that \(X \geq 27\).

Using R, we need to remember that lower.tail = FALSE gives \(\mathbb P(X > x)\) with strict inequality, which for discrete random variables is equivalent to \(\mathbb P(X \geq x +1)\). So we actually want

ppois(27, 25, lower.tail = FALSE) / ppois(26, 25, lower.tail = FALSE)

which gives 0.8089648.

The approximations are \[\begin{align*} \mathbb P(X \geq 28) &\approx \mathbb P(Y \geq 27.5) = 1 - \Phi(0.50) = 0.3085 \\ \mathbb P(X \geq 27) &\approx \mathbb P(Y \geq 26.5) = 1 - \Phi(0.30) = 0.3821 , \end{align*}\] where again we used the continuity correction to expand \([28,\infty)\) outwards to \([27.5,\infty)\), and similarly for \([27,\infty)\). This gives the answer \(0.3085/0.3821 = 0.807\).
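The exact R values for A5 can be reproduced by summing the Poisson PMF directly. A Python sketch (the function name ppois simply mirrors R's):

```python
import math

def ppois(k, mu):
    """P(X <= k) for X ~ Po(mu), summing the PMF directly (like R's ppois)."""
    return sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(k + 1))

p_le_27 = ppois(27, 25)                             # part (i), exact
p_cond = (1 - ppois(27, 25)) / (1 - ppois(26, 25))  # part (ii), exact

print(round(p_le_27, 7))  # 0.7001861
print(round(p_cond, 7))   # 0.8089648
```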

B: Long questions

B1. (a) Let \(X \sim \text{Exp}(\lambda)\). Show that \[ \mathbb P(X > x + y \mid X > y) = \mathbb P(X > x) . \]

Solution.
Using the definition of conditional probability, we have \[\mathbb P(X > x + y \mid X > y) = \frac{\mathbb P(X > x + y \text{ and } X > y) }{\mathbb P(X > y)}= \frac{\mathbb P(X > x + y ) }{\mathbb P(X > y)} , \] since if \(X > x + y\) then we automatically have \(X > y\). Note also that, for an exponential distribution, we have \[ \mathbb P(X > x) = 1 - F(x) = 1 - (1 - \mathrm e^{-\lambda x}) = \mathrm e^{-\lambda x} . \] So the left-hand side of the statement in the question is \[ \frac{\mathrm{e}^{-\lambda(x + y)}}{\mathrm{e}^{-\lambda y}} = \mathrm e^{-\lambda x - \lambda y + \lambda y} = \mathrm{e}^{-\lambda x} , \] which equals the right-hand side, by the above.

(b) The result proved in part (a) is called the “memoryless property”. Why do you think it’s called that?

Solution. Think of \(X\) as a waiting time. The result tells us that, given that we’ve already waited \(y\) minutes, the probability that we have to wait at least another \(x\) minutes is exactly the same as the probability we had to wait at least \(x\) minutes starting from the beginning. In other words, no matter when we start timing from, the probability we have to wait more than \(x\) minutes remains the same.

This is called the “memoryless property” because it’s as if the process has no memory of how long we’ve already been waiting for.

(This property also holds for the geometric distribution. The expected number of rolls of a dice until you get a six is always 6 rolls, no matter how many times you’ve already rolled the dice.)

(c) When you get to a certain bus stop, the average amount of time you have to wait for a bus to arrive is 20 minutes. Specifically, the time until the next bus arrives is modelled as an exponential distribution with expectation \(1/\lambda = 20\) minutes. Suppose you have already been waiting at the bus stop for 15 minutes. What is the expected further amount of time you still have to wait for a bus to arrive?

Solution. By the memoryless property, it’s irrelevant how long we’ve been waiting for: the average time until a bus arrives is always \(1/\lambda = 20\) minutes.

B2. The main dangerous radioactive material left over after the Chernobyl disaster is Caesium-137. The amount of time it takes a Caesium-137 particle to decay is known to follow an exponential distribution with rate \(\lambda = 0.023\) years\(^{-1}\).

(a) What is the average amount of time it takes a Caesium-137 particle to decay?

Solution. The expectation is \(1/\lambda = 43.5\) years.

(b) The “half-life” of a radioactive substance is the amount of time it takes for half of the substance to decay. Using the information in the question, calculate the half-life of Caesium-137.

Solution. The half-life is the median of the distribution; that is, the solution \(x\) to \[ F(x) = 1 - \mathrm{e}^{-0.023x} = \tfrac12 . \] So \[ x = \frac{\log\frac12}{-0.023} = \frac{\log 2}{0.023} = 30.1 \text{ years} . \]

(c) It is estimated that roughly 24 kg of Caesium-137 was released during the Chernobyl disaster, which happened roughly 37.6 years ago. Estimate the mass of Caesium-137 that has still not decayed.

Solution. The proportion of Caesium-137 still remaining is \[ \mathbb P(X > 37.6) = \mathrm e^{-0.023 \times 37.6} = 0.421 , \] so roughly \(24 \times 0.421 = 10.1\) kg of Caesium-137 has still not decayed.
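The numbers in B2 follow from two one-line formulas; a Python sketch to confirm the arithmetic:

```python
import math

lam = 0.023  # decay rate, per year

half_life = math.log(2) / lam            # median of Exp(0.023), part (b)
remaining = 24 * math.exp(-lam * 37.6)   # kg not yet decayed after 37.6 years, part (c)

print(round(half_life, 1))  # 30.1
print(round(remaining, 1))  # 10.1
```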

B3. Consider the pair of random variables \((X,Y)\) with joint PDF \[ f_{X,Y}(x,y) = 2 \qquad \text{for $0 \leq x \leq y \leq 1$} \] and \(f_{X,Y}(x,y) = 0\) otherwise. (In particular, note that the joint PDF is only nonzero when \(x \leq y\).)

(a) Draw a picture of the range of \((X,Y)\) in the \(xy\)-plane.

(b) Describe the conditional distribution of \(X\) given \(Y = y\), for \(0 \leq y \leq 1\).

Solution. Fix \(y\). The conditional distribution is \[ f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} \propto f_{X,Y}(x,y) .\] We know that \(f_{X,Y}(x,y) = 2\) when \(0 \leq x \leq y\) and is \(0\) otherwise. So the conditional distribution of \(X\) given \(Y = y\) is continuous uniform on the interval \([0, y]\).

If we want to check the denominator \(f_Y(y)\) formally, we can check that \[ f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \,\mathrm dx = \int_0^y 2\, \mathrm dx = 2y ,\] so the conditional PDF is indeed \(f_{X \mid Y}(x \mid y) = 2/2y = 1/y\) for \(0 \leq x \leq y\) and 0 otherwise.

(c) What is the marginal PDF \(f_X\) of \(X\)?

Solution. Again the key is that the joint PDF is only nonzero when \(y \geq x\) but \(y \leq 1\). So \[ f_X(x) = \int_{-\infty}^\infty f_{X,Y}(x,y) \, \mathrm dy = \int_x^1 2 \, \mathrm dy = 2(1 - x) \] for \(0 \leq x \leq 1\) and 0 otherwise.

(d) Are \(X\) and \(Y\) independent?

Solution. No. Take, for example, \(x = \frac34\) and \(y = \frac14\). Just by looking at the picture from part (a), it's clear that \(f_{X,Y}(\frac34,\frac14) = 0\), while \(f_X(\frac34)\) and \(f_Y(\frac14)\) are nonzero.

We can check it formally too, if we want. Since \(x > y\), this point has joint PDF \(f_{X,Y}(\frac34,\frac14) = 0\). The marginal PDFs, though, are \[\begin{align*} f_X\big(\tfrac34\big) &= 2\big(1 - \tfrac34\big) = \tfrac12 \\ f_Y\big(\tfrac14\big) &= 2 \times \tfrac14 = \tfrac12 . \end{align*}\] (We used \(f_Y(y) = 2y\), by symmetry with \(f_X(x) = 2(1-x)\), or alternatively by calculating it “long-hand”.) So \(f_X\big(\tfrac34\big)\,f_Y\big(\tfrac14\big) = \tfrac12 \times \tfrac12 = \tfrac14 \neq 0\). So \(X\) and \(Y\) are not independent.
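The marginals in B3 can also be checked numerically. The Python sketch below (midpoint sums over a grid; helper names f_X and f_Y are made up) integrates the joint PDF out over one variable at a time.

```python
def f(x, y):
    """Joint PDF from B3: 2 on the triangle 0 <= x <= y <= 1, else 0."""
    return 2.0 if 0 <= x <= y <= 1 else 0.0

n = 2000
h = 1.0 / n

def f_X(x):
    """Marginal of X: integrate the joint PDF over y (midpoint rule)."""
    return sum(f(x, (j + 0.5) * h) for j in range(n)) * h

def f_Y(y):
    """Marginal of Y: integrate the joint PDF over x (midpoint rule)."""
    return sum(f((i + 0.5) * h, y) for i in range(n)) * h

print(round(f_X(0.75), 2))  # 0.5, matching 2(1 - 3/4)
print(round(f_Y(0.25), 2))  # 0.5, matching 2 * (1/4)
print(f(0.75, 0.25))        # 0.0, yet f_X(3/4) * f_Y(1/4) = 1/4: not independent
```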

B4. Let \(X_1, X_2, \dots, X_n\) be IID random variables with common expectation \(\mu\) and common variance \(\sigma^2\), and let \(\overline X = (X_1 + \cdots + X_n)/n\) be the mean of these random variables. We will be considering the random variable \(S^2\) given by \[ S^2 = \sum_{i=1}^n (X_i - \overline X)^2 . \]

(a) By writing \[ X_i - \overline X = (X_i - \mu) - (\overline X - \mu) \] or otherwise, show that \[ S^2 = \sum_{i=1}^n (X_i - \mu)^2 - n(\overline X - \mu)^2 . \]

Solution. Using the suggestion in the question, we have \[\begin{align*} S^2 &= \sum_{i=1}^n (X_i - \overline X)^2 \\ &= \sum_{i=1}^n \big( (X_i - \mu) - (\overline X - \mu) \big)^2 \\ &= \sum_{i=1}^n \big( (X_i - \mu)^2 - 2(X_i - \mu)(\overline X - \mu) + (\overline X - \mu)^2\big) \\ &= \sum_{i=1}^n (X_i - \mu)^2 - \sum_{i=1}^n 2(X_i - \mu)(\overline X - \mu) + \sum_{i=1}^n (\overline X - \mu)^2 \\ &= \sum_{i=1}^n (X_i - \mu)^2 - 2\left(\sum_{i=1}^n X_i - \sum_{i=1}^n \mu\right)(\overline X - \mu) + (\overline X - \mu)^2 \sum_{i=1}^n 1 \\ &= \sum_{i=1}^n (X_i - \mu)^2 - 2(n\overline X - n\mu) (\overline X - \mu) + n (\overline X - \mu)^2 \\ &= \sum_{i=1}^n (X_i - \mu)^2 - 2n(\overline X - \mu)^2 + n(\overline X - \mu)^2 \\ &= \sum_{i=1}^n (X_i - \mu)^2 - n(\overline X - \mu)^2 . \end{align*}\] This is mostly manipulation of sums as we have seen before, although note that going from the fifth to sixth lines we used the definition of \(\overline X\) to write \(\sum_{i=1}^n X_i\) as \(n \overline X\).

(b) Hence or otherwise, show that \[ \mathbb E S^2 = (n - 1)\sigma^2 . \] You may use facts about \(\overline X\) from the notes provided you state them clearly. (You may find it helpful to recognise some expectations as definitional formulas for variances, where appropriate.)

Solution. Starting with the linearity of expectation, we have \[\begin{align*} \mathbb ES^2 &= \mathbb E \left( \sum_{i=1}^n (X_i - \mu)^2 - n(\overline X - \mu)^2 \right) \\ &= \sum_{i=1}^n \mathbb E (X_i - \mu)^2 - n \mathbb E(\overline X - \mu)^2 \\ &= \sum_{i=1}^n \operatorname{Var}(X_i) - n \operatorname{Var}(\overline X) . \end{align*}\] The last line follows because \(\mathbb EX_i = \mu\) for all \(i\) by assumption, and we showed in the notes that \(\mathbb E \overline X = \mu\) also; hence, as hinted, the expectations are precisely definitional formulas for the variances. We then also know that \(\operatorname{Var}(X_i) = \sigma^2\) by assumption, and we showed in Lecture 18 that \(\operatorname{Var}(\overline X) = \sigma^2/n\). Hence \[ \mathbb ES^2 = \sum_{i=1}^n \sigma^2 - n\, \frac{\sigma^2}{n} = n \sigma^2 - \sigma^2 = (n-1)\sigma^2, \] as required.

(c) At the beginning of this module, we defined the sample variance of the values \(x_1, x_2, \dots, x_n\) to be \[ s^2_x = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar x)^2 . \] Explain one reason why we might consider it appropriate to use \(1/(n-1)\) as the factor at the beginning of this expression, rather than simply \(1/n\).

Solution. We often model a data set \(x_1, x_2, \dots, x_n\) as being realisations of an IID sequence of random variables \(X_1, X_2, \dots, X_n\). In this case, we are using the summary statistic of the sample variance \(s_x^2\) to “estimate” the variance \(\operatorname{Var}(X_1) = \sigma^2\). Using the factor \(1/(n-1)\) ensures that this estimator is correct “in expectation”, because \[ \mathbb E s_X^2 = \mathbb E \frac{1}{n-1}S^2 = \frac{1}{n-1} \mathbb ES^2 = \frac{1}{n-1}(n-1)\sigma^2 = \sigma^2 . \] This property of being correct in expectation is called being an “unbiased” estimator, and it's usually considered beneficial for an estimator to be unbiased.
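A quick simulation illustrates the unbiasedness claim. This is a Python sketch (the helper name sample_var and the seed are arbitrary choices, not part of the question):

```python
import random

random.seed(1)  # arbitrary seed, for reproducibility

def sample_var(xs):
    """Sample variance with the 1/(n-1) (Bessel) factor."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

# Average many sample variances of small N(0, 1) samples.  With the 1/(n-1)
# factor the average should sit near sigma^2 = 1; with a 1/n factor it would
# sit near (n-1)/n = 0.8 instead.
n, reps = 5, 20_000
vals = [sample_var([random.gauss(0, 1) for _ in range(n)]) for _ in range(reps)]
print(round(sum(vals) / reps, 2))  # close to 1.0 (exact value depends on the seed)
```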

Note that we already know that the sample mean \(\bar x\) is an unbiased estimator for the expectation \(\mathbb EX = \mu\), as we already know that \(\mathbb E\overline X = \mu\).

(You may learn more about estimation and “unbiasedness” in MATH1712 Probability and Statistics II.)

B5. Roughly how many times should I toss a coin for there to be a 95% chance that between 49% and 51% of my coin tosses land Heads?

(It may be useful to know that, for a standard normal distribution, we have \(\Phi(1.96) = 0.975\).)

Solution. The number of Heads in \(n\) coin tosses is \(X \sim \mathrm{Bin}(n, \frac12)\), which, since \(n\) will be large, and \(p\) not near 0 or 1, is approximately \(Y \sim \mathrm{N}(\frac n2, \frac n4)\). We want to choose \(n\) such that \[ \mathbb P(0.49 n \leq Y \leq 0.51n) = 0.95 .\]

Standardising, we have \[ \mathbb P \left( \frac{0.49n - 0.5n}{0.5\sqrt{n}} \leq \frac{Y - 0.5n}{0.5\sqrt{n}} \leq \frac{0.51n - 0.5n}{0.5\sqrt{n}}\right) = \mathbb P(-0.02\sqrt{n} \leq Z \leq 0.02\sqrt{n}) . \] Since the normal distribution is symmetric, we want \[\mathbb P(Z \leq 0.02\sqrt{n}) = 0.975 .\] From the hint in the question (or Table 2 of the statistical tables, or the R command qnorm(0.975)), this requires \(0.02\sqrt{n} = 1.96\), which gives \(n = (1.96/0.02)^2 = 9604 \approx 9600\).

So if we toss a coin about 10,000 times, there's about a 95% chance we get between 4900 and 5100 Heads.
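A short numeric check of B5, in Python (using the erf-based \(\Phi\), as in the A4 check):

```python
import math

Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))  # standard normal CDF

# Solve 0.02 * sqrt(n) = 1.96 for n, then verify the approximate probability.
n = (1.96 / 0.02) ** 2
p = Phi(0.02 * math.sqrt(n)) - Phi(-0.02 * math.sqrt(n))

print(round(n))     # 9604
print(round(p, 3))  # 0.95
```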

C: Assessed questions

The last two questions are assessed questions. These two questions count for 3% of your final mark for this module.

The deadline for submitting your solutions is 2pm on Monday 11 December at the beginning of Week 11. Submission will be via Gradescope; submission will open on Monday 5 November. Your work will be marked by your tutor and returned later, when solutions will also be made available.

Both questions are “long questions”, where the marks are not only for mathematical accuracy but also for the clarity and completeness of your explanations.

You should not collaborate with others on the assessed questions: your answers must represent solely your own work. The University’s rules on academic integrity – and the related punishments for violating them – apply to your work on the assessed questions.

C1. Let \(X\) be a continuous random variable with PDF \[ f(x) = \tfrac29 (2 - x) \qquad \text{for $-1 \leq x \leq c$} \] and \(f(x) = 0\) otherwise.

(a) Explaining your work, find the value of the constant \(c\).

Hint. Remember that the integral under a PDF must equal 1.

(b) What is \(\mathbb P(X > 1)\)?

Hint. This is standard.

(c) Calculate the expectation of \(X\).

Hint. This is standard.

(d) Calculate the variance of \(X\).

Hint. This is standard. I recommend using the computational formula, so start by finding \(\mathbb EX^2\).

C2. For each of the following, (a) calculate the exact value using R; (b) get an approximate value using an appropriate approximation and without using R. (Statistical tables are available.)

(i) \(\mathbb P(X \leq 3)\), where \(X \sim \mathrm{Bin}(1000, 0.005)\).

Hint. For the approximation, note that \(n\) is large and \(p\) is small.

(ii) \(\mathbb P(296 \leq Y \leq 307)\), where \(Y \sim \mathrm{Bin}(1200, 0.25)\).

Hint. For the approximation, note that \(n\) is large and \(p\) is not small.

(iii) \(\mathbb P(Z \geq 398)\), where \(Z \sim \mathrm{Bin}(400, 0.995)\).

Hint. This one will require you to think for yourself! I have not told you how to do this question. You might notice it looks a bit like the Poisson “small \(p\)” case, but mirror-imaged. How can you use this to your advantage?

Solutions to short questions

A1. (b) \(\tfrac{7}{16}\) (c) 1.5
A2. (a) 2 (b) \(\tfrac19\) (c) 2
A3. (a) 2 (b) 0.383
A4. (i) 0.1150 (ii) 0.6731 (or 0.6730 with tables) (iii) 0.6064
A5. (a) (i) 0.7002 (ii) 0.809 (b) (i) 0.6915 (ii) 0.807