Lecture 14 Expectation and covariance

14.1 Expectation of sums and products

When we have multiple random variables, we might be interested in functions of those multiple random variables – for example their sum or their product. It’s often possible to find out about the whole distribution of a sum, product, or function of the variables – see MATH2715 Statistical Methods for more on this – but here we will just look at their expectations and, later, variances.

Theorem 14.1 Let \(X\) and \(Y\) be two random variables with joint probability mass function \(p_{X,Y}\). Then

  1. \(\mathbb Eg(X,Y) = \displaystyle\sum_{x,y} g(x,y) p_{X,Y}(x,y)\).
  2. (Linearity of expectation, 2) \(\mathbb E(X + Y) = \mathbb EX + \mathbb EY\), regardless of whether \(X\) and \(Y\) are independent or not.
  3. If \(X\) and \(Y\) are independent, then \(\mathbb EXY = \mathbb EX \times \mathbb EY\).

If we put the second point here together with the other result of linearity of expectation (Theorem 10.2) then we get the general rule \[ \mathbb E(aX + bY + c) = a\,\mathbb EX + b \,\mathbb EY + c , \] and this holds whether or not \(X\) and \(Y\) are independent.
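As a quick numerical sanity check of this rule (not part of the notes), the following sketch picks an arbitrary, made-up joint PMF for \((X,Y)\) — the two variables here are deliberately *not* independent — and verifies \(\mathbb E(aX + bY + c) = a\,\mathbb EX + b\,\mathbb EY + c\) by direct summation, using part 1 of Theorem 14.1.

```python
# Made-up joint PMF p_{X,Y}(x, y); the probabilities sum to 1,
# and X and Y are not independent.
joint = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4, (2, 1): 0.2}

a, b, c = 3.0, -2.0, 5.0

# E g(X, Y) = sum over (x, y) of g(x, y) p_{X,Y}(x, y)   (Theorem 14.1, part 1)
E_lhs = sum((a * x + b * y + c) * p for (x, y), p in joint.items())

# Marginal expectations, then the right-hand side a EX + b EY + c
EX = sum(x * p for (x, y), p in joint.items())
EY = sum(y * p for (x, y), p in joint.items())
E_rhs = a * EX + b * EY + c

print(E_lhs, E_rhs)   # both 6.3 -- linearity holds despite the dependence
```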

Proof. Part 1 is just the law of the unconscious statistician for the random variable \((X,Y)\), and the same proof holds.

For part 2, we have \[\begin{align*} \mathbb E(X + Y) &= \sum_{x,y} (x + y)p_{X,Y}(x,y) \\ &= \sum_{x,y} x\,p_{X,Y}(x,y) + \sum_{x,y} y\,p_{X,Y}(x,y) \\ &= \sum_x x \sum_y p_{X,Y}(x,y) + \sum_y y \sum_x p_{X,Y}(x,y) \end{align*}\] But summing a joint PMF over one of the variables gives the marginal PMF; so \(\sum_y p_{X,Y}(x,y) = p_X(x)\) and \(\sum_x p_{X,Y}(x,y) = p_Y(y)\). So this gives \[\begin{align*} \mathbb E(X + Y) &= \sum_x x\, p_X(x) + \sum_y y\,p_Y(y) \\ &= \mathbb EX + \mathbb EY . \end{align*}\]

For part 3, if \(X\) and \(Y\) are independent, then \(p_{X,Y}(x,y) = p_X(x) \, p_Y(y)\). Therefore, \[\begin{align*} \mathbb EXY &= \sum_{x,y} xy p_{X,Y}(x,y) \\ &= \sum_x \sum_y xy p_X(x) p_Y(y) \\ &= \sum_x x p_X(x) \sum_y y p_Y(y) \\ &= \mathbb EX \times \mathbb EY, \end{align*}\] as required.
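Part 3 can also be checked numerically. The sketch below (not from the notes) builds a joint PMF as a product \(p_X(x)\,p_Y(y)\), so \(X\) and \(Y\) are independent by construction, and confirms \(\mathbb EXY = \mathbb EX \times \mathbb EY\); the particular marginal PMFs are made up for illustration.

```python
# Made-up marginal PMFs; independence is imposed by taking the product.
pmf_X = {0: 0.5, 1: 0.3, 2: 0.2}
pmf_Y = {1: 0.6, 4: 0.4}

# E[XY] summed over the product joint PMF p_X(x) p_Y(y)
EXY = sum(x * y * px * py for x, px in pmf_X.items() for y, py in pmf_Y.items())

EX = sum(x * px for x, px in pmf_X.items())
EY = sum(y * py for y, py in pmf_Y.items())

print(EXY, EX * EY)   # both 1.54
```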

Example 14.1 Let \(X_1, X_2, \dots, X_n\) be IID \(\text{Bern}(p)\) random variables. We know that \(\mathbb EX_1 = p\) and \(\operatorname{Var}(X_1) = p(1-p)\).

Now let \(Y = X_1 + X_2 + \cdots + X_n\). Since each \(X_i\) indicates whether or not trial \(i\) was a success, this means \(Y\) counts the number of successes in \(n\) trials. This is a binomial random variable, \(Y \sim \text{Bin}(n,p)\).

We can use this structure, together with linearity of expectation, to calculate \[ \mathbb EY = \mathbb E(X_1 + \cdots + X_n) = \mathbb EX_1 + \cdots + \mathbb EX_n = n\, \mathbb EX_1 = np . \]


This has proved the expectation of the binomial distribution from Lecture 11. (Note that this would hold even if the trials weren’t independent.)
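A small simulation sketch (not part of the notes) illustrates this: generate \(Y = X_1 + \cdots + X_n\) with the \(X_i\) IID \(\text{Bern}(p)\), and check that the sample mean of \(Y\) is close to \(np\). The values of \(n\), \(p\) and the number of repetitions are arbitrary choices.

```python
import random

n, p, reps = 10, 0.3, 100_000

def bernoulli(p):
    """Return 1 with probability p, else 0."""
    return 1 if random.random() < p else 0

# Each sample of Y is a sum of n independent Bernoulli trials.
samples = [sum(bernoulli(p) for _ in range(n)) for _ in range(reps)]

print(sum(samples) / reps, n * p)   # sample mean should be close to np = 3
```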

14.2 Covariance

If we are interested in how two random variables vary together, we need to look at the covariance.

Definition 14.1 Let \(X\) and \(Y\) be two random variables with expectations \(\mathbb EX =\mu_X\) and \(\mathbb EY = \mu_Y\) respectively. Then their covariance is \[ \operatorname{Cov}(X,Y) = \mathbb E(X - \mu_X)(Y - \mu_Y) . \]

In the least surprising result of this whole module, we also have a computational formula to go along with this definitional formula.

Theorem 14.2 Let \(X\) and \(Y\) be two random variables with expectations \(\mu_X\) and \(\mu_Y\) respectively. Then their covariance can also be calculated as \[ \operatorname{Cov}(X,Y) = \mathbb EXY - \mu_X\, \mu_Y . \]

Proof. Exactly as we’ve done many times before, we have \[\begin{align*} \operatorname{Cov}(X,Y) &= \mathbb E(X - \mu_X)(Y - \mu_Y) \\ &= \mathbb E(XY - X\,\mu_Y - \mu_X\, Y + \mu_X\,\mu_Y) \\ &= \mathbb EXY - \mu_Y \,\mathbb EX - \mu_X \,\mathbb EY + \mu_X \, \mu_Y \\ &= \mathbb EXY - \mu_X \, \mu_Y - \mu_X \, \mu_Y + \mu_X \, \mu_Y \\ &= \mathbb EXY - \mu_X \, \mu_Y , \end{align*}\] and we’re done.

Example 14.2 We continue with our coin-tossing example from the previous lecture, where \(X\) is the number of Heads in the first two coin tosses and \(Y\) the number of Heads in the first three coin tosses.

We know that \(X \sim \text{Bin}(2, \frac12)\), so \(\mu_X = 1\), and \(Y \sim \text{Bin}(3, \frac12)\), so \(\mu_Y = 1.5\). To find the covariance using the computational formula, we also need \(\mathbb EXY\), which is \[\begin{align*} \mathbb EXY &= \sum_{x,y} xy\, p_{X,Y}(x,y) \\ &= 0\times 0\times p_{X,Y}(0,0) + 0 \times 1 \times p_{X,Y}(0,1) + \cdots + 2\times 3 \times p_{X,Y}(2,3) \\ &= 0 \times \tfrac18 + 0 \times \tfrac18 + \cdots + 6 \times \tfrac18 \\ &= 2. \end{align*}\] Hence the covariance is \[ \operatorname{Cov}(X,Y) = \mathbb EXY - \mu_X\mu_Y = 2 - 1 \times 1.5 = 0.5 .\]
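This example can be recomputed exactly with a short sketch (not part of the notes). The joint PMF below is the one for \(X\) = Heads in the first two tosses and \(Y\) = Heads in the first three tosses of a fair coin, as in the previous lecture; the code then evaluates the covariance both from the definition and from the computational formula.

```python
from fractions import Fraction as F

# Joint PMF p_{X,Y}(x, y) for the fair-coin example (previous lecture).
joint = {(0, 0): F(1, 8), (0, 1): F(1, 8),
         (1, 1): F(2, 8), (1, 2): F(2, 8),
         (2, 2): F(1, 8), (2, 3): F(1, 8)}

EX  = sum(x * p for (x, y), p in joint.items())        # 1
EY  = sum(y * p for (x, y), p in joint.items())        # 3/2
EXY = sum(x * y * p for (x, y), p in joint.items())    # 2

# Definitional formula E(X - mu_X)(Y - mu_Y) ...
cov_definition = sum((x - EX) * (y - EY) * p for (x, y), p in joint.items())
# ... and computational formula E[XY] - mu_X mu_Y agree.
cov_computational = EXY - EX * EY

print(EXY, cov_definition, cov_computational)   # 2, 1/2, 1/2
```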

A very important fact is the following.

Theorem 14.3 If \(X\) and \(Y\) are independent, then \(\operatorname{Cov}(X,Y) = 0\).

Be careful not to get this the wrong way around: if \(\operatorname{Cov}(X,Y) = 0\) it doesn’t necessarily mean that \(X\) and \(Y\) are independent.

To use the “contrapositive” (which is allowed!), in our example, we have \(\operatorname{Cov}(X,Y) \neq 0\), which means that \(X\) and \(Y\) are not independent – confirming what we already knew.

Proof. Recall from Theorem 14.1 that if \(X\) and \(Y\) are independent, we have \(\mathbb EXY = \mathbb EX \times \mathbb EY = \mu_X \, \mu_Y\). Then from the computational formula, we have \[ \operatorname{Cov}(X,Y) = \mathbb EXY - \mu_X\,\mu_Y = \mu_X\,\mu_Y - \mu_X\,\mu_Y = 0, \] and we are done.
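The warning above, that the converse of Theorem 14.3 fails, can be illustrated with a standard counterexample (not taken from these notes): take \(X\) uniform on \(\{-1, 0, 1\}\) and \(Y = X^2\). Then \(Y\) is completely determined by \(X\), so the two are certainly not independent, yet the covariance is zero.

```python
from fractions import Fraction as F

# X uniform on {-1, 0, 1}; Y = X^2 is a function of X.
pmf_X = {-1: F(1, 3), 0: F(1, 3), 1: F(1, 3)}

EX  = sum(x * p for x, p in pmf_X.items())          # 0
EY  = sum(x**2 * p for x, p in pmf_X.items())       # 2/3
EXY = sum(x * x**2 * p for x, p in pmf_X.items())   # E[X^3] = 0

print(EXY - EX * EY)   # covariance is 0
# Yet P(X = 1, Y = 0) = 0 while P(X = 1) P(Y = 0) = 1/3 * 1/3 = 1/9,
# so X and Y are not independent.
```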

Here are some more important properties of the covariance.

Theorem 14.4 Let \(X\), \(Y\) and \(Z\) be random variables. Then

  1. \(\operatorname{Cov}(X,Y) = \operatorname{Cov}(Y,X)\);
  2. \(\operatorname{Cov}(X,X) = \operatorname{Var}(X)\);
  3. \(\operatorname{Cov}(aX, Y) = a\,\operatorname{Cov}(X,Y)\);
  4. \(\operatorname{Cov}(X + b, Y) = \operatorname{Cov}(X,Y)\);
  5. \(\operatorname{Cov}(X + Y, Z) = \operatorname{Cov}(X, Z) + \operatorname{Cov}(Y,Z)\).

Proof. Parts 1 and 2 are immediate from the definition.

Parts 3, 4 and 5 are quite similar. We’ll do part 5 here, and you can do parts 3 and 4 on Problem Sheet 4.

For part 5, note that \(\mathbb E(X + Y) = \mu_X + \mu_Y\) by linearity of expectation. Hence \[\begin{align*} \operatorname{Cov}(X + Y, Z) &= \mathbb E \big((X + Y) - (\mu_X + \mu_Y)\big)(Z - \mu_Z) \\ &= \mathbb E \big((X - \mu_X) + (Y - \mu_Y)\big)(Z - \mu_Z) \\ &= \mathbb E \big((X - \mu_X)(Z - \mu_Z) + (Y - \mu_Y) (Z - \mu_Z) \big) \\ &= \mathbb E (X - \mu_X)(Z - \mu_Z) + \mathbb E (Y - \mu_Y) (Z - \mu_Z) \\ &= \operatorname{Cov}(X,Z) + \operatorname{Cov}(Y,Z) , \end{align*}\] as required.

Example 14.3 We could calculate the covariance in our coin-tossing example a different way, by noting that \(Y = X + Z\), where \(Z \sim \text{Bern}(\frac12)\) represents the third coin toss and is independent of \(X\). Then we have \[ \operatorname{Cov}(X,Y) = \operatorname{Cov}(X, X + Z) = \operatorname{Cov}(X, X) + \operatorname{Cov}(X, Z) = \operatorname{Var}(X) + 0 = \operatorname{Var}(X) ,\] where we used \(\operatorname{Cov}(X, Z) = 0\) since \(X\) and \(Z\) are independent. We already know that \(\operatorname{Var}(X) = 2 \times \tfrac12 \times (1 - \tfrac12) = \tfrac12\) because \(X \sim \text{Bin}(2, \frac12)\). So \(\operatorname{Cov}(X,Y) = \frac12\), matching our previous calculation.
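A quick simulation sketch (not part of the notes) of Example 14.3: simulate three fair coin tosses, take \(X\) to be the number of Heads in the first two and \(Z\) the third toss, and check that the sample covariance of \(X\) and \(Y = X + Z\) is close to \(\operatorname{Var}(X) = \frac12\). The number of repetitions is an arbitrary choice.

```python
import random

reps = 200_000
xs, ys = [], []
for _ in range(reps):
    tosses = [random.randint(0, 1) for _ in range(3)]  # 1 = Heads
    x = tosses[0] + tosses[1]   # Heads in the first two tosses
    y = x + tosses[2]           # Heads in all three tosses
    xs.append(x)
    ys.append(y)

mean_x = sum(xs) / reps
mean_y = sum(ys) / reps
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / reps

print(cov_xy)   # should be close to 0.5
```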

Now that we know some facts about the covariance, we can calculate the variance of a sum.

Theorem 14.5 Let \(X\) and \(Y\) be two random variables. Then \[ \operatorname{Var}(X + Y) = \operatorname{Var}(X) + 2\operatorname{Cov}(X,Y) + \operatorname{Var}(Y) . \]

If \(X\) and \(Y\) are independent, then \[ \operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) . \]

It’s easy to forget the conditions for the following two facts:

  • \(\mathbb E(X + Y) = \mathbb EX + \mathbb EY\) regardless of whether \(X\) and \(Y\) are independent or not.
  • \(\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)\) if \(X\) and \(Y\) are independent.

Proof. For the main part of the proof, we start with the definition of variance. By linearity of expectation, we have \(\mathbb E(X + Y) = \mu_X + \mu_Y\). So \[\begin{align*} \operatorname{Var}(X + Y) &= \mathbb E\big((X + Y) - (\mu_X + \mu_Y)\big)^2 \\ &= \mathbb E \big((X - \mu_X) + (Y - \mu_Y) \big)^2 \\ &= \mathbb E \big( (X - \mu_X)^2 + 2(X - \mu_X)(Y - \mu_Y) + (Y - \mu_Y)^2\big) \\ &= \mathbb E(X - \mu_X)^2 + 2 \mathbb E(X - \mu_X)(Y - \mu_Y) + \mathbb E (Y - \mu_Y)^2 \\ &= \operatorname{Var}(X) + 2\operatorname{Cov}(X,Y) + \operatorname{Var}(Y) , \end{align*}\] where we used the linearity of expectation.

For the second part, recall that if \(X\) and \(Y\) are independent, then \(\operatorname{Cov}(X,Y) = 0\).
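Theorem 14.5 can be checked exactly on a dependent pair: the sketch below (not from the notes) reuses the joint PMF of \((X, Y)\) from Example 14.2 and confirms that \(\operatorname{Var}(X+Y)\) computed directly from the joint PMF agrees with \(\operatorname{Var}(X) + 2\operatorname{Cov}(X,Y) + \operatorname{Var}(Y)\).

```python
from fractions import Fraction as F

# Joint PMF of (X, Y) from the fair-coin example.
joint = {(0, 0): F(1, 8), (0, 1): F(1, 8),
         (1, 1): F(2, 8), (1, 2): F(2, 8),
         (2, 2): F(1, 8), (2, 3): F(1, 8)}

EX = sum(x * p for (x, y), p in joint.items())
EY = sum(y * p for (x, y), p in joint.items())

var_x = sum((x - EX) ** 2 * p for (x, y), p in joint.items())
var_y = sum((y - EY) ** 2 * p for (x, y), p in joint.items())
cov   = sum((x - EX) * (y - EY) * p for (x, y), p in joint.items())

# Var(X + Y) computed directly from the joint PMF ...
E_sum   = EX + EY
var_sum = sum(((x + y) - E_sum) ** 2 * p for (x, y), p in joint.items())

# ... agrees with Var(X) + 2 Cov(X, Y) + Var(Y).
print(var_sum, var_x + 2 * cov + var_y)   # both 9/4
```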

Example 14.4 Returning to the “binomial as a sum of Bernoullis” example, we have \[ \operatorname{Var}(Y) = \operatorname{Var}(X_1 + \cdots + X_n) = \operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n) = n\operatorname{Var}(X_1) = np(1-p) . \] Here we used that the \(X_i\) are independent (the first I in “IID”).
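A small simulation sketch (not from the notes) checking Example 14.4: the sample variance of \(Y = X_1 + \cdots + X_n\) with IID \(\text{Bern}(p)\) summands should be close to \(np(1-p)\). Again, \(n\), \(p\) and the number of repetitions are arbitrary choices.

```python
import random

n, p, reps = 10, 0.3, 200_000

# Each sample of Y is a sum of n independent Bernoulli(p) trials.
samples = [sum(1 if random.random() < p else 0 for _ in range(n))
           for _ in range(reps)]

mean = sum(samples) / reps
var  = sum((s - mean) ** 2 for s in samples) / reps

print(var, n * p * (1 - p))   # the two numbers should be close (2.1)
```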

14.3 Correlation

It can sometimes be useful to “normalise” the covariance, by dividing through by the individual standard deviations. This gives a measurement of the linear relationship between two random variables.

Definition 14.2 Let \(X\) and \(Y\) be two random variables. Then the correlation between \(X\) and \(Y\) is \[ \operatorname{Corr}(X,Y) = \frac{\operatorname{Cov}(X,Y)} {\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}} . \]

As with the sample correlation \(r_{xy}\) from Section 1, the correlation is a number between \(-1\) and \(+1\), where values near \(+1\) mean that large values of \(X\) and large values of \(Y\) are likely to occur together, while values near \(-1\) mean that large values of \(X\) and small values of \(Y\) are likely to occur together.

Recall that, if \(X\) and \(Y\) are independent, then \(\operatorname{Cov}(X,Y) = 0\). Hence it follows that if \(X\) and \(Y\) are independent, then \(\operatorname{Corr}(X,Y) = 0\) also.

Example 14.5 For the coin-tossing again, we have \[ \operatorname{Corr}(X,Y) = \frac{\operatorname{Cov}(X,Y)} {\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}} = \frac{\frac12}{\sqrt{\frac12 \times \frac34}} = \sqrt{\tfrac23} \approx 0.816 . \]
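The same number can be recovered with a short sketch (not part of the notes) from the joint PMF used in Example 14.2.

```python
import math
from fractions import Fraction as F

# Joint PMF of (X, Y) for the fair-coin example.
joint = {(0, 0): F(1, 8), (0, 1): F(1, 8),
         (1, 1): F(2, 8), (1, 2): F(2, 8),
         (2, 2): F(1, 8), (2, 3): F(1, 8)}

EX = sum(x * p for (x, y), p in joint.items())
EY = sum(y * p for (x, y), p in joint.items())

var_x = sum((x - EX) ** 2 * p for (x, y), p in joint.items())   # 1/2
var_y = sum((y - EY) ** 2 * p for (x, y), p in joint.items())   # 3/4
cov   = sum((x - EX) * (y - EY) * p for (x, y), p in joint.items())  # 1/2

corr = cov / math.sqrt(var_x * var_y)
print(corr)   # approximately 0.816
```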

Summary

  • \(\mathbb E(X + Y) = \mathbb EX + \mathbb EY\)
  • The covariance is \(\operatorname{Cov}(X,Y) = \mathbb E(X - \mu_X)(Y - \mu_Y) = \mathbb EXY - \mu_X \,\mu_Y\).
  • \(\operatorname{Var}(X + Y) = \operatorname{Var}(X) + 2\operatorname{Cov}(X,Y) + \operatorname{Var}(Y)\); or if \(X\) and \(Y\) are independent, then \(\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)\).

Recommended reading: