\[ \newcommand{\Exg}{\operatorname{\mathbb{E}}} \newcommand{\Ex}{\mathbb{E}} \newcommand{\Ind}{\mathbb{I}} \newcommand{\Var}{\operatorname{Var}} \newcommand{\Cov}{\operatorname{Cov}} \newcommand{\Corr}{\operatorname{Corr}} \newcommand{\ee}{\mathrm{e}} \]
Today we are going to analyse the accuracy of Monte Carlo estimation. But before talking about Monte Carlo estimation specifically, let’s first remind ourselves of some concepts about error in statistical estimation more generally. We will use the following definitions.
Definition 3.1 Let \(\widehat\theta\) be an estimator of a parameter \(\theta\). Then we have the following definitions of the estimator \(\widehat\theta\):
The bias is \(\operatorname{bias}\big(\widehat\theta\big) = \mathbb E\big(\widehat\theta - \theta\big) = \mathbb E\widehat\theta - \theta\).
The mean-square error is \(\operatorname{MSE}\big(\widehat\theta\big) = \mathbb E \big(\widehat\theta - \theta\big)^2\).
The root-mean-square error is the square-root of the mean-square error, \[\operatorname{RMSE}\big(\widehat\theta\big) = \sqrt{\operatorname{MSE}(\widehat\theta)} = \sqrt{\mathbb E (\widehat\theta - \theta)^2} . \]
Usually, the main goal of estimation is to get the mean-square error of an estimate as small as possible, because the MSE measures how far we are from the true value on average. It can be easier to interpret the root-mean-square error, as the RMSE has the same units as the parameter being measured: if \(\theta\) and \(\widehat{\theta}\) are in metres, say, then the MSE is in metres-squared, whereas the RMSE is in metres again. If you minimise the MSE you also minimise the RMSE, and vice versa.
It’s nice to have an “unbiased” estimator – that is, one with bias 0. This is because bias measures any systematic error in a particular direction. However, unbiasedness by itself is not enough for an estimate to be good – we need low variance too. (Remember the old joke about the statistician who misses his first shot ten yards to the left, misses his second shot ten yards to the right, then claims to have “hit the target on average.”)
(Remember also that “bias” is simply the word statisticians use for \(\mathbb E(\widehat\theta - \theta)\); we don’t mean “bias” in the derogatory way it is sometimes used in political arguments, for example.)
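To make these definitions concrete, here is one way to approximate the bias, MSE and RMSE of a simple estimator by simulation in R. Purely as an illustration, we take the sample mean of 100 draws from an \(\operatorname{Exp}(2)\) distribution, whose true expectation is \(\theta = 0.5\).

theta <- 0.5                                      # true parameter: E(X) for X ~ Exp(2)
estimates <- replicate(1e4, mean(rexp(100, 2)))   # many independent copies of the estimator
bias_hat <- mean(estimates) - theta               # approximate bias
mse_hat <- mean((estimates - theta)^2)            # approximate mean-square error
rmse_hat <- sqrt(mse_hat)                         # approximate root-mean-square error
c(bias_hat, mse_hat, rmse_hat)

The bias should come out very close to zero, while the RMSE is around 0.05: on a typical run this estimator misses the true value by about 0.05.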
You probably also remember the relationship between the mean-square error, the bias, and the variance:
Theorem 3.1 \(\operatorname{MSE}\big(\widehat\theta\big) = \operatorname{bias}\big(\widehat\theta\big)^2 + \operatorname{Var}\big(\widehat\theta\big)\).
Proof. The MSE is \[\begin{align} \operatorname{MSE}\big(\widehat\theta\big) = \Exg\big(\widehat\theta - \theta\big)^2 &= \Exg \big(\widehat\theta^2 - 2\theta\widehat\theta + \theta^2\big) \\ &= \Exg \widehat\theta^2 - 2\theta \Exg \widehat\theta + \theta^2 , \end{align}\] where we have expanded the brackets and brought the expectation inside (remembering that \(\theta\) is a constant). Since the variance can be written as \(\Var(\widehat\theta) = \Exg\widehat\theta^2 - (\Exg \widehat\theta)^2\), we can use a cunning trick of both subtracting and adding \((\Exg \widehat\theta)^2\). This gives \[\begin{align} \operatorname{MSE}\big(\widehat\theta\big) &= \Exg \widehat\theta^2 - \big(\!\Exg \widehat\theta\big)^2 + \big(\!\Exg \widehat\theta\big)^2 - 2\theta \Exg \widehat\theta + \theta^2 \\ &= \Var\big(\widehat\theta\big) + \big( (\Exg \widehat\theta)^2 - 2\theta \Exg \widehat\theta + \theta^2 \big) \\ &= \Var\big(\widehat\theta\big) + \big( \! \Exg \widehat\theta - \theta\big)^2 \\ &= \Var\big(\widehat\theta\big) + \operatorname{bias}(\widehat\theta)^2 . \end{align}\] This proves the result.
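We can also check Theorem 3.1 numerically. As an illustration (not needed for the theory), take a deliberately biased estimator: the “divide by \(n\)” variance estimator of \(\sigma^2 = 1\) for Normal(0, 1) data, which has bias \(-\sigma^2/n\).

n <- 20
sims <- replicate(1e5, (n - 1) / n * var(rnorm(n)))   # divide-by-n variance estimator
mse_direct <- mean((sims - 1)^2)                      # MSE estimated directly
mse_decomp <- (mean(sims) - 1)^2 + var(sims)          # bias^2 + variance
c(mse_direct, mse_decomp)                             # the two should agree closely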
Since the bias contributes to the mean-square error, that’s another reason to like estimators with low – or preferably zero – bias. But again, unbiasedness isn’t enough by itself; we want low variance too. (There are some situations where there’s a “bias–variance tradeoff”, where allowing some bias reduces the variance and so can reduce the MSE. It turns out that Monte Carlo is not one of these cases, however.)
In this lecture, we’re going to be looking more carefully at the size of the errors made by the Monte Carlo estimator \[ \widehat{\theta}_n^{\mathrm{MC}} = \frac{1}{n} \big(\phi(X_1) + \phi(X_2) + \cdots + \phi(X_n) \big) = \frac{1}{n} \sum_{i=1}^n \phi(X_i) . \]
Our main result is the following.
Theorem 3.2 Let \(X\) be a random variable, \(\phi\) a function, and \(\theta = \Exg\phi(X)\). Let \[ \widehat{\theta}_n^{\mathrm{MC}} = \frac{1}{n} \sum_{i=1}^n \phi(X_i) \] be the Monte Carlo estimator of \(\theta\). Then:
\(\widehat{\theta}_n^{\mathrm{MC}}\) is unbiased, in that \(\operatorname{bias}\big(\widehat{\theta}_n^{\mathrm{MC}}\big) = 0\).
The variance of \(\widehat{\theta}_n^{\mathrm{MC}}\) is \({\displaystyle \operatorname{Var}\big(\widehat{\theta}_n^{\mathrm{MC}}\big) = \frac{1}{n} \operatorname{Var}\big(\phi(X)\big)}\).
The mean-square error of \(\widehat{\theta}_n^{\mathrm{MC}}\) is \({\displaystyle \operatorname{MSE}\big(\widehat{\theta}_n^{\mathrm{MC}}\big) = \frac{1}{n} \operatorname{Var}\big(\phi(X)\big)}\).
The root-mean-square error of \(\widehat{\theta}_n^{\mathrm{MC}}\) is \[{\displaystyle \operatorname{RMSE}\big(\widehat{\theta}_n^{\mathrm{MC}}\big) = \sqrt{\frac{1}{n} \operatorname{Var}\big(\phi(X)\big)} = \frac{1}{\sqrt{n}} \, \operatorname{sd}\big(\phi(X)\big)}. \]
Before we get to the proof, let’s recap some relevant probability.
Let \(Y_1, Y_2, \dots\) be IID random variables with common expectation \(\mathbb EY_1 = \mu\) and common variance \(\operatorname{Var}(Y_1) = \sigma^2\). Consider the mean of the first \(n\) random variables, \[ \overline{Y}_n = \frac{1}{n} \sum_{i=1}^n Y_i . \] Then the expectation of \(\overline{Y}_n\) is \[ \mathbb E \overline{Y}_n = \mathbb E\left(\frac{1}{n}\sum_{i=1}^n Y_i\right) = \frac{1}{n} \sum_{i=1}^n \mathbb{E}Y_i = \frac{1}{n}\,n\,\mu = \mu . \] The variance of \(\overline{Y}_n\) is \[ \operatorname{Var}\big( \overline{Y}_n \big)= \operatorname{Var} \left(\frac{1}{n}\sum_{i=1}^n Y_i\right) = \bigg(\frac{1}{n}\bigg)^2 \sum_{i=1}^n \operatorname{Var}(Y_i) = \frac{1}{n^2}\,n\,\sigma^2 = \frac{\sigma^2}{n} , \] where, for this one, we used the independence of the random variables.
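One way to sanity-check the second fact in R (a quick illustrative simulation, not needed for the proof): for Uniform(0, 1) variables, \(\sigma^2 = 1/12\), so the mean of \(n = 50\) of them should have variance \(\sigma^2/n = 1/600\).

means <- replicate(1e4, mean(runif(50)))   # many copies of the sample mean
c(var(means), 1/600)                       # empirical variance vs sigma^2 / n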
Proof. Apply the probability facts from above with \(Y_i = \phi(X_i)\). This gives:
\(\Ex \widehat{\theta}_n^{\mathrm{MC}} = \Ex \overline Y_n = \Ex Y_1 = \Exg \phi(X)\), so \(\operatorname{bias}(\widehat{\theta}_n^{\mathrm{MC}}) = \Exg \phi(X) - \Exg \phi(X) = 0\).
\({\displaystyle \operatorname{Var}\big(\widehat{\theta}_n^{\mathrm{MC}}\big) = \operatorname{Var}\big(\overline Y_n\big) = \frac{1}{n} \operatorname{Var}(Y_1) = \frac{1}{n} \operatorname{Var}\big(\phi(X)\big)}\).
Using Theorem 3.1, \[\operatorname{MSE}(\widehat{\theta}_n^{\mathrm{MC}}) = \operatorname{bias}(\widehat{\theta}_n^{\mathrm{MC}})^2 + \operatorname{Var}(\widehat{\theta}_n^{\mathrm{MC}}) = 0^2 + \frac{1}{n} \operatorname{Var}\big(\phi(X)\big) = \frac{1}{n} \operatorname{Var}\big(\phi(X)\big) . \]
Take the square root of part 3.
Let’s think about the MSE \(\frac{1}{n} \Var(\phi(X))\). The variance term \(\Var(\phi(X))\) is a fixed property of the random variable \(X\) and the function \(\phi\). So as \(n\) gets bigger, \(\frac{1}{n}\) gets smaller, so the MSE gets smaller and the estimator gets more accurate. This goes back to what we said when we introduced the Monte Carlo estimator: we get a more accurate estimate by increasing \(n\). More specifically, the MSE scales like \(1/n\), or – perhaps a more useful result – the RMSE scales like \(1/\sqrt{n}\). We’ll come back to this in the next lecture.
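As a quick illustration of that scaling (a simulation along the lines of Example 1.1, not needed for the theory), the RMSE of the Monte Carlo estimator of \(\mathbb E X = 0.5\) for \(X \sim \operatorname{Exp}(2)\) should roughly halve when \(n\) is quadrupled, since here \(\operatorname{RMSE} = \operatorname{sd}(X)/\sqrt{n} = 0.5/\sqrt{n}\).

rmse_for_n <- function(n, reps = 2000) {
  ests <- replicate(reps, mean(rexp(n, 2)))   # many Monte Carlo estimates, each using n samples
  sqrt(mean((ests - 0.5)^2))                  # their root-mean-square error
}
c(rmse_for_n(1000), rmse_for_n(4000))         # roughly 0.0158 and 0.0079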
So when we form a Monte Carlo estimate \(\widehat\theta_n^{\mathrm{MC}}\), we now know it will be unbiased. We’d also like to know its mean-square and/or root-mean-square error too.
There’s a problem here, though. The reason we are doing Monte Carlo estimation in the first place is that we couldn’t calculate \(\Exg \phi(X)\). So it seems very unlikely we’ll be able to calculate the variance \(\operatorname{Var}(\phi(X))\) either. So how will we be able to assess the mean-square (or root-mean-square) error of our Monte Carlo estimator?
Well, we can’t know it exactly. But we can estimate the variance from the samples we are already using: by taking the sample variance of the samples \(\phi(x_i)\). That is, we can estimate the variance of the Monte Carlo estimator by the sample variance \[ S^2 = \frac{1}{n-1} \sum_{i=1}^n \big(\phi(X_i) - \widehat{\theta}_n^{\mathrm{MC}} \big)^2 . \] Then we can similarly estimate the mean-square and root-mean-square errors by \[ \text{MSE} \approx \frac{1}{n}S^2 \qquad \text{and} \qquad \text{RMSE} \approx \sqrt{\frac{1}{n} S^2} = \frac{1}{\sqrt{n}}\,S \] respectively.
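In practice it can be convenient to wrap this up in a small helper function. The following is one possible sketch of such a helper – here called mc_estimate, though the name and interface are arbitrary – which returns the Monte Carlo estimate together with its estimated MSE and RMSE.

mc_estimate <- function(x, phi = identity) {
  phix <- phi(x)              # the values phi(x_i)
  n <- length(phix)
  est <- mean(phix)           # the Monte Carlo estimate
  S2 <- var(phix)             # sample variance of the phi(x_i)
  mse <- S2 / n               # estimated mean-square error
  c(estimate = est, MSE = mse, RMSE = sqrt(mse))
}

For instance, mc_estimate(rexp(1e6, 2)) gives the same kind of output we compute step by step in Example 3.1 below.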
Example 3.1 Let’s go back to the very first example in the module, Example 1.1, where we were trying to find the expectation of an \(\operatorname{Exp}(2)\) random variable. We used this R code:
n <- 1e6
samples <- rexp(n, 2)
MCest <- mean(samples)
MCest
[1] 0.4997326
(Because Monte Carlo estimation is random, this won’t be the exact same estimate we had before, of course.)
So if we want to investigate the error, we can use the sample variance of these samples, which we calculate with R’s var() function. In this simple case, the function is \(\phi(x) = x\), so we need only the variance of the samples themselves.
var_est <- var(samples)
MSEest <- var_est / n
RMSEest <- sqrt(MSEest)
c(var_est, MSEest, RMSEest)
[1] 2.497426e-01 2.497426e-07 4.997425e-04
The first number is var_est \(= 0.2497\), the sample variance of our \(\phi(x_i)\)s: \[ s^2 = \frac{1}{n-1} \sum_{i=1}^n \big(\phi(x_i) - \widehat{\theta}_n^{\mathrm{MC}}\big)^2 . \] This should be a good estimate of the true variance \(\operatorname{Var}(\phi(X))\). (In fact, in this simple case, we know that \(\operatorname{Var}(X) = \frac{1}{2^2} = 0.25\), so we know that the estimate is good.) In the code, we calculated this with R’s var() function, which computes the sample variance of a vector of values.
The second number is MSEest \(= 2.497\times 10^{-7}\), our estimate of the mean-square error. Since \(\operatorname{MSE}(\widehat{\theta}_n^{\mathrm{MC}}) = \frac{1}{n} \operatorname{Var}(\phi(X))\), the quantity \(\frac{1}{n} S^2\) should be a good estimate of the MSE.
The third number is RMSEest \(= 5\times 10^{-4}\), our estimate of the root-mean-square error, which is simply the square-root of our estimate of the mean-square error.
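Since in this simple example we happen to know \(\operatorname{Var}(X) = 0.25\) exactly, we can also work out the exact RMSE and compare it with our estimate – a check that won’t normally be available.

n <- 1e6
sqrt(0.25 / n)   # exact RMSE = 5e-4, very close to RMSEest above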
Example 3.2 In Example 2.1, we were estimating \(\mathbb P(Z > 2)\), where \(Z\) is a standard normal.
Our code was
n <- 1e6
samples <- rnorm(n)
MCest <- mean(samples > 2)
MCest
[1] 0.022299
So our root-mean-square error can be approximated as
MSEest <- var(samples > 2) / n
sqrt(MSEest)
[1] 0.0001476542
since samples > 2 is the indicator function of whether \(X_i > 2\) or not.
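Because \(\phi\) is an indicator here, \(\operatorname{Var}(\phi(Z)) = p(1-p)\) where \(p = \mathbb P(Z > 2)\), so in this particular case we could also work out the exact RMSE to compare against.

p <- pnorm(2, lower.tail = FALSE)   # exact P(Z > 2), about 0.0228
n <- 1e6
sqrt(p * (1 - p) / n)               # exact RMSE, about 0.000149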
Next time: We’ll continue analysing Monte Carlo error, looking at confidence intervals and assessing how many samples to take.
Summary:
The Monte Carlo estimator is unbiased.
The Monte Carlo estimator has mean-square error \(\Var(\phi(X))/n\), so the root-mean-square error scales like \(1/\sqrt{n}\).
The mean-square error can be estimated by \(S^2 / n\), where \(S^2\) is the sample variance of the \(\phi(X_i)\).
Read more: Voss, An Introduction to Statistical Computing, Subsection 3.2.2.