4  Monte Carlo error II: practice

4.1 Recap

Let’s recap where we’ve got to. We know that the Monte Carlo estimator for \(\theta = \Exg \phi(X)\) is \[ \widehat{\theta}_n^{\mathrm{MC}} = \frac{1}{n} \sum_{i=1}^n \phi(X_i) .\] Last time, we saw that the Monte Carlo estimator is unbiased, and that its mean-square and root-mean-square errors are \[ \operatorname{MSE}\big(\widehat{\theta}_n^{\mathrm{MC}}\big) = \frac{1}{n} \operatorname{Var}\big(\phi(X)\big) \qquad \operatorname{RMSE}\big(\widehat{\theta}_n^{\mathrm{MC}}\big) = \sqrt{\frac{1}{n} \operatorname{Var}\big(\phi(X)\big)} . \] We saw that these themselves can be estimated as \(S^2/n\) and \(S/\sqrt{n}\) respectively, where \(S^2\) is the sample variance of the \(\phi(X_i)\)s.

Let’s do one more example before moving on.

Example 4.1 In Example 2.1, we were estimating \(\mathbb P(Z > 2)\), where \(Z\) is a standard normal.

Our code was

n <- 1e6
samples <- rnorm(n)
MCest <- mean(samples > 2)
MCest
[1] 0.022686

So our root-mean-square error can be approximated as

RMSEest <- sqrt(var(samples > 2) / n)
RMSEest
[1] 0.0001489005
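In this example the true value happens to be available in closed form, so we can check the estimate directly (an aside, not part of the original example):

```r
# The true value of P(Z > 2) for a standard normal Z,
# for comparison with the Monte Carlo estimate above
truth <- pnorm(2, lower.tail = FALSE)
truth   # about 0.0227501
```

The estimate 0.022686 is about \(6 \times 10^{-5}\) away from the truth, comfortably within one estimated RMSE of \(1.5 \times 10^{-4}\).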

4.2 Confidence intervals

So far, we have described our error tolerance in terms of the MSE or RMSE. But we could have talked about “confidence intervals” or “margins of error” instead. This might be easier to understand for non-mathematicians, for whom “root-mean-square error” doesn’t really mean anything.

Here, we will want to appeal to the central limit theorem approximation. A bit more probability revision: Let \(Y_1, Y_2, \dots\) be IID again, with expectation \(\mu\) and variance \(\sigma^2\). Write \(\overline Y_n\) for the mean. We’ve already reminded ourselves that \(\mathbb E \overline Y_n = \mu\) and \(\Var(\overline{Y}_n) = \sigma^2/n\). But the central limit theorem says that the distribution of \(\overline Y_n\) is approximately normally distributed with those parameters, so \(\overline Y_n \approx \operatorname{N}(\mu, \sigma^2/n)\) when \(n\) is large. (This is an informal statement of the central limit theorem: you probably know more formal and precise ways to state it, but this will do for us.)
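As a quick sanity check of this normal approximation (an illustrative aside, not from the notes), we can simulate many sample means and compare their spread with the \(\sigma/\sqrt{n}\) that the central limit theorem predicts. Here we take \(Y \sim \operatorname{Exp}(1)\), so \(\mu = \sigma^2 = 1\).

```r
# Simulate many sample means of Exp(1) variables (mu = 1, sigma^2 = 1)
# and compare their standard deviation with sigma / sqrt(n)
set.seed(1)
n    <- 1000              # sample size for each mean
reps <- 10000             # number of sample means simulated
means <- replicate(reps, mean(rexp(n, rate = 1)))
sd(means)                 # should be close to 1 / sqrt(1000), about 0.0316
```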

Recall that a draw from the normal distribution \(\operatorname{N}(\mu, \sigma^2)\) lies within \(1.96\) standard deviations of the mean with 95% probability. More generally, it lies in the interval \([\mu - q_{1-\alpha/2}\sigma, \mu + q_{1-\alpha/2}\sigma]\), where \(q_{1-\alpha/2}\) is the \((1- \frac{\alpha}{2})\)-quantile of the standard normal distribution, with probability \(1 - \alpha\).

We can form an approximate confidence interval for a Monte Carlo estimate using this idea. We have our Monte Carlo estimator \(\widehat{\theta}_n^\mathrm{MC}\) as our estimator of the \(\mu\) parameter, and our estimator of the root-mean-square error \(S/\sqrt{n}\) as our estimator of the \(\sigma\) parameter. So our confidence interval is estimated as \[\bigg[ \widehat{\theta}_n^\mathrm{MC} - q_{1-\alpha/2}\,\frac{S}{\sqrt{n}}, \ \widehat{\theta}_n^\mathrm{MC} + q_{1-\alpha/2}\,\frac{S}{\sqrt{n}} \bigg] . \]

Example 4.2 We continue the example of Example 2.1 and Example 4.1, where we were estimating \(\mathbb P(Z > 2)\) for \(Z\) a standard normal.

n <- 1e6
samples <- rnorm(n)
MCest   <- mean(samples > 2)
RMSEest <- sqrt(var(samples > 2) / n)
MCest
[1] 0.022735

Our confidence interval is estimated as follows:

alpha <- 0.05
quant <- qnorm(1 - alpha / 2)
c(MCest - quant * RMSEest, MCest + quant * RMSEest)
[1] 0.02244285 0.02302715

4.3 How many samples do I need?

In our examples we’ve picked the number of samples \(n\) for our estimator, then approximated the error based on that. But we could do things the other way around – fix an error tolerance that we’re willing to deal with, then work out what sample size we need to achieve it.

We know that the root-mean-square error is \[ \operatorname{RMSE}\big(\widehat{\theta}_n^{\mathrm{MC}}\big) = \sqrt{\frac{1}{n} \operatorname{Var}\big(\phi(X)\big)} \] So if we want to get the RMSE down to \(\epsilon\), say, then this shows that we need \[ n = \frac{1}{\epsilon^2} \Var\big(\phi(X)\big) . \]

We still have a problem here, though. We (usually) don’t know \(\Var(\phi(X))\) – indeed, we can’t even estimate \(\Var(\phi(X))\) until we’ve already taken the samples. But we can use this idea with a three-step process:

  1. Run an initial “pilot” Monte Carlo algorithm with a small number of samples \(n\). Use the results of the “pilot” to estimate the variance \(S^2 \approx \Var(\phi(X))\). We want \(n\) small enough that this runs very quickly, but big enough that we get a reasonably OK estimate of the variance.

  2. Pick a desired RMSE accuracy \(\epsilon\). We now know that we require roughly \(N = S^2 / \epsilon^2\) samples to get our desired accuracy.

  3. Run the “real” Monte Carlo algorithm with this big number of samples \(N\). We will put up with this being quite slow, because we know we’re definitely going to get the error tolerance we need.

(We could potentially use further steps, where we now check the variance with the “real” big-\(N\) samples, and, if we learn we had underestimated in Step 1, take even more samples to correct for this.)
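The whole procedure, including that optional top-up, could be sketched as a reusable function. (This is a hypothetical helper, not from the notes; `phi_sampler(n)` is assumed to return \(n\) independent draws of \(\phi(X)\).)

```r
# Sketch of the pilot-then-real procedure, with the optional top-up step
mc_with_tolerance <- function(phi_sampler, epsilon, n_pilot = 1000) {
  pilot  <- phi_sampler(n_pilot)               # Step 1: quick pilot run
  n_real <- ceiling(var(pilot) / epsilon^2)    # Step 2: required sample size
  x <- phi_sampler(n_real)                     # Step 3: the "real" run
  # Optional extra step: if the pilot underestimated the variance,
  # take just enough further samples to make up the shortfall
  n_extra <- ceiling(var(x) / epsilon^2) - length(x)
  if (n_extra > 0) x <- c(x, phi_sampler(n_extra))
  list(estimate = mean(x), rmse = sqrt(var(x) / length(x)))
}

# Trying it on the running example, phi(X) = sin(X) with X ~ N(1, 2^2),
# using a modest tolerance so that it runs quickly
set.seed(42)
res <- mc_with_tolerance(function(n) sin(rnorm(n, 1, 2)), epsilon = 1e-3)
res$estimate   # roughly 0.114, with estimated RMSE at most about 1e-3
```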

Example 4.3 Let’s try this with Example 1.2 from before. We were trying to estimate \(\mathbb{E}(\sin X)\), where \(X \sim \operatorname{N}(1, 2^2)\).

We’ll start with just \(n = 1000\) samples, for our pilot study.

n_pilot <- 1000
samples <- rnorm(n_pilot, 1, 2)
var_est <- var(sin(samples))
var_est
[1] 0.4900872

This was very quick! We won’t have got a super-accurate estimate of \(\mathbb E\phi(X)\), but we have a reasonable idea of roughly what \(\operatorname{Var}(\phi(X))\) is. This will allow us to pick our “real” sample size in order to get a root-mean-square error of \(10^{-4}\).

epsilon <- 1e-4
n_real  <- round(var_est / epsilon^2)
n_real
[1] 49008721

This tells us that we will need about 50 million samples! This is a lot, but now we know we’re going to get the accuracy we want, so it’s worth it. (In this particular case, 50 million samples will only take a few seconds on a modern computer. But generally, once we know our code works and we know how many samples we will need for the desired accuracy, this is the sort of thing that we could leave running overnight or whatever.)

samples <- rnorm(n_real, 1, 2)
MCest <- mean(sin(samples))
MCest
[1] 0.1139167
RMSEest <- sqrt(var(sin(samples)) / n_real)
RMSEest
[1] 9.968996e-05

This was very slow, of course. But we see that we have indeed got our Monte Carlo estimate to (near enough) the desired accuracy.

Generally, if we want a more accurate Monte Carlo estimator, we can just take more samples. But the equation \[ n = \frac{1}{\epsilon^2} \Var\big(\phi(X)\big) \] is actually quite bad news. To get an RMSE of \(\epsilon\) we need order \(1/\epsilon^2\) samples. That’s not good. Think of it like this: to double the accuracy we need to quadruple the number of samples. Even worse: to get “one more decimal place of accuracy” means dividing \(\epsilon\) by ten; but that means multiplying the number of samples by one hundred!
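To make this concrete, here are the sample sizes the formula demands for the running example, using the pilot variance estimate of about 0.49 from Example 4.3:

```r
# Required sample sizes n = Var(phi(X)) / epsilon^2, using the
# pilot variance estimate from Example 4.3
var_est <- 0.49
for (epsilon in c(1e-2, 1e-3, 1e-4)) {
  cat(sprintf("epsilon = %.0e  needs  n = %.0f\n", epsilon, var_est / epsilon^2))
}
# each extra decimal place of accuracy multiplies the required n by 100
```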

More samples take more time, and cost more energy and money. Wouldn’t it be nice to have some better ways of increasing the accuracy of a Monte Carlo estimate besides just taking more and more samples?

Next time: We begin our study of clever “variance reduction” methods for Monte Carlo estimation.

Summary:

  • We can approximate confidence intervals for a Monte Carlo estimate by using a normal approximation.

  • To get the root-mean-square error below \(\epsilon\) we need \(n = \Var(\phi(X))/\epsilon^2\) samples.

  • We can use a three-step process, where a small “pilot” Monte Carlo estimation allows us to work out how many samples we will need for the big “real” estimation.

Read more: Voss, An Introduction to Statistical Computing, Subsections 3.2.2–3.2.4.