\[ \newcommand{\Exg}{\operatorname{\mathbb{E}}} \newcommand{\Ex}{\mathbb{E}} \newcommand{\Ind}{\mathbb{I}} \newcommand{\Var}{\operatorname{Var}} \newcommand{\Cov}{\operatorname{Cov}} \newcommand{\Corr}{\operatorname{Corr}} \newcommand{\ee}{\mathrm{e}} \]
Today we are going to analyse the accuracy of Monte Carlo estimation. But before talking about Monte Carlo estimation specifically, let’s first remind ourselves of some concepts about error in statistical estimation more generally. We will use the following definitions.
Definition 3.1 Let \(\widehat\theta\) be an estimator of a parameter \(\theta\). Then we have the following definitions of the estimator \(\widehat\theta\):
The bias is \(\operatorname{bias}\big(\widehat\theta\big) = \mathbb E\big(\widehat\theta - \theta\big) = \mathbb E\widehat\theta - \theta\).
The mean-square error is \(\operatorname{MSE}\big(\widehat\theta\big) = \mathbb E \big(\widehat\theta - \theta\big)^2\).
The root-mean-square error is the square-root of the mean-square error, \[\operatorname{RMSE}\big(\widehat\theta\big) = \sqrt{\operatorname{MSE}(\widehat\theta)} = \sqrt{\mathbb E (\widehat\theta - \theta)^2} . \]
Usually, the main goal of estimation is to get the mean-square error of an estimate as small as possible, because the MSE measures how far we are from the true value on average. It can be easier to interpret the root-mean-square error, as the RMSE has the same units as the parameter being measured: if \(\theta\) and \(\widehat{\theta}\) are in metres, say, then the MSE is in metres-squared, whereas the RMSE is in metres again. If you minimise the MSE you also minimise the RMSE, and vice versa.
It’s nice to have an “unbiased” estimator – that is, one with bias 0. This is because bias measures any systematic error in a particular direction. However, unbiasedness by itself is not enough for an estimate to be good – we need low variance too. (Remember the old joke about the statistician who misses his first shot ten yards to the left, misses his second shot ten yards to the right, then claims to have “hit the target on average.”)
(Remember also that “bias” is simply the word statisticians use for \(\mathbb E(\widehat\theta - \theta)\); we don’t mean “bias” in the derogatory way it is sometimes used in political arguments, for example.)
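To make these definitions concrete, here is one way to approximate the bias, MSE and RMSE of a simple estimator by simulation in R. Purely as an illustration, we take the sample mean of 100 draws from an \(\operatorname{Exp}(2)\) distribution, whose true expectation is \(\theta = 0.5\).

theta <- 0.5                                      # true parameter: E(X) for X ~ Exp(2)
estimates <- replicate(1e4, mean(rexp(100, 2)))   # many independent copies of the estimator
bias_hat <- mean(estimates) - theta               # approximate bias
mse_hat <- mean((estimates - theta)^2)            # approximate mean-square error
rmse_hat <- sqrt(mse_hat)                         # approximate root-mean-square error
c(bias_hat, mse_hat, rmse_hat)

The bias should come out very close to zero, while the RMSE is around 0.05: on a typical run this estimator misses the true value by about 0.05.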
You probably also remember the relationship between the mean-square error, the bias, and the variance:
Theorem 3.1 \(\operatorname{MSE}\big(\widehat\theta\big) = \operatorname{bias}\big(\widehat\theta\big)^2 + \operatorname{Var}\big(\widehat\theta\big)\).
Proof. The MSE is \[\begin{align} \operatorname{MSE}\big(\widehat\theta\big) = \Exg\big(\widehat\theta - \theta\big)^2 &= \Exg \big(\widehat\theta^2 - 2\theta\widehat\theta + \theta^2\big) \\ &= \Exg \widehat\theta^2 - 2\theta \Exg \widehat\theta + \theta^2 , \end{align}\] where we have expanded the brackets and brought the expectation inside (remembering that \(\theta\) is a constant). Since the variance can be written as \(\Var(\widehat\theta) = \Exg\widehat\theta^2 - (\Exg \widehat\theta)^2\), we can use a cunning trick of both subtracting and adding \((\Exg \widehat\theta)^2\). This gives \[\begin{align} \operatorname{MSE}\big(\widehat\theta\big) &= \Exg \widehat\theta^2 - \big(\!\Exg \widehat\theta\big)^2 + \big(\!\Exg \widehat\theta\big)^2 - 2\theta \Exg \widehat\theta + \theta^2 \\ &= \Var\big(\widehat\theta\big) + \big( (\Exg \widehat\theta)^2 - 2\theta \Exg \widehat\theta + \theta^2 \big) \\ &= \Var\big(\widehat\theta\big) + \big( \! \Exg \widehat\theta - \theta\big)^2 \\ &= \Var\big(\widehat\theta\big) + \operatorname{bias}(\widehat\theta)^2 . \end{align}\] This proves the result.
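We can also check Theorem 3.1 numerically. As an illustration (not needed for the theory), take a deliberately biased estimator: the “divide by \(n\)” variance estimator of \(\sigma^2 = 1\) for Normal(0, 1) data, which has bias \(-\sigma^2/n\).

n <- 20
sims <- replicate(1e5, (n - 1) / n * var(rnorm(n)))   # divide-by-n variance estimator
mse_direct <- mean((sims - 1)^2)                      # MSE estimated directly
mse_decomp <- (mean(sims) - 1)^2 + var(sims)          # bias^2 + variance
c(mse_direct, mse_decomp)                             # the two should agree closely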
Since the bias contributes to the mean-square error, that’s another reason to like estimators with low – or preferably zero – bias. But again, unbiasedness isn’t enough by itself; we want low variance too. (There are some situations where there’s a “bias–variance tradeoff”, where allowing some bias reduces the variance and so can reduce the MSE. It turns out that Monte Carlo is not one of these cases, however.)
In this lecture, we’re going to be looking more carefully at the size of the errors made by the Monte Carlo estimator \[ \widehat{\theta}_n^{\mathrm{MC}} = \frac{1}{n} \big(\phi(X_1) + \phi(X_2) + \cdots + \phi(X_n) \big) = \frac{1}{n} \sum_{i=1}^n \phi(X_i) . \]
Our main result is the following.
Theorem 3.2 Let \(X\) be a random variable, \(\phi\) a function, and \(\theta = \Exg\phi(X)\). Let \[ \widehat{\theta}_n^{\mathrm{MC}} = \frac{1}{n} \sum_{i=1}^n \phi(X_i) \] be the Monte Carlo estimator of \(\theta\). Then:
\(\widehat{\theta}_n^{\mathrm{MC}}\) is unbiased, in that \(\operatorname{bias}\big(\widehat{\theta}_n^{\mathrm{MC}}\big) = 0\).
The variance of \(\widehat{\theta}_n^{\mathrm{MC}}\) is \({\displaystyle \operatorname{Var}\big(\widehat{\theta}_n^{\mathrm{MC}}\big) = \frac{1}{n} \operatorname{Var}\big(\phi(X)\big)}\).
The mean-square error of \(\widehat{\theta}_n^{\mathrm{MC}}\) is \({\displaystyle \operatorname{MSE}\big(\widehat{\theta}_n^{\mathrm{MC}}\big) = \frac{1}{n} \operatorname{Var}\big(\phi(X)\big)}\).
The root-mean-square error of \(\widehat{\theta}_n^{\mathrm{MC}}\) is \[{\displaystyle \operatorname{RMSE}\big(\widehat{\theta}_n^{\mathrm{MC}}\big) = \sqrt{\frac{1}{n} \operatorname{Var}\big(\phi(X)\big)} = \frac{1}{\sqrt{n}} \, \operatorname{sd}\big(\phi(X)\big)}. \]
Before we get to the proof, let’s recap some relevant probability.
Let \(Y_1, Y_2, \dots\) be IID random variables with common expectation \(\mathbb EY_1 = \mu\) and common variance \(\operatorname{Var}(Y_1) = \sigma^2\). Consider the mean of the first \(n\) random variables, \[ \overline{Y}_n = \frac{1}{n} \sum_{i=1}^n Y_i . \] Then the expectation of \(\overline{Y}_n\) is \[ \mathbb E \overline{Y}_n = \mathbb E\left(\frac{1}{n}\sum_{i=1}^n Y_i\right) = \frac{1}{n} \sum_{i=1}^n \mathbb{E}Y_i = \frac{1}{n}\,n\,\mu = \mu . \] The variance of \(\overline{Y}_n\) is \[ \operatorname{Var}\big( \overline{Y}_n \big)= \operatorname{Var} \left(\frac{1}{n}\sum_{i=1}^n Y_i\right) = \bigg(\frac{1}{n}\bigg)^2 \sum_{i=1}^n \operatorname{Var}(Y_i) = \frac{1}{n^2}\,n\,\sigma^2 = \frac{\sigma^2}{n} , \] where, for this one, we used the independence of the random variables.
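One way to sanity-check the second fact in R (a quick illustrative simulation, not needed for the proof): for Uniform(0, 1) variables, \(\sigma^2 = 1/12\), so the mean of \(n = 50\) of them should have variance \(\sigma^2/n = 1/600\).

means <- replicate(1e4, mean(runif(50)))   # many copies of the sample mean
c(var(means), 1/600)                       # empirical variance vs sigma^2 / n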
Proof. Apply the probability facts from above with \(Y_i = \phi(X_i)\). This gives:
\(\Ex \widehat{\theta}_n^{\mathrm{MC}} = \Ex \overline Y_n = \Ex Y_1 = \Exg \phi(X)\), so \(\operatorname{bias}(\widehat{\theta}_n^{\mathrm{MC}}) = \Exg \phi(X) - \Exg \phi(X) = 0\).
\({\displaystyle \operatorname{Var}\big(\widehat{\theta}_n^{\mathrm{MC}}\big) = \operatorname{Var}\big(\overline Y_n\big) = \frac{1}{n} \operatorname{Var}(Y_1) = \frac{1}{n} \operatorname{Var}\big(\phi(X)\big)}\).
Using Theorem 3.1, \[\operatorname{MSE}(\widehat{\theta}_n^{\mathrm{MC}}) = \operatorname{bias}(\widehat{\theta}_n^{\mathrm{MC}})^2 + \operatorname{Var}(\widehat{\theta}_n^{\mathrm{MC}}) = 0^2 + \frac{1}{n} \operatorname{Var}\big(\phi(X)\big) = \frac{1}{n} \operatorname{Var}\big(\phi(X)\big) . \]
Take the square root of part 3.
Let’s think about the MSE \(\frac{1}{n} \Var(\phi(X))\). The variance term \(\Var(\phi(X))\) is a fixed property of the random variable \(X\) and the function \(\phi\). So as \(n\) gets bigger, \(\frac{1}{n}\) gets smaller, so the MSE gets smaller and the estimator gets more accurate. This goes back to what we said when we introduced the Monte Carlo estimator: we get a more accurate estimate by increasing \(n\). More specifically, the MSE scales like \(1/n\), or – perhaps a more useful result – the RMSE scales like \(1/\sqrt{n}\). We’ll come back to this in the next lecture.
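As a quick illustration of that scaling (a simulation along the lines of Example 1.1, not needed for the theory), the RMSE of the Monte Carlo estimator of \(\mathbb E X = 0.5\) for \(X \sim \operatorname{Exp}(2)\) should roughly halve when \(n\) is quadrupled, since here \(\operatorname{RMSE} = \operatorname{sd}(X)/\sqrt{n} = 0.5/\sqrt{n}\).

rmse_for_n <- function(n, reps = 2000) {
  ests <- replicate(reps, mean(rexp(n, 2)))   # many Monte Carlo estimates, each using n samples
  sqrt(mean((ests - 0.5)^2))                  # their root-mean-square error
}
c(rmse_for_n(1000), rmse_for_n(4000))         # roughly 0.0158 and 0.0079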
So when we form a Monte Carlo estimate \(\widehat\theta_n^{\mathrm{MC}}\), we now know it will be unbiased. We’d also like to know its mean-square and/or root-mean-square error too.
There’s a problem here, though. The reason we are doing Monte Carlo estimation in the first place is that we couldn’t calculate \(\Exg \phi(X)\). So it seems very unlikely we’ll be able to calculate the variance \(\operatorname{Var}(\phi(X))\) either. So how will we be able to assess the mean-square (or root-mean-square) error of our Monte Carlo estimator?
Well, we can’t know it exactly. But we can estimate the variance from the samples we are already using: by taking the sample variance of the samples \(\phi(x_i)\). That is, we can estimate the variance of the Monte Carlo estimator by the sample variance \[ S^2 = \frac{1}{n-1} \sum_{i=1}^n \big(\phi(X_i) - \widehat{\theta}_n^{\mathrm{MC}} \big)^2 . \] Then we can similarly estimate the mean-square and root-mean-square errors by \[ \text{MSE} \approx \frac{1}{n}S^2 \qquad \text{and} \qquad \text{RMSE} \approx \sqrt{\frac{1}{n} S^2} = \frac{1}{\sqrt{n}}\,S \] respectively.
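In practice it can be convenient to wrap this up in a small helper function. The following is one possible sketch of such a helper – here called mc_estimate, though the name and interface are arbitrary – which returns the Monte Carlo estimate together with its estimated MSE and RMSE.

mc_estimate <- function(x, phi = identity) {
  phix <- phi(x)              # the values phi(x_i)
  n <- length(phix)
  est <- mean(phix)           # the Monte Carlo estimate
  S2 <- var(phix)             # sample variance of the phi(x_i)
  mse <- S2 / n               # estimated mean-square error
  c(estimate = est, MSE = mse, RMSE = sqrt(mse))
}

For instance, mc_estimate(rexp(1e6, 2)) gives the same kind of output we compute step by step in Example 3.1 below.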
Example 3.1 Let’s go back to the very first example in the module, Example 1.1, where we were trying to find the expectation of an \(\operatorname{Exp}(2)\) random variable. We used this R code:
n <- 1e6
samples <- rexp(n, 2)
MCest <- mean(samples)
MCest
[1] 0.4997326
(Because Monte Carlo estimation is random, this won’t be the exact same estimate we had before, of course.)
So if we want to investigate the error, we can use the sample variance of these samples, which we calculate with R’s var() function. In this simple case, the function is \(\phi(x) = x\), so we need only the variance of the samples themselves.
var_est <- var(samples)
MSEest <- var_est / n
RMSEest <- sqrt(MSEest)
c(var_est, MSEest, RMSEest)
[1] 2.497426e-01 2.497426e-07 4.997425e-04
The first number is var_est \(= 0.2497\), the sample variance of our \(\phi(x_i)\)s: \[ s^2 = \frac{1}{n-1} \sum_{i=1}^n \big(\phi(x_i) - \widehat{\theta}_n^{\mathrm{MC}}\big)^2 . \] This should be a good estimate of the true variance \(\operatorname{Var}(\phi(X))\). (In fact, in this simple case, we know that \(\operatorname{Var}(X) = \frac{1}{2^2} = 0.25\), so we know that the estimate is good.) In the code, we calculated this with R’s var() function, which computes the sample variance of a vector of values.
The second number is MSEest \(= 2.497\times 10^{-7}\), our estimate of the mean-square error. Since \(\operatorname{MSE}(\widehat{\theta}_n^{\mathrm{MC}}) = \frac{1}{n} \operatorname{Var}(\phi(X))\), the quantity \(\frac{1}{n} S^2\) should be a good estimate of the MSE.
The third number is RMSEest \(= 5\times 10^{-4}\), our estimate of the root-mean-square error, which is simply the square-root of our estimate of the mean-square error.
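Since in this simple example we happen to know \(\operatorname{Var}(X) = 0.25\) exactly, we can also work out the exact RMSE and compare it with our estimate – a check that won’t normally be available.

n <- 1e6
sqrt(0.25 / n)   # exact RMSE = 5e-4, very close to RMSEest above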
Example 3.2 In Example 2.1, we were estimating \(\mathbb P(Z > 2)\), where \(Z\) is a standard normal.
Our code was
n <- 1e6
samples <- rnorm(n)
MCest <- mean(samples > 2)
MCest
[1] 0.022299
So our root-mean-square error can be approximated as
MSEest <- var(samples > 2) / n
sqrt(MSEest)
[1] 0.0001476542
since samples > 2 is the indicator function of whether \(X_i > 2\) or not.
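Because \(\phi\) is an indicator here, \(\operatorname{Var}(\phi(Z)) = p(1-p)\) where \(p = \mathbb P(Z > 2)\), so in this particular case we could also work out the exact RMSE to compare against.

p <- pnorm(2, lower.tail = FALSE)   # exact P(Z > 2), about 0.0228
n <- 1e6
sqrt(p * (1 - p) / n)               # exact RMSE, about 0.000149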
Next time: We’ll continue analysing Monte Carlo error, looking at confidence intervals and assessing how many samples to take.
Summary:
The Monte Carlo estimator is unbiased.
The Monte Carlo estimator has mean-square error \(\Var(\phi(X))/n\), so the root-mean-square error scales like \(1/\sqrt{n}\).
The mean-square error can be estimated by \(S^2 / n\), where \(S^2\) is the sample variance of the \(\phi(X_i)\).
Read more: Voss, An Introduction to Statistical Computing, Subsection 3.2.2.