\[\newcommand{\Exg}{\operatorname{\mathbb{E}}} \newcommand{\Ex}{\mathbb{E}} \newcommand{\Ind}{\mathbb{I}} \newcommand{\Var}{\operatorname{Var}} \newcommand{\Cov}{\operatorname{Cov}} \newcommand{\Corr}{\operatorname{Corr}} \newcommand{\ee}{\mathrm{e}}\]
In this lecture and the next, we will be looking at our second variance reduction method for Monte Carlo estimation: the use of antithetic variables. The word “antithetic” refers to using negative correlation to reduce the variance of an estimator.
Let’s start with the simple example of estimating an expectation from \(n = 2\) samples. Suppose \(Y\) has expectation \(\mu = \Ex Y\) and variance \(\Var(Y) = \sigma^2\). Suppose \(Y_1\) and \(Y_2\) are independent samples from \(Y\). Then the Monte Carlo estimator is \[ \overline Y = \tfrac12(Y_1 + Y_2) . \] This estimator is unbiased, since \[ \Ex \overline Y = \Ex \big(\tfrac12(Y_1 + Y_2)\big) = \tfrac12 ( \Ex Y_1 + \Ex Y_2 ) = \tfrac12 (\mu + \mu) = \mu . \] Thus the mean-square error equals the variance, which is \[ \Var \big( \overline Y\big) = \Var \big(\tfrac12(Y_1 + Y_2)\big) =\tfrac14 \big( \Var(Y_1) + \Var(Y_2) \big)= \tfrac14 (\sigma^2 + \sigma^2) = \tfrac12 \sigma^2 . \]
But what if \(Y_1\) and \(Y_2\) still have the same distribution as \(Y\) but now are not independent? The expectation is still the same, so the estimator is still unbiased. But the variance (and hence mean-square error) is now \[ \Var \big( \overline Y\big) = \Var \big(\tfrac12(Y_1 + Y_2)\big) =\tfrac14 \big( \Var(Y_1) + \Var(Y_2) + 2 \Cov(Y_1, Y_2) \big) . \] Write \(\rho\) for the correlation \[ \rho = \Corr(Y_1, Y_2) = \frac{\Cov(Y_1, Y_2)}{\sqrt{\Var(Y_1) \Var(Y_2)}} = \frac{\Cov(Y_1, Y_2)}{\sqrt{\sigma^2 \times \sigma^2}} = \frac{\Cov(Y_1, Y_2)}{\sigma^2} . \] (Remember that \(-1 \leq \rho \leq +1\).) Then the variance is \[ \Var \big( \overline Y\big) = \tfrac14 ( \sigma^2 + \sigma^2 + 2 \rho \sigma^2 ) = \frac{1+\rho}{2} \,\sigma^2 . \]
We can compare this with the variance \(\frac12 \sigma^2\) from the independent-sample case:
If \(Y_1\) and \(Y_2\) are positively correlated, in that \(\rho > 0\), then the variance, and hence the mean-square error, has got bigger. This means the estimator is worse. This is because, with positive correlation, errors compound each other – if one sample is bigger than average, then the other one is likely to be bigger than average too; while if one sample is smaller than average, then the other one is likely to be smaller than average too.
If \(Y_1\) and \(Y_2\) are negatively correlated, in that \(\rho < 0\), then the variance, and hence the mean-square error, has got smaller. This means the estimator is better. This is because, with negative correlation, errors compensate for each other – if one sample is bigger than average, then the other one is likely to be smaller than average, which will help “cancel out” the error.
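As a quick numerical check of the formula \(\Var\big(\overline Y\big) = \frac{1+\rho}{2}\,\sigma^2\), here is a small R sketch. (The bivariate normal construction and the value \(\rho = -0.8\) are illustrative choices, not part of the argument above.)

rho <- -0.8                                       # illustrative negative correlation
reps <- 1e5                                       # repetitions of the two-sample experiment
Y1 <- rnorm(reps)                                 # first sample, variance 1
Y2 <- rho * Y1 + sqrt(1 - rho^2) * rnorm(reps)    # same distribution, correlation rho with Y1
Ybar <- (Y1 + Y2) / 2                             # the two-sample estimator
var(Ybar)                                         # empirical variance of the estimator
(1 + rho) / 2                                     # theoretical value (1 + rho)/2 * sigma^2, with sigma^2 = 1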
We have seen that negative correlation helps improve estimation from \(n=2\) samples. How can we make this work in our favour for Monte Carlo simulation with many more samples?
We will look at the idea of antithetic pairs. So instead of taking \(n\) samples \[ X_1, X_2, \dots, X_n \] that are all independent of each other, we will take \(n/2\) pairs of samples \[ (X_1, X'_1), (X_2, X'_2), \dots, (X_{n/2}, X'_{n/2}) . \] (Here, \(n/2\) pairs means \(n\) samples in total.) Within each pair, \(X_i\) and \(X_i'\) will not be independent, but between different pairs \(i \neq j\), \((X_i, X_i')\) and \((X_j, X'_j)\) will be independent.
Definition 6.1 Let \(X\) be a random variable, \(\phi\) a function, and write \(\theta = \Exg\phi(X)\). Let \(X'\) have the same distribution as \(X\) (but not necessarily be independent of it). Suppose that \((X_1, X_1')\), \((X_2, X_2')\), \(\dots\), \((X_{n/2}, X'_{n/2})\) are pairs of random samples from \((X, X')\). Then the antithetic variables Monte Carlo estimator \(\widehat\theta_n^{\mathrm{AV}}\) of \(\theta\) is \[ \widehat{\theta}_n^{\mathrm{AV}} = \frac{1}{n} \sum_{i=1}^{n/2} \big(\phi(X_i) + \phi(X'_i) \big) .\]
The expression above for \(\widehat{\theta}_n^{\mathrm{AV}}\) makes it clear that this estimator sums the contribution \(\phi(X_i) + \phi(X'_i)\) from each pair and divides by the total number of samples \(n\). Alternatively, we can rewrite the estimator as \[ \widehat{\theta}_n^{\mathrm{AV}} = \frac{1}{2} \left( \frac{1}{n/2} \sum_{i=1}^{n/2} \phi(X_i) + \frac{1}{n/2} \sum_{i=1}^{n/2} \phi(X_i') \right) , \] which highlights that it is the mean of the estimator from the \(X_i\)s and the estimator from the \(X'_i\)s.
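As a sketch, the antithetic variables estimator can be wrapped in a small R function. The names av_estimate, rsample, antithetic and phi below are hypothetical choices for illustration: rsample(m) should draw \(m\) independent copies of \(X\), and antithetic(x) should return the paired value \(X'\) for each entry of x.

av_estimate <- function(n, rsample, antithetic, phi) {
  x  <- rsample(n / 2)      # first variate of each pair
  xa <- antithetic(x)       # antithetic partner: same distribution, not independent
  (1 / n) * sum(phi(x) + phi(xa))
}

For instance, Example 6.2 below corresponds to av_estimate(1e6, runif, function(u) 1 - u, sin).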
Example 6.1 Recall Example 2.1 (continued in Example 4.1 and Example 4.2). Here, we were estimating \(\mathbb P(Z > 2)\) for \(Z\) a standard normal.
The basic Monte Carlo estimate was
n <- 1e6
samples <- rnorm(n)
MCest <- mean(samples > 2)
MCest
[1] 0.022713
Can we improve this estimate with an antithetic variable? Well, if \(Z\) is a standard normal, then \(Z' = -Z\) is also standard normal and is not independent of \(Z\). So maybe that could work as an antithetic variable. Let’s try
n <- 1e6
samples1 <- rnorm(n / 2)
samples2 <- -samples1
AVest <- (1 / n) * sum((samples1 > 2) + (samples2 > 2))
AVest
[1] 0.022743
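For reference, the true value being estimated is \(\mathbb P(Z > 2) = 1 - \Phi(2)\), which R can compute directly:

pnorm(2, lower.tail = FALSE)
[1] 0.02275013

Both estimates above are close to this value; whether the antithetic variables estimate is actually more accurate is a question we return to below.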
Example 6.2 Let’s consider estimating \(\mathbb E \sin U\), where \(U\) is continuous uniform on \([0,1]\).
The basic Monte Carlo estimate is
n <- 1e6
samples <- runif(n)
MCest <- mean(sin(samples))
MCest
[1] 0.4596232
We used runif(n, min, max) to generate \(n\) samples on the interval \([\mathtt{min}, \mathtt{max}]\). However, if you omit the min and max arguments, then R assumes the default values min = 0, max = 1, which is what we want here.
If \(U\) is uniform on \([0,1]\), then \(1 - U\) is also uniform on \([0,1]\). We could try using that as an antithetic variable.
n <- 1e6
samples1 <- runif(n / 2)
samples2 <- 1 - samples1
AVest <- (1 / n) * sum(sin(samples1) + sin(samples2))
AVest
[1] 0.4597191
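For comparison, the true value here is \(\Ex \sin U = \int_0^1 \sin u \,\mathrm{d}u = 1 - \cos 1\), which we can compute directly:

1 - cos(1)
[1] 0.4596977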
We have taken \(n/2\) pairs of samples, because that means we have \(n/2 \times 2 = n\) samples in total, which seems like a fair comparison. This is certainly the case if generating a sample and generating its antithetic partner cost roughly the same in terms of time (or energy, or money). However, if generating the first variate of each pair is slow, but generating the second, antithetic variate is much quicker, it might be fairer to take a full \(n\) pairs.
More generally, you might put a cost \(c_1\) on generating each first variate and a cost \(c_2\) on generating its antithetic partner. Then one can compare a cost of \(c_1 n\) for standard Monte Carlo with \(n\) samples to a cost of \((c_1 + c_2)m\) for antithetic variables Monte Carlo with \(m\) pairs.
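As a concrete sketch of this comparison (the costs below are made-up illustrative numbers, not from the lecture): with a total budget \(B\), standard Monte Carlo affords \(n = B/c_1\) samples, while antithetic variables Monte Carlo affords \(m = B/(c_1 + c_2)\) pairs.

c1 <- 1           # illustrative cost of generating a first variate
c2 <- 0.1         # illustrative cost of generating its antithetic partner
B <- 1e6          # total budget
B / c1            # samples affordable with standard Monte Carlo
B / (c1 + c2)     # pairs affordable with antithetic variables Monte Carlo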
Are these antithetic variables estimates an improvement on the basic Monte Carlo estimate? We’ll find out next time.
Next time: We continue our study of the antithetic variables method with more examples and analysis of the error.
Summary:
Estimation is helped by combining individual estimates that are negatively correlated.
For antithetic variables Monte Carlo estimation, we take pairs of non-independent variables \((X, X')\), to get the estimator \[ \widehat{\theta}_n^{\mathrm{AV}} = \frac{1}{n} \sum_{i=1}^{n/2} \big(\phi(X_i) + \phi(X'_i) \big) . \]
On Problem Sheet 1, you should now be able to answer all questions. You should work through this problem sheet in advance of the problems class on Thursday 17 October.
Read more: Voss, An Introduction to Statistical Computing, Subsection 3.3.2.