7  Antithetic variables II

7.1 Error with antithetic variables

Recall from last time the antithetic variables Monte Carlo estimator. We take sample pairs \[ (X_1, X'_1), (X_2, X'_2), \dots, (X_{n/2}, X_{n/2}') , \] where samples are independent between different pairs but not independent within the same pair. The estimator of \(\theta = \Exg \phi(X)\) is \[ \widehat{\theta}_n^{\mathrm{AV}} = \frac{1}{n} \sum_{i=1}^{n/2} \big(\phi(X_i) + \phi(X'_i) \big) .\] We hope this is better than the standard Monte Carlo estimator if \(\phi(X)\) and \(\phi(X')\) are negatively correlated.

Theorem 7.1 Let \(X\) be a random variable, \(\phi\) a function, and \(\theta = \Exg\phi(X)\). Let \(X'\) have the same distribution as \(X\), and write \(\rho = \operatorname{Corr}\big(\phi(X),\phi(X')\big)\). Let the pairs \((X_1, X_1'), (X_2, X_2'), \dots, (X_{n/2}, X_{n/2}')\) be independent of each other, each with the same joint distribution as \((X, X')\), and let \[ \widehat{\theta}_n^{\mathrm{AV}} = \frac{1}{n} \sum_{i=1}^{n/2} \big(\phi(X_i) + \phi(X_i')\big) \] be the antithetic variables Monte Carlo estimator of \(\theta\). Then:

  1. \(\widehat{\theta}_n^{\mathrm{AV}}\) is unbiased, in that \(\operatorname{bias}\big(\widehat{\theta}_n^{\mathrm{AV}}\big) = 0\).

  2. The variance of \(\widehat{\theta}_n^{\mathrm{AV}}\) is \[ \operatorname{Var}\big(\widehat{\theta}_n^{\mathrm{AV}}\big) = \frac{1}{2n} \operatorname{Var}\big(\phi(X) + \phi(X')\big) = \frac{1+\rho}{n}\Var\big(\phi(X)\big). \]

  3. The mean-square error of \(\widehat{\theta}_n^{\mathrm{AV}}\) is \[ \operatorname{MSE}\big(\widehat{\theta}_n^{\mathrm{AV}}\big) = \frac{1}{2n} \operatorname{Var}\big(\phi(X) + \phi(X')\big) = \frac{1+\rho}{n}\Var\big(\phi(X)\big). \]

  4. The root-mean-square error of \(\widehat{\theta}_n^{\mathrm{AV}}\) is \[ \operatorname{RMSE}\big(\widehat{\theta}_n^{\mathrm{AV}}\big) = \frac{1}{\sqrt{2n}} \sqrt{\operatorname{Var}\big(\phi(X) + \phi(X')\big)} = \frac{\sqrt{1+\rho}}{\sqrt{n}}\sqrt{\Var\big(\phi(X)\big)}. \]

In points 2, 3 and 4, generally the first expression, involving the variance \(\operatorname{Var}(\phi(X) + \phi(X'))\), is the most convenient for computation. We can estimate this easily from data using the sample variance in the usual way (as we will in the examples below).

The second expression, involving the correlation \(\rho\), is usually clearer for understanding. Comparing these with the same results for the standard Monte Carlo estimator (Theorem 3.2), we see that the antithetic variables method is an improvement (that is, has a smaller mean-square error) when \(\rho < 0\), but is worse when \(\rho > 0\). So it is precisely negative correlation between \(\phi(X)\) and \(\phi(X')\) that makes the antithetic variables estimator better than standard Monte Carlo.

Proof. For unbiasedness, we have \[ \Ex \widehat{\theta}_n^{\mathrm{AV}} = \Ex \left(\frac{1}{n} \sum_{i=1}^{n/2} \big(\phi(X_i) + \phi(X_i')\big)\right) = \frac{1}{n} \,\frac{n}{2} \big(\Exg\phi(X) + \Exg \phi(X')\big) = \frac{1}{2}(\theta + \theta) = \theta ,\] since \(X'\) has the same distribution as \(X\).

For the other three points, each of the first expressions follows in essentially the same way as for the standard Monte Carlo estimator (Theorem 3.2), using the independence of the \(n/2\) pairs. (You can fill in the details yourself, if you need to.) For the second expressions, we have \[\begin{align*} \Var \big(\phi(X) + \phi(X')\big) &= \Var\big(\phi(X)\big) + \Var\big(\phi(X')\big) + 2\operatorname{Cov}\big(\phi(X),\phi(X')\big) \\ &= \Var\big(\phi(X)\big) + \Var\big(\phi(X')\big) + 2\rho\sqrt{\Var\big(\phi(X)\big) \Var\big(\phi(X')\big)} \\ &= \Var\big(\phi(X)\big) + \Var\big(\phi(X)\big) + 2\rho\sqrt{\Var\big(\phi(X)\big) \Var\big(\phi(X)\big)} \\ &= 2(1+\rho)\Var\big(\phi(X)\big) , \end{align*}\] where the third line uses that \(\phi(X')\) has the same distribution, and hence the same variance, as \(\phi(X)\). The results then follow.
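The key identity in this proof, \(\operatorname{Var}\big(\phi(X) + \phi(X')\big) = 2(1+\rho)\Var\big(\phi(X)\big)\), is easy to sanity-check by simulation. Here is a minimal sketch in R, using for illustration the uniform example from last time with \(\phi = \sin\) and \(X' = 1 - X\); up to sampling error, the two quantities should agree.

x      <- runif(1e6)
x_anti <- 1 - x

# Left-hand side: sample variance of phi(X) + phi(X')
var(sin(x) + sin(x_anti))

# Right-hand side: 2 (1 + rho) Var(phi(X)), with rho estimated by the sample correlation
2 * (1 + cor(sin(x), sin(x_anti))) * var(sin(x))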

7.2 Examples

Let’s return to the two examples we tried last time.

Example 7.1 In Example 6.1, we were estimating \(\mathbb P(Z > 2)\) for \(Z\) a standard normal.

The basic Monte Carlo estimate and its root-mean-square error are

n <- 1e6
samples <- rnorm(n)
MCest   <- mean(samples > 2)
MC_RMSE <- sqrt(var(samples > 2) / n)
c(MCest, MC_RMSE)
[1] 0.0225920000 0.0001485989

We then used \(Z' = -Z\) as an antithetic variable. The antithetic variables estimate and its root-mean-square error are

n <- 1e6
samples1 <- rnorm(n / 2)
samples2 <- -samples1
AVest <- (1 / n) * sum((samples1 > 2) + (samples2 > 2))
AV_RMSE <- sqrt(var((samples1 > 2) + (samples2 > 2)) / (2 * n))
c(AVest, AV_RMSE)
[1] 0.0227100000 0.0001472364

This looked like it made very little difference – perhaps a small improvement. This can be confirmed by looking at the sample correlation with R’s cor() function.

cor(samples1 > 2, samples2 > 2)
[1] -0.02323712

We see there was a very small but negative correlation: the variance, and hence the mean-square error, was reduced by about 2%.
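To connect this back to Theorem 7.1: with the same total number of samples, the antithetic variance is \(1+\rho\) times the standard Monte Carlo variance. A quick check of this factor (assuming samples1 and samples2 from the code above are still in your R session):

rho <- cor(samples1 > 2, samples2 > 2)
1 + rho   # variance reduction factor: about 0.98, i.e. roughly a 2% reduction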

Example 7.2 In Example 6.2, we were estimating \(\mathbb E \sin U\), where \(U\) is continuous uniform on \([0,1]\).

The basic Monte Carlo estimate and its root-mean-square error are

n <- 1e6
samples <- runif(n)
MCest <- mean(sin(samples))
MC_RMSE <- sqrt(var(sin(samples)) / n)
c(MCest, MC_RMSE)
[1] 0.459211568 0.000247734

We then used \(U' = 1 - U\) as an antithetic variable. The antithetic variables estimate and its root-mean-square error are

n <- 1e6
samples1 <- runif(n / 2)
samples2 <- 1 - samples1
AVest <- (1 / n) * sum(sin(samples1) + sin(samples2))
AV_RMSE <- sqrt(var(sin(samples1) + sin(samples2)) / (2 * n))
c(AVest, AV_RMSE)
[1] 4.596740e-01 2.483349e-05

This time, we see a big improvement: the root-mean-square error has gone down by a whole order of magnitude, from \(2\times 10^{-4}\) to \(2\times 10^{-5}\). It would normally take 100 times as many samples to reduce the RMSE by a factor of 10, but we’ve got the extra 99 million samples for free by using antithetic variables!

The benefit here can be confirmed by looking at the sample correlation.

cor(sin(samples1), sin(samples2))
[1] -0.9899602

That’s a very large negative correlation, which shows why the antithetic variables made such a huge improvement.
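As in Example 7.1, we can tie the improvement back to Theorem 7.1: the RMSE should shrink by a factor of roughly \(\sqrt{1+\rho}\) compared with standard Monte Carlo. A quick check (again assuming the objects from the two code blocks above are still in your session):

sqrt(1 + cor(sin(samples1), sin(samples2)))   # predicted RMSE ratio: about 0.1
AV_RMSE / MC_RMSE                             # observed RMSE ratio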

7.3 Finding antithetic variables

Antithetic variables can provide a huge advantage compared to standard Monte Carlo, as we saw in the second example above. The downside is that it can often be difficult to find an appropriate antithetic variable.

To even be able to try the antithetic variables method, we need to find a random variable \(X'\) with the same distribution as \(X\) that isn’t merely an independent copy. Both the examples we have seen so far use a symmetric distribution; that is, a distribution for which \(X' = a - X\) has the same distribution as \(X\), for some constant \(a\).

  • We saw that if \(X \sim \operatorname{N}(0, 1)\) is a standard normal distribution, then \(X' = -X \sim \operatorname{N}(0, 1)\) too. More generally, if \(X\sim \operatorname{N}(\mu, \sigma^2)\), then \(X' = 2\mu - X \sim \operatorname{N}(\mu, \sigma^2)\) can be tried as an antithetic variable.

  • We saw that if \(U \sim \operatorname{U}[0, 1]\) is a continuous uniform distribution on \([0,1]\), then \(U' = 1-U \sim \operatorname{U}[0, 1]\) too. More generally, if \(X\sim \operatorname{U}[a, b]\), then \(X' = (a + b) - X \sim \operatorname{U}[a, b]\) can be tried as an antithetic variable. (Both constructions are sketched in code after this list.)
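Both constructions take only a line or two in R. Here is a minimal sketch; the parameter values mu, sigma, a and b below are arbitrary choices, just for illustration.

n <- 1e6

# Normal: X ~ N(mu, sigma^2) with antithetic partner X' = 2*mu - X
mu    <- 1
sigma <- 2
x      <- rnorm(n / 2, mean = mu, sd = sigma)
x_anti <- 2 * mu - x

# Uniform: X ~ U[a, b] with antithetic partner X' = (a + b) - X
a <- 0
b <- 3
u      <- runif(n / 2, min = a, max = b)
u_anti <- (a + b) - u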

Later, when we study the inverse transform method (in Lecture 13), we will see another, more general, way to generate antithetic variables.

But to be a good antithetic variable, we need \(\phi(X)\) and \(\phi(X')\) to be negatively correlated too – preferably strongly so. Often, this is a matter of trial-and-error – it’s difficult to set out hard principles. But there are some results that try to formalise the idea that “nice functions of negatively correlated random variables are themselves negatively correlated”, which can be useful. We give one example of such a result here.

Theorem 7.2 Let \(U \sim \operatorname{U}[0, 1]\) and \(U' = 1 - U\). Let \(\phi\) be a monotonically increasing function. Then \(\phi(U)\) and \(\phi(U')\) are negatively correlated, in that \(\operatorname{Cov}\big(\phi(U), \phi(U')\big) \leq 0\).

If I were simply to write out the proof of this theorem, it would be fairly short and easy to follow, but it would be hard to see why I had taken the steps I had. So I will try to explain why each step in the proof is a natural one – although doing so may make the proof seem longer and more complicated than it really is.

Proof. The covariance in question is \[ \operatorname{Cov}\big(\phi(U), \phi(U')\big) = \Exg \phi(U)\phi(1-U) - \big(\Exg\phi(U)\big) \big(\Exg\phi(1-U)\big) . \]

We would like to take a single expectation sign all the way outside. But we can’t do this yet, because \(\big(\Exg\phi(U)\big) \big(\Exg\phi(1-U)\big)\) and \(\Exg\phi(U)\phi(1-U)\) are different things. However, we can do this if we introduce a new random variable \(V\) that is also \(\operatorname{U}[0,1]\) but is independent of \(U\). Then we do have \[\big(\Exg\phi(U)\big) \big(\Exg\phi(1-U)\big) = \big(\Exg\phi(U)\big) \big(\Exg\phi(1-V)\big) = \Exg \phi(U)\phi(1-V).\] So now we can write \[ \operatorname{Cov}\big(\phi(U), \phi(U')\big) = \Exg \big(\phi(U)\phi(1-U) - \phi(U)\phi(1-V)\big) . \]

We’ve got a slightly odd asymmetry between \(U\) and \(V\) here, though. Why is the first term \(\phi(U)\phi(1-U)\) and not \(\phi(V)\phi(1-V)\)? Why is the second term \(\phi(U)\phi(1-V)\) and not \(\phi(V)\phi(1-U)\)? Well, maybe to even things up, we can write them as half of one and half of the other. (This is valid because \(U\) and \(V\) are independent with the same distribution, so swapping their roles doesn’t change any of the expectations.) That is, \[\begin{align*} \operatorname{Cov}\big(\phi(U), \phi(U')\big) &= \tfrac12 \Exg \big(\phi(U)\phi(1-U) +\phi(V)\phi(1-V) \\ &\qquad\qquad\quad {}- \phi(U)\phi(1-V) - \phi(V)\phi(1-U)\big) . \end{align*}\]

This expression will factorise nicely. It’s \[ \operatorname{Cov}\big(\phi(U), \phi(U')\big) = \tfrac12 \Exg \big(\phi(U)-\phi(V)\big)\big(\phi(1-U) - \phi(1-V)\big) . \]

We now claim that this expectation is negative. In fact, we have a stronger result: the quantity \[\big(\phi(U) - \phi(V)\big)\big(\phi(1-U) - \phi(1-V)\big) \tag{7.1}\] is always negative (or zero), so its expectation certainly is too. To see this, think separately about the two cases \(U \leq V\) and \(V \leq U\).

  • If \(U \leq V\), then \(\phi(U) \leq \phi(V)\) too, since \(\phi\) is increasing. But this also means that \(1-U \geq 1-V\), so \(\phi(1 - U) \geq \phi(1-V)\). So in Equation 7.1 the first factor is negative and the second factor is positive, meaning the product is negative.

  • If \(V \leq U\), then \(\phi(V) \leq \phi(U)\) too, since \(\phi\) is increasing. But this also means that \(1-V \geq 1-U\), so \(\phi(1 - V) \geq \phi(1-U)\). So in Equation 7.1 the first factor is positive and the second factor is negative, meaning the product is negative.

This completes the proof.
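Theorem 7.2 is also easy to check by simulation. Here is a minimal sketch using one particular increasing function, \(\phi(u) = \mathrm{e}^u\) (an arbitrary choice, just for illustration); the sample covariance should come out negative.

u   <- runif(1e6)
phi <- exp                  # any increasing function will do; exp is just an example
cov(phi(u), phi(1 - u))     # sample covariance of phi(U) and phi(1 - U): negative, as Theorem 7.2 predicts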

Next time: We come to the third, and most important, variance reduction scheme: importance sampling.

Summary:

  • The antithetic variables estimator is unbiased and has mean-square error \[ \operatorname{MSE}\big(\widehat{\theta}_n^{\mathrm{AV}}\big) = \frac{1}{2n} \operatorname{Var}\big(\phi(X) + \phi(X')\big) = \frac{1+\rho}{n}\Var\big(\phi(X)\big). \]

  • If \(U \sim \operatorname{U}[0, 1]\) and \(\phi\) is monotonically increasing, then \(\phi(U)\) and \(\phi(1-U)\) are negatively correlated.

In Thursday’s lecture, we will be discussing your answers to Problem Sheet 1.

Read more: Voss, An Introduction to Statistical Computing, Subsection 3.3.2.