This sheet sets out the tasks for the computational coursework for MATH5835. You should write a short report on your solutions to these tasks. For administrative and other information about this coursework, see the lecture notes.

This coursework sheet is available in HTML format or in R Markdown format. The R Markdown file should be opened in RStudio; I recommend the “Visual” editor – see the top-left of the editing window.

Part I: Mixture model, Monte Carlo estimation

The distribution of wind speeds in a city follows a mixture of log-normal distributions. If we denote the wind speed (in miles per hour) by \(X\), this means that \[ \log X \sim \operatorname{N}(m_K, s_K^2) , \] where the index \(K\) is random with \[ \mathbb{P}(K = 1) = 0.1 \qquad \mathbb{P}(K = 2) = 0.4 \qquad \mathbb{P}(K = 3) = 0.5 , \] and the means and standard deviations are given by \[ \mathbf{m} = \begin{pmatrix} 2.1 \\ 1.6 \\ -0.5 \end{pmatrix} \qquad \mathbf{s} = \begin{pmatrix} 0.55 \\ 0.7 \\ 0.6 \end{pmatrix} . \]

These can be read into R as follows:

weights <- c(0.1,  0.4,  0.5)
meanlog <- c(2.1,  1.6, -0.5)
sdlog   <- c(0.55, 0.7,  0.6)

The density of the log-normal distribution with parameters \(m \in \mathbb{R}\) and \(s > 0\) is \[ f(x) = \frac{1}{x \sqrt{2 \pi s^2}} \exp\left(-\frac{(\log x - m)^2}{2s^2}\right) \] for all \(x > 0\). This function is available in R as dlnorm().
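As a quick sanity check (not one of the tasks), dlnorm() can be compared with this formula at an arbitrary point; the values of x, m and s below are just illustrative choices, and both lines should give the same number.

# check that dlnorm() matches the stated density at an arbitrary point
x <- 2.6; m <- 1.6; s <- 0.7
dlnorm(x, meanlog = m, sdlog = s)
1 / (x * sqrt(2 * pi * s^2)) * exp(-(log(x) - m)^2 / (2 * s^2))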

The PDF of the mixture distribution for wind speed can be produced with the following function:

dwind <- function(x, weights, meanlog, sdlog) {
  mixtures <- length(weights)
  pdf <- 0
  for (k in 1:mixtures) {
    # add the contribution of mixture component k to the density
    pdf <- pdf + weights[k] * dlnorm(x, meanlog[k], sdlog[k])
  }
  return(pdf)
}

# Example
dwind(c(0.1, 0.7, 2.6), weights, meanlog, sdlog)
## [1] 0.03645785 0.46812551 0.06733350

Samples from the wind speed distribution can be generated with the following function:

rwind <- function(n, weights, meanlog, sdlog) {
  mixtures <- length(weights)
  # choose a mixture component for each sample, then draw from that component
  k <- sample(mixtures, n, replace = TRUE, prob = weights)
  rlnorm(n, meanlog[k], sdlog[k])
}

# Example
rwind(5, weights, meanlog, sdlog)
## [1] 5.4708217 3.5207015 6.4939263 0.6224686 0.7901488

We are interested in the probability that the wind speed exceeds 40 miles per hour.

Use basic Monte Carlo estimation to estimate the probability that the wind speed exceeds 40 miles per hour.
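As a starting point, a basic Monte Carlo estimator can be built directly from rwind(). The sketch below shows one possible structure; the sample size n and the object names are illustrative choices, and your report should contain and justify your own.

n <- 1e6                                   # arbitrary sample size; justify your own choice
x <- rwind(n, weights, meanlog, sdlog)     # sample wind speeds from the mixture
p_hat <- mean(x > 40)                      # proportion of samples exceeding 40 mph
p_hat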

Determine the root mean squared error of your estimator.
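Recall that the basic Monte Carlo estimator is the mean of i.i.d. indicator variables with success probability \(p = \mathbb{P}(X > 40)\), so it is unbiased with variance \(p(1-p)/n\) and hence root mean squared error \(\sqrt{p(1-p)/n}\). A plug-in estimate of this quantity, reusing p_hat and n from the sketch above, could look as follows.

rmse_hat <- sqrt(p_hat * (1 - p_hat) / n)  # plug-in estimate of the RMSE
rmse_hat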

We now aim to find a more efficient estimator for the probability.

Use importance sampling to estimate the probability that the wind speed is larger than 40 miles per hour. Explain the choices you make in designing your estimator.
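One possible design, given here only as a sketch, takes as proposal the second mixture component truncated to \((40, \infty)\): this component has the heaviest tail of the three, so the importance weights \(f(y)/g(y)\) stay bounded, and because the proposal only produces values above 40 the indicator \(\mathbb{1}\{y > 40\}\) is always equal to one. The sample size and variable names below are illustrative assumptions, not the only reasonable choices.

n <- 1e5                                        # arbitrary sample size
m2 <- meanlog[2]; s2 <- sdlog[2]
tail2 <- 1 - plnorm(40, m2, s2)                 # P(component 2 exceeds 40)

# sample from component 2 truncated to (40, Inf) by inverting its CDF
u <- runif(n)
y <- qlnorm(plnorm(40, m2, s2) + u * tail2, m2, s2)

# importance weights f(y) / g(y), where g is the truncated density
w <- dwind(y, weights, meanlog, sdlog) / (dlnorm(y, m2, s2) / tail2)

p_is <- mean(w)                                 # importance sampling estimate
p_is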

Compare the root mean squared error of your estimator with that of the basic Monte Carlo estimator.
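Continuing the sketch above, the root mean squared error of the importance sampling estimator can be estimated from the sample variance of the weights and set alongside the earlier estimate.

rmse_is <- sqrt(var(w) / n)                     # estimated RMSE of the IS estimator
c(rmse_hat, rmse_is)                            # compare with the basic Monte Carlo RMSE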

Part II: Simpler model, Bayesian inference

For simplicity, we now adopt a less complex model, in which the wind speed follows a single log-normal distribution with parameters \(m\) and \(s\), so \(\log X \sim \operatorname{N}(m, s^2)\). Further, we assume that \(m = 0\) is known and \(s\) is unknown. As the prior distribution we take \(s \sim \operatorname{Exp}(1)\), the exponential distribution with rate 1.

On five different days, we have observed the wind speeds \[ x_1 = 0.50 \quad x_2 = 1.67 \quad x_3 = 2.22 \quad x_4 = 0.22 \quad x_5 = 4.36 . \] Using this data we want to learn about the unknown value \(s\).

From Bayes’ theorem we know that the posterior distribution of \(s\) given the data \(\mathbf{x} = (x_1, \dots, x_5)\) has density \[ \pi(s \mid \mathbf{x}) \propto \pi(s) \prod_{i=1}^5 f(x_i \mid s) , \] where the prior distribution \(\pi = \pi(s)\) is the density of the \(\operatorname{Exp}(1)\) distribution and the likelihood \(f(x_i \mid s)\) is the density of the log-normal distribution with parameters \(m = 0\) and \(s\). (The constant of proportionality is difficult to calculate.)
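For computations it is convenient to code the unnormalised posterior directly. The sketch below shows one way of doing this; the function and object names are just illustrative.

x_obs <- c(0.50, 1.67, 2.22, 0.22, 4.36)        # the observed wind speeds

# unnormalised posterior density pi(s | x), i.e. prior times likelihood
post <- function(s, x = x_obs) {
  prior <- dexp(s, rate = 1)
  lik <- sapply(s, function(si) prod(dlnorm(x, meanlog = 0, sdlog = si)))
  prior * lik
}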

We want to sample from the posterior density \(\pi(\cdot \mid \mathbf{x})\) using envelope rejection sampling.

Produce a plot of \(\pi(s \mid \mathbf{x})\) (up to the unknown constant) as a function of \(s\) for the observations \(\mathbf{x}\) listed above.
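Using the post() function from the sketch above, one way to produce such a plot is to evaluate the unnormalised posterior on a grid of \(s\) values; the range and resolution of the grid below are arbitrary choices.

s_grid <- seq(0.01, 5, length.out = 500)
plot(s_grid, post(s_grid), type = "l",
     xlab = "s", ylab = "unnormalised posterior")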

Choose an appropriate proposal density \(g = g(s)\) for the rejection sampling method. Choose a value for the constant \(c\) in the envelope rejection sampling algorithm. Explain why you made these choices, using plots where this helps justify your choices.
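One possible choice, shown here only as a sketch, is to take the prior \(g(s) = e^{-s}\) itself as the proposal; the ratio \(\pi(s \mid \mathbf{x}) / g(s)\) is then just the likelihood, and any upper bound on its maximum gives a valid constant \(c\). The maximum can be found numerically; the search interval below is an assumption.

ratio <- function(s) post(s) / dexp(s, rate = 1)       # equals the likelihood here
opt <- optimise(ratio, interval = c(0.01, 10), maximum = TRUE)
c_const <- opt$objective                               # any c >= this value works
c_const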

Implement the envelope rejection sampling algorithm. Generate at least \(n = 10\,000\) samples from the posterior density \(\pi(\cdot \mid \mathbf{x})\). Plot a histogram of these samples.
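A sketch of one possible implementation, assuming the proposal \(g = \operatorname{Exp}(1)\) and the constant c_const from the sketch above, is given below; proposals are generated in batches until enough have been accepted.

n <- 10000
samples <- numeric(0)
while (length(samples) < n) {
  s_prop <- rexp(n, rate = 1)                          # proposals from g
  u <- runif(n)
  accept <- u <= post(s_prop) / (c_const * dexp(s_prop, rate = 1))
  samples <- c(samples, s_prop[accept])
}
samples <- samples[1:n]
hist(samples, breaks = 50, xlab = "s", main = "Samples from the posterior")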

Compare the posterior distribution to the prior distribution, and comment on any similarities or differences.
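One way to support this comparison is to overlay the \(\operatorname{Exp}(1)\) prior density on a density-scaled histogram of the posterior samples from the sketch above.

hist(samples, breaks = 50, freq = FALSE, xlab = "s",
     main = "Posterior samples with prior density")
curve(dexp(x, rate = 1), add = TRUE, lty = 2)          # prior density for comparison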

An alternative way to sample from the posterior distribution would be using the random walk Metropolis algorithm, where the proposal distribution is the Gaussian random walk with typical step size \(\sigma\).

Sample from the posterior distribution \(\pi(\cdot \mid \mathbf{x})\) using the random walk Metropolis algorithm. Explain how you chose the value \(\sigma\), and explain any other design choices you make.
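A minimal sketch of the algorithm, reusing post() from above, is given below; the step size \(\sigma\), the starting value and the chain length are illustrative assumptions that you will need to revisit and justify.

n_iter <- 10000
sigma <- 0.5                                           # illustrative step size
chain <- numeric(n_iter)
chain[1] <- 1                                          # arbitrary starting value
for (i in 2:n_iter) {
  s_curr <- chain[i - 1]
  s_prop <- s_curr + rnorm(1, mean = 0, sd = sigma)    # Gaussian random walk proposal
  if (s_prop <= 0) {
    chain[i] <- s_curr                                 # posterior is zero for s <= 0, so reject
  } else {
    alpha <- min(1, post(s_prop) / post(s_curr))       # Metropolis acceptance probability
    chain[i] <- if (runif(1) < alpha) s_prop else s_curr
  }
}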

Compare your results to those found with envelope rejection sampling. Which method do you think was better?


You can get help on this coursework (a) in the computational practical session, (b) during office hours, on Mondays at 1500 in EC Stoner 9.10n, or (c) by submitting your draft work for feedback by Friday 6 December.

Check the information in the lecture notes if you have queries about this coursework.

If you think there may be an error on this problem sheet, please contact .