17 Markov chains in discrete space
17.1 Markov chains and MCMC
In the first part of this module, we looked at Monte Carlo methods, where we estimated expectations with respect to a target distribution by averaging a function over independent samples drawn from that distribution. However, these methods are often not available when dealing with very complex distributions, from which we cannot sample directly.
For more complicated distributions, we can make progress by loosening the assumption that our samples are independent draws from exactly the target distribution:
Rather than the samples having exactly the same distribution as the target, we might be willing for them to have approximately the same distribution, and to reach the target distribution in the limit as the number of samples $n \to \infty$.
Rather than having the samples be independent, we could let them have some dependence, but we will want to set things up so that there is only “light” dependence and so that we understand the dependence structure well.
The way we will do this is to allow the samples to be the successive states of a Markov chain whose limiting distribution is the target distribution.
Using a Monte Carlo estimator where the samples come from a Markov chain in this way is known as Markov chain Monte Carlo, or MCMC.
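To see the idea in symbols (the notation here – $\pi$ for the target distribution, $h$ for the function whose expectation we want, $n$ for the number of samples – is our own choice for illustration): ordinary Monte Carlo estimates $\mathbb{E}_\pi h(X)$ by
$$\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} h(X_i), \qquad X_1, X_2, \dots, X_n \text{ independent samples from } \pi,$$
whereas MCMC uses exactly the same estimator, but with $X_1, X_2, \dots, X_n$ being the successive states of a Markov chain whose limiting distribution is $\pi$.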
In this part of the module, we will study MCMC in depth. We will take a brief tour through the theory of Markov chains, then talk about how to use the output of a Markov chain for Monte Carlo estimation. We will look specifically at the Metropolis–Hastings algorithm, which is one way of setting up a Markov chain to have a specific distribution as its limiting distribution, and which has some properties in common with the rejection sampling ideas we have already seen.
The schedule will be:
Today and Lecture 18: Theory of Markov chains in discrete space
Lecture 19: Metropolis–Hastings algorithm in discrete space
Lecture 20: Theory of Markov chains in continuous space
Lecture 21: Metropolis–Hastings algorithm in continuous space
Lectures 22 and 23: MCMC in practice (including for Bayesian statistics).
Some of you may have studied Markov chains before – for example, in the Leeds second-year module MATH2750 Introduction to Markov Processes. If so, you should find that today and the next lecture are just a brief reminder of things you already know, but the rest of the material is likely to be new.
17.2 Introduction to Markov chains
A Markov chain in discrete time is a random process that moves between states one step at a time, where the next state depends only on the current state and not on the earlier history.
Think of playing a simple board game where you roll a dice and move that many squares along the board. Let $X_n$ be the square you are on after $n$ moves. Then the next square $X_{n+1}$:
is random – because it depends on the roll of the dice;
depends on which square $X_n$ you are on now – because the value of the dice roll will be added to your current square;
given the square $X_n$ you are on now, doesn’t depend on which sequence of squares you previously landed on to get there.
Definition 17.1 A sequence of random variables $X_1, X_2, X_3, \dots$ taking values in a state space $\mathcal{S}$ is a (discrete-time) Markov chain if it has the Markov property: for all $n$ and all states $x_1, \dots, x_{n-1}, i, j$,
$$\mathbb{P}(X_{n+1} = j \mid X_n = i, X_{n-1} = x_{n-1}, \dots, X_1 = x_1) = \mathbb{P}(X_{n+1} = j \mid X_n = i).$$
Example 17.1 Consider a simple model of an unreliable printer:
On day 1, the printer is working.
If the printer is working, then the next day there is a 90% chance it is still working, but a 10% chance it has broken.
If the printer is broken, then the next day there is a 50% chance it has been mended, but a 50% chance it is still broken.
We can model this as a Markov chain on the state space $\mathcal{S} = \{1, 2\}$, where state 1 means the printer is working and state 2 means it is broken; so, for example, $\mathbb{P}(X_{n+1} = 2 \mid X_n = 1) = 0.1$.
Example 17.2 Consider the simple random walk on the integers $\mathbb{Z}$, which at each time step goes up by 1 with probability $p$ and down by 1 with probability $1 - p$.
If $X_n = i$, then $X_{n+1} = i + 1$ with probability $p$ and $X_{n+1} = i - 1$ with probability $1 - p$, regardless of how the walk got to $i$, so this is a Markov chain. We can also write this as
$$\mathbb{P}(X_{n+1} = j \mid X_n = i) = \begin{cases} p & \text{if } j = i + 1, \\ 1 - p & \text{if } j = i - 1, \\ 0 & \text{otherwise.} \end{cases}$$
In both the Markov chains we have looked at – and, indeed, all the Markov chains we will ever look at – the transition probability $\mathbb{P}(X_{n+1} = j \mid X_n = i)$ does not depend on the time $n$, so we can write it simply as $p_{ij}$, the probability of moving from state $i$ to state $j$ in one step.
Once we have the notation $p_{ij}$, it is convenient to collect the transition probabilities into the transition matrix $P = (p_{ij})$, whose $(i, j)$th entry is $p_{ij}$.
For the two-state “unreliable printer” Markov chain, the transition matrix is
$$P = \begin{pmatrix} 0.9 & 0.1 \\ 0.5 & 0.5 \end{pmatrix}.$$
For the simple random walk, the transition matrix is the “infinite matrix”
$$P = \begin{pmatrix}
\ddots & \ddots & \ddots & & & \\
& 1 - p & 0 & p & & \\
& & 1 - p & 0 & p & \\
& & & \ddots & \ddots & \ddots
\end{pmatrix},$$
with $p$ just above the diagonal, $1 - p$ just below it, and 0 everywhere else.
The rows of a transition matrix always add up to 1, because from state $i$ the chain must move to exactly one state $j$.
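As a quick sanity check in R (a small sketch of our own, not part of the lecture code), we can enter the printer’s transition matrix and confirm that its rows sum to 1:
# printer transition matrix, entered row by row
P <- matrix(c(0.9, 0.1,
              0.5, 0.5), nrow = 2, byrow = TRUE)
# each row of a transition matrix should sum to 1
rowSums(P)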
17.3 Simulation of Markov chains
We can simulate from a Markov chain using the function rmarkov() below.
rmarkov <- function(n, trans, initial) {
  states <- nrow(trans)
  MC <- rep(0, n)
  MC[1] <- initial
  for (i in 1:(n - 1)) MC[i + 1] <- sample(states, 1, prob = trans[MC[i], ])
  return(MC)
}
Here,
n is the number of samples (or steps) to take;
trans is the transition matrix;
initial is the initial state.
The key line is the for loop in the penultimate line. Here, the next state MC[i + 1] is sampled from the states 1, 2, ..., states, with the probabilities given by the row of the transition matrix trans corresponding to the current state MC[i].
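To see what that sample() call does on its own, here is a small sketch of our own using the printer’s transition matrix from Example 17.1:
# transition matrix of the unreliable printer
trans <- matrix(c(0.9, 0.1, 0.5, 0.5), 2, 2, byrow = TRUE)
# one draw of the next state given the current state is 1 ("working"):
# sample one value from {1, 2}, with probabilities 0.9 and 0.1 taken from row 1
sample(2, 1, prob = trans[1, ])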
Example 17.3 Let’s simulate the two-state broken printer Markov chain from Example 17.1.
trans <- matrix(c(0.9, 0.1, 0.5, 0.5), 2, 2, byrow = TRUE)
initial <- 1
MC <- rmarkov(80, trans, initial)
plot(MC, col = "blue", lwd = 2, type = "b")
In the first line, we entered the transition matrix using matrix(P, 2, 2), where P was a vector of length 4 = 2 × 2 giving the entries (here passed directly as c(0.9, 0.1, 0.5, 0.5)). We used byrow = TRUE to ensure that the entries fill the matrix row by row, rather than column by column, which is the default.
Our sample shows that the printer spent most of the time working (state 1), but when it did break (state 2) it usually got mended pretty quickly.
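This is exactly the situation where Markov chain output can be used for Monte Carlo estimation, as discussed in Section 17.1. As a small sketch of our own (using a longer run to reduce the noise), we could estimate the proportion of days on which the printer is working:
MC_long <- rmarkov(10000, trans, initial)
# proportion of sampled days in state 1 ("working")
mean(MC_long == 1)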
Example 17.4 We can simulate the simple random walk from Example 17.2 with the following code.
rrw <- function(n, up) {
  RW <- rep(0, n)
  down <- 1 - up
  RW[1] <- 0
  for (i in 1:(n - 1)) {
    RW[i + 1] <- RW[i] + sample(c(1, -1), 1, prob = c(up, down))
  }
  return(RW)
}
So with up = 0.6 we get a random walk like this:
RW <- rrw(100, 0.6)
plot(RW, col = "blue", type = "l")
which goes up on average. With up = 0.3 we get
RW <- rrw(100, 0.3)
plot(RW, col = "blue", type = "l")
which goes down on average. The simple symmetric random walk, with up = 0.5, drifts neither up nor down on average. Here are several realisations of it plotted together:
RW <- rrw(1000, 0.5)
plot(RW, col = "black", type = "l", ylim = c(-80, 80))
cols <- c("red", "orange", "yellow", "green", "blue", "darkblue", "purple")
for (i in 1:7) {
  RW <- rrw(1000, 0.5)
  points(RW, col = cols[i], type = "l")
}
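As a rough check of these drift claims (again a sketch of our own, not part of the lecture code), we can average the final position of the walk over repeated runs: the average should be close to 0 when up = 0.5 and clearly positive when up = 0.6.
# average final position of a 100-step walk over 500 repetitions
mean(replicate(500, rrw(100, 0.5)[100]))  # close to 0: no drift
mean(replicate(500, rrw(100, 0.6)[100]))  # positive: upward drift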
Next time. We continue our whistle-stop tour of discrete-space Markov chains.
Summary:
A Markov chain is a stochastic process where the next step $X_{n+1}$ depends on the current step $X_n$, but, given the current step $X_n$, does not depend on the past $X_1, \dots, X_{n-1}$.
A Markov chain is governed by its transition probabilities $p_{ij} = \mathbb{P}(X_{n+1} = j \mid X_n = i)$. These are written in the transition matrix $P = (p_{ij})$, whose rows add up to 1.
The simple random walk on the integers at each step goes up 1 with probability $p$ and down 1 with probability $1 - p$. If $p = \frac{1}{2}$, it is a simple symmetric random walk.
Read more: Voss, An Introduction to Statistical Computing, Subsection 2.3.1; my notes for MATH2750 Introduction to Markov Processes.