Section 1 Stochastic processes and the Markov property

  • Stochastic processes with discrete or continuous state space and discrete or continuous time
  • The Markov “memoryless” property

1.1 Deterministic and random models

A model is an imitation of a real-world system. For example, you might want to have a model to imitate the world’s population, the level of water in a reservoir, cashflows of a pension scheme, or the price of a stock. Models allow us to try to understand and predict what might happen in the real world in a low-risk, cost-effective and fast way.

Designing a model requires a set of assumptions about how it will work, and suitable parameters need to be determined, perhaps based on previously collected data.

An important distinction is between deterministic models and random models. Another word for a random model is a stochastic (“sto-kass-tik”) model. Deterministic models do not contain any random components, so the output is completely determined by the inputs and any parameters. Random models have variable outcomes to account for uncertainty and unpredictability, so they can be run many times to give a sense of the range of possible outcomes.
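To see the difference in practice, here is a minimal Python sketch (the savings-pot setting, the 5% mean growth rate and the 10% volatility are illustrative assumptions, not taken from these notes): the deterministic model gives the same output every time, while the stochastic model can be run many times to give a sense of the range of possible outcomes.

```python
import random

def deterministic_model(initial=100.0, rate=0.05, years=10):
    """Fixed growth each year: the output is completely determined by the inputs."""
    value = initial
    for _ in range(years):
        value *= 1 + rate
    return value

def stochastic_model(initial=100.0, rate=0.05, volatility=0.10, years=10):
    """Each year's growth has a random component, so repeated runs give different outputs."""
    value = initial
    for _ in range(years):
        value *= 1 + random.gauss(rate, volatility)
    return value

print(deterministic_model())                      # the same answer every time
runs = [stochastic_model() for _ in range(1000)]  # run the random model many times...
print(min(runs), sum(runs) / len(runs), max(runs))  # ...to see the range of outcomes
```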

Consider models for:

  • the future position of the Moon as it orbits the Earth,
  • the future price of shares in Apple.

For the Moon, the random components – for example, the effect of small meteorites striking the Moon’s surface – are not very significant, and a deterministic model based on physical laws is good enough for most purposes. For Apple shares, the price changes from day to day are highly uncertain, so a random model can account for the variability and unpredictability in a useful way.

In this module we will see many examples of stochastic models. Lots of the applications we will consider come from financial mathematics and actuarial science where the use of models that take into account uncertainty is very important, but the principles apply in many areas.

1.2 Stochastic processes

If we want to model, for example, the total number of claims to an insurance company in the whole of 2021, we can use a random variable \(X\) to model this – perhaps a Poisson distribution with an appropriate mean. However, if we want to track how the number of claims changes over the course of the year 2021, we will need to use a stochastic process (or “random process”).

A stochastic process, which we will usually write as \((X_n)\), is an indexed sequence of random variables that are (usually) dependent on each other.

Each random variable \(X_n\) takes a value in a state space \(\mathcal S\) which is the set of possible values for the process. As with usual random variables, the state space \(\mathcal S\) can be discrete or continuous. A discrete state space denotes a set of distinct possible outcomes, which can be finite or countably infinite. For example, \(\mathcal S = \{\text{Heads},\text{Tails}\}\) is the state space for a single coin flip, while in the case of counting insurance claims, the state space would be the nonnegative integers \(\mathcal S = \mathbb Z_+ = \{0,1,2,\dots\}\). A continuous state space denotes an uncountably infinite continuum of gradually varying outcomes. For example, the nonnegative real line \(\mathcal S = \mathbb R_+ = \{x \in \mathbb R : x \geq 0\}\) is the state space for the amount of rainfall on a given day, while some bounded subset of \(\mathbb R^3\) would be the state space for the position of a gas particle in a box.

Further, the process has an index set that puts the random variables that make up the process in order. The index set is usually interpreted as a time variable, telling us when the process will be measured. The index set for time can also be discrete or continuous. Discrete time denotes a process sampled at distinct points, often denoted by \(n = 0,1,2,\dots\), while continuous time denotes a process monitored constantly over time, often denoted by \(t \in \mathbb R_+ = \{x \in \mathbb R : x \geq 0\}\). In the insurance example, we might count up the number of claims each day – then the discrete index set will be the days of the year, which we could denote \(\{1,2,\dots,365\}\). Alternatively, we might want to keep a constant tally that we update after every claim, requiring a continuous time index \(t\) representing time across the whole year. In discrete time, we can write down the first few steps of the process as \((X_0, X_1, X_2, \dots)\).
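As a concrete sketch of the discrete time version of the insurance example (the Poisson distribution and the rate of 2 claims per day are assumptions made purely for illustration), the following Python snippet simulates the number of claims received on each day of the year, together with the running total after each day – both are discrete time, discrete space stochastic processes.

```python
import random

DAYS = 365    # discrete index set: the days of the year {1, 2, ..., 365}
RATE = 2.0    # assumed average number of claims per day, purely for illustration

def poisson(rate):
    """Sample a Poisson(rate) count by counting how many exponential
    inter-arrival times fit into one unit of time."""
    count, time = 0, random.expovariate(rate)
    while time < 1.0:
        count += 1
        time += random.expovariate(rate)
    return count

# X_1, X_2, ..., X_365: the number of claims on each day.
daily_claims = [poisson(RATE) for _ in range(DAYS)]

# The running total after each day is another discrete time, discrete space process.
running_total, so_far = [], 0
for claims in daily_claims:
    so_far += claims
    running_total.append(so_far)

print(daily_claims[:7])    # claims on each of the first seven days
print(running_total[-1])   # total number of claims over the whole year
```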

This gives us four possibilities in total:

  • Discrete time, discrete space
    • Example: Number of students attending each lecture of a maths module.
    • Markov chains – discrete time, discrete space stochastic processes with a certain “Markov property” – are the main topic of the first half of this module.
  • Discrete time, continuous space
    • Example: Daily maximum temperature in Leeds.
    • We will briefly mention continuous space Markov chains in the first half of the course, but these are not as important.
  • Continuous time, discrete space
    • Example: Number of visitors to a webpage over time.
    • Markov jump processes – continuous time, discrete space stochastic processes with the “Markov property” – are the main topic of the second half of this module.
  • Continuous time, continuous space
    • Example: Level of the FTSE 100 share index over time.
    • Such processes, especially the famous Brownian motion – another process with the Markov property – are very important, but outside the scope of this course. See MATH3734 Stochastic Calculus for Finance next year, for example.

1.3 Markov property

Because stochastic processes consist of a large number – even infinitely many; even uncountably infinitely many – random variables that could all be dependent on each other, they can get extremely complicated. The Markov property is a crucial property that restricts the type of dependencies in a process, to make the process easier to study, yet still leaves most of the useful and interesting examples intact. (Although particular examples of Markov processes go back further, the first general study was by the Russian mathematician Andrey Andreyevich Markov, published in 1906.)

Think of a simple board game where we roll a dice and move that many squares forward on the board. Suppose we are currently on the square \(X_n\). Then what can we say about which square \(X_{n+1}\) we move to on our next turn?

  • \(X_{n+1}\) is random, since it depends on the roll of the dice.
  • \(X_{n+1}\) depends on where we are now, \(X_n\), since the score of the dice will be added onto the number of our current square.
  • Given the square \(X_n\) we are on now, \(X_{n+1}\) doesn’t depend any further on which sequence of squares \(X_0, X_1, \dots, X_{n-1}\) we used to get here.

It is this third point that is the crucial property of the stochastic processes we will study in this course, and it is called the Markov property or memoryless property. We say “memoryless”, because it’s as if the process forgot how it got here – we only need to remember what square we’ve reached, not which squares we used to get here. The stochastic process before this moment has no bearing on the future, given where we are now. A mathematical way to say this is that “the past and the future are conditionally independent given the present.”
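The board game is easy to simulate. In the minimal Python sketch below (the 50-square board and the rule of stopping at the final square are assumptions made for illustration), the function that produces the next square only ever sees the current square and a fresh dice roll, never the earlier squares: this is the memoryless property in action.

```python
import random

N_SQUARES = 50   # assumed board size, purely for illustration

def next_square(current):
    """One turn of the game: the next square depends only on the current
    square and a fresh dice roll, not on how we got here."""
    roll = random.randint(1, 6)
    return min(current + roll, N_SQUARES)   # stop once we reach the final square

# Simulate one play of the game: X_0, X_1, X_2, ...
path = [0]
while path[-1] < N_SQUARES:
    path.append(next_square(path[-1]))

print(path)
```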

To write this down formally, we need to recall conditional probability: the conditional probability of an event \(A\) given another event \(B\) is written \(\mathbb P(A \mid B)\), and is the probability that \(A\) occurs given that \(B\) definitely occurs. You may remember the definition \[ \mathbb P(A \mid B) = \frac{\mathbb P(A \cap B)}{\mathbb P(B)} , \] although it is often more useful to reason directly about conditional probabilities than to use this formula.

(You may also remember that the definition of conditional probability requires that \(\mathbb P(B) > 0\). Whenever we write down a conditional probability, we implicitly assume the conditioning event has strictly positive probability without explicitly saying so.)
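For example, let \(A\) be the event that a fair dice shows a 6 and \(B\) the event that it shows an even number. Then \[ \mathbb P(A \mid B) = \frac{\mathbb P(A \cap B)}{\mathbb P(B)} = \frac{1/6}{1/2} = \frac{1}{3} , \] which matches the direct reasoning that, given the roll is even, the three outcomes 2, 4 and 6 are equally likely.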

Definition 1.1 Let \((X_n) = (X_0, X_1, X_2, \dots)\) be a stochastic process in discrete time \(n = 0,1,2,\dots\) and discrete space \(\mathcal S\). Then we say that \((X_n)\) has the Markov property if, for all times \(n\) and all states \(x_0, x_1, \dots,x_n, x_{n+1} \in \mathcal S\) we have \[ \mathbb P(X_{n+1}=x_{n+1} \mid X_{n}=x_{n}, X_{n-1} = x_{n-1}, \dots,X_0=x_0) = \mathbb P(X_{n+1}=x_{n+1} \mid X_{n}=x_{n}) . \]

Here, the left-hand side is the probability we go to state \(x_{n+1}\) next, conditioned on the entire history of the process, while the right-hand side is the probability we go to state \(x_{n+1}\) next, conditioned only on where we are now, \(x_n\). So the Markov property tells us that it only matters where we are now and not how we got here.
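For example, in the board game above (ignoring what happens at the very end of the board), both sides of the equation in Definition 1.1 are simply the probability of rolling the appropriate score: \[ \mathbb P(X_{n+1} = x_n + k \mid X_n = x_n, X_{n-1} = x_{n-1}, \dots, X_0 = x_0) = \mathbb P(X_{n+1} = x_n + k \mid X_n = x_n) = \frac{1}{6} \] for \(k = 1, 2, \dots, 6\), so the board game has the Markov property.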

(There’s also a similar definition for continuous time processes, which we’ll come to later in the course.)

Stochastic processes that have the Markov property are much easier to study than general processes, as we only have to keep track of where we are now and we can forget about the entire history that came before.

In the next section, we’ll see the first, and most important, example of a discrete time, discrete space Markov chain: the “random walk”.