MATH5835M Statistical Computing

Author

Matthew Aldridge

Published

September 30, 2024

About MATH5835

Organisation of MATH5835

This module is MATH5835M Statistical Computing.

This module lasts for 11 weeks from 30 September to 13 December 2024. The exam will take place between 13 and 24 January 2025.

The module leader, the lecturer, and the main author of these notes is Dr Matthew Aldridge. (You can call me “Matt”, “Matthew”, or “Dr Aldridge”, pronounced “old-ridge”.) My email address is m.aldridge@leeds.ac.uk, although I much prefer questions in person at office hours (see below) rather than by email.

Lectures

The main way you will learn new material for this module is by attending lectures. There are three lectures per week:

  • Mondays at 1400 in Roger Stevens LT 13

  • Thursdays at 1200 in Roger Stevens LT 03

  • Fridays at 1000 in Roger Stevens LT 09

I recommend taking your own notes during the lecture. I will put brief summary notes from the lectures on this website, but they will not reflect all the details I say out loud and write on the whiteboard. Lectures will go through material quite quickly and the material may be quite difficult, so it’s likely you’ll want to spend time reading through your notes after the lecture. Lectures should be recorded on the lecture capture system; I find it very difficult to read the whiteboard in these videos, but if you unavoidably miss a lecture, for example due to illness, you may find they are better than nothing.

In Weeks 3, 5, 7, 9 and 11, the Thursday lecture will operate as a “problems class” – see more on this below.

Attendance at lectures is compulsory.

Problem sheets and problem classes

Mathematics and statistics are “doing” subjects! To help you learn material for the module and to help you prepare for the exam, I will provide 5 unassessed problem sheets. These are for you to work through in your own time to help you learn; they are not formally assessed and will not be marked by me (or anyone else). You are welcome to discuss work on the problem sheets with colleagues and friends, although my recommendation would be to write up your “last, best” attempt neatly by yourself.

You should work through each problem sheet in preparation for the problems class in the Thursday lecture of Weeks 3, 5, 7, 9 and 11. In the problems class, you should be ready to discuss your answers to questions you managed to solve, explain your progress on questions you partially solved, and ask for help on questions you got stuck on. You can also ask for extra help or feedback at office hours (see below).

Coursework

There will be one piece of assessed coursework, which will make up 20% of your module mark. You can read more about the coursework here.

The coursework will be in the form of a worksheet. The worksheet will have some questions, mostly computational but also mathematical, and you will have to write a report containing your answers and computations.

The assessed coursework will be introduced in the computer practical sessions in Week 9.

The deadline for the coursework will be the penultimate day of the Autumn term, Thursday 12 December at 1400. Feedback and marks will be returned on Monday 13 January, the first day of the Spring term.

Office hours

I will run a weekly office hours drop-in session for feedback and consultation. You can come along if you want to talk to me about anything on the course, including if you’d like some feedback on your attempts at problem sheet questions. (For extremely short queries, you can approach me before or after lectures, but my response will often be: “Come to my office hours, and we can discuss it there!”)

Office hours will happen on Mondays from 1500 to 1600 – so directly after the Monday lecture – in my office, which is EC Stoner 9.10n on the 9th floor of the EC Stoner building. (One way to my office is via the doors directly opposite the main entrance to the School of Mathematics. You can also get there from Staircase 1 on the Level 10 “red route” through EC Stoner, next to the Maths Satellite.) If you cannot make this time, contact me for an alternative arrangement.

Exam

There will be one exam, which will make up 80% of your module mark.

The exam will be in the January 2025 exam period (13–24 January); the date and time will be announced in December. The exam will be in person and on campus.

The exam will last 2 hours and 30 minutes. The exam will consist of 4 questions, all compulsory. You will be allowed to use a basic non-programmable calculator in the exam.

Content of MATH5835

Necessary background

It is recommended that students should have completed at least two undergraduate level courses in probability and statistics, or something equivalent, although confidence and proficiency in basic material is more important than very deep knowledge. For Leeds undergraduates, the official prerequisite is MATH2715 Statistical Methods, although confidence and proficiency in the material of MATH1710 & MATH1712 Probability and Statistics 1 & 2 is probably more important.

Some knowledge I will assume:

  • Probability: Basic rules of probability; random variables, both continuous and discrete; “famous” distributions (especially the normal distribution and the continuous uniform distribution); expectation, variance, covariance, correlation; law of large numbers and central limit theorem.

  • Statistics: Estimation of parameters; bias and error; sample mean and sample variance.

This module will also include material on Markov chains. I won’t assume any pre-existing knowledge of this, and I will introduce all new material we need, but students who have studied Markov chains before (for example in the Leeds module MATH2750 Introduction to Markov Processes) may find a couple of lectures here are merely a reminder of things they already know.

The lectures will include examples using the R programming language. The coursework and problem sheets will require use of R. The exam, while just a “pencil and paper” exam, will require understanding and writing short portions of R code. We will assume basic R capability – that you can enter R commands, store R objects using the <- assignment, and perform basic arithmetic with numbers and vectors. Other concepts will be introduced as necessary. If you want to use R on your own device, I recommend downloading (if you have not already) the R programming language and the program RStudio. (These lecture notes were written in R using RStudio.)
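As a rough indication of the level assumed, the short sketch below uses only the basics just mentioned: entering commands, storing objects with the <- assignment, and doing arithmetic with numbers and vectors. The object names (radius, area, x, y, total, scaled) are illustrative choices only, not anything prescribed by the module.

```r
# Store a number in an object using the <- assignment
radius <- 2

# Basic arithmetic with numbers (pi is built into R)
area <- pi * radius^2

# Create vectors with c() and store them
x <- c(1, 2, 3, 4)
y <- c(10, 20, 30, 40)

# Arithmetic on vectors works element by element
total  <- x + y    # 11 22 33 44
scaled <- 2 * x    # 2 4 6 8

# Typing an object's name at the console prints its value
total
```

If you can read this and predict what each line does, you probably have the R background needed to get started.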

Syllabus

We plan to cover the following topics in the module:

  • Monte Carlo estimation: definition and examples; bias and error; variance reduction techniques: control variates, antithetic variables, importance sampling. [9 lectures]

  • Random number generation: pseudo-random number generation using linear congruential generators; inverse transform method; rejection sampling. [7 lectures]

  • Markov chain Monte Carlo (MCMC): [7 lectures]

    • Introduction to Markov chains in discrete and continuous space

    • Metropolis–Hastings algorithm: definition; examples; MCMC in practice; MCMC for Bayesian statistics

  • Bootstrap: Empirical distribution; definition of the bootstrap; bootstrap error; bootstrap confidence intervals. [4 lectures]

  • Frequently-asked questions [1 lecture]

Together with the 5 problems classes, this makes 33 lectures.

Book

The following book is strongly recommended for the module:

  • J Voss, An Introduction to Statistical Computing: A Simulation-based Approach, Wiley, 2013.

The library has electronic access to this book (and two paper copies).

Dr Voss is a lecturer in the School of Mathematics at the University of Leeds, and has taught MATH5835 many times. An Introduction to Statistical Computing grew out of his lecture notes for this module, so the book is ideally suited for this module. My lectures will follow this book closely – specifically:

  • Monte Carlo estimation: Sections 3.1–3.3

  • Random number generation: Sections 1.1–1.4

  • Markov chain Monte Carlo: Section 2.3 and Sections 4.1–4.3

  • Bootstrap: Section 5.2

For a second look at material, for preparatory reading, for optional extended reading, or for extra exercises, this book comes with my highest recommendation!