Matthew Aldridge

A (more) probabilistic proof of a geometric result

2026-07-01T00:00:00+00:00

Here’s a cute little paper: it’s called “Generalization of Marion’s theorem: volumes of central polytopes obtained by trisecting the edges of simplices” by Yu. V. Kazakov.

Marion’s theorem is a simple geometric theorem about triangles. For each side of the triangle, trisect the side and connect the two trisection points to the opposite corner. This gives you a hexagon in the middle of the triangle.

Marion’s theorem (after Marion Walter) states that the area of this central hexagon is one-tenth the area of the original triangle.

Kazakov’s result is a multidimensional generalisation of this. You can read the paper itself for details of the construction, but the result is that, in $n$ dimensions, the central polytope has volume $1/\binom{2n+1}{n}$ the volume of the original simplex. Marion’s theorem is the $n = 2$ case, for which $\binom{5}{2}$ does indeed equal 10.

But this paper interested me because it reduces the problem to a purely probabilistic one. Again, see the paper for details of why the reduction holds, but the probabilistic setup is this: Let $Y_0, Y_1, \dots, Y_n$ be $n+1$ IID exponentially distributed random variables. What is the probability that the ratio of the largest of the $Y_k$s to the smallest of the $Y_k$s is at most 2? That is, what’s the probability the largest is no more than twice as big as the smallest? The answer to this question is, Kazakov explains, the same as the ratio of the volumes of the two shapes.

The paper calculates this probability “by hand”: it writes down the relevant integral, expands some brackets using the binomial theorem, calculates the integral as a sum involving binomial coefficients, and cites a combinatorial result that expresses that sum as $1/\binom{2n+1}{n}$. This is fine – but, speaking as a probabilist, I felt like I wanted a more “genuinely probabilistic” argument, that better explains why the answer is $1/\binom{2n+1}{n}$. The purpose of this blogpost is to record such a proof.
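As a sanity check on the “by hand” route, here is a minimal Python sketch. It is a reconstruction from the probabilistic setup rather than a copy of the paper's working: conditioning on the smallest variable taking the value $y$ (and assuming rate-1 exponentials, since the rate cancels in the ratio) gives $(n+1)\int_0^\infty e^{-y}(e^{-y}-e^{-2y})^n\,dy$, which the binomial theorem turns into the alternating sum in the code. The function name is made up for illustration.

```python
from fractions import Fraction
from math import comb

def ratio_probability_by_sum(n):
    """P(max <= 2 * min) for n + 1 IID exponentials, via the expanded integral.

    Expanding (e^{-y} - e^{-2y})^n with the binomial theorem and integrating
    term by term gives (n + 1) * sum_j (-1)^j * C(n, j) / (n + 1 + j).
    """
    total = sum(Fraction((-1) ** j * comb(n, j), n + 1 + j) for j in range(n + 1))
    return (n + 1) * total

# Exact rational arithmetic, so equality with 1 / C(2n + 1, n) can be tested directly.
for n in range(1, 8):
    assert ratio_probability_by_sum(n) == Fraction(1, comb(2 * n + 1, n))

print(ratio_probability_by_sum(2))  # 1/10, the Marion's theorem case
```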

We start off with $n+1$ exponential alarm clocks $Y_0, Y_1, \dots, Y_n$. Eventually one of the alarm clocks rings – let’s say it’s at time $Y^*$ – and that’s the first alarm clock.

At this point, we can take advantage of the memoryless property of the exponential distribution. So, after this first ring, the remaining $n$ alarm clocks can reset themselves to $n$ new exponential distributions $Z_1, \dots, Z_n$. For the ratio of the last alarm clock time to the first alarm clock time to be at most 2, all these new alarm clocks must ring before another $Y^*$ seconds are up.

So in order to “succeed”, the $n$ new reset alarm clocks $Z_1, \dots, Z_n$ must all be shorter than the shortest of the $n+1$ original alarm clocks; that is, shorter than all $n+1$ of those original alarm clocks $Y_0, Y_1, \dots, Y_n$. In other words, of the $2n + 1$ alarm clocks,

\[Y_0, Y_1, \dots, Y_n, Z_1, \dots, Z_n\]

we need the longest $n+1$ of them to be in the first $n+1$ places and the shortest $n$ of them to be in the last $n$ places. Of the $2n+1$ positions, the shortest $n$ will occupy a subset of $n$ of those positions; 1 of the $\binom{2n+1}{n}$ such subsets is the last $n$ positions, so the probability is $1/\binom{2n+1}{n}$.
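If you would like to see the argument at work numerically, the following is a rough Monte Carlo sketch, assuming rate-1 exponentials and a fixed seed. The first function estimates the original max-to-min probability; the second estimates the “reset clocks” event from the proof, where $n$ fresh clocks all ring before the shortest of the $n+1$ originals. Both function names are invented here, and both estimates should land near $1/\binom{2n+1}{n}$.

```python
import random
from math import comb

def estimate_original(n, trials, rng):
    """Fraction of trials in which max(Y) <= 2 * min(Y) for n + 1 Exp(1) draws."""
    hits = 0
    for _ in range(trials):
        ys = [rng.expovariate(1.0) for _ in range(n + 1)]
        hits += max(ys) <= 2 * min(ys)
    return hits / trials

def estimate_reset_clocks(n, trials, rng):
    """Fraction of trials in which n fresh clocks Z all beat every original Y."""
    hits = 0
    for _ in range(trials):
        ys = [rng.expovariate(1.0) for _ in range(n + 1)]
        zs = [rng.expovariate(1.0) for _ in range(n)]
        hits += max(zs) < min(ys)
    return hits / trials

rng = random.Random(2026)  # fixed seed so reruns give the same estimates
for n in (1, 2, 3):
    exact = 1 / comb(2 * n + 1, n)
    print(n, estimate_original(n, 100_000, rng), estimate_reset_clocks(n, 100_000, rng), exact)
```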

Guardian 100 best novels (stats and errors)

2026-05-17T00:00:00+00:00

I have been enjoying reading through (and arguing with!) the Guardian’s 100 best novels list. You can see the whole top 100 at that link, but the top 10 is this:

Middlemarch by George Eliot
Beloved by Toni Morrison
Ulysses by James Joyce
To the Lighthouse by Virginia Woolf
In Search of Lost Time by Marcel Proust
Anna Karenina by Leo Tolstoy
War and Peace by Leo Tolstoy
Jane Eyre by Charlotte Brontë
The Great Gatsby by F Scott Fitzgerald
Pride and Prejudice by Jane Austen

On that page, you can also click through to see all the voters and which 10 books each of them voted for. So I thought it would be fun to do a bit of statistical messing around with the votes and see what I could find out. With a bit of rootling around you can find this file, and then – in my case, with help from GPT – you can extract all the voting data. (To save anyone else the effort, you can find that voting data in a much more pleasant CSV file here, on my Github.)

Scoring system

The first task I set myself was to work out how the raw votes were used to compile the top 100. The Guardian doesn’t say exactly how this was done, but in this article we get a hint: “We scored the titles according to how often they were voted for, and then added a weighting based on individual rankings.”

I tried a mixture of guesswork and machine optimisation, but I could never get a system that exactly matched the Guardian’s top 100. In particular, no matter what I tried, My Ántonia by Willa Cather, which is #100 on the Guardian list, kept coming out somewhere around the mid-70s, messing everything up. I now think this is an error – see more on that below – but if I ignore that one book, I can get a match on the rest.

So it looks to me that the scoring method is this:

A book gets 20 points for being mentioned on a list at all.
The book then gets extra points for how high it is on the list: 1 extra point for tenth, 2 extra points for ninth, and so on, up to 10 extra points for first.
So overall, the scores are 21 for tenth, 22 for ninth, up to 30 for first.
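The guessed rule above can be sketched in a few lines of Python. This is only an illustration of the hypothesised scoring, not the Guardian's actual code; the function names and the toy ballots (with placeholder titles) are made up, and each ballot is assumed to be a list ordered from first choice to tenth.

```python
from collections import Counter

BASE_POINTS = 20  # guessed flat reward for appearing on a ballot at all

def points_for_position(position):
    """Points for a book placed at `position` (1 = first, 10 = tenth)."""
    if not 1 <= position <= 10:
        raise ValueError("position must be between 1 and 10")
    return BASE_POINTS + (11 - position)

def score_ballots(ballots):
    """Total points per title; each ballot is ordered from first to tenth."""
    totals = Counter()
    for ballot in ballots:
        for position, title in enumerate(ballot, start=1):
            totals[title] += points_for_position(position)
    return totals

# Made-up ballots with placeholder titles, not the real voting data.
toy_ballots = [["Book A", "Book B"], ["Book B", "Book C"]]
print(points_for_position(1), points_for_position(10))  # 30 21
print(score_ballots(toy_ballots).most_common())
```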

The scoring method might not exactly be this – you can probably change the 20 a bit and still get equivalent results. (And of course you can scale the scores by some constant factor without changing anything.) But I’m fairly sure the true scoring method must be pretty close to this.

This method does give a few tied results, which, if my scoring hunch is correct, the Guardian must have decided some way to break. It doesn’t make much difference, though: the first tie is that Blood Meridian by Cormac McCarthy, Crime and Punishment by Fyodor Dostoevsky, and Jude the Obscure by Thomas Hardy are all joint 68th. Also, A Portrait of the Artist as a Young Man misses out on the top 100 on the tie-breaker alone: it’s joint 98th along with three other books that made it onto the list.

Errors

I think the Guardian has made two errors in compiling the votes into the top 100.

The first is My Ántonia. That got four votes; under my scoring – which I think is their scoring too – this gives it 100 points, enough to put it joint 75th, alongside The Bluest Eye by Toni Morrison, Dracula by Bram Stoker, and The Rainbow by DH Lawrence. But in the Guardian’s list it’s #100, the last book to make it onto the list. My suspicion is that Tahmima Anam’s tenth-place vote for My Ántonia somehow got ignored. That vote gave the book 20 points for being included, plus 1 point for being tenth; without it, the book’s score goes down from 100 to 79, which moves it down from joint-75th to joint-97th, consistent with its ranking of 100.

The second problem is the book by Albert Camus called L’Étranger in French. Its title has been translated as both The Stranger (more common in the US) and The Outsider (more common in the UK). “The Stranger” received two votes, for 51 points, and “The Outsider” also received two votes, for 52 points. Individually, neither of these is enough to get on the list – but, merged together, 103 points for The Stranger/Outsider is enough to catapult it up to 71st place, between Jude the Obscure by Thomas Hardy and Kindred by Octavia E Butler.
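Merging variant titles before ranking is the kind of step that is easy to sketch. The snippet below is a hypothetical illustration (the alias table and function name are invented, and this is not how the Guardian processed its data); the only real numbers in it are the two point totals quoted above.

```python
from collections import defaultdict

# Hypothetical alias table: variant title -> canonical title.
ALIASES = {
    "The Stranger": "The Outsider/Stranger",
    "The Outsider": "The Outsider/Stranger",
}

def merge_scores(scores, aliases):
    """Add together the points of titles that share a canonical name."""
    merged = defaultdict(int)
    for title, points in scores.items():
        merged[aliases.get(title, title)] += points
    return dict(merged)

raw = {"The Stranger": 51, "The Outsider": 52}  # point totals quoted above
print(merge_scores(raw, ALIASES))  # {'The Outsider/Stranger': 103}
```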

Update: The Guardian’s corrections and clarifications column has acknowledged these mistakes (I don’t know whether indirectly from this post?), but they aren’t changing the list:

A production error meant Albert Camus’s The Outsider was omitted from our top 100 novels list; its intended placing was 71. Also, My Ántonia should have been 78, not 100 […] The rankings remain as first published but with the errata acknowledged here.

Update to the update: Actually the Guardian has decided to update the ranking to correct the errors. The Outsider is in at 71; My Ántonia rises to 78 (behind The Bluest Eye and Dracula but ahead of The Rainbow, for whatever tie-breakery reasons). The Go-Between by LP Hartley is unceremoniously shunted out of the top 100. I haven’t changed anything else in this blogpost to take these updates into account, because I can’t be bothered, but I suppose various rankings I mention are going to be out by a couple of places here and there.

Bubbling under

The first fun thing I wanted to do with the data was to see which books were just outside the top 100. The following are all the books that received three votes and that missed out only on placings within voters’ lists:

Missing out on the top 100 only by the Guardian’s tie-break:
- A Portrait of the Artist as a Young Man by James Joyce
Joint 103rd:
- Love in the Time of Cholera by Gabriel García Márquez
- The Years by Annie Ernaux
- The Lord of the Rings by J.R.R. Tolkien
- To Kill a Mockingbird by Harper Lee
- Light in August by William Faulkner
Joint 108th:
- The Mirror and the Light by Hilary Mantel
- Robinson Crusoe by Daniel Defoe
- The Name of the Rose by Umberto Eco
- The Summer Book by Tove Jansson
Joint 112th:
- Barchester Towers by Anthony Trollope
- A Dance to the Music of Time by Anthony Powell
- Drive Your Plow Over the Bones of the Dead by Olga Tokarczuk
- The Blue Flower by Penelope Fitzgerald
- Alice’s Adventures in Wonderland by Lewis Carroll (credit to Steffen Rayburn-Maarup, who noticed that one of these votes appeared under the title “Alice in Wonderland”)
Joint 117th:
- How to Be Both by Ali Smith
- Money by Martin Amis
- A Month in the Country by JL Carr (Rayburn-Maarup again – the author’s name was inconsistent in the data)
120th:
- American Pastoral by Philip Roth
Joint 121st:
- Huckleberry Finn by Mark Twain
- The Grapes of Wrath by John Steinbeck
- Sense and Sensibility by Jane Austen
- The House of Mirth by Edith Wharton
Joint 125th:
- Infinite Jest by David Foster Wallace
- The Tale of Genji by Murasaki Shikibu
- Villette by Charlotte Brontë
128th:
- Herzog by Saul Bellow
129th:
- Septology by Jon Fosse
Joint 130th:
- The Catcher in the Rye by J. D. Salinger
- Underworld by Don DeLillo
132nd:
- The Death of the Heart by Elizabeth Bowen

The Death of the Heart got three tenth-place votes. The highest ranked book to get two votes was NW by Zadie Smith, with one first-place and one second-place vote.

Steffen Rayburn-Maarup also notes that if the two votes for A Wizard of Earthsea by Ursula K Le Guin (the first novel in the Earthsea cycle) were merged with the vote for “Earthsea” (which I would assume is a vote for the cycle of novels as a whole), that would be joint 103rd too.

Best novelists

Another fun one: who are the best novelists? To make this list, I just added up the scores from all of each author’s books. Virginia Woolf now jumps over George Eliot to claim the top spot.

The top 10 authors, together with their scores, and their books (most popular first) that received at least two votes, are these:

Virginia Woolf (1687): To the Lighthouse, Mrs Dalloway, Orlando, The Waves, Jacob’s Room, A Room of One’s Own
George Eliot (1669): Middlemarch, Daniel Deronda
Jane Austen (1650): Pride and Prejudice, Emma, Persuasion, Mansfield Park, Sense and Sensibility
Toni Morrison (1501): Beloved, Song of Solomon, The Bluest Eye, Sula
Leo Tolstoy (1319): Anna Karenina, War and Peace
Charles Dickens (1149): Bleak House, David Copperfield, Great Expectations, Our Mutual Friend
James Joyce (1075): Ulysses, A Portrait of the Artist as a Young Man
Marcel Proust (741): In Search of Lost Time
Henry James (731): The Portrait of a Lady, The Golden Bowl, The Turn of the Screw, The Ambassadors
Vladimir Nabokov (697): Lolita, Pale Fire, Pnin

Many authors had enough points to make it onto the top 100 list, if only their voters had been able to converge on which book to choose. The top 10 novelists of those not represented in the top 100 novels are these:

John Steinbeck (178): The Grapes of Wrath, Cannery Row, East of Eden
Don DeLillo (170): Underworld
Saul Bellow (158): Herzog, The Adventures of Augie March
Anthony Trollope (130): Barchester Towers
Angela Carter (129): Nights at the Circus, Wise Children
Iris Murdoch (129): five books with one vote each
Penelope Fitzgerald (127): The Blue Flower, The Beginning of Spring
Evelyn Waugh (121): A Handful of Dust
Abdulrazak Gurnah (120): Afterlives, Paradise
John Updike (119): the Rabbit omnibus got a vote, as did three of its constituent parts

Plus Albert Camus (157: The Outsider/Stranger, The Plague), who should have been on the list already anyway.

Alternative scoring methods

The scoring method adopted here isn’t the only way to convert votes to a ranking. I thought it might be interesting to see how other ways of scoring would change the results.

The main axis along which to compare scoring methods is what I shall call “aggressiveness”. An aggressive scoring method gives big rewards for being at the top of someone’s list and very little credit for being down towards the nine/ten area; while a non-aggressive scoring method gives a big reward for being on someone’s list at all, but only a very small extra reward for being high on that list. It seemed to make sense to look at the two extremes of this axis.

Aggressive scoring

The maximally aggressive method is simply to rank on the number of #1 votes – how many people said this was their favourite novel. If two books are tied on #1 votes, you then look at #2 votes, and so on.
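This ranking rule amounts to a lexicographic sort, which a short Python sketch makes concrete. The function names and the toy ballots (placeholder titles only) are made up for illustration; each ballot is assumed to be ordered from first choice downwards.

```python
from collections import Counter

def aggressive_key(title, ballots):
    """Tuple (#1 votes, #2 votes, ..., #10 votes) for one title."""
    counts = Counter()
    for ballot in ballots:
        if title in ballot:
            counts[ballot.index(title) + 1] += 1
    return tuple(counts[position] for position in range(1, 11))

def aggressive_ranking(ballots):
    """Titles sorted by #1 votes, with ties broken by #2 votes, and so on."""
    titles = {title for ballot in ballots for title in ballot}
    # Tuples compare lexicographically, which is exactly the tie-break rule.
    return sorted(titles, key=lambda t: aggressive_key(t, ballots), reverse=True)

# Made-up ballots with placeholder titles, not the real voting data.
toy_ballots = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"], ["C", "B", "A"]]
print(aggressive_ranking(toy_ballots))  # ['A', 'B', 'C']
```

In the toy data, B and C each have one #1 vote, and B wins the tie on #2 votes.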

Under this method, the top 10 changes to this:

Middlemarch by George Eliot (19 #1s, no change)
Ulysses by James Joyce (13 #1s, up 1)
Anna Karenina by Leo Tolstoy (7 #1s, up 3)
Beloved by Toni Morrison (7 #1s, down 2)
War and Peace by Leo Tolstoy (7 #1s, up 2)
In Search of Lost Time by Marcel Proust (6 #1s, down 1)
Wuthering Heights by Emily Brontë (6 #1s, up 13)
To the Lighthouse by Virginia Woolf (5 #1s, down 4)
Don Quixote by Miguel de Cervantes (5 #1s, up 17)
Moby-Dick by Herman Melville (4 #1s, up 5)

Some of the big risers up the list on this method include:

Jacob’s Room by Virginia Woolf (29th, up 61)
Catch-22 by Joseph Heller (39th, up 58)
The Road by Cormac McCarthy (47th, up 51)
Life and Fate by Vasily Grossman (43rd, up 48)
Invisible Cities by Italo Calvino (46th, up 47)

Many books that had two or three total votes, one of which was a #1 vote, failed to make the original top 100 but would make the aggressively scored 100. These include:  

NW by Zadie Smith
The Enigma of Arrival by V. S. Naipaul
The Years by Annie Ernaux
Cannery Row by John Steinbeck
The Lord of the Rings by J.R.R. Tolkien

Gentle scoring

We could also look at minimally aggressive scoring. Here, we just rank on total number of votes. Given a tie, we then look at total number of votes if participants were invited to list only 9 books, and so on.
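The gentle rule is again a lexicographic sort, just with cumulative counts. Here is a minimal sketch under the same assumptions as before (ordered ballots, invented function names, placeholder titles rather than real data).

```python
def gentle_key(title, ballots):
    """Tuple (votes in top 10, votes in top 9, ..., votes in top 1)."""
    positions = [ballot.index(title) + 1 for ballot in ballots if title in ballot]
    return tuple(sum(p <= cutoff for p in positions) for cutoff in range(10, 0, -1))

def gentle_ranking(ballots):
    """Titles sorted by total votes, ties broken by votes on shortened ballots."""
    titles = {title for ballot in ballots for title in ballot}
    return sorted(titles, key=lambda t: gentle_key(t, ballots), reverse=True)

# Made-up ballots with placeholder titles, not the real voting data.
toy_ballots = [["A", "B", "C"], ["B", "A"], ["C", "A", "B"]]
print(gentle_ranking(toy_ballots))  # ['A', 'B', 'C']
```

Here A and B both have three votes; A comes out ahead because all three of its votes survive when ballots are cut to two books, while only two of B's do.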

Now, the Guardian’s method is already pretty tame – 21 points for being on a list at all, with only a maximum of 9 more based on position – so this doesn’t change the list very much at all. But, for the record, the top 10 would be this:

Middlemarch by George Eliot (56 votes, no change)
Beloved by Toni Morrison (43 votes, no change)
Ulysses by James Joyce (36 votes, no change)
To the Lighthouse by Virginia Woolf (31 votes, no change)
In Search of Lost Time by Marcel Proust (27 votes, no change)
Anna Karenina by Leo Tolstoy (26 votes, no change)
Jane Eyre by Charlotte Brontë (21 votes, up 1)
War and Peace by Leo Tolstoy (20 votes, down 1)
The Great Gatsby by F Scott Fitzgerald (20 votes, up 3)
Pride and Prejudice by Jane Austen (20 votes, down 1)

One book, Love in the Time of Cholera by Gabriel García Márquez, would enter the top 100. There are no huge moves, although maybe A Farewell to Arms by Ernest Hemingway and The Vegetarian by Han Kang would rise a few spots.

Weirdest and least-weird ballots

The voter who was most representative of the electoral college as a whole was Eimear McBride, narrowly beating Siri Hustvedt – at least by one way of measuring representative-ness that I can’t be bothered to get into right now. McBride voted for five out of the top six on the final list; her full ballot was as follows:  

Ulysses by James Joyce (#3)
Crime and Punishment by Fyodor Dostoevsky (#69)
Middlemarch by George Eliot (#1)
In Search of Lost Time by Marcel Proust (#5)
Wuthering Heights by Emily Brontë (#20)
The Magic Mountain by Thomas Mann (#42)
To the Lighthouse by Virginia Woolf (#4)
Anna Karenina by Leo Tolstoy (#6)
Nineteen Eighty-Four by George Orwell (#16)
Moby-Dick by Herman Melville (#15)

The most idiosyncratic voter was Nussaibah Younis – only one of the books on her ballot was voted for by someone else, and even that book only once. Her ballot was as follows:  

The Song of Achilles by Madeline Miller (only vote)
Detransition, Baby by Torrey Peters (only vote)
The Trees by Percival L. Everett (only vote)
The Sellout by Paul Beatty (#201)
Vernon Subutex 1 by Virginie Despentes (only vote)
Love Me Tender by Constance Debré (only vote)
Big Swiss by Jen Beagin (only vote)
Mammoth by Eva Baltasar (only vote)
A Long Way Down by Nick Hornby (only vote)
We All Want Impossible Things by Catherine Newman (only vote)

(This is a correction – I earlier awarded this to Nikesh Shukla, who actually had the fifth-weirdest ballot.)

My ballot

No one asked, but my votes would be:

The Great Gatsby by F Scott Fitzgerald (#11)
One Hundred Years of Solitude by Gabriel García Márquez (#17)
Nineteen Eighty-Four by George Orwell (#16)
The Metamorphosis by Franz Kafka (#48)
The Outsider/Stranger by Albert Camus (should have been #71)
The Unbearable Lightness of Being by Milan Kundera (#490)
Alice’s Adventures in Wonderland by Lewis Carroll (#112)
A Clockwork Orange by Anthony Burgess (no votes)
The Sun Also Rises by Ernest Hemingway (#226)
The Road by Cormac McCarthy (#98)

I declined Chronicle of a Death Foretold and The Old Man and the Sea, which I might slightly prefer to their heftier siblings, on the grounds that they are more novellas than novels. But then I broke that rule to allow The Metamorphosis. I feel a bit bad these are all by men – Play It As It Lays by Joan Didion (no votes) nearly made it. And is Alice in Wonderland really a novel exactly, anyway? Perhaps not. Although if it is, why couldn’t I have counted Charlie and the Chocolate Factory too…

The multiset coefficient deserves more respect!

2026-03-09T00:00:00+00:00

Being the second in a series of blogposts quite unnecessarily scolding the reader about the binomial coefficient (Previously: “Don’t write the binomial coefficient as n! / k! (n-k)!”)

The binomial coefficient

\[\binom{n}{k} = \frac{n^{\underline{k}}}{k!} = \frac{n(n-1)\cdots(n-k+1)}{k!}\]

counts the number of ways in which you can choose $k$ objects from a set of $n$ objects. How many ways can I pick a team of $k$ players from a squad of $n$ players? How many hands of $k$ cards could I deal from a deck of $n$ cards? How many different lottery tickets of $k$ numbers are there using the numbers from 1 to $n$?

Importantly, each object can appear at most once: you can’t choose the same player to play twice in the team, or have the same card appear twice in your hand, or choose the same number twice on your lottery ticket.

But sometimes we are interested in choosing $k$ objects from a set of $n$ objects where each object can appear multiple times. How many boxes of $k$ chocolates can be made from a range of $n$ varieties? In how many ways can $k$ small balls be placed into $n$ large boxes? In how many ways can $k$ identical tasks be assigned to $n$ workers?

How can we count these? If you ask a mathematician, they will probably tell you that these can still be counted using the binomial coefficient, just slightly differently. The number of sets with multiplicity – called multisets – is the slightly different binomial coefficient $\binom{n+k-1}{k}$, with $n+k-1$, rather than just $n$, on the top.

There’s a famously elegant method to show that this binomial coefficient counts the number of multisets, called the “stars and bars” construction. Think of the “$k$ balls into $n$ boxes” application. Let’s take $n = 5$ boxes and $k = 7$ balls. We can picture the $n = 5$ boxes by drawing $n - 1 = 4$ bars:

    |     |     |     |

This creates $n = 5$ boxes: the first box is to the left of the first bar; the second box is between the first and second bars; the third box in between the second and third bars; the fourth box in between the third and fourth bars; and the fifth and final box is to the right of the fourth bar.

We can then denote which boxes the balls fall into by using a star to mark each ball.

 ** |     | *** |  *  |  *

This denotes that there are two balls in the first box, no balls in the second box, three balls in the third box, and one each in the fourth and fifth boxes.

We can now even up the unequal gaps in the pattern to leave:

 * * | | * * * | * | *

In total we have a pattern of $n + k - 1 = 11$ symbols: $n - 1 = 4$ bars and $k = 7$ stars. Any such pattern of $n + k - 1$ symbols ($k$ stars and $n - 1$ bars) corresponds to a multiset. So the number of multisets is the number of ways to place $k$ stars among $n + k - 1$ positions, which is $\binom{n+k-1}{k}$.
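The stars-and-bars correspondence is easy to check by brute force. The sketch below (helper name invented for illustration) lists every multiset of $k = 7$ balls over $n = 5$ boxes using Python's `itertools.combinations_with_replacement`, renders each one as a pattern of stars and bars, and compares the count with $\binom{n+k-1}{k}$.

```python
from itertools import combinations_with_replacement
from math import comb

def to_stars_and_bars(multiset, n):
    """Render a multiset of box labels 1..n as a stars-and-bars string."""
    counts = [multiset.count(box) for box in range(1, n + 1)]
    return "|".join("*" * c for c in counts)

n, k = 5, 7
multisets = list(combinations_with_replacement(range(1, n + 1), k))
patterns = {to_stars_and_bars(m, n) for m in multisets}

# The worked example: two balls in box 1, none in box 2, three in box 3, ...
print(to_stars_and_bars((1, 1, 3, 3, 3, 4, 5), n))  # **||***|*|*
# Every multiset gives a distinct pattern, and the count matches the formula.
print(len(multisets), len(patterns), comb(n + k - 1, k))  # 330 330 330
```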

The fact that normal sets and multisets are both counted by the binomial coefficient (in slightly different ways) is very convenient, this mathematician would probably say. You only have to learn one thing! Whenever you are dealing with multisets, you can just switch immediately to binomial coefficients, and use all the useful facts you already know about the binomial coefficient to help with your maths problem.

But I don’t really like this. It makes the “multiset coefficient” (as I will call it) merely an appendage to the much more important binomial coefficient, unworthy of study in its own right. But I want to raise the status of the multiset coefficient, to value it as an equally cherished sibling of the binomial coefficient!

To do this, I’m going to start by giving the multiset coefficient its own notation and algebraic definition. Then I want to go through some of the identities involving the binomial coefficient and give multiset coefficient equivalents of those results – not merely by substituting $\binom{n+k-1}{k}$ into the already-existing binomial identity to give a simple multiset corollary, but rather by taking the logical argument behind the binomial identity and applying it anew to the multiset situation, then seeing what is produced by that analysis.

1. Notation

There does not seem to be a universally recognised notation for the multiset coefficient, but Richard Stanley (author of the legendary Enumerative Combinatorics textbook, which deals with this sort of thing) suggests

\[{\bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg)}\]

– like the binomial coefficient, but with double brackets. If it’s good enough for Prof Stanley, it’s good enough for me.

In speech, the binomial coefficient $\binom{n}{k}$ is said as “$n$ choose $k$”. Stanley proposes that $\big(\kern-0.2em\tbinom{n}{k}\kern-0.2em\big)$ should be “$n$ multichoose $k$”, which I also like.

2. Algebraic definition

As we know, the binomial coefficient can be calculated with the expression

\[\binom{n}{k} = \frac{n^{\underline{k}}}{k!} = \frac{n(n-1)\cdots(n-k+1)}{k!} .\]

The argument here is that the “falling factorial” $n^{\underline{k}} = n(n-1)\cdots(n-k+1)$ in the numerator is the number of ways to choose the $k$ items when the order does matter; dividing by $k!$ then compensates for each set having been chosen in $k!$ different orders.

The equivalent expression for the multiset coefficient is

\[{\bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg)} = \frac{n^{\overline{k}}}{k!} = \frac{n(n+1)\cdots(n+k-1)}{k!} .\]

Again, this can be justified by the numerator – here a “rising factorial” $n^{\overline{k}} = n(n+1)\cdots(n+k-1)$ – being the count of the multisets where the order matters, and the denominator $k!$ allowing for the different orderings.

Why do we get this rising factorial when the order matters? I like to think of hanging $k$ flags on $n$ flagpoles:

The first flag has $n$ choices: it can go on any of the $n$ flagpoles.
The second flag now has $n+1$ choices: either it goes on one of the $n - 1$ empty flagpoles, or it goes on the same pole as the first flag, in which case it can either go above the first flag or below it. Overall, that’s $(n-1) + 2 = n+1$ choices.
The third flag has $n+2$ choices. If the first two flags went on different poles, we have $n-2$ empty flagpoles, above or below the first flag, and above or below the second flag, making $(n-2) + 2 + 2 = n+2$. If the first two flags went on the same pole, we have $n-1$ empty poles, or the busy pole: above both flags, in between them, or below both flags; that’s $(n-1)+3 = n+2$ as well.

As we go, each flag creates an extra space, either by splitting an empty pole into “above or below the new flag”, or by splitting the “bit of pole” it gets attached to into directly above or directly below the new flag. Hence we get $n^{\overline{k}} = n(n+1)\cdots(n+k-1)$ multisets where the order matters. Dividing by the $k!$ orderings of the flags gives the expression we were after.
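The flagpole count can be confirmed by brute force for small cases. In this sketch (all names invented for illustration) the flags are labelled $0, \dots, k-1$, each pole is a tuple of flags read from top to bottom, and the code builds every arrangement by inserting the flags one at a time into every available slot.

```python
from math import factorial, prod

def rising_factorial(n, k):
    """n (n + 1) ... (n + k - 1), with the empty product equal to 1."""
    return prod(n + i for i in range(k))

def multichoose(n, k):
    """n multichoose k, computed as the rising factorial divided by k!."""
    return rising_factorial(n, k) // factorial(k)

def flag_arrangements(n, k):
    """All ways to hang flags 0..k-1 on n poles when order on a pole matters."""
    arrangements = {tuple(() for _ in range(n))}
    for flag in range(k):
        extended = set()
        for poles in arrangements:
            for p, pole in enumerate(poles):
                # A pole holding j flags offers j + 1 slots for the new flag.
                for slot in range(len(pole) + 1):
                    new_pole = pole[:slot] + (flag,) + pole[slot:]
                    extended.add(poles[:p] + (new_pole,) + poles[p + 1:])
        arrangements = extended
    return arrangements

n, k = 3, 3
print(len(flag_arrangements(n, k)), rising_factorial(n, k))  # 60 60
print(multichoose(n, k))  # 10
```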

3. A special item

The most famous identity involving the binomial coefficient is Pascal’s formula:

\[\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1} .\]

To decide what the multiset coefficient equivalent of this is, we’ll have to think about what it shows. Statements like these are best proven using a “double counting” argument: that is, you show that both sides of the equation are counting the same thing in different ways.

Here, the left-hand side $\binom{n}{k}$ is, of course, just the number of ways of choosing $k$ items from $n$ items. Suppose one of the objects is “special” somehow; then we can count separately the number of sets that don’t include the special item and those that do include the special item. If the special item isn’t included, then we need to pick all $k$ items from the $n-1$ non-special items, which can be done in $\binom{n-1}{k}$ ways. If the special item is included, then we only need $k-1$ more of the $n-1$ non-special items, which can be done in $\binom{n-1}{k-1}$ ways. Adding these two together gives the right-hand side.

Almost the same argument works with the multiset coefficient. If the special item isn’t included, then we need to pick all $k$ items from the $n-1$ non-special items, which can be done in $\big(\kern-0.2em\tbinom{n-1}{k}\kern-0.2em\big)$ ways. If the special item is included, then we only need $k-1$ more items – but there are still $n$ choices, not $n-1$, because we’re allowed to pick yet more of the special item, so this gives $\big(\kern-0.2em\tbinom{n}{k-1}\kern-0.2em\big)$.

The multiset coefficient version of Pascal’s formula is therefore

\[{\bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg)} = {\bigg(\kern-0.4em\dbinom{n-1}{k}\kern-0.4em\bigg)} + {\bigg(\kern-0.4em\dbinom{n}{k-1}\kern-0.4em\bigg)} .\]
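For a concrete check of this split, the sketch below enumerates all $k$-multisets from $\{1, \dots, n\}$, treats item 1 as the special one, and counts those without and with it. The helper names are made up, and `multichoose` is computed from the rising factorial formula of section 2.

```python
from itertools import combinations_with_replacement
from math import factorial, prod

def multichoose(n, k):
    """n multichoose k via the rising factorial n (n + 1) ... (n + k - 1) / k!."""
    return prod(n + i for i in range(k)) // factorial(k)

def split_by_special(n, k, special=1):
    """Count k-multisets from {1..n} that avoid / contain the special item."""
    without = with_special = 0
    for multiset in combinations_with_replacement(range(1, n + 1), k):
        if special in multiset:
            with_special += 1
        else:
            without += 1
    return without, with_special

n, k = 5, 3
without, with_special = split_by_special(n, k)
print(without, multichoose(n - 1, k))          # 20 20
print(with_special, multichoose(n, k - 1))     # 15 15
print(without + with_special, multichoose(n, k))  # 35 35
```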

4. A boss

What about this binomial coefficient identity:

\[k\binom{n}{k} = n \binom{n-1}{k-1} .\]

This counts the number of ways of picking a board of $k$ people from an office of $n$ employees, with one of those people being the boss of the board. We can pick the board in $\binom{n}{k}$ ways, then promote one of those $k$ board-members to be the boss, giving the left-hand side. Alternatively, we can pick one of the $n$ employees to be the boss, then fill out the $k-1$ non-boss board positions from the remaining $n-1$ employees in $\binom{n-1}{k-1}$ ways, giving the right-hand side.

A similar argument works for the multiset coefficient. We can place $k$ flags on $n$ poles, then pick one of them to be the “boss-flag”. Alternatively, we can pick one of $n$ poles to put the boss-flag on. We then need $k-1$ more flags to be put in $n + 1$ locations: the $n-1$ empty poles, above the boss-flag, or below the boss-flag. The new identity is, then:

\[k{\bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg)} = n {\bigg(\kern-0.4em\dbinom{n+1}{k-1}\kern-0.4em\bigg)} .\]

5. Many bosses

Suppose the board has not merely one boss but a sub-board of $j$ bosses. Again, we can pick $k$ board-members then promote $j$ of them, or we can pick $j$ bosses then the $k-j$ remaining non-bosses. We get

\[\binom{k}{j} \binom{n}{k} = \binom{n}{j}\binom{n-j}{k-j} .\]

The multiset version of this is also similar. We can put up $k$ flags on $n$ poles, then pick $j$ of them to be boss-flags. In this second step, we can only pick each hoisted flag at most once to be a boss-flag, so that gives a binomial, not multiset, coefficient on the left-hand side. As before, if we pick the $j$ boss-flags first, this creates $n+j$ spaces for the remaining flags, since each boss flag divides a location in two. We get:

\[\binom{k}{j} {\bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg)} = {\bigg(\kern-0.4em\dbinom{n}{j}\kern-0.4em\bigg)} {\bigg(\kern-0.4em\dbinom{n+j}{k-j}\kern-0.4em\bigg)} .\]
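Both boss identities for the multiset coefficient can be spot-checked numerically. This is only a check over a small range, not a proof; `multichoose` is an illustrative helper built from the rising factorial formula of section 2.

```python
from math import comb, factorial, prod

def multichoose(n, k):
    """n multichoose k via the rising factorial n (n + 1) ... (n + k - 1) / k!."""
    return prod(n + i for i in range(k)) // factorial(k)

for n in range(1, 8):
    for k in range(1, 8):
        # One boss-flag: k ((n, k)) = n ((n + 1, k - 1)).
        assert k * multichoose(n, k) == n * multichoose(n + 1, k - 1)
        # j boss-flags: C(k, j) ((n, k)) = ((n, j)) ((n + j, k - j)).
        for j in range(k + 1):
            lhs = comb(k, j) * multichoose(n, k)
            rhs = multichoose(n, j) * multichoose(n + j, k - j)
            assert lhs == rhs

print("both boss identities hold for 1 <= n, k <= 7")
```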

I’m not sure whether I should be fully satisfied with this one or not: it feels a bit of a compromise for one of the coefficients in this expression to be a rogue binomial coefficient, rather than all four terms being multiset coefficients. But I can’t think of anything better – can you? (Alternatively, is there an “opposite” identity with three binomial coefficients and one multiset coefficient?)

6. Generating function

The generating function of the binomial coefficients is

\[\sum_{k=0}^n \binom{n}{k} x^k = (1 + x)^n .\]

The generating function of the multiset coefficients is

\[\sum_{k=0}^\infty {\bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg)} x^k = (1 - x)^{-n} .\]

To see the first of these, imagine multiplying out the brackets of

\[(1+x)^n = (1+x)(1+x)\cdots(1+x) .\]

To get a term $x^k$, you need to have picked an “$x$” from $k$ of the sets of brackets and 1s from the other $n-k$ sets of brackets. This can be done in $\binom{n}{k}$ ways.

For the second, we use the geometric progression formula

\[(1-x)^{-1} = (1 + x + x^2 + x^3 + \cdots)\]

and imagine multiplying out

\[(1-x)^{-n} = (1 + x + x^2 + \cdots)\cdots (1 + x + x^2 + \cdots) .\]

To get a term $x^k$, you need to have picked an “$x$” $k$ times, but that could be multiple $x$’s from the same bracket by picking an $x^2$ or $x^3$ and so on. So we are choosing with multiplicities, giving $\big(\kern-0.2em\tbinom{n}{k}\kern-0.2em\big)$ ways.
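The multiplying-out argument can be imitated directly on truncated power series. In the sketch below (function names invented), multiplying a series by $1 + x + x^2 + \cdots$ replaces each coefficient with the running sum of the coefficients up to that degree, so doing this $n$ times gives the coefficients of $(1-x)^{-n}$ up to the chosen degree.

```python
from math import factorial, prod

def multichoose(n, k):
    """n multichoose k via the rising factorial n (n + 1) ... (n + k - 1) / k!."""
    return prod(n + i for i in range(k)) // factorial(k)

def geometric_power_coefficients(n, max_degree):
    """Coefficients of (1 + x + x^2 + ...)^n up to x^max_degree."""
    coeffs = [1] + [0] * max_degree  # start from the constant series 1
    for _ in range(n):
        # Multiplying by 1 + x + x^2 + ... turns coefficients into running sums.
        running = 0
        for degree in range(max_degree + 1):
            running += coeffs[degree]
            coeffs[degree] = running
    return coeffs

print(geometric_power_coefficients(3, 5))     # [1, 3, 6, 10, 15, 21]
print([multichoose(3, k) for k in range(6)])  # [1, 3, 6, 10, 15, 21]
```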

7. Maximum item

Another binomial coefficient identity is this:

\[\sum_{m=1}^n \binom{m-1}{k-1} = \binom{n}{k} ,\]

known as the “hockeystick identity”, due to the shape the relevant coefficients draw on Pascal’s triangle. (Some people prefer to write the lower limit in the sum as $m = k$, since the summands for $m = 1, 2, \dots, k - 1$ are all 0.)

This identity comes from counting the $k$-subsets of $\{1, 2, \dots, n\}$ based on their maximum item. If the maximum item is $m$, then you need to choose the remaining $k - 1$ items from the $m-1$ items that are smaller than $m$.

For multisets, the argument is almost the same, but you can pick more “joint-maximum” items that are equal to $m$ if you want. So you need to choose the remaining $k-1$ items from the $m$ items that are smaller than or equal to $m$. Hence we get

\[\sum_{m=1}^n \bigg(\kern-0.4em\dbinom{m}{k-1}\kern-0.4em\bigg) = \bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg)\]

(where all the summands are nonzero).
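Grouping multisets by their maximum item is easy to do by brute force. The following sketch (helper names invented for illustration) does so for $n = 4$ and $k = 3$ and compares each group with the corresponding summand.

```python
from itertools import combinations_with_replacement
from math import factorial, prod

def multichoose(n, k):
    """n multichoose k via the rising factorial n (n + 1) ... (n + k - 1) / k!."""
    return prod(n + i for i in range(k)) // factorial(k)

def count_by_maximum(n, k):
    """Count k-multisets from {1..n}, grouped by their largest element (k >= 1)."""
    counts = {m: 0 for m in range(1, n + 1)}
    for multiset in combinations_with_replacement(range(1, n + 1), k):
        counts[max(multiset)] += 1
    return counts

n, k = 4, 3
counts = count_by_maximum(n, k)
print(counts)                                               # {1: 1, 2: 3, 3: 6, 4: 10}
print([multichoose(m, k - 1) for m in range(1, n + 1)])     # [1, 3, 6, 10]
print(sum(counts.values()), multichoose(n, k))              # 20 20
```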

9. Men and women

Vandermonde’s identity is the result

\[\binom{n+m}{k} = \sum_{j=0}^k \binom{n}{j} \binom{m}{k-j} .\]

Suppose the office has $n+m$ employees: $n$ women and $m$ men. Again, we want to choose a board of $k$ employees, which can be done in $\binom{n+m}{k}$ ways. We can also count the boards by their gender split: the all-male boards, those with 1 woman and $k-1$ men, and so on. The number of ways to pick a board with $j$ women and the remaining $k-j$ members being men is the product $\binom{n}{j} \binom{m}{k-j}$.

Exactly the same argument works for multisets, so we have the same result:

\[\bigg(\kern-0.4em\dbinom{n+m}{k}\kern-0.4em\bigg) = \sum_{j=0}^k \bigg(\kern-0.4em\dbinom{n}{j}\kern-0.4em\bigg) \bigg(\kern-0.4em\dbinom{m}{k-j}\kern-0.4em\bigg) .\]

10. Symmetry

I’ve left an important binomial identity – maybe the most important one – until last: the symmetry relation

\[\binom{n}{k} = \binom{n}{n-k} .\]

You can choose a set of $k$ items from $n$ items; but, alternatively, you can choose the $n-k$ items that are not going to be in your set, leaving the $k$ items you want left over.

I left this until last, because I wasn’t sure what the right multiset version of this is. The following is certainly a symmetry relation, at least. Recall the stars-and-bars construction from before: we had

$n-1$ bars, which define $n$ boxes,
$k$ stars, representing $k$ objects.

What if we now swap the roles of the stars and the bars – so now the stars are defining the boxes, into which we place some bars representing objects? We would then have

$k$ stars, which define $k+1$ boxes,
$n-1$ bars, representing $n-1$ objects.

Since these are both counting the same patterns of stars and bars, we get

\[\bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg) = \bigg(\kern-0.4em\dbinom{k+1}{n-1}\kern-0.4em\bigg) .\]

(This argument sets up an explicit bijection between $k$-submultisets of an $n$-set and $(n-1)$-submultisets of a $(k+1)$-set. I admit I’d never thought about this bijection before. I’d like to know more about its properties.)

It’s not totally clear this is the right multiset generalisation of the binomial coefficient symmetry relation – in particular, the swapping over of top and bottom (the $k$ now appears on the top of the multiset coefficient on the right-hand side of the equation) seems a bit weird. But I think it does work: if you do the forbidden thing I’ve told you not to do and convert the multiset coefficient to a binomial coefficient, apply the symmetry relation, then convert back, you get

\[\bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg) = \binom{n+k-1}{k} = \binom{n+k-1}{n-1} = \binom{(k+1) + (n-1) - 1}{n-1} = \bigg(\kern-0.4em\dbinom{k+1}{n-1}\kern-0.4em\bigg) ,\]

which is what we claimed, although don’t tell anyone I did this.
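The stars-and-bars swap can be made concrete. The following is a small Python sketch of the bijection, assuming multisets are represented as sorted tuples of labels starting from 0; the names `to_pattern` and `swap_roles` are hypothetical.

```python
from itertools import combinations_with_replacement

def to_pattern(multiset, n):
    """Encode a multiset drawn from {0, ..., n-1}: stars are objects, bars are box walls."""
    return "|".join("*" * multiset.count(i) for i in range(n))

def swap_roles(pattern):
    """Read the same pattern with stars as box walls and bars as objects."""
    pieces = pattern.split("*")  # k stars give k + 1 boxes
    return tuple(i for i, piece in enumerate(pieces) for _ in piece)

n, k = 3, 3
images = {swap_roles(to_pattern(ms, n))
          for ms in combinations_with_replacement(range(n), k)}
# Every (n-1)-multiset of a (k+1)-set is hit exactly once.
assert images == set(combinations_with_replacement(range(k + 1), n - 1))
print(to_pattern((0, 0, 2), 3), swap_roles("**||*"))  # **||* (2, 2)
```

For example, the 3-multiset $\{0, 0, 2\}$ of a 3-set has pattern `**||*`, which reads the other way round as the 2-multiset $\{2, 2\}$ of a 4-set.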

In summary

So to gather everything together, here are the results we’ve discussed:

Binomial coefficient	Multiset coefficient
$\dbinom{n}{k} = \dfrac{n^{\underline{k}}}{k!}$	$\bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg) = \dfrac{n^{\overline{k}}}{k!}$
$\dbinom{n}{k} = \dbinom{n}{n-k}$	$\bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg) = \bigg(\kern-0.4em\dbinom{k+1}{n-1}\kern-0.4em\bigg)$
$\dbinom{n}{k} = \dbinom{n-1}{k} + \dbinom{n-1}{k-1}$	$\bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg) = \bigg(\kern-0.4em\dbinom{n-1}{k}\kern-0.4em\bigg) + \bigg(\kern-0.4em\dbinom{n}{k-1}\kern-0.4em\bigg)$
$k\dbinom{n}{k} = n \dbinom{n-1}{k-1}$	$k\bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg) = n \bigg(\kern-0.4em\dbinom{n+1}{k-1}\kern-0.4em\bigg)$
$\dbinom{k}{j} \dbinom{n}{k} = \dbinom{n}{j}\dbinom{n-j}{k-j}$	$\dbinom{k}{j} \bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg) = \bigg(\kern-0.4em\dbinom{n}{j}\kern-0.4em\bigg) \bigg(\kern-0.4em\dbinom{n+j}{k-j}\kern-0.4em\bigg)$
$\displaystyle\sum_{k=0}^n \binom{n}{k} x^k = (1 + x)^n$	$\displaystyle\sum_{k=0}^\infty \bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg) x^k = (1 - x)^{-n}$
$\displaystyle\sum_{m=1}^n \binom{m-1}{k-1} = \dbinom{n}{k}$	$\displaystyle\sum_{m=1}^n \bigg(\kern-0.4em\dbinom{m}{k-1}\kern-0.4em\bigg) = \bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg)$
$\dbinom{n+m}{k} = \displaystyle\sum_{j=0}^k \dbinom{n}{j} \dbinom{m}{k-j}$	$\bigg(\kern-0.4em\dbinom{n+m}{k}\kern-0.4em\bigg) = \displaystyle\sum_{j=0}^k \bigg(\kern-0.4em\dbinom{n}{j}\kern-0.4em\bigg) \bigg(\kern-0.4em\dbinom{m}{k-j}\kern-0.4em\bigg)$

And remember: the second column is no less important than the first column! Justice for the multiset coefficient!

4 is discrete π

2026-02-01T00:00:00+00:00

Recap

In my last blogpost, I looked for a discrete version of the exponential function

\[\exp(x) = \frac{x^0}{0!} + \frac{x^1}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots .\]

By swapping the standard power $x^n$ for the falling factorial power $x^{\underline{n}} = x(x-1)\cdots(x-n+1)$, we got the “discrete exponential”

\[\begin{align} \operatorname{dexp}(x) &= \frac{x^{\underline{0}}}{0!} + \frac{x^{\underline{1}}}{1!} + \frac{x^{\underline{2}}}{2!} + \frac{x^{\underline{3}}}{3!} + \cdots \\ &= \binom x0 + \binom x1 + \binom x2 + \binom x3 + \cdots , \end{align}\]

which is just $\operatorname{dexp}(x) = 2^x$.

Pleasingly, just as $\frac{\mathrm{d}}{\mathrm dx} \exp(x) = \exp(x)$, so we have a “discrete equivalent result” $\Delta \operatorname{dexp}(x) = \operatorname{dexp}(x)$, where $\Delta$ is the discrete difference $\Delta f(x) = f(x+1)-f(x)$.

We also found that, more generally, the discrete equivalent of $\exp(\alpha x)$ (considered as a function of $x$) is $(1+\alpha)^x$.

Discrete cos and sin (non-negative integer x)

So, for the next step, how about discrete sine and discrete cosine? Can we define discrete versions of sin and cos that end up having similar properties to their standard trigonometric counterparts?

The standard Taylor series definitions are

\[\begin{align} \cos(x) &= \frac{x^0}{0!} - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots \\ \sin(x) &= \frac{x^1}{1!} - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots \end{align}\]

By our now-standard procedure of swapping powers for falling factorials then recognising $x^{\underline{n}}/n!$ as a binomial coefficient, we should take the following definitions:

\[\begin{align} \operatorname{dcos}(x) &= \binom x0 - \binom x2 + \binom x4 - \cdots \\ \operatorname{dsin}(x) &= \binom x1 - \binom x3 + \binom x5 - \cdots \end{align}\]

For the moment, let’s stick to non-negative integer $x = 0, 1, 2, \dots$. The discrete cos sequence is A146559

\[1, 1, 0, -2, -4, -4, 0, 8, 16, 16, 0, -32, -64, -64, 0, 128, 256, 256, \dots\]

and the discrete sin sequence is A009545

\[0, 1, 2, 2, 0, -4, -8, -8, 0, 16, 32, 32, 0, -64, -128, -128, 0, 256, \dots\]

Pleasingly, we can see the sequences are related by the difference operator:

\[\begin{align} \Delta \operatorname{dcos}(x) &= - \operatorname{dsin}(x) \\ \Delta \operatorname{dsin}(x) &= \operatorname{dcos}(x) ; \end{align}\]

which is exactly the same structure as the derivatives of usual sin and cos:

\[\begin{align} \frac{\mathrm{d}}{\mathrm{d}x} \operatorname{cos}(x) &= - \operatorname{sin}(x) \\ \frac{\mathrm{d}}{\mathrm{d}x}\operatorname{sin}(x) &= \operatorname{cos}(x) . \end{align}\]
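The binomial-sum definitions and the two difference relations are easy to check by machine for small non-negative integers. This is only a sketch: the Python function names `dcos` and `dsin` simply mirror the notation above.

```python
from math import comb

def dcos(x):
    """C(x,0) - C(x,2) + C(x,4) - ... for a non-negative integer x."""
    return sum((-1) ** (n // 2) * comb(x, n) for n in range(0, x + 1, 2))

def dsin(x):
    """C(x,1) - C(x,3) + C(x,5) - ... for a non-negative integer x."""
    return sum((-1) ** (n // 2) * comb(x, n) for n in range(1, x + 1, 2))

for x in range(16):
    # Forward differences mirror the derivatives of cos and sin.
    assert dcos(x + 1) - dcos(x) == -dsin(x)
    assert dsin(x + 1) - dsin(x) == dcos(x)

print([dcos(x) for x in range(9)])  # [1, 1, 0, -2, -4, -4, 0, 8, 16]
print([dsin(x) for x in range(9)])  # [0, 1, 2, 2, 0, -4, -8, -8, 0]
```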

Further, just by looking at the two sequences, we can tell that we have sort-of-periodic behaviour modulo 8.

x	dcos	dsin
$x = 0 \bmod 8$	$\operatorname{dcos}(x) = 2^{x/2}$	$\operatorname{dsin}(x) = 0$
$x = 1 \bmod 8$	$\operatorname{dcos}(x) = 2^{(x-1)/2}$	$\operatorname{dsin}(x) = 2^{(x-1)/2}$
$x = 2 \bmod 8$	$\operatorname{dcos}(x) = 0$	$\operatorname{dsin}(x) = 2^{x/2}$
$x = 3 \bmod 8$	$\operatorname{dcos}(x) = -2^{(x-1)/2}$	$\operatorname{dsin}(x) = 2^{(x-1)/2}$
$x = 4 \bmod 8$	$\operatorname{dcos}(x) = -2^{x/2}$	$\operatorname{dsin}(x) = 0$
$x = 5 \bmod 8$	$\operatorname{dcos}(x) = -2^{(x-1)/2}$	$\operatorname{dsin}(x) = -2^{(x-1)/2}$
$x = 6 \bmod 8$	$\operatorname{dcos}(x) = 0$	$\operatorname{dsin}(x) = -2^{x/2}$
$x = 7 \bmod 8$	$\operatorname{dcos}(x) = 2^{(x-1)/2}$	$\operatorname{dsin}(x) = -2^{(x-1)/2}$

These look very reminiscent of how the usual cos and sin behave: we have periodic zeros, with the functions alternately positive and negative in the intervals between those zeros. More specifically, we have for the usual trigonometric functions:

sin has zeros at $0, \pi, 2\pi, 3\pi, \dots$
cos has zeros at $\frac{\pi}{2}, \frac{3\pi}{2}, \frac{5\pi}{2}, \dots$
cos and sin both have period $2\pi$

and for our discrete equivalents:

dsin has zeros at $0, 4, 8, 12, \dots = 0, 4, 2\times 4, 3 \times 4, \dots$
dcos has zeros at $2, 6, 10, \dots = \frac{4}{2}, \frac{3 \times 4}{2}, \frac{5 \times 4}{2}, \dots$
dcos and dsin both have sort-of-periodic behaviour with “period” $8 = 2\times 4$

It seems clear from these that for our new discrete trigonometric operations, 4 is playing the role that $\pi$ plays for the normal trigonometric operations: whence this blogpost’s title.

On the other hand, the usual sin and cos are bounded between -1 and +1, which is not true of the discrete sin and cos, which seem to be growing over time (between the periodic zeros). For example, the standard sin and cos have

$\cos(x)^2 + \sin(x)^2 = 1$
$\sin(x) = \cos\big(x - \frac{\pi}{2}\big)$

while the discrete sin and cos have

$\operatorname{dcos}(x)^2 + \operatorname{dsin}(x)^2 = 2^x$
$\operatorname{dsin}(x) = 2\operatorname{dcos}(x - 2)$

In fact, if we really wanted to coerce the discrete sin and cos into the usual versions, we could do so: it is the case that

\[\begin{align} \operatorname{dcos}(x) &= 2^{x/2} \cos \left(\frac{\pi x}{4} \right) \\ \operatorname{dsin}(x) &= 2^{x/2} \sin \left(\frac{\pi x}{4} \right) , \end{align}\]

at least for non-negative integer $x$, thanks to the convenient expression $\sin (\frac{\pi}{4}) = \cos (\frac{\pi}{4}) = 1/\sqrt{2}$. Can I prove these formulas for dcos in terms of cos and dsin in terms of sin? Might they even be true for non-integer $x$?

Discrete cos and sin (general x)

As is probably clear from the disorganised nature of this post, I’m writing as I work things out. And, at this point, it’s just occurred to me that I may have been going about things the wrong way.

I had started with the Taylor series definitions of cos and sin. But maybe I should have started with these definitions instead:

\[\begin{align} \cos(x) &= \frac{\mathrm{e}^{\mathrm{i}x} + \mathrm{e}^{-\mathrm{i}x}}{2} \\ \sin(x) &= \frac{\mathrm{e}^{\mathrm{i}x} - \mathrm{e}^{-\mathrm{i}x}}{2\mathrm{i}} . \end{align}\]

I argued last time that the discrete equivalent of $\mathrm{e}^{\alpha x}$ is $(1 + \alpha)^x$, so this suggests we should take the definitions

\[\begin{align} \operatorname{dcos}(x) &= \frac{(1+\mathrm{i})^x + (1-\mathrm{i})^x}{2} \\ \operatorname{dsin}(x) &= \frac{(1+\mathrm{i})^x - (1-\mathrm{i})^x}{2\mathrm{i}} . \end{align}\]

Are these the same as the binomial coefficient definitions from before? Yes they are: use the binomial theorem

\[\begin{align} (1 + \mathrm{i})^x &= \sum_{n=0}^\infty \binom{x}{n} \mathrm{i}^n \\ (1 - \mathrm{i})^x &= \sum_{n=0}^\infty \binom{x}{n} (-\mathrm{i})^n , \end{align}\]

and check where you get constructive or destructive interference in the sums.

This generalises much more easily to real values of $x$. And, more importantly, allows me easily to draw some pictures. Below, discrete cos is in blue and discrete sin in red; the points are the integer values.

Is it true that for all real $x$ we have

\[\begin{align} \operatorname{dcos}(x) &= 2^{x/2} \cos \left(\frac{\pi x}{4} \right) \\ \operatorname{dsin}(x) &= 2^{x/2} \sin \left(\frac{\pi x}{4} \right) , \end{align}\]

as I suggested earlier? Yes it is. To see this, we want to put our complex numbers into modulus–argument form: that is, $1 + \mathrm{i} = \sqrt{2} \mathrm{e}^{\mathrm i \pi/4}$ and $1 - \mathrm{i} = \sqrt{2} \mathrm{e}^{-\mathrm i \pi/4}$. Then we have

\[\operatorname{dcos}(x) = \frac{\big(\sqrt{2} \mathrm{e}^{\mathrm i \pi/4}\big)^x + \big(\sqrt{2} \mathrm{e}^{-\mathrm i \pi/4}\big)^x}{2} = (\sqrt{2})^x \, \frac{\mathrm{e}^{\mathrm i \pi x/4} + \mathrm{e}^{-\mathrm i \pi x/4}}{2} = 2^{x/2} \cos \left(\frac{\pi x}{4} \right) ,\]

and similarly for dsin. This proves the result.
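For a numerical cross-check of the closed forms at non-integer $x$, here is a short Python sketch using complex powers. It assumes Python's principal-branch complex power, which matches the modulus–argument form used above; the function names are illustrative.

```python
import math

def dcos(x):
    """((1+i)^x + (1-i)^x) / 2, evaluated with complex arithmetic."""
    return (((1 + 1j) ** x + (1 - 1j) ** x) / 2).real

def dsin(x):
    """((1+i)^x - (1-i)^x) / (2i), evaluated with complex arithmetic."""
    return (((1 + 1j) ** x - (1 - 1j) ** x) / 2j).real

for x in [0, 1, 2.5, 7, -1.5, 10.25]:
    scale = 2 ** (x / 2)
    # Compare against 2^(x/2) cos(pi x / 4) and 2^(x/2) sin(pi x / 4).
    assert math.isclose(dcos(x), scale * math.cos(math.pi * x / 4), abs_tol=1e-9)
    assert math.isclose(dsin(x), scale * math.sin(math.pi * x / 4), abs_tol=1e-9)
```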

2 is discrete e

2026-01-29T00:00:00+00:00

Here’s a question that came into my mind as I was falling asleep last night: What’s the discrete equivalent of the exponential function $\exp(x) = \mathrm{e}^x$?

What did my brain mean by this question? Maybe something like this: If you take a definition of the exponential, what is the most plausible (or a plausible) generalisation of that where more naturally continuous operations are replaced by more naturally discrete ones? This is not, of course, a formally precisely stated question, but seemed intriguing enough to think about for as long as I remained awake.

Two definitions of the exponential I thought of were these. First, the exponential is the solution to the differential equation $f'(x) = f(x)$. Second, the exponential is defined by the Taylor series

\[\exp(x) = \sum_{n=0}^\infty \frac{x^n}{n!} .\]

Let’s start with the first, the differential equation definition. The derivative $f'$ is a “naturally continuous” operation; a more natural discrete equivalent is the discrete difference $\Delta f(x) = f(x+1)-f(x)$. So if the “continuous exponential” satisfies $f'(x) = f(x)$, perhaps the “discrete exponential” should satisfy $\Delta f(x) = f(x)$. A few moments’ thought should be enough to convince you that the solution is $f(x) = 2^x$, since

\[2^{x+1} - 2^x = 2^x(2 - 1) = 2^x .\]

What about the second, the Taylor series definition? I would argue that to get a discrete equivalent it makes sense to replace the power $x^n$ by the falling factorial power $x^{\underline{n}} = x(x-1)\cdots(x-n+1)$ – this came up in my discussion of sums of powers a while ago. In that case we get

\[f(x) = \sum_{n=0}^\infty \frac{x^{\underline{n}}}{n!} = \sum_{n=0}^\infty \binom{x}{n} ,\]

which has become a sum of binomial coefficients. For integer $x$, this is the total number of subsets from a collection of $x$ items, summed over the subsets of size $n = 0, 1, 2, \dots$. This total number of subsets is $2^x$ (each item can be either included in the subset or not), giving $f(x) = 2^x$ again. (This is also true for non-integer $x$, by the binomial theorem.)

So the two different definitions have both suggested to us that $2^x$ is the discrete equivalent of $\mathrm{e}^x$: or that “2 is discrete e”, as I put it (slightly facetiously) in the title of this blogpost.

More generally, we could look at a discrete equivalent of $\mathrm{e}^{\alpha x} = \exp(\alpha x)$, thinking of $\alpha$ as a fixed constant and $x$ as the argument of the function. This satisfies the differential equation $f'(x) = \alpha f(x)$; moving to the discrete equivalent $\Delta f(x) = \alpha f(x)$ gives $f(x) = (1 + \alpha)^x$, since

\[(1+\alpha)^{x+1} - (1+\alpha)^x = (1+\alpha)^x(1 + \alpha - 1) = \alpha(1+\alpha)^x .\]

Similarly, the Taylor series approach suggests

\[f(x) = \sum_{n=0}^\infty \frac{\alpha^n x^{\underline{n}}}{n!} = \sum_{n=0}^\infty \binom{x}{n} \alpha^n = (1+\alpha)^x ,\]

too. (Since $\alpha$ is the constant, I don’t think it needs to be “discretised” from the power $\alpha^n$ to any falling factorial.) Both ways, we get the same equivalent $(1 + \alpha)^x$; of course, setting $\alpha = 1$ gets back $2^x$, as before.
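Both routes to $(1+\alpha)^x$ can be checked numerically. The sketch below sums the falling-factorial series in floating point; the truncation at 200 terms and the test values $x = 2.5$, $\alpha = 0.5$ are arbitrary choices for illustration (the series converges quickly when $\vert\alpha\vert < 1$).

```python
def falling_binomial(x, n):
    """x(x-1)...(x-n+1) / n!, which makes sense for any real x."""
    result = 1.0
    for i in range(n):
        result *= (x - i) / (i + 1)
    return result

def dexp_series(x, alpha, terms=200):
    """Partial sum of the falling-factorial 'Taylor series' for exp(alpha x)."""
    return sum(falling_binomial(x, n) * alpha ** n for n in range(terms))

x, alpha = 2.5, 0.5
f = lambda t: (1 + alpha) ** t
# Series route: the partial sum agrees with (1 + alpha)^x.
assert abs(dexp_series(x, alpha) - f(x)) < 1e-12
# Difference-equation route: Delta f(x) = alpha f(x).
assert abs((f(x + 1) - f(x)) - alpha * f(x)) < 1e-12
```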

Two appendices

I mentioned this to Pete and Ben at lunchtime, and they thought of two interesting extra directions to take this.

Pete asked what if we look for a discrete equivalent not on the integers but on a mesh of width $h$. That would suggest using the discrete difference

\[\Delta_h f(x) = \frac{f(x + h) - f(x)}{h}\]

instead. It’s easy to check that $\Delta_h f(x) = f(x)$ gives $(1 + h)^{x/h}$, since

\[\frac{(1 + h)^{(x+h)/h} - (1 + h)^{x/h}}{h} = \frac{(1 + h)^{x/h}\big((1+h)^{1} - 1\big)}{h} = \frac{h}{h}(1 + h)^{x/h} = (1 + h)^{x/h} .\]

Note that setting $h = 1$ gets back $(1 + 1)^{x/1} = 2^x$, while sending $h \to 0$ gives the limiting value

\[(1 + h)^{x/h} = \big( (1 + h)^{1/h} \big)^x \to \mathrm{e}^x ,\]

the continuous exponential. The same argument suggests the width-$h$ discrete equivalent of $\mathrm{e}^{\alpha x}$ is $(1+\alpha h)^{x/h}$.
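Here is a small Python sketch of the mesh version, checking the width-$h$ difference equation and the $h \to 0$ limit numerically. The names `mesh_exp` and `mesh_difference` and the particular test values are assumptions made for illustration.

```python
import math

def mesh_exp(x, h, alpha=1.0):
    """Width-h discrete equivalent of exp(alpha x): (1 + alpha h)^(x/h)."""
    return (1 + alpha * h) ** (x / h)

def mesh_difference(f, x, h):
    """Width-h forward difference (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

h, alpha, x = 0.25, 0.7, 1.3
f = lambda t: mesh_exp(t, h, alpha)
# The difference equation Delta_h f = alpha f holds on the mesh.
assert math.isclose(mesh_difference(f, x, h), alpha * f(x), rel_tol=1e-9)
# h = 1 recovers 2^x, and a tiny h gets close to the continuous exponential.
assert mesh_exp(3, 1) == 8.0
assert abs(mesh_exp(1.0, 1e-6) - math.e) < 1e-5
```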

Ben pointed out there’s a “mirror image” way to discretise things: instead of the discrete forward difference $\Delta f(x) = f(x+1) - f(x)$ you use the discrete backward difference $\nabla f(x) = f(x) - f(x-1)$ and instead of the falling factorial $x^{\underline n} = x(x-1) \cdots (x-n+1)$ you use the rising factorial $x^{\overline n} = x(x+1) \cdots (x+n-1)$.

The backward difference equation $\nabla f(x) = \alpha f(x)$ has solution $f(x) = (1 - \alpha)^{-x}$, since

\[(1 - \alpha)^{-x} - (1-\alpha)^{-(x-1)} = (1-\alpha)^{-x}\big(1 - (1 - \alpha)\big) = \alpha (1-\alpha)^{-x} .\]

Similarly, the Taylor series gives

\[f(x) = \sum_{n=0}^\infty \frac{\alpha^n x^{\overline{n}}}{n!} = \sum_{n=0}^\infty \binom{x+n-1}{n} \alpha^n = (1-\alpha)^{-x} ,\]

by a moderately famous result I’m not going to prove here. Either way, we get $(1 - \alpha)^{-x}$ the mirror-image way, compared to $(1 + \alpha)^x$ the usual way. But here, you can’t set $\alpha = 1$ to get back “discrete e”, because at $\alpha = 1$ you get $0^{-x} = (1/0)^x$, and division by 0 isn’t allowed.
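The unproved series identity can at least be checked numerically. A minimal sketch, assuming $\vert\alpha\vert < 1$ so the series converges, with an arbitrary truncation at 200 terms and hypothetical function names:

```python
def rising_binomial(x, n):
    """x(x+1)...(x+n-1) / n!, the rising-factorial analogue of a binomial coefficient."""
    result = 1.0
    for i in range(n):
        result *= (x + i) / (i + 1)
    return result

def mirror_dexp_series(x, alpha, terms=200):
    """Partial sum of the rising-factorial series for the 'mirror image' exponential."""
    return sum(rising_binomial(x, n) * alpha ** n for n in range(terms))

x, alpha = 2.5, 0.5
f = lambda t: (1 - alpha) ** (-t)
# Series route: agrees with (1 - alpha)^(-x).
assert abs(mirror_dexp_series(x, alpha) - f(x)) < 1e-9
# Backward-difference route: nabla f(x) = alpha f(x).
assert abs((f(x) - f(x - 1)) - alpha * f(x)) < 1e-12
```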

Don’t write the binomial coefficient as n! / k! (n-k)!

2025-11-21T00:00:00+00:00

The binomial coefficient $\binom{n}{k}$, pronounced “$n$ choose $k$”, is the number of ways of choosing a collection of $k$ objects from a set of $n$ objects.

To calculate the binomial coefficient, it’s best to start by thinking about picking the $k$ objects in order. There are $n$ choices for the first object, then $n-1$ choices for the second object (since you can’t pick the one you just picked again), $n-2$ for the third, and so on, down to $n-k+1$ for the final object (avoiding the $k-1$ you’ve already picked). So to pick the objects in order, there are

\[n^{\underline{k}} = n \times (n-1) \times \cdots \times (n-k+1)\]

ways to do it. This number $n^{\underline{k}}$ is called the falling factorial (said “$n$ to the $k$ falling”) or permutation number. But we wanted the objects chosen where the order doesn’t matter. So we’ve over-counted. Every potential collection has been counted

\[k! = k \times (k-1) \times \cdots \times 2 \times 1\]

(“$k$ factorial”) times. So we have to divide through by that, to get

\[\binom{n}{k} = \frac{n^{\underline{k}}}{k!} .\]

This is a nice formula for calculating the binomial coefficient. Let’s call it Formula A.

(There are other notations used for the falling factorial, such as $(n)_k$ or $P(n,k)$. I personally like $n^{\underline{k}}$, which I think is due to Donald Knuth, but I’m fine with any of the others.)

But if you look up the binomial coefficient in books or on the web, you’re much more likely to find the formula

\[\binom{n}{k} = \frac{n!}{k!\,(n-k)!} .\]

This formula, let’s call it Formula B, is also correct, in that it is mathematically equal to Formula A for $k = 0, 1, \dots, n$. It’s also much, much more popular than Formula A. But I think it’s bad!

1. Difficult to understand

The first thing is that it’s not so easy to interpret how Formula B corresponds to “the number of ways to choose $k$ objects from a set of $n$ objects.”

For Formula A, the numerator $n^{\underline{k}}$ is the number of ordered ways to pick the objects and the denominator $k!$ is the number of orderings for each collection. Easy.

For Formula B, the numerator $n!$ is the number of ways of ordering the whole set. So imagine ordering the entire set then picking the first $k$ items out of that ordering. The two terms in the denominator, $k!$ and $(n-k)!$, are there because we don’t care about either the order within the first $k$ items (because the order of our chosen items doesn’t matter) or the order within the last $n-k$ items (because we’re not picking them anyway). But this is weird: why are we bothering to order the whole set when we’re only going to pick a few of the items?

Alternatively, Formula B-ers might argue that $n! / (n-k)!$ is just another way of writing $n^{\underline{k}}$. I suppose in a way it is: we have

\[\frac{n!}{(n-k)!} = \frac{n \times (n-1) \times \cdots \times (n-k+1) \times (n-k) \times (n-k-1) \times \cdots \times 1}{\phantom{n \times (n-1) \times \cdots \times (n-k+1) \times {}} (n-k) \times (n-k-1) \times \cdots \times 1} ,\]

which ensures the right-hand tail of the top factorial cancels. But I dislike this – it’s the sort of clever-clever writing of an expression in terms of other things that, yes, avoids defining notation for the falling factorial, but goes out of its way to disguise what’s actually going on. Reject such too-smart-by-half rewritings of easily interpretable expressions!

2. Doesn’t work outside the range

How many ways are there of choosing 7 items from a set of 5 items? Zero! It’s obviously impossible to pick 7 items out of 5, because once you’ve picked 5 items you’ve run out and have none left. What do the formulas have to say about this?

Formula A says

\[\binom{5}{7} = \frac{5^{\underline{7}}}{7!} = \frac{0}{5040} = 0 ,\]

since the numerator is

\[5^{\underline{7}} = 5 \times 4 \times 3 \times 2 \times 1 \times 0 \times (-1) = 0\]

due to having a 0 in it. So Formula A gives the correct answer of 0.

What about Formula B? Here we have

\[\binom{5}{7} = \frac{5!}{7!\,(-2)!} = \frac{120}{5040 \times ???} ,\]

since there’s no such thing as “minus 2 factorial”. So Formula B fails. (You can’t get around this using the Gamma function; that’s still undefined for negative integers.)
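The same contrast shows up if you code the two formulas directly. This is a sketch with made-up function names `binom_a` and `binom_b`; it relies on the fact that Python's `math.factorial` raises `ValueError` for negative arguments.

```python
from math import factorial

def binom_a(n, k):
    """Formula A: falling factorial n(n-1)...(n-k+1) divided by k!."""
    falling = 1
    for i in range(k):
        falling *= n - i
    return falling // factorial(k)

def binom_b(n, k):
    """Formula B: n! / (k! (n-k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

print(binom_a(5, 7))  # 0, since the falling factorial contains a factor of 0
try:
    binom_b(5, 7)
except ValueError as err:
    # factorial(-2) is not defined, so Formula B has nothing to return.
    print("Formula B fails:", err)
```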

3. Silly with big numbers

Suppose I want to know how many ways I can choose 2 items from a set of 25. So I want to know the binomial coefficient $\binom{25}{2}$.

With Formula A, this is

\[\binom{25}{2} = \frac{25^{\underline{2}}}{2!} = \frac{600}{2} = 300 .\]

Nice and simple – you can even do the calculation in your head.

But with Formula B, this is

\[\binom{25}{2} = \frac{25!}{2! \times 23!} = \frac{15\,511\,210\,043\,330\,985\,984\,000\,000}{2 \times 25\,852\,016\,738\,884\,976\,640\,000} ,\]

which apparently also comes out as 300 (although I certainly can’t do it in my head).

So, first: just look at it – that is obviously very silly! More seriously, this shows that Formula B will have numerical stability problems for large numbers: when $n$ gets large, your computer will have to round the gigantic number $n!$, and this can lead to inaccuracy in calculating $\binom{n}{k}$.
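To see the problem in practice, here is a hedged Python sketch that forces both formulas into double-precision floats. That is an assumption made to mimic fixed-width arithmetic, since Python's own integers never overflow, and the function names are illustrative.

```python
from math import factorial

def binom_a_float(n, k):
    """Formula A in floating point: multiply k modest-sized ratios."""
    result = 1.0
    for i in range(k):
        result *= (n - i) / (i + 1)
    return result

def binom_b_float(n, k):
    """Formula B in floating point: every factorial must fit in a float."""
    return float(factorial(n)) / (float(factorial(k)) * float(factorial(n - k)))

print(binom_a_float(1000, 2))  # 499500.0
try:
    binom_b_float(1000, 2)
except OverflowError as err:
    # 1000! is far larger than the biggest double (about 1.8e308).
    print("Formula B fails:", err)
```

Formula A only ever handles numbers about the size of the answer, whereas Formula B has to pass through $n!$ on the way.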

4. Doesn’t work with non-integer n

The binomial coefficient is also useful when “multiplying out brackets”. Here, we have

\[(a + b)^n = \sum_{k=0}^n \binom{n}{k} a^k b^{n-k}\]

But we might also want to think about $(a + b)^x$ where $x$ is not an integer. In that case the formula

\[(a + b)^x = \sum_{k=0}^\infty \binom{x}{k} a^k b^{x-k}\]

still holds for $\vert a\vert < \vert b\vert$, where here the binomial coefficient means the same thing as in Formula A:

\[\binom{x}{k} = \frac{x^{\underline{k}}}{k!} = \frac{x(x-1)\cdots(x-k+1)}{k!} .\]

So, for example, with $b = 1$, $\vert a\vert < 1$, and $x = 1/2$, we have the correct formula

\[\begin{align} (a + 1)^{1/2} &= \binom{1/2}{0} + \binom{1/2}{1} a + \binom{1/2}{2} a^2 + \binom{1/2}{3} a^3 + \cdots \\ &= 1 + \frac12 a - \frac{1}{8} a^2 + \frac{1}{16} a^3 + \cdots , \end{align}\]

where we have used Formula A to get, for example

\[\binom{1/2}{3} = \frac{(\frac12)^{\underline{3}}}{3!} = \frac{\frac12 \times (-\frac12) \times (-\frac32) }{6} = \frac{\frac38}{6} = \frac{1}{16} ,\]

working perfectly well when the top of the binomial coefficient isn’t an integer.

What happens with Formula B? We get

\[\binom{1/2}{3} = \frac{\frac12 !}{3! \, (-\frac52) !} = \frac{???}{6 \times ???} ,\]

which doesn’t work at all, since the factorial doesn’t make sense for non-integer arguments. (Admittedly, you can usually get around this one with the Gamma function.)
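Formula A carries over to code unchanged when the top is not an integer. A minimal Python sketch with exact rational arithmetic (the function name `binom_a` is made up for illustration):

```python
from fractions import Fraction
from math import factorial

def binom_a(x, k):
    """Formula A for any rational x: x(x-1)...(x-k+1) / k!."""
    falling = Fraction(1)
    for i in range(k):
        falling *= x - i
    return falling / factorial(k)

half = Fraction(1, 2)
coefficients = [binom_a(half, k) for k in range(4)]
print([str(c) for c in coefficients])  # ['1', '1/2', '-1/8', '1/16']

# Check the series for (a + 1)^(1/2) at a = 0.21, where the answer is 1.1.
approx = sum(float(binom_a(half, k)) * 0.21 ** k for k in range(40))
assert abs(approx - 1.1) < 1e-12
```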

5. Comparison with the multiset coefficient

There is also another coefficient that we often study alongside the binomial coefficient: the multiset coefficient. This is the number of ways of choosing $k$ items from a set of $n$ items if you’re allowed to pick each item more than once. Think of how many different handfuls of 6 chocolates you can pick out from a tub of Celebrations: you might get 3 Bounties, 2 Mars bars and a Twix, for example.

The multiset coefficient can be written using the rising factorial

\[n^{\overline{k}} = n(n+1) \cdots (n+k-1) ,\]

where here the terms are going up by 1 each time. Then the multiset coefficient is

\[\bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg)  = \frac{n^{\overline{k}}}{k!} ,\]

where we are using double-brackets for the multiset coefficient, compared to single-brackets for the binomial coefficient.

So writing these Formula A-style, we have a delightful symmetry between the binomial and multiset coefficients:

\[\binom{n}{k} = \frac{n^{\underline{k}}}{k!} \qquad \bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg)  = \frac{n^{\overline{k}}}{k!} .\]

But writing them Formula B-style gives the much less pleasing to the eye

\[\binom{n}{k} = \frac{n!}{k! \, (n-k)!} \qquad \bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg) = \frac{(n+k-1)!}{k!\,(n-1)!} .\]

Yuck.

But what about the symmetry relation?

The Formula B-ers might have a case with the following point, though: What about the symmetry relation? The symmetry relation is the result

\[\binom{n}{k} = \binom{n}{n-k} .\]

The B-ers would argue that this expression is very obvious with their Formula, since

\[\binom{n}{k} = \frac{n!}{k! \, (n-k)!} = \frac{n!}{(n-k)! \, k!} = \binom{n}{n-k} .\]

They would argue that this is not at all easy to see from Formula A. Is it really true that

\[\frac{n^{\underline{k}}}{k!} = \frac{n^{\underline{n-k}}}{(n-k)!} ?\]

I sort of see the point. But I would argue that Formula B makes this too obvious. The symmetry relation is a deep and important fact, that deserves a proper explanation; just saying “It’s trivial: Formula B says so!” denies us the opportunity to understand why the symmetry relation holds.

To understand why it holds, think of taking $n$ balls. How many ways can we paint $k$ of the balls red and the remaining $n-k$ blue? Well, we could either choose the $k$ red balls $\binom{n}{k}$ ways, painting the left-over balls blue, or we could choose the $n-k$ blue balls $\binom{n}{n-k}$ ways, and paint the left-over balls red. Since these count the same thing, they must be equal. This is the correct way to understand the symmetry relation, not just finding some apparent coincidence in your formula. So, paradoxically, I think that Formula A making the symmetry relation less obvious is actually a benefit, not a drawback.

The geometric distribution starts from 0

2025-08-19T00:00:00+00:00

You keep rolling a dice until you get a six: how many rolls does this take?

More generally, you have a sequence of “trials”, each of which succeeds independently with probability $p$ or fails with probability $1-p$. You keep running these trials until you get a success. How many trials does this take?

Mathematicians call this number of trials needed a geometric distribution. But there’s actually a bit of disagreement about exactly what the geometric distribution is. There are two different conventions:

Convention 1 is that the geometric distribution counts the number of trials up to and including the first success. So if I roll my dice and get three, one, two, two, six, then I rolled the dice 5 times altogether, including the final six, so $X = 5$. The possible numbers of total trials are $1, 2, 3, \dots$, starting from 1. The probability of performing exactly $x$ trials up to and including the first success is $p_1(x) = (1-p)^{x-1} \, p$ for $x = 1, 2, \dots$, since you need $x-1$ failures followed by the $x$th trial being a success.
Convention 0 is that the geometric distribution counts the number of failures before the first success. So if I roll my dice and get three, one, two, two, six, then I rolled 4 non-sixes before the six, so $X = 4$. The possible numbers of failures are $0, 1, 2, \dots$, starting from 0. The probability of getting exactly $x$ failures before the first success is $p_0(x) = (1-p)^x \, p$ for $x = 0, 1, \dots$, since you need $x$ failures followed by a success.

A Convention-1 geometric distribution can be turned into a Convention-0 geometric distribution by subtracting 1; a Convention-0 geometric distribution can be turned into a Convention-1 geometric distribution by adding 1. So these aren’t, deep down, substantially different objects. But it is usually important that people know which convention you’re talking about.
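In code, the two conventions differ only by a shift of one. Here is a minimal Python sketch of the two probability mass functions given above, with hypothetical function names and the dice example $p = 1/6$; the means are approximated by truncating the infinite sums at 2000 terms.

```python
def pmf_convention_1(x, p):
    """P(X = x): x trials up to and including the first success, x = 1, 2, ..."""
    return (1 - p) ** (x - 1) * p

def pmf_convention_0(x, p):
    """P(X = x): x failures before the first success, x = 0, 1, ..."""
    return (1 - p) ** x * p

p = 1 / 6
# Subtracting 1 from a Convention-1 variable gives a Convention-0 variable.
assert all(pmf_convention_1(x, p) == pmf_convention_0(x - 1, p) for x in range(1, 50))

mean_1 = sum(x * pmf_convention_1(x, p) for x in range(1, 2000))
mean_0 = sum(x * pmf_convention_0(x, p) for x in range(0, 2000))
print(round(mean_1, 6), round(mean_0, 6))  # 6.0 5.0
```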

Which convention is more popular?

Convention 1 is used by:
- Probability by Durrett; Probability by Grimmett and Walsh; Probability and Random Processes by Grimmett and Stirzaker; Introductory Probability by Grinstead and Snell; Probability and Computing by Mitzenmacher and Upfal; Introduction to Probability Models by Ross; Elementary Probability by Stirzaker; Weighing the Odds by Williams
- My MATH1710 notes*; my successor’s MATH1700 notes*; Vittoria Silvestri’s Cambridge notes; Oliver Johnson’s Bristol notes
- Every single one of the half-dozen colleagues I asked this week
- Claude*; Microsoft Copilot
Convention 0 is used by:
- Introduction to Probability by Blitzstein and Hwang*
- The statistical programming language R
- My predecessor’s MATH1710 notes; Richard Weber’s Cambridge notes
- Wolfram MathWorld
Both conventions are given equal coverage by
- Wikipedia
- ChatGPT; Google Gemini

(A * denotes that the source mentions the existence of the other convention.)

It seems that Convention 1 is more popular, perhaps almost overwhelmingly so. (Although actually Convention 0 did do a little better than I had expected.)

When I taught the geometric distribution I was a strong Convention 1-er, although I did mention that the language R uses Convention 0, which I said I found very annoying. In the lectures, I think I said something along the lines of: “When I’m King of the World, I will force everyone to use the convention where the geometric distribution is the number of trials up to and including the first success. That this is not universally recognised is just further evidence of the fallen state of Mankind.”

In this blogpost I want to admit I was wrong. Over the past year or so, I’ve had a Damascene conversion, and I’m now fully on-board with Convention 0. (You can see my doubts first starting to bloom towards the end of this earlier blogpost.) I want to explain why I now think that Convention 0 is better.

Why start from 1?

Before explaining why I changed my mind, let me try to recreate my former thought process about why Convention 1 might be preferable.

First, Convention 1 is the thing you actually want to know about. If I’m rolling a dice until getting a certain number, I want to know how many times I have to roll it altogether, not how many unsuccessful rolls I’ll have before the successful one.

Second, under Convention 1, the expected number of trials up to and including the first success is $1/p$, while under Convention 0, the expected number of failed trials is $1/p - 1 = (1-p)/p$. The first expression is neater – especially in the “$n$ equally likely outcomes, of which one is a success” setting, where the Convention 1 expectation is $n$ and the Convention 0 expectation is $n - 1$.

To put these together, suppose I’m rolling a $d$-sided dice until getting a particular number. It seems both more useful and more pleasant to say “on average it will take $d$ rolls to succeed” than to say “on average it will take $d-1$ failed rolls before succeeding”.

Why start from 0?

I still have some sympathy with that point of view. But if we look at the mathematical properties of the two conventions, it’s clear that Convention 0 always has the nicer properties. Here are some examples I thought of.

1. Thinning. To thin a random variable $X$ by a probability $a$, we think of $X$ as representing a number of items, each of which is independently kept with probability $a$ or discarded with probability $1-a$.

Convention 0: Thinning a Convention-0 geometric distribution gives another Convention-0 geometric distribution but with a different value of $p$.
Convention 1: Thinning a Convention-1 geometric distribution gives a distribution not in any well-known family.
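The Convention-0 claim can be checked numerically. The sketch below computes the thinned distribution by summing over the original count, and compares it with a Convention-0 geometric with parameter $p' = p / (p + a(1-p))$. That formula for $p'$ is not stated above: it comes from a short generating-function calculation and should be read as an assumption that this code tests. The function names and the truncation at 500 terms are also illustrative, and the truncation is only safe when $p$ is not too small.

```python
from math import comb, isclose

def geom0_pmf(x, p):
    """Convention-0 geometric: probability of x failures before the first success."""
    return (1 - p) ** x * p

def thinned_geom0_pmf(y, p, a, cutoff=500):
    """P(Y = y) when each of X ~ Geom0(p) items is kept independently with probability a."""
    return sum(
        geom0_pmf(x, p) * comb(x, y) * a ** y * (1 - a) ** (x - y)
        for x in range(y, cutoff)  # truncated sum over the original count x
    )

p, a = 0.3, 0.4
p_new = p / (p + a * (1 - p))  # assumed parameter of the thinned distribution
for y in range(6):
    assert isclose(thinned_geom0_pmf(y, p, a), geom0_pmf(y, p_new), rel_tol=1e-9)
```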

2. Compound Poisson. A compound Poisson distribution is a sum of Poisson-many IID copies of some distribution. We can think of this as receiving a Poisson number of deliveries, each of which contains an IID random number of items; the total number of items across all the deliveries is a compound Poisson distribution.

Convention 0: A Convention-0 geometric distribution is compound Poisson where the compounded distribution is a logarithmic distribution.
Convention 1: A Convention-1 geometric distribution is not compound Poisson.

3. Mixed Poisson. A mixed Poisson distribution is a Poisson distribution where the rate parameter is itself chosen at random. We can think of the random rate parameter being how popular we are today, then the mixed Poisson distribution as the number of items we receive, which is Poisson conditional on our popularity.

Convention 0: A Convention-0 geometric distribution is mixed Poisson where the rate follows an exponential distribution.
Convention 1: A Convention-1 geometric distribution is not mixed Poisson.

4. Infinite divisibility. A random variable is infinitely divisible if, for any $n$, it can be written as the sum of $n$ IID copies of some random variable $Y_n$. It is called discrete infinitely divisible if $Y_n$ takes only non-negative integer values.

Convention 0: A Convention-0 geometric distribution is infinitely divisible and discrete infinitely divisible.
Convention 1: A Convention-1 geometric distribution is infinitely divisible but is not discrete infinitely divisible.

5. Factorial tilting. This one’s a bit more obscure. One way of defining the exponential tilting $X^{(s)}$ of $X$ is that the moment generating function $M$ of $X$ and the moment generating function $M^{(s)}$ of $X^{(s)}$ are related by $M^{(s)}(t) = M(t + s) / M(s)$. Jørgensen and Kokonendji define the “factorial tilting” $X^{[s]}$ as an alternative for discrete distributions, instead working with the factorial moment generating function $\Phi(t) = \mathbb E(1+t)^X$: the factorial moment generating function $\Phi$ of $X$ and the factorial moment generating function $\Phi^{[s]}$ of $X^{[s]}$ are related by $\Phi^{[s]}(t) = \Phi(t + s) / \Phi(s)$. This preserves, for example, the families of Poisson, Bernoulli, binomial and Hermite distributions.

Convention 0: The factorial tilting of a Convention-0 geometric distribution is another Convention-0 geometric distribution (for $s$ such that the factorial tilting exists).
Convention 1: The factorial tilting of a Convention-1 geometric distribution is not in any well-known family.

I think that’s 5–0 for Convention 0.
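The thinning claim is easy to check numerically. Here is a minimal Python sketch, with everything in it (the function names, the truncation point `x_max`, and the test values $p = 0.3$, $a = 0.4$) chosen purely for illustration. It thins a Convention-0 geometric term by term and compares the result with the Convention-0 geometric whose odds against success have been multiplied by $a$.

```python
from math import comb

def geom0_pmf(x, p):
    """P(X = x), where X counts the failures before the first success."""
    return p * (1 - p) ** x

def thinned_pmf(y, p, a, x_max=2000):
    """PMF at y of the a-thinning of a Convention-0 geometric.

    Each of the X items is kept independently with probability a, so we
    sum over X = x the binomial probability of keeping exactly y of x.
    The infinite sum is truncated at x_max (an illustrative choice).
    """
    return sum(
        geom0_pmf(x, p) * comb(x, y) * a**y * (1 - a) ** (x - y)
        for x in range(y, x_max)
    )

p, a = 0.3, 0.4                  # illustrative values only
odds_against = (1 - p) / p
p_thinned = 1 / (1 + a * odds_against)

# The thinned PMF agrees with a Convention-0 geometric with parameter p_thinned.
for y in range(6):
    assert abs(thinned_pmf(y, p, a) - geom0_pmf(y, p_thinned)) < 1e-12
```

Note that the new parameter is most simply described through the odds against success, which are just multiplied by $a$.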

Update: I thought of another: the Convention-0 geometric is the equilibrium distribution of the M/M/1 queue; I can’t think of any sensible queueing model for which a Convention-1 geometric distribution is the equilibrium distribution.

A modest proposal

Actually, though, I want to go further. I don’t just want to convert everyone to Convention 0. More controversially still, I want to change the parameter of the geometric distribution. Rather than using the success probability $p$ as the parameter, I want to use the odds against success $\theta = (1 - p)/p$.

Why such a bizarre choice? To explain, I want to put the geometric distribution within the wider family of negative binomial distributions. A negative binomial distribution has two parameters: $n$ and $p$ (or $n$ and $\theta$, I will shortly argue). The negative binomial distribution, at least to us Convention 0ers, counts the number of failures before the $n$th success. So, for example, if you roll a dice until getting a six for the tenth time, the number of non-sixes you rolled en route is negative binomial with $n = 10$ and $p = \frac{1}{6}$ (or $\theta = 5$). Setting $n = 1$ gets back the geometric distribution in the Convention 0 form, so the Convention-0 geometric slots in nicely as the first and most important example in the bigger family of negative binomials.

(None of my Convention 1-loving colleagues were willing to bite the bullet and admit the negative binomial should be the number of trials up to and including the $n$th success, with minimum value $n$. So maybe they’re all secret Convention 0ers like me, deep down.)

But it turns out that the negative binomial with $\theta$ as the odds against success behaves in a number of interesting ways as the “opposite” of the binomial distribution. The binomial distribution is the number of successes out of a fixed number $n$ of trials each of which succeeds with probability $\theta$. Remember that for the binomial $\theta$ is the success probability, but for the negative binomial $\theta$ is the odds against success.

So what are these interesting “opposites”?

(I’ll be using the notation $n^{\underline{k}} = n(n-1)\cdots(n-k+1)$ for the falling factorial and $n^{\overline{k}} = n(n+1)\cdots(n+k-1)$ for the rising factorial.)

1. Probability mass function.

The PMF of the binomial distribution is $\displaystyle \binom{n}{x} \theta^x (1 - \theta)^{n-x}$, where $\binom{n}{x} = n^{\underline{x}} / x!$ is the binomial coefficient.
The PMF of the negative binomial distribution is $\displaystyle \left(\kern-0.4em\binom{n}{x}\kern-0.4em\right) \theta^{x} (1 + \theta)^{-n-x}$, where $\left(\kern-0.2em\binom{n}{x}\kern-0.2em\right) = n^{\overline{x}} / x!$ is the multiset coefficient.

2. Expectation.

The expectation of the binomial distribution is $n\theta$.
The expectation of the negative binomial distribution is $n\theta$.

3. Variance.

The variance of the binomial distribution is $n\theta(1-\theta)$.
The variance of the negative binomial distribution is $n\theta(1+\theta)$.

4. Factorial moments. The $k$th factorial moment is $\mathbb EX^{\underline{k}} = \mathbb EX(X-1)\cdots(X - k + 1)$.

The $k$th factorial moment of the binomial distribution is $n^{\underline{k}} \,\theta^k$.
The $k$th factorial moment of the negative binomial distribution is $n^{\overline{k}} \,\theta^k$.

5. Probability generating function. The probability generating function is $G_X(t) = \mathbb E\,t^X$.

The probability generating function of the binomial distribution is $(1 - \theta + \theta t)^n$.
The probability generating function of the negative binomial distribution is $(1 + \theta - \theta t)^{-n}$.

6. Thinning

The $a$-thinning of a binomial distribution keeps $n$ the same but changes the success probability from $\theta$ to $a\theta$.
The $a$-thinning of a negative binomial distribution keeps $n$ the same but changes the odds against success from $\theta$ to $a\theta$.

All these “opposites” results are much more pleasant than they would be if the negative binomial (and therefore the geometric) were parameterised by the success probability $p$ where $p = 1/(1 + \theta)$.
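For anyone who wants to check these formulas numerically, here is a small Python sketch. It assumes $n$ is a positive integer and truncates the infinite sums at 1,000 terms; the helper names and the test values $n = 10$, $\theta = 5$ (the dice example from earlier) are my own illustrative choices. It builds the negative binomial PMF from the multiset coefficient, then compares the mean, variance and probability generating function against the closed forms above.

```python
from math import factorial, prod

def rising_factorial(n, k):
    """n(n+1)...(n+k-1), with the empty product equal to 1."""
    return prod(n + i for i in range(k))

def multiset_coefficient(n, x):
    """Rising factorial of n divided by x!, for a positive integer n."""
    return rising_factorial(n, x) // factorial(x)

def negbin_pmf(x, n, theta):
    """Negative binomial PMF with theta as the odds against success.

    theta**x * (1 + theta)**(-n - x) is regrouped as
    (theta / (1 + theta))**x * (1 + theta)**(-n) to avoid overflow.
    """
    return multiset_coefficient(n, x) * (theta / (1 + theta)) ** x * (1 + theta) ** (-n)

n, theta = 10, 5.0               # ten sixes on a fair dice: odds against are 5
xs = range(1000)                 # truncation point is an illustrative choice
pmf = [negbin_pmf(x, n, theta) for x in xs]

total = sum(pmf)
mean = sum(x * q for x, q in zip(xs, pmf))
variance = sum(x * x * q for x, q in zip(xs, pmf)) - mean**2
pgf_at_half = sum(q * 0.5**x for x, q in zip(xs, pmf))

print(total, mean, variance)     # compare with 1, n*theta, n*theta*(1 + theta)
print(pgf_at_half, (1 + theta - theta * 0.5) ** (-n))
```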

Photos: Belfast

2025-08-07T00:00:00+00:00

Belfast and the northern coast of Northern Ireland, 3–6 August 2025

Which election was the closest?

2025-05-02T00:00:00+00:00

Last night, there was a very close byelection in Runcorn and Helsby. The result was

Reform	Labour
12,645	12,639

Very close! (In this and all results, I’ll only look at the top two candidates, and ignore any other votes.)

But was this closer than the most famous close election of recent times: the 2000 US presidential election in Florida? The result there was:

Bush	Gore
2,912,790	2,912,253

So which was closer? Well, that depends. How do we measure the closeness of elections?

The simplest way would be just to look at the absolute difference or the “winning margin” d: that is, the number d of extra votes the winning candidate got over the losing candidate. Smaller differences are more impressively tight elections: a small absolute difference means a very close election, while a big absolute difference is not a close election. So how does that work out here?

Reform	Labour	abs. diff.
12,645	12,639	6

Bush	Gore	abs. diff.
2,912,790	2,912,253	537

Well, that solves it! The Runcorn byelection was decided by 6 votes, while Florida 2000 was decided by 537 votes. So Runcorn was much, much closer!

But, wait. These two elections were of very different sizes. There were about 25,000 voters in Runcorn, between the two biggest parties, but almost 6 million voters in Florida! Surely that needs to be taken into account? Otherwise, we could say that Manchester United’s 3–0 thrashing of Athletic Bilbao last night (absolute difference: 3) was even closer than the byelection.

So perhaps it makes more sense to look at the relative difference – that is, by what percentage one candidate beat the other one, or the absolute difference d divided by the total number of votes n. (Again, n will still just be the total number of votes among the two best candidates.)

Reform	Labour	abs. diff.	rel. diff.
12,645	12,639	6	0.024%

Bush	Gore	abs. diff.	rel. diff.
2,912,790	2,912,253	537	0.009%

So, both very close elections. But smaller results are better, still, so this time, it’s Florida 2000 that wins: it’s reduced by a factor of two-and-a-bit compared to Runcorn, on this measure. Florida was closer!

Looking again at relative difference

I think this is the result that most people would accept: closeness of elections is decided by relative (or percentage) difference between the top two candidates, so Florida 2000 was closer.

But I want to think a bit more carefully about the relative difference: What does it mean, and what justification can we give for its use?

I propose that one way to think about the relative difference is the following.

Suppose we have 99 voters. Then there are 100 possible numerical results, from a 99–0 wipe-out for Red over Blue, to a 98–1 trouncing for Red over Blue, then 97–2, all the way through to a 0–99 wipe-out for Blue over Red. Suppose further that all 100 of those results are equally likely, coming up with 1% probability each. We could then come up with a kind of p-value: What is the probability (under this model) that the election result was as close as this or closer?

Suppose in our example that the Red party wins 51–48. Then, out of the 100 possible outcomes, there were four outcomes that would have been as close as this or closer: 51–48 itself, plus 50–49, 49–50, and 48–51. (In the latter two, Blue beat Red.) So the “p-value” (or “somewhat p-value-like quantity”) is 4/100 or 0.04.

So now consider an election with n voters and a difference of d. What’s the p-value here? The answer is that there are d + 1 outcomes as close as this or closer, and n + 1 possible outcomes. So the answer is (d + 1) / (n + 1).

[Optional proof: To see this, we need to count up all the results between “this result” and “the opposite result”, where the other candidate wins by the same amount. To count these up, we can imagine transferring the winner’s votes to the loser: we could transfer 0 votes (the current result), or 1 vote (two votes closer in the margin – unless there was only one vote in it to start with), or 2 votes, or 3, up to d votes. At the point we transfer d votes, we get the “opposite” result, with the same “closeness” as the original result, and any further transfer gives a less-close result with a bigger victory for the originally-losing candidate. That makes d + 1 results in total.]
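The counting argument can also be checked by brute force. The following Python sketch (the function name is just an illustrative choice) enumerates all n + 1 possible two-party results, counts those with a margin no bigger than d, and compares against (d + 1) / (n + 1). Note that d must have the same parity as n, since the margin is n minus twice the loser’s vote.

```python
from fractions import Fraction

def closeness_p_value(n, d):
    """Share of the n + 1 possible results whose margin is at most d,
    treating every result from n-0 to 0-n as equally likely."""
    as_close = sum(1 for red in range(n + 1) if abs(red - (n - red)) <= d)
    return Fraction(as_close, n + 1)

# The 99-voter example: a 51-48 win has margin d = 3.
print(closeness_p_value(99, 3))   # 1/25, i.e. 4/100

# Compare with (d + 1)/(n + 1) for every small electorate and feasible margin.
for n in range(1, 61):
    for d in range(n % 2, n + 1, 2):
        assert closeness_p_value(n, d) == Fraction(d + 1, n + 1)
```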

This p-value we have justified, (d + 1) / (n + 1), is very nearly the relative difference d/n; there’s just an extra “plus 1” on the top and bottom. So, my first proposal is not to look at the relative difference d/n, but rather the adjusted relative difference (d + 1) / (n + 1).

For all but the tiniest electorates, replacing n by n + 1 makes almost no difference. For extremely close elections, though, replacing d by d + 1 can affect things: this is a slight extra penalty for the smaller election. (Like Runcorn!)

Reform	Labour	rel. diff.	adj. rel. diff.
12,645	12,639	0.024%	0.028%

Bush	Gore	rel. diff.	adj. rel. diff.
2,912,790	2,912,253	0.00922%	0.00924%

Here, our adjustment slightly increased the score of the smaller Runcorn byelection, but barely changed the much bigger Florida election. Now, the Florida 2000 score is reduced by about a factor of 3 compared to Runcorn.

To put it another way, under this “each result equally likely” model:

a result like the Runcorn byelection would happen once every 3,600 elections;
a result like Florida 2000 would happen once every 11,000 elections.

So the Runcorn byelection result is (under this model) about 3 times as likely as the Florida 2000 result.
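These figures take only a few lines of Python to reproduce from the vote counts quoted above; the function name here is just an illustrative choice.

```python
def adjusted_relative_difference(winner, loser):
    """(d + 1) / (n + 1), counting only the top two candidates' votes."""
    d = winner - loser
    n = winner + loser
    return (d + 1) / (n + 1)

elections = {
    "Runcorn and Helsby 2025": (12_645, 12_639),
    "Florida 2000": (2_912_790, 2_912_253),
}

for name, (winner, loser) in elections.items():
    score = adjusted_relative_difference(winner, loser)
    print(f"{name}: {score:.5%}, or about once every {1 / score:,.0f} elections")
```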

Before we move on, one last argument for the adjusted relative difference, with the “plus 1”s, over the standard relative difference. Consider two elections: in the first, there are four voters, and it’s a 2–2; in the second, there are four million voters, and, astonishingly, it’s an exact 2 million – 2 million tie! I guess you could say that these are both “perfectly close” results, as verified by their ideal relative differences of 0. But I think most people would regard the 2 million – 2 million tie as way more impressive. And that is indeed what shows up with the adjusted relative difference: the first score is (0 + 1)/(4 + 1) = 0.2, while the second is 1/4,000,001 = 0.00000025, an enormous reduction by a factor of almost a million.

A modest proposal

It was nice that the mathematical justification we gave above recovered (a slight adjustment of) the relative difference statistic that everyone uses anyway. But was that mathematical justification actually convincing, or was I pulling the wool over your eyes?

Well, I wasn’t being deceptive, but there is part of the argument I find unconvincing. It’s where I said that we could consider all 100 results from 99–0 to 0–99 equally likely. Is that a reasonable assumption? If I tossed 99 coins, getting 99 heads and 0 tails would be extraordinarily unlikely – about a one in 600 billion billion billion chance – but getting a 50–49 victory for heads over tails happens quite regularly – about one in every 13 goes.

(This is because there’s only one way for 99 coins to land all heads, but there are many ways for 99 coins to land with a 50/49 split. See the Wikipedia page on the “binomial distribution” for more on this.)
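Both of those figures can be checked with Python’s exact integer arithmetic. This is only a quick sketch, and the variable names are mine.

```python
from math import comb

# Chance of 99 heads out of 99 fair coins is 1 in 2**99.
one_in_all_heads = 2**99

# Chance of exactly 50 heads and 49 tails.
p_50_49 = comb(99, 50) / 2**99

print(f"All heads: 1 in {one_in_all_heads:.2e}")
print(f"50-49 to heads: 1 in {1 / p_50_49:.1f}")
```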

So we should take this into account when calculating our p-value. That is, we should instead calculate the probability of getting a result as close as this or closer with this “coin-tossing” model, rather than the previous “equally likely scores” model. I propose to call the p-value calculated with the coin-toss model the “Frodsham score” (after the place where the infamous punch occurred that led to the Runcorn byelection – although the “chad score” would also work).

The bad news is that, while there is a formula to calculate the Frodsham score, it’s not a very pleasant one: if the loser gets a votes and the winner b votes, for a total of n = a + b votes, then the Frodsham score is

\[F = 2^{-n} \sum_{k=a}^b \binom{n}{k}\]

where that thing in brackets is the “binomial coefficient” (read “n choose k”, and famous from Pascal’s triangle). Pleasantness as a formula aside, though, the Frodsham score is easily calculated on a computer. We get:

Reform	Labour	adj. rel. diff.	Frodsham
12,645	12,639	0.028%	3.5%

Bush	Gore	adj. rel. diff.	Frodsham
2,912,790	2,912,253	0.00924%	18%

To put it another way, under this “coin tossing” model:

a result like the Runcorn byelection would happen once every 28 elections;
a result like Florida 2000 would happen once every 6 elections.

Runcorn wins again! The Florida 2000 result occurs about 5 times as often.
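For the curious, here is one way the Frodsham score might be computed in Python. This is a sketch rather than a record of how the figures above were produced: it evaluates each term of the sum through the log-gamma function, which is my own choice to avoid handling binomial coefficients with millions of digits, and it should be accurate to several significant figures at these sizes.

```python
from math import exp, lgamma, log

def log_coin_toss_pmf(n, k):
    """log of C(n, k) / 2**n, computed via log-gamma to avoid huge integers."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) - n * log(2)

def frodsham_score(loser, winner):
    """Probability, under fair coin tossing, of a result at least this close:
    the sum of C(n, k) / 2**n for k from the loser's to the winner's vote."""
    n = loser + winner
    return sum(exp(log_coin_toss_pmf(n, k)) for k in range(loser, winner + 1))

for name, loser, winner in [
    ("Runcorn and Helsby 2025", 12_639, 12_645),
    ("Florida 2000", 2_912_253, 2_912_790),
]:
    score = frodsham_score(loser, winner)
    print(f"{name}: {score:.1%}, or about once every {1 / score:.0f} elections")
```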

A final mathematical aside: while the exact formula for the Frodsham score is unpleasant, when the absolute difference d is small and the number of voters n is large, the approximation

\[F \approx \sqrt{\frac{2}{\pi}} \, \frac{d+1}{\sqrt{n}} \approx 0.798 \, \frac{d+1}{\sqrt{n}}\]

is pretty accurate:

Reform	Labour	Frodsham	approx. Frodsham
12,645	12,639	3.511%	3.512%

Bush	Gore	Frodsham	approx. Frodsham
2,912,790	2,912,253	17.6%	17.8%

And, of course, if you just want to compare elections with each other, you can ignore the common factor of 0.798 in the approximation. So my suggestion is this: Don’t use the percentage difference, which takes the difference (or, better, the difference plus 1) and divides it by n. Rather, divide by the square root of n instead!
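In code, the approximation is a one-liner. A minimal Python sketch (the function name is illustrative):

```python
from math import pi, sqrt

def approx_frodsham(winner, loser):
    """sqrt(2/pi) * (d + 1) / sqrt(n); intended for small d and large n."""
    d = winner - loser
    n = winner + loser
    return sqrt(2 / pi) * (d + 1) / sqrt(n)

print(f"Runcorn and Helsby 2025: {approx_frodsham(12_645, 12_639):.3%}")
print(f"Florida 2000: {approx_frodsham(2_912_790, 2_912_253):.1%}")
```

Since the factor $\sqrt{2/\pi}$ is the same for every election, ranking by (d + 1) divided by the square root of n gives the same ordering as ranking by this approximate score.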

Film review: Eternal Sunshine of the Spotless Mind

2025-02-12T00:00:00+00:00

I think Eternal Sunshine of the Spotless Mind is probably my favourite film of all time. Certainly it’s the non-kids film I’ve watched the most. The first time I saw it was probably 2005 in my university dorm room one evening off a DVD borrowed from a friend, and I thought it was so great I watched it again the next morning before giving it back. And then for quite a while I used to watch it annually at about this time of year (the film has a long prologue set on Valentine’s day, before flashing back to most of the action happening the night before), before deciding that was maybe a bit too much of a depressed-single-person thing to do. But last night I watched it for probably the first time in at least five years, and definitely for the first time in the cinema.

My Grand Unified Theory of Eternal Sunshine goes something like this: Jim Carrey, Michel Gondry and Charlie Kaufman are respectively a generationally talented actor, director and writer, but they all have so many ideas gushing out of them that they really struggle to keep everything under control for two hours. Carrey is a supremely gifted physical comedian, but it gets tiresome after a few minutes. Gondry is the most galaxy-brained mad inventor working in film today, but all his best works (except this) are at music video length, and the feature films don’t really hang together. Kaufman builds these zany worlds and wild concepts, but – although others will argue against this – he has never really stuck the landing, for me. So combining the three ought to leave us with an unruly mess punctuated by moments of brilliance. Yet somehow, miraculously, they all keep it super-restrained here; the wackiness is still there, and is still excellent, but it’s just poking through from under the surface rather than overwhelming everything.

How much does Eternal Sunshine benefit from being on the big screen? Well, I do think it is a film that looks beautiful, but it’s not visually stunning or astoundingly cinematic, so I didn’t feel I was watching a dramatically different film from the one I’ve seen so many times on the TV. But what does work much better is the sound. There are lots of in-Joel’s-memory scenes where the real-world conversations of the mind-erasers start bleeding in, and this is much clearer in full surround-sound. (Also, I don’t recall noticing before that the very first noise in the film is the scientists driving away in the morning.)

Various other thoughts on this rewatch (or, Random thoughts for a-few-days-before-Valentine’s day 2025):

Kate Winslet’s performance is still – still! – underrated in this.
They are all superstars with pretty small parts, but I think I can argue with a straight face that these are life-time best performances from Tom Wilkinson, Mark Ruffalo, Elijah Wood and Kirsten Dunst. (Ruffalo breaks your heart just by taking his glasses off.)
David Cross’s delivery of “I’m making! A birdhouse!” and Tom Wilkinson’s of “Paaaatrick, baaaaby-boy” are both legendary, as far as I’m concerned.
Jon Brion’s score is excellent, but the bit where “Row, row, row your boat” fades into the piano is particularly gorgeous.
I love the visual effect where Joel is eating the Chinese food while standing behind the TV, a TV that is showing him eat the food, as if it’s see-through. If you see what I mean?
“Technically speaking, it is brain damage” is a very sharp line of dialogue.
I reckon it’s between this and Casablanca for the film with the most quotable lines of dialogue: “Random thoughts for Valentine’s day 2004…” “Sand is overrated: it’s just tiny little rocks.” “‘Blessed are the forgetful, for they get the better even of their blunders.’” “Pope Alexander.” “Are we the dining dead?” The “I’m just a fucked-up girl…” speech.
I’m not sure I believe Joel’s art would be quite as avant-garde, quite as interesting, as it is shown to be here.
This is an almost perfect film – but I think there’s a plot hole I still haven’t managed to resolve to my satisfaction. In Joel’s memory – or dream? – Memory-Clementine says “Meet me in Montauk”. The next day Joel knocks off work and takes the train there. But how come Real-Clementine is there too? Is it just a coincidence?
I’ve read quite a lot about this film, but I still don’t have a good handle on how the ending actually came about. I know that Kaufman originally wrote an ending where Joel and Clementine continually erase each other for their whole lives – which strikes me as the sort of ingenious but somewhat sour depressive note that has prevented me from fully enjoying his post-Eternal Sunshine writing. But the actual ending we have is just beautiful in its (let’s call it) realistic optimism. Whose idea was this? Who wrote “OK.” “OK?”, the greatest last lines of any film?

I don’t know whether I was worried that I would cry at the film or that I wouldn’t. (Is it worse to appear embarrassingly vulnerable in public, or is it worse if a piece of art that has been very important to you no longer has the same emotional punch it once did?) But, for the record, my cry count was 3: “Row, row, row your boat,” as mentioned above; the ending, of course; and, most of all, the scene towards the end in the beachhouse – the key scene of the whole film.

Binomial coefficient	Multiset coefficient
\(\dbinom{n}{k} = \dfrac{n^{\underline{k}}}{k!}\)	\(\bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg) = \dfrac{n^{\overline{k}}}{k!}\)
\(\dbinom{n}{k} = \dbinom{n}{n-k}\)	\(\bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg) = \bigg(\kern-0.4em\dbinom{k+1}{n-1}\kern-0.4em\bigg)\)
\(\dbinom{n}{k} = \dbinom{n-1}{k} + \dbinom{n-1}{k-1}\)	\(\bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg) = \bigg(\kern-0.4em\dbinom{n-1}{k}\kern-0.4em\bigg) + \bigg(\kern-0.4em\dbinom{n}{k-1}\kern-0.4em\bigg)\)
\(k\dbinom{n}{k} = n \dbinom{n-1}{k-1}\)	\(k\bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg) = n \bigg(\kern-0.4em\dbinom{n+1}{k-1}\kern-0.4em\bigg)\)
\(\dbinom{k}{j} \dbinom{n}{k} = \dbinom{n}{j}\dbinom{n-j}{k-j}\)	\(\dbinom{k}{j} \bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg) = \bigg(\kern-0.4em\dbinom{n}{j}\kern-0.4em\bigg) \bigg(\kern-0.4em\dbinom{n+j}{k-j}\kern-0.4em\bigg)\)
\(\displaystyle\sum_{k=0}^n \binom{n}{k} x^k = (1 + x)^n\)	\(\displaystyle\sum_{k=0}^\infty \bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg) x^k = (1 - x)^{-n}\)
\(\displaystyle\sum_{m=1}^n \binom{m-1}{k-1} = \dbinom{n}{k}\)	\(\displaystyle\sum_{m=1}^n \bigg(\kern-0.4em\dbinom{m}{k-1}\kern-0.4em\bigg) = \bigg(\kern-0.4em\dbinom{n}{k}\kern-0.4em\bigg)\)
\(\dbinom{n+m}{k} = \displaystyle\sum_{j=0}^k \dbinom{n}{j} \dbinom{m}{k-j}\)	\(\bigg(\kern-0.4em\dbinom{n+m}{k}\kern-0.4em\bigg) = \displaystyle\sum_{j=0}^k \bigg(\kern-0.4em\dbinom{n}{j}\kern-0.4em\bigg) \bigg(\kern-0.4em\dbinom{m}{k-j}\kern-0.4em\bigg)\)