There are assessed questions associated with this worksheet; you will find them at the bottom. The deadline for submitting your solutions to these questions is Monday 21 November at 1400. If you have difficulty with this worksheet, you can get help at the office hours drop-in sessions.
If you worked through the optional R Worksheet 6, then you might like to save your work as an R Markdown document, rather than an R Script as you have done before. You should not submit any R Script or R Markdown document, however – answers are submitted into a Microsoft Form, as before.
In this worksheet, we will look at working with three famous discrete distributions: the binomial, Poisson, and geometric distributions. (In the next worksheet we will look at arbitrary PMFs.)
There are 12 functions we will be studying:
Binomial | Poisson | Geometric |
---|---|---|
`dbinom()` | `dpois()` | `dgeom()` |
`pbinom()` | `ppois()` | `pgeom()` |
`qbinom()` | `qpois()` | `qgeom()` |
`rbinom()` | `rpois()` | `rgeom()` |
You’ll notice that each function has a one-letter prefix (`d`, `p`, `q`, or `r`) and a longer suffix (`binom`, `pois`, or `geom`). You’ve probably guessed that the suffixes refer to the binomial, Poisson, and geometric distributions. We will give more details about the prefixes later, but for now, let us briefly note:
- `d` denotes the probability mass function (PMF) \(p_X(x) = \mathbb P(X = x)\). (The letter `d` is for “density”, although the word “density” only actually applies to continuous random variables.)
- `p` denotes the cumulative distribution function (CDF) \(F_X(x) = \mathbb P(X \leq x)\). (I’m not sure what the letter `p` stands for – “probability”, I guess?)
- `q` denotes the quantile function, which we will talk about later.
- `r` generates random samples from the distribution.

Let’s start by going through the functions for the binomial distribution.
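First, `dbinom()` gives the PMF of a binomial random variable \[ p_X(x) = \binom{n}{x} p^x (1 - p)^{n - x} . \] The function takes three arguments, in this order: the value \(x\) at which to evaluate the PMF, the number of trials \(n\), and the success probability \(p\).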
So for example, if \(X \sim \text{Bin}(10, 0.4)\) and you want to calculate \(p_X(5) = \mathbb P(X = 5)\), then you can find this as
n <- 10
p <- 0.4
dbinom(5, n, p)
## [1] 0.2006581
or just `dbinom(5, 10, 0.4)`, for short.
You can also give a vector as the argument for \(x\), if you want multiple values of the PMF. For example, to find \(p_X(6)\), \(p_X(7)\) and \(p_X(9)\) together, you can use
dbinom(c(6, 7, 9), n, p)
## [1] 0.111476736 0.042467328 0.001572864
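As a sanity check on the formula above, the value returned by `dbinom()` can be compared with the PMF worked out by hand using R's `choose()` function. This is only an illustrative sketch; the helper name `pmf_by_hand` is made up for this example and is not part of R or of the worksheet.

```r
n <- 10
p <- 0.4

# Binomial PMF written out directly from the formula
# (hypothetical helper, defined only for this check)
pmf_by_hand <- function(x, n, p) {
  choose(n, x) * p^x * (1 - p)^(n - x)
}

# The two calculations should agree up to floating-point error
all.equal(dbinom(5, n, p), pmf_by_hand(5, n, p))

# The PMF over all possible values 0, 1, ..., n should sum to 1
sum(dbinom(0:n, n, p))
```

Using `all.equal()` rather than `==` is a deliberate choice here, since two floating-point calculations of the same quantity can differ in the last few digits.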
Exercise 7.1. Let \(X \sim \text{Bin}(20, 0.6)\).
(a) Calculate \(\mathbb P(X = 13)\).
(b) By using `dbinom()` together with `sum()`, calculate \(\mathbb P(12 \leq X \leq 15)\).
Second, `pbinom()` gives the CDF \[ F_X(x) = \mathbb P(X \leq x) = \sum_{y = 0}^x \binom{n}{y} p^y (1 - p)^{n - y} . \] The arguments go in the same order \(x, n, p\), as before, and \(x\) can be a vector.
Suppose \(X \sim \text{Bin}(10, 0.4)\) again. Then the probability that \(X\) is at most 6 is
pbinom(6, n, p)
## [1] 0.9452381
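Since the CDF is a running total of the PMF, `pbinom()` should agree with adding up `dbinom()` over the values \(0, 1, \dots, x\). A minimal sketch of that check, using the same \(\text{Bin}(10, 0.4)\) example (the variable names `cdf_direct` and `cdf_summed` are just for illustration):

```r
n <- 10
p <- 0.4

# CDF at 6, computed two ways
cdf_direct <- pbinom(6, n, p)
cdf_summed <- sum(dbinom(0:6, n, p))
all.equal(cdf_direct, cdf_summed)

# With a vector first argument, pbinom() matches the cumulative sum of the PMF
all.equal(pbinom(0:n, n, p), cumsum(dbinom(0:n, n, p)))
```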
In addition, `pbinom()` has an extra optional argument `lower.tail = ...`, which can be set to

- `lower.tail = TRUE` to calculate the CDF as above. This is the default, which means the CDF is what is calculated if you don’t use the `lower.tail` argument at all.
- `lower.tail = FALSE` to instead calculate the upper-tail probability \(1 - F(x) = \mathbb P(X > x)\). Note that this is strictly greater than \(x\), not greater-than-or-equal.

Exercise 7.2. Let \(X \sim \text{Bin}(20, 0.6)\) again.
(a) Calculate \(\mathbb P(X \leq 12)\).
(b) Calculate \(\mathbb P(X \leq x)\) for all \(x\) between 0 and 20, with all answers rounded to 2 decimal places.
(c) Calculate \(\mathbb P(X \geq 16)\). (Careful: that’s a greater-than-or-equal sign.)
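The `lower.tail` option described before Exercise 7.2 can be checked against the complement rule \(\mathbb P(X > x) = 1 - F(x)\). A small sketch, again with the \(\text{Bin}(10, 0.4)\) example rather than the exercise's distribution (variable names are illustrative only):

```r
n <- 10
p <- 0.4

# Upper-tail probability P(X > 6), computed two ways
upper_direct <- pbinom(6, n, p, lower.tail = FALSE)
upper_complement <- 1 - pbinom(6, n, p)
all.equal(upper_direct, upper_complement)

# Lower and upper tails at the same point always add up to 1
pbinom(6, n, p) + upper_direct
```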
Third, `qbinom()` gives the quantile function. That is, for \(0 \leq f \leq 1\), the command `qbinom(f, n, p)` gives the value \(x\) such that \(F(x) = f\), where \(F(x) = \mathbb P(X \leq x)\), if there is such an \(x\). If \(F(x) = f\) does not have an exact solution, then `qbinom(f, n, p)` gives the smallest \(x\) such that \(F(x) \geq f\). To put it another way, the quantile function is the inverse of the CDF, \(F^{-1}(f) = x\). To put it yet another way, `qbinom()` answers the question “How large an \(x\) do I need to be at least \(100f\%\) sure that \(X \leq x\)?”
The quantile function is not as important as the other functions here, and we will not use it very often.
As before, the first argument can be a vector, and the `lower.tail = ...` argument can optionally be used to find the inverse of the upper-tail function \(1 - F(x) = \mathbb P(X > x)\).
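To make the "smallest \(x\) such that \(F(x) \geq f\)" description concrete, here is a sketch that checks the output of `qbinom()` against `pbinom()` for the \(\text{Bin}(10, 0.4)\) example. The level \(f = 0.9\) and the name `candidates` are chosen purely for illustration.

```r
n <- 10
p <- 0.4
f <- 0.9   # illustrative level

x <- qbinom(f, n, p)

# x is the smallest value whose CDF reaches f:
pbinom(x, n, p) >= f      # the CDF at x is at least f
pbinom(x - 1, n, p) < f   # but the CDF one step lower falls short

# The same value found by brute force over 0, 1, ..., n
candidates <- 0:n
min(candidates[pbinom(candidates, n, p) >= f])
```

The brute-force search is only there to mirror the definition; in practice `qbinom()` does the same job directly.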
Exercise 7.3. Let \(X \sim \text{Bin}(20, 0.6)\) again. What is the smallest number \(x\) such that \(X\) is 95% likely to be less than \(x\)?
Finally, `rbinom()` can be used to simulate random outcomes of a binomial random variable. Here, the first argument is the number of samples one wants, followed by \(n\) and \(p\), as before. Here, for example, are 20 samples of a \(\text{Bin}(10, 0.4)\) random variable:
rbinom(20, n, p)
## [1] 5 4 2 4 4 3 3 4 7 5 6 4 5 4 5 3 6 2 4 3
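Because `rbinom()` is random, you will get different samples each time you run it. The following sketch shows one way to make a simulation repeatable with `set.seed()`, and to compare the observed proportions with the exact PMF. The seed value and the names `draws` and `observed` are arbitrary choices for this illustration.

```r
n <- 10
p <- 0.4

set.seed(1710)  # arbitrary seed, chosen only so the run can be repeated
draws <- rbinom(10000, n, p)

# Observed proportion of each value 0, 1, ..., n among the draws
# (tabulate() counts the integers 1, 2, ..., so shift the draws up by 1)
observed <- tabulate(draws + 1, nbins = n + 1) / length(draws)

# Side-by-side comparison with the exact PMF
round(cbind(x = 0:n, observed = observed, exact = dbinom(0:n, n, p)), 3)
```

With a large number of draws, the observed column should be close to the exact column, though it will not match perfectly.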
Exercise 7.4. Let \(X \sim \text{Bin}(20, 0.6)\) again.
(a) Generate 1000 random samples from \(X\), and store them in a variable called `samples`.
(b) Draw a histogram of your `samples` data.
(c) Calculate the mean of your `samples` data.
(d) You should find that your answer to part (c) is close to 12. Why do you think this is?
The functions for the Poisson distribution look very similar to those for the binomial distribution, except that instead of \(n\) and \(p\), there is just a single rate parameter \(\lambda\).
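For instance, `dpois(x, lambda)` takes the value \(x\) first and the rate second, and should match the Poisson PMF \(\mathrm e^{-\lambda} \lambda^x / x!\). A brief sketch of that check, with a rate chosen purely for illustration:

```r
lambda <- 2   # illustrative rate, not taken from the worksheet
x <- 0:5

# Poisson PMF written out from the formula
pmf_by_hand <- exp(-lambda) * lambda^x / factorial(x)

# Should agree with dpois() up to floating-point error
all.equal(dpois(x, lambda), pmf_by_hand)
```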
Exercise 7.5. Explain in mathematics what the following four functions have calculated:
lambda <- 3.2
dpois(2, lambda)
## [1] 0.2087025
ppois(4, lambda, lower.tail = FALSE)
## [1] 0.2193875
qpois(0.95, lambda)
## [1] 6
var(rpois(10000, lambda))
## [1] 3.228267
Exercise 7.5. Let \(X \sim \text{Bin}(500, 0.01)\). Calculate exactly:
(a) \(\mathbb P(X = 4)\);
(b) \(\mathbb P(X \geq 7)\).
(c) Repeat the calculations in parts (a) and (b) using a Poisson approximation to the binomial. Comment on the accuracy of the approximation.
The functions for the geometric distribution – `dgeom()`, `pgeom()`, `qgeom()`, `rgeom()` – work similarly again, but with one extra annoyance.
You’ll recall that in the lecture notes, we defined a geometric distribution with parameter \(p\) to be the number of trials up to and including the first success. So if \(X \sim \text{Geom}(p)\), then \[ p_X(x) = \mathbb{P}(X = x) = (1 - p)^{x - 1} p . \] However, R uses an alternative definition, where a geometric distribution \(Y\) is the number of failures before the first success, so \[ p_Y(y) = \mathbb{P}(Y = y) = (1 - p)^{y} p . \]
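So, if \(X \sim \text{Geom}(p)\) under the lecture-notes definition:

- the PMF \(\mathbb P(X = x)\) is calculated with `dgeom(x - 1, p)`;
- the CDF \(\mathbb P(X \leq x)\) is calculated with `pgeom(x - 1, p)`;
- the quantile function at \(f\) is calculated with `qgeom(f, p) + 1`;
- \(n\) random samples are generated with `rgeom(n, p) + 1`.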
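The shift by 1 can be checked against the lecture-notes formulas. The following sketch uses an illustrative value \(p = 0.3\); the closed-form CDF \(\mathbb P(X \leq x) = 1 - (1 - p)^x\) used in the second check is the standard result for the "number of trials" definition, and is not stated in this worksheet.

```r
p <- 0.3   # illustrative success probability
x <- 1:8   # number of trials up to and including the first success

# PMF under the lecture-notes definition, written out from the formula
pmf_lectures <- (1 - p)^(x - 1) * p

# R counts failures before the first success, so shift the argument by 1
all.equal(dgeom(x - 1, p), pmf_lectures)

# CDF under the lecture-notes definition: P(X <= x) = 1 - (1 - p)^x
all.equal(pgeom(x - 1, p), 1 - (1 - p)^x)
```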
You may find it helpful to create new functions by running the following code block (that you don’t need to understand).
dgeomalt <- function(x, prob, log = FALSE) {
  dgeom(x - 1, prob, log = log)
}
pgeomalt <- function(q, prob, lower.tail = TRUE, log.p = FALSE) {
  pgeom(q - 1, prob, lower.tail = lower.tail, log.p = log.p)
}
qgeomalt <- function(p, prob, lower.tail = TRUE, log.p = FALSE) {
  qgeom(p, prob, lower.tail = lower.tail, log.p = log.p) + 1
}
rgeomalt <- function(n, prob) {
  rgeom(n, prob) + 1
}
This will temporarily create new functions `dgeomalt()`, `pgeomalt()`, `qgeomalt()`, `rgeomalt()` that work the way we prefer.
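As a quick sanity check that the shifted functions behave as intended, one might simulate from `rgeomalt()` and confirm that the samples start at 1 rather than 0. In this sketch `rgeomalt()` is redefined so that the snippet runs on its own; the seed, the value \(p = 0.25\), and the name `trials` are arbitrary. It also uses the standard fact that the "number of trials" geometric distribution has mean \(1/p\).

```r
# Redefined here so the snippet runs on its own; same definition as above
rgeomalt <- function(n, prob) {
  rgeom(n, prob) + 1
}

set.seed(1710)  # arbitrary seed for reproducibility
trials <- rgeomalt(10000, 0.25)

min(trials)    # never below 1, since at least one trial is always needed
mean(trials)   # should be close to 1 / 0.25 = 4
```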
Exercise 7.6. Let \(X \sim \text{Geom}(0.2)\) under the “number of trials up to and including the first success” definition used in the MATH1710 lectures. Calculate:
(a) \(\mathbb P(X = 20)\), rounded to two significant figures;
(b) \(\mathbb P(X \geq 10)\).
(c) How many trials are required to give us a 95% chance of seeing a success?
The following five assessed questions should be submitted via this Microsoft Form. I recommend you do this in Week 7, but the official deadline is Monday 21 November at 1400.
This work will be marked automatically by computer, so make sure your answers are accurate – the computer does not “know what you meant”; only what you actually enter into the form.
So that (most) students get different data to work with, the questions will use the number \(i\), where \(i\) is the final digit of your Student ID number – that is, a number between 0 and 9. (Note that we are only using one digit this time, where on previous worksheets we used two digits.)
Any rounding should be performed with the R `round()` or `signif()` functions.
(Note: It was a bit difficult to write questions that give “sensible” answers for all 10 values of \(i\), so if you have checked your answer very carefully, you should not be discouraged if it is extremely close to 0 or 1.)
Assessed Question 1. Let \(X_1 \sim \text{Bin}(n, p)\), where \(n = 20 + i\) and \(p = (1 + i)/20\), where \(i\) is the final digit of your Student ID number. What is \(\mathbb P(X_1 \geq 5)\)? Round your answer to three significant figures.
Assessed Question 2. Let \(X_2 \sim \text{Bin}(n,p)\), where \(n = 200 + i\) and \(p = (i+1)/500\), where \(i\) is the final digit of your Student ID number. What is \(\mathbb P(X_2 \text{ is even})\)? Round your answer to five significant figures.
Assessed Question 3. Let \(X_3 \sim \text{Bin}(n,p)\), where \(n = 200 + i\) and \(p = (21 + i)/1000\), where \(i\) is the final digit of your Student ID number. Let \(Y_3\) be the Poisson approximation to \(X_3\). What is \(\mathbb P(Y_3 \leq 4)\)? Round your answer to four significant figures.
Assessed Question 4. Suppose you roll a pair of dice until you get a double-six. What is the smallest number of times you must roll the pair of dice in order to give yourself at least a 95% chance of seeing a double-six?
Assessed Question 5. This question does not require you to run any R code. But suppose you did run the command
sd(rpois(n, lambda))
where \(\lambda = (1 + i)/2\) and \(n\) was set to be an extremely large number. What answer would you expect? Give your answer as a decimal to two decimal places.