This R worksheet does not include any assessed questions.
On the last R worksheet, we saw the basic commands for plotting data visualisations:
boxplot() for boxplots,
hist() for histograms,
plot() for scatterplots.
We also saw some optional arguments these functions could take, to change the appearance of the plot:
the breaks = ... argument to specify the number or sizes of bins in a histogram;
the type = ... argument to draw points, lines, or both in a scatterplot.
In this worksheet, we will see some other optional arguments that can be used to improve the appearance of your plots. Many of these arguments can be applied to any of the three plot types we’ve studied. The most important arguments are about labelling or titling plots and about setting the limits on axes.
When drawing a plot, it’s very important that it’s clear what the data represents. To do this, we almost always need to label the x and y axes. It’s often helpful to give the plot an explanatory title too.
Axis labels are set with the arguments xlab = ... and ylab = ... respectively. A title is set with main = ... . The text to appear in the labels/title must always be put in quotation marks " ".
Remember that multiple arguments must be separated by commas.
Let’s use the Met Office historical temperature data again.
temperature <- read.csv("https://mpaldridge.github.io/math1710/data/met-office.csv")
In the last worksheet, we drew a histogram of December temperatures with the command
hist(temperature$dec, freq = FALSE)
But this picture would be clearer properly labelled and titled.
hist(temperature$dec, freq = FALSE, xlab = "Average temperature (degrees Celsius)", ylab = "Frequency density", main = "Historical December temperatures in the UK")
Adding all these extra arguments can make your code difficult to read. Remember that R allows you to use line breaks when it’s obvious a command isn’t finished – for example, when a pair of brackets has been opened but not yet closed. You can also add spaces to make your code easier to read. For example, you may find the following formatting of exactly the same command as above more pleasant.
hist(temperature$dec, freq = FALSE,
  xlab = "Average temperature (degrees Celsius)",
  ylab = "Frequency density",
  main = "Historical December temperatures in the UK"
)
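Since labels and titles are just pieces of text, another way to keep a long command readable is to store the text in variables first. The following is a minimal sketch of that idea: the numbers are made-up stand-in values (not the Met Office data) so that the code runs on its own, and the variable names dec_temps, x_label and plot_title are our own choices.

```r
# Made-up stand-in values, NOT the Met Office data, so this sketch runs on its own.
# With the worksheet data loaded, you would use temperature$dec instead.
dec_temps <- c(3.1, 4.6, 2.2, 5.0, 4.1, 3.8, 1.9, 4.4, 5.6, 3.3)

# Store the label and title text in variables (the names are our own choice).
x_label <- "Average temperature (degrees Celsius)"
plot_title <- "Illustrative December temperatures"

# hist() also returns the histogram it drew, which we keep in h.
h <- hist(dec_temps, freq = FALSE,
  xlab = x_label,
  ylab = "Frequency density",
  main = plot_title
)
```

Whether this is easier to read than writing the text directly inside hist() is a matter of taste; both forms draw the same plot.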
Exercise 5.1. Read the Met Office temperature data into R. Last time, in Exercise 4.5, you drew a scatterplot of January and August temperatures. Do this again, but with explanatory labels on the axes and an explanatory title.
For diagrams with multiple boxplots, you will also want to label the individual boxplots. This is done with the names = ... argument to boxplot(). Here, names should be equal to a vector of the names given to the boxplots. Remember that vectors are set with c(), and each of the names must be in quotation marks.
Adapting another example from last time, we get
boxplot(temperature$jul, temperature$aug, temperature$sep,
  xlab = "Month",
  names = c("July", "August", "September"),
  ylab = "Average temperature (degrees Celsius)",
  main = "Historical UK temperatures"
)
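The names are matched to the boxplots in order, so the first name labels the first data vector, and so on. As a minimal sketch of this, the code below uses made-up stand-in values (not the Met Office data) and stores the names in a vector we have chosen to call month_names. The list returned by boxplot() has a names component, which lets us check which labels were used.

```r
# Made-up stand-in values, NOT the Met Office data, so this sketch runs on its own.
jul <- c(15.2, 16.1, 14.8, 17.0, 15.9)
aug <- c(15.0, 16.4, 14.9, 16.2, 15.5)
sep <- c(12.9, 13.6, 12.4, 14.1, 13.0)

# One name per boxplot, in the same order as the data vectors.
month_names <- c("July", "August", "September")

b <- boxplot(jul, aug, sep,
  names = month_names,
  xlab = "Month",
  ylab = "Average temperature (degrees Celsius)",
  main = "Illustrative temperatures"
)

# The labels that were attached to the boxplots.
b$names
```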
Exercise 5.2. Draw a figure consisting of boxplots for two or more months of temperature data. Make sure all axes are labelled, including each of the individual months.
When you draw a plot, R tries to choose the upper and lower limits of the x and y axes sensibly, to ensure that all the data fits, and that the numbers displayed next to the axes can be nice round numbers. (For histograms, R also ensures the lower limit on the y-axis is 0.) However, sometimes, you may wish to choose the axis limits yourself. The main reason we might want to fix the axis limits ourselves is that, in some circumstances, it’s appropriate to ensure one or both of the axes start at 0, to put changes in the corresponding value in proper context. R, on the other hand, may choose to zoom in on relatively unimportant small changes.
We can choose the axis limits ourselves using the xlim = ... and ylim = ... arguments. Here xlim or ylim should be set to be a vector of length 2, with the first number being the lower limit for the axis and the second number being the upper limit. If you just set one of the axis limits, R will still try to choose the other one sensibly.
For example, if making a scatterplot of February and April temperatures, I may want to put both months’ temperatures on the same scale (say, from -2 to 10) to give a fair comparison between them.
plot(temperature$feb, temperature$apr,
  xlab = "Average February temperature (degrees Celsius)",
  ylab = "Average April temperature (degrees Celsius)",
  xlim = c(-2, 10),
  ylim = c(-2, 10)
)
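If you would rather not pick the common scale by eye, one option is to compute it from the data with range(), which returns a vector of length 2 containing the minimum and maximum. The following is only a sketch: the values are made-up stand-ins (not the Met Office data), and shared_limits is a name we have chosen.

```r
# Made-up stand-in values, NOT the Met Office data, so this sketch runs on its own.
feb <- c(0.5, 3.2, 4.1, 1.8, 5.0, 2.7)
apr <- c(6.9, 8.1, 7.4, 6.2, 9.0, 7.7)

# Smallest and largest value across both months: a vector of length 2.
shared_limits <- range(c(feb, apr))

# Use the same limits on both axes, so the two months share a scale.
plot(feb, apr,
  xlab = "Average February temperature (degrees Celsius)",
  ylab = "Average April temperature (degrees Celsius)",
  xlim = shared_limits,
  ylim = shared_limits
)
```

Limits computed this way fit the data tightly, so you may still prefer to round them outwards by hand, as with c(-2, 10) above.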
Exercise 5.3. Draw a histogram of August temperatures. Make sure the axes of your plot are appropriate and give your plot a title. Now redraw the histogram with the temperature axis going down to 0 (and an appropriate upper limit). Do you think changing the axis limits was helpful here?
There are many other optional arguments that can be passed to our plot-drawing functions. You can find out about many of these by reading the relevant help files in R. Try typing ?boxplot, ?hist, or ?plot.default (or ?par, for the adventurous) into the console and pressing Enter; you should see the help files open in the bottom-right quadrant of RStudio.
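As well as reading the help files, you can ask R directly which arguments a function lists by name. The sketch below does this for plot.default(); the variable name plot_args is our own choice. Some arguments, such as col, pch and lwd, are passed on through the special "..." argument, so they will not appear in this list; these are the ones described in ?par.

```r
# names(formals(f)) gives the names of the arguments of a function f.
plot_args <- names(formals(plot.default))
print(plot_args)

# Check whether particular arguments are listed by name.
c("xlim", "main", "log") %in% plot_args
```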
Some of these arguments include:
sub = ... sets a subheading. Remember the text must be in quotation marks.
col = ... sets colours. In boxplot() this is the shading of the two central boxes; in hist() this is the shading of the histogram bars; in plot() this is the colour of the points or lines. The argument col = ... takes many standard colour names provided they are in quotation marks: try col = "blue" or col = "red", for example.
lwd = ... sets the line width. R’s default line width (lwd = 1) is sometimes too thin, especially for plot() with the type = "l" option set. I find lwd = 2 is sometimes preferable for scatterplots with lines. (In hist(), lwd changes the thickness of the axes, slightly bizarrely.)
In plot(), we already know that type = ... can give us points (p), lines (l) or both (b).
In plot(), the argument pch = ... sets the symbol used to plot points. The default is pch = 1, which is a circle. Try setting pch to other numbers to see what happens.
To draw an axis on a logarithmic scale, set log = ... to log = "x" (for just the x axis), log = "y" (for just the y axis), or log = "xy" (for both).
Exercise 5.4. Draw a plot of your choice based on the Met Office temperature data, but use as many extra wacky options as you can. Make sure to title and label your plot, of course.
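Several of the optional arguments described above can be combined in a single command. Here is a minimal sketch, using made-up stand-in values (not the Met Office data) and variable names of our own choosing, that sets the plot type, point symbol, colour, line width and a subheading all at once.

```r
# Made-up stand-in values, NOT the Met Office data, so this sketch runs on its own.
year <- 2001:2010
aug <- c(16.2, 15.8, 17.1, 16.5, 15.9, 16.8, 15.4, 16.0, 16.3, 15.1)

plot(year, aug,
  type = "b",     # both points and lines
  pch = 19,       # a different plotting symbol from the default pch = 1
  col = "blue",   # colour of the points and lines
  lwd = 2,        # thicker lines than the default lwd = 1
  xlab = "Year",
  ylab = "Average temperature (degrees Celsius)",
  main = "Illustrative August temperatures",
  sub = "Made-up values for illustration only"
)
```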