This R worksheet does include Assessed Questions at the end. The deadline for submitting your solutions to the Assessed Questions is Monday 23 October at 1400. I recommend working on this worksheet during your R Practical in Week 3 – check your timetable for details.
Remember to open a new R Script (or add to your last R Script) to save your work.
So far, our objects in R have just been single numbers. However, we can do much more with R when our objects are vectors of many numbers. For example, we can represent a dataset as a vector. In this worksheet, we will find out how to enter vectors into R, and look at operations we can perform with these vectors.
We can enter a vector by using the c() function. (The letter c stands for “combine” – or maybe “concatenate”.) So to enter the vector \((1, 3, 4, 4, 2, 4, 1)\) as an object – let’s call it “my_vector” – we enter
my_vector <- c(1, 3, 4, 4, 2, 4, 1)
and we can view it with
my_vector
[1] 1 3 4 4 2 4 1
Exercise 2.1. A student collects data for one week on the number of sandwiches sold by the Dolche Vita cafe. On Monday 27 sandwiches were sold, on Tuesday 29 sandwiches were sold, on Wednesday 30 sandwiches were sold, on Thursday 29 sandwiches were sold, and on Friday 18 sandwiches were sold. Enter this data into R as a vector, and call it sandwiches. View your vector, and check you have entered the data correctly.
There are many situations when we want a vector to be a consecutive string of numbers, like \((1, 2, 3, 4, 5, 6, 7, 8, 9, 10)\). R has a shortcut to allow us to enter this quickly using the : (colon) operator. For example, to get the consecutive integers from 1 to 10 using the : operator, we enter
1:10
[1] 1 2 3 4 5 6 7 8 9 10
Exercise 2.2. As efficiently as you can, enter the following vectors into R:
(a) \((10, 11, 12, \dots, 29, 30)\),
(b) \((100, 99, 98, \dots, 87, 86)\).
(c) By using : together with the combine function c(), enter \((1, 2, 3, 4, 5, 24, 25, 26, 27, 28)\).
We can deal with a vector such as my_vector as a whole, but sometimes we just want to deal with an individual entry within the vector. We do that using square brackets [ ]. So, for example, the fifth entry of my_vector is
my_vector[5]
[1] 2
The second and fourth entries are
my_vector[c(2, 4)]
[1] 3 4
The first three entries are
my_vector[1:3]
[1] 1 3 4
Notice that in the second and third examples we put a vector (c(2, 4) and then 1:3 = c(1, 2, 3)) into the square brackets to pull out multiple entries of my_vector.
We can also assign values to entries using the same notation. So to change the second entry of my_vector to a 5, we enter
my_vector[2] <- 5
and we can check this second entry has indeed changed by reading the vector:
my_vector
[1] 1 5 4 4 2 4 1
Exercise 2.3. The student from Exercise 2.1 discovers an extra sandwich that was sold on Monday. Increase the first entry of sandwiches by 1. (Try to do it without looking at what the first entry was previously.)
Standard arithmetic operations like +, -, *, /, ^, and so on, can also be used with vectors. They work entry-by-entry. Let’s demonstrate with two vectors of the same length:
vector_1 <- c(3, 3, 4, 4, 5)
vector_2 <- 1:5
Then
vector_1 + vector_2
[1] 4 5 7 8 10
outputs a new vector. The first entry of the new vector is the first entry of vector_1, which is 3, plus the first entry of vector_2, which is 1, making 4. Then the second entry of the new vector is the second entry of vector_1, 3, plus the second entry of vector_2, 2, making 5. And so on.
Note that
vector_1 * vector_2
[1] 3 6 12 16 25
gives the vector of entry-by-entry multiplications, not the scalar dot-product \(\mathbf{x} \cdot \mathbf{y} = \sum_i x_i y_i\). If you want the dot-product, you can use the sum() function from the next section:
sum(vector_1 * vector_2)
[1] 62
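As a check on this output, here is the same calculation written out by hand, taking \(\mathbf{x}\) to be vector_1 and \(\mathbf{y}\) to be vector_2:

\[
\mathbf{x} \cdot \mathbf{y} = 3 \times 1 + 3 \times 2 + 4 \times 3 + 4 \times 4 + 5 \times 5 = 3 + 6 + 12 + 16 + 25 = 62 .
\]

The five terms in the middle sum are exactly the entries of vector_1 * vector_2.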
Exercise 2.4. The number of baguettes sold in the cafe each day was \((8, 9, 8, 7, 5)\). The student decides to count these as sandwiches, so wants to add these numbers to the sandwiches data from Exercise 2.3. Add these new values to the previous data.
What happens if we try to use one of these operations between a vector and a single number, like this?
vector_1 + 2
[1] 5 5 6 6 7
vector_2 * 3
[1] 3 6 9 12 15
You should notice that R has added 2 to each entry of vector_1, and multiplied every entry of vector_2 by 3. That is because R has “recycled” the scalar (2 or 3) enough times to match the length of the vector.
You should be very careful when applying an operation between vectors of different lengths. R will “recycle” the shorter vector by repeating it until it matches the length of the longer vector. If the shorter vector doesn’t fit into the longer vector a whole number of times, R will also give a warning message after the answer. This often gives behaviour you don’t expect (or don’t want)!
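To make the recycling rule concrete, here is a small sketch. The vectors long_vector and short_vector, and the numbers in them, are made up for illustration and are not part of the worksheet's data.

```r
# Hypothetical example vectors, chosen only to illustrate recycling.
long_vector <- c(10, 20, 30, 40)
short_vector <- c(1, 2)

# short_vector is recycled as c(1, 2, 1, 2). It fits exactly twice,
# so R gives the answer with no warning.
exact_fit <- long_vector + short_vector
exact_fit
# [1] 11 22 31 42

# Here c(10, 20) is recycled as c(10, 20, 10, 20, 10). It does not fit a
# whole number of times, so R gives an answer and then a warning message.
uneven_fit <- 1:5 + c(10, 20)
uneven_fit
# [1] 11 22 13 24 15
```

In the second case the answer is still produced, so the warning is easy to miss. It is worth checking the lengths of your vectors with length() if a result looks odd.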
Exercise 2.5. Predict what each of the following five commands will produce. Then run the code in R to see if you were right. If you were wrong, explain what R has done.
c(1, 3, 5) + c(2, 2, 3)
1:5 * c(0, 0, 1, 1, 2)
c(2, 2, 3, 3)^c(1, 2, 2, 1)
1:8 / 1:4
1:7 + c(1, 2)
There are a number of useful functions that operate on vectors, including:
sum() gives the sum of all the numbers in the vector.
mean() gives the mean of the numbers in the vector.
length() gives us the length of the vector; that is, how many numbers were in the vector.
min() and max() give the minimum (smallest) and maximum (largest) entries of the vector, respectively.
median() (median), sd() (standard deviation), var() (sample variance), IQR() (interquartile range). You can find others by using Google or just experimenting with R to see what works.
cov() (sample covariance) and cor() (correlation) require two arguments.
So, for example, to find the sum of our vector my_vector, we enter
sum(my_vector)
[1] 21
If we think we’re likely to use the value of the sum many times in the future, it will be convenient to save that sum as an R object itself:
my_total <- sum(my_vector)
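The other functions listed above are used in the same way. As a quick sketch, here are a few of them applied to the vectors from earlier in this worksheet. The vectors are re-entered at the top only so that the code runs on its own, with my_vector already having its second entry changed to 5.

```r
# Re-enter the vectors used earlier in the worksheet.
my_vector <- c(1, 5, 4, 4, 2, 4, 1)
vector_1 <- c(3, 3, 4, 4, 5)
vector_2 <- 1:5

length(my_vector)   # how many entries
# [1] 7
mean(my_vector)     # sum divided by length: 21 / 7
# [1] 3
median(my_vector)   # middle value once the entries are sorted
# [1] 4
min(my_vector)
# [1] 1
max(my_vector)
# [1] 5
var(my_vector)      # sample variance (divides by length - 1)
# [1] 2.666667

# cov() needs two vectors of the same length; cor() is called the same way.
cov(vector_1, vector_2)
# [1] 1.25
```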
Exercise 2.6. Using your sandwiches data from Exercise 2.4, find the mean and median number of sandwiches sold in a day.
In the last worksheet, we saw functions like log(), exp(), round(), and signif() that work on a single number. If you apply these functions to a vector, R applies the function to each entry of the vector one at a time. So, for example,
exp(my_vector)
[1] 2.718282 148.413159 54.598150 54.598150 7.389056 54.598150 2.718282
outputs a vector whose first element is the exponential of the first element of my_vector, whose second element is the exponential of the second element of my_vector, and so on.
Exercise 2.7. Get R to output the square roots of the integers from 1 to 20, all rounded to 2 decimal places.
Your answers to the following five assessed questions should be submitted to the Microsoft Form that is linked to below. I recommend you do this in Week 3, but the official deadline is Monday 23 October at 1400.
You should submit your work by typing your answers into this Microsoft Form. You may need to log into your University account to access the Form.
This work will be marked automatically by computer, so make sure your answers are accurate – the computer does not “know what you meant”; only what you actually enter into the form.
So that (most) students get different data to work with, you will work with data whose name is based on your student ID number. Your student ID number is (usually) a 9-digit number starting 201. In the command below, xy is replaced by the last two digits of your student ID number. So if your student ID is 201623429, then xy is 29; while if your student ID is 201790400, then xy is 00. Take care to check you get this correct: if you get your student ID number wrong and/or use the wrong data, the computer is likely to mark all your answers wrong and award you 0 total marks.
If asked to round your answer, you should give your answer as produced by R’s round() or signif() functions. (It will not make any difference to the computer marking whether or not ‘trailing 0s’ are given.)
First, run the following line of code exactly, except that the letters xy toward the end of the line are to be replaced by the last two digits of your student ID number, as described above. The code is this:
source("http://mpaldridge.github.io/math1710/data/R2-xy.R")
If you copy-and-paste this command – which is recommended – remember to replace the letters xy by the last two digits of your student ID before pressing enter.
That line of code should create two vectors for you in R. These two vectors, amaflix and netazon, are the stock prices of two companies, “Amaflix” and “Netazon”, over a number of days.
Assessed Question 1. How many days of data are in the dataset?
Assessed Question 2. What was the mean stock price of Amaflix? Give your answer to two decimal places.
Assessed Question 3. What was the variance of the stock price of Netazon over the first 25 days of the dataset? Round your answer to three significant figures.
Assessed Question 4. Which company’s data was generally more spread out?
Assessed Question 5. Would you say that the relationship between the companies’ stock prices is a strong positive relationship, a strong negative relationship, or a weak relationship?
Remember to submit your answers into this Microsoft Form before 1400 on Monday 23 October.