R Worksheet 2: Vectors

This R worksheet does include Assessed Questions at the end. The deadline for submitting your solutions to the Assessed Questions is Monday 23 October at 1400. I recommend working on this worksheet during your R Practical in Week 3 – check your timetable for details.

Remember to open a new R Script (or add to your last R Script) to save your work.

So far, our objects in R have just been single numbers. However, we can do much more with R when our objects are vectors of many numbers. For example, we can represent a dataset as a vector. In this worksheet, we will find out how to enters vector into R, and look at operations we can perform with these vectors.

Entering vectors into R

We can enter a vector by using the c() function. (The letter c stands for “combine” – or maybe “concatenate”.) So to enter the vector \((1, 3, 4, 4, 2, 4, 1)\) as an object – let’s call it “my_vector” – we enter

my_vector <- c(1, 3, 4, 4, 2, 4, 1)

and we can view it with

my_vector

[1] 1 3 4 4 2 4 1

Exercise 2.1. A student collects data for one week on the number of sandwiches sold by the Dolche Vita cafe. On Monday 27 sandwiches were sold, on Tuesday 29 sandwiches were sold, on Wednesday 30 sandwiches were sold, on Thursday 29 sandwiches were sold, and on Friday 18 sandwiches were sold. Enter this data into R as a vector, and call it sandwiches.

View your vector, and check you have entered the data correctly.

There are many situations when we want a vector to be consecutive string of numbers, like \((1, 2, 3, 4, 5, 6, 7, 8, 9, 10)\). R has a shortcut to allow us to enter this quickly using using colon operator :. For example, to get the consecutive integers from 1 to 10 using the : operator, we enter

1:10

 [1]  1  2  3  4  5  6  7  8  9 10

Exercise 2.2. As efficiently as you can, enter the following vectors into R:
(a) \((10, 11, 12, \dots, 29, 30)\),
(b) \((100, 99, 98, \dots, 87, 86)\).
(c) By using : together with the combine function c(), enter \((1, 2, 3, 4, 5, 24, 25, 26, 27, 28)\).

Accessing entries of vectors

We can deal with a vector such as my_vector as a whole, but sometimes we just want to deal with an individual entry within the vector. We do that using square brackets [ ].

So, for example, the fifth entry of my_vector is

my_vector[5]

[1] 2

The second and fourth entries are

my_vector[c(2, 4)]

[1] 3 4

The first three entries are

my_vector[1:3]

[1] 1 3 4

Notice that in the second and third examples we put a vector (c(2, 4) and then 1:3 = c(1, 2, 3)) into the square brackets to pull out multiple entries of my_vector.

We can also assign values to entries using the same notation. So to change the second entry of my_vector to a 5, we enter

my_vector[2] <- 5

and we can check this second entry has indeed changed by reading the vector:

my_vector

[1] 1 5 4 4 2 4 1

Exercise 2.3. The student from Example 2.1 discovers an extra sandwich that was sold on Monday. Increase the first entry of the sandwiches by 1. (Try to do it without looking at what the first entry was previously.)

Operations with vectors

Standard arithmetic operations like +, -, *, /, ^, and so on, can also be used with vectors. They work entry-by-entry. Let’s demonstrate with two vectors of the same length:

vector_1 <- c(3, 3, 4, 4, 5)
vector_2 <- 1:5

Then

vector_1 + vector_2

[1]  4  5  7  8 10

outputs a new vector. The first entry of the new vector is the first entry of vector_1, which is 3, plus the first entry of vector_2, which is 1, making 4. Then the second entry of the new vector is the second entry of vector_1, 3, plus the second entry of vector_2, 2, making 5. And so on.

Note that

vector_1 * vector_2

[1]  3  6 12 16 25

gives the vector of entry-by-entry multiplications, not the scalar dot-product \(\mathbf{x} \cdot \mathbf{y} = \sum_i x_i y_i\). If you want the dot-product, you can use the sum() function from the next section:

sum(vector_1 * vector_2)

[1] 62

Exercise 2.4. The number of baguettes sold in the cafe each day was \((8, 9, 8, 7, 5)\). The student decides to count these as sandwiches, so wants to add these numbers to the sandwiches data from Exercise 2.3. Add these new values to the previous data.

What happens if we try to use one of these operations between a vector and a single number, like this?

vector_1 + 2

[1] 5 5 6 6 7

vector_2 * 3

[1]  3  6  9 12 15

You should notice that R has added 2 to each entry of vector_1, and multiplied every entry of vector_2 by 3. That is because R has “recycled” the scalar (2 or 3) enough times to match the length of the vector.

You should be very careful when applying an operation between vectors of different lengths. R will “recycle” the shorter vector by repeating it until it matches the length of the longer vector. If the shorter vector doesn’t fit into the longer vector a whole number of times, R will also give a warning message after the answer. This often gives behaviour you don’t expect (or don’t want)!

Exercise 2.5. Predict what each of the following five commands will produce. Then run the code in R to see if you were right. If you were wrong, explain what R has done.

c(1, 3, 5) + c(2, 2, 3)
1:5 * c(0, 0, 1, 1, 2)
c(2, 2, 3, 3)^c(1, 2, 2, 1)
1:8 / 1:4
1:7 + c(1, 2)

Functions of vectors

There are a number of useful functions that operate on vectors, including:

sum() gives the sum of all the numbers in the vector.
mean() gives the mean of the numbers in the vector.
length() gives us the length of the vector; that is, how many numbers were in the vector.
min() and max() give the minimum (smallest) and maximum (largest) entries of the vector, respectively.
The are a number of useful statistical functions, like median() (median) sd() (standard deviation), var() (sample variance), IQR() (interquartile range). You can find others by using Google or just experimenting with R to see what works.
Some functions on vectors, like cov() (sample covariance) and cor() (correlation) require two arguments.

So, for example, to find the sum of our vector my_vector, we enter

sum(my_vector)

[1] 21

If we think we’re likely to use the value of the sum many times in the future, it will be convenient to save that sum as an R object itself:

my_total <- sum(my_vector)

Exercise 2.6. Using your sandwiches data from Exercise 2.4, find the mean and median number of sandwiches sold in a day.

In the last worksheet, we saw functions like log(), exp(), round(), and signif() that work on a single number. If you apply these functions to a vector, R applies the function to each entry of the vector one at a time. So, for example,

exp(my_vector)

[1]   2.718282 148.413159  54.598150  54.598150   7.389056  54.598150   2.718282

outputs a vector whose first element is the exponential of the first element of my_vector, whose second element is the exponential of the second element of my_vector, and so on.

Exercise 2.7. Get R to output the square roots of the integers from 1 to 20, all rounded to 2 decimal places.

Assessed questions

Your numerical answer to the following five assessed questions should be submitted to the Microsoft Form that is linked to below. I recommend you do this in Week 3, but the official deadline is Monday 23 October at 1400.

You should submit your work by typing your answers into this Microsoft Form. You may need to log into your University account to access the Form.

This worked will be marked automatically by computer, so make sure you answers are accurate – the computer does not “know what you meant”; only what you actually enter into the form.

So that (most) students get different data to work with, you will work with data whose name is based on your student ID number. Your student ID number is (usually) a 9-digit number starting 201. In the command below, xy is replaced by the last two digits of your student ID number. So if your student ID is 201623429, then xy is 29; while if your student ID is 201790400, then xy is 00. Take care to check you get this correct: if you get your student ID number wrong and/or use the wrong data, the computer is likely to mark all your answers wrong award you 0 total marks.

If asked to round your answer, you should give your answer as produced by R’s round() or signif() functions. (It will not make any difference to the computer marking whether or not `trailing 0s’ are given.)

First, run the following line of code exactly, except that the letters xy toward the end of the line are to be replaced by the last two digits of your student ID number, as described above. The code is this:

source("http://mpaldridge.github.io/math1710/data/R2-xy.R")

If you copy-and-paste this command – which is recommended – remember to replace the letters xy by the last two digits of your student ID before pressing enter.

That line of code should create two vectors for you in R. These two vectors, amaflix and netazon are the stock prices of two companies, “Amaflix” and “Netazon” over a number of days.

Assessed Question 1. How many days of data are in the dataset?

Assessed Question 2. What was the mean stock price of Amaflix? Give your answer to two decimal places.

Assessed Question 3. What was the variance of stock price of Netazon over the first 25 days of the dataset? Round your answer to three significant figures.

Assessed Question 4. Which company’s data was generally more spread out?

Assessed Question 5. Would you say that the relationship between the companies’ stock prices is a strong positive relationship, a strong negative relationship, or a weak relationship?

Remember to submit your answers into this Microsoft Form before 1400 on Monday 23 October.