Patterns and Sub-Setting

(Sections 2.2-2.5)

Making Patterned Vectors

Sequencing

Consider the seq() function:

seq(from = 5, to = 15, by = 1)
 [1]  5  6  7  8  9 10 11 12 13 14 15

The default value of the parameter by is 1, so we could get the same thing with:

seq(from = 5, to = 15)
 [1]  5  6  7  8  9 10 11 12 13 14 15

More Examples

seq(5,10)  # from is 5, to is 10
[1]  5  6  7  8  9 10
seq(3, 15, 2)
[1]  3  5  7  9 11 13 15
seq(0, 1, 0.1)
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

Going to to, But Not Past It

When the steps do not land exactly on to, seq() stops at the last value before to:

seq(3, 16, 2)
[1]  3  5  7  9 11 13 15

Negative Steps are OK

seq(5, -4, -1)
 [1]  5  4  3  2  1  0 -1 -2 -3 -4

Colon Operator:

A shortcut for sequencing, when by is 1 or -1.

Going up …

1:5
[1] 1 2 3 4 5

Going down …

5:1
[1] 5 4 3 2 1
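As a quick check, the colon operator gives the same values as seq() with by = 1 or by = -1. The comparison below uses == so that values are compared element by element; up and down are just illustrative names.

```r
up   <- 1:5
down <- 5:1
# Compare each result with its seq() equivalent, element by element
all(up == seq(1, 5, by = 1))      # TRUE
all(down == seq(5, 1, by = -1))   # TRUE
```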

Repeating Vectors

rep(3, times = 5)
[1] 3 3 3 3 3

We can apply rep() to a vector of length greater than 1:

vec <- c(7, 3, 4)
rep(vec, times = 3)
[1] 7 3 4 7 3 4 7 3 4

You Can rep Character Vectors

rep("Toto", 4)
[1] "Toto" "Toto" "Toto" "Toto"

The each Parameter for rep()

vec <- c(7, 3, 4)
rep(vec, each = 2, times = 3)
 [1] 7 7 3 3 4 4 7 7 3 3 4 4 7 7 3 3 4 4
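When each and times are both given, R applies each first and then times. Splitting the call into two steps (step1 and step2 are names made up for this sketch) shows where the output above comes from:

```r
vec <- c(7, 3, 4)
# Step 1: each = 2 repeats every element in place
step1 <- rep(vec, each = 2)     # 7 7 3 3 4 4
# Step 2: times = 3 repeats that whole result
step2 <- rep(step1, times = 3)  # 7 7 3 3 4 4 7 7 3 3 4 4 7 7 3 3 4 4
all(step2 == rep(vec, each = 2, times = 3))  # TRUE
```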

Varying times

vec <- c("x", "y", "z")
rep(vec, times = 1:3)
[1] "x" "y" "y" "z" "z" "z"
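Here times has one entry per element of vec: the first element is repeated times[1] times, the second times[2] times, and so on. Written out by hand (byHand is an illustrative name), the same vector is three separate rep() calls glued together:

```r
vec <- c("x", "y", "z")
# One rep() call per element, with counts 1, 2 and 3
byHand <- c(rep("x", 1), rep("y", 2), rep("z", 3))
identical(rep(vec, times = 1:3), byHand)  # TRUE
```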

Complex Patterns

In order to make:

  • fifty 10’s followed by
  • fifty 30’s followed by
  • fifty 50’s followed by …
  • … fifty 150’s

Write:

rep(seq(10, 150, 20), each = 50)
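A few checks on the result help confirm the pattern: seq(10, 150, 20) supplies the eight distinct values and each = 50 supplies the fifty copies. (The name x is just for illustration.)

```r
x <- rep(seq(10, 150, 20), each = 50)
length(x)   # 400: 8 distinct values, 50 copies of each
unique(x)   # 10 30 50 70 90 110 130 150
x[50:51]    # 10 30: where the 10's end and the 30's begin
```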

Practice

Write one-line commands to produce each of the following:

  • the lowercase letters of the alphabet, repeated three times
  • one A, two B’s, three C’s, …, twenty-six Z’s.
  • the real numbers 0, 0.01, 0.02, …, 1.98, 1.99, 2.00

Sub-Setting

Definition

Sub-setting

The operation of selecting one or more elements from a vector.

A Sample Vector

Recall heights:

heights <- c(72, 70, 69, 58, NA, 45)
names(heights) <- c("Scarecrow", "Tinman", "Lion", "Dorothy", "Toto", "Boq")
heights
Scarecrow    Tinman      Lion   Dorothy      Toto       Boq 
       72        70        69        58        NA        45 

The Bracket Operator

Find subsets of vectors using brackets:

heights[4]  # the fourth element only
Dorothy 
     58 

Get Any Number of Them

If we want two or more elements, then we specify their indices in a vector.

desired <- c(1,5)  # want first and fifth elements
heights[desired]
Scarecrow      Toto 
       72        NA 

Also OK to be direct:

heights[c(1,5)]
Scarecrow      Toto 
       72        NA 

Negative Numbers are Significant

heights[-2] # select all but second element
Scarecrow      Lion   Dorothy      Toto       Boq 
       72        69        58        NA        45 
heights[-c(1,3)]  # all but first and third
 Tinman Dorothy    Toto     Boq 
     70      58      NA      45 
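A negative index is just a compact way of listing every position except the ones named. Rebuilding heights from above, dropping the second element gives exactly the same result as asking for the other five positions directly (dropSecond is an illustrative name):

```r
heights <- c(72, 70, 69, 58, NA, 45)
names(heights) <- c("Scarecrow", "Tinman", "Lion", "Dorothy", "Toto", "Boq")
# Dropping the second element ...
dropSecond <- heights[-2]
# ... is the same as keeping elements 1, 3, 4, 5 and 6
identical(dropSecond, heights[c(1, 3, 4, 5, 6)])  # TRUE
```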

Outside of the Range …

heights[7]  # only has elements 1 through 6
<NA> 
  NA 

Patterned Vectors Useful!

evenPeople <- seq(2,6,2)
heights[evenPeople]
 Tinman Dorothy     Boq 
     70      58      45 

Quick Names for a Vector:

vec <- c(23, 14, 82, 33, 33, 45)
names(vec) <- LETTERS[1:length(vec)]
vec
 A  B  C  D  E  F 
23 14 82 33 33 45 

Names for Sub-Setting

heights["Tinman"]
Tinman 
    70 
heights[c("Scarecrow", "Boq")]
Scarecrow       Boq 
       72        45 

Sub-setting to Modify a Vector

heights["Dorothy"] <- 60

We can replace more than one element:

heights[c("Scarecrow", "Boq")] <- c(73, 46)

The subset of indices may be as complex as you like:

vec <- c(3,4,5,6,7,8)
vec[seq(from = 2, to = 6, by = 2)] <- c(100, 200, 300)
vec
[1]   3 100   5 200   7 300

Sub-setting to Rearrange

inhabitants <- c("Oz", "Toto", "Boq", "Glinda")
permuted <- inhabitants[c(3,4,1,2)]
permuted
[1] "Boq"    "Glinda" "Oz"     "Toto"  
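Since any index vector works, a patterned one from earlier can reverse a vector: length(inhabitants):1 is the index vector 4 3 2 1. (reversed is an illustrative name.)

```r
inhabitants <- c("Oz", "Toto", "Boq", "Glinda")
# Index from the last position back down to the first
reversed <- inhabitants[length(inhabitants):1]
reversed  # "Glinda" "Boq" "Toto" "Oz"
```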

More on Logical Vectors

Boolean Expressions

Boolean expressions are expressions that evaluate to a logical vector:

13 < 20
[1] TRUE
13 < 5
[1] FALSE

Element-Wise Evaluation

a <- c(10, 13, 17)
b <- c(8, 15, 12)
a < b  # a Boolean expression!
[1] FALSE  TRUE FALSE
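"Element-wise" means that R compares the first element of a with the first of b, the second with the second, and so on. Spelling out the three comparisons by hand (byHand is an illustrative name) gives the same logical vector:

```r
a <- c(10, 13, 17)
b <- c(8, 15, 12)
# The same three comparisons, written out one pair at a time
byHand <- c(a[1] < b[1], a[2] < b[2], a[3] < b[3])
identical(a < b, byHand)  # TRUE
```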

Some Boolean Operators

Operation   What It Means
<           less than
>           greater than
<=          less than or equal to
>=          greater than or equal to
==          equal to
&           and
|           or
!           not

Inequalities and Character Vectors

a <- c("Dorothy", "toto", "Boq")
b <- c("tinman", "Toto", "2017")
a < b
[1]  TRUE  TRUE FALSE

Why?

  • D comes before t in the alphabet;
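  • lowercase t comes before uppercase T in the collation R uses for typical English locales (this depends on the locale: in the C locale, uppercase letters sort first);
  • digit characters come before letters, so "2017" sorts before "Boq".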

Equality Operator

The equality (==) operator indicates whether the expressions being compared evaluate to the same value.

Made with two equal signs, not one!

It compares values, not strict identity: in the example below the names differ, but the values match.

a <- c(Dorothy = 1, Toto = 2)  # a named vector
b <- c(Glinda = 1, Tinman = 2)  # different vector (names different)
a == b
Dorothy    Toto 
   TRUE    TRUE 
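To ask whether two vectors are the same in every respect, names included, base R's identical() can be used instead of ==. With the same a and b:

```r
a <- c(Dorothy = 1, Toto = 2)
b <- c(Glinda = 1, Tinman = 2)
a == b           # TRUE TRUE: the values match
identical(a, b)  # FALSE: the names differ
```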

And, Or, Not

a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)
a & b  # "a and b"
[1]  TRUE FALSE FALSE FALSE
a | b  # "a or b"
[1]  TRUE  TRUE  TRUE FALSE
!c(TRUE, FALSE)  # "not"
[1] FALSE  TRUE
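These operators are most often used to combine comparisons. For a made-up vector x, the following asks which elements lie strictly between 5 and 10; the comparisons are evaluated before &, so no extra parentheses are needed:

```r
x <- c(3, 8, 12)
x > 5            # FALSE  TRUE  TRUE
x < 10           #  TRUE  TRUE FALSE
x > 5 & x < 10   # FALSE  TRUE FALSE: only 8 is between 5 and 10
```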

Recycling

Why Does This Work?

c(2, 3, 6, 7) > 5
[1] FALSE FALSE  TRUE  TRUE

After all:

  • c(2, 3, 6, 7) has length 4
  • 5 only has length 1

Answer: the 5 was recycled into c(5, 5, 5, 5), so each element of c(2, 3, 6, 7) is compared with 5.

Definition

Recycling

An automatic process by which R, when given two vectors, repeats elements of the shorter vector until it is as long as the longer vector.

Recycling enables the two resulting vectors to be combined element-wise in operations.
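Recycling works the same way in arithmetic. A shorter vector of length 2 is repeated to match a vector of length 4. If the longer length is not a multiple of the shorter one, R still recycles but issues a warning:

```r
# c(10, 20) is recycled to 10 20 10 20
c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24
# 3 is not a multiple of 2: the result is 11 22 13, with a warning
c(1, 2, 3) + c(10, 20)
```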

Sub-Setting with Logical Vectors

Desired Heights

Recall our heights vector:

heights <- c(Scarecrow = 72, Lion = 70, Tinman = 69, 
             Dorothy = 58, Toto = NA, Boq = 45)
heights
Scarecrow      Lion    Tinman   Dorothy      Toto       Boq 
       72        70        69        58        NA        45 

We want the heights of Scarecrow, Tinman and Dorothy. Here’s one way:

wanted <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
heights[wanted]
Scarecrow    Tinman   Dorothy 
       72        69        58 
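Recycling also applies to logical indices. A short logical vector is repeated to the length of heights, so c(TRUE, FALSE) picks out every other element:

```r
heights <- c(Scarecrow = 72, Lion = 70, Tinman = 69,
             Dorothy = 58, Toto = NA, Boq = 45)
# c(TRUE, FALSE) is recycled to TRUE FALSE TRUE FALSE TRUE FALSE,
# so this keeps elements 1, 3 and 5
heights[c(TRUE, FALSE)]   # Scarecrow 72, Tinman 69, Toto NA
```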

Another Example

Select those persons whose heights are at least a certain amount (here, 70 inches).

#heights of some people:
people <- c(55, 64, 67, 70, 63, 72)
tall <- (people >= 70)
tall
[1] FALSE FALSE FALSE  TRUE FALSE  TRUE
people[tall]
[1] 70 72

All at Once

people[people >= 70]
[1] 70 72

We think: “Select from people, where people is at least 70.”

Sub-setting with a Different Vector

Get the ages of people who are less than 70 inches tall.

age <- c(23, 21, 22, 25, 63)
height <- c(68, 67, 71, 70, 69)
age[height < 70]
[1] 23 21 63

Logically-Complex Sub-Setting

Get the heights of people who are less than 60 years old and who also like Toto.

age <- c(23, 21, 22, 25, 63)
height <- c(68, 67, 71, 70, 69)
likesToto <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
height[age < 60 & likesToto]
[1] 68 67
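It can help to print the intermediate logical vectors. The & combines the two conditions element by element, and only positions where both are TRUE survive:

```r
age <- c(23, 21, 22, 25, 63)
height <- c(68, 67, 71, 70, 69)
likesToto <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
age < 60                      # TRUE  TRUE  TRUE  TRUE FALSE
age < 60 & likesToto          # TRUE  TRUE FALSE FALSE FALSE
height[age < 60 & likesToto]  # 68 67
```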

Counting

How many people are less than 70 inches tall?

length(people[people < 70])
[1] 4
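Because R treats TRUE as 1 and FALSE as 0 in arithmetic, summing the logical vector gives the same count without building the subset first:

```r
people <- c(55, 64, 67, 70, 63, 72)
# Each TRUE counts as 1, each FALSE as 0
sum(people < 70)   # 4
```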

Practice

peopleNames <- c("Raj", "Bettina", "Nisha", "Zephyr")
peopleAges <- c(20, 30, 25, 24)
peopleHeights <- c(72, 68, 69, 66)

Write one-line commands to find:

  • the ages of everyone who is under 25
  • the heights of everyone who is under 25
  • the heights of everyone whose name comes after “Q” in the alphabet
  • the names of everyone who is between 60 and 69.5 inches tall
  • the heights of everyone other than Raj

NA-Caution

Effect of NA on Sub-Setting

heights
Scarecrow      Lion    Tinman   Dorothy      Toto       Boq 
       72        70        69        58        NA        45 
tall <- (heights > 65)
tall
Scarecrow      Lion    Tinman   Dorothy      Toto       Boq 
     TRUE      TRUE      TRUE     FALSE        NA     FALSE 

Toto’s height was missing.

  • R can’t say whether or not he was more than 65 inches tall.
  • Hence it assigns NA to the Toto-element of the tall vector.
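The NA matters when tall is used to sub-set: the result gets an extra NA element (printed with the name <NA>). One way to treat an unknown comparison as "not selected" is to combine tall with is.na(), as sketched here:

```r
heights <- c(Scarecrow = 72, Lion = 70, Tinman = 69,
             Dorothy = 58, Toto = NA, Boq = 45)
tall <- (heights > 65)
# The NA in tall produces an NA element in the result
heights[tall]
# NA & FALSE is FALSE, so Toto is dropped here
heights[tall & !is.na(tall)]   # Scarecrow 72, Lion 70, Tinman 69
```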

Which, Any, All

which()

Applied to a logical vector, the which() function returns the indices of the vector that have the value TRUE:

boolVec <- c(TRUE,TRUE,FALSE,TRUE)
which(boolVec)
[1] 1 2 4

Find the indices of heights where the heights are greater than 65:

which(heights > 65)
Scarecrow      Lion    Tinman 
        1         2         3 
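Notice that Toto does not appear, even though his comparison is NA: which() omits NA values, treating them like FALSE. That makes it a convenient way to sub-set without picking up a spurious NA element:

```r
heights <- c(Scarecrow = 72, Lion = 70, Tinman = 69,
             Dorothy = 58, Toto = NA, Boq = 45)
which(c(TRUE, NA, FALSE, TRUE))   # 1 4: the NA is skipped
heights[which(heights > 65)]      # Scarecrow 72, Lion 70, Tinman 69
```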

any()

Is anyone more than 71 inches tall?

heights
Scarecrow      Lion    Tinman   Dorothy      Toto       Boq 
       72        70        69        58        NA        45 
any(heights > 71)
[1] TRUE

Yes: the Scarecrow is more than 71 inches tall.

Does a Value Appear?

vec <- c("Dorothy", "Tin Man", "Scarecrow", "Glinda")
any(vec == "Tin Man")
[1] TRUE
any(vec == "Wizard")
[1] FALSE

The %in%-Operator

A shortcut for the previous constructions involving any():

"Tin Man" %in% c("Dorothy", "Tin Man", "Scarecrow", "Glinda")
[1] TRUE
"Wizard" %in% c("Dorothy", "Tin Man", "Scarecrow", "Glinda")
[1] FALSE
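%in% also works with more than one value on the left: it returns one TRUE or FALSE per left-hand element. (cast is an illustrative name for the vector being searched.)

```r
cast <- c("Dorothy", "Tin Man", "Scarecrow", "Glinda")
# One answer for each element on the left-hand side
c("Tin Man", "Wizard", "Glinda") %in% cast   # TRUE FALSE TRUE
```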

all()

Is everyone more than 71 inches tall?

all(heights > 71)
[1] FALSE

Careful about NAs! Is everyone more than 40 inches tall?

all(heights > 40)
[1] NA

Toto’s height is NA so R can’t say whether all the heights are bigger than 40.
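If we are willing to ignore the missing value, all() and any() both take an na.rm argument that drops NAs before deciding. any() has the mirror-image issue: with no TRUE values and one NA, it cannot say FALSE either.

```r
heights <- c(Scarecrow = 72, Lion = 70, Tinman = 69,
             Dorothy = 58, Toto = NA, Boq = 45)
all(heights > 40)                 # NA
all(heights > 40, na.rm = TRUE)   # TRUE: Toto's NA is dropped first
any(heights > 100)                # NA: no TRUE, and Toto is unknown
```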