Patterns and Sub-Setting

(Sections 2.2-2.5)

Making Patterned Vectors

Sequencing

Consider the seq() function:
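
As a minimal sketch (the endpoints 5 and 15 are illustrative values, not taken from the original slides), seq() takes a starting value from, a limit to, and a step by:

```r
# Illustrative call: count from 5 up to 15 in steps of 1
s1 <- seq(from = 5, to = 15, by = 1)
s1
# [1]  5  6  7  8  9 10 11 12 13 14 15
```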

The default value of the parameter by is 1, so we could get the same thing with:
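
For instance, with illustrative endpoints 5 and 15 (assumed for this sketch), leaving out by gives the same values as writing by = 1 explicitly:

```r
# Illustrative endpoints; by is omitted in the first call
s_default  <- seq(from = 5, to = 15)
s_explicit <- seq(from = 5, to = 15, by = 1)
all(s_default == s_explicit)
# [1] TRUE
```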

More Examples

Going to to, But Not Past It
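
A small sketch with illustrative values: when the steps do not land exactly on to, seq() stops at the last value that does not go past it.

```r
# 1, 5, 9 ... the next step (13) would pass to = 10, so seq() stops at 9
stepped <- seq(from = 1, to = 10, by = 4)
stepped
# [1] 1 5 9
```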

Negative Steps are OK
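
A small sketch with illustrative values: to count downward, start above to and give by a negative value.

```r
# Count down from 10 toward 1 in steps of 3
down <- seq(from = 10, to = 1, by = -3)
down
# [1] 10  7  4  1
```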

Colon Operator:

A shortcut for sequencing, when by is 1 or -1.

Going up …
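
An illustrative example (endpoints chosen for this sketch):

```r
# from:to with from < to steps up by 1
up <- 3:8
up
# [1] 3 4 5 6 7 8
```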

Going down …
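
An illustrative example (endpoints chosen for this sketch):

```r
# from:to with from > to steps down by 1
countdown <- 8:3
countdown
# [1] 8 7 6 5 4 3
```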

Repeating Vectors

We can apply rep() to a vector of length greater than 1:

vec <- c(7, 3, 4)
rep(vec, times = 3)
[1] 7 3 4 7 3 4 7 3 4

You Can rep Character Vectors
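
A minimal sketch, using a made-up character vector named friends (not from the original slides):

```r
# rep() works the same way on character vectors as on numeric ones
friends <- c("Dorothy", "Toto")
repeated <- rep(friends, times = 2)
repeated
# [1] "Dorothy" "Toto"    "Dorothy" "Toto"
```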

The each Parameter for rep()

vec <- c(7, 3, 4)
rep(vec, each = 2, times = 3)
 [1] 7 7 3 3 4 4 7 7 3 3 4 4 7 7 3 3 4 4

Varying times

vec <- c("x", "y", "z")
rep(vec, times = 1:3)
[1] "x" "y" "y" "z" "z" "z"

Complex Patterns

In order to make:

  • fifty 10’s followed by
  • fifty 30’s followed by
  • fifty 50’s followed by …
  • … fifty 150’s

Write:
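
One possible way to do it, offered as a sketch rather than as the slide's original answer, combines seq() with the each parameter of rep(). The variable name pattern is chosen here for illustration.

```r
# seq() produces 10, 30, 50, ..., 150 (eight values);
# each = 50 repeats every value fifty times before moving on
pattern <- rep(seq(10, 150, by = 20), each = 50)
length(pattern)
# [1] 400
```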

Practice

Write one-line commands to produce each of the following:

  • the lowercase letters of the alphabet, repeated three times
  • one A, two B’s, three C’s, …, twenty-six Z’s.
  • the real numbers 0, 0.01, 0.02, …, 1.98, 1.99, 2.00

Sub-Setting

Definition

Sub-setting

The operation of selecting one or more elements from a vector.

A Sample Vector

Recall heights:

heights <- c(72, 70, 69, 58, NA, 45)
names(heights) <- c("Scarecrow", "Tinman", "Lion", "Dorothy", "Toto", "Boq")
heights
Scarecrow    Tinman      Lion   Dorothy      Toto       Boq 
       72        70        69        58        NA        45 

The Bracket Operator

Find subsets of vectors using brackets:
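
A minimal sketch (heights is redefined here so the example runs on its own; the index 4 is an illustrative choice):

```r
heights <- c(Scarecrow = 72, Tinman = 70, Lion = 69,
             Dorothy = 58, Toto = NA, Boq = 45)
# The number inside the brackets is the position of the element we want
heights[4]
# Dorothy 
#      58 
```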

Get Any Number of Them

If we want two or more elements, then we specify their indices in a vector.
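
A minimal sketch (heights is redefined so the example runs on its own; the vector name desired is chosen for illustration):

```r
heights <- c(Scarecrow = 72, Tinman = 70, Lion = 69,
             Dorothy = 58, Toto = NA, Boq = 45)
# Store the wanted positions in a vector, then put that vector in the brackets
desired <- c(1, 3, 6)
heights[desired]
# Scarecrow      Lion       Boq 
#        72        69        45 
```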

Also OK to be direct:
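
A minimal sketch (heights is redefined so the example runs on its own; the positions 2 and 5 are illustrative):

```r
heights <- c(Scarecrow = 72, Tinman = 70, Lion = 69,
             Dorothy = 58, Toto = NA, Boq = 45)
# Build the index vector right inside the brackets
heights[c(2, 5)]
# Tinman   Toto 
#     70     NA 
```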

Negative Numbers are Significant
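
A minimal sketch (heights is redefined so the example runs on its own): a negative index means "everything except this position".

```r
heights <- c(Scarecrow = 72, Tinman = 70, Lion = 69,
             Dorothy = 58, Toto = NA, Boq = 45)
# Drop the second element (Tinman) and keep the rest
heights[-2]
# Scarecrow      Lion   Dorothy      Toto       Boq 
#        72        69        58        NA        45 
```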

Outside of the Range …
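
A minimal sketch (heights is redefined so the example runs on its own): asking for a position beyond the end of the vector gives NA rather than an error.

```r
heights <- c(Scarecrow = 72, Tinman = 70, Lion = 69,
             Dorothy = 58, Toto = NA, Boq = 45)
# heights has only 6 elements, so position 7 does not exist
heights[7]
# <NA> 
#   NA 
```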

Patterned Vectors Are Useful!
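
A minimal sketch (heights is redefined so the example runs on its own; the pattern is illustrative): seq() and the colon operator can generate the indices for us.

```r
heights <- c(Scarecrow = 72, Tinman = 70, Lion = 69,
             Dorothy = 58, Toto = NA, Boq = 45)
# Every other element: positions 1, 3, 5
heights[seq(1, 5, by = 2)]
# Scarecrow      Lion      Toto 
#        72        69        NA 
```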

Names for Sub-Setting
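
A minimal sketch (heights is redefined so the example runs on its own): when a vector has names, a character vector of names can go in the brackets instead of positions.

```r
heights <- c(Scarecrow = 72, Tinman = 70, Lion = 69,
             Dorothy = 58, Toto = NA, Boq = 45)
# Select by name rather than by position
heights[c("Lion", "Boq")]
# Lion  Boq 
#   69   45 
```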

Sub-setting to Modify a Vector

heights["Dorothy"] <- 60
heights
Scarecrow    Tinman      Lion   Dorothy      Toto       Boq 
       72        70        69        60        NA        45 

We can replace more than one element:

heights[c("Scarecrow", "Boq")] <- c(73, 46)
heights
Scarecrow    Tinman      Lion   Dorothy      Toto       Boq 
       73        70        69        60        NA        46 

The subset of indices may be as complex as you like:
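
A minimal sketch, using a copy named updated and made-up replacement values so that heights itself is left alone:

```r
heights <- c(Scarecrow = 73, Tinman = 70, Lion = 69,
             Dorothy = 60, Toto = NA, Boq = 46)
updated <- heights
# Replace the elements at positions 2, 4 and 6 in one assignment
updated[seq(2, 6, by = 2)] <- c(71, 61, 47)
updated
# Scarecrow    Tinman      Lion   Dorothy      Toto       Boq 
#        73        71        69        61        NA        47 
```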

Sub-setting to Rearrange
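
A minimal sketch (heights is redefined so the example runs on its own; reversing is just one illustrative rearrangement): the order of the indices determines the order of the result.

```r
heights <- c(Scarecrow = 73, Tinman = 70, Lion = 69,
             Dorothy = 60, Toto = NA, Boq = 46)
# Indices 6, 5, ..., 1 give the elements in reverse order
reversed <- heights[6:1]
reversed
#       Boq      Toto   Dorothy      Lion    Tinman Scarecrow 
#        46        NA        60        69        70        73 
```

Note that heights itself is unchanged unless the result is assigned back to it.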

More on Logical Vectors

Boolean Expressions

Boolean expressions are expressions that evaluate to a logical vector:
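
A minimal sketch (the variable x and its value are illustrative):

```r
x <- 10
# A comparison evaluates to TRUE or FALSE
x > 4
# [1] TRUE
is.logical(x > 4)
# [1] TRUE
```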

Element-Wise Evaluation
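
A minimal sketch with illustrative values: when both sides are vectors of the same length, the comparison is made position by position.

```r
# Compares 1 < 3, then 5 < 4, then 9 < 12
elementwise <- c(1, 5, 9) < c(3, 4, 12)
elementwise
# [1]  TRUE FALSE  TRUE
```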

Some Boolean Operators

Operation What It Means
< less than
> greater than
<= less than or equal to
>= greater than or equal to
== equal to
& and
| or
! not

Inequalities and Character Vectors

Why?

  • D comes before t in the alphabet;
  • lowercase t comes before uppercase T, according to R;
  • characters for numbers come before letter-characters, according to R.
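
The three rules above can be tried with comparisons like the following. The strings are illustrative choices, not necessarily those of the original slide, and no outputs are shown because R compares strings using the collating order of the current locale, so results (especially for lowercase versus uppercase) can differ between systems.

```r
# Results depend on the locale's collating order
"Dorothy" < "tinman"   # D versus t
"toto" < "Toto"        # lowercase t versus uppercase T
"2" < "Dorothy"        # digit character versus letter
```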

Equality Operator

The equality (==) operator indicates whether the expressions being compared evaluate to the same value.

Made with two equal-signs, not one!

It’s not about strict identity:

a <- c(Dorothy = 1, Toto = 2)  # a named vector
b <- c(Glinda = 1, Tinman = 2)  # different vector (names different)
# but they count as "equal":
a == b
Dorothy    Toto 
   TRUE    TRUE 

And, Or, Not

a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)
a & b  # "a and b"
[1]  TRUE FALSE FALSE FALSE
a | b  # "a or b"
[1]  TRUE  TRUE  TRUE FALSE
!c(TRUE, FALSE)  # "not"
[1] FALSE  TRUE

Recycling

Why Does This Work?

c(2, 3, 6, 7) > 5
[1] FALSE FALSE  TRUE  TRUE

After all:

  • c(2, 3, 6, 7) has length 4
  • 5 only has length 1

Answer: the 5 was recycled.

Definition

Recycling

An automatic process by which R, when given two vectors, repeats elements of the shorter vector until it is as long as the longer vector.

Recycling enables the two resulting vectors to be combined element-wise in operations.
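
A minimal sketch with illustrative values, where the shorter vector has length 2 rather than 1:

```r
# c(10, 20) is recycled to c(10, 20, 10, 20) before the addition
recycled <- c(1, 2, 3, 4) + c(10, 20)
recycled
# [1] 11 22 13 24
```

If the length of the longer vector is not a multiple of the length of the shorter one, R still recycles but issues a warning.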

Sub-Setting with Logical Vectors

Desired Heights

Recall our heights vector:

heights <- c(Scarecrow = 72, Lion = 70, Tinman = 69, 
             Dorothy = 58, Toto = NA, Boq = 45)
heights
Scarecrow      Lion    Tinman   Dorothy      Toto       Boq 
       72        70        69        58        NA        45 

We want the heights of Scarecrow, Tinman and Dorothy. Here’s one way:
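
A minimal sketch of that approach (heights is redefined so the example runs on its own; the name desired is chosen for illustration): supply a logical vector with TRUE in the positions to keep.

```r
heights <- c(Scarecrow = 72, Lion = 70, Tinman = 69,
             Dorothy = 58, Toto = NA, Boq = 45)
# TRUE keeps the element at that position, FALSE drops it
desired <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
heights[desired]
# Scarecrow    Tinman   Dorothy 
#        72        69        58 
```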

Another Example

Select those people who are at least 70 inches tall.

# heights of some people:
people <- c(55, 64, 67, 70, 63, 72)
tall <- (people >= 70)
tall
[1] FALSE FALSE FALSE  TRUE FALSE  TRUE
people[tall]
[1] 70 72

All at Once

We think: “Select from people, where people is at least 70.”
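
A minimal sketch (people is redefined so the example runs on its own): the Boolean expression goes directly inside the brackets, with no intermediate tall vector.

```r
people <- c(55, 64, 67, 70, 63, 72)
# Build the logical vector and subset in a single step
people[people >= 70]
# [1] 70 72
```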

Sub-setting with a Different Vector

Ages and heights of some people:

age <- c(23, 21, 22, 25, 63)
height <- c(68, 67, 71, 70, 69)

Get the ages of people who are under 70 inches tall.

age[height < 70]
[1] 23 21 63

Logically-Complex Sub-Setting

Get the heights of people who are less than 60 years old and who also like Toto.

age <- c(23, 21, 22, 25, 63)
height <- c(68, 67, 71, 70, 69)
likesToto <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
height[age < 60 & likesToto]
[1] 68 67

Counting

How many people are less than 70 inches tall?

length(people[people < 70])
[1] 4

Practice

peopleNames <- c("Raj", "Bettina", "Nisha", "Zephyr")
peopleAges <- c(20, 30, 25, 24)
peopleHeights <- c(72, 68, 69, 66)

Write one-line commands to find:

  • the ages of everyone who is under 25
  • the heights of everyone who is under 25
  • the heights of everyone whose name comes after “Q” in the alphabet
  • the names of everyone who is between 60 and 69.5 inches tall
  • the heights of everyone other than Raj

NA-Caution

Effect of NA on Sub-Setting

heights
Scarecrow      Lion    Tinman   Dorothy      Toto       Boq 
       72        70        69        58        NA        45 
tall <- (heights > 65)
tall
Scarecrow      Lion    Tinman   Dorothy      Toto       Boq 
     TRUE      TRUE      TRUE     FALSE        NA     FALSE 

Toto’s height was missing.

  • R can’t say whether or not he was more than 65 inches tall.
  • Hence it assigns NA to the Toto-element of the tall vector.
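
A minimal sketch of the consequence (heights and tall are redefined so the example runs on its own): an NA in the logical vector yields an NA element in the subset. Wrapping the logical vector in which() is one way to drop it, shown here as an option rather than as the slides' own remedy.

```r
heights <- c(Scarecrow = 72, Lion = 70, Tinman = 69,
             Dorothy = 58, Toto = NA, Boq = 45)
tall <- (heights > 65)
# The NA in tall produces an extra NA element, labelled <NA>
heights[tall]
# Scarecrow      Lion    Tinman      <NA> 
#        72        70        69        NA 
# which() keeps only the positions that are TRUE, so the NA is dropped
heights[which(tall)]
# Scarecrow      Lion    Tinman 
#        72        70        69 
```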

Which, Any, All

which()

Applied to a logical vector, the which() function returns the indices of the vector that have the value TRUE:

Find the indices of heights where the heights are more than 65:

which(heights > 65)
Scarecrow      Lion    Tinman 
        1         2         3 

any()

Is anyone more than 71 inches tall?

heights
Scarecrow      Lion    Tinman   Dorothy      Toto       Boq 
       72        70        69        58        NA        45 
any(heights > 71)
[1] TRUE

Yes: the Scarecrow is more than 71 inches tall.

Does a Value Appear?

vec <- c("Dorothy", "Tin Man", "Scarecrow", "Glinda")
any(vec == "Tin Man")
[1] TRUE
any(vec == "Wizard")
[1] FALSE

The %in%-Operator

A shortcut for the previous constructions involving any():
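
A minimal sketch (vec is redefined so the example runs on its own):

```r
vec <- c("Dorothy", "Tin Man", "Scarecrow", "Glinda")
# value %in% vector asks: does the value appear anywhere in the vector?
"Tin Man" %in% vec
# [1] TRUE
"Wizard" %in% vec
# [1] FALSE
```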

all()

Is everyone more than 71 inches tall?

all(heights > 71)
[1] FALSE

Careful about NAs! Is everyone more than 40 inches tall?

all(heights > 40)
[1] NA

Toto’s height is NA so R can’t say whether all the heights are bigger than 40.
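
If the question is meant only for the known heights, one option (a sketch, not part of the original slides) is the na.rm parameter of all(), which removes the NA values before checking:

```r
heights <- c(Scarecrow = 72, Lion = 70, Tinman = 69,
             Dorothy = 58, Toto = NA, Boq = 45)
# Ignore missing heights, then ask whether all the others exceed 40
all(heights > 40, na.rm = TRUE)
# [1] TRUE
```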