[1] "character"
We have met strings before.
String
A sequence of characters.
We use quotes to mark the beginning and end of a string.
Like all data types in R, strings do not exist on their own. They are always elements of a vector.
… can be used as delimiters. Your choice won’t affect the value of the string:
… in a vector:
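As a small illustration (a sketch with made-up variable names, not the original example), the following checks that the choice of delimiter does not change the value, and that even a lone string lives inside a vector:

```r
# Variable names here are illustrative, not from the text.
single <- 'hello'
double <- "hello"

# Both delimiters produce exactly the same value.
same_value <- identical(single, double)
print(same_value)

# Even a lone string is a character vector of length one.
print(is.vector(double))
print(length(double))

# Strings sit side by side as elements of a longer vector.
greetings <- c("hello", 'goodbye')
print(length(greetings))
```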
Best Practice in R: Use double-quotes whenever possible.
Naively construed, characters are the things you can type on your computer keyboard:
You can include quotes in a string. But you have to be careful. For example, how would you get the following in your Console?
## "Welcome", she said, "the coffee's on me!"
This doesn’t work:
You have to escape the special meaning of the quote marks if you want them to appear inside a string:
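A minimal sketch of the escaping, using base R only. The variable name `quote_text` is invented for the example:

```r
# Escape each inner double quote with a backslash.
quote_text <- "\"Welcome\", she said, \"the coffee's on me!\""

# cat() writes the string as a reader would see it, without the escapes.
cat(quote_text, "\n", sep = "")

# print() shows the escaped form, which is how R would read it back in.
print(quote_text)
```

The apostrophe in coffee's needs no escape here because the string is delimited by double quotes.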
The backslash character \ is a special character: it is used to write escape sequences, including the ones that stand for control characters.
Control Character
A member of a character set that does not represent a written symbol.
The function of \ is to escape the regular meaning of the character immediately following it.
What if you need a literal \ ?
Suppose you need to write:
The Windows path is: "C:\\Inetpub\\vhosts\\example.com".
Then you must escape the backslash with another backslash!
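A short sketch of the idea, assuming the goal is a path that shows single backslashes when it is written out. The variable name `win_path` is made up:

```r
# Assumption: the path should appear with single backslashes when printed.
# Each literal backslash is written as two backslashes in the source code.
win_path <- "C:\\Inetpub\\vhosts\\example.com"

# cat() shows the characters the string really contains.
cat("The Windows path is:", win_path, "\n")

# Each escaped pair counts as just one character.
print(nchar(win_path))
```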
Other control characters can be inserted into a string with the \. For example, we have already met newline:
The backslash escapes the ordinary meaning of n, making it stand for the newline control character instead.
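To see the newline at work, here is a small base-R sketch (the string itself is invented for the example):

```r
two_lines <- "Hello\nWorld"

# cat() interprets the newline, so the text appears on two lines.
cat(two_lines, "\n", sep = "")

# print() shows the escape sequence rather than acting on it.
print(two_lines)

# The two-symbol sequence \n counts as a single character.
print(nchar(two_lines))
```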
Character | Meaning |
---|---|
\n | newline |
\r | carriage return |
\t | tab |
\b | backspace |
\a | alert (bell) |
\f | form feed |
\v | vertical tab |
Mostly we will stick with:
\
\t
\n
The \ can generate non-control characters, too. For example, it can help form Unicode characters.
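As an illustration, the sketch below uses the `\u` escape followed by a hexadecimal code point. The choice of character and the variable names are just for the example:

```r
# "\u00e9" is the escape for the letter e with an acute accent (U+00E9).
e_acute <- "\u00e9"
cafe <- paste0("caf", e_acute)

# The six-symbol escape produces one single character.
print(nchar(e_acute))

# utf8ToInt() recovers the character's Unicode code point as an integer.
print(utf8ToInt(e_acute))

cat(cafe, "\n")
```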
Unicode
A computing-industry standard for the consistent encoding of text in most of the world’s written languages.
stringr comes along with the tidyverse.
We’ll use it a lot for basic manipulation of strings.
How many characters are in the string "hello"?
Note that the following gives the wrong answer:
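The original examples are not shown here, so the following is a sketch under an assumption: that the right tool is stringr's `str_length()` and that the wrong answer comes from base R's `length()`.

```r
library(stringr)

word <- "hello"

# str_length() counts the characters in each string.
n_chars <- str_length(word)
print(n_chars)

# length() counts the elements of the vector, and there is only one.
n_elements <- length(word)
print(n_elements)
```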
Many basic string operations are vector-in, vector-out:
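For instance, a sketch with made-up data: give `str_length()` a vector of three strings and it returns a vector of three counts, one per element.

```r
library(stringr)

# Illustrative data, not from the text.
fruits <- c("fig", "banana", "kiwi")

# One result per input element, in the same order.
lengths_out <- str_length(fruits)
print(lengths_out)
```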
You can assign a new value to part of a string:
Let’s see if that worked:
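A sketch of extraction and assignment, assuming the function in question is stringr's `str_sub()`. The example word is invented:

```r
library(stringr)

greeting <- "hello"

# Extract characters 1 through 2.
first_two <- str_sub(greeting, 1, 2)
print(first_two)

# Assign a new value to the first character only.
str_sub(greeting, 1, 1) <- "j"

# Check that the original variable was modified.
print(greeting)
```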
Many basic string operations are vector-in, vector-out:
Watch closely:
Use str_trim():
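A minimal sketch of `str_trim()` removing leading and trailing whitespace. The padded string is made up, and the `side` argument is shown as one option:

```r
library(stringr)

padded <- "   too much space   "

# By default, whitespace is removed from both ends.
trimmed <- str_trim(padded)
print(str_length(padded))
print(str_length(trimmed))

# side = "left" removes only the leading whitespace.
left_only <- str_trim(padded, side = "left")
print(left_only)
```

Whitespace inside the string is left alone.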
You can make all of the letters in a string lowercase:
You can make them all uppercase:
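Both conversions can be sketched with stringr's `str_to_lower()` and `str_to_upper()`, assuming these are the functions intended. The example string is invented:

```r
library(stringr)

mixed <- "Hello World"

# Convert every letter to lowercase.
lower <- str_to_lower(mixed)
print(lower)

# Convert every letter to uppercase.
upper <- str_to_upper(mixed)
print(upper)
```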
Consider the following character vector that records several dates:
How can you get access to each element (month, day, year)?
str_split() will do the job for you:
[[1]]
[1] "3" "14" "1963"
[[2]]
[1] "04" "01" "1965"
[[3]]
[1] "12" "2" "1983"
This is a list!
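The vector of dates itself is not shown here, so the sketch below assumes the parts are separated by hyphens. Only the pieces ("3", "14", "1963", and so on) come from the output above. It shows one way to work with the resulting list:

```r
library(stringr)

# Assumption: hyphen separators. Only the split pieces appear in the text.
dates <- c("3-14-1963", "04-01-1965", "12-2-1983")

pieces <- str_split(dates, pattern = "-")
print(class(pieces))

# Double brackets pull one element (a character vector) out of the list.
first_date <- pieces[[1]]
print(first_date)

# Collect the third piece (the year) from every element of the list.
years <- sapply(pieces, function(p) p[3])
print(years)
```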
Let’s split a string into its words:
Splitting on the space would not have worked if some of the words had been separated by more than one space:
"you have won the  lottery" %>% # two spaces between 'the' and 'lottery'
  str_split(pattern = " ") %>%
  unlist()
[1] "you" "have" "won" "the" "" "lottery"
We’ll address this issue in the next Chapter.
In order to split a string into its constituent characters, split on the empty string:
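A small sketch of splitting on the empty string (the word is chosen only for illustration):

```r
library(stringr)

# An empty pattern splits between every pair of characters.
letters_in_word <- unlist(str_split("hello", pattern = ""))
print(letters_in_word)
print(length(letters_in_word))
```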
You could use this idea to, say, count the number of occurrences of “a” in a word:
… stringr is way ahead of you, there:
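The sketch below compares the split-and-count approach with stringr's `str_count()`, which is presumably the function alluded to. The word "banana" is an invented example:

```r
library(stringr)

word <- "banana"

# Manual approach: split into characters, then count the matches.
chars <- unlist(str_split(word, pattern = ""))
manual_count <- sum(chars == "a")
print(manual_count)

# Direct approach: str_count() counts matches of a pattern in each string.
direct_count <- str_count(word, pattern = "a")
print(direct_count)
```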
The stringr counterpart to paste() is str_c():
The default is to separate the arguments with the empty string. But you can separate by something else:
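A brief sketch of both behaviours, with made-up strings:

```r
library(stringr)

# Default: the arguments are joined with nothing in between.
joined <- str_c("data", "base")
print(joined)

# sep places a separator between the arguments.
with_sep <- str_c("2024", "03", "14", sep = "-")
print(with_sep)

# Like other stringr functions, str_c() is vectorized over its arguments.
pairs <- str_c(c("a", "b"), c("x", "y"), sep = "_")
print(pairs)
```

Unlike `paste()`, whose default separator is a single space, `str_c()` adds nothing unless asked.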
This doesn’t work:
This does:
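The two examples that belonged here are not shown. One common stumbling block that fits the contrast, offered only as an assumption, is the difference between `sep` and `collapse` when the pieces are already elements of a single vector:

```r
library(stringr)

# Illustrative data, not from the text.
words <- c("you", "have", "won")

# sep only separates different arguments, so a single vector
# comes back unchanged as three separate strings.
not_joined <- str_c(words, sep = " ")
print(length(not_joined))

# collapse joins the elements of the result into one string.
joined <- str_c(words, collapse = " ")
print(joined)
```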