(Section 12.4)
Make sure these are attached:
[1] "hello" "there" "Mary" "Poppins"
The pattern parameter in str_*() functions takes a string that R converts to a regular expression.
Try to split on the whitespace:
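Before R can convert pattern to a regex, it has to understand pattern for what it is: a string!
In an R string, a backslash starts an escape sequence such as \n or \t. But \s is not a control character, so R does not recognize it as a valid escape!
So to get a literal \s, we must escape the \ itself: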
Moral: Always escape again when the regex you intend uses a backslash for escaping!
Regular Expression | Entered as String |
---|---|
\s+ | "\\s+" |
find\.dot | "find\\.dot" |
^\w*\d{1,3}$ | "^\\w*\\d{1,3}$" |
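A minimal sketch of the escaping rule, assuming the stringr package is installed. The names greeting, pattern and words are made up for illustration and are not from the slides:

```r
library(stringr)

# Illustrative input (hypothetical): words separated by mixed whitespace
greeting <- "hello there  Mary\tPoppins"

# Typing "\s+" is rejected by R's parser, because \s is not a valid
# string escape. Doubling the backslash gives the three-character
# string \s+ that the regex engine needs.
pattern <- "\\s+"
nchar(pattern)       # 3 characters: backslash, s, plus
writeLines(pattern)  # shows the string as the regex engine sees it

# str_split() returns a list; unlist() flattens it to a character vector
words <- unlist(str_split(greeting, pattern))
words
```

writeLines() is a handy check whenever a pattern misbehaves: it shows the regex after R has processed the string escapes.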
Suppose that we have a vector of dates:
[1] "3 - 14 - 1963" "4/13/ 2005" "12-1-1997" "11 / 11 / 1918"
We would prefer them to be in just ONE format!
str_replace_all()
[1] "3/14/1963" "4/13/2005" "12/1/1997" "11/11/1918"
Parameters:
string (not written out here, due to the piping) is the text in which the substitution occurs;
pattern is the regex for the type of sub-string we want to replace;
replacement is what we want to replace matches of the pattern with.
str_replace()
[1] "3/14 - 1963" "4/13/ 2005" "12/1-1997" "11/11 / 1918"
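One way the two results above might be produced, as a sketch rather than the slides' own code. The separator regex is an assumption; it is written without piping so that every argument is visible:

```r
library(stringr)

dates <- c("3 - 14 - 1963", "4/13/ 2005", "12-1-1997", "11 / 11 / 1918")

# Assumed separator: one or more whitespace characters, slashes or hyphens
separator <- "[\\s/\\-]+"

# str_replace_all() replaces every match in each string
all_fixed <- str_replace_all(dates, pattern = separator, replacement = "/")

# str_replace() replaces only the first match in each string
first_fixed <- str_replace(dates, pattern = separator, replacement = "/")

all_fixed
first_fixed
```

The only difference between the two calls is how many matches per string get replaced.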
Task: write a function called doubleVowels() that replaces every vowel with its double:
Example of use:
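A possible solution, offered as a sketch. It assumes that "vowel" means the letters a, e, i, o, u in either case; the test sentence is made up:

```r
library(stringr)

# Replace every vowel with two copies of itself.
# The parentheses capture the vowel; \\1 in the replacement refers back to it.
doubleVowels <- function(str) {
  str_replace_all(str, pattern = "([aeiouAEIOU])", replacement = "\\1\\1")
}

doubled <- doubleVowels("Hello there")
doubled
```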
The replacement argument can include regex features that refer to elements of the pattern!
Capitalize every vowel:
Put asterisks around every word-repetition:
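Both tasks can be sketched with back-references. The inputs are made up. For the capitalization task this sketch falls back on base R's gsub() with perl = TRUE, because the \U case-conversion escape is a Perl-style feature; that choice is an assumption, not necessarily what the slides used:

```r
library(stringr)

# Capitalize every vowel: \\U upper-cases the text of capture group 1.
# This relies on base R's Perl-compatible replacement syntax.
capitalized <- gsub("([aeiou])", "\\U\\1", "hello there", perl = TRUE)

# Put asterisks around every word-repetition: \\1 inside the pattern
# requires the same word to appear twice, separated by one space.
starred <- str_replace_all(
  "I am very very tired of the the noise",
  pattern = "\\b(\\w+) \\1\\b",
  replacement = "*\\1 \\1*"
)

capitalized
starred
```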
Some sentences:
Select all and only the strings that contain a word beginning with capital T.
My name is Tom, Sir
And I'm Tiny Tulip!
Whereas my name is Lester.
If you just need to know whether or not there is a match:
My name is Tom, Sir
And I'm Tiny Tulip!
Whereas my name is Lester.
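A sketch of both tasks on the three sentences shown, assuming they are stored in a vector called sentences (the name is a guess). str_subset() keeps the matching strings, while str_detect() only reports whether each string matches:

```r
library(stringr)

sentences <- c(
  "My name is Tom, Sir",
  "And I'm Tiny Tulip!",
  "Whereas my name is Lester."
)

# A word boundary, a capital T, then the rest of the word
t_word <- "\\bT\\w*"

with_t <- str_subset(sentences, t_word)  # the matching strings themselves
has_t <- str_detect(sentences, t_word)   # one TRUE/FALSE per string

with_t
has_t
```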
Extract pairs of words beginning with the same letter in sentences2 defined below:
The big bad wolf is walking warily to the cottage.
He huffs and he puffs peevishly.
He wears gnarly gargantuan bell bottoms!
str_match():
The big bad wolf is walking warily to the cottage.
He huffs and he puffs peevishly.
He wears gnarly gargantuan bell bottoms!
\\1 capture-group.
The big bad wolf is walking warily to the cottage.
He huffs and he puffs peevishly.
He wears gnarly gargantuan bell bottoms!
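A sketch of the extraction, assuming a case-sensitive comparison of first letters (so "He huffs" does not count as a pair). The pattern is one possible choice, not necessarily the one used in the slides:

```r
library(stringr)

sentences2 <- c(
  "The big bad wolf is walking warily to the cottage.",
  "He huffs and he puffs peevishly.",
  "He wears gnarly gargantuan bell bottoms!"
)

# Capture the first letter of a word, then require the next word
# to begin with that same letter via the back-reference \\1.
pair_pattern <- "\\b(\\w)\\w*\\s+\\1\\w*\\b"

# All pairs in each sentence (a list with one element per sentence)
pairs <- str_extract_all(sentences2, pair_pattern)

# str_match() returns only the first match per sentence, plus one
# column for each capture group (here: the shared first letter).
first_pairs <- str_match(sentences2, pair_pattern)

pairs
first_pairs
```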
Recall our motivating example (from package bcscr):
name phone
1 Philson, Mickey 580-789-5775
2 Shiner, Marget (206)948-8169
3 Sackrider, Dionne (432)297-3683
4 Kukowski, Isobel (240)619-8432
5 Isenhour, Garth 6417823425
6 Kapinos, Enid 6018723027
7 Blaker, Theodore (510)812-9092
8 Crossett, Rosaura 6063292954
9 Northern, Willy 551-427-1399
10 Goettl, Latonia (303)242-6982
11 Campagna, Ryann 727-692-1835
12 Wash, Mira 509-216-3598
13 Flansburg, Louann 3049163908
14 Winborne, Angella (678)249-9107
15 Arledge, Marcia (430)625-4239
16 Cookson, Eladia 507-588-4874
17 Tisher, Dee 470-439-4114
18 Difiore, Tyrell 4055294829
19 Colas, Tristan 7857923661
20 Sprenger, Ava (217)343-9603
21 Getman, Jesenia (646)812-6606
22 Starr, Ashley (281)514-6984
23 Raney, Irmgard 573-586-5935
24 Bryson, Dionna (325)627-2149
25 Welk, Bruno 2894782665
26 Dias, Petra 2694361985
27 Alejandro, Nana 254-563-7229
28 Sanson, Jason (469)453-3600
29 Ellerbe, Gracia 320-749-5706
30 Parris, Julius 630-537-5563
31 Tomasello, Rachele (240)696-2942
32 Tackitt, Mireille 3312028129
33 Taliaferro, Kaycee 7622728177
34 Imperato, Natalya 6572415716
35 Letcher, Basilia (401)437-2309
36 Gallaher, Deena 269-521-6040
37 Pierri, Viola 6572108846
38 Benefiel, Chante 4257611776
39 Phan, Kellye 479-325-3593
40 Cosenza, Saul 2165746335
41 Neihoff, Velvet 337-314-5395
42 Arboleda, Lynsey (306)409-9494
43 Metcalfe, Mervin (319)219-2300
44 Hammes, Stefani 630-629-4630
45 Nordahl, Yahaira (610)390-8353
46 Nader, Marceline 660-299-3416
47 Lasorsa, Vicente 7135491648
48 Bessette, Esther 4257614047
49 Hinchman, Marisela 8479223654
50 Lippincott, Lucia 631-512-5400
tidyr::extract() Does the Job!
last first area office line
1 Philson Mickey 580 789 5775
2 Shiner Marget 206 948 8169
3 Sackrider Dionne 432 297 3683
4 Kukowski Isobel 240 619 8432
5 Isenhour Garth 641 782 3425
6 Kapinos Enid 601 872 3027
7 Blaker Theodore 510 812 9092
8 Crossett Rosaura 606 329 2954
9 Northern Willy 551 427 1399
10 Goettl Latonia 303 242 6982
11 Campagna Ryann 727 692 1835
12 Wash Mira 509 216 3598
13 Flansburg Louann 304 916 3908
14 Winborne Angella 678 249 9107
15 Arledge Marcia 430 625 4239
16 Cookson Eladia 507 588 4874
17 Tisher Dee 470 439 4114
18 Difiore Tyrell 405 529 4829
19 Colas Tristan 785 792 3661
20 Sprenger Ava 217 343 9603
21 Getman Jesenia 646 812 6606
22 Starr Ashley 281 514 6984
23 Raney Irmgard 573 586 5935
24 Bryson Dionna 325 627 2149
25 Welk Bruno 289 478 2665
26 Dias Petra 269 436 1985
27 Alejandro Nana 254 563 7229
28 Sanson Jason 469 453 3600
29 Ellerbe Gracia 320 749 5706
30 Parris Julius 630 537 5563
31 Tomasello Rachele 240 696 2942
32 Tackitt Mireille 331 202 8129
33 Taliaferro Kaycee 762 272 8177
34 Imperato Natalya 657 241 5716
35 Letcher Basilia 401 437 2309
36 Gallaher Deena 269 521 6040
37 Pierri Viola 657 210 8846
38 Benefiel Chante 425 761 1776
39 Phan Kellye 479 325 3593
40 Cosenza Saul 216 574 6335
41 Neihoff Velvet 337 314 5395
42 Arboleda Lynsey 306 409 9494
43 Metcalfe Mervin 319 219 2300
44 Hammes Stefani 630 629 4630
45 Nordahl Yahaira 610 390 8353
46 Nader Marceline 660 299 3416
47 Lasorsa Vicente 713 549 1648
48 Bessette Esther 425 761 4047
49 Hinchman Marisela 847 922 3654
50 Lippincott Lucia 631 512 5400
name last first phone area office line
1 Philson, Mickey Philson Mickey 580-789-5775 580 789 5775
2 Shiner, Marget Shiner Marget (206)948-8169 206 948 8169
3 Sackrider, Dionne Sackrider Dionne (432)297-3683 432 297 3683
4 Kukowski, Isobel Kukowski Isobel (240)619-8432 240 619 8432
5 Isenhour, Garth Isenhour Garth 6417823425 641 782 3425
6 Kapinos, Enid Kapinos Enid 6018723027 601 872 3027
7 Blaker, Theodore Blaker Theodore (510)812-9092 510 812 9092
8 Crossett, Rosaura Crossett Rosaura 6063292954 606 329 2954
9 Northern, Willy Northern Willy 551-427-1399 551 427 1399
10 Goettl, Latonia Goettl Latonia (303)242-6982 303 242 6982
11 Campagna, Ryann Campagna Ryann 727-692-1835 727 692 1835
12 Wash, Mira Wash Mira 509-216-3598 509 216 3598
13 Flansburg, Louann Flansburg Louann 3049163908 304 916 3908
14 Winborne, Angella Winborne Angella (678)249-9107 678 249 9107
15 Arledge, Marcia Arledge Marcia (430)625-4239 430 625 4239
16 Cookson, Eladia Cookson Eladia 507-588-4874 507 588 4874
17 Tisher, Dee Tisher Dee 470-439-4114 470 439 4114
18 Difiore, Tyrell Difiore Tyrell 4055294829 405 529 4829
19 Colas, Tristan Colas Tristan 7857923661 785 792 3661
20 Sprenger, Ava Sprenger Ava (217)343-9603 217 343 9603
21 Getman, Jesenia Getman Jesenia (646)812-6606 646 812 6606
22 Starr, Ashley Starr Ashley (281)514-6984 281 514 6984
23 Raney, Irmgard Raney Irmgard 573-586-5935 573 586 5935
24 Bryson, Dionna Bryson Dionna (325)627-2149 325 627 2149
25 Welk, Bruno Welk Bruno 2894782665 289 478 2665
26 Dias, Petra Dias Petra 2694361985 269 436 1985
27 Alejandro, Nana Alejandro Nana 254-563-7229 254 563 7229
28 Sanson, Jason Sanson Jason (469)453-3600 469 453 3600
29 Ellerbe, Gracia Ellerbe Gracia 320-749-5706 320 749 5706
30 Parris, Julius Parris Julius 630-537-5563 630 537 5563
31 Tomasello, Rachele Tomasello Rachele (240)696-2942 240 696 2942
32 Tackitt, Mireille Tackitt Mireille 3312028129 331 202 8129
33 Taliaferro, Kaycee Taliaferro Kaycee 7622728177 762 272 8177
34 Imperato, Natalya Imperato Natalya 6572415716 657 241 5716
35 Letcher, Basilia Letcher Basilia (401)437-2309 401 437 2309
36 Gallaher, Deena Gallaher Deena 269-521-6040 269 521 6040
37 Pierri, Viola Pierri Viola 6572108846 657 210 8846
38 Benefiel, Chante Benefiel Chante 4257611776 425 761 1776
39 Phan, Kellye Phan Kellye 479-325-3593 479 325 3593
40 Cosenza, Saul Cosenza Saul 2165746335 216 574 6335
41 Neihoff, Velvet Neihoff Velvet 337-314-5395 337 314 5395
42 Arboleda, Lynsey Arboleda Lynsey (306)409-9494 306 409 9494
43 Metcalfe, Mervin Metcalfe Mervin (319)219-2300 319 219 2300
44 Hammes, Stefani Hammes Stefani 630-629-4630 630 629 4630
45 Nordahl, Yahaira Nordahl Yahaira (610)390-8353 610 390 8353
46 Nader, Marceline Nader Marceline 660-299-3416 660 299 3416
47 Lasorsa, Vicente Lasorsa Vicente 7135491648 713 549 1648
48 Bessette, Esther Bessette Esther 4257614047 425 761 4047
49 Hinchman, Marisela Hinchman Marisela 8479223654 847 922 3654
50 Lippincott, Lucia Lippincott Lucia 631-512-5400 631 512 5400
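A sketch of how tidyr::extract() might produce tables like these. It uses a three-row stand-in for the bcscr data (rows copied from the table above); the data frame name phones and both regular expressions are assumptions:

```r
library(tidyr)

# Small stand-in for the full 50-row data frame
phones <- data.frame(
  name = c("Philson, Mickey", "Shiner, Marget", "Isenhour, Garth"),
  phone = c("580-789-5775", "(206)948-8169", "6417823425"),
  stringsAsFactors = FALSE
)

# Each capture group in regex fills one of the columns named in into.
# remove = FALSE keeps the original column alongside the new ones.
with_names <- tidyr::extract(
  phones, col = name, into = c("last", "first"),
  regex = "(\\w+),\\s*(\\w+)", remove = FALSE
)

# Optional parentheses and hyphens sit outside the capture groups,
# so only the digits end up in the new columns.
result <- tidyr::extract(
  with_names, col = phone, into = c("area", "office", "line"),
  regex = "\\(?(\\d{3})[)\\-]?(\\d{3})-?(\\d{4})", remove = FALSE
)

result
```

Dropping remove = FALSE gives the shorter table, in which name and phone are replaced by their pieces.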
Count the number of words in a string that begin with a lower or uppercase p.
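One possible answer, as a sketch. It assumes a "word" is a run of \w characters, and the tongue-twister is just a made-up test string:

```r
library(stringr)

twister <- "Peter Piper picked a peck of pickled peppers"

# \\b anchors the match at the start of a word, so a p in the
# middle of a word (as in "Piper") is not counted a second time.
p_count <- str_count(twister, "\\b[pP]\\w*")
p_count
```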
How might we find the words in a string that contain three or more of the same letter?
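One approach, sketched under the assumption that "letter" can be approximated by the word-character class \w and that the comparison is case-sensitive. The test string is made up:

```r
library(stringr)

text <- "banana bandana assesses noon"

# Capture one character, then demand that same character twice more
# later in the same word. The \\w* pieces soak up everything in between,
# and the outer \\b anchors make the match cover the whole word.
triple <- "\\b\\w*(\\w)\\w*\\1\\w*\\1\\w*\\b"

triple_words <- str_extract_all(text, triple)[[1]]
triple_words
```

"noon" is left out because no single letter occurs three times in it.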
In some other languages you see a regex like this:
/regex/gm
Modes are at the end. Popular modes:
g: “global”, looking for all possible matches in the string;
i: “case-insensitive” mode, so that letter-characters in the regex match both their upper and lower-case versions;
m: “multiline” mode, so that the anchors ^ and $ are attached to newlines within the string rather than to the absolute beginning and end of the string;
x: “white-space” mode, where white-spaces in the regex are ignored unless they are escaped (useful for lining out the regex and inserting comments to explain its operation).
Since stringr has _all versions of the main regex functions, we don’t usually have to worry about setting global mode in R.
Use (?modes), where modes is one or more mode letters, wherever you want the modes to take effect. (Usually at the beginning of the string.)
Example:
"(?im)t[aeiou]{1,3}$"
At the end of lines in the string we are looking for t (or T) followed by one to three vowels. Uppercase, or lowercase – doesn’t matter.
The big bad wolf is walking warily to the cottage.
He huffs and he puffs peevishly.
He wears gnarly gargantuan bell bottoms!
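A small sketch of the example regex at work. The three-line string is made up, chosen so that the effect of the m mode is visible:

```r
library(stringr)

# Hypothetical multi-line string; \n separates the lines
lines_text <- "I said to\nTia\nnot"

# (?im): case-insensitive and multiline, so $ matches at the end of each line
with_modes <- str_extract_all(lines_text, "(?im)t[aeiou]{1,3}$")[[1]]

# Without m, $ only matches at the very end of the string,
# where "not" offers no vowel after its final t
without_m <- str_extract_all(lines_text, "(?i)t[aeiou]{1,3}$")[[1]]

with_modes
without_m
```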
Let’s try some of these ideas:
https://homerhanumat.github.io/r-notes/110-regex.html#practice-exercises
Comments With “Ignore Whitespace” Mode
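A sketch of a commented regex using the x mode. The phone-number pattern is an assumption written for this illustration, tried on numbers taken from the table earlier in the section:

```r
library(stringr)

# With (?x), whitespace in the pattern is ignored and everything
# from # to the end of a line is treated as a comment.
phone_pattern <- "(?x)
  \\(?        # optional opening parenthesis
  (\\d{3})    # area code
  [)\\-]?     # optional closing parenthesis or hyphen
  (\\d{3})    # office
  -?          # optional hyphen
  (\\d{4})    # line
"

numbers <- c("580-789-5775", "(206)948-8169", "6417823425")

# Column 1 is the whole match; columns 2 to 4 are the capture groups
pieces <- str_match(numbers, phone_pattern)
pieces
```

To match a literal space or # inside such a pattern, escape it with a backslash.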