(practice with the babynames data)
The babynames package exists to give us the data table babynames.
# A tibble: 1,924,665 × 5
    year sex   name          n   prop
   <dbl> <chr> <chr>     <int>  <dbl>
 1  1880 F     Mary       7065 0.0724
 2  1880 F     Anna       2604 0.0267
 3  1880 F     Emma       2003 0.0205
 4  1880 F     Elizabeth  1939 0.0199
 5  1880 F     Minnie     1746 0.0179
 6  1880 F     Margaret   1578 0.0162
 7  1880 F     Ida        1472 0.0151
 8  1880 F     Alice      1414 0.0145
 9  1880 F     Bertha     1320 0.0135
10  1880 F     Sarah      1288 0.0132
# … with 1,924,655 more rows
How popular has the name “Mary” been over the years?
How popular has your name been over the years?
(Work with percentages, not the absolute number of births.)
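One possible starting point for the "Mary" question is sketched below. It assumes the babynames, dplyr, and ggplot2 packages are installed, and it assumes we only care about girls named Mary (the name also appears in the data for boys). The names mary and mary_plot are just illustrative.

```r
library(babynames)
library(dplyr)
library(ggplot2)

# Keep one row per year: girls named Mary (restricting to sex == "F" is an assumption).
mary <- babynames %>%
  filter(name == "Mary", sex == "F") %>%
  mutate(perc = prop * 100)  # prop is a proportion; convert it to a percentage

# Line plot of the percentage of girls named Mary, by year.
mary_plot <- ggplot(mary, aes(x = year, y = perc)) +
  geom_line() +
  labs(
    x = NULL,
    y = 'Percentage of females named "Mary"',
    title = 'Popularity of the name "Mary"'
  )

mary_plot
```

To try your own name, change the name (and, if needed, the sex) in the filter() call.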
Let’s investigate the popularity of the name “Prince” as a name for boys, since the year 1970.
We will make special note of 1984, the year that Prince released his classic album Purple Rain.
library(babynames)
library(tidyverse)

babynames %>%
  filter(name == "Prince", year >= 1970, sex == "M") %>%
  mutate(perc = prop * 100) %>%  # convert proportion to percentage
  ggplot(aes(x = year, y = perc)) +
  geom_line() +
  # mark 1984, the release year of Purple Rain
  geom_vline(xintercept = 1984, color = "purple") +
  labs(
    x = NULL,
    y = 'Percentage of males named "Prince"',
    title = "There are more Princes now!",
    subtitle = "(after Purple Rain was released in 1984)"
  )
Which name for males has been more popular over the years: “Homer”, or “Jalen”?
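A minimal sketch of one way to compare the two names: plot both on the same axes, one colored line per name. It assumes the babynames, dplyr, and ggplot2 packages are installed; the name homer_jalen is just illustrative.

```r
library(babynames)
library(dplyr)
library(ggplot2)

# Keep only boys named Homer or Jalen, and work in percentages.
homer_jalen <- babynames %>%
  filter(name %in% c("Homer", "Jalen"), sex == "M") %>%
  mutate(perc = prop * 100)

# One line per name, so the two histories can be compared directly.
homer_jalen_plot <- ggplot(homer_jalen, aes(x = year, y = perc, color = name)) +
  geom_line() +
  labs(
    x = NULL,
    y = "Percentage of males with this name",
    color = "Name"
  )

homer_jalen_plot
```

Whether one name counts as "more popular over the years" depends on how you read the plot (peak popularity, or popularity sustained over time), so it is worth saying which you mean.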
What are the top 5 most popular female names for each decade from the 1950s through the 2000s?
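One possible approach is sketched below. It assumes dplyr 1.0.0 or later (for slice_max() and the .groups argument), and it assumes "most popular" means the largest total number of births within the decade. The column names decade and total and the object top5_by_decade are illustrative.

```r
library(babynames)
library(dplyr)

top5_by_decade <- babynames %>%
  filter(sex == "F", year >= 1950, year <= 2009) %>%
  mutate(decade = 10 * (year %/% 10)) %>%  # e.g. 1957 becomes 1950
  group_by(decade, name) %>%
  # total births per name within each decade; result stays grouped by decade
  summarise(total = sum(n), .groups = "drop_last") %>%
  # keep the 5 largest totals in each decade (ties broken arbitrarily)
  slice_max(total, n = 5, with_ties = FALSE) %>%
  arrange(decade, desc(total)) %>%
  ungroup()

top5_by_decade
```

Note that the grouping here is by decade first and then by name, so that after summarise() the data are still grouped by decade when slice_max() is applied.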
… you reversed the order of grouping?