(Section 8.2)
Make sure the packages we will use are loaded:
We want to build a scatter plot with ggplot2.
Most people don’t name the parameters of ggplot(); they just put the arguments in the right order:
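Here is a minimal sketch of the two styles. The slides' own data isn't shown here, so it assumes ggplot2's built-in mpg data as a stand-in, with engine size displ and highway mileage hwy. data and mapping are the first two parameters of ggplot(), so both calls build the same plot.

```r
library(ggplot2)

# Naming the parameters explicitly
p_named <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Relying on argument order: data first, then the aesthetic mapping
p_positional <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

p_positional  # draw the plot
```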
The general form is:
geom_glyphType()
Thus we have such things as:
- geom_point() for points;
- geom_bar() for the bars of a bar graph;
- geom_histogram() for the rectangles that make up a histogram;
- geom_density() for the curve of a density plot;
- geom_violin() for the violins of a violin plot;
- geom_jitter() for jittered points representing individual cases;
- geom_rug() for rug-ticks representing individual cases;
- and many more!
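The following sketch shows how the glyph function alone changes the picture. It reuses one frame with two different geoms and again assumes the mpg data that ships with ggplot2.

```r
library(ggplot2)

# One frame: the x-location is mapped to highway mileage
frame <- ggplot(mpg, aes(x = hwy))

p_hist    <- frame + geom_histogram(binwidth = 2)  # rectangles of a histogram
p_density <- frame + geom_density()                # curve of a density plot

p_hist  # draw the histogram
```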
The x- and y-locations are aesthetic properties, mapped to variables when we make the frame with ggplot().
You can map other aesthetic properties, such as color.
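A minimal sketch of mapping color, assuming the mpg data. Because color = drv sits inside aes(), each drive type (4, f, r) gets its own color, and ggplot2 adds a legend.

```r
library(ggplot2)

# x, y and color are all mapped to variables inside aes()
p_color <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

p_color  # draw the plot
```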
Label your axes, providing units if you have them.
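Here is a sketch using labs() on the mpg data. The units come from the mpg documentation (displ in litres, hwy in miles per gallon), so swap in your own variables and units.

```r
library(ggplot2)

p_labelled <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(x = "Engine displacement (litres)",        # axis label with units
       y = "Highway mileage (miles per gallon)",
       title = "Highway mileage vs. engine size")

p_labelled  # draw the plot
```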
The color setting is not wrapped in a call to aes(). All points will be blue, no matter what the data says, so here color is NOT an aesthetic.
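Here is a minimal sketch of a fixed color, again assuming the mpg data. color = "blue" is an ordinary argument to geom_point(), so every point gets the same value.

```r
library(ggplot2)

# color is set as a parameter of the geom, OUTSIDE aes(): a fixed property
p_blue <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "blue")

p_blue  # draw the plot
```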
Examples of fixed properties for points:
Code | Effect |
---|---|
geom_point(color = "blue") | all the points are blue |
geom_point(shape = 22) | all points are squares (shape 22 has no fill by default; add a fill, or use shape = 15, for solid squares) |
geom_point(size = 3) | all points are bigger than the default size (1.5) |
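The fixed settings from the table can be combined in one call. This sketch assumes the mpg data. fill only has an effect for shapes 21 to 25, which is why it appears alongside shape = 22.

```r
library(ggplot2)

# Fixed (non-mapped) properties: every point gets the same values
p_fixed <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(shape = 22,          # a square with separate border and fill
             fill = "lightblue",  # interior colour, used by shapes 21-25
             size = 3)            # larger than the default size of 1.5

p_fixed  # draw the plot
```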
(Or you can aesthetically map these properties!)
geom_point(aes(color = sex))
geom_point(aes(shape = sex))
geom_point(aes(size = height))
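The mapped versions above use the slides' sex and height variables, which aren't available here. This sketch maps shape and size to mpg's drv and cyl columns instead. Because cyl is numeric, ggplot2 uses a continuous size scale for it.

```r
library(ggplot2)

# Shape and size now vary with the data because they are inside aes()
p_mapped <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(shape = drv, size = cyl))

p_mapped  # draw the plot
```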
Use bar plots to study categorical variables: for example, seat on its own, or seat by sex.
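The seat and sex variables aren't available here, so this sketch uses two categorical columns of mpg as a stand-in: class on its own, then class by drv. position = "dodge" puts the bars for each drive type side by side. Leaving it out stacks them instead.

```r
library(ggplot2)

# One categorical variable: bar heights are counts of cars in each class
p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# Two categorical variables: fill color shows the second one
p_bar2 <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")

p_bar2  # draw the two-variable version
```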
Make a density plot of the fastest speed ever driven:
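The fastest-speed variable isn't available here, so this sketch uses mpg's hwy column instead. The rug layer is optional. It adds one tick per car along the x-axis.

```r
library(ggplot2)

p_dens <- ggplot(mpg, aes(x = hwy)) +
  geom_density(fill = "burlywood") +  # smoothed curve of the distribution
  geom_rug()                          # one tick per car along the x-axis

p_dens  # draw the plot
```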
We’ll make violin plots, and layer on jittered points:
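A sketch of the layering, using mpg's drv and hwy columns as the grouping and numeric variables. Setting height = 0 in geom_jitter() jitters points only sideways, so their vertical positions still show the true values.

```r
library(ggplot2)

p_violin <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_violin(fill = "burlywood") +      # one violin per drive type
  geom_jitter(width = 0.1, height = 0)   # individual cars, spread sideways only

p_violin  # draw the plot
```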
ggplot2 has two functions for splitting a plot into facets:
- facet_grid()
- facet_wrap()
Let’s investigate faceting while looking at some new data:
Our aim: study how season and dayType relate to volume (the number of people who use the trail on a given day).
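row | hightemp | lowtemp | avgtemp | cloudcover | precip | volume | weekday | dayType | season |
---|---|---|---|---|---|---|---|---|---|
1 | 83 | 50 | 66.5 | 7.6 | 0.00 | 501 | TRUE | weekday | summer |
2 | 73 | 49 | 61.0 | 6.3 | 0.29 | 419 | TRUE | weekday | summer |
3 | 74 | 52 | 63.0 | 7.5 | 0.32 | 397 | TRUE | weekday | spring |
4 | 95 | 61 | 78.0 | 2.6 | 0.00 | 385 | FALSE | weekend | summer |
5 | 44 | 52 | 48.0 | 10.0 | 0.14 | 200 | TRUE | weekday | spring |
6 | 69 | 54 | 61.5 | 6.6 | 0.02 | 375 | TRUE | weekday | spring |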
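This sketch shows facet_grid() at work without assuming any package beyond ggplot2. It rebuilds only the volume, dayType and season columns of the six rows printed above, with dayType in the rows of the grid and season in the columns. With so few days the panels are sparse. The full data would fill them out.

```r
library(ggplot2)

# The six rows printed above, keeping only the columns we need
trail <- data.frame(
  volume  = c(501, 419, 397, 385, 200, 375),
  dayType = c("weekday", "weekday", "weekday", "weekend", "weekday", "weekday"),
  season  = c("summer", "summer", "spring", "summer", "spring", "spring")
)

# facet_grid(rows ~ columns): one panel per dayType/season combination
p_grid <- ggplot(trail, aes(x = volume)) +
  geom_histogram(binwidth = 50) +
  facet_grid(dayType ~ season)

p_grid  # draw the plot
```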
Hmm, difficult to compare across a row.
Let aesthetic mapping do some of the work:
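Here is a sketch on the same six printed rows. dayType moves from the facets to the fill color, leaving one panel per season. Weekday and weekend days then share a panel and can be compared directly. Histograms of six days only illustrate the mechanics.

```r
library(ggplot2)

# The six rows printed earlier, keeping only the columns we need
trail <- data.frame(
  volume  = c(501, 419, 397, 385, 200, 375),
  dayType = c("weekday", "weekday", "weekday", "weekend", "weekday", "weekday"),
  season  = c("summer", "summer", "spring", "summer", "spring", "spring")
)

# dayType is now a fill color; only season is used for facets
p_fill <- ggplot(trail, aes(x = volume, fill = dayType)) +
  geom_histogram(binwidth = 50) +
  facet_grid(season ~ .)

p_fill  # draw the plot
```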
Use facet_wrap() when you want to split by a categorical variable that has many values.
Let’s study the ages of employees in the various sectors of employment.
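The employment data isn't loaded in this section. This sketch uses mpg's class column (seven vehicle classes) as a stand-in for a categorical variable with many values, and hwy in place of age.

```r
library(ggplot2)

# facet_wrap(~ variable) lays the panels out in rows that wrap,
# which suits a categorical variable with many values
p_wrap <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2) +
  facet_wrap(~ class)

p_wrap  # draw the plot
```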
Sometimes more aesthetic mapping is better:
Use named colors from colors() to customize your plots:
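A minimal sketch, assuming the mpg data. colors() comes from base R's grDevices package and returns the color names R accepts. Any of those names can be used for fixed settings such as fill and color.

```r
library(ggplot2)

# colors() lists the color names R understands
head(colors())
"burlywood" %in% colors()

p_named_col <- ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "burlywood", color = "black")  # named fill, black outline

p_named_col  # draw the plot
```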
A common mistake with aes(): trying to aesthetically map color to sex outside of aes() won’t work:
Error in layer(data = data, mapping = mapping, stat = stat,
geom = GeomPoint, : object 'sex' not found
The mapping has to go inside aes().
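A minimal sketch of the mistake and its fix, using mpg's drv column in place of sex. try() is there only so the failing call does not stop the script.

```r
library(ggplot2)

# Wrong: drv is a column of mpg, not an object in the workspace. Outside
# aes(), R evaluates drv right away and stops with an "object not found" error.
bad <- try(
  ggplot(mpg, aes(x = displ, y = hwy)) + geom_point(color = drv),
  silent = TRUE
)

# Right: inside aes(), drv is looked up among the columns of mpg
good <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = drv))

good  # draw the working plot
```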
Recall the density plot of fastest speed, with a rug:
The rug ticks overplot each other. How can we fix this?
Create a new variable of randomly jittered speeds:
Each “jittered” speed is the true speed \(\pm\) a random real number up to 1.
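A sketch of that recipe, using mpg's whole-number hwy values in place of the speeds. runif() supplies the random real numbers between -1 and 1, and set.seed() makes them reproducible. Base R's jitter(speed, amount = 1) adds the same kind of uniform noise.

```r
library(ggplot2)

set.seed(1)       # fix the random numbers so the result is reproducible
speed <- mpg$hwy  # whole-number values, so many rug ticks coincide

# True value plus a random real number between -1 and 1
jittered <- speed + runif(length(speed), min = -1, max = 1)

rug_data <- data.frame(speed = speed, jittered = jittered)

p_rug <- ggplot(rug_data, aes(x = speed)) +
  geom_density() +             # density of the true values
  geom_rug(aes(x = jittered))  # ticks at the jittered values

p_rug  # draw the plot
```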