(Section 8.1)
It’s hard to succinctly describe how ggplot2 works because it embodies a deep philosophy of visualisation.
—Hadley Wickham (author of the ggplot2 package)
First we focus on a few key concepts: frame, glyph, aesthetic, scale, and guide. Then we will learn how to translate these concepts into code.
Frame
The relationship between position and the data being plotted.
m111survey
   height  sex     fastest  GPA
1  76      male    119      3.56
2  74      male    110      2.50
3  64      female  85       3.80
4  62      female  100      3.50
5  72      male    95       3.20
Research Question: What’s the relationship between fastest and GPA?
Define the frame with two variables: fastest on the horizontal axis and GPA on the vertical axis.
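In ggplot2, the frame corresponds to the aes() mapping of variables to x- and y-position inside ggplot(). The sketch below assumes ggplot2 is installed; to stay self-contained it rebuilds only the five cases shown in the table, under the hypothetical name survey5 (this is not the full m111survey data set). With no glyph layer added yet, the plot sets up the axes but draws no points.

```r
library(ggplot2)

# Hypothetical stand-in: only the five m111survey cases shown in the table,
# not the full data set.
survey5 <- data.frame(
  height  = c(76, 74, 64, 62, 72),
  sex     = factor(c("male", "male", "female", "female", "male")),
  fastest = c(119, 110, 85, 100, 95),
  GPA     = c(3.56, 2.50, 3.80, 3.50, 3.20)
)

# The frame: fastest determines horizontal position, GPA vertical position.
# No geom has been added, so printing p shows axes but no glyphs.
p <- ggplot(survey5, aes(x = fastest, y = GPA))
```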
Glyph
The basic graphical unit that corresponds to a case in the data table.
Scatter Plot
Let’s represent each student (case) in the m111survey data with a point.
The points are the glyphs.
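Adding a geom layer to the frame supplies the glyphs. In this sketch (same assumptions as before: ggplot2 is installed, and survey5 is a hypothetical stand-in holding the five tabulated cases), geom_point() draws one point per row of the data, so each student gets exactly one glyph.

```r
library(ggplot2)

# Hypothetical stand-in: the five m111survey cases from the table.
survey5 <- data.frame(
  height  = c(76, 74, 64, 62, 72),
  sex     = factor(c("male", "male", "female", "female", "male")),
  fastest = c(119, 110, 85, 100, 95),
  GPA     = c(3.56, 2.50, 3.80, 3.50, 3.20)
)

# Frame plus glyphs: one point for each case (row).
p <- ggplot(survey5, aes(x = fastest, y = GPA)) +
  geom_point()
```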
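Each point’s horizontal position is the value of fastest for the case, and its vertical position is the value of GPA for the case.
Aesthetic
An aesthetic is a perceptible property of a glyph that varies from case to case.
We already know two aesthetics: horizontal position (x) and vertical position (y).
Some other possible aesthetics are color, size, shape, and fill.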
Let’s use the color of each point to indicate the sex of the student.
We are mapping the aesthetic “color” to the variable sex.
Let’s also map the aesthetic “size” to the variable height.
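In code, each additional aesthetic mapping is one more argument to aes(). This is a sketch under the same assumptions (ggplot2 installed, hypothetical five-case stand-in survey5); note that ggplot2 accepts color as a spelling of colour.

```r
library(ggplot2)

# Hypothetical stand-in: the five m111survey cases from the table.
survey5 <- data.frame(
  height  = c(76, 74, 64, 62, 72),
  sex     = factor(c("male", "male", "female", "female", "male")),
  fastest = c(119, 110, 85, 100, 95),
  GPA     = c(3.56, 2.50, 3.80, 3.50, 3.20)
)

# Two more aesthetics: color <-> sex (categorical), size <-> height (numerical).
p <- ggplot(survey5, aes(x = fastest, y = GPA, color = sex, size = height)) +
  geom_point()
```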
Scale
The relationship between the value of a variable and the graphical attribute to be displayed for that value.
Example: we mapped color to sex. R chose to set the value “female” to a reddish color, and the value “male” to a turquoise-blue color. That choice of colors is a scale. (You can make R use a different scale if you like.)
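One way to replace R’s default color scale is scale_color_manual(), which takes the color to use for each level of the variable. In this sketch the colors purple and darkorange are arbitrary illustrative choices, and survey5 is the same hypothetical five-case stand-in as before.

```r
library(ggplot2)

# Hypothetical stand-in: the five m111survey cases from the table.
survey5 <- data.frame(
  height  = c(76, 74, 64, 62, 72),
  sex     = factor(c("male", "male", "female", "female", "male")),
  fastest = c(119, 110, 85, 100, 95),
  GPA     = c(3.56, 2.50, 3.80, 3.50, 3.20)
)

p <- ggplot(survey5, aes(x = fastest, y = GPA, color = sex)) +
  geom_point() +
  # Our own scale: each level of sex is paired with a chosen color.
  scale_color_manual(values = c(female = "purple", male = "darkorange"))
```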
Every aesthetic mapping involves a scale. R has default scales ready to use if you don’t choose your own.
This scale maps:
Guide
An indication, for the human viewer, of the scale being used in an aesthetic mapping.
A guide takes you backwards: from the perceptual property to the data value it represents.
(Figure: the axes are the guides for fastest and GPA, and the legend is the guide for sex.)
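Guides are generated from the scales, but their appearance can be adjusted. A minimal sketch, again using the hypothetical five-case stand-in survey5: labs() sets the titles shown on the axis guides and the color legend, and guides(size = "none") hides the legend for size. Hiding a guide does not remove the scale; the points are still sized by height, but the viewer can no longer read the sizes back into heights.

```r
library(ggplot2)

# Hypothetical stand-in: the five m111survey cases from the table.
survey5 <- data.frame(
  height  = c(76, 74, 64, 62, 72),
  sex     = factor(c("male", "male", "female", "female", "male")),
  fastest = c(119, 110, 85, 100, 95),
  GPA     = c(3.56, 2.50, 3.80, 3.50, 3.20)
)

p <- ggplot(survey5, aes(x = fastest, y = GPA, color = sex, size = height)) +
  geom_point() +
  # Titles for the guides: the two axes and the color legend.
  labs(x = "fastest speed ever driven", y = "GPA", color = "sex") +
  # Hide the legend for size; the size scale is still applied to the points.
  guides(size = "none")
```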
A bar graph of sex in m111survey:
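A bar graph maps x-position to the categorical variable and uses geom_bar(), whose default statistic counts the cases at each value of x; bar height is that count. This sketch uses the hypothetical five-case stand-in survey5 rather than the full m111survey, so its bars show only 2 females and 3 males.

```r
library(ggplot2)

# Hypothetical stand-in: the five m111survey cases from the table.
survey5 <- data.frame(
  height  = c(76, 74, 64, 62, 72),
  sex     = factor(c("male", "male", "female", "female", "male")),
  fastest = c(119, 110, 85, 100, 95),
  GPA     = c(3.56, 2.50, 3.80, 3.50, 3.20)
)

# One bar per level of sex; geom_bar() counts the cases in each level.
p <- ggplot(survey5, aes(x = sex)) +
  geom_bar()
```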
Note:
Let’s map the aesthetic “fill” to weight_feel:
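A sketch of the fill mapping. It assumes the full m111survey data set is available from the tigerstats package (installed separately from ggplot2); data() loads it without attaching the package. Each bar still counts students of one sex, and the fill color divides the bar by weight_feel, stacked by default.

```r
library(ggplot2)

# Assumption: m111survey ships with the tigerstats package, which must be installed.
data("m111survey", package = "tigerstats")

# x-position <-> sex; fill color <-> weight_feel.
# geom_bar() counts cases, stacking the fill groups within each bar.
p <- ggplot(m111survey, aes(x = sex, fill = weight_feel)) +
  geom_bar()
```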
Question:
How are the fastest speeds driven distributed for students in the m111survey data?
Let’s investigate with a histogram.
x-location is mapped to fastest.
The x-scale represents fastest in linear fashion.
We can layer the histogram with a rug of jittered speeds.
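Layering means adding geoms one after another with +. A sketch, assuming m111survey is loaded from the tigerstats package: geom_histogram() bins fastest, and geom_rug() adds one short tick per student along the x-axis, jittered horizontally (height = 0) so tied speeds do not sit exactly on top of each other. The binwidth of 10, the colors, the jitter width of 1, and the seed are arbitrary illustrative choices.

```r
library(ggplot2)

# Assumption: m111survey ships with the tigerstats package, which must be installed.
data("m111survey", package = "tigerstats")

p <- ggplot(m111survey, aes(x = fastest)) +
  # Layer 1: histogram of fastest (binwidth chosen only for illustration).
  geom_histogram(binwidth = 10, color = "black", fill = "lightblue") +
  # Layer 2: rug of speeds, jittered sideways only; seed makes the jitter repeatable.
  geom_rug(position = position_jitter(width = 1, height = 0, seed = 1))
```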
Question:
Is there a relationship between seating preference and the fastest speed ever driven?
Violin plots: y-location mapped to fastest, x-location mapped to seat, and fill mapped to seat.
These are a good alternative to density plots, especially when studying the relationship between a numerical variable and a categorical variable.
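A sketch of a violin plot for the seating question, assuming m111survey is loaded from the tigerstats package and assuming the mapping x-location to seat, y-location to fastest, and fill to seat. Each violin is a mirrored density curve of fastest for one seating group.

```r
library(ggplot2)

# Assumption: m111survey ships with the tigerstats package, which must be installed.
data("m111survey", package = "tigerstats")

# Assumed mapping: x-position <-> seat, y-position <-> fastest, fill <-> seat.
p <- ggplot(m111survey, aes(x = seat, y = fastest, fill = seat)) +
  geom_violin()
```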
Box plots are useful in about the same range of circumstances as violin plots.
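In a list of values, the first quartile \(Q_1\) has about 25% of the values below it, the median has about 50% below it, and the third quartile \(Q_3\) has about 75% below it.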
\(Q_3 - Q_1\) is called the interquartile range (IQR).
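The quartiles and the IQR can be computed directly in base R. This sketch uses the five fastest values from the table. quantile() with its default method (type 7) is what geom_boxplot() uses for its box, and geom_boxplot()'s default coef = 1.5 flags as outliers any values more than 1.5 IQR beyond the box.

```r
# The five fastest values from the m111survey excerpt in the table
speeds <- c(119, 110, 85, 100, 95)

# Quartiles with R's default method (type 7), as geom_boxplot() also uses
q <- quantile(speeds, probs = c(0.25, 0.50, 0.75))
iqr <- IQR(speeds)  # equals Q3 - Q1 under the same default method

# geom_boxplot()'s default rule (coef = 1.5): values more than 1.5 * IQR
# beyond the box are plotted individually as outliers
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr
outliers <- speeds[speeds < lower_fence | speeds > upper_fence]
```

For these five speeds, \(Q_1 = 95\), the median is 100, \(Q_3 = 110\), and the IQR is 15, so the fences are 72.5 and 132.5 and none of the values is an outlier. Base R's boxplot() computes its hinges with fivenum() instead, which can give a slightly different box on some data sets.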
When there are no outliers, the whiskers of a box plot extend from the box to the minimum and maximum values.
Question:
How does the fastest speed driven relate to sex and to seating preference?
x-location mapped to sex, y-location mapped to fastest, and seat mapped to a third aesthetic.
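One possible sketch, assuming m111survey is loaded from the tigerstats package. Which aesthetic carries seat is an assumption made for illustration: here seat is mapped to fill, which gives side-by-side boxes for the seating groups within each sex; facet_wrap(~ seat) would be another way to bring seat in.

```r
library(ggplot2)

# Assumption: m111survey ships with the tigerstats package, which must be installed.
data("m111survey", package = "tigerstats")

# x-position <-> sex, y-position <-> fastest.
# Assumption for illustration: fill <-> seat, giving dodged boxes within each sex.
p <- ggplot(m111survey, aes(x = sex, y = fastest, fill = seat)) +
  geom_boxplot()
```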