Graphing with ggplot2

(Section 8.2)

Load Packages

Make sure the packages we will use are loaded:

library(ggplot2)
library(bcscr)
library(mosaicData)

First Graph Goal

We want to build a scatter plot of GPA against fastest speed ever driven, using ggplot2.

The Frame

Build from a Data Frame

Start with a call to ggplot().

ggplot(data = m111survey)

All variables involved in the plot will come from the data frame supplied as data.

Blank (for now!)

Establish the Graph Frame

Map x and y location to variables.

ggplot(
  data = m111survey,
  mapping = aes(
    x = fastest, 
    y = GPA
  )
)

Axes set up, but still no points.

Skip Naming the Parameters

Most people don’t name the parameters for ggplot(); they just supply the arguments in the right order:
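As a sketch of what this looks like, the frame built above can be written without data = and mapping =. This relies on data and mapping being the first two parameters of ggplot(); the x and y names inside aes() are kept for readability.

```r
library(ggplot2)
library(bcscr)  # provides m111survey

# First argument is matched to data, second to mapping.
p <- ggplot(m111survey, aes(x = fastest, y = GPA))
p  # axes only; no glyphs have been added yet
```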

Adding Glyphs

General Form

The general form is:

geom_glyphType()

Thus we have such things as:

  • geom_point() for points;
  • geom_bar() for the bars of a bar graph;
  • geom_histogram() for the rectangles that make up a histogram;
  • geom_density() for the curve of a density plot;
  • geom_violin() for the violins of a violin plot;
  • geom_jitter() for jittered points representing individual cases;
  • geom_rug() for rug-ticks representing individual cases.

And many more!

Add Point Glyphs
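A minimal sketch of this step, assuming the frame built earlier: add a layer of point glyphs to the frame with +.

```r
library(ggplot2)
library(bcscr)  # provides m111survey

# The frame supplies the x and y mappings; geom_point() inherits them.
p <- ggplot(m111survey, aes(x = fastest, y = GPA)) +
  geom_point()
p  # rows with missing values are dropped with a warning
```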

Adding Aesthetic Mappings

Mapping New Aesthetic Properties

The x- and y-locations are aesthetic properties that we mapped to variables when we made the frame.

You can map other aesthetic properties, such as color.

Add a Mapping to Color
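One way to do this, sketched here, is to map color to sex inside the call to geom_point(). ggplot2 then picks a color for each value of sex and adds a legend.

```r
library(ggplot2)
library(bcscr)  # provides m111survey

# color is wrapped in aes(), so it varies with the data.
p <- ggplot(m111survey, aes(x = fastest, y = GPA)) +
  geom_point(aes(color = sex))
p
```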

Adding Labels

Label your axes, providing units if you have them.

Add Labels
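A sketch using labs(). The label and title text here is illustrative wording, not necessarily what appeared on the original slide.

```r
library(ggplot2)
library(bcscr)  # provides m111survey

p <- ggplot(m111survey, aes(x = fastest, y = GPA)) +
  geom_point(aes(color = sex)) +
  labs(
    x = "Fastest speed ever driven (mph)",  # units included
    y = "Grade point average",
    title = "GPA vs. fastest speed"          # illustrative title
  )
p
```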

Aesthetic Mappings vs. Fixed Properties

A Fixed Property

The color-setting is not wrapped in a call to aes(). All points will be blue, no matter what the data says, so here color is NOT an aesthetic.
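The kind of call being described might look like the following sketch, in which color is set directly in geom_point() rather than inside aes().

```r
library(ggplot2)
library(bcscr)  # provides m111survey

# color is a fixed property here: it does not depend on any variable.
p <- ggplot(m111survey, aes(x = fastest, y = GPA)) +
  geom_point(color = "blue")
p
```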

Examples of Fixed Properties, for Points:

Code                          Effect
geom_point(color = "blue")    all the points are blue
geom_point(shape = 22)        all points are squares (outlined; the interior can be set with fill)
geom_point(size = 3)          all points are bigger than the default size (1.5)

(Or you can aesthetically map these properties!)

  • geom_point(aes(color = sex))
  • geom_point(aes(shape = sex))
  • geom_point(aes(size = height))

Bar Plots

Use bar plots to study categorical variables.

Distribution of seat

ggplot(m111survey, aes(x = seat)) +
  geom_bar(color = "black", fill = "skyblue") +
  labs(
    x = "seating preference",
    title = "The middle is popular!"
  )

seat by sex
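To bring sex into the bar plot, one option (a sketch, not necessarily the original slide's code) is to map fill to sex. By default the bars for each seat are stacked.

```r
library(ggplot2)
library(bcscr)  # provides m111survey

# fill is mapped to sex, so each bar is split into colored segments.
p <- ggplot(m111survey, aes(x = seat, fill = sex)) +
  geom_bar(color = "black") +
  labs(x = "seating preference")
p
```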

Dodging
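A sketch of dodging, assuming fill is mapped to sex: setting position = "dodge" places the bars for each sex side by side instead of stacking them.

```r
library(ggplot2)
library(bcscr)  # provides m111survey

p <- ggplot(m111survey, aes(x = seat, fill = sex)) +
  geom_bar(color = "black", position = "dodge") +
  labs(x = "seating preference")
p
```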

Density Plots

Fastest Speed Ever Driven

Make a density plot of fastest speed ever driven:
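A minimal sketch: only x is mapped, and geom_density() computes the height of the curve.

```r
library(ggplot2)
library(bcscr)  # provides m111survey

p <- ggplot(m111survey, aes(x = fastest)) +
  geom_density(fill = "burlywood") +
  labs(x = "Fastest speed ever driven (mph)")
p
```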

Layer in a Rug
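A sketch of the layered plot: geom_rug() inherits the x mapping from the frame and draws one tick per case along the axis.

```r
library(ggplot2)
library(bcscr)  # provides m111survey

p <- ggplot(m111survey, aes(x = fastest)) +
  geom_density(fill = "burlywood") +
  geom_rug() +  # second layer, drawn on top of the density curve
  labs(x = "Fastest speed ever driven (mph)")
p
```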

Violin Plots

Fastest Speed, by Sex

We’ll make violin plots, and layer on jittered points:
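A sketch of such a plot. The fill color and the jitter width of 0.2 are illustrative choices.

```r
library(ggplot2)
library(bcscr)  # provides m111survey

p <- ggplot(m111survey, aes(x = sex, y = fastest)) +
  geom_violin(fill = "burlywood") +
  geom_jitter(width = 0.2) +  # small horizontal jitter keeps points inside the violins
  labs(y = "Fastest speed ever driven (mph)")
p
```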

Facets

  • break data into groups
  • separate plot for each group
  • but the plots are arranged nicely together in one graph

Two Ways to Facet

ggplot2 has two functions for splitting a plot into facets:

  • facet_grid()
  • facet_wrap()

Some New Data

Let’s investigate faceting while looking at some new data. Run this in RStudio:

View(railtrail)

You can also get help:
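Presumably this means opening the data set's help page, along these lines:

```r
library(bcscr)  # provides railtrail

help(railtrail)  # or, equivalently, ?railtrail
```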

Our Aim

Study how season and dayType relate to volume (the number of people who use the trail on a given day).

Facet Grid, Take One

ggplot(railtrail, aes(x = volume)) +
  geom_density(fill = "burlywood") +
  facet_grid(season ~ dayType)

Facet Grid, Take Two

It was difficult to compare across rows. Why not let aesthetic mappings do some of the work?
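One possible version is sketched below. It maps fill to dayType so that the two day types overlap in the same panel, and facets by season alone. Which variable goes to the aesthetic and which to the facets is a guess here, as is the alpha value.

```r
library(ggplot2)
library(bcscr)  # provides railtrail

p <- ggplot(railtrail, aes(x = volume)) +
  geom_density(aes(fill = dayType), alpha = 0.5) +  # semi-transparent so both curves show
  facet_grid(season ~ .)                             # one row per season, a single column
p
```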

Facet Wrap

Use facet_wrap() when you want to split up by a categorical variable that has lots of values.

Our Aim

Let’s study the ages of employees in the various sectors of employment.

Facet Wrap Code
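A sketch, assuming the data set is CPS85 from mosaicData, which has age and sector variables. The choice of two rows is illustrative.

```r
library(ggplot2)
library(mosaicData)  # assumed source of CPS85

p <- ggplot(CPS85, aes(x = age)) +
  geom_density(fill = "burlywood") +
  facet_wrap(~ sector, nrow = 2)  # one panel per sector, wrapped into two rows
p
```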

Alternative to Facets

Sometimes more aesthetic mapping is better:
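For example, here is a sketch that maps x to sector and draws one violin per sector in a single panel, again assuming the CPS85 data from mosaicData.

```r
library(ggplot2)
library(mosaicData)  # assumed source of CPS85

p <- ggplot(CPS85, aes(x = sector, y = age)) +
  geom_violin(fill = "burlywood")  # all sectors share one set of axes
p
```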

A Few Final Points

Customize!

Use named colors from colors() to customize your plots:
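A sketch: colors() returns the color names that R recognizes, and any of them can be used as a fixed property. The two colors chosen here are arbitrary.

```r
library(ggplot2)
library(bcscr)  # provides m111survey

head(colors())  # peek at the first few color names

p <- ggplot(m111survey, aes(x = fastest)) +
  geom_density(fill = "lightsalmon", color = "navy") +
  labs(x = "Fastest speed ever driven (mph)")
p
```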

A Common Error: Failure to aes()

If you are trying to aesthetically map color to sex, this won’t work:
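The mistake probably looks like the call inside tryCatch() below. The tryCatch() wrapper is added here only so that the sketch runs to completion and the error message can be inspected.

```r
library(ggplot2)
library(bcscr)  # provides m111survey

# color = sex is outside aes(), so R looks for an object named sex
# in the workspace instead of a column of m111survey.
result <- tryCatch(
  ggplot(m111survey, aes(x = fastest, y = GPA)) +
    geom_point(color = sex),
  error = function(e) conditionMessage(e)
)
result  # an error message, provided no object called sex exists in the workspace
```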

Remember to aes()

ggplot(m111survey, aes(x = fastest, y = GPA)) +
  geom_point(aes(color = sex))

Tweaking With Programming

Recall the density plot of fastest speed with a rug layered in.

The rug-ticks over-plot each other. How to fix this?

The Fix

Create a new variable of randomly-jittered speeds:

n <- nrow(m111survey)
jitters <- 
  runif(
    n,
    min = -1, 
    max = 1
  )
m111survey$jitteredSpeeds <- m111survey$fastest + jitters

Each “jittered” speed is the true speed plus a random real number drawn uniformly from the interval \([-1, 1]\).

Now Plot

ggplot(m111survey, aes(x = fastest)) + 
  geom_density(fill = "burlywood") +
  geom_rug(aes(x = jitteredSpeeds)) +
  labs(x = "Fastest speed ever driven (mph)")