Adjustment

(Lesson 12)

Packages

Make sure these are attached:

library(LSTbook)

Correcting for Other Factors

Childhood Respiratory Disease

?CRDS
View(CRDS)

CRDS

# A tibble: 654 × 5
     age   FEV height sex   smoker
   <int> <dbl>  <dbl> <chr> <chr> 
 1     9  1.71   57   F     not   
 2     8  1.72   67.5 F     not   
 3     7  1.72   54.5 F     not   
 4     9  1.56   53   M     not   
 5     9  1.90   57   M     not   
 6     8  2.34   61   F     not   
 7     6  1.92   58   F     not   
 8     6  1.42   56   F     not   
 9     8  1.99   58.5 F     not   
10     9  1.94   60   F     not   
# ℹ 644 more rows

Does Smoking Reduce FEV?

A naive study:

mod_smoker <- 
  CRDS |> 
  model_train(
    FEV ~ smoker
  )

Graphical Point of View


CRDS |> 
  point_plot(
    FEV ~ smoker,
    annot = "model"
  )

Model Coefficients

mod_smoker |> 
  coef()

 (Intercept) smokersmoker 
   2.5661426    0.7107189

On average, smokers have an FEV that is 0.71 liters higher than that of non-smokers.

So does smoking actually improve lung-function?
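To see what the coefficient is actually measuring, here is a small check using base R's tapply() to compute the mean FEV in each group. It is only a sketch: the names group_means and gap are mine, not part of LSTbook. With a single two-level explanatory variable, the coefficient on smoker should be the same number as the gap between the two group means.

```r
library(LSTbook)

# Mean FEV within each smoking group (base R, no model involved)
group_means <- tapply(CRDS$FEV, CRDS$smoker, mean)
group_means

# Gap between the groups: smokers minus non-smokers
gap <- unname(group_means["smoker"] - group_means["not"])
gap

# Coefficient on smoker from the naive model, for comparison
mod_smoker <- CRDS |> model_train(FEV ~ smoker)
coef(mod_smoker)["smokersmoker"]
```

The coefficient describes a difference between two groups of children. By itself it says nothing about what caused the difference.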

Watch for Confounding Factors

Suppose you are studying the relationship between an explanatory variable X and a response variable Y. Suppose that Z is:

a third variable
that is associated with X and
is part of what causes Y.

Then we say that Z is a confounding variable in the study.
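A quick simulation can make the definition concrete. This is a sketch with made-up data, not anything from CRDS: the variables x, y and z, the seed, and the sample size are all my own choices, and it uses base R's lm(). Here z is associated with x and is the only cause of y, so x has no effect on y at all.

```r
set.seed(12)  # arbitrary seed, for reproducibility only
n <- 10000

z <- rnorm(n)            # Z: the confounder
x <- z + rnorm(n)        # X is associated with Z
y <- 2 * z + rnorm(n)    # Y is caused by Z; X has no effect on Y

naive    <- unname(coef(lm(y ~ x))["x"])      # model that ignores Z
adjusted <- unname(coef(lm(y ~ x + z))["x"])  # model that includes Z

round(c(naive = naive, adjusted = adjusted), 2)
```

The naive coefficient on x should come out near 1, even though x does nothing to y. Once z is in the model, the coefficient on x should fall to roughly 0.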

Correcting for Confounding Factors

Rate-corrections often make sense (see textbook for examples).
You can often correct by adding the possible confounders (or variables associated with them) to your model.

It turns out that we can use the second approach here: adding a possible confounder to the model.

Size of Lungs

I bet that:

Children with larger lungs are also more likely to be smokers (because they tend to be older).
Lung-size has a causal impact on FEV. (The bigger your lungs, the more air you can expel.)

So lung-size would be a confounding variable in the study.
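Part of this bet can be checked against the data. The sketch below uses base R's tapply() to compare the average age and height of the two groups. The names age_means and height_means are mine. Lung-size itself is not recorded, so this only checks whether the smokers are in fact the older and taller children.

```r
library(LSTbook)

# Average age in each smoking group
age_means <- tapply(CRDS$age, CRDS$smoker, mean)
age_means

# Average height in each smoking group
height_means <- tapply(CRDS$height, CRDS$smoker, mean)
height_means
```

If the smokers are noticeably older and taller on average, then the naive comparison is largely a comparison of big children with small children.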

Height and Age

Lung-size is not available in the data.
But height is available, and
it is probably associated with lung-size.

So let’s include height in the model as a “proxy” for lung-size.

mod_smoker_adjust <-
  CRDS |> 
  model_train(
    FEV ~ smoker + height
  )

Graphical Point of View


CRDS |> 
  point_plot(
    FEV ~ height + smoker,
    annot = "model"
  )

Coefficients for the Adjusted Model

mod_smoker_adjust |> 
  coef()

 (Intercept) smokersmoker       height 
-5.427619561  0.006319331  0.131882555

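After adjusting for height, the smoker coefficient drops from about 0.71 liters to about 0.006 liters. Looks like smoking doesn’t make much of a difference in FEV, once we compare children of the same height.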
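One way to read the adjusted coefficient is to compare two children of the same height, one a smoker and one not. The sketch below does this with base R's lm() and predict() rather than model_train(). I am assuming lm() fits the same linear model for this formula. The height of 60 is an arbitrary value within the range seen in the data, and the names fit, same_height, pred and gap are mine.

```r
library(LSTbook)

# Same formula as the adjusted model, fit with base R's lm()
fit <- lm(FEV ~ smoker + height, data = CRDS)

# Two hypothetical children of the same height (60 is an arbitrary choice)
same_height <- data.frame(smoker = c("not", "smoker"), height = 60)

pred <- predict(fit, newdata = same_height)
pred

# Gap between the two predictions: smoker minus non-smoker
gap <- unname(pred[2] - pred[1])
gap
```

Because the model has no interaction term, the gap is the same at every height, and it should match the smoker coefficient of about 0.006 liters.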