# A tibble: 654 × 5
     age   FEV height sex   smoker
   <int> <dbl>  <dbl> <chr> <chr>
 1     9  1.71   57   F     not
 2     8  1.72   67.5 F     not
 3     7  1.72   54.5 F     not
 4     9  1.56   53   M     not
 5     9  1.90   57   M     not
 6     8  2.34   61   F     not
 7     6  1.92   58   F     not
 8     6  1.42   56   F     not
 9     8  1.99   58.5 F     not
10     9  1.94   60   F     not
# ℹ 644 more rows
Does Smoking Reduce FEV?
A naive study:
mod_smoker <- CRDS |> model_train(FEV ~ smoker)
Graphical Point of View
CRDS |> point_plot(FEV ~ smoker, annot = "model")
Model Coefficients
mod_smoker |> coef()
 (Intercept) smokersmoker
   2.5661426    0.7107189
Taken at face value: smokers have an FEV that is 0.71 liters higher, on average, than non-smokers.
So does smoking actually improve lung-function?
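To read the two coefficients as group averages, here is a small sketch of the arithmetic. It assumes that "not" is the baseline level of smoker (so the intercept is the model value for non-smokers) and that the model is an ordinary linear model with this one two-level explanatory variable; the variable names are made up for illustration.

```r
# Coefficients copied from the coef() output shown above.
coefs <- c("(Intercept)" = 2.5661426, "smokersmoker" = 0.7107189)

# Assumption: "not" is the baseline level, so the intercept is the
# model value for non-smokers.
fev_not <- coefs[["(Intercept)"]]

# The smoker coefficient is the difference between the two groups,
# so the model value for smokers is intercept + difference.
fev_smoker <- coefs[["(Intercept)"]] + coefs[["smokersmoker"]]

round(c(not = fev_not, smoker = fev_smoker), 2)
```

Under those assumptions the model value is about 2.57 liters for non-smokers and about 3.28 liters for smokers. This describes a difference between the two groups, not necessarily an effect of smoking.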
Watch for Confounding Factors
Suppose you are studying the relationship between an explanatory variable X and a response variable Y. Suppose that Z is:
a third variable
that is associated with X and
is part of what causes Y.
Then we say that Z is a confounding variable in the study.
Correcting for Confounding Factors
Option 1: Rate-corrections often make sense (see textbook for examples).
Option 2: You can often correct by adding the possible confounders (or variables associated with them) to your model.
It turns out that we can implement Option 2 here.
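To see what adding a confounder to the model can do, here is a minimal simulation sketch in base R. Everything in it is made up for illustration (the variables x, y, z, the sample size, and the effect sizes); it does not use the CRDS data. By construction, z causes y and is associated with x, while x has no effect on y at all.

```r
set.seed(101)
n <- 2000

z <- rnorm(n)           # the confounder
x <- z + rnorm(n)       # x is associated with z
y <- 2 * z + rnorm(n)   # y is caused by z only; x plays no role

# Naive model: ignores the confounder.
naive <- coef(lm(y ~ x))[["x"]]

# Corrected model: the confounder is added as a second explanatory variable.
adjusted <- coef(lm(y ~ x + z))[["x"]]

round(c(naive = naive, adjusted = adjusted), 2)
```

In this setup the naive coefficient on x should land near 1 even though x does nothing, and the adjusted coefficient should land near 0. The correction works here because z itself is in the data.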
Size of Lungs
I bet that:
Children with larger lungs are also more likely to be smokers (because they are older).
Lung-size has a causal impact on FEV. (The bigger your lungs, the more air you can expel.)
So lung-size would be a confounding variable in the study.
Height and Age
Lung-size is not available in the data.
But height is available, and
it is probably associated with lung-size.
So let’s include height in the model as a “proxy” for lung-size.
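The proxy idea can be sketched with simulated data. The numbers below are invented for illustration and are not the CRDS measurements: lung_size is a hidden confounder, height is a noisy stand-in for it, and smoking is given no effect on FEV at all. The helper smoker_coef() is a hypothetical name, and base R's lm() is used so that the sketch runs without extra packages.

```r
set.seed(202)
n <- 2000

lung_size <- rnorm(n)                         # unobserved in the real data
height    <- lung_size + rnorm(n, sd = 0.5)   # noisy proxy for lung size
smoker    <- ifelse(lung_size + rnorm(n) > 1, "smoker", "not")
FEV       <- 2.5 + 0.8 * lung_size + rnorm(n, sd = 0.3)  # smoking effect set to zero

sim <- data.frame(FEV, smoker, height, lung_size)

# Hypothetical helper: fit a linear model and pull out the smoker coefficient.
smoker_coef <- function(f) coef(lm(f, data = sim))[["smokersmoker"]]

naive  <- smoker_coef(FEV ~ smoker)              # no correction
proxy  <- smoker_coef(FEV ~ smoker + height)     # correct with the proxy
oracle <- smoker_coef(FEV ~ smoker + lung_size)  # correct with the hidden variable

round(c(naive = naive, proxy = proxy, oracle = oracle), 2)
```

In this simulation the proxy-corrected coefficient should be smaller than the naive one, and the coefficient corrected with the hidden variable itself should sit near zero. With the actual data, the corresponding model would presumably follow the earlier pattern: CRDS |> model_train(FEV ~ smoker + height).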