Model Functions

(Lesson 11)

Packages

Make sure these are attached:

library(LSTbook)
library(mosaicData)

Pattern vs. Noise

A Pattern

One might speculate that males and females differ in their heights by some amount. Then we might offer a function for the height of a person:

\[\text{height} = a + b \times \left\{ \begin{array}{ll}0\ \text{when the person is female}\\1\ \text{when the person is male}\end{array} \right\} \]

  • \(a\) is the “baseline”: the height of a female
  • \(b\) is the extra height one gets by being male

With More Symbols:

\[f(x) = a + b \times I_{\text{male}}(x),\]

  • \(x\) is a person
  • \(f(x)\) is the height of a person \(x\)
  • \(a\) is the baseline height (for females)
  • \(I_{\text{male}}(x)\) is an “indicator” function:
    • 0 if \(x\) is female
    • 1 if \(x\) is male
  • \(b\) is the extra height \(x\) gets for being male

But …

… this is not realistic!

  • heights of females vary
  • heights of males vary

We would like to say …

\[\text{height} = a + b \times I_{\text{male}}(x) + \epsilon,\] where \(\epsilon\) is random “noise”.

A Very General Idea

\[\text{observed value} = f(x) + \epsilon,\]

where:

  • \(f(x)\) is the pattern in the process that produces the value we observe
  • \(\epsilon\) is noise:
    • the adjustment due to all factors unconnected to the pattern;
    • modeled as random.
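This decomposition can be sketched in a few lines of R. (A toy example, not from the lesson: the pattern `f` and the numbers here are made up.)

```r
# Toy pattern: a hypothetical straight-line relationship
f <- function(x) 2 * x + 1

x <- 1:5
noise <- rnorm(5, mean = 0, sd = 0.5)   # the random part
observed <- f(x) + noise                # observed value = pattern + noise
observed   # near 3, 5, 7, 9, 11, but jittered by the noise
```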

How Do We Model Noise?

We will learn several ways, but the most common noise-model is the family of normal probability distributions.

An example: run rnorm(1, mean = 0, sd = 3) repeatedly; each run gives a different draw of the noise.

A Thousand Tries

data.frame(epsilon = rnorm(1000, sd = 3)) |> 
  point_plot(epsilon ~ 1, annot = "violin")

Variability of the Noise

rnorm(1, mean = 0, sd = 3)

In the code above,

  • mean = 0 means that the mean of a very large number of trials should be around 0.
  • sd = 3 means that the standard deviation of many trials should be around 3. (So the variance would be \(\approx\) 9.)

Math Fact: The larger the number of trials, the closer the mean and standard deviation of the trials are liable to be to their respective targets.
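A quick base-R check of this fact (the seed and sample sizes are arbitrary choices):

```r
set.seed(101)                    # arbitrary seed, for reproducibility
few  <- rnorm(10,     mean = 0, sd = 3)
many <- rnorm(100000, mean = 0, sd = 3)
c(mean(few),  sd(few))           # liable to be noticeably off from 0 and 3
c(mean(many), sd(many))          # should be very close to 0 and 3
```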

A Data-Simulator

a <- 66 # average female height
b <- 6 # bump in height for being male
sd_noise <- 3
height_sim <- datasim_make(
  sex <- categorical(n, female = 0.5, male = 0.5, exact = FALSE),
  .noise <- rnorm(n, mean = 0, sd = sd_noise),
  height <- cat2value(sex, female = a, male = a + b) + .noise
)
height_sim |>
  take_sample(n = 100)
# A tibble: 100 × 2
   sex    height
   <chr>   <dbl>
 1 female   63.0
 2 female   67.2
 3 female   66.5
 4 female   62.9
 5 male     73.4
 6 female   70.5
 7 female   63.9
 8 male     79.2
 9 male     70.3
10 female   64.4
# ℹ 90 more rows

Try It Yourself

Summaries

Try this a few times:

The Concept of a Model

Processes That Generate Data

When we collect data, we are tapping into some real-world process that generates the data.

Example

We want to study differences in height between GC males and females, so we take a sample of students and measure their heights.

There is a process at work here:

  • It starts with the current population of GC students. (Though this set could change a bit during sampling.)
  • We sample 100 of them. (Maybe we try to sample in a way such that any set of 100 students is equally likely to be the sample selected.)
  • With each sampled student, we fumble around with rulers, measuring tapes, etc., and get a measurement of height.
  • We also ask each student for their sex.

Features

The process has many features. The three that are most relevant to our research question are:

  • The mean height we would get if we had sampled all the female students.
  • The mean height of all the male students.
  • Noise: the way the heights of students vary from student to student in ways that are unconnected to differences in sex.

Sources of the noise are many:

  • genes related to height that one gets from one’s mother
  • or from one’s father;
  • one’s diet as a child;
  • whether or not one ever got stretched on a rack;
  • variation in the measurement of one’s height (each time you use the ruler on the same person, you would get a slightly different reading);
  • many other factors, insofar as they are unaccounted for by one’s sex.

We don’t have to know or care what they might be. But they are features of the data-generating process that combine to make the noise.

In Reality …

… the process that generates the data isn’t a computer simulator!

(After all, a computer simulator is an object produced by the R programming language.)

And unlike a simulator, there need not be any fixed values associated with the process.

  • such as, e.g., the mean height of all female students. (The set of female students changes a bit during the sampling period, and presumably their heights change a bit during this time as well.)

But …

… it’s often reasonable to model the real-world process as working like a simulator.

But of course …

… we don’t have values for the parameters:

  • the baseline a
  • the bump for being male b
  • the variability of the noise sd_noise

So we don’t know which simulator the process works “most like”.

Mathematically …

… the model is:

\[\text{height} = a + b \times I_{\text{male}}(x) + \epsilon,\] where \(a\) and \(b\) are unknown constants and \(\epsilon\) has a normal distribution centered on 0 with an unknown standard deviation \(\sigma\).

In Computer Terms …

… the model is not a single simulator, but instead is a function that one could use to make simulators:

height_model <- function(a, b, sd_noise, p) {
  datasim_make(
    sex <- categorical(n, female = p, male = 1 - p, exact = FALSE),
    .noise <- rnorm(n, mean = 0, sd = sd_noise),
    height <- cat2value(sex, female = a, male = a + b) + .noise
  )
}

Some Particular Parameters

Our notion is that “reality” plugged some unknown parameters into height_model() to build the data-generating process:

height_sim <- 
  height_model(
    a = 66,        # statisticians don't know this (but wish they did)
    b = 5,         # same for this
    sd_noise = 3,  # same for this
    p = 0.5        # statisticians don't know this, and don't care 
  )

Then the process of gathering data is modeled as:

height_sim |> 
  take_sample(n = whatever_sample_size)

Statisticians See Data

the_data <-
  height_sim |> 
  take_sample(n = 100)
the_data
# A tibble: 100 × 2
   sex    height
   <chr>   <dbl>
 1 male     65.0
 2 male     70.7
 3 female   60.5
 4 female   62.4
 5 female   68.3
 6 female   66.0
 7 female   64.1
 8 female   68.9
 9 male     67.1
10 female   67.2
# ℹ 90 more rows

They See the Pattern + Noise …

… but they do not get to see the noise directly. (With the simulator we could look at the noise if we wanted to; with real data we cannot.)

Statisticians must use the data to estimate the unknown parameters.

Training a Model

They get their estimates by training the model:

the_data |> 
  model_train(height ~ sex)
A trained model relating the response variable "height"
to explanatory variable "sex".

To see relevant details, use model_eval(), conf_interval(),
R2(), regression_summary(), anova_summary(), or model_plot(),
or the native R model-reporting functions.

Give it a Name

mod_height <-
  the_data |> 
  model_train(height ~ sex)

Training

Training the model means:

using the data to find values for the parameters such that, when these values are plugged into the height_model() function, the resulting data-simulator makes the given data more likely than the simulator built from any other parameter values.
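For a normal noise model, “more likely” works out to least squares: the trained parameter values minimize the sum of squared residuals. Here is a sketch using base R’s lm() in place of model_train(), on simulated data (the parameter values 66, 5, and 3 are just for illustration):

```r
# Simulate data from a known process: baseline 66, bump 5, noise sd 3
sex <- rep(c("female", "male"), each = 50)
height <- 66 + 5 * (sex == "male") + rnorm(100, mean = 0, sd = 3)

fit <- lm(height ~ sex)   # training: finds the most-likely a and b
coef(fit)                 # estimates of a (Intercept) and b (sexmale)

# No other choice of parameters gives a smaller sum of squared residuals:
sum(residuals(fit)^2) <= sum((height - (65 + 6 * (sex == "male")))^2)  # TRUE
```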

Terminology Note: Most people use the term “model” to refer not only to the hypothesized simulator-making function height_model() (or, in mathematical terms, the model equation), but also to the object mod_height obtained by training the model on the data.

Extract Information from the Trained Model

# estimates of a and b
mod_height |>  coef()
(Intercept)     sexmale 
  65.756052    5.218401 
# estimate of the noise standard deviation, sigma
mod_height |> sigma()
[1] 2.520942

These are the values obtained in the training.

  • They probably don’t equal the parameter values of the simulator that the data-generating process is imagined to work like. (People often call these simulator values the “true” values of the parameters.)
  • But they are best guesses, based on the given data.

Model Coefficients

Model Coefficients

These are the estimates of the parameters found by training the model. (The estimate of the noise variability, \(\sigma\), is reported separately.)

What Counts as a Parameter?

height_model <- function(a, b, sd_noise, p) {
  datasim_make(
    sex <- categorical(n, female = p, male = 1 - p, exact = FALSE),
    .noise <- rnorm(n, mean = 0, sd = sd_noise),
    height <- cat2value(sex, female = a, male = a + b) + .noise
  )
}
  • Primary parameters are a and b. (Statisticians want to estimate them.)
  • sd_noise is a nuisance parameter. (Statisticians might not wish to know it, but need to include it for their models to be realistic.)
  • p is not a parameter in the model. (It represents a feature of the data-generating process in which we are not currently interested.)

Modelling Height

Back to some familiar data:

Galton |> head(10)

A Model

We have in mind to model height based on mother, father, and sex:

Model equation:

\[\text{height} = a + b_1\times x_{\text{mother}} + b_2\times x_{\text{father}} + b_3\times I_{\text{male}}(x) + \epsilon.\]

We don’t know the parameters …

Training

… so we use the data to train the model:

mod_height <-
  Galton |> 
  model_train(height ~ mother + father + sex)
mod_height
A trained model relating the response variable "height"
to explanatory variables "mother" & "father" & "sex".

To see relevant details, use model_eval(), conf_interval(),
R2(), regression_summary(), anova_summary(), or model_plot(),
or the native R model-reporting functions.

Model Coefficients

We get estimates of the parameters:

mod_height |> 
  coef()
(Intercept)      mother      father        sexM 
 15.3447600   0.3214951   0.4059780   5.2259513 

What do these numbers mean?

Evaluating the Response Variable …

… when the mother is 62 inches tall.

mod_height |> 
  model_eval(
    mother = 62,
    father = 69,
    sex = "F",
    interval = "none"
  )
  mother father sex  .output
1     62     69   F 63.28994

Evaluating the Response Variable …

… when the mother is 63 inches tall.

mod_height |> 
  model_eval(
    mother = 63,
    father = 69,
    sex = "F",
    interval = "none"
  )
  mother father sex  .output
1     63     69   F 63.61144

Evaluating the Response Variable …

… using several heights, going up one inch at a time:

mod_height |> 
  model_eval(
    mother = c(62, 63, 64, 65),
    father = 69,
    sex = "F",
    interval = "none"
  )
  mother father sex  .output
1     62     69   F 63.28994
2     63     69   F 63.61144
3     64     69   F 63.93293
4     65     69   F 64.25443

Compare with the coefficient for mother:

   mother 
0.3214951 

Evaluating the Response Variable …

… using several father-heights, going up one inch at a time:

mod_height |> 
  model_eval(
    mother = 62,
    father = c(69, 70, 71, 72),
    sex = "F",
    interval = "none"
  )
  mother father sex  .output
1     62     69   F 63.28994
2     62     70   F 63.69592
3     62     71   F 64.10190
4     62     72   F 64.50788

Compare with the coefficient for father:

  father 
0.405978 

The Pattern

When the explanatory variable is numerical:

the evaluation for the response variable changes by the coefficient each time you increase the value of the explanatory variable by one unit (while holding the other explanatory variables constant).
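The same pattern can be checked with base R’s lm() and predict() (assuming the Galton data from mosaicData is attached, as above): the difference between predictions one unit apart reproduces the coefficient exactly.

```r
mod <- lm(height ~ mother + father + sex, data = Galton)
p <- predict(mod, newdata = data.frame(
  mother = c(62, 63), father = 69, sex = "F"))
diff(p)               # the change per extra inch of mother's height ...
coef(mod)["mother"]   # ... is exactly the mother coefficient
```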

What If …

… the explanatory variable is categorical?

Evaluating the Response Variable …

… when the child is female:

mod_height |> 
  model_eval(
    mother = 62,
    father = 69,
    sex = "F",
    interval = "none"
  )
  mother father sex  .output
1     62     69   F 63.28994

Evaluating the Response Variable …

… when the child is male:

mod_height |> 
  model_eval(
    mother = 62,
    father = 69,
    sex = "M",
    interval = "none"
  )
  mother father sex  .output
1     62     69   M 68.51589

Evaluating the Response Variable …

… at both values of sex:

mod_height |> 
  model_eval(
    mother = 62,
    father = 69,
    sex = c("F", "M"),
    interval = "none"
  )
  mother father sex  .output
1     62     69   F 63.28994
2     62     69   M 68.51589

Compare with the coefficient for sexM:

    sexM 
5.225951 

Pattern

The coefficient sexM is the change in the response variable when you switch from female to male, holding the other explanatory variables constant.

But why is there no sexF coefficient? To answer this we must investigate further.

A Zero-Level Child

What does the model say the height will be if:

  • the mother is 0 inches
  • the father is 0 inches
  • the child is female?
mod_height |> 
  model_eval(
    mother = 0,
    father = 0,
    sex = "F",
    interval = "none"
  )
  mother father sex  .output
1      0      0   F 15.34476

Wait a Minute …

We’ve seen that number before:

(Intercept) 
   15.34476 

The Intercept

This is what the model evaluates the response variable to be, when:

  • all numerical explanatory variables are set to 0
  • all categorical explanatory variables are set to their “baseline” levels

(“F” comes before “M” in the alphabet, so female was chosen for the baseline.)
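In base R terms, the intercept is just the model’s prediction at the all-zero, all-baseline input:

```r
mod <- lm(height ~ mother + father + sex, data = Galton)
predict(mod, newdata = data.frame(mother = 0, father = 0, sex = "F"))
coef(mod)["(Intercept)"]   # the same number
```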

The Residuals

mod_height |> residuals()
           1            2            3            4            5            6 
-0.780160366  0.445790944  0.245790944  0.245790944  0.898521277 -0.101478723 
           7            8            9           10           11           12 
-1.875527412 -1.875527412 -0.594751872  1.631199438 -1.094751872 -3.094751872 
 ... (one residual for each row of Galton; remaining output omitted)
         787          788          789          790          791          792 
 1.945749233 -0.554250767 -0.622443522 -0.822443522 -2.822443522  2.403507788 
         793          794          795          796          797          798 
 2.403507788  0.403507788  2.338304045  0.338304045  1.064255356  0.064255356 
         799          800          801          802          803          804 
 0.064255356  0.064255356  0.064255356  2.862788193  1.345030896  0.345030896 
         805          806          807          808          809          810 
-1.654969104 -1.429017794 -1.654969104  1.666526031  0.666526031  0.166526031 
         811          812          813          814          815          816 
-0.333473969 -0.633473969 -0.833473969 -1.333473969  3.892477341  2.892477341 
         817          818          819          820          821          822 
 1.892477341  0.892477341 -0.107522659 -1.107522659 -3.107522659 -4.107522659 
         823          824          825          826          827          828 
-1.999457025 -2.499457025 -2.499457025  1.726494286 -0.273505714 -5.177961890 
         829          830          831          832          833          834 
 0.047989421  0.047989421 -2.952010579  1.143533245  0.143533245  0.143533245 
         835          836          837          838          839          840 
 2.369484556  2.369484556 -0.630515444 -0.630515444 -1.630515444 -3.630515444 
         841          842          843          844          845          846 
 2.682785678  2.708736988  1.343533245  1.143533245  0.143533245 -0.156466755 
         847          848          849          850          851          852 
 1.869484556 -2.130515444 -0.534971619 -0.534971619  1.690979691  1.690979691 
         853          854          855          856          857          858 
 0.190979691 -1.809020309  2.786523516  1.012474826 -1.213476484 -1.213476484 
         859          860          861          862          863          864 
 1.012474826  3.583534503  3.583534503  1.583534503  1.309485814  1.548019908 
         865          866          867          868          869          870 
 1.548019908  0.548019908  0.548019908  0.773971219  4.371006406  0.871006406 
         871          872          873          874          875          876 
 3.596957716  2.096957716  0.096957716  0.096957716 -0.903042284  3.371006406 
         877          878          879          880          881          882 
 0.871006406  5.096957716  3.096957716  2.096957716  2.096957716 -1.903042284 
         883          884          885          886          887          888 
-2.307498459  0.156986946 -0.617061743 -2.960027814  0.265923497 -0.734076503 
         889          890          891          892          893          894 
 0.301468579 -3.972580111  2.722610157  1.222610157 -0.577389843 -0.777389843 
         895          896          897          898 
-1.577389843 -0.051438533 -0.551438533 -1.051438533 

Error Sum-of-Squares (ESS)

resids <- mod_height |> residuals()
ess <- sum(resids^2)
ess
[1] 4149.162

In linear regression, “training” the model means choosing the coefficients so that the error sum-of-squares (ESS) is as small as possible.
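To see what “as small as possible” means, here is a minimal base-R sketch (using made-up simulated data, not Galton): the coefficients that `lm()` returns give a smaller ESS than any perturbed pair of coefficients.

```r
# Simulate data from a known straight-line pattern plus normal noise
set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100, sd = 1)

# ESS for a candidate baseline b and slope a
ess <- function(b, a) sum((y - (b + a * x))^2)

fit <- lm(y ~ x)          # least-squares training
b_hat <- coef(fit)[[1]]
a_hat <- coef(fit)[[2]]

ess(b_hat, a_hat)         # the minimum achievable ESS
ess(b_hat + 0.5, a_hat)   # nudging either coefficient makes ESS larger
ess(b_hat, a_hat + 0.5)
```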

Aligning

Minimizing the ESS is what our author is thinking about when he says:

“model_train() finds numerical values for the coefficients that cause the model function to align as closely as possible to the data.”

These are also the coefficients that, if used to build a data-simulator, would make the observed data most likely.
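A minimal sketch of that claim, assuming normal noise with a known standard deviation (again with simulated data, not Galton): the least-squares coefficients also give the observed data a higher log-likelihood than any perturbed coefficients do.

```r
# Simulate data: straight-line pattern plus normal noise with sd = 3
set.seed(2)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50, sd = 3)

# Log-likelihood of the observed y's under candidate coefficients,
# treating the noise as normal with the (known) sd
loglik <- function(b, a, sigma = 3) {
  sum(dnorm(y, mean = b + a * x, sd = sigma, log = TRUE))
}

fit <- lm(y ~ x)
loglik(coef(fit)[[1]], coef(fit)[[2]])       # highest
loglik(coef(fit)[[1]] + 1, coef(fit)[[2]])   # lower
```

With normal noise, the log-likelihood is a constant minus ESS divided by \(2\sigma^2\), so minimizing ESS and maximizing likelihood pick out the same coefficients.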

Try It Out!

Find the Model Coefficients

A Peek at Logistic Regression

Categorical Response

If the response variable is categorical, it must have exactly two levels.

Sex Modelled by Height

mod_sex <-
  Galton |> 
  model_train(sex ~ height)
mod_sex
A trained model relating the response variable "sex"
to explanatory variable "height".

To see relevant details, use model_eval(), conf_interval(),
R2(), regression_summary(), anova_summary(), or model_plot(),
or the native R model-reporting functions.

Some Evaluations

mod_sex |> 
  model_eval(
    height = c(62, 63, 64, 65, 66, 67),
    interval = "none"
  )
  height    .output
1     62 0.02709270
2     63 0.05818404
3     64 0.12053506
4     65 0.23316108
5     66 0.40282177
6     67 0.59943330

The model gives the probability that a person is male, based on their height.

The Sigmoid Curve

Code
mod_sex_evals <-
  mod_sex |> 
  model_eval(
    height = seq(58, 74, by = 0.1),
    interval = "none"
  )

mod_sex_evals |> 
  point_plot(
    .output ~ height
  )

The Sigmoid Band

Galton |> 
  point_plot(
    sex ~ height,
    annot = "model"
  )

Coefficients

mod_sex |> 
  coef()
(Intercept)      height 
-52.9842242   0.7968258 

Later on we will learn how the sigmoid curve is made from the coefficients.
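As a quick preview, here is a hand-built sketch in base R. The coefficient values are copied from the output above; the linear part, intercept plus slope times height, is passed through the sigmoid function \(1/(1 + e^{-t})\), which reproduces the model_eval() probabilities.

```r
# Coefficients copied from the mod_sex output above
b0 <- -52.9842242
b1 <- 0.7968258

# The sigmoid squeezes any number into the interval (0, 1)
sigmoid <- function(t) 1 / (1 + exp(-t))

sigmoid(b0 + b1 * 66)   # about 0.403, matching model_eval() at height 66
```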