What is the correct way to add the bias term to the residuals of a linear regression model?

First, I fit a linear model:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$

Now I want to visualize $y$ after the effects of $x_1$ and $x_2$ have been removed (adjusted for). Using only the residuals $\epsilon$, I can visualize the relationship between $y$ and $x_1$ or $x_2$. The problem is that I want to add the bias term back to the residuals. For example, I want to plot the adjusted $y$ as a boxplot against another independent variable (e.g. diagnostic group).

For now, I am adjusting for the effects of $x_1$ and $x_2$ on $y$ as follows:

$y_a = y - \beta_1(x_1 - \bar{x_1}) - \beta_2(x_2 - \bar{x_2}) \qquad (1)$
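A minimal NumPy sketch of equation (1) on simulated data (the data-generating coefficients, sample size and seed are assumptions for illustration only). It fits the model by ordinary least squares, computes $y_a$, and checks numerically that the result has the "bias + residuals" form of equation (2).

```python
import numpy as np

# Simulated data (an assumption for illustration): y = 2 + 0.5*x1 - 1.5*x2 + noise
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(10, 2, n)
x2 = rng.normal(5, 1, n)
y = 2.0 + 0.5 * x1 - 1.5 * x2 + rng.normal(0, 1, n)

# Fit y = beta_0 + beta_1*x1 + beta_2*x2 by ordinary least squares
X = np.column_stack([np.ones(n), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ np.array([b0, b1, b2])

# Equation (1): subtract each covariate's deviation from its own mean
y_a = y - b1 * (x1 - x1.mean()) - b2 * (x2 - x2.mean())

# Equation (2): the same values written as a constant "bias" plus the residuals
bias = b0 + b1 * x1.mean() + b2 * x2.mean()
assert np.allclose(y_a, bias + resid)

# With an intercept in the model, the OLS fit passes through the means,
# so the bias term equals the sample mean of y
assert np.isclose(bias, y.mean())
```

`y_a` stays on the original scale of $y$, so it can be grouped by a variable such as diagnostic group and drawn as a boxplot.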

Here, for a single data point, I define the effect of $x_1$ as the change in $y$ caused by the deviation of $x_1$ from its mean $\bar{x_1}$.

After a few algebraic manipulations:

$y_a = (\beta_0 + \beta_1 \bar{x_1} + \beta_2 \bar{x_2}) + \epsilon = \text{bias} + \text{residuals} \qquad (2)$
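The algebra between (1) and (2) can be written out by substituting the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$ into (1), reading $\beta$ and $\epsilon$ as the fitted coefficients and residuals:

```latex
\begin{aligned}
y_a &= y - \beta_1(x_1 - \bar{x_1}) - \beta_2(x_2 - \bar{x_2}) \\
    &= (\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon) - \beta_1 x_1 + \beta_1 \bar{x_1} - \beta_2 x_2 + \beta_2 \bar{x_2} \\
    &= (\beta_0 + \beta_1 \bar{x_1} + \beta_2 \bar{x_2}) + \epsilon
\end{aligned}
```

Because the residuals of an OLS fit with an intercept sum to zero, averaging the fitted model over the sample gives $\bar{y} = \beta_0 + \beta_1 \bar{x_1} + \beta_2 \bar{x_2}$, so the bias term in (2) is just the sample mean of $y$.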

I am not 100% convinced by this technique myself. However, this article also uses it for covariate adjustment (see Equation 1 on page 728).

Question 1: Is this technique correct, and why or why not? Equivalently, in terms of equation (2): is it correct to add the bias term $(\beta_0 + \beta_1 \bar{x_1} + \beta_2 \bar{x_2})$ to the residuals?

Now assume the above adjustment technique is correct, and suppose $x_1$ is a categorical variable with more than two levels. How do I calculate the mean of $x_1$ ($\bar{x_1}$)?

How would one calculate the mean of a categorical variable at all? Strictly speaking, it doesn't even make sense to compute the mean, or similar summary statistics, of a categorical variable. Is there any workaround for this?
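One common workaround, sketched here under the assumption that the categorical variable enters the model as dummy (indicator) columns: the "mean" of the variable is then the column means of those dummies, i.e. the sample proportions of each non-reference level, and $\beta_1 \bar{x_1}$ becomes the proportion-weighted sum of the level coefficients. The data, level labels and seed below are hypothetical.

```python
import numpy as np

# Hypothetical data: a 3-level categorical covariate x1_cat plus a numeric x2
rng = np.random.default_rng(1)
levels = np.array(["A", "B", "C"])
n = 300
x1_cat = rng.choice(levels, size=n, p=[0.5, 0.3, 0.2])
x2 = rng.normal(0, 1, n)
level_effect = {"A": 0.0, "B": 1.0, "C": -2.0}
y = 3.0 + np.array([level_effect[v] for v in x1_cat]) + 0.8 * x2 + rng.normal(0, 1, n)

# Dummy (treatment) coding with "A" as the reference level
D = np.column_stack([(x1_cat == lev).astype(float) for lev in levels[1:]])
X = np.column_stack([np.ones(n), D, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
b_cat, b_x2 = beta[1:3], beta[3]

# The "mean" of the categorical variable: column means of its dummies,
# which are the sample proportions of levels B and C
D_bar = D.mean(axis=0)

# Equation (1) with the dummy block centred at those proportions
y_a = y - (D - D_bar) @ b_cat - b_x2 * (x2 - x2.mean())
```

Using the observed proportions keeps the bias term equal to $\bar{y}$; weighting the levels equally instead is another convention, and it gives a different constant when the levels are unbalanced.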

Cross Validated Asked by gruangly on November 14, 2021


One Answer

I think I am confused by the way you are using the word "bias". It seems like they adjusted HCV for head size by fitting a linear regression model. So for people with above-average head size they artificially reduced HCV, and for people with below-average head size they increased (adjusted) it, in order to reduce the variance caused by head size.

Then they used HCVadj as a factor to model dementia status.

The reason they adjust beforehand is that they want to compute Cohen's d, which uses the mean HCV in the demented and non-demented populations (there is no room for head size or other external factors in that calculation). So now they have a Cohen's d adjusted for head size. I see no problem here, but I am no expert on Cohen's d.
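A sketch of that two-step procedure, with hypothetical variable names (`head_size`, `hcv`, `demented`) and simulated data. It assumes the adjustment slope comes from a simple regression of HCV on head size over the whole sample, which may not match the article exactly, and uses the standard pooled-standard-deviation form of Cohen's d.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical data: HCV depends on head size and on group membership
rng = np.random.default_rng(2)
n = 200
head_size = rng.normal(1500, 100, n)
demented = rng.random(n) < 0.4
hcv = 1.0 + 0.002 * head_size - 0.3 * demented + rng.normal(0, 0.2, n)

# Step 1: regress HCV on head size and remove the head-size deviation
# from its mean, keeping HCV on its original scale
X = np.column_stack([np.ones(n), head_size])
slope = np.linalg.lstsq(X, hcv, rcond=None)[0][1]
hcv_adj = hcv - slope * (head_size - head_size.mean())

# Step 2: Cohen's d on the adjusted values (demented minus non-demented)
d_adj = cohens_d(hcv_adj[demented], hcv_adj[~demented])
```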

If you have a categorical predictor, you can accomplish the same thing with unbalanced effect coding (see here: What is effect coding?). Include the 4-level class variable in the model using 4 dummy variables ($x_{21}, x_{22}, \dots$) coded as shown on that page.

Fit the model $y = \beta_1(x_1 - \bar{x_1}) + \beta_2 x_{21} + \beta_3 x_{22} + \beta_4 x_{23} + \beta_5 x_{24}$.
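A minimal sketch of weighted (unbalanced) effect coding in NumPy. It assumes the common form with $k-1$ coded columns plus an intercept rather than the four-column layout written above; the helper `weighted_effect_code` and the data are hypothetical, and the coding on the linked page may differ in detail. The relevant property is that every coded column has mean zero, so its "mean" in equation (1) is 0 and no separate centering is needed.

```python
import numpy as np

def weighted_effect_code(labels, reference):
    """Weighted (unbalanced) effect coding of a categorical variable.

    Returns one column per non-reference level: 1 for rows in that level,
    -n_level / n_reference for rows in the reference level, 0 otherwise.
    Every column therefore has mean zero.
    """
    labels = np.asarray(labels)
    others = [lev for lev in np.unique(labels) if lev != reference]
    n_ref = np.sum(labels == reference)
    cols = []
    for lev in others:
        col = (labels == lev).astype(float)
        col[labels == reference] = -np.sum(labels == lev) / n_ref
        cols.append(col)
    return np.column_stack(cols), others

# Hypothetical unbalanced 4-level factor and outcome
rng = np.random.default_rng(3)
labels = rng.choice(["a", "b", "c", "d"], size=400, p=[0.4, 0.3, 0.2, 0.1])
shift = {"a": 0.0, "b": 1.0, "c": -1.0, "d": 2.0}
y = 5.0 + np.array([shift[v] for v in labels]) + rng.normal(0, 1, 400)

# Fit y on an intercept plus the effect-coded columns
E, others = weighted_effect_code(labels, reference="d")
X = np.column_stack([np.ones(len(y)), E])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
```

With this coding the intercept equals the grand mean of $y$, and each coefficient is that level's mean minus the grand mean.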

Calculate $y_a$ for each individual as in your equation (1) (without the intercept), then use the $y_a$ values to compute Cohen's d. If you are not computing Cohen's d or a similar statistic that compares two groups, you don't need this method. Maybe you can find another approach that takes the other factors into account directly?

Answered by Derrick Kaufman on November 14, 2021


© 2021 All rights reserved.