How to test if sum of two coefficients of ols model is greater than zero using R?

The regression model is:

y = b0 +b1x1 + b2x2 + b3x3 + e

I want to test if b1 + b2 > 0.

the R package car has a function linearHypothesis that can test if b1 + b2 = 0 by

linearHypothesis(lm_model, "b1 + b2 = 0")

but this is a two-sided test. I want a one-sided test.

linearHypothesis(lm_model,"b1 + b2 > 0") can’t work, it seems that linearHypothesis can only do two-sided test. Are there some R packages can do this one-sided test?

Cross Validated Asked on November 21, 2021

3 Answers

3 Answers

Using multcomp

The multcomp package offers the capability of testing linear equalities or inequalities of coefficients. Here, I'll use the same data generated by @conjugateprior to illustrate the package:


# Generate data and fit model

d <- data.frame(X1 = rnorm(500), X2 = rnorm(500), X3 = rnorm(500))
d$Y <- 1 + d$X1 * -1 + d$X2 * 0.9 + d$X3*0.2 + rnorm(500)

mod0 <- lm(Y ~ X1 + X2 + X3, data = d)

# Specify the linear hypothesis
glht_mod <- glht(
  model = mod0
  , linfct = c("X1 + X2 >= 0")

# Inspect summary and confidence interval

Which produces the following output (truncated).

Linear Hypotheses:
             Estimate Std. Error t value Pr(<t)
X1 + X2 >= 0 -0.07211    0.06284  -1.148  0.126

Linear Hypotheses:
             Estimate lwr      upr     
X1 + X2 >= 0 -0.07211     -Inf  0.0314

We see that the p-value for the null hypothesis that $beta_{X1} + beta_{X2} geq 0$ is $0.126$. So the test provides little evidence against the null hypothesis.

The one-sided 95%-confidence interval is $(-infty, 0.0314]$ so the data is compatible at the 95%-level that the sum of the two coefficients is up to $0.03144$. The upper limit is larger than $0$ so the data indicate that $0$ is among the plausible values for the sum. This confidence limits is in good agreement with the confidence interval obtained by @conjugateprior using the bootstrap.

Using restriktor

The restriktor package allows you to fit linear equality or inequality restrictions about parameters and effects. The packages has its own website with a tutorial and other explanations. Again, using the same data as above:


# Generate data

d <- data.frame(X1 = rnorm(500), X2 = rnorm(500), X3 = rnorm(500))
d$Y <- 1 + d$X1 * -1 + d$X2 * 0.9 + d$X3*0.2 + rnorm(500)

# Fit the model
mod0 <- lm(Y ~ X1 + X2 + X3, data = d)

Now we'll have to specify the constraints, which we can do easily using text-based description (important: the constraint syntax has to be enclosed within single quotes). Then, refit the model using the restriktor function:

myconstr <- '(X1 + X2) > 0'

constr_mod <- restriktor(
  object = mod0
  , constraints = myconstr 

To test the hypothesis, let's use the iht function:


This generates the following output (truncated):

Overview of all available hypothesis tests:

Global test: H0: all parameters are restricted to be equal (==)
         vs. HA: at least one inequality restriction is strictly true (>)
       Test statistic: 818.0010,   p-value: <0.0001

Type A test: H0: all restrictions are equalities (==) 
         vs. HA: at least one inequality restriction is strictly true (>)
       Test statistic: 0.0000,   p-value: 1

Type B test: H0: all restrictions hold in the population
         vs. HA: at least one restriction is violated
       Test statistic: 1.3169,   p-value: 0.1259

Type C test: H0: at least one restriction is false or active (==) 
         vs. HA: all restrictions are strictly true (>)
       Test statistic: -1.1475,   p-value: 0.8741

The relevant hypothesis in your case is B, which features a p-value of $0.1259$ and thus offers little evidence against the null hypothesis that $beta_{X1} + beta_{X2} > 0$.

The package offers several different methods to calculate the standard errors of the coefficients, including the bootstrap or heteroscedasticity-consistent standard errors.

Answered by COOLSerdash on November 21, 2021

Doubtless there are packages to do it, but you could also try using the bootstrap, which will work for nearly arbitrary contrasts. Here's a simple implementation that gets a bootstrap sampling distribution for b2 + b1, from which the p value can be computed by asking what proportion of estimates are positive. To change the statistic(s) you're interested just change the second line of the stat function and rerun.

We'll first set up some fake data


d <- data.frame(X1 = rnorm(500), X2 = rnorm(500), X3 = rnorm(500))
d$Y <- 1 + d$X1 * -1 + d$X2 * 0.9 + d$X3*0.2 + rnorm(500)

noting that the 'true' population value is -0.1.

Now fit a regular (correctly specified, in this case) model and get a point estimate for b2 + b1.

mod0 <- lm(Y ~ X1 + X2 + X3, data = d)
cfs <- coef(mod0)
cfs['X2'] + cfs['X1'] # About -0.072

That's the statistic we're interested in getting a sampling distribution and thereby a p-value for, so we wrap up this process in a function:

stat <- function(dat, inds){
  cfs <- coef(lm(Y ~ X1 + X2 + X3, data = dat[inds,]))
  cfs['X2'] + cfs['X1'] # return the statistic in this sample

where inds are the indices of rows in dat in a single bootstrap sample. The boot function will provide those, so we just have to be sure to use them in the call to lm.

Now to get a lot of bootstrap samples of b2 + b1:


cores <- parallel::detectCores() # because we are impatient
res <- boot(d, stat, R = 10000, parallel = "multicore", ncpus = cores)

(If the parallelization has problems then just remove the parallel and ncpus arguments to boot and be prepared to wait a bit longer). For more precision, increase R, the number of (re)samples.

Anyway, here's the whole bootstrap sampling distribution

hist(res$t, breaks = 100, xlab = "Estimated b2 + b1",
     main = "Bootstrap Sampling Distribution of b2 + b1")

Now we might ask how often b2 + b1 is greater than 0 (a.k.a. the bootstrap one-sided p-value)

sum(as.numeric(res$t > 0)) / length(res$t) # about 15% of the time

Alternatively, we might ask whether 0 is outside the lower 95% of estimates

quantile(as.numeric(res$t), prob = 0.95) # about .038 which is greater than 0 obvs.

or we could just use the the bootstrap samples to get a standard error and do things more parametrically, but in any case it looks like we should 'fail to reject' that b2 + b1 > 0.

This answer has been a bit fast and loose on the inferential side of things, e.g. glossing over some bootstrap decisions, with the aim of getting the machinery in place. Hereafter, and as noted above, you can just keep adjusting stat to get bootstrap sampling distributions for increasingly weird and wonderful quantities. Though if you do that you might want to read up on the limits, assumptions, and strategies of bootstrapping.

Answered by conjugateprior on November 21, 2021

You could bootstrap this. Example:

fit <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris)

sum(coef(fit)[c("Petal.Length", "Petal.Width")])

bootfun <- function(DF, ind, origfit) {
  #bootstrap residuals:
  DF$Sepal.Length <- fitted(origfit) + residuals(origfit)[ind]

  fit <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = DF)

  sum(coef(fit)[c("Petal.Length", "Petal.Width")])

myboot <- boot(iris, bootfun, 9999, origfit = fit)

mean(myboot$t <= 0)
#[1] 0.00970097
#--> reject H0: alpha_1 + alpha_2 <= 0

Answered by Roland on November 21, 2021

Add your own answers!

Related Questions

Ask a Question

Get help from others!

© 2021 All rights reserved.