# Assumptions of OLS and linear mixed models

I’ve only taken a few statistics courses, so I apologize if any of my questions are rudimentary. Could someone explain, or direct me to resources on, the correct process for testing model assumptions and fitting models, and the consequences of not meeting assumptions such as normality, homoscedasticity, etc.?

My impression is that it’s important to meet model assumptions because otherwise the mathematics of the models will not work. However, this link caught my eye: it seems to explain that you can still estimate parameters when assumptions fail, and that although this is a problem for hypothesis testing (does that mean the p-values are unreliable?), you can get around it with bootstrapping.
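To make the bootstrapping idea concrete, here is a minimal sketch of a case-resampling bootstrap for an OLS slope (using simulated data I made up for illustration, not anything from the linked post): refit the model on many resampled datasets and take the percentile spread of the estimates as a confidence interval, rather than relying on normality of the errors.

```python
# Case-resampling bootstrap of an OLS slope. Hypothetical simulated data
# with heavy-tailed (non-normal) errors; NumPy only.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.standard_t(df=3, size=n)  # heavy-tailed errors

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)

boot = np.empty(2000)
for b in range(boot.size):
    idx = rng.integers(0, n, n)          # resample rows with replacement
    boot[b] = ols_slope(x[idx], y[idx])

lo, hi = np.percentile(boot, [2.5, 97.5])  # 95% percentile interval
print(f"slope = {ols_slope(x, y):.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
```

As I understand it, the appeal is that the interval comes from the empirical resampling distribution, so it doesn’t require the residuals to be normal.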

Is this true both in OLS and in linear mixed-effects modeling, where you also have to account for random effects? Additionally, someone suggested to me that it isn’t practical in real-life settings to meet all the assumptions, because data are rarely perfect (i.e., not always normally distributed, highly variable, etc.). My understanding is that if the data don’t meet the assumptions, that suggests the model is wrong and using it will lead to inaccurate results. Is that true, or are model assumptions rarely ever met in real-life analyses? The latter doesn’t seem right to me, though in my own experience I have had difficulty meeting many assumptions, or rectifying violations with transformations, which makes it hard for me to even proceed with my analyses.

Any thoughts/advice on this to clear things up would be very appreciated.

Cross Validated Asked by molecularrunner on November 21, 2021

It is worth remembering that assumptions are usually made so that statistical tests can be carried out and so that the estimators have certain desirable properties (such as unbiasedness and consistency). Many "assumptions" are better thought of as "conditions" that are needed in order to make certain inferences.

Common assumptions are:

• that the model matrix is of full rank (i.e., no perfect collinearity). This is necessary for the estimates to even exist.

• that the relationship between the linear predictor and the outcome is linear. This is necessary for the estimates to be unbiased.

• that the samples are independent. This is necessary for the estimates to be consistent and for them to have nice distributional properties. Mixed effects models are often used when this assumption is invalidated due to repeated measures/clustering/nesting.

• that the residuals are homoskedastic. This is necessary to make valid inferences.

Note that mild departures from these assumptions are to be expected.
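As a practical illustration of checking two of the assumptions above (normality of the residuals and homoskedasticity), one minimal approach is to fit the model, form the residuals, and run simple diagnostic tests on them. This sketch uses simulated data of my own and SciPy/NumPy only; it is one way to do it, not the only way (graphical checks such as Q-Q and residual-vs-fitted plots are at least as important):

```python
# Minimal residual diagnostics for a simple OLS fit: a Shapiro-Wilk test
# on the residuals (null: residuals are normal) and a crude check for
# heteroskedasticity (regress squared residuals on x; a small p-value
# suggests the error variance changes with x). Simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, n)
y = 1.0 + 0.8 * x + rng.normal(0, 1, n)  # well-behaved errors

# Fit y = b0 + b1*x by least squares and form the residuals.
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)

# Normality of residuals.
_, p_norm = stats.shapiro(resid)

# Heteroskedasticity: does the squared residual trend with x?
_, _, _, p_hetero, _ = stats.linregress(x, resid**2)

print(f"Shapiro-Wilk p = {p_norm:.3f}, heteroskedasticity p = {p_hetero:.3f}")
```

With well-behaved simulated errors, neither test should raise alarms; on real data, small p-values here are a signal to look at residual plots before deciding how much the departure matters.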

This is not intended to be an exhaustive answer. People have written textbooks on these matters, and they are discussed at length in answers to other questions on this site.

Answered by Robert Long on November 21, 2021
