I’ve only taken a few statistics courses, so I apologize if any of my questions are rudimentary. Could someone explain, or direct me to resources on, the correct process for testing model assumptions and fitting models, and also the consequences of not meeting model assumptions such as normality, homoscedasticity, etc.?
My impression is that it’s important to meet model assumptions because otherwise the mathematics of the model will not work. However, this link caught my eye: it seems to say that you can still estimate parameters, and although the violation is a problem for hypothesis testing (does this refer to the assessment of p-values?), you can get around that with bootstrapping.
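For concreteness, a minimal sketch of the kind of bootstrap usually meant here is the pairs (case) bootstrap: resample rows of the data with replacement, refit the model each time, and take percentiles of the refitted coefficients as a confidence interval. The data below are simulated with deliberately skewed (non-normal) errors purely for illustration; all names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: true slope 2.0, with skewed (exponential) errors
n = 200
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.exponential(scale=2.0, size=n) - 2.0  # centered, skewed noise

def ols_slope(x, y):
    """Slope from a simple OLS fit via least squares."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Pairs bootstrap: resample (x, y) rows with replacement and refit
boot = np.empty(2000)
for b in range(boot.size):
    idx = rng.integers(0, n, n)
    boot[b] = ols_slope(x[idx], y[idx])

# Percentile 95% confidence interval for the slope
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"slope: {ols_slope(x, y):.3f}, 95% bootstrap CI: ({lo:.3f}, {hi:.3f})")
```

The point is that this interval does not lean on a normality assumption for the errors, only on the resampling approximating the sampling variability.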
Is this true both in OLS and in linear mixed-effects modeling, where you also have to account for random effects? Additionally, someone suggested to me that it isn’t practical in real-life settings to meet all the assumptions, because data are rarely perfect (i.e., not always normally distributed, having a lot of variance, etc.). My understanding is that if the data don’t meet the assumptions, that suggests it’s the wrong model, and using the model just leads to inaccurate results. Is this true, or in real-life analyses are model assumptions rarely ever met? That doesn’t seem right to me, though in my own experience I have had difficulty meeting many assumptions or rectifying violations with transformations, which makes it hard for me to even proceed with my analyses.
Any thoughts/advice on this to clear things up would be much appreciated.
It is worth remembering that assumptions are usually made so that statistical tests can be carried out and so that the estimators have certain desirable properties (such as unbiasedness and consistency). Many "assumptions" are better thought of as "conditions" that are needed in order to make certain inferences.
Common assumptions are:

- that the model matrix is of full rank (i.e., no perfect collinearity). This is necessary for the estimates to even exist.
- that the relationship between the linear predictor and the outcome is linear. This is necessary for the estimates to be unbiased.
- that the samples are independent. This is necessary for the estimates to be consistent and to have nice distributional properties. Mixed-effects models are often used when this assumption is invalidated by repeated measures, clustering, or nesting.
- that the residuals are homoskedastic. This is necessary for the standard errors, and hence the inferences based on them, to be valid.
Note that mild departures from these assumptions are to be expected.
This is not intended to be an exhaustive answer. People have written textbooks on these matters, and they are discussed at length in answers to other questions on this site.
Answered by Robert Long on November 21, 2021