Hope I got it right, as this is my first active post 🙂

I have spent the whole day trying to find a solution to my problem.

I am trying to predict a continuous variable based on 20 different predictors. The predicted variable is the average volume in m³ per transportation unit across all pallets. The 20 predictors are the ordered volumes per department.

The data looks like this:

You can see the predictors in the columns VS01 – VS20. Many of them are zero, because not every order includes something from every department.

To fit a linear regression, I tried to normalize the data with different approaches:

- Log does not work because of the zeros
- Square root gives a reasonably good distribution but does not solve the issue with the zeros
- Yeo-Johnson gives the same result as the square root attempt
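The behaviour described in the list above can be reproduced on synthetic data. The following is only a minimal sketch: the `volumes` list is a made-up stand-in for one VSxx column (about 70% zeros, otherwise right-skewed), `log1p` is a common workaround for the log that is not part of the attempts above, and the Yeo-Johnson lambda is fixed at an assumed 0.5 instead of being estimated. Because each transform is monotone and maps 0 to 0, the share of exact zeros stays the same and the spike at zero cannot disappear.

```python
import math
import random


def yeo_johnson_nonneg(x, lam):
    """Yeo-Johnson transform for x >= 0 with a fixed lambda.

    The branch for negative x is omitted because volumes cannot be negative.
    """
    if lam == 0:
        return math.log1p(x)
    return ((x + 1.0) ** lam - 1.0) / lam


def zero_share(values):
    """Fraction of values that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


random.seed(0)
# Hypothetical stand-in for one department column: ~70% zeros, rest right-skewed.
volumes = [
    0.0 if random.random() < 0.7 else random.lognormvariate(1.0, 0.8)
    for _ in range(1000)
]

transforms = {
    "raw": lambda v: v,
    "sqrt": math.sqrt,
    "log1p": math.log1p,  # assumed workaround for log(0), not from the post
    "yeo_johnson(lam=0.5)": lambda v: yeo_johnson_nonneg(v, 0.5),
}

# Apply each transform and measure how many values are still exactly zero.
shares = {name: zero_share([f(v) for v in volumes]) for name, f in transforms.items()}
for name, share in shares.items():
    print(f"{name:>22}: share of zeros = {share:.3f}")
```

Each transform reshapes the positive part of the distribution, but the point mass at zero is only relabelled, never spread out.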

What I’ve found is that the solutions for zero-inflated data focus only on the predicted variable, not on the predictors.

So I would like to ask the community whether you have another approach I could try.

Below you can find the second picture, showing the volumes of two of the departments across the orders. Almost all of them look like this:

- Once transformed with Square Root
- Once transformed with Yeo Johnson
- Once the original data set, where it is visible that the data is not only skewed but also has a gap between the zeros and the next bin of values

Additionally, it would be nice to understand a bit better how to transform such a distribution into something normally distributed.

I would also be open to suggestions for a different approach than linear regression. Maybe:

- A decision tree approach
- A PCA approach
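One property relevant to the decision tree option is that tree splits depend only on the ordering of a predictor's values, so a monotone transform such as the square root does not change which observations end up on each side of a split. The sketch below illustrates this with a hand-written single-split regression stump, not a library implementation. The data are synthetic, and the names `best_split`, `x`, and `y` are hypothetical and chosen only for this example.

```python
import math
import random


def best_split(x, y):
    """Return the set of sample indices on the left of the best single split.

    The best split is the one that minimises the summed squared error of y
    around the mean of each side.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_sse, best_left = float("inf"), None
    for k in range(1, len(order)):
        if x[order[k - 1]] == x[order[k]]:
            continue  # no threshold fits between two equal values
        left, right = order[:k], order[k:]
        sse = 0.0
        for part in (left, right):
            mean = sum(y[i] for i in part) / len(part)
            sse += sum((y[i] - mean) ** 2 for i in part)
        if sse < best_sse:
            best_sse, best_left = sse, frozenset(left)
    return best_left


random.seed(1)
# Synthetic zero-inflated predictor and a noisy response that depends on it.
x = [0.0 if random.random() < 0.7 else random.lognormvariate(1.0, 0.8) for _ in range(200)]
y = [2.0 + 0.5 * v + random.gauss(0.0, 0.3) for v in x]

split_raw = best_split(x, y)
split_sqrt = best_split([math.sqrt(v) for v in x], y)
print("same partition on raw and sqrt-transformed predictor:", split_raw == split_sqrt)
```

Under these assumptions the partition is identical for the raw and the transformed predictor, which suggests that the normalization question matters much less for tree-based models than for linear regression.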

Thanks a lot

Cross Validated Asked by Eugen Cuic on November 14, 2021
