I am trying to design an email test to measure the demand lift obtained from a marketing promotion (treatment) versus no promotion (control). To that end, I want to calculate the per-group sample size required to get a significant read on the difference in average demand per customer for different marketing segments.

I am applying the following formula (for each segment):

$$
N = \frac{2(Z_{1-\alpha/2}+Z_{\pi})^2\sigma^2}{\Delta^2}
$$

Where:

$Z_{1-\alpha/2}$ = percentile of the normal distribution used as the critical value in a two-tailed test (1.96)

$Z_{\pi}$ = percentile of the normal distribution where $\pi$ is the power of the test (0.84 for 80th percentile)

$\sigma$ = within-group standard deviation

$\Delta$ = expected mean difference between the treatment versus control population
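As a minimal sketch, the formula above can be evaluated in R with `qnorm` supplying the two normal quantiles. The function name `n_per_group` and its argument names are hypothetical, chosen only for illustration, and the example values of `sigma` and `delta` are assumed rather than taken from any real segment.

```r
# Per-group sample size from the normal-approximation formula above.
# n_per_group and its argument names are illustrative, not from any package.
n_per_group <- function(sigma, delta, alpha = 0.05, power = 0.80) {
  z_alpha <- qnorm(1 - alpha / 2)  # two-sided critical value (about 1.96)
  z_power <- qnorm(power)          # quantile for the desired power (about 0.84)
  2 * (z_alpha + z_power)^2 * sigma^2 / delta^2
}

# Example with assumed values sigma = 15 and delta = 5
n <- n_per_group(sigma = 15, delta = 5)
ceiling(n)  # round up to a whole number of customers per group
```

Because the result scales with $\sigma^2/\Delta^2$, halving the detectable difference quadruples the required sample size, so the ratio of standard deviation to expected lift in each segment drives the answer.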

To calculate the standard deviation and expected mean difference above, I pulled historical response data for the same calendar period last year as the one in which the test will run. My question is this: should the group means and standard deviations be estimated from the total population that was exposed to the treatment (and control), respectively, or should they be calculated from respondents only? Put another way, should I use the mean/variance for the full audience exposed to a given treatment in the past, or the mean/variance for responders only, and then back-solve for the required full audience?

The results that I’m getting appear counter-intuitive, with similar required sample sizes among the most-engaged and least-engaged audiences, so I know I must be doing this wrong.

Most of the material that I’ve come across from the marketing community involves using a desired difference in response rate to solve for appropriate per-group sample sizes. In my case, however, the metric of interest is demand-based rather than raw response (average demand per customer). That said, the response rate is an important metric, as it is particularly low for certain groups of customers, but it does not directly reflect the metric of interest.

Thanks in advance!

Cross Validated Asked by user291972 on November 14, 2021

1 Answer

Here is a simulation to show that your approximate formula for sample size $n$ gives a reasonable answer for a particular case, which may be realistic.

Suppose $\sigma^2/\Delta^2 = 9,$ the significance level is 5%, and the desired power is 80%. Then the formula gives $n \approx 141.$ [An exact formula would use a noncentral t distribution, but with $n > 100,$ the approximate formula should be OK.]

```
n = 2*(1.96+.84)^2*9; n
[1] 141.12
```
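For comparison, base R's `power.t.test` (in the `stats` package) solves for the sample size using the noncentral t distribution. This is only a sketch under assumed inputs ($\sigma = 15$ and $\Delta = 5$, so that $\sigma^2/\Delta^2 = 9$); the variable names `exact` and `n_exact` are illustrative.

```r
# Exact per-group n for a two-sided, two-sample t test via the noncentral t.
# sd = 15 and delta = 5 are assumed values giving sigma^2/Delta^2 = 9.
exact <- power.t.test(delta = 5, sd = 15, sig.level = 0.05, power = 0.80,
                      type = "two.sample", alternative = "two.sided")
n_exact <- exact$n
ceiling(n_exact)  # round up to a whole number per group
```

The result should land within a couple of observations of the approximate value, which is why the simpler formula is adequate at sample sizes of this order.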

Now suppose I do $m = 100,000$ two-sided pooled two-sample t tests using samples of size $n = 150$ to try to detect a significant difference (5% level) in sample means from populations $\mathsf{Norm}(\mu_1 = 100, 15)$ and $\mathsf{Norm}(\mu_2 = 105, 15),$ so that $\Delta = 5, \sigma = 15$ and $\sigma^2/\Delta^2 = (15/5)^2 = 9.$ [For the population means, only $\Delta=|\mu_1-\mu_2| = 5$ matters.]

Then I should reject at the 5% level a little more than 80% of the time. The simulation shows rejection 82% of the time, so the simulation is in substantial agreement with your formula.

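```
set.seed(2020)
# p-values from 10^5 pooled two-sample t tests, n = 150 per group
pv = replicate(10^5, t.test(rnorm(150, 100, 15),
                            rnorm(150, 105, 15), var.equal = TRUE)$p.value)
mean(pv <= .05)   # proportion of tests rejecting at the 5% level
[1] 0.82189
```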
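As a cross-check on the simulated rejection rate, the power at $n = 150$ per group can also be computed analytically. The sketch below again uses base R's `power.t.test` with the same $\Delta = 5$ and $\sigma = 15$ as the simulation; the variable name `pw` is only illustrative.

```r
# Analytical power of the pooled two-sample t test at n = 150 per group,
# using the same delta = 5 and sd = 15 as the simulation.
pw <- power.t.test(n = 150, delta = 5, sd = 15, sig.level = 0.05,
                   type = "two.sample", alternative = "two.sided")$power
round(pw, 3)
```

The analytical value should be close to the simulated proportion, with any small gap attributable to Monte Carlo error.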

Answered by BruceET on November 14, 2021
