# Unbiased estimator for $\bar{x}$ from a Random Sample with unequal selection probability

I have the following population:

Where the left column is the age of our individuals and the right column is their weight (in kg).

The exercise tells us that we use Random Sampling with no replacement to take our sample and we are twice as likely to select an individual whose age is lower than 20.

I have to find 3 things:

1. An unbiased estimator for $$\bar{x}$$.
2. The probability of obtaining the sample: $$S = \{50, 35, 85\}$$
3. With $$S$$ as our sample, estimate the total population weight with a 75% confidence interval.

Any help would be appreciated; I have worked on this for hours and gotten nowhere.

Cross Validated Asked by PLanderos33 on November 14, 2021

A small population with twice as many children as adults. Suppose there are 4 each of children of ages 5, 10, and 15, and 3 each of adults of ages 25 and 45. Then the average weight in the population of $$18$$ is $$[4(20+35+50) + 3(90+85)]/18 = 52.5$$ kg.

What sample size? Also, suppose we take a random sample of $$n = 3$$ from this population. (A clue that we should use $$n = 3$$ is that the problem gives a sample of size three.)

The population of 18 weights (in kg) is as follows:

    kg = c(rep(c(20,35,50), each=4), rep(c(90,85), each=3))
    mean(kg)
    [1] 52.5


Simulation results. If we take many samples of size 3 from this population, we can get a good approximation of the sampling distribution.

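    set.seed(2020)
    m = 10^6;  n = 3;  s.3 = wt = numeric(m)   # one million simulated samples
    for(i in 1:m) {
      x = sort(sample(kg, n))                   # sample without replacement, sorted
      s.3[i] = sum(x == c(35,50,85))            # equals 3 only if the sample is {35, 50, 85}
      wt[i] = mean(x) }
    mean(s.3==3)    # prob sample has 35,50,85
    [1] 0.058995    # aprx 1/17
    1/17
    [1] 0.05882353  # exact 1/17
    mean(wt)
    [1] 52.50809    # aprx 52.5
    2*sd(wt)/sqrt(m)   # 95% margin of simulation error
    [1] 0.02902334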


Probability of specified sample. With a million samples, one can expect 2 or 3 places of accuracy. One can show by simple combinatorics that the probability of getting one each of the weights $$35, 50, 85$$ (in some order) is $$1/17,$$ which is consistent with the simulation.
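The combinatorial count can be checked directly. This is a minimal sketch that assumes the population set up earlier in this answer (4 individuals each at 35 and 50 kg, 3 at 85 kg, 18 individuals in all); the variable names are only illustrative.

```r
# Assumed population from this answer: 18 people, with
# 4 weighing 35 kg, 4 weighing 50 kg, and 3 weighing 85 kg.
N <- 18
n.35 <- 4; n.50 <- 4; n.85 <- 3

favorable <- n.35 * n.50 * n.85   # ways to pick one person at each weight
total <- choose(N, 3)             # all unordered samples of size 3
p <- favorable / total            # probability of the sample {35, 50, 85}
p
```

Here `favorable` is 48 and `total` is 816, so the ratio reduces to 1/17, the exact value shown in the simulation output.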

Unbiased estimator. Also, the mean weight in the population is $$52.5.$$ The simulation approximates $$E(\bar X_3) = 52.508 \pm 0.029,$$ with a 95% margin of simulation error.

If sampling had been with replacement, it is obvious that the mean of the sample of $$n=3$$ would be an unbiased estimate of the population mean weight $$52.5.$$ It is not hard to show that the same is true for sampling without replacement, and I will leave that to you.
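Short of a proof, the claim can be checked numerically by listing every possible sample. The sketch below assumes the same 18-person population used in this answer and equal-probability sampling without replacement; it is a check for this one population, not a general argument.

```r
# Assumed population from this answer (weights in kg).
kg <- c(rep(c(20, 35, 50), each = 4), rep(c(90, 85), each = 3))

# Every unordered sample of size 3: a 3 x 816 matrix, one sample per column.
all.samples <- combn(kg, 3)
xbar <- colMeans(all.samples)     # sample mean of each possible sample

# Each sample is equally likely, so the expectation is the plain average.
mean(xbar)
mean(kg)
```

The two printed values should agree (up to floating-point rounding), which is what unbiasedness of the sample mean means here.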

Confidence intervals. I don't know what you have studied about confidence intervals. The sample mean of the specified sample of three observations is $$\bar X_3 = 56.67;$$ it should be the center of a CI for the true mean weight of the population. Using its standard error, you should be able to get some style of CI.
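One possible style is a t interval with a finite population correction. The sketch below is only an illustration under several assumptions that are not in the original problem statement: the population size $$N = 18$$ from this answer, equal-probability sampling without replacement, and approximate normality of the sample mean (doubtful with $$n = 3$$). Multiplying the endpoints by $$N$$ turns the interval for the mean into one for the total weight.

```r
x <- c(35, 50, 85)                # the specified sample
n <- length(x)
N <- 18                           # assumed population size from this answer

xbar <- mean(x)
fpc <- sqrt(1 - n / N)            # finite population correction
se <- sd(x) / sqrt(n) * fpc       # estimated standard error of the sample mean

t.crit <- qt(0.875, df = n - 1)   # 75% two-sided: 12.5% in each tail
ci.mean <- xbar + c(-1, 1) * t.crit * se
ci.total <- N * ci.mean           # interval for the total population weight

ci.mean
ci.total
```

With only two degrees of freedom the interval is wide, so treat it as a rough guide rather than a precise statement.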

Three observations are hardly enough for a good bootstrap CI, but if you know about bootstrapping this part of the problem may be a prompt to do whatever kind of bootstrap you may have studied. A naive percentile 75% nonparametric bootstrap CI can be found as follows (repeatedly re-sampling with replacement from the sample of three). This CI is $$(40.0, 73.3),$$ which does cover the known population mean.

    set.seed(721)
    re.avg = replicate(10^4, mean(sample(c(35,50,85), 3, replace=TRUE)))
    quantile(re.avg, c(.125, .875))
       12.5%    87.5%
    40.00000 73.33333


Answered by BruceET on November 14, 2021
