Statistics: Bootstrapping
Bootstrapping is the use of simulation to approximate the value of the plug-in estimator T(ν̂) of a statistical functional T which is expressed in terms of independent observations from the input distribution ν. The key point is that drawing k observations from the empirical distribution ν̂ is the same as drawing k times with replacement from the list of observed values.
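As a minimal sketch of this equivalence (the data vector `X` and the count `k` below are hypothetical, chosen only for illustration), sampling from the empirical distribution can be written in Julia using the standard library alone:

```julia
using Random

Random.seed!(1)                  # fixed seed, only for reproducibility
X = [2.3, 0.7, 1.9, 4.1, 3.3]    # hypothetical observed values
k = 8                            # number of draws from the empirical distribution

# The empirical distribution puts mass 1/n on each observed value, so one
# draw from it is a uniformly random element of X, and k draws are k picks
# with replacement.
draws = rand(X, k)

# The same thing, written with uniformly random indices into X.
idx = rand(1:length(X), k)
draws_by_index = X[idx]
```

Note that `k` may exceed the number of observations, since the draws are made with replacement.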
Example
Consider the statistical functional T(ν) = the expected difference between the greatest and least of 10 independent observations from ν. Suppose that 50 observations X₁, …, X₅₀ from ν are observed, and that ν̂ is the associated empirical CDF. Explain how T(ν̂) may be estimated with arbitrarily small error.
Solution. The value of T(ν̂) is defined to be the expectation of a distribution that we have instructions for how to sample from. So we sample 10 times with replacement from X₁, …, X₅₀, identify the largest and smallest of the 10 sampled values, and record the difference. We repeat this B times for some large integer B, and we return the sample mean of these B values.
By the law of large numbers, the result can be made arbitrarily close to T(ν̂) with arbitrarily high probability by choosing B sufficiently large.
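The procedure in this solution can be sketched in Julia as follows. This is an illustration under stated assumptions rather than a definitive implementation: the 50 observations are replaced by stand-in uniform data, and the helper names `range_of_resample` and `bootstrap_range` are hypothetical.

```julia
using Random, Statistics

Random.seed!(42)       # fixed seed, only for reproducibility
X = rand(50)           # stand-in for the 50 observations (assumed data)

# One bootstrap replicate: draw 10 values with replacement from X and
# record the difference between the largest and the smallest.
function range_of_resample(X)
    s = rand(X, 10)
    return maximum(s) - minimum(s)
end

# Monte Carlo approximation of the plug-in estimate: the sample mean of
# B independent replicates.
bootstrap_range(X, B) = mean(range_of_resample(X) for _ in 1:B)

estimate = bootstrap_range(X, 10^5)
```

Calling `bootstrap_range` again with the same `X` should give nearly the same value, and the agreement should tighten as `B` grows.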
Although this example might seem a bit contrived, bootstrapping is useful in practice because of a common source of statistical functionals that fit the bootstrap form: standard errors.
Example
Suppose that we estimate the median θ of a distribution ν using the plug-in estimator θ̂ for 75 observations, and we want to produce a confidence interval for θ. Show how to use bootstrapping to estimate the standard error of the estimator.
Solution. By definition, the standard error of the estimator θ̂ is the square root of the variance of the median of 75 independent draws from ν. Therefore, the plug-in estimator of the standard error is the square root of the variance of the median of 75 independent draws from ν̂. This can be readily simulated. If the observations are stored in a vector X, then either of the following (the first in Julia, the second in R)

    using Random, Statistics, StatsBase
    X = rand(75)  # placeholder data; use the actual observations here
    std(median(sample(X, 75)) for _ in 1:10^5)  # sample draws with replacement by default

    sd(sapply(1:10^5, function(n) {median(sample(X, 75, replace=TRUE))}))

returns a very accurate approximation of the plug-in estimator of the standard error.
Perhaps the most important caution regarding bootstrapping is that the bootstrap only approximates T(ν̂). It only approximates T(ν) (where ν is the underlying true distribution from which the observations are sampled) insofar as we have enough observations for ν̂ to approximate ν well.
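This caution can be illustrated with a small Julia experiment. It is a sketch under assumptions not made in the text above: the true distribution is taken to be uniform on [0, 1], the functional is the standard deviation of the median of 75 draws, and the helper name `boot_se` and the sample sizes 75 and 7500 are chosen only for illustration.

```julia
using Random, Statistics

Random.seed!(7)        # fixed seed, only for reproducibility

# Bootstrap estimate: standard deviation of the median of 75 draws made
# with replacement from the observations in X, using B replicates.
boot_se(X; B = 10^4) = std(median(rand(X, 75)) for _ in 1:B)

# Direct Monte Carlo estimate for the true distribution, assumed here to
# be uniform on [0, 1].
direct = std(median(rand(75)) for _ in 1:10^4)

boot_small = boot_se(rand(75))      # empirical distribution from 75 observations
boot_large = boot_se(rand(7500))    # empirical distribution from 7500 observations
```

With many observations, `boot_large` should typically land close to `direct`. `boot_small` can miss by a noticeably larger margin, and increasing `B` does not remove that error.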
Exercise
Suppose that ν is the uniform distribution on [0, 1]. Generate 75 observations from ν, store them in a vector X, and compute the bootstrap estimate of T(ν̂), where T(ν) is the standard deviation of the median of 75 independent observations from ν. Use Monte Carlo simulation to directly estimate T(ν). Can the gap between your approximations of T(ν̂) and T(ν) be made arbitrarily small by using more bootstrap samples?
Solution. The gap cannot be made arbitrarily small. Using more bootstrap samples only improves the approximation of T(ν̂); to bring T(ν̂) itself closer to the exact value of T(ν), we would need more than 75 observations from the distribution.
    using Statistics, StatsBase
    X = rand(75)
    std(median(sample(X, 75)) for _ in 1:10^6) # estimate T(ν̂)
    std(median(rand(75)) for _ in 1:10^6) # estimate T(ν)