The central limit theorem is probably the most important result in all of statistical theory. In fact, the central limit theorem is what makes statistics possible. It is one of those mathematical results that seems counterintuitive at first, yet once you understand it deeply, it starts to feel obvious.

## What does the central limit theorem say?

In essence, the central limit theorem says that if you take a sample from any distribution and calculate the mean of that sample, it should approximate the true mean of the underlying distribution. Of course, that seems obvious. Here’s where the magic happens, though: if we take multiple samples and calculate the mean of each one, those means will differ from each other. Some will be less than the true mean and some will be greater. On average, we will have as many sample means above the true mean as below it. Also intuitively, the further a value is from the true mean of the distribution, the less likely it is that one of our samples will produce a mean that extreme.

Given these facts, it seems almost logical that by taking multiple samples, we will end up with a bell curve around the true mean. In fact, that is exactly what the central limit theorem tells us we will get: the distribution of our sample means will follow a normal distribution! No matter what the underlying distribution is, or how odd it looks, repeated sample means will give us a normal distribution. That is what is incredible about the central limit theorem.

This makes everything so much easier. Armed with this knowledge, we can quantify how uncertain we are about the true value of a mean. No matter how the data is distributed, we can talk meaningfully about the uncertainty of parameter estimates. That is exactly what makes statistics useful and possible.

Notice that I didn’t go into too many technical details. The Wikipedia page has the math in all of its glory. I just wanted to give you the intuition behind the math with my description above; don’t take my description too literally, since there are formal proofs you can rely on.

## An Example in Python

The first thing that we need to do is import some libraries. We will import a statistical distribution, the generalized gamma distribution, from scipy. Then we will import matplotlib for making some plots, and finally numpy.

```python
from scipy.stats import gengamma
import matplotlib.pyplot as plt
import numpy as np
```

With these packages in place, we can start to build out the example in Python. We’ll start by sampling from the generalized gamma distribution. The distribution that we are going to work with will have parameters a=40 and c=2. Here is the code to do that.

`sample = gengamma.rvs(40, 2, size=1000)`

This will give a distribution that looks like this:
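If you want to reproduce that plot yourself, a quick histogram sketch will do (the bin count here is just an illustrative choice):

```python
from scipy.stats import gengamma
import matplotlib.pyplot as plt

# Draw 1000 random values from the generalized gamma distribution
sample = gengamma.rvs(40, 2, size=1000)

# Histogram the raw sample to see the shape of the distribution
plt.hist(sample, bins=30)
plt.title('Histogram of a Generalized Gamma Sample (a=40, c=2)')
plt.show()
```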

You can see that this distribution is not normally distributed. It doesn’t have to be. So what we’ll do is take 1000 samples of 1000 random draws each, and compute the mean of each sample. The code is basically just looping through the code above a bunch of times and storing the results.

```python
means = []
check = [10, 100]
for i in range(1000):
    if i in check:
        plt.hist(means)
        plt.title('Histogram of {} Sampled Means'.format(i))
        plt.show()
    means.append(np.mean(gengamma.rvs(40, 2, size=1000)))

# Plot the distribution of all 1000 sampled means
plt.hist(means)
plt.title('Histogram of 1000 Sampled Means')
plt.show()
```

Here is what the distribution of sampled means looks like for the first 10 samples:

And here it is for 100 samples:

And finally, this is what the distribution looks like for all 1000 samples:

As you can see, the distribution of sampled means converges to a normal distribution.
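One way to sanity-check that convergence, sketched here with SciPy's `normaltest` (a D'Agostino-Pearson test; the exact numbers will vary from run to run since the draws are random):

```python
from scipy.stats import gengamma, normaltest
import numpy as np

# Recreate the 1000 sampled means from above
means = [np.mean(gengamma.rvs(40, 2, size=1000)) for _ in range(1000)]

# The average of the sampled means should sit very close to the true mean
true_mean = gengamma.mean(40, 2)
print('true mean:', true_mean)
print('mean of sampled means:', np.mean(means))

# normaltest's null hypothesis is that the data came from a normal
# distribution; a large p-value means we cannot reject normality
stat, p = normaltest(means)
print('normaltest p-value:', p)
```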

## So What?

So far, I have only talked about the mean. It turns out, though, that the logic I laid out above works for many other statistics as well. A statistic computed on a sampled dataset will sometimes be above the true population value and sometimes below it, and the further you get from the true value, the less likely you are to see an estimate that far away. So repeated estimates of a statistic will tend to follow this normal-looking distribution. This result is what much of statistics is based on.

We care because this lets us talk meaningfully about standard errors, which are just a way to estimate how wide this distribution should be from a single sample. But there is something else that I want to bring up. Something far more important: the concept of bootstrapping the standard errors for a statistic.
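For the mean, the standard error from a single sample is just the sample standard deviation divided by the square root of the sample size. A minimal sketch, reusing the generalized gamma sample from earlier:

```python
from scipy.stats import gengamma
import numpy as np

sample = gengamma.rvs(40, 2, size=1000)

# Standard error of the mean from one sample: std / sqrt(n)
se = np.std(sample, ddof=1) / np.sqrt(len(sample))

# A rough 95% interval for the true mean based on that single sample
lower = np.mean(sample) - 1.96 * se
upper = np.mean(sample) + 1.96 * se
print('estimated mean: {:.3f} ({:.3f}, {:.3f})'.format(np.mean(sample), lower, upper))
```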

## Bootstrapping And the Central Limit Theorem

So in my last post, I talked about how to make a maximum likelihood estimate. As I mentioned there, the default optimizer in Python will produce an estimate of the inverse Hessian matrix, which gives you the variance-covariance matrix for your parameters, and thus lets you talk about statistical significance. But what do you do if you need to rely on an optimizer that won’t estimate the variance-covariance matrix? You can rely on the central limit theorem and bootstrap your way to success.

The way to solve that is to treat your dataset as the population from which you are drawing samples. Then you just draw samples with replacement, calculate your statistic of interest on each resampled dataset, and record the results. Rinse and repeat a couple thousand times, and you have a very nice estimate of the standard error around your statistic of interest. Bootstrapping is a simple application of the central limit theorem, and it works remarkably well in practice. You can get an idea of how statistically significant your results are by looking at the distribution around your statistic of interest. So go out there and start applying the central limit theorem in what you are doing.
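Here is a sketch of that recipe, bootstrapping the standard error of the median (a statistic with no tidy closed-form standard error), using the generalized gamma sample from earlier. The resample count and the choice of statistic are just illustrative:

```python
from scipy.stats import gengamma
import numpy as np

rng = np.random.default_rng(0)

# Treat this one sample as the "population"
data = gengamma.rvs(40, 2, size=1000)

# Resample with replacement, recompute the statistic, and repeat
boot_medians = []
for _ in range(2000):
    resample = rng.choice(data, size=len(data), replace=True)
    boot_medians.append(np.median(resample))

# The spread of the bootstrap distribution estimates the standard error
print('median:', np.median(data))
print('bootstrap standard error:', np.std(boot_medians, ddof=1))

# A simple percentile confidence interval for the median
lower, upper = np.percentile(boot_medians, [2.5, 97.5])
print('95% interval: ({:.3f}, {:.3f})'.format(lower, upper))
```

Swap in whatever statistic you actually care about (a regression coefficient, a maximum likelihood parameter, and so on); the resampling loop stays the same.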