# Understanding the Central Limit Theorem

The Central Limit Theorem (CLT) is a very important part of probability theory. It establishes that when independent random variables are added, their properly normalized sum tends toward a normal distribution (a bell curve), even when the original variables themselves are not normally distributed.

This is a significant result because it means that statistical and probabilistic methods that work for normal distributions can also be applied to many problems involving other types of distributions.

The central limit theorem comes from probability theory, but it is useful in many areas of statistics. It might seem abstract or lacking in practical application, yet it is central to everyday statistical practice.

The central limit theorem is mainly about the sampling distribution of the sample mean, rather than the distribution of the population itself. It simplifies statistical problems by letting us work with a sampling distribution that is approximately normal.

One assumption the theorem relies on is that the sample size is large enough. How large “large enough” must be depends on two factors.

**Accuracy Requirements**–

If the statistician needs the sampling distribution to resemble a normal distribution very closely, more sample points are required.

**Shape of Underlying Population**–

If the underlying (original) population already resembles a normal distribution closely, fewer sample points are required.
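The effect of the population's shape can be seen in a quick simulation. The following Python sketch is illustrative only: the two populations (a sum of three uniform draws standing in for a near-normal population, and an exponential distribution as a skewed one), the sample sizes and the random seed are hypothetical choices, not taken from the article. It reports the skewness of the simulated sample means; a value close to 0 means the distribution of means is as symmetric as a normal curve.

```python
import random
import statistics

def sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each computed from n values produced by draw(rng)."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

def skewness(xs):
    """Sample skewness (mean cubed z-score); 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

# Hypothetical example populations, chosen only for illustration.
populations = {
    "near-normal": lambda r: r.random() + r.random() + r.random(),  # symmetric, bell-like
    "skewed": lambda r: r.expovariate(1.0),                          # exponential, skewness 2
}

rng = random.Random(42)
results = {}
for name, draw in populations.items():
    for n in (2, 10, 50):
        results[(name, n)] = skewness(sample_means(draw, n, 5_000, rng))
        print(f"{name:>11} population, n={n:2d}: "
              f"skewness of sample means = {results[(name, n)]:+.2f}")
```

For an exponential population, theory says the skewness of the sample mean is 2/√n, so it shrinks slowly as n grows; the near-normal population gives roughly symmetric sample means even at n = 2.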

**Understanding the Central Limit Theorem with an Example**

A good example of the Central Limit Theorem: suppose a sample contains many observations, each generated randomly so that its value is independent of the values of all other observations in the sample.

If the arithmetic mean of these observed values is computed, and this procedure is repeated many times, the central limit theorem (CLT) states that the distribution of these averages will be closely approximated by a normal distribution.
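The repeated-sampling procedure just described can be sketched in Python. This is a minimal simulation under assumptions of our own: the population is exponential with mean 1 and standard deviation 1 (deliberately skewed), each sample has n = 40 observations, and the seed and number of repetitions are arbitrary. If the CLT holds, roughly 68% and 95% of the sample means should fall within 1 and 2 standard errors (σ/√n) of the population mean.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n from an exponential population (mean 1, SD 1)
    and return the arithmetic mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

n, reps = 40, 10_000          # assumed sample size and number of repetitions
mu, sigma = 1.0, 1.0          # mean and SD of the exponential(1) population
se = sigma / math.sqrt(n)     # SD of the sample mean if the CLT applies

means = simulate_sample_means(n, reps)
within_1 = sum(abs(m - mu) <= 1 * se for m in means) / reps
within_2 = sum(abs(m - mu) <= 2 * se for m in means) / reps

print(f"average of sample means: {statistics.fmean(means):.3f}")
print(f"share within 1 SE: {within_1:.3f} (normal curve: 0.683)")
print(f"share within 2 SE: {within_2:.3f} (normal curve: 0.954)")
```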

Similarly, if a coin is flipped many times in series of a fixed number of flips, the probability of getting a given number of heads in a series follows an approximately normal curve, with a mean equal to half the number of flips in each series.

The CLT has several variants. In its classical form, all the random variables must be independent and identically distributed. Other variants relax these requirements, for example allowing variables that are not identically distributed or observations that are not fully independent; the normalized mean still converges to a normal distribution, provided each variant's own conditions are met.

The de Moivre–Laplace theorem is the most basic version of the theorem; it states that the normal distribution can approximate the binomial distribution.
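The de Moivre–Laplace approximation can be checked directly for the coin-flipping example. This sketch assumes a fair coin flipped n = 100 times (a hypothetical choice) and compares the exact binomial probability of k heads with the normal density that has mean np (half the flips) and standard deviation √(np(1 − p)).

```python
import math

def binomial_pmf(k, n, p):
    """Exact probability of k successes (heads) in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and SD sigma at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

n, p = 100, 0.5                      # assumed: 100 flips of a fair coin
mu = n * p                           # expected number of heads: half the flips
sigma = math.sqrt(n * p * (1 - p))   # SD of the number of heads

for k in (40, 45, 50, 55, 60):
    print(f"k={k}: binomial={binomial_pmf(k, n, p):.4f}  "
          f"normal={normal_pdf(k, mu, sigma):.4f}")
```

The two columns agree closely, which is the sense in which the normal curve approximates the binomial distribution.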

**Central Limit Theorem: Practically Useful or a Weak Theorem?**

Formally, the CLT is a weak-convergence theorem in probability theory: "weak" refers to convergence in distribution, not to any lack of practical usefulness. Theorems of this kind state that the sum of many independent, identically distributed random variables, or of random variables with specific types of dependence, tends to be distributed according to one of a small set of attractor distributions. When the variables have finite variance, the attractor distribution is the normal distribution. Let us see how this theorem is useful in practice for statistical problems.

### Central Limit Theorem in Practice

The central limit theorem is of significant use in probability sampling. For instance, the fact that a normal distribution emerges for sample means even when the population distribution is skewed has very important applications in statistical practice.

Statistical practices such as confidence intervals and hypothesis testing make certain assumptions about the population from which the data were obtained. One assumption often made early in a statistics course is that the populations we work with are normally distributed.

Assuming that the data come from a normal distribution simplifies matters at first, but a statistician will recognize that it is often unrealistic. Working with real data quickly shows that outliers, asymmetry, skewness and multiple peaks are common in populations. Fortunately, we can still work with data from populations that are not normally distributed: the central limit theorem, together with an appropriate sample size, helps us get around these problems. This is what makes the theorem so valuable in practice.

Even though statisticians may not know the shape of the distribution our data comes from, the central limit theorem says that we can treat the sampling distribution of the mean as if it were normal.

It is also important to understand that, for conclusions derived through this theorem to hold true, the sample size must be large enough. Exploratory data analysis can help us determine the sample size required in a given situation.

**General Idea of the Central Limit Theorem**

The central limit theorem works on a general principle: regardless of the shape of the population distribution (provided its variance is finite), as the sample size n increases, the sample mean becomes approximately normally distributed around the population mean, and its standard deviation decreases.

Certain conditions must be met before applying the Central Limit Theorem. These conditions are explained below:

- The samples that the theorem is to be applied on should be independent
- The sample size that the theorem is to be applied on should be “large enough”

**Conditions for the Central Limit Theorem**

Independent Samples – The samples must satisfy the following:

- Randomization – Each sample used must be a random sample from the population, or at least reasonably representative of it.
- 10% Rule – When sampling without replacement, the sample size n should be no more than 10% of the population.

Large Enough Sample Size

- The sample size n must be large enough to satisfy the following (where p is the proportion of interest and q = 1 − p):

np≥10; nq≥10

**An example to illustrate whether the Central Limit Theorem is appropriate**

Let’s take an example to see whether this theorem can be applied to a statistical problem. Suppose that 8% of all children are nearsighted, and a sample of 194 children have their eyesight tested. Let us check whether we can use the central limit theorem in this case.

**Randomization –**

We need to assume that the 194 children are representative, i.e. that nothing about this region makes its children more likely (or less likely) to have vision problems.

**10% Rule –**

The population is all children, which numbers in the millions, so a sample of 194 is well under 10% of the population.

**Large Enough Sample Size –**

np = 194 × 0.08 = 15.52; nq = 194 × 0.92 = 178.48

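Both values are at least 10, so the sample-size condition is met. The randomization condition, however, is an assumption we must make when using the Central Limit Theorem in this situation.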
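The numerical checks for this example can be bundled into a small helper. `clt_conditions_for_proportion` is a hypothetical function written for this illustration, not a library API; the population size of 1,000,000 is an assumed stand-in for "millions" of children, and randomization cannot be verified in code, so it remains an assumption.

```python
def clt_conditions_for_proportion(n, p, population_size=None):
    """Hypothetical helper: check the large-sample and 10% conditions for a proportion.

    Randomization cannot be checked from numbers alone; it must be assumed or
    justified by how the sample was collected.
    """
    q = 1 - p
    checks = {
        "np": n * p,
        "nq": n * q,
        "large_enough": n * p >= 10 and n * q >= 10,   # np >= 10 and nq >= 10
    }
    if population_size is not None:
        checks["ten_percent_rule"] = n <= 0.10 * population_size
    return checks

# Nearsightedness example: 194 children, p = 0.08.
# population_size = 1_000_000 is an assumed stand-in for "millions" of children.
result = clt_conditions_for_proportion(194, 0.08, population_size=1_000_000)
print(result)
```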

**Sample Mean of Central Limit Theorem**

Let $X_1, X_2, \ldots, X_n$ be $n$ independent, identically distributed random variables with mean $\mu$ and standard deviation $\sigma$. The sample mean is

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

Thus,

$$E(\bar{X}) = \mu, \qquad SD(\bar{X}) = \frac{\sigma}{\sqrt{n}}$$
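These two formulas can be checked by simulation. The sketch below assumes a Uniform(0, 1) population (μ = 0.5, σ = √(1/12) ≈ 0.289) purely for illustration; the sample sizes, number of repetitions and seed are arbitrary. The average of the simulated sample means should be close to μ, and their standard deviation close to σ/√n.

```python
import math
import random
import statistics

rng = random.Random(1)
mu, sigma = 0.5, math.sqrt(1 / 12)   # mean and SD of an assumed Uniform(0, 1) population

def mean_and_sd_of_sample_means(n, reps=10_000):
    """Simulate `reps` sample means of size n; return their average and SD."""
    means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]
    return statistics.fmean(means), statistics.stdev(means)

observed = {}
for n in (4, 16, 64):
    observed[n] = mean_and_sd_of_sample_means(n)
    avg, sd = observed[n]
    print(f"n={n:2d}: E(X-bar) ~ {avg:.4f} (mu = {mu}), "
          f"SD(X-bar) ~ {sd:.4f} (sigma/sqrt(n) = {sigma / math.sqrt(n):.4f})")
```

Quadrupling n halves the standard deviation of the sample mean, as the √n in the denominator predicts.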

**Central Limit Theorem’s Implications**

- A sample’s proportions: suppose each member of a population has a particular characteristic with probability p (with q = 1 − p), and we draw a random sample of size n. We want the mean and standard deviation of the proportion of the sample that has the characteristic.
- We can use the Central Limit Theorem when n is large enough.
- If X is the number of sample members with the characteristic, the sample proportion p̂ = X/n has mean p and standard deviation √(pq/n).

**Central Limit Theorem’s Application**

Suppose nearsightedness affects 8% of children and 194 children have their eyesight tested.

- The number of nearsighted children X is approximately normally distributed, with mean 0.08 × 194 = 15.52 and standard deviation √(0.08 × 0.92 × 194) ≈ 3.78

**Central Limit Theorem for Proportions**

How is the proportion of nearsighted children in the sample distributed?

Dividing by n = 194 gives the sample proportion, with mean 0.08 and standard deviation √(0.08 × 0.92 / 194) ≈ 0.0195.
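The arithmetic for the nearsightedness example can be reproduced in a few lines. The sketch simply applies the formulas np and √(npq) for the count, and p and √(pq/n) for the proportion, with n = 194 and p = 0.08.

```python
import math

n, p = 194, 0.08        # children tested, prevalence of nearsightedness
q = 1 - p

count_mean = n * p                      # mean number of nearsighted children
count_sd = math.sqrt(n * p * q)         # SD of that number
prop_mean = p                           # mean sample proportion
prop_sd = math.sqrt(p * q / n)          # SD of the sample proportion, sqrt(pq/n)

print(f"count:      mean = {count_mean:.2f}, SD = {count_sd:.2f}")
print(f"proportion: mean = {prop_mean:.2f}, SD = {prop_sd:.4f}")
```

It prints a count SD of 3.78 and a proportion SD of 0.0195; dividing the count's mean and SD by n gives the proportion's mean and SD.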

68-95-99.7 Rule

- One can be about 68% sure that the sample mean lies within 1 SD of the population mean
- One can be about 95% sure that it lies within 2 SD
- One can be about 99.7% sure that it lies within 3 SD.
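The 68-95-99.7 figures come from the standard normal distribution. As a quick check, the probability that a normal variable lies within k standard deviations of its mean equals erf(k/√2); the sketch uses Python's `math.erf` for this.

```python
import math

def prob_within(k):
    """P(|Z| <= k) for a standard normal Z: the share within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
```

This prints 0.6827, 0.9545 and 0.9973, which round to the familiar 68%, 95% and 99.7%.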

Example: Nearsightedness (continued). With 194 children tested, what is a sensible range for the number of nearsighted children the department can expect?

- Because 3 standard deviations cover 99.7% of the data, we use ±3 SD to define ‘sensible’.
- 3 SD = 3 × 3.78 = 11.34
- 15.52 − 11.34 = 4.18 and 15.52 + 11.34 = 26.86
- The number of nearsighted children can therefore be expected to lie between about 4 and 27.
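The ±3 SD range can be computed directly from n = 194 and p = 0.08, using the mean np and the standard deviation √(npq). Rounding the lower bound down and the upper bound up to whole children is a choice made for this sketch.

```python
import math

n, p = 194, 0.08
mean = n * p                         # 15.52 expected nearsighted children
sd = math.sqrt(n * p * (1 - p))      # about 3.78
low, high = mean - 3 * sd, mean + 3 * sd

# Round outward to whole children (a presentation choice for this sketch).
print(f"sensible range: {low:.2f} to {high:.2f} "
      f"(about {math.floor(low)} to {math.ceil(high)} children)")
```

It prints 4.18 to 26.86, i.e. about 4 to 27 children.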

**Central Limit Theorem involving a Dichotomous Outcome**

Suppose a characteristic X is measured in a population, and that X is dichotomous (each member is either a success or a failure), with about 30% of the parent population being successes.

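The Central Limit Theorem applies even to such binomial populations, provided the following condition is met:

min(np, n(1 − p)) > 5

where n is the sample size and p is the probability of success on a single trial. (Some texts, including the np ≥ 10 rule used earlier, apply a stricter threshold of 10.)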

Suppose we take samples of size n = 20, with replacement. Then

min(np, n(1 − p)) = min(20 × 0.3, 20 × 0.7) = min(6, 14) = 6

Since 6 > 5, the criterion for the CLT is met.

For a dichotomous characteristic, the population mean is μ = p = 0.3 and the population standard deviation is σ = √(p(1 − p)) = √0.21 ≈ 0.46.

With samples of size 20, the distribution of the sample means is therefore approximately normal, with mean 0.3 and standard deviation √(0.21/20) ≈ 0.10.

If we instead take random samples of size n = 10, the sample-size requirement for the CLT is not met:

min(np, n(1 − p)) = min(10 × 0.3, 10 × 0.7) = min(3, 7) = 3

A larger sample size is needed for the distribution of sample means to approach normality.
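The rule of thumb used in this section is easy to encode. `binomial_clt_ok` is a hypothetical helper written for this illustration; its default threshold of 5 follows the condition stated here, and passing `threshold=10` applies the stricter version instead.

```python
def binomial_clt_ok(n, p, threshold=5):
    """Hypothetical helper: return min(np, n(1-p)) and whether it exceeds the threshold."""
    smallest = min(n * p, n * (1 - p))
    return smallest, smallest > threshold

for n in (20, 10):
    smallest, ok = binomial_clt_ok(n, 0.3)
    print(f"n={n}: min(np, n(1-p)) = {smallest:.1f} -> CLT criterion met: {ok}")
```

With p = 0.3 it reports a minimum of 6.0 (criterion met) for n = 20 and 3.0 (not met) for n = 10.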

**Central Limit Theorem when the Distribution is Skewed**

The Poisson distribution is another useful probability model for discrete variables, such as the number of events that occur in a given time interval. For example, suppose a person receives 4 spam emails per day on average, although the actual number varies from day to day. We may want the probability that this person receives exactly 5 spam emails on a particular day, given the usual rate of 4 per day.

For such a case, we calculate the Poisson probability as:

$$P(X = x) = \frac{e^{-\mu}\mu^{x}}{x!}$$

where μ is the mean number of events per interval, x is the number of events that occur, and e ≈ 2.71828. So for the example mentioned above:

$$P(X = 5) = \frac{e^{-4}\,4^{5}}{5!} \approx 0.156$$
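The Poisson probability for the spam-email example can be evaluated with Python's standard library. `poisson_pmf` is a small helper written for this illustration; it implements the Poisson formula P(X = x) = e^(−μ) μ^x / x!.

```python
import math

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

p5 = poisson_pmf(5, 4)    # exactly 5 spam emails when the daily average is 4
print(f"P(X = 5) = {p5:.4f}")
```

It prints P(X = 5) = 0.1563, so there is roughly a 16% chance of exactly 5 spam emails on a given day.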

Now consider a different Poisson distribution, with μ = 3 and σ = √3 ≈ 1.73. This population distribution is skewed to the right rather than normal.

Although the population is not normally distributed, the CLT can still be applied when n is at least 30 (a common rule of thumb). With samples of size 30, the sample means are approximately normally distributed, with mean 3 and standard deviation 1.73/√30 ≈ 0.32.

Smaller samples, such as samples of size 10, do not meet this criterion, and the resulting distribution of sample means is further from normal. Its standard deviation is also larger: 1.73/√10 ≈ 0.55.
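The comparison between samples of size 10 and 30 can be simulated. Python's standard `random` module has no Poisson sampler, so this sketch draws Poisson values with Knuth's multiplication method (an implementation choice of ours, adequate for small μ); μ = 3 follows the example, while the number of repetitions and the seed are arbitrary.

```python
import math
import random
import statistics

def poisson_draw(mu, rng):
    """One Poisson(mu) draw via Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    # Count how many further uniform factors can be multiplied in before the
    # running product drops to e^(-mu) or below.
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_mean_summary(n, reps=10_000, mu=3.0, seed=7):
    """Mean, SD and skewness of `reps` simulated sample means of size n."""
    rng = random.Random(seed)
    means = [statistics.fmean(poisson_draw(mu, rng) for _ in range(n))
             for _ in range(reps)]
    m = statistics.fmean(means)
    sd = statistics.pstdev(means, m)
    skew = statistics.fmean(((x - m) / sd) ** 3 for x in means)
    return m, sd, skew

summary = {}
for n in (10, 30):
    summary[n] = sample_mean_summary(n)
    m, sd, skew = summary[n]
    print(f"n={n}: mean={m:.2f}, SD={sd:.3f} "
          f"(theory {math.sqrt(3) / math.sqrt(n):.3f}), skewness={skew:+.2f}")
```

The simulated standard deviations should be close to 1.73/√10 ≈ 0.55 and 1.73/√30 ≈ 0.32. Theory puts the skewness of the sample means at about 0.18 for n = 10 and 0.11 for n = 30, so the larger samples are closer to a symmetric normal shape.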

**Author Profile:** Mark Brady has an MBA degree with 15 years of work experience in multiple fields including marketing, marketing research, analytics and insight development. He has worked with clients in multiple industries like retail.