(In this derivation I may go into more detail than mathematically experienced readers need, but I would rather do that than lose some readers by skipping steps.)
I am assuming that the reader has a firm grasp of calculus and a working knowledge of linear algebra (very little is actually required). Please send any corrections/comments to john@solaire.ie
Most people who have taken an entry-level statistics class will have heard of the normal distribution. It is an extremely useful distribution for dealing with real-world problems. By the time most students begin dealing with the normal distribution, they will have learned elementary calculus. It seems like it would be a perfect application of integration; instead, students use log tables with precomputed values to solve problems. A curious student may wonder why, and discover that the integral of the P.D.F (Probability Density Function) of the normal distribution cannot be written in terms of elementary functions. What sort of crazy function could this be?
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{\frac{-1}{2\sigma^2}(x-\mu)^2}$$
At first glance this function seems so complex that it is hard to believe it could occur naturally. However, as we will show here, it can be derived from simple properties.
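Because the P.D.F has no elementary antiderivative, probabilities under the curve are computed numerically in practice. The following is a minimal sketch of that idea, assuming μ = 0 and σ = 1 and a plain trapezoid rule. The names `normal_pdf` and `trapezoid` are illustrative and not part of the original post. The result is compared against `math.erf`, the (non-elementary) error function from the Python standard library.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """P.D.F of the normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def trapezoid(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] using n trapezoids."""
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * step)
    return total * step

# Probability of landing within one standard deviation of the mean.
numeric = trapezoid(normal_pdf, -1.0, 1.0)

# The same probability written with the error function:
# P(|X - mu| <= sigma) = erf(1 / sqrt(2)).
via_erf = math.erf(1.0 / math.sqrt(2.0))

print(numeric, via_erf)
```

The two printed values should agree to several decimal places, which is essentially what a precomputed table stores.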
Let's call the P.D.F of the normal distribution p(x).
There are two fundamental properties of p(x) that we are going to use in its derivation:
1. The rate at which the probability of finding a value decreases is proportional to the distance from the mean.
2. The rate at which the probability of finding a value decreases is proportional to the probability itself.
We will also be using some properties shared by all P.D.Fs in this derivation:
3. The function is always non-negative.
4. The integral of the function from negative infinity to infinity is 1.
We will need to translate each of these properties to mathematical notation.
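$$-\frac{d(p(x))}{dx} \propto (x-\mu) \tag{A1}$$
$$-\frac{d(p(x))}{dx} \propto p(x) \tag{A2}$$
$$p(x) \geqslant 0 \tag{A3}$$
$$\int_{-\infty}^{\infty} p(x)\,dx = 1 \tag{A4}$$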
Combining properties 1 and 2:
$$-\frac{d(p(x))}{dx} \propto (x-\mu)\,p(x) \tag{Eq.1.1}$$
Adding a constant of proportionality k:
$$\frac{d(p(x))}{dx} = -k(x-\mu)\,p(x)$$
$$\frac{1}{p(x)}\,d(p(x)) = -k(x-\mu)\,dx \tag{Eq.1.2}$$
Integrate both sides:
$$\int \frac{1}{p(x)}\,d(p(x)) = \int -k(x-\mu)\,dx \tag{Eq.2.1}$$
$$\int \frac{1}{p(x)}\,d(p(x)) = -k\int (x-\mu)\,dx$$
I use a substitution in the next step. While it is not strictly necessary, doing it here keeps the formulas nice and tidy.
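$$\beta = x - \mu \tag{Eq.2.2.1}$$
$$d\beta = dx \tag{Eq.2.2.2}$$
$$\int \frac{1}{p(x)}\,d(p(x)) = -k\int \beta\,d\beta \tag{Eq.2.2.3}$$
$$\int \frac{1}{p(x)}\,d(p(x)) = -k\frac{\beta^2}{2} + C$$
$$\ln p(x) = -k\frac{(x-\mu)^2}{2} + C \tag{Eq.2.2.4}$$
$$p(x) = e^{-k\frac{(x-\mu)^2}{2} + C} \tag{Eq.2.3.1}$$
$$p(x) = e^{C} e^{-k\frac{(x-\mu)^2}{2}} \tag{Eq.2.3.2}$$
$$h = e^{C} \tag{Eq.2.3.3}$$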
Since C is a constant, h is also a constant and can be treated as such moving forward.
$$p(x) = h e^{\frac{-k}{2}(x-\mu)^2} \tag{Eq.2.4}$$
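As a sanity check, the solution in Eq.2.4 can be tested numerically against the differential equation it came from, dp/dx = -k(x-μ)p(x). This is only a sketch: the values h = 1, k = 2 and μ = 0.5 are arbitrary assumptions chosen for illustration, and the derivative is approximated with a central finite difference.

```python
import math

def p(x, h=1.0, k=2.0, mu=0.5):
    """Candidate solution p(x) = h * exp(-k/2 * (x - mu)^2) from Eq.2.4."""
    return h * math.exp(-0.5 * k * (x - mu) ** 2)

def derivative(f, x, eps=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

def ode_residual(x, h=1.0, k=2.0, mu=0.5):
    """Left side minus right side of dp/dx = -k (x - mu) p(x)."""
    f = lambda t: p(t, h, k, mu)
    return derivative(f, x) - (-k * (x - mu) * f(x))

# Largest mismatch over a grid of points from -3 to 3.
max_residual = max(abs(ode_residual(i / 10.0)) for i in range(-30, 31))
print(max_residual)
```

A residual close to zero at every grid point suggests the closed form does satisfy the equation, up to finite-difference error.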
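$$\int_{-\infty}^{\infty} p(x)\,dx = 1 \tag{A4}$$
$$\int_{-\infty}^{\infty} h e^{\frac{-k}{2}(x-\mu)^2}\,dx = 1 \tag{Eq.3.1.1}$$
$$\int_{-\infty}^{\infty} e^{\frac{-k}{2}(x-\mu)^2}\,dx = \frac{1}{h} \tag{Eq.3.1.2}$$
$$u = x - \mu \tag{Eq.3.2.1}$$
$$du = dx \tag{Eq.3.2.2}$$
$$\int_{-\infty}^{\infty} e^{\frac{-k}{2}u^2}\,du = \frac{1}{h} \tag{Eq.3.2.3}$$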
The integrand is symmetric about u = 0, so the integral over the whole line is twice the integral from 0 to infinity.
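$$2\int_{0}^{\infty} e^{\frac{-k}{2}u^2}\,du = \frac{1}{h} \tag{Eq.3.2.4}$$
$$\int_{0}^{\infty} e^{\frac{-k}{2}u^2}\,du = \frac{1}{2h} \tag{Eq.3.2.5}$$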
We now need to evaluate this integral, which is a Gaussian integral. Gaussian integrals are integrals of e raised to the power of a concave quadratic function. Evaluating these types of integrals is quite an involved process.
$$I = \int_{0}^{\infty} e^{\frac{-k}{2}u^2}\,du \tag{Eq.4.1}$$
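Square both sides:
$$I^2 = \int_{0}^{\infty} e^{\frac{-k}{2}u^2}\,du \int_{0}^{\infty} e^{\frac{-k}{2}u^2}\,du$$
Replace one of the u with a dummy variable v:
$$I^2 = \int_{0}^{\infty} e^{\frac{-k}{2}u^2}\,du \int_{0}^{\infty} e^{\frac{-k}{2}v^2}\,dv$$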
The product of two integrals, where the two variables are independent and the bounds of each integral do not depend on the other variable, can be transformed into an iterated integral. Why is this true?
$$M = \int_{a}^{b} f(x)\,dx \int_{c}^{d} g(y)\,dy \tag{Eq.B1}$$
If $\int_{a}^{b} f(x)\,dx$ is defined, it is a constant, so it can be brought inside the other integral.
$$M = \int_{c}^{d} \left( \int_{a}^{b} f(x)\,dx \right) g(y)\,dy \tag{Eq.B2}$$
$g(y)$ is constant with respect to x, so it can be brought inside the inner integral.
$$M = \int_{c}^{d} \int_{a}^{b} f(x)\,g(y)\,dx\,dy \tag{Eq.B3}$$
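The identity between Eq.B1 and Eq.B3 can also be checked numerically. The sketch below assumes two arbitrary example functions, f(x) = cos(x) on [0, 1] and g(y) = e^(-y) on [0, 2], and a simple midpoint rule. These choices are illustrative only and do not come from the original post.

```python
import math

def midpoint(func, lower, upper, n=400):
    """Approximate the integral of func over [lower, upper] with the midpoint rule."""
    step = (upper - lower) / n
    return step * sum(func(lower + (i + 0.5) * step) for i in range(n))

def f(x):
    return math.cos(x)

def g(y):
    return math.exp(-y)

a, b = 0.0, 1.0  # bounds for x
c, d = 0.0, 2.0  # bounds for y

# Eq.B1: the product of two separate one-dimensional integrals.
product = midpoint(f, a, b) * midpoint(g, c, d)

# Eq.B3: one iterated integral of f(x) * g(y), inner over x and outer over y.
iterated = midpoint(lambda y: midpoint(lambda x: f(x) * g(y), a, b), c, d)

print(product, iterated)
```

Both numbers should match each other and lie close to the exact value sin(1)(1 - e^(-2)).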
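Applying this to $I^2$:
$$I^2 = \int_{0}^{\infty} \int_{0}^{\infty} e^{\frac{-k}{2}u^2} e^{\frac{-k}{2}v^2}\,du\,dv \tag{Eq.4.4}$$
$$I^2 = \int_{0}^{\infty} \int_{0}^{\infty} e^{\frac{-k}{2}(u^2+v^2)}\,du\,dv \tag{Eq.4.5}$$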
Transforming the coordinates will make the integral easier to solve. To do this we use the change of variables formula for multiple integrals. How this formula is derived is quite interesting and is detailed quite well here. Explaining it would require its own post, but I highly recommend checking out the link. I will explain each of the components of this formula before moving on.
We are going to transform the integral from Cartesian coordinates to polar coordinates.
Let T be the function which does the transformation.
$$(x, y) = T(u, v) \tag{Eq.5.1}$$
Here (x, y) are the coordinates of the original region we are working with.
(u, v) are the new coordinates we are changing to; for polar coordinates these are (r, θ).
This means that
$$T(r, \theta) = (r\cos\theta,\ r\sin\theta) \tag{Eq.5.2}$$
$DT(r, \theta)$ is the derivative matrix of T.
What is a derivative matrix?
It is a generalization of the derivative of a function of a single variable.
Let's start with a simple function.
Let $f: \mathbb{R} \to \mathbb{R}$
$$Df(a) = \begin{bmatrix} \frac{df}{dx} \end{bmatrix} \tag{Eq.C1}$$
The derivative matrix of a single-input, single-output function is just a matrix with a single entry, which is the derivative of the function.
Let $g: \mathbb{R}^n \to \mathbb{R}$
$$Dg(a) = \begin{bmatrix} \frac{\partial g}{\partial x_1} & \frac{\partial g}{\partial x_2} & \frac{\partial g}{\partial x_3} & \dots & \frac{\partial g}{\partial x_n} \end{bmatrix} \tag{Eq.C2}$$
The derivative matrix of a function with multiple inputs and a single output is a matrix with one row. That row has one entry per input, and each entry is the partial derivative of the output with respect to that input.
The derivative matrix of a function with multiple inputs and multiple outputs is a matrix with one row per output. Each row has one entry per input, and each entry is the partial derivative of that row's output with respect to that input.
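The row-per-output, column-per-input layout can be made concrete with a small numerical sketch. The helper `derivative_matrix` and the map `example` below are hypothetical, chosen only for illustration. The partial derivatives are approximated with central finite differences rather than computed symbolically.

```python
import math

def derivative_matrix(func, point, eps=1e-6):
    """Approximate the derivative matrix of func at point.

    Row i corresponds to output i and column j to input j.
    """
    n_outputs = len(func(point))
    n_inputs = len(point)
    matrix = [[0.0] * n_inputs for _ in range(n_outputs)]
    for j in range(n_inputs):
        forward = list(point)
        backward = list(point)
        forward[j] += eps
        backward[j] -= eps
        f_plus = func(forward)
        f_minus = func(backward)
        for i in range(n_outputs):
            matrix[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return matrix

def example(v):
    """Hypothetical map from R^2 to R^2: (x, y) -> (x^2 * y, x + sin(y))."""
    x, y = v
    return [x * x * y, x + math.sin(y)]

numeric = derivative_matrix(example, [2.0, 0.5])

# Partial derivatives worked out by hand at (x, y) = (2, 0.5):
# row 1: [2xy, x^2], row 2: [1, cos(y)].
exact = [[2.0 * 2.0 * 0.5, 2.0 ** 2], [1.0, math.cos(0.5)]]

print(numeric)
print(exact)
```

Two inputs and two outputs give a 2 by 2 matrix, and the numerical entries should agree with the hand-computed ones to several decimal places.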
So what is the derivative matrix of the function T?
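As a sketch of the answer (standard textbook material, written out here on the assumption that T is applied to the integral in Eq.4.5, whose Cartesian variables are u = r cos θ and v = r sin θ), the derivative matrix has one row per output of T and one column per input:

$$DT(r, \theta) = \begin{bmatrix} \frac{\partial (r\cos\theta)}{\partial r} & \frac{\partial (r\cos\theta)}{\partial \theta} \\ \frac{\partial (r\sin\theta)}{\partial r} & \frac{\partial (r\sin\theta)}{\partial \theta} \end{bmatrix} = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}$$

$$\det DT(r, \theta) = r\cos^2\theta + r\sin^2\theta = r$$

Under that assumption, the region of integration in Eq.4.5 (the first quadrant) corresponds to 0 ≤ r < ∞ and 0 ≤ θ ≤ π/2, and the area element du dv becomes r dr dθ:

$$I^2 = \int_{0}^{\pi/2} \int_{0}^{\infty} e^{\frac{-k}{2}r^2}\, r\,dr\,d\theta = \frac{\pi}{2} \cdot \frac{1}{k} = \frac{\pi}{2k}$$

Taking the square root gives two candidate values, $I = \pm\sqrt{\frac{\pi}{2k}}$.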
The sign of h is the same as the sign of I. Since p(x) ⩾ 0 for all values of x, we have h ⩾ 0 and therefore I ⩾ 0. We therefore reject the negative solution for I.
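$$I = \sqrt{\frac{\pi}{2k}} \tag{Eq.7.1.1}$$
$$I = \int_{0}^{\infty} e^{\frac{-k}{2}u^2}\,du = \frac{1}{2h} \tag{Eq.7.1.2}$$
$$\sqrt{\frac{\pi}{2k}} = \frac{1}{2h} \tag{Eq.7.1.3}$$
$$\frac{\sqrt{\pi}}{\sqrt{2k}} = \frac{1}{2h} \tag{Eq.7.1.4}$$
$$\frac{\sqrt{2k}}{\sqrt{\pi}} = 2h \tag{Eq.7.1.5}$$
$$\frac{\sqrt{2k}}{2\sqrt{\pi}} = h \tag{Eq.7.1.6}$$
$$\sqrt{\frac{k}{2\pi}} = h \tag{Eq.7.1.7}$$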
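The closed forms for I and h can be checked numerically. The sketch below assumes a midpoint rule and truncates the infinite upper bound at a point where the integrand is negligibly small. Both the truncation point and the sample values of k are assumptions made for illustration.

```python
import math

def half_line_integral(k, n=20_000):
    """Midpoint-rule estimate of the integral of exp(-k u^2 / 2) over [0, inf).

    The upper bound is truncated at 12 / sqrt(k), where the integrand is
    about exp(-72) and contributes nothing at double precision.
    """
    upper = 12.0 / math.sqrt(k)
    step = upper / n
    return step * sum(math.exp(-0.5 * k * ((i + 0.5) * step) ** 2) for i in range(n))

for k in (0.5, 1.0, 4.0):
    numeric_I = half_line_integral(k)
    closed_I = math.sqrt(math.pi / (2.0 * k))   # Eq.7.1.1
    h = math.sqrt(k / (2.0 * math.pi))          # Eq.7.1.7
    print(k, numeric_I, closed_I, 1.0 / (2.0 * h))
```

For each k the three printed values should coincide, since the numerical integral, the closed form for I, and 1/(2h) all describe the same quantity.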
This P.D.F satisfies all the properties we started with. However, k is a constant which encodes some information: modifying k will change the properties of the distribution, but we don't yet know in what way. Encoding extraneous information in a distribution is a bad idea. If we plot this function with various values of k, we can see that the variance changes, so we will express k in terms of the variance.
$$p(x) = \sqrt{\frac{k}{2\pi}}\, e^{\frac{-k}{2}(x-\mu)^2} \tag{Eq.7.2}$$
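Instead of plotting, the effect of k can be seen by computing the area and the variance of Eq.7.2 numerically for a few values of k. This is a rough sketch, assuming μ = 0, a midpoint rule, and an integration range wide enough that the tails are negligible. The helper names are illustrative.

```python
import math

def p(x, k, mu=0.0):
    """The P.D.F from Eq.7.2, still written in terms of k."""
    return math.sqrt(k / (2.0 * math.pi)) * math.exp(-0.5 * k * (x - mu) ** 2)

def integrate(func, lower, upper, n=20_000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    step = (upper - lower) / n
    return step * sum(func(lower + (i + 0.5) * step) for i in range(n))

def area_and_variance(k, mu=0.0):
    """Return the total area under p and the integral of (x - mu)^2 * p(x)."""
    half_width = 12.0 / math.sqrt(k)  # tails beyond this are negligible
    lower, upper = mu - half_width, mu + half_width
    area = integrate(lambda x: p(x, k, mu), lower, upper)
    variance = integrate(lambda x: (x - mu) ** 2 * p(x, k, mu), lower, upper)
    return area, variance

results = {k: area_and_variance(k) for k in (0.5, 1.0, 2.0, 4.0)}
for k, (area, variance) in results.items():
    print(f"k={k}: area={area:.6f}, variance={variance:.6f}")
```

The area stays at 1 for every k, while the printed variance shrinks as k grows, which is the dependence the rest of the derivation pins down exactly.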
The definition of variance:
$$\sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx \tag{Eq.C6}$$
In our case our function is p(x)
$$\sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 p(x)\,dx \tag{Eq.8.1}$$
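$$L = x - \mu \tag{Eq.8.2.1}$$
$$\frac{dL}{dx} = 1 \tag{Eq.8.2.2}$$
$$dL = dx \tag{Eq.8.2.3}$$
$$\sigma^2 = \int_{-\infty}^{\infty} (L)^2\, p(L+\mu)\,dL \tag{Eq.8.3.1}$$
$$\sigma^2 = \int_{-\infty}^{\infty} (L)^2 \sqrt{\frac{k}{2\pi}}\, e^{\frac{-k}{2}(L+\mu-\mu)^2}\,dL \tag{Eq.8.3.2}$$
$$\sigma^2 = \int_{-\infty}^{\infty} (L)^2 \sqrt{\frac{k}{2\pi}}\, e^{\frac{-k}{2}(L)^2}\,dL \tag{Eq.8.3.3}$$
$$\sigma^2 = \sqrt{\frac{k}{2\pi}} \int_{-\infty}^{\infty} L^2 e^{\frac{-k}{2}L^2}\,dL \tag{Eq.8.3.4}$$
$$\sigma^2 = \sqrt{\frac{k}{2\pi}}\, J \tag{Eq.8.3.5}$$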
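Next we need to evaluate
$$J = \int_{-\infty}^{\infty} L^2 e^{\frac{-k}{2}L^2}\,dL \tag{Eq.8.4.1}$$
The integrand is symmetric about L = 0, so we can rewrite it as follows:
$$J = 2\int_{0}^{\infty} L^2 e^{\frac{-k}{2}L^2}\,dL \tag{Eq.8.4.2}$$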
Integration by parts
Integration by parts is an extremely useful method for solving integrals, and since its derivation is relatively straightforward I will go on a bit of a tangent and show it to you now.