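We have a set of observations $(x_i, y_i)$, and we want to find the parameters $a, b, \sigma$ such that $Y = aX + b + \epsilon$, where the noise $\epsilon \sim N(0, \sigma)$ is normally distributed with mean 0 and standard deviation $\sigma$.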
Solution
We will apply Maximum Likelihood Estimation. Applied to this problem, we want to find the argmax of the expression below:
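$$
\begin{aligned}
\arg\max_{a,b,\sigma} \prod_i P[y_i \mid x_i]
&= \arg\max_{a,b,\sigma} \log\Big(\prod_i P[y_i \mid x_i]\Big) && \text{as } \log \text{ is monotonically increasing} \\
&= \arg\max_{a,b,\sigma} \sum_i \log\big(P[y_i \mid x_i]\big) && \text{as } \log\Big(\prod_i z_i\Big) = \sum_i \log(z_i) \\
&= \arg\max_{a,b,\sigma} \sum_i \log\left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{a x_i + b - y_i}{\sigma}\right)^2}\right) && \text{as } P[y_i \mid x_i] = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{a x_i + b - y_i}{\sigma}\right)^2} \\
&= \arg\max_{a,b,\sigma} \left(-N\log(\sigma) - \frac{N}{2}\log(2\pi) - \frac{1}{2}\sum_i \left(\frac{a x_i + b - y_i}{\sigma}\right)^2\right) && \text{expanding the logarithm} \\
&= \arg\max_{a,b,\sigma} \left(-N\log(\sigma) - \frac{1}{2}\sum_i \left(\frac{a x_i + b - y_i}{\sigma}\right)^2\right) && \text{as } -\frac{N}{2}\log(2\pi) \text{ is a constant}
\end{aligned}
$$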
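As a numerical sanity check on the expansion above, here is a minimal Python sketch. The function names `log_likelihood` and `expanded_form` and the sample data are illustrative assumptions, not part of the original derivation. It evaluates the sum of log densities directly and compares it with the expanded expression, including the constant term.

```python
import math

def log_likelihood(a, b, sigma, xs, ys):
    """Sum of log Gaussian densities of y_i around a*x_i + b (hypothetical helper)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = (a * x + b - y) / sigma
        density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
        total += math.log(density)
    return total

def expanded_form(a, b, sigma, xs, ys):
    """-N log(sigma) - (N/2) log(2 pi) - (1/2) sum(((a x_i + b - y_i) / sigma)^2)."""
    n = len(xs)
    squares = sum(((a * x + b - y) / sigma) ** 2 for x, y in zip(xs, ys))
    return -n * math.log(sigma) - 0.5 * n * math.log(2 * math.pi) - 0.5 * squares

# Made-up observations, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

direct = log_likelihood(2.0, 1.0, 0.5, xs, ys)
expanded = expanded_form(2.0, 1.0, 0.5, xs, ys)
print(abs(direct - expanded) < 1e-9)
```

Both functions should agree up to floating-point error for any choice of parameters with a positive sigma.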
To find the argmax, we set each partial derivative to zero. Starting with b:
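$$
\begin{aligned}
0 &= \frac{\partial}{\partial b} \log\prod_i P[y_i \mid x_i] \\
  &= \frac{\partial}{\partial b} \left(-N\log(\sigma) - \frac{1}{2}\sum_i \left(\frac{a x_i + b - y_i}{\sigma}\right)^2\right) \\
\Rightarrow\quad 0 &= \frac{\partial}{\partial b} \sum_i (a x_i + b - y_i)^2 && \text{removing terms without } b \text{ and constant factors} \\
  &= \sum_i \frac{\partial}{\partial b} (a x_i + b - y_i)^2 \\
  &= \sum_i 2\,(a x_i + b - y_i) \\
\Rightarrow\quad 0 &= \sum_i (a x_i + b - y_i) && \text{dividing both sides by 2} \\
\Rightarrow\quad a\sum_i x_i + N b &= \sum_i y_i
\end{aligned}
$$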
Doing the same for a:
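$$
\begin{aligned}
0 &= \frac{\partial}{\partial a} \log\prod_i P[y_i \mid x_i] \\
  &= \frac{\partial}{\partial a} \left(-N\log(\sigma) - \frac{1}{2}\sum_i \left(\frac{a x_i + b - y_i}{\sigma}\right)^2\right) \\
\Rightarrow\quad 0 &= \frac{\partial}{\partial a} \sum_i (a x_i + b - y_i)^2 && \text{removing terms without } a \text{ and constant factors} \\
  &= \sum_i \frac{\partial}{\partial a} (a x_i + b - y_i)^2 \\
  &= \sum_i 2\,x_i\,(a x_i + b - y_i) \\
\Rightarrow\quad 0 &= \sum_i \left(a x_i^2 + x_i b - x_i y_i\right) && \text{dividing both sides by 2} \\
\Rightarrow\quad a\sum_i x_i^2 + b\sum_i x_i &= \sum_i x_i y_i
\end{aligned}
$$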
Solving these two linear equations for a and b gives the linear regression solution as a function of our observations. We can also estimate σ if we want to know the expected error of our predictions, but it is not needed to build the predictor itself.
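The two equations for a and b form a 2×2 linear system, so they can be solved in closed form. The sketch below is one way to do this in plain Python. The function names `fit_line` and `estimate_sigma` and the sample data are assumptions made for illustration. The σ estimate is not derived in the text above: it assumes the same procedure is applied to σ, that is, setting the derivative of the log-likelihood with respect to σ to zero, which gives σ² = (1/N) ∑ (a xᵢ + b − yᵢ)².

```python
import math

def fit_line(xs, ys):
    """Solve  a*sum(x^2) + b*sum(x) = sum(x*y)  and  a*sum(x) + N*b = sum(y)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # zero when all x_i are identical
    if det == 0:
        raise ValueError("all x values are identical; the slope is undetermined")
    a = (n * sxy - sx * sy) / det
    b = (sy - a * sx) / n
    return a, b

def estimate_sigma(a, b, xs, ys):
    """Maximum likelihood estimate of sigma: root of the mean squared residual."""
    n = len(xs)
    return math.sqrt(sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / n)

# Made-up observations lying exactly on y = 2x + 1, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

a, b = fit_line(xs, ys)
sigma = estimate_sigma(a, b, xs, ys)
print(a, b, sigma)
```

Because the sample points lie exactly on a line, the fit should recover a slope of 2 and an intercept of 1 with zero residual error. With noisy data, `estimate_sigma` gives the typical size of the prediction error.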