How do we use Maximum Likelihood Estimation for Generative Models (Gaussian Discriminant Analysis, QDA, LDA)?
In Gaussian Discriminant Analysis, we took the generative-model approach: rather than learning only a decision boundary, we want to model the probability distribution of each underlying class. In practice, however, we do not know these distributions exactly. Therefore, we need an estimation tool to determine the prior probability and the class-conditional distribution of the features for each class.
Equivalently,
- We model the class-conditional probability distribution for each class $k$ (often assuming that it is Gaussian).
- We estimate the parameters of these distributions, such as the mean vectors, covariance matrices, and the class priors, from the training data (commonly using methods like Maximum Likelihood Estimation).
MLE is a tool for estimating the parameters of a statistical distribution. In GDA, we need to estimate the normal distribution (continuous) of each class and the prior probability (discrete) of each class.
Coin Flipping Exercise (for prior probability)
Flip a biased coin with head probability $p$ and tail probability $1-p$. Suppose that I flip the coin 10 times and get 8 heads and 2 tails.
Question: what is the value of the bias $p$ that is most likely to lead to this outcome?
Recall that the number of heads follows a binomial distribution:
$$X \sim \mathrm{Binomial}(n, p), \qquad P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$$
Thus, the probability of getting 8 heads in 10 flips is
$$P(X = 8) = \binom{10}{8} p^8 (1-p)^2$$
Let’s define this expression as the “Likelihood”. We will see why we call it a likelihood instead of a probability when we get to the continuous distribution case.
Therefore, the LIKELIHOOD FUNCTION is:
$$\mathcal{L}(p) = \binom{10}{8} p^8 (1-p)^2$$
Optimization problem: find $\hat{p} = \arg\max_p \mathcal{L}(p)$.
We can solve this by finding the critical point of $\mathcal{L}(p)$:
$$\frac{d\mathcal{L}}{dp} = \binom{10}{8}\, p^7 (1-p)\,\bigl(8(1-p) - 2p\bigr) = 0 \;\Longrightarrow\; p = \frac{8}{10}$$
Intuitively, this calculation yields what we expected, which is $\hat{p} = 8/10 = 0.8$.
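As a quick numerical check (not part of the original derivation), the sketch below evaluates the binomial likelihood on a grid of candidate biases and confirms that the maximizer is 0.8; the grid resolution and variable names are arbitrary choices.

```python
import numpy as np
from math import comb

# Observed data: 8 heads out of 10 flips (as in the exercise above).
n_flips, n_heads = 10, 8

# Binomial likelihood L(p) = C(10, 8) * p^8 * (1 - p)^2.
def likelihood(p):
    return comb(n_flips, n_heads) * p**n_heads * (1 - p)**(n_flips - n_heads)

# Evaluate on a fine grid of candidate biases and pick the maximizer.
grid = np.linspace(0.001, 0.999, 999)
p_hat = grid[np.argmax([likelihood(p) for p in grid])]
print(p_hat)  # ~0.8, matching the closed-form MLE n_heads / n_flips
```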
Estimated Prior Probability
Important
Given that we observe an event “A” happening $x$ times out of $n$ trials and we want to estimate the prior probability of event A happening, the estimated prior probability is:
$$\hat{P}(A) = \frac{x}{n}$$
Another definition: suppose our training set has $n$ points, with $x$ of them in class $C$. Then our estimated prior for class $C$ is $\hat{\pi}_C = \frac{x}{n}$.
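For illustration, here is a minimal sketch (with made-up labels, not data from the notes) showing that the estimated priors are just class counts divided by the total number of training points:

```python
import numpy as np

# Hypothetical training labels; in practice these come from your dataset.
y = np.array([0, 0, 1, 2, 1, 0, 2, 0, 1, 0])

# Estimated prior for each class C: pi_hat_C = (# points in class C) / n.
classes, counts = np.unique(y, return_counts=True)
priors = counts / len(y)
print(dict(zip(classes.tolist(), priors.tolist())))  # {0: 0.5, 1: 0.3, 2: 0.2}
```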
Likelihood of a Gaussian
We have training data (sample points) $X_1, X_2, \ldots, X_n$. We would like to find the best-fit Gaussian, meaning the best $\mu$ and $\sigma$ for the given training data.
Difference between Probability and Likelihood
In a continuous distribution, the probability of getting any particular point is zero. For the likelihood, however, we use the value of the probability density at each sample point, which is generally nonzero.
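As a small illustration of this distinction (a made-up example, assuming a standard normal), the density at a particular point is a positive number even though the probability of drawing exactly that point is zero:

```python
import numpy as np

# Standard normal density evaluated at x = 0.5.
x, mu, sigma = 0.5, 0.0, 1.0
density = np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
print(density)  # ~0.352: positive, even though P(X == 0.5) = 0 for continuous X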
The Likelihood of a Gaussian is defined as the product of the density values at the sample points:
$$\mathcal{L}(\mu, \sigma; X_1, \ldots, X_n) = N(X_1)\, N(X_2) \cdots N(X_n)$$
To simplify the computation, we take the log of this and call it the Log Likelihood:
$$\ell(\mu, \sigma; X_1, \ldots, X_n) = \ln \mathcal{L} = \sum_{i=1}^{n} \ln N(X_i)$$
Recall that the PDF of the (isotropic) Multivariate Gaussian Distribution is:
$$N(x) = \frac{1}{(\sqrt{2\pi}\,\sigma)^d} \exp\!\left(-\frac{\lVert x - \mu \rVert^2}{2\sigma^2}\right)$$
Each $X_i$ contributes a Normal density, thus,
$$\ell(\mu, \sigma) = \sum_{i=1}^{n} \left( -\frac{\lVert X_i - \mu \rVert^2}{2\sigma^2} - d \ln \sigma - \frac{d}{2} \ln(2\pi) \right)$$
Similar to the discrete case, we can take the derivative of the log likelihood to find the critical point (the maximum).
Estimation of $\mu$
Setting $\nabla_{\mu}\,\ell = 0$ and solving gives
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
Note that this expression is the same as the Sample Mean.
Estimation of Variance
Setting $\partial \ell / \partial \sigma = 0$ and solving gives
$$\hat{\sigma}^2 = \frac{1}{dn} \sum_{i=1}^{n} \lVert X_i - \hat{\mu} \rVert^2$$
Here $\hat{\mu}$ is used since we do not know the exact mean of the distribution; our best substitute for the true $\mu$ is the estimate $\hat{\mu}$.
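To make these estimates concrete, here is a short Python sketch (synthetic data, isotropic model as assumed above; the variable names are illustrative) that evaluates the closed-form MLE expressions for $\hat{\mu}$ and $\hat{\sigma}^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: n points in d dimensions drawn from an
# isotropic Gaussian with true mean (1, 2) and true sigma = 1.5.
n, d = 1000, 2
X = rng.normal(loc=[1.0, 2.0], scale=1.5, size=(n, d))

# MLE of the mean: the sample mean.
mu_hat = X.mean(axis=0)

# MLE of the (isotropic) variance: average squared distance to mu_hat,
# divided by the dimension d.
sigma2_hat = np.sum((X - mu_hat) ** 2) / (d * n)

print(mu_hat)      # close to [1, 2]
print(sigma2_hat)  # close to 1.5**2 = 2.25
```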
Takeaway
Use the Sample Mean and Sample Variance* of the points in class $C$ to estimate the mean and variance of class $C$’s Gaussian.

*Almost the Sample Variance, except we’re using the estimated mean $\hat{\mu}$.
Conclusion:
- QDA: Estimate the conditional mean $\hat{\mu}_C$ and conditional variance $\hat{\sigma}_C^2$ of each class $C$ separately, and estimate the prior probabilities $\hat{\pi}_C$.
- LDA: Estimate the conditional means $\hat{\mu}_C$ and prior probabilities $\hat{\pi}_C$ in the same way, but use one variance for all classes.
- We define “one variance for all classes” as:
$$\hat{\sigma}^2 = \frac{1}{dn} \sum_{C} \sum_{i:\, y_i = C} \lVert X_i - \hat{\mu}_C \rVert^2$$
Shewchuk
Notice that although LDA is computing one variance for all the data, each sample point contributes with respect to its own class’s mean. This gives a very different result than if you simply use the global mean! It’s usually smaller than the global variance. We say “within-class” because we use each point’s distance from its class’s mean, but “pooled” because we then pool all the classes together.
- Basically, the mean and prior probability calculations remain the same; only the variance calculation differs, in that we use each point’s class mean instead of the overall mean when computing the “one” pooled variance.
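To tie the conclusion together, here is a minimal Python sketch of the QDA/LDA parameter estimation described above, assuming the isotropic (single-variance) Gaussian model used in these notes; the function name `fit_gda` and the synthetic data are illustrative, not part of the original.

```python
import numpy as np

def fit_gda(X, y, shared_variance=False):
    """MLE parameter estimates for isotropic GDA.

    shared_variance=False -> QDA-style (one variance per class)
    shared_variance=True  -> LDA-style (one pooled within-class variance)
    """
    n, d = X.shape
    classes = np.unique(y)
    params = {}
    pooled_sq_dist = 0.0

    for c in classes:
        Xc = X[y == c]
        n_c = len(Xc)
        mu_c = Xc.mean(axis=0)                 # class mean mu_hat_C
        sq_dist = np.sum((Xc - mu_c) ** 2)     # distances to the *class* mean
        pooled_sq_dist += sq_dist
        params[c] = {
            "prior": n_c / n,                  # pi_hat_C = n_C / n
            "mean": mu_c,
            "var": sq_dist / (d * n_c),        # sigma_hat_C^2 (QDA)
        }

    if shared_variance:
        # LDA: pooled within-class variance, shared by every class.
        pooled_var = pooled_sq_dist / (d * n)
        for c in classes:
            params[c]["var"] = pooled_var

    return params


# Tiny synthetic example (hypothetical data, just to exercise the code).
rng = np.random.default_rng(1)
X0 = rng.normal([0, 0], 1.0, size=(60, 2))
X1 = rng.normal([3, 3], 2.0, size=(40, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 60 + [1] * 40)

print(fit_gda(X, y, shared_variance=False))  # QDA: separate variances
print(fit_gda(X, y, shared_variance=True))   # LDA: one pooled variance
```

Note that the pooled variance measures each point’s distance to its own class mean, so it is typically smaller than a variance computed about the global mean, consistent with the Shewchuk quote above.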