The pdf of X is

$$f(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)$$

where $x, \mu \in \mathbb{R}^d$.

$\Sigma$ is the $d \times d$ SPD (symmetric, positive definite) covariance matrix, and $\Sigma^{-1}$ is the $d \times d$ SPD precision matrix.
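To make the formula concrete, here is a minimal sketch in Python (assuming NumPy and SciPy; the values of $\mu$ and $\Sigma$ are made-up examples) that evaluates the pdf directly from the definition and cross-checks it against SciPy:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up example parameters (d = 2)
mu = np.array([1.0, -0.5])            # mean vector in R^d
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])        # SPD covariance matrix

x = np.array([0.3, 0.2])
d = len(mu)

# Evaluate the pdf directly from the formula above
diff = x - mu
quad = diff @ np.linalg.inv(Sigma) @ diff   # (x - mu)^T Sigma^{-1} (x - mu)
f_x = np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

# Cross-check against SciPy's implementation
assert np.isclose(f_x, multivariate_normal(mean=mu, cov=Sigma).pdf(x))
```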

To make the function easier to understand, rewrite it as $f(x) = n(q(x))$, where $q(x) = (x - \mu)^\top \Sigma^{-1} (x - \mu)$ and $n(q) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} e^{-q/2}$.
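A sketch of this decomposition in code (same made-up $\mu$ and $\Sigma$ as above; the function names q and n simply mirror the notation here):

```python
import numpy as np

mu = np.array([1.0, -0.5])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
Lambda = np.linalg.inv(Sigma)         # precision matrix
d = len(mu)

def q(x):
    """Quadratic form of the precision matrix: R^d -> R."""
    diff = x - mu
    return diff @ Lambda @ diff

def n(q_val):
    """Monotonic, convex map: R -> R, exponential of the negation of half its argument."""
    return np.exp(-0.5 * q_val) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

x = np.array([0.3, 0.2])
f_x = n(q(x))                         # f(x) = n(q(x))
```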

If we think carefully, we notice that $q(x)$ is a function from $\mathbb{R}^d$ to $\mathbb{R}$, and $n(\cdot)$ is a function from $\mathbb{R}$ to $\mathbb{R}$.

$q(x)$ is a quadratic function that we know: it is the quadratic form of the precision matrix, a quadratic bowl with its center at $\mu$. $n(\cdot)$ is a monotonic, convex function: an exponential of the negation of half of its argument. The graphs are shown below.

Left: $q(x)$, the quadratic bowl. Right: the probability density function $f(x) = n(q(x))$. Notice how the exponential map $n(\cdot)$ transforms the graph.

The two graphs have different isovalues, but the mapping $n(\cdot)$ does not change the isosurfaces. Thus, minimizing $q(x)$ is equivalent to maximizing the probability density function.
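A quick numerical sketch of this equivalence (illustrative parameters again): over a grid of points, the index that minimizes $q$ is exactly the index that maximizes the (unnormalized) density, because $e^{-q/2}$ is strictly decreasing in $q$.

```python
import numpy as np

mu = np.array([1.0, -0.5])
Lambda = np.linalg.inv(np.array([[2.0, 0.6],
                                 [0.6, 1.0]]))   # precision matrix

# Grid of candidate points in R^2
xs = np.linspace(-3.0, 4.0, 201)
ys = np.linspace(-4.0, 3.0, 201)
grid = np.array([[x, y] for x in xs for y in ys])

diffs = grid - mu
q_vals = np.einsum('ij,jk,ik->i', diffs, Lambda, diffs)  # q(x) at each grid point
f_vals = np.exp(-0.5 * q_vals)                           # density up to a constant

# Monotonic (decreasing) map: the minimizer of q is the maximizer of f
assert np.argmin(q_vals) == np.argmax(f_vals)
```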

Hint

Meaning of "different isovalues but same isosurfaces": the contour levels (isovalues) of the function change, but the structure of the level sets (isosurfaces) remains the same. This happens whenever you apply a strictly monotonic transformation to a function. In our case, that transformation is the exponential map $n(\cdot)$.

Summary

If you understand the isosurfaces of a quadratic function, then you understand the isosurfaces of a Gaussian, because they’re the same.


Remember from Visualizing Quadratic Form that the isocontours of $q(x) = (x - \mu)^\top \Sigma^{-1} (x - \mu)$ are determined by the eigenvalues and eigenvectors of $\Sigma$: the axes of the elliptical isocontours point along the eigenvectors of $\Sigma$, and their radii scale with the square roots of the corresponding eigenvalues.
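Here is a small sketch that extracts this isocontour geometry from a covariance matrix (the matrix is an illustrative example; np.linalg.eigh is used because $\Sigma$ is symmetric):

```python
import numpy as np

Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])        # example covariance matrix

# eigh is the right choice for symmetric matrices; eigenvalues come back ascending
eigvals, eigvecs = np.linalg.eigh(Sigma)

# Each column of eigvecs is an axis direction of the elliptical isocontours;
# the radius along that axis (at a fixed isovalue) scales with sqrt(eigenvalue).
for lam, v in zip(eigvals, eigvecs.T):
    print(f"axis direction {v}, radius proportional to {np.sqrt(lam):.3f}")
```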

Side Notes

Remember from the induced norm theorem that $x^\top M x$ (for SPD $M$) is some sort of norm squared. In this case, having the precision matrix in the middle means that the norm is a sort of warped distance from $x$ to the mean $\mu$. Formally:

$$d(x, \mu) = \|x - \mu\|_{\Sigma^{-1}} = \sqrt{(x - \mu)^\top \Sigma^{-1} (x - \mu)} = \sqrt{q(x)}$$

(This quantity is known as the Mahalanobis distance.)

Shewchuk puts it this way: "So we think of the precision matrix as a 'metric tensor' which defines a metric, a sort of warped distance from x to the mean μ."
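As a sketch of this warped distance in code (illustrative parameters; SciPy's mahalanobis helper expects the inverse covariance, i.e. the precision matrix):

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

mu = np.array([1.0, -0.5])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
Lambda = np.linalg.inv(Sigma)         # precision matrix, acting as the metric tensor

x = np.array([0.3, 0.2])
diff = x - mu

# Warped distance from x to the mean: sqrt((x - mu)^T Sigma^{-1} (x - mu))
dist = np.sqrt(diff @ Lambda @ diff)

# SciPy computes the same quantity given the inverse covariance
assert np.isclose(dist, mahalanobis(x, mu, Lambda))
```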