Let $Y = WX$. If $\Sigma(X)$ (the covariance matrix of $X$) is known and $\Sigma(Y) = I$, then $W$ is a whitening transformation.

Common choices for $W$ are $L^T$, where $L$ is the Cholesky factor of $\Sigma(X)^{-1}$ (that is, $\Sigma(X)^{-1} = LL^T$), or simply $\Sigma(X)^{-\frac{1}{2}}$.
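
As a quick check, here is a short derivation sketch. It writes $\Sigma = \Sigma(X)$ and assumes $L$ is the lower-triangular Cholesky factor of the precision matrix, $\Sigma^{-1} = LL^T$, and that $\Sigma^{-\frac{1}{2}}$ is the symmetric inverse square root. Since $\Sigma(Y) = W \Sigma W^T$, both choices give the identity:

$$
W = L^T: \quad L^T \Sigma L = L^T (LL^T)^{-1} L = L^T L^{-T} L^{-1} L = I
$$

$$
W = \Sigma^{-\frac{1}{2}}: \quad \Sigma^{-\frac{1}{2}} \, \Sigma \, \Sigma^{-\frac{1}{2}} = I
$$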

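To whiten a data matrix, treat each feature as a random variable, so that each sample is a random vector. The usual approach is to first estimate the covariance matrix of the data by maximum likelihood, and then whiten using its Cholesky decomposition.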
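
The procedure for a data matrix can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function name `cholesky_whiten`, the `eps` ridge term, and the synthetic data are assumptions made for the example. It factors the estimated covariance as $\hat{\Sigma} = LL^T$ and uses $W = L^{-1}$, which is one valid form of Cholesky whitening. Samples are stored as rows, so $Y = WX$ becomes `Xc @ W.T`.

```python
import numpy as np


def cholesky_whiten(X, eps=1e-8):
    """Whiten a data matrix X of shape (n_samples, n_features).

    Returns (Y, W), where each row of Y equals W @ (x - mean).
    """
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Maximum-likelihood covariance estimate (divides by n, not n - 1).
    sigma = Xc.T @ Xc / X.shape[0]
    # Small ridge (an assumption of this sketch) keeps sigma positive definite.
    sigma = sigma + eps * np.eye(sigma.shape[0])
    # sigma = L @ L.T with L lower triangular.
    L = np.linalg.cholesky(sigma)
    # W = L^{-1}, so that W @ sigma @ W.T = I.
    W = np.linalg.solve(L, np.eye(L.shape[0]))
    Y = Xc @ W.T
    return Y, W


# Synthetic correlated data: 1000 samples, 3 features.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
X = rng.normal(size=(1000, 3)) @ A.T + np.array([1.0, -2.0, 0.5])

Y, W = cholesky_whiten(X)
cov_Y = Y.T @ Y / Y.shape[0]
print(np.round(cov_Y, 3))  # should be close to the 3x3 identity
```

For large feature dimensions, a triangular solver would avoid forming $L^{-1}$ explicitly; the explicit inverse is kept here only to make $W$ visible.
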
In self-supervised learning, whitening the hidden representations is an important technique:
The whitening operation has a “scattering” effect on the batch samples, avoiding degenerate solutions where all the sample representations collapse to a single point.
Ref: