利⽤ EM 迭代模型参数的思路
我们来介绍如何利⽤ EM 的⽅法,迭代地求取⾼斯混合模型的参数,先拿出迭代公式,我们令:
- 样本的观测数据集为$X = {x_1, x_2, …, x_N}$
- 隐变量集为$Z = {z_1. z_2, …, z_N}$
- 参数为$\theta = {p_1, p_2, …, p_K, \mu_1, \mu_2, …, \mu_K, \Sigma_1, \Sigma_2, …, \Sigma_K}$
则有:
\[\theta^{t+1} = arg \max_\theta~\int_Z log~p(X, Z \mid \theta)p(Z \mid X, \theta^{(t)})dZ\]我们令:
\[Q(\theta, \theta^{(t)}) = \int_Z log~p(X, Z \mid \theta) p(Z \mid X, \theta^{(t)})dZ\]其中$Z$是离散型变量, 包含了$N$个样本$z_1, z_2, …, z_N$, 因此进一步转化为:
\[\begin{aligned} \theta(\theta, \theta^{(t)}) & = \int_Z log~P(X, Z \mid \theta) p(Z \mid X, \theta^{(t)})dZ \\ & = \sum_Z [log \prod_{i=1}^N p(x_i, z_i \mid \theta) \prod_{i=1}^N p(z_i \mid x_i \theta^{(t)})] \\ & = \sum_{z_1, z_2, ..., z_N}[(\sum_{i=1}^N log~p(x_i, z_i, \theta))(\prod_{i=1}^N p(z_i \mid x_i, \theta^{(t)})] \\ & = \sum_{z_1, z_2, ..., z_N}[(log~p(x_1, z_1 \mid \theta) + ... + log~p(x_N, z_N \mid \theta))](\prod_{i=1}^N p(z_i \mid x_i, \theta^{(t)})) \end{aligned}\]我们任取其中一项, 看看能否化简:
\[\sum_{z_1, z_2, .., z_N}[log~p(x_1, z_1 \mid \theta)\prod_{i=1}^N p(z_i \mid x_i, \theta^{(t)})] = \sum_{z_1, z_2, ..., z_N}[log~p(x_1, z_1 \mid \theta)p(z_1 \mid x_1, \theta^{(t)}) \prod_{i=2}^N p(z_i \mid x_i, \theta^{(t)})]\]我们知道有这么一条性质:
\[\sum\sum x_iy_i = \sum x_i \sum y_i\]因此上式进一步化简为:
\[\begin{aligned} \sum_{z_1, z_2, ..., z_N}[log~p(x_1, z_1 \mid \theta)p(z_1 \mid x_1, \theta^{(t)})\prod_{i=2}^N p(z_i \mid x_i, \theta^{(t)})] & = \sum_{z_1, z_2, ..., z_n} [log~p(x_1, z_1 \mid \theta)p(z_1 \mid x_1, \theta^{(t)})] \sum_{z_1, z_2, ..., z_N} \prod_{i=2}^N p(z_i \mid x_i, \theta^{(t)}) \\ & = \sum_{z_1}[log~p(x_1, z_1 \mid \theta)p(z_1 \mid x_1, \theta^{(t)})] \sum_{z_1, ..., z_N} \prod_{i=2}^N p(z_i \mid x_i, \theta^{(t)})\end{aligned}\]上面的最后一步是依据各个求和等式中的有关项和无关项,化简了求和公式下面的下标。
我们单独看后面一部分:
\[\begin{aligned} \sum_{z_1, ..., z_N} \prod_{i=2}^N p(z_i \mid x_i, \theta^{(t)}) & = \sum_{z_1, ..., z_N} p(z_2 \mid x_2, \theta^{(t)})p(z_3 \mid x_3, \theta^{(t)}...p(z_N \mid X_N, \theta^{(t)})) \\ & = \sum_{z_2} p(z_2 \mid x_2, \theta^{(t)}) \sum_{z_3} p(z_3 \mid x_3, \theta^{(t)})... \sum_{z_N}p(z_N \mid x_N, \theta^{(t)}) \end{aligned}\]对于其中任意一项,由于$z_i$取遍所有值, 根据概率加和为1的基本定理, 因此有:
\[\sum_{z_i}p(z_i \mid x_i, \theta^{(t)}) = 1 \Rightarrow \sum_{z_2, ..., z_N} \prod_{i=2}^N p(z_i \mid x_i, \theta^{(t)}) = 1\]所以:
\[\begin{aligned} \sum_{z_1, z_2, ..., z_N} [log~p(x_1, z_1 \mid \theta) \prod_{i=1}^N p(z_i \mid x_i, \theta^{(t)})] & = \sum_{z_1}[log~p(x_1, z_1 \mid \theta)p(z_1 \mid x_1, \theta^{(t)})]\sum_{z_2, ..., z_N} \prod_{i=2}^N p(z_i \mid x_i, \theta^{(t)}) \\ & = \sum_{z_1} log~p(x_1, z_1 \mid \theta)p(z_1 \mid x_1, \theta^{(t)}) \end{aligned}\]这只是其中一项, 那我们所有项都类比地考虑进来, 得到$Q(\theta, \theta^{(t)})$最终地化简形式:
\[\begin{aligned} Q(\theta, \theta^{(t)}) & = \sum_{z_1, z_2, ..., z_N}[(log~p(x_1, z_1 \mid \theta) + ... + log~p(x_N, z_N \mid \theta))] \\ & = \sum_{z_1}[log~p(x_1, z_1 \mid \theta) p(z_1 \mid x_1, \theta^{(t)})] \end{aligned}\]