On this page

    利⽤ EM 迭代模型参数的思路

    我们来介绍如何利⽤ EM 的⽅法,迭代地求取⾼斯混合模型的参数,先拿出迭代公式,我们令:

    则有:

    \[\theta^{t+1} = arg \max_\theta~\int_Z log~p(X, Z \mid \theta)p(Z \mid X, \theta^{(t)})dZ\]

    我们令:

    \[Q(\theta, \theta^{(t)}) = \int_Z log~p(X, Z \mid \theta) p(Z \mid X, \theta^{(t)})dZ\]

    其中$Z$是离散型变量, 包含了$N$个样本$z_1, z_2, …, z_N$, 因此进一步转化为:

    \[\begin{aligned} \theta(\theta, \theta^{(t)}) & = \int_Z log~P(X, Z \mid \theta) p(Z \mid X, \theta^{(t)})dZ \\ & = \sum_Z [log \prod_{i=1}^N p(x_i, z_i \mid \theta) \prod_{i=1}^N p(z_i \mid x_i \theta^{(t)})] \\ & = \sum_{z_1, z_2, ..., z_N}[(\sum_{i=1}^N log~p(x_i, z_i, \theta))(\prod_{i=1}^N p(z_i \mid x_i, \theta^{(t)})] \\ & = \sum_{z_1, z_2, ..., z_N}[(log~p(x_1, z_1 \mid \theta) + ... + log~p(x_N, z_N \mid \theta))](\prod_{i=1}^N p(z_i \mid x_i, \theta^{(t)})) \end{aligned}\]

    我们任取其中一项, 看看能否化简:

    \[\sum_{z_1, z_2, .., z_N}[log~p(x_1, z_1 \mid \theta)\prod_{i=1}^N p(z_i \mid x_i, \theta^{(t)})] = \sum_{z_1, z_2, ..., z_N}[log~p(x_1, z_1 \mid \theta)p(z_1 \mid x_1, \theta^{(t)}) \prod_{i=2}^N p(z_i \mid x_i, \theta^{(t)})]\]

    我们知道有这么一条性质:

    \[\sum\sum x_iy_i = \sum x_i \sum y_i\]

    因此上式进一步化简为:

    \[\begin{aligned} \sum_{z_1, z_2, ..., z_N}[log~p(x_1, z_1 \mid \theta)p(z_1 \mid x_1, \theta^{(t)})\prod_{i=2}^N p(z_i \mid x_i, \theta^{(t)})] & = \sum_{z_1, z_2, ..., z_n} [log~p(x_1, z_1 \mid \theta)p(z_1 \mid x_1, \theta^{(t)})] \sum_{z_1, z_2, ..., z_N} \prod_{i=2}^N p(z_i \mid x_i, \theta^{(t)}) \\ & = \sum_{z_1}[log~p(x_1, z_1 \mid \theta)p(z_1 \mid x_1, \theta^{(t)})] \sum_{z_1, ..., z_N} \prod_{i=2}^N p(z_i \mid x_i, \theta^{(t)})\end{aligned}\]

    上面的最后一步是依据各个求和等式中的有关项和无关项,化简了求和公式下面的下标。

    我们单独看后面一部分:

    \[\begin{aligned} \sum_{z_1, ..., z_N} \prod_{i=2}^N p(z_i \mid x_i, \theta^{(t)}) & = \sum_{z_1, ..., z_N} p(z_2 \mid x_2, \theta^{(t)})p(z_3 \mid x_3, \theta^{(t)}...p(z_N \mid X_N, \theta^{(t)})) \\ & = \sum_{z_2} p(z_2 \mid x_2, \theta^{(t)}) \sum_{z_3} p(z_3 \mid x_3, \theta^{(t)})... \sum_{z_N}p(z_N \mid x_N, \theta^{(t)}) \end{aligned}\]

    对于其中任意一项,由于$z_i$取遍所有值, 根据概率加和为1的基本定理, 因此有:

    \[\sum_{z_i}p(z_i \mid x_i, \theta^{(t)}) = 1 \Rightarrow \sum_{z_2, ..., z_N} \prod_{i=2}^N p(z_i \mid x_i, \theta^{(t)}) = 1\]

    所以:

    \[\begin{aligned} \sum_{z_1, z_2, ..., z_N} [log~p(x_1, z_1 \mid \theta) \prod_{i=1}^N p(z_i \mid x_i, \theta^{(t)})] & = \sum_{z_1}[log~p(x_1, z_1 \mid \theta)p(z_1 \mid x_1, \theta^{(t)})]\sum_{z_2, ..., z_N} \prod_{i=2}^N p(z_i \mid x_i, \theta^{(t)}) \\ & = \sum_{z_1} log~p(x_1, z_1 \mid \theta)p(z_1 \mid x_1, \theta^{(t)}) \end{aligned}\]

    这只是其中一项, 那我们所有项都类比地考虑进来, 得到$Q(\theta, \theta^{(t)})$最终地化简形式:

    \[\begin{aligned} Q(\theta, \theta^{(t)}) & = \sum_{z_1, z_2, ..., z_N}[(log~p(x_1, z_1 \mid \theta) + ... + log~p(x_N, z_N \mid \theta))] \\ & = \sum_{z_1}[log~p(x_1, z_1 \mid \theta) p(z_1 \mid x_1, \theta^{(t)})] \end{aligned}\]