MATLAB Programming – The EM Algorithm
Core Code
Computing the Sufficient Statistics
$$
\begin{aligned}
N_k &= \sum_{i=1}^N P(z_k|x_i) \\
\mu_k &= \sum_{i=1}^N P(z_k|x_i)*x_i \\
\sigma_k &= \sum_{i=1}^N P(z_k|x_i)*x_i x_i^T
\end{aligned}
$$
Since the covariance matrices are diagonal, the shape shrinks from $(p,p)$ to $(p,1)$, so the third statistic becomes:
$$
\sigma_k = \sum_{i=1}^N P(z_k|x_i)*x_i.*x_i
$$
Extending to matrix form:
$N$ : `sum((k,n), 2)'` : $(1,k)$

$\mu=\{\mu_1,\mu_2,\dots,\mu_K \}$ : $(p,K)$
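As a quick sanity check on these shapes, here is a minimal sketch with made-up sizes ($p=3$, $K=4$, $n=100$); `post` stands in for the $(K,n)$ posterior matrix $P(z_k|x_i)$ that `postprob` computes later, and the `S` line anticipates the derivation just below:

```matlab
p = 3; K = 4; n = 100;        % made-up sizes
X    = randn(p, n);           % data, one frame per column
post = rand(K, n);            % stand-in for P(z_k|x_i)
N = sum(post, 2)';            % (1,K) zeroth-order statistic
F = X * post';                % (p,K) first-order statistic
S = (X .* X) * post';         % (p,K) second-order statistic (derived below)
size(N), size(F), size(S)     % displays the three shapes
```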
⭕️ Vectorization case 6: handling a $\sum$ symbol
Expanding the sum for analysis:
$$
\sigma_k=P(z_k|x_1)*x_1.*x_1+P(z_k|x_2)*x_2.*x_2+\dots+P(z_k|x_n)*x_n.*x_n
$$
Rewriting the $\sum$ as a matrix product:
$$
\begin{aligned}
\sigma_k=&\left( \begin{matrix}
x_1.*x_1 & x_2.*x_2 & \dots & x_n.*x_n
\end{matrix} \right) \cdot \left( \begin{array}{c}
P\left( z_k|x_1 \right)\\
P\left( z_k|x_2 \right)\\
\vdots\\
P\left( z_k|x_n \right)\\
\end{array} \right)\\
=&\ X.*X\cdot post(k,:)' : (p,n)(1,n)^T=(p,1)
\end{aligned}
$$
Stacking the $\sigma_k$ column-wise gives $\sigma$:
$$
\sigma=X.*X\cdot post(:,:)' : (p,n)(k,n)^T=(p,k)
$$
```matlab
function [N, F, S, llk] = expectation(data, gmm)
% Zeroth-, first- and second-order sufficient statistics.
[post, llk] = postprob(data, gmm.mu, gmm.sigma, gmm.w(:));
N = sum(post, 2)';            % (1,k)
F = data * post';             % (p,k)
S = (data .* data) * post';   % (p,k), diagonal covariance only
```
Computing the Posterior Probability
$$
P(z_k|x_i)=\frac{P(x_i|z_k)*P(z_k)}{\sum_{k=1}^KP(x_i|z_k)*P(z_k)}
$$
Taking logarithms:
$$
\ln P(z_k|x_i)=\ln P(x_i|z_k)+\ln P(z_k)-\ln\left(\sum_{k=1}^KP(x_i|z_k)*P(z_k)\right)
$$
Note: every term in the equation above is a real scalar, so we only need to expand each term into a $(k,n)$ matrix.
Each term just needs to be expanded to the same shape:

$\ln P(x_i|z_k)$ : $(k,n)$

$\ln P(z_k)$ : $(k,1)$ [needs expansion to $(k,n)$]

⭐️ $\ln\left(\sum_{k=1}^K P(x_i|z_k)*P(z_k)\right)$ : this one is trickier; see [Section 3](#logsum) below

$\ln P(z_k|x_i)$ : $(k,n) - \mathrm{repmat}((1,n),[k,1]) = (k,n)$ (in the code, the repmat expansion happens implicitly via bsxfun)
```matlab
function [post, llk] = postprob(data, mu, sigma, w)
% Posterior P(z_k|x_i), computed in the log domain then exponentiated.
post = lgmmprob(data, mu, sigma, w);        % (k,n) log joint
llk  = logsumexp(post, 1);                  % (1,n) log evidence
post = exp(bsxfun(@minus, post, llk));      % broadcast subtraction, then exp
```
Computing the Likelihood
$$
\mathcal{N}(\mathbf{x_i} | \boldsymbol{\mu_k}, \mathbf{\Sigma_k})=\frac{1}{(2 \pi)^{D / 2}} \frac{1}{|\mathbf{\Sigma_k}|^{1 / 2}} \exp \left\{-\frac{1}{2}(\mathbf{x_i}-\boldsymbol{\mu_k})^{T} \mathbf{\Sigma_k}^{-1}(\mathbf{x_i}-\boldsymbol{\mu_k})\right\}
$$
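To see where the `C` and `D` terms in the code below come from, expand $-2\ln\mathcal{N}$ for a diagonal covariance, writing $\sigma_k$ for the vector of per-dimension variances (this is just the algebra behind the code, written out):

$$
\begin{aligned}
-2\ln \mathcal{N}(x_i \mid \mu_k, \sigma_k)
&= \sum_{d=1}^{D} \frac{(x_{i,d}-\mu_{k,d})^2}{\sigma_{k,d}} + \sum_{d=1}^{D}\ln \sigma_{k,d} + D\ln(2\pi) \\
&= \underbrace{\sum_{d} \frac{\mu_{k,d}^2}{\sigma_{k,d}} + \sum_{d} \ln \sigma_{k,d}}_{\texttt{C}\,:\,(1,k)}
\;+\; \underbrace{\sum_{d} \frac{x_{i,d}^2}{\sigma_{k,d}} - 2\sum_{d} \frac{\mu_{k,d}\,x_{i,d}}{\sigma_{k,d}} + D\ln(2\pi)}_{\texttt{D}\,:\,(k,n)}
\end{aligned}
$$

so `logprob = -0.5*(C' + D)` plus the log prior $\ln w_k$.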
```matlab
function logprob = lgmmprob(data, mu, sigma, w)
% Log of w_k * N(x_i | mu_k, sigma_k) for diagonal covariances.
ndim = size(data, 1);
C = sum(mu.*mu./sigma) + sum(log(sigma));            % (1,k) data-independent part
D = (1./sigma)' * (data .* data) ...
    - 2 * (mu./sigma)' * data + ndim * log(2 * pi);  % (k,n) data-dependent part
logprob = -0.5 * (bsxfun(@plus, C', D));
logprob = bsxfun(@plus, logprob, log(w));            % add the log prior log(w_k)
```
The returned matrix has size $(k,n)$.
Summary:

The EM algorithm procedure:

Compute the sufficient statistics over all nfiles of one speaker (i.e., all utterances from the same speaker).

The sufficient statistics are computed by the function expectation.

Use the sufficient statistics to update the parameters $\hat\mu^{(t+1)}$ (and likewise $\hat\pi^{(t+1)}$ and $\hat\Sigma^{(t+1)}$).
The parameter update formulas:
$$
\hat{p_k}=\hat{\pi_{k}}=\frac{1}{N}N_k
$$
Mean of the $k$-th component:
$$
\hat{\mu_k}=\frac{E_{z_i=k|x}(x)}{N_k}=\frac{\sum_{i=1}^N p(z_i=k|x_i)\,x_i}{N_k}
$$
Variance of the $k$-th component:
$$
\begin{aligned}
\hat{\Sigma_{k}}=&\frac{\sum_{i=1}^N p(z_i=k|x_i)\,(x_i-\mu_k).*(x_i-\mu_k)}{N_k} \\
=&\frac{\sum_i p(z_i=k|x_i)\,x_i.*x_i - 2\,\mu_k.*\sum_i p(z_i=k|x_i)\,x_i + N_k\,\mu_k.*\mu_k}{N_k} \\
=&\frac{\sum_{i=1}^N p(z_i=k|x_i)\,x_i.*x_i}{N_k} - \mu_k.*\mu_k
\end{aligned}
$$

(using $\sum_i p(z_i=k|x_i)\,x_i = N_k\,\mu_k$; this matches `sigma = S./N - mu.*mu` in the code below)
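A tiny numeric check of the last identity, with made-up 1-D data and posteriors:

```matlab
x  = [1 2 3];                 % three 1-D points (made-up)
p  = [0.2 0.3 0.5];           % made-up posteriors P(z_k|x_i) for one component
Nk = sum(p);                  % N_k = 1.0
mu = (x * p') / Nk;           % weighted mean = 2.3
S  = (x .* x) * p';           % second-order statistic = 5.9
S / Nk - mu^2                 % 0.61
sum(p .* (x - mu).^2) / Nk    % 0.61, the same value
```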
```matlab
N = 0; F = 0; S = 0; L = 0; nframes = 0;
tim = tic;
for ix = 1 : nfiles,
    [n, f, s, l] = expectation(dataList{ix}(:, 1:ds_factor:end), gmm);
    N = N + n; F = F + f; S = S + s;   % accumulate statistics over files
    L = L + sum(l);
    nframes = nframes + length(l);
end
tim = toc(tim);
fprintf('[llk = %.2f] \t [elaps = %.2f s]\n', L/nframes, tim);
gmm = maximization(N, F, S);
```
The code that updates the parameters from the sufficient statistics:
```matlab
function gmm = maximization(N, F, S)
w  = N / sum(N);                              % mixture weights
mu = bsxfun(@rdivide, F, N);                  % means: F_k / N_k
sigma = bsxfun(@rdivide, S, N) - (mu .* mu);  % variances: S_k/N_k - mu_k.*mu_k
sigma = apply_var_floors(w, sigma, 0.1);
gmm.w = w; gmm.mu = mu; gmm.sigma = sigma;
```
Implementation caveats:
Guarding against a non-invertible covariance matrix: apply_var_floors
```matlab
function sigma = apply_var_floors(w, sigma, floor_const)
vFloor = sigma * w' * floor_const;       % scaled weighted-average variance per dimension
sigma  = bsxfun(@max, sigma, vFloor);    % floor each variance at vFloor
sigma  = bsxfun(@plus, sigma, 1e-6 * ones(size(sigma, 1), 1)); % tiny diagonal offset
```
Form the weighted sum of the variances and take the element-wise maximum as the floored variance.

Add a tiny value directly to the diagonal.

(A small numeric example of the flooring follows.)
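To make the flooring concrete, here is a small example with made-up numbers:

```matlab
% Component 2's first variance has collapsed to 0.001.
sigma = [1.0  0.001;
         2.0  3.0  ];               % (p,k) diagonal variances
w = [0.5 0.5];                      % mixture weights
vFloor = sigma * w' * 0.1;          % [0.05005; 0.25]: scaled weighted-average variance
sigma = bsxfun(@max, sigma, vFloor) % the collapsed 0.001 is raised to 0.05005
```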
Handling LogSum:
Reference: http://m.elecfans.com/article/805898.html
The definition of LogSumExp:
$$
\begin{aligned}
\log \operatorname{SumExp}\left(x_{1} \ldots x_{n}\right) &=\log \left(\sum_{i=1}^{n} e^{x_{i}}\right) \\
&=\log \left(\sum_{i=1}^{n} e^{x_{i}-c} e^{c}\right) \\
&=\log \left(e^{c} \sum_{i=1}^{n} e^{x_{i}-c}\right) \\
&=\log \left(\sum_{i=1}^{n} e^{x_{i}-c}\right)+\log \left(e^{c}\right) \\
&=\log \left(\sum_{i=1}^{n} e^{x_{i}-c}\right)+c
\end{aligned}
$$
The expression we need to handle here is $\ln\left(\sum_{k=1}^K P(x_i|z_k)*P(z_k)\right)$.

⭐️ Note: it is easy to make the mistake of swapping the $\ln$ and the $\sum$ — i.e., computing sum(Post, 1) directly on the log matrix — which is wrong, since $\ln\sum \neq \sum\ln$.

The straightforward approach is to first evaluate the matrix inside the logarithm, but on a computer this easily underflows.

Trick: rewrite it in LogSumExp form:
$$
\ln\left(\sum_{k=1}^K e^{\ln P(x_i|z_k)+\ln P(z_k)}\right)
$$
where:

$\ln P(x_i|z_k)+\ln P(z_k)$ was already computed above; its matrix has size $(k,n)$.

This turns the product $P(x_i|z_k)*P(z_k)$, which would underflow, into the sum $\ln P(x_i|z_k)+\ln P(z_k)$, so the values no longer become too small for the machine to store.

But the exponents may still span too wide a range and overflow, so we adjust the position of $c$ to keep the exponents within a safe range:
$$
\ln\left(\sum_{k=1}^K e^{\ln P(x_i|z_k)+\ln P(z_k)}\right)= c+\log\left(\sum_{k=1}^K e^{\ln P(x_i|z_k)+\ln P(z_k)-c}\right)
$$
Note: the identity holds for any choice of $c$; in practice just take $c$ to be the maximum of the exponents.
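A quick numeric check of why the shift matters (made-up values):

```matlab
x = [-1000 -1001];
log(sum(exp(x)))          % -Inf: exp(-1000) underflows to 0
c = max(x);
c + log(sum(exp(x - c)))  % -999.6867, the correct value
```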
With the above analysis, the following code is no longer hard to understand:
```matlab
function y = logsumexp(x, dim)
xmax = max(x, [], dim);                                  % the shift c
y = xmax + log(sum(exp(bsxfun(@minus, x, xmax)), dim));
ind = find(~isfinite(xmax));                             % rows where xmax is Inf/NaN
if ~isempty(ind)
    y(ind) = xmax(ind);
end
```
Python Version:
Based on the analysis above, I rewrote a small demo in Python. Points to note:
When performing a sum, NumPy collapses the reduced axis into a 1-D array: e.g., for X of shape (2, 3), np.sum(X, 1) has final shape (2,). The fix is to manually restore the dimension, e.g., np.sum(X, 1).reshape(-1, 1).
I first got the logsumexp function wrong: in $\ln\left(\sum_{k=1}^K e^{\ln P(x_i|z_k)+\ln P(z_k)}\right)$ the summation runs over $K$, so the max must be taken along the first axis (size $k$) of the $(k,n)$ matrix $\ln P(x_i|z_k)+\ln P(z_k)$, not along $n$.
""" Created on Fri Jan 17 22:41:45 2020 @author: wangj """ import numpy as npfrom sklearn import datasetsimport matplotlib.pyplot as pltX, y = datasets.make_blobs(n_samples=2000 , n_features=2 , centers=[[-1 , -1 ], [2 , 3 ], [5 , 6 ]], random_state=8 ) X = X.T nmix = 2 final_niter = 1000 def gmm_em (X, nmix, final_niter) : gmm_mu, gmm_cov, gmm_w = comp_gm_gv(nmix) print(gmm_mu) for i in range(final_niter): n, f, s = expectation(X, gmm_mu, gmm_cov, gmm_w) gmm_mu, gmm_cov, gmm_w = maximization(n, f, s) print(gmm_mu) return gmm_mu, gmm_cov def comp_gm_gv (nmix) : gmm_mu = np.array([[-10 , -9 ], [3 , 4 ], [10 , 8 ]]).T gmm_cov = np.array([[1 , 1 ], [1 , 1 ], [1 , 1 ]]).T gmm_w = np.array([[1 /3 ], [1 /3 ], [1 /3 ]]).T print("gmm_mu:{},gmm_cov:{},gmm_w:{}" .format(gmm_mu.shape, gmm_cov.shape, gmm_w.shape)) return gmm_mu, gmm_cov, gmm_w def lgmmprob (X, gmm_mu, gmm_cov, gmm_w) : ndim = X.shape[0 ] C = (np.sum(gmm_mu * gmm_mu / gmm_cov, 0 ).reshape(1 , -1 ) + np.sum(np.log(gmm_cov), 0 ).reshape(1 , -1 )) D = np.dot((1 / gmm_cov).T, np.multiply(X, X)) - 2 * np.dot((gmm_mu / gmm_cov).T, X) + ndim * np.log(2 * np.pi) logprob = -0.5 * (C.T + D) print("gmm:{}" .format(gmm_w.shape)) print("log:{}" .format(np.log(gmm_w).shape)) logprob = logprob + np.log(gmm_w.T) return logprob def postprob (X, gmm_mu, gmm_cov, gmm_w) : log_jointpro = lgmmprob(X, gmm_mu, gmm_cov, gmm_w) log_post = log_jointpro - logsumexp(log_jointpro) post = np.exp(log_post) return post def logsumexp (log_jointpro) : max_log_jointpro = np.max(log_jointpro, 0 ).reshape(1 , -1 ) y = max_log_jointpro + np.log(np.sum(np.exp(log_jointpro - max_log_jointpro), 0 ).reshape(1 , -1 )) return y def expectation (X, gmm_mu, gmm_cov, gmm_w) : post = postprob(X, gmm_mu, gmm_cov, gmm_w) N = (np.sum(post, 1 ).reshape(-1 , 1 )).T F = np.dot(X, post.T) S = np.dot(X * X, post.T) return N, F, S def maximization (N, F, S) : print("N:{},F:{},S:{}" .format(N.shape, F.shape, S.shape)) w = N / np.sum(N) mu = np.divide(F, N) print("mu:{}" .format(mu.shape)) cov = np.subtract(np.divide(S, N), np.multiply(mu, mu)) cov = cov + 1e-6 *np.ones((cov.shape[0 ],1 )) return mu, cov, w gmm_mu, gmm_cov = gmm_em(X, nmix, final_niter) print(gmm_mu) plt.figure() plt.scatter(X[0 , :], X[1 , :], c=y) plt.show()