We want to solve the constrained optimization problem

$$\begin{aligned}
\max \quad & \mathrm{Ent}(D) = -\sum_{k=1}^{N} p_{k}\log_{2}p_{k} \\
\text{s.t.} \quad & \sum_{k=1}^{N} p_{k} = 1
\end{aligned}$$

Dropping the minus sign in the information entropy, this is equivalent to minimizing $\displaystyle{\sum_{k=1}^{N} p_{k}\log_{2}p_{k}}$. Using the method of Lagrange multipliers, define

$$J(p_{1},p_{2},\cdots,p_{N},\lambda)=\sum_{k=1}^{N} p_{k}\log_{2}p_{k}+\lambda\left(\sum_{k=1}^{N} p_{k}-1\right)$$

Taking the partial derivatives of $J(p_{1},p_{2},\cdots,p_{N},\lambda)$ with respect to $p_{1},p_{2},\cdots,p_{N},\lambda$ and setting them to $0$ gives

$$\begin{aligned}
\frac{\partial J(p_{1},p_{2},\cdots,p_{N},\lambda)}{\partial p_{k}} &= \log_{2}p_{k}+\frac{1}{\ln 2}+\lambda = 0,\quad k=1,2,\cdots,N \\
\frac{\partial J(p_{1},p_{2},\cdots,p_{N},\lambda)}{\partial \lambda} &= \sum_{k=1}^{N} p_{k}-1 = 0
\end{aligned}$$

(Here $\frac{\mathrm{d}}{\mathrm{d}p}\,p\log_{2}p=\log_{2}p+\frac{1}{\ln 2}$, since $\log_{2}p=\ln p/\ln 2$.)

From the first equation we obtain

$$p_{k}= 2^{-\left(\frac{1}{\ln 2}+\lambda\right)},\quad k=1,2,\cdots,N$$

so all the $p_{k}$ are equal. Substituting into the constraint

$$\sum_{k=1}^{N}p_{k}-1=0$$

yields

$$N\cdot 2^{-\left(\frac{1}{\ln 2}+\lambda\right)} = 1$$

and hence

$$\begin{aligned}
p_{k} &= \frac{1}{N},\quad k=1,2,\cdots,N \\
\lambda &= \log_{2}N-\frac{1}{\ln 2}
\end{aligned}$$

Since each term $p_{k}\log_{2}p_{k}$ is convex in $p_{k}$, this stationary point is a global minimum. Therefore, when $p_{k}=\frac{1}{N},\ k=1,2,\cdots,N$, $J(p_{1},p_{2},\cdots,p_{N},\lambda)$ attains its minimum, i.e. the information entropy attains its maximum, $\mathrm{Ent}(D)=\log_{2}N$.
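The conclusion can be checked numerically: a minimal sketch (not from the original text, assuming NumPy is available) that samples random distributions over $N$ outcomes and confirms none of them beats the uniform distribution's entropy of $\log_{2}N$:

```python
import numpy as np

def entropy(p):
    """Information entropy in bits: -sum_k p_k log2 p_k, with 0*log2(0) := 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                      # skip zero probabilities
    return -np.sum(nz * np.log2(nz))

N = 5
uniform = np.full(N, 1.0 / N)          # p_k = 1/N for all k

# Sample random probability vectors on the simplex and compare.
rng = np.random.default_rng(0)
for _ in range(1000):
    p = rng.dirichlet(np.ones(N))      # a random distribution over N outcomes
    assert entropy(p) <= entropy(uniform) + 1e-9

print(entropy(uniform), np.log2(N))    # both equal log2(5)
```

The Dirichlet sampler is just a convenient way to draw valid probability vectors; any vector of nonnegative entries summing to 1 would do.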