A hidden Markov model is a probabilistic model over the random variables $\{O_1, \ldots, O_t, \; Q_1, \ldots, Q_t\}$. The variables $O_t$ are the known discrete observations, and the variables $Q_t$ are the "hidden" discrete states. Within the framework of the hidden Markov model, two conditional-independence assumptions ensure the convergence of this algorithm:
- the $t$-th hidden variable, given the $(t-1)$-th, is independent of all earlier variables, i.e. $P(Q_t \mid Q_{t-1}, \; O_{t-1}, \; \ldots, \; Q_1, \; O_1) = P(Q_t \mid Q_{t-1})$;
- the $t$-th observation depends only on the $t$-th state, i.e. $P(O_t \mid Q_t, \; Q_{t-1}, \; O_{t-1}, \; \ldots, \; Q_1, \; O_1) = P(O_t \mid Q_t)$.
Below, an expectation–maximization algorithm is described for finding the maximum-likelihood estimate of the parameters of a hidden Markov model given a set of observations. This algorithm is also known as the Baum–Welch algorithm.
$Q_t$ is a discrete random variable taking one of $N$ values $(1 \ldots N)$. We assume that the Markov chain defined by $P(Q_t \mid Q_{t-1})$ is homogeneous in time, i.e. independent of $t$. Then $P(Q_t \mid Q_{t-1})$ can be given as a time-independent stochastic transition matrix $A = \{a_{ij}\} = P(Q_t = j \mid Q_{t-1} = i)$. The special case $t = 1$ is determined by the initial distribution $\pi_i = P(Q_1 = i)$.
We say the model is in state $j$ at time $t$ if $Q_t = j$. A sequence of states is written as $q = (q_1, \; \ldots, \; q_T)$, where $q_t \in \{1 \ldots N\}$ is the state at time $t$.
An observation can take one of $L$ possible values, $O_t \in \{o_1, \; \ldots, \; o_L\}$. The probability of a given observation at time $t$ in state $j$ is defined as $b_j(o_t) = P(O_t = o_t \mid Q_t = j)$ (so $B = \{b_{ij}\}$ is an $L \times N$ matrix). A given sequence of observations $O$ is written as $O = (O_1 = o_1, \; \ldots, \; O_T = o_T)$.
Therefore, a hidden Markov model can be described by $\lambda = (A, \; B, \; \pi)$. For a given observation sequence $O$, the Baum–Welch algorithm finds $\lambda^{*} = \arg\max_{\lambda} P(O \mid \lambda)$, i.e. the $\lambda$ that maximizes the likelihood of the observations $O$.
Initial data: $\lambda = (A, \; B, \; \pi)$ with random initial values. The algorithm updates the parameters $\lambda$ iteratively until convergence.
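As a sketch, the random initialization can look like the following (the state count `N`, symbol count `L`, and the seed are illustrative assumptions, not part of the algorithm; here `B` is stored with rows indexed by state, the transpose of the $L \times N$ convention above):

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed only for reproducibility
N, L = 2, 3                      # assumed sizes: N hidden states, L observation symbols

def random_stochastic(rows, cols, rng):
    """Random non-negative matrix whose rows sum to 1 (a stochastic matrix)."""
    m = rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)

A = random_stochastic(N, N, rng)   # transition probabilities a_ij
B = random_stochastic(N, L, rng)   # emission probabilities b_j(k), rows indexed by state
pi = rng.random(N)
pi /= pi.sum()                     # initial state distribution
```

Any strictly positive initialization works; rows are normalized so that each matrix is a valid conditional distribution.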
Forward procedure
Define $\alpha_i(t) = P(O_1 = o_1, \; \ldots, \; O_t = o_t, \; Q_t = i \mid \lambda)$, the probability of observing the sequence $o_1, \; \ldots, \; o_t$ and being in state $i$ at time $t$.
$\alpha_i(t)$ can be computed recursively:
- $\alpha_i(1) = \pi_i \, b_i(O_1);$
- $\alpha_j(t+1) = b_j(O_{t+1}) \sum_{i=1}^{N} \alpha_i(t) \, a_{ij}.$
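A minimal sketch of the forward recursion, assuming a small hand-made model (all numbers in `A`, `B`, `pi` and the observation sequence are illustrative, and `B` is stored with rows indexed by state):

```python
import numpy as np

# Hypothetical 2-state, 2-symbol model used only for illustration.
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])      # a_ij = P(Q_{t+1} = j | Q_t = i)
B  = np.array([[0.9, 0.1],
               [0.2, 0.8]])      # b_j(k) = P(O_t = k | Q_t = j)
pi = np.array([0.6, 0.4])        # pi_i = P(Q_1 = i)
O  = [0, 1, 0, 0, 1]             # observation indices

T, N = len(O), len(pi)
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]                    # alpha_i(1) = pi_i * b_i(O_1)
for t in range(1, T):
    # alpha_j(t+1) = b_j(O_{t+1}) * sum_i alpha_i(t) * a_ij
    alpha[t] = B[:, O[t]] * (alpha[t - 1] @ A)

likelihood = alpha[-1].sum()                  # P(O | lambda)
```

Summing the last row of `alpha` gives the total likelihood $P(O \mid \lambda)$ of the observation sequence.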
Backward procedure
This procedure computes $\beta_i(t) = P(O_{t+1} = o_{t+1}, \ldots, O_T = o_T \mid Q_t = i, \lambda)$, the probability of the final observation sequence $o_{t+1}, \; \ldots, \; o_T$ given that the model is in state $i$ at time $t$.
$\beta_i(t)$ can be computed recursively:
- $\beta_i(T) = 1;$
- $\beta_i(t) = \sum_{j=1}^{N} \beta_j(t+1) \, a_{ij} \, b_j(O_{t+1}).$
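The backward recursion can be sketched on the same hypothetical model (illustrative numbers, `B` stored with rows indexed by state). A useful sanity check is that $\sum_i \alpha_i(t)\,\beta_i(t)$ equals $P(O \mid \lambda)$ at every $t$:

```python
import numpy as np

# Same hypothetical 2-state, 2-symbol model as in the forward sketch.
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],
               [0.2, 0.8]])
pi = np.array([0.6, 0.4])
O  = [0, 1, 0, 0, 1]
T, N = len(O), len(pi)

beta = np.zeros((T, N))
beta[-1] = 1.0                                # beta_i(T) = 1
for t in range(T - 2, -1, -1):
    # beta_i(t) = sum_j beta_j(t+1) * a_ij * b_j(O_{t+1})
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])

# Forward pass, repeated here so the consistency check is self-contained.
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = B[:, O[t]] * (alpha[t - 1] @ A)

per_t = (alpha * beta).sum(axis=1)            # should be P(O | lambda) for all t
```

Every entry of `per_t` is the same number, the likelihood of the full observation sequence.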
Using $\alpha$ and $\beta$, the following quantities can be computed:
- $\gamma_i(t) \equiv P(Q_t = i \mid O, \; \lambda) = \dfrac{\alpha_i(t) \, \beta_i(t)}{\sum_{j=1}^{N} \alpha_j(t) \, \beta_j(t)},$
- $\xi_{ij}(t) \equiv P(Q_t = i, \; Q_{t+1} = j \mid O, \; \lambda) = \dfrac{\alpha_i(t) \, a_{ij} \, \beta_j(t+1) \, b_j(O_{t+1})}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i(t) \, a_{ij} \, \beta_j(t+1) \, b_j(O_{t+1})}.$
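These posteriors can be sketched directly from $\alpha$ and $\beta$, again on the hypothetical model from the earlier sketches (illustrative numbers; `B` stored with rows indexed by state):

```python
import numpy as np

# Hypothetical model and observations, as in the previous sketches.
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],
               [0.2, 0.8]])
pi = np.array([0.6, 0.4])
O  = [0, 1, 0, 0, 1]
T, N = len(O), len(pi)

# Forward and backward passes.
alpha = np.zeros((T, N)); alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = B[:, O[t]] * (alpha[t - 1] @ A)
beta = np.zeros((T, N)); beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])

# gamma_i(t): posterior probability of state i at time t.
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)

# xi_ij(t): posterior probability of the transition i -> j between t and t+1.
xi = np.zeros((T - 1, N, N))
for t in range(T - 1):
    num = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]
    xi[t] = num / num.sum()
```

Marginalizing $\xi_{ij}(t)$ over $j$ recovers $\gamma_i(t)$, which is a convenient correctness check.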
Having $\gamma$ and $\xi$, one can determine:
- ${\bar{\pi}}_i = \gamma_i(1),$
- ${\bar{a}}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_{ij}(t)}{\sum_{t=1}^{T-1} \gamma_i(t)},$
- ${\bar{b}}_i(k) = \dfrac{\sum_{t=1}^{T} \delta_{O_t, \; o_k} \, \gamma_i(t)}{\sum_{t=1}^{T} \gamma_i(t)}.$
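The three re-estimation formulas can be sketched as follows, repeating the E-step quantities from the earlier sketches so the block is self-contained (illustrative numbers; `B` stored with rows indexed by state):

```python
import numpy as np

# Hypothetical model and observations, as in the previous sketches.
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],
               [0.2, 0.8]])
pi = np.array([0.6, 0.4])
O  = np.array([0, 1, 0, 0, 1])
T, N = len(O), len(pi)

# E-step: forward, backward, gamma, xi.
alpha = np.zeros((T, N)); alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = B[:, O[t]] * (alpha[t - 1] @ A)
beta = np.zeros((T, N)); beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)
xi = np.zeros((T - 1, N, N))
for t in range(T - 1):
    num = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]
    xi[t] = num / num.sum()

# M-step: the three re-estimation formulas.
pi_new = gamma[0]                                            # bar pi_i = gamma_i(1)
A_new  = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]    # bar a_ij
B_new  = np.zeros_like(B)
for k in range(B.shape[1]):                                  # bar b_i(k)
    B_new[:, k] = gamma[O == k].sum(axis=0) / gamma.sum(axis=0)
```

The mask `O == k` plays the role of the Kronecker delta $\delta_{O_t, o_k}$; all three updated parameters remain properly normalized distributions by construction.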
Using the new values of $A$, $B$ and $\pi$, the iterations continue until convergence.