A hidden Markov model is a probabilistic model over the random variables $\{O_1, \ldots, O_t, \; Q_1, \ldots, Q_t\}$. The variables $O_t$ are the known discrete observations, and the variables $Q_t$ are the "hidden" discrete states. Within the framework of the hidden Markov model, two conditional-independence assumptions ensure the convergence of this algorithm:
- the $t$-th hidden variable, given the $(t-1)$-th, is independent of all earlier variables, i.e. $P(Q_t \mid Q_{t-1}, \; O_{t-1}, \; \ldots, \; Q_1, \; O_1) = P(Q_t \mid Q_{t-1})$;
- the $t$-th observation depends only on the $t$-th state, i.e. $P(O_t \mid Q_t, \; Q_{t-1}, \; O_{t-1}, \; \ldots, \; Q_1, \; O_1) = P(O_t \mid Q_t)$.
Below, an expectation–maximization algorithm is described for finding the maximum-likelihood estimate of the parameters of a hidden Markov model given a set of observations. This algorithm is also known as the Baum–Welch algorithm.
$Q_t$ is a discrete random variable taking one of $N$ values $(1 \ldots N)$. We assume that the Markov chain defined by $P(Q_t \mid Q_{t-1})$ is homogeneous in time, i.e. independent of $t$. Then $P(Q_t \mid Q_{t-1})$ can be given as a time-independent stochastic transition matrix $A = \{a_{ij}\} = P(Q_t = j \mid Q_{t-1} = i)$. The special case $t = 1$ is determined by the initial distribution $\pi_i = P(Q_1 = i)$.
We say the model is in state $j$ at time $t$ if $Q_t = j$. A sequence of states is written as $q = (q_1, \; \ldots, \; q_T)$, where $q_t \in \{1 \ldots N\}$ is the state at time $t$.
An observation can take one of $L$ possible values, $O_t \in \{o_1, \; \ldots, \; o_L\}$. The probability of a given observation at time $t$ in state $j$ is defined as $b_j(o_t) = P(O_t = o_t \mid Q_t = j)$ (so $B = \{b_{ij}\}$ is an $L \times N$ matrix). A given sequence of observations $O$ is written as $O = (O_1 = o_1, \; \ldots, \; O_T = o_T)$.
Therefore, a hidden Markov model can be described by $\lambda = (A, \; B, \; \pi)$. For a given observation sequence $O$, the Baum–Welch algorithm finds $\lambda^{*} = \arg\max_{\lambda} P(O \mid \lambda)$, i.e. the $\lambda$ that maximizes the likelihood of the observations $O$.
Initial data: $\lambda = (A, \; B, \; \pi)$ with random initial values. The algorithm updates the parameters $\lambda$ iteratively until convergence.
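As a sketch, the random initialization can look like the following (the state count `N`, symbol count `L`, and the seed are illustrative assumptions, not part of the algorithm; here `B` is stored with rows indexed by state, the transpose of the $L \times N$ convention above):

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed only for reproducibility
N, L = 2, 3                      # assumed sizes: N hidden states, L observation symbols

def random_stochastic(rows, cols, rng):
    """Random non-negative matrix whose rows sum to 1 (a stochastic matrix)."""
    m = rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)

A = random_stochastic(N, N, rng)   # transition probabilities a_ij
B = random_stochastic(N, L, rng)   # emission probabilities b_j(k), rows indexed by state
pi = rng.random(N)
pi /= pi.sum()                     # initial state distribution
```

Any strictly positive initialization works; rows are normalized so that each matrix is a valid conditional distribution.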
Forward procedure
Define $\alpha_i(t) = P(O_1 = o_1, \; \ldots, \; O_t = o_t, \; Q_t = i \mid \lambda)$, the probability of observing the sequence $o_1, \; \ldots, \; o_t$ and being in state $i$ at time $t$.
$\alpha_i(t)$ can be computed recursively:
- $\alpha_i(1) = \pi_i \, b_i(O_1);$
- $\alpha_j(t+1) = b_j(O_{t+1}) \sum_{i=1}^{N} \alpha_i(t) \, a_{ij}.$
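A minimal sketch of the forward recursion, assuming a small hand-made model (all numbers in `A`, `B`, `pi` and the observation sequence are illustrative, and `B` is stored with rows indexed by state):

```python
import numpy as np

# Hypothetical 2-state, 2-symbol model used only for illustration.
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])      # a_ij = P(Q_{t+1} = j | Q_t = i)
B  = np.array([[0.9, 0.1],
               [0.2, 0.8]])      # b_j(k) = P(O_t = k | Q_t = j)
pi = np.array([0.6, 0.4])        # pi_i = P(Q_1 = i)
O  = [0, 1, 0, 0, 1]             # observation indices

T, N = len(O), len(pi)
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]                    # alpha_i(1) = pi_i * b_i(O_1)
for t in range(1, T):
    # alpha_j(t+1) = b_j(O_{t+1}) * sum_i alpha_i(t) * a_ij
    alpha[t] = B[:, O[t]] * (alpha[t - 1] @ A)

likelihood = alpha[-1].sum()                  # P(O | lambda)
```

Summing the last row of `alpha` gives the total likelihood $P(O \mid \lambda)$ of the observation sequence.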
Backward procedure
This procedure computes $\beta_i(t) = P(O_{t+1} = o_{t+1}, \ldots, O_T = o_T \mid Q_t = i, \lambda)$, the probability of the final observation sequence $o_{t+1}, \; \ldots, \; o_T$ given that the model is in state $i$ at time $t$.
$\beta_i(t)$ can be computed recursively:
- $\beta_i(T) = 1;$
- $\beta_i(t) = \sum_{j=1}^{N} \beta_j(t+1) \, a_{ij} \, b_j(O_{t+1}).$
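The backward recursion can be sketched on the same hypothetical model (illustrative numbers, `B` stored with rows indexed by state). A useful sanity check is that $\sum_i \alpha_i(t)\,\beta_i(t)$ equals $P(O \mid \lambda)$ at every $t$:

```python
import numpy as np

# Same hypothetical 2-state, 2-symbol model as in the forward sketch.
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],
               [0.2, 0.8]])
pi = np.array([0.6, 0.4])
O  = [0, 1, 0, 0, 1]
T, N = len(O), len(pi)

beta = np.zeros((T, N))
beta[-1] = 1.0                                # beta_i(T) = 1
for t in range(T - 2, -1, -1):
    # beta_i(t) = sum_j beta_j(t+1) * a_ij * b_j(O_{t+1})
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])

# Forward pass, repeated here so the consistency check is self-contained.
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = B[:, O[t]] * (alpha[t - 1] @ A)

per_t = (alpha * beta).sum(axis=1)            # should be P(O | lambda) for all t
```

Every entry of `per_t` is the same number, the likelihood of the full observation sequence.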
Using $\alpha$ and $\beta$, the following quantities can be computed:
- $\gamma_i(t) \equiv P(Q_t = i \mid O, \; \lambda) = \dfrac{\alpha_i(t) \, \beta_i(t)}{\sum_{j=1}^{N} \alpha_j(t) \, \beta_j(t)},$
- $\xi_{ij}(t) \equiv P(Q_t = i, \; Q_{t+1} = j \mid O, \; \lambda) = \dfrac{\alpha_i(t) \, a_{ij} \, \beta_j(t+1) \, b_j(O_{t+1})}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i(t) \, a_{ij} \, \beta_j(t+1) \, b_j(O_{t+1})}.$
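These posteriors can be sketched directly from $\alpha$ and $\beta$, again on the hypothetical model from the earlier sketches (illustrative numbers; `B` stored with rows indexed by state):

```python
import numpy as np

# Hypothetical model and observations, as in the previous sketches.
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],
               [0.2, 0.8]])
pi = np.array([0.6, 0.4])
O  = [0, 1, 0, 0, 1]
T, N = len(O), len(pi)

# Forward and backward passes.
alpha = np.zeros((T, N)); alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = B[:, O[t]] * (alpha[t - 1] @ A)
beta = np.zeros((T, N)); beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])

# gamma_i(t): posterior probability of state i at time t.
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)

# xi_ij(t): posterior probability of the transition i -> j between t and t+1.
xi = np.zeros((T - 1, N, N))
for t in range(T - 1):
    num = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]
    xi[t] = num / num.sum()
```

Marginalizing $\xi_{ij}(t)$ over $j$ recovers $\gamma_i(t)$, which is a convenient correctness check.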
Having $\gamma$ and $\xi$, one can determine:
- ${\bar{\pi}}_i = \gamma_i(1),$
- ${\bar{a}}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_{ij}(t)}{\sum_{t=1}^{T-1} \gamma_i(t)},$
- ${\bar{b}}_i(k) = \dfrac{\sum_{t=1}^{T} \delta_{O_t, \; o_k} \, \gamma_i(t)}{\sum_{t=1}^{T} \gamma_i(t)}.$
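The three re-estimation formulas can be sketched as follows, repeating the E-step quantities from the earlier sketches so the block is self-contained (illustrative numbers; `B` stored with rows indexed by state):

```python
import numpy as np

# Hypothetical model and observations, as in the previous sketches.
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],
               [0.2, 0.8]])
pi = np.array([0.6, 0.4])
O  = np.array([0, 1, 0, 0, 1])
T, N = len(O), len(pi)

# E-step: forward, backward, gamma, xi.
alpha = np.zeros((T, N)); alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = B[:, O[t]] * (alpha[t - 1] @ A)
beta = np.zeros((T, N)); beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)
xi = np.zeros((T - 1, N, N))
for t in range(T - 1):
    num = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]
    xi[t] = num / num.sum()

# M-step: the three re-estimation formulas.
pi_new = gamma[0]                                            # bar pi_i = gamma_i(1)
A_new  = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]    # bar a_ij
B_new  = np.zeros_like(B)
for k in range(B.shape[1]):                                  # bar b_i(k)
    B_new[:, k] = gamma[O == k].sum(axis=0) / gamma.sum(axis=0)
```

The mask `O == k` plays the role of the Kronecker delta $\delta_{O_t, o_k}$; all three updated parameters remain properly normalized distributions by construction.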
Using the new values of $A$, $B$ and $\pi$, the iterations continue until convergence.