Hidden path inference

A T=10000 long realisation of the three state (S={1,2,3}) HMM with a four symbol emission alphabet ({A,B,C,D}) specified above is simulated in the stationary regime. In this toy example, performance of several principled path decoders is illustrated. Nameley, below we consider the Viterbi (MAP), PMAP ("optimal accuracy"/BCJR implemented in Bahl L, Cocke J, et al. 1974), PairwiseMAP (suggested by L. Rabiner in his popular 1986 tutorial on HMMs), PVD ("Posterior Viterbi", used in Fariselli P, Martelli P, et al. in 2005), ConstrainedPMAP (i.e. admissible optimal accuracy, used in Käll L, Krogh A, et al. 2005), ViterbiPMAPHybridK2 (introduced in A generalized risk-based approach to segmentation based on hidden Markov models, where all of these and others decoders are discussed in detail).

A stream of plain text output of the six decoders as well as the ground truth state sequence/path are presented in here. The 10000 long stream is also presented below as a color-coded raster scan image of 50 rows of 200 symbols. The rows are separated by grey borderlines and the color code, the overall as well as individual state false negatvie error rates, and log-posterior probability rates are given in the Table below.

Source \ State	Error rate (100%)	Miss 1(100%)	Miss 2(100%)	Miss 3(100%)	Log-probability rate
Truth	0	0	0	0	-1.4080
Viterbi (MAP)	12.68	3.23	98.86	27.93	-1.3525
PMAP (optimal accuracy)	8.38	3.5	100	13.63	-∞
PairwiseMAP	8.4	3.46	100	13.79	-∞
Posterior Viterbi	8.67	3.64	97.14	14.43	-1.3595
ConstrainedPMAP	8.55	3.61	97.71	14.08	-1.3603
ViterbiPMAP_HybridK2	9.07	3.32	97.14	16.36	-1.3567
Borderline

Beware the colour interference (e.g. alteration of green in the presence of red neighbours) when viewing the raster scan.

Note that state 2 is always half as bright as state 3 which makes it easier to distinguish admissible (positive probability) and inadmissible (zero probability) paths. Note also that togeather with the Viterbi decoders, the last three decoders (PV, Constrained PMAP, and the 2-block hybrid of the Viterbi and PMAP decoders) are guaranteed to produce admissible paths. In this example, the PMAP and PairwiseMAP despite their overall accuracy, always fail to detect state 2, implying inadmissibility of the resulting paths.
Color-coded raster scan image of true and decoded paths

Color-coded raster scan image of true and decoded paths

While the variation in the performance among these decoders might appear somewhat limited in this simple example, it is expected to grow with the complexity of the model. Also, the overall pointwise error rate is certainly not the only measure of accuracy. Indeed, state-specific as well as region-based performance measures are commonly used, for example, in computational biology. Since for region-based measures it is generally not possible to derive an efficient optimal decoder, the richness of the family of hybrid decoders, such as the 2-block hybrid of the Viterbi and PMAP decoders (ViterbiPMAP_HybridK2), may prove useful. The situation becomes even more interesting when the individual states of a complex model are aggregated into label classes and optimal label paths are required in addition to optimal state paths. Other factors making further investigation of such decoders interesting are 1) availability of partial annotations, which would require appropriate constraining 2) path sampling for identification of alternative structures/signals.

Alexey Koloydenko
7/08/2010