347 647 9001+1 714 797 8196Request a Call
Call Me

Learning the Markov Model

September 25, 2015
, , , , ,

In previous post, we introduced concept of Markov “memoryless” process and state transition chains for certain class of Predictive Modeling. In this post, we continue mathematical treatment and learning of Markov model.

As you will have noted from last post, Markov processes are represented by series of state transitions in a directed graph. Classical Markov process is of order one i.e. next state transition depends only on current state and not on how current state has been reached, but Markov processes can be of higher order too. Popular children’s game Snakes and Ladder is one example of order one Markov process. We will also make another assumption that events happen in discrete time , that is, at each discrete time interval one and only one state transition happens. This is again not uncommon in practice. Requirement of time being discrete unit is not mandatory requirement as long as key properties of assumptions are understood. For instance, a person browsing webpages (example we also used in last post) can be considered transitioning from one webpage to another in Markov fashion, even if time to change is not uniform or consistent across persons.

Training Data

Training observations for Markov modeling consists of number of sequence of states. For following process flow diagram in Fig. 1 may generate sequences as shown in Fig. 2 where 40 random sequences have been generated. Note that true process rarely be known in real world, and what will be observed are simply manifest sequences. Here true process flow is shown only for illustration.

Figure 1: True but Unobserved Markov Process

true but unobserved Markov Process

Figure 2: 40 Random Sequences


Given this data, how will we go about learning the Markov process?

First, we need to find number of states. This is easy – there are 6 unique state symbols, 1-6. Of course, in practice data always is not universe of data but a sample. There may be case where some rare states remain unobserved in the training data. This will introduce error in the model since we are not learning reality, but we have to accept this as we cannot do anything except try to get better training data.

Learning Markov Model of Order One

Second, we need to assume order of Markov process. Let’s first try to learn Markov process of order = 1. How? Well, strangely, and very conveniently for us, Maximum Likelihood Estimator of Markov Process is simply count based observed transition probabilities. Statistical details are in this paper for interested reader.

Thus, estimated probability of transition from 1 to 2 is number of times transition from 1 to 2 was observed, divided by number of times transition from 1 to anywhere was observed.


If we sample 70%, or about 31 of these observations for training, and do the counting exercise, we will obtain transition matrix as show in Fig. 3 which is compared against “true” matrix which was used to generate the data. While number of training sequences was fairly small, we are able to do pretty decent job of estimation. In practice, of course, there is no ‘truth’ known. There are no standard metric for performance of fit, since true transition probabilities are not known. Often, data is separated in multiple equal-size chunks and separate estimations are done. If they are close to each other – say, as measured by Root-Mean-Squared-Error – then we have more confidence in estimated probabilities.

Figure 3: Order 1 Markov Model

Order 1 Markov Model

Learning Markov Model of Order Two

But how do we know, if order of Markov process is really 1? Let’s assume it’s 2. Now we compute the estimated transition probabilities in same manner, as


Doing so, produces estimated probabilities, as shown in Fig. 4. For brevity, only non-zero values are presented. Note that since 'true' process is only of Order 1, true probabilities are independent of index i.

Figure 4: Order 2 Markov Model

Order 2 Markov Model

How do we know which of Order 1 or Order 2 is true estimation? Here comes cross-validation handy, as is often common in Machine Learning models. We can compute probabilities of observing sequences from cross-validation data, and whichever model provides maximum value is likely correct model. With a caveat which we will come to later.

Let’s pick a sequence 125456 from rest of 30% data. Probability of observing this sequence is

P(12456)= P(2|1)*P(5|2)*P(4|5)*P(5|4)*P(6|5)

P(12456)= 1*0.4*0.4*1*0.6= 0.096

according to Order 1 Markov process. Same for Order 2 is

P(12456)= P(5|12)*P(4|25)*P(5|54)*P(6|45)

P(12456)= 0.4*0.6*1.0*0.7= 0.168

So, for this sequence, Order 2 seems more likely. We will have to do same for all sequences in cross-validation data, and multiply all numbers to obtain joint probability of obtaining observed data. This is actually the very calculation of likelihood which is used in estimation above.

Now about that caveat: Given that two models perform nearly similar, we should always prefer a simpler model, that is, model of smaller order. This can be achieved by applying a penalty to likelihood estimation which is function of order of model. If penalty simply is order, then we obtain what’s known as Akaike Information Criterion (AIC).

In next post, we will talk about training Markov models in cases where observed states are not true states of system.

Other Articles by the same author

Curse Dimensionality

Semi-Supervised Clustering

Other Related Links that you may like

Overview of Text Mining

Role of Business Analyst


About the Author

Ashish Gupa has over 8 years' experience in analytics, machine learning, and business consulting, spanning banking and retail domains, and Asia and Americas geographies


Global Association of Risk Professionals, Inc. (GARP®) does not endorse, promote, review or warrant the accuracy of the products or services offered by EduPristine for FRM® related information, nor does it endorse any pass rates claimed by the provider. Further, GARP® is not responsible for any fees or costs paid by the user to EduPristine nor is GARP® responsible for any fees or costs of any person or entity providing any services to EduPristine Study Program. FRM®, GARP® and Global Association of Risk Professionals®, are trademarks owned by the Global Association of Risk Professionals, Inc

CFA Institute does not endorse, promote, or warrant the accuracy or quality of the products or services offered by EduPristine. CFA Institute, CFA®, Claritas® and Chartered Financial Analyst® are trademarks owned by CFA Institute.

Utmost care has been taken to ensure that there is no copyright violation or infringement in any of our content. Still, in case you feel that there is any copyright violation of any kind please send a mail to and we will rectify it.

Popular Blogs: Whatsapp Revenue Model | CFA vs CPA | CMA vs CPA | ACCA vs CPA | CFA vs FRM

Post ID = 81953