Modeling and prediction problems occur in different domains and data situations. One type of situation involves sequences of events.

For instance, you may want to model the behaviour of customers on your website, looking at the pages they land on or enter by, the links they click, and so on. You may want to do this to understand common issues and needs, and then redesign your website to address them. You may, on the other hand, want to promote certain sections or products on the website and want to understand the right page architecture and layout. In another example, you may be interested in predicting a patient's next medical visit based on previous visits, or a customer's next purchase based on previous products. While traditional classification-based prediction methodologies may apply, an additional class of algorithms becomes available if you can classify actions as a finite set of discrete events.

A few keywords here deserve emphasis:

  • Event refers to any incident which is limited in time duration and can be explicitly and unambiguously identified. So a purchase is an event, and so is rain, but lung cancer isn't.
  • Discrete implies that events are mutually exclusive and non-overlapping. If you define the purchase of an apple as an event, then the purchase of milk can be a separate event. If you define the purchase of anything as an event, then the purchase of milk cannot be a separate event (within the same modeling framework, of course, as you can always redefine your events in a different modeling context).
  • Finite is just what it says: the number of events is countable and limited. In a theoretical sense there is no limit on the number of events a model can support, but in practice computation and interpretation become difficult beyond a few thousand events, and even there we are in Big Data territory for memory and size handling purposes.
  • Set implies that all events are known and grouped a priori, and no unseen event will be part of the model or its predictions.
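The "finite set" requirement above can be made concrete in a few lines. This is a minimal sketch, with illustrative event names of my own choosing: the set of known events is fixed up front, and any sequence containing an unseen event is rejected rather than silently accepted.

```python
# Illustrative finite, discrete set of events, defined a priori.
EVENTS = {"purchase", "page_view", "link_click", "search"}

def validate_sequence(sequence):
    """Raise if the sequence contains an event outside the known set."""
    unknown = set(sequence) - EVENTS
    if unknown:
        raise ValueError(f"unseen events: {sorted(unknown)}")
    return sequence

validate_sequence(["page_view", "link_click", "purchase"])  # accepted
```

A sequence such as `["page_view", "login"]` would raise a `ValueError`, since `login` was not part of the a-priori set.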

Let’s introduce terminology for sequential events in the context of Markov Chains, before we get into exactly what Markov Chains are.

States are events or groups of events. Since we know the set of events, we also know the set of states under consideration. The set of all states is called the state space. Consecutive states define a state transition. State A followed by state B defines the A’B transition, also abbreviated as AB or A|B. A chain consists of a sequence (series) of multiple state transitions; for example, A’B’C’D’E is a chain, also abbreviated as ABCDE. A dataset consists of many chains, which are generated by an underlying process. The process is the true system that generates the multiple sequences of events. In our website example above, the true architecture of the website with all its linked pages is the process, while each customer visit may be a chain capturing a subset of all states.
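The terminology above translates directly into data structures. Here is a minimal sketch, using illustrative chains over states A through F: the state space is just the set of all states observed across chains, and each consecutive pair of states in a chain is one observed transition.

```python
# Illustrative dataset: each inner list is one chain generated
# by the same underlying process.
chains = [
    ["A", "B", "C", "D", "F"],
    ["A", "B", "D", "F"],
    ["A", "B", "E", "D", "F"],
]

# The state space is the set of all states seen across all chains.
state_space = {state for chain in chains for state in chain}

# Each consecutive pair of states in a chain is one state transition.
transitions = [(a, b) for chain in chains for a, b in zip(chain, chain[1:])]

print(sorted(state_space))   # ['A', 'B', 'C', 'D', 'E', 'F']
print(transitions[:3])       # [('A', 'B'), ('B', 'C'), ('C', 'D')]
```

Note that no single chain needs to visit every state; the state space is a property of the process, and the chains collectively sample from it.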

State P is accessible from state Q if it can be reached from Q in a finite number of steps, via zero or more intermediate states, in the process representation. This doesn’t mean that every chain will necessarily have state P after state Q. States P and Q are said to communicate with each other if each can be reached from the other. A communicating class is a collection of states in which every pair of states communicates, and no state communicates with any state outside the class. A chain is considered irreducible if all its states form a single communicating class, i.e. each state can be reached from every other state. A state which cannot be exited or transitioned out of is called an absorbing state, and a state which cannot be entered or transitioned into is called an entry state.
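Accessibility and communication can be sketched as a breadth-first search over the process. This is a minimal illustration, assuming the process is represented as an adjacency mapping from each state to the states it can transition into (the three-state graph at the end is invented for demonstration):

```python
from collections import deque

def accessible(graph, src, dst):
    """True if dst can be reached from src in zero or more transitions."""
    seen, queue = {src}, deque([src])
    while queue:
        state = queue.popleft()
        if state == dst:
            return True
        for nxt in graph.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def communicate(graph, p, q):
    """P and Q communicate if each is accessible from the other."""
    return accessible(graph, p, q) and accessible(graph, q, p)

# P and Q sit in a cycle, so they communicate; R can be reached
# from Q but cannot transition back, so Q and R do not communicate.
graph = {"P": ["Q"], "Q": ["P", "R"], "R": []}
print(communicate(graph, "P", "Q"))  # True
print(communicate(graph, "Q", "R"))  # False
```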

In the example process above, A, B, C, D, E, and F are individual states, and the set {A, B, C, D, E, F} is the state space. A’B, B’C, B’D, B’E, C’D, E’D, and D’F are all viable state transitions, while A’D, A’F, B’F, and B’A are some examples of non-viable state transitions. The figure represents the underlying process, which may generate different chains: A’B’C’D’F is one such chain, A’B’D’F is another, and A’B’E’D’F is a third. State B is accessible from A, and so is C from A, but E is not accessible from C. There is no communicating class in this example, since no pair of states can be reached from each other. State A is an entry state and F is an absorbing state.
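The claims in this example can be checked mechanically. Below is a sketch that encodes the viable transitions described above as an adjacency mapping and verifies accessibility, the entry state, and the absorbing state:

```python
# The example process, encoded from its viable transitions:
# A'B, B'C, B'D, B'E, C'D, E'D, D'F.
process = {
    "A": ["B"],
    "B": ["C", "D", "E"],
    "C": ["D"],
    "D": ["F"],
    "E": ["D"],
    "F": [],  # absorbing: no outgoing transitions
}

def reachable(graph, src, seen=None):
    """All states accessible from src, including src itself."""
    seen = set() if seen is None else seen
    seen.add(src)
    for nxt in graph[src]:
        if nxt not in seen:
            reachable(graph, nxt, seen)
    return seen

# An entry state has no incoming transitions; an absorbing state
# has no outgoing transitions.
incoming = {s for targets in process.values() for s in targets}
entry_states = [s for s in process if s not in incoming]
absorbing_states = [s for s, targets in process.items() if not targets]

print("B" in reachable(process, "A"))   # True
print("E" in reachable(process, "C"))   # False
print(entry_states, absorbing_states)   # ['A'] ['F']
```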

Lastly, a word about Markov. Named after Andrey Markov, the Markov property of state transitions implies that the next transition is a function of only the last N transitions, and not of all transitions in the sequence. In the most common situation, N is one: the next transition is a function of only the current state, and not of how the chain came to be in the current state. This is called “memorylessness”, and it is often a good approximation of reality and a valuable simplifying assumption in many modeling problems. N is called the order of the Markov process. For instance, the next webpage a person visits is a function of the current page the person is on, irrespective of how the person landed on that page. Similarly, the next turn a car can take is a function of its current position on the road, and not of where the car was before it came onto this road. And so on.
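Memorylessness has a very direct computational reading: to generate the next state, you only need to look at the current one. Here is a minimal sketch of a first-order Markov process over the example states, with transition probabilities that are purely illustrative (we cover how to estimate them from data in the next post):

```python
import random

# Illustrative first-order transition probabilities: each row is the
# distribution over next states, conditioned only on the current state.
transition_probs = {
    "A": {"B": 1.0},
    "B": {"C": 0.4, "D": 0.3, "E": 0.3},
    "C": {"D": 1.0},
    "D": {"F": 1.0},
    "E": {"D": 1.0},
}

def next_state(current):
    """Sample the next state using only the current state (N = 1)."""
    states = list(transition_probs[current])
    weights = list(transition_probs[current].values())
    return random.choices(states, weights=weights)[0]

def generate_chain(start="A", absorbing="F"):
    """Walk the process from the entry state until absorption."""
    chain = [start]
    while chain[-1] != absorbing:
        chain.append(next_state(chain[-1]))
    return chain

print(generate_chain())  # one possible chain, e.g. ['A', 'B', 'D', 'F']
```

Note that `next_state` never inspects the earlier part of the chain; that restriction is exactly the first-order Markov property.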

In the next post, we will delve into the mathematics of Markov Chains and talk about how we go about modeling a Markov process.

Other Articles by the same author

Curse of Dimensionality

Semi-Supervised Clustering
