Model-based biclustering of clickstream data

Melnykov, Volodymyr
Computational statistics & data analysis 2016 v.93 pp. 31-45
Markov chain, algorithms, data collection, dynamic models
Navigation patterns expressed as sequences of visited websites or categories can characterize the behavior and habits of users. Such web-page routes taken by individuals are commonly called clickstreams. Clustering clickstream sequences is a recent and challenging problem with many applications. The main difficulty is that one needs to group sequences of categorical data rather than vectors, so the majority of traditional clustering algorithms are not applicable in this setting. The time-related character of the data suggests that dynamic models hold more promise than static ones. Model-based clustering relying on a mixture of first-order Markov models is considered. Since the number of distinct web pages, and therefore the number of states in the Markov process, can be very high, such a mixture model involves a large number of parameters. Grouping states by their similarity is therefore also proposed as a way to reduce the number of parameters in the model. States are thus clustered along with users, which provides a biclustering framework. The developed methodology is illustrated on synthetic and real datasets with good results.
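
To make the idea of a mixture of first-order Markov models concrete, the following is a minimal sketch of EM-based clustering of categorical sequences. It is an illustration under stated assumptions, not the method of the paper: it omits the state-grouping (biclustering) step, conditions on the first state of each sequence instead of modeling an initial distribution, and adds a small smoothing constant to avoid zero transition probabilities. The names `transition_counts`, `fit_markov_mixture`, and all parameters are hypothetical.

```python
import numpy as np


def transition_counts(seq, n_states):
    """Count first-order transitions a -> b in one integer-coded sequence."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    return counts


def fit_markov_mixture(sequences, n_states, n_clusters,
                       n_iter=50, smooth=1e-3, seed=0):
    """EM for a mixture of first-order Markov chains (illustrative sketch).

    Assumptions (not from the source): the first state of each sequence is
    conditioned on, and `smooth` is added to the expected counts.
    Returns mixing weights (K,), transition matrices (K, S, S), and
    posterior cluster memberships (N, K).
    """
    rng = np.random.default_rng(seed)
    # Sufficient statistics: one S x S count matrix per sequence.
    counts = np.stack([transition_counts(s, n_states) for s in sequences])
    # Random soft initialisation of the posterior memberships.
    resp = rng.dirichlet(np.ones(n_clusters), size=len(sequences))

    for _ in range(n_iter):
        # M-step: mixing weights and row-normalised transition matrices.
        weights = resp.mean(axis=0)
        trans = np.einsum("nk,nij->kij", resp, counts) + smooth
        trans /= trans.sum(axis=2, keepdims=True)

        # E-step: log-likelihood of each sequence under each component.
        log_post = np.einsum("nij,kij->nk", counts, np.log(trans))
        log_post += np.log(weights + 1e-12)
        # Subtract the row maximum before exponentiating for stability.
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)

    return weights, trans, resp


if __name__ == "__main__":
    # Toy clickstreams over two page categories (0 and 1).
    alternating = [[0, 1] * 5 for _ in range(4)]
    sticky = [[0] * 10 for _ in range(2)] + [[1] * 10 for _ in range(2)]
    weights, trans, resp = fit_markov_mixture(alternating + sticky, 2, 2)
    print(resp.argmax(axis=1))
```

Each component here carries its own full transition matrix, so the parameter count grows with the square of the number of states per cluster. That growth is what motivates grouping similar states, as the abstract describes.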