In recent years, individuals’ locations have been continuously captured by mobile devices. Such location data is an important foundation for learning individuals’ location behavior, which can serve as an indication of their lifestyle as well as other behavioral features. The most common way to represent a person’s location pattern is as a sequence (trajectory) of locations that the person visits in given time frames.
In this work we focus on the task of clustering mobile phone users based on their semantic as well as geographical location distributions. When semantic locations are used instead of geographical points, clusters of users can represent, for example, similar behavior of users regardless of their actual location. We can therefore identify people with the same lifestyle even when they live in different cities. Most works in this field use sequence alignment methods (such as Hamming distance and Longest Common Subsequence) within a deterministic framework (in which each user is represented by a simple trajectory), although users might follow more complicated behavioral patterns.
The research objective is to cluster users based on probabilistic profiling that represents complex mobility behavior. One way to do so is to profile the individual’s behavior with a Markovian model. With this model we are able to capture the relationship between different time intervals in the user’s trajectory. When probabilistic profiling is used, the distance between two users is evaluated as a distance between two distributions. We used measures from information theory, such as the Kullback-Leibler divergence, applied various clustering methods (K-medoids, hierarchical clustering, spectral clustering and DBSCAN), and used internal validation indices to find the most suitable clustering for various applications. The data used is real and comes from a unique dataset.
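As an illustration of the profiling and distance steps, the following is a minimal sketch and not the implementation used in this work. It assumes a first-order Markov chain over a hypothetical set of semantic labels, additive smoothing of the transition counts, and a symmetrized Kullback-Leibler divergence averaged over the rows of the transition matrices. The label set, the toy trajectories, and all function names are assumptions made for the example.

```python
import math

# Hypothetical semantic location labels (assumed for illustration only).
STATES = ["home", "work", "leisure"]


def transition_matrix(trajectory, states=STATES, alpha=1.0):
    """Estimate a first-order Markov transition matrix from one trajectory.

    Additive smoothing (alpha) keeps every probability positive, so the
    KL divergence below is always finite. This is an assumed choice.
    """
    index = {s: i for i, s in enumerate(states)}
    n = len(states)
    counts = [[alpha] * n for _ in range(n)]
    for prev, curr in zip(trajectory, trajectory[1:]):
        counts[index[prev]][index[curr]] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions with q > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def user_distance(P, Q):
    """Symmetrized KL divergence between two transition matrices,
    averaged over rows (one row per current state)."""
    per_row = [kl_divergence(p, q) + kl_divergence(q, p) for p, q in zip(P, Q)]
    return sum(per_row) / len(per_row)


def distance_matrix(profiles):
    """Pairwise distances between user profiles, as a list of lists."""
    return [[user_distance(P, Q) for Q in profiles] for P in profiles]


# Toy trajectories (invented for the example, not taken from the dataset).
users = {
    "a": ["home", "work", "home", "work", "home"],
    "b": ["home", "work", "home", "work", "leisure"],
    "c": ["leisure", "leisure", "home", "leisure", "leisure"],
}
profiles = [transition_matrix(t) for t in users.values()]
D = distance_matrix(profiles)
for name, row in zip(users, D):
    print(name, [round(d, 3) for d in row])
```

The resulting matrix is symmetric with a zero diagonal, so it can be passed to any clustering method that accepts precomputed pairwise distances, such as K-medoids or hierarchical clustering.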