In this study, researchers aimed to improve the clustering of user activities based on semantic features and spatiotemporal information. They developed a multi-view k-means clustering method that adapts traditional k-means through a co-training process, leveraging prior information or knowledge from each view to enhance consistency across different views. The researchers also introduced three new features: semantic distance, aggregated weights, and unique activity semantic number. The results showed that the proposed method outperformed traditional clustering methods in terms of clustering quality and interpretability.
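The co-training idea can be illustrated with a minimal sketch. The summary does not spell out the exact update rule, so the following assumes two views (for instance, spatiotemporal and semantic features of the same users) and a simple alternating scheme in which the cluster labels found in one view are used to recompute centroids in the other. The function names, the fixed iteration count, and the toy data are all hypothetical and are not taken from the study.

```python
import random

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def centroids_from_labels(points, labels, k, fallback):
    """Mean of each cluster; keep the previous centroid if a cluster is empty."""
    cents = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if members:
            cents.append([sum(col) / len(members) for col in zip(*members)])
        else:
            cents.append(fallback[c])
    return cents

def assign(points, cents):
    """Assign each point to the index of its nearest centroid."""
    return [min(range(len(cents)), key=lambda c: sq_dist(p, cents[c]))
            for p in points]

def multiview_kmeans(view_a, view_b, k, iters=20, seed=0):
    """Alternate k-means steps between two views of the same samples.

    Assumed scheme: labels produced in one view seed the centroid
    update in the other view, which nudges both views toward a
    consistent partition.
    """
    rng = random.Random(seed)
    views = [view_a, view_b]
    idx = rng.sample(range(len(view_a)), k)
    cents = [[list(v[i]) for i in idx] for v in views]
    labels = assign(view_a, cents[0])
    for t in range(iters):
        v = (t + 1) % 2  # alternate views, starting with view B
        # Centroid update in view v uses labels from the other view.
        cents[v] = centroids_from_labels(views[v], labels, k, cents[v])
        # Reassignment in view v produces labels for the next round.
        labels = assign(views[v], cents[v])
    return labels

# Toy data: four users described in two views.
view_a = [[0, 0], [0, 1], [10, 10], [10, 11]]
view_b = [[1], [2], [50], [51]]
labels = multiview_kmeans(view_a, view_b, k=2)
```

On this toy data the first two users and the last two users end up in separate clusters. A fuller implementation would add a convergence check and a rule for resolving disagreement between the views.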
The researchers constructed a refined feature framework with an emphasis on high-order features across spatiotemporal and semantic dimensions. They used word2vec, a model based on the Continuous Bag-of-Words (CBOW) word embedding algorithm, to convert each node within the user activity semantics into a vector representation. The model has two hyperparameters: 𝑑𝑖𝑚, the length of the embedding vector, and 𝑤, the context length.
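To make the roles of the two hyperparameters concrete, here is a small sketch of how CBOW forms its inputs. It only builds (context, target) pairs and averages the context embeddings; it does not train anything. It assumes that 𝑤 counts items on each side of the target, which the summary does not specify, and the activity labels and helper names are hypothetical.

```python
import random

def cbow_pairs(sequence, w):
    """Build (context, target) pairs with up to w items on each side."""
    pairs = []
    for i, target in enumerate(sequence):
        context = sequence[max(0, i - w):i] + sequence[i + 1:i + 1 + w]
        pairs.append((context, target))
    return pairs

def init_embeddings(vocab, dim, seed=0):
    """Randomly initialised embedding table: one vector of length dim per token."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
            for tok in sorted(vocab)}

def cbow_input(context, embeddings):
    """Average the context embeddings, as CBOW does before predicting the target."""
    dim = len(next(iter(embeddings.values())))
    return [sum(embeddings[t][k] for t in context) / len(context)
            for k in range(dim)]

# Hypothetical sequence of activity semantics for one user.
activities = ["home", "school", "cafe", "office", "gym"]
pairs = cbow_pairs(activities, w=1)
emb = init_embeddings(set(activities), dim=4)
hidden = cbow_input(pairs[1][0], emb)  # context of "school" is ["home", "cafe"]
```

Increasing `w` widens the context used to predict each target, while `dim` fixes the length of every vector in the embedding table and therefore of the averaged context vector.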
The researchers then developed three features: semantic distance, aggregated weights, and unique activity semantic number. Semantic distance captures the variability of the user’s semantic activity as the maximum distance between any two different semantic vectors within the user’s semantic list. Aggregated weights are the average of all semantic vectors in the user’s semantic list. Unique activity semantic number quantifies the richness of the user’s semantic activities as the number of unique semantic vectors in the user’s semantic list.
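The three definitions above can be sketched in a few lines. The summary does not name the distance metric, so Euclidean distance is assumed here; the function name and the sample vectors are hypothetical.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance (an assumed metric; the source does not name one)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def semantic_features(vectors):
    """Compute the three features for one user's list of semantic vectors."""
    unique = sorted(set(tuple(v) for v in vectors))
    # Semantic distance: largest distance between any two different vectors
    # (0.0 if the user has fewer than two distinct vectors).
    semantic_distance = max(
        (euclidean(u, v) for u, v in combinations(unique, 2)), default=0.0
    )
    # Aggregated weights: element-wise mean over the whole list,
    # duplicates included.
    n = len(vectors)
    aggregated = [sum(col) / n for col in zip(*vectors)]
    # Unique activity semantic number: count of distinct vectors.
    return semantic_distance, aggregated, len(unique)

user_vectors = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]
dist, mean_vec, n_unique = semantic_features(user_vectors)
# dist == 10.0, mean_vec == [2.25, 3.0], n_unique == 3
```

In this example the repeated vector counts twice toward the average but only once toward the unique count, which follows the wording of the definitions above.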
The experimental results showed that a set of high-order features across spatiotemporal and semantic dimensions can significantly improve the clustering quality and interpretability of user activities. The researchers also identified different clusters of users based on their semantic activities, such as parents and students associated with the high school education topic in cluster 2, and users who live and work nearby in cluster 5.
In conclusion, this study demonstrates the effectiveness of combining spatiotemporal and semantic features for clustering user activities. The proposed method can provide a more comprehensive understanding of user behavior and preferences, which can be useful for various applications such as recommendation systems and location-based services. Future work may focus on improving the accuracy and interpretability of the clustering results by incorporating additional data sources or using more advanced machine learning techniques.
Computer Science, Machine Learning