In this study, we investigated whether recent Session-based Recommendation (SBR) literature that uses Graph Neural Networks (GNNs) relies on weak baselines. We analyzed eight GNN models published in top-tier venues and reproduced them under identical settings, using the original code provided by the authors. We found that, in terms of Mean Reciprocal Rank (MRR), which was our hyperparameter optimization criterion, all of these GNN methods were outperformed by simple techniques that do not even use side information. The GNN-based models were favorable only in some situations, and only in terms of Hit Rate. Our results indicate that the problem of weak baselines persists in recent SBR literature.
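Because the comparison hinges on two metrics that can disagree, the following minimal Python sketch computes HR@k and MRR@k for next-item prediction. It assumes exactly one ground-truth next item per test case and a hypothetical cutoff `k`; the cutoff used in the study is not stated here, and the function name `hit_rate_and_mrr` is illustrative rather than taken from the authors' code.

```python
from typing import List, Sequence, Tuple


def hit_rate_and_mrr(
    ranked_lists: Sequence[Sequence[int]],
    targets: Sequence[int],
    k: int = 20,  # hypothetical cutoff; not taken from the study
) -> Tuple[float, float]:
    """Return (HR@k, MRR@k) for next-item prediction.

    Assumes one ground-truth next item per test case.
    HR@k counts a case as a hit if the target is among the top-k items.
    MRR@k averages 1/rank (1-based) over hits and adds 0 for misses.
    """
    if len(ranked_lists) != len(targets):
        raise ValueError("need exactly one ranked list per target")
    if not targets:
        raise ValueError("no test cases given")
    hits = 0
    reciprocal_rank_sum = 0.0
    for ranking, target in zip(ranked_lists, targets):
        top_k: List[int] = list(ranking[:k])
        if target in top_k:
            hits += 1
            reciprocal_rank_sum += 1.0 / (top_k.index(target) + 1)
    n = len(targets)
    return hits / n, reciprocal_rank_sum / n


if __name__ == "__main__":
    targets = [1, 2, 3]
    # Model A: the target is always in the top 3, but only at rank 3.
    model_a = [[9, 8, 1], [9, 8, 2], [9, 8, 3]]
    # Model B: the target is at rank 1 twice and missing once.
    model_b = [[1, 8, 9], [2, 8, 9], [7, 8, 9]]
    print(hit_rate_and_mrr(model_a, targets, k=3))  # HR = 1.0, MRR = 1/3
    print(hit_rate_and_mrr(model_b, targets, k=3))  # HR = 2/3, MRR = 2/3
```

In this toy example, model A wins on HR@3 while model B wins on MRR@3, which shows how a method can look favorable in terms of Hit Rate yet trail in MRR, since MRR rewards placing the correct item near the top of the list.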
Table 2 describes the baseline algorithms we considered:
- SR (AAAI’19) – A GNN model that constructs session graphs and uses a soft attention mechanism to aggregate information among items. It was one of the first works to use a GNN for SBR.
- VSTAN – A neighborhood-based (non-GNN) model that combines the specificities of SKNN and STAN (SIGIR’19). It uses a sequence-aware item scoring procedure and an Inverse Document Frequency (IDF) weighting scheme to promote less popular items.
- SFSKNN (AAAI’18) – A variant of SKNN that focuses on item recency: it considers for recommendation only those items that appear in a neighboring session at least once after the last item of the current session.
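The SFSKNN filter described in the list above can be illustrated with a minimal Python sketch. It is not the authors' code or the original SFSKNN implementation: sessions are assumed to be lists of integer item IDs, and the binary cosine similarity, the neighborhood size `k_neighbors`, and the additive scoring are assumptions chosen to mirror plain session-based kNN. All function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def cosine_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Cosine similarity of two sessions viewed as binary item vectors."""
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / math.sqrt(len(set_a) * len(set_b))


def items_after(session: Sequence[int], last_item: int) -> Set[int]:
    """Items occurring in `session` after some occurrence of `last_item`.

    Occurring after *some* occurrence equals occurring after the first one,
    so only the first index is needed.
    """
    items = list(session)
    if last_item not in items:
        return set()
    start = items.index(last_item) + 1
    return set(items[start:])


def sf_sknn_scores(
    current: Sequence[int],
    past_sessions: Sequence[Sequence[int]],
    k_neighbors: int = 100,  # hypothetical neighborhood size
) -> Dict[int, float]:
    """Score candidate items for `current` using the SFSKNN-style filter."""
    if not current:
        return {}
    # 1. Keep the k past sessions most similar to the current session.
    scored: List[Tuple[float, Sequence[int]]] = [
        (cosine_similarity(current, s), s) for s in past_sessions
    ]
    neighbors = sorted(
        (pair for pair in scored if pair[0] > 0.0),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k_neighbors]

    # 2. Add each neighbor's similarity only to items that the neighbor
    #    contains after the current session's last item.
    last_item = current[-1]
    scores: Dict[int, float] = defaultdict(float)
    for similarity, session in neighbors:
        for item in items_after(session, last_item):
            scores[item] += similarity
    return dict(scores)


if __name__ == "__main__":
    history = [[1, 2, 3, 4], [2, 5], [7, 8], [3, 2, 6]]
    ranking = sorted(
        sf_sknn_scores([1, 2], history).items(),
        key=lambda kv: kv[1],
        reverse=True,
    )
    print(ranking)
```

In the usage example, item 3 is credited by the neighbor `[1, 2, 3, 4]` but not by `[3, 2, 6]`, because in the latter it occurs before the current session's last item 2; an unfiltered SKNN scorer would credit it from both neighbors.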
Such simple models are sometimes difficult to beat. Table 3 provides summary statistics for the selected datasets. On these datasets, the GNN-based models were outperformed by the simple techniques in terms of MRR and were favorable only in some situations in terms of Hit Rate.
In conclusion, our findings highlight the use of weak baselines in recent SBR literature, even in papers published at top-tier venues. Weak baselines can mislead the evaluation of GNN-based models by making them appear more effective than they actually are. By reproducing these algorithms under identical settings, we demonstrated that simple techniques can often outperform GNN-based models, which underscores the need for rigorous evaluation and strong baselines in SBR research.