In this research article, the authors propose HASP, a novel method for training graph neural networks (GNNs). HASP addresses the challenge of training GNNs on large graphs by partitioning them into smaller subgraphs and training each subgraph separately. This approach enables efficient training, yielding an average speedup of more than 50x over full-graph training on a large-memory CPU machine.
The authors explain that one of the main challenges in training GNNs is the sheer size of real-world graphs, which leads to memory limitations and slow training. To address this, HASP divides the graph into multiple subgraphs, each containing a subset of the nodes and edges. Training each subgraph separately reduces the cost of each training step and enables faster convergence.
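To make the partition-then-train loop concrete, here is a minimal Python sketch. The summary does not describe HASP's actual partitioner, so this version uses a toy round-robin node assignment and keeps only the edges whose endpoints land in the same partition; the names `partition_graph` and `train_subgraph` are hypothetical placeholders, not HASP's API.

```python
def partition_graph(nodes, edges, k):
    """Split (nodes, edges) into k edge-induced subgraphs (toy partitioner)."""
    assignment = {n: i % k for i, n in enumerate(nodes)}  # round-robin split
    parts = [{"nodes": set(), "edges": []} for _ in range(k)]
    for n, p in assignment.items():
        parts[p]["nodes"].add(n)
    for u, v in edges:
        if assignment[u] == assignment[v]:  # drop cross-partition edges
            parts[assignment[u]]["edges"].append((u, v))
    return parts

def train_subgraph(subgraph, epochs=1):
    """Placeholder for one GNN training pass over a single subgraph."""
    for _ in range(epochs):
        pass  # forward/backward over subgraph["nodes"] and subgraph["edges"]

nodes = list(range(10))
edges = [(i, (i + 1) % 10) for i in range(10)]
for sub in partition_graph(nodes, edges, k=3):
    train_subgraph(sub)  # each partition is trained independently
```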
To ensure that semantic information is preserved, HASP duplicates the semantic nodes, which carry the graph's meaningful concepts, across all subgraphs. This design choice keeps the semantics consistent while still enabling efficient training.
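A hedged sketch of that duplication step, building on the partition structure from the previous example, might look as follows. The summary does not say how HASP identifies semantic nodes, so here they are simply supplied by the caller.

```python
def replicate_semantic_nodes(parts, semantic_nodes, edges):
    """Copy every semantic node, plus its incident edges, into every partition."""
    for part in parts:
        part["nodes"].update(semantic_nodes)
        for u, v in edges:
            # an edge becomes local once both endpoints exist in this partition
            if u in part["nodes"] and v in part["nodes"] and (u, v) not in part["edges"]:
                part["edges"].append((u, v))
    return parts

# Example, reusing `parts` and `edges` from the previous sketch:
# parts = replicate_semantic_nodes(parts, semantic_nodes={0, 5}, edges=edges)
```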
The authors also propose a technique called "node sampling" that lets users customize which nodes are included in each subgraph. This is particularly useful in complex scenarios, such as integrating external show entities, where specific nodes must be included or excluded to satisfy the application's requirements.
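One plausible reading of this customization is an include/exclude policy applied per partition. The interface below is an assumption, since the summary only states that users can force certain nodes in or out of each subgraph.

```python
def apply_node_policy(part, must_include=frozenset(), must_exclude=frozenset()):
    """Adjust one partition's node set per user-supplied include/exclude policies."""
    part["nodes"] = (part["nodes"] | set(must_include)) - set(must_exclude)
    # prune edges whose endpoints were removed
    part["edges"] = [(u, v) for u, v in part["edges"]
                     if u in part["nodes"] and v in part["nodes"]]
    return part

# Example: always include node 0 (say, an external entity), never include node 7.
# part = apply_node_policy(part, must_include={0}, must_exclude={7})
```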
Through extensive experiments on several benchmark datasets, the authors demonstrate the efficiency and effectiveness of HASP. They show that HASP achieves better performance than existing methods while significantly reducing training time and memory usage.
In summary, HASP is a powerful tool for training GNNs efficiently by partitioning the graph into smaller subgraphs and training each subgraph separately. By preserving semantic information across all subgraphs and enabling customized node inclusion, HASP provides a robust solution for dealing with large graphs in various applications.