Interactive multi-document summarization is a crucial task in natural language processing, as it enables users to selectively retrieve important information from multiple documents. However, selecting queries based solely on their informativeness may produce queries that are complex and hard for humans to answer. This paper proposes a preference-based approach that considers the ease of answering queries in addition to their informativeness.
Preference-Based Interactive Multi-Document Summarization
The proposed method, APRIL, introduced in the paper "Interactively Learning to Summarise by Combining Active Preference Learning and Reinforcement Learning", combines active preference learning and reinforcement learning to learn a summary generator that can interactively generate summaries for users. The key idea is to adapt the summary generator using the user’s feedback on queries that are easy to answer, rather than choosing queries solely for their informativeness.
Active Preference Learning
Active preference learning involves asking the user for their preferences between different summaries, and using these preferences to update the summary generator. Because stating a preference between candidate summaries is simpler than answering complex and hard-to-understand queries, this approach reduces the cognitive load on the human oracle.
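The preference loop described above can be sketched as follows. This is a minimal illustration and not the paper's implementation: it assumes each candidate summary is represented by a small feature vector, models preferences with a Bradley-Terry (logistic) model over a linear score, and picks the next query by uncertainty sampling. All function names and the simulated user are hypothetical.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def score(w, feats):
    # Linear utility of a summary given its feature vector (assumed representation).
    return sum(wi * fi for wi, fi in zip(w, feats))

def pick_query(w, candidates):
    """Return the index pair whose predicted preference is closest to 50/50."""
    best, best_gap = None, float("inf")
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            p = sigmoid(score(w, candidates[i]) - score(w, candidates[j]))
            gap = abs(p - 0.5)
            if gap < best_gap:
                best, best_gap = (i, j), gap
    return best

def update(w, preferred, rejected, lr=0.5):
    """One gradient-ascent step on the Bradley-Terry log-likelihood."""
    p = sigmoid(score(w, preferred) - score(w, rejected))
    return [wi + lr * (1.0 - p) * (a - b)
            for wi, a, b in zip(w, preferred, rejected)]

def simulated_user(true_w, a, b):
    # Hypothetical oracle: prefers the summary with the higher hidden utility.
    return score(true_w, a) >= score(true_w, b)

def learn_preferences(candidates, true_w, rounds=30):
    w = [0.0] * len(true_w)
    for _ in range(rounds):
        i, j = pick_query(w, candidates)
        if simulated_user(true_w, candidates[i], candidates[j]):
            w = update(w, candidates[i], candidates[j])
        else:
            w = update(w, candidates[j], candidates[i])
    return w
```

Each round costs the user a single comparison, which is the sense in which the feedback is cheap to give. In a real system the simulated user would be replaced by an actual person, and the pairwise search in pick_query would need to be restricted to a manageable candidate pool.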
Reinforcement Learning
The user’s preferences are used to train a reward model, which assigns a score to each candidate summary. Reinforcement learning then trains the summary generator to maximize this learned reward, guiding it towards summaries that are more informative and easier to understand. The reward model itself does not generate summaries; it only scores them.
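The following sketch shows how a learned reward can steer a summary generator. It is a deliberate simplification and not the paper's algorithm: the "generator" is a softmax policy that picks one of a few candidate summaries, the reward is a linear function of hypothetical feature vectors, and the policy gradient is computed exactly rather than estimated from sampled summaries, so that the example stays deterministic.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def learned_reward(w, feats):
    # Reward model: scores a summary, it does not generate one.
    return sum(wi * fi for wi, fi in zip(w, feats))

def expected_reward(logits, rewards):
    return sum(p * r for p, r in zip(softmax(logits), rewards))

def policy_gradient_step(logits, rewards, lr=0.5):
    """Exact gradient of the expected reward for a softmax policy:
    d J / d logit_i = pi_i * (r_i - sum_j pi_j * r_j)."""
    probs = softmax(logits)
    baseline = sum(p * r for p, r in zip(probs, rewards))
    return [t + lr * p * (r - baseline)
            for t, p, r in zip(logits, probs, rewards)]

def train_policy(candidates, w, steps=200):
    rewards = [learned_reward(w, c) for c in candidates]
    logits = [0.0] * len(candidates)  # start from a uniform policy
    for _ in range(steps):
        logits = policy_gradient_step(logits, rewards)
    return softmax(logits)
```

Calling train_policy with the weights learned from user preferences shifts probability mass towards the candidates the reward model scores highest. A realistic summarizer would build summaries step by step and estimate the gradient from samples, but the division of labour is the same: the reward model scores, the policy generates.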
Scaling Laws for Reward Model Overoptimization
One challenge in using reinforcement learning for interactive multi-document summarization is overoptimization: the reward model is only a proxy for what the user actually wants, so a summary generator optimized too aggressively against it can score highly under the reward model while the true quality of its summaries declines. Gao et al. study this effect through scaling laws that relate the true reward to how far the optimized policy has moved from its initial policy and to the size of the reward model. Such an analysis can indicate when to stop optimizing, which helps to prevent overoptimization and improve the generalization of the summary generator.
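Overoptimization is often described in terms of how far the optimized policy has drifted from its starting point. The sketch below is illustrative only: it measures that drift as the square root of the KL divergence between two discrete policies, and it assumes a true-reward curve of the form R(d) = d * (alpha - beta * log d), a form used in scaling-law analyses of reinforcement learning against a reward model. The coefficients alpha and beta are hypothetical and would have to be fitted to data.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def policy_distance(p, p_init):
    # Distance measure d = sqrt(KL(policy || initial policy)).
    return math.sqrt(kl_divergence(p, p_init))

def gold_reward_rl(d, alpha, beta):
    """Assumed true-reward curve R(d) = d * (alpha - beta * log d)."""
    if d <= 0.0:
        return 0.0
    return d * (alpha - beta * math.log(d))

def peak_distance(alpha, beta):
    """Distance at which the assumed curve peaks.
    Setting dR/dd = alpha - beta * log d - beta = 0 gives d = exp(alpha / beta - 1)."""
    return math.exp(alpha / beta - 1.0)
```

Under this assumed curve, the true reward rises while the policy stays close to its initial behaviour and falls once the distance exceeds peak_distance, even though the proxy reward may keep increasing. Monitoring policy_distance during training is one way to decide when to stop.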
Conclusion
In conclusion, APRIL is a preference-based interactive multi-document summarization method that combines active preference learning and reinforcement learning to learn a summary generator from user feedback. By considering both the informativeness of queries and the ease of answering them, APRIL can provide high-quality summaries that are easier to understand than those generated by traditional methods. Accounting for reward model overoptimization, as analyzed in the scaling laws of Gao et al., can further improve the generalization of the summary generator, making this a promising approach for interactive multi-document summarization.