In this paper, the authors propose a new metric, 𝑒, for evaluating the quality of generated summaries in natural language processing (NLP). They introduce METEOR [9], a measure that combines unigram precision and recall to compute the similarity between two sets of mapped unigrams. The authors also provide guidelines for setting the penalty parameters and demonstrate their effectiveness through examples.
𝑒 is a useful metric for assessing summary quality, especially when the developer expects the generated summaries to contain few tokens copied from the code snippet itself. SBT (Structure-Based Traversal) is suited to situations where the code snippet lacks informative tokens for generating a summary, because it does not rely on the identifiers used in the code snippet.
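As a rough illustration of what an SBT-style input looks like, the sketch below linearizes a Python AST with a structure-based traversal. The `sbt` helper name, the bracketed output format, and the use of node type names only (real SBT formulations also attach token values such as identifiers and literals) are our assumptions for illustration, not the paper's exact definition.

```python
import ast

def sbt(node):
    """Structure-Based-Traversal-style linearization of a Python AST node.

    Each node is wrapped as "( label ... ) label", so the resulting token
    sequence encodes tree structure rather than only the raw code tokens.
    Only node type names are emitted here; real SBT variants also include
    token values.
    """
    label = type(node).__name__
    children = list(ast.iter_child_nodes(node))
    if not children:
        # Leaf node: open and close immediately.
        return ["(", label, ")", label]
    seq = ["(", label]
    for child in children:
        seq += sbt(child)
    seq += [")", label]
    return seq

snippet = "def add(a, b):\n    return a + b"
tokens = sbt(ast.parse(snippet))
print(" ".join(tokens))
```

Because the sequence is built from the syntax tree, it stays informative even when the snippet's own identifiers carry little meaning, which matches the suitability claim above.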
The authors explain that 𝑒 measures the similarity between two sets of mapped unigrams by computing precision, recall, and their harmonic mean. They also provide guidelines, based on [99], for setting the penalty parameters.
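A minimal, self-contained sketch of this kind of computation is given below. It uses exact unigram matching only and commonly cited default parameter values (alpha, beta, gamma); the real METEOR [9] additionally aligns stems and synonyms, and the guidelines in [99] govern how the penalty parameters are chosen, so the values and the greedy matcher here are illustrative assumptions.

```python
def meteor_like(candidate, reference, alpha=0.9, beta=3.0, gamma=0.5):
    """Simplified METEOR-style score over exact unigram matches.

    alpha weights the harmonic mean toward recall; beta and gamma shape the
    fragmentation penalty applied to scattered matches.
    """
    cand, ref = candidate.split(), reference.split()
    # Greedy exact alignment: map each candidate token to an unused reference token.
    ref_used = [False] * len(ref)
    mapping = []  # (candidate index, reference index) pairs
    for i, tok in enumerate(cand):
        for j, rtok in enumerate(ref):
            if not ref_used[j] and tok == rtok:
                ref_used[j] = True
                mapping.append((i, j))
                break
    m = len(mapping)
    if m == 0:
        return 0.0
    precision = m / len(cand)
    recall = m / len(ref)
    # Parameterized harmonic mean, weighted toward recall when alpha > 0.5.
    f_mean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # Chunks are maximal runs of matches contiguous in both strings.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(mapping, mapping[1:]):
        if i2 != i1 + 1 or j2 != j1 + 1:
            chunks += 1
    penalty = gamma * (chunks / m) ** beta
    return f_mean * (1 - penalty)

print(meteor_like("return the sum of two numbers",
                  "returns the sum of two integers"))
```

The fragmentation penalty is what the penalty parameters control: the more scattered the matched unigrams are across the candidate and reference, the more the harmonic-mean score is discounted.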
In conclusion, this paper presents an approach to evaluating summary quality in NLP that could improve how generated summaries are assessed. By using METEOR, developers can evaluate summaries more accurately and make informed decisions about their usefulness for different applications.