In recent years, machine translation has seen significant advances, with transformer-based models a major contributor to this progress. One crucial aspect of machine translation is quality estimation, which involves predicting the quality of machine-translated text without access to reference translations. The authors of this article aim to contribute to this field by submitting their model to the WMT22 quality estimation shared task.
Model
The authors’ submission, named CometKiwi, builds on a transformer-based encoder and derives a sequence-level representation from its output. The encoder produces a contextual embedding for every token in the input, and the model takes the embedding of the initial token of the sequence, [CLS], a special token trained to represent the sequence as a whole and distinguished from the ordinary word tokens.
The authors argue that this CLS strategy outperforms alternative pooling methods, such as taking the mean or maximum of the token output vectors as done in the MonoTransQuest framework. By employing the CLS strategy, CometKiwi achieves better performance in quality estimation tasks.
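The three pooling strategies discussed above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the hidden states are random stand-ins for the output of a pretrained encoder, and the shapes and mask handling are assumptions for demonstration.

```python
import numpy as np

# Hypothetical encoder output: hidden states for a tokenized sequence
# of length 6 with hidden size 4. In practice these would come from a
# pretrained multilingual transformer; here they are random stand-ins.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(6, 4))       # (seq_len, hidden_dim)
attention_mask = np.array([1, 1, 1, 1, 1, 0])  # last position is padding

# CLS pooling: take the embedding of the first token ([CLS]),
# which is trained to summarize the entire sequence.
cls_pooled = hidden_states[0]

# Mean pooling: average the embeddings of the non-padding tokens.
mask = attention_mask[:, None].astype(float)
mean_pooled = (hidden_states * mask).sum(axis=0) / mask.sum()

# Max pooling: element-wise maximum over the non-padding tokens.
max_pooled = np.where(mask.astype(bool), hidden_states, -np.inf).max(axis=0)

# Each strategy yields a single fixed-size sentence vector.
print(cls_pooled.shape, mean_pooled.shape, max_pooled.shape)
```

All three reduce a variable-length sequence of token embeddings to one fixed-size vector that a regression head can score; the difference lies in which tokens contribute to that vector.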
Results
The authors evaluate their model on several language pairs and show that it outperforms other submissions to the WMT22 quality estimation shared task. They also provide a detailed analysis of their model’s performance on various evaluation metrics, demonstrating its effectiveness in assessing translation quality.
Conclusion
In summary, CometKiwi is a transformer-based model that uses the CLS strategy to obtain sequence-level representations and outperforms other submissions to the WMT22 quality estimation shared task. The authors’ contributions demonstrate the potential of transformer-based models for translation quality estimation, which has significant implications for the many applications that rely on accurate language translation.