Improving Machine Learning Models with Interactive Correction of Mislabeled Training Data

In this paper, the authors discuss the challenges of debugging machine learning models that are trained on imbalanced data. Data is imbalanced when one class has significantly more instances than the other classes. Models trained on such data tend to be biased toward the majority class and perform poorly on minority classes, which leads to incorrect predictions and lower overall accuracy.
The authors propose a technique called Reweighter, which addresses these issues by reweighting the training data to balance the classes. The Reweighter algorithm combines two techniques: class-balanced batch sampling and sample-level weight adjustment. Class-balanced batch sampling ensures that each batch contains approximately equal numbers of instances from every class, while sample-level weight adjustment updates the weight of each sample according to its difficulty.
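The two ideas described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `class_balanced_batches` and `adjust_sample_weights`, the use of per-sample loss as a proxy for difficulty, and the `step` parameter are all assumptions made here for illustration.

```python
import random
from collections import defaultdict


def class_balanced_batches(labels, batch_size, num_batches, seed=0):
    """Return batches of sample indices with roughly equal counts per class.

    Hypothetical helper: classes are visited round-robin and indices are
    drawn with replacement, so minority samples are reused across batches.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = sorted(by_class)

    batches = []
    for _ in range(num_batches):
        batch = []
        for slot in range(batch_size):
            cls = classes[slot % len(classes)]       # round-robin over classes
            batch.append(rng.choice(by_class[cls]))  # sample with replacement
        rng.shuffle(batch)
        batches.append(batch)
    return batches


def adjust_sample_weights(weights, losses, step=0.5):
    """Raise the weight of harder samples, then renormalize to mean 1.

    Assumption: "difficulty" is approximated by the per-sample loss; samples
    with above-average loss gain weight and easier ones lose weight.
    """
    mean_loss = sum(losses) / len(losses)
    raw = [
        max(w * (1.0 + step * (loss - mean_loss) / (mean_loss + 1e-12)), 1e-6)
        for w, loss in zip(weights, losses)
    ]
    scale = len(raw) / sum(raw)
    return [r * scale for r in raw]
```

For example, with 90 samples of class 0 and 10 of class 1, `class_balanced_batches(labels, batch_size=8, num_batches=5)` yields batches holding four indices from each class. Sampling with replacement is one simple way to get balance; it trades off repeated minority samples against discarding majority samples.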
The authors evaluate Reweighter using several experiments on different datasets and compare it to other balancing techniques. They find that Reweighter outperforms other methods in terms of accuracy and computational efficiency. Additionally, they conduct a user study with domain experts who provide feedback on the usability of Reweighter, identifying some limitations and suggesting future research directions.
To summarize, Reweighter is a technique for debugging machine learning models trained on imbalanced data. It balances the classes by reweighting the training data, which improves model accuracy and reduces bias toward majority classes. Although it has some limitations, Reweighter has shown promising results and could be useful in applications where machine learning models need to be accurate and unbiased.

ARXIV/2312.05067 authored by Weikai Yang, Yukai Guo, Jing Wu, Zheng Wang, Lan-Zhe Guo, Yu-Feng Li, Shixia Liu.

LLama 2 7B Chat