- Dataset bias can be classified into two types: sample size bias and sample composition bias.
Sample size bias occurs when the number of samples varies across demographic groups, so that some groups are overrepresented in the dataset and others underrepresented. Models trained on such data tend to perform better on the groups with more samples and worse on the underrepresented ones.
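As a rough illustration of how sample size bias might be checked, the sketch below counts samples per group and reports the ratio between the largest and smallest group. The function names (`group_counts`, `imbalance_ratio`) and the toy labels are hypothetical and are not taken from the paper; the ratio is only one of many possible imbalance measures.

```python
from collections import Counter


def group_counts(groups):
    """Count how many samples belong to each demographic group."""
    return Counter(groups)


def imbalance_ratio(groups):
    """Return largest group size divided by smallest group size.

    A value of 1.0 means all groups have the same number of samples;
    larger values indicate a stronger sample size imbalance.
    """
    counts = group_counts(groups)
    if not counts:
        raise ValueError("groups must contain at least one label")
    return max(counts.values()) / min(counts.values())


# Hypothetical toy labels, not data from the paper.
toy_groups = ["A"] * 80 + ["B"] * 20
print(group_counts(toy_groups))
print(imbalance_ratio(toy_groups))
```

A ratio well above 1.0 would suggest that per-group accuracy should be reported separately rather than relying on a single aggregate score.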
Sample composition bias arises from differences in the distribution of demographic attributes within each group. For instance, if a group contains more male than female samples, a model trained on it may be less accurate for females in that group.
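Composition within groups can be inspected in a similar way. The sketch below assumes each sample is a dictionary with a group field and a second demographic attribute, and computes the share of each attribute value inside every group. The function name `composition_by_group`, the field names, and the toy records are hypothetical and only illustrate one reading of the definition above.

```python
from collections import Counter, defaultdict


def composition_by_group(records, group_key, attr_key):
    """Return, for each group, the proportion of each attribute value."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record[group_key]][record[attr_key]] += 1
    return {
        group: {value: n / sum(c.values()) for value, n in c.items()}
        for group, c in counts.items()
    }


# Hypothetical toy records, not data from the paper.
toy_records = (
    [{"age": "young", "gender": "male"}] * 3
    + [{"age": "young", "gender": "female"}] * 1
    + [{"age": "old", "gender": "male"}] * 2
    + [{"age": "old", "gender": "female"}] * 2
)
for group, shares in composition_by_group(toy_records, "age", "gender").items():
    print(group, shares)
```

In this toy data both age groups have the same number of samples, yet the "young" group is skewed toward male samples, which is the kind of difference a size-only check would miss.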
The authors also highlight the limitations of current approaches to addressing dataset bias and suggest future research directions to overcome these challenges. By understanding the various sources of dataset bias, machine learning practitioners can take steps to mitigate their impact and develop fairer and more accurate models.
Computer Science, Computer Vision and Pattern Recognition