Generative models are becoming more popular in production systems, but evaluating their quality is a challenge. Traditional methods rely on human ratings, which are time-consuming and expensive. This survey aims to identify objective metrics that can accurately assess the quality of generative models for vision and audio.
Objective Metrics
The authors argue that traditional training objectives such as mean squared error correlate poorly with human quality ratings. Instead, they propose perceptual metrics that exploit the structural regularities of natural signals or mimic the neurological pathways used in perception. One such metric is built on the normalized Laplacian pyramid, a multiscale image representation that models early stages of the human visual system through local luminance subtraction followed by local contrast normalization; distances computed in this domain track human quality judgments more closely than pixel-wise error.
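To make the contrast with mean squared error concrete, the following is a minimal Python sketch of a normalized-Laplacian-pyramid-style distance. It is an illustration only, not the authors' exact formulation: the filter widths, the number of levels, and the normalization constant const are placeholder assumptions, and the published metric uses carefully chosen filters and constants.

import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def laplacian_pyramid(img, levels=4, sigma=1.0):
    """Split an image into band-pass levels (a coarse approximation of a Laplacian pyramid)."""
    pyramid = []
    current = img.astype(np.float64)
    for _ in range(levels):
        low = gaussian_filter(current, sigma)
        pyramid.append(current - low)          # band-pass residual at this scale
        current = zoom(low, 0.5, order=1)      # downsample for the next, coarser level
    pyramid.append(current)                    # remaining low-pass residual
    return pyramid

def nlp_distance(x, y, levels=4, const=0.17):
    """Simplified normalized-Laplacian-pyramid distance: divide each band by a
    local amplitude estimate (contrast normalization), then average per-level RMSE.
    The constant and filters here are illustrative placeholders."""
    dist = 0.0
    for bx, by in zip(laplacian_pyramid(x, levels), laplacian_pyramid(y, levels)):
        norm_x = bx / (gaussian_filter(np.abs(bx), 1.0) + const)
        norm_y = by / (gaussian_filter(np.abs(by), 1.0) + const)
        dist += np.sqrt(np.mean((norm_x - norm_y) ** 2))
    return dist / (levels + 1)

# Example: compare a clean image with a noisy copy under plain MSE and the NLP-style distance.
rng = np.random.default_rng(0)
clean = rng.random((64, 64))
noisy = clean + 0.05 * rng.standard_normal((64, 64))
print("MSE:", np.mean((clean - noisy) ** 2))
print("NLP-style distance:", nlp_distance(clean, noisy))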
Perceptual Metrics
The authors propose perceptual metrics that better match human quality ratings. One such metric is structural similarity (SSIM), which compares local luminance, contrast, and structure between a distorted signal and its reference rather than raw pixel differences. Another is multiscale structural similarity (MS-SSIM), which applies these comparisons at several resolutions and combines them, giving a more robust assessment of image quality; a sketch of both appears below.
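As a concrete reference point, here is a minimal Python sketch of SSIM computed from its standard definition, together with a crude multiscale variant. The uniform window, the three-scale averaging, and the helper names ssim and ms_ssim are simplifications assumed for illustration; the standard formulations use an 11x11 Gaussian window and per-scale weights when combining resolutions.

import numpy as np
from scipy.ndimage import uniform_filter, zoom

def ssim(x, y, data_range=1.0, win=7, k1=0.01, k2=0.03):
    """Mean structural similarity with a simple uniform window
    (standard SSIM uses a Gaussian window; this is a simplification)."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_x, mu_y = uniform_filter(x, win), uniform_filter(y, win)
    var_x = uniform_filter(x * x, win) - mu_x ** 2
    var_y = uniform_filter(y * y, win) - mu_y ** 2
    cov_xy = uniform_filter(x * y, win) - mu_x * mu_y
    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return ssim_map.mean()

def ms_ssim(x, y, scales=3):
    """Crude multiscale SSIM: average SSIM over successively downsampled copies.
    (The published MS-SSIM weights and combines scales differently.)"""
    scores = []
    for _ in range(scales):
        scores.append(ssim(x, y))
        x, y = zoom(x, 0.5, order=1), zoom(y, 0.5, order=1)
    return float(np.mean(scores))

# Example usage on a reference image and a noisy copy.
rng = np.random.default_rng(0)
ref = rng.random((128, 128))
distorted = np.clip(ref + 0.1 * rng.standard_normal(ref.shape), 0, 1)
print("SSIM:", ssim(ref, distorted))
print("MS-SSIM (simplified):", ms_ssim(ref, distorted))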
Acknowledgments
The authors acknowledge the support of various funding agencies and thank their colleagues for their contributions to the field.
Conclusion
In conclusion, evaluating generative models for vision and audio is crucial for today's production systems. While traditional evaluation relies on human ratings, objective metrics offer a faster and cheaper alternative. Perceptual metrics that account for the structural regularities of natural signals and the neurological pathways used in perception track human quality judgments far better than training objectives such as mean squared error, and therefore give a more accurate picture of generative model quality.