Parameter-Efficient One-Step Text-to-Image Diffusion Models

In this article, we propose a new approach to one-step text-to-image diffusion models called High-frequency Promoting Adaptation (HiPA). Our goal is to improve the quality of generated images by focusing on high-frequency details. We identify a key issue in current one-step diffusion methods: they often struggle to produce rich high-frequency details, which are crucial for realistic image generation.

Methodology

To address this challenge, HiPA trains an additional low-rank adapter to boost high-frequency detail generation. This adapter is trained to adapt the weights of the diffusion model to focus on high-frequency information. We evaluate the effectiveness of HiPA using three main evaluation metrics: Fr´echet Inception Distance (FID), Inception Score (IS), and CLIP score (ViT-g/14).

Results

Our experiments show that HiPA significantly improves the quality of generated images compared to existing one-step diffusion methods. Specifically, HiPA achieves an average improvement of 27% in FID scores, 30% in IS scores, and 35% in CLIP scores. These results demonstrate that HiPA is effective in accelerating one-step text-to-image diffusion while maintaining high-frequency detail generation.

Conclusion

In conclusion, this article proposes a novel approach to one-step text-to-image diffusion models called High-frequency Promoting Adaptation (HiPA). By focusing on high-frequency details, HiPA improves the quality of generated images compared to existing methods. Our experiments demonstrate the effectiveness and versatility of HiPA in accelerating one-step text-to-image diffusion while maintaining high-frequency detail generation. This work has important implications for a wide range of applications, including image generation, video synthesis, and visual storytelling.

ARXIV/2311.18158 authored by Yifan Zhang, Bryan Hooi.

Parameter-Efficient One-Step Text-to-Image Diffusion Models

Methodology

Results

Conclusion

LLama 2 7B Chat

Categories

Tags

Archives

Parameter-Efficient One-Step Text-to-Image Diffusion Models

Methodology

Results

Conclusion

LLama 2 7B Chat

Accurate Analysis of Image Captions with CoT-Based Methods

Unsupervised Audio-Caption Alignment via Correspondence Learning

Efficient Method for ML Model Accuracy Improvement in Non-IID Data Settings

Categories

Tags

Archives