Bridging the gap between complex scientific research and the curious minds eager to explore it.

Computer Science, Computer Vision and Pattern Recognition

Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

In this article, we propose a novel approach for retrieving Metaverses based on their textual descriptions. Our method involves two main phases: first, we split each Metaverse description into individual sentences, using periods as delimiters; then, we obtain a contextual representation of the full description with a neural sequence model. Next, we create a painting description using a template together with information manually adapted from ChatGPT. Finally, we combine the scenario and painting descriptions into a single final description.
We evaluate our approach on several benchmark datasets, where it achieves promising results. The proposed method can serve various applications, such as virtual reality, industrial training, and predictive maintenance. By leveraging textual descriptions, our approach offers a language-based solution for Metaverse retrieval, making it more accessible and user-friendly.
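The preprocessing pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the period-based sentence splitting follows the article, while the painting template, its field names, and the combination step are hypothetical placeholders (the article states the template content is manually adapted from ChatGPT, and the contextual representation comes from a neural sequence model not shown here).

```python
from typing import List

def split_sentences(description: str) -> List[str]:
    """Split a Metaverse description into sentences using periods
    as delimiters, as described in the article."""
    return [s.strip() for s in description.split(".") if s.strip()]

# Hypothetical painting template; the article's actual template is
# manually adapted from ChatGPT output and is not reproduced here.
PAINTING_TEMPLATE = "A painting of {subject}, in {style} style."

def build_final_description(scenario: str, subject: str, style: str) -> str:
    """Combine the scenario description with a templated painting
    description to form the final retrieval query."""
    painting = PAINTING_TEMPLATE.format(subject=subject, style=style)
    return f"{scenario} {painting}"

sentences = split_sentences("A medieval castle on a hill. Dragons circle overhead.")
query = build_final_description(
    "A medieval castle on a hill with dragons circling overhead.",
    "a castle at dusk",
    "fantasy",
)
```

In the full method, each sentence (and the combined description) would additionally be encoded by a neural sequence model to obtain the contextual representation used for retrieval.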

Key takeaways

  • We propose a novel approach to retrieve Metaverses based on their textual descriptions.
  • Our method involves two main phases: sentence separation and contextual representation.
  • We create a painting description using a template together with information manually adapted from ChatGPT.
  • We evaluate our approach on several benchmark datasets and achieve promising results.
  • Our proposed method can be used for various applications such as virtual reality, industrial training, and predictive maintenance.
  • By leveraging textual descriptions, our approach offers a language-based solution for Metaverse retrieval, making it more accessible and user-friendly.