Bridging the gap between complex scientific research and the curious minds eager to explore it.

Computer Science, Computer Vision and Pattern Recognition

Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

In this article, we propose a novel approach for retrieving Metaverses based on their textual descriptions. Our method involves two main phases: first, we split each Metaverse description into individual sentences, using periods as delimiters; then, we obtain a contextual representation of the full description with a neural sequence model. Next, we create a painting description using a template together with information manually adapted from ChatGPT. Finally, we combine the scenario and painting descriptions into a single final description.
We evaluate our approach on several benchmark datasets, where it achieves promising results. The proposed method can serve various applications, such as virtual reality, industrial training, and predictive maintenance. By leveraging textual descriptions, our approach offers a language-based solution for Metaverse retrieval, making it more accessible and user-friendly.
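The preprocessing pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the period-based sentence splitting follows the article, while the painting template, its field names, and the combination step are hypothetical placeholders (the article states the template content is manually adapted from ChatGPT, and the contextual representation comes from a neural sequence model not shown here).

```python
from typing import List

def split_sentences(description: str) -> List[str]:
    """Split a Metaverse description into sentences using periods
    as delimiters, as described in the article."""
    return [s.strip() for s in description.split(".") if s.strip()]

# Hypothetical painting template; the article's actual template is
# manually adapted from ChatGPT output and is not reproduced here.
PAINTING_TEMPLATE = "A painting of {subject}, in {style} style."

def build_final_description(scenario: str, subject: str, style: str) -> str:
    """Combine the scenario description with a templated painting
    description to form the final retrieval query."""
    painting = PAINTING_TEMPLATE.format(subject=subject, style=style)
    return f"{scenario} {painting}"

sentences = split_sentences("A medieval castle on a hill. Dragons circle overhead.")
query = build_final_description(
    "A medieval castle on a hill with dragons circling overhead.",
    "a castle at dusk",
    "fantasy",
)
```

In the full method, each sentence (and the combined description) would additionally be encoded by a neural sequence model to obtain the contextual representation used for retrieval.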

Key takeaways

  • We propose a novel approach to retrieve Metaverses based on their textual descriptions.
  • Our method involves two main phases: sentence separation and contextual representation.
  • We create a painting description using a template together with information manually adapted from ChatGPT.
  • We evaluate our approach on several benchmark datasets and achieve promising results.
  • Our proposed method can be used for various applications such as virtual reality, industrial training, and predictive maintenance.
  • By leveraging textual descriptions, our approach offers a language-based solution for Metaverse retrieval, making it more accessible and user-friendly.