Hierarchical Prompts for Zero-Shot 3D Shape Recognition with CLIP

To further refine the selected views, the authors employ hierarchical prompts powered by large language models (LLMs). These prompts are designed to evaluate the semantic representation of each view and to produce hand-crafted prompts containing the categories and their textual features. The prediction for each view is calculated separately using the generated prompts, and the entropy of each view's logits (logits_i) is computed as before.
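
The per-view entropy step described above can be illustrated with a minimal sketch. It assumes each view already has a vector of class logits (for example, CLIP image-text similarities), that lower entropy means a more confident view, and that the selected views are combined by averaging their probabilities. The function names, the choice of `k`, and the averaging rule are illustrative assumptions, not details confirmed by the paper.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_views(view_logits, k):
    """Return the indices of the k views with the lowest prediction entropy.

    Hypothetical helper: the selection rule and k are assumptions for illustration.
    """
    scored = sorted((entropy(softmax(l)), i) for i, l in enumerate(view_logits))
    return [i for _, i in scored[:k]]


def aggregate(view_logits, selected):
    """Average the probabilities of the selected views and return the top class index."""
    n_classes = len(view_logits[0])
    mean_probs = [0.0] * n_classes
    for i in selected:
        probs = softmax(view_logits[i])
        for c in range(n_classes):
            mean_probs[c] += probs[c] / len(selected)
    return max(range(n_classes), key=lambda c: mean_probs[c])


# Three views, three candidate classes (made-up logits for illustration).
view_logits = [
    [5.0, 0.0, 0.0],  # confident about class 0
    [1.0, 1.0, 1.0],  # uninformative view, maximum entropy
    [0.0, 4.0, 0.0],  # fairly confident about class 1
]
selected = select_views(view_logits, k=2)
predicted = aggregate(view_logits, selected)
```

In this toy input the uninformative second view is dropped because its uniform prediction has the highest entropy, so only the two confident views contribute to the final decision.
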
The proposed method is evaluated on several benchmark datasets, and the results demonstrate its effectiveness in improving the accuracy of zero-shot 3D shape recognition compared to existing methods. The authors also provide a detailed analysis of the selected views, which reveals that they are more diverse and informative than those obtained using traditional methods.
In summary, the article presents a novel approach to zero-shot 3D shape recognition that leverages both CLIP and hierarchical prompts to improve prediction accuracy and the diversity of the selected views. The proposed method has implications for applications such as robotics, autonomous driving, and human-computer interaction.

arXiv:2311.18402, authored by Dan Song, Xinwei Fu, Weizhi Nie, Wenhui Li, Anan Liu.

LLama 2 7B Chat
