A New Approach to Composed Image Retrieval: Leveraging Language Modeling for Global-Level Pseudo Target Captions

Limitations and Future Work in Composed Image Retrieval

In this study, we examined the limitations of current approaches to composed image retrieval and identified potential areas for future research. We employed a two-step approach that combines image captioning with Large Language Model (LLM) generation to address the problem of selecting relevant reference information. However, we found that not all words describing the reference images have direct counterparts in the target images, particularly in cases involving negation or certain descriptions.
To overcome these limitations, future research could focus on developing more sophisticated methods for selecting reference information, such as using multi-modal input or incorporating domain-specific knowledge. Another promising avenue is exploring alternative approaches to composed image retrieval, such as leveraging graph-based models or exploiting the hierarchical structure of images.
In summary, while our study makes progress on composed image retrieval, many opportunities remain for advancing the field. Addressing these limitations and exploring new directions could improve the accuracy and efficiency of composed image retrieval systems.

arXiv:2312.08924, authored by Shitong Sun, Fanghua Ye, and Shaogang Gong.

Llama 2 7B Chat
