Computer Science, Computer Vision and Pattern Recognition

A New Approach to Composed Image Retrieval: Leveraging Language Modeling for Global-Level Pseudo Target Captions

Limitations and Future Work in Composed Image Retrieval

In this study, we examined the limitations of current approaches to composed image retrieval and identified areas for future research. Our two-step approach combines captioning with Large Language Model (LLM) generation to address the problem of selecting relevant reference information. However, we found that not all words describing the reference image have direct counterparts in the target image, particularly in cases of negation or certain descriptions.
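As a rough illustration of the two-step idea, the sketch below captions a reference image and then asks a language model to merge that caption with the modification text. It is a minimal sketch, not the study's implementation: the step order, the prompt wording, and the names `compose_target_caption`, `toy_captioner`, and `toy_llm` are all assumptions made for illustration, and the two toy functions are simple stand-ins for a real captioning model and a real LLM.

```python
from typing import Callable


def compose_target_caption(
    reference_image: str,
    modification_text: str,
    caption_fn: Callable[[str], str],
    llm_fn: Callable[[str], str],
) -> str:
    """Hypothetical two-step pipeline: caption first, then LLM generation."""
    # Step 1 (assumed): describe the reference image in words.
    reference_caption = caption_fn(reference_image)
    # Step 2 (assumed): ask a language model to merge caption and modification.
    prompt = (
        "Reference caption: " + reference_caption + "\n"
        "Modification: " + modification_text + "\n"
        "Describe the target image in one sentence."
    )
    return llm_fn(prompt)


def toy_captioner(image_path: str) -> str:
    """Stand-in for a captioning model; always returns a fixed caption."""
    return "a brown dog sitting on a red sofa"


def toy_llm(prompt: str) -> str:
    """Stand-in for an LLM; only understands 'replace X with Y'."""
    fields = dict(
        line.split(": ", 1) for line in prompt.splitlines() if ": " in line
    )
    caption = fields["Reference caption"]
    modification = fields["Modification"]
    parts = modification.split(" with ")
    if modification.startswith("replace ") and len(parts) == 2:
        old_word = parts[0][len("replace "):]
        return caption.replace(old_word, parts[1])
    # Fallback: naively append the modification to the caption.
    return caption + ", " + modification


replaced = compose_target_caption(
    "dog.jpg", "replace sofa with bench", toy_captioner, toy_llm
)
negated = compose_target_caption("dog.jpg", "no sofa", toy_captioner, toy_llm)
print(replaced)
print(negated)
```

With the toy stand-ins, the negated request leaves the word "sofa" in the generated caption even though the target image should not contain one, which is the kind of mismatch between reference words and target content described above.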
To overcome these limitations, future research could focus on developing more sophisticated methods for selecting reference information, such as using multi-modal input or incorporating domain-specific knowledge. Another promising avenue is exploring alternative approaches to composed image retrieval, such as leveraging graph-based models or exploiting the hierarchical structure of images.
In summary, our study improves composed image retrieval, but open problems remain. Addressing these limitations and exploring new directions could improve both the accuracy and the efficiency of composed image retrieval systems.