Computer Science, Computer Vision and Pattern Recognition

Refinement Network for Improved Image Categorization in Minimally Invasive Surgery Videos

In this article, we propose a weakly semi-supervised learning approach to object detection. Our method starts from the pseudo-labels produced by pre-trained networks and refines their category labels with a dedicated refinement network. We compare our approach with other supervision methods and demonstrate its effectiveness in accurately classifying instances.
The traditional approach to object detection trains a model on fully annotated images, and such annotations are time-consuming and expensive to obtain. Our method addresses this challenge by also using weakly annotated images, where only the category of each proposal is labeled. Refining the category labels with the refinement network improves the accuracy of object detection.
To achieve this, we design a refinement network that takes the pseudo-labels and visual features from an intermediate RoI Align layer in Faster R-CNN as input. The refinement network is composed of two MLP encoders and five transformer encoders, which perform tool interaction reasoning using self-attention. The last MLP layer predicts the category of each proposal.
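The data flow described above can be illustrated with a deliberately small, untrained sketch in plain Python. It is not the authors' implementation. The article does not say how the two encoder outputs are combined, so the element-wise sum used here is an assumption. The attention layer is a simplified single-head version without the learned projections, layer normalisation, or feed-forward block of a full transformer encoder. The sizes `NUM_CLASSES`, `FEAT_DIM`, and `HIDDEN` and all function names are hypothetical.

```python
import math
import random

NUM_CLASSES = 4   # hypothetical number of tool categories
FEAT_DIM = 8      # hypothetical RoI feature size (real RoI Align features are much larger)
HIDDEN = 6        # hypothetical token width
NUM_LAYERS = 5    # five encoder layers, as stated in the article


def make_linear(rng, n_in, n_out):
    """Create random weights and zero biases for an untrained linear layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    """Apply y = W x + b to a single feature vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention; every proposal attends to all proposals."""
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) for key in tokens]
        weights = softmax(scores)
        attended.append([sum(w * tok[j] for w, tok in zip(weights, tokens)) for j in range(dim)])
    return attended


def refine(roi_features, pseudo_labels, seed=0):
    """Return one refined category index per proposal (random weights, illustration only)."""
    rng = random.Random(seed)
    feature_encoder = make_linear(rng, FEAT_DIM, HIDDEN)    # encoder 1: visual features
    label_encoder = make_linear(rng, NUM_CLASSES, HIDDEN)   # encoder 2: pseudo-label category
    head = make_linear(rng, HIDDEN, NUM_CLASSES)            # final layer: category scores

    tokens = []
    for feature, label in zip(roi_features, pseudo_labels):
        one_hot = [1.0 if c == label else 0.0 for c in range(NUM_CLASSES)]
        visual = relu(linear(feature, feature_encoder))
        semantic = relu(linear(one_hot, label_encoder))
        tokens.append([a + b for a, b in zip(visual, semantic)])  # assumption: sum the encodings

    for _ in range(NUM_LAYERS):
        attended = self_attention(tokens)
        # Residual connection around each attention layer.
        tokens = [[t + a for t, a in zip(tok, att)] for tok, att in zip(tokens, attended)]

    logits = [linear(tok, head) for tok in tokens]
    return [max(range(NUM_CLASSES), key=lambda c: row[c]) for row in logits]


# Three proposals from one frame, with random stand-in features and pseudo-labels 0, 2, 2.
_rng = random.Random(1)
features = [[_rng.random() for _ in range(FEAT_DIM)] for _ in range(3)]
refined = refine(features, pseudo_labels=[0, 2, 2])
```

Because the weights are random, the returned categories carry no meaning. The sketch only shows that each proposal's prediction depends on every other proposal in the frame through the attention step, which is what allows reasoning about interactions between tools.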
Our approach differs from traditional supervised learning methods in that we use both weakly annotated images and pseudo-labels to train the student network. We compare our method with supervised learning, where only fully annotated images are used, and semi-supervised learning, where both fully annotated images and pseudo-labels are used without refinement. Our experiments show that our method outperforms both baselines in terms of accuracy.
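The difference between the three training regimes compared above comes down to which labels reach the student network. The following sketch makes that explicit under stated assumptions: the `Annotation` record, the regime names, the `refine_category` callable (a stand-in for the refinement network), and the example category names are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    image_id: str
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    category: str
    source: str  # "human" for full annotations, "pseudo" for pseudo-labels


def build_training_set(
    regime: str,
    full: List[Annotation],
    pseudo: List[Annotation],
    refine_category: Optional[Callable[[Annotation], str]] = None,
) -> List[Annotation]:
    """Select the labels used to train the student under each supervision regime."""
    if regime == "supervised":
        # Only fully annotated images.
        return list(full)
    if regime == "semi":
        # Fully annotated images plus pseudo-labels, used without refinement.
        return list(full) + list(pseudo)
    if regime == "weakly_semi":
        # Fully annotated images plus pseudo-labels whose categories were refined.
        if refine_category is None:
            raise ValueError("weakly_semi requires a refine_category callable")
        refined = [replace(ann, category=refine_category(ann)) for ann in pseudo]
        return list(full) + refined
    raise ValueError(f"unknown regime: {regime}")


# Hypothetical example data; the category names are placeholders.
full = [Annotation("img_001", (10, 20, 110, 220), "grasper", "human")]
pseudo = [Annotation("img_002", (30, 40, 90, 160), "scissors", "pseudo")]


def toy_refiner(ann: Annotation) -> str:
    """Stand-in for the refinement network: always corrects the category to 'grasper'."""
    return "grasper"


student_data = build_training_set("weakly_semi", full, pseudo, toy_refiner)
```

In this sketch the refinement step changes only the category of a pseudo-label and keeps its box untouched, which matches the article's description of refining category labels.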
In summary, we propose a weakly semi-supervised object detection framework that leverages pre-trained networks and refines category labels with a refinement network. By comparing our approach with other supervision methods, we demonstrate its effectiveness in accurately classifying instances. Our method has the potential to significantly reduce the time and cost of producing fully annotated images, making object detection practical for a wider range of applications.