Scene Text Removal and Editing: A Comprehensive Survey

This article examines how varying the attention threshold affects the performance of a scene text recognition method called PRM (Polygon Rectification Module). PRM adjusts the boundary points of a polygon to refine the recognition of text with different aspect ratios. In experiments with several attention thresholds, the authors found that raising the threshold improved performance for some types of text, while lowering it gave better results for others.
The authors also introduce a module called Polygon Generation, which generates polygons from the coarse anchors obtained from an off-the-shelf text recognition model. The module shifts the boundary points horizontally so that the polygon fits the text more precisely.
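As a rough illustration of turning a coarse anchor into a polygon whose boundary points can be shifted horizontally, here is a minimal sketch. It is not the authors' implementation: the function names `box_to_polygon` and `shift_horizontally`, the number of points per side, and the offsets are all assumptions made for this example, and the summary does not say how the offsets are actually computed.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]


def box_to_polygon(box: Box, points_per_side: int = 4) -> List[Point]:
    """Sample boundary points along the top and bottom edges of an anchor box.

    Points run left to right along the top edge, then right to left along
    the bottom edge, so the result is a closed polygon in clockwise order
    (image coordinates, y pointing down).
    """
    if points_per_side < 2:
        raise ValueError("need at least two points per side")
    x_min, y_min, x_max, y_max = box
    xs = [x_min + (x_max - x_min) * i / (points_per_side - 1)
          for i in range(points_per_side)]
    top = [(x, y_min) for x in xs]
    bottom = [(x, y_max) for x in reversed(xs)]
    return top + bottom


def shift_horizontally(polygon: Sequence[Point],
                       offsets: Sequence[float]) -> List[Point]:
    """Move each boundary point along the x axis only (hypothetical offsets)."""
    if len(polygon) != len(offsets):
        raise ValueError("one offset per boundary point is required")
    return [(x + dx, y) for (x, y), dx in zip(polygon, offsets)]
```

For example, `box_to_polygon((0.0, 0.0, 90.0, 30.0), 4)` yields eight points, and passing a list of eight offsets to `shift_horizontally` moves each point left or right while leaving its vertical position unchanged.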
To accommodate variations in text shape, the authors preset several default anchors (extra-long, large, normal, and short), each in multiple sizes to cover different text instances. By selecting the anchor with the highest recognition confidence, the PRM method can accurately recognize scene text with varying aspect ratios.
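The anchor selection step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's code: the anchor sizes in `DEFAULT_ANCHORS` are invented placeholders (the summary names the four groups but not their dimensions), `select_best_anchor` is a hypothetical helper, and `toy_recognizer` merely stands in for an off-the-shelf recognizer that returns a confidence score for a cropped region.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

# Hypothetical (width, height) presets in pixels. The source names the
# four anchor groups but does not give their actual sizes.
DEFAULT_ANCHORS: Dict[str, List[Tuple[float, float]]] = {
    "extra_long": [(320, 32), (480, 48)],
    "large": [(192, 96), (256, 128)],
    "normal": [(96, 32), (128, 48)],
    "short": [(32, 32), (48, 48)],
}


def anchors_at(cx: float, cy: float) -> List[Box]:
    """Place every preset anchor so that it is centred on (cx, cy)."""
    boxes = []
    for sizes in DEFAULT_ANCHORS.values():
        for w, h in sizes:
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def select_best_anchor(cx: float, cy: float,
                       recognize: Callable[[Box], float]) -> Tuple[Box, float]:
    """Return the anchor with the highest recognition confidence, and that score."""
    scored = [(recognize(box), box) for box in anchors_at(cx, cy)]
    best_conf, best_box = max(scored, key=lambda pair: pair[0])
    return best_box, best_conf


def toy_recognizer(box: Box) -> float:
    """Stand-in scorer: prefers boxes whose aspect ratio is close to 4:1."""
    x_min, y_min, x_max, y_max = box
    aspect = (x_max - x_min) / (y_max - y_min)
    return 1.0 / (1.0 + abs(aspect - 4.0))


best_box, best_conf = select_best_anchor(100.0, 50.0, toy_recognizer)
```

In practice `recognize` would crop the image to the candidate box and run the recognizer on the crop; the toy scorer here only makes the selection logic easy to follow.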
In summary, the article examines how attention thresholds affect a scene text recognition method and introduces a module that generates polygons more accurately. Varying the threshold can improve or degrade performance depending on the type of text, and choosing the preset anchor with the highest recognition confidence lets the method handle scene text with varying aspect ratios.

arXiv:2312.13778, authored by Linger Deng, Mingxin Huang, Xudong Xie, Yuliang Liu, Lianwen Jin, Xiang Bai.

LLama 2 7B Chat