This article examines how varying the attention threshold affects the performance of a scene text recognition method called PRM (Polygon Rectification Module). PRM adjusts the boundary points of a polygon to refine recognition of text with different aspect ratios. In experiments across several attention thresholds, the authors found that the effect depends on the type of text: raising the threshold improved performance for some types, while lowering it worked better for others.
The authors also introduce a new module called Polygon Generation, which generates polygons from the coarse anchors obtained from an off-the-shelf text recognition model. This module shifts the boundary points horizontally to refine each polygon so that it accurately encloses the text.
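The horizontal adjustment can be illustrated with a minimal sketch. The summary does not say how the attention threshold and the horizontal shift are connected, so the following is an assumption made purely for illustration: a per-column attention profile is thresholded relative to its peak, and the polygon is rescaled horizontally to span the columns that remain. The function names `attention_extent` and `rescale_polygon_x`, and the assumption that polygon x-coordinates share the coordinate system of the attention columns, are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def attention_extent(col_attention: Sequence[float], threshold: float) -> Tuple[int, int]:
    """Return the first and last column whose peak-normalized attention
    is at least `threshold` (hypothetical helper, not the authors' code)."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    peak = max(col_attention)
    if peak <= 0:
        raise ValueError("attention profile has no positive values")
    active = [i for i, a in enumerate(col_attention) if a / peak >= threshold]
    # The peak column always qualifies, so `active` is never empty here.
    return active[0], active[-1]


def rescale_polygon_x(polygon: Sequence[Point], new_left: float, new_right: float) -> List[Point]:
    """Linearly remap x-coordinates so the polygon spans [new_left, new_right].
    The y-coordinates are left unchanged (horizontal adjustment only)."""
    xs = [x for x, _ in polygon]
    old_left, old_right = min(xs), max(xs)
    if old_right == old_left:
        raise ValueError("polygon has zero width")
    span_old = old_right - old_left
    span_new = new_right - new_left
    return [(new_left + (x - old_left) * span_new / span_old, y) for x, y in polygon]


if __name__ == "__main__":
    attention = [0.0, 0.1, 0.6, 1.0, 0.8, 0.2, 0.0]
    coarse = [(0.0, 0.0), (6.0, 0.0), (6.0, 2.0), (0.0, 2.0)]
    left, right = attention_extent(attention, threshold=0.5)
    print(rescale_polygon_x(coarse, left, right))
```

Under this assumption, a higher threshold keeps fewer columns and yields a tighter polygon, while a lower threshold yields a wider one, which is one plausible reason the best threshold could differ between text types.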
To accommodate variations in text shape, the authors preset several kinds of default anchors (extra-long, large, normal, and short), each in multiple sizes, to cover different text instances. By selecting the anchor that yields the highest recognition confidence, the PRM method can accurately recognize scene text with varying aspect ratios.
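The anchor selection step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the aspect ratios, base height, and scale factors are invented placeholder values, and `recognize` stands in for any off-the-shelf recognizer that returns a text string and a confidence score for a given anchor.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Anchor:
    kind: str      # "extra-long", "large", "normal", or "short"
    width: float
    height: float


def make_default_anchors(
    base_height: float = 32.0,
    scales: Sequence[float] = (0.75, 1.0, 1.25),
) -> List[Anchor]:
    """Build every anchor kind at several sizes.
    The width/height ratios below are assumed values for illustration."""
    ratios = {"extra-long": 12.0, "large": 6.0, "normal": 3.0, "short": 1.0}
    anchors = []
    for kind, ratio in ratios.items():
        for scale in scales:
            height = base_height * scale
            anchors.append(Anchor(kind, ratio * height, height))
    return anchors


def select_best_anchor(
    anchors: Sequence[Anchor],
    recognize: Callable[[Anchor], Tuple[str, float]],
) -> Tuple[Anchor, str, float]:
    """Run the recognizer on each anchor and keep the most confident result.
    Ties are resolved in favor of the earliest anchor."""
    if not anchors:
        raise ValueError("at least one anchor is required")
    best = None
    for anchor in anchors:
        text, confidence = recognize(anchor)
        if best is None or confidence > best[2]:
            best = (anchor, text, confidence)
    return best


if __name__ == "__main__":
    # Stand-in recognizer: most confident when the aspect ratio is near 6.
    def fake_recognize(anchor: Anchor) -> Tuple[str, float]:
        ratio = anchor.width / anchor.height
        return "text", 1.0 / (1.0 + abs(ratio - 6.0))

    anchor, text, confidence = select_best_anchor(make_default_anchors(), fake_recognize)
    print(anchor.kind, text, confidence)
```

Passing the recognizer in as a callable keeps the selection logic independent of any particular recognition model, which matches the summary's description of the recognizer as an off-the-shelf component.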
In summary, the article shows that the attention threshold can either improve or degrade the performance of a scene text recognition model depending on the type of text, and it introduces the Polygon Generation module to produce more accurate polygons. Combined with confidence-based selection among the preset anchors, this allows PRM to handle scene text across a wide range of aspect ratios.
Computer Science, Computer Vision and Pattern Recognition