In this paper, the authors propose "DocTr," a new method for improving the quality of scanned documents. Scanned documents can be distorted or warped by the way they were scanned or by the condition of the original document, which makes the text harder to read or recognize.
The DocTr method addresses this problem by using a deep learning model to transform the distorted image of the document into a flatter, more rectangular version that is easier to read. The model is trained on a large dataset of scanned documents and can handle various types of deformations, including those caused by non-uniform scaling, rotation, and perspective projection.
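To make the idea of flattening a distorted image concrete, the following is a minimal sketch of the resampling step only, assuming the model has already predicted a backward map that gives, for every output pixel, the coordinates to read from in the distorted input. The names `unwarp` and `identity_map` are hypothetical, and nearest-neighbour sampling is used for brevity; this is not the authors' implementation.

```python
import numpy as np

def unwarp(image, backward_map):
    """Resample `image` with a per-pixel backward map (illustrative only).

    backward_map[y, x] = (src_y, src_x): where output pixel (y, x)
    should be read from in the distorted input image.
    """
    h, w = image.shape[:2]
    src = np.rint(backward_map).astype(int)   # nearest-neighbour lookup
    src_y = np.clip(src[..., 0], 0, h - 1)    # keep indices inside the image
    src_x = np.clip(src[..., 1], 0, w - 1)
    return image[src_y, src_x]

def identity_map(h, w):
    """Backward map that leaves the image unchanged."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return np.stack([ys, xs], axis=-1).astype(float)

# Usage: a map that mirrors the image horizontally.
image = np.arange(12).reshape(3, 4)
flip = identity_map(3, 4)
flip[..., 1] = 3 - flip[..., 1]
flipped = unwarp(image, flip)
```

In a learned dewarping pipeline the map would come from the network rather than being built by hand, and bilinear interpolation would usually replace the nearest-neighbour lookup.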
The key innovation of DocTr is that it integrates geometric unwarping and illumination correction into a single model. The model can therefore improve the geometry and the lighting of a document at the same time, resulting in higher-quality output. The authors also propose a new loss function called "Polar-Doc-IOU" that helps the model learn to predict the contour points and mapping points accurately, leading to more accurate dewarping results.
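The summary does not give the formula for "Polar-Doc-IOU." As an illustration of the general idea behind an IoU-style loss on polar contour points, the sketch below assumes a contour described by ray lengths measured from a common center at fixed angles, and approximates IoU as the ratio of summed minimum to summed maximum ray lengths. The function name `polar_iou_loss` and this particular formulation are assumptions, not the authors' definition.

```python
import numpy as np

def polar_iou_loss(pred_r, target_r, eps=1e-8):
    """Generic polar IoU-style loss on ray lengths (an assumed formulation).

    pred_r, target_r: positive ray lengths from a shared center,
    sampled at the same set of angles.
    """
    pred_r = np.asarray(pred_r, dtype=float)
    target_r = np.asarray(target_r, dtype=float)
    inter = np.minimum(pred_r, target_r).sum()   # overlap along each ray
    union = np.maximum(pred_r, target_r).sum()   # extent along each ray
    # The loss is zero when the contours match and grows as they diverge.
    return float(np.log((union + eps) / (inter + eps)))

# Usage: a perfect prediction versus one that is too short on every ray.
perfect = polar_iou_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
too_short = polar_iou_loss([0.5, 1.0, 1.5], [1.0, 2.0, 3.0])
```

A loss of this kind treats all rays of a contour jointly, instead of penalizing each point's error independently.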
To implement this method, the authors use a one-stage regression framework that combines both segmentation and dewarping tasks into a single optimization process. This allows for more efficient optimization and eliminates the need for two-stage models that require separate optimization stages for each task.
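One way to picture a single optimization process covering both tasks is a combined objective that sums a segmentation term and a point-regression term, so that one backward pass updates the whole model. The sketch below assumes binary cross-entropy for the document mask and a mean absolute error for the mapping points; the name `joint_loss`, the choice of terms, and the `weight` parameter are all hypothetical and are not taken from the paper.

```python
import numpy as np

def joint_loss(mask_prob, mask_true, points_pred, points_true, weight=1.0):
    """Single objective combining segmentation and dewarping terms (a sketch)."""
    mask_prob = np.clip(np.asarray(mask_prob, dtype=float), 1e-7, 1 - 1e-7)
    mask_true = np.asarray(mask_true, dtype=float)
    # Segmentation term: binary cross-entropy on the document mask.
    seg = -np.mean(mask_true * np.log(mask_prob)
                   + (1 - mask_true) * np.log(1 - mask_prob))
    # Dewarping term: mean absolute error on the predicted mapping points.
    diff = np.asarray(points_pred, dtype=float) - np.asarray(points_true, dtype=float)
    reg = np.mean(np.abs(diff))
    return float(seg + weight * reg)

# Usage: the same mask prediction with exact and with shifted mapping points.
exact = joint_loss([0.9, 0.1], [1, 0], [[0.0, 0.0]], [[0.0, 0.0]])
shifted = joint_loss([0.9, 0.1], [1, 0], [[2.0, 2.0]], [[0.0, 0.0]], weight=0.5)
```

A two-stage design would instead minimize the two terms in separate training phases, typically with separate networks.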
The authors demonstrate the effectiveness of DocTr on several benchmark datasets, achieving state-of-the-art performance in terms of both geometric unwarping and illumination correction. They also show that their method outperforms existing two-stage models that use a separate optimization stage for segmentation, highlighting the advantages of using a one-stage approach.
In summary, DocTr is a deep learning model that improves the quality of scanned documents by transforming them into a flatter, more rectangular version. By integrating geometric unwarping and illumination correction into a single model, DocTr achieves higher accuracy and efficiency than existing methods. Its one-stage regression framework removes the need for a separate optimization stage for each task, which makes it well suited to practical applications.
Computer Science, Computer Vision and Pattern Recognition