In this article, we present a transformer-based approach to document image binarization that aims for both efficiency and accuracy. Traditional methods often struggle with images that contain large areas of white space or empty zones without text. Our method addresses this by using the transformer architecture to separate text from non-text regions.
We compare our method with state-of-the-art approaches on several benchmark datasets and report improved results in terms of peak signal-to-noise ratio (PSNR, higher is better), throughput in frames per second (FPS, higher is better), and distance reciprocal distortion (DRD, lower is better). The gains are most pronounced on complex documents that contain multiple fonts or font sizes.
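To make the PSNR metric mentioned above concrete, the following is a minimal sketch of how it is commonly computed for binarized images. It assumes both images are stored as rows of pixel values in {0, 1}, so the peak value is 1; the function name `binary_psnr` is hypothetical and this is not the evaluation code used in the article.

```python
import math
from typing import Sequence


def binary_psnr(pred: Sequence[Sequence[int]], gt: Sequence[Sequence[int]]) -> float:
    """Return the PSNR in dB between a predicted and a ground-truth binary image.

    Assumption: pixel values are 0 or 1, so the peak signal value is 1.
    """
    # Both images must have the same number of rows and the same row lengths.
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("images must have the same shape")

    n_pixels = sum(len(row) for row in gt)
    if n_pixels == 0:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixels.
    squared_error = sum(
        (p - g) ** 2
        for pred_row, gt_row in zip(pred, gt)
        for p, g in zip(pred_row, gt_row)
    )
    mse = squared_error / n_pixels

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    # PSNR = 10 * log10(peak^2 / MSE) with peak = 1.
    return 10.0 * math.log10(1.0 / mse)


if __name__ == "__main__":
    # A 10x10 all-background ground truth and a prediction with one wrong pixel.
    ground_truth = [[0] * 10 for _ in range(10)]
    prediction = [row[:] for row in ground_truth]
    prediction[0][0] = 1
    print(f"PSNR: {binary_psnr(prediction, ground_truth):.2f} dB")
```

With one mislabeled pixel out of 100, the mean squared error is 0.01, which corresponds to a PSNR of about 20 dB. Fewer mislabeled pixels give a higher value, which is why higher PSNR indicates a better binarization.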
Our key findings include:
- A novel transformer-based approach for efficient document image binarization
- Improved PSNR, FPS, and DRD compared to state-of-the-art methods on several benchmark datasets
- Effective separation of text and non-text regions, even in complex documents with multiple fonts or font sizes
By leveraging the transformer architecture, our method is a more practical choice for document image binarization from a computational perspective. Its combination of accuracy and efficiency makes it a good candidate for applications such as document analysis, data entry, and information retrieval.
In summary, this article presents a transformer-based method for efficient document image binarization that improves on existing approaches. Its ability to separate text from non-text regions makes it a useful tool for document analysis and related applications.