Efficient Document Image Binarization via Hierarchical DSN and Transformer

In this article, we present a transformer-based approach to document image binarization that aims to be both efficient and accurate. Traditional methods often struggle with images that contain large areas of white space or empty zones without text. Our method addresses this by using a transformer to separate text from non-text regions.
We compare our method with state-of-the-art approaches on several benchmark datasets and report improvements in peak signal-to-noise ratio (PSNR), throughput in frames per second (FPS), and distance reciprocal distortion (DRD). The gains are most pronounced on complex documents that mix multiple fonts or font sizes.
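
To make the PSNR metric concrete, here is a minimal sketch of how it is commonly computed for binarization output, assuming both images are binary with pixel values in {0, 1}, so the peak difference is 1. The function name `psnr_binary` and the toy images are illustrative assumptions, not code from the paper.

```python
import math

def psnr_binary(pred, gt):
    """PSNR between two binary images given as equal-sized lists of rows of 0/1."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("images must have the same shape")
    n_pixels = sum(len(row) for row in gt)
    # Mean squared error; for 0/1 images this is the fraction of mismatched pixels.
    mse = sum((p - g) ** 2
              for prow, grow in zip(pred, gt)
              for p, g in zip(prow, grow)) / n_pixels
    if mse == 0:
        return float("inf")  # identical images
    # The peak difference is 1 for binary images, so PSNR = 10 * log10(1 / MSE).
    return 10.0 * math.log10(1.0 / mse)

# Toy 4x4 ground truth and a prediction with exactly one wrong pixel.
gt = [[0, 0, 1, 1],
      [0, 0, 1, 1],
      [1, 1, 1, 1],
      [1, 1, 1, 1]]
pred = [row[:] for row in gt]
pred[0][0] = 1

print(round(psnr_binary(pred, gt), 2))  # 1 of 16 pixels wrong: 10*log10(16) = 12.04
```

A higher PSNR means the predicted binarization is closer to the ground truth. DRD, by contrast, is a distortion measure, so lower values are better.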

Our key findings include:

A novel transformer-based approach for efficient document image binarization.
Improved PSNR, FPS, and DRD compared to state-of-the-art methods on several benchmark datasets.
Effective separation of text and non-text regions, even in complex documents with multiple fonts or font sizes.

By leveraging the transformer architecture, our method is a computationally more practical choice for document image binarization. Its combination of accuracy and efficiency makes it a candidate for applications such as document analysis, data entry, and information retrieval.
In summary, the article presents a transformer-based binarization method that improves on existing approaches and separates text from non-text regions effectively.

arXiv:2312.03946, authored by Risab Biswas, Swalpa Kumar Roy, Umapada Pal.
