Biomedical entity linking (BioEL) maps entity mentions in biomedical text to entries in a knowledge base. Unlike traditional methods that rely on string matching or supervised multi-class classifiers, recent approaches fine-tune pre-trained language models to capture semantic information. The proposed model employs a bi-encoder architecture, which encodes mentions and entities into the same embedding space and links them by embedding similarity. Two-stage methods add a re-ranking model on top of retrieval to improve accuracy, and generative models generate the linked entity directly, bypassing the need for negative sample mining.
BioEL methods fall into three categories: retrieval-based methods, two-stage methods, and generative methods. Retrieval-based methods use only a bi-encoder to retrieve related entities; examples include BioSyn [7], ResCNN [8], and SapBERT [9]. Two-stage methods add a re-ranking model on top of retrieval to improve performance, such as Clustering-based [22] and Prompt-BioEL [13]. Generative methods generate the linked entity directly instead of retrieving it; examples include Gen-BioEL [14] and BioBART [15].
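To make the difference between retrieval-only and two-stage linking concrete, here is a minimal sketch of a retrieve-then-re-rank pipeline. Everything in it is an assumption made for illustration: the function names (`cheap_score`, `careful_score`, `two_stage_link`), the toy entity list, and both scorers, which are simple string heuristics standing in for a learned retriever and a learned re-ranker. It does not reproduce any of the cited systems.

```python
def char_bigrams(text):
    """Set of lowercase character bigrams in a string."""
    t = text.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)}


def cheap_score(mention, entity):
    """Stage-1 stand-in: Jaccard overlap of character bigrams (name only)."""
    a, b = char_bigrams(mention), char_bigrams(entity)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def careful_score(sentence, entity):
    """Stage-2 stand-in: fraction of entity words found in the mention's sentence."""
    context_words = set(sentence.lower().split())
    entity_words = entity.lower().split()
    return sum(w in context_words for w in entity_words) / len(entity_words)


def two_stage_link(mention, sentence, entities, top_k=2):
    # Stage 1: score every entity cheaply and keep a short candidate list.
    shortlist = sorted(entities, key=lambda e: cheap_score(mention, e),
                       reverse=True)[:top_k]
    # Stage 2: apply the costlier, context-aware scorer to the shortlist only.
    return max(shortlist, key=lambda e: careful_score(sentence, e))


# Hypothetical toy knowledge base.
ENTITIES = ["type 1 diabetes", "type 2 diabetes", "hypertension"]

if __name__ == "__main__":
    print(two_stage_link("diabetes",
                         "patient with type 2 diabetes on metformin",
                         ENTITIES))
```

In this toy case the name-only first stage cannot separate the two diabetes entries, and the second stage resolves the tie using the surrounding sentence. A retrieval-only method would stop after the first stage.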
The proposed model uses only a bi-encoder for entity linking because of its practicality and flexibility. A shared encoder, SapBERT [9], generates dense vectors for both mentions and entities. The mention embedding is formulated as the concatenation of the sentence representation and the entity name embedding.
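The retrieval step of a bi-encoder can be sketched as follows. This is an illustration under stated assumptions, not the model's implementation: `encode` is a toy hashed-trigram function standing in for a real shared encoder such as SapBERT, the entity names and the helpers `similarity` and `retrieve` are hypothetical, and only names are encoded, so the sentence-context concatenation described above is left out.

```python
import math
import zlib

DIM = 256  # toy embedding size (assumption); real encoders use larger dense vectors


def encode(text):
    """Toy stand-in for the shared encoder: hashed character trigrams, L2-normalised."""
    vec = [0.0] * DIM
    padded = "  " + text.lower() + "  "
    for i in range(len(padded) - 2):
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def similarity(a, b):
    """Dot product; equals cosine similarity because vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(mention, entity_index, top_k=2):
    """Rank pre-encoded entities by similarity to the encoded mention."""
    query = encode(mention)
    scored = [(similarity(query, emb), name) for name, emb in entity_index.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_k]]


# Entities are encoded once, with the same encoder used for mentions.
ENTITY_NAMES = ["myocardial infarction", "type 2 diabetes mellitus", "hypertension"]
ENTITY_INDEX = {name: encode(name) for name in ENTITY_NAMES}

if __name__ == "__main__":
    for name, score in retrieve("myocardial infarct", ENTITY_INDEX):
        print(f"{score:.3f}  {name}")
```

Because mentions and entities pass through the same encoder, entity vectors can be computed once and reused for every query, which is a large part of what makes the bi-encoder practical at knowledge-base scale.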
In summary, the proposed model is a bi-encoder approach to biomedical entity linking: it fine-tunes a pre-trained language model to capture semantic information and links mentions to entities by embedding similarity, a design chosen for its practicality and flexibility. It sits alongside two-stage methods, which add re-ranking to improve accuracy, and generative methods, which produce the linked entity directly.
Computation and Language, Computer Science