Detecting and Explaining Toxic Language with Fine-Grained Definitions and Context-Aware Models

The paper presents an approach to detecting toxic content in text, referred to here as "Context-Aware Toxic Language Detection" (CoT). The method uses a context tree and a context selector module to automatically select the most relevant context for each prompt, with the goal of detecting toxic language accurately.
The authors argue that traditional approaches to toxic language detection rely on hand-crafted rules or shallow learning methods, which fail to account for the complexities of natural language. In contrast, CoT represents the space of possible contexts as a hierarchical context tree and uses a context selector module to choose the most appropriate context for each prompt dynamically.
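The idea of a hierarchical context tree with a selector can be illustrated with a small sketch. This is not the authors' implementation: the `ContextNode` class, the word-overlap score, the greedy descent, and the example tree are all assumptions made for illustration, and a real selector would presumably be a learned model rather than a word-matching heuristic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContextNode:
    """One node of a hypothetical context tree: a name, a context text, and children."""
    name: str
    text: str
    children: list[ContextNode] = field(default_factory=list)


def overlap_score(prompt: str, context: str) -> float:
    """Toy relevance score (an assumption): share of context words found in the prompt."""
    prompt_words = set(prompt.lower().split())
    context_words = set(context.lower().split())
    if not context_words:
        return 0.0
    return len(prompt_words & context_words) / len(context_words)


def select_context(root: ContextNode, prompt: str) -> list[ContextNode]:
    """Greedy descent: at each level, follow the child that best matches the prompt.

    Stops when a node has no children or when no child matches at all.
    """
    path = [root]
    node = root
    while node.children:
        best = max(node.children, key=lambda child: overlap_score(prompt, child.text))
        if overlap_score(prompt, best.text) == 0.0:
            break
        path.append(best)
        node = best
    return path


# Hypothetical two-level tree; the categories and keywords are illustrative only.
tree = ContextNode("root", "", [
    ContextNode("harassment", "insult threat attack person", [
        ContextNode("threat", "threat hurt kill"),
        ContextNode("insult", "insult stupid idiot"),
    ]),
    ContextNode("hate", "hate group slur"),
])

selected = select_context(tree, "you are a stupid idiot and an insult to everyone")
print([node.name for node in selected])
```

The selected path, from general to specific, stands in for the context that would be passed to the detector along with the prompt. Swapping `overlap_score` for an embedding similarity or a trained classifier would leave the traversal unchanged.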
The authors evaluate their method on several datasets and demonstrate its effectiveness in detecting toxic language while avoiding false positives. They also show that fine-tuning the model with both labels and rationales can improve its performance, enabling it to provide rich rationales for its decisions.
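Fine-tuning on both labels and rationales amounts to training on targets that carry a decision together with its explanation. The sketch below shows one way such training records could be laid out. The prompt wording, the field names `input` and `target`, and the JSON Lines layout are assumptions for illustration, not the format used in the paper.

```python
import json


def build_example(text: str, is_toxic: bool, rationale: str) -> dict[str, str]:
    """Pair an input text with a target holding both the label and its rationale."""
    label = "toxic" if is_toxic else "non-toxic"
    return {
        "input": f"Is the following text toxic?\n{text}",
        "target": f"Label: {label}\nRationale: {rationale}",
    }


def to_jsonl(examples: list[dict[str, str]]) -> str:
    """Serialize examples as JSON Lines, one training record per line."""
    return "\n".join(json.dumps(example, ensure_ascii=False) for example in examples)


# Hypothetical records; the texts and rationales are made up for illustration.
records = [
    build_example("Thanks for the help!", False, "A polite expression of gratitude."),
    build_example("You are worthless.", True, "A direct personal insult aimed at the reader."),
]
print(to_jsonl(records))
```

Because the rationale is part of the target, a model trained on such records learns to emit an explanation alongside each label instead of a bare classification.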

Key Takeaways

CoT uses a context tree and context selector module to dynamically select the most relevant context for each prompt, improving toxic language detection accuracy.
Traditional approaches to toxic language detection are limited by their reliance on hand-crafted rules or shallow learning methods.
CoT’s hierarchical context tree structure and dynamic context selection enable it to account for the complexities of natural language.
Fine-tuning the model with both labels and rationales can further improve its performance, providing rich explanations for its decisions.

arXiv:2312.08303, authored by Jiang Zhang, Qiong Wu, Yiming Xu, Cheng Cao, Zheng Du, and Konstantinos Psounis.
