
Detecting Toxicity in Open Source Communication: A Machine Learning Approach


Open source communities are collaborative platforms where software developers work together to create and share code. However, these communities can also be breeding grounds for toxic behavior, such as insults, discrimination, and passive aggressiveness. In this article, we explore the common errors in detecting toxicity in open source discussions and provide a detailed analysis of three prompts that demonstrate effectiveness in identifying toxic comments.
We observed that overly long or convoluted task instructions can lead to misunderstandings, particularly when the model must handle complex sentences. To address this challenge, we propose balancing clarity and brevity when crafting task instructions. We also identified the importance of multi-level toxicity categories for capturing subtle forms of toxicity, such as passive aggressiveness.
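A multi-level taxonomy like the one described above could be sketched as follows. The specific level names and example behaviors are illustrative assumptions, not categories taken verbatim from the research:

```python
from enum import Enum

# Hypothetical multi-level toxicity taxonomy; the level names and the
# example behaviors in the comments are illustrative, not the article's.
class ToxicityLevel(Enum):
    NONE = 0      # no toxicity detected
    SUBTLE = 1    # e.g. passive aggressiveness, backhanded remarks
    PASSIVE = 2   # e.g. dismissiveness, stonewalling
    ACTIVE = 3    # e.g. insults, discrimination

def describe(level: ToxicityLevel) -> str:
    """Return a short human-readable description of a toxicity level."""
    descriptions = {
        ToxicityLevel.NONE: "no toxicity detected",
        ToxicityLevel.SUBTLE: "subtle toxicity (e.g. passive aggressiveness)",
        ToxicityLevel.PASSIVE: "passive toxicity (e.g. dismissiveness)",
        ToxicityLevel.ACTIVE: "active toxicity (e.g. insults)",
    }
    return descriptions[level]
```

Graded levels like these let a classifier distinguish a pointed insult from a quietly dismissive reply, rather than collapsing both into a single "toxic" label.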
The selected prompts were designed to detect different types of toxicity, including active toxicity (Prompt 1), passive toxicity (Prompt 2), and subtle toxicity (Prompt 3). Each prompt provides a scenario that elicits a response from the model, which can help identify toxic behavior. For example, Prompt 1 asks the model to determine whether a given conversation contains any toxicity, while Prompt 2 inquires about the toxicity level of a particular comment.
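The three prompt types above might be expressed as templates along these lines. The exact wording here is a paraphrase and an assumption on our part, not the authors' actual prompts:

```python
# Hypothetical prompt templates paraphrasing the three prompt types
# described in the article (active, passive, subtle toxicity).
PROMPTS = {
    "active": (
        "Does the following conversation contain any toxicity, such as "
        "insults or discrimination? Answer yes or no.\n\n{text}"
    ),
    "passive": (
        "Rate the toxicity level of the following comment from 0 (none) "
        "to 3 (severe), paying attention to dismissive or stonewalling "
        "language.\n\n{text}"
    ),
    "subtle": (
        "Does the following comment contain subtle toxicity, such as "
        "passive aggressiveness or backhanded compliments? "
        "Answer yes or no.\n\n{text}"
    ),
}

def build_prompt(kind: str, text: str) -> str:
    """Fill the chosen prompt template with the comment under review."""
    return PROMPTS[kind].format(text=text)
```

Keeping each template short and targeted at one toxicity type reflects the article's finding that simple, clear instructions outperform lengthy ones.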
Our analysis reveals that these prompts are effective in detecting toxicity due to their simplicity and clarity. By using everyday language and engaging metaphors, these prompts can help demystify complex concepts and make them more accessible to a wider audience.
In conclusion, understanding toxicity in open source communities is essential for creating a productive and inclusive environment. By leveraging the insights from this article, software developers can develop effective prompts that detect toxic behavior and promote healthy communication patterns. As the open source community continues to grow, it is crucial that we prioritize the well-being of all contributors and foster a culture of respect and inclusivity.