Addressing Threats to Validity in LLM-Based Software Engineering Research

Open-source models: Prefer open-source models over closed-source ones to allow for greater transparency and accountability.
Model interpretability: Develop techniques to explain LLM decisions, enabling users to understand the reasoning behind them.
Data privacy: Establish clear data handling policies to protect sensitive information and adhere to ethical standards.
Multiple independent prompts: Query the model with several independently written prompts, so that conclusions do not rest on the wording of a single prompt and potential bias in LLM outputs is reduced.
Metadata provision: Provide metadata on LLM-generated content, enabling users to assess its credibility and relevance.
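
The last two items can be combined in practice. The following is a minimal Python sketch, not a prescribed implementation: it sends several independently worded prompts to a model, stores metadata with every output, and reports how often the outputs agree. The `ask` callable stands in for whatever model client a study uses, and the record fields (`model_name`, `model_version`, `prompt`, `output`) are illustrative assumptions rather than a standard schema.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GenerationRecord:
    """Metadata kept alongside one LLM output (field names are hypothetical)."""
    model_name: str
    model_version: str
    prompt: str
    output: str


def query_with_prompts(
    ask: Callable[[str], str],
    prompts: Sequence[str],
    model_name: str,
    model_version: str,
) -> list[GenerationRecord]:
    """Send each independently written prompt to the model, one record per call."""
    return [
        GenerationRecord(model_name, model_version, prompt, ask(prompt))
        for prompt in prompts
    ]


def majority_answer(records: Sequence[GenerationRecord]) -> tuple[str, float]:
    """Return the most common normalized output and the share of prompts agreeing."""
    if not records:
        raise ValueError("at least one record is required")
    counts = Counter(record.output.strip().lower() for record in records)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(records)
```

A low agreement ratio suggests that the result depends on prompt wording and should be reported with that caveat. The stored records can be published with the study so that readers can see which model and prompt produced each output.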

Clearly define research objectives: Specify the research question or problem statement to ensure alignment with validity standards.
Evaluation methods: Use evaluation methods suited to LLM-based research, such as code obfuscation and multiple independent prompts.
Data privacy and security: Implement robust data protection measures and adhere to ethical guidelines when working with sensitive information.
Reproducibility: Establish protocols for replicating LLM-based research contributions, enabling the community to build upon existing work and advance the field.
Communication and collaboration: Encourage open communication between stakeholders, fostering a collective understanding of evolving practices and guidelines.

By following these guidelines, we can enhance the validity and reliability of LLM-based research contributions, ultimately advancing the field while ensuring ethical standards are upheld.
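
Code obfuscation, named above as an evaluation method, can be illustrated with a small sketch. The version below is one possible approach, assuming Python source and Python 3.9 or later (for `ast.unparse`): it renames the identifiers a snippet defines itself to opaque names, so that a model under evaluation cannot rely on familiar names. The function `obfuscate_identifiers` and the `v0`, `v1`, ... naming scheme are hypothetical choices. The sketch does not handle class names, keyword arguments at call sites, or snippets that already use names such as `v0`.

```python
import ast


class _BoundNames(ast.NodeVisitor):
    """Collect identifiers the snippet itself defines, in first-seen order."""

    def __init__(self) -> None:
        self.names: list[str] = []

    def _add(self, name: str) -> None:
        if name not in self.names:
            self.names.append(name)

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self._add(node.name)
        self.generic_visit(node)

    def visit_arg(self, node: ast.arg) -> None:
        self._add(node.arg)

    def visit_Name(self, node: ast.Name) -> None:
        # Only names that are assigned to count as defined by the snippet,
        # so builtins and imported names are left untouched.
        if isinstance(node.ctx, ast.Store):
            self._add(node.id)


class _Renamer(ast.NodeTransformer):
    """Apply a fixed old-name to new-name mapping across the tree."""

    def __init__(self, mapping: dict[str, str]) -> None:
        self.mapping = mapping

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self.mapping.get(node.id, node.id)
        return node


def obfuscate_identifiers(source: str) -> str:
    """Return source with locally defined identifiers replaced by v0, v1, ..."""
    tree = ast.parse(source)
    collector = _BoundNames()
    collector.visit(tree)
    mapping = {name: f"v{i}" for i, name in enumerate(collector.names)}
    return ast.unparse(_Renamer(mapping).visit(tree))
```

Because only names are changed, the obfuscated snippet should behave like the original, which makes it possible to compare a model's performance on both versions of the same task.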