In this article, we discuss how to adapt AI for human-AI interaction in chatbots. We describe a study in which we evaluated the effectiveness of different approaches to adapting AI agents for human evaluation. The goal was to improve the quality and relevance of the responses provided by ChatGPT, an AI language model.
Step 1: Defining the Rules (AAR/AI Steps)
In this step, we briefed the participants on the study details and explained how we would evaluate ChatGPT’s performance. We then told them that they would be given a questionnaire before and after each task to provide detailed responses.
Step 2: Explaining the Objectives of the AI Agent (AAR/AI in Empirical Context)
In this step, we introduced the participants to ChatGPT’s primary objective, which is to assist them by providing contextual, unambiguous, and correct information. We also explained that the participants would evaluate ChatGPT’s performance based on their own experiences.
Inner Loop
In this section, we discuss difficulties with the AAR/AI process. The participants found it burdensome, which resulted in sparse responses. To address this issue, we revised steps 3, 4, and 6 by using a mix of open-ended and closed-ended questions and adjusting the wording of the questions for clarity. We also reversed some items with negative connotations to counter acquiescence bias. Additionally, we omitted Guideline 3 (time services based on context) and Guideline 18 (notify users about changes) because they were not relevant in our context.
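When some questionnaire items are reversed to counter acquiescence bias, their scores usually need to be mapped back onto a common direction before they are compared or averaged. The following is a minimal sketch of that reverse-scoring step, not the study's actual analysis code. The 5-point scale, the item names (q1, q2, q3), and the function names are all hypothetical and are used only for illustration.

```python
def reverse_score(score: int, scale_min: int = 1, scale_max: int = 5) -> int:
    """Map a response to a reversed item back onto the original direction.

    The 1-to-5 default scale is an assumption; the article does not state
    which scale the questionnaire used.
    """
    if not scale_min <= score <= scale_max:
        raise ValueError(f"score {score} is outside {scale_min}..{scale_max}")
    # Standard reverse-scoring: the lowest value becomes the highest and vice versa.
    return scale_min + scale_max - score


def align_responses(
    responses: dict[str, int],
    reversed_items: set[str],
    scale_min: int = 1,
    scale_max: int = 5,
) -> dict[str, int]:
    """Return a copy of the responses with reversed items re-scored."""
    return {
        item: reverse_score(value, scale_min, scale_max)
        if item in reversed_items
        else value
        for item, value in responses.items()
    }


# Hypothetical participant answers; q2 is assumed to be a reversed item.
answers = {"q1": 4, "q2": 2, "q3": 5}
aligned = align_responses(answers, reversed_items={"q2"})
print(aligned)
```

After this step, a high score means the same thing on every item, so the responses can be aggregated without the reversed items cancelling out the others.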
Discussion
In this section, we discuss the use of generative AI (genAI) and its potential impact on software engineering. We highlight how genAI can transform software engineering for the better and how it signals the "end of programming." We also acknowledge the potential presence of desirability bias when interpreting the results.
Conclusion
In conclusion, we adapted AI for human-AI interaction in chatbots by revising the AAR/AI process to improve its effectiveness. Our study demonstrated that using a mix of open-ended and closed-ended questions and adjusting the wording of the questions can enhance the quality and relevance of responses provided by ChatGPT. Additionally, we highlighted the potential of genAI to transform software engineering for the better while acknowledging the potential presence of desirability bias.