In this paper, the authors present a comprehensive overview of their dialogue system, designed to enable robots to engage in natural and effective conversations with humans. The system comprises modules for automatic speech recognition, common ground update, response generation, voice action selection, and expression and motion control. These modules work together so that the robot can understand and respond to user input in a more human-like manner.
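The module chain above can be sketched as a simple pipeline. This is a minimal illustration, not the authors' implementation: the class, function names, and placeholder logic are all assumptions made for clarity.

```python
from dataclasses import dataclass, field

@dataclass
class DialogueState:
    """Shared state handed from module to module (hypothetical)."""
    transcript: str = ""
    common_ground: list = field(default_factory=list)
    response: str = ""

def update_common_ground(state: DialogueState) -> DialogueState:
    # Record each recognized utterance as part of the shared context.
    if state.transcript:
        state.common_ground.append(state.transcript)
    return state

def generate_response(state: DialogueState) -> DialogueState:
    # Placeholder for the LLM-backed response generator described in the paper.
    state.response = f"Acknowledged: {state.transcript}"
    return state

def run_pipeline(transcript: str) -> DialogueState:
    # ASR output -> common ground update -> response generation;
    # voice action selection and expression/motion control would then
    # consume `state.response` downstream.
    state = DialogueState(transcript=transcript)
    for module in (update_common_ground, generate_response):
        state = module(state)
    return state

state = run_pipeline("I'd like a room with a view.")
print(state.response)
```

In a real system each module would run asynchronously; the sequential loop here only shows the order in which the modules consume and enrich the shared state.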
The authors describe how the system leverages intermediate results of automatic speech recognition to improve responsiveness. They also detail how individual modules are implemented: the Web Speech API in Google Chrome for speech recognition, the Amazon Polly API for text-to-speech synthesis, and OpenAI's GPT-3.5 and GPT-4 language models for generation.
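The use of intermediate ASR results can be illustrated with a small sketch. The event shape below mimics the Web Speech API's interim/final distinction (its `isFinal` flag), but the thresholds and action names are assumptions, not the authors' actual logic:

```python
# Consume a stream of (text, is_final) ASR events. Interim hypotheses can
# trigger early reactions (e.g. a backchannel nod) before the final result
# arrives, which is one way intermediate results improve responsiveness.

def handle_asr_events(events):
    actions = []
    for text, is_final in events:
        if is_final:
            # Committed hypothesis: hand off to response generation.
            actions.append(("respond", text))
        elif len(text.split()) >= 3:
            # Interim hypothesis long enough to react to early.
            actions.append(("backchannel", text))
    return actions

events = [
    ("I would", False),
    ("I would like a", False),             # interim: triggers a backchannel
    ("I would like a single room", True),  # final: triggers a response
]
print(handle_asr_events(events))
```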
The system is designed to continue the dialogue while waiting for the user to signal understanding, incrementally building common ground and taking turns more naturally based on the user's utterances. The authors also discuss future plans to exploit multimodal information for even more natural turn-taking.
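One way to picture "dialogue while waiting for user understanding" is a grounding loop: the robot advances to the next piece of information only once the user acknowledges the current one. This is a hypothetical sketch; the acknowledgment keywords and decision rule are assumptions, not taken from the paper.

```python
# Advance through pending information items only after the user signals
# understanding; otherwise rephrase the current item and keep waiting.

ACK_PHRASES = {"yes", "okay", "got it", "i see", "sure"}

def is_grounded(user_utterance: str) -> bool:
    """Treat short acknowledgment phrases as evidence of understanding."""
    u = user_utterance.lower().strip(".!?")
    return any(ack in u for ack in ACK_PHRASES)

def next_action(pending_items, user_utterance):
    if is_grounded(user_utterance):
        # Current item is grounded: present the next one, or close.
        return ("present_next", pending_items[0]) if pending_items else ("close", None)
    return ("rephrase_current", None)

print(next_action(["breakfast hours"], "Okay, got it."))
print(next_action(["breakfast hours"], "Sorry, what was that?"))
```

A multimodal version, as the authors plan, would also feed gaze or nodding cues into `is_grounded` rather than relying on lexical acknowledgments alone.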
The work was supported by a Grant-in-Aid for Scientific Research (Grant No. JP19H05692). The authors also cite previous work in the field, including the Dialogue Robot Competition 2023 and the Language Resources and Evaluation Conference 2022.
Overall, the authors’ system represents a significant step towards enabling robots to converse with humans in a more natural and effective manner, with potential applications in various fields such as customer service, healthcare, and education.