
Computation and Language, Computer Science

The impact of responding to patient messages with large language model assistance.

This study compared the quality and readability of three kinds of responses to patient messages: responses written manually by physicians, drafts generated by the language model GPT-4, and AI-assisted responses that physicians wrote with help from GPT-4. Physician responses were shorter and more readable than the other two, with a Flesch reading ease score of 67, compared with 45 for GPT-4 drafts and 46 for AI-assisted responses (higher scores indicate easier text). GPT-4 drafts were judged helpful and safe overall, but in some cases could have led to severe harm or death if sent unedited. AI-assisted responses were closer in content to the GPT-4 drafts than the manual responses were. The results suggest that physicians write clearer and more concise responses than the AI-based alternatives, and that human oversight and editing remain necessary in medical communication.
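The readability figures above are Flesch reading ease scores, computed as 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word), so longer sentences and longer words both lower the score. The Python sketch below computes it. It is an illustration only: it assumes a crude vowel-group syllable counter and punctuation-based sentence splitting (the helper names `count_syllables` and `flesch_reading_ease` are hypothetical), so its numbers will not exactly match the tool the study used, which this summary does not name.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting runs of vowels (a heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final silent "e" as non-syllabic ("make"), but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    # Every word has at least one syllable.
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch reading ease: higher scores mean easier text."""
    # Naive sentence split on terminal punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


if __name__ == "__main__":
    print(round(flesch_reading_ease("The cat sat. The dog ran."), 2))
```

Short sentences of one-syllable words score very high, while long, polysyllabic medical phrasing scores low. On Flesch's original scale, scores around 60 to 70 are usually read as plain English and scores between 30 and 50 as difficult.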

Key Takeaways

  • Physician responses were shorter and more readable than GPT-4 or AI-assisted responses.
  • GPT-4 drafts were generally helpful and safe, but in some cases could have led to severe harm if left unedited.
  • AI-assisted responses were closer in content to the GPT-4 drafts than the manual responses were.
  • Human oversight and editing are still necessary in medical communication.