This study compared the quality and readability of responses written manually by physicians, drafted by the large language model GPT-4, and produced with an AI-assisted response generator. Physician responses were shorter and more readable than the other two, with a Flesch reading ease score of 67, versus 45 for GPT-4 and 46 for AI-assisted responses. GPT-4 drafts were judged helpful and safe overall, but in some cases could have led to severe harm or death if left unedited. AI-assisted responses were closer in content to the GPT-4 drafts than to the manual physician responses. The findings suggest that physicians still provide clearer, more concise responses than AI systems, and that human oversight and editing remain necessary in medical communication.
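The Flesch reading ease scores cited above come from a standard formula based on average sentence length and average syllables per word (higher scores mean easier text). A minimal sketch of that formula follows; the syllable counter is a crude vowel-group heuristic of my own, not the dictionary-based counting that production readability tools use, so scores will only approximate published values.

```python
import re

def count_syllables(word):
    # Crude heuristic: count groups of consecutive vowels,
    # dropping a trailing silent "e". Real tools use pronunciation
    # dictionaries, so this is only an approximation.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text):
    # Flesch reading ease:
    # 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Short, plain sentences score high; long sentences full of polysyllabic words score low, which is why the shorter physician responses scored 67 while the AI-generated text scored in the mid-40s.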
Key Takeaways
- Physician responses were shorter and more readable than GPT-4 or AI-assisted responses.
- GPT-4 drafts were overall helpful and safe but could lead to severe harm in some cases if left unedited.
- AI-assisted responses were closer in content to the GPT-4 drafts than to the manual physician responses.
- Human oversight and editing are still necessary in medical communication.