This study compared the quality and readability of responses written manually by physicians, drafted by the large language model GPT-4, and produced with an AI-assisted response generator. Physician responses were shorter and more readable than the other two, with a Flesch reading ease score of 67, versus 45 for GPT-4 and 46 for AI-assisted responses. GPT-4 drafts were judged helpful and safe overall, but in some cases could have led to severe harm or death if left unedited. AI-assisted responses were closer in content to the GPT-4 drafts than to the manual physician responses. The findings suggest that physicians still provide clearer, more concise responses than AI systems, and that human oversight and editing remain necessary in medical communication.
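The Flesch reading ease scores cited above come from a standard formula based on average sentence length and average syllables per word (higher scores mean easier text). A minimal sketch of that formula follows; the syllable counter is a crude vowel-group heuristic of my own, not the dictionary-based counting that production readability tools use, so scores will only approximate published values.

```python
import re

def count_syllables(word):
    # Crude heuristic: count groups of consecutive vowels,
    # dropping a trailing silent "e". Real tools use pronunciation
    # dictionaries, so this is only an approximation.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text):
    # Flesch reading ease:
    # 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Short, plain sentences score high; long sentences full of polysyllabic words score low, which is why the shorter physician responses scored 67 while the AI-generated text scored in the mid-40s.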
Key Takeaways
- Physician responses were shorter and more readable than GPT-4 or AI-assisted responses.
- GPT-4 drafts were overall helpful and safe but could lead to severe harm in some cases if left unedited.
- AI-assisted responses were closer in content to the GPT-4 drafts than to the manual physician responses.
- Human oversight and editing are still necessary in medical communication.