A research group at Tokyo Medical and Dental University verified the reliability of ChatGPT in self-diagnosing orthopedic diseases and found that both the accuracy and the reproducibility of its diagnoses were low. They also discovered that the correct answer rate changes depending on how the question is asked, and clarified the key points for obtaining high reliability.
The number of patients using AI-powered chatbots to self-diagnose before visiting a hospital is increasing, and this trend is expected to continue. However, although several studies have evaluated ChatGPT's correct answer rate in self-diagnosis, there have been no studies on its reproducibility or on the extent to which it recommends seeking a medical examination.
In this study, five researchers repeatedly asked ChatGPT (ver. 5) questions about five orthopedic diseases, using exactly the same text over five days, and verified the answers. The correct answer rate and reproducibility varied by disease, with the lowest correct answer rate being only 5% and reproducibility rated as "poor." Additionally, only about 5% of responses firmly recommended visiting a medical institution. Furthermore, the researchers found that the correct answer rate varied depending on how the question was phrased, and they suggested a preferable question format.
This study highlighted problems in the medical use of ChatGPT. The findings are expected to improve the safety of generative AI as a self-diagnosis tool and to contribute substantially to the development of new generative AI systems for medical assistance. In the future, the group plans to explore appropriate questioning methods for each disease state, conduct research using generative AI other than ChatGPT as well as newer versions of ChatGPT, and evaluate their reliability.
Paper information: [Journal of Medical Internet Research] The potential of ChatGPT as a self-diagnostic tool in common orthopedic diseases: An exploratory study