Tohoku University finds unexpectedly large gap between medical image diagnosis AI and medical findings

A research group led by Assistant Professor Soh Yibun of Tohoku University Graduate School compared two sets of image regions. The first set was the areas a deep learning model focused on; this model had achieved high performance in previous research. The second set was the areas that doctors consider important for diagnosis. The group found that 30% to 80% of the model's focus areas were unrelated to medically important areas, which reveals a large discrepancy between the two.

Artificial intelligence (AI) such as deep learning has made remarkable progress, and it is increasingly applied to medical image diagnosis. However, the validity of deep learning models has not been sufficiently verified. One open question is how closely the image features a model focuses on match medical findings. There are concerns that, in clinical practice, a model's output may diverge from doctors' diagnoses.

The research group examined the medical validity of a deep learning model that had achieved high performance in previous research, using the diagnosis of drowning from forensic postmortem CT images as a test case. The image features the model focused on were identified with visualization techniques and defined as "regions of interest." Image regions annotated by diagnostic radiologists based on imaging findings were defined as medically "important regions." The two sets of regions were then compared.
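The comparison described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the research group's actual method, and it rests on several assumptions. The model's attention is taken to be a 2D saliency heatmap, such as one a Grad-CAM-style visualization might produce. The "region of interest" is the set of pixels at or above a chosen threshold. Agreement is measured as the fraction of region-of-interest pixels that fall inside the radiologist-annotated "important region." The article does not name the visualization technique or the agreement metric. The function names, the threshold, and the toy 4x4 data are all invented for illustration.

```python
"""Hypothetical sketch: compare a model's saliency map with a radiologist's mask.

Assumptions (not from the article): the saliency map is a 2D grid of floats,
the region of interest is a simple threshold on it, and agreement is the
fraction of region-of-interest pixels inside the annotated important region.
"""
from typing import List, Tuple

Grid = List[List[float]]
Mask = List[List[bool]]


def region_of_interest(saliency: Grid, threshold: float) -> Mask:
    """Binarize the heatmap: pixels at or above `threshold` form the model's region of interest."""
    return [[value >= threshold for value in row] for row in saliency]


def overlap_fraction(roi: Mask, important: Mask) -> float:
    """Fraction of region-of-interest pixels that also lie in the annotated important region."""
    roi_pixels = 0
    shared = 0
    for roi_row, imp_row in zip(roi, important):
        for in_roi, in_imp in zip(roi_row, imp_row):
            if in_roi:
                roi_pixels += 1
                if in_imp:
                    shared += 1
    return shared / roi_pixels if roi_pixels else 0.0


def peak_location(saliency: Grid) -> Tuple[int, int]:
    """(row, col) of the pixel the model weights most heavily (first one on ties)."""
    return max(
        ((r, c) for r in range(len(saliency)) for c in range(len(saliency[r]))),
        key=lambda rc: saliency[rc[0]][rc[1]],
    )


# Toy data, invented for illustration: a model heatmap and a radiologist mask.
SALIENCY: Grid = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.6, 0.2, 0.1],
    [0.1, 0.2, 0.5, 0.95],
    [0.0, 0.1, 0.3, 0.4],
]
IMPORTANT: Mask = [
    [True, True, False, False],
    [True, True, False, False],
    [False, False, True, False],
    [False, False, False, False],
]

if __name__ == "__main__":
    roi = region_of_interest(SALIENCY, threshold=0.5)
    r, c = peak_location(SALIENCY)
    print(f"overlap: {overlap_fraction(roi, IMPORTANT):.2f}")   # overlap: 0.83
    print(f"peak inside important region: {IMPORTANT[r][c]}")  # peak inside important region: False
```

With these toy values, about 83% of the region of interest lies inside the annotated region, yet the most strongly weighted pixel falls outside it. This illustrates how a high overlap score alone does not show that the model emphasizes the same locations as the radiologist.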

The agreement between the model's regions of interest and the medically important regions was as low as about 30%. Even where agreement reached approximately 80%, the positions of importance within the region differed. Previous research had reported that the tested model could classify drowning with an accuracy of over 90%. Against that result, the discrepancy between the model and clinical medical findings was unexpectedly large.

The findings raise concerns about the medical validity of AI-based medical image diagnosis. The researchers expect that further verification and countermeasures will lead to the clinical application of highly safe AI. One such countermeasure is the development of new training methods.

Paper information: [Journal of Imaging Informatics in Medicine] Inconsistency between Human Observation and Deep Learning Models: Assessing Validity of Postmortem Computed Tomography Diagnosis of Drowning