
Chinese Journal of Medical Ultrasound (Electronic Edition) ›› 2025, Vol. 22 ›› Issue (09): 832-837. doi: 10.3877/cma.j.issn.1672-6448.2025.09.007

• Ultrasound Quality Control •

Leveraging the large language model DeepSeek-R1 for quality control of thyroid ultrasound reports

Zhenqi Zhang, Yihan Qi, Lu Wang, Ziyue Hu, Tingting Li, Man Lu

  1. Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, University of Electronic Science and Technology of China, Chengdu 610041, China
  • Received: 2025-08-07 Online: 2025-09-01 Published: 2025-12-24
  • Contact: Man Lu

Abstract:

Objective

To evaluate the utility of an intelligent quality control system based on the large language model DeepSeek-R1 in quality control of thyroid ultrasound report writing.

Methods

A total of 120 thyroid ultrasound examination reports from the Department of Ultrasound, Sichuan Cancer Hospital, collected between January and December 2024, were randomly selected as study subjects. The reports were independently evaluated by DeepSeek-R1 and two attending ultrasound physicians. The joint assessment results of an expert panel consisting of three associate chief physicians were regarded as the reference standard. Receiver operating characteristic (ROC) curves were generated to assess the performance of DeepSeek-R1 and human quality control, with the area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) calculated. Differences in AUC were compared using the DeLong test. Agreement among different quality controllers was evaluated with the Kappa statistic.
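The agreement step can be illustrated with a minimal sketch of the kappa calculation for two reviewers. It assumes the pairwise Cohen's form of the statistic and a binary label per report (1 = report flagged as non-compliant, 0 = acceptable); the abstract states neither the kappa variant nor the label coding, and the function name `cohen_kappa` and the sample ratings are hypothetical.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items.

    Assumption: labels are categorical (here 1 = flagged, 0 = acceptable).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )

    if expected == 1.0:
        # Both raters used one identical label throughout; agreement is total.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings for ten reports (not data from the study).
model_flags = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
physician_flags = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
kappa = cohen_kappa(model_flags, physician_flags)
```

In this toy example the two raters agree on 8 of 10 reports, but kappa is lower than 0.8 because part of that agreement is expected by chance.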

Results

Using the expert panel's assessment as the reference standard, the AUC of the DeepSeek-R1 model was 0.959 [95% confidence interval (CI): 0.902–1.000], with a sensitivity of 92.86% and a specificity of 98.91%, all higher than those of the two human quality controllers [AUC: 0.828 (95% CI: 0.720–0.936) and 0.835 (95% CI: 0.731–0.940); sensitivity: 67.86% and 71.43%; specificity: 97.83% and 95.65%]. The DeLong test showed that the differences in AUC between DeepSeek-R1 and the two physicians were statistically significant (P=0.020 and P=0.025, respectively). The Kappa test demonstrated good consistency between DeepSeek-R1 and the two physicians, with κ values of 0.64 and 0.63, respectively.
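As a worked check of the figures above, the following sketch back-calculates confusion-matrix counts that reproduce the reported percentages. The counts are assumptions, not values given in the abstract: they presume that 28 of the 120 reports were positive (non-compliant) by the reference standard and 92 were negative, which is the split that matches every reported sensitivity and specificity. For a reviewer who gives a single binary verdict per report, the ROC curve has one operating point, so the AUC reduces to the mean of sensitivity and specificity.

```python
# Assumed reference-standard split (inferred, not stated in the abstract).
POSITIVES = 28   # reports the expert panel judged non-compliant
NEGATIVES = 92   # reports the expert panel judged acceptable

# Assumed (true positives, true negatives) per reviewer, also inferred.
assumed_counts = {
    "DeepSeek-R1": (26, 91),
    "Physician 1": (19, 90),
    "Physician 2": (20, 88),
}


def summarize(tp, tn):
    """Return (sensitivity %, specificity %, single-threshold AUC)."""
    sensitivity = tp / POSITIVES
    specificity = tn / NEGATIVES
    # One operating point on the ROC curve: area = (sensitivity + specificity) / 2.
    auc = (sensitivity + specificity) / 2
    return round(sensitivity * 100, 2), round(specificity * 100, 2), round(auc, 3)


summary = {name: summarize(tp, tn) for name, (tp, tn) in assumed_counts.items()}

for name, (sens, spec, auc) in summary.items():
    print(f"{name}: sensitivity {sens}%, specificity {spec}%, AUC {auc}")
```

Under these assumed counts the computed values agree with the reported sensitivity, specificity, and AUC for all three reviewers, which suggests the published AUCs come from binary verdicts rather than continuous scores.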

Conclusion

The intelligent quality control system based on the DeepSeek-R1 model outperformed human reviewers in the normative evaluation of thyroid ultrasound reports, effectively reducing subjective errors and showing promising clinical applicability. However, the limitations of the model require careful consideration.

Key words: Large language model, Artificial intelligence, DeepSeek, Ultrasound report, Quality control
