
Chinese Journal of Medical Ultrasound (Electronic Edition) ›› 2025, Vol. 22 ›› Issue (11): 1055-1061. doi: 10.3877/cma.j.issn.1672-6448.2025.11.009

• Abdominal Ultrasound •

Utility of DeepSeek large language models for structured ultrasound reporting and automated tumor staging in gastric and rectal cancer

Zhenqi Zhang, Man Lu, Yihan Qi, Ming Zhuang, Ziyue Hu, Lu Wang

  1. Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, University of Electronic Science and Technology of China, Chengdu 610041, China
  • Received: 2025-08-25 Online: 2025-11-01 Published: 2026-02-12
  • Contact: Lu Wang

Abstract:

Objective

To investigate the utility of the DeepSeek large language model (LLM) in the structured generation of ultrasound reports and the automatic assessment of T-staging for gastric and rectal cancer.

Methods

A total of 121 ultrasound examination reports for gastric and rectal cancer, collected from Sichuan Cancer Hospital between January 2023 and December 2024, were included in this study. A structured template for gastric and rectal cancer ultrasound reports was developed by a team of senior sonographers. The DeepSeek R1 and V3 models were employed to extract structured information and assess T-staging. The performance of structured report generation was evaluated using recall, precision, and F1 score, while T-staging performance was assessed based on accuracy. Three physicians were invited to compare the reports generated by DeepSeek with the original reports to evaluate review efficiency and clinical usability.
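As an illustration only (not code from the study), field-level extraction performance of the kind described here is typically scored by comparing each extracted template field against a reference annotation. The following is a minimal sketch; the field names and values are hypothetical, not taken from the study's actual template:

```python
# Hypothetical sketch of field-level precision/recall/F1 scoring for
# structured report extraction. Field names and values are illustrative.

def score_extraction(reference: dict, extracted: dict):
    """Compare extracted fields against the reference annotation."""
    # True positives: fields the model extracted with the correct value
    tp = sum(1 for k, v in extracted.items() if reference.get(k) == v)
    fp = len(extracted) - tp  # extracted but wrong or spurious
    # False negatives: reference fields missing or incorrectly extracted
    fn = sum(1 for k in reference if reference.get(k) != extracted.get(k))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

reference = {"tumor_site": "rectum", "t_stage": "T3", "wall_layers": "invaded"}
extracted = {"tumor_site": "rectum", "t_stage": "T2", "wall_layers": "invaded"}
p, r, f1 = score_extraction(reference, extracted)
```

In this toy example, two of three fields match, so precision, recall, and F1 all equal 2/3; in the study these metrics would be aggregated over all 121 reports.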

Results

Regarding structured information extraction, both the DeepSeek R1 and V3 models achieved recall, precision, and F1 scores exceeding 0.9, with no statistically significant differences between the two (P>0.05). In T-staging assessment, the DeepSeek R1 model (utilizing reasoning mode) achieved the highest accuracy of 76.86%, which was significantly superior to the 59.50% achieved by the DeepSeek V3 model (χ²=8.51, P<0.05). Compared with the average time required to review original reports [(60.96±6.11) s/report], the review time for structured reports generated by DeepSeek R1 [(18.12±4.52) s/report] (t=60.38; P<0.001) and DeepSeek V3 [(17.15±2.60) s/report] (t=71.98; P<0.001) was significantly shortened. The 5-point Likert scale evaluation, expressed as median (P25, P75), gave the original reports a score of 3 (3, 3), while the DeepSeek R1 and V3 reports scored 1 (1, 2) (Z=-9.72; P<0.001) and 1 (1, 2) (Z=-9.95; P<0.001), respectively, a statistically significant difference.
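The reported accuracy gap can be sanity-checked with a standard 2×2 chi-square test. The counts below are back-calculated from the reported accuracies (76.86% and 59.50% of 121 reports ≈ 93 and 72 correct) and are therefore an assumption; the resulting statistic may also differ slightly from the published χ²=8.51 depending on continuity-correction conventions:

```python
# Hedged sketch: Pearson chi-square test (no continuity correction) comparing
# T-staging accuracy of two models on the same 121 reports. Counts are
# back-calculated from the reported accuracies, not taken from raw data.

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

n_reports = 121
correct_r1, correct_v3 = 93, 72  # ~76.86% and ~59.50% of 121 reports
chi2 = chi_square_2x2(correct_r1, n_reports - correct_r1,
                      correct_v3, n_reports - correct_v3)
# The critical value for df = 1 at alpha = 0.05 is 3.841; the statistic
# exceeds it, consistent with the paper's finding of P < 0.05.
```

With these assumed counts the statistic comes out near 8.4, in the same range as the published value, and well above the 3.841 significance threshold.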

Conclusion

The DeepSeek large language models, particularly the R1 version, can effectively extract structured information from gastric and rectal cancer ultrasound reports and demonstrate high accuracy in T-staging assessment. The generated reports improve review efficiency and have the potential to assist clinical decision-making.

Key words: Large language model, DeepSeek, Gastrointestinal ultrasound, Structured reporting, Tumor staging, Artificial intelligence

Copyright © Chinese Journal of Medical Ultrasound (Electronic Edition), All Rights Reserved.
Tel: 010-51322630, 2632, 2628 Fax: 010-51322630 E-mail: csbjb@cma.org.cn